<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:planet="http://planet.intertwingly.net/" xmlns:indexing="urn:atom-extension:indexing" indexing:index="no"><access:restriction xmlns:access="http://www.bloglines.com/about/specs/fac-1.0" relationship="deny"/>
  <title>Planet Pypefitters</title>
  <updated>2009-01-07T05:15:13Z</updated>
  <generator uri="http://intertwingly.net/code/venus/">Venus</generator>
  <author>
    <name>Tres Seaver</name>
    <email>tseaver@agendaless.com</email>
  </author>
  <id>http://planet.pypefitters.org/atom.xml</id>
  <link href="http://planet.pypefitters.org/atom.xml" rel="self" type="application/atom+xml"/>
  <link href="http://planet.pypefitters.org/" rel="alternate"/>

  <entry>
    <id>http://plope.com/Members/chrism/alt_configuration</id>
    <link href="http://plope.com/Members/chrism/alt_configuration" rel="alternate" type="text/html"/>
    <title>Application Configuration for Frameworks</title>
    <summary>It would be useful to find or create a configuration system for use in
repoze.bfg (and other frameworks) that has the same purpose as
'zope.configuration' (aka ZCML), but without the hard
dependency on 'zope.component'.</summary>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><h1>Idea</h1>
<p> It would be useful to find or create a configuration system for use in
 repoze.bfg (and other frameworks) that has the same purpose as
 <code>zope.configuration</code> (aka ZCML), but without the hard
 dependency on <code>zope.component</code>.</p>
<h1>Rationale</h1>
<p> Most frameworks need some form of explicit configuration step.  This
 usually happens in two phases.  The first phase is to load the
 configuration data from some file(s).  The second phase is an
 "execution" phase which operates against the configuration data.  The
 execution phase is not passive; it's active.  The return value of the
 execution phase is not terribly important; the phase is executed only
 to perform arbitrary setup.</p>
<p> It is often useful to "box this step in" to some known time period.
 Instead of relying on import ordering to do ad-hoc configuration,
 making the execution step explicit ensures that the configuration
 happens in a known order as a result of a single function call.  It
 also forms a convention which can help load "the right" configuration
 for functional testing and other purposes.</p>
<p> The specific code that is executed depends on what the user puts in
 the configuration data.  The configuration data is typically
 explicitly not Python: Python is often too powerful for this task.
 Instead, there is some controlled set of directives in a structured
 format that are available for users to inject into a configuration
 file.</p>
<p> Most frameworks also need somewhere to stash and retrieve
 <em>application</em>-specific settings (as opposed to module-scope settings),
 as more than one instance of an application using the framework may
 exist in the same process space.</p>
<p> <code>zope.configuration</code> provides such a system but it assumes that the
 framework developer is willing to "bite off" the entire Zope component
 architecture.  Also, <code>zope.configuration</code> makes the assumption that there
 is a single global configuration; this is a limitation that makes it
 hard to run multiple instances of an application using the
 configuration system in the same process.</p>
<h1>Requirements</h1>

<ul>
<li>The pattern for loading and executing directives from
    configuration is similar to the pattern exposed by the design of
    zope.configuration.  E.g. there is a "load" phase which reads data
    from a config file, and an execute phase which executes the
    configuration; these can be invoked independently as necessary.</li>
<li>The input format should be overridable (e.g. YAML vs. XML).</li>
<li>No automatic dependency on zope.component (i.e. the configuration
    registry itself is not a component architecture registry).  It
    should be possible to <em>use</em> a <code>zope.component</code> registry within the
    system, but the configuration system should not assume it is
    populating a component architecture registry.</li>
<li>The configuration system should allow developers to extend the
    system with new directives; for example, someone should be able to
    plug in an implementation of a directive that, when spelled in the
    config file by a user, could cause a set of Routes .connect
    statements to be run, or a ZCA utility to be registered.</li>
<li>The configuration system should be able to raise an error when an
    unknown directive is specified or the options to a particular
    directive are somehow wrong.</li>
<li>Configuration should be able to span multiple files.  There should
    be a way to include configuration from another Python package or
    filesystem location.</li>
<li>The configuration system should provide an application-specific
    data structure to both retrieve <em>and</em> temporarily store data at
    runtime.</li>

</ul>
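<p>As a rough illustration of the requirements above, here is a minimal sketch in plain Python.  Every name in it (<code>Configurator</code>, <code>ConfigurationError</code>, <code>add_directive</code>, the <code>setting</code> directive) is hypothetical and invented for this example; it is not an existing library, and it assumes the input format has already been parsed into a list of (name, options) pairs.</p>
<pre>
```python
class ConfigurationError(Exception):
    """Raised for unknown directives or bad directive options."""


class Configurator(object):
    """Hypothetical sketch: one instance per application, no global state."""

    def __init__(self):
        self.directives = {}   # maps a directive name to its handler callable
        self.settings = {}     # application-specific data, usable at runtime
        self.actions = []      # filled by load(), consumed by execute()

    def add_directive(self, name, handler):
        # Extension point: third parties register new directives here.
        self.directives[name] = handler

    def load(self, declarations):
        # "Load" phase: validate and record, but do not run anything yet.
        # `declarations` is a list of (name, options) pairs; a YAML or XML
        # reader could produce the same structure.
        for name, options in declarations:
            if name not in self.directives:
                raise ConfigurationError("unknown directive: %r" % (name,))
            self.actions.append((self.directives[name], dict(options)))

    def execute(self):
        # "Execute" phase: run the recorded actions in declaration order.
        for handler, options in self.actions:
            try:
                handler(self, **options)
            except TypeError as exc:
                # Crude option check: wrong keyword arguments end up here.
                raise ConfigurationError(str(exc))
        self.actions = []


def setting(config, name, value):
    # Example directive: stash a value in the per-application settings.
    config.settings[name] = value


app_one = Configurator()
app_one.add_directive("setting", setting)
app_one.load([("setting", {"name": "debug", "value": True})])
app_one.execute()

app_two = Configurator()   # a second, independent instance in the same process
app_two.add_directive("setting", setting)
```
</pre>
<p>Because the registry lives on the instance rather than at module scope, the two applications above do not see each other's settings.</p>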
<p>Anyway, that's enough for now.  Just thinking out loud.</p></div>
    </content>
    <updated>2008-12-30T09:44:16Z</updated>
    <category term="python"/>
    <category term="tech"/>
    <category term="zope"/>
    <author>
      <name>chrism</name>
    </author>
    <source>
      <id>http://plope.com</id>
      <link href="http://plope.com/python.rss" rel="self" type="application/atom+xml"/>
      <link href="http://plope.com" rel="alternate" type="text/html"/>
      <subtitle>Chris McDonough's Python Feed</subtitle>
      <title>Chris McDonough's Python Feed</title>
      <updated>2008-12-30T09:44:16Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://blog.ianbicking.org/2008/12/27/avoiding-silos-link-as-a-first-class-object/</id>
    <link href="http://blog.ianbicking.org/2008/12/27/avoiding-silos-link-as-a-first-class-object/" rel="alternate" type="text/html"/>
    <title xml:lang="en">Avoiding Silos: “link” as a first-class object</title>
    <summary xml:lang="en">One of the constant annoyances to me in web applications is the self-proclaimed need for those applications to know about everything and do everything, and only spotty ad hoc techniques for including things from other applications.
An example might be blog navigation or search, where you can only include data from the application itself.  Or [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><div class="document">
<p>One of the constant annoyances to me in web applications is the self-proclaimed need for those applications to know about everything and do everything, and only spotty ad hoc techniques for including things from other applications.</p>
<p>An example might be blog navigation or search, where you can only include data from the application itself.  Or "Recent Posts", which can only show locally produced posts.  What if I post something elsewhere?  I have to create some shoddy placeholder post to refer to it.  Bah!  Underlying this, the data is usually structured in a specific way, with the HTML being a sort of artifact of the database, the markup transient and a slave to the database’s structure.</p>
<p>An example of this might be a recent post listing like:</p>
<pre class="literal-block">&lt;ul&gt;
  for post in recent_posts:
    &lt;li&gt;
      &lt;a href="/post/{{post.year}}/{{post.month}}/{{post.slug}}"&gt;
        {{post.title}}&lt;/a&gt;
    &lt;/li&gt;
&lt;/ul&gt;
</pre>
<p>There’s clearly no room for exceptions in this code.  I am thus proposing that any system like this should have the notion of a "link" as a first-class object.  The code should look like this:</p>
<pre class="literal-block">&lt;ul&gt;
  for post in recent_posts:
    &lt;li&gt;
      {{post.link()}}
    &lt;/li&gt;
&lt;/ul&gt;
</pre>
<p>Just like with <a class="reference external" href="http://blog.ianbicking.org/2008/10/24/hypertext-driven-urls/">changing IDs to links</a> in service documents, the template doesn’t actually look any more complicated than it did before (simpler, even).  But now we can use simple object-oriented techniques to create first-class links.  The code might look like:</p>
<pre class="literal-block">class Post(SomeORM):
    def url(self):
        if self.type == 'link':
            return self.body
        else:
            base = get_request().application_url
            return '%s/%s/%s/%s' % (
                base, self.year, self.month, self.slug)

    def link(self):
        return html('&lt;a href="%s"&gt;%s&lt;/a&gt;') % (
            self.url(), self.title)
</pre>
<p>The addition of the <tt class="docutils literal"><span class="pre">.url()</span></tt> method has the obvious effect of making these offsite links work.  Using a <tt class="docutils literal"><span class="pre">.link()</span></tt> method has the added advantage of allowing things like HTML snippets to be inserted into the system (even though that is not implemented here).  By allowing arbitrary HTML in certain places you make it possible for people to extend the site in little ways — possibly adding markup to a title, or allowing an item in the list that actually contains two URLs (e.g., <tt class="docutils literal"><span class="pre">&lt;a</span> <span class="pre">href="url1"&gt;Some</span> <span class="pre">Item&lt;/a&gt;</span> <span class="pre">(&lt;a</span> <span class="pre">href="url2"&gt;via&lt;/a&gt;)</span></tt>).</p>
<p>In the context of Python I recommend making these into methods, not properties, because it allows you to later add keyword arguments to specialize the markup (like <tt class="docutils literal"><span class="pre">post.link(abbreviated=True)</span></tt>).</p>
<p>One negative aspect of this is that you cannot affect all the markup through the template alone; you may have to go into the Python code to change things.  Anyone have ideas for handling this problem?</p>
</div></div>
    </content>
    <updated>2008-12-27T19:09:18Z</updated>
    <published>2008-12-27T19:09:18Z</published>
    <category scheme="http://blog.ianbicking.org" term="HTML"/>
    <category scheme="http://blog.ianbicking.org" term="Web"/>
    <category scheme="http://blog.ianbicking.org" term="Python"/>
    <category scheme="http://blog.ianbicking.org" term="Programming"/>
    <author>
      <name>Ian Bicking</name>
      <uri>http://blog.ianbicking.org</uri>
    </author>
    <source>
      <id>http://blog.ianbicking.org/feed/atom/</id>
      <link href="http://blog.ianbicking.org" rel="alternate" type="text/html"/>
      <link href="http://blog.ianbicking.org/feed/atom/" rel="self" type="application/atom+xml"/>
      <title xml:lang="en">Ian Bicking: a blog</title>
      <updated>2008-12-27T19:09:18Z</updated>
    </source>
  </entry>

  <entry>
    <id>http://plope.com/Members/chrism/i_am_not_a_programmer</id>
    <link href="http://plope.com/Members/chrism/i_am_not_a_programmer" rel="alternate" type="text/html"/>
    <title>"I Am Not A Programmer"</title>
    <summary>The worst way to ask for help.</summary>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>So... you're developing or modifying an existing web or GUI application.  You 
send an email to the mailing list or inject into IRC asking for help:
</p><pre>  I need help!  I just need to do &lt;X&gt;.  I am not a programmer.
</pre>
<p/>
<p>Hey... guess what!  Congratulations, you just became a programmer by deciding to do 
what you're trying to do!  So you can't hide behind that particular "I'm not a programmer"
skirt anymore, you gave that right up when you opened the first file in your editor.</p>
<p>None of us are really "programmers"... there's no cabal or test
we all take.  We're all trying to figure it out just like you.  So save the excuses,
use Google and the powers of logic to ask a reasonable question that pertains
to your problem and maybe someone will help.</p></div>
    </content>
    <updated>2008-12-21T17:59:20Z</updated>
    <category term="python"/>
    <category term="tech"/>
    <category term="zope"/>
    <author>
      <name>chrism</name>
    </author>
    <source>
      <id>http://plope.com</id>
      <link href="http://plope.com/python.rss" rel="self" type="application/atom+xml"/>
      <link href="http://plope.com" rel="alternate" type="text/html"/>
      <subtitle>Chris McDonough's Python Feed</subtitle>
      <title>Chris McDonough's Python Feed</title>
      <updated>2008-12-30T09:44:16Z</updated>
    </source>
  </entry>

  <entry>
    <id>http://plope.com/Members/chrism/pluginizing_an_app</id>
    <link href="http://plope.com/Members/chrism/pluginizing_an_app" rel="alternate" type="text/html"/>
    <title>Plugin-Izing an Application</title>
    <summary>I just tried to make an application pluggable as per André Roberge's recent specification using the Zope Component Architecture.</summary>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I just tried to create a plugin system as per André Roberge's
<a href="http://aroberge.blogspot.com/2008/12/plugins-part-1-application.html">specification</a>
using the Zope Component Architecture.  The program which is
being made pluggable is a program written by the effbot.  I'm sure
the effbot will be thrilled. ;-)</p>
<p>This is not exactly the kind of problem domain where you'd expect to need
plugins.  It's a parser and needs to operate very quickly, and most
plugin architectures (including the ZCA) are based on indirections
that aren't really optimized to be inside the inner loop of programs
that require very high speed.  But that said, I suppose it's as good
an application as any to introduce plugins into as a demonstration, as
long as you're not judging on speed difference between the pluggable
version and the original.  I didn't measure speed, as I'd never use
the component architecture "for real" in this particular program.</p>
<p>After completing pluginizing the application, a few things strike me.
First of all, in order to get the ZCA to parse ZCML, it needs a <em>lot</em>
of dependencies.  We knew this, of course, but it's pretty striking
exactly how many are required when you're dealing with this simple of
a problem.  Here they are:
</p><pre>  pytz-2008i-py2.4.egg
  zope.component-3.4.0-py2.4.egg
  zope.configuration-3.4.0-py2.4.egg
  zope.deferredimport-3.4.0-py2.4.egg
  zope.deprecation-3.4.0-py2.4.egg
  zope.event-3.4.0-py2.4.egg
  zope.exceptions-3.5.2-py2.4.egg
  zope.i18n-3.6.0-py2.4.egg
  zope.i18nmessageid-3.4.3-py2.4-macosx-10.5-i386.egg
  zope.interface-3.4.1-py2.4-macosx-10.5-i386.egg
  zope.location-3.4.0-py2.4.egg
  zope.proxy-3.4.1-py2.4-macosx-10.5-i386.egg
  zope.proxy-3.4.2-py2.4-macosx-10.5-i386.egg
  zope.publisher-3.5.4-py2.4.egg
  zope.schema-3.4.0-py2.4.egg
  zope.security-3.5.2-py2.4-macosx-10.5-i386.egg
  zope.testing-3.5.1-py2.4.egg
  zope.traversing-3.5.0a4-py2.4.egg
</pre>
<p/>
<p>That's just absurd.  The <em>publisher</em>?  <em>zope.security</em>?
<em>zope.location</em>?  We really need to untangle these dependencies
at some point to make it reasonable for very small applications to use
<code>zope.configuration</code> (ZCML).  Most of these dependencies are actually
dependencies of ZCML (<code>zope.configuration</code>), rather than the ZCA
"proper" (<code>zope.component</code>).</p>
<p>In any case, on to the actual plugin-ization.  To no one's surprise,
the resulting plugin-ized version of the application is much more
complex.</p>
<p>To actually plugin-ize the app, I made each operator into a named
utility using the ZCA, configured via ZCML.  The name of the utility
is the operator itself.  Each utility is registered via ZCML, à la:
</p><pre>  &lt;utility
    provides=".interfaces.IOperator"
    component=".operators.operator_add_token"
    name="+"
  /&gt;
</pre>
<p/>
<p>One utility is registered for each operator required.  Accordingly,
the application's tokenize() function now looks up each operator via a
utility lookup:
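</p><pre>  import re
  from zope.component import queryUtility
  # IOperator is the interface named in the ZCML above (.interfaces.IOperator)

  def tokenize(program):
      for number, operator in re.findall(r"\s*(?:(\d+)|(\*\*|.))", program):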
          if number:
              yield literal_token(int(number))
          elif operator:
              utility = queryUtility(IOperator, name=operator)
              if utility is None:
                  raise SyntaxError("unknown operator: %r" % operator)
              yield utility()
          else:
              raise SyntaxError("unknown operator: %r" % operator)
      yield end_token()
</pre>
<p/>
<p>When an operator is encountered, <code>queryUtility</code> is run; it returns
the named utility if one is registered, or <code>None</code> if not, in which
case <code>tokenize()</code> raises a syntax error.
The utilities themselves are classes.  For example, the
<code>operator_add_token</code> utility is defined as:
</p><pre>    class operator_add_token(object):
        """ plugin """
        lbp = 10
        def nud(self, context):
            return context.expression(100)

        def led(self, context, left):
            return left + context.expression(10)
</pre>
<p/>
<p>Note that I changed the application to use a "context" object rather
than module-scope globals to find the <code>token</code> and <code>expression</code>
callable, so this definition isn't exactly like the one defined by the
original application, but it still does the same thing.</p>
<p>In any case, all of the operators are defined in the same file
(<code>calc.operators</code>).  This is just for convenience; they could be
defined all over hell and gone if you liked (you'd just change the
ZCML to refer to a utility component at a different dotted name).  And of
course if you included more ZCML (which can cross files too), you
could add another operator or override the implementation of an
existing operator.  I don't have very much imagination, so I did
neither.  You get the point, hopefully.</p>
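<p>To make the lookup-by-name idea concrete without pulling in the dependencies listed earlier, here is a small stand-in written with a plain dict.  It is only a sketch: <code>register_operator</code> and <code>query_operator</code> are hypothetical helpers invented for this example, not part of the ZCA, and the token classes are stripped down to a single attribute.</p>
<pre>
```python
import re

# Plain-dict stand-in for a named-utility registry; this is NOT the ZCA API.
_operators = {}


def register_operator(name, factory):
    # A later registration for the same name overrides the earlier one.
    _operators[name] = factory


def query_operator(name):
    # Like queryUtility, return None when nothing is registered for the name.
    return _operators.get(name)


class operator_add_token(object):
    lbp = 10


class operator_mul_token(object):
    lbp = 20


register_operator("+", operator_add_token)
register_operator("*", operator_mul_token)


def tokenize(program):
    # Yields ints for numbers and operator instances for registered operators.
    for number, operator in re.findall(r"\s*(?:(\d+)|(\*\*|.))", program):
        if number:
            yield int(number)
        else:
            factory = query_operator(operator)
            if factory is None:
                raise SyntaxError("unknown operator: %r" % operator)
            yield factory()


tokens = list(tokenize("1 + 2 * 3"))
```
</pre>
<p>Adding or overriding an operator is then a matter of one more <code>register_operator</code> call, which is roughly the role the extra ZCML plays in the real version.</p>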
<p>I suppose this is about the simplest possible example of using the
Zope Component Architecture to create a pluggable application.  You
can also define your own ZCML directives (e.g. I could have made the
ZCML read something like <code>&lt;registerOperator name="+"
implementation=".operators.operator_add"&gt;</code>).  You also don't really
need to use ZCML; it's just a shell around the actual component
architecture that makes clear the difference between code and
configuration.</p>
<p>The result of my toying around exists at
<a href="http://plope.com/static/misc/calc-0.2.tar.gz">http://plope.com/static/misc/calc-0.2.tar.gz</a>
.  Run "setup.py install" (in a virtualenv, to prevent polluting your system
Python with the above libraries) to install it. To run it
subsequently, run "bin/calctest" (a setuptools console script).</p></div>
    </content>
    <updated>2008-12-19T03:53:16Z</updated>
    <category term="python"/>
    <category term="tech"/>
    <category term="zope"/>
    <author>
      <name>chrism</name>
    </author>
    <source>
      <id>http://plope.com</id>
      <link href="http://plope.com/python.rss" rel="self" type="application/atom+xml"/>
      <link href="http://plope.com" rel="alternate" type="text/html"/>
      <subtitle>Chris McDonough's Python Feed</subtitle>
      <title>Chris McDonough's Python Feed</title>
      <updated>2008-12-30T09:44:16Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://blog.ianbicking.org/2008/12/17/javascript-status-message-display/</id>
    <link href="http://blog.ianbicking.org/2008/12/17/javascript-status-message-display/" rel="alternate" type="text/html"/>
    <title xml:lang="en">Javascript Status Message Display</title>
    <summary xml:lang="en">In a little wiki I’ve been playing with I’ve been trying out little ideas that I’ve had but haven’t had a place to actually implement them.  One is how notification messages work.  I’m sure other people have done the same thing, but I thought I’d describe it anyway.
A common pattern is to accept [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><div class="document">
<p>In a <a class="reference external" href="http://www.bitbucket.org/ianb/pickywiki/">little wiki I’ve been playing with</a> I’ve been trying out little ideas that I’ve had but haven’t had a place to actually implement them.  One is how notification messages work.  I’m sure other people have done the same thing, but I thought I’d describe it anyway.</p>
<p>A <a class="reference external" href="http://blog.ianbicking.org/web-application-patterns-status-notification.html">common pattern</a> is to accept a POST request and then redirect the user to some page, setting a status message.  Typically the status message is either set in a cookie or in the session, then the standard template for the application has some code to check for a message and display it.</p>
<p>The problem with this is that this breaks all caching — at any time any page can have some message injected into it, basically for no reason at all.  So I thought: why not do the whole thing in Javascript?  The server will set a cookie, but only Javascript will read it.</p>
<p>The code goes like this; on the server (easily translated into any framework):</p>
<pre class="literal-block">resp.set_cookie('flash_message', urllib.quote(msg))
</pre>
<p>I quote the message because it can contain characters unsafe for cookies, and URL quoting is a particularly easy quoting to apply.</p>
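<p>The call above is the Python 2 spelling.  As a small sketch of the same quoting step in Python 3, where the function lives in <code>urllib.parse</code> (the message text here is just an example value):</p>
<pre class="literal-block">
```python
from urllib.parse import quote, unquote

msg = "Page saved!"

# Percent-encode everything outside the small "safe" set, so spaces and
# punctuation such as semicolons cannot break the cookie syntax.
cookie_value = quote(msg)

# Decoding on the Python side, shown only to confirm the round trip.
round_trip = unquote(cookie_value)
```
</pre>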
<p>Then I have this Javascript (using jQuery):</p>
<pre class="literal-block">$(function () {
    // Anything in $(function...) is run on page load
    var flashMsg = readCookie('flash_message');
    if (flashMsg) {
        // decodeURIComponent handles UTF-8 percent-escapes; unescape() does not
        flashMsg = decodeURIComponent(flashMsg);
        var el = $('&lt;div id="flash-message"&gt;'+
          '&lt;div id="flash-message-close"&gt;'+
          '&lt;a title="dismiss this message" '+
          'id="flash-message-button" href="#"&gt;X&lt;/a&gt;&lt;/div&gt;'+
          flashMsg + '&lt;/div&gt;');
        $('a#flash-message-button', el).bind(
          'click', function () {
            $(this.parentNode.parentNode).remove();
        });
        $('#body').prepend(el);
        eraseCookie('flash_message');
    }
});
</pre>
<p>Note that I’ve decided to treat the flash message as HTML.  I don’t see a strong risk of injection attack in this case, though I must admit I’m a little unclear about what the normal policies are for cross-domain cookie setting.</p>
<p>I use <a class="reference external" href="http://www.quirksmode.org/js/cookies.html">these cookie functions</a> because oddly I can’t find cookie handling functions in jQuery.  It’s always weird to me how primitive <tt class="docutils literal"><span class="pre">document.cookie</span></tt> is.  Anyway, CSS looks like this:</p>
<pre class="literal-block">#flash-message {
  margin: 0.5em;
  border: 2px solid #000;
  background-color: #9f9;
  -moz-border-radius: 4px;
  text-align: center;
}

#flash-message-close {
  float: right;
  font-size: 70%;
  margin: 2px;
}

a#flash-message-button {
  text-decoration: none;
  color: #000;
  border: 1px solid #9f9;
}

a#flash-message-button:hover {
  border: 1px solid #000;
  background-color: #009;
  color: #fff;
}
</pre>
<p>This doesn’t have non-Javascript fallback, but I think that’s okay.  This isn’t something that a spider would ever see (since spiders shouldn’t be submitting forms that result in update messages).  Accessible browsers generally implement Javascript so that’s also not particularly a problem, though there may be additional hints I could give in CSS or Javascript to help make this more readable (if there’s a message, it should probably be the first thing read on the page).</p>
<p>Another common component of pages that varies separately from the page itself is logged-in status, but that’s more heavily connected to your application.  Get both into Javascript and you might be able to turn caching way up on a lot of your pages.</p>
</div></div>
    </content>
    <updated>2008-12-17T18:03:15Z</updated>
    <published>2008-12-17T18:03:15Z</published>
    <category scheme="http://blog.ianbicking.org" term="Web"/>
    <category scheme="http://blog.ianbicking.org" term="Javascript"/>
    <category scheme="http://blog.ianbicking.org" term="Programming"/>
    <author>
      <name>Ian Bicking</name>
      <uri>http://blog.ianbicking.org</uri>
    </author>
    <source>
      <id>http://blog.ianbicking.org/feed/atom/</id>
      <link href="http://blog.ianbicking.org" rel="alternate" type="text/html"/>
      <link href="http://blog.ianbicking.org/feed/atom/" rel="self" type="application/atom+xml"/>
      <title xml:lang="en">Ian Bicking: a blog</title>
      <updated>2008-12-27T19:09:18Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://blog.ianbicking.org/2008/12/16/using-pip-requirements/</id>
    <link href="http://blog.ianbicking.org/2008/12/16/using-pip-requirements/" rel="alternate" type="text/html"/>
    <title xml:lang="en">Using pip Requirements</title>
    <summary xml:lang="en">Following onto a set of recent posts (from James, me, then James again), Martijn Faassen wrote a description of Grok’s version management.  Our ideas are pretty close, but he’s using buildout, and I’ll describe out to do the same things with pip.
Here’s a kind of development workflow that I think works well:

A framework release [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><div class="document">
<p>Following onto a set of recent posts (from <a class="reference external" href="http://www.b-list.org/weblog/2008/dec/14/packaging/">James</a>, <a class="reference external" href="http://blog.ianbicking.org/2008/12/14/a-few-corrections-to-on-packaging/">me</a>, then <a class="reference external" href="http://www.b-list.org/weblog/2008/dec/15/pip/">James again</a>), Martijn Faassen <a class="reference external" href="http://faassen.n--tree.net/blog/view/weblog/2008/12/16/0">wrote a description of Grok’s version management</a>.  Our ideas are pretty close, but he’s using buildout, and I’ll describe how to do the same things with pip.</p>
<p>Here’s a kind of development workflow that I think works well:</p>
<ul class="simple">
<li>A framework release is prepared.  Ideally there’s a buildbot that has been running (as <a class="reference external" href="http://pylonshq.com:8010/">Pylons has</a>, for example), so the integration has been running for a while.</li>
<li>People make sure there are released versions of all the important components.  If there are known conflicts between pieces, libraries and the framework update their <tt class="docutils literal"><span class="pre">install_requires</span></tt> in their <tt class="docutils literal"><span class="pre">setup.py</span></tt> files to make sure people don’t use conflicting pieces together.</li>
<li>Once everything has been released, there is a known set of packages that work together.  With a buildbot, future versions may also work together, but they won’t necessarily work together with applications built on the framework.  And breakage can also occur regardless of a buildbot.</li>
<li>Also, people may have versions of libraries already installed, but just because they’ve installed something doesn’t mean they really mean to stay with an old version.  While known conflicts have been noted, there’s going to be lots of unknown conflicts and future conflicts.</li>
<li>When starting development with a framework, the developer would like to start with some known-good set, which is a set that can be developed by the framework developers, or potentially by any person.  For instance, if you extend a public framework with an internal framework (or even a public sub-framework like <a class="reference external" href="http://pinaxproject.com/">Pinax</a>) then the known-good set will be developed by a different set of people.</li>
<li>As an application is developed, the developer will add on other libraries, or use some of their own libraries.  Development will probably occur at the trunk/tip of several libraries as they are developed together.</li>
<li>A developer might upgrade the entire framework, or just upgrade one piece (for instance, to get a bug fix they are interested in, or follow a branch that has functionality they care about).  The developer doesn’t necessarily have the same notion of "stable" and "released" as the core framework developers have.</li>
<li>At the time of deployment the developer wants to make sure all the pieces are deployed together as they’ve tested them, and how they know them to work.  At any time, another developer may want to clone the same set of libraries.</li>
<li>After initial deployment, the developer may want to upgrade a single component, if only to test that an upgrade works, or if it resolves a bug.  They may test out combinations only to throw them away, and they don’t want to bump versions of libraries in order to deploy new combinations.</li>
</ul>
<p>This is the kind of development pattern that requirement files are meant to assist with.  They can provide a known-good set of packages.  Or they can provide a starting point for an active line of development.  Or they can provide a historical record of how something was put together.</p>
<p>The easy way to start a requirement file for pip is just to put the packages you know you want to work with.  For instance, we’ll call this <tt class="docutils literal"><span class="pre">project-start.txt</span></tt>:</p>
<pre class="literal-block">Pylons
-e svn+http://mycompany/svn/MyApp/trunk#egg=MyApp
-e svn+http://mycompany/svn/MyLibrary/trunk#egg=MyLibrary
</pre>
<p>You can plug away for a while, and maybe you decide you want to freeze the file.  So you do:</p>
<pre class="literal-block">$ pip freeze -r project-start.txt project-frozen.txt
</pre>
<p>By using <tt class="docutils literal"><span class="pre">-r</span> <span class="pre">project-start.txt</span></tt> you give <tt class="docutils literal"><span class="pre">pip</span> <span class="pre">freeze</span></tt> a template for it to start with.  From that, you’ll get <tt class="docutils literal"><span class="pre">project-frozen.txt</span></tt> that will look like:</p>
<pre class="literal-block">Pylons==0.9.7
-e svn+http://mycompany/svn/MyApp/trunk@1045#egg=MyApp
-e svn+http://mycompany/svn/MyLibrary/trunk@1058#egg=MyLibrary

## The following requirements were added by pip --freeze:
Beaker==0.2.1
WebHelpers==0.9.1
nose==1.4
# Installing as editable to satisfy requirement INITools==0.2.1dev-r3488:
-e svn+http://svn.colorstudy.com/INITools/trunk@3488#egg=INITools-0.2.1dev_r3488
</pre>
<p>At that point you might decide that you don’t care about the nose version, or you might have installed something from trunk when you could have used the last release.  So you go and adjust some things.</p>
<p>Martijn also asks: how do you have framework developers maintain one file, and then also have developers maintain their own lists for their projects?</p>
<p>You could start with a file like this for the framework itself.  Pylons for instance could ship with something like this.  To install Pylons you could then do:</p>
<pre class="literal-block">$ pip -E MyProject install \
&gt;    -r http://pylonshq.com/0.9.7-requirements.txt
</pre>
<p>You can also download that file yourself, add some comments, rename the file and add your project to it, and use that.  When you freeze, the order of the packages and any comments will be preserved, so you can keep track of what changed.  Also it should be amenable to source control, and diffs would be sensible.</p>
<p>You could also use indirection, creating a file like this for your project:</p>
<pre class="literal-block">-r http://pylonshq.com/0.9.7-requirements.txt
-e svn+http://mycompany/svn/MyApp/trunk#egg=MyApp
-e svn+http://mycompany/svn/MyLibrary/trunk#egg=MyLibrary
</pre>
<p>That is, requirements files can refer to each other.  So if you want to maintain your own requirements file alongside the development of an upstream requirements file, you could do that.</p>
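<p>To show what that indirection amounts to, here is a small sketch that expands nested <tt class="docutils literal"><span class="pre">-r</span></tt> lines into one flat list.  It is not how pip itself is implemented: <tt class="docutils literal"><span class="pre">flatten</span></tt> is a hypothetical helper, and the files are held in a dict (with made-up names) instead of being read from disk or fetched over HTTP.</p>
<pre class="literal-block">
```python
def flatten(name, files, seen=None):
    """Return the requirement lines of `name` with nested -r includes expanded.

    `files` maps a file name to its text; it stands in for disk or HTTP access.
    """
    if seen is None:
        seen = set()
    if name in seen:
        return []          # guard against include cycles
    seen.add(name)
    lines = []
    for raw in files[name].splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue       # skip blank lines and comments
        if line.startswith("-r "):
            # Recurse into the referenced requirements file.
            lines.extend(flatten(line[3:].strip(), files, seen))
        else:
            lines.append(line)
    return lines


files = {
    "framework.txt": "Pylons==0.9.7\nBeaker==0.2.1\n",
    "project.txt": (
        "-r framework.txt\n"
        "-e svn+http://mycompany/svn/MyApp/trunk#egg=MyApp\n"
    ),
}
resolved = flatten("project.txt", files)
```
</pre>
<p>The upstream list and the project's own additions stay in separate files, yet resolve to a single ordered set of requirements.</p>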
</div></div>
    </content>
    <updated>2008-12-17T01:30:21Z</updated>
    <published>2008-12-17T01:30:21Z</published>
    <category scheme="http://blog.ianbicking.org" term="Packaging"/>
    <category scheme="http://blog.ianbicking.org" term="Python"/>
    <author>
      <name>Ian Bicking</name>
      <uri>http://blog.ianbicking.org</uri>
    </author>
    <source>
      <id>http://blog.ianbicking.org/feed/atom/</id>
      <link href="http://blog.ianbicking.org" rel="alternate" type="text/html"/>
      <link href="http://blog.ianbicking.org/feed/atom/" rel="self" type="application/atom+xml"/>
      <title xml:lang="en">Ian Bicking: a blog</title>
      <updated>2008-12-27T19:09:18Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://blog.ianbicking.org/2008/12/14/a-few-corrections-to-on-packaging/</id>
    <link href="http://blog.ianbicking.org/2008/12/14/a-few-corrections-to-on-packaging/" rel="alternate" type="text/html"/>
    <title xml:lang="en">A Few Corrections To “On Packaging”</title>
    <summary xml:lang="en">James Bennett recently wrote an article on Python packaging and installation, and Setuptools.  There’s a lot of issues, and writing up my thoughts could take a long time, but I thought at least I should correct some errors, specifically category errors.  Figuring out where all the pieces in Setuptools (and pip and virtualenv) [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><div class="document">
<p><a class="reference external" href="http://www.b-list.org/weblog/2008/dec/14/packaging/">James Bennett recently wrote an article on Python packaging and installation</a>, and Setuptools.  There’s a lot of issues, and writing up my thoughts could take a long time, but I thought at least I should correct some errors, specifically category errors.  Figuring out where all the pieces in Setuptools (and pip and virtualenv) fit <em>is</em> difficult, so I don’t blame James for making some mistakes, but in the interest of clarifying the discussion…</p>
<p>I will start with a kind of glossary:</p>
<dl class="docutils">
<dt>Distribution:</dt>
<dd>This is something-with-a-setup.py.  A tarball, zip, a checkout, etc.  Distributions have names; this is the name in <tt class="docutils literal"><span class="pre">setup(name="…")</span></tt> in the setup.py file.  They have some other metadata too (description, version, etc), and Setuptools adds some of its own to that metadata.  Distutils doesn’t make it very easy to add to the metadata: it’ll whine a little about things it doesn’t know, but won’t do anything with that extra data.  Fixing this problem in Distutils is an important aspect of Setuptools, and part of what makes Distutils itself unsuitable as a basis for good library management.</dd>
<dt>package/module:</dt>
<dd>This is something you import.  It is not the same as a distribution, though usually a distribution will have the same name as a package.  In my own libraries I try to name the distribution with mixed case (like Paste) and the package with lower case (like paste).  Keeping the terminology straight here is <em>very</em> difficult; and usually it doesn’t matter, but sometimes it does.</dd>
<dt>Setuptools The Distribution:</dt>
<dd>This is what you install when you install Setuptools.  It includes several pieces that Phillip Eby wrote, that work together but are not strictly a single thing.</dd>
<dt>setuptools The Package:</dt>
<dd>This is what you get when you do <tt class="docutils literal"><span class="pre">import</span> <span class="pre">setuptools</span></tt>.  Setuptools largely works by monkeypatching distutils, so simply importing setuptools activates its functionality from then on.  This package is entirely focused on installation and package management, it is not something you should use at runtime (unless you are installing packages as your runtime, of course).</dd>
<dt>pkg_resources The Module:</dt>
<dd>This is also included in Setuptools The Distribution, and is for use at runtime.  This is a single module that provides the ability to query what distributions are installed, metadata about those distributions, and information about the location where they are installed.  It also allows distributions to be "activated".  A <em>distribution</em> can be available but not activated.  Activating a distribution means adding its location to <tt class="docutils literal"><span class="pre">sys.path</span></tt>, and probably you’ve noticed how long sys.path is when you use easy_install.  Almost everything that allows different libraries to be installed, or allows different versions of libraries, does it through some management of sys.path.  pkg_resources also allows for generic access to "resources" (i.e., non-code files), and lets those resources be in zip files.  pkg_resources is safe to use; it doesn’t do any of the funny stuff that people get annoyed with.</dd>
<dt>easy_install:</dt>
<dd>This is also in Setuptools The Distribution.  The basic functionality it provides is that, given a name, it can search for a distribution with that name that also satisfies a version requirement.  It then downloads the distribution and installs it (using <tt class="docutils literal"><span class="pre">setup.py</span> <span class="pre">install</span></tt>, but with the setuptools monkeypatches in place).  After that, it checks the newly installed distribution to see if it requires any other libraries that aren’t yet installed, and if so it installs them.</dd>
<dt>Eggs the Distribution Format:</dt>
<dd>These are zip files that Setuptools creates when you run <tt class="docutils literal"><span class="pre">python</span> <span class="pre">setup.py</span> <span class="pre">bdist_egg</span></tt>.  Unlike a tarball, these can be binary packages, containing compiled modules, and generally contain .pyc files (which are portable across platforms, but not Python versions).  This format only includes files that will actually be installed; as a result it does not include doc files or <tt class="docutils literal"><span class="pre">setup.py</span></tt> itself.  All the metadata from <tt class="docutils literal"><span class="pre">setup.py</span></tt> that is needed for installation is put in files in a directory <tt class="docutils literal"><span class="pre">EGG-INFO</span></tt>.</dd>
<dt>Eggs the Installation Format:</dt>
<dd>Eggs the Distribution Format are a subset of the Installation Format.  That is, if you put an Egg zip file on the path, it is installed, no other process is necessary.  But the Installation Format is more general.  To have an egg installed, you either need something like <tt class="docutils literal"><span class="pre">DistroName-X.Y.egg/</span></tt> on the path, and then an <tt class="docutils literal"><span class="pre">EGG-INFO/</span></tt> directory under that with the metadata, or a path like <tt class="docutils literal"><span class="pre">DistroName.egg-info/</span></tt> with the metadata directly in that directory.  This metadata can exist anywhere, and doesn’t have to be directly alongside the actual Python code.  Egg directories are required for pkg_resources to activate and deactivate distributions, but otherwise they aren’t necessary.</dd>
<dt>pip:</dt>
<dd>This is an alternative to easy_install.  It works <em>somewhat</em> differently than easy_install, but not much.  Mostly it is <em>better</em> than easy_install, in that it has some extra features and is easier to use.  Unlike easy_install, it downloads all distributions up-front, and generates the metadata to read distribution and version requirements.  It uses Setuptools to generate this metadata from a setup.py file, and uses pkg_resources to parse this metadata.  It then installs packages <em>with the setuptools monkeypatches applied</em>.  It just happens to use the option <tt class="docutils literal"><span class="pre">python</span> <span class="pre">setup.py</span> <span class="pre">install</span> <span class="pre">--single-version-externally-managed</span></tt>, which gets Setuptools to install packages in a more flat manner, with <tt class="docutils literal"><span class="pre">Distro.egg-info/</span></tt> directories alongside the package.  Pip installs eggs!  I’ve heard the many complaints about easy_install (and I’ve had many myself), but ultimately I think pip does well by just fixing a few small issues.  Pip is <em>not</em> a repudiation of Setuptools or the basic mechanisms that easy_install uses.</dd>
<dt>PoachEggs:</dt>
<dd>This is a defunct package that had some of the features of pip (particularly requirement files) but used easy_install for installation.  Don’t bother with this, it was just a bridge to get to pip.</dd>
<dt>virtualenv:</dt>
<dd>This is a little hack that creates isolated Python environments.  It’s based on <tt class="docutils literal"><span class="pre">virtual-python.py</span></tt>, which is something I wrote based on some documentation notes PJE wrote for Setuptools.  Basically virtualenv just creates a <tt class="docutils literal"><span class="pre">bin/python</span></tt> interpreter that has its own value of <tt class="docutils literal"><span class="pre">sys.prefix</span></tt>, but uses the system Python and standard library.  It also installs Setuptools to make it easier to bootstrap the environment (because bootstrapping Setuptools is itself a bit tedious).  I’ll add pip to it too sometime.  Using virtualenv you don’t have to worry about different library versions, because for any one environment you will probably only need one version of a library.  On any one <em>machine</em> you probably need different versions, which is why installing packages system-wide is problematic for most libraries.  (I’ve been meaning to write a post on why I think using system packaging for libraries is counter-productive, but that’ll wait for another time.)</dd>
</dl>
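<p>To make the "available but not activated" idea concrete, here is a small standard-library sketch.  It does not use pkg_resources at all; it only shows the underlying mechanism described above, where a location has to be on <tt class="docutils literal"><span class="pre">sys.path</span></tt> before its code can be imported.  The module name <tt class="docutils literal"><span class="pre">demo_activated_mod</span></tt> and the temporary directory are made up for the example:</p>
<pre class="literal-block">import importlib
import os
import sys
import tempfile

# A throwaway "distribution location" holding a single module.
location = tempfile.mkdtemp()
with open(os.path.join(location, 'demo_activated_mod.py'), 'w') as f:
    f.write('VALUE = 42\n')


def can_import(name):
    """Return True if `name` can be imported right now."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


before = can_import('demo_activated_mod')  # on disk, but not on sys.path
sys.path.insert(0, location)               # "activate" the location
importlib.invalidate_caches()
after = can_import('demo_activated_mod')

print(before, after)  # prints: False True
</pre>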
<hr class="docutils"/>
<p>So… there’s the pieces involved, at least the ones I can remember now.  And I haven’t really discussed .pth files, entry points, sys.path trickery, site.py, distutils.cfg… sadly this is a complex state of affairs, but it was also complex before Setuptools.</p>
<p>There are a few things that I think people really dislike about Setuptools.</p>
<p>First, zip files.  Setuptools prefers zip files, for reasons that won’t mean much to you, and maybe are more historical than anything.  When a distribution doesn’t indicate if it is zip-safe, Setuptools looks at the code and sees if it uses <tt class="docutils literal"><span class="pre">__file__</span></tt>, and if not it presumes that the code is probably zip-safe.  The specific problem James cites is what appears to be a bug in Django, that Django looks for code and can’t traverse into zip files in the same way that Python itself can.  Setuptools didn’t itself add anything to Python to make it import zip files, that functionality was added to Python some time before.  The zipped eggs that Setuptools installs are using existing (standard!) Python functionality.</p>
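<p>A quick way to see both halves of this, using only the standard library: Python imports a module straight out of a zip file on <tt class="docutils literal"><span class="pre">sys.path</span></tt>, but the module’s <tt class="docutils literal"><span class="pre">__file__</span></tt> then names a path inside the archive, which is why code that treats <tt class="docutils literal"><span class="pre">__file__</span></tt> as a real file is not zip-safe.  This is only a sketch; the archive and module names are invented for the example:</p>
<pre class="literal-block">import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip file containing one module.
archive = os.path.join(tempfile.mkdtemp(), 'demo_bundle.zip')
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('demo_zipped_mod.py', 'GREETING = "hello from a zip"\n')

sys.path.insert(0, archive)  # a zip file can be a sys.path entry
importlib.invalidate_caches()
mod = importlib.import_module('demo_zipped_mod')

print(mod.GREETING)  # prints: hello from a zip
# __file__ points inside the archive, so it is not a file on disk:
print(os.path.exists(mod.__file__))  # prints: False
</pre>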
<p>That said, I don’t think zipping libraries up is all that useful, and while it <em>should</em> work, it doesn’t always, and it makes code harder to inspect and understand.  So since it’s not that useful, I’ve disabled it when pip installs packages.  I also have had it disabled on my own system for years now, by creating a <tt class="docutils literal"><span class="pre">distutils.cfg</span></tt> file with <tt class="docutils literal"><span class="pre">[easy_install]</span> <span class="pre">zip_ok</span> <span class="pre">=</span> <span class="pre">False</span></tt> in it.  Sadly App Engine is forcing me to use zip files again, because of its absurdly small file limits… but that’s a different topic.  (There is an experimental <tt class="docutils literal"><span class="pre">pip</span> <span class="pre">zip</span></tt> command mostly intended for App Engine.)</p>
<p>Another pain point is version management with <tt class="docutils literal"><span class="pre">setup.py</span></tt> and Setuptools.  Indeed it is easy to get things messed up, and it is easy to piss people off by overspecifying, and sometimes things can get in a weird state for no good reason (often because of easy_install’s rather naive leap-before-you-look installation order).  Pip fixes that last point, but it also tries to suggest more constructive and less painful ways to manage other pieces.</p>
<p>Pip requirement files are an assertion of <strong>versions that work together</strong>.  setup.py requirements (the Setuptools requirements) should contain two things: <strong>1</strong>: all the libraries used by the distribution (without which there’s no way it’ll work) and <strong>2</strong>: exclusions of the versions of those libraries that are <strong>known not to work</strong>.  setup.py requirements should not be viewed as an assertion that by satisfying those requirements everything <em>will</em> work, just that it <em>might</em> work.  Only the end developer, testing the system together, can figure out if it really works.  Then pip gives you a way to record that working set (using <a class="reference external" href="http://pip.openplans.org/#freezing-requirements">pip freeze</a>), separate from any single distribution or library.</p>
<p>There’s also a lot of conflicts between Setuptools and package maintainers.  This is kind of a proxy war between developers and sysadmins, who have very different motivations.  It deserves a post of its own, but the conflicts are about more than just how Setuptools is implemented.</p>
<p>I’d love if there was a language-neutral library installation and management tool that really worked.  Linux system package managers are absolutely not that tool; frankly it is absurd to even consider them as an alternative.  So for now we do our best in our respective language communities.  If we’re going to move forward, we’ll have to acknowledge what’s come before, and the reasoning for it.</p>
</div></div>
    </content>
    <updated>2008-12-14T21:53:25Z</updated>
    <published>2008-12-14T21:53:25Z</published>
    <category scheme="http://blog.ianbicking.org" term="Packaging"/>
    <category scheme="http://blog.ianbicking.org" term="Python"/>
    <author>
      <name>Ian Bicking</name>
      <uri>http://blog.ianbicking.org</uri>
    </author>
    <source>
      <id>http://blog.ianbicking.org/feed/atom/</id>
      <link href="http://blog.ianbicking.org" rel="alternate" type="text/html"/>
      <link href="http://blog.ianbicking.org/feed/atom/" rel="self" type="application/atom+xml"/>
      <title xml:lang="en">Ian Bicking: a blog</title>
      <updated>2008-12-27T19:09:18Z</updated>
    </source>
  </entry>

  <entry>
    <id>http://plope.com/bounty_solved</id>
    <link href="http://plope.com/bounty_solved" rel="alternate" type="text/html"/>
    <title>Bounty Solved</title>
    <summary>Brandon Craig Rhodes is the winner of the bounty for figuring out the emacs-flymake + pyflakes issue.</summary>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Brandon spake verily:
</p><pre> I have solved our problem with triple-quotes.

 The problem is that Emacs expects PyFlakes to only output error
 messages, but on a syntax error, PyFlakes prints out an error message,
 then the *entire* contents of the module that it cannot import, and
 *finally* a line that contains a number of spaces equal to the offset
 into the file of the syntax error (in the case of my real-world file,
 the triple-quote was 3,896 characters into the file, so PyFlake's line 
 of spaces was that long as well).  The offending code is in the
 "pyflakes" command-line program and looks like this:

    print &gt;&gt; sys.stderr, 'could not compile %r:%d:' % (filename, lineno)
    print &gt;&gt; sys.stderr, line
    print &gt;&gt; sys.stderr, " " * (offset-2), "^"

 By removing or commenting out those last two lines, so that PyFlakes 
 only outputs its error message, you will stop flooding the Emacs
 regular-expression engine with data.  It's actually the long line of
 spaces that causes the problem, and it's one regular expression that's
 really sensitive to it (this is from the Emacs 22 flymake.el):

     ;; ant/javac
     (" *\\(\\[javac\\] *\\)?\\(\\([a-zA-Z]:\\)?[^:(\t\n]+\\)\:\\([0-9]+\\)\:[ \t\n]*\\(.+\\)"
      2 4 nil 5))

 For some reason (I'm not an RE engine guru), something about the way
 it's matching spaces takes exponential time when given several thousand
 spaces.  Go figure.  Anyway, I see no point in throwing anything but 
 error messages at Flymake, so I comment out both the printing of the
 module and the spaces from PyFlakes.
</pre>
<p/>
<p>And sure nuff, that solves it.  Brandon gets a well-deserved $500.00.</p></div>
    </content>
    <updated>2008-12-14T09:01:49Z</updated>
    <category term="python"/>
    <category term="tech"/>
    <category term="zope"/>
    <author>
      <name>chrism</name>
    </author>
    <source>
      <id>http://plope.com</id>
      <link href="http://plope.com/python.rss" rel="self" type="application/atom+xml"/>
      <link href="http://plope.com" rel="alternate" type="text/html"/>
      <subtitle>Chris McDonough's Python Feed</subtitle>
      <title>Chris McDonough's Python Feed</title>
      <updated>2008-12-30T09:44:16Z</updated>
    </source>
  </entry>

  <entry>
    <id>http://plope.com/pyflakes_flymake_bounty</id>
    <link href="http://plope.com/pyflakes_flymake_bounty" rel="alternate" type="text/html"/>
    <title>Bounty: Make Emacs Flymake + PyFlakes Setup Not Spin On Triple-Quote Entry</title>
    <summary>I will pay anyone who can fix the integration between Pyflakes and Emacs flymake-mode so that it doesn't "spin" when you type a single set of triple-quotes (""").</summary>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Several months ago, I created a <a href="http://plope.com/Members/chrism/flymake-mode">blog entry</a> where I documented an integration of <a href="http://divmod.org/trac/wiki/DivmodPyflakes">PyFlakes</a> with Emacs <a href="http://flymake.sourceforge.net/">flymake-mode</a> that is insanely useful when writing Python code.</p>
<p>This setup has been working fantastically, except for one thing.  When you type a single set of triple-quotes ( <code> """ </code> ), and stop typing, the Emacs process will begin to "spin" and CPU usage hits 100%.  If you type the ending set of triple-quotes, although they don't immediately appear on the screen, Emacs will return editing control to you perhaps several minutes later with your keystrokes evident; the process continues to spin during the waiting period.  If you never type the ending triple-quotes, the process will spin "forever".</p>
<p>I have worked around this so far: every time I need some triple-quoted string, I type two sets of triple quotes very quickly ( <code> """ """ </code>), which presumably prevents the flymake-mode syntax checker from kicking in.  Then I just back up and insert the text I want.  However, sometimes I forget to type the ending triple-quotes, and it sucks to wait for the process to stop tripping over itself and it sucks worse to have to kill it and start it again.  It kills momentum during coding something awful.</p>
<p>I am not interested in debugging this problem personally (I have no experience writing Emacs Lisp).  But I'd really, really like to get it fixed.  Therefore, I hereby offer a $500.00 bounty to anyone who has the know-how to reproduce and remediate this problem (I use Carbon Emacs on Mac OS X, although I believe the problem is not OS-platform-specific; it will probably be evident in any version of GNU Emacs).  I will pay the first person that comes up with a solution (via PayPal or check or however it works for you, really) after you've demonstrated that the problem is solved and provided instructions on how others can fix it themselves (by applying a patch, or whatever needs be done).  Send me a mail (chrism@plope.com) if you've decided to work on it.</p>
<p>If you take a look at the problem, and you think it's going to take longer than $500.00 is worth, let me know, and I'll see if I can't find some other folks to cough up some more dough.</p></div>
    </content>
    <updated>2008-12-13T18:11:53Z</updated>
    <category term="python"/>
    <category term="tech"/>
    <category term="zope"/>
    <author>
      <name>chrism</name>
    </author>
    <source>
      <id>http://plope.com</id>
      <link href="http://plope.com/python.rss" rel="self" type="application/atom+xml"/>
      <link href="http://plope.com" rel="alternate" type="text/html"/>
      <subtitle>Chris McDonough's Python Feed</subtitle>
      <title>Chris McDonough's Python Feed</title>
      <updated>2008-12-30T09:44:16Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/</id>
    <link href="http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/" rel="alternate" type="text/html"/>
    <title xml:lang="en">lxml: an underappreciated web scraping library</title>
    <summary xml:lang="en">When people think about web scraping in Python, they usually think BeautifulSoup.  That’s okay, but I would encourage you to also consider lxml.
First, people think BeautifulSoup is better at parsing broken HTML.  This is not correct.  lxml parses broken HTML quite nicely.  I haven’t done any thorough testing, but at least [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><div class="document">
<p>When people think about web scraping in Python, they usually think <a class="reference external" href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>.  That’s okay, but I would encourage you to also consider <a class="reference external" href="http://codespeak.net/lxml/">lxml</a>.</p>
<p>First, people think BeautifulSoup is better at parsing broken HTML.  <strong>This is not correct.</strong>  lxml parses broken HTML quite nicely.  I haven’t done any thorough testing, but at least <a class="reference external" href="http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing%20HTML">the BeautifulSoup broken HTML example</a> is parsed better by lxml (which knows that <tt class="docutils literal"><span class="pre">&lt;td&gt;</span></tt> elements should go inside <tt class="docutils literal"><span class="pre">&lt;table&gt;</span></tt> elements).</p>
<p>Second, people feel lxml is harder to install.  This is correct.  <strong>BUT</strong>, lxml 2.2alpha1 includes an option to compile static versions of the underlying C libraries, which should improve the installation experience, especially on Macs.  To install this new way, try:</p>
<pre class="literal-block">$ STATIC_DEPS=true easy_install 'lxml&gt;=2.2alpha1'
</pre>
<p>Once you have lxml installed, you have a great parser (which happens to be <a class="reference external" href="http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/">super-fast</a> and that is <strong>not a tradeoff</strong>).  You get a fairly familiar API based on <a class="reference external" href="http://docs.python.org/library/xml.etree.elementtree.html#module-xml.etree.ElementTree">ElementTree</a>, which though a little strange feeling at first, offers a compact and canonical representation of a document tree, compared to more traditional representations.  But there’s more…</p>
<p>One of the features that should be appealing to many people doing screen scraping is that you get CSS selectors.  You can use XPath as well, but usually that’s more complicated (<a class="reference external" href="http://css2xpath.appspot.com/?css=div.pad%20a&amp;format=html">for example</a>).  Here’s <a class="reference external" href="http://crowtheries.net/?p=60">an example I found</a> getting links from a menu in a page in BeautifulSoup:</p>
<pre class="literal-block">from BeautifulSoup import BeautifulSoup
import urllib2
soup = BeautifulSoup(urllib2.urlopen('http://java.sun.com').read())
menu = soup.findAll('div',attrs={'class':'pad'})
for subMenu in menu:
    links = subMenu.findAll('a')
    for link in links:
        print "%s : %s" % (link.string, link['href'])
</pre>
<p>Here’s the same example in lxml:</p>
<pre class="literal-block">from lxml.html import parse
doc = parse('http://java.sun.com').getroot()
for link in doc.cssselect('div.pad a'):
    print '%s: %s' % (link.text_content(), link.get('href'))
</pre>
<p>lxml generally knows more about HTML than BeautifulSoup.  Also I think it does well with the small details; for instance, the lxml example will match elements in <tt class="docutils literal"><span class="pre">&lt;div</span> <span class="pre">class="pad</span> <span class="pre">menu"&gt;</span></tt> (space-separated classes), which the BeautifulSoup example does not do (obviously there are other ways to search, but <a class="reference external" href="http://www.crummy.com/software/BeautifulSoup/documentation.html#Searching%20by%20CSS%20class">the obvious and documented technique</a> doesn’t pay attention to HTML semantics).</p>
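<p>The difference comes down to how the <tt class="docutils literal"><span class="pre">class</span></tt> attribute is compared.  The following is a plain-Python sketch of the two behaviours, not the code either library actually runs; both helper functions are hypothetical:</p>
<pre class="literal-block">def exact_class_match(class_attr, wanted):
    # Compares the whole attribute string, like a plain attribute search.
    return class_attr == wanted


def css_class_match(class_attr, wanted):
    # Treats the attribute as a whitespace-separated list of class names,
    # which is what the CSS selector "div.pad" means.
    return wanted in class_attr.split()


for value in ['pad', 'pad menu', 'padded']:
    print(repr(value),
          exact_class_match(value, 'pad'),
          css_class_match(value, 'pad'))
</pre>
<p>Only the second function accepts <tt class="docutils literal"><span class="pre">"pad</span> <span class="pre">menu"</span></tt>, and both reject <tt class="docutils literal"><span class="pre">"padded"</span></tt>.</p>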
<p>One feature that I think is really useful is <tt class="docutils literal"><span class="pre">.make_links_absolute()</span></tt>.  This takes the base URL of the page (<tt class="docutils literal"><span class="pre">doc.base</span></tt>) and uses it to make all the links absolute.  This makes it possible to relocate snippets of HTML or whole sets of documents (as with <a class="reference external" href="http://svn.colorstudy.com/home/ianb/PageCollector/trunk">this program</a>).  This isn’t just <tt class="docutils literal"><span class="pre">&lt;a</span> <span class="pre">href&gt;</span></tt> links, but stylesheets, inline CSS with <tt class="docutils literal"><span class="pre">@import</span></tt> statements, <tt class="docutils literal"><span class="pre">background</span></tt> attributes, etc.  It doesn’t see quite <em>all</em> links (for instance, links in Javascript) but it sees most of them, and works well for most sites.  So if you want to make a local copy of a site:</p>
<pre class="literal-block">from lxml.html import parse, open_in_browser
doc = parse('http://wiki.python.org/moin/').getroot()
doc.make_links_absolute()
open_in_browser(doc)
</pre>
<p><tt class="docutils literal"><span class="pre">open_in_browser</span></tt> serializes the document to a temporary file and then opens a web browser (using <a class="reference external" href="http://docs.python.org/library/webbrowser.html">webbrowser</a>).</p>
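<p>For a single link, the rewriting that <tt class="docutils literal"><span class="pre">.make_links_absolute()</span></tt> performs can be approximated with the standard library’s <tt class="docutils literal"><span class="pre">urljoin</span></tt>.  This is only a sketch of the idea (lxml also has to find the links in the first place); the <tt class="docutils literal"><span class="pre">absolutize</span></tt> helper and the example paths are made up:</p>
<pre class="literal-block">from urllib.parse import urljoin


def absolutize(base, hrefs):
    # Resolve each (possibly relative) href against the page's base URL.
    return [urljoin(base, href) for href in hrefs]


base = 'http://wiki.python.org/moin/'  # plays the role of doc.base
hrefs = ['FrontPage', '/static/site.css', '../about.html',
         'http://example.com/other']
for url in absolutize(base, hrefs):
    print(url)
</pre>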
<p>Here’s <a class="reference external" href="http://svn.colorstudy.com/home/ianb/recipes/lxmldiff.py">an example</a> that compares two pages using <tt class="docutils literal"><span class="pre">lxml.html.diff</span></tt>:</p>
<pre class="literal-block">from lxml.html.diff import htmldiff
from lxml.html import parse, tostring, open_in_browser, fromstring

def get_page(url):
    doc = parse(url).getroot()
    doc.make_links_absolute()
    return tostring(doc)

def compare_pages(url1, url2, selector='body div'):
    basis = parse(url1).getroot()
    basis.make_links_absolute()
    other = parse(url2).getroot()
    other.make_links_absolute()
    el1 = basis.cssselect(selector)[0]
    el2 = other.cssselect(selector)[0]
    diff_content = htmldiff(tostring(el1), tostring(el2))
    diff_el = fromstring(diff_content)
    el1.getparent().insert(el1.getparent().index(el1), diff_el)
    el1.getparent().remove(el1)
    return basis

if __name__ == '__main__':
    import sys
    doc = compare_pages(sys.argv[1], sys.argv[2], sys.argv[3])
    open_in_browser(doc)
</pre>
<p>You can use it like:</p>
<pre class="literal-block">$ python lxmldiff.py \
&gt;    'http://wiki.python.org/moin/BeginnersGuide?action=recall&amp;rev=70' \
&gt;    'http://wiki.python.org/moin/BeginnersGuide?action=recall&amp;rev=81' \
&gt;    'div#content'
</pre>
<p>Another feature lxml has is form handling.  All the cool sexy new sites use minimal forms, but searching for "registration forms" I get <a class="reference external" href="http://www.actuaryjobs.com/cform.html">this nice complex form</a>.  Let’s look at it:</p>
<pre class="literal-block">&gt;&gt;&gt; from lxml.html import parse, tostring
&gt;&gt;&gt; doc = parse('http://www.actuaryjobs.com/cform.html').getroot()
&gt;&gt;&gt; doc.forms
[&lt;Element form at -48232164&gt;]
&gt;&gt;&gt; form = doc.forms[0]
&gt;&gt;&gt; form.inputs.keys()
['thank_you_title', 'City', 'Zip', ... ]
</pre>
<p>Now we have a form object.  There’s two ways to get to the fields: <tt class="docutils literal"><span class="pre">form.inputs</span></tt>, which gives us a dictionary of all the actual <tt class="docutils literal"><span class="pre">&lt;input&gt;</span></tt> elements (and textarea and select).  There’s also <tt class="docutils literal"><span class="pre">form.fields</span></tt>, which is a dictionary-like object.  The dictionary-like object is convenient, for instance:</p>
<pre class="literal-block">&gt;&gt;&gt; form.fields['cEmail'] = 'me@example.com'
</pre>
<p>This actually updates the input element itself:</p>
<pre class="literal-block">&gt;&gt;&gt; tostring(form.inputs['cEmail'])
'&lt;input type="input" name="cEmail" size="30" value="me@example.com"&gt;'
</pre>
<p>I think it’s actually a nicer API than <a class="reference external" href="http://formencode.org/htmlfill.html">htmlfill</a> and can serve the same purpose on the server side.</p>
<p>But then you can also use the same interface for scraping, by filling fields and getting the submission.  That looks like:</p>
<pre class="literal-block">&gt;&gt;&gt; import urllib
&gt;&gt;&gt; action = form.action
&gt;&gt;&gt; data = urllib.urlencode(form.form_values())
&gt;&gt;&gt; if form.method == 'GET':
...     if '?' in action:
...         action += '&amp;' + data
...     else:
...         action += '?' + data
...     data = None
&gt;&gt;&gt; resp = urllib.urlopen(action, data)
&gt;&gt;&gt; resp_doc = parse(resp).getroot()
</pre>
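<p>The GET/POST branching above can also be wrapped in a small function.  Here is a sketch in Python 3 using only the standard library; <tt class="docutils literal"><span class="pre">build_request</span></tt> is a hypothetical helper, the example URL is invented, and <tt class="docutils literal"><span class="pre">values</span></tt> is assumed to be a list of (name, value) pairs like the one <tt class="docutils literal"><span class="pre">form.form_values()</span></tt> returns.  Rather than checking for <tt class="docutils literal"><span class="pre">?</span></tt> by hand, it merges the form data into any query string the action already has:</p>
<pre class="literal-block">from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_request(action, method, values):
    """Return (url, body) for submitting `values` to `action`."""
    if method.upper() == 'GET':
        # Append the form data to whatever query string is already there.
        parts = urlsplit(action)
        query = parse_qsl(parts.query, keep_blank_values=True) + list(values)
        return urlunsplit(parts._replace(query=urlencode(query))), None
    # Anything else is sent as a url-encoded request body.
    return action, urlencode(values).encode('ascii')


url, body = build_request('http://example.com/search?lang=en', 'GET',
                          [('q', 'lxml html')])
print(url)
print(body)
</pre>
<p>The returned pair can be passed to <tt class="docutils literal"><span class="pre">urllib.request.urlopen(url,</span> <span class="pre">body)</span></tt>.</p>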
<p>Lastly, there’s <a class="reference external" href="http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html">HTML cleaning</a>.  I think all these features work together well, do useful things, and it’s based on an actual understanding of HTML instead of just treating tags and attributes as arbitrary.  (Also if you really like jQuery, you might want to look at <a class="reference external" href="http://pypi.python.org/pypi/pyquery">pyquery</a>, which is a jQuery-like API on top of lxml).</p>
</div></div>
    </content>
    <updated>2008-12-11T03:19:44Z</updated>
    <published>2008-12-11T03:19:44Z</published>
    <category scheme="http://blog.ianbicking.org" term="HTML"/>
    <category scheme="http://blog.ianbicking.org" term="Python"/>
    <category scheme="http://blog.ianbicking.org" term="Programming"/>
    <author>
      <name>Ian Bicking</name>
      <uri>http://blog.ianbicking.org</uri>
    </author>
    <source>
      <id>http://blog.ianbicking.org/feed/atom/</id>
      <link href="http://blog.ianbicking.org" rel="alternate" type="text/html"/>
      <link href="http://blog.ianbicking.org/feed/atom/" rel="self" type="application/atom+xml"/>
      <title xml:lang="en">Ian Bicking: a blog</title>
      <updated>2008-12-27T19:09:18Z</updated>
    </source>
  </entry>

  <entry xml:lang="en-US">
    <id>tag:www.groovie.org,:Article/775</id>
    <link href="http://groovie.org/2008/12/08/public-launch-of-stanford-intellectual-property-litigation-clearinghouse" rel="alternate" type="text/html"/>
    <title xml:lang="en-US">Public launch of Stanford Intellectual Property Litigation Clearinghouse</title>
    <summary type="xhtml" xml:lang="en-US"><div xmlns="http://www.w3.org/1999/xhtml"><p>At long last, we’ve finally launched the <a href="http://lexmachina.stanford.edu/">Stanford <span class="caps">IPLC</span> website</a> that I’ve been working on for the past year. It’s quite nice to finally have something out there that I can show people, though I know it’s definitely more of a niche area of interest, as probably not everyone is as interested in intellectual property litigation as I am. :)</p>


	<p>This site is running <a href="http://pylonshq.com/">Pylons</a> of course, with various other technologies I’m unable to disclose powering the back-end.</p>


	<p>Note that signing up <strong>requires a valid e-mail address</strong> as e-mail confirmations are sent out to them. For those that are keen to keep up on what’s going on with patent litigation, hopefully our website can help out.</p></div>
    </summary>
    <content type="xhtml" xml:lang="en-US"><div xmlns="http://www.w3.org/1999/xhtml"><p>At long last, we’ve finally launched the <a href="http://lexmachina.stanford.edu/">Stanford <span class="caps">IPLC</span> website</a> that I’ve been working on for the past year. It’s quite nice to finally have something out there that I can show people, though I know it’s definitely more of a niche area of interest, as not everyone is probably as interested in intellectual property litigation as I am. :)</p>


	<p>This site is running <a href="http://pylonshq.com/">Pylons</a> of course, with various other technologies I’m unable to disclose powering the back-end.</p>


	<p>Note that signing up <strong>requires a valid e-mail address</strong> as e-mail confirmations are sent out to them. For those that are keen to keep up on what’s going on with patent litigation, hopefully our website can help out.</p></div>
    </content>
    <updated>2008-12-09T01:04:48Z</updated>
    <published>2008-12-09T01:04:46Z</published>
    <category label="Pylons" scheme="http://groovie.org/category/pylons" term="pylons"/>
    <category label="Python" scheme="http://groovie.org/category/python" term="python"/>
    <author>
      <name>ben</name>
    </author>
    <source>
      <id>tag:www.groovie.org,:/articles</id>
      <link href="http://www.groovie.org" rel="alternate" type="text/html"/>
      <link href="http://www.groovie.org/articles.atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en-US">Code, Thoughts, and Misc Debris</subtitle>
      <title xml:lang="en-US">Groovie :</title>
      <updated>2008-12-09T01:04:48Z</updated>
    </source>
  </entry>

  <entry>
    <id>http://plope.com/Members/chrism/note_to_bloggers</id>
    <link href="http://plope.com/Members/chrism/note_to_bloggers" rel="alternate" type="text/html"/>
    <title>Note To Bloggers</title>
    <summary>A note to bloggers on highly technical topics:  don't assume I have any idea what you're talking about.</summary>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Nope, I don't remember you releasing that software last week.  That means it's useless to tell me that some bugs were found and fixed in it this week without a brief re-explanation of what the software actually does.  Also, I haven't been following along closely, which means I have no idea what you're referring to when you talk about your software by name without explaining a little bit about what it does first.  Note that it's also useless to start a blog entry with "the next thing I'm going to cover is..."; I don't remember the first thing you did, and since you didn't provide a link to the old thing, I'm already lost.  Be kind: think about your audience a bit.
</p></div>
    </content>
    <updated>2008-12-08T03:59:27Z</updated>
    <category term="python"/>
    <category term="tech"/>
    <category term="zope"/>
    <author>
      <name>chrism</name>
    </author>
    <source>
      <id>http://plope.com</id>
      <link href="http://plope.com/python.rss" rel="self" type="application/atom+xml"/>
      <link href="http://plope.com" rel="alternate" type="text/html"/>
      <subtitle>Chris McDonough's Python Feed</subtitle>
      <title>Chris McDonough's Python Feed</title>
      <updated>2008-12-30T09:44:16Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://blog.ianbicking.org/2008/12/03/the-magic-sentinel/</id>
    <link href="http://blog.ianbicking.org/2008/12/03/the-magic-sentinel/" rel="alternate" type="text/html"/>
    <title xml:lang="en">The Magic Sentinel</title>
    <summary xml:lang="en">In an effort to get back on the blogging saddle, here’s a little note on default values in Python.
In Python there are often default values.  The most typical default value is None — None is an object of vague meaning that almost screams "I’m a default".  But sometimes None is a valid value, [...]</summary>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><div class="document">
<p>In an effort to get back on the blogging saddle, here’s a little note on default values in Python.</p>
<p>In Python there are often default values.  The most typical default value is None — None is an object of vague meaning that almost screams "I’m a default".  But sometimes None is a valid value, and sometimes you want to detect the case of "no value given" and None can hardly be called <em>no value</em>.</p>
<p>Here’s an example:</p>
<pre class="literal-block">def getuser(username, default=None):
    if not user_exists(username):
        return default
    ...
</pre>
<p>In this case there is <em>always</em> a default, and so anytime you call <tt class="docutils literal"><span class="pre">getuser()</span></tt> you have to check for a None result.  But maybe you have code where you’d really just like to get an exception if the user isn’t found.  To get this you can use a <a class="reference external" href="http://en.wikipedia.org/wiki/Sentinel_(computer_science)">sentinel</a>.  A sentinel is an object that has no particular meaning except to signal the end (like a NULL byte in a C string), or a special condition (like no default user).</p>
<p>Sometimes people do it like this:</p>
<pre class="literal-block">_no_default = ()
def getuser(username, default=_no_default):
    if not user_exists(username):
        if default is _no_default:
            raise LookupError("No user with the username %r" % username)
        return default
    ...
</pre>
<p>This works because that zero-item tuple <tt class="docutils literal"><span class="pre">()</span></tt> is a unique object, and since we are using the comparison <tt class="docutils literal"><span class="pre">default</span> <span class="pre">is</span> <span class="pre">_no_default</span></tt> only that <em>exact</em> object will trigger that LookupError.</p>
<p>Once you understand the pattern, this is easy enough to read.  But when you use <tt class="docutils literal"><span class="pre">help()</span></tt> or other automatic generation it is a little confusing, because the default value just appears as <tt class="docutils literal"><span class="pre">()</span></tt>.  You could also use <tt class="docutils literal"><span class="pre">object()</span></tt> or <tt class="docutils literal"><span class="pre">[]</span></tt> or anything else, but the automatically generated documentation still won’t look that nice.  So for a bit more polish I suggest:</p>
<pre class="literal-block">class _NoDefault(object):
    def __repr__(self):
        return '(no default)'
NoDefault = _NoDefault()
del _NoDefault

def getuser(username, default=NoDefault):
    ...
</pre>
<p>You might then think "hey, why isn’t there one NoDefault that everyone can share?"  If you do share that sentinel you run the risk of accidentally passing in that value even though you didn’t intend to.  The value "NoDefault" will become overloaded with meaning, just as None is.  By having a more private sentinel object you avoid that.  A single sentinel factory (like <tt class="docutils literal"><span class="pre">_NoDefault</span></tt> in this example) would be nice, though.  Though really <a class="reference external" href="http://www.python.org/dev/peps/pep-3102/">PEP 3102</a> will probably make sentinels like this unnecessary for Python 3.0.</p>
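<p>A minimal sketch of what such a factory could look like.  The names <tt class="docutils literal"><span class="pre">make_sentinel()</span></tt> and <tt class="docutils literal"><span class="pre">first_item()</span></tt> are made up for this illustration and are not part of the standard library; the idea is only that each call hands back a fresh, private object with a readable repr.</p>
<pre class="literal-block">
```python
def make_sentinel(description):
    """Return a new unique object whose repr is the given description.

    Hypothetical helper for illustration: every call creates a distinct
    sentinel, so separate modules never share one by accident.
    """
    class _Sentinel(object):
        def __repr__(self):
            return description
    return _Sentinel()


NoDefault = make_sentinel('(no default)')


def first_item(items, default=NoDefault):
    # Raise unless the caller explicitly supplied a default.
    if not items:
        if default is NoDefault:
            raise LookupError("No items")
        return default
    return items[0]
```
</pre>
<p>Because the factory returns a different object every time, a module that wants its own sentinel just calls it again rather than importing someone else’s.</p>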
<p>Note that you can also implement arguments with no default via <tt class="docutils literal"><span class="pre">*args</span></tt> and <tt class="docutils literal"><span class="pre">**kwargs</span></tt>, e.g.:</p>
<pre class="literal-block">def getuser(username, *args):
    if not user_exists(username):
        if not args:
            raise LookupError(...)
        else:
            return args[0]
</pre>
<p>But to do this right you should test that <tt class="docutils literal"><span class="pre">len(args)&lt;=1</span></tt>, raise appropriate errors, maybe consider keyword arguments, and so on.  It’s a pain in the butt, and when you’re finished the signature displayed by <tt class="docutils literal"><span class="pre">help()</span></tt> will be wrong anyway.</p>
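<p>To give a feel for how much checking that involves, here is one possible sketch of the fuller version.  The <tt class="docutils literal"><span class="pre">_users</span></tt> dictionary and the <tt class="docutils literal"><span class="pre">user_exists()</span></tt> stub are stand-ins invented so the example runs on its own, and the error messages are only illustrative.</p>
<pre class="literal-block">
```python
_users = {'alice': 'Alice A.'}  # stand-in data for the example


def user_exists(username):
    # Stub for the user_exists() call used in the examples above.
    return username in _users


def getuser(username, *args, **kwargs):
    # Accept the default either positionally or as default=..., never both.
    if len(args) > 1:
        raise TypeError("getuser() takes at most 2 positional arguments")
    if args and 'default' in kwargs:
        raise TypeError("getuser() got multiple values for 'default'")
    unexpected = set(kwargs) - set(['default'])
    if unexpected:
        raise TypeError("getuser() got unexpected keyword arguments: %s"
                        % ', '.join(sorted(unexpected)))
    if user_exists(username):
        return _users[username]
    if args:
        return args[0]
    if 'default' in kwargs:
        return kwargs['default']
    # No default was supplied in either form.
    raise LookupError("No user with the username %r" % username)
```
</pre>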
</div></div>
    </content>
    <updated>2008-12-04T04:59:27Z</updated>
    <published>2008-12-04T04:59:27Z</published>
    <category scheme="http://blog.ianbicking.org" term="Python"/>
    <category scheme="http://blog.ianbicking.org" term="Programming"/>
    <author>
      <name>Ian Bicking</name>
      <uri>http://blog.ianbicking.org</uri>
    </author>
    <source>
      <id>http://blog.ianbicking.org/feed/atom/</id>
      <link href="http://blog.ianbicking.org" rel="alternate" type="text/html"/>
      <link href="http://blog.ianbicking.org/feed/atom/" rel="self" type="application/atom+xml"/>
      <title xml:lang="en">Ian Bicking: a blog</title>
      <updated>2008-12-27T19:09:18Z</updated>
    </source>
  </entry>

  <entry xml:lang="en-US">
    <id>tag:www.groovie.org,:Article/774</id>
    <link href="http://groovie.org/2008/12/03/wsgi-pylons-talk-in-san-francisco-tonight-dec-3rd" rel="alternate" type="text/html"/>
    <title xml:lang="en-US">WSGI/Pylons Talk in San Francisco tonight (Dec 3rd)</title>
    <summary type="xhtml" xml:lang="en-US"><div xmlns="http://www.w3.org/1999/xhtml"><p>I’ll be giving a talk tonight about <span class="caps">WSGI</span>, making apps using it, and Pylons in San Francisco. If you live in the Bay Area and have been wanting to learn more about some of the packages utilizing <span class="caps">WSGI</span>, as well as Pylons, <span class="caps">RSVP</span> soon as they’d like to get some numbers for food.</p>


	<p>Here’s the <a href="http://www.meetup.com/sfpython/calendar/9194676/">meetup event page for the talk</a>. This is one of the longer talks I’ve given, so I should actually have enough time to cover the topics (WSGI, <span class="caps">WSGI</span> Middleware, making low-level <span class="caps">WSGI</span> apps with WebOb, making small <span class="caps">WSGI</span> stacks, and Pylons) in good detail, as well as some demonstrations.</p></div>
    </summary>
    <content type="xhtml" xml:lang="en-US"><div xmlns="http://www.w3.org/1999/xhtml"><p>I’ll be giving a talk tonight about <span class="caps">WSGI</span>, making apps using it, and Pylons in San Francisco. If you live in the Bay Area and have been wanting to learn more about some of the packages utilizing <span class="caps">WSGI</span>, as well as Pylons, <span class="caps">RSVP</span> soon as they’d like to get some numbers for food.</p>


	<p>Here’s the <a href="http://www.meetup.com/sfpython/calendar/9194676/">meetup event page for the talk</a>. This is one of the longer talks I’ve given, so I should actually have enough time to cover the topics (WSGI, <span class="caps">WSGI</span> Middleware, making low-level <span class="caps">WSGI</span> apps with WebOb, making small <span class="caps">WSGI</span> stacks, and Pylons) in good detail, as well as some demonstrations.</p></div>
    </content>
    <updated>2008-12-03T18:03:09Z</updated>
    <published>2008-12-03T18:02:54Z</published>
    <category label="Python" scheme="http://groovie.org/category/python" term="python"/>
    <category label="Pylons" scheme="http://groovie.org/category/pylons" term="pylons"/>
    <author>
      <name>ben</name>
    </author>
    <source>
      <id>tag:www.groovie.org,:/articles</id>
      <link href="http://www.groovie.org" rel="alternate" type="text/html"/>
      <link href="http://www.groovie.org/articles.atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en-US">Code, Thoughts, and Misc Debris</subtitle>
      <title xml:lang="en-US">Groovie :</title>
      <updated>2008-12-09T01:04:48Z</updated>
    </source>
  </entry>

  <entry>
    <id>http://plope.com/Members/chrism/wherefore_couchdb</id>
    <link href="http://plope.com/Members/chrism/wherefore_couchdb" rel="alternate" type="text/html"/>
    <title>Wherefore CouchDB for ZODB Users</title>
    <summary>I took a bit of time to try to understand CouchDB as a longtime ZODB user.</summary>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>After reading Eric Florenzano's pair of posts <a href="http://www.eflorenzano.com/blog/post/why-couchdb-sucks/">Why CouchDB
Sucks</a> and
<a href="http://www.eflorenzano.com/blog/post/why-couchdb-rocks/">Why CouchDB
Rocks</a> and
being generally sick of not understanding why folks were excited about
this system, I put my spelunkin boots on to find out more about
<a href="http://incubator.apache.org/couchdb/">CouchDB</a> .</p>
<p>Before I set about trying to understand it, I didn't understand why
folks were excited about CouchDB given that a good number of its
features (append-only storage and "schemaless design" in particular)
have been present in <a href="http://wiki.zope.org/ZODB/FrontPage">ZODB</a> for a
little under ten years now.  Even more in particular, I was <em>really</em>
baffled as to why <em>Python</em> developers were excited about such a system
given the availability of ZODB.</p>
<p>I think I understand a bit better now.  ZODB and CouchDB are quite
similar in a lot of respects, but CouchDB beats ZODB on a narrow set
of goals that seem to be becoming more important these days.</p>
<p>First of all, availability is everything.  Since the documented way to
talk to CouchDB is over HTTP, and because you send it JSON primitives
containing data structures, it can be used from just about any program
written in just about any language.  And given that most folks are not
accustomed to being able to access any database without dark-magic C
bindings that speak strange connection-oriented TCP protocols (or the
equivalent embedded usually-crashy C bindings and awkward APIs), this
is probably a novelty for many users.  For Zope users, however, it's
not really that nifty, as we've been writing applications that use
HTTP to modify ZODB structures for quite a long time now.</p>
<p>Second of all, Damien Katz, CouchDB's author, used to work on Lotus
Notes.  Though I've never developed under (or even used) Lotus Notes,
I do know that it is widely respected for its replication facilities.
CouchDB has offline replay replication that I imagine smells a lot
like Notes' built-in facilities for the same.  This is a big deal if
you're creating offline applications that need to synchronize to one
or more other databases.  ZODB itself has no such facility.</p>
<p>Third of all (and probably most importantly), CouchDB has a built-in
indexing and querying facility, in the form of
<a href="http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views?action=show&amp;redirect=Views">Views</a>.
This is something that ZODB does not share.  Instead, ZODB relies on
applications that are built on top of it to provide indexing
capabilities.  Moreover, creating indexes in applications that use
ZODB is historically a static kind of thing that the application
developer does "up front", or at least as a "software release" sort of
thing.  In CouchDB, creating an index is not really an exceptional
sort of event.  You tell the server, over HTTP, to create the index by
PUT-ing a view.  The first time any view is queried, CouchDB does the
indexing.  The following times the view is queried, it uses the index
you've created via the view to return the results faster.  No
application that I know of written on top of ZODB allows you to do
such a thing so casually.</p>
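<p>As a toy, pure-Python illustration of that "index on first query, reuse afterwards" behavior (this is not CouchDB's implementation or its HTTP API, where views are normally JavaScript map functions stored in design documents; the <code>LazyView</code> class and the sample documents are made up for the example, and the sketch ignores later changes to the documents):</p>
<pre>
```python
class LazyView(object):
    """Toy view: the index is built on first query, then reused."""

    def __init__(self, docs, map_func):
        self.docs = docs          # list of dict "documents"
        self.map_func = map_func  # yields (key, value) pairs per document
        self.index = None         # not built until someone queries
        self.builds = 0           # counts how often the index was built

    def query(self, key):
        if self.index is None:
            # First query: run the map function over every document.
            self.index = {}
            for doc in self.docs:
                for k, v in self.map_func(doc):
                    self.index.setdefault(k, []).append(v)
            self.builds += 1
        return self.index.get(key, [])


def by_author(doc):
    # "Map" step: emit one (author, title) pair per document.
    yield doc['author'], doc['title']


docs = [
    {'author': 'chrism', 'title': 'Note To Bloggers'},
    {'author': 'ben', 'title': 'WSGI/Pylons Talk'},
    {'author': 'chrism', 'title': 'Repoze Status Report'},
]
view = LazyView(docs, by_author)
print(view.query('chrism'))  # first query builds the index
print(view.query('ben'))     # later queries reuse it
```
</pre>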
<p>There is a lot of Interweb talk about the stuff that Erlang provides
to CouchDB "for free", usually discussed in terms of stability and
"crash-only" design.  I don't know anything about this topic, so I
can't comment.  But CouchDB views are implemented in terms of
"map/reduce", which seems to imply that it will be possible to create
a farm of servers, each of which can operate on only a slice of the
entire data structure in order to return a result.  However, I'm not
sure that this feature is actually implemented in any release of
CouchDB (I think it's just implied by its design).</p>
<p>All in all, CouchDB is a neat piece of software.  If I ever have to
build an application that needs to store data that needs to be
accessible from programs written in languages other than Python across
HTTP and I don't need to use a relational database, it seems like a
great solution.</p>
<p>ZODB is probably still a better choice if your application is 100%
Python and you want arbitrary application database write logic to be
bounded within a single transaction.  CouchDB's transaction semantics
default to one-request-one-transaction, although they do have some
batching facilities.  Likewise, if you need to store large amounts of
data (like arbitrary numbers of multimegabyte or multigigabyte files),
ZODB is also probably a better choice, as it has blobs built in, and
the blobs don't need to be base64 encoded in memory as "attachments"
in order to transmit over a wire protocol.  Likewise, if you need to
store data structures that cannot be represented as JSON (like complex
object instances), ZODB really can't be beat.</p>
<p><strong>REVISION: Jan Lehnardt from the Apache projects sent this via email with regard to blobs</strong></p>
<p/><pre>Thanks for the excellent overview. You certainly understand CouchDB and
this helps a lot removing FUD. Good work, thanks for taking the time.<p/>
<p>There's one addition I'd like to make: CouchDB no longer requires encoding
binary data in base64. Attachments for documents (that work just like email
attachments) have their own REST API since this summer where you can
send the raw binary data to create and retrieve documents. So storing loads
of data is no longer a bad idea.
</p></pre><p/>
<p>It's sort of obvious that most of what makes up CouchDB (besides the
currently unquantifiable-by-me benefits of it being written in Erlang,
anyway) could really be done in a reasonably straightforward ZODB web
application.  I actually started such an animal (just to learn, not to
use, I doubt I will play with it much more) called
<a href="http://svn.repoze.org/repoze.loveseat/trunk/">loveseat</a> .  The hardest
thing to get right would be the indexing and the replication, of
course.  It already does database creation and document creation and
retrieval, but not views or replication.  That would probably take
several months to get right, and I only had today. ;-)</p>
<p>I'll also note that ZODB has pretty terrible marketing compared to
CouchDB.  So I suppose I should try to do some.  For you Python
developers that don't know about ZODB, and who are excited about
CouchDB, you might check out ZODB.  You can think of a ZODB database
as a place to hang a graph of arbitrary Python objects that becomes
persistent.  It's sort of an "uber-pickle"; it actually makes heavy
use of the pickle module under the hood, but it breaks the object
graph up into separate pickles, and so can be used for high volume
applications.  Changes take place transactionally.  Multiple clients
can make use of the same ZODB database over a protocol called "ZEO"
(Zope Enterprise Objects), which isn't nearly as scary as it sounds;
basically, you just set up a ZEO server and point the clients at it.
You can use packages like
<a href="http://static.repoze.org/catalogdocs/">repoze.catalog</a> to do indexing
and querying of object data that is inserted into the graph.  You can
use packages like
<a href="http://svn.repoze.org/repoze.folder/trunk/">repoze.folder</a> to hold
large collections of objects.  It's fast, and other than a few C
extensions, completely written in Python.  It's been around, like I
said, for about ten years, and it's in production in tens (hundreds?)
of thousands of Zope deployments today.  The "Zope" in ZODB is a
"brand-only" name; it does not require Zope; it can be used in any
Python application.  It has limited deployment outside a Zope context,
but I think this is mostly cultural: I personally use it in non-Zope
applications frequently.  It works on all major platforms.  A good
place to start would be to do:
</p><pre>  easy_install ZODB3
</pre>
<p/>
<p>Then maybe take a gander at <a href="http://www.h7.dion.ne.jp/~harm/ZODB-Tutorial.py">this
tutorial</a> .  And for a
more verbose look, try <a href="http://doc.async.com.br/python/zodb-howto/zodb-zeo.html">this
howto</a> .</p></div>
    </content>
    <updated>2008-12-01T03:06:45Z</updated>
    <category term="python"/>
    <category term="tech"/>
    <category term="zope"/>
    <author>
      <name>chrism</name>
    </author>
    <source>
      <id>http://plope.com</id>
      <link href="http://plope.com/python.rss" rel="self" type="application/atom+xml"/>
      <link href="http://plope.com" rel="alternate" type="text/html"/>
      <subtitle>Chris McDonough's Python Feed</subtitle>
      <title>Chris McDonough's Python Feed</title>
      <updated>2008-12-30T09:44:16Z</updated>
    </source>
  </entry>

  <entry>
    <id>http://plope.com/Members/chrism/repoze-status-report-nov-2008</id>
    <link href="http://plope.com/Members/chrism/repoze-status-report-nov-2008" rel="alternate" type="text/html"/>
    <title>Repoze Status Report</title>
    <summary>I guess it's about time to give a status report about stuff happening under the "Repoze" flag.</summary>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I suppose it's about time to give a status report about
<a href="http://repoze.org">Repoze</a>.</p>
<p>Much of the action since the last report in March has revolved around
the <a href="http://static.repoze.org/bfgdocs">repoze.bfg</a> web framework.  It's
been about four months since we made the first release of that
package.  repoze.bfg is a web framework that could be most effectively
compared to <a href="http://pylonshq.com/">Pylons</a> inasmuch as it makes very
few decisions for you, and it's based on WSGI technologies and many
Ian Bicking creations such as Paste and WebOb.  However, it also
relies heavily on Zope technologies and concepts (such as object
publishing), and as a result, should be familiar to Zope developers.
It's a bit different than <a href="http://grok.zope.org/">Grok</a> or Zope 3,
because it doesn't attempt to expose all the features of Zope 3
(particularly it doesn't expose much of the Zope Component
Architecture); instead it uses those technologies and exposes a more
minimal API.</p>
<p>In the time since we released 0.1, we've made 26 minor releases of the
repoze.bfg package (the most current version is 0.5.0).  During that
time, we've maintained backwards compatibility with the original
release.  The <a href="http://static.repoze.org/bfgdocs">docs</a> have been kept
pretty much in sync with the codebase.  Also over that time, we've
added the following features:</p>

<ul>
<li>URL-based dispatch via <a href="http://routes.groovie.org/">Routes</a></li>
<li>Allow views to be customized by a <a href="http://static.repoze.org/bfgdocs/narr/views.html#view-request-types">request
  type</a>
  (meaning that you can dispatch to a view based on attributes of the
  incoming request, such as whether it's a GET or a POST, or which
  hostname was used to request the page).</li>
<li>Created demo applications such as
  <a href="http://shootout.repoze.org">repoze.shootout</a> (courtesy of Carlos de
  la Guardia) and <a href="http://cluegun.repoze.org">repoze.cluegun</a> which can
  be used as "jumping-off-points" to develop your own bfg
  applications.</li>
<li>Added a
  <a href="http://svn.repoze.org/repoze.bfg.convention/trunk/">repoze.bfg.convention</a>
  module to allow people to configure bfg views without using ZCML.</li>
<li>Integrated <a href="http://chameleon.repoze.org">Chameleon</a> as the default
  templating implementation.  This templating implementation allows
  for text, ZPT-style, and Genshi-style templates.  Support for the
  more obscure nooks and crannies of each of these languages has been
  improved by Malthe Borch, and it's still much faster than most
  templating languages.</li>
<li>Added the capability to render views programmatically via the
  <a href="http://static.repoze.org/bfgdocs/api/view.html">repoze.bfg.view</a>
  module.</li>
<li>Added documentation about the <a href="http://static.repoze.org/bfgdocs/narr/startup.html#the-startup-process">repoze.bfg startup
  process</a></li>
<li>Added documentation about the <a href="http://static.repoze.org/bfgdocs/narr/environment.html">environment and
  configuration</a>
  of a repoze.bfg application.</li>
<li>Added a <a href="http://static.repoze.org/bfgdocs/narr/unittesting.html#using-the-repoze-bfg-testing-api">helper
  module</a>
  that makes unit testing easier</li>
<li>Many bugfixes.</li>

</ul>
<p>My company (<a href="http://agendaless.com/">Agendaless Consulting</a>) is
developing two customer projects on top of BFG now.  I also gave a
presentation about repoze.bfg at the most recent Plone conference.
The
<a href="http://static.repoze.org/presentations/bfg-ploneconf-oct2008.pdf">slides</a>
are available.</p>
<p>Of course, lots of stuff has been happening outside BFG too.</p>
<p>The <a href="http://repoze.org/">Repoze</a> project has been able to give about 25
new developers SVN commit access over the same amount of time, which
seems to bode well for the technologies that share the Repoze brand.
Many folks have been contributing a <em>lot</em> of stuff.  If I were to try
to account for all the stuff added to the <a href="http://svn.repoze.org/">repoze SVN
repository</a> in this message, it would be
essentially unreadable due to length.  Some highlights:</p>

<ul>
<li><a href="http://svn.repoze.org/repoze.what/trunk/">repoze.what</a> is a WSGI
  authorization system managed by Gustavo Narea, which builds on top
  of <a href="http://static.repoze.org/whodocs/">repoze.who</a>.  This is a
  package which aims to provide generic WSGI apps with authorization,
  and will be used by TurboGears 2.</li>
<li><a href="http://static.repoze.org/catalogdocs">repoze.catalog</a> is an indexing
  and searching system which provides Zope-catalog-like features.</li>
<li><a href="http://svn.repoze.org/repoze.lemonade/trunk/">repoze.lemonade</a> is a
  package that brings Zope-CMF-like content registrations to WSGI
  apps.</li>
<li><a href="http://svn.repoze.org/repoze.workflow/trunk/">repoze.workflow</a> is a
  library that provides an implementation of a state machine usable
  for workflow-like tasks.</li>
<li><a href="http://svn.repoze.org/repoze.monty/trunk/">repoze.monty</a> is a form
  field marshalling library which contains code stolen from Zope which
  converts form elements to various data types when they are posted.</li>
<li><a href="http://svn.repoze.org/repoze.bitblt/trunk/">repoze.bitblt</a> is image
  resizing WSGI middleware written by Malthe Borch and Stefan Eletzhofer.
  <a href="http://svn.repoze.org/repoze.squeeze/trunk/">repoze.squeeze</a> is
  automagical JavaScript and CSS merging in WSGI middleware, also by
  Malthe and Stefan.</li>
<li><a href="http://svn.repoze.org/repoze.urispace/trunk/">repoze.urispace</a> is a
  WSGI application that implements the Akamai
  <a href="http://www.w3.org/TR/urispace.html">urispace</a> specification.</li>
<li><a href="http://svn.repoze.org/repoze.accelerator/trunk/">repoze.accelerator</a>
  is an implementation of an HTTP caching reverse proxy in WSGI
  middleware.</li>

</ul>
<p>We hope to be able to continue normalizing both Zope and non-Zope
technologies for use in a WSGI toolchain.  Things are looking pretty
good so far.  If you're interested in participating, or just asking
questions, don't hesitate to visit the <a href="irc://freenode.net#repoze">#repoze IRC
channel</a> or the <a href="http://lists.repoze.org/listinfo/repoze-dev">mailing
list</a> .</p></div>
    </content>
    <updated>2008-11-23T20:50:43Z</updated>
    <category term="python"/>
    <category term="tech"/>
    <category term="zope"/>
    <author>
      <name>chrism</name>
    </author>
    <source>
      <id>http://plope.com</id>
      <link href="http://plope.com/python.rss" rel="self" type="application/atom+xml"/>
      <link href="http://plope.com" rel="alternate" type="text/html"/>
      <subtitle>Chris McDonough's Python Feed</subtitle>
      <title>Chris McDonough's Python Feed</title>
      <updated>2008-12-30T09:44:16Z</updated>
    </source>
  </entry>

  <entry>
    <id>tag:blogger.com,1999:blog-4258103213764887486.post-2281517892564271814</id>
    <link href="http://artificialcode.blogspot.com/feeds/2281517892564271814/comments/default" rel="replies" type="application/atom+xml"/>
    <link href="https://www.blogger.com/comment.g?blogID=4258103213764887486&amp;postID=2281517892564271814" rel="replies" type="text/html"/>
    <link href="http://www.blogger.com/feeds/4258103213764887486/posts/default/2281517892564271814" rel="edit" type="application/atom+xml"/>
    <link href="http://www.blogger.com/feeds/4258103213764887486/posts/default/2281517892564271814" rel="self" type="application/atom+xml"/>
    <link href="http://artificialcode.blogspot.com/2008/11/touch-engine-iphone-to-appengine-bridge.html" rel="alternate" type="text/html"/>
    <title>Touch Engine:  An iPhone to AppEngine Bridge</title>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><a href="http://isonnet.appspot.com/static/iSonnet.png"><img alt="" border="0" src="http://isonnet.appspot.com/static/iSonnet.png" style="float: right; margin: 0 0 10px 10px; cursor: pointer; cursor: hand; width: 414px; height: 770px;"/></a><br/>Jonathan Saggau and I just released a nice iPhone to Google App Engine bridge project.<br/><br/><a href="http://code.google.com/p/touchengine/">http://code.google.com/p/touchengine/</a><br/><br/>He has a bunch more to say about it than I do on his <a href="http://www.jonathansaggau.com/blog/2008/11/touchengine_iphone_google_app.html">blog here.</a></div>
    </content>
    <updated>2008-11-22T01:44:55Z</updated>
    <published>2008-11-22T01:30:00Z</published>
    <category scheme="http://www.blogger.com/atom/ns#" term="GAE"/>
    <category scheme="http://www.blogger.com/atom/ns#" term="iPhone"/>
    <category scheme="http://www.blogger.com/atom/ns#" term="touchengine"/>
    <author>
      <name>Noah Gift</name>
      <email>noah.gift@gmail.com</email>
      <uri>http://www.blogger.com/profile/13144332122855013229</uri>
    </author>
    <source>
      <id>tag:blogger.com,1999:blog-4258103213764887486</id>
      <author>
        <name>Noah Gift</name>
        <email>noah.gift@gmail.com</email>
        <uri>http://www.blogger.com/profile/13144332122855013229</uri>
      </author>
      <link href="http://artificialcode.blogspot.com/feeds/posts/default" rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml"/>
      <link href="http://www.blogger.com/feeds/4258103213764887486/posts/default" rel="self" type="application/atom+xml"/>
      <link href="http://artificialcode.blogspot.com/" rel="alternate" type="text/html"/>
      <subtitle>This is Noah Gift's Coding Blog.  I only talk about coding and technical stuff here, and that is mostly Python, although I will mix in some other languages, and talk about Artificial Intelligence.</subtitle>
      <title>Artificial Code</title>
      <updated>2008-12-22T08:23:10Z</updated>
    </source>
  </entry>

  <entry>
    <id>tag:blogger.com,1999:blog-4258103213764887486.post-2058961893282695149</id>
    <link href="http://artificialcode.blogspot.com/feeds/2058961893282695149/comments/default" rel="replies" type="application/atom+xml"/>
    <link href="https://www.blogger.com/comment.g?blogID=4258103213764887486&amp;postID=2058961893282695149" rel="replies" type="text/html"/>
    <link href="http://www.blogger.com/feeds/4258103213764887486/posts/default/2058961893282695149" rel="edit" type="application/atom+xml"/>
    <link href="http://www.blogger.com/feeds/4258103213764887486/posts/default/2058961893282695149" rel="self" type="application/atom+xml"/>
    <link href="http://artificialcode.blogspot.com/2008/11/religion-api-to-brain.html" rel="alternate" type="text/html"/>
    <title>Religion:  An API To The Brain?</title>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">I have been thinking a lot about publicly exposed APIs to the brain, in the context of AI programming.  It seems like one of the more interesting APIs to the brain is religion.  At PyWorks, there was a talk on AI in which hunter/prey behavior was simulated.  I wonder if it would be possible to simulate religious behavior in much the same way?  The specific behavior would be a pastor/flock behavior.<br/><br/>It seems like if we could simulate religious behavior, then it might allow us to gain insights into how to create artificial life that follows that same API.  How could that look in Python?<br/><br/><pre><br/>from brain.religion import PublicApi<br/>from brain import emotion, rationality, memory<br/><br/>p = PublicApi(denomination="MyChurch", scale=10, **kw)<br/></pre><br/><br/><br/>One of the more fascinating parts would be to study the actual brain scans of deeply religious and non-religious people while they talk about politics and other hot issues.  This research could help to identify what regions of the brain are firing, and why, and thus help software engineers create a similar API with a future artificial life form.</div>
    </content>
    <updated>2008-11-19T05:54:04Z</updated>
    <published>2008-11-19T04:27:00Z</published>
    <category scheme="http://www.blogger.com/atom/ns#" term="ai"/>
    <category scheme="http://www.blogger.com/atom/ns#" term="api"/>
    <category scheme="http://www.blogger.com/atom/ns#" term="brain"/>
    <author>
      <name>Noah Gift</name>
      <email>noah.gift@gmail.com</email>
      <uri>http://www.blogger.com/profile/13144332122855013229</uri>
    </author>
    <source>
      <id>tag:blogger.com,1999:blog-4258103213764887486</id>
      <author>
        <name>Noah Gift</name>
        <email>noah.gift@gmail.com</email>
        <uri>http://www.blogger.com/profile/13144332122855013229</uri>
      </author>
      <link href="http://artificialcode.blogspot.com/feeds/posts/default" rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml"/>
      <link href="http://www.blogger.com/feeds/4258103213764887486/posts/default" rel="self" type="application/atom+xml"/>
      <link href="http://artificialcode.blogspot.com/" rel="alternate" type="text/html"/>
      <subtitle>This is Noah Gift's Coding Blog.  I only talk about coding and technical stuff here, and that is mostly Python, although I will mix in some other languages, and talk about Artificial Intelligence.</subtitle>
      <title>Artificial Code</title>
      <updated>2008-12-22T08:23:10Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://blog.repoze.org/2008/11/14/tom_gross-repoze_plone-20081113</id>
    <link href="http://blog.repoze.org/tom_gross-repoze_plone-20081113.html" rel="alternate" type="text/html"/>
    <title xml:lang="en">Using repoze in a Plone buildout</title>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p> Props to Tom Gross, who has tackled the job of figuring out and
<a href="http://valentinewebsystems.com/2008/02/19/plone-repoze-and-buildout">documenting</a> the minimal set of changes needed to make use of
repoze and WSGI from within a stock Plone buildout.</p>

<p><em>
 Correction:  Tom Gross sent me the link, but it was to a blog post by
 Tim Terlegård.  Apologies for the mistake!</em></p></div>
    </content>
    <updated>2008-11-14T15:42:11Z</updated>
    <published>2008-11-14T15:42:11Z</published>
    <source>
      <id>http://blog.repoze.org/index.atom</id>
      <author>
        <name>The Repoze Team</name>
        <email>repoze-dev@lists.repoze.org</email>
        <uri>http://blog.repoze.org/index.atom</uri>
      </author>
      <link href="http://blog.repoze.org" rel="alternate" type="text/html"/>
      <link href="http://blog.repoze.org/index.atom" rel="self" type="application/atom+xml"/>
      <rights xml:lang="en">Copyright 2007 Agendaless Consulting, Inc.</rights>
      <subtitle xml:lang="en">Notes on the Repoze platform</subtitle>
      <title xml:lang="en">Repoze Notes</title>
      <updated>2008-11-14T15:42:11Z</updated>
    </source>
  </entry>

  <entry xml:lang="en-US">
    <id>tag:www.groovie.org,:Article/773</id>
    <link href="http://groovie.org/2008/11/13/book-meme" rel="alternate" type="text/html"/>
    <title xml:lang="en-US">Book Meme</title>
    <summary type="xhtml" xml:lang="en-US"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://jtauber.com/blog/2008/11/12/book_meme/">Via</a> :</p>


	<ul>
	<li>Grab the nearest book.</li>
		<li>Open it to page 56.</li>
		<li>Find the fifth sentence.</li>
		<li>Post the text of the sentence in your journal along with these instructions.</li>
		<li>Don’t dig for your favorite book, the cool book, or the intellectual one: pick the <span class="caps">CLOSEST</span>.</li>
	</ul>


	<p>From <strong>The Omnivore’s Dilemma</strong> by Michael Pollan:</p>


	<blockquote>
		<p>And then of course there’s the corn itself, which if corn could form an opinion would surely marvel at the absurdity of it all—and at its great good fortune.</p>
	</blockquote>


	<p>I don’t usually do the meme thing, but I happened to have a book on my desk so it took so little effort… only one chapter to go!</p></div>
    </summary>
    <content type="xhtml" xml:lang="en-US"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://jtauber.com/blog/2008/11/12/book_meme/">Via</a> :</p>


	<ul>
	<li>Grab the nearest book.</li>
		<li>Open it to page 56.</li>
		<li>Find the fifth sentence.</li>
		<li>Post the text of the sentence in your journal along with these instructions.</li>
		<li>Don’t dig for your favorite book, the cool book, or the intellectual one: pick the <span class="caps">CLOSEST</span>.</li>
	</ul>


	<p>From <strong>The Omnivore’s Dilemma</strong> by Michael Pollan:</p>


	<blockquote>
		<p>And then of course there’s the corn itself, which if corn could form an opinion would surely marvel at the absurdity of it all—and at its great good fortune.</p>
	</blockquote>


	<p>I don’t usually do the meme thing, but I happened to have a book on my desk so it took so little effort… only one chapter to go!</p></div>
    </content>
    <updated>2008-11-13T19:03:59Z</updated>
    <published>2008-11-13T19:03:57Z</published>
    <category label="Thoughts" scheme="http://groovie.org/category/thoughts" term="thoughts"/>
    <author>
      <name>ben</name>
    </author>
    <source>
      <id>tag:www.groovie.org,:/articles</id>
      <link href="http://www.groovie.org" rel="alternate" type="text/html"/>
      <link href="http://www.groovie.org/articles.atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en-US">Code, Thoughts, and Misc Debris</subtitle>
      <title xml:lang="en-US">Groovie :</title>
      <updated>2008-12-09T01:04:48Z</updated>
    </source>
  </entry>

  <entry xml:lang="en">
    <id>http://blog.repoze.org/2008/11/13/tom_gross-repoze_plone-20081113</id>
    <link href="http://blog.repoze.org/tom_gross-repoze_plone-20081113.html" rel="alternate" type="text/html"/>
    <title xml:lang="en">Using repoze in a Plone buildout</title>
    <content type="xhtml" xml:lang="en"><div xmlns="http://www.w3.org/1999/xhtml"><p> Props to Tom Gross, who has tackled the job of figuring out and
<a href="http://valentinewebsystems.com/2008/02/19/plone-repoze-and-buildout">documenting</a> the minimal set of changes needed to make use of
repoze and WSGI from within a stock Plone buildout.</p></div>
    </content>
    <updated>2008-11-13T16:20:16Z</updated>
    <published>2008-11-13T16:20:16Z</published>
    <source>
      <id>http://blog.repoze.org/index.atom</id>
      <author>
        <name>The Repoze Team</name>
        <email>repoze-dev@lists.repoze.org</email>
        <uri>http://blog.repoze.org/index.atom</uri>
      </author>
      <link href="http://blog.repoze.org" rel="alternate" type="text/html"/>
      <link href="http://blog.repoze.org/index.atom" rel="self" type="application/atom+xml"/>
      <rights xml:lang="en">Copyright 2007 Agendaless Consulting, Inc.</rights>
      <subtitle xml:lang="en">Notes on the Repoze platform</subtitle>
      <title xml:lang="en">Repoze Notes</title>
      <updated>2008-11-13T16:20:16Z</updated>
    </source>
  </entry>

  <entry>
    <id>tag:blogger.com,1999:blog-4258103213764887486.post-4643213179558624186</id>
    <link href="http://artificialcode.blogspot.com/feeds/4643213179558624186/comments/default" rel="replies" type="application/atom+xml"/>
    <link href="https://www.blogger.com/comment.g?blogID=4258103213764887486&amp;postID=4643213179558624186" rel="replies" type="text/html"/>
    <link href="http://www.blogger.com/feeds/4258103213764887486/posts/default/4643213179558624186" rel="edit" type="application/atom+xml"/>
    <link href="http://www.blogger.com/feeds/4258103213764887486/posts/default/4643213179558624186" rel="self" type="application/atom+xml"/>
    <link href="http://artificialcode.blogspot.com/2008/11/pyworks-tutorial-on-google-app-engine.html" rel="alternate" type="text/html"/>
    <title>PyWorks Tutorial on Google App Engine</title>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Here is the website for the three-hour tutorial I taught at PyWorks on Google App Engine, the GData API, and Google's AJAX APIs.  There is some good stuff in there, especially the tutorial on how to programmatically control Blogger.  If you write a command-line tool to post to Blogger, let me know!<br/><br/><a href="http://row.appspot.com/">http://row.appspot.com/</a></div>
    </content>
    <updated>2008-11-13T14:37:56Z</updated>
    <published>2008-11-13T14:35:00Z</published>
    <category scheme="http://www.blogger.com/atom/ns#" term="Google App Engine"/>
    <category scheme="http://www.blogger.com/atom/ns#" term="AJAX API"/>
    <category scheme="http://www.blogger.com/atom/ns#" term="GData"/>
    <author>
      <name>Noah Gift</name>
      <email>noah.gift@gmail.com</email>
      <uri>http://www.blogger.com/profile/13144332122855013229</uri>
    </author>
    <source>
      <id>tag:blogger.com,1999:blog-4258103213764887486</id>
      <author>
        <name>Noah Gift</name>
        <email>noah.gift@gmail.com</email>
        <uri>http://www.blogger.com/profile/13144332122855013229</uri>
      </author>
      <link href="http://artificialcode.blogspot.com/feeds/posts/default" rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml"/>
      <link href="http://www.blogger.com/feeds/4258103213764887486/posts/default" rel="self" type="application/atom+xml"/>
      <link href="http://artificialcode.blogspot.com/" rel="alternate" type="text/html"/>
      <subtitle>This is Noah Gift's Coding Blog.  I only talk about coding and technical stuff here, and that is mostly Python, although I will mix in some other languages, and talk about Artificial Intelligence.</subtitle>
      <title>Artificial Code</title>
      <updated>2008-12-22T08:23:10Z</updated>
    </source>
  </entry>

  <entry xml:lang="en-US">
    <id>tag:www.groovie.org,:Article/765</id>
    <link href="http://groovie.org/2008/05/06/most-bizarre-git-service-and-other-stupid-rails-powered-businesses" rel="alternate" type="text/html"/>
    <title xml:lang="en-US">Most bizarre Git service and other stupid Rails powered "businesses"</title>
    <summary type="xhtml" xml:lang="en-US"><div xmlns="http://www.w3.org/1999/xhtml"><p>I can’t help but get totally baffled when I see a <a href="http://github.com/plans">business model like this</a>.</p>


	<p>Yes, that’s right, you can pay for the privilege of keeping a copy of your <strong>distributed</strong> version control system (DVCS) <strong>private</strong> repositories on someone else’s machines. You also get to pay depending on how many people you want to allow to collaborate on it.</p>


	<p>Never mind that one of the <b>entire points of a <span class="caps">DVCS</span></b> is that you <strong>do <span class="caps">NOT</span></strong> need a central repository. Does anyone actually work at a “Large Company” (as the page indicates) that would be stupid enough to pay $100/month so they can put all their proprietary and very personal code repositories on a third-party web service?</p>


	<p>So what are you paying for? Well, to start with, they have awesome integration with <a href="http://lighthouseapp.com/">Lighthouse</a>, since we all know there’s no decent free open-source issue tracking system… <strong>cough</strong> <a href="http://trac.edgewall.org/">Trac</a> <strong>cough</strong> <a href="http://roundup.sourceforge.net/">Roundup</a> <strong>cough</strong>. Oh wait, since there are absolutely no simple web-based issue tracking systems, let’s have another <a href="http://sera.lighthouseapp.com/plans">slick business model</a> to get people to pay for a stripped-down Trac (but this time with a really pretty UI)!</p>


	<p>What do these sites have in common? Rails, “look ma, I can copy-paste the business plan too” pricing models, and some good graphic designers at the helm. There also seems to be an interesting amount of promotion between these sites, as well as a nice <a href="http://www.loudthinking.com/posts/24-gits-avalanche">blog post from the Rails creator himself</a> promoting GitHub. I’m sure no one who has read <a href="http://www.zedshaw.com/rants/rails_is_a_ghetto.html">this rant</a> should be surprised though.</p>


	<p>I only hope that no one starts to believe that a <span class="caps">DVCS</span> actually requires these “please pay” copies of their <span class="caps">DVCS</span> repo.</p>


	<p><b>Update (11/12/2008)</b>: This post is apparently popular enough that it has resurfaced several times now, so I thought I’d clarify a bit more.</p>


	<p>Many people have noted the obvious benefits of services like GitHub, and I’ve used one just like it myself, <a href="http://www.bitbucket.org/">BitBucket</a>. These sites are great for open-source projects, as many have rightly pointed out: they make it easy to collaborate and fork projects, and easy for maintainers to pull patches from forks after looking them over.</p>


	<p>Most of their social-network features become moot, though, when working on company code that’s not open-source (note that this rant is directed entirely at the paid service options, which are for <strong>private</strong> repos). None of the companies I’ve worked at would ever let their private source code leave their own servers. Since you need to deploy a site anyway (often to a remote computer), which will generally require ssh access, it’s trivial to use a modern <span class="caps">DVCS</span> over ssh… which makes it seem very silly to me to be paying so much to another company for a bunch of useless social features for a private repo.</p>


	<p>Part of the humor originally intended in this rant was that a <strong>centralized repo hub</strong> has become one of the stronger selling points for a <strong>distributed</strong> VCS. Unfortunately, many seem to have missed that point.</p></div>
    </summary>
    <content type="xhtml" xml:lang="en-US"><div xmlns="http://www.w3.org/1999/xhtml"><p>I can’t help but get totally baffled when I see a <a href="http://github.com/plans">business model like this</a>.</p>


	<p>Yes, that’s right, you can pay for the privilege of keeping a copy of your <strong>distributed</strong> version control system (DVCS) <strong>private</strong> repositories on someone else’s machines. You also get to pay depending on how many people you want to allow to collaborate on it.</p>


	<p>Never mind that one of the <b>entire points of a <span class="caps">DVCS</span></b> is that you <strong>do <span class="caps">NOT</span></strong> need a central repository. Does anyone actually work at a “Large Company” (as the page indicates) that would be stupid enough to pay $100/month so they can put all their proprietary and very personal code repositories on a third-party web service?</p>


	<p>So what are you paying for? Well, to start with, they have awesome integration with <a href="http://lighthouseapp.com/">Lighthouse</a>, since we all know there’s no decent free open-source issue tracking system… <strong>cough</strong> <a href="http://trac.edgewall.org/">Trac</a> <strong>cough</strong> <a href="http://roundup.sourceforge.net/">Roundup</a> <strong>cough</strong>. Oh wait, since there are absolutely no simple web-based issue tracking systems, let’s have another <a href="http://sera.lighthouseapp.com/plans">slick business model</a> to get people to pay for a stripped-down Trac (but this time with a really pretty UI)!</p>


	<p>What do these sites have in common? Rails, “look ma, I can copy-paste the business plan too” pricing models, and some good graphic designers at the helm. There also seems to be an interesting amount of promotion between these sites, as well as a nice <a href="http://www.loudthinking.com/posts/24-gits-avalanche">blog post from the Rails creator himself</a> promoting GitHub. I’m sure no one who has read <a href="http://www.zedshaw.com/rants/rails_is_a_ghetto.html">this rant</a> should be surprised though.</p>


	<p>I only hope that no one starts to believe that a <span class="caps">DVCS</span> actually requires these “please pay” copies of their <span class="caps">DVCS</span> repo.</p>


	<p><b>Update (11/12/2008)</b>: This post is apparently popular enough that it has resurfaced several times now, so I thought I’d clarify a bit more.</p>


	<p>Many people have noted the obvious benefits of services like GitHub, and I’ve used one just like it myself, <a href="http://www.bitbucket.org/">BitBucket</a>. These sites are great for open-source projects, as many have rightly pointed out: they make it easy to collaborate and fork projects, and easy for maintainers to pull patches from forks after looking them over.</p>


	<p>Most of their social-network features become moot, though, when working on company code that’s not open-source (note that this rant is directed entirely at the paid service options, which are for <strong>private</strong> repos). None of the companies I’ve worked at would ever let their private source code leave their own servers. Since you need to deploy a site anyway (often to a remote computer), which will generally require ssh access, it’s trivial to use a modern <span class="caps">DVCS</span> over ssh… which makes it seem very silly to me to be paying so much to another company for a bunch of useless social features for a private repo.</p>


	<p>Part of the humor originally intended in this rant was that a <strong>centralized repo hub</strong> has become one of the stronger selling points for a <strong>distributed</strong> VCS. Unfortunately, many seem to have missed that point.</p></div>
    </content>
    <updated>2008-11-12T19:18:42Z</updated>
    <published>2008-05-07T03:07:38Z</published>
    <category label="Rants" scheme="http://groovie.org/category/rants" term="rants"/>
    <author>
      <name>ben</name>
    </author>
    <source>
      <id>tag:www.groovie.org,:/articles</id>
      <link href="http://www.groovie.org" rel="alternate" type="text/html"/>
      <link href="http://www.groovie.org/articles.atom" rel="self" type="application/atom+xml"/>
      <subtitle xml:lang="en-US">Code, Thoughts, and Misc Debris</subtitle>
      <title xml:lang="en-US">Groovie :</title>
      <updated>2008-12-03T18:03:09Z</updated>
    </source>
  </entry>

  <entry>
    <id>tag:blogger.com,1999:blog-4258103213764887486.post-8150019653637580331</id>
    <link href="http://artificialcode.blogspot.com/feeds/8150019653637580331/comments/default" rel="replies" type="application/atom+xml"/>
    <link href="https://www.blogger.com/comment.g?blogID=4258103213764887486&amp;postID=8150019653637580331" rel="replies" type="text/html"/>
    <link href="http://www.blogger.com/feeds/4258103213764887486/posts/default/8150019653637580331" rel="edit" type="application/atom+xml"/>
    <link href="http://www.blogger.com/feeds/4258103213764887486/posts/default/8150019653637580331" rel="self" type="application/atom+xml"/>
    <link href="http://artificialcode.blogspot.com/2008/11/shakespears-sonnets-as-python.html" rel="alternate" type="text/html"/>
    <title>Shakespeare's Sonnets as a Python Dictionary</title>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">This was a bit of a pain to get into a Python data structure, so I am posting it in case someone else needs it in the future.<br/><br/>I constructed it from this:<br/><br/><a href="http://www.it.usyd.edu.au/~matty/Shakespeare/texts/poetry/sonnets">http://www.it.usyd.edu.au/~matty/Shakespeare/texts/poetry/sonnets<br/></a><br/><pre><br/>verses={1:["I","""FROM fairest creatures we desire increase,<br/>That thereby beauty's rose might never die,<br/>But as the riper should by time decease,<br/>His tender heir might bear his memory:<br/>But thou, contracted to thine own bright eyes,<br/>Feed'st thy light's flame with self-substantial fuel,<br/>Making a famine where abundance lies,<br/>Thyself thy foe, to thy sweet self too cruel.<br/>Thou that art now the world's fresh ornament<br/>And only herald to the gaudy spring,<br/>Within thine own bud buriest thy content<br/>And, tender churl, makest waste in niggarding.<br/>  Pity the world, or else this glutton be,<br/>  To eat the world's due, by the grave and thee.<br/>"""],<br/>2:["II","""When forty winters shall besiege thy brow,<br/>And dig deep trenches in thy beauty's field,<br/>Thy youth's proud livery, so gazed on now,<br/>Will be a tatter'd weed, of small worth held:<br/>Then being ask'd where all thy beauty lies,<br/>Where all the treasure of thy lusty days,<br/>To say, within thine own deep-sunken eyes,<br/>Were an all-eating shame and thriftless praise.<br/>How much more praise deserved thy beauty's use,<br/>If thou couldst answer 'This fair child of mine<br/>Shall sum my count and make my old excuse,'<br/>Proving his beauty by succession thine!<br/>  This were to be new made when thou art old,<br/>  And see thy blood warm when thou feel'st it cold.<br/>"""],3:["III","""Look in thy glass, and tell the face thou viewest<br/>Now is the time that face should form another;<br/>Whose fresh repair if now thou not renewest,<br/>Thou dost beguile the 
world, unbless some mother.<br/>For where is she so fair whose unear'd womb<br/>Disdains the tillage of thy husbandry?<br/>Or who is he so fond will be the tomb<br/>Of his self-love, to stop posterity?<br/>Thou art thy mother's glass, and she in thee<br/>Calls back the lovely April of her prime:<br/>So thou through windows of thine age shall see<br/>Despite of wrinkles this thy golden time.<br/>  But if thou live, remember'd not to be,<br/>  Die single, and thine image dies with thee.<br/>"""],4:["IV","""Unthrifty loveliness, why dost thou spend<br/>Upon thyself thy beauty's legacy?<br/>Nature's bequest gives nothing but doth lend,<br/>And being frank she lends to those are free.<br/>Then, beauteous niggard, why dost thou abuse<br/>The bounteous largess given thee to give?<br/>Profitless usurer, why dost thou use<br/>So great a sum of sums, yet canst not live?<br/>For having traffic with thyself alone,<br/>Thou of thyself thy sweet self dost deceive.<br/>Then how, when nature calls thee to be gone,<br/>What acceptable audit canst thou leave?<br/>  Thy unused beauty must be tomb'd with thee,<br/>  Which, used, lives th' executor to be.<br/>"""],5:["V","""Those hours, that with gentle work did frame<br/>The lovely gaze where every eye doth dwell,<br/>Will play the tyrants to the very same<br/>And that unfair which fairly doth excel:<br/>For never-resting time leads summer on<br/>To hideous winter and confounds him there;<br/>Sap cheque'd with frost and lusty leaves quite gone,<br/>Beauty o'ersnow'd and bareness every where:<br/>Then, were not summer's distillation left,<br/>A liquid prisoner pent in walls of glass,<br/>Beauty's effect with beauty were bereft,<br/>Nor it nor no remembrance what it was:<br/>  But flowers distill'd though they with winter meet,<br/>  Leese but their show; their substance still lives sweet.<br/>"""],6:["VI","""Then let not winter's ragged hand deface<br/>In thee thy summer, ere thou be distill'd:<br/>Make sweet some vial; treasure thou 
some place<br/>With beauty's treasure, ere it be self-kill'd.<br/>That use is not forbidden usury,<br/>Which happies those that pay the willing loan;<br/>That's for thyself to breed another thee,<br/>Or ten times happier, be it ten for one;<br/>Ten times thyself were happier than thou art,<br/>If ten of thine ten times refigured thee:<br/>Then what could death do, if thou shouldst depart,<br/>Leaving thee living in posterity?<br/>  Be not self-will'd, for thou art much too fair<br/>  To be death's conquest and make worms thine heir.<br/>"""],7:["VII","""Lo! in the orient when the gracious light<br/>Lifts up his burning head, each under eye<br/>Doth homage to his new-appearing sight,<br/>Serving with looks his sacred majesty;<br/>And having climb'd the steep-up heavenly hill,<br/>Resembling strong youth in his middle age,<br/>yet mortal looks adore his beauty still,<br/>Attending on his golden pilgrimage;<br/>But when from highmost pitch, with weary car,<br/>Like feeble age, he reeleth from the day,<br/>The eyes, 'fore duteous, now converted are<br/>From his low tract and look another way:<br/>  So thou, thyself out-going in thy noon,<br/>  Unlook'd on diest, unless thou get a son.<br/>"""],8:["VIII","""Music to hear, why hear'st thou music sadly?<br/>Sweets with sweets war not, joy delights in joy.<br/>Why lovest thou that which thou receivest not gladly,<br/>Or else receivest with pleasure thine annoy?<br/>If the true concord of well-tuned sounds,<br/>By unions married, do offend thine ear,<br/>They do but sweetly chide thee, who confounds<br/>In singleness the parts that thou shouldst bear.<br/>Mark how one string, sweet husband to another,<br/>Strikes each in each by mutual ordering,<br/>Resembling sire and child and happy mother<br/>Who all in one, one pleasing note do sing:<br/>  Whose speechless song, being many, seeming one,<br/>  Sings this to thee: 'thou single wilt prove none.'<br/>"""],9:["IX","""Is it for fear to wet a widow's eye<br/>That thou 
consumest thyself in single life?<br/>Ah! if thou issueless shalt hap to die.<br/>The world will wail thee, like a makeless wife;<br/>The world will be thy widow and still weep<br/>That thou no form of thee hast left behind,<br/>When every private widow well may keep<br/>By children's eyes her husband's shape in mind.<br/>Look, what an unthrift in the world doth spend<br/>Shifts but his place, for still the world enjoys it;<br/>But beauty's waste hath in the world an end,<br/>And kept unused, the user so destroys it.<br/>  No love toward others in that bosom sits<br/>  That on himself such murderous shame commits.<br/>"""],10:["X","""For shame! deny that thou bear'st love to any,<br/>Who for thyself art so unprovident.<br/>Grant, if thou wilt, thou art beloved of many,<br/>But that thou none lovest is most evident;<br/>For thou art so possess'd with murderous hate<br/>That 'gainst thyself thou stick'st not to conspire.<br/>Seeking that beauteous roof to ruinate<br/>Which to repair should be thy chief desire.<br/>O, change thy thought, that I may change my mind!<br/>Shall hate be fairer lodged than gentle love?<br/>Be, as thy presence is, gracious and kind,<br/>Or to thyself at least kind-hearted prove:<br/>  Make thee another self, for love of me,<br/>  That beauty still may live in thine or thee.<br/>"""],11:["XI","""As fast as thou shalt wane, so fast thou growest<br/>In one of thine, from that which thou departest;<br/>And that fresh blood which youngly thou bestowest<br/>Thou mayst call thine when thou from youth convertest.<br/>Herein lives wisdom, beauty and increase:<br/>Without this, folly, age and cold decay:<br/>If all were minded so, the times should cease<br/>And threescore year would make the world away.<br/>Let those whom Nature hath not made for store,<br/>Harsh featureless and rude, barrenly perish:<br/>Look, whom she best endow'd she gave the more;<br/>Which bounteous gift thou shouldst in bounty cherish:<br/>  She carved thee for her seal, and 
meant thereby<br/>  Thou shouldst print more, not let that copy die.<br/>"""],12:["XII","""When I do count the clock that tells the time,<br/>And see the brave day sunk in hideous night;<br/>When I behold the violet past prime,<br/>And sable curls all silver'd o'er with white;<br/>When lofty trees I see barren of leaves<br/>Which erst from heat did canopy the herd,<br/>And summer's green all girded up in sheaves<br/>Borne on the bier with white and bristly beard,<br/>Then of thy beauty do I question make,<br/>That thou among the wastes of time must go,<br/>Since sweets and beauties do themselves forsake<br/>And die as fast as they see others grow;<br/>  And nothing 'gainst Time's scythe can make defence<br/>  Save breed, to brave him when he takes thee hence.<br/>"""],13:["XIII","""O, that you were yourself! but, love, you are<br/>No longer yours than you yourself here live:<br/>Against this coming end you should prepare,<br/>And your sweet semblance to some other give.<br/>So should that beauty which you hold in lease<br/>Find no determination: then you were<br/>Yourself again after yourself's decease,<br/>When your sweet issue your sweet form should bear.<br/>Who lets so fair a house fall to decay,<br/>Which husbandry in honour might uphold<br/>Against the stormy gusts of winter's day<br/>And barren rage of death's eternal cold?<br/>  O, none but unthrifts! 
Dear my love, you know<br/>  You had a father: let your son say so.<br/>"""],14:["XIV","""Not from the stars do I my judgment pluck;<br/>And yet methinks I have astronomy,<br/>But not to tell of good or evil luck,<br/>Of plagues, of dearths, or seasons' quality;<br/>Nor can I fortune to brief minutes tell,<br/>Pointing to each his thunder, rain and wind,<br/>Or say with princes if it shall go well,<br/>By oft predict that I in heaven find:<br/>But from thine eyes my knowledge I derive,<br/>And, constant stars, in them I read such art<br/>As truth and beauty shall together thrive,<br/>If from thyself to store thou wouldst convert;<br/>  Or else of thee this I prognosticate:<br/>  Thy end is truth's and beauty's doom and date.<br/>"""],15:["XV","""When I consider every thing that grows<br/>Holds in perfection but a little moment,<br/>That this huge stage presenteth nought but shows<br/>Whereon the stars in secret influence comment;<br/>When I perceive that men as plants increase,<br/>Cheered and cheque'd even by the self-same sky,<br/>Vaunt in their youthful sap, at height decrease,<br/>And wear their brave state out of memory;<br/>Then the conceit of this inconstant stay<br/>Sets you most rich in youth before my sight,<br/>Where wasteful Time debateth with Decay,<br/>To change your day of youth to sullied night;<br/>  And all in war with Time for love of you,<br/>  As he takes from you, I engraft you new.<br/>"""],16:["XVI","""But wherefore do not you a mightier way<br/>Make war upon this bloody tyrant, Time?<br/>And fortify yourself in your decay<br/>With means more blessed than my barren rhyme?<br/>Now stand you on the top of happy hours,<br/>And many maiden gardens yet unset<br/>With virtuous wish would bear your living flowers,<br/>Much liker than your painted counterfeit:<br/>So should the lines of life that life repair,<br/>Which this, Time's pencil, or my pupil pen,<br/>Neither in inward worth nor outward fair,<br/>Can make you live yourself in eyes of 
men.<br/>  To give away yourself keeps yourself still,<br/>  And you must live, drawn by your own sweet skill.<br/>"""],17:["XVII","""Who will believe my verse in time to come,<br/>If it were fill'd with your most high deserts?<br/>Though yet, heaven knows, it is but as a tomb<br/>Which hides your life and shows not half your parts.<br/>If I could write the beauty of your eyes<br/>And in fresh numbers number all your graces,<br/>The age to come would say 'This poet lies:<br/>Such heavenly touches ne'er touch'd earthly faces.'<br/>So should my papers yellow'd with their age<br/>Be scorn'd like old men of less truth than tongue,<br/>And your true rights be term'd a poet's rage<br/>And stretched metre of an antique song:<br/>  But were some child of yours alive that time,<br/>  You should live twice; in it and in my rhyme.<br/>"""],18:["XVIII","""Shall I compare thee to a summer's day?<br/>Thou art more lovely and more temperate:<br/>Rough winds do shake the darling buds of May,<br/>And summer's lease hath all too short a date:<br/>Sometime too hot the eye of heaven shines,<br/>And often is his gold complexion dimm'd;<br/>And every fair from fair sometime declines,<br/>By chance or nature's changing course untrimm'd;<br/>But thy eternal summer shall not fade<br/>Nor lose possession of that fair thou owest;<br/>Nor shall Death brag thou wander'st in his shade,<br/>When in eternal lines to time thou growest:<br/>  So long as men can breathe or eyes can see,<br/>  So long lives this and this gives life to thee.<br/>"""],19:["XIX","""Devouring Time, blunt thou the lion's paws,<br/>And make the earth devour her own sweet brood;<br/>Pluck the keen teeth from the fierce tiger's jaws,<br/>And burn the long-lived phoenix in her blood;<br/>Make glad and sorry seasons as thou fleets,<br/>And do whate'er thou wilt, swift-footed Time,<br/>To the wide world and all her fading sweets;<br/>But I forbid thee one most heinous crime:<br/>O, carve not with thy hours my love's fair brow,