Pondering Python Path Programming Problems - Please Visit http://glyph.twistedmatrix.com/ - This Blog Is Closed.
Pondering Python Path Programming Problems
Most Python programmers are at least vaguely aware of sys.path, PYTHONPATH, and the effect they have on importing modules.  However, there's a lot of confusion about how to use them properly, and how powerful these concepts can be if you know how to apply them.  Twisted, and in particular its plugin system, makes very nuanced use of the python path, which can sometimes make it a bit hard to explain, since there isn't a well-defined common terminology or good library support for working with paths, except to the extent that they are used by importers.

This article is really about two things: the general concept of paths, and the Twisted module "twisted.python.modules", which provides some specific implementations of my ideas about the python path.

First of all, why should you care about python paths?  To put it simply, because very bizarre problems can result if you use them incorrectly.  Also, you need to know about them in order to use Twisted's plugin system effectively, and of course you want to use Twisted, right?  :)

What kind of problems?  Even very popular, well-regarded Python packages by very experienced Python programmers sometimes mess this up pretty badly.  Here's a simple example of what can go wrong with a package you probably know of, the Python Imaging Library:
>>> import Image
>>> import PIL.Image
>>> img = PIL.Image.Image()
>>> Image.__file__
'/usr/lib/python2.5/site-packages/PIL/Image.pyc'
>>> PIL.Image.__file__
'/usr/lib/python2.5/site-packages/PIL/Image.pyc'
Here we can see that you can import PIL's "Image" module as either "PIL.Image" or simply "Image".  Both these modules are loaded from the same file.  On the face of it, this is simply a convenience.  But let's dig deeper:
>>> PIL.Image == Image
False
The modules aren't the same object!  This has some nasty practical repercussions:
>>> isinstance(img, Image.Image)
False
For example, Image objects created from one of these PIL modules do not register as instances of the class from the other, even though they're all the same code.  Worse yet, this mistake can become "sticky" if you use them along with a module like pickle, which carries the module and class name into the data:
>>> from cPickle import dumps
>>> img2 = Image.Image()
>>> dumps(img)
"(iPIL.Image\nImage\n ...
>>> dumps(img2)
"(iImage\nImage\n ...
Many Python features and packages depend on matching types.  Zope Interface, for example, will not let you use adapters registered for one Image type with the other; the objects will not compare as equivalent even if they really are, and so on.  And none of this is a bug in the code!  Why does it happen?

PIL is a package; that is, a directory named "PIL" with Python source code and an "__init__.py" in it.  However, it also installs a ".pth" file as part of its installation.  ".pth" files are one way to add entries to your sys.path.  This particular one adds the "PIL" directory itself to your path, which means the same file can be loaded through two entries: as the module "PIL.Image" inside a package, from your "site-packages" directory, and as the top-level module "Image", from the "site-packages/PIL" directory.
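If you don't have an old PIL handy, the same effect is easy to reproduce with a throwaway package.  This is a minimal sketch in Python 3 using only the standard library; the names "pathdemo_pkg" and "pathdemo_shapes" are made up for the illustration, and the two sys.path entries stand in for what the ".pth" file does.

```python
import importlib
import os
import sys
import tempfile

# Build <root>/pathdemo_pkg/__init__.py and <root>/pathdemo_pkg/pathdemo_shapes.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "pathdemo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "pathdemo_shapes.py"), "w") as f:
    f.write("class Shape(object):\n    pass\n")

# Put BOTH the parent directory and the package directory on sys.path.
# This mimics a ".pth" file that points inside a package.
sys.path[:0] = [root, pkg_dir]
importlib.invalidate_caches()

import pathdemo_shapes               # found through the pkg_dir entry
import pathdemo_pkg.pathdemo_shapes  # found through the root entry

# One file on disk, two distinct module objects in sys.modules.
same_file = os.path.samefile(pathdemo_shapes.__file__,
                             pathdemo_pkg.pathdemo_shapes.__file__)
same_module = pathdemo_shapes is pathdemo_pkg.pathdemo_shapes

# Instances of one class are not instances of its twin.
obj = pathdemo_pkg.pathdemo_shapes.Shape()
cross_isinstance = isinstance(obj, pathdemo_shapes.Shape)

# These are the module names that pickle would record for each class.
recorded_names = (pathdemo_shapes.Shape.__module__,
                  pathdemo_pkg.pathdemo_shapes.Shape.__module__)
```

Here same_file is true while same_module and cross_isinstance are both false, which is exactly the PIL situation in miniature.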

This isn't to pick on PIL or the Effbot; I've seen lots of projects which have a "lib" directory with an __init__.py and change its name at installation time, or inconsistently reference subpackages with relative and absolute imports, or do any number of things which are just as bad.  I hope that I've convinced you not to do the same thing with your project, but I won't dwell on the problem here, since I have a solution handy.

Unless you already know what is going on (although I'm sure many of you reading this already do), this can be a bit confusing to figure out.  You can use twisted.python.modules to ask this question rather directly.  Here's how:
>>> from twisted.python.modules import getModule
>>> imageModule = getModule("Image")
>>> pilImageModule = getModule("PIL.Image")
>>> imageModule.pathEntry
>>> pilImageModule.pathEntry
Here we're asking twisted.python.modules to give us objects that represent metadata about two modules, without actually loading them.  The interesting attribute here is 'pathEntry', which tells us what entry on sys.path the module would be loaded from, if it were imported.
>>> import sys
>>> pilImageModule.isLoaded()
False
>>> imageModule.isLoaded()
False
>>> 'PIL.Image' in sys.modules
False
>>> 'Image' in sys.modules
False
Look, no modules!
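For comparison, here is a rough standard-library analogue of the "which path entry would this come from?" question.  It is only a sketch: "find_path_entry" is a name I made up, it handles top-level names only, and it is not how twisted.python.modules is implemented.  It relies on pkgutil.iter_modules, which lists what a path entry offers without importing anything.

```python
import os
import pkgutil
import sys
import tempfile


def find_path_entry(top_level_name, search_path=None):
    """Return the first path entry offering `top_level_name`, or None.

    Hypothetical helper: walks the entries in order, the way the import
    system would, but only lists modules instead of importing them.
    """
    entries = sys.path if search_path is None else search_path
    for entry in entries:
        for info in pkgutil.iter_modules([entry]):
            if info.name == top_level_name:
                return entry
    return None


# Demo: a new path entry containing one module that is never imported.
entry_dir = tempfile.mkdtemp()
with open(os.path.join(entry_dir, "pathdemo_unloaded.py"), "w") as f:
    f.write("VALUE = 1\n")
sys.path.append(entry_dir)

found_entry = find_path_entry("pathdemo_unloaded")
still_unloaded = "pathdemo_unloaded" not in sys.modules
```

found_entry is the temporary directory, and the module still isn't in sys.modules afterwards.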

Of course, if we wanted to load those modules, it's easy enough:
>>> pilImageModule.load()
<module 'PIL.Image' from '/usr/lib/python2.5/site-packages/PIL/Image.pyc'>
>>> imageModule.load()
<module 'Image' from '/usr/lib/python2.5/site-packages/PIL/Image.pyc'>
You can also get lists of modules.  For example, you can see that the list of modules in the "PIL" package is suspiciously similar to the list of top-level modules that comes from the path entry where the "Image" module was loaded:
>>> from pprint import pprint
>>> pilModule = getModule("PIL")
>>> pprint(list(pilModule.iterModules())[:5])
>>> pprint(list(imageModule.pathEntry.iterModules())[:5])
As you might imagine, the ability to list modules and load the ones that seem interesting is a great way to load plugins - and that's exactly how Twisted's plugin system is implemented.  While the plugin system itself is a topic for another post (or perhaps you could just read the documentation) the way it finds plugins is interesting.

For example, let's take a look at the list of Mantissa plugin modules I have installed:
>>> xmplugins = getModule('xmantissa.plugins')
>>> pprint(list(xmplugins.iterModules()))
This simple query is actually an incomplete list.  It's just the modules that come with Mantissa itself.  Python has a special little-known rule when loading modules from packages, and twisted.python.modules honors it: if there is a special variable called "__path__" in a package, it is a list of path names to load modules from.  However, twisted.python.modules doesn't load modules unless you ask it to, so it can't determine the value of that attribute.  As it so happens, twisted.plugins uses the __path__ attribute in order to allow you to keep your development installations separate, so twisted.python.modules can't determine all the places you might need to look for plugins without some help.  Let's just load that package so we can look at its __path__ attribute:
>>> xmplugins.load()
<module 'xmantissa.plugins' from '/home/glyph/Projects/Divmod/trunk/Mantissa/xmantissa/plugins/__init__.pyc'>
Now that we've loaded it, let's have a look at that list:
>>> pprint(list(xmplugins.iterModules()))

That's my full list of Mantissa plugins, including my super secret Divmod proprietary plugins.
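The "incomplete until loaded" behaviour is easy to see with nothing but the standard library.  In this sketch (Python 3; the package name "pathdemo_plugins" and the directory "elsewhere" are invented for the example) the package's __init__.py extends its own __path__, so the extra directory is only discoverable once the package has actually been imported.

```python
import importlib
import os
import pkgutil
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "pathdemo_plugins")
extra_dir = os.path.join(root, "elsewhere")
os.mkdir(pkg_dir)
os.mkdir(extra_dir)

# The package's __init__.py appends another directory to __path__ at import time.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("__path__.append(%r)\n" % extra_dir)
with open(os.path.join(pkg_dir, "bundled.py"), "w") as f:
    f.write("")
with open(os.path.join(extra_dir, "thirdparty.py"), "w") as f:
    f.write("")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Without importing, all we can inspect is the package directory itself.
before_load = sorted(info.name for info in pkgutil.iter_modules([pkg_dir]))

# After importing, __path__ lists every directory the package searches.
package = importlib.import_module("pathdemo_plugins")
after_load = sorted(info.name for info in pkgutil.iter_modules(package.__path__))
```

before_load contains only "bundled", while after_load contains both "bundled" and "thirdparty".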

This list is generated because plugin packages use a feature (which was previously kind of a gross hack but will be an officially supported feature of the next version of Twisted) to set their __path__ to every directory with the same name as the plugin package which is not also a package on your python path.  In other words, if you have 2 sys.path entries, a/ and b/, and one package, x.plugins, in a/x/plugins/__init__.py with this trick in it, then if you have a file b/x/plugins/foo.py, it will be considered to contain the module "x.plugins.foo".  This requires that you do not have a file b/x/__init__.py or b/x/plugins/__init__.py.  If you do, this hack will treat the two paths the same way that Python does, as duplicate packages on your path, so the package in a/ is loaded and the package in b/ is ignored.
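To make the rule concrete, here is a sketch of a helper that computes such a list of directories.  "plugin_package_paths" is a hypothetical name and this is an illustration of the idea, not Twisted's code; as a simplifying assumption it only checks for an __init__.py in the innermost directory.

```python
import os
import sys
import tempfile


def plugin_package_paths(package_name, search_path=None):
    """Hypothetical helper: for each path entry, return the directory named
    like `package_name` if it exists and is NOT itself a package."""
    parts = package_name.split(".")
    entries = sys.path if search_path is None else search_path
    result = []
    for entry in entries:
        candidate = os.path.join(entry, *parts)
        is_package = os.path.exists(os.path.join(candidate, "__init__.py"))
        if os.path.isdir(candidate) and not is_package:
            result.append(os.path.abspath(candidate))
    return result


# Demo layout: a/ holds the real package, b/ holds a bare plugin directory.
base = tempfile.mkdtemp()
a = os.path.join(base, "a")
b = os.path.join(base, "b")
os.makedirs(os.path.join(a, "x", "plugins"))
os.makedirs(os.path.join(b, "x", "plugins"))
for package_dir in (os.path.join(a, "x"), os.path.join(a, "x", "plugins")):
    open(os.path.join(package_dir, "__init__.py"), "w").close()
open(os.path.join(b, "x", "plugins", "foo.py"), "w").close()

extra_paths = plugin_package_paths("x.plugins", [a, b])
```

With that layout, extra_paths contains only b/x/plugins.  A plugin package using a helper like this would put something along the lines of `__path__.extend(plugin_package_paths(__name__))` in its __init__.py.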

The distinction between packages and path entries is why all the Twisted and Divmod projects conventionally have capitalized directory names but lowercase package names.  "Twisted" is where your path entry should point; "twisted" is the python package that is loaded from that path entry.  "Twisted" should never have an __init__.py in it.  "twisted" always should.  This goes the same for "Axiom" and "axiom", "Mantissa" and (the unfortunately named) "xmantissa".  You will sometimes encounter other examples of this style of naming floating around the web.

When using Twisted and Divmod infrastructure, keeping this distinction clear is critical, because otherwise it is difficult to develop plugins independently.  You probably don't want to copy your development plugins into your Twisted installation - they're part of your source repository, after all, not ours.  However, keeping the distinction clear in your mind will avoid lots of obscure problems with duplicate classes and naming, so it's generally a good idea even if you don't like our naming conventions.

Please let me know in the comments which parts of this post you found useful, if any.  I know it's a bit rambling, and covers a number of different topics, some of which may be obvious and some of which might be inscrutable.  I've experienced quite a bit of confusion when talking to other python programmers about this stuff, but I'm not sure if it was my awkward explanation of Twisted's plugin system or some inherent issue in Python's path management.
2 comments or Leave a comment
From: ianbicking Date: August 23rd, 2007 05:58 am (UTC) (Link)


What's the advantage of this sort of thing over what Setuptools does? (As Setuptools does most of the same things in terms of path manipulation, and also supports a style of plugins)
From: glyf Date: August 23rd, 2007 06:42 am (UTC) (Link)

Re: setuptools

I don't really understand your question, because setuptools does a lot, much of which is more or less irrelevant to the path manipulation stuff that I'm talking about here. Also, I only mentioned the plugin stuff in passing and didn't use any plugin APIs; I just discussed the way it does path discovery using structured path objects. I think I might understand the underlying question though: I assume you're really asking about the approach to pluggability in twisted.plugin vs. the approach in setuptools.

Twisted's plugin system is a very simple system for loading code based on the type of plugin to be provided. It's very simple, very focused, and built out of a small collection of structured objects. It is more or less agnostic to issues of installation, except for the optimizations that can be applied at deployment time to cache and speed up plugin lookup.

In effect, it's a way to adapt your runtime environment to a particular interface.

setuptools, by contrast, is very much focused on installation and deployment. "development mode" is a specialized form of installation, invoked by running a 'setup.py' command. It has lots of mini-languages for describing various pieces of metadata.

I obviously prefer Twisted's approach, because it seems to me like setuptools introduces at least a portion of an edit/compile/run cycle into development. With setuptools you write external metadata in ini files and setup.py files. With Twisted you just put python files in your existing Python path.

In the interests of fairness, though, Twisted's plugin system is somewhat lower level. For example, you would have to implement resource_string like this:
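from twisted.python.modules import getModule

def resource_string(module_name, resource_name):
    # Look the module up without importing it, then read a sibling file.
    module = getModule(module_name)
    return module.filePath.parent().child(resource_name).getContent()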
Similar to setuptools, this does actually work with zip files, and could easily be made to work with other import systems. It's operating at a lower level, because it is a virtual filesystem paired with modules, rather than an abstract notion of related resources. You have to be familiar with the twisted.python.modules 'module' interface, 'getModule', and as-yet-ad-hoc 'FilePath' interface.
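For readers without Twisted installed, a rough standard-library equivalent can be sketched like this.  "resource_bytes" is a made-up name, and unlike the version above this sketch assumes the module lives on the plain filesystem, so it will not work for modules inside zip files.

```python
import importlib.util
import os
import pathlib
import sys
import tempfile


def resource_bytes(module_name, resource_name):
    """Hypothetical helper: read a file sitting next to a top-level module,
    locating the module through its import spec rather than importing it."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        raise ImportError("no file-backed module named %r" % (module_name,))
    return (pathlib.Path(spec.origin).parent / resource_name).read_bytes()


# Demo: a path entry holding a module and a data file side by side.
entry = tempfile.mkdtemp()
with open(os.path.join(entry, "pathdemo_res.py"), "w") as f:
    f.write("")
with open(os.path.join(entry, "greeting.txt"), "wb") as f:
    f.write(b"hello")
sys.path.insert(0, entry)
importlib.invalidate_caches()

data = resource_bytes("pathdemo_res", "greeting.txt")
```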

Of course, given that I like Twisted better, I'd also mention that this additional level makes it easier to operate on directory collections of resources as well :).

Is this the kind of thing that you meant?