Scribbles in the Dark
This blog is closed; please visit http://glyph.twistedmatrix.com/ instead.
Highlighting buried treasure in Twisted
I've previously blogged about twisted.python.modules, but that module assumes you already know another API inside Twisted: twisted.python.filepath.  Unfortunately, this module is rather under-documented and under-publicized, despite being extremely useful.  Unlike a lot of Twisted, much of the code in twisted.python can be extracted and used on its own, regardless of whether the program in question is networked or even event-driven.  This is especially true of FilePath, which is completely blocking (although sometimes I wish there were at least a non-blocking version of it).

A common kind of filesystem script opens each file in a directory hierarchy rooted at a given path and does something with its contents.  For example, let's write a program that prints a list of all the Python modules (files with a .py extension) in a tree that begin with a shebang line.

Here's the script using good old os.path:
import sys
import os

def os_shebangs(pathname):
    for dirpath, dirnames, filenames in os.walk(pathname):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            if (fullpath.endswith(".py") and
                file(fullpath, "rb").readline().startswith("#!")):
                yield fullpath

def os_show_shebangs(pathname):
    for path in os_shebangs(pathname):
        sys.stdout.write("%s: %s\n" % (
                path,
                file(path, "rb").readline()[2:].strip()))

if __name__ == '__main__':
    os_show_shebangs(sys.argv[1])

Pretty normal-looking Python code; there's not much wrong with it.  At 20 lines and 596 characters, it's not too complex.

Now let's have a look at a similarly idiomatic version using FilePath:
import sys
from twisted.python.filepath import FilePath

def shebangs(path):
    for p in path.walk():
        if (p.basename().endswith(".py") and
            p.open().readline().startswith("#!")):
            yield p

def showShebangs(pathobj):
    for path in shebangs(pathobj):
        sys.stdout.write("%s: %s\n" % (
                path.path,
                path.open().readline()[2:].strip()))

if __name__ == '__main__':
    showShebangs(FilePath(sys.argv[1]))

At 18 lines and 471 characters, it's about 21% smaller than the version that uses os.path.  However, a small space saving is hardly the most interesting property of this code.  Its advantages over the os.path version:
  • It's easier to test.  You can use a fake FilePath object rather than needing to replace the whole "os" module and the "file" builtin.
  • It's easier to read.  You need fewer names; rather than os, os.path, and builtins, the code talks mainly to one object.
  • It's easier to write.  How many of you honestly remembered that "dirpath, dirnames, filenames" is the order of the tuples yielded from os.walk?
  • It's easier to secure.  If you wanted to let untrusted users supply input to the os.path version, you would need to be very, very careful.  What about "/"?  What about ".."?  With FilePath, you simply pass the input to the 'child' method, and...
    >>> from twisted.python.filepath import FilePath
    >>> fp = FilePath(".")
    >>> x = fp.child("okay")
    >>> y = fp.child("..")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "twisted/python/filepath.py", line 308, in child
        raise InsecurePath("%r is not a child of %s" % (newpath, self.path))
    twisted.python.filepath.InsecurePath: '/home' is not a child of /home/glyph
    >>> z = fp.child("hello/world")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "twisted/python/filepath.py", line 305, in child
        raise InsecurePath("%r contains one or more directory separators" % (path,))
    twisted.python.filepath.InsecurePath: 'hello/world' contains one or more directory separators
  • It's easier to extend.  As of revision 22464 of Twisted (i.e., the next release), you can replace twisted.python.filepath.FilePath with twisted.python.zippath.ZipArchive, and this exact same code will operate on zip files.
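To make the testing point concrete, here is a minimal sketch of a test that drives the post's shebangs() generator with an in-memory fake instead of the real filesystem. FakePath is a hypothetical class written only for this illustration; it implements just the three methods shebangs() calls (walk(), basename() and open()). The sketch assumes Python 3, where FilePath.open() returns a binary file, so the prefix check uses b"#!".

```python
import io
import posixpath


def shebangs(path):
    # The post's FilePath-based generator, with a bytes prefix because
    # FilePath.open() returns a binary file on Python 3.
    for p in path.walk():
        if p.basename().endswith(".py") and p.open().readline().startswith(b"#!"):
            yield p


class FakePath(object):
    """Hypothetical in-memory stand-in for the few FilePath methods shebangs() uses."""

    def __init__(self, path, contents=b"", children=()):
        self.path = path            # slash-separated path string, like FilePath.path
        self.contents = contents    # file contents as bytes
        self.children = list(children)

    def basename(self):
        return posixpath.basename(self.path)

    def walk(self):
        # Like FilePath.walk(): this path first, then every descendant, depth-first.
        yield self
        for child in self.children:
            for descendant in child.walk():
                yield descendant

    def open(self):
        return io.BytesIO(self.contents)


def test_finds_only_python_files_with_shebangs():
    # No real files are touched: the whole "filesystem" lives in memory.
    tree = FakePath("/src", children=[
        FakePath("/src/script.py", b"#!/usr/bin/env python\nprint('hi')\n"),
        FakePath("/src/lib.py", b"import os\n"),
        FakePath("/src/notes.txt", b"#!/bin/sh\n"),
    ])
    assert [p.path for p in shebangs(tree)] == ["/src/script.py"]


if __name__ == "__main__":
    test_finds_only_python_files_with_shebangs()
    print("ok")
```

Saved in a test_*.py file, pytest would collect the test function automatically; running the file directly just calls it. Nothing had to be monkeypatched, because the code under test only ever talks to the object it was handed.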
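The two checks visible in the child() session above can be approximated with the standard library alone, which also shows why doing it by hand with os.path is fiddly. This is a rough sketch, not Twisted's implementation: safe_child and the local InsecurePath class are hypothetical names for this illustration, and the sketch ignores platform-specific cases such as Windows drive letters ("C:foo").

```python
import os


class InsecurePath(Exception):
    """Hypothetical local stand-in for twisted.python.filepath.InsecurePath."""


def safe_child(parent, name):
    """Return the absolute path of `name` directly inside `parent`.

    Raises InsecurePath if `name` is not a single, direct child segment.
    """
    parent = os.path.abspath(parent)
    norm = os.path.normpath(name)
    # Rule 1: one path segment only, so no separators ("hello/world", "/etc").
    separators = [os.sep] + ([os.altsep] if os.altsep else [])
    if any(sep in norm for sep in separators):
        raise InsecurePath("%r contains one or more directory separators" % (name,))
    child = os.path.abspath(os.path.join(parent, norm))
    # Rule 2: the result must sit directly under parent ("..", "." and "" fail here).
    if child == parent or os.path.dirname(child) != parent:
        raise InsecurePath("%r is not a child of %s" % (child, parent))
    return child
```

In real code, FilePath.child is the better choice; the point of the sketch is how many edge cases ("..", ".", "", absolute paths, embedded separators) a hand-rolled os.path version has to remember.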
Not only does FilePath provide these benefits, it also has very few dependencies.  Even if you don't like Twisted much, you can use twisted.python.filepath by copying just 3 modules into your project (twisted.python.filepath, twisted.python.win32, and twisted.python.runtime) and adjusting their imports to be relative.  Since your code needs only one import for FilePath and otherwise consists mostly of method calls, it will work just as well with Twisted's version as with your own copy.  So, share and enjoy!
Comments
From: oubiwann Date: February 13th, 2008 06:04 pm (UTC)

Nice Write-up :-)

I've been thinking about sharing the goodness that is FilePath for a few months -- now I don't have to :-) Using it has saved me sooo much time...
From: djfroofy Date: February 14th, 2008 09:29 pm (UTC)

Please convince Guido to make this part of the standard library

I mostly agree. twisted.python.filepath provides a great API, and testability is important. The only functionality I wish FilePath.walk provided is the ability to stop recursion at certain points, as os.walk lets you do:
  for dirpath, dirs, files in os.walk(top):
      if '.svn' in dirs:
          dirs.remove('.svn')
Maybe this could be as simple as a keyword argument to walk:
  fp.walk(exclude=['.svn','evil_symlink2root'])
From: glyf Date: February 15th, 2008 02:14 am (UTC)

Re: Please convince Guido to make this part of the standard library

Submit a patch! :) This sounds straightforward / well specified enough I'll even commit to a review... (However, something that lets you actually modify the iteration would be better than something that took static strings.)

As far as getting into the stdlib - agitate on python-dev. I'll help you do any necessary coding if you can do the legwork to get everyone to agree that it's desirable (as opposed to one of the 30 "OO" filesystem wrappers that people have written for the stdlib, or nothing at all). I don't have the energy for that.
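To make the two proposals in this exchange concrete, here is a standard-library sketch of a walk that takes a callable deciding whether to descend into each directory, rather than a static list of names to exclude. walk_paths and its descend parameter are hypothetical names for this illustration; the pruning relies on os.walk's documented top-down behaviour, where editing dirnames in place controls which directories are visited. Unlike FilePath.walk, the order here is directory-by-directory rather than strictly depth-first. (Later Twisted releases gave FilePath.walk an optional descend predicate in a similar spirit; check the API documentation for the version you use.)

```python
import os


def walk_paths(top, descend=None):
    """Yield top and every path beneath it, in the spirit of FilePath.walk().

    `descend` is an optional one-argument callable; a directory is still
    yielded, but only entered if descend(its_path) returns true.
    By default every directory is entered.
    """
    yield top
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)
        if descend is not None:
            # os.walk (top-down) only recurses into what is left in dirnames,
            # so pruning the list in place controls the traversal.
            dirnames[:] = [d for d in dirnames
                           if descend(os.path.join(dirpath, d))]
```

For the .svn case, walk_paths(top, descend=lambda p: os.path.basename(p) != ".svn") still lists the .svn directory itself but never looks inside it; a caller who also wants to hide it can filter the yielded paths. Because the predicate sees full paths, it can also express rules a list of names cannot, such as skipping anything under a particular prefix.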
From: djfroofy Date: February 20th, 2008 04:45 pm (UTC)

Re: Please convince Guido to make this part of the standard library

I don't have the energy to agitate on python-dev either nor do I have the required diplomacy skills ;) Anyhow, here's the patch:

http://twistedmatrix.com/trac/attachment/ticket/3044/filepath.py.diff#preview
From: zooko Date: March 4th, 2008 05:02 pm (UTC)

too much coupling

This kind of thing makes me sad.

There are lots of people who could benefit from twisted.python.filepath, and there are comparable packages which twisted could use in order to gain the benefit without the cost of maintaining the package (at least one of which is being considered for inclusion in the Python Standard Library), but it isn't going to happen -- non-Twisted-requiring projects aren't going to benefit from twisted.python.filepath, and Twisted isn't going to benefit from those other packages, because Twisted doesn't use good packaging technology so that it can use other people's code and other people can use its code in an easy, manageable way.

Frankly, suggesting that people could copy a few source files is the kiss of death, for the prospect of that code being re-used by other people.

Twisted is falling behind because of this. Please ponder the postscript to this page:

http://www.kieranholland.com/code/documentation/nevow-stan/

This guy says, as I interpret: Nevow is technically better, but Django is the future because it makes it easy for people to re-use components in isolation. The same could be said of many of the Twisted and Divmod offerings.
From: glyf Date: March 4th, 2008 08:32 pm (UTC)

Re: too much coupling

> there are comparable packages which twisted could use in order to gain the benefit without the cost of maintaining the package
Not true, as far as I'm aware. I believe you're talking about the many "object oriented" path abstractions for Python. Twisted uses FilePath for two main requirements, neither of which is OO nicety: security in contexts like a web server, and polymorphism to zip files. Do any of the path libraries you're talking about implement either of these requirements?
> Frankly, suggesting that people could copy a few source files is the kiss of death, for the prospect of that code being re-used by other people.
This is exactly the strategy that the module being considered for inclusion in Python under PEP 355 uses.
> Twisted doesn't use good packaging technology
Python doesn't have any good packaging technology. distutils is sparsely documented and difficult to do anything with without touching internal implementation details; setuptools is a mine-field unless your sys.path is set up exactly like PJE's. Granted, Twisted's setup.py is especially bad, but that is being worked on. If you're really concerned about it, consider addressing some of the tickets mentioned by the release automation status document on the Twisted wiki.
> Nevow is technically better, but Django is the future because it makes it easy for people to re-use components in isolation.
That's not the way I read it at all. From my perspective it looks like he's saying "Twisted is technically better, but Django has a much nicer website and therefore more people working to improve it." I don't think this has anything to do with packaging technology; Django is pitched at a much, much bigger audience than Twisted is, and will therefore generate more interest. Importantly, pretty much the entire audience is involved in building websites, and hence much more conversant with PR and graphic design than the Twisted audience is; I think we're lucky to be doing as well as we are on that front.

But by all means, help us deal with our release automation problems :).
Glyph Lefkowitz (User: glyf)