Glyph Lefkowitz (glyf) wrote,

Highlighting buried treasure in Twisted

I've previously blogged about twisted.python.modules, but it assumes you know about another API inside Twisted, twisted.python.filepath.  Unfortunately this module is rather under-documented and under-publicized, despite being extremely useful.  Unlike a lot of Twisted, much of the code in twisted.python can be extracted and used by itself, regardless of whether the program in question is networked or even event-driven.  This is especially true of FilePath, which is completely blocking, although sometimes I wish there were at least a version of it that wasn't.

A common sort of script that deals with a filesystem is to open each file in a directory hierarchy with a given path and do something to its contents.  For example, let's write a program that prints out a list of all Python modules (with a .py extension) in a tree which contain shebang lines.

Here's the script using good old os.path:
import sys
import os

def os_shebangs(pathname):
    for dirpath, dirnames, filenames in os.walk(pathname):
        for filename in filenames:
            fullpath = os.path.join(dirpath, filename)
            if (fullpath.endswith(".py") and
                file(fullpath, "rb").readline().startswith("#!")):
                yield fullpath

def os_show_shebangs(pathname):
    for path in os_shebangs(pathname):
        sys.stdout.write("%s: %s\n" % (
                path,
                file(path, "rb").readline()[2:].strip()))

if __name__ == '__main__':
    os_show_shebangs(sys.argv[1])

Pretty normal looking python code; not too much wrong with it.  At 20 lines and 596 characters long, it's not too complex.

Now let's have a look at a similarly idiomatic version using FilePath:
import sys
from twisted.python.filepath import FilePath

def shebangs(path):
    for p in path.walk():
        if (p.basename().endswith(".py") and
            p.open().readline().startswith("#!")):
            yield p

def showShebangs(pathobj):
    for path in shebangs(pathobj):
        sys.stdout.write("%s: %s\n" % (
                path.path,
                path.open().readline()[2:].strip()))

if __name__ == '__main__':
    showShebangs(FilePath(sys.argv[1]))
At 18 lines and 471 characters, it's almost exactly 20% smaller than the version that uses os.path.  However, a small space savings is hardly the most interesting property of this code.  The advantages over the version that uses os.path:
  • It's easier to test.  You can use a fake FilePath object rather than needing to replace the whole "os" module and the "file" builtin.
  • It's easier to read.  You need fewer names; rather than os, os.path, and builtins, the code talks mainly to one object.
  • It's easier to write.  How many of you honestly remembered that "dirpath, dirnames, filenames" is the order of the tuples yielded from os.walk?
  • It's easier to secure.  If you wanted to allow untrusted users to supply input to the os.path version, you need to be very, very careful.  What about "/"?  What about ".."?  With FilePath, you simply supply the input to the 'child' method, and...
    >>> from twisted.python.filepath import FilePath
    >>> fp = FilePath(".")
    >>> x = fp.child("okay")
    >>> y = fp.child("..")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "twisted/python/filepath.py", line 308, in child
        raise InsecurePath("%r is not a child of %s" % (newpath, self.path))
    twisted.python.filepath.InsecurePath: '/home' is not a child of /home/glyph
    >>> z = fp.child("hello/world")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "twisted/python/filepath.py", line 305, in child
        raise InsecurePath("%r contains one or more directory separators" % (path,))
    twisted.python.filepath.InsecurePath: 'hello/world' contains one or more directory separators
  • It's easier to extend.  As of revision 22464 of Twisted (i.e. the next release) you can replace twisted.python.filepath.FilePath with twisted.python.zippath.ZipArchive, and this exact same code can operate on zip files.
Not only does FilePath provide these benefits, it has very few dependencies.  Even if you don't like Twisted much, you can use twisted.python.filepath by copying only 3 modules into your project (twisted.python.filepath, twisted.python.win32, and twisted.python.runtime) and twiddling the appropriate imports to be relative.  Since FilePath is only one import for your code, and mostly consists of method calls, it will easily work with Twisted's version or your own.  So, share and enjoy!
  • Post a new comment

    Error

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded  

  • 6 comments