A common sort of script that deals with a filesystem is to open each file in a directory hierarchy with a given path and do something to its contents. For example, let's write a program that prints out a list of all Python modules (with a .py extension) in a tree which contain shebang lines.
Here's the script using good old os.path:
import sys import os def os_shebangs(pathname): for dirpath, dirnames, filenames in os.walk(pathname): for filename in filenames: fullpath = os.path.join(dirpath, filename) if (fullpath.endswith(".py") and file(fullpath, "rb").readline().startswith("#!")): yield fullpath def os_show_shebangs(pathname): for path in os_shebangs(pathname): sys.stdout.write("%s: %s\n" % ( path, file(path, "rb").readline()[2:].strip())) if __name__ == '__main__': os_show_shebangs(sys.argv)
Pretty normal looking python code; not too much wrong with it. At 20 lines and 596 characters long, it's not too complex.
Now let's have a look at a similarly idiomatic version using FilePath:
At 18 lines and 471 characters, it's almost exactly 20% smaller than the version that uses os.path. However, a small space savings is hardly the most interesting property of this code. The advantages over the version that uses os.path:import sys from twisted.python.filepath import FilePath def shebangs(path): for p in path.walk(): if (p.basename().endswith(".py") and p.open().readline().startswith("#!")): yield p def showShebangs(pathobj): for path in shebangs(pathobj): sys.stdout.write("%s: %s\n" % ( path.path, path.open().readline()[2:].strip())) if __name__ == '__main__': showShebangs(FilePath(sys.argv))
- It's easier to test. You can use a fake FilePath object rather than needing to replace the whole "os" module and the "file" builtin.
- It's easier to read. You need fewer names; rather than os, os.path, and builtins, the code talks mainly to one object.
- It's easier to write. How many of you honestly remembered that "dirpath, dirnames, filenames" is the order of the tuples yielded from os.walk?
- It's easier to secure. If you wanted to allow untrusted users to supply input to the os.path version, you need to be very, very careful. What about "/"? What about ".."? With FilePath, you simply supply the input to the 'child' method, and...
>>> from twisted.python.filepath import FilePath >>> fp = FilePath(".") >>> x = fp.child("okay") >>> y = fp.child("..") Traceback (most recent call last): File "<stdin>", line 1, in <module> File "twisted/python/filepath.py", line 308, in child raise InsecurePath("%r is not a child of %s" % (newpath, self.path)) twisted.python.filepath.InsecurePath: '/home' is not a child of /home/glyph >>> z = fp.child("hello/world") Traceback (most recent call last): File "<stdin>", line 1, in <module> File "twisted/python/filepath.py", line 305, in child raise InsecurePath("%r contains one or more directory separators" % (path,)) twisted.python.filepath.InsecurePath: 'hello/world' contains one or more directory separators
- It's easier to extend. As of revision 22464 of Twisted (i.e. the next release) you can replace twisted.python.filepath.FilePath with twisted.python.zippath.ZipArchive, and this exact same code can operate on zip files.