Scribbles in the Dark
Twisted Do-Over - Please Visit http://glyph.twistedmatrix.com/ - This Blog Is Closed.
Twisted Do-Over
Recent fanfare over Twisted, including the totally awesome book which you should go buy right now, has gotten me thinking - is Twisted really all that great? I believe that while it is still probably the best thing out there for doing what it does, there are a few things I wish had happened differently. So here's a laundry list of things that I wish Twisted did differently, and how I would implement them if I were starting from scratch today. Maybe eventually this will be a roadmap - right now, it's wishful thinking, and too vague to be any real kind of spec.

In the innermost guts of the reactor, there is no real normalization of events. The reactor is sort of a fused engine block where all of the "work" of dispatching events happens. I'd rather that were unrolled a bit. Especially in today's world of generator-heavy Python, I'd rather that the reactor core looked something like a set of wrapped iterators: a base generator that ran "select()" and yielded file descriptors ready for reading or writing; a generator wrapping that one which did OS- and FD-specific I/O, like recv() and send(); a wrapper above that generating application-level request/response pairs; and so forth. Think of a web server (in very, very broad strokes; this is not a precise API):

def webServer(self, connection):
    for request in parseRequests(connection.inputStream):
        response = self.respondTo(request)
        yield response

Such a system would also make it a lot clearer what "one reactor iteration" meant. Rather than some arbitrary constellation of behaviors which happened to be scheduled "at the same time", one reactor iteration could be made to correspond exactly with one tick of a user-provided iterator.
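
To make the layering a bit more concrete, here is a rough sketch using only the standard library. Every name in it (ready_to_read, received_chunks, parse_requests, echo_server, demo) is made up for illustration, the "protocol" is just newline-terminated lines, and none of this is how Twisted's reactor is actually written. It only shows how each generator wraps the one beneath it, so that one tick of the outermost iterator is one well-defined unit of work.

```python
import select
import socket


def ready_to_read(sockets, timeout=1.0):
    """Base layer: one select() call per tick; yields the readable sockets."""
    while sockets:
        readable, _, _ = select.select(list(sockets), [], [], timeout)
        if not readable:
            return  # nothing arrived within the timeout; end the sketch
        yield readable


def received_chunks(sockets):
    """I/O layer: turn readiness events into (socket, bytes) pairs."""
    for readable in ready_to_read(sockets):
        for sock in readable:
            data = sock.recv(4096)
            if data:
                yield sock, data
            else:
                sockets.remove(sock)  # peer closed the connection
                sock.close()


def parse_requests(sockets):
    """Protocol layer: buffer bytes per socket and yield complete lines."""
    buffers = {}
    for sock, data in received_chunks(sockets):
        buffers[sock] = buffers.get(sock, b"") + data
        while b"\n" in buffers[sock]:
            line, buffers[sock] = buffers[sock].split(b"\n", 1)
            yield sock, line


def echo_server(sockets):
    """Application layer: one response per request."""
    for sock, request in parse_requests(sockets):
        response = request.upper() + b"\n"
        sock.sendall(response)
        yield response


def demo():
    """Drive the whole stack over a local socket pair."""
    client, server = socket.socketpair()
    client.sendall(b"hello\nworld\n")
    client.shutdown(socket.SHUT_WR)  # lets the server side see end-of-stream
    responses = list(echo_server([server]))
    chunks = []
    while True:
        data = client.recv(4096)
        if not data:
            break
        chunks.append(data)
    client.close()
    return responses, b"".join(chunks)
```

Calling next() on echo_server(...) performs exactly the work needed to produce one response, which is the "one tick" idea; demo() simply runs the iterator to exhaustion.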

Further up from that (but using that facility), I wish that we had included SEDA's notion of a "stage". This would have made a few things a lot easier. For example, it would be nice to have a well-defined notion of a request/response processing webserver that could generate a "response" object, possibly from a thread, but have that "response" be processed entirely asynchronously in the main thread.

In particular, having a notion of a "stage" would make it a lot easier to run full database transactions within threads, isolating them from communications code, by stipulating that transactions must produce notifications or network I/O in the form of output objects placed into a queue. Recently I have surveyed some open-source Twisted code in the wild, and answered a bunch of questions, which together suggest to me that many Twisted developers now believe that the correct way to interface with a relational database is to turn *EVERY* SQL statement into a Deferred which is handled individually.
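
As a sketch of what such a stage might look like, assuming nothing beyond the standard library: the Stage class, render_response and serve below are hypothetical names, not anything from Twisted or from the SEDA paper's code. Handlers run in worker threads and are only allowed to return output objects; the caller (standing in for the main thread) is the only place that would touch the network.

```python
import queue
import threading


class Stage:
    """Hypothetical SEDA-style stage: input queue, worker threads, output queue."""

    def __init__(self, handler, workers=2):
        self.handler = handler
        self.inbox = queue.Queue()
        self.outbox = queue.Queue()
        self._threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is None:  # sentinel: shut this worker down
                break
            # The handler never does network I/O; it only returns an output object.
            self.outbox.put(self.handler(event))

    def submit(self, event):
        self.inbox.put(event)

    def close(self):
        """Send one sentinel per worker and wait for all of them to finish."""
        for _ in self._threads:
            self.inbox.put(None)
        for thread in self._threads:
            thread.join()


def render_response(request):
    """Blocking work (a database query, say) that is safe to run in a thread."""
    return {"request": request, "body": "hello, %s" % request}


def serve(requests):
    stage = Stage(render_response)
    for request in requests:
        stage.submit(request)
    stage.close()  # every worker has exited, so the outbox is complete
    # The "main thread" drains the outbox; real code would write to sockets here.
    responses = []
    while not stage.outbox.empty():
        responses.append(stage.outbox.get())
    return sorted(responses, key=lambda response: response["request"])
```

Because the only contact point between the stage and the rest of the program is a pair of queues, the workers could in principle be moved to other threads or processes without the handler code changing.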

This is a tangent, but allow me to offer a bit of advice. The documentation is really poor, and never says this, but using Twisted, or rather adbapi, to convert every single SQL statement into a separate transaction and handle its results separately, has a whole slew of problems. First of all, it's slow: you have to acquire and release thread mutexes on every operation. Second, it is unsafe. Your conceptual transactions might be interrupted at any moment, leaving your database in an inconsistent state. Also in the realm of safety, notifications generated from within a transaction that gets rolled back are sent to the network anyway, so two different database-using processes talking to each other can trivially become inconsistent. Take a look at the 'runInteraction' API and give some thought to what represents a "whole" transaction in your database. Moving this transaction processing out of the main loop *IS* an appropriate use of threads, and in fact adbapi does it internally. This is doubly true if your application or your SQL layer does any caching of SQL results; to be sure that the cache is consistent with the DB, you have to keep track of whether and when transactions are rolled back.
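
Here is a small standard-library sketch of the "whole transaction in one thread" idea, including the rule that notifications from a rolled-back transaction must never escape. The run_interaction function is a hypothetical stand-in written for this example; it borrows the spirit of adbapi's runInteraction but is not that API, and the accounts table, transfer functions and demo are invented for illustration.

```python
import os
import sqlite3
import tempfile
import threading


def run_interaction(db_path, interaction):
    """Run interaction(cursor, notify) as ONE transaction in a worker thread.

    Returns the notifications it queued, or [] if the transaction rolled back.
    """
    released = []

    def worker():
        connection = sqlite3.connect(db_path)
        pending = []
        try:
            interaction(connection.cursor(), pending.append)
            connection.commit()
        except Exception:
            # A real version would also report the failure to the caller.
            connection.rollback()
            pending.clear()  # rolled back: nothing may reach the network
        finally:
            connection.close()
        released.extend(pending)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return released


def transfer(cursor, notify):
    cursor.execute("UPDATE accounts SET balance = balance - 10 WHERE name = 'a'")
    cursor.execute("UPDATE accounts SET balance = balance + 10 WHERE name = 'b'")
    notify("transfer complete")


def broken_transfer(cursor, notify):
    cursor.execute("UPDATE accounts SET balance = balance - 10 WHERE name = 'a'")
    notify("transfer complete")  # queued, but must never be sent
    raise RuntimeError("failed halfway through")


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    setup.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    setup.commit()
    setup.close()

    sent = run_interaction(path, transfer)
    dropped = run_interaction(path, broken_transfer)

    check = sqlite3.connect(path)
    balances = dict(check.execute("SELECT name, balance FROM accounts"))
    check.close()
    return sent, dropped, balances
```

In this sketch the failed transfer leaves both the balances and the outgoing notifications untouched, which is exactly the consistency that per-statement Deferreds cannot give you.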

Back to the main point. I also would have designed the reactor access API a bit differently. 'from twisted.internet import reactor' looks convenient, but is highly misleading. Figuring out what reactor your process is currently using is part of a more general problem of execution context. There are other objects that applications wish to find in the same way: the current database connection, for example, the current log monitor, or the current HTTP request. twisted.python.context deals with this in a general manner, but because it is not used consistently to access important objects, it has not been subjected to the testing and refining that it has needed. The worst side-effect of this has been the "context object" abomination that afflicts Nevow and Web2.
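
For a feel of what looking things up through an execution context means, here is a sketch built on the standard library's contextvars module. To be clear, this is a swapped-in mechanism and not twisted.python.context; current_reactor, FakeReactor, call_with_reactor and describe are all hypothetical names.

```python
import contextvars

# Hypothetical context slot; nothing below is Twisted API.
current_reactor = contextvars.ContextVar("current_reactor")


class FakeReactor:
    """Stand-in object so the example has something to find."""

    def __init__(self, name):
        self.name = name


def call_with_reactor(reactor, function, *args):
    """Run function in a copy of the context where reactor is the current one."""
    context = contextvars.copy_context()

    def runner():
        current_reactor.set(reactor)  # only visible inside the copied context
        return function(*args)

    return context.run(runner)


def describe():
    # Library code asks the context at call time instead of importing a global.
    return "running on " + current_reactor.get().name
```

The same pattern would apply to the current database connection, log monitor or HTTP request: the caller establishes the value for the duration of a call, and code further down finds it without it being threaded through every signature.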

I also would have designed Deferreds as more central to the whole thing, and optimized the hell out of them rather than worrying about their performance. For example, it would be a lot easier for many applications if deferLater were the default behavior of callLater. The same goes for deferToThread vs. callFromThread. The main reason that the reactor does not use Deferreds for these, or for the client connection API, is a general feeling of nervousness that Deferreds would be hard to implement in C, so the reactor API shouldn't require them. In retrospect this is silly (especially now that James Knight has actually gotten further implementing Deferreds in C than anyone else has on getting the reactor implemented there).
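
A toy illustration of "callLater that hands back a Deferred", with the loud caveat that ToyDeferred and ToyClock are invented for this sketch: they are drastically simplified (callbacks only, no errbacks, no cancellation) and are not Twisted's Deferred or reactor.

```python
import heapq
import itertools


class ToyDeferred:
    """Drastically simplified stand-in for a Deferred: callbacks only."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addCallback(self, function):
        if self._fired:
            self._result = function(self._result)
        else:
            self._callbacks.append(function)
        return self

    def callback(self, result):
        self._fired = True
        self._result = result
        for function in self._callbacks:
            self._result = function(self._result)
        self._callbacks = []


class ToyClock:
    """Hypothetical scheduler whose callLater returns a ToyDeferred."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._counter = itertools.count()  # tie-breaker for equal times

    def callLater(self, delay, function, *args):
        deferred = ToyDeferred()
        entry = (self.now + delay, next(self._counter), deferred, function, args)
        heapq.heappush(self._queue, entry)
        return deferred

    def advance(self, seconds):
        """Move time forward and fire everything that has come due."""
        self.now += seconds
        while self._queue and self._queue[0][0] <= self.now:
            _, _, deferred, function, args = heapq.heappop(self._queue)
            deferred.callback(function(*args))


def demo():
    clock = ToyClock()
    seen = []
    clock.callLater(5, lambda: 21).addCallback(lambda n: n * 2).addCallback(seen.append)
    clock.advance(1)
    early = list(seen)  # nothing is due yet
    clock.advance(4)
    return early, seen
```

The point of the design is that the scheduled call's result flows straight into a callback chain, so callers don't need to wrap every delayed call by hand.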

Current Mood: pensive

6 comments
From: srlamb Date: September 25th, 2005 04:18 pm (UTC)


I'm just getting into Twisted, so I can't comment too much on a lot of what you said.

I don't think the mutexes are a real performance problem, though. On my Linux/NPTL system, locking and unlocking is almost as fast as an empty shared library function call in the no-contention case.
    $ ./mutex_bench 
    Mutex locking/unlocking took 3.56646 seconds
    Empty function calls took 1.3248 seconds

(That's http://www.slamb.org/svn/repos/trunk/projects/misc/mutex_bench.c)

I'm sure that time is completely dwarfed by (A) the performance hit of round trips to the database, which might not even be on the same machine, (B) the context switches, and (C) the creation and destruction of Python objects.

I have one major performance complaint about Twisted's design. I have SMP machines sitting around, and Twisted does all of its work in a single thread/process. So if I'm CPU-bound, I'm wasting a lot of machine.

I wrote a C++-based SSL proxy recently using libevent. In my quick uniprocessor test, it seemed to be about twice stunnel's performance. (Process per connection.) On a dual-processor machine, it'd use half as much CPU time as stunnel...but still handle the same number of connections. That's only so helpful.

I'm not sure what the right answer is. I've toyed with solutions like Jeff Darcy's, but it makes the locking complicated. I haven't gotten anywhere with that.

One simple solution might be to just spawn a few processes that share your listener descriptors. Use some trylock sort of mutex magic around adding the listeners to the descriptor set, so you get wake-one instead of wake-all semantics on new connections. Then just go...whatever one happens to be listening at the time picks up the new connection. They'll split the work more or less evenly.

Actually, this approach probably could be worked into twisted without too much pain.
From: jcalderone Date: September 25th, 2005 08:31 pm (UTC)

RE: Interesting

That mutex performance benchmark is borne out (and the point strengthened, I think) by the equivalent Python test:

$ timeit -s 'from threading import Lock; L = Lock()' 'L.acquire(); L.release()'
1000000 loops, best of 3: 1.18 usec per loop
$ timeit -s 'x = type("X", (object,), {"f": lambda self: None})()' 'x.f(); x.f()'
1000000 loops, best of 3: 1.17 usec per loop

But the point stands for all the other reasons Glyph gave.

Regarding exploiting multiple processors, this is where the SEDA-style stages can come in useful: each stage can execute on a different CPU; since communication only happens between stages at a single point (the event queue), each stage can have its own thread or its own process without the application ever being aware of the fact.

This kind of change seems like it would necessarily involve a break in API compatibility - or at least a whole new world which might co-exist with the old, but which would not provide any benefits for code not written for it (i.e., all existing code). I don't think this is necessarily a bad thing, but it is clear that it's not going to happen without a lot of effort. Worst of all, with the work most of the core developers are doing, there's little incentive to actually undertake this task; the current system works well enough for most applications, and there's precious little feedback (in any form - praise, complaints, dollars) from people who are using Twisted in environments where this would make a difference.

From: glyf Date: September 26th, 2005 12:17 am (UTC)

Re: Interesting.

I didn't mean to say that it was the actual lock acquisition that was the cost; in this particular case, the mutexes being acquired will often be in contention since it's a database connection and subsequent operations are going to happen almost instantaneously; that means all the other costs you mentioned (context switching, etc.), plus acquiring and releasing the GIL, which is basically *always* under contention.

But, yeah. Interesting points, all. SMP machines are less of a concern for me than clusters, and (as with what JP said) SEDA can help with distributed clustering as well...
From: srlamb Date: September 27th, 2005 04:31 am (UTC)

Re: Interesting.

Hmm, maybe I don't understand SEDA as well as I thought, then. Aren't the "stages" just a few thread pools separated by queues? Requests move from one stage to another?

If so, I don't see how this is functionally different from adbapi's queueing. If it causes a lot of contention, wouldn't the SEDA stuff, too?

Actually, maybe so - I see a complaint on the list today about excessive context switches with thread pools. Hmm.
From: micahel Date: September 26th, 2005 08:02 am (UTC)

context objects an abomination?

I think I find nevow's context object fairly natural. It's at least explicit, which makes a change from some of the things that might have crossed people's minds... I've never been very good at understanding nevow at a philosophical level, though.
From: glyf Date: September 5th, 2006 04:12 pm (UTC)

Re: context objects an abomination?

The problem with context objects is that it's extremely hard to write about their interfaces. For example:
def render_foo(self, ctx, data):
    stuff = IBar(ctx).baz(data)
    return [stuff.moreStuff()]

Now, you can always get yourself into trouble like this when you are using a dynamic language and you don't document your contracts, but this can be especially puzzling to figure out. With regular functions, you can at least inspect the stack to figure out where the value came from; with Deferreds you can instrument the callback chain. With the context, though, the code that stuck the IBar into your page is completely gone by the time your renderer is executing, making it both hard to document ("remember IBar in the context in the template somewhere in a node superior to the one tagged with this renderer"), and almost impossible to debug.

We're very close to eliminating the context entirely these days, and I think that all of our Nevow code is much better for it.