Eventlet: Asynchronous I/O for Grownups

Event-driven asynchronous I/O is the newest chatter at the Silicon Valley High Abercrombie table. Threading, the mode of parallelism we all thought we were so smart for understanding, isn't cool anymore. Everybody who is anybody is using asynchronous I/O, and of course, there are different opinions on how it should be done. This being the software world, you can count on those opinions being vehement.

If you look at the benchmarks, all of the major async libraries for Python perform in roughly the same range. There's Twisted, Tornado, gevent, and a handful of others, but the one that really stands out in the group is Eventlet. Why is that? Two reasons:

1. You don't need to get balls deep in theory to be productive with Eventlet.
2. You need to modify very little pre-existing code to adapt a program to be event-driven.

Eventlet's approach is that asynchronous code should look like synchronous code. Why? Because it's easy for people to understand synchronous code. Thinking about callbacks and schedulers is unnecessary; after all, we have work to do. What's more, not only does asynchronous code with Eventlet look synchronous, it can also run synchronously.

Look at this Python snippet:

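import urllib2
import lxml.html

def fetch_and_parse(url):
    contents = urllib2.urlopen(url).read()
    tree = lxml.html.fromstring(contents)
    # Do some parsing on the element tree; grabbing the <title> is just an example
    value = tree.findtext('.//title')
    return value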
It looks like regular synchronous code, and ostensibly it is. The output of the URL fetch is the input to the HTML parser. However, if you have a ton of URLs to do this to, how would you parallelize it? Threads are an option, but so is Eventlet:

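import eventlet
from eventlet.green import urllib2  # cooperative drop-in for the standard urllib2

# fetch_and_parse, as defined above, must live in this module so that it
# picks up the green urllib2 imported here.

def main(urls):
    # At most 10 fetches are in flight at once, each in its own green thread
    green_pool = eventlet.GreenPool(size=10)
    results = []
    # imap yields results in the same order as the input URLs
    for result in green_pool.imap(fetch_and_parse, urls):
        results.append(result)
    return results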
This is interesting because all I've done to make a seemingly synchronous piece of code run asynchronously is swap in a patched version of the library it uses for I/O and give it a driver function. That driver could just as easily have been a handful of threads reading from a Queue, with the standard library's urllib2 imported instead.
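For comparison, here is a minimal sketch of the threaded driver mentioned above. A fixed set of worker threads pulls (index, item) pairs from a Queue. It is written for Python 3, where the module is named `queue` rather than `Queue`. `threaded_map`, its `num_threads` parameter, and the stand-in `work` function are hypothetical names chosen for illustration. `work` replaces `fetch_and_parse` so the sketch runs without a network.

```python
import queue
import threading


def threaded_map(func, items, num_threads=10):
    """Hypothetical helper: apply func to each item using worker threads fed by a Queue.

    Results come back in input order, like GreenPool.imap. No error handling:
    an exception in func kills that worker and leaves None in its slot.
    """
    items = list(items)
    results = [None] * len(items)
    tasks = queue.Queue()
    for index, item in enumerate(items):
        tasks.put((index, item))  # fill the queue before any worker starts

    def worker():
        while True:
            try:
                index, item = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained; let the thread exit
            results[index] = func(item)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def work(url):
    # Stand-in for fetch_and_parse so the sketch runs offline (assumption).
    return len(url)


urls = ["http://example.com/a", "http://example.com/bb"]
print(threaded_map(work, urls))  # [20, 21]
```

The same `work` function also runs synchronously with plain `map(work, urls)` and gives the same list. That is the point of the paragraph above: only the driver changes.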

Now hold on a second. This is a painfully contrived example, but it's such a key point: The asynchronous code looks synchronous. It can even function synchronously. All I did to make it use event-driven I/O is change the driver and patch a library. Now this is podracing!

That sort of integration has so much business value that I will easily disregard any pissing-contest performance gains that Twisted or Tornado may offer. I know that when you have code written in the "old" style and the powers that be hand down the "new" style, there is an itch to rewrite it, but rewriting known-working code is the worst thing you can do for your project.

The Eventlet developers have gone further than this, providing a facility to monkey-patch the existing standard library modules at runtime. For example, let's say you have a web app that does some Memcached I/O and some database I/O:

from eventlet import patcher
# Patches socket, select, thread, time and friends; do this early at startup
patcher.monkey_patch(all=True)
Oh look. Your application is now using asynchronous I/O. This call patches Python's socket module and a few others to make it all "just work" with Eventlet's internal coroutine switching mechanism. (Caveat: MySQLdb, which uses C-land sockets, needs a little extra treatment, but it's only a couple of lines.)
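Eventlet's patcher works by swapping objects inside standard library modules for its own cooperative versions. The sketch below is not Eventlet's code. It uses a hypothetical `recording_sleep` to show the general Python mechanism, and it shows why patching should happen before other code grabs references. Code that looks up `time.sleep` at call time sees the replacement. A name bound earlier with `from time import sleep` keeps the original.

```python
import time
from time import sleep as early_sleep  # bound BEFORE patching, so it keeps the original

original_sleep = time.sleep
calls = []


def recording_sleep(seconds):
    # Hypothetical stand-in; a cooperative library would switch to other tasks here.
    calls.append(seconds)


def pause_via_module():
    time.sleep(0.01)  # looks up time.sleep at call time, so it sees the patch


def pause_via_early_name():
    early_sleep(0.01)  # still the real time.sleep, because the name was bound earlier


time.sleep = recording_sleep  # the monkey patch
try:
    pause_via_module()
    pause_via_early_name()
finally:
    time.sleep = original_sleep  # undo the patch

print(calls)  # [0.01]
```

Only the call that went through the module attribute was intercepted. Calling the patcher at the very top of your entry point avoids this trap.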

This all sounds great in theory, but I have actually made a large I/O-bound program work using monkey patching and a new driver. It is a piece of software that reads jobs from a queue and processes them, putting the results in memcached. For esoteric reasons I will not go into, the job processors could not thread the work; they had to fork. With that setup, one production box with 8GB of RAM was consistently 7.5GB full. After a driver change of fewer than five lines, that same production box consistently uses only around 1GB of RAM and can handle 5 to 10x the throughput of the old system.

Now compare this to Twisted or Tornado. Twisted tries so damn hard to be Java that it really offends me personally. Those developers strike me as the alpha-programmer types who see no reason not to rewrite an existing codebase for a 20% performance gain. Tornado, on the other hand, is significantly less Jersey Shore douchebaggy, but its developers still miss the point: we are programmers who need to get stuff done. Inventing your own HTTP client class when Python's built-in one works just fine, if not better, is the type of hubris that gets hotshot programmers fired in their first month.

There's also gevent, which appears to be a fork of Eventlet, but is not as well documented. Partial credit.

It's hard to find a performance or scaling related open source library that values my time. Eventlet is one of those rare few.

Break My Concentration and I Break Your Kneecaps

I own a good set of headphones that fully enclose my ears. I am not an audiophile; I just don't like to hear other people talk at me. When I am staring at my Emacs windows with headphones on, that generally isn't a physical cue that I am looking for conversation. In fact, when I am that deep into thinking out a problem and I get interrupted, I think about the anti-workplace-violence clause in the employee handbook, and how a poorly lit parking lot probably doesn't qualify as "company property".

Interrupting a thinking programmer is a sucker punch to productivity's kidney. Of course it's still important to keep open communication channels, especially in a small team. I don't mind answering questions and helping out, so long as it's not an immediate context switch for me, i.e., I'll help you if I don't have to speak.

Instant messaging is a decent first attempt, but it's only person-to-person communication. (And no, group IM never fucking works right.) Programming teams need group chat. White-label Twitter clones like Yammer are okay, but I feel icky using a product that is hailed as a technological advance for supporting the ability to identify topics by prefixing a word with a pound sign. That, and I want to keep an eye on the conversation as I work, and my attention isn't on my IM client or browser when I'm coding. It's on Emacs.

The answer, of course, is IRC.

My team recently grew, and four of us need to communicate constantly. I set up an IRC server and brought people in. One non-programmer who needed to be in the loop had never used IRC but caught on quickly. Productivity is up, as is communication. The developer chat channel sits right in front of me as I work, in an Emacs window.

Think of developer communication like I/O: there's blocking and nonblocking. When somebody talks to me as I work, my programming train of thought has to block. With the chat inline in my editor, I can answer questions when I have spare cycles. Since the conversation is integrated into my development environment, I don't need to look around at other applications, and there's no popup notification bouncing around like a Jack Russell terrier who got into my Adderall supply. Also, since it's Emacs, it's not vim. If you use vim, /quit #life.

Collaboration technology doesn't need to be re-invented every six years. The stuff we had in the eighties works just fine.

Options for Parallel Compression

At Milo, I pretty frequently need to pull data down from production to my workstation to test some new code. That's what happens when you raise a Series A round - you can't live-edit production data anymore. I think it's in the term sheet somewhere.

Anyhow, I was pulling down a 14GB MySQL database dump today. Compressing it with plain-Jane gzip was pretty slow, so I looked for some parallel options. The server I was pulling from has 16 cores, so I figured I could put them to work. Here's what I found:

  • pbzip2 - Parallel BZIP2: Parallel implementation of BZIP2. BZIP2 is well known for being balls slow; pbzip2 speeds it up by spreading the work across multiple CPUs.
  • pigz - Parallel GZIP: Parallel implementation of GZIP written by Mark Adler (the guy who co-authored zlib and gzip, so you can be reasonably confident he has his shit together).
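Both tools rely on the same basic idea: cut the input into chunks and compress the chunks on separate cores. Below is a simplified Python sketch of that idea, not pigz's actual algorithm. pigz emits a single gzip stream. This sketch instead relies on the fact that concatenated gzip members still form a valid gzip file (RFC 1952), which Python's `gzip.decompress` accepts. It uses threads on the assumption that CPython's zlib releases the GIL while compressing large buffers. `parallel_gzip`, the 1 MiB chunk size, and the 16 workers are illustrative assumptions.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk (illustrative assumption)


def parallel_gzip(data, chunk_size=CHUNK_SIZE, workers=16):
    """Hypothetical helper: compress data as independent gzip members, one per chunk."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the members concatenate correctly.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)


sample = b"INSERT INTO t VALUES (1, 'row');\n" * 200_000
packed = parallel_gzip(sample)
assert gzip.decompress(packed) == sample  # multi-member output round-trips
print(len(sample), "->", len(packed), "bytes")
```

The trade-off is that each chunk is compressed without seeing the data before it, which costs a little compression ratio. As far as I know, pigz avoids most of that loss by priming each block's compressor with the tail of the previous block.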
On the 14GB database dump, both are faster than vanilla GZIP. Because Hacker News and Reddit both love this shit, here are the timing stats:

  • Plain gzip, default compression level: 11 minutes, 58 seconds. Resultant file is 2.3GB.
  • pbzip2, default compression level: 8 minutes, 48 seconds. Resultant file is 1.7GB.
  • pigz, default compression level: 1 minute, 33 seconds. Resultant file is 2.3GB.
Again, this was on a 14GB database dump file, on a 16-core machine, with Intel solid-state disks.
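To put those numbers side by side, this short Python snippet computes each tool's speedup over plain gzip and its compression ratio. It uses only the timings and sizes listed above, with sizes in GB as reported.

```python
# Timings (seconds) and output sizes (GB) from the runs above; input was 14 GB.
INPUT_GB = 14
runs = {
    "gzip": (11 * 60 + 58, 2.3),
    "pbzip2": (8 * 60 + 48, 1.7),
    "pigz": (1 * 60 + 33, 2.3),
}

baseline_seconds = runs["gzip"][0]
for name, (seconds, size_gb) in runs.items():
    speedup = baseline_seconds / seconds
    ratio = INPUT_GB / size_gb
    print(f"{name:7} {seconds:4d}s  {speedup:4.1f}x vs gzip  {ratio:4.1f}:1 compression")
```

By that arithmetic, pigz is about 7.7x faster than gzip with the same output size. pbzip2 is only about 1.4x faster but produces the smallest file, at roughly 8.2:1.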

If any readers know of other parallel compression schemes I can try, e-mail me and let me know. I will post stats here.