At Uncov, I took a lot of time to write about fail. For lack of a better word, I described the horrific ignorance of reason and systemic absence of programming ability that runs rampant in Web 2.0 simply as fail.
While developing Persai, we have furthered the study of fail, and developed it into a management tool.
The Failboard
My two cofounders and I started a failboard where we keep tallies of each other's fail. A whiteboard, a dry erase marker, and unrelenting fits of ball-busting are all you need to implement this strategy at your organization.
When one member of your team fails at something, it is tallied on the board. The other members of the team then harass the failer about his programming ability, sexual prowess, and general competence.
The Grand Unified Theory of Fail
What is fail? What does a person have to do to deserve this treatment? It has no formal definition, but loosely, a fail is some bit of negligence, misunderstanding, unwillingness to read documentation, or just plain unforgivable stupidity that leads to a problem. Fail is amplified by the magnitude of the problem and the ease with which the error could have been detected.
For example, pushing a CSS change to a production website that doesn't quite work in IE is not a fail, because it's just some lame HTML problem. There's no gap in understanding here; it's just programmer laziness, or aversion to busywork.
Anatomy of a Fail
I will pull an example of something that is a fail from my experience with Persai. This fail was my own, and my balls were busted within an inch of their lives in retribution.
We keep vectors in a sparse format to conserve memory, because most of the elements are zeroes. Iteration on these sparse vectors requires some tricky bookkeeping of indices, but if you're paying attention, it's not that hard.
I was writing a simple math library for our sparse vector format. The first method I wrote was a dot product, and it performed admirably. If you're not familiar with the dot product, it's the sum of the products of the vector elements. So, if you have a zero in some position in one of the vectors, it's not even worth your time to carry through the multiplication with the other vector if it has a nonzero element in that position, because it adds nothing to the sum.
Because of this, you can play fast and loose with the bookkeeping in a dot product. I hammered this method down, and had gotten the general idea of how to iterate over two sparse vectors and match up the indices.
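For the curious, here is a rough sketch of what that kind of loop can look like. The storage format is an assumption (the real Persai format isn't shown here): each vector is a sorted array of indices plus a parallel array of nonzero values, and the names SparseDotSketch and dot are made up for illustration. The point is that whenever an index shows up in only one vector, you can just step past it.

```java
final class SparseDotSketch {
    /**
     * Dot product of two sparse vectors.
     * Assumed (hypothetical) format: aIdx/bIdx hold the positions of the
     * nonzero elements in ascending order, aVal/bVal hold the matching values.
     */
    static double dot(int[] aIdx, double[] aVal, int[] bIdx, double[] bVal) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < aIdx.length && j < bIdx.length) {
            if (aIdx[i] == bIdx[j]) {
                // Both vectors have a nonzero here: the only case that contributes.
                sum += aVal[i] * bVal[j];
                i++;
                j++;
            } else if (aIdx[i] < bIdx[j]) {
                // b is implicitly zero at aIdx[i], so the product is zero. Skip it.
                i++;
            } else {
                // a is implicitly zero at bIdx[j]. Skip it.
                j++;
            }
        }
        // Anything left over in either vector multiplies against zeroes.
        return sum;
    }

    public static void main(String[] args) {
        // a = (1, 0, 0, 2, 0, 0), b = (0, 0, 0, 4, 0, 6)
        double d = dot(new int[] {0, 3}, new double[] {1.0, 2.0},
                       new int[] {3, 5}, new double[] {4.0, 6.0});
        System.out.println(d); // prints 8.0
    }
}
```

Note that the loop quits as soon as either vector runs out, and that is perfectly fine for a product.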
The next method I went to write was a vector sum. Since I had gotten the bookkeeping right in the dot product method, I copied-and-pasted the loop from dotprod() to sum(). Copy-paste is strongly correlated with fail.
Of course, I never unit tested this, just rolled it out. I know what I'm doing. I've got a degree in math. How could I possibly screw this up? A vector sum is a pretty fundamental operation, and many other higher-level vector math operations we use depend on it. Namely, there's some complicated math behind how we bootstrap a recommendation system based on relatively little signal. I had pushed this new library out to our development setup and just assumed it worked.
I created a new Persai interest about Facebook, seeding it with the word "facebook". It started giving recommendations about pasta recipes. I tried to pass the blame off to my co-founders, trying to come up with some cock-and-bull story about how the documents are parsed and so on. I dug into the code a little bit, down through the math, and had that moment of realization: I had failed. Usually, you can tell when a teammate fails because they will be looking at an Eclipse screen and just out of the blue mutter "Oh good Lord".
Remember that whole bit about being able to skip the multiply when a vector has a zero in that position? That doesn't work with addition, because even though 0x = 0, it turns out that 0 + x = x, not zero. FAIL.
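Here is a sketch of a sum that does the bookkeeping properly. This is not the actual Persai code; the Sparse class, its sorted-indices-plus-values layout, and the name SparseSumSketch are all assumptions made up for the example. The difference from a dot product loop is that an index present in only one vector has to be carried into the result, and whatever is left in the longer vector when the other runs out has to be copied too.

```java
import java.util.Arrays;

final class SparseSumSketch {
    /** Hypothetical sparse vector: ascending indices plus parallel nonzero values. */
    static final class Sparse {
        final int[] idx;
        final double[] val;

        Sparse(int[] idx, double[] val) {
            this.idx = idx;
            this.val = val;
        }
    }

    static Sparse sum(Sparse a, Sparse b) {
        // The result has at most one entry per entry in a plus one per entry in b.
        int[] idx = new int[a.idx.length + b.idx.length];
        double[] val = new double[idx.length];
        int i = 0;
        int j = 0;
        int n = 0;
        while (i < a.idx.length && j < b.idx.length) {
            if (a.idx[i] == b.idx[j]) {
                // Both nonzero: add them. (This may store an explicit zero
                // if the values cancel; the sketch does not prune those.)
                idx[n] = a.idx[i];
                val[n] = a.val[i] + b.val[j];
                i++;
                j++;
            } else if (a.idx[i] < b.idx[j]) {
                // Only a is nonzero here: x + 0 = x, so keep x.
                idx[n] = a.idx[i];
                val[n] = a.val[i];
                i++;
            } else {
                // Only b is nonzero here: 0 + x = x, so keep x.
                idx[n] = b.idx[j];
                val[n] = b.val[j];
                j++;
            }
            n++;
        }
        // Copy the tail of whichever vector still has entries left.
        for (; i < a.idx.length; i++, n++) {
            idx[n] = a.idx[i];
            val[n] = a.val[i];
        }
        for (; j < b.idx.length; j++, n++) {
            idx[n] = b.idx[j];
            val[n] = b.val[j];
        }
        return new Sparse(Arrays.copyOf(idx, n), Arrays.copyOf(val, n));
    }

    public static void main(String[] args) {
        Sparse a = new Sparse(new int[] {0, 3}, new double[] {1.0, 2.0});
        Sparse b = new Sparse(new int[] {3, 5}, new double[] {4.0, 6.0});
        Sparse s = sum(a, b);
        System.out.println(Arrays.toString(s.idx)); // prints [0, 3, 5]
        System.out.println(Arrays.toString(s.val)); // prints [1.0, 6.0, 6.0]
    }
}
```

A loop that only handles the matching-index branch would return just index 3 with value 6.0 for these inputs, silently dropping the 1.0 and the other 6.0. A two-line unit test on inputs like these would catch it.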
This error was inexcusable, and I paid dearly for it, mostly with pride.
The Failboard in Practice
After a while using the failboard, you start to get a sixth sense for fail. Some code path that makes use of a synchronized() block is slowing down at odd times? A bit of UTF-8 encoded text goes through a couple of programs and comes out garbled? ConcurrentModificationException? The sources of these problems will shortly end up as tallies on the board.
It gets more powerful when you can preemptively detect fail, not in your own code, but in your teammates'. Believe me, if you punish teammates severely enough, it will happen less, because everyone develops a nose for fail.