Corporate Competence

I really love it when people just do their jobs.  I feel gifted whenever I call a company and get a customer support representative who knows what they're doing and actually cares about me.

It's rare, but it happens.

Worst ISP Ever


For a while, I had Comcast's cable internet service.  It was clear after two years of putting up with their horseshit that they don't care about customers at all.

Oh, wait, they set up a Twitter account.

Fantastic, but my BitTorrent shit still didn't work on their network.  Their installation staff is rude and has questionable hygiene, and their customer support representatives are downright lazy.

Switch to AT&T Now

When I moved, my first order of business was to call Comcast and tell them it's over.  They said my service wouldn't end until I brought back my cable modem, and of course, the place I needed to bring it back to was only open during working hours.

I took off work early to get this little brick of dissatisfaction back to its rightful owner, because fuck them.

At the same time, I was waiting for AT&T to show up and install U-Verse internet service.  They did, and shit was impressive.

  • They told me the tech would be at my house any time from noon to 2pm on a Sunday.  The tech showed up at noon on the dot.
  • It took him about an hour to set up the service.  When he left, he gave me a card with his direct cell phone number.  If I had any problem in the next ten days, I could call him directly and he would come fix it.
  • An hour after he left, the service went out.  I called him, and he was back at my house within 30 minutes.  It turns out there was something wrong with the line from the street to my house, and he had to get another tech out to fix it.  That guy showed up, fixed the problem, and was on his way.  The two of them were at my place until 8pm on a Sunday, staying until the job was done right.
I've been using the service for almost a week now and it's great.

  • No BitTorrent fuckery.  All my torrents work great, and I can seed.
  • 10 megabits downstream, 1.5 megabits upstream.
Great job, AT&T, you actually care about the people paying your salaries.


Having Fun At Mahalo's Expense

Holy crap this is awesome.  Mahalo, the "human powered search engine", now lets the general public edit parts of its pages.

For the uninformed, Mahalo is a for-profit installation of MediaWiki founded by notable blowhard Jason Calacanis and backed by Sequoia Capital.  They're aiming to be a hand-vetted search engine to compete with Google.  Right.

Anyway, now you can edit their pages anonymously.  Since they're a capitalist version of Wikipedia, they need to pay people to review edits, so they aren't too quick on the uptake.  Here's how to do it:

Step 1: Search for something at mahalo.com


mahalo_search.png
Step 2: Click the 'edit' link

loren_page.png
Step 3: Add your verbiage. Remember, this is MediaWiki syntax, so you can link to other Mahalo pages.


edit_page.png
Step 4: Hit 'save' and collect your winnings

win.png
A search engine that anyone can edit.  What could possibly go wrong?

Practical Unique Identifiers

dogs_love_md5.jpg
There have been a handful of places within the Persai pipeline where I have needed unique identifiers of varying length.  64 bits here, 32 bits there.  I'm not the only one to ever have to solve this problem, but I could never find a concise toolbox of information on it.

Automatic Increment or Not


MySQL has the AUTO_INCREMENT modifier for integer record keys.  That's great, if you're using MySQL.  In general, prefer a non-automatically increasing record identifier, unless you have a specific reason to use one.  Here's why:

  • With a shared counter, you may actually have to think about thread synchronization at some point when creating records.
  • If these identifiers become publicly visible, they can leak information about how many records are in your database.
  • If your records correspond to other pieces of data (say URLs), an automatic increment identifier can't be computed from the datum itself, so you can't get the identifier of a given datum without a table lookup.  And even then, you'll need another index on that field.
There are a few cases where automatic increment identifiers are good, though:

  • You are using a MySQL database and are setting up a simple structure of tables. (i.e. MySQL handles synchronization for you and it's actually harder to not use automatic increment)
  • The creation order of records is really important to you, but not important enough to store a timestamp field.

Making an Identifier Out Of Arbitrary Data

Easy, right?  Just hash whatever data you've got.  The result is not reversible and is spread uniformly over the identifier space.  However, many times the output of a standard hashing algorithm is too big.  SHA-1, for example, is 160 bits wide.  Way too long for most purposes.

In this case, I truncate the output.  Yes, this is mathematically valid, because any good hashing algorithm's output will be uniform over the range of the function.  And by uniform, I mean really uniform.  For example, if you take the first 64 bits of a 160-bit SHA-1 hash and call that your unique identifier, the values will be spread uniformly over the space of all 64-bit numbers, so collisions are no more likely than with 64 random bits.  If they weren't (i.e. the first 64 bits of a SHA-1 hash were distributed, say, normally), then the hash function would be cryptographically insecure.
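Here's a minimal sketch of the truncation trick in Python, using the standard hashlib module.  The function name make_id and the example URL are made up for illustration; this isn't lifted from the Persai code.

```python
import hashlib


def make_id(data: bytes, bits: int = 64) -> int:
    """Return the first `bits` bits of SHA-1(data) as an unsigned integer.

    Hypothetical helper: `bits` must be a whole number of bytes, at most 160.
    """
    if bits % 8 != 0 or not 0 < bits <= 160:
        raise ValueError("bits must be a multiple of 8 between 8 and 160")
    digest = hashlib.sha1(data).digest()  # 20 bytes = 160 bits
    # Keep only the leading bytes and read them as a big-endian integer.
    return int.from_bytes(digest[: bits // 8], "big")


# The same input always yields the same identifier, so no table lookup is needed.
url_id = make_id(b"http://www.website.com/document")
print(f"{url_id:016x}")
```

Need 32 bits instead?  Call make_id(data, bits=32) and you get the first four bytes of the same digest.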

Don't try to swing your dick around and come up with your own hash function.  You'll screw it up.  I know I have.

GUIDs

I got an e-mail from a reader about using GUIDs for unique identifiers.  This fits with the hashing scheme, but for the most part, I think GUIDs are far too large, especially if you are storing a lot of records.  GUIDs are 128 bits wide, so if you have a hundred million records, that's about 1.5GB worth of identifiers.  Use a 64-bit identifier, and your space is halved, without a significant increase in collision probability.
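If you want to check the size-versus-collision tradeoff yourself, here's a rough sketch.  It assumes the identifiers are uniformly random (which is what truncated hashes give you) and uses the standard birthday approximation; the function names are mine.

```python
import math


def collision_probability(n_records: int, bits: int) -> float:
    """Approximate chance that at least two of n uniformly random
    `bits`-wide identifiers collide: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-n_records * (n_records - 1) / 2.0 ** (bits + 1))


def storage_bytes(n_records: int, bits: int) -> int:
    """Raw space taken by the identifiers alone, ignoring index overhead."""
    return n_records * bits // 8


n = 100_000_000  # a hundred million records
for bits in (32, 64, 128):
    print(bits, storage_bytes(n, bits), collision_probability(n, bits))
```

For a hundred million records, 64 bits works out to roughly a 0.03% chance of any collision at all, while 32 bits is a guaranteed train wreck.  Whether 0.03% counts as insignificant depends on what a collision costs you.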

Making An Identifier Easier On The Eyes

If you need to put a unique identifier in a URL, it can't look too nerdy.  For example, this URL looks like shit:

http://www.website.com/document?id=1b25a53bf21d0206
Too many numbers.  So, to make it look better, Base-64 encode it.  It will lengthen the code a little, but it's much easier to look at:

http://www.website.com/document?id=ZnJvc3RlZCBidXR0cw==
Eh, well it looks better to me.  Personal taste, I guess.

You'll need to make sure that your Base-64 alphabet doesn't include the + and / characters: they aren't URL safe.
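Python's base64 module already ships a URL-safe alphabet (it swaps + and / for - and _), so a sketch of the encoding might look like this.  The names encode_id and decode_id are hypothetical, and stripping the = padding is my own choice, since = isn't pretty in a query string either.

```python
import base64


def encode_id(identifier: int) -> str:
    """Encode a 64-bit identifier as URL-safe Base-64 without '=' padding."""
    raw = identifier.to_bytes(8, "big")  # 64 bits = 8 bytes
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_id(token: str) -> int:
    """Reverse encode_id, restoring the padding the decoder expects."""
    padded = token + "=" * (-len(token) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


token = encode_id(0x1B25A53BF21D0206)
print("http://www.website.com/document?id=" + token)
```

An 8-byte identifier comes out as 11 characters once the padding is stripped.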

Sort Orderings

Don't worry about sort ordering unless you have to worry about sort ordering.  Duh.  The vast majority of Persai's data is stored simply as files, and for most purposes we don't have to care about the processing order.  We're fortunate in that regard (well maybe not fortunate, I mean that's like saying you're fortunate that you're not fat because you exercise and eat sensibly).

Anyway, there are a couple of places in Persai where sort order matters.  The ordering of recommendations, for example.  There, though, we're just ordering by time, and we need to display the exact time, not just the relative times of the recommendations, so we store a date field and order data by it in the store.

This drives one of my earlier points home: if you need ordering by time, don't count on an automatic increment unique identifier to do it.  It's much more robust to store a timestamp.

In fact this point goes deeper.  Very rarely do you actually need records sorted by record identifier.  What you need is the records sorted by some other value that happens to be reflected in the record identifier by virtue of automatic increment and the insertion order.  It's always more robust to store the actual value you need to sort by.
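To make that concrete, here's a small sketch with made-up records.  The Recommendation class, its fields, and the dates are all hypothetical; the point is just that the identifier carries no ordering and the timestamp does the sorting.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Recommendation:
    record_id: int        # hash-derived, so it says nothing about creation order
    created_at: datetime  # the value we actually want to sort by
    title: str


# Illustrative data only.
recs = [
    Recommendation(0x01B2, datetime(2008, 6, 20, 9, 30, tzinfo=timezone.utc), "second"),
    Recommendation(0xC47E, datetime(2008, 6, 21, 14, 0, tzinfo=timezone.utc), "third"),
    Recommendation(0x9F3A, datetime(2008, 6, 19, 18, 45, tzinfo=timezone.utc), "first"),
]


def newest_first(records):
    """Order by the stored timestamp, never by the identifier."""
    return sorted(records, key=lambda r: r.created_at, reverse=True)


for rec in newest_first(recs):
    print(rec.created_at.isoformat(), rec.title)
```

Sorting these by record_id would give you garbage, which is exactly why the timestamp gets its own field.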

I'm Not Going To Tell You How To Write Code

Because I don't really care.  This is how I do it, though.