Monday, May 12, 2008

BigTable, NoJoins

I continue my trepidatious trot away from the comfortable "normalized" world of relational databases into the strange new (old) world of the non-relational store. You've got your column stores... and then there's stuff like BigTable, HBase, MarkLogic, SimpleDB, CloudDB, CouchDB, RDDB, Lotus Notes (?).

I am still having trouble getting my head around the challenge of "no joins" in systems that do a lot of writes. The fundamental issue is this: without joins, the same piece of data gets copied into many rows, and at some point you have to do so many writes to update the denormalized data that locking becomes a performance problem.
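To make that concrete for myself, here is a minimal sketch of the write fan-out, using plain Python dicts as a stand-in for a store with no joins. The table layout, the names (`users`, `reviews`, `rename_user`) and the data are all made up for illustration; this is not any real service's API, and it only counts writes rather than modeling locks.

```python
# Toy stand-in for a join-less store (hypothetical layout and data).
users = {"u1": "Alice", "u2": "Bob"}

# Each review row carries its own copy of the user's name, because
# there is no join to look it up at read time.
reviews = {
    "r1": {"user_id": "u1", "user_name": "Alice", "movie": "Brazil", "stars": 5},
    "r2": {"user_id": "u1", "user_name": "Alice", "movie": "Heat", "stars": 4},
    "r3": {"user_id": "u2", "user_name": "Bob", "movie": "Brazil", "stars": 3},
}


def rename_user(user_id, new_name):
    """Update the user and every denormalized copy; return the write count."""
    users[user_id] = new_name
    writes = 1  # the one write a normalized schema would need
    for row in reviews.values():
        if row["user_id"] == user_id:
            row["user_name"] = new_name
            writes += 1  # one extra write per denormalized copy
    return writes


write_count = rename_user("u1", "Alicia")
print(write_count)
```

One logical change turns into one write plus one per copy. With two reviews that is nothing, but with millions of rows, each of those writes is something another writer might be waiting on.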

I was looking into updating my stupid application that was striving towards the Netflix Prize to make it use one of these services, just for fun. It's fine for that: no users making updates. But what if it were a system where you had many people updating the same data? It turns out that Google's BigTable actually handles that pretty well, by keeping a tight, timestamp-driven history of updates for each cell.
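Here is how I picture that time-driven history, as a rough sketch: each cell keeps every write tagged with a timestamp instead of overwriting in place, and a read asks for the newest version, or the newest version as of some time. The `VersionedCell` class and its method names are my own invention for illustration, not BigTable's or HBase's actual API, and the timestamps are just integers I made up.

```python
import bisect


class VersionedCell:
    """Toy cell that keeps a timestamp-ordered history of its values."""

    def __init__(self):
        self._timestamps = []  # kept sorted ascending
        self._values = []      # values[i] was written at timestamps[i]

    def put(self, timestamp, value):
        # Insert in timestamp order; nothing is overwritten, so writers
        # never need to read-modify-write an existing version.
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._values.insert(i, value)

    def get(self, as_of=None):
        # Newest version overall, or newest version at or before as_of.
        if not self._timestamps:
            return None
        if as_of is None:
            return self._values[-1]
        i = bisect.bisect_right(self._timestamps, as_of)
        return self._values[i - 1] if i > 0 else None

    def history(self):
        return list(zip(self._timestamps, self._values))


cell = VersionedCell()
cell.put(100, 3)  # first rating
cell.put(105, 4)  # a later update
cell.put(102, 5)  # a write that arrives late but carries an older timestamp

latest = cell.get()
as_of_103 = cell.get(as_of=103)
before_any = cell.get(as_of=99)
print(latest, as_of_103, before_any, cell.history())
```

The point of the sketch is that two people updating the same cell each add a version rather than fighting over one value, and the timestamp decides which one a reader sees.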

Digging into HBase now, trying to see if I can get some of that action.