Saturday, July 07, 2007

Persistence Scaling

Interesting interview with Michael Stonebraker (the Ingres guy) from the ACM. [via GLinden]

The basic thrust is that the RDBMS is slowly fading from its place at the heart of IT systems everywhere, for a variety of reasons. I personally have seen a lot more activity in the embedded and in-memory database categories lately, but he's talking about the relational model itself failing for warehousing, streams, text, and list processing. I don't know if the database concepts behind his current company Streambase and stream data processing are that broadly applicable. Still, his depiction of the areas of the relational model where there are scaling or intelligibility problems seems right on. I am not sure I get the following, but it makes sense in a way too:

"It’s the same case in scientific and intelligence databases. Most of these clients have large arrays, so array data is much more popular than tabular data. If you have array data and use special-purpose technology that knows about arrays, you can clobber a system in which tables are used to simulate arrays."

Vector oriented programming?
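To make the "tables simulating arrays" idea a bit more concrete, here is a minimal sketch of my own, not anything from the interview. It stores the same small grid two ways: as (i, j, v) rows in a table, and as a plain nested list. The names GRID, cells, column_sums_table and column_sums_array are made up for illustration, and SQLite stands in for "a system in which tables are used". It only shows the shape of the two approaches and makes no claim about timings.

```python
import sqlite3

# A tiny 2x3 "array"; real scientific arrays would be vastly larger.
GRID = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

def load_as_table(grid):
    """Simulate the array with a table: one row per cell, keyed by (i, j)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cells (i INTEGER, j INTEGER, v REAL, PRIMARY KEY (i, j))"
    )
    conn.executemany(
        "INSERT INTO cells VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(grid) for j, v in enumerate(row)],
    )
    return conn

def column_sums_table(conn):
    """Column sums via SQL: the engine has to group cell rows by index j."""
    rows = conn.execute(
        "SELECT j, SUM(v) FROM cells GROUP BY j ORDER BY j"
    ).fetchall()
    return [total for _, total in rows]

def column_sums_array(grid):
    """Column sums when the structure already knows it is an array."""
    return [sum(col) for col in zip(*grid)]

if __name__ == "__main__":
    conn = load_as_table(GRID)
    print(column_sums_table(conn))
    print(column_sums_array(GRID))
```

Both functions give the same answer. The difference is that the table version carries explicit index columns and has to rediscover the array structure on every query, which is presumably the overhead Stonebraker has in mind.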

Stonebraker makes a nice point about how the ActiveRecord component of Rails is in effect a cleaner way of interacting with data than SQL. It really is. That said, I wonder if LINQ is on his radar? I got sidetracked by this great post on from 9 till 2 about the 'Rubenesque' (that's Paul Rubens, not Matz) status of C#, which says the following about LINQ:

"Historically, there have been more Microsoft ways to access the Northwind database than they are rows in the Customer table. OLEDB, ODBC, DAO, RDS, JRO, RDO, SQLXML, ADO, ADO.NET, Entity Services....I suspect that DLINQ will probably not be the last ever ever ever in this data accessing periodic series."

And tellingly:

"LINQ has that big benefit of being able to treat relational, XML hierarchical and in-memory data objects all with the same query syntax, allowing you to swap store types at a drop of a hat. The nagging doubt though is that this may be the same big benefit akin to not having to use SOAP over HTTP. How did that go again? Something wonderful for a demo, but something that may not actually be a real pressing 'need'. There is a school of thought that when you're working with a relational database rather than a collection of in-memory objects, then you should not lose track of the various nuances and advantages of the stores - abstraction to save typing can come back to bite you?"

Yeah, like if you're doing Rails finders without :include? Oops, N queries just ran. Still, I like the idea of s3record. As Nutrun says: "I have occasionally participated in conversations around the subject of the database as a product with an expiry date, destined to eventually be replaced by highly distributed data storage models. Although S3's data storage and retrieval model looks presently better suited for larger units of data (e.g. media content), it would be interesting to investigate how it could be applied as an Object persistence service."
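For anyone who hasn't been bitten by the N-queries thing mentioned above, here is a rough sketch of the pattern in Python with SQLite rather than Rails. Everything in it (the authors/posts tables, CountingConn, lazy_titles, eager_titles) is hypothetical and made up for illustration, and it is not a claim about the SQL that ActiveRecord actually generates with or without :include. It just counts how many statements each style sends.

```python
import sqlite3

def make_db():
    """Build a small in-memory database with three authors and three posts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
    """)
    return conn

class CountingConn:
    """Thin wrapper that counts every statement sent to the database."""
    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()

def lazy_titles(db):
    """One query for the authors, then one more per author: 1 + N."""
    result = {}
    for author_id, name in db.execute("SELECT id, name FROM authors ORDER BY id"):
        rows = db.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result

def eager_titles(db):
    """Two queries total: the authors, then all of their posts at once."""
    names = dict(db.execute("SELECT id, name FROM authors ORDER BY id"))
    result = {name: [] for name in names.values()}
    marks = ",".join("?" * len(names))
    rows = db.execute(
        f"SELECT author_id, title FROM posts WHERE author_id IN ({marks}) ORDER BY id",
        list(names),
    )
    for author_id, title in rows:
        result[names[author_id]].append(title)
    return result

if __name__ == "__main__":
    lazy_db, eager_db = CountingConn(make_db()), CountingConn(make_db())
    print(lazy_titles(lazy_db), lazy_db.queries)
    print(eager_titles(eager_db), eager_db.queries)
```

Same data comes back either way, but the lazy version's query count grows with the number of authors while the eager one stays flat. That is the kind of store-specific nuance the abstraction hides.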

The s3record approach is obviously the wrong implementation for such a scheme, but it points in one of the directions we're exploring. The concepts of streams, spaces, and distribution are the future of scaling and persistence. It won't be a simple design decision whether a database is at the heart of your system or not. It's going to be usual.