Domain Modeling
Should the domain model be separate from the serialization format? For example, if you are using something like Avro or Thrift as your serialization format, does it make sense to have all of the objects in your domain modeled as objects generated by these technologies or is it better to just model them as Plain Old Java Objects and then find a way to serialize those?
This question has recently come up, and I find it reminiscent of the older question of whether your XML data model should be your domain model or whether your database model should your domain model. The basic difference effect of using this pure data model approach is that you end up with classes that do not contain any behavior- basically structs. In the other case, you end up with objects that have both data and behaviors that depend on that data. The additional wrinkle is that generating classes from schema, be it thrift, Avro, ProtocolBuffers or XML adds the additional wrinkle of working with generated code. You typically don't modify generated code, so this can make adding methods, functions, etc. to this data a little challenging.
There is a great body of knowledge out there that supports not putting a dependency in your code on a particular technology implementation. In particular, modeling your domain effectively and independent of other components follows a lot of solid principles, and lets you pursue a domain driven design. The opposite, an Anemic Domain Model has its limitations- a procedural design, generally missing out on OO.
The first "depends" I would consider is the programming paradigm. If you are working in a functional programming paradigm, not just a procedural one, for example in a language like Clojure, separating data model from behavior is a natural pattern. Many of the object oriented advantages to a rich domain model simply do not apply. If you are working in an object oriented paradigm, for example in a language like Java, it makes more sense to put the behavior with the data.
These are still generalities though. Generated code is less of a challenge in languages other than Java. Some languages, such as C#, allow partial classes, so you can combine generated code with code that you write in the same class. Dynamically typed languages like Ruby provide a lot of flexibility, so having model classes that are generated by a framework like ActiveRecord, or a pattern like Active Record)is not a problem, because the classes contain no code.
The question then comes up of things like the Hadoop MapReduce framework. While your code may be in Java, the programming paradigm of map reduce is functional. (Some would even say that it is giving us an opportunity to get away from the object oriented programming model- and that it is sad that we produce CompSci graduates that only know Java.) Thus the benefits of having behavior associated with your objects is greatly reduced.
However, in the particular case we are considering, we are not just working in MapReduce, we have a number of application layers that can use the domain object. We are working in Java, so we can't do a lot of cute things with partial classes or mixins. We do want to use Avro as a serialization format though, so there is some momentum for using that as the medium of our domain model. It forces a certain weakness into our domain model, and an awkwardness to our programming. It is not as noticeable when writing functional style Map Reduce code, but it is quite noticeable when we try to abstract behavior such as validation and mapping operations that naturally go on the model. We can get something close to the
Of course, if we go to a more pure, POJO domain model, it then becomes a little more tricky to go in reverse and make the serialization occur with Avro. Still thinking on this one...