Wednesday, November 23, 2005

'Proprietary' Formats: KML and GML

This one makes even less sense than the complaints about proprietary APIs: there is criticism of KML claiming that it is a proprietary format. I suppose the comparison is to GML, which is supposedly a non-proprietary format. In what sense is KML proprietary?

  • It's an open format. Published on the web. Documented.

  • There is no use restriction on the format. Other vendors and developers are encouraged to use it.

  • It is widely used and implemented by a growing variety of vendors.

  • Suggestions and comments on the format are accepted via a public forum.

  • Even the compressed binary format (KMZ) just uses the common zip file compression algorithm, which has implementations on every operating system and in every modern programming language, rather than something you need to buy a separate product to use.
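To show how ordinary KMZ really is, here is a minimal Python sketch that uses nothing but the standard library's `zipfile` module. It assumes the archive holds one main KML document; by convention that entry is named `doc.kml`, but this sketch simply takes the first `.kml` entry it finds. The helper names `write_kmz` and `read_kmz` are illustrative, not part of any KML toolkit.

```python
import io
import zipfile


def write_kmz(kml_text: str, inner_name: str = "doc.kml") -> bytes:
    """Pack a KML document into an in-memory zip archive (a KMZ)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(inner_name, kml_text.encode("utf-8"))
    return buf.getvalue()


def read_kmz(kmz_bytes: bytes) -> str:
    """Return the text of the first .kml entry in a KMZ archive."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".kml"):
                return zf.read(name).decode("utf-8")
    raise ValueError("no .kml entry found in archive")


if __name__ == "__main__":
    kml = "<kml><Placemark><name>Example</name></Placemark></kml>"
    # Round trip: zip it up, then read it back with the same standard module.
    print(read_kmz(write_kmz(kml)) == kml)
```

The same archive opens in any unzip tool, which is the point: no special product is needed.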

It would appear that the only salient difference in proprietary-ness is that KML was developed by a single company (albeit with input from others), whereas GML was developed by a standards committee. The salient difference in the marketplace is that KML is usable and hand-editable, whereas GML is too complex to use without tools. Contrary to what one might expect, the committee-developed format requires tools to create, whereas the one developed by Keyhole does not. Then again, looking at the history of the OGC, it was pushed forward primarily by vendors cooperating in opposition to ESRI's dominance of the marketplace and to ESRI formats such as the shapefile and the Arc .e00 format, which had become the accepted interchange formats for vector data. This infighting has led to OGC standards that are, in most cases, worse than standards defined by individual vendors.

So far they have only been able to come up with "superset" standards that are overly philosophical in their approach, not designed with implementation efficiency in mind. I think there is something to be said for formats that have been proven with a high-performing implementation. Think of a reference implementation, such as Apache Tomcat for the Servlet specification in Java Enterprise Edition. It is possible to write a poorly defined or bloated spec whose implementations will be burdened by poor performance.

The proprietary label should be reserved for formats that are protected by copyright or obfuscation and don't allow open use (think of the various DRM music formats offered by Apple and Sony), not for an XML format that is documented on the web. And oh yeah, ESRI is supporting KML now too. If you need any more proof that it's not proprietary, you must work on a standards committee. I've been there; I'm not going back.


rlake said...

That GML requires tools and KML does not is simply nonsense. GML is no more or less complex than KML. Look at the geometry and temporal models of the two and you will see they are the same. Both make use of user-defined schemas.

The BIG difference is simply that Google provides a viewer, a tool if you will, that reads KML. That is the key difference.

Matt M said...

I agree that it is a bit of an exaggeration to claim that GML requires tools to create, but it is true for me. I am sure you can edit it by hand in a text editor, just as I do with KML. However, it is a much bigger schema to keep in my wee cranium.

How do you balance creating a superset-style specification that does everything against one that is simple and small enough to be widely used? It is useful that GML is broken into numerous sub-schemas, and I think there is room in there for some alternate approaches. But right now you end up with some referential dependency complexity: you have to look up where the various items are defined, and you can't just peek under the covers, use the bits you need (like the geometry model), and still call it GML.

Think of GML as the holistic concept for defining an entire geography application, versus the little bit you need for 90% of uses: a simple set of geometries, attributes, and rules for how they should be displayed. An XML version of a shapefile plus display parameters would be ideal.

I think the similarity between the geometry and temporal models of the two is really important. If KML used those from the gml: namespaces, it might all fit together. As it is, though, I can write an XPath-style non-validating parser that reads both GML and KML. The general approach of doing one thing well and then growing into other things is the philosophy that should be espoused here.
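Here is a rough sketch of the kind of non-validating reader described above, written in Python with the standard library's ElementTree rather than a full XPath engine. It ignores namespaces and schemas entirely and just collects coordinates from KML `<coordinates>`, GML 2 `<gml:coordinates>`, and GML 3 `<gml:pos>` elements. The function names are hypothetical; the sketch assumes the default separators (commas within a tuple, whitespace between tuples) and does not handle axis order, which in GML depends on the coordinate reference system.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so KML and GML tags compare equally."""
    return tag.rsplit("}", 1)[-1]


def extract_points(xml_text: str) -> list[tuple[float, ...]]:
    """Collect coordinate tuples from KML or GML, without validating anything."""
    root = ET.fromstring(xml_text)
    points: list[tuple[float, ...]] = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if not text:
            continue
        name = local_name(elem.tag)
        if name == "coordinates":
            # KML and GML 2: tuples separated by whitespace, values by commas.
            for tup in text.split():
                points.append(tuple(float(v) for v in tup.split(",")))
        elif name == "pos":
            # GML 3: one position, values separated by whitespace.
            # Axis order is NOT normalized here; it depends on the CRS.
            points.append(tuple(float(v) for v in text.split()))
    return points
```

Because it only looks at local element names, the same loop reads a KML Placemark and a GML geometry; the price is that it silently accepts documents a validating parser would reject.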

Perhaps even a simpler way of expressing the schema, RELAX NG instead of XSD, would help tame the complexity?

Roger said...

Wow, you really identified the prime problem with GML and, to a lesser degree, KML: their complexity has reached the point that, for all intents and purposes, they require tools to manipulate and maintain them. In effect, they are machine-oriented, not human-oriented, languages. I recall a comment Don Box made a number of years ago to the effect that XML was really best for machine-to-machine communication and not for human editing, for which he received criticism. But in the long view he was right, especially once the W3C started issuing brain-numbing XML standards such as XML Schema, XML Infoset, and XML Query, among many others.

GML is really a well-thought-out and comprehensive structure built upon these XML standards. From a distance GML looks like an elegant solution to geospatial data exchange, and it is, if someone else has written all of the GML for you and it never needs changing. However, once you step into the GML fray, the quagmire of understanding all of these specifications and their interactions quickly becomes apparent.

XML succeeded where SGML failed because its creators focused on simplicity and usefulness. They chose not to create SGML profiles or subsets. XML is turning complex again because of the endless "comprehensive" standards specifications being written for it. However, many of the more successful applications of XML are those that don't get bogged down in these standards. I would argue that GML needs an infinitely simpler cousin that works for 90% of uses.

Ron Lake said...

It is important to understand that GML and KML are complementary. KML is a language for visualizing geographic data, like SVG but with a virtual globe as the canvas. GML was not designed for map drawing but to capture models of real-world entities, meaning named entity types (like buildings, roads, bridges, buoys, etc.). In both cases we need geometry (just as we do in SVG), but the focus is different: GML is about the details. GML is readily styled into KML for presentation in Google Earth or similar tools. Equally, you can transport GML inside KML using the KML Metadata tag. The referential complexity issue mentioned by Matt is handled very well by the subset tool (a pair of XSLT scripts packaged with the spec). This makes it easy to create profiles of GML for specific purposes, such as GeoRSS, to name just one example, and these profiles are pretty simple.
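The styling step above is described as XSLT. As a rough Python analogue (not the subset tool or any official stylesheet), the sketch below turns a single GML 3 `gml:Point` into a minimal KML Placemark. The function name is hypothetical, the KML 2.0 namespace of the time is an assumption, and the sketch assumes the `gml:pos` values are already in longitude, latitude order, which is the order KML expects.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
KML_NS = "http://earth.google.com/kml/2.0"  # assumption: KML 2.0-era namespace


def gml_point_to_kml(gml_text: str, name: str) -> str:
    """Hypothetical helper: wrap one gml:pos in a KML Placemark."""
    src = ET.fromstring(gml_text)
    pos = src.find(f".//{{{GML_NS}}}pos")
    values = (pos.text or "").split() if pos is not None else []
    if len(values) < 2:
        raise ValueError("no usable gml:pos found")
    # Assumes lon, lat order; a real transform would check the srsName.
    lon, lat = values[:2]

    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    # Serialize with KML as the default (unprefixed) namespace.
    return ET.tostring(kml, encoding="unicode", default_namespace=KML_NS)
```

Only the geometry survives the trip here; the "details" GML is meant to carry (feature types, attributes) would need their own mapping, or could ride along unchanged in the Metadata element.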


Anonymous said...

The issue is less a matter of how the format originally came into being (e.g. a single company or a committee) than of how it evolves going forward. If a single company can control something that becomes a de facto standard, it gives that company an unfair advantage. Put another way, the decision whether to adopt a proposed change will probably weigh the "owner's" interest over that of other companies or even the marketplace. Please realize I don't have a strong opinion about this particular format; I use KML regularly in my job and am fine with it. However, to say that it doesn't matter whether a single company or a standards committee guides the evolution of the format is a bit naive and historically ignorant. Of course, there are successful standards by committee (e.g. JPEG, MPEG, etc.) and cumbersome, impractical ones...