Monday, December 19, 2005

no more DOCTYPE...

Drop the <!DOCTYPE>: "Back when we cooked up XML in 1996-97, there were good reasons to have that ugly upper-case gibberish at the top of your XML documents. That was almost ten years ago; now it’s time to do away with it, and also time to have a spec for Doctype-free XML."


Tim Bray is ready to drop the doctype. I am too. I am not overly fond of discussing the foibles of my colleagues, present or former, preferring to discuss topics in the abstract. However, a particularly heinous form of idiocy that I was exposed to in a previous consulting engagement was related to exactly this. These folks were operating on a network that was not connected to the internet. They were concerned that the doctype declarations in their XML referred to things like http://www.w3.org/TR/html4/strict.dtd. So they wanted to do the "logical" thing: create websites on their internal network that resolved to those addresses and host the DTDs there. You know, so their parsers could download the DTDs and parse the files. I think a similar group at that organization expressed great interest in starting a project to provide a buzzword-compliant web service to perform validation of XML against Schemas and DTDs, which would of course have been accompanied by an enterprise policy forbidding anyone else from validating XML...

I really think anyone writing a parsing application needs to make sure they don't introduce any dependency on actually reading the DTD. I always try to parse XML using XPath in a relatively fault-tolerant way. I honestly think that one of the reasons for the ascendancy of Internet Explorer was that it went out of its way to display as much as it could of any old hacked-up HTML out there. This, to me, is the best part of Tim's post:
"...it’s usually a bad idea for a document to express an opinion as to what its own schema is. Most useful languages have more than one schema, and it’s the absolute right of someone receiving a document to decide whether it meets their definition of “valid” before they use it, so they’re going to make their own decisions as to which schema (if any) is appropriate."

I reserve that right.
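
For what it's worth, here is a minimal sketch of the kind of DTD-independent, fault-tolerant parsing described above. It is only an illustration under stated assumptions: the `order` document, the `http://example.invalid/order.dtd` address, and the helper names `first_text` and `read_order` are all made up for the example. It uses Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath and, as far as I know, does not fetch the external DTD named in the doctype. One caveat: a document that leans on entities defined only in its DTD would still fail to parse this way.

```python
import xml.etree.ElementTree as ET

# Hypothetical document. The DTD address is deliberately unreachable:
# the parser below never tries to download it.
DOC = """<?xml version="1.0"?>
<!DOCTYPE order SYSTEM "http://example.invalid/order.dtd">
<order id="42">
  <customer><name>Ada</name></customer>
  <item sku="A1"><qty>2</qty></item>
  <item sku="B7"/>
</order>"""


def first_text(node, path, default=None):
    """Return the stripped text of the first match for path, or default."""
    found = node.find(path)
    if found is None or found.text is None:
        return default
    return found.text.strip()


def read_order(xml_text):
    """Pull out only the fields we care about, tolerating missing ones."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall(".//item"):
        items.append({
            "sku": item.get("sku", "unknown"),
            # A missing <qty> falls back to 1 instead of failing the parse.
            "qty": int(first_text(item, "qty", "1")),
        })
    return {
        "customer": first_text(root, "customer/name", "unknown"),
        "items": items,
    }


if __name__ == "__main__":
    print(read_order(DOC))
```

The receiving code, not the document, decides what counts as acceptable here: the second `item` has no `qty` element, and the reader simply applies its own default rather than rejecting the whole file.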


