Converting XML to JSON using XSLT

Transforming XML to JSON: (part 1|part 2|source)

For some time now I have had the idea of converting XML documents to JSON using nothing but an XSLT stylesheet. The advantage of this solution is that you get an xml2json converter usable in any environment that supports XSLT transformations. Unix or Windows, compiled or scripted: one JSON solution for all!

But the real reason I wrote this was the challenge of finding out whether it could be done. One issue actually held me back from writing it for some time: how to handle arrays with XPath. Before I go into this, I will first explain the translation rules (the mapping). You can find the source code in my research section.

First we have the value types: boolean, number and string. If a text-node contains “true” or “false”, it becomes a boolean. If it is a number according to XPath, it becomes a number in JSON. Otherwise the text-node becomes a string. Mixed content is ignored. So if you have the following markup:

<data>
  this is ignored.
  <age>34</age>
  <married>false</married>
  <name>Doekman</name>
</data>

The text “this is ignored.” will be ignored. I do this with a rule that skips a text-node when its parent has more than one element-child. The same rule takes care of the indenting white-space. String escaping and correct number conversion are still on my to-do list, so don’t rely on them ;-)
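To make the value-type rule concrete, here is a minimal sketch in Python rather than XSLT. It is not the stylesheet's code: the function name text_to_json_literal and the regular expression are assumptions made for illustration, and the regex only approximates what XPath 1.0 accepts as a number. Unlike the stylesheet described here, it does escape strings, by way of json.dumps.

```python
import json
import re

# Approximation of the XPath 1.0 number syntax: an optional minus sign,
# then digits with an optional fraction, or a bare fraction such as ".5".
# There is no exponent and no leading "+".
_XPATH_NUMBER = re.compile(r"-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")


def text_to_json_literal(text: str) -> str:
    """Map the text of a leaf element to a JSON literal (hypothetical helper)."""
    value = text.strip()
    if value in ("true", "false"):
        return value  # boolean
    if _XPATH_NUMBER.fullmatch(value):
        # Normalise forms such as "34." or ".5", which JSON does not allow.
        number = float(value) if "." in value else int(value)
        return json.dumps(number)
    return json.dumps(value)  # anything else is a string, escaped


for sample in ("34", "false", "Doekman"):
    print(text_to_json_literal(sample))
```

The three sample values are the leaf values from the markup above; they come out as a number, a boolean and a quoted string.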

Element siblings are converted to an object. From the previous XML example, the markup within the data tag is converted to {age:34,married:false,name:"Doekman"}. As you may notice, property names are left unquoted: my solution detects when an object property is a JavaScript reserved word and quotes it only then. For JSON compliance, replace the calls to the quote-property template with quote-character.
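The object rule and the quoting of property names can be sketched as follows. This is a Python illustration built on the standard xml.etree.ElementTree module, not the stylesheet itself. The names quote_property, literal and siblings_to_object are hypothetical, and the reserved-word list is deliberately partial. Quoting names that are not plain identifiers is an extra assumption of this sketch. It handles leaf children only.

```python
import json
import re
import xml.etree.ElementTree as ET

# Partial list of JavaScript reserved words, for illustration only.
RESERVED_WORDS = {
    "break", "case", "class", "default", "delete", "for", "function",
    "if", "in", "new", "return", "this", "var", "while", "with",
}

# Assumption of this sketch: names that are not plain identifiers
# (for example "first-name") are quoted as well.
_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# JSON number syntax without an exponent; simplified on purpose.
_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?")


def quote_property(name, strict_json=False):
    """Quote a property name only when needed, or always for strict JSON."""
    if strict_json or name in RESERVED_WORDS or not _IDENTIFIER.fullmatch(name):
        return json.dumps(name)
    return name


def literal(text):
    """Booleans and numbers pass through; everything else becomes a string."""
    value = (text or "").strip()
    if value in ("true", "false") or _NUMBER.fullmatch(value):
        return value
    return json.dumps(value)


def siblings_to_object(parent, strict_json=False):
    # Only element children are visited, so loose text inside the parent
    # (mixed content, indentation) is ignored.
    pairs = (
        quote_property(child.tag, strict_json) + ":" + literal(child.text)
        for child in parent
    )
    return "{" + ",".join(pairs) + "}"


data = ET.fromstring(
    "<data> this is ignored. <age>34</age>"
    "<married>false</married><name>Doekman</name></data>"
)
print(siblings_to_object(data))                    # {age:34,married:false,name:"Doekman"}
print(siblings_to_object(data, strict_json=True))  # {"age":34,"married":false,"name":"Doekman"}
```

Passing strict_json=True plays the role of swapping the quoting template: every property name is quoted, so the result parses as JSON.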

When all element siblings have the same tag name, I wanted to use an array. As a consequence, you lose the name of the tag. If you have the following markup:

<items>
  <item>one</item>
  <item>two</item>
</items>

The produced JSON will be {item:["one","two"]}. It took me a while to think of the right XPath for selecting this pattern. The pattern is:

*[count(../*[name(../*)=name(.)])=count(../*) and count(../*)>1]

This fine piece of hard-to-read code translates into the following English:

  • an element belongs to an array when:
    • the number of sibling elements sharing one name (count(../*[name(../*)=name(.)]); inside the predicate, "." is each sibling in turn and name(../*) is the name of the parent's first child-element) equals the number of all child-elements (count(../*)), so every sibling has the same name
    • and there are at least two child-elements (count(../*)>1)
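The same test can be sketched outside XPath. The Python below is an illustration only and uses the standard xml.etree.ElementTree module, whose limited XPath support has no count() or name(). The function name children_form_array is hypothetical. It asks the question from the parent's point of view: do all child-elements share one name, and are there at least two of them?

```python
import xml.etree.ElementTree as ET


def children_form_array(parent: ET.Element) -> bool:
    """True when the child-elements of `parent` would be treated as an array."""
    children = list(parent)  # element children only; text is not included
    if len(children) < 2:
        # Mirrors count(../*) > 1
        return False
    first_name = children[0].tag
    same_name = [child for child in children if child.tag == first_name]
    # Mirrors count(../*[name(../*)=name(.)]) = count(../*)
    return len(same_name) == len(children)


items = ET.fromstring("<items><item>one</item><item>two</item></items>")
data = ET.fromstring("<data><age>34</age><name>Doekman</name></data>")
single = ET.fromstring("<items><item>one</item></items>")

print(children_form_array(items))   # True
print(children_form_array(data))    # False
print(children_form_array(single))  # False
```

The last case shows why a lone child-element can never be recognised as a one-item array.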

That is the downside of introspection: since arrays are inferred from repeated tag names, this solution doesn’t support empty arrays or arrays with one element. Talking about empty: empty elements (<empty/> or <empty></empty>) are converted to null (or actually, the absence of children in an element-node will result in a null literal). And for your convenience, XML comments and attribute-nodes are converted to JavaScript comments.
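The null rule is small enough to show in a few lines. Again this is a Python sketch under assumptions, not the stylesheet: the function name is_empty is hypothetical, and the standard xml.etree.ElementTree parser drops comments by default, so an element holding only a comment also counts as empty here.

```python
import xml.etree.ElementTree as ET


def is_empty(elem: ET.Element) -> bool:
    """True when the element has neither child-elements nor text."""
    return len(elem) == 0 and not elem.text


for markup in ("<empty/>", "<empty></empty>", "<name>Doekman</name>"):
    elem = ET.fromstring(markup)
    print(markup, "->", "null" if is_empty(elem) else "not null")
```

Both spellings of the empty element parse to the same tree, so they cannot be told apart after parsing.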

This solution was really fun to write. I’m still thinking about string escaping and about supporting JavaScript numbers correctly. I might also support conversion of XML (ISO 8601) dates to native JavaScript dates.

blog\xml2json.txt · Last modified: 2006-12-04T23:31
 