Transforming XML to JSON
I still needed to escape some special characters in JavaScript strings. A JavaScript string is surrounded by double quotes[1], and the following characters need escaping:
- \ to \\ (put a backslash before the backslash)
- " to \" (put a backslash before the double quote)
- tab to \t
- line feed to \n
- carriage return to \r
Escaping the last three was actually pretty easy, because each character is replaced with a string in which the original character doesn’t occur. Because of this, I could recursively replace all occurrences: test with the contains function whether the character still occurs in the string, and use the concat/substring-before/substring-after trio to replace it (see the encode-string template in the source).
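Here is a minimal sketch of the loop behind the encode-string template. It is written in Python rather than XSLT so it can be run directly. The helpers contains, substring_before and substring_after are hypothetical stand-ins for the XPath 1.0 functions of the same names; the real template is in the source. Each call replaces the first occurrence of one character and then recurses on the whole string. That terminates only because no replacement contains the character it replaces.

```python
# Hypothetical models of the XPath 1.0 string functions used by the template.
def contains(s: str, t: str) -> bool:
    return t in s

def substring_before(s: str, t: str) -> str:
    i = s.find(t)
    return s[:i] if i >= 0 else ""          # XPath returns "" when t is absent

def substring_after(s: str, t: str) -> str:
    i = s.find(t)
    return s[i + len(t):] if i >= 0 else ""

# Each control character and its escape; no escape contains its own character.
REPLACEMENTS = [("\t", "\\t"), ("\n", "\\n"), ("\r", "\\r")]

def encode_string(s: str) -> str:
    """Replace one occurrence, then recurse until contains() finds nothing."""
    for char, escaped in REPLACEMENTS:
        if contains(s, char):
            replaced = substring_before(s, char) + escaped + substring_after(s, char)
            return encode_string(replaced)   # rescan the whole string
    return s
```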
As it happens, these three characters are also the non-printable ones. Strictly speaking, the tab doesn’t need escaping: JavaScript strings can contain tab characters, but I like them to be visible. (Strict JSON is stricter here: it forbids all unescaped control characters, tab included.) The CR and LF characters must be escaped, since JavaScript strings can’t span multiple lines.
After my first victory, I realized this method does not work for the printable characters in my list. After a double quote (") is replaced with its escaped version (\"), the string still contains a double quote. The contains test therefore never becomes false and can’t be used to stop the recursion. To make things worse, the backslash of an escaped quote must not be doubled again by the backslash replacement. Only backslashes from the original string should be escaped.
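Both problems can be reproduced with Python’s str.replace. XPath 1.0 has no such function; it is used here only to illustrate the point. After escaping, the string still contains a double quote. Escaping in the wrong order also doubles the backslash that belongs to an escaped quote.

```python
s = 'say "hi"'

# Replacing " with \" leaves a " in the result, so a
# "repeat while contains(s, '"')" loop would never stop.
once = s.replace('"', '\\"')
print('"' in once)   # True

# Wrong order: quotes first, then backslashes. The backslash that was
# added for each quote gets doubled as well.
wrong = s.replace('"', '\\"').replace('\\', '\\\\')
# Right order: original backslashes first, then quotes.
right = s.replace('\\', '\\\\').replace('"', '\\"')
print(wrong)   # say \\"hi\\"
print(right)   # say \"hi\"
```

In a JavaScript literal, \\ is read as one literal backslash. The wrong version would therefore end the string at the first quote.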
By this point I had already learned that text manipulation in XSLT is not for the faint of heart. There are some string functions, but they are a bit weird. In JavaScript you have substring and indexOf, and you think in terms of positions. XPath has a position-based substring function, but no indexOf. Instead, finding and cutting are combined: substring-before($a,$b) in XPath is like a.substring(0,a.indexOf(b)) in JavaScript. For some reason the folks at the W3C don’t seem to need positions in strings[2]. The W3C’s response might be: “the position-concept [in XPath] is encapsulated.”
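To make the comparison concrete, here is a small Python model; both helper names are hypothetical. The JavaScript expression also returns an empty string when b is absent, because substring() clamps the -1 from indexOf to 0. A naive Python a[:a.find(b)] would not behave that way, since a[:-1] drops the last character.

```python
def substring_before(a: str, b: str) -> str:
    """XPath 1.0 substring-before: text before the first b, or "" if b is absent."""
    i = a.find(b)
    return a[:i] if i >= 0 else ""

def js_substring_indexof(a: str, b: str) -> str:
    """Mimics JavaScript a.substring(0, a.indexOf(b))."""
    end = a.find(b)      # like indexOf: -1 when b is absent
    end = max(end, 0)    # JS substring() treats negative arguments as 0
    return a[:end]

for a, b in [("key=value", "="), ("no-separator", "="), ("=leading", "=")]:
    assert substring_before(a, b) == js_substring_indexof(a, b)
```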
My first solution was to handle every situation in which backslashes and quotes can occur in a string[3]:
I needed this to avoid replacing quotes inside already-escaped quotes, and to make sure the backslash is replaced first. For the replacement algorithm I used what I call divide-and-conquer. The string is split into a left part, which doesn’t need replacing, and a right part, which may. In the middle I insert the escaped character. This is different from the non-printable escape template, which keeps replacing characters in the whole string until none are left.
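Here is a minimal Python sketch of divide-and-conquer for a single character; escape_char is a hypothetical name, not a template from the source. Only the right part is recursed on, so the inserted escape text is never examined again. A replacement that contains the character itself (\" for ") is therefore harmless, unlike in the loop-until-none approach.

```python
def escape_char(s: str, char: str, escaped: str) -> str:
    """Divide and conquer: keep the left part, insert the escape, recurse right."""
    i = s.find(char)            # plays the role of the contains() test
    if i < 0:
        return s                # nothing left to escape
    left = s[:i]                # substring-before: needs no replacing
    right = s[i + len(char):]   # substring-after: may still need replacing
    # The escaped text is not passed back into the recursion.
    return left + escaped + escape_char(right, char, escaped)

print(escape_char('say "hi"', '"', '\\"'))   # say \"hi\"
```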
The next small challenge was to tell situation 3 from situation 4. The other situations are easily handled with the contains function. But since there is no indexOf, I was puzzled for a while about how to determine whether a quote or a backslash comes first in a given string. While browsing the XPath specification, I noticed that string-length is the only string function that returns a number. After this insight, the solution was obvious:
string-length(substring-before($s,'"'))<string-length(substring-before($s,'\'))
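The same comparison works in a small Python model, with len standing in for string-length. escape_bs_and_quot is a hypothetical sketch of the one-template approach. It collapses the situations into three cases: neither character occurs, only one occurs, or both occur. The comparison is only used when both characters occur, because substring-before returns an empty string for a character that is absent.

```python
def substring_before(s: str, t: str) -> str:
    i = s.find(t)
    return s[:i] if i >= 0 else ""

def substring_after(s: str, t: str) -> str:
    i = s.find(t)
    return s[i + len(t):] if i >= 0 else ""

BS, QUOT = "\\", '"'

def escape_bs_and_quot(s: str) -> str:
    has_bs, has_quot = BS in s, QUOT in s
    if not has_bs and not has_quot:
        return s
    if has_bs and has_quot:
        # Both occur: the one with the shorter substring-before comes first.
        quot_first = len(substring_before(s, QUOT)) < len(substring_before(s, BS))
        first = QUOT if quot_first else BS
    else:
        first = BS if has_bs else QUOT
    escaped = BS + first   # \\ for a backslash, \" for a quote
    return (substring_before(s, first) + escaped
            + escape_bs_and_quot(substring_after(s, first)))
```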
Now the string-escaping was done with three templates:
- escape-string - providing the surrounding quotes
- escape-bs-and-quot-string-in-one-template - escaping the backslashes and double quotes
- encode-string - escaping the tab, line feed and carriage return characters
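The three parts might fit together as in the Python sketch below, with str.replace standing in for the recursive templates. The order of the two passes is an assumption of this sketch, but it is a necessary one. encode-string introduces new backslashes (\t, \n, \r), so the backslash/quote pass has to run first, or those backslashes would be doubled. The result is checked by parsing it with Python’s json module.

```python
import json

def escape_bs_and_quot(s: str) -> str:
    # Stand-in for the backslash/quote template: original backslashes first.
    return s.replace("\\", "\\\\").replace('"', '\\"')

def encode_string(s: str) -> str:
    # Stand-in for encode-string: tab, line feed, carriage return.
    return s.replace("\t", "\\t").replace("\n", "\\n").replace("\r", "\\r")

def escape_string(s: str) -> str:
    # Surrounding quotes around both passes; encode_string runs last so the
    # backslashes it adds are not doubled.
    return '"' + encode_string(escape_bs_and_quot(s)) + '"'

sample = 'path C:\\temp\t"quoted"\r\nend'
literal = escape_string(sample)
assert json.loads(literal) == sample
```

Other control characters (U+0000 to U+001F apart from tab, LF and CR) are not handled here. A strict JSON parser would still reject a string that contains them.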
But I really didn’t like the backslash/quote template. It is too big, and if I also wanted to escape single quotes, I would end up with 16 situations[4]. This is because the contains tests and the “which comes first” comparisons have to be combined.
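Here is one way to arrive at sixteen. It assumes a situation is defined by which of the characters occur and in which order their first occurrences appear. This counting is an assumption, not taken from the source.

```python
from math import perm

def situations(n: int) -> int:
    """Cases for n special characters: pick which k of them occur,
    then the order of their first occurrences (k! orders)."""
    return sum(perm(n, k) for k in range(n + 1))

print(situations(2))   # 5: none, only \, only ", \ before ", " before \
print(situations(3))   # 16
```

Under this counting, a fourth character would already give 65 situations.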
At that point I realized it was much easier to split the middle template in two. First, the escape-bs-string template divides and conquers: it passes the text left of the backslash to escape-quot-string and the text right of it to itself. Then escape-quot-string does the same trick for double quotes. I leave the escape-bs-and-quot-string-in-one-template template in the XSLT for your (and my) entertainment, but it is never called.
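The Python sketch below shows the two-template split. The function names mirror the escape-bs-string and escape-quot-string templates, but the bodies are an illustration, not the XSLT from the source. Here, str.partition plays the role of substring-before and substring-after. No position comparison is needed, because the text left of a backslash never contains a backslash and only needs its quotes escaped.

```python
def escape_quot_string(s: str) -> str:
    """Escape every double quote, divide-and-conquer style."""
    if '"' not in s:                     # the otherwise branch
        return s
    left, _, right = s.partition('"')    # substring-before / substring-after
    return left + '\\"' + escape_quot_string(right)

def escape_bs_string(s: str) -> str:
    """Escape every backslash. The text left of each backslash (which has
    no backslashes) goes to escape_quot_string, the rest back to itself."""
    if "\\" not in s:
        return escape_quot_string(s)
    left, _, right = s.partition("\\")
    return escape_quot_string(left) + "\\\\" + escape_bs_string(right)
```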
Also note that we still need the when-contains/otherwise construction, because both substring-before and substring-after return an empty string when the substring is not found. That is the price of position encapsulation.
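Here is a short Python demonstration of why the contains test can’t be dropped. It uses hypothetical models of the two XPath functions. An empty result can mean either “not found” or “found at the very edge of the string”.

```python
def substring_before(s: str, t: str) -> str:
    i = s.find(t)
    return s[:i] if i >= 0 else ""

def substring_after(s: str, t: str) -> str:
    i = s.find(t)
    return s[i + len(t):] if i >= 0 else ""

# Not found vs. found at the very start: both give "".
print(repr(substring_before("abc", '"')))    # ''
print(repr(substring_before('"abc', '"')))   # ''
# Not found vs. found at the very end: both give "".
print(repr(substring_after("abc", '"')))     # ''
print(repr(substring_after('abc"', '"')))    # ''
```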