text() is a code smell
In a lot of the XQuery code I've come across, I see the text() node test being used. It is, of course, perfectly legal and valid to use. In some cases, it's essential. But most often I see text() being used where the string() function should be used instead. In these cases, the code is a sitting duck, waiting to be broken. Moreover, since text() is rarely needed in simple queries and, in my experience, its inappropriate uses far outweigh its appropriate uses, I feel perfectly comfortable positing this: text() is a code smell. If you have a habit of using text() a lot, then read on.
The simplest way to explain what I mean is by example. Consider the following query, which is intended to output one <li> element per <item> element in the input:
And consider some sample input (at "/test.xml"):
Running the above query yields exactly the desired result:
So what's the problem? text() did just fine. Isn't it essential here? If we were to leave it out:
Then the <item> elements themselves would get copied to the result, and that's not what we want:
Okay, good point. But now let me throw a wrench into your code. Let's say test.xml now looks like this:
Here is the new output from the original query that uses text():
Uh-oh. Now it's doing what the code instructed (output one <li> per text node), but it's not doing what we wanted. The presence of the comment (<!--was the second-->) caused the first <item> to contain not one but two text nodes. No matter, we just need to fix the code. Let's move the text() part to the expression inside the <li> element constructor:
Now we get the original desired output. Problem solved, right? Not so fast. I've got another wrench:
Here's what our newly fixed query outputs for the above document:
Where did the word "second" go? Well, it's not a text node child of <item>, so it didn't get copied through. Only the text node children of <item> were copied through, just as your code instructed.
You may be thinking: give me a break. You
changed the data, and your code broke. Big deal. That doesn't mean
text() is a code smell. After all, I can fix it again like this (by
using //text() instead of /text()):
To which I would reply: do you always want to have to set yourself up for chasing down subtle bugs when seemingly innocuous changes are made to your data? There's a more robust way. It's called the string() function:
The string() function converts its argument to a string. In the case of a node, it returns the string-value of the node. In the case of an element (or a document node), the string-value of the node is the concatenation of the string-values of all its descendant text nodes. Lo and behold, that fits the bill perfectly! Now it doesn't matter if any inline markup constructs (sub-elements, comments, or processing instructions) are added. Our code will continue to work as intended.
If you call string() with no arguments, the
context node is taken to be the implicit, default argument. In
other words, string() is short for string(.). That means you can use it as a direct
replacement for text() in many cases, and it will give you the
desired result without being so easily breakable:
As I mentioned before, the text() node test (remember it's not a function even though it looks like one) has its perfectly legitimate uses. Out of curiosity, I did a search on the code base for RunDMC (the XQuery and XSLT code that runs this site), and I only found two instances of text() in all of the view-related code. They were both cases where using text() was essential (and string() couldn't be used instead). In contrast, there were many uses of string(). I mention this not in order to suggest that my code is absolutely exemplary (Ha! I only hope you don't look around too much), but that I'm at least practicing what I'm preaching. If you've got a different view, feel free to share it in the comment section below!
Comments