XML Connections

Saturday, November 22, 2008

More on an XQuery format-number function

Recently Chris Wallace has been writing about formatting numbers in XQuery; as you may remember, I blogged about an XQuery-based format-number function some time ago and provided a partial implementation.

Chris mentions my post in his Wikibooks entry, and he was kind enough to run some tests and identify a few bugs and limitations. As I recently had to "enjoy" a red-eye flight on my way back to Boston, I couldn't resist the temptation to fix some of those issues.

Attached is a better version; it also includes support for the typical functionality you would access through xsl:decimal-format. The XQuery includes the tests that Chris posted on the Wikibooks entry, plus a few more, whose results are:
This is far from a complete solution, and several patterns are not supported at all, but it's getting better... As I commented here, I do believe that format-number needs to become part of the XQuery language and be natively supported by the engine, which is the direction XQuery 1.1 is moving in.
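To make the picture-string and xsl:decimal-format ideas a bit more concrete, here is a minimal sketch in Python of a small subset of format-number. It is not the attached XQuery module, nor Chris's code. The `DecimalFormat` class and its field names are hypothetical stand-ins for the decimal-separator, grouping-separator and minus-sign attributes of xsl:decimal-format. Only `0`, `#`, one decimal separator and a regular grouping separator are handled. Rounding is half-to-even, which is what XSLT 2.0 specifies for format-number.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_EVEN, Decimal


@dataclass(frozen=True)
class DecimalFormat:
    # Hypothetical stand-in for three xsl:decimal-format attributes.
    decimal_separator: str = "."
    grouping_separator: str = ","
    minus_sign: str = "-"


def format_number(value, picture, fmt=DecimalFormat()):
    """Format `value` with a small subset of the XSLT picture syntax:
    '0' (mandatory digit), '#' (optional digit), one decimal separator and
    a regularly spaced grouping separator. Prefixes, suffixes, percent,
    per-mille and negative sub-pictures are NOT supported."""
    int_pic, _, frac_pic = picture.partition(fmt.decimal_separator)

    # Count mandatory ('0') and optional ('#') digit positions.
    min_int = int_pic.count("0")
    min_frac = frac_pic.count("0")
    max_frac = min_frac + frac_pic.count("#")

    # Grouping size = digit positions after the last grouping separator.
    group = 0
    if fmt.grouping_separator in int_pic:
        group = len(int_pic.rsplit(fmt.grouping_separator, 1)[1])

    # Round half-to-even at the maximum number of fraction digits.
    d = Decimal(str(value)).quantize(Decimal(1).scaleb(-max_frac), rounding=ROUND_HALF_EVEN)
    negative = d < 0
    whole, _, frac = format(abs(d), "f").partition(".")

    # Drop optional trailing zeros; keep the mandatory ones.
    frac = frac.rstrip("0").ljust(min_frac, "0")
    # Drop leading zeros; pad back to the mandatory integer digits.
    whole = whole.lstrip("0").rjust(min_int, "0")
    if not whole and not frac:
        whole = "0"  # never return an empty string

    # Insert a grouping separator every `group` digits, from the right.
    if group:
        chunks = []
        while len(whole) > group:
            chunks.insert(0, whole[-group:])
            whole = whole[:-group]
        chunks.insert(0, whole)
        whole = fmt.grouping_separator.join(chunks)

    result = whole + (fmt.decimal_separator + frac if frac else "")
    return (fmt.minus_sign if negative else "") + result
```

For example, `format_number(1234567.891, "#,##0.00")` gives `1,234,567.89`. Passing `DecimalFormat(decimal_separator=",", grouping_separator=".")` with the picture `#.##0,00` gives the European-style `1.234.567,89`.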

formatnumber-xquery.xq


Thursday, November 20, 2008

XQuery for the SQL programmer – And performance?

In this last post of the XQuery for the SQL programmer series, I would like to spend a few minutes on performance. The previous post listed a dozen data integration use cases for deploying an XQuery engine on top of your SQL database. The question, of course, is how well such a solution can perform.

With a rather naive implementation, one that retrieves the complete table (or multiple tables) and then performs the queries on an in-memory representation, performance will of course be unacceptably slow once you start querying a production database with millions of records, if it works at all.

The tricky part is building a performant and scalable XQuery engine that is capable of translating XQuery straight into SQL. And we believe DataDirect XQuery is...
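A tiny illustration of why pushing work into SQL matters, using Python's built-in sqlite3 module. The `orders` table and its columns are hypothetical, and this is not what DataDirect XQuery generates. It only contrasts the two strategies described above: fetching everything and filtering in memory, versus translating the filter (think of an XQuery `where` clause) into a SQL `WHERE` clause.

```python
import sqlite3


def make_db(rows=10_000):
    """Build a hypothetical 'orders' table in memory, standing in for a production table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        ((f"C{i % 100}", float(i % 1000)) for i in range(rows)),
    )
    return conn


def naive_big_orders(conn, threshold):
    """Anti-pattern: pull the whole table, then filter in memory.
    Returns (number of rows transferred, matching rows)."""
    rows = conn.execute("SELECT id, customer, amount FROM orders ORDER BY id").fetchall()
    return len(rows), [r for r in rows if r[2] > threshold]


def pushed_down_big_orders(conn, threshold):
    """Push the predicate into SQL so only matching rows leave the database.
    Returns (number of rows transferred, matching rows)."""
    rows = conn.execute(
        "SELECT id, customer, amount FROM orders WHERE amount > ? ORDER BY id",
        (threshold,),
    ).fetchall()
    return len(rows), rows
```

With the default 10,000 rows and a threshold of 990, both functions return the same 90 matching rows. The naive version, however, transfers all 10,000 rows before discarding most of them, and that gap grows with the table.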

We wrote a white paper about translating XQuery to SQL, showing concrete XQuery queries and the corresponding SQL. I would advise reading the document, but in short, the SQL generation is based on the following principles:

  • Minimize data retrieval
  • Leverage the database strengths
  • Optimize for each database
  • Retrieve data efficiently
  • Support incremental evaluation
  • Optimize for XML hierarchies
  • Give the programmer the last word
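Two of these principles, "leverage the database strengths" and "support incremental evaluation", can be sketched in a few lines of Python with sqlite3. The aggregation runs as a SQL `GROUP BY`, and results are pulled in batches and turned into XML one element at a time rather than materialized up front. The table, its rows and the `<customer>` output format are hypothetical, and this is only a generic illustration, not how DataDirect XQuery implements either principle.

```python
import sqlite3
from xml.sax.saxutils import quoteattr


def demo_db():
    """Hypothetical 'orders' table with a handful of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("ACME", 10.0), ("ACME", 5.5), ("Globex", 7.25)],
    )
    return conn


def customer_totals_xml(conn, batch_size=100):
    """Yield one <customer> element at a time.

    The GROUP BY runs inside the database, and rows are pulled in
    batches with fetchmany() instead of fetching the whole result
    at once (a simple form of incremental evaluation)."""
    cur = conn.execute(
        "SELECT customer, COUNT(*), SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        for customer, count, total in batch:
            # quoteattr() escapes the value and adds the surrounding quotes.
            yield f'<customer name={quoteattr(customer)} orders="{count}" total="{total:.2f}"/>'
```

Iterating over `customer_totals_xml(demo_db())` produces `<customer name="ACME" orders="2" total="15.50"/>` followed by `<customer name="Globex" orders="1" total="7.25"/>`.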

And of course, when it comes to answering your data integration challenges, it's a matter of joining and aggregating relational data with other formats in the most efficient way. We have blogged about this topic before, but there is of course much more to say. It looks like I should spend some more blog-time on the performance and scalability aspects of data integration through XQuery.
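As a rough sketch of the shape of such a query, here is the "aggregate relationally, then join with another format" pattern in plain Python. The `orders` table, the `<rates>` XML document and its made-up conversion numbers are all hypothetical. An XQuery engine would express the whole thing as one query. This just spells out the work it has to plan: aggregate in the database first, then join the small result with the XML side.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML source: made-up conversion factors per currency code.
RATES_XML = """<rates>
  <rate code="EUR" usd="2.0"/>
  <rate code="GBP" usd="4.0"/>
</rates>"""


def demo_db():
    """Hypothetical relational source: orders with an amount and a currency."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, currency TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("ACME", "EUR", 10.0), ("ACME", "EUR", 5.5), ("Globex", "GBP", 7.25)],
    )
    return conn


def totals_in_usd(conn, rates_xml):
    """Aggregate on the relational side, then join with the XML side in memory."""
    # Aggregate in SQL so only one row per (customer, currency) is fetched.
    totals = conn.execute(
        "SELECT customer, currency, SUM(amount) FROM orders "
        "GROUP BY customer, currency ORDER BY customer"
    ).fetchall()
    # Index the XML rates by currency code (a simple hash join).
    rates = {r.get("code"): float(r.get("usd")) for r in ET.fromstring(rates_xml).iter("rate")}
    # Inner join: rows whose currency has no rate are dropped.
    return [(cust, total * rates[cur]) for cust, cur, total in totals if cur in rates]
```

Here `totals_in_usd(demo_db(), RATES_XML)` returns `[("ACME", 31.0), ("Globex", 29.0)]`.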

And remember, performance is one aspect; developer productivity matters too. Think of all the APIs to master and all the Java code to write (and maintain!) to combine multiple data sources, while all this can be done in a single XQuery.


Friday, November 7, 2008

Grouping an XML document based on element names

It has been a while since we talked about a "pure XML" problem to be solved with XQuery, so when I read this unanswered post on the Stylus Studio Developer Network I thought it was a good opportunity to discuss it here as an interesting XQuery example.

The problem involves moving from a flat XML structure like this one:

...to a more hierarchical XML that "explodes" the implicit structure hidden in the original XML element names:

In the end this is a grouping problem, but a bit trickier than usual, as it involves recognizing and exploding the groups from the original XML element names.

Even though XQuery 1.0 doesn't support grouping explicitly, the fn:distinct-values() function is extremely useful for solving grouping problems. fn:distinct-values() takes a sequence of atomic values as input and returns a sequence containing the same values with duplicates removed. That helps a lot with our problem, as we can retrieve the unique top-level categories (MAINx) and, for each of them, the unique subcategories (SUBy). Add to that a very simple use of the fn:tokenize() function, which splits a name like "MAIN1_SUB1_COLNAME1" into a sequence like ("MAIN1", "SUB1", "COLNAME1"), and the problem is easily solved; here is the XQuery I came up with:

That generates the following XML result, which is what we are looking for:

It's a simple example, but it makes use of some useful XQuery functions and constructs. It's also a reminder that, while we keep talking about how useful XQuery is for dealing with heterogeneous data sources and for leveraging the XML Data Model as an abstraction over the physical details of the data, XQuery is also extremely powerful and flexible in the "simpler" cases where you need to manipulate and rearrange XML structures.
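For readers who want to play with the same distinct-values plus tokenize idea outside an XQuery engine, here is a rough Python translation using xml.etree.ElementTree. It is not the XQuery from this post. The `<ROW>` wrapper, the sample values and the exact nesting of the output are assumptions, and every child name is assumed to have at least three `_`-separated parts. Unlike fn:tokenize, the split is capped at three parts so that a column name containing `_` stays intact.

```python
import xml.etree.ElementTree as ET


def distinct_values(values):
    """Order-preserving de-duplication, playing the role of fn:distinct-values().
    (fn:distinct-values itself does not guarantee any particular order.)"""
    return list(dict.fromkeys(values))


def explode(flat_root):
    """Turn children named MAIN_SUB_COL into nested <MAIN><SUB><COL> elements."""
    # Like fn:tokenize($name, "_"), but maxsplit=2 keeps any further "_" in COL.
    cells = [(child.tag.split("_", 2), child.text) for child in flat_root]
    out = ET.Element(flat_root.tag)
    # One element per distinct top-level category (MAINx)...
    for main in distinct_values(parts[0] for parts, _ in cells):
        main_el = ET.SubElement(out, main)
        # ...then one per distinct subcategory (SUBy) within it...
        for sub in distinct_values(parts[1] for parts, _ in cells if parts[0] == main):
            sub_el = ET.SubElement(main_el, sub)
            # ...and finally the leaf columns that belong to this MAIN/SUB pair.
            for parts, text in cells:
                if parts[0] == main and parts[1] == sub:
                    ET.SubElement(sub_el, parts[2]).text = text
    return out
```

Fed a hypothetical `<ROW>` with children `MAIN1_SUB1_COLNAME1`, `MAIN1_SUB1_COLNAME2`, `MAIN1_SUB2_COLNAME1` and `MAIN2_SUB1_COLNAME1`, `explode()` returns a `<ROW>` with `MAIN1` (holding `SUB1` and `SUB2`) and `MAIN2` (holding `SUB1`), each leaf keeping its original text.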
