XML Connections

Wednesday, August 29, 2007

XQJ Part VI - Manipulating the static context

Today's post in the XQJ series explains how to access and manipulate the static context through the XQJ API.

XQuery defines the Static Context as follows,

The static context of an expression is the information that is available during static analysis of the expression, prior to its evaluation.

Refer to the XQuery spec for the complete list, but the static context includes, for example,

  • default element namespace
  • statically known namespaces
  • context item static type
  • default order for empty sequences
  • boundary-space policy
  • base uri
  • etc

Most of the components in the static context can be initialized or augmented in the query prolog. In the next example, the boundary-space policy is explicitly specified,

declare boundary-space preserve;
<e> </e>

If a static context component is not initialized in the query prolog, an implementation default is used. Indeed, although XQuery defines default values for each of the components in the static context, as outlined in Appendix C of the XQuery specification, implementations are free to override and/or extend these defaults.
In theory this means that the same query can behave substantially differently between two "conformant" XQuery implementations. So much for interoperability... Not that I know of any implementation overriding the default function namespace from 'fn' to something proprietary. If an implementation does, I guess the marketplace will decide if it was a good choice...

Anyway, back to our example. Applications often need to change the defaults for some of the static context components. If you need to preserve boundary space in all queries, you have the option to add the boundary-space declaration to each of your queries, as shown above. Or wouldn't it be nice if the implementation's default could be overridden through the API and apply to all queries? Well, I guess it is not a matter of one approach being better than the other; it all depends on your application design and use case.

How can I set boundary-space policy to preserve through the XQJ API?

...
// get a static context object with the implementation's defaults
XQStaticContext xqsc = xqc.getStaticContext();
// make sure boundary-space policy is preserve
xqsc.setBoundarySpacePolicy(XQConstants.BOUNDARY_SPACE_PRESERVE);
// make the changes effective
xqc.setStaticContext(xqsc);
...

First retrieve the implementation's default values for the static context components through an XQStaticContext object. XQStaticContext defines setter and getter methods for the various static context components.
As shown in the previous example, an XQStaticContext is a value object. Changing any of the static context components has no effect yet. Only after calling setStaticContext() on the XQConnection object do the new values in the XQStaticContext become effective. One can say that XQStaticContext objects are passed by value from the XQJ driver to the application and vice versa.
Once the static context has been updated, all (and only) subsequently created XQExpression and XQPreparedExpression objects will assume the new values for the static context components.
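The copy-in/copy-out behaviour can be modelled in a few lines of plain Java. This is only an illustrative sketch: ToyConnection and ToyStaticContext are hypothetical stand-ins, not XQJ classes, and the "strip" default is an assumption made for the example.

```java
// Hypothetical model of the value-object behaviour described above; not the XQJ API.
final class ToyConnection {
    private ToyStaticContext current = new ToyStaticContext();

    // Hands out a copy, so callers cannot change the connection by accident.
    ToyStaticContext getStaticContext() { return current.copy(); }

    // Stores a copy, so later changes to the argument have no effect either.
    void setStaticContext(ToyStaticContext sc) { current = sc.copy(); }

    // Each "expression" captures the policy in effect at the time it is created.
    String prepare() { return current.boundarySpacePolicy; }

    public static void main(String[] args) {
        ToyConnection xqc = new ToyConnection();
        ToyStaticContext sc = xqc.getStaticContext();
        sc.boundarySpacePolicy = "preserve";
        System.out.println(xqc.prepare()); // strip: the change is not applied yet
        xqc.setStaticContext(sc);
        System.out.println(xqc.prepare()); // preserve
    }
}

final class ToyStaticContext {
    String boundarySpacePolicy = "strip"; // assumed default for this sketch

    ToyStaticContext copy() {
        ToyStaticContext c = new ToyStaticContext();
        c.boundarySpacePolicy = boundarySpacePolicy;
        return c;
    }
}
```

Because both directions copy, editing the object you hold never touches the connection until you hand it back.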

...
// the boundary-space for this first query is implementation defined,
// i.e. depends on the implementation's defaults
XQPreparedExpression xqp1 = xqc.prepareExpression("<e> </e>");
// set the boundary-space policy to preserve
XQStaticContext xqsc = xqc.getStaticContext();
xqsc.setBoundarySpacePolicy(XQConstants.BOUNDARY_SPACE_PRESERVE);
xqc.setStaticContext(xqsc);
// the boundary-space policy for this second query *is* preserve
XQPreparedExpression xqp2 = xqc.prepareExpression("<e> </e>");
...

In the previous examples, the static context is updated at the connection level, and as such all subsequently created XQExpression and XQPreparedExpression objects are affected. This is great if you want all your XQuery expressions to be based on the same defaults in the static context. But what if the default values need to be different for some XQExpression and XQPreparedExpression objects? The application can also specify an XQStaticContext when creating XQ(Prepared)Expression objects.

...
// change the boundary-space policy in the static context object
// but don't apply those change at the connection level
XQStaticContext xqsc = xqc.getStaticContext();
xqsc.setBoundarySpacePolicy(XQConstants.BOUNDARY_SPACE_PRESERVE);
// create a prepared expression using the modified static context
// other expressions subsequently created are not affected
XQPreparedExpression xqp1 = xqc.prepareExpression("<e> </e>", xqsc);
...

Again, this approach is useful if some static context components need to be changed for a specific expression, while keeping the default values for most other expressions being executed.

Almost all static context components are accessible through XQStaticContext. Here is the list,

  • Statically known namespaces
  • Default element/type namespace
  • Default function namespace
  • Context item static type
  • Default collation
  • Construction mode
  • Ordering mode
  • Default order for empty sequences
  • Boundary-space policy
  • Copy-namespaces mode
  • Base URI

In addition, XQStaticContext includes a number of XQJ specific properties,

  • Binding mode
  • Holdability of the result sequences
  • Scrollability of the result sequences
  • Query language
  • Query timeout

The most frequently used properties are "Binding mode" and "Scrollability", which will be discussed in a future post in this series. The query language is XQuery by default, and can be changed to XQueryX. The query timeout sets the number of seconds an implementation will wait for a query to execute; supporting it is optional, and implementations are free to ignore it.

Looking forward to the next post? We'll discuss how XQuery types are exposed through XQJ.


Tuesday, August 28, 2007

XQuery against RDBMS: let the engine optimize your SQL


We recently noticed a question on a newsgroup that attracted our attention. It was something similar to this:

How to create an XQuery which generates SQL like SELECT t1.columnName1 FROM Table t1 WHERE t1.columnName2 IN('as','fa','pr')

My immediate reaction was: why is he worrying about that? You shouldn't need to think about how to write an XQuery to obtain a specific SQL; it is the XQuery processor's goal to digest your XQuery and make the "best" SQL out of it...

The obvious XQuery that comes to my mind would be...

for $ts in fn:collection("Table")/Table
where $ts/columnName2 = ('as','fa','pr')
return $ts/columnName1


Strangely enough, experts on that newsgroup suggested writing the query like this instead:

for $t in Table() where $t/columnName2 = 'as' or $t/columnName2 = 'fa' or $t/columnName2 = 'pr'
return $t


...or, assuming a sequence of values on which to filter:

for $v in $values for $t in Table()
where $t/columnName2 = $v
return $t


Why would they suggest such an unnatural way to solve that problem in XQuery? The reality is that I'm thinking in terms of what DataDirect XQuery would do, while they are thinking about different XQuery processors. DataDirect XQuery has been designed to rewrite XQuery expressions in SQL without forcing the user to code XQuery in a specific way. In XQuery it's possible to express the same logic in many different ways, but it shouldn't be the XQuery author's responsibility to guess how the underlying XQuery engine optimizes what he writes; it should be the XQuery engine that is able to make the "right decisions" no matter how the user codes the solution in XQuery (at least in reasonably equivalent scenarios, like the one described above).
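To make "the same logic in different ways" concrete, here is a small in-memory Java model of the three query shapes. It is only a sketch: the class, the sample rows and the choice to return columnName1 from every shape are assumptions made for illustration, and it says nothing about the SQL any engine generates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory stand-in for the three query shapes; each row is {columnName1, columnName2}.
final class InListShapes {
    // Shape 1: where $t/columnName2 = ('as','fa','pr')
    static List<String> generalComparison(List<String[]> table, List<String> values) {
        List<String> out = new ArrayList<>();
        for (String[] row : table) {
            if (values.contains(row[1])) out.add(row[0]);
        }
        return out;
    }

    // Shape 2: where $t/columnName2 = 'as' or $t/columnName2 = 'fa' or ...
    static List<String> orChain(List<String[]> table, List<String> values) {
        List<String> out = new ArrayList<>();
        for (String[] row : table) {
            boolean match = false;
            for (String v : values) match = match || row[1].equals(v);
            if (match) out.add(row[0]);
        }
        return out;
    }

    // Shape 3: for $v in $values for $t in Table() where $t/columnName2 = $v
    static List<String> nestedFor(List<String[]> table, List<String> values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            for (String[] row : table) {
                if (row[1].equals(v)) out.add(row[0]);
            }
        }
        return out;
    }

    static List<String[]> sampleTable() {
        return Arrays.asList(
                new String[] {"r1", "as"}, new String[] {"r2", "zz"},
                new String[] {"r3", "pr"}, new String[] {"r4", "fa"});
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("as", "fa", "pr");
        System.out.println(generalComparison(sampleTable(), values)); // [r1, r3, r4]
        System.out.println(orChain(sampleTable(), values));           // [r1, r3, r4]
        System.out.println(nestedFor(sampleTable(), values));         // [r1, r4, r3]
    }
}
```

All three shapes select the same rows. The nested-for shape groups them by value rather than by table order, and it would repeat a row if the value list contained duplicates, which is one reason "reasonably equivalent" is the right qualifier.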

For the record, the XQueries described above all end up generating the same execution plan in DataDirect XQuery.
If you are interested in more details about how DataDirect XQuery generates SQL when running XQuery against relational data sources, a good source of information is the Generating SQL white paper, as well as Consistent SQL Generation.



Thursday, August 23, 2007

XQJ Part V - Serializing query results

The XQuery 1.0 specification consists of multiple documents; one of them is XSLT 2.0 and XQuery 1.0 Serialization. Given a data model instance, it defines how to serialize it into a sequence of octets. A typical use case: it provides guidelines on how to write query results to a file using XML syntax.

Serialization defines a number of parameters which influence this process. The specification includes a detailed description for each of these parameters. We'll explain some through examples later on,

  • byte-order-mark
  • cdata-section-elements
  • doctype-public
  • doctype-system
  • encoding
  • escape-uri-attributes
  • include-content-type
  • indent
  • media-type
  • method
  • normalization-form
  • omit-xml-declaration
  • standalone
  • undeclare-prefixes
  • use-character-maps
  • version

Note that XQuery Serialization is an Optional Feature in XQuery. However, XQJ is stricter and requires every implementation to support serialization. XQJ does not require every parameter defined in the XQuery Serialization spec to be supported to its full extent, but at least a default value for each of the parameters needs to be documented, and the behavior must conform to the spec.
For DataDirect XQuery all parameters are documented here.

Suppose you want to serialize your query results to a file. That is fairly simple, as shown in the next example,

...
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"doc('orders.xml')/*/ORDERS[O_ORDERKEY = '39']");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
new Properties());
...

Note the second argument of writeSequence() is an empty Properties object. You can also specify null. Both an empty Properties object and null imply that the XQJ driver uses the default values for each of the serialization parameters.

You might get something as follows (assume this to be one line),

<ORDERS><O_ORDERKEY>39</O_ORDERKEY><O_CUSTKEY>
8177</O_CUSTKEY><O_ORDERSTATUS>O</O_ORDERSTATUS>
<O_TOTALPRICE>307811.89</O_TOTALPRICE><O_ORDERDATE>
1996-09-20T00:00:00</O_ORDERDATE><O_ORDERPRIORITY>3-MEDIUM
</O_ORDERPRIORITY><O_CLERK>Clerk#000000659</O_CLERK>
<O_SHIPPRIORITY>0</O_SHIPPRIORITY><O_COMMENT>furiously
unusual pinto beans above the furiously ironic asymptot
</O_COMMENT> </ORDERS>

Not really readable; some indentation would help. It's also good practice to add the XML declaration including an encoding. Suppose we want to encode the XML file as UTF-16,

...
Properties serializationProps = new java.util.Properties();
// make sure we output xml
serializationProps.setProperty("method", "xml");
// pretty printing
serializationProps.setProperty("indent", "yes");
// serialize as UTF-16
serializationProps.setProperty("encoding", "UTF-16");
// want an XML declaration
serializationProps.setProperty("omit-xml-declaration", "no");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"doc('orders.xml')/*/ORDERS[O_ORDERKEY = '39']");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

What we get now is much better,

<?xml version="1.0" encoding="UTF-16"?>
<ORDERS>
<O_ORDERKEY>39</O_ORDERKEY>
<O_CUSTKEY>8177</O_CUSTKEY>
<O_ORDERSTATUS>O</O_ORDERSTATUS>
<O_TOTALPRICE>307811.89</O_TOTALPRICE>
<O_ORDERDATE>1996-09-20T00:00:00</O_ORDERDATE>
<O_ORDERPRIORITY>3-MEDIUM</O_ORDERPRIORITY>
<O_CLERK>Clerk#000000659</O_CLERK>
<O_SHIPPRIORITY>0</O_SHIPPRIORITY>
<O_COMMENT>furiously unusual pinto
beans above the furiously ironic
asymptot</O_COMMENT>
</ORDERS>
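If you want to experiment with these parameters without an XQJ driver at hand, JAXP's identity Transformer in the JDK accepts a java.util.Properties object with the same parameter names. The sketch below is an analogy only: it uses JAXP rather than XQJ, the class name is made up, and the exact indentation differs between serializers.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// JAXP stand-in: serializes XML text using a Properties object of serialization parameters.
final class SerializeWithProperties {
    static String serialize(String xml, Properties params) {
        try {
            // A Transformer created without a stylesheet copies its input to its output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperties(params);
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("method", "xml");
        p.setProperty("indent", "yes");
        p.setProperty("encoding", "UTF-16");
        p.setProperty("omit-xml-declaration", "no");
        System.out.println(serialize(
                "<ORDERS><O_ORDERKEY>39</O_ORDERKEY><O_CUSTKEY>8177</O_CUSTKEY></ORDERS>", p));
    }
}
```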

Note that during serialization characters are escaped as needed for the specified encoding. Suppose a query returns a document with a registered trademark character, and the specified encoding is US-ASCII,

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("encoding", "ASCII");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"<product>DataDirect XQuery®</product>");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

And you'll get the following. Note that the ® character is serialized as a character reference because it is not defined in the ASCII character set,

<product>DataDirect XQuery&#xae;</product>
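The same escaping rule can be observed with the JDK's own serializer. This is a JAXP-based sketch, not XQJ, and the class name is hypothetical; a given serializer may write the reference in decimal or hexadecimal form.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// JAXP stand-in: serializes XML text as US-ASCII and returns the bytes as a String.
final class AsciiEscapeDemo {
    static String toAscii(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "xml");
            identity.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(bytes));
            return new String(bytes.toByteArray(), StandardCharsets.US_ASCII);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00AE is the registered trademark sign; it has no ASCII representation.
        System.out.println(toAscii("<product>DataDirect XQuery\u00AE</product>"));
    }
}
```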

In some use cases, the cdata-section-elements parameter is useful. Suppose you're serializing some XML elements that include ampersand characters. By default the & characters will be escaped; using CDATA sections might be preferable to make the XML file more human readable.

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("cdata-section-elements", "product");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"<product>DataDirect XQuery &amp; XML Converters</product>");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

Is serialized as follows,

<product><![CDATA[DataDirect XQuery & XML Converters]]></product>
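For comparison, here is the same parameter applied through JAXP's identity Transformer, which ships with the JDK. Treat it as an analogy rather than XQJ code: the class name is made up, and the list syntax for several element names is not the same in JAXP as in the XQJ example, so only a single unqualified name is used here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// JAXP stand-in: writes the text of the named element as a CDATA section.
final class CdataSectionDemo {
    static String serialize(String xml, String cdataElement) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "xml");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            identity.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, cdataElement);
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<e><product>DataDirect XQuery &amp; XML Converters</product>"
                + "<note>this &amp; that</note></e>";
        // Only <product> is listed, so <note> keeps the escaped &amp; form.
        System.out.println(serialize(xml, "product"));
    }
}
```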

Note that multiple elements can be specified through the cdata-section-elements parameter, separated by a semicolon. If the element is in a namespace, add the namespace URI using the James Clark notation, "{" + namespace URI + "}" + local name,

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("encoding", "UTF-8");
serializationProps.setProperty("omit-xml-declaration", "no");
serializationProps.setProperty("cdata-section-elements",
"product;{uri}product");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"<e xmlns:p='uri'> " +
" <product>DataDirect XQuery &amp; XML Converters</product>" +
" <p:product>DataDirect XQuery &amp; XML Converters</p:product>" +
"</e>");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

Yields the following result,

<?xml version="1.0" encoding="UTF-8"?>
<e xmlns:p="uri">
<product><![CDATA[DataDirect XQuery & XML Converters]]></product>
<p:product><![CDATA[DataDirect XQuery & XML Converters]]></p:product>
</e>

In addition to the XML output method, XQuery serialization defines other output methods like HTML and XHTML. Note that these serialization methods will not "automagically" produce (X)HTML. It is still the query's responsibility to produce results that conform to (X)HTML. But the serializer will apply the (X)HTML rules when outputting the results. For example, <br> elements will be serialized without a closing </br>.
Note for example the difference between the following result.xml and result.html

...
Properties serializationProps = new java.util.Properties();
XQPreparedExpression xqpe = xqc.prepareExpression(
"<html>line1<br/>line2</html>");
XQSequence xqs = xqpe.executeQuery();
serializationProps.setProperty("method", "xml");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
xqs = xqpe.executeQuery();
serializationProps.setProperty("method", "html");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.html"),
serializationProps);
...

result.xml is as follows,

<html>line1<br/>line2</html>

whereas result.html will look as follows,

<html>line1<br>line2</html>
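The difference between the two output methods is easy to reproduce with the serializer built into the JDK. Again this is a JAXP sketch under the assumption that its "xml" and "html" methods follow the same serialization rules; it is not XQJ code, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// JAXP stand-in: serializes the same input once per output method.
final class OutputMethodDemo {
    static String serialize(String xml, String method) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, method);
            identity.setOutputProperty(OutputKeys.INDENT, "no");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String input = "<html>line1<br/>line2</html>";
        System.out.println(serialize(input, "xml"));  // keeps the empty-element tag
        System.out.println(serialize(input, "html")); // writes <br> with no end tag
    }
}
```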

If you're interested in all the details about (X)HTML serialization, look here for HTML and here for XHTML.

In all previous examples, we've serialized the query results to a FileOutputStream. An XQSequence can also be serialized to a java.io.Writer using the writeSequence() method, and getSequenceAsString() serializes to a java.lang.String.

Similar to serializing the complete XQSequence, there are methods to serialize the current item in the XQSequence. In the following example, the items in the query result are saved into individual files, result1.xml, result2.xml, and so on.

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("indent", "yes");
serializationProps.setProperty("encoding", "UTF-8");
serializationProps.setProperty("omit-xml-declaration", "no");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery("doc('orders.xml')/*/ORDERS");
int i = 1;
while (xqs.next()) {
FileOutputStream file;
file = new FileOutputStream("/home/jimmy/result" +
i + ".xml");
xqs.writeItem(file, serializationProps);
file.close();
// move on to the next file name
i++;
}
...

To conclude this post, note that XML serialization doesn’t always result in a well-formed XML document. More precisely it is either a well-formed XML document or a well-formed XML external general parsed entity. This is further explained in the serialization specification.

In the next post, we'll talk about manipulating the XQuery Static Context through the XQJ API.


Monday, August 20, 2007

XQJ Part IV - Processing query results

In XQJ Part III we learned how to execute queries. In XQuery, query evaluation results in a sequence. In XQJ, executing a query through XQExpression or XQPreparedExpression returns an XQSequence object. An XQSequence represents an XQuery sequence together with a cursor over that sequence.

The application can browse through an XQSequence using the next() method. Initially the current position of the XQSequence is before the first item. next() moves the current position forward and returns true if there is another item to be consumed. Once all items in the sequence have been read, next() returns false.

Let's iterate through a sequence,

...
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery("doc('orders.xml')//order[id='174']");
while (xqs.next()) {
...
}
...

Positioned on an item, the application can retrieve the data using one of the getXXX() methods. To give a taste, we'll go through some of these methods by example.

An application can use getObject() to retrieve the current item of an XQSequence as a Java object. XQJ defines a mapping for each of the XQuery item types to a Java object value.

One of the most common scenarios is probably a query returning a sequence of elements. For getObject(), XQJ defines a mapping to Java DOM elements,

... 
org.w3c.dom.Element employee;
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery("doc('employees.xml')//employee");
while (xqs.next()) {
employee = (org.w3c.dom.Element)xqs.getObject();
...
}
...

But actually, XQJ defines a mapping for every XQuery type to Java objects, including all the atomic types. Assume for example a query returning xs:decimal values; using getObject() your Java application retrieves the items as java.math.BigDecimal objects,

... 
java.math.BigDecimal price;
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"doc('orders.xml')/orders/order/xs:decimal(total_price)");
while (xqs.next()) {
price = (java.math.BigDecimal)xqs.getObject();
...
}
...

Suppose you have a query returning atomic values, and want to retrieve a textual representation of these, for example to write them to System.out. getAtomicValue() returns a string representation of an atomic value according to the XQuery xs:string casting rules, and throws an exception if the item is a node.
In the next example the query returns a sequence of atomic values. Note that the items are not all of the same type.

... 
XQExpression xqe = xqc.createExpression();
XQSequence xqs = xqe.executeQuery(
"'Hello world!', 123, 1E1, xs:QName('abc')");
while (xqs.next()) {
System.out.println(xqs.getAtomicValue());
}
...

Besides the DOM, XQJ also provides native support for two other popular XML APIs, SAX and StAX. In the next example each of the items is returned to the application through SAX,

... 
ContentHandler ch = ...
XQExpression xqe = xqc.createExpression();
XQSequence xqs = xqe.executeQuery(
"doc('employees.xml')//employee");
while (xqs.next()) {
xqs.writeItemToSAX(ch);
}
...
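The ContentHandler itself is left open in the example above. As one possible shape, here is a handler that simply records element names. It is a sketch: the class name is made up, and a JDK SAX parser stands in for the XQJ sequence as the source of events so that the code runs on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// A ContentHandler (via DefaultHandler) that records the name of every element it sees.
final class ElementNameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(qName);
    }

    // Stand-in event source: a JDK SAX parser instead of an XQJ sequence.
    static List<String> collect(String xml) {
        try {
            ElementNameCollector ch = new ElementNameCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), ch);
            return ch.names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect("<employees><employee><name>Ann</name></employee></employees>"));
    }
}
```

An instance of such a class could be passed wherever the examples use ch.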

Up to now we have seen a number of examples where the application iterates over all the items in the sequence, and retrieves them one-by-one. The XQSequence interface also offers functionality to retrieve the complete sequence within a single call. In the next example, we execute a query and serialize the complete result into a SAX event stream.

... 
ContentHandler ch = ...
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery("doc('employees.xml')//employee");
xqs.writeSequenceToSAX(ch);
...

Or in a similar way, read the complete sequence as a StAX event stream.

... 
XQExpression xqe = xqc.createExpression();
XQSequence xqs = xqe.executeQuery("doc('employees.xml')");
XMLStreamReader xmlReader = xqs.getSequenceAsStream();
while (xmlReader.next() != XMLStreamConstants.END_DOCUMENT) {
...
}
...
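What goes inside the StAX loop is ordinary XMLStreamReader code. The sketch below pulls the text of every name element; the element name, the sample data and the class are assumptions for illustration, and the reader is created from a string by the JDK's XMLInputFactory rather than from an XQSequence.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Pull-style processing: the application asks the reader for the next event.
final class StaxLoopDemo {
    static List<String> employeeNames(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<String> out = new ArrayList<>();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag.
                    out.add(reader.getElementText());
                }
            }
            reader.close();
            return out;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(employeeNames(
                "<employees><employee><name>Ann</name></employee>"
                + "<employee><name>Bob</name></employee></employees>")); // [Ann, Bob]
    }
}
```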

Besides exposing the sequence through a SAX or StAX event stream, XQSequence also provides the ability to serialize into a binary or character stream. Here we're entering the arena of XSLT 2.0 and XQuery 1.0 Serialization; that's what the next post will be about.

Last, the above examples all iterate forward through the XQSequence objects. XQJ also has the notion of scrollable sequences, which allow moving both forward and backward, setting the cursor to an absolute position, and iterating through the XQSequence more than once. We'll come back to that later.


Thursday, August 16, 2007

XQJ Part III - Executing queries

In XQJ Part II we explained how to create a connection. Now your application is ready to do some real work, executing queries.

In XQJ an XQExpression object allows you to execute an XQuery expression. Such an XQExpression object is created in the context of an XQConnection. The next example creates an XQExpression and subsequently uses it to execute an XQuery expression,

...
// assume an XQConnection xqc
XQExpression xqe = xqc.createExpression();
xqe.executeQuery("doc('orders.xml')//order[id='174']");
...

The result of a query evaluation is a sequence, which is modeled as an XQSequence object in XQJ. Hence, the result of the executeQuery() method is an XQSequence. In typical scenarios the code example above will actually look as follows,

...
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery("doc('orders.xml')//order[id='174']");
// process the query results
...

In the next post, XQJ Part IV, we'll discuss the XQSequence functionality in detail.

An XQExpression object can be reused; each time a different XQuery expression can be executed. The next example retrieves all orders with id 174 and then the orders with id 267,

...
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
// execute a first query
xqs = xqe.executeQuery("doc('orders.xml')//order[id='174']");
// process the query results
...
// execute a second query
xqs = xqe.executeQuery("doc('orders.xml')//order[id='267']");
// process the query results
...

In this last example we execute almost the same query twice; only the value compared against changes.

XQJ supports the concept of prepared queries. The idea here is to "prepare" the query only once, and subsequently "execute" it several times. During the prepare phase, the query is parsed, statically validated and an optimized execution plan is generated. This can be a relatively expensive operation, hence using XQPreparedExpression objects can improve performance if the same query is executed multiple times.

Using prepared queries often implies the use of external variables in your query. With each execution, the application can bind different values to each of the external variables.
Coming back to the previous example, here is the XQPreparedExpression variant. Note that the XQuery expression is specified when the XQPreparedExpression object is created, not at execute time,

...
XQPreparedExpression xqp;
XQSequence xqs;
xqp = xqc.prepareExpression(
"declare variable $id as xs:string external; " +
"doc('orders.xml')//order[id=$id]");
// execute a first query and process the query results
xqp.bindString(new QName("id"),"174", null);
xqs = xqp.executeQuery();
...
// execute a second query and process the query results
xqp.bindString(new QName("id"), "267", null);
xqs = xqp.executeQuery();
...

The previous example demonstrated how to bind values to XQPreparedExpression objects. It is also possible to bind values to external variables using XQExpression objects. Suppose you have a DOM tree and want to query it.

...
Document domDocument = ...;
XQExpression xqe = xqc.createExpression();
xqe.bindNode(new QName("doc"), domDocument, null);
XQSequence xqs = xqe.executeQuery(
"declare variable $doc as document-node(element(*,xs:untyped)) external;" +
"$doc//order[id='174']");
// process the query results
...
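The example leaves open where the DOM document comes from. One common way to obtain one, sketched here with the JDK's own parser, is shown below; the class name and the sample XML are made up, and turning on namespace awareness is a choice made for this sketch, on the assumption that a namespace-aware tree is the safer input for an XQuery processor.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Builds an org.w3c.dom.Document that could then be bound to a query.
final class DomForBinding {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // assumption: namespace-aware input is wanted
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<orders><order><id>174</id></order></orders>");
        System.out.println(doc.getDocumentElement().getLocalName()); // orders
    }
}
```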

XQuery has the concept of a context item, represented by a "dot" in your queries. In the previous examples we demonstrated that values can be bound to XQExpression and XQPreparedExpression objects by name. In XQuery, the context item has no name, so XQJ defines a name for binding the context item: XQConstants.CONTEXT_ITEM. The next example is similar to the last one, but it binds the DOM document to the initial context item rather than to an external variable.

...
Document domDocument = ...;
XQExpression xqe = xqc.createExpression();
xqe.bindNode(XQConstants.CONTEXT_ITEM, domDocument, null);
XQSequence xqs = xqe.executeQuery(".//order[id='174']");
// process the query results
...

In a later post, we will come back to binding values to external variables or the context item.

Note that in all the above examples, the XQuery expressions are specified as Java Strings. XQJ also allows you to specify an InputStream, as shown in the next example, where the query in the getorders.xquery file is executed,

...
InputStream query;
query = new FileInputStream("home/joe/getorders.xquery");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(query);
// process the query results
...

It is interesting to note here that an XQuery can optionally specify its encoding in the version declaration at the top of the query. Good XQJ/XQuery implementations will use that information and properly parse the InputStream. If no encoding is specified, the assumed encoding depends on the implementation; for DataDirect XQuery this is UTF-8.

We now know how to execute queries and have introduced the concept of prepared expressions. Next in this series we will focus on processing query results; expect XQJ Part IV soon.


Wednesday, August 15, 2007

Why do I want to use XQuery against relational databases?


A few days ago someone brought to my attention a post by Elliotte Harold about "The State of Native XML Databases", which also mentions DataDirect XQuery:
"DataDirect XQuery is not itself a database. Rather it is an adapter layer that sits on top of your existing payware database such as SQL Server or Oracle and provides an XQuery interface. Why you’d want to use XQuery instead of SQL when talking to a relational database, I’ve never quite been able to fathom. Data Direct XQuery also has adapters for XML files, EDI, and other flat files."

Why would I want to use XQuery when talking to a relational database? I've spent a good portion of my last two years demonstrating how DataDirect XQuery can be used to solve problems that also require access to relational databases, so let me describe a few usecases I have seen. In all of them, being able to use XQuery rather than Java+SQL makes a developer's life much simpler and more productive.

I need to query my database to generate XML documents
We have recently published a customer story about this; it's not infrequent that data needs to be transferred or communicated in XML format. So, even when the data is all stored in relational databases, SQL (plus whatever language around it to "massage" the data) is not necessarily the easiest path to generate XML representing portions of the relational data. The customer in question has offered a quantitative comparison of *how much* easier it is; it's not difficult to believe if you know the technologies in question.

I need to process an incoming XML message and respond including data stored in my database
This is a very frequent usecase. In some cases, instead of receiving an XML request, the user is even dealing with EDI; but there are tools that make the conversion of EDI to XML quite easy, so the problem ends up really being the same. How would you address that problem without an XQuery processor able to handle relational database access? You would probably write a Java application that parses the incoming XML document (even these days you would most likely materialize it in an XML DOM), fetches the information you need from it, issues SQL queries over JDBC to get more related information, and finally creates an XML document to return the requested result(s). We did try implementing this exercise dealing with some ACORD XML requests and a relational back-end storing the various insurance policy details; we will soon publish more information and code samples on http://www.xquery.com/. In the meantime, I'll need to leave it to your imagination what the Java code looks like, and how it compares to the equivalent XQuery+Java code that would achieve the same goal using DataDirect XQuery.

I need to shred the content of my XML document into my database
This is one of the reasons why we have added RDBMS update capabilities in the latest version of DataDirect XQuery; it is another quite popular usecase. Again you are dealing with the necessity to navigate an XML document, find the proper information in there, and then execute operations against a relational database based on that information. Once again the same problem can indeed be solved using Java+SQL+some XML API; but isn't it more productive to use an XML native language featuring the ability to access both XML and relational data in the same context?


To wrap up, I believe the real question I would ask myself is not "Why would I want to use XQuery when talking to a relational database?", but rather "Can I use XQuery to access a relational database achieving the same levels of scalability and performance that I could achieve using a combination of Java and SQL?". That's exactly where the difficulty is, and where XQuery implementations differ dramatically. Of course it would be easy for me to answer "Sure you can! DataDirect XQuery has been designed exactly with that goal in mind!"; and I could point you to a variety of literature on http://www.xquery.com/ (the most interesting one being probably this one by Marc). But instead, I encourage you to try it yourself; get a fully functional trial version from http://www.xquery.com/download, and give it a try, using your own usecase, your own database and your own data set. That's the only way you will actually experience how XQuery can make things easier for you without scalability and performance compromises.



Monday, August 13, 2007

Formatting numbers in XQuery 1.0


I often need to use XQuery to create XHTML or even XSL-FO; which is apparently a fairly unusual usecase for XQuery. It is true that most people dealing with the transformation of XML into HTML or XSL-FO have been doing that with XSLT in the past; and I guess most people working on XQuery specs assumed that this would still be the case even after XQuery became a standard.

Or at least that's the only reason I can find for some obvious things that seem to be missing in XQuery 1.0, one of which is the availability of a format-number() function, which is a quite basic function for creating a string representation of a number.

Why do I need to use XQuery rather than XSLT? Because the XHTML/XSL-FO reports I need to create are aggregating data that is available in part as XML documents, and in part inside a relational database; and at least some XQuery implementations make that task very easy to achieve in a highly scalable way.

Of course in most XQuery implementations like DataDirect XQuery or Saxon you can create Java extension functions to implement a format-number() function; but I wanted to find a way to achieve at least some of the format-number() functionality without having to rely on a Java function; that's why I ended up writing the attached piece of XQuery.

I haven't spent much time cleaning it up, adding documentation or making it part of a nicely structured library module; if anyone is willing to do that (or to make it become part of FunctX), please feel free to. And also, if you have any comments or suggestions about how to improve it, feel free to post them here.

format-number.xquery
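
For comparison, here is a minimal sketch of what the Java side of such an extension function could look like, assuming java.text.DecimalFormat gives you the formatting you need. The class name NumberFormatting and the method signature are made up for illustration, and how a static method is exposed as an extension function differs per XQuery implementation, so that part is left out.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Hypothetical helper class; name and signature are illustrative only. */
final class NumberFormatting {

    private NumberFormatting() {
    }

    /**
     * Formats a number with a java.text.DecimalFormat pattern such as
     * "#,##0.00". The symbols are fixed to Locale.US so that the result
     * does not depend on the default locale of the JVM.
     */
    public static String formatNumber(double value, String picture) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.US);
        return new DecimalFormat(picture, symbols).format(value);
    }
}
```

With this sketch, NumberFormatting.formatNumber(1234.5, "#,##0.00") returns 1,234.50. XSLT 1.0 defines the format-number() pattern syntax in terms of the JDK 1.1 DecimalFormat class, so the patterns should look familiar to XSLT users.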




FunctX XQuery function library is available for download

FunctX is an XQuery library with over 100 useful functions written by Priscilla Walmsley.

The source has been available for a while, but it was a little hard to use, as the code was kept separately for each function. Priscilla has now made an XQuery module available for download.
Simply import the module in your query and be ready to use the functions.

The library has been tested with Saxon as well as DataDirect XQuery.


Friday, August 10, 2007

XQJ Part II - Setting up a session

There is a large variety of XQuery implementations, both in architecture and in the range of supported data sources. XQJ is designed so that all these different architectures can be plugged in. For example, depending on the XQJ and XQuery implementation, the parameters and settings required to configure it might differ. Some implementations are server based and require information to locate the server. Or, if the XQJ and XQuery implementation are co-located, you might need to specify the default location to query files on the local file system.

An XQJ application always starts by accessing an XQDataSource object. Such an object encapsulates all parameters and settings needed to create a session with a specific implementation and eventually execute XQuery expressions and process results.

Every XQJ driver has its own XQDataSource implementation. This XQDataSource object supports a number of implementation-specific properties. For each of these properties, a "getter" and "setter" method is provided.

Assume an application wants to query both an Oracle database and some files located in "/usr/joe/data" with DataDirect XQuery. Two properties need to be specified, the base URI and the JDBC URL to connect to the Oracle database (feel free to replace Oracle with your favourite database, be it SQL Server, MySQL or yet another one),

DDXQDataSource xqds = new DDXQDataSource();
xqds.setBaseUri("/usr/joe/data");
xqds.setJdbcUrl("jdbc:xquery:oracle://sales:1521;SID=ORA10");

Or envision another Oracle specific implementation where the server parameters are specified in individual properties rather than through a jdbc url. It could be as follows,

OracleDataSource xqds = new OracleDataSource();
xqds.setServerName("sales");
xqds.setPortNumber(1521);
xqds.setSID("ORA10");

Having access to an XQDataSource object, what's next? An XQDataSource is a factory for XQConnection objects. The XQConnection object represents a session in which XQuery expressions are executed.

Establishing such a session is straightforward,

XQConnection xqc = xqds.getConnection();

In case user credentials are needed, these can be specified as arguments to the getConnection() method,

XQConnection xqc = xqds.getConnection("joe", "topsecret");

So far so good. However, creating XQDataSource objects as outlined above makes the application dependent on a specific XQJ implementation, because the proprietary classes DDXQDataSource and OracleDataSource are referenced. This is not necessarily wrong; there are scenarios where hard-coding the underlying XQJ implementation makes sense.

But often this is not desirable. XQJ is all about making your application independent from the underlying XQuery implementation. How can we make our application independent of the XQJ implementation? We'll show two approaches.

  • using a Java properties file
  • through JNDI

Assume all the XQDataSource properties are stored in a Java properties file, together with an additional property, ClassName, that identifies the XQDataSource implementation to use.

For the DataDirect XQuery example above, the properties file would look as follows,

ClassName = com.ddtek.xquery3.xqj.DDXQDataSource
BaseUri = /usr/joe/data
JdbcUrl = jdbc:xquery:oracle://sales:1521;SID=ORA10

For the Oracle implementation,

ClassName = org.example.xqj.OracleDataSource
ServerName = sales
PortNumber = 1521
SID = ORA10

Using such a properties file, an application can easily abstract out any hard-coded dependencies on the underlying XQJ implementation. The XQDataSource class is loaded through reflection, and after that it is simply a matter of passing in the properties.

// load the properties file
Properties p = new Properties();
p.load(new FileInputStream("/tmp/xqjds.prop"));

// create an XQDataSource instance using reflection
String xqdsClassName = p.getProperty("ClassName");
Class xqdsClass = Class.forName(xqdsClassName);
XQDataSource xqds = (XQDataSource)xqdsClass.newInstance();

// remove the ClassName property
// the XQJ implementation will not recognize
// it and raise an error
p.remove("ClassName");

// set the remaining properties
xqds.setProperties(p);

// create an XQConnection
XQConnection xqc = xqds.getConnection();

Of course, this is just an example; for some applications another means of loading the data source properties might well be better suited.
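
As one possible variation, the sketch below separates the ClassName entry from the remaining properties without modifying the loaded Properties object, and fails early when ClassName is missing. DataSourceConfig is a hypothetical helper class invented for this illustration; it is not part of XQJ or of any vendor API, and it only uses java.util.Properties.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Hypothetical helper; not part of XQJ or of any vendor API. */
final class DataSourceConfig {

    /** Value of the ClassName property. */
    final String className;
    /** All other properties, ready to hand to the data source. */
    final Properties settings;

    private DataSourceConfig(String className, Properties settings) {
        this.className = className;
        this.settings = settings;
    }

    /** Reads a properties file; the caller opens and closes the reader. */
    static DataSourceConfig read(Reader in) throws IOException {
        Properties all = new Properties();
        all.load(in);
        return from(all);
    }

    /** Splits ClassName from the rest without modifying the argument. */
    static DataSourceConfig from(Properties all) {
        String className = all.getProperty("ClassName");
        if (className == null || className.trim().isEmpty()) {
            throw new IllegalArgumentException("ClassName property is missing");
        }
        Properties settings = new Properties();
        for (String name : all.stringPropertyNames()) {
            if (!name.equals("ClassName")) {
                settings.setProperty(name, all.getProperty(name));
            }
        }
        return new DataSourceConfig(className.trim(), settings);
    }
}
```

An application would then load the class named by className through reflection and pass settings to setProperties(), as in the code above. Opening the properties file in a try-with-resources statement makes sure it is closed again once the properties are read.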

As with JDBC, when running in a J2EE environment the XQDataSource object can be stored in a JNDI-enabled naming service. This allows your application to access the XQDataSource by simply specifying a logical name.

// get the initial JNDI context
Context ctx = new InitialContext();
// load the XQDataSource instance
XQDataSource xqds = (XQDataSource)ctx.lookup("xqj/sales");
// create an XQConnection
XQConnection xqc = xqds.getConnection();

For readers with a JDBC background: JDBC has two mechanisms to establish a connection,

  • DriverManager
  • DataSource

Don't look for DriverManager-like functionality in XQJ; XQJ doesn't offer this legacy functionality.

We have now learned how to create an XQConnection. In our next post we will do some real work, and show how to execute queries.


Tuesday, August 7, 2007

XQJ Part I - Introduction

This is the first post in a series about XQJ (JSR 225). The XQJ initiative aims to provide the standard Java API to access XQuery implementations. Eventually XQJ will be for XQuery what JDBC is for SQL.
XQJ is currently in public review; the specification can be downloaded here.

I hope this series will give a good overview of the major interfaces and functionality offered by XQJ, mostly through example code. The majority of the code should be implementation independent, but I have to admit, it will be tested using DataDirect XQuery 3.0, which supports the XQJ Public Draft.

We have today the following essays in the XQJ series (the list is updated as more posts become available),

Let's start with our first simple XQJ application; the XQuery/XQJ version of "Hello World!".

XQDataSource xqjd = new DDXQDataSource();
XQConnection xqjc = xqjd.getConnection();
XQExpression xqje = xqjc.createExpression();
XQSequence xqjs = xqje.executeQuery("'Hello World!'");
xqjs.writeSequence(System.out, null);
xqjc.close();

I assume most of it speaks for itself. But don't worry if you have some questions, things will become clear in the upcoming posts.

Some of you might recognize some of the JDBC concepts in the code example above.
Indeed, the XQJ expert group decided to make XQJ stylistically consistent with JDBC. Similar to JDBC, it has concepts like,

  • datasources
  • connections
  • expressions and prepared expressions

But at the same time, XQuery is not SQL. As such, XQJ differs from JDBC in areas like,

  • data model
  • typing
  • static context
  • error handling

The next post in this series will introduce data sources and setting up connections.
