XML Connections

Tuesday, March 4, 2008

Approval for XQJ 1.0

I blogged last week on submitting XQJ to the JCP for final approval. Yesterday, it successfully passed the Final Approval Ballot of the JCP Executive Committee.

XQJ 1.0 will be released in the coming weeks! I'll let you know when the download page becomes available, and keep you informed about DataDirect's plans to officially support XQJ 1.0.

I would like to use this opportunity to thank everyone in the JSR-225 Expert Group for their efforts on this specification.

Wednesday, February 20, 2008

XQJ goes for Final Approval Ballot

The XQJ specification (JSR 225) has been submitted to the JCP for final approval. The Final Approval Ballot lasts from February 19 until March 3. I'll keep you posted here.

Want to know more about XQJ? We have a tutorial covering most of its functionality. And there are already several implementations available, including DataDirect XQuery.

Sunday, November 25, 2007

XQJ Part XI - Processing large inputs

Today's post in the XQJ series explains how to handle and query large XML documents through the XQJ API.

Since XML became a standard in the late '90s, we have been taught that XML is a tree, and the most intuitive (and popular) representation of such a tree has been (and still is!) the Document Object Model (DOM).

When you think about querying XML documents with XQuery, XSLT or XPath, you usually picture a processor that navigates the DOM tree, extracts and compares the values it needs, and builds another DOM as the result of those operations. That is indeed what happens in typical XML processing implementations. Although today's processors use a more efficient representation than DOM, the underlying problem remains the same: scalability.

What happens if the XML you are dealing with does not fit in the memory available to your application? That is usually the limit that typical "in-memory" XQuery, XSLT and XPath implementations hit. But what if you could forget about DOMs, forget about materializing the whole XML tree in memory, and do XML processing in a purely streaming fashion?

Using a streaming XQuery processor, like DataDirect XQuery, is a good start. But a chain is only as strong as its weakest link. Besides the streaming capabilities of your XQuery implementation, the API must also provide a way to handle those large XML fragments.
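To make the contrast with DOM concrete, here is a minimal JDK-only sketch (it uses StAX, not XQJ) of what streaming means at the parser level: the reader visits each event once and the code keeps nothing but a counter, so memory use does not grow with the size of the document. The class name ClosedOrderCounter and the order/@status layout are assumptions borrowed from the orders.xml examples in this post.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: counts <order status="closed"> elements in a single streaming pass. */
final class ClosedOrderCounter {

    static int countClosed(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            int closed = 0;
            try {
                // Pull one event at a time; no tree is ever built in memory.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "order".equals(reader.getLocalName())
                            && "closed".equals(reader.getAttributeValue(null, "status"))) {
                        closed++;
                    }
                }
            } finally {
                reader.close();
            }
            return closed;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML input", e);
        }
    }
}
```

With a real file you would hand the factory an InputStream instead of a StringReader; the loop's footprint stays the same whether the document holds ten orders or ten million.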

From an XQuery API perspective, it is crucial that the input to your query can be handled in a streaming fashion. In XQJ Part VIII - Binding external variables we learned how to bind values to external variables declared in a query. By default, when a value is bound to an XQExpression or XQPreparedExpression using one of the bindXXX() methods, it is consumed during the binding process, and it stays active and valid for all subsequent executions. We say that XQJ operates in 'immediate binding mode'.

Let's look closely at one of the pipeline examples from the previous post in this series.

...
XQExpression xqe1;
XQSequence xqs1;

xqe1 = xqc.createExpression();
xqs1 = xqe1.executeQuery("doc('orders.xml')//order");

XQExpression xqe2;
xqe2 = xqc.createExpression();
// immediate binding mode: xqs1 is completely consumed here
xqe2.bindSequence(new QName("orders"), xqs1);
xqe1.close();

XQSequence xqs2;
xqs2 = xqe2.executeQuery(
  "declare variable $orders as element(*,xs:untyped)* external; " +
  "for $order in $orders " +
  "where $order/@status = 'closed' " +
  "return " +
  "  <closed_order id = '{$order/@id}'>{ " +
  "    $order/* " +
  "  }</closed_order>");
xqs2.writeSequence(System.out, null);
xqe2.close();
...

During the bindSequence() call, the complete xqs1 sequence is consumed. As a result, we can safely close the xqe1 expression afterwards, freeing up any runtime resources it holds. On the other hand, consuming the complete sequence during bindSequence() implies that the XQJ implementation has to buffer the data one way or another for subsequent query evaluations. This works perfectly fine as long as we are handling relatively small XML instances. But because the data is buffered, the underlying XQuery processor loses any opportunity to take advantage of its streaming capabilities.

If you know that the data bound to the external variable will be used for only a single query execution, is there a way to inform the XQJ/XQuery implementation of this optimization opportunity, so that it can use its streaming capabilities?

The default binding mode in XQJ is 'immediate', which means the value bound to an external variable is consumed during the bindXXX() call. Alternatively, an application can set the binding mode to 'deferred'. With deferred binding mode, the application gives a hint to the XQJ implementation and the underlying XQuery processor that they can take advantage of their streaming capabilities. In deferred binding mode, bindings are only active for a single execution cycle: the application is required to explicitly re-bind values to every external variable before each execution.
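To make the difference between the two modes tangible, here is a toy model in plain Java. ToyBinding and its Mode enum are hypothetical names invented for this sketch and are not part of XQJ; the sketch only mimics the semantics described above. Immediate mode drains the bound input into a buffer that every execution can reuse, while deferred mode keeps a reference to the input and lets exactly one execution stream through it, after which a re-bind is required.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Toy model (not the XQJ API) contrasting the two binding modes.
 * IMMEDIATE drains the input at bind time into a buffer reused by every execution.
 * DEFERRED keeps the input and lets a single execution consume it as it goes.
 */
final class ToyBinding<T> {
    enum Mode { IMMEDIATE, DEFERRED }

    private final Mode mode;
    private List<T> buffer;      // used in IMMEDIATE mode
    private Iterator<T> pending; // used in DEFERRED mode

    ToyBinding(Mode mode) {
        this.mode = mode;
    }

    void bind(Iterator<T> input) {
        if (mode == Mode.IMMEDIATE) {
            buffer = new ArrayList<>();
            input.forEachRemaining(buffer::add); // the whole input is now held in memory
        } else {
            pending = input;                     // nothing has been read yet
        }
    }

    /** Simulates one query execution that simply counts the bound items. */
    int execute() {
        if (mode == Mode.IMMEDIATE) {
            if (buffer == null) {
                throw new IllegalStateException("nothing bound");
            }
            return buffer.size();
        }
        if (pending == null) {
            throw new IllegalStateException("deferred binding must be re-bound before each execution");
        }
        Iterator<T> it = pending;
        pending = null;                          // valid for one execution only
        int n = 0;
        while (it.hasNext()) {                   // items are consumed while "querying"
            it.next();
            n++;
        }
        return n;
    }
}
```

The design choice mirrors the trade-off in the text: the immediate variant pays memory for reusability, the deferred variant gives up reusability so that nothing needs to be buffered.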

You can change the binding mode through the XQStaticContext interface, as shown in the next example. Refer to Part VI in this series for more information on how to manipulate the static context.

...
XQStaticContext xqsc = xqc.getStaticContext();
// change the binding mode
xqsc.setBindingMode(XQConstants.BINDING_MODE_DEFERRED);
// make the changes effective
xqc.setStaticContext(xqsc);
...

In deferred mode the application cannot assume that the bound value is consumed during the invocation of the bindXXX() method. The XQJ implementation is free to read the bound value either at bind time or during the subsequent evaluation and processing of the query results. This has consequences for when resources can be cleaned up. The first example will not work properly in deferred binding mode, because xqe1 is closed right after calling bindSequence(), while xqs1 may not have been read yet. The example needs to be modified as follows, closing xqe1 only after the results of xqe2 have been processed:

...
XQExpression xqe1;
XQSequence xqs1;

xqe1 = xqc.createExpression();
xqs1 = xqe1.executeQuery("doc('orders.xml')//order");
XQExpression xqe2 = xqc.createExpression();
// deferred binding mode: xqs1 may only be read while xqe2 is evaluated
xqe2.bindSequence(new QName("orders"), xqs1);

XQSequence xqs2 = xqe2.executeQuery(
  "declare variable $orders as element(*,xs:untyped)* external; " +
  "for $order in $orders " +
  "where $order/@status = 'closed' " +
  "return " +
  "  <closed_order id = '{$order/@id}'>{ " +
  "    $order/* " +
  "  }</closed_order>");
xqs2.writeSequence(System.out, null);
xqe2.close();
// only now is it safe to release the resources of the first expression
xqe1.close();
...

This example shows how to build a pipeline of queries, but deferred binding mode also applies to the other bindXXX() methods. The next example shows how to bind a StreamSource to the context item. As the binding mode is deferred, the implementation can process the query in streaming mode, and can therefore handle huge XML documents that don't fit in available memory.

...
XQStaticContext xqsc = xqc.getStaticContext();
// change the binding mode
xqsc.setBindingMode(XQConstants.BINDING_MODE_DEFERRED);
// make the changes effective
xqc.setStaticContext(xqsc);

XQExpression xqe;
XQSequence xqs;

xqe = xqc.createExpression();
xqe.bindDocument(
  XQConstants.CONTEXT_ITEM,
  new StreamSource("large_orders_document.xml"),
  null);
xqs = xqe.executeQuery("/orders/order");
...

To conclude, deferred binding mode requires a little more care than immediate binding mode, but the potential improvements when querying large XML documents are enormous. Of course, the API needs to provide the necessary functionality, but the heavy lifting is performed by the underlying XQuery processor. This is especially true with DataDirect XQuery, where deferred binding mode allows you to take advantage of both XML document projection and its XML streaming capabilities. This makes it possible to query XML documents of hundreds of megabytes, even gigabytes!

Wednesday, November 21, 2007

XQJ Proposed Final Draft

The Proposed Final Draft of XQJ, which is being developed under the Java Community Process as JSR 225, is available for download. XQJ is the standard Java API for accessing XQuery implementations. Eventually XQJ will be for XQuery what JDBC is for SQL on the Java platform.

The Proposed Final Draft includes the following components:

  • Specification (PDF)
  • JavaDoc of the API (HTML)
  • Java sources of the API
  • JAR file of the API (xqjapi.jar)
  • Reference implementation
  • Technology Compatibility Kit

Comments can be sent to jsr-225-comments@jcp.org.

Want to learn XQJ? The ongoing XQJ series is a good starting point. And there are implementations available, including DataDirect XQuery.

Sunday, October 28, 2007

XQJ Part X - XML Pipelines

Today's post in the XQJ series is about XML pipelines. How can we create a pipeline of queries, and how can XQuery be integrated with JAXP-based XML processors?

An XML pipeline is a sequence of XML processes, also called transformations, in which the result of one transformation is passed as input to the next one. In case the question comes up, read Norman Walsh's essay Why Pipelines? Historically, the transformations that come to mind are XSLT transformations, XML Schema validation, plain XML parsing, etc. But of course, a transformation can also be an XQuery. This post is about integrating XQuery and XQJ in such an XML pipeline.

Let's start with a pipeline where the result of a first query is passed on to a second one. Given that query execution results in an XQSequence, and that an XQSequence can be bound to an external variable, pipelining two queries is rather simple.

...
XQExpression xqe1 = xqc.createExpression();
XQSequence xqs1;
xqs1 = xqe1.executeQuery("doc('orders.xml')//order");
XQExpression xqe2 = xqc.createExpression();
xqe2.bindSequence(new QName("orders"), xqs1);

XQSequence xqs2 = xqe2.executeQuery(
  "declare variable $orders as " +
  "  element(*,xs:untyped)* external; " +
  "for $order in $orders " +
  "where $order/@status = 'closed' " +
  "return " +
  "  <closed_order id = '{$order/@id}'>{ " +
  "    $order/* " +
  "  }</closed_order>");
xqs2.writeSequence(System.out, null);
xqe2.close();
xqe1.close();
...

In this example, both queries in the pipeline are executed in the context of a single XQConnection object. But this is not required; it is perfectly possible to have two different connection objects, possibly from different XQJ implementations.

The application writer shouldn't be concerned with how the result of the first query is passed to the second one. Whether, for example, DOM, SAX, StAX, serialized XML or some other (proprietary) mechanism is used is an implementation detail. The application can assume that the most appropriate mechanism is used.

Let’s now look how an XQuery can be integrated in a pipeline with other XML processors. We'll work out two scenarios, the result of an xquery is passed to an XSLT transformation, and next vice versa, an xquery processing the results of an XSLT transformation. On the Java platform, XSLT transformations are processed through the JAXP api. JAXP makes use of the Source and Result interfaces to query XML documents and handle the results of transformations. As XQJ has built-in support for these interfaces, it is possible to integrate both JAXP and XQJ in a pipeline.

The next example invokes an XSLT transformation, and its results are passed as input to an XQuery. Under the covers, SAX events are used to pass the data from one stage to the next. From a performance and scalability perspective, SAX is preferable to passing a serialized XML stream, let alone building a DOM tree.

First, an XMLFilter is created for the XSLT transformation, along with a SAXSource that provides the input for the XSLT transformation.
It is this SAXSource that is bound to the external variable of the XQuery. Subsequently the query is executed, and finally the application consumes its results.

...
// Build an XMLFilter for the XSLT transformation
SAXTransformerFactory stf;
stf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
XMLFilter stage1;
stage1 = stf.newXMLFilter(new StreamSource("stage1.xsl"));

// Create a SAX source, the input for the XSLT transformation
SAXSource saxSource;
saxSource = new SAXSource(stage1,new InputSource("input.xml"));

// Create the XQuery expression
XQPreparedExpression stage2;
stage2 = xqc.prepareExpression(new FileInputStream("stage2.xquery"));

// Bind the input document as a Source object to stage2
stage2.bindDocument(new QName("var"), saxSource, null);

// Execute the query (stage2)
XQSequence result = stage2.executeQuery();

// Process query results
result.writeSequenceToResult(
new StreamResult(System.out));
...

As you may notice in this example, the pipeline is activated starting from the end. Under the covers, the XQJ implementation invokes the parse() method on the XMLReader of the SAXSource bound to the external variable. This starts the SAX callbacks to the XSLT transformation, which in turn performs callbacks to the XQJ implementation. These callbacks represent the result of the XSLT transformation, allowing the query to yield results back to the application.
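The same "activate from the end" behaviour can be observed with JAXP alone. The following sketch uses only classes that ship with the JDK: an XSLT XMLFilter sits between a plain SAX parser and a collecting ContentHandler, and calling parse() on the last stage drives the whole chain. The XQuery stage is deliberately left out so the example runs without an XQJ implementation; the class name, the inline stylesheet and the orders layout are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical JDK-only pipeline: SAX parser -> XSLT filter -> collecting handler. */
final class XsltFilterPipeline {
    private static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='/'>"
          + "<closed_orders>"
          + "<xsl:for-each select=\"orders/order[@status='closed']\">"
          + "<closed_order id='{@id}'/>"
          + "</xsl:for-each>"
          + "</closed_orders>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Returns the id attributes of the closed_order elements produced by the XSLT stage. */
    static List<String> closedOrderIds(String inputXml) {
        try {
            SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
            XMLFilter xslt = stf.newXMLFilter(new StreamSource(new StringReader(STYLESHEET)));

            // First stage: a plain namespace-aware SAX parser reads the raw input.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();
            xslt.setParent(parser);

            // Last consumer: receives the SAX events emitted by the XSLT stage.
            List<String> ids = new ArrayList<>();
            xslt.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if ("closed_order".equals(qName)) {
                        ids.add(atts.getValue("id"));
                    }
                }
            });

            // Activate the pipeline from the end: parse() on the last stage pulls data through.
            xslt.parse(new InputSource(new StringReader(inputXml)));
            return ids;
        } catch (TransformerConfigurationException | ParserConfigurationException
                | SAXException | IOException e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }
}
```

Note the order of calls: setParent() comes before setContentHandler(), because the JDK's XSLT filter wires its own handler into the parent when the content handler is set.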

Let's now have a closer look at a pipeline where the results of an xquery are passed on to an XSLT transformation.

First we present a utility class, XQJFilter, an XMLFilter based on an XQJ XQPreparedExpression object:

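import javax.xml.transform.Result;
import javax.xml.transform.sax.SAXResult;
import javax.xml.xquery.XQException;
import javax.xml.xquery.XQPreparedExpression;
import javax.xml.xquery.XQSequence;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// An XMLFilter whose "parsing" consists of executing a prepared query
// and forwarding its results as SAX events to the next stage.
public class XQJFilter extends XMLFilterImpl {
  private final XQPreparedExpression _expression;

  public XQJFilter(XQPreparedExpression expression) {
    _expression = expression;
  }

  // The InputSource is ignored: the query itself produces the data.
  @Override
  public void parse(InputSource source) throws SAXException {
    try {
      XQSequence xqs = _expression.executeQuery();
      try {
        Result result = new SAXResult(this.getContentHandler());
        xqs.writeSequenceToResult(result);
      } finally {
        xqs.close();
      }
    } catch (XQException e) {
      throw new SAXException(e);
    }
  }
}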

Next we’ll use this XQJFilter implementation to build the pipeline,

...
// Create an XQuery expression
XQPreparedExpression xqp;
xqp = xqc.prepareExpression(new FileInputStream ("query.xq"));
// Create the XQJFilter, the first stage in our pipeline
XQJFilter stage1 = new XQJFilter(xqp);
// Create an XMLFilter for the XSLT transformation, the 2nd stage

SAXTransformerFactory stf;
stf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
XMLFilter stage2 = stf.newXMLFilter(new StreamSource("stage2.xsl"));
stage2.setParent(stage1);
// Make sure to capture the SAX events as result of the pipeline
stage2.setContentHandler(...);
// Activate the pipeline
stage2.parse("");
...

As in the previous example, the pipeline here is activated from the end, by calling the parse() method on the second stage.
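The XQJFilter pattern (a first stage that ignores its InputSource and produces SAX events itself) can be tried out without an XQuery engine. In the JDK-only sketch below, the hypothetical OrderEventSource stands in for XQJFilter by emitting a fixed set of order elements, and an XSLT XMLFilter is chained onto it with setParent(), exactly as stage2 is in the example above. All class names, the stylesheet and the {id, status} input format are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical first stage: like XQJFilter, it ignores the InputSource and emits its own events. */
final class OrderEventSource extends XMLFilterImpl {
    private final String[][] orders; // {id, status} pairs, standing in for a query result

    OrderEventSource(String[][] orders) {
        this.orders = orders;
    }

    @Override
    public void parse(InputSource ignored) throws SAXException {
        ContentHandler out = getContentHandler();
        out.startDocument();
        out.startElement("", "orders", "orders", new AttributesImpl());
        for (String[] order : orders) {
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "id", "id", "CDATA", order[0]);
            atts.addAttribute("", "status", "status", "CDATA", order[1]);
            out.startElement("", "order", "order", atts);
            out.endElement("", "order", "order");
        }
        out.endElement("", "orders", "orders");
        out.endDocument();
    }
}

/** Chains the generated events into an XSLT stage and collects the result. */
final class GeneratedSourcePipeline {
    private static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='/'><closed_orders>"
          + "<xsl:for-each select=\"orders/order[@status='closed']\">"
          + "<closed_order id='{@id}'/>"
          + "</xsl:for-each></closed_orders></xsl:template></xsl:stylesheet>";

    static List<String> closedOrderIds(String[][] orders) {
        try {
            SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
            XMLFilter stage2 = stf.newXMLFilter(new StreamSource(new StringReader(STYLESHEET)));
            stage2.setParent(new OrderEventSource(orders)); // stage 1 feeds stage 2
            List<String> ids = new ArrayList<>();
            stage2.setContentHandler(new DefaultHandler() { // capture the pipeline's output
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if ("closed_order".equals(qName)) {
                        ids.add(atts.getValue("id"));
                    }
                }
            });
            stage2.parse(new InputSource()); // activate from the end; the input is ignored
            return ids;
        } catch (TransformerConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }
}
```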

In our next post we discuss deferred binding mode for external variables, essential functionality for handling huge XML documents.

Wednesday, October 3, 2007

XQJ Part IX - Creating XDM instances

In the previous posts of the XQJ series, we have learned how to handle XDM instances resulting from query execution: iterating through sequences and accessing the items in them. What if we want to create an XDM instance without executing a query? Can we?

XQJ offers functionality to create both XQSequence and XQItem objects, not as the result of a query execution, but as standalone XDM instances. This functionality is offered through the XQDataFactory interface. An XQDataFactory creates the following types of objects:

  • XQItem
  • XQSequence
  • XQItemType
  • XQSequenceType

Every XQConnection must implement the XQDataFactory interface. In XQJ 1.0 these are the only concrete XQDataFactory implementations; future versions might introduce different mechanisms to get access to an XQDataFactory.

Creating types

In the Typing post in this series, we have introduced the XQItemType and XQSequenceType interfaces. We have also learned how these objects are used to describe the static type of a query result and external variables. How do we create such type objects in our application?

Remember that XQJ defines a dozen XQITEMKIND_XXX constants. For each of those there is a matching createXXXType method:

  • createAtomicType
  • createAttributeType
  • createCommentType
  • createDocumentElementType
  • createDocumentType
  • createElementType
  • createItemType
  • createNodeType
  • createProcessingInstructionType
  • createSchemaAttributeType
  • createSchemaElementType
  • createTextType

Let's discuss some of the most commonly used methods in the above list.

The method createAtomicType(), creates an XQItemType object representing an XQuery atomic type. It accepts a single argument, an integer which is one of the predefined XQBASETYPE constants.
The next example creates three XQItemType instances, representing xs:integer, xs:string and xs:decimal:

...
XQItemType xsinteger = xqc.createAtomicType(
XQItemType.XQBASETYPE_INTEGER);
XQItemType xsstring = xqc.createAtomicType(
XQItemType.XQBASETYPE_STRING);
XQItemType xsdecimal = xqc.createAtomicType(
XQItemType.XQBASETYPE_DECIMAL);
...

Remember that every XQConnection is an XQDataFactory; in the example we've used our XQConnection xqc to create these XQItemType instances. However, the XQItemType objects are completely independent of the connection.

While the above example shows how to create XQItemType objects representing built-in atomic XML Schema types, there is a second flavor of createAtomicType() for user-defined atomic types. Assume a user-defined atomic type hatsize, derived from xs:integer, in the http://www.hatsizes.com schema:

...
XQItemType hatsize;
hatsize = xqc.createAtomicType(
XQItemType.XQBASETYPE_INTEGER,
new QName("http://www.hatsizes.com", "hatsize"),
new URI("http://www.hatsizes.com"));
...

Besides atomic types, element types are also frequently used. In the next example we create an XQItemType representing element(person):

...
XQItemType type;
type = xqc.createElementType(
new QName("person"),
XQItemType.XQBASETYPE_ANYTYPE);
...

The first argument to createElementType() is a QName. While the previous example creates a type for a person element in no namespace, the next example creates an element type for person in the namespace http://www.example.com. The second argument can be any of the predefined types; besides xs:anyType, xs:untyped is also frequently used:

...
XQItemType type;
type = xqc.createElementType(
new QName("http://www.example.com", "person"),
XQItemType.XQBASETYPE_UNTYPED);
...

The first argument can also be null, which denotes the wildcard. The following code snippet creates element(*, xs:untyped):

...
XQItemType type;
type = xqc.createElementType(
null,
XQItemType.XQBASETYPE_UNTYPED);
...

What about document-node() types? In the next example we create two XQItemType instances, a first representing any document and a second representing a well-formed untyped document,

...
XQItemType type1;
XQItemType type2;
type1 = xqc.createDocumentType();
type2 = xqc.createDocumentElementType(
xqc.createElementType(
null,
XQItemType.XQBASETYPE_UNTYPED));
...

In addition to XQItemTypes, XQSequenceType objects can also be created.
As explained before in the Typing post, an XQSequenceType consists of

  • an XQItemType
  • a cardinality to constrain the number of items, one of the OCC_XXX constants defined on XQSequenceType.

As such, creating an XQSequenceType is simple. The next example shows how to create an xs:string* sequence type:

...
XQItemType itemType;
XQSequenceType sequenceType;
itemType = xqc.createAtomicType(
XQItemType.XQBASETYPE_STRING);
sequenceType = xqc.createSequenceType(
itemType,
XQSequenceType.OCC_ZERO_OR_MORE);
...

Using types

So far so good, but why would one need to create all these types?

Assume we iterate over the items of an XQSequence: if an item is a node, we retrieve it through DOM; otherwise we get the atomic value as a String. This can be accomplished using the instanceOf() method, passing in an XQItemType object:

...
XQItemType nodeType = xqc.createNodeType();
XQSequence xqs = ...
...

while (xqs.next()) {
if (xqs.instanceOf(nodeType)) {
org.w3c.dom.Node node = xqs.getNode();
...
} else {
String s = xqs.getAtomicValue();
...
}
}
...

Some XQuery implementations support the Static Typing Feature as defined in XQuery. This requires implementations to detect and report type errors during the static analysis phase.
For expressions depending on the context item, the application must then specify the static type of the context item. Why? In order to perform static typing, the implementation has to know the static type of the context item. The application has to provide it; failing to do so will result in an error being reported during the static analysis phase.

As the static type of the context item is a component of the static context, the XQJ XQStaticContext interface allows you to manipulate it.
The next example shows how to set the static type of the initial context item to document-node(element(*, xs:untyped)):

...
XQItemType documentType;
documentType = xqc.createDocumentElementType(
xqc.createElementType(
null,
XQItemType.XQBASETYPE_UNTYPED));
XQStaticContext xqsc = xqc.getStaticContext();
xqsc.setContextItemStaticType(documentType);
...
XQPreparedExpression xqp;
xqp = xqc.prepareExpression("//address",
xqsc);
...

As a last use case of XQItemType, remember some of the examples from the previous post in this series, Binding external variables.
Most of the bindXXX() methods defined on XQDynamicContext have a third parameter, which allows you to override the default Java to XQuery data type mapping.

In the next example we bind a Java int to the external variable, but rather than using the default mapping to xs:int, we specify that it should be mapped to an xs:short:

...
XQItemType xsshort;
xsshort = xqc.createAtomicType(XQItemType.XQBASETYPE_SHORT);
XQPreparedExpression xqp;
xqp = xqc.prepareExpression(
"declare variable $v as xs:short external; " +
"$v + 1");
xqp.bindInt(new QName("v"), 22, xsshort);
...

Creating XDM instances

Besides XQItemType and XQSequenceType instances, XQDataFactory also offers the ability to create XQItem and XQSequence instances.

There is basically nothing new under the sun. If you understand how binding to an XQDynamicContext works, as discussed in our previous post, you almost know how XQItem instances are created. For every bindXXX() method defined on XQDynamicContext, there is a corresponding createItemFromXXX() method.

Let's show a simple example, binding a java.math.BigDecimal to an external variable $d,

...
XQExpression xqe = ...
xqe.bindObject(new QName("d"), new BigDecimal("174"), null);
...

And creating an XQItem of type xs:decimal from the same java.math.BigDecimal:

...
XQItem xqi = xqc.createItemFromObject(new BigDecimal("174"), null);
...

Note that the XQItem objects created through XQDataFactory are independent of any connection.

Suppose you execute a query returning a single item, and subsequently close the connection, but still require access to the XQItem. Closing the XQConnection invalidates the XQItem object resulting from the query execution. For this reason XQDataFactory offers a copy method for items: createItem() accepts a single XQItem argument and returns a (deep) copy of the specified item.
The following example shows how to keep a query result available after closing the XQSequence or XQConnection:

XQConnection xqc = ...
XQExpression xqe = xqc.createExpression();
XQSequence xqs;
xqs = xqe.executeQuery("(doc('book.xml')//paragraph)[1]");
xqs.next();
XQItem xqi = xqc.createItem(xqs.getItem());
xqc.close();
// although the connection is closed, xqi is still valid.

Suppose you have an XML document which needs to be queried multiple times, but you don't want to incur the XML parsing overhead each time it is queried. In the following example two queries are executed, and as such book.xml will be parsed twice:

...
XQExpression xqe = xqc.createExpression();
xqe.executeQuery("fn:doc('book.xml')//paragraph[contains(.,'XQuery')]");
xqe.executeQuery("fn:doc('book.xml')//paragraph[contains(.,'SQL')]");
...

Or suppose you receive a transient XML stream, for example in a servlet environment, and need to query the stream multiple times. Then one way or another the data will need to be buffered in order to query it more than once.

How can we a) make sure an XML document is parsed only once, and b) when the XML stream is transient, make it 'queryable' multiple times?

Suppose two XQPreparedExpression objects, xqp1 and xqp2. The next example first creates an XQItem representing the XML document, so that it is parsed only once. Second, it binds that item to the two XQPreparedExpression objects:

...
InputStream input = ...
XQItem doc = xqc.createItemFromDocument(input, null, null);
...
xqp1.bindItem(new QName("doc"), doc);
...
xqp2.bindItem(new QName("doc"), doc);
...
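The "parse once, query many times" idea, and its memory cost, can be illustrated with the JDK alone. In this sketch (not XQJ) a possibly transient InputStream is parsed into a DOM exactly once, and two XPath expressions modeled on the book.xml queries above are then evaluated against the same in-memory tree. The class name ParseOnceQueryTwice is a hypothetical name for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical JDK-only analogue of createItemFromDocument(): parse once, query twice. */
final class ParseOnceQueryTwice {

    /** Returns {paragraphs mentioning XQuery, paragraphs mentioning SQL}. */
    static int[] countParagraphs(InputStream transientInput) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // The stream is read exactly once, here; the whole tree now lives in memory.
            Document doc = dbf.newDocumentBuilder().parse(transientInput);

            // Both queries run against the same in-memory tree; nothing is re-parsed.
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double xquery = (Double) xpath.evaluate(
                    "count(//paragraph[contains(., 'XQuery')])", doc, XPathConstants.NUMBER);
            Double sql = (Double) xpath.evaluate(
                    "count(//paragraph[contains(., 'SQL')])", doc, XPathConstants.NUMBER);
            return new int[] {xquery.intValue(), sql.intValue()};
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }
}
```

The trade-off is the one discussed next: the tree is fully materialized, so this approach does not scale to documents that exceed available memory.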

One of the disadvantages of this approach, especially with large documents, lies in scalability and memory consumption. For example, in the case of DataDirect XQuery, the streaming capabilities will not be of much use, as the complete XML document is instantiated in memory. We'll come back to the topic of processing large input documents in a future post of the XQJ series.

Finally, XQDataFactory also allows you to create XQSequence objects.
There is a createSequence() copy operation, i.e. it takes a single XQSequence argument and returns a copy of it. Similar to the XQItem example above, it allows query results to outlive an XQConnection.

A second flavor of createSequence() accepts a java.util.Iterator, returning a sequence of items based on the objects returned by the Iterator. The objects are converted into XDM instances using the default object mapping defined in XQJ. For example, the following code snippet results in a sequence of xs:decimal instances,

...
// assume an ArrayList of BigDecimal objects
ArrayList list = ...
XQSequence s = xqc.createSequence(list.iterator());
...

Pipelines are the next topic we will discuss. How can one create a pipeline of queries, or pipeline a query with an XSLT transformation? Watch out for the next post.

Sunday, September 23, 2007

XQJ Part VIII - Binding external variables

Last month, in XQJ Part III - Executing queries, we showed through some simple examples how to bind a value to an external variable declared in your query. In this post of the XQJ series, we go into more detail on this subject.

As we know, XQuery operates on the abstract, logical structure of XML, known as the XQuery Data Model (XDM). As such, by definition in XQuery, the value bound to an external variable is an XDM instance. Given a Java object in your Java application, how is it converted into such an XDM instance? XQJ defines this mapping and glues it all together.

A first simple example,

...
XQPreparedExpression xqp;
XQSequence xqs;
xqp = xqc.prepareExpression(
"declare variable $id as xs:integer external; " +
"doc('orders.xml')//order[id=$id]");
xqp.bindObject(new QName("id"),new Integer(174), null);
xqs = xqp.executeQuery();
...

The bindObject() method is defined in the XQDynamicContext interface, which provides a number of methods to bind values to external variables. As XQDynamicContext is the base interface of both XQExpression and XQPreparedExpression, both expression types support binding values to external variables.
The first argument to the bindObject() method is a QName, which identifies the external variable in your query. The second argument is the Java object to be bound; XQJ defines a mapping of Java objects to XDM instances. Providing the full list is out of scope (please refer to the XQJ specification if you're interested in all the details), but here are a couple of examples:

Java type               XQuery type
java.lang.Integer       xs:int
java.math.BigInteger    xs:integer
java.math.BigDecimal    xs:decimal
java.lang.String        xs:untypedAtomic
org.w3c.dom.Document    untyped document node
org.w3c.dom.Element     untyped element node
...                     ...
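The table can be read as a lookup from the runtime class of the bound object to an XQuery type. The following hypothetical helper (not part of XQJ) encodes only the three numeric rows above, which is enough to show that the default mapping is driven by the object's class rather than by its value: the same number 174 ends up as three different XQuery types.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical lookup mirroring three rows of the default Java-to-XQuery mapping table. */
final class DefaultTypeMapping {

    static String xqueryTypeOf(Object value) {
        if (value instanceof Integer) {
            return "xs:int";
        }
        if (value instanceof BigInteger) {
            return "xs:integer";
        }
        if (value instanceof BigDecimal) {
            return "xs:decimal";
        }
        // Everything else is outside the scope of this sketch.
        throw new IllegalArgumentException("not covered by this sketch: " + value.getClass().getName());
    }
}
```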

The third argument, for which null is specified in the example above, allows you to override the default mapping. This is shown next [1].

... 
XQItemType xsinteger;
xsinteger = xqc.createAtomicType(XQItemType.XQBASETYPE_INTEGER);

XQPreparedExpression xqp;
XQSequence xqs;

xqp = xqc.prepareExpression(
"declare variable $v1 external; " +
"declare variable $v2 external; " +
"$v1 instance of xs:integer, "+
"$v1 instance of xs:int, "+
"$v2 instance of xs:integer, "+
"$v2 instance of xs:int");
xqp.bindObject(new QName("v1"),new Integer(174), null);
xqp.bindObject(new QName("v2"),new Integer(174), xsinteger);
...

This example yields a sequence of four xs:boolean values:

true, true, true, false

A Java Integer is by default mapped to xs:int. Since xs:int is derived by restriction from xs:integer, the first two 'instance of' expressions evaluate to true. The second external variable is bound to an xs:integer instance, because the application explicitly asks for that type. As such, the last 'instance of' evaluates to false: xs:integer is not derived from xs:int.
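The reasoning behind "true, true, true, false" is a walk up the derivation chain of XML Schema's built-in integer types. This small model (a hypothetical class, not an XQJ or XML Schema API) records only the xs:byte to xs:decimal slice of that chain and answers 'instance of' by following base types until it finds the required type or runs out.

```java
import java.util.Map;

/** Hypothetical model of a slice of the XML Schema integer derivation chain. */
final class IntegerTypeHierarchy {
    // Each built-in type mapped to the type it is derived from by restriction.
    private static final Map<String, String> BASE = Map.of(
            "xs:byte", "xs:short",
            "xs:short", "xs:int",
            "xs:int", "xs:long",
            "xs:long", "xs:integer",
            "xs:integer", "xs:decimal");

    /** True if a value whose dynamic type is 'actual' passes 'instance of required'. */
    static boolean instanceOf(String actual, String required) {
        for (String t = actual; t != null; t = BASE.get(t)) {
            if (t.equals(required)) {
                return true;
            }
        }
        return false;
    }
}
```

Evaluating the four checks of the example in order (xs:int against xs:integer and xs:int, then xs:integer against xs:integer and xs:int) reproduces the sequence above.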

Note that various error conditions can occur during the binding process,

  • The conversion from Java to XDM instance can fail.
    For example, a java.lang.Integer object with value 10000 is to be converted into an xs:byte. As 10000 is outside the value space of xs:byte, an error is reported.
  • Once converted into an XDM instance, the binding can still fail if the external variable declaration includes a declared type. In that case the XDM instance must match the declared type according to the rules of SequenceType matching.
    For example, a java.lang.Integer is bound and converted into an xs:integer instance, but the external variable is declared as xs:string.
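The first error condition boils down to a range check against the value space of the target type. A minimal sketch, assuming only the three fixed-width types below (the class and method names are hypothetical); their ranges coincide with Java's byte, short and int:

```java
/** Hypothetical sketch of the range check needed when converting to a bounded integer type. */
final class IntegerRangeCheck {

    static long checkRange(long value, String targetType) {
        long min;
        long max;
        switch (targetType) {
            case "xs:byte":  min = Byte.MIN_VALUE;    max = Byte.MAX_VALUE;    break;
            case "xs:short": min = Short.MIN_VALUE;   max = Short.MAX_VALUE;   break;
            case "xs:int":   min = Integer.MIN_VALUE; max = Integer.MAX_VALUE; break;
            default:
                throw new IllegalArgumentException("type not covered by this sketch: " + targetType);
        }
        if (value < min || value > max) {
            throw new IllegalArgumentException(value + " is outside the value space of " + targetType);
        }
        return value;
    }
}
```

With this check, 10000 is rejected for xs:byte (whose upper bound is 127) but accepted for xs:short.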

We have introduced the bindObject() method through some examples, but XQDynamicContext has many more bind methods.

bindAtomicValue() accepts a java.lang.String and converts it to the specified type according to the rules for casting from xs:string; basically, the specified string must be in the lexical space of the specified atomic type. In the following example the Java String "123" is converted into xs:string, xs:integer and xs:double instances and bound to the external variables $v1, $v2 and $v3.

...
xqp.bindAtomicValue(new QName("v1"), "123",
xqc.createAtomicType(XQItemType.XQBASETYPE_STRING));
xqp.bindAtomicValue(new QName("v2"), "123",
xqc.createAtomicType(XQItemType.XQBASETYPE_INTEGER));
xqp.bindAtomicValue(new QName("v3"), "123",
xqc.createAtomicType(XQItemType.XQBASETYPE_DOUBLE));
...

In contrast, the following two bindAtomicValue() invocations fail. The first because "abc" is not in the lexical space of xs:integer. The second because no type has been specified as third parameter: unlike bindObject(), bindAtomicValue() has no default mapping, and an XQItemType must be specified as third argument.

...
xqp.bindAtomicValue(new QName("e"), "abc",
xqc.createAtomicType(XQItemType.XQBASETYPE_INTEGER));
xqp.bindAtomicValue(new QName("e"), "123", null);
...
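The behaviour of bindAtomicValue() described above can be imitated in a few lines of plain Java. This is a hypothetical helper, not an XQJ method: it parses the string according to the requested type, rejects strings outside that type's lexical space, and refuses to guess when no type is given. Java's own parsers only approximate XML Schema's lexical rules (for example, Java spells infinity differently from xs:double's INF).

```java
import java.math.BigInteger;

/** Hypothetical helper imitating how bindAtomicValue() treats its string argument. */
final class LexicalCast {

    static Object cast(String lexical, String targetType) {
        if (targetType == null) {
            // Unlike bindObject(), there is no default mapping: a type is mandatory.
            throw new IllegalArgumentException("an atomic type must be specified");
        }
        switch (targetType) {
            case "xs:string":
                return lexical;
            case "xs:integer":
                return new BigInteger(lexical.trim()); // "abc" -> NumberFormatException
            case "xs:double":
                return Double.valueOf(lexical.trim()); // approximates the xs:double lexical space
            default:
                throw new IllegalArgumentException("type not covered by this sketch: " + targetType);
        }
    }
}
```

NumberFormatException is a subclass of IllegalArgumentException, so callers can treat both failure modes of the example the same way.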

XQDynamicContext also provides bindXXX() methods for each of the following Java primitive types:

  • boolean
  • byte
  • double
  • float
  • int
  • long
  • short

For example, to bind the xs:integer value 123 to the external variable $v, we specify the type as third parameter, because the default mapping for int is xs:int:

xqp.bindInt(new QName("v"), 123, 
xqc.createAtomicType(XQItemType.XQBASETYPE_INTEGER));

Binding a DOM node is also possible; this is basically equivalent to bindObject(), with the restriction that the argument must be a DOM node, so the resulting XDM instance is always a node, never an atomic value. Of course, in addition to DOM, the SAX and StAX APIs are also supported through XQDynamicContext.

Let's read an XML document foo.xml through DOM, SAX and StAX, and each time bind it to an external variable $e.

The DOM version,

...
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);

DocumentBuilder parser = factory.newDocumentBuilder();
Document domDocument = parser.parse("foo.xml");
xqp.bindNode(new QName("e"), domDocument,null);
...

The StAX version,

...
XMLInputFactory factory = XMLInputFactory.newInstance();
FileInputStream doc = new FileInputStream("foo.xml");
XMLStreamReader reader = factory.createXMLStreamReader(doc);
xqp.bindDocument(new QName("e"), reader, null);
...

And the SAX version, for which we need to implement an XML Filter.

...
XMLFilter xmlReader = new XMLFilterImpl() {
  public void parse(String systemId) throws IOException, SAXException {
    super.parse("foo.xml");
  }
};
// the parent XMLReader is a SAX parser; it does the actual
// work of parsing the XML document
xmlReader.setParent(org.xml.sax.helpers.XMLReaderFactory.createXMLReader());
xqp.bindDocument(new QName("e"), xmlReader);
...
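To see what the XMLFilterImpl trick does in isolation, here is a JDK-only sketch with no XQJ involved. It assumes a small inline document (FOO_XML, hypothetical) in place of foo.xml, and it obtains the parent reader through SAXParserFactory, since XMLReaderFactory.createXMLReader() is deprecated in recent JDKs. The overridden parse(String) ignores whatever system ID the caller passes and always parses the fixed document; the counting handler plays the role of the consumer, presumably the XQJ driver in the binding example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FixedSourceFilter {

    // Hypothetical inline document standing in for foo.xml
    static final String FOO_XML = "<foo><bar/><bar/></foo>";

    // Builds a filter that ignores the requested systemId and always parses FOO_XML
    static XMLFilter newFilter() throws SAXException, ParserConfigurationException {
        XMLFilter filter = new XMLFilterImpl() {
            @Override
            public void parse(String systemId) throws IOException, SAXException {
                super.parse(new InputSource(new StringReader(FOO_XML)));
            }
        };
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        // The parent reader does the actual parsing; the filter forwards its events
        filter.setParent(spf.newSAXParser().getXMLReader());
        return filter;
    }

    // Acts as the consumer: calls parse() with an arbitrary systemId and counts start tags
    static int countElements() {
        try {
            XMLFilter filter = newFilter();
            int[] count = {0};
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            filter.parse("any-system-id");
            return count[0];
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements());
    }
}
```

Running main() prints 3 (foo plus two bar elements), even though parse() was called with a meaningless system ID.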

Of course, you only need these APIs in specific scenarios. The simple case of binding an XML file can be accomplished in a single line; the XQJ implementation takes care of parsing the file and querying it.

...
xqp.bindDocument(new QName("e"), new FileInputStream("foo.xml"));
...

An XQItem or a complete XQSequence can also be bound to an external variable. We’ll discuss this soon in this XQJ series, in a post on pipelining. Speaking of pipelining, XQJ also supports the JAXP Source and Result interfaces; these will be discussed too.

[1] In this series we have not yet introduced the createAtomicType() method defined on XQDataFactory. This will be handled in the next post. Anyway, for now it’s sufficient to know that it returns an XQItemType object representing the specified atomic type.


Monday, September 10, 2007

XQJ Part VII - Typing

In today's post of the XQJ series, we'll have a closer look at how XQJ interacts with the XQuery type system.
XQuery is a strongly typed language whose type system is based on XML Schema. As the type system is an inherent part of XQuery, you'll need some understanding of it to be really effective with XQuery. Going into all the details is out of scope for this XQJ tutorial; my recent XQuery book recommendation is probably a good start if you're interested in learning more about the XQuery type system.

XQuery defines a sequence type as a type that can be expressed using the SequenceType syntax. It consists of an item type that constrains the type of each item in the sequence, and a cardinality that constrains the number of items in the sequence. Mirroring sequences and items in the XQuery type system, XQJ defines two corresponding interfaces, XQSequenceType and XQItemType.

XQSequenceType is a rather simple interface with only 3 methods,

  • getItemType() retrieves the item type of the sequence type
  • getItemOccurrence() retrieves the cardinality that constrains the number of items
  • toString() yields a string representation of the sequence type

XQItemType encapsulates more information,

  • getItemKind() returns whether it is an element, attribute, atomic type, etc.
  • getBaseType() specifies the built-in schema type that most closely matches this item type, e.g. xs:anyType, xs:string, etc.
  • getNodeName() yields the name of the node, which is a QName
  • getPIName() yields the name of a processing instruction, which is a String
  • getTypeName() specifies a QName identifying the XML Schema type of the item type. This can be either a built-in XML Schema type or a user-defined one
  • toString() yields a string representation of the item type
  • there are some more attributes defined on XQItemType related to user-defined schema types, but they would take us too far for this introductory series.

XQSequenceType and XQItemType objects are used in two different contexts,

  • the representation of the static type of an external variable defined in a query and the query result. In this context, the type is possibly abstract, like item(), node()+ or xs:anyAtomicType?
  • the concrete type of an item, here abstract types are not applicable

Let's have a closer look at XQItemType, which specifies the item kind and base type,

...
XQSequenceType xqtype = ...
XQItemType xqitype = xqtype.getItemType();
int itemKind = xqitype.getItemKind();
int schemaType = xqitype.getBaseType();
...

XQJ defines constants for each of the item kinds representable in XQuery SequenceType syntax,

Sequence Type                   XQJ definition

QName (atomic type name)        XQITEMKIND_ATOMIC
element(...)                    XQITEMKIND_ELEMENT
attribute(...)                  XQITEMKIND_ATTRIBUTE
comment()                       XQITEMKIND_COMMENT
document-node()                 XQITEMKIND_DOCUMENT
document-node(element(...))     XQITEMKIND_DOCUMENT_ELEMENT
processing-instruction(...)     XQITEMKIND_PI
text()                          XQITEMKIND_TEXT
item()                          XQITEMKIND_ITEM
node()                          XQITEMKIND_NODE

getBaseType() determines the type more precisely, for example in the case of XQITEMKIND_ATOMIC: is the atomic value an xs:string or an xs:integer? XQJ defines constants for all the built-in XML Schema and XQuery types. The list is too long for this post; here are a few,

XML Schema type         XQJ definition

xs:string               XQBASETYPE_STRING
xs:integer              XQBASETYPE_INTEGER
xs:untypedAtomic        XQBASETYPE_UNTYPEDATOMIC
...                     ...

When iterating over query results, you can request precise type information about each item through XQJ. Suppose you want to use a different getXXX() method depending on the item type,

XQSequence xqs = ...
while (xqs.next()) {
  XQItemType xqtype = xqs.getItemType();
  if (xqtype.getItemKind() == XQItemType.XQITEMKIND_ATOMIC) {
    // We have an atomic type
    switch (xqtype.getBaseType()) {
      case XQItemType.XQBASETYPE_STRING:
        // falls through: both types are retrieved as a java.lang.String
      case XQItemType.XQBASETYPE_UNTYPEDATOMIC: {
        String s = (String)xqs.getObject();
        ...
        break;
      }
      case XQItemType.XQBASETYPE_INTEGER: {
        long l = xqs.getLong();
        ...
        break;
      }
      ...
    }
  } else {
    // We have a node, retrieve it as a DOM node
    org.w3c.dom.Node node = xqs.getNode();
    ...
  }
}

OK, this can make your code rather long and complex. Sometimes that is needed, but most of the time you can take a few shortcuts. As explained in XQJ Part IV - Processing query results, you can use some of the more general-purpose methods. Suppose you need a DOM node when the query returns a node, and the string value for all atomic values. The next simple example shows how to do this,

XQSequence xqs = ...
while (xqs.next()) {
  XQItemType xqtype = xqs.getItemType();
  if (xqtype.getItemKind() == XQItemType.XQITEMKIND_ATOMIC) {
    // We have an atomic type
    String s = xqs.getAtomicValue();
    ...
  } else {
    // We have a node, retrieve it as a DOM node
    org.w3c.dom.Node node = xqs.getNode();
    ...
  }
}

That's it for the dynamic type of items. The next example shows how to retrieve the static type of a query (for JDBC, ODBC and SQL users, this is somewhat similar to "describe information").

...
XQPreparedExpression xqe = xqc.prepareExpression("1+2");
XQSequenceType xqtype = xqe.getStaticResultType();
System.out.println(xqtype.toString());
...
With DataDirect XQuery this example prints xs:integer to stdout.
Similarly, you can query the prepared expression for information about its external variables. As shown in the next example, we first determine the external variables declared in the query, then retrieve the static type of each of them,
...
XQPreparedExpression xqe = xqc.prepareExpression(
"declare variable $i as xs:integer external; $i+1");
QName[] variables = xqe.getAllExternalVariables();
for (QName variable : variables) {
  XQSequenceType xqtype = xqe.getStaticVariableType(variable);
  System.out.println(variable + ": " + xqtype.toString());
}
...

Why would one care about all this? Let's have a quick look at a use case.
The idea of exposing XQueries as web services is not new; remember, for example, the XQuery at Your Web Service research paper. A fully functional example of such an 'XQuery Web Service' is available on xquery.com, and can be downloaded here. It is basically a servlet that reads XQuery files from a specific directory, and makes each query available as an operation accessible through SOAP.
The servlet needs to determine the external variables in each query in order to generate the WSDL, which contains an XML Schema definition describing the parameters of each operation. For example, assuming an XQuery with two external variables, $employeeName and $hiringDate, declared as xs:string and xs:date, the definition looks as follows,

...
<xs:element name="XXX">
<xs:complexType>
<xs:sequence>
<xs:element name="employeeName" type="xs:string"/>
<xs:element name="hiringDate" type="xs:date"/>
</xs:sequence>
</xs:complexType>
</xs:element>
...

All the information required to generate such an XML Schema definition is available in the sequence type of each declared variable, and XQJ makes this information immediately accessible. We could write a piece of code translating the relevant item kinds and base types into an XML Schema definition as shown above; it is only a matter of a few Java switch statements. But is there an easier way?

XQJ leaves the result of toString() on XQItemType implementation dependent; the only requirement is that it returns a human-readable string. With DataDirect XQuery, the string representation is based on the XQuery sequence type syntax, where the QName prefixes are as follows,

  • for QNames representing built-in XML Schema types, the xs prefix is always used.
  • for QNames representing element or attribute names, the prefixes as defined in the query are used. In case of duplicates, one is chosen in an implementation-dependent manner.

Going back to our XQuery Web Service use case, the strategy to map the external variable declaration to the WSDL becomes rather simple using toString(),

  • if the XQItemType is an atomic type, use the string representation
  • if the XQItemType is anything else, use xs:anyType
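A minimal sketch of that strategy, in plain Java so it runs without an XQJ driver. The VarType class and the hard-coded variables are hypothetical stand-ins for what the servlet would obtain from getAllExternalVariables() (using each QName's local part as the element name) and getStaticVariableType(): whether the item kind is XQITEMKIND_ATOMIC, and the XQItemType's toString() value. It assumes DataDirect-style strings such as xs:string; another implementation's toString() may not be usable this way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WsdlElementSketch {

    // Hypothetical stand-in for the XQJ type information of one external variable
    static final class VarType {
        final boolean atomic;      // item kind == XQITEMKIND_ATOMIC
        final String typeString;   // XQItemType.toString(), e.g. "xs:string"

        VarType(boolean atomic, String typeString) {
            this.atomic = atomic;
            this.typeString = typeString;
        }
    }

    // Builds the request element of one operation from its external variables
    static String schemaFor(String operation, Map<String, VarType> vars) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:element name=\"").append(operation).append("\">\n")
          .append("  <xs:complexType>\n")
          .append("    <xs:sequence>\n");
        for (Map.Entry<String, VarType> e : vars.entrySet()) {
            VarType t = e.getValue();
            // atomic types: use the string form as-is; anything else: xs:anyType
            String type = t.atomic ? t.typeString : "xs:anyType";
            sb.append("      <xs:element name=\"").append(e.getKey())
              .append("\" type=\"").append(type).append("\"/>\n");
        }
        sb.append("    </xs:sequence>\n")
          .append("  </xs:complexType>\n")
          .append("</xs:element>\n");
        return sb.toString();
    }

    static String demo() {
        Map<String, VarType> vars = new LinkedHashMap<>();
        vars.put("employeeName", new VarType(true, "xs:string"));
        vars.put("hiringDate", new VarType(true, "xs:date"));
        // hypothetical third variable declared as element(employee)
        vars.put("employee", new VarType(false, "element(employee)"));
        return schemaFor("XXX", vars);
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

For the two variables from the example, plus a hypothetical non-atomic $employee, demo() produces the xs:string and xs:date elements shown earlier and falls back to xs:anyType for the third.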

I hope this post gave you a feel for the XQSequenceType and XQItemType interfaces, and how you can take advantage of them in your application. Applications can also create XQItemType objects; the next post of the XQJ series shows how, and details some use cases.


Wednesday, August 29, 2007

XQJ Part VI - Manipulating the static context

Today's post in the XQJ series explains how to access and manipulate the static context through the XQJ API.

XQuery defines the Static Context as follows,

The static context of an expression is the information that is available during static analysis of the expression, prior to its evaluation.

The static context includes, for example, the following information (refer to the XQuery spec for the complete list),

  • default element namespace
  • statically known namespaces
  • context item static type
  • default order for empty sequences
  • boundary-space policy
  • base uri
  • etc

Most of the components in the static context can be initialized or augmented in the query prolog. In the next example, the boundary-space policy is explicitly specified,

declare boundary-space preserve;
<e> </e>

If a static context component is not initialized in the query prolog, an implementation default is used. Although XQuery defines default values for each of the components in the static context, as outlined in Appendix C of the XQuery specification, implementations are free to override and/or extend these defaults.
In theory this means that the same query can behave substantially differently between two "conformant" XQuery implementations. Speaking of interoperability... Not that I know of any implementation overriding the default function namespace from 'fn' to something proprietary. If an implementation does, I guess the marketplace will decide whether it was a good choice...

Anyway, back to our example. Applications often need to change the defaults for some of the static context components. If you need to preserve boundary whitespace in all queries, you can add the boundary-space declaration to your queries, as shown above. Or wouldn't it be nice if the implementation's default could be overridden through the API, and become active for all queries? Well, I guess it is not a matter of one approach being better than the other; it all depends on your application design and use case.

How can I set boundary-space policy to preserve through the XQJ API?

...
// get a static context object with the implementation's defaults
XQStaticContext xqsc = xqc.getStaticContext();
// make sure boundary-space policy is preserve
xqsc.setBoundarySpacePolicy(XQConstants.BOUNDARY_SPACE_PRESERVE);
// make the changes effective
xqc.setStaticContext(xqsc);
...

First retrieve the implementation's default values for the static context components through an XQStaticContext object. XQStaticContext defines setter and getter methods for the various static context components.
As shown in the previous example, an XQStaticContext is a value object. Changing any of its static context components has no effect yet. Only after calling setStaticContext() on the XQConnection object do the new values in the XQStaticContext become effective. One can say that XQStaticContext objects are passed by value from the XQJ driver to the application and vice versa.
Once the static context has been updated, all (and only) subsequently created XQExpression and XQPreparedExpression objects assume the new values for the static context components.

...
// the boundary-space for this first query is implementation defined,
// i.e. depends on the implementation's defaults
XQPreparedExpression xqp1 = xqc.prepareExpression("<e> </e>");
// set the boundary-space policy to preserve
XQStaticContext xqsc = xqc.getStaticContext();
xqsc.setBoundarySpacePolicy(XQConstants.BOUNDARY_SPACE_PRESERVE);
xqc.setStaticContext(xqsc);
// the boundary-space policy for this second query *is* preserve
XQPreparedExpression xqp2 = xqc.prepareExpression("<e> </e>");
...

In the previous examples, the static context is updated at the connection level, so all subsequently created XQExpression and XQPreparedExpression objects are affected. This is great if you want all your XQuery expressions to be based on the same defaults in the static context. But what if the default values need to be different for some XQExpression and XQPreparedExpression objects? The application can also specify an XQStaticContext when creating XQ(Prepared)Expression objects.

...
// change the boundary-space policy in the static context object
// but don't apply those changes at the connection level
XQStaticContext xqsc = xqc.getStaticContext();
xqsc.setBoundarySpacePolicy(XQConstants.BOUNDARY_SPACE_PRESERVE);
// create a prepared expression using the modified static context
// other expressions subsequently created are not affected
XQPreparedExpression xqp1 = xqc.prepareExpression("<e> </e>", xqsc);
...

This approach is useful when some static context components need to change for a specific expression, while most other expressions keep the default values.
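To make the pass-by-value behaviour concrete, here is a small simulation in plain Java. StaticContext, Connection and Expression are hypothetical classes, not the XQJ interfaces, and the STRIP default is an assumption (it is the XQuery specification's default, but an implementation may choose otherwise). The connection hands out copies and keeps copies, and every expression freezes the context it was created with, which reproduces the connection-level and per-expression behaviour described in this post.

```java
public class StaticContextSketch {

    enum BoundarySpace { STRIP, PRESERVE }

    // Hypothetical value object: a plain bag of settings, copied on every hand-off
    static final class StaticContext {
        BoundarySpace boundarySpace;
        StaticContext(BoundarySpace boundarySpace) { this.boundarySpace = boundarySpace; }
        StaticContext copy() { return new StaticContext(boundarySpace); }
    }

    // Hypothetical expression: keeps the static context it was created with
    static final class Expression {
        final StaticContext context;
        Expression(StaticContext context) { this.context = context; }
    }

    // Hypothetical connection: never shares its context object with the caller
    static final class Connection {
        // assumed implementation default
        private StaticContext current = new StaticContext(BoundarySpace.STRIP);

        StaticContext getStaticContext() { return current.copy(); }
        void setStaticContext(StaticContext sc) { current = sc.copy(); }
        Expression prepareExpression(String query) { return new Expression(current.copy()); }
        Expression prepareExpression(String query, StaticContext sc) { return new Expression(sc.copy()); }
    }

    static String demo() {
        Connection xqc = new Connection();
        Expression xqp1 = xqc.prepareExpression("<e> </e>");        // default
        StaticContext xqsc = xqc.getStaticContext();
        xqsc.boundarySpace = BoundarySpace.PRESERVE;                 // local copy only
        Expression xqp2 = xqc.prepareExpression("<e> </e>");        // still the default
        xqc.setStaticContext(xqsc);                                  // now effective
        Expression xqp3 = xqc.prepareExpression("<e> </e>");        // preserve
        StaticContext local = xqc.getStaticContext();
        local.boundarySpace = BoundarySpace.STRIP;
        Expression xqp4 = xqc.prepareExpression("<e> </e>", local); // strip, this expression only
        Expression xqp5 = xqc.prepareExpression("<e> </e>");        // connection still preserves
        return xqp1.context.boundarySpace + " " + xqp2.context.boundarySpace + " "
                + xqp3.context.boundarySpace + " " + xqp4.context.boundarySpace + " "
                + xqp5.context.boundarySpace;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

main() prints STRIP STRIP PRESERVE STRIP PRESERVE: modifying the copy alone changes nothing, setStaticContext() affects only later expressions, and a context passed at creation time applies to that single expression.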

Almost all static context components are accessible through XQStaticContext. Here is the list,

  • Statically known namespaces
  • Default element/type namespace
  • Default function namespace
  • Context item static type
  • Default collation
  • Construction mode
  • Ordering mode
  • Default order for empty sequences
  • Boundary-space policy
  • Copy-namespaces mode
  • Base URI

In addition, XQStaticContext includes a number of XQJ specific properties,

  • Binding mode
  • Holdability of the result sequences
  • Scrollability of the result sequences
  • Query language
  • Query timeout

The most frequently used properties are "Binding mode" and "Scrollability", which will be discussed in a future post in this series. The query language is XQuery by default, and can be changed to XQueryX. The query timeout sets the number of seconds an implementation waits for a query to execute; supporting it is optional, so implementations are free to ignore it.

Looking forward to the next post? We'll discuss how XQuery types are exposed through XQJ.


Thursday, August 23, 2007

XQJ Part V - Serializing query results

The XQuery 1.0 specification consists of multiple documents; one of them is XSLT 2.0 and XQuery 1.0 Serialization. Given a data model instance, it defines how to serialize it into a sequence of octets. A typical use case is writing query results to a file using XML syntax.

Serialization defines a number of parameters which influence this process. The specification includes a detailed description of each of these parameters; we'll explain some of them through examples later on,

  • byte-order-mark
  • cdata-section-elements
  • doctype-public
  • doctype-system
  • encoding
  • escape-uri-attributes
  • include-content-type
  • indent
  • media-type
  • method
  • normalization-form
  • omit-xml-declaration
  • standalone
  • undeclare-prefixes
  • use-character-maps
  • version

Note that XQuery Serialization is an optional feature in XQuery. XQJ, however, is stricter and requires every implementation to support serialization. XQJ does not require every parameter defined in the XQuery Serialization spec to be supported to its full extent, but at least the default value of each parameter must be documented, and the behavior must conform to the spec.
For DataDirect XQuery all parameters are documented here.

Serializing your query results to a file is fairly simple, as shown in the next example,

...
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"doc('orders.xml')/*/ORDERS[O_ORDERKEY = '39']");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
new Properties());
...

Note that the second argument of writeSequence() is an empty Properties object. You can also specify null. Both an empty Properties object and null tell the XQJ driver to use the default value for each of the serialization parameters.

You might get something as follows (assume this to be one line),

<ORDERS><O_ORDERKEY>39</O_ORDERKEY><O_CUSTKEY>
8177</O_CUSTKEY><O_ORDERSTATUS>O</O_ORDERSTATUS>
<O_TOTALPRICE>307811.89</O_TOTALPRICE><O_ORDERDATE>
1996-09-20T00:00:00</O_ORDERDATE><O_ORDERPRIORITY>3-MEDIUM
</O_ORDERPRIORITY><O_CLERK>Clerk#000000659</O_CLERK>
<O_SHIPPRIORITY>0</O_SHIPPRIORITY><O_COMMENT>furiously
unusual pinto beans above the furiously ironic asymptot
</O_COMMENT> </ORDERS>

Not really readable; some indentation would help. It's also good practice to add the XML declaration, including an encoding. Suppose we want to encode the XML file as UTF-16,

...
Properties serializationProps = new java.util.Properties();
// make sure we output xml
serializationProps.setProperty("method", "xml");
// pretty printing
serializationProps.setProperty("indent", "yes");
// serialize as UTF-16
serializationProps.setProperty("encoding", "UTF-16");
// want an XML declaration
serializationProps.setProperty("omit-xml-declaration", "no");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"doc('orders.xml')/*/ORDERS[O_ORDERKEY = '39']");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

What we get now is much better,

<?xml version="1.0" encoding="UTF-16"?>
<ORDERS>
<O_ORDERKEY>39</O_ORDERKEY>
<O_CUSTKEY>8177</O_CUSTKEY>
<O_ORDERSTATUS>O</O_ORDERSTATUS>
<O_TOTALPRICE>307811.89</O_TOTALPRICE>
<O_ORDERDATE>1996-09-20T00:00:00</O_ORDERDATE>
<O_ORDERPRIORITY>3-MEDIUM</O_ORDERPRIORITY>
<O_CLERK>Clerk#000000659</O_CLERK>
<O_SHIPPRIORITY>0</O_SHIPPRIORITY>
<O_COMMENT>furiously unusual pinto
beans above the furiously ironic
asymptot</O_COMMENT>
</ORDERS>
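XQJ itself is not part of the JDK, but the serialization parameter names used above are shared with JAXP: javax.xml.transform.OutputKeys defines the same method, indent, encoding and omit-xml-declaration keys. As a rough JDK-only analogue, the sketch below pushes a shortened copy of the order (a hypothetical stand-in for the query result) through an identity Transformer with the same Properties. Whitespace details of the indented output vary between JDK versions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SerializeSketch {

    // Serializes an XML string through the JDK identity transformer
    // using the given serialization parameters
    static String serialize(String xml, Properties props) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperties(props);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    static String demo() {
        Properties props = new Properties();
        props.setProperty(OutputKeys.METHOD, "xml");               // "method"
        props.setProperty(OutputKeys.INDENT, "yes");               // "indent"
        props.setProperty(OutputKeys.ENCODING, "UTF-16");          // "encoding"
        props.setProperty(OutputKeys.OMIT_XML_DECLARATION, "no");  // "omit-xml-declaration"
        return serialize("<ORDERS><O_ORDERKEY>39</O_ORDERKEY>"
                + "<O_CLERK>Clerk#000000659</O_CLERK></ORDERS>", props);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the result goes to a StringWriter, the encoding parameter only shows up in the XML declaration and in character escaping; writing to an OutputStream, as the XQJ example does, is what produces actual UTF-16 bytes.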

Note that during serialization, characters are escaped as needed for the specified encoding. Suppose a query returns a document containing a registered trademark character, and the specified encoding is US-ASCII,

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("encoding", "US-ASCII");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"<product>DataDirect XQuery®</product>");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

You'll get the following; note that the ® character is serialized as a character reference because it is not defined in the ASCII character set,

<product>DataDirect XQuery&#xae;</product>
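The same escaping can be observed with the JDK's own serializer. This sketch is a JAXP analogue rather than XQJ, and it assumes the identity Transformer recognizes the US-ASCII encoding name. The exact form of the character reference (decimal, such as &#174;, or hexadecimal, such as &#xae;) is up to the serializer, so the test below only checks for a numeric reference.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AsciiEscapeSketch {

    static String demo() {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // U+00AE (registered trademark sign) cannot be represented in US-ASCII,
            // so the serializer has to write it as a numeric character reference
            String xml = "<product>DataDirect XQuery\u00AE</product>";
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```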

In some use cases, the cdata-section-elements parameter is useful. Suppose you're serializing XML elements that include ampersand characters. By default, & characters are escaped; using CDATA sections might be preferable to make the XML file more human-readable.

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("cdata-section-elements", "product");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"<product>DataDirect XQuery &amp; XML Converters</product>");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

Is serialized as follows,

<product><![CDATA[DataDirect XQuery & XML Converters]]></product>
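JAXP exposes the same parameter as OutputKeys.CDATA_SECTION_ELEMENTS, so the behaviour can be tried without an XQJ driver. Note one difference in syntax: JAXP documents the value as a whitespace-separated list of names, each written as {uri}localName when the element is in a namespace, while the XQJ example in this post uses semicolons. A minimal JDK-only sketch:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CdataSketch {

    static String demo() {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // text content of <product> elements is written as CDATA sections
            t.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "product");
            String xml = "<product>DataDirect XQuery &amp; XML Converters</product>";
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```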

Multiple elements can be specified through the cdata-section-elements parameter, separated by a semicolon. If an element is in a namespace, add the namespace URI using the James Clark notation, {namespace-uri}localname,

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("encoding", "UTF-8");
serializationProps.setProperty("omit-xml-declaration", "no");
serializationProps.setProperty("cdata-section-elements",
"product;{uri}product");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"<e xmlns:p='uri'> " +
" <product>DataDirect XQuery &amp; XML Converters</product>" +
" <p:product>DataDirect XQuery &amp; XML Converters</p:product>" +
"</e>");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
...

Yields the following result,

<?xml version="1.0" encoding="UTF-8"?>
<e xmlns:p="uri">
<product><![CDATA[DataDirect XQuery & XML Converters]]></product>
<p:product><![CDATA[DataDirect XQuery & XML Converters]]></p:product>
</e>

In addition to the XML output method, XQuery serialization defines other output methods, like HTML and XHTML. Note that these serialization methods will not "automagically" produce (X)HTML. It is still the query's responsibility to produce results that conform to (X)HTML, but the serializer applies the (X)HTML rules when writing the results. For example, with the HTML method <br> elements are serialized without a closing </br>.
Note the difference between result.xml and result.html produced by the following code,

...
Properties serializationProps = new java.util.Properties();
XQPreparedExpression xqpe = xqc.prepareExpression(
"<html>line1<br/>line2</html>");
XQSequence xqs = xqpe.executeQuery();
serializationProps.setProperty("method", "xml");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.xml"),
serializationProps);
// execute again to get a fresh sequence for the HTML output
xqs = xqpe.executeQuery();
serializationProps.setProperty("method", "html");
xqs.writeSequence(
new FileOutputStream("/home/jimmy/result.html"),
serializationProps);
...

result.xml is as follows,

<html>line1<br/>line2</html>

whereas result.html looks as follows,

<html>line1<br>line2</html>
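The same contrast can be reproduced with the JDK's identity Transformer, which supports the xml and html output methods (this is JAXP, not XQJ). The sketch turns indentation off so the output stays on one line, and trims any trailing newline.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlVsXmlSketch {

    // Serializes the same input with the given output method ("xml" or "html")
    static String serialize(String xml, String method) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, method);
            t.setOutputProperty(OutputKeys.INDENT, "no");                // keep it on one line
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); // only matters for "xml"
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString().trim();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<html>line1<br/>line2</html>";
        System.out.println(serialize(doc, "xml"));   // empty element written as <br/>
        System.out.println(serialize(doc, "html"));  // <br> without a closing tag
    }
}
```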

If you're interested in all the details about (X)HTML serialization, look here for HTML and here for XHTML.

In all previous examples, we've serialized the query results to a FileOutputStream. An XQSequence can also be serialized to a java.io.Writer using the writeSequence() method, and getSequenceAsString() serializes it to a java.lang.String.

Similar to serializing the complete XQSequence, there are methods to serialize the current item in the XQSequence. In the following example, the items in the query result are saved into individual files, result1.xml, result2.xml, and so on.

...
Properties serializationProps = new java.util.Properties();
serializationProps.setProperty("method", "xml");
serializationProps.setProperty("indent", "yes");
serializationProps.setProperty("encoding", "UTF-8");
serializationProps.setProperty("omit-xml-declaration", "no");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery("doc('orders.xml')/*/ORDERS");
int i = 1;
while (xqs.next()) {
  FileOutputStream file;
  file = new FileOutputStream("/home/jimmy/result" +
      i + ".xml");
  xqs.writeItem(file, serializationProps);
  file.close();
  // move on to the next file name
  i++;
}
...

To conclude this post, note that XML serialization doesn’t always result in a well-formed XML document. More precisely it is either a well-formed XML document or a well-formed XML external general parsed entity. This is further explained in the serialization specification.

In the next post, we'll talk about manipulating the XQuery Static Context through the XQJ API.


Monday, August 20, 2007

XQJ Part IV - Processing query results

In XQJ Part III we learned how to execute queries. In XQuery, query evaluation results in a sequence. In XQJ, executing a query through XQExpression or XQPreparedExpression returns an XQSequence object. An XQSequence represents an XQuery sequence together with a cursor over that sequence.

The application can browse through an XQSequence using the next() method. Initially the current position of the XQSequence is before the first item. next() moves the current position forward and returns true if there is another item to be consumed. Once all items in the sequence have been read, next() returns false.

Let's iterate through a sequence,

...
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery("doc('orders.xml')//order[id='174']");
while (xqs.next()) {
...
}
...

Positioned on an item, the application can retrieve the data using one of the getXXX() methods. To give a taste, we'll go through some of these methods by example.

An application can use getObject() to retrieve the current item of an XQSequence as a Java object. XQJ defines a mapping for each of the XQuery item types to a Java object value.

One of the most common scenarios is probably a query returning a sequence of elements. For elements, getObject() follows the XQJ mapping to Java DOM elements,

... 
org.w3c.dom.Element employee;
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery("doc('employees.xml')//employee");
while (xqs.next()) {
employee = (org.w3c.dom.Element)xqs.getObject();
...
}
...

In fact, XQJ defines a mapping to Java objects for every XQuery type, including all the atomic types. Assume for example a query returning xs:decimal values; using getObject(), your Java application retrieves the items as java.math.BigDecimal objects,

... 
java.math.BigDecimal price;
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(
"doc('orders.xml')/orders/order/xs:decimal(total_price)");
while (xqs.next()) {
price = (java.math.BigDecimal)xqs.getObject();
...
}
...

Suppose you have a query returning atomic values, and you want to retrieve a textual representation of them, for example to write them to System.out. getAtomicValue() returns a string representation of an atomic value according to the XQuery rules for casting to xs:string, and throws an exception if the item is a node.
In the next example the query returns a sequence of atomic values, note that the items are not all of the same type.

... 
XQExpression xqe = xqc.createExpression();
XQSequence xqs = xqe.executeQuery(
"'Hello world!', 123, 1E1, xs:QName('abc')");
while (xqs.next()) {
System.out.println(xqs.getAtomicValue());
}
...

Besides DOM, XQJ also provides native support for two other popular XML APIs, SAX and StAX. In the next example each of the items is returned to the application through SAX,

... 
ContentHandler ch = ...
XQExpression xqe = xqc.createExpression();
XQSequence xqs = xqe.executeQuery(
"doc('employees.xml')//employee");
while (xqs.next()) {
xqs.writeItemToSAX(ch);
}
...

Up to now we have seen a number of examples where the application iterates over all the items in the sequence, and retrieves them one-by-one. The XQSequence interface also offers functionality to retrieve the complete sequence within a single call. In the next example, we execute a query and serialize the complete result into a SAX event stream.

... 
ContentHandler ch = ...
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery("doc('employees.xml')//employee");
xqs.writeSequenceToSAX(ch);
...

Or in a similar way, read the complete sequence as a StAX event stream.

... 
XQExpression xqe = xqc.createExpression();
XQSequence xqs = xqe.executeQuery("doc('employees.xml')");
XMLStreamReader xmlReader = xqs.getSequenceAsStream();
while (xmlReader.next() != XMLStreamConstants.END_DOCUMENT) {
...
}
...
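getSequenceAsStream() returns a standard javax.xml.stream.XMLStreamReader, so the loop body is ordinary StAX. The JDK-only sketch below reads an inline document (a hypothetical stand-in for the query result) through XMLInputFactory and counts employee start tags. It checks hasNext() before each next(), an equivalent way to write the loop that never calls next() after END_DOCUMENT.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxLoopSketch {

    // Counts start tags with the given local name, pulling events one at a time
    static int countElements(String xml, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int count = 0;
            // hasNext() becomes false once END_DOCUMENT has been reached
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<employees><employee><name>A</name></employee>"
                + "<employee><name>B</name></employee></employees>";
        System.out.println(countElements(doc, "employee")); // 2
    }
}
```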

Besides exposing the sequence through a SAX or StAX event stream, XQSequence also provides the ability to serialize into a binary or character stream. Here we're entering the arena of XSLT 2.0 and XQuery 1.0 Serialization, which is what the next post will be about.

Finally, the above examples all iterate forward through XQSequence objects. XQJ also has the notion of scrollable sequences, which allow moving both forward and backward, setting the cursor to an absolute position, and iterating through the XQSequence more than once. We'll come back to them later.


Thursday, August 16, 2007

XQJ Part III - Executing queries

In XQJ Part II we explained how to create a connection. Now your application is ready to do some real work, executing queries.

In XQJ, an XQExpression object allows you to execute an XQuery expression. An XQExpression object is created in the context of an XQConnection. The next example creates an XQExpression and subsequently uses it to execute an XQuery expression,

...
// assume an XQConnection xqc
XQExpression xqe = xqc.createExpression();
xqe.executeQuery("doc('orders.xml')//order[id='174']");
...

The result of a query evaluation is a sequence, which is modeled as an XQSequence object in XQJ. Hence, the result of the executeQuery() method is an XQSequence. In typical scenarios, the above code example actually looks as follows,

...
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery("doc('orders.xml')//order[id='174']");
// process the query results
...

In the next post, XQJ Part IV, we'll discuss the XQSequence functionality in detail.

An XQExpression object can be reused to execute a different XQuery expression each time. The next example retrieves all orders with id 174 and then the orders with id 267,

...
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
// execute a first query
xqs = xqe.executeQuery("doc('orders.xml')//order[id='174']");
// process the query results
...
// execute a second query
xqs = xqe.executeQuery("doc('orders.xml')//order[id='267']");
// process the query results
...

In this last example we execute almost the same query twice; only the value compared against changes.

XQJ supports the concept of prepared queries. The idea here is to "prepare" the query only once, and subsequently "execute" it several times. During the prepare phase, the query is parsed, statically validated and an optimized execution plan is generated. This can be a relatively expensive operation, hence using XQPreparedExpression objects can improve performance if the same query is executed multiple times.

Using prepared queries often implies the use of external variables in your query. Before each execution, the application can bind different values to each of the external variables.
Coming back to the previous example, here is the XQPreparedExpression variant. Note that the XQuery expression is specified when the XQPreparedExpression object is created, not at execute time,

...
XQPreparedExpression xqp;
XQSequence xqs;
xqp = xqc.prepareExpression(
"declare variable $id as xs:string external; " +
"doc('orders.xml')//order[id=$id]");
// execute a first query and process the query results
xqp.bindString(new QName("id"), "174", null);
xqs = xqp.executeQuery();
...
// execute a second query and process the query results
xqp.bindString(new QName("id"), "267", null);
xqs = xqp.executeQuery();
...
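XQJ itself is not part of the JDK, so as a runnable illustration here is the same prepare-once, bind, execute-many pattern sketched with the JDK's own javax.xml.xpath API instead. This is an analogy, not XQJ: an XPathExpression plays the role of the XQPreparedExpression, an XPathVariableResolver backed by a map stands in for bindString(), the query is XPath 1.0 rather than XQuery, and the inline orders document and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PreparedXPathDemo {
    // Hypothetical stand-in for orders.xml: two orders with id 174, one with id 267.
    static final String ORDERS =
        "<orders>"
        + "<order><id>174</id></order>"
        + "<order><id>267</id></order>"
        + "<order><id>174</id></order>"
        + "</orders>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Compiles the expression once, then evaluates it once per id,
    // re-binding the variable $id before each evaluation.
    static int[] countOrders(Document doc, String... ids) throws Exception {
        Map<QName, Object> bindings = new HashMap<>();
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The resolver looks values up in the map when the expression is
        // evaluated, so updating the map acts like binding a new value.
        xpath.setXPathVariableResolver(bindings::get);
        XPathExpression prepared = xpath.compile("//order[id=$id]"); // "prepare" step

        int[] counts = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            bindings.put(new QName("id"), ids[i]); // "bind" step
            NodeList hits = (NodeList) prepared.evaluate(doc, XPathConstants.NODESET); // "execute" step
            counts[i] = hits.getLength();
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countOrders(parse(ORDERS), "174", "267");
        System.out.println("174 -> " + counts[0] + ", 267 -> " + counts[1]);
    }
}
```

For the sample document, main prints `174 -> 2, 267 -> 1`. The expression text is parsed only once; only the variable binding changes between evaluations, which is the same trade-off that motivates XQPreparedExpression.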

The XQPreparedExpression example above demonstrated how to bind values to external variables of a prepared expression; it is also possible to bind values to external variables using XQExpression objects. Suppose you have a DOM tree and want to query it,

...
Document domDocument = ...;
XQExpression xqe = xqc.createExpression();
xqe.bindNode(new QName("doc"), domDocument, null);
XQSequence xqs = xqe.executeQuery(
"declare variable $doc as document-node(element(*,xs:untyped)) external;" +
"$doc//order[id='174']");
// process the query results
...

XQuery has the concept of a context item, represented by a "dot" in your queries. The previous examples demonstrated that values can be bound to XQExpression and XQPreparedExpression objects by name. In XQuery, however, the context item has no name, so XQJ defines a special name for binding it: XQConstants.CONTEXT_ITEM. The next example is similar to the last one, but it binds the DOM document to the initial context item rather than to an external variable.

...
Document domDocument = ...;
XQExpression xqe = xqc.createExpression();
xqe.bindNode(XQConstants.CONTEXT_ITEM, domDocument, null);
XQSequence xqs = xqe.executeQuery(".//order[id='174']");
// process the query results
...
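The JDK's javax.xml.xpath API has a direct counterpart of the context item: the node passed to evaluate() is what "." refers to. The sketch below is an analogy using XPath 1.0, not XQJ, and the sample document and class name are hypothetical; it evaluates the expression from the example above against two different context nodes to show that the result depends on what is bound as context.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContextNodeDemo {
    // Hypothetical stand-in for the DOM document in the XQJ example.
    static final String ORDERS =
        "<orders><order><id>174</id></order><order><id>267</id></order></orders>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates the query with `context` as the context node ("." in the expression),
    // much like binding a value to XQConstants.CONTEXT_ITEM in XQJ.
    static int countMatches(Node context) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(
            ".//order[id='174']", context, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(ORDERS);
        // Whole document as context: the order with id 174 is found.
        System.out.println(countMatches(doc));
        // The <order> with id 267 as context: it has no descendant orders.
        Node secondOrder = doc.getElementsByTagName("order").item(1);
        System.out.println(countMatches(secondOrder));
    }
}
```

Running main prints `1` and then `0`: the expression is identical, only the context node changes.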

In a later post, we will come back to binding values to external variables and the context item.

Note that in all the above XQJ examples, the XQuery expressions are specified as Java Strings. XQJ also lets you specify the query as an InputStream, as shown in the next example, which executes the query in the getorders.xquery file,

...
InputStream query;
query = new FileInputStream("home/joe/getorders.xquery");
XQExpression xqe;
XQSequence xqs;
xqe = xqc.createExpression();
xqs = xqe.executeQuery(query);
// process the query results
...

Note that a query can optionally declare its encoding in the version declaration at the start of the query, for example xquery version "1.0" encoding "ISO-8859-1";. Good XQJ/XQuery implementations use that information to decode the InputStream correctly. If no encoding is specified, the assumed encoding depends on the implementation; for DataDirect XQuery it is UTF-8.
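How an implementation honours the declared encoding is up to that implementation. The following is a minimal sketch of one way it could be done, not DataDirect's code: peek at the first bytes of the stream, look for a version declaration with an encoding, and fall back to UTF-8 otherwise. Assumptions: the class and method names are hypothetical, only a UTF-8 byte order mark and ASCII-compatible encodings are handled (a UTF-16 query would need extra byte-order-mark checks), and a comment placed before the version declaration is not recognized.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrologEncoding {
    // Version declaration with an encoding, e.g.  xquery version "1.0" encoding "ISO-8859-1";
    private static final Pattern VERSION_DECL = Pattern.compile(
        "^\\s*xquery\\s+version\\s+([\"'])[^\"']*\\1\\s+encoding\\s+([\"'])([A-Za-z][A-Za-z0-9._-]*)\\2\\s*;");

    // How many leading bytes are inspected (an assumption; enough for the declaration).
    private static final int SNIFF_LIMIT = 256;

    // Picks a charset from the first bytes of a query.
    static Charset detect(byte[] head) {
        // A UTF-8 byte order mark wins outright.
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // The declaration is plain ASCII, so a byte-for-byte ISO-8859-1 decode is enough to find it.
        Matcher m = VERSION_DECL.matcher(new String(head, StandardCharsets.ISO_8859_1));
        if (m.find()) {
            return Charset.forName(m.group(3)); // throws if the JVM doesn't know the charset
        }
        return StandardCharsets.UTF_8; // default when nothing is declared
    }

    // Wraps a query stream in a Reader that uses the detected charset.
    static Reader open(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(SNIFF_LIMIT);
        byte[] head = in.readNBytes(SNIFF_LIMIT);
        in.reset(); // rewind so the query text is read from the beginning
        return new InputStreamReader(in, detect(head));
    }
}
```

The design choice worth noting is the mark/reset on a buffered stream: the declaration can be inspected without consuming it, so the query parser still sees the complete text.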

We now know how to execute queries and have introduced the concept of prepared expressions. Next in this series we will focus on processing query results; expect XQJ Part IV soon.

Friday, August 10, 2007

XQJ Part II - Setting up a session

There is a large variety of XQuery implementations, both in architecture and in the range of supported data sources. XQJ is designed so that all these different architectures can be plugged in. For example, the parameters and settings required to configure an implementation can differ from one XQJ and XQuery implementation to another. Some implementations are server-based and require information to locate the server. Or, if the XQJ and XQuery implementations are co-located, you might need to specify the default location for querying files on the local file system.

An XQJ application always starts by accessing an XQDataSource object. Such an object encapsulates all the parameters and settings needed to create a session with a specific implementation, and ultimately to execute XQuery expressions and process their results.

Every XQJ driver has its own XQDataSource implementation. This XQDataSource object supports a number of implementation-specific properties. For each of these properties, a "getter" and "setter" method is provided.

Assume an application wants to query both an Oracle database and some files located in "/usr/joe/data" with DataDirect XQuery. Two properties need to be specified: the base URI and the JDBC URL used to connect to the Oracle database (feel free to replace Oracle with your favourite database, be it SQL Server, MySQL or another one),

DDXQDataSource xqds = new DDXQDataSource();
xqds.setBaseUri("/usr/joe/data");
xqds.setJdbcUrl("jdbc:xquery:oracle://sales:1521;SID=ORA10");

Or envision another, Oracle-specific implementation where the server parameters are specified as individual properties rather than through a JDBC URL. It could look as follows,

OracleDataSource xqds = new OracleDataSource();
xqds.setServerName("sales");
xqds.setPortNumber(1521);
xqds.setSID("ORA10");

Having access to an XQDataSource object, what's next? An XQDataSource is a factory for XQConnection objects. An XQConnection object represents a session in which XQuery expressions are executed.

Establishing such a session is straightforward,

XQConnection xqc = xqds.getConnection();

In case user credentials are needed, these can be specified as arguments to the getConnection() method,

XQConnection xqc = xqds.getConnection("joe", "topsecret");

So far so good. However, creating XQDataSource objects as outlined above makes the application dependent on a specific XQJ implementation, since the proprietary classes DDXQDataSource and OracleDataSource are referenced directly. This is not necessarily wrong; there are scenarios where hard-coding the underlying XQJ implementation makes sense.

But often this is not desirable: XQJ is all about making your application independent of the underlying XQuery implementation. How can we make our application independent of the XQJ implementation? We'll show two approaches,

  • using a Java properties file
  • through JNDI

Assume all the XQDataSource properties are stored in a Java properties file, together with an additional ClassName property that identifies the XQDataSource implementation to use.

For the DataDirect XQuery example above, the properties file would look as follows,

ClassName = com.ddtek.xquery3.xqj.DDXQDataSource
BaseUri = /usr/joe/data
JdbcUrl = jdbc:xquery:oracle://sales:1521;SID=ORA10

For the Oracle implementation,

ClassName = org.example.xqj.OracleDataSource
ServerName = sales
PortNumber = 1521
SID = ORA10

Using such a properties file, an application can easily abstract away any hard-coded dependency on the underlying XQJ implementation. The XQDataSource class is loaded through reflection, and then it is simply a matter of passing in the properties.

// load the properties file
Properties p = new Properties();
try (InputStream in = new FileInputStream("/tmp/xqjds.prop")) {
    p.load(in);
}

// create an XQDataSource instance using reflection
String xqdsClassName = p.getProperty("ClassName");
Class<?> xqdsClass = Class.forName(xqdsClassName);
XQDataSource xqds = (XQDataSource) xqdsClass.getDeclaredConstructor().newInstance();

// remove the ClassName property;
// the XQJ implementation will not recognize
// it and raise an error
p.remove("ClassName");

// set the remaining properties
xqds.setProperties(p);

// create an XQConnection
XQConnection xqc = xqds.getConnection();

Of course, this is just an example; for some applications, another way of loading the data source properties may be better suited.
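The XQJ driver classes are not available in a plain JDK, so the sketch below exercises only the two generic pieces of the recipe with JDK types: splitting ClassName off the loaded properties, and reflective instantiation checked against an expected interface. The helper names are hypothetical, and java.util.ArrayList and java.util.List stand in for a driver class and XQDataSource purely for demonstration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.Properties;

public class DataSourceConfig {
    // Loads properties from `source` into `target` and removes the ClassName entry,
    // returning its value; `target` is left holding only the data source properties.
    static String splitClassName(Reader source, Properties target) throws IOException {
        target.load(source);
        Object className = target.remove("ClassName");
        if (className == null) {
            throw new IllegalArgumentException("missing ClassName property");
        }
        return (String) className;
    }

    // Instantiates `className` via its no-arg constructor and checks it against `type`.
    static <T> T instantiate(String className, Class<T> type) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        return type.cast(cls.getDeclaredConstructor().newInstance());
    }

    public static void main(String[] args) throws Exception {
        // Same content as the DataDirect properties file shown earlier.
        String file = String.join("\n",
            "ClassName = com.ddtek.xquery3.xqj.DDXQDataSource",
            "BaseUri = /usr/joe/data",
            "JdbcUrl = jdbc:xquery:oracle://sales:1521;SID=ORA10");
        Properties p = new Properties();
        String className = splitClassName(new StringReader(file), p);
        System.out.println(className);                // the driver class to load
        System.out.println(p.getProperty("JdbcUrl")); // ':' and '=' inside the value are preserved
        // Stand-in for loading the XQDataSource: any class with a public no-arg constructor.
        List<?> list = instantiate("java.util.ArrayList", List.class);
        System.out.println(list.isEmpty());
    }
}
```

One detail this makes visible: Properties.load only treats the first unescaped '=', ':' or whitespace as the key separator, so a JDBC URL containing colons and an equals sign survives intact as the value.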

As with JDBC, when running in a J2EE environment, the XQDataSource object can be stored in a JNDI-enabled naming service. This allows your application to access the XQDataSource simply by specifying a logical name.

// get the initial JNDI context
Context ctx = new InitialContext();
// load the XQDataSource instance
XQDataSource xqds = (XQDataSource)ctx.lookup("xqj/sales");
// create an XQConnection
XQConnection xqc = xqds.getConnection();

For readers with a JDBC background: JDBC has two mechanisms to establish a connection,

  • DriverManager
  • DataSource

Don't look for DriverManager-like functionality in XQJ; XQJ doesn't offer this legacy mechanism.

We have now learned how to create an XQConnection. In our next post we will do some real work, and show how to execute queries.

Tuesday, August 7, 2007

XQJ Part I - Introduction

This is the first post in a series about XQJ (JSR 225). The XQJ initiative aims to provide the standard Java API for accessing XQuery implementations. Eventually, XQJ will be to XQuery what JDBC is to SQL.
XQJ is currently in public review; the specification can be downloaded here.

I hope this series will give a good overview of the major interfaces and functionality offered by XQJ, mostly through example code. The majority of the code should be implementation-independent, but I have to admit that it will be tested using DataDirect XQuery 3.0, which supports the XQJ Public Draft.

We have today the following essays in the XQJ series (the list is updated as more posts become available),

Let's start with our first simple XQJ application; the XQuery/XQJ version of "Hello World!".

XQDataSource xqjd = new DDXQDataSource();
XQConnection xqjc = xqjd.getConnection();
XQExpression xqje = xqjc.createExpression();
XQSequence xqjs = xqje.executeQuery("'Hello World!'");
xqjs.writeSequence(System.out, null);
xqjc.close();

I assume most of it speaks for itself. But don't worry if you have some questions; things will become clear in the upcoming posts.

Some of you might recognize some of the JDBC concepts in the code example above. Indeed, the XQJ expert group decided to make XQJ stylistically consistent with JDBC. Like JDBC, it has concepts such as,

  • datasources
  • connections
  • expressions and prepared expressions

At the same time, XQuery is not SQL, so XQJ differs from JDBC in areas such as,

  • data model
  • typing
  • static context
  • error handling

The next post in this series will introduce data sources and setting up connections.
