XML Connections

Sunday, October 28, 2007

XQJ Part X - XML Pipelines

Today's post in the XQJ series is about XML pipelines. How can we create a pipeline of XQueries, and how can XQuery be integrated with JAXP-based XML processors?

An XML pipeline is a sequence of XML processes, also called transformations, in which the result of one transformation is passed as input to the next one. In case the question comes up, read Norman Walsh’s essay Why Pipelines? Historically, the transformations that come to mind are XSLT transformation, XML Schema validation, simple XML parsing, etc. But of course, a stage can also be an XQuery. This post is about integrating XQuery and XQJ in such an XML pipeline.

Let’s start with a pipeline where the result of a first XQuery is passed on to a second. Given that query execution results in an XQSequence, and an XQSequence can be bound to an external variable, pipelining two XQueries is rather simple.

...
XQExpression xqe1 = xqc.createExpression();
XQSequence xqs1;
xqs1 = xqe1.executeQuery("doc('orders.xml')//orders");
XQExpression xqe2 = xqc.createExpression();
// Bind the result of the first query to the external variable $orders
xqe2.bindSequence(new QName("orders"), xqs1);

XQSequence xqs2 = xqe2.executeQuery(
"declare variable $orders as " +
" element(*,xs:untyped)* external; " +
"for $order in $orders " +
"where $order/@status = 'closed' " +
"return " +
" <closed_order id = '{$order/@id}'>{ " +
" $order/* " +
" }</closed_order>");
xqs2.writeSequence(System.out, null);
xqe2.close();
xqe1.close();
...

In this example, both queries in the pipeline are executed in the context of a single XQConnection object. But this is not required; it is perfectly possible to have two different connection objects, possibly from different XQJ implementations.

The application writer shouldn’t be concerned with how the query result from the first XQuery is passed to the second. Whether a DOM, SAX, StAX, serialized XML or some other (proprietary) mechanism is used is an implementation detail. The application can assume that the most appropriate mechanism is used.

Let’s now look at how an XQuery can be integrated in a pipeline with other XML processors. We'll work out two scenarios: first an XQuery processing the results of an XSLT transformation, and then the reverse, the result of an XQuery being passed to an XSLT transformation. On the Java platform, XSLT transformations are processed through the JAXP API. JAXP makes use of the Source and Result interfaces to read XML documents and to handle the results of transformations. As XQJ has built-in support for these interfaces, it is possible to integrate both JAXP and XQJ in a pipeline.

The next example invokes an XSLT transformation, and the results are passed as input to an XQuery. Under the covers, SAX events are used. From a performance and scalability perspective, SAX is preferable to passing a serialized XML stream, not to mention building a DOM tree.

First an XMLFilter for the XSLT transformation is created, followed by a SAXSource that combines this filter with the input of the XSLT transformation.
It is this SAXSource which is bound to the external variable of the XQuery. Subsequently the XQuery is executed, and finally the application consumes the results of the XQuery.

...
// Build an XMLFilter for the XSLT transformation
SAXTransformerFactory stf;
stf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
XMLFilter stage1;
stage1 = stf.newXMLFilter(new StreamSource("stage1.xsl"));

// Create a SAX source, the input for the XSLT transformation
SAXSource saxSource;
saxSource = new SAXSource(stage1,new InputSource("input.xml"));

// Create the XQuery expression
XQPreparedExpression stage2;
stage2 = xqc.prepareExpression(new FileInputStream("stage2.xquery"));

// Bind the input document as a Source object to stage2
stage2.bindDocument(new QName("var"), saxSource, null);

// Execute the query (stage2)
XQSequence result = stage2.executeQuery();

// Process query results
result.writeSequenceToResult(
new StreamResult(System.out));
...

As you may notice in this example, the pipeline is activated starting from the end. Under the covers, the XQJ implementation invokes the parse() method on the XMLReader held by the SAXSource bound to the external variable. That call starts the SAX callbacks into the XSLT transformation, which in turn performs callbacks to the XQJ implementation. These callbacks represent the result of the XSLT transformation, allowing the XQuery to yield results back to the application.

Let's now have a closer look at a pipeline where the results of an xquery are passed on to an XSLT transformation.

First we present a utility class, XQJFilter, an XMLFilter based on an XQJ XQPreparedExpression object.

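import javax.xml.transform.Result;
import javax.xml.transform.sax.SAXResult;
import javax.xml.xquery.XQException;
import javax.xml.xquery.XQPreparedExpression;
import javax.xml.xquery.XQSequence;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

public class XQJFilter extends XMLFilterImpl {
    private final XQPreparedExpression _expression;

    public XQJFilter(XQPreparedExpression expression) {
        _expression = expression;
    }

    // The InputSource is ignored: the query itself produces the SAX events
    @Override
    public void parse(InputSource source) throws SAXException {
        try {
            XQSequence xqs = _expression.executeQuery();
            // Forward the query result as SAX events to the next stage
            Result result = new SAXResult(this.getContentHandler());
            xqs.writeSequenceToResult(result);
            xqs.close();
        } catch (XQException e) {
            throw new SAXException(e);
        }
    }
}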

Next we’ll use this XQJFilter implementation to build the pipeline:

...
// Create an XQuery expression
XQPreparedExpression xqp;
xqp = xqc.prepareExpression(new FileInputStream ("query.xq"));
// Create the XQJFilter, the first stage in our pipeline
XQJFilter stage1 = new XQJFilter(xqp);
// Create an XMLFilter for the XSLT transformation, the 2nd stage

SAXTransformerFactory stf;
stf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
XMLFilter stage2 = stf.newXMLFilter(new StreamSource("stage2.xsl"));
stage2.setParent(stage1);
// Make sure to capture the SAX events as result of the pipeline
stage2.setContentHandler(...);
// Activate the pipeline
stage2.parse("");
...

As in the previous example, the pipeline is activated from the end, by calling the parse() method on the second stage.
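
If you want to watch this pull-from-the-end behaviour without an XQJ driver at hand, the following JDK-only sketch may help. It is not the XQJFilter from this post: a hypothetical GeneratorFilter stands in for the query stage and simply emits a few hard-coded SAX events, and the inline stylesheet and the element names are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class SaxPipelineSketch {

    /** Hypothetical first stage: stands in for XQJFilter and emits fixed SAX events. */
    static class GeneratorFilter extends XMLFilterImpl {
        @Override
        public void parse(InputSource ignored) throws SAXException {
            ContentHandler out = getContentHandler();
            out.startDocument();
            out.startElement("", "order", "order", new AttributesImpl());
            char[] text = "closed".toCharArray();
            out.characters(text, 0, text.length);
            out.endElement("", "order", "order");
            out.endDocument();
        }
    }

    // Illustrative stylesheet: copies the text of <order> into a <status> element.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><status><xsl:value-of select='order'/></status></xsl:template>"
      + "</xsl:stylesheet>";

    static String runPipeline() throws Exception {
        SAXTransformerFactory stf =
            (SAXTransformerFactory) SAXTransformerFactory.newInstance();

        // Second stage: the XSLT transformation, fed by the generator
        XMLFilter stage2 = stf.newXMLFilter(new StreamSource(new StringReader(XSL)));
        stage2.setParent(new GeneratorFilter());

        // Identity handler that serializes the SAX events leaving stage 2
        TransformerHandler serializer = stf.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sink = new StringWriter();
        serializer.setResult(new StreamResult(sink));
        stage2.setContentHandler(serializer);

        // Nothing has run so far; parsing the last stage pulls events through the chain
        stage2.parse(new InputSource());
        return sink.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runPipeline());
    }
}
```

Nothing happens until parse() is called on stage2; only then is the generator asked for its events, which is the same activation order as in the XQJFilter pipeline.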

In our next post we discuss deferred binding mode when working with external variables; this is essential functionality for handling huge XML documents.

Monday, October 22, 2007

Yes! You can finally query your office documents!


When I was working on an XML database in the late nineties, I remember hearing a lot of noise about the fact that — finally! — XML would allow for real reuse and collaboration when working with applications like word processors and spreadsheet editors. Copy like "Data is finally disjoint from format; store the data as XML, and use other languages [XSLT, typically] to take care of the format." was easy to find in industry publications and seemed to herald the start of something big.

Well, XML did work that way, but only in limited cases; for the vast majority of applications, XML didn't provide any of the benefits of separate data and formatting. Fast forward a decade, and there's now genuine promise in this area.

Two emerging standards, OpenDocument Format (ODF) and Office Open XML (OOXML), are gaining popularity (and sparking several fights in standards bodies), and things are moving again, even if they're not moving in exactly the way people thought they would some time ago: neither ODF nor OOXML really creates a clear separation between data and presentation. Instead, both use an XML format that mixes data and presentation. But both standards are based on XML, which means that you can finally use standard XML tools, like XQuery, to query your office documents!

Marc recently wrote a nice article about how you can leverage XQuery to query XML-based office document standards; he provides an excellent technical overview, and the included examples are good references about how to get started experimenting with XQuery and these new formats.


Tuesday, October 16, 2007

Converting between EDI and XML... Why?


EDI (Electronic Data Interchange) pre-dates XML by several years; people started using the term EDI to describe the transfer of data, typically across multiple companies, often using VANs (Value Added Networks) or the Internet. Many standards bodies have been creating EDI standards in the past several years; some of the most popular EDI standards in active use today include X12, HL7, EDIFACT, and IATA.

And how is XML related to all this? Since the start of XML's adoption in the late 90's, one of XML's primary applications was in handling B2B or even B2C data interchange. The benefits compared to EDI were obvious, numerous, and embedded in the very nature of XML: human readable, easily extensible, easy to validate (well, after XML Schema saw the light, at least), easily parsed using off-the-shelf components, and easily accessed through standard interfaces like DOM, SAX, and StAX. None of these features applies to the EDI standards, as anyone dealing with EDI knows very well.

So, there is no point in worrying about EDI standards; we can just use XML and rely on powerful XML tools like DataDirect XQuery to create and manipulate messages used for data interchange across companies, right? Wrong.

It is true that "the world" is moving in the direction of using XML more and more to deal with data interchange. But it's also true that there are so many existing critical applications based on EDI standards, in so many different industry verticals (health, airlines, and insurance, to name but a few), that EDI is not going away anytime soon.

As you can imagine, the coexistence of two strong data interchange formats creates another problem: you are company A, and you deal with company B, which "speaks" only EDI, and with company C, which has recently upgraded all its systems which now "speak" only XML... (I wish I were making up this scenario for illustrative purposes, but I am not.) You must become bilingual, speaking and understanding both EDI and XML. Many companies try to address that problem by creating ad-hoc code that knows how to deal with EDI and XML, and with the many variants that partner companies may be using in their EDI approach; but that's far from optimal, as developing and testing applications becomes a nightmare, and handling changes is a big challenge.

That's exactly where a product like DataDirect XML Converters can help — XML Converters are streaming-based Java and .NET components that translate between EDI and XML. This means that if your company (company A) receives EDI messages from a partner company (company B), you can still treat that EDI as XML data thanks to XML Converters; similarly, if you need to send an EDI message to a partner company, you can create XML and then rely on XML Converters to translate that XML into EDI. You don't need to write EDI parsers; you can handle variations from the EDI standards through XML Converters; and you can just focus on manipulating XML, preferably through a language like XQuery.

Edict Systems understood the benefits that DataDirect XML Converters bring to the table, and they are now part of the growing number of companies that use XML Converters to handle business situations in which EDI and XML need to co-exist, or where developers want to be shielded from the task of parsing and creating raw EDI. An interesting case study of Edict Systems' DataDirect XML Converters implementation is now available on the DataDirect XML Converters web site: http://www.xmlconverters.com/customers/edict-systems.html

There's much more to write about other use cases where DataDirect XML Converters and DataDirect XQuery together provide a very powerful combination, and I'll be blogging about that in future posts.




Monday, October 15, 2007

XQuery your Excel spreadsheets

A few weeks ago I blogged about XQuery your office documents. We can query our Office Open XML and OpenDocument Format documents because they are XML based. But what about older formats? For example, there are zillions of Excel 2003 spreadsheets, and they will be around for another few years. Wouldn't it be great if we could query those, just as we can query OOXML and ODF documents through DataDirect XQuery?

URI Resolvers

JAXP defines the URIResolver interface. A URIResolver turns a URI into a 'virtual' XML document. The concept of URIResolver is supported by most Java-based XPath, XSLT and XQuery implementations and thus also by DataDirect XQuery.

URIResolvers allow you to query any proprietary format through XQuery, as long as you go through the effort of some Java coding to transform the legacy format to XML.

There are also products using the URIResolver interface to make non-XML data standards available to the XML ecosystem. The XML Converters, for example, let you query many non-XML data formats, such as the EDI standards X12, EDIFACT, EANCOM and HL7, as well as dBASE, CSV, JSON and many others.
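
To give an idea of how little is needed, here is a minimal sketch of such a resolver, written against the plain JAXP URIResolver interface. The csv: scheme, the class name CsvUriResolver and the table/row/cell element names are all assumptions for illustration; this is not how the XML Converters are implemented.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical resolver: exposes "csv:<path>" URIs as a virtual XML document. */
public class CsvUriResolver implements URIResolver {

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        if (href == null || !href.startsWith("csv:")) {
            return null; // not ours: let the processor resolve the URI itself
        }
        try {
            List<String> lines = Files.readAllLines(Paths.get(href.substring("csv:".length())));
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element table = doc.createElement("table");
            doc.appendChild(table);
            for (String line : lines) {
                Element row = doc.createElement("row");
                table.appendChild(row);
                // Naive split: no support for quoted fields or embedded commas
                for (String field : line.split(",")) {
                    Element cell = doc.createElement("cell");
                    cell.setTextContent(field.trim());
                    row.appendChild(cell);
                }
            }
            return new DOMSource(doc, href);
        } catch (IOException | ParserConfigurationException e) {
            throw new TransformerException("Cannot resolve " + href, e);
        }
    }

    /** Serializes a Source so the virtual document can be inspected. */
    static String serialize(Source source) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of an existing CSV file
        System.out.println(serialize(new CsvUriResolver().resolve("csv:" + args[0], null)));
    }
}
```

Returning null for URIs the resolver does not recognize leaves all other documents to the processor's default resolution, so a resolver like this can be registered without affecting ordinary fn:doc() calls.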

Reading Excel 2003 files

We're going to use the concept of URIResolvers to query our Excel 2003 spreadsheets. But of course, we need a Java implementation to read these XLS files. Apache POI is such a Java API for accessing Microsoft file formats, including Excel 2003 documents.

The advantages of Apache POI for Excel are:

  • Cross platform, as there are no dependencies on native Windows DLLs
  • The API is powerful enough to translate an XLS into XML
  • Formula support. Most of the time you want to query the 'data': for a cell containing a formula (for example SUM(A1:A5)), you're not interested in the formula itself but in its result.

The ExcelURIResolver

So we have written a URIResolver to read Excel 2003 files. The xqexcel.jar file is available here, and needs to be added to your CLASSPATH. You also need two Apache POI 3.0.1 jar files, poi-3.0.1-FINAL-20070705.jar and poi-scratchpad-3.0.1-FINAL-20070705.jar. The Apache POI distribution can be downloaded here.

Enabling the ExcelURIResolver in DataDirect XQuery is trivial. If you are using the XQJ API, you can simply register the ExcelURIResolver through your DDXQDataSource.

...
DDXQDataSource ddds = new DDXQDataSource();
ddds.setDocumentUriResolver(
"com.ddtek.xquery.excel.ExcelURIResolver");
...

Using the DataDirect XQuery command-line utility, all you need to do is add the -r option, specifying the class name of the Excel URIResolver, com.ddtek.xquery.excel.ExcelURIResolver.

And you're all set to query Excel 2003 documents through the fn:doc() function. Use the excel: URI scheme, specifying the file name of the .xls file:

fn:doc('excel:C:/my office documents/sales2007.xls')

The virtual Excel XML document

An Excel 2003 document is called a workbook and can contain several sheets; each sheet is a grid of cells. Our ExcelURIResolver makes the following information available through the virtual XML document:

  • All sheets, each with the name of the sheet.
  • Within a sheet, all used rows. For each row, the row number as shown in Excel is available.
  • Within a row, each cell being used. Note that rows can have different numbers of cells, so it's not like a relational table where each row has a fixed number of columns. For each of the cells, the name of the column is available, consistent with the scheme used by Excel.
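
The Excel naming scheme for cells is easy to reproduce. The sketch below shows one way to derive a name such as B5 from a 1-based column number and a row number; the class and method names are hypothetical, and the actual ExcelURIResolver may well compute the names differently.

```java
public class ExcelCellNames {

    /** Converts a 1-based column number to Excel's letters: 1 -> A, 26 -> Z, 27 -> AA. */
    static String columnLetters(int column) {
        if (column < 1) {
            throw new IllegalArgumentException("column must be >= 1: " + column);
        }
        StringBuilder letters = new StringBuilder();
        // Bijective base 26: there is no zero digit, hence the (n - 1)
        for (int n = column; n > 0; n = (n - 1) / 26) {
            letters.insert(0, (char) ('A' + (n - 1) % 26));
        }
        return letters.toString();
    }

    /** Builds a cell name from a column number and a row number. */
    static String cellName(int column, int line) {
        return columnLetters(column) + line;
    }

    public static void main(String[] args) {
        System.out.println(cellName(2, 5));   // B5
        System.out.println(cellName(27, 1));  // AA1
    }
}
```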

As an example, consider a sample Excel file, ciscoexpo.xls, from Microsoft's web site. The data lives in a single sheet called Sheet1.
Columns A and B (Year and Sales) contain plain data (numbers), and columns C and D (Prediction and Ratio) contain formulas. Cell C5, for example, is =58.552664*EXP(0.569367*A5) and D5 is =C5/C4.

When we query the complete document,

fn:doc('excel:C:/my office documents/ciscoexpo.xls')

we get the following virtual XML document:

<workbook name="excel:C:/my office documents/ciscoexpo.xls">
 <sheet name="Sheet1">
  <row line="1">
   <cell column="2" name="B1">Year 1=1990</cell>
  </row>
  <row line="3">
   <cell column="1" name="A3">Year</cell>
   <cell column="2" name="B3">Sales</cell>
   <cell column="3" name="C3">Prediction</cell>
   <cell column="4" name="D3">Ratio</cell>
  </row>
  <row line="4">
   <cell column="1" name="A4">1</cell>
   <cell column="2" name="B4">70</cell>
   <cell column="3" name="C4">103.4712285029616</cell>
  </row>
  <row line="5">
   <cell column="1" name="A5">2</cell>
   <cell column="2" name="B5">183</cell>
   <cell column="3" name="C5">182.84898408571283</cell>
   <cell column="4" name="D5">1.767148092578018</cell>
  </row>
  <row line="6">
   <cell column="1" name="A6">3</cell>
   <cell column="2" name="B6">340</cell>
   <cell column="3" name="C6">323.1212334568959</cell>
   <cell column="4" name="D6">1.7671480925780185</cell>
  </row>
  ...
  <row line="13">
   <cell column="1" name="A13">10</cell>
   <cell column="2" name="B13">12154</cell>
   <cell column="3" name="C13">17389.060639019517</cell>
   <cell column="4" name="D13">1.767148092578018</cell>
  </row>
  <row line="15">
   <cell column="1" name="A15">16</cell>
   <cell column="3" name="C15">529558.3247555149</cell>
  </row>
 </sheet>
 <sheet name="Sheet2"/>
 <sheet name="Sheet3"/>
</workbook>

Accessing cells B5 to B7 in the XQuery world:

let $xls := fn:doc('excel:C:/my office documents/ciscoexpo.xls')
let $sheet := $xls/workbook/sheet[@name="Sheet1"]
return
$sheet/row/cell[@name=("B5","B6","B7")]
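
For comparison, roughly the same lookup can be done from plain Java with the JAXP XPath API. This is only a sketch: the inline WORKBOOK string is a hand-trimmed fragment of the virtual document shown earlier (column B only), and the class and method names are made up for illustration. XPath 1.0 has no sequence comparison like @name=("B5","B6","B7"), so the filtering on cell names happens in Java.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CellLookupSketch {

    // Hand-trimmed fragment of the virtual workbook document (column B only)
    static final String WORKBOOK =
        "<workbook><sheet name='Sheet1'>"
      + "<row line='4'><cell column='2' name='B4'>70</cell></row>"
      + "<row line='5'><cell column='2' name='B5'>183</cell></row>"
      + "<row line='6'><cell column='2' name='B6'>340</cell></row>"
      + "</sheet></workbook>";

    /** Returns the text of the Sheet1 cells with one of the given names, in document order. */
    static List<String> cellValues(String xml, String... names) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList cells = (NodeList) xpath.evaluate(
            "/workbook/sheet[@name='Sheet1']/row/cell",
            new InputSource(new StringReader(xml)),
            XPathConstants.NODESET);
        List<String> wanted = Arrays.asList(names);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            Element cell = (Element) cells.item(i);
            if (wanted.contains(cell.getAttribute("name"))) {
                values.add(cell.getTextContent());
            }
        }
        return values;
    }

    public static void main(String[] args) throws XPathExpressionException {
        // B7 is not part of the trimmed fragment, so only B5 and B6 are found
        System.out.println(cellValues(WORKBOOK, "B5", "B6", "B7")); // [183, 340]
    }
}
```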

Conclusion

We have shown how to open data locked up for years in your Excel spreadsheets. We can now query this virtual XML document like we can with any other XML document, opening a wide range of use cases.

  • Transform Excel 2003 documents into any XML standard format
  • Join your .xls data with your relational database to generate complex XML documents
  • Create EDI messages using the XML Converters with data stored in Excel 2003 spreadsheets
  • Extract information out of Excel 2003 documents and upload it into your database
  • Publish data out of your Excel spreadsheets in PDF format using XSL-FO
  • etc

Beyond Excel, the concept of the URIResolver is powerful: it basically allows you to query any proprietary data format through XQuery.

Friday, October 5, 2007

XQuery questions you've always wanted to ask (but never dared to)

When using a programming language, sooner or later we all end up trying to solve similar problems. When I enjoyed writing applications in Prolog or C++ (yes, many years ago; and yes, I said enjoy), I wasn't lucky enough to be able to search the Internet for answers, and I had to find solutions to problems that I was sure thousands of other developers had already faced (and solved!).

But Internet or no, developers today are still confronted with questions and problems, especially when dealing with relatively new languages; and this is true of XQuery, of course — How do I return a sequence of elements? How do I do grouping? How do I use variables in expressions? Why does using the default namespace make my query fail? And many more.

Since we don't want you to suffer the same way I did when I was younger, we thought it would be a good idea to share with you typical questions (and answers) that we have experienced in the past few years of work on XQuery. The result is a collection of "tips and tricks" that already covers dozens of topics, and we'll augment the collection over time.

If you have recently hit an XQuery problem about which you have a question, or if you have recently solved a problem that you think might be encountered by other XQuery users, let us know! Who knows: maybe the next addition to our tips and tricks pages to help other XQuery developers will be yours!



Wednesday, October 3, 2007

XQJ Part IX - Creating XDM instances

In the previous posts of the XQJ series, we learned how to handle XDM instances as the result of query execution: iterating through sequences and getting access to the items in the sequence. What if we want to create an XDM instance without executing a query? Can we?

XQJ offers functionality to create both XQSequence and XQItem objects, not as the result of a query execution, but as standalone XDM instances. This functionality is offered through the XQDataFactory interface. An XQDataFactory creates the following types of objects:

  • XQItem
  • XQSequence
  • XQItemType
  • XQSequenceType

Every XQConnection must implement the XQDataFactory interface. In XQJ 1.0, connections are the only concrete XQDataFactory implementations; future versions might introduce other mechanisms to get access to an XQDataFactory.

Creating types

In the Typing post in this series, we have introduced the XQItemType and XQSequenceType interfaces. We have also learned how these objects are used to describe the static type of a query result and external variables. How do we create such type objects in our application?

Remember that XQJ defines a dozen XQITEMKIND_XXX constants. For each of those there is a matching createXXXType method:

  • createAtomicType
  • createAttributeType
  • createCommentType
  • createDocumentElementType
  • createDocumentType
  • createElementType
  • createItemType
  • createNodeType
  • createProcessingInstructionType
  • createSchemaAttributeType
  • createSchemaElementType
  • createTextType

Let's discuss some of the most commonly used methods in the above list.

The method createAtomicType() creates an XQItemType object representing an XQuery atomic type. It accepts a single argument, an integer which is one of the predefined XQBASETYPE constants.
The next example creates three XQItemType instances representing xs:integer, xs:string and xs:decimal:

...
XQItemType xsinteger = xqc.createAtomicType(
XQItemType.XQBASETYPE_INTEGER);
XQItemType xsstring = xqc.createAtomicType(
XQItemType.XQBASETYPE_STRING);
XQItemType xsdecimal = xqc.createAtomicType(
XQItemType.XQBASETYPE_DECIMAL);
...

Remember that every XQConnection is an XQDataFactory; in the example we've used our XQConnection, xqc, to create these XQItemType instances. However, the XQItemType objects are completely independent of the connection.

While the above example shows how to create XQItemType objects representing built-in atomic XML Schema types, there is a second flavor of createAtomicType() for user-defined atomic types. Assume a user-defined atomic type hatsize, derived from xs:integer, in the http://www.hatsizes.com schema:

...
XQItemType hatsize;
hatsize = xqc.createAtomicType(
XQItemType.XQBASETYPE_INTEGER,
new QName("http://www.hatsizes.com", "hatsize"),
new URI("http://www.hatsizes.com"));
...

Besides atomic types, element types are also frequently used. In the next example we create an XQItemType representing element(person):

...
XQItemType type;
type = xqc.createElementType(
new QName("person"),
XQItemType.XQBASETYPE_ANYTYPE);
...

The first argument to createElementType() is a QName. Whereas the previous example creates a person element type in no namespace, the next example creates an element type person in the namespace http://www.example.com. The second argument can be any of the predefined types; besides xs:anyType, xs:untyped is also frequently used:

...
XQItemType type;
type = xqc.createElementType(
new QName("http://www.example.com", "person"),
XQItemType.XQBASETYPE_UNTYPED);
...

The first argument can also be null, which is treated as the wildcard. The following code snippet shows the creation of element(*, xs:untyped):

...
XQItemType type;
type = xqc.createElementType(
null,
XQItemType.XQBASETYPE_UNTYPED);
...

What about document-node() types? In the next example we create two XQItemType instances, a first representing any document and a second representing a well-formed untyped document,

...
XQItemType type1;
XQItemType type2;
type1 = xqc.createDocumentType();
type2 = xqc.createDocumentElementType(
xqc.createElementType(
null,
XQItemType.XQBASETYPE_UNTYPED));
...

In addition to XQItemTypes, XQSequenceType objects can also be created.
As explained before in the Typing post, an XQSequenceType consists of

  • an XQItemType
  • the cardinality constraining the number of items, one of the OCC_XXX constants defined on XQSequenceType.

As such, creating an XQSequenceType is simple. The next example shows how to create an xs:string* sequence type:

...
XQItemType itemType;
XQSequenceType sequenceType;
itemType = xqc.createAtomicType(
XQItemType.XQBASETYPE_STRING);
sequenceType = xqc.createSequenceType(
itemType,
XQSequenceType.OCC_ZERO_OR_MORE);
...

Using types

So far so good, but why would one need to create all these types?

Assume you iterate over the items of an XQSequence and want to retrieve each node through the DOM, but get atomic values as Strings. This can be accomplished using the instanceOf() method, passing in an XQItemType object:

...
XQItemType nodeType = xqc.createNodeType();
XQSequence xqs = ...
...

while (xqs.next()) {
if (xqs.instanceOf(nodeType)) {
org.w3c.dom.Node node = xqs.getNode();
...
} else {
String s = xqs.getAtomicValue();
...
}
}
...

Some XQuery implementations support the Static Typing Feature as defined in XQuery. This requires implementations to detect and report type errors during the static analysis phase.
For expressions depending on the context item, the application must specify the static type of the context item. Why? In order to perform static typing, the implementation has to know the static type of the context item. If the application fails to provide it, an error is reported during the static analysis phase.

As the static type of the context item is a static context component, the XQJ XQStaticContext interface allows the application to manipulate it.
The next example shows how to set the static type of the initial context item to document-node(element(*, xs:untyped)):

...
XQItemType documentType;
documentType = xqc.createDocumentElementType(
xqc.createElementType(
null,
XQItemType.XQBASETYPE_UNTYPED));
XQStaticContext xqsc = xqc.getStaticContext();
xqsc.setContextItemStaticType(documentType);
...
XQPreparedExpression xqp;
xqp = xqc.prepareExpression("//address",
xqsc);
...

As a last use case of XQItemType, remember some of the examples from the previous post in this series, Binding external variables.
Most of the bindXXX() methods defined on XQDynamicContext have a third parameter, which allows you to override the default Java to XQuery data type mapping.

In the next example we bind a Java int to the external variable, but rather than using the default mapping to xs:int, we map it to xs:short:

...
XQItemType xsshort;
xsshort = xqc.createAtomicType(XQItemType.XQBASETYPE_SHORT);
XQPreparedExpression xqp;
xqp = xqc.prepareExpression(
"declare variable $v as xs:short external; " +
"$v + 1");
xqp.bindInt(new QName("v"), 22, xsshort);
...

Creating XDM instances

Having discussed the ability to create XQItemType and XQSequenceType instances, let's look at how XQDataFactory also offers the ability to create XQItem and XQSequence instances.

There is basically nothing new under the sun. If you understand the way binding to an XQDynamicContext works, as discussed in our previous post, you almost know how XQItem instances are created. For every bindXXX() method defined on XQDynamicContext, there is a corresponding createItemFromXXX() method.

Let's show a simple example, binding a java.math.BigDecimal to an external variable $d,

...
XQExpression xqe = ...
xqe.bindObject(new QName("d"),new BigDecimal("174"), null);


And creating an XQItem of type xs:decimal from the same java.math.BigDecimal,

XQItem xqi = xqc.createItemFromObject(new BigDecimal("174"), null);

Note that the XQItem objects created through XQDataFactory are independent of any connection.

Suppose you execute a query returning a single item, and subsequently close the connection but still require access to the XQItem. Closing the XQConnection invalidates the XQItem object resulting from the query execution. For that reason, XQDataFactory has an XQItem copy method: createItem() accepts a single XQItem argument and returns a (deep) copy of the specified item.
The following example shows how to keep a query result available after closing the XQSequence or XQConnection:

XQConnection xqc = ...
XQExpression xqe = xqc.createExpression();
XQSequence xqs;
xqs = xqe.executeQuery("(doc('book.xml')//paragraph)[1]");
xqs.next();
XQItem xqi = xqc.createItem(xqs.getItem());
xqc.close();
// although the connection is closed, xqi is still valid.

Suppose you have an XML document which needs to be queried multiple times, but you don’t want to pay the XML parsing overhead each time it is queried. In the following example, two queries are executed and, as such, book.xml will be parsed twice:

...
XQExpression xqe = xqc.createExpression();
xqe.executeQuery("fn:doc('book.xml')//paragraph[contains(.,'XQuery')]");
xqe.executeQuery("fn:doc('book.xml')//paragraph[contains(.,'SQL')]");
...

Or suppose you receive a transient XML stream, for example in a servlet environment, and need to query the stream multiple times. Then one way or the other the data will need to be buffered in order to query it more than once.

How can we a) make sure an XML document is parsed only once, and b) make a transient XML stream queryable multiple times?

Suppose we have two XQPreparedExpression objects, xqp1 and xqp2. The next example first creates an XQItem representing the XML document, so the document is parsed only once. It then binds that item to the two different XQPreparedExpression objects:

...
InputStream input = ...
XQItem doc = xqc.createItemFromDocument(input, null, null); // no base URI, default type
...
xqp1.bindItem(new QName("doc"), doc);
...
xqp2.bindItem(new QName("doc"), doc);
...

One of the disadvantages of this approach, especially with large documents, is its scalability and memory consumption. For example, in the case of DataDirect XQuery, the streaming capabilities will not be of much use as the complete XML document is instantiated in memory. We'll come back to the topic of processing large input documents in a future post of the XQJ series.
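
The parse-once idea is not specific to XQJ. As a rough analogy that runs on a bare JDK, the following sketch buffers a transient stream into a DOM tree and then evaluates two XPath expressions against it; the class and method names are hypothetical. It shares the trade-off just described: the whole document sits in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ParseOnceSketch {

    /** Reads the (possibly transient) stream exactly once and keeps the tree in memory. */
    static Document buffer(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }

    /** Evaluates a numeric XPath expression against the already parsed tree. */
    static int count(Document doc, String expression) throws XPathExpressionException {
        Double n = (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><paragraph>XQuery and SQL</paragraph>"
                   + "<paragraph>SQL only</paragraph></book>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));

        Document doc = buffer(in); // parsed once, queried twice
        System.out.println(count(doc, "count(//paragraph[contains(.,'XQuery')])")); // 1
        System.out.println(count(doc, "count(//paragraph[contains(.,'SQL')])"));    // 2
    }
}
```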

Finally, XQDataFactory also allows you to create XQSequence objects.
There is a createSequence() copy operation, which takes a single XQSequence argument and returns a copy of it. Similar to the XQItem example above, it allows query results to outlive an XQConnection.

A second flavor of createSequence() accepts a java.util.Iterator, returning a sequence of items based on the objects returned by the Iterator. The objects are converted into XDM instances using the default object mapping defined in XQJ. For example, the following code snippet results in a sequence of xs:decimal instances,

...
// assume an ArrayList of BigDecimal objects
ArrayList list = ...
XQSequence s = xqc.createSequence(list.iterator());
...

Pipelines are the next topic we will discuss. How can one create a pipeline of XQueries, or pipeline an XQuery with an XSLT transformation? Watch out for the next post.
