XML Connections

Friday, September 28, 2007

Thinking about insurance? XQuery can help!


ACORD is a well-known organization focused on standards in the insurance industry. In the last several years ACORD has developed an impressive number of XML-based standards, and more and more organizations encounter these standards every year. It should be no surprise that a language like XQuery can be extremely helpful in creating, consuming, and processing ACORD messages; and XQuery processors able to access heterogeneous data sources, like DataDirect XQuery, provide an even more powerful way to process ACORD-based requests, to validate requests against a variety of sources, and to perform back-end updates based on changes communicated through ACORD messages.

To illustrate the value brought to the table by DataDirect XQuery, we went through the exercise of creating XQueries that deal with specific ACORD requests in hypothetical (but reasonably realistic) scenarios. We recently published the results on our web site, at http://www.xquery.com/ACORD/.

One of the major benefits of using DataDirect XQuery is that users are able to access and process multiple heterogeneous data sources in the context of a single language (XQuery) and data model. To prove that, we spent some time trying to imagine how tasks similar to the ones described in the XQuery examples would be solved in an environment where the developer is limited to the use of Java+SQL. We'll leave it to you to judge the difference in complexity between the XQuery and Java+SQL approaches.

Of course, insurance is just one of the many possible "industry verticals" that we could choose; time permitting, we will tackle other industries (health care and airlines are two that come to mind) focusing on different standards. If you have any specific suggestions, we are interested in hearing about them! Write and let us know.




Wednesday, September 26, 2007

Join us for some SOA, XQuery, data services and other discussions!



Interested in hearing about SOA, data services and data access layers? Want to hear about how SOA, XQuery, SDO and Data services all fit together?

Why not join us for some interactive sessions at our Architect Tutorial or Design Preview events?

I will personally focus on XQuery and Data services during the Design Previews here in Boston (10/9); possibly also in Palo Alto (10/17). It will be great meeting you there in person.



Sunday, September 23, 2007

XQJ Part VIII - Binding external variables

Last month, in XQJ Part III - Executing queries, we showed through some simple examples how to bind a value to an external variable declared in your query. In this post of the XQJ series, we will go into more detail on this subject.

As we know, XQuery operates on the abstract, logical structure of XML, known as the XQuery Data Model (XDM). As such, by definition in XQuery, the value bound to an external variable is an XDM instance. Given a Java object in your Java application, how is it converted into such an XDM instance? XQJ defines this mapping and glues it all together.

A first simple example,

...
XQPreparedExpression xqp;
XQSequence xqs;
xqp = xqc.prepareExpression(
"declare variable $id as xs:integer external; " +
"doc('orders.xml')//order[id=$id]");
xqp.bindObject(new QName("id"),new Integer(174), null);
xqs = xqp.executeQuery();
...

The bindObject() method is defined in the XQDynamicContext interface, which provides a number of methods to bind values to external variables. As XQDynamicContext is the base interface for both XQExpression and XQPreparedExpression, both kinds of expression support binding values to external variables.

The first argument to the bindObject() method is a QName, which identifies the external variable in your XQuery. The second argument is the Java object to be bound; XQJ defines a mapping of Java objects to XDM instances. Providing the full list is out of scope here, so I refer you to the XQJ spec if you’re interested in all the details, but here are a couple of examples:

Java type               XQuery type
java.lang.Integer       xs:int
java.math.BigInteger    xs:integer
java.math.BigDecimal    xs:decimal
java.lang.String        xs:untypedAtomic
org.w3c.dom.Document    untyped document node
org.w3c.dom.Element     untyped element node
...                     ...
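
One way to picture the default mapping is as a plain lookup from Java class to XQuery type. The following is only a sketch in plain Java, not XQJ code: DefaultMapping and defaultTypeFor() are hypothetical names, and the table contains just the rows listed above.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;

final class DefaultMapping {

    // Hypothetical lookup table holding only the rows listed in this post.
    private static final Map<Class<?>, String> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put(Integer.class, "xs:int");
        DEFAULTS.put(BigInteger.class, "xs:integer");
        DEFAULTS.put(BigDecimal.class, "xs:decimal");
        DEFAULTS.put(String.class, "xs:untypedAtomic");
        DEFAULTS.put(org.w3c.dom.Document.class, "untyped document node");
        DEFAULTS.put(org.w3c.dom.Element.class, "untyped element node");
    }

    // Returns the XQuery type name a value would get when no type is specified.
    static String defaultTypeFor(Object value) {
        for (Map.Entry<Class<?>, String> entry : DEFAULTS.entrySet()) {
            // isInstance() is used because the DOM types are interfaces
            if (entry.getKey().isInstance(value)) {
                return entry.getValue();
            }
        }
        throw new IllegalArgumentException(
            "No default mapping in this sketch for " + value.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(defaultTypeFor(Integer.valueOf(174)));
        System.out.println(defaultTypeFor(new BigInteger("174")));
    }
}
```

Passing an explicit type as third argument to bindObject() corresponds to bypassing such a lookup.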

The third argument, for which null is specified in the example above, allows you to override the default mapping. This is shown next [1].

... 
XQItemType xsinteger;
xsinteger = xqc.createAtomicType(XQItemType.XQBASETYPE_INTEGER);

XQPreparedExpression xqp;
XQSequence xqs;

xqp = xqc.prepareExpression(
"declare variable $v1 external; " +
"declare variable $v2 external; " +
"$v1 instance of xs:integer, "+
"$v1 instance of xs:int, "+
"$v2 instance of xs:integer, "+
"$v2 instance of xs:int");
xqp.bindObject(new QName("v1"),new Integer(174), null);
xqp.bindObject(new QName("v2"),new Integer(174), xsinteger);
...

This example yields a sequence of 4 xs:boolean instances,

true, true, true, false

A Java Integer is by default mapped to xs:int. As xs:int is derived by restriction from xs:integer, the first two 'instance of' expressions evaluate to true. The second external variable is bound to an xs:integer instance, because the application explicitly asks for such an XDM instance. As a result the last 'instance of' evaluates to false, as xs:integer is not derived from xs:int.
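
To make the derivation argument concrete, here is a small sketch in plain Java, independent of XQJ. TypeDerivation and its instanceOf() helper are made-up names, and the table covers only the slice of the built-in type hierarchy needed here: xs:int is derived from xs:long, which is derived from xs:integer, which is derived from xs:decimal.

```java
import java.util.HashMap;
import java.util.Map;

final class TypeDerivation {

    // Each type mapped to the type it is derived from (small excerpt only).
    private static final Map<String, String> BASE = new HashMap<>();
    static {
        BASE.put("xs:int", "xs:long");
        BASE.put("xs:long", "xs:integer");
        BASE.put("xs:integer", "xs:decimal");
        BASE.put("xs:decimal", "xs:anyAtomicType");
    }

    // True if 'actual' is 'expected' itself or is derived from it.
    static boolean instanceOf(String actual, String expected) {
        for (String type = actual; type != null; type = BASE.get(type)) {
            if (type.equals(expected)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // $v1 is an xs:int, $v2 is an xs:integer
        System.out.println(instanceOf("xs:int", "xs:integer"));     // true
        System.out.println(instanceOf("xs:int", "xs:int"));         // true
        System.out.println(instanceOf("xs:integer", "xs:integer")); // true
        System.out.println(instanceOf("xs:integer", "xs:int"));     // false
    }
}
```

Walking up the chain from the actual type is all it takes for atomic types; a real XQuery processor of course also handles node types and user-defined schema types.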

Note that various error conditions can occur during the binding process,

  • The conversion from Java to XDM instance can fail.
    For example, a java.lang.Integer object with value 10000 is converted into an xs:byte. As 10000 is outside the xs:byte value space, an error will be reported.
  • Once converted into an XDM instance, the binding can still fail in case the external variable declaration includes a declared type. In such scenario the XDM instance must match the declared type according to the rules of SequenceType matching.
    For example, a java.lang.Integer is bound and converted into an xs:integer instance, but the external variable is declared as xs:string.
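
The first error condition boils down to a range check. The sketch below mimics it in plain Java; toXsByte() is a hypothetical helper and not part of XQJ, where the error would surface as an exception from the bind call.

```java
final class ByteRangeCheck {

    // Hypothetical helper: the xs:byte value space is -128..127, like Java's byte.
    static byte toXsByte(int value) {
        if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
            throw new IllegalArgumentException(
                value + " is outside the xs:byte value space");
        }
        return (byte) value;
    }

    public static void main(String[] args) {
        System.out.println(toXsByte(100)); // prints 100
        try {
            toXsByte(10000);
        } catch (IllegalArgumentException e) {
            // prints: 10000 is outside the xs:byte value space
            System.out.println(e.getMessage());
        }
    }
}
```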

We have introduced the bindObject() method through some examples, but XQDynamicContext has many more bind methods.

bindAtomicValue() accepts a java.lang.String and converts it to the specified type according to the casting rules from xs:string; basically, the specified string must be in the lexical space of the specified atomic type. In the following example the Java String "123" is converted into xs:string, xs:integer and xs:double instances and bound to the external variables $v1, $v2 and $v3.

...
xqp.bindAtomicValue(new QName("v1"), "123",
xqc.createAtomicType(XQItemType.XQBASETYPE_STRING));
xqp.bindAtomicValue(new QName("v2"), "123",
xqc.createAtomicType(XQItemType.XQBASETYPE_INTEGER));
xqp.bindAtomicValue(new QName("v3"), "123",
xqc.createAtomicType(XQItemType.XQBASETYPE_DOUBLE));
...

In contrast, the following two bindAtomicValue() invocations will fail. The first because "abc" is not in the lexical space of xs:integer. The second because no type has been specified as third parameter: unlike bindObject(), bindAtomicValue() has no default mapping, and an XQItemType must be specified as third argument.

...
xqp.bindAtomicValue(new QName("e"), "abc",
xqc.createAtomicType(XQItemType.XQBASETYPE_INTEGER));
xqp.bindAtomicValue(new QName("e"), "123", null);
...
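
What "being in the lexical space" means can be sketched in plain Java for xs:integer, whose lexical form is an optional sign followed by decimal digits. LexicalCheck and castToXsInteger() are hypothetical names; this only approximates what an XQJ implementation does internally, and trim() is used here as a stand-in for XML whitespace handling.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

final class LexicalCheck {

    // Lexical form of xs:integer: optional sign, then one or more digits.
    private static final Pattern XS_INTEGER = Pattern.compile("[+-]?[0-9]+");

    // Hypothetical helper: casts a string to xs:integer or rejects it.
    static BigInteger castToXsInteger(String lexical) {
        String trimmed = lexical.trim();
        if (!XS_INTEGER.matcher(trimmed).matches()) {
            throw new IllegalArgumentException(
                "\"" + lexical + "\" is not in the lexical space of xs:integer");
        }
        return new BigInteger(trimmed);
    }

    public static void main(String[] args) {
        System.out.println(castToXsInteger("123")); // prints 123
        try {
            castToXsInteger("abc");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```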

Furthermore, XQDynamicContext provides bindXXX() methods for each of the Java primitive types,

  • boolean
  • byte
  • double
  • float
  • int
  • long
  • short

For example, the following binds the xs:integer instance 123 to the external variable $v. The default mapping for int is xs:int, so we specify the type explicitly as third parameter.

xqp.bindInt(new QName("v"), 123, 
xqc.createAtomicType(XQItemType.XQBASETYPE_INTEGER));

Binding a DOM node is also possible through bindNode(). This is basically equivalent to bindObject(), with the restriction that the argument must be a DOM node, so the XDM instance is always a node, never an atomic value. In addition to DOM, the SAX and StAX APIs are also supported through XQDynamicContext.

Let’s read an XML document foo.xml through DOM, SAX and StAX and each time bind it to an external variable $e.

The DOM version,

...
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);

DocumentBuilder parser = factory.newDocumentBuilder();
Document domDocument = parser.parse("foo.xml");
xqp.bindNode(new QName("e"), domDocument,null);
...

The StAX version,

...
XMLInputFactory factory = XMLInputFactory.newInstance();
FileInputStream doc = new FileInputStream("foo.xml");
XMLStreamReader reader = factory.createXMLStreamReader(doc);
xqp.bindDocument(new QName("e"), reader, null);
...

And the SAX version, for which we need to implement an XML Filter.

...
XMLFilter xmlReader = new XMLFilterImpl() {
    public void parse(String systemId) throws IOException, SAXException {
        super.parse("foo.xml");
    }
};
// the parent XML Reader is a SAX parser, this one will do the actual
// work of parsing the XML document
xmlReader.setParent(org.xml.sax.helpers.XMLReaderFactory.createXMLReader());
xqp.bindDocument(new QName("e"), xmlReader);
...
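
If the filter/parent relationship is new to you, the following stand-alone sketch may help. It uses only the JDK (no XQJ), parses an in-memory document instead of foo.xml, and the filter simply counts the elements flowing through it. SaxFilterSketch and CountingFilter are names invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

final class SaxFilterSketch {

    // A filter sits between the real parser (its parent) and the consumer.
    static final class CountingFilter extends XMLFilterImpl {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            elements++;
            // forward the event to whoever consumes this filter's output
            super.startElement(uri, localName, qName, atts);
        }
    }

    static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();

            CountingFilter filter = new CountingFilter();
            filter.setParent(parser); // the parent does the actual parsing
            filter.parse(new InputSource(new StringReader(xml)));
            return filter.elements;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><c/></a>")); // prints 3
    }
}
```

Whoever calls parse() on the filter never sees the underlying parser, which is what allows the filter in the XQJ example to decide by itself which document gets parsed.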

But of course, this is something you only want to use in specific scenarios. The simple use case of binding an XML file can easily be accomplished in a single line. The XQJ implementation will make sure the XML file is parsed and queried.

...
xqp.bindDocument(new QName("e"), new FileInputStream("foo.xml"));
...

An XQItem or a complete XQSequence can also be bound to an external variable. We’ll discuss this soon in this XQJ series, in a post on pipelining. Talking about pipelining, XQJ also supports the JAXP Source and Result interfaces; these too will be discussed.

[1] In this series we have not yet introduced the createAtomicType() method defined on XQDataFactory. This will be handled in the next post. Anyway, for now it’s sufficient to know that it returns an XQItemType object representing the specified atomic type.


Tuesday, September 18, 2007

XQuery your office documents

With the introduction of Office Open XML in Microsoft Office 2007 and the OpenDocument Format used by OpenOffice applications, office documents such as spreadsheets and word processing documents become consumable with XQuery.

Technically these formats are basically a bunch of XML files, packaged in a ZIP file. Opening an Office Open XML .docx file with WinZip, for example, shows the individual XML files.

Although the format is different, the OpenDocument Format is conceptually similar: a bunch of XML files packaged in a ZIP.

How can you query the XML files inside the .docx file? It turns out to be fairly simple with DataDirect XQuery, using the standard fn:doc XQuery function. fn:doc has a single argument, the URL identifying the XML document to query. Besides standard URL schemes like file: and http:, your Java virtual machine also supports the jar: URL scheme.

A jar archive is considered "a zip archive with logical extensions". The "logical extensions" being special files like manifest.mf or the META-INF directory located in the archives. But physically these are just zip archives and as such we can use the jar: URL scheme to query Office Open XML documents. For example, to query the main document from our example above, use the following fn:doc call,

doc('jar:file:///C:/example.docx!/word/document.xml')
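
You can convince yourself that the jar: scheme works on any ZIP archive with a few lines of plain Java. The sketch below has nothing to do with DataDirect XQuery: it writes a one-entry ZIP to a temporary file (standing in for a real .docx), opens the entry through a jar: URL and parses it with DOM. JarUrlSketch and rootElementViaJarUrl() are names made up for this illustration.

```java
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class JarUrlSketch {

    // Writes a one-entry ZIP, then reads that entry back through a jar: URL.
    static String rootElementViaJarUrl(String entryName, String xml) {
        try {
            Path zip = Files.createTempFile("example", ".docx");
            zip.toFile().deleteOnExit();
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                out.putNextEntry(new ZipEntry(entryName));
                out.write(xml.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }

            // Same shape as jar:file:///C:/example.docx!/word/document.xml
            URL url = URI.create("jar:" + zip.toUri() + "!/" + entryName).toURL();
            URLConnection connection = url.openConnection();
            connection.setUseCaches(false); // do not keep the archive open
            try (InputStream in = connection.getInputStream()) {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().parse(in);
                return doc.getDocumentElement().getNodeName();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints: document
        System.out.println(rootElementViaJarUrl("word/document.xml", "<document/>"));
    }
}
```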

Let’s take a concrete example, and extract all the comments from John, ordered by date. The Office Open XML format stores the comments in the comments.xml document (what else would you have expected?).

declare namespace w =
  "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
<all_john_comments>{
  for $comment in
    doc('jar:file:///C:/example.docx!/word/comments.xml')/*/w:comment
  where $comment/@w:author = 'John'
  order by xs:dateTime($comment/@w:date)
  return
    <comment date="{$comment/@w:date}">{$comment//text()}</comment>
}</all_john_comments>
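
For comparison, here is roughly what the same extraction looks like in plain Java with DOM. It is only a sketch under a few assumptions: the comments XML is passed in as a string rather than read from the .docx, the dates are UTC timestamps ending in Z (so that Instant.parse() accepts them), and getElementsByTagNameNS() matches w:comment elements at any depth, whereas the query only looks at children of the root element. JohnComments and commentsBy() are hypothetical names.

```java
import java.io.StringReader;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class JohnComments {

    static final String W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of all comments by the given author, ordered by date.
    static List<String> commentsBy(String author, String commentsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            NodeList nodes = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(commentsXml)))
                    .getElementsByTagNameNS(W, "comment");

            // where $comment/@w:author = 'John'
            List<Element> matches = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element comment = (Element) nodes.item(i);
                if (author.equals(comment.getAttributeNS(W, "author"))) {
                    matches.add(comment);
                }
            }

            // order by xs:dateTime($comment/@w:date)
            matches.sort(Comparator.comparing(
                (Element c) -> Instant.parse(c.getAttributeNS(W, "date"))));

            // return the concatenated text of each comment
            List<String> texts = new ArrayList<>();
            for (Element comment : matches) {
                texts.add(comment.getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:comments xmlns:w='" + W + "'>"
            + "<w:comment w:author='John' w:date='2007-09-18T10:00:00Z'>later</w:comment>"
            + "<w:comment w:author='Mary' w:date='2007-09-17T10:00:00Z'>other</w:comment>"
            + "<w:comment w:author='John' w:date='2007-09-16T10:00:00Z'>earlier</w:comment>"
            + "</w:comments>";
        System.out.println(commentsBy("John", xml)); // prints [earlier, later]
    }
}
```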

Well, I guess you get the idea...

It’s not only about simply querying your office documents. Using the out-of-the-box DataDirect XQuery and XML Converters features, a whole new range of capabilities and use cases becomes available. For example, extract data out of 'document forms' and save it in your database. Or extract and aggregate data out of a spreadsheet to generate EDI messages.

We have shown how to query your OpenDocument Format and Office Open XML documents. In a future post, we’ll show how the "older" office formats can be queried using a Custom URI Resolver.
Imagine you can query your existing Excel files using DataDirect XQuery. Stay tuned!


Monday, September 10, 2007

XQJ Part VII - Typing

In today's post of the XQJ series, we'll have a closer look at how XQJ interacts with the XQuery type system.

XQuery is a strongly typed language whose type system is based on XML Schema. As the type system is an inherent part of XQuery, you'll need some notion of it to be really effective with XQuery. However, it is out of scope for this XQJ tutorial to go into all the details; my recent XQuery book recommendation is probably a good start if you're interested in knowing more about the XQuery type system.

XQuery defines a sequence type, as a type that can be expressed using the SequenceType syntax. It consists of an item type that constrains the type of each item in the sequence, and a cardinality that constrains the number of items in the sequence. Having sequences and items in the XQuery type system, XQJ defines two corresponding interfaces XQSequenceType and XQItemType.

XQSequenceType is a rather simple interface with only 3 methods,

  • getItemType() retrieves the item type of the sequence type
  • getItemOccurrence() retrieves the cardinality that constrains the number of items
  • toString() yields a string representation of the sequence type
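
The split into item type and cardinality is easy to see on the SequenceType syntax itself. The sketch below is plain Java, not XQJ: SequenceTypeSketch and parse() are made-up names, and the only thing it does is peel the occurrence indicator (?, * or +) off a sequence type string. It does not treat empty-sequence() specially.

```java
final class SequenceTypeSketch {

    final String itemType;
    // "" = exactly one, "?" = zero or one, "*" = zero or more, "+" = one or more
    final String occurrence;

    private SequenceTypeSketch(String itemType, String occurrence) {
        this.itemType = itemType;
        this.occurrence = occurrence;
    }

    // Hypothetical helper: splits e.g. "node()+" into "node()" and "+".
    static SequenceTypeSketch parse(String sequenceType) {
        String s = sequenceType.trim();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty sequence type");
        }
        char last = s.charAt(s.length() - 1);
        if (last == '?' || last == '*' || last == '+') {
            return new SequenceTypeSketch(
                s.substring(0, s.length() - 1).trim(), String.valueOf(last));
        }
        return new SequenceTypeSketch(s, "");
    }

    public static void main(String[] args) {
        SequenceTypeSketch t = parse("node()+");
        System.out.println(t.itemType);   // prints node()
        System.out.println(t.occurrence); // prints +
    }
}
```

In XQJ terms, the first half corresponds to what getItemType() returns and the second half to getItemOccurrence().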

XQItemType encapsulates more information,

  • getItemKind() returns whether it is an element, attribute, atomic type, etc
  • getBaseType() specifies the built-in schema type that most closely matches this item type, e.g. xs:anyType, xs:string, etc.
  • getNodeName() yields the name of the node, which is a QName
  • getPIName() yields the name of a processing instruction, which is a String
  • getTypeName() specifies a QName identifying the XML Schema type of the item type. This can be either a built-in XML Schema type or user defined
  • toString() yields a string representation of the item type
  • there are some more attributes defined on XQItemType related to user-defined schema types, but that would take us too far in the context of this introductory series

XQSequenceType and XQItemType objects are used in two different contexts,

  • the representation of the static type of an external variable defined in a query and the query result. In this context, the type is possibly abstract, like item(), node()+ or xs:anyAtomicType?
  • the concrete type of an item, here abstract types are not applicable

Let's have a closer look at XQItemType, which specifies the item kind and base type,

...
XQSequenceType xqtype = ...
XQItemType xqitype = xqtype.getItemType();
int itemKind = xqitype.getItemKind();
int schemaType = xqitype.getBaseType();
...

XQJ defines constants for each of the item kinds representable in XQuery SequenceType syntax:

Sequence Type                  XQJ definition
QName                          XQITEMKIND_ATOMIC
element(...)                   XQITEMKIND_ELEMENT
attribute(...)                 XQITEMKIND_ATTRIBUTE
comment()                      XQITEMKIND_COMMENT
document-node()                XQITEMKIND_DOCUMENT
document-node(element(...))    XQITEMKIND_DOCUMENT_ELEMENT
processing-instruction(...)    XQITEMKIND_PI
text()                         XQITEMKIND_TEXT
item()                         XQITEMKIND_ITEM
node()                         XQITEMKIND_NODE

getBaseType() is used to determine the type more precisely, for example in the case of XQITEMKIND_ATOMIC. When we have an atomic type, is it an xs:string or an xs:integer? XQJ defines constants for all the built-in XML Schema and XQuery types. It's a long list, too long for this post, but here are a few:

XML Schema type      XQJ definition
xs:string            XQBASETYPE_STRING
xs:integer           XQBASETYPE_INTEGER
xs:untypedAtomic     XQBASETYPE_UNTYPEDATOMIC
...                  ...

Iterating over query results, XQJ allows you to request precise type information about each item. Suppose you want to use a different getXXX() method, depending on the item type,

XQSequence xqs = ...
while (xqs.next()) {
    XQItemType xqtype = xqs.getItemType();
    if (xqtype.getItemKind() == XQItemType.XQITEMKIND_ATOMIC) {
        // We have an atomic type
        switch (xqtype.getBaseType()) {
            // xs:string falls through to the xs:untypedAtomic branch
            case XQItemType.XQBASETYPE_STRING:
            case XQItemType.XQBASETYPE_UNTYPEDATOMIC: {
                String s = (String)xqs.getObject();
                ...
                break;
            }
            case XQItemType.XQBASETYPE_INTEGER: {
                long l = xqs.getLong();
                ...
                break;
            }
            ...
        }
    } else {
        // We have a node, retrieve it as a DOM node
        org.w3c.dom.Node node = xqs.getNode();
        ...
    }
}

OK, this can make your code rather complex and long. Sometimes it is needed, but most of the time a number of shortcuts can be taken. As explained in XQJ Part IV - Processing query results, you can use some of the more general purpose methods. Suppose you need a DOM node in case the query returns a node, and the string value for all atomic values. The next simple example shows how to do this,

XQSequence xqs = ...
while (xqs.next()) {
    XQItemType xqtype = xqs.getItemType();
    if (xqtype.getItemKind() == XQItemType.XQITEMKIND_ATOMIC) {
        // We have an atomic type
        String s = xqs.getAtomicValue();
        ...
    } else {
        // We have a node, retrieve it as a DOM node
        org.w3c.dom.Node node = xqs.getNode();
        ...
    }
}

That's it for the dynamic type of items. The next example shows how to retrieve the static type of a query (for JDBC, ODBC and SQL users, this is somewhat similar to "describe information"),

...
XQPreparedExpression xqe = xqc.prepareExpression("1+2");
XQSequenceType xqtype = xqe.getStaticResultType();
System.out.println(xqtype.toString());
...

With DataDirect XQuery this example outputs xs:integer to stdout.

Similarly, you can query the prepared expression to retrieve information about the external variables. As shown in the next example, we first determine the external variables declared in the query, and then retrieve the static type of each of them,

...
XQPreparedExpression xqe = xqc.prepareExpression(
    "declare variable $i as xs:integer external; $i+1");
QName variables[] = xqe.getAllExternalVariables();
for (int i=0; i<variables.length; i++) {
    XQSequenceType xqtype = xqe.getStaticVariableType(variables[i]);
    System.out.println(variables[i] + ": " + xqtype.toString());
}
...

Why would one care about all this? Let's have a quick look at a use case.

The idea of exposing XQueries as web services is not new; remember for example the XQuery at Your Web Service research paper. A fully functional example of such an 'XQuery Web Service' is available on xquery.com, and can be downloaded here. It is basically a servlet that reads XQueries from a specific directory and makes each of the queries available as functions accessible through SOAP.

The servlet needs to determine the external variables in each of the queries in order to generate the WSDL, which contains an XML Schema definition describing the parameters for each operation. Something like the following, assuming an XQuery with two external variables, $employeeName and $hiringDate, declared as xs:string and xs:date.

...
<xs:element name="XXX">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="employeeName" type="xs:string"/>
      <xs:element name="hiringDate" type="xs:date"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
...

All the information required to generate such an XML Schema definition is available in the sequence type of each declared variable, and through XQJ this information becomes immediately accessible. We could write a piece of code translating the relevant item kinds and base types to an XML Schema definition as shown above; it is only a matter of a number of Java switch statements. But is there an easier way?

XQJ defines toString() on XQItemType as implementation dependent. Well, more precisely, it is a requirement to return a human-readable string. In any case, with DataDirect XQuery the string representation is based on the XQuery sequence type syntax, where the QName prefixes are as follows,

  • for QNames representing built-in XML schema types, the xs prefix is always used.
  • for QNames representing element or attribute names the prefixes as defined in the query are used. In case of duplicates, one is chosen in an implementation dependent manner

Going back to our XQuery Web Service use case, the strategy to map the external variable declaration to the WSDL becomes rather simple using toString(),

  • if the XQItemType is an atomic type, use the string representation
  • if the XQItemType is anything else, use xs:anyType
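
The two rules above translate almost literally into code. The sketch below is not the code of the actual XQuery Web Service servlet: it is plain Java with hypothetical names (WsdlParameterSketch, Variable, operationElement()), it takes the item type string and the atomic flag as input instead of asking XQJ for them, and it does no escaping of names. The operation name getEmployee is also made up.

```java
import java.util.Arrays;
import java.util.List;

final class WsdlParameterSketch {

    // Hypothetical holder for what the servlet learns about an external variable.
    static final class Variable {
        final String name;
        final String itemType; // string representation of the item type
        final boolean atomic;  // true if the item kind is atomic

        Variable(String name, String itemType, boolean atomic) {
            this.name = name;
            this.itemType = itemType;
            this.atomic = atomic;
        }
    }

    // Atomic types keep their string representation, anything else is xs:anyType.
    static String schemaTypeFor(Variable v) {
        return v.atomic ? v.itemType : "xs:anyType";
    }

    // Builds the element describing the parameters of one operation.
    static String operationElement(String operation, List<Variable> variables) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:element name=\"").append(operation).append("\">\n");
        sb.append("  <xs:complexType>\n");
        sb.append("    <xs:sequence>\n");
        for (Variable v : variables) {
            sb.append("      <xs:element name=\"").append(v.name)
              .append("\" type=\"").append(schemaTypeFor(v)).append("\"/>\n");
        }
        sb.append("    </xs:sequence>\n");
        sb.append("  </xs:complexType>\n");
        sb.append("</xs:element>");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(operationElement("getEmployee", Arrays.asList(
            new Variable("employeeName", "xs:string", true),
            new Variable("hiringDate", "xs:date", true))));
    }
}
```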

I hope this post gave you a feel for the XQSequenceType and XQItemType interfaces, and how you can take advantage of them in your application. Applications also have the ability to create XQItemType objects. We'll show how this can be done and detail use cases in the next post of the XQJ series.


Tuesday, September 4, 2007

Article about SOA and XQuery


I recently wrote an article, "SOA and the importance of XQuery", published in "The SOA Magazine".

You can read the details in the article, but the point I make is that I believe XQuery will play a major role in satisfying the need for good strategies around data services in SOA.

I would be happy to read your comments on this topic.



Monday, September 3, 2007

XQuery book recommendation

Now and then I'm asked for advice on XQuery books. There are two books I strongly recommend these days.

The first is XQuery by Priscilla Walmsley.

Priscilla is an excellent writer, known for teaching good XML and XQuery classes. She is also the author of the FunctX XQuery function library.

It's good to have some basic XML knowledge before you start reading, but you don't need to be an expert. Especially for some of the more advanced topics, think of XML namespaces, it's useful to have some introductory material. On the other hand, the book is also good reading for more experienced XQuery developers and architects.

With 25 chapters on specific topics and a summary of all built-in functions and types, it is also a good reference. This is not the kind of book you buy and read only once.

My second recommendation is XPath 2.0 Programmer's Reference by Michael Kay. I guess there is no need to introduce Michael.

Right, this is an XPath book, but remember that XPath 2.0 is a subset of XQuery 1.0 and that they share the same data model, the XQuery 1.0 and XPath 2.0 Data Model.

As you can expect from Michael, this is an extremely in-depth book. Not a tutorial, but an authoritative reference. If you're serious about XQuery development, this is a must-have!
