XML Connections

Wednesday, April 30, 2008

Transforming XML using XQuery updates

In one of my previous posts I blogged about the new W3C XML Query publications, including XQuery Update Facility.

Talking to people, I feel a majority relate XQuery Update Facility with updating XML documents in some persistent store. But there is actually more, it is also about transforming and enriching (transient) documents. In fact a use case hard to handle with XQuery 1.0.

Note the XQuery Update Facility is today in W3C Candidate Recommendation status. Although not yet supported by DataDirect XQuery, we are following the progression of the specification closely, and hope to communicate about our plans soon, when the specification moves into Proposed Recommendation status.

Anyway, I want to give a feel on what will be possible in the near future.

Let’s consider a concrete example. Not so long ago I posted a series on XQuery generating multiple XML documents, let’s continue with this theme. Assume you have a directory with XML documents, and these need to be enriched with data coming out of your relational database. For example, orders need to be completed with the shipping address.

Let’s enrich all our orders, each a seperate XML document. Using the customer’s name available in the order, we lookup the shipping address in our relational database and enrich subsequently the order with that shipping address.

In the margin, we use the utility function local:get-file-name, which we introduced before.

for $order in fn:collection("file:///C:/inputorders?select=*.xml")
let $filename := concat("file:///C:/outputorders/",
local:get-file-name(document-uri($order)))
let $newOrder :=
<order id="{$order/order/@id}">{
$order/order/customer,
<address>{
fn:collection("CUSTOMERS")/CUSTOMERS[NAME = $order/order/customer]/ADDRESS/text()
}</address>,
$order/order/totalprice,
$order/order/priority,
$order/order/deliverydate,
$order/order/notes,
$order/order/lineitems
}</order>
return
ddtek:serialize-to-url($newOrder, $filename, "")

Basically we read information out of the input document, and recreate the output using the same structure, but extending it with the shipping address.
There are a number of issues with such approach.

  • The more complex the input documents are, the more complex and longer the query will be.
  • The query is specific to the document structure. When the document structure evolves, the query needs to be updated.
  • If you need to handle different document types, either your query becomes more complex or multiple queries are to be used.
And as you can imagine, the more complex your documents are, the more these issues are relevant. The example above is in fact over simplified. Typical XML documents in the industry are much more complex, think for example on some of the ones we use in our tutorial How DataDirect XQuery Helps the Insurance Industry Deal with ACORD Standards.

Using XQuery Update Facility this becomes much simpler. It helps to resolve the raised concerns, resulting in more maintainable queries, and eventually increased productivity.
In the following query we simply enrich all XML documents by adding a corresponding address element after each customer element. Such approach is much more robust, and is ready to handle future revisions of the orders XML Schema.

for $order in fn:collection("file:///C:/inputorders?select=*.xml")
let $filename := concat("file:///C:/outputorders/",
local:get-file-name(document-uri($order)))
return
copy $newOrder := $order
modify
insert node
<address>{
fn:collection("CUSTOMERS")/CUSTOMERS[NAME = $order/order/customer]/ADDRESS/text()}
</address>
after $newOrder/order/customer
return
ddtek:serialize-to-url($newOrder, $filename, "")

Queries like the above one are already fully functional in our development lab. At DataDirect we are excited about all the new functionality the XQuery Update Facility will bring to the XML development community.

Tech Tags:

Labels: ,

Tuesday, April 22, 2008

Stock quotes, CSV, XQuery and Web services

Not long ago I could rely on a variety of free Web services to fetch the semi-live quotes of my preferred stock tickers. Unfortunately, one by one all those Web services have either disappeared or started requiring a subscription... and for writing a few demos now and then, the subscription model doesn't work that well for me.

I started thinking about alternatives; one option could be to fetch an HTML page from one of the many Web sites that expose stock quotes; but then extracting the information I'm looking for, reformatting it, or mashing it up or reshaping it in a different XML structure would be too painful. HTML layouts change frequently, and navigating HTML even after converting it to well formed XML (XHTML) is usually error prone.

When I mentioned that to one of our lead engineers, Tony Lavinio pointed out that Yahoo! exposes a service which returns a comma separated value (CSV) file given a list of tickers and a few options (that you can find described in a few places on the Internet). It's as simple as invoking a URL; for example, try this:

http://finance.yahoo.com/d/quotes.csv?s=PRGS,AAPL,JAVA,MSFT&f=snl1

If Microsoft Excel is installed on your computer, you will most likely get the results back in an Excel spreadsheet. The options at the end control how many and what kind of "columns" you get back ("s" is the symbol, "n" is the company name, "l1" is the latest value). There are more options, as I mentioned; the following URL will return you more information for each stock ticker:

http://finance.yahoo.com/d/quotes.csv?s=PRGS,AAPL,JAVA,MSFT&f=snl1chgopvd1t1jkxerba

Well, OK; that's interesting; but how does it help me to create my own HTML mash-ups, or my own XML reports? The Yahoo! service is very nice, but it returns a CSV format, not XML - which, as you can guess, is what I would like to manipulate. Once again XML Converters come to the rescue: open your Stylus Studio IDE, create a new XQuery and copy and paste this:

doc("converter:CSV?http://finance.yahoo.com/d/quotes.csv?s=PRGS,AAPL,JAVA,MSFT&amp;f=snl1")

This time you'll get an XML document back:

<table>
  <row>
    <column.0>PRGS</column.0>
    <column.1>PROGRESS SOFTWARE</column.1>
    <column.2>30.07</column.2>
  </row>
  <row>
    <column.0>AAPL</column.0>
    <column.1>APPLE INC</column.1>
    <column.2>168.16</column.2>
  </row>
  <row>
    <column.0>JAVA</column.0>
    <column.1>SUN MICROSYSTEMS </column.1>
    <column.2>15.78</column.2>
  </row>
  <row>
    <column.0>MSFT</column.0>
    <column.1>MICROSOFT CP</column.1>
    <column.2>30.42</column.2>
  </row>
</table>

That's something I can use to create my own HTML content or my XML report! I can just change the XQuery a bit to get that XML formatted into an HTML table, for example:

<table cellspacing= "1" cellpadding="4" border="1">{
  for $row in doc("converter:CSV?http://finance.yahoo.com/d/quotes.csv?s=PRGS,AAPL,JAVA,MSFT&amp;f=snl1hg")/table/row
  return
  <tr>
    <td>{ (:s symbol:) $row/column.0/text() }</td>
    <td>{ (:n name:) $row/column.1/text() }</td>
    <td>{ (:l1 last value:) $row/column.2/text() }</td>
    <td>{ (:h day's high:) $row/column.3/text() }</td>
    <td>{ (:g day's low:) $row/column.4/text() }</td>
  </tr>
}</table>

The result will look something like this:

PRGS PROGRESS SOFTWARE 29.58 29.87 29.58
AAPL APPLE INC 167.527 168.00 167.05
JAVA SUN MICROSYSTEMS 15.64 15.70 15.60
MSFT MICROSOFT CP 30.46 30.64 30.43

Now, what if I want to create my own Web service that returns information about one or more stock tickers? If you have read this blog entry about how to expose and consume Web service with DataDirect XQuery, you will know that creating an XQuery-based stock quote Web service is as easy as creating a simple XQuery with one external variable that accepts a list of stock tickers:

getQuotes.xquery:

declare variable $tickers as xs:string external;
<quotes>{
  for $row in doc(concat("converter:CSV?http://finance.yahoo.com/d/quotes.csv?s=", $tickers, "&amp;f=snl1hg"))/table/row
  return
  <stock>
    <symbol>{ $row/column.0/text() }</symbol>
    <name>{ $row/column.1/text() }</name>
    <last-value>{ $row/column.2/text() }</last-value>
    <day-high>{ $row/column.3/text() }</day-high>
    <day-low>{ $row/column.4/text() }</day-low>
  </stock>
}</quotes>


Deploying that XQuery in the DataDirect XQuery Web service framework will expose a new WSDL/SOAP operation, and a new REST service that we can use from any environment that supports Web service invocation. Do you want to try it? Point your browser at:

http://examples.xquery.com/stock-quotes

From there you can retrieve the WSDL corresponding to the XQuery above (getQuotes.xquery), or try the service itself through the Web interface itself! To achieve that I didn't do any more coding than writing the XQuery listed above! As you can see, I also added there a second XQuery (getQuotesEx.xquery) which allows you to specify different options:

getQuotesEx.xquery:

declare variable $tickers as xs:string external;
declare variable $options as xs:string external;
<quotes>{
  for $row in doc(concat("converter:CSV?http://finance.yahoo.com/d/quotes.csv?s=", $tickers, "&amp;f=", $options))/table/row
  return $row
}</quotes>


The XML returned might be formatted in a more pleasant way adding some translation of each option in a reasonable element name returned as result, but I'll leave that exercise to someone else.

Thanks to the power and flexibility of DataDirect XQuery and XML Converters, and thanks to the Yahoo! Finance CSV service, I now have again Web services I can use to retrieve all the information I need about any stock ticker!

Labels: , , , ,

Saturday, April 5, 2008

W3C XML Query Working Group published new documents

Several new W3C XML Query publications have hit the street the last weeks.

First of all the XML Query Working Group published a Candidate Recommendation of XQuery Update Facility 1.0. Together with this Candidate Recommendation, the Working Group published an update of the XQuery Update Facility 1.0 Requirements and XQuery Update Facility 1.0 Use Cases.

Where XQuery 1.0 only allows to "query", the XQuery Update Facility adds "update" capabilities to XQuery. The obvious use case is to update XML documents stored in a database, but XQuery Update Facility is about more!
It also enables transformations and enrichment of both transient and persisted XML documents. Such operations are rather complex with XQuery 1.0. I will talk about some of these uses cases in a future post, and show what XQuery Update Facility adds to the XQuery world when it comes to transformations and data enrichment.

When will DataDirect XQuery support the XQuery Update Facility? If you were hoping to read here about an a concrete date, I have to disappoint you. All I can say right now is that DataDirect follows closely the XQuery Update Facility development process. We will be able to make this more concrete later this year, when the XQuery Update Facility moves into Proposed Recommendation.
In the meantime, don’t forget that DataDirect XQuery offers today the ability to update your database through XQuery, Minollo blogged about a nice use case not so long ago.

I believe a significant limitations of the XQuery Update Facility is that an expressions is either "updating" or "non updating". In other words, within a single query it is not possible to both perform an update and return a query result. Looks like a common use case to me. The XML Query Working Group is trying to answer this through XQuery Scripting Extensions.

A first Working Draft of XQuery Scripting Extension 1.0 made it to the public recently. As this is a first Working Draft, don’t expect things to be written in stone. Everything is open for change and improvements. As with most of the XQuery specifications, this one also is accompanied by a Use Cases document, it provides the usage scenarios that motivate the changes in the XQuery Scripting Extension. I’m a fan of the use cases approach, it’s a good source for learning.

As said above, the ability to both "update" and "query" is required, and this is definitely the most important functionality I see in the XQuery Scripting Extensions. Some of the procedural extensions (assignments, while-loop, etc) are somehow questionable. It has to be seen how these will be used in a declarative language like XQuery. Will such concepts result in confusion for the user? Will productivity of XQuery programming improve through the procedural extensions? The future will tell...

The XML Query Working Group has been busy lately, and also started with XQuery 1.1. Still preliminary, but the Requirements and a first version of the Uses Cases were published mid March.

Tech Tags:

Labels: