XML Connections

Thursday, November 20, 2008

XQuery for the SQL programmer – And performance?

In this last post of the XQuery for the SQL programmer series, I would like to spend a few minutes on performance. The previous post listed a dozen data integration uses cases, deploying an XQuery engine on top of your SQL database. The question is of course how performant such solution can be.

If you have a rather naive implementation retrieving the complete table (or multiple tables) and subsequently perform queries on an in-memory representation, well of course, performance will be unacceptable slow. If it works at all, once you start to query your production database with millions of records.

The tricky part is to have a performant and scalable XQuery engine, that is capable of translating XQuery straight into SQL. And we believe DataDirect XQuery is...

We wrote a white paper about translating XQuery to SQL, showing concrete XQuery queries and the corresponding SQL. I would advice to read the document, but in short, the SQL generation is based on the following principles

  • Minimize data retrieval
  • Leverage the database strengths
  • Optimize for each database
  • Retrieve data efficiently
  • Support incremental evaluation
  • Optimize for XML hierarchies
  • Give the programmer the last word

And of course, when it comes to answering your data integration challenges, it's a matter of joining and aggregating relational data with other formats in the most optimal way. We have blogged about this topic before, but there is of course much more to say. Looks like I should spend some more blog-time on the performance and scalability aspects of data integration through XQuery.

And remember, performance is one aspect, developer's productivity is also important. Think of all the APIs to master, Java code to write - and maintain! - to combine multiple data sources, while all this can be done in a single XQuery.

Tech Tags:

Labels: , , ,

Thursday, October 30, 2008

XQuery for the SQL programmer – Why XQuery?

Over the last month, wearing our SQL-glasses we mastered the basics of XQuery. But wait... what we’ve learned up to now in the XQuery for the SQL programmer series is about “do in XQuery what you can do in SQL”. Doesn’t sound like a compelling reason to use XQuery

To be clear, I’m not advocating to replace SQL with XQuery. For a lot of good reasons SQL is there today, and it will be there tomorrow. Admitting, that’s also a good thing for DataDirect, as a fair amount of our business is based on ODBC, JDBC and ADO.NET.

Why do I want to use XQuery against relational databases? Minollo answered the question before. It’s all about the data integration promise offered by products like DataDirect’s Data Integration Suite.

If you frequent xml-connection.com you know about the use cases

Are you suffering data integration?

Tech Tags:

Labels: , , ,

Thursday, September 11, 2008

XQuery for the SQL programmer – Updating your database

In today’s post we’ll show how to update your relational database using XQuery. As for all posts in this series we’ll start with SQL statements, and show the equivalent xqueries.

The XQuery Update Facility is an upcoming W3C standard, currently in Candidate Recommendation status. The XQuery Update Facility extends XQuery 1.0 and allows to change XML. However, today we will not get into the details of this upcoming standard. We will however show how XQuery allows you to update your RDBMS, in a similar way as SQL does, using the DataDirect XQuery update functionality.

Basically, DataDirect XQuery introduces 3 new XQuery functions,

You probably already see the analogy with SQL’s INSERT, UPDATE and DELETE statements, time for some examples.

Inserting rows

The following SQL statement creates a new user, Joe Roxy,

insert into USERS(USERID, NAME) values ('U10', 'Joe Roxy')

The XQuery variant is very similar,

ddtek:sql-insert("USERS", "USERID", "U10", "NAME", "Joe Roxy")

The first argument identifies the table which we want to insert into. Next ddtek:sql-insert has a variable list of column-value pairs.

A "insert into select" is a bit more verbose in XQuery than it is in SQL.

insert into NEWUSERS select * from USERS where NAME LIKE 'Tom%'

In XQuery you would write,

for $user in collection("USERS")/USERS
where starts-with($user/NAME, "Tom")
return
ddtek:sql-insert("OTHERUSERS",
"USERID", $user/USERID,
"NAME", $user/NAME,
"RATING", $user/RATING)

Deleting rows

Let’s now delete Joe Roxy from the user’s table,

delete from USERS where USERID = 'U10'

With DataDirect XQuery you use ddtek:sql-delete. You select all the users you want to delete, we know how we can do that,

collection("USERS")/USERS[USERID = 'U10']

And now you pass that information as argument to ddtek:sql-delete,

ddtek:sql-delete(collection("USERS")/USERS[USERID = 'U10'])

And of course, if you have a preference for FLWOR expressions, the following is equivalent,

for $u in collection("TESTUSERS")/TESTUSERS
where $u/USERID = 'U10'
return
ddtek:sql-delete($u)

Updating rows

Let’s now show how to update the item 1001, change its description to Green Bicycle and price = 100,

update ITEMS
set DESCRIPTION = 'Green Bicycle', RESERVE_PRICE = '100'
where ITEMNO = '1001'

Similar to deleting rows, you first of all need to specify which rows to be updated, and pass that as argument to ddtek:sql-update.

for $item in collection("ITEMS")/ITEMS
where $item/ITEMNO = '1001'
return
ddtek:sql-update($item,
"DESCRIPTION", "Green Bicycle",
"RESERVE_PRICE", 100)

Just as in SQL, you can update a column with a value computed from the current value. Suppose we want to raise the reserve price for all items with 10%. As shown in the following SQL statement,

UPDATE "xvs"."xvs001"."ITEMS" SET "RESERVE_PRICE" = "RESERVE_PRICE" * 1.10

And in XQuery,

for $item in collection("ITEMS")/ITEMS              
return
ddtek:sql-update($item, "RESERVE_PRICE", $item/RESERVE_PRICE * 1.10)

Upsert?

Several SQL implementations support upsert or merge functionality. It allows to update a target table with data from a source table. Rows in the target that match the source are updated, and the non existing rows are inserted. XQuery has not such built-in functionality, but there is also no need to. Using a conditional expression, this is easily expressed in XQuery.

Suppose we want to copy all users with rating A from one to another table. If the user exists, we update the row, otherwise we insert.
We iterate over all rows in the USERS table with rating A, and lookup the matching row in the NEWUSERS table.
If a match in NEWUSERS is found, we update the row, otherwise a new row is inserted,

for $source in collection("USERS")/USERS
let $target := collection("NEWUSERS")/NEWUSERS[USERID = $source/USERID]
where $source/RATING = "A"
return
if (exists($target)) then
ddtek:sql-update($target,
"NAME", $source/NAME,
"RATING", $source/RATING)
else
ddtek:sql-insert("NEWUSERS",
"USERID", $source/USERID,
"NAME", $source/NAME,
"RATING", $source/RATING)

NULL values

As we learned in The Data Model, in the SQL/XML view, NULL values area represented as missing elements. Similar, to pass NULL to one of the ddtek:sql-* functions, specify the empty sequence.

Setting an item’s description to null in SQL,

update ITEMS
set DESCRIPTION = NULL
where ITEMNO = '1001'

Updating a columns with NULL, is not much different in XQuery,

for $item in collection("ITEMS")/ITEMS
return ddtek:sql-update($item, "DESCRIPTION", ())

All the update examples in this post either use static data (literals) or are about copying data from one to another table. But in most scenarios this data comes out of your application in one or the other way.

With DataDirect XQuery you have the ability to read data out of an XML structure and update your database. Minollo and I have blogged on this subject before in Bulk load of XML data into a relational database and Shredding XML documents into tables, a database independent approach....

As you know in SQL, bind markers facilitates binding values from your application into the SQL statement. What is the equivalent of SQL bind markers in XQuery? Watch out for our next post in the XQuery for the SQL programmer series.

Tech Tags:

Labels: , ,

Tuesday, August 5, 2008

XQuery for the SQL Programmer - Grouping and aggregation

Today's topic in the XQuery for the SQL programmer series is grouping.
Where grouping is built-in the SQL language, this is unfortunately a bit less trivial in XQuery. Let's start with a simple example, count the number of bids for every user.

select USERID, count(*)
from BIDS
group by USERID

XQuery has no built-in grouping construct, but you can achieve the same using the distinct-values() function, which we introduced in our last week's post.

for $userid in distinct-values(collection("BIDS")/BIDS/USERID)
return
<user>
<userid>{$userid}</userid>
<count>{count(collection("BIDS")/BIDS[USERID = $userid])}</count>
</user>

Let's now add a having clause, we're interested in the users with more than 1 bid,

select USERID, count(*)
from BIDS
group by USERID
having count(*) > 1

Our previous XQuery is changed as follows. Note that we use a let-clause, which we introduced in Joining Data.

for $userid in distinct-values(collection("BIDS")/BIDS/USERID)
let $count := count(collection("BIDS")/BIDS[USERID = $userid])
where $count > 1
return
<user>
<userid>{$userid}</userid>
<count>{$count }</count>
</user>

And we further detail our query, as we're only interested in significant bids of more than 100$.

select USERID, count(*)
from BIDS
where BID > 100
group by USERID
having count(*) > 1

In XQuery we do,

for $userid in distinct-values(collection("BIDS")/BIDS/USERID)
let $count := count(collection("BIDS")/BIDS[USERID = $userid and BID > 100])
where $count > 1
return
<user>
<userid>{$userid}</userid>
<count>{$count }</count>
</user>

As a final example, let's discuss a grouping scenario based on two values. For all items group on the rating of the user, and count the number of bids, in addition to the average bid,

select ITEMNO, RATING, AVG(BID), COUNT(*) from xvs001.BIDS, xvs001.USERS
where USERS.USERID = BIDS.USERID
GROUP BY RATING, ITEMNO
ORDER BY ITEMNO, RATING

In XQuery one could write the following query. Not the two for-loops over distinct-values().

for $rating in distinct-values(collection("USERS")/USERS/RATING)
for $itemno in distinct-values(collection("BIDS")/BIDS/ITEMNO)
for $userid in collection("USERS")/USERS[RATING=$rating]/USERID
let $bids := collection("BIDS")/BIDS[ITEMNO=$itemno and USERID = $userid]
let $avg := avg($bids/BID)
let $count := count($bids)
where $bids
order by $itemno, $rating
return
<result>
<<itemno>{$itemno}</itemno>
<rating>{$rating }</rating>
<avg>{$avg}</avg>
<count>{$count}</count>
</result>

We can conclude that resolving grouping problems in XQuery 1.0 is less simple than in SQL. Unfortunate, but today's reality.
In the margin, the XML Query Working Group has recognized the issue, and is active working on adding native grouping capabilities to the language. All this in scope of XQuery 1.1. Today still a working draft, but at least we know it will change and improve over time.

Tech Tags:

Labels: , ,

Thursday, July 31, 2008

XQuery for the SQL Programmer - Operators, functions and conditions

This is our fourth post in the XQuery for the SQL programmer series. Today we'll have a closer look at the SQL functions, operators, and conditions used in where clauses. And of course we'll explain how to accomplish the same in XQuery.

Comparisons

All of us are using comparison operators frequently in SQL, especially equality (=). Not surprising, we've been using it already a few times in examples of previous posts of this series.

Select all motorcycles,

select *
from ITEMS
where DESCRIPTION = 'Motorcycle'

is written in XQuery as follows,

for $item in collection("ITEMS")/ITEMS
where $item/DESCRIPTION = "Motorcycle"
return $item

Looks like = in SQL is the same as = in XQuery, but actually that is not entirely true. In SQL the comparison operators are used to compare two values. In XQuery, the comparisons operate on sequences of values and as such have existential behavior. A few examples of = in XQuery,

  • 1 = 2 evaluates to false
  • 1 = (1,2) evaluates to true
  • 1 != (1,2) evaluates to true
Note the last two comparisons, they illustrate that = and != are not the inverse of each other. Another consequence of the existential behavior, equality is not transitive in XQuery.
  • (1,2) = (2,3) evaluates to true
  • (2,3) = (3,4) evaluates to true
  • But (1,2) = (3,4) evaluates to false!
In XQuery we call these operators (=, !=, <, <=, > and >=) general comparisons. In addition XQuery also has value comparisons (eq, ne, lt, le, gt, ge), which are used to compare single values. Although they have a different syntax, these value comparisons are equivalent to the SQL comparison operators. The example above could thus also be written as follows,

for $item in collection("ITEMS")/ITEMS
where $item/DESCRIPTION eq "Motorcycle"
return $item/ITEMNO

Arithmetic operators

A lot of analogy between the arithmetic operators in SQL and XQuery. +, - and * are equivalent.
Where SQL uses the / operator for divisions, this is not possible in XQuery. Remember that / is used in path expressions. So XQuery has a div operator.

IN clause

Note the similarity, to some extend, between the IN condition as known in SQL with XQuery's equality general comparison.

select ITEMNO
from ITEMS
where DESCRIPTION in ('Motorcycle', 'Bicycle')

Can be written in XQuery using a general comparison,

for $item in collection("ITEMS")/ITEMS
where $item/DESCRIPTION = ("Motorcycle", "Bicycle")
return $item/ITEMNO

XQuery doesn't have a dedicated construct like the SQL IN condition. As we will see throughout this post, in XQuery we take advantage of the rich set of built-in functions and operators to accomplish the same as with some of the specific SQL constructs.

LIKE condition

Take for example, the LIKE condition, frequently used in where clauses. XQuery doesn't have the concept of a LIKE condition, but the rich set of functions includes equivalents.

The matches() function has two arguments, the input string and a regular expression against which the input string is matched. Refer to the regular expression syntax specification for all the details. But in the context of the SQL LIKE condition, here is a quick introduction. In a regular expression a dot (.) represents any character, similar to the underscore (_) in SQL. Any number of any character matches percent (%) in SQL, this is represented as a dot with asterisk (.*)in XQuery. Further, in a regular expression ^ and $ indicate the start and end of the input string.

  • where DESCRIPTION LIKE 'Motor%' is written in XQuery as where matches($item/DESCRIPTION, "^Motor")
  • where DESCRIPTION LIKE '%Motor%' is written in XQuery as where matches($item/DESCRIPTION, "Motor")
  • where DESCRIPTION LIKE 'Motor_' is written in XQuery as where matches($item/DESCRIPTION, "^Motor.$")
  • where DESCRIPTION LIKE 'Motor%cycle' is written in XQuery as where matches($item/DESCRIPTION, "^Motor.*cycle$")

XQuery also offers a starts-with() and ends-with() function, which are equivalent to LIKE "string%" and LIKE "%string". In simple cases, starts-with() and ends-with() are more convenient than the powerful matches() function.

Get all items ending on "cycle",

select *
from ITEMS
where DESCRIPTION LIKE '%cycle'

An equivalent XQuery query is,

for $item in collection("ITEMS")/ITEMS
where ends-with($item/DESCRIPTION = "cycle")
return $item

Boolean operators

Similar as in SQL, you can also in XQuery combine Boolean expressions with "and" and "or". Negating a condition in SQL is expressed in two different ways.

Get all items which are not Bicycles,

select *
from ITEMS
where not(DESCRIPTION = 'Bicycle')

And get all items not ending on "cycle",

select *
from ITEMS
where DESCRIPTION NOT LIKE '%cycle'

In XQuery, we have again only one approach, using the not() function. The first example above can be expressed as follows,

for $item in collection("ITEMS")/ITEMS
where not($item/DESCRIPTION = "Bicycle")
return $item

And get all items not ending on "cycle",

for $item in collection("ITEMS")/ITEMS
where not(ends-with($item/DESCRIPTION = "cycle"))
return $item

DISTINCT values

DISTINCT as it exists in SQL, is not available in XQuery . Again, in XQuery this is accomplished through a function, distinct-values(). For example, retrieve all ids for users having placed a bid,

select DISTINCT USERID
from BIDS

Using the distinct-values() function this can be expressed as follows in XQuery,

distinct-values(collection("BIDS")/BIDS/USERID)

Note that we wrote this without a FLWOR expression. In case we would like to return the user ids ordered, we would need to use a FLWOR,

for $userId in distinct-values(collection("BIDS")/BIDS/USERID)
order by $userId
return $userId

Aggregate functions

Not surprising, the 5 aggregate functions in SQL and XQuery are equivalent,

  • avg
  • count
  • max
  • min
  • sum
Calculating the total value of all items in our auction,

select sum(RESERVE_PRICE) from xvs001.ITEMS

Is expressed in XQuery as follows,

sum(collection("ITEMS")/ITEMS/RESERVE_PRICE)

Scalar functions

Just as every database having a long list of scalar functions, XQuery also includes a wide range of built-in functions. Going through the complete list is out of scope for this post. In general usage is the same, only a matter of finding the equivalent XQuery function for your SQL function.

Retrieve all items with a description of more than 10 characters. In SQL Server we have the len() function.

select DESCRIPTION
from ITEMS
where len(DESCRIPTION) > 10

With the following equivalent XQuery,

for $item in collection("ITEMS")/ITEMS
where string-length($item/DESCRIPTION) > 10
return $item/DESCRIPTION

Suppose your favorite SQL function is not supported in the standard XQuery function library. Are you stuck? Not necessarily if you're using DataDirect XQuery, which allows you to call any of the functions available in your database.

Consider the ORACLE function INITCAP, it returns the specified string with each word's first letter in uppercase,

select INITCAP(DESCRIPTION)
from ITEMS

Querying my ORACLE data through DataDirect XQuery you can achieve the same as follows. All you need is to declare in your query the INITCAP function as being SQL specific, and under the covers DataDirect XQuery will take advantage of Oracles INITCAP implementation.

declare function ddtek-sql:INITCAP($s as xs:string) as xs:string external; 
for $item in collection('ITEMS')/ITEMS
return
ddtek-sql:INITCAP(lower-case($item/DESCRIPTION))

I hope you have now a better feel on the similarities and key differences between SQL and XQuery, when it comes to functions and operators.
Looking forward at our next post, we'll discuss grouping.

Tech Tags:

Labels: , ,

Wednesday, July 23, 2008

XQuery for the SQL programmer – Joining Data

Often you join multiple table in SQL queries. What about XQuery, how can we perform joins in XQuery? This is what this third post in the XQuery for the SQL programmer series is all about.

A first example

Assume you want to get all users with their bids. In SQL one could write,

select u.NAME, b.ITEMNO, b.BID
from USERS u, BIDS b
where u.USERID = b.USERID

In XQuery this is expressed as follows,

for $user in collection("USERS")/USERS
for $bid in collection("BIDS")/BIDS
where $user/USERID = $bid/USERID
return
<result>{
$user/NAME,
$bid/ITEMNO,
$bid/BID
}</result>

Conceptually, the for-clauses of a FLWOR expression generate an ordered sequence of tuples of variable bindings. Note the analogy with a cartesian-join in SQL (a cross-join if you want). Next the where-clause filters the sequence of tuples. In our example above this results in the appropriate join condition being applied.
Note XQuery doesn't have the concept of result sets, everything is XML. We create a small XML fragment in the return-clause, combining the user's name with the bid's information.

The example above showed a join between two tables. But just like in SQL, XQuery allows to have more complex joins. Simply add additional for-clauses to your FLWOR expression. The next example is based on the previous, but in addition the bids are joined with items to retrieve the corresponding item description,

for $user in collection("USERS")/USERS
for $bid in collection("BIDS")/BIDS
for $item in collection("ITEMS")/ITEMS
where $user/USERID = $bid/USERID and
$bid/ITEMNO = $item/ITEMNO
return
<result>{
$user/NAME,
$bid/ITEMNO,
$bid/BID,
$item/DESCRIPTION
}</result>

A word on outer joins

Up to now we've talked about inner join, what about outer joins? Get all users with their corresponding bids,

select u.NAME, b.ITEMNO, b.BID
from USERS u left outer join BIDS b
on u.USERID = b.USERID

In XQuery this can be expressed building hierarchical XML results, for each user we group the bids. Right, forget about tabular only data, in XML you can create hierarchical structures.

for $user in collection("USERS")/USERS
return
<user>{
$user/NAME,
for $bid in collection("BIDS")/BIDS
where $user/USERID = $bid/USERID
return
<bid>{
$bid/ITEMNO,
$bid/BID
}</bid>
}</user>

For example, Jack Sprat placed two bids, one for item 1003 and another on item 1007. Rip Van Winkle on the other hand doesn't have yet any bids.

...
<user>
<NAME>Jack Sprat</NAME>
<bid>
<ITEMNO>1003</ITEMNO>
<BID>20</BID>
</bid>
<bid>
<ITEMNO>1007</ITEMNO>
<BID>200</BID>
</bid>
</user>
<user>
<NAME>Rip Van Winkle</NAME>
</user>
...

Suppose now you want to represent the results in a more tabular structure. This requires a bit more work in XQuery...
Remember the first join example in this post,

for $user in collection("USERS")/USERS
for $bid in collection("BIDS")/BIDS
where $user/USERID = $bid/USERID
return
<result>{
$user/NAME,
$bid/ITEMNO,
$bid/BID
}</result>

This yields all the user-bid tuples we're looking for, but excludes the users not having placed any bids. Retrieving those users can be achieved as follows,

for $user in collection("USERS")/USERS
where fn:empty(collection("BIDS")/BIDS[USERID = $user/USERID])
return
<result>{
$user/NAME
}</result>

We simply get all users, and only consider those which have no bids through the where-clause. Note that we use a more concise path expression in,

where fn:empty(collection("BIDS")/BIDS[USERID = $user/USERID])

If you prefer to use FLWOR expressions consistently the previous query becomes,

where fn:empty(for $bid in collection("BIDS")/BIDS
where $bid/USERID = $user/USERID
return $bid)

Note this is only about syntax difference. Both forms are semantically equivalent, and good XQuery implementations yield the same performance characteristics.

Back to our outer join subject, now we can combine both queries using the comma-operator,

for $user in collection("USERS")/USERS
for $bid in collection("BIDS")/BIDS
where $user/USERID = $bid/USERID
return
<result>{
$user/NAME,
$bid/ITEMNO,
$bid/BID
}</result>
,
for $user in collection("USERS")/USERS
where fn:empty(collection("BIDS")/BIDS[USERID = $user/USERID])
return
<result>{
$user/NAME
}</result>

For Jack Sprat and Rip Van Winkle we get results as follows,

<result>
<NAME>Jack Sprat</NAME>
<ITEMNO>1003</ITEMNO>
<BID>20</BID>
</result>
<result>
<NAME>Jack Sprat</NAME>
<ITEMNO>1007</ITEMNO>
<BID>200</BID>
</result>
<result>
<NAME>Rip Van Winkle</NAME>
</result>

You can fairly easy extend the last query to perform a full outer join. Only a matter of adding a third sub-expression.

If you want to order the result by name and itemno, simply wrap the previous query with an additional FLWOR expression,

for $result in
(for $user in collection("USERS")/USERS
for $bid in collection("BIDS")/BIDS
where $user/USERID = $bid/USERID
return
<result>{
$user/NAME,
$bid/ITEMNO,
$bid/BID
}</result>
,
for $user in collection("USERS")/USERS
where fn:empty(collection("BIDS")/BIDS[USERID = $user/USERID])
return
<result>{
$user/NAME
}</result>)
order by
$result/NAME
return
$result

What about subselects?

In SQL we often use subselects. For example, get a list of all bids with the name of the bidder,

select (select u.NAME from USERS u where u.USERID = b.USERID),
b.ITEMNO,
b.BID
from BIDS b

Similar in XQuery you can write,

for $bid in collection("BIDS")/BIDS
return
<bid>{
for $user in collection("USERS")/USERS
where $user/USERID = $bid/USERID
return
$user/NAME,
$bid/ITEMNO,
$bid/BID
}</bid>

Or more concise,

for $bid in collection("BIDS")/BIDS
return
<bid>{
collection("USERS")/USERS[USERID = $bid/USERID]/NAME,
$bid/ITEMNO,
$bid/BID
}</bid>

Suppose we want to order the bids by the bidder's name,

select (select u.NAME from USERS u where u.USERID = b.USERID),
b.ITEMNO,
b.BID
from BIDS b
ORDER BY (select u.NAME from USERS u where u.USERID = b.USERID)

Rather than repeating the expression twice, use a let-clause in XQuery,

for $bid in collection("BIDS")/BIDS
let $name := collection("USERS")/USERS[USERID = $bid/USERID]/NAME
order by $name
return
<bid>{
$name,
$bid/ITEMNO,
$bid/BID
}</bid>

Similar to SQL, where you can use subselects at various locations in a select statement, this also applies to XQuery. FLWOR expressions can be nested. In fact, XQuery is much more open to this than SQL. Where in SQL a subselect can only be used in specific locations, in XQuery every expressions can be combined with any other expression.

In SQL you have a rich set of functions, in our next post we'll compare this with the functions and operators available in SQL.

Tech Tags:

Labels: , ,

Thursday, July 10, 2008

XQuery for the SQL programmer – The Data Model

In this second post of the XQuery for the SQL programmer series we’ll explain how to represent relational tables in XML and query them through XQuery.

SQL/XML mapping

The SQL/XML standard, specifies a mapping for relational tables into XML. Actually, SQL/XML includes a set of mapping rules. These mappings can be parameterized in several ways, including the target namespace for the XML structure, whether to handle nulls using xsi:nil or absence, and whether to map a table to a single element or a forest of elements.

This last parameter – is the table mapped to a forest or not? – is mostly affecting the XML structure. Here is the mapping for the USERS table with SQL/XML forest parameter set to false. As we see there is a single <USERS> root element, and every row in our table becomes a <row> element. Each <row> element contains elements representing every single column.

<USERS>
<row>
<USERID>U01</USERID>
<NAME>Tom Jones</NAME>
<RATING>B</RATING>
</row>
<row>
<USERID>U02</USERID>
<NAME>Mary Doe</NAME>
<RATING>A</RATING>
</row>
...
</USERS>

Setting the SQL/XML forest parameter to true, we get an XML structure as shown below. Every row in the USERS table is mapped to a <USERS> element.

<USERS>
<USERID>U01</USERID>
<NAME>Tom Jones</NAME>
<RATING>B</RATING>
</USERS>
<USERS>
<USERID>U02</USERID>
<NAME>Mary Doe</NAME>
<RATING>A</RATING>
</USERS>
...

In case you are curious why this mapping option is named SQL/XML forest, not there is not a single root element in this scenario, this is what we call an XML forest and hence the name of this option.

In the end there isn’t much of a difference between both mapping approaches, one is not better than the other. It is more a matter of taste than anything else.

As the XML structure is different depending on the mapping, it will affect the way our queries look like. Consider a simple example, get the description of the item with itemno 1004,

select DESCRIPTION
from ITEMS
where ITEMNO = '1004'

Assuming SQL/XML forest set to true this becomes,

for $item in collection("ITEMS")/ITEMS
where $item/ITEMNO = "1004"
return $item/DESCRIPTION

Set to false we have to write the query as follows, note the additional row step,

for $item in collection("ITEMS")/ITEMS/row
where $item/ITEMNO = "1004"
return $item/DESCRIPTION

DataDirect XQuery supports both mappings, the default for the SQL/XML forest parameter is true. All future examples and use cases in this series will assume this default.

Mapping NULL values

SQL NULL values are mapped as absent elements. Just like there is a difference between an empty string and NULL in SQL, there is a difference in XML between an empty element and an absent element.
Suppose Tom Jones’ rating is NULL, and Mary Doe’s rating it the empty string. The XML representation of our USERS table would be as follows,

<USERS>
<USERID>U01</USERID>
<NAME>Tom Jones</NAME>
</USERS>
<USERS>
<USERID>U02</USERID>
<NAME>Mary Doe</NAME>
<RATING/>
</USERS>
...

Retrieving all users which have a rating in SQL,

select * from USERS
where RATING IS NOT NULL

We’ll explain it in more detail in a subsequent post, but the XQuery variant is as follows,

for $user in collection("USERS")/USERS
where $user/RATING
return $user

Another way to represent a NULL value is by using the XML Schema xsi:nil mechanism. The USERS table outlined above would be structured as follows,

<USERS>
<USERID>U01</USERID>
<NAME>Tom Jones</NAME>
<RATING xsi:nil="true"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
</USERS>
<USERS>
<USERID>U02</USERID>
<NAME>Mary Doe</NAME>
<RATING/>
</USERS>
...

All users with a rating are retrieved as follows,

for $user in collection("USERS")/USERS
where $user/RATING/@xsi:nil = "true"
return $user

To me the xsi:nil approach looks much more verbose. DataDirect XQuery represents SQL NULL through an absent element.

Accessing relational data through XQuery

We haven’t yet explained in detail how to reference, to subsequently query, a SQL table in your XQuery. As you have figured out already from the previous examples, we reference a SQL table through the collection() function, passing as argument the SQL table name. Note that this is a convention specific to DataDirect XQuery. As XQuery doesn’t offer a standard way to access your relational database, other products might have chosen a different approach.

Next are equivalent SQL and XQuery queries,

select  * from USERS

for $u in collection("USERS")/USERS
return $u

If you want to query a table in a specific schema,

select * from MYAUCTION.USERS

for $u in collection("MYAUCTION.USERS")/USERS
return $u

And of course similar if you want to query a table in a particular catalog.

The XQuery Data Model

In this post we learned how to represent relational data in XML, we’ll use this XML representation throughout our subsequent posts in this series.
But of course that there is much more to say about the XQuery Data Model. However this is out of scope for today, we’ll point out some of the major characteristics in a later post in this series. Also note there are various good books covering the subject, or check out the XQuery 1.0 and XPath 2.0 Data Model.

Next we’ll have a closer look into joins.

Tech Tags:

Labels: , ,

Sunday, June 29, 2008

XQuery for the SQL programmer – Introduction

At DataDirect we often get XQuery coding questions from SQL programmers. It can probably be explained by two main reasons. First there are many developers with a SQL background, not surprising they are looking for some analogy with SQL during their first XQuery steps. Second, with DataDirect XQuery you can use XQuery against relational databases, DataDirect XQuery is a natural inroad for SQL programmers into the world of XML and XQuery.

As explained last week, we have good experiences teaching XQuery. I thought it would be a good idea to start a series "XQuery for the SQL programmer" - a light introduction to XQuery from a SQL perspective.
This series is not a general introduction to XQuery. We assume you have already some notions of XQuery (or XPath). You can always refresh your XQuery knowledge reading Learn XQuery in 10 Minutes: An XQuery Tutorial.

The following essays are available in this series. This list is updated as posts become available, bookmark it if you want to have the update-to-date and complete list at first hand.

This series includes a lot of examples, based on the schema of Use Case "R" from the XML Query Use Cases. It’s a simple auction system, consisting of three tables. Here is the description taken from the XML Query Use Cases document.

The auction maintains a USERS table containing information on registered users, each identified by a unique userid, who can either offer items for sale or bid on items. An ITEMS table lists items currently or recently for sale, with the userid of the user who offered each item. A BIDS table contains all bids on record, keyed by the userid of the bidder and the item number of the item to which the bid applies

The three tables look as follows:
USERS

ITEMS

BIDS

As said, this series make extensive use of examples, all the XQuery examples are fully functional and have been tested with DataDirect XQuery.

To get started, let’s look at a first but simple SQL query. Get the item numbers of all motorcycle items.

select ITEMNO
from ITEMS
where DESCRIPTION = 'Motorcycle'

Or in XQuery you could write.

for $item in collection("ITEMS")/ITEMS
where $item/DESCRIPTION = "Motorcycle"
return $item/ITEMNO

This is a FLWOR expression, which stands for "for-let-where-orderby-return". I guess you see the analogy with a SQL select statement. Suppose we want to sort the results by item number. In SQL we add an order by clause.

select ITEMNO
from ITEMS
where DESCRIPTION = 'Motorcycle'
order by ITEMNO

In XQuery it’s similar to add an order by clause.

for $item in collection("ITEMS")/ITEMS
where $item/DESCRIPTION = "Motorcycle"
order by $item/ITEMNO
return $item/ITEMNO

This was of course not much more than a quick introduction. In our next post we’ll start with the real work, and discuss the Relational and XML data models.
Stay tuned!

Tech Tags:

Labels: , ,

Thursday, June 12, 2008

Teaching XQuery to SQL developers

Dan McCreary wrote about his Teaching XSLT vs. Teaching XQuery experience.

At DataDirect we have also good experience teaching XQuery, and a developer with some SQL experience is definitely not in disadvantage. A lot of the query concepts in SQL are easily mapped into XQuery. Priscilla Walmsley’s book, which I highly recommend, devotes a complete chapter on this topic.

As a SQL developer you might be wondering, why using XQuery to talk to my favorite relational database? Minollo blogged about it a while back. Yes there are some good reasons to have XQuery in your toolbox in addition to good old SQL.

XQuery is the natural choice to bridge the relational and XML world, with products like XML Converters you can easily extend this from XML to the complete EDI world. And tools like the XQueryWebService Framework make it trivial to build powerful data services that query, aggregate and update multiple data sources.

Increased developer’s productivity is great, but equally important is performance and scalability. This is exactly one of the focus areas of DataDirect XQuery, for example by taking full advantage of your current SQL investments.

Labels: , , , ,

Thursday, March 27, 2008

Joining XML with RDBMS, how and what are the consequences?

A few months ago, my friend Minollo blogged about XQuery against RDBMS: let the engine optimize your SQL. And last week again, a question on some newsgroup attracted our attention.

Imagine you have an XML document with customers, there might be one or a few hundred, and want to retrieve all orders for these customers out of your Oracle database. One could write something as follows, using a general comparison in the where clause.

for $order in collection("ORDERS")/ORDERS
where $order/O_CUSTKEY = doc("customers.xml")/customers/customer
return
<order id="{$order/O_ORDERKEY}">
<customer>{$order/O_CUSTKEY/data(.)}</customer>
<date>{$order/O_ORDERDATE/data(.)}</date>
<price>{$order/O_TOTALPRICE/data(.)}</price>
</order>

As explained in the white paper Implementing XQuery efficiently for relational databases, DataDirect XQuery always handles general comparisons efficiently and translates them into SQL - taking advantage of database indexes and only retrieving the relevant data out of the database.

Most other XQuery implementations don't translate such general comparisons into SQL, because the second argument of the general comparison is a sequence of customers. In other words, the where clause is not evaluated by the database engine, with all the negative performance and scalability consequences that implies. Basically, all records in the ORDERS table are retrieved out of the database, and filtering is accomplished in memory.

As advised on the newsgroup, one can rewrite the query as follows.

for $customer in doc("customers.xml")/customers/customer
for $order in collection("ORDERS")/ORDERS
where $order/O_CUSTKEY = $customer
return
<order id="{$order/O_ORDERKEY}">
<customer>{$order/O_CUSTKEY/data(.)}</customer>
<date>{$order/O_ORDERDATE/data(.)}</date>
<price>{$order/O_TOTALPRICE/data(.)}</price>
</order>

Both queries yield the same result. Well, to be technically accurate, there are some differences.

  • the order of the returned orders might be different. With the second query, the orders are grouped per customer
  • in order to make both queries equivalent, with the second alternative, you need to get distinct values out of the customers.xml
But ok, let's assume for a moment both queries yield the same result.

One interesting thing about this second query, compared to the first, is that most XQuery implementations are capable of translating this where clause in SQL as both arguments of the comparison are single values. And you end up with SQL as follows, executed through a nested loop algorithm.

select  O_ORDERKEY, O_CUSTKEY, O_ORDERDATE, O_TOTALPRICE
from ORDERS
where O_CUSTKEY = ?

Further, better implementations will try to minimize the number of roundtrips to the database, and combine multiple requests in a single SQL statement. Smells like a topic for another post, anyway...

Looks great, our initial concerns are answered, the where clause is evaluated by the SQL engine. We're perfectly happy, right?

Unfortunately, there is a significant drawback with the second query! Suppose, for example, we want to order our results by ORDERDATE and TOTALPRICE.

for $customer in doc("customers.xml")/customers/customer
for $order in collection("ORDERS")/ORDERS
where $order/O_CUSTKEY = $customer
order by $order/O_ORDERDATE, $order/O_TOTALPRICE
return
<order id="{$order/O_ORDERKEY}">
<customer>{$order/O_CUSTKEY/data(.)}</customer>
<date>{$order/O_ORDERDATE/data(.)}</date>
<price>{$order/O_TOTALPRICE/data(.)}</price>
</order>

Although the where clause is translated into SQL, the order by clause is not. It is simply not possible! Remember that multiple SQL statements will be executed in a nested loop, all these results are buffered in memory, after which we need to order the buffered results. This is clearly a significant drawback of the proposed alternative. Handing order by clauses out of your database engine, implies a serious performance penalty and excessive memory consumption, even with medium sized query results.

Let's go back to our initial query and add the order by clause.

for $order in collection("ORDERS")/ORDERS
where $order/O_CUSTKEY = doc("customers.xml")/customers/customer
order by $order/O_ORDERDATE, $order/O_TOTALPRICE
return
<order id="{$order/O_ORDERKEY}">
<customer>{$order/O_CUSTKEY/data(.)}</customer>
<date>{$order/O_ORDERDATE/data(.)}</date>
<price>{$order/O_TOTALPRICE/data(.)}</price>
</order>

If you have an XQuery implementation that is capable of translating such where clauses into SQL, the topic this post started with, it can as a consequence go further and also translate the order by clause into SQL! With DataDirect XQuery, the execution plan for this last query is as follows. Note that both the where and order by clause are pushed into SQL and processed by your Oracle database.

As you see, choosing the right XQuery implementation can make a huge difference. If you want to find out more about how DataDirect XQuery generates SQL and leverages the strengths of your current database investment, check out this white paper.

Labels: , , , ,

Tuesday, August 28, 2007

XQuery against RDBMS: let the engine optimize your SQL


We recently noticed some questions on a newsgroup that attracted our attention. The question was something similar to this:

How to create an XQuery which generates SQL like SELECT t1.columnName1 FROM Table t1 WHERE t1.columnName2 IN('as','fa','pr')

My immediate reaction was: why is he worrying about that? You should't need to think about how to write an XQuery to obtain a specific SQL; it is the XQuery processor's goal to digest your XQuery and make the "best" SQL out of it...

The obvious XQuery that comes to my mind would be...

for $ts in fn:collection("Table")/Table
where $t1/columnName2 = ('as','fa','pr')
return $ts/columnName1


Strangely enough, experts on that newsgroup suggested to instead write the query like:

for $t in Table() where $t/columnName2 = 'as' or $t/columnName2 = 'fa' or $t/columnName2 = 'pr'
return $t


...or, assuming a sequence of values on which to filter:

for $v in $values for $t in Table()
where $t/columnName2 = $v
return $t


Why would they suggest such an unnatural way to solve that problem in XQuery? The reality is that I'm thinking in terms of what DataDirect XQuery would do; while they are thinking about different XQuery processors. DataDirect XQuery has been designed to re-write XQuery expressions in SQL without forcing the user to code XQuery in a specific way. In XQuery it's possible to express the same logic in many different ways; but it shouldn't be the XQuery author's responsibility to guess about how the underlying XQuery engine optimizes what he writes; it should be the XQuery engine that is able to take the "right decisions" no matter how the user codes the solution in XQuery (at least in reasonably equivalent scenarios, like the one described above).

For the record, the XQueries described above all end up generating the same execution plan in DataDirect XQuery:
If you are interested in more details about how DataDirect XQuery generates SQL when running XQuery against Relational data sources, a good source of information is: the Generating SQL white paper as Consistent SQL Generation.


Labels: , , , ,

Wednesday, August 15, 2007

Why do I want to use XQuery against relational databases?


A few days ago someone brought to my attention a post by Elliotte Harold about "The State of Native XML Databases", which also mentions DataDirect XQuery:
"DataDirect XQuery is not itself a database. Rather it is an adapter layer that sits on top of your existing payware database such as SQL Server or Oracle and provides an XQuery interface. Why you’d want to use XQuery instead of SQL when talking to a relational database, I’ve never quite been able to fathom. Data Direct XQuery also has adapters for XML files, EDI, and other flat files."

Why would I want to use XQuery when talking to a relational database? I've spent a good portion of my last two years demonstrating how DataDirect XQuery can be used to solve problems that also require access to relational databases; so, let me describe a few usecases I have seen; in all of them, being able to use XQuery rather than Java+SQL makes developer's life much simpler and productive.

I need to query my database to generate XML documents
We have recently published a customer story about this; it's not infrequent that data needs to be transferred/communicated in XML format. So, even when the data is all stored in relational databases, SQL (plus whatever language around it to "massage" the data) is not necessarily the easiest path to generate XML representing portions of the relational data. The customer in question has commented on some quantitative comparison between *how much* easier it is; it's not difficult believe it if you know the technologies in question.

I need to process an incoming XML message and respond including data stored in my database
This is a very frequent usecases; in some cases instead of receiving an XML request the user is dealing even with EDI; but there are tools that make the conversion of EDI to XML quite easy, so the problem ends up really being the same. How would you address that problem without an XQuery processor able to handle relational database access? You would probably write a Java application that parses the incoming XML document, even these days most likely you would materialize it in an XML DOM, fetch the information you needs from it, issue SQL queries over JDBC to get more related information, and finally you would create an XML document to return the requested result(s). We did try implementing this exercise dealing with some ACORD XML requests, and a relational back-end storing the various insurance policy details; we will soon publish more information and code samples on http://www.xquery.com/; in the meanwhile, I'll need to leave it to your imagination how the Java code looks like, and how it compares to the equivalent XQuery+Java code that would achieve the same goal using DataDirect XQuery.

I need to shred the content of my XML document into my database
This is one of the reasons why we have added RDBMS update capabilities in the latest version of DataDirect XQuery; it is another quite popular usecase. Again you are dealing with the necessity to navigate an XML document, find the proper information in there, and then execute operations against a relational database based on that information. Once again the same problem can indeed be solved using Java+SQL+some XML API; but isn't it more productive to use an XML native language featuring the ability to access both XML and relational data in the same context?


To wrap up, I believe the real question I would ask myself is not "Why would I want to use XQuery when talking to a relational database?", but rather "Can I use XQuery to access a relational database achieving the same levels of scalability and performance that I could achieve using a combination of Java and SQL?". That's exactly where the difficulty is, and where XQuery implementations differ dramatically. Of course it would be easy for me to answer "Sure you can! DataDirect XQuery has been designed exactly with that goal in mind!"; and I could point you to a variety of literature on http://www.xquery.com/ (the most interesting one being probably this one by Marc). But instead, I encourage you to try it yourself; get a fully functional trial version from http://www.xquery.com/download, and give it a try, using your own usecase, your own database and your own data set. That's the only way you will actually experience how XQuery can make things easier for you without scalability and performance compromises.


Labels: , , , ,