XML Connections

Thursday, July 31, 2008

XQuery for the SQL Programmer - Operators, functions and conditions

This is our fourth post in the XQuery for the SQL programmer series. Today we'll have a closer look at the SQL functions, operators, and conditions used in where clauses. And of course we'll explain how to accomplish the same in XQuery.

Comparisons

All of us are using comparison operators frequently in SQL, especially equality (=). Not surprising, we've been using it already a few times in examples of previous posts of this series.

Select all motorcycles,

select *
from ITEMS
where DESCRIPTION = 'Motorcycle'

is written in XQuery as follows,

for $item in collection("ITEMS")/ITEMS
where $item/DESCRIPTION = "Motorcycle"
return $item

Looks like = in SQL is the same as = in XQuery, but actually that is not entirely true. In SQL the comparison operators are used to compare two values. In XQuery, the comparisons operate on sequences of values and as such have existential behavior. A few examples of = in XQuery,

  • 1 = 2 evaluates to false
  • 1 = (1,2) evaluates to true
  • 1 != (1,2) evaluates to true
Note the last two comparisons, they illustrate that = and != are not the inverse of each other. Another consequence of the existential behavior, equality is not transitive in XQuery.
  • (1,2) = (2,3) evaluates to true
  • (2,3) = (3,4) evaluates to true
  • But (1,2) = (3,4) evaluates to false!
In XQuery we call these operators (=, !=, <, <=, > and >=) general comparisons. In addition XQuery also has value comparisons (eq, ne, lt, le, gt, ge), which are used to compare single values. Although they have a different syntax, these value comparisons are equivalent to the SQL comparison operators. The example above could thus also be written as follows,

for $item in collection("ITEMS")/ITEMS
where $item/DESCRIPTION eq "Motorcycle"
return $item/ITEMNO

Arithmetic operators

A lot of analogy between the arithmetic operators in SQL and XQuery. +, - and * are equivalent.
Where SQL uses the / operator for divisions, this is not possible in XQuery. Remember that / is used in path expressions. So XQuery has a div operator.

IN clause

Note the similarity, to some extend, between the IN condition as known in SQL with XQuery's equality general comparison.

select ITEMNO
from ITEMS
where DESCRIPTION in ('Motorcycle', 'Bicycle')

Can be written in XQuery using a general comparison,

for $item in collection("ITEMS")/ITEMS
where $item/DESCRIPTION = ("Motorcycle", "Bicycle")
return $item/ITEMNO

XQuery doesn't have a dedicated construct like the SQL IN condition. As we will see throughout this post, in XQuery we take advantage of the rich set of built-in functions and operators to accomplish the same as with some of the specific SQL constructs.

LIKE condition

Take for example, the LIKE condition, frequently used in where clauses. XQuery doesn't have the concept of a LIKE condition, but the rich set of functions includes equivalents.

The matches() function has two arguments, the input string and a regular expression against which the input string is matched. Refer to the regular expression syntax specification for all the details. But in the context of the SQL LIKE condition, here is a quick introduction. In a regular expression a dot (.) represents any character, similar to the underscore (_) in SQL. Any number of any character matches percent (%) in SQL, this is represented as a dot with asterisk (.*)in XQuery. Further, in a regular expression ^ and $ indicate the start and end of the input string.

  • where DESCRIPTION LIKE 'Motor%' is written in XQuery as where matches($item/DESCRIPTION, "^Motor")
  • where DESCRIPTION LIKE '%Motor%' is written in XQuery as where matches($item/DESCRIPTION, "Motor")
  • where DESCRIPTION LIKE 'Motor_' is written in XQuery as where matches($item/DESCRIPTION, "^Motor.$")
  • where DESCRIPTION LIKE 'Motor%cycle' is written in XQuery as where matches($item/DESCRIPTION, "^Motor.*cycle$")

XQuery also offers a starts-with() and ends-with() function, which are equivalent to LIKE "string%" and LIKE "%string". In simple cases, starts-with() and ends-with() are more convenient than the powerful matches() function.

Get all items ending on "cycle",

select *
from ITEMS
where DESCRIPTION LIKE '%cycle'

An equivalent XQuery query is,

for $item in collection("ITEMS")/ITEMS
where ends-with($item/DESCRIPTION = "cycle")
return $item

Boolean operators

Similar as in SQL, you can also in XQuery combine Boolean expressions with "and" and "or". Negating a condition in SQL is expressed in two different ways.

Get all items which are not Bicycles,

select *
from ITEMS
where not(DESCRIPTION = 'Bicycle')

And get all items not ending on "cycle",

select *
from ITEMS
where DESCRIPTION NOT LIKE '%cycle'

In XQuery, we have again only one approach, using the not() function. The first example above can be expressed as follows,

for $item in collection("ITEMS")/ITEMS
where not($item/DESCRIPTION = "Bicycle")
return $item

And get all items not ending on "cycle",

for $item in collection("ITEMS")/ITEMS
where not(ends-with($item/DESCRIPTION = "cycle"))
return $item

DISTINCT values

DISTINCT as it exists in SQL, is not available in XQuery . Again, in XQuery this is accomplished through a function, distinct-values(). For example, retrieve all ids for users having placed a bid,

select DISTINCT USERID
from BIDS

Using the distinct-values() function this can be expressed as follows in XQuery,

distinct-values(collection("BIDS")/BIDS/USERID)

Note that we wrote this without a FLWOR expression. In case we would like to return the user ids ordered, we would need to use a FLWOR,

for $userId in distinct-values(collection("BIDS")/BIDS/USERID)
order by $userId
return $userId

Aggregate functions

Not surprising, the 5 aggregate functions in SQL and XQuery are equivalent,

  • avg
  • count
  • max
  • min
  • sum
Calculating the total value of all items in our auction,

select sum(RESERVE_PRICE) from xvs001.ITEMS

Is expressed in XQuery as follows,

sum(collection("ITEMS")/ITEMS/RESERVE_PRICE)

Scalar functions

Just as every database having a long list of scalar functions, XQuery also includes a wide range of built-in functions. Going through the complete list is out of scope for this post. In general usage is the same, only a matter of finding the equivalent XQuery function for your SQL function.

Retrieve all items with a description of more than 10 characters. In SQL Server we have the len() function.

select DESCRIPTION
from ITEMS
where len(DESCRIPTION) > 10

With the following equivalent XQuery,

for $item in collection("ITEMS")/ITEMS
where string-length($item/DESCRIPTION) > 10
return $item/DESCRIPTION

Suppose your favorite SQL function is not supported in the standard XQuery function library. Are you stuck? Not necessarily if you're using DataDirect XQuery, which allows you to call any of the functions available in your database.

Consider the ORACLE function INITCAP, it returns the specified string with each word's first letter in uppercase,

select INITCAP(DESCRIPTION)
from ITEMS

Querying my ORACLE data through DataDirect XQuery you can achieve the same as follows. All you need is to declare in your query the INITCAP function as being SQL specific, and under the covers DataDirect XQuery will take advantage of Oracles INITCAP implementation.

declare function ddtek-sql:INITCAP($s as xs:string) as xs:string external; 
for $item in collection('ITEMS')/ITEMS
return
ddtek-sql:INITCAP(lower-case($item/DESCRIPTION))

I hope you have now a better feel on the similarities and key differences between SQL and XQuery, when it comes to functions and operators.
Looking forward at our next post, we'll discuss grouping.

Tech Tags:

Labels: , ,

Wednesday, July 23, 2008

XQuery for the SQL programmer – Joining Data

Often you join multiple table in SQL queries. What about XQuery, how can we perform joins in XQuery? This is what this third post in the XQuery for the SQL programmer series is all about.

A first example

Assume you want to get all users with their bids. In SQL one could write,

select u.NAME, b.ITEMNO, b.BID
from USERS u, BIDS b
where u.USERID = b.USERID

In XQuery this is expressed as follows,

for $user in collection("USERS")/USERS
for $bid in collection("BIDS")/BIDS
where $user/USERID = $bid/USERID
return
<result>{
$user/NAME,
$bid/ITEMNO,
$bid/BID
}</result>

Conceptually, the for-clauses of a FLWOR expression generate an ordered sequence of tuples of variable bindings. Note the analogy with a cartesian-join in SQL (a cross-join if you want). Next the where-clause filters the sequence of tuples. In our example above this results in the appropriate join condition being applied.
Note XQuery doesn't have the concept of result sets, everything is XML. We create a small XML fragment in the return-clause, combining the user's name with the bid's information.

The example above showed a join between two tables. But just like in SQL, XQuery allows to have more complex joins. Simply add additional for-clauses to your FLWOR expression. The next example is based on the previous, but in addition the bids are joined with items to retrieve the corresponding item description,

for $user in collection("USERS")/USERS
for $bid in collection("BIDS")/BIDS
for $item in collection("ITEMS")/ITEMS
where $user/USERID = $bid/USERID and
$bid/ITEMNO = $item/ITEMNO
return
<result>{
$user/NAME,
$bid/ITEMNO,
$bid/BID,
$item/DESCRIPTION
}</result>

A word on outer joins

Up to now we've talked about inner join, what about outer joins? Get all users with their corresponding bids,

select u.NAME, b.ITEMNO, b.BID
from USERS u left outer join BIDS b
on u.USERID = b.USERID

In XQuery this can be expressed building hierarchical XML results, for each user we group the bids. Right, forget about tabular only data, in XML you can create hierarchical structures.

for $user in collection("USERS")/USERS
return
<user>{
$user/NAME,
for $bid in collection("BIDS")/BIDS
where $user/USERID = $bid/USERID
return
<bid>{
$bid/ITEMNO,
$bid/BID
}</bid>
}</user>

For example, Jack Sprat placed two bids, one for item 1003 and another on item 1007. Rip Van Winkle on the other hand doesn't have yet any bids.

...
<user>
<NAME>Jack Sprat</NAME>
<bid>
<ITEMNO>1003</ITEMNO>
<BID>20</BID>
</bid>
<bid>
<ITEMNO>1007</ITEMNO>
<BID>200</BID>
</bid>
</user>
<user>
<NAME>Rip Van Winkle</NAME>
</user>
...

Suppose now you want to represent the results in a more tabular structure. This requires a bit more work in XQuery...
Remember the first join example in this post,

for $user in collection("USERS")/USERS
for $bid in collection("BIDS")/BIDS
where $user/USERID = $bid/USERID
return
<result>{
$user/NAME,
$bid/ITEMNO,
$bid/BID
}</result>

This yields all the user-bid tuples we're looking for, but excludes the users not having placed any bids. Retrieving those users can be achieved as follows,

for $user in collection("USERS")/USERS
where fn:empty(collection("BIDS")/BIDS[USERID = $user/USERID])
return
<result>{
$user/NAME
}</result>

We simply get all users, and only consider those which have no bids through the where-clause. Note that we use a more concise path expression in,

where fn:empty(collection("BIDS")/BIDS[USERID = $user/USERID])

If you prefer to use FLWOR expressions consistently the previous query becomes,

where fn:empty(for $bid in collection("BIDS")/BIDS
where $bid/USERID = $user/USERID
return $bid)

Note this is only about syntax difference. Both forms are semantically equivalent, and good XQuery implementations yield the same performance characteristics.

Back to our outer join subject, now we can combine both queries using the comma-operator,

for $user in collection("USERS")/USERS
for $bid in collection("BIDS")/BIDS
where $user/USERID = $bid/USERID
return
<result>{
$user/NAME,
$bid/ITEMNO,
$bid/BID
}</result>
,
for $user in collection("USERS")/USERS
where fn:empty(collection("BIDS")/BIDS[USERID = $user/USERID])
return
<result>{
$user/NAME
}</result>

For Jack Sprat and Rip Van Winkle we get results as follows,

<result>
<NAME>Jack Sprat</NAME>
<ITEMNO>1003</ITEMNO>
<BID>20</BID>
</result>
<result>
<NAME>Jack Sprat</NAME>
<ITEMNO>1007</ITEMNO>
<BID>200</BID>
</result>
<result>
<NAME>Rip Van Winkle</NAME>
</result>

You can fairly easy extend the last query to perform a full outer join. Only a matter of adding a third sub-expression.

If you want to order the result by name and itemno, simply wrap the previous query with an additional FLWOR expression,

for $result in
(for $user in collection("USERS")/USERS
for $bid in collection("BIDS")/BIDS
where $user/USERID = $bid/USERID
return
<result>{
$user/NAME,
$bid/ITEMNO,
$bid/BID
}</result>
,
for $user in collection("USERS")/USERS
where fn:empty(collection("BIDS")/BIDS[USERID = $user/USERID])
return
<result>{
$user/NAME
}</result>)
order by
$result/NAME
return
$result

What about subselects?

In SQL we often use subselects. For example, get a list of all bids with the name of the bidder,

select (select u.NAME from USERS u where u.USERID = b.USERID),
b.ITEMNO,
b.BID
from BIDS b

Similar in XQuery you can write,

for $bid in collection("BIDS")/BIDS
return
<bid>{
for $user in collection("USERS")/USERS
where $user/USERID = $bid/USERID
return
$user/NAME,
$bid/ITEMNO,
$bid/BID
}</bid>

Or more concise,

for $bid in collection("BIDS")/BIDS
return
<bid>{
collection("USERS")/USERS[USERID = $bid/USERID]/NAME,
$bid/ITEMNO,
$bid/BID
}</bid>

Suppose we want to order the bids by the bidder's name,

select (select u.NAME from USERS u where u.USERID = b.USERID),
b.ITEMNO,
b.BID
from BIDS b
ORDER BY (select u.NAME from USERS u where u.USERID = b.USERID)

Rather than repeating the expression twice, use a let-clause in XQuery,

for $bid in collection("BIDS")/BIDS
let $name := collection("USERS")/USERS[USERID = $bid/USERID]/NAME
order by $name
return
<bid>{
$name,
$bid/ITEMNO,
$bid/BID
}</bid>

Similar to SQL, where you can use subselects at various locations in a select statement, this also applies to XQuery. FLWOR expressions can be nested. In fact, XQuery is much more open to this than SQL. Where in SQL a subselect can only be used in specific locations, in XQuery every expressions can be combined with any other expression.

In SQL you have a rich set of functions, in our next post we'll compare this with the functions and operators available in SQL.

Tech Tags:

Labels: , ,

Thursday, July 10, 2008

XQuery for the SQL programmer – The Data Model

In this second post of the XQuery for the SQL programmer series we’ll explain how to represent relational tables in XML and query them through XQuery.

SQL/XML mapping

The SQL/XML standard, specifies a mapping for relational tables into XML. Actually, SQL/XML includes a set of mapping rules. These mappings can be parameterized in several ways, including the target namespace for the XML structure, whether to handle nulls using xsi:nil or absence, and whether to map a table to a single element or a forest of elements.

This last parameter – is the table mapped to a forest or not? – is mostly affecting the XML structure. Here is the mapping for the USERS table with SQL/XML forest parameter set to false. As we see there is a single <USERS> root element, and every row in our table becomes a <row> element. Each <row> element contains elements representing every single column.

<USERS>
<row>
<USERID>U01</USERID>
<NAME>Tom Jones</NAME>
<RATING>B</RATING>
</row>
<row>
<USERID>U02</USERID>
<NAME>Mary Doe</NAME>
<RATING>A</RATING>
</row>
...
</USERS>

Setting the SQL/XML forest parameter to true, we get an XML structure as shown below. Every row in the USERS table is mapped to a <USERS> element.

<USERS>
<USERID>U01</USERID>
<NAME>Tom Jones</NAME>
<RATING>B</RATING>
</USERS>
<USERS>
<USERID>U02</USERID>
<NAME>Mary Doe</NAME>
<RATING>A</RATING>
</USERS>
...

In case you are curious why this mapping option is named SQL/XML forest, not there is not a single root element in this scenario, this is what we call an XML forest and hence the name of this option.

In the end there isn’t much of a difference between both mapping approaches, one is not better than the other. It is more a matter of taste than anything else.

As the XML structure is different depending on the mapping, it will affect the way our queries look like. Consider a simple example, get the description of the item with itemno 1004,

select DESCRIPTION
from ITEMS
where ITEMNO = '1004'

Assuming SQL/XML forest set to true this becomes,

for $item in collection("ITEMS")/ITEMS
where $item/ITEMNO = "1004"
return $item/DESCRIPTION

Set to false we have to write the query as follows, note the additional row step,

for $item in collection("ITEMS")/ITEMS/row
where $item/ITEMNO = "1004"
return $item/DESCRIPTION

DataDirect XQuery supports both mappings, the default for the SQL/XML forest parameter is true. All future examples and use cases in this series will assume this default.

Mapping NULL values

SQL NULL values are mapped as absent elements. Just like there is a difference between an empty string and NULL in SQL, there is a difference in XML between an empty element and an absent element.
Suppose Tom Jones’ rating is NULL, and Mary Doe’s rating it the empty string. The XML representation of our USERS table would be as follows,

<USERS>
<USERID>U01</USERID>
<NAME>Tom Jones</NAME>
</USERS>
<USERS>
<USERID>U02</USERID>
<NAME>Mary Doe</NAME>
<RATING/>
</USERS>
...

Retrieving all users which have a rating in SQL,

select * from USERS
where RATING IS NOT NULL

We’ll explain it in more detail in a subsequent post, but the XQuery variant is as follows,

for $user in collection("USERS")/USERS
where $user/RATING
return $user

Another way to represent a NULL value is by using the XML Schema xsi:nil mechanism. The USERS table outlined above would be structured as follows,

<USERS>
<USERID>U01</USERID>
<NAME>Tom Jones</NAME>
<RATING xsi:nil="true"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
</USERS>
<USERS>
<USERID>U02</USERID>
<NAME>Mary Doe</NAME>
<RATING/>
</USERS>
...

All users with a rating are retrieved as follows,

for $user in collection("USERS")/USERS
where $user/RATING/@xsi:nil = "true"
return $user

To me the xsi:nil approach looks much more verbose. DataDirect XQuery represents SQL NULL through an absent element.

Accessing relational data through XQuery

We haven’t yet explained in detail how to reference, to subsequently query, a SQL table in your XQuery. As you have figured out already from the previous examples, we reference a SQL table through the collection() function, passing as argument the SQL table name. Note that this is a convention specific to DataDirect XQuery. As XQuery doesn’t offer a standard way to access your relational database, other products might have chosen a different approach.

Next are equivalent SQL and XQuery queries,

select  * from USERS

for $u in collection("USERS")/USERS
return $u

If you want to query a table in a specific schema,

select * from MYAUCTION.USERS

for $u in collection("MYAUCTION.USERS")/USERS
return $u

And of course similar if you want to query a table in a particular catalog.

The XQuery Data Model

In this post we learned how to represent relational data in XML, we’ll use this XML representation throughout our subsequent posts in this series.
But of course that there is much more to say about the XQuery Data Model. However this is out of scope for today, we’ll point out some of the major characteristics in a later post in this series. Also note there are various good books covering the subject, or check out the XQuery 1.0 and XPath 2.0 Data Model.

Next we’ll have a closer look into joins.

Tech Tags:

Labels: , ,