XML Connections

Tuesday, August 5, 2008

XQuery for the SQL Programmer - Grouping and aggregation

Today's topic in the XQuery for the SQL programmer series is grouping.
Where grouping is built-in the SQL language, this is unfortunately a bit less trivial in XQuery. Let's start with a simple example, count the number of bids for every user.

select USERID, count(*)
from BIDS
group by USERID

XQuery has no built-in grouping construct, but you can achieve the same using the distinct-values() function, which we introduced in our last week's post.

for $userid in distinct-values(collection("BIDS")/BIDS/USERID)
return
<user>
<userid>{$userid}</userid>
<count>{count(collection("BIDS")/BIDS[USERID = $userid])}</count>
</user>

Let's now add a having clause, we're interested in the users with more than 1 bid,

select USERID, count(*)
from BIDS
group by USERID
having count(*) > 1

Our previous XQuery is changed as follows. Note that we use a let-clause, which we introduced in Joining Data.

for $userid in distinct-values(collection("BIDS")/BIDS/USERID)
let $count := count(collection("BIDS")/BIDS[USERID = $userid])
where $count > 1
return
<user>
<userid>{$userid}</userid>
<count>{$count }</count>
</user>

And we further detail our query, as we're only interested in significant bids of more than 100$.

select USERID, count(*)
from BIDS
where BID > 100
group by USERID
having count(*) > 1

In XQuery we do,

for $userid in distinct-values(collection("BIDS")/BIDS/USERID)
let $count := count(collection("BIDS")/BIDS[USERID = $userid and BID > 100])
where $count > 1
return
<user>
<userid>{$userid}</userid>
<count>{$count }</count>
</user>

As a final example, let's discuss a grouping scenario based on two values. For all items group on the rating of the user, and count the number of bids, in addition to the average bid,

select ITEMNO, RATING, AVG(BID), COUNT(*) from xvs001.BIDS, xvs001.USERS
where USERS.USERID = BIDS.USERID
GROUP BY RATING, ITEMNO
ORDER BY ITEMNO, RATING

In XQuery one could write the following query. Not the two for-loops over distinct-values().

for $rating in distinct-values(collection("USERS")/USERS/RATING)
for $itemno in distinct-values(collection("BIDS")/BIDS/ITEMNO)
for $userid in collection("USERS")/USERS[RATING=$rating]/USERID
let $bids := collection("BIDS")/BIDS[ITEMNO=$itemno and USERID = $userid]
let $avg := avg($bids/BID)
let $count := count($bids)
where $bids
order by $itemno, $rating
return
<result>
<<itemno>{$itemno}</itemno>
<rating>{$rating }</rating>
<avg>{$avg}</avg>
<count>{$count}</count>
</result>

We can conclude that resolving grouping problems in XQuery 1.0 is less simple than in SQL. Unfortunate, but today's reality.
In the margin, the XML Query Working Group has recognized the issue, and is active working on adding native grouping capabilities to the language. All this in scope of XQuery 1.1. Today still a working draft, but at least we know it will change and improve over time.

Tech Tags:

Labels: , ,

0 Comments:

Post a Comment



<< Home