XML Connections

Wednesday, February 20, 2008

XQJ goes for Final Approval Ballot

The XQJ specification (JSR 225) has been submitted to the JCP for final approval. The Final Approval Ballot lasts from February 19 until March 3. I'll keep you posted here.

Want to know more about XQJ, we have a tutorial covering most functionality. And there are already several implementations available, including DataDirect XQuery.

Tech Tags:

Labels:

Monday, February 18, 2008

Converting proprietary file formats to specific XML structures


We have talked in the past about how you can use XQuery to deal with EDI message (transform a directory of EDI messages, create EDI messages out of your database, converting tab delimited files into EDI). But sometimes you are not dealing with EDI; it's not infrequent that companies still rely on proprietary file formats to exchange or circulate information.
For example, a few days ago we received a request from a user who needed to convert a file in a proprietary format into a specific XML structure. The sample file he sent us looked something like this:


10       0123456789MR JOHN            SMITH
20       0123456789ACCT1
30       0123456789SALARY       500000
30       0123456789BONUS          1000
20       0123456789ACCT2
30       0123456789OTHER           100
10       1234566790MR DAVID           DOE
20       1234567890ACCT1
30       1234567890SALARY        10000


This isn't EDI, CSV, tab delimited or any of the other standard or semi-standard ways to represent information in text files; it's a proprietary format based on a specific interpretation of the size and position of the fields. This is the description of records and fields that the user shared with us:


one block is made of:
  one record 10 (customer)
  1 to N record 20 per record 10 (accounts)
  1 to N record 30 per record 20 (transactions)

record 10
  record type
on 10 characters
  customer id on 10 characters
  customer lastname on 20 characters
  customer firstname on 30 characters

record 20
  record type
on 10 characters
  customer id on 10 characters
  account id on 10 characters

record 30
  record type
on 10 characters
  customer id on 10 characters
  label on 10 characters
  value on 10 characters

How can we deal with something like this without building our own parser? DataDirect XML Converters and Stylus Studio can definitely help! XML Converters support what we call "custom conversions"; a custom conversion defines how files belonging to the same format family should be converted to XML by XML Converters. As the definition of a custom conversion can be a tedious task, Stylus Studio provides an editor that allows you to create custom conversion through an intuitive graphical interface. Explaining how that would work in this case is probably more complicated than actually doing it... so, here is a short video which describes all the operations you would do in Stylus Studio to convert the file provided by the user into XML.


Thanks to the custom conversion created using Stylus Studio, we are now able to manipulate the proprietary format file as XML; but, as mentioned above, the user needs to convert the file into a specific XML format. Using the example above, the desired XML result should look like this:


<root>
    <record_10>
        <type>10</type>
        <customer_id>0123456789</customer_id>
        <lastname>MR JOHN</lastname>
        <firstname>SMITH</firstname>
    </record_10>
    <record_20>
        <type>20</type>
        <customer_id>0123456789</customer_id>
        <account_id>ACCT1</account_id>
    </record_20>
    <record_30>
        <type>30</type>
        <customer_id>0123456789</customer_id>
        <label>SALARY</label>
        <value>500000</value>
    </record_30>
    <record_30>
        <type>30</type>
        <customer_id>0123456789</customer_id>
        <label>BONUS</label>
        <value>1000</value>
    </record_30>
    <record_20>
        <type>20</type>
        <customer_id>0123456789</customer_id>
        <account_id>ACCT2</account_id>
    </record_20>
    <record_30>
        <type>30</type>
        <customer_id>0123456789</customer_id>
        <label>OTHER</label>
        <value>100</value>
    </record_30>
    <record_10>
        <type>10</type>
        <customer_id>1234566790</customer_id>
        <lastname>MR DAVID</lastname>
        <firstname>DOE</firstname>
    </record_10>
    <record_20>
        <type>20</type>
        <customer_id>1234567890</customer_id>
        <account_id>ACCT1</account_id>
    </record_20>
    <record_30>
        <type>30</type>
        <customer_id>1234567890</customer_id>
        <label>SALARY</label>
        <value>10000</value>
    </record_30></root>

This is where the combination of DataDirect XQuery and XML Converters proves particularly helpful: not only we are able to move from a proprietary file format to well formed XML, but we also have all the power, performance and scalability of XQuery to deal with such XML to transform it (potentially, even augment it!) into the final format that we need. The following XQuery - which implicitly relies on the custom converter described above - does the trick; notice that we created a couple of functions to take care of the positional grouping which is required to generate the desired output; that makes the XQuery much easier to read and more reusable:


declare function local:getRelated20s($item) {
    let $nextItem :=$item/following-sibling::*[local-name()!="record_20"and local-name()!="record_30"][1]
    for $related in $item/following-sibling::*[local-name()="record_20"]
    where if ($nextItem) then $related << $nextItem else true()
    return $related
};
declare function local:getRelated30s($item) {
    let $nextItem := $item/following-sibling::*[local-name()!="record_30"][1]
    for $related in $item/following-sibling::*[local-name()="record_30"]
    where if($nextItem) then $related << $nextItem else true()
    return $related
};
<DataSets>
    {
    for $record_10 in doc('converter:file:///c:/blog1/sample.conv?sampleinput.txt')/root/record_10
    return
    <DataSet>
        <CustomerId>
            {$record_10/customer_id/text()}
        </CustomerId>
        <CustomerName>
            {$record_10/lastname/text()}
        </CustomerName>
        <CustomerFirstName>
            {$record_10/firstname/text()}
        </CustomerFirstName>
        <Accounts>
            {
            for $record_20 in local:getRelated20s($record_10)
            return
            <Account>
                <AccountId>
                    {$record_20/account_id/text()}
                </AccountId>
                <Transactions> {
                    for $record_30 in local:getRelated30s($record_20)
                    return
                        <Transaction>
                            <Label>{$record_30/label/text()}</Label>
                            <Value>{$record_30/value/text()}</Value>
                        </Transaction>
                } </Transactions>
            </Account>
            }
        </Accounts>
    </DataSet>
    }
</DataSets>

Once again, thanks to the fact that DataDirect XQuery and XML Converters make it possible to manipulate a wide variety of data sources (even proprietary ones!), you are able to deal with transformation (and possibly aggregation) problems from an XML point of view, shielding as much as possible those tasks from the low level details of the data sources.

Labels: , , , ,

Wednesday, February 6, 2008

XQuery creates HTML including images

This is the fourth post in the series XQuery generating multiple documents. Today we show how to create an HTML document referencing JPEG files through XQuery. The DataDirect XQuery built-in function ddtek:serialize-to-url is used to save the JPEG images.

The idea is that all the data, including the images, are stored in a relational database. The queries in this post have been tested with DataDirect XQuery against SQL Server, but as we will demonstrate this works perfectly against any other database supported by DataDirect XQuery - Oracle, DB2, Sybase, MySQL, etc.

XQuery is known to query and produce XML, but how can it be used to produce images, JPEG files in this particular example? This is another example where the DataDirect XML Converters come to rescue.

The JPEG images are stored in our database, and are retrieved by DataDirect XQuery as xs:hexbinary data. Subsequently the DataDirect XML Converters are used to convert the xs:hexbinary into a binary stream, which is saved as a JPEG file.

The following query shows how to generate the main HTML page which references the JPEG files. The idea here is to save the HTML document in the same directory as the JPEG files.
The table in our SQL Server database is fairly simple, having 3 columns with the name of the image, a small description and the actual picture.

<html>{
"My favourite pictures.",<p/>,
(: loop over all images in the database:)
for $image in collection("images")/images
(: the file name of the JPEG file :)
let $filename := concat($image/name, ".jpg")
(: create some virtual XML, which will be
processed by the XML Converters :)
let $jpeg := <jpeg>{$image/image/text()}</jpeg>
return
(<a href="./{$filename}">{
$image/description,
(: create the JPEG file :)
ddtek:serialize-to-url($jpeg,
concat("C:/test", $filename),
"method=binary")
}</a>,
<br/>)
}</html>

In this example we've used JPEG files, but you can apply the outlined technique with any other binary format.

Tech Tags:

Labels: