XML Connections

Tuesday, January 29, 2008

Converting tab delimited files into EDI


A few days ago I received a email saying "I am a DataDirect XQuery user, and I'm successfully using your product to create XML-based reports from incoming EDI messages. Now I'm in need to do something somewhat different: it may happen that my company receives tab-delimited messages, and those messages need to be translated to EDI X12 831, rather than converted to XML. Can your product help me handling tab-delimited to EDI conversions?".

Not being a marketing person, I usually like showing how things are done, rather than just claiming that they can be easily done using our products... :)
So, I asked the user for an example of his data, a tab-delimited file. If you have read some of the other posts in this Blog, you will know that DataDirect XQuery can access tab-delimited files thanks to the XML Converters technology. So, consider this tab-delimited file:

id description quantity unitprice
001 Telephone 1 23.00
002 Desk 1 129.00
003 Keyboard 2 21.00

The following XQuery will return, for example, all the description fields in the tab-delimited input message:

 <items> {
  for $item in doc("converter:TAB:first=yes?file:///c:/sample.tab")//row
  return $item/description
 } </items>

The result is:

 <items>
  <description>Telephone</description>
  <description>Desk</description>
  <description>Keyboard</description>
 </items>

That shows that we can handle a tab-delimited file in DataDirect XQuery and generate XML; but it doesn't help much our user: he needs to output EDI.
But we know how to do that; Marc has shown us in this post how you can create EDI messages from XQuery. We just need to apply the same idea to our example; the following XQuery uses the incoming tab-delimited file to apparently generate an XML fragment consistent with a specific vocabulary; that vocabulary is what the underlying XML Converters technology knows how to translate in the corresponding EDI message (an X12 832 message in this case, which is a price/sales catalog). The serialization option at the beginning of the XQuery is instructing DataDirect XQuery to output a raw EDI message rather than XML; that way we are moving data from a tab-delimited file format into an EDI message manipulating it as if it was all XML, but without ever materializing it as XML! Many of the EDI fields typically specified are omitted here; but this example will run if you want to see how it works.

 declare option ddtek:serialize "method=EDI,long=yes";
 <X12>
  <ISA>
   <ISA06-InterchangeSenderId>1515151515</ISA06-InterchangeSenderId>
   <ISA08-InterchangeReceiverId>5151515151</ISA08-InterchangeReceiverId>
   <ISA11-RepetitionSeparator>^</ISA11-RepetitionSeparator>
   <ISA13-InterchangeControlNumber>000032123</ISA13-InterchangeControlNumber>
   <ISA14-AcknowledgmentRequested>0</ISA14-AcknowledgmentRequested>
   <ISA15-UsageIndicator>P</ISA15-UsageIndicator>
   <ISA16-ComponentElementSeparator>*</ISA16-ComponentElementSeparator>
  </ISA>
  <GS>
   <GS01-FunctionalIdentifierCode>CT</GS01-FunctionalIdentifierCode>
   <GS02-ApplicationSendersCode>9988776655</GS02-ApplicationSendersCode>
   <GS03-ApplicationReceiversCode>1122334455</GS03-ApplicationReceiversCode>
   <GS06-GroupControlNumber>128</GS06-GroupControlNumber>
   <GS07-ResponsibleAgencyCode>X</GS07-ResponsibleAgencyCode>
   <GS08-VersionReleaseIndustry>004030</GS08-VersionReleaseIndustry>
  </GS>
  <TS_832>
   <ST>
    <ST01-TransactionSetIdentifierCode>832</ST01-TransactionSetIdentifierCode>
    <ST02-TransactionSetControlNumber>12345</ST02-TransactionSetControlNumber>
   </ST>
   <BCT>
    <BCT01-CatalogPurposeCode>CP</BCT01-CatalogPurposeCode>
    <BCT02-CatalogNumber>GOV56789</BCT02-CatalogNumber>
    <BCT10-TransactionSetPurposeCode>00</BCT10-TransactionSetPurposeCode>
   </BCT>
   {
    for $row in doc('converter:TAB:first=yes?file:///c:/sample.tab')/table/row
    return
     <GROUP_5>
      <LIN>
       <LIN02-ProductServiceIdQualifier>MF</LIN02-ProductServiceIdQualifier>
       <LIN03-ProductServiceId>{$row/id/text()}</LIN03-ProductServiceId>
       <LIN07-ProductServiceId>{$row/description/text()}</LIN07-ProductServiceId>
       <LIN09-ProductServiceId>{$row/quantity/text()}</LIN09-ProductServiceId>
      </LIN>
      <GROUP_6>
       <CTP>
        <CTP03-UnitPrice>{$row/unitprice/text()}</CTP03-UnitPrice>
       </CTP>
      </GROUP_6>
     </GROUP_5>
   }
   <SE/>
  </TS_832>
  <GE/>
 </X12>

Curious about the result? Here you go:
ISA+00+ +00+ +ZZ+1515151515 +ZZ+5151515151 +080129+1710+U+00000+000032123+0+P+*'
GS+CT+9988776655+1122334455+20080129+1710+128+X+004030'
ST+832+12345'
BCT+CP+GOV56789++++++++00'
LIN++MF+001++++Telephone++1'
CTP+++23'
LIN++MF+002++++Desk++1'
CTP+++129'
LIN++MF+003++++Keyboard++2'
CTP+++21'
SE+9+12345'
GE+1+128'
IEA+1+000032123'

The beauty of this approach is that you can use it in conjunction with all the other data aggregation capabilities of DataDirect XQuery, like possibly adding data to the outgoing EDI message which is not available in the incoming tab-delimited file but available inside a relational database (like a encoded field that may need to be looked up in a table); and this will all work taking advantage of the performance and scalability features of DataDirect XQuery and XML Converters: the example above will seamlessly stream from tab-delimited to EDI without ever allocating in memory for than a few items per time.

The same technique applies of course to all the data sources supported by DataDirect XQuery and XML Converters; you can find more details at the following links:

Labels: , , , , ,

Thursday, January 17, 2008

Split up an XML document

In the series XQuery generating multiple XML documents we talk today about splitting a large XML document in several smaller ones. In some scenarios such huge documents are simply unmanageable, requiring the need to split them up.

Consider one of the Shakespeare plays marked up in XML, for example Hamlet. We want to split for this play each speech in a separate XML document.

for $speech at $i in
doc("http://www.andrew.cmu.edu/user/akj/shakespeare/hamlet.xml")
/PLAY//SPEECH
let $url := concat("C:/SHAKESPEAR/",
string-join($speech/SPEAKER,"_"),
$i, ".xml")
return ddtek:serialize-to-url($speech, $url,
"omit-xml-declaration=no,indent=yes")

It outputs 1138 files in the C:\SHAKESPEAR directory.

Or you might need to generate an HTML file for each speech, and have a document referencing them all. In the next example, each of the speech HTML files are created through ddtek:serialize-to-url, the query result is the master HTML referencing the others.

<html>{
for $speech at $i in
doc("http://www.andrew.cmu.edu/user/akj/shakespeare/hamlet.xml")
/PLAY//SPEECH
let $url := concat("C:/SHAKESPEAR/",
string-join($speech/SPEAKER,"_"),
$i, ".html")
let $htmlspeech := <html>
<b>{$speech/SPEAKER/text()}</b>:<br/>{
for $line in $speech/LINE
return
($line/text(),<br/>)
}</html>
return
(
ddtek:serialize-to-url($htmlspeech, $url, "method=html"),
<a href="{$url}">{
$speech/SPEAKER/text()
}</a>,
<br/>
)
}</html>

As you see, using the ddtek:serialize-to-url function it becomes very natural to split XML documents in multiple ones. And of course, all this taking advantage of DataDirect XQuery's XML Streaming capabilities, enabling support for huge XML documents.

Tech Tags:

Labels: ,

Tuesday, January 8, 2008

Standard XQuery support for MySQL(R): it's available now!


As you may have noticed if you paid attention to our recent new releases, DataDirect XQuery now fully supports MySQL(R).
That means that finally MySQL users have access to a standard way (XQuery) to get access to their data as XML. To see how all this works, and how MySQL users can take advantage of this opportunity, we have created a few examples on our web site; see how easy it is to extract data from MySQL as XML, how to aggregate it with XML documents or even EDI documents; and how you can even update your MySQL data directly from XQuery.

Labels: , , , ,

Friday, January 4, 2008

Bulk load of XML data into a relational database


Recently I received a question about the following topic:
We are trying to come up with a solution to bulk insert XML data into a flat
structure table in an Oracle database. Our questions are:
1. Is the performance favorable using DataDirect XQuery?
2. Do you have any sample code we can use for a proof of concept?


Those are two good questions that are becoming more and more popular among our users. And DataDirect XQuery has very good answers for them! What about performance? Well, performance always depends on many factors, of course; how is the data structured in the XML document(s), for example; how does it need to be shred into columns, maybe into multiple tables; what kind of "massage" data needs to go through, and so on. But in general, users can rely on the DataDirect XQuery streaming and projection capabilities to boost scalability and on its ability to work efficiently with relational data sources to leverage performance.

What about some example? Consider the following hypothetical scenario: your warehouse receives daily shipments of books; details about the books are available in XML documents structured like this:
 <shipment>
  <book bookid="1" quantity="15">
   <title>Java Message Service</title>
   <isbn>0596000685</isbn>
   <subject>Java Message Service (JMS)</subject>
  </book>
  <book bookid="2" quantity="25">
   <title>Instant UNIX</title>
   <isbn>1874416656</isbn>
   <subject>GNU/Linux</subject>
  </book>
 </shipment>

Real shipment lists are actually made of hundreds of thousands of entries.
The SQL database used by your warehouse has a "shipments" table which is defined like this:
 CREATE TABLE [dbo].[shipments](
   [DATE] [datetime] NOT NULL,
   [ISBN] [nchar](10) NOT NULL,
   [QUANTITY] [decimal](18, 0) NOT NULL,
   [TITLE] [nvarchar](max) NOT NULL
 ) ON [PRIMARY]

You can write an XQuery loading all the data in the XML document into the "shipments" table quite easily, relying on the DataDirect XQuery RDBMS update capabilities:
 declare variable $shipment
     as document-node(element(*, xs:untyped)) external;
 for $book in $shipment/order/book
 return
  ddtek:sql-insert("shipments",
    "DATE", current-dateTime(),
    "ISBN", $book/isbn,
    "TITLE", $book/title,
    "QUANTITY", $book/@quantity)

Of course there may be cases in which the data been inserted in the database needs to be somewhat modified; for example, support the "TITLE" columns requires all capital case letters:
 for $book in $shipment/order/book
 return
  ddtek:sql-insert("shipments",
    "DATE", current-dateTime(),
    "ISBN", $book/isbn,
    "TITLE", upper-case($book/title),
    "QUANTITY", $book/@quantity)

...or maybe something a little bit more interesting: suppose the @quantity attribute in the input XML document is optional, and if it's missing it means quantity is actually 1:
 for $book in $shipment/order/book
 return
  ddtek:sql-insert("shipments",
    "DATE", current-dateTime(),
    "ISBN", $book/isbn,
    "TITLE", upper-case($book/title),
    "QUANTITY",
      if ($book/@quantity) then $book/@quantity else 1)

Or maybe the single XML needs to be shred in multiple tables, for example doing something like this:
 let $now := current-dateTime()
 for $book in $shipment/order/book
 return (
  ddtek:sql-insert("shipments1",
    "DATE", $now,
    "ISBN", $book/isbn,
    "TITLE", $book/title),
  ddtek:sql-insert("shipments2",
    "DATE", $now,
    "ISBN", $book/isbn,
    "QUANTITY", $book/@quantity)
 )

DataDirect XQuery is able to run these XQueries in a streaming fashion; so, users will be able to upload millions of records without scalability issues. Yet another way in which DataDirect XQuery helps you bridge the XML and SQL worlds in a highly scalable and performant way.

Labels: , , , , , ,