Friday, October 29, 2010

Editorial: Growing Support for Asynchronous Transactions

It's always exciting to see a prediction come true... especially when it's one of your own.  For a number of years now it has been obvious that the simple synchronous web services taught in books and college courses were never going to be sufficient to support the highly complex transactions required in the real world, in particular transactions that require two-phase validation, such as those that leverage Schematron in addition to simple XML Schema validation.

It appears that others are coming around to the same premise.  Microsoft is now publishing its Visual Studio Async in order to better handle these sorts of highly complex transactions, which take longer to complete than a traditional synchronous request/response web service can comfortably accommodate.  InfoWorld has a good article describing Microsoft's adoption of this new trend.

While I am not endorsing any specific platform, Microsoft's development tools have historically been a great indicator of market trends.  This latest addition is a good sign that the industry is headed towards better support for the highly complex transaction processing world in which we live.

Wednesday, October 20, 2010

XSLT: Transform XML into nc:ContactInformation Structure

This is a short post to show how to leverage XSLT to convert a simple and generic XML file into NIEM-conformant XML as it pertains to the nc:ContactInformation block. 

This is a very common situation where a "non-NIEM" data stream is received and needs to be converted to a conformant structure.  Take the following sample non-NIEM XML instance:

<?xml version="1.0" encoding="UTF-8" ?>     
<SomeBatchOfStuff>
   <Person>
       <Name>John Doe</Name>
       <PhoneNumber>212-111-2222</PhoneNumber>
   </Person>
   <Person>
       <Name>Sally Smith</Name>
       <PhoneNumber>212-333-4444</PhoneNumber>
   </Person>
</SomeBatchOfStuff>

If this very logical structure needed to be converted into nc:Person and nc:ContactInformation elements (with an nc:PersonContactInformationAssociation object to link the two together), the following XSLT could be used:

<?xml version="1.0" encoding="UTF-8" ?>     
<xsl:stylesheet  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ns0="SomeNonConformantDocumentNamespace" version="1.0" exclude-result-prefixes="xs">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <xsl:variable name="var1_instance_InputSchema" select="."/>
        <MyNIEMConformantDocument xmlns="MyNIEMDocumentNamespace" xmlns:i="http://niem.gov/niem/appinfo/2.0" xmlns:nc="http://niem.gov/niem/niem-core/2.0" xmlns:niem-xsd="http://niem.gov/niem/proxy/xsd/2.0" xmlns:s="http://niem.gov/niem/structures/2.0">
            <!-- Loop through the Persons and create a NIEM Conformant Person -->
            <xsl:for-each select="$var1_instance_InputSchema/SomeBatchOfStuff/Person">
                <xsl:variable name="NonConformantPerson" select="."/>
                <nc:Person>
                    <xsl:attribute name="s:id">
                        <xsl:value-of select="generate-id(.)"/>
                    </xsl:attribute>
                    <nc:PersonName>
                        <xsl:for-each select="$NonConformantPerson/Name">
                            <nc:PersonFullName>
                                <xsl:value-of select="string(.)"/>
                            </nc:PersonFullName>
                        </xsl:for-each>
                    </nc:PersonName>
                </nc:Person>
            </xsl:for-each>
            
            <!-- Loop through the phone numbers and create a NIEM Conformant Contact Information -->
            <xsl:for-each select="$var1_instance_InputSchema/SomeBatchOfStuff/Person/PhoneNumber">
                <nc:ContactInformation>
                    <xsl:attribute name="s:id">
                        <xsl:value-of select="generate-id(.)"/>
                    </xsl:attribute>
                    <nc:ContactTelephoneNumber>
                        <nc:FullTelephoneNumber>
                            <nc:TelephoneNumberFullID>
                                <xsl:value-of select="string(.)"/>
                            </nc:TelephoneNumberFullID>
                        </nc:FullTelephoneNumber>
                    </nc:ContactTelephoneNumber>
                </nc:ContactInformation>
            </xsl:for-each>
            
            <!-- Loop through the Persons and create a NIEM Conformant Contact Information Association -->
            <xsl:for-each select="$var1_instance_InputSchema/SomeBatchOfStuff/Person">
                <nc:PersonContactInformationAssociation>
                    <nc:PersonReference>
                        <xsl:attribute name="s:ref">
                            <xsl:value-of select="generate-id(.)"/>
                        </xsl:attribute>
                    </nc:PersonReference>
                    <nc:ContactInformationReference>
                        <xsl:attribute name="s:ref">
                            <xsl:value-of select="generate-id(./PhoneNumber)"/>
                        </xsl:attribute>
                    </nc:ContactInformationReference>
                </nc:PersonContactInformationAssociation>
            </xsl:for-each>
        </MyNIEMConformantDocument>
    </xsl:template>                    
</xsl:stylesheet>

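The XSLT relies heavily on the generate-id() function.  Because it returns the same ID every time it is called for a given input node, each s:ref written in an association lines up with the s:id written earlier for the same Person or PhoneNumber node.  The result is the following NIEM-conformant XML file: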

<?xml version="1.0" encoding="UTF-8" ?>   
<MyNIEMConformantDocument xmlns="MyNIEMDocumentNamespace" xmlns:i="http://niem.gov/niem/appinfo/2.0" xmlns:nc="http://niem.gov/niem/niem-core/2.0" xmlns:niem-xsd="http://niem.gov/niem/proxy/xsd/2.0" xmlns:s="http://niem.gov/niem/structures/2.0" xmlns:ns0="SomeNonConformantDocumentNamespace">
   <nc:Person s:id="d0e3">
      <nc:PersonName>
         <nc:PersonFullName>John Doe</nc:PersonFullName>
      </nc:PersonName>
   </nc:Person>
   <nc:Person s:id="d0e12">
      <nc:PersonName>
         <nc:PersonFullName>Sally Smith</nc:PersonFullName>
      </nc:PersonName>
   </nc:Person>
   <nc:ContactInformation s:id="d0e8">
      <nc:ContactTelephoneNumber>
         <nc:FullTelephoneNumber>
            <nc:TelephoneNumberFullID>212-111-2222</nc:TelephoneNumberFullID>
         </nc:FullTelephoneNumber>
      </nc:ContactTelephoneNumber>
   </nc:ContactInformation>
   <nc:ContactInformation s:id="d0e17">
      <nc:ContactTelephoneNumber>
         <nc:FullTelephoneNumber>
            <nc:TelephoneNumberFullID>212-333-4444</nc:TelephoneNumberFullID>
         </nc:FullTelephoneNumber>
      </nc:ContactTelephoneNumber>
   </nc:ContactInformation>
   <nc:PersonContactInformationAssociation>
      <nc:PersonReference s:ref="d0e3"/>
      <nc:ContactInformationReference s:ref="d0e8"/>
   </nc:PersonContactInformationAssociation>
   <nc:PersonContactInformationAssociation>
      <nc:PersonReference s:ref="d0e12"/>
      <nc:ContactInformationReference s:ref="d0e17"/>
   </nc:PersonContactInformationAssociation>
</MyNIEMConformantDocument>
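
Since the whole approach depends on every s:ref lining up with an s:id, a quick sanity check on the transform output can be worthwhile.  The following is only a minimal sketch in Python (standard library only), not part of the transform itself; the function name dangling_refs and the trimmed SAMPLE document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# NIEM 2.0 structures namespace, as declared in the stylesheet above.
S_NS = "http://niem.gov/niem/structures/2.0"


def dangling_refs(xml_text):
    """Return the s:ref values that do not match any s:id in the document."""
    root = ET.fromstring(xml_text)
    ids = set()
    refs = []
    for element in root.iter():
        ident = element.get(f"{{{S_NS}}}id")
        if ident is not None:
            ids.add(ident)
        ref = element.get(f"{{{S_NS}}}ref")
        if ref is not None:
            refs.append(ref)
    return [ref for ref in refs if ref not in ids]


# Hypothetical, trimmed copy of the transform output (one person only).
SAMPLE = """
<MyNIEMConformantDocument xmlns="MyNIEMDocumentNamespace"
    xmlns:nc="http://niem.gov/niem/niem-core/2.0"
    xmlns:s="http://niem.gov/niem/structures/2.0">
  <nc:Person s:id="d0e3"/>
  <nc:ContactInformation s:id="d0e8"/>
  <nc:PersonContactInformationAssociation>
    <nc:PersonReference s:ref="d0e3"/>
    <nc:ContactInformationReference s:ref="d0e8"/>
  </nc:PersonContactInformationAssociation>
</MyNIEMConformantDocument>
"""

# An empty list means every reference resolved.
print(dangling_refs(SAMPLE))
```

An empty list indicates that each association points at an element that actually exists in the document.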

Wednesday, March 31, 2010

Schematron: Supporting Form Validation in a Data Validation World

NIEM is for all intents and purposes a highly object-oriented data model which may or may not be used by form entry tools at the time of data capture.  While this has enormous benefits, it can be detrimental if one wants to use Schematron for BOTH form and data validation in conjunction with NIEM.

In order to properly support form validation, source systems will often want to know exactly which field caused an error during the processing of business rules surrounding a form.  In many systems the map between the source fields and the NIEM elements may be readily available.  Where it is not, or where processing speed is critical, the data validation engine should be capable of furnishing this information back to the calling system.

Let's take the example of a citation data capture tool with the following example data entry UI:

[Image: example citation data entry UI]

NIEM supports attaching a “footnote” to any element through the nc:Metadata element, which the annotated element points to with an s:metadata attribute.  nc:Metadata is a complex data type that in turn includes an element to store the source system’s field name, called nc:SourceIDText.  The NIEM-conformant XML instance to pass this would look something like the following:

<ns:CitationBatchDocument> 
  ...
  ...
  ...
  <ns:Citation> 
    ...
    ...
    <!-- Citation Number -->
    <nc:ActivityIdentification>
      <nc:IdentificationID s:metadata="M1">ABC123</nc:IdentificationID>
    </nc:ActivityIdentification>
    <!-- Citation Date -->
    <nc:ActivityDate>
      <nc:Date s:metadata="M2">2002-05-30</nc:Date>
    </nc:ActivityDate>
    ...
    ...
  </ns:Citation> 
  ...
  ...
  ...
  <nc:Metadata s:id="M1">
    <nc:SourceIDText>CITE_NUM</nc:SourceIDText>
  </nc:Metadata>
  <nc:Metadata s:id="M2">
    <nc:SourceIDText>CITE_DATE</nc:SourceIDText>
  </nc:Metadata>
  ...
  ...
</ns:CitationBatchDocument>
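
To make the s:metadata lookup concrete, here is a rough sketch in Python (standard library only) of how a receiving system might follow each reference back to its nc:SourceIDText.  The namespace URI urn:example:citation-batch, the function name source_field_names, and the filled-in SAMPLE document are assumptions for illustration; the nc and s namespace URIs are the NIEM 2.0 ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "nc": "http://niem.gov/niem/niem-core/2.0",
    "s": "http://niem.gov/niem/structures/2.0",
    # Hypothetical namespace for the exchange document itself.
    "ns": "urn:example:citation-batch",
}

SAMPLE = """
<ns:CitationBatchDocument xmlns:ns="urn:example:citation-batch"
    xmlns:nc="http://niem.gov/niem/niem-core/2.0"
    xmlns:s="http://niem.gov/niem/structures/2.0">
  <ns:Citation>
    <nc:ActivityIdentification>
      <nc:IdentificationID s:metadata="M1">ABC123</nc:IdentificationID>
    </nc:ActivityIdentification>
    <nc:ActivityDate>
      <nc:Date s:metadata="M2">2002-05-30</nc:Date>
    </nc:ActivityDate>
  </ns:Citation>
  <nc:Metadata s:id="M1">
    <nc:SourceIDText>CITE_NUM</nc:SourceIDText>
  </nc:Metadata>
  <nc:Metadata s:id="M2">
    <nc:SourceIDText>CITE_DATE</nc:SourceIDText>
  </nc:Metadata>
</ns:CitationBatchDocument>
"""


def source_field_names(xml_text):
    """Map each source field name to the value of the element that cites it."""
    root = ET.fromstring(xml_text)
    s_id = f"{{{NS['s']}}}id"
    s_metadata = f"{{{NS['s']}}}metadata"

    # Index the nc:Metadata blocks by their s:id.
    fields_by_id = {
        metadata.get(s_id): metadata.findtext("nc:SourceIDText", namespaces=NS)
        for metadata in root.findall("nc:Metadata", NS)
    }

    # Follow every s:metadata reference back to its field name.
    return {
        fields_by_id.get(element.get(s_metadata)): element.text
        for element in root.iter()
        if element.get(s_metadata) is not None
    }


print(source_field_names(SAMPLE))
```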

While passing the field name to the business rules engine is half the battle, one must also return the field name with any errors the data validation engine runs into.  An example Schematron code snippet to support returning the field name to the source system in the diagnostics would appear something like the following:

...
...
...
<pattern id="eBasicCiteRules">
  <title>Check the minimum basic citation rules.</title>
  <rule context="cite:CitationBatchDocument/cite:Citation">
    <let name="CiteNumSource" value="/cite:CitationBatchDocument
                                     /nc:Metadata [@s:id = current()
                                     /nc:ActivityIdentification
                                     /nc:IdentificationID/@s:metadata]
                                     /nc:SourceIDText"/>
    <assert test="nc:ActivityIdentification/nc:IdentificationID and
        string-length(normalize-space (nc:ActivityIdentification/nc:IdentificationID))
        &gt; 0" diagnostics="eCiteIdDiag">
            Citations must have a Citation Number.
    </assert>
  </rule>
</pattern>
...
...
...
<diagnostics>
  <diagnostic id="eCiteIdDiag">
    |<value-of select="$CiteNumSource"/>|
    Some technical error description goes here (e.g. XPath to error).
  </diagnostic>
</diagnostics>

What this will yield to the end user is the following error message:

Citations must have a Citation Number.

What the source system will also receive in the case of any errors would look like the following:

|CITE_NUM| Some technical error description goes here (e.g. XPath to error).

The field name passed back could then be used by the source system in helping guide end users to complete the form correctly (e.g. jump to the first field with an error).  In the above example, simple “bar” delimiters are being used (|) but this could of course be changed to proper XML elements through the use of &lt; and &gt; instead.
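
On the source-system side, pulling the field name back out of a bar-delimited diagnostic takes only a few lines.  The following Python sketch assumes the diagnostic arrives as a plain string in the format shown above; the function name split_diagnostic is hypothetical.

```python
def split_diagnostic(text):
    """Split '|FIELD| description' into (field_name, description).

    Returns (None, text) when the diagnostic carries no bar-delimited field.
    """
    stripped = text.strip()
    if not stripped.startswith("|"):
        return None, stripped
    field, separator, description = stripped[1:].partition("|")
    if not separator:
        # Only one bar is present, so there is no complete field marker.
        return None, stripped
    return field.strip(), description.strip()


field, description = split_diagnostic(
    "|CITE_NUM| Some technical error description goes here (e.g. XPath to error)."
)
print(field)
```

The returned field name could then drive the UI, for instance to move focus to the matching form control.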

EDITED 2010-04-01: Adding current() to XPath in Schematron code snippet.

Thursday, February 11, 2010

XSLT: Using the generate-id() Function

NIEM utilizes ID and IDREF attributes heavily throughout the data standard.  While this is native to the W3C specification for XML Schema files (.XSD) and in no way “unique” to NIEM, it is used much more heavily in NIEM than in many other national and international standards.

When converting or transforming to NIEM from another data standard, it quickly becomes necessary to generate unique identifiers in a common and consistent manner for key “noun” elements such as Persons, Places, Vehicles, and the like.  A number of home-grown functions are scattered around the Internet to do this; however, a native XSLT function named generate-id() already exists to perform this task.

Say the following non-NIEM-conformant XML payload is provided to a system processing citation data:

<CitationBatch>
  <Citation>
    <CitationNumber>123456</CitationNumber>
    <CitationDefendant>
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
      <PhoneNumber>123-456-7890</PhoneNumber>
    </CitationDefendant>
    <!-- Remainder Omitted -->
  </Citation>
</CitationBatch>

Within NIEM the <CitationDefendant> element above is termed the <j:CitationSubject> and includes a <nc:RoleOfPersonReference> rather than embedding all person information as child elements within the citation.  Additionally, the phone number for any given person is contained within a <nc:ContactInformation> element. 

The XSLT generate-id() function accepts a specific XML node as its input parameter and, within a single transformation run, will consistently provide a unique ID for that node no matter where or how many times it is called from within the XSLT.  (Processors are not obliged to produce the same IDs from one run to the next.)  For example, take the following XSLT snippets:

    <xsl:for-each select="$xmlInputFile/CitationBatch/Citation">
        <xsl:variable name="xmlCiteNode" select="."/>
        <j:CitationSubject>
            <nc:RoleOfPersonReference>
                <xsl:attribute name="s:ref">
                    <xsl:value-of select="generate-id($xmlCiteNode/CitationDefendant)"/>
                </xsl:attribute>
            </nc:RoleOfPersonReference>
        </j:CitationSubject>
    </xsl:for-each>
    ....
    ....
    ....
    <xsl:for-each select="$xmlInputFile/CitationBatch/Citation/CitationDefendant">
        <xsl:variable name="xmlCiteSubjectNode" select="."/>
        <nc:Person>
            <xsl:attribute name="s:id">
                <xsl:value-of select="generate-id($xmlCiteSubjectNode)"/>
            </xsl:attribute>
        </nc:Person>
    </xsl:for-each>
    ....
    ....

Even though the generate-id() function is called in two places within the transform, using two different variable names, it returns exactly the same ID in both, because the XPath expressions for both variables resolve to the same element in the input document.  The output of the above would appear as follows:

....
....
<j:CitationSubject>
    <nc:RoleOfPersonReference s:ref="d0e8"/>
</j:CitationSubject>
....
....
<nc:Person s:id="d0e8"/>

This powerful XSLT function dramatically eases ID and IDREF usage within XML and makes implementation of transforms to NIEM relatively trivial.
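
For readers who think more readily in a general-purpose language, the "same node, same ID" behaviour can be loosely mimicked in Python (standard library only).  This is an analogy rather than how any XSLT processor implements generate-id(); the class name NodeIds and the ID format are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


class NodeIds:
    """Hand out one stable ID per element, loosely mimicking generate-id()."""

    def __init__(self, prefix="id"):
        self._prefix = prefix
        self._ids = {}

    def id_for(self, element):
        # Elements hash by identity, so the same node always maps to the same ID.
        if element not in self._ids:
            self._ids[element] = f"{self._prefix}{len(self._ids) + 1}"
        return self._ids[element]


SAMPLE = """
<CitationBatch>
  <Citation>
    <CitationNumber>123456</CitationNumber>
    <CitationDefendant>
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
    </CitationDefendant>
  </Citation>
</CitationBatch>
"""

root = ET.fromstring(SAMPLE)
ids = NodeIds()

# Reach the same defendant by two different paths, as the two loops above do.
via_citation = root.find("Citation").find("CitationDefendant")
via_direct_path = root.find("Citation/CitationDefendant")

person_ref = ids.id_for(via_citation)
person_id = ids.id_for(via_direct_path)
print(person_ref == person_id)
```

Both lookups land on the same element object, so the reference and the identifier agree regardless of the path used to reach the node.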

Monday, January 11, 2010

XSLT: Transform Date and Time Elements into nc:DateTime

While NIEM practitioners tend to merge Date and Time elements together into single nc:DateTime elements, we often find that the outside world separates these into two fields in their XML data packages.  For example, if someone were to use Java XForms or Microsoft InfoPath to capture data in an electronic form, it is common to separate these out into their component parts.

Assume a NIBRS report form exists and has discrete date and time values.  Using XSLT to merge these is quite simple and can be done using the concat() function as shown here:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns="SomeNibrsOffenseReportNamespace"
    exclude-result-prefixes="ns">
    
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    
    <xsl:template match="/">
        <xsl:variable name="sInputSchema" select="."/>
        
        <OffenseReportDocument xmlns="SomeNiemOffenseReportNamespace" xmlns:j="http://niem.gov/niem/domains/jxdm/4.0" xmlns:nc="http://niem.gov/niem/niem-core/2.0">
          <j:Offense>
            <nc:ActivityDate>
              <nc:DateTime>
                <xsl:value-of select="concat(string($sInputSchema/ns:NibrsForm/ns:OffenseDate), 'T', string($sInputSchema/ns:NibrsForm/ns:OffenseTime))"/>
              </nc:DateTime>
            </nc:ActivityDate>
          </j:Offense>
        </OffenseReportDocument>
    </xsl:template>
</xsl:stylesheet>

In the above example, the concat() function allows us to merge the date (e.g. ‘2010-01-01’), the letter ‘T’, and the time (e.g. ‘12:00:00’) into a single string which in turn is a valid value for the nc:DateTime element.
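
Note that concat() will happily produce a malformed value such as ‘2010-01-01T’ when the time field is blank, so it can be worth checking the merged string somewhere in the pipeline.  The sketch below is one possible check written in Python (standard library only), under the assumption that only the simple YYYY-MM-DDThh:mm:ss form is used (no fractional seconds or time zone); the function name merge_date_time is hypothetical.

```python
import re
from datetime import datetime

# Simple date-time shape only: no fractional seconds, no time zone.
_SIMPLE_DATE_TIME = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}")


def merge_date_time(date_text, time_text):
    """Join date and time with 'T' and confirm the result is a real date-time."""
    merged = f"{date_text.strip()}T{time_text.strip()}"
    if not _SIMPLE_DATE_TIME.fullmatch(merged):
        raise ValueError(f"not a simple date-time: {merged!r}")
    # Raises ValueError for impossible values such as month 13 or hour 25.
    datetime.strptime(merged, "%Y-%m-%dT%H:%M:%S")
    return merged


print(merge_date_time("2010-01-01", "12:00:00"))
```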

1-13-10 – Edit for typo

Thursday, January 7, 2010

Schematron: Using the Number() Function Versus Casting

There are situations where it becomes necessary to test the value of a numeric element to ensure it meets some minimum or maximum value.  As Schematron is capable of treating any element as a string, it is generally a best practice to cast the value to a numeric data type first. 

For example, on a citation or a complaint document it may be necessary to check the fine or bail amount to ensure it is greater than zero.  This could be done with the following Schematron assert statement:

<assert test="xsd:double(nc:ObligationDueAmount) &gt; 0">
  Bail amount must be greater than zero.
</assert>

While the above would work when a value is provided in the nc:ObligationDueAmount element, an XSLT error would be raised in the following circumstances:

  • Value is blank or null
      • <nc:ObligationDueAmount></nc:ObligationDueAmount>
  • Value is omitted (empty, self-closing element)
      • <nc:ObligationDueAmount/>
  • Value is a string value
      • <nc:ObligationDueAmount>N/A</nc:ObligationDueAmount>

For this reason, it is often preferable to use the native XPath function number().  As described by Ms. Priscilla Walmsley in her O’Reilly book XQuery, this function will prevent the XSLT processor from throwing an error and instead return the value ‘NaN’ (Not a Number).  Because any comparison against NaN evaluates to false, the assertion simply fails rather than halting validation.  The same Schematron test could be written using the number() function as follows:

<assert test="number(nc:ObligationDueAmount) &gt; 0">
  Bail amount must be greater than zero.
</assert>
<assert test="nc:ObligationDueAmount and string-length(nc:ObligationDueAmount) &gt; 0">
  Bail amount may not be left blank or otherwise omitted.
</assert>

While a few more lines are required, this prevents a runtime parser error from being raised and causing havoc with the validation engine.
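
The difference between the two approaches comes down to "raise an error" versus "return NaN and let the comparison fail".  The following Python sketch (standard library only) approximates the XPath 1.0 number() behaviour to show why the comparison is safe; it is an illustration rather than a full XPath implementation, and the function names xpath_number and amount_is_positive are hypothetical.

```python
import math
import re

# XPath 1.0 number syntax: optional minus sign, digits with an optional fraction,
# with optional surrounding whitespace.
_XPATH_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def xpath_number(text):
    """Approximate XPath 1.0 number(): return NaN instead of raising."""
    if text is None or not _XPATH_NUMBER.fullmatch(text):
        return math.nan
    return float(text)


def amount_is_positive(text):
    # Any comparison with NaN is false, so bad input fails the check quietly.
    return xpath_number(text) > 0


for value in ["25.50", "", "N/A", None]:
    print(repr(value), amount_is_positive(value))
```

Blank, missing, and non-numeric values all simply fail the check, which mirrors how the number()-based assert reports a validation failure without raising a runtime error.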