Wednesday, March 31, 2010

Schematron: Supporting Form Validation in a Data Validation World

NIEM is, for all intents and purposes, a highly object-oriented data model that may or may not be used by form entry tools at the time of data capture.  While this has enormous benefits, it can be detrimental if one wants to use Schematron for both form and data validation in conjunction with NIEM.

To properly support form validation, source systems will often want to know exactly which field caused an error while the business rules surrounding a form were being processed. In many systems the map between the source fields and NIEM is readily available. Where it is not, or where processing speed is critical, the data validation engine should be capable of furnishing this information back to the calling system.

Let's take the example of a citation data capture tool with the following example data entry UI:

[Image: example citation data entry UI]

NIEM supports passing a “footnote” on every element by way of the nc:Metadata element.  nc:Metadata is a complex data type that in turn includes an element, nc:SourceIDText, that can store the source system’s field name.  Each data element points to its metadata block through the s:metadata attribute, whose value matches the s:id of that nc:Metadata element.  A NIEM-conformant XML instance that passes this information would look something like the following:

<ns:CitationBatchDocument> 
  ...
  ...
  ...
  <ns:Citation> 
    ...
    ...
    <!-- Citation Number --> 
    <nc:ActivityIdentification> 
      <nc:IdentificationID s:metadata="M1">ABC123</nc:IdentificationID>
    </nc:ActivityIdentification> 
    <!-- Citation Date --> 
    <nc:ActivityDate> 
      <nc:Date s:metadata="M2">2002-05-30</nc:Date>
    </nc:ActivityDate> 
    ...
    ...
  </ns:Citation> 
  ...
  ...
  ...
  <nc:Metadata s:id="M1">
    <nc:SourceIDText>CITE_NUM</nc:SourceIDText> 
  </nc:Metadata> 
  <nc:Metadata s:id="M2">
    <nc:SourceIDText>CITE_DATE</nc:SourceIDText> 
  </nc:Metadata> 
  ...
  ...
</ns:CitationBatchDocument>
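
To make the link between s:metadata and nc:Metadata concrete, here is a minimal XSLT 1.0 sketch that lists each referencing element next to the source field name it resolves to. It is only an illustration, not part of any NIEM tooling: the key name metadataById is made up, the namespace URIs are assumed to be the NIEM 2.0 ones, and each s:metadata attribute is assumed to hold a single ID.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:nc="http://niem.gov/niem/niem-core/2.0"
    xmlns:s="http://niem.gov/niem/structures/2.0">
  <!-- Namespace URIs above are assumed (NIEM 2.0); adjust to the release in use -->

  <xsl:output method="text"/>

  <!-- Index every nc:Metadata block by its s:id (key name is hypothetical) -->
  <xsl:key name="metadataById" match="nc:Metadata" use="@s:id"/>

  <xsl:template match="/">
    <!-- Visit every element that carries an s:metadata reference -->
    <xsl:for-each select="//*[@s:metadata]">
      <!-- Element name as written in the instance -->
      <xsl:value-of select="name()"/>
      <xsl:text> : </xsl:text>
      <!-- Source field name from the referenced nc:Metadata block -->
      <xsl:value-of select="key('metadataById', @s:metadata)/nc:SourceIDText"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against an instance like the one above, this should pair nc:IdentificationID with CITE_NUM and nc:Date with CITE_DATE.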

While passing the field name to the business rules engine is half the battle, one must also return the field name with any errors the data validation engine runs into.  An example Schematron code snippet that returns the field name to the source system in the diagnostics would look something like the following:

...
...
...
<pattern id="eBasicCiteRules">
  <title>Check the minimum basic citation rules.</title>
  <rule context="cite:CitationBatchDocument/cite:Citation">
    <let name="CiteNumSource" value="/cite:CitationBatchDocument
                                     /nc:Metadata [@s:id = current()
                                     /nc:ActivityIdentification
                                     /nc:IdentificationID/@s:metadata]
                                     /nc:SourceIDText"/>
    <assert test="nc:ActivityIdentification/nc:IdentificationID and
        string-length(normalize-space (nc:ActivityIdentification/nc:IdentificationID))
        &gt; 0" diagnostics="eCiteIdDiag">
            Citations must have a Citation Number.
    </assert>
  </rule>
</pattern>
...
...
...
<diagnostics>
  <diagnostic id="eCiteIdDiag">
    |<value-of select="$CiteNumSource"/>|
    Some technical error description goes here (e.g. XPath to error).
  </diagnostic>
</diagnostics>

What this will yield to the end user is the following error message:

Citations must have a Citation Number.

In the case of an error, the source system will also receive something like the following:

|CITE_NUM| Some technical error description goes here (e.g. XPath to error).
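
The pattern and diagnostics shown earlier are fragments, so the prefixes cite, nc, and s still have to be declared somewhere. The following is a rough sketch of the enclosing schema, assuming ISO Schematron with the XSLT query binding (which is what makes current() available). The cite URI is a placeholder, and the NIEM URIs are assumed to be the NIEM 2.0 ones.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <title>Citation batch business rules</title>

  <!-- Placeholder URI: use the target namespace of your own exchange schema -->
  <ns prefix="cite" uri="http://example.com/citation/1.0"/>
  <!-- Assumed NIEM 2.0 URIs: adjust to the NIEM release in use -->
  <ns prefix="nc" uri="http://niem.gov/niem/niem-core/2.0"/>
  <ns prefix="s" uri="http://niem.gov/niem/structures/2.0"/>

  <pattern id="eBasicCiteRules">
    <!-- title and rule elements from the fragment shown earlier go here -->
  </pattern>

  <diagnostics>
    <!-- diagnostic elements from the fragment shown earlier go here -->
  </diagnostics>
</schema>
```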

The field name passed back could then be used by the source system to help guide end users in completing the form correctly (e.g. jump to the first field with an error).  In the above example, simple “bar” delimiters (|) are used, but this could of course be changed to proper XML elements through the use of &lt; and &gt; instead.
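
As a rough sketch of that idea, the diagnostic could wrap the field name in escaped markup rather than bars. SourceField and Detail are hypothetical names chosen purely for illustration, and how the escaped brackets are finally serialized depends on the Schematron implementation and its output format.

```xml
<diagnostics>
  <!-- SourceField and Detail are hypothetical wrapper names, not NIEM or Schematron vocabulary -->
  <diagnostic id="eCiteIdDiag">
    &lt;SourceField&gt;<value-of select="$CiteNumSource"/>&lt;/SourceField&gt;
    &lt;Detail&gt;Some technical error description goes here (e.g. XPath to error).&lt;/Detail&gt;
  </diagnostic>
</diagnostics>
```

The source system can then read the field name with an ordinary XML parser instead of splitting the message on a delimiter.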

EDITED 2010-04-01: Adding current() to XPath in Schematron code snippet.

Thursday, February 11, 2010

XSLT: Using the generate-id() Function

NIEM utilizes ID and IDREF attributes heavily throughout the data standard.  While this mechanism is native to the W3C specification for XML Schema files (.XSD) and in no way “unique” to NIEM, it is used much more heavily in NIEM than in many other national and international standards.

When converting or transforming to NIEM from another data standard, it quickly becomes necessary to generate unique identifiers in a common and consistent manner for key “noun” elements such as Persons, Places, Vehicles, and the like.  A number of home-grown functions are scattered around the Internet to do this; however, a native XSLT function called generate-id() already exists to perform this task.

Say the following non-NIEM-conformant XML payload is provided to a system processing citation data:

<CitationBatch>
  <Citation>
    <CitationNumber>123456</CitationNumber>
    <CitationDefendant>
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
      <PhoneNumber>123-456-7890</PhoneNumber>
    </CitationDefendant>
    <!-- Remainder Omitted -->
  </Citation>
</CitationBatch>

Within NIEM the <CitationDefendant> element above is termed the <j:CitationSubject> and includes a <nc:RoleOfPersonReference> rather than embedding all person information as child elements within the citation.  Additionally, the phone number for any given person is contained within a <nc:ContactInformation> element. 

The XSLT generate-id() function accepts a specific XML node as its input parameter and, within a single transformation, will consistently provide a unique ID for that node no matter where or how many times it is called from within the XSLT.  For example, take the following XSLT snippets:

    <xsl:for-each select="$xmlInputFile/CitationBatch/Citation">
        <xsl:variable name="xmlCiteNode" select="."/>
        <j:CitationSubject>
            <nc:RoleOfPersonReference>
                <xsl:attribute name="s:ref">
                    <xsl:value-of select="generate-id($xmlCiteNode/CitationDefendant)"/>
                </xsl:attribute>
            </nc:RoleOfPersonReference>
        </j:CitationSubject>
    </xsl:for-each>
    ....
    ....
    ....
    <xsl:for-each select="$xmlInputFile/CitationBatch/Citation/CitationDefendant">
        <xsl:variable name="xmlCiteSubjectNode" select="."/>
        <nc:Person>
            <xsl:attribute name="s:id">
                <xsl:value-of select="generate-id($xmlCiteSubjectNode)"/>
            </xsl:attribute>
        </nc:Person>
    </xsl:for-each>
    ....
    ....

Even though the generate-id() function is called in two places within the transform, using two different variable names, it returns exactly the same unique ID both times because the XPath for both variables resolves to the same element in the input document.  The output of the above would appear as the following:

....
....
<j:CitationSubject>
    <nc:RoleOfPersonReference s:ref="d0e8"/>
</j:CitationSubject>
....
....
<nc:Person s:id="d0e8"/>
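
For readers who want to try this end to end, the snippets can be assembled into a small stand-alone stylesheet along the following lines. This is only a sketch under stated assumptions: the namespace URIs are assumed to be the NIEM 2.0 ones, Batch is a hypothetical wrapper element that exists only to keep the output well-formed, and attribute value templates are used as a shorthand for xsl:attribute. The result is not a conformant NIEM instance.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:j="http://niem.gov/niem/domains/jxdm/4.0"
    xmlns:nc="http://niem.gov/niem/niem-core/2.0"
    xmlns:s="http://niem.gov/niem/structures/2.0">
  <!-- Namespace URIs above are assumed (NIEM 2.0); adjust to the release in use -->

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/CitationBatch">
    <!-- Batch is a hypothetical wrapper, present only so the output has one root -->
    <Batch>
      <!-- Pass 1: one citation subject per citation, referencing its defendant -->
      <xsl:for-each select="Citation">
        <j:CitationSubject>
          <nc:RoleOfPersonReference s:ref="{generate-id(CitationDefendant)}"/>
        </j:CitationSubject>
      </xsl:for-each>

      <!-- Pass 2: one person per defendant, identified by the same generated ID -->
      <xsl:for-each select="Citation/CitationDefendant">
        <nc:Person s:id="{generate-id(.)}"/>
      </xsl:for-each>
    </Batch>
  </xsl:template>
</xsl:stylesheet>
```

Two caveats apply. The exact ID strings differ between XSLT processors and are not guaranteed to repeat from one run to the next, so they are best treated as internal to a single output document rather than stored. Also, generate-id() returns an empty string when given an empty node-set, so a Citation without a CitationDefendant would produce an empty s:ref in this sketch.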

This powerful function within XSLT dramatically eases ID and IDREF usage within XML and makes implementing transforms to NIEM relatively trivial.