Friday, November 13, 2009

Schematron: Validating NIEM Documents Against Non-Conformant Code Lists

Schematron rules and assertions are built on XPath expressions, which provide powerful XML querying capabilities. This section uses two of them, the doc() function (available from XPath 2.0) and XPath predicates, to validate data captured in a NIEM XML instance against an external code list of any kind.

Let's assume a scenario in which we want to validate an exchange document’s category against a predefined list of enumerated values. The list is maintained by an outside party in a format other than NIEM, and it changes fairly regularly.

Traditionally, a NIEM practitioner would take this list and define an enumeration within an extension schema to enforce it. Each time the third party changed the code list, an updated NIEM extension schema would have to be created and redistributed. Because that maintenance-intensive process could become overwhelming, the team chose instead to adopt the third-party list as is and rely on Schematron to perform the validation. The list is kept in the following non-conformant format:

<?xml version="1.0" encoding="UTF-8"?>
<!-- List of Valid code Values -->
<CategoryList>
  <Category>a</Category>
  <Category>b</Category>
</CategoryList>
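For comparison, the traditional approach described earlier would hard-code the same two values in an extension schema. The fragment below is only a rough sketch of that idea: the type name DocumentCategoryCodeSimpleType and the http://example.com/extension/1.0 namespace are hypothetical, and the NIEM-specific annotations and structures a conformant schema would carry are left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical extension schema fragment; names and namespace are illustrative only. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:ext="http://example.com/extension/1.0"
    targetNamespace="http://example.com/extension/1.0">
  <!-- Every change to the third-party list means editing and redistributing this type. -->
  <xsd:simpleType name="DocumentCategoryCodeSimpleType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="a"/>
      <xsd:enumeration value="b"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```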

The third-party list above allows only the values “a” and “b”.  An example of a NIEM-conformant XML payload would look something like the following:

<ns:SomeDocument
    xmlns:nc="http://niem.gov/niem/niem-core/2.0"
    xmlns:ns="http://www.niematron.org/SchematronTestbed"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.niematron.org/SchematronTestbed  ./SomeDocument.xsd">
  <nc:DocumentCategoryText>A</nc:DocumentCategoryText>
  <!-- Remaining Document Elements Omitted -->
</ns:SomeDocument>

In this example, the developers want the validation to ignore case. The Schematron rule that validates nc:DocumentCategoryText against the third-party list would therefore look something like the following:

<pattern id="eDocumentCategory">
  <title>Verify the document category matches the external list of valid categories.</title>
  <rule context="/ns:SomeDocument">
    <let name="sText" value="lower-case(nc:DocumentCategoryText)"/>
    <assert test="count(doc('./CategoryList.xml')/CategoryList/Category[. = $sText]) &gt; 0">
      Invalid document category.
    </assert>
  </rule>
</pattern>

Let's break the Schematron example above into its key statements.

  • lower-case(nc:DocumentCategoryText) – This statement, wrapped in a <let> tag, converts the text in the NIEM payload to lower case so that differences in case alone do not cause a mismatch. The result is stored in a temporary variable named $sText. Because only the payload value is converted, this assumes the code list itself holds lower-case values.
  • doc('./CategoryList.xml')/… – This points the parser at the third-party file (assumed in this example to be in the same directory as the .sch file) so that the XPath can reference elements from that file in addition to elements in the source payload document.
  • …/Category[. = $sText] – Square brackets ([ and ]) in an XPath expression enclose a predicate. Any number of predicates can be used to filter the nodes an XPath selects; here, the predicate tells the parser to select only the Category elements whose value equals the variable $sText.
  • count(…) &gt; 0 – The XPath count function returns the number of nodes the expression selects. If the category has no match, count returns zero, so the assertion requires a value greater than zero, meaning a match exists in the external code list. (&gt; is the escaped form of the > character.)
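To run, the pattern has to sit inside a complete Schematron schema. A minimal sketch of such a file is shown here, assuming an ISO Schematron processor that supports the xslt2 query binding; the schema title is made up for illustration, while the namespace URIs are taken from the payload example and the pattern is repeated unchanged.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a complete schema file; the wrapper is an assumption, the pattern is from the example above. -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Illustrative title only. -->
  <title>SomeDocument code list checks</title>

  <!-- Every prefix used in the XPath expressions must be declared with an ns element. -->
  <ns prefix="nc" uri="http://niem.gov/niem/niem-core/2.0"/>
  <ns prefix="ns" uri="http://www.niematron.org/SchematronTestbed"/>

  <pattern id="eDocumentCategory">
    <title>Verify the document category matches the external list of valid categories.</title>
    <rule context="/ns:SomeDocument">
      <!-- Lower-case the payload value so the comparison ignores case. -->
      <let name="sText" value="lower-case(nc:DocumentCategoryText)"/>
      <!-- Passes when at least one Category in the external file equals $sText. -->
      <assert test="count(doc('./CategoryList.xml')/CategoryList/Category[. = $sText]) &gt; 0">
        Invalid document category.
      </assert>
    </rule>
  </pattern>
</schema>
```

The queryBinding="xslt2" attribute matters because doc() and lower-case() are XPath 2.0 functions and are not available under an XPath 1.0 binding.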
