Wednesday, March 31, 2010

Schematron: Supporting Form Validation in a Data Validation World

NIEM is for all intents and purposes a highly object-oriented data model which may, or may not be used by form entry tools at the time of data capture.  While this has enormous benefits, it can be detrimental if one wants to use Schematron for BOTH form and data validation in conjunction with NIEM.

In order to properly support form validation, source systems will often want to know exactly which field caused an error during the processing of business rules surrounding a form. While in many systems, the map between the source fields and the NIEM may be readily available, in cases where it is not or processing speed is critical, the data validation engine should be capable of furnishing this information back to the calling system.

Lets take the example of a citation data capture tool with the following example data entry UI:

image

NIEM supports the passing of a “footnote” on every element called the nc:Metadata element.  nc:Metadata is a complex data type that in turn includes an element to store the source-system’s field name called nc:SourceIDText.  The NIEM conformant XML instance to pass this would look something like the following:

<ns:CitationBatchDocument> 
  ...
  ...
  ...
  <ns:Citation> 
    ...
    ...
    <!-- Citation Number --> 
    <nc:ActivityIdentification> 
      <nc:IdentificationID s:metadata=”M1”>ABC123</nc:IdentificationID> 
    </nc:ActivityIdentification> 
    <!-- Citation Date --> 
    <nc:ActivityDate> 
      <nc:Date s:metadata=”M2”>2002-05-30</nc:Date> 
    </nc:ActivityDate> 
    ...
    ...
  </ns:Citation> 
  ...
  ...
  ...
  <nc:Metadata s:id=”M1”> 
    <nc:SourceIDText>CITE_NUM</nc:SourceIDText> 
  </nc:Metadata> 
  <nc:Metadata s:id=”M2”> 
    <nc:SourceIDText>CITE_DATE</nc:SourceIDText> 
  </nc:Metadata> 
  ...
  ...
</ <ns:CitationBatchDocument>

While passing the field name to the business rules engine is 1/2 the battle, one must also return the field name with any errors the data validation engine runs into.  An example Schematron code snippet to support returning the field name to the source system in the diagnostics would appear something like the following:

...
...
...
<pattern id="eBasicCiteRules">
  <title>Check the minimum basic citation rules.</title>
  <rule context="cite:CitationBatchDocument/cite:Citation">
    <let name="CiteNumSource" value="/cite:CitationBatchDocument
                                     /nc:Metadata [@s:id = current()
                                     /nc:ActivityIdentification
                                     /nc:IdentificationID/@s:metadata]
                                     /nc:SourceIDText"/>
    <assert test="nc:ActivityIdentification/nc:IdentificationID and
        string-length(normalize-space (nc:ActivityIdentification/nc:IdentificationID))
        &gt; 0" diagnostics="eCiteIdDiag">
            Citations must have a Citation Number.
    </assert>
  </rule>
</pattern>
...
...
...
<diagnostics>
  <diagnostic id="eCiteIdDiag">
    |<value-of select=”@CiteNumSource”/>|
    Some technical error description goes here (e.g. XPath to error).
  </diagnostic>
</diagnostics>

What this will yield to the end user is the following error message:

Citations must have a Citation Number.

What the source system will also receive in the case of any errors would look like the following:

|CITE_NUM| Some technical error description goes here (e.g. XPath to error).

The field name passed back could then be used by the source system in helping guide end users to complete the form correctly (e.g. jump to the first field with an error).  In the above example, simple “bar” delimiters are being used (|) but this could of course be changed to proper XML elements through the use of &gt; and $lt; instead.

EDITED 2010-04-01: Adding current() to XPath in Schematron code snippet.