NIEMatron: Schematron

Wednesday, October 17, 2012

Schematron: Using Schematron Client-Side!

As browsers continue to evolve and incorporate more standards, it is starting to become possible to better leverage XML directly in client-side browsers.

A great series of four articles has been posted by a colleague on how to leverage XML in forms capture and even use Schematron to help validate it! Check it out: http://udiminished.blogspot.com/2011/12/simplify-with-xml-data-model-part-1.html

Friday, October 29, 2010

Editorial: Growing Support for Asynchronous Transactions

It's always exciting to see a prediction come true... especially when it's one of your own. For a number of years now it has been obvious that the simple synchronous web services taught in books and college courses were never going to be sufficient to support the highly complex transactions required in the real world. This is particularly true of transactions which require two-phase validation, such as those that leverage Schematron in addition to simple XML Schema validation.

It appears that others are coming to agree with this premise. Microsoft is now publishing its Visual Studio Async in order to better handle these sorts of highly complex transactions, which incur higher latency than a traditional synchronous request/response web service. InfoWorld has a good article here that describes Microsoft's adoption of this new trend.

While I am not endorsing any specific platform, Microsoft's development tools have historically been a great indicator of market trends. This latest addition is a good sign that the industry is headed towards better support for the highly complex transaction processing world in which we live.

Wednesday, March 31, 2010

Schematron: Supporting Form Validation in a Data Validation World

NIEM is, for all intents and purposes, a highly object-oriented data model which may or may not be used by form entry tools at the time of data capture. While this has enormous benefits, it can be detrimental if one wants to use Schematron for BOTH form and data validation in conjunction with NIEM.

In order to properly support form validation, source systems will often want to know exactly which field caused an error during the processing of business rules surrounding a form. While in many systems the map between the source fields and NIEM may be readily available, in cases where it is not, or where processing speed is critical, the data validation engine should be capable of furnishing this information back to the calling system.

Let's take the example of a citation data capture tool with the following example data entry UI:

NIEM supports the passing of a “footnote” on every element called the nc:Metadata element. nc:Metadata is a complex data type that in turn includes an element to store the source-system’s field name called nc:SourceIDText. The NIEM conformant XML instance to pass this would look something like the following:

<ns:CitationBatchDocument> 
  ...
  ...
  ...
  <ns:Citation> 
    ...
    ...
    <!-- Citation Number --> 
    <nc:ActivityIdentification> 
      <nc:IdentificationID s:metadata="M1">ABC123</nc:IdentificationID> 
    </nc:ActivityIdentification> 
    <!-- Citation Date --> 
    <nc:ActivityDate> 
      <nc:Date s:metadata="M2">2002-05-30</nc:Date> 
    </nc:ActivityDate> 
    ...
    ...
  </ns:Citation> 
  ...
  ...
  ...
  <nc:Metadata s:id="M1"> 
    <nc:SourceIDText>CITE_NUM</nc:SourceIDText> 
  </nc:Metadata> 
  <nc:Metadata s:id="M2"> 
    <nc:SourceIDText>CITE_DATE</nc:SourceIDText> 
  </nc:Metadata> 
  ...
  ...
</ns:CitationBatchDocument>

While passing the field name to the business rules engine is half the battle, one must also return the field name with any errors the data validation engine runs into. An example Schematron code snippet that returns the field name to the source system in the diagnostics would look something like the following:

...
...
...
<pattern id="eBasicCiteRules">
  <title>Check the minimum basic citation rules.</title>
  <rule context="cite:CitationBatchDocument/cite:Citation">
    <let name="CiteNumSource" value="/cite:CitationBatchDocument
                                     /nc:Metadata [@s:id = current()
                                     /nc:ActivityIdentification
                                     /nc:IdentificationID/@s:metadata]
                                     /nc:SourceIDText"/>
    <assert test="nc:ActivityIdentification/nc:IdentificationID and
        string-length(normalize-space (nc:ActivityIdentification/nc:IdentificationID))
        &gt; 0" diagnostics="eCiteIdDiag">
            Citations must have a Citation Number.
    </assert>
  </rule>
</pattern>
...
...
...
<diagnostics>
  <diagnostic id="eCiteIdDiag">
    |<value-of select="$CiteNumSource"/>|
    Some technical error description goes here (e.g. XPath to error).
  </diagnostic>
</diagnostics>

What this will yield to the end user is the following error message:

Citations must have a Citation Number.

What the source system will also receive in the case of any errors would look like the following:

|CITE_NUM| Some technical error description goes here (e.g. XPath to error).

The field name passed back could then be used by the source system to help guide end users to complete the form correctly (e.g. jump to the first field with an error). In the above example, simple “bar” delimiters (|) are being used, but this could of course be changed to proper XML elements through the use of &lt; and &gt; instead.
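
On the receiving side, the source system has to pull the field name back out of the diagnostic text. The following is only a minimal Python sketch of that step, not part of any Schematron engine. The function name parse_diagnostic is hypothetical, and it assumes the diagnostic always begins with the field name between two bars, as in the example above.

```python
import re

# Leading |FIELD| token followed by the free-text technical description.
_DIAGNOSTIC = re.compile(r"\s*\|([^|]*)\|\s*(.*)", re.DOTALL)


def parse_diagnostic(text):
    """Split '|FIELD| description' into (field_name, description).

    Returns (None, text) when no bar-delimited field name is present.
    """
    match = _DIAGNOSTIC.fullmatch(text)
    if match is None:
        return None, text.strip()
    field, detail = match.groups()
    return field.strip(), detail.strip()
```

The returned field name (CITE_NUM in the example) could then be mapped to a form control, for instance to move focus to the first field in error.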

EDITED 2010-04-01: Adding current() to XPath in Schematron code snippet.

Thursday, January 7, 2010

Schematron: Using the Number() Function Versus Casting

There are situations where it becomes necessary to test the value of a numeric element to ensure it meets some minimum or maximum value. As Schematron is capable of treating any element as a string, it is generally a best practice to cast the value to a numeric data type first.

For example, on a citation or a complaint document it may be necessary to check the fine or bail amount to ensure it is greater than zero. This could be done with the following Schematron assert statement:

<assert test="xsd:double(nc:ObligationDueAmount) &gt; 0">
  Bail amount must be greater than zero.
</assert>

While the above would work when a value is provided in the nc:ObligationDueAmount element, an XSLT error would be raised in the following circumstances:

Value is blank or null
Value is omitted
Value is a string value

For this reason, it is often preferable to use the native XPath function number(). As described by Ms. Priscilla Walmsley in her O’Reilly book XQuery, this function will prevent the XSLT parser from throwing an error and instead return the value ‘NaN’ (Not a Number). The same Schematron test could be written using the number() function as follows:

<assert test="number(nc:ObligationDueAmount) &gt; 0">
  Bail amount must be greater than zero.
</assert>
<assert test="nc:ObligationDueAmount and string-length(nc:ObligationDueAmount) &gt; 0">
  Bail amount may not be left blank or otherwise omitted.
</assert>

While a few more lines are required, this prevents a runtime parser error from being raised and causing havoc with the validation engine.
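
To make the difference concrete, here is a rough Python analogy (not Schematron, and only an approximation of the XPath rules). The names strict_cast and lenient_number are hypothetical. The first raises on bad input the way a cast does, while the second returns NaN the way number() does, and any comparison against NaN is simply false.

```python
import math
import re

# Simple decimal lexical form; real XPath casting accepts more (e.g. exponents).
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def strict_cast(text):
    """Behave like a cast: raise when the text is not a number."""
    cleaned = text.strip()
    if not _DECIMAL.fullmatch(cleaned):
        raise ValueError("not a number: %r" % text)
    return float(cleaned)


def lenient_number(text):
    """Behave like number(): return NaN instead of raising."""
    if text is None:  # element omitted
        return math.nan
    try:
        return strict_cast(text)
    except ValueError:
        return math.nan


def amount_is_positive(text):
    # NaN > 0 is False, so blank, omitted, and non-numeric values all fail quietly.
    return lenient_number(text) > 0
```

Because the NaN comparison fails without saying why, the second assert in the example above is still needed to tell the user the value was left blank.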

Tuesday, December 8, 2009

Schematron: Trim Whitespace When Testing String Length

As discussed in previous posts, the string-length() > 0 test is useful for making sure null values are not passed, a validation step that raw XSD does not natively perform. This ensures the following is NOT allowed:

<nc:PersonGivenName></nc:PersonGivenName>

It also prevents the following:

<nc:PersonGivenName/>

However, if white space is not trimmed, the following WOULD be allowed:

<nc:PersonGivenName>  </nc:PersonGivenName>

In order to trim leading and trailing white space, the built-in XPath function normalize-space() can be used. This in effect eliminates the above scenario where only spaces have been inserted into the string. This can be seen in the following example:

<assert test="string-length(normalize-space(nc:PersonGivenName)) &gt; 0">
  Person's first name may not be left blank.
</assert>

Be aware that this function also collapses runs of white space between characters (including repeated carriage returns and line feeds) into a single space, so a custom replace() expression may be required if you wish to preserve those extra characters in your string length checks.
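
For readers who want to see the effect on string length, the following Python sketch approximates what normalize-space() does. It is an illustration only: the function name is made up for this example, and it treats space, tab, carriage return and line feed as white space, which is the XML definition.

```python
import re

# XML white space: space, tab, carriage return, line feed.
_XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def normalize_space(text):
    """Collapse white space runs to one space and trim both ends."""
    return _XML_WHITESPACE.sub(" ", text).strip(" ")


def is_blank(text):
    """True when the value would fail the string-length(...) > 0 test."""
    return len(normalize_space(text)) == 0
```

Note how a value such as "Mary   Ann" comes back as "Mary Ann", two characters shorter than what was sent. That is the side effect to keep in mind when the check is about length rather than emptiness.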

Monday, November 23, 2009

Schematron: Nesting XPath Values Within an XPath Predicate

In previous examples, we have seen the usage of a temporary variable or <let> tag to store a value which is later used in an XPath predicate (the square brackets that filter the nodes selected by a step). It is important to note that this is not required. An XPath expression can be used directly inside the predicate of another XPath expression. For example, see the following:

<pattern id="wEmptyMetadataComment">
  <title>Ensure person metadata comment is not blank.</title>
  <rule context="/ns:SomeDocument/nc:Person">
    <assert test="string-length(/ns:SomeDocument/nc:Metadata[@s:id=current()/@s:metadata]/nc:CommentText) &gt; 0">
      Comments regarding a person should not be blank.
    </assert>
  </rule>
</pattern>

In the above example, the predicate @s:id=current()/@s:metadata is all that is needed to identify which specific Metadata element should be examined. With the context defined as /ns:SomeDocument/nc:Person, the rule will loop through each nc:Person element, and current() refers to the nc:Person being processed, so the appropriate @s:metadata value is used in each subsequent pass.

[Updated: Corrected Syntax on 04-01-2010]

Friday, November 13, 2009

Schematron: Validating NIEM Documents Against Non-Conformant Code Lists

Schematron rules and assertions are based upon XPath statements, which allow for a number of powerful XML querying capabilities. Two XPath capabilities leveraged and outlined in this section are doc() and XPath predicates, which allow us to validate data captured in a NIEM XML instance against an external code list of any kind.

Let's assume a scenario where we would like to validate an exchange document’s category against a predefined list of enumerated values. This list is maintained by an outside party in a format other than NIEM and changes on a fairly regular basis.

Traditionally, a NIEM practitioner would take this list and define an enumeration within an extension schema to enforce this code list. Each time the third party makes a change to that code list, an updated NIEM extension schema would be created and redistributed. This maintenance-intensive process could become overwhelming. The team therefore chose to simply adopt the third-party list and keep it in the following non-conformant format, relying on Schematron to perform the validation:

<?xml version="1.0" encoding="UTF-8"?>
<!-- List of Valid code Values -->
<CategoryList>
  <Category>a</Category>
  <Category>b</Category>
</CategoryList>

As shown in the above, the valid categories include the values “a” and “b”. An example of a NIEM-conformant XML payload would look something like the following:

<ns:SomeDocument 
    xmlns:nc="http://niem.gov/niem/niem-core/2.0"    
    xmlns:ns="http://www.niematron.org/SchematronTestbed"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.niematron.org/SchematronTestbed  ./SomeDocument.xsd">
  <nc:DocumentCategoryText>A</nc:DocumentCategoryText>
  <!-- Remaining Document Elements Omitted -->
</ns:SomeDocument>

In this example, the developers would like to perform the validation ignoring case. The Schematron rule to validate the nc:DocumentCategoryText against the third-party-provided list would therefore look something like the following:

<pattern id="eDocumentCategory">
  <title>Verify the document category matches the external list of valid categories.</title>
  <rule context="/ns:SomeDocument">
    <let name="sText" value="lower-case(nc:DocumentCategoryText)"/>
    <assert test="count(doc('./CategoryList.xml')/CategoryList/Category[. = $sText]) &gt; 0">
      Invalid document category.
    </assert>
  </rule>
</pattern>

Let's look at some of the key statements in the above Schematron example, breaking it into individual parts.

lower-case(nc:DocumentCategoryText) – This statement, encapsulated in a <let> tag, converts the text in the NIEM payload to lower case, thereby ignoring deviations from the code list due to case. It is then stored in a temporary variable named $sText.
doc('./CategoryList.xml')/… – This effectively points the parser at the third-party-provided file (in this example assumed to be in the same directory as the .sch file) so that elements from that file can be referenced in the XPath in addition to elements in the source payload document.
…/Category[. = $sText] – The usage of the square brackets ([ and ]) in an XPath statement is considered a predicate. Any number of predicates can be used to help filter the values selected by an XPath, but in this case, the expression tells the parser to select all of the Category elements with the value contained in the variable $sText.
count(…) > 0 – The XPath count() function returns the number of items selected by the expression. If no match to the category existed, count() would return zero; therefore we want to ensure the value is greater than zero, meaning a match exists in the external code list.
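
The same four steps can be followed outside of Schematron as well. The Python sketch below is only an analogy using the standard library's ElementTree. The function name is hypothetical, and the code list is embedded as a string here rather than loaded from ./CategoryList.xml.

```python
import xml.etree.ElementTree as ET

# Stand-in for the third-party file; in the post it lives in CategoryList.xml.
CODE_LIST_XML = """<CategoryList>
  <Category>a</Category>
  <Category>b</Category>
</CategoryList>"""


def is_valid_category(category_text, code_list_xml=CODE_LIST_XML):
    """Return True when the category appears in the external code list."""
    wanted = category_text.lower()            # like lower-case(...)
    code_list = ET.fromstring(code_list_xml)  # like doc(...)
    matches = [
        category
        for category in code_list.findall("Category")
        if (category.text or "") == wanted    # like Category[. = $sText]
    ]
    return len(matches) > 0                   # like count(...) > 0
```

With the sample payload above, the value "A" passes because it is lower-cased before the comparison, while a value such as "c" does not.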

Friday, November 6, 2009

Schematron: Enforce String Patterns in Schematron

In the general area of XML schemas, XSD “patterns” are commonly used to enforce special string formatting constraints. This is a very powerful tool when a document recipient wishes to ensure that the sender provides string data in a consistent format. A common example of a string constraint is validating the structure of a Social Security Number (SSN). This would be expressed in a typical schema in the following manner:

<xsd:simpleType name="SsnSimpleType">
    <xsd:restriction base="xsd:string">
        <xsd:pattern value="[0-9]{3}[\-][0-9]{2}[\-][0-9]{4}" />
    </xsd:restriction>
</xsd:simpleType>

As with most parts of NIEM, much of the model is based on inheritance, which makes enforcement of simple data types, such as that shown above, cumbersome and awkward. Semantically, the correct element for an SSN would be under:

nc:Person/nc:PersonSSNIdentification/nc:IdentificationID

Since nc:PersonSSNIdentification is an nc:IdentificationType, if one were to enforce SSN formatting on nc:IdentificationID, any other part of the schema that is derived from nc:IdentificationType would also need to abide by the same rules (e.g. Driver License Number, State ID Number, Document Identification, etc.). In the past this situation led to one thing. . . extension.

With Schematron, extension for this purpose could be avoided. Rather than enforcing the string constraints in the XSD file, the IEPD publisher could enforce this constraint within the Schematron rules document instead. The following is an example of what code would be required in Schematron to accomplish this purpose:

<pattern id="ePersonSSN">
  <title>Verify person social security number is in the correct format.</title>
  <rule context="/ns:SomeDocument/nc:Person/nc:PersonSSNIdentification">
    <assert test=
      "count(tokenize(nc:IdentificationID,'[0-9]{3}-[0-9]{2}-[0-9]{4}')) 
      - 1 = 1">
       Social security number must be in the proper format (e.g. 111-22-3333).
    </assert>
  </rule>
</pattern>

By using the Schematron approach, the semantically equivalent element is preserved in the schema and only the appropriate identifier is subjected to the constraint.

This approach can be further extended to address any number of string constraints. Another example would be ensuring an identification number only contains digits and has a string length of 5 or more. This could be done by using the following XPath count() expression instead:

count(tokenize(nc:IdentificationID, '\d')) > 5

This very powerful approach to constraining strings is yet another reason to take a real good look at Schematron in conjunction with your NIEM IEPDs.
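
One caveat worth illustrating: counting tokens tells you that the pattern occurs once, but on its own it does not reject extra characters around the match. The Python sketch below (an analogy only, with made-up function names, and Python's re module standing in for XPath regular expressions) compares the token-count check with a whole-string match. In XPath 2.0 the whole-string variant would likely be written with the matches() function and ^ and $ anchors.

```python
import re

SSN_PATTERN = r"[0-9]{3}-[0-9]{2}-[0-9]{4}"


def token_count_check(value):
    """Mirror count(tokenize(value, pattern)) - 1 = 1."""
    return len(re.split(SSN_PATTERN, value)) - 1 == 1


def is_ssn(value):
    """Whole-string match: nothing may precede or follow the SSN."""
    return re.fullmatch(SSN_PATTERN, value) is not None


def is_numeric_id(value, min_digits=5):
    """Digits only, with at least min_digits of them."""
    return re.fullmatch(r"[0-9]{%d,}" % min_digits, value) is not None
```

Both checks accept 123-45-6789, but only the whole-string match rejects a value such as x123-45-6789y.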

Wednesday, November 4, 2009

Schematron: Correct nc:DateRepresentation Usage

The inherent flexibility of NIEM proves to be incredibly beneficial when used correctly; however, this benefit can also be one of its largest banes. Sometimes this flexibility can lead to confusion when implementers attempt to deploy a NIEM exchange which is “valid” according to the XSD, yet not what the recipient is expecting.

One such example is NIEM’s usage of substitution groups, where a variety of data elements are legal according to the schema, but rarely are all of these legal options accounted for by the recipient’s adapter. Take NIEM’s DateType as an example. It employs the explicit substitution group (abstract data element) of nc:DateRepresentation, which can be one of several different data types. This representation can be replaced with a date (2009-01-01), a date/time (2009-01-01T12:00:00), a year and a month (2009-01), etc.

Let's assume for a minute that a document has two different dates: a document filed date, and a person’s birth date. The publisher’s intention is that the filed date be a “timestamp” which includes both a date and a time, while the birth date is simply a date including a month, day and year. A valid sample XML payload would look something like the following:

<?xml version="1.0" encoding="UTF-8"?>
<ns:SomeDocument>
  <nc:DocumentFiledDate>
    <nc:DateTime>2009-01-01T01:00:00</nc:DateTime>
  </nc:DocumentFiledDate>
  <nc:Person>
    <nc:PersonBirthDate>
      <nc:Date>1970-01-01</nc:Date>
    </nc:PersonBirthDate>
  </nc:Person>
</ns:SomeDocument>

The Schematron code to enforce the publisher’s intentions could appear as the following:

<pattern id="eDocumentDateTime">
  <title>Verify the document filed date includes a date/time.</title>
  <rule context="ns:SomeDocument/nc:DocumentFiledDate">
    <assert test="nc:DateTime">
      A date and a time must be provided as the document filed date.
    </assert>
  </rule>
</pattern>
<pattern id="ePersonBirthDate">
  <title>Ensure the person's birth date is an nc:Date.</title>
  <rule context="ns:SomeDocument/nc:Person/nc:PersonBirthDate">
    <assert test="nc:Date">
      A person's birth date must be a full date.
    </assert>
  </rule>
</pattern>

This is a great example of how Schematron can help clarify a publisher’s intent as NIEM-conformant services are developed and deployed.

Wednesday, October 21, 2009

Schematron: License Plate State is Required when a Number Exists

A common practice in transportation and law enforcement is to document a vehicle’s license plate number. In many situations, this plate number must be accompanied by the state which issued the license plate.

In NIEM, a vehicle’s license plate is contained within the nc:ConveyanceRegistrationPlateIdentification element, which is an nc:IdentificationType. Using schema cardinality, one could make the state required by simply assigning minOccurs="1" to the nc:IdentificationJurisdiction element; however, this can often cause more problems than it solves for two key reasons:

Making jurisdiction required through schema cardinality makes it required globally throughout the exchange, even where it does not apply, because many other elements in a typical NIEM exchange are also of type nc:IdentificationType.
nc:IdentificationJurisdiction is an abstract data element that can be replaced with any number of elements, not all of which are enumerated state values. Some are country codes, some are province codes for other countries and others are simply free-text.

This presents another ideal use case for Schematron. The following example code segment ensures an NCIC plate issuing state is included any time a Plate Identification exists:

<pattern id="eVehiclePlateState">
  <title>Ensure a plate state is included with a plate number.</title>
  <rule context="ns:MyDocument/nc:Vehicle/nc:ConveyanceRegistrationPlateIdentification">
    <assert test="j:IdentificationJurisdictionNCICLISCode">
      A plate state must be included with vehicle license plate.
    </assert>
  </rule>
</pattern>

The same segment can be modified to enforce any of the available jurisdiction code lists. For example, an exchange in Canada may wish to check for the existence of j:IdentificationJurisdictionCanadianProvinceCode instead of j:IdentificationJurisdictionNCICLISCode.

Monday, October 12, 2009

Schematron: Use Phase for Errors and Warnings

Schematron allows for grouping of rules not only by XPath, but also through association using the “Phase” element. While this has long been recommended as an approach for improving validation performance and unit testing, Phase also serves as an excellent way to group together and differentiate between critical errors and simple warnings.

For example, an agency might choose to have some minimum data restrictions surrounding Officers and Agencies on an electronic citation that cannot be overlooked, and at the same time have warnings surrounding statute codes that do not match the known state code values. In Schematron, the following would be the code matching this scenario:

<!-- Rules resulting in just warnings (should not prevent submission) -->
<phase id="Warnings">
  <active pattern="validStatute"/>
</phase>
<!-- Rules resulting in errors (must prevent submission) -->
<phase id="Errors">
  <active pattern="minOfficerData"/>
  <active pattern="minAgencyData"/>
</phase>

The pattern attribute is an IDREF to a related pattern ID somewhere in the document. A developer can then write code that blocks submission only when failures come from one of the above phases (here, Errors). Schematron validation engines typically have command-line switches or parameters to specify which phase should be run. For example, when running Schematron through Saxon, the parameter “phase=x” is used, where x is one of the phase ids listed above or “#ALL” if all phases should be processed.
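
As a rough illustration of the "developer code" mentioned above, the Python sketch below decides whether a submission may proceed. Everything in it is an assumption for the sake of the example: the function name, and the idea that the validation results have already been collected into a dictionary keyed by phase id.

```python
def may_submit(failures_by_phase, blocking_phases=("Errors",)):
    """Return True when no blocking phase reported a failed assertion.

    failures_by_phase maps a phase id (e.g. "Errors", "Warnings") to the
    list of messages produced when the schema was run with that phase.
    """
    return not any(failures_by_phase.get(phase) for phase in blocking_phases)
```

A citation with only an unknown statute code warning would still be submitted, while one with missing officer data would be held back.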

Friday, October 9, 2009

Schematron: Recommended Reading

What books exist out there that detail Schematron and how to use it?

Best one I've seen to date is "Schematron" from O'Reilly Media available in electronic (Adobe PDF) format for $9.99 USD. Provides a great overview and set of examples using Schematron, albeit exclusively in hierarchically linked XML documents. It does assume the reader already has a strong understanding of XML and XSLT. If you are a bit rusty on XSLT or XQuery, you'll want to pick up a book on that as well.

Wednesday, October 7, 2009

Schematron: Officer Has a Last Name

This is the first in a series of code example articles that will be posted to give NIEM developers a head start in using Schematron. This example will show how to perform a test across multiple branches or nodes of a typical NIEM schema, as a law enforcement officer is a role played by a person in NIEM schemas. Take the following example XML code:

<ns:SomeDocument>
<j:Citation>  
   <j:CitationIssuingOfficial>
     <nc:RoleOfPersonReference s:ref="P1"/>  
   </j:CitationIssuingOfficial>
</j:Citation>
<nc:Person s:id="P1">
   <nc:PersonName><nc:PersonSurName>Smith</nc:PersonSurName></nc:PersonName>
</nc:Person>
</ns:SomeDocument>

One way in which to test for the existence of a last name is to match the ID with the officer's REF and test to be sure the string length is at least 1, as shown in the following example (using XSLT2 & ISO Schematron):

<pattern id="eOfficerData"> 
    <let name="sOfficerRef" value="ns:SomeDocument/j:Citation/j:CitationIssuingOfficial/nc:RoleOfPersonReference/@s:ref"/> 
    <rule context="ns:SomeDocument/nc:Person"> 
        <report test="@s:id = $sOfficerRef and string-length(nc:PersonName/nc:PersonSurName) &lt; 1"> 
            Officer's last name must be provided. 
        </report> 
    </rule> 
</pattern>

In theory the same test can be done using the XPath id() function; however, use of the id() function is HIGHLY dependent on the parser's capabilities.
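
For those who find the REF-to-ID matching easier to follow in procedural code, here is the same logic sketched in Python with the standard library's ElementTree. This is an analogy, not a replacement for the Schematron rule. The function name is hypothetical, and the j: and s: namespace URIs are assumptions, since the sample XML above omits its namespace declarations.

```python
import xml.etree.ElementTree as ET

NS = {
    "ns": "http://www.niematron.org/SchematronTestbed",
    "nc": "http://niem.gov/niem/niem-core/2.0",
    "j": "http://niem.gov/niem/domains/jxdm/4.0",  # assumed URI
    "s": "http://niem.gov/niem/structures/2.0",    # assumed URI
}

SAMPLE = """\
<ns:SomeDocument
    xmlns:ns="http://www.niematron.org/SchematronTestbed"
    xmlns:nc="http://niem.gov/niem/niem-core/2.0"
    xmlns:j="http://niem.gov/niem/domains/jxdm/4.0"
    xmlns:s="http://niem.gov/niem/structures/2.0">
  <j:Citation>
    <j:CitationIssuingOfficial>
      <nc:RoleOfPersonReference s:ref="P1"/>
    </j:CitationIssuingOfficial>
  </j:Citation>
  <nc:Person s:id="P1">
    <nc:PersonName><nc:PersonSurName>Smith</nc:PersonSurName></nc:PersonName>
  </nc:Person>
</ns:SomeDocument>
"""


def officers_missing_surname(xml_text):
    """Return the s:id of every officer whose last name is missing or empty."""
    root = ET.fromstring(xml_text)
    ref_attr = "{%s}ref" % NS["s"]
    id_attr = "{%s}id" % NS["s"]
    # Collect every s:ref used by a citation issuing official.
    officer_refs = {
        ref.get(ref_attr)
        for ref in root.findall(
            "j:Citation/j:CitationIssuingOfficial/nc:RoleOfPersonReference", NS
        )
    }
    missing = []
    for person in root.findall("nc:Person", NS):
        if person.get(id_attr) not in officer_refs:
            continue  # this person is not playing the officer role
        surname = person.findtext(
            "nc:PersonName/nc:PersonSurName", default="", namespaces=NS
        )
        if len(surname) < 1:
            missing.append(person.get(id_attr))
    return missing
```

Run against SAMPLE, the function returns an empty list; remove "Smith" from the sample and it returns the id P1.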