Saturday, December 26, 2009

Article Title and Tag Updates

Because Schematron is so closely related to XSLT and to languages such as XQuery, we have decided to begin providing XSLT tips in addition to the Schematron tips in the new year.  We will be updating the tags and titles of prior posts to help differentiate the Schematron articles from the XSLT articles.  We apologize for any inconvenience or confusion these updates may cause.

Tuesday, December 8, 2009

Schematron: Trim Whitespace When Testing String Length

As discussed in previous posts, the string-length() > 0 test is useful for making sure empty values are not passed, a validation step that raw XSD does not natively perform.  This ensures the following is NOT allowed:

<nc:PersonGivenName></nc:PersonGivenName>

It also prevents the following:

<nc:PersonGivenName/>

However, if white space is not trimmed, the following WOULD be allowed:

<nc:PersonGivenName>  </nc:PersonGivenName>

In order to trim leading and trailing white space, the built-in XPath function normalize-space() can be used.  This in effect eliminates the above scenario where only spaces have been inserted into the string.  This can be seen in the following example:

<assert test="string-length(normalize-space(nc:PersonGivenName)) &gt; 0">
  Person's first name may not be left blank.
</assert>

Be aware that this function also collapses each run of white space between characters (including repeated carriage returns and line-feeds) into a single space, so a replace() expression that strips only the leading and trailing white space may be required if you wish to count those extra characters in your string length checks.
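
As a minimal sketch of that alternative, the XPath 2.0 replace() function can strip only the leading and trailing white space while leaving interior white space untouched.  The two-character minimum and the message text below are illustrative assumptions, not part of any published rule set:

<!-- Illustrative only: trims the ends of the value, keeps interior spacing as-is -->
<assert test="string-length(replace(nc:PersonGivenName, '^\s+|\s+$', '')) &gt; 1">
  Person's first name must be at least two characters long.
</assert>

Because replace() is an XPath 2.0 function, this form assumes an XSLT 2.0 based Schematron implementation.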

Monday, November 23, 2009

Schematron: Nesting XPath Values Within an XPath Predicate

In previous examples, we have seen the usage of a temporary variable or <let> tag to store a value which is later used in an XPath predicate (the square brackets that filter the nodes selected by a path step).  It is important to note that this is not required.  An XPath expression can be used directly in the predicate of another XPath expression.  For example, see the following:

<pattern id="wEmptyMetadataComment">
  <title>Ensure person metadata comment is not blank.</title>
  <rule context="/ns:SomeDocument/nc:Person">
    <assert test="string-length(/ns:SomeDocument/nc:Metadata[@s:id=current()/@s:metadata]/nc:CommentText) &gt; 0">
      Comments regarding a person should not be blank.
    </assert>
  </rule>
</pattern>

In the above example, the predicate @s:id=current()/@s:metadata is all that is needed to identify which specific Metadata element should be examined.  Here current() refers to the nc:Person element being validated.  With the context defined as /ns:SomeDocument/nc:Person, the rule will loop through each nc:Person element and use the appropriate @s:metadata value in each subsequent pass.
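
For comparison, here is a sketch of the same rule written with a <let> variable instead of current().  The pattern id and the variable name sMetadataRef are made up for this illustration:

<pattern id="wEmptyMetadataCommentLet">
  <title>Ensure person metadata comment is not blank (variable form).</title>
  <rule context="/ns:SomeDocument/nc:Person">
    <!-- Store the metadata reference of the person currently being validated -->
    <let name="sMetadataRef" value="@s:metadata"/>
    <assert test="string-length(/ns:SomeDocument/nc:Metadata[@s:id = $sMetadataRef]/nc:CommentText) &gt; 0">
      Comments regarding a person should not be blank.
    </assert>
  </rule>
</pattern>

Both forms should select the same Metadata element; the nested form simply saves a line.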

[Updated: Corrected Syntax on 04-01-2010]

Friday, November 13, 2009

Schematron: Validating NIEM Documents Against Non-Conformant Code Lists

Schematron rules and assertions are based upon XPath statements, which allow for a number of powerful XML querying capabilities. Two XPath capabilities leveraged and outlined in this section are the doc() function and XPath predicates, which together allow us to validate data captured in a NIEM XML instance against an external code list of any kind.

Let's assume a scenario where we would like to validate an exchange document’s category against a predefined list of enumerated values.  This list is maintained by an outside party in a format other than NIEM and changes on a fairly regular basis.

Traditionally, a NIEM practitioner would take this list and define an enumeration within an extension schema to enforce this code list.  Each time the third party makes a change to that code list, an updated NIEM extension schema would be created and redistributed.  This maintenance-intensive process could become overwhelming.  The team therefore chose to simply adopt the third-party list and keep it in the following non-conformant format, relying on Schematron to perform the validation:

<?xml version="1.0" encoding="UTF-8"?>
<!-- List of Valid code Values -->
<CategoryList>
  <Category>a</Category>
  <Category>b</Category>
</CategoryList>

As shown in the above, the valid categories include the values “a” and “b”.  An example of a NIEM-conformant XML payload would look something like the following:

<ns:SomeDocument
    xmlns:nc="http://niem.gov/niem/niem-core/2.0"
    xmlns:ns="http://www.niematron.org/SchematronTestbed"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.niematron.org/SchematronTestbed  ./SomeDocument.xsd">
  <nc:DocumentCategoryText>A</nc:DocumentCategoryText>
  <!-- Remaining Document Elements Omitted -->
</ns:SomeDocument>

In this example, the developers would like to perform the validation ignoring case, therefore the Schematron rule to validate the nc:DocumentCategoryText against the third-party-provided list would look something like the following:

<pattern id="eDocumentCategory">
  <title>Verify the document category matches the external list of valid categories.</title>
  <rule context="/ns:SomeDocument">
    <let name="sText" value="lower-case(nc:DocumentCategoryText)"/>
    <assert test="count(doc('./CategoryList.xml')/CategoryList/Category[. = $sText]) &gt; 0">
      Invalid document category.
    </assert>
  </rule>
</pattern>

Let's look at some of the key statements in the above Schematron example, breaking it into individual parts.

  • lower-case(nc:DocumentCategoryText) – This statement, encapsulated in a <let> tag, converts the text in the NIEM payload to lower case, thereby ignoring deviations from the code list due to case.  It is then stored in a temporary variable named $sText.
  • doc('./CategoryList.xml')/… – This effectively points the parser at the third-party provided file (in this example assumed to be in the same directory as the .sch file) so that elements from that file can be referenced using XPath in addition to elements in the source payload document.
  • …/Category[. = $sText] – The square brackets ([ and ]) in an XPath statement enclose a predicate.  Any number of predicates can be used to help filter the values selected by an XPath, but in this case, the expression tells the parser to select all of the Category elements with the value contained in the variable $sText.
  • count(…) &gt; 0 – The XPath count() function returns the number of elements selected by the XPath.  If no match to the category existed, the count would return a value of zero, therefore we want to ensure the value is greater than zero, meaning a match existed in the external code list.
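
The same check can also be sketched more compactly by relying on the fact that the XPath "=" operator is true when any item on one side matches any item on the other.  The version below is an illustration under that assumption; it additionally trims white space from the payload value and lower-cases the list entries, neither of which the original rule does:

<pattern id="eDocumentCategoryCompact">
  <title>Verify the document category matches the external list (compact form).</title>
  <rule context="/ns:SomeDocument">
    <!-- True when the normalized payload value equals at least one Category in the list -->
    <assert test="lower-case(normalize-space(nc:DocumentCategoryText)) = doc('./CategoryList.xml')/CategoryList/Category/lower-case(.)">
      Invalid document category.
    </assert>
  </rule>
</pattern>

Lower-casing both sides means the rule keeps working even if the third party starts publishing upper-case codes.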

Friday, November 6, 2009

Schematron: Enforce String Patterns in Schematron

In the general area of XML schemas, XSD “patterns” are commonly used to enforce special string formatting constraints.  This is a very powerful tool when a document recipient wishes to ensure that the sender provides string data in a consistent format.  A common example of a string constraint is validating the structure of a Social Security Number (SSN).  This would be expressed in a typical schema in the following manner:

<xsd:simpleType name="SsnSimpleType">
    <xsd:restriction base="xsd:string">
        <xsd:pattern value="[0-9]{3}[\-][0-9]{2}[\-][0-9]{4}" />
    </xsd:restriction>
</xsd:simpleType>

As with most parts of NIEM, much of the model is based on inheritance, which makes enforcement of simple data types, such as that shown above, cumbersome and awkward.  Semantically, the correct element for an SSN would be under:

nc:Person/nc:PersonSSNIdentification/nc:IdentificationID

Since nc:PersonSSNIdentification is an nc:IdentificationType, if one were to enforce SSN formatting on nc:IdentificationID, any other part of the schema that is derived from nc:IdentificationType would also need to abide by the same rules (e.g. Driver License Number, State ID Number, Document Identification, etc.).  In the past this situation led to one thing. . . extension.

With Schematron, extension for this purpose can be avoided.  Rather than enforcing the string constraints in the XSD file, the IEPD publisher can enforce this constraint within the Schematron rules document.  The following is an example of what code would be required in Schematron to accomplish this purpose:

<pattern id="ePersonSSN">
  <title>Verify person social security number is in the correct format.</title>
  <rule context="/ns:SomeDocument/nc:Person/nc:PersonSSNIdentification">
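    <!-- matches() with the ^ and $ anchors requires the whole value to fit the pattern;
         counting tokenize() results would also accept values such as "x123-45-6789y" -->
    <assert test="matches(nc:IdentificationID, '^[0-9]{3}-[0-9]{2}-[0-9]{4}$')">
       Social security number must be in the proper format (e.g. 111-22-3333).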
    </assert>
  </rule>
</pattern>

By using the Schematron approach, the semantically correct element is preserved in the schema and only the appropriate identifier is subjected to the constraint.

This approach can be further extended to address any number of string constraints.  Another example would be ensuring an identification number only contains digits and has a string length of 5 or more.  This could be done by using the following XPath 2.0 matches() test instead (counting the results of tokenize() on '\d' would not reject values that mix letters with digits):

matches(nc:IdentificationID, '^[0-9]{5,}$')
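
As a sketch, a digits-only, minimum-length check of this kind could be wrapped in a complete pattern such as the one below.  The pattern id, the message, and the choice of nc:PersonStateIdentification as the context are assumptions made for illustration, and the anchored matches() call assumes an XSLT 2.0 based implementation:

<pattern id="ePersonStateID">
  <title>Verify the state identification number is at least 5 digits.</title>
  <rule context="/ns:SomeDocument/nc:Person/nc:PersonStateIdentification">
    <!-- ^ and $ force the entire value to consist of five or more digits -->
    <assert test="matches(nc:IdentificationID, '^[0-9]{5,}$')">
      State identification number must contain only digits and be at least 5 digits long.
    </assert>
  </rule>
</pattern>

Because the rule context names one specific identification element, other nc:IdentificationType elements in the exchange are left unconstrained.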

This very powerful approach to constraining strings is yet another reason to take a real good look at Schematron in conjunction with your NIEM IEPDs.

Wednesday, November 4, 2009

Schematron: Correct nc:DateRepresentation Usage

The inherent flexibility of NIEM proves incredibly beneficial when used correctly, however this benefit can also be one of its largest banes.  Sometimes this flexibility can lead to confusion when implementers attempt to deploy a NIEM exchange which is “valid” according to the XSD, yet not what the recipient is expecting.

One such example is NIEM’s usage of substitution groups where a variety of data elements are legal according to the schema, but rarely are all of these legal options accounted for by the recipient’s adapter.  Take NIEM’s DateType as an example.  It employs the explicit substitution group (abstract data element) of nc:DateRepresentation which can be one of several different data types.  This representation can be replaced with a date (2009-01-01), a date/time (2009-01-01T12:00:00), a year and a month (2009-01), etc.

Let's assume for a minute that a document has two different dates: a document filed date, and a person’s birth date.  The publisher’s intention is that the filed date be a “timestamp” which includes both a date and a time, while the birth date is simply a date including a month, day and year.  A valid sample XML payload would look something like the following:

<?xml version="1.0" encoding="UTF-8"?>
<ns:SomeDocument>
  <nc:DocumentFiledDate>
    <nc:DateTime>2009-01-01T01:00:00</nc:DateTime>
  </nc:DocumentFiledDate>
  <nc:Person>
    <nc:PersonBirthDate>
      <nc:Date>1970-01-01</nc:Date>
    </nc:PersonBirthDate>
  </nc:Person>
</ns:SomeDocument>

The Schematron code to enforce the publisher’s intentions could appear as the following:

<pattern id="eDocumentDateTime">
  <title>Verify the document filed date includes a date/time.</title>
  <rule context="ns:SomeDocument/nc:DocumentFiledDate">
    <assert test="nc:DateTime">
      A date and a time must be provided as the document filed date.
    </assert>
  </rule>
</pattern>
<pattern id="ePersonBirthDate">
  <title>Ensure the person's birth date is an nc:Date.</title>
  <rule context="ns:SomeDocument/nc:Person/nc:PersonBirthDate">
    <assert test="nc:Date">
      A person's birth date must be a full date.
    </assert>
  </rule>
</pattern>

This is a great example of how Schematron can help clarify a publisher’s intent as NIEM-conformant services are developed and deployed.
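
To illustrate what these rules catch, the fragment below sketches a birth date that would pass XSD validation (assuming nc:YearMonth is one of the elements substitutable for nc:DateRepresentation in the schema in use) but would fail the ePersonBirthDate pattern because no nc:Date child is present:

<nc:Person>
  <nc:PersonBirthDate>
    <!-- Schema-valid substitution, but not the full date the publisher intended -->
    <nc:YearMonth>1970-01</nc:YearMonth>
  </nc:PersonBirthDate>
</nc:Person>

Validating this fragment would be expected to produce the message "A person's birth date must be a full date."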

Wednesday, October 21, 2009

Schematron: License Plate State is Required when a Number Exists

A common practice in transportation and law enforcement is to document a vehicle’s license plate number.  In many situations, this plate number must be accompanied by the state which issued the license plate. 

In NIEM, a vehicle’s license plate is contained within the nc:ConveyanceRegistrationPlateIdentification element which is an nc:IdentificationType.  Using schema cardinality, one could make the state required by simply assigning minOccurs="1" to the nc:IdentificationJurisdiction element, however this can often cause more problems than it solves for two key reasons:

  1. Making jurisdiction required through schema cardinality makes it required globally throughout the exchange, even where it doesn’t apply, because many other elements in a typical NIEM exchange are also nc:IdentificationType data types.
  2. nc:IdentificationJurisdiction is an abstract data element that can be replaced with any number of elements, not all of which are enumerated state values.  Some are country codes, some are province codes for other countries and others are simply free-text. 

This presents another ideal use case for Schematron.  The following example code segment ensures an NCIC plate issuing state is included any time a Plate Identification exists:

<pattern id="eVehiclePlateState">
  <title>Ensure a plate state is included with a plate number.</title>
  <rule context="ns:MyDocument/nc:Vehicle/nc:ConveyanceRegistrationPlateIdentification">
    <assert test="j:IdentificationJurisdictionNCICLISCode">
      A plate state must be included with vehicle license plate.
    </assert>
  </rule>
</pattern>

The same segment can be modified to enforce any of the available jurisdiction code lists.  For example, an exchange in Canada may wish to check for the existence of j:IdentificationJurisdictionCanadianProvinceCode instead of j:IdentificationJurisdictionNCICLISCode.
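
For an exchange that accepts plates from either country, the two checks could be combined with the XPath "or" operator.  This is only a sketch; the pattern id and the wording of the message are illustrative:

<pattern id="eVehiclePlateJurisdiction">
  <title>Ensure a US state or Canadian province is included with a plate number.</title>
  <rule context="ns:MyDocument/nc:Vehicle/nc:ConveyanceRegistrationPlateIdentification">
    <!-- Passes when at least one of the two jurisdiction code elements is present -->
    <assert test="j:IdentificationJurisdictionNCICLISCode or j:IdentificationJurisdictionCanadianProvinceCode">
      A plate state or province must be included with vehicle license plate.
    </assert>
  </rule>
</pattern>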

Wednesday, October 14, 2009

Link: Schematron Tutorials

There are a number of solid Schematron tutorials posted on the internet.  Two of the most comprehensive ones available are:

We won’t plagiarize their incredible wealth of Schematron material here, so it is highly recommended that developers jump to and use these online tutorials.

Monday, October 12, 2009

Schematron: Use Phase for Errors and Warnings

Schematron allows for grouping of rules not only by XPath, but also through association using the “phase” element. While this has long been recommended as an approach for improving validation performance and unit testing, phases also serve as an excellent way to group together and differentiate between critical errors and simple warnings.

For example, an agency might choose to have some minimum data restrictions surrounding Officers and Agencies on an electronic citation that cannot be overlooked, and at the same time have warnings surrounding statute codes that do not match the known state code values. In Schematron the following would be the code matching this scenario:

<!-- Rules resulting in just warnings (should not prevent submission) -->
<phase id="Warnings">
  <active pattern="validStatute"/>
</phase>
<!-- Rules resulting in errors (must prevent submission) -->
<phase id="Errors">
  <active pattern="minOfficerData"/>
  <active pattern="minAgencyData"/>
</phase>

The pattern attribute is an IDREF to a related pattern ID elsewhere in the schema. A developer can then write code that blocks a submission only when validation errors result from the Errors phase. Schematron validation engines typically have command line switches or parameters to specify which phase should be run. For example, when the ISO Schematron stylesheets are run with the Saxon XSLT processor, the parameter “phase=x” is used, where x is one of the phase IDs listed above or “#ALL” if all phases should be processed.
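
To show how the phases tie to their patterns, here is an abbreviated sketch of a complete ISO Schematron schema.  The ns:StatuteCodeText element, the StatuteList.xml file, and the contents of both patterns are hypothetical and exist only to make the example self-contained; the Errors phase is trimmed to a single pattern for brevity:

<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2" defaultPhase="Errors">
  <ns prefix="nc" uri="http://niem.gov/niem/niem-core/2.0"/>
  <ns prefix="ns" uri="http://www.niematron.org/SchematronTestbed"/>
  <!-- Rules resulting in just warnings (should not prevent submission) -->
  <phase id="Warnings">
    <active pattern="validStatute"/>
  </phase>
  <!-- Rules resulting in errors (must prevent submission) -->
  <phase id="Errors">
    <active pattern="minOfficerData"/>
  </phase>
  <pattern id="validStatute">
    <!-- Hypothetical element and code list file, for illustration only -->
    <rule context="/ns:SomeDocument/ns:StatuteCodeText">
      <assert test=". = doc('./StatuteList.xml')/StatuteList/Statute">
        Statute code does not match a known state code value.
      </assert>
    </rule>
  </pattern>
  <pattern id="minOfficerData">
    <rule context="/ns:SomeDocument/nc:Person">
      <assert test="string-length(normalize-space(nc:PersonName/nc:PersonSurName)) &gt; 0">
        Officer's last name must be provided.
      </assert>
    </rule>
  </pattern>
</schema>

The defaultPhase attribute names the phase to run when none is requested, so in this sketch only the Errors patterns fire unless the caller asks for Warnings or #ALL.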

Friday, October 9, 2009

Schematron: Recommended Reading

What books exist out there that detail Schematron and how to use it?
The best one I've seen to date is "Schematron" from O'Reilly Media, available in electronic (Adobe PDF) format for $9.99 USD. It provides a great overview and set of examples using Schematron, albeit exclusively in hierarchically linked XML documents. It does assume the reader already has a strong understanding of XML and XSLT. If you are a bit rusty on XSLT or XQuery, you'll want to pick up a book on that as well.

Wednesday, October 7, 2009

Schematron: Officer Has a Last Name

This is the first in a series of code example articles that will be posted to give NIEM developers a head start in using Schematron. This example will show how to perform a test across multiple branches or nodes of a typical NIEM schema, since a law enforcement officer is a role played by a person in NIEM schemas. Take the following example XML code:

<ns:SomeDocument>
<j:Citation>  
   <j:CitationIssuingOfficial>
     <nc:RoleOfPersonReference s:ref="P1"/>  
   </j:CitationIssuingOfficial>
</j:Citation>
<nc:Person s:id="P1">
   <nc:PersonName><nc:PersonSurName>Smith</nc:PersonSurName></nc:PersonName>
</nc:Person>
</ns:SomeDocument>

One way in which to test for the existence of a last name is to match the ID with the officer's REF and test to be sure the string length is at least 1 as shown in the following example (using XSLT2 & ISO Schematron):

<pattern id="eOfficerData"> 
    <let name="sOfficerRef" value="ns:SomeDocument/j:Citation/j:CitationIssuingOfficial/nc:RoleOfPersonReference/@s:ref"/> 
    <rule context="ns:SomeDocument/nc:Person"> 
        <!-- The less-than sign must be escaped as &lt; inside an attribute value -->
        <report test="@s:id = $sOfficerRef and string-length(nc:PersonName/nc:PersonSurName) &lt; 1"> 
            Officer's last name must be provided. 
        </report> 
    </rule> 
</pattern>

In theory the same test can be done using the XPath id() function, however use of the id() function is HIGHLY dependent on the parser's capabilities, since it only works when the parser knows which attributes are of type ID.
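
An alternative that avoids both the pattern-level variable and id() is to start from the officer reference and look the person up with a nested predicate.  The sketch below assumes the same sample document as above; the pattern id is made up for this illustration:

<pattern id="eOfficerDataByRef">
  <rule context="ns:SomeDocument/j:Citation/j:CitationIssuingOfficial/nc:RoleOfPersonReference">
    <!-- current() is the reference being validated; find the person whose s:id matches its s:ref -->
    <assert test="string-length(/ns:SomeDocument/nc:Person[@s:id = current()/@s:ref]/nc:PersonName/nc:PersonSurName) &gt; 0">
      Officer's last name must be provided.
    </assert>
  </rule>
</pattern>

Written this way, the assertion also fails when the reference points at a person that does not exist in the document.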

Monday, October 5, 2009

To ISO or Not to ISO

Schematron comes in two common flavors, Schematron v1.5 (older) and ISO Schematron (newer). While there are a number of differences that can be read about on the official Schematron Website, it can sometimes be confusing to determine which version is "best". The following table presents some of the differences between the two:

Feature | v1.5 | ISO
Variables | Not Supported | let element available
Query Language | XSLT 1.0/XPath 1.0 | XSLT 1.0/XPath 1.0, XSLT 2.0/XPath 2.0, EXSLT, STX, XSLT 1.1, etc.
Abstract Patterns & Inheritance | Not Supported | Supported
value-of Element (helpful in debugging and error messaging) | Not Supported | Supported
xsl:key Element | Supported | Not Supported (Workaround Exists)
flag Attribute | Not Supported | Supported
SVRL | Not Supported | Supported
include Element | Not Supported | Supported

My suggestion would be to go with ISO Schematron, for the following reasons:
  • Support for variables is important when working with ID and IDREF (more about this in a later blog).
  • XPath 2.0 functions provide a number of goodies that would be hard to pass up.
  • ISO Schematron is a recognized ISO standard. . . which should count for something in a standards-based community.
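
As a small taste of what the ISO-only features look like together, the rule below sketches the let element, the value-of element and the flag attribute in one place.  The two-character threshold, the flag name and the message are illustrative assumptions:

<rule context="/ns:SomeDocument/nc:Person">
  <!-- let: store the trimmed surname once and reuse it -->
  <let name="sSurName" value="normalize-space(nc:PersonName/nc:PersonSurName)"/>
  <!-- flag: lets the calling application tell warnings from errors -->
  <assert test="string-length($sSurName) &gt; 1" flag="warning">
    Person <value-of select="@s:id"/> has an unusually short last name: "<value-of select="$sSurName"/>".
  </assert>
</rule>

None of these three constructs is available in Schematron v1.5 according to the table above.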
Feel free to comment if you know any other key differences or disagree with my suggestion. Update: 2009-10-05 - Correct typo.

XML Editor Support for Schematron

XML editors are one of the primary tools in a practitioner's toolbox and as such, are the first place many of us look for assistance using a complex validation engine such as Schematron. Editors currently on the market have different levels of support for Schematron, as it is at its core a multi-pass validation engine. The following table should provide you with links to where tutorials for some of the most common XML editors are located:

Editor | Native Schematron Validation? | Native Schematron Editing? | Tutorial
oXygen | Yes | Yes | http://www.oxygenxml.com/validation.html
XMLSpy | No | Yes | http://xml.sys-con.com/node/40656
Editix | Yes | Yes | None available

Update: 2009-10-05 - Adding Hyperlink to Editor websites.

Welcome!

Welcome to the initial post of the NIEM-Schematron (aka NIEMatron) community. At the 2009 NIEM National Training Event (see #NIEMNTE on Twitter) it became apparent that the next big task for NIEM was ensuring that data was self-describing. As it stands today, the XML Schema Documents (XSD) used can only enforce and describe a small portion of the data in each Information Exchange Package Document (IEPD) developed. Expressing complex business rules has been, and continues to be, a large struggle for many implementers of the NIEM standard.

Most implementers have chosen to use validation logic built using whatever programming language fits their need (e.g. Java, VB.net, C++, C#, etc.), however once this is done, there is very little that one of their information exchange partners can use if the partner leverages a different technology or programming language. This problem is one of the key reasons the XML community developed the ISO standard Schematron. Schematron is a simple XML-based rule language, typically implemented with XSLT, that allows implementers to describe highly complex business rules that would otherwise be impossible to describe using XSD alone.

This website and blog will be updated on a regular basis to provide NIEM practitioners a place to learn about, ask questions, and discuss Schematron. We will attempt to provide as much information to NIEM developers as possible about tricks and traps we've encountered while using this powerful language and will also be providing further executive-level explanations surrounding the technology and the value proposition it brings. Please be patient as, like many of you, we travel quite often and may have difficulty responding immediately to questions you pose. Please follow us on Twitter (user: NIEMatron) if you would like to be apprised of new messages as they are posted. Thanks, and welcome to the future!