Because Schematron is so closely related to XSLT and to languages such as XQuery, we have decided to begin providing XSLT tips in addition to the Schematron tips in the new year. We will be updating the tags and titles of prior posts to help differentiate the Schematron articles from the XSLT articles. We apologize for any inconvenience or confusion these updates may cause.
Schematron and other Application Development help for Public Sector and National Information Exchange Model (NIEM) practitioners.
Saturday, December 26, 2009
Tuesday, December 8, 2009
Schematron: Trim Whitespace When Testing String Length
As discussed in previous posts, the string-length() > 0 test is useful for making sure null (empty) values are not passed, a validation step that raw XSD does not natively perform. This ensures the following is NOT allowed:
<nc:PersonGivenName></nc:PersonGivenName>
It also prevents the following:
<nc:PersonGivenName/>
However, if white space is not trimmed, the following WOULD be allowed:
<nc:PersonGivenName> </nc:PersonGivenName>
In order to trim leading and trailing white space, the built-in XPath function normalize-space() can be used. This in effect eliminates the above scenario where spaces have been inserted into the string. This can be seen in the following example:
<assert test="string-length(normalize-space(nc:PersonGivenName)) > 0">
  Person's first name may not be left blank.
</assert>
Be aware that this function also collapses each run of white space inside the string (including repeated carriage returns and line feeds) into a single space, so a custom replace() expression may be required if you wish to preserve those extra characters in your string length checks.
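As a rough sketch of the replace() idea mentioned above, the rule below trims only leading and trailing white space and leaves interior spacing untouched. It assumes an XSLT 2.0 query binding (replace() is an XPath 2.0 function). The pattern id, the variable name sTrimmed, and the 35-character limit are hypothetical and only there for illustration.

```xml
<pattern id="eGivenNameLength">
  <rule context="nc:PersonName">
    <!-- Strip leading/trailing white space only; interior runs are preserved -->
    <let name="sTrimmed" value="replace(nc:PersonGivenName, '^\s+|\s+$', '')"/>
    <!-- 35 is an assumed maximum length, not a NIEM requirement -->
    <assert test="string-length($sTrimmed) > 0 and string-length($sTrimmed) &lt;= 35">
      Person's first name may not be blank or longer than 35 characters.
    </assert>
  </rule>
</pattern>
```

For a simple "not blank" check, normalize-space() and this expression give the same answer; the difference only matters when the exact length of the value is being tested.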
Monday, November 23, 2009
Schematron: Nesting XPath Values Within an XPath Predicate
In previous examples, we have seen the usage of a temporary variable or <let> tag to store a value which is later used in an XPath predicate (the square brackets surrounding the index of an element array). It is important to note that this is not required. A simple XPath statement can be used in the predicate for any other XPath statement. For example see the following:
<pattern id="wEmptyMetadataComment">
  <title>Ensure person metadata comment is not blank.</title>
  <rule context="/ns:SomeDocument/nc:Person">
    <assert test="string-length(/ns:SomeDocument/nc:Metadata[@s:id=current()/@s:metadata]/nc:CommentText) > 0">
      Comments regarding a person should not be blank.
    </assert>
  </rule>
</pattern>
In the above example, the predicate @s:id=current()/@s:metadata is all that is needed to identify which specific Metadata element should be examined. With the context defined as /ns:SomeDocument/nc:Person, the rule fires once for each nc:Person element, and current() supplies that person's @s:metadata value on each pass.
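For comparison, the same check written with a temporary variable might look like the following sketch. The pattern id and the variable name sMetadataRef are hypothetical; the element names are the ones used in the example above.

```xml
<pattern id="wEmptyMetadataCommentLet">
  <rule context="/ns:SomeDocument/nc:Person">
    <!-- Store the person's metadata reference before using it in the predicate -->
    <let name="sMetadataRef" value="@s:metadata"/>
    <assert test="string-length(/ns:SomeDocument/nc:Metadata[@s:id = $sMetadataRef]/nc:CommentText) > 0">
      Comments regarding a person should not be blank.
    </assert>
  </rule>
</pattern>
```

Both forms should select the same Metadata element; the nested form simply saves a line, while the variable can be easier to read when the reference is reused in several assertions.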
[Updated: Corrected Syntax on 04-01-2010]
Friday, November 13, 2009
Schematron: Validating NIEM Documents Against Non-Conformant Code Lists
Schematron rules and assertions are based upon XPath statements, which allow for a number of powerful XML querying capabilities. Two XPath capabilities leveraged and outlined in this section are the doc() function and XPath predicates, which together allow us to validate data captured in a NIEM XML instance against an external code list of any kind.
Let's assume a scenario where we would like to validate an exchange document’s category against a predefined list of enumerated values. This list is maintained by an outside party in a format other than NIEM and changes on a fairly regular basis.
Traditionally, a NIEM practitioner would take this list and define an enumeration within an extension schema to enforce this code list. Each time the third party makes a change to that code list, an updated NIEM extension schema would be created and redistributed. This maintenance-intensive process could become overwhelming, so the team chose instead to adopt the third-party list as is, keep it in the following non-conformant format, and rely on Schematron to perform the validation:
<?xml version="1.0" encoding="UTF-8"?>
<!-- List of Valid code Values -->
<CategoryList>
  <Category>a</Category>
  <Category>b</Category>
</CategoryList>
As shown in the above, the valid categories include the values “a” and “b”. An example of a NIEM-conformant XML payload would look something like the following:
<ns:SomeDocument
    xmlns:nc="http://niem.gov/niem/niem-core/2.0"
    xmlns:ns="http://www.niematron.org/SchematronTestbed"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.niematron.org/SchematronTestbed ./SomeDocument.xsd">
  <nc:DocumentCategoryText>A</nc:DocumentCategoryText>
  <!-- Remaining Document Elements Omitted -->
</ns:SomeDocument>
In this example, the developers would like to perform the validation ignoring case, therefore the Schematron rule to validate the nc:DocumentCategoryText against the third-party-provided list would look something like the following:
<pattern id="eDocumentCategory">
  <title>Verify the document category matches the external list of valid categories.</title>
  <rule context="/ns:SomeDocument">
    <let name="sText" value="lower-case(nc:DocumentCategoryText)"/>
    <assert test="count(doc('./CategoryList.xml')/CategoryList/Category[. = $sText]) > 0">
      Invalid document category.
    </assert>
  </rule>
</pattern>
Let's look at some of the key statements in the above Schematron example, breaking it into individual parts.
- lower-case(nc:DocumentCategoryText) – This statement encapsulated in a <let> tag converts the text in the NIEM payload to lower case thereby ignoring deviations from the code list due to case. It is then stored in a temporary variable named $sText.
- doc('./CategoryList.xml')/… – This effectively points the parser at the third-party provided file (in this example assumed to be in the same directory as the .sch file) so that elements from that file can be referenced using the XPath in addition to elements in the source payload document.
- …/Category[. = $sText] – The usage of the square brackets ([ and ]) in an XPath statement is considered a predicate. Any number of predicate statements can be made to help filter values contained within an XPath, but in this case, the expression tells the parser to select all of the Category elements with the value contained in the variable $sText.
- count(…) > 0 – The XPath count() function returns the number of nodes selected by the expression. If no matching category exists, count() returns zero; we therefore check that the value is greater than zero, meaning a match exists in the external code list.
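The rule above lower-cases only the payload value, so it relies on the external list already being in lower case. A possible variant, sketched below under the assumption of an XSLT 2.0 query binding, lower-cases both sides using an XPath 2.0 quantified expression. The pattern id and the variable name codes are hypothetical.

```xml
<pattern id="eDocumentCategoryAnyCase">
  <rule context="/ns:SomeDocument">
    <let name="sText" value="lower-case(normalize-space(nc:DocumentCategoryText))"/>
    <!-- All Category elements from the third-party list -->
    <let name="codes" value="doc('./CategoryList.xml')/CategoryList/Category"/>
    <!-- Passes if at least one list entry matches, ignoring case and padding -->
    <assert test="some $c in $codes satisfies lower-case(normalize-space($c)) = $sText">
      Invalid document category "<value-of select="nc:DocumentCategoryText"/>".
    </assert>
  </rule>
</pattern>
```

This keeps the check working if the list maintainer later publishes values such as "A" or " b ", at the cost of a slightly longer expression.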
Friday, November 6, 2009
Schematron: Enforce String Patterns in Schematron
In the general area of XML schemas, XSD “patterns” are commonly used to enforce special string formatting constraints. This is a very powerful tool when a document recipient wishes to ensure that the sender provides string data in a consistent format. A common example of such a string constraint is validating the structure of a Social Security Number (SSN). This would be expressed in a typical schema in the following manner:
<xsd:simpleType name="SsnSimpleType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="[0-9]{3}[\-][0-9]{2}[\-][0-9]{4}" />
  </xsd:restriction>
</xsd:simpleType>
As with most parts of NIEM, much of the model is based on inheritance, which makes enforcement of simple data type constraints, such as that shown above, cumbersome and awkward. Semantically, the correct element for an SSN would be:
nc:Person/nc:PersonSSNIdentification/nc:IdentificationID
Since nc:PersonSSNIdentification is an nc:IdentificationType, if one were to enforce SSN formatting on nc:IdentificationID, any other part of the schema that is derived from nc:IdentificationType would also need to abide by the same rules (e.g. Driver License Number, State ID Number, Document Identification, etc.). In the past this situation led to one thing. . . extension.
With Schematron, extension for this purpose can be avoided. Rather than enforcing the string constraint in the XSD file, the IEPD publisher can enforce it within the Schematron rules document. The following is an example of the Schematron code required to accomplish this:
<pattern id="ePersonSSN">
  <title>Verify person social security number is in the correct format.</title>
  <rule context="/ns:SomeDocument/nc:Person/nc:PersonSSNIdentification">
    <assert test="count(tokenize(nc:IdentificationID, '[0-9]{3}-[0-9]{2}-[0-9]{4}')) - 1 = 1">
      Social security number must be in the proper format (e.g. 111-22-3333).
    </assert>
  </rule>
</pattern>
By using the Schematron approach, the semantically equivalent element is preserved in the schema and only the appropriate identifier is subjected to the constraint.
This approach can be further extended to address any number of string constraints. Another example would be ensuring an identification number contains five or more digits. This could be done by using the following XPath count() expression instead; note that it counts the digits in the value but does not, by itself, reject non-digit characters:
count(tokenize(nc:IdentificationID, '\d')) > 5
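The tokenize() expressions above count how many times the pattern occurs in the value, so they also accept values with extra characters around or between the matches. Where the whole value must match, one option is the XPath 2.0 matches() function with explicit ^ and $ anchors, as in the sketch below. This assumes an XSLT 2.0 query binding; the pattern id and the nc:DocumentIdentification context in the second rule are hypothetical choices for illustration.

```xml
<pattern id="eIdentifierFormats">
  <rule context="/ns:SomeDocument/nc:Person/nc:PersonSSNIdentification">
    <!-- Anchored: the entire value must be 3 digits, 2 digits, 4 digits -->
    <assert test="matches(nc:IdentificationID, '^[0-9]{3}-[0-9]{2}-[0-9]{4}$')">
      Social security number must be in the proper format (e.g. 111-22-3333).
    </assert>
  </rule>
  <!-- Assumed context: any identification that must be numeric -->
  <rule context="/ns:SomeDocument/nc:DocumentIdentification">
    <!-- Anchored: only digits, and at least five of them -->
    <assert test="matches(nc:IdentificationID, '^[0-9]{5,}$')">
      Identification number must contain only digits and be at least 5 digits long.
    </assert>
  </rule>
</pattern>
```

Without the anchors, matches() succeeds if the pattern is found anywhere in the string, which would bring back the same looseness as the tokenize() form.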
This very powerful approach to constraining strings is yet another reason to take a real good look at Schematron in conjunction with your NIEM IEPDs.
Wednesday, November 4, 2009
Schematron: Correct nc:DateRepresentation Usage
The inherent flexibility of NIEM proves incredibly beneficial when used correctly; however, this benefit can also be one of its largest banes. Sometimes this flexibility can lead to confusion when implementers attempt to deploy a NIEM exchange which is “valid” according to the XSD, yet not what the recipient is expecting.
One such example is NIEM’s usage of substitution groups, where a variety of data elements are legal according to the schema but rarely are all of these legal options accounted for by the recipient’s adapter. Take NIEM’s DateType as an example. It employs the explicit substitution group (abstract data element) of nc:DateRepresentation, which can be one of several different data types. This representation can be replaced with a date (2009-01-01), a date/time (2009-01-01T12:00:00), a year and a month (2009-01), etc.
Let's assume for a minute that a document has two different dates: a document filed date and a person’s birth date. The publisher’s intention is that the filed date be a “timestamp” which includes both a date and a time, while the birth date is simply a date including a month, day and year. A valid sample XML payload would look something like the following:
<?xml version="1.0" encoding="UTF-8"?>
<ns:SomeDocument>
  <nc:DocumentFiledDate>
    <nc:DateTime>2009-01-01T01:00:00</nc:DateTime>
  </nc:DocumentFiledDate>
  <nc:Person>
    <nc:PersonBirthDate>
      <nc:Date>1970-01-01</nc:Date>
    </nc:PersonBirthDate>
  </nc:Person>
</ns:SomeDocument>
The Schematron code to enforce the publisher’s intentions could appear as the following:
<pattern id="eDocumentDateTime">
  <title>Verify the document filed date includes a date/time.</title>
  <rule context="ns:SomeDocument/nc:DocumentFiledDate">
    <assert test="nc:DateTime">
      A date and a time must be provided as the document filed date.
    </assert>
  </rule>
</pattern>
<pattern id="ePersonBirthDate">
  <title>Ensure the person's birth date is an nc:Date.</title>
  <rule context="ns:SomeDocument/nc:Person/nc:PersonBirthDate">
    <assert test="nc:Date">
      A person's birth date must be a full date.
    </assert>
  </rule>
</pattern>
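The assertions above check that the expected representation is present. A complementary sketch, shown below, reports when an unwanted representation is supplied and echoes the offending value back to the sender. The pattern id is hypothetical, and the value-of element assumes ISO Schematron.

```xml
<pattern id="wPersonBirthDateHasTime">
  <rule context="ns:SomeDocument/nc:Person/nc:PersonBirthDate">
    <!-- report fires when the test is true, i.e. a date/time was sent -->
    <report test="nc:DateTime">
      Birth date was supplied as a date/time (<value-of select="nc:DateTime"/>);
      a plain nc:Date is expected.
    </report>
  </rule>
</pattern>
```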
This is a great example of how Schematron can help clarify a publisher’s intent as NIEM-conformant services are developed and deployed.
Wednesday, October 21, 2009
Schematron: License Plate State is Required when a Number Exists
A common practice in transportation and law enforcement is to document a vehicle’s license plate number. In many situations, this plate number must be accompanied by the state which issued the license plate.
In NIEM, a vehicle’s license plate is contained within the nc:ConveyanceRegistrationPlateIdentification element, which is an nc:IdentificationType. Using schema cardinality, one could make the state required by simply assigning minOccurs="1" to the nc:IdentificationJurisdiction element; however, this can often cause more problems than it solves for two key reasons:
- Making jurisdiction required through schema cardinality makes it required globally throughout the exchange, even where it does not apply, because many other elements in a typical NIEM exchange are also of type nc:IdentificationType.
- nc:IdentificationJurisdiction is an abstract data element that can be replaced with any number of elements, not all of which are enumerated state values. Some are country codes, some are province codes for other countries and others are simply free-text.
This presents another ideal use case for Schematron. The following example code segment ensures an NCIC plate-issuing state is included any time a Plate Identification exists:
<pattern id="eVehiclePlateState">
  <title>Ensure a plate state is included with a plate number.</title>
  <rule context="ns:MyDocument/nc:Vehicle/nc:ConveyanceRegistrationPlateIdentification">
    <assert test="j:IdentificationJurisdictionNCICLISCode">
      A plate state must be included with vehicle license plate.
    </assert>
  </rule>
</pattern>
The same segment can be modified to enforce any of the available jurisdiction code lists. For example, an exchange in Canada may wish to check for the existence of j:IdentificationJurisdictionCanadianProvinceCode instead of j:IdentificationJurisdictionNCICLISCode.
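As an illustration of that kind of modification, the sketch below accepts either of the two jurisdiction code lists named above and only fires when a plate number is actually present. The pattern id and the [nc:IdentificationID] predicate on the context are assumptions added for this example.

```xml
<pattern id="eVehiclePlateJurisdiction">
  <title>Ensure a state or province is included with a plate number.</title>
  <!-- Only plate identifications that carry a number are checked -->
  <rule context="ns:MyDocument/nc:Vehicle/nc:ConveyanceRegistrationPlateIdentification[nc:IdentificationID]">
    <assert test="j:IdentificationJurisdictionNCICLISCode or j:IdentificationJurisdictionCanadianProvinceCode">
      A plate state or province must be included with vehicle license plate.
    </assert>
  </rule>
</pattern>
```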
Wednesday, October 14, 2009
Link: Schematron Tutorials
There are a number of solid Schematron tutorials posted on the internet. Two of the most comprehensive ones available are:
- The OASIS-hosted Coverpages.com Schematron article.
- Roger Costello’s tutorial.
We won’t plagiarize their incredible wealth of Schematron material here, so it is highly recommended that developers jump to and use these online tutorials.
Monday, October 12, 2009
Schematron: Use Phase for Errors and Warnings
Schematron allows for grouping of rules not only by XPath, but also through association using the “Phase” element. While this has long been recommended as an approach for improving validation performance and unit testing, Phase also serves as an excellent way to group together and differentiate between critical errors and simple warnings.
For example, an agency might choose to have some minimum data restrictions surrounding Officers and Agencies on an electronic citation that cannot be overlooked, and at the same time have warnings surrounding statute codes that do not match the known state code values. In Schematron, the following would be the code matching this scenario:
<!-- Rules resulting in just warnings (should not prevent submission) -->
<phase id="Warnings">
  <active pattern="validStatute"/>
</phase>
<!-- Rules resulting in errors (must prevent submission) -->
<phase id="Errors">
  <active pattern="minOfficerData"/>
  <active pattern="minAgencyData"/>
</phase>
The pattern attribute is an IDREF to a related pattern ID elsewhere in the document. A developer can then write code that blocks submission only when validation failures come from one of the above phases (here, Errors). Schematron validation engines typically have command line switches or parameters to specify which phase should be run. For example, when the ISO Schematron stylesheets are run with Saxon, the parameter “phase=x” is used, where x is one of the phase IDs listed above or “#ALL” if all phases should be processed.
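To show where the phase elements sit in a complete schema, here is a minimal sketch. Only the phase and pattern ids come from the example above; the rule bodies, the element names inside them, the j namespace URI, and the defaultPhase setting are assumptions for illustration and should be checked against your own exchange schema.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
        queryBinding="xslt2" defaultPhase="Errors">
  <ns prefix="nc" uri="http://niem.gov/niem/niem-core/2.0"/>
  <!-- Assumed JXDM namespace; adjust to the NIEM release in use -->
  <ns prefix="j" uri="http://niem.gov/niem/domains/jxdm/4.0"/>

  <phase id="Warnings">
    <active pattern="validStatute"/>
  </phase>
  <phase id="Errors">
    <active pattern="minAgencyData"/>
  </phase>

  <!-- Hypothetical rule: an agency must carry a name -->
  <pattern id="minAgencyData">
    <rule context="nc:Organization">
      <assert test="string-length(normalize-space(nc:OrganizationName)) > 0">
        An agency name must be provided.
      </assert>
    </rule>
  </pattern>

  <!-- Hypothetical rule: a statute should carry a code -->
  <pattern id="validStatute">
    <rule context="j:Statute">
      <assert test="j:StatuteCodeIdentification/nc:IdentificationID">
        A statute code should be provided.
      </assert>
    </rule>
  </pattern>
</schema>
```

With defaultPhase set to Errors, a validator that is not told which phase to run would be expected to apply only the blocking rules; the Warnings phase can then be requested explicitly in a second pass.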
Friday, October 9, 2009
Schematron: Recommended Reading
Wednesday, October 7, 2009
Schematron: Officer Has a Last Name
This is the first in a series of code example articles that will be posted to give NIEM developers a head start in using Schematron. This example will show how to perform a test across multiple branches or nodes of a typical NIEM schema, because a law enforcement officer is a role played by a person in NIEM schemas. Take the following example XML code:
<ns:SomeDocument>
  <j:Citation>
    <j:CitationIssuingOfficial>
      <nc:RoleOfPersonReference s:ref="P1"/>
    </j:CitationIssuingOfficial>
  </j:Citation>
  <nc:Person s:id="P1">
    <nc:PersonName>
      <nc:PersonSurName>Smith</nc:PersonSurName>
    </nc:PersonName>
  </nc:Person>
</ns:SomeDocument>
One way in which to test for the existence of a last name is to match the ID with the officer's REF and test to be sure the string length is at least 1, as shown in the following example (using XSLT 2.0 and ISO Schematron):
<pattern id="eOfficerData">
  <let name="sOfficerRef" value="ns:SomeDocument/j:Citation/j:CitationIssuingOfficial/nc:RoleOfPersonReference/@s:ref"/>
  <rule context="ns:SomeDocument/nc:Person">
    <report test="@s:id = $sOfficerRef and string-length(nc:PersonName/nc:PersonSurName) &lt; 1">
      Officer's last name must be provided.
    </report>
  </rule>
</pattern>
In theory the same test can be done using the XPath id() function; however, use of the id() function is HIGHLY dependent on the parser's capabilities.
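One way to sidestep id() altogether is to start from the officer reference and look the person up with a predicate. The sketch below assumes an XSLT-based query binding (current() is an XSLT function) and that each s:id value is unique in the document; the pattern id is hypothetical.

```xml
<pattern id="eOfficerSurName">
  <!-- Context is the reference itself, so the rule only fires for officers -->
  <rule context="ns:SomeDocument/j:Citation/j:CitationIssuingOfficial/nc:RoleOfPersonReference">
    <!-- Find the nc:Person whose s:id equals this reference's s:ref -->
    <assert test="string-length(/ns:SomeDocument/nc:Person[@s:id = current()/@s:ref]/nc:PersonName/nc:PersonSurName) > 0">
      Officer's last name must be provided.
    </assert>
  </rule>
</pattern>
```

A side effect of this form is that the assertion also fails when the reference points at no nc:Person at all, since the string length of an empty selection is zero.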
Monday, October 5, 2009
To ISO or Not to ISO
Feature | v1.5 | ISO |
Variables | Not Supported | let element available. |
Query Language | XSLT 1.0/XPath 1.0 | XSLT 1.0/XPath 1.0, XSLT 2.0/XPath 2.0, EXSLT, STX, XSLT 1.1, etc. |
Abstract Patterns & Inheritance | Not Supported | Supported |
value-of Element (helpful in debugging and error messaging) | Not Supported | Supported |
xsl:key Element | Supported | Not Supported (Workaround Exists) |
flag Attribute | Not Supported | Supported |
SVRL | Not Supported | Supported |
include element | Not Supported | Supported |
- Support for variables is important when working with ID and IDREF (more about this in a later blog).
- XPath 2.0 functions provide a number of goodies that would be hard to pass up.
- ISO Schematron is a recognized ISO standard. . . which should count for something in a standards-based community.
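To make a few of the ISO-only rows in the table concrete, the following sketch uses the let element, the value-of element, the flag attribute, and an XSLT 2.0 query binding in one small schema. The pattern id, the variable name, and the choice of "warning" as the flag value are hypothetical.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <ns prefix="nc" uri="http://niem.gov/niem/niem-core/2.0"/>
  <pattern id="wSurNameBlank">
    <rule context="nc:PersonName">
      <!-- let: a variable, not available in Schematron 1.5 -->
      <let name="sSurName" value="normalize-space(nc:PersonSurName)"/>
      <!-- flag: lets the calling code tell warnings from errors -->
      <assert test="string-length($sSurName) > 0" flag="warning">
        <!-- value-of: echoes instance data into the message -->
        Surname is blank for person name "<value-of select="normalize-space(.)"/>".
      </assert>
    </rule>
  </pattern>
</schema>
```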
XML Editor Support for Schematron
Editor | Native Schematron Validation? | Native Schematron Editing? | Tutorial |
oXygen | Yes | Yes | http://www.oxygenxml.com/validation.html |
XMLSpy | No | Yes | http://xml.sys-con.com/node/40656 |
Editix | Yes | Yes | None available |