Thursday, February 11, 2010

XSLT: Using the generate-id() Function

NIEM uses ID and IDREF attributes heavily throughout the data standard. While these types are native to the W3C XML Schema specification (.XSD) and in no way “unique” to NIEM, NIEM relies on them much more heavily than many other national and international standards do.

When converting or transforming to NIEM from another data standard, it quickly becomes necessary to generate unique identifiers in a common and consistent manner for key “noun” elements such as Persons, Places, Vehicles, and the like. A number of home-grown functions for this are scattered around the Internet; however, XSLT already has a native function for the task: generate-id().

Say the following non-NIEM-conformant XML payload is provided to a system processing citation data:

<CitationBatch>
  <Citation>
    <CitationNumber>123456</CitationNumber>
    <CitationDefendant>
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
      <PhoneNumber>123-456-7890</PhoneNumber>
    </CitationDefendant>
    <!-- Remainder Omitted -->
  </Citation>
</CitationBatch>

In NIEM, the <CitationDefendant> element above corresponds to <j:CitationSubject>, which holds an <nc:RoleOfPersonReference> pointing to a separate <nc:Person> element rather than embedding all of the person's information as child elements within the citation. Additionally, the phone number for any given person is contained within an <nc:ContactInformation> element.

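The XSLT generate-id() function takes a node-set as its argument (the context node if the argument is omitted) and returns a string that uniquely identifies the first node of that set in document order. Within a single transformation it returns the same ID for the same node no matter where, or how many times, it is called. The ID always starts with a letter and contains only ASCII alphanumeric characters, so it is a valid XML ID value. For example, take the following XSLT snippets: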

<xsl:for-each select="$xmlInputFile/CitationBatch/Citation">
    <xsl:variable name="xmlCiteNode" select="."/>
    <j:CitationSubject>
        <nc:RoleOfPersonReference>
            <xsl:attribute name="s:ref">
                <xsl:value-of select="generate-id($xmlCiteNode/CitationDefendant)"/>
            </xsl:attribute>
        </nc:RoleOfPersonReference>
    </j:CitationSubject>
</xsl:for-each>
....
....
....
<xsl:for-each select="$xmlInputFile/CitationBatch/Citation/CitationDefendant">
    <xsl:variable name="xmlCiteSubjectNode" select="."/>
    <nc:Person>
        <xsl:attribute name="s:id">
            <xsl:value-of select="generate-id($xmlCiteSubjectNode)"/>
        </xsl:attribute>
    </nc:Person>
</xsl:for-each>
....
....
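For anyone who wants to run the two snippets end to end, here is a minimal sketch that combines them into a single XSLT 1.0 stylesheet and can be run against the sample citation payload. Several pieces are assumptions rather than part of the original: the namespace URIs are the NIEM 2.0 ones (substitute those of the NIEM release you target), CitationExchange is a hypothetical wrapper element, and $xmlInputFile is simply bound to the source document because the snippets don't show how it was defined. Attribute value templates replace the xsl:attribute blocks; the result is the same.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:j="http://niem.gov/niem/domains/jxdm/4.0"
    xmlns:nc="http://niem.gov/niem/niem-core/2.0"
    xmlns:s="http://niem.gov/niem/structures/2.0">
    <!-- Assumed NIEM 2.0 namespace URIs; adjust for your NIEM release. -->

    <xsl:output method="xml" indent="yes"/>

    <!-- Assumption: $xmlInputFile is bound to the root of the source document. -->
    <xsl:variable name="xmlInputFile" select="/"/>

    <xsl:template match="/">
        <!-- Hypothetical wrapper element; a real exchange defines its own root. -->
        <CitationExchange>
            <!-- References: one j:CitationSubject per citation -->
            <xsl:for-each select="$xmlInputFile/CitationBatch/Citation">
                <xsl:variable name="xmlCiteNode" select="."/>
                <j:CitationSubject>
                    <nc:RoleOfPersonReference s:ref="{generate-id($xmlCiteNode/CitationDefendant)}"/>
                </j:CitationSubject>
            </xsl:for-each>
            <!-- Targets: one nc:Person per defendant, ID generated from the same node -->
            <xsl:for-each select="$xmlInputFile/CitationBatch/Citation/CitationDefendant">
                <xsl:variable name="xmlCiteSubjectNode" select="."/>
                <nc:Person s:id="{generate-id($xmlCiteSubjectNode)}"/>
            </xsl:for-each>
        </CitationExchange>
    </xsl:template>
</xsl:stylesheet>
```

Each s:ref value should match the s:id of the corresponding nc:Person, although the exact ID strings depend on the XSLT processor.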

Even though generate-id() is called in two places within the transform, through two different variables, it returns exactly the same ID in both places, because both XPath expressions resolve to the same CitationDefendant element in the input document. The output of the above would look like the following (the ID string itself, d0e8 here, is chosen by the XSLT processor and will differ between processors):

....
....
<j:CitationSubject>
    <nc:RoleOfPersonReference s:ref="d0e8"/>
</j:CitationSubject>
....
....
<nc:Person s:id="d0e8"/>
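To check this behavior directly, the following stand-alone XSLT 1.0 stylesheet is a small sketch that tests three properties of generate-id() against the sample payload. First, the same node reached through two different variables yields the same ID. Second, different nodes yield different IDs. Third, an empty node-set yields an empty string. The third property matters for the first snippet. A Citation with no CitationDefendant would produce s:ref="". Also, because generate-id() only identifies the first node of its argument, a Citation with several defendants would reference only the first. NoSuchElement is a deliberately absent, hypothetical element name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Two different routes to the same CitationDefendant element -->
        <xsl:variable name="xmlCiteNode" select="/CitationBatch/Citation[1]"/>
        <xsl:variable name="xmlCiteSubjectNode" select="/CitationBatch/Citation[1]/CitationDefendant"/>

        <!-- 1. Same node, same ID, whichever variable or path is used -->
        <xsl:text>same node: </xsl:text>
        <xsl:value-of select="generate-id($xmlCiteNode/CitationDefendant) = generate-id($xmlCiteSubjectNode)"/>
        <xsl:text>&#10;</xsl:text>

        <!-- 2. Different nodes (Citation vs. CitationDefendant) get different IDs -->
        <xsl:text>different nodes: </xsl:text>
        <xsl:value-of select="generate-id($xmlCiteNode) != generate-id($xmlCiteSubjectNode)"/>
        <xsl:text>&#10;</xsl:text>

        <!-- 3. An empty node-set (hypothetical, absent NoSuchElement) yields '' -->
        <xsl:text>empty node-set: '</xsl:text>
        <xsl:value-of select="generate-id($xmlCiteNode/NoSuchElement)"/>
        <xsl:text>'&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Run against the sample payload, an XSLT 1.0 processor should print "same node: true", "different nodes: true" and "empty node-set: ''" on three lines. Where a citation may lack a defendant or have several, iterating over CitationDefendant (or wrapping the reference in xsl:if) avoids the empty or incomplete references noted above.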

This powerful XSLT function dramatically eases the use of ID and IDREF in XML and makes implementing transforms to NIEM far simpler.