Abbey Workshop

XSLT: Split a Value List Into Elements

This page covers how to convert a list of space-separated values into separate elements. The technique is covered in Michael Kay's first book. (If you work with XSLT a lot, you need to own this book: it is the most comprehensive reference you can find on XSLT 1.0. Highly recommended.)

This technique has come up a couple of times at work over the last few months. I thought it best to put some sample code online so that the next time someone asks about it, they can be referred here.

For this example, assume that we have some data about the classes offered at a school or university this semester. Along with each class name, there is a list of the student IDs for that class. Some naughty programmer has put the student IDs (social security numbers) in a single element, separated by spaces, instead of marking up the data correctly. Our task is to convert the data inside that element into separate <id> elements. A sample of our input file is shown below.

Listing for: classes.xml

   1 <?xml version="1.0" encoding="UTF-8"?>
   2 <classList>
   3   <class name="Chemistry 101">
   4     <studentList>123-45-6789 234-56-7890 345-67-8901 456-78-9012 
   5     567-89-0123</studentList>
   6   </class>
   7   <class name="Biology 101">
   8     <studentList>
   9     123-45-6789
  10     234-56-7890
  11     345-67-8901
  12     456-78-9012 
  13     567-89-0123
  14     </studentList>
  15   </class>
  16 </classList>

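The example shows two different ways the data can be formatted. In Chemistry 101 the IDs are separated by spaces and wrap onto a second line, and in Biology 101 each ID sits on its own line. The technique converts both into elements because the whitespace is normalized before each item is removed from the list.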
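The following small Python model of XPath 1.0's normalize-space() shows why both layouts end up identical. That function trims leading and trailing whitespace and collapses each run of spaces, tabs, carriage returns, and line feeds into a single space. This is only a sketch for experimenting outside an XSLT processor. The function name normalize_space and the sample strings (with their indentation approximated) are assumptions and are not part of the original stylesheet.

```python
import re

# XPath 1.0 whitespace characters: space, tab, carriage return, line feed.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(s: str) -> str:
    """Model of normalize-space(): collapse whitespace runs, then trim the ends."""
    return _XML_WS.sub(" ", s).strip(" ")


# The two <studentList> bodies from classes.xml (indentation approximated).
chemistry = "123-45-6789 234-56-7890 345-67-8901 456-78-9012 \n    567-89-0123"
biology = """
    123-45-6789
    234-56-7890
    345-67-8901
    456-78-9012
    567-89-0123
    """

# Both lines print: 123-45-6789 234-56-7890 345-67-8901 456-78-9012 567-89-0123
print(normalize_space(chemistry))
print(normalize_space(biology))
```

After normalization, the single space is the only separator left. This lets the splitting template assume exactly one space between items.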

The basic approach is to pass the contents of the <studentList> element to a named template that extracts each item. The template calls itself recursively until all items have been removed from the list. A sample stylesheet is shown below:

Listing for: split-values.xsl

   1 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   2   version="1.0">
   3 
   4 <xsl:output indent="yes" method="xml"/>
   5 
   6 <xsl:template match="/">
   7 <classList>
   8   <xsl:for-each select="/classList/class">
   9   <class name="{@name}">
  10     <studentList>
  11     <xsl:call-template name="output-tokens">
  12       <xsl:with-param name="list"><xsl:value-of select="studentList" /></xsl:with-param>
  13     </xsl:call-template>
  14     </studentList>
  15   </class>
  16   </xsl:for-each>
  17 </classList>
  18 </xsl:template>
  19 
  20 <xsl:template name="output-tokens">
  21     <xsl:param name="list" />
  22     <xsl:variable name="newlist" select="concat(normalize-space($list), ' ')" />
  23     <xsl:variable name="first" select="substring-before($newlist, ' ')" />
  24     <xsl:variable name="remaining" select="substring-after($newlist, ' ')" />
  25     <id><xsl:value-of select="$first" /></id>
  26     <xsl:if test="$remaining">
  27         <xsl:call-template name="output-tokens">
  28             <xsl:with-param name="list" select="$remaining" />
  29         </xsl:call-template>
  30     </xsl:if>
  31 </xsl:template>
  32 </xsl:stylesheet>

The key to splitting the list is the output-tokens template on lines 20-31. Line 22 first normalizes the whitespace. This strips leading and trailing whitespace and converts any run of spaces or line feeds into a single space. The concat() function then appends a space as a delimiter so that the last item in the list is also followed by a space. Without it, substring-before() would return an empty string for the last item, because there would be no space left to find. Line 23 takes the first item from the list, and line 25 outputs it in an <id> element. Line 24 stores the rest of the items in the variable $remaining. If $remaining is not empty (line 26), the template calls itself with $remaining as the new list. The process repeats until all the items have been removed from the list.
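The recursion can be traced outside XSLT with a rough Python model of lines 20-31. The helpers substring_before() and substring_after() are hypothetical stand-ins. They follow the XPath 1.0 rule of returning an empty string when the separator is not found. output_tokens() returns a Python list where the template would emit <id> elements. This is a sketch for checking the logic, not a replacement for the stylesheet.

```python
import re

# XPath 1.0 whitespace characters: space, tab, carriage return, line feed.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(s: str) -> str:
    """Model of normalize-space(): collapse whitespace runs, then trim the ends."""
    return _XML_WS.sub(" ", s).strip(" ")


def substring_before(s: str, sep: str) -> str:
    """Model of substring-before(): text before the first sep, or '' if absent."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s: str, sep: str) -> str:
    """Model of substring-after(): text after the first sep, or '' if absent."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def output_tokens(lst: str) -> list[str]:
    """Model of the output-tokens template; returns the <id> values in order."""
    newlist = normalize_space(lst) + " "       # line 22: normalize, append a space
    first = substring_before(newlist, " ")     # line 23: first item
    remaining = substring_after(newlist, " ")  # line 24: everything after it
    ids = [first]                              # line 25: <id> for the first item
    if remaining:                              # line 26: a non-empty string is true
        ids += output_tokens(remaining)        # lines 27-29: recursive call
    return ids


# Prints: ['123-45-6789', '234-56-7890', '345-67-8901']
print(output_tokens("123-45-6789 234-56-7890\n   345-67-8901"))
```

The model also exposes one edge case in the stylesheet. An empty <studentList> still produces a single empty <id/>. In that case normalize-space() returns an empty string, so the list becomes a lone space, and substring-before() of that is empty as well.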

The file is transformed using Ant; see this how-to on the basics of XSLT transforms using Ant.

After processing, the output file looks like this:

Listing for: classes-idList.xml

   1 <?xml version="1.0" encoding="UTF-8"?>
   2 <classList>
   3   <class name="Chemistry 101">
   4     <studentList>
   5       <id>123-45-6789</id>
   6       <id>234-56-7890</id>
   7       <id>345-67-8901</id>
   8       <id>456-78-9012</id>
   9       <id>567-89-0123</id>
  10     </studentList>
  11   </class>
  12   <class name="Biology 101">
  13     <studentList>
  14       <id>123-45-6789</id>
  15       <id>234-56-7890</id>
  16       <id>345-67-8901</id>
  17       <id>456-78-9012</id>
  18       <id>567-89-0123</id>
  19     </studentList>
  20   </class>
  21 </classList>

That is a much better-looking set of output. You could also extend this code a bit further to use delimiters other than a space. For example:

Listing for: split-values-delim.xsl

   1 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   2   version="1.0">
   3 
   4 <xsl:output indent="yes" method="xml"/>
   5 
   6 <xsl:template match="/">
   7 <testroot>
   8     <xsl:call-template name="output-tokens">
   9       <xsl:with-param name="list">1,2,3,4,5</xsl:with-param>
  10       <xsl:with-param name="delimiter">,</xsl:with-param>
  11     </xsl:call-template>
  12 </testroot></xsl:template>
  13 
  14 <xsl:template name="output-tokens">
  15     <xsl:param name="list" />
  16     <xsl:param name="delimiter" />
  17     <xsl:variable name="newlist">
  18     <xsl:choose>
  19       <xsl:when test="contains($list, $delimiter)"><xsl:value-of select="normalize-space($list)" /></xsl:when>
  20       
  21       <xsl:otherwise><xsl:value-of select="concat(normalize-space($list), $delimiter)"/></xsl:otherwise>
  22     </xsl:choose>
  23   </xsl:variable>
  24     <xsl:variable name="first" select="substring-before($newlist, $delimiter)" />
  25     <xsl:variable name="remaining" select="substring-after($newlist, $delimiter)" />
  26     <num><xsl:value-of select="$first" /></num>
  27     <xsl:if test="$remaining">
  28         <xsl:call-template name="output-tokens">
  29             <xsl:with-param name="list" select="$remaining" />
  30       <xsl:with-param name="delimiter"><xsl:value-of select="$delimiter"/></xsl:with-param>
  31         </xsl:call-template>
  32     </xsl:if>
  33 </xsl:template>
  34 </xsl:stylesheet>

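The main change is lines 18-22. The template also takes a new delimiter parameter, declared on line 16 and passed along on line 30. When a space is the delimiter, the space appended on one pass is stripped by normalize-space() on the next, so extra delimiters never build up. A comma is not whitespace, so normalize-space() will not remove the extras. Appending a delimiter on every pass would add one delimiter for each one consumed, and $remaining would never become empty. Instead, the xsl:choose appends the delimiter only when the list contains none, which in practice means only the final item in the list gets one. A shout-out to Dave D'Amico for writing that last example.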
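The termination argument above can be checked with another Python model. output_tokens() mirrors split-values-delim.xsl, including the contains() test on the raw list. always_append_terminates() is a hypothetical naive variant that appends the delimiter on every pass, the way the space-based version does. Instead of recursing forever, it gives up after max_calls iterations, an arbitrary cap chosen for this sketch. Both helper names, and split_once(), are made up for illustration.

```python
import re

# XPath 1.0 whitespace characters: space, tab, carriage return, line feed.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(s: str) -> str:
    """Model of normalize-space(): collapse whitespace runs, then trim the ends."""
    return _XML_WS.sub(" ", s).strip(" ")


def split_once(s: str, sep: str) -> tuple[str, str]:
    """(substring-before, substring-after) as in XPath; ('', '') if sep is absent."""
    i = s.find(sep)
    return (s[:i], s[i + len(sep):]) if i >= 0 else ("", "")


def output_tokens(lst: str, delimiter: str) -> list[str]:
    """Model of the delimiter-aware template (lines 18-22): append the
    delimiter only when the list no longer contains one."""
    if delimiter in lst:                             # contains($list, $delimiter)
        newlist = normalize_space(lst)
    else:
        newlist = normalize_space(lst) + delimiter
    first, remaining = split_once(newlist, delimiter)
    result = [first]                                 # <num> for the first item
    if remaining:
        result += output_tokens(remaining, delimiter)
    return result


def always_append_terminates(lst: str, delimiter: str, max_calls: int = 100) -> bool:
    """Hypothetical naive variant that always appends the delimiter.
    Returns False if the list has not emptied after max_calls passes."""
    for _ in range(max_calls):
        newlist = normalize_space(lst) + delimiter
        _, lst = split_once(newlist, delimiter)
        if not lst:
            return True
    return False


print(output_tokens("1,2,3,4,5", ","))  # ['1', '2', '3', '4', '5']
```

With a space, always_append_terminates("a b c", " ") returns True, because normalize-space() removes the space appended on the previous pass. With a comma, always_append_terminates("1,2,3,4,5", ",") returns False. Once the digits are consumed, the list is stuck at four commas, since each pass adds one delimiter and removes one.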