How to Use XPath String Functions
While XPath is excellent for selecting nodes based on their position in the XML tree, much of its practical power comes from querying the text content of those nodes. XPath provides a library of built-in string functions that allow you to search, manipulate, and format the text within elements and attributes.
Let's walk through the most common and useful string functions in XPath, providing practical examples for each based on a sample XML document.
Overview of String Functions
| Function | Description |
|---|---|
| contains() | Checks if a string contains a specific substring. |
| starts-with() | Checks if a string begins with a specific substring. |
| substring-before() | Extracts the part of a string before a delimiter. |
| substring-after() | Extracts the part of a string after a delimiter. |
| substring() | Extracts a substring by position and length. |
| string-length() | Returns the number of characters in a string. |
| normalize-space() | Trims leading/trailing whitespace and collapses each internal run of whitespace into a single space. |
| concat() | Joins two or more strings together. |
| translate() | Replaces (or removes) individual characters in a string. |
Practical Examples of Searching and Matching Functions
These functions are almost always used inside predicates [...] and return true or false.
All examples will refer to the following XML document:
<inventory>
<item id="bk101">
<author>Nolan, Tom</author>
<title>XML Guide</title>
<price>44.95</price>
<description>An in-depth look at creating applications with XML.</description>
</item>
<item id="bk102">
<author>Nollan, Tom</author>
<title>Midnight Rain</title>
<price>5.95</price>
<description>A former architect battles corporate zombies...</description>
</item>
</inventory>
contains()
Checks if the first string contains the second string. It is case-sensitive.
Example: Select all items whose title contains the word 'XML'.
//item[contains(title, 'XML')]
Output: this selects the first <item> element, since only "XML Guide" contains 'XML'.
<item id="bk101">
<author>Nolan, Tom</author>
<title>XML Guide</title>
<price>44.95</price>
<description>An in-depth look at creating applications with XML.</description>
</item>
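Because contains() is a plain, case-sensitive substring test, its behavior can be modeled with Python's in operator. The sketch below is a minimal illustration, not an XPath engine: it assumes the standard-library xml.etree.ElementTree (whose limited path syntax has no string functions), a trimmed copy of the sample document, and a hypothetical helper named items_where_title_contains().

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample inventory: only ids and titles are needed here
SAMPLE = """<inventory>
  <item id="bk101"><title>XML Guide</title></item>
  <item id="bk102"><title>Midnight Rain</title></item>
</inventory>"""


def items_where_title_contains(xml_text: str, needle: str) -> list[str]:
    """Model of //item[contains(title, needle)]; returns the matching ids."""
    root = ET.fromstring(xml_text)
    return [
        item.get("id", "")
        for item in root.iter("item")
        # findtext() yields "" for a missing or empty <title>;
        # 'in' is case-sensitive, just like contains()
        if needle in item.findtext("title", default="")
    ]


print(items_where_title_contains(SAMPLE, "XML"))  # ['bk101']
print(items_where_title_contains(SAMPLE, "xml"))  # []
```

The second call shows the case sensitivity: lowercase 'xml' matches neither title.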
starts-with()
Checks if the first string begins with the second string. It is case-sensitive.
Example: Select all items whose ID attribute starts with 'bk'.
//item[starts-with(@id, 'bk')]
Output: this selects both <item> elements in the document.
<item id="bk101">
<author>Nolan, Tom</author>
<title>XML Guide</title>
<price>44.95</price>
<description>An in-depth look at creating applications with XML.</description>
</item>
<item id="bk102">
<author>Nollan, Tom</author>
<title>Midnight Rain</title>
<price>5.95</price>
<description>A former architect battles corporate zombies...</description>
</item>
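starts-with() corresponds to Python's str.startswith(). The sketch below models the attribute test under the same assumptions (standard-library ElementTree, a trimmed copy of the sample, a hypothetical helper name). One edge case worth knowing: every string starts with the empty string, so starts-with(@id, '') is true for every item.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample inventory: only the id attributes matter here
SAMPLE = """<inventory>
  <item id="bk101"/>
  <item id="bk102"/>
</inventory>"""


def items_where_id_starts_with(xml_text: str, prefix: str) -> list[str]:
    """Model of //item[starts-with(@id, prefix)]; returns the matching ids."""
    root = ET.fromstring(xml_text)
    # A missing @id is treated as "", as XPath does for an empty node-set
    return [
        item.get("id", "")
        for item in root.iter("item")
        if item.get("id", "").startswith(prefix)
    ]


print(items_where_id_starts_with(SAMPLE, "bk"))  # ['bk101', 'bk102']
print(items_where_id_starts_with(SAMPLE, "BK"))  # [] (case-sensitive)
print(items_where_id_starts_with(SAMPLE, ""))    # ['bk101', 'bk102']
```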
Substring and Extraction Functions
These functions return a new string that is a piece of the original string.
substring-before()
Returns the portion of the first string that occurs before the first appearance of the second string, or an empty string if the second string does not appear.
Example: Get the last name of the first author.
substring-before(//item[1]/author, ',')
Output: this returns the string 'Nolan'.
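A small pure-Python model makes two rules concrete: only the first occurrence of the delimiter counts, and an absent delimiter produces an empty result. The second rule differs from Python's own str.partition(), which returns the whole input as its first element when the separator is missing. The function name substring_before() is illustrative, not part of any library.

```python
def substring_before(s: str, delim: str) -> str:
    """Model of XPath 1.0 substring-before(s, delim)."""
    i = s.find(delim)  # index of the FIRST occurrence, or -1 if absent
    if i < 0:
        return ""      # absent delimiter: empty string, not the whole input
    return s[:i]


print(repr(substring_before("Nolan, Tom", ",")))  # 'Nolan'
print(repr(substring_before("a,b,c", ",")))       # 'a'
print(repr(substring_before("Nolan Tom", ",")))   # ''
print("Nolan Tom".partition(",")[0])              # Nolan Tom
```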
substring-after()
Returns the portion of the first string that occurs after the first appearance of the second string, or an empty string if the second string does not appear.
Example: Get the first name of the first author.
substring-after(//item[1]/author, ', ')
Output: this returns the string 'Tom'. (Note the space after the comma in the delimiter.)
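The delimiter choice matters because substring-after() returns everything that follows the match, verbatim. In the pure-Python model below (function name illustrative), splitting on ',' alone keeps the leading space that the two-character delimiter ', ' removes.

```python
def substring_after(s: str, delim: str) -> str:
    """Model of XPath 1.0 substring-after(s, delim)."""
    i = s.find(delim)  # only the first occurrence is used
    if i < 0:
        return ""      # absent delimiter: empty string
    return s[i + len(delim):]


author = "Nolan, Tom"
print(repr(substring_after(author, ", ")))  # 'Tom'
print(repr(substring_after(author, ",")))   # ' Tom' (leading space kept)
print(repr(substring_after("a,b,c", ",")))  # 'b,c'
```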
substring()
Extracts a substring based on a starting position and an optional length. Important: XPath positions are 1-based, not 0-based as in many programming languages.
Example: Get the first 3 characters of the first item's title.
substring(//item[1]/title, 1, 3)
Output: this returns the string 'XML'.
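XPath 1.0 also defines what happens with fractional or out-of-range arguments: the character at position p is kept when round(start) <= p < round(start) + round(length), where round() sends halves toward positive infinity. The sketch below models that rule in Python (function names illustrative); the extra test values mirror examples given in the XPath 1.0 Recommendation.

```python
import math


def xpath_round(x: float) -> float:
    """XPath 1.0 round(): nearest integer, halves go toward +infinity."""
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x + 0.5))


def xpath_substring(s: str, start: float, length: float = math.inf) -> str:
    """Model of XPath 1.0 substring(s, start, length); positions are 1-based."""
    first = xpath_round(start)
    end = first + xpath_round(length)  # exclusive; NaN for -inf + inf
    # Comparisons with NaN are False, so a NaN bound selects nothing
    return "".join(ch for p, ch in enumerate(s, start=1) if first <= p < end)


print(xpath_substring("XML Guide", 1, 3))  # XML
print(xpath_substring("12345", 1.5, 2.6))  # 234
print(xpath_substring("12345", 0, 3))      # 12
print(xpath_substring("12345", 2))         # 2345
```

Note how substring("12345", 0, 3) returns only two characters: position 0 does not exist, but it still consumes one unit of the length.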
Practical Examples of Utility and Formatting Functions
string-length()
Returns the number of characters in a string. It is often used in predicates for filtering.
Example: Select all items with a title longer than 10 characters.
//item[string-length(title) > 10]
Output: this selects the second <item> ("Midnight Rain" has 13 characters, while "XML Guide" has only 9).
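XPath 1.0 defines string-length() as the number of characters in the string; for the sample titles this is the same as Python's len(). The sketch below assumes the standard-library ElementTree, a trimmed copy of the sample, and hypothetical helper names. It prints each title's length first, which makes it easy to choose a threshold that actually separates the items.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample inventory: only ids and titles are needed here
SAMPLE = """<inventory>
  <item id="bk101"><title>XML Guide</title></item>
  <item id="bk102"><title>Midnight Rain</title></item>
</inventory>"""


def title_lengths(xml_text: str) -> dict[str, int]:
    """Map each item id to the character count of its title."""
    root = ET.fromstring(xml_text)
    return {
        item.get("id", ""): len(item.findtext("title", default=""))
        for item in root.iter("item")
    }


def items_with_title_longer_than(xml_text: str, n: int) -> list[str]:
    """Model of //item[string-length(title) > n]."""
    return [i for i, length in title_lengths(xml_text).items() if length > n]


print(title_lengths(SAMPLE))                     # {'bk101': 9, 'bk102': 13}
print(items_with_title_longer_than(SAMPLE, 15))  # []
print(items_with_title_longer_than(SAMPLE, 10))  # ['bk102']
```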
normalize-space()
Trims all leading and trailing whitespace and replaces each internal run of whitespace characters (spaces, tabs, carriage returns, and line feeds) with a single space. This is especially useful for cleaning up data before a comparison.
Example: Select the item whose description is exactly 'An in-depth look...'.
//item[normalize-space(description) = 'An in-depth look at creating applications with XML.']
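Output: this selects the first <item>. Because the description is normalized before the comparison, the match would still succeed if the element contained extra leading, trailing, or internal whitespace (for example, if its text were indented across several lines).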
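XPath 1.0 treats only four characters as whitespace for this function: space, tab, carriage return, and line feed (the S production of XML). Python's str.split() with no arguments collapses a broader Unicode set, including the no-break space U+00A0, so a faithful model matches the four characters explicitly. The sketch below (function name illustrative) runs on a hypothetically indented version of the first description.

```python
import re

# The XML "S" production: space, tab, carriage return, line feed
_XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def normalize_space(s: str) -> str:
    """Model of XPath 1.0 normalize-space(s)."""
    collapsed = _XML_WHITESPACE.sub(" ", s)  # each run becomes one space
    return collapsed.strip(" ")              # then trim both ends


padded = "\n    An in-depth look at creating\n    applications with XML.  "
print(repr(normalize_space(padded)))
# 'An in-depth look at creating applications with XML.'
```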
concat()
Joins two or more strings into a single string. This is typically used to create a new value rather than to select a node.
Example: Create a string combining the author and title.
concat(//item[2]/author, ' - ', //item[2]/title)
Output: this returns the single string 'Nollan, Tom - Midnight Rain', using the second item's author exactly as spelled in the sample.
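concat() converts each argument to a string before joining them; when an argument is a node-set, XPath 1.0 uses the string-value of the first node in document order. The sketch below reproduces the example with the standard-library ElementTree and a trimmed copy of the sample; the helper name author_dash_title() is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample inventory: authors and titles only
SAMPLE = """<inventory>
  <item id="bk101"><author>Nolan, Tom</author><title>XML Guide</title></item>
  <item id="bk102"><author>Nollan, Tom</author><title>Midnight Rain</title></item>
</inventory>"""


def author_dash_title(xml_text: str, position: int) -> str:
    """Model of concat(//item[position]/author, ' - ', //item[position]/title)."""
    root = ET.fromstring(xml_text)
    item = root.findall("item")[position - 1]  # XPath positions are 1-based
    parts = [item.findtext("author", ""), " - ", item.findtext("title", "")]
    return "".join(parts)


print(author_dash_title(SAMPLE, 2))  # Nollan, Tom - Midnight Rain
```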
translate()
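Performs a character-by-character replacement. It takes three strings: the original, a string of characters to be replaced, and a string of replacement characters. Each character in the second string is replaced by the character at the same position in the third; if the third string is shorter, the unmatched characters are removed from the result.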
Example: Convert the first title to lowercase (for the characters 'X', 'M', 'L').
translate(//item[1]/title, 'XML', 'xml')
Output: this returns the string 'xml Guide'. The X is replaced by x, M by m, and L by l.
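Python's str.translate() can model this function with a mapping table, adjusted for two XPath 1.0 rules: a character in the second argument with no counterpart in the third is deleted, and if a character appears more than once in the second argument, its first occurrence decides the mapping. The sketch below (function name illustrative) also shows the common XPath 1.0 idiom for lowercasing ASCII letters, mapping A to Z onto a to z, since lower-case() only arrived in XPath 2.0.

```python
import string


def xpath_translate(s: str, src: str, dst: str) -> str:
    """Model of XPath 1.0 translate(s, src, dst)."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) in table:
            continue  # a repeated character keeps its first mapping
        # Map to the character at the same position in dst, or delete it
        # (None) when dst is shorter than src
        table[ord(ch)] = ord(dst[i]) if i < len(dst) else None
    return s.translate(table)


print(xpath_translate("XML Guide", "XML", "xml"))  # xml Guide
print(xpath_translate("--aaa--", "abc-", "ABC"))   # AAA
# Common XPath 1.0 lowercasing idiom for ASCII letters
print(xpath_translate("XML Guide", string.ascii_uppercase, string.ascii_lowercase))
# xml guide
```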
Conclusion
XPath's string functions are essential for moving beyond simple node selection. They provide the tools you need to inspect, compare, and manipulate the text content within your XML documents.
- Use contains() and starts-with() inside predicates [...] for powerful filtering.
- Use substring-before(), substring-after(), and substring() to extract specific pieces of data from a string.
- Use normalize-space() to reliably compare text values that may have inconsistent whitespace.
Mastering these functions will dramatically increase the power and precision of your XPath queries.