How to Select Elements By Text in XPath?

Valentina Skakun
Last update: 31 Oct 2024

In XPath, selecting elements by text is a core technique for navigating and extracting specific information within XML or HTML structures. This functionality allows for locating elements based on their visible text content, making XPath a powerful tool for text-based data extraction and processing.

Key Methods for Text Selection in XPath

XPath provides several core methods for working with text, and a few that were not initially designed for this purpose but are still very convenient to use. Let’s start by talking about the basic text search methods. We will test them on this demo site as an example.

text(): Select Elements by Exact Text Match

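The text() node test selects an element's text nodes. Used in a predicate such as [text()='MacBook'], it matches elements that have a text node exactly equal to the given string. The comparison is sensitive to case and whitespace, so it works best when the text content is predictable.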

Example:

//*[text()='MacBook']

Try XPath to select the element

This will match only those elements whose text exactly matches “MacBook.”
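
To try the exact match outside the browser, here is a minimal sketch in Python. It assumes the third-party lxml library is installed, and the markup is hypothetical: it only imitates a product list and is not taken from the demo site.

```python
from lxml import etree

# Hypothetical markup; it only imitates a product list.
SAMPLE = """
<div>
  <h4><a href="/1">MacBook</a></h4>
  <h4><a href="/2">MacBook Air</a></h4>
  <h4><a href="/3"> MacBook </a></h4>
  <h4><a href="/4">macbook</a></h4>
</div>
"""

root = etree.fromstring(SAMPLE.strip())

# Exact match: same case, no surrounding whitespace, no extra words.
exact_hrefs = [el.get("href") for el in root.xpath("//*[text()='MacBook']")]
print(exact_hrefs)  # ['/1']
```

The padded " MacBook " and the lowercase "macbook" are skipped, which illustrates why this form suits predictable text only.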

contains(): Select Elements by Substring Match

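contains() is useful when searching for elements whose text includes a specific substring. The match is case-sensitive, and the function is commonly used for flexible text matching. In XPath 1.0, contains(text(), ...) checks only the element's first text node; use contains(., ...) to search the element's full text content.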

Example:

.//*[contains(text(), 'EOS')]

Select both elements

This query selects all elements containing “EOS” in their text. To refine this selection, you can specify a tag:

.//p[contains(text(), 'EOS')]

This returns only <p> tags that contain “EOS”.
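
The difference between the two queries can be sketched in Python. This assumes lxml is installed; the sample markup is made up for illustration.

```python
from lxml import etree

# Hypothetical markup with "EOS" in a link and in a paragraph.
SAMPLE = """
<div>
  <h4><a>Canon EOS 5D</a></h4>
  <p>Canon EOS 5D camera with a kit lens.</p>
  <p>Intel Core 2 Duo processor.</p>
</div>
"""

root = etree.fromstring(SAMPLE.strip())

# Any element whose first text node contains "EOS".
any_tags = [el.tag for el in root.xpath(".//*[contains(text(), 'EOS')]")]

# Same test, restricted to <p> elements.
p_tags = [el.tag for el in root.xpath(".//p[contains(text(), 'EOS')]")]

print(any_tags)  # ['a', 'p']
print(p_tags)    # ['p']
```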

starts-with(): Select Elements Starting with a Substring

The starts-with() method finds elements where the text begins with a specific substring. This method is case-sensitive, making it ideal for locating elements with a known prefix.

Example:

.//*[starts-with(text(), 'Ap')]

Use XPath to find the element

This matches elements with text starting with “Ap”.
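
A quick way to check the prefix match is a small Python sketch like the one below. It assumes lxml is installed, and the list items are invented for the example.

```python
from lxml import etree

# Hypothetical list; only the first item starts with "Ap" in that exact case.
root = etree.fromstring(
    "<ul><li>Apple Cinema 30</li><li>apple juice</li><li>Snapple</li></ul>"
)

prefixed = [str(t) for t in root.xpath(".//*[starts-with(text(), 'Ap')]/text()")]
print(prefixed)  # ['Apple Cinema 30']
```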

ends-with(): Select Elements Ending with a Substring

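ends-with() was introduced in XPath 2.0 and is not part of XPath 1.0, which is what browsers and many scraping libraries implement. In XPath 1.0 environments you need a workaround, such as comparing the tail of the string using substring() and string-length(), or filtering the results in your own code.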

Example (XPath 2.0+):

.//*[ends-with(text(), 'ok')]
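
For XPath 1.0 engines, one possible workaround is to compare the tail of the string using substring() and string-length(). The sketch below assumes Python with lxml installed (lxml evaluates XPath 1.0); the markup and the variable name $suffix are illustrative choices, not part of the original example.

```python
from lxml import etree

# Hypothetical list of product names.
root = etree.fromstring(
    "<ul><li>MacBook</li><li>MacBook Air</li><li>Notebook</li></ul>"
)

# XPath 1.0 stand-in for ends-with(text(), $suffix):
# take the last string-length($suffix) characters and compare them.
expr = (
    ".//li[substring(text(), "
    "string-length(text()) - string-length($suffix) + 1) = $suffix]"
)

ending_ok = [str(t) for li in root.xpath(expr, suffix="ok") for t in li.xpath("text()")]
print(ending_ok)  # ['MacBook', 'Notebook']
```

Passing the suffix as an XPath variable keeps the expression reusable and avoids quoting problems inside the string literal.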

Additional Text Processing Methods

Beyond the basic functions, XPath offers several others that are well suited to working with strings. This section shows how they help you identify the elements you need more precisely.

translate(): Ignore Case Sensitivity

When case-insensitivity is required, translate() can convert text to a uniform case. This is particularly useful for normalized text matching.

Example:

//*[starts-with(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'mac')]

Try translate() method

This matches elements where text starts with “mac” in any case.
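
The case-insensitive prefix match can be sketched in Python as follows. This assumes lxml is installed; the headings are made up, and the alphabets are passed as XPath variables ($upper, $lower) purely to keep the expression readable.

```python
from lxml import etree

# Hypothetical headings in mixed case.
root = etree.fromstring(
    "<div><h4>MacBook</h4><h4>MACBOOK PRO</h4>"
    "<h4>macbook air</h4><h4>iMac</h4></div>"
)

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# translate() maps each uppercase ASCII letter to its lowercase counterpart.
expr = "//*[starts-with(translate(text(), $upper, $lower), 'mac')]/text()"
mac_like = [str(t) for t in root.xpath(expr, upper=UPPER, lower=LOWER)]
print(mac_like)  # ['MacBook', 'MACBOOK PRO', 'macbook air']
```

Only the letters listed in the two alphabets are converted, so accented or non-Latin characters would need to be added to both strings.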

not(): Exclude Elements by Text

The not() function filters out elements that contain specific text, which is valuable when removing certain elements from the result set.

Example:

//h4[not(contains(a/text(), 'Mac'))]

Find all Elements using XPath in Console

This selects all <h4> elements whose child <a> text does not contain the substring “Mac”.
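
A small Python sketch of the exclusion, assuming lxml is installed and using invented product names:

```python
from lxml import etree

# Hypothetical product headings, each wrapping a link.
root = etree.fromstring(
    "<div>"
    "<h4><a>MacBook</a></h4><h4><a>iPhone</a></h4>"
    "<h4><a>iMac</a></h4><h4><a>Canon EOS 5D</a></h4>"
    "</div>"
)

# Keep only headings whose link text does not contain "Mac".
non_mac = [str(t) for t in root.xpath("//h4[not(contains(a/text(), 'Mac'))]/a/text()")]
print(non_mac)  # ['iPhone', 'Canon EOS 5D']
```

Note that "iMac" is excluded too, because contains() looks for the substring anywhere in the text.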

position(): Select Elements by Position

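The position() function selects elements by their position within the node set that the current step selects. In an expression like //h4[position() = 1], the position is counted among the <h4> children of each parent, not across the whole document; wrap the path in parentheses, as in (//h4)[1], to index the combined result.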

Example:

//h4[position() = 1]

To get the last item:

//h4[position() = last()]

You can also specify a range of elements:

//h4[position() >= 2 and position() <= 4]
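
The sketch below, in Python with lxml assumed to be installed, shows how these predicates behave when the <h4> elements sit under more than one parent. The two-section markup is hypothetical.

```python
from lxml import etree

# Hypothetical page with headings split across two parents.
root = etree.fromstring(
    "<div>"
    "<section><h4>A</h4><h4>B</h4><h4>C</h4></section>"
    "<section><h4>D</h4><h4>E</h4></section>"
    "</div>"
)

def texts(expr):
    """Return the matched text nodes as plain strings."""
    return [str(t) for t in root.xpath(expr)]

# Without parentheses, position is counted per parent.
first_per_parent = texts("//h4[position() = 1]/text()")
last_per_parent = texts("//h4[position() = last()]/text()")

# With parentheses, position is counted across the whole result.
first_in_doc = texts("(//h4)[1]/text()")
range_in_doc = texts("(//h4)[position() >= 2 and position() <= 4]/text()")

print(first_per_parent)  # ['A', 'D']
print(last_per_parent)   # ['C', 'E']
print(first_in_doc)      # ['A']
print(range_in_doc)      # ['B', 'C', 'D']
```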

normalize-space(): Remove Extra Spaces

The normalize-space() function strips leading and trailing whitespace and collapses each internal run of whitespace into a single space, producing cleaner results for elements with complex spacing.

Example:

normalize-space("   This        is an      example")

Results in: “This is an example”

Test normalize-space() method
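
In practice normalize-space() is often used inside a predicate, so that stray line breaks and indentation do not defeat an exact match. A minimal Python sketch, assuming lxml is installed and using made-up markup:

```python
from lxml import etree

# Hypothetical markup: the first paragraph has messy whitespace.
root = etree.fromstring(
    "<div><p>\n   MacBook    Air\n</p><p>MacBook</p></div>"
)

# normalize-space() on a literal string, as in the example above.
cleaned = str(root.xpath("normalize-space('   This        is an      example')"))

# A plain comparison fails because of the extra whitespace...
raw_hits = root.xpath("//p[text() = 'MacBook Air']")

# ...while the normalized comparison succeeds.
normalized_hits = root.xpath("//p[normalize-space(text()) = 'MacBook Air']")

print(cleaned)               # This is an example
print(len(raw_hits))         # 0
print(len(normalized_hits))  # 1
```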

Advanced Techniques

Now that we’ve covered the primary methods for finding and processing text, let’s dive into more advanced XPath techniques for working with text. We’ll start by exploring how to use regular expressions to find elements.

Regular Expressions in XPath 2.0+

For environments that support XPath 2.0+, regular expressions offer advanced matching capabilities. This can be useful for patterns like email addresses.

Example:

//*[matches(text(), '[\w\.-]+@[\w\.-]+')]

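This finds elements whose text contains an email-like substring. matches() succeeds on a partial match, so anchor the pattern with ^ and $ when the whole text must be an email address.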
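
Many XPath 1.0 engines do not provide matches(). As a substitute technique (not the XPath 2.0 function itself), lxml exposes the EXSLT regular expression extension, where re:test() plays a similar role. The sketch below assumes Python with lxml installed; the markup and addresses are invented.

```python
from lxml import etree

# EXSLT regular expressions namespace, supported by lxml.
NS = {"re": "http://exslt.org/regular-expressions"}

# Hypothetical markup with made-up addresses.
root = etree.fromstring(
    "<div>"
    "<p>sales@example.com</p>"
    "<p>no address here</p>"
    "<p>Contact: help.desk@example.org today</p>"
    "</div>"
)

# Unanchored pattern: matches any text containing an email-like substring.
loose_expr = r"//*[re:test(text(), '[\w\.-]+@[\w\.-]+')]/text()"
loose = [str(t) for t in root.xpath(loose_expr, namespaces=NS)]

# Anchored pattern: the whole text node must look like an email address.
strict_expr = r"//*[re:test(text(), '^[\w\.-]+@[\w\.-]+$')]/text()"
strict = [str(t) for t in root.xpath(strict_expr, namespaces=NS)]

print(loose)   # ['sales@example.com', 'Contact: help.desk@example.org today']
print(strict)  # ['sales@example.com']
```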

Combining Methods for Complex Queries

XPath methods can be combined to create complex queries, useful for cases such as finding elements containing both “@” and “.” but not containing spaces.

Example:

//div[contains(text(),'@') and contains(text(),'.') and not(contains(text(),' '))]/text()

This expression uses several criteria as a rough check of whether a string looks like an email address:

  1. It must contain the @ symbol.

  2. It must contain at least one dot (.).

  3. It must not contain any spaces.

The element will be ignored if any of these conditions are not met.
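
The three conditions can be checked against a few hand-written cases. This Python sketch assumes lxml is installed, uses hypothetical markup, and writes the third condition with a single space so that any space disqualifies the text.

```python
from lxml import etree

# Hypothetical markup: only the first <div> satisfies all three conditions.
root = etree.fromstring(
    "<section>"
    "<div>user@example.com</div>"           # has @, has dot, no space
    "<div>write to user@example.com</div>"  # contains spaces
    "<div>example.com</div>"                # no @
    "<div>user@localhost</div>"             # no dot
    "</section>"
)

expr = (
    "//div[contains(text(),'@') and contains(text(),'.') "
    "and not(contains(text(),' '))]/text()"
)

email_like = [str(t) for t in root.xpath(expr)]
print(email_like)  # ['user@example.com']
```

This is a heuristic rather than real validation: a string such as "a@b." would also pass.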
