How to Select Elements By Text in XPath?

Valentina Skakun

Last update: 31 Oct 2024

In XPath, selecting elements by text is a core technique for navigating and extracting specific information within XML or HTML structures. This functionality allows for locating elements based on their visible text content, making XPath a powerful tool for text-based data extraction and processing.

Key Methods for Text Selection in XPath

XPath provides several core methods for working with text, and a few that were not initially designed for this purpose but are still very convenient to use. Let’s start by talking about the basic text search methods. We will test them on this demo site as an example.

text(): Select Elements by Exact Text Match

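text() is a node test that selects an element's text nodes. Comparing it with a string in a predicate keeps only the elements that have a text node exactly equal to that string. The comparison is sensitive to case and whitespace, so it works best when the text content is predictable.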

Example:

//*[text()='MacBook']

This will match only those elements whose text exactly matches “MacBook.”
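
As a minimal sketch of how strict this match is, the snippet below runs the same expression with Python's lxml library. The library choice and the markup are assumptions made for illustration; they are not taken from the demo site.

```python
from lxml import etree

# Hypothetical markup; element names and texts are made up for illustration.
SNIPPET = """
<div>
  <h4><a>MacBook</a></h4>
  <h4><a>MacBook Air</a></h4>
  <h4><a> MacBook </a></h4>
  <h4><a>macbook</a></h4>
</div>
"""

root = etree.fromstring(SNIPPET)

# Only the first <a> has a text node exactly equal to 'MacBook'.
exact_texts = [el.text for el in root.xpath("//*[text()='MacBook']")]
print(exact_texts)  # ['MacBook']
```

The variants with extra words, surrounding spaces, or different case are all skipped.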

contains(): Select Elements by Substring Match

contains() is useful when searching for elements that include a specific substring. This method is case-sensitive and is commonly used for flexible text matching.

Example:

.//*[contains(text(), 'EOS')]

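This query selects all elements containing “EOS” in their text. Note that in XPath 1.0, contains(text(), ...) checks only the element's first text node, so text that follows a child tag is not considered. To refine this selection, you can specify a tag: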

.//p[contains(text(), 'EOS')]

This returns only <p> tags that contain “EOS”.
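
The two queries above can be compared side by side. This is a sketch that assumes Python's lxml library and a made-up product snippet:

```python
from lxml import etree

# Hypothetical markup for illustration only.
SNIPPET = """
<div>
  <h4>Canon EOS 2000D</h4>
  <p>Canon EOS R6 body</p>
  <p>Nikon D3500</p>
  <p>canon eos m50</p>
</div>
"""

root = etree.fromstring(SNIPPET)

# Any descendant element whose text contains 'EOS' (case-sensitive).
any_tags = [el.tag for el in root.xpath(".//*[contains(text(), 'EOS')]")]

# The same condition restricted to <p> elements.
p_texts = [el.text for el in root.xpath(".//p[contains(text(), 'EOS')]")]

print(any_tags)  # ['h4', 'p']
print(p_texts)   # ['Canon EOS R6 body']
```

The lowercase "canon eos m50" is not matched in either query because contains() is case-sensitive.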

starts-with(): Select Elements Starting with a Substring

The starts-with() method finds elements where the text begins with a specific substring. This method is case-sensitive, making it ideal for locating elements with a known prefix.

Example:

.//*[starts-with(text(), 'Ap')]

This matches elements with text starting with “Ap”.
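
A short sketch of the prefix match, again assuming lxml and an invented list of items:

```python
from lxml import etree

# Hypothetical markup for illustration only.
SNIPPET = """
<ul>
  <li>Apple MacBook</li>
  <li>Pineapple</li>
  <li>apple pie</li>
  <li>Apricot</li>
</ul>
"""

root = etree.fromstring(SNIPPET)

# Keep only elements whose text begins with 'Ap' (case-sensitive).
prefix_texts = [el.text for el in root.xpath(".//*[starts-with(text(), 'Ap')]")]
print(prefix_texts)  # ['Apple MacBook', 'Apricot']
```

"Pineapple" contains the substring but does not start with it, and "apple pie" differs in case, so both are excluded.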

ends-with(): Select Elements Ending with a Substring

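ends-with() was introduced in XPath 2.0, so it is not available in XPath 1.0 engines, which include the XPath implementations built into web browsers and libraries such as lxml. In XPath 1.0, you need a workaround based on substring() and string-length(), or filtering in your own code.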

Example (XPath 2.0+):

.//*[ends-with(text(), 'ok')]
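
For XPath 1.0 engines, one common workaround is to compare the tail of the string using substring() and string-length(). The sketch below assumes lxml (an XPath 1.0 engine) and invented markup; the ends_with helper is a hypothetical name introduced here, not part of any library.

```python
from lxml import etree

# Hypothetical markup for illustration only.
SNIPPET = """
<ul>
  <li>MacBook</li>
  <li>Notebook</li>
  <li>Book store</li>
  <li>OK</li>
</ul>
"""


def ends_with(suffix: str) -> str:
    """Build an XPath 1.0 condition that emulates ends-with(text(), suffix).

    Hypothetical helper; assumes the suffix contains no single quote.
    """
    return (
        f"substring(text(), string-length(text()) - string-length('{suffix}') + 1)"
        f" = '{suffix}'"
    )


root = etree.fromstring(SNIPPET)

# Take the last len(suffix) characters of the text and compare them to the suffix.
suffix_texts = [el.text for el in root.xpath(f".//*[{ends_with('ok')}]")]
print(suffix_texts)  # ['MacBook', 'Notebook']
```

Like the other string functions, the comparison is case-sensitive, so "OK" is not matched.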

Additional Text Processing Methods

As mentioned earlier, XPath offers more than the basic text search methods. This section covers additional functions that are well suited to working with strings and help you identify the elements you need more precisely.

translate(): Ignore Case Sensitivity

When case-insensitivity is required, translate() can convert text to a uniform case. This is particularly useful for normalized text matching.

Example:

//*[starts-with(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'mac')]

This matches elements where text starts with “mac” in any case.
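
Typing both alphabets by hand is error-prone, so the expression can be assembled in code. This sketch assumes lxml and made-up markup, and it only lowercases ASCII letters:

```python
import string

from lxml import etree

# Hypothetical markup for illustration only.
SNIPPET = """
<div>
  <a>MacBook Pro</a>
  <a>MACBOOK AIR</a>
  <a>macbook</a>
  <a>iMac</a>
</div>
"""

root = etree.fromstring(SNIPPET)

# translate() maps each uppercase ASCII letter to its lowercase counterpart.
expr = (
    f"//*[starts-with(translate(text(), '{string.ascii_uppercase}', "
    f"'{string.ascii_lowercase}'), 'mac')]"
)

insensitive_texts = [el.text for el in root.xpath(expr)]
print(insensitive_texts)  # ['MacBook Pro', 'MACBOOK AIR', 'macbook']
```

"iMac" is excluded because its lowercased text starts with "imac", not "mac".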

not(): Exclude Elements by Text

The not() function filters out elements that contain specific text, which is valuable when removing certain elements from the result set.

Example:

//h4[not(contains(a/text(), 'Mac'))]

This selects all <h4> elements whose child <a> does not contain the substring “Mac” in its text.
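
A minimal sketch of the exclusion, assuming lxml and an invented list of product titles wrapped in links:

```python
from lxml import etree

# Hypothetical markup for illustration only.
SNIPPET = """
<div>
  <h4><a>MacBook Pro</a></h4>
  <h4><a>iPad Air</a></h4>
  <h4><a>iMac</a></h4>
  <h4><a>ThinkPad</a></h4>
</div>
"""

root = etree.fromstring(SNIPPET)

# Keep the <h4> elements whose <a> text does NOT contain 'Mac'.
non_mac = root.xpath("//h4[not(contains(a/text(), 'Mac'))]")
non_mac_titles = [h4.findtext("a") for h4 in non_mac]
print(non_mac_titles)  # ['iPad Air', 'ThinkPad']
```

"iMac" is filtered out as well, because contains() matches the substring anywhere in the text.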

position(): Select Elements by Position

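The position() function selects elements by their position within the current node set. In //h4[position() = 1], that position is counted among the <h4> children of each parent, so the expression returns the first <h4> under every parent rather than the first <h4> in the whole document.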

Example:

//h4[position() = 1]

To get the last item:

//h4[position() = last()]

You can also specify a range of elements:

//h4[position() >= 2 and position() <= 4]
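
The three position queries can be tried together. The sketch assumes lxml and a made-up snippet in which all <h4> elements share one parent; the texts helper is a hypothetical name used only here.

```python
from lxml import etree

# Hypothetical markup: five sibling <h4> elements under a single parent.
SNIPPET = """
<div>
  <h4>Item 1</h4>
  <h4>Item 2</h4>
  <h4>Item 3</h4>
  <h4>Item 4</h4>
  <h4>Item 5</h4>
</div>
"""

root = etree.fromstring(SNIPPET)


def texts(expr: str) -> list:
    """Hypothetical helper: run an XPath query and return the matched texts."""
    return [el.text for el in root.xpath(expr)]


first = texts("//h4[position() = 1]")
last = texts("//h4[position() = last()]")
middle = texts("//h4[position() >= 2 and position() <= 4]")

print(first)   # ['Item 1']
print(last)    # ['Item 5']
print(middle)  # ['Item 2', 'Item 3', 'Item 4']
```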

normalize-space(): Remove Extra Spaces

The normalize-space() function strips leading and trailing whitespace and collapses each run of whitespace inside the text into a single space, which makes matching more reliable for elements with irregular spacing.

Example:

normalize-space("   This        is an      example")

Results in: “This is an example”
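
In practice, normalize-space() is often placed inside a predicate so that an exact comparison tolerates messy spacing. The sketch below assumes lxml and invented markup:

```python
from lxml import etree

# Hypothetical markup with irregular whitespace inside the first <p>.
SNIPPET = """
<div>
  <p>   MacBook
        Pro  </p>
  <p>MacBook Air</p>
</div>
"""

root = etree.fromstring(SNIPPET)

# Evaluate the function on a string literal, as in the example above.
cleaned = root.xpath('normalize-space("   This        is an      example")')

# Use it in a predicate so the exact match ignores the extra whitespace.
hits = root.xpath("//p[normalize-space(text())='MacBook Pro']")

print(cleaned)    # This is an example
print(len(hits))  # 1
```

Without normalize-space(), the comparison text()='MacBook Pro' would not match the first paragraph.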

Advanced Techniques

Now that we’ve covered the primary methods for finding and processing text, let’s dive into more advanced XPath techniques for working with text. We’ll start by exploring how to use regular expressions to find elements.

Regular Expressions in XPath 2.0+

For environments that support XPath 2.0+, regular expressions offer advanced matching capabilities. This can be useful for patterns like email addresses.

Example:

//*[matches(text(), '[\w\.-]+@[\w\.-]+')]

This finds text matching an email format.
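
Engines limited to XPath 1.0 do not provide matches(). As a substitute technique (not the XPath 2.0 function itself), lxml exposes the EXSLT regular expression extension, where re:test() plays a similar role. The sketch assumes lxml and made-up markup:

```python
from lxml import etree

# Namespace of the EXSLT regular expressions extension supported by lxml.
EXSLT_RE = "http://exslt.org/regular-expressions"

# Hypothetical markup for illustration only.
SNIPPET = """
<div>
  <p>support@example.com</p>
  <p>Call us on 555-0100</p>
  <p>Write to sales@example.org today</p>
</div>
"""

root = etree.fromstring(SNIPPET)

# re:test() returns true when the pattern is found anywhere in the text.
email_nodes = root.xpath(
    r".//p[re:test(text(), '[\w.-]+@[\w.-]+')]",
    namespaces={"re": EXSLT_RE},
)
email_texts = [el.text for el in email_nodes]
print(email_texts)  # ['support@example.com', 'Write to sales@example.org today']
```

The pattern is a loose check for something that looks like an email address, not a full validator.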

Combining Methods for Complex Queries

XPath methods can be combined to create complex queries, useful for cases such as finding elements containing both “@” and “.” but not containing spaces.

Example:

//div[contains(text(),'@') and contains(text(),'.') and not(contains(text(),' '))]/text()

This expression uses several criteria to determine whether a string is an email address:

It must contain the @ symbol.
It must contain at least one dot (.).
It must not contain any spaces.

The element is ignored if any of these conditions is not met.
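
The three conditions can be checked against a few sample strings. This sketch assumes lxml and invented markup, and it tests for a single space character in the last condition:

```python
from lxml import etree

# Hypothetical markup for illustration only.
SNIPPET = """
<section>
  <div>user@example.com</div>
  <div>contact: user@example.com</div>
  <div>user@localhost</div>
  <div>example.com</div>
</section>
"""

root = etree.fromstring(SNIPPET)

expr = (
    "//div[contains(text(),'@') and contains(text(),'.') "
    "and not(contains(text(),' '))]/text()"
)

# The trailing /text() returns the matching strings rather than the elements.
candidates = [str(t) for t in root.xpath(expr)]
print(candidates)  # ['user@example.com']
```

The second string fails because it contains a space, the third has no dot, and the fourth has no “@”.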

Valentina Skakun

I'm a technical writer who believes that data parsing can help in getting and analyzing data. I'll tell about what parsing is and how to use it.
