What does preceding-sibling do in XPath?
preceding-sibling:: is an XPath axis that selects sibling nodes appearing before the context node under the same parent. The syntax is context/preceding-sibling::tag[predicate], so //input[@id='email']/preceding-sibling::label[1] returns the closest <label> before the email input. Its forward counterpart is following-sibling::, and preceding-sibling::* matches every earlier element sibling regardless of tag name.
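Python's standard-library xml.etree.ElementTree does not evaluate XPath axes such as preceding-sibling, so the sketch below models the axis by hand to make its behavior concrete. The form markup and the helpers preceding_siblings and following_siblings are hypothetical illustrations, not a library API. With a full XPath 1.0 engine such as lxml, you would pass the expression string to its xpath() method instead. Only element siblings are modeled, which matches what * selects.

```python
import xml.etree.ElementTree as ET

# Hypothetical form markup; ElementTree needs well-formed XML, not raw HTML.
FORM = """<form>
  <label>Name</label>
  <input id="name"/>
  <label>Email</label>
  <input id="email"/>
  <button>Send</button>
</form>"""


def preceding_siblings(parent, node, tag="*"):
    """Hypothetical model of preceding-sibling::tag (element siblings only).

    ElementTree has no parent pointers, so the parent is passed in.
    Returns siblings nearest-first, the axis order XPath uses for
    position predicates, so result[0] plays the role of [1].
    """
    children = list(parent)
    before = children[:children.index(node)]
    return [c for c in reversed(before) if tag == "*" or c.tag == tag]


def following_siblings(parent, node, tag="*"):
    """Hypothetical model of following-sibling::tag: siblings after node, in document order."""
    children = list(parent)
    after = children[children.index(node) + 1:]
    return [c for c in after if tag == "*" or c.tag == tag]


form = ET.fromstring(FORM)
email = form.find("input[@id='email']")

# //input[@id='email']/preceding-sibling::label[1]
print(preceding_siblings(form, email, "label")[0].text)   # Email
# //input[@id='email']/preceding-sibling::*  (nearest first)
print([c.tag for c in preceding_siblings(form, email)])   # ['label', 'input', 'label']
# //input[@id='email']/following-sibling::*
print([c.tag for c in following_siblings(form, email)])   # ['button']
```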
The common gotcha is index direction. preceding-sibling is a reverse axis, so in a predicate on that step, position counts backward from the context node: [1] is the immediately previous sibling, [2] is the one before that, and so on. This trips up developers who expect document order. If you want the first sibling in document order under the parent, do not write preceding-sibling::li[1]. Use preceding-sibling::li[last()] instead.
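To see the index direction concretely, here is a small stdlib-only model of preceding-sibling::li on a hypothetical menu list. The helper (an illustrative name, not a library function) returns siblings nearest-first, which is the order XPath uses for position predicates on this axis. Because XPath positions are 1-based, [1] corresponds to Python index 0 and [last()] to index -1. This is a model of the semantics, not an XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical menu; the context node is <li id="current">.
MENU = """<ul>
  <li>Home</li>
  <li>Docs</li>
  <li>Blog</li>
  <li id="current">Pricing</li>
</ul>"""


def preceding_siblings(parent, node, tag):
    """Hypothetical model of preceding-sibling::tag in axis order (nearest sibling first)."""
    children = list(parent)
    before = children[:children.index(node)]
    return [c for c in reversed(before) if c.tag == tag]


ul = ET.fromstring(MENU)
current = ul.find("li[@id='current']")
axis = preceding_siblings(ul, current, "li")

# XPath positions are 1-based, Python indexes are 0-based.
print(axis[0].text)   # Blog -> preceding-sibling::li[1]
print(axis[1].text)   # Docs -> preceding-sibling::li[2]
print(axis[-1].text)  # Home -> preceding-sibling::li[last()]
```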
Nearest previous td:
//td[@class='value']/preceding-sibling::td[1]
First td under the parent (in document order):
//td[@class='value']/preceding-sibling::td[last()]
Do not confuse preceding-sibling with preceding. The preceding axis selects every node that appears before the context node anywhere in the document, except ancestors, attribute nodes, and namespace nodes. preceding-sibling is restricted to nodes that share the context node's parent. //h2[@id='intro']/preceding::p walks back across the whole document, while //h2[@id='intro']/preceding-sibling::p stays inside the parent of that <h2>.
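The difference between the two axes shows up as soon as earlier content sits outside the context node's parent or inside nested elements. This stdlib-only sketch models both axes over a hypothetical page and lists matches in document order, which is how XPath engines typically return node-set results. The helper names are illustrative assumptions; ElementTree is used only as a container for the tree, since it has no built-in support for these axes.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one <p> outside the <section>, one nested in a <div>.
PAGE = """<body>
  <p>Site banner</p>
  <section>
    <p>Lead paragraph</p>
    <div><p>Nested note</p></div>
    <h2 id="intro">Intro</h2>
    <p>After heading</p>
  </section>
</body>"""

root = ET.fromstring(PAGE)
# ElementTree has no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Yield the parent, grandparent, ... of node up to the root."""
    while node in parent_of:
        node = parent_of[node]
        yield node


def preceding(node, tag):
    """Hypothetical model of preceding::tag: elements before node in
    document order, excluding its ancestors."""
    skip = set(ancestors(node))
    result = []
    for el in root.iter():  # pre-order traversal is document order
        if el is node:
            break
        if el not in skip and el.tag == tag:
            result.append(el)
    return result


def preceding_sibling(node, tag):
    """Hypothetical model of preceding-sibling::tag: same-parent elements
    before node, in document order."""
    siblings = list(parent_of[node])
    return [el for el in siblings[:siblings.index(node)] if el.tag == tag]


h2 = root.find(".//h2[@id='intro']")
print([p.text for p in preceding(h2, "p")])
# ['Site banner', 'Lead paragraph', 'Nested note']
print([p.text for p in preceding_sibling(h2, "p")])
# ['Lead paragraph']
```

The extra "Nested note" match shows that preceding also reaches into the subtrees of earlier elements, while preceding-sibling only looks at the context node's direct siblings.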