-- "give me all <tr> nodes that are the direct child of a <table> node."
Did you pay attention to the bold text in the above post? Go read it again.
> test xpath //table/tr
-- "give me all <tbody> nodes that are the direct child of a <table> node."
Good, you learned!
> test xpath //table/tbody
-- "give me all <tr> nodes that are contained by a <table> node."
And thus, the difference between a single forward slash and two: the former means "direct children named x" and the latter means "descendants named x at any depth inside these nodes". I'm sure I'm not exactly precise on the terminology, but you catch my drift.
> test xpath //table//tr
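If you'd rather poke at this from a script than from the test command, here is a rough sketch using Python's built-in xml.etree.ElementTree, which understands a small subset of xpath. The sample document is made up for illustration (it is not the HTML from the above post), and ElementTree wants paths relative to the root element, hence the leading dot.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the page from the post.
DOC = """<html><body><table><tbody>
  <tr><td id="first_cell">Cell 1</td><td>Cell 2</td></tr>
  <tr class="odd"><td>Cell 3 <p>with a paragraph in it</p></td></tr>
</tbody></table></body></html>"""
root = ET.fromstring(DOC)

direct_rows = root.findall(".//table/tr")       # <tr> directly under <table>
direct_bodies = root.findall(".//table/tbody")  # <tbody> directly under <table>
all_rows = root.findall(".//table//tr")         # <tr> at any depth under <table>

print(len(direct_rows), len(direct_bodies), len(all_rows))  # prints: 0 1 2
```

The single slash finds nothing because the rows sit under <tbody>, not directly under <table>; the double slash doesn't care how deep they are.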
-- "give me all <td> nodes that have attribute id equal to 'first_cell'"
Two new things here. First, @ means attribute. We've been searching for node (aka tag) names before, and now this is the syntax for attributes.
> test xpath //td[@id="first_cell"]
Second, [this is a predicate]. When I write something[something_else], that translates to "give me the somethings WHICH HAVE A something_else". Useful.
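A small sketch of the same attribute predicate from Python, assuming the standard library's xml.etree.ElementTree (which supports [@attr='value'] predicates) and a made-up sample document rather than the HTML from the above post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the page from the post.
DOC = """<html><body><table><tbody>
  <tr><td id="first_cell">Cell 1</td><td>Cell 2</td></tr>
  <tr class="odd"><td>Cell 3</td></tr>
</tbody></table></body></html>"""
root = ET.fromstring(DOC)

# something[something_else]: the <td> nodes WHICH HAVE id equal to first_cell
matches = root.findall(".//td[@id='first_cell']")
all_cells = root.findall(".//td")

print(len(all_cells), len(matches))  # prints: 3 1
print(matches[0].get("id"))          # prints: first_cell
```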
-- "give me THE TEXT OF all <td> nodes that have attribute id equal to 'first_cell'"
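Same query as last time, but this time with a text() step on the end. With this tool, that returns the text contents of the node, recursively concatenated with the text of its children, instead of the node itself. (Strictly speaking, in standard xpath text() is a node test that selects only the text nodes directly under a node; string() is what gives you the concatenated text.) Very useful.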
> test xpath //td[@id="first_cell"]/text()
1: Cell 1
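To see the difference between "text directly inside a node" and "text of a node plus everything under it", here is a hedged sketch in Python. ElementTree's xpath subset has no text() step, so this reads the text off the matched elements instead: .text is only the text before the first child, while itertext() walks all descendants. The sample document is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the page from the post.
DOC = """<html><body><table><tbody>
  <tr><td id="first_cell">Cell 1</td><td>Cell 2</td></tr>
  <tr class="odd"><td>Cell 3 <p>with a paragraph in it</p></td></tr>
</tbody></table></body></html>"""
root = ET.fromstring(DOC)

cell1 = root.find(".//td[@id='first_cell']")
cell3 = root.find(".//tr[@class]/td")

direct_text = cell3.text                  # text directly inside <td>, before <p>
full_text = "".join(cell3.itertext())     # <td> text plus all descendant text

print(cell1.text)         # prints: Cell 1
print(repr(direct_text))  # prints: 'Cell 3 '
print(full_text)          # prints: Cell 3 with a paragraph in it
```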
-- "give me the text of all <p> nodes that are the direct child of a <td> node that is the direct child of the second <tr> child of a <tbody> node that is the direct child of a <table> node that is the direct child of a <body> node."
Here, I've not used any global (//) searches, instead specifying the exact full path to a node. If you care about eking every last drop of performance out of your xpath queries, this is the way to do it - the engine doesn't have to do any backtracking, it just goes right to the desired spot in the DOM. (In general, don't worry about performance; xpath is fast.) Note that you can specify by index which nodes you want - there are two <tr> nodes in that <tbody> node; I selected the second. Note also that indexing is one-based (sigh).
> test xpath /body/table/tbody/tr[2]/td/p/text()
1: with a paragraph in it
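A quick sketch of a full path with a one-based index, again assuming Python's xml.etree.ElementTree and a made-up sample document. Here the root element is <html>, so the path starts at ./body; ElementTree has no text() step, so the text is read from the matched <p> elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the page from the post.
DOC = """<html><body><table><tbody>
  <tr><td id="first_cell">Cell 1</td><td>Cell 2</td></tr>
  <tr class="odd"><td>Cell 3 <p>with a paragraph in it</p></td></tr>
</tbody></table></body></html>"""
root = ET.fromstring(DOC)

# tr[2] is the SECOND <tr> child of <tbody>: indexing starts at 1, not 0.
second_row = root.findall("./body/table/tbody/tr[2]/td/p")
first_row = root.findall("./body/table/tbody/tr[1]/td/p")

print([p.text for p in second_row])  # prints: ['with a paragraph in it']
print(len(first_row))                # prints: 0
```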
As you can see, the English translations of some of these xpath queries are going to get quite long. That's kind of the point - xpath is a very compact, expressive language. It does its job better than English.
-- "give me the text of all <tr> nodes that have a class attribute."
Pretty straightforward. If you don't specify what the attribute needs to be equal to, you're just saying it needs to exist.
> test xpath //tr[@class]/text()
1: Cell 3 with a paragraph in it
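The "attribute just needs to exist" predicate also works in Python's xml.etree.ElementTree, so here is a minimal sketch of it. As before, the sample document is an assumption, not the HTML from the above post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the page from the post.
DOC = """<html><body><table><tbody>
  <tr><td id="first_cell">Cell 1</td><td>Cell 2</td></tr>
  <tr class="odd"><td>Cell 3 <p>with a paragraph in it</p></td></tr>
</tbody></table></body></html>"""
root = ET.fromstring(DOC)

all_rows = root.findall(".//tr")
classed_rows = root.findall(".//tr[@class]")  # any value will do, it just has to exist

print(len(all_rows), len(classed_rows))         # prints: 2 1
print([r.get("class") for r in classed_rows])   # prints: ['odd']
```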
-- "give me any nodes which have any direct children which have a class equal to banana."
Now we're getting somewhere. You can nest predicates, that's pretty cool. And you can specify * to match any node, that's cool too. This is a tool/trick that will come up a lot - you have one exact spot in the DOM you want to match, but then you want to "backtrack" up a few nodes to grab more text around it. This is how you do it.
> test xpath //*[*[@class="banana"]]
But wait - if you look at the HTML source... why did this match <body>? Two class="banana" nodes have the same <p> parent, so there should only have been two matches, right? Bzzt. Go back and read the bold text in the above post. Your browser and the parser implicitly closed the <p> tag right before the <div> tags that were nested inside it. Did you catch that on first reading through the HTML? Of course not - you're a human. Only look at the parsed DOM, don't trust the page source.
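Here is a hedged sketch of the "find the spot, then grab its parent" trick in Python. ElementTree only documents a limited xpath subset, so rather than rely on nested predicates this uses two stand-ins: a plain Python filter, and the ".." parent step. The document below is a hypothetical already-parsed tree, with the <p> closed before the <div> tags as described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical parsed tree: the <div> nodes are siblings of <p>, not children.
DOC = """<body>
  <p>intro</p>
  <div class="banana">one</div>
  <div class="banana">two</div>
  <section><span class="banana">three</span></section>
</body>"""
root = ET.fromstring(DOC)

# Any node that has at least one direct child with class="banana".
by_filter = [el for el in root.iter()
             if any(child.get("class") == "banana" for child in el)]

# Same idea with xpath: match the banana nodes, then step up to the parent.
by_parent_step = root.findall(".//*[@class='banana']/..")

print([el.tag for el in by_filter])       # prints: ['body', 'section']
print([el.tag for el in by_parent_step])  # prints: ['body', 'section']
```

Note that <p> is not in the result and <body> is, which is exactly the surprise from the parsed DOM.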