the DOM, regex, scalability, and other jargony words

Bale · Sep 30, 2014

HTML:

> test xpath charpane.html //center[3]//td[@valign="center"]/text()

1: The Sonata of Sneakiness (2)
2: Smooth Movements (7)
3: Peeled Eyeballs (7)
4: Pisces in the Skyces (12)
5: Elemental Saucesphere (52)
6: Empathy (815)
7: Fat Leon's Phat Loot Lyric (822)
8: Springy Fusilli (825)
9: Leash of Linguini (825)
10: Polka of Plenty (873)
11: Spirit of Bacon Grease (∞)

Effects.

Whoa. That was cool!! Is there a way to get the familiar block regardless of it being above or below effects?

Ulti · Sep 30, 2014

One of the best parser engines for html I've come across was http://simplehtmldom.sourceforge.net/
It's been a pleasure to work with from PHP and uses many workarounds to get around PHP's limitations.
I could build an equivalent in ash for the mafia scripting community. Would sit nicely together with the regular expression code I already wrote in the framework I was working on:
http://pastie.org/pastes/9607697/text?key=mghui6mebmihsjbrkelw

roippi · Sep 30, 2014

Bale said:
Whoa. That was cool!! Is there a way to get the familiar block regardless of it being above or below effects?

Sure.

HTML:

> test xpath charpane.html //tr[td[a[@class="familiarpick"]]]/text()

1: Tron, the 35 pound Rogue Program

> test xpath charpane-fambelow.html //tr[td[a[@class="familiarpick"]]]/text()

1: Tron, the 35 pound Rogue Program
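
For anyone following along, here is a rough sketch of why that expression does not care where the familiar block sits. It is not mafia code: it uses the JDK's built-in XPath engine rather than HtmlCleaner, and the two tiny documents (`FAM_ABOVE`, `FAM_BELOW`) are made-up, well-formed stand-ins for the real charpane files. A positional query changes its answer when the tables swap places; the structural one does not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StructuralVsPositional {
    // Hypothetical stand-ins for charpane.html and charpane-fambelow.html.
    static final String FAM = "<table><tr><td><a class=\"familiarpick\">Tron</a>"
        + ", the 35 pound Rogue Program</td></tr></table>";
    static final String EFFECTS = "<table><tr><td>Empathy (815)</td></tr></table>";
    static final String FAM_ABOVE = "<body>" + FAM + EFFECTS + "</body>";
    static final String FAM_BELOW = "<body>" + EFFECTS + FAM + "</body>";

    // Parse the markup and return the string value of the first node the query matches.
    static String firstText(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        String positional = "/body/table[1]";                      // depends on document order
        String structural = "//tr[td[a[@class=\"familiarpick\"]]]"; // depends only on shape
        for (String page : new String[] { FAM_ABOVE, FAM_BELOW }) {
            System.out.println(firstText(page, positional) + " | " + firstText(page, structural));
        }
    }
}
```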

roippi · Sep 30, 2014

(missed this post on previous page)

Darzil said:
It's a shame that compare won't also work with compact panel! (but I'm sure you can quickly do one that would)

It's the one really annoying thing about the character pane in particular: there are different styles to support and different classes/avatars with different resources to handle. And the HTML is not consistent!

Oh. Yeah. I could possibly construct an xpath that would work for both compact/expanded... but it would be quite the hack. Comparing the HTML from both, the DOM looks completely different. (this is why you should do styling in CSS, people!)

Bale · Oct 1, 2014

roippi said:

Sure.

HTML:

> test xpath charpane.html //tr[td[a[@class="familiarpick"]]]/text()

1: Tron, the 35 pound Rogue Program

> test xpath charpane-fambelow.html //tr[td[a[@class="familiarpick"]]]/text()

1: Tron, the 35 pound Rogue Program

Now THAT is impressive

I see that I can grab the entire familiar block with

Code:

> test xpath charpane.html //table[tbody[tr[td[a[@class="familiarpick"]]]]]/text()

1: Familiar:(0/1 next @ 43)Dismal Jasper, the 22 pound Grimstone Golem

Now, how do I get that with the html included? Or am I going to have to go without any html and just convert all of ChIT at once? It's just that I am already set up to parse...

Code:

<table width=90%><tr><td colspan=2 align=center><font size=2><b>Familiar:</b><br>(0/1 next @ 43)</font></td></tr><tr><td align=center valign=center><a target=mainpane href="familiar.php" class="familiarpick"><img src="/images/itemimages/grimgolem.gif" width=30 height=30 border=0></a></td><td valign=center align=left><a target=mainpane href="familiar.php" class="familiarpick"><b><font size=2>Dismal Jasper</a></b>, the  <b>22</b> pound Grimstone Golem</font></td></tr></table>

Do I have to change everything when I start to use xpath? I'm kinda hoping to do it a bit at a time. Well, I suppose it is easier if I start changing it from the bottom up instead of the top down.

roippi · Oct 1, 2014

Bale said:
Now, how do I get that with the html included?

Well, you can. But that's the thing I haven't figured out how to port to ASH yet.

Normally, an xpath query returns an array of node objects. (TagNode objects, in HtmlCleaner, that have all the methods of the root object, including xpath evaluation) But when you apply the text() function, you're asking the parser to turn each node into its (recursively concatenated) text contents. So really you're dealing with two completely separate behaviors and I've only written the "test" command to work with the simpler one.

So, I have to figure out an ASH API that can handle xpath expressions which can return an array of strings OR an array of node objects. It's a bit icky.
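
To make the two behaviors concrete, here is a minimal sketch. Big caveats: it uses the JDK's `javax.xml.xpath` on a hand-tidied, well-formed snippet instead of HtmlCleaner on real charpane HTML, and `selectNodes`/`selectText` are names invented for the illustration, not anything in mafia. The flattening to text is done with `getTextContent()` here rather than with `text()` in the expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathNodesVsText {
    // Hypothetical, simplified stand-in for the familiar block (well-formed on purpose).
    static final String SAMPLE =
        "<table><tr><td><a class=\"familiarpick\"><b>Dismal Jasper</b></a>"
        + ", the <b>22</b> pound Grimstone Golem</td></tr></table>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Behavior 1: the query yields node objects, which can themselves be queried again.
    static NodeList selectNodes(Document doc, String expr) throws Exception {
        return (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET);
    }

    // Behavior 2: each matched node is flattened to its recursively concatenated text.
    static List<String> selectText(Document doc, String expr) throws Exception {
        NodeList nodes = selectNodes(doc, expr);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        String expr = "//tr[td[a[@class=\"familiarpick\"]]]";
        System.out.println(selectNodes(doc, expr).item(0).getNodeName()); // a node object
        System.out.println(selectText(doc, expr).get(0));                 // a plain string
    }
}
```

The awkward part for an ASH API is visible in the two return types: one method hands back nodes, the other hands back strings.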

Bale · Oct 1, 2014

So, once you figure that out, I'd just leave off the "/text()" at the end to get it with full html?

It's pretty darn amazing and I'm looking forward to getting my hands on this after the next release.

Ulti · Oct 1, 2014

Couldn't you just walk the nodes and reconstruct the innerHTML? Regarding giving ash node objects, I believe you could define a "record" containing integers referencing internal Java objects. I was trying to create some sort of union/mixed type before using records: http://pastie.org/pastes/9607704/text?key=vrvhs2ownvah97nhbvoew Alternatively you could add a new object type to ash similar to matcher.

roippi · Oct 1, 2014

Bale said:
So, once you figure that out, I'd just leave off the "/text()" at the end to get it with full html?

Ulti said:
Couldn't you just walk the nodes and reconstruct the innerHTML?

Huh. I was previously thinking of adding a new type to ASH (blah) but I think I like this option way more. Just always return strings; convert TagNode objects to their innerHTML equivalent when necessary. Yeah, that can work.
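
A sketch of what "always return strings" could look like, under the same assumptions as any toy example: JDK DOM and XPath instead of HtmlCleaner's TagNode, a well-formed made-up snippet, and an invented name (`xpathStrings`). Element hits are rebuilt into their inner markup by walking the children; text and attribute hits are returned as their value. The walker does not re-escape special characters and writes empty elements as open/close pairs, so it is illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InnerMarkupSketch {
    // Hypothetical, simplified stand-in for the familiar block.
    static final String SAMPLE =
        "<table><tr><td><a class=\"familiarpick\"><b>Dismal Jasper</b></a>"
        + ", the <b>22</b> pound Grimstone Golem</td></tr></table>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Append the markup of every child of node (its "innerHTML").
    static void appendInner(Node node, StringBuilder sb) {
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            appendOuter(kids.item(i), sb);
        }
    }

    // Append one node including its own tag; no escaping is attempted in this sketch.
    static void appendOuter(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue());
        } else if (node.getNodeType() == Node.ELEMENT_NODE) {
            sb.append('<').append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                sb.append(' ').append(a.getNodeName())
                  .append("=\"").append(a.getNodeValue()).append('"');
            }
            sb.append('>');
            appendInner(node, sb);
            sb.append("</").append(node.getNodeName()).append('>');
        }
    }

    // Single return type: element hits become inner markup, other hits become their value.
    static List<String> xpathStrings(Document doc, String expr) throws Exception {
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            Node hit = hits.item(i);
            if (hit.getNodeType() == Node.ELEMENT_NODE) {
                StringBuilder sb = new StringBuilder();
                appendInner(hit, sb);
                out.add(sb.toString());
            } else {
                out.add(hit.getNodeValue());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(xpathStrings(doc, "//td"));        // markup kept
        System.out.println(xpathStrings(doc, "//td/text()")); // direct text nodes only
    }
}
```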

Bale said:
It's pretty darn amazing and I'm looking forward to getting my hands on this after the next release.

Well, I'm not averse to adding provisional new ASH functions before the point release. I'm just not keen on messing with mafia internals before then.

roippi · Oct 1, 2014

I'm sure I'll make changes to it, but you can play around with the xpath( str ) function I just added. Only works in relay scripts after you've invoked visit_url().

(or, well, you will be able to eventually, once I sort out build.xml)

Bale · Oct 1, 2014

How is xpath( string ) going to work? It simply responds to xpath commands parsing the last html visited in the browser? I figured I'd pass it two strings, one would be the html and the other would be the xpath command.

I'm curious about why you do it that way.

roippi · Oct 1, 2014

Not sure, really. I had this thought that I would eventually be clever and fetch the pre-parsed tree (since I'm going to use parsers as part of many requests, might as well reuse them) but... I don't know, I'm unable to brain tonight, it's obviously better if you can feed it arbitrary html. Need sleep.

Bale · Oct 1, 2014

Just to continue considering alternatives...

You could have both a two parameter and one parameter version where the one parameter version always queries the most recently parsed tree. That way I can switch html any time I desire, but still have the speed advantage of not needing to parse the same darn html every single time.
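
Roughly what I have in mind, as a Java sketch rather than real mafia internals. Everything here is an assumption for illustration: the class name `CachedXPath`, the `parseCount` counter, and the use of the JDK parser on well-formed markup instead of HtmlCleaner. The two-parameter form parses and remembers the tree; the one-parameter form reuses whatever was parsed last.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CachedXPath {
    private Document lastTree;   // most recently parsed tree, reused by the 1-arg form
    public int parseCount = 0;   // only here to show how often parsing really happens

    // Two-parameter form: parse the given markup, remember the tree, then query it.
    public List<String> xpath(String markup, String expr) throws Exception {
        lastTree = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(markup)));
        parseCount++;
        return xpath(expr);
    }

    // One-parameter form: query the remembered tree without parsing again.
    public List<String> xpath(String expr) throws Exception {
        if (lastTree == null) {
            throw new IllegalStateException("nothing has been parsed yet");
        }
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expr, lastTree, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            out.add(hits.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String markup = "<table><tr><td><b>Dismal Jasper</b>, the <b>22</b> pound"
            + " Grimstone Golem</td></tr></table>";
        CachedXPath x = new CachedXPath();
        System.out.println(x.xpath(markup, "//td")); // parses once
        System.out.println(x.xpath("//td/b"));       // reuses the same tree
        System.out.println(x.parseCount);
    }
}
```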
