Algebra in ASH?

Catch-22

Active member
Huh, I wasn't aware that any of the Java internal code was available online. I'd always assumed that they used some existing C or C++ regex engine, but no, they implemented the whole thing in Java.
Yes, all the Java utility classes are written in Java. Well, actually, everything in JDK is written in Java :)

Anyway, it appears that matches() and lookingAt() ultimately call exactly the same matching code as find(), just using an alternate version of the compiled pattern that's exactly the same as if the pattern started with "^" (and if the pattern actually did start with "^", the same version is used in both cases). The additional handling of the last search position that find() has to do is negligible compared to the actual matching operation, so I'm going to stand by my statement that there's no performance benefit to having matches() available.

Each of the functions is there for a specific purpose and they all act differently.

matches() and lookingAt() use match():
Code:
    boolean match(int from, int anchor) {
        this.hitEnd = false;
        this.requireEnd = false;
        from        = from < 0 ? 0 : from;
        this.first  = from;
        this.oldLast = oldLast < 0 ? from : oldLast;
        for (int i = 0; i < groups.length; i++)
            groups[i] = -1;
        acceptMode = anchor;
        boolean result = parentPattern.matchRoot.match(this, from, text);
        if (!result)
            this.first = -1;
        this.oldLast = this.last;
        return result;
    }

Whereas find() uses search():
Code:
    boolean search(int from) {
        this.hitEnd = false;
        this.requireEnd = false;
        from        = from < 0 ? 0 : from;
        this.first  = from;
        this.oldLast = oldLast < 0 ? from : oldLast;
        for (int i = 0; i < groups.length; i++)
            groups[i] = -1;
        acceptMode = NOANCHOR;
        boolean result = parentPattern.root.match(this, from, text);
        if (!result)
            this.first = -1;
        this.oldLast = this.last;
        return result;
    }

What I don't get is why there is a parentPattern.matchRoot.match and a parentPattern.root.match and why find() doesn't just return matches(start, NOANCHOR) and do away with the search() function completely (search() is only called by find()).

I think I do understand what you are getting at with the ^. To be more precise, a Boyer-Moore or a completely anchored search (starts with ^ and also ends with $) skips certain parts of the pattern compiler as part of it's optimization.

There is still a difference between the way search() and match() handle anchors in multi-line mode though.

To break away from Java Discussion and get back to Scripting Discussion, whilst I think that matches() is better suited, the use of find() is going to be fine as the strings we are working with have already been split and by default KoLmafia operates in single-line (PERL) mode, so the downsides are pretty minimal.
 
Last edited:

cakyrespa

Member
That's cool. I never delved into the depths of Java's regex operations. I'm more of a C# person. But, it looks like the rpn.ash script is okay with find() at this point, and it is anchored at both ends. Attached is the rolled up version of RPN which includes a few cleanups by the_great_cow_guru, a bit of zlib checking, and some minor cleanups. I'm also calling this "version 4".

And it is working beautiful for my levelup clone. Except it keeps insisting that the bedroom is 1.2 stat points better than the ballroom when it comes to leveling moxie and I have absolutely no clue why. Oh well, that's math and perl, not the calculator's fault.

I do appreciate everyone's help so far with this, it made my other scripts much more concise since I can pre-generate the formulas for individual areas but still take into account the character's changing situation.
 

Attachments

  • rpn.ash
    6.1 KB · Views: 19
  • rpn_example.ash
    988 bytes · Views: 19

cakyrespa

Member
And I'm itching to get my hands on your levelup clone. :)

Right now, it only handles finding the best "naked" location for adventuring for the most part. I think I got most (say 70%) of the noncombats parsed for formulas but it makes some bad decisions on the best noncombat choice to pick; I think I need to write a custom location-based script for things like the ballroom, bedroom, and barrr. It also uses this RPN to calculate the stat caps on the Hidden Temple (and like), so it seems to automatically stop using it when you find a better place. I'm also parsing out one-times and conditional adventures, so it just makes assumptions on stuff you would expect to encounter frequently. I also split out the clover, so I should be able to make different decisions when you have clovers.

I'm also working on getting a extensive ASH-accessible set of data on items, effects, familiars, and monsters (CakeLib). In formula form. Mainly so I can calculate it without any other additions. Unfortunately, it redoes some of KoLmafia's work but I want to be able to work with various things before adjustments and be able to break out the adjustments individually.

The first three are so I can maximize stat gain while making sure the character can survive the area. The last so I can calculate item drops so I can figure out how much meat I can sell for the meat side of LevelUp. And so I know what items produce what effects. That way, I can see if we have a candy heart in our inventory I know it produces a specific effect which gives +3 attributes which contributes to the survivability of an area. That part is giving me a bit of a headache, mainly because I'm still wrapping my mind on how to make an area as +ML as possible but only improve muscle/moxie enough to get an area safe (defined by the Wossname threshold), productive (how many rounds to defeat the average opponent), and survivable (maximum % of damage in a single hit).

Still working on that last paragraph, plus I want to run through at least some the game with some limits. I overused it in the last few weeks and I'm pretty much able to handle everything at this point. I'm planning on defeating the NS today, going on vacation this weekend, then starting a new HC ascension next week (I only do HC, so everything works with that first).
 

Catch-22

Active member
Hi caky. Thanks for the credit mention.

I've been meaning to sit down and have a good play around with this to see if I can come up with any more optimizations, but I haven't had the chance. I can see one which doesn't require much changing. Now that you've got a pretty stable script up and running, it might be worth sacrificing some readability in the is_operator matcher.

Code:
"^(\\+|-|\\*|/|\\^|if|m(ax|in)|not)$"

Should work out slightly faster.

Another benefit can be realized by compiling the common patterns. At the moment you are actually compiling a pattern each time you push an operand and two patterns each time you push an operator, when really the same two patterns could be compiled just once each for the whole script.
 
Last edited:

cakyrespa

Member
Hi caky. Thanks for the credit mention.

You helped. That and it takes nothing to update credits.

Another benefit can be realized by compiling the common patterns. At the moment you are actually compiling a pattern each time you push an operand and two patterns each time you push an operator, when really the same two patterns could be compiled just once each for the whole script.

I looked at that briefly, but couldn't figure out how to create a matcher that didn't take the input string as part of the constructor. That wouldn't work but I couldn't find something that basically did "create_matcher(string): pattern" or something to that effect. Because of it, I left it the way it is.

EDIT: Yeah, it could be obvious, but I didn't find it just scanning through `ashref`
 
Would this do it? (found it on the ash wiki):

matcher reset( matcher m , string input )
Resets matcher to search from beginning of string, possibly with new input.

You could make it and initially just give it a blank string or something, and then later reset it:

Code:
matcher op_matcher = create_matcher(""^(\\+|-|\\*|/|\\^|if|max|min|not)$"", "")

boolean is_operator(string test) {
	reset(op_matcher, test).find();
}
 

Veracity

Developer
Staff member
Jason made create_matcher() save the compiled pattern the first time you call it. Subsequent calls with the identical pattern string will use the previously compiled pattern.
 

cakyrespa

Member
That is great. I probably won't make the "...|m(ax|in)|.." change though. Yeah, it might save some time, but I think at that point, given the compiled pattern, it would be shaving off maybe a few nanoseconds off the entire thing. :)

I am planning on adding sqrt() though, I need that for my formulas. But, I'll hold off until one of the Cakes comes out in the forum (or someone wants it).

I'm also thinking of a few formula manipulation routines. Like if you want to add "0.0" to "1.1", it would just use "1.1" instead of "0.0 1.1 +" which is overkill. Opinions?
 

Veracity

Developer
Staff member
TI am planning on adding sqrt() though, I need that for my formulas. But, I'll hold off until one of the Cakes comes out in the forum (or someone wants it).
Is there a problem with ASH's built-in square_root() function?

Wait. I get it. The RPN calculator doesn't hook up to it yet. Gotcha.
 

Catch-22

Active member
That is great. I probably won't make the "...|m(ax|in)|.." change though. Yeah, it might save some time, but I think at that point, given the compiled pattern, it would be shaving off maybe a few nanoseconds off the entire thing. :)

Hmm more like a few milliseconds per function call. What if you were performing calculations on a very large map? A few milliseconds becomes a few seconds. Every optimization counts :)

Not sure what you mean by "given the compiled pattern".

The optimization is not so much geared towards a faster match, but rather to minimize backtracking in the event that a match isn't found. In your case, this would result in sooner pushes of a variable :) If you're pushing a lot of variables; you might find that this optimization will have a bigger impact than you think.

Ultimately the decision is up to you though.
 

cakyrespa

Member
The optimization is not so much geared towards a faster match, but rather to minimize backtracking in the event that a match isn't found. In your case, this would result in sooner pushes of a variable :) If you're pushing a lot of variables; you might find that this optimization will have a bigger impact than you think.

True, but then would it be better to use something like?

^([-\\+\\*\\/\\^]|m(ax|in)|if|not|sqrt)$

EDIT: Actually ^(?:[-\\+\\*\\/\\^]|m(?:ax|in)|if|not|sqrt)$ might be slightly better for using non-capture groups which we don't use in the code. I think that is the syntax, been a while.

It uses the character selector instead of chaining statements, it might be faster depending on the regex implementation.

It would be interesting to see the performance of which ones work better. But, you still have to do 10k iterations or so to really test it. I found that regex operations are pretty fast in general, but you are right, a fraction of a millisecond can add up. But, I think the string operations in RPN are going to be more expensive than the regex operations combined. Just tokenizing the string probably is the slowest part of RPN and the stack operations are right up there.
 
Last edited:

Catch-22

Active member
True, but then would it be better to use something like?

^([-\\+\\*\\/\\^]|m(ax|in)|if|not|sqrt)$

EDIT: Actually ^(?:[-\\+\\*\\/\\^]|m(?:ax|in)|if|not|sqrt)$ might be slightly better for using non-capture groups which we don't use in the code. I think that is the syntax, been a while.

It uses the character selector instead of chaining statements, it might be faster depending on the regex implementation.

It would be interesting to see the performance of which ones work better. But, you still have to do 10k iterations or so to really test it. I found that regex operations are pretty fast in general, but you are right, a fraction of a millisecond can add up. But, I think the string operations in RPN are going to be more expensive than the regex operations combined. Just tokenizing the string probably is the slowest part of RPN and the stack operations are right up there.

I could not agree with you more!

This is a much better expression, in that according to my benchmarks it's about 25% faster than the original one.

Code:
^([\+\-\*/\^]|m(ax|in)|if|not|sqrt)$

Also re-ordering the expression (if max/min is more likely to appear than if/not/sqrt, they should appear first in the expression) has helped in this particular case.

I don't think non-capture groups are going to be beneficial in this case. I tried
Code:
^(?:[\+\-\*/\^]|m(?:ax|in)|if|not|sqrt)$
and
Code:
^(?:[\+\-\*/\^]|m(ax|in)|if|not|sqrt)$
They worked out slightly slower than the the expression without non-capture groups. It was still faster than our original expression though, thanks to the character selector.

I also agree with you that string splits and concatenation are currently the most expensive part of the script, I have had a few ideas for optimization but I would really need to sit down and properly nut them out before I start blurting out stuff that KoLmafia doesn't even support and sending everyone on a wild goose chase :D

EDIT: I should mention that my benchmarks are using the java.util.regex, which is what KoLmafia has implemented, so the benchmarks should be a pretty good indication.
 
Last edited:

cakyrespa

Member
Awesome, then I'll use:

Code:
^([\+\-\*/\^]|m(ax|in)|if|not|sqrt)$

With the next release. I keep forgetting about ordering, though I really should know better at this point in the game. So, I'm curious about performance. How fast is this in general? Like going through 10,000 iterations?

I was also thinking that putting a quick strength length check in the beginning would also be a relatively fast operation (length > 4 is by definition, false) and would negate the entire matching operation too.
 

Catch-22

Active member
So, I'm curious about performance. How fast is this in general? Like going through 10,000 iterations?

Actually, quite speedy. I went through 100,000 iterations in 3.7 seconds (I think) on an Intel T2330 CPU (1.6GHz). I can post the actual results of the benchmark up if you like, but it will have to wait until I get back to work as I was on my work laptop.

I was also thinking that putting a quick strength length check in the beginning would also be a relatively fast operation (length > 4 is by definition, false) and would negate the entire matching operation too.

I'm not sure if this will be worth it. java.util.regex may already check for length once the pattern has been compiled (not sure) and it's also going to slow down things that are valid. If you think you will be pushing more variables than you are operators/numbers then maybe (I don't see how that would ever be true, you're always going to want to perform some kind of operation on the variable).
 
Last edited:

Catch-22

Active member
Unfortunately this board doesn't support BBCode tables, but it's not too hard to read anyway :) Turned out it was 10,000,000 iterations, not 100,000.

Code:
[b]Iteration Benchmarks[/b]
^(\+|-|\*|/|\^|max|min|if|not|sqrt)$
Data:    max
Count:   10000000
SUN1.6:  3750 msecs

^(\+|-|\*|/|\^|max|min|if|not|sqrt)$
Data:    min
Count:   10000000
SUN1.6:  4203 msecs

^(\+|-|\*|/|\^|m(ax|in)|if|not|sqrt)$
Data:    max
Count:   10000000
SUN1.6:  4251 msecs

^(\+|-|\*|/|\^|m(ax|in)|if|not|sqrt)$
Data:    min
Count:   10000000
SUN1.6:  4375 msecs

^([\+\-\*/\^]|m(ax|in)|if|not|sqrt)$
Data:    max
Count:   10000000
SUN1.6:  2781 msecs

^([\+\-\*/\^]|m(ax|in)|if|not|sqrt)$
Data:    min
Count:   10000000
SUN1.6:  3047 msecs

^(?:[\+\-\*/\^]|m(ax|in)|if|not|sqrt)$
Data:    max
Count:   10000000
SUN1.6:  2906 msecs

^(?:[\+\-\*/\^]|m(ax|in)|if|not|sqrt)$
Data:    min
Count:   10000000
SUN1.6:  3110 msecs

^(?:[\+\-\*/\^]|m(?:ax|in)|if|not|sqrt)$
Data:    max
Count:   10000000
SUN1.6:  2859 msecs

^(?:[\+\-\*/\^]|m(?:ax|in)|if|not|sqrt)$
Data:    min
Count:   10000000
SUN1.6:  3141 msecs

[b]Compatibility Tests[/b]
Engines: SUN1.6
 

cakyrespa

Member
Awesome, thank you for posting the numbers. It does seem more reasonable, 10M for 2.3 seconds. I love how fast computers have gotten these days.
 
Top