Algebra in ASH?

Catch-22 · Aug 28, 2009

the_great_cow_guru said:
Code:

boolean is_numeric(string test) { return create_matcher("^(-?[0-9]+)((\\.?[0-9]+)?)$" , test).find(); }

Yes you can do exactly that. I just wrote it the other way because I thought it would be easier for someone unfamiliar with the matcher to follow. I'm actually not sure why I used to find() method. The matches() method is actually much better suited for this task:

Code:

boolean is_numeric(string test) {
    return create_matcher("^(-?[0-9]+)((\\.?[0-9]+)?)$" , test).matches();
}

the_great_cow_guru said:
As for the negative numbers, true you could represent them like that. I wasn't aware of any restriction that said you couldn't use negative numbers in RPN though, and even if that is the case, it can't hurt to have a nice shortcut for the users, right?

You're not wrong actually. It's not just a shortcut for users either...

Here's a simple breakdown of representing -1 as 4 5 -

push 4
push 5
pop 5
pop 4
call eval_rpn
push -1

Here's a simple breakdown of representing -1 as -1

push -1

Haha, no contest?

I'm sure you could write a really small footprint implementation that uses unsigned integers but for the purposes of this, I think you're better off just pushing a negative number on the stack to begin with.

There's a lot more optimizations I can see in there that can be done, I am currently at work so I shouldn't even be here

Here's a suggestion for the is_operator function:

Code:

boolean is_operator(string test) {
    return create_matcher("\\+|-|\\*|/|\\^|if|max|min|not", test).matches();
}

Happy coding!

the_great_cow_guru · Aug 28, 2009

Thanks, you seem quite knowledgeable about all this regex and matcher stuff, it certainly makes the code a lot neater

Does that second regex need the start and end characters? (^ and $)

Catch-22 · Aug 28, 2009

the_great_cow_guru said:
Thanks, you seem quite knowledgeable about all this regex and matcher stuff, it certainly makes the code a lot neater

Hopefully faster too

the_great_cow_guru said:
Does that second regex need the start and end characters? (^ and $)

No. The matches() method matches the entire string anyway (you could actually remove the ^ and $ from the is_numeric pattern too if you are using matches(), but no harm in keeping it there). If you wanted to use find() the pattern would get really cluttered and look like this:

"^\\+$|^-$|^\\*$|^/$|^\\$|^if$|^max$|^min$|^not$"

Which is when I realized I should have been using matches() the whole time, haha.

Edit:

Oh, to better explain myself, the ^ and $ mean match the beginning and end of a string.

If you left them out and were using find() with the pattern "not" something like "nottingham" would match. find() would see the pattern "not" at position 1 in "nottingham" and returns 1 (that's True in boolean). If you were looking for "^not$" then "nottingham" wouldn't match, because "not" has to be the beginning and end of the string. ^ means match the beginning and $ means match the end.

If you notice the | in the pattern, that's an OR, so in the case of find() it wants the string to start and end with "if" ("^if$") OR start and end with "max" ("^max$") so the resulting expression is "^if$|^max$" and so on. So you can see how it starts to get cluttered.

matches() on the other hand, will check if the entire string matches the pattern, so ^ and $ are not needed.

Hope that makes some sense

the_great_cow_guru · Aug 28, 2009

Yeah, it probably does speed it up too.

Ah ok, in that case matches is definitely better for this situation.

EDIT: huh, looks like the matches() function isn't working so I used find, and I wrote the is_operator without so many ^ and $

"^(\\+|-|\\*|/|\\^|if|max|min|not)$"

I know I said I didn't want to flood the thread with copies of this script, but I didn't want to forget all the changes we needed to make, so here's the updated version, tested and works

EDIT (again): cleaned up the file some more

Catch-22 · Aug 28, 2009

the_great_cow_guru said:
Yeah, it probably does speed it up too.

Ah ok, in that case matches is definitely better for this situation.

EDIT: huh, looks like the matches() function isn't working so I used find, and I wrote the is_operator without so many ^ and $

"^(\\+|-|\\*|/|\\^|if|max|min|not)$"

I know I said I didn't want to flood the thread with copies of this script, but I didn't want to forget all the changes we needed to make, so here's the updated version, tested and works

Looks good man, also nice work simplifying the find() pattern. Silly me for not thinking of enclosing the patterns in brackets, haha. Sorry I didn't test any of what I've written. So the matches() method isn't supported by mafia, or it just didn't return the correct result? I still think matches() is better suited for both tasks, if we can get it to work.

I don't think it's bad to have a few copies floating around here, seeings it's scripting discussion. I can see a few areas that could still use some tweaking but once we're all relatively happy with it someone could post this in repository. Start with version 1.0 and we can update the thread as needed.

the_great_cow_guru · Aug 28, 2009

Yeah as far as I can tell, it's not supported by mafia. They might have it by another name or something.

Yeah true, guess there's no need to worry as long as it's only in scripting discussion. So what else do you think we should add/change?

Catch-22 · Aug 28, 2009

the_great_cow_guru said:
Yeah as far as I can tell, it's not supported by mafia. They might have it by another name or something.

Hmm... I wonder if this would work any better:

Code:

boolean is_operator(string test) {
    return create_matcher("\\+|-|\\*|/|\\^|if|max|min|not").matcher(test).matches();
}

Give it a try.

the_great_cow_guru said:
So what else do you think we should add/change?

Well, I think some performance gains could be realized by using switch/case instead of if-then-else in eval_rpn:

Code:

if (is_numeric(val)) {
    push(stack, to_float(val));
} else if (is_operator(val)) {
    switch (val) {
            case "+":
                float a = pop(stack);
                float b = pop(stack);
                push(stack, b+a);
                break;
            case "-":
                float a = pop(stack);
                float b = pop(stack);
                push(stack, b-a);
                break;
            case "*":
                float a = pop(stack);
                float b = pop(stack);
                push(stack, b*a);
                break;
            //etc.
    }
} else {
    if (!(variables contains val)) abort("failed to supply a value for variable: " + val);
    push(stack, variables[val]);
}

What do you think?

the_great_cow_guru · Aug 28, 2009

matches still didn't work.

I don't know if the switch is faster, but it does look a little neater so I swithed it. Also cleaned up the eval_rpn_character function a bit. Are there any other variables you think we should add by default?

(updated the file in my previous post)

Catch-22 · Aug 28, 2009

the_great_cow_guru said:
Also cleaned up the eval_rpn_character function a bit. Are there any other variables you think we should add by default?)

Hmm to be honest with you, I think everything after eval_rpn needs to be cleaned up/placed in a different file, so we can keep the RPN library separate. You could then have like an "rpn_example.ash" which imports rpn.ash at the top and work from there.

cakyrespa · Aug 28, 2009

Catch-22 said:
Hmm to be honest with you, I think everything after eval_rpn needs to be cleaned up/placed in a different file, so we can keep the RPN library separate. You could then have like an "rpn_example.ash" which imports rpn.ash at the top and work from there.

I think having eval_rpn() is good where it is. It basically is the entire point of the RPN library. Also, it minimizes the number of libraries to import since I'm already using it in two of my libraries. I haven't tracked the recent changes, but I have the ones from earlier this evening thrown on my subversion site (including a "\\s" to "\\s+" but fixed). Tomorrow, I'll add the stuff you guys are adding and post that.

I was also thinking about asking the moderates about moving this to Informational Scripts and I'll make this thread the project page for it (with the most recent on the front page and changelogs).

EDIT: Oops, didn't read very well, Yeah, I could remove the main() but I figured the only time you'll use that is when you call it directly. Seems like a good place (in a utility script) to show information about the script, how it is intended to be used, and the tests. I don't know if it hurts anything to leave there and it means that something happens when you run it directly.

the_great_cow_guru · Aug 28, 2009

Catch-22 said:
I think everything after eval_rpn needs to be cleaned up/placed in a different file, so we can keep the RPN library separate.

I agree that the test and the main should be taken out. But I would say that the eval_rpn_character is an important part of the library. That way people who want to do calculations with different stats and stuff can do it quick and easy without having to specify a long list of variables.

the_great_cow_guru · Aug 28, 2009

cakyrespa said:
I was also thinking about asking the moderates about moving this to Informational Scripts and I'll make this thread the project page for it (with the most recent on the front page and changelogs).

Might want to hold off a little bit since we've made a lot of changes lately. Once we combine all our changes and factor out the test functions into an example script then we can ask them to move it.

the_great_cow_guru · Aug 28, 2009

Ok here's a version with the testing functions factored out and caky's "\\s+" change. If you had any more changes you should merge them with this one, and then we can have the mods move this thread

Catch-22 · Aug 28, 2009

Sorry guys, can't quite keep up. My session keeps expiring and I'm supposed to be doing work

the_great_cow_guru said:
matches still didn't work.

I had a look at "/kolmafia/src/net/sourceforge/kolmafia/textui/RuntimeLibrary.java" and it turns out matches() hasn't actually been implemented. It'd be pretty easy to do though.

Code:

public static Value matches( final Value matcher )
{
	Matcher m = (Matcher) matcher.rawValue();
	return m.matches();
}

We would need someone with SVN access to pass off on and commit the changes though.

I also had a look at the find() function and it seems like it doesn't work the same way as in java.util.regex. Unless I am mistaken, it looks to me that the result of find() is always converted to boolean before being returned, so you're not going to get the position of the match in the string.

Code:

public static Value find( final Value matcher )
{
	Matcher m = (Matcher) matcher.rawValue();
	return DataTypes.makeBooleanValue( m.find() );
}

I'm not really sure why it has been implemented this way.

cakyrespa · Aug 28, 2009

Bah, can't test the changes tonight. Oh well, I really should be going to bed myself. That whole "work" thing. I think I integrated all of the changes tonight into my version, but I can't really test it so I won't post it until tomorrow.

http://svn.mfgames.com/KoLmafia/rpn/

If you want to see, I do have it up on my SVN server for now (easier to track changes in my opinion). Once I get a chance to test it, I'll post it.

I also added a bit of debugging information since I was having trouble with some bad RPN strings. It includes a flag to optionally debug the statement, though it isn't perfect yet and really needs the stack in addition to the RPN string.

Though, I'd rather have main() in there. Even if it is just a few print statements.

It really is only called when someone uses the script directly.

jasonharper · Aug 28, 2009

The matches() method is redundant, since you can do exactly the same thing by putting ^, $ around your pattern. I would even expect that to be slightly faster, since the regex pattern compiler would know that it doesn't need to bother with any optimizations that would only help in finding the starting point of the match.

The underlying Java find() method returns a boolean, so why would ASH be any different? To get the position of the match, use the start() and end() methods.

Catch-22 · Aug 28, 2009

jasonharper said:
The underlying Java find() method returns a boolean, so why would ASH be any different? To get the position of the match, use the start() and end() methods.

Sorry jasonharper, I misunderstood the way find() works. Why is DataTypes.makeBooleanValue called if find() returns a boolean?

I am actually more curious now why there is a matches() and a find() function, when find() can be used for the same thing (which we are doing).

I was under the impression that matches() works better if you are matching the entire region as the matcher does not need to be reset each time it's called.

jasonharper · Aug 28, 2009

All ASH values, variables, and expressions are actually represented as instances of the Value class, so that they can be handled interchangeably; raw Java booleans, ints, Strings, etc. are not suitable. You'll notice that all of the RuntimeLibrary functions turn their return value into a Value somehow - often via new Value(...), but that would be wasteful in the case of booleans since there are only two values possible.

I'm not really sure why both find() and matches() exist - and there's also lookingAt(), which matches only at the beginning of the string (as if you'd used ^ but not $). I guess they're useful when a single Pattern needs to be used in more than one of these ways, but that doesn't seem like a very common need.

Catch-22 · Aug 28, 2009

jasonharper said:
All ASH values, variables, and expressions are actually represented as instances of the Value class, so that they can be handled interchangeably; raw Java booleans, ints, Strings, etc. are not suitable. You'll notice that all of the RuntimeLibrary functions turn their return value into a Value somehow - often via new Value(...), but that would be wasteful in the case of booleans since there are only two values possible.

Makes sense now.

jasonharper said:
I'm not really sure why both find() and matches() exist - and there's also lookingAt(), which matches only at the beginning of the string (as if you'd used ^ but not $). I guess they're useful when a single Pattern needs to be used in more than one of these ways, but that doesn't seem like a very common need.

lookingAt() and matches() work in similar ways. lookingAt() stops as soon as it finds a match, but the entire string doesn't have to match. It's not the same as having ^ but not $. lookingAt() trying to find "in" in "the cat in the hat" would work.

lookingAt() always starts from the beginning of the string, like matches(), whereas find() will look for anything in the string. So looking for ".at" in "the cat in the hat", matches() would find nothing, lookingAt() would find "cat", find() would find "cat" and "hat".

This was why I thought the matches() method was going to be more efficient in this case (and I still do).

matches()

Code:

    public boolean matches() {
        return match(from, ENDANCHOR);
    }

lookingAt()

Code:

    public boolean matches() {
        return match(from, NOANCHOR);
    }

find()

Code:

    public boolean find() {
        int nextSearchIndex = last;
        if (nextSearchIndex == first)
            nextSearchIndex++;

        // If next search starts before region, start it at region
        if (nextSearchIndex < from)
            nextSearchIndex = from;

        // If next search starts beyond region then it fails
        if (nextSearchIndex > to) {
            for (int i = 0; i < groups.length; i++)
                groups[i] = -1;
            return false;
        }
        return search(nextSearchIndex);
    }

You can see how I got confused with find() thinking that it returns the index.

So based on those excerpts there seems to be a lot more going on in the find() function, like the internal counters for example, which led me to believe matches() was better suited.

jasonharper · Aug 28, 2009

Huh, I wasn't aware that any of the Java internal code was available online. I'd always assumed that they used some existing C or C++ regex engine, but no, they implemented the whole thing in Java.

Anyway, it appears that matches() and lookingAt() ultimately call exactly the same matching code as find(), just using an alternate version of the compiled pattern that's exactly the same as if the pattern started with "^" (and if the pattern actually did start with "^", the same version is used in both cases). The additional handling of the last search position that find() has to do is negligible compared to the actual matching operation, so I'm going to stand by my statement that there's no performance benefit to having matches() available.

Algebra in ASH?

Active member

Member

Active member

Member

Attachments

Active member

Member

Active member

Member

Active member

Member

Member

Member

Member

Attachments

Active member

Member

Developer

Active member

Developer

Active member

Developer