Yes, all the Java utility classes are written in Java. Well, actually, everything in JDK is written in Java.
Huh, I wasn't aware that any of the Java internal code was available online. I'd always assumed that they used some existing C or C++ regex engine, but no, they implemented the whole thing in Java.
Anyway, it appears that matches() and lookingAt() ultimately call exactly the same matching code as find(), just using an alternate version of the compiled pattern that's exactly the same as if the pattern started with "^" (and if the pattern actually did start with "^", the same version is used in both cases). The additional handling of the last search position that find() has to do is negligible compared to the actual matching operation, so I'm going to stand by my statement that there's no performance benefit to having matches() available.
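For what it's worth, the "same as if the pattern started with ^" part can be checked from the outside. This is only a minimal sketch, assuming default flags (no MULTILINE) and a fresh Matcher for every call; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class PrefixEquivalenceDemo {
    // lookingAt(): the match has to start at the beginning of the input.
    static boolean viaLookingAt(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // find() with "^" prepended. The non-capturing group keeps any
    // top-level alternation in the original regex under the anchor.
    static boolean viaAnchoredFind(String regex, String input) {
        return Pattern.compile("^(?:" + regex + ")").matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "123abc", "abc123", "" }) {
            System.out.println("\"" + s + "\": lookingAt=" + viaLookingAt("\\d+", s)
                    + " anchoredFind=" + viaAnchoredFind("\\d+", s));
        }
    }
}
```

This only compares results, not timing, so it says nothing either way about the performance question.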
Each of the functions is there for a specific purpose and they all act differently.
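A quick sketch of what "act differently" means in practice, using only the public java.util.regex API. The class name, the helper tryAll() and the sample inputs are just for illustration.

```java
import java.util.regex.Pattern;

public class MatcherMethodsDemo {
    // Returns { matches(), lookingAt(), find() } for one regex/input pair.
    static boolean[] tryAll(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        // A fresh Matcher per call, so no state carries over between methods.
        boolean whole = p.matcher(input).matches();     // entire input must match
        boolean prefix = p.matcher(input).lookingAt();  // match must start at index 0
        boolean anywhere = p.matcher(input).find();     // match may start anywhere
        return new boolean[] { whole, prefix, anywhere };
    }

    public static void main(String[] args) {
        for (String s : new String[] { "123", "123abc", "abc123" }) {
            boolean[] r = tryAll("\\d+", s);
            System.out.println(s + ": matches=" + r[0] + " lookingAt=" + r[1] + " find=" + r[2]);
        }
        // 123:    matches=true  lookingAt=true  find=true
        // 123abc: matches=false lookingAt=true  find=true
        // abc123: matches=false lookingAt=false find=true
    }
}
```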
matches() and lookingAt() use match():
Code:
boolean match(int from, int anchor) {
    this.hitEnd = false;
    this.requireEnd = false;
    from = from < 0 ? 0 : from;
    this.first = from;
    this.oldLast = oldLast < 0 ? from : oldLast;
    for (int i = 0; i < groups.length; i++)
        groups[i] = -1;
    acceptMode = anchor;
    boolean result = parentPattern.matchRoot.match(this, from, text);
    if (!result)
        this.first = -1;
    this.oldLast = this.last;
    return result;
}
Whereas find() uses search():
Code:
boolean search(int from) {
    this.hitEnd = false;
    this.requireEnd = false;
    from = from < 0 ? 0 : from;
    this.first = from;
    this.oldLast = oldLast < 0 ? from : oldLast;
    for (int i = 0; i < groups.length; i++)
        groups[i] = -1;
    acceptMode = NOANCHOR;
    boolean result = parentPattern.root.match(this, from, text);
    if (!result)
        this.first = -1;
    this.oldLast = this.last;
    return result;
}
What I don't get is why there is both a parentPattern.matchRoot.match and a parentPattern.root.match, and why find() doesn't just return match(from, NOANCHOR) and do away with the search() function completely (search() is only called by find()).
I think I do understand what you are getting at with the ^. To be more precise, a Boyer-Moore or a completely anchored search (starts with ^ and also ends with $) skips certain parts of the pattern compiler as part of its optimization.
There is still a difference between the way search() and match() handle anchors in multi-line mode though.
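One way that difference can show up from the outside (this is a sketch of my reading, not necessarily the exact case meant above): with MULTILINE set, "^" can also match just after a line terminator, so find() can succeed further into the input while lookingAt() only ever tries the start. The class name, pattern and input are made up for illustration.

```java
import java.util.regex.Pattern;

public class MultilineAnchorDemo {
    // "^b" compiled with and without MULTILINE.
    static final Pattern MULTI = Pattern.compile("^b", Pattern.MULTILINE);
    static final Pattern SINGLE = Pattern.compile("^b");

    static boolean find(Pattern p, String input) {
        return p.matcher(input).find();
    }

    static boolean lookingAt(Pattern p, String input) {
        return p.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        String input = "a\nb";
        // MULTILINE: find() slides forward and "^" matches after the "\n".
        System.out.println("multi  find      = " + find(MULTI, input));       // true
        // lookingAt() is tried at the start only, where the input has "a".
        System.out.println("multi  lookingAt = " + lookingAt(MULTI, input));  // false
        // Default mode: "^" matches only at the very start, so both fail.
        System.out.println("single find      = " + find(SINGLE, input));      // false
        System.out.println("single lookingAt = " + lookingAt(SINGLE, input)); // false
    }
}
```

So prepending "^" and calling find() is only interchangeable with lookingAt() when MULTILINE is off, which matches the single-line default mentioned for KoLmafia.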
To break away from Java Discussion and get back to Scripting Discussion, whilst I think that matches() is better suited, the use of find() is going to be fine as the strings we are working with have already been split and by default KoLmafia operates in single-line (PERL) mode, so the downsides are pretty minimal.