Feature - Rejected Expose fuzzy matching API to ASH please?

philmasterplus

Active member
Like this: boolean fuzzy_match( string match_from , string match_target )

Because right now I'm trying to do something like this:
PHP:
string target = "underworld";
foreach it in $get_inventory[]
   if ( it.to_string().fuzzy_match( target ) )
      put_display( item_amount( it ) , it );

I could implement it the way it was defined in StringUtilities.java, but it would take too long, ASH being an interpreted language run by a program written in another semi-interpreted language.
 

heeheehee

Developer
Staff member
PHP:
string target = "underworld";
foreach it in get_inventory()
   if ( it.to_string().contains_text( target ) )
      put_display( item_amount( it ) , it );
I think that should work?
 

philmasterplus

Active member
Thank you, heeheehee, but I'm afraid the fuzzy matching algorithm seems more complicated than that. (BTW Devs you wouldn't mind me pasting KoLmafia source code do you?)

From /src/net/sourceforge/kolmafia/utilities/StringUtilities.java, revision 8350:

PHP:
 private static final boolean fuzzyMatches( final String sourceString, final String searchString,
416 	final int lastSourceIndex, final int lastSearchIndex )
417 	{
418 	int maxSearchIndex = searchString.length() - 1;
419 	
420 	if ( lastSearchIndex == maxSearchIndex )
421 	{
422 	return true;
423 	}
424 	
425 	// Skip over any non alphanumeric characters in the search string
426 	// since they hold no meaning.
427 	
428 	char searchChar;
429 	int searchIndex = lastSearchIndex;
430 	
431 	do
432 	{
433 	if ( ++searchIndex > maxSearchIndex )
434 	{
435 	return true;
436 	}
437 	
438 	searchChar = searchString.charAt( searchIndex );
439 	}
440 	while ( Character.isWhitespace( searchChar ) );
441 	
442 	// If it matched the first character in the source string, the
443 	// character right after the last search, or the match is on a
444 	// word boundary, continue searching.
445 	
446 	int sourceIndex = sourceString.indexOf( searchChar, lastSourceIndex + 1 );
447 	
448 	while ( sourceIndex != -1 )
449 	{
450 	if ( sourceIndex == 0 || sourceIndex == lastSourceIndex + 1 || !Character.isLetterOrDigit( sourceString.charAt( sourceIndex - 1 ) ) )
451 	{
452 	if ( StringUtilities.fuzzyMatches( sourceString, searchString, sourceIndex, searchIndex ) )
453 	{
454 	return true;
455 	}
456 	}
457 	
458 	sourceIndex = sourceString.indexOf( searchChar, sourceIndex + 1 );
459 	}
460 	
461 	return false;
462 	}

Which is a character-by-character match that matches ogm, origamigm, gentlemag, or ogmmag with origami "gentleman's" magazine--you get the idea.
 
Last edited:

jasonharper

Developer
Notice that 'private' keyword at the top of the code you quoted? This isn't even an API that's available to the rest of KoLmafia, it's just an internal implementation detail that's subject to change. It's only a tiny part of how user input is parsed: the fuzzy matcher is in fact the fourth thing that is tried, after the other ways of matching (exact, substring starting on word boundary, and unrestricted substring) have been tried and failed. In other words, even if this routine was made public, you wouldn't get behavior that corresponded to how user input is parsed anywhere else in the program.

The real string matcher (in ItemFinder.java) is unfortunately not set up to be called in the way that you want; it's designed to work in terms of matching a string against an entire array of known names. Furthermore, it's optimized for the expected case that there will be only a few name arrays, seldom changing - it precalculates some data on the first call with a given array, in order to speed up subsequent calls with the same array. Giving it a new array each time (which would be unavoidable if given an ASH interface) would result in horrible performance.
 

philmasterplus

Active member
Notice that 'private' keyword at the top of the code you quoted? This isn't even an API that's available to the rest of KoLmafia, it's just an internal implementation detail that's subject to change. It's only a tiny part of how user input is parsed: the fuzzy matcher is in fact the fourth thing that is tried, after the other ways of matching (exact, substring starting on word boundary, and unrestricted substring) have been tried and failed. In other words, even if this routine was made public, you wouldn't get behavior that corresponded to how user input is parsed anywhere else in the program.

The real string matcher (in ItemFinder.java) is unfortunately not set up to be called in the way that you want; it's designed to work in terms of matching a string against an entire array of known names. Furthermore, it's optimized for the expected case that there will be only a few name arrays, seldom changing - it precalculates some data on the first call with a given array, in order to speed up subsequent calls with the same array. Giving it a new array each time (which would be unavoidable if given an ASH interface) would result in horrible performance.

Thank you, though I'd still like some similar feature implemented, because I'm trying to search for items that match a query:
Code:
search( aggregate , "iotm retradeable" ) => return all softcore-only iotm except the mayfly bait
search( aggregate , "ascension rewards equipment" ) => return all stainless steel/plexiglass/brimstone items
search_price( aggregate , 10000000 , 20000000 ) => return all items in aggregate that has historical_price() between 10 and 20 mil
(to avoid repetitive abuse and handle items that are traded through the gift shop, the price information would be stored separately for each item)

I was trying to build a collection of keywords for each item, break down the query into sub-queries, and then check whether each item and its associated keywords matched all of the sub-queries. Could anyone please point me to a better direction? Or would the entire thing be too resource-intensive to be feasable? On my computer, building the keyword database and searching the query took at most a few seconds, but I can't tell for other computers.
 
Last edited:
Top