Yeah. We have a test suite for ItemFinder which is pretty exhaustive for counts, items with commas, items by item number via square brackets or pilcrows or something, but we also have a "fuzzy matching" algorithm for items.
Executive summary, before a wall of text:
I would like to have a test suite specifically dealing with fuzzy matching for user input of item names.
Why does "croll" mean "scroll of drastic healing"? It is neither the first such match by itemId nor the first alphabetically.
How, exactly, do we match "mmj" to "magical mystery juice"?
I think, somehow, that the bulk of all that is via ItemDatabase.getMatchingNames(String substring) calling StringUtilities.getMatchingNames(ItemDatabase.canonicalNames, substring).
ItemDatabase.canonicalNames is an (alphabetically) sorted array of "canonical" (i.e. lower-cased) item names.
That said, ItemDatabase.getCanonicalName and ItemFinder.getFirstMatchingItemName also have "heuristics", some of which seem to have become dead code many years after they were first written.
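To make the starting point concrete, here is a minimal sketch of plain substring matching over a sorted array of canonical names. It is not the actual StringUtilities.getMatchingNames, which I have not reproduced here; the class name, the method body and the "placeholder scroll" entry are assumptions for illustration. Only the two real item names come from this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SubstringMatchSketch {
  // Stand-in for ItemDatabase.canonicalNames: lower-cased and sorted.
  // "placeholder scroll" is an invented name, not a real item.
  static final String[] CANONICAL_NAMES = {
    "magical mystery juice",
    "placeholder scroll",
    "scroll of drastic healing",
  };

  // Returns every name containing the substring, in array (alphabetical) order.
  static List<String> getMatchingNames(String[] sortedNames, String substring) {
    String needle = substring.toLowerCase(Locale.ROOT);
    List<String> matches = new ArrayList<>();
    for (String name : sortedNames) {
      if (name.contains(needle)) {
        matches.add(name);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    System.out.println(getMatchingNames(CANONICAL_NAMES, "croll"));
    System.out.println(getMatchingNames(CANONICAL_NAMES, "mmj"));
  }
}
```

With this toy list, "croll" yields two candidates and "scroll of drastic healing" is not the first of them, while "mmj" yields nothing at all. So pure substring matching explains neither of the two cases above; whatever resolves them has to live in the extra heuristics.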
1) Let's demystify fuzzy name matching via a test suite.
2) ItemDatabase.getMatchingNames returns a list of names to ItemFinder.getMatchingItemList, which will give an error if there are multiple matches after filtering. Usually. Why not "croll"?
3) All the CLI commands which want to match items - "pull", "closet", etc. - call that. Which is to say, ambiguous input may or may not match what the user wants or expects.
4) Why do "ambiguity" errors not abort the CLI command?
---------------------------------------------------------------
I generally think of "fuzzy matching" as substring matching - and yes, that is part of it - but if a substring can't narrow an item down to a single match, heuristics!
Some of it is in ItemDatabase.getCanonicalName(String itemName, int count, boolean subStringMatch).
The non-substring-match part has some oddities dealing with plurals, but is relatively straightforward.
But the substring matching when we look for all sorts of weird pluralization is chock full of heuristics.
I think the idea was to let users type in plurals and have us reduce them to a particular item id. I think that was before we made an effort to have accurate plurals for all items. Which is to say, it is for parsing user input, rather than item names from KoL or ASH. (CLI scripts use the same commands as users might type, but I would hope that actual scripts will not depend on fuzzy matching.)
But, look:
Code:
> pull 2 scrolls of drastic healing
[scrolls of drastic healing] has no matches.
> pull 2 scroll of drastic healing
Pulling items from storage...
Requests complete
That is the legitimate plural of that item, but, even though ItemFinder knows we are asking for 2, it's not looking up the name in a way that will check for plurals.
Looking for callers of ItemDatabase.getCanonicalName where "count" might be something other than 1:
Code:
./persistence/ItemDatabase.java:1147: String name = ItemDatabase.getCanonicalName(itemName, count, substringMatch);
./persistence/ItemDatabase.java:1285: return ItemDatabase.getCanonicalName(canonicalName.split(" ")[1] + " snowcone", count);
The second is a recursive call in the method itself.
The first is in ItemDatabase.getItemIds(String itemName, int count, boolean subStringMatch).
Looking at the callers of THAT method:
Code:
./textui/Parser.java:4114: int[] ids = ItemDatabase.getItemIds(name, 1, false);
./textui/DataTypes.java:726: int[] itemIds = ItemDatabase.getItemIds(name, 1, false);
./textui/parsetree/Type.java:308: ? ItemDatabase.getItemIds(name, 1, false)
./textui/command/TestCommand.java:383: int[] itemIds = ItemDatabase.getItemIds(string, 1, true);
./AdventureResult.java:450: && ItemDatabase.getItemIds(this.name, 1, false).length > 1)
./request/MallSearchRequest.java:256: int[] itemIds = ItemDatabase.getItemIds(itemName, 1, true);
Which is to say, all the code in ItemDatabase.getCanonicalName() that deals with "count > 1" is dead code at this point.
ItemFinder also has "fuzzy matching" heuristics in ItemFinder.getFirstMatchingItemName, like this:
Code:
// If there are multiple matches, such that one is a substring of the
// others, choose the shorter one, on the grounds that the user would have
// included part of the unique section of the longer name if that was the
// item they actually intended. This makes it easier to refer to
// non-clockwork in-a-boxes, and DoD potions by flavor.
and
Code:
// Candy hearts, snowcones and cupcakes take precedence over
// all the other items in the game, IF exactly one such item
// matches.
This is old code. The first bookshelf IOTM gave you snowcones. I have thousands of snowcones in storage.
We have no tests which specifically test ItemFinder.getFirstMatchingItemName.
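As a starting point for such tests, here is a rough sketch of how I read the first comment: if one match is contained in every other match, prefer it. This is my interpretation of the comment, not the code in ItemFinder.getFirstMatchingItemName; the class, the method and the "example box" names are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ContainedNameSketch {
  // If some candidate is a substring of every match (itself included),
  // return it; otherwise report that this heuristic cannot decide.
  static Optional<String> pickContainedName(List<String> matches) {
    for (String candidate : matches) {
      boolean containedInAll = true;
      for (String other : matches) {
        if (!other.contains(candidate)) {
          containedInAll = false;
          break;
        }
      }
      if (containedInAll) {
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Invented names standing in for the "in-a-box" case from the comment.
    System.out.println(pickContainedName(Arrays.asList("clockwork example box", "example box")));
    System.out.println(pickContainedName(Arrays.asList("example box", "other thing")));
  }
}
```

A test suite could pin down exactly this kind of behavior with real item names: which inputs the heuristic resolves, and which it leaves ambiguous.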
Having looked at all that, why do I get this:
Code:
> inv drastic
scroll of drastic healing (10)
> pull 1 croll
Pulling items from storage...
Requests complete.
> inv drastic
scroll of drastic healing (11)
As you pointed out, "croll" is ambiguous.
ItemFinder.getMatchingNames gets a list of matches. It calls ItemDatabase.getMatchingNames, which calls StringUtilities.getMatchingNames with a (sorted) list of item names. There are 23 items that match "croll" - and drastic healing is not alphabetically the first.
Hah. I installed some debug logging and have identified the "croll" issue:
Code:
Looking for first matching item name in a list of 23 names.
Filtering by ANY
Reduced namelist to 1 names.
Returning scroll of drastic healing
Why?
ItemFinder.filterNameList:
Code:
if (filterType != Match.FOOD
    && filterType != Match.BOOZE
    && filterType != Match.SPLEEN
    && filterType != Match.CANDY) {
  // First, check to see if there are an HP/MP restores
  // in the list of matches. If there are, only return
  // the restorative items (the others are irrelevant).
...
Yet Another Heuristic. For a "pull" (or even "use"), why in the world are only "restores" relevant?
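Here is a small sketch of what that filter appears to do, going only by the snippet and the debug log above. It is not the real ItemFinder.filterNameList: the isRestore predicate, the method signature and the "placeholder scroll" name are assumptions, and the Match enum lists only the values that show up in this post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class RestoreFilterSketch {
  // Only the filter types mentioned in this post; the real enum may have more.
  enum Match { ANY, FOOD, BOOZE, SPLEEN, CANDY }

  // For non-consumable filters, keep only the restores if there are any;
  // otherwise leave the list of names untouched.
  static List<String> filterNameList(
      List<String> names, Match filterType, Predicate<String> isRestore) {
    if (filterType == Match.FOOD
        || filterType == Match.BOOZE
        || filterType == Match.SPLEEN
        || filterType == Match.CANDY) {
      return names;
    }
    List<String> restores = names.stream().filter(isRestore).collect(Collectors.toList());
    return restores.isEmpty() ? names : restores;
  }

  public static void main(String[] args) {
    // "placeholder scroll" is invented; the predicate is a hypothetical restore check.
    List<String> names = Arrays.asList("placeholder scroll", "scroll of drastic healing");
    Predicate<String> isRestore = "scroll of drastic healing"::equals;
    System.out.println(filterNameList(names, Match.ANY, isRestore));
    System.out.println(filterNameList(names, Match.FOOD, isRestore));
  }
}
```

Under Match.ANY the ambiguous list collapses to the single restore, which is consistent with the log line "Reduced namelist to 1 names." Under Match.FOOD the same list would stay ambiguous.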
Well, enough musing, for now.