Bug Mafia performance

ikzann

Member
Hi all,

After getting a little frustrated with how quickly Mafia chews up my laptop battery, I decided to run it through a profiler to see what generates the most CPU usage. Below are some quick highlights from a first pass, captured while running a farming script, and I'd be happy to dive deeper into any of these. I'm sure the devs who spend more time on Mafia know some of these already, but I'm posting anyway in case it's helpful. I will spend some time with the code this weekend and see if I can't fix any of the issues.
  1. As everyone seems to have keyed in on a little lately - I think you added the preference, Veracity? - the biggest standout is concoction database refreshes. 40% of the time during my profiling session was spent sorting the LockableListModel during a concoction database refresh. It feels like the concoction database should be updated lazily, only when creation is actually happening, but I haven't spent enough time with the code to be sure.
  2. 13% of the time was spent updating equipment each adventure. Not sure exactly why this is so expensive.
  3. 12% of the time was spent running HtmlCleaner on the resulting HTML. I am not sure if there is a way around this, but flagging it anyway.
  4. 10% of the time was spent doing preference updates. Preference updates are generally slow because every update leads to a full serialization of the entire preference DB. Preferences that update every combat are especially impactful. Maybe there is some way to batch these writes so the file is serialized once per combat instead of once per preference update, which I believe is how it works now - a rough sketch of what I mean is below.
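
For what it's worth, here is a rough sketch of the kind of write batching I have in mind. This is not Mafia's actual code - the class and method names are made up for illustration:

Java:
// Hypothetical sketch: mark preferences dirty while deferring and flush once,
// instead of serializing the whole file on every individual set.
public final class BatchedPreferenceWriter {
  private static final Object lock = new Object();
  private static boolean deferring = false;
  private static boolean dirty = false;

  public static void beginDeferring() {
    synchronized (lock) {
      deferring = true;
    }
  }

  // The (hypothetical) preference setter would call this instead of writing immediately.
  public static void noteChange() {
    synchronized (lock) {
      if (deferring) {
        dirty = true;
      } else {
        writeFileNow();
      }
    }
  }

  // Called once at the end of a combat (or adventure, or script).
  public static void endDeferring() {
    synchronized (lock) {
      deferring = false;
      if (dirty) {
        dirty = false;
        writeFileNow();
      }
    }
  }

  private static void writeFileNow() {
    // serialize the full preference map to disk here
  }
}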
 

ikzann

Member
Here is a mafia performance snapshot - you should be able to open it in Java VisualVM. This is on the branch for PR #1227. There still appear to be lots of concoction database refreshes happening in the main code path.

 

Veracity

Developer
Staff member
Comments on ConcoctionDatabase.

It has two LockableListModels:

- creatableList - every "concoction".
Retrieved by getCreatables.
This is the Model for the CreateItemPanel - the tabs under Creatable in the Item Manager

- usableList - every food, drink, spleen, potion
Retrieved by getUsables
This is the Model for the UseItemEnquePanel - the tabs under Usable in the Item Manager.

Note that these lists contain EVERY item of the specified kind.
Whether or not they are visible depends on whether the "usable amount" or "creatable amount" is greater than zero.
This also depends on the various checkboxes on each panel.

We update these lists when you acquire (or lose) an item which is either usable or is an ingredient of a creatable item.
This means that items can become visible - or cease to be visible - when your inventory changes.

It also means that if you change the acceptable "sources" - closet, storage, whatever - visibility can change. That requires a refresh, since it effectively makes more (or fewer) items available.

(Filtering via checkboxes - "turn free" "+mys only" etc does not require a refresh. That changes visibility of elements of the model, which does not change.)

GenericRequest will refresh concoctions after every request with a result.
It does it this way:
Code:
ConcoctionDatabase.refreshConcoctions(false)
Note that "false". That is the "force" parameter.
"true" would mean "always do it"
"false" means "only do it if you have been told that a refresh is needed by ConcoctionDatabase.setRefreshNeeded".

So, result processing looks at each item acquired/lost, and lets ConcoctionDatabase decide if it applies to usableList or creatableList.

Now, we have ways to cut down on the number of refreshes.

If you automate adventuring - KoLmafia.executeRequest - we will "defer" concoction refreshes across the entire sequence of requests.

If you want to adventure in a location multiple times - adventure(100, <location>) - then we defer refreshes at the beginning of the sequence and un-defer them at the end, resulting in one refresh, no matter how many adventures yielded items.

We also defer refreshing while processing items dropped from a fight: one (potential) refresh after all items are processed, not after each item.

Now, if you are automating via adv1, there is no help; your adventure "sequence" is a single adventure. All the rounds of a fight will still turn into a single refresh, but you will get one refresh per adventure.

Another wrinkle: if you have a "goal" or "condition" set with an item which can be created, we do not defer refreshes across an adventure sequence. This is why you can adventure in the Hole in the Sky with "Richard's star key" as a goal and it will stop when it has acquired the ingredients and crafted one for you.
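
In rough outline, the gating described above looks something like this. This is a simplified sketch, not the actual source - the structure and names are illustrative only:

Java:
// Simplified sketch of "refresh needed" + deferral gating; the real ConcoctionDatabase
// tracks much more state than this.
public class ConcoctionRefreshSketch {
  private static boolean refreshNeeded = false;
  private static int deferralLevel = 0;

  // Result processing calls this when an acquired/lost item is usable or an ingredient.
  public static void setRefreshNeeded() {
    refreshNeeded = true;
  }

  // Automation brackets a sequence of requests with defer(true) ... defer(false).
  public static void deferRefresh(boolean defer) {
    if (defer) {
      deferralLevel++;
    } else if (--deferralLevel == 0) {
      refreshConcoctions(false);
    }
  }

  public static void refreshConcoctions(boolean force) {
    if (deferralLevel > 0) {
      return; // wait until the sequence ends
    }
    if (!force && !refreshNeeded) {
      return; // nothing relevant changed since the last refresh
    }
    refreshNeeded = false;
    // recompute creatable/usable amounts and re-sort the LockableListModels here
  }
}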

Conclusions:

1) It is not possible to "defer refreshes until you request crafting". The "creatable amount" will be zero until we refresh, and you won't even SEE that the item is creatable in the GUI. Perhaps we could request a (non-force) refresh if a script method asks to "create" or "retrieve" or "use" or whatever, but it is so much simpler to just keep the lists up to date each time we know they will change.
2) There are dozens of calls to ConcoctionDatabase.refreshConcoctions() in various places. No argument means "(true)" i.e. force a refresh.
We should probably look closely at those.
3) I would like to see some unit tests to confirm my assumptions:
- A fight returning multiple "usables" results in a single refresh
- A fight returning nothing that goes on usables or creatables does no refreshes
- Automating a few adventures (fights) going through adventure() results in a single refresh for the sequence, not one per adventure.

That could be a fun little project.

Start with tests confirming - or not - the assumption that our refresh-minimizing heuristics behave as expected.
Fix the bugs.

And if refreshing still seems too slow - 40% of CPU usage in automated adventuring - consider optimizing the culprit.
Which is how you should ALWAYS approach optimization. :)
 

ikzann

Member
Interesting. That's all extremely helpful context. Your comment about adventure() makes a lot of sense - there is no reason to refresh the model driving the UI when most of the UI is locked, anyway. The same logic would seem to apply to scripts. If a user is running a script, is there any reason to have the concoction database updating? If I'm on target, then deferring updates until a script finishes seems like the "easy" solution here. And scripts are the place where Mafia seems to really turn up my laptop fans.

As an additional complication to a broader refactor, I'll note that in addition to the refreshConcoctions calls all over the code base, there are also many, many cases where other functions call ConcoctionDatabase.getUsables().sort() directly. The broader issue seems like a tricky knot to untangle.
 
We update these lists when you acquire (or lose) an item which is either usable or is an ingredient of a creatable item.
...
2) There are dozens of calls to ConcoctionDatabase.refreshConcoctions() in various places. No argument means "(true)" i.e. force a refresh.
We should probably look closely at those.
I think I found one of those cases, but I can't find it in the code to check/fix: the database refreshes every time I get attacked in PVP, even if the item lost isn't an ingredient/usable. Hell, even if I won the fight and thus didn't lose any items at all.

(screenshots attached)
 

Ryo_Sangnoir

Developer
Staff member
Starting in `src/net/sourceforge/kolmafia/chat/ChatManager.java`, line 438 updates the inventory from the API when it sees "just attacked you".

Eventually this gets to `InventoryManager.parseInventory(JSONObject JSON)`, which blanks and replaces the inventory lists, updates available equipment, refreshes concoctions and sends the "(hats)" listener event. Refreshing the concoctions, updating the available equipment and refreshing hats are all things on which Mafia spends a large proportion of its time.
 

ikzann

Member
I have a couple of PRs in now to at least ameliorate some of these issues. Still looking at a few other pieces of low-hanging fruit.
 

ikzann

Member
Okay, after the work this weekend there is still one major outstanding question I'd like to ask folks: should we switch preferences from a sorted TreeMap to an unsorted, HashMap-style structure?

Mafia currently spends a nontrivial amount of time doing its in-memory preference lookups - maybe 8-10% while running scripts in my testing - both on the set side and the get side. I've inspected some typical execution patterns, and a few things stand out:
  1. Mafia checks kingLiberated all the time. Probably hundreds of times per adventure. It's not surprising, I guess.
  2. A number of other preferences that are implicated in the concoction database are also checked very frequently, as they're read during every compareTo while sorting the usable list.
  3. Most of the lookup time is, unsurprisingly, spent traversing the TreeMap BST, and string comparisons are not always cheap.
Switching to an unordered map would mean that preferences would naturally be printed in the file in a random order. I don't really care about this, but others might. I rarely open my preferences file, and when I do I always ctrl-F. But maybe there are other consequences I'm missing.

I threw together a quick microbenchmark that just checks every preference 20,000 times. Without the change, it takes 13.5 seconds. With the change, it takes 2.8 seconds.
 

fronobulax

Developer
Staff member
Switching to an unordered map would mean that preferences would naturally be printed in the file in a random order. I don't really care about this, but others might. I rarely open my preferences file, and when I do I always ctrl-F. But maybe there are other consequences I'm missing.

I would object to a preferences file that was not in a human-readable sorted order. I tend to scroll rather than search. Some of that is mouse-centric rather than keyboard-centric, but some of that is knowing a partial name and wanting to find the preference and related or similarly named preferences quickly.

Could allowing a different representation in memory and on disk be a compromise? Perhaps the alwaysWriteOnChange or whatever it is called should be reconsidered by people who value performance over data integrity?
 

Linknoid

Member
Here's a way to reduce the number of map lookups without making the map unordered:

Store each setting value in its own object. Instead of a Map<String, String>, it would be a Map<String, SettingString>, where SettingString is a class containing a readable/writable string. Then when you have something like kingLiberated you want to check, you check it through an accessor which retrieves the SettingString and caches it the first time, and from then on it knows the object the setting is stored in. No more map lookups of that key on subsequent accesses. Any commonly accessed properties can be cached like this, and the less commonly accessed ones can stay as string lookups.
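
Something like this rough sketch - class and method names are made up for illustration, not Mafia's actual Preferences API:

Java:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Each setting lives in its own handle object, so hot-path reads skip the map.
final class SettingString {
  volatile String value;
  SettingString(String value) { this.value = value; }
}

final class Settings {
  private static final Map<String, SettingString> MAP = new ConcurrentHashMap<>();

  // Look the handle up once; the map is never consulted for this key again.
  static SettingString handle(String name) {
    return MAP.computeIfAbsent(name, k -> new SettingString(""));
  }
}

class KingLiberatedCheck {
  // Cached once per call site / class; subsequent reads are a plain field access.
  private static final SettingString KING_LIBERATED = Settings.handle("kingLiberated");

  static boolean kingLiberated() {
    return Boolean.parseBoolean(KING_LIBERATED.value);
  }
}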
 

ikzann

Member
I would object to a preferences file that was not in a human-readable sorted order. I tend to scroll rather than search. Some of that is mouse-centric rather than keyboard-centric, but some of that is knowing a partial name and wanting to find the preference and related or similarly named preferences quickly.

Could allowing a different representation in memory and on disk be a compromise? Perhaps the alwaysWriteOnChange or whatever it is called should be reconsidered by people who value performance over data integrity?

That's fair. It is true that a big part of the "problem" (to the extent there is one) is saveSettingsOnSet - there are now quite a few preferences that update every combat. I am going to explore turning it off, or triggering a write once per adventure, in some of the scripts I work on.

I think it makes more sense to just sort the list before writing it out rather than storing it sorted in memory, as you suggest. I ran another microbenchmark to test this. The following test ran in 14.0 seconds on current main and 3.0 seconds on a version with a HashMap for preferences and sorted output. It's just a 4:1 ratio of gets to sets.

Java:
  @Test
  void preferenceSpeed() {
    var cleanups = new Cleanups(withProperty("saveSettingsOnSet", true), withSavePreferencesToFile());
    try (cleanups) {
      // Warm-up pass so JIT compilation doesn't distort the timed run.
      for (int i = 0; i < 10000; i++) {
        for (var pref : Preferences.allDefaults()) {
          Preferences.getString(pref);
          if ((i & 3) == 0) Preferences.setString(pref, "abc");
        }
      }
      // Timed pass: read every preference 20,000 times, with one set per four gets.
      long start = ManagementFactory.getThreadMXBean().getThreadCpuTime(Thread.currentThread().getId());
      for (int i = 0; i < 20000; i++) {
        for (var pref : Preferences.allDefaults()) {
          Preferences.getString(pref);
          if ((i & 3) == 0) Preferences.setString(pref, "abc");
        }
      }
      long end = ManagementFactory.getThreadMXBean().getThreadCpuTime(Thread.currentThread().getId());
      // Thread CPU time is reported in nanoseconds; print elapsed seconds.
      System.out.println("Elapsed: " + (end - start) / 1e9);
    }
  }

Here's a way to reduce the number of map lookups without making the map unordered:

Store each setting value in its own object. Instead of a Map<String, String>, it would be a Map<String, SettingString>, where SettingString is a class containing a readable/writable string. Then when you have something like kingLiberated you want to check, you check it through an accessor which retrieves the SettingString and caches it the first time, and from then on it knows the object the setting is stored in. No more map lookups of that key on subsequent accesses. Any commonly accessed properties can be cached like this, and the less commonly accessed ones can stay as string lookups.
This is a great idea but requires more refactoring, as far as I can tell - I don't think you can just use the map's Entry objects for this purpose. There's also a long tail of properties that are checked very frequently, e.g. all the unknownRecipeX properties are checked on every concoction refresh. For now I am just going to file a PR moving to a ConcurrentHashMap with sorting on output.
 

heeheehee

Developer
Staff member
if ((i & 3) == 0) Preferences.setString(pref, "abc");
This is actually not representative because setString() has special handling for setting a preference to its existing value.

Java:
  public static void setString(final String user, final String name, final String value) {
    String old = Preferences.getString(user, name);
    if (!old.equals(value)) {
      Preferences.setObject(user, name, value, value);
    }
  }

After the first time each preference gets set to "abc", later sets are no-ops, so you're never doing the sort or the write to disk in the timed loop, which is why you're mostly observing the difference in lookup speed.


(Aside: I'm a bit grumpy when I see code that looks like
Java:
if (!map.containsKey(key)) {
  map.put(key, value);
}
or similarly,
Java:
if (!map.get(key).equals(value)) {
  map.put(key, value);
}
since that does two map lookups, when put() tells you what the old value is. That said, it's somewhat more ergonomic in C++ where you have readily available access to the iterators, and you don't have to incur a string copy unless you actually do the insertion.)
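
For the record, the single-lookup versions in Java look something like this (illustrative snippet, not Mafia code):

Java:
import java.util.HashMap;
import java.util.Map;

class SingleLookupDemo {
  static final Map<String, String> map = new HashMap<>();

  // Insert-if-absent in one lookup instead of containsKey() + put().
  static void setDefault(String key, String value) {
    map.putIfAbsent(key, value);
  }

  // put() returns the previous value, so one lookup both writes the new value
  // and tells you whether anything actually changed.
  static void setPreference(String key, String value) {
    String old = map.put(key, value);
    if (!value.equals(old)) {
      saveToDisk(); // only pay for the expensive serialization when the value changed
    }
  }

  static void saveToDisk() {
    // stand-in for the costly write of the whole preferences file
  }
}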
 

heeheehee

Developer
Staff member
fyi -- local run on my laptop (which has an NVMe SSD, so disk performance is likely better than the average user's):

PreferencesTest > preferenceSpeed() STANDARD_OUT
Elapsed: 4.173547323

Replacing that assignment with "abc" + i:

PreferencesTest > preferenceSpeed() STANDARD_OUT
Elapsed: 7.324115627

Workloads that are more write-heavy will see worse performance. With (i & 3) == 0 -> (i & 3) != 0 (namely, 75% as many writes as reads):

PreferencesTest > preferenceSpeed() STANDARD_OUT
Elapsed: 14.069218074
 

ikzann

Member
Gausie merged my changes this morning, so I ran the 4:1 ratio benchmark against both r26905 and r26907. I still saw roughly a 4-5x speedup, since the old code also slows down substantially with the corrected benchmark. That's good enough for me.

This is also only CPU time - obviously folks with faster SSDs should see better wall-clock performance in the real world. My main focus has been the drain on my battery life.
 

heeheehee

Developer
Staff member
Right. In both cases, you're going to incur a write to disk, so really the only pertinent question for performance is whether sorting on write + using a hash map is cheaper than maintaining a sorted data structure.

There are probably enough issues with TreeMap thrashing the cache hierarchy with binary search + node indirection that it's closer than it should be (even with a 3000-element map). And that's not even getting into the details behind the hammer that is Collections.synchronizedSortedMap.
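
Concretely, the sort-on-write approach is just something like this (a sketch, not the actual serialization code):

Java:
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

class SortOnWriteDemo {
  // Unordered map for fast in-memory lookups on the hot path.
  static final Map<String, String> prefs = new ConcurrentHashMap<>();

  // Sort only when serializing, so the on-disk file stays in human-readable order.
  static String dumpSorted() {
    StringBuilder out = new StringBuilder();
    for (Map.Entry<String, String> e : new TreeMap<>(prefs).entrySet()) {
      out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
    }
    return out.toString();
  }
}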
 

ikzann

Member
Thanks everyone for all the help. After the PRs I've submitted, the hot spots are still mostly the same, but overall execution time seems to have been reduced (hard to measure exactly). I think there is probably some work that could be done to the maximizer, but that may be a more in-depth project. I'll take a brief look.
 