Bug - Fixed Performance issues

Darzil

Developer
I (and by talking to clannies and from recent comments on G-D others) have an issue where sometimes Mafia slows right down, and stays slow until the Kolmafia is restarted. Some (such as me) have found this to happen occasionally for months, others seem to have had issues start recently. I suspect there may be more than one issue going on, but I thought it best to start a thread here to gather evidence. I've mainly noticed this in the relay browser, but then I almost always have it open.

Today when I got the issue, I fired up Wireshark to help me work out where the delays happened.

I was in the Hippy Camp, and clicked on adventure again. As I was down to one turn on 3 buffs, it recast two, then visited campground for a free rest, then cast the last buff, then started the fight. This took approximately 34 seconds.

The network traffic observed (have cut out chat messages) was :
Code:
418.27 POST /skills.php
419.21 HTTP/1.1 200 OK get skills message back
419.22 POST /api.php
419.47 HTTP/1.1 200 OK get api message back
426.23 POST /skills.php
426.95 HTTP/1.1 200 OK get skills message back
426.96 POST /api.php
427.20 HTTP/1.1 200 OK get api message back
434.62 POST /campground.php
435.33 HTTP/1.1 200 OK get campground message back
435.36 POST /api.php
435.60 HTTP/1.1 200 OK get api message back
442.83 POST /skills.php
443.54 HTTP/1.1 200 OK get skills message back
443.55 POST api.php
443.80 HTTP/1.1 200 OK get api message back
451.02 POST /adventure.php
451.27 HTTP/1.1 302 Found
451.30 POST /fight/php
452.21 HTTP/1.1 200 OK get fight back

So the responses from KoL look reasonably fast given they are multipacket and I have a 0.2 second ping to the server - 0.94, 0.72 and 0.71 for the skill casting, 0.71 for the campground rest, 0.91 for the fight. API was faster, 0.25, 0.24, 0.24, 0.25.

The Mafia processing to work out if it needs to refresh character data (call to api.php) is fast 0.01s - 0.03s.

The Mafia processing to parse api.php and work out the next action is slow (6.76s, 7.42s, 7.23s, 7.22s).

Am now going to restart KolMafia and record a baseline when first opened.
 
Last edited:
Ok, restart and try something similar. This time no campground visit.

Network traffic:
Code:
2765.77 POST /skills.php
2766.71 HTTP/1.1 200 OK get skills message back
2766.72 POST /api.php
2766.97 HTTP/1.1 200 OK get api message back
2767.15 POST /skills.php
2767.86 HTTP/1.1 200 OK get skills message back
2767.87 POST /api.php
2768.11 HTTP/1.1 200 OK get api message back
2768.42 POST /skills.php
2768.91 HTTP/1.1 200 OK get skills message back
2768.92 POST /api.php
2769.16 HTTP/1,1 200 OK get api message back
2769.52 POST /adventure.php
2769.75 HTTP/1.1 302 Found
2769.77 POST /fight.php
2770.27 HTTP/1.1 200 OK get fight message back

So the responses from KoL are pretty much the same - 0.94, 0.71 and 0.49 for the skill casting, 0.60 for the fight. API was 0.25, 0.24, 0.24.

The Mafia processing to work out if it needs to refresh character data (call to api.php) is still fast 0.01s.

The Mafia processing to parse api.php and work out the next action is much faster (0.18s, 0.31s, 0.36s).

The question which now arises is what is the best way of working out what causes the slowdown, which could be a Java issue, a function issue, a multi-threading issue, etc. Is there a way to switch some debug on whilst mafia is running which will show what is being processed and at what time? Or a Java kit tool that would show this?
 
You were in the Relay Browser - since you said "I clicked on Adventure gain".
What is your tool looking at? The request/response between KoLmafia and KoL?
In the Relay Browser, we also have the request/response between the Browser and KoLmafia, over the 127.0.0.1 socket.

Does your tool have access to the localhost connection, too? Because there could be delay between when KoLmafia is ready to send the response down to the browser and when it actually does so - or between when KoLmafia sends the response down and when the browser actually picks it up.

I mention this because I am now using Firefox as my Browser (as the Relay Browser only; I use Safari elsewhere), and I sometimes see the KoL tab sit there and spin for a while, waiting for a response from 127.0.0.1
 
Unfortunately Wireshark doesn't look at 127.0.0.1.

It's looking at all (actual) network traffic, and I'm filtering to ip.host == 69.16.150.211.

I'm not restarting IE10 when I get the issue, though I am closing the relevant tab, I'm not sure if that starts a new IE10 thread, probably.

When it's slow it also affects cli actions, which hopefully don't care about the browser.

Edit - got it again now. When I click on adventure again, the page text parsed results hits the CLi very fast, but the relay browser very slowly.

Edit again - Chat doesn't slow down.

Another edit - An example (less technical) on what I'm seeing in the CLi :

> use 5 black cherry soda

Purchasing black cherry soda (5 @ 80)...
You acquire black cherry soda (5)
You spent 400 Meat

(7 second delay)

Purchases complete.
Using 5 black cherry soda...
You gain 51 Mojo Points

(7 second delay)

Finished using 5 black cherry soda.

Yet another edit - Also at present, adventuring in the relay browser, it's slow to start the fight (long delay between stuff appearing in CLi and relay browser), and after the fight (long delay between result appearing in CLi and relay browser) but not displaying individual rounds.
 
Last edited:
Does mafia look for the localhost even if relay browser isn't open? Or if its open, but you don't click anything on it and leave it at a random page from minutes ago?

That exact same behaviour was happening to me last night no matter what, just getting stuck at the start or the end of an automated adventure for 30 seconds to a minute. I tried various combinations of open and closed relay browser, multiple restarts of mafia and a system reboot, but there wasn't much improvement.
 
Does mafia look for the localhost even if relay browser isn't open? Or if its open, but you don't click anything on it and leave it at a random page from minutes ago?

That exact same behaviour was happening to me last night no matter what, just getting stuck at the start or the end of an automated adventure for 30 seconds to a minute. I tried various combinations of open and closed relay browser, multiple restarts of mafia and a system reboot, but there wasn't much improvement.

If multiple restarts of mafia and a system reboot didn't cure it, it sounds more likely there was a network issue somewhere between you and the server. If you want to know where, I can recommend pingplotter as a good tool to use. Am happy to help you interpret the plots. You'd want to set it up to contact 69.16.150.211.
 
30ms for processing isn't usually considered fast. Seeing as that was just to follow a HTTP redirect, which is maybe 100 bytes of data, you might have bigger problems on your hands. How much CPU time was Mafia getting?

I don't see why you can't configure wireshark to also capture on 127.0.0.1 (on Linux, you have "any" as a possible capture target as well as "lo"; I'm not sure about other OSes, but a last-ditch resort would be to just run multiple instances of wireshark).
 
Yeah, works on Linux, not on Windows for 127.0.0.1. There are other tools, but what I'm taking from the above so far is that Java is processing Kolmafia slowly rather than it being a network issue.

I'll look into this more on Monday when I'm back from a weekend away. I think you could well be right, and that all activity is affected, but the more processing is needed the greater the time taken. The main thing it's established for me is that when it first starts up it's fine, and later it's very slow. From talking to others it's pretty common, and people are getting used to just having to restart Mafia every couple of hours. I'm trying to get people to add their experiences to this thread if they are the same. If it's just my machine that's fine, but I fear it may not be.

I'm running Java 1.7.0_21 on Windows 7 64bit.
 
I've noticed that animated familiars seem to do something to mafia. The longer they are out, the worse the performance and only a restart fixes it. I've never figured out a way to prove it well enough for a bug report, so I just removed animation from the ones I use.
 
Idea to try out/question: when this happens, does disabling all scripts (recoveryscript, before/between/after battlescript) and moods help?
(and some anecdata: I have seen mafia's java process as high as 400% cpu in top before (so 4 cores used to 100% each), when I had these long delays; I have also tried looking at GC output (I have verbose garbaging turned on by default, because I like seeing that something is happening), and saw that it didn't even trigger GCs throughout the entire delay (I had shorter delays at that point, so it was only one second with GCs out of three seconds of delay), and it wasn't bad (total GC time around 0.2s during that 1s, with roughly 600MB collected in that time)).
Maybe I should start running with profiling on.
 
For amusement, revision 12625 adds "debug trace [on]" and "debug trace off" commands. These open a TRACE file in the same place as the DEBUG file which logs (with millisecond timestamps) all transactions between the browser and KoLmafia and between KoLmafia and KoL.

For amusement, go to the browser and ask to look at the inventory with inventory images on. Ponder how that would look with and without relayUsesCachedImages - which I ALWAYS run with, but which, for reasons which have never made sense to me, certain people here like to advise others to turn off.
 
For amusement, go to the browser and ask to look at the inventory with inventory images on. Ponder how that would look with and without relayUsesCachedImages - which I ALWAYS run with, but which, for reasons which have never made sense to me, certain people here like to advise others to turn off.

Assuming I guessed correctly where the page load ended, relayUsesCachedImages=false finished in ~2 seconds, relayUsesCachedImages=true finished in ~4 seconds (the second time around; the first time took ~10 seconds to prime browser's cache).
But I have a lot of RAM and can afford to give firefox as much cache as it wants. And even its disk cache is completely cached in RAM by the OS ...

I disabled relayUsesCachedImages years ago when I noticed that many image URLs weren't pointed to mafia's cache anyway, because they didn't have the prefix that mafia was looking for. (this is according to my faulty memory)


ETA:
okay, here's some stuff from trace, with ## comments from me:
Code:
1379199454255: From Browser: GET /adventure.php?snarfblat=340 HTTP/1.1
1379199454290: Requesting: http://www.kingdomofloathing.com/campground.php?preaction=summongygax&quantity=1 ## triggered by my mood
1379199454507: Retrieved: http://www.kingdomofloathing.com/campground.php?preaction=summongygax&quantity=1
1379199454872: Requesting: http://www.kingdomofloathing.com/api.php?what=status&for=KoLmafia
1379199455080: Retrieved: http://www.kingdomofloathing.com/api.php?what=status&for=KoLmafia
1379199455586: Requesting: http://www.kingdomofloathing.com/adventure.php?snarfblat=340 ## what was happening for the half second between retrieving api.php and requesting adventure.php?
1379199455780: Retrieved: http://www.kingdomofloathing.com/adventure.php?snarfblat=340
1379199455813: To Browser: HTTP/1.1 302 Found: /adventure.php?snarfblat=340
1379199455825: From Browser: GET /fight.php?ireallymeanit=1379199455 HTTP/1.1
1379199455825: Requesting: http://www.kingdomofloathing.com/fight.php?ireallymeanit=1379199455
1379199456071: Retrieved: http://www.kingdomofloathing.com/fight.php?ireallymeanit=1379199455
1379199456497: To Browser: HTTP/1.1 200 OK: /fight.php?ireallymeanit=1379199455 ## again, almost half a second doing ... what?
1379199456709: From Browser: GET /images/scripts/keybinds.min.2.js HTTP/1.1
1379199456709: From Browser: GET /images/scripts/window.20111231.js HTTP/1.1
1379199456710: From Browser: GET /images/scripts/core.js HTTP/1.1
1379199456711: From Browser: GET /images/scripts/jquery-1.3.1.min.js HTTP/1.1
1379199456711: From Browser: GET /basics.js HTTP/1.1
1379199456711: From Browser: GET /basics.css HTTP/1.1
1379199456711: From Browser: GET /charpane.php HTTP/1.1
1379199456712: From Browser: GET /newchatmessages.php?j=1&lasttime=1356089864 HTTP/1.1
1379199456738: To Browser: HTTP/1.1 200 OK: /images/scripts/keybinds.min.2.js
+-- 86 lines: 1379199456738: Requesting: http://www.kingdomofloathing.com/newchatmessages.php?j=1&pwd--------- ## mostly chit here
1379199458423: To Browser: HTTP/1.1 200 OK: /images/itemimages/snowsuit.gif ########## this concludes images requested by chit, almost 2 seconds later - and 4.2 seconds after clicking the adventure link
1379199458761: From Browser: POST /fight.php HTTP/1.1 ######################## here I clicked the button
1379199458761: Requesting: http://www.kingdomofloathing.com/fight.php?action=macro&macrotext=&whichmacro=105758
1379199459626: From Browser: GET /newchatmessages.php?j=1&lasttime=1356089866 HTTP/1.1
1379199459626: Requesting: http://www.kingdomofloathing.com/newchatmessages.php?j=1&pwd
1379199459720: Retrieved: http://www.kingdomofloathing.com/fight.php?action=macro&macrotext=&whichmacro=105758 ## that was a second to get back response from KoL
1379199459806: Retrieved: http://www.kingdomofloathing.com/newchatmessages.php?j=1&pwd
1379199459808: To Browser: HTTP/1.1 200 OK: /newchatmessages.php?j=1&lasttime=1356089866
1379199460848: To Browser: HTTP/1.1 200 OK: /fight.php ## and another second to start sending processed response to relay browser
1379199460970: From Browser: GET /images/scripts/keybinds.min.2.js HTTP/1.1
1379199460970: From Browser: GET /images/scripts/window.20111231.js HTTP/1.1
## again, chit requests follow
 
Last edited:
If we use cached images, we store the .gif on a local file and serve it to the browser when it requests it.
If we don't use cached images, we relay the request for the image to KoL, and when the response comes back, serve it to the browser.

You just said that sending hundreds of additional requests to KoL is twice as fast as NOT sending those requests but, instead, looking at a local cache.

That is extremely hard to believe.
 
You just said that sending hundreds of additional requests to KoL is twice as fast as NOT sending those requests but, instead, looking at a local cache.

No. I just said that the browser doesn't even send those requests, because it looks at its own cache, instead of requesting them from mafia, which then looks at mafia cache.

What I don't understand is why relayUsesCachedImages=true added two seconds in the case when the browser already had images with *those* urls cached. Is the search-replace *that* inefficient? [1]
(maybe I should point out - when mafia's cache is not in use, image urls received by browser are like "http://images.kingdomofloathing.com/itemimages/dv_freddysweater.gif"; when mafia's cache is enabled, they are "/images/itemimages/dv_freddysweater.gif" instead in the html (mangled by mafia), and therefore loaded from "http://127.0.0.1:60080/images/itemimages/dv_freddysweater.gif" - totally different files, as far as the browser is concerned).


[1] ETA: no, it isn't; profiling shows that, so far, most time is spent, unsurprisingly, in PauseObject.pause(), then in swing, so far 11 seconds were spent in CreateFrameRunnable.run(), 6 in GenericRequest.receiveServerReply, 5 in ByteBufferUtilities.read, 4 in chat handling, 1 in sending GenericRequest.postClientData, and everything else is below 1s cumulative since I started mafia ...
 
Last edited:
Yes, I experimented and I see that the browser is requesting "/images/xxx" all the time and skipping its own cache.
I will try putting on the local host/port prefix, as you comment and see how it behaves...
 
Here's a trace log where mafia was going really slow. If I'm reading this correctly, the time between sending and receiving is very short. Yet, I'm experiencing delays of up to 5 second between actions. Submitted in case I'm reading wrong, but if I'm right, then something else in mafia could be causing this slowdown. Possibly running a second instance?
 

Attachments

I see up to 2-3 seconds of BatMan_RE.ash in there ...
Then you submitted a simple skill use, and there was still 0.5s between "From Browser: POST ..., Requesting: http://..., Retrieved: http://" and "To Browser: POST ...", and another 0.3s before the browser even started requesting actionbar .. and then another 2 seconds until both actionbar and chit were done downloading all their images?

How much of that delay is due to batman?
 
Yes, I experimented and I see that the browser is requesting "/images/xxx" all the time and skipping its own cache. I will try putting on the local host/port prefix, as you comment and see how it behaves...
I've experimented with this and still find it puzzling.

If we transmute "http://images.kingdomofloathing.com" to "/images", the browser requests "/images/xxx" every time it sees it, even if we have previously served it that image. It does not seem to cache it.

If we transmute "http://images.kingdomofloathing.com" to "http://127.0.0.1:60080/images", the browser STILL requests "/images/xxx" every time, even if we have previously served it that image. It does not seem to cache it.

If we do no transmutation, the browser does not request things from images.kingdomofloathing.com; apparently it is caching.

So, if the browser is not doing caching, "relayCachesImages" makes it go to KoLmafia - a "local proxy" rather than across the net to images.kingdomofloathing.com - presumably a big help. But if the browser - Firefox, at least - IS doing caching, apparently it does not use its own caching for images in "/images" or in "192.0.0.1:68080/images", which ADDS to work.

There must be something I don't understand about this; that is NOT how I want it to behave.

Hmm. Googlation indicates that when KoLmafia-as-proxy serves a cached file, it should include some headers in the response to tell the Browser to cache it. In particular, it should include an Expires in the future and a Last-Modified in the past. I'm a little wary of that, since KoL COULD change its images, but, you know, we don't time-stamp them in our local cache, anyway.

We should probably do that: when KoL itself passes back Expires and Last-Modified for a file, we should cache it and remember those values, and pass them down to the Browser when we serve it a cached file.

In other words, KoLmafia's own proxy caching could stand some work. :)
 
OK, turns out we were not logging or tracing "image" requests to KoL. I added code for that and here is a sample:

Code:
Requesting: http://images.kingdomofloathing.com/itemimages/familiar2.gif
0 request properties

Retrieved: http://images.kingdomofloathing.com/itemimages/familiar2.gif
8 header fields
Field: null = [HTTP/1.1 200 OK]
Field: Date = [Sun, 15 Sep 2013 16:48:29 GMT]
Field: Content-Length = [193]
Field: Last-Modified = [Wed, 19 May 2004 01:08:09 GMT]
Field: Content-Type = [image/gif]
Field: Connection = [keep-alive]
Field: Accept-Ranges = [bytes]
Field: Server = [nginx/0.8.55]
I see a Last-Modified header, but no Expires. So, why is the browser deciding to cache this - and for how long?

I will have to think about this.
 
Back
Top