Bug Lag in both relay browser and using the adventure tab...

mekkanik

New member
Hi,

I've noticed this quite some time ago... but the problem seems to be getting worse. The TCP connections opened in mafia are not getting shut down properly and ending up in TIME_WAIT status.

This happens during scripted adventuring via the adventure tab and is exponentially worse while using the relay browser. Running TCPView (http://technet.microsoft.com/en-us/sysinternals/bb897437) clearly shows the issue. The connections stack up steadily on port 12080. Things get much worse when the relay browser is fired up and you can see tons of connections to 60080 stuck in TIME_WAIT...


These connections take quite some amount of time to spin down... speed does not come back up it happens.
 

Veracity

Developer
Staff member
What makes you think the TIME_WAIT state is a problem? For that matter, what makes you think you can avoid it with a "proper" shut down? What IS a "proper" shut down?
 

Theraze

Active member
If you believe that's the problem, try turning your "allowSocketTimeout" preference to true and see if that makes it better. The default on that was changed from true to false several months (I think?) back, as it causes potential problems with double-sending requests if you have an especially laggy or problematic connection.
 

Theraze

Active member
Ah, sorry. I thought it was mentioned as making requests timeout if there's no timely response, so that it was related. Sorry. :(
 

xKiv

Active member
I would say the TIME_WAIT itself isn't a problem (dozens of those just mean dozens of connections "closed" within a short timespan) - but it's probably a symptom of some problem. I don't think using mafia should result in dozens (I have seen over 100 myself) connections being churned through in a minute (my machine has a default TCP FIN timeout of 60 seconds). And it usually doesn't.

For reference, here is an excerpt from my DEBUG_20120422.txt (dumped by graygui, iirc):
Code:
"LocalRelayAgent93" prio=10 tid=0x0000000001f48000 nid=0x58b0 waiting for monitor entry [0x00007f65640bf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at net.sourceforge.kolmafia.chat.ChatPoller.getEntries(ChatPoller.java:146)
        - waiting to lock <0x00000000d75e5e28> (a java.lang.Class for net.sourceforge.kolmafia.chat.ChatPoller)
        at net.sourceforge.kolmafia.request.RelayRequest.handleChat(RelayRequest.java:1526)
        at net.sourceforge.kolmafia.request.RelayRequest.run(RelayRequest.java:1666)
        at net.sourceforge.kolmafia.RequestThread.postRequest(RequestThread.java:221)
        at net.sourceforge.kolmafia.RequestThread.postRequest(RequestThread.java:184)
        at net.sourceforge.kolmafia.webui.RelayAgent.readServerResponse(RelayAgent.java:480)
        at net.sourceforge.kolmafia.webui.RelayAgent.performRelay(RelayAgent.java:159)
        at net.sourceforge.kolmafia.webui.RelayAgent.run(RelayAgent.java:132)
(this is repeated many times, for 93 local relay agents, and a couple RUNNABLE threads)

I would also wager that a situation like this doesn't play well with TCP's congestion mechanic, but that's an uninformed conjecture on my part.
 

Veracity

Developer
Staff member
I would say the TIME_WAIT itself isn't a problem (dozens of those just mean dozens of connections "closed" within a short timespan)
Correct. TIME_WAIT is a normal aspect of TCP's state machine. In the absence of an RST shutting down the connection abnormally, one end or the other of a TCP connection will pretty much always be in a TIME_WAIT state after it shuts down. Either we or the KoL server will hang on to the TCB for 2 * MSL.

(I implement network protocols for a living. I've actually implemented TCP itself three times in my career. Not for a long time - since before Linux was invented - but I'm not uninformed about the technical details of how the protocol works. :))

but it's probably a symptom of some problem. I don't think using mafia should result in dozens (I have seen over 100 myself) connections being churned through in a minute (my machine has a default TCP FIN timeout of 60 seconds). And it usually doesn't.
My impression is that we open up a new connection for every request we make to the KoL server. I also think that's how browsers generally do it.

For reference, here is an excerpt from my DEBUG_20120422.txt (dumped by graygui, iirc):
...
(this is repeated many times, for 93 local relay agents, and a couple RUNNABLE threads)
93 open threads? You showed us one in ChatPoller. Are they all in chat?

It does seem abnormal to have that many active threads.
 

xKiv

Active member
My impression is that we open up a new connection for every request we make to the KoL server. I also think that's how browsers generally do it.

Don't browsers do keep-alive (persistent) connections to combat TCP management overhead (including TIME_WAIT)?

93 open threads? You showed us one in ChatPoller. Are they all in chat?

They had all the same stack trace, blocked in RelayRequest.handleChat wating for ChatPoller.getEntries (if I am reading it correctly).

ETA: by "all" I mean all the LocalRelayAgent threads. There were other threads too but those were not blocked, and only a couple of them.

It does seem abnormal to have that many active threads.

I think it's just that the one thread gets locked somewhere, and then others start spawning (because I have relay browser open with chat in there, and that checks repeatedly). None of them can close (because that requires the application on the other end to wake up and process the close - again, I have no real experience with this, just parroting what I googled), so they keep hanging around.
No idea what the original cause might be.
(and to add something possibly wildly unrelated: sometimes I get a case of "I click on something in relay browser, mafia processes the request (fight, ...) correctly, but no replay is received by the browser, so the tab is permanently connecting/loading/whatever, until I stop it and click on something else, like [main] ..."; generally, I happily forget all about this very fast).
 
Last edited:

roippi

Developer
There have been some GD complaints of lag popping up recently, and now that I think about it their appearance may have coincided with the changes to the chat subsystem. Some profiling/deeper investigation may be warranted.
 

Veracity

Developer
Staff member
Don't browsers do keep-alive (persistent) connections to combat TCP management overhead (including TIME_WAIT)?
The "Connection" header field specifies whether to leave the connection open. I notice that when we initiate a request, we set this:

Field: Connection = [close]

And when we get back the response, it also has this:

Field: Connection = [close]

We specify that because we are coded to always close the connection immediately after doing the transaction. We could recode to keep it open, I expect, although the server is still allowed to tell us to close and we have to honor that.

That's a good idea.

They had all the same stack trace, blocked in RelayRequest.handleChat wating for ChatPoller.getEntries (if I am reading it correctly).
That sounds like a bug.
 

Grotfang

Developer
I'm afraid I don't have time to deal with this (still in my exams), but one possibility I would want to exclude is that this is related to Faxbot. If you use the mafia interface to request a FaxBot fax, while you have the relay chat open with the new chat system, I can imagine a situation where the thread that opens that request never gets a response. Could this be it?
 

xKiv

Active member
Not for me, I don't use mafia's builtins for that.
I would suspect intermittent network hiccups, but between localhost and localhsot???
 

jwylot

Member
I'm not technically minded but wonder if my experiences over the last few weeks might add anything. I'm getting fairly regular hangs in my relay browser when adventuring normally. I don't use relay chat. The symptom is that the gcli completes a combat or action and the browser just sits there trying to connect. Also when this happens, CPU usage of the browser shows at 25% instead of the usual <1%. My solution is to kill the browser from process explorer and restart it.

Curiously, closing the mafia tab or the browser normally also results in it showing using 25% CPU in process explorer for some time afterwards.

I'm using Waterfox x64 on Win7 if that is relevant.
 

xKiv

Active member
I believe hat would be how any recent-ish firefox does garbage collection after any "naughty" page (lots of javascripts and loads would probably do it). It eventually completes.

It's probably showing 25% because you have 4 CPU cores and firefox is single-thread only (so it can't use the entire CPU; advantage: it can't slow your entire computer to crawl this way).
 

lostcalpolydude

Developer
Staff member
I'm afraid I don't have time to deal with this (still in my exams), but one possibility I would want to exclude is that this is related to Faxbot. If you use the mafia interface to request a FaxBot fax, while you have the relay chat open with the new chat system, I can imagine a situation where the thread that opens that request never gets a response. Could this be it?

requestFax() stops waiting after 60 seconds when relay chat intercepts the response.
 

MsNicole

Member
I don't know if it's the same issue, but I've been having problems with mafia lately.

With me it's not so much a "lag" as a freezing up. All the control buttons grey out, and it becomes impossible to do anything in Mafia until I close the window and reopen it, and then before long it's doing it again.

Well, one way around it is to hit the "stop now button", then enter "conditions clear" in the cli. But you have to do that after every action.
 

lostcalpolydude

Developer
Staff member
I don't know if it's the same issue, but I've been having problems with mafia lately.

With me it's not so much a "lag" as a freezing up. All the control buttons grey out, and it becomes impossible to do anything in Mafia until I close the window and reopen it, and then before long it's doing it again.

Well, one way around it is to hit the "stop now button", then enter "conditions clear" in the cli. But you have to do that after every action.

"graygui force" should clean that up for you. Ignore the error-looking message it spits out, it really does accomplish something.
 

MsNicole

Member
"graygui force" should clean that up for you. Ignore the error-looking message it spits out, it really does accomplish something.


I did try that, and got the following:

0 request sequences will be ignored.
To use this feature, you must run KoLmafia with a JDK instead of a JRE.

... and my buttons and text fields are still greyed out. So I guess the next step is to download this "JDK" thing?

ETA:
OK, it does seem to be helping, just not instantly. Thanks! But I'd still like to know if it would be worth installing JDK if you're not really planning to do javascript development. Any thoughts?
 
Last edited:

Winterbay

Active member
What happens if you use the JDK instead of the JRE is that you get a debuglog as well showing what exactly it was that had hung itself and thus was forcefully terminated by the command.
 
Top