Bug: IOException almost daily during manual adventuring

VladYvhuce

This is one I have no idea how to trigger, and since I don't know how to read debug logs, I may not be able to capture it in a log small enough to post to the forums. But it seems to happen at some point on any day that I do a bunch of manual adventuring. In the browser, it shows up simply as the adventuring pane not loading. It can happen during combat or noncombat; this time it happened in the middle of a choice adventure.

This is what shows up in Mafia's graphical CLI tab:
Code:
IOException retrieving server reply (choice.php?pwd&whichchoice=923&option=1) -- /192.168.0.52:52425: GOAWAY received.
 
That message is usually due to a server-side error.

Agreed. But there was one place where, if I understood what I was reading, it was claimed this could be a client error: the client calculated the header size as if the protocol were HTTP/1, but the request was actually sent as HTTP/2. Since HTTP/2 compresses that portion of the header, the sizes differ between the two protocols, and the server returned GOAWAY because the header was not the size it claimed to be.

I suppose someone who knows protocols better than I do, and is motivated, could check the code and decide whether that applies and whether something is being built for the wrong protocol.
 

If that were the case, then ~every request would result in a GOAWAY.

We do not have our own hand-rolled HTTP codec, as we're relying on Java's standard library implementation. As such, I'm very skeptical of the notion that Mafia is doing something that's standards-noncompliant.
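
For context, a minimal sketch of what that stdlib usage looks like (the URL is just for illustration):
Code:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StdlibHttpDemo {
    public static void main(String[] args) throws Exception {
        // The JDK client defaults to HTTP/2 and transparently falls back
        // to HTTP/1.1 when the server doesn't support it.
        HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2)
            .build();

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://www.kingdomofloathing.com/game.php"))
            .GET()
            .build();

        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());

        // response.version() reports which protocol was actually negotiated.
        System.out.println(response.version() + " " + response.statusCode());
    }
}
ALPN negotiation, HTTP/2 framing, and HPACK header compression all happen inside the JDK here, so a client-side framing bug of the sort described above would have to live in the standard library itself.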

We have seen similar reports in the past, which were nominally attributed to AWS ALB. I have no reason to believe this case is different. https://kolmafia.us/threads/ioexception-goaway-received.27397/

Per observations in that thread (as well as updates to the GitHub issue I linked), it seems that AWS ALB will send a GOAWAY and a TCP FIN after 10k requests on the same connection, or if any of the headers are too big (> 8 KB) post-compression. With that new information in hand, @Ryo_Sangnoir's suggestion to recreate the HttpClient every X requests (for some value < 10000) seems a lot more sensible.

Also, for what it's worth, if it is actually initiated by AWS ALB, then it's likely safe to retry, since said request would not be forwarded to the backend (i.e. KoL's actual webserver) at all.
 
That said, r26388 implemented the nominal "reset the HttpClient every X requests" (which was later refined in r27798). I would be a little surprised if we're running into the headers-too-big problem. I'd be more inclined to believe that something isn't actually using the ResettingHttpClient.
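
For anyone curious, the shape of that approach is roughly as follows. This is a sketch of the idea only, not the actual r26388/r27798 code; the class name, counter, and limit are all illustrative:
Code:
import java.net.http.HttpClient;

// Sketch of the "reset the HttpClient every X requests" idea. Discarding
// the client tears down its pooled HTTP/2 connections before the server's
// per-connection request cap (ALB: 10k, nginx: 1000 by default) is hit.
public class ResettingClientSketch {
    private static final int REQUEST_LIMIT = 900; // stay under nginx's default of 1000

    private HttpClient client = newClient();
    private int requestCount = 0;

    private static HttpClient newClient() {
        return HttpClient.newBuilder().version(HttpClient.Version.HTTP_2).build();
    }

    // Hand out the shared client, replacing it once the limit is reached;
    // the old client (and its connections) becomes garbage once any
    // in-flight requests on it complete.
    public synchronized HttpClient getClient() {
        if (++requestCount > REQUEST_LIMIT) {
            client = newClient();
            requestCount = 1;
        }
        return client;
    }
}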
 
From what I can tell, the request /is/ forwarded to KoL's server, which then responds with GOAWAY. Last I checked, I thought it might come from nginx: I believe the default config sends GOAWAY after a large number of requests on the same connection.
 
KoL is apparently using both nginx/1.18.0 and nginx/1.8.1, per the returned headers (the AWSALB cookie pins you to a particular backend). Perhaps they've recently started a migration, which would explain why this started happening again. (HTTP/2 support in nginx was added in 1.9.5.)
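
If you want to check which backend you're pinned to without pulling a full debug log, something along these lines should work (a sketch; whether this particular request carries the AWSALB set-cookie is an assumption on my part):
Code:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BackendCheck {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://www.kingdomofloathing.com/game.php"))
            .GET()
            .build();
        HttpResponse<Void> response =
            client.send(request, HttpResponse.BodyHandlers.discarding());

        // "server" names the nginx version on whichever backend you hit;
        // "set-cookie" should include the AWSALB cookie doing the pinning.
        response.headers().firstValue("server")
            .ifPresent(s -> System.out.println("server: " + s));
        response.headers().allValues("set-cookie")
            .forEach(c -> System.out.println("set-cookie: " + c));
    }
}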

https://nginx.org/en/docs/http/ngx_http_v2_module.html#http2_max_requests defaults to 1000; hitting that limit also results in a GOAWAY. We likely need to lower our threshold; nominally, 900 should be fine. The overhead of setting up a new HTTP/2 connection (~3 round trips, since it looks like my curl request is being downgraded to TLS 1.2) should be fairly small whether amortized over 100 or 1000 requests.
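
To put rough numbers on that (assuming a ~50 ms round trip, which is a guess rather than a measurement): three round trips is ~150 ms of setup, which amortizes to under 0.2 ms per request at a limit of 900.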

(Per the linked docs, that particular directive is merged into keepalive_requests in newer versions of nginx; between 1.19.7 and 1.19.10 the default drops briefly to 100.)

Even so, if the GOAWAY comes from nginx, it should be safe to retry since it never actually makes it to its backend, namely the PHP server which handles the business logic.
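
If we do lean on that, a retry might look something like the following sketch; the method name is made up, matching on the exception message is admittedly fragile, and deciding which requests are safe to resend is still on us:
Code:
import java.io.IOException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class GoAwayRetry {
    // Send a request, retrying once if the connection died to a GOAWAY.
    // Safe under the assumption above: the proxy never forwarded the
    // request to the backend, so resending can't double-apply it.
    public static HttpResponse<String> sendWithRetry(
            HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        try {
            return client.send(request, HttpResponse.BodyHandlers.ofString());
        } catch (IOException e) {
            String message = String.valueOf(e.getMessage());
            if (!message.contains("GOAWAY")) {
                throw e; // some other I/O failure; don't mask it
            }
            // The old connection is gone; the client opens a fresh one.
            return client.send(request, HttpResponse.BodyHandlers.ofString());
        }
    }
}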
 
This one happened mid-fight. Again, manual adventuring. This is the graphical CLI log. The fight results never showed up in the relay browser.
Code:
[535] The Castle in the Clouds in the Sky (Basement)
Preference lastEncounter changed from Alphabet Giant to Neckbeard Giant
Encounter: Neckbeard Giant
Preference _lastCombatStarted changed from 20241013183846 to 20241013183907
Round 0: Rick Tyger wins initiative!
Round 1: Your toy train moves ahead to the Coal Hopper. Your train takes on coal to power the next stop.
Preference trainsetPosition changed from 457 to 458
Preference _currentDartboard changed from 7513:butt,7514:head,7515:torso,7516:arm,7517:leg to 7513:torso,7514:leg,7515:arm,7516:butt,7517:head
Preference _dartsLeft changed from 2 to 3
Round 1: Rick Tyger casts INFECTIOUS BITE!
Round 2: Jed claws, claws, and bites your opponent for 14 (+16) damage.
Round 2: Neckbeard Giant takes 30 damage.
Round 2: Neckbeard Giant takes 16 damage.
Round 2: Rick Tyger casts SURPRISINGLY SWEET STAB!
IOException retrieving server reply (fight.php?action=skill&whichskill=7489) -- /192.168.0.52:52178: GOAWAY received.
Preference _concoctionDatabaseRefreshes changed from 366 to 367
 
Hm. Next time this happens, mind grabbing a debug log at the start of your session, then again right after it occurs?

Specifically,
Code:
debug on; text game.php; debug off
should be enough to grab some response headers, which'll tell us which version of nginx is issuing the GOAWAY.

Code:
...
Retrieved: https://www.kingdomofloathing.com/game.php
11 header fields
Field: :status = [200]
Field: cache-control = [no-store, no-cache, must-revalidate, post-check=0, pre-check=0]
Field: content-encoding = [gzip]
Field: content-type = [text/html; charset=UTF-8]
Field: date = [Mon, 14 Oct 2024 00:34:02 GMT]
Field: expires = [Thu, 19 Nov 1981 08:52:00 GMT]
Field: pragma = [no-cache]
Field: server = [nginx/1.18.0]
...

We probably could just set HTTP_CLIENT_REQUEST_LIMIT to 700 and call it a day... although I couldn't seem to get a GOAWAY when I tried to reproduce, so there might be something else at play.
 
Can you explain a bit more clearly? Do you want me to try logging from the beginning until it occurs and hope the log isn't too big? Or just turn it on and off? Or turn it on, do something, then turn it off?
 
The command I specifically listed should open a log, make a single (small) request, and then turn off the log, so you can run it both at the start of your session and right after encountering the problem.
 