Bug IOException: GOAWAY received

Irrat

Member
With the HTTP Migration from HTTP1 to HTTP2 which has offered tremendous speed gains, there is a new issue where every so rarely often the mafia client will receive an IOException.

Summed up, it's as the title says. GOAWAY received.
It's not the devs trying to tell us we're unwelcome either, it's an actual HTTP error.

Two links to stackoverflow that seem very relevant. Link & Link

Could be the case that there was extra info in that server response that wasn't expressed in the exception, but it seems like the problem is fairly simple on that it's as the two links express.

As per some quick "research", "last-stream-id" is what we want, but apparently may not be properly supported in Java's HttpClient. Which is not great news.

A thought of a stupid hacky solution could be to recreate the connection every X requests, but I'm not sure that would work, or is even how it works.

Disclaimer, the below exception was made on a modified client to print the exception.

Code:
IOException retrieving server reply (newchatmessages.php?j=1&lasttime=1517153437).
class java.io.IOException: /[IP Address:Port]: GOAWAY received
java.io.IOException: /[IP Address:Port]: GOAWAY received
    at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:565)
    at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:119)
    at net.sourceforge.kolmafia.request.GenericRequest.sendRequest(GenericRequest.java:1575)
    at net.sourceforge.kolmafia.request.GenericRequest.externalExecute(GenericRequest.java:1403)
    at net.sourceforge.kolmafia.request.GenericRequest.execute(GenericRequest.java:1391)
    at net.sourceforge.kolmafia.request.GenericRequest.run(GenericRequest.java:1146)
    at net.sourceforge.kolmafia.request.ChatRequest.run(ChatRequest.java:79)
    at net.sourceforge.kolmafia.chat.ChatPoller.run(ChatPoller.java:208)
Caused by: java.io.IOException: /[IP Address:Port]: GOAWAY received
    at java.net.http/jdk.internal.net.http.Http2Connection.handleGoAway(Http2Connection.java:985)
    at java.net.http/jdk.internal.net.http.Http2Connection.handleConnectionFrame(Http2Connection.java:853)
    at java.net.http/jdk.internal.net.http.Http2Connection.processFrame(Http2Connection.java:724)
    at java.net.http/jdk.internal.net.http.frame.FramesDecoder.decode(FramesDecoder.java:155)
    at java.net.http/jdk.internal.net.http.Http2Connection$FramesController.processReceivedData(Http2Connection.java:232)
    at java.net.http/jdk.internal.net.http.Http2Connection.asyncReceive(Http2Connection.java:649)
    at java.net.http/jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.processQueue(Http2Connection.java:1275)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SynchronizedRestartableTask.run(SequentialScheduler.java:175)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:147)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224)
    at java.net.http/jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.runOrSchedule(Http2Connection.java:1293)
    at java.net.http/jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.onNext(Http2Connection.java:1319)
    at java.net.http/jdk.internal.net.http.Http2Connection$Http2TubeSubscriber.onNext(Http2Connection.java:1253)
    at java.net.http/jdk.internal.net.http.common.SSLTube$DelegateWrapper.onNext(SSLTube.java:202)
    at java.net.http/jdk.internal.net.http.common.SSLTube$SSLSubscriberWrapper.onNext(SSLTube.java:484)
    at java.net.http/jdk.internal.net.http.common.SSLTube$SSLSubscriberWrapper.onNext(SSLTube.java:287)
    at java.net.http/jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run1(SubscriberWrapper.java:318)
    at java.net.http/jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run(SubscriberWrapper.java:261)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SynchronizedRestartableTask.run(SequentialScheduler.java:175)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:147)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224)
    at java.net.http/jdk.internal.net.http.common.SubscriberWrapper.outgoing(SubscriberWrapper.java:234)
    at java.net.http/jdk.internal.net.http.common.SubscriberWrapper.outgoing(SubscriberWrapper.java:200)
    at java.net.http/jdk.internal.net.http.common.SSLFlowDelegate$Reader.processData(SSLFlowDelegate.java:403)
    at java.net.http/jdk.internal.net.http.common.SSLFlowDelegate$Reader$ReaderDownstreamPusher.run(SSLFlowDelegate.java:264)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SynchronizedRestartableTask.run(SequentialScheduler.java:175)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:147)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
 

fronobulax

Developer
Staff member
I reported elsewhere that I was seeing

IOException retrieving server reply (http://dcdb.coldfront.net/collectio...st_query_type=player&last_query_value=2115655) -- HTTP/1.1 header parser received no bytes.

I am repeating myself since the script tries to scrape many pages from Coldfront and to the extent that I can reconstruct recent failures from the session logs, the first failure is about the 10th query plus or minus and the second about 10 queries, plus or minus later. The script stops after the second failure.

When I see "recreate the connection every X requests" my experience makes me wonder if X is ten?
 
This might be more than a benign message. Today I had this come up as I was doing my CS:

unknown.png


I'm pretty sure the item ended up not going onto the disembodied hand. The result was a little less mana obtained after coil wire, so the world didn't end or anything. Still, worth noting.
 

Ryo_Sangnoir

Developer
Staff member
Based on the first post, another option would be to treat the GOAWAY IOException as a transient error, same as timing out would be. Currently IOExceptions are seen as permanent, maximize the timeoutCount, and put Mafia into an error state. As seen here this isn't necessarily the case.
 

Ryo_Sangnoir

Developer
Staff member
On the HTTP/1.1 exception: just pasting
Code:
ash string x = visit_url("http://dcdb.coldfront.net/collections/index.cgi?query_type=item&query_value=4308&last_query_type=player&last_query_value=2115655"); print(x.length())
into the CLI over and over, I can reproduce the error. It works fairly well at first, then I get an error around the fifth, next works, seventh fails, then it irregularly but consistently fails: 10101101110...

As I can reproduce this one it will be easier to see if a fix works :)
 
Last edited:

Ryo_Sangnoir

Developer
Staff member
See if r26307 is any better. It has much better behaviour on the HTTP/1.1 collection pages, though I can't reproduce the HTTP/2 error often enough to check that one.
 

heeheehee

Developer
Staff member
Summed up, it's as the title says. GOAWAY received.
It's not the devs trying to tell us we're unwelcome either, it's an actual HTTP error.
Note that GOAWAY isn't necessarily an error any more than Connection: close in HTTP/1.1. Servers can (and should!) send GOAWAY in preparation for being taken offline for maintenance (e.g. software upgrades).

That said, apparently AWS ALB's H2 implementation / use of GOAWAY is buggy: https://github.com/golang/go/issues/18639 (This is in play for requests to KoL per `curl -I https://www.kingdomofloathing.com`.)

Based on the first post, another option would be to treat the GOAWAY IOException as a transient error, same as timing out would be. Currently IOExceptions are seen as permanent, maximize the timeoutCount, and put Mafia into an error state. As seen here this isn't necessarily the case.

Performing blanket retries of IOExceptions may be dangerous, especially for stateful requests (POSTs in particular) -- if we encounter, say, a frontend -> backend timeout (which for whatever reason didn't abort the backend request), then retrying may result in a duplicated action. This is not great if, say, you repeat an action in combat, take a wrong nested choice (in weird cases where whichchoice doesn't save you), purchase an extra copy of an item...

I do see that you're limiting this to GOAWAY / "parser received no bytes". I'm not sure exactly in what situation "parser received no bytes" might be triggered, but it seems like it's after the client has sent the full HTTP request (since this is specifically in the context of the HTTP response parser). That makes me a little wary.
 

Ryo_Sangnoir

Developer
Staff member
I'm basing GOAWAY needing a retry on:
* https://kolmafia.us/threads/ioexception-goaway-received.27397/#post-167424, which indicates nothing happened (though we don't know for sure)
* how I thougt nginx worked (close the connection, user has to retry), though I'm not sure that's true. In case of POST, it could accept the data, deal with it, then return GOAWAY instead of a response. I think that's less likely though.

"parser received no bytes" is from fronobulax's DCDB parsing. This would be a request to a HTTP/1.1 server where the server again sometimes closes the connection without returning a response. Whether this was safe to retry or not would depend on the server.
 

fronobulax

Developer
Staff member
"parser received no bytes" is from fronobulax's DCDB parsing. This would be a request to a HTTP/1.1 server where the server again sometimes closes the connection without returning a response. Whether this was safe to retry or not would depend on the server.

FWIW

The script is using visit_url and it is an http: not https: call.

It has a long history of "bad responses" from DCDB. I finally decided to count them and abort because with retries, the script could eat up 60 minutes of wall clock time, easily.

The failures I explicitly checked for were visit_url returning a zero length response or a response containing "Unable to connect". Those were determined from experience :)

Logically there is no harm in bounded retries since it is just scraping a page and not doing anything that would "change state".
 

fronobulax

Developer
Staff member
I built and ran and the script ran as expected. In particular it did not detect enough I/O errors to stop. Presumably the retry worked as hoped.
 

Ryo_Sangnoir

Developer
Staff member
By some messages on discord, it's possible that on a GOAWAY there is server action, unlike what I assumed. So we can only safely retry the idempotent calls (e.g. api.php, profile.php, other pages).

The double request happened on a chateau rest, but could also happen on any other rest, any action in battle, taking a choice in a multi-choice, any POST call... I think these should be marked as "do not retry" (under any circumstances, not just GOAWAY or the like).
 

Irrat

Member
This GOAWAY error is annoying.

I ascended, and received the error.


Code:
Ascend as a Casual Female Seal Clubber under the Wombat sign on no path, banking 85 Karma.
IOException retrieving server reply (afterlife.php?action=ascend&confirmascend=1&whichsign=7&gender=2&whichclass=1&whichpath=0&asctype=1&nopetok=1) -- /192.168.178.116:65271: GOAWAY received.

I thought it was harmless, a page refresh showed I had successfully ascended. But then stuff wasn't working properly. Refreshed my session and a bunch of post-ascension things were revealed as not having been run.

> fold garbage shirt

You can't make a shirt
(Used menu to refresh session here)
Loading character status...
Refreshing session data...
Synchronizing moon data...
Retrieving character data...
Preference lastVioletFogMap changed from 121 to 122
Preference lastDwarfFactoryReset changed from 121 to 122
Preference lastGuyMadeOfBeesReset changed from 121 to 122
Preference lastPirateInsultReset changed from 121 to 122
Preference lastBangPotionReset changed from 121 to 122
Preference lastSkateParkReset changed from 121 to 122
Preference lastCellarReset changed from 121 to 122
Updating inventory...
Examining Meat in closet...
Updating closet...
Retrieving quest data...
Preference questM05Toot changed from unstarted to started
Retrieving familiar data...
Familiar data retrieved.
Retrieving campground data...
Looking through your telescope...
Preference lastTelescopeReset changed from 121 to 122
Preference telescope1 changed from to sitting around playing chess and solving complicated-looking logic puzzles
Preference nsChallenge1 changed from none to Mysticality
Preference telescope2 changed from to people, all of whom appear to be on fire
Preference nsChallenge2 changed from none to hot
Preference telescope3 changed from to smoldering bushes on the outskirts of a hedge maze
Preference nsChallenge3 changed from none to hot
Preference telescope4 changed from to a cloud of green gas hovering over the maze
Preference nsChallenge4 changed from none to stench
Preference telescope5 changed from to surrounded by creepy black mist
Preference nsChallenge5 changed from none to spooky
Preference hasOven changed from false to true
Preference hasShaker changed from false to true
Examining Meat and pulls in storage...
Updating storage...
Preference lastEmptiedStorage changed from 121 to 122
Preference umbrellaState changed from bucket style to broken
You are currently a member of The Average Clan
Seeing what's still unrestricted today...
Done checking allowed items.
Session data refreshed.
Preference lastBattlefieldReset changed from 121 to 122

> fold garbage shirt

Preference garbageTreeCharge changed from 0 to 1000
Preference garbageChampagneCharge changed from 0 to 11
Preference garbageShirtCharge changed from 0 to 37
Preference _garbageItemChanged changed from false to true
You acquire an item: makeshift garbage shirt
 

Ryo_Sangnoir

Developer
Staff member
The circumstances here seem to be:
* a request is sent
* the server accepts the request, does something, then returns a GOAWAY frame
* Java errors upon reception of the GOAWAY frame

Mafia can't update the internal state because there is no response text. Mafia also can't get the response text, because to do that it would have to make the call again -- but the call has done something, so it shouldn't be retried.

Ideally, the client should return the response upon receiving GOAWAY, and internally reset its own connection instead of exposing that to the caller. Failing that, it could expose some logic for closing the connection every X calls (where X here is ~1000). I don't think HttpClient does either of those, though.

The main problem with this is that it's not easily reproducible, so it's hard to check whether a "fix" has actually worked. My next thought would be to count how many "GenericRequest" calls we make with the same client, and reset the client every 998 or so. I don't know whether this would actually work -- Java bug tickets make reference to a "connection pool", and if the client uses a pool it won't help.

Still, could add a pref, and let those who want to run with it try it out for a few weeks to see if anything helps.
 

heeheehee

Developer
Staff member
it could expose some logic for closing the connection every X calls
I don't think that fully fixes the issue, only makes it slightly less common (assuming that you're not churning through 1000 requests much faster than the rate at which the frontend set you connect to receives rollouts).

Can you link an example ticket that mentions connection pooling? I can't find any documentation indicating that Java's HttpClient actually uses this behavior (I do see references to such in OkHttp and Apache).

Per openjdk17 source it seems like each HttpClient has a separate map of host : port -> connection (no pooling).
 

Irrat

Member
Modified my client to log how many requests were sent, and it appears that every 10k requests; A goaway is sent.

Code:
response = getClient().send(request, BodyHandlers.ofInputStream());

if (request.method().equalsIgnoreCase("get")) {
  getRequestsSent++;
}else if (request.method().equalsIgnoreCase("post")) {
  postRequestsSent++;
}else {
  otherRequestsSent++;
}

Code:
  && (errorMessage.contains("GOAWAY")
      || errorMessage.contains("parser received no bytes"))) {

 RequestLogger.updateSessionLog("Requests (" + getRequestsSent + " get), (" + postRequestsSent + " post), (" +otherRequestsSent + " other). Request was " + request.method() + " of url " + request.uri().toString());
++this.timeoutCount;

Code:
Requests (3151 get), (6851 post), (0 other). Request was POST of url https://www.kingdomofloathing.com:443/choice.php
Requests (4634 get), (5367 post), (0 other). Request was GET of url https://www.kingdomofloathing.com:443/newchatmessages.php?j=1&lasttime=1518767628
Requests (14495 get), (5505 post), (0 other). Request was POST of url https://www.kingdomofloathing.com:443/inv_equip.php
Requests (14495 get), (5505 post), (0 other). Request was GET of url https://www.kingdomofloathing.com:443/newchatmessages.php?j=1&lasttime=1518775616

So the obvious answer is to reconstruct the HttpClient every 9900 requests.
 

Obeliks

Member
Any updates on this? Was there a fix committed?

I have two characters who run the identical script with the identical build of kolmafia (in separate processes with separate home folders), one regularly hits the GOAWAY, but the other one doesn't.
 

Obeliks

Member
Sorry for digging up this old thread.

I found the commit that restricts GOAWAY handling to retryOnTimeout requests, so it's probably related to some coincidence that causes one character to hit the 1000th request on a retriable request and the other not. Although I'm not surprised that the script should be making 1000 requests within two minutes, I have to check my script, but I think there is something else going on.
 
Top