Lag issues and timeouts

AlbinoRhino

Active member
Me too. Most are of the form "IOException during data post (URL): Connection timed out: connect."
The most frequent URL appears to be "api.php?what=status&for=KoLmafia" which I guess mafia must visit fairly often.
I am also occasionally seeing a message (not sure if this is the exact wording) that says "Lag issues were encountered", which doesn't appear in red. It seems mafia assumes those requests completed? That is, it will increment or toggle properties and such, which further experimentation reveals didn't actually complete and shouldn't have been incremented.

I've reset everything I could reset (router etc.) to no effect. Also, none of my other internet activity appears to be affected. Not sure what can be done if the problem is actually somewhere between mafia and the servers.
 

fronobulax

Developer
Staff member
I'm having lots of trouble with automation these days. For example, today I had to run Ezandora's briefcase script three times to get it to finish. I always believed it was a server side issue.

Recently, this was posted: The majority, if not all of the server issues people have experienced, appear to be on Mafia's side, not the server itself.

Is this problem unique to me? I don't see many others complaining about it, but I have dozens of failures a day.

It would be nice if there were some elaboration/evidence to support the assertion that the problems appear to be on Mafia's side.

This might be a related experience. In that case the reporter used a VPN to connect to the USA and reported the problem seems to have gone away.

I did notice one timeout today but ... That said, I usually don't see any timeouts so I don't know if this is even a helpful anecdote.

I feel like a couple of scripts are taking more wall clock time than usual, but since the characters are in KoE it might be as simple as the scripts trying things that should not work in KoE and then being robust enough to continue after failure.

When I was responsible for both the clients and the servers my experience was that, most of the time, lag and timeouts were a network problem. Neither the server nor the clients were showing "signs of stress" or otherwise lacking resources needed to handle all the connections. So I tend to find reports that a problem is client (or server) side less credible and blame the network. Of course my networking team would often work with me to prove otherwise but often the problem just went away.

But you sound like you are having more issues than I am ;-)
 

Veracity

Developer
Staff member
I saw that and rolled my eyes. I'm very curious to learn how "they" determined that.

KoLmafia opens a network connection to KoL's server.
- If it times out (as birdy has been having a lot recently) - that is because the response did not arrive in time.
--- Is it because the server did not respond? The report you cited implies no - and, as I said, I am curious to understand how they determined that. Did they run KoLmafia and see a request time out and then look at the server logs, sort through tens of thousands of requests, and see the specific request and the specific response they sent?
--- Is it because there is an issue with KoL's ISP and sometimes requests end up getting dropped without returning a more informative connection error?
--- Is there a sporadic routing issue anywhere between the user's ISP and KoL's ISP with the same effect?
--- Even if the request is received and a response is generated, similar issues could result in the response being dropped and the connection timing out.

KoLmafia succeeds in opening the connection. It posts a request.
- It can get an IOException. This is on a successfully opened connection. So, this is returned by Java's HTTP/TCP code
--- (presumably because the OS failed to accept/transmit the request.)
--- This is a problem in Java or the OS.

KoLmafia succeeded in sending the request. It waits for the response.
- I believe we no longer time out on this. It can take a LONG time to get the response
--- Perhaps the server is slow
--- Perhaps the network is laggy
----- Perhaps your computer has to retransmit multiple times before the server receives the request
----- Perhaps the server has to retransmit multiple times before your computer receives the response.
- Sometimes Java receives an IOException. Since we don't time out, there is no indication of what caused this.
--- Something, in Java, or in your OS, decides that the response will not be forthcoming, so it aborts.
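
The three stages above (connect, post, wait for the response) can be told apart in code. The following is a minimal Python sketch, not KoLmafia's actual Java implementation; `Stage`, `probe`, and `outcome_is_ambiguous` are hypothetical names introduced for illustration. It runs the three steps separately over a plain TCP socket and reports which one failed.

```python
import socket
from enum import Enum


class Stage(Enum):
    CONNECT = "connect"   # opening the connection
    SEND = "send"         # posting the request
    RECEIVE = "receive"   # waiting for the response


def probe(host, port, payload, connect_timeout=10.0, read_timeout=None):
    """Return (failed_stage, exception), or (None, None) on success.

    read_timeout=None means "wait indefinitely" for the response,
    which mirrors not timing out on the read.
    """
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError as exc:          # includes connect timeouts
        return Stage.CONNECT, exc
    with sock:
        try:
            sock.sendall(payload)
        except OSError as exc:
            return Stage.SEND, exc
        sock.settimeout(read_timeout)
        try:
            sock.recv(4096)
        except OSError as exc:      # includes read timeouts, if enabled
            return Stage.RECEIVE, exc
    return None, None


def outcome_is_ambiguous(stage):
    """True when the client cannot tell whether the server acted.

    Assumption for this sketch: a failed connect means the server never
    saw the request, while a failure during send or receive leaves the
    outcome unknown.
    """
    return stage in (Stage.SEND, Stage.RECEIVE)
```

A caller could treat a `Stage.CONNECT` failure as safe to retry, and treat anything ambiguous as a reason to re-check state before retrying.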

Summarizing all of the above, ConnectionFailures and timeouts and IOExceptions can arise from the following places:

- Your local Java implementation failed to handle KoLmafia's networking calls successfully
- Your local OS failed to handle Java's networking calls successfully
- Your local ISP (or any router on the way) failed to route the network traffic to KoL's ISP
- KoL's ISP failed to deliver the network traffic - connection or data - to KoL's server

All of the above are what KoL means when they say "the issues are on Mafia's side".
Notice that literally none of them are in KoLmafia itself.

- KoL's ISP delivered the traffic to KoL and KoL responded as expected: it accepted the connection or it read the request and sent the response.
--- Note that this can require retransmission, and KoL's server will not give up until end-to-end acknowledgment is successful. I.e., until your computer acknowledges that it has received all the data and KoL's server receives that acknowledgement.

That is what KoL means when they say "not the server itself".

- KoL's ISP (or any router on the way) failed to route the responses to your local ISP.
- Your local ISP fails to deliver the response to your computer.
- Your computer receives the response and acknowledges it, but it fails to deliver it to Java
- Java fails to deliver it to KoLmafia, returning instead a connection failure or an IOException (or a timeout, if we enabled that).

Summarizing all of that, when KoL says "not the server itself", they are saying this:

- Connection requests come in to KoL, and are accepted (perhaps after retransmission, perhaps in both directions)
- Requests come in (perhaps with retransmission), are acknowledged (perhaps with retransmission).
- Responses are sent and delivered (perhaps with retransmission) and are acknowledged (perhaps with retransmission).

If all of the above are true, the issue is not in the server itself, as KoL claims. I assume they have verified this by looking at specific requests that failed in KoLmafia, finding transactions in the server logs, and seeing that the requests came in and responses were sent - and the response did not fail with a network failure. I assume that the networking code on KoL's server is robust and that code operating on the servers is not failing; they are presumably handling hundreds or thousands of transactions per second.

Given that, what can be causing the issues?

- Is it in KoLmafia's code? Nothing has changed in the network handling for years. I am recently interested in looking at the Relay code to see if I can improve response there, since we are adding an extra layer of network calls when we interpose between the browser and KoL, but the examples you (and others) are citing are directly between KoLmafia and KoL - scripts, automation, etc.
- Is it in Java? Seems unlikely, but I have to ask: have you updated Java recently?
- Is it in your OS? Possible. If you are getting IOExceptions, that is Java responding to unexpected behavior from the OS - which is most likely unexpected behavior in the network.
- Is it in the network between your computer and KoL's servers? I think this is the most likely explanation, since the recent issues happen to some people and not to others. For example, Ezandora's briefcase script has worked flawlessly for me always (except when I tried running it in Kingdom of Exploathing before I made KoLmafia visit the council and thereby enable place.php.)

Final final summary:

KoLmafia's viewpoint is that a "server side issue" is anywhere outside KoLmafia. To be more precise, we can include "your Java installation" and "your OS" as part of that. But, assuming we (KoLmafia/Java/OS) generate the correct network requests and transmit them to the network, failures are not our fault and there is nothing we can do about it.

KoL's viewpoint is similar: assuming they (OS/Apache/KoL) receive and transmit correct network requests, failures are not their fault. Aenimus's post notwithstanding, that does NOT mean that they are on "Mafia's side", unless he includes the entire Internet as being "Mafia's side".

If we are fair, KoLmafia is "your computer - KoLmafia/Java/OS" and KoL is "their computer - KoL/Apache/OS".

I have seen no evidence that the problems are in KoLmafia.
Aenimus claims that "The majority, if not all of the server issues people have experienced ... [are] not in the server itself".

Which means that the issues are in the network between KoLmafia and KoL:

- Your ISP
- the rest of the Internet
- KoL's ISP
 

Crowther

Active member
Thanks for your long response. I notice times where failure is common and times where it is rare, which makes me think it is not a client problem.

Here's a recent example:
Code:
> acquire 1 wand of

Verifying ingredients for Wand of Nagamar (1)...
Creating Wand of Nagamar (1)...
Creation failed, no results detected.

> acquire 1 wand of

Verifying ingredients for Wand of Nagamar (1)...
Creating Wand of Nagamar (1)...
You acquire an item: Wand of Nagamar
Successfully created Wand of Nagamar (1)
One thing I noticed is when it fails with item creation or casting a buff those things did not happen on the server. I think I'll try and set up some traffic logging. The fact that some people have trouble and some don't makes me think it is a network problem, but I don't notice dropped connections with other systems, but then I don't use anything else the way I use KoL.
 

Veracity

Developer
Staff member
Crowther said:
One thing I noticed is when it fails with item creation or casting a buff those things did not happen on the server. I think I'll try and set up some traffic logging. The fact that some people have trouble and some don't makes me think it is a network problem, but I don't notice dropped connections with other systems, but then I don't use anything else the way I use KoL.
Yeah. If we post a request and that request fails, the following goes in your DEBUG log:

Time out during data post (URL). This could be bad...

or the following goes in gCLI and status line, along with a red sidebar.

IOException during data post (URL): ERROR.

Do you have timeouts enabled?

get allowSocketTimeout

The default is false. I have it set to false because, as I mentioned in my long note, it can legitimately take a long time to get a response and, as the first message I shared above says, if it times out during the data post, KoL may or may not have actually received the request and acted on it, and you have no way of knowing.
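
To illustrate why a client-side timeout says nothing about whether the request arrived, here is a small self-contained Python sketch. It uses a local `socket.socketpair()` as a stand-in for the client and the server, so no real network or KoL endpoint is involved; the function name and the payload are made up for the example.

```python
import socket


def timeout_despite_delivery(read_timeout=0.1):
    """Show a read timeout on the client although the peer got the data."""
    client, server = socket.socketpair()   # two connected local sockets
    with client, server:
        client.sendall(b"use=hat")          # the "request"
        client.settimeout(read_timeout)
        try:
            client.recv(1024)               # the peer never answers
            timed_out = False
        except socket.timeout:
            timed_out = True
        # The request is sitting in the peer's receive buffer all along.
        received = server.recv(1024)
    return timed_out, received
```

The client sees only a timeout, yet the other end holds the complete request, so the client cannot know from the timeout alone whether the request was acted on.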
 

Ezandora

Member
Oh, I've been studying this; I used to have these issues until I engineered a fix for my end.

What I've noticed:

It seems to happen in batches; when running turns on two mafia instances at once, they both lag at the same time.

Inside of mafia, you can replicate the issue by equipping/unequipping a hat over and over in a script until you have Problems. Or maybe just visiting main.php.

It's possible to replicate outside of mafia using httping. Run it for a long enough time, and you'll start to get timeout errors, something like 0.5% of the time:

Code:
%./httping -v "https://www.kingdomofloathing.com/login.php?loginid="
Connecting to 107.23.63.16...
connected to 107.23.63.16:443 (154 bytes), seq=48 time= 86.00 ms 
Connecting to 107.23.63.16...
connected to 107.23.63.16:443 (154 bytes), seq=49 time= 87.16 ms 
Connecting to 107.23.63.16...
connected to 107.23.63.16:443 (154 bytes), seq=50 time= 89.01 ms 
Connecting to 107.23.63.16...
connect time out

[...]

Connecting to 54.89.140.139...
connected to 54.89.140.139:443 (154 bytes), seq=719 time= 86.57 ms 
Connecting to 54.89.140.139...
connected to 54.89.140.139:443 (154 bytes), seq=720 time=100.80 ms 
Connecting to 54.89.140.139...
connected to 54.89.140.139:443 (154 bytes), seq=721 time= 89.64 ms 
Connecting to 54.89.140.139...
connect time out
You may need to ping "https://www.kingdomofloathing.com/login.php?loginid=" specifically, rather than "https://www.kingdomofloathing.com/".

I edited httping to output which IP it was connecting to, even during a timeout error. This allowed me to discover that two of the four IPs KOL uses have errors, and two don't.
Erroring IPs:
Code:
54.89.140.139
107.23.63.16
Non-erroring IPs: (so far)
Code:
3.225.18.117
18.214.149.52
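
The httping experiment above can be approximated without patching httping. This is a rough Python sketch under stated assumptions: it measures only TCP connect success on port 443 (no TLS handshake or HTTP request), and the function names, the 5-second timeout, and the round count are illustrative choices, not anything taken from httping or KoLmafia.

```python
import socket
from collections import Counter


def resolve_ipv4(host, port=443):
    """Return the sorted set of IPv4 addresses the resolver gives for host."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def try_connect(ip, port=443, timeout=5.0):
    """True if a TCP connection to ip:port opens within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def failure_rates(attempts):
    """Map each IP to its fraction of failed attempts.

    attempts is an iterable of (ip, succeeded) pairs.
    """
    total = Counter()
    failed = Counter()
    for ip, ok in attempts:
        total[ip] += 1
        if not ok:
            failed[ip] += 1
    return {ip: failed[ip] / total[ip] for ip in total}


def survey(host, rounds=200):
    """Connect to every resolved address `rounds` times and tally failures."""
    attempts = []
    for _ in range(rounds):
        for ip in resolve_ipv4(host):
            attempts.append((ip, try_connect(ip)))
    return failure_rates(attempts)
```

Calling `survey("www.kingdomofloathing.com")` would return a per-IP failure fraction. With a failure rate on the order of 0.5%, a few hundred rounds may not be enough to see a clear difference between addresses.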

I tracerouted all of those IPs, and they seem to take two sets of routes. One set for the erroring IPs, and one for the non-erroring IPs.

As such, I suspect the issue could be:
- Those two frontends are bad, and only certain people have issues?
- There's an issue somewhere in the internet that only affects certain routes and certain ISPs, and that causes connections to break in ways TCP doesn't like?
- A bad AWS availability zone, where those two IPs are hosted, and the other two are in another zone...?

I don't have enough network knowledge to think of other ways.

It's possible to mitigate the issues by editing your /etc/hosts file, to only connect to a "stable" server:
Code:
3.225.18.117 www.kingdomofloathing.com
But that isn't a useful fix for everyone. Actually, mafia itself could just blacklist the two bad IPs. Or the game devs could shut those servers off.
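
The idea that a client could skip the two erroring IPs might look roughly like this. It is a sketch only, not KoLmafia code: `choose_addresses` and `open_tls` are hypothetical helpers, and the blocked set is just the two addresses reported above, which may well change over time.

```python
import socket
import ssl

# The two addresses reported as erroring in this thread (an assumption
# that they stay bad; a real client would need a way to update this).
REPORTED_BAD = {"54.89.140.139", "107.23.63.16"}


def choose_addresses(resolved, blocked=REPORTED_BAD):
    """Drop blocked addresses, keeping the resolver's order.

    If every address is blocked, fall back to the full list rather
    than leaving the client with nothing to connect to.
    """
    usable = [ip for ip in resolved if ip not in blocked]
    return usable if usable else list(resolved)


def open_tls(ip, hostname, port=443, timeout=10.0):
    """Connect to a specific IP while validating the cert for hostname."""
    context = ssl.create_default_context()
    raw = socket.create_connection((ip, port), timeout=timeout)
    try:
        return context.wrap_socket(raw, server_hostname=hostname)
    except Exception:
        raw.close()
        raise
```

Passing the hostname as `server_hostname` keeps certificate checks working even though the connection targets a fixed IP, which is the same effect the hosts file entry has for the whole machine.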
 

AlbinoRhino

Active member
Haven't been able to test too extensively, but preliminary indications are that the hosts file entry has resolved this issue for me. My start-of-day script completed w/o errors for the first time in several days. Thanks Ezandora!
 

Crowther

Active member
Veracity said:
Do you have timeouts enabled?

get allowSocketTimeout

The default is false. I have it set to false because, as I mentioned in my long note, it can legitimately take a long time to get a response and, as the first message I shared above says, if it times out during the data post, KoL may or may not have actually received the request and acted on it, and you have no way of knowing.
I had timeouts enabled. I have no clue why.

Thanks Ezandora! Any workaround in a storm.
 

AlbinoRhino

Active member
allowSocketTimeout was apparently true for me as well and I never changed it.

I just completed an entire day's worth of interactions w/o seeing a single request fail.
 

Veracity

Developer
Staff member
I think it originally defaulted to true, but I changed the default many years ago, so old players who had it set to the original default didn’t automatically get the new default.

I probably should just remove it, since I think it is generally a bad idea.
 

Aenimus

Member
My apologies, folks. I misinterpreted a bunch of stuff to mean that the errors were purely due to mafia (something CDMoyer said, a graph he posted, the fact I couldn't replicate the errors myself, and the complaints only ever seemed to happen when using mafia itself).

If it hasn't been already, I will bring it up in /dev, and I will also correct the misinformation in the next next patch notes. My understanding is that it's an issue with 2 of the 4 KoL servers?

Really sorry about that!
 

fronobulax

Developer
Staff member
Aenimus said:
My apologies, folks. I misinterpreted a bunch of stuff to mean that the errors were purely due to mafia (something CDMoyer said, a graph he posted, the fact I couldn't replicate the errors myself, and the complaints only ever seemed to happen when using mafia itself).

If it hasn't been already, I will bring it up in /dev, and I will also correct the misinformation in the next next patch notes. My understanding is that it's an issue with 2 of the 4 KoL servers?

Really sorry about that!

Thank you for the apology.
 

Veracity

Developer
Staff member
Thanks, Aenimus.

Revision 19533 disables allowSocketTimeout. I left the code, as an example of how to do it, but the option is ignored and is always treated as false.
 

Crowther

Active member
While I can't say which fix helped, I no longer have such bad trouble. I've seen a few things that look like lag timeout, but less than once a day per character, which is something I can comfortably live with. Thanks, everyone!
 

Kizehk

New member
Ezandora said:
It's possible to mitigate the issues by editing your /etc/hosts file, to only connect to a "stable" server

Thank you Ezandora! I was timing out every couple of minutes for the last few months. After applying this fix, I've only timed out once in the past week.
 