
Thread: Lag issues and timeouts

  1. #1
    Crowther (Senior Member)

    Lag issues and timeouts

    I'm having lots of trouble with automation these days. For example, today I had to run Ezandora's briefcase script three times to get it to finish. I always believed it was a server-side issue.

    Recently, this was posted: "The majority, if not all of the server issues people have experienced, appear to be on Mafia's side, not the server itself."

    Is this problem unique to me? I don't see many others complaining about it, but I have dozens of failures a day.

  2. #2
    AlbinoRhino (Senior Member)

    Me too. Most are of the form "IOException during data post (URL): Connection timed out: connect."
    The most frequent URL appears to be "api.php?what=status&for=KoLmafia" which I guess mafia must visit fairly often.
    I also occasionally see a message that says something like "Lag issues were encountered" (not sure if this is the exact wording), which doesn't appear in red. It seems mafia assumes those requests completed: it increments counters, toggles properties and such, and further experimentation reveals that the actions didn't actually complete and shouldn't have been counted.

    I've reset everything I could (router etc.) with no effect. Also, none of my other internet activity appears to be affected. I'm not sure what can be done if the problem is actually somewhere between mafia and the servers.

  3. #3
    fronobulax (Developer)

    Originally Posted by Crowther:
    > I'm having lots of trouble with automation these days. For example, today I had to run Ezandora's briefcase script three times to get it to finish. I always believed it was a server side issue.
    >
    > Recently, this was posted: The majority, if not all of the server issues people have experienced, appear to be on Mafia's side, not the server itself.
    >
    > Is this problem unique to me? I don't see many others complaining about it, but I have dozens of failures a day.

    It would be nice if there were some elaboration/evidence to support the assertion that the problems appear to be on Mafia's side.

    This might be a related experience. In that case the reporter used a VPN to connect to the USA and reported the problem seems to have gone away.

    I did notice one timeout today but ... That said, I usually don't see any timeouts so I don't know if this is even a helpful anecdote.

    I feel like a couple of scripts are taking more wall-clock time than usual, but since the characters are in KoE it might simply be that the scripts are trying things that should not work in KoE and are robust enough to continue after failure.

    When I was responsible for both the clients and the servers my experience was that, most of the time, lag and timeouts were a network problem. Neither the server nor the clients were showing "signs of stress" or otherwise lacking resources needed to handle all the connections. So I tend to find reports that a problem is client (or server) side less credible and blame the network. Of course my networking team would often work with me to prove otherwise but often the problem just went away.

    But you sound like you are having more issues than I am ;-)

    Originally Posted by Veracity:
    > Well, thank you.

  4. #4
    Veracity (Developer)

    I saw that and rolled my eyes. I'm very curious to learn how "they" determined that.

    KoLmafia opens a network connection to KoL's server.
    - If it times out (as birdy has been experiencing a lot recently), that is because the response did not arrive in time.
    --- Is it because the server did not respond? The report you cited implies no - and, as I said, I am curious to understand how they determined that. Did they run KoLmafia and see a request time out and then look at the server logs, sort through tens of thousands of requests, and see the specific request and the specific response they sent?
    --- Is it because there is an issue with KoL's ISP and sometimes requests end up getting dropped without returning a more informative connection error?
    --- Is there a sporadic routing issue anywhere between the user's ISP and KoL's ISP with the same effect?
    --- Even if the request is received and a response is generated, similar issues could result in the response being dropped and the connection timing out.
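    The first step above, opening the connection, can be tested on its own without posting anything. The sketch below is Python rather than the Java KoLmafia is written in, it is not KoLmafia code, and the function name and outcome labels are made up for illustration; it attempts only the TCP connect and reports how that attempt ended:

```python
import socket
import time


def probe_connect(host: str, port: int, timeout: float = 5.0) -> tuple:
    """Attempt only the TCP connect step and report how it ended.

    Returns (outcome, elapsed_seconds). Outcome labels are illustrative:
      "connected"       - the handshake completed; nothing was sent
      "connect-timeout" - no answer within `timeout` (e.g. packets dropped en route)
      "refused"         - the far end actively rejected the connection
      "error"           - any other OS-level failure (unreachable network, DNS, ...)
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and tries each address in turn;
        # the socket is closed immediately because only the connect matters here.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except socket.timeout:  # must come before OSError, of which it is a subclass
        outcome = "connect-timeout"
    except ConnectionRefusedError:
        outcome = "refused"
    except OSError:
        outcome = "error"
    return outcome, time.monotonic() - start
```

    Run in a loop against www.kingdomofloathing.com on port 443, this would separate failures that happen before any request is sent (which is what the trailing ": connect" in the "Connection timed out: connect" message quoted in #2 appears to indicate) from failures later in the exchange.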

    KoLmafia succeeds in opening the connection. It posts a request.
    - It can get an IOException. This is on a successfully opened connection, so it is returned by Java's HTTP/TCP code
    --- (presumably because the OS failed to accept/transmit the request.)
    --- This is a problem in Java or the OS.

    KoLmafia succeeded in sending the request. It waits for the response.
    - I believe we no longer time out on this. It can take a LONG time to get the response.
    --- Perhaps the server is slow
    --- Perhaps the network is laggy
    ----- Perhaps your computer has to retransmit multiple times before the server receives the request
    ----- Perhaps the server has to retransmit multiple times before your computer receives the response.
    - Sometimes Java receives an IOException. Since we don't time out, there is no indication of what caused this.
    --- Something in Java or in your OS decides that the response will not be forthcoming, so it aborts.
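    The waiting step can also be illustrated with a plain socket. A minimal Python sketch, assuming raw TCP instead of the HTTPS KoLmafia really uses and a single recv() standing in for reading a full HTTP response (the function name and labels are hypothetical): with no read timeout it simply blocks, and with one, a timeout after the request was handed to the OS can only be reported as "unknown", because the server may already have acted on it.

```python
import socket
from typing import Optional, Tuple


def post_and_wait(host: str, port: int, payload: bytes,
                  read_timeout: Optional[float] = None,
                  connect_timeout: float = 5.0) -> Tuple[str, bytes]:
    """Send one request on a fresh TCP connection, then wait for reply bytes.

    read_timeout=None waits indefinitely (comparable to running with socket
    timeouts disabled); a number gives up after that many idle seconds.
    Connect failures are deliberately not caught: they belong to the earlier step.
    """
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        sock.settimeout(read_timeout)  # None puts the socket in blocking mode
        try:
            sock.sendall(payload)  # returns once the OS has the bytes, not the server
            data = sock.recv(4096)
        except socket.timeout:
            # We stopped waiting; the server may or may not have acted on the request.
            return "unknown", b""
        except OSError:
            # e.g. a reset; the exception itself says little about where it came from
            return "io-error", b""
    if not data:
        return "closed-without-reply", b""
    return "replied", data
```

    The "io-error" branch corresponds to the last case above: the OS reports a failure such as a reset, and nothing in the exception says whether the cause was local, in the network, or at the server.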

    Summarizing all of the above, ConnectionFailures and timeouts and IOExceptions can arise from the following places:

    - Your local Java implementation failed to handle KoLmafia's networking calls successfully
    - Your local OS failed to handle Java's networking calls successfully
    - Your local ISP (or any router on the way) failed to route the network traffic to KoL's ISP
    - KoL's ISP failed to deliver the network traffic - connection or data - to KoL's server

    All of the above are what KoL means when they say "the issues are on Mafia's side".
    Notice that literally none of them are in KoLmafia itself.

    - KoL's ISP delivered the traffic to KoL and KoL responded as expected: it accepted the connection or it read the request and sent the response.
    --- Note that this can require retransmission, and KoL's server will not give up until end-to-end acknowledgment is successful. I.e., until your computer acknowledges that it has received all the data and KoL's server receives that acknowledgement.

    That is what KoL means when they say "not the server itself".

    - KoL's ISP (or any router on the way) failed to route the responses to your local ISP.
    - Your local ISP failed to deliver the response to your computer.
    - Your computer received the response and acknowledged it, but failed to deliver it to Java.
    - Java failed to deliver it to KoLmafia, returning instead a connection failure or an IOException (or a timeout, if we enabled that).

    Summarizing all of that, when KoL says "not the server itself", they are saying this:

    - Connection requests come in to KoL and are accepted (perhaps after retransmission, perhaps in both directions).
    - Requests come in (perhaps with retransmission), are acknowledged (perhaps with retransmission).
    - Responses are sent and delivered (perhaps with retransmission) and are acknowledged (perhaps with retransmission).

    If all of the above are true, the issue is not in the server itself, as KoL claims. I assume they have verified this by looking at specific requests that failed in KoLmafia, finding those transactions in the server logs, and seeing that the requests came in, the responses were sent, and sending the responses did not fail with a network error. I assume that the networking code on KoL's server is robust and that code operating on the servers is not failing; they are presumably handling hundreds or thousands of transactions per second.

    Given that, what can be causing the issues?

    - Is it in KoLmafia's code? Nothing has changed in the network handling for years. I have recently become interested in looking at the Relay code to see if I can improve response times there, since we add an extra layer of network calls when we interpose between the browser and KoL, but the examples you (and others) are citing are directly between KoLmafia and KoL: scripts, automation, etc.
    - Is it in Java? Seems unlikely, but I have to ask: have you updated Java recently?
    - Is it in your OS? Possible. If you are getting IOExceptions, that is Java responding to unexpected behavior from the OS - which is most likely unexpected behavior in the network.
    - Is it in the network between your computer and KoL's servers? I think this is the most likely explanation, since the recent issues happen to some people and not to others. For example, Ezandora's briefcase script has worked flawlessly for me always (except when I tried running it in Kingdom of Exploathing before I made KoLmafia visit the council and thereby enable place.php.)

    Final final summary:

    KoLmafia's viewpoint is that a "server side issue" is anywhere outside KoLmafia. To be more precise, we can include "your Java installation" and "your OS" as part of that. But, assuming we (KoLmafia/Java/OS) generate the correct network requests and transmit them to the network, failures are not our fault and there is nothing we can do about it.

    KoL's viewpoint is similar: assuming they (OS/Apache/KoL) receive and transmit correct network requests, failures are not their fault. Aenimus's post notwithstanding, that does NOT mean that they are on "Mafia's side", unless he includes the entire Internet as being "Mafia's side".

    If we are fair, KoLmafia is "your computer - KoLmafia/Java/OS" and KoL is "their computer - KoL/Apache/OS".

    I have seen no evidence that the problems are in KoLmafia.
    Aenimus claims that "The majority, if not all of the server issues people have experienced ... [are] not in the server itself".

    Which means that the issues are in the network between KoLmafia and KoL:

    - Your ISP
    - the rest of the Internet
    - KoL's ISP

  5. #5
    Crowther (Senior Member)

    Thanks for your long response. I notice times where failure is common and times where it is rare, which makes me think it is not a client problem.

    Here's a recent example:
    Code:
    > acquire 1 wand of
    
    Verifying ingredients for Wand of Nagamar (1)...
    Creating Wand of Nagamar (1)...
    Creation failed, no results detected.
    
    > acquire 1 wand of
    
    Verifying ingredients for Wand of Nagamar (1)...
    Creating Wand of Nagamar (1)...
    You acquire an item: Wand of Nagamar
    Successfully created Wand of Nagamar (1)

    One thing I noticed: when item creation or casting a buff fails, the action did not happen on the server either. I think I'll try to set up some traffic logging. The fact that some people have trouble and some don't makes me think it is a network problem. On the other hand, I don't notice dropped connections with other systems, but then I don't use anything else the way I use KoL.
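    When a request fails like this, a script cannot tell from the error alone whether the action happened. One defensive pattern is to re-read the game state before retrying, so a request that did reach the server is not repeated. A minimal Python sketch of that idea; it is not KoLmafia's implementation of acquire, and both callbacks (count_on_hand, attempt_create) are hypothetical stand-ins for "check inventory" and "send the create request":

```python
from typing import Callable


def acquire_with_verification(
    want: int,
    count_on_hand: Callable[[], int],
    attempt_create: Callable[[int], None],
    max_tries: int = 3,
) -> bool:
    """Create items until `want` are on hand, trusting re-read state over errors.

    count_on_hand:  hypothetical hook that re-reads the real count from the server
    attempt_create: hypothetical hook that sends one create request and may
                    raise OSError (timeout, reset, ...) whether or not it worked
    """
    for _ in range(max_tries):
        have = count_on_hand()
        if have >= want:
            return True  # a previous "failed" attempt may actually have succeeded
        try:
            attempt_create(want - have)
        except OSError:
            pass  # outcome unknown: assume nothing, re-check on the next pass
    return count_on_hand() >= want
```

    The same check-before-retry idea applies to properties or counters a script updates after a request: updating them only after the state has been re-read avoids recording actions that never happened, as described in #2.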

  6. #6
    Veracity (Developer)

    Originally Posted by Crowther:
    > One thing I noticed is when it fails with item creation or casting a buff those things did not happen on the server. I think I'll try and set up some traffic logging. The fact that some people have trouble and some don't makes me think it is a network problem, but I don't notice dropped connections with other systems, but then I don't use anything else the way I use KoL.

    Yeah. If we post a request and that request fails, the following goes in your DEBUG log:

    Time out during data post (URL). This could be bad...

    or the following goes in gCLI and status line, along with a red sidebar.

    IOException during data post (URL): ERROR.

    Do you have timeouts enabled?

    get allowSocketTimeout

    The default is false. I keep it false because, as I mentioned in my long note, it can legitimately take a long time to get a response. Also, as the first message I shared above says, if it times out during the data post, KoL may or may not have actually received the request and acted on it, and you have no way of knowing.
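    For the traffic logging Crowther mentions, a simple first step is to count which URLs fail and how. A small Python sketch that tallies the two messages above from a saved log; it assumes each log line contains the URL inside parentheses exactly as quoted here, and the function and pattern names are illustrative:

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shapes, copied from the messages quoted in this thread, with
# the actual URL where "(URL)" appears.
TIMEOUT_RE = re.compile(r"Time out during data post \((?P<url>[^)]*)\)")
IOERROR_RE = re.compile(r"IOException during data post \((?P<url>[^)]*)\): (?P<error>.*)")


def tally_failures(lines: Iterable[str]) -> Counter:
    """Count failed posts per (kind, url) across an iterable of log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[("timeout", match.group("url"))] += 1
            continue
        match = IOERROR_RE.search(line)
        if match:
            counts[("ioexception", match.group("url"))] += 1
    return counts
```

    Over a few days of logs, this shows whether failures cluster on one request (such as the api.php status check mentioned in #2) or are spread across every URL.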

  7. #7
    Ezandora (Senior Member)

    Oh, I've been studying this; I used to have these issues until I engineered a fix on my end.

    What I've noticed:

    It seems to happen in batches; when running turns on two mafia instances at once, they both lag at the same time.

    Inside of mafia, you can replicate the issue by equipping/unequipping a hat over and over in a script until you have Problems. Or maybe just visiting main.php.

    It's possible to replicate outside of mafia using httping. Run it for a long enough time, and you'll start to get timeout errors, something like 0.5% of the time:

    Code:
    %./httping -v "https://www.kingdomofloathing.com/login.php?loginid="
    Connecting to 107.23.63.16...
    connected to 107.23.63.16:443 (154 bytes), seq=48 time= 86.00 ms 
    Connecting to 107.23.63.16...
    connected to 107.23.63.16:443 (154 bytes), seq=49 time= 87.16 ms 
    Connecting to 107.23.63.16...
    connected to 107.23.63.16:443 (154 bytes), seq=50 time= 89.01 ms 
    Connecting to 107.23.63.16...
    connect time out
    
    [...]
    
    Connecting to 54.89.140.139...
    connected to 54.89.140.139:443 (154 bytes), seq=719 time= 86.57 ms 
    Connecting to 54.89.140.139...
    connected to 54.89.140.139:443 (154 bytes), seq=720 time=100.80 ms 
    Connecting to 54.89.140.139...
    connected to 54.89.140.139:443 (154 bytes), seq=721 time= 89.64 ms 
    Connecting to 54.89.140.139...
    connect time out

    You may need to ping "https://www.kingdomofloathing.com/login.php?loginid=" specifically, rather than "https://www.kingdomofloathing.com/".

    I edited httping to output which IP it was connecting to, even during a timeout error. This allowed me to discover that two of the four IPs KoL uses have errors, and two don't.
    Erroring IPs:
    Code:
    54.89.140.139
    107.23.63.16
    Non-erroring IPs (so far):
    Code:
    3.225.18.117
    18.214.149.52

    I tracerouted all of those IPs, and they seem to take two sets of routes: one for the erroring IPs, and one for the non-erroring IPs.
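    Ezandora's modification, reporting which address each attempt used, can be approximated without patching httping: resolve every IPv4 address for the host and connect to each one explicitly, tallying the outcomes per address. A Python sketch under those assumptions; it checks only the TCP connect (httping also sends an HTTP request), and the function names are hypothetical:

```python
import socket
from collections import defaultdict
from typing import Dict, List


def resolve_ipv4(host: str, port: int) -> List[str]:
    """Every distinct IPv4 address the resolver currently returns for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def probe_per_ip(host: str, port: int, rounds: int,
                 timeout: float = 5.0) -> Dict[str, Dict[str, int]]:
    """Connect to each resolved address `rounds` times and tally outcomes per IP.

    Each attempt is pinned to one address, so every failure can be attributed
    to a specific frontend, which a plain request by hostname does not show.
    """
    tally: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"ok": 0, "timeout": 0, "error": 0})
    for _ in range(rounds):
        for ip in resolve_ipv4(host, port):
            try:
                with socket.create_connection((ip, port), timeout=timeout):
                    tally[ip]["ok"] += 1
            except socket.timeout:
                tally[ip]["timeout"] += 1
            except OSError:
                tally[ip]["error"] += 1
    return dict(tally)
```

    For example, probe_per_ip("www.kingdomofloathing.com", 443, rounds=500) would give per-address counts to compare with the lists above. At a failure rate around 0.5%, a few hundred rounds per address yield only a couple of timeouts, so longer runs give clearer numbers.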

    As such, I suspect the issue could be one of these:
    - Those two frontends are bad, and only certain people have issues?
    - There's an issue somewhere on the internet, affecting only certain routes and certain ISPs, that causes connections to break in ways TCP doesn't like?
    - A bad AWS availability zone hosts those two IPs, and the other two are in another zone?

    I don't have enough network knowledge to think of other ways.

    It's possible to mitigate the issues by editing your /etc/hosts file so that you only connect to a "stable" server:
    Code:
    3.225.18.117 www.kingdomofloathing.com

    But that isn't a useful fix for everyone. Actually, mafia itself could just blacklist the two bad IPs. Or the game devs could shut those servers off.
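    Blocklisting the two bad IPs inside a client amounts to filtering the resolver's answer before connecting. A Python sketch of that idea, not something KoLmafia is known to do; the function names and the blocklist are hypothetical. Unlike a hosts-file entry, it keeps the normal DNS lookup, so any new frontends would still be used:

```python
import socket
from typing import List, Set


def usable_addresses(addresses: List[str], blocklist: Set[str]) -> List[str]:
    """Drop blocklisted addresses, keeping the resolver's order.

    If every address is blocklisted, fall back to the full list so that a
    stale blocklist degrades to normal behavior instead of an outage.
    """
    kept = [a for a in addresses if a not in blocklist]
    return kept or list(addresses)


def connect_avoiding(host: str, port: int, blocklist: Set[str],
                     timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection to the first usable address that accepts it."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = list(dict.fromkeys(info[4][0] for info in infos))  # dedupe, keep order
    last_error = None
    for ip in usable_addresses(addresses, blocklist):
        try:
            return socket.create_connection((ip, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
    raise last_error or OSError(f"no usable address for {host}")
```

    This covers only the TCP connection. For HTTPS, the original hostname would still have to be supplied for TLS, for example as server_hostname to ssl.SSLContext.wrap_socket, and in the HTTP Host header.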

  8. #8


    Originally Posted by Veracity:
    > I saw that and rolled my eyes. I'm very curious to learn how "they" determined that.

    That was all Aenimus.

  9. #9
    AlbinoRhino (Senior Member)

    Haven't been able to test too extensively, but preliminary indications are that the hosts file entry has resolved this issue for me. My start-of-day script completed without errors for the first time in several days. Thanks Ezandora!

  10. #10
    Crowther (Senior Member)

    Originally Posted by Veracity:
    > Do you have timeouts enables?
    >
    > get allowSocketTimeout
    >
    > default is false. I have it false, since, as I mentioned in my long note, it can legitimately take a long time to get a response and, as the first message I shared above says, if it times out during the data post, KoL may or may not have actually received it and acted on it, and you have no way of knowing.

    I had timeouts enabled. I have no clue why.

    Thanks Ezandora! Any workaround in a storm.
