Feature - Implemented Cache copy of faxbot.xml

Fluxxdog

Active member
Code:
Configuring available monsters.
Could not load faxbot configuration from http://www.hogsofdestiny.com/faxbot/faxbot.xml
Could not load faxbot configuration from http://67.23.43.49/faxbot/faxbot.xml
Could not load Faxbot configuration
Visiting Fax Machine in clan VIP lounge
Faxbot was online and working. I would have thought KoLmafia would use a cached copy, but apparently not. Sooo... FReq plz?
 

fronobulax

Developer
Staff member
What's the FR here? Try multiple sites in order until a copy of faxbot.xml is found? Cache the last fetched copy locally and use it if a site does not respond? A mixture of both?

I know a lot of folks get justifiably angsty about embedding a URL within KoLmafia, so perhaps the best solution would be to bundle a copy in KoLmafia (much like we do with prices), update it locally when http://www.hogsofdestiny.com/faxbot/faxbot.xml responds, and use the local copy when it doesn't?
 

Veracity

Developer
Staff member
What's the FR here? Try multiple sites in order until a copy of faxbot.xml is found? Cache the last fetched copy locally and use it if a site does not respond? A mixture of both?
We already do the first one. The list of sites we check is hardcoded into FaxBotDatabase.java:

Code:
	final static String LOCATION = "http://www.hogsofdestiny.com/faxbot/faxbot.xml";
	final static String LOCATION2 = "http://67.23.43.49/faxbot/faxbot.xml";
Those are examined in turn by FaxBotDatabase.configureFaxBot(), which invokes FaxBotDatabase.configureFaxBot( String URL ). Refactoring to have an array rather than two static constants would be reasonable.
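For instance, the refactor might look something like this (just a sketch; the array name is made up):

Code:
	final static String[] LOCATIONS =
	{
		"http://www.hogsofdestiny.com/faxbot/faxbot.xml",
		"http://67.23.43.49/faxbot/faxbot.xml",
	};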

I believe the second one is the FR.

Not a bad idea: refactor the config loading a little to iterate over a sequence of URLs, culminating in a locally cached disk file, and give up only if all of those fail. Upon successfully loading a .xml file, save a local copy. Doesn't seem too hard.
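A rough sketch of that flow, assuming the LOCATIONS array above, a boolean-returning configureFaxBot( String ), and made-up helpers for the local file:

Code:
	public static void configureFaxBot()
	{
		for ( String location : LOCATIONS )
		{
			if ( configureFaxBot( location ) )
			{
				// Success: refresh the local backup copy
				saveLocalCopy( "data/faxbot.xml" );
				return;
			}
		}
		// Every remote site failed; fall back to the cached copy, if any
		if ( !configureFaxBotFromFile( "data/faxbot.xml" ) )
		{
			KoLmafia.updateDisplay( "Could not load Faxbot configuration" );
		}
	}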

I do not want to bundle a copy of the actual file inside KoLmafia. I am content to require you to have successfully fetched the file at least once, and use your own downloaded copy as a backup.
 

Catch-22

Active member
My post didn't really have anything to do with the FR; at the time of posting, both locations were unavailable, so I was just posting an alternative. In any case, "http://67.23.43.49/faxbot/faxbot.xml" appears to be permanently unavailable, as they have moved to a new host (which uses a vhost, so the direct IP no longer works).

I'd suggest caching the file the first time it loads, then checking the "Last-Modified" header on the remote server. If the remote file is newer, download it again. If it's the same age, or the response code is something other than 200, return the locally cached copy. If there's no cached copy and the server is down, go with the current behaviour (or perhaps return the HTTP error to the user).
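A conditional GET folds the header check and the re-download into a single request; a minimal sketch, with the cache-file handling assumed:

Code:
	import java.io.File;
	import java.io.IOException;
	import java.io.InputStream;
	import java.net.HttpURLConnection;
	import java.net.URL;

	// Returns a stream of fresh content, or null if the cached copy should be used
	static InputStream fetchIfModified( String location, File cached )
		throws IOException
	{
		HttpURLConnection conn =
			(HttpURLConnection) new URL( location ).openConnection();
		// Sends If-Modified-Since; the server replies 304 if its file is no newer
		conn.setIfModifiedSince( cached.lastModified() );
		int code = conn.getResponseCode();
		if ( code != HttpURLConnection.HTTP_OK )
		{
			// 304 Not Modified, server error, etc.: use the cached copy
			return null;
		}
		return conn.getInputStream();	// fresh content: read it and recache
	}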
 

Veracity

Developer
Staff member
I added the second address because weas told me that it was a backup server. Not a second name for the same server. If he tells me that the situation has changed, we will adapt. Until then, what I hear you say is just hearsay.

Edit: However, the following is NOT hearsay.

Hey there V,

Unfortunately the IP address for the backup faxbot.xml host is not static, as I originally thought it was. You should probably just remove it from the lookup list as that IP is stale.

Caching the last list sounds like a great idea, since additions to the faxbot.xml are relatively rare, and hogsofdestiny.com is *usually* up.

Let me know if you need anything else.

-weas
 

Catch-22

Active member
I added the second address because weas told me that it was a backup server. Not a second name for the same server. If he tells me that the situation has changed, we will adapt. Until then, what I hear you say is just hearsay.

Well, given that 67.23.43.49 is part of a static pool of dedicated servers at Rackspace and 184.173.233.96 is part of a shared host of over 100 websites at Hostgator, I'd say it was more of an uninformed yet educated guess than hearsay, but point taken :) Good to have a clear answer.
 

Catch-22

Active member
Is anyone working on this already?

How do we go about tackling this one? I had a look and doing caching in Java the "right way" appears to be a fairly complicated process.

There are open-source implementations already out there, like Ehcache, which could be included in the project. Would playing around with this idea be a waste of time?

I think it'd be great if faxbot.xml and all the buffbot xml files could be cached locally, but if we were to extend KoLmafia with a proper caching engine like Ehcache it also opens up the possibility of using it to cache other things, like javascript files, images, externally hosted map files, etc.
 

Bale

Minion
How do we go about tackling this one? I had a look and doing caching in Java the "right way" appears to be a fairly complicated process.

Maybe you should look at mallprices.txt for how KoLmafia has previously implemented a similar idea.

I think it'd be great if faxbot.xml and all the buffbot xml files could be cached locally, but if we were to extend KoLmafia with a proper caching engine like Ehcache it also opens up the possibility of using it to cache other things, like javascript files, images, externally hosted map files, etc.

Mafia already does some of that in the /images directory if you enable it in preferences.
 

Catch-22

Active member
Maybe you should look at mallprices.txt for how KoLmafia has previously implemented a similar idea.

Code:
private static void doLogin( String name )
{
	...
	if ( Preferences.getBoolean( "sharePriceData" ) )
	{
		KoLmafiaCLI.DEFAULT_SHELL.executeLine( "update prices http://kolmafia.us/scripts/updateprices.php?action=getmap" );
	}
	...
}

Not really seeing it?

Mafia already does some of that in the /images directory if you enable it in preferences.

I have been using that for 5 years :) It's not a cache "done right", though. It will return any image you have in there, regardless of what the server has, i.e. the cache content never expires. If we were to do that for buffbot/faxbot XML files the data you have could be way out of date and you'd never know. For the purposes of KoL images, I think the current implementation has been working pretty well, but if we're going to implement a proper cache it could be changed. What I had in mind at first was really just externally hosted images (such as what might be used in a relay override), though.
 

fronobulax

Developer
Staff member
I wonder if perhaps you are over-engineering the idea of a cache? What I would have done (and what I think would have been accepted) is: rename the old file if it exists; fetch the current file; if successful, delete the renamed old file; if not, rename the old file back to its original name.
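In code, that dance might look roughly like this (downloadTo is a made-up fetch helper):

Code:
	import java.io.File;

	static boolean fetchWithBackup( String location, File local )
	{
		File backup = new File( local.getPath() + ".bak" );
		if ( local.exists() )
		{
			local.renameTo( backup );	// set the old copy aside
		}
		if ( downloadTo( location, local ) )	// hypothetical fetch helper
		{
			backup.delete();	// fetch worked; the old copy is obsolete
			return true;
		}
		backup.renameTo( local );	// fetch failed; restore the old copy
		return local.exists();
	}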
 

Catch-22

Active member
I'm not over-engineering anything yet :)

Is anyone working on this already?
If the answer to this is yes, I won't bother.

Would playing around with this idea be a waste of time?
If the answer to this is yes, I won't bother.

What you're essentially suggesting is a cache that is assumed to expire as soon as it is created, which is fine for when the server goes down and satisfies the FReq. I was merely suggesting a smarter solution which would save some bandwidth in the process.
 

Veracity

Developer
Staff member
If we were to do that for buffbot/faxbot XML files the data you have could be way out of date and you'd never know.
Uh huh. And, considering that the only time we would look at the cached file is because the server was down and we couldn't fetch and recache the "not way out of date" file, there is no way you COULD know.

Why would it be a problem to use the "way out of date" file when you had no way of getting a newer file? For faxbot, at least; buffbots are likely to change prices and such - which is to say change existing entries - rather than simply adding new offerings, like faxbot.

It never occurred to me to cache buffbot configs.
 

Catch-22

Active member
Uh huh. And, considering that the only time we would look at the cached file is because the server was down and we couldn't fetch and recache the "not way out of date" file, there is no way you COULD know.

Why would it be a problem to use the "way out of date" file when you had no way of getting a newer file? For faxbot, at least; buffbots are likely to change prices and such - which is to say change existing entries - rather than simply adding new offerings, like faxbot.

It never occurred to me to cache buffbot configs.

No, in my suggestion we'd always use the cached file, unless the remote file had changed. The benefits of this solution are less bandwidth (both upstream bandwidth on the remote server and downstream bandwidth on the client) and a more responsive client experience (it doesn't have to download the entire file if the header suggests the content hasn't changed). RFC 2616 pretty much covers what I am talking about. All modern web browsers do this already (unless you turn off the cache for some reason).
 

Veracity

Developer
Staff member
No, in my suggestion we'd always use the cached file, unless the remote file had changed.
Yes, yes. I understood that.

I ask again: What does "you could use a way out of date file without knowing" have to do with anything? If you ping the server and don't get a response, you use the cached file, just as if the remote file had not changed - since you have no way of knowing if the remote file had changed.

What else could you do?

Unless you thought, for some reason, that we'd not ping the remote file if there was a cached file - which is not something that _I_ have ever suggested in this thread. In fact, if you look at MY words, you will see that I said:

Not a bad idea: refactor the config loading a little to iterate over a sequence of URLs, culminating in a locally cached disk file, and give up only if all of those fail. Upon successfully loading a .xml file, save a local copy. Doesn't seem too hard.
You want to ping the modification date, rather than loading the file. A nice optimization - for a file that we will load exactly once per session, at most, unlike images, which can be loaded hundreds of times per session. And, therefore, it might just be over-engineering things. :)
 

Catch-22

Active member
I ask again: What does "you could use a way out of date file without knowing" have to do with anything?
Ah, sorry, that might've been misleading; it was in response to Bale, regarding the mechanism currently being used for images (which would not be appropriate for use in this case).

A nice optimization - for a file that we will load exactly once per session, at most
Keep in mind that the files we're talking about live on servers hosted/paid for by the community; I'm sure any reduction in bandwidth usage for these guys would be a welcome change :)

unlike images, which can be loaded hundreds of times per session.
If you're using a browser that caches, it will actually cache the images anyway.

Comparison:
Code:
curl --head http://www.hogsofdestiny.com/faxbot/faxbot.xml
HTTP/1.1 200 OK
Date: Wed, 15 Aug 2012 00:58:26 GMT
Server: Apache
Last-Modified: Tue, 07 Aug 2012 04:39:32 GMT
Accept-Ranges: bytes
Content-Length: 26107
Content-Type: application/xml
In this case, getting the head is 190 bytes (+ TCP overhead). If we were to include the body content, it would be 26,297 bytes (+ TCP overhead). Assume the XML is being loaded a hundred times a day (I think this is a conservative figure); that's about 2.6 MB per day. The content hasn't changed in the past week, so that's roughly 18 MB of bandwidth that could've been saved. Sure, it's not huge, but it's something.

Edit: Applying the above method to the Testudinata buffbot XML files gives you an even more drastic figure.

Code:
curl --head http://www.rawbw.com/~ssjlee/kol/testudinata.xml
HTTP/1.1 200 OK
Date: Wed, 15 Aug 2012 01:08:33 GMT
Server: Apache/2.2
Last-Modified: Fri, 09 Jul 2010 09:48:22 GMT
ETag: "c19-889f-48af14db06580"
Accept-Ranges: bytes
Content-Length: 34975
Content-Type: application/xml

35 KB x 100 per day x 760-ish days ≈ 2.5 GB in bandwidth.
 

heeheehee

Developer
Staff member
Rackspace cost of bandwidth is 18¢ per GB. At 26.5KB per request, it'll take roughly 2 million requests for that to cost weas (or whoever's paying for HoD's hosting) a Mr. A. Basically, this means that a call to faxbot's XML file costs him 5 meat.
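Showing the working (assuming a Mr. A costs $10 and trades for roughly 10 million meat):

Code:
	$10 / $0.18 per GB                     ≈ 55 GB
	55 GB / 26.5 KB per request            ≈ 2.1 million requests
	10,000,000 meat / 2.1 million requests ≈ 5 meat per request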

Not sure what to make of this.

(Fun fact: if said resources were gzipped, the bandwidth used would be reduced by 85% or so.)
 

Catch-22

Active member
Rackspace cost of bandwidth is 18¢ per GB. At 26.5KB per request, it'll take roughly 2 million requests for that to cost weas (or whoever's paying for HoD's hosting) a Mr. A. Basically, this means that a call to faxbot's XML file costs him 5 meat.

I like how you broke it down to meat cost :) The primary host for the faxbot is Hostgator, not Rackspace. It's likely to be an unlimited bandwidth plan (Hearsay? All the hostgator plans are unlimited). The same may not be the case for the rest of the external resources that KoLmafia loads though.

Not sure what to make of this.

I don't really know what to make of it either. Are you suggesting that cost savings due to a reduction in bandwidth usage are not worth considering? Isn't that like saying a light bulb only costs 8¢ to run for 24 hours, so we may as well leave it on all the time?

(Fun fact: if said resources were gzipped, the bandwidth used would be reduced by 85% or so.)

It would reduce the bandwidth; however, KoLmafia requests do not send an "Accept-Encoding" header (KoLmafia is currently only capable of processing a plain-text response). Even if the servers were to gzip the response, KoLmafia couldn't make use of it anyway.
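For what it's worth, accepting gzip would only take something like this (a sketch, not a proposed patch):

Code:
	import java.io.IOException;
	import java.io.InputStream;
	import java.net.HttpURLConnection;
	import java.net.URL;
	import java.util.zip.GZIPInputStream;

	static InputStream openPossiblyGzipped( String location )
		throws IOException
	{
		HttpURLConnection conn =
			(HttpURLConnection) new URL( location ).openConnection();
		// Advertise that we can handle a compressed response
		conn.setRequestProperty( "Accept-Encoding", "gzip" );
		InputStream in = conn.getInputStream();
		// A server that honors the header says so in Content-Encoding
		if ( "gzip".equals( conn.getContentEncoding() ) )
		{
			in = new GZIPInputStream( in );
		}
		return in;
	}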
 

heeheehee

Developer
Staff member
I like how you broke it down to meat cost :) The primary host for the faxbot is Hostgator, not Rackspace. It's likely to be an unlimited bandwidth plan (Hearsay? All the hostgator plans are unlimited). The same may not be the case for the rest of the external resources that KoLmafia loads though.
Pinging HoD does indeed confirm hostgator; I was just going off of earlier information in the thread about the "static" IP address belonging to Rackspace.

I don't really know what to make of it either. Are you suggesting that cost savings due to a reduction in bandwidth usage are not worth considering? Isn't that like saying a light bulb only costs 8¢ to run for 24 hours, so we may as well leave it on all the time?
Eh. I was considering making a silly suggestion that every use of faxbot be supplemented with a 5 meat donation to weas, but decided against it.

It would reduce the bandwidth; however, KoLmafia requests do not send an "Accept-Encoding" header (KoLmafia is currently only capable of processing a plain-text response). Even if the servers were to gzip the response, KoLmafia couldn't make use of it anyway.
This is a bit of a pity, but it's not really necessary. I'd definitely file a feature request for this if KoL itself offered gzip compression, though (it doesn't, as far as I can tell).
 

Catch-22

Active member
Hopefully this patch will strike a balance between simplicity of implementation and optimization of bandwidth consumption.

Perhaps this patch can be reviewed, with the possibility of a thumbs-up for applying the same treatment to the buffbot XML files as well.

Obviously there would be some other considerations to make before that would happen, though.

Coming from a place where the internet is not as fast as I would like it to be (is it ever?), I can say that running the faxbot command for the first time in a new session is a lot more responsive now that the file is cached :)
 

Attachments

  • CacheFaxbot.patch
    5.3 KB · Views: 32