Bug - Fixed file_to_map needs https handling?

xKiv

Active member
In DataFileCache, we have:
Code:
	public static BufferedReader getReader( final String filename )
	{
		if ( filename.startsWith( "http://" ) )
		{
			return DataUtilities.getReader( "", filename );
		}
		byte[] data = DataFileCache.getBytes( filename );

		return DataUtilities.getReader( new ByteArrayInputStream( data ) );
	}

That looks like mafia knows that http:// files are not local files, but not that https:// files are not local files.

I added || filename.startsWith( "https://" ) here and in DataUtilities.getReader, and that seems to have done the trick.
 

fronobulax

Developer
Staff member
In DataFileCache, we have:
Code:
	public static BufferedReader getReader( final String filename )
	{
		if ( filename.startsWith( "http://" ) )
		{
			return DataUtilities.getReader( "", filename );
		}
		byte[] data = DataFileCache.getBytes( filename );

		return DataUtilities.getReader( new ByteArrayInputStream( data ) );
	}

That looks like mafia knows that http:// files are not local files, but not that https:// files are not local files.

I added || filename.startsWith( "https://" ) here and in DataUtilities.getReader, and that seems to have done the trick.

r18242. Only lightly tested.
 

Magus_Prime

Well-known member
r18243 - This may be an unintended side-effect of this change. Got the following in the gCLI when I logged in this morning:
Code:
(file not found)
0 prices updated from http://kolmafia.us/scripts/updateprices.php?action=getmap
 

fronobulax

Developer
Staff member
r18243 - This may be an unintended side-effect of this change. Got the following in the gCLI when I logged in this morning:
Code:
(file not found)
0 prices updated from http://kolmafia.us/scripts/updateprices.php?action=getmap

I saw it too so I will check.
 

fronobulax

Developer
Staff member
I did a partial revert. Prices are being updated again. If anyone has a non-versioned (in the mafia sense of versions and data overrides) file accessed via http: and https: I will dig deeper.

I thought the problem might be that /lib/net/java/dev/spellcast/utilities/DataUtilities.java was trying to create an Http connection with an Https URL but my quick fix broke the price update. My "fix" might not have been correct, however. I don't think the Java update is a factor. I wonder whether data overrides is playing into this.

But mostly, I need to be elsewhere for a couple hours and did not want to leave the price update broken.
 

xKiv

Active member
That's weird, because without that there in DataUtilities, I was getting errors when batbrain tried to update batfactors.txt. Because you can't get a https:// url's content from filesystem.
Is price updating doing some weird stuff with putting the url on a class loader or override streams?
 

AlbinoRhino

Active member
I am running r18243 (before the revert) as well as the newest Java.

PHP:
Festival of Jarlsberg tomorrow, Moxie bonus today and tomorrow.
Using     data override: data/mallprices.txt
2048 prices updated from     http://kolmafia.us/scripts/updateprices.php?action=getmap
 

fronobulax

Developer
Staff member
Price updating is special in that it is a versioned file. I don't completely understand the versioning but the execution path is different from file_to_map because reading and checking the version is "hidden" from the code that just tries to read the file.

Presumably my environment would have updated batbrain. I didn't notice any errors. Where were they reported?

I think the fact that DataUtilities can't handle https: effects other files but I'm not sure I know the fix yet.

This file loads batfactors via https. 0 indicates failure. I was seeing 16 with code that seemed to work for batfactors but not the maps.

View attachment BF.ash
 

AlbinoRhino

Active member
16 is a correct count for batfactors. There are 16 different "types" (not sure what zarqon calls them) of data in the file. When I count the "id" fields I get a number that concurs with the number of lines in the file.

I was using
string [string] [string] [string] [string] [string] batdata;

as a map.

I was seeing the errors (file_to_map was working (returning true anyway), but map counts were actually 0) after zlib changed to https. I don't think any reporting was done until xKiv figured it out and started this thread. (It seemed to me that mafia found the file but there was no tab information being read, thus map counts were zero. My speculation only.)

Edit: Somewhere, zarqon mentioned that the http would redirect to the https.
 
Last edited:

xKiv

Active member
Price updating is special in that it is a versioned file. I don't completely understand the versioning but the execution path is different from file_to_map because reading and checking the version is "hidden" from the code that just tries to read the file.

That shouldn't matter at all. You (=mafia) need to get the file (from internet, local file, or cache) before you can even look at the version.

Presumably my environment would have updated batbrain. I didn't notice any errors. Where were they reported?

Batbrain itself, yes (that's svn, and https is handled by svnkit). Batfactors, no.

I didn't report "batbrain is unable to update batfactors.txt after switching to https", I found out why and reported *that*.
What was happening was that, with every combat, batbrain tried to update batfactors, something printed a lot of html lke
Code:
> Updating recoveryScript_map_v2.txt from '2017-02-06T21:18:34-06:00' to '<html><head><title>KoLmafia Script Registry  :: ... ...
(and more - looks like the entire html of https://www.zachbardon.com/mafiatools/autoupdate.php )

I debugged what's happening, and found that both DataFileCache and DataUtilitiesCache had a special case for http:// but not for https://. I had to fix both, or I would always get empty stream for the file (that's what mafia does when it cannot find .


This file loads batfactors via https. 0 indicates failure. I was seeing 16 with code that seemed to work for batfactors but not the maps.
Are you sure?

For me it does this (after the revert):
Code:
> BF

0

because there's no https://zachbardon.com/mafiatools/autoupdate.php?f=batfactors&act=getmap on my filesystem, mafia's cache, or on classpath. I need to specifically tell DataUtilities to use HttpURLConnection not just for http:// but also for https://.
 

xKiv

Active member
BTW, relevant code from zlib:

Code:
  string rem = visit_url("https://zachbardon.com/mafiatools/autoupdate.php?f="+fname+"&act=getver", false); // SOMETIMES for a reason I haven't figured out yet hitting this returns the regular HTML page as though no query string were given
   if (rem == "" || rem.length() > 100) return vprint("There was a problem accessing the Map Manager.",-1);  // check length to figure out if that happened
   if (zv[key].ver == rem && count(dest) > 0) {
      map_to_file(zv,"zversions.txt");
      return vprint("You have the latest "+fname+".txt.  Will not check again today.",3);
   }
   vprint("Updating "+fname+".txt "+(count(dest) > 0 ? "from '"+zv[key].ver+"' " : "")+"to '"+rem+"'...",1);
   if (!file_to_map("https://zachbardon.com/mafiatools/autoupdate.php?f="+fname+"&act=getmap",dest) || count(dest) == 0) {
   print("cnt " + count(dest));
      return vprint("Error loading "+fname+".txt from the Map Manager.","red",-1);
      }

Note the comment on the first visit_url ... SOMETIMES it returns just autoupdate.php without the query parameters. Funny thing, for me it was *always* returning as if without query parameters, until I made this change. Afterwards I was again getting like "> Updating batfactors.txt from '2017-09-09T02:19:42-05:00' to '2017-10-15T10:54:17-05:00'...". Hmm.
 

zarqon

Well-known member
I can provide a clue, possibly. The links used by the Map Manager are not direct links to map files but are rather requests sent to a fetcher file with the map to be returned specified in GET/POST data (it works with either). This lets me do more sanity checking and have tighter security than with direct links. However, the POST information is somehow lost when redirecting from http to https, which means the request will serve an HTML page rather than a map.

The redirect works fine (POST data is not lost) in a browser, and even, it seems, using "ash visit_url('http://page.php?querystring')" in the CLI, but fails from within an ASH script.

So this means that if you use a link starting with 'http' rather than 'https', you won't get a map file since POST gets lost in the redirect. So r18242 was probably necessary since I've updated ZLib to use 'https'. But that also leaves me with no clue why it would have affected the updateprices check. I believe I originally wrote the updateprices script too -- and maintained it until it was moved to kolmafia.us -- but I can't seem to find a local copy of it to look at it for clues.

My host says:

Zarqon's host said:
The POSTs weren't going through because a script doesn't have a session like a browser. The script hadn't done the TLS handshaking so the server dropped the POST. When you tested in the browser what happened was the server said, "I am redirecting you to https, go there" and the browser said, "OK" and resubmitted the POST at the https URL. Browsers handle redirects transparently; scripts are too simple to do so.

Included in case that's helpful.

EDIT: @xKiv: I've removed that comment since I figured out the above, though the comment deletion hasn't made its way into the released version yet. :)

Also @Rhino: I call them "categories".
 
Last edited:

zarqon

Well-known member
Made a very simple PHP script that simply outputs the POST and GET information received and put it on my server:

> ash visit_url("http://zachbardon.com/mafiatools/getpost.php?thisis=atest")

Returned: POST count: 0
GET count: 0

> ash visit_url("https://zachbardon.com/mafiatools/getpost.php?thisis=atest")

Returned: POST count: 1
The value of thisis is atest
GET count: 0

> ash visit_url("http://zachbardon.com/mafiatools/getpost.php?thisis=atest",true)

Returned: POST count: 0
GET count: 0

> ash visit_url("http://zachbardon.com/mafiatools/getpost.php?thisis=atest",false)

Returned: POST count: 0
GET count: 1
The value of thisis is atest

> ash visit_url("https://zachbardon.com/mafiatools/getpost.php?thisis=atest",true)

Returned: POST count: 1
The value of thisis is atest
GET count: 0

> ash visit_url("https://zachbardon.com/mafiatools/getpost.php?thisis=atest",false)

Returned: POST count: 0
GET count: 1
The value of thisis is atest

This is using r18244. Looks like either I was mistaken about it working in the CLI with the "ash" command or something's changed. Requests to 'http' all seem to drop the POST when redirecting to https.

I'll leave getpost.php up there in case anyone wants to play with it for debugging purposes.
 
Last edited:

AlbinoRhino

Active member
I've logged in many times with r18243 at this point. I've yet to see the prices fail to update. Maybe that was unrelated ?
 

xKiv

Active member
visit_url goes through entirely different code path than file_to_map(with url location).

visit_url creates a RelayRequst(false) or GenericRequest, posts that request with the given url and method, and buffers the result. GenericRequest has logic for handling/parsing kol pages, and such. It does not use DataFileCache or DataUtilities.

file_to_map will reads data from BufferedReader reader = DataFileCache.getReader( filename ), which has two options:
- one for http:// (and now https://): DataUtilities.getReader( "", filename )
- otherwise it gets data from DataFileCache.getBytes( filename )), which find the file using DataFileCache.getFile( filename, true ) (and then possibly "reads" the data from memory chace, if file modification time hasn't changed), which ... looks in scripts, relay, and data directories first, then any files directly in kolmafia root?


MallPriceDatabase.updatePrices has yet another approach: FileUtilites.getReader, but that still goes through DataUtilities.getReader( KoLConstants.DATA_DIRECTORY, filename, allowOverride ) (which sees http:// prefix, and thus makes HttpURLConnection, ignoring the other two arguments). Note that it will return empty stream if the response HTTP code is not 200. So updating prices will fail if kolmafia.us returns anything other than 200 here. Like when cloudflare detects any problems and returns a desriptive error page (iirc this was happening for a few hours yesterday, at least for me).




Oh, I see another place that checks only http:// : DataFileCache.getFile

And MallPriceDatabase.updatePrices, but the only place for updating prices so far is still http://kolmafia.us/scripts/updateprices.php?action=getmap (from LoginManager)


And KolmafiaCLI could maybe registerPrefix("https://") for VisitURLCommand?
 

xKiv

Active member
I've logged in many times with r18243 at this point. I've yet to see the prices fail to update. Maybe that was unrelated ?

I feel like this needs to be singled out from my previous post: prices may fail to update whenever there are problems with kolmafia.us.

Perhaps DataUtilities.getReader( final String directory, final String filename, final boolean allowOverride ) should actually log some sort of error/warning message when it converts a non-200 response to an empty stream?
 
Last edited:
Top