CanAdv -- check whether you can adventure at a given location

Veracity

Developer
Staff member
OK, I see that zone_check() does not have either "Little Canadia" or "MoxSign". I added those:

Code:
      case "Little Canadia": return canadia_available();
...
      case "MoxSign": return gnomads_available();
and now I get:

> ash import <canadv.ash> can_adv($location[Camp Logging Camp], false)

calling zone_check
checking zone Little Canadia
Returned: false

> ash import <canadv.ash> can_adv($location[The Edge of the Swamp], false)

calling zone_check
checking zone Le Marais Dègueulasse
zone_check returned true
Returned: true

> ash import <canadv.ash> can_adv($location[Thugnderdome], false)

calling zone_check
checking zone MoxSign
zone_check returned true
checking zone Beach
Returned: true
It is not recognizing "Le Marais Dègueulasse". That has a non-ASCII character in it. Hmm.

> ash ($location[The Edge of the Swamp].zone == "Le Marais Dègueulasse")

Returned: true
Let's see. Turning on ASH tracing:

Code:
                        <SWITCH (OPTIMIZED)>
                           <VARREF> zone
                           <SCOPE>
                              ...
                              <CASE>
                                 <VALUE string [Le Marais D�gueulasse]>
                                 <RETURN boolean>
                                    <CALL canadia_available>
This is the parse tree for that case label. I wonder what that string is?

Code:
                     Entering function zone_check
                           switch
                           Value: zone
                           [NORMAL] <- Le Marais Dègueulasse
                        [NORMAL] <- void
                           Eval: true
                           Returning: true
                        [RETURN] <- true
                     Function zone_check returned: true
The switch statement did not match that string and fell out the bottom.

I'll poke around and see how it optimized the case labels...
 

Theraze

Active member
Odd. Looks like your version of canadv is old, as well, since you posted this:
> ash import <canadv.ash> can_adv($location[The Edge of the Swamp], false)

Changing "The Frat House (Bombed Back to the Stone Age)" to "The Orcish Frat House (Bombed Back to the Stone Age)" would get rid of this message (canadv.ash, line 379)
Returned: true
The script has had The Frat House (Bombed Back) as The Orcish Frat House (Bombed Back) since Friday.

Also, regarding Little Canadia and the Gnomads:
Code:
   // signs
   case $location[Thugnderdome]:
   case $location[Pump Up Moxie]: return gnomads_available() && zone_check("Beach");
   case $location[Outskirts of Camp Logging Camp]:
   case $location[Camp Logging Camp]:
   case $location[Pump Up Mysticality]: return canadia_available();
It already detects them like that. Regarding the zone check, any zone not mentioned returns true/unlocked.

As I showed in my quote, Little Canadia and Le Marais Degueulasse both correctly detect as locked when I'm in a muscle sign.
 

Veracity

Developer
Staff member
Try reading the rest of what I posted.

You should update zone_check() to have Little Canadia and MoxSign. For both of them, it always returns true.

And ASH is not compiling the string constant for "Le Marais Degueulasse" the same for me as it does for you. Since there is a non-ASCII character in there, I suspect there is a machine dependency of some sort. I am investigating.
 

Veracity

Developer
Staff member
I am clueless. Here is a script:

Code:
void testit( string str )
{
    switch (str) {
    case "Le Marais Dègueulasse": print( "yes" ); break;
    default: print( "no" ); break;
    }
}

testit( "Le Marais Dègueulasse" );
With some instrumentation, when I validate that script I get:

Code:
Read 184 bytes into a 182 string
label = "Le Marais Dègueulasse"(21)
Note that the two instances of the non-ASCII character were 2-byte UTF-8 sequences, each of which turns into a single Java character.

Now, validating canadv:

Code:
Read 37564 bytes into a 37564 string
Read 44044 bytes into a 44039 string
label = "Le Marais D�gueulasse"(21)
It read two scripts - canadv and zlib - but canadv did not seem to have any Unicode characters; each byte in the file turned into a single character. zlib, on the other hand, had 5 Unicode characters in it. Looks like the "±" character in 5 comment lines about verbosity.

Hmm. It looks like canadv is saved in coding system "iso-latin-1-dos" rather than "utf-8-dos". Ha ha. I saved it with UTF-8 encoding and guess what? It works now.

Code:
Read 37565 bytes into a 37564 string
Read 44044 bytes into a 44039 string
label = "Le Marais Dègueulasse"(21)
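The validation runs above can be reproduced in a few lines of Java. This is a minimal sketch, not KoLmafia code: the class `ZoneNameDecoding` and the helper `readAsUtf8` are hypothetical names, and it assumes the reader decodes with a fixed UTF-8 charset and substitutes U+FFFD for malformed bytes, which is what the logged `�` suggests.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of KoLmafia.
final class ZoneNameDecoding {
    // The zone name from the thread; \u00E8 is the accented "e".
    static final String ZONE = "Le Marais D\u00E8gueulasse";

    // Decode raw file bytes the way a reader fixed to UTF-8 would.
    static String readAsUtf8(byte[] fileBytes) {
        return new String(fileBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // iso-latin-1 stores the accented character as the single byte 0xE8.
        byte[] latin1Bytes = ZONE.getBytes(StandardCharsets.ISO_8859_1);
        // UTF-8 stores it as the two bytes 0xC3 0xA8.
        byte[] utf8Bytes = ZONE.getBytes(StandardCharsets.UTF_8);

        String fromLatin1 = readAsUtf8(latin1Bytes);
        String fromUtf8 = readAsUtf8(utf8Bytes);

        // Both decoded strings have 21 characters, but only one matches the zone name.
        System.out.println(fromLatin1.length() + " " + fromLatin1.equals(ZONE)); // 21 false
        System.out.println(fromUtf8.length() + " " + fromUtf8.equals(ZONE));     // 21 true
        // The lone 0xE8 byte was replaced by U+FFFD, so a case label built from it cannot match.
        System.out.println(fromLatin1.indexOf('\uFFFD') >= 0);                   // true
    }
}
```

The UTF-8 form of the name is one byte longer than the iso-latin-1 form, which is consistent with the file growing from 37564 to 37565 bytes in the log.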
 

Theraze

Active member
Try reading the rest of what I posted.

You should update zone_check() to have Little Canadia and MoxSign. For both of them, it always returns true.

It always returns true because the locations themselves have the sign check. Regardless of whether the check for sign is done in the zone or on the location, it SHOULD give the same results... Why it didn't work for you that time, I don't know.

So... does the loading change thing (r14051) mean that the second Canadia zone works the same for you on your Mac as it does for me on Windows?
 

Veracity

Developer
Staff member
It always returns true because the locations themselves have the sign check. Regardless of whether the check for sign is done in the zone or on the location, it SHOULD give the same results.
It does the zone check first.
If that returns true, it does the location check.

It sounds like you are making an excuse for not making the test in the zone check for a couple of zones that can be instantly rejected based on your zodiac sign, instead choosing to check for individual locations. That puzzles me.
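The ordering argument can be illustrated with a small Java model. canadv itself is ASH, so everything here is a hypothetical stand-in: `zoneCheck`, `locationCheck` and the boolean parameters only mimic the roles of zone_check(), the per-location switch, canadia_available() and gnomads_available() as described in this thread.

```java
// Hypothetical Java model of the control flow discussed above; not the ASH script.
final class CanAdvSketch {
    // Counts how often the per-location check actually runs.
    static int locationChecks = 0;

    // Zones that are not listed fall through to "available", as described in the thread.
    static boolean zoneCheck(String zone, boolean canadiaAvailable, boolean gnomadsAvailable) {
        switch (zone) {
            case "Little Canadia": return canadiaAvailable;
            case "MoxSign": return gnomadsAvailable;
            default: return true;
        }
    }

    // Stand-in for the per-location logic; it only records that it was reached.
    static boolean locationCheck(String location) {
        locationChecks++;
        return true;
    }

    // The zone check runs first; && skips the location check when the zone is rejected.
    static boolean canAdv(String zone, String location, boolean canadia, boolean gnomads) {
        return zoneCheck(zone, canadia, gnomads) && locationCheck(location);
    }

    public static void main(String[] args) {
        // Wrong sign for Little Canadia: the zone check alone rejects the location.
        System.out.println(canAdv("Little Canadia", "Camp Logging Camp", false, true)); // false
        System.out.println(locationChecks); // 0
    }
}
```

With the sign test in the zone check, every location in a sign-locked zone is rejected before any per-location logic runs.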

Why it didn't work for you that time, I don't know.
Because the script has a bad encoding - iso-latin-1, not utf-8 - which apparently works for you but not for me.

So... does the loading change thing (r14051) mean that the second Canadia zone works the same for you on your Mac as it does for me on Windows?
The released version is still broken. After I saved it with utf-8 encoding, it now works for me.
 

Theraze

Active member
It does the zone check first.
If that returns true, it does the location check.

It sounds like you are making an excuse for not making the test in the zone check for a couple of zones that can be instantly rejected based on your zodiac sign, instead choosing to check for individual locations. That puzzles me.
Sorry, I think you're misunderstanding me. I prefer to actually fix the bugs posted before doing optimizations. The more changes I make, the harder it is to fix the original reported bug if it still exists. You reported in post 356 that it was incorrectly giving the sign zones as available when they weren't. Until the reason for that is explained, I don't want to change where it's doing its check. If that was simply an anomaly which can be ignored, then I can move the check from the individual check to the zone check.


Because the script has a bad encoding - iso-latin-1, not utf-8 - which apparently works for you but not for me.

The released version is still broken. After I saved it with utf-8 encoding, it now works for me.
The default encoding for Windows isn't UTF-8, but ANSI. The question I posed was whether r14051 actually changed anything for mafia reading and using scripts... While most/all of my scripts, working on Windows, will be ANSI, it seems that other coders end up with a variety of encodings... Take Zarqon for example: BatBrain and BestBetweenBattle are both ANSI, but zlib is in UTF-8.

At least for myself and several others, zlib is the only zarqon script that defies easy fixing and errors out when editing happens. If I can avoid making it so that myself and others can't easily update CanAdv in the future, I'd prefer to avoid that. But since that datafile (data\zonelist.txt) and presumably KoL itself has the actual encoded character rather than the HTML encoding, we have this OS-level disruption. Specifically, on my system, a UTF-8 encoded CanAdv that I create corrupts the start of the text file.
***** SCRIPTS\canadv.ash
#/******************************************************************************
***** SCRIPTS\CANADV.ASH.TXT
#/***************************************************************************
***
 

Veracity

Developer
Staff member
The default encoding for Windows isn't UTF-8, but ANSI.
I have no idea what "ANSI" means in this context. The file you posted is in iso-latin-1. Is "ANSI" a synonym for that? Not exactly; Wikipedia says that "Windows-1252 or CP-1252 is a character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows in English and some other Western languages." and "This character encoding is a superset of ISO 8859-1, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 80 to 9F (hex) range." ISO 8859-1 is, essentially, iso-latin-1.

Here is Wikipedia for ISO Latin 1 and for UTF-8.

The question I posed was whether r14051 actually changed anything for mafia reading and using scripts... While most/all of my scripts, working on Windows, will be ANSI, it seems that other coders end up with a variety of encodings... Take Zarqon for example: BatBrain and BestBetweenBattle are both ANSI, but zlib is in UTF-8.
Your comment that you write scripts that "work on windows" is interesting. Whether or not you think you are saying this, what you are really saying is that you write scripts that "work ONLY on windows", if they include any non-ASCII characters, since they will be encoded in a way that ONLY Windows is able to read.

In an effort to inspire people to publish their scripts in a portable encoding that ALL systems - Windows, Mac, Linux, ... - can read, revision 14051 makes KoLmafia read scripts in UTF-8, whether or not that is what they were written in. That means that if there is an iso-latin-1 character in the script (as there is in canadv), it will be decoded into some entirely unexpected character. That is what happens to me with the current canadv. Sucks to be me, eh? Except, hopefully, now it happens to you, too, and you will be motivated to fix canadv so that it works for everyone, rather than only working for you.

At least for myself and several others, zlib is the only zarqon script that defies easy fixing and errors out when editing happens. If I can avoid making it so that myself and others can't easily update CanAdv in the future, I'd prefer to avoid that. But since that datafile (data\zonelist.txt) and presumably KoL itself has the actual encoded character rather than the HTML encoding, we have this OS-level disruption. Specifically, on my system, a UTF-8 encoded CanAdv that I create corrupts the start of the text file.
This makes literally no sense at all. I have no idea whatsoever what you mean by "HTML encoding" in this context, since that has nothing to do with UTF-8. You are correct that KoL and KoLmafia both have "the actual encoded character" - in UTF-8. Neither of them has "the iso-latin-1 encoding" of characters, which canadv has.

The only difference between canadv as an iso-latin-1 encoded file and a utf-8 encoded file is that a single iso-latin-1 8-bit byte representing the "è" character becomes 2 8-bit bytes representing the corresponding utf-8 character. If you convert canadv.ash from its current iso-latin-1 encoding to utf-8 encoding and anything other than that single 1-byte-character to 2-byte-character transformation happens, you are converting it incorrectly.

I suggest that you figure out the correct way to convert that file - and the correct way to save your files in utf-8.
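One way to do that conversion is to decode the bytes as iso-latin-1 and re-encode them as UTF-8. The sketch below is only an illustration under that assumption; `Latin1ToUtf8` and `convert` are hypothetical names, and it works on an in-memory byte array rather than on canadv.ash itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of KoLmafia or canadv.
final class Latin1ToUtf8 {
    // Bytes below 0x80 are unchanged; each byte from 0x80 to 0xFF becomes two UTF-8 bytes.
    static byte[] convert(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "D\u00E8gueulasse" holds one accented character, stored as the single byte 0xE8.
        byte[] before = "D\u00E8gueulasse".getBytes(StandardCharsets.ISO_8859_1);
        byte[] after = convert(before);

        // Exactly one byte is added, and no BOM is written.
        System.out.println(before.length + " -> " + after.length);           // 11 -> 12
        System.out.printf("%02X %02X%n", after[1] & 0xFF, after[2] & 0xFF);  // C3 A8
    }
}
```

For a real file, the same `convert` call could sit between `java.nio.file.Files.readAllBytes` and `java.nio.file.Files.write`. If the result differs from the input by anything other than the extra byte per accented character, the conversion went wrong somewhere.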
 

Veracity

Developer
Staff member
Take Zarqon for example: BatBrain and BestBetweenBattle are both ANSI, but zlib is in UTF-8.
Actually, no.

BatBrain and BestBetweenBattle are both ASCII; neither includes any special characters. Since ASCII is a subset of UTF-8, that means they are also effectively UTF-8.
zlib has the "±" character 5 times and encodes them in UTF-8; those 5 characters are the only thing which makes it non-ASCII.

Since you said you have difficulty converting from iso-latin-1 to utf-8, I attach canadv.ash encoded in utf-8 - also including the other two bug fixes I pointed out to you: checking for Little Canadia and MoxSign in the zone check.

Thanks.
 

Attachments

  • canadv.ash
    36.6 KB · Views: 31

Theraze

Active member
Eh, ANSI is what Notepad calls it. If that's iso-latin-1, then yes.

Still waiting for the specifics on whether the sign checks weren't working because of a hiccup in that run of mafia for you or if there's actually some reason why your execution detected differently than mine did... if it was just an inconsistency in that run, then I can move the sign detection to zone_check. If something is actually running differently, then it needs to be fixed. After that's fixed, I can move that up there...
 

xKiv

Active member
since they will be encoded in a way that ONLY Windows is able to read.

I think this is incorrect. A byte-order-marker is a standard unicode feature, and if a program that says it reads unicode fails to read a file because it starts with a BOM, that's a bug in that program.
The issue is not that only windows is able to read them (which isn't true), but rather that windows is the only system that bothers *writing* them that way. Completely unnecessarily, because utf-8 can only have one byte order ever. It makes sense in every other unicode encoding, especially when the same file should work on any platform.


Eh, ANSI is what Notepad calls it. If that's iso-latin-1, then yes.

IIUC, ANSI means different things on different versions of windows, and especially in different countries.
It's also almost never an iso-latin-## encoding, because microsoft always has to "improve" on industry standards (by, for example, switching around two characters ...)
 

Veracity

Developer
Staff member
I think this is incorrect. A byte-order-marker is a standard unicode feature, and if a program that says it reads unicode fails to read a file because it starts with a BOM, that's a bug in that program.
Fair enough. Here is an example of how we set up to read a UTF-8 file:

Code:
			this.istream = DataUtilities.getInputStream( scriptFile );
			this.commandStream = new LineNumberReader( new InputStreamReader( this.istream, "UTF-8" ) );
In other words, we make a stream from a File and make an InputStreamReader from it, specifying UTF-8 encoding. Will reading from that stream skip a BOM? I would hope so, but I don't know. I don't have a test file with such a marker in it to see. If not, and the stream passes it to the caller, I expect we could ignore such a marker. I would consider that to be getting around a Java library bug.

The issue is not that only windows is able to read them (which isn't true), but rather that windows is the only system that bothers *writing* them that way.
I will point out that the format I said that "ONLY Windows will be able to read" is what Theraze called "ANSI" - the variant of iso-latin-1 that the released version of canadv.ash is written in, with single-byte representations of accented characters and no indication whatsoever that this is how the file is encoded.

When Emacs reads in a file, it looks at the whole thing and uses heuristics to decide how it is encoded and converts non-Unicode to Unicode for its own internal usage. When it looks at canadv, it says it is "iso-latin-1-dos" - because it has DOS line breaks and has a single-byte non-ASCII character. When it looks at zlib, it says it is "utf-8-dos", since it has DOS line breaks and UTF-8 non-ASCII characters. Presumably, it would use a Unicode BOM to help it, if one were present, but it doesn't need it.

I do not want to load up KoLmafia with that kind of heuristics and auto-transformation of encoding. Much easier to simply say "use UTF-8 (which is simply ASCII, if you have no non-ASCII characters)".
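A script author who wants to confirm that a file is valid UTF-8 before publishing does not need Emacs-style heuristics; a strict decode is enough. The following is a sketch of that idea, not something KoLmafia does: `Utf8Check` and `isValidUtf8` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of KoLmafia.
final class Utf8Check {
    // Returns true only if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "D\u00E8gueulasse".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "D\u00E8gueulasse".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(latin1)); // false: a lone 0xE8 byte is not valid UTF-8
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

Pure ASCII input always passes, which matches the point that an ASCII-only script is already UTF-8.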
 

Theraze

Active member
Here is canadv converted back to UTF-8 through Notepad complete with BOM. Extension is changed, to not break the working UTF-8 encoded version.
 

Attachments

  • canadv.ash.txt
    36.5 KB · Views: 116

Veracity

Developer
Staff member
Sweet! Sort of. I am disappointed that Java doesn't simply skip the BOM.
Well, I have what I need to make KoLmafia skip it, at least.
Thanks.
 

Veracity

Developer
Staff member
OK, revision 14053 will skip Unicode BOM characters at the beginning of a line in an ASH script. That's overkill; presumably it could be made to skip it only at the beginning of a file. But this is simple, at least.
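For readers who want to see the behaviour, here is a minimal sketch of the idea. It is not the actual r14053 change: `BomSkipSketch` and `readFirstLine` are hypothetical names, and the input is an in-memory byte array standing in for a script saved by Notepad with a BOM.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not the KoLmafia implementation.
final class BomSkipSketch {
    static final char BOM = '\uFEFF';

    // Reads the first line as UTF-8 and optionally drops a leading BOM character.
    static String readFirstLine(byte[] fileBytes, boolean skipBom) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            if (skipBom && line != null && !line.isEmpty() && line.charAt(0) == BOM) {
                line = line.substring(1);
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of the BOM, followed by "#/*".
        byte[] withBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '#', '/', '*' };

        // Java's UTF-8 decoder hands the BOM to the caller as an ordinary character.
        System.out.println(readFirstLine(withBom, false).length()); // 4
        System.out.println(readFirstLine(withBom, true));           // #/*
    }
}
```

Stripping the character per line, as described above, is simpler than tracking whether the reader is still at the start of the file.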
 

Theraze

Active member
Thanks! That should help users stop posting bug reports randomly when they edit a UTF-8 script using Windows Notepad and it throws the useless characters on there. :) It's been an ongoing random problem for years that never made perfect sense to myself and those trying to help troubleshoot their circumstances.

To confirm: we're fine moving the sign checks to the zone_check area, and don't believe that the time when it failed was a replicable bug, right? I committed the UTF-8 (without BOM) this morning, but want to make sure that if something is wrong with the check that I fix it rather than just move it to a new area where it will still fail.
 

Veracity

Developer
Staff member
For zones that are only available in certain signs - Little Canadia and MoxSign - then yes, the zone check is where to do it.
 

Theraze

Active member
I'm planning on moving it there. You reported in post 356 that the canadia_available and gnomads_available checks were not working properly. Have you experienced that again since Friday, or have they been working since then? If those checks aren't working, then the zone checks will need to do server hits to check if the zones are unlocked or parse my_sign or something else similar. There is no case where 2 different mutually exclusive signs should both say they're available, but that is what post 356 reported.

I'm just trying to make sure that, having moved the my_sign checks to zone_check, everything will continue working properly. Mutually exclusive values reporting that they're both working doesn't inspire confidence.
 

Veracity

Developer
Staff member
They weren't working for the Swamp zones because of the issue we've been discussing - the non-Unicode version of the è character in the zone name in canadv. I was mistaken about the Thugnderdome. And I was confused about the Little Canadia zones because I initially tested when I was overdrunk and didn't realize you were checking for that condition. I think that gnomads_available() and canadia_available() are working fine.

If you release a UTF-8 version of canadv from now on, I expect the swamp zones will be correctly rejected by the zone check on all platforms - and for efficiency, you should also test for Little Canadia and MoxSign in the zone check, too.
 