Bug - Not A Bug Unusual characters in faxbot monster names cause problems

zarqon

Well-known member
Using the Request a Fax window, KoLmafia was unable to successfully request a Jar of Alphredo™.

Asking Easyfax to send a fax of Possessed Jar of Alphredo™: Possessed Jar of Alphredo™
Configuration error: unknown command sent to Easyfax

The chat contained:

zarqon: Possessed Jar of Alphredo?
Easyfax: I couldn't find that monster. Please look here [link] http://sourceforge.net/p/easyfax/code/HEAD/tree/list for a list of monster names.

It worked without the trademark symbol. Is this an error with EasyFax's config file or something that could/should be worked around in mafia?
 

lostcalpolydude

Developer
Staff member
I believe there are multiple pages discussing that in the thread where support was added for FaustBot and Easyfax. I don't remember the conclusion.
 

Veracity

Developer
Staff member
The monster is named:

Code:
Possessed Jar of Alphredo&trade;
EasyFax's config file says:

Code:
<monsterdata>
<name>Possessed Jar of Alphredo™</name>
<actual_name>Possessed Jar of Alphredo™</actual_name>
<command>Possessed Jar of Alphredo™</command>
<category>None</category>
</monsterdata>
Note the actual ™ character. The EasyFax config file says "<?xml version="1.0" encoding="UTF-8"?>". Therefore, the character code for that Unicode symbol is 8482 (decimal).
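A quick way to check those numbers is a few lines of standard-library Java. The class and helper names are only illustrative. The sketch prints the code point of ™, its numeric character reference, and the three bytes UTF-8 uses for it.

```java
import java.nio.charset.StandardCharsets;

public class TrademarkCodes {
    // Formats a code point in the usual U+XXXX notation.
    static String unicodeLabel(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    // Returns the UTF-8 bytes of s as space-separated uppercase hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String tm = "\u2122"; // the trademark sign, written as an escape so the source encoding does not matter
        int cp = tm.codePointAt(0);
        System.out.println(cp);               // 8482
        System.out.println(unicodeLabel(cp)); // U+2122
        System.out.println("&#" + cp + ";");  // &#8482;
        System.out.println(utf8Hex(tm));      // E2 84 A2
    }
}
```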

When we send the request to EasyFax, we do it via a chat command:

Code:
				ChatSender.sendMessage( botName, command, false );
According to GenericRequest:

Code:
		String charset = this.isChatRequest ? "ISO-8859-1" : "UTF-8";
we encode chat requests using ISO-8859-1, not UTF-8. ISO-8859-1 has no code for that symbol at all; 153 (0x99) is its code in Windows-1252, Microsoft's superset of ISO-8859-1.

I went into the gCLI.
I did /msg Veracity abc™
KoLmafia submitted this request:

submitnewchat.php?pwd&playerid=121572&graf=%2Fmsg+Veracity+abc%3F

Note the %3F, which is a question mark. It did not actually transmit the ™ symbol. That fits the charset: since ™ cannot be represented in ISO-8859-1, the encoder presumably substitutes a "?" for it rather than sending char code 153.
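The %3F can be reproduced with the JDK alone. This is a sketch, assuming the chat line ends up percent-encoded with `java.net.URLEncoder` (the Java 10+ overload that takes a `Charset`). It is not a claim about the exact call KoLmafia makes. ISO-8859-1 cannot encode U+2122, and Java's encoder substitutes `?` for unmappable characters.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ChatEncodingDemo {
    // Percent-encodes a chat line as a form field in the given charset.
    static String encode(String text, Charset charset) {
        return URLEncoder.encode(text, charset); // Java 10+ overload
    }

    public static void main(String[] args) {
        String line = "/msg Veracity abc\u2122";
        // Unmappable in ISO-8859-1, so the encoder writes '?' (0x3F).
        System.out.println(encode(line, StandardCharsets.ISO_8859_1)); // %2Fmsg+Veracity+abc%3F
        // UTF-8 sends the three bytes E2 84 A2.
        System.out.println(encode(line, StandardCharsets.UTF_8));      // %2Fmsg+Veracity+abc%E2%84%A2
        // Windows-1252 is where the single byte 0x99 (153) comes from.
        System.out.println(encode(line, Charset.forName("windows-1252"))); // %2Fmsg+Veracity+abc%99
        // Direct check: can ISO-8859-1 represent the character at all?
        System.out.println(StandardCharsets.ISO_8859_1.newEncoder().canEncode('\u2122')); // false
    }
}
```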

I went into the Relay Browser and typed that same /msg command. The Browser popped up a window saying the ™ character is not ASCII and it will convert it into a space. And, in fact, I received abc with no ™.

Looks to me like EasyFax config file should store the actual KoL (and KoLmafia) monster names in its config files - which is to say, using HTML entities.
 

Veracity

Developer
Staff member
Although, you know, we could convert config file "commands" to use HTML entities ourselves.

Probably easy enough, and it would be harmless even if EasyFax eventually does that itself.
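Here is a minimal sketch of converting a config "command" to HTML entities. It assumes a tiny hand-written table of named entities (only `&trade;` here) with a numeric-reference fallback for everything else outside ASCII. The class and method names are hypothetical; they are not KoLmafia's `CharacterEntities` API.

```java
import java.util.Map;

public class EntityEscapeSketch {
    // Hypothetical, deliberately tiny table of named HTML entities.
    private static final Map<Integer, String> NAMED = Map.of(0x2122, "trade");

    // Replaces every non-ASCII code point with a named entity if known, else &#NNNN;.
    // ASCII (including '&') passes through unchanged; a full encoder would also escape &, < and >.
    static String toHtmlEntities(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else if (NAMED.containsKey(cp)) {
                out.append('&').append(NAMED.get(cp)).append(';');
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtmlEntities("Possessed Jar of Alphredo\u2122")); // Possessed Jar of Alphredo&trade;
        System.out.println(toHtmlEntities("caf\u00e9"));                       // caf&#233;
    }
}
```

Because an already-encoded name is pure ASCII, running this a second time leaves it unchanged. That matches the point that doing the conversion client-side stays harmless if EasyFax later does it too.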
 

Veracity

Developer
Staff member
Except, EasyFax would need to understand it.

Code:
[13:59] Veracity: Possessed Jar of Alphredo&trade;
[14:00] Easyfax: I couldn't find that monster. Please look here [link] http://sourceforge.net/p/easyfax/code/HEAD/tree/list for a list of monster names.
The fix has to be in EasyFax, not KoLmafia.
 

Crowther

Active member
Ugh. The code I use just gets the name off the fax sheet.
Code:
matcher m = create_matcher("This is a sheet of copier paper with a grainy, blurry likeness of a(n*) (.+?) on it", fax);
So it is in the format visit_url() returns.

Just because my XML says UTF-8, doesn't mean that's right. I honestly don't understand these new character sets. I think someone told me to change that from ISO-8859-1, which I simply copied from faxbot.

Anyway, that's not the real issue. My problem is with the way KoL works. Since there isn't a unique mapping between monster names and monsters, I simply go by the name as it appears on the fax page. However, as you point out, KoL will not let you cut and paste that name directly into chat, which leaves me in a bit of a bind. Is there a mafia way to convert from whatever visit_url() gives to html characters?

BTW, if I disappear for a bit, it isn't because I don't care about this issue; my life is about to get very busy again. I should be able to squeeze this in.
 

Crowther

Active member
Quote: "I think entity_encode() will do exactly what you need."
Cool! Thanks both of you!

I thought something like that was added, but my wiki searches and ashref foo failed me. Easyfax now converts names to html when it looks at the fax page. Since this should make them pure ASCII, I'm betting things will run much smoother. I'm going to restart it now, but it will take a while for clans to be checked and for the information to propagate, so don't expect it to work better right away.
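For readers following along in Java rather than ASH, the EasyFax step described above might look roughly like this: match the copier-paper description, then entity-encode the captured name. The regex is the one from the ASH snippet. The encoder is a hypothetical one-entry stand-in, not ASH's `entity_encode()`, and the sample sentence is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FaxSheetName {
    // Pattern copied from the ASH create_matcher() call; group 2 is the monster name.
    private static final Pattern FAX = Pattern.compile(
        "This is a sheet of copier paper with a grainy, blurry likeness of a(n*) (.+?) on it");

    // Hypothetical stand-in for entity_encode(): handles only the trademark sign, plus a numeric fallback.
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp == 0x2122) out.append("&trade;");
            else if (cp < 128) out.appendCodePoint(cp);
            else out.append("&#").append(cp).append(';');
        });
        return out.toString();
    }

    // Returns the entity-encoded monster name, or null if the text is not a fax description.
    static String monsterName(String faxText) {
        Matcher m = FAX.matcher(faxText);
        return m.find() ? encode(m.group(2)) : null;
    }

    public static void main(String[] args) {
        String sample = "This is a sheet of copier paper with a grainy, blurry likeness of a "
            + "Possessed Jar of Alphredo\u2122 on it.";
        System.out.println(monsterName(sample)); // Possessed Jar of Alphredo&trade;
    }
}
```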
 

Crowther

Active member
Code:
[13:40] Crowther: Possessed Jar of Alphredo&trade;
[13:40] Easyfax: Your fax is ready.
And the data file has the html name too. One odd side effect here. The first time I requested just "Alphredo", the request failed, because it went to find the &trade; version, found the ™ version, and rejected it, because that wasn't what it was expecting to find. However, since after visiting that clan, it now remembers the &trade; version is there, a second request worked fine. There aren't too many monsters with odd characters, so I'll try to request them all.

EDIT: Actually, I couldn't find any other monsters in the network with odd characters, so I guess this problem has been fixed!
 

Veracity

Developer
Staff member
Code:
Could not load easyfax configuration from "https://sourceforge.net/p/easyfax/code/HEAD/tree/Easyfax.xml?format=raw"
I see an error on my console:

[Fatal Error] easyfax.xml:1557:39: The entity "trade" was referenced, but not declared.

Line 1557 of the XML file:

<name>Possessed Jar of Alphredo&trade;</name>

So, yes - this change to the config file has everything to do with why we cannot load the configuration.

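Apparently the "trade" entity is not built into XML, so it has to be declared somehow. XML itself predefines only five entities (&amp;, &lt;, &gt;, &quot; and &apos;); HTML entities such as &trade; need a declaration in a DTD.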
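The difference between the five predefined XML entities and an HTML entity like `&trade;` is easy to see with the JDK's own DOM parser. This is a sketch; the class name is illustrative. It installs a SAX `DefaultHandler` as the error handler so a fatal error is rethrown instead of being printed as "[Fatal Error] ...".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

public class PredefinedEntities {
    // Parses a small XML string and returns the text of its root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors without printing them
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    // True if the XML parses without a fatal error.
    static boolean parses(String xml) {
        try {
            rootText(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The five entities XML predefines: amp, lt, gt, quot, apos.
        System.out.println(rootText("<name>&amp; &lt; &gt; &quot; &apos;</name>")); // & < > " '
        // &trade; is an HTML entity, not an XML one, so this fails without a declaration.
        System.out.println(parses("<name>Possessed Jar of Alphredo&trade;</name>")); // false
    }
}
```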
 

Veracity

Developer
Staff member
I tested by changing the start of the easyfax.xml to look like this:

Code:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE faxbot
   [
      <!ENTITY trade  "&#8482;">
]>
The DOCTYPE thing is new. We can now successfully parse the file - but the parser replaces &trade; with the Unicode character and we are back to where we started. The value of that entity is the numeric character reference &#8482;, i.e. Unicode code point 8482.

I then adjusted FaxBotDatabase:

Code:
			this.command = CharacterEntities.escape( command );
And it works like this:

EasyFax stores the name of the monster with the characters entities.
EasyFax generates a config file that defines & references the character entity.
KoLmafia's XML parser reads that config file and gets unicode characters,
KoLmafia escapes the unicode characters and ends up with the same ASCII string that EasyFax has

--> KoLmafia can send that command to EasyFax via chat, and EasyFax understands it.
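The four steps above can be sketched end to end with the JDK's DOM parser. The XML string is a cut-down, hypothetical stand-in for easyfax.xml, and `escape()` is a one-entity stand-in for `CharacterEntities.escape()`, not its real implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DoctypeRoundTrip {
    // Steps 1-2: a config that declares &trade; and references it (hypothetical, trimmed).
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      + "<!DOCTYPE faxbot [ <!ENTITY trade \"&#8482;\"> ]>\n"
      + "<faxbot><command>Possessed Jar of Alphredo&trade;</command></faxbot>";

    // Step 3: the XML parser expands &trade; to the Unicode character U+2122.
    static String parsedCommand(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("command").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Step 4: hypothetical stand-in for CharacterEntities.escape(), handling only U+2122.
    static String escape(String s) {
        return s.replace("\u2122", "&trade;");
    }

    public static void main(String[] args) {
        String parsed = parsedCommand(XML);
        System.out.println(parsed.endsWith("\u2122")); // true
        System.out.println(escape(parsed));            // Possessed Jar of Alphredo&trade;
    }
}
```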

---

Alternate solution:

Easyfax config does NOT have a DOCTYPE. Instead, change the monster:

Code:
<monsterdata>
<name>Possessed Jar of Alphredo&amp;trade;</name>
<actual_name>Possessed Jar of Alphredo&amp;trade;</actual_name>
<command>Possessed Jar of Alphredo&amp;trade;</command>
<category>None</category>
</monsterdata>
to not actually have a character entity - just something that XML will expand to look like one.

KoLmafia's FaxBotDatabase does not entity-encode the command. Same as now.

Net result is, KoLmafia's "command" agrees with EasyFax's command, and we can get the monster.

So:

EasyFax does entity-encode on the monster name it reads from the photocopier.
It stores that in its command table.
It does ANOTHER entity encode on the command before it writes it to the XML file. But that's not the one it uses. That's just how it publishes it.
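The alternate scheme can be sketched the same way. The publishing side XML-escapes the already entity-encoded command, and the consuming side does a plain parse with no DOCTYPE and no extra escaping. The method names are hypothetical; `xmlEscapeText` replaces `&` first so it does not re-escape its own output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class DoubleEncodePublish {
    // Hypothetical XML text-node escaper. '&' is replaced first so the
    // "&lt;"/"&gt;" produced afterwards are not escaped a second time.
    static String xmlEscapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Publishing side: wrap an already entity-encoded command in a <command> element.
    static String publish(String entityEncodedCommand) {
        return "<command>" + xmlEscapeText(entityEncodedCommand) + "</command>";
    }

    // Consuming side: a plain XML parse; the text comes back exactly as EasyFax stores it.
    static String readCommand(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String command = "Possessed Jar of Alphredo&trade;"; // what EasyFax expects in chat
        String xml = publish(command);
        System.out.println(xml);              // <command>Possessed Jar of Alphredo&amp;trade;</command>
        System.out.println(readCommand(xml)); // Possessed Jar of Alphredo&trade;
    }
}
```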
 

fronobulax

Developer
Staff member
I'm confused.

What is the end user solution to
Could not load easyfax configuration from "https://sourceforge.net/p/easyfax/code/HEAD/tree/Easyfax.xml?format=raw"
with r17537?

The short-term fix is to edit the file (or both KoLmafia and the file) to implement the changes above, but what is the long-term solution? Tell EasyFax how KoLmafia would like to handle trademarks and then wait for them to update their file?
 

Veracity

Developer
Staff member
Short term solution for the end user is to talk to Easyfax directly with chat, just like non-KoLmafia users do it.

Relatively short term solution is for EasyFax to use &amp; in its XML config file for every & character it wants to put there - after it has entity-encoded the monster name, which it currently does.

I see no need for anything else beyond that. That should be robust even if future monsters have different HTML entities.
 

Crowther

Active member
Yup. I can do that. It's a little more messy than it sounds, but not a problem.
 

Crowther

Active member
Okay. That seems to work. I was able to use the GUI to request that monster. The GUI shows it as "&trade;" so there might be some touching up to do there or not.
 

Veracity

Developer
Staff member
Yeah, we might want to entity-decode the actual name (but not the command) for display purposes. I'll think about it.
Thank you for your work!
 

Veracity

Developer
Staff member
Revision 17540 will display pretty monster names, with HTML entities unescaped.
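Revision 17540 is KoLmafia's own change. Purely to illustrate the split described above (decode the name for display, keep the command encoded), a small Java sketch might look like the following. The decoder handles only decimal references plus a hypothetical two-entry table, so it is not a substitute for a full HTML entity decoder.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DisplayName {
    // Hypothetical, deliberately tiny table of named entities.
    private static final Map<String, String> NAMED = Map.of("trade", "\u2122", "amp", "&");
    private static final Pattern ENTITY = Pattern.compile("&(#\\d{1,7}|[A-Za-z]+);");

    // Decodes decimal references and the named entities in the table; anything else is left as-is.
    static String unescape(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#")) {
                int cp = Integer.parseInt(body.substring(1));
                replacement = Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : m.group();
            } else {
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String command = "Possessed Jar of Alphredo&trade;"; // sent to the bot unchanged
        String display = unescape(command);                  // shown in the GUI
        System.out.println(command);
        System.out.println(display.endsWith("\u2122")); // true
    }
}
```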
It will also print the error message if XML parsing throws an error.
 

xKiv

Active member
Quoting Veracity's alternate solution: "Easyfax config does NOT have a DOCTYPE. Instead, change the monster [...] to not actually have a character entity - just something that XML will expand to look like one."

I believe this is the correct way to do it, assuming that mafia will parse string representations of HTML entities (i.e. generating the file should use something like xml_encode(entity_encode(...)), depending on the language and libraries used). Note that for future compatibility, you should also replace any < with &lt; and > with &gt;. I don't think you will need to escape " or ' - that's only necessary within element attributes, not inside text nodes.
The approach of defining XML entities has the problem that you would have to declare ALL HTML entities in use (except the few that are also predefined in XML) - either preemptively, by including the whole table in Easyfax.xml, or by scanning all names and somehow determining what to expand them to.
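To make the "scan all names" option concrete, here is a sketch of a generator for the internal DTD subset. The lookup table is hypothetical and tiny. A real generator would need the full HTML entity list, and this one simply throws on any entity it does not know, which is exactly the brittleness described above.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoctypeFromNames {
    // Entities XML already predefines; these never need a declaration.
    private static final Set<String> XML_PREDEFINED = Set.of("amp", "lt", "gt", "quot", "apos");
    // Hypothetical lookup table; a real one would cover every HTML entity.
    private static final Map<String, Integer> HTML_CODE_POINTS = Map.of("trade", 0x2122, "eacute", 0xE9);
    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Builds <!ENTITY> declarations for every named entity used in the given names.
    static String declarationsFor(Iterable<String> names) {
        Set<String> used = new TreeSet<>();
        for (String name : names) {
            Matcher m = NAMED.matcher(name);
            while (m.find()) {
                if (!XML_PREDEFINED.contains(m.group(1))) used.add(m.group(1));
            }
        }
        StringBuilder out = new StringBuilder();
        for (String entity : used) {
            Integer cp = HTML_CODE_POINTS.get(entity);
            if (cp == null) throw new IllegalArgumentException("no expansion known for &" + entity + ";");
            out.append("<!ENTITY ").append(entity).append(" \"&#").append(cp).append(";\">\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> names = List.of("Possessed Jar of Alphredo&trade;", "Plain Ascii Monster");
        System.out.print(declarationsFor(names)); // <!ENTITY trade "&#8482;">
    }
}
```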

It's much the same as embedding regexes in Java strings: you need one escape layer for each language layer.
 