Bug - Fixed non-ASCII characters error

ckb

Minion
Staff member
From my script:
Changing "Sweetbreads Flambe" to "Sweetbreads Flambé" would get rid of this message.

So I did that, and I get this:
Typed constant $effect[Sweetbreads Flambé] contains non-ASCII characters

I am confused.
 

Veracity

Developer
Staff member
Code:
1918	Fungal Flambé	pod_hot.gif	c48b0a26daedd9d432f728c7c2d9c221	#use hot spore pod in combat
2091	Sacré Mental	wine2.gif	8b5af0a2980a1144c725e3410b5e53e2	drink 1 Sacramento wine
2092	Sweetbreads Flambé	potion4.gif	b3f3d7e5c72a045fff556e01f287f42d	use 1 Greek fire
KoL uses the actual accented character rather than a character entity.

At least, one would think so, given the effect description file.

Getting the effect:

Code:
You acquire an effect: <b>Sweetbreads Flambé</b><br>(duration: 25 Adventures)
Yep. No character entity.

I'm not sure what to do about this. Pondering.
 

heeheehee

Developer
Staff member
Is there a reason why we forbid non-ascii characters in typed constants? Might be easier to just allow them.
 

Veracity

Developer
Staff member
We could allow UTF-8 characters, I suppose. The problem was when people wrote scripts using some sort of Microsoft charset and expected them to work correctly on machines for which that was not the default.

Code:
> ash to_effect( "Sacré Mental" ).to_string()

Returned: Sacré Mental

> ash to_effect( "Sacré Mental" ).to_string()

Returned: Sacré Mental

> ash to_effect( "Memento Moiré" ).to_string()

Returned: Memento Moiré

> ash to_effect( "Memento Moiré" ).to_string()

Returned: Memento Moiré
That's on OS X. I used "option-e e" to get the Unicode character.

ASH uses this to parse effect names:

Code:
		effectId = EffectDatabase.getEffectId( name );
and then saves the "normalized" name of the effect in the Value:

Code:
		name = EffectDatabase.getEffectName( effectId );
		return DataTypes.makeNormalizedEffect( effectId, name );
EffectDatabase will look up the effectId using the "canonical" name, which will entityEncode UTF-8 characters.

I expect that all of the above will fail if you give it a string with characters encoded in a Windows charset.
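
To make the "canonical name" lookup concrete, here is a minimal sketch. It is not EffectDatabase's actual code: the class, the tiny entity table, and both method names are hypothetical stand-ins. It only illustrates why a correctly decoded "é" can match a stored "&eacute;" while a mis-decoded character cannot.

```java
import java.util.Locale;
import java.util.Map;

public class CanonicalNameSketch {
    // Hypothetical, deliberately tiny entity table; a real one would be much larger.
    private static final Map<Integer, String> NAMED =
        Map.of(0xE9, "&eacute;", 0x2122, "&trade;");

    // Replace every non-ASCII code point with a named entity if we know one,
    // otherwise with a numeric entity.
    static String entityEncode(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append(NAMED.getOrDefault(cp, "&#" + cp + ";"));
            }
        });
        return out.toString();
    }

    // Assumed canonical form: entity-encoded, then lower-cased.
    static String canonical(String name) {
        return entityEncode(name).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // A properly decoded e-acute maps onto the entity form.
        System.out.println(canonical("Memento Moir\u00e9"));
        // U+FFFD (what a failed decode leaves behind) maps onto something no data name uses.
        System.out.println(canonical("Memento Moir\uFFFD"));
    }
}
```

With this sketch, the first call yields a name ending in "&eacute;" and the second a name ending in a numeric entity for U+FFFD, so only the first could match an entity-encoded data name.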

I dunno. We have considerably changed how ASH makes item and effect objects since back when we disallowed non-ASCII characters.

I'd be curious to see if the example I cited works on a Windows system these days. Do modern versions of that OS still use their own special character encoding?
 

Veracity

Developer
Staff member
Code:
> ash $effect[ sac men ].to_string()

Changing "sac men" to "Sacré Mental" would get rid of this message. ()
Returned: Sacré Mental

> ash $effect[ mem moir ].to_string()

Changing "mem moir" to "Memento Moiré" would get rid of this message. ()
Returned: Memento Moiré
Note that the "Changing" message does give you the actual data name of the effect.

Code:
> ash $effect[ memento moiré ].to_string().contains_text( "acute" )

Typed constant $effect[memento moiré] contains non-ASCII characters ()
Returned: void

> ash to_effect( "memento moiré" ).to_string().contains_text( "acute" )

Returned: true
Just to verify that when we construct the "effect" object we store the actual data name in it, regardless of the string you passed in.

How do we read script files? Parser.java:

Code:
			this.istream = DataUtilities.getInputStream( scriptFile );
...
			this.commandStream = new LineNumberReader( new InputStreamReader( this.istream, "UTF-8" ) );
We try to read it in UTF-8. Which makes sense, since we are reading it into Java strings.

I don't know how that works if your file was not saved in UTF-8. Or ASCII.

I suspect that the ASCII check is no longer helpful.
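
For what it's worth, here is a small standalone sketch of what happens when bytes saved in a single-byte charset are decoded as UTF-8. It does not use Parser.java or DataUtilities; the class and method names are made up for illustration. ISO-8859-1 stands in for Cp1252 here because "é" is the same byte (0xE9) in both.

```java
import java.nio.charset.StandardCharsets;

public class MismatchedDecodeDemo {
    // Decode raw bytes as UTF-8, much as an InputStreamReader( stream, "UTF-8" ) would.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "Sacr\u00e9 Mental";

        // The same name as it would sit on disk in two different encodings.
        byte[] savedAsUtf8 = name.getBytes(StandardCharsets.UTF_8);
        byte[] savedAsLatin1 = name.getBytes(StandardCharsets.ISO_8859_1);

        // UTF-8 bytes read as UTF-8 round-trip cleanly.
        System.out.println(decodeAsUtf8(savedAsUtf8).equals(name));

        // A lone 0xE9 is not valid UTF-8, so the decoder substitutes U+FFFD.
        String garbled = decodeAsUtf8(savedAsLatin1);
        System.out.println(garbled.indexOf('\uFFFD') >= 0);
    }
}
```

So a script saved in a Windows charset would still be read without an exception, but the accented character would come through as the replacement character and the name would no longer match.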
 

Veracity

Developer
Staff member
Revision 17680:

Code:
> ash $effect[ Sacré Mental ].to_string().contains_text( "acute" )

Returned: false

> ash $effect[ Memento Moiré ].to_string().contains_text( "acute" )

Changing "Memento Moiré" to "Memento Moir&eacute;" would get rid of this message. ()
Returned: true
The previous discussion involved a script saved on Windows with a "tm" symbol in an item name, rather than a character entity - the Newbiesport™ tent, I believe - which did not parse successfully on OS X. I'm curious about whether that would parse successfully on Windows any more, since reading it in UTF-8 would result in an item name which I would expect to not match successfully.
 

fronobulax

Developer
Staff member
I am not following this closely but...

r17681 on Windows X

Code:
> ash $effect[ Sacré Mental ].to_string().contains_text( "acute" )

Bad effect value: "Sacr� Mental" ()
Returned: void

> ash $effect[ Memento Moiré ].to_string().contains_text( "acute" )

Bad effect value: "Memento Moir�" ()
Returned: void
 

Veracity

Developer
Staff member
If you use a non-UTF-8 character on input, we will not match an item, and you will get an error.
That is because no items have non-UTF-8 characters in them (although some have character entities) and our UTF-8 to character entity conversion function cannot convert the non-UTF-8 character to an expected entity.

I'd say this is working as intended. Thanks for testing!
 

Veracity

Developer
Staff member
So, how are you entering the é character?
What is your system default character encoding?
We read the command input in a JTextField using getText().

Revision 17683 adds a line to the Version info panel you get from the "About KoLmafia" menu item.
Tell me what it says about your "Default system file encoding".

Google on JTextField and UTF-8 has various comments. One says this:

Java uses the default encoding for your computer, which for Windows would be C16 and doesn't support unicode. Run your program with the following command in terminal:

Code:
java -jar -Dfile.encoding=utf-8 <path to your .jar>
Try this, tell me what the version info says, and tell me how your gCLI command line works with the é character.
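
If you want to check the same thing outside KoLmafia, a minimal sketch like the following reports what the JVM thinks its default is. The class and method names are made up; `Charset.defaultCharset()` and the `file.encoding` system property are standard Java.

```java
import java.nio.charset.Charset;

public class DefaultEncodingReport {
    // Build a one-line summary of the JVM's default encoding settings.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
            + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run once normally and once with -Dfile.encoding=utf-8 to compare.
        System.out.println(describe());
    }
}
```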
 

fronobulax

Developer
Staff member
r17684

It says my default file system encoding is Cp1252 running normally.

Running with "java -jar -Dfile.encoding=utf-8 ..." says utf-8.

When I rerun the "samples" with utf-8 as the encoding, I get

> ash $effect[ Sacré Mental ].to_string().contains_text( "acute" )

Returned: false

> ash $effect[ Memento Moiré ].to_string().contains_text( "acute" )

Changing "Memento Moiré" to "Memento Moir&eacute;" would get rid of this message. ()
Returned: true

In all cases I copied the ash test lines from the post and pasted them into the gCLI.
 

Veracity

Developer
Staff member
CommandDisplayPanel.java:

Code:
		private void submitCommand()
		{
			String command = CommandDisplayPanel.this.entryField.getText().trim();
			CommandDisplayPanel.this.entryField.setText( "" );

			CommandDisplayPanel.this.commandHistory.add( command );

			CommandDisplayPanel.this.commandIndex = CommandDisplayPanel.this.commandHistory.size();
			CommandDisplayFrame.executeCommand( command );
		}
See where it calls getText() to get a String out of the entry field?

Yeah. So, when we do getText() out of a JTextField, we get characters in your default encoding.
If that is UTF-8, they are useable normally.
If they are Cp1252, they are unrecognizable.

Google tells me that you cannot change the default encoding once the JVM has started, which is why you have to specify it via -D when you start the .jar file.

I do not know if you can successfully take the characters and put them into a UTF-8 string. You can convert an array of bytes into UTF-8, using StringUtilities.getEncodedString( bytes, "UTF-8" );

Here is an interesting program.

Perhaps there is something that can be done via Java's "Charset" and/or "new String" to convert between whatever you got from getText() into Unicode or UTF-8 via a byte array or something.
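
As a rough sketch of the byte-array route, assuming you actually have the raw bytes and know which charset produced them, transcoding looks like this. The class and method names are hypothetical, and I have not verified that this helps with whatever getText() hands back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeSketch {
    // Decode bytes with the charset they were written in, then re-encode in the target charset.
    static byte[] transcode(byte[] raw, Charset from, Charset to) {
        return new String(raw, from).getBytes(to);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        // 0x99 is the trade mark sign in Cp1252.
        byte[] windowsBytes = { (byte) 0x99 };
        byte[] utf8Bytes = transcode(windowsBytes, cp1252, StandardCharsets.UTF_8);

        // Decoding the result as UTF-8 gives back U+2122.
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8).equals("\u2122"));
    }
}
```

The catch is the "know which charset produced them" part: once bytes have been decoded with the wrong charset and a character has been replaced, the original byte is gone and cannot be recovered this way.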

Good luck!
 

AlbinoRhino

Active member
Mine says Cp1252, whatever that means. And, after updating, I get this in the login messages:

net.sourceforge.kolmafia.swingui.AdventureFrame could not be loaded

and I'm no longer able to use the adventure frame at all. Is this related to this? And, if so (and keep in mind I barely understand what you guys are talking about), how do I fix it?
 

Veracity

Developer
Staff member
Here is an article about converting charsets.

If I understand correctly:

- Java's internal character encoding for Strings varies by system and can be changed at startup. You tried that.
- Java will read characters from a stream (for example, an open file or a network stream) using its default charset, unless you specify a different charset.
- DataUtilities (in net.java.dev.spellcast.utilities) will read files, by default, in UTF-8:

Code:
	public static BufferedReader getReader( final InputStream istream )
	{
		return DataUtilities.getReader( istream, "UTF-8" );
	}
although it can read them in other encodings.
- KoL sends us the "descriptions" for items and effects and such in UTF-8:

Code:
Retrieved: https://www.kingdomofloathing.com/desc_item.php?whichitem=113038072
12 header fields
Field: Transfer-Encoding = [chunked]
Field: null = [HTTP/1.1 200 OK]
...
Field: Content-Type = [text/html; charset=UTF-8]
- Therefore, we read responses from KoL in UTF-8. GenericRequest.java:

Code:
		this.responseText = new String( ByteBufferUtilities.read( istream ), "UTF-8" );
- And therefore, in KoLmafia's data files, where we store data scraped from KoL, we store UTF-8 encoded characters.
- Additionally, when we read ASH scripts, we read them in UTF-8. Parser.java:

Code:
			this.commandStream = new LineNumberReader( new InputStreamReader( this.istream, "UTF-8" ) );

Therefore, your issue seems to be this:

- Your system's default character encoding is not UTF-8.
- Therefore, the text in a JTextField (the command line in the gCLI) is in that encoding.
- And also therefore, the JEditorPane (the display area in the gCLI) is in that encoding.
- When you type an "ash" command in the gCLI, you end up with a non-UTF-8 String and when ASH parses it, it compares against UTF-8 encoded strings for $effect and $item.

We cannot force you to use UTF-8 as your Java default encoding.

- ASH files and KoLmafia datafiles and ASH map_to_file/file_map use UTF-8
- The command line from the gCLI may or may not be UTF-8

Therefore, perhaps the "ash" CLI command should try to do some sort of character conversion on the input string before passing it to an ASH Interpreter.
 

Veracity

Developer
Staff member
I can't say whether the AdventureFrame failure is related or not.

You should have a DEBUG file with a stack trace. If not, you'll see a stack trace on the console.

Post it, please.

Sorry for the inconvenience.
 

AlbinoRhino

Active member
Please disregard my last post. I figured out what was wrong. Entirely my own fault. I did have a debug and it informed me that I still had the property addingScrolls set to -2 (index out of bounds). I do that because it is the only way I have figured out to disable mafia's automated fighting of the adding machine and allow the fight to use my own fight script. I thought I had code to reset the prop to something in bounds after I get my facsimile dictionary, so I will have to look at that. Working again now. Sorry for the false alarm.

Here it is:
 

Attachment: DEBUG_20170116.txt

Veracity

Developer
Staff member
We certainly shouldn't fail to load your GUI because a preference has a bad value. I'll see what I can do to make that not happen.

Code:
		int adding = Preferences.getInteger( "addingScrolls" );
		if ( adding == -1 )
		{
			adding = Preferences.getBoolean( "createHackerSummons" ) ? 3 : 2;
			Preferences.setInteger( "addingScrolls", adding );
		}
		this.addingSelect.setSelectedIndex( adding );
Heh.
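
One possible way to keep a bad preference value from breaking the panel is to range-check it before using it as an index. This is only a sketch of the idea, not the fix that went into KoLmafia; the class name, method name, and fallback choice are all made up.

```java
public class SafeIndexSketch {
    // Return the stored value if it is a valid index for a list of itemCount entries,
    // otherwise return the fallback.
    static int safeIndex(int stored, int itemCount, int fallback) {
        return (stored >= 0 && stored < itemCount) ? stored : fallback;
    }

    public static void main(String[] args) {
        int itemCount = 4; // the combo box above has four entries

        // A hand-edited value of -2 falls back instead of being passed to setSelectedIndex.
        System.out.println(safeIndex(-2, itemCount, 0));
        // A valid stored value is used unchanged.
        System.out.println(safeIndex(3, itemCount, 0));
    }
}
```

The result of safeIndex would then be what gets passed to setSelectedIndex, so a value like -2 selects the fallback entry instead of stopping the frame from loading.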
 

Veracity

Developer
Staff member
It is odd. It is not in defaults.txt and is not used anywhere in KoLmafia except that single line.
 

Veracity

Developer
Staff member
Code:
		this.addingSelect = new JComboBox();
		this.addingSelect.addItem( "show in browser" );
		this.addingSelect.addItem( "create goal scrolls only" );
		this.addingSelect.addItem( "create goal & 668 scrolls" );
		this.addingSelect.addItem( "create goal, 31337, 668 scrolls" );
Apparently if you have that undocumented preference set to true, and have manually set "addingScrolls" to -1, it will select the last option, otherwise the next to last option.

That seems like a hack.
 