Feature ASH language server features

MCroft

Developer
Staff member
To the extent that ash features are more static than dynamic, why integrate into mafia to keep things up to date? Why not a separate utility or code base that looked at mafia source and updated the spec in response to observed changes? How does LSP in general handle externals (in an IDE context)? I can imagine and IDE that knew about zlib but is that knowledge derived dynamically based upon where, or if, zlib is in my script file system?

Presumably if this were done right there would be a plugin that could be created that would extend language support in IntelliJ or Eclipse to ash?
So, I think there are a few ways to approach where the server "lives", all of which are likely to change the project, but only some of which may change the KoLmafia jar. Fred will want to talk about his goals, constraints, and what he's learned.
  1. Build the LSP server into KoLmafia, and start it with a different CLI flag (similar to --CLI, but "--LSP" or some such.)
  2. Build the LSP server into KoLmafia and always be listening to a new port for requests.
  3. Add a 2nd Java app to the KoLmafia source and make a build target for it.
  4. Create an ASH definitions exporter for KoLmafia that creates a definitions flat file for a completely external program
or many other approaches. Will leave my opinions out of this post. :)
 
Last edited:

fronobulax

Developer
Staff member
To use the LS, you open a script file in an editor. If this file uses zlib, there will be a line "import <zlib.ash>;". Mafia reads that, figures out where it should import zlib from, and includes all of zlib's functions at the top level. You can then use these functions in your script. Similarly, the LS looks at the file, sees the line "import <zlib.ash>", finds the zlib file, and marks all the zlib functions as available and coming from zlib (and what parameters they take, etc.). None of this is hardcoded: it can work for any file you choose to import.

Both IntelliJ and Eclipse have LSP clients, so after this is done you should be able to create a plugin that lets them communicate with the LS, yes.

Got it. I suspect we would have gotten here a lot less verbosely if the original post had said "Developing a LSP for ash with expected initial use in IDEs to allow syntax highlighting, autocompletion features and perhaps other features".

I think making an IDE for ash or an existing IDE ash-aware is a reasonable idea although not something I am likely to use.
 

fronobulax

Developer
Staff member
So, I think there are a few ways to approach where the server "lives", all of which are likely to change the project, but only some of which may change the KoLmafia jar. Fred will want to talk about his goals, constraints, and what he's learned.
  1. Build the LSP server into KoLmafia, and start it with a different CLI flag (similar to --CLI, but "--LSP" or some such.)
  2. Build the LSP server into KoLmafia and always be listening to a new port for requests.
  3. Add a 2nd Java app to the KoLmafia source and make a build target for it.
  4. Create an ASH definitions exporter for KoLmafia that creates a definitions flat file for a completely external program
or many other approaches. Will leave my opinions out of this post. :)

I really like 4 for obvious reasons.

If the stated goal was a source level debugger for ash I'm not sure it could be done without adding a server in mafia or replicating the ash environment standalone. The former is likely to be more reliable and easier to use but potentially intrusive.
 

ckb

Minion
Staff member
FWIW, I am an ASH scripter, but not a programmer. I have little to no formal coding education, so most of this goes over my head. That said, I write a lot of ASH scripts, using Notepad++. I am particular, so I write my own language definition.
Option #4 would be the least intrusive, and also allow the possibility for those like myself to parse this definitions file to create their own language files using a custom parser.
 

gausie

D̰͕̝͚̤̥̙̐̇̑͗̒e͍͔͎͈͔ͥ̉̔̅́̈l̠̪̜͓̲ͧ̍̈́͛v̻̾ͤe͗̃ͥ̐̊ͬp̔͒ͪ
Staff member
I think we should do version 1 (except enabled with a flag in settings rather than with a CLI flag). You can get version 4 fairly easily as a function of version 1 (and there seem to be many attempts at writing an LSP client for notepad++)
 

fredg1

Member
Edit: this patch is now obsolete.

Here's the first step, which also illustrates the kind of changes I was talking about
Not clear why mafia has to be changed to support that but that could be me interpreting questions as Requests For Change.


Mafia currently does not care about anything in the original files other than the row at which each line was found. This is an attempt at making it save both the row and the column it is at.
The goal is to eventually make it so that each ParseTreeNode contains which row+column it is found at (start and end), which is the cornerstone of most LSP features (since this is basically the information that needs to be sent back to the client, most times).

With only a single exception, this patch does not change in any way how Parser works.

The exception:

The only exception is the two modifications at the start of ParseTypedConstant.
They should also not change Parser's behaviour in any way, but since I'm not the one who made this class, I'm going to leave open the possibility that I made a mistake, so I'm disclaiming this, just in case.

before:
When the parser saw a plural type token, it would re-write it as singular, then deletes the currently cached token, which causes it to get parsed again.

Now:
Instead of rewriting the whole line, the token itself is changed, both skipping a useless parsing, since this new token must be what would have been parsed anyway (this is where I "could" be wrong), and keeping the original line/text, keeping us synchronized with it.
 

Attachments

  • Parser.java.patch
    11.8 KB · Views: 2
Last edited:

fredg1

Member
The first version of the language server is pretty much ready to go, but I'm trying to apply @heeheehee 's advices, and will try to split this as much as possible.

Here is the first step.

This patch is about introducing two new classes to Parser.java: the Line, and the Token.

Currently, everything is just Strings. The line we are currently at is represented by a simple string, and the token we are currently at is represented by a simple string.
This patch replaces these by Line and Token, respectively.
The purpose of the Line is to save, and allow access to, the file as parsed. It holds reference to every other line, as well as the tokens that are inside of itself.
The purpose of the Token is to hold information relative to it. Namely, the index at which it starts and ends. (Those are not included in this patch. They will be added later.)

These information are crucial to the language server, as it is what allows the language server to tell the client where to put the red squiggles marking errors, for instance.

This patch 3300 lines long.

Q. That's big.
A. That's not a question.

Q. Why is it so big? (pompous assh--)
A. Did you mutter something? Anyway, yes, it's big. This is because, currently, Parser.java uses a lot of this:
Java:
"exampleStringLiteral".equals( this.currentToken() )
This is a null-safe way to call the `String.equals( String )` method.
However, since the `currentToken()` method now returns a Token, they need to be reversed. (The method cannot return null, so it's still null-safe)
Since these are peppered across Parser.java, and since .patch files add context lines above and below the changed line(s), it bloats the patch file.

Besides this, this patch only contains 3 major changes.
  1. Parser( File, InputStream, Map<File, Long> ) - Parser.parse() - Parser.importFile( String, Scope )
    The closing of the InputStream now happens inside the contructor, rather than after the parsing. This means that there no longer is a need to surround Parser.parseScope( Scope, Type, VariableList, BasicScope, boolean, boolean ) with a try-finally block.

  2. Sections that handled strings character-by-character: Parser.parseBasicScript() - Parser.parseString() - Parser.parseEscapeSequence( StringBuilder, int ) - Parser.parseTypedConstant( BasicScope ) - Parser.parsePluralConstant( BasicScope, Type ) - Parser.parseDirective( String )
    It was not possible to leave those out of this patch, given how they were interacting with fields that are no longer here.

    Most notable changes:
    • Handling of escape sequences was previously in Parser.parseString( BasicScope, Type ). This caused Parser.parseTypedConstant( BasicScope ) to call it to parse plural constants.
      This is no longer the case. Parser.parseString( BasicScope ) and Parser.parsePluralConstant( BasicScope, Type ) are now two different methods.
    • Handling of escape sequences is now in its own method, Parser.parseEscapeSequence( StringBuilder, int ), which is called by the two aforementioned methods when required.
  3. Methods in the "Tokenizer" section: Parser.currentToken() - Parser.nextToken() and the Line + Token classes.
    Obviously, that had to go somewhere...
 

Attachments

  • Parser_Tokens.patch
    88.9 KB · Views: 2

heeheehee

Developer
Staff member
currently, Parser.java uses a lot of this
For reference, grep counted 247 instances (typically one insertion and one deletion, so probably closer to a hundred instances):

Bash:
$ grep -cE '("\.equals\w* ?\( this\.(next|current)Token|(next|current)Token\(\)\.equals)' Parser_Tokens.patch
247

Some ideas for how to break this up even further:

- When reading through this, I see some scattered formatting fixes. A pure "reformat this file" change is super easy to review.

- It doesn't look like the existing currentToken() method can ever be null.

Java:
    private String currentToken()
    {
        // Repeat until we get a token
        while ( true )
        {
            // If we've already parsed a token, return it
            if ( this.currentToken != null )
            {
                return this.currentToken;
            }
            ...
            // Get the next token for consideration
            this.currentToken = this.currentLine.substring( 0, this.tokenLength( this.currentLine ) );
            ...
            // "/*" starts a comment which is terminated by "*/"
            if ( !this.currentToken.equals( "/*" ) )
            {
                return this.currentToken;
            }
        ...
}

Can we perhaps do all the reversing of "foo".equals( this.currentToken() ) in a standalone change? It seems safe enough to me.
 

fredg1

Member
For reference, grep counted 247 instances (typically one insertion and one deletion, so probably closer to a hundred instances):

Bash:
$ grep -cE '("\.equals\w* ?\( this\.(next|current)Token|(next|current)Token\(\)\.equals)' Parser_Tokens.patch
247
While I did say that the internal logic of Parser.nextToken() changed, it's return value did not (and it can, in both versions, still very well return null), so it shouldn't be included here.

- When reading through this, I see some scattered formatting fixes. A pure "reformat this file" change is super easy to review.
While it is true that it is very easy to review, it is also very easy to reject or ignore...
This goes back to https://kolmafia.us/threads/shout-spring-cleaning-shout-whitespaces.25975/
with https://kolmafia.us/threads/shout-spring-cleaning-shout-whitespaces.25975/#post-161891 specifically hammering home the point that a few formatting edits across a patch on a specific file is preferred over a patch centered specifically around formatting...

Then there was my many other attempts throughout April that were just ignored, even though there were literally just an attempt at continuing the work of Veracity
(https://kolmafia.us/threads/20300-m...eliminate-lots-of-unchecked-references.25314/ ,
https://kolmafia.us/threads/20302-eliminate-deprecated-explicit-boxing.25316/ ,
https://kolmafia.us/threads/20304-fix-hundreds-of-compiler-warnings-2267-remain-using-java-14.25319/ ,
https://kolmafia.us/threads/20306-e...g-a-static-methid-as-if-it-is-an-insta.25321/ ,
https://kolmafia.us/threads/20308-fix-compiler-warnings-in-most-of-ash-and-some-cli-commands.25326/ ,
https://kolmafia.us/threads/20309-resolve-some-compiler-warnings.25327/ (this one's by fronobulax),
https://kolmafia.us/threads/20313-f...ange-frame-render-items-names-prettier.25334/ )

So I don't know if you view this differently, but from what I saw so far, patches that are centered around formatting changes just get ignored.

- It doesn't look like the existing currentToken() method can ever be null.

Java:
    private String currentToken()
    {
        // Repeat until we get a token
        while ( true )
        {
            // If we've already parsed a token, return it
            if ( this.currentToken != null )
            {
                return this.currentToken;
            }
            ...
            // Get the next token for consideration
            this.currentToken = this.currentLine.substring( 0, this.tokenLength( this.currentLine ) );
            ...
            // "/*" starts a comment which is terminated by "*/"
            if ( !this.currentToken.equals( "/*" ) )
            {
                return this.currentToken;
            }
        ...
}

Can we perhaps do all the reversing of "foo".equals( this.currentToken() ) in a standalone change? It seems safe enough to me.
Sure, give me a minute...
 

heeheehee

Developer
Staff member
So I don't know if you view this differently, but from what I saw so far, patches that are centered around formatting changes just get ignored.
That's fair. I personally don't agree with that, but I see frono's point that you said you made these changes manually, and manual edits always have the potential to inject errors.

It should not be controversial to make a formatting change via some automated tool, whether that's clang-format or idea or eclipse or something else, assuming that it enforces some agreed-upon style.

I think it absolutely should not be controversial to make a pure formatting change as one of the patches in a broader refactor to make later ones easier to review, even if it's not committed in isolation. Again, communicating where we're going, and all that.
 

heeheehee

Developer
Staff member
Hm, looks like you fixed a bug in parseJavaFor:
Java:
while ( this.currentToken() != null && !this.currentToken.equals( ";" ) )
Nice.

Overall looks good to me by inspection, but this is where I wish we had some actual tests to confirm that behavior isn't breaking.

Took it for a quick spin, and nothing seems to have broken. Committing momentarily.
 

fredg1

Member
Hm, looks like you fixed a bug in parseJavaFor:
Java:
while ( this.currentToken() != null && !this.currentToken.equals( ";" ) )
Nice.
This isn't actually a bug. this.currentToken() also sets the value of this.currentToken, so since no reading happens between the two, it is indeed the same (since this.currentToken() will simply return the value of this.currentToken )
It's just a better idea to keep relying on the method consistently.
 

fredg1

Member
Now, here is the conversion towards the Token.

I came back and removed the formatting changes, as well as putting the null checks back (even though they are still pointless), so that they can be changed in a later patch. This patch is really the bare minimum. (minus *ONE* typo fix. Really couldn't resist ._.)
 

Attachments

  • Parser_Tokens_2.patch
    55.6 KB · Views: 1

heeheehee

Developer
Staff member
This isn't actually a bug. this.currentToken() also sets the value of this.currentToken
Not always. I'm looking at the second if-statement here:
Java:
private String currentToken()
    {
        // Repeat until we get a token
        while ( true )
        {
            // If we've already parsed a token, return it
            if ( this.currentToken != null )
            {
                return this.currentToken;
            }

            // Locate next token
            this.fixLines();
            if ( this.currentLine == null )
            {
                return ";";
            }

In particular, if currentLine is null at this point (e.g. because of reaching end-of-file), we would have encountered a NPE. (Note that fixLines() explicitly sets this.currentToken to null at the get-go). I can trigger it with a file containing just "for(".
 

fronobulax

Developer
Staff member
Manual formatting changes can introduce errors.

Automated formatting changes to an agreed upon standard should be applied to every code file before it is committed (or presented for review).

A separate patch that creates tests and thus can be run before and after the "patch of interest" is committed is a good thing and would increase the likelihood that a large or complex patch will be reviewed and committed.

The most rigorous organization I worked for used a commit process (as opposed to a single command, but the process appeared atomic to the developer) that automatically formatted source files and computed a test coverage metric. If the commit lowered the coverage metric it was rejected and the developer either wrote more/better tests or petitioned the Configuration Manager for an exception.

The last time I recall an attempt to agree upon formatting there were areas where the devs agreed to disagree and no one could figure out how to allow a tool to let alternative formats coexist. Things have changed and if there is someone who wants to manage the process of getting agreement and configuring the tool that would be fine with me.
 

heeheehee

Developer
Staff member
Frono: again, I agree that manual formatting changes can cause errors. I don't think investing in tooling on this front is worth the effort right now, and besides, we didn't have consensus, last I checked.

Back to the patch:
Diff:
@@ -622,23 +612,23 @@
             }
 
             // Get the field name
-            String fieldName = this.currentToken();
+            Token fieldName = this.currentToken();
             if ( fieldName == null )
             {
                 throw this.parseException( "Field name expected" );
             }
 
-            if ( !this.parseIdentifier( fieldName ) )
+            if ( !this.parseIdentifier( fieldName.content ) )

Should this first check be if (fieldName.content == null) ?

Parser.parseIdentifier can and will throw an NPE if its argument is null, as I determined when I was trying to reproduce the parseJavaFor bug (and instead stumbled across a different extant one in parseFor)...

My brain's a bit fried from reviewing code all day, but given that there seems to be a bug here, I'm hesitant to move forward without at least some tests in place.

The CustomScriptTest I set up once upon a time may be useful here, since you can just create some .ash files and make sure they have reasonable outputs (parsing errors or otherwise, but ideally not "Unexpected error, debug log printed").

An example test for the NPE I referenced earlier:

test/root/scripts/test_java_for.ash:
Code:
for(
test/root/expected/test_java_for.ash.out:
Code:
Variable reference expected (test_java_for.ash, line 1)
 

heeheehee

Developer
Staff member
Okay. It looks like Token.content can never be null, as Token() always initializes it with either ";" or the result of some substring call. But do we have that same guarantee within Line.content (which is used to populate Token.content)? It doesn't look like it.

Either way, I don't trust myself to catch everything tonight. Sorry.
 

fredg1

Member
Okay. It looks like Token.content can never be null, as Token() always initializes it with either ";" or the result of some substring call. But do we have that same guarantee within Line.content (which is used to populate Token.content)? It doesn't look like it.

Either way, I don't trust myself to catch everything tonight. Sorry.
As I said, I only left those here to make the reviewing easier. These will either be removed, or be replaced with a check that actually helps in a later patch.
 
Top