Feature ASH language server features

fredg1

Member
@heeheehee thought on where to go next.

I just made Token extend Range, and exposed the Location-related methods, so the next step would intuitively be to move on with giving a Location to the various parsetree classes.

However, as it has been pointed out, testing for these Locations was asked.
Now, while it "would" be possible to make those tests inside ParserTest.java, doing so would result in an *enormous, chaotic mess*.

The logical conclusion would be that it would be preferable to first split Parser's parsing methods between textui/parsetree's classes.

Now, since Parser is in a different package from textui/parsetree, doing so will inevitably lead to methods being made public.
Obviously, this means that adding tests will be required, which, let me say... My god, I will cling to any hope there is, however small it is, that there's a way to not have to do that.

So, what *could* we do to not have to do that? (given that it needs to be more than "because I asked gently")
An idea I had was to give two flags to Parser instances.
The flags would go along the lines of "contentParsed" and "parsingInProgress". The latter would be set to true once Parser.parse is called (unless contentParsed is true, in which case it throws an IllegalStateException), and to false before exiting that same method (while also setting contentParsed to true), with no way of modifying them anywhere else.

Then, every method that is forced to be public because of this change, but we don't actually want to be publicly available, would start with
Java:
public Expression parse(final Parser parser, <other parameters>) {
    Parser.checkState(parser);
   [...]
}
with that method reading
Java:
class Parser {
    private boolean contentParsed = false;
    private boolean parsingInProgress = false;
    [...]
    private boolean isParsing() {
        return this.parsingInProgress;
    }

    public void checkState(final Parser parser) {
        if (parser == null || !parser.isParsing()) {
            throw new IllegalStateException("Parsing method accessed outside parsing logic");
        }
    }
    [...]
}

Since parsing is done all in one go, this would effectively allow us to deny access to those methods from outside the parsing logic, making them safe.

Thoughts?
 

heeheehee

Developer
Staff member
So, what *could* we do to not have to do that?
Prove to me that your code is correct.


As far as your suggestion: I don't like the idea of calling parser.checkState() outside of Parser. That's error-prone and easy to ignore.

Now, since Parser is in a different package from textui/parsetree
(You could change this. I'm not convinced that's better.)
 

fredg1

Member
Prove to me that your code is correct.
Not sure I get you. Is this supposed to be a "you're meant to understand that it would be harder than writing tests" kind of thing?
Are you implying it's feasible?

I don't get how it would be possible without, you know, tests... Additionally, you mentioned that tests are also meant to define a method's behaviour...
 

heeheehee

Developer
Staff member
Not sure I get you. Is this supposed to be a "you're meant to understand that it would be harder than writing tests" kind of thing?
Are you implying it's feasible?
For simpler cases, it's absolutely feasible to prove correctness of code, e.g. via an exhaustive proof (for instance, checking all inputs). In the general case, this isn't as easy, which is usually why we resort to tests that define edge-case behavior as well as behavior in the happy path.

If you want to write code without tests, especially if it touches something as central as ASH, I would like to be totally convinced without a shadow of a doubt that your code is totally correct, can never crash, does not break any existing scripts, etc.

I strongly prefer that you provide tests to make sure that we don't break any piece of this behavior with future changes, but if you're fine with only convincing me that it's correct today, and are fine with the possibility that it could break in the future...
 

fredg1

Member
O.k., so... error locations are not now fully implemented...
The next question isn't where to go next, it's how should we get there.

The next step is to make it so that we always parse up to the end of the file, regardless of errors encountered on the way.
Now, this would be a pretty big change, which should preferably be split into a few PRs. However, doing this would come with a lot of risk.

I *could* come up with something that would glue together these intermediary steps (as in allow Parser to function with part of it throwing Exceptions, and part of it storing them). However, I can't guarantee that it's safe. I can't guarantee that I won't miss an infinite loop or a null pointer exception, all in the name of "making review easier".

So, the question is: do you insist that I split this into multiple PRs, or would it be okay to submit the larger one? (I'm still not submitting the whole step as one, but at least, the initial, under-the-hood transition would be all done at once)
 
Last edited:

fronobulax

Developer
Staff member
O.k., so... error locations are not fully implemented...
The next question isn't where to go next, it's how should we get there.

The next step is to make it so that we always parse up to the end of the file, regardless of errors encountered on the way.
Now, this would be a pretty big change, which should preferably be split into a few PRs. However, doing this would come with a lot of risk.

I *could* come up with something that would glue together these intermediary steps (as in allow Parser to function with part of it throwing Exceptions, and part of it storing them). However, I can't guarantee that it's safe. I can't guarantee that I won't miss an infinite loop or a null pointer exception, all in the name of "making review easier".

So, the question is: do you insist that I split this into multiple PRs, or would it be okay to submit the larger one? (I'm still not submitting the whole step as one, but at least, the initial, under-the-hood transition would be all done at once)

In the New World Order of Github, I have found I tend to review more promptly when
  • The number of files is small, preferably less than 5
  • I have some understanding of what the author is trying to do OR
  • The PR has an understandable statement of intent
  • I think it is a good or necessary change
Your next step misses most of those :)

I will defer to @heeheehee since they have done most of the heavy lifting in this area.

O.k., so... error locations are not fully implemented...

Typo - not should be now?

The next step is to make it so that we always parse up to the end of the file, regardless of errors encountered on the way.

Why is this a requirement? Specifically if meeting the requirement increases the possibility of doing something "unsafe" is the better solution to relax the requirement?

I know a good well behaved parser for a language with a firm grammatical foundation will parse the entire file and generate a series of errors, perhaps all with a common cause. But I have also lived with parsers that bail at the first error. As the writer of the code being parsed I accept that there is a choice between having to make multiple runs of the parser to eliminate syntax errors and making one run of the parser but being faced with a page full of errors all of which have the same, single cause. Sometimes I prefer the former. My preference does vary with the cost of parsing. When the turn around time for a compile job was measured in hours I wanted all the errors flagged. But when it is measured in seconds one error at a time is acceptable and is easier to manage.

I can't guarantee that I won't miss an infinite loop or a null pointer exception, all in the name of "making review easier".

Clarify, please? Is the loop or NPE you fear in KoLmafia itself and the result of parsing or is it something bad that a script could do?

If the former then I might respectfully suggest the parsing implementation is flawed. You could convince me that the problem is the desire to have a small number of files in a review that causes the problem, but my counter would be to see if we can live with "unsafe" for a week or two while multiple patches work their way through the system.

I also wonder how much tests could increase safety and confidence? Perhaps step one would be to add tests (scripts?) that explicitly test for the unsafe conditions you are concerned with. They could fail now and be disabled until the work was complete and they passed.
 

fredg1

Member
Typo - not should be now?
...yes, typo. "Now".

Why is this a requirement? Specifically if meeting the requirement increases the possibility of doing something "unsafe" is the better solution to relax the requirement?
Clarification: parsing up to the end of the file *will be* fully safe. I *know* I have plugged all these unsafe holes on my side.
The worries come from the transition to that state.

Making small PRs means that I shouldn't commit all of it at once.
What this means is that I may select changes in a given method for a PR, but miss that these changes have impact in other methods, and that I should have also selected some of the changes in these methods, too.

The safety worries *only* come from having to split all of this into multiple PRs.

Clarify, please? Is the loop or NPE you fear in KoLmafia itself and the result of parsing or is it something bad that a script could do?

If the former then I might respectfully suggest the parsing implementation is flawed. You could convince me that the problem is the desire to have a small number of files in a review that causes the problem, but my counter would be to see if we can live with "unsafe" for a week or two while multiple patches work their way through the system.
Let's clarify this with an example:

Most
Java:
while (true) {
   
}
will turn into this:
Java:
Position previousPosition = null;
while (this.madeProgress(previousPosition, previousPosition = this.getCurrentPosition())) {
   
}
which is logic made specifically to prevent said infinite loops.

Now, this change happens all over Parser.java. However, in order to make short PRs, I would only submit some of them at a time.
The worry is that I may not submit one when needed.

The solution could be "just submit all of them at once, just to be sure", well, that's exactly what I'm arguing for here.

I also wonder how much tests could increase safety and confidence? Perhaps step one would be to add tests (scripts?) that explicitly test for the unsafe conditions you are concerned with. They could fail now and be disabled until the work was complete and they passed.
The issue here is that all the "unsafe conditions" are related to what happens *after* a script gets an error.
This would mean that in order to test these unsafe conditions, we would need to append every possible error state to a script that already got an error state.
We have 247 of those.
247^2 = 61009 different tests
 
Last edited:

fronobulax

Developer
Staff member
OK. How big is this patch? How long can you wait for a review before you get concerned because you can't proceed to your next step while this is pending.
 

fredg1

Member
Well... given that I *know* that it's a lot to review, I guess I can deal with the fact that it could take, let's say, 4-5 days... (given the fact that it's all voluntary/hobby work)
 

heeheehee

Developer
Staff member
I will defer to @heeheehee since they have done most of the heavy lifting in this area.
I've been busy as of late (thanks frono + gausie for picking up my slack in reviewing stuff), but I could potentially take some quick passes at the code.

I am uninterested in the full cross product of errors unless each of those is uniquely meaningful (as opposed to 247 cases for the first error, and 247 cases for the second error).

To repeat frono's question: how big is this patch? Are we talking 1000 lines? 10k? The size of the patch is an input to how long it takes to review the code, as well as the complexity of the patch (e.g. if it's 10k lines of deleting empty newlines, that'll be faster to review than 500 lines of dense, terse code.)
 

fredg1

Member
I've been busy as of late (thanks frono + gausie for picking up my slack in reviewing stuff), but I could potentially take some quick passes at the code.

I am uninterested in the full cross product of errors unless each of those is uniquely meaningful (as opposed to 247 cases for the first error, and 247 cases for the second error).

To repeat frono's question: how big is this patch? Are we talking 1000 lines? 10k? The size of the patch is an input to how long it takes to review the code, as well as the complexity of the patch (e.g. if it's 10k lines of deleting empty newlines, that'll be faster to review than 500 lines of dense, terse code.)
The vast, vast majority of the patch is just a switch of the method used to make an error, and, in most cases, adds a single line afterwards to allow us to keep using non-null values.
example:
Java:
    Evaluable condition = this.parseExpression(parentScope);

    if (condition == null) {
      Location errorLocation = this.makeLocation(this.currentToken());

      throw this.parseException(errorLocation, "Expression expected");
    }
becomes
Java:
    Evaluable condition = this.parseExpression(parentScope);

    if (condition == null) {
      Location errorLocation = this.makeLocation(this.currentToken());

      this.error(errorLocation, "Expression expected");

      condition = Value.locate(errorLocation, Value.BAD_VALUE);
    }

That additional line includes deciding to read or not the next token anyway, exiting loops, exiting the method...

This does mean that it affects every single place where we were throwing an exception in Parser.
 

heeheehee

Developer
Staff member
so... a ~550 line patch of a 2-line pattern, repeated throughout the code (okay maybe 800 with that empty newline in the middle)? Yeah, that's fine to send in one go.
 

fredg1

Member
Alright, it's nearly gonna be time to start submitting the Language Server itself.
Before that, though, there would be two issues (interconnected) that I have yet to solve: Communication and Path.

Path
These two issues start from a single point: KoLConstants.ROOT_LOCATION.
Java:
    public static final boolean USE_OSX_STYLE_DIRECTORIES = System.getProperty( "os.name" ).startsWith( "Mac" );
    public static final boolean USE_LINUX_STYLE_DIRECTORIES =
        !UtilityConstants.USE_OSX_STYLE_DIRECTORIES && !System.getProperty( "os.name" ).startsWith( "Win" );

    public static final File BASE_LOCATION = new File( System.getProperty( "user.dir" ) ).getAbsoluteFile();
    public static final File HOME_LOCATION = new File( System.getProperty( "user.home" ) ).getAbsoluteFile();

    public static final File ROOT_LOCATION =
        Boolean.getBoolean( "useCWDasROOT" ) ?
        UtilityConstants.BASE_LOCATION :
        UtilityConstants.USE_OSX_STYLE_DIRECTORIES ?
        new File( UtilityConstants.HOME_LOCATION, "Library/Application Support/KoLmafia" ) :
        UtilityConstants.USE_LINUX_STYLE_DIRECTORIES ?
        new File( UtilityConstants.HOME_LOCATION, ".kolmafia" ) :
        UtilityConstants.BASE_LOCATION;

What this excerpt means:
If your setting "useCWDasROOT" it true AND/OR you are on windows, the value of KoLConstants.ROOT_LOCATION will be the directory from which this program (KoLmafia) was called.

KoLmafiaCLI.findScriptFile (which we use in Parser.importFile) then uses this value, enforcing that any file it returns must be going through this path.
Unless I'm unaware of an alternative, this means that the Server won't work properly if it isn't called from the target scripts' root directory.

Proposition 1: letting Parser.importFile have access to a variant of KoLmafiaCLI.findScriptFile which would let him use a different path as the root.

Communication
All the server needs to work is two things: an InputStream and an OutputStream.
These two requirements are easily filled by System.in and System.out if the client directly calls the Server. However, as seen above, it often won't work (unless the proposition is accepted. Even then, though, we might want to add an alternative to allow access to the server when KoLmafia is already running).

Proposition 2 (kinda):
I'm not really well versed on communication across programs, so it is likely that there's a better way, but what I thought was to use another executable that would act as a bridge. The server would first be ran manually by the user in the wanted directory, and the bridge would be what the client executes.
As for how the bridge and the server communicate, my best guess would be an IPV4 address/port, which means a Socket, but there's got to be a better way, right? Not being able to communicate prior to choosing the address/port means it would have to be fixed, meaning we can't afford landing on an address/port already used by another program (even another instance of the server)...

(I used the conditional tense for proposition 2, but it's actually what I went for for testing the server, i.e. this proposition was tested. I just recognize its flaws)

Proposition 3: we listen to the advice of someone with more knowledge on these topics than me.
 

heeheehee

Developer
Staff member
I think all the dev team members will agree that we want to keep the existing ban on Mafia reading from arbitrary directories, as that's definitely a potential attack vector. Why do you need arbitrary paths? Can you just return an informative error if a file isn't in the right directory? I don't feel that it's much of an imposition to tell script authors they can only use this against scripts they can also execute... for instance, do script authors actually check out their own scripts via SVN within Mafia?

Why can't Mafia run the server (dynamically launchable if necessary, similar to the relay [nobrowser] CLI command)? It's already running the RelayServer in a background thread, I can't imagine it's that bad to add a second one. And if you're using sockets, you'll have input / output streams.
 

fredg1

Member
Why do you need arbitrary paths? Can you just return an informative error if a file isn't in the right directory? I don't feel that it's much of an imposition to tell script authors they can only use this against scripts they can also execute...
That's because language clients often seem to ask for something they can execute
idea64_GcrAR28PsS.png
(pictured: IntelliJ)

However, as noted in the previous post, if *they* execute the jar, then the root's location will be based on where *they* execute it, so telling them (the users) that a file isn't in the right directory wouldn't help because they won't have a say on "what is" the right directory.

However, just making sure to reiterate that fixing this one issue is optional. You already made your point that allowing arbitrary paths was a no-no, and I don't intend to push further. I only answered the other points for informative purposes.
Fixing the second "issue" (communication) is a workaround for the path issue.


Why can't Mafia run the server (dynamically launchable if necessary, similar to the relay [nobrowser] CLI command)? It's already running the RelayServer in a background thread, I can't imagine it's that bad to add a second one.
100% possible (and was in fact envisioned as what we would do from the start). Proposition 1 was only submitted because it would have been the only way to not force us to build a second java archive.
(This is because, if mafia runs the server, it means it will already be running when the client wants to call it, meaning the client will need to call "something" else (a "bridge" of sort))

p.s. I say "build a second java archive", but this may just show how uneducated I am; is there a way to have the bridge be of a different language, for example, straight-up BASH? (it would still need to work cross-platform)

And if you're using sockets, you'll have input / output streams.
Yes, but how do we get the socket in the first place? (again, I'm just inexperienced in this field)
Is it only possible through IPV4/IPV6?
If so, do we need to just hard-code an address + port and hope the user never happens to be using that address+port?
AFAIK, we can't use a 'range', like mafia does with the relay server, because the bridge needs to also know in advance what address + port the server is using...
 
Last edited:

heeheehee

Developer
Staff member
100% possible (and was in fact envisioned as what we would do from the start). Proposition 1 was only submitted because it would have been the only way to not force us to build a second java archive.
(This is because, if mafia runs the server, it means it will already be running when the client wants to call it, meaning the client will need to call "something" else (a "bridge" of sort))
Why can't the client talk directly to this (hypothetical) secondary Mafia server?

p.s. I say "build a second java archive", but this may just show how uneducated I am; is there a way to have the bridge be of a different language, for example, straight-up BASH? (it would still need to work cross-platform)
Um, you could theoretically do that, but that sounds unfun (and no, bash wouldn't work for windows unless maybe if you're using something like WSL, but I don't really know how Windows works these days)

Yes, but how do we get the socket in the first place? (again, I'm just inexperienced in this field)
Is it only possible through IPV4/IPV6?
Unix domain sockets are a thing, but that's not part of the standard Java API until Java 16. Or part of Windows prior to Windows 10.

The typical approach is to create a server socket, bind it to an address + port, then listen + accept incoming connections (which creates per-socket connections that you'll actually read / write from), and dispatch them as appropriate (often to separate worker threads).

(Java allows you to combine the first two steps via the ServerSocket constructor)

If so, do we need to just hard-code an address + port and hope the user never happens to be using that address+port?
For the address, you can generally use a local loopback address (Mafia currently hardcodes 127.0.0.1, but really should use InetAddress.getLoopbackAddress()). Generally when you run a server, it binds to a specific port, which is a way of signaling to the OS that it should get packets destined to that port, and that nobody else should be able to bind to it.

AFAIK, we can't use a 'range', like mafia does with the relay server, because the bridge needs to also know in advance what address + port the server is using...
Mafia doesn't use a range for the relay server. It tries to bind to 60080, then to 60081 if something's already listening there, 60082, etc. A given Mafia instance is only bound to a single IP address + (TCP) port.
 

fredg1

Member
Why can't the client talk directly to this (hypothetical) secondary Mafia server?
Can it? I'm unaware of a way to connect to an already running application from the command line.

(disclaimer: the next answer takes as a premise that the answer to the previous question is "no". If the answer turns out to be "yes", feel free to ignore the text below, as it invalidates the need for the "bridge")
The typical approach is to create a server socket, bind it to an address + port, then listen + accept incoming connections (which creates per-socket connections that you'll actually read / write from), and dispatch them as appropriate (often to separate worker threads).

(Java allows you to combine the first two steps via the ServerSocket constructor)


For the address, you can generally use a local loopback address (Mafia currently hardcodes 127.0.0.1, but really should use InetAddress.getLoopbackAddress()). Generally when you run a server, it binds to a specific port, which is a way of signaling to the OS that it should get packets destined to that port, and that nobody else should be able to bind to it.
Yes, but how is the bridge supposed to know which port the server chose?

Mafia doesn't use a range for the relay server. It tries to bind to 60080, then to 60081 if something's already listening there, 60082, etc. A given Mafia instance is only bound to a single IP address + (TCP) port.
I was just referencing the fact that mafia aborts once it reaches port 60090, meaning it was trying port 60080 through 60090, which is what I called a "range".
 

heeheehee

Developer
Staff member
Can it? I'm unaware of a way to connect to an already running application from the command line.
... yes? You'd just tell the client to open a socket to localhost:60090. I don't think there's anything special about this.

You don't need a range; you could just make sure there's nothing else listening to 60090 (or just insist that if you want to use this thing, you must have that port available).
 

fronobulax

Developer
Staff member
Contributing from my usual ignorance and confusion.

My vision was that for the first iteration, a mafia user who wanted to use LSP downloaded and ran a separate jar file. If they didn't then the user was choosing to have no LSP support and that is fine.

So the LSP jar does not have to be subject to mafia's direction restrictions although it should probably honor them anyway to reduce the chance of an exploit. The user, in the process of configuring mafia to use the LSP sets an address and a port or more likely just a port on localhost.

Since the user has to take some actions, it could be required that the LSP has to be running before mafia is started. Probably only going to have LSP support for one instance of mafia initially anyway.

I am definitely coming at this from the standpoint that an LSP prototype will be extremely useful for refining the implementation. This is a lot of effort for what is basically one person's project and a useful prototype will do much for expanding the audience.
 
Top