Bug - Fixed "svn update" does not move files in local copy

philmasterplus

Active member
Edit on May 28th: This post has been rewritten to better summarize what is going on. To see the original post, check out the "spoiler" below.

KoLmafia r20716 here, running under AdoptOpenJDK 11.0.10+9 on Windows 10.

If I run "svn update" and the commits move file(s) to a different subdirectory, the files in the working copy are moved correctly, but those in the local copy are not.

Since this is difficult to reproduce cleanly, I'll provide instructions instead of providing a repo link:

  1. Create a new GitHub repository online. Let's say https://github.com/pastelmind/svn-update-repro
  2. Add a file in this repository: scripts/foo/some-dummy-script.ash. The contents of this file are unimportant.
    Commit and push changes to the repo.
  3. In KoLmafia, install this project: svn checkout https://github.com/pastelmind/svn-update-repro/trunk.
    • Result: A working copy is created under <kolmafia dir>/svn/pastelmind-svn-update-repro-trunk/.
      KoLmafia also adds a local copy of my script at <kolmafia dir>/scripts/foo/some-dummy-script.ash.
  4. Back to our repo, let's move the file to another dir: now it's in scripts/bar/some-dummy-script.ash
    Commit and push changes to the repo.
  5. Back in KoLmafia, run svn update. The gCLI prints this:

    Code:
    Validating repo...
    Repo validated.
    Updating pastelmind-svn-update-repro-trunk...
    C:\Users\Phil\Documents\KoL\svn\pastelmind-svn-update-repro-trunk
    A         https://github.com/pastelmind/svn-update-repro/trunk/scripts/bar
    A         https://github.com/pastelmind/svn-update-repro/trunk/scripts/bar/some-dummy-script.ash
    D         C:\Users\Phil\Documents\KoL\svn\pastelmind-svn-update-repro-trunk\scripts\foo
    https://github.com/pastelmind/svn-update-repro/trunk/scripts
    https://github.com/pastelmind/svn-update-repro/trunk
    At revision 5

    This is followed by a popup, to which I click "Yes":

    1620921758321.png

    Then KoLmafia prints out some more logs and completes:
    Code:
    Pushing local updates...
    some-dummy-script.ash => C:\Users\Phil\Documents\KoL\scripts\foo\some-dummy-script.ash
    Done.
    Update log for pastelmind-svn-update-repro-trunk:
    ------
    Commit r5:
    Author: pastelmind
    
    Move file to another dir
    ------
    Requests complete.
  6. Results:
    • In the working copy, the directory is successfully renamed: I have <kolmafia dir>/svn/pastelmind-svn-update-repro-trunk/bar/some-dummy-script.ash.
    • In the local copy, the file has not moved. It is still at <kolmafia dir>/scripts/foo/some-dummy-script.ash.

This can be fixed easily by moving the single file. However, my actual project has >20 files that need to be moved to multiple different subdirectories.
They are affected by this bug in varying degrees; some are moved, others are not.
Any ideas on how to handle this?

There are two issues at play:
  1. When svn checkout or svn update results in the creation of a new file(s) under scripts/ or relay/ (or any of their subdirectories), it searches recursively for a file with a matching name. If it finds a match, KoLmafia overwrites the matched file, instead of (re)creating the file in its intended location.

    This causes file name collisions when two projects provide files with the same name, but in different subdirectories.

    Note: As of r20743 KoLmafia will always write files to their correct paths when updating the local copy.
  2. When svn update results in the deletion of an entire directory, KoLmafia does not delete the directory in the local copy, leaving the files behind.

    This also occurs when an entire subdirectory is renamed or moved. Such a change may be represented by the remote Subversion repository as a pair of directory deletion and directory creation events. When updating the local copy, KoLmafia ignores the directory deletion event.
These issues affect the local copy--files that are called from the gCLI & the Script Menu, imported from ASH scripts, or served by the relay browser. It does not affect the working copies (i.e. files under svn/).
 
Last edited:

fronobulax

Developer
Staff member
Does scripts//bar/some-dummy-script.ash exist?

I have a vague recollection that mafia's SVN never expected to delete files that were not in scripts (i.e. scripts/foo) and the use case was there was no guarantee that the files in scripts/foo were exclusively under SVN control.

Renaming and moving can get dicey under pure SVN. Operations that could be atomic are not always.

I suspect your step 4 above is the issue.

If you break it up - create bar/something.ash, commit, delete foo/something.ash, commit do you get the results you want?

(You can't tell mafia SVN you are moving or renaming something. You can only get the effect by adding/creating the new and then deleting the old).
 

philmasterplus

Active member
Does scripts//bar/some-dummy-script.ash exist?
Nope. The file is still in the scripts/foo/ subdirectory, and the scripts/bar/ subdirectory is not created at all.

I tried several variations on your suggestion, but none of them worked.

(1) Create a copy of the file in another subdirectory​

  1. Create repository with scripts/foo/myfile.ash
  2. In KoLmafia, install with svn checkout https://github.com/...
  3. In repo, create a copy of the file at scripts/bar/myfile.ash. Commit.
  4. In KoLmafia, run svn update
Result:
  • Working copy has both scripts/foo/myfile.ash and scripts/bar/myfile.ash
  • Local copy only has scripts/foo/myfile.ash

(2) Create a copy of the file, then delete the original file​

  1. Create repository with scripts/foo/myfile.ash
  2. In KoLmafia, install with svn checkout https://github.com/...
  3. In repo, create a copy of the file at scripts/bar/myfile.ash. Commit.
  4. In repo, delete the original file at scripts/foo/myfile.ash. Commit.
  5. In KoLmafia, run svn update
Result:
  • Working copy has scripts/bar/myfile.ash
  • Local copy has scripts/foo/myfile.ash

(3) Delete the original file, then create a copy​

  1. Create repository with scripts/foo/myfile.ash
  2. In KoLmafia, install with svn checkout https://github.com/...
  3. In repo, delete the original file at scripts/foo/myfile.ash. Commit.
  4. In repo, create a copy of the file at scripts/bar/myfile.ash. Commit.
  5. In KoLmafia, run svn update
Result:
  • Working copy has scripts/bar/myfile.ash
  • Local copy has scripts/foo/myfile.ash



However, renaming the file AND moving it to a different subdirectory does work. I'm going to try renaming all my files and see if they are moved as intended.
 

MCroft

Developer
Staff member
I would like to put some breakpoints in SVNManager.java. Phil, thanks for the excellent step-by-step guide. I have vague hunches, but nothing I'd call a theory yet. I will let you know if I learn anything.


also, this is very old code, mostly copied from TMate's web site, and it has scary (to me) comments like this:
Code:
// no type-safe way to do this in Java 5 (6 has Deque)
Stack<SVNFileEvent> eventStackCopy = (Stack<SVNFileEvent>) SVNManager.eventStack.clone();
 

MCroft

Developer
Staff member
So we figure out file is updated and we need to push it with doPush(event.getFile(),ralpath). So far, so good...
event.getFile(): /path/to/svn/BadHorseMonkey-Mafia-Ashtest.git-trunk/scripts/qux/some-dummy-script.ash
Relpath: scripts/qux/some-dummy-script.ash

We go in, getRebase(newpath), and it searches for the file by name and replaces that path with oldpath.

In short, we don't support renaming parent directories. We could support it, but we forcibly refuse to do so.

I don't know if this is intended behavior, or if so, why it would be intended behavior.

Java:
private static void doPush( File file, String relpath )
{
       File rebase = getRebase( relpath );

       if ( rebase == null )
              // this is okay; just make the file in its default location
              rebase = new File( KoLConstants.ROOT_LOCATION, relpath );

       rebase.getParentFile().mkdirs();

       RequestLogger.printLine( file.getName() + " => " + rebase.getPath() );
       FileUtilities.copyFile( file, rebase );
}
private static File getRebase( String relpath )
{
       File rebase = new File( KoLConstants.ROOT_LOCATION, relpath );

       // scripts/ and relay/ exist in the searchable namespace, so only search if we're looking there
       // the root location is also in the namespace, but svn isn't allowed to put stuff there, so ignore it
       if ( relpath.startsWith( "scripts" ) || relpath.startsWith( "relay" ) )
       {
              List<File> matches = KoLmafiaCLI.findScriptFile( rebase.getName() );

              if ( matches.size() == 1 )
                     return matches.get( 0 );

              if ( matches.size() > 1 )
                     return null;
       }

       // some directories are not searched by findScriptFile, but we can just check if the rebase exists in those cases
       if ( rebase.exists() )
              return rebase;

       return null;
}
 

xKiv

Active member
If I undestand the issue ...
I don't think you can change this simply. IIRC the functionality that if I move the files within my scripts/ hierarchy, svn update will update the existing copies without recreating "originals", is intentional.
I am using this for script organization (under scripts/ and also in scripts menu). After installing, I move script files somewhere within my own hierarchy of subdirectories, by functionality type. I expect mafia to keep updating the moved files.
Anything that would handle this "correctly" would need to 1) detect that a file has been moved by an incoming svn change, and from where to where 2) find where the target file is under scripts/ 3) rename/move the target file relatively to where it actually lives
And that still wouldn't solve the case where the script has two files with the same name under different (sub)directories). That just might have to be a forbidden thing. (in fact, that might need to be validated across all installed scripts).
 

MCroft

Developer
Staff member
Yeah, I think xKiv's description is accurate. Even if it's not good behavior, it's well-established behavior, and changing it would have unintended consequences.

I think we're generally screwed if there are two files with the same name. That seems dangerous.
Edit: we just tell them "no".
Code:
if ( matches.size() > 1 )
{
       RequestLogger.printLine( "WARNING: too many matches for " + rebase.getName() +
              " in your namespace; no local files were updated." );

However we have the following eventStack:
0: update_delete directory foo
1: update_add directory qux
2: update_add file some-dummy-script.ash

And then we iterate over the stack in what looks like LIFO order (which is great! stacks are supposed to have problems with that). OTOH, it looks like this sequence needs to be executed in FIFO order. Delete dir foo, add dir qux, add file s-d-s-.ash...

So we're telling it to do the right things and failing.

I have a suggestion that might be allow xKiv's use as well as philmasterplus's. Two suggestions, really.
1: process items as a FIFO queue. We'd need to do some checking to make sure it was doing the right things.
2: process deletes before adds. So two passes through the stack (or Deque)...

The latter assures us that if the parent directory is deleted, it won't be found and rebased. the xKiv "I move the file" option still works, but so does the philmasterplus "I refactor my source" option.

Also, it dies a hard death if you change foo to bar, bar to baz and it tried to update_delete bar, because it's not there. So getting out of sync is kinda bad.

Rich (BB code):
VMDUMP032I JVM requested JIT dump using '/Users/mcroft/Library/Application Support/KoLmafia/jitdump.20210513.234630.7952.0004.dmp' in response to an event
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=000000000C6B96C0 Handler2=000000000C8E7860 InaccessibleAddress=0000000000000CE7
 
Last edited:

MCroft

Developer
Staff member
Or Idea #3, which may be even better: add all update_delete directories to the ignore list for rebase.
 

philmasterplus

Active member
I expect mafia to keep updating the moved files.

I understand the desire to move files around, but I wouldn't expect automatic updates to "just work" with that. Were I to make that decision, svn update would simply assume that the file was gone and recreate it.

However, I understand that we can't easily remove a long-established functionality.

@MCroft Thanks for your investigation. Does this mean that in general, it's bad for two projects to supply files with the same name? For example, if project A had relay/foo/code.js and project B had relay/bar/code.js, would installing A then B overwrite the first file? That would be a much more troubling problem.
 

MCroft

Developer
Staff member
I understand the desire to move files around, but I wouldn't expect automatic updates to "just work" with that. Were I to make that decision, svn update would simply assume that the file was gone and recreate it.

However, I understand that we can't easily remove a long-established functionality.
Yeah, that's why I was brainstorming, on the "if you told me to get rid of it, I want to get rid of it." basis. This is, AFAICT, one of the places where doing what the dev said to do may be better than being smart.

@MCroft Thanks for your investigation. Does this mean that in general, it's bad for two projects to supply files with the same name? For example, if project A had relay/foo/code.js and project B had relay/bar/code.js, would installing A then B overwrite the first file? That would be a much more troubling problem.
I'm not sure, but we could test that and see if we get the two-directory warning. the related case of relay/code.ash and another svn instance with relay/code.ash is bad in that it could end up with conflicts if we were bundling our dependencies rather than using dependencies.txt.
 
Last edited:

taltamir

Member
And that still wouldn't solve the case where the script has two files with the same name under different (sub)directories). That just might have to be a forbidden thing. (in fact, that might need to be validated across all installed scripts).
It does need to be validated across all script. I have previously encountered instances where two scripts clashed because they had an identically named file. the scripts just overwrite the file regardless of its location.

so let us say I have
/ttpack/util.ash
and there is also
/autoscend/util.ash

If I install autoscend first, then when I install ttpack it will overwrite the file
/autoscend/util.ash
with the file from ttpack. it will not relocate it btw.
this makes it impossible to use both scripts at once

this is why every file in autoscend project starts with auto_ to prevent such clashes
 
Last edited:

philmasterplus

Active member
It does need to be validated across all script. I have previously encountered instances where two scripts clashed because they had an identically named file. the scripts just overwrite the file regardless of its location.

Unfortunately, I can reproduce this. 😥

  1. In KoLmafia, run svn checkout https://github.com/pastelmind/svn-update-repro/branches/clobber-test-first.

    This branch has a single file, scripts/first/dont-overwrite-this.ash, which contains:
    Code:
    print('This is from clobber-test-first');

    Upon installing, KoLmafia creates a scripts/first/dont-overwrite-this.ash in the local copy.

  2. In KoLmafia, run svn checkout https://github.com/pastelmind/svn-update-repro/branches/clobber-test-second.

    This branch has a single file, scripts/second/dont-overwrite-this.ash, which contains:
    Code:
    print('This is from clobber-test-second');

    Upon installing, KoLmafia asks me this:

    1620982493696.png

    ...and when I click "Yes", it overwrites scripts/first/dont-overwrite-this.ash with the contents from the clobber-test-second branch.

This a more serious issue. I am already mindful about giving unique names to ASH and JavaScript files, since they can be invoked from the gCLI regardless of their location. However, this revelation means that I must give unique names to CSS, HTML, and any other files that go into relay/. It's doable, but cumbersome and error-prone. I expect other developers would also trip over this.

Could this behavior be changed so that I don't have to worry about unique-naming every file?
 
Last edited:

fronobulax

Developer
Staff member
My recollection of history and events may be wrong and incomplete. But here we go anyway.

Early on there was a lot of concern about using KoLmafia as a vector for malicious activity on a user's machine, especially when scripting was introduced. One defense was to limit Kolmafia's use of the file system. When scripting was initially conceived and supported there was an assumption that all scripts would live in /scripts (not subdirectories thereof) and that filenames would be unique.

This assumption did not last long. The fact that MyScript.ash and myScript.ash were the same file on some operating systems and different files on others required some code adjustments. Users insisted on subdirectories of /scripts to both organize scripts and reduce menu sizes. Nevertheless the underlying assumption of name uniqueness remained. There are places in KoLmafia where a script file name is known but not an explicit subdirectory and KoLmafia will just look for the script file and run the first match it finds. So there are portions of KoLmafia that will not make a distinction between scripts/foo/phil.ash and scripts/bar/phil.ash and will prefer /scripts/phil.ash if it exists.

Similarly, if asked to run phil, KoLmafia isn't guaranteed to choose between phil, phil.txt, phil.ash and now phil.js in the way the user expects.

There are several places where user input has to be mapped into an intrinsic (to KoLmafia) command or a file on the file system. This mapping makes some assumptions about uniqueness that are perhaps no longer correct or desirable in an environment with several allowed file extensions and a directory tree (instead of a directory).

I might suggest that the SVN issues identified are not going to be successfully solved until KoLmafia's disambiguation can deal with a command name that maps to multiple files in the allowed file system.
 

Crowther

Active member
Similarly, if asked to run phil, KoLmafia isn't guaranteed to choose between phil, phil.txt, phil.ash and now phil.js in the way the user expects.
Just a side note: You can request "phil.ash" instead of just "phil" if you want to force KoLmafia to use the ash version.
 

fronobulax

Developer
Staff member
Just a side note: You can request "phil.ash" instead of just "phil" if you want to force KoLmafia to use the ash version.
Yes. To be clear the more information that is provided the less likely KoLmafia is to disambiguate and get the unexpected result.
 

MCroft

Developer
Staff member
My recollection of history and events may be wrong and incomplete. But here we go anyway
Thank you for providing this, btw. It helps figure out what we should be considering.
There are several places where user input has to be mapped into an intrinsic (to KoLmafia) command or a file on the file system. This mapping makes some assumptions about uniqueness that are perhaps no longer correct or desirable in an environment with several allowed file extensions and a directory tree (instead of a directory).

I might suggest that the SVN issues identified are not going to be successfully solved until KoLmafia's disambiguation can deal with a command name that maps to multiple files in the allowed file system.
It's hard to see how to fix this easily, or even in a painful and difficult way.

I don't understand the security or practical value of aggressively finding, modifying, and using a script based on a partial location. OS security practices have been closing these kinds of conditions for years.

So, what are the consequences of just disabling disambiguation? If ttpack wants to use util.ash, ttpack could reference ttpack/util.ash. Ditto autoscend. And if autoscend tries to install a file that ttpack uses, it can coexist. And if it's in the same directory, a perhaps-rewritten version of the warning phil showed will prevent users from making mistakes. We'd still use the "no changes outside the designated folder" restriction.

OTOH, we'd lose the use case xKiv mentioned: "I move my code to a new subdirectory and it still gets updated anyway." How bad is that? What problem are we solving by moving a file to a different directory than the repo expects it to be in, especially if the repo move commands are not broken.

It seems like we're helping a few people do iffy things at the expense of losing control of where the files go.
 

philmasterplus

Active member
For now, I would treat KoLmafia as a flat filesystem, where directories are superfluous and anything can overwrite anything. We could tell people to namespace all their file names (not just ASH or JS) with a unique prefix.



I can identify three use cases seemingly at odds here:
  1. I want to reorganize Script Menu to my taste
  2. I want to call and import <something>.ash from anywhere
  3. I want to use styles.css as a file name without overwriting another styles.css in another directory.
Where 1, 2 are fully supported right now, but 3 is not. I would like 3 while minimizing impact on 1, 2.

With regard to 1, I assume that people don't care much about moving files around in relay/ or data/. We can agree that scripts/ is a special directory, since (a) it's directly exposed via the Script Menu and (b) we expect people to put their own scripts in there.

One suggestion is to decouple the install/update logic from the script file lookup logic. That way, we could handle 2 and 3 separately.

For the install/upgrade logic, KoLmafia should use the "smart" filename lookup only if both the source path and the matched path are under scripts/. All other directories should follow Exact Path Match only.
(I prefer ditching "smart" lookup entirely, but that's something we could discuss another time)

The script file lookup logic can stay as it is. I don't like how it dives into relay/, but again, another time.
 

fronobulax

Developer
Staff member
The security issue initially drove scripts as the only directory that could contain scripts. Using double dots in a path name to implicitly traverse a directory was behind the no subdirectory policy. As users demanded other features the security issues were relaxed or forgotten. The security fears from a decade ago no longer seem as relevant as they once did but they are part of the "how did we get here?" story.

Ignoring the SVN issues for the moment, the first fix would be to make sure KoLmafia had a fully qualified path name internally and everywhere else it used scripts. This would almost certainly effect the Scripts menu and the various places scripts are called. The question of how Kolmafia should behave if what the user (or script) presented as a script name and that was not a fully qualified path name would need some discussion.

I am lazy and so wonder if solution is to put this back on the script author. It is up to them to manage the uniqueness of disambiguated names in SVN operations. I think this means using a named subdirectory and references need to have the subdirectory and script name.

I think we are still at the point where no one is happy :)
 

MCroft

Developer
Staff member
I am lazy and so wonder if solution is to put this back on the script author. It is up to them to manage the uniqueness of disambiguated names in SVN operations. I think this means using a named subdirectory and references need to have the subdirectory and script name.

I think we are still at the point where no one is happy :)
I agree on #2, and on #1, I see two issues:
A: we cannot refactor our own scripts, because we can't move anything around in our source repo and have it reflected in the project.
B: someone else can nuke our perfectly lovely scripts because they used the same name as we did.

I'm ok with saying it is the job of the script author, but we aren't giving them enough control of their own destiny to let them succeed or fail without getting shivved by a fellow script-author, unintentionally.

Maybe we need a mapping between the menu and a name there and a file. It might be nice to choose what's displayed, and not have the menu show ___auto_DO_NOT_RUN_FILES_IN_THIS_DIRECTORY

If I wanted to do that, I might make "dependencies.txt" have an optional menuItems list that was ignored by dependencies, but was used to populate (or skip) certain menu items and define what they ran. You could only list your entry points...
 
Top