Bug - Fixed "svn update" does not move files in local copy

fronobulax

Developer
Staff member
I think this is a great idea. We could add some sort of "svn unpack" command that would copy out your scripts from your svn folder to the /scripts and /relay folders. Then at least the user can get the functionality we have now, but they could choose to do so at their own risk. I'd be happy to write this up; sounds like a great move.

I'm confused, or confusing. Don't we do this now, and isn't it exposed as "svn sync"? The use case is that I want my local changes to your script to be preserved when you push an update, so I make my changes in the SVN directory, sync (because I don't want to use the OS to copy a file), and then hope that at your next update the SVN merge will do the right thing or complain loudly that it can't.

I think the core issue remains the deletion aspects of mirroring from a local repository.

We could certainly write a NukeAndMirror command that copied, renamed and deleted so that the repository and KoLmafia file system were mirrors for the portions they were supposed to have in common, but I am not smart enough to write such a command and have it idiot proofed. I don't want to deal with the consequences of someone who ran it for /scripts. I have answered "Are You Sure?" prompts with the wrong answer, because I was tired or in a hurry, often enough that they are not a protection method I trust.
 

MCroft

Developer
Staff member
I'm confused, or confusing. Don't we do this now, and isn't it exposed as "svn sync"? The use case is that I want my local changes to your script to be preserved when you push an update, so I make my changes in the SVN directory, sync (because I don't want to use the OS to copy a file), and then hope that at your next update the SVN merge will do the right thing or complain loudly that it can't.
This is the comment block for syncAll()
Java:
/**
 * "sync" operations are for users who make edits to the working copy version of files (in svn/) and want those
 * changes reflected in their local copy.
 * <p>
 * Sync first iterates through projects and builds a list of working copy files that are modified.
 * <p>
 * For files that are modified, it then compares the WC file against the local rebase. If the rebase is different,
 * the WC file is copied over the rebase.
 */
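
In other words, the flow it describes is roughly this -- a sketch only, where listWorkingCopyFiles, isLocallyModified, relativePath, contentsEqual and copyFile are made-up helper names and none of this is the actual implementation:

Java:
// Sketch of the sync flow described in the comment above; all helpers are hypothetical.
private static void syncAllSketch( final List<File> projects )
{
	for ( File project : projects )
	{
		for ( File wcFile : listWorkingCopyFiles( project ) )
		{
			if ( !isLocallyModified( wcFile ) )
			{
				continue;
			}

			File rebase = getRebase( relativePath( project, wcFile ) );

			if ( rebase != null && !contentsEqual( wcFile, rebase ) )
			{
				// the working-copy edit wins; the local copy is overwritten
				copyFile( wcFile, rebase );
			}
		}
	}
}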

It's an interesting and related thing that unfortunately can't help with the problem we're seeing. I think svn sync is designed to cover the "local development in the svn directory, pushed to the project" case. You would then presumably commit your changes and all would be copacetic.

Here's my code proposal (changes in red):

Rich (BB code):
private static void doPush( File file, String relpath )
{
       File rebase = getRebase( relpath );

       if ( rebase == null || Preferences.getBoolean( "svnNoPushRebase" ) )
              // this is okay; just make the file in its default location
              rebase = new File( KoLConstants.ROOT_LOCATION, relpath );
Even if it's not permanent, it lets PM+, me, and anyone else who wants to update their global_prefs to use the default location only on push. We can run with it and see if it is a good solution, then we can decide what we want to do.
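(Assuming the preference lands with that name and the patch goes in as written above, turning it on should presumably just be a matter of "set svnNoPushRebase = true" in the gCLI, and "set svnNoPushRebase = false" to go back to the current rebase-on-push behavior.)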

The immediate goal is to find out if we can avoid the problems we're seeing with identical file names if we get rid of the rebase on push, or if we need to keep investigating.
 
Last edited:

fronobulax

Developer
Staff member
I am not sure I agree with your assessment of svn sync but am not motivated enough to run experiments or the debugger. I will say that syncAll does call doPush, which AFAIK will copy whatever is in the svn directories to the script and relay directories, and it does not care whether the svn files were changed by SVN updating from the remote repository or by a user making local changes that have not (or cannot) been committed to the remote repo.

Perhaps the context of my comments and confusion remains unclear.

I see no evidence that KoLmafia implements or exposes the svn commit command. I have no recollection that KoLmafia ever intended to support developers who need to update a remote repository. Perhaps I have learned something or perhaps there is no support for "local development in the svn directory, pushed to the project" assuming "project" means the remote repository.

Since there are way too many things going on I'm going to put a stake in the ground and see whether I get burned on it.

KoLmafia is NOT going to be changed to delete files in /scripts or /relay regardless of the state of a local svn directory.

If the KoLmafia SVN commands result in a local svn directory that is not a mirror of the remote repository then there is a KoLmafia feature that needs to be documented or a bug that needs to be fixed.

Repository managers need to understand that moving and renaming in the repository may require deletes by KoLmafia in which case the first point applies.

The problem where two script authors use the same name for a file that gets placed in /relay or /scripts and the resulting collision breaks something needs to be addressed. I won't repeat my suggestions because I can't tell whether the response is from people who think it is a suggestion to solve a different problem or people who just have different ideas for this one.
 

heeheehee

Developer
Staff member
Probably a dumb idea, from someone who hasn't really thought about the problem, and has only skimmed this thread just now:

What if Mafia read scripts directly from the svn/ directory, without mirroring them to scripts/ or relay/ or wherever else?
 

xKiv

Active member
I see no evidence that KoLmafia implements or exposes the svn commit command. I have no recollection that KoLmafia ever intended to support developers who need to update a remote repository. Perhaps I have learned something or perhaps there is no support for "local development in the svn directory, pushed to the project" assuming "project" means the remote repository.
Local development is "supported" by letting commandline tools (and IDEs/editors) do their job. And then using "svn sync" in kolmafia.
 

fronobulax

Developer
Staff member
Probably a dumb idea, from someone who hasn't really thought about the problem, and has only skimmed this thread just now:

What if Mafia read scripts directly from the svn/ directory, without mirroring them to scripts/ or relay/ or wherever else?

My recollection was that was considered. I think it was rejected because of the level of effort needed, combined with the requirement to be backwards compatible with users who had scripts from non-SVN sources in /scripts and /relay. Using SVN was an option for script writers, not a requirement.
 

heeheehee

Developer
Staff member
What I had envisioned was just adding svn/*/scripts/, svn/*/relay, etc to the lookup path. I recognize it isn't trivially backwards-compatible, though -- you'd have to deal with file conflicts for existing installations.
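
Very roughly, I'm picturing something like this to enumerate the extra roots -- a sketch only; the method name and everything around it are made up, and nothing here is wired into the real lookup code:

Java:
// Hypothetical sketch: collect svn/*/scripts and svn/*/relay as extra lookup roots.
private static List<File> buildSvnLookupRoots( final File svnDir )
{
	List<File> roots = new ArrayList<>();
	File[] projects = svnDir.listFiles( File::isDirectory );

	if ( projects == null )
	{
		return roots;
	}

	for ( File project : projects )
	{
		for ( String sub : new String[] { "scripts", "relay" } )
		{
			File candidate = new File( project, sub );
			if ( candidate.isDirectory() )
			{
				roots.add( candidate );
			}
		}
	}

	return roots;
}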
 

MCroft

Developer
Staff member
I am not sure I agree with your assessment of svn sync but am not motivated enough to run experiments or the debugger.
I am naively going on the basis of the comment and also am not motivated enough to run experiments. It's interesting but it doesn't help here.
Perhaps the context of my comments and confusion remains unclear.
I still don't understand the context. Sorry.
Since there are way too many things going on I'm going to put a stake in the ground and see whether I get burned on it.
I don't think anyone wants to burn anyone. There is and should be institutional resistance to change when the reasons for it need more discussion. I think the stake is a bit premature, and I'll be unhappy if we can't discuss this further, but I have yet to be unhappy enough about anything we've discussed to want to turn you into St. Frono d'Arc.

KoLmafia is NOT going to be changed to delete files in /scripts or /relay regardless of the state of a local svn directory.
Is anyone proposing this? I think we're saying delete files *specifically because* they have been deleted in the local SVN directory. That they are added to a different local directory is, in our model, not handled in the equivalent of a transaction. Our SVN action is "update to HEAD", and you get, for instance, r38 in your $svn directory. But when it installs via svn update, it doesn't follow the same model and you don't have r38 in your scripts or relay directories.

This would just be litter in the filesystem (e.g. rename /scripts/foo to /scripts/bar) if we didn't use an aggressive find-and-run-by-name system. If you move a directory and the old copy keeps behaving as if nothing happened, you have not actually moved anything. If we remove the rebase, then we have to have a way to actually follow the Event Stack actions and do the delete+add.

Speaking as a former Configuration Manager, this is chaos. You don't know the state, and you can't impose the state of your scripts on the running directory. It kinda works as-is, with the flaws noted, but only by giving up any possibility of knowing what's going on or why it's broken. I am not suggesting it was the wrong choice at the time, but I definitely think it has some painful ramifications now and is worth making better.

And by better I mean "it puts the script files where the version control system tells it to", but I'd also like the "we run the files in-place" option, and I'm open to other solutions as well.
Repository managers need to understand that moving and renaming in the repository may require deletes by KoLmafia in which case the first point applies.
What would it look like if I wanted to change scripts/myscript/foo to scripts/myscript/bar? Would I tell a user to delete my script and re-install? Would I need to abandon the one with the wrong directory structure and create a new project? Do I rename the contents and every reference to them inside my script? There's no way in KoLmafia to update a script and change the structure.
The problem where two script authors use the same name for a file that gets placed in /relay or /scripts and the resulting collision breaks something needs to be addressed. I won't repeat my suggestions because I can't tell whether the response is from people who think it is a suggestion to solve a different problem or people who just have different ideas for this one.
Should we split the thread? The reason both issues are here is that they might have one solution. I think that having introduced JavaScript we have made it more likely, as packages that are being used might generate index.html as a part of each project. Or we might use versioned libraries in JS that would have problems if the configuration file wasn't updated, etc. The last project in would be the only "winner", placing (for instance) its index.html in the directory structure of the very first installed index.html.
 

fronobulax

Developer
Staff member
What I had envisioned was just adding svn/*/scripts/, svn/*/relay, etc to the lookup path. I recognize it isn't trivially backwards-compatible, though -- you'd have to deal with file conflicts for existing installations.

I think the complexity of building a script selection GUI was also a factor. The threat/security model at the time preferred to minimize the directories KoLmafia had access to and most users knew they wanted to run Universal Recovery but did not know the associated SVN directory.
 

fronobulax

Developer
Staff member
Evil Script Writer "phil" created a script that when installed created /scripts/fills. Evil Clueless User had a secret project that created files in /scripts/fills perhaps even using and depending upon files "phil" had written. "phil" decided that they had chosen poorly and renamed fills to phils and updated their repository accordingly.

I am opposed to any action triggered by the effective deletion of fills in the repository that touches Evil Clueless User's files. Evil Clueless User may have to make changes if their project wants to use phil's latest and greatest files but phil's decision should not have any effect on Evil Clueless User's files.

In general we are throwing a lot of ideas around that assume the only way /scripts and /relay get changed is via actions that KoLmafia knows about. Who knows - there may be a git user who has a local repository and a shell script that updates it using git and copies the results into /scripts? And some people are still downloading scripts from forum posts here and in clans that choose not to share with the larger community.

There are many technical solutions to this available to Reformed, Professional Configuration Managers. Most of them won't work in an environment where script writers and KoLmafia users can be advised how to best do something but cannot be compelled to do so. The generic problem of un-versioned files in a SVN environment is usually solved by decree - "Don't do that and don't whine to me when you lose something". That is easier than making tools aware that there are files they are not supposed to touch and updating whatever mechanisms are used when the collection of files to be ignored is changed.

The "search for a match" is not needed for users who select script files to run from a GUI or the file system, but is needed for users who run scripts from the gCLI and don't want to type a complete path name. It also benefits script writers who have less typing to do when using shared files.

The script disambiguation always had the potential to break when there were file names that were not unique across the search space. It just seems to be breaking more now as well established scripts are being forked or renamed and JavaScript writers try to create or replicate the run time environment of their choice in KoLmafia.

My inclination is for KoLmafia to either assert and enforce some kind of control, or punt and give script writers tools to identify issues. For example, I can see modifying the disambiguation code to still return the first match but continue the traversal and emit a message when a second match is found. I can see a tool that accepts a string and returns a possibly empty list of matches. This pushes all the work onto the scripters, but it points the way for scripters to solve name collisions. If a writer insists on renaming directories and I insist on not allowing SVN to delete directories, the script writer could have a version of the script that included a check for name collisions potentially created by the rename and respectfully asked the user to delete the "old" files.
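
Something along these lines is what I have in mind -- a sketch with made-up helper names (collectMatches in particular), not the actual disambiguation code:

Java:
// Sketch of a "return the first match but report the rest" disambiguation pass.
private static File findScriptReportingCollisions( final String name, final List<File> searchRoots )
{
	List<File> matches = new ArrayList<>();

	for ( File root : searchRoots )
	{
		// hypothetical recursive name-matching helper
		collectMatches( root, name, matches );
	}

	if ( matches.isEmpty() )
	{
		return null;
	}

	if ( matches.size() > 1 )
	{
		// keep today's behavior, but make the collision visible in the gCLI
		RequestLogger.printLine( "[" + name + "] also matched:" );
		for ( int i = 1; i < matches.size(); ++i )
		{
			RequestLogger.printLine( "  " + matches.get( i ).getPath() );
		}
	}

	return matches.get( 0 );
}

The "tool that accepts a string" version would just print the whole list instead of returning the first entry.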

We can also go down the path of allowing a script writer to identify a directory tree as MINE. KoLmafia would enforce mirroring. The user would have to accept responsibility for the consequences of also using it, and we could come up with something to prevent malicious script writer fronobulax from deliberately creating a script that updated phil's MINE.

Morning coffee is wearing off :)
 

MCroft

Developer
Staff member
MINE as a concept seems like a promising workaround. I promise to only call it “MINE-crafting“ in my head. It's "this is my lane, I shall do as I please here, touch it at your own risk" vs. "this space is open for community shared use". Not sure how to implement it (no coffee yet), but it seems to offer a free-swim/structured-lane hybrid that would let a developer either live with the current model (in cases where it was advantageous) or have a way to enforce structure. It gives us a way to change without breaking existing processes.
 

philmasterplus

Active member
For what it's worth, KoLmafia as of right now allows the following scenario to happen:

  1. User installs Project A, which has:
    Code:
    relay/
      one/
        foo.txt
        bar.txt
  2. User edits foo.txt
  3. Project A's developer deletes foo.txt. Git records this as a file deletion since relay/one/bar.txt still remains.
  4. User runs svn update. KoLmafia deletes the local copy containing user changes.

In this case we "fail" to protect user changes from being deleted. Hence the status quo could be described as "files are unprotected, directories are protected".

FYI I personally treat any subdirectory under scripts/ and relay/ as a de facto "mine" dir. I think it's a sane philosophy.

P.S. For future readers, what we refer to as "script disambiguation" is sometimes called a "module resolver" in JavaScript land, but the concept goes back at least as far as C++, where everyone struggles with "where is this header file coming from?"
 
Last edited:

fronobulax

Developer
Staff member
If I get up from my seat and the top thing on my ToDo list is KoLmafia, I'd probably go for the disambiguation collision reporter instead of MINE. It's less intrusive to the code base and makes it easy for a human to deal with the problem.

If I were convinced to work on MINE instead I would build a persistent mapping between a repository (the name of a directory one level down from mafia's svn) and any subdirectories of /scripts and /relay that were created by a fresh install. We could call it the MINEList and the process of building and shaping it MINECrafting.

I would then monitor install requests to confirm that any additions to the MINEList were unique based upon the state of the list at install. I would abort the install if that was not the case.

I would monitor update requests to confirm that the request only dealt with brand new directories in the MINEList or only referenced directories previously registered by the repository. I would abort the update if that was not true.

I would make sure the MINEList was updated when directories or repositories were deleted.

I would write new code that no kidding mirrored the repository (and thus allowed renames, file deletions and the removal of any files that the repository did not contain, user be darned) and use it in the SVN update for repositories in the MINEList.

I would write something that traversed the svn directory and updated the MINEList based upon what it found, or didn't. I would consider a utility that told a user about subdirectories in /scripts and /relay that were not in the MINEList just in case the user wanted to do some manual pruning.

I would acknowledge the first-come, first-served (or alphabetical) ordering. If repository A and repository B both want to use /scripts/bozo, then if A is installed first, B will not be able to be installed, but if B is installed first, A would generate the error.

I would hope the existing install mechanic guaranteed unique directory names for repositories and would withstand a malicious attempt to have myscripts at GitHub somehow overwrite myscripts at SourceForge.
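
To make the bookkeeping concrete, what I am picturing is roughly this -- a sketch with invented names, not something I have prototyped:

Java:
import java.util.*;

// Hypothetical MINEList sketch: which /scripts and /relay subdirectories does each
// installed repository "own"? Persisted somewhere (file, preference, whatever).
class MineList
{
	// e.g. "myscripts-github" -> { "scripts/fills", "relay/fills" }
	private final Map<String, Set<String>> owned = new HashMap<>();

	// Install-time check: abort the install if another repository already owns
	// any of the directories this one wants to create.
	boolean tryRegister( final String repo, final Collection<String> relDirs )
	{
		for ( Map.Entry<String, Set<String>> entry : owned.entrySet() )
		{
			if ( entry.getKey().equals( repo ) )
			{
				continue;
			}
			for ( String dir : relDirs )
			{
				if ( entry.getValue().contains( dir ) )
				{
					return false;	// collision; first come, first served
				}
			}
		}
		owned.computeIfAbsent( repo, k -> new HashSet<>() ).addAll( relDirs );
		return true;
	}

	// Update-time check: mirroring (including deletes) is only allowed in
	// directories this repository registered at install.
	boolean mayMirror( final String repo, final String relDir )
	{
		Set<String> mine = owned.get( repo );
		return mine != null && mine.contains( relDir );
	}

	// Keep the list honest when a repository is deleted.
	void unregister( final String repo )
	{
		owned.remove( repo );
	}
}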
 

fronobulax

Developer
Staff member
For what it's worth, KoLmafia as of right now allows the following scenario to happen:

  1. User installs Project A, which has:
    Code:
    relay/
      one/
        foo.txt
        bar.txt
  2. User edits foo.txt
  3. Project A's developer deletes foo.txt. Git records this as a file deletion since relay/one/bar.txt still remains.
  4. User runs svn update. KoLmafia deletes the local copy containing user changes.

In this case we "fail" to protect user changes from being deleted. Hence the status quo could be described as "files are unprotected, directories are protected".

FYI I personally treat any subdirectory under scripts/ and relay/ as a de facto "mine" dir. I think it's a sane philosophy.

P.S. For future readers, what we refer to as "script disambiguation" is sometimes called a "module resolver" in JavaScript land, but the concept goes back at least as far as C++, where everyone struggles with "where is this header file coming from?"

Might I suggest you review https://wiki.kolmafia.us/index.php/SVN_Primer and consider what to add to it to make it clear that the consequence of not following the recommended procedure (i.e. edit in the svn directory, not /scripts or whatever) is losing local edits at the next update. I'm sure we could make it more visible, but it is known, accepted and documented that a user's local changes can be lost if the user edits the running copy in /scripts (or whatever) rather than making the changes in the appropriate location under svn. So I see your example as something that is already documented and working as expected.

In your case, if you edited foo.txt in the local repository, SVN will detect a conflict when it tries to delete a file that has changed locally, I seem to recall.

You might benefit from a history lesson. SVN has been supporting organizations that provide software for a very long time. Anyone who has used it understands that using branching, deleting directories, moving files, directories or repositories, and renaming files, directories and repositories can have unexpected consequences especially when done by someone who does not know the tool well. Experienced SVN users tend to not trust merges either. Poorly staffed organizations tended to have a list of Thou Shalt Nots. Developers who were given commit access were required to agree to them and management would reprimand developers whose failure to follow the rules created additional work or caused deadlines not to be met. Well staffed organizations had a person whose job it was to manage the repository and when there was a good reason to do some of the difficult operations they were done by the trained SVN manager and not a developer.

CVS was perhaps the earliest free and widely adopted version control system. SVN evolved from it.

The origin story for git varies depending upon who you talk to, but it was definitively developed in response to the shortcomings of SVN et al. in a distributed environment. It is said that using CVS as an example of what not to do was a design criterion for git.

So it is built into git's DNA that it will do things differently from SVN and in many cases better. But doing things that "just work" in git and expecting them to map perfectly into SVN is not realistic. There will be a time when the cost of bridging the gap between git and SVN will be to not take advantage of git and to do things SVN's way. Your casual approach to deleting and renaming is one example. You don't have to think about it with git. It just works. Someone using SVN isn't even going to think about doing it.

At some point, all of these cases where a scripter can do something in git that causes problems for SVN, for mafia's integration of SVN, and for mafia's support for scripts that are not version controlled stop being about the current situation and start making the case for mafia to directly support git.
 

fronobulax

Developer
Staff member
Well.

I looked at the disambiguation code. Turns out there is a routine that traverses directories and makes a list of possible script matches. One place it is called is the gCLI when it thinks what was typed might be a script name.

I set up a case with duplicates and I got
> Alice

D:\KoLmafia\dist\scripts\Alice.ash
D:\KoLmafia\dist\scripts\BogusA\Alice.ash
D:\KoLmafia\dist\scripts\BogusB\Alice.ash

[Alice] has too many matches.

So the aggressive finding is not actually going to do anything wrong for a command line.

It looks almost trivial to add a gCLI command that will pass the input to the same code and display the results. I will probably do that.

It is possible that somewhere there is a use where the caller uses the first thing returned and continues. That is an error in using the results of disambiguation and not the search itself.

It is possible there are places that should use the underlying search but don't. I should look at how include files are handled.

The routine doesn't seem to expect to look for html or css files.

Anyway the problem looks easier, harder and different from what I was expecting. Real life is calling but this is too interesting not to come back to.
 

fronobulax

Developer
Staff member
KoLmafiaCLI.findScriptFile does the heavy lifting. Every place it is called but one, the caller eventually treats anything other than one match as a do-not-continue error. The exception is SafeRequire in the JavaScript interpreter area. The code uses a filter and a syntax I am not completely comfortable with, but it looks like it will accept the first file it finds rather than generating an error. Going out on a limb, I will say that it looks like JavaScript that calls ash might not get what is expected.

In my example above, JavaScript that called Alice on my system would get the Alice.ash in scripts when it probably wanted the one in BogusA, assuming that was where it was installed.

So before we get too deep, someone smarter than me needs to fix SafeRequire so multiple matches is an error, or explain the difference between what I think the code is doing and what it actually does. I suspect all the angst about non-unique ash file names is caused by this. If fixed, it should at least detect multiple matches in the file system and the user will know there is a problem.
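
The shape of the fix I am imagining is roughly this -- a sketch only; findMatchingScripts stands in for whatever routine actually hands back the full list of matches, and I have not touched the real SafeRequire code:

Java:
// Sketch: treat multiple matches as a hard error instead of silently taking the first.
private static File resolveRequiredScript( final String name )
{
	// hypothetical list-returning lookup; the real signature may differ
	List<File> matches = findMatchingScripts( name );

	if ( matches.isEmpty() )
	{
		throw new RuntimeException( name + " could not be found" );
	}

	if ( matches.size() > 1 )
	{
		throw new RuntimeException( "[" + name + "] has too many matches." );
	}

	return matches.get( 0 );
}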

AFAIK KoLmafia does not look for html or css files using findScriptFile so what is finding them?
 

philmasterplus

Active member
AFAIK KoLmafia does not look for html or css files using findScriptFile so what is finding them?

I set up a breakpoint in SVNManager.java to trace the code step by step while installing and updating a project with gCLI commands. That code calls findScriptFile() for every file that is "created" in the working copy. Not just ASH, but also CSS, TXT, and HTML.

Edit: See https://sourceforge.net/p/kolmafia/...ourceforge/kolmafia/svn/SVNManager.java#l1682

Edit2: On second thought, I suppose you meant "when the web browser requests a CSS file from localhost, does KoLmafia use findScriptFile() to locate the file inside relay/ and serve it?" In which case I believe the answer is "No, it does not use findScriptFile() for that".
 
Last edited:

fronobulax

Developer
Staff member
Yeah. Since KoLmafia is not explicitly resolving locations for html and css using findScriptFile, something else needs to be investigated. When I set up a runtime environment for other languages I often have to configure the environment so code can be found. Java's classpath for jar or class files and an operating system's PATH for executables are two examples. So what is the equivalent for JavaScript running in KoLmafia? Is there an assumption that all files are found in the current working directory, as defined by where the equivalent of main() lives? Can that be changed or overridden by a script writer? Would running from scripts/whatever instead of relay or relay/whatever reduce the chances of conflicts? Could a script writer be told to use my_index.html and my.css instead of assuming the file with a default name and location is the desired file?
 

philmasterplus

Active member
I'm treading unfamiliar territory; gausie and bean would be the right people to consult with. But I believe KoLmafia's JavaScript runtime uses different logic for importing ASH and JS.

When importing an ASH script it uses findScriptFile() in the code you saw. I can call require("somefile.ash") in JS and get the same module resolution I would get when using import "somefile.ash"; in ASH. So we face the same problems that ASH writers do. Most of the time, we can say "I can import Zlib wherever it is? That's neat." and move on.

When importing non-ASH (i.e. JavaScript) files, it uses a different resolution scheme provided by Rhino:
  • Relative paths (e.g. require("./foo/bar.js")) are resolved relative to the current script.
  • Non-relative paths (e.g. require("scripts/foo/bar.js")) are resolved against two possible "root" paths: KoLmafia's working directory and the scripts/ directory.
This means that JavaScript files cannot be moved as easily as ASH files. If my JS script uses require("./util.js"), it always wants the util.js that is in the same subdirectory, not somewhere else. Moving my script or util.js would break my script.
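
In pseudo-Java, my reading of those rules is roughly this -- my own paraphrase for illustration, not Rhino's actual resolver, and the names are made up:

Java:
// Illustrative paraphrase of the require() resolution rules described above.
private static File resolveJsModule( final File requestingScriptDir, final String path, final List<File> rootDirs )
{
	if ( path.startsWith( "./" ) || path.startsWith( "../" ) )
	{
		// relative: always resolved against the requiring script's own directory
		return new File( requestingScriptDir, path );
	}

	// non-relative: tried against each root (KoLmafia's working directory and scripts/)
	for ( File root : rootDirs )
	{
		File candidate = new File( root, path );
		if ( candidate.exists() )
		{
			return candidate;
		}
	}

	return null;
}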

In practice, this has never been a problem for me, and I expect the same for other JS devs. We usually concatenate many JavaScript files--including other JS scripts we import--into one large file before publishing them. Single-file scripts with no require() calls can be freely moved around. We don't have runtime dependency issues simply because we don't have runtime dependencies.

This might become a problem as more people join JavaScript for KoLmafia. Some people may use bundling techniques that produce multiple files instead of one big file. Some people may prefer to publish multiple small JavaScript files without a bundling step. Then we'd have the same problem all over again: "All these JS files are cluttering my Scripts Menu, but I can't move them without breaking everything!"

But I don't believe JS writers would want to change how require() works, in order to fix what is essentially a user interface problem.

Edit: I'm pretty sure CSS and HTML files don't need the findScriptFile() treatment. relay/ essentially functions as the /var/www/html/ of Apache, and I doubt anyone wants to change that?
 
Last edited:

fronobulax

Developer
Staff member
The require call of findScriptFile needs to generate an error if there is more than one result.

If we ask JS writers to explicitly specify that a file is in the current directory or a subdirectory of the current directory, and ask them to create a subdirectory of /scripts for their scripts, does that eliminate name collisions with any other ash or JS script? I think yes, but am open to being educated.

If css and html files are expected to be in /relay and, by analogy to /var/www/html, are expected to be shared, do we make certain named files part of the KoLmafia distribution (with all the baggage that implies) or do we voluntarily impose a naming convention on script writers in hopes of reducing collisions?

Can we tell rhino to use the script's directory instead of /relay for name resolution?
 