Feature git based script checkout

MCroft

Developer
Staff member
So, based on this thread about an issue with some GitHub repositories and SVN, I made an off-hand comment about implementing JGit.

People responded, so I tried it.

It turns out it's pretty easy, but also complex.

I added the latest version of Eclipse JGit and its dependencies (slf4j, ugh), and it worked! Or sort of.

Java:
import java.io.File;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.api.errors.GitAPIException;

// REMOTE_URL (a String) and localPath (a File) are defined elsewhere.
try ( Git result = Git.cloneRepository()
                      .setURI( REMOTE_URL )
                      .setDirectory( localPath )
                      //.setProgressMonitor( new SimpleProgressMonitor() )
                      .call() )
{
        // Note: call() returns an already-opened repository, which needs to be closed to avoid file handle leaks!
        // Git implements AutoCloseable, so try-with-resources takes care of that.
        System.out.println( "Created repository: " + result.getRepository().getDirectory() );
}
catch ( GitAPIException ex )
{
        System.out.println( "Error: " + ex );
}

I have one problem and a few design questions:
  1. (problem): git does not honor subdirectory matches. I can see documentation for various command line GIT clients on how to set parameters around depth and filters and sparseness, but I don't know what I'm doing and don't even really speak the lingo. Some expert help would be, err, helpful.

  2. (question): What commands do we need? I only implemented "clone". delete probably doesn't call JGit at all, but the equivalents of list and update are required, I assume.

  3. (question): Do we follow the same checkout/sync model we use for SVN? Or do we use this to experiment with some of the other design patterns we've talked about?

  4. (question): Do we put these scripts into the ScriptManager, or maybe later, or no?
 

MCroft

Developer
Staff member
Since this is currently entirely hypothetical, we can also reinvent things if we want.

We can just check out all of autoscend and ask them to put manifest.json in the root. That file tells us which files to add to the various folders, what the valid entry points are, all the detail for the script manager, dependencies, URLs, etc.
 

philmasterplus

Active member
Awesome work!

But we should first review why everyone wants to check out/clone subdirectories, and why some projects are using release branches.

It's because KoLmafia enforces a strict rule on the directory structure. We can't have READMEs, package.jsons, or even .gitignore files.

This could be "solved" if we could tell KoLmafia to sync a subdirectory of the repository instead of the entire repo.
Code:
release/           ─┐
  scripts/          │
    foo.ash         ├─ KoLmafia should sync this
    bar.js          │
  relay/            │
    relay_foo.ash  ─┘
README             ─┐
LICENSE             │
.gitignore          │
src/                ├─ Cloned but not synced
  ...               │
lib/                │
  ...              ─┘

Maybe the hypothetical manifest.json would point to a subdirectory that should be synced:
JSON:
{
  "sync_dir": "./release/"
}

Alternatively, the sync dir could be hardcoded into KoLmafia itself.
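If the sync directory does come from a manifest, it is worth checking that the value stays inside the cloned repo before anything gets copied. Here is a minimal sketch of that check. The class and method names (SyncDirResolver, resolve) are made up for illustration and are not part of KoLmafia.

```java
import java.nio.file.Path;

/** Hypothetical helper: resolves a manifest's "sync_dir" against the cloned repo root. */
final class SyncDirResolver {
    private SyncDirResolver() {}

    /**
     * Returns the absolute, normalized sync directory.
     * Rejects values that would escape the repository, such as "../outside" or "/etc".
     */
    static Path resolve(Path repoRoot, String syncDir) {
        Path root = repoRoot.toAbsolutePath().normalize();
        // resolve() returns syncDir unchanged if it is absolute; the check below catches that too.
        Path target = root.resolve(syncDir).normalize();
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException("sync_dir escapes the repository: " + syncDir);
        }
        return target;
    }
}
```

With the example manifest above, resolve(repoRoot, "./release/") would point at the release folder inside the clone, while a value like "../outside" is refused.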
 

philmasterplus

Active member
Looking at other package managers...

Cargo (from Rust) supports git URLs, optionally with branches or tags:
Code:
cargo install --git https://github.com/my-org/project-foo.git
cargo install --git https://github.com/my-org/project-foo.git --branch my-branch
cargo install --git https://github.com/my-org/project-foo.git --tag my-tag

Cargo looks for a file named Cargo.toml anywhere in the repository. This means that you can use an arbitrary subdirectory as the "installable package".

NPM (from Node.js) supports installing from git URLs, optionally with "commit-ish" which can be a branch/tag name or a commit hash:
Code:
npm install git+https://github.com/my-org/project-foo
npm install git+https://github.com/my-org/project-foo#my-branch
npm install git+https://github.com/my-org/project-foo#my-tag

Unfortunately, NPM doesn't seem to support installing subdirectories inside a repo.
 

MCroft

Developer
Staff member
+1 for noting that this is a package manager built on top of a VCS system. That helps clarify my thinking.
 

MCroft

Developer
Staff member
Gitmotize Project Design Notes: 6/22

Goal: gCLI and ASH -- git command parallel to Svn commands
MVP: clone, pull, sync, delete
  1. git clone <giturl> -- converts URL->/git/SVN-Like-Path, clones repo to that directory. ToDo: Sync
  2. git list -- not yet started
  3. Pull
    1. git pull all -- not started
    2. git pull <projectname>
    3. git pull [<giturl>] -- equivalent to svn update. ToDo: Sync
  4. git delete <projectname>
  5. svn increment|inc|decrement|dec <projectname> -- I didn't even know these were there...
  6. git sync
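For the URL-to-folder step in item 1, a rough sketch of one possible naming rule follows. The rule itself (drop the host and a trailing ".git", join the remaining path segments with "-") is my assumption, as are the names GitProjectName and fromUrl. It only handles http(s)-style URLs, not the git@host:org/repo.git form.

```java
import java.net.URI;

/** Hypothetical helper: derives a folder name under git/ from a repository URL. */
final class GitProjectName {
    private GitProjectName() {}

    /** Turns "https://github.com/my-org/project-foo.git" into "my-org-project-foo". */
    static String fromUrl(String gitUrl) {
        String path = URI.create(gitUrl).getPath();      // e.g. "/my-org/project-foo.git"
        if (path == null) {
            throw new IllegalArgumentException("Unsupported git URL: " + gitUrl);
        }
        path = path.replaceAll("^/+|/+$", "");            // trim leading and trailing slashes
        if (path.endsWith(".git")) {
            path = path.substring(0, path.length() - ".git".length());
        }
        if (path.isEmpty()) {
            throw new IllegalArgumentException("No repository path in URL: " + gitUrl);
        }
        return path.replace('/', '-');
    }
}
```

The same name could then double as the <projectname> argument for git pull and git delete.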

ASH
  1. boolean git_exists( string )
  2. boolean git_at_head( string )
  3. record git_info( string )

GUI
  1. menu
 

MCroft

Developer
Staff member
Other things I need to know to finish the project.
  1. SVNKit's sample code may have been overkill. I am not convinced we need a lock file, but we are multi-threaded and it's not hard to do.
  2. As long as KoLmafia is searching for scripts in arbitrary locations within the scripts directory, we can't get rid of the rebase logic. @fronobulax was looking into that. We should discuss how that's going to affect this code and vice versa.
  3. Structure-wise, I think I should make SVNManager and GITManager both inherit from SCManager (or ScriptPackageManager, which is more accurate). That way Sync and Delete can be common and I'm half done for free.
  4. I am somewhat nervous about my lousy code. This would be a bad place to be naively insecure.
Feedback solicited, btw. I would love opinions here.
 

reverkiler

New member
TL;DR: Checkout git into some other kolmafia location (git/) and then copy the files to the relevant mafia directories

MCroft, would it be helpful to reconsider what defines a "package" in KoLMafia terms?

Currently a package is, as I understand it:
A directory called "KoLMafia", which contains:
- dependencies.txt (optional), which lists which other projects this depends on
- scripts/ (can have nested subdirectories, etc.), which contains the scripts to run
- ccs/ which contains ccs to be used
- relay/ which contains relay overrides
- data/ which contains data files used by the script
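To make that layout concrete, here is a small sketch of a filter that decides whether a file in a repo belongs to the package. It only knows the four folders listed above; dependencies.txt is metadata and is deliberately left out. PackageLayout and isSyncable are invented names, not existing KoLmafia code.

```java
import java.nio.file.Path;
import java.util.Set;

/** Hypothetical helper: recognizes files that live under one of the package folders. */
final class PackageLayout {
    // The top-level folders named in the list above.
    private static final Set<String> SYNCED_DIRS = Set.of("scripts", "ccs", "relay", "data");

    private PackageLayout() {}

    /** True if the repo-relative path is a file somewhere below a synced folder. */
    static boolean isSyncable(Path relativePath) {
        Path p = relativePath.normalize();
        // Needs at least "<folder>/<file>"; absolute paths and "../" escapes are refused.
        if (p.isAbsolute() || p.getNameCount() < 2) {
            return false;
        }
        return SYNCED_DIRS.contains(p.getName(0).toString());
    }
}
```

So scripts/foo.ash and relay/relay_foo.ash pass, while README or src/main.ts do not.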


What I am thinking about suggesting is that for git going forward, you could provide some sort of "manifest" describing your package.
This would need to be in the toplevel of your git repo, say "manifest.txt"

Mafia could then use this to construct the directories I listed above, so that way we wouldn't need to worry about "checking out a sub directory" of a repo.

Does this make sense? It seems like you are spending a lot of time fighting git on how to solve a very mafia problem (we need to get the files in a particular directory structure). I'm instead proposing that you have a separate place where you check out git repos, and when they update you rebuild them to the relevant directories using the manifest.

Importantly, I know that we want to maintain backward compatibility with existing repos, so I am still not 100% sure how we do that, but the workflow would look something like:

- checkout a git repo, the whole thing, into some new mafia location (git/ maybe?)
- have a "bundler" that runs through that repo and pulls out the relevant files (either using some manifest, or discovery based on file structure)
- copy the files from that directory to the relevant "script", etc. directories
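The copy step of that workflow could look roughly like the sketch below, using only java.nio. It copies every regular file under a source folder into an install root while keeping relative paths, and skips the .git metadata folder (that skip is my assumption). Bundler and copyTree are hypothetical names. A real bundler would also filter by manifest or file structure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical bundler: copies a checked-out tree into the install location. */
final class Bundler {
    private Bundler() {}

    /** Copies regular files from sourceRoot to installRoot and returns the relative paths copied. */
    static List<Path> copyTree(Path sourceRoot, Path installRoot) {
        List<Path> copied = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(sourceRoot)) {
            List<Path> files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            for (Path file : files) {
                Path relative = sourceRoot.relativize(file);
                if (relative.startsWith(".git")) {
                    continue; // git's own metadata is never part of the package
                }
                Path target = installRoot.resolve(relative);
                Path parent = target.getParent();
                if (parent != null) {
                    Files.createDirectories(parent);
                }
                // Overwrite whatever is there: the repo copy is the source of truth.
                Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING);
                copied.add(relative);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copied;
    }
}
```

Returning the list of copied paths gives the caller something to record, which might help later when deciding what to remove on delete.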

If this is barking up the wrong tree, then no worries, I just thought it was worth considering.
 

MCroft

Developer
Staff member
reverkiler said:
TL;DR: Checkout git into some other kolmafia location (git/) and then copy the files to the relevant mafia directories

Importantly, I know that we want to maintain backward compatibility with existing repos, so I am still not 100% sure how we do that, but the workflow would look something like:

- checkout a git repo, the whole thing, into some new mafia location (git/ maybe?)
- have a "bundler" that runs through that repo and pulls out the relevant files (either using some manifest, or discovery based on file structure)
- copy the files from that directory to the relevant "script", etc. directories

This is exactly what I want to do. My plan is to use manifest.json, so it’s both structured and human-readable and doesn’t rely on order or tabs vs. spaces.

For backwards compatibility, if there is no manifest, follow the current convention. This lets us use this model with both git and svn and things that will never be updated.

Rev 2 will include all the ScriptManager data in the manifest, so we can more easily build that out for new projects. Maybe also dependencies because there’s no need for two metadata files.

In Rev N+1, maybe we also put default properties in it, so we can restore easily. Maybe someone else will get excited about that and decide to write it, though. Other expansions that aren’t MVP can also be added later: incompatibilities, a minimum Mafia version to check before we try to install, a long description, an external documentation link, a Readme, an install success message, whatever….


Since svn already uses svn/, I am planning to use git/ as the repo home. The git sync or sync after pull/clone option will probably be common between SVN and GIT, except git will use the package base directory from the manifest.

We might even be smart: if you try to update an svn-based package that has a .git directory but no .svn, we could try git instead. Or at least we can tell the user to use the other one. Rev N+1…
 

MCroft

Developer
Staff member


I could also use some help with git as a client. It is working great (for clone and pull), but now I have to replicate some-to-all of the SVN functionality, including the rebase commands. I feel like that's my big project, since I seem to have started on it, and the build side of things may need additional eyes.
 

heeheehee

Developer
Staff member
I usually use `git stash; git fetch; git rebase; git stash pop` for pulling new commits into a repo with local changes (middle two commands can be replaced with `git pull --rebase` if that's what you want).

I'm admittedly not totally sure what you're looking for, since svn doesn't have a rebase command that I'm aware of? All of our changes are in trunk, so I also don't see svn merge being very relevant to our use case (other than, say, reverting bad commits).

But, I can try to answer any specific questions you might have.
 

MCroft

Developer
Staff member
My untrustworthy memory is that KoLmafia only intended to implement a subset of SVN commands. The original focus was on commands a script user would need to install a script, keep it updated and then uninstall it. While some commands were implemented to assist a script writer in maintaining a script there was never the expectation that a script writer's only repository tool would be KoLmafia. I could argue that philosophy should continue. I could also argue that it is a convenience, not a requirement, to allow a KoLmafia developer to build locally using KoLmafia as the only git tool. Perhaps we have lost sight of the target and the goal?
This is my naive implementation.
  • git clone RepoURL branchname
  • git pull RepoURL branchname
git works differently from SVN, and is probably a bit more granular. And I *think* HeeHeeHee's string of commands works like svn update, in that it preserves and tries to merge the changes made locally.

I think "preserve local changes to the repo" is probably going to come after the minimum viable product ships, which basically allows us to "stop digging" in the hole we've got.

I've got those commands working with respect to git repos, but not the infrastructure around them.

The technical hurdle I am aware of is obtaining a version number from git that "works".
That we may have solved, but it's a consideration for "git for KoLmafia" not a consideration for "git for scripts." We may want to move these last couple or three posts off to that thread.

git for scripts is where I'm digging in and trying to work on it, and it seems higher priority to me.

Things I have on my "this is not easy" list:
  1. dependency management, since a URL can either be a git or svn repository.
    I may solve this by trying one first and picking the "winner". It probably needs to be updated in SVNManager as well.

  2. Identifying file level changes between the two repositories like we do in SVN to tell us what to delete.
    Tentative plan: don't do it. My sub-folder space is mine. Sync is no longer sync, but really a full copy. Do not modify scripts for script 'X', and do not store data in "scripts" or "relay", because an update will reset it to factory (git) values. Warn about that. This will not work well with self-modifying scripts.

  3. displaying a change list. (may be post-MVP).
    Hard to know which version to display changes since, or else it's easy and I don't get it.

  4. dealing with rebased/conflicting files & the lack of namespace.
    Force my package directory structure. Rather than update SameName.ash in OldProject's location, delete it. Then copy the full files as requested.

  5. the Script Manager

I think with 1, 2, & 4 we can put it out there.
 

heeheehee

Developer
Staff member
Yeah. If you try to `git pull` with pending changes, then it'll stop you and say "hey you need to commit or stash your changes". If you commit your changes first, by default, git pull will create a merge commit. That's where git pull --rebase comes in (or fetch -> rebase if you want to do it manually). I see modest stash support in JGit. (There's StashCreateCommand which is half of it, but instead of pop, it's fragmented into StashApplyCommand and StashDropCommand, and StashDropCommand has the caveat "Currently only supported on a traditional file repository using one-file-per-ref reflogs" which is probably mostly fine. And, the API seems a bit awkward since you have to store the return value of StashCreateCommand#call() and then use getId() to get the ref that the other commands want. But, all-in-all, it's workable.)

3. For the MVP, I think it's reasonable to hide branching from scripts. In that case, you can just keep track of the head commit at origin before the merge, then run `git log abc123f..origin` (or whatever name you give to the remote).

For example, `git log 036a66..github/main` gets me the changes in my git-svn repo since r20791. (JGit has a LogCommand which should work here.)

5. Config side seems simple enough -- add a new field expected in svnrepo.json, indicating VCS in use; if not present, assume SVN (we could do the migration in one go, but eh, not necessary).
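A tiny sketch of that "missing means SVN" rule, in case it helps. The enum name and the idea that the field holds a plain string like "git" are assumptions on my part; the actual field name and format in svnrepo.json would be up to whoever implements it.

```java
import java.util.Locale;

/** Hypothetical enum for the version control system recorded per repo entry. */
enum VcsKind {
    SVN, GIT;

    /** Maps the config value to a kind; a missing or blank value means SVN. */
    static VcsKind fromConfig(String value) {
        if (value == null || value.trim().isEmpty()) {
            return SVN; // existing entries have no such field, so they stay SVN
        }
        // Throws IllegalArgumentException for anything other than "svn" or "git".
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

Existing entries would keep working untouched, and only newly added git repos would need the extra field.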

Problems 2 and 4 are largely not a VCS issue as I think has been discussed before, but git certainly can show you file deletions (JGit's IndexDiff looks like basically what you want).

Otherwise, overall seems like a reasonable plan.
 