Feature: Something to automatically prune git history

Irrat

Maybe svn too? But no one should be using that.

Some git repositories, such as KolItemPrices (55mb) or Garbage Collector (76mb), have a long commit history. Cloning these repositories downloads the entire history, resulting in a relatively large .git directory, which can be an issue for some users.

I was thinking about a new preference that lets users limit the depth of the git history stored locally. It would default to something like 50; setting it to 0 or less would disable the feature. An alternative is to only remember recent commits, but that seems more complicated.

Looking at GitManager, updating git clone is the easiest part: you can call setDepth. But the other half of this, pruning an existing repo, seems a bit more complicated, especially with the "changelog" thing.
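
For illustration, here's a minimal JGit sketch of what the clone half might look like. The preference name `gitCloneDepth` is hypothetical, and I'm assuming JGit 6.3+, where CloneCommand gained setDepth:

Code:
import java.io.File;
import org.eclipse.jgit.api.CloneCommand;
import org.eclipse.jgit.api.Git;

public class ShallowCloneSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical preference; 0 or less disables shallow cloning.
    int depth = 50; // stand-in for Preferences.getInteger("gitCloneDepth")

    CloneCommand clone =
        Git.cloneRepository()
            .setURI("https://github.com/libraryaddict/KolItemPrices.git")
            .setDirectory(new File("KolItemPrices"));
    if (depth > 0) {
      clone.setDepth(depth); // only fetch the most recent <depth> commits
    }
    try (Git git = clone.call()) {
      System.out.println("Cloned into " + git.getRepository().getWorkTree());
    }
  }
}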
 
Looking at GitManager, updating git clone is the easiest part, you can call setDepth.
Note that you'll also want to setDepth() on any pulls or other operations that internally invoke git fetch.

But the other half of this, pruning an existing repo, seems a bit more complicated, especially with the "changelog" thing.
Are you referring to the preference `gitShowCommitMessages`? That only prints commits since the last fetch. And even then, if a repo had a thousand commits since the last fetch, we still probably don't want to spam the user (or the log) with all those changes.

Can we just do shallow fetches combined with git gc?
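
In JGit terms that might look something like the sketch below (assuming JGit 6.3+, where FetchCommand gained setDepth alongside CloneCommand; `gitCloneDepth` is the hypothetical preference from the first post):

Code:
import java.io.File;
import org.eclipse.jgit.api.Git;

public class ShallowFetchSketch {
  public static void main(String[] args) throws Exception {
    int depth = 50; // stand-in for the hypothetical gitCloneDepth preference

    try (Git git = Git.open(new File("KolItemPrices"))) {
      // Any operation that fetches under the hood (pull, update)
      // would need the same depth cap applied.
      git.fetch().setRemote("origin").setDepth(depth).call();
    }
  }
}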
 
Maybe svn too? But no one should be using that.
"should" is an awfully strong word in this context. If you feel strongly then maybe we need a project to migrate abandonware and convince inactive maintainers to move to svn with the idea that we will eventually abandon svn and delete the related code from kolmafia.

:-)

I have limited understanding of git, but could you not resolve this using a command line git tool instead of KoLmafia's built-in support? It was never intended that mafia support would go beyond allowing a user to pull updates. At some point it was assumed that someone doing development-type activities would use another tool.
 
Note that you'll also want to setDepth() on any pulls or other operations that internally invoke git fetch.


Are you referring to the preference `gitShowCommitMessages`? That only prints commits since the last fetch. And even then, if a repo had a thousand commits since the last fetch, we still probably don't want to spam the user (or the log) with all those changes.

Can we just do shallow fetches combined with git gc?

I meant that I assumed that if we pruned the git history, we might not see those messages. But admittedly I didn't dive too deep, as I wasn't interested in spending the time here. But agreed, most users won't care; most users probably don't even acknowledge the messages.

"should" is an awfully strong word in this context. If you feel strongly then maybe we need a project to migrate abandonware and convince inactive maintainers to move to svn with the idea that we will eventually abandon svn and delete the related code from kolmafia.

:-)

I have limited understanding of git, but could you not resolve this using a command line git tool instead of KoLmafia's built-in support? It was never intended that mafia support would go beyond allowing a user to pull updates. At some point it was assumed that someone doing development-type activities would use another tool.
I meant 'should' in the context that it's not really used for new projects. Also, any old projects still on svn are unlikely to see substantial activity.

The reason I proposed this as a feature, instead of as an automated task in my own environment, is that this is more about the long-term effects as those projects see more and more commits: users will see the folder increase in size without being able to do anything about it, with "git" being assumed to be a funsies command that KoLmafia made up.

I see this as more of a minor issue that would be nice to future-proof, along with improved speeds in some cases (initial checkout). Garbage Collector shouldn't take me 10-30 seconds to download. Admittedly, I forgot to specify a branch, and I assumed the 267mb or so was the size everyone else was seeing. I've corrected it to the 76mb.

Granted, in an ideal world this wouldn't matter. But people do end up noticing. No one really likes having large files sitting around for no reason, and when it starts hitting 60mb+, people tend to notice, especially if they're moving files around.

Among the possible solutions, the easiest and cleanest, with essentially no downsides for 99.9% of users, would be to automatically prune commit history. For the remaining 0.1% of users, there would be the preference to turn this on/off. I expect this to only be an issue if there is a desire to fork a deleted repository.
 
So, if I guess correctly, there are two classes of users:

End users want to get the latest, or perhaps a particular tag or revision, and no history. They are using git to get the latest version only and don't really interact with it in most git contexts.

Developer users want to work on the code, perhaps see git blame, or do something else. I don't know if they get the code from inside mafia, or use their own dev tools or the command line. I'd love to hear what the Loathers discord users say about how they develop, and also devs here who aren't part of that tribe.


We don't support the "releases" model for scripts, because script projects work really well by giving the user the source text files. (For our compiled code, we do use releases.) Scripts like Autoscend have some releases, but nothing since 2021.

I like the idea of making the process more efficient/less resource intensive, but I've never noticed an issue with a large .git directory. Might be a problem if you're trying to run from a RAM drive on a small device, but we have a java app, so we're not super lean in the first place. It might be nice if a script had a flaw in it and we didn't want it to be exposed even in the history, but that's hypothetical; I don't know of any such issues.

Have you done any experimentation/timing tests/size comparisons with "what if I set depth to 1?"

I guess my expectation is that you'll find that it's measurable but practically negligible to optimize this way.
 
Have you done any experimentation/timing tests/size comparisons with "what if I set depth to 1?"

I guess my expectation is that you'll find that it's measurable but practically negligible to optimize this way.
Size of a `git clone --depth 1 https://github.com/libraryaddict/KolItemPrices.git`:
Code:
KolItemPrices$ du -hd1
244K    ./data
276K    ./.git
524K    .
Size of KoLmafia's version:
Code:
libraryaddict-KolItemPrices-master$ du -hd1
244K    ./data
57M     ./.git
57M     .

And running the following on my existing repo also trimmed it down. Admittedly, this may not be exposed via java, and it may be bad practice. It might even be that the best approach is to reclone the repo every X commits.

Code:
$ git fetch --depth=1
$ git reflog expire --expire-unreachable=now --all
$ git gc --prune=all

But notably, the resulting repo is not able to push to a remote.
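
On the java side, the closest JGit equivalent I can find would be something like the sketch below. setExpire on GarbageCollectCommand stands in for --prune, but as far as I can tell JGit's porcelain API exposes no equivalent of `git reflog expire`, so it may not trim as aggressively as the commands above:

Code:
import java.io.File;
import java.util.Date;
import java.util.Properties;
import org.eclipse.jgit.api.Git;

public class PruneSketch {
  public static void main(String[] args) throws Exception {
    try (Git git = Git.open(new File("KolItemPrices"))) {
      // Shallow fetch, mirroring `git fetch --depth=1` above.
      git.fetch().setDepth(1).call();

      // Expire loose objects immediately rather than after the
      // default two-week grace period (the --prune=all analogue).
      // Caveat: objects still referenced by reflogs may survive,
      // since there is no `git reflog expire` porcelain command.
      Properties stats = git.gc().setExpire(new Date()).call();
      System.out.println("After gc: " + stats);
    }
  }
}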

But if you wanted to be cautious about it, you could limit the scope of this to just a command, or add a way for a script to flag itself as pruneable. Though I'm unsure how you would do the latter without first doing a full clone. Via an argument, perhaps? Not the best solution in my opinion, but it is what it is.
 