Feature Prefs file backup/restore Draft PR

fronobulax

Developer
Staff member
I suggested a database because my experience, primarily with SQL-based systems, is that, if properly configured, backup, recovery and several other useful protections come "for free" with the decision to go there. But the driver is that if only one thing is changed then only one thing is written. I am becoming more and more convinced that corruption not caused by power problems or hardware issues happens because we write all of the preferences when only one changes, and because when we read the file we overwrite rather than merge.

I'm not opposed to the PR but still believe it is not fixing the problem, just making it less painful if it occurs.

I think the issue is specific to preference files and not all files KoLmafia manages. It might be worth exploring using a local git repository for backups. Motivated users (since I don't think mafia supports this) could use a diff tool and manually merge changes instead of just choosing the least out of date file.
 

MCroft

Developer
Staff member
I'm ok with making things less painful while we figure out a better long term solution. The weakest part of my approach is that I've never had this problem and I don't have any examples of corrupted preference files.

Under the hood a DB is writing to files (and DB corruption is miserable to deal with), but what you have going for you is a manager that handles those writes. If the concern is simultaneous writes, maybe we need a single prefsWriter thread that reads from a blocking queue that every thread that currently writes prefs can populate. That also might be a step towards pulling it out and putting in a DB-based prefsWriter.
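
To make the single-writer idea concrete, here is a minimal Java sketch. It is an illustration only: the class name PrefsWriter, its methods, and the key=value file format are all assumptions, not existing KoLmafia code. Any thread may call set(), but only the worker thread ever touches the file, so two writes can never interleave.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: none of these names exist in KoLmafia.
final class PrefsWriter implements AutoCloseable {
    // Sentinel compared by identity; tells the worker to finish.
    private static final Map.Entry<String, String> STOP = new SimpleImmutableEntry<>("", "");

    private final BlockingQueue<Map.Entry<String, String>> queue = new LinkedBlockingQueue<>();
    private final Map<String, String> prefs = new TreeMap<>(); // touched only by the worker thread
    private final Path file;
    private final Thread worker;

    PrefsWriter(Path file) {
        this.file = file;
        this.worker = new Thread(this::drain, "prefsWriter");
        this.worker.start();
    }

    // Safe to call from any thread; it only enqueues and never does file I/O.
    void set(String key, String value) {
        queue.add(new SimpleImmutableEntry<>(key, value));
    }

    private void drain() {
        try {
            boolean running = true;
            while (running) {
                Map.Entry<String, String> change = queue.take(); // blocks until work arrives
                boolean changed = false;
                while (change != null) {
                    if (change == STOP) {
                        running = false;
                        break;
                    }
                    prefs.put(change.getKey(), change.getValue());
                    changed = true;
                    change = queue.poll(); // non-blocking: batch whatever is already queued
                }
                if (changed) {
                    flush(); // one file write per batch of changes
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Writes every preference as a key=value line (assumed format).
    // An I/O failure here ends the worker thread; a real version would report it.
    private void flush() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : prefs.entrySet()) {
            lines.add(e.getKey() + "=" + e.getValue());
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Flushes everything queued before this call, then stops the worker.
    @Override
    public void close() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because flush() is the only place that knows how preferences are stored, swapping the file write for a database update later would not change the callers.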

I almost agree on prefs being the only file with a current issue, but I suspect that if we wrote to anything else with the frequency and glee with which we write to the prefs, it might not be the case.

If you want to look at Git for data, you might want to look at Dolt or similar products (https://www.dolthub.com/blog/2020-03-06-so-you-want-git-for-data/). I don't know if the branching/merging/versioning capabilities are worth the effort of not just using a SQL database. (edit: the Dolt folks say that it is a version-controlled database https://www.dolthub.com/blog/2022-08-04-database-versioning/). I think it's an interesting approach that I haven't messed with before and which I'll leave for later incremental PRs.
 

fronobulax

Developer
Staff member
Jury is out on the backup since there are OS level solutions that require no changes to mafia. But we can do it if people think we need to.

IMO the simplistic avenue to explore is to find a way so that reading the file, writing the file and changing a preference in memory are done in a controlled fashion. We could probably get that by synchronizing some operations but there may be a performance hit. I also recall a scheme that worked around a similar problem by writing multiple files. Reads are from file A. Writes go to file B. When a write is done B replaces A. That means a read has to be a merge operation rather than blindly overwriting. Might not work as I remember :)
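
As a rough Java sketch of the "controlled fashion" idea: one lock guards the in-memory values, the file read and the file write, and a reload keeps any in-memory change that has not been saved yet instead of overwriting it. The class SyncedPrefs, its methods, the "dirty" bookkeeping and the key=value format are all assumptions made for illustration, not how KoLmafia's preferences code works.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; not KoLmafia's actual preferences class.
final class SyncedPrefs {
    private final Object lock = new Object();
    private final Map<String, String> values = new TreeMap<>();
    private final Set<String> dirty = new HashSet<>(); // keys changed since the last save
    private final Path file;

    SyncedPrefs(Path file) {
        this.file = file;
    }

    void set(String key, String value) {
        synchronized (lock) {
            values.put(key, value);
            dirty.add(key);
        }
    }

    String get(String key) {
        synchronized (lock) {
            return values.get(key);
        }
    }

    // Reads the file but keeps in-memory values that have not been saved yet,
    // so a reload cannot silently discard a pending change.
    void reload() {
        synchronized (lock) {
            try {
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    int eq = line.indexOf('=');
                    if (eq < 0) {
                        continue; // skip malformed lines
                    }
                    String key = line.substring(0, eq);
                    if (!dirty.contains(key)) {
                        values.put(key, line.substring(eq + 1));
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Writes all values while holding the lock; this is where the
    // performance hit would come from, since set() waits during the I/O.
    void save() {
        synchronized (lock) {
            List<String> lines = new ArrayList<>();
            for (Map.Entry<String, String> e : values.entrySet()) {
                lines.add(e.getKey() + "=" + e.getValue());
            }
            try {
                Files.write(file, lines, StandardCharsets.UTF_8);
                dirty.clear();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```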
 

xKiv

Active member
fronobulax said:
I also recall a scheme that worked around a similar problem by writing multiple files. Reads are from file A. Writes go to file B. When a write is done B replaces A. That means a read has to be a merge operation rather than blindly overwriting. Might not work as I remember :)

The full scheme is a bit more complicated than that - it also writes a "lock file" [1], so that other concurrently running instances [2] know they should refrain from writing. But it's a very standard way of guaranteeing that you wrote a complete new version of the file before you get rid of the previous version (or rather, back up the previous version by renaming it, then put the new version in place, also by renaming it).
(I would also *not* call it "merge", that has connotations of reading records from two files and writing them all into a third file .... or working with a fixed-length record files)
(you also need to somehow explicitly handle the situation where your code finds that there already is an old "B" file - maybe just even ask the user to handle the situation manually, of course)

[1] I don't exactly remember what the precise solution is for remote filesystems or other cases where you cannot guarantee an "exclusive create" mode
[2] in the case of mafia, I think we would only be worried about cases when remote syncing copies files between *different* machines that are currently running mafia? Is that something that happens? I can imagine forgetting that I left mafia on my home PC running and then starting it from laptop, or something like that ... but I don't use file syncing solutions, so I don't know how *that* plays into it.
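
A rough Java sketch of the scheme described above, for a local filesystem only. The class name SafeReplace and the ".lock", ".new" and ".bak" suffixes are made up for this example, and a leftover ".new" file is simply reported as an error rather than resolved. The sketch does not force data to disk before renaming, which a production version would probably want.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical sketch; names and file suffixes are assumptions.
final class SafeReplace {
    // Returns false if another instance holds the lock, true once the
    // new version is in place.
    static boolean write(Path target, List<String> lines) throws IOException {
        Path lock = target.resolveSibling(target.getFileName() + ".lock");
        Path fresh = target.resolveSibling(target.getFileName() + ".new");
        Path backup = target.resolveSibling(target.getFileName() + ".bak");

        try {
            // Existence check and creation are one atomic operation here.
            Files.createFile(lock);
        } catch (FileAlreadyExistsException e) {
            return false; // someone else is writing; refrain
        }
        try {
            if (Files.exists(fresh)) {
                // A previous write was interrupted; leave it for the user to sort out.
                throw new IOException("stale file needs manual attention: " + fresh);
            }
            // 1. Write the complete new version under a different name.
            Files.write(fresh, lines, StandardCharsets.UTF_8);
            // 2. Back up the previous version by renaming it.
            if (Files.exists(target)) {
                Files.move(target, backup, StandardCopyOption.REPLACE_EXISTING);
            }
            // 3. Put the new version in place, also by renaming it.
            Files.move(fresh, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } finally {
            Files.deleteIfExists(lock);
        }
    }
}
```

If the process dies between steps 2 and 3, the target is missing but both the ".bak" and ".new" files are complete, so nothing is lost; the lock file is also left behind, which is one more case the caller would need to handle.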
 