Feature - Implemented Prefs file backup/restore Draft PR

I suggested a database because my experience, primarily with SQL based systems, is that if properly configured backup, recovery and several other useful protections come "for free" with the decision to go there. But the driver is that if only one thing is changed then only one thing is written. I am becoming more and more convinced that corruption, not cause by power problems or hardware issues, is because we write all of the preferences when only one changes and when we read the file we overwrite rather than merge.

I'm not opposed to the PR but still believe it is not fixing the problem, just making it less painful if it occurs.

I think the issue is specific to preference files and not all files KoLmafia manages. It might be worth exploring using a local git repository for backups. Motivated users (since I don't think mafia supports this) could use a diff tool and manually merge changes instead of just choosing the least out of date file.
 
I suggested a database because my experience, primarily with SQL based systems, is that if properly configured backup, recovery and several other useful protections come "for free" with the decision to go there. But the driver is that if only one thing is changed then only one thing is written. I am becoming more and more convinced that corruption, not cause by power problems or hardware issues, is because we write all of the preferences when only one changes and when we read the file we overwrite rather than merge.

I'm not opposed to the PR but still believe it is not fixing the problem, just making it less painful if it occurs.

I think the issue is specific to preference files and not all files KoLmafia manages. It might be worth exploring using a local git repository for backups. Motivated users (since I don't think mafia supports this) could use a diff tool and manually merge changes instead of just choosing the least out of date file.
I'm ok with making things less painful while we figure out a better long term solution. The weakest part of my approach is that I've never had this problem and I don't have any examples of corrupted preference files.

Under the hood the a DB is writing to files (and DB corruption is miserable to deal with), but what you have going for you is a manager that handles those writes. If the concern is simultaneous writes, maybe we need a single prefsWriter thread that reads from a blocking queue that every thread that currently writes prefs can populate. Which also might be a step towards pulling it out and putting in a DB based prefsWriter.

I almost agree on prefs being the only file with a current issue, but I suspect that if we wrote to anything else with the frequency and glee with which we write to the prefs, it might not be the case.

If you want to look at Git for data, you might want to look at Dolt or similar products (https://www.dolthub.com/blog/2020-03-06-so-you-want-git-for-data/). I don't know if the branching/merging/versioning capabilities are worth the effort of not just using a SQL database. (edit: the Dolt folks say that it is a version-controlled database https://www.dolthub.com/blog/2022-08-04-database-versioning/). I think it's an interesting approach that I haven't messed with before and which I'll leave for later incremental PRs.
 
Jury is out on the backup since there are OS level solutions that require no changes to mafia. But we can do it if people think we need to.

IMO the simplistic avenue to explore is to find a way so that reading the file, writing the file and changing a preference in memory are done in a controlled fashion. We could probably get that by synching some operations but there may be a performance hit. I also recall a scheme that worked around a similar problem by writing multiple files. Reads are from file A. Writes go to file B. When a write is done B replaces A. That means a read has to be a merge operation rather than than blindly overwriting. Might not work as I remember :-)
 
I also recall a scheme that worked around a similar problem by writing multiple files. Reads are from file A. Writes go to file B. When a write is done B replaces A. That means a read has to be a merge operation rather than than blindly overwriting. Might not work as I remember :-)

The full scheme is a bit more complicated than that - it also writes a "lock file" [1], so that other concurrently running instances [2] know they should refrain from writing. But it's a very standard way of making sure that you can guarantee that you wrote a complete new version of the file before you get rid of the previous version. (or rather, backup the previous version byrenaming it, then put the new version in place also by renaming it).
(I would also *not* call it "merge", that has connotations of reading records from two files and writing them all into a third file .... or working with a fixed-length record files)
(you also need to somehow explicitly handle the situation where your code finds that there already is an old "B" file - maybe just even ask the user to handle the situation manually, of course)

[1] I don't exactly remember what the precise solution is for remote filesystems or other cases where you cannot guarantee an "exclusive create" mode
[2] in the case of mafia, I think we would only be worried about cases when remote syncing copies files between *different* machines that are currently running mafia? Is that something that happens? I can imagine forgetting that I left mafia on my home PC running and then starting it from laptop, or something like that ... but I don't use file syncing solutions, so I don't know how *that* plays into it.
 
Sharing another data point in case it's helpful – I just had my prefs wiped this morning shortly after opening my laptop, with no power loss overnight. I'm on OS X and only run Mafia on one computer.

1) I accidentally left mafia logged in overnight. Upon opening my laptop, it logged back in and refreshed all the daily prefs
2) I closed all mafia windows to have it log out
3) While logged out, I happened to update my mafia version from r27373 to r27381 (I assume unrelated, but mentioning for completeness)
4) Upon logging back in, I discovered my prefs were wiped

Unfortunately I didn't have the presence of mind to check the prefs files and see what they looked like before starting to fix things inside mafia, but if there's other diagnostic info that might help I'd be happy to provide it.
 
Sharing another data point in case it's helpful – I just had my prefs wiped this morning shortly after opening my laptop, with no power loss overnight. I'm on OS X and only run Mafia on one computer.

1) I accidentally left mafia logged in overnight. Upon opening my laptop, it logged back in and refreshed all the daily prefs
2) I closed all mafia windows to have it log out
3) While logged out, I happened to update my mafia version from r27373 to r27381 (I assume unrelated, but mentioning for completeness)
4) Upon logging back in, I discovered my prefs were wiped

Unfortunately I didn't have the presence of mind to check the prefs files and see what they looked like before starting to fix things inside mafia, but if there's other diagnostic info that might help I'd be happy to provide it.

Thank you.

I think a fundamental problem is that we reread the preferences upon relogin and that somehow has a negative interaction with preferences that are already in memory. I've been wrong before.

I looked at the write code and lots of things are synchronized and it looks correct so I don't think simultaneous reads and writes are possibile. But again, I've been wrong before.
 
OK, so is there a pattern like play/sleep/get KoL logged out/wake/play/relogin? Is the problem happening because we're trying to write after we wake up and we've been logged out or the file handle is closed or something?

We're not failing, but we're not succeeding the way we want.
 
I don't want to put it I this PR, but would it be worthwhile to instrument the code a bit? Or put some validation in the write or in loadPreferences?

This is why I'd like to see what it actually looks like, but I'm imagining something like this.
Java:
synchronized (encodedData) {
  for (Entry<String, byte[]> current : encodedData.entrySet()) {
    if ( current.getKey().contains( "\u0000" ) || (current.getValue().contains("\u0000"))) {
       // ask the user to report the problem to the forums and tell them to check their backup...
    }
    // maybe this should be inside an else block, but not until we really know what's going on...
    fstream.write( current.getValue() );
  }
}
 
OK, so is there a pattern like play/sleep/get KoL logged out/wake/play/relogin? Is the problem happening because we're trying to write after we wake up and we've been logged out or the file handle is closed or something?

We're not failing, but we're not succeeding the way we want.
I'm headed in that direction.

I'm a little uncertain as to the exact nature of the corruption. Is it a malformed file cause by the way we write or is data in memory getting corrupted and we correctly write bad data?

The latter might occur, for example, if we clear everything in memory before a loading from disk and we try and write from memory while the clear is in progress?
 
I don't want to put it I this PR, but would it be worthwhile to instrument the code a bit? Or put some validation in the write or in loadPreferences?

This is why I'd like to see what it actually looks like, but I'm imagining something like this.
Java:
synchronized (encodedData) {
  for (Entry<String, byte[]> current : encodedData.entrySet()) {
    if ( current.getKey().contains( "\u0000" ) || (current.getValue().contains("\u0000"))) {
       // ask the user to report the problem to the forums and tell them to check their backup...
    }
    // maybe this should be inside an else block, but not until we really know what's going on...
    fstream.write( current.getValue() );
  }
}

Is there any type of corruption other than writing zeros that are then interpreted as characters?

I'd be fine with committing temporary code that tries to detect corruption, logs some data, asks the user to share it and then apologizes because mafia knows there is corruption and is currently doing nothing.
 
It's not the only bad state, but it's one known bad state. It's why I'd love to see some logs or prefs files that were in bad state, to see what else we should be trying to detect.

Another bad state is "my prefs were reset to the defaults". Part of the problem is that there are normal states where a user might have only the default preferences, such as a new character or they've logged in from a new machine.

My immediate thought on how to get users to help us find this would be a bug bounty, but the lack of funds makes this difficult. Perhaps @gausie can send the user who gives us a repeatable set of steps a piece of toast.

Or is this something excavator can excavate?

Which brings us to the "what's helpful and when?" questions.
What's the right information to capture? Platform, version (for use later), key/value with the detected error, notes from user? Do we put it in the debug log automatically?

What's the right way to trigger it? One thing we've identified is strings of \u0000 in the prefs file, but we've also heard that prefs revert to default. I think that's the case for the backup, right now, which is that the current behavior is "prefs are size 0, therefore set the defaults", and the new logic is "prefs are size 0 and there is a backup, read from the backup or else set the defaults". But we still want to capture it.
 
Potentially related - can lose info about how many turns you have spent in a zone. Bringing it up as that seems to be related but perhaps it isn't!

 
@Alium, I'm not sure that's related. I think that the narrow fix for that rollover issue is in the rollover code, not the prefs loading/saving code. If you want to prove it's unrelated, try it without writing to the file. This PR is strictly about the file system related issues we've been seeing.
 
Maybe related, maybe not. Relevant to the discussion of corruption. I have been running a couple of Bale's scripts since they were active. Several times "recently" they have failed and I could not think of anything I had changed deliberately. I tracked the problem down to preferences. @zarqon's PrefRefPlus showed preferences and values that were mismatched in types. A preference was believed to be a Boolean but was in fact an integer or it was expected to be a string but was set to (int) zero. The scripts worked as expected once I changed the value to match they type. The wrong values could have been due to read/write corruption since every instance I can recall had an int where something else was expected.
 
I just got a mid-session pref-wipe. I haven't closed mafia yet, so if there's any diagnostic information I can provide let me know.

I had been running a script and my computer went to sleep; when it woke up, the script was stalled but mafia was still open. When I directed mafia to open the relay, I got the wipe.
 
Just had a power flicker and my prefs file was lost, the backup file worked amazingly, just a few of my counters were off but everything else worked smoothly. Thanks so much for this.
 
Back
Top