Feature - Implemented GZip Support for Session Logs

Darzil

Developer
Personally I'd prefer a mafia supported standard way. Otherwise we are opening up a sea of "I zipped a file but it won't read" bug reports in future.
 

Saklad5

Member
ZIP doesn’t preserve UNIX metadata, so if we are going to choose a standard way it shouldn’t be that.

That being said, it should not be too difficult to cover all the likely formats. As long as everything is well-documented on the wiki, it should be fine.

At any rate, we’ll have to implement reading compressed files whether or not we implement writing them. So we might as well do this first.
 

fronobulax

Developer
Staff member
To be clear the patch I am testing does one thing. To support ash session_logs, getSessionLogs derives a file name based on date and player and then processes the file if it exists. Old behavior if the file does not exist is to move on. New behavior when file does not exist is to append ".gz" to the derived filename and process it if it exists, otherwise move on.

It will be tested on two different OSes but the truth is making a GZIPInputStream is something Java is doing so if it doesn't work across OSes it is a Java bug. The use of a hardwired lower case ".gz" in the file name could be a cross OS problem but since ".txt" is already hardwired this doesn't introduce any new "case sensitivity across OS" problems.

As it is this feature is of no personal use. The discussion about archives and Zip compression was an exploration to see if there was a a feature Saklad5 and I were both interested in, since I am selfish and lazy and was not going to implement something that benefited perhaps as few as two people and neither of them was me. However Saklad5 produced a patch before there was any resolution on an expanded feature set. That shifted my sentiment because (as a community service) I will review and commit a patch that does nothing for me if it seems to do something for someone else and does no harm.

So if there are concerns with is actually under test then I will certainly cease and desist but at the moment I do not think this breaks anything and adds a capability at least two people are interested in.

getSessionLogs derives a filename and looks for it. If, instead, it iterated over the contents of /sessions then support for archives, other compression schemes and directory recursion should be considered.
 

Saklad5

Member
That’s about right, with one caveat: the current behavior actually creates a blank file if it can’t find one. I don’t really understand the benefit of doing that, but I made sure to keep that the same in my patch. If it cannot find either an uncompressed or compressed file, it makes a new blank file with the “.txt” ending.

If you want to compress individual files into ZIPs, I could easily implement support for that by just replicating my existing code and replacing GZIPInputStream with ZipInputStream. However, multi-file ZIPs would cause undefined behavior, so I feel it is probably best to hold off on that until we decide how to deal with it.

It might be best to eventually make a separate class with functions for reading and writing compressed/archived files, rather than cluttering RuntimeLibrary with a bunch of control structures for every format. That isn’t necessary right now, though.
 

fronobulax

Developer
Staff member
Personally I'd prefer a mafia supported standard way. Otherwise we are opening up a sea of "I zipped a file but it won't read" bug reports in future.

Point considered. I think it is apples and oranges because a gz file is compressed and uncompressing it creates a single file. That file may be an archive of files (.tar, .jar, etc.) but it does not have to be. A non-empty zip file contains one or more individual files, possibly organized in a hierarchy that may be significant.

You can only use this feature if you use and understand gZip. AFAIK plain vanilla Windows contains no tools to write a .gz file so the people who benefit are non-Windows users or non-vanilla Windows users. So maybe it gets rejected as quasi-OS specific?

I playing for a music camp the rest of this week so won't even test until later. I'll test and let this percolate to see who else weighs in an on which side.

Thanks. Discussion is much better than having something reverted :)
 

Saklad5

Member
Point considered. I think it is apples and oranges because a gz file is compressed and uncompressing it creates a single file.

You can only use this feature if you use and understand gZip. AFAIK plain vanilla Windows contains no tools to write a .gz file so the people who benefit are non-Windows users or non-vanilla Windows users. So maybe it gets rejected as quasi-OS specific?

I went with gzip because it is both built into Java and guaranteed to deflate into one file. As far as I’m concerned, archives should be treated like folders, and that currently means ignoring them. I recognize that gzip might be less accessible to a fresh Windows installation, but anyone willing to manage their session logs like this is unlikely to find that troublesome.

More importantly, adding support for gzip does not rule out supporting other formats in the future, and no one is disadvantaged by not utilizing this functionality: if they don’t use a supported format, nothing has changed. If you use ZIP right now, you can keep using ZIP.

By the way, could you post a patch with the revisions you made to my code? SVN makes parallel development rather difficult, so I want to make sure we are all handling the same thing.
 
Last edited:

fronobulax

Developer
Staff member
I've tested the modified patch to my satisfaction.

It changes the behavior of session_logs.

If I doesn't find a txt session log but does find a txt.gz session log it will process the latter instead of continuing without that session.

Only people who individually gzip session logs and observe the naming convention will be able to take advantage of this . Everyone else should see no change.

Since session_logs derives a file name and looks for it, as opposed to iterating over the contents of a directory, I see no easy way to support archives without rewriting session_logs. I am not willing to do that. Nor do I see any benefit to requiring zip file users to create an archive with exactly one file and a particular file name.

I will commit this in the evening unless there is more discussion against it.
 

Saklad5

Member
This should be marked as implemented. Support for folders, archives, and other compression formats can be dealt with in future requests.

Thank you, by the way.
 
Last edited:

Saklad5

Member
I wrote a Bourne script for compressing session logs that should work on everything besides Windows without modification, if anyone wants it:
Code:
#!/bin/sh
find $1/sessions -name *.txt | parallel -X 'gzip -N9 {}'
It takes the path of the KoLmafia data directory as an argument, and requires GNU Parallel and gzip. It’ll distribute work efficiently across all available CPUs. I have it running as a cron job on my Ubuntu server, two minutes after rollover.
 
Last edited:

Crowther

Active member
I wrote a Bourne script for compressing session logs that should work on everything besides Windows without modification, if anyone wants it.

It takes the path of the KoLmafia data directory as an argument, and requires GNU Parallel and gzip. It’ll distribute work efficiently across all available CPUs.

Hmm. I would have just done something like
Code:
pigz -9 ~/KoLmafia/sessions/*
 

Saklad5

Member
I’ve heard of pigz, but it seems like it produces slightly larger files due to splitting the file up for processing. I’ve also heard it is rather inefficient, especially with RAM. Besides, that’s mainly for using multiple CPU threads to compress one file, and that isn’t really necessary here. Log files are almost certainly not going to be larger than a few megabytes, but there are potentially many of them.

Also, GNU Parallel is wonderful and should be installed on everything anyway. It can show progress bars with ETAs, output job logs, and even distribute work across multiple systems using SSH and rsync.
 
Last edited:
Top