
Thread: GZip Support for Session Logs

  1. #1
    Senior Member
    Join Date
    Apr 2018
    Posts
    133

    Default GZip Support for Session Logs

    Given enough time, session logs from KoLmafia can take up quite a lot of storage space. Thankfully, the nature of these logs means that they are extremely compressible using tools like Gzip.

    Many scripts access these logs through the session_logs ASH function, though, and compressing them would break that functionality. I'd like to change that function so that it automatically decompresses and reads gzipped logs alongside uncompressed logs, as if they were normal text. Java has a built-in class for that, so it should be fairly easy to implement.
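    Something along these lines ought to do it (just a sketch; the helper and its name are mine, not actual KoLmafia code, but java.util.zip.GZIPInputStream is the built-in class I mean):

    Code:
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPInputStream;

    // Open a session log as text, decompressing on the fly when the
    // file is gzipped; callers read lines exactly as before.
    static BufferedReader openSessionLog( File file ) throws IOException
    {
        InputStream in = new FileInputStream( file );
        if ( file.getName().endsWith( ".gz" ) )
        {
            // GZIPInputStream ships with the JDK, so no new dependency.
            in = new GZIPInputStream( in );
        }
        return new BufferedReader(
            new InputStreamReader( in, StandardCharsets.UTF_8 ) );
    }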

    Compressing the files in the first place could be left to the user. Gzip isn't computationally intensive (it is already used for web requests in KoLmafia, after all), but adding a setting for it would probably confuse a lot of people.

  2. #2
    Senior Member zarqon's Avatar
    Join Date
    Nov 2007
    Location
    Seoul, Korea
    Posts
    3,584

    Default

    I support this idea!

    Sometime in the beginning of each year I make a zip file of the previous year's logs and move it to another folder (because I sync my KoLmafia directory with Dropbox and have relatively limited space). This feature would increase my time between purgings.
    Sig by JakAtk
    My scripts: Prefref Plus | Skillref Plus | One-Click Wossna | Om*****st (??) | Psychose-a-Matic | RandBot
    Combat suite: Best Between Battle | Mercenary Mood | SmartStasis | BatMan | BatMan RE
    For script authors: ASH Wiki | ZLib | BatBrain | CLI Links | Drag-n-Drop Inventories | CanAdv | Script Registry | Map Manager
    If you appreciate my work, help me become BAT KING OF THE WORLD! Thanks to all donators!

  3. #3
    Developer fronobulax's Avatar
    Join Date
    Feb 2009
    Location
    Central Virginia, USA
    Posts
    4,159

    Default

    In contrast, on the first day of every month I move all my logs to a zip file. I have no use case for using KoLmafia to access logs older than a month, and am quite comfortable with manually unzipping if one developed. I am also quite content to parse logs using tools other than ash scripts, so I gain nothing from compressing individual files: my tools read text and act on all (matching) files in a directory.

    That said, it does seem as if adding a new choice of input stream wouldn't be too hard.

    But...

    Should this support a collection of individually compressed files in a subdirectory, a compressed archive of files, or both? Since KoLmafia is cross-platform, which compression and archive formats should be supported? Should it be expected to merge results from an archive and a directory?

    Given my available storage, the ease of adding files to a zip file on Windows, and the different process required to create a collection of zip files (each containing one file), I am most interested in this if it supports archives, and then (since the use case is session_logs) if it handles duplicates (the same file name in both archive and directory) or renamed files.

  4. #4
    Senior Member
    Join Date
    Apr 2018
    Posts
    133

    Default

    I think gzip is ideally suited for this type of compression, and it is already supported by basically everything. As I said, KoLmafia actually already uses gzip to speed up web requests. ZIP files are less consistent across systems, lose certain UNIX metadata, take longer to compress and decompress, and produce a larger file. However, more formats can be added if you feel that is necessary (xz, for example, boasts a much better compression ratio than gzip, though it is more computationally intensive and not built into Java itself), but that seems like a bit of a slippery slope.

    As for multi-file archives, I think it would simplify things tremendously to only support individual files in the existing directory with unchanged names (extension aside). That rules out all but one potential conflict: the presence of both an uncompressed file and a compressed one. In that case, the most recently modified file should be used.
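    A sketch of that rule (the helper name and the example log name are made up for illustration):

    Code:
    import java.io.File;

    // Given a base name like "Saklad5_20180729.txt", prefer whichever
    // of the plain and gzipped versions was modified more recently.
    static File pickSessionLog( File dir, String baseName )
    {
        File plain = new File( dir, baseName );
        File gzipped = new File( dir, baseName + ".gz" );
        if ( !gzipped.exists() ) return plain;
        if ( !plain.exists() ) return gzipped;
        return plain.lastModified() >= gzipped.lastModified()
            ? plain : gzipped;
    }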

    By the way, gzip is not quite the same as ZIP. Rather than compressing groups of files into a single archive, gzip takes arbitrary data (a file, a stream, whatever) and spits it back out compressed. You are supposed to use a different tool (almost always tar) to bundle multiple files into one uncompressed archive, then feed that into gzip. This is why you'll often see multiple extensions on gzipped archives (.tar.gz): such a file decompresses into a single tar archive (.tar), which in turn contains multiple files and folders. ZIP, in contrast, combines both functions. We can skip tar and deal only with individual .gz files for now, though.
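    For what it's worth, compressing a single log yourself takes only a few lines with the JDK's GZIPOutputStream (again, just a sketch, not proposed KoLmafia code):

    Code:
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.zip.GZIPOutputStream;

    // Gzip one session log, producing "<name>.gz" next to the original.
    // gzip only ever sees a single byte stream, so one input file
    // yields exactly one compressed file -- no archive involved.
    static void gzipLog( String path ) throws IOException
    {
        try ( GZIPOutputStream out = new GZIPOutputStream(
                  new FileOutputStream( path + ".gz" ) ) )
        {
            Files.copy( Paths.get( path ), out );
        }
    }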
    Last edited by Saklad5; 07-30-2018 at 12:41 AM. Reason: Clarified that gzip alone will never have multiple files

  5. #5
    Senior Member
    Join Date
    Apr 2018
    Posts
    133

    Default

    Originally Posted by zarqon View Post

        because I sync my KoLmafia directory with Dropbox and have relatively limited space
    I highly recommend Resilio Sync (formerly known as BitTorrent Sync) for this. Rather than using cloud storage, it uses the torrent protocol to synchronize data directly between your devices, no server involved. It’s extremely fast and efficient, and improves if you add more devices. The only downside is that at least one of the devices with the most recent version of the folder has to be online at the same time as the devices without it for changes to propagate.

  6. #6
    Senior Member
    Join Date
    Apr 2018
    Posts
    133

    Default

    To give an idea of what impact this could have, I just compressed my 153MB sessions folder with tar and xz. The resulting file is 4.5MB, roughly a 34:1 ratio.

    xz is probably not a good candidate for this, unless we want to support multiple formats, for reasons I gave previously: more intensive, not built into Java, etcetera. Gzip is nearly as good without any of those drawbacks, which is why it is baked into the Web.

    7zip actually uses the same algorithm as xz (LZMA2 to be precise), so the same applies to it.
    Last edited by Saklad5; 07-30-2018 at 12:54 AM.

  7. #7
    Developer fronobulax's Avatar
    Join Date
    Feb 2009
    Location
    Central Virginia, USA
    Posts
    4,159

    Default

    Originally Posted by Saklad5 View Post

        I think gzip is ideally suited for this type of compression, and it is already supported by basically everything. As I said, KoLmafia actually already uses gzip to speed up web requests. ZIP files are less consistent across systems, lose certain UNIX metadata, take longer to compress and decompress, and produce a larger file. However, more formats can be added if you feel that is necessary (xz, for example, boasts a much better compression ratio than gzip, though it is more computationally intensive and not built into Java itself), but that seems like a bit of a slippery slope.

        As for multi-file archives, I think it would simplify things tremendously to only support individual files in the existing directory with unchanged names (extension aside). That rules out all but one potential conflict: the presence of both an uncompressed file and a compressed one. In that case, the most recently modified file should be used.

        By the way, gzip is not quite the same as ZIP. Rather than compressing groups of files into a single archive, gzip takes arbitrary data (file, stream, whatever) and spits it back out. You are supposed to use a different tool (tar, almost always) to bundle multiple files without compressing them, then feed that into gzip. This is why you'll often see multiple extensions on gzipped archives (tar.gz). It decompresses into a single tar file, and that contains multiple files/folders. In contrast, ZIP basically includes both of these functions. We can skip tar if we only support individual files for now, though.

    You may proclaim technical superiority as much as you want, but your task is to either submit a patch for review or convince someone else to do the work. I am probably the least prolific of the current devs, but I am willing and able to work on things that benefit me personally or that I think are really, really important.

    My use case is that all logs being actively "mined" are in text format in a single directory. Compressed files are in a zip archive because that is what is easiest to create on Windows. If a file in an archive needs to be processed it is manually extracted.

    If this new feature is to be of any use to me, it needs to be able to treat a (zip) archive as augmenting the contents of a directory. So my comments were an attempt to bridge the gap between your *nix-centered requirements and my Windows-centric way of doing things.

    Originally Posted by Saklad5 View Post

        By the way, gzip is not quite the same as ZIP.

    Please do not make assumptions about my technical knowledge and talk down to me.
    Originally Posted by Veracity View Post

        You just vehemently agreed with me

    Originally Posted by Veracity View Post

        I agree with frono.

  8. #8
    Developer fronobulax's Avatar
    Join Date
    Feb 2009
    Location
    Central Virginia, USA
    Posts
    4,159

    Default

    Originally Posted by Saklad5 View Post

        I highly recommend Resilio Sync (formerly known as BitTorrent Sync) for this. Rather than using cloud storage, it uses the torrent protocol to synchronize data directly between your devices, no server involved. It’s extremely fast and efficient, and improves if you add more devices. The only downside is that at least one of the devices with the most recent version of the folder has to be online at the same time as the devices without it for changes to propagate.
    My personal use case for Dropbox or any other synch service is that computers do not have to be online at the same time. If it's too sensitive for someone else's cloud I just don't synch it. YMMV, obviously.

  9. #9
    Developer
    Join Date
    Aug 2009
    Posts
    2,814

    Default

    In my limited testing (over my ~50MB of session files from the past month), gzip provides roughly 15x compression, while bzip2 and xz provide roughly 30x. Surprisingly, gzipping files individually didn't hurt much: the total was maybe 10% larger than gzipping the entire tarball. The gap is wider for the other formats: compressing the tarball beat individually compressed files by ~25% with bzip2, and by a more drastic 40% with xz.

    Also, maybe I'm still in the stone age with my rsync usage... I have no idea what this Resilio Sync is, but it looks like it doesn't send deltas or compress files in transit?

  10. #10
    Developer fronobulax's Avatar
    Join Date
    Feb 2009
    Location
    Central Virginia, USA
    Posts
    4,159

    Default

    As long as we are off on a tangent: at one point in Ye Olden Days, compressing files did not save as much disk space as people hoped, because of disk sector size. If the sector size was 2048 bytes, compressing a 2040-byte file down to 200 bytes did not free up 1840 bytes for another file; the compressed file still occupied a whole sector. This was used as an argument for putting files in an archive and (typically) compressing the archive. The savings are trivial on a modern PC, but there are still a few embedded devices in use in the real world where bytes matter.
