To zip or not to zip (General scripting not ash)

Paragon

Member
Hey, hope nobody minds, but I just wanted to ask a general question even though it doesn't relate to ash. How large would a string need to be to justify zipping it before sending it over the wire? I.e., at how many bytes would the time spent zipping/unzipping become shorter than the time spent sending uncompressed data over the line?
 

Bale

Minion
Way too many undeclared variables, but maybe someone can give you a formula to determine it.

  • How compressible is the string? Not all data can be compressed equally.
  • How fast is the system's processor?
  • What form of compression is being used? Some are faster, but others compress certain formats of data better.

Can anyone else find other relevant factors?
 

Paragon

Member
Let's see...
The plaintext data is JSON encoded... possibly encrypted JSON... so I don't know how compressible that is.
And the rest of it is up in the air b/c I still have to pick a server... but let's assume I use a pretty cheap VPS (the VPS has to be Windows b/c the app uses .NET features that Mono doesn't support):
• OS: Windows Server 2008 Standard 64-bit
• RAM: 1 GB
• Storage: 20 GB
• Bandwidth: 1,000 GB/mo

That's GoDaddy's Economy VPS... they don't mention processing power.
The compression would be a format 7-Zip is capable of using (probably zip, since that's the format most commonly available to end users... who use Windows, the target audience).

Caring about bandwidth usage... tough, tough question... let's say I "DO" care about bandwidth usage... but I have no idea to what extent :p Honestly, I don't even know how to begin estimating bandwidth, or even bandwidth per session.
 
Last edited:

xKiv

Active member
Well, "compressing" an empty file with 7z lzma maximal compression creates a 70-byte file (zip created 100-byte file), but a lot of that will be file info overhead (name of file, ...).
In general, if your string is mostly text [A-Za-z0-9_{}()/.!@#$%^&*()-_=+|\\\[\]<>,.?] (that's ~100 different characters), you will probably crunch the string to smaller than 90% of original even with extremely primitive methods, + overhead. That's definitely a win for strings of 1001 bytes and more.
In reality, you will get much better rates, but it will probably still be best to compress every time, then check which is smaller (compressed or uncompressed), then send a flag and the smaller version. Or use a compression library that already does that by default.
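Something like this, say, in Python with zlib (the one-byte flag and the exact frame layout here are just an illustration I made up, not any standard):

```python
import zlib

def encode_payload(data: bytes) -> bytes:
    """Compress, then send whichever version is smaller, behind a 1-byte flag."""
    compressed = zlib.compress(data, 6)
    if len(compressed) < len(data):
        return b"\x01" + compressed  # 0x01 = compressed
    return b"\x00" + data            # 0x00 = sent as-is

def decode_payload(frame: bytes) -> bytes:
    flag, body = frame[0], frame[1:]
    return zlib.decompress(body) if flag == 1 else body
```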

Bandwidth will depend on the slowest point between your server and client, and there will be additional overhead added there too. Decent compression shouldn't be too taxing on your CPU, though. The best compression is one that can use pre-existing shared knowledge about the structure of your data (shared == doesn't need to be transmitted over the wire with each piece of data): e.g. if you know that your string will only contain certain words, replace those words with something shorter; or if you can generate some data from a random seed, just transmit the seed.
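zlib, for one, can do the "shared knowledge" trick out of the box via a preset dictionary that both ends agree on beforehand and that never travels over the wire. A sketch (the dictionary contents are made-up JSON-ish keys, purely for illustration):

```python
import zlib

# Both ends ship with the same dictionary; it is never transmitted.
# These strings are invented for the example.
SHARED_DICT = b'{"command":"","status":"ok","records":[],"error":null}'

def compress_with_dict(data: bytes) -> bytes:
    comp = zlib.compressobj(zdict=SHARED_DICT)
    return comp.compress(data) + comp.flush()

def decompress_with_dict(blob: bytes) -> bytes:
    decomp = zlib.decompressobj(zdict=SHARED_DICT)
    return decomp.decompress(blob) + decomp.flush()
```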

In the end, I would recommend testing performance in a "real" situation, with some variants of the solution (never compress; always compress; do both and pick the smaller to transmit; possibly variants where "less than X bytes" is always uncompressed, "more than Y bytes" is always compressed, and everything in between sends the smaller of compressed/uncompressed; ...). A threshold variant might look like the sketch below.
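Here the 200/4096 cutoffs are placeholders to be tuned by those experiments, not recommendations:

```python
import zlib

SMALL = 200    # "less than X bytes": never compress (placeholder value)
LARGE = 4096   # "more than Y bytes": always compress (placeholder value)

def choose_payload(data: bytes):
    """Return (is_compressed, payload) under the hybrid policy."""
    if len(data) < SMALL:
        return False, data
    compressed = zlib.compress(data)
    # Above LARGE, always send compressed; in between, pick the smaller.
    if len(data) > LARGE or len(compressed) < len(data):
        return True, compressed
    return False, data
```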
 

Darzil

Developer
Let's see...
The plaintext data is JSON encoded... possibly encrypted JSON... so I don't know how compressible that is.

Encrypted stuff basically doesn't tend to compress well. I've not done a lot on compression, but I used to use a quality-of-service network appliance to compress and decompress data in a business environment, using 64k leased lines (years ago). Web-based applications used to compress by around 1:20, but encrypted Lotus Notes messages used to bloat slightly instead! You don't get many patterns in a well-encrypted file, and most compression relies on encoding patterns in a way that saves space.
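That's easy to see for yourself: well-encrypted output is statistically close to random bytes, and random bytes have no patterns left to encode. A quick Python sketch, using os.urandom as a stand-in for encrypted data:

```python
import os, zlib

text = b'{"status":"ok","records":[]}' * 100  # repetitive JSON-ish data
noise = os.urandom(len(text))                 # stand-in for encrypted bytes

print(len(text), len(zlib.compress(text)))    # shrinks dramatically
print(len(noise), len(zlib.compress(noise)))  # actually grows a little
```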

But yeah, I agree with xKiv: try some experiments and see. My gut feeling is that, in terms of overall time saved, it'll vary a lot based on bandwidth, file type, and compression type.
 

Catch-22

Active member
If you're talking about whether or not you should enable gzip encoding on your web server, the answer is yes, you should.

You should be serving the JSON data over HTTP; I'm not sure where 7zip comes into the equation. Let the web server handle it for you using its own implementation of gzip.
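For the curious, this is roughly what the web server does when gzip encoding is on: check the client's Accept-Encoding header and compress the body transparently. A hand-rolled Python sketch just to show the mechanics (in practice you'd flip the gzip switch in the server config rather than write this):

```python
import gzip, json
from http.server import BaseHTTPRequestHandler, HTTPServer

class JsonHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"records": list(range(1000))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # Only compress if the client says it can handle gzip.
        if "gzip" in self.headers.get("Accept-Encoding", ""):
            body = gzip.compress(body)
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("", 8080), JsonHandler).serve_forever()
```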

The numbers have changed over the past 4 years, but Paul Buchheit's blog post on the subject still holds true.

If you're going to be serving static 7zip files over HTTP which just contain JSON text files, I'd have to ask "Why!?" You have a perfectly good web server for that purpose.
 

Paragon

Member
Ok, so the whole story is: I am writing an application whose primary interface is the command line, but commands should be able to be sent from pretty much anywhere. The kernel accepts commands in JSON format, runs the command, and responds with JSON results. A client application should work from 1) the local machine's command window, 2) a remote client application connected via socket, 3) a web client using WebSockets, or 4) a web client using a RESTful API. While nearly every command is so short that zipping would be useless, there are two cases where a result *might* benefit from being zipped: if the result is a file, or if the result contains hundreds (or hundreds of thousands) of records... In any case, it seems the only real way to tell is what xKiv and Darzil recommended, which is to actually do it with each method and test. For the socket transport, I'm picturing xKiv's flag idea roughly like the sketch below.
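(The 4-byte length prefix and the flag byte are just a convention I'm making up for illustration, not part of anything existing:)

```python
import json, struct, zlib

def send_result(sock, result: dict) -> None:
    raw = json.dumps(result).encode("utf-8")
    compressed = zlib.compress(raw)
    flag, body = (1, compressed) if len(compressed) < len(raw) else (0, raw)
    # Frame: 4-byte big-endian length, 1-byte compression flag, payload.
    sock.sendall(struct.pack(">IB", len(body), flag) + body)
```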
 
Last edited:

Catch-22

Active member
I am still confused as to why you wouldn't serve the .json files as application/json types from a web server with gzip enabled, but good luck :)
 