Testing this is simple: just implement all the missing features. Apart from encryption support (which you should not use; use gpg instead), the work is now finished and available in the jzip GitHub repo. Here are the results.
Lines of code (as counted by wc)
Info-Zip: 82 057 lines of C
jzip: 1091 lines of C++
Stripped binary size
Info-Zip: 159 kB
jzip: 51 kB
Performance
Performance was tested by zipping a full Clang source + build tree into one zip file. This includes the source, the svn directories and all build artifacts. The total size was 9.3 gigabytes. Extracting this archive took the following times.
Info-Zip: 5m 38s
jzip: 2m 47s
Jzip is roughly twice as fast. This is a somewhat underwhelming result given that the test machine has 8 cores. Further examination showed that the bottleneck was the hard drive: jzip saturates its write capacity.
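Using 8 cores at all requires spreading the extraction work across threads. The post does not show how jzip does this, so the following is only a minimal sketch, assuming each archive entry can be extracted as an independent job and that a shared atomic counter serves as the work queue. `Entry`, `extract_entry` and `parallel_extract` are hypothetical names, and the "decompression" is a placeholder copy rather than real zlib or xz code.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical archive entry. A real extractor would fill this with the
// compressed bytes and metadata read from the zip central directory.
struct Entry {
    std::string name;
    std::string payload;
};

// Hypothetical stand-in for per-entry work. A real tool would inflate the
// payload and write it to disk; here it is just copied so that the
// scheduling logic runs without external libraries.
std::string extract_entry(const Entry &e) {
    return e.payload;
}

// Extract all entries with a pool of worker threads. Each worker keeps
// claiming the next unprocessed index from a shared atomic counter, so
// large and small files balance out across threads automatically.
std::vector<std::string> parallel_extract(const std::vector<Entry> &entries,
                                          unsigned num_threads) {
    std::vector<std::string> results(entries.size());
    std::atomic<std::size_t> next{0};
    // hardware_concurrency() may return 0, so always use at least one thread.
    num_threads = std::max(1u, num_threads);

    auto worker = [&]() {
        for (std::size_t i = next.fetch_add(1); i < entries.size();
             i = next.fetch_add(1)) {
            // Each index is claimed by exactly one thread, so writing
            // results[i] needs no extra locking.
            results[i] = extract_entry(entries[i]);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) {
        pool.emplace_back(worker);
    }
    for (auto &th : pool) {
        th.join();
    }
    return results;
}
```

A call such as `parallel_extract(entries, std::thread::hardware_concurrency())` would use every core. Once all workers spend their time waiting on disk writes, as observed above, adding more threads no longer shortens the run.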
Conclusions
Using a few evenings' worth of spare time, it is possible to reimplement an established (but relatively straightforward) product with two orders of magnitude less code and massively better performance.
Update: more measurements
Using a 48-core machine with fast disks:
Info-Zip: 3m 32s
jzip: 12s
On this machine jzip takes roughly 95% less time than Info-Zip.
On the other hand, when running on a machine with a slow disk, jzip may be up to 30% slower because of write contention overhead.
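The measured times can be summarized in two different ways, which are easy to mix up: the fraction of time saved, and the speedup ratio. The helper functions below are a small illustrative sketch (the names `time_reduction` and `speedup` are hypothetical) that uses nothing beyond the times reported above, in seconds.

```cpp
#include <cassert>

// Fraction of the baseline time saved by the faster tool,
// e.g. 0.94 means "takes 94% less time".
double time_reduction(double baseline_s, double new_s) {
    assert(baseline_s > 0.0 && new_s > 0.0);
    return 1.0 - new_s / baseline_s;
}

// Speedup ratio: how many times as fast the new tool is.
double speedup(double baseline_s, double new_s) {
    assert(baseline_s > 0.0 && new_s > 0.0);
    return baseline_s / new_s;
}
```

For the 8-core machine (338 s versus 167 s), `speedup` gives about 2.0. For the 48-core machine (212 s versus 12 s), `time_reduction` gives about 0.94 while `speedup` gives about 17.7, so the same measurement means "about 94% less time" or "roughly 17 to 18 times as fast" depending on which quantity is quoted.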
Nice!
In my book the speed boost is more like this:
(212 s - 12 s) / 12 s × 100% ≈ 1667%, or roughly 16.7x faster
Jussi, are you aware of this compression competition:
https://encode.su/threads/3421-GDC-Competition-Notices
Running it in single-threaded mode could put Jzip on the map; it is kind of under the radar at the moment :(
That seems to be a competition for new compression tech. Jzip (which is now called Parzip) simply uses zlib or xz.