I think I first heard about the Zstandard compression algorithm at a Mercurial developer sprint. At one end of a large table, a few people were uttering expletives out of sheer excitement. At developer gatherings, that's the universal signal that something is awesome.
Long story short, a Facebook engineer shared a link to the RealTime Data Compression blog operated by Yann Collet, then known as the author of LZ4 (a compression algorithm famous for its insane speeds), and people were completely nerding out over the excellent articles and the data within, which showed the beginnings of a new general-purpose lossless compression algorithm named Zstandard. This being a Mercurial meeting, many of us were intrigued because Mercurial uses zlib for various functionality, including on-disk storage and compression over the wire protocol, and zlib operations frequently appear as performance hot spots.
Before I continue, if you are interested in low-level performance and software optimization, I highly recommend perusing the RealTime Data Compression blog. There are some absolute nuggets of info in there. Anyway, over the months, the news about Zstandard (zstd) kept getting better and more promising. I was toying around with pre-release versions and was absolutely blown away by the performance and features.
I believed the hype. A few days later, I started the python-zstandard project to provide a fully-featured and Pythonic interface to the underlying zstd C API while not sacrificing safety or performance. The ulterior motive was to leverage those bindings in Mercurial so Zstandard could be a first-class citizen in Mercurial, possibly replacing zlib as the default compression algorithm for all operations. Fast forward six months and I've achieved many of those goals.
It even exposes some primitives not in the C API, such as batch compression operations that leverage multiple threads and use minimal memory allocations to facilitate insanely fast execution. Expect a dedicated post on python-zstandard from me soon. Mercurial can already use Zstandard for wire protocol compression when cloning. And work is ongoing for Mercurial to support Zstandard for on-disk storage, which should bring considerable performance wins over zlib for local operations. I've learned a lot working on python-zstandard and integrating Zstandard into Mercurial.
My primary takeaway is that Zstandard is awesome. In this post, I'm going to extol the virtues of Zstandard and provide reasons why I think you should use it. Compression is fundamentally a trade: you spend CPU time to make data smaller. This trade-off is usually made because data - either at rest in storage, in motion over a network, or even moving through a machine via software and memory - is a limiting factor for performance.
At scale, better and more efficient compression can translate to substantial cost savings in infrastructure. It can also lead to improved application performance, translating to better end-user engagement, sales, productivity, etc.
This is why companies like Facebook (Zstandard), Google (brotli, snappy, zopfli), and Pied Piper (middle-out) invest in compression. Computers are completely different today than they were when DEFLATE and zlib first appeared. The Pentium was a flagship microprocessor back then; for comparison, a modern NVMe M.2 SSD delivers orders of magnitude more throughput than the storage of that era.
And of course CPU and network speeds have increased as well. We also have completely different instruction sets on CPUs for well-designed algorithms and software to take advantage of. What I'm trying to say is the market is ripe for DEFLATE and zlib to be dethroned by algorithms and software that take into account the realities of modern computers.
Zstandard initially piqued my interest by promising better-than-zlib compression and performance in both the compression and decompression directions.
But it isn't unique. Brotli achieves the same, for example. What kept my attention was Zstandard's rich feature set, tuning abilities, and therefore versatility. Before I dive into the details, I need to throw in an obligatory disclaimer about the data and numbers I use. Benchmarks should not be blindly trusted.
There are so many variables that can influence performance and benchmarks. If you change power settings, for example, do the results still reflect real-life usage? Reporting useful and accurate performance numbers for compression is hard because there are so many variables to care about. Since Mercurial is the driver for my work on Zstandard, the data and numbers I report in this post are mostly Mercurial data.
Specifically, I'll be referring to data in the mozilla-unified Firefox repository. This repository contains hundreds of thousands of commits spanning almost 10 years. The Mercurial layer adds some binary structures to this data. There are two Mercurial-specific pieces of data I will use. One is a Mercurial bundle. This is essentially a representation of all data in a repository. It stores a mix of raw, fulltext data and deltas on that data. The other piece of data is revlog chunks. This is a mix of fulltext and delta data for a specific item tracked in version control.
I frequently use the changelog corpus, which is the fulltext data describing changesets, or commits, to Firefox. The numbers quoted and used for charts in this post are available in a Google Sheet. All performance data was obtained on an Intel K-series desktop CPU running Ubuntu. Memory is DDR with a cycle time of 35 clocks. While I'm pretty positive about Zstandard, it isn't perfect. There are corpora for which Zstandard performs worse than other algorithms, even ones I compare it directly to in this post.
So, your mileage may vary. Please enlighten me with your counterexamples by leaving a comment. With that rather large disclaimer out of the way, let's talk about what makes Zstandard awesome.
Compression algorithms typically contain parameters to control how much work to do. You can choose to spend more CPU to hopefully achieve better compression or you can spend less CPU to sacrifice compression.
OK, fine, there are other factors like memory usage at play too. This is commonly exposed to end-users as a compression level. In reality there are often multiple parameters that can be tuned.
But I'll just use level as a stand-in to represent the concept. But even with adjustable compression levels, the performance of many compression algorithms and libraries tend to fall within a relatively narrow window. In other words, many compression algorithms focus on niche markets.
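The level knob is easy to see in action with Python's stdlib zlib bindings. A minimal sketch (the sample data is made up for the demo):

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 2000

# Level 1: spend little CPU, accept a larger output.
fast = zlib.compress(data, 1)

# Level 9: spend more CPU chasing a smaller output.
small = zlib.compress(data, 9)

print(len(data), len(fast), len(small))
```

On compressible data like this, level 9's output is no larger than level 1's; the difference is paid for in CPU time.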
For example, LZ4 is super fast but doesn't yield great compression ratios. LZMA yields terrific compression ratios but is extremely slow.
This can be visualized in the following chart showing results when compressing a mozilla-unified Mercurial bundle. This chart plots compression speed in megabytes per second (on a logarithmic scale) against achieved compression ratio.
The further right a data point is, the better the compression and the smaller the output. The higher up a point is, the faster compression is. The ideal compression algorithm lives in the top right, which means it compresses well and is fast. But the powers of mathematics push compression algorithms away from the top right.
LZ4 is highly vertical, which means its compression ratios are limited in variance but it is extremely flexible in speed. So for this data, you might as well stick to a lower compression level because higher values don't buy you much.
Bzip2 is the opposite: it is highly horizontal, meaning it runs at roughly the same speed while yielding a range of compression ratios. In other words, you might as well crank bzip2 up to maximum compression because it doesn't have a significant adverse impact on speed. LZMA and zlib are more interesting because they exhibit more variance in both the compression ratio and speed dimensions.
But let's be frank, they are still pretty narrow. This small window of flexibility means that you often have to choose a compression algorithm based on the speed versus size trade-off you are willing to make at that time.
That choice often gets baked into software. And as time passes and your software or data gains popularity, changing the software to swap in or support a new compression algorithm becomes harder because of the cost and disruption it will cause.
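These narrow behavior profiles are easy to observe with Python's stdlib codecs. A rough sketch comparing one-shot compression across zlib, bz2, and lzma (the timings printed are illustrative only, not a rigorous benchmark; the payload is synthetic):

```python
import bz2
import lzma
import time
import zlib

data = b"moderately compressible text payload for the demo " * 5000

def measure(name, compress):
    # Time a single one-shot compression and report the ratio achieved.
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(data) / len(out):.1f}x in {elapsed * 1000:.2f} ms")
    return out

z_out = measure("zlib -9", lambda d: zlib.compress(d, 9))
b_out = measure("bz2 -9", lambda d: bz2.compress(d, 9))
x_out = measure("lzma", lambda d: lzma.compress(d))
```

Run against a real corpus, the ordering of ratios and speeds will shift, but each codec stays inside its characteristic window.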
What we really want is a single compression algorithm that occupies lots of space in both dimensions of our chart - a curve that has high variance in both compression speed and ratio. Such an algorithm would allow you to make an easy decision choosing a compression algorithm without locking you into a narrow behavior profile.
It would allow you to make a completely different size versus speed trade-off in the future by adjusting a config knob or two in your application - no swapping of compression algorithms needed! As you can guess, Zstandard fulfills this role. This can clearly be seen in the following chart, which also adds brotli for comparison. The advantages of Zstandard and brotli are obvious. Zstandard's fastest speed is only about 2x slower than LZ4 level 1. It's worth noting that zstd's C API exposes several knobs for tweaking the compression algorithm.
Each compression level maps to a pre-defined set of values for these knobs. It is possible to set these values beyond the ranges exposed by the default compression levels 1 through 22. I've done some basic experimentation with this and have made compression even faster, while sacrificing ratio, of course. This covers the gap between Zstandard and brotli on this end of the tuning curve. The wide span of compression speeds and ratios is a game changer for compression. Unless you have special requirements, such as lightning-fast operations (which LZ4 can provide) or special corpora that Zstandard can't handle well, Zstandard is a very safe and flexible choice for general purpose compression.
The output from this multi-threaded API is compatible with the Zstandard frame format and doesn't require any special handling on the decompression side.