Fast CRC32
链接: Fast CRC32
Error Checking
Real life data tends to get corrupted because machines (and humans) are never as reliable as we wish for. One efficient way is make sure your data wasn't unintendedly modifiied is to generate some kind of hash. That hash shall be unique, compact and efficient:
- unique: any kind of modification to the data shall generate a different hash
- compact: as few bits or bytes as possible to keep the overhead low
- efficient: use little computing resources, i.e. fast and low memory usage
Many protocols, like Ethernet and GZIP archives, append a so-called CRC hash to their data streams which has a few weaknesses regarding uniqueness (sometimes data modifications remain undetected) but is absolutely great when it comes to compactness (CRC32: 32 bits) and efficiency. You find a CRC32 right below every Download button on my blog, too.
There is one very useful property of CRC32: it can be implemented as a rolling algorithm. That means, if you already have some chunk of data and its CRC, then you can append new data and compute the updated CRC but using your original CRC as a seed and just scanning through the appended data.
I don't want to go into mathematical details of Cyclic Redundancy Checking because there are tons of information on the internet, e.g. Wikipedia and the Painless Guide To CRC Error Detection Algorithms. This article only discusses how to write a fast CRC32 algorithm in C/C++.
If you aren't too keen on technical details and just want to have the fastest implementation for not-too-small datasets, I strongly recommend using the crc32_fast
function.
Note: The concepts behind the various CRC32 algorithm are not my original work - I only gathered them in one place.