C library implementing the ridiculously fast CLHash hashing function (with C++ wrappers)
CLHash is a very fast hashing function that uses carry-less multiplication. It runs on recent x64 processors (Haswell or better, via PCLMULQDQ + SSE4.2) and on 64-bit ARM processors with the crypto extension that provides PMULL (e.g., Apple Silicon, modern Cortex-A cores, AWS Graviton).
CLHash has the following characteristics :
- On a recent Intel processor (e.g., Skylake), it can hash input strings at a speed of 0.1 cycles per byte for sufficiently long strings. You read this right: it is simply ridiculously fast.
- On 64-bit ARM with PMULL (e.g., Apple Silicon), it is similarly fast.
- It has strong theoretical guarantees: XOR universality of short strings and excellent almost universality for longer strings.
- The x86 and ARM implementations are bit-for-bit compatible: identical (seed, input) pairs produce the same 64-bit hash on either platform. This is verified by a known-answer test in
tests/unit.c.
For details, please see the research article:
- Daniel Lemire, Owen Kaser, Faster 64-bit universal hashing using carry-less multiplications, Journal of Cryptographic Engineering 6 (3), 2016. http://arxiv.org/abs/1503.03465
A standard fast but non-random hash function is a simple recursive function like so:
uint64_t javalikehash(char *input, size_t length) {
uint64_t sum = 0;
for(size_t i = 0; i < length; ++i) sum = 31 * sum + (uint64_t) input[i];
return sum;
}You should expect clhash to be between 20 to 40 times faster than this reference hash function for input spanning hundreds of bytes or more.
- On x64, you need PCLMULQDQ + SSE4.2 (Haswell from 2013 or later in practice). On older
x64 chips it will either fail to build or be slow. The Makefile and
CMake build pass
-msse4.2 -mpclmul -march=nativeon x86. Virtually all x64 processors today fit this requirement. - On 64-bit ARM (AArch64), you need the crypto extension that provides the
PMULL/PMULL2 instructions (advertised via
HWCAP_PMULLon Linux, and always present on Apple Silicon). The build passes-march=armv8-a+cryptoon ARM. Virtually all 64-bit ARM processors support this feature today.
POWER and other architectures are not currently supported; the build will
fail at preprocessing with a clear #error.
If your compiler is not C99 compliant... please get better one.
make
./unitCompile option: if you define CLHASH_BITMIX during compilation, extra work is done to pass smhasher's avalanche test succesfully. Disabled by default.
#include <assert.h>
#include "clhash.h"
int main() {
void * random = get_random_key_for_clhash(UINT64_C(0x23a23cf5033c3c81),UINT64_C(0xb3816f6a2c68e530));
uint64_t hashvalue1 = clhash(random,"my dog",6);
uint64_t hashvalue2 = clhash(random,"my cat",6);
uint64_t hashvalue3 = clhash(random,"my dog",6);
assert(hashvalue1 == hashvalue3);
assert(hashvalue1 != hashvalue2);// very likely to be true
free(random);
return 0;
}make
./benchmarkIf you prefer the convenience of a C++ interface with support for stl::vector and std::string, you can create a clhasher object instead.
#include <assert.h>
#include <cstdio>
#include <vector>
#include "clhash.h"
int main(void) {
clhasher h(UINT64_C(0x23a23cf5033c3c81),UINT64_C(0xb3816f6a2c68e530));
std::vector<int> vec{1,3,4,5,2,24343};
uint64_t vechash = h(vec);
uint64_t arrayhash = h(vec.data(), vec.size());
assert(vechash == arrayhash);
uint64_t cstringhash = h("o hai wurld");
uint64_t stringhash = h(std::string("o hai wurld"));
assert(cstringhash == stringhash);
}make
./cppunitYou can also build with CMake:
cmake -S . -B build
cmake --build build -jRun tests with CTest:
ctest --test-dir build --output-on-failureEnable CLHASH_BITMIX at configure time:
cmake -S . -B build -DCLHASH_ENABLE_BITMIX=ON
cmake --build build -jThe CMake build provides install rules. Pick a prefix and run:
cmake -S . -B build -DCMAKE_INSTALL_PREFIX=/usr/local
cmake --build build -j
cmake --install build # may need sudo depending on the prefixThis installs:
lib/libclhash.a— the static libraryinclude/clhash.h— the public header (also provides the C++clhasherwrapper)lib/cmake/clhash/clhash{Config,ConfigVersion,Targets}.cmake— a CMake package config so downstream projects canfind_package(clhash)
Downstream CMake usage looks like:
find_package(clhash 1.0 REQUIRED)
add_executable(myapp main.c)
target_link_libraries(myapp PRIVATE clhash::clhash)If you are vendoring clhash via add_subdirectory() and do not want the
install rules to be inherited, pass -DCLHASH_INSTALL=OFF.
clhash is deliberately tiny: it is a single C source file plus a single public header. If you do not want to deal with a build system, just copy
src/clhash.cinclude/clhash.h
into your project and compile clhash.c with your other sources. The only
requirements are a C99 (or newer) compiler and the right architecture flag:
-msse4.2 -mpclmul(or-march=native) on x86-64-march=armv8-a+cryptoon 64-bit ARM
For example, on Apple Silicon:
cc -O3 -march=armv8-a+crypto -std=c99 -Iinclude -c src/clhash.c
cc -O3 -march=armv8-a+crypto -std=c99 -Iinclude my_program.c clhash.o -o my_programThe library has no external dependencies beyond the C standard library.
If you use this library in your work, please cite:
@article{lemire2016faster,
title={Faster 64-bit universal hashing using carry-less multiplications},
author={Lemire, Daniel and Kaser, Owen},
journal={Journal of Cryptographic Engineering},
volume={6},
number={3},
pages={171--185},
year={2016}
}