Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Hamming weight
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Efficient implementation == The population count of a [[bit array|bitstring]] is often needed in cryptography and other applications. The [[Hamming distance]] of two words ''A'' and ''B'' can be calculated as the Hamming weight of ''A'' [[xor]] ''B''.<ref name="Warren_2013"/> The problem of how to implement it efficiently has been widely studied. A single operation for the calculation, or parallel operations on bit vectors are [[#Processor support|available on some processors]]. For processors lacking those features, the best solutions known are based on adding counts in a tree pattern. For example, to count the number of 1 bits in the 16-bit binary number a = 0110 1100 1011 1010, these operations can be done: {| class="wikitable" style="text-align:center;" ! Expression ! colspan=8 | Binary ! Decimal ! Comment |- | style="text-align:left;" | <code>a</code> | 01 | 10 | 11 | 00 | 10 | 11 | 10 | 10 | style="text-align:left;" {{n/a|27834}} | style="text-align:left;" | The original number |- | style="text-align:left;" | <code>b0 = (a >> 0) & 01 01 01 01 01 01 01 01</code> | 01 | 00 | 01 | 00 | 00 | 01 | 00 | 00 | style="text-align:left;" | 1, 0, 1, 0, 0, 1, 0, 0 | style="text-align:left;" | Every other bit from a |- | style="text-align:left;" | <code>b1 = (a >> 1) & 01 01 01 01 01 01 01 01</code> | 00 | 01 | 01 | 00 | 01 | 01 | 01 | 01 | style="text-align:left;" | 0, 1, 1, 0, 1, 1, 1, 1 | style="text-align:left;" | The remaining bits from a |- | style="text-align:left;" | <code>c = b0 + b1</code> | 01 | 01 | 10 | 00 | 01 | 10 | 01 | 01 | style="text-align:left;" | 1, 1, 2, 0, 1, 2, 1, 1 | style="text-align:left;" | Count of 1s in each 2-bit slice of a |- | style="text-align:left;" | <code>d0 = (c >> 0) & 0011 0011 0011 0011</code> | colspan=2 | 0001 | colspan=2 | 0000 | colspan=2 | 0010 | colspan=2 | 0001 | style="text-align:left;" | 1, 0, 2, 1 | style="text-align:left;" | Every other count from c |- | style="text-align:left;" | <code>d2 = (c >> 2) & 0011 0011 0011 0011</code> | colspan=2 | 0001 | colspan=2 | 0010 | colspan=2 | 0001 | colspan=2 | 0001 | style="text-align:left;" | 1, 2, 1, 1 | style="text-align:left;" | The remaining counts from c |- | style="text-align:left;" | <code>e = d0 + d2</code> | colspan=2 | 0010 | colspan=2 | 0010 | colspan=2 | 0011 | colspan=2 | 0010 | style="text-align:left;" | 2, 2, 3, 2 | style="text-align:left;" | Count of 1s in each 4-bit slice of a |- | style="text-align:left;" | <code>f0 = (e >> 0) & 00001111 00001111</code> | colspan=4 | 00000010 | colspan=4 | 00000010 | style="text-align:left;" | 2, 2 | style="text-align:left;" | Every other count from e |- | style="text-align:left;" | <code>f4 = (e >> 4) & 00001111 00001111</code> | colspan=4 | 00000010 | colspan=4 | 00000011 | style="text-align:left;" | 2, 3 | style="text-align:left;" | The remaining counts from e |- | style="text-align:left;" | <code>g = f0 + f4</code> | colspan=4 | 00000100 | colspan=4 | 00000101 | style="text-align:left;" | 4, 5 | style="text-align:left;" | Count of 1s in each 8-bit slice of a |- | style="text-align:left;" | <code>h0 = (g >> 0) & 0000000011111111</code> | colspan=8 | 0000000000000101 | style="text-align:left;" | 5 | style="text-align:left;" | Every other count from g |- | style="text-align:left;" | <code>h8 = (g >> 8) & 0000000011111111</code> | colspan=8 | 0000000000000100 | style="text-align:left;" | 4 | style="text-align:left;" | The remaining counts from g |- | style="text-align:left;" | <code>i = h0 + h8</code> | colspan=8 | 0000000000001001 | style="text-align:left;" | 9 | style="text-align:left;" | Count of 1s in entire 16-bit word |} Here, the operations are as in [[C (programming language)|C programming language]], so {{code|X >> Y}} means to shift X right by Y bits, X & Y means the bitwise AND of X and Y, and + is ordinary addition. The best algorithms known for this problem are based on the concept illustrated above and are given here:<ref name="Warren_2013"/> <syntaxhighlight lang="c"> //types and constants used in the functions below //uint64_t is an unsigned 64-bit integer variable type (defined in C99 version of C language) const uint64_t m1 = 0x5555555555555555; //binary: 0101... const uint64_t m2 = 0x3333333333333333; //binary: 00110011.. const uint64_t m4 = 0x0f0f0f0f0f0f0f0f; //binary: 4 zeros, 4 ones ... const uint64_t m8 = 0x00ff00ff00ff00ff; //binary: 8 zeros, 8 ones ... const uint64_t m16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ... const uint64_t m32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3... //This is a naive implementation, shown for comparison, //and to help in understanding the better functions. //This algorithm uses 24 arithmetic operations (shift, add, and). int popcount64a(uint64_t x) { x = (x & m1 ) + ((x >> 1) & m1 ); //put count of each 2 bits into those 2 bits x = (x & m2 ) + ((x >> 2) & m2 ); //put count of each 4 bits into those 4 bits x = (x & m4 ) + ((x >> 4) & m4 ); //put count of each 8 bits into those 8 bits x = (x & m8 ) + ((x >> 8) & m8 ); //put count of each 16 bits into those 16 bits x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits x = (x & m32) + ((x >> 32) & m32); //put count of each 64 bits into those 64 bits return x; } //This uses fewer arithmetic operations than any other known //implementation on machines with slow multiplication. //This algorithm uses 17 arithmetic operations. int popcount64b(uint64_t x) { x -= (x >> 1) & m1; //put count of each 2 bits into those 2 bits x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits x = (x + (x >> 4)) & m4; //put count of each 8 bits into those 8 bits x += x >> 8; //put count of each 16 bits into their lowest 8 bits x += x >> 16; //put count of each 32 bits into their lowest 8 bits x += x >> 32; //put count of each 64 bits into their lowest 8 bits return x & 0x7f; } //This uses fewer arithmetic operations than any other known //implementation on machines with fast multiplication. //This algorithm uses 12 arithmetic operations, one of which is a multiply. int popcount64c(uint64_t x) { x -= (x >> 1) & m1; //put count of each 2 bits into those 2 bits x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits x = (x + (x >> 4)) & m4; //put count of each 8 bits into those 8 bits return (x * h01) >> 56; //returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + ... } </syntaxhighlight> The above implementations have the best worst-case behavior of any known algorithm. However, when a value is expected to have few nonzero bits, it may instead be more efficient to use algorithms that count these bits one at a time. As Wegner described in 1960,<ref name="Wegner_1960"/> the [[bitwise AND]] of ''x'' with ''x'' − 1 differs from ''x'' only in zeroing out the least significant nonzero bit: subtracting 1 changes the rightmost string of 0s to 1s, and changes the rightmost 1 to a 0. If ''x'' originally had ''n'' bits that were 1, then after only ''n'' iterations of this operation, ''x'' will be reduced to zero. The following implementation is based on this principle. <syntaxhighlight lang="c"> //This is better when most bits in x are 0 //This algorithm works the same for all data sizes. //This algorithm uses 3 arithmetic operations and 1 comparison/branch per "1" bit in x. int popcount64d(uint64_t x) { int count; for (count=0; x; count++) x &= x - 1; return count; } </syntaxhighlight> If greater memory usage is allowed, we can calculate the Hamming weight faster than the above methods. With unlimited memory, we could simply create a large lookup table of the Hamming weight of every 64 bit integer. If we can store a lookup table of the hamming function of every 16 bit integer, we can do the following to compute the Hamming weight of every 32 bit integer. <syntaxhighlight lang="c"> static uint8_t wordbits[65536] = { /* bitcounts of integers 0 through 65535, inclusive */ }; //This algorithm uses 3 arithmetic operations and 2 memory reads. int popcount32e(uint32_t x) { return wordbits[x & 0xFFFF] + wordbits[x >> 16]; } </syntaxhighlight> <syntaxhighlight lang="c"> //Optionally, the wordbits[] table could be filled using this function int popcount32e_init(void) { uint32_t i; uint16_t x; int count; for (i=0; i <= 0xFFFF; i++) { x = i; for (count=0; x; count++) // borrowed from popcount64d() above x &= x - 1; wordbits[i] = count; } } </syntaxhighlight> A recursive algorithm is given in Donovan & Kernighan<ref name="Donovan_2016" /> <syntaxhighlight lang="c"> /* The weight of i can differ from the weight of i / 2 only in the least significant bit of i */ int popcount32e_init (void) { int i; for (i = 1; sizeof wordbits / sizeof *wordbits > i; ++i) wordbits [i] = wordbits [i >> 1] + (1 & i); } </syntaxhighlight> MuΕa et al.<ref name="Mula_2018"/> have shown that a vectorized version of popcount64b can run faster than dedicated instructions (e.g., popcnt on x64 processors).
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)