Wednesday, February 27, 2013

Hello, bits

Bit arrays are a great tool for efficiently implementing sets, but bit twiddling operations can be a little unwieldy. After searching for an existing method to list all the bits set in an integer, nothing turned up.  This is a pretty important operation for a set, so I just implemented it myself with a bit counting algorithm as a starting point.

How many?

Counting one bits is easy with Brian Kernighan's bit counting algorithm (in C, of course):
  1. int countBits(int n) {
  2. int count = 0;
  3. while(n != 0) {
  4. n &= (n-1);
  5. count++;
  6. }
  7. return count;
  8. }

Why does it work?

The algorithm takes an input n, and "peels" bits off one-by-one until none are left. Line 4 clears the lowest order one bit by masking with n-1. The shape of this mask is easier to see in binary:
When n = 0b1000, the mask n-1 = 0b0111
When n = 0b101000, the mask n-1 = 0b100111

So n-1 produces a mask with all the same high-order bits set.  The lowest bit from n is not set, but all bits below this one will be set.  Bitwise AND-ing n and (n-1) together produces a new value with all the same bits set except the lowest one.  Once all bits are cleared, n will be zero and the loop terminates.

Getting bit indices

Counting bits isn't what we're after, so lets get the actual bits.
  1. int printBits(int n) {
  2. int count = 0;
  3. int next_n, index;
  4. while(n != 0) {
  5. next_n = n & (n-1);
  6. index = __builtin_ctz(n - next_n);
  7. n = next_n;
  8. printf("Hello, %i\n", index);
  9. }
  10. return count;
  11. }
It's easy to get the value of the bit we just cleared---subtract the values before and after the mask!  This leaves us with a number containing only one bit set: the one we just masked out.  The __builtin_ctz function is a GCC builtin that gives us the index of this bit instead of the corresponding power of two.

Hello, bits

Printing the bits isn't all that useful, so here's a minimal C++ biterator that lets you walk over the one bits in an integer.
  1. class biterator {
  2. private:
  3. int _value;
  4. int _next;
  5.  
  6. public:
  7. biterator(int value) {
  8. _value = value;
  9. _next = value & (value-1);
  10. }
  11.  
  12. biterator begin() {
  13. return biterator(_value);
  14. }
  15.  
  16. biterator end() {
  17. return biterator(0);
  18. }
  19.  
  20. int operator*() {
  21. return __builtin_ctz(_value - _next);
  22. }
  23.  
  24. void operator++() {
  25. _value = _next;
  26. _next = _value & (_value - 1);
  27. }
  28.  
  29. bool operator!=(biterator b) {
  30. return _value != b._value;
  31. }
  32. };
  33.  
  34. // The obligatory example code
  35. int main() {
  36. // C++11 foreach:
  37. for(int i: biterator(0xfa)) {
  38. printf("Hello, %i\n", i);
  39. }
  40. return 0;
  41. }

Acknowledgements

Thanks to +Dan Barowy for piquing my interest with this problem, and translating my algorithmic ramblings to something actually runnable.