Unsigned 32-bit integer to binary string function

I’ve been trying to solidify my understanding of the binary number system by writing code. So, using what I currently understand about binary, I tried to write a function that returns the binary string representation of its input.

Any feedback regarding how I may optimize the program is greatly appreciated. Thanks!

char *uint32_to_binarystr(uint32_t n) {
    if (n != UINT32_MAX) {
        int expb2 = 0;              // Exponent of 2.      
        static char binarystr[33];  // 32 bits + 1 termination NULL char.
        binarystr[32] = '\0';       // Add terminating NULL char.
        memset(binarystr, '0', 32); // Initialize 32-bit string as 0.
        while (n != 0) {                         // Continue until n has been completely represented by a sum of powers of 2.
            while (1 << expb2 <= n) { expb2++; } // Find the power of 2 that yields a result greater than n. 
            expb2 -= 1;                          // Once this number is found, we are sure that the previous power yields the largest power of 2 less than n.
            n -= 1 << expb2;                     // We've found a power of 2 to represent this "chunk" of n. Reduce n by the same amount.
            binarystr[31 - expb2] = '1';         // Set the bit at the index with expb2 digits following it to 1.
            expb2 = 0;                           // Reset exponent of 2 and repeat.
        }
        return binarystr;
    } else {
        /*
        In the case that UINT32_MAX is to be represented, just return hard-coded string. 
        Why? 
        A 32-bit shift is not doable in some cases:
        https://stackoverflow.com/questions/7401888/why-doesnt-left-bit-shift-for-32-bit-integers-work-as-expected-when-used
        */
        static char binarystr[33];  
        binarystr[32] = '\0';       
        memset(binarystr, '1', 32); 
        return binarystr;
    }
}

Answer

The interface is problematic. What output does this code produce?

#include <stdint.h>
#include <stdio.h>

char *uint32_to_binarystr(uint32_t n);   /* the function under review */

int main(void)
{
    const char *s1 = uint32_to_binarystr(0);
    const char *s2 = uint32_to_binarystr(1000000);
    printf("%s\n%s\n", s1, s2);
}

You might expect it to output 00000000000000000000000000000000 followed by 00000000000011110100001001000000, but that’s not what we get. Do you see why?

The problem is

    static char binarystr[33];  
    ⋮
    return binarystr;

s1 and s2 both point to the same data! The second call overwrites that shared static buffer before anything is printed, so both lines show the representation of 1000000.

The two usual solutions to this problem are

  • return newly-allocated memory that the caller must free(), or
  • have the caller supply a buffer into which to write.

I tend to prefer the latter – it avoids the overhead of heap allocation in many cases and reduces the risk of memory leaks due to misuse.
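
For comparison, the first option might be sketched like this (uint32_to_binarystr_alloc is just an illustrative name; the caller has to free() the result):

#include <stdint.h>
#include <stdlib.h>

// Sketch of the malloc-based alternative, for illustration only.
char *uint32_to_binarystr_alloc(uint32_t n) {
    char *binarystr = malloc(33);   // 32 bits + 1 terminating NUL char
    if (binarystr == NULL) {
        return NULL;                // allocation failed
    }
    ⋮
    return binarystr;               // caller must free() this
}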

I would write the caller-supplied-buffer version as

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

bool uint32_to_binarystr(char *binarystr, size_t len, uint32_t n) {
    if (len < 33) {
        // 32 bits + 1 terminating NUL char
        return false;
    } 
    ⋮
    return true;
}

int main(void)
{
    char s1[33];
    char s2[33];
    assert(uint32_to_binarystr(s1, sizeof s1, 0));
    assert(uint32_to_binarystr(s2, sizeof s2, 1000000));
    printf("%s\n%s\n", s1, s2);
}

Passing the length available might seem unnecessary (who would supply a buffer that’s too short?) but it does turn out to be a good idea in practice.
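
For example, a call with an undersized buffer is rejected cleanly instead of overrunning it:

char tiny[8];   /* far too small for 32 digits + NUL */
if (!uint32_to_binarystr(tiny, sizeof tiny, 42)) {
    /* handle the error: tiny was never written to */
}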


For the algorithm, I would approach this with a “moving bit” (for simplicity, I’ll demonstrate the uint8_t version). If we start with our number abcdefgh and an initial mask 10000000, then we can decide whether the first digit is 0 or 1 using a bitwise AND operation n & mask. Then move to the next digit by shifting mask rightwards by one. When the bit runs off the end, mask becomes zero, and we know we’re done.
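
To make that concrete, here is a rough self-contained sketch of the uint8_t version, using 0x65 (binary 01100101) as an arbitrary example value:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t n = 0x65;                    /* 01100101, an arbitrary example value */
    for (uint8_t mask = 0x80; mask; mask >>= 1) {
        putchar((n & mask) ? '1' : '0'); /* test the current bit, then move the mask right */
    }
    putchar('\n');                       /* prints 01100101 */
}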

Switching back to 32 bits, the skeleton of that loop looks like:

for (uint32_t mask = 0x80000000;  mask;  mask >>= 1) {
    if (n & mask) {
        /* we have a 1 bit */
    } else {
        /* we have a 0 bit */
    }
}

We can simplify the body of the loop, too, because C guarantees that '1' is one more than '0':

char digit = '0' + ((n & mask) != 0);

If we write all the characters like this, then there’s no need for memset() at the beginning.

Putting this all together, I have a replacement which is worth studying:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

bool uint32_to_binarystr(char *buf, size_t len, uint32_t n) {
    if (len < 33) {
        // 32 bits + 1 terminating NUL char
        return false;
    }
    for (uint32_t mask = 0x80000000;  mask;  mask >>= 1) {
        bool bit_is_set = n & mask;
        *buf = '0' + bit_is_set;
        ++buf;
    }
    *buf = '\0';          /* add the terminator */
    return true;
}
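
Used together with the earlier main(), this version produces the output we originally expected:

00000000000000000000000000000000
00000000000011110100001001000000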

Attribution
Source: Link, Question Author: Sean Xie, Answer Author: Toby Speight
