C hex string to byte array

First off, I’ve Googled this question over the past few days, but nothing I’ve found works. I don’t get any runtime errors, but when I type in the same key (as a hex string) that the program generated for encryption, decryption fails (using the generated key throughout the program works fine). I’m trying to read a hex string (format: 00:00:00...) and turn it into a 32-byte byte array. The input comes from getpass(). I’ve done this before in Java and C#, but I’m new to C++ and everything seems much more complicated. Any help would be greatly appreciated 🙂 Also, I’m programming this on a Linux platform, so I’d like to avoid Windows-only functions.

Here is an example of what I’ve tried:

2 Answers

If your input has the format AA:BB:CC, you could write something like this:
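Here is a minimal C++17 sketch of that approach. The names `parse_colon_hex` and `hex_value` are hypothetical. It expects exactly two hex digits per byte with a `:` between bytes, and it returns `std::nullopt` for anything else, so a mistyped key is reported instead of being silently turned into different bytes. For the 32-byte key in the question, you would also check `size() == 32` before using the result.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parses "AA:BB:CC..." (two hex digits per byte, ':' between bytes).
// Returns std::nullopt on any malformed input instead of guessing.
std::optional<std::vector<std::uint8_t>> parse_colon_hex(const std::string& s) {
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        if (i + 1 >= s.size()) return std::nullopt;      // lone trailing digit
        int hi = hex_value(s[i]);
        int lo = hex_value(s[i + 1]);
        if (hi < 0 || lo < 0) return std::nullopt;       // not a hex pair
        out.push_back(static_cast<std::uint8_t>((hi << 4) | lo));
        i += 2;
        if (i < s.size()) {
            // Between bytes we need a ':' that is followed by more digits.
            if (s[i] != ':' || i + 1 == s.size()) return std::nullopt;
            ++i;
        }
    }
    return out;
}
```

The `char*` returned by getpass() can be passed in directly, since it converts to `std::string`.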

What is the best way to convert a variable-length hex string, e.g. "01A1", to a byte array containing that data?

i.e. converting this:

so that when I write it to a file and run hexdump -C on it, I get the binary data containing 01A1.

19 Answers

This ought to work:

Depending on your specific platform there’s probably also a standard implementation though.

This implementation uses the built-in strtol function to handle the actual conversion from text to bytes, but will work for any even-length hex string.
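A sketch of the strtol-per-pair approach, assuming the input is an even-length string of hex digits (`hex_to_bytes` is a hypothetical name). It does no validation: a non-hex pair such as "zz" silently becomes 0, and a trailing odd digit is ignored.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Converts an even-length hex string such as "01A1" into bytes by handing
// each two-character slice to std::strtol with base 16. No validation.
std::vector<std::uint8_t> hex_to_bytes(const std::string& hex) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
        std::string pair = hex.substr(i, 2);
        bytes.push_back(static_cast<std::uint8_t>(std::strtol(pair.c_str(), nullptr, 16)));
    }
    return bytes;
}
```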

So, for fun, I was curious whether I could do this kind of conversion at compile time. It doesn’t have much error checking and was written in VS2015, which doesn’t support C++14 constexpr functions yet (hence the way HexCharToInt looks). It takes a C-string array, converts pairs of characters into single bytes, and expands those bytes into a uniform initialization list used to initialize the type T provided as a template parameter. T could be replaced with something like std::array to return an array automatically.
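With C++17 the same idea gets simpler, because constexpr functions can contain loops and `std::array` elements can be assigned during constant evaluation. The sketch below is not the VS2015 version described above: it returns a `std::array` directly instead of expanding into an initializer list for a template parameter `T`, and `hex_char_to_int` / `hex_to_array` are hypothetical names. An invalid digit reaches the `throw`, which is not allowed in a constant expression, so a typo in a literal becomes a compile error when the result is declared `constexpr`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Value of one hex digit. Reaching the throw during constant evaluation
// makes the expression non-constant, i.e. a compile error.
constexpr std::uint8_t hex_char_to_int(char c) {
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    throw std::invalid_argument("not a hex digit");
}

// N includes the terminating '\0' of the literal, so there are N - 1 digits
// and (N - 1) / 2 bytes.
template <std::size_t N>
constexpr std::array<std::uint8_t, (N - 1) / 2> hex_to_array(const char (&hex)[N]) {
    static_assert((N - 1) % 2 == 0, "hex literal must have an even number of digits");
    std::array<std::uint8_t, (N - 1) / 2> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint8_t>((hex_char_to_int(hex[2 * i]) << 4) |
                                           hex_char_to_int(hex[2 * i + 1]));
    }
    return out;
}

// Evaluated entirely by the compiler:
constexpr auto key = hex_to_array("01A1");
static_assert(key[0] == 0x01 && key[1] == 0xA1, "converted at compile time");
```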

If you want to use OpenSSL to do it, there is a nifty trick I found:

Just be sure to strip off any leading ‘0x’ from the string.

You said "variable length." Just how variable do you mean?

For hex strings that fit into an unsigned long, I have always liked the C function strtoul. To make it convert hex, pass 16 as the radix value.

Code might look like:
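A minimal sketch along those lines (`hex_to_bytes_strtoul` is a hypothetical name). It first checks that every character is a hex digit, because strtoul would otherwise also accept leading whitespace, a sign, or a "0x" prefix, and it uses `errno == ERANGE` to detect values that do not fit in an unsigned long. The bytes are then taken out with shifts, most significant first, and the byte count comes from the string length so that "0001" keeps its leading zero byte.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Parses the whole string with strtoul (radix 16), then splits the value into
// bytes. Only works when the value fits in an unsigned long. Returns false on
// non-hex input, empty input, or overflow.
bool hex_to_bytes_strtoul(const std::string& hex, std::vector<std::uint8_t>& out) {
    if (hex.empty()) return false;
    for (unsigned char c : hex)
        if (!std::isxdigit(c)) return false;      // reject sign, spaces, "0x"
    errno = 0;
    unsigned long value = std::strtoul(hex.c_str(), nullptr, 16);
    if (errno == ERANGE) return false;            // too large for unsigned long
    const std::size_t nbytes = (hex.size() + 1) / 2;
    out.assign(nbytes, 0);
    for (std::size_t i = 0; i < nbytes; ++i) {
        out[nbytes - 1 - i] = static_cast<std::uint8_t>(value & 0xFF);  // lowest byte last
        value >>= 8;
    }
    return true;
}
```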

I would use a standard function like sscanf to read the string into an unsigned integer; then you already have the bytes you need in memory. If you were on a big-endian machine, you could just write out (memcpy) the memory of the integer starting from the first non-zero byte. However, you can’t safely assume that in general, so you can use some bit masking and shifting to get the bytes out.
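A sketch of the sscanf variant (`hex_to_bytes_sscanf` is a hypothetical name). Shifting and masking reads the bytes from the value itself rather than from its memory layout, so the same code works on little- and big-endian machines; leading zero bytes are dropped, matching the "first non-zero byte" description. The C standard leaves scanf overflow undefined, so the sketch rejects strings longer than an unsigned long can hold, and it uses `%n` to make sure the whole string was consumed.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Reads the hex text into an unsigned long with sscanf, then emits its bytes
// most-significant first. Note: %lx also accepts leading whitespace, a sign
// and a 0x prefix; validate beforehand if that matters.
bool hex_to_bytes_sscanf(const char* hex, std::vector<std::uint8_t>& out) {
    if (std::strlen(hex) > 2 * sizeof(unsigned long)) return false;  // avoid overflow
    unsigned long value = 0;
    int consumed = 0;
    if (std::sscanf(hex, "%lx%n", &value, &consumed) != 1 || hex[consumed] != '\0')
        return false;                          // nothing parsed, or trailing junk
    out.clear();
    bool started = false;                      // skip leading zero bytes
    for (int shift = static_cast<int>(sizeof(value) - 1) * 8; shift >= 0; shift -= 8) {
        std::uint8_t byte = static_cast<std::uint8_t>((value >> shift) & 0xFF);
        if (byte != 0) started = true;
        if (started) out.push_back(byte);
    }
    return true;
}
```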

C++11 variant (with gcc 4.7 — little endian format):

Crypto++ variant (with gcc 4.7):

Note that the first variant is about two times faster than the second one and also works with both odd and even numbers of nibbles (the result of "a56ac" is <0x0a, 0x56, 0xac>). Crypto++ discards the last nibble if there is an odd number of them (the result of "a56ac" is <0xa5, 0x6a>) and silently skips invalid hex characters (the result of "a5sac" is <0xa5, 0xac>).
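The behavior described for the first variant can be sketched in plain C++11 as follows. This is an illustration, not the benchmarked code, and `decode_hex` is a hypothetical name. An odd-length input gets an implicit leading '0', and, unlike the Crypto++ behavior described above, invalid characters throw `std::invalid_argument` instead of being skipped.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// "a56ac" -> {0x0a, 0x56, 0xac}: an odd nibble count is treated as if the
// string started with '0'. Any non-hex character throws.
std::vector<std::uint8_t> decode_hex(const std::string& hex) {
    auto nibble = [](char c) -> std::uint8_t {
        if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
        if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
        if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
        throw std::invalid_argument("invalid hex character");
    };
    std::vector<std::uint8_t> out((hex.size() + 1) / 2);
    std::size_t i = 0, o = 0;
    if (hex.size() % 2 == 1) out[o++] = nibble(hex[i++]);   // lone leading nibble
    for (; i < hex.size(); i += 2)                          // remaining digits pair up
        out[o++] = static_cast<std::uint8_t>((nibble(hex[i]) << 4) | nibble(hex[i + 1]));
    return out;
}
```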

This can be done with a stringstream; you just need to store the value in an intermediate numeric type such as an int:
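A sketch of the stringstream approach (`hex_to_bytes_stream` is a hypothetical name). The `unsigned int` in the middle matters: extracting directly into a `uint8_t`, which is typically a typedef for `unsigned char`, would read one character instead of parsing a number. There is no validation beyond what the stream does, and a trailing odd digit is ignored.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Reads each two-character slice through an istringstream in hex mode,
// via an intermediate unsigned int, then narrows it to a byte.
std::vector<std::uint8_t> hex_to_bytes_stream(const std::string& hex) {
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
        std::istringstream in(hex.substr(i, 2));
        unsigned int value = 0;
        in >> std::hex >> value;
        bytes.push_back(static_cast<std::uint8_t>(value));
    }
    return bytes;
}
```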

If your goal is speed, I have an AVX2 SIMD implementation of an encoder and decoder here: https://github.com/zbjornson/fast-hex. These benchmark roughly 12x faster than the fastest scalar implementations.

If you can make your data look like this, e.g. an array of "0x01", "0xA1", then you can iterate over the array and use sscanf to create the array of values.
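A sketch, assuming the tokens are already split into separate C strings (`tokens_to_bytes` is a hypothetical name). The `%x` conversion accepts an optional "0x"/"0X" prefix, so each token can be passed to sscanf as-is; in this version, tokens that fail to parse or exceed 0xFF are skipped.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Each element is one token such as "0x01"; sscanf's %x handles the prefix.
std::vector<std::uint8_t> tokens_to_bytes(const std::vector<const char*>& tokens) {
    std::vector<std::uint8_t> bytes;
    for (const char* tok : tokens) {
        unsigned int value = 0;
        if (std::sscanf(tok, "%x", &value) == 1 && value <= 0xFF)
            bytes.push_back(static_cast<std::uint8_t>(value));
        // tokens that do not parse, or do not fit in a byte, are skipped
    }
    return bytes;
}
```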

The difficulty in hex-to-binary conversion is that the hex digits work pairwise, e.g. 3132 or A0FF, so an even number of hex digits is assumed. However, it can be perfectly valid to have an odd number of digits, like 332 and AFF, which should be understood as 0332 and 0AFF.

I propose an improvement to Niels Keurentjes’ hex2bin() function. First we count the number of valid hex digits. Since we have to count anyway, let’s also check the buffer size:

By the way, to use isxdigit() you’ll have to #include <cctype> (<ctype.h> in C).
Once we know how many digits there are, we can determine whether the first one is a high digit (even count, so every digit belongs to a pair) or not (odd count, so the first digit stands alone).

Then we can loop digit by digit, combining each pair with a bit shift:
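Putting the three steps together, a sketch might look like this. It is not Niels Keurentjes’ original: the `hex2bin` signature (source, destination, destination size, returning the byte count) is an assumption made for illustration. Conversion stops at the first non-hex character, and an odd digit count is handled by starting in the low nibble of the first byte, so "AFF" becomes 0x0A 0xFF.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical signature: converts the leading run of hex digits in `src`
// into `dst` and returns the number of bytes written, or 0 if they would not
// fit in `dst_size` bytes (an input with no hex digits also returns 0).
std::size_t hex2bin(const char* src, unsigned char* dst, std::size_t dst_size) {
    // Step 1: count the valid hex digits and check the buffer size.
    std::size_t ndigits = 0;
    while (std::isxdigit(static_cast<unsigned char>(src[ndigits]))) ++ndigits;
    const std::size_t nbytes = (ndigits + 1) / 2;
    if (nbytes > dst_size) return 0;

    // Step 2: with an odd count the first digit has no partner, so the first
    // byte gets an implicit '0' in its high nibble.
    bool high = (ndigits % 2 == 0);
    if (!high) dst[0] = 0;

    // Step 3: shift high nibbles into place, then OR in the low nibble that
    // completes each byte.
    std::size_t out = 0;
    for (std::size_t i = 0; i < ndigits; ++i) {
        int c = std::tolower(static_cast<unsigned char>(src[i]));
        unsigned v = std::isdigit(c) ? static_cast<unsigned>(c - '0')
                                     : static_cast<unsigned>(c - 'a' + 10);
        if (high) {
            dst[out] = static_cast<unsigned char>(v << 4);
        } else {
            dst[out] = static_cast<unsigned char>(dst[out] | v);
            ++out;
        }
        high = !high;
    }
    return nbytes;
}
```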

The goal is to convert a hex string to a byte array with the following requirements:

  • $O(1)$ additional space apart from input and output.
  • $O(n)$ runtime

This mostly just prohibits creating a new string with a 0 prepended to avoid having to deal with odd strings.

Can we write this more nicely, possibly using LINQ (and some chained iterables) in C#?

Test code that covers, I think, all the requirements:

1 Answer

In situations like this, when dealing with odd-offset values and byte manipulation, I recommend four things:

  1. use a logical frame of reference.
  2. get familiar with bit-wise operations.
  3. pre-computing results at compile time is very efficient at runtime.
  4. switch statements are high-performance lookup tables

End-of-data frame of reference

What I mean by frame of reference is that the common reference point between the input string and the output array is the last character and the last member of the byte[] array. You should line up your reference points and work out from there. In this case, that means working backwards.

The next thing you have as a frame of reference is your input value. You use this value to drive both the calculation of the output array size and the iteration in the loop. In this case, that means you should use the input chars to drive the loop, not size / 2.

Putting this together, consider a loop that iterates from the last char to the first char, populating the bytes from the last to the first. We can then do the index math from the end of the input/output arrays, based on the loop counter.
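A sketch of that backwards loop, in C++ rather than the question’s C# (`hex_to_bytes_backwards` is a hypothetical name). `k` counts characters from the end of the input, so `k / 2` is the distance of the target byte from the end of the output, and even values of `k` land in the low nibble. An odd-length input needs no special handling: its first character simply ends up in the low nibble of the first byte.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Walks the input from its last char to its first and fills the output from
// its last byte to its first; the input length drives both size and loop.
std::vector<std::uint8_t> hex_to_bytes_backwards(const std::string& hex) {
    auto digit = [](char c) -> std::uint8_t {
        if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
        if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
        if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
        throw std::invalid_argument("invalid hex character");
    };
    const std::size_t n = hex.size();
    std::vector<std::uint8_t> out((n + 1) / 2, 0);
    for (std::size_t k = 0; k < n; ++k) {            // k = distance from the last char
        std::uint8_t v = digit(hex[n - 1 - k]);
        std::size_t pos = out.size() - 1 - k / 2;    // same distance, in bytes
        if (k % 2 == 0)
            out[pos] = static_cast<std::uint8_t>(out[pos] | v);         // low nibble
        else
            out[pos] = static_cast<std::uint8_t>(out[pos] | (v << 4));  // high nibble
    }
    return out;
}
```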

Bitwise manipulations

There are some bitwise manipulation tricks you can do here that help:

Because exactly 2 input chars are required to populate a byte, and because we are working in base 2 (binary), we can take advantage of a counting trick. As we count from the end of the input chars toward the beginning, we move to another output byte every 2 chars. If we divide the count by 2, we get the relative position from the end of the output.

Remember, a right shift by one is the same as integer division by 2 (for unsigned or non-negative values):

Bitwise ANDing a value with 1 tells you what its last bit is.

Bitwise ORing two values sets each output bit to 1 if either of the corresponding input bits is 1.
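The three facts above, written as small C++ helpers where `k` is the number of characters counted from the end (the helper names are hypothetical). The `static_assert`s check them at compile time.

```cpp
#include <cstddef>
#include <cstdint>

// k >> 1 equals k / 2 for unsigned k: the byte offset from the end.
constexpr std::size_t byte_offset_from_end(std::size_t k) { return k >> 1; }

// k & 1 is the last bit of k: odd counts land in the high nibble.
constexpr bool is_high_nibble(std::size_t k) { return (k & 1) != 0; }

// OR merges a shifted high nibble with a low nibble into one byte.
constexpr std::uint8_t combine(std::uint8_t high, std::uint8_t low) {
    return static_cast<std::uint8_t>((high << 4) | low);
}

static_assert(byte_offset_from_end(5) == 2, "5 >> 1 == 5 / 2 == 2");
static_assert(is_high_nibble(3) && !is_high_nibble(4), "k & 1 is the last bit of k");
static_assert(combine(0xA, 0x1) == 0xA1, "0xA0 | 0x01 == 0xA1");
```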

switch statements for fast lookups

The compiler is able to optimize a switch statement very effectively. Consider the following switch statement, which converts a character into the corresponding hex value. It is very efficient because much of the work is done at compile time: a dense switch like this can be compiled into a jump table. It takes some effort to code, but the results are worth it:
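A C++ version of such a switch might look like this (`hex_digit_value` is a hypothetical name, and 0xFF is a marker chosen here for "not a hex digit"). Whether the compiler actually emits a jump table depends on the compiler and optimization level.

```cpp
#include <cstdint>

// Maps one character to its hex value, or 0xFF if it is not a hex digit.
constexpr std::uint8_t hex_digit_value(char c) {
    switch (c) {
        case '0': return 0x0;  case '1': return 0x1;  case '2': return 0x2;
        case '3': return 0x3;  case '4': return 0x4;  case '5': return 0x5;
        case '6': return 0x6;  case '7': return 0x7;  case '8': return 0x8;
        case '9': return 0x9;
        case 'a': case 'A': return 0xA;
        case 'b': case 'B': return 0xB;
        case 'c': case 'C': return 0xC;
        case 'd': case 'D': return 0xD;
        case 'e': case 'E': return 0xE;
        case 'f': case 'F': return 0xF;
        default:  return 0xFF;
    }
}
```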

On a character-by-character basis, this will be faster than using ToUInt32().

Lookup Tables in memory

Again, there are only 16 hex values, so the memory usage is very small, and you can use some memory-lookup tricks.
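One way to do that in C++17 is a pair of 256-entry tables computed at compile time and indexed directly by the character value: one gives the digit’s value as a low nibble, the other the same value already shifted into the high nibble. Together they take 512 bytes regardless of input size. The names are hypothetical, and in this sketch non-hex characters simply map to 0, so a real decoder would still need to reject them.

```cpp
#include <array>
#include <cstdint>

// Builds a table mapping each hex digit character to its value << shift.
constexpr std::array<std::uint8_t, 256> make_nibble_table(int shift) {
    std::array<std::uint8_t, 256> t{};   // non-hex characters stay 0
    for (int c = '0'; c <= '9'; ++c) t[c] = static_cast<std::uint8_t>((c - '0') << shift);
    for (int c = 'a'; c <= 'f'; ++c) t[c] = static_cast<std::uint8_t>((c - 'a' + 10) << shift);
    for (int c = 'A'; c <= 'F'; ++c) t[c] = static_cast<std::uint8_t>((c - 'A' + 10) << shift);
    return t;
}

constexpr auto kLow = make_nibble_table(0);    // '1' -> 0x01, 'b' -> 0x0B
constexpr auto kHigh = make_nibble_table(4);   // 'A' -> 0xA0
```

A byte is then a single OR of two lookups, e.g. `kHigh[static_cast<unsigned char>('A')] | kLow[static_cast<unsigned char>('1')]` for 0xA1.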

Putting it together

  1. use a loop that counts back from the end of the input/output. Then you don’t need special odd-length input handling.
  2. identify whether an input char is going to be a high or low nibble in the result.
  3. use a lookup table (low/high nibble) to find a byte value for that input.
  4. use a bit-wise OR to add the low and high nibbles together
  5. use bitwise AND and bitwise right-shift to map the input character’s position to the output byte/nibble position.

The following code is a ‘simple’ loop that is efficient and works for your input cases:
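A C++ sketch that follows the five steps above, in C++ rather than the question’s C# (`hex_to_bytes_fast` and the table names are hypothetical). The two tables are a fixed 512 bytes built at compile time, so apart from the output the loop uses O(1) extra space and runs in O(n). A table entry of 0xFF marks an invalid character; no valid entry can equal 0xFF, since low values top out at 0x0F and high values at 0xF0.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Table of value << shift for each hex digit character; 0xFF = invalid.
constexpr std::array<std::uint8_t, 256> make_table(int shift) {
    std::array<std::uint8_t, 256> t{};
    for (auto& v : t) v = 0xFF;
    for (int c = '0'; c <= '9'; ++c) t[c] = static_cast<std::uint8_t>((c - '0') << shift);
    for (int c = 'a'; c <= 'f'; ++c) t[c] = static_cast<std::uint8_t>((c - 'a' + 10) << shift);
    for (int c = 'A'; c <= 'F'; ++c) t[c] = static_cast<std::uint8_t>((c - 'A' + 10) << shift);
    return t;
}
constexpr auto kLowNibble = make_table(0);
constexpr auto kHighNibble = make_table(4);

std::vector<std::uint8_t> hex_to_bytes_fast(const std::string& hex) {
    const std::size_t n = hex.size();
    std::vector<std::uint8_t> out((n + 1) / 2, 0);
    for (std::size_t k = 0; k < n; ++k) {                        // 1. count back from the end
        unsigned char c = static_cast<unsigned char>(hex[n - 1 - k]);
        const auto& table = (k & 1) ? kHighNibble : kLowNibble;  // 2. high or low nibble?
        std::uint8_t v = table[c];                               // 3. table lookup
        if (v == 0xFF) throw std::invalid_argument("invalid hex character");
        std::size_t pos = out.size() - 1 - (k >> 1);             // 5. shift gives byte position
        out[pos] = static_cast<std::uint8_t>(out[pos] | v);      // 4. OR the nibbles together
    }
    return out;
}
```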

Thus, you can accomplish the conversion with efficient switches, array lookups, and ‘simple’ math in the loops.