Goal: A brief exploration of what it means to "pack" and "unpack" bytes.
I've come across Ruby's Array#pack and String#unpack methods, but never had the time to dive into them. While researching another article, I came across this question and decided to stop and explore it.
Exploration 1: Packing into two bytes
I can't define "packing", but I've gathered that it's a term for representing a series of bytes as a string. And depending on how you do it, you can even do this in fewer bytes than the original. Unpacking is the reverse: recovering the original information.
Trying an example based on the Stack Overflow question. I have a bunch of bytes, i.e. values between 0 (00000000) and 255 (11111111). Suppose I take two at random, say 126 and 2.
let [a, b] = [126, 2]
console.log(a.toString(2).padStart(8, '0')) // 01111110
console.log(b.toString(2).padStart(8, '0')) // 00000010
I could represent them in a string by using the JS hexadecimal escape sequence:
console.log(a.toString(16).padStart(2, '0')) // 7e
console.log(b.toString(16).padStart(2, '0')) // 02
console.log('\x7E\x02') // "~" followed by the unprintable control character U+0002
Buffer.from('\x7E\x02', 'utf16le').byteLength // 4
This string has two characters of two bytes each: 00 7e and 00 02. I want to pack the bytes so the string has only one character, 7e 02. Here's how:
let char = String.fromCharCode((a << 8) | b)
console.log(char); // "縂"
Buffer.from(char, 'utf16le').byteLength // 2
This is a bit of bit arithmetic (haha).
- a << 8 means "shift the bits in a left 8 times"
  - shifting 126 (01111110) left 8 times gives us 01111110 00000000
- | b is a bitwise OR
  - 01111110 00000000 ORed with 2 (00000010) gives 01111110 00000010, which is what I want (7e 02)
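To see the shift and the OR happen one step at a time, here is a small sketch that prints each intermediate value as 16 bits. The names hiByte, loByte and toBits16 are mine, made up for this illustration.

```javascript
// Hypothetical names, for illustration only
const hiByte = 126
const loByte = 2
const toBits16 = n => n.toString(2).padStart(16, '0')

const shifted = hiByte << 8       // move 126 into the upper 8 bits
const combined = shifted | loByte // fill the (now empty) lower 8 bits with 2

console.log(toBits16(shifted))      // 0111111000000000
console.log(toBits16(combined))     // 0111111000000010
console.log(combined.toString(16))  // 7e02
```

The OR works here only because the lower 8 bits of the shifted value are all zero, so nothing from the first byte gets overwritten.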
So there it is. I started with two bytes, and was able to fit them into a 2-byte character [note 2]. How about unpacking? Some more bitwise magic.
let bytes = char.charCodeAt(0)
let byteA = bytes >> 8 // Shift the bits to the right 8 times to get the first byte
let byteB = bytes & 0xFF // Bitwise AND the bits with 11111111 to keep only the second byte
// Alternative:
// byteB = bytes ^ (byteA << 8)
console.log(byteA, byteB) // 126, 2
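The same shift-and-mask idea should extend to more than two bytes. Below is a rough sketch, assuming I simply walk the input two bytes at a time. packBytes and unpackBytes are names I made up, and padding an odd-length input with a zero byte is my own arbitrary choice.

```javascript
// Hypothetical helpers: pack every two bytes into one UTF-16 code unit
function packBytes(bytes) {
  let out = ''
  for (let i = 0; i < bytes.length; i += 2) {
    const hi = bytes[i]
    const lo = i + 1 < bytes.length ? bytes[i + 1] : 0 // assumption: pad odd input with 0
    out += String.fromCharCode((hi << 8) | lo)
  }
  return out
}

// Reverse: split each code unit back into its high and low byte
function unpackBytes(str) {
  const bytes = []
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i)
    bytes.push(unit >> 8, unit & 0xFF)
  }
  return bytes
}

const packedPairs = packBytes([126, 2, 72, 105])
console.log(packedPairs.length)        // 2
console.log(unpackBytes(packedPairs))  // [ 126, 2, 72, 105 ]
```

One caveat: some byte pairs land in the surrogate range (0xD800 to 0xDFFF), so the result is a perfectly usable JS string but not necessarily well-formed text.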
let byteArray = new Uint8Array([a, b])
let packedStr = new TextDecoder('utf-16be').decode(byteArray)
console.log(packedStr) // "縂"
However, unpacking with TextEncoder gives wrong results for this use case, since it only supports UTF-8:
let unpackedArray = new TextEncoder().encode(packedStr)
console.log(unpackedArray) // Uint8Array [231, 184, 130]
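Since TextEncoder is out, one way to get the two original bytes back as a Uint8Array is to write each code unit into a DataView in big-endian order. This is only a sketch, and stringToBytesBE is a name I invented.

```javascript
// Hypothetical helper: write each UTF-16 code unit as two big-endian bytes
function stringToBytesBE(str) {
  const view = new DataView(new ArrayBuffer(str.length * 2))
  for (let i = 0; i < str.length; i++) {
    view.setUint16(i * 2, str.charCodeAt(i), false) // false = big-endian
  }
  return new Uint8Array(view.buffer)
}

console.log(stringToBytesBE('\u7E02')) // Uint8Array(2) [ 126, 2 ]
```

Passing true as the last argument to setUint16 would give little-endian output instead, i.e. the bytes in swapped order.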
Exploration 2: packing into one byte
Speaking of UTF-8, it's time to try that. But I'm changing some things:
- I won't use JS here, since its strings are UTF-16. I probably can use it, but I don't want that headache. Plus, I love any excuse to work with Ruby.
- All the bytes I'll pack are in the range 0 to 15. I've intentionally made it smaller so that I can pack two bytes into one UTF-8 character (one byte). I'll use 13 and 2 as my test bytes.
Packing in Ruby is pretty similar:
a, b = 13, 2
puts a.to_s(2).rjust(8, '0') # 00001101
puts b.to_s(2).rjust(8, '0') # 00000010
# hex
puts a.to_s(16).rjust(2, '0') # 0d
puts b.to_s(16).rjust(2, '0') # 02

char = ((a << 4) | b).chr # Shift by 4 bits, not 8, since I'm now packing in one byte
puts char # => "\xD2"
puts char.length # => 1
puts char.bytes.length # => 1

bytes = char.ord
byteA = bytes >> 4
byteB = bytes & 0x0F # AND with 0F, not FF, since I'm splitting up one byte
puts byteA, byteB # 13, 2
The output string here is a single byte "\xD2"...which is simply the low halves of the original 0d and 02 bytes packed together 😀 Unfortunately, it's not a valid printable character, so printing it shows �, but it's there.
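For comparison, here is roughly the same half-byte packing in JS, keeping the result as a plain number instead of a string so the encoding question never comes up. packNibbles and unpackNibbles are made-up names, and the range check is my own addition.

```javascript
// Hypothetical helpers: two values in 0..15 share one byte
function packNibbles(hi, lo) {
  if (hi < 0 || hi > 15 || lo < 0 || lo > 15) {
    throw new RangeError('values must be in the range 0 to 15')
  }
  return (hi << 4) | lo // shift by 4 bits, not 8
}

function unpackNibbles(byte) {
  return [byte >> 4, byte & 0x0F] // upper half, lower half
}

const packedByte = packNibbles(13, 2)
console.log(packedByte.toString(16))    // d2
console.log(unpackNibbles(packedByte))  // [ 13, 2 ]
```

The range check matters: a value above 15 would spill into the other half of the byte and corrupt its neighbour.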
As mentioned earlier, Ruby has built-in pack and unpack methods, but with the integer directive I tried (c*) they map each value to its own byte, so I couldn't use them that way for this example.
packed = [a, b].pack('c*') # => "\r\x02"
packed.unpack('c*') # => [13, 2]
But they work with the original UTF-16 example:
a, b = 126, 2
packed = [a, b].pack('c*') # => "~\x02"
packed.unpack('c*') # => [126, 2]
The bytes are the same as before: 7E 02. The difference is the encoding; in Ruby, pack returns a binary (ASCII-8BIT) string rather than UTF-16, so it's rendered one byte at a time. But I can change the encoding and see for myself!
packed.force_encoding 'utf-16be' # => "\u7E02"
packed.length # => 1
packed.bytes.length # => 2
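The "same bytes, different encoding" effect can be reproduced in JS too. This sketch decodes the same two bytes twice with TextDecoder. I'm assuming a runtime where the utf-16be label is available (browsers, and Node.js builds with full ICU).

```javascript
// The same two bytes, read under two different encodings
const sameBytes = new Uint8Array([0x7E, 0x02])

const asUtf8 = new TextDecoder('utf-8').decode(sameBytes)
const asUtf16 = new TextDecoder('utf-16be').decode(sameBytes)

console.log(asUtf8.length)         // 2 ("~" plus the control character U+0002)
console.log(asUtf16.length)        // 1
console.log(asUtf16 === '\u7E02')  // true
```

Nothing about the bytes changes between the two calls; only the rule for grouping them into characters does.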
Possible uses of packing
Why would you want to pack, though? I'm thinking, perhaps in a constrained environment like gaming over the Internet. If there is a limited number of possible buttons a player can press (say 12), instead of transmitting each button press as one byte, I could:
- wait for a few milliseconds, to gather the next few keypresses and send in a batch
- pack these keypresses into a byte. 12 possible buttons can fit in 4 bits (2^4 = 16), so two keypresses can go in one byte (8 bits).
In this case, packing serves as a form of compression, to send less data over the network and improve the gaming experience (less data to download, so responses can be faster).
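Here's a rough sketch of what that batching could look like, assuming button ids run from 0 to 11. Everything here is hypothetical: the function names, and the choice of 0xF (an unused id) as padding when a batch has an odd number of keypresses.

```javascript
// Assumption: button ids are 0..11, so 0xF is free to mean "no keypress"
const NO_PRESS = 0xF

// Pack a batch of keypresses, two per byte
function packKeypresses(presses) {
  const out = new Uint8Array(Math.ceil(presses.length / 2))
  for (let i = 0; i < presses.length; i += 2) {
    const first = presses[i]
    const second = i + 1 < presses.length ? presses[i + 1] : NO_PRESS
    out[i / 2] = (first << 4) | second
  }
  return out
}

// Unpack on the receiving side, dropping the padding value
function unpackKeypresses(bytes) {
  const presses = []
  for (const byte of bytes) {
    presses.push(byte >> 4)
    const second = byte & 0x0F
    if (second !== NO_PRESS) presses.push(second)
  }
  return presses
}

const batch = [3, 11, 0, 7, 5]
const wire = packKeypresses(batch)
console.log(wire.length)             // 3 (instead of 5)
console.log(unpackKeypresses(wire))  // [ 3, 11, 0, 7, 5 ]
```

Five keypresses go out as three bytes. The padding trick only works because four bits can hold 16 values and the buttons use just 12 of them.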
I also found this question, from a user who wanted to send a UUID as binary data. This is a valid use, since UUIDs are often rendered as strings, but they're actually a sequence of 16 bytes. Sending them as a string would take 36 bytes, so packing is useful here. You could also do this for other "binary-but-look-like-strings" data, like SHA-512 hashes for instance.
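As a sketch of the UUID case: strip the hyphens, then turn every pair of hex digits into one byte. uuidToBytes and bytesToUuid are names I made up, and the sample UUID is just an example value.

```javascript
// Hypothetical helper: 36-character UUID string -> 16 raw bytes
function uuidToBytes(uuid) {
  const hex = uuid.replace(/-/g, '')
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) throw new Error('not a UUID')
  const bytes = new Uint8Array(16)
  for (let i = 0; i < 16; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16)
  }
  return bytes
}

// Reverse: 16 bytes -> lowercase UUID string in 8-4-4-4-12 layout
function bytesToUuid(bytes) {
  const hex = Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('')
  return [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20)
  ].join('-')
}

const uuid = '123e4567-e89b-12d3-a456-426614174000' // example value
const uuidBytes = uuidToBytes(uuid)
console.log(uuid.length, uuidBytes.length)  // 36 16
console.log(bytesToUuid(uuidBytes) === uuid) // true
```

This is the same nibble packing as in Exploration 2, since each hex digit is a value from 0 to 15 and two of them share a byte.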
Let me know if you can think of any other uses.
1. The ECMAScript spec says:
When a String contains actual textual data, each element is considered to be a single UTF-16 code unit.
So JS strings are UTF-16. However, many modern Web APIs, like TextEncoder, and even older Node.js ones like Buffer assume (or accept only) UTF-8. My guess is that they expect the string to be from the outside world (reading a file, an API response, etc), in which case, it's most likely UTF-8.
The only reliable way I found to get the byte length of a native JS string (UTF-16) is Buffer.from(string, 'utf16le').byteLength. Commonly suggested ways I found, such as Blob, always assume UTF-8.
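A quick Node.js-only sketch of the difference, using Buffer.byteLength with an explicit encoding. Multiplying the string's length by two appears to give the same UTF-16 number, since each element of a JS string is one 16-bit code unit.

```javascript
// Byte length of the same string under different encodings (Node.js)
const sample = '\u7E02' // the packed character from Exploration 1

console.log(Buffer.byteLength(sample, 'utf8'))     // 3
console.log(Buffer.byteLength(sample, 'utf16le'))  // 2
console.log(sample.length * 2)                     // 2
```

Leaving out the second argument makes Buffer.byteLength default to UTF-8, which is how the "wrong" number sneaks in.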
For this to work as expected, I had to specify UTF-16 Big Endian (utf-16be) as the encoding. UTF-16 because I want 2 bytes per character, and big-endian because I want the most significant ("big") byte to come first, like I did in the custom packer, where a ends up in the high byte.