Integer Representation in HPACK
July 9, 2015
After watching my Erlang User Conference talk, "HTTP/2 and You!", I remembered that I skipped right over integer representation in HPACK because of time constraints. As I'm about to dive into the topic again at LambdaJam next week in Chicago, I figured I'd write it up here as a supplement to both talks. I feel like it might go down easier as a blog post anyway.
The HPACK Spec
The HPACK spec, RFC-7541, goes into how integer representation works here and it's a good, if not dense, read.
Header Types
HPACK is really stingy. It loves to save bits on the wire whenever it can. When it encodes a header in a set of bytes, it starts that set of bytes with a set of bits letting everyone know what type of header it is:
Indexed Header Field
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| 1 | Index (7+) |
+---+---------------------------+
We're sending over a single integer to represent the header on the wire. The bit
prefix 1
, is telling us that it's this type, and the following 7 bits are the
integer we're sending. You already see the problem, don't you? What if the
integer we're sending is greater than 2^7-1
? Since that's only 127, it's a likely
situation. That's why that above figure from the HPACK spec calls Index (7+)
.
Literal Header Field with Incremental Indexing
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| 0 | 1 | Index (6+) |
+---+---+-----------------------+
| H | Value Length (7+) |
+---+---------------------------+
| Value String (Length octets) |
+-------------------------------+
This time, the header value isn't in the encoding and decoding contexts, but we want
it to be in the future. The header name happens to already be in the contexts,
so we're going to use the bit prefix 01
. When we use 01
, the Index is now a
(6+)
. Which means we could only fit 2^6-1
integers in that remaining 6 bits
before we run out of space.
Value Length is another integer, this time of (7+)
.
Other Prefixes
I won't deep dive into the types of headers on the wire here, except to give you a list of the other bit prefixes that an integer might follow:
H
- String lengths are all prefixed by a bit that specifies wether or not
the string is Huffman encoded
0000
- Literal Header Field without Indexing0001
- Literal Header Field Never Indexed001
- Dynamic Table Size Update
Integer Prefix Lengths
What that all means is that HPACK will be trying to encode integers in the following formats:
4+
5+
6+
7+
How It Does It
Since the algorithm for encoding integers is the name for all four, we'll start
referring to the number of bits we have as N
.
For integers less than 2^N-1
, it's super easy. Just stick the bits in those N
that are available and you're done.
It's for Integer
greater than 2^N-1
that it gets tricky.
Step 1: Set all N
bits to 1.
Step 2: Since HPACK is super stingy with bits, it doesn't want to waste N
bits,
so it will consider 2^N-1
of the integer you want to send already sent.
Now we're left with Remaining = Integer - (2^N-1)
to send over the wire and
we're going to do it in whole bytes.
The first bit in that byte is going to be a continuation bit.
1
means there are more bytes0
means this is the last one
Ok, great, 7 more bits to work with. We're just going to take the least
significant 7 bits from the Remaining
and put them in this next byte. If
Remaining
less than 128, we set the continuation bit to 0
and we're done. If
it's greater, we set the continuation bit to 1
, then we bitshift right Remaing
and do the whole thing again.
Here's how I did it in Erlang. You can see the whole integer module at hpack_integer.erl
-spec encode(non_neg_integer(), pos_integer()) -> binary().
encode(Int, N) when Int < (1 bsl N - 1) ->
<<Int:N>>;
encode(Int, N) ->
Prefix = 1 bsl N - 1,
Remaining = Int - Prefix,
Bin = encode_(Remaining, <<>>),
<<Prefix:N, Bin/binary>>.
encode/2
takes Int
that you want to encode and N
. It takes care of Steps 1
and 2 I described above, as well as the case where Int < 2^N-1
.
For numbers larger than 2^N-1
, it calls encode_/2
which will just recurse
until we're out of bits to encode.
-spec encode_(non_neg_integer(), binary()) -> binary().
encode_(I, BinAcc) ->
LeastSigSeven = (I rem 128),
RestToEncode = I bsr 7,
case RestToEncode of
0 ->
<<BinAcc/binary, LeastSigSeven>>;
_ -> %% Adds the continuation bit
ThisByte = 128 + LeastSigSeven,
encode_(RestToEncode, <<BinAcc/binary, ThisByte>>)
end.
What Happens when Integer = 2^N-1
I've been tip-toeing around this one because I assumed it would just be setting
all N
bits to 1
and done. Then I wiresharked it and it didn't work. Turns out
that all N
bits to 1
is a signal that we're doing this encoding method, and
HPACK expects the rest to be encoded in the following bytes, but there is no Rest
.
We've got to send a zero byte to follow this up with. So, sending 31 with N = 5 looks like this:
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| X | X | X | 1 | 1 | 1 | 1 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+---+---+---+---+---+---+---+---+
Which is a lot of wasted 0
s, which we know HPACK hates; but, it's an edge case
so HPACK still sleeps well at night.
I've created some simple clauses of encode/2
to handle this edge case:
encode( 1,1) -> << 1:1,0:8>>;
encode( 3,2) -> << 3:2,0:8>>;
encode( 7,3) -> << 7:3,0:8>>;
encode( 15,4) -> << 15:4,0:8>>;
encode( 31,5) -> << 31:5,0:8>>;
encode( 63,6) -> << 63:6,0:8>>;
encode(127,7) -> <<127:7,0:8>>;
encode(255,8) -> <<255,0>>;
Examples
Appendix C.1 of the HPACK RFC-7541 has a few examples that I'd just be copying into this space, so I'll just link to them instead