joedevivo.com

After watching my Erlang User Conference talk, "HTTP/2 and You!", I remembered that I skipped right over integer representation in HPACK because of time constraints. As I'm about to dive into the topic again at LambdaJam next week in Chicago, I figured I'd write it up here as a supplement to both talks. I feel like it might go down easier as a blog post anyway.

The HPACK Spec

The HPACK spec, RFC-7541, goes into how integer representation works here and it's a good, if not dense, read.

Header Types

HPACK is really stingy. It loves to save bits on the wire whenever it can. When it encodes a header in a set of bytes, it starts that set of bytes with a set of bits letting everyone know what type of header it is:

Indexed Header Field

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 1 |        Index (7+)         |
+---+---------------------------+

We're sending over a single integer to represent the header on the wire. The bit prefix 1, is telling us that it's this type, and the following 7 bits are the integer we're sending. You already see the problem, don't you? What if the integer we're sending is greater than 2^7-1? Since that's only 127, it's a likely situation. That's why that above figure from the HPACK spec calls Index (7+).

Literal Header Field with Incremental Indexing

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| 0 | 1 |      Index (6+)       |
+---+---+-----------------------+
| H |     Value Length (7+)     |
+---+---------------------------+
| Value String (Length octets)  |
+-------------------------------+

This time, the header value isn't in the encoding and decoding contexts, but we want it to be in the future. The header name happens to already be in the contexts, so we're going to use the bit prefix 01. When we use 01, the Index is now a (6+). Which means we could only fit 2^6-1 integers in that remaining 6 bits before we run out of space.

Value Length is another integer, this time of (7+).

Other Prefixes

I won't deep dive into the types of headers on the wire here, except to give you a list of the other bit prefixes that an integer might follow:

H - String lengths are all prefixed by a bit that specifies wether or not

the string is Huffman encoded

0000 - Literal Header Field without Indexing
0001 - Literal Header Field Never Indexed
001 - Dynamic Table Size Update

Integer Prefix Lengths

What that all means is that HPACK will be trying to encode integers in the following formats:

4+
5+
6+
7+

How It Does It

Since the algorithm for encoding integers is the name for all four, we'll start referring to the number of bits we have as N.

For integers less than 2^N-1, it's super easy. Just stick the bits in those N that are available and you're done.

It's for Integer greater than 2^N-1 that it gets tricky.

Step 1: Set all N bits to 1.

Step 2: Since HPACK is super stingy with bits, it doesn't want to waste N bits, so it will consider 2^N-1 of the integer you want to send already sent.

Now we're left with Remaining = Integer - (2^N-1) to send over the wire and we're going to do it in whole bytes.

The first bit in that byte is going to be a continuation bit.

1 means there are more bytes
0 means this is the last one

Ok, great, 7 more bits to work with. We're just going to take the least significant 7 bits from the Remaining and put them in this next byte. If Remaining less than 128, we set the continuation bit to 0 and we're done. If it's greater, we set the continuation bit to 1, then we bitshift right Remaing and do the whole thing again.

Here's how I did it in Erlang. You can see the whole integer module at hpack_integer.erl

-spec encode(non_neg_integer(), pos_integer()) -> binary().
encode(Int, N) when Int < (1 bsl N - 1) ->
    <<Int:N>>;
encode(Int, N) ->
    Prefix = 1 bsl N - 1,
    Remaining = Int - Prefix,
    Bin = encode_(Remaining, <<>>),
    <<Prefix:N, Bin/binary>>.

encode/2 takes Int that you want to encode and N. It takes care of Steps 1 and 2 I described above, as well as the case where Int < 2^N-1.

For numbers larger than 2^N-1, it calls encode_/2 which will just recurse until we're out of bits to encode.

-spec encode_(non_neg_integer(), binary()) -> binary().
encode_(I, BinAcc) ->
    LeastSigSeven = (I rem 128),
    RestToEncode = I bsr 7,
    case RestToEncode of
        0 ->
            <<BinAcc/binary, LeastSigSeven>>;
        _ -> %% Adds the continuation bit
            ThisByte = 128 + LeastSigSeven,
            encode_(RestToEncode, <<BinAcc/binary, ThisByte>>)
    end.

What Happens when Integer = 2^N-1

I've been tip-toeing around this one because I assumed it would just be setting all N bits to 1 and done. Then I wiresharked it and it didn't work. Turns out that all N bits to 1 is a signal that we're doing this encoding method, and HPACK expects the rest to be encoded in the following bytes, but there is no Rest.

We've got to send a zero byte to follow this up with. So, sending 31 with N = 5 looks like this:

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| X | X | X | 1 | 1 | 1 | 1 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+---+---+---+---+---+---+---+---+

Which is a lot of wasted 0s, which we know HPACK hates; but, it's an edge case so HPACK still sleeps well at night.

I've created some simple clauses of encode/2 to handle this edge case:

encode(  1,1) -> <<  1:1,0:8>>;
encode(  3,2) -> <<  3:2,0:8>>;
encode(  7,3) -> <<  7:3,0:8>>;
encode( 15,4) -> << 15:4,0:8>>;
encode( 31,5) -> << 31:5,0:8>>;
encode( 63,6) -> << 63:6,0:8>>;
encode(127,7) -> <<127:7,0:8>>;
encode(255,8) -> <<255,0>>;

Examples

Appendix C.1 of the HPACK RFC-7541 has a few examples that I'd just be copying into this space, so I'll just link to them instead

Integer Representation in HPACK