H.261

Source: MTS4EA

H.261 was the first successful video coding standard built on block-based motion compensation.

H.261 can encode only progressive CIF or QCIF video with 4:2:0 subsampling.
The coding unit is the 16x16 macroblock (MB).

H.261 Bitstream

The bitstream contains coded progressive pictures of CIF or QCIF size, one after another, until EOF.
It really is just a bitstream: none of the coded elements is byte-aligned.

PSC ......... PSC ......... PSC ............. EOF

PSC is the 20-bit picture start code (a start code whose GN field is zero):

      Start code           GN
+---------------------+ +------+
| 0000 0000 0000 0001 | | 0000 |
+---------------------+ +------+

  *not byte-aligned in H.261
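
Because start codes can sit at any bit position, a scanner has to slide over the stream bit by bit. The following is only a minimal sketch of that idea, not code from my decoder: the function name find_start_codes is made up for illustration, and it assumes a valid stream in which the coded data never imitates the 16-bit start code pattern.

```python
def find_start_codes(data: bytes):
    """Return a list of (bit_position, gn) for each start code in data.

    bit_position is the index of the first bit of the 16-bit start code,
    counted from the start of the buffer (MSB of byte 0 is bit 0).
    gn is the 4-bit field that follows: 0 means PSC, 1..15 a GOB number.
    """
    results = []
    window = 0  # holds the most recent 20 bits
    for pos in range(len(data) * 8):
        bit = (data[pos >> 3] >> (7 - (pos & 7))) & 1
        window = ((window << 1) | bit) & 0xFFFFF
        # Upper 16 bits of the window must be 0000 0000 0000 0001.
        # The pos >= 19 guard makes sure 20 real bits have been read.
        if pos >= 19 and (window >> 4) == 0x0001:
            results.append((pos - 19, window & 0xF))
    return results


if __name__ == "__main__":
    # 3 junk bits, a PSC, 5 junk bits, a GOB start code with GN=3, 1 junk bit
    bits = "101" + "0000000000000001" + "0000" + "11011" \
         + "0000000000000001" + "0011" + "1"
    bits += "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    stream = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    for position, gn in find_start_codes(stream):
        print(position, "PSC" if gn == 0 else "GOB %d" % gn)
```

A real decoder would not rescan from bit 0 every time but continue from its current read position; the sliding 20-bit window stays the same.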

H.261 arranges macroblocks in a fixed GOB (group of blocks) layout.
This is the layout for CIF:

           GOB structure
           
+---------------+---------------+
|     GN=1      |     GN=2      |
+---------------+---------------+
|     GN=3      |     ...       |
+---------------+---------------+
|     ...       |               |
+---------------+---------------+
|               |               |
+---------------+---------------+
|               |               |
+---------------+---------------+
|     ...       |     GN=12     |
+---------------+---------------+


Each GOB is a group of 33 macroblocks:

      0   1   2   3   4   5   6   7   8   9  10  
    +-------------------------------------------+
0   | X   X   X   X   X   X   X   X   X   X   X | 
1   | X   X   X   X   X   X   X   X   X   X   X | 
2   | X   X   X   X   X   X   X   X   X   X   X | 
    +-------------------------------------------+

Data for each GOB (GN=1..12) is coded in the stream.
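
The two diagrams above can be combined into a small position calculation. This is a sketch for CIF only; the function name mb_luma_origin is hypothetical, and it uses the 0-based, row-major macroblock index from the diagram (0..32) rather than any address coded in the stream.

```python
MB_SIZE = 16          # luma samples per macroblock side
MB_COLS_PER_GOB = 11  # macroblocks per GOB row
MB_ROWS_PER_GOB = 3   # macroblock rows per GOB
GOB_COLS_CIF = 2      # GOBs side by side in a CIF picture


def mb_luma_origin(gn: int, mb_index: int):
    """Return (x, y) of the top-left luma sample of a macroblock in CIF.

    gn       -- GOB number, 1..12
    mb_index -- 0-based row-major index inside the GOB, 0..32
    """
    if not 1 <= gn <= 12:
        raise ValueError("GN must be 1..12 for CIF")
    if not 0 <= mb_index < MB_COLS_PER_GOB * MB_ROWS_PER_GOB:
        raise ValueError("macroblock index must be 0..32")
    gob_col = (gn - 1) % GOB_COLS_CIF
    gob_row = (gn - 1) // GOB_COLS_CIF
    mb_col = mb_index % MB_COLS_PER_GOB
    mb_row = mb_index // MB_COLS_PER_GOB
    x = (gob_col * MB_COLS_PER_GOB + mb_col) * MB_SIZE
    y = (gob_row * MB_ROWS_PER_GOB + mb_row) * MB_SIZE
    return x, y


if __name__ == "__main__":
    print(mb_luma_origin(1, 0))    # first macroblock of the picture
    print(mb_luma_origin(12, 32))  # last macroblock of the picture
```

Each GOB therefore covers 176x48 luma samples, and 2 x 6 of them tile the 352x288 CIF picture.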

The GOB start code:

      Start code           GN
+---------------------+ +------+
| 0000 0000 0000 0001 | | xxxx |   
+---------------------+ +------+

  *not byte-aligned in H.261

MB data follows the GOB header.
Because some or all MBs of the GOB can be skipped, GOB data ends when the next start code is reached (or at EOF, see later).
All of these are valid:

GOB Header.. MB.. MB.. MB.. MB.. MB.. MB.. MB.. MB.. MB.. Start Code
GOB Header.. MB.. MB.. MB.. Start Code
GOB Header.. Start Code

The start code can belong to a GOB or to a picture. When a PSC follows, the GOB layer ends. (This also lets the decoder cope with non-standard NTSC video where only 10 GOBs are coded: try playing "short.p64" in Media Player Classic.)

H.261 EOF

There is a slight problem with EOF when decoding H.261 files.

The standard defines no end-of-GOB field, so the MB layer may simply end:

1100 0010  1010 xxxx (I saw a stream ending like this: 0xC2 0xA0 with 4 unused zero bits)

Decoding may end anywhere inside the last GOB, at any bit position in the stream.
At that point the decoder has no idea whether the remaining bits (x) are valid stream bits or whether the stream has really ended. (The AVC byte stream format solves this by aligning start codes and the end of the stream to byte boundaries, and by terminating the payload with a '1' stop bit followed by zero bits in place of 'x'.)

So I'm still working on an elegant solution, because in my code it is the MB layer that detects end-of-GOB. This can be:

but because of the above, I cannot simply be sure. Furthermore, checking for EOF after every MB is not very efficient. I guess this is easier with packetized H.261 streams.

H.261 Motion compensation

There is only one reference picture: the previously decoded picture. Motion vector components are restricted to the range -15..+15 and represent integer luma sample distances. The predicted area cannot fall outside the picture. All of this makes for a very simple and fast motion compensation implementation.
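
With integer vectors and no out-of-picture references, fetching the prediction is just a displaced block copy. A minimal sketch, assuming the reference picture is stored as a list of rows of luma samples; predict_block and its parameters are illustrative names, not taken from any real decoder:

```python
def predict_block(ref, x, y, mvx, mvy, size=16):
    """Copy a size x size block from the reference picture.

    ref      -- previous decoded luma plane as a list of rows
    x, y     -- top-left position of the current block
    mvx, mvy -- integer motion vector in luma samples
    """
    height, width = len(ref), len(ref[0])
    sx, sy = x + mvx, y + mvy
    # A valid stream never points outside the picture, so this check
    # only guards against corrupt data.
    if sx < 0 or sy < 0 or sx + size > width or sy + size > height:
        raise ValueError("motion vector points outside the picture")
    return [row[sx:sx + size] for row in ref[sy:sy + size]]


if __name__ == "__main__":
    # Tiny 32x32 test picture where every sample encodes its own position.
    ref = [[r * 100 + c for c in range(32)] for r in range(32)]
    block = predict_block(ref, 16, 16, -3, 2, size=4)
    for row in block:
        print(row)
```

No interpolation and no edge padding are needed, which is why this stage is so cheap compared with later standards.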

Reference samples   ---> [MV] ----> [FIL] ----> [CBP]  ---->   reconstructed samples


All three stages are optional and are signalled in MBTYPE.

   Reference 8x8                   Filtered 8x8                  Reconstructed 8x8
+-----------------+             +-----------------+             +-----------------+
| . . . . . . . . |             | . . . . . . . . |             | . . . . . . . . |
| . . . . . . . . |             | . . . . . . . . |             | . . . . . . . . |
| . . . . . . . . |     FIL     | . . . . . . . . |     CBP     | . . . . . . . . |
| . . . . . . . . |  -------->  | . . . . . . . . |  -------->  | . . . . . . . . |
| . . . . . . . . |             | . . . . . . . . |  add error  | . . . . . . . . |
| . . . . . . . . |             | . . . . . . . . |             | . . . . . . . . |
| . . . . . . . . |             | . . . . . . . . |             | . . . . . . . . |
| . . . . . . . . |             | . . . . . . . . |             | . . . . . . . . |
+-----------------+             +-----------------+             +-----------------+

 Previous picture                    Temp area                     Current picture
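
The last stage of the diagram (adding the decoded error to the prediction) can be sketched as follows. The function name reconstruct_block is hypothetical, the residual is assumed to be an already inverse-transformed 8x8 block of signed integers, and clipping the result to 0..255 is my assumption for 8-bit samples.

```python
def reconstruct_block(pred, residual=None):
    """Add an optional residual block to a predicted block.

    pred     -- predicted (and possibly filtered) samples, list of rows
    residual -- signed error samples of the same shape, or None when the
                block carries no coded coefficients
    """
    if residual is None:
        # Nothing coded for this block: the prediction is the result.
        return [row[:] for row in pred]
    return [
        [min(255, max(0, p + e)) for p, e in zip(pred_row, err_row)]
        for pred_row, err_row in zip(pred, residual)
    ]


if __name__ == "__main__":
    pred = [[250, 10], [128, 128]]
    err = [[10, -20], [5, -5]]
    print(reconstruct_block(pred, err))
    print(reconstruct_block(pred))
```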

H.261 Loop Filter

The loop filter is applied to each 8x8 predicted block, using a separable 3-tap filter (1/4, 1/2, 1/4):

S  S  S
|  |  |
+--+--+          X = ( S[-1] + 2*S[0] + S[1] + 2 ) >> 2     (integer arithmetic)
   |
   X

Horizontal and vertical filtering, except on block edges, where the sample is passed through unchanged in that direction:

+-----------------+              +-----------------+
| . H H H H H H . |              | . . . . . . . . |
| . H H H H H H . |              | V V V V V V V V |
| . H H H H H H . |              | V V V V V V V V |
| . H H H H H H . |   ------->   | V V V V V V V V |   ------> filtered block
| . H H H H H H . |              | V V V V V V V V |
| . H H H H H H . |              | V V V V V V V V |
| . H H H H H H . |              | V V V V V V V V |
| . H H H H H H . |              | . . . . . . . . |
+-----------------+              +-----------------+
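
The two passes can be sketched in a few lines. This is an illustration only, with a made-up function name loop_filter_8x8. Note one deliberate difference from the 1-D formula above: as I read the Recommendation, full precision is kept between the passes and rounding happens once at the 2-D output, so this sketch carries the horizontal result scaled by 4 and rounds with (sum + 8) >> 4 at the end. Rounding after each pass may give slightly different values for some samples.

```python
def loop_filter_8x8(block):
    """Apply the separable (1/4, 1/2, 1/4) filter to one 8x8 block.

    block -- list of 8 rows with 8 integer samples each
    Edge columns skip the horizontal pass and edge rows skip the
    vertical pass, as in the H/V diagram above.
    """
    n = 8
    # Horizontal pass: every value is kept scaled by 4 (no rounding yet).
    tmp = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            if x == 0 or x == n - 1:
                tmp[y][x] = 4 * block[y][x]
            else:
                tmp[y][x] = block[y][x - 1] + 2 * block[y][x] + block[y][x + 1]
    # Vertical pass: total scale becomes 16, then round once.
    out = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            if y == 0 or y == n - 1:
                acc = 4 * tmp[y][x]
            else:
                acc = tmp[y - 1][x] + 2 * tmp[y][x] + tmp[y + 1][x]
            out[y][x] = (acc + 8) >> 4
    return out


if __name__ == "__main__":
    impulse = [[0] * 8 for _ in range(8)]
    impulse[3][3] = 16
    for row in loop_filter_8x8(impulse):
        print(row)
```

Corner samples are never filtered and come out unchanged; a flat block stays flat.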