SAMPLING FACTORS IN JPEG

Background

Chroma sub-sampling reduces bandwidth without a major perceptual error.

         SIGNAL      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  luma sampling      | | | | | | | | | | | | | | | | | | | | | | | | | | | |  
chroma sampling      |   |   |   |   |   |   |   |   |   |   |   |   |   |  

JPEG gives the sampling factors H/V for each component and the image dimensions X/Y in the SOF header. All parameters necessary for decoding can be computed from these. X/Y is in samples, H/V is in data units (a DU is an 8x8 block of coefficients in DCT mode, or one sample in lossless mode).

Example from ITU image K1.JPG

0000021B: SOF2 (Progressive DCT)
  P=8 Y=257 X=255
  Nf=4
    Ci=200 HV=1x1 Qi=0
    Ci=150 HV=1x2 Qi=1
    Ci=100 HV=3x1 Qi=2
    Ci= 50 HV=1x4 Qi=3

This specifies the MCU structure. From the above Hmax=3 and Vmax=4:

   Hmax=3
+-----------+
|   |   |   |
|---+---+---|
|   |   |   |
|---+---+---|  Vmax=4
|   |   |   |
|---+---+---|
|   |   |   |
+-----------+
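
Finding Hmax and Vmax is a plain maximum over the component list. A minimal sketch in C; the function name max_factor and the array layout are made up for this note, not taken from any library:

```c
#include <assert.h>

/* Largest of n sampling factors (Hmax from all Hi, or Vmax from all Vi). */
int max_factor(const int *f, int n)
{
    assert(f && n > 0);
    int m = f[0];
    for (int i = 1; i < n; i++)
        if (f[i] > m)
            m = f[i];
    return m;
}
```

Called once with the four Hi values {1, 1, 3, 1} and once with the Vi values {1, 2, 1, 4} of K1.JPG, it returns 3 and 4.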

The MCU structure is the basis for converting sub-sampled component data to color image pixels. Each image component provides n = H x V samples per MCU:

   HV=1x1              HV=1x2                HV=3x1             HV=1x4
+---+---+---+       +---+---+---+        +---+---+---+       +---+---+---+
| 0 |   |   |       | 1 |   |   |        | 3 | 4 | 5 |       | 6 |   |   |
+---+---+---+       +---+---+---+        +---+---+---+       +---+---+---+
|   |   |   |       | 2 |   |   |        |   |   |   |       | 7 |   |   |
+---+---+---+       +---+---+---+        +---+---+---+       +---+---+---+
|   |   |   |       |   |   |   |        |   |   |   |       | 8 |   |   |
+---+---+---+       +---+---+---+        +---+---+---+       +---+---+---+
|   |   |   |       |   |   |   |        |   |   |   |       | 9 |   |   |
+---+---+---+       +---+---+---+        +---+---+---+       +---+---+---+

For this particular image, we need 1+2+3+4=10 decoded sample values in this arrangement to convert to 3x4 color image pixels, as the minimal unit:

                               +-----------+
                               |   |   |   |
                               |---+---+---|  
                               |   |   |   | 
0 1 2 3 4 5 6 7 8 9  ------->  |---+---+---|     3 x 4 image pixels
                               |   |   |   | 
                               |---+---+---|
                               |   |   |   | 
                               +-----------+
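
One possible way to pick, for each of the 3 x 4 pixels, the sample of a component that covers it. This sketch assumes plain sample replication (each sample covers an equal rectangle of the MCU, no interpolation); JPEG itself does not prescribe the up-sampling method, and the name sample_index is hypothetical:

```c
#include <assert.h>

/* Index, within one component's H x V samples of an MCU, of the sample
   covering pixel (px, py) of the Hmax x Vmax pixel block.
   Assumption: replication only, no smoothing filter. */
int sample_index(int px, int py, int h, int v, int hmax, int vmax)
{
    assert(h > 0 && v > 0 && h <= hmax && v <= vmax);
    assert(px >= 0 && px < hmax && py >= 0 && py < vmax);
    int sx = px * h / hmax;   /* sample column, 0..h-1 */
    int sy = py * v / vmax;   /* sample row,    0..v-1 */
    return sy * h + sx;       /* row-major position inside the component */
}
```

For the bottom-middle pixel (1,3) of the example, the HV=1x4 component yields index 3. Adding that component's offset 6 gives sample 9 in the numbering used above.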

The MCU structure also determines how data units appear in interleaved scans (1):

_ _ ___MCU_____ __________MCU________ _____MCU___ _ _
               | 0 1 2 3 4 5 6 7 8 9 |
               
               
1x1 + 1x2 + 3x1 + 1x4 = 1 + 2 + 3 + 4 = 10 DU/MCU (max. allowed)

(1): Interleaved scan: a scan with more than 1 component. Not all 4 components are required; the encoder decides which ones take part, and they appear in the same relative order as in the SOF header.
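
The DU count of an MCU, and which component a given DU position belongs to, can be sketched as follows. The arrays hold the sampling factors of the components in the order the scan lists them; the function names and MAX_DU_PER_MCU are illustrative only:

```c
#include <assert.h>

#define MAX_DU_PER_MCU 10   /* limit on the sum of H x V in one scan */

/* Number of data units one MCU carries for ns interleaved components. */
int du_per_mcu(const int *h, const int *v, int ns)
{
    assert(h && v && ns > 0);
    int n = 0;
    for (int i = 0; i < ns; i++)
        n += h[i] * v[i];
    return n;
}

/* Scan-order index of the component that owns data unit k of an MCU,
   or -1 if k is outside the MCU. */
int du_owner(const int *h, const int *v, int ns, int k)
{
    assert(h && v && ns > 0);
    if (k < 0)
        return -1;
    for (int i = 0; i < ns; i++) {
        int n = h[i] * v[i];
        if (k < n)
            return i;
        k -= n;              /* skip this component's data units */
    }
    return -1;
}
```

With all four components of K1.JPG in one scan, du_per_mcu() returns 10, which just meets MAX_DU_PER_MCU, and positions 3, 4 and 5 belong to the third component (HV=3x1).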

(In baseline JPEG we can build the final image MCU-by-MCU.)

1. MCU coverage of the image

We don't know yet whether one will occur (K1 is a progressive JPEG), but if there is a (fully) interleaved scan, the coded stream contains data for full MCU-s. So we compute the MCU coverage (mcu_width/mcu_height) from X/Y and Hmax/Vmax.

These parameters are used for coefficient memory allocation and to determine the number of MCU-s in an interleaved scan.

          <----------------------- X ------------------------>
        
          --- Hmax -->
        . +-----------+-----------+------------------------------+    .
        | |   |   |   |   |   |   |   |   |                      |    |
        | |---+---+---|---+---+---|---+---+---                   |    |
   Vmax | |   |   |   |   |   |   |   |   |                      |    |
        | |---+---+---|---+---+---|---+---+---                   |    |
        | |   |   |   |   |   |   |   |   |                      |    |
        | |---+---+---|---+---+---|---+---+---                   |    |
        v |   |   |   |   |   |   |   |   |                      |    |
          +-----------+-----------+------------------------------+    Y
          |   |   |   |   |   |   |   |   |                      |    |
          |---+---+---|---+---+---|---+---+---                   |    |
          |   |   |   |   |   |   |   |   |                      |    |
          |                                                      |    |
          |                                                      |    v
          |                                                      |     
          +------------------------------------------------------+      

MCU-coverage in DCT-mode:

mcu_width  = div(X, Hmax*8)
mcu_height = div(Y, Vmax*8)

MCU-coverage in Lossless-mode:

mcu_width  = div(X, Hmax)
mcu_height = div(Y, Vmax)

Where div() is division with rounding up:

int div(int a, int b)
{
  return ( a + b - 1 ) / b;
} 

For this image mcu_width=11 and mcu_height=9, so 99 MCU-s cover the image. With 10 8x8 DU-s per MCU that is 990 DU-s altogether. So call malloc(990*sizeof(DU)) for the full image coefficient memory.
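
The same computation as a small C sketch. The rounding-up division is called ceil_div here (an arbitrary name) because <stdlib.h> already declares a different div(); mcu_count and its du_size parameter are likewise only illustrative:

```c
#include <assert.h>

/* Division with rounding up, for a >= 0 and b > 0. */
int ceil_div(int a, int b)
{
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;
}

/* MCU columns (dim = X, max_factor = Hmax) or MCU rows (dim = Y,
   max_factor = Vmax) covering the image.
   du_size is 8 in DCT mode and 1 in lossless mode. */
int mcu_count(int dim, int max_factor, int du_size)
{
    assert(max_factor > 0 && du_size > 0);
    return ceil_div(dim, max_factor * du_size);
}
```

For K1.JPG in DCT mode, mcu_count(255, 3, 8) is 11 and mcu_count(257, 4, 8) is 9, so 11 x 9 = 99 MCU-s.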

2. MCU coverage per image component

Because of sub-sampling, the size and dimensions of the coefficient memory are different for each component. The first component gives 1x1=1 DU per MCU, so the first component's coverage is 99 DU-s. And so on: 2x99=198, 3x99=297 and 4x99=396 for the other components.

There are many ways to compute these parameters. One that also serves as the basis for memory allocation:

du_width  = mcu_width  * Hi
du_height = mcu_height * Vi

For this image:

HV=1x1  =>  (11 x 1) x (9 x 1) =  99
HV=1x2  =>  (11 x 1) x (9 x 2) = 198
HV=3x1  =>  (11 x 3) x (9 x 1) = 297
HV=1x4  =>  (11 x 1) x (9 x 4) = 396
______________________________________________

      sum:                       990

Indeed, 11 x 9 MCU, 10 DU/MCU gives 990.
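
A sketch of how one such plane per component could be allocated in DCT mode. The struct du_plane, both function names and the choice of 16-bit (short) coefficients are assumptions made for this example, not requirements of JPEG:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical coefficient plane of one component (DCT mode). */
struct du_plane {
    int du_width;    /* data units per row: mcu_width  * Hi  */
    int du_height;   /* rows of data units: mcu_height * Vi  */
    short *coef;     /* du_width * du_height * 64 coefficients */
};

/* Allocate a zeroed plane; returns 0 on success, -1 if out of memory. */
int du_plane_alloc(struct du_plane *p, int mcu_width, int mcu_height,
                   int h, int v)
{
    assert(p && mcu_width > 0 && mcu_height > 0 && h > 0 && v > 0);
    p->du_width  = mcu_width  * h;
    p->du_height = mcu_height * v;
    p->coef = calloc((size_t)p->du_width * p->du_height * 64,
                     sizeof *p->coef);
    return p->coef ? 0 : -1;
}

/* Pointer to the 64 coefficients of the data unit at (col, row). */
short *du_at(const struct du_plane *p, int col, int row)
{
    assert(p && p->coef);
    assert(col >= 0 && col < p->du_width);
    assert(row >= 0 && row < p->du_height);
    return p->coef + ((size_t)row * p->du_width + col) * 64;
}
```

Separate planes keep each component's DU-s addressable by column and row, which is handy when later scans of a progressive image refine the same coefficients. The caller releases a plane with free(p.coef).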

Data unit planes allocated per image component:

HV=1x1
+---+---+---+--------+
|   |   |   |        |  (11 x 1) x (9 x 1) = 11 x 9 = 99
+---+---+---+--      |
|   |   |   |        |
+---+---+---+--      |
|   |   |   |        |
|                    |
|                    |
+--------------------+
	
HV=1x2
+---+---+---+--------+
|   |   |   |        |  (11 x 1) x (9 x 2) = 11 x 18 = 198
|---|---|---|-       |
|   |   |   |        |
+---+---+---+-       |
|   |   |   |        |
|---|---|---|-       |
|   |   |   |        |
+---+---+---+-       |
|   |   |   |        |
|                    |
|                    |
|                    |
|                    |
|                    |
|                    |
+--------------------+
	
HV=3x1
+-----------+-----------+------------------------------+
|   |   |   |   |   |   |   |   |                      |  (11 x 3) x (9 x 1) = 33 x 9 = 297
+-----------+-----------+------------------------------+
|   |   |   |   |   |   |   |   |                      |
+-----------+-----------+------------------------------+
|   |   |   |   |   |   |   |   |                      |
|                                                      |
|                                                      |
+------------------------------------------------------+
	
HV=1x4
+---+---+---+--------+
|   |   |   |        | (11 x 1) x (9 x 4) = 11 x 36 = 396
|---|---|---|--      |
|   |   |   |        |
|---|---|---|--      |
|   |   |   |        |
|---|---|---|--      |
|   |   |   |        |
+---+---+---+--      |
|   |   |   |        |
|---|---|---|--      |
|   |   |   |        |
|---|---|---|--      |
|   |   |   |        |
|---|---|---|--      |
|   |   |   |        |
+---+---+---+--      |
|   |   |   |        |
|                    |
|                    |
|                    |
|                    |
|                    |
|                    |
|                    |
|                    |
|                    |
+--------------------+

3. DU coverage for Single Scan

This is only needed for progressive JPEG, to prepare parameters for possible single scans (scans with one component). In a single scan, only the necessary number of data units is coded.

First the number of samples per component necessary to create the final image:

xi = div(X*Hi, Hmax)
yi = div(Y*Vi, Vmax)

The DU-coverage in DCT-mode:

du_xi = div(xi, 8)
du_yi = div(yi, 8)

The DU-coverage in Lossless-mode:

du_xi = div(xi, 1) = xi
du_yi = div(yi, 1) = yi

This gives the number of data units per component necessary to provide xi x yi samples. In lossless mode the two are the same, since a DU is one sample.

Where div() is division with rounding up.

                    xi                      
<----------------------------------------->               <-- xi= div(X*Hi, Hmax)

+---+---+---+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |   |   |   |             <-- du_xi= div(xi, 8)

Single scans contain exactly du_xi * du_yi data units per component, which is not necessarily the same as du_width * du_height (see partial MCU below).
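
Both counts side by side, as a sketch built from the formulas of this note. The names ceil_div, du_single and du_interleaved are invented here; each function handles one axis, so call it with X/Hi/Hmax for the width and Y/Vi/Vmax for the height:

```c
#include <assert.h>

/* Division with rounding up, for a >= 0 and b > 0. */
int ceil_div(int a, int b)
{
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;
}

/* Data units along one axis in a single scan (du_xi or du_yi).
   du_size is 8 in DCT mode and 1 in lossless mode. */
int du_single(int dim, int factor, int max_factor, int du_size)
{
    int samples = ceil_div(dim * factor, max_factor);   /* xi or yi */
    return ceil_div(samples, du_size);
}

/* Data units along one axis in an interleaved scan (du_width or du_height). */
int du_interleaved(int dim, int factor, int max_factor, int du_size)
{
    return ceil_div(dim, max_factor * du_size) * factor;
}
```

For K1.JPG in DCT mode this gives, as width x height in DU-s: 11 x 9 in both cases for HV=1x1, 11 x 17 against 11 x 18 for HV=1x2, 32 x 9 against 33 x 9 for HV=3x1, and 11 x 33 against 11 x 36 for HV=1x4.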

Partial MCU

The number of DU-s coded for a component may differ between a single scan and an interleaved scan the component takes part in. Interleaved scans carry full MCU data, a single scan doesn't.

+-------------------------------------+ 
|                                     |<-- component's single scan
|  +--------------+--------------+----|---------+
|  | DU   DU   DU | DU   DU   DU | DU | DU   DU |<-- component in interleaved scan
|  |              |              |    |         |
|  | DU   DU   DU | DU   DU   DU | DU | DU   DU |
|  +--------------+--------------+----|---------+
|  | DU   DU   DU | DU   DU   DU | DU | DU   DU |
|  |              |              |    |         |
|  | DU   DU   DU | DU   DU   DU | DU | DU   DU |
|  +--------------+--------------+----|---------+
|  | DU   DU   DU | DU   DU   DU | DU | DU   DU |
+-------------------------------------+         |
   | DU   DU   DU | DU   DU   DU | DU   DU   DU |
   +--------------+--------------+--------------+

In the figure above, the last 2 columns and the last row of DU-s for this component are not needed to create the final image, but they must still appear in the compressed stream of an interleaved scan.

Because the two cases round up differently, their DU counts might not be the same.

Unused DU-s are encoded, but discarded by the decoder.

Furthermore, in DCT mode, due to rounding up to a multiple of 8, some samples (x) are not needed to create the final image, but are still encoded and take part in the DCT process. The encoder fills the (x) positions with edge samples.

Example partials in DCT-mode:

<---------------- xi ------------------>          
       
+------+ +------+ +------+ +------+ +------+        
|      | |      | |      | |      | |    xx|            Single scan: du_xi
|      | |      | |      | |      | |    xx|
+------+ +------+ +------+ +------+ +------+        

+------+ +------+ +------+ +------+ +------+ +------+ +------+        
|      | |      | |      | |      | |    xx| |xxxxxx| |xxxxxx|   Interleaved Scan du_width
|      | |      | |      | |      | |    xx| |xxxxxx| |xxxxxx| 
+------+ +------+ +------+ +------+ +------+ +------+ +------+        
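
On the encoder side, the edge fill of one sample row could look like the sketch below. It assumes 8-bit samples and a row buffer that already has room for the padded length; pad_row is a hypothetical helper:

```c
#include <assert.h>

/* Pad a row in place: samples [0, n) are valid, positions [n, padded_n)
   receive copies of the last valid sample (the edge sample). */
void pad_row(unsigned char *row, int n, int padded_n)
{
    assert(row && n > 0 && padded_n >= n);
    for (int i = n; i < padded_n; i++)
        row[i] = row[n - 1];
}
```

For a single scan padded_n would be du_xi * 8, for an interleaved scan du_width * 8. Missing rows at the bottom can be filled the same way by repeating the last valid row.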


Summary

The SOF header specifies Hi/Vi for each component and X/Y of the image.

First we compute mcu_width and mcu_height from X/Y and Hmax/Vmax.

Then for each component we compute du_width and du_height to allocate coefficient memory. Their product is the component's total number of DU-s in a possible interleaved scan.

We also compute du_xi and du_yi for possible single scans.

During conversion we compute X x Y image pixels from xi x yi component samples.


2012 Attila Tarpai (tarpai76 at gmail)