Chroma sub-sampling reduces bandwidth without a major perceptual error.

    SIGNAL            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    luma sampling     | | | | | | | | | | | | | | | | | | | | | | | | | | | |
    chroma sampling   |   |   |   |   |   |   |   |   |   |   |   |   |   |

JPEG gives the sampling factors H/V for each component and the image dimensions X/Y in the SOF header. All parameters necessary for decoding can be computed from these. X/Y is in samples; H/V is in data units (a DU is an 8x8 block of coefficients in DCT-mode, or one sample in lossless-mode).

    0000021B: SOF2 (Progressive DCT)
        P=8 Y=257 X=255 Nf=4
        Ci=200 HV=1x1 Qi=0
        Ci=150 HV=1x2 Qi=1
        Ci=100 HV=3x1 Qi=2
        Ci= 50 HV=1x4 Qi=3

- image pixel dimensions: 255 x 257
- 8x8 DCT-mode
- progressive JPEG
- 4 components, sampling factors in HV

This specifies the MCU structure. From the above H_{max}=3 and V_{max}=4:

               Hmax=3
            +-----------+
            |   |   |   |
            |---+---+---|
            |   |   |   |
            |---+---+---|
     Vmax=4 |   |   |   |
            |---+---+---|
            |   |   |   |
            +-----------+
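Finding H_{max} and V_{max} is a plain maximum over the component list. A minimal sketch, assuming a hypothetical `Component` record (the struct, its field names and `max_sampling` are illustrative only, not taken from any decoder):

```c
#include <stddef.h>

/* Hypothetical per-component record; names are illustrative only. */
typedef struct {
    int id;   /* Ci */
    int h;    /* horizontal sampling factor Hi */
    int v;    /* vertical sampling factor Vi */
} Component;

/* Scan all components and report the largest H and V. */
void max_sampling(const Component *c, size_t n, int *hmax, int *vmax)
{
    *hmax = 0;
    *vmax = 0;
    for (size_t i = 0; i < n; i++) {
        if (c[i].h > *hmax) *hmax = c[i].h;
        if (c[i].v > *vmax) *vmax = c[i].v;
    }
}

/* The four components of the example SOF2 header. */
const Component k1[4] = {
    {200, 1, 1}, {150, 1, 2}, {100, 3, 1}, {50, 1, 4}
};
```

Calling `max_sampling(k1, 4, &hmax, &vmax)` on the example header yields the 3 x 4 MCU shown above.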

The MCU structure is the basis for image color pixel conversion using sub-sampled component data. Each image component provides n=HxV samples per MCU:

    HV=1x1          HV=1x2          HV=3x1          HV=1x4
    +---+---+---+   +---+---+---+   +---+---+---+   +---+---+---+
    | 0 |   |   |   | 1 |   |   |   | 3 | 4 | 5 |   | 6 |   |   |
    +---+---+---+   +---+---+---+   +---+---+---+   +---+---+---+
    |   |   |   |   | 2 |   |   |   |   |   |   |   | 7 |   |   |
    +---+---+---+   +---+---+---+   +---+---+---+   +---+---+---+
    |   |   |   |   |   |   |   |   |   |   |   |   | 8 |   |   |
    +---+---+---+   +---+---+---+   +---+---+---+   +---+---+---+
    |   |   |   |   |   |   |   |   |   |   |   |   | 9 |   |   |
    +---+---+---+   +---+---+---+   +---+---+---+   +---+---+---+

For this particular image, we need 1+2+3+4=10 decoded sample values in this arrangement to convert to 3x4 color image pixels, as the minimal unit:

                                     +-----------+
                                     |   |   |   |
                                     |---+---+---|
                                     |   |   |   |
    0 1 2 3 4 5 6 7 8 9   ------->   |---+---+---|   3 x 4 image pixels
                                     |   |   |   |
                                     |---+---+---|
                                     |   |   |   |
                                     +-----------+

The MCU structure is also the basis for how data units appear in interleaved scans^{1}:

    _ _ ___MCU_____ __________MCU________ _____MCU___ _ _
                   | 0 1 2 3 4 5 6 7 8 9 |

    1x1 + 1x2 + 3x1 + 1x4 = 1 + 2 + 3 + 4 = 10 DU/MCU (max. allowed)

(1): Interleaved scan: more than 1 component in a scan. Not all 4 components are required, and not necessarily in the order specified in SOF; this is freely decided by the encoder.
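The numbering 0..9 above follows from three nested loops: component by component, and within a component row by row, left to right. A small sketch of that ordering (the function name `list_mcu_dus` and its array arguments are assumptions made for illustration):

```c
#include <stdio.h>

/* Print the order of data units inside one MCU of an interleaved scan.
   h[] and v[] hold the sampling factors of the scan's components,
   in scan order. Returns the number of DUs per MCU. */
int list_mcu_dus(const int *h, const int *v, int ncomp)
{
    int n = 0;
    for (int c = 0; c < ncomp; c++) {              /* component by component */
        for (int row = 0; row < v[c]; row++) {     /* top to bottom */
            for (int col = 0; col < h[c]; col++) { /* left to right */
                printf("DU %d: component %d, col %d, row %d\n",
                       n, c, col, row);
                n++;
            }
        }
    }
    return n;
}
```

With `h = {1, 1, 3, 1}` and `v = {1, 2, 1, 4}` this lists the ten data units of the example MCU in the order drawn above.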

(In baseline JPEG we can build the final image MCU-by-MCU.)

We don't know yet whether there will be one (K1 is a progressive JPEG), but if there is a (full) interleaved scan, the coded stream carries data for full MCUs. So we compute the *MCU-coverage* (mcu_width/mcu_height) from X/Y and H_{max}/V_{max}:

These parameters are computed for coefficient memory allocation and to determine the number of MCU-s in an Interleaved scan.

            <----------------------- X ------------------------>
            --- Hmax -->
         .  +-----------+-----------+------------------------------+
         .  |   |   |   |   |   |   |   |   |   |                  |  |
         |  |---+---+---|---+---+---|---+---+---                   |  |
       Vmax |   |   |   |   |   |   |   |   |   |                  |  |
         |  |---+---+---|---+---+---|---+---+---                   |  |
         |  |   |   |   |   |   |   |   |   |   |                  |  |
         |  |---+---+---|---+---+---|---+---+---                   |  |
         v  |   |   |   |   |   |   |   |   |   |                  |  |
            +-----------+-----------+------------------------------+  Y
            |   |   |   |   |   |   |   |   |   |                  |  |
            |---+---+---|---+---+---|---+---+---                   |  |
            |   |   |   |   |   |   |   |   |   |                  |  |
            |                                                      |  |
            |                                                      |  v
            |                                                      |
            +------------------------------------------------------+

MCU-coverage in DCT-mode:

    mcu_width  = div(X, Hmax*8)
    mcu_height = div(Y, Vmax*8)

MCU-coverage in Lossless-mode:

    mcu_width  = div(X, Hmax)
    mcu_height = div(Y, Vmax)

Where *div()* is division with rounding up:

    int div(int a, int b)
    {
        return ( a + b - 1 ) / b;
    }

For this image mcu_width=11 and mcu_height=9, so 99 MCUs cover the image. With 10 8x8 DUs per MCU that is 990 DUs altogether. So call `malloc(990*sizeof(DU))` for full image coefficient memory.
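The MCU-coverage formulas for both modes can be folded into one helper. This is only a sketch: `div_up` mirrors the *div()* of the text (renamed here so that it does not clash with `div()` from `<stdlib.h>`), while `McuCoverage`, `mcu_coverage` and the `du_size` parameter are names assumed for illustration.

```c
#include <assert.h>

/* Division rounding up, same arithmetic as div() in the text. */
int div_up(int a, int b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Hypothetical result type: image size measured in MCUs. */
typedef struct {
    int mcu_width;
    int mcu_height;
} McuCoverage;

/* du_size is 8 in DCT-mode and 1 in lossless-mode. */
McuCoverage mcu_coverage(int X, int Y, int hmax, int vmax, int du_size)
{
    McuCoverage m;
    m.mcu_width  = div_up(X, hmax * du_size);
    m.mcu_height = div_up(Y, vmax * du_size);
    return m;
}
```

For the example image the call would be `mcu_coverage(255, 257, 3, 4, 8)`; multiplying the two fields gives the number of MCUs in a full interleaved scan.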

Because of sub-sampling, the size and dimensions of coefficient memory are different for each component. The first component gives 1x1=1 DU per MCU, so the first component needs 99 DUs. And so on: 2x99=198, 3x99=297 and 4x99=396 for the other components.

There are many ways to compute these parameters. One is the DU-coverage per component (du_width/du_height), which is also the basis for memory allocations:

    du_width  = mcu_width * Hi
    du_height = mcu_height * Vi

For this image:

    HV=1x1 => (11 x 1) x (9 x 1) =  99
    HV=1x2 => (11 x 1) x (9 x 2) = 198
    HV=3x1 => (11 x 3) x (9 x 1) = 297
    HV=1x4 => (11 x 1) x (9 x 4) = 396
    __________________________________
                              sum: 990

Indeed, 11 x 9 MCU, 10 DU/MCU gives 990.
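Allocating one plane per component from du_width and du_height might look like the sketch below. The `DU` layout (64 coefficients of type `short`) and the `Plane` type with its two functions are assumptions for illustration; a real decoder may store coefficients differently.

```c
#include <stdlib.h>

/* Assumption: one DU is an 8x8 block of 16-bit coefficients. */
typedef short DU[64];

/* Hypothetical per-component coefficient plane. */
typedef struct {
    int du_width;   /* plane width in data units  */
    int du_height;  /* plane height in data units */
    DU *coef;       /* du_width * du_height data units, row by row */
} Plane;

/* Allocate a zeroed DU plane for a component with sampling factors
   hi x vi. Returns 0 on success, -1 if the allocation fails. */
int plane_alloc(Plane *p, int mcu_width, int mcu_height, int hi, int vi)
{
    p->du_width  = mcu_width * hi;
    p->du_height = mcu_height * vi;
    p->coef = calloc((size_t)p->du_width * (size_t)p->du_height, sizeof(DU));
    return p->coef ? 0 : -1;
}

/* Release the plane's coefficient memory. */
void plane_free(Plane *p)
{
    free(p->coef);
    p->coef = NULL;
}
```

Keeping a separate plane per component, rather than one block for all DUs, makes it easy to address a DU by (column, row) within its own component.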

Data unit planes per image component allocated:

    HV=1x1:  (11 x 1) x (9 x 1) = 11 x 9 = 99

    +---+---+---+--------+
    |   |   |   |        |
    +---+---+---+--      |
    |   |   |   |        |
    +---+---+---+--      |
    |   |   |   |        |
    |                    |
    |                    |
    +--------------------+

    HV=1x2:  (11 x 1) x (9 x 2) = 11 x 18 = 198

    +---+---+---+--------+
    |   |   |   |        |
    |---|---|---|-       |
    |   |   |   |        |
    +---+---+---+-       |
    |   |   |   |        |
    |---|---|---|-       |
    |   |   |   |        |
    +---+---+---+-       |
    |   |   |   |        |
    |                    |
    |                    |
    +--------------------+

    HV=3x1:  (11 x 3) x (9 x 1) = 33 x 9 = 297

    +-----------+-----------+------------------------------+
    |   |   |   |   |   |   |   |   |                      |
    +-----------+-----------+------------------------------+
    |   |   |   |   |   |   |   |   |                      |
    +-----------+-----------+------------------------------+
    |   |   |   |   |   |   |   |   |                      |
    |                                                      |
    |                                                      |
    +------------------------------------------------------+

    HV=1x4:  (11 x 1) x (9 x 4) = 11 x 36 = 396

    +---+---+---+--------+
    |   |   |   |        |
    |---|---|---|--      |
    |   |   |   |        |
    |---|---|---|--      |
    |   |   |   |        |
    |---|---|---|--      |
    |   |   |   |        |
    +---+---+---+--      |
    |   |   |   |        |
    |---|---|---|--      |
    |   |   |   |        |
    |---|---|---|--      |
    |   |   |   |        |
    |---|---|---|--      |
    |   |   |   |        |
    +---+---+---+--      |
    |   |   |   |        |
    |                    |
    |                    |
    +--------------------+

The following parameters are only needed for progressive JPEG, to prepare for possible single scans. In a single scan, only the necessary number of data units is coded.

First the number of *samples* per component necessary to create the final image:

    xi = div(X*Hi, Hmax)
    yi = div(Y*Vi, Vmax)

The DU-coverage in DCT-mode:

    du_xi = div(xi, 8)
    du_yi = div(yi, 8)

The DU-coverage in Lossless-mode:

    du_xi = div(xi, 1) = xi
    du_yi = div(yi, 1) = yi

Which gives the number of *data units* per component necessary to provide *xi/yi* samples. In Lossless-mode these two are the same.

Where *div()* is division with rounding up.

                        xi
    <----------------------------------------->     <-- xi = div(X*Hi, Hmax)
    +---+---+---+---+---+---+---+---+---+---+---+
    |   |   |   |   |   |   |   |   |   |   |   |   <-- du_xi = div(xi, 8)
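The sample and DU counts for a single scan can be computed in one step. A sketch under the same assumptions as the formulas above; `div_up` stands in for *div()* of the text, and `SingleScanDims` and `single_scan_dims` are illustrative names:

```c
/* Division rounding up, same arithmetic as div() in the text. */
int div_up(int a, int b)
{
    return (a + b - 1) / b;
}

/* Hypothetical result type for one component. */
typedef struct {
    int xi, yi;        /* samples needed for the final image */
    int du_xi, du_yi;  /* data units needed to hold those samples */
} SingleScanDims;

/* du_size is 8 in DCT-mode and 1 in lossless-mode. */
SingleScanDims single_scan_dims(int X, int Y, int hi, int vi,
                                int hmax, int vmax, int du_size)
{
    SingleScanDims d;
    d.xi    = div_up(X * hi, hmax);
    d.yi    = div_up(Y * vi, vmax);
    d.du_xi = div_up(d.xi, du_size);
    d.du_yi = div_up(d.yi, du_size);
    return d;
}
```

For the HV=3x1 component of the example image, `single_scan_dims(255, 257, 3, 1, 3, 4, 8)` gives 255 x 65 samples in 32 x 9 data units.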

Single scans contain exactly du_xi * du_yi data units per component, which is not necessarily the same as du_width/du_height (see partial MCU below).

The number of DUs coded for a component may differ between a single scan and an interleaved scan in which the component participates. Interleaved scans carry full MCU data; a single scan does not.

    +-------------------------------------+
    |                                     |<-- component's single scan
    |  +--------------+--------------+----|---------+
    |  |  DU  DU  DU  |  DU  DU  DU  | DU |  DU  DU |<-- component in interleaved scan
    |  |              |              |    |         |
    |  |  DU  DU  DU  |  DU  DU  DU  | DU |  DU  DU |
    |  +--------------+--------------+----|---------+
    |  |  DU  DU  DU  |  DU  DU  DU  | DU |  DU  DU |
    |  |              |              |    |         |
    |  |  DU  DU  DU  |  DU  DU  DU  | DU |  DU  DU |
    |  +--------------+--------------+----|---------+
    |  |  DU  DU  DU  |  DU  DU  DU  | DU |  DU  DU |
    +-------------------------------------+         |
       |  DU  DU  DU  |  DU  DU  DU  |  DU  DU  DU  |
       +--------------+--------------+--------------+

In the figure above, the last 2 columns and the last row of DUs for this component are not needed to create the final image, but they still appear in the compressed stream of an interleaved scan.

- A single scan contains exactly du_xi * du_yi data units per component
- An interleaved scan contains du_width * du_height data units per component

Due to different *rounding up*, these two might not be the same.

Unused DU-s are encoded, but discarded by the decoder.
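Whether a DU of an interleaved scan is one of the unused ones follows directly from the two coverages. A short sketch; both function names are made up for illustration:

```c
/* Returns 1 if the DU at (col, row) of a component's interleaved-scan
   plane contributes to the image, 0 if the decoder can discard it. */
int du_is_used(int col, int row, int du_xi, int du_yi)
{
    return col < du_xi && row < du_yi;
}

/* Number of DUs that are coded in an interleaved scan but would not be
   coded in a single scan of the same component. */
int du_padding_count(int du_width, int du_height, int du_xi, int du_yi)
{
    return du_width * du_height - du_xi * du_yi;
}
```

A decoder that stores every DU in a du_width x du_height plane needs no special case here: the unused DUs simply land in columns and rows that the conversion step never reads.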

Furthermore, in DCT-mode, due to rounding up to 8, unused coefficients (x) are not needed to create the final image, but are encoded and used in the DCT-process. Sample values of (x) are filled up with edge-samples by the encoder.

Example partials in DCT-mode:

    <---------------- xi ------------------>
    +------+ +------+ +------+ +------+ +------+
    |      | |      | |      | |      | |    xx|     Single Scan du_width
    |      | |      | |      | |      | |    xx|
    +------+ +------+ +------+ +------+ +------+

    +------+ +------+ +------+ +------+ +------+ +------+ +------+
    |      | |      | |      | |      | |    xx| |xxxxxx| |xxxxxx|     Interleaved Scan du_width
    |      | |      | |      | |      | |    xx| |xxxxxx| |xxxxxx|
    +------+ +------+ +------+ +------+ +------+ +------+ +------+
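On the encoder side, filling the (x) positions with edge samples can be done by clamping coordinates while an 8x8 block is read. A minimal sketch, assuming 8-bit samples stored row by row in a plane of xi x yi samples; `fetch_block` is a hypothetical helper, not an existing API:

```c
/* Copy the 8x8 block at DU position (du_col, du_row) out of a component
   plane of xi x yi samples. Positions beyond the right or bottom edge
   repeat the last column or row (edge-sample padding). */
void fetch_block(const unsigned char *samples, int xi, int yi,
                 int du_col, int du_row, unsigned char block[64])
{
    for (int r = 0; r < 8; r++) {
        int y = du_row * 8 + r;
        if (y > yi - 1)
            y = yi - 1;              /* clamp to the last row */
        for (int c = 0; c < 8; c++) {
            int x = du_col * 8 + c;
            if (x > xi - 1)
                x = xi - 1;          /* clamp to the last column */
            block[r * 8 + c] = samples[y * xi + x];
        }
    }
}
```

The same routine also covers DUs that lie entirely outside the xi x yi area, as in the interleaved case above: every sample of such a block is then a copy of an edge sample.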

In the SOF header, Hi/Vi for each component and X/Y of the image are specified.

First we compute mcu_width and mcu_height from Hmax and Vmax. Their product is the total number of MCUs in a possible interleaved scan.

Then, for each component, we compute du_width and du_height to allocate coefficient memory.

We also compute du_xi and du_yi for possible single scans.

During conversion we compute X x Y image pixels from xi x yi component samples.
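One simple way to pick the component sample for an image pixel is plain replication, scaling the pixel position by Hi/Hmax and Vi/Vmax. This is an assumption made for illustration (decoders may interpolate instead), and `sample_index` and its `stride` parameter are hypothetical:

```c
/* Index of the component sample that covers image pixel (x, y), assuming
   sample replication without interpolation. stride is the number of
   samples per row in the component's decoded plane. */
int sample_index(int x, int y, int hi, int vi, int hmax, int vmax, int stride)
{
    int sx = x * hi / hmax;   /* integer division rounds down */
    int sy = y * vi / vmax;
    return sy * stride + sx;
}
```

For a component with HV=1x1 in the example image, all 3 x 4 pixels of one MCU map to the same sample, while for HV=3x1 each pixel column has its own sample.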

2012 Attila Tarpai (tarpai76 at gmail)