Compressing interlaced video

2015 A. Tarpai (tarpai76 gmail)

Interlacing is a display device technique.

H.262 (MPEG-2) was the first standard to introduce dedicated tools for compressing interlaced video.

Capturing and transferring interlaced video involves just a stream of pictures. Interlacing happens at the end: on the display device, where alternating fields appear as one frame.

                                                     _____________  
                                                    /             \ 
+------+ +------+ +------+                          |             | 
|      | |      | |      |    - - - - - - - - >     |     T V     | 
+------+ +------+ +------+                          |             | 
                                                    \_____________/ 
                                                     display device
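
The relation between fields and a frame can be sketched in a few lines of Python. This is only an illustration: the names split_fields and weave are made up here, pictures are plain lists of rows, and the top field is assumed to hold the even-numbered lines.

```python
def split_fields(frame):
    """Split a frame (list of rows) into (top, bottom) fields."""
    # Assumption: top field = rows 0, 2, 4, ...; bottom field = rows 1, 3, 5, ...
    return frame[0::2], frame[1::2]


def weave(top, bottom):
    """Interleave two equal-height fields line by line into one frame."""
    if len(top) != len(bottom):
        raise ValueError("fields must have the same number of lines")
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


top = ["t0", "t1"]
bottom = ["b0", "b1"]
frame = weave(top, bottom)
print(frame)  # ['t0', 'b0', 't1', 'b1']
```

Splitting the woven frame again gives back the two original fields, so nothing is lost by storing a field pair as one double-height picture.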

Interlace vs. progressive

The codec is supposed to reconstruct the same pictures as the source and send them to the display device (no longer drawn in the diagrams below).

Interlaced input                        _______________________                        
                                       |                       |           
 _______   _______   _______           |                       |           _______   _______   _______ 
|       | |       | |       |  ------> |         CODEC         | ------>  |       | |       | |       |
|_______| |_______| |_______|          |                       |          |_______| |_______| |_______|       
                                       |_______________________|          

Progressive input                       _______________________                        
 _______   _______   _______           |                       |           _______   _______   _______  
|       | |       | |       |          |                       |          |       | |       | |       | 
|       | |       | |       |  ------> |         CODEC         | ------>  |       | |       | |       | 
|       | |       | |       |          |                       |          |       | |       | |       | 
|_______| |_______| |_______|          |                       |          |_______| |_______| |_______| 
                                       |_______________________|           

In this sense, even MPEG-1 can encode/decode interlaced video. MPEG-1 simply does not use interlace tools as MPEG-2 does, and it allocates field-size forward/backward prediction buffers:

                                        _______________________                        
                                       |                       |           
 _______   _______   _______           |        MPEG-1         |           _______   _______   _______ 
|       | |       | |       |  ------> |         CODEC         | ------>  |       | |       | |       |
|_______| |_______| |_______|          |                       |          |_______| |_______| |_______|     
                                       |_______________________|          
                                              |         |                  
                                           ___|___   ___|___             
                                          |   fw  | |   bw  |                 
                                          |_______| |_______|                 
                                            picture buffer

But when the MPEG-2 codec is told that the video source is interlaced, it can apply further techniques to gain better compression, at the cost of complexity. MPEG-2 will then buffer the source pictures into frames for analysis and make frame/field (fr/fi) decisions. The picture buffer is always a frame buffer (the matching progressive format is shown for comparison):

Interlaced input                        _______________________                              
                                       |                       |           
 _______   _______   _______           |       MPEG-2          |           _______   _______   _______ 
|       | |       | |       |  ------> |        CODEC          | ------>  |       | |       | |       |
|_______| |_______| |_______|          |                       |          |_______| |_______| |_______|
                                       |                       |          
                                       |_______________________|          
                                             |          |             
                                          ___|___    ___|___                
                                         |   fw  |  |   bw  |               
                                         |_______|  |_______|               
                                         |       |  |       |               
                                         |_______|  |_______|               
                                             frame buffer   


Progressive input                       _______________________                        
 _______   _______   _______           |                       |           _______   _______   _______  
|       | |       | |       |          |                       |          |       | |       | |       | 
|       | |       | |       |  ------> |         CODEC         | ------>  |       | |       | |       | 
|       | |       | |       |          |                       |          |       | |       | |       | 
|_______| |_______| |_______|          |                       |          |_______| |_______| |_______| 
                                       |_______________________|           
                                             |          |             
                                          ___|___    ___|___                
                                         |   fw  |  |   bw  |               
                                         |       |  |       |             
                                         |       |  |       |               
                                         |_______|  |_______|           
                                             frame buffer     

For the MPEG-2 decoder this means that when interlaced video is signalled, double-height buffers (frame buffers) are required.
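
To put numbers on "double-height", here is a rough size calculation. It assumes 8-bit 4:2:0 sampling and a 720x576 frame purely as an example; picture_bytes is a hypothetical helper, not part of any codec API.

```python
def picture_bytes(width, height):
    """Bytes for one 8-bit 4:2:0 picture (assumed format)."""
    luma = width * height
    # Two chroma planes, each subsampled by 2 horizontally and vertically.
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


field_buffer = picture_bytes(720, 288)   # one field of a 720x576 frame
frame_buffer = picture_bytes(720, 576)   # the full frame
print(field_buffer, frame_buffer)        # 311040 622080
```

Each reference buffer doubles in size, and there are two of them (fw and bw).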

As an example, given these 6 input field pictures, MPEG-2 in interlaced mode may work like this: 0/1 is the forward reference frame, 4/5 is the backward reference frame, and 2/3 is the current frame (this requires reordering in the decoder):

 _______   _______   _______   _______   _______   _______ 
|    0  | |    1  | |    2  | |    3  | |    4  | |    5  | ...
|_______| |_______| |_______| |_______| |_______| |_______|
	                                                       
                                           
   fw            curr           bw             
 _______       _______       _______                                   
|    0  |     |    2  |     |    4  |                                  
|_______| <-- |_______| --> |_______|                                  
|    1  |     |    3  |     |    5  |                                  
|_______|     |_______|     |_______|                                  
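
The pairing and reordering in this example can be written down as a small sketch. The function names are invented for illustration, and the group is assumed to be exactly one fw/curr/bw triple as in the drawing.

```python
def pair_fields(fields):
    """Group consecutive field pictures into (first, second) frame pairs."""
    if len(fields) % 2 != 0:
        raise ValueError("need an even number of fields")
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields), 2)]


def coding_order(frames):
    """Reorder [fw, curr, bw] so both references come before the current frame."""
    fw, curr, bw = frames
    # curr is predicted from fw and bw, so it can only be coded after them.
    return [fw, bw, curr]


frames = pair_fields([0, 1, 2, 3, 4, 5])
print(frames)                # [(0, 1), (2, 3), (4, 5)]
print(coding_order(frames))  # [(0, 1), (4, 5), (2, 3)]
```

The decoder receives the frames in the second order and has to put 2/3 back between 0/1 and 4/5 before display.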

PAFF and MBAFF

The principle is the same for both: the encoder de-interlaces the input (weaves field pairs into frames) and makes frame/field (fr/fi) decisions:

Moving scene:


                (-1;-1)               (-1;-1)                (-1;0)                                 Field Prediction
   ..####......          ....##......          ............          ............                   Moving object, captured as fields. 
   ...##.......  ---->   ...####.....  ---->   .....##.....  ---->   ......##....                       Almost pure (error-free) prediction. 
   ............          ....##......          ....####....          .....####...
   ............          ............          .....##.....          ......##....
                                            
                                             Field Prediction
  
   
   
   
                         ..####......                                ............                   De-interlaced format: awful artifacts! 
                         ....##......            (?;?)               ............                       Poor prediction with a lot of prediction error. 
                         ...##.......                                .....##.....                       
                         ...####.....          --------->            ......##....
                         ............                                ....####....
                         ....##......                                .....####...
                         ............                                .....##.....
                         ............                                ......##....
                                             Frame Prediction
   

                         
                                
Frame Prediction: still scene


                 (0;0)                 (0;0)                 (0;0)
   ....##......          ....##......          ....##......          ....##......                 Still object (background).
   ...####.....  ---->   ...####.....  ---->   ...####.....  ---->   ...####.....                    A little redundant to predict 3 times as fields. 
   ....##......          ....##......          ....##......          ....##......
   ............          ............          ............          ............
                         
                                             Field Prediction
                                                                                                     

                         ....##......                                ....##......                 De-interlaced: simply predict as frame              
                         ....##......                                ....##......                     one MV only, little error
                         ...####.....            (0;0)               ...####.....                                
                         ...####.....          --------->            ...####.....                                
                         ....##......                                ....##......                                
                         ....##......                                ....##......                                
                         ............                                ............                                
                         ............                                ............
                                             Frame Prediction
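
The two drawings can be checked numerically. The sketch below is not an encoder: it copies the ASCII fields above as 0/1 pixels, runs an exhaustive motion search, and uses the sum of absolute differences (SAD) as the prediction error. The helper names, the search range and the rule that samples outside the reference count as 0 are all assumptions made for this illustration. Motion vectors are (dx, dy) pointing into the reference.

```python
def to_pixels(rows):
    """'#' -> 1, '.' -> 0."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def weave(top, bottom):
    """Interleave two fields line by line into one frame."""
    frame = []
    for t, b in zip(top, bottom):
        frame += [t, b]
    return frame


def sad(cur, ref, dx, dy):
    """Prediction error of cur when predicted from ref displaced by (dx, dy)."""
    h, w = len(cur), len(cur[0])
    total = 0
    for y in range(h):
        for x in range(w):
            ry, rx = y + dy, x + dx
            inside = 0 <= ry < h and 0 <= rx < w
            total += abs(cur[y][x] - (ref[ry][rx] if inside else 0))
    return total


def best_match(cur, ref, search=3):
    """Exhaustive search: returns (smallest SAD, (dx, dy))."""
    return min((sad(cur, ref, dx, dy), (dx, dy))
               for dy in range(-search, search + 1)
               for dx in range(-search, search + 1))


# Moving scene: the four fields from the drawing.
A = to_pixels(["..####......", "...##.......", "............", "............"])
B = to_pixels(["....##......", "...####.....", "....##......", "............"])
C = to_pixels(["............", ".....##.....", "....####....", ".....##....."])
D = to_pixels(["............", "......##....", ".....####...", "......##...."])

print(best_match(B, A)[0])  # 2: part of the object was not visible in A
print(best_match(C, B))     # (0, (-1, -1))
print(best_match(D, C))     # (0, (-1, 0))

# Same fields woven into two frames: one vector can no longer describe the motion.
moving_error, moving_mv = best_match(weave(C, D), weave(A, B))
print(moving_error)         # greater than 0

# Still scene: every field is identical, so one frame vector is enough.
S = to_pixels(["....##......", "...####.....", "....##......", "............"])
print(best_match(weave(S, S), weave(S, S)))  # (0, (0, 0))
```

For the moving object the field vectors come out as in the first drawing and the error is close to zero, while the woven frames leave a residual whichever single vector is tried. For the still object, frame prediction reaches zero error with one vector instead of two.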

At the picture level (PAFF), this is what it looks like:

                                                                                 _______   _______              
                                _______                              ------->   |       | |       | coded fields
 _______   _______             |       |                           /            |_______| |_______|             
|       | |       |   ------>  |       |      ..analysis..   -----                                            
|_______| |_______|            |       |                           \             _______                        
                               |_______|                             ------->   |       |                       
    source pictures                                                             |       | coded frame  
                              Frame buffer                                      |       |                       
                                                                                |_______|                       
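
What exactly happens in the "analysis" box is up to the encoder. As one possible stand-in (an assumption for illustration, not a method prescribed by any standard), the sketch below looks only at the buffered frame: if neighbouring frame lines differ more than neighbouring lines of the same field, the picture is probably combed by motion and is coded as two fields. The names line_difference and paff_decision are hypothetical.

```python
def to_pixels(rows):
    """'#' -> 1, '.' -> 0."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def line_difference(rows_a, rows_b):
    """Sum of absolute pixel differences between paired lines."""
    return sum(abs(a - b)
               for row_a, row_b in zip(rows_a, rows_b)
               for a, b in zip(row_a, row_b))


def paff_decision(frame):
    """Return 'fields' or 'frame' for one buffered (woven) frame."""
    # Lines i and i+1 belong to different fields.
    inter_field = line_difference(frame[:-1], frame[1:])
    # Lines i and i+2 belong to the same field.
    intra_field = line_difference(frame[:-2], frame[2:])
    return "fields" if inter_field > intra_field else "frame"


# A bar that jumped 4 pixels between the two fields: strong combing.
combed = to_pixels(["..##........", "......##...."] * 4)
# The still object from the drawing above, woven from two identical fields.
still = to_pixels(["....##......"] * 2 + ["...####....."] * 2 +
                  ["....##......"] * 2 + ["............"] * 2)

print(paff_decision(combed))  # fields
print(paff_decision(still))   # frame
```

A real encoder may instead code the picture both ways and keep the cheaper result; this heuristic only avoids doing the work twice.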

At the macroblock level (MBAFF), this decision is refined further within such a de-interlaced coded frame, separating still background from moving foreground objects.

The encoder now performs the same analysis for each 16x16 area: it finds the best-matching frame area or the 2 field areas, whichever gives the least prediction error. Yes, even MPEG-2 does this, although MBAFF is an AVC term and the actual implementation there is a little different:

	=== === === === === === === === 
	=== === === === === === === === 
	=== === === === === === === ===            ===     frame macroblock          : pixel data from both top and bottom field, alternating          
	=== === === === === === === ===            ---     top-field macroblock      : pixel data only from the top field                          
	=== === --- --- --- --- === ===            ___     bottom-field macroblock   : only from bottom field   
	=== === ___ ___ ___ ___ === ===                                            
	=== === === --- --- --- --- ===            
	=== === === ___ ___ ___ ___ === 
	=== === === === --- --- === === 
	=== === === === ___ ___ === === 
	=== === === === === === === === 
	=== === === === === === === === 
	
	          MBAFF frame
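
A per-macroblock version of the decision might look like the sketch below. It is a simplification under several assumptions: pictures are lists of pixel rows whose size is a multiple of the macroblock size, the error measure is SAD with a small exhaustive search, samples outside the reference count as 0, and each field half is predicted only from the reference field of the same parity. All names are made up for this example.

```python
def block_sad(cur, ref, x0, y0, w, h, dx, dy):
    """SAD of the w x h block at (x0, y0) against ref displaced by (dx, dy)."""
    total = 0
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            ry, rx = y + dy, x + dx
            inside = 0 <= ry < len(ref) and 0 <= rx < len(ref[0])
            total += abs(cur[y][x] - (ref[ry][rx] if inside else 0))
    return total


def best_sad(cur, ref, x0, y0, w, h, search):
    """Smallest SAD over all displacements within +/- search."""
    return min(block_sad(cur, ref, x0, y0, w, h, dx, dy)
               for dy in range(-search, search + 1)
               for dx in range(-search, search + 1))


def mb_modes(cur, ref, mb=16, search=2):
    """Choose 'frame' or 'field' prediction for every macroblock of cur."""
    cur_top, cur_bot = cur[0::2], cur[1::2]
    ref_top, ref_bot = ref[0::2], ref[1::2]
    modes = []
    for y0 in range(0, len(cur), mb):
        row = []
        for x0 in range(0, len(cur[0]), mb):
            frame_cost = best_sad(cur, ref, x0, y0, mb, mb, search)
            # The same area seen as two half-height field blocks.
            field_cost = (
                best_sad(cur_top, ref_top, x0, y0 // 2, mb, mb // 2, search)
                + best_sad(cur_bot, ref_bot, x0, y0 // 2, mb, mb // 2, search))
            row.append("field" if field_cost < frame_cost else "frame")
        modes.append(row)
    return modes


def to_pixels(rows):
    """'#' -> 1, '.' -> 0."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


# Toy 8x4 picture with 4x4 "macroblocks": the left half is still,
# in the right half only the bottom field has moved by one pixel.
ref = to_pixels(["#....#..", ".#...#..", "..#..#..", ".#...#.."])
cur = to_pixels(["#....#..", ".#....#.", "..#..#..", ".#....#."])
print(mb_modes(cur, ref, mb=4, search=1))  # [['frame', 'field']]
```

The still block keeps frame prediction (one vector), while the block with moving content switches to two field predictions, which is the pattern the MBAFF drawing shows.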