http://halicery.com/3Video Decoders/Sample interpolation.html

Sample interpolation

Unfinished..

Background

It is claimed that inter prediction may result in less error from sub-pixel positions. These sample values has to be computed by some kind of interpolation during encoding/decoding to form the N x M prediction block.

That is moving the search block fractional pixels and compute the prediction error using interpolated sample values during motion estimation by the encoder. On the decoder side using sub-pixel precision motion vectors (mv), sample values are computed by the same interpolation.

Example: this 4x4 area of the image moving half pixel to the right:

X     X     X     X          X  o  X  o  X  o  X  o        
                                                           
X     X     X     X          X  o  X  o  X  o  X  o            X: sample             
                      --->                                     o: interpolated sample
X     X     X     X          X  o  X  o  X  o  X  o

X     X     X     X          X  o  X  o  X  o  X  o

What are the values of o?

Linear interpolation (average)

The simplest way of computing values of o in this case is by averaging the left- and right neighbours (a special case of linear interpolation where distances from neighbours are the same):

             A o B                   o = ( A + B ) / 2

On the right edge the reference block has to be extended to (N+1) to provide neighbours for the edge-samples:

X  o  X  o  X  o  X  o  X  
                           
X  o  X  o  X  o  X  o  X          5x4 ref block  -->  4x4 linear interpolated 
                           
X  o  X  o  X  o  X  o  X  
                           
X  o  X  o  X  o  X  o  X  

The vertical case is similar, extended to (M+1):

X     X     X     X    
o     o     o     o 
X     X     X     X                4x5 ref block  -->  4x4 linear interpolated 
o     o     o     o 
X     X     X     X    
o     o     o     o 
X     X     X     X    
o     o     o     o 
X     X     X     X    

When both the vertical- and horizontal movement is fractional, the reference block is extended to (N+1)x(M+1):

X     X     X     X     X
   o     o     o     o   
X     X     X     X     X          5x5 ref block  -->  4x4 bilinear interpolated 
   o     o     o     o   
X     X     X     X     X
   o     o     o     o   
X     X     X     X     X
   o     o     o     o   
X     X     X     X     X

Bilinear interpolation is used, which needs 4 neighbours:

A       B
  \   /                o = ( A + B + C + D ) / 4 
    o     
  /   \   
C       D

FIR filter

Another way of interpolation is to look beyond the closest neighbours and calulate some weighted average. This is called Finite Impulse Response filtering (FIR) and belongs to DSP. Samples are called tap.

Example 6-tap FIR of AVC. We extend the neighbourhood from 2 to 6 samples to calculate o using weighted average (weights are constant for AVC):

     1  -5  20  20  -5   1
     ---------------------
     R   S   A o B   K   L           o = ( 1 x M - 5 x N + 20 x A + 20 x B - 5 x Q + 1 x R ) / 32

For AVC 6-tap, the reference block is extended left (by 2) and right (by 3) to provide 3+3 neighbours for the edge-samples:

      +2                       +3
   +-------+-------------+-----------+
   |       |             |           |
   |X   X  |X o X o X o X|o X   X   X|
   |       |             |           |
   +-------+-------------+-----------+

Example 8-tap FIR of MPEG-4. Extend neighbourhood to 8 samples to calculate o:

-8  24 -48 160 160 -48  24  -8 
-------------------------------
 Q   R   S   A o B   K   L   M       o = ( -8 x Q + 24 x R - 48 x S + 160 x A + 160 x B - 48 x K + 24 x L - 8 x M ) / 256

MPEG-4 8-tap doesn't extend the reference block beyond bilinear (N+1)x(M+1), it uses boundary mirroring of 3 edge samples to provide 4+4 neighbours:

                           +1
           +-------------+---+
           |             |   |
C   B   A  |A o B o C o D|o E|  E   D   C
           |             |   |
           +-------------+---+

Other FIR used in video compression standards:

Inter prediction precision

A  B  C                Integer sample
D  E  F
G  H  I                   H.261, MPEG-1
A  .  B  .  C  .         
.  .  .  .  .  .
D  .  E  .  F  .           Half sample
.  .  .  .  .  .
G  .  H  .  I  .           MPEG-1/2, H.263, MPEG-4 bilinear interpolation
.  .  .  .  .  .
A  .  .  .  B  .  .  .  C  .  .  .       
.  .  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .         Quarter sample
.  .  .  .  .  .  .  .  .  .  .  .         
D  .  .  .  E  .  .  .  F  .  .  .         AVC luma 6-tap FIR
.  .  .  .  .  .  .  .  .  .  .  .         MPEG-4 luma 8-tap FIR
.  .  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  
G  .  .  .  H  .  .  .  I  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  
A  .  .  .  .  .  .  .  B  .  .  .  .  .       
.  .  .  .  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  .  .         1/8 sample
.  .  .  .  .  .  .  .  .  .  .  .  .  .         
.  .  .  .  .  .  .  .  .  .  .  .  .  .         AVC 4:2:0 sub-sampled chroma bilinear interpolation
.  .  .  .  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  .  .  
D  .  .  .  .  .  .  .  E  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  .  .  .  

Interpolation in the Decoder

MV

Motion vector component is a 2's-complement integer.

Int and Frac

The integer value interpreted as integer- and fractional part.

Here is a quarter precision mv horizontal component of +/-11:

                                -11                              +11
                <-------------------------------+------------------------------->
                                                |
X  .  .  .  X  .  .  .  X  .  .  .  X  .  .  .  X  .  .  .  X  .  .  .  X  .  .  .  X  .
            |           |                       |                       |           | 
            |           |                     curr                      |           | 
            |___________|                                               |___________| 
                                                                                      
           -3          -2          -1           0           1           2           3 

Moving +/-11 quarter pixels results in different integer left neighbour (start position for sample fetch):

Moving +/-11 quarter pixels results in different fractional position (different interpolation equations):


Fortunately, CPU integer operations asl and and gives exactly the correct results and is very fast! This is >> and & in C.

                mv         >> 2      & 3    frac

...
11111110 11     -5         -2          3     3/4
11111111 00     -4         -1          0     0
11111111 01     -3         -1          1     1/4
11111111 10     -2         -1          2     1/2
11111111 11     -1         -1          3     3/4
00000000 00      0          0          0
00000000 01      1          0          1     1/4
00000000 10      2          0          2     1/2
00000000 11      3          0          3     3/4
00000001 00      4          1          0     0
00000001 01      5          1          1     1/4
...

And it works for all precision:

                                  Int           Frac
Half sample precision             >> 1          & 1
Quarter sample precision          >> 2          & 3
Eights sample precision           >> 3          & 7


Sample fetch

To predict an N x M block by interpolation requires an extended reference sample block, depending on the method.

For all (bi)linear and peculiarly MPEG-4 ASP 8-tap:

(N+1)x(M+1) from (Intx; Inty)

                 bi                                       h                                       v
                 
                             +1                                      +1
        +-------------------+-+                 +-------------------+-+                 +-------------------+
        |                   | |                 |                   | |                 |                   |
        |                   | |                 |                   | |                 |                   |
        |                   | |                 |                   | |                 |                   |
        |                   | |                 |                   | |                 |                   |
        |                   | |                 |                   | |                 |                   |
        |                   | |                 |                   | |                 |                   |
        |                   | |                 |                   | |                 |                   |
        +-------------------+-+                 +-------------------+-+                 +-------------------+
     +1 +-------------------+-+                                                      +1 +-------------------+

For AVC 6-tap luma:

(N+5)x(M+5) from (Intx - 2; Inty - 2)

                bi                                       h                                       v
                
  -2                          +3                                                                                           
    +-+-+-------------------+-+-+-+                                                     +-------------------+    
-2  +-+-+-------------------+-+-+-+          -2                       +3            -2  +-------------------+         
    +-+-+-------------------+-+-+-+         +-+-+-------------------+-+-+-+             +-------------------+    
    | | |                   | | | |         | | |                   | | | |             |                   |    
    | | |                   | | | |         | | |                   | | | |             |                   |    
    | | |                   | | | |         | | |                   | | | |             |                   |    
    | | |                   | | | |         | | |                   | | | |             |                   |    
    | | |                   | | | |         | | |                   | | | |             |                   |    
    | | |                   | | | |         | | |                   | | | |             |                   |    
    | | |                   | | | |         | | |                   | | | |             |                   |    
    +-+-+-------------------+-+-+-+         +-+-+-------------------+-+-+-+             +-------------------+    
    +-+-+-------------------+-+-+-+                                                     +-------------------+    
+3  +-+-+-------------------+-+-+-+                                                 +3  +-------------------+    
    +-+-+-------------------+-+-+-+                                                     +-------------------+    

The horizontal/vertical cases are just optimizations to reduce pixel fetch from the reference buffers.


AVC chroma bilinear interpolation with 1/8 precision

A true linear interpolation: 1/8 samples are computed by weighted average of 4 neghbours. Weights are distances:

X  .  .  .  .  .  .  .  X
.  .  .  .  .  .  .  .                 X: integer samples
.  .  .  .  .  .  .  .                 .: 8th fractional samples
.  .  .  .  .  .  .  .          
.  .  .  .  .  .  .  .   
.  .  .  .  .  .  .  .   
.  .  .  .  .  .  .  .          
.  .  .  .  .  .  .  .          
X                       X

AVC chroma mv-s

To make 8th precision mv from 4th precision luma mv depends on sub-sampling.

When sub-sampled (eg. 4:2:0 vertical): keep the quarter luma mv as-is but with different int/frac:

   int     frac                  int    frac
100101010  10       ---->     10010101  010  

When not sub-sampled (eg. 4:2:2 vertical): either opt it out or to use the same 8th routines, scale the vertical component by <<1, so int is the same but frac is doubled:

   int     frac                  int    frac
100101010  10       ---->    100101010  100  

          - 1/2       + 1/2
               <-- -->
               |     |
               |     |
     -2    -1  |  0  |  1     2
---------------+-----+---------------
      X     X  o  X  o  X     X   
      
            |     |     |       
            |_____|_____|       
               |     |       

Ref sample block for interpolation

(Bi)linear interpolation

From closest neighbours:

A - h - B
| \   /                  h = ( A + B ) / 2
v   b                    v = ( A + C ) / 2
| /   \                  b = ( A + B + C + D ) / 4
C       D

or by CPU integer arithmetics:
o = ( A + B + 1) >> 1

Here o is by averaging 4 neighbours (again, a special case of bilinear interpolation where distance from neighbours are the same):

o = ( A + B + C + D ) / 4 
o = ( A + B + C + D + 2 ) >> 2 

Summary: half sample interpolation by linear interpolation

X - o - X
| \   /  
o   o     
| /   \   
X       X
       
       
    Reference 4x4  
+----------------------+
| X  o  X  o  X  o  X  |     all possible half positions of interpolated 3x3
| o  o  o  o  o  o     |
| X  o  X  o  X  o  X  |
| o  o  o  o  o  o     |
| X  o  X  o  X  o  X  |
| o  o  o  o  o  o     |
| X     X     X     X  |
+----------------------+

AVC EXAMPLE

Interpolated values are computed on-the-fly during prediction by the decoder.

Eg. below: xFrac=3, yFrac=2: Standard calls this k value. How to calculate its value?

A  .  .  .  B  .  .  .  C  .  .  
.  .  .  .  .  .  .  .  .  .  .  
.  .  .  k  .  .  .  k  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  
D  .  .  .  E  .  .  .  F  .  .  
.  .  .  .  .  .  .  .  .  .  .  
.  .  .  k  .  .  .  k  .  .  .  
.  .  .  .  .  .  .  .  .  .  .  
G  .  .  .  H  .  .  .  I  .  .  
.  .  .  .  .  .  .  .  .  .  .  

k equals average of j and m:

A  .  .  .  B  .  .  .  C  .  . 
.  .  .  .  .  .  .  .  .  .  . 
.  .  j  k  m  .  j  k  m  .  . 
.  .  .  .  .  .  .  .  .  .  . 
D  .  .  .  E  .  .  .  F  .  . 
.  .  .  .  .  .  .  .  .  .  . 
.  .  j  k  m  .  j  k  m  .  . 
.  .  .  .  .  .  .  .  .  .  . 
G  .  .  .  H  .  .  .  I  .  . 
.  .  .  .  .  .  .  .  .  .  . 

Because we need m anyway at the end:

Unrestricted motion vectors


                   +---------+
+---------+        |         |
|         |        |         |                   
|      +--|--------|---------|-------------+ 
|      |  |        +---------+             | 
+---------+                                |           
       |                                   |           
       |                                   |           
       |                                   |           
       |                                   |           
       |                                   |           
       |                                   |           
       |                                   |           
       |                                   |           
       |                 +---------+      +---------+  
       +-----------------|---------|------|+        |  
                         |         |      |         |
                         |         |      |         |
                         +---------+      +---------+
             
                         +---------+                   
                         |         |                   
                         |         |                   
                         |         |                   
                         +---------+ 

.

Edge samples

0     0     0     1     2     

0     0     0     1     2     
         +-----+-----+-----+--
0     0  |  0  |  1  |  2  |
         +-----+-----+-----+--
3     3  |  3  |  4  |  5  |
         +-----+-----+-----+--
6     6  |  6  |  7  |  8  |
         +-----+-----+-----+--
         |     |     |     |

.

Mirror samples

MPEG-4's quarter luma interpolation for half-position samples to keep the same ref sample numbers as half-sample interpolation, but using 8-tap:


Sun Jun 17 22:30:11 UTC+0200 2018 © A. Tarpai