# Fast Integer IDCT

As part of a homebrew JPEG decoder I had to deal with the Inverse Discrete Cosine Transform, IDCT. Pretty interesting and hard topic, especially to implement it in C that is small and fast.

It has a *huge* literature: it's the heart of the main methods used to compress video and audio data in jpeg/mpeg/audio, both in software and silicon.

## Analysis

- Discrete Cosine Transform (DCT) was first introduced in 1974 by Ahmed et al. using real numbers.
- Few fast algorithms for the one-dimensional DCT (1-D DCT), the classical DCT/IDCT algorithms:
- Chen ‘77 the first one
- Wang ‘84
- Lee ‘84
- Vetterli ‘84
- AAN ‘88
- LLM ‘89

- all uses 12 multiplications and 29 additions
- LLM 1989 (C. Loeffler, A. Ligtenberg, and G. S. Moschytz, Practical fast 1-D DCT algorithms with 11 multiplications)

I’ve been fascinated by Löffler’s paper, its clean and simple solution. After a few but strict additions/multiplications the result appears on the ‘right side’ of this figure:

Image source: Reznik, Hindsy, Zhangz, Yuz, and Ni, Efficient Fixed-Point Approximations of the 8x8 Inverse Discrete Cosine Transform

## 1-D integer IDCT

The fig above shows the 1-D transform. It uses 7 constants (see below) to calculate from 8 input values. JPEG compresses by 8x8 pixel blocks.

Here is an example implementation in C, integer (fix-point) arithmetics used, we assume that our cpu can multiply too. I know that this routine can be improved on many ways... but this is the shortest 1D-IDCT in code I can come up with. It took 3 days anyway:

#define xadd3(xa,xb,xc,xd,h) p=xa+xb, n=xa-xb, xa=p+xc+h, xb=n+xd+h, xc=p-xc+h, xd=n-xd+h // triple-butterfly-add (and possible rounding)
#define xmul(xa,xb,k1,k2,sh) n=k1*(xa+xb), p=xa, xa=(n+(k2-k1)*xb)>>sh, xb=(n-(k2+k1)*p)>>sh // butterfly-mul equ.(2)
static void idct1(int *x, int *y, int ps, int half) // 1D-IDCT
{
int p, n;
x[0]<<=9, x[1]<<=7, x[3]*=181, x[4]<<=9, x[5]*=181, x[7]<<=7;
xmul(x[6],x[2],277,669,0);
xadd3(x[0],x[4],x[6],x[2],half);
xadd3(x[1],x[7],x[3],x[5],0);
xmul(x[5],x[3],251,50,6);
xmul(x[1],x[7],213,142,6);
y[0*8]=(x[0]+x[1])>>ps;
y[1*8]=(x[4]+x[5])>>ps;
y[2*8]=(x[2]+x[3])>>ps;
y[3*8]=(x[6]+x[7])>>ps;
y[4*8]=(x[6]-x[7])>>ps;
y[5*8]=(x[2]-x[3])>>ps;
y[6*8]=(x[4]-x[5])>>ps;
y[7*8]=(x[0]-x[1])>>ps;
}
void idct(int *b) // 2D 8x8 IDCT
{
int i, b2[64];
for (i=0; i<8; i++) idct1(b+i*8, b2+i, 9, 1<<8); // row
for (i=0; i<8; i++) idct1(b2+i*8, b+i, 12, 1<<11); // col
}

20101013

© 2010 Attila Tarpai (tarpai76 gmail)