article

Golomb coding is a form of entropy encoding invented by Solomon W. Golomb that is optimal for alphabets following geometric distributions, that is, when small values are vastly more common than large values. If negative values are present in the input stream then a overlap and interleave scheme is used, where all non-negative inputs are mapped to even numbers (x'=2x), and all negative numbers are mapped to odd numbers (x'=2.abs(x)+1).

It uses a tunable parameter b to divide an input value into two parts: q, the result of a division by b, and r, the remainder. The quotient is sent in unary coding, followed by the remainder in truncated binary encoding. When b=1 Golomb coding is equivalent to unary coding.

Formally, the two parts are given by the following expression, where x is the number being encoded:

q = \left \lfloor \frac{\left (x-1 \right )}{b} \right \rfloor

and

r = x-qb-1

The final result looks like:

\left (q+1 \right ) r

Note that r can be of a varying number of bits, and is specifically only b-bits for Rice code, and switches between b,b+1 bits for Golomb code (i.e b is not a power of 2); when r lies between 0,2^log2(b)-b, r takes a b bit representation, and when r is in the interval 2^log2(b) to b, we have a b+1 bit representation.

The parameter b is a function of the corresponding Bernoulli process, which is parameterized by p=P(X=0) the probability of success in a given Bernoulli Trial. b and p are related by these inequalities:

(1-p)^b + (1-p)^{b+1} \leq 1 < (1-p)^{b-1} + (1-p)^b

The Golomb code for this distribution is equivalent to the Huffman code for the same probabilities, if it were possible to compute the Huffman code.

Rice coding is a special case of Golomb coding first described by Robert Rice. It is equivalent to Golomb coding where the tunable parameter is a power of two. This makes it extremely efficient for use on computers, since the division operation becomes a bitshift operation and the remainder operation becomes a bitmask operation.

Simple Algorithm


1. Fix the parameter M to an integer value.

2. For the number to be encoded, say N, find out

    1. quotient = int*
    2. remainder = N%M
3. Generate Codeword
    1. The Code format :
    2. where
    3. a) Quotient Code
      1. Write quotient in unary coding
    4. b) Remainder Code
      1. a) if M is power of 2 , code remainder as binary format. So log2(M) bits are needed.
      2. b) Otherwise code first 2^ceil(log2(M))-M values as plain binary using floor(log2(M)) bits.
      3. Code rest values using ceil(log2(M)) bits, but value is changed to "value+2^ceil(log2(M))-M".

Applications


The FLAC audio codec uses Rice coding to represent linear prediction residues.
Apple's ALAC audio codec uses modfied Rice coding after its Adaptive FIR filter.

References


  • Golomb, S.W. (1966). Run-length encodings. IEEE Transactions on Information Theory, IT--12(3):399--401 *
  • R. F. Rice, "Some Practical Universal Noiseless Coding Techniques, " Jet Propulsion Laboratory, Pasadena, California, JPL Publication 79--22, Mar. 1979.
  • Witten, Ian Moffat, Alistair Bell, Timothy. "Managing Gigabytes: Compressing and Indexing Documents and Images." Second Edition. Morgan Kaufmann Publishers, San Francisco CA. 1999 ISBN 1-55860-570-3
  • David Salomon. "Data Compression",ISBN 0-387-95045-1.

Lossless compression algorithms

Golomb Code

 

This article is licensed under the GNU Free Documentation License. It uses material from the "Golomb coding".

Home Pageartsbusinesscomputersgameshealthhospitalshomekids & teensnewsphysiciansrecreationreferenceregionalscienceshoppingsocietysportsworld