MPEG-2 is the designation for a group of coding and compression standards for audio and video (AV), agreed upon by MPEG (Moving Picture Experts Group) and published as the ISO/IEC 13818 international standard. MPEG-2 is typically used to encode audio and video for broadcast signals, including direct broadcast satellite and cable TV. MPEG-2, with some modifications, is also the coding format used by standard commercial DVD movies. Using MPEG-2 requires paying licensing fees to the patent holders via MPEG LA.
MPEG-2 includes a Systems part (part 1) that defines two distinct (but related) container formats. One is Transport Stream, which is designed to carry digital video and audio over somewhat-unreliable media. MPEG-2 Transport Stream is commonly used in broadcast applications, such as ATSC and DVB. MPEG-2 Systems also defines Program Stream, a container format that is designed for reasonably reliable media such as disks. MPEG-2 Program Stream is used in the DVD and SVCD standards.
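As a rough illustration of why Transport Stream suits unreliable media, the sketch below decodes the fixed 4-byte header of a single transport packet. The packet size (188 bytes), the sync byte (0x47) and the bit layout are taken from the Systems specification rather than from the text above; the names `TsHeader` and `parse_ts_header` and the example packet are hypothetical.

```python
from typing import NamedTuple

TS_PACKET_SIZE = 188   # bytes per transport stream packet
TS_SYNC_BYTE = 0x47    # first byte of every packet


class TsHeader(NamedTuple):
    payload_unit_start: bool   # set when a new payload unit begins in this packet
    pid: int                   # 13-bit packet identifier
    continuity_counter: int    # 4-bit counter, used to detect lost packets


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the fixed 4-byte header of one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport stream packet is exactly 188 bytes")
    if packet[0] != TS_SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    payload_unit_start = bool(packet[1] & 0x40)
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    continuity_counter = packet[3] & 0x0F
    return TsHeader(payload_unit_start, pid, continuity_counter)


# Hypothetical packet: PID 0x0100, payload unit start set, counter 5.
example = bytes([0x47, 0x41, 0x00, 0x15]) + bytes(184)
print(parse_ts_header(example))
```

Because every packet has the same length and starts with the same byte, a receiver that loses data can find the next packet boundary again, and a gap in the continuity counter reveals that packets were dropped.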
The Video part (part 2) of MPEG-2 is similar to MPEG-1, but also provides support for interlaced video (the format used by analog broadcast TV systems). MPEG-2 video is not optimized for low bit-rates (less than 1 Mbit/s), but outperforms MPEG-1 at 3 Mbit/s and above. All standards-conforming MPEG-2 Video decoders are fully capable of playing back MPEG-1 Video streams.
With some enhancements, MPEG-2 Video and Systems are also used in most HDTV transmission systems.
The MPEG-2 Audio part (Part 3 of the standard) enhances MPEG-1's audio by allowing the coding of audio programs with more than two channels. This is done in a backwards-compatible way, so that MPEG-1 audio decoders can still decode the two main stereo components of the presentation.
Part 7 of the MPEG-2 standard specifies a rather different, non-backwards-compatible audio format. Part 7 is referred to as MPEG-2 AAC. While AAC is more efficient than the previous MPEG audio standards, it is much more complex to implement, and somewhat more powerful hardware is needed for encoding and decoding.
MPEG-2 provides generic coding of moving pictures and associated audio. It specifies a video stream format built from three types of frame data (intra frames, forward predicted frames and bidirectionally predicted frames), which can be arranged in a specified order called the GOP (Group Of Pictures) structure, described below. (The standard itself does not define or use the term GOP, except in the name of a syntax structure called a GOP header; however, users of MPEG-2 have found that the GOP concept helps convey a basic understanding of the standard.)
Typically the originating material is a video sequence at a pre-set pixel resolution at 25 (CCIR) or (approximately) 29.97 (FCC) frames/second with sound.
MPEG-2 supports both interlaced and progressive scan video streams. In progressive scan streams, the basic unit of encoding is a frame, while in interlaced streams, the basic unit may be either a field or a frame. In the discussion below, the generic terms "picture" and "image" refer to either fields or frames, depending on the type of stream.
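The relationship between a frame and its two fields can be sketched as follows. This is only an illustration, assuming a frame is stored as a list of rows with an even number of lines; the function names `split_fields` and `weave_fields` are hypothetical.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its top and bottom fields."""
    top = frame[0::2]      # lines 0, 2, 4, ...
    bottom = frame[1::2]   # lines 1, 3, 5, ...
    return top, bottom


def weave_fields(top, bottom):
    """Interleave two fields back into a single frame."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


# A tiny 6-line "frame" whose pixel values equal the line number.
frame = [[line] * 4 for line in range(6)]
top, bottom = split_fields(frame)
print(len(top), len(bottom))
```

In an interlaced source the two fields are captured at different instants, which is why an encoder may prefer to code them as separate pictures when there is motion between them.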
An MPEG-2 video bitstream is made up of a series of data frames encoding pictures. The three ways of encoding a picture are: intra-coding (I pictures), forward prediction (P pictures), and bidirectional prediction (B pictures).
The video image is separated into one luminance (Y) and two chrominance channels (also called color difference signals Cb and Cr). Blocks of the luminance and chrominance arrays are organized into "macroblocks", which are the basic unit of coding within a picture. Each macroblock is divided into four 8×8 luminance blocks. The number of 8×8 chrominance blocks per macroblock depends on the chrominance format of the source image. For example, in the common 4:2:0 format, there is one chrominance block per macroblock for each of the two chrominance channels, making a total of six blocks per macroblock.
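The block arithmetic above can be written out as a small sketch. The 4:2:0 figure comes from the text; the 4:2:2 and 4:4:4 figures, and the 16×16 luminance area implied by four 8×8 blocks, are added here as assumptions, and the function names are hypothetical.

```python
import math

# Chrominance blocks per macroblock for EACH of the two channels (Cb and Cr).
CHROMA_BLOCKS_PER_CHANNEL = {"4:2:0": 1, "4:2:2": 2, "4:4:4": 4}


def blocks_per_macroblock(chroma_format):
    """Four luminance blocks plus the chrominance blocks of both channels."""
    return 4 + 2 * CHROMA_BLOCKS_PER_CHANNEL[chroma_format]


def macroblock_count(width, height):
    """Number of 16x16 macroblocks needed to cover a picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


print(blocks_per_macroblock("4:2:0"))   # 6, as stated in the text
print(macroblock_count(720, 576))       # 45 * 36 = 1620
```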
In the case of I pictures, the actual image data is passed directly through the encoding process described below. P and B pictures are first subjected to a process of "motion estimation", in which the encoder searches for similarities with the previous (and in the case of B pictures, also the next) image in time order. For each macroblock, the encoder searches for a good "reference sample": a same-sized area in the previous or next image that is most similar to it. A motion vector is encoded to describe the displacement between the current macroblock and the reference sample it is predicted from. Usually, even after motion compensation is performed, there is still a difference between the macroblock and its reference sample. This difference (the "residual") is then coded by the encoding process described below. If the residual is very small, in particular if the difference is probably too slight to see, the encoder may choose not to encode any residual at all in order to keep the bitrate low. In this case, the entire macroblock's appearance is described by just two numbers: the two components of its motion vector.
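The search for a reference sample can be sketched as an exhaustive block-matching loop. The standard does not prescribe how an encoder finds motion vectors, so everything here is an assumption: the sum of absolute differences (SAD) as the similarity measure, the ±4 pixel search window, whole-pixel accuracy, and the names `sad` and `find_motion_vector`. The vector is expressed as the displacement from the current macroblock to its reference sample.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total


def find_motion_vector(cur, ref, cx, cy, size=16, search=4):
    """Try every offset in the window; return ((dx, dy), residual_sad)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Synthetic test pictures: cur is ref shifted 2 pixels right and 1 pixel down.
ref = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
cur = [[ref[y - 1][x - 2] if x >= 2 and y >= 1 else 0 for x in range(32)]
       for y in range(32)]

print(find_motion_vector(cur, ref, 8, 8))   # ((-2, -1), 0)
```

A residual SAD of zero corresponds to the case described above in which no residual needs to be coded and the macroblock is represented by its motion vector alone.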
Each block is transformed with an 8×8 discrete cosine transform (DCT). The resulting DCT coefficients are then quantized, re-ordered to maximize the probability of long runs of zeros and low amplitudes of subsequent values, and then run-length coded. Finally, a fixed-table Huffman encoding scheme is applied.
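The first stages of this pipeline can be sketched in a few functions. This is a simplified illustration, not the normative process: it assumes a single uniform quantizer step instead of MPEG-2's quantization matrices, uses the classic zigzag scan as the re-ordering, treats the DC coefficient like any other, and leaves out the final Huffman stage. All function names are hypothetical.

```python
import math

N = 8


def dct_8x8(block):
    """Two-dimensional DCT-II of an 8x8 block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * scale(u) * scale(v) * total
    return out


def zigzag_order():
    """Return (row, col) pairs in zigzag order, low frequencies first."""
    order = []
    for s in range(2 * N - 1):   # s = row + col identifies one anti-diagonal
        diagonal = [(r, s - r) for r in range(N) if 0 <= s - r < N]
        # Odd diagonals are walked top-right to bottom-left, even ones reversed.
        order.extend(diagonal if s % 2 else reversed(diagonal))
    return order


def run_length(coefficients):
    """Encode a coefficient list as (zero_run, level) pairs plus an end marker."""
    pairs, run = [], 0
    for level in coefficients:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")   # trailing zeros are implied by the end-of-block marker
    return pairs


def encode_block(block, step=16):
    """DCT, uniform quantization (assumed step), zigzag scan, run-length coding."""
    coefficients = dct_8x8(block)
    quantized = [[round(value / step) for value in row] for row in coefficients]
    scanned = [quantized[r][c] for r, c in zigzag_order()]
    return run_length(scanned)


flat = [[100] * N for _ in range(N)]
print(encode_block(flat))   # [(0, 50), 'EOB']
```

A completely flat block collapses to a single DC coefficient followed by the end marker, which shows why smooth image areas cost so few bits after quantization.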
I pictures achieve their efficiency by exploiting spatial redundancy in images; P and B pictures achieve efficiency by taking advantage of temporal redundancy. Because adjacent frames in a video stream are often very similar to each other, P pictures may be 10% of the size of I pictures, and B pictures 2% of their size.
A sequence of different frame types, beginning with an I picture and ending just before the subsequent I picture, is called a Group of Pictures (GOP). The application may choose the length and frame types present in a GOP, but commonly, a GOP is 15 frames long, and has the sequence I_BB_P_BB_P_BB_P_BB_P_BB_. A similar 12-frame sequence is also common. For ideal coding efficiency, the placement of I, P and B pictures in the GOP structure may be determined by the nature of the video stream and the bandwidth constraints on the output stream. For example, a low-motion scene is more efficiently encoded with more B pictures. Its GOP structure might look like: IBBBPBBBPBBP. A higher motion scene may be more efficiently encoded with fewer B pictures. Its GOP structure might look like: IPPBPBPPPPPP. Encoding time may also be a constraint. This is particularly true in live transmission and in real-time applications with limited computing resources, as a stream containing many B pictures can take three times longer to encode than an I-picture-only stream.
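Regular GOP structures such as the 15-frame and 12-frame sequences above can be generated from two numbers: the GOP length and the spacing between anchor (I or P) pictures. The sketch below does this and also gives a crude size estimate using the illustrative ratios quoted earlier (P about 10% and B about 2% of an I picture). The parameter names `n` and `m` and both function names are assumptions of this example, and real picture sizes vary widely with content.

```python
def gop_pattern(n, m):
    """Build a GOP of n pictures with an anchor picture every m pictures."""
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


# Illustrative relative picture sizes, taken from the figures quoted in the text.
RELATIVE_SIZE = {"I": 1.0, "P": 0.10, "B": 0.02}


def relative_gop_size(pattern):
    """Average picture size in the GOP, relative to an I-picture-only stream."""
    return sum(RELATIVE_SIZE[t] for t in pattern) / len(pattern)


print(gop_pattern(15, 3))   # IBBPBBPBBPBBPBB
print(gop_pattern(12, 3))   # IBBPBBPBBPBB
print(round(relative_gop_size(gop_pattern(15, 3)), 3))
```

Irregular structures such as IPPBPBPPPPPP cannot be produced by `gop_pattern`, but they can still be passed to `relative_gop_size` as a string.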
The output bit rate of an MPEG-2 encoder can be constant or variable, with the maximum bit rate determined by the playback application; for example, the DVD standard allows a maximum video bit rate of 9.8 Mbit/s. To achieve a constant bit rate, the degree of quantization is adaptively altered to meet the output bit rate requirement. Increasing quantization worsens image artifacts when the stream is decoded, generally in the form of "mosaicing", where the discontinuities at the edges of macroblocks become more visible as the bit rate is reduced.
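The feedback between output size and quantization can be sketched with a toy model. Nothing here reflects a real encoder's rate control: the assumption that a picture costs `complexity / q` bits, the starting value of 8, the proportional update rule, and the function name `rate_control` are all invented for illustration. The 1 to 31 range is meant to mirror the 5-bit quantiser scale code and should likewise be treated as an assumption.

```python
def rate_control(complexities, target_bits, q_min=1, q_max=31):
    """Choose a quantizer scale per picture so that output tracks target_bits."""
    q = 8   # arbitrary starting point
    results = []
    for complexity in complexities:
        bits = complexity / q   # toy model: coarser quantization gives fewer bits
        results.append((q, bits))
        # Scale q by how far the last picture missed the target, then clamp.
        q = round(q * bits / target_bits)
        q = max(q_min, min(q_max, q))
    return results


# Three easy pictures followed by three that are twice as complex.
for q, bits in rate_control([800_000] * 3 + [1_600_000] * 3, 50_000):
    print(q, round(bits))
```

Once the quantizer reaches its coarsest setting, the toy encoder can no longer hit the target, which is the situation in which a real stream shows the most visible blocking.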
MPEG-1 video compression also defines DC pictures (D-pictures), which are similar to I-pictures but include only the DC value of each block. A system can use a stream of only D-pictures to support rapid searching through another stream. This picture type is not supported under MPEG-2.
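MPEG-2 also introduces new audio encoding methods, including the backwards-compatible multichannel coding of Part 3 and the AAC format of Part 7 described above.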
MPEG-2 video profiles:
| Abbr. | Name | Picture types | Chroma sampling | Streams | Comment |
|---|---|---|---|---|---|
| SP | Simple Profile | P, I | 4:2:0 | 1 | no interlacing |
| MP | Main Profile | P, I, B | 4:2:0 | 1 | |
| 422P | 4:2:2 Profile | P, I, B | 4:2:2 | 1 | |
| SNR | SNR Profile | P, I, B | 4:2:0 | 1–2 | SNR: Signal to Noise Ratio |
| SP | Spatial Profile | P, I, B | 4:2:0 | 1–3 | low, normal and high quality decoding |
| HP | High Profile | P, I, B | 4:2:2 | 1–3 | |
MPEG-2 video levels:
| Abbr. | Name | Max. pixels/line | Max. lines | Max. frame rate (Hz) | Max. bitrate (Mbit/s) |
|---|---|---|---|---|---|
| LL | Low Level | 352 | 288 | 30 | 4 |
| ML | Main Level | 720 | 576 | 30 | 15 |
| H-14 | High 1440 | 1440 | 1152 | 30 | 60 |
| HL | High Level | 1920 | 1152 | 30 | 80 |
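The level limits can be read as a simple lookup: a stream fits a level if none of its parameters exceeds that level's limits. The sketch below uses the values exactly as tabulated here and nothing else, so it is not a conformance checker; actual conformance also depends on the profile and on limits that this table does not show. The function name `lowest_level` is hypothetical.

```python
LEVELS = [
    # (abbreviation, pixels/line, lines, frame rate in Hz, bitrate in Mbit/s)
    ("LL", 352, 288, 30, 4),
    ("ML", 720, 576, 30, 15),
    ("H-14", 1440, 1152, 30, 60),
    ("HL", 1920, 1152, 30, 80),
]


def lowest_level(width, height, frame_rate, bitrate_mbps):
    """Return the first level whose tabulated limits cover the stream, or None."""
    for abbr, max_width, max_height, max_rate, max_bitrate in LEVELS:
        if (width <= max_width and height <= max_height
                and frame_rate <= max_rate and bitrate_mbps <= max_bitrate):
            return abbr
    return None


print(lowest_level(720, 576, 25, 9.8))    # ML
print(lowest_level(1920, 1080, 30, 25))   # HL
```

Because the frame rate column lists 30 Hz for every level, the sketch rejects 60 Hz formats even though such formats appear among common profile and level combinations, which is one reason to treat it only as an illustration of the lookup.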
Common profile and level combinations:
| Profile @ Level | Resolution (px) | Max. frame rate (Hz) | Sampling | Bitrate (Mbit/s) | Application |
|---|---|---|---|---|---|
| SP@LL | 176 × 144 | 15 | 4:2:0 | 0.096 | Wireless handsets |
| SP@ML | 352 × 288 | 15 | 4:2:0 | 0.384 | PDAs |
| | 320 × 240 | 24 | | | |
| MP@LL | 352 × 288 | 30 | 4:2:0 | 4 | Set-top boxes (STB) |
| MP@ML | 720 × 480 | 30 | 4:2:0 | 15 (DVD: 9.8) | DVD, SD-DVB |
| | 720 × 576 | 25 | | | |
| MP@H-14 | 1440 × 1080 | 30 | 4:2:0 | 60 (HDV: 25) | HDV |
| | 1280 × 720 | 30 | | | |
| MP@HL | 1920 × 1080 | 30 | 4:2:0 | 80 | ATSC 1080i, 720p60, HD-DVB (HDTV) |
| | 1280 × 720 | 60 | | | |
| 422P@LL | | | 4:2:2 | | |
| 422P@ML | 720 × 480 | 30 | 4:2:2 | 50 | Sony IMX using I-frame only, Broadcast "contribution" video (I&P only) |
| | 720 × 576 | 25 | | | |
| 422P@H-14 | 1440 × 1080 | 30 | 4:2:2 | 80 | Potential future MPEG-2-based HD products from Sony and Panasonic |
| | 1280 × 720 | 60 | | | |
| 422P@HL | 1920 × 1080 | 30 | 4:2:2 | 300 | Potential future MPEG-2-based HD products from Panasonic |
| | 1280 × 720 | 60 | | | |
Allowed resolutions for SDTV:
Part 8 (a 10-bit video extension whose primary application was studio video) has been withdrawn due to lack of interest by industry.