SVD and digital image processing - TFE

Shafq ur Réhman
[email protected]
Image and Video
Compression
Outline
 image/video compression: what and why
 source coding basics
– basic idea
– symbol codes
– stream codes
 compression systems and standards
– system standards and quality measures
– image coding and JPEG
– video coding and MPEG
 Summary
need for compression
 Image: 6.0 million pixel camera, 3000x2000
18 MB per image -> 56 pictures / 1GB
 Video: DVD Disc 4.7 GB
video 720x480, RGB, 30 f/s -> 31.1MB/sec
audio 16bits x 44.1KHz stereo -> 176.4KB/s
-> ~2.5 min of raw data per DVD disc
 Send video from cellphone:
352*240, RGB, 15 frames / second
3.8 MB/sec ->$38.00/sec levied by AT&T
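These figures can be checked with a few lines; a sketch assuming 3 bytes per RGB pixel and decimal units (1 MB = 10^6 bytes):

```python
def raw_video_rate(width, height, fps, bytes_per_pixel=3):
    """Bytes per second of uncompressed RGB video."""
    return width * height * bytes_per_pixel * fps

dvd_video = raw_video_rate(720, 480, 30)   # 31_104_000 B/s, i.e. ~31.1 MB/s
cd_audio = 2 * 2 * 44_100                  # 16-bit stereo at 44.1 kHz -> 176.4 KB/s
phone = raw_video_rate(352, 240, 15)       # ~3.8 MB/s, the cellphone example

# How long until a 4.7 GB DVD fills up with raw audio + video?
minutes = 4.7e9 / (dvd_video + cd_audio) / 60
print(round(dvd_video / 1e6, 1), round(phone / 1e6, 1), round(minutes, 1))
```

The arithmetic gives roughly 2.5 minutes of raw audio/video per 4.7 GB disc, which is why compression is unavoidable.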
Data Compression
Wikipedia: “data compression, or source coding, is the process of
encoding information using fewer bits (or other information-bearing
units) than an unencoded representation would use through use of
specific encoding schemes.”
Applications
General data compression: .zip, .gz …
Image over network: telephone/internet/wireless/etc
Slow device:
1xCD-ROM 150KB/s, bluetooth v1.2 up to ~0.25MB/s
Large multimedia databases
Understanding compression:
what is behind the jpeg/mpeg/mp4 … formats?
what are the “good/fine/super fine” quality modifiers in my Canon 400D?
why/when do I want to use raw/jpeg format in my digital camera?
why doesn’t “zipping” jpeg files help?
what are the best ways to do compression?
are we doing our best? (yes/no/maybe)
what can we compress?
 Goals of compression
Remove redundancy
Reduce irrelevance
 irrelevance or perceptual redundancy: not all visual information is perceived by the eye/brain, so throw away what is not.
 redundant : exceeding what is necessary or normal
 symbol redundancy:
the common and uncommon values cost the same
to store.
 spatial and temporal redundancy:
Temperatures: tend to be similar in adjacent
geographical areas, also tend to be similar in the
same month over different years …
symbol/inter-symbol redundancy
 Letters and words in English
e, a, i, s, t, …
a, the, me, I …
good, magnificent, …
fyi, btw, ttyl …
 In the evolution of language we
naturally chose to represent frequent
meanings with shorter representations.
Data and information
 Data is not the same thing as information.
 Data is the means with which information is
expressed. The amount of data can be much larger
than the amount of information.
 Redundant data doesn't provide additional
information.
 Image coding or compression aims at reducing the
amount of data while keeping the information by
reducing the amount of redundancy.
Image Compression
 Image compression addresses the problem of reducing the amount of data required to represent a digital image. The underlying basis of the reduction process is the removal of redundant data.
 Transforming a 2-D pixel array into a statistically uncorrelated data set.
Different Types of Redundancy
 Coding Redundancy:
Some gray levels are more common than
others.
 Inter-pixel Redundancy:
The same gray level may cover a large
area.
 Psycho-Visual Redundancy:
The eye can only resolve about 32 gray
levels locally
M. C. Escher, Drawing Hands, 1948
Redundancy
 Spatial redundancy
– similarities between adjacent pixels, e.g., 250 252 249 → 250 2 -3
 Temporal redundancy
– similarities between pixels in adjacent frames, e.g., 250 252 249 → 250 2 -1
Modes of compression
 Lossless
preserve all information, perfectly recoverable
examples: morse code, zip/gz
 Lossy
throw away perceptually insignificant information
cannot recover all bits
Image Compression
Image compression can be:
 Reversible (lossless), with no loss of information.
– The image after compression and decompression is identical to the
original image. Often necessary in image analysis applications.
– The compression ratio is typically 2 to 10 times.
 Non-reversible (lossy), with loss of some information.
– Lossy compression is often used in image communication, compact
cameras, video, www, etc.
– The compression ratio is typically 10 to 30 times.
Image Coding and Compression
 Image coding
– How the image data can be represented.
 Image compression
– Reducing the amount of data required to represent an image.
– Enabling efficient image storage and transmission.
(f → compression → decompression → f̂)
Lossless compression
14
Objective Measures of Image Quality: Rate
Objective Measures of Image Quality: Compression Ratio
 The compression ratio serves as the primary measure of a compression technique's effectiveness. It measures the number of bits that can be eliminated from an uncompressed representation of a source image.
 Let N1 be the total number of bits required to store an uncompressed (raw) source image and let N2 be the total number of bits required to store the compressed data. The compression ratio is then defined as CR = N1 / N2.
 Larger compression ratios indicate more effective compression.
 Smaller compression ratios indicate less effective compression.
 Compression ratios less than one indicate that the compressed representation is actually larger than the uncompressed representation.
Objective Measures of Image Quality: Objective Fidelity Criteria
MSE vs PSNR
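The two standard objective fidelity criteria are easy to write down; a minimal sketch in pure Python (images as lists of rows, 8-bit peak value assumed):

```python
import math

def mse(f, g):
    """Mean squared error between two equal-sized images."""
    n = sum(len(row) for row in f)
    return sum((a - b) ** 2 for rf, rg in zip(f, g) for a, b in zip(rf, rg)) / n

def psnr(f, g, max_val=255):
    """Peak signal-to-noise ratio in dB; higher means closer to the original."""
    e = mse(f, g)
    return float("inf") if e == 0 else 10 * math.log10(max_val ** 2 / e)

orig = [[100, 100], [100, 100]]
recon = [[100, 102], [98, 100]]
print(round(mse(orig, recon), 2), round(psnr(orig, recon), 1))  # 2.0 45.1
```

PSNR is just MSE re-expressed on a logarithmic scale relative to the peak signal value, which is why the two always rank reconstructions the same way.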
Subjective Measures of Image Quality
 The problem
– The objective image quality measures shown previously do not always fit with our perception of image quality.
 One solution
– Let a number of test persons rate the image quality of the images on
a scale. This will result in a subjective measure of image quality, or
rather fidelity, but it will be based on how we perceive the quality of the
images.
Subjective fidelity criteria
 Excellent
 Fine
 Passable
 Marginal
 Inferior
 Unusable
Information Measure
Using Elements of Information theory
Example
 Assume that the grading levels are A, B, C, D, E, F, equally distributed.
 How much information do you have if you know that you don't have grade F?
– (5/6)·(−log2(5/6)) ≈ 0.22 bits
 How much information is needed to know your exact grade?
– log2(6) − (1/6)·(−log2(1/6)) ≈ 2.15 bits
Information Measure: Entropy
Shannon entropy = average information per source output:
H(z) = − Σ_{k=0}^{L−1} p(r_k) log2 p(r_k)
where r_k is gray level number k, and L is the number of gray levels.
Measure the Amount of Data
The average length of the code words assigned to the various gray-level values is found by summing the product of the number of bits used to represent each gray level and the probability that the gray level occurs:
L_avg = Σ_{k=0}^{L−1} l(r_k) p(r_k)
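Both H(z) and L_avg are one-liners; a sketch with a hypothetical 6-level histogram (the probabilities and code lengths below are illustrative, not taken from the slides):

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p*log2(p), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def avg_code_length(probs, lengths):
    """L_avg = sum over gray levels of code length times probability."""
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.4, 0.3, 0.1, 0.1, 0.06, 0.04]
fixed = [3] * 6              # 3-bit fixed-length code
varlen = [1, 2, 4, 4, 4, 4]  # one possible variable-length assignment
print(round(entropy(probs), 2),
      round(avg_code_length(probs, fixed), 1),
      round(avg_code_length(probs, varlen), 2))  # 2.14 3.0 2.2
```

Note that the variable-length code's 2.2 bits/symbol already sits close to the entropy bound of ~2.14 bits, while the fixed-length code wastes almost a bit per symbol.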
Example 3-bit image
The noiseless coding theorem
 It is possible to make L_avg/n arbitrarily close to H(z) by coding infinitely long extensions of the source:
lim_{n→∞} [ L_avg,n / n ] = H(z)
Definitions (revisited)
 Larger compression ratios indicate more effective compression
 Smaller compression ratios indicate less effective compression
 Compression ratios less than one indicate that the compressed
representation is actually larger than the uncompressed representation.
Dealing with coding redundancy
Basic idea: different gray levels occur with different probability (non-uniform histogram). Use shorter code words for the more common gray levels and longer code words for the less common gray levels. This is called variable-length coding.
Code 1: L_avg = 3
Code 2: L_avg = 2.7
Revisit: Desired properties of symbol codes
Good codes are not only short but also easy to encode/decode.
 Non-singular: every symbol in X maps to a different code word.
 Uniquely decodable: every sequence {x1, … xn} maps to a different codeword sequence.
 Instantaneous: no codeword is a prefix of any other codeword; a.k.a. prefix code, self-punctuating code, prefix-free code.
Huffman Coding
First
1. Sort the gray levels by decreasing probability.
2. Sum the two smallest probabilities.
3. Sort the new value into the list.
4. Repeat 1 to 3 until only two probabilities remain.
Second
1. Give the code 0 to the highest probability, and the code 1 to the lowest probability in the summed pair.
2. Go backwards through the tree one node and repeat from 1 until all gray levels have a unique code.
(David Albert Huffman)
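The two passes above can be sketched with a priority queue: repeatedly merge the two least probable nodes and prefix 0/1 while merging. This is a sketch of the algorithm, not the slides' exact table layout; the symbol names and probabilities are illustrative.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}; returns {symbol: bitstring}."""
    tie = count()  # tie-breaker so heap entries with equal probability compare cleanly
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # smallest probability in the pair -> bit 1
        p2, _, c2 = heapq.heappop(heap)  # larger probability in the pair  -> bit 0
        merged = {s: "0" + c for s, c in c2.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

code = huffman_code({"a": 0.4, "b": 0.3, "c": 0.1, "d": 0.1, "e": 0.06, "f": 0.04})
print(sorted((s, len(c)) for s, c in code.items()))
```

The resulting code is prefix-free by construction, so an encoded bit string can be parsed unambiguously from left to right.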
Example of Huffman coding
Example of Huffman coding
Assigning codes
Huffman coding…
The Huffman code is completely reversible, i.e., lossless.
The table for the translation has to be stored together with the coded image.
The resulting code is unambiguous: for the previous example, the encoded string 011011101011 can only be parsed into the code words 0, 110, 1110, 1011 and decoded as 7, 4, 5, 0.
The Huffman code does not take correlation between adjacent pixels into consideration.
Interpixel Redundancy
Also called spatial or geometric redundancy
Adjacent pixels are often correlated, i.e., the
value of neighboring pixels of an observed
pixel can often be predicted from the value of
the observed pixel.
Coding methods:
Run-length coding
Difference coding
Run-length coding
 Every code word is made up of a pair (g,l) where g is the gray
level, and l is the number of pixels with that gray level (length or
“run”).
 E.g., the row 1 1 1 3 3 3 4 2 2 2 2 1 1 1 1 1 results in the run-length code (1,3)(3,3)(4,1)(2,4)(1,5)
 The code is calculated row by row in this scan pattern:
(Newer methods can take advantage of runs of repetitive patterns
like: 8 5 5 8 5 5 8 5 5.)
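Run-length coding is a short roundtrip; a sketch using the row from the example above:

```python
def rle_encode(pixels):
    """Run-length encode a sequence into (gray level, run length) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return [(g, l) for g, l in runs]

def rle_decode(runs):
    return [g for g, l in runs for _ in range(l)]

row = [1, 1, 1, 3, 3, 3, 4, 2, 2, 2, 2, 1, 1, 1, 1, 1]
codes = rle_encode(row)
print(codes)  # [(1, 3), (3, 3), (4, 1), (2, 4), (1, 5)]
assert rle_decode(codes) == row
```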
Difference coding
Definition: d(0) = f(0), d(i) = f(i) − f(i−1) for i > 0.
E.g., 250 252 249 → 250 2 −3.
The code is calculated row by row.
Both run-length and difference coding are reversible and can be combined with, e.g., Huffman coding.
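The definition above translates directly to code; a sketch with the 250 252 249 example:

```python
def diff_encode(row):
    """Keep the first value, then store successive differences f(i) - f(i-1)."""
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

def diff_decode(d):
    """Invert difference coding by accumulating the differences."""
    out = [d[0]]
    for v in d[1:]:
        out.append(out[-1] + v)
    return out

row = [250, 252, 249]
print(diff_encode(row))  # [250, 2, -3]
assert diff_decode(diff_encode(row)) == row
```

Because neighboring pixels are correlated, the differences cluster near zero, which is exactly what a subsequent Huffman stage exploits.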
Combining Difference and Huffman Coding
Original image
Difference coding
LZW Coding
What if the symbol probabilities are
unknown?
LZW, Lempel-Ziv-Welch
 In contrast to Huffman with variable code length, LZW uses fixed
lengths of code words which are assigned to variable length sequences
of source symbols.
 The coding is done from left-to-right and row-by-row.
 Requires no a priori knowledge of the probability of occurrence of the
symbols to be encoded.
 Removes some of the inter-pixel redundancy.
 During encoding a dictionary or “code-book” with symbol sequences is
created which is recreated when decoding.
(Many modern lossless compression methods use Huffman coding in combination with methods like LZW.)
LZW-Coding (Lempel-Ziv-Welch)
 Integrated into mainstream imaging file formats:
– Graphic Interchange Format (GIF)
– Tagged Image File Format (TIFF)
– Portable Document Format (PDF)
 Widely used: GIF, TIFF, PDF …
 The related DEFLATE method (LZ77 + Huffman) is used in PNG, ZIP, …
 Unisys U.S. LZW Patent No. 4,558,302 expired on June 20, 2003
http://www.unisys.com/about__unisys/lzw
Example: the sequence 39 39 126 126 repeated four times is encoded as 39 39 126 126 256 258 260 259 257 126, with new dictionary entries 256, 257, 258, 259, 260, … created along the way.
LZW
 The coding dictionary (code book) is created while the data are being encoded.
 The LZW decoder builds an identical decompression dictionary as it decodes the data stream.
 Flush the code book:
– when the codebook is full
– when coding is inefficient
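The encoder side can be sketched in a few lines: start from a dictionary of single symbols, grow it with each longest-match-plus-one-symbol sequence, and emit dictionary indices (codebook flushing is omitted in this sketch):

```python
def lzw_encode(data):
    """LZW encoding with an initial dictionary of single byte values (codes 0-255)."""
    book = {(i,): i for i in range(256)}
    seq, out = (), []
    for sym in data:
        cand = seq + (sym,)
        if cand in book:
            seq = cand                 # keep extending the longest known match
        else:
            out.append(book[seq])      # emit code for the match so far
            book[cand] = len(book)     # new entry: matched sequence + next symbol
            seq = (sym,)
    if seq:
        out.append(book[seq])
    return out

pixels = [39, 39, 126, 126] * 4
print(lzw_encode(pixels))  # [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
```

16 input symbols come out as 10 codes; the decoder can rebuild the same dictionary from the code stream alone, which is why no table needs to be transmitted.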
Bit plane coding
 Divide the gray level/color image into a series of binary images (one image per bit) and code each image separately using the previously described methods. An 8-bit image will be represented by 8 coded binary images.
 It is based on the concept of decomposing a multilevel image into a series of binary images and compressing each binary image via one of several well-known binary compression methods.
 Alternative decomposition approach: start from the Gray code representation, where successive gray levels differ by only one bit.
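The point of the Gray code alternative is visible in a tiny sketch: at the 127 → 128 transition every natural-binary bit plane flips, while the Gray codes differ in a single bit.

```python
def bit_planes(pixel, bits=8):
    """Split a gray value into its binary bit planes (MSB first)."""
    return [(pixel >> b) & 1 for b in range(bits - 1, -1, -1)]

def to_gray(pixel):
    """Binary-reflected Gray code: successive gray levels differ in one bit."""
    return pixel ^ (pixel >> 1)

print(bit_planes(127), bit_planes(128))  # all 8 planes change at this step
print(to_gray(127), to_gray(128))        # 64 and 192: a single-bit change
```

Fewer bit flips between neighboring gray levels means longer constant runs in each bit plane, which the binary coders then compress better.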
Constant area coding (CAC)
 The image is divided into areas of size p × q.
 Classify each area as all white, all black, or mixed.
– Example: white = 0, black = 10, mixed = 11 + pixel values
 If dominantly white:
– Example: white = 0, black or mixed = 1 + pixel values
Lossless Predictive Coding
 Does not require decomposition of an image into a collection of bit planes.
 Based on eliminating the interpixel redundancies of closely spaced pixels by extracting and coding only the new information in each pixel.
 Contains an encoder, a decoder and a predictor.
 The output of the predictor is rounded to the nearest integer.
Lossless Predictive Coding
 Prediction error: e_n = f_n − f̂_n
 The error is coded using a variable-length code.
 The decoder reconstructs: f_n = e_n + f̂_n
 The predictor uses m previous pixels:
f̂_n(x, y) = round[ Σ_{i=1}^{m} α_i f(x, y−i) ]
Lossy Predictive Compression
(Figure: a typical predictive coder.)
 Reduce the accuracy of the saved image for increased compression.
 One of the simplest lossy predictive coding schemes is known as delta modulation: record the approximate error (difference) between the predicted and actual sample values.
 For a sequence of samples s indexed by k, a fixed delta value gives the approximate sequence s' via e'_k = +δ if s_k > s'_{k−1}, −δ otherwise, and s'_k = s'_{k−1} + e'_k.
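A minimal sketch of delta modulation, assuming a previous-sample predictor and a fixed step δ (the sample values are illustrative):

```python
def delta_modulate(samples, delta):
    """Quantize each prediction error to +delta or -delta,
    reconstructing s'[k] = s'[k-1] +/- delta."""
    recon = [samples[0]]                       # first sample transmitted exactly
    bits = []
    for s in samples[1:]:
        step = delta if s >= recon[-1] else -delta
        bits.append(1 if step > 0 else 0)      # one bit per sample on the wire
        recon.append(recon[-1] + step)
    return bits, recon

samples = [10, 12, 15, 15, 14, 10]
bits, recon = delta_modulate(samples, delta=2)
print(bits, recon)  # [1, 1, 1, 0, 0] [10, 12, 14, 16, 14, 12]
```

Each sample costs a single bit, but the reconstruction can only move by ±δ per step: a small δ causes slope overload on fast transitions, a large δ causes granular noise in flat regions, which is the trade-off the "effect of delta value" slide illustrates.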
Lossy Predictive Compression:
Effect of delta value
Delta Modulation Example
Psycho-Visual Redundancy
If the only intended use of an image is visual observation, much of the information can be psycho-visually redundant, i.e., it can be removed without changing the visual appearance or perceived quality of the image. Loss of information implies a lossy method.
1 721 kB (uncompressed)
78 kB (low quality JPEG)
Psycho-Visual redundancy
 Psycho-visual redundancy is often reduced by quantization, e.g., uniform quantization of gray levels:
– Remove the least significant bits of the data.
– Causes edge effects.
 The edge effects can be reduced by Improved Gray Scale (IGS) quantization:
– Remove the least significant bits, and add a "random number" based on the sum of the least significant bits of the present and the previous pixel.
– IGS reduces edge effects, but will at the same time unsharpen true edges.
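One common formulation of IGS (a sketch; the overflow rule and bit counts are assumptions based on the description above, not taken verbatim from the slides):

```python
def igs_quantize(pixels, keep_bits=4):
    """IGS quantization: add the discarded low bits of the running sum
    back in before truncating, which breaks up false contours."""
    drop = 8 - keep_bits
    low_mask = (1 << drop) - 1
    prev_sum, out = 0, []
    for p in pixels:
        # If the high bits are already maxed out, skip the add to avoid overflow.
        s = p if (p >> drop) == (1 << keep_bits) - 1 else p + (prev_sum & low_mask)
        out.append(s >> drop)
        prev_sum = s
    return out

# A slow ramp: plain truncation would map all four pixels to level 6,
# while IGS dithers the last one up to 7.
print(igs_quantize([100, 101, 102, 103]))  # [6, 6, 6, 7]
```

The carried-over low bits act as a pseudo-random dither, trading the visible contour of uniform quantization for fine-grained noise.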
Improved Gray Scale (IGS)
(a) Original image. (b) Uniform quantization to 16 levels. (c) IGS quantization to 16
levels.
Transform Coding
 A compression technique that is based on modifying the transform of an image.
 For most natural images a significant number of the coefficients have small magnitudes and can be coarsely quantized or discarded with little image distortion.
Transform Coding
 Sub-image decomposition
 Transformation
 Quantization
 Coding
Adaptive transform coding or nonadaptive transform coding.
Transform Coding
1) Divide the image into n × n sub-images.
2) Transform each sub-image using a reversible transform (e.g., the Hotelling transform, the discrete Fourier transform (DFT) or the discrete cosine transform (DCT)).
3) Quantize, i.e., truncate the transformed image (e.g., with the DFT and DCT, frequencies with small amplitude can be removed without much information loss). The quantization can be either image dependent (IDP) or image independent (IIP).
4) Code the resulting data, normally using some kind of variable-length coding, e.g., Huffman code.
The coding is not reversible (unless step 3 is skipped).
 g(x,y,u,v) = forward transformation kernel:
T(u,v) = Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x,y) g(x,y,u,v)
 h(x,y,u,v) = inverse transformation kernel:
f(x,y) = Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} T(u,v) h(x,y,u,v)
Fourier Transform
g(x,y,u,v) = (1/N) e^{−j2π(ux+vy)/N}
h(x,y,u,v) = (1/N) e^{+j2π(ux+vy)/N}
Walsh-Hadamard Transform (WHT)
g(x,y,u,v) = h(x,y,u,v) = (1/N) (−1)^{Σ_{i=0}^{m−1} [b_i(x) p_i(u) + b_i(y) p_i(v)]}
• The Fourier Transform consists of a projection onto a set of orthogonal sinusoidal waveforms.
• The FT coefficients are called frequency components and the waveforms are ordered by frequency.
• The Hadamard Transform consists of a projection onto a set of square waves called Walsh functions.
• The HT coefficients are called sequency components and the Walsh functions are ordered by the number of their zero-crossings.
Discrete Cosine Transform DCT
• The discrete cosine transform (DCT) is used to transform a signal from
the spatial domain into the frequency domain.
• The reverse process, that of transforming a signal from the frequency
domain into the spatial domain, is called the inverse discrete cosine
transform (IDCT).
• A signal in the frequency domain contains the same information as that in the spatial domain. The values obtained by applying the DCT are ordered from lowest to highest frequency.
• This feature, and the psycho-visual observation that the human eye and ear are less sensitive to the higher frequencies, leads to the possibility of compressing a spatial signal by transforming it to the frequency domain, dropping high-order values and keeping low-order ones.
• When reconstructing the signal, and transforming it back to the spatial
domain, the results are remarkably similar to the original signal.
Discrete Cosine Transform DCT
h(x,y,u,v) = α(u) α(v) cos[ (2x+1)uπ / 2N ] cos[ (2y+1)vπ / 2N ]
α(u) = √(1/N) for u = 0; α(u) = √(2/N) for u = 1, 2, …, N−1
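The kernel above can be applied directly (slow, but faithful to the formula); a sketch that also shows the energy-compaction property on a flat block:

```python
import math

def alpha(u, N):
    """Normalization factor from the DCT definition."""
    return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)

def dct2(block):
    """2-D DCT of an NxN block via the separable cosine kernel."""
    N = len(block)
    return [[alpha(u, N) * alpha(v, N) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

flat = [[100] * 8 for _ in range(8)]   # a perfectly flat 8x8 block
coef = dct2(flat)
off_dc = max(abs(coef[u][v]) for u in range(8) for v in range(8) if (u, v) != (0, 0))
print(round(coef[0][0]), off_dc < 1e-6)  # all energy lands in the DC coefficient
```

For smooth natural-image blocks the picture is similar: most energy collects in the low-frequency corner, so the remaining coefficients can be coarsely quantized or dropped.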
JPEG - Sequential baseline system
 Limited to 8-bit words
 DCT values restricted to 11 bits
 DCT computation, quantization, variable-length coding
 Sub-images 8 × 8, left to right, top to bottom
 Image size selection
JPEG: example of transform coding

JPEG quality   File size in bytes
100 %          9486
90 %           3839
80 %           2086
60 %           1711
40 %           1287
20 %            822
10 %            533
5 %             380
Wavelet Coding
 The principal difference between wavelet coding and transform coding is the omission of the sub-image processing stage.
 JPEG2000
File Formats with Lossy Compression
JPEG, Joint Photographic Experts Group, based on a cosine transform on 8x8 pixel blocks and run-length coding. Gives rise to ringing and block artifacts. (.jpg .jpe .jpeg)
JPEG2000, created by the Joint Photographic Experts Group in 2000. Based on the wavelet transform and is superior to JPEG. Gives rise only to ringing artifacts and allows flexible decompression (progressive transmission, region of interest, ...) and reading. (.jp2 .jpx)
JPEG vs JPEG-2000
Typical steps in lossy image compression
How to represent a face image
DCT Approach (block-based): each block is a weighted sum of DCT basis images (= a·B1 + b·B2 + …)
PCA Approach (full-frame)
JPEG/JFIF overview

Using the 2D FFT for image compression
• Image = 200x320 matrix of values
• Compress by keeping the largest 2.5% of FFT components
• Similar idea used by JPEG
Trade-off
File Formats with Lossless Compression
TIFF, Tagged Image File Format, flexible format often
supporting up to 16 bits/pixel in 4 channels. Can use
several different compression methods, e.g.,
Huffman, LZW.
GIF, Graphics Interchange Format. Supports 8
bits/pixel in one channel, that is only 256 colors.
Uses LZW compression. Supports animations.
PNG, Portable Network Graphics, supports up to 16
bits/pixel in 4 channels (RGB + transparency). Uses
Deflate compression (LZ77 and Huffman). Good
when interpixel redundancy is present.
Vector based file formats
 PS, PostScript, is a page description language developed in
1982 for sending text documents to printers.
 EPS, Encapsulated PostScript, like PS but can embed raster
images internally using the TIFF format.
 PDF, Portable Document Format, widely used for documents and supported by a wide range of platforms. Supports embedding of fonts and raster/bitmap images. Beware of the choice of coding: both lossy and lossless compression are supported.
 SVG, Scalable Vector Graphics, based on XML, supports both static and dynamic content. All major web browsers support it (Internet Explorer from version 9).
Choosing image file format
Image analysis
– Lossless formats are vital. TIFF supports a wide
range of different bit depths and lossless compression
methods.
Images for use on the web
– JPEG for photos (JPEG2000), PNG for illustrations.
GIF for small animations. Vector format: SVG,
nowadays supported by web browsers.
Line art, illustrations, logotypes, etc.
– Lossless formats such as PNG etc. (or a vector
format)
Video
Compression
Video Compression Standards
 Once video is in digital format, it makes sense to compress it.
 Similarly to image compression, we want to store video data as efficiently as possible.
 Again, we want to both maximize quality and minimize storage space and processing resources.
 This time, we can exploit correlation in both the space and time domains.
Video Compression Standards
 Unlike image encoding, video encoding is rarely done in lossless form: no storage medium has enough capacity to store a practical-sized lossless video file.
– Lossless DVD video: 221 Mbps
– Compressed DVD video: 4 Mbps
– 50:1 compression ratio!
 Teleconference: H.261, H.262, H.263, H.230
 Multimedia video: MPEG-1, MPEG-2, MPEG-4
 Two organizations dominate video compression standardization:
– ITU-T Video Coding Experts Group (VCEG): International Telecommunications Union – Telecommunications Standardization Sector (ITU-T, a United Nations organization, formerly CCITT)
– ISO/IEC Moving Picture Experts Group (MPEG): International Standardization Organization and International Electrotechnical Commission, Joint Technical Committee 1, Subcommittee 29, Working Group 11
Definitions
 Bitrate
– Information stored/transmitted per unit time
– Usually measured in Mbps (megabits per second)
– Ranges from < 1 Mbps to > 40 Mbps
 Resolution
– Number of pixels per frame
– Ranges from 160x120 to 1920x1080
 FPS (frames per second)
– Usually 24, 25, 30, or 60
– Don't need more because of limitations of the human eye
Scan types
 Interlaced scan
– Odd and even lines displayed on alternate frames
– Initially used to save bandwidth on TV transmission
– When displaying interlaced video on a progressive scan display, one can see a "comb effect"
 Progressive scan
– Display all lines on each frame
– New "fixed-resolution" displays (such as LCD, plasma) all use progressive scan
– Deinterlacing is not a trivial task
MPEG (Moving Pictures Expert Group)
 Committee of experts that develops video encoding standards
 Until recently, was the only game in town (still the most popular, by far)
 Suitable for a wide range of videos
– Low resolution to high resolution
– Slow movement to fast action
 Can be implemented either in software or hardware
MPEG (Moving Pictures Expert Group)
 MPEG's main components are:
– Block (8×8 pixels)
– Macro block (2×2 blocks = 16×16 pixels)
– Slice (one row of macro blocks)
– Picture (an entire video frame)
– Group of pictures (GOP)
– Video sequence (one or more GOPs)
(Figure: 8×8 block, 16×16 macro block, slice.)
Evolution of MPEG
 MPEG-1
– Initial audio/video compression standard
– Used by VCDs
– MP3 = MPEG-1 audio layer 3
– Target of 1.5 Mb/s bitrate at 352x240 resolution
– Only supports progressive pictures
 MPEG-2
– Current de facto standard, widely used in DVD and digital TV
– Ubiquity in hardware implies that it will be here for a long time (the transition to HDTV has taken over 10 years and is not finished yet)
– Different profiles and levels allow for quality control
Evolution of MPEG
 MPEG-3
– Originally developed for HDTV, but abandoned when MPEG-2 was determined to be sufficient
 MPEG-4
– Includes support for AV "objects", 3D content, low bitrate encoding, and DRM
– In practice, provides equal quality to MPEG-2 at a lower bitrate, but often fails to deliver outright better quality
– MPEG-4 Part 10 is H.264, which is used in HD-DVD and Blu-Ray
 MPEG-7, 2001:
– Metadata for audio-video streams, Multimedia Content Description Interface
 MPEG-21, 2002:
– Distribution, exchange, user access of multimedia data and intellectual property management
MPEG Block Diagram
MPEG technical specification
 Part 1 – Systems: describes synchronization and multiplexing of video and audio.
 Part 2 – Video: compression codec for interlaced and non-interlaced video signals.
 Part 3 – Audio: compression codec for perceptual coding of audio signals. A multichannel-enabled extension of MPEG-1 audio.
 Part 4 – Describes procedures for testing compliance.
 Part 5 – Describes systems for software simulation.
 Part 6 – Describes extensions for DSM-CC (Digital Storage Media Command and Control).
 Part 7 – Advanced Audio Coding (AAC)
 Part 8 – Deleted
 Part 9 – Extension for real-time interfaces.
 Part 10 – Conformance extensions for DSM-CC.
MPEG video spatial domain processing
 The spatial domain is handled very similarly to JPEG ("intra-frame encoded"):
– Convert RGB values to YUV colorspace
– Split frame into 8x8 blocks
– 2-D DCT on each block
– Quantization of DCT coefficients (the major reduction; controls 'quality')
– Zig-zag scan, run-length and entropy coding
MPEG – YUV compression
(Figure: chroma subsampling patterns — the number of U and V samples stored per group of Y samples for 4:2:0, 4:1:1, 4:2:2 and 4:4:4.)
 4:2:0 or 4:1:1 common in consumer products (DV)
 4:2:2 common in professional products (DVCPro)
 4:4:4 is rarely used – gives no visible improvement compared with 4:2:2
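The savings are easy to tally; a sketch counting samples over a 4-pixel-wide, 2-row block (8 pixels), assuming 8 bits per sample:

```python
def bits_per_pixel(scheme, bit_depth=8):
    """Average bits/pixel for common Y:U:V chroma subsampling patterns.
    (Y, U, V) are sample counts per 4x2-pixel block, i.e. per 8 pixels."""
    y, u, v = {"4:4:4": (8, 8, 8), "4:2:2": (8, 4, 4),
               "4:2:0": (8, 2, 2), "4:1:1": (8, 2, 2)}[scheme]
    return (y + u + v) * bit_depth / 8   # divide by the 8 pixels in the block

for s in ("4:4:4", "4:2:2", "4:2:0", "4:1:1"):
    print(s, bits_per_pixel(s))
```

So 4:2:0 and 4:1:1 halve the data relative to 4:4:4 (12 vs. 24 bits/pixel) before any transform coding has even started; they differ only in *where* the chroma samples sit, not in how many are kept.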
MPEG – Block compression
 The same as JPEG: an 8×8 spatial-domain block → DCT → frequency domain (low frequencies in the upper left, high frequencies in the lower right) → quantization.
Example 8×8 coefficient block after quantization:

 99  50  35 -11  12   0   0   0
-74  28  21  24   0   0   0   0
 87 -49  54  16   0   0   0   0
 55  95  35  22   4   2   0   0
 68  40 -17   8   8   0   0   0
 44  57  25  12   0   3   0   0
-25  32  33  24   0   0   0   0
 60  18  14   5  -1   0   0   0

Zig-zag scan of the quantized coefficients:
0 0 0 0 0 -1 0 0 0 0 0 0 3 0 5 14 24 0 0 0 0 0 0 2 8 12 …
→ RLE → Huffman
MPEG video time domain processing (temporal compression)
 Adjacent frames share large similarities.
 Temporal compression can be achieved in two ways:
– Discarding images (reduce the frame rate)
– Through motion estimation and motion vectors
MPEG video time domain processing
 Totally new ballgame (this concept doesn't exist in JPEG).
 General idea – use motion vectors to specify how a 16x16 macroblock translates between reference frames and the current frame, then code the difference between the reference and the actual block.
Types of frames
 I frame (intra-coded)
– Coded without reference to other frames
 P frame (predictive-coded)
– Coded with reference to a previous reference frame (either I or P)
– Size is usually about 1/3rd of an I frame
 B frame (bi-directional predictive-coded)
– Coded with reference to both previous and future reference frames (either I or P)
– Size is usually about 1/6th of an I frame
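Those rule-of-thumb sizes let you estimate the temporal gain of a whole GOP; a sketch using a typical 12-frame pattern:

```python
def gop_relative_size(pattern):
    """Relative size of a GOP vs. all-I coding, using the rules of thumb
    above: a P frame is ~1/3 of an I frame, a B frame ~1/6."""
    weight = {"I": 1.0, "P": 1 / 3, "B": 1 / 6}
    return sum(weight[f] for f in pattern) / len(pattern)

gop = "IBBPBBPBBPBB"   # a typical 12-frame GOP: 1 I, 3 P, 8 B
print(round(gop_relative_size(gop), 3))  # ~0.278
```

So motion-compensated prediction alone buys roughly a 3.6x reduction over intra-only coding before the spatial (JPEG-like) compression is counted.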
GOP (Group of Pictures)
 A GOP is a set of consecutive frames that can be decoded without any other reference frames.
 Usually 12 or 15 frames.
 The transmitted sequence is not the same as the displayed sequence.
 Random access to the middle of a stream – start with an I frame.
Things about prediction
 Only use a motion vector if a "close" match can be found
– Evaluate "closeness" with MSE or another metric
– Can't search all possible blocks, so need a smart algorithm
– If no suitable match is found, just code the macroblock as an I-block
– If a scene change is detected, start fresh
 Don't want too many P or B frames in a row
– Predictive error will keep propagating until the next I frame
– Delay in decoding
MPEG – Group Of Pictures (GOP)
 MPEG uses three types of frames, grouped in a Group Of Pictures (GOP):
– I-pictures (Intracoded)
– P-pictures (Predictive Coded)
– B-pictures (Bidirectionally interpolated)
(Figure: GOP sequence I B B P B B P …, with forward prediction from I/P to the next P and bidirectional prediction for the B frames.)
Compressed video stream
Temporal Redundancy Reduction
 I frames are independently encoded.
 P frames are based on previous I and P frames.
 B frames are based on previous and following I and P frames
– in case something is uncovered.
MPEG – Motion Estimation
 Calculate the position of the macro block in the new image.
 Store the motion vector and the difference in appearance.
MPEG – Motion Estimation
 Helps in understanding the content of an image sequence.
 Helps reduce the temporal redundancy of video
– for compression.
 Stabilizing video by detecting and removing small, noisy global motions
– for building the stabilizer in a camcorder.
A hard problem in general!
Bitrate allocation
 CBR – Constant BitRate
– Streaming media uses this
– Easier to implement
 VBR – Variable BitRate
– DVDs use this
– Usually requires 2-pass coding
– Allocate more bits for complex scenes
– This is worth it, because you assume that you encode once, decode many times
MPEG – Data stream

Display order:        I B B P B B P   (frames 1 2 3 4 5 6 7)
Order in data stream: I P B B P B B   (frames 1 4 2 3 7 5 6)
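The reordering rule is simple: each reference frame (I or P) must be sent before the B frames that depend on it. A sketch reproducing the table above:

```python
def stream_order(display):
    """Reorder display-order frame types for transmission: a reference
    frame (I or P) is sent ahead of the B frames that precede it in
    display order. Returns (frame type, display position) pairs."""
    out, pending_b = [], []
    for i, f in enumerate(display, 1):
        if f == "B":
            pending_b.append((f, i))   # hold B until its future reference arrives
        else:
            out.append((f, i))         # I or P closes the pending B group
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

print(stream_order("IBBPBBP"))
# [('I', 1), ('P', 4), ('B', 2), ('B', 3), ('P', 7), ('B', 5), ('B', 6)]
```

This is also why B frames add decoding delay: the decoder must buffer until the next reference frame shows up.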
MPEG - Audio
 MPEG-1 – 3 layers of increasing quality, layer 3 being the most common (MP3)
– 16 bits
– Sampling rate: 32, 44.1, or 48 kHz
– Bitrate: 32 to 320 kbps
– De facto: 44.1 kHz sample rate, 192 kbps bitrate
– Usually CBR, but can do VBR
 MPEG-2 – Supports > 2 channels, lower sampling frequencies, low bitrate improvement
 AAC (Advanced Audio Coding)
– More sample frequencies (8 kHz to 96 kHz)
– Higher coding efficiency and simpler filterbank
– 96 kbps AAC sounds better than 128 kbps MP3
MPEG Container Format
 A container format is a file format that can contain data compressed by standard codecs.
 2 types for MPEG:
– Program Stream (PS) – designed for reasonably reliable media, such as disks
– Transport Stream (TS) – designed for lossy links, such as networks or broadcast antennas
AV Synchronization
 Want audio and video streams to be played back in sync with each other.
 The video stream contains "presentation timestamps".
 The MPEG-2 clock runs at 90 kHz
– Good for both 25 and 30 fps
 PCR (Program Clock Reference) timestamps are sent with the data by the sender.
 The receiver uses a PLL (Phase Lock Loop) to synchronize clocks.
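Why 90 kHz is a good choice is a one-line check: the clock divides evenly into ticks per frame for the common frame rates, so presentation timestamps stay exact integers.

```python
# 90 kHz ticks per frame for common frame rates.
for fps in (24, 25, 30):
    print(fps, 90_000 // fps, 90_000 % fps == 0)  # e.g. 25 fps -> 3600 ticks/frame
```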
Real time video encoding
 Motion estimation will be worse, so need a higher bitrate to compensate.
 Very hard to do in software; need dedicated hardware or hardware assistance.
 Tivo and ReplayTV do this.
Streaming media
 Common types include Flash, RealVideo, Quicktime.
 Usually have low bandwidth available; need to optimize as such.
 Want dedicated network protocols for this purpose
– TCP will wait indefinitely for retransmission, so it is often not suitable.
MPEG data stream analysis
 Pros
– Overall sharp picture
– Audio and video stay in sync with each other
 Cons
– Picture flashes and blurs when there is too much movement on screen
– A higher bitrate often does not solve this problem
 What if we were transmitting this over a network?
Image/Video Compression Standards
 A bitstream is useful only if the recipient knows the code!
 Standardization efforts are important:
– Technology and algorithm benchmark
– System definition and development
– Patent pool management
 A standard defines the bitstream (decoder), not how you generate it (encoder)!
current industry focus:
H.264 encoding/decoding on mobile devices,
low-latency video transmission over various networks,
low-power video codecs …
audio coding versus image coding

VC demo