Compression - reducing the number of bits
• Lossless vs. lossy compression
• Compression ratios and savings
• Entropy as a measurement of information
• Huffman code construction and decoding
Data Compression Ratio and Savings
• Data Compression Ratio (DCR):
  $\mathrm{DCR} = \dfrac{\#\,\text{uncompressed bits}}{\#\,\text{compressed bits}}$
• Savings:
  $\text{Savings} = 1 - \dfrac{\#\,\text{compressed bits}}{\#\,\text{uncompressed bits}}$   (x100 for %)
L25Q1. Stereo audio is sampled at 44.1 kHz and quantized to 16 bits/channel
and then compressed to 128 kbps mp3 playback format. What are the approximate
DCR and the resulting savings?
L25Q2. A picture of a samurai was saved as a 24-bit samurai.bmp (full size, 2188 kB)
and a 31 kB samurai.png. Estimate the DCR and savings from the PNG compression.
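For instance, Q1 can be worked out directly from the bit rates; a minimal Python sketch, using only the numbers given in the question:

```python
# Worked sketch for L25Q1 (numbers taken from the question).
uncompressed_bps = 44_100 * 16 * 2      # samples/s * bits/sample * 2 channels = 1,411,200 bps
compressed_bps = 128_000                # 128 kbps mp3

dcr = uncompressed_bps / compressed_bps             # about 11
savings = 1 - compressed_bps / uncompressed_bps     # about 0.91

print(f"DCR ~ {dcr:.1f}, savings ~ {savings:.0%}")
```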
Lossy and Lossless Compression
• Lossy Compression
– Usually leads to larger DCR and savings
– Sometimes creates noticeable “artifacts”
– Examples: mp3, mpeg, jpeg
• Lossless Compression (keeping all information)
– Uses repetition or other data statistics
– Usually leads to smaller compression ratios (~2)
– Examples: PNG, run-length codes, Huffman codes…
L25Q3. Why was the samurai picture compressed so much?
Can we expect to achieve such DCR with a photograph?
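One way to get intuition for Q3 is to compare how a lossless coder handles predictable versus noise-like data. A rough illustration using Python's zlib (a DEFLATE coder, used here only as a stand-in for PNG's lossless back end):

```python
import os
import zlib

flat = bytes(1_000_000)            # a megabyte of identical bytes: highly predictable
noise = os.urandom(1_000_000)      # a megabyte of random bytes: unpredictable

for name, data in [("flat", flat), ("noise", noise)]:
    packed = zlib.compress(data, 9)
    print(f"{name}: DCR = {len(data) / len(packed):.1f}")
# The flat data compresses dramatically; the random data barely compresses at all.
```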
Amount of information in data
• The amount of information is measured in bits
• Consider the following dialogue:
– I have information for you.
– What is it?
– Guess!
– Can I ask yes/no questions?
– OK. You can ask 20 of them. Use them wisely.
L25Q4. How much information can be obtained with the 20 yes/no questions?
Entropy measures information
• If you can predict the data, it contains less information
• Higher entropy means the data carries more information
• Entropy, $H$, of each symbol or message can be defined with $p_i$ representing the statistical frequency (aka probability) of the $i$-th possibility:

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i}$$
L25Q5. What is the entropy in a result of a single flip of a fair coin?
L25Q6. What is the entropy of a number of “heads” in two coin flips?
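A minimal Python sketch of the definition above (the function name is just illustrative); note how a more predictable, biased coin yields less than one bit:

```python
from math import log2

def entropy(probs):
    """H = sum of p * log2(1/p) over outcomes with nonzero probability."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair coin: 1.0 bit per flip
print(entropy([0.9, 0.1]))    # heavily biased (hypothetical) coin: about 0.47 bits
```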
Review of logarithms and properties
• Base-2 logarithm gives the power-of-2 equivalent of a number:
  $x = \log_2 y \;\Rightarrow\; y = 2^x$
• Logarithm of the inverse of a number is the negative log of the number:
  $\log_2 \frac{1}{y} = -\log_2 y$
• Logarithm of a product is the sum of the two logarithms:
  $\log_2 (yz) = \log_2 y + \log_2 z$
• Logarithm of a ratio is the difference of the two logarithms:
  $\log_2 \frac{y}{z} = \log_2 y - \log_2 z$

   y    |  1 |  2 |  3  |  4 |  5  |  6 |  7  |  8 |  9 | 10 | 11  | 12 | 13  | 14 | 15
~log2 y |  0 |  1 | 1.6 |    | 2.3 |    | 2.8 |    |    |    | 3.5 |    | 3.7 |    |
L25Q7. Complete the above table using logarithm properties.
L25Q8. What is $\log_2$ (…)?
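A quick check of the product rule against the table, assuming the rounded values given above (Python's math.log2 gives the exact values):

```python
from math import log2

# Rounded values from the table; composite rows follow from the primes, e.g. 6 = 2*3 and 15 = 3*5.
approx = {2: 1.0, 3: 1.6, 5: 2.3}
print(approx[2] + approx[3], log2(6))     # 2.6 vs 2.585...
print(approx[3] + approx[5], log2(15))    # 3.9 vs 3.907...
```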
Entropy of the sum of two dice
 +  |  1   2   3   4   5   6
----+------------------------
 1  |  2   3   4   5   6   7
 2  |  3   4   5   6   7   8
 3  |  4   5   6   7   8   9
 4  |  5   6   7   8   9  10
 5  |  6   7   8   9  10  11
 6  |  7   8   9  10  11  12

 S  |   2     3     4     5     6     7     8     9    10    11    12
 p  | 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
L25Q9. What is the entropy of the sum of two dice?
L25Q10. What is the entropy of one out of eleven equally likely outcomes?
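The same entropy formula applies to the distribution in the table; a short sketch using the tabulated counts:

```python
from math import log2

counts = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]     # ways to roll S = 2, 3, ..., 12
probs = [c / 36 for c in counts]

H = sum(p * log2(1 / p) for p in probs)
print(f"H(sum of two dice) = {H:.2f} bits")     # less than log2(11), the equally-likely case
```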
Super-Fast Sandwiches
Order | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8
Count | 36 | 25 | 18 | 20 | 12 |  5 | 60 | 24
L25Q11. How many bits are needed to encode each order with a bit sequence?
L25Q12. What is the entropy of one order given the popularity statistics above?
4
12/1/2014
Huffman Codes use bits efficiently
Order | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8
Count | 36 | 25 | 18 | 20 | 12 |  5 | 60 | 24
Use fewer bits for more common symbols. Here’s how:
1. Place the least frequent symbol on the right “1” branch and the next least
frequent on the left “0” branch, as end nodes of a tree graph.
2. Combine these two symbols into one new node and mark the combined frequency
on the new node. Return to step 1, treating combined nodes as new symbols.
(A code sketch of this procedure follows Q13 below.)
L25Q13. Create a Huffman tree based on the order statistics given above.
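Below is a compact sketch of the procedure using Python's heapq. Tie-breaking and the 0/1 labeling of branches may differ from a hand-drawn tree, so individual codewords can vary even though the code lengths stay optimal.

```python
import heapq

def huffman_codes(freqs):
    """freqs: dict of symbol -> count; returns dict of symbol -> bit string."""
    # Each heap entry is (frequency, tie-breaker, {symbol: code built so far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, right = heapq.heappop(heap)    # least frequent -> "1" branch
        f0, _, left = heapq.heappop(heap)     # next least frequent -> "0" branch
        merged = {s: "1" + c for s, c in right.items()}
        merged.update({s: "0" + c for s, c in left.items()})
        heapq.heappush(heap, (f0 + f1, tie, merged))
        tie += 1
    return heap[0][2]

orders = {"#1": 36, "#2": 25, "#3": 18, "#4": 20, "#5": 12, "#6": 5, "#7": 60, "#8": 24}
print(huffman_codes(orders))
```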
Encoding and decoding Huffman
Order | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8
Count | 36 | 25 | 18 | 20 | 12 |  5 | 60 | 24
Code  |    |    |    |    |    |    |    |
Huffman codes are prefix-free! (If you know where the message starts,
you can separate the symbols without confusion.)
L25Q14. Complete the table above with Huffman codes from the tree above.
L25Q15. Decode the order for the following bit sequence
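Prefix-freedom makes decoding simple: read bits until they match a complete codeword, emit that symbol, and start over. A sketch, using one possible code table for the counts above (the 0/1 labeling of your tree may differ):

```python
def huffman_decode(bits, codes):
    """bits: string of '0'/'1'; codes: dict of symbol -> codeword (prefix-free)."""
    inverse = {code: sym for sym, code in codes.items()}
    decoded, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:          # a complete codeword has been read
            decoded.append(inverse[buf])
            buf = ""
    return decoded

# One possible code table for the counts above (not necessarily your labeling):
codes = {"#1": "11", "#2": "001", "#3": "0000", "#4": "101",
         "#5": "00010", "#6": "00011", "#7": "01", "#8": "100"}
print(huffman_decode("0111101", codes))   # -> ['#7', '#1', '#4']
```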
Average code length is no less than entropy
• Given $n$ symbols $s_1, s_2, \ldots, s_n$ and corresponding frequencies (counts) $f_i$, the average length per symbol is

$$\bar{L} = \frac{1}{N}\sum_{i=1}^{n} f_i L_i = \sum_{i=1}^{n} p_i L_i$$

where $L_i$ is the codeword length of symbol $s_i$, $N$ is the total count, and $p_i = f_i / N$.
L25Q16. What is the average bit length per sandwich order?
L25Q17. How does the average bit length compare to entropy?
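A sketch of the comparison, using one possible set of Huffman code lengths for the counts above; the average comes out slightly above the entropy, as the slide title promises:

```python
from math import log2

counts = {"#1": 36, "#2": 25, "#3": 18, "#4": 20, "#5": 12, "#6": 5, "#7": 60, "#8": 24}
lengths = {"#1": 2, "#2": 3, "#3": 4, "#4": 3, "#5": 5, "#6": 5, "#7": 2, "#8": 3}

N = sum(counts.values())                                    # 200 orders in total
avg = sum(counts[s] * lengths[s] for s in counts) / N       # average bits per order
H = sum((c / N) * log2(N / c) for c in counts.values())     # entropy of one order

print(f"average length = {avg:.2f} bits, entropy = {H:.2f} bits")
```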
Learning Objectives
a. Compute the compression ratio and savings.
b. Use relative frequency to compute entropy, the shortest theoretical average code length.
c. Encode a symbol set with a Huffman code.
d. Decode a Huffman-encoded message.
e. Compute the average code length for a given code.