12/1/2014

Compression - reducing the number of bits

• Lossless vs. lossy compression
• Compression ratios and savings
• Entropy as a measurement of information
• Huffman code construction and decoding

Data Compression Ratio and Savings

• Data Compression Ratio (DCR):

  $\mathrm{DCR} = \dfrac{\text{uncompressed size}}{\text{compressed size}}$

• Savings:

  $\text{Savings} = 1 - \dfrac{\text{compressed size}}{\text{uncompressed size}}$   (×100 for %)

L25Q1. Stereo audio is sampled at 44.1 kHz, quantized to 16 bits/channel, and then compressed to a 128 kbps mp3 playback format. What are the approximate DCR and the resulting savings?

L25Q2. A picture of a samurai was saved as a 24-bit samurai.bmp (full size, 2188 kB) and as a 31 kB samurai.png. Estimate the DCR and savings from the PNG compression.

Lossy and Lossless Compression

• Lossy compression
  – Usually leads to larger DCR and savings
  – Sometimes creates noticeable "artifacts"
  – Examples: mp3, mpeg, jpeg
• Lossless compression (keeping all information)
  – Uses repetition or other data statistics
  – Usually leads to smaller compression ratios (~2)
  – Examples: PNG, run-length codes, Huffman codes, ...

[image from Wikipedia]

L25Q3. Why was the samurai picture compressed so much? Can we expect to achieve such a DCR with a photograph?

Amount of information in data

• The amount of information is measured in bits
• Consider the following dialogue:
  – I have information for you.
  – What is it?
  – Guess!
  – Can I ask yes/no questions?
  – OK. You can ask 20 of them. Use them wisely.

[image from Wikipedia]

L25Q4. How much information can be obtained with the 20 yes/no questions?

Entropy measures information

• If you can predict the data, it contains less information
• Higher entropy in the data means more information in it
• The entropy $H$ of a symbol or a message, with $p_i$ representing the statistical frequency (a.k.a. probability) of the $i$-th possibility, is defined as

  $H = -\sum_{i=1}^{N} p_i \log_2 p_i = \sum_{i=1}^{N} p_i \log_2 \frac{1}{p_i}$

L25Q5. What is the entropy in the result of a single flip of a fair coin?

L25Q6. What is the entropy of the number of "heads" in two coin flips?

Review of logarithms and properties

• The base-2 logarithm gives the power-of-2 equivalent of a number: $x = \log_2 y \Rightarrow y = 2^x$
• The logarithm of the inverse of a number is the negative log of the number: $\log_2 \frac{1}{y} = -\log_2 y$
• The logarithm of a product is the sum of two logarithms: $\log_2 (yz) = \log_2 y + \log_2 z$
• The logarithm of a ratio is the difference of two logarithms: $\log_2 \frac{y}{z} = \log_2 y - \log_2 z$

| y        | 1 | 2 | 3   | 4 | 5   | 6 | 7   | 8 | 9 | 10 | 11  | 12 | 13  | 14 | 15 |
| ~log2 y  | 0 | 1 | 1.6 |   | 2.3 |   | 2.8 |   |   |    | 3.5 |    | 3.7 |    |    |

L25Q7. Complete the above table using logarithm properties.

L25Q8. What is $\log_2 \frac{1}{36}$?

Entropy of the sum of two dice

| + | 1 | 2 | 3 | 4  | 5  | 6  |
| 1 | 2 | 3 | 4 | 5  | 6  | 7  |
| 2 | 3 | 4 | 5 | 6  | 7  | 8  |
| 3 | 4 | 5 | 6 | 7  | 8  | 9  |
| 4 | 5 | 6 | 7 | 8  | 9  | 10 |
| 5 | 6 | 7 | 8 | 9  | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |

| S | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12   |
| p | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |

L25Q9. What is the entropy of the sum of two dice?

L25Q10. What is the entropy of one out of eleven equally likely outcomes?

Super-Fast Sandwiches

| Order | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 |
| Count | 36 | 25 | 18 | 20 | 12 | 5  | 60 | 24 |

L25Q11. How many bits are needed to encode each order with a bit sequence?

L25Q12. What is the entropy of one order, given the popularity statistics above?

Huffman Codes use bits efficiently

| Order | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 |
| Count | 36 | 25 | 18 | 20 | 12 | 5  | 60 | 24 |

Use fewer bits for more common symbols. Here's how:
1. Place the least frequent symbol on the right "1" branch and the next least frequent on the left "0" branch as end nodes of a tree graph.
2. Combine these two symbols into one new node and mark the frequency of the new node. Return to step 1, treating the combined nodes as new symbols.

L25Q13. Create a Huffman tree based on the order statistics given above.
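The sketch below is not part of the original slides; it is a minimal Python implementation of the tree-building procedure above, using the sandwich counts as input. The helper name `huffman_code` and the heap-based bookkeeping are illustrative choices. Because ties between equal frequencies can be broken in more than one way, the individual codewords it prints may differ from the tree drawn in class, even though the codeword lengths (and therefore the average length) come out the same.

```python
import heapq


def huffman_code(frequencies):
    """Build a Huffman code from a dict of symbol -> count (or probability).

    Returns a dict of symbol -> bit string. A tree is either a symbol (leaf)
    or a pair (left_subtree, right_subtree). Following the slide's steps 1-2,
    the two least frequent nodes are merged repeatedly until one tree remains.
    """
    # Heap entries are (frequency, tie_breaker, tree); the integer tie_breaker
    # keeps tuple comparison from ever reaching the (non-comparable) trees.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    count = len(heap)
    if count == 1:                       # degenerate case: a single symbol
        ((_, _, sym),) = heap
        return {sym: "0"}
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # least frequent -> right "1" branch
        f2, _, t2 = heapq.heappop(heap)  # next least frequent -> left "0" branch
        count += 1
        heapq.heappush(heap, (f1 + f2, count, (t2, t1)))
    # Walk the finished tree, appending "0" for left and "1" for right.
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    (_, _, root) = heap[0]
    walk(root, "")
    return codes


if __name__ == "__main__":
    # Sandwich order counts from the slide (#1 .. #8).
    orders = {"#1": 36, "#2": 25, "#3": 18, "#4": 20,
              "#5": 12, "#6": 5, "#7": 60, "#8": 24}
    for sym, code in sorted(huffman_code(orders).items()):
        print(sym, code)
```

Running the demo prints one codeword per order; the most popular order (#7) gets the shortest codeword and the least popular (#6) the longest, which is the whole point of the construction.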
Encoding and decoding Huffman

| Order | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 |
| Count | 36 | 25 | 18 | 20 | 12 | 5  | 60 | 24 |
| Code  |    |    |    |    |    |    |    |    |

Huffman codes are prefix-free! (If you know where the message starts, you can separate the symbols without confusion.)

L25Q14. Complete the table above with Huffman codes from the tree above.

L25Q15. Decode the order for the following bit sequence.

Average code length is no less than entropy

• Given $N$ symbols $s_1, s_2, \ldots, s_N$ with corresponding frequencies $f_i$ and codeword lengths $L_i$, the average length per symbol is

  $\bar{L} = \dfrac{\sum_{i=1}^{N} f_i L_i}{\sum_{i=1}^{N} f_i} = \sum_{i=1}^{N} p_i L_i$

L25Q16. What is the average bit length per sandwich order?

L25Q17. How does the average bit length compare to the entropy?

Learning Objectives

a. To compute compression ratio and savings.
b. To use relative frequency to compute entropy, the shortest theoretical average code length.
c. To encode a symbol set with a Huffman code.
d. To decode a Huffman-encoded message.
e. To compute the average code length for a given code.
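To make objectives (b) and (e) concrete, here is a small Python sketch (not from the original slides) that computes the entropy of a symbol set from its relative frequencies and the average code length for a given code. The function names and the three-symbol toy example are assumptions made for illustration; the same two calls can be used to check L25Q12, L25Q16, and L25Q17 once the codes from the Huffman tree are filled in.

```python
from math import log2


def entropy(freqs):
    """Entropy in bits/symbol: H = sum of p_i * log2(1 / p_i).

    `freqs` maps symbols to counts or probabilities; they are normalized
    internally, matching the definition on the slides.
    """
    total = sum(freqs.values())
    return sum((f / total) * log2(total / f) for f in freqs.values() if f > 0)


def average_length(freqs, code):
    """Average codeword length in bits/symbol: L = sum of p_i * len(code_i)."""
    total = sum(freqs.values())
    return sum((f / total) * len(code[s]) for s, f in freqs.items())


if __name__ == "__main__":
    # Toy example (not from the slides): three symbols with an obvious
    # prefix-free code. The probabilities are exact powers of 1/2, so the
    # average length meets the entropy bound exactly (1.5 bits/symbol).
    freqs = {"A": 2, "B": 1, "C": 1}
    code = {"A": "0", "B": "10", "C": "11"}
    print("H =", entropy(freqs))               # 1.5
    print("L =", average_length(freqs, code))  # 1.5
```

For the sandwich statistics the probabilities are not all powers of 1/2, so the average Huffman code length will come out strictly above the entropy, which is exactly the comparison L25Q17 asks about.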