INFORMATION TO USERS


This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. Each original is also photographed in one exposure and is included in reduced form at the back of the book.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

U-M-I

University Microfilms International

A Bell & Howell Information Company

300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA

313/761-4700 800/521-0600


Order Number 9426576

Entropy-constrained predictive trellis coded quantization and compression of hyperspectral imagery

Abousleman, Glen Patrick, Ph.D.

The University of Arizona, 1994

U·M·I

300 N. Zeeb Rd.

Ann Arbor, MI 48106

ENTROPY-CONSTRAINED PREDICTIVE TRELLIS CODED

QUANTIZATION AND COMPRESSION OF HYPERSPECTRAL

IMAGERY

by

Glen Patrick Abousleman

A Dissertation Submitted to the Faculty of the

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

In Partial Fulfillment of the Requirements

For the Degree of

DOCTOR OF PHILOSOPHY

WITH A MAJOR IN ELECTRICAL ENGINEERING

In the Graduate College

THE UNIVERSITY OF ARIZONA

1994

THE UNIVERSITY OF ARIZONA

GRADUATE COLLEGE

As members of the Final Examination Committee, we certify that we have read the dissertation prepared by Glen Patrick Abousleman, entitled Entropy-Constrained Predictive Trellis Coded Quantization and Compression of Hyperspectral Imagery, and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy.

Robert Schowengerdt    Date

Final approval and acceptance of this dissertation is contingent upon the candidate's submission of the final copy of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

R. Hunt    Date


STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at The University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library.

Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.

SIGNED:


ACKNOWLEDGEMENTS

I would like to thank my advisors, Dr. Michael W. Marcellin and Dr. Bobby R. Hunt, for their support, persistence, guidance, and willingness to help at a moment's notice. A special thanks goes to my fellow graduate students and lab colleagues Ralph, Jim, Phil, Scott, Patrick, Dave, and Mike.

On a personal level, I wish to thank my parents for their continued support, confidence, and encouragement.


TABLE OF CONTENTS

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 ENTROPY-CONSTRAINED PREDICTIVE TCQ . . . . . . . . . . . . 20
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Trellis Coded Quantization . . . . . . . . . . . . . . . . . . . . . . . 29
2.3 Entropy-Constrained Trellis Coded Quantization . . . . . . . . . . . 34
2.4 Entropy-Constrained Predictive Trellis Coded Quantization . . . . . 37
2.4.1 Performance of ECPTCQ . . . . . . . . . . . . . . . . . . . . . . 40

3 HYPERSPECTRAL IMAGE CODER USING ECPTCQ . . . . . . . . 43
3.1 System Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2 Codebook Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3 Side Information and Rate Allocation . . . . . . . . . . . . . . . . . 46
3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4 3-D DCT HYPERSPECTRAL CODER . . . . . . . . . . . . . . . . . 56
4.1 System Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2 Codebook Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3 Side Information and Rate Allocation . . . . . . . . . . . . . . . . . 60
4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5 HYBRID DPCM/DCT HYPERSPECTRAL CODER . . . . . . . . . . 64
5.1 System Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2 Codebook Design, Rate Allocation, and Side Information . . . . . . 67
5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

6 COMPARISONS OF HYPERSPECTRAL CODERS . . . . . . . . . . 72

7 SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

Appendix A. PHOTOGRAPHS . . . . . . . . . . . . . . . . . . . . . . . 81

REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86


LIST OF FIGURES

2.1 Input-output characteristics of a uniform scalar quantizer . . . . . . 23
2.2 A 4-state trellis with subset labeling and codebook . . . . . . . . . . 30
2.3 Granular gain for various trellis sizes (log2 N) . . . . . . . . . . . . . 31
2.4 Performance of 4-state and 256-state TCQ for encoding the memoryless Gaussian source . . . 32
2.5 Performance of 4-state and 256-state TCQ for encoding the memoryless Laplacian source . . . 33
2.6 Uniform TCQ codebook . . . 35
2.7 Performance of an 8-state entropy coded uniform TCQ for encoding the memoryless Gaussian source . . . 36
2.8 Performance of an 8-state entropy-constrained TCQ for encoding the memoryless Gaussian source . . . 37
2.9 SNR performance of ECPTCQ . . . 42
3.1 Hyperspectral image coder . . . 44
3.2 Overall rate versus number of spectral bands . . . 51
3.3 Performance of encoding hyperspectral sequence at R_B = 0.10 b/p/b, R_σ = 1.0 bpp, and R_µ = 2.0 bpp . . . 53
3.4 Performance of encoding hyperspectral sequence at R_B = 0.10 b/p/b, R_σ = 0.75 bpp, and R_µ = 0.75 bpp . . . 54
4.1 Hyperspectral image coder using the 3-D DCT and ECTCQ . . . 56
4.2 Probability density function for generalized Gaussian distributions with alpha values of 0.5, 1.0, 1.5, 2.0, and 2.5 . . . 58
4.3 Kurtosis vs. alpha . . . 59
4.4 Performance of encoding hyperspectral sequence at R_B = 0.10 b/p/b . . . 62
4.5 Overall rate versus number of spectral bands . . . 63
5.1 Hybrid DPCM/DCT hyperspectral image encoder . . . 65
5.2 Performance of encoding hyperspectral sequence at R_B = 0.10 b/p/b . . . 70
5.3 Overall rate versus number of spectral bands . . . 70
6.1 Performance of encoding hyperspectral sequence at R_B = 0.10 b/p/b . . . 73
6.2 Overall rate versus number of spectral bands . . . 74
A.1 Band 50 of a hyperspectral sequence (256 x 256) . . . 81
A.2 Encoded image from ECPTCQ coder with high side information (44.37 dB at 0.19 b/p/b) . . . 82
A.3 Difference image from ECPTCQ coder with high side information . . . 82
A.4 Encoded image from ECPTCQ coder with low side information (42.11 dB at 0.17 b/p/b) . . . 83
A.5 Difference image from ECPTCQ coder with low side information . . . 83
A.6 Encoded image from 3-D DCT coder (41.66 dB at 0.104 b/p/b) . . . 84
A.7 Difference image from 3-D DCT coder . . . 84
A.8 Encoded image from hybrid coder (41.01 dB at 0.116 b/p/b) . . . 85
A.9 Difference image from hybrid coder . . . 85


ABSTRACT

A training-sequence-based entropy-constrained predictive trellis coded quantization (ECPTCQ) scheme is presented for encoding autoregressive sources. For encoding a first-order Gauss-Markov source, the MSE performance of an 8-state ECPTCQ system exceeds that of entropy-constrained DPCM by up to 1.0 dB. In addition, three systems, an ECPTCQ system, a 3-D Discrete Cosine Transform (DCT) system, and a hybrid system, are presented for compression of hyperspectral imagery, all of which utilize trellis coded quantization (TCQ). Specifically, the first system utilizes a 2-D DCT and ECPTCQ. The 2-D DCT is used to transform all nonoverlapping 8 x 8 blocks of each band. Thereafter, ECPTCQ is used to encode the transform coefficients in the spectral dimension. The 3-D DCT system uses TCQ to encode transform coefficients resulting from the application of an 8 x 8 x 8 DCT. The hybrid system uses DPCM to spectrally decorrelate the data, while a 2-D DCT coding scheme is used for spatial decorrelation. Side information and rate allocation strategies for all systems are discussed. Entropy-constrained codebooks are optimized for various generalized Gaussian distributions using a modified version of the generalized Lloyd algorithm. The first system can compress a hyperspectral image sequence at 0.125 bits/pixel/band while retaining an average peak signal-to-noise ratio of greater than 43 dB over the spectral bands. The 3-D DCT and hybrid systems achieve compression ratios of 77:1 and 69:1 while maintaining average peak signal-to-noise ratios of 40.75 dB and 40.29 dB, respectively, over the coded bands.


CHAPTER 1

INTRODUCTION

The common operating mode for strategic and tactical reconnaissance sensors has been, from the earliest days of photography, panchromatic. That is, most focal plane sensors integrate a wide range of wavelengths into a single response. This single response is then usually displayed as a gray-scale image in which the integrated wavelength response at a given spatial point is given a gray-scale value between pure white (maximum response) and pure black (zero response). The choice of panchromatic sensing has been purely pragmatic in motivation. Although it is known that some narrow-band responses are of interest, such as in the infrared, there have been few problems of reconnaissance significance that were driven by the need to couple wavelength sensitivity to spatial resolution.

The preference for purely panchromatic sensing has begun to wane in recent years. A variety of questions are now important that can be answered only by the ability to perform precise recording of sensed energy in a number of narrow wavelength slices. For example, various types of camouflage and concealment techniques are revealed by narrow-band spectral sensing. The effluents of various manufacturing facilities, as sensed by fine spectral resolution, can be a critical clue to the type of processes employed in the facility. The agricultural yield and health of crops can be predicted from quantitative analysis of fine-resolution spectral images [1]. The development and utilization of fine-resolution spectral sensors is becoming of prominent interest for these and many other applications.

The Landsat series of earth satellites conclusively established the value of employing image sensors with multiple wavelength sensitivity [2]. Landsat images represent a rather coarse slicing of the optical wavelength spectrum, however, being only 4 to 6 overlapping bands through the visible and near-infrared wavelengths, with each band having a width of 100 to 200 nm. Given that many surface materials have absorption features that are only 20 to 40 nm wide [1], it is apparent that this class of "multispectral" sensors cannot record the narrow wavelength absorption features that are indicative of specific materials in laboratory-based spectroscopy [3].

The limitation associated with multispectral scanners has led to imaging spectrometers, which register many narrow-waveband images of a scene and allow the assemblage of a contiguous reflectance spectrum for each pixel in the scene. An early example of such a sensor is the Airborne Imaging Spectrometer (AIS) [4]. AIS was developed by the NASA Jet Propulsion Laboratory (JPL) for civil environmental applications. AIS could simultaneously record 128 near-infrared wavebands (each being 9.3 nm wide) with a 365 to 787 m swath.

A somewhat similar yet more complex JPL sensor was the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [5],[6]. This sensor collected 209 visible and near-infrared wavebands, each of width 10 nm, with an 11-km swath. The radiometric quantization of AVIRIS is 10 bits.

The Earth Observing Satellite (EOS), to be launched in the near future, will serve as the platform for new, advanced high-dimensional multispectral scanners. EOS will contain the High Resolution Imaging Spectrometer (HIRIS)¹ [7], a Moderate Resolution Imaging Spectrometer (MODIS) [8], and a synthetic aperture radar. HIRIS can collect 192 spectral bands in the 0.4-2.5 µm wavelength region, with 30-meter spatial and 10-nm spectral resolution, over a 24-km-wide swath. The 2-D CCD array aboard HIRIS is sampled and quantized at 12 bits. The 192 bands will comprise a single HIRIS (i.e., hyperspectral) image. HIRIS will be used by a wide diversity of remote sensing disciplines including geology, oceans, soil, vegetation, atmosphere, snow/ice, and others.

The extreme spectral resolution afforded by HIRIS is representative of future spectral sensors. However, because of the enhanced spectral (and spatial) resolution, tremendous amounts of raw data are produced. In fact, HIRIS will expel 512 megabits/second in its operational state. Transmission of the complete data record to an earth receiving station is difficult since the EOS downlink channel is capable of only 300 megabits/second and must service all on-board experiments, not just HIRIS [9].

¹The HIRIS project has been cancelled with no replacement announced.

An obvious data reduction technique is to discard a portion of the data (i.e., so-called "spectral editing"). Although the remaining data is unscathed, many of the advantages of the fine spectral resolution imagery may be lost. Perhaps a more desirable solution is to use lossless or information preserving compression [10]. The purpose of lossless compression is to represent the data using a minimum number of bits by reducing the statistical redundancies inherent to the data. Although certainly viable, this method can only provide compression ratios of about 3:1 [11].

Another alternative, which is the subject of this dissertation, is lossy compression [12],[13],[14]. Although distortion is introduced in the reconstructed data, very high compression ratios can be obtained. Proper optimization of the compression system can yield an error small enough that visual degradations are practically nonexistent and classification errors are small. Compression ratios of greater than 30:1 would enable HIRIS to operate 100% of the time rather than at the anticipated 2 to 3% duty cycle [9].

Once the sensor information is downloaded to an earth receiving station, processing and handling of the data is very problematic. For example, the EOS platform will expel terabytes of data per day [15], with the total data collected being on the order of 10^16 bytes over its 15-year mission. Moreover, it is estimated that data handling expenses alone will amount to $300 million over the life of the system [16]. Ground-based operations include archival, browsing, dissemination, and rapid delivery and analysis of hyperspectral data.

In its pristine form, storage of the data may be feasible (although expensive) with the use of optical disk technology, but transmission from site to site places undue demands on the communications link, and browsing as a prelude to analysis is nearly impossible. It may be advantageous to have browse-quality data online to facilitate rapid viewing of several hyperspectral images. Fewer bytes on the disk equate to much faster read times, and transmission costs of the data are reduced significantly [16].

Lossy compression of imagery can manifest itself in sensor-based and/or ground-based systems. The former requires fast, efficient algorithms, implementable with existing low-complexity (low power consumption) hardware in near-real time. The requirements of the latter are somewhat relaxed in that greater-complexity hardware can be utilized, and speed may not be of paramount importance. However, as discussed earlier, for uses such as rapid browsing, decoding should be fast enough to handle data rates in real time (50 kbit/sec). Also, savings in transmission and storage should not be negated by decoding computational costs.

HIRIS/AVIRIS is exemplary of the characteristics of future fine-spectral-resolution image sensors. The volume of data in such images requires fundamental rethinking of many image processing operations that have been developed for panchromatic and even low-dimensional multispectral data. A property of fine-spectral-resolution imagery is interband correlation.

It is easy to observe in even coarse-band imagery, such as Landsat multispectral or three-primary color images, that many features of edge definition, contrast, texture, gray level, etc., remain substantially the same from spectral band to spectral band. This interband correlation facilitates substantial reduction of the data required for storing and/or transmitting such imagery. However, a careless approach to reducing the correlation could lead to disastrous loss of the information differences between bands that constitute the critical value of multispectral imagery. An improper reduction of correlation redundancy could make it impossible to exploit the imagery for significant utility.

Recent work related to lossy compression of multispectral and hyperspectral imagery has been reported in the literature. In [9], DPCM, block truncation coding, transform coding, and various vector quantization (VQ) schemes were investigated to compress hyperspectral data from the Airborne Imaging Spectrometer.

It was found that mean residual and gain-shape VQ yield the best results (in the context of spectral signature mapping and mixture component analysis, as performed by the Spectral Analysis Manager (SPAM) software package) with compression ratios of approximately 20:1. VQ was found to yield the best quantitative performance with a compression ratio of 32:1.

Nonlinear predictive VQ was developed in [17], which used interband vector prediction and intraband VQ. Simulations with Landsat Thematic Mapper (TM) imagery yielded average bit rates of 0.34 bpp with excellent quantitative results. This work was extended in [18] using variable-rate multistage vector quantization (MSVQ) to yield slightly better performance. [19] used spectral VQ to compress imagery from the Thermal Infrared Multispectral Scanner (TIMS) and the Calibrated Airborne Multispectral Scanner (CAMS) at 24:1 and 28:1, respectively, with very favorable results. [20] also investigated the use of VQ to compress data from the Advanced Very High Resolution Radiometer (AVHRR). Their system yielded compression ratios of 24:1 with visually pleasing results. [16] also reports results of compressing AVHRR data using progressive VQ with the use of a Single Instruction Multiple Data (SIMD) machine. The system so described can be operated at various image reconstruction levels ranging from browse quality to lossless.

Several non-VQ systems have also been recently introduced. A hierarchical data compression system was developed in [21] which allowed compression with high loss, moderate loss, and no loss. This system incorporated block averaging, quadtrees and iterative parallel region growing. For compressing Landsat TM imagery, compression ratios of up to 27:1 were obtained. [22] introduced a system whereby the 3-D data is first decorrelated spectrally by using the Karhunen-Loeve (KL) transformation, and then subjected to an adaptive discrete cosine transform (DCT) coding technique.

A similar system was presented in [15]. That system also used the KL transform to spectrally decorrelate the multispectral data. However, the discrete wavelet transform was used to spatially decorrelate the principal components. It was reported that compression of the principal components yielded a 40% improvement in compression (as compared to spatial coding of the bands themselves), with a similar reduction in mean-squared error.

Lossless compression schemes have also been reported in the literature [23],[24]. In particular, [11] found that in Landsat TM imagery, exploitation of spatial correlation yields better compression than does the use of spectral correlation. Accordingly, compression ratios of 3:1 were obtained. Slightly better performance was obtained in [25] with the use of spatial prediction and arithmetic coding.

The algorithms developed in the present work for compression of hyperspectral imagery are based on a relatively new quantization scheme known as trellis coded quantization (TCQ). TCQ was introduced in [26] as an effective and computationally tractable method for encoding memoryless sources. TCQ is spawned from trellis coded modulation (TCM), originally developed by Ungerboeck [27]. In its simplest form, for encoding at R bits per sample using TCQ, a scalar codebook of size 2^(R+1) is partitioned into 4 subsets, each of size 2^(R-1), and an appropriate trellis is labeled with these subsets. The Viterbi algorithm [28] is then used to find the minimum mean-squared error (MSE) path through the trellis.

High-performance image coding algorithms using TCQ have been recently introduced in the literature. For example, [29] designed an image coder which uses TCQ to encode the coefficients resulting from the application of the 2-D DCT. For encoding the "Lenna" image, it is shown that peak signal-to-noise ratios (PSNR), 10 log₁₀(255²/MSE), of 39.33, 35.97, and 32.49 dB are obtained at encoding rates of 1.0, 0.5, and 0.25 bits/pixel, respectively. Further, the wavelet coder described in [30] codes "Lenna" with PSNR values of 39.85, 36.61, and 33.77 dB at rates of 1.0, 0.5, and 0.25 bits/pixel, respectively. This coder uses wavelet filters to create 16 equal-sized subbands, each of which is encoded using TCQ, with the lowest-frequency subband being transformed using a 2-D DCT prior to quantization. Both systems have excellent subjective performance, with the wavelet coder quantitatively surpassing all other systems reported in the literature to date.

In the present work, TCQ is used in three hyperspectral coding systems which are based on the DCT. The DCT is often rationalized as an approximation to the full optimality of the KL transform, with the computational simplicity afforded by the DCT being a strong motivating factor. The DCT is very attractive in the computational-cost-versus-performance category and has been adopted for international standards such as JPEG [31], CCITT H.261 [32], and MPEG [33]. The recent introduction of dedicated DCT processors [34],[35],[36] may facilitate implementation of the systems discussed herein in either sensor-based or ground-based applications.

Of the three systems described, the first uses the 2-D DCT to spatially decorrelate the bands, and ECPTCQ to encode the transform coefficients in the spectral dimension. The second system partitions the hyperspectral sequence into 8 x 8 x 8 cubes and applies a 3-D DCT. "Like-coefficients" are collected and quantized using entropy-constrained trellis coded quantization (ECTCQ). The third system is a hybrid DPCM/DCT configuration whereby DPCM is used in the spectral dimension and each "error image" is coded using a 2-D DCT coder. In this coder, 8 x 8 blocks are transformed and like-coefficients are collected and quantized using ECTCQ.

Codebooks for the 3-D DCT system are optimized for different generalized Gaussian distributions, and the appropriate codebook is chosen based on the fourth central moment, while the hybrid system assumes Gaussian and Laplacian statistics. Spectral codebooks for the ECPTCQ system are optimized for first-order Gauss-Markov sources with various correlation coefficients. Codebook design for all systems uses a modified version of the generalized Lloyd algorithm in a training-sequence-based iterative scheme. Rate allocation is performed in an optimal fashion by an iterative technique which uses the rate-distortion performance of the various trellis-based quantizers.


CHAPTER 2

ENTROPY-CONSTRAINED PREDICTIVE TCQ

2.1 Background

The function of any lossy image compression scheme is to represent a digital image with fewer bits than required by the original sampled image, while adhering to certain quantitative and subjective fidelity criteria. Generally speaking, data compression is achieved by removing the redundancies (either spatial or spectral, depending on the type of imagery) inherent to the imagery and quantizing the decorrelated data.

Decorrelation methods fall into one of two categories: spatial domain methods and frequency domain methods. Spatial domain methods attempt to remove redundancies by operating on the data directly. An example is DPCM, where the past data sample is used to predict the current sample. Ideally, the difference between the current sample and the predicted sample (i.e., the "error sample") is not correlated with error samples at other instances of time. Frequency domain methods can further be broken down into transform methods and subband methods. Transform methods involve applying an energy-preserving transform to the data such that the representation is in a different domain. In the transform domain, the majority of energy is concentrated in a small number of "transform coefficients," and the coefficients tend to be less correlated than the original data. There are many different transforms with good energy compaction properties [12],[37]. Of these, the DCT has emerged as the most popular, partially because there exist fast implementation algorithms [38]. Transform coding has proven to be a very effective technique for image coding [38],[39],[40],[37].

Subband methods divide the data into frequency components or "subbands" by applying bandpass filters to the data, each tuned to a different center frequency. Ideally, the subbands so created are mutually uncorrelated. Subband coding was first introduced in [41] for speech signals. Quadrature mirror filters (QMF) were introduced in [42] and shown to allow alias-free signal reconstruction in the absence of quantization errors. Subband decomposition was extended to multidimensional signals in [43], and subband image coders were subsequently introduced in [44] and [45]. Since then, a variety of subband coders have been developed [46],[47],[48].

A particular class of subband decomposition is wavelet decomposition [49],[50],[51]. Wavelets are a family of functions derived from scaling and translating a single function known as a "mother" wavelet [52]. At each stage in a wavelet decomposition, the input signal is divided into a coarse signal (i.e., a low-pass signal) and a detail signal (i.e., a high-pass signal). The discrete wavelet transform effectively decomposes an input signal into a set of frequency subbands.


Once the data has been decorrelated, information reduction is achieved with the use of a quantizer. The simplest quantization scheme is the one-dimensional or scalar quantizer (SQ). For our discussion, we assume the source to be a discrete-time, zero-mean, memoryless stationary process {X_i} with probability density function (pdf) f_X(x) and variance σ². An N-level quantizer maps the source output x ∈ R into one of N values Q_1, Q_2, ..., Q_N, each in R, based on a set of thresholds T_1, T_2, ..., T_{N-1}. That is, an input sample x is represented by the discrete output level Q_l if x ∈ (T_{l-1}, T_l] (with T_0 = -∞ and T_N = ∞). The T_l are called threshold levels and the Q_l are called reconstruction levels.

The performance of the quantizer is typically measured by the mean-squared error, which is given by

    ρ = Σ_{l=1}^{N} ∫_{T_{l-1}}^{T_l} (x − Q_l)² f_X(x) dx.        (2.1)

Figure 2.1 shows an example of an 8-level uniform scalar quantizer (USQ). Note that for a uniform quantizer, the reconstruction levels (shown along the vertical axis) and the threshold levels (shown along the horizontal axis) have uniform spacing Δ. For N levels and equal-length codeword assignment, the number of bits required to code each output index is R = log₂ N bits/sample.

Two types of distortion are inherent to any finite-level scalar quantizer: granular distortion and overload distortion [12]. Granular distortion refers to errors incurred by representing x with Q_l, while overload distortion occurs when the input exceeds X_OL, as shown in Figure 2.1.

Figure 2.1: Input-output characteristics of a uniform scalar quantizer.
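A minimal sketch of such a quantizer, assuming a unit-variance memoryless Gaussian source, NumPy, and a step size of 0.586 (approximately the MSE-optimal uniform step for N = 8), which estimates ρ of (2.1) by Monte Carlo:

```python
import numpy as np

def usq(x, delta, n_levels):
    """Midrise uniform scalar quantizer: uniform thresholds with spacing delta,
    reconstruction at cell midpoints; inputs beyond X_OL are clipped (overload)."""
    idx = np.floor(x / delta) + n_levels // 2      # cell index of each sample
    idx = np.clip(idx, 0, n_levels - 1)            # overload region
    return (idx - n_levels // 2 + 0.5) * delta     # reconstruction levels Q_l

# Monte Carlo estimate of rho in (2.1) for the memoryless Gaussian source.
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
xhat = usq(x, delta=0.586, n_levels=8)             # 8 levels -> R = 3 bits/sample
rho = np.mean((x - xhat) ** 2)
print(f"SNR = {10 * np.log10(1.0 / rho):.2f} dB")
```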


For non-uniform memoryless sources, the uniform scalar quantizer diverges from the rate-distortion function R(D) with increasing rate. (Recall that the rate-distortion function gives the minimum distortion possible at a given rate for a particular distribution.) For example, when quantizing the memoryless Gaussian source using USQ at R = 3 bits/sample, USQ is 3.79 dB from R(D), while at 7 bits/sample, the margin increases to 7.01 dB.

The performance of SQ can be improved for non-uniform sources by using non-uniformly spaced reconstruction and threshold levels [53]. These so-called Lloyd-Max quantizers (LMQ) are optimum in the sense that they minimize the average distortion ρ for a fixed number of levels N. The necessary conditions are obtained by differentiating ρ in equation (2.1) with respect to the Q_l and T_l and setting the derivatives equal to zero. The Q_l become the centroids of the area of the pdf bounded by T_{l-1} and T_l, and the T_l become the midpoints of Q_{l-1} and Q_l. Solutions for the T_l and Q_l can be computed iteratively. For log-concave densities (i.e., ∂² log f_X(x)/∂x² < 0), these conditions are also sufficient.

For quantizing the memoryless Gaussian source, LMQ is 0.35 dB better than USQ at 3 bits/sample and 2.68 dB better at 7 bits/sample. The performance disparity is even greater for the memoryless Laplacian source, where at 7 bits/sample, LMQ outperforms USQ by 5.46 dB. However, although the divergence from R(D) of LMQ is less than that of USQ, the performance of LMQ (with respect to R(D)) at high rates is still very poor, with LMQ being 4.33 dB and 7.08 dB from the respective rate-distortion functions of the Gaussian and Laplacian sources, at R = 7 bits/sample.
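A minimal sketch of the resulting iteration for the unit-variance Gaussian pdf, assuming SciPy for the normal pdf/cdf and the closed-form cell centroid of the standard normal:

```python
import numpy as np
from scipy.stats import norm

def lloyd_max_gaussian(n_levels, n_iter=100):
    """Iterate the two necessary conditions for a zero-mean, unit-variance
    Gaussian: thresholds at midpoints of adjacent reconstruction levels,
    reconstruction levels at the centroids of the resulting cells."""
    q = np.linspace(-3.0, 3.0, n_levels)           # initial codebook
    for _ in range(n_iter):
        t = 0.5 * (q[:-1] + q[1:])                 # T_l = midpoints
        lo = np.concatenate(([-np.inf], t))
        hi = np.concatenate((t, [np.inf]))
        # Centroid of each cell: E[X | lo < X <= hi] = (phi(lo) - phi(hi)) / P_l.
        q = (norm.pdf(lo) - norm.pdf(hi)) / (norm.cdf(hi) - norm.cdf(lo))
    return t, q

thresholds, levels = lloyd_max_gaussian(8)         # 8-level LMQ, R = 3 bits/sample
```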

The performance of LMQ (or USQ) can be improved for encoding non-uniform sources by entropy coding the quantizer output with the use of a variable-length coding scheme such as Huffman [54] or arithmetic [55] coding. These coding schemes enable the actual bit rate to approach the output entropy of the quantizer, which is defined as

    H_N = −Σ_{l=1}^{N} P_l log₂ P_l  bits/sample.        (2.2)

In this way, more levels can be used (which minimizes overload distortion) and noninteger encoding rates can be accommodated. Drawbacks of entropy coding include the increased complexity of the encoder/decoder, the additional buffering required to maintain a constant channel bit rate [56], and the increased susceptibility to error propagation [57].

At high encoding rates, the optimum scalar quantizer is simply entropy-coded USQ [58]. However, at lower rates, improved performance can be obtained by using entropy-constrained scalar quantization (ECSQ). In this scheme, the average distortion is minimized while the output entropy is held below a certain value. The optimum reconstruction levels are computed using

    Q_l = (1/P_l) ∫_{T_{l-1}}^{T_l} x f_X(x) dx,        (2.3)

where P_l is the probability that the source output assumes a value in the l-th partition:

    P_l = prob(T_{l-1} < X ≤ T_l) = ∫_{T_{l-1}}^{T_l} f_X(x) dx.        (2.4)

The optimum threshold levels (for a given output entropy) are computed using

    T_l = (Q_l + Q_{l+1})/2 + λ log₂(P_l/P_{l+1}) / (2(Q_{l+1} − Q_l)),        (2.5)

where λ is a Lagrange multiplier.

In this fashion, the levels of the scalar quantizer are optimized so as to redistribute the output probabilities such that levels with a high probability of occurrence are assigned short codeword lengths and low-probability levels are assigned long codewords. It is useful to note two important characteristics of entropy-constrained scalar quantizers. First, the output entropy depends only on the threshold levels T_l; hence the Q_l have no effect on the entropy. Secondly, it is shown in [59] that nearly equal performance can be obtained by using uniform thresholds. That is, rather than designing the threshold levels according to (2.5), it suffices to simply choose the T_l uniformly, with spacing Δ, and calculate the Q_l using equation (2.3) as before; the design is thereby greatly simplified. It is shown in [60] that for the Laplacian and exponential densities, uniformly spaced thresholds satisfy the necessary conditions for optimality.

At 1 bit/sample, ECSQ is 0.24 dB better than entropy-coded USQ for the memoryless Gaussian source, while the difference is 2.75 dB for the Laplacian source. Although these differences in performance are source dependent, they are also attributable to the number of reconstruction levels. In fact, when the constraint on the number of reconstruction levels is removed, ECSQ is known to provide a lower bound on the rate-distortion performance of all scalar quantizers [59]. For encoding all memoryless sources, ECSQ comes within 1.53 dB of the respective rate-distortion functions for all non-negative rates.
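A minimal sketch of the uniform-threshold ECSQ just described, assuming a unit-variance Gaussian source and SciPy; reconstruction levels follow (2.3), the rate is measured by the output entropy (2.2), and sweeping Δ traces out an operational rate-distortion curve:

```python
import numpy as np
from scipy.stats import norm

def ecsq_gaussian(delta, n_cells=129):
    """Midtread ECSQ with uniform thresholds of spacing delta (cf. [59]).
    Returns (entropy in bits/sample, MSE) for a unit-variance Gaussian source,
    with reconstruction levels set to the cell centroids of equation (2.3)."""
    k = np.arange(-(n_cells // 2), n_cells // 2 + 1)       # cell indices
    lo = (k - 0.5) * delta                                 # uniform thresholds T_l
    hi = (k + 0.5) * delta
    lo[0], hi[-1] = -np.inf, np.inf                        # unbounded end cells
    p = norm.cdf(hi) - norm.cdf(lo)                        # P_l, eq. (2.4)
    q = (norm.pdf(lo) - norm.pdf(hi)) / np.maximum(p, 1e-300)  # centroids, eq. (2.3)
    nz = p > 0
    entropy = -np.sum(p[nz] * np.log2(p[nz]))              # H_N, eq. (2.2)
    # MSE = E[X^2] - sum_l P_l Q_l^2 when the Q_l are cell centroids.
    mse = 1.0 - np.sum(p * q ** 2)
    return entropy, mse

for delta in (0.5, 1.0, 2.0):                              # sweep the step size
    h, d = ecsq_gaussian(delta)
    print(f"delta={delta:4.1f}  H={h:5.2f} bits  SNR={10 * np.log10(1 / d):5.2f} dB")
```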

The reason why LMQ and USQ do not achieve performance comparable to ECSQ (for non-uniform sources) is explained as follows. Let x^m = [x_1, x_2, ..., x_m] be a block of m samples from a memoryless source X with joint density function f_X(x) = f_X(x_1, x_2, ..., x_m). As m becomes larger, m-dimensional samples (blocks) become uniformly distributed inside an m-dimensional region P ⊂ R^m. This result is the so-called asymptotic equipartition principle (AEP) [61]. The shape or boundary of this region depends solely upon the source distribution. For the Gaussian source the region is an m-dimensional spherical shell, while for the Laplacian source it is an m-dimensional pyramid [62]. Entropy coding exploits the AEP and, in effect, provides performance increases equivalent to those produced by matching the boundary of an m-dimensional codebook to that required by the source distribution. For the uniform source, this boundary is an m-dimensional cube, and ECSQ provides no performance increase over USQ (at high rates).

Thus far, we have discussed quantization in the context of one dimension. However, there are substantial performance advantages to quantizing blocks of data rather than a single sample. Given an m-dimensional block of data [x_1, x_2, ..., x_m], we choose an m-dimensional codeword that is "closest" (as defined by some metric, usually MSE) to the input. For quantizing m-dimensional blocks at R bits/sample, a codebook of size 2^(Rm) is required. This form of quantization is known as vector quantization (VQ). VQ can come arbitrarily close to the rate-distortion function by using properly designed codebooks and large vector dimension [63]. In effect, VQ places all its codevectors inside the high-probability region P. Specifically, VQ gains advantages over uniform scalar quantization via three mechanisms: boundary gain, granular gain, and non-uniform density gain [64],[65].

As discussed earlier, for large vector dimension, the codevectors must be distributed uniformly inside an m-dimensional region P. The shape of this region depends only on the pdf of the source (spherical and pyramidal for Gaussian and Laplacian, respectively), and is independent of the distortion measure [65]. Uniform scalar quantizers place their codevectors in an m-dimensional cubic lattice (obtained by taking an m-fold Cartesian product of the scalar codebook), and cannot conform to the boundaries required by non-uniform sources. For uniform sources, the density is uniform over an m-dimensional cube, and VQ provides no boundary gain over USQ (at high encoding rates).

Granular gain is a function of the shape of the Voronoi regions (partitions) of the quantizer. Granular gain is dependent only on the distortion measure and is independent of the source distribution. For the MSE distortion measure, the m-dimensional region with the smallest normalized second moment is an m-sphere. Consequently, as m → ∞, the granular gain (as compared to the m-dimensional cubic Voronoi regions of USQ) of an m-dimensional spherical region approaches 1.53 dB [66]. This is the maximum granular gain achievable, or the ultimate granular gain.

For smaller vector dimension m, non-uniform density gain is the gain achievable by selecting the vector codewords to lie closely in high-probability regions and farther apart in other regions. This gain disappears as m gets large, since the codewords become uniformly distributed inside the high-probability region P, as dictated by the AEP. Of course, this gain does not exist for uniform sources.

For non-uniform vector quantizers, it is difficult to evaluate boundary and non-uniform density gains separately, since the shapes and volumes of the Voronoi regions are extremely varied [64]. Boundary gain and granular gain are also difficult to evaluate independently for both non-uniform and uniform (e.g., lattice) VQ at low encoding rates. However, at high encoding rates, it was shown in [65] that granular and boundary gain can be independently evaluated and that the maximum granular gain achievable is 1.53 dB.

2.2 Trellis Coded Quantization

Trellis coded quantization was developed in [26]. For encoding a memoryless source at R bits per sample, a codebook of size 2^(R+1) is partitioned into four subsets, each containing 2^(R-1) codewords. These subsets are labeled D_0, D_1, D_2, and D_3, and are used as labels for the branches of a suitably chosen trellis. An example is shown in Figure 2.2 for R = 2.

Figure 2.2: A 4-state trellis with subset labeling and codebook.

Sequences of codewords that can be produced by the TCQ system are those that result from "walks" along the trellis from left to right. For example, if beginning in the top-left state of the trellis in Figure 2.2, the first codeword must be chosen from either D_0 or D_2. If a codeword from D_2 is chosen, then we walk along the lower branch (shown with a heavy line) to the second state from the bottom, at which we must choose a codeword from either D_1 or D_3.

Given an input data sequence x_1, x_2, ..., the best (minimum mean-squared error) allowable sequence of codewords is determined as follows. For the i-th stage in the trellis (corresponding to x_i), the best codeword in the j-th subset (j = 0, 1, 2, 3), say c_j, is chosen and the associated cost ρ_j = (x_i − c_j)² is calculated. Each branch in the i-th stage of the trellis that is labeled with subset D_j is assigned cost ρ_j. The Viterbi algorithm [28] is then used to find the path through the trellis with the lowest overall cost.
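A minimal sketch of this encoding procedure, assuming NumPy, one Ungerboeck-style 4-state transition table consistent with the description of Figure 2.2 (the particular table is an assumption), and a uniform codebook for R = 2:

```python
import numpy as np

# Assumed 4-state trellis: BRANCHES[s] lists the (subset j, next state) pairs
# for the two branches leaving state s; supersets D0 u D2 and D1 u D3 leave
# each state, matching the walk described for Figure 2.2.
BRANCHES = [[(0, 0), (2, 2)], [(2, 0), (0, 2)],
            [(1, 1), (3, 3)], [(3, 1), (1, 3)]]

def tcq_encode(x, codebook):
    """Viterbi search for the minimum-MSE codeword sequence allowed by the
    trellis. codebook[j] is an array holding subset D_j."""
    cost = np.full(4, np.inf)
    cost[0] = 0.0                                   # begin in the top-left state
    paths = [[], None, None, None]                  # surviving sequence per state
    for xi in x:
        # Best codeword and cost rho_j in each subset: 4 scalar quantizations.
        best = []
        for j in range(4):
            k = int(np.argmin((xi - codebook[j]) ** 2))
            best.append(((xi - codebook[j][k]) ** 2, codebook[j][k]))
        new_cost = np.full(4, np.inf)
        new_paths = [None] * 4
        for s in range(4):
            if paths[s] is None:
                continue
            for j, t in BRANCHES[s]:
                rho, c = best[j]
                if cost[s] + rho < new_cost[t]:
                    new_cost[t] = cost[s] + rho
                    new_paths[t] = paths[s] + [c]
        cost, paths = new_cost, new_paths
    return np.array(paths[int(np.argmin(cost))])    # lowest overall cost path

# R = 2 bits/sample: 2^(R+1) = 8 codewords labeled D0, D1, D2, D3, D0, ... .
levels = np.linspace(-2.1, 2.1, 8)
codebook = [levels[j::4] for j in range(4)]
x = np.random.default_rng(1).standard_normal(2000)
xhat = tcq_encode(x, codebook)
print(f"SNR = {10 * np.log10(np.var(x) / np.mean((x - xhat) ** 2)):.2f} dB")
```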

Figure 2.3: Granular gain for various trellis sizes (log₂ N).

A 256-state TCQ system can achieve a granular gain of 1.36 dB (out of the maximum 1.53 dB). Further, a simple 4-state system can achieve 0.99 dB. Figure 2.3 shows the granular gain of TCQ as a function of the number of trellis states.

Figures 2.4 and 2.5 show the performance of encoding zero-mean, unity-variance, memoryless Gaussian and Laplacian sources, respectively, using TCQ with trellis sizes of 4 and 256. All TCQ codebooks were designed using the generalized Lloyd algorithm [67]. It is apparent that even a 4-state TCQ system outperforms LMQ at all integer encoding rates. In fact, 4-state TCQ outperforms LMQ by 1.57 dB and 2.23 dB when encoding the Gaussian and Laplacian sources, respectively, at 3 bits/sample, while a 256-state TCQ system achieves a performance increase of 2.02 dB and 2.73 dB, respectively.

Figure 2.4: Performance of 4-state and 256-state TCQ for encoding the memoryless Gaussian source.

As shown above, TCQ can realize a significant portion of the maximum achievable granular gain (e.g., 1.36 dB out of 1.53 dB with a 256-state trellis). However, it does not realize any boundary gain (at high rates). Some non-uniform density gain can be realized by using non-uniform codebooks.

Figure 2.5: Performance of 4-state and 256-state TCQ for encoding the memoryless Laplacian source.

The complexity of TCQ is roughly independent of the encoding rate. The computations required per data sample are 4 rate-(R−1) scalar quantizations with associated distortion calculations, 2N adds, and N two-way compares, where N is the number of trellis states. For comparison, the encoding and design complexity of VQ grows exponentially with Rm, where R is the encoding rate in bits/sample and m is the vector dimension. By evaluating the asymptotic VQ bound, [26] showed that no VQ of dimension less than 69 can achieve performance comparable to that of a 256-state TCQ system. Further, a lattice VQ of dimension 24 would be required to match the performance of a simple 4-state TCQ [65]. Application of VQ with high vector dimension is computationally prohibitive (for both encoding and codebook design) at moderate to high rates. Even at low rates, a 4-state TCQ system is far easier to implement than a VQ with similar performance. It should be noted that several reduced-complexity VQ schemes exist, but at the expense of some reduced performance [62],[68].


2.3 Entropy-Constrained Trellis Coded Quantization

Although the performance of TCQ is far superior to that of LMQ, there is still much performance to be gained by entropy coding the output of a specially designed TCQ system. Note that at each step in the encoding, the codeword must be chosen from either A_0 = D_0 ∪ D_2 or A_1 = D_1 ∪ D_3. Each of these "supersets" contains 2^R codewords and hence, given an initial trellis state, the sequence of selected codewords can be transmitted using one R-bit label for each sample. In [69], the codeword labels (as described above) were noiselessly compressed using one variable-length code for each superset. The encoding rate achievable by this process is the conditional entropy of the codebook C, given the superset:

    H(C|A) = −Σ_{i=0}^{1} Σ_{c∈A_i} P(c|A_i) P(A_i) log₂ P(c|A_i).        (2.6)

For a codebook of size 2^(R+1) (as discussed above), this noiseless compression causes the encoding rate to fall below R bits/sample. Thus, the size of the codebook should be increased.
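A small sketch of evaluating (2.6) from empirical codeword usage counts (the count arrays, gathered during a hypothetical training encoding, are assumptions):

```python
import numpy as np

def conditional_entropy(counts_a0, counts_a1):
    """H(C|A) of equation (2.6) from empirical codeword counts, where
    counts_a0[k] / counts_a1[k] record how often each codeword of superset
    A_0 / A_1 was selected during a (hypothetical) training encoding."""
    h, n = 0.0, counts_a0.sum() + counts_a1.sum()
    for counts in (counts_a0, counts_a1):
        p_a = counts.sum() / n                     # P(A_i)
        p = counts[counts > 0] / counts.sum()      # P(c | A_i)
        h -= p_a * np.sum(p * np.log2(p))
    return h                                       # achievable rate, bits/sample
```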

An obvious method of adjusting the output entropy of the TCQ system (as computed using (2.6)) is to use a midtread uniform codebook with spacing Δ, as shown in Figure 2.6. Inclusion of the zero codeword enables encoding rates below one bit/sample to be achieved [59]. As Δ gets large, the top state in Figure 2.2 will rarely be exited and the output entropy (computed using (2.6)) will drop to zero.

Figure 2.6: Uniform TCQ codebook.

Figure 2.7 shows the performance of an 8-state TCQ system using a uniform codebook. Points on the curve were generated by varying the value of Δ. For comparison, the curve generated by encoding the source using (entropy-coded) USQ is also shown (where the entropy is calculated using equation (2.2)). Note that at high rates, the TCQ system comes within 0.5 dB of the rate-distortion function and outperforms entropy-coded USQ by about 1.0 dB. However, at low rates, the performance of TCQ deteriorates; for example, the TCQ system is approximately 2.3 dB away from the rate-distortion function at 0.75 bits/sample.

Figure 2.7: Performance of an 8-state entropy coded uniform TCQ for encoding the memoryless Gaussian source.

It is shown in [69] that improved performance of the TCQ system can be obtained at low rates by employing the codebook design and encoding rule from [70]. This algorithm is a modified version of the generalized Lloyd algorithm for vector quantizer design [70], and is used to minimize the MSE of an encoding, subject to an entropy constraint. This is accomplished by minimizing the cost function

    J = E[ρ(x, c)] + λ E[l(c)],        (2.7)

where x is the data, c is the encoded version of x, ρ(x, c) is the cost (usually MSE) of representing x by c, λ is a Lagrange multiplier, and l(c) ≈ −log₂ P(c|A_i) is the number of bits used by the variable-length code to represent c. This algorithm chooses the "best" codeword by considering both the MSE and the number of bits required to represent the particular codeword. Utilization of the encoding rule in (2.7) significantly improves the low-rate performance of the TCQ system.
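In code, only the within-subset codeword search of the earlier TCQ sketch changes; a sketch of the modified selection rule, where the bit-cost arrays (e.g., −log₂ of trained codeword probabilities) and λ are assumptions:

```python
import numpy as np

def ec_branch_costs(xi, codebook, bits, lam):
    """Entropy-constrained codeword selection per (2.7): within each subset
    D_j, minimize squared error plus lam times the codeword bit cost l(c).
    bits[j][k] would hold -log2 P(c | A_i) from a trained probability model."""
    best = []
    for j in range(4):
        costs = (xi - codebook[j]) ** 2 + lam * bits[j]
        k = int(np.argmin(costs))
        best.append((costs[k], codebook[j][k]))
    return best   # drop-in replacement for the MSE-only subset search
```

Sweeping λ trades rate against distortion; the Viterbi recursion itself is unchanged.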

It is shown in [65] that for the memoryless Gaussian source, the maximum boundary gain obtainable is equivalent to the gain realized by entropy coding the quantizer output. Experimentation has shown that this is also true of various generalized Gaussian sources. For example, for encoding the memoryless Laplacian source using ECTCQ, performance within 0.5 dB (for an 8-state trellis) of the rate-distortion function is also obtained.

Figure 2.8 shows the performance of an 8-state entropy-constrained TCQ (ECTCQ) system for encoding the zero-mean, unit-variance Gaussian source. Also shown is the performance of ECSQ. The 8-state ECTCQ system comes within 0.5 dB of the rate-distortion function for all non-negative rates and is about 1.0 dB better than ECSQ at high rates.

Figure 2.8: Performance of an 8-state entropy-constrained TCQ for encoding the memoryless Gaussian source.

2.4 Entropy-Constrained Predictive Trellis Coded Quantization

For sources with memory, a predictive TCQ (PTCQ) system was developed in [26]. In that system, each path through the trellis corresponds to a potential output sequence of PTCQ. Predictions of the data sample x_i are made at each state using the potential output sequences leading into that state. The prediction residual for state k, say d_i^k, is calculated by subtracting the prediction from the current data sample. The codeword closest to the prediction residual in each subset corresponding to a branch exiting state k is chosen, and the branch cost is calculated as the squared error between that codeword and the prediction residual at state k. The quantized value of the current data sample (for each branch) is formed by adding the chosen codeword (quantized prediction residual) to the prediction. As in the memoryless case, the Viterbi algorithm is used to choose a path through the trellis.
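A minimal sketch of this predictive search, assuming NumPy, a first-order predictor with coefficient ρ, and the same hypothetical 4-state trellis table used in the earlier TCQ sketch:

```python
import numpy as np

BRANCHES = [[(0, 0), (2, 2)], [(2, 0), (0, 2)],     # same assumed 4-state
            [(1, 1), (3, 3)], [(3, 1), (1, 3)]]     # trellis table as before

def ptcq_encode(x, codebook, rho):
    """Predictive TCQ: each state predicts x_i as rho times the last
    reconstruction on its own surviving path; branch costs are squared errors
    between that state's prediction residual and the subset codewords."""
    cost = np.full(4, np.inf)
    cost[0] = 0.0
    last = np.zeros(4)                              # last reconstruction per path
    paths = [[], None, None, None]
    for xi in x:
        new_cost = np.full(4, np.inf)
        new_last = np.zeros(4)
        new_paths = [None] * 4
        for s in range(4):
            if paths[s] is None:
                continue
            pred = rho * last[s]                    # prediction at state s
            d = xi - pred                           # residual d_i^s
            for j, t in BRANCHES[s]:
                k = int(np.argmin((d - codebook[j]) ** 2))
                c = codebook[j][k]                  # quantized residual
                if cost[s] + (d - c) ** 2 < new_cost[t]:
                    new_cost[t] = cost[s] + (d - c) ** 2
                    new_last[t] = pred + c          # reconstruction = pred + residual
                    new_paths[t] = paths[s] + [pred + c]
        cost, last, paths = new_cost, new_last, new_paths
    return np.array(paths[int(np.argmin(cost))])
```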

Consider now a codebook with separate decision and reconstruction codewords. The codeword chosen for each branch is that which minimizes the distance between the prediction residual (of the state from which the branch emanates) and a decision codeword t_j. However, the quantized prediction residual is represented by the corresponding reconstruction codeword c_j. The cost associated with each branch (for use in the Viterbi search) is set to ρ_i^k = (d_i^k − t_j)². The decision codewords t_j are formed by taking a uniform scalar codebook with stepsize Δ and assigning its codewords (from left to right) to subsets D_0, D_1, D_2, D_3, D_0, D_1, D_2, D_3, .... It is important that a "midtread" codebook be used, and that the "zero" codeword is included in the D_0 subset [69]. By partitioning the decision codewords in this fashion, four codebooks are created, each corresponding to a uniform-threshold scalar quantizer with step size 4Δ. The reconstruction codewords c_j are the centroids of support regions defined by t_j ± 2Δ.

Philosophically, this process is a generalization (to a trellis-based structure) of that employed in [71]. In that system, the codewords used for reconstruction are computed as centroids (based on the probability density function (pdf) of prediction residuals), while the decisions of which codewords to use are based on uniform thresholds. From the point of view of minimizing a cost function, this is equivalent to making codeword decisions based on minimizing the distance between prediction residuals and the codewords of a uniform codebook. This is exactly what is done by the ECPTCQ system. In fact, finding the best t_j in each subset can be accomplished by performing a uniform threshold quantization.

Design of the reconstruction codewords is facilitated by using the following algorithm.

1. For a given stepsize Δ and integer N, let c_j^(0) = t_j, j = 1, ..., N, be the codewords of a midtread uniform scalar quantizer with stepsize Δ. Partition these codewords into subsets and set i = 0.

2. Set i = i + 1; quantize the training data and assign the prediction residual, d, at each stage (of the surviving path) to one of the sets S_1, ..., S_N, corresponding to the codeword selected for that residual.

3. For each set S_j, compute a reconstruction codeword as c_j^(i) = (1/|S_j|) Σ_{d∈S_j} d, where |S_j| is the number of elements belonging to S_j.

4. If the difference in quantizer structure, (1/N) Σ_{j=1}^{N} |c_j^(i) − c_j^(i−1)|, is greater than some small ε > 0, go to step 2. Otherwise, stop with codebook C.

Experiments using different codebook sizes at various rates were performed to determine the appropriate codebook size N. It was found that excellent performance can be obtained when N is at least 2^(R+5), where R is the desired entropy. It should be noted that the ECPTCQ training algorithm is not guaranteed to converge. However, suitable codebooks can typically be obtained in fewer than five iterations with N = 512 and ε = 0.0001.

The algorithm described above is similar to those employed in [71] and [72]. It is closest to the algorithm in [71] with the major difference being that we compute centroids from prediction residuals obtained from training data, while in [71], centroids are computed from approximate prediction residual pdfs obtained through polynomial expansions.
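For concreteness, the sketch below applies the same training loop to the scalar (entropy-constrained DPCM) system discussed in Section 2.4.1, where the bookkeeping is simplest; in the trellis-based system, residuals are instead collected along the surviving path of the Viterbi search. The predictor coefficient, step size, and Gauss-Markov training source are assumptions:

```python
import numpy as np

def train_recon_codebook(train, rho, delta, n_levels=512, eps=1e-4, max_iter=20):
    """Steps 1-4 of the training algorithm, applied to the scalar (DPCM)
    analogue: uniform decision thresholds with stepsize delta, reconstruction
    codewords re-estimated as centroids of the residuals assigned to each cell."""
    half = n_levels // 2
    recon = (np.arange(n_levels) - half) * delta    # step 1: c_j^(0) = t_j (midtread)
    for _ in range(max_iter):
        sums = np.zeros(n_levels)
        counts = np.zeros(n_levels)
        xq = 0.0
        for x in train:                             # step 2: quantize training data
            pred = rho * xq
            d = x - pred                            # prediction residual
            j = int(np.clip(np.round(d / delta) + half, 0, n_levels - 1))
            xq = pred + recon[j]
            sums[j] += d
            counts[j] += 1
        new = recon.copy()
        nz = counts > 0
        new[nz] = sums[nz] / counts[nz]             # step 3: cell centroids
        if np.mean(np.abs(new - recon)) < eps:      # step 4: convergence test
            return new
        recon = new
    return recon

# 50,000 samples from a unit-variance Gauss-Markov source with rho = 0.8.
rng = np.random.default_rng(0)
src = np.empty(50_000)
src[0] = 0.0
for i in range(1, src.size):
    src[i] = 0.8 * src[i - 1] + np.sqrt(1 - 0.64) * rng.standard_normal()
codebook = train_recon_codebook(src, rho=0.8, delta=0.5)
```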

2.4.1 Performance of ECPTCQ

The performance curve for an 8-state ECPTCQ system is shown in Figure 2.9. For comparison, Figure 2.9 also shows the performance of an entropy-constrained DPCM system obtained by applying the same training algorithm to a scalar-quantizer-based system. In each case, the codebooks were trained using 50,000 data samples from a unity-variance first-order Gauss-Markov source with correlation coefficient ρ = 0.8. The results of Figure 2.9 were then obtained by encoding 50,000 samples from outside the training sequence. Each curve was produced parametrically by varying the value of the step size Δ (i.e., each point on a particular curve was obtained by applying the algorithm with a different choice of Δ). Although not shown here, experiments with different values of ρ provide similar results.

At rates above about 2.0 bits/sample, the performance of the DPCM system in Figure 2.9 is equivalent to that obtained in [71]. At rates below 2.0 bits/sample, the DPCM results reported here are superior to those in [71]. Apparently, at low rates, the polynomial model used there is not sufficiently accurate, and the training-sequence-based approach yields superior performance. The ECPTCQ system provides a further performance increase over both scalar-quantizer-based schemes at all rates above about 0.25 bits/sample. For example, at 1.0 bits/sample, ECPTCQ is roughly 1.1 dB and 1.75 dB better than the ECDPCM results reported here and in [71], respectively.

Asymptotically, as the rate is increased and the quantization becomes very fine, the residual pdf should approach that of the memoryless Gaussian source, and the performance of ECPTCQ (and ECDPCM) should approach that of entropy-constrained TCQ (and entropy-constrained scalar quantization) operating on the memoryless Gaussian source. This is in fact what occurs. At high rates, the performance of the ECDPCM system is within 1.53 dB of the rate-distortion function (as expected from [59],[71]) and the ECPTCQ system is within 0.5 dB of the rate-distortion function (as expected from [73],[69]). It should be pointed out that this convergence is markedly faster for the scalar system than for the trellis-based system. The asymptotic result is nearly achieved for the DPCM systems at about 2.0 bits/sample, while the ECPTCQ system requires rates as high as 5 bits/sample. It is expected that this disparity can be reduced by designing codebooks with optimized thresholds (rather than the uniform thresholds used here).

Figure 2.9: SNR performance of ECPTCQ.


CHAPTER 3

HYPERSPECTRAL IMAGE CODER USING ECPTCQ

3.1 System Description

In this chapter, we present a system for encoding hyperspectral imagery which uses the 2-D DCT and ECPTCQ. A block diagram illustrating the procedural flow of the system is shown in Figure 3.1. Each spectral band is partitioned into 8 x 8 blocks, and the 2-D DCT is applied to each of these blocks. The transform coefficients at each spatial location are then collected to obtain spectral vectors. Thus, for a hyperspectral image with L bands, each of size N x N, there are N² spectral vectors, each of length L. Since the spectral vectors are highly correlated, they are encoded using ECPTCQ. Additionally, to avoid degradation due to startup transients, the transform coefficients of the first band are encoded using ECTCQ.
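A minimal sketch of the spectral-vector formation, assuming NumPy and SciPy's dctn for the 2-D DCT, with the band count and image size chosen arbitrarily:

```python
import numpy as np
from scipy.fft import dctn

def spectral_vectors(cube):
    """Apply a 2-D DCT to each nonoverlapping 8x8 block of every band, then
    collect the coefficients at each spatial position across bands.
    cube has shape (L, N, N) with N divisible by 8; returns shape (N*N, L),
    one length-L spectral vector per coefficient position."""
    L, N, _ = cube.shape
    coeffs = np.empty_like(cube, dtype=float)
    for b in range(L):
        for r in range(0, N, 8):
            for c in range(0, N, 8):
                coeffs[b, r:r+8, c:c+8] = dctn(cube[b, r:r+8, c:c+8], norm="ortho")
    # Each spatial location (r, c) yields one spectral vector of length L.
    return coeffs.reshape(L, N * N).T

# Example with a synthetic 16-band, 32x32 cube (stand-in for AVIRIS-like data).
cube = np.random.default_rng(2).random((16, 32, 32))
vecs = spectral_vectors(cube)   # shape (1024, 16)
```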

3.2 Codebook Design

Experimentation using hyperspectral data from AVIRIS reveals that the correlation coefficient ρ of the spectral vectors within any transformed block ranges from about 0.6 to 0.95. More specifically, the spectral vectors corresponding to the DC coefficients have the highest correlation (e.g., 0.95), while the vectors corresponding to the highest-frequency coefficient (in both spatial dimensions) have the lowest (e.g., 0.60).

Figure 3.1: Hyperspectral image coder.

Accordingly, codebooks were designed using the ECPTCQ training algorithm for correlation coefficient values ranging from 0.60 to 0.95, in 0.05 increments. For each allowable value of ρ, twenty-one codebooks were designed with rates ranging from 0.25 bits/sample to 5.25 bits/sample, in quarter-bit increments. Each training sequence consisted of 50,000 samples from a first-order Gauss-Markov pseudo random number generator with the appropriate value of ρ (as discussed above).
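Such a training sequence is easy to synthesize; the sketch below (a minimal numpy version, with unit output variance assumed) generates one 50,000-sample first-order Gauss-Markov sequence per allowable ρ.

```python
import numpy as np

def gauss_markov(n, rho, seed=0):
    """First-order Gauss-Markov source x[i] = rho*x[i-1] + w[i], with the
    innovation variance 1 - rho^2 so the output process has unit variance."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.normal()                        # start in the stationary distribution
    w = rng.normal(scale=np.sqrt(1.0 - rho**2), size=n)
    for i in range(1, n):
        x[i] = rho * x[i - 1] + w[i]
    return x

training = {round(rho, 2): gauss_markov(50_000, rho)
            for rho in np.arange(0.60, 0.951, 0.05)}
```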

The first spectral band in the hyperspectral sequence must also be encoded, with each quantized coefficient being used as the initial condition to encode the corresponding spectral vector using ECPTCQ. Coefficients corresponding to the same position within each block ("like-coefficients") are collected into sequences to be encoded using ECTCQ. Codebooks were designed using the algorithm in [69]. This algorithm uses a modified version of the generalized Lloyd algorithm [70] to minimize the MSE of an encoding, subject to an entropy constraint. This is accomplished by minimizing the cost function

    J = E[ρ(x, c)] + λE[l(c)]    (3.1)

where x is the data, c is the encoded version of x, ρ(x, c) is the mean-squared error of representing x by c, λ is a Lagrange multiplier, and l(c) ≜ −log₂ P(c | A_i) is the number of bits used by a variable-length code to represent c (the probability of the codeword being conditioned on its superset A_i). This process considers both the MSE and the number of bits used by the variable-length code to choose the "best" codeword.

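Per sample, the cost of (3.1) reduces to a comparison of squared error plus λ-weighted codeword length over the candidate codewords. The scalar sketch below illustrates the rule in isolation; in ECTCQ the same cost would be accumulated along trellis paths inside the Viterbi search, and the codebook, probabilities, and names here are hypothetical.

```python
import numpy as np

def best_codeword(x, codebook, probs, lam):
    """Pick the codeword minimizing rho(x, c) + lam * l(c), where the
    variable-length code spends about -log2(P(c)) bits on codeword c."""
    lengths = -np.log2(probs)                   # idealized code lengths in bits
    cost = (x - codebook) ** 2 + lam * lengths  # J of equation (3.1), per sample
    return int(np.argmin(cost))
```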

It was shown in [69] that for rates greater than 2.5 bpp, optimum codebooks do not yield increased performance over uniform codebooks. Thus, optimum codebooks with 256 elements were designed in one-tenth-bit increments for rates up to 2.4 bpp, with uniform codebooks used thereafter.

The sequence corresponding to the DC coefficients is assumed to have Gaussian statistics, while the remaining high-frequency sequences are assumed to have Laplacian statistics [74]. Therefore, training sequences consisted of 100,000 samples derived from Gaussian and Laplacian pseudo random number generators, respectively.

3.3 Side Information and Rate Allocation

The side information required for this algorithm is substantial. In principle, the mean, standard deviation, and correlation coefficient for each spectral vector must be transmitted. Additionally, the transform coefficients of the first band must be transmitted, along with the initial states for the trellises used to encode spectral vectors having nonzero rates (assuming 4-state trellises, each initial state requires 2 bits).

Fortunately, for a given spatial frequency, the variance of the correlation coefficients is extremely small. That is, if a sequence is formed by collecting the spectral correlation coefficients from each block at a given frequency, the variance of the sequence is negligible compared to its mean or average value. Thus, only the average correlation coefficients (quantized using 16-bit uniform scalar quantizers) are transmitted as side information and used for encoding and rate allocation. Since for an 8 x 8 DCT there are 64 spatial frequencies, only 64 average correlation coefficients need to be transmitted.

Similarly, the variance of the spectral standard deviations is quite small for all spatial frequencies except DC. Thus, the average spectral standard deviations (again, quantized to 16 bits) are used in encoding and transmitted as side information for each of the 63 non-DC spatial frequencies. The DC spectral standard deviations change considerably throughout the image, but are highly correlated in neighboring blocks.

Thus, the DC spectral standard deviations are quantized using ECPTCQ with a raster scan (back and forth) at 5.25 bits/sample.¹ The "extra" side information required for encoding this sequence is 66 bits (16 bits each for the initial value, mean, standard deviation, and correlation coefficient of the sequence, and 2 bits for the initial trellis state).
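The back-and-forth raster scan can be realized by a simple index reordering, assuming the DC standard deviations sit in a 2-D array with one entry per block (a hypothetical layout):

```python
import numpy as np

def boustrophedon(a):
    """Scan a 2-D array row by row, reversing direction on alternate rows so
    that consecutive samples in the scan remain spatial neighbors."""
    out = a.copy()
    out[1::2] = out[1::2, ::-1]   # flip every other row
    return out.ravel()
```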

Unfortunately, the means of the spectral vectors are quite random in nature, except for the DC spectral means, which exhibit a very high degree of correlation (typically 0.99) if scanned in an order similar to that of the DC spectral standard deviations. Hence, ECPTCQ is also used to encode this sequence at 5.25 bits/sample (plus an additional 66 bits, as in the case of the DC spectral standard deviations).

¹The highest-rate codebook available was used to ensure accurate representation of these values.


Since the remaining spectral means exhibit no significant correlation, ECTCQ is used to encode the 63 sequences formed by grouping all spectral means for a given (non-DC) spatial frequency ("like-means"). These sequences are encoded at an average rate chosen so that the rate required for all spectral means (including the DC means) is R_μ bits/pixel. The first spectral band, which is encoded in the same manner as the non-DC spectral means, is assigned an average rate of R_1 bits/pixel.

Rate allocation is performed using the algorithm described in [75]. This scheme uses the rate-distortion performance of different quantizers to provide a near-optimal allocation of bits, given an overall bit quota. The overall MSE incurred by encoding the N^2 spectral vectors using ECPTCQ at an average rate of R_s bits/coefficient is represented by

    E_s = Σ_{i=1}^{N^2} σ_i² E_{ij}(r_i)    (3.2)

where σ_i² is the variance of sequence i, and E_{ij}(r_i) denotes the rate-distortion performance of the jth quantizer (e.g., ρ = 0.6 to 0.95 in 0.05 increments) at r_i bits/sample. The rate allocation vector B = (r_1, r_2, ..., r_{N^2}) is chosen such that E_s is minimized, subject to an average rate constraint:

    (1/N^2) Σ_{i=1}^{N^2} r_i ≤ R_s bits/coefficient.    (3.3)

It is shown in [75] that the solution B* = (r_1*, r_2*, ..., r_{N^2}*) to the unconstrained problem

    min_B Σ_{i=1}^{N^2} [σ_i² E_{ij}(r_i) + λ r_i]    (3.4)

minimizes E_s subject to Σ_{i=1}^{N^2} r_i ≤ Σ_{i=1}^{N^2} r_i*. Thus, to find a solution to the constrained problem of equations (3.2) and (3.3), it suffices to find λ such that the solution to equation (3.4) yields (1/N^2) Σ_{i=1}^{N^2} r_i* ≤ R_s. Procedures for finding the appropriate λ are given in [75].

For a given λ, the solution to the unconstrained problem is obtained by minimizing each term of the sum in (3.4) separately. If S_j = {p_j, ..., q_j} is the set of allowable rates for the jth quantizer and r_i* is the ith component of the solution vector B*, then r_i* solves

    min_{r_i ∈ S_j} {σ_i² E_{ij}(r_i) + λ r_i}.    (3.5)
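One common realization of the λ search (the details in [75] differ) is a bisection: each trial λ is plugged into the independent minimizations of (3.5), and λ is raised or lowered until the average rate meets the budget. The rate-distortion tables and names below are hypothetical.

```python
import numpy as np

def allocate(variances, rd, rates, R_target, iters=60):
    """variances: sigma_i^2 per sequence (shape M); rd[i, k]: normalized
    distortion of sequence i's quantizer at rates[k]; returns per-sequence
    rates whose mean is approximately at most R_target."""
    def solve(lam):
        # equation (3.5): minimize sigma_i^2 * E_ij(r) + lam * r for each i
        cost = variances[:, None] * rd + lam * rates[None, :]
        return rates[np.argmin(cost, axis=1)]
    lo, hi = 0.0, 1e6                  # lam = 0 yields the largest rates
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if solve(mid).mean() > R_target:
            lo = mid                   # rate too high: penalize rate more
        else:
            hi = mid
    return solve(hi)
```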

Testing revealed that equal performance was obtained whether globally allocating bits using all N^2 spectral variances, or by using the 64 average spectral variances (as discussed above). The latter approach is adopted since the rate allocation algorithm need only take into account 64 spectral variances (rather than N^2), thus resulting in far fewer computations. Also, this results in all blocks having identical spectral bit maps. All spectral variances are quantized prior to being used in the spectral rate allocation algorithm.

As in the case of the spectral vectors themselves, rate allocation for the spectral means and the first spectral band is accomplished using the algorithm of [75]. The scalar-quantized variances of the "like-coefficient" and "like-mean" sequences are used in the rate allocation algorithms for the first band and spectral means, respectively.

It should be noted that the rate allocation for the spectral vectors is used to constrain the spatial rate allocation of the first spectral band. Thus, if any spectral vector is assigned zero rate, the corresponding transform coefficient in the first spectral band is also assigned zero rate. Finally, any spectral vector that is assigned zero rate is set to (the quantized value of) its corresponding spectral mean.

The total side information required for encoding an N x N hyperspectral image consists of the spectral correlation coefficients (64 · 16 bits), the spectral standard deviations (63 · 16 + 5.25(N^2/8^2) + 66 bits), the spectral means (R_μ N^2 bits), the first spectral band (R_1 N^2 bits), and the initial trellis states (2n bits, where n ≤ N^2 is the number of spectral vectors assigned a nonzero encoding rate). Combining these quantities yields (2098 + 2n + (R_μ + R_1 + 0.082)N^2) bits. The overall encoding rate for the system operating on a hyperspectral image with L bands is then

    (R_s(L − 1) + R_μ + R_1 + 0.082 + (2098 + 2n)/N^2) / L bits/pixel/band (b/p/b).
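The overall-rate expression above is easy to evaluate numerically; the following direct transcription is a hypothetical helper, with the example assuming that every spectral vector receives nonzero rate (n = N^2).

```python
def overall_rate(L, N, n, Rs, R1, Rmu):
    """Overall rate in bits/pixel/band: spectral payload for L-1 bands plus
    first band, spectral means, fixed side information, and trellis states."""
    side_bits = 2098 + 2 * n
    return (Rs * (L - 1) + Rmu + R1 + 0.082 + side_bits / N**2) / L

# example: Rs = 0.10, R1 = 1.0, Rmu = 2.0, n = N**2 assumed
for L in (10, 40, 140):
    print(L, overall_rate(L, N=256, n=256**2, Rs=0.10, R1=1.0, Rmu=2.0))
```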

Figure 3.2 shows the overall encoding rate of a hyperspectral sequence as a function of the number of spectral bands. In this particular case, the asymptotic rate R_s is 0.10 bits/coefficient, and the encoding rates of the first band, R_1, and the spectral means, R_μ, are 1.0 and 2.0 bits/pixel (bpp), respectively. Note that for small sequences (e.g., fewer than ten bands), the side information dominates the overall rate.

[Figure 3.2: Overall encoding rate versus number of spectral bands.]

3.4 Results

Coding simulations were carried out using a 140-band, 8-bit hyperspectral data sequence from Cuprite, Nevada, obtained by the AVIRIS system. The bands were 256 x 256 pixels and were taken from larger images (for computational simplicity).

The performance of the encoding system is reported using the peak signal-to-noise ratio, defined as

    PSNR = 10 log₁₀ [ (255)² / ( (1/N^2) Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} [Î(u,v) − I(u,v)]² ) ]    (3.6)

where Î(u,v) is the coded version of the original band I(u,v).
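Equation (3.6) translates directly into a few lines of numpy (a minimal sketch, assuming 8-bit bands):

```python
import numpy as np

def psnr(original, coded):
    """Peak signal-to-noise ratio of equation (3.6) for one 8-bit band."""
    err = original.astype(float) - coded.astype(float)
    return 10.0 * np.log10(255.0 ** 2 / np.mean(err ** 2))
```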

For a desired asymptotic rate of R_s = 0.1 b/p/b, the first band in the sequence was quantized at R_1 = 1.0 bpp and the spectral means were quantized at R_μ = 2.0 bpp. It was found that these rates are a good compromise between MSE performance and side information. These choices lead to side information totaling about 3.65 bpp.

For short subsequences of hyperspectral data, this amount of side information can be quite significant. However, if all 140 bands are coded, the side information is only 0.025 b/p/b.

Figure 3.3 shows the PSNRs obtained by encoding bands 30 through 69 of the hyperspectral sequence with R_s = 0.10 b/p/b. If all 140 bands were encoded, this would correspond to an overall rate of about 0.125 b/p/b. The average PSNR of the coded sequence is 43.10 dB, with the PSNR of some bands approaching 46 dB.

The dip in PSNR around bands 56 and 57 is indicative of high sensor noise that is clearly evident upon visual examination. Figures A.1 and A.2 show the original and coded image from band 50, respectively, while Figure A.3 is the difference image obtained by displaying the magnitude of the error.² The coded image is virtually indistinguishable from the original, with no artifacts or contrast variations. All fine spatial detail is preserved with essentially no blurring. This subjective performance is indicative of the entire sequence. Note the complete absence of structure in the difference image.

The overall encoding rate for short sequences can be significantly reduced (at the expense of MSE performance) by assigning less rate to the first spectral band and the spectral means. For example, Figure 3.4 shows the PSNR values of coding bands 30 through 69 at R_s = 0.1 b/p/b as before, but with R_1 = 0.75 bpp and R_μ = 0.75 bpp. Although the average PSNR for the sequence has dropped to 41.34 dB (a decrease of 1.76 dB from the previous case), the performance of the system is still extremely good.

²For the purpose of visual display, the difference images discussed herein were multiplied by a factor of 45 and hard clipped at 255.

[Figure 3.3: Performance of encoding hyperspectral sequence at R_s = 0.10 b/p/b, R_1 = 1.0 bpp, and R_μ = 2.0 bpp (PSNR in dB versus band).]

The advantage gained, however, is the significant drop in overall rate for shorter subsequences. Figure A.4 shows the coded version of band 50 with the reduced side information. Note that this image is nearly identical to the image coded with greater side information (and to the original), as would be expected from its PSNR of 42.1 dB. The difference image of Figure A.5 reveals greater error and structure as compared to the case of high side information.

[Figure 3.4: Performance of encoding hyperspectral sequence at R_s = 0.10 b/p/b, R_1 = 0.75 bpp, and R_μ = 0.75 bpp (PSNR in dB versus band).]

The system achieves a compression ratio of 42:1 at an average PSNR of 43.10 dB when encoding a 40-band sequence. The same level of performance is attainable at a compression ratio of 64:1 if 140 bands are coded. Additionally, compression ratios approaching 70:1 can be obtained if coding all 209 bands of the AVIRIS system. Further, by decreasing the side-information rates R_1 and R_μ to 0.75 bpp each, compression ratios of 40:1, 53:1, 70:1, and 73:1 can be obtained for sequences of length 20, 40, 140, and 209, respectively, with a decrease of less than 2 dB in average PSNR.

There are many operational modes of the system, each obtained by adjusting R_s, R_1, R_μ, and the number of trellis states. The two modes used here were chosen to give extremely high quantitative and subjective quality of the coded sequences (at very low rates), with varied amounts of side information. Depending on the quality required by the application, the system can easily be operated at compression ratios in excess of 100:1. This mode may be ideal for browsing or rapid analysis of the data.


The computational complexity of our coder is moderate, with the majority of computations being used to compute the DCT of each spectral band. Since 4-state trellises are used exclusively throughout the system, the quantization process requires minimal computation [26]. However, the transform coefficients of the entire hyperspectral sequence must reside in memory, which may limit its use to ground-based applications. Of course, much of this problem can be alleviated by breaking the sequence into shorter subsequences and coding each individually.


CHAPTER 4

3-D DCT HYPERSPECTRAL CODER

4.1 System Description

A hyperspectral coding system using the 3-D DCT and ECTCQ is shown in Figure 4.1. The image sequence is partitioned into 8 x 8 x 8 cubes and transformed using the 3-D DCT. Coefficients corresponding to the same position within each cube ("like-coefficients") are collected into sequences to be encoded using ECTCQ.

For a hyperspectral sequence with L bands, each of size N x N, the total number of sequences to be encoded is 8^3 = 512, each of length LN^2/512.

[Figure 4.1: Hyperspectral image coder using the 3-D DCT and ECTCQ. Block diagram: input sequence → DCT of 8 x 8 x 8 cubes → ECTCQ → inverse DCT → reconstructed sequence.]
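The cube partitioning and like-coefficient grouping can be sketched as follows (an illustrative Python reconstruction; SciPy's dctn and the array layout are assumptions):

```python
import numpy as np
from scipy.fft import dctn  # separable 3-D DCT (type II) over the cube axes

def like_coefficient_sequences(cube, B=8):
    """cube: (L, N, N) hyperspectral sequence with L and N multiples of B.
    Returns a (B**3, L*N*N // B**3) array, one row per like-coefficient
    sequence, by transforming each B x B x B cube with a 3-D DCT."""
    L, N, _ = cube.shape
    seqs = [[] for _ in range(B ** 3)]
    for b in range(0, L, B):
        for r in range(0, N, B):
            for c in range(0, N, B):
                t = dctn(cube[b:b+B, r:r+B, c:c+B], norm='ortho')
                for k, v in enumerate(t.ravel()):   # same position -> same sequence
                    seqs[k].append(v)
    return np.asarray(seqs)
```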


4.2 Codebook Design

The probability distribution of each like-coefficient sequence is modeled by the so-called Generalized Gaussian Distribution (GGD), whose probability density function (pdf) is given by

    p(x) = [α η(α,σ) / (2Γ(1/α))] exp{ −[η(α,σ)|x|]^α }    (4.1)

where

    η(α,σ) = (1/σ) [Γ(3/α)/Γ(1/α)]^{1/2}.    (4.2)

The shape parameter α describes the exponential rate of decay, and σ is the standard deviation of the associated random variable [59]. The gamma function Γ(·) is defined as

    Γ(z) = ∫₀^∞ t^{z−1} e^{−t} dt.    (4.3)

Distributions corresponding to α = 1.0 and 2.0 are Laplacian and Gaussian, respectively. Figure 4.2 shows generalized Gaussian pdfs corresponding to α = 0.5, 1.0, 1.5, 2.0, and 2.5.

It can be shown that

    E[X⁴] = σ⁴ Γ(5/α)Γ(1/α) / Γ²(3/α)    (4.4)

or

    K = E[X⁴]/σ⁴ = Γ(5/α)Γ(1/α) / Γ²(3/α)    (4.5)

where K is the normalized fourth central moment, or kurtosis. Recall that the kurtosis is a measure of the peakedness of a given distribution.

[Figure 4.2: Probability density functions for generalized Gaussian distributions with α values of 0.5, 1.0, 1.5, 2.0, and 2.5.]

If a pdf is symmetric about its mean and is sharply peaked in the vicinity of its mean, the coefficient of kurtosis is relatively large. Similarly, a pdf that is flat about its mean has a small kurtosis value.

The sample kurtosis of any sequence can be calculated easily and used as a measure by which the distribution of the sequence can be determined. Figure 4.3 shows the relationship between the shape parameter α and K. This graph is used to determine the appropriate α for a particular sequence.

[Figure 4.3: Kurtosis versus α.]

Codebooks were designed for generalized Gaussian distributions with α values of 0.5, 0.75, 1.0, 1.5, and 2.0, using the algorithm in [69]. It was shown in [69] that for the Gaussian distribution, optimum codebooks do not yield significant MSE improvement over uniform codebooks at rates greater than 2.5 bits/sample.

Experimentation revealed that this is also true for α = 1.5 and α = 1.0. However, for α = 0.75, optimum codebooks are superior up to 3.0 bits/sample, while for α = 0.5, optimum codebooks should be used up to 3.5 bits/sample. Accordingly, for α values of 2.0, 1.5, and 1.0, optimum codebooks were designed in one-tenth-bit increments up to 2.5 bits/sample, while for α = 0.75 and α = 0.5, optimum codebooks were designed in one-tenth-bit increments up to 3.0 and 3.5 bits/sample, respectively. Thereafter, uniform codebooks were designed in one-tenth-bit increments up to 12 bits/sample.

Training sequences consisted of 100,000 samples derived from generalized Gaussian pseudo random number generators, each tuned to the appropriate α value.
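Generalized Gaussian variates can be drawn through a gamma transform: if G ~ Gamma(1/α, 1), then ±G^{1/α}/η(α,σ) follows the pdf of (4.1). A minimal sketch of such a generator (names are assumptions):

```python
import numpy as np
from scipy.special import gamma

def ggd_samples(n, alpha, sigma=1.0, seed=0):
    """Generalized Gaussian samples via |X| = G**(1/alpha) / eta with
    G ~ Gamma(1/alpha, 1) and a random sign."""
    rng = np.random.default_rng(seed)
    eta = np.sqrt(gamma(3 / alpha) / gamma(1 / alpha)) / sigma
    g = rng.gamma(shape=1 / alpha, scale=1.0, size=n)
    return rng.choice([-1.0, 1.0], size=n) * g ** (1 / alpha) / eta

training = {a: ggd_samples(100_000, a) for a in (0.5, 0.75, 1.0, 1.5, 2.0)}
```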


4.3 Side Information and Rate Allocation

Given the three-dimensional frequency space in the transform domain, where the x and y axes represent the spatial dimensions and the z axis represents the spectral dimension, all like-coefficient sequences should (theoretically) be zero mean except the DC sequence (i.e., x, y, z = 0). In fact, the sample means of the like-coefficient sequences drop off very rapidly in any frequency direction relative to DC, with the mean of the DC sequence typically being two orders of magnitude greater than the means of those sequences corresponding to frequencies one position higher along any axis (i.e., (x, y, z) = (1,0,0), (0,1,0), and (0,0,1)). Thus, only the DC mean should need to be transmitted as side information. Experiments confirm this for the (x, 0, 0) and (0, y, 0) sequences, which have sample standard deviations at least an order of magnitude larger than their corresponding sample means. On the other hand, the extremely high correlation along the z-axis (spectral dimension) causes a sharp drop in standard deviation along that axis. As a result, the sample means along this axis are within an order of magnitude of their respective standard deviations. For this reason, we chose to transmit all eight sample means for the sequences with coordinates of the form (0, 0, z).

The side information then consists of 512 standard deviations and 8 sample means. These quantities are quantized using 16-bit uniform scalar quantizers, resulting in a total of (520)(16) = 8320 bits. The initial trellis states are also transmitted, which (for 4-state trellises) total (512)(2) = 1024 bits. Combining these quantities yields 9344 bits of side information per hyperspectral sequence.

Rate allocation is performed by using the algorithm described in [75]. The overall MSE incurred by encoding the like-coefficient sequences using ECTCQ at an average rate of R_s bits/coefficient is represented by

    E_s = Σ_{i=1}^{512} σ_i² E_{ij}(r_i)    (4.6)

where σ_i² is the variance of sequence i, and E_{ij}(r_i) denotes the rate-distortion performance of the jth quantizer (e.g., the quantizer corresponding to the kurtosis of sequence i) at r_i bits/sample.

4.4 Results

Coding simulations were performed using a 140-band hyperspectral image sequence obtained from the AVIRIS system. The bands were 256 x 256 pixels and were taken from larger images (for computational simplicity). The performance of the system is reported using PSNR.

Figure 4.4 shows the PSNRs obtained by encoding bands 30 through 69 of the hyperspectral sequence at an asymptotic bit rate of 0.1 b/p/b. The average PSNR of the coded sequence is 40.75 dB. Figure A.6 shows the coded version of band 50 from the sequence. The coded image is very similar to the original (shown in Figure A.1).

We note a very fine blurring in some areas, but no contrast variations or artifacts are present. This subjective performance is indicative of the entire sequence. The difference image of Figure A.7 shows a fairly small coding error with very little structure.

[Figure 4.4: Performance of encoding hyperspectral sequence at R_s = 0.10 b/p/b (PSNR in dB versus band).]

Figure 4.5 shows the overall coding rate of the system as a function of sequence length. The overall rate for encoding the 40-band sequence is 0.104 b/p/b (for R_s = 0.1 b/p/b). If all 140 bands were coded, the overall rate would be 0.101 b/p/b. It is evident that the overall rate (for a given asymptotic rate) required by the 3-D DCT system is roughly independent of image sequence length and image size for sequences longer than 15 bands, after which the side information contributes less than 10% to the overall rate.


[Figure 4.5: Overall rate versus number of spectral bands.]

The 3-D DCT system achieves a compression ratio of 77:1 at an average PSNR of 40.75 dB. The same level of performance is attainable at a compression ratio of 79:1 if all 140 bands were encoded. Moreover, the subjective performance of the coder is excellent.

The complexity of the 3-D DCT coder is moderate to high, being more computationally demanding than either the ECPTCQ coder discussed in Chapter 3, or the hybrid coder in Chapter 5, since the sequence is transformed in all three dimensions. Moreover, the entire sequence must reside in memory for calculation of sample statistics and encoding. The memory requirements can be lessened by breaking the sequence into shorter subsequences and coding each independently.


CHAPTER 5

HYBRID DPCM/DCT HYPERSPECTRAL CODER

5.1 System Description

DPCM is a simple and well-known method of achieving moderate compression of correlated sequences [13],[76],[12]. Given a pixel x_{i−1}, the next pixel in the sequence, x_i, is predicted. If x̂_{i|i−1} is the predicted value of x_i, then the difference, ε_i = x_i − x̂_{i|i−1}, will, on average, be significantly smaller in magnitude than x_i. Accordingly, fewer quantization bins, and thus fewer bits, are required to encode the error sequence than would be required to encode the sequence of pixels. It can be shown that for a nonzero-mean input sequence, the optimum (minimum MSE) first-order linear predictor is given by

    x̂_{i|i−1} = μ + ρ(x_{i−1} − μ)    (5.1)

where μ and ρ are the mean and correlation coefficient of the sequence, respectively, and x̂_{i|i−1} is the predicted value of x_i. It is apparent that the input sequence must be normalized to zero mean or the sequence mean must be included in (5.1). In either case, for applying DPCM to a nonzero-mean sequence, the mean must be transmitted as side information.


[Figure 5.1: Hybrid DPCM/DCT hyperspectral image encoder (DPCM loop containing a 2-D DCT decomposition and 2-D inverse DCT; predictor output x̂_{n|n−1}).]

The utilization of DPCM to exploit the spectral correlation of hyperspectral imagery is straightforward. For an L-band image of size N x N, an ordinary DPCM loop could be employed to encode each of the N^2 sequences that result from treating each hyperspectral pixel as a sequence of length L. Unfortunately, while this scheme exploits spectral correlation, it does not exploit the spatial correlation inherent to the data, and requires the transmission of N^2 spectral means.

These problems can be circumvented by using the encoder configuration shown in Figure 5.1 [77],[78],[79]. Here, the DPCM loop operates on entire images rather than on individual spectral sequences. Given an image X_n, the next image in the hyperspectral sequence is estimated and an "error image" is formed from the difference E_n = X_n − X̂_{n|n−1}. The error image (at each instant in time) is spatially correlated and can be quantized using any image coding scheme. Note that the error image must be decoded within the encoder loop so that the quantized image, X̃_n, can be constructed and used to predict the next image.
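The band-level DPCM loop of Figure 5.1 can be sketched as below. The error-image coder is abstracted as a callable (in the thesis it is the 2-D DCT/ECTCQ coder), and the zero-mean form of the predictor is assumed for brevity.

```python
import numpy as np

def dpcm_encode(cube, rho, code_band):
    """cube: (L, N, N) sequence; code_band(E) stands in for the error-image
    coder and returns the quantized error image. Returns reconstructions."""
    recon = np.empty(cube.shape)
    recon[0] = code_band(cube[0])          # first band serves as the initial condition
    for n in range(1, len(cube)):
        pred = rho * recon[n - 1]          # predict the next band (eq. 5.1, zero mean)
        err_q = code_band(cube[n] - pred)  # coarsely code the error image
        recon[n] = pred + err_q            # in-loop decoding, as at the receiver
    return recon
```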

The prediction error images have much lower energy than the original bands and can be subjected to very coarse quantization (less than 0.1 bits/pixel) without introducing "blocking effects." In fact, the bit rate chosen to encode each error image will be the asymptotic bit rate of the system.

Testing of AVIRIS data revealed that the spectral correlation coefficient ρ for any pixel in the image is approximately 0.95. Accordingly, this value of ρ was used in the DPCM loop.

Coding of the error image is similar to the system described in [29]. In that system, an image is partitioned into 16 x 16 blocks and the 2-D DCT is performed on each block. Coefficients corresponding to the same position within each block ("like-coefficients") are collected into sequences to be encoded using ECTCQ. Codebooks were designed by using a training-image approach. Specifically, a group of images (similar to the test image) was collected, partitioned, and transformed (as discussed above). The "DC" coefficients of each block were collected as a sequence, as were the remaining "high frequency" coefficients. These two training sequences were then used to form codebooks in quarter-bit increments (one set each for the DC and high-frequency coefficients) using the ECTCQ training algorithm developed in [69]. A rate allocation scheme was developed in which an assumed distortion model was used.


5.2 Codebook Design, Rate Allocation, and Side Information

Our coder differs from that in [29] with respect to codebook design and rate allocation. In our system, the sequence corresponding to the DC coefficients is assumed to have Gaussian statistics, while the remaining high-frequency sequences are assumed to have Laplacian statistics [74]. Codebooks are designed in one-tenth-bit increments up to 2.5 bits per pixel (bpp) using 100,000-sample sequences derived from Gaussian and Laplacian pseudo random number generators, respectively (using the algorithm in [69]). It was shown in [69] that for rates greater than 2.5 bpp, optimum codebooks do not yield increased performance over uniform codebooks. Thus, at rates greater than 2.4 bpp, uniform codebooks are used. Rate allocation is performed using the algorithm in [75].

The DC sequence is normalized by subtracting its mean and dividing by its standard deviation. The non-DC sequences are normalized by dividing by their respective standard deviations (the non-DC sequences are assumed to have zero mean). The sequences are then encoded using ECTCQ. The total number of sequences M, when using 8 x 8 blocks, is 64 (one DC and 63 non-DC sequences).

The side information required to encode each error image (and the first spectral band) consists of the mean of the DC sequence and the standard deviations of all M = 64 sequences. These quantities are quantized using 16-bit uniform scalar quantizers to yield 16(M + 1) bits. In addition, the initial trellis state of each sequence requires 2 bits (for a 4-state trellis), which yields 2M bits. The total side information is then (18M + 16) bits, which corresponds to 1168/(256)^2 = 0.0178 bpp for a 256 x 256 image, or 1168/(512)^2 = 0.00446 bpp for a 512 x 512 image. The first spectral band is encoded and transmitted (as the initial conditions for DPCM) at a total rate (including side information) of R_1 bits/pixel.

5.3 Results

Coding simulations were performed using a 140-band hyperspectral image sequence obtained from the AVIRIS system. The bands were 256 x 256 pixels and the performance of the coder is reported using the peak signal-to-noise ratio.

The first band in the sequence is quantized at R_1 = 0.75 bits/pixel and is used as the initial condition for the spectral DPCM. This rate was chosen so that the PSNR of the coded first band did not significantly deviate from the average PSNR of the sequence, when encoded at an asymptotic rate, R_s, of 0.10 bits/pixel/band (b/p/b).

Figure 5.2 shows the PSNRs obtained by encoding bands 30 through 69 of the hyperspectral sequence at an asymptotic bit rate¹ of R_s = 0.1 b/p/b. The average PSNR of the coded sequence is 40.29 dB. Figure A.8 shows the coded image from band 50 of the sequence. The coded image is very similar to that obtained from the 3-D DCT system (shown in Figure A.6). We note a very slight blurring of the edges, especially in the higher gray-level regions. We also note that although each error image was subjected to very coarse quantization (e.g., 0.1 bpp), no blocking effects are observed. The error image shown in Figure A.9 reveals greater coding error and structure than either the ECPTCQ system (with high or low side information) or the 3-D DCT system. This result is expected since the PSNR of the coded band is lower than that obtained from the coders discussed previously.

¹It should be noted that the asymptotic rate of the hybrid system includes the side information required to encode each error image, since the contribution of the side information to the overall rate will remain constant regardless of sequence length.

Figure 5.3 shows the overall coding rate (including all side information) of the system as a function of sequence length. Note that for encoding short subsequences (i.e., less than 10 bands), the overall rate of the hybrid system is dominated by the rate required to code the first spectral band. The overall rate for encoding the 40-band sequence is 0.116 b/p/b when R_s = 0.1 b/p/b. If all 140 bands were coded, the overall rate would be 0.105 b/p/b. It is evident that at least 65 bands are required such that the overall rate is within 10% of the asymptotic rate (when R_1 = 0.75 bpp).

The hybrid system achieves a compression ratio of 69:1 at an average PSNR of 40.29 dB. The same level of performance is attainable at a compression ratio of 76:1 if all 140 bands were encoded. Moreover, the subjective performance of the coder is excellent.

The hybrid system is the most computationally tractable of the three systems presented here, with the majority of computations being used to compute the 2-D DCT. The most significant advantage of the hybrid system, however, is the small amount of memory required for encoding/decoding. The encoder requires only 2 bands at once, while the ECPTCQ and 3-D DCT coders require the entire hyperspectral sequence to reside in memory. Accordingly, the hybrid coder is an ideal candidate for sensor-based applications.

[Figure 5.2: Performance of encoding hyperspectral sequence at R_s = 0.10 b/p/b (PSNR in dB versus band).]

[Figure 5.3: Overall rate versus number of spectral bands.]


CHAPTER 6

COMPARISONS OF HYPERSPECTRAL CODERS

The three hyperspectral image coders presented in this work produce very high-quality results at extremely low bit rates. Figure 6.1 shows the PSNRs obtained by encoding the sequence at R_s = 0.1 b/p/b with the ECPTCQ, 3-D DCT, and hybrid systems. Note that the average PSNR of the ECPTCQ coder is 2.35 dB better than that of the 3-D DCT coder and is 2.81 dB better than that of the hybrid coder. Additionally, the 3-D DCT coder outperforms the hybrid coder by 0.46 dB, with average PSNRs of 40.75 dB and 40.29 dB, respectively.

The increased performance of the ECPTCQ system (for a given asymptotic rate) as compared to the hybrid and 3-D DCT systems is largely attributable to the increased side information required by the ECPTCQ system. The ECPTCQ coder, in effect, has access to more information about the sequence than do the other two coders. For short sequences, the increased side information adds considerably to the overall rate and makes the results of Figure 6.1 slightly misleading.

[Figure 6.1: Performance of encoding hyperspectral sequence at R_s = 0.10 b/p/b for the ECPTCQ, 3-D DCT, and hybrid coders (PSNR in dB versus band).]

A more accurate comparison of the coders is shown in Figure 6.2, where the overall rate of the coders is plotted as a function of the number of spectral bands. For this figure, the encoding rates of the hybrid and 3-D DCT coders were adjusted so that the average PSNR values for encoding the 40-band sequence equaled that of the ECPTCQ system (with high side information). For the hybrid coder, R_1 (the encoding rate of the first spectral band) was increased to 1.0 bpp so that the PSNR of encoding the first band more closely matched the average PSNR of the sequence.

To achieve performance comparable to the ECPTCQ system, the hybrid coder must be operated at an asymptotic rate of R_s = 0.262 b/p/b, while the 3-D DCT system requires an asymptotic rate of 0.187 b/p/b.

Clearly, as the sequence length becomes shorter, the 3-D DCT and hybrid systems achieve comparable performance at lower overall rates than the ECPTCQ system, since they require less side information.

For example, for sequences shorter than 20 bands, the hybrid system has a lower overall rate than ECPTCQ, while the 3-D DCT system has the lowest overall rate (of the three coders) when encoding sequences of less than about 55 bands. In fact, the side information of the 3-D DCT system contributes less than 10% to the overall rate for sequences longer than 8 bands, and the hybrid system requires a sequence of only 30 bands such that its overall rate is within 10% of its asymptotic rate. This is in sharp contrast to the ECPTCQ system, where sequences longer than 200 are needed such that the side information does not significantly contribute to the overall rate.

[Figure 6.2: Overall rate versus number of spectral bands for the 3-D DCT, hybrid, and ECPTCQ coders.]

For performance comparisons, we have listed the results from [9] for coding hyperspectral data from the Airborne Imaging Spectrometer. It was shown that for encoding a 32-band sequence, compression ratios of 3.4:1, 6.7:1, and 8:1 were obtained by using DPCM, block truncation coding, and transform coding, with PSNR values of 35.73, 31.35, and 32.60 dB, respectively. Various VQ schemes yielded performances ranging from 32.09 to 33.53 dB at compression ratios ranging from 32:1 to 21.3:1, respectively.

For encoding a 40-band hyperspectral sequence, the ECPTCQ, 3-D DCT, and hybrid systems achieve compression ratios of 42:1, 77:1, and 69:1 at average PSNRs of 43.1, 40.75, and 40.29 dB, respectively. The same level of performance is attainable at compression ratios of 64:1, 79:1, and 76:1, respectively, if all 140 bands were encoded. Moreover, the subjective performance of the coders is excellent. Further, by decreasing the side-information rates of the ECPTCQ system to R_1 = R_μ = 0.75 bpp, compression ratios of 53:1 and 70:1 can be obtained for sequences of length 40 and 140, respectively, with a decrease of less than 2 dB in average PSNR, as compared to the high-side-information case.

Note that when encoding the sequence with the ECPTCQ or hybrid systems, the PSNR values of the first few spectral bands are highly dependent upon the bit rate, R_1, chosen to encode the first spectral band. It seems reasonable to choose R_1 such that the PSNR of the first band is roughly equal to the average PSNR of the remaining bands. For the ECPTCQ system, R_1 was chosen as 1.0 bpp (when R_μ = 2.0 bpp and R_s = 0.1 b/p/b), while R_1 for the hybrid system was chosen as 0.75 bpp (when R_s = 0.1 b/p/b). However, when the ECPTCQ system is operated with low side information (e.g., R_μ = 0.75 bpp), R_1 was selected as 0.75 bpp to reflect the lower average PSNR of the coded sequence.


Despite having lower PSNR performance, the hybrid system gains advantages over the 3-D DCT coder in terms of computational simplicity, and over both the 3-D DCT and ECPTCQ systems in terms of memory requirements. The hybrid system requires only two bands at once to encode an image sequence, while the 3-D DCT and ECPTCQ systems use the entire transformed sequence (of course, much of this problem can be alleviated by breaking the sequence into shorter subsequences and coding each individually). Moreover, the hybrid coder uses simple DPCM in the spectral dimension, while the 3-D DCT coder uses transform operations in all three dimensions, thus affording the hybrid coder lower complexity. The complexity of the ECPTCQ coder is comparable to that of the hybrid system. Since 4-state trellises are used exclusively in the three systems, the quantization process requires minimal computations [26].

Finally, we note that our systems need not be optimized for any particular class of imagery, thus making the coders very robust to a wide range of scenery. Many of the coders mentioned in the literature use codebooks trained with specific imagery which may result in image class dependencies. Consequently, those coders require some form of scene-based codebook selection mechanism (either supervised or unsupervised) to operate optimally with varying terrain.

The interband correlation of hyperspectral imagery facilitates substantial reduction of the data required for storing and/or transmitting such imagery. However, a careless approach to reducing the correlation could lead to disastrous loss of the information differences between bands that are the critical value of multispectral imagery. Accordingly, subjective and quantitative performance are only starting points for the evaluation of any hyperspectral image coder. Future studies involving techniques such as spectral signature mapping and mixture component analysis [9] may help reveal the true performance of the systems.


CHAPTER 7

SUMMARY

In Chapter 1, we discussed the formation of hyperspectral imagery and its utility in identifying specific materials within a given scene. We defined the problems associated with the storage and transmission of hyperspectral data, and found that lossless (i.e., information-preserving) compression can provide compression ratios of only about 3:1, which is insufficient to deal with the immense quantities of data produced by hyperspectral sensors. Lossy compression schemes for multispectral data were reviewed and found to be capable of providing compression ratios of greater than 30:1 by exploiting spatial and spectral redundancies inherent to multispectral data.

Chapter 2 reviewed the quantization process and described various scalar quantizers. Vector quantizers were presented and shown to provide substantial gains over scalar quantization via three different mechanisms: granular gain, boundary gain, and non-uniform density gain. Trellis coded quantization was reviewed and shown to outperform scalar quantization as well, with far less complexity than VQ.

Entropy-constrained TCQ was shown to provide near-optimal performance (in a rate-distortion theory sense) for encoding all memoryless sources at all nonnegative encoding rates. TCQ was shown to be far more computationally tractable than VQ for a given encoding rate (at comparable performance levels). A new predictive coding scheme called entropy-constrained predictive trellis coded quantization (ECPTCQ) was developed for encoding autoregressive sources. For encoding a Gauss-Markov source, an 8-state ECPTCQ system outperforms entropy-constrained DPCM by up to 1.0 dB at all nonnegative encoding rates, and comes within 0.5 dB of the rate-distortion function at high rates.

In Chapter 3, a hyperspectral coding scheme was developed which uses ECPTCQ. Specifically, the hyperspectral image sequence is spatially decorrelated by applying a 2-D DCT to nonoverlapping 8 x 8 blocks. Thereafter, ECPTCQ is used to encode the transform coefficients in the spectral dimension. The first spectral band is encoded (as initial conditions for ECPTCQ) using entropy-constrained TCQ. ECPTCQ and ECTCQ codebooks are designed using a modified version of the generalized Lloyd algorithm. Spatial and spectral rate allocation is handled by an iterative optimization algorithm. For encoding an 8-bit hyperspectral sequence at 0.125 b/p/b, an average PSNR of 43.1 dB is obtained.

A 3-D DCT hyperspectral coding system is presented in Chapter 4. The hyperspectral sequence is divided into nonoverlapping 8 x 8 x 8 cubes and "like-coefficients" are collected and encoded using ECTCQ. Codebooks are optimized for various generalized Gaussian distributions, with the appropriate codebook being chosen based on the fourth central moment, or kurtosis. For encoding a 40-band hyperspectral sequence, a compression ratio of 77:1 is obtained with an average PSNR of 40.75 dB.


A third system is presented in Chapter 5. This coder is a hybrid DPCM/DCT configuration whereby DPCM is used in the spectral dimension and each "error image" is coded using a 2-D DCT coding scheme. The error-image coder transforms 8 x 8 blocks and encodes like-coefficients using ECTCQ. Compression ratios of 69:1 are obtainable while maintaining an average PSNR of 40.29 dB over the encoded bands.

Comparisons of the hyperspectral coders were provided in Chapter 6. It was shown that at a given asymptotic rate, the highest level of performance is obtained with the ECPTCQ coder, followed by the 3-D DCT and hybrid coders, respectively. When the average PSNR levels of the coders are matched, the 3-D DCT coder has the lowest overall rate when the sequence length is less than 55 bands. The computational complexities of the ECPTCQ and hybrid systems are roughly equivalent, with the 3-D DCT system being the most computationally intensive. The hybrid system requires only 2 bands at once for encoding purposes, while the ECPTCQ and 3-D DCT systems require the entire hyperspectral sequence. Accordingly, the hybrid system is suitable for sensor-based applications, while the ECPTCQ and 3-D DCT coders are better suited to ground-based applications.

Appendix A

PHOTOGRAPHS


Figure A.1: Band 50 of a hyperspectral sequence (256 x 256).


Figure A.2: Encoded image from ECPTCQ coder with high side information (44.37 dB at 0.19 b/p/b).

Figure A.3: Difference image from ECPTCQ coder with high side information.


Figure A.4: Encoded image from ECPTCQ coder with low side information (42.11 dB at 0.17 b/p/b).

Figure A.5: Difference image from ECPTCQ coder with low side information.

Figure A.6: Encoded image from 3-D DCT coder (41.66 dB at 0.104 b/p/b).

Figure A.7: Difference image from 3-D DCT coder.


Figure A.8: Encoded image from hybrid coder (41.01 dB at 0.116 b/p/b).

Figure A.9: Difference image from hybrid coder.


REFERENCES

[1] A. F. H. Goetz, G. Vane, J. E. Solomon, and B. N. Rock, "Imaging spectrometry for earth remote sensing," Science, vol. 228, pp. 1147-1153, June 1985.

[2] L. Blanchard and O. Weinstein, "Design challenges of the Thematic Mapper," IEEE Trans. Geosci. Remote Sensing, vol. GE-18, pp. 146-160, Apr. 1980.

[3] P. J. Curran and J. L. Dungan, "Estimation of signal-to-noise: A new procedure applied to AVIRIS data," IEEE Trans. Geosci. Remote Sensing, vol. 27, pp. 620-628, Sept. 1989.

[4] G. Vane and A. F. H. Goetz, "Terrestrial imaging spectrometry," Remote Sensing Environ., vol. 24, pp. 1-29, 1988.

[5] W. M. Porter and H. T. Enmark, "A system overview of the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)," Imaging Spectroscopy II, G. Vane, Editor, Proc. SPIE 834, pp. 22-29, 1987.

[6] G. Vane, "First results from the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)," Imaging Spectroscopy II, G. Vane, Editor, Proc. SPIE 834, pp. 166-174, 1987.

[7] D. Rockey, "High Resolution Imaging Spectrometer (HIRIS) - A major advance in imaging spectrometry," Imaging Spectroscopy of the Terrestrial Environment, G. Vane, Editor, Proc. SPIE 1298, pp. 93-104, 1990.

[8] W. Esaias and W. Barnes, "Moderate resolution imaging spectrometer (MODIS) instrument panel report," NASA, Washington, DC, EOS Rep., vol. 2b, 1986.

[9] R. L. Baker and Y. T. Tze, "Compression of high spectral resolution imagery," Applications of Digital Image Processing XI, A. G. Tescher, Editor, Proc. SPIE 974, pp. 255-264, 1988.

[10] A. Jain, "Image data compression: A review," Proc. IEEE, vol. 69, pp. 349-389, Mar. 1981.

[11] T. M. Chen, D. H. Staelin, and R. B. Arps, "Information content analysis of Landsat image data for compression," IEEE Trans. Geosci. Remote Sensing, vol. GE-25, pp. 499-501, July 1987.

[12] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984.

[13] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.

[14] A. N. Netravali and B. G. Haskell, Digital Pictures: Representation and Compression. New York, NY: Plenum Press, 1988.

[15] B. R. Epstein, R. Hingorani, J. M. Shapiro, and M. Czigler, "Multispectral KLT-wavelet data compression for Landsat Thematic Mapper images," Proc. Data Compression Conf., pp. 200-208, Apr. 1992.

[16] M. Manohar and J. C. Tilton, "Progressive vector quantization of multispectral image data using a massively parallel SIMD machine," Proc. Data Compression Conf., pp. 181-190, Apr. 1992.

[17] S. Gupta and A. Gersho, "Feature predictive vector quantization of multispectral images," IEEE Trans. Geosci. Remote Sensing, vol. 30, no. 3, pp. 491-501, 1992.

[18] S. Gupta and A. Gersho, "Variable rate multistage vector quantization of multispectral imagery with greedy bit allocation," Visual Communications and Image Processing, B. G. Haskell and H. Hang, Editors, Proc. SPIE 2094, pp. 890-901, 1993.

[19] S. Jaggi, "An investigative study of multispectral lossy data compression using vector quantization," Hybrid Image and Signal Processing III, D. P. Casasent and A. G. Tescher, Editors, Proc. SPIE 1702, pp. 238-249, 1992.

[20] J. C. Tilton, D. Han, and M. Manohar, "Compression experiments with AVHRR data," Proc. Data Compression Conf., pp. 411-420, Apr. 1991.

[21] J. C. Tilton, "Hierarchical data compression: Integrated browse, moderate loss, and lossless levels of data compression," Proc. International Geoscience and Remote Sensing Symp., pp. 1651-1654, 1990.

[22] J. A. Saghri and A. G. Tescher, "Near-lossless bandwidth compression for radiometric data," Opt. Engr., vol. 30, pp. 934-939, July 1991.

[23] R. F. Rice, "Some practical universal noiseless coding techniques," JPL Publication, vol. 79-22, Mar. 15, 1979.

[24] R. F. Rice and J. Lee, "Some practical universal noiseless coding techniques II," JPL Publication, vol. 83-17, Mar. 1, 1983.

[25] P. G. Howard and J. S. Vitter, "New methods for lossless image compression using arithmetic coding," Proc. Data Compression Conf., pp. 257-266, Apr. 1991.

[26] M. W. Marcellin and T. R. Fischer, "Trellis coded quantization of memoryless and Gauss-Markov sources," IEEE Trans. Commun., vol. COM-38, pp. 82-93, Jan. 1990.

[27] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Trans. Inform. Th., vol. 28, pp. 55-67, Jan. 1982.

[28] G. D. Forney, Jr., "The Viterbi algorithm," Proc. IEEE, vol. 61, pp. 268-278, Mar. 1973.

[29] M. W. Marcellin, P. Sriram, and K. Tong, "Transform coding of monochrome and color images using trellis coded quantization," IEEE Trans. Circuits and Systems for Video Technology, vol. 3, pp. 270-276, Aug. 1993.

[30] P. Sriram and M. W. Marcellin, "Image coding using wavelet transforms and entropy-constrained trellis coded quantization," submitted to IEEE Trans. Image Proc.

[31] G. K. Wallace, "The JPEG still picture compression standard," Communications of the ACM, vol. 34, pp. 30-44, Apr. 1991.

[32] M. L. Liou, "Overview of the px64 kbps video coding standard," Communications of the ACM, vol. 34, pp. 59-63, Apr. 1991.

[33] D. L. Le Gall, "MPEG: A video compression standard for multimedia applications," Communications of the ACM, vol. 34, pp. 46-58, Apr. 1991.

[34] M. T. Sun, T. C. Chen, A. Gottlieb, L. Wu, and M. L. Liou, "A 16 x 16 Discrete Cosine Transform chip," Visual Communications and Image Processing II, T. R. Hsing, Editor, Proc. SPIE 845, pp. 13-18, 1987.

[35] A. M. Chiang, "A video-rate CCD two-dimensional cosine transform processor," Visual Communications and Image Processing II, T. R. Hsing, Editor, Proc. SPIE 845, pp. 2-5, 1987.

[36] F. Jutand, N. Demassieux, M. Dana, J.-P. Durandeau, and G. Concordel, "A 13.5 MHz single chip multiformat discrete cosine transform," Visual Communications and Image Processing II, T. R. Hsing, Editor, Proc. SPIE 845, pp. 6-12, 1987.

[37] R. J. Clarke, Transform Coding of Images. Orlando, FL: Academic Press, 1985.

[38] W. H. Chen and C. H. Smith, "Adaptive coding of monochrome and color images," IEEE Trans. Commun., vol. COM-25, pp. 1285-1292, Nov. 1977.

[39] W. H. Chen and W. K. Pratt, "Scene adaptive coder," IEEE Trans. Commun., vol. COM-32, pp. 225-232, Mar. 1984.

[40] W. A. Pearlman, P. Jakatdar, and M. M. Leung, "Adaptive transform tree coding of images," IEEE J. Select. Areas in Commun., vol. 10, pp. 902-912, June 1992.

[41] R. E. Crochiere, S. A. Webber, and J. L. Flanagan, "Digital coding of speech in subbands," Bell Syst. Tech. J., vol. 55, pp. 1069-1085, Oct. 1976.

[42] A. Croisier, D. Esteban, and C. Garland, "Perfect channel splitting by use of interpolation/decimation/tree decomposition techniques," in Conf. Proceedings, 1976 IEEE Int. Conf. on Inform. Sci., Patras, Greece, May 1976.

[43] M. Vetterli, "Multi-dimensional subband coding: Some theory and algorithms," Signal Processing, vol. 6, pp. 97-112, Apr. 1984.

[44] J. W. Woods and S. D. O'Neil, "Subband coding of images," IEEE Trans. Acoust., Speech, and Signal Proc., vol. ASSP-34, pp. 1278-1288, Oct. 1986.

[45] H. Gharavi and A. Tabatabai, "Subband coding of monochrome and color images," IEEE Trans. Circuits Syst., vol. CAS-35, pp. 207-214, Feb. 1988.

[46] P. H. Westerink, D. E. Boekee, J. Biemond, and J. W. Woods, "Subband coding of images using vector quantization," IEEE Trans. Commun., vol. COM-36, pp. 713-719, June 1988.

[47] M. J. T. Smith and S. L. Eddins, "Analysis/synthesis techniques for subband image coding," IEEE Trans. Acoust., Speech, and Signal Proc., vol. ASSP-38, pp. 1446-1456, Aug. 1990.

[48] S. Nanda and W. A. Pearlman, "Tree coding of image subbands," IEEE Trans. Image Proc., vol. IP-1, pp. 133-147, Apr. 1992.

[49] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Trans. Pattern Anal. and Mach. Intel., vol. 11, pp. 674-693, July 1989.

[50] S. G. Mallat, "Multifrequency channel decomposition of images and wavelet models," IEEE Trans. Acoust., Speech, and Signal Proc., vol. ASSP-37, pp. 2091-2110, Dec. 1989.

[51] I. Daubechies, "Orthonormal bases of compactly supported wavelets," Commun. Pure Appl. Math., vol. 41, pp. 909-996, Nov. 1988.

[52] A. Grossmann and J. Morlet, "Decomposition of Hardy functions into square integrable wavelets of constant shape," SIAM J. Math. Anal., vol. 15, pp. 723-736, July 1984.

[53] J. Max, "Quantizing for minimum distortion," IEEE Trans. Inform. Th., vol. IT-6, pp. 7-12, Mar. 1960.

[54] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.

[55] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Communications of the ACM, vol. 30, pp. 520-540, June 1987.

[56] F. Jelinek, "Buffer overflow in variable length coding of fixed rate sources," IEEE Trans. Inform. Th., vol. IT-14, pp. 490-501, May 1968.

[57] N. Farvardin and J. W. Modestino, "On overflow and underflow problems in buffer-instrumented variable-length coding of fixed-rate memoryless sources," IEEE Trans. Inform. Th., vol. IT-32, pp. 839-845, Nov. 1986.

[58] H. Gish and J. N. Pierce, "Asymptotically efficient quantizing," IEEE Trans. Inform. Th., vol. IT-14, pp. 676-683, Sept. 1968.

[59] N. Farvardin and J. W. Modestino, "Optimum quantizer performance for a class of non-Gaussian memoryless sources," IEEE Trans. Inform. Th., vol. 30, pp. 485-497, May 1984.

[60] T. Berger, "Optimum quantizers and permutation codes," IEEE Trans. Inform. Th., vol. IT-18, pp. 759-765, Nov. 1972.

[61] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971.

[62] T. R. Fischer, "Geometric source coding and vector quantization," IEEE Trans. Inform. Th., vol. 35, pp. 137-145, Jan. 1989.

[63] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston, MA: Kluwer Academic Press, 1992.

[64] R. Laroia, "Design and analysis of a fixed-rate structured vector quantizer derived from variable-length scalar quantizers," Ph.D. dissertation, University of Maryland, College Park, MD, May 1992.

[65] M. V. Eyuboglu and G. D. Forney, Jr., "Lattice and trellis quantization with lattice- and trellis-bounded codebooks - high rate theory for memoryless sources," IEEE Trans. Inform. Th., vol. 39, pp. 46-59, Jan. 1993.

[66] A. Gersho, "Asymptotically optimal block quantization," IEEE Trans. Inform. Th., vol. IT-25, pp. 373-380, July 1979.

[67] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. COM-28, pp. 84-95, Jan. 1980.

[68] T. R. Fischer, "A pyramid vector quantizer," IEEE Trans. Inform. Th., vol. 32, pp. 568-583, July 1986.

[69] M. W. Marcellin, "On entropy-constrained trellis coded quantization," to appear in IEEE Trans. Commun., vol. 42, Jan. 1994.

[70] P. Chou, T. Lookabaugh, and R. M. Gray, "Entropy-constrained vector quantization," IEEE Trans. Acoust., Speech, and Signal Proc., vol. ASSP-37, pp. 31-42, Jan. 1989.

[71] N. Farvardin and J. W. Modestino, "Rate-distortion performance of DPCM schemes for autoregressive sources," IEEE Trans. Inform. Th., vol. 31, pp. 402-418, May 1985.

[72] E. Ayanoglu and R. M. Gray, "The design of predictive trellis waveform coders using the generalized Lloyd algorithm," IEEE Trans. Commun., vol. COM-34, pp. 1073-1080, Nov. 1986.

[73] T. R. Fischer and M. Wang, "Entropy-constrained trellis coded quantization," IEEE Trans. Inform. Th., vol. 38, pp. 415-425, Mar. 1992.

[74] R. C. Reininger and J. D. Gibson, "Distributions of the two-dimensional DCT coefficients for images," IEEE Trans. Commun., vol. COM-31, pp. 835-839, June 1983.

[75] Y. Shoham and A. Gersho, "Efficient bit allocation for an arbitrary set of quantizers," IEEE Trans. Acoust., Speech, and Signal Proc., vol. 36, pp. 1445-1453, Sept. 1988.

[76] R. C. Gonzalez and P. Wintz, Digital Image Processing. Reading, MA: Addison-Wesley, 1989.

[77] J. W. Woods, Editor, Subband Image Coding. Boston, MA: Kluwer Academic Press, 1991.

[78] Y. Q. Zhang and S. Zafar, "Motion-compensated wavelet transform coding for color video compression," IEEE Trans. Circuits and Systems for Video Technology, vol. 2, pp. 285-296, Sept. 1992.

[79] S. Zafar, Y. Q. Zhang, and B. J. Jabbari, "Multiscale video representation using multiresolution motion compensation and wavelet decomposition," IEEE J. Selected Areas in Commun., vol. 11, pp. 24-35, Jan. 1993.
