INTERNATIONAL TELECOMMUNICATION UNION
CCITT – THE INTERNATIONAL TELEGRAPH AND TELEPHONE CONSULTATIVE COMMITTEE
TERMINAL EQUIPMENT AND PROTOCOLS FOR TELEMATIC SERVICES
INFORMATION TECHNOLOGY – DIGITAL COMPRESSION AND CODING OF CONTINUOUS-TONE STILL IMAGES – REQUIREMENTS AND GUIDELINES
Recommendation T.81 (09/92)
Foreword
ITU (International Telecommunication Union) is the United Nations Specialized Agency in the field of
telecommunications. The CCITT (the International Telegraph and Telephone Consultative Committee) is a permanent
organ of the ITU. Some 166 member countries, 68 telecom operating entities, 163 scientific and industrial organizations
and 39 international organizations participate in CCITT which is the body which sets world telecommunications
standards (Recommendations).
The approval of Recommendations by the members of CCITT is covered by the procedure laid down in CCITT Resolution
No. 2 (Melbourne, 1988). In addition, the Plenary Assembly of CCITT, which meets every four years, approves
Recommendations submitted to it and establishes the study programme for the following period.
In some areas of information technology, which fall within CCITT’s purview, the necessary standards are prepared on a
collaborative basis with ISO and IEC. The text of CCITT Recommendation T.81 was approved on 18th September 1992.
The identical text is also published as ISO/IEC International Standard 10918-1.
___________________
CCITT NOTE
In this Recommendation, the expression “Administration” is used for conciseness to indicate both a telecommunication
administration and a recognized private operating agency.
© ITU 1993
All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or
mechanical, including photocopying and microfilm, without permission in writing from the ITU.
Contents
Introduction
1 Scope
2 Normative references
3 Definitions, abbreviations and symbols
4 General
5 Interchange format requirements
6 Encoder requirements
7 Decoder requirements
Annex A – Mathematical definitions
Annex B – Compressed data formats
Annex C – Huffman table specification
Annex D – Arithmetic coding
Annex E – Encoder and decoder control procedures
Annex F – Sequential DCT-based mode of operation
Annex G – Progressive DCT-based mode of operation
Annex H – Lossless mode of operation
Annex J – Hierarchical mode of operation
Annex K – Examples and guidelines
Annex L – Patents
Annex M – Bibliography
Introduction
This CCITT Recommendation | ISO/IEC International Standard was prepared by CCITT Study Group VIII and the Joint
Photographic Experts Group (JPEG) of ISO/IEC JTC 1/SC 29/WG 10. This Experts Group was formed in 1986 to
establish a standard for the sequential and progressive encoding of continuous-tone grayscale and colour images.
Digital Compression and Coding of Continuous-tone Still Images is published in two parts:
– Requirements and guidelines;
– Compliance testing.
This part, Part 1, sets out requirements and implementation guidelines for continuous-tone still image encoding and
decoding processes, and for the coded representation of compressed image data for interchange between applications.
These processes and representations are intended to be generic, that is, to be applicable to a broad range of applications for
colour and grayscale still images within communications and computer systems. Part 2 sets out tests for determining
whether implementations comply with the requirements for the various encoding and decoding processes specified in
Part 1.
The user’s attention is called to the possibility that – for some of the coding processes specified herein – compliance with
this Recommendation | International Standard may require use of an invention covered by patent rights. See Annex L for
further information.
The requirements which these processes must satisfy to be useful for specific image communications applications such as
facsimile, Videotex and audiographic conferencing are defined in CCITT Recommendation T.80. The intent is that the
generic processes of Recommendation T.81 will be incorporated into the various CCITT Recommendations for terminal
equipment for these applications.
In addition to the applications addressed by the CCITT and ISO/IEC, the JPEG committee has developed a compression
standard to meet the needs of other applications as well, including desktop publishing, graphic arts, medical imaging and
scientific imaging.
Annexes A, B, C, D, E, F, G, H and J are normative, and thus form an integral part of this Specification. Annexes K, L
and M are informative and thus do not form an integral part of this Specification.
This Specification aims to follow the guidelines of CCITT and ISO/IEC JTC 1 on Rules for presentation of CCITT |
ISO/IEC common text.
INTERNATIONAL STANDARD ISO/IEC 10918-1 : 1993(E)
CCITT RECOMMENDATION T.81 (1992 E)
INFORMATION TECHNOLOGY – DIGITAL COMPRESSION AND CODING OF CONTINUOUS-TONE STILL IMAGES – REQUIREMENTS AND GUIDELINES
1 Scope
This CCITT Recommendation | International Standard is applicable to continuous-tone – grayscale or colour – digital still
image data. It is applicable to a wide range of applications which require use of compressed images. It is not applicable to
bi-level image data.
This Specification
– specifies processes for converting source image data to compressed image data;
– specifies processes for converting compressed image data to reconstructed image data;
– gives guidance on how to implement these processes in practice;
– specifies coded representations for compressed image data.
NOTE – This Specification does not specify a complete coded image representation. Such representations may include
certain parameters, such as aspect ratio, component sample registration, and colour space designation, which are
application-dependent.
2 Normative references
The following CCITT Recommendations and International Standards contain provisions which, through reference in this
text, constitute provisions of this CCITT Recommendation | International Standard. At the time of publication, the
editions indicated were valid. All Recommendations and Standards are subject to revision, and parties to agreements
based on this CCITT Recommendation | International Standard are encouraged to investigate the possibility of applying
the most recent edition of the Recommendations and Standards listed below. Members of IEC and ISO maintain registers
of currently valid International Standards. The CCITT Secretariat maintains a list of currently valid CCITT
Recommendations.
– CCITT Recommendation T.80 (1992), Common components for image compression and communication – Basic principles.
3 Definitions, abbreviations and symbols
3.1 Definitions and abbreviations
For the purposes of this Specification, the following definitions apply.
3.1.1 abbreviated format: A representation of compressed image data which is missing some or all of the table
specifications required for decoding, or a representation of table-specification data without frame headers, scan headers,
and entropy-coded segments.
3.1.2 AC coefficient: Any DCT coefficient for which the frequency is not zero in at least one dimension.
3.1.3 (adaptive) (binary) arithmetic decoding: An entropy decoding procedure which recovers the sequence of
symbols from the sequence of bits produced by the arithmetic encoder.
3.1.4 (adaptive) (binary) arithmetic encoding: An entropy encoding procedure which codes by means of a recursive
subdivision of the probability of the sequence of symbols coded up to that point.
3.1.5 application environment: The standards for data representation, communication, or storage which have been
established for a particular application.
3.1.6 arithmetic decoder: An embodiment of an arithmetic decoding procedure.
3.1.7 arithmetic encoder: An embodiment of an arithmetic encoding procedure.
3.1.8 baseline (sequential): A particular sequential DCT-based encoding and decoding process specified in this
Specification, and which is required for all DCT-based decoding processes.
3.1.9 binary decision: Choice between two alternatives.
3.1.10 bit stream: Partially encoded or decoded sequence of bits comprising an entropy-coded segment.
3.1.11 block: An 8 × 8 array of samples or an 8 × 8 array of DCT coefficient values of one component.
3.1.12 block-row: A sequence of eight contiguous component lines which are partitioned into 8 × 8 blocks.
3.1.13 byte: A group of 8 bits.
3.1.14 byte stuffing: A procedure in which either the Huffman coder or the arithmetic coder inserts a zero byte into
the entropy-coded segment following the generation of an encoded hexadecimal X’FF’ byte.
3.1.15 carry bit: A bit in the arithmetic encoder code register which is set if a carry-over in the code register overflows
the eight bits reserved for the output byte.
3.1.16 ceiling function: The mathematical procedure in which the greatest integer value of a real number is obtained
by selecting the smallest integer value which is greater than or equal to the real number.
3.1.17 class (of coding process): Lossy or lossless coding processes.
3.1.18 code register: The arithmetic encoder register containing the least significant bits of the partially completed
entropy-coded segment. Alternatively, the arithmetic decoder register containing the most significant bits of a partially
decoded entropy-coded segment.
3.1.19 coder: An embodiment of a coding process.
3.1.20 coding: Encoding or decoding.
3.1.21 coding model: A procedure used to convert input data into symbols to be coded.
3.1.22 (coding) process: A general term for referring to an encoding process, a decoding process, or both.
3.1.23 colour image: A continuous-tone image that has more than one component.
3.1.24 columns: Samples per line in a component.
3.1.25 component: One of the two-dimensional arrays which comprise an image.
3.1.26 compressed data: Either compressed image data or table specification data or both.
3.1.27 compressed image data: A coded representation of an image, as specified in this Specification.
3.1.28 compression: Reduction in the number of bits used to represent source image data.
3.1.29 conditional exchange: The interchange of MPS and LPS probability intervals whenever the size of the LPS
interval is greater than the size of the MPS interval (in arithmetic coding).
3.1.30 (conditional) probability estimate: The probability value assigned to the LPS by the probability estimation
state machine (in arithmetic coding).
3.1.31 conditioning table: The set of parameters which select one of the defined relationships between prior coding
decisions and the conditional probability estimates used in arithmetic coding.
3.1.32 context: The set of previously coded binary decisions which is used to create the index to the probability
estimation state machine (in arithmetic coding).
3.1.33 continuous-tone image: An image whose components have more than one bit per sample.
3.1.34 data unit: An 8 × 8 block of samples of one component in DCT-based processes; a sample in lossless processes.
3.1.35 DC coefficient: The DCT coefficient for which the frequency is zero in both dimensions.
3.1.36 DC prediction: The procedure used by DCT-based encoders whereby the quantized DC coefficient from the
previously encoded 8 × 8 block of the same component is subtracted from the current quantized DC coefficient.
3.1.37 (DCT) coefficient: The amplitude of a specific cosine basis function – may refer to an original DCT coefficient,
to a quantized DCT coefficient, or to a dequantized DCT coefficient.
3.1.38 decoder: An embodiment of a decoding process.
3.1.39 decoding process: A process which takes as its input compressed image data and outputs a continuous-tone
image.
3.1.40 default conditioning: The values defined for the arithmetic coding conditioning tables at the beginning of
coding of an image.
3.1.41 dequantization: The inverse procedure to quantization by which the decoder recovers a representation of the
DCT coefficients.
3.1.42 differential component: The difference between an input component derived from the source image and the
corresponding reference component derived from the preceding frame for that component (in hierarchical mode coding).
3.1.43 differential frame: A frame in a hierarchical process in which differential components are either encoded or
decoded.
3.1.44 (digital) reconstructed image (data): A continuous-tone image which is the output of any decoder defined in
this Specification.
3.1.45 (digital) source image (data): A continuous-tone image used as input to any encoder defined in this
Specification.
3.1.46 (digital) (still) image: A set of two-dimensional arrays of integer data.
3.1.47 discrete cosine transform; DCT: Either the forward discrete cosine transform or the inverse discrete cosine
transform.
3.1.48 downsampling (filter): A procedure by which the spatial resolution of an image is reduced (in hierarchical
mode coding).
3.1.49 encoder: An embodiment of an encoding process.
3.1.50 encoding process: A process which takes as its input a continuous-tone image and outputs compressed image
data.
3.1.51 entropy-coded (data) segment: An independently decodable sequence of entropy encoded bytes of compressed
image data.
3.1.52 (entropy-coded segment) pointer: The variable which points to the most recently placed (or fetched) byte in
the entropy encoded segment.
3.1.53 entropy decoder: An embodiment of an entropy decoding procedure.
3.1.54 entropy decoding: A lossless procedure which recovers the sequence of symbols from the sequence of bits
produced by the entropy encoder.
3.1.55 entropy encoder: An embodiment of an entropy encoding procedure.
3.1.56 entropy encoding: A lossless procedure which converts a sequence of input symbols into a sequence of bits
such that the average number of bits per symbol approaches the entropy of the input symbols.
3.1.57 extended (DCT-based) process: A descriptive term for DCT-based encoding and decoding processes in which
additional capabilities are added to the baseline sequential process.
3.1.58 forward discrete cosine transform; FDCT: A mathematical transformation using cosine basis functions which
converts a block of samples into a corresponding block of original DCT coefficients.
3.1.59 frame: A group of one or more scans (all using the same DCT-based or lossless process) through the data of one
or more of the components in an image.
3.1.60 frame header: A marker segment that contains a start-of-frame marker and associated frame parameters that are
coded at the beginning of a frame.
3.1.61 frequency: A two-dimensional index into the two-dimensional array of DCT coefficients.
3.1.62 (frequency) band: A contiguous group of coefficients from the zig-zag sequence (in progressive mode coding).
3.1.63 full progression: A process which uses both spectral selection and successive approximation (in progressive
mode coding).
3.1.64 grayscale image: A continuous-tone image that has only one component.
3.1.65 hierarchical: A mode of operation for coding an image in which the first frame for a given component is
followed by frames which code the differences between the source data and the reconstructed data from the previous
frame for that component. Resolution changes are allowed between frames.
3.1.66 hierarchical decoder: A sequence of decoder processes in which the first frame for each component is followed
by frames which decode an array of differences for each component and adds it to the reconstructed data from the
preceding frame for that component.
3.1.67 hierarchical encoder: The mode of operation in which the first frame for each component is followed by frames
which encode the array of differences between the source data and the reconstructed data from the preceding frame for
that component.
3.1.68 horizontal sampling factor: The relative number of horizontal data units of a particular component with respect
to the number of horizontal data units in the other components.
3.1.69
Huffman decoder: An embodiment of a Huffman decoding procedure.
3.1.70 Huffman decoding: An entropy decoding procedure which recovers the symbol from each variable length code
produced by the Huffman encoder.
3.1.71
Huffman encoder: An embodiment of a Huffman encoding procedure.
3.1.72
Huffman encoding: An entropy encoding procedure which assigns a variable length code to each input symbol.
3.1.73
Huffman table: The set of variable length codes required in a Huffman encoder and Huffman decoder.
3.1.74
image data: Either source image data or reconstructed image data.
3.1.75 interchange format: The representation of compressed image data for exchange between application
environments.
3.1.76 interleaved: The descriptive term applied to the repetitive multiplexing of small groups of data units from each
component in a scan in a specific order.
3.1.77 inverse discrete cosine transform; IDCT: A mathematical transformation using cosine basis functions which
converts a block of dequantized DCT coefficients into a corresponding block of samples.
3.1.78 Joint Photographic Experts Group; JPEG: The informal name of the committee which created this
Specification. The “joint” comes from the CCITT and ISO/IEC collaboration.
3.1.79 latent output: Output of the arithmetic encoder which is held, pending resolution of carry-over (in arithmetic
coding).
3.1.80 less probable symbol; LPS: For a binary decision, the decision value which has the smaller probability.
3.1.81 level shift: A procedure used by DCT-based encoders and decoders whereby each input sample is either
converted from an unsigned representation to a two’s complement representation or from a two’s complement
representation to an unsigned representation.
3.1.82 lossless: A descriptive term for encoding and decoding processes and procedures in which the output of the
decoding procedure(s) is identical to the input to the encoding procedure(s).
3.1.83 lossless coding: The mode of operation which refers to any one of the coding processes defined in this
Specification in which all of the procedures are lossless (see Annex H).
3.1.84 lossy: A descriptive term for encoding and decoding processes which are not lossless.
3.1.85 marker: A two-byte code in which the first byte is hexadecimal FF (X’FF’) and the second byte is a value
between 1 and hexadecimal FE (X’FE’).
3.1.86 marker segment: A marker and associated set of parameters.
3.1.87 MCU-row: The smallest sequence of MCU which contains at least one line of samples or one block-row from
every component in the scan.
3.1.88 minimum coded unit; MCU: The smallest group of data units that is coded.
3.1.89 modes (of operation): The four main categories of image coding processes defined in this Specification.
3.1.90 more probable symbol; MPS: For a binary decision, the decision value which has the larger probability.
3.1.91 non-differential frame: The first frame for any components in a hierarchical encoder or decoder. The
components are encoded or decoded without subtraction from reference components. The term refers also to any frame in
modes other than the hierarchical mode.
3.1.92 non-interleaved: The descriptive term applied to the data unit processing sequence when the scan has only one
component.
3.1.93 parameters: Fixed length integers 4, 8 or 16 bits in length, used in the compressed data formats.
3.1.94 point transform: Scaling of a sample or DCT coefficient.
3.1.95 precision: Number of bits allocated to a particular sample or DCT coefficient.
3.1.96 predictor: A linear combination of previously reconstructed values (in lossless mode coding).
3.1.97 probability estimation state machine: An interlinked table of probability values and indices which is used to
estimate the probability of the LPS (in arithmetic coding).
3.1.98 probability interval: The probability of a particular sequence of binary decisions within the ordered set of all
possible sequences (in arithmetic coding).
3.1.99 (probability) sub-interval: A portion of a probability interval allocated to either of the two possible binary
decision values (in arithmetic coding).
3.1.100 procedure: A set of steps which accomplishes one of the tasks which comprise an encoding or decoding
process.
3.1.101 process: See coding process.
3.1.102 progressive (coding): One of the DCT-based processes defined in this Specification in which each scan
typically improves the quality of the reconstructed image.
3.1.103 progressive DCT-based: The mode of operation which refers to any one of the processes defined in Annex G.
3.1.104 quantization table: The set of 64 quantization values used to quantize the DCT coefficients.
3.1.105 quantization value: An integer value used in the quantization procedure.
3.1.106 quantize: The act of performing the quantization procedure for a DCT coefficient.
3.1.107 reference (reconstructed) component: Reconstructed component data which is used in a subsequent frame of a
hierarchical encoder or decoder process (in hierarchical mode coding).
3.1.108 renormalization: The doubling of the probability interval and the code register value until the probability
interval exceeds a fixed minimum value (in arithmetic coding).
3.1.109 restart interval: The integer number of MCUs processed as an independent sequence within a scan.
3.1.110 restart marker: The marker that separates two restart intervals in a scan.
3.1.111 run (length): Number of consecutive symbols of the same value.
3.1.112 sample: One element in the two-dimensional array which comprises a component.
3.1.113 sample-interleaved: The descriptive term applied to the repetitive multiplexing of small groups of samples from
each component in a scan in a specific order.
3.1.114 scan: A single pass through the data for one or more of the components in an image.
3.1.115 scan header: A marker segment that contains a start-of-scan marker and associated scan parameters that are
coded at the beginning of a scan.
3.1.116 sequential (coding): One of the lossless or DCT-based coding processes defined in this Specification in which
each component of the image is encoded within a single scan.
3.1.117 sequential DCT-based: The mode of operation which refers to any one of the processes defined in Annex F.
3.1.118 spectral selection: A progressive coding process in which the zig-zag sequence is divided into bands of one or
more contiguous coefficients, and each band is coded in one scan.
3.1.119 stack counter: The count of X’FF’ bytes which are held, pending resolution of carry-over in the arithmetic
encoder.
3.1.120 statistical conditioning: The selection, based on prior coding decisions, of one estimate out of a set of
conditional probability estimates (in arithmetic coding).
3.1.121 statistical model: The assignment of a particular conditional probability estimate to each of the binary
arithmetic coding decisions.
3.1.122 statistics area: The array of statistics bins required for a coding process which uses arithmetic coding.
3.1.123 statistics bin: The storage location where an index is stored which identifies the value of the conditional
probability estimate used for a particular arithmetic coding binary decision.
3.1.124 successive approximation: A progressive coding process in which the coefficients are coded with reduced
precision in the first scan, and precision is increased by one bit with each succeeding scan.
3.1.125 table specification data: The coded representation from which the tables used in the encoder and decoder are
generated and their destinations specified.
3.1.126 transcoder: A procedure for converting compressed image data of one encoder process to compressed image
data of another encoder process.
3.1.127 (uniform) quantization: The procedure by which DCT coefficients are linearly scaled in order to achieve
compression.
3.1.128 upsampling (filter): A procedure by which the spatial resolution of an image is increased (in hierarchical mode
coding).
3.1.129 vertical sampling factor: The relative number of vertical data units of a particular component with respect to
the number of vertical data units in the other components in the frame.
3.1.130 zero byte: The X’00’ byte.
3.1.131 zig-zag sequence: A specific sequential ordering of the DCT coefficients from (approximately) lowest spatial
frequency to highest.
3.1.132 3-sample predictor: A linear combination of the three nearest neighbor reconstructed samples to the left and
above (in lossless mode coding).
3.2 Symbols
The symbols used in this Specification are listed below.
A: probability interval
AC: AC DCT coefficient
ACji: AC coefficient predicted from DC values
Ah: successive approximation bit position, high
Al: successive approximation bit position, low
Api: ith 8-bit parameter in APPn segment
APPn: marker reserved for application segments
B: current byte in compressed data
B2: next byte in compressed data when B = X’FF’
BE: counter for buffered correction bits for Huffman coding in the successive approximation process
BITS: 16-byte list containing number of Huffman codes of each length
BP: pointer to compressed data
BPST: pointer to byte before start of entropy-coded segment
BR: counter for buffered correction bits for Huffman coding in the successive approximation process
Bx: byte modified by a carry-over
C: value of bit stream in code register
Ci: component identifier for frame
Cu: horizontal frequency dependent scaling factor in DCT
Cv: vertical frequency dependent scaling factor in DCT
CE: conditional exchange
C-low: low order 16 bits of the arithmetic decoder code register
Cmi: ith 8-bit parameter in COM segment
CNT: bit counter in NEXTBYTE procedure
CODE: Huffman code value
CODESIZE(V): code size for symbol V
COM: comment marker
Cs: conditioning table value
Csi: component identifier for scan
CT: renormalization shift counter
Cx: high order 16 bits of arithmetic decoder code register
CX: conditional exchange
dji: data unit from horizontal position i, vertical position j
djik: dji for component k
D: decision decoded
Da: in DC coding, the DC difference coded for the previous block from the same component; in lossless coding, the difference coded for the sample immediately to the left
DAC: define-arithmetic-coding-conditioning marker
Db: the difference coded for the sample immediately above
DC: DC DCT coefficient
DCi: DC coefficient for ith block in component
DCk: kth DC value used in prediction of AC coefficients
DHP: define hierarchical progression marker
DHT: define-Huffman-tables marker
DIFF: difference between quantized DC and prediction
DNL: define-number-of-lines marker
DQT: define-quantization-tables marker
DRI: define restart interval marker
E: exponent in magnitude category upper bound
EC: event counter
ECS: entropy-coded segment
ECSi: ith entropy-coded segment
Eh: horizontal expansion parameter in EXP segment
EHUFCO: Huffman code table for encoder
EHUFSI: encoder table of Huffman code sizes
EOB: end-of-block for sequential; end-of-band for progressive
EOBn: run length category for EOB runs
EOBx: position of EOB in previous successive approximation scan
EOB0, EOB1, ..., EOB14: run length categories for EOB runs
EOI: end-of-image marker
Ev: vertical expansion parameter in EXP segment
EXP: expand reference components marker
FREQ(V): frequency of occurrence of symbol V
Hi: horizontal sampling factor for ith component
Hmax: largest horizontal sampling factor
HUFFCODE: list of Huffman codes corresponding to lengths in HUFFSIZE
HUFFSIZE: list of code lengths
HUFFVAL: list of values assigned to each Huffman code
i: subscript index
I: integer variable
Index(S): index to probability estimation state machine table for context index S
j: subscript index
J: integer variable
JPG: marker reserved for JPEG extensions
JPGn: marker reserved for JPEG extensions
k: subscript index
K: integer variable
Kmin: index of 1st AC coefficient in band (1 for sequential DCT)
Kx: conditioning parameter for AC arithmetic coding model
L: DC and lossless coding conditioning lower bound parameter
Li: element in BITS list in DHT segment
Li(t): element in BITS list in the DHT segment for Huffman table t
La: length of parameters in APPn segment
LASTK: largest value of K
Lc: length of parameters in COM segment
Ld: length of parameters in DNL segment
Le: length of parameters in EXP segment
Lf: length of frame header parameters
Lh: length of parameters in DHT segment
Lp: length of parameters in DAC segment
LPS: less probable symbol (in arithmetic coding)
Lq: length of parameters in DQT segment
Lr: length of parameters in DRI segment
Ls: length of scan header parameters
LSB: least significant bit
m: modulo 8 counter for RSTm marker
mt: number of Vi,j parameters for Huffman table t
M: bit mask used in coding magnitude of V
Mn: nth statistics bin for coding magnitude bit pattern category
MAXCODE: table with maximum value of Huffman code for each code length
MCU: minimum coded unit
MCUi: ith MCU
MCUR: number of MCU required to make up one MCU-row
MINCODE: table with minimum value of Huffman code for each code length
MPS: more probable symbol (in arithmetic coding)
MPS(S): more probable symbol for context-index S
MSB: most significant bit
M2, M3, M4, ..., M15: designation of context-indices for coding of magnitude bits in the arithmetic coding models
n: integer variable
N: data unit counter for MCU coding
N/A: not applicable
Nb: number of data units in MCU
Next_Index_LPS: new value of Index(S) after a LPS renormalization
Next_Index_MPS: new value of Index(S) after a MPS renormalization
Nf: number of components in frame
NL: number of lines defined in DNL segment
Ns: number of components in scan
OTHERS(V): index to next symbol in chain
P: sample precision
Pq: quantizer precision parameter in DQT segment
Pq(t): quantizer precision parameter in DQT segment for quantization table t
PRED: quantized DC coefficient from the most recently coded block of the component
Pt: point transform parameter
Px: calculated value of sample
Qji: quantizer value for coefficient ACji
Qvu: quantization value for DCT coefficient Svu
Q00: quantizer value for DC coefficient
QACji: quantized AC coefficient predicted from DC values
QDCk: kth quantized DC value used in prediction of AC coefficients
Qe: LPS probability estimate
Qe(S): LPS probability estimate for context index S
Qk: kth element of 64 quantization elements in DQT segment
rvu: reconstructed image sample
R: length of run of zero amplitude AC coefficients
Rvu: dequantized DCT coefficient
Ra: reconstructed sample value
Rb: reconstructed sample value
Rc: reconstructed sample value
Rd: rounding in prediction calculation
RES: reserved markers
Ri: restart interval in DRI segment
RRRR: 4-bit value of run length of zero AC coefficients
RS: composite value used in Huffman coding of AC coefficients
RSTm: restart marker number m
syx: reconstructed value from IDCT
S: context index
Svu: DCT coefficient at horizontal frequency u, vertical frequency v
SC: context-index for coding of correction bit in successive approximation coding
Se: end of spectral selection band in zig-zag sequence
SE: context-index for coding of end-of-block or end-of-band
SI: Huffman code size
SIGN: 1 if decoded sense of sign is negative and 0 if decoded sense of sign is positive
SIZE: length of a Huffman code
SLL: shift left logical operation
SLL α β: logical shift left of α by β bits
SN: context-index for coding of first magnitude category when V is negative
SOF0: baseline DCT process frame marker
SOF1: extended sequential DCT frame marker, Huffman coding
SOF2: progressive DCT frame marker, Huffman coding
SOF3: lossless process frame marker, Huffman coding
SOF5: differential sequential DCT frame marker, Huffman coding
SOF6: differential progressive DCT frame marker, Huffman coding
SOF7: differential lossless process frame marker, Huffman coding
SOF9: sequential DCT frame marker, arithmetic coding
SOF10: progressive DCT frame marker, arithmetic coding
SOF11: lossless process frame marker, arithmetic coding
SOF13: differential sequential DCT frame marker, arithmetic coding
SOF14: differential progressive DCT frame marker, arithmetic coding
SOF15: differential lossless process frame marker, arithmetic coding
SOI: start-of-image marker
SOS: start-of-scan marker
SP: context-index for coding of first magnitude category when V is positive
Sqvu: quantized DCT coefficient
SRL: shift right logical operation
SRL α β: logical shift right of α by β bits
Ss: start of spectral selection band in zig-zag sequence
SS: context-index for coding of sign decision
SSSS: 4-bit size category of DC difference or AC coefficient amplitude
ST: stack counter
Switch_MPS: parameter controlling inversion of sense of MPS
Sz: parameter used in coding magnitude of V
S0: context-index for coding of V = 0 decision
t: summation index for parameter limits computation
T: temporary variable
Taj: AC entropy table destination selector for jth component in scan
Tb: arithmetic conditioning table destination identifier
Tc: Huffman coding or arithmetic coding table class
Tdj: DC entropy table destination selector for jth component in scan
TEM: temporary marker
Th: Huffman table destination identifier in DHT segment
Tq: quantization table destination identifier in DQT segment
Tqi: quantization table destination selector for ith component in frame
U: DC and lossless coding conditioning upper bound parameter
V: symbol or value being either encoded or decoded
Vi: vertical sampling factor for ith component
Vi,j: jth value for length i in HUFFVAL
Vmax: largest vertical sampling factor
Vt: temporary variable
VALPTR: list of indices for first value in HUFFVAL for each code length
V1: symbol value
V2: symbol value
xi: number of columns in ith component
X: number of samples per line in component with largest horizontal dimension
Xi: ith statistics bin for coding magnitude category decision
X1, X2, X3, ..., X15: designation of context-indices for coding of magnitude categories in the arithmetic coding models
XHUFCO: extended Huffman code table
XHUFSI: table of sizes of extended Huffman codes
X’values’: values within the quotes are hexadecimal
yi: number of lines in ith component
Y: number of lines in component with largest vertical dimension
ZRL: value in HUFFVAL assigned to run of 16 zero coefficients
ZZ(K): Kth element in zig-zag sequence of quantized DCT coefficients
ZZ(0): quantized DC coefficient in zig-zag sequence order
4 General
The purpose of this clause is to give an informative overview of the elements specified in this Specification. Another
purpose is to introduce many of the terms which are defined in clause 3. These terms are printed in italics upon first usage
in this clause.
4.1 Elements specified in this Specification
There are three elements specified in this Specification:
a) An encoder is an embodiment of an encoding process. As shown in Figure 1, an encoder takes as input digital source image data and table specifications, and by means of a specified set of procedures generates as output compressed image data.
b) A decoder is an embodiment of a decoding process. As shown in Figure 2, a decoder takes as input compressed image data and table specifications, and by means of a specified set of procedures generates as output digital reconstructed image data.
c) The interchange format, shown in Figure 3, is a compressed image data representation which includes all table specifications used in the encoding process. The interchange format is for exchange between application environments.
[Figure 1 – Encoder: source image data and table specifications enter the encoder, which produces compressed image data]
[Figure 2 – Decoder: compressed image data and table specifications enter the decoder, which produces reconstructed image data]
Figures 1 and 2 illustrate the general case for which the continuous-tone source and reconstructed image data consist of
multiple components. (A colour image consists of multiple components; a grayscale image consists only of a single
component.) A significant portion of this Specification is concerned with how to handle multiple-component images in a
flexible, application-independent way.
[Figure 3 – Interchange format for compressed image data: compressed image data, including table specifications, passes from application environment A to application environment B]
These figures are also meant to show that the same tables specified for an encoder to use to compress a particular image
must be provided to a decoder to reconstruct that image. However, this Specification does not specify how applications
should associate tables with compressed image data, nor how they should represent source image data generally within
their specific environments.
Consequently, this Specification also specifies the interchange format shown in Figure 3, in which table specifications are
included within compressed image data. An image compressed with a specified encoding process within
one application environment, A, is passed to a different environment, B, by means of the interchange format.
The interchange format does not specify a complete coded image representation. Application-dependent information,
e.g. colour space, is outside the scope of this Specification.
4.2 Lossy and lossless compression
This Specification specifies two classes of encoding and decoding processes, lossy and lossless processes. Those based on
the discrete cosine transform (DCT) are lossy, thereby allowing substantial compression to be achieved while producing a
reconstructed image with high visual fidelity to the encoder’s source image.
The simplest DCT-based coding process is referred to as the baseline sequential process. It provides a capability which is
sufficient for many applications. There are additional DCT-based processes which extend the baseline sequential process
to a broader range of applications. In any decoder using extended DCT-based decoding processes, the baseline decoding
process is required to be present in order to provide a default decoding capability.
The second class of coding processes is not based upon the DCT and is provided to meet the needs of applications
requiring lossless compression. These lossless encoding and decoding processes are used independently of any of the
DCT-based processes.
A table summarizing the relationship among these lossy and lossless coding processes is included in 4.11.
The amount of compression provided by any of the various processes is dependent on the characteristics of the particular
image being compressed, as well as on the picture quality desired by the application and the desired speed of compression
and decompression.
4.3 DCT-based coding
Figure 4 shows the main procedures for all encoding processes based on the DCT. It illustrates the special case of a
single-component image; this is an appropriate simplification for overview purposes, because all processes specified in this
Specification operate on each image component independently.
[Figure 4 – DCT-based encoder simplified diagram: source image data, in 8 × 8 blocks, passes through the FDCT, the quantizer and the entropy encoder, the latter two each drawing on table specifications, to produce compressed image data]
In the encoding process the input component’s samples are grouped into 8 × 8 blocks, and each block is transformed by
the forward DCT (FDCT) into a set of 64 values referred to as DCT coefficients. One of these values is referred to as the
DC coefficient and the other 63 as the AC coefficients.
Each of the 64 coefficients is then quantized using one of 64 corresponding values from a quantization table (determined
by one of the table specifications shown in Figure 4). No default values for quantization tables are specified in this
Specification; applications may specify values which customize picture quality for their particular image characteristics,
display devices, and viewing conditions.
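NOTE – The following is an informative sketch in Python (not part of this Specification) of the two steps just described for one 8 × 8 block: the FDCT as defined in Annex A, followed by uniform quantization with a caller-supplied quantization table. All names are illustrative only.

    import math

    def fdct_8x8(block):
        # Forward DCT of an 8 x 8 list-of-lists of level-shifted samples,
        # following the FDCT definition of Annex A.
        def c(w):
            return 1 / math.sqrt(2) if w == 0 else 1.0  # Cu, Cv factors
        coeffs = [[0.0] * 8 for _ in range(8)]
        for v in range(8):
            for u in range(8):
                s = 0.0
                for y in range(8):
                    for x in range(8):
                        s += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
                coeffs[v][u] = 0.25 * c(u) * c(v) * s
        return coeffs

    def quantize(coeffs, qtable):
        # Sq_vu = round(S_vu / Q_vu): one quantization value per coefficient.
        return [[round(coeffs[v][u] / qtable[v][u]) for u in range(8)]
                for v in range(8)]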
After quantization, the DC coefficient and the 63 AC coefficients are prepared for entropy encoding, as shown in
Figure 5. The previous quantized DC coefficient is used to predict the current quantized DC coefficient, and the difference
is encoded. The 63 quantized AC coefficients undergo no such differential encoding, but are converted into a
one-dimensional zig-zag sequence, as shown in Figure 5.
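NOTE – Informatively, this preparation step can be sketched in Python as follows, assuming sq is the quantized 8 × 8 block and pred the quantized DC coefficient of the previously encoded block of the same component; the names are illustrative only.

    def zigzag_order():
        # (v, u) index pairs of an 8 x 8 block from lowest to highest
        # spatial frequency: diagonals of constant v + u, direction
        # alternating between odd and even diagonals.
        return sorted(((v, u) for v in range(8) for u in range(8)),
                      key=lambda p: (p[0] + p[1],
                                     p[0] if (p[0] + p[1]) % 2 else p[1]))

    def prepare_block(sq, pred):
        # DIFF = DC_i - DC_(i-1); the 63 AC coefficients follow in
        # zig-zag order ZZ(1)..ZZ(63), skipping the DC term ZZ(0).
        diff = sq[0][0] - pred
        zz_ac = [sq[v][u] for v, u in zigzag_order()[1:]]
        return diff, zz_ac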
The quantized coefficients are then passed to an entropy encoding procedure which compresses the data further. One of
two entropy coding procedures can be used, as described in 4.6. If Huffman encoding is used, Huffman table
specifications must be provided to the encoder. If arithmetic encoding is used, arithmetic coding conditioning table
specifications may be provided, otherwise the default conditioning table specifications shall be used.
Figure 6 shows the main procedures for all DCT-based decoding processes. Each step shown performs essentially the
inverse of its corresponding main procedure within the encoder. The entropy decoder decodes the zig-zag sequence of
quantized DCT coefficients. After dequantization the DCT coefficients are transformed to an 8 × 8 block of samples by
the inverse DCT (IDCT).
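NOTE – Informatively, the decoder side of the sketch above inverts those steps: dequantization (R_vu = Sq_vu × Q_vu) followed by the IDCT of Annex A. Names are illustrative only.

    import math

    def dequantize(sq, qtable):
        # Recover a representation of the DCT coefficients.
        return [[sq[v][u] * qtable[v][u] for u in range(8)]
                for v in range(8)]

    def idct_8x8(coeffs):
        # Inverse DCT of Annex A, producing an 8 x 8 block of samples.
        def c(w):
            return 1 / math.sqrt(2) if w == 0 else 1.0
        out = [[0.0] * 8 for _ in range(8)]
        for y in range(8):
            for x in range(8):
                s = 0.0
                for v in range(8):
                    for u in range(8):
                        s += (c(u) * c(v) * coeffs[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
                out[y][x] = 0.25 * s
        return out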
4.4 Lossless coding
Figure 7 shows the main procedures for the lossless encoding processes. A predictor combines the reconstructed values of
up to three neighbourhood samples at positions a, b, and c to form a prediction of the sample at position x as shown in
Figure 8. This prediction is then subtracted from the actual value of the sample at position x, and the difference is
losslessly entropy-coded by either Huffman or arithmetic coding.
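NOTE – The predictor selections themselves are specified normatively in Annex H; the following informative Python sketch shows how a prediction is formed from the reconstructed neighbours Ra (left), Rb (above) and Rc (above-left) of Figure 8. Names are illustrative only.

    def predict(ra, rb, rc, selection):
        # Selection values 1 to 7 of Annex H; the encoder entropy-codes
        # the difference between the actual sample and this prediction.
        if selection == 1:
            return ra
        if selection == 2:
            return rb
        if selection == 3:
            return rc
        if selection == 4:
            return ra + rb - rc
        if selection == 5:
            return ra + ((rb - rc) >> 1)
        if selection == 6:
            return rb + ((ra - rc) >> 1)
        if selection == 7:
            return (ra + rb) >> 1
        raise ValueError("predictor selection value must be 1..7")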
[Figure 5 – Preparation of quantized coefficients for entropy encoding: differential DC encoding (DIFF = DCi − DCi−1 between block i−1 and block i) and zig-zag ordering of the AC coefficients AC01 through AC77]
[Figure 6 – DCT-based decoder simplified diagram: compressed image data passes through the entropy decoder and the dequantizer, each drawing on table specifications, and then the IDCT to produce reconstructed image data]
[Figure 7 – Lossless encoder simplified diagram: source image data passes through the predictor and the entropy encoder (with table specifications) to produce compressed image data]
[Figure 8 – 3-sample prediction neighbourhood: sample x is predicted from the reconstructed samples a (left of x), b (above x) and c (above and to the left of x)]
This encoding process may also be used in a slightly modified way, whereby the precision of the input samples is reduced
by one or more bits prior to the lossless coding. This achieves higher compression than the lossless process (but lower
compression than the DCT-based processes for equivalent visual fidelity), and limits the reconstructed image’s worst-case
sample error to the amount of input precision reduction.
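NOTE – Informatively, this precision reduction is a point transform (a right shift) of the input samples, sketched below in Python with Pt as the number of bits of reduction; names are illustrative only.

    def reduce_precision(sample, pt):
        return sample >> pt   # drop the Pt least significant bits

    def restore_precision(value, pt):
        return value << pt    # worst-case sample error is 2**pt - 1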
4.5 Modes of operation
There are four distinct modes of operation under which the various coding processes are defined: sequential
DCT-based, progressive DCT-based, lossless, and hierarchical. (Implementations are not required to provide all of
these.) The lossless mode of operation was described in 4.4. The other modes of operation are compared as follows.
For the sequential DCT-based mode, 8 × 8 sample blocks are typically input block by block from left to right, and block-row by block-row from top to bottom. After a block has been transformed by the forward DCT, quantized and prepared for
entropy encoding, all 64 of its quantized DCT coefficients can be immediately entropy encoded and output as part of the
compressed image data (as was described in 4.3), thereby minimizing coefficient storage requirements.
For the progressive DCT-based mode, 8 × 8 blocks are also typically encoded in the same order, but in multiple scans
through the image. This is accomplished by adding an image-sized coefficient memory buffer (not shown in Figure 4)
between the quantizer and the entropy encoder. As each block is transformed by the forward DCT and quantized, its
coefficients are stored in the buffer. The DCT coefficients in the buffer are then partially encoded in each of multiple
scans. The typical sequence of image presentation at the output of the decoder for sequential versus progressive modes of
operation is shown in Figure 9.
There are two procedures by which the quantized coefficients in the buffer may be partially encoded within a scan. First,
only a specified band of coefficients from the zig-zag sequence need be encoded. This procedure is called spectral
selection, because each band typically contains coefficients which occupy a lower or higher part of the frequency spectrum
for that 8 × 8 block. Secondly, the coefficients within the current band need not be encoded to their full (quantized)
accuracy within each scan. Upon a coefficient’s first encoding, a specified number of most significant bits is encoded first.
In subsequent scans, the less significant bits are then encoded. This procedure is called successive approximation. Either
procedure may be used separately, or they may be mixed in flexible combinations.
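NOTE – An informative Python sketch of what a single progressive scan takes from the coefficient buffer: the band Ss..Se of the zig-zag sequence (spectral selection), point transformed by Al bit positions (successive approximation). Names are illustrative only.

    def scan_payload(zz, ss, se, al):
        # zz: the 64 quantized coefficients of one block in zig-zag order.
        # Spectral selection keeps the band [ss, se]; the successive
        # approximation point transform divides magnitudes by 2**al,
        # truncating toward zero.
        band = zz[ss:se + 1]
        return [v >> al if v >= 0 else -((-v) >> al) for v in band]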
In hierarchical mode, an image is encoded as a sequence of frames. These frames provide reference reconstructed
components which are usually needed for prediction in subsequent frames. Except for the first frame for a given
component, differential frames encode the difference between source components and reference reconstructed
components. The coding of the differences may be done using only DCT-based processes, only lossless processes, or
DCT-based processes with a final lossless process for each component. Downsampling and upsampling filters may be
used to provide a pyramid of spatial resolutions as shown in Figure 10. Alternatively, the hierarchical mode can be used to
improve the quality of the reconstructed components at a given spatial resolution.
Hierarchical mode offers a progressive presentation similar to the progressive DCT-based mode but is useful in
environments which have multi-resolution requirements. Hierarchical mode also offers the capability of progressive
coding to a final lossless stage.
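NOTE – An informative Python sketch of the hierarchical frame sequence, assuming a trivial 2× nearest-neighbour upsampling filter on 2-D sample lists (the normative filters are specified in Annex J) and a caller-supplied code_frame function standing in for any encode-then-reconstruct process of this Specification.

    def upsample2x(img):
        return [[img[y // 2][x // 2] for x in range(2 * len(img[0]))]
                for y in range(2 * len(img))]

    def pixelwise(a, b, op):
        return [[op(pa, pb) for pa, pb in zip(ra, rb)]
                for ra, rb in zip(a, b)]

    def encode_pyramid(levels, code_frame):
        # levels: the component at increasing resolutions, each twice the
        # previous; the first frame is non-differential, and later frames
        # code differences against the upsampled reference reconstruction.
        reference = None
        for source in levels:
            if reference is None:
                reference = code_frame(source)
            else:
                up = upsample2x(reference)
                diff = pixelwise(source, up, lambda s, r: s - r)
                reference = pixelwise(up, code_frame(diff),
                                      lambda r, d: r + d)
        return reference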
[Figure 9 – Progressive versus sequential presentation: the progressive output refines the whole image over successive scans, while the sequential output completes the image region by region in a single scan]
[Figure 10 – Hierarchical multi-resolution encoding: a pyramid of successively downsampled versions of the image]
4.6 Entropy coding alternatives
Two alternative entropy coding procedures are specified: Huffman coding and arithmetic coding. Huffman coding
procedures use Huffman tables, determined by one of the table specifications shown in Figures 1 and 2. Arithmetic coding
procedures use arithmetic coding conditioning tables, which may also be determined by a table specification. No default
values for Huffman tables are specified, so that applications may choose tables appropriate for their own environments.
Default tables are defined for the arithmetic coding conditioning.
The baseline sequential process uses Huffman coding, while the extended DCT-based and lossless processes may use
either Huffman or arithmetic coding.
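NOTE – Informatively, a Huffman table specification consists of a BITS list (the number of codes of each length from 1 to 16) and a HUFFVAL list of symbol values. The following Python sketch expands such a specification into codes in the canonical order of Annex C; names are illustrative only.

    def huffman_codes(bits, huffval):
        # bits[i]: number of codes of length i + 1 (16 entries).
        # Returns {symbol: (code, size)}, with codes assigned in
        # increasing numerical order within and across code lengths.
        table, code, k = {}, 0, 0
        for size in range(1, 17):
            for _ in range(bits[size - 1]):
                table[huffval[k]] = (code, size)
                code += 1
                k += 1
            code <<= 1
        return table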
4.7 Sample precision
For DCT-based processes, two alternative sample precisions are specified: either 8 bits or 12 bits per sample. Applications
which use samples with other precisions can use either 8-bit or 12-bit precision by shifting their source image samples
appropriately. The baseline process uses only 8-bit precision. DCT-based implementations which handle 12-bit source
image samples are likely to need greater computational resources than those which handle only
8-bit source images. Consequently in this Specification separate normative requirements are defined for 8-bit and
12-bit DCT-based processes.
For lossless processes the sample precision is specified to be from 2 to 16 bits.
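NOTE – Informatively, an application whose samples have some other precision P can map them to the 8-bit or 12-bit range of a DCT-based process by shifting, for example:

    def shift_to_precision(sample, p_in, p_out):
        # Right- or left-shift a sample from p_in-bit to p_out-bit range.
        if p_in >= p_out:
            return sample >> (p_in - p_out)
        return sample << (p_out - p_in)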
4.8 Multiple-component control
Subclauses 4.3 and 4.4 give an overview of one major part of the encoding and decoding processes – those which operate
on the sample values in order to achieve compression. There is another major part as well – the procedures which control
the order in which the image data from multiple components are processed to create the compressed data, and which
ensure that the proper set of table data is applied to the proper data units in the image. (A data unit is a sample for lossless
processes and an 8 × 8 block of samples for DCT-based processes.)
4.8.1 Interleaving multiple components
Figure 11 shows an example of how an encoding process selects between multiple source image components as well as
multiple sets of table data, when performing its encoding procedures. The source image in this example consists of the
three components A, B and C, and there are two sets of table specifications. (This simplified view does not distinguish
between the quantization tables and entropy coding tables.)
[Figure 11 – Component-interleave and table-switching control: the encoding process draws data from source image components A, B and C, switching between table specifications 1 and 2, to produce compressed image data]
In sequential mode, encoding is non-interleaved if the encoder compresses all image data units in component A before
beginning component B, and then in turn all of B before C. Encoding is interleaved if the encoder compresses a data unit
from A, a data unit from B, a data unit from C, then back to A, etc. These alternatives are illustrated in Figure 12, which
shows a case in which all three image components have identical dimensions: X columns by Y lines, for a total of n data
units each.
Figure 12 – Interleaved versus non-interleaved encoding order
(All three components have X columns by Y lines, n data units each.)
Data unit encoding order, non-interleaved – Scan 1: A1, A2, ..., An; Scan 2: B1, B2, ..., Bn; Scan 3: C1, C2, ..., Cn
Data unit encoding order, interleaved – Scan 1: A1, B1, C1, A2, B2, C2, ..., An, Bn, Cn
These control procedures are also able to handle cases in which the source image components have different dimensions.
Figure 13 shows a case in which two of the components, B and C, have half the number of horizontal samples relative to
component A. In this case, two data units from A are interleaved with one each from B and C. Cases in which components
of an image have more complex relationships, such as different horizontal and vertical dimensions, can be handled as
well. (See Annex A.)
Figure 13 – Interleaved order for components with different dimensions
(Component A has X columns; components B and C have X/2 columns each.)
Data unit encoding order, interleaved – Scan 1: A1, A2, B1, C1, A3, A4, B2, C2, ..., An-1, An, Bn/2, Cn/2
4.8.2 Minimum coded unit
Related to the concepts of multiple-component interleave is the minimum coded unit (MCU). If the compressed image data is non-interleaved, the MCU is defined to be one data unit. For example, in Figure 12 the MCU for the non-interleaved case is a single data unit. If the compressed data is interleaved, the MCU contains one or more data units from each component. For the interleaved case in Figure 12, the (first) MCU consists of the three interleaved data units A1, B1, C1. In the example of Figure 13, the (first) MCU consists of the four data units A1, A2, B1, C1.
4.9 Structure of compressed data
Figures 1, 2, and 3 all illustrate slightly different views of compressed image data. Figure 1 shows this data as the output
of an encoding process, Figure 2 shows it as the input to a decoding process, and Figure 3 shows compressed image data
in the interchange format, at the interface between applications.
Compressed image data are described by a uniform structure and set of parameters for both classes of encoding processes
(lossy or lossless), and for all modes of operation (sequential, progressive, lossless, and hierarchical). The various parts of
the compressed image data are identified by special two-byte codes called markers. Some markers are followed by
particular sequences of parameters, as in the case of table specifications, frame header, or scan header. Others are used
without parameters for functions such as marking the start-of-image and end-of-image. When a marker is associated with a
particular sequence of parameters, the marker and its parameters comprise a marker segment.
The data created by the entropy encoder are also segmented, and one particular marker – the restart marker – is used to isolate entropy-coded data segments. The encoder outputs the restart markers, intermixed with the entropy-coded data, at regular restart intervals of the source image data. Restart markers can be identified without having to decode the compressed data to find them. Because the entropy-coded segments they delimit can be decoded independently, restart markers support application-specific uses, such as parallel encoding or decoding, isolation of data corruptions, and semi-random access of entropy-coded segments.
There are three compressed data formats:
a) the interchange format;
b) the abbreviated format for compressed image data;
c) the abbreviated format for table-specification data.
4.9.1 Interchange format
In addition to certain required marker segments and the entropy-coded segments, the interchange format shall include the
marker segments for all quantization and entropy-coding table specifications needed by the decoding process. This
guarantees that a compressed image can cross the boundary between application environments, regardless of how each
environment internally associates tables with compressed image data.
4.9.2 Abbreviated format for compressed image data
The abbreviated format for compressed image data is identical to the interchange format, except that it does not include all
tables required for decoding. (It may include some of them.) This format is intended for use within applications where
alternative mechanisms are available for supplying some or all of the table-specification data needed for decoding.
4.9.3 Abbreviated format for table-specification data
This format contains only table-specification data. It is a means by which the application may install in the decoder the
tables required to subsequently reconstruct one or more images.
4.10 Image, frame, and scan
Compressed image data consists of only one image. An image contains only one frame in the cases of sequential and
progressive coding processes; an image contains multiple frames for the hierarchical mode.
A frame contains one or more scans. For sequential processes, a scan contains a complete encoding of one or more image
components. In Figures 12 and 13, the frame consists of three scans when non-interleaved, and one scan if all three
components are interleaved together. The frame could also consist of two scans: one with a non-interleaved component,
the other with two components interleaved.
For progressive processes, a scan contains a partial encoding of all data units from one or more image components.
Components shall not be interleaved in progressive mode, except for the DC coefficients in the first scan for each
component of a progressive frame.
4.11 Summary of coding processes
Table 1 provides a summary of the essential characteristics of the various coding processes specified in this Specification.
The full specification of these processes is contained in Annexes F, G, H, and J.
Table 1 – Summary: Essential characteristics of coding processes

Baseline process (required for all DCT-based decoders)
• DCT-based process
• Source image: 8-bit samples within each component
• Sequential
• Huffman coding: 2 AC and 2 DC tables
• Decoders shall process scans with 1, 2, 3, and 4 components
• Interleaved and non-interleaved scans

Extended DCT-based processes
• DCT-based process
• Source image: 8-bit or 12-bit samples
• Sequential or progressive
• Huffman or arithmetic coding: 4 AC and 4 DC tables
• Decoders shall process scans with 1, 2, 3, and 4 components
• Interleaved and non-interleaved scans

Lossless processes
• Predictive process (not DCT-based)
• Source image: P-bit samples (2 ≤ P ≤ 16)
• Sequential
• Huffman or arithmetic coding: 4 DC tables
• Decoders shall process scans with 1, 2, 3, and 4 components
• Interleaved and non-interleaved scans

Hierarchical processes
• Multiple frames (non-differential and differential)
• Uses extended DCT-based or lossless processes
• Decoders shall process scans with 1, 2, 3, and 4 components
• Interleaved and non-interleaved scans
5 Interchange format requirements
The interchange format is the coded representation of compressed image data for exchange between application
environments.
The interchange format requirements are that any compressed image data represented in interchange format shall comply
with the syntax and code assignments appropriate for the decoding process selected, as specified in Annex B.
Tests for whether compressed image data comply with these requirements are specified in Part 2 of this Specification.
6 Encoder requirements
An encoding process converts source image data to compressed image data. Each of Annexes F, G, H, and J specifies a
number of distinct encoding processes for its particular mode of operation.
An encoder is an embodiment of one (or more) of the encoding processes specified in Annexes F, G, H, or J. In order to
comply with this Specification, an encoder shall satisfy at least one of the following two requirements.
An encoder shall
a) with appropriate accuracy, convert source image data to compressed image data which comply with the interchange format syntax specified in Annex B for the encoding process(es) embodied by the encoder;
b) with appropriate accuracy, convert source image data to compressed image data which comply with the abbreviated format for compressed image data syntax specified in Annex B for the encoding process(es) embodied by the encoder.
For each of the encoding processes specified in Annexes F, G, H, and J, the compliance tests for the above requirements
are specified in Part 2 of this Specification.
NOTE – There is no requirement in this Specification that any encoder which embodies one of the encoding processes
specified in Annexes F, G, H, or J shall be able to operate for all ranges of the parameters which are allowed for that process. An
encoder is only required to meet the compliance tests specified in Part 2, and to generate the compressed data format according to
Annex B for those parameter values which it does use.
7 Decoder requirements
A decoding process converts compressed image data to reconstructed image data. Each of Annexes F, G, H, and J
specifies a number of distinct decoding processes for its particular mode of operation.
A decoder is an embodiment of one (or more) of the decoding processes specified in Annexes F, G, H, or J. In order to
comply with this Specification, a decoder shall satisfy all three of the following requirements.
A decoder shall
a) with appropriate accuracy, convert to reconstructed image data any compressed image data with parameters within the range supported by the application, and which comply with the interchange format syntax specified in Annex B for the decoding process(es) embodied by the decoder;
b) accept and properly store any table-specification data which comply with the abbreviated format for table-specification data syntax specified in Annex B for the decoding process(es) embodied by the decoder;
c) with appropriate accuracy, convert to reconstructed image data any compressed image data which comply with the abbreviated format for compressed image data syntax specified in Annex B for the decoding process(es) embodied by the decoder, provided that the table-specification data required for decoding the compressed image data has previously been installed into the decoder.
Additionally, any DCT-based decoder, if it embodies any DCT-based decoding process other than baseline sequential,
shall also embody the baseline sequential decoding process.
For each of the decoding processes specified in Annexes F, G, H, and J, the compliance tests for the above requirements
are specified in Part 2 of this Specification.
Annex A
Mathematical definitions
(This annex forms an integral part of this Recommendation | International Standard)
A.1 Source image
Source images to which the encoding processes specified in this Specification can be applied are defined in this annex.
A.1.1 Dimensions and sampling factors
As shown in Figure A.1, a source image is defined to consist of Nf components. Each component, with unique identifier
Ci, is defined to consist of a rectangular array of samples of xi columns by yi lines. The component dimensions are derived
from two parameters, X and Y, where X is the maximum of the xi values and Y is the maximum of the yi values for all
components in the frame. For each component, sampling factors Hi and Vi are defined relating component dimensions xi
and yi to maximum dimensions X and Y, according to the following expressions:
xi = ⌈X × Hi / Hmax⌉  and  yi = ⌈Y × Vi / Vmax⌉,

where Hmax and Vmax are the maximum sampling factors for all components in the frame, and ⌈ ⌉ is the ceiling function.
As an example, consider an image having 3 components with maximum dimensions of 512 lines and 512 samples per line,
and with the following sampling factors:
Component 0: H0 = 4, V0 = 1
Component 1: H1 = 2, V1 = 2
Component 2: H2 = 1, V2 = 1

Then X = 512, Y = 512, Hmax = 4, Vmax = 2, and xi and yi for each component are

Component 0: x0 = 512, y0 = 256
Component 1: x1 = 256, y1 = 512
Component 2: x2 = 128, y2 = 256
NOTE – The X, Y, Hi , and Vi parameters are contained in the frame header of the compressed image data (see B.2.2),
whereas the individual component dimensions xi and yi are derived by the decoder. Source images with xi and yi dimensions which do
not satisfy the expressions above cannot be properly reconstructed.
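As an informative illustration only (the function name below is ours, not part of this Specification), the ceiling expressions of this subclause translate directly into exact integer arithmetic:

    def component_dimensions(X, Y, sampling_factors):
        """Derive (xi, yi) for each component from X, Y and the (Hi, Vi)
        pairs, using the ceiling expressions of A.1.1."""
        h_max = max(h for h, v in sampling_factors)
        v_max = max(v for h, v in sampling_factors)
        # ceil(a / b) == (a + b - 1) // b for positive integers
        return [((X * h + h_max - 1) // h_max, (Y * v + v_max - 1) // v_max)
                for h, v in sampling_factors]

    # The worked example above: X = Y = 512, factors (4,1), (2,2), (1,1)
    assert component_dimensions(512, 512, [(4, 1), (2, 2), (1, 1)]) == \
           [(512, 256), (256, 512), (128, 256)]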
A.1.2 Sample precision
A sample is an integer with precision P bits, with any value in the range 0 through 2^P – 1. All samples of all components within an image shall have the same precision P. Restrictions on the value of P depend on the mode of operation, as specified in B.2 to B.7.
A.1.3 Data unit
A data unit is a sample in lossless processes and an 8 × 8 block of contiguous samples in DCT-based processes. The left-most 8 samples of each of the top-most 8 rows in the component shall always make up the top-left-most block. With this top-left-most block as the reference, the component is partitioned into contiguous data units to the right and to the bottom (as shown in Figure A.4).
A.1.4 Orientation
Figure A.1 indicates the orientation of an image component by the terms top, bottom, left, and right. The order by which
the data units of an image component are input to the compression encoding procedures is defined to be left-to-right and
top-to-bottom within the component. (This ordering is precisely defined in A.2.) Applications determine which edges of a
source image are defined as top, bottom, left, and right.
Figure A.1 – Source image characteristics
a) Source image with multiple components (C1, C2, ..., CNf-1, CNf)
b) Characteristics of an image component Ci: xi samples per line, yi lines, with edges labelled top, bottom, left, and right
A.2 Order of source image data encoding
The scan header (see B.2.3) specifies the order by which source image data units shall be encoded and placed within the
compressed image data. For a given scan, if the scan header parameter Ns = 1, then data from only one source component
– the component specified by parameter Cs1 – shall be present within the scan. This data is non-interleaved by definition.
If Ns > 1, then data from the Ns components Cs1 through CsNs shall be present within the scan. This data shall always be
interleaved. The order of components in a scan shall be according to the order specified in the frame header.
The ordering of data units and the construction of minimum coded units (MCU) is defined as follows.
A.2.1 Minimum coded unit (MCU)
For non-interleaved data the MCU is one data unit. For interleaved data the MCU is the sequence of data units defined by
the sampling factors of the components in the scan.
A.2.2 Non-interleaved order (Ns = 1)
When Ns = 1 (where Ns is the number of components in a scan), the order of data units within a scan shall be left-to-right
and top-to-bottom, as shown in Figure A.2. This ordering applies whenever Ns = 1, regardless of the values of
H1 and V1.
Figure A.2 – Non-interleaved data ordering (left-to-right, top-to-bottom within the component)
A.2.3 Interleaved order (Ns > 1)
When Ns > 1, each scan component Csi is partitioned into small rectangular arrays of Hk horizontal data units by Vk vertical data units. The subscripts k indicate that Hk and Vk are from the position in the frame header component-specification for which Ck = Csi. Within each Hk by Vk array, data units are ordered from left-to-right and top-to-bottom. The arrays in turn are ordered from left-to-right and top-to-bottom within each component.
As shown in the example of Figure A.3, Ns = 4, and MCU1 consists of data units taken first from the top-left-most region
of Cs1, followed by data units from the corresponding region of Cs2, then from Cs3 and then from Cs4. MCU2 follows the
same ordering for data taken from the next region to the right for the four components.
Figure A.3 – Interleaved data ordering example
Cs1: H1 = 2, V1 = 2;  Cs2: H2 = 2, V2 = 1;  Cs3: H3 = 1, V3 = 2;  Cs4: H4 = 1, V4 = 1
MCU1 = d¹₀₀ d¹₀₁ d¹₁₀ d¹₁₁ | d²₀₀ d²₀₁ | d³₀₀ d³₁₀ | d⁴₀₀
MCU2 = d¹₀₂ d¹₀₃ d¹₁₂ d¹₁₃ | d²₀₂ d²₀₃ | d³₀₁ d³₁₁ | d⁴₀₁
MCU3 = d¹₀₄ d¹₀₅ d¹₁₄ d¹₁₅ | d²₀₄ d²₀₅ | d³₀₂ d³₁₂ | d⁴₀₂
MCU4 = d¹₂₀ d¹₂₁ d¹₃₀ d¹₃₁ | d²₁₀ d²₁₁ | d³₂₀ d³₃₀ | d⁴₁₀
(dᵏᵢⱼ denotes the data unit of scan component Csk in data-unit row i, column j; each group lists the Cs1, Cs2, Cs3, and Cs4 data units in order.)
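The ordering rule of A.2.3 can be made concrete with a short informative sketch (the generator below is our own illustration, not part of this Specification). It enumerates data-unit coordinates MCU by MCU for an interleaved scan:

    def interleaved_order(factors, mcu_cols, mcu_rows):
        """Yield (component k, data-unit row, data-unit column) in MCU order.

        factors: list of (Hk, Vk) per scan component, in frame-header order.
        Within each MCU a component contributes an Hk-by-Vk array of data
        units, ordered left-to-right then top-to-bottom (A.2.3).
        """
        for mr in range(mcu_rows):
            for mc in range(mcu_cols):
                for k, (h, v) in enumerate(factors):
                    for dv in range(v):
                        for dh in range(h):
                            yield k, mr * v + dv, mc * h + dh

    # First MCU of the Figure A.3 example, (Hk, Vk) = (2,2), (2,1), (1,2), (1,1):
    order = list(interleaved_order([(2, 2), (2, 1), (1, 2), (1, 1)], 3, 2))
    # order[:9] lists Cs1 units d00 d01 d10 d11, then Cs2 d00 d01,
    # then Cs3 d00 d10, then Cs4 d00 - exactly MCU1 of Figure A.3.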
A.2.4 Completion of partial MCU
For DCT-based processes the data unit is a block. If xi is not a multiple of 8, the encoding process shall extend the number
of columns to complete the right-most sample blocks. If the component is to be interleaved, the encoding process shall also
extend the number of samples by one or more additional blocks, if necessary, so that the number of blocks is an integer
multiple of Hi. Similarly, if yi is not a multiple of 8, the encoding process shall extend the number of lines to complete the
bottom-most block-row. If the component is to be interleaved, the encoding process shall also extend the number of lines
by one or more additional block-rows, if necessary, so that the number of block-rows is an integer multiple of Vi.
NOTE – It is recommended that any incomplete MCUs be completed by replication of the right-most column and the bottom
line of each component.
For lossless processes the data unit is a sample. If the component is to be interleaved, the encoding process shall extend
the number of samples, if necessary, so that the number is a multiple of Hi. Similarly, the encoding process shall extend
the number of lines, if necessary, so that the number of lines is a multiple of Vi.
Any sample added by an encoding process to complete partial MCUs shall be removed by the decoding process.
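For DCT-based processes the completion rule above reduces to two round-ups, shown in this informative sketch (names are illustrative):

    def padded_block_counts(xi, yi, Hi, Vi, interleaved):
        """Blocks per row and per column after completing partial MCUs (A.2.4).

        The counts first cover xi columns and yi lines with 8x8 blocks; for an
        interleaved component they are then rounded up to multiples of Hi, Vi.
        """
        cols = (xi + 7) // 8                  # complete the right-most blocks
        rows = (yi + 7) // 8                  # complete the bottom-most block-row
        if interleaved:
            cols = -(-cols // Hi) * Hi        # round up to a multiple of Hi
            rows = -(-rows // Vi) * Vi        # round up to a multiple of Vi
        return cols, rows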
A.3 DCT compression

A.3.1 Level shift
Before a non-differential frame encoding process computes the FDCT for a block of source image samples, the samples shall be level shifted to a signed representation by subtracting 2^(P–1), where P is the precision parameter specified in B.2.2. Thus, when P = 8, the level shift is by 128; when P = 12, the level shift is by 2048.
After a non-differential frame decoding process computes the IDCT and produces a block of reconstructed image samples, an inverse level shift shall restore the samples to the unsigned representation by adding 2^(P–1) and clamping the results to the range 0 to 2^P – 1.
A.3.2 Orientation of samples for FDCT computation
Figure A.4 shows an image component which has been partitioned into 8 × 8 blocks for the FDCT computations. Figure
A.4 also defines the orientation of the samples within a block by showing the indices used in the FDCT equation of A.3.3.
The definitions of block partitioning and sample orientation also apply to any DCT decoding process and the output
reconstructed image. Any sample added by an encoding process to complete partial MCUs shall be removed by the
decoding process.
Figure A.4 – Partition and orientation of 8 × 8 sample blocks
(Samples within a block are indexed s00 at the top left through s77 at the bottom right.)
A.3.3 FDCT and IDCT (informative)
The following equations specify the ideal functional definition of the FDCT and the IDCT.
NOTE – These equations contain terms which cannot be represented with perfect accuracy by any real implementation. The
accuracy requirements for the combined FDCT and quantization procedures are specified in Part 2 of this Specification. The accuracy
requirements for the combined dequantization and IDCT procedures are also specified in Part 2 of this Specification.
FDCT:

Svu = (1/4) Cu Cv Σ(x=0..7) Σ(y=0..7) syx cos[(2x + 1)uπ/16] cos[(2y + 1)vπ/16]

IDCT:

syx = (1/4) Σ(u=0..7) Σ(v=0..7) Cu Cv Svu cos[(2x + 1)uπ/16] cos[(2y + 1)vπ/16]

where

Cu, Cv = 1/√2 for u, v = 0;
Cu, Cv = 1 otherwise.
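The ideal transforms can be written directly from the equations above. The following sketch is informative only (a real codec would use a fast DCT meeting the Part 2 accuracy tests) and assumes the samples have already been level shifted as in A.3.1:

    import math

    def _c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    def fdct_8x8(s):
        """Ideal FDCT of A.3.3; s[y][x] is an 8x8 block of level-shifted samples."""
        return [[0.25 * _c(u) * _c(v) * sum(
                    s[y][x]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for y in range(8) for x in range(8))
                 for u in range(8)] for v in range(8)]

    def idct_8x8(S):
        """Ideal IDCT of A.3.3; returns samples before the inverse level shift."""
        return [[0.25 * sum(
                    _c(u) * _c(v) * S[v][u]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for v in range(8) for u in range(8))
                 for x in range(8)] for y in range(8)]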
A.3.4 DCT coefficient quantization (informative) and dequantization (normative)
After the FDCT is computed for a block, each of the 64 resulting DCT coefficients is quantized by a uniform quantizer.
The quantizer step size for each coefficient Svu is the value of the corresponding element Qvu from the quantization table
specified by the frame parameter Tqi (see B.2.2).
The uniform quantizer is defined by the following equation. Rounding is to the nearest integer:
Sqvu = round(Svu / Qvu)
Sqvu is the quantized DCT coefficient, normalized by the quantizer step size.
NOTE – This equation contains a term which may not be represented with perfect accuracy by any real implementation. The
accuracy requirements for the combined FDCT and quantization procedures are specified in Part 2 of this Specification.
At the decoder, this normalization is removed by the following equation, which defines dequantization:
Rvu = Sqvu × Qvu
NOTE – Depending on the rounding used in quantization, it is possible that the dequantized coefficient may be outside the
expected range.
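In code the two operations are one division with rounding and one multiplication. This informative sketch uses Python's round (ties-to-even), which is one acceptable reading of "nearest integer" within the Part 2 accuracy bounds:

    def quantize(S, Q):
        """Sq_vu = round(S_vu / Q_vu)  (informative quantization, A.3.4)."""
        return [[round(S[v][u] / Q[v][u]) for u in range(8)] for v in range(8)]

    def dequantize(Sq, Q):
        """R_vu = Sq_vu x Q_vu  (normative dequantization, A.3.4)."""
        return [[Sq[v][u] * Q[v][u] for u in range(8)] for v in range(8)]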
The relationship among samples, DCT coefficients, and quantization is illustrated in Figure A.5.
A.3.5 Differential DC encoding
After quantization, and in preparation for entropy encoding, the quantized DC coefficient Sq00 is treated separately from
the 63 quantized AC coefficients. The value that shall be encoded is the difference (DIFF) between the quantized DC
coefficient of the current block (DCi which is also designated as Sq00) and that of the previous block of the same
component (PRED):
DIFF = DCi − PRED
A.3.6 Zig-zag sequence

After quantization, and in preparation for entropy encoding, the quantized AC coefficients are converted to the zig-zag sequence. The quantized DC coefficient (coefficient zero in the array) is treated separately, as defined in A.3.5. The zig-zag sequence is specified in Figure A.6.
A.4 Point transform
For various procedures data may be optionally divided by a power of 2 by a point transform prior to coding. There are three processes which require a point transform: lossless coding, lossless differential frame coding in the hierarchical mode, and successive approximation coding in the progressive DCT mode.

In the lossless mode of operation the point transform is applied to the input samples. In the difference coding of the hierarchical mode of operation the point transform is applied to the difference between the input component samples and the reference component samples. In both cases the point transform is an integer divide by 2^Pt, where Pt is the value of the point transform parameter (see B.2.3).

In successive approximation coding the point transform for the AC coefficients is an integer divide by 2^Al, where Al is the successive approximation bit position, low (see B.2.3). The point transform for the DC coefficients is an arithmetic-shift-right by Al bits. This is equivalent to dividing by 2^Pt before the level shift (see A.3.1).

The output of the decoder is rescaled by multiplying by 2^Pt. An example of the point transform is given in K.10.
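In integer terms the shift form of the point transform and its inverse are one-liners; an informative sketch (function names are ours). Note that for negative values an integer divide and an arithmetic shift differ, and the shift is specified only for the DC coefficients of successive approximation:

    def point_transform_shift(value, Pt):
        """Arithmetic-shift-right by Pt bits, as specified for DC coefficients
        in successive approximation (A.4). Python's >> is arithmetic."""
        return value >> Pt

    def point_transform_rescale(value, Pt):
        """Decoder output rescaling: multiply by 2**Pt."""
        return value << Pt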
Figure A.5 – Relationship between 8 × 8-block samples and DCT coefficients
(Source image samples s00 ... s77, after level shift, are transformed by the FDCT into DCT coefficients S00 ... S77. These are quantized with quantization table elements Q00 ... Q77, giving Sqvu = round(Svu/Qvu), the quantized DCT coefficients, which are transmitted. The received quantized DCT coefficients are dequantized, Rvu = Sqvu × Qvu, and the IDCT produces reconstructed image samples r00 ... r77 before the inverse level shift.)
 0  1  5  6 14 15 27 28
 2  4  7 13 16 26 29 42
 3  8 12 17 25 30 41 43
 9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63

Figure A.6 – Zig-zag sequence of quantized DCT coefficients
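Figure A.6 can be generated rather than stored. This informative sketch (names are ours) builds the mapping from zig-zag index k to array position (v, u) by walking the anti-diagonals:

    def zigzag_positions():
        """Return the 64 (v, u) positions in zig-zag order (Figure A.6)."""
        order = []
        for d in range(15):                       # anti-diagonals with v + u = d
            us = range(d + 1) if d < 8 else range(d - 7, 8)
            diag = [(d - u, u) for u in us]       # bottom-left to top-right
            # odd diagonals run top-right to bottom-left, even ones the reverse
            order.extend(reversed(diag) if d % 2 else diag)
        return order

    assert zigzag_positions()[:6] == [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
    assert zigzag_positions()[63] == (7, 7)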
A.5 Arithmetic procedures in lossless and hierarchical modes of operation
In the lossless mode of operation predictions are calculated with full precision and without clamping of either overflow or
underflow beyond the range of values allowed by the precision of the input. However, the division by two which is part of
some of the prediction calculations shall be approximated by an arithmetic-shift-right by one bit.
The two’s complement differences which are coded in either the lossless mode of operation or the differential frame coding in the hierarchical mode of operation are calculated modulo 65 536, thereby restricting the precision of these differences to a maximum of 16 bits. The modulo values are calculated by performing the logical AND operation of the two’s complement difference with X’FFFF’. For purposes of coding, the result is still interpreted as a 16-bit two’s complement difference. Modulo 65 536 arithmetic is also used in the decoder in calculating the output from the sum of the prediction and this two’s complement difference.
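These modulo rules map directly onto a 16-bit mask; an informative sketch (names are ours):

    def coded_difference(value, prediction):
        """Two's complement difference modulo 65 536 (A.5): the logical AND
        of the difference with X'FFFF', still read as 16-bit two's complement."""
        return (value - prediction) & 0xFFFF

    def reconstruct(prediction, diff):
        """Decoder output: (prediction + difference) modulo 65 536."""
        return (prediction + diff) & 0xFFFF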
Annex B
Compressed data formats
(This annex forms an integral part of this Recommendation | International Standard)
This annex specifies three compressed data formats:
a) the interchange format, specified in B.2 and B.3;
b) the abbreviated format for compressed image data, specified in B.4;
c) the abbreviated format for table-specification data, specified in B.5.
B.1 describes the constituent parts of these formats. B.1.3 and B.1.4 give the conventions for symbols and figures used in
the format specifications.
B.1 General aspects of the compressed data format specifications
Structurally, the compressed data formats consist of an ordered collection of parameters, markers, and entropy-coded data
segments. Parameters and markers in turn are often organized into marker segments. Because all of these constituent parts
are represented with byte-aligned codes, each compressed data format consists of an ordered sequence of 8-bit bytes. For
each byte, a most significant bit (MSB) and a least significant bit (LSB) are defined.
B.1.1 Constituent parts
This subclause gives a general description of each of the constituent parts of the compressed data format.
B.1.1.1 Parameters
Parameters are integers, with values specific to the encoding process, source image characteristics, and other features
selectable by the application. Parameters are assigned either 4-bit, 1-byte, or 2-byte codes. Except for certain optional
groups of parameters, parameters encode critical information without which the decoding process cannot properly
reconstruct the image.
The code assignment for a parameter shall be an unsigned integer of the specified length in bits with the particular value
of the parameter.
For parameters which are 2 bytes (16 bits) in length, the most significant byte shall come first in the compressed data’s
ordered sequence of bytes. Parameters which are 4 bits in length always come in pairs, and the pair shall always be
encoded in a single byte. The first 4-bit parameter of the pair shall occupy the most significant 4 bits of the byte. Within
any 16-, 8-, or 4-bit parameter, the MSB shall come first and LSB shall come last.
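The byte-ordering rules above are easy to get wrong in an implementation. This informative sketch encodes the three parameter sizes (helper names are ours):

    def encode_u16(value):
        """2-byte parameter: most significant byte first (B.1.1.1)."""
        return bytes([(value >> 8) & 0xFF, value & 0xFF])

    def encode_u8(value):
        """1-byte parameter."""
        return bytes([value & 0xFF])

    def encode_nibble_pair(first, second):
        """Two 4-bit parameters share one byte; the first parameter of the
        pair occupies the most significant 4 bits (B.1.1.1)."""
        return bytes([((first & 0xF) << 4) | (second & 0xF)])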
B.1.1.2 Markers
Markers serve to identify the various structural parts of the compressed data formats. Most markers start marker segments
containing a related group of parameters; some markers stand alone. All markers are assigned two-byte codes: an X’FF’
byte followed by a byte which is not equal to 0 or X’FF’ (see Table B.1). Any marker may optionally be preceded by any
number of fill bytes, which are bytes assigned code X’FF’.
NOTE – Because of this special code-assignment structure, markers make it possible for a decoder to parse the compressed
data and locate its various parts without having to decode other segments of image data.
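The NOTE above is the basis of a common implementation technique: a raw scan for marker codes. The following informative sketch (names are ours) finds markers in a byte stream; X'FF00' is a stuffed data byte (see B.1.1.5) and repeated X'FF' bytes are fill bytes, so neither is reported:

    def find_markers(data):
        """Yield (offset, second_byte) for each marker in the stream (B.1.1.2)."""
        i = 0
        while i + 1 < len(data):
            if data[i] == 0xFF and data[i + 1] not in (0x00, 0xFF):
                yield i, data[i + 1]
                i += 2
            else:
                i += 1

    # Example: an SOI marker (X'FFD8') followed by an EOI marker (X'FFD9')
    assert list(find_markers(bytes([0xFF, 0xD8, 0xFF, 0xD9]))) == [(0, 0xD8), (2, 0xD9)]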
B.1.1.3 Marker assignments
All markers shall be assigned two-byte codes: an X’FF’ byte followed by a second byte which is not equal to 0 or X’FF’. The second byte is specified in Table B.1 for each defined marker. An asterisk (*) indicates a marker which stands alone, that is, which is not the start of a marker segment.
Table B.1 – Marker code assignments

Code Assignment              Symbol   Description

Start Of Frame markers, non-differential, Huffman coding
X’FFC0’                      SOF0     Baseline DCT
X’FFC1’                      SOF1     Extended sequential DCT
X’FFC2’                      SOF2     Progressive DCT
X’FFC3’                      SOF3     Lossless (sequential)

Start Of Frame markers, differential, Huffman coding
X’FFC5’                      SOF5     Differential sequential DCT
X’FFC6’                      SOF6     Differential progressive DCT
X’FFC7’                      SOF7     Differential lossless (sequential)

Start Of Frame markers, non-differential, arithmetic coding
X’FFC8’                      JPG      Reserved for JPEG extensions
X’FFC9’                      SOF9     Extended sequential DCT
X’FFCA’                      SOF10    Progressive DCT
X’FFCB’                      SOF11    Lossless (sequential)

Start Of Frame markers, differential, arithmetic coding
X’FFCD’                      SOF13    Differential sequential DCT
X’FFCE’                      SOF14    Differential progressive DCT
X’FFCF’                      SOF15    Differential lossless (sequential)

Huffman table specification
X’FFC4’                      DHT      Define Huffman table(s)

Arithmetic coding conditioning specification
X’FFCC’                      DAC      Define arithmetic coding conditioning(s)

Restart interval termination
X’FFD0’ through X’FFD7’      RSTm*    Restart with modulo 8 count “m”

Other markers
X’FFD8’                      SOI*     Start of image
X’FFD9’                      EOI*     End of image
X’FFDA’                      SOS      Start of scan
X’FFDB’                      DQT      Define quantization table(s)
X’FFDC’                      DNL      Define number of lines
X’FFDD’                      DRI      Define restart interval
X’FFDE’                      DHP      Define hierarchical progression
X’FFDF’                      EXP      Expand reference component(s)
X’FFE0’ through X’FFEF’      APPn     Reserved for application segments
X’FFF0’ through X’FFFD’      JPGn     Reserved for JPEG extensions
X’FFFE’                      COM      Comment

Reserved markers
X’FF01’                      TEM*     For temporary private use in arithmetic coding
X’FF02’ through X’FFBF’      RES      Reserved
B.1.1.4 Marker segments
A marker segment consists of a marker followed by a sequence of related parameters. The first parameter in a marker
segment is the two-byte length parameter. This length parameter encodes the number of bytes in the marker segment,
including the length parameter and excluding the two-byte marker. The marker segments identified by the SOF and SOS
marker codes are referred to as headers: the frame header and the scan header respectively.
B.1.1.5 Entropy-coded data segments
An entropy-coded data segment contains the output of an entropy-coding procedure. It consists of an integer number of
bytes, whether the entropy-coding procedure used is Huffman or arithmetic.
NOTES
1  Making entropy-coded segments an integer number of bytes is performed as follows: for Huffman coding, 1-bits are used, if necessary, to pad the end of the compressed data to complete the final byte of a segment. For arithmetic coding, byte alignment is performed in the procedure which terminates the entropy-coded segment (see D.1.8).
2  In order to ensure that a marker does not occur within an entropy-coded segment, any X’FF’ byte generated by either a Huffman or arithmetic encoder, or an X’FF’ byte that was generated by the padding of 1-bits described in NOTE 1 above, is followed by a “stuffed” zero byte (see D.1.6 and F.1.2.3).
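NOTE 2 corresponds to a simple pair of transformations, sketched here informally (names are ours):

    def stuff(entropy_bytes):
        """Append a zero byte after every X'FF' the entropy coder emits (B.1.1.5)."""
        out = bytearray()
        for b in entropy_bytes:
            out.append(b)
            if b == 0xFF:
                out.append(0x00)
        return bytes(out)

    def unstuff(segment):
        """Inverse operation used when decoding an entropy-coded segment."""
        out = bytearray()
        i = 0
        while i < len(segment):
            out.append(segment[i])
            if segment[i] == 0xFF and i + 1 < len(segment) and segment[i + 1] == 0x00:
                i += 1                        # skip the stuffed zero byte
            i += 1
        return bytes(out)

    assert unstuff(stuff(b"\xff\x12\xff")) == b"\xff\x12\xff"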
B.1.2 Syntax
In B.2 and B.3 the interchange format syntax is specified. For the purposes of this Specification, the syntax specification
consists of:
– the required ordering of markers, parameters, and entropy-coded segments;
– identification of optional or conditional constituent parts;
– the name, symbol, and definition of each marker and parameter;
– the allowed values of each parameter;
– any restrictions on the above which are specific to the various coding processes.
The ordering of constituent parts and the identification of which are optional or conditional is specified by the syntax
figures in B.2 and B.3. Names, symbols, definitions, allowed values, conditions, and restrictions are specified immediately
below each syntax figure.
B.1.3 Conventions for syntax figures
The syntax figures in B.2 and B.3 are a part of the interchange format specification. The following conventions, illustrated
in Figure B.1, apply to these figures:
– parameter/marker indicator: A thin-lined box encloses either a marker or a single parameter;
– segment indicator: A thick-lined box encloses either a marker segment, an entropy-coded data segment, or combinations of these;
– parameter length indicator: The width of a thin-lined box is proportional to the parameter length (4, 8, or 16 bits, shown as E, B, and D respectively in Figure B.1) of the marker or parameter it encloses; the width of thick-lined boxes is not meaningful;
– optional/conditional indicator: Square brackets indicate that a marker or marker segment is only optionally or conditionally present in the compressed image data;
– ordering: In the interchange format a parameter or marker shown in a figure precedes all of those shown to its right, and follows all of those shown to its left;
– entropy-coded data indicator: Angled brackets indicate that the entity enclosed has been entropy encoded.
Figure B.1 – Syntax notation conventions
B.1.4 Conventions for symbols, code lengths, and values
Following each syntax figure in B.2 and B.3, the symbol, name, and definition for each marker and parameter shown in
the figure are specified. For each parameter, the length and allowed values are also specified in tabular form.
The following conventions apply to symbols for markers and parameters:
– all marker symbols have three upper-case letters, and some also have a subscript. Examples: SOI, SOFn;
– all parameter symbols have one upper-case letter; some also have one lower-case letter and some have subscripts. Examples: Y, Nf, Hi, Tqi.

B.2 General sequential and progressive syntax

This clause specifies the interchange format syntax which applies to all coding processes for sequential DCT-based, progressive DCT-based, and lossless modes of operation.
B.2.1 High-level syntax
Figure B.2 specifies the order of the high-level constituent parts of the interchange format for all non-hierarchical
encoding processes specified in this Specification.
Figure B.2 – Syntax for sequential DCT-based, progressive DCT-based, and lossless modes of operation
Compressed image data: SOI, frame, EOI
Frame: [tables/misc.], frame header, scan 1, [DNL segment], [scan 2], ..., [scan last]
Scan: [tables/misc.], scan header, ECS0, [RST0, ECS1, RST1, ..., RSTlast-1], ECSlast
Entropy-coded segment ECSi: <MCU>, <MCU>, ..., Ri MCUs per segment (the last segment holds the remaining MCUs)
The three markers shown in Figure B.2 are defined as follows:
SOI: Start of image marker – Marks the start of a compressed image represented in the interchange format or
abbreviated format.
EOI: End of image marker – Marks the end of a compressed image represented in the interchange format or
abbreviated format.
RSTm: Restart marker – A conditional marker which is placed between entropy-coded segments only if restart
is enabled. There are 8 unique restart markers (m = 0 - 7) which repeat in sequence from 0 to 7, starting with
zero for each scan, to provide a modulo 8 restart interval count.
The top level of Figure B.2 specifies that the non-hierarchical interchange format shall begin with an SOI marker, shall
contain one frame, and shall end with an EOI marker.
The second level of Figure B.2 specifies that a frame shall begin with a frame header and shall contain one or more scans.
A frame header may be preceded by one or more table-specification or miscellaneous marker segments as specified in
B.2.4. If a DNL segment (see B.2.5) is present, it shall immediately follow the first scan.
For sequential DCT-based and lossless processes each scan shall contain from one to four image components. If two to
four components are contained within a scan, they shall be interleaved within the scan. For progressive DCT-based
processes each image component is only partially contained within any one scan. Only the first scan(s) for the components
(which contain only DC coefficient data) may be interleaved.
The third level of Figure B.2 specifies that a scan shall begin with a scan header and shall contain one or more entropy-coded data segments. Each scan header may be preceded by one or more table-specification or miscellaneous marker segments. If restart is not enabled, there shall be only one entropy-coded segment (the one labeled “last”), and no restart markers shall be present. If restart is enabled, the number of entropy-coded segments is defined by the size of the image and the defined restart interval. In this case, a restart marker shall follow each entropy-coded segment except the last one.

The fourth level of Figure B.2 specifies that each entropy-coded segment is comprised of a sequence of entropy-coded MCUs. If restart is enabled and the restart interval is defined to be Ri, each entropy-coded segment except the last one shall contain Ri MCUs. The last one shall contain whatever number of MCUs completes the scan.
Figure B.2 specifies the locations where table-specification segments may be present. However, this Specification hereby
specifies that the interchange format shall contain all table-specification data necessary for decoding the compressed
image. Consequently, the required table-specification data shall be present at one or more of the allowed locations.
B.2.2 Frame header syntax
Figure B.3 specifies the frame header which shall be present at the start of a frame. This header specifies the source image
characteristics (see A.1), the components in the frame, and the sampling factors for each component, and specifies the
destinations from which the quantized tables to be used with each component are retrieved.
Figure B.3 – Frame header syntax
SOFn, Lf, P, Y, X, Nf, followed by Nf sets of frame component-specification parameters: (C1, H1, V1, Tq1), (C2, H2, V2, Tq2), ..., (CNf, HNf, VNf, TqNf)
The markers and parameters shown in Figure B.3 are defined below. The size and allowed values of each parameter are
given in Table B.2. In Table B.2 (and similar tables which follow), value choices are separated by commas (e.g. 8, 12) and
inclusive bounds are separated by dashes (e.g. 0 - 3).
SOFn: Start of frame marker – Marks the beginning of the frame parameters. The subscript n identifies whether
the encoding process is baseline sequential, extended sequential, progressive, or lossless, as well as which
entropy encoding procedure is used.
SOF0: Baseline DCT
SOF1: Extended sequential DCT, Huffman coding
SOF2: Progressive DCT, Huffman coding
SOF3: Lossless (sequential), Huffman coding
SOF9: Extended sequential DCT, arithmetic coding
SOF10: Progressive DCT, arithmetic coding
SOF11: Lossless (sequential), arithmetic coding
Lf: Frame header length – Specifies the length of the frame header shown in Figure B.3 (see B.1.1.4).
P: Sample precision – Specifies the precision in bits for the samples of the components in the frame.
Y: Number of lines – Specifies the maximum number of lines in the source image. This shall be equal to the
number of lines in the component with the maximum number of vertical samples (see A.1.1). Value 0 indicates
that the number of lines shall be defined by the DNL marker and parameters at the end of the first scan (see
B.2.5).
X: Number of samples per line – Specifies the maximum number of samples per line in the source image. This
shall be equal to the number of samples per line in the component with the maximum number of horizontal
samples (see A.1.1).
Nf: Number of image components in frame – Specifies the number of source image components in the frame.
The value of Nf shall be equal to the number of sets of frame component specification parameters (Ci, Hi, Vi,
and Tqi) present in the frame header.
Ci: Component identifier – Assigns a unique label to the ith component in the sequence of frame component
specification parameters. These values shall be used in the scan headers to identify the components in the scan.
The value of Ci shall be different from the values of C1 through Ci − 1.
Hi: Horizontal sampling factor – Specifies the relationship between the component horizontal dimension
and maximum image dimension X (see A.1.1); also specifies the number of horizontal data units of component
Ci in each MCU, when more than one component is encoded in a scan.
Vi: Vertical sampling factor – Specifies the relationship between the component vertical dimension and
maximum image dimension Y (see A.1.1); also specifies the number of vertical data units of component Ci in
each MCU, when more than one component is encoded in a scan.
Tqi: Quantization table destination selector – Specifies one of four possible quantization table destinations from which the quantization table to use for dequantization of DCT coefficients of component Ci is retrieved. If the decoding process uses the dequantization procedure, this table shall have been installed in this destination by the time the decoder is ready to decode the scan(s) containing component Ci. The destination shall not be re-specified, or its contents changed, until all scans containing Ci have been completed.
Table B.2 – Frame header parameter sizes and values

Parameter   Size (bits)   Baseline seq. DCT   Extended seq. DCT   Progressive DCT   Lossless
Lf          16            8 + 3 × Nf          8 + 3 × Nf          8 + 3 × Nf        8 + 3 × Nf
P           8             8                   8, 12               8, 12             2-16
Y           16            0-65 535            0-65 535            0-65 535          0-65 535
X           16            1-65 535            1-65 535            1-65 535          1-65 535
Nf          8             1-255               1-255               1-4               1-255
Ci          8             0-255               0-255               0-255             0-255
Hi          4             1-4                 1-4                 1-4               1-4
Vi          4             1-4                 1-4                 1-4               1-4
Tqi         8             0-3                 0-3                 0-3               0
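Reading Figure B.3 and Table B.2 together gives a straightforward parser. This informative sketch (names are ours) starts at the byte following the SOFn marker:

    import struct

    def parse_frame_header(buf, offset):
        """Parse Lf, P, Y, X, Nf and the Nf component specifications (Figure B.3).

        Returns (P, Y, X, components); components is a list of (Ci, Hi, Vi, Tqi).
        """
        Lf, P, Y, X, Nf = struct.unpack_from(">HBHHB", buf, offset)
        if Lf != 8 + 3 * Nf:
            raise ValueError("frame header length inconsistent with Nf")
        components = []
        pos = offset + 8
        for _ in range(Nf):
            Ci, HV, Tqi = struct.unpack_from(">BBB", buf, pos)
            components.append((Ci, HV >> 4, HV & 0x0F, Tqi))  # Hi in high nibble
            pos += 3
        return P, Y, X, components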
B.2.3 Scan header syntax
Figure B.4 specifies the scan header which shall be present at the start of a scan. This header specifies which
component(s) are contained in the scan, specifies the destinations from which the entropy tables to be used with each
component are retrieved, and (for the progressive DCT) which part of the DCT quantized coefficient data is contained in
the scan. For lossless processes the scan parameters specify the predictor and the point transform.
NOTE – If there is only one image component present in a scan, that component is, by definition, non-interleaved. If there is
more than one image component present in a scan, the components present are, by definition, interleaved.
Figure B.4 – Scan header syntax
SOS, Ls, Ns, followed by Ns sets of scan component-specification parameters (Cs1, Td1, Ta1), (Cs2, Td2, Ta2), ..., (CsNs, TdNs, TaNs), then Ss, Se, Ah, Al
The marker and parameters shown in Figure B.4 are defined below. The size and allowed values of each parameter are
given in Table B.3.
SOS: Start of scan marker – Marks the beginning of the scan parameters.
Ls: Scan header length – Specifies the length of the scan header shown in Figure B.4 (see B.1.1.4).
Ns: Number of image components in scan – Specifies the number of source image components in the scan. The
value of Ns shall be equal to the number of sets of scan component specification parameters (Csj, Tdj, and Taj)
present in the scan header.
Csj: Scan component selector – Selects which of the Nf image components specified in the frame parameters
shall be the jth component in the scan. Each Csj shall match one of the Ci values specified in the frame header,
and the ordering in the scan header shall follow the ordering in the frame header. If Ns > 1, the order of
interleaved components in the MCU is Cs1 first, Cs2 second, etc. If Ns > 1, the following restriction shall be
placed on the image components contained in the scan:
Σ(j=1..Ns) Hj × Vj ≤ 10,
where Hj and Vj are the horizontal and vertical sampling factors for scan component j. These sampling factors
are specified in the frame header for component i, where i is the frame component specification index for which
frame component identifier Ci matches scan component selector Csj.
As an example, consider an image having 3 components with maximum dimensions of 512 lines and
512 samples per line, and with the following sampling factors:
Component 0: H0 = 4, V0 = 1
Component 1: H1 = 1, V1 = 2
Component 2: H2 = 2, V2 = 2

Then the summation of Hj × Vj is (4 × 1) + (1 × 2) + (2 × 2) = 10.
The value of Csj shall be different from the values of Cs1 to Csj – 1.
Tdj: DC entropy coding table destination selector – Specifies one of four possible DC entropy coding table
destinations from which the entropy table needed for decoding of the DC coefficients of component Csj is
retrieved. The DC entropy table shall have been installed in this destination (see B.2.4.2 and B.2.4.3) by the
time the decoder is ready to decode the current scan. This parameter specifies the entropy coding table
destination for the lossless processes.
Taj: AC entropy coding table destination selector – Specifies one of four possible AC entropy coding table
destinations from which the entropy table needed for decoding of the AC coefficients of component Csj is
retrieved. The AC entropy table selected shall have been installed in this destination (see B.2.4.2 and B.2.4.3)
by the time the decoder is ready to decode the current scan. This parameter is zero for the lossless processes.
Ss: Start of spectral or predictor selection – In the DCT modes of operation, this parameter specifies the first DCT coefficient in each block in zig-zag order which shall be coded in the scan. This parameter shall be set to zero for the sequential DCT processes. In the lossless mode of operation this parameter is used to select the predictor.

Se: End of spectral selection – Specifies the last DCT coefficient in each block in zig-zag order which shall be coded in the scan. This parameter shall be set to 63 for the sequential DCT processes. In the lossless mode of operation this parameter has no meaning. It shall be set to zero.

Ah: Successive approximation bit position high – This parameter specifies the point transform used in the preceding scan (i.e. successive approximation bit position low in the preceding scan) for the band of coefficients specified by Ss and Se. This parameter shall be set to zero for the first scan of each band of coefficients. In the lossless mode of operation this parameter has no meaning. It shall be set to zero.

Al: Successive approximation bit position low or point transform – In the DCT modes of operation this parameter specifies the point transform, i.e. bit position low, used before coding the band of coefficients specified by Ss and Se. This parameter shall be set to zero for the sequential DCT processes. In the lossless mode of operation, this parameter specifies the point transform, Pt.
The entropy coding table destination selectors, Tdj and Taj, specify either Huffman tables (in frames using Huffman coding) or arithmetic coding tables (in frames using arithmetic coding). In the latter case the entropy coding table destination selector specifies both an arithmetic coding conditioning table destination and an associated statistics area.
Table B.3 – Scan header parameter sizes and values

Parameter   Size (bits)   Baseline seq. DCT   Extended seq. DCT   Progressive DCT   Lossless
Ls          16            6 + 2 × Ns          6 + 2 × Ns          6 + 2 × Ns        6 + 2 × Ns
Ns          8             1-4                 1-4                 1-4               1-4
Csj         8             0-255 a)            0-255 a)            0-255 a)          0-255 a)
Tdj         4             0-1                 0-3                 0-3               0-3
Taj         4             0-1                 0-3                 0-3               0
Ss          8             0                   0                   0-63              1-7 b)
Se          8             63                  63                  Ss-63 c)          0
Ah          4             0                   0                   0-13              0
Al          4             0                   0                   0-13              0-15

a) Csj shall be a member of the set of Ci specified in the frame header.
b) 0 for lossless differential frames in the hierarchical mode (see B.3).
c) 0 if Ss equals zero.
B.2.4 Table-specification and miscellaneous marker segment syntax
Figure B.5 specifies that, at the places indicated in Figure B.2, any of the table-specification segments or miscellaneous
marker segments specified in B.2.4.1 through B.2.4.6 may be present in any order and with no limit on the number of
segments.
If any table specification for a particular destination occurs in the compressed image data, it shall replace any previous
table specified for this destination, and shall be used whenever this destination is specified in the remaining scans in the
frame or subsequent images represented in the abbreviated format for compressed image data. If a table specification for a
given destination occurs more than once in the compressed image data, each specification shall replace the previous
specification. The quantization table specification shall not be altered between progressive DCT scans of a given
component.
Figure B.5 – Tables/miscellaneous marker segment syntax
[Marker segment 1], [Marker segment 2], ..., [Marker segment last], where each marker segment is a quantization table-specification, a Huffman table-specification, an arithmetic conditioning table-specification, a restart interval definition, a comment, or application data
B.2.4.1 Quantization table-specification syntax
Figure B.6 specifies the marker segment which defines one or more quantization tables.
Figure B.6 – Quantization table syntax
DQT, Lq, then for each table t = 1, ..., n: Pq, Tq, Q0, Q1, ..., Q63
The marker and parameters shown in Figure B.6 are defined below. The size and allowed values of each parameter are
given in Table B.4.
DQT: Define quantization table marker – Marks the beginning of quantization table-specification parameters.
Lq: Quantization table definition length – Specifies the length of all quantization table parameters shown in
Figure B.6 (see B.1.1.4).
Pq: Quantization table element precision – Specifies the precision of the Qk values. Value 0 indicates 8-bit Qk values; value 1 indicates 16-bit Qk values. Pq shall be zero for 8-bit sample precision P (see B.2.2).

Tq: Quantization table destination identifier – Specifies one of four possible destinations at the decoder into which the quantization table shall be installed.

Qk: Quantization table element – Specifies the kth element out of 64 elements, where k is the index in the zig-zag ordering of the DCT coefficients. The quantization elements shall be specified in zig-zag scan order.
Table B.4 – Quantization table-specification parameter sizes and values

Parameter   Size (bits)   Baseline seq. DCT   Extended seq. DCT   Progressive DCT   Lossless
Lq          16            2 + Σ(t=1..n) (65 + 64 × Pq(t)) for all DCT modes         Undefined
Pq          4             0                   0, 1                0, 1              Undefined
Tq          4             0-3                 0-3                 0-3               Undefined
Qk          8 or 16       1-255 (Pq = 0), 1-65 535 (Pq = 1) for all DCT modes       Undefined
The value n in Table B.4 is the number of quantization tables specified in the DQT marker segment.
Once a quantization table has been defined for a particular destination, it replaces the previous tables stored in that
destination and shall be used, when referenced, in the remaining scans of the current image and in subsequent images
represented in the abbreviated format for compressed image data. If a table has never been defined for a particular
destination, then when this destination is specified in a frame header, the results are unpredictable.
An 8-bit DCT-based process shall not use a 16-bit precision quantization table.
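A DQT segment may carry several tables back to back, each preceded by its Pq/Tq byte. This informative sketch (names are ours) walks a segment body starting at Lq:

    def parse_dqt(buf, offset):
        """Yield (Tq, elements) for each table in a DQT segment (Figure B.6).

        elements are the 64 Qk values in zig-zag order; Pq selects 8- or
        16-bit elements (Table B.4).
        """
        Lq = (buf[offset] << 8) | buf[offset + 1]
        pos, end = offset + 2, offset + Lq
        while pos < end:
            Pq, Tq = buf[pos] >> 4, buf[pos] & 0x0F
            pos += 1
            if Pq == 0:                                   # 8-bit Qk values
                elements = list(buf[pos:pos + 64])
                pos += 64
            else:                                         # Pq == 1: 16-bit Qk values
                elements = [(buf[pos + 2 * k] << 8) | buf[pos + 2 * k + 1]
                            for k in range(64)]
                pos += 128
            yield Tq, elements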
B.2.4.2 Huffman table-specification syntax
Figure B.7 specifies the marker segment which defines one or more Huffman table specifications.
Figure B.7 – Huffman table syntax
DHT, Lh, then for each table t = 1, ..., n: Tc, Th, L1, L2, ..., L16, followed by the symbol-length assignment parameters V1,1 ... V1,L1, V2,1 ... V2,L2, ..., V16,1 ... V16,L16
The marker and parameters shown in Figure B.7 are defined below. The size and allowed values of each parameter are
given in Table B.5.
DHT: Define Huffman table marker – Marks the beginning of Huffman table definition parameters.
Lh: Huffman table definition length – Specifies the length of all Huffman table parameters shown in Figure B.7
(see B.1.1.4).
Tc: Table class – 0 = DC table or lossless table, 1 = AC table.
Th: Huffman table destination identifier – Specifies one of four possible destinations at the decoder into which
the Huffman table shall be installed.
Li: Number of Huffman codes of length i – Specifies the number of Huffman codes for each of the 16 possible
lengths allowed by this Specification. Li’s are the elements of the list BITS.
Vi,j: Value associated with each Huffman code – Specifies, for each i, the value associated with each Huffman
code of length i. The meaning of each value is determined by the Huffman coding model. The Vi,j’s are the
elements of the list HUFFVAL.
Table B.5 – Huffman table specification parameter sizes and values

Parameter   Size (bits)   Baseline seq. DCT   Extended seq. DCT   Progressive DCT   Lossless
Lh          16            2 + Σ(t=1..n) (17 + mt) in all modes
Tc          4             0, 1                0, 1                0, 1              0
Th          4             0-1                 0-3                 0-3               0-3
Li          8             0-255 in all modes
Vi,j        8             0-255 in all modes
The value n in Table B.5 is the number of Huffman tables specified in the DHT marker segment. The value mt is the number of parameters which follow the 16 Li(t) parameters for Huffman table t, and is given by:

mt = Σ(i=1..16) Li

In general, mt is different for each table.
Once a Huffman table has been defined for a particular destination, it replaces the previous tables stored in that
destination and shall be used when referenced, in the remaining scans of the current image and in subsequent images
represented in the abbreviated format for compressed image data. If a table has never been defined for a particular
destination, then when this destination is specified in a scan header, the results are unpredictable.
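The length arithmetic of Table B.5 is worth spelling out; an informative sketch (helper names are ours):

    def huffval_count(bits):
        """mt: how many Vi,j values follow the 16 Li parameters of one table."""
        assert len(bits) == 16
        return sum(bits)

    def dht_length(tables_bits):
        """Lh = 2 + sum over tables of (17 + mt), per Table B.5.
        tables_bits holds one 16-entry BITS list per table in the segment."""
        return 2 + sum(17 + huffval_count(bits) for bits in tables_bits)

    # A DHT segment holding one table whose BITS entries sum to 12 values:
    assert dht_length([[0, 2, 3, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]]) == 31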
B.2.4.3 Arithmetic conditioning table-specification syntax
Figure B.8 specifies the marker segment which defines one or more arithmetic coding conditioning table specifications.
These replace the default arithmetic coding conditioning tables established by the SOI marker for arithmetic coding
processes. (See F.1.4.4.1.4 and F.1.4.4.2.1.)
Figure B.8 – Arithmetic conditioning table-specification syntax
DAC, La, then for each table t = 1, ..., n: Tc, Tb, Cs
The marker and parameters shown in Figure B.8 are defined below. The size and allowed values of each parameter are
given in Table B.6.
DAC: Define arithmetic coding conditioning marker – Marks the beginning of the definition of arithmetic
coding conditioning parameters.
La: Arithmetic coding conditioning definition length – Specifies the length of all arithmetic coding
conditioning parameters shown in Figure B.8 (see B.1.1.4).
Tc: Table class – 0 = DC table or lossless table, 1 = AC table.
Tb: Arithmetic coding conditioning table destination identifier – Specifies one of four possible destinations at
the decoder into which the arithmetic coding conditioning table shall be installed.
Cs: Conditioning table value – Value in either the AC or the DC (and lossless) conditioning table. A single
value of Cs shall follow each value of Tb. For AC conditioning tables Tc shall be one and Cs shall contain a
value of Kx in the range 1 ≤ Kx ≤ 63. For DC (and lossless) conditioning tables Tc shall be zero and Cs shall
contain two 4-bit parameters, U and L. U and L shall be in the range 0 ≤ L ≤ U ≤ 15 and the value of Cs shall be
L + 16 × U.
The value n in Table B.6 is the number of arithmetic coding conditioning tables specified in the DAC marker segment.
The parameters L and U are the lower and upper conditioning bounds used in the arithmetic coding procedures defined
for DC coefficient coding and lossless coding. The separate value range 1-63 listed for DCT coding is the Kx conditioning
used in AC coefficient coding.
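As an informal illustration (not part of this Specification), a small C program packing and unpacking Cs under these rules; the values L = 0, U = 1 and Kx = 5 shown are the defaults established by the SOI marker (see F.1.4.4.1.4 and F.1.4.4.2.1):

    #include <stdio.h>

    int main(void)
    {
        /* DC (and lossless) conditioning: Cs = L + 16 * U, 0 <= L <= U <= 15 */
        unsigned l = 0, u = 1;            /* default DC conditioning bounds */
        unsigned cs_dc = l + 16 * u;      /* = X'10'                        */
        unsigned l2 = cs_dc & 0x0F;       /* unpack L from the low 4 bits   */
        unsigned u2 = cs_dc >> 4;         /* unpack U from the high 4 bits  */

        /* AC conditioning: Cs holds Kx directly, 1 <= Kx <= 63 */
        unsigned cs_ac = 5;               /* default Kx                     */

        printf("DC: Cs = X'%02X' (L = %u, U = %u); AC: Kx = %u\n",
               cs_dc, l2, u2, cs_ac);
        return 0;
    }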
Table B.6 – Arithmetic coding conditioning table-specification parameter sizes and values

                                          Values
Parameter   Size (bits)   Baseline      Extended, Progressive DCT, Lossless
La              16        Undefined     2 + 2 × n
Tc               4        Undefined     0, 1
Tb               4        Undefined     0-3
Cs               8        Undefined     0-255 (Tc = 0), 1-63 (Tc = 1)
B.2.4.4 Restart interval definition syntax
Figure B.9 specifies the marker segment which defines the restart interval.
[Figure: segment layout – DRI, Lr, Ri]
Figure B.9 – Restart interval definition syntax
The marker and parameters shown in Figure B.9 are defined below. The size and allowed values of each parameter are
given in Table B.7.
DRI: Define restart interval marker – Marks the beginning of the parameters which define the restart interval.
Lr: Define restart interval segment length – Specifies the length of the parameters in the DRI segment shown in
Figure B.9 (see B.1.1.4).
Ri: Restart interval – Specifies the number of MCU in the restart interval.
In Table B.7 the value n is the number of rows of MCU in the restart interval. The value MCUR is the number of MCU
required to make up one line of samples of each component in the scan. The SOI marker disables the restart intervals. A
DRI marker segment with Ri nonzero shall be present to enable restart interval processing for the following scans. A DRI
marker segment with Ri equal to zero shall disable restart intervals for the following scans.
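As an informal illustration (not part of this Specification), a C helper that appends a DRI marker segment; the DRI marker code is X'FFDD' and parameters are sent most significant byte first (B.1.1.1):

    #include <stdio.h>

    /* Example only: append a define-restart-interval segment to a stream. */
    static void put_dri(FILE *f, unsigned ri)
    {
        fputc(0xFF, f); fputc(0xDD, f);   /* DRI marker                 */
        fputc(0x00, f); fputc(0x04, f);   /* Lr = 4                     */
        fputc((ri >> 8) & 0xFF, f);       /* Ri, most significant byte  */
        fputc(ri & 0xFF, f);              /* Ri, least significant byte */
    }

    int main(void)
    {
        put_dri(stdout, 8);               /* restart interval of 8 MCU  */
        return 0;
    }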
Table B.7 – Define restart interval segment parameter sizes and values

                                          Values
Parameter   Size (bits)   Baseline      Extended      Progressive DCT   Lossless
Lr              16        4             4             4                 4
Ri              16        0-65 535      0-65 535      n × MCUR          0-65 535
B.2.4.5 Comment syntax
Figure B.10 specifies the marker segment structure for a comment segment.
[Figure: segment layout – COM, Lc, Cm1 ... CmLc-2]
Figure B.10 – Comment segment syntax
The marker and parameters shown in Figure B.10 are defined below. The size and allowed values of each parameter are
given in Table B.8.
COM: Comment marker – Marks the beginning of a comment.
Lc: Comment segment length – Specifies the length of the comment segment shown in Figure B.10
(see B.1.1.4).
Cmi: Comment byte – The interpretation is left to the application.
Table B.8 – Comment segment parameter sizes and values

                                          Values
Parameter   Size (bits)   Sequential DCT (Baseline, Extended), Progressive DCT, Lossless
Lc              16        2-65 535
Cmi              8        0-255
B.2.4.6 Application data syntax
Figure B.11 specifies the marker segment structure for an application data segment.
[Figure: segment layout – APPn, Lp, Ap1 ... ApLp-2]
Figure B.11 – Application data syntax
The marker and parameters shown in Figure B.11 are defined below. The size and allowed values of each parameter are
given in Table B.9.
APPn: Application data marker – Marks the beginning of an application data segment.
Lp: Application data segment length – Specifies the length of the application data segment shown in
Figure B.11 (see B.1.1.4).
Api: Application data byte – The interpretation is left to the application.
The APPn (Application) segments are reserved for application use. Since these segments may be defined differently for
different applications, they should be removed when the data are exchanged between application environments.
Table B.9 – Application data segment parameter sizes and values

                                          Values
Parameter   Size (bits)   Sequential DCT (Baseline, Extended), Progressive DCT, Lossless
Lp              16        2-65 535
Api              8        0-255
B.2.5 Define number of lines syntax
Figure B.12 specifies the marker segment for defining the number of lines. The DNL (Define Number of Lines) segment
provides a mechanism for defining or redefining the number of lines in the frame (the Y parameter in the frame header) at
the end of the first scan. The value specified shall be consistent with the number of MCU-rows encoded in the first scan.
This segment, if used, shall only occur at the end of the first scan, and only after coding of an integer number of MCU-rows. This marker segment is mandatory if the number of lines (Y) specified in the frame header has the value zero.
[Figure: segment layout – DNL, Ld, NL]
Figure B.12 – Define number of lines syntax
The marker and parameters shown in Figure B.12 are defined below. The size and allowed values of each parameter are
given in Table B.10.
DNL: Define number of lines marker – Marks the beginning of the define number of lines segment.
Ld: Define number of lines segment length – Specifies the length of the define number of lines segment shown
in Figure B.12 (see B.1.1.4).
NL: Number of lines – Specifies the number of lines in the frame (see definition of Y in B.2.2).
Table B.10 – Define number of lines segment parameter sizes and values

                                          Values
Parameter   Size (bits)   Sequential DCT (Baseline, Extended), Progressive DCT, Lossless
Ld              16        4
NL              16        1-65 535 a)

a)  The value specified shall be consistent with the number of lines coded at the point where the DNL segment
    terminates the compressed data segment.
B.3 Hierarchical syntax

B.3.1 High level hierarchical mode syntax
Figure B.13 specifies the order of the high level constituent parts of the interchange format for hierarchical encoding
processes.
[Figure: compressed image data – SOI, [Tables/misc.], DHP segment, Frame1 ... Framelast, EOI]
Figure B.13 – Syntax for the hierarchical mode of operation
Hierarchical mode syntax requires a DHP marker segment that appears before the non-differential frame or frames. The
hierarchical mode compressed image data may include EXP marker segments and differential frames which shall follow
the initial non-differential frame. The frame structure in hierarchical mode is identical to the frame structure in non-hierarchical mode.
The non-differential frames in the hierarchical sequence shall use one of the coding processes specified for SOFn markers:
SOF0, SOF1, SOF2, SOF3, SOF9, SOF10 and SOF11. The differential frames shall use one of the processes specified for
SOF5, SOF6, SOF7, SOF13, SOF14 and SOF15. The allowed combinations of SOF markers within one hierarchical
sequence are specified in Annex J.
The sample precision (P) shall be constant for all frames and have the identical value as that coded in the DHP marker
segment. The number of samples per line (X) for all frames shall not exceed the value coded in the DHP marker segment.
If the number of lines (Y) is non-zero in the DHP marker segment, then the number of lines for all frames shall not exceed
the value in the DHP marker segment.
B.3.2 DHP segment syntax
The DHP segment defines the image components, size, and sampling factors for the completed hierarchical sequence of
frames. The DHP segment shall precede the first frame; a single DHP segment shall occur in the compressed image data.
The DHP segment structure is identical to the frame header syntax, except that the DHP marker is used instead of the
SOFn marker. The figures and description of B.2.2 then apply, except that the quantization table destination selector
parameter shall be set to zero in the DHP segment.
B.3.3 EXP segment syntax
Figure B.14 specifies the marker segment structure for the EXP segment. The EXP segment shall be present if (and only
if) expansion of the reference components is required either horizontally or vertically. The EXP segment parameters apply
only to the next frame (which shall be a differential frame) in the image. If required, the EXP segment shall be one of the
table-specification segments or miscellaneous marker segments preceding the frame header; the EXP segment shall not be
one of the table-specification segments or miscellaneous marker segments preceding a scan header or a DHP marker
segment.
[Figure: segment layout – EXP, Le, Eh, Ev]
Figure B.14 – Syntax of the expand segment
The marker and parameters shown in Figure B.14 are defined below. The size and allowed values of each parameter are
given in Table B.11.
EXP: Expand reference components marker – Marks the beginning of the expand reference components
segment.
Le: Expand reference components segment length – Specifies the length of the expand reference components
segment (see B.1.1.4).
Eh: Expand horizontally – If one, the reference components shall be expanded horizontally by a factor of two.
If horizontal expansion is not required, the value shall be zero.
Ev: Expand vertically – If one, the reference components shall be expanded vertically by a factor of two.
If vertical expansion is not required, the value shall be zero.
Both Eh and Ev shall be one if expansion is required both horizontally and vertically.
Table B.11 – Expand segment parameter sizes and values

                                          Values
Parameter   Size (bits)   Sequential DCT (Baseline, Extended), Progressive DCT, Lossless
Le              16        3
Eh               4        0, 1
Ev               4        0, 1

B.4 Abbreviated format for compressed image data
Figure B.2 shows the high-level constituent parts of the interchange format. This format includes all table specifications
required for decoding. If an application environment provides methods for table specification other than by means of the
compressed image data, some or all of the table specifications may be omitted. Compressed image data which is missing
any table specification data required for decoding has the abbreviated format.
B.5 Abbreviated format for table-specification data
Figure B.2 shows the high-level constituent parts of the interchange format. If no frames are present in the compressed
image data, the only purpose of the compressed image data is to convey table specifications or miscellaneous marker
segments defined in B.2.4.1, B.2.4.2, B.2.4.5, and B.2.4.6. In this case the compressed image data has the abbreviated
format for table specification data (see Figure B.15).
[Figure: compressed image data – SOI, [Tables/misc.], EOI]
Figure B.15 – Abbreviated format for table-specification data syntax
B.6 Summary
The order of the constituent parts of interchange format and all marker segment structures is summarized in Figures B.16
and B.17. Note that in Figure B.16 double-lined boxes enclose marker segments. In Figures B.16 and B.17 thick-lined
boxes enclose only markers.
The EXP segment can be mixed with the other tables/miscellaneous marker segments preceding the frame header but not
with the tables/miscellaneous marker segments preceding the DHP segment or the scan header.
[Figure, landscape: flow of the compressed data syntax – SOI; tables/miscellaneous; in hierarchical mode a DHP segment with its own tables/miscellaneous and, when reference components are to be expanded, an EXP segment; then for each frame: SOFn, tables/miscellaneous, SOS, entropy-coded segments ECS0 ... ECSlast separated by RSTi markers (i = 0 to last – 1, modulo 8) when restart is enabled; a DNL segment may follow the first scan when the number of lines was not defined correctly in the frame header; loops cover multi-scan and multi-frame data; EOI ends the image]
Figure B.16 – Flow of compressed data syntax

[Figure, landscape: parameter layout of each marker segment – DQT quantization table(s) (Lq, then multiple Pq, Tq, Q0 ... Q63); DHT Huffman coding table(s) (Lh, then multiple Tc, Th, L1 ... L16, V1,1 ... V16,L16); DAC arithmetic coding table(s) (La, then multiple Tc, Tb, Cs); DRI restart interval (Lr, Ri); COM comment (Lc, Cm1 ... CmLc-2); APPn application (Lp, Ap1 ... ApLp-2); SOFn frame header and DHP segment (Lf, P, Y, X, Nf, then Ci, Hi, Vi, Tqi for i = 1 to Nf, with Tqi = 0 in the DHP segment); SOS scan header (Ls, Ns, then Csi, Tdi, Tai for i = 1 to Ns, followed by Ss, Se, Ah, Al); DNL segment (Ld, NL); EXP segment (Le, Eh, Ev)]
Figure B.17 – Flow of marker segment
Annex C
Huffman table specification
(This annex forms an integral part of this Recommendation | International Standard)
A Huffman coding procedure may be used for entropy coding in any of the coding processes. Coding models for
Huffman encoding are defined in Annexes F, G, and H. In this Annex, the Huffman table specification is defined.
Huffman tables are specified in terms of a 16-byte list (BITS) giving the number of codes for each code length from
1 to 16. This is followed by a list of the 8-bit symbol values (HUFFVAL), each of which is assigned a Huffman code. The
symbol values are placed in the list in order of increasing code length. Code lengths greater than 16 bits are not allowed.
In addition, the codes shall be generated such that the all-1-bits code word of any length is reserved as a prefix for longer
code words.
NOTE – The order of the symbol values within HUFFVAL is determined only by code length. Within a given code length
the ordering of the symbol values is arbitrary.
This annex specifies the procedure by which the Huffman tables (of Huffman code words and their corresponding 8-bit
symbol values) are derived from the two lists (BITS and HUFFVAL) in the interchange format. However, the way in
which these lists are generated is not specified. The lists should be generated in a manner which is consistent with the
rules for Huffman coding, and shall observe the constraints discussed in the previous paragraph. Annex K contains an
example of a procedure for generating lists of Huffman code lengths and values which are in accord with these rules.
NOTE – There is no requirement in this Specification that any encoder or decoder shall implement the procedures in
precisely the manner specified by the flow charts in this annex. It is necessary only that an encoder or decoder implement the function
specified in this annex. The sole criterion for an encoder or decoder to be considered in compliance with this Specification is that it
satisfy the requirements given in clause 6 (for encoders) or clause 7 (for decoders), as determined by the compliance tests specified in
Part 2.
C.1 Marker segments for Huffman table specification
The DHT marker identifies the start of Huffman table definitions within the compressed image data. B.2.4.2 specifies the
syntax for Huffman table specification.
C.2 Conversion of Huffman table specifications to tables of codes and code lengths
Conversion of Huffman table specifications to tables of codes and code lengths uses three procedures. The first procedure
(Figure C.1) generates a table of Huffman code sizes. The second procedure (Figure C.2) generates the Huffman codes
from the table built in Figure C.1. The third procedure (Figure C.3) generates the Huffman codes in symbol value order.
Given a list BITS (1 to 16) containing the number of codes of each size, and a list HUFFVAL containing the symbol
values to be associated with those codes as described above, two tables are generated. The HUFFSIZE table contains a list
of code lengths; the HUFFCODE table contains the Huffman codes corresponding to those lengths.
Note that the variable LASTK is set to the index of the last entry in the table.
[Flow chart, Generate_size_table: K = 0; for I = 1 to 16, for J = 1 to BITS(I): HUFFSIZE(K) = I and K = K + 1; finally HUFFSIZE(K) = 0 and LASTK = K]
Figure C.1 – Generation of table of Huffman code sizes
A Huffman code table, HUFFCODE, containing a code for each size in HUFFSIZE is generated by the procedure in
Figure C.2. The notation “SLL CODE 1” in Figure C.2 indicates a shift-left-logical of CODE by one bit position.
[Flow chart, Generate_code_table: K = 0, CODE = 0, SI = HUFFSIZE(0); while HUFFSIZE(K) = SI: HUFFCODE(K) = CODE, CODE = CODE + 1, K = K + 1; when the size changes and HUFFSIZE(K) ≠ 0: CODE = SLL CODE 1 and SI = SI + 1 until HUFFSIZE(K) = SI, then continue; stop when HUFFSIZE(K) = 0]
Figure C.2 – Generation of table of Huffman codes
Two tables, HUFFCODE and HUFFSIZE, have now been generated. The entries in the tables are ordered according to
increasing Huffman code numeric value and length.
The encoding procedure code tables, EHUFCO and EHUFSI, are created by reordering the codes specified by
HUFFCODE and HUFFSIZE according to the symbol values assigned to each code in HUFFVAL.
Figure C.3 illustrates this ordering procedure.
[Flow chart, Order_codes: for K = 0 while K < LASTK: I = HUFFVAL(K), EHUFCO(I) = HUFFCODE(K), EHUFSI(I) = HUFFSIZE(K), K = K + 1]
Figure C.3 – Ordering procedure for encoding procedure code tables
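The three procedures translate directly into C. The following is an informal sketch (not part of this Specification); the BITS and HUFFVAL lists used in main() are examples only:

    #include <stdio.h>

    /* Informal C transcription of Figures C.1-C.3.
       bits[i] = number of codes of length i+1; huffval = symbol values. */
    static unsigned char huffsize[257];
    static unsigned int  huffcode[256];
    static unsigned int  ehufco[256];     /* code for each symbol value  */
    static unsigned char ehufsi[256];     /* size for each symbol value  */

    static int generate_size_table(const unsigned char bits[16])
    {                                     /* Figure C.1                  */
        int i, j, k = 0;
        for (i = 1; i <= 16; i++)
            for (j = 1; j <= bits[i - 1]; j++)
                huffsize[k++] = (unsigned char)i;
        huffsize[k] = 0;
        return k;                         /* LASTK                       */
    }

    static void generate_code_table(void)
    {                                     /* Figure C.2                  */
        unsigned int code = 0;
        int k = 0, si = huffsize[0];
        for (;;) {
            while (huffsize[k] == si)
                huffcode[k++] = code++;
            if (huffsize[k] == 0)
                break;
            do {                          /* SLL CODE 1 per unit of SI   */
                code <<= 1;
                si++;
            } while (huffsize[k] != si);
        }
    }

    static void order_codes(const unsigned char *huffval, int lastk)
    {                                     /* Figure C.3                  */
        int k;
        for (k = 0; k < lastk; k++) {
            ehufco[huffval[k]] = huffcode[k];
            ehufsi[huffval[k]] = huffsize[k];
        }
    }

    int main(void)
    {
        /* example BITS/HUFFVAL lists (illustrative only) */
        unsigned char bits[16] = {0, 2, 3, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
        unsigned char huffval[] = {1, 2, 0, 3, 4, 5, 6};
        int lastk = generate_size_table(bits);
        generate_code_table();
        order_codes(huffval, lastk);
        printf("symbol 0: code %X, %u bits\n", ehufco[0], ehufsi[0]);
        return 0;
    }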
C.3 Bit ordering within bytes
The root of a Huffman code is placed toward the MSB (most-significant-bit) of the byte, and successive bits are placed in
the direction MSB to LSB (least-significant-bit) of the byte. Remaining bits, if any, go into the next byte following the
same rules.
Integers associated with Huffman codes are appended with the MSB adjacent to the LSB of the preceding Huffman code.
Annex D
Arithmetic coding
(This annex forms an integral part of this Recommendation | International Standard)
An adaptive binary arithmetic coding procedure may be used for entropy coding in any of the coding processes except
the baseline sequential process. Coding models for adaptive binary arithmetic coding are defined in Annexes F, G,
and H. In this annex the arithmetic encoding and decoding procedures used in those models are defined.
In K.4 a simple test example is given which should be helpful in determining if a given implementation is correct.
NOTE – There is no requirement in this Specification that any encoder or decoder shall implement the procedures in
precisely the manner specified by the flow charts in this annex. It is necessary only that an encoder or decoder implement the function
specified in this annex. The sole criterion for an encoder or decoder to be considered in compliance with this Specification is that it
satisfy the requirements given in clause 6 (for encoders) or clause 7 (for decoders), as determined by the compliance tests specified in
Part 2.
D.1 Arithmetic encoding procedures
Four arithmetic encoding procedures are required in a system with arithmetic coding (see Table D.1).
Table D.1 – Procedures for binary arithmetic encoding

Procedure    Purpose
Code_0(S)    Code a “0” binary decision with context-index S
Code_1(S)    Code a “1” binary decision with context-index S
Initenc      Initialize the encoder
Flush        Terminate entropy-coded segment
The “Code_0(S)” and “Code_1(S)” procedures code the 0-decision and 1-decision respectively; S is a context-index
which identifies a particular conditional probability estimate used in coding the binary decision. The “Initenc” procedure
initializes the arithmetic coding entropy encoder. The “Flush” procedure terminates the entropy-coded segment in
preparation for the marker which follows.
D.1.1 Binary arithmetic encoding principles
The arithmetic coder encodes a series of binary symbols, zeros and ones, each symbol representing one possible result of a
binary decision.
Each “binary decision” provides a choice between two alternatives. The binary decision might be between positive and
negative signs, a magnitude being zero or nonzero, or a particular bit in a sequence of binary digits being zero or one.
The output bit stream (entropy-coded data segment) represents a binary fraction which increases in precision as bytes are
appended by the encoding process.
D.1.1.1 Recursive interval subdivision
Recursive probability interval subdivision is the basis for the binary arithmetic encoding procedures. With each binary
decision the current probability interval is subdivided into two sub-intervals, and the bit stream is modified (if necessary)
so that it points to the base (the lower bound) of the probability sub-interval assigned to the symbol which occurred.
In the partitioning of the current probability interval into two sub-intervals, the sub-interval for the less probable symbol
(LPS) and the sub-interval for the more probable symbol (MPS) are ordered such that usually the MPS sub-interval is
closer to zero. Therefore, when the LPS is coded, the MPS sub-interval size is added to the bit stream. This coding
convention requires that symbols be recognized as either MPS or LPS rather than 0 or 1. Consequently, the size of the
LPS sub-interval and the sense of the MPS for each decision must be known in order to encode that decision.
The subdivision of the current probability interval would ideally require a multiplication of the interval by the probability
estimate for the LPS. Because this subdivision is done approximately, it is possible for the LPS sub-interval to be larger
than the MPS sub-interval. When that happens a “conditional exchange” interchanges the assignment of the sub-intervals
such that the MPS is given the larger sub-interval.
Since the encoding procedure involves addition of binary fractions rather than concatenation of integer code words, the
more probable binary decisions can sometimes be coded at a cost of much less than one bit per decision.
D.1.1.2 Conditioning of probability estimates
An adaptive binary arithmetic coder requires a statistical model – a model for selecting conditional probability estimates to
be used in the coding of each binary decision. When a given binary decision probability estimate is dependent on a
particular feature or features (the context) already coded, it is “conditioned” on that feature. The conditioning of
probability estimates on previously coded decisions must be identical in encoder and decoder, and therefore can use only
information known to both.
Each conditional probability estimate required by the statistical model is kept in a separate storage location or “bin”
identified by a unique context-index S. The arithmetic coder is adaptive, which means that the probability estimates at
each context-index are developed and maintained by the arithmetic coding system on the basis of prior coding decisions
for that context-index.
D.1.2 Encoding conventions and approximations
The encoding procedures use fixed precision integer arithmetic and an integer representation of fractional values in which
X’8000’ can be regarded as the decimal value 0.75. The probability interval, A, is kept in the integer
range X’8000’ ≤ A < X’10000’ by doubling it whenever its integer value falls below X’8000’. This is equivalent to
keeping A in the decimal range 0.75 ≤ A < 1.5. This doubling procedure is called renormalization.
The code register, C, contains the trailing bits of the bit stream. C is also doubled each time A is doubled. Periodically
– to keep C from overflowing – a byte of data is removed from the high order bits of the C-register and placed in the
entropy-coded segment.
Carry-over into the entropy-coded segment is limited by delaying X’FF’ output bytes until the carry-over is resolved. Zero
bytes are stuffed after each X’FF’ byte in the entropy-coded segment in order to avoid the accidental generation of
markers in the entropy-coded segment.
Keeping A in the range 0.75 ≤ A < 1.5 allows a simple arithmetic approximation to be used in the probability interval
subdivision. Normally, if the current estimate of the LPS probability for context-index S is Qe(S), precise calculation of
the sub-intervals would require:
      Qe(S) × A           Probability sub-interval for the LPS;
      A – (Qe(S) × A)     Probability sub-interval for the MPS.

Because the decimal value of A is of order unity, these can be approximated by

      Qe(S)               Probability sub-interval for the LPS;
      A – Qe(S)           Probability sub-interval for the MPS.
Whenever the LPS is coded, the value of A – Qe(S) is added to the code register and the probability interval is reduced to
Qe(S). Whenever the MPS is coded, the code register is left unchanged and the interval is reduced to A – Qe(S). The
precision range required for A is then restored, if necessary, by renormalization of both A and C.
With the procedure described above, the approximations in the probability interval subdivision process can sometimes
make the LPS sub-interval larger than the MPS sub-interval. If, for example, the value of Qe(S) is 0.5 and A is at the
minimum allowed value of 0.75, the approximate scaling gives one-third of the probability interval to the MPS and two-thirds to the LPS. To avoid this size inversion, conditional exchange is used. The probability interval is subdivided using
the simple approximation, but the MPS and LPS sub-interval assignments are exchanged whenever the LPS sub-interval is
larger than the MPS sub-interval. This MPS/LPS conditional exchange can only occur when a renormalization will be
needed.
Each binary decision uses a context. A context is the set of prior coding decisions which determine the context-index, S,
identifying the probability estimate used in coding the decision.
Whenever a renormalization occurs, a probability estimation procedure is invoked which determines a new probability
estimate for the context currently being coded. No explicit symbol counts are needed for the estimation. The relative
probabilities of renormalization after coding of LPS and MPS provide, by means of a table-based probability estimation
state machine, a direct estimate of the probabilities.
D.1.3 Encoder code register conventions
The flow charts in this annex assume the register structures for the encoder as shown in Table D.2.
Table D.2 – Encoder register conventions

              MSB                                        LSB
C-register    0000cbbb, bbbbbsss, xxxxxxxx, xxxxxxxx
A-register    00000000, 00000000, aaaaaaaa, aaaaaaaa
The “a” bits are the fractional bits in the A-register (the current probability interval value) and the “x” bits are the
fractional bits in the code register. The “s” bits are optional spacer bits which provide useful constraints on carry-over, and
the “b” bits indicate the bit positions from which the completed bytes of data are removed from the C-register. The “c” bit
is a carry bit. Except at the time of initialization, bit 15 of the A-register is always set and bit 16 is always clear (the LSB
is bit 0).
These register conventions illustrate one possible implementation. However, any register conventions which allow
resolution of carry-over in the encoder and which produce the same entropy-coded segment may be used. The handling of
carry-over and the byte stuffing following X’FF’ will be described in a later part of this annex.
D.1.4 Code_1(S) and Code_0(S) procedures
When a given binary decision is coded, one of two possibilities occurs – either a 1-decision or a 0-decision is coded.
Code_1(S) and Code_0(S) are shown in Figures D.1 and D.2. The Code_1(S) and Code_0(S) procedures use probability
estimates with a context-index S. The context-index S is determined by the statistical model and is, in general, a function
of the previous coding decisions; each value of S identifies a particular conditional probability estimate which is used in
encoding the binary decision.
[Flow chart, Code_1(S): if MPS(S) = 1 then Code_MPS(S) else Code_LPS(S)]
Figure D.1 – Code_1(S) procedure
[Flow chart, Code_0(S): if MPS(S) = 0 then Code_MPS(S) else Code_LPS(S)]
Figure D.2 – Code_0(S) procedure
The context-index S selects a storage location which contains Index(S), an index to the tables which make up the
probability estimation state machine. When coding a binary decision, the symbol being coded is either the more probable
symbol or the less probable symbol. Therefore, additional information is stored at each context-index identifying the sense
of the more probable symbol, MPS(S).
For simplicity, the flow charts in this subclause assume that the context storage for each context-index S has an additional
storage field for Qe(S) containing the value of Qe(Index(S)). If only the value of Index(S) and MPS(S) are stored, all
references to Qe(S) should be replaced by Qe(Index(S)).
The Code_LPS(S) procedure normally consists of the addition of the MPS sub-interval A – Qe(S) to the bit stream and a
scaling of the interval to the sub-interval, Qe(S). It is always followed by the procedures for obtaining a new LPS
probability estimate (Estimate_Qe(S)_after_LPS) and renormalization (Renorm_e) (see Figure D.3).
However, in the event that the LPS sub-interval is larger than the MPS sub-interval, the conditional MPS/LPS exchange
occurs and the MPS sub-interval is coded.
The Code_MPS(S) procedure normally reduces the size of the probability interval to the MPS sub-interval. However, if
the LPS sub-interval is larger than the MPS sub-interval, the conditional exchange occurs and the LPS sub-interval is
coded instead. Note that conditional exchange cannot occur unless the procedures for obtaining a new LPS probability
estimate (Estimate_Qe(S)_after_MPS) and renormalization (Renorm_e) are required after the coding of the symbol (see
Figure D.4).
[Flow chart, Code_LPS(S): A = A – Qe(S); if A ≥ Qe(S) then C = C + A and A = Qe(S); otherwise the conditional exchange leaves C and A unchanged; then Estimate_Qe(S)_after_LPS and Renorm_e]
Figure D.3 – Code_LPS(S) procedure with conditional MPS/LPS exchange
[Flow chart, Code_MPS(S): A = A – Qe(S); if A < X’8000’ then { if A < Qe(S) then C = C + A and A = Qe(S) (conditional exchange); Estimate_Qe(S)_after_MPS; Renorm_e }; otherwise done]
Figure D.4 – Code_MPS(S) procedure with conditional MPS/LPS exchange
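Read together, Figures D.3 and D.4 reduce to a few lines of register arithmetic. The following C fragment is an informal structural sketch (not part of this Specification); probability estimation and renormalization are deliberately stubbed out so only the interval arithmetic and the conditional exchange are shown:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of Code_MPS/Code_LPS on 32-bit A and C registers. */
    static uint32_t A = 0x10000, C = 0;

    static void estimate_after_mps(void) { /* Table D.3 lookup elided */ }
    static void estimate_after_lps(void) { /* Table D.3 lookup elided */ }
    static void renorm_e(void)           { /* Figure D.7 elided       */ }

    static void code_mps(uint32_t qe)
    {
        A -= qe;                      /* MPS normally gets A - Qe(S)    */
        if (A < 0x8000) {             /* renormalization will be needed */
            if (A < qe) {             /* conditional exchange:          */
                C += A;               /* code the LPS sub-interval      */
                A = qe;
            }
            estimate_after_mps();
            renorm_e();
        }
    }

    static void code_lps(uint32_t qe)
    {
        A -= qe;
        if (A >= qe) {                /* no exchange: point past the MPS */
            C += A;
            A = qe;
        }                             /* else exchange: keep base, size A */
        estimate_after_lps();
        renorm_e();
    }

    int main(void)
    {
        code_mps(0x5A1D);             /* A: 0x10000 -> 0xA5E3, no renorm */
        code_lps(0x5A1D);             /* A -> 0x4BC6 < Qe: conditional
                                         exchange occurs, C unchanged    */
        printf("A = %05X  C = %05X\n", (unsigned)A, (unsigned)C);
        return 0;
    }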
D.1.5 Probability estimation in the encoder
D.1.5.1 Probability estimation state machine
The probability estimation state machine consists of a number of sequences of probability estimates. These sequences are
interlinked in a manner which provides probability estimates based on approximate symbol counts derived from the
arithmetic coder renormalization. Some of these sequences are used during the initial “learning” stages of probability
estimation; the rest are used for “steady state” estimation.
Each entry in the probability estimation state machine is assigned an index, and each index has associated with it a
Qe value and two Next_Index values. The Next_Index_MPS gives the index to the new probability estimate after an MPS
renormalization; the Next_Index_LPS gives the index to the new probability estimate after an LPS renormalization. Note
that both the index to the estimation state machine and the sense of the MPS are kept for each context-index S. The sense
of the MPS is changed whenever the entry in the Switch_MPS is one.
The probability estimation state machine is given in Table D.3. Initialization of the arithmetic coder is always with
an MPS sense of zero and a Qe index of zero in Table D.3.
The Qe values listed in Table D.3 are expressed as hexadecimal integers. To approximately convert the 15-bit integer
representation of Qe to a decimal probability, divide the Qe values by (4/3) × (X’8000’).
Table D.3 – Qe values and probability estimation state machine

Index   Qe_Value   Next_Index_LPS   Next_Index_MPS   Switch_MPS
  0     X’5A1D’          1                1              1
  1     X’2586’         14                2              0
  2     X’1114’         16                3              0
  3     X’080B’         18                4              0
  4     X’03D8’         20                5              0
  5     X’01DA’         23                6              0
  6     X’00E5’         25                7              0
  7     X’006F’         28                8              0
  8     X’0036’         30                9              0
  9     X’001A’         33               10              0
 10     X’000D’         35               11              0
 11     X’0006’          9               12              0
 12     X’0003’         10               13              0
 13     X’0001’         12               13              0
 14     X’5A7F’         15               15              1
 15     X’3F25’         36               16              0
 16     X’2CF2’         38               17              0
 17     X’207C’         39               18              0
 18     X’17B9’         40               19              0
 19     X’1182’         42               20              0
 20     X’0CEF’         43               21              0
 21     X’09A1’         45               22              0
 22     X’072F’         46               23              0
 23     X’055C’         48               24              0
 24     X’0406’         49               25              0
 25     X’0303’         51               26              0
 26     X’0240’         52               27              0
 27     X’01B1’         54               28              0
 28     X’0144’         56               29              0
 29     X’00F5’         57               30              0
 30     X’00B7’         59               31              0
 31     X’008A’         60               32              0
 32     X’0068’         62               33              0
 33     X’004E’         63               34              0
 34     X’003B’         32               35              0
 35     X’002C’         33                9              0
 36     X’5AE1’         37               37              1
 37     X’484C’         64               38              0
 38     X’3A0D’         65               39              0
 39     X’2EF1’         67               40              0
 40     X’261F’         68               41              0
 41     X’1F33’         69               42              0
 42     X’19A8’         70               43              0
 43     X’1518’         72               44              0
 44     X’1177’         73               45              0
 45     X’0E74’         74               46              0
 46     X’0BFB’         75               47              0
 47     X’09F8’         77               48              0
 48     X’0861’         78               49              0
 49     X’0706’         79               50              0
 50     X’05CD’         48               51              0
 51     X’04DE’         50               52              0
 52     X’040F’         50               53              0
 53     X’0363’         51               54              0
 54     X’02D4’         52               55              0
 55     X’025C’         53               56              0
 56     X’01F8’         54               57              0
 57     X’01A4’         55               58              0
 58     X’0160’         56               59              0
 59     X’0125’         57               60              0
 60     X’00F6’         58               61              0
 61     X’00CB’         59               62              0
 62     X’00AB’         61               63              0
 63     X’008F’         61               32              0
 64     X’5B12’         65               65              1
 65     X’4D04’         80               66              0
 66     X’412C’         81               67              0
 67     X’37D8’         82               68              0
 68     X’2FE8’         83               69              0
 69     X’293C’         84               70              0
 70     X’2379’         86               71              0
 71     X’1EDF’         87               72              0
 72     X’1AA9’         87               73              0
 73     X’174E’         72               74              0
 74     X’1424’         72               75              0
 75     X’119C’         74               76              0
 76     X’0F6B’         74               77              0
 77     X’0D51’         75               78              0
 78     X’0BB6’         77               79              0
 79     X’0A40’         77               48              0
 80     X’5832’         80               81              1
 81     X’4D1C’         88               82              0
 82     X’438E’         89               83              0
 83     X’3BDD’         90               84              0
 84     X’34EE’         91               85              0
 85     X’2EAE’         92               86              0
 86     X’299A’         93               87              0
 87     X’2516’         86               71              0
 88     X’5570’         88               89              1
 89     X’4CA9’         95               90              0
 90     X’44D9’         96               91              0
 91     X’3E22’         97               92              0
 92     X’3824’         99               93              0
 93     X’32B4’         99               94              0
 94     X’2E17’         93               86              0
 95     X’56A8’         95               96              1
 96     X’4F46’        101               97              0
 97     X’47E5’        102               98              0
 98     X’41CF’        103               99              0
 99     X’3C3D’        104              100              0
100     X’375E’         99               93              0
101     X’5231’        105              102              0
102     X’4C0F’        106              103              0
103     X’4639’        107              104              0
104     X’415E’        103               99              0
105     X’5627’        105              106              1
106     X’50E7’        108              107              0
107     X’4B85’        109              103              0
108     X’5597’        110              109              1
109     X’504F’        111              107              0
110     X’5A10’        110              111              1
111     X’5522’        112              109              0
112     X’59EB’        112              111              1
D.1.5.2 Renormalization driven estimation
The change in state in Table D.3 occurs only when the arithmetic coder interval register is renormalized. This must always
be done after coding an LPS, and whenever the probability interval register is less than X'8000' (0.75 in decimal notation)
after coding an MPS.
When the LPS renormalization is required, Next_Index_LPS gives the new index for the LPS probability estimate. When
the MPS renormalization is required, Next_Index_MPS gives the new index for the LPS probability estimate. If
Switch_MPS is 1 for the old index, the MPS symbol sense must be inverted after an LPS.
D.1.5.3 Estimation following renormalization after MPS
The procedure for estimating the probability on the MPS renormalization path is given in Figure D.5. Index(S) is part of
the information stored for context-index S. The new value of Index(S) is obtained from Table D.3 from the column labeled
Next_Index_MPS, as that is the next index after an MPS renormalization. This next index is stored as the new value of
Index(S) in the context storage at context-index S, and the value of Qe at this new Index(S) becomes the new Qe(S).
MPS(S) does not change.
[Flow chart, Estimate_Qe(S)_after_MPS: I = Index(S); I = Next_Index_MPS(I); Index(S) = I; Qe(S) = Qe_Value(I)]
Figure D.5 – Probability estimation on MPS renormalization path
D.1.5.4 Estimation following renormalization after LPS
The procedure for estimating the probability on the LPS renormalization path is shown in Figure D.6. The procedure is
similar to that of Figure D.5 except that when Switch_MPS(I) is 1, the sense of MPS(S) must be inverted.
[Flow chart, Estimate_Qe(S)_after_LPS: I = Index(S); if Switch_MPS(I) = 1 then MPS(S) = 1 – MPS(S); I = Next_Index_LPS(I); Index(S) = I; Qe(S) = Qe_Value(I)]
Figure D.6 – Probability estimation on LPS renormalization path
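As an informal illustration (not part of this Specification), the following C sketch shows how Table D.3 and the two estimation procedures are commonly held in storage; only the first five rows of the table are reproduced here:

    #include <stdio.h>

    /* Sketch of the Table D.3 storage and the estimation procedures of
       Figures D.5 and D.6. A real coder carries all 113 rows. */
    typedef struct {
        unsigned short qe;                 /* Qe_Value       */
        unsigned char  nlps;               /* Next_Index_LPS */
        unsigned char  nmps;               /* Next_Index_MPS */
        unsigned char  sw;                 /* Switch_MPS     */
    } QeEntry;

    static const QeEntry qe_tab[] = {
        {0x5A1D,  1,  1, 1},
        {0x2586, 14,  2, 0},
        {0x1114, 16,  3, 0},
        {0x080B, 18,  4, 0},
        {0x03D8, 20,  5, 0},
        /* ... rows 5 to 112 as listed in Table D.3 ... */
    };

    static unsigned char index_s[1], mps_s[1];  /* one context for the demo */

    static void estimate_after_mps(int s)       /* Figure D.5 */
    {
        index_s[s] = qe_tab[index_s[s]].nmps;
    }

    static void estimate_after_lps(int s)       /* Figure D.6 */
    {
        if (qe_tab[index_s[s]].sw)
            mps_s[s] = 1 - mps_s[s];            /* invert the MPS sense */
        index_s[s] = qe_tab[index_s[s]].nlps;
    }

    int main(void)
    {
        estimate_after_lps(0);   /* index 0 is a switch state: MPS flips, index -> 1 */
        estimate_after_mps(0);   /* index 1 -> 2 */
        printf("Index = %u, MPS = %u, Qe = X'%04X'\n",
               index_s[0], mps_s[0], qe_tab[index_s[0]].qe);
        return 0;
    }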
D.1.6 Renormalization in the encoder
The Renorm_e procedure for the encoder renormalization is shown in Figure D.7. Both the probability interval register A
and the code register C are shifted, one bit at a time. The number of shifts is counted in the counter CT; when CT is zero,
a byte of compressed data is removed from C by the procedure Byte_out and CT is reset to 8. Renormalization continues
until A is no longer less than X’8000’.
[Flow chart, Renorm_e: repeat { A = SLL A 1; C = SLL C 1; CT = CT – 1; if CT = 0 then { Byte_out; CT = 8 } } until A ≥ X’8000’]
Figure D.7 – Encoder renormalization procedure
The Byte_out procedure used in Renorm_e is shown in Figure D.8. This procedure uses byte-stuffing procedures which
prevent accidental generation of markers by the arithmetic encoding procedures. It also includes an example of a
procedure for resolving carry-over. For simplicity of exposition, the buffer holding the entropy-coded segment is assumed
to be large enough to contain the entire segment.
In Figure D.8 BP is the entropy-coded segment pointer and B is the compressed data byte pointed to by BP. T in Byte_out
is a temporary variable which is used to hold the output byte and carry bit. ST is the stack counter which is used to count
X’FF’ output bytes until any carry-over through the X’FF’ sequence has been resolved. The value of ST rarely exceeds 3.
However, since the upper limit for the value of ST is bounded only by the total entropy-coded segment size, a precision of
32 bits is recommended for ST.
Since large values of ST represent a latent output of compressed data, the following procedure may be needed in high
speed synchronous encoding systems for handling the burst of output data which occurs when the carry is resolved.
[Flow chart, Byte_out: T = SRL C 19; if T > X’FF’ then { B = B + 1; Stuff_0; Output_stacked_zeros; BP = BP + 1; B = T }; else if T = X’FF’ then ST = ST + 1; else { Output_stacked_X’FF’s; BP = BP + 1; B = T }; finally C = C AND X’7FFFF’]
Figure D.8 – Byte_out procedure for encoder
When the stack count reaches an upper bound determined by output channel capacity, the stack is emptied and the stacked
X’FF’ bytes (and stuffed zero bytes) are added to the compressed data before the carry-over is resolved. If a carry-over
then occurs, the carry is added to the final stuffed zero, thereby converting the final X’FF00’ sequence to the X’FF01’
temporary private marker. The entropy-coded segment must then be post-processed to resolve the carry-over and remove
the temporary marker code. For any reasonable bound on ST this post processing is very unlikely.
Referring to Figure D.8, the shift of the code register by 19 bits aligns the output bits with the low order bits of T. The
first test then determines if a carry-over has occurred. If so, the carry must be added to the previous output byte before
advancing the segment pointer BP. The Stuff_0 procedure stuffs a zero byte whenever the addition of the carry to the data
already in the entropy-coded segment creates a X’FF’ byte. Any stacked output bytes – converted to zeros by the carry-over – are then placed in the entropy-coded segment. Note that when the output byte is later transferred from T to the
entropy-coded segment (to byte B), the carry bit is ignored if it is set.
If a carry has not occurred, the output byte is tested to see if it is X’FF’. If so, the stack count ST is incremented, as the
output must be delayed until the carry-over is resolved. If not, the carry-over has been resolved, and any stacked X’FF’
bytes must then be placed in the entropy-coded segment. Note that a zero byte is stuffed following each X’FF’.
The procedures used by Byte_out are defined in Figures D.9 through D.11.
[Flow chart, Output_stacked_zeros: while ST ≠ 0: BP = BP + 1, B = 0, ST = ST – 1]
Figure D.9 – Output_stacked_zeros procedure for encoder
[Flow chart, Output_stacked_X’FF’s: while ST ≠ 0: BP = BP + 1, B = X’FF’, BP = BP + 1, B = 0, ST = ST – 1]
Figure D.10 – Output_stacked_X’FF’s procedure for encoder
[Flow chart, Stuff_0: if B = X’FF’ then { BP = BP + 1; B = 0 }]
Figure D.11 – Stuff_0 procedure for encoder
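As an informal illustration (not part of this Specification), a C sketch of Byte_out and its three helpers, using a flat output buffer in place of the entropy-coded segment:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of Figures D.8-D.11. bp plays the role of BP (starting one
       byte before the segment) and st of ST, the stacked X'FF' count. */
    static uint32_t C;
    static unsigned char buf[1024];
    static int bp = -1;
    static int st = 0;

    static void stuff_0(void)                /* Figure D.11 */
    {
        if (buf[bp] == 0xFF) buf[++bp] = 0;
    }

    static void output_stacked_zeros(void)   /* Figure D.9 */
    {
        while (st > 0) { buf[++bp] = 0; st--; }
    }

    static void output_stacked_ffs(void)     /* Figure D.10 */
    {
        while (st > 0) { buf[++bp] = 0xFF; buf[++bp] = 0; st--; }
    }

    static void byte_out(void)               /* Figure D.8 */
    {
        uint32_t t = C >> 19;                /* SRL C 19                   */
        if (t > 0xFF) {                      /* carry-over occurred (cannot
                                                happen before the first
                                                byte has been output)      */
            buf[bp] += 1;                    /* add carry to prior byte    */
            stuff_0();                       /* keep X'FF' followed by 0   */
            output_stacked_zeros();          /* carry turned FFs into 00   */
            buf[++bp] = (unsigned char)t;    /* carry bit of T is ignored  */
        } else if (t == 0xFF) {
            st++;                            /* delay until carry resolves */
        } else {
            output_stacked_ffs();            /* carry resolved: flush FFs  */
            buf[++bp] = (unsigned char)t;
        }
        C &= 0x7FFFF;                        /* clear the removed bits     */
    }

    int main(void)
    {
        C = (uint32_t)0xAB << 19;            /* pretend a byte is ready    */
        byte_out();
        printf("out: %02X  C = %05X\n", buf[0], (unsigned)C);
        return 0;
    }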
D.1.7 Initialization of the encoder
The Initenc procedure is used to start the arithmetic coder. The basic steps are shown in Figure D.12.
[Flow chart, Initenc: initialize statistics areas; ST = 0; A = X’10000’ (see Note below); C = 0; CT = 11; BP = BPST – 1]
Figure D.12 – Initialization of the encoder
The probability estimation tables are defined by Table D.3. The statistics areas are initialized to an MPS sense of 0 and a
Qe index of zero as defined by Table D.3. The stack count (ST) is cleared, the code register (C) is cleared, and the interval
register is set to X’10000’. The counter (CT) is set to 11, reflecting the fact that when A is initialized to X’10000’ three
spacer bits plus eight output bits in C must be filled before the first byte is removed. Note that BP is initialized to point to
the byte before the start of the entropy-coded segment (which is at BPST). Note also that the statistics areas are initialized
for all values of context-index S to MPS(S) = 0 and Index(S) = 0.
NOTE – Although the probability interval is initialized to X’10000’ in both Initenc and Initdec, the precision of
the probability interval register can still be limited to 16 bits. When the precision of the interval register is 16 bits, it is initialized to
zero.
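As an informal C rendering of these initialization steps (not part of this Specification); NCTX and the variable names are ours:

    #include <stdint.h>
    #include <string.h>

    /* Sketch of Initenc (Figure D.12). NCTX is a hypothetical context
       count; index_s/mps_s are the per-context statistics areas. */
    enum { NCTX = 64 };
    static unsigned char index_s[NCTX], mps_s[NCTX];
    static uint32_t A, C;
    static int ct, st, bp;

    static void initenc(void)
    {
        memset(index_s, 0, sizeof index_s);  /* Index(S) = 0 for all S    */
        memset(mps_s,   0, sizeof mps_s);    /* MPS(S) = 0 for all S      */
        st = 0;                              /* no stacked X'FF' bytes    */
        A  = 0x10000;                        /* zero if A is 16 bits wide */
        C  = 0;
        ct = 11;                             /* 3 spacer + 8 output bits  */
        bp = -1;                             /* BP = BPST - 1             */
    }

    int main(void) { initenc(); return 0; }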
D.1.8 Termination of encoding
The Flush procedure is used to terminate the arithmetic encoding procedures and prepare the entropy-coded segment for
the addition of the X’FF’ prefix of the marker which follows the arithmetically coded data. Figure D.13 shows this flush
procedure. The first step in the procedure is to set as many low order bits of the code register to zero as possible without
pointing outside of the final interval. Then, the output byte is aligned by shifting it left by CT bits; Byte_out then removes
it from C. C is then shifted left by 8 bits to align the second output byte and Byte_out is used a second time. The
remaining low order bits in C are guaranteed to be zero, and these trailing zero bits shall not be written to the entropy-coded segment.
[Flow chart, Flush: Clear_final_bits; C = SLL C CT; Byte_out; C = SLL C 8; Byte_out; Discard_final_zeros]
Figure D.13 – Flush procedure
Any trailing zero bytes already written to the entropy-coded segment and not preceded by a X’FF’ may, optionally, be
discarded. This is done in the Discard_final_zeros procedure. Stuffed zero bytes shall not be discarded.
Entropy coded segments are always followed by a marker. For this reason, the final zero bits needed to complete decoding
shall not be included in the entropy coded segment. Instead, when the decoder encounters a marker, zero bits shall be
supplied to the decoding procedure until decoding is complete. This convention guarantees that when a DNL marker is
used, the decoder will intercept it in time to correctly terminate the decoding procedure.
[Flow chart, Clear_final_bits: T = C + A – 1; T = T AND X’FFFF0000’; if T < C then T = T + X’8000’; C = T]
Figure D.14 – Clear_final_bits procedure in Flush
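The Clear_final_bits computation is compact enough to show directly. An informal C sketch (not part of this Specification), with example values of ours:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of Clear_final_bits (Figure D.14): zero as many low-order
       bits of C as possible without leaving the interval [C, C + A). */
    static uint32_t clear_final_bits(uint32_t c, uint32_t a)
    {
        uint32_t t = (c + a - 1) & 0xFFFF0000u; /* T = (C+A-1) AND X'FFFF0000' */
        if (t < c)
            t += 0x8000;                        /* move back inside the interval */
        return t;
    }

    int main(void)
    {
        printf("%05X\n", (unsigned)clear_final_bits(0x1C123, 0x9000)); /* 20000 */
        printf("%05X\n", (unsigned)clear_final_bits(0x28000, 0x1000)); /* 28000 */
        return 0;
    }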
[Flow chart, Discard_final_zeros: while BP ≥ BPST and B = 0: BP = BP – 1; when a non-zero byte is reached, if it is X’FF’ then BP = BP + 1 so that the stuffed zero is retained; then done]
Figure D.15 – Discard_final_zeros procedure in Flush
D.2 Arithmetic decoding procedures
Two arithmetic decoding procedures are used for arithmetic decoding (see Table D.4).
The “Decode(S)” procedure decodes the binary decision for a given context-index S and returns a value of either 0 or 1. It
is the inverse of the “Code_0(S)” and “Code_1(S)” procedures described in D.1. “Initdec” initializes the arithmetic
coding entropy decoder.
Table D.4 – Procedures for binary arithmetic decoding

Procedure    Purpose
Decode(S)    Decode a binary decision with context-index S
Initdec      Initialize the decoder
D.2.1 Binary arithmetic decoding principles
The probability interval subdivision and sub-interval ordering defined for the arithmetic encoding procedures also apply to
the arithmetic decoding procedures.
Since the bit stream always points within the current probability interval, the decoding process is a matter of determining,
for each decision, which sub-interval is pointed to by the bit stream. This is done recursively, using the same probability
interval sub-division process as in the encoder. Each time a decision is decoded, the decoder subtracts from the bit stream
any interval the encoder added to the bit stream. Therefore, the code register in the decoder is a pointer into the current
probability interval relative to the base of the interval.
If the size of the sub-interval allocated to the LPS is larger than the sub-interval allocated to the MPS, the encoder invokes
the conditional exchange procedure. When the interval sizes are inverted in the decoder, the sense of the symbol decoded
must be inverted.
D.2.2 Decoding conventions and approximations
The approximations and integer arithmetic defined for the probability interval subdivision in the encoder must also be
used in the decoder. However, where the encoder would have added to the code register, the decoder subtracts from the
code register.
D.2.3 Decoder code register conventions
The flow charts given in this section assume the register structures for the decoder as shown in Table D.5:
Table D.5 – Decoder register conventions

               MSB                  LSB
Cx register    xxxxxxxx, xxxxxxxx
C-low          bbbbbbbb, 00000000
A-register     aaaaaaaa, aaaaaaaa
Cx and C-low can be regarded as one 32-bit C-register, in that renormalization of C shifts a bit of new data from bit 15 of
C-low to bit 0 of Cx. However, the decoding comparisons use Cx alone. New data are inserted into the “b” bits of C-low
one byte at a time.
NOTE – The comparisons shown in the various procedures use arithmetic comparisons, and therefore assume precisions
greater than 16 bits for the variables. Unsigned (logical) comparisons should be used in 16-bit precision implementations.
D.2.4 The decode procedure
The decoder decodes one binary decision at a time. After decoding the decision, the decoder subtracts any amount from
the code register that the encoder added. The amount left in the code register is the offset from the base of the current
probability interval to the sub-interval allocated to the binary decisions not yet decoded. In the first test in the decode
procedure shown in Figure D.16 the code register is compared to the size of the MPS sub-interval. Unless a conditional
exchange is needed, this test determines whether the MPS or LPS for context-index S is decoded. Note that the LPS for
context-index S is given by 1 – MPS(S).
When a renormalization is needed, the MPS/LPS conditional exchange may also be needed. For the LPS path, the
conditional exchange procedure is shown in Figure D.17. Note that the probability estimation in the decoder is identical
to the probability estimation in the encoder (Figures D.5 and D.6).
[Flow chart, Decode(S): A = A – Qe(S); if Cx < A then { if A < X’8000’ then { D = Cond_MPS_exchange(S); Renorm_d } else D = MPS(S) } else { D = Cond_LPS_exchange(S); Renorm_d }; return D]
Figure D.16 – Decode(S) procedure
For the MPS path of the decoder the conditional exchange procedure is given in Figure D.18.
[Flow chart, Cond_LPS_exchange(S): Cx = Cx – A; if A < Qe(S) then { D = MPS(S); A = Qe(S); Estimate_Qe(S)_after_MPS } else { D = 1 – MPS(S); A = Qe(S); Estimate_Qe(S)_after_LPS }; return D]
Figure D.17 – Decoder LPS path conditional exchange procedure
[Flow chart, Cond_MPS_exchange(S): if A < Qe(S) then { D = 1 – MPS(S); Estimate_Qe(S)_after_LPS } else { D = MPS(S); Estimate_Qe(S)_after_MPS }; return D]
Figure D.18 – Decoder MPS path conditional exchange procedure
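As an informal illustration (not part of this Specification), the following C fragment folds Figures D.16 through D.18 into one procedure; probability estimation and renormalization are reduced to stubs, and single-context variables stand in for the per-context storage:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of Decode(S) with both conditional-exchange paths. */
    static uint32_t A, Cx, qe;
    static int mps;

    static void estimate_after_mps(void) { /* Table D.3 lookup elided */ }
    static void estimate_after_lps(void) { /* Table D.3 lookup elided */ }
    static void renorm_d(void)           { /* Figure D.19 elided      */ }

    static int decode(void)
    {
        int d;
        A -= qe;
        if (Cx < A) {
            if (A < 0x8000) {            /* renormalization needed      */
                /* Cond_MPS_exchange (Figure D.18) */
                if (A < qe) { d = 1 - mps; estimate_after_lps(); }
                else        { d = mps;     estimate_after_mps(); }
                renorm_d();
            } else
                d = mps;                 /* fast path: MPS, no renorm   */
        } else {
            /* Cond_LPS_exchange (Figure D.17) */
            Cx -= A;
            if (A < qe) { d = mps;     estimate_after_mps(); }
            else        { d = 1 - mps; estimate_after_lps(); }
            A = qe;
            renorm_d();
        }
        return d;
    }

    int main(void)
    {
        A = 0x10000; Cx = 0x2000; qe = 0x5A1D; mps = 0;
        printf("decision = %d\n", decode());   /* MPS fast path: 0 */
        return 0;
    }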
D.2.5 Probability estimation in the decoder
The procedures defined for obtaining a new LPS probability estimate in the encoder are also used in the decoder.
D.2.6 Renormalization in the decoder
The Renorm_d procedure for the decoder renormalization is shown in Figure D.19. CT is a counter which keeps track of
the number of compressed bits in the C-low section of the C-register. When CT is zero, a new byte is inserted into C-low
by the procedure Byte_in and CT is reset to 8.
Both the probability interval register A and the code register C are shifted, one bit at a time, until A is no longer less than
X’8000’.
[Flow chart, Renorm_d: repeat { if CT = 0 then { Byte_in; CT = 8 }; A = SLL A 1; C = SLL C 1; CT = CT – 1 } until A ≥ X’8000’]
Figure D.19 – Decoder renormalization procedure
The Byte_in procedure used in Renorm_d is shown in Figure D.20. This procedure fetches one byte of data,
compensating for the stuffed zero byte which follows any X’FF’ byte. It also detects the marker which must follow the
entropy-coded segment. The C-register in this procedure is the concatenation of the Cx and C-low registers. For simplicity
of exposition, the buffer holding the entropy-coded segment is assumed to be large enough to contain the entire segment.
B is the byte pointed to by the entropy-coded segment pointer BP. BP is first incremented. If the new value of B is not a
X’FF’, it is inserted into the high order 8 bits of C-low.
[Flow chart, Byte_in: BP = BP + 1; if B = X’FF’ then Unstuff_0 else C = C + SLL B 8]
Figure D.20 – Byte_in procedure for decoder
The Unstuff_0 procedure is shown in Figure D.21. If the new value of B is X’FF’, BP is incremented to point to the next
byte and this next B is tested to see if it is zero. If so, B contains a stuffed byte which must be skipped. The zero B is
ignored, and the X’FF’ B value which preceded it is inserted in the C-register.
If the value of B after a X’FF’ byte is not zero, then a marker has been detected. The marker is interpreted as required and
the entropy-coded segment pointer is adjusted (“Adjust BP” in Figure D.21) so that 0-bytes will be fed to the decoder
until decoding is complete. One way of accomplishing this is to point BP to the byte preceding the marker which follows
the entropy-coded segment.
[Flow chart, Unstuff_0: BP = BP + 1; if B = 0 then C = C OR X’FF00’ else { Interpret_marker; Adjust BP }]
Figure D.21 – Unstuff_0 procedure for decoder
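As an informal illustration (not part of this Specification), a C sketch of Byte_in and Unstuff_0 reading from a flat buffer; marker interpretation is omitted, and the "Adjust BP" step is rendered here as pinning BP just before the marker so that zero bytes are supplied from then on:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of Figures D.20 and D.21. A new byte is inserted into
       bits 8-15 of the 32-bit C-register (the high bits of C-low). */
    static uint32_t C;
    static const unsigned char *buf;
    static int bp, buflen;

    static void byte_in(void)
    {
        bp++;
        if (bp < buflen && buf[bp] == 0xFF) {
            bp++;                          /* Unstuff_0: inspect next byte */
            if (bp < buflen && buf[bp] == 0)
                C |= 0xFF00;               /* stuffed zero: keep the X'FF' */
            else
                bp -= 2;                   /* marker: pin BP before it so
                                              zero bytes are fed hereafter */
        } else if (bp < buflen)
            C += (uint32_t)buf[bp] << 8;
        /* past the end: nothing is added, i.e. zero bytes are supplied */
    }

    int main(void)
    {
        static const unsigned char seg[] = { 0x12, 0xFF, 0x00, 0xD9 };
        buf = seg; buflen = sizeof seg; bp = -1; C = 0;
        byte_in();                         /* inserts 0x12                */
        C <<= 8;                           /* stands in for eight
                                              renormalization shifts      */
        byte_in();                         /* X'FF' + stuffed 0: keep FF  */
        printf("C = %08X\n", (unsigned)C); /* prints C = 0012FF00         */
        return 0;
    }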
D.2.7 Initialization of the decoder
The Initdec procedure is used to start the arithmetic decoder. The basic steps are shown in Figure D.22.
[Flow chart, Initdec: initialize statistics areas; BP = BPST – 1; A = X’10000’ (see Note below); C = 0; Byte_in; C = SLL C 8; Byte_in; C = SLL C 8; CT = 0]
Figure D.22 – Initialization of the decoder
The estimation tables are defined by Table D.3. The statistics areas are initialized to an MPS sense of 0 and a Qe index of
zero as defined by Table D.3. BP, the pointer to the entropy-coded segment, is then initialized to point to the byte before
the start of the entropy-coded segment at BPST, and the interval register is set to the same starting value as in the encoder.
The first byte of compressed data is fetched and shifted into Cx. The second byte is then fetched and shifted into Cx. The
count is set to zero, so that a new byte of data will be fetched by Renorm_d.
NOTE – Although the probability interval is initialized to X’10000’ in both Initenc and Initdec, the precision of
the probability interval register can still be limited to 16 bits. When the precision of the interval register is 16 bits, it is initialized to
zero.
D.3 Bit ordering within bytes
The arithmetically encoded entropy-coded segment is an integer of variable length. Therefore, the ordering of bytes and
the bit ordering within bytes is the same as for parameters (see B.1.1.1).
Annex E
Encoder and decoder control procedures
(This annex forms an integral part of this Recommendation | International Standard)
This annex describes the encoder and decoder control procedures for the sequential, progressive, and lossless modes of
operation.
The encoding and decoding control procedures for the hierarchical processes are specified in Annex J.
NOTES
1   There is no requirement in this Specification that any encoder or decoder shall implement the procedures in precisely the manner specified by the flow charts in this annex. It is necessary only that an encoder or decoder implement the function specified in this annex. The sole criterion for an encoder or decoder to be considered in compliance with this Specification is that it satisfy the requirements given in clause 6 (for encoders) or clause 7 (for decoders), as determined by the compliance tests specified in Part 2.
2   Implementation-specific setup steps are not indicated in this annex and may be necessary.
E.1 Encoder control procedures
E.1.1 Control procedure for encoding an image
The encoder control procedure for encoding an image is shown in Figure E.1.
[Flow chart, Encode_image: Append SOI marker; Encode_frame; Append EOI marker]
Figure E.1 – Control procedure for encoding an image
E.1.2 Control procedure for encoding a frame
In all cases where markers are appended to the compressed data, optional X’FF’ fill bytes may precede the marker.
The control procedure for encoding a frame is oriented around the scans in the frame. The frame header is first appended,
and then the scans are coded. Table specifications and other marker segments may precede the SOFn marker, as indicated
by [tables/miscellaneous] in Figure E.2.
Figure E.2 shows the encoding process frame control procedure.
[Flow chart, Encode_frame: [Append tables/miscellaneous]; Append SOFn marker and rest of frame header; Encode_scan; after the first scan, optionally [Append DNL segment]; repeat Encode_scan while more scans remain]
Figure E.2 – Control procedure for encoding a frame
E.1.3 Control procedure for encoding a scan
A scan consists of a single pass through the data of each component in the scan. Table specifications and other marker
segments may precede the SOS marker. If more than one component is coded in the scan, the data are interleaved. If
restart is enabled, the data are segmented into restart intervals. If restart is enabled, a RSTm marker is placed in the coded
data between restart intervals. If restart is disabled, the control procedure is the same, except that the entire scan contains a
single restart interval. The compressed image data generated by a scan is always followed by a marker, either the EOI
marker or the marker of the next marker segment.
Figure E.3 shows the encoding process scan control procedure. The loop is terminated when the encoding process has
coded the number of restart intervals which make up the scan. “m” is the restart interval modulo counter needed for the
RSTm marker. The modulo arithmetic for this counter is shown after the “Append RSTm marker” procedure.
[Flow chart, Encode_scan: [Append tables/miscellaneous]; Append SOS marker and rest of scan header; m = 0; Encode_restart_interval; while more intervals remain: Append RSTm marker, m = (m + 1) AND 7, Encode_restart_interval]
Figure E.3 – Control procedure for encoding a scan
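As an informal illustration (not part of this Specification), a C sketch of the restart-marker bookkeeping in this loop; the RSTm markers are X'FFD0' through X'FFD7', and encode_restart_interval() is a placeholder for the procedure of E.1.4:

    #include <stdio.h>

    static void encode_restart_interval(FILE *f) { (void)f; /* E.1.4 */ }

    static void encode_scan(FILE *f, int nintervals)
    {
        int m = 0, i;
        for (i = 0; i < nintervals; i++) {
            encode_restart_interval(f);
            if (i < nintervals - 1) {      /* RSTm only between intervals */
                fputc(0xFF, f);
                fputc(0xD0 + m, f);        /* append RSTm                 */
                m = (m + 1) & 7;           /* m = (m + 1) AND 7           */
            }
        }
    }

    int main(void)
    {
        encode_scan(stdout, 3);            /* emits RST0 and RST1 between
                                              the three intervals         */
        return 0;
    }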
E.1.4 Control procedure for encoding a restart interval
Figure E.4 shows the encoding process control procedure for a restart interval. The loop is terminated either when the
encoding process has coded the number of minimum coded units (MCU) in the restart interval or when it has completed
the image scan.
[Flow chart: Encode_restart_interval – Reset_encoder; Encode_MCU until no more MCU remain; Prepare_for_marker; done.]
Figure E.4 – Control procedure for encoding a restart interval
The “Reset_encoder” procedure consists at least of the following:
a) if arithmetic coding is used, initialize the arithmetic encoder using the “Initenc” procedure described in D.1.7;
b) for DCT-based processes, set the DC prediction (PRED) to zero for all components in the scan (see F.1.1.5.1);
c) for lossless processes, reset the prediction to a default value for all components in the scan (see H.1.1);
d) do all other implementation-dependent setups that may be necessary.
The procedure “Prepare_for_marker” terminates the entropy-coded segment by:
a) padding a Huffman entropy-coded segment with 1-bits to complete the final byte (and if needed stuffing a zero byte) (see F.1.2.3); or
b) invoking the procedure “Flush” (see D.1.8) to terminate an arithmetic entropy-coded segment.
NOTE – The number of minimum coded units (MCU) in the final restart interval must be adjusted to match the number
of MCU in the scan. The number of MCU is calculated from the frame and scan parameters. (See Annex B.)
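NOTE – The following C fragment is an informative sketch of the scan-level control flow of Figures E.3 and E.4, not part of this Specification. The routine names append_marker, reset_encoder, encode_mcu and prepare_for_marker are hypothetical stand-ins for the procedures described above; restart_interval = 0 represents the case where restart is disabled and the entire scan is one restart interval.

    extern void append_marker(int marker);   /* writes X'FF', then the marker byte */
    extern void reset_encoder(void);         /* "Reset_encoder" of E.1.4 */
    extern void encode_mcu(void);            /* "Encode_MCU" of E.1.5 */
    extern void prepare_for_marker(void);    /* "Prepare_for_marker" of E.1.4 */

    void encode_scan_body(long total_mcus, long restart_interval)
    {
        int m = 0;                           /* modulo-8 counter for RSTm */
        while (total_mcus > 0) {
            long n = (restart_interval > 0) ? restart_interval : total_mcus;
            if (n > total_mcus)
                n = total_mcus;              /* final interval may be shorter */
            reset_encoder();
            for (long i = 0; i < n; i++)
                encode_mcu();
            prepare_for_marker();            /* pad or flush the entropy-coded segment */
            total_mcus -= n;
            if (total_mcus > 0) {            /* RSTm only between restart intervals */
                append_marker(0xD0 + m);     /* RST0..RST7 */
                m = (m + 1) & 7;             /* m = (m + 1) AND 7 */
            }
        }
    }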
E.1.5 Control procedure for encoding a minimum coded unit (MCU)
The minimum coded unit is defined in A.2. Within a given MCU the data units are coded in the order in which they occur
in the MCU. The control procedure for encoding a MCU is shown in Figure E.5.
[Flow chart: Encode_MCU – N = 0; repeat: N = N + 1, encode data unit, until N = Nb; done.]
Figure E.5 – Control procedure for encoding a minimum coded unit (MCU)
In Figure E.5, Nb refers to the number of data units in the MCU. The order in which data units occur in the MCU is
defined in A.2. The data unit is an 8 × 8 block for DCT-based processes, and a single sample for lossless processes.
The procedures for encoding a data unit are specified in Annexes F, G, and H.
E.2 Decoder control procedures
E.2.1 Control procedure for decoding compressed image data
Figure E.6 shows the decoding process control for compressed image data.
Decoding control centers around identification of various markers. The first marker must be the SOI (Start Of Image)
marker. The “Decoder_setup” procedure resets the restart interval (Ri = 0) and, if the decoder has arithmetic decoding
capabilities, sets the conditioning tables for the arithmetic coding to their default values. (See F.1.4.4.1.4 and F.1.4.4.2.1.)
The next marker is normally a SOFn (Start Of Frame) marker; if this is not found, one of the marker segments listed in
Table E.1 has been received.
[Flow chart: Decode_image – if the first marker is not SOI, error; otherwise Decoder_setup; then, until a SOFn marker is found, Interpret markers; Decode_frame; done.]
Figure E.6 – Control procedure for decoding compressed image data
Table E.1 – Markers recognized by “Interpret markers”
Marker   Purpose
DHT      Define Huffman Tables
DAC      Define Arithmetic Conditioning
DQT      Define Quantization Tables
DRI      Define Restart Interval
APPn     Application defined marker
COM      Comment
Note that optional X’FF’ fill bytes which may precede any marker shall be discarded before determining which marker is
present.
The additional logic to interpret these various markers is contained in the box labeled “Interpret markers”. DHT markers
shall be interpreted by processes using Huffman coding. DAC markers shall be interpreted by processes using arithmetic
coding. DQT markers shall be interpreted by DCT-based decoders. DRI markers shall be interpreted by all decoders.
APPn and COM markers shall be interpreted only to the extent that they do not interfere with the decoding.
By definition, the procedures in “Interpret markers” leave the system at the next marker. Note that if the expected SOI
marker is missing at the start of the compressed image data, an error condition has occurred. The techniques for detecting
and managing error conditions can be as elaborate or as simple as desired.
E.2.2 Control procedure for decoding a frame
Figure E.7 shows the control procedure for the decoding of a frame.
[Flow chart: Decode_frame – interpret frame header; until an SOS marker is found, Interpret markers; Decode_scan; repeat until the EOI marker is found; done.]
Figure E.7 – Control procedure for decoding a frame
The loop is terminated if the EOI marker is found at the end of the scan.
The markers recognized by “Interpret markers” are listed in Table E.1. Subclause E.2.1 describes the extent to which the
various markers shall be interpreted.
E.2.3 Control procedure for decoding a scan
Figure E.8 shows the decoding of a scan.
The loop is terminated when the expected number of restart intervals has been decoded.
[Flow chart: Decode_scan – interpret scan header; m = 0; Decode_restart_interval while more intervals remain; done.]
Figure E.8 – Control procedure for decoding a scan
E.2.4 Control procedure for decoding a restart interval
The procedure for decoding a restart interval is shown in Figure E.9. The “Reset_decoder” procedure consists at least of the following:
a) if arithmetic coding is used, initialize the arithmetic decoder using the “Initdec” procedure described in D.2.7;
b) for DCT-based processes, set the DC prediction (PRED) to zero for all components in the scan (see F.2.1.3.1);
c) for lossless processes, reset the prediction to a default value for all components in the scan (see H.2.1);
d) do all other implementation-dependent setups that may be necessary.
[Flow chart: Decode_restart_interval – Reset_decoder; Decode_MCU until no more MCU remain; Find marker; done.]
Figure E.9 – Control procedure for decoding a restart interval
At the end of the restart interval, the next marker is located. If a problem is detected in locating this marker, error handling procedures may be invoked. While such procedures are optional, the decoder shall be able to correctly recognize restart markers in the compressed data and reset the decoder when they are encountered. The decoder shall also be able to recognize the DNL marker, set the number of lines defined in the DNL segment, and end the “Decode_restart_interval” procedure.
NOTE – The final restart interval may be smaller than the size specified by the DRI marker segment, as it includes only the
number of MCUs remaining in the scan.
E.2.5 Control procedure for decoding a minimum coded unit (MCU)
The procedure for decoding a minimum coded unit (MCU) is shown in Figure E.10.
In Figure E.10 Nb is the number of data units in a MCU.
The procedures for decoding a data unit are specified in Annexes F, G, and H.
[Flow chart: Decode_MCU – N = 0; repeat: N = N + 1, Decode_data_unit, until N = Nb; done.]
Figure E.10 – Control procedure for decoding a minimum coded unit (MCU)
Annex F
Sequential DCT-based mode of operation
(This annex forms an integral part of this Recommendation | International Standard)
This annex provides a functional specification of the following coding processes for the sequential DCT-based mode of
operation:
1) baseline sequential;
2) extended sequential, Huffman coding, 8-bit sample precision;
3) extended sequential, arithmetic coding, 8-bit sample precision;
4) extended sequential, Huffman coding, 12-bit sample precision;
5) extended sequential, arithmetic coding, 12-bit sample precision.
For each of these, the encoding process is specified in F.1, and the decoding process is specified in F.2. The functional
specification is presented by means of specific flow charts for the various procedures which comprise these coding
processes.
NOTE – There is no requirement in this Specification that any encoder or decoder which embodies one of the above-named
processes shall implement the procedures in precisely the manner specified by the flow charts in this annex. It is necessary only that an
encoder or decoder implement the function specified in this annex. The sole criterion for an encoder or decoder to be considered in
compliance with this Specification is that it satisfy the requirements given in clause 6 (for encoders) or clause 7 (for decoders), as
determined by the compliance tests specified in Part 2.
F.1 Sequential DCT-based encoding processes
F.1.1 Sequential DCT-based control procedures and coding models
F.1.1.1 Control procedures for sequential DCT-based encoders
The control procedures for encoding an image and its constituent parts – the frame, scan, restart interval and
MCU – are given in Figures E.1 to E.5. The procedure for encoding a MCU (see Figure E.5) repetitively calls the
procedure for encoding a data unit. For DCT-based encoders the data unit is an 8 × 8 block of samples.
F.1.1.2 Procedure for encoding an 8 × 8 block data unit
For the sequential DCT-based processes encoding an 8 × 8 block data unit consists of the following procedures:
a) level shift, calculate forward 8 × 8 DCT and quantize the resulting coefficients using table destination specified in frame header;
b) encode DC coefficient for 8 × 8 block using DC table destination specified in scan header;
c) encode AC coefficients for 8 × 8 block using AC table destination specified in scan header.
F.1.1.3 Level shift and forward DCT (FDCT)
The mathematical definition of the FDCT is given in A.3.3.
Prior to computing the FDCT the input data are level shifted to a signed two’s complement representation as described in
A.3.1. For 8-bit input precision the level shift is achieved by subtracting 128. For 12-bit input precision the level shift is
achieved by subtracting 2048.
F.1.1.4 Quantization of the FDCT
The uniform quantization procedure described in Annex A is used to quantize the DCT coefficients. One of four
quantization tables may be used by the encoder. No default quantization tables are specified in this Specification.
However, some typical quantization tables are given in Annex K.
The quantized DCT coefficient values are signed, two’s complement integers with 11-bit precision for 8-bit input
precision and 15-bit precision for 12-bit input precision.
F.1.1.5 Encoding models for the sequential DCT procedures
The two-dimensional array of quantized DCT coefficients is rearranged in the zig-zag sequence order defined in A.3.6. The zig-zag order coefficients are denoted ZZ(0) through ZZ(63), with
ZZ(0) = Sq00, ZZ(1) = Sq01, ZZ(2) = Sq10, ..., ZZ(63) = Sq77
where the Sqvu are defined in Figure A.6.
Two coding procedures are used, one for the DC coefficient ZZ(0) and the other for the AC coefficients ZZ(1)..ZZ(63).
The coefficients are encoded in the order in which they occur in zig-zag sequence order, starting with the DC coefficient.
The coefficients are represented as two’s complement integers.
F.1.1.5.1 Encoding model for DC coefficients
The DC coefficients are coded differentially, using a one-dimensional predictor, PRED, which is the quantized DC value
from the most recently coded 8 × 8 block from the same component. The difference, DIFF, is obtained from
DIFF = ZZ(0) – PRED
At the beginning of the scan and at the beginning of each restart interval, the prediction for the DC coefficient is initialized to 0. (Recall that the input data have been level shifted to two’s complement representation.)
F.1.1.5.2 Encoding model for AC coefficients
Since many coefficients are zero, runs of zeros are identified and coded efficiently. In addition, if the remaining
coefficients in the zig-zag sequence order are all zero, this is coded explicitly as an end-of-block (EOB).
F.1.2 Baseline Huffman encoding procedures
The baseline encoding procedure is for 8-bit sample precision. The encoder may employ up to two DC and two AC
Huffman tables within one scan.
F.1.2.1 Huffman encoding of DC coefficients
F.1.2.1.1 Structure of DC code table
The DC code table consists of a set of Huffman codes (maximum length 16 bits) and appended additional bits (in most
cases) which can code any possible value of DIFF, the difference between the current DC coefficient and the prediction.
The Huffman codes for the difference categories are generated in such a way that no code consists entirely of 1-bits
(X’FF’ prefix marker code avoided).
The two’s complement difference magnitudes are grouped into 12 categories, SSSS, and a Huffman code is created for
each of the 12 difference magnitude categories (see Table F.1).
For each category, except SSSS = 0, an additional bits field is appended to the code word to uniquely identify which
difference in that category actually occurred. The number of extra bits is given by SSSS; the extra bits are appended to the
LSB of the preceding Huffman code, most significant bit first. When DIFF is positive, the SSSS low order bits of DIFF
are appended. When DIFF is negative, the SSSS low order bits of (DIFF – 1) are appended. Note that the most significant
bit of the appended bit sequence is 0 for negative differences and 1 for positive differences.
F.1.2.1.2 Defining Huffman tables for the DC coefficients
The syntax for specifying the Huffman tables is given in Annex B. The procedure for creating a code table from this
information is described in Annex C. No more than two Huffman tables may be defined for coding of DC coefficients.
Two examples of Huffman tables for coding of DC coefficients are provided in Annex K.
Table F.1 – Difference magnitude categories for DC coding

SSSS    DIFF values
0       0
1       –1,1
2       –3,–2,2,3
3       –7..–4,4..7
4       –15..–8,8..15
5       –31..–16,16..31
6       –63..–32,32..63
7       –127..–64,64..127
8       –255..–128,128..255
9       –511..–256,256..511
10      –1 023..–512,512..1 023
11      –2 047..–1 024,1 024..2 047

F.1.2.1.3 Huffman encoding procedures for DC coefficients
The encoding procedure is defined in terms of a set of extended tables, XHUFCO and XHUFSI, which contain the
complete set of Huffman codes and sizes for all possible difference values. For full 12-bit precision the tables are relatively
large. For the baseline system, however, the precision of the differences may be small enough to make this description
practical.
XHUFCO and XHUFSI are generated from the encoder tables EHUFCO and EHUFSI (see Annex C) by appending to the
Huffman codes for each difference category the additional bits that completely define the difference. By definition,
XHUFCO and XHUFSI have entries for each possible difference value. XHUFCO contains the concatenated bit pattern of
the Huffman code and the additional bits field; XHUFSI contains the total length in bits of this concatenated bit pattern.
Both are indexed by DIFF, the difference between the DC coefficient and the prediction.
The Huffman encoding procedure for the DC difference, DIFF, is:
SIZE = XHUFSI(DIFF)
CODE = XHUFCO(DIFF)
code SIZE bits of CODE
where DC is the quantized DC coefficient value and PRED is the predicted quantized DC value. The Huffman code
(CODE) (including any additional bits) is obtained from XHUFCO and SIZE (length of the code including additional
bits) is obtained from XHUFSI, using DIFF as the index to the two tables.
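NOTE – Informatively, the same step can be coded without the extended tables, using the EHUFCO and EHUFSI tables of Annex C directly. In the C sketch below, emit_bits is a hypothetical bit-output routine (one possible form is sketched in F.1.2.3); the category computation follows Table F.1, and the additional-bits rule is the one stated in F.1.2.1.1.

    extern void emit_bits(unsigned int code, int size);  /* hypothetical bit output */

    void encode_dc_diff(int diff, const unsigned int ehufco[], const int ehufsi[])
    {
        int ssss = 0;
        int mag = (diff < 0) ? -diff : diff;
        while (mag) {                        /* SSSS = number of bits in |DIFF| */
            ssss++;
            mag >>= 1;
        }
        emit_bits(ehufco[ssss], ehufsi[ssss]);
        if (ssss > 0) {                      /* append SSSS low order bits of */
            int v = (diff < 0) ? diff - 1 : diff;  /* DIFF, or DIFF - 1 if negative */
            emit_bits((unsigned int)v & ((1u << ssss) - 1), ssss);
        }
    }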
F.1.2.2 Huffman encoding of AC coefficients
F.1.2.2.1 Structure of AC code table
Each non-zero AC coefficient in ZZ is described by a composite 8-bit value, RS, of the form
RS = binary ’RRRRSSSS’
The 4 least significant bits, ’SSSS’, define a category for the amplitude of the next non-zero coefficient in ZZ, and the 4
most significant bits, ’RRRR’, give the position of the coefficient in ZZ relative to the previous non-zero coefficient (i.e.
the run-length of zero coefficients between non-zero coefficients). Since the run length of zero coefficients may exceed
15, the value ’RRRRSSSS’ = X’F0’ is defined to represent a run length of 15 zero coefficients followed by a coefficient
of zero amplitude. (This can be interpreted as a run length of 16 zero coefficients.) In addition, a special value
’RRRRSSSS’ = ’00000000’ is used to code the end-of-block (EOB), when all remaining coefficients in the block are
zero.
The general structure of the code table is illustrated in Figure F.1. The entries marked “N/A” are undefined for the
baseline procedure.
[Figure: two-dimensional array of composite values, with RRRR (0 to 15) as rows and SSSS (0 to 10) as columns; entry (RRRR = 0, SSSS = 0) is EOB, entry (RRRR = 15, SSSS = 0) is ZRL, and the remaining SSSS = 0 entries are N/A.]
Figure F.1 – Two-dimensional value array for Huffman coding
The magnitude ranges assigned to each value of SSSS are defined in Table F.2.
Table F.2 – Categories assigned to coefficient values
SSSS    AC coefficients
1       –1,1
2       –3,–2,2,3
3       –7..–4,4..7
4       –15..–8,8..15
5       –31..–16,16..31
6       –63..–32,32..63
7       –127..–64,64..127
8       –255..–128,128..255
9       –511..–256,256..511
10      –1 023..–512,512..1 023
The composite value, RRRRSSSS, is Huffman coded and each Huffman code is followed by additional bits which specify
the sign and exact amplitude of the coefficient.
The AC code table consists of one Huffman code (maximum length 16 bits, not including additional bits) for each
possible composite value. The Huffman codes for the 8-bit composite values are generated in such a way that no code
consists entirely of 1-bits.
The format for the additional bits is the same as in the coding of the DC coefficients. The value of SSSS gives the number
of additional bits required to specify the sign and precise amplitude of the coefficient. The additional bits are either the
low-order SSSS bits of ZZ(K) when ZZ(K) is positive or the low-order SSSS bits of ZZ(K) – 1 when ZZ(K) is negative.
ZZ(K) is the Kth coefficient in the zig-zag sequence of coefficients being coded.
F.1.2.2.2 Defining Huffman tables for the AC coefficients
The syntax for specifying the Huffman tables is given in Annex B. The procedure for creating a code table from this
information is described in Annex C.
In the baseline system no more than two Huffman tables may be defined for coding of AC coefficients. Two examples of
Huffman tables for coding of AC coefficients are provided in Annex K.
F.1.2.2.3 Huffman encoding procedures for AC coefficients
As defined in Annex C, the Huffman code table is assumed to be available as a pair of tables, EHUFCO (containing the
code bits) and EHUFSI (containing the length of each code in bits), both indexed by the composite value defined above.
The procedure for encoding the AC coefficients in a block is shown in Figures F.2 and F.3. In Figure F.2, K is the index
to the zig-zag scan position and R is the run length of zero coefficients.
The procedure “Append EHUFSI(X’F0’) bits of EHUFCO(X’F0’)” codes a run of 16 zero coefficients (ZRL code of
Figure F.1). The procedure “Code EHUFSI(0) bits of EHUFCO(0)” codes the end-of-block (EOB code). If the last
coefficient (K = 63) is not zero, the EOB code is bypassed.
CSIZE is a procedure which maps an AC coefficient to the SSSS value as defined in Table F.2.
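NOTE – The following C fragment is an informative sketch of the procedure of Figures F.2 and F.3, under the same assumptions as the DC sketch in F.1.2.1.3 (hypothetical emit_bits; EHUFCO and EHUFSI indexed by the composite value).

    extern void emit_bits(unsigned int code, int size);  /* hypothetical bit output */

    static int csize(int v)                  /* CSIZE: category of Table F.2 */
    {
        int s = 0;
        if (v < 0) v = -v;
        while (v) { s++; v >>= 1; }
        return s;
    }

    void encode_ac_coefficients(const int zz[64],
                                const unsigned int ehufco[], const int ehufsi[])
    {
        int r = 0;                           /* run length of zero coefficients */
        for (int k = 1; k <= 63; k++) {
            if (zz[k] == 0) { r++; continue; }
            while (r > 15) {                 /* ZRL: run of 16 zero coefficients */
                emit_bits(ehufco[0xF0], ehufsi[0xF0]);
                r -= 16;
            }
            int ssss = csize(zz[k]);
            int rs = (r << 4) + ssss;        /* composite value RRRRSSSS */
            emit_bits(ehufco[rs], ehufsi[rs]);
            int v = (zz[k] < 0) ? zz[k] - 1 : zz[k];
            emit_bits((unsigned int)v & ((1u << ssss) - 1), ssss);
            r = 0;
        }
        if (r > 0)                           /* trailing zeros: code the EOB */
            emit_bits(ehufco[0x00], ehufsi[0x00]);
    }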
F.1.2.3 Byte stuffing
In order to provide code space for marker codes which can be located in the compressed image data without decoding,
byte stuffing is used.
Whenever, in the course of normal encoding, the byte value X’FF’ is created in the code string, a X’00’ byte is stuffed
into the code string.
If a X’00’ byte is detected after a X’FF’ byte, the decoder must discard it. If the byte is not zero, a marker has been
detected, and shall be interpreted to the extent needed to complete the decoding of the scan.
Byte alignment of markers is achieved by padding incomplete bytes with 1-bits. If padding with 1-bits creates a X’FF’
value, a zero byte is stuffed before adding the marker.
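NOTE – One possible form of the bit-output routine assumed in the sketches above, showing the byte stuffing and 1-bit padding rules of this subclause. put_byte is a hypothetical routine writing one byte of compressed data; this is an informative sketch, not a required implementation.

    extern void put_byte(unsigned int b);    /* hypothetical byte output */

    static unsigned int bit_buffer = 0;      /* pending bits, MSB first */
    static int bit_count = 0;

    void emit_bits(unsigned int code, int size)
    {
        bit_buffer = (bit_buffer << size) | (code & ((1u << size) - 1));
        bit_count += size;
        while (bit_count >= 8) {
            unsigned int b = (bit_buffer >> (bit_count - 8)) & 0xFF;
            put_byte(b);
            if (b == 0xFF)
                put_byte(0x00);              /* stuff a zero byte after X'FF' */
            bit_count -= 8;
        }
    }

    void flush_bits(void)                    /* pad the final byte with 1-bits */
    {
        while (bit_count > 0)
            emit_bits(1, 1);                 /* the stuffing above handles a padded X'FF' */
    }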
F.1.3 Extended sequential DCT-based Huffman encoding process for 8-bit sample precision
This process is identical to the Baseline encoding process described in F.1.2, with the exception that the number of sets of
Huffman table destinations which may be used within the same scan is increased to four. Four DC and four AC Huffman
table destinations is the maximum allowed by this Specification.
F.1.4 Extended sequential DCT-based arithmetic encoding process for 8-bit sample precision
This subclause describes the use of arithmetic coding procedures in the sequential DCT-based encoding process.
NOTE – The arithmetic coding procedures in this Specification are defined for the maximum precision to encourage
interchangeability.
The arithmetic coding extensions have the same DCT model as the Baseline DCT encoder. Therefore, Annex F.1.1 also
applies to arithmetic coding. As with the Huffman coding technique, the binary arithmetic coding technique is lossless. It
is possible to transcode between the two systems without either FDCT or IDCT computations, and without modification of
the reconstructed image.
The basic principles of adaptive binary arithmetic coding are described in Annex D. Up to four DC and four AC
conditioning table destinations and associated statistics areas may be used within one scan.
The arithmetic encoding procedures for encoding binary decisions, initializing the statistics area, initializing the encoder,
terminating the code string, and adding restart markers are listed in Table D.1 of Annex D.
[Flow chart: Encode_AC_coefficients – K = 0, R = 0; loop: K = K + 1; if ZZ(K) = 0 then R = R + 1 and, if K = 63, append EHUFSI(X’00’) bits of EHUFCO(X’00’) (EOB) and done; if ZZ(K) ≠ 0 then while R > 15 append EHUFSI(X’F0’) bits of EHUFCO(X’F0’) (ZRL) and R = R – 16, then Encode_R,ZZ(K) and R = 0; done when K = 63.]
Figure F.2 – Procedure for sequential encoding of AC coefficients with Huffman coding
[Flow chart: Encode_R,ZZ(K) – SSSS = CSIZE(ZZ(K)); RS = (16 × R) + SSSS; append EHUFSI(RS) bits of EHUFCO(RS); if ZZ(K) < 0 then ZZ(K) = ZZ(K) – 1; append the SSSS low order bits of ZZ(K); done.]
Figure F.3 – Sequential encoding of a non-zero AC coefficient
Some of the procedures in Table D.1 are used in the higher level control structure for scans and restart intervals described
in Annex E. At the beginning of scans and restart intervals, the probability estimates used in the arithmetic coder are reset
to the standard initial value as part of the Initenc procedure which restarts the arithmetic coder. At the end of scans and
restart intervals, the Flush procedure is invoked to empty the code register before the next marker is appended.
F.1.4.1 Arithmetic encoding of DC coefficients
The basic structure of the decision sequence for encoding a DC difference value, DIFF, is shown in Figure F.4.
The context-index S0 and other context-indices used in the DC coding procedures are defined in Table F.4
(see F.1.4.4.1.3). A 0-decision is coded if the difference value is zero and a 1-decision is coded if the difference is not
zero. If the difference is not zero, the sign and magnitude are coded using the procedure Encode_V(S0), which is
described in F.1.4.3.1.
F.1.4.2 Arithmetic encoding of AC coefficients
The AC coefficients are coded in the order in which they occur in the zig-zag sequence ZZ(1,...,63). An end-of-block
(EOB) binary decision is coded before coding the first AC coefficient in ZZ, and after each non-zero coefficient. If the
EOB occurs, all remaining coefficients in ZZ are zero. Figure F.5 illustrates the decision sequence. The equivalent
procedure for the Huffman coder is found in Figure F.2.
[Flow chart: Encode_DC_DIFF – V = DIFF; if V = 0 then Code_0(S0); otherwise Code_1(S0) and Encode_V(S0); done.]
Figure F.4 – Coding model for arithmetic coding of DC difference
The context-indices SE and S0 used in the AC coding procedures are defined in Table F.5 (see F.1.4.4.2). In Figure F.5,
K is the index to the zig-zag sequence position. For the sequential scan, Kmin is 1 and Se is 63. The V = 0 decision is part
of a loop which codes runs of zero coefficients. Whenever the coefficient is non-zero, “Encode_V(S0)” codes the sign and
magnitude of the coefficient. Each time a non-zero coefficient is coded, it is followed by an EOB decision. If the EOB
occurs, a 1-decision is coded to indicate that the coding of the block is complete. If the coefficient for K = Se is not zero,
the EOB decision is skipped.
F.1.4.3 Encoding the binary decision sequence for non-zero DC differences and AC coefficients
Both the DC difference and the AC coefficients are represented as signed two’s complement integer values. The
decomposition of these signed integer values into a binary decision tree is done in the same way for both the DC and AC
coding models.
Although the binary decision trees for this section of the DC and AC coding models are the same, the statistical models
for assigning statistics bins to the binary decisions in the tree are quite different.
F.1.4.3.1 Structure of the encoding decision sequence
The encoding sequence can be separated into three procedures, a procedure which encodes the sign, a second procedure
which identifies the magnitude category, and a third procedure which identifies precisely which magnitude occurred
within the category identified in the second procedure.
At the point where the binary decision sequence in Encode_V(S0) starts, the coefficient or difference has already been
determined to be non-zero. That determination was made in the procedures in Figures F.4 and F.5.
Denoting either DC differences (DIFF) or AC coefficients as V, the non-zero signed integer value of V is encoded by the
sequence shown in Figure F.6. This sequence first codes the sign of V. It then (after converting V to a magnitude and
decrementing it by 1 to give Sz) codes the magnitude category of Sz (code_log2_Sz), and then codes the low order
magnitude bits (code_Sz_bits) to identify the exact magnitude value.
There are two significant differences between this sequence and the similar set of operations described in F.1.2 for
Huffman coding. First, the sign is encoded before the magnitude category is identified, and second, the magnitude is
decremented by 1 before the magnitude category is identified.
[Flow chart: Encode_AC_coefficients (arithmetic) – K = Kmin; loop: if K = EOB then Code_1(SE) and done, otherwise Code_0(SE); K = K + 1; while V = ZZ(K) is zero, Code_0(S0) and K = K + 1; Code_1(S0); Encode_V(S0); done if K = Se, otherwise return to the EOB decision.]
Figure F.5 – AC coding model for arithmetic coding
[Flow chart: Encode_V(S) – Encode_sign_of_V; Sz = |V| – 1; Encode_log2_Sz; Encode_Sz_bits; done.]
Figure F.6 – Sequence of procedures in encoding non-zero values of V
F.1.4.3.1.1 Encoding the sign
The sign is encoded by coding a 0-decision when the sign is positive and a 1-decision when the sign is negative
(see Figure F.7).
The context-indices SS, SN and SP are defined for DC coding in Table F.4 and for AC coding in Table F.5. After the sign
is coded, the context-index S is set to either SN or SP, establishing an initial value for Encode_log2_Sz.
F.1.4.3.1.2 Encoding the magnitude category
The magnitude category is determined by a sequence of binary decisions which compares Sz against an exponentially
increasing bound (which is a power of 2) in order to determine the position of the leading 1-bit. This establishes the
magnitude category in much the same way that the Huffman encoder generates a code for the value associated with the
difference category. The flow chart for this procedure is shown in Figure F.8.
The starting value of the context-index S is determined in Encode_sign_of_V, and the context-index values X1 and X2
are defined for DC coding in Table F.4 and for AC coding in Table F.5. In Figure F.8, M is the exclusive upper bound for
the magnitude and the abbreviations “SLL” and “SRL” refer to the shift-left-logical and shift-right-logical operations – in
this case by one bit position. The SRL operation at the completion of the procedure aligns M with the most significant bit
of Sz (see Table F.3).
The highest precision allowed for the DCT is 15 bits. Therefore, the highest precision required for the coding decision
tree is 16 bits for the DC coefficient difference and 15 bits for the AC coefficients, including the sign bit.
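NOTE – Informative C sketch of Figures F.7 and F.8 combined. Code_0 and Code_1 stand for the arithmetic-coder procedures of Annex D; the context-index values SS, SN, SP, X1 and X2 are those of Tables F.4 and F.5, and the function and parameter names are illustrative only.

    extern void code_0(int s);               /* Code_0(S) of Annex D */
    extern void code_1(int s);               /* Code_1(S) of Annex D */

    /* Returns the exclusive bound M, aligned by the final SRL with the
       most significant bit of Sz; *s_out is the context-index with
       which Encode_Sz_bits continues. */
    unsigned int encode_sign_and_log2_sz(int v, int ss, int sn, int sp,
                                         int x1, int x2, int *s_out)
    {
        int s;
        unsigned int sz, m;

        if (v < 0) { code_1(ss); s = sn; }   /* Figure F.7: sign of V */
        else       { code_0(ss); s = sp; }
        sz = (unsigned int)((v < 0) ? -v : v) - 1;

        m = 1;                               /* Figure F.8 */
        if (sz >= m) {
            code_1(s); m = 2; s = x1;
            if (sz >= m) {
                code_1(s); m = 4; s = x2;
                while (sz >= m) {            /* Code_1(S); M = SLL M 1; S = S + 1 */
                    code_1(s);
                    m <<= 1;
                    s += 1;
                }
            }
        }
        code_0(s);                           /* Sz < M: category established */
        m >>= 1;                             /* M = SRL M 1 */
        *s_out = s;
        return m;
    }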
[Flow chart: Encode_sign_of_V – if V < 0 then Code_1(SS) and S = SN, otherwise Code_0(SS) and S = SP; done.]
Figure F.7 – Encoding the sign of V
Table F.3 – Categories for each maximum bound

Exclusive upper bound (M)    Sz range              Number of low order magnitude bits
1                            0                     0
2                            1                     0
4                            2,3                   1
8                            4,...,7               2
16                           8,...,15              3
32                           16,...,31             4
64                           32,...,63             5
128                          64,...,127            6
256                          128,...,255           7
512                          256,...,511           8
1 024                        512,...,1 023         9
2 048                        1 024,...,2 047       10
4 096                        2 048,...,4 095       11
8 192                        4 096,...,8 191       12
16 384                       8 192,...,16 383      13
32 768                       16 384,...,32 767     14
[Flow chart: Encode_log2_Sz – M = 1; if Sz < M, Code_0(S) and exit with M = SRL M 1; otherwise Code_1(S), M = 2, S = X1; if Sz < M, Code_0(S) and exit as above; otherwise Code_1(S), M = 4, S = X2; then while Sz ≥ M: Code_1(S), M = SLL M 1, S = S + 1; finally Code_0(S) and M = SRL M 1; done.]
Figure F.8 – Decision sequence to establish the magnitude category
F.1.4.3.1.3 Encoding the exact value of the magnitude
After the magnitude category is encoded, the low order magnitude bits are encoded. These bits are encoded in order of decreasing bit significance. The procedure is shown in Figure F.9. The abbreviation “SRL” indicates the shift-right-logical operation, and M is the exclusive bound established in Figure F.8. Note that M has only one bit set – shifting M right converts it into a bit mask for the logical “AND” operation.
The starting value of the context-index S is determined in Encode_log2_Sz. The increment of S by 14 at the beginning of this procedure sets the context-index to the value required in Tables F.4 and F.5.
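NOTE – Informative C sketch of Figure F.9, under the same assumptions as the sketch in F.1.4.3.1.2; sz and m are the values left by Encode_log2_Sz.

    extern void code_0(int s);
    extern void code_1(int s);

    void encode_sz_bits(unsigned int sz, unsigned int m, int s)
    {
        s += 14;                             /* select the M2..M15 statistics bins */
        while ((m >>= 1) != 0) {             /* M = SRL M 1; done when M = 0 */
            if (m & sz) code_1(s);           /* T = M AND Sz */
            else        code_0(s);
        }
    }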
[Flow chart: Encode_Sz_bits – S = S + 14; loop: M = SRL M 1; while M ≠ 0: T = M AND Sz; if T = 0 then Code_0(S), otherwise Code_1(S); done when M = 0.]
Figure F.9 – Decision sequence to code the magnitude bit pattern
F.1.4.4 Statistical models
An adaptive binary arithmetic coder requires a statistical model. The statistical model defines the contexts which are used
to select the conditional probability estimates used in the encoding and decoding procedures.
Each decision in the binary decision trees is associated with one or more contexts. These contexts identify the sense of the
MPS and the index in Table D.3 of the conditional probability estimate Qe which is used to encode and decode the binary
decision.
The arithmetic coder is adaptive, which means that the probability estimates for each context are developed and
maintained by the arithmetic coding system on the basis of prior coding decisions for that context.
F.1.4.4.1 Statistical model for coding DC prediction differences
The statistical model for coding the DC difference conditions some of the probability estimates for the binary decisions on
previous DC coding decisions.
F.1.4.4.1.1 Statistical conditioning on sign
In coding the DC coefficients, four separate statistics bins (probability estimates) are used in coding the zero/not-zero (V =
0) decision, the sign decision and the first magnitude category decision. Two of these bins are used to code the V = 0
decision and the sign decision. The other two bins are used in coding the first magnitude decision, Sz < 1; one of these
bins is used when the sign is positive, and the other is used when the sign is negative. Thus, the first magnitude decision
probability estimate is conditioned on the sign of V.
F.1.4.4.1.2 Statistical conditioning on DC difference in previous block
The probability estimates for these first three decisions are also conditioned on Da, the difference value coded for the
previous DCT block of the same component. The differences are classified into five groups: zero, small positive, small
negative, large positive and large negative. The relationship between the default classification and the quantization scale is
shown in Figure F.10.
[Figure: conditioning classification of DC difference values along the difference axis – with the default bounds, differences –2 and –1 are “– small”, +1 and +2 are “+ small”, 0 is “zero”, and magnitudes of 3 or more are “– large” or “+ large”.]
Figure F.10 – Conditioning classification of difference values
The bounds for the “small” difference category determine the classification. Defining L and U as integers in the range 0 to 15 inclusive, the lower bound (exclusive) for difference magnitudes classified as “small” is zero for L = 0, and is 2^(L–1) for L > 0.
The upper bound (inclusive) for difference magnitudes classified as “small” is 2^U.
L shall be less than or equal to U.
These bounds for the conditioning category provide a segmentation which is identical to that listed in Table F.3.
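NOTE – As an informative illustration, with the default bounds L = 0 and U = 1 the classification can be computed as below; “small” then means 0 < |Da| ≤ 2. This subclause does not spell out which of the offsets 0, 4, 8, 12 and 16 corresponds to which class, so the particular assignment in this sketch is an assumption.

    int dc_context(int da)                   /* sketch of DC_Context(Da), L = 0, U = 1 */
    {
        if (da == 0)
            return 0;                        /* zero */
        if (da > 0)
            return (da <= 2) ? 4 : 12;       /* + small : + large (assumed order) */
        else
            return (da >= -2) ? 8 : 16;      /* - small : - large (assumed order) */
    }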
F.1.4.4.1.3 Assignment of statistical bins to the DC binary decision tree
As shown in Table F.4, each statistics area for DC coding consists of a set of 49 statistics bins. In the following
explanation, it is assumed that the bins are contiguous. The first 20 bins consist of five sets of four bins selected by a
context-index S0. The value of S0 is given by DC_Context(Da), which provides a value of 0, 4, 8, 12 or 16, depending on
the difference classification of Da (see F.1.4.4.1.2). The remaining 29 bins, X1,...,X15,M2,...,M15, are used to code
magnitude category decisions and magnitude bits.
Table F.4 – Statistical model for DC coefficient coding

Context-index    Value             Coding decision
S0               DC_Context(Da)    V = 0
SS               S0 + 1            Sign of V
SP               S0 + 2            Sz < 1 if V > 0
SN               S0 + 3            Sz < 1 if V < 0
X1               20                Sz < 2
X2               X1 + 1            Sz < 4
X3               X1 + 2            Sz < 8
...              ...               ...
X15              X1 + 14           Sz < 2^15
M2               X2 + 14           Magnitude bits if Sz < 4
M3               X3 + 14           Magnitude bits if Sz < 8
...              ...               ...
M15              X15 + 14          Magnitude bits if Sz < 2^15

F.1.4.4.1.4 Default conditioning for DC statistical model
The bounds, L and U, for determining the conditioning category have the default values L = 0 and U = 1. Other bounds
may be set using the DAC (Define Arithmetic coding Conditioning) marker segment, as described in Annex B.
F.1.4.4.1.5 Initial conditions for DC statistical model
At the start of a scan and at the beginning of each restart interval, the difference for the previous DC value is defined to be
zero in determining the conditioning state.
F.1.4.4.2 Statistical model for coding the AC coefficients
As shown in Table F.5, each statistics area for AC coding consists of a contiguous set of 245 statistics bins. Three bins are
used for each value of the zig-zag index K, and two sets of 28 additional bins X2,...,X15,M2,...,M15 are used for coding
the magnitude category and magnitude bits.
The value of SE (and also S0, SP and SN) is determined by the zig-zag index K. Since K is in the range 1 to 63, the
lowest value for SE is 0 and the largest value for SP is 188. SS is not assigned a value in AC coefficient coding, as the
signs of the coefficients are coded with a fixed probability value of approximately 0.5 (Qe = X’5A1D’, MPS = 0).
The value of X2 is given by AC_Context(K). This gives X2 = 189 when K ≤ Kx and X2 = 217 when K > Kx, where Kx is
defined using the DAC marker segment (see B.2.4.3).
Note that a X1 statistics bin is not used in this sequence. Instead, the 63 × 1 array of statistics bins for the magnitude
category is used for two decisions. Once the magnitude bound has been determined – at statistics bin Xn, for example – a
single statistics bin, Mn, is used to code the magnitude bit sequence for that bound.
F.1.4.4.2.1 Default conditioning for AC coefficient coding
The default value of Kx is 5. This may be modified using the DAC marker segment, as described in Annex B.
F.1.4.4.2.2 Initial conditions for AC statistical model
At the start of a scan and at each restart, all statistics bins are re-initialized to the standard default value described in
Annex D.
Table F.5 – Statistical model for AC coefficient coding

Context-index    Value             Coding decision
SE               3 × (K – 1)       K = EOB
S0               SE + 1            V = 0
SS               Fixed estimate    Sign of V
SN,SP            S0 + 1            Sz < 1
X1               S0 + 1            Sz < 2
X2               AC_Context(K)     Sz < 4
X3               X2 + 1            Sz < 8
...              ...               ...
X15              X2 + 13           Sz < 2^15
M2               X2 + 14           Magnitude bits if Sz < 4
M3               X3 + 14           Magnitude bits if Sz < 8
...              ...               ...
M15              X15 + 14          Magnitude bits if Sz < 2^15

F.1.5 Extended sequential DCT-based Huffman encoding process for 12-bit sample precision
This process is identical to the sequential DCT process for 8-bit precision extended to four Huffman table destinations as
documented in F.1.3, with the following changes.
F.1.5.1 Structure of DC code table for 12-bit sample precision
The two’s complement difference magnitudes are grouped into 16 categories, SSSS, and a Huffman code is created for
each of the 16 difference magnitude categories.
The Huffman table for DC coding (see Table F.1) is extended as shown in Table F.6.
Table F.6 – Difference magnitude categories for DC coding

SSSS    Difference values
12      –4 095..–2 048,2 048..4 095
13      –8 191..–4 096,4 096..8 191
14      –16 383..–8 192,8 192..16 383
15      –32 767..–16 384,16 384..32 767

F.1.5.2 Structure of AC code table for 12-bit sample precision
The general structure of the code table is extended as illustrated in Figure F.11. The Huffman table for AC coding is
extended as shown in Table F.7.
[Figure: two-dimensional array of composite values for 12-bit precision, with RRRR (0 to 15) as rows and SSSS (0 to 14) as columns; entry (RRRR = 0, SSSS = 0) is EOB, entry (RRRR = 15, SSSS = 0) is ZRL, and the remaining SSSS = 0 entries are N/A.]
Figure F.11 – Two-dimensional value array for Huffman coding
Table F.7 – Values assigned to coefficient amplitude ranges

SSSS    AC coefficients
11      –2 047..–1 024,1 024..2 047
12      –4 095..–2 048,2 048..4 095
13      –8 191..–4 096,4 096..8 191
14      –16 383..–8 192,8 192..16 383

F.1.6 Extended sequential DCT-based arithmetic encoding process for 12-bit sample precision
The process is identical to the sequential DCT process for 8-bit precision except for changes in the precision of the FDCT
computation.
The structure of the encoding procedure is identical to that specified in F.1.4 which was already defined for a 12-bit
sample precision.
F.2 Sequential DCT-based decoding processes
F.2.1 Sequential DCT-based control procedures and coding models
F.2.1.1 Control procedures for sequential DCT-based decoders
The control procedures for decoding compressed image data and its constituent parts – the frame, scan, restart interval and
MCU – are given in Figures E.6 to E.10. The procedure for decoding a MCU (Figure E.10) repetitively calls the
procedure for decoding a data unit. For DCT-based decoders the data unit is an 8 × 8 block of samples.
F.2.1.2 Procedure for decoding an 8 × 8 block data unit
In the sequential DCT-based decoding process, decoding an 8 × 8 block data unit consists of the following procedures:
a) decode DC coefficient for 8 × 8 block using the DC table destination specified in the scan header;
b) decode AC coefficients for 8 × 8 block using the AC table destination specified in the scan header;
c) dequantize using table destination specified in the frame header and calculate the inverse 8 × 8 DCT.
F.2.1.3 Decoding models for the sequential DCT procedures
Two decoding procedures are used, one for the DC coefficient ZZ(0) and the other for the AC coefficients ZZ(1)...ZZ(63).
The coefficients are decoded in the order in which they occur in the zig-zag sequence order, starting with the DC
coefficient. The coefficients are represented as two’s complement integers.
F.2.1.3.1 Decoding model for DC coefficients
The decoded difference, DIFF, is added to PRED, the DC value from the most recently decoded 8 × 8 block from the
same component. Thus ZZ(0) = PRED + DIFF.
At the beginning of the scan and at the beginning of each restart interval, the prediction for the DC coefficient is
initialized to zero.
F.2.1.3.2 Decoding model for AC coefficients
The AC coefficients are decoded in the order in which they occur in ZZ. When the EOB is decoded, all remaining
coefficients in ZZ are initialized to zero.
F.2.1.4 Dequantization of the quantized DCT coefficients
The dequantization of the quantized DCT coefficients, as described in Annex A, is accomplished by multiplying each
quantized coefficient value by the quantization table value for that coefficient. The decoder shall be able to use up to four
quantization table destinations.
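NOTE – A minimal informative sketch of this step; the 64-element table layout (zig-zag or natural order) is whatever the implementation uses consistently on both operands.

    void dequantize_block(int coef[64], const unsigned short qtable[64])
    {
        for (int i = 0; i < 64; i++)
            coef[i] *= qtable[i];            /* R(v,u) = Sq(v,u) * Q(v,u) */
    }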
F.2.1.5 Inverse DCT (IDCT)
The mathematical definition of the IDCT is given in A.3.3.
After computation of the IDCT, the signed output samples are level-shifted, as described in Annex A, converting the
output to an unsigned representation. For 8-bit precision the level shift is performed by adding 128. For 12-bit precision
the level shift is performed by adding 2 048. If necessary, the output samples shall be clamped to stay within the range
appropriate for the precision (0 to 255 for 8-bit precision and 0 to 4 095 for 12-bit precision).
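NOTE – Informative sketch of the output stage for 8-bit precision; for 12-bit samples the constants become 2 048 and 4 095.

    unsigned char level_shift_and_clamp_8(int idct_sample)
    {
        int v = idct_sample + 128;           /* undo the encoder's level shift */
        if (v < 0)   v = 0;                  /* clamp to the sample range */
        if (v > 255) v = 255;
        return (unsigned char)v;
    }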
F.2.2 Baseline Huffman decoding procedures
The baseline decoding procedure is for 8-bit sample precision. The decoder shall be capable of using up to two DC and
two AC Huffman tables within one scan.
F.2.2.1 Huffman decoding of DC coefficients
The decoding procedure for the DC difference, DIFF, is:
T = DECODE
DIFF = RECEIVE(T)
DIFF = EXTEND(DIFF,T)
where DECODE is a procedure which returns the 8-bit value associated with the next Huffman code in the compressed
image data (see F.2.2.3) and RECEIVE(T) is a procedure which places the next T bits of the serial bit string into the low
order bits of DIFF, MSB first. If T is zero, DIFF is set to zero. EXTEND is a procedure which converts the partially
decoded DIFF value of precision T to the full precision difference. EXTEND is shown in Figure F.12.
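NOTE – The EXTEND procedure translates directly into C; in this informative sketch, -(1 << t) + 1 is the value (SLL –1 T) + 1 of Figure F.12, written so as to avoid shifting a negative operand.

    int extend(int v, int t)
    {
        if (t == 0)
            return 0;                        /* T = 0: DIFF is zero */
        int vt = 1 << (t - 1);               /* Vt = 2^(T-1) */
        if (v < vt)                          /* leading bit 0: negative value */
            v += -(1 << t) + 1;              /* Vt = (SLL -1 T) + 1 */
        return v;
    }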
[Flow chart: EXTEND(V,T) – Vt = 2^(T–1); if V < Vt then Vt = (SLL –1 T) + 1 and V = V + Vt; return V.]
Figure F.12 – Extending the sign bit of a decoded value in V
F.2.2.2 Decoding procedure for AC coefficients
The decoding procedure for AC coefficients is shown in Figures F.13 and F.14.
[Flow chart: Decode_AC_coefficients – K = 1; ZZ(1,...,63) = 0; loop: RS = DECODE; SSSS = RS modulo 16; RRRR = SRL RS 4; R = RRRR; if SSSS = 0 then (if R = 15, K = K + 16 and continue, otherwise done); if SSSS ≠ 0 then K = K + R, Decode_ZZ(K), and if K = 63 done, otherwise K = K + 1 and continue.]
Figure F.13 – Huffman decoding procedure for AC coefficients
[Flow chart: Decode_ZZ(K) – ZZ(K) = RECEIVE(SSSS); ZZ(K) = EXTEND(ZZ(K),SSSS); done.]
Figure F.14 – Decoding a non-zero AC coefficient
The decoding of the amplitude and sign of the non-zero coefficient is done in the procedure “Decode_ZZ(K)”, shown in
Figure F.14.
DECODE is a procedure which returns the value, RS, associated with the next Huffman code in the code stream
(see F.2.2.3). The values SSSS and R are derived from RS. The value of SSSS is the four low order bits of the composite
value and R contains the value of RRRR (the four high order bits of the composite value). The interpretation of these
values is described in F.1.2.2. EXTEND is shown in Figure F.12.
F.2.2.3 The DECODE procedure
The DECODE procedure decodes an 8-bit value which, for the DC coefficient, determines the difference magnitude
category. For the AC coefficient this 8-bit value determines the zero run length and non-zero coefficient category.
Three tables, HUFFVAL, HUFFCODE, and HUFFSIZE, have been defined in Annex C. This particular implementation
of DECODE makes use of the ordering of the Huffman codes in HUFFCODE according to both value and code size.
Many other implementations of DECODE are possible.
NOTE – The values in HUFFVAL are assigned to each code in HUFFCODE and HUFFSIZE in sequence. There are no
ordering requirements for the values in HUFFVAL which have assigned codes of the same length.
The implementation of DECODE described in this subclause uses three tables, MINCODE, MAXCODE and VALPTR,
to decode a pointer to the HUFFVAL table. MINCODE, MAXCODE and VALPTR each have 16 entries, one for each
possible code size. MINCODE(I) contains the smallest code value for a given length I, MAXCODE(I) contains the largest
code value for a given length I, and VALPTR(I) contains the index to the start of the list of values in HUFFVAL which
are decoded by code words of length I. The values in MINCODE and MAXCODE are signed 16-bit integers; therefore, a
value of –1 sets all of the bits.
The procedure for generating these tables is shown in Figure F.15. The procedure for DECODE is shown in Figure F.16.
Note that the 8-bit “VALUE” is returned to the procedure which invokes DECODE.
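NOTE – Informative C sketch of Figures F.15 and F.16. BITS, HUFFCODE and HUFFVAL are the Annex C arrays; nextbit is assumed to behave as NEXTBIT of F.2.2.5. Array bounds checking and error handling are omitted.

    extern int nextbit(void);                /* NEXTBIT of F.2.2.5 (assumed) */

    static int mincode[17];                  /* indexed by code length 1..16 */
    static int maxcode[17];                  /* -1 marks lengths with no codes */
    static int valptr[17];

    void decoder_tables(const unsigned char bits[17],  /* BITS(1..16) */
                        const unsigned int huffcode[]) /* HUFFCODE */
    {
        int j = 0;
        for (int i = 1; i <= 16; i++) {
            if (bits[i] == 0) {
                maxcode[i] = -1;
            } else {
                valptr[i]  = j;
                mincode[i] = (int)huffcode[j];
                j += bits[i] - 1;
                maxcode[i] = (int)huffcode[j];
                j++;
            }
        }
    }

    int decode(const unsigned char huffval[])          /* returns VALUE */
    {
        int i = 1;
        int code = nextbit();
        while (code > maxcode[i]) {          /* lengthen the code until it fits */
            i++;
            code = (code << 1) + nextbit();
        }
        return huffval[valptr[i] + code - mincode[i]];
    }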
[Flow chart: Decoder_tables – I = 0, J = 0; loop: I = I + 1; done when I > 16; if BITS(I) = 0 then MAXCODE(I) = –1; otherwise VALPTR(I) = J, MINCODE(I) = HUFFCODE(J), J = J + BITS(I) – 1, MAXCODE(I) = HUFFCODE(J), J = J + 1.]
Figure F.15 – Decoder table generation
[Flow chart: DECODE – I = 1; CODE = NEXTBIT; while CODE > MAXCODE(I): I = I + 1, CODE = (SLL CODE 1) + NEXTBIT; J = VALPTR(I); J = J + CODE – MINCODE(I); VALUE = HUFFVAL(J); return VALUE.]
Figure F.16 – Procedure for DECODE
F.2.2.4 The RECEIVE procedure
RECEIVE(SSSS) is a procedure which places the next SSSS bits of the entropy-coded segment into the low order bits of
DIFF, MSB first. It calls NEXTBIT and it returns the value of DIFF to the calling procedure (see Figure F.17).
[Flow chart: RECEIVE(SSSS) – I = 0, V = 0; repeat: I = I + 1, V = (SLL V 1) + NEXTBIT, until I = SSSS; return V.]
Figure F.17 – Procedure for RECEIVE(SSSS)
F.2.2.5 The NEXTBIT procedure
NEXTBIT reads the next bit of compressed data and passes it to higher level routines. It also intercepts and removes stuff
bytes and detects markers. NEXTBIT reads the bits of a byte starting with the MSB (see Figure F.18).
Before starting the decoding of a scan, and after processing a RST marker, CNT is cleared. The compressed data are read
one byte at a time, using the procedure NEXTBYTE. Each time a byte, B, is read, CNT is set to 8.
The only valid marker which may occur within the Huffman coded data is the RSTm marker. Other than the EOI or
markers which may occur at or before the start of a scan, the only marker which can occur at the end of the scan is the
DNL (define-number-of-lines).
Normally, the decoder will terminate the decoding at the end of the final restart interval before the terminating marker is
intercepted. If the DNL marker is encountered, the current line count is set to the value specified by that marker. Since the
DNL marker can only be used at the end of the first scan, the scan decode procedure must be terminated when it is
encountered.
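NOTE – Informative C sketch of NEXTBIT. next_byte is a hypothetical routine reading the next byte of compressed data; the DNL marker is X'FFDC', and the error handling of Figure F.18 is reproduced only in outline.

    #include <stdlib.h>

    extern unsigned int next_byte(void);              /* hypothetical byte input */
    extern void process_dnl_and_terminate_scan(void); /* hypothetical DNL handler */

    static unsigned int cur_byte;            /* B of Figure F.18 */
    static int cnt = 0;                      /* CNT of Figure F.18 */

    int nextbit(void)
    {
        if (cnt == 0) {
            cur_byte = next_byte();
            cnt = 8;
            if (cur_byte == 0xFF) {
                unsigned int b2 = next_byte();
                if (b2 != 0) {               /* not a stuffed zero byte */
                    if (b2 == 0xDC)          /* DNL marker */
                        process_dnl_and_terminate_scan();
                    else
                        exit(EXIT_FAILURE);  /* error: unexpected marker */
                }
            }
        }
        int bit = (cur_byte >> 7) & 1;       /* BIT = SRL B 7 */
        cnt--;
        cur_byte = (cur_byte << 1) & 0xFF;   /* B = SLL B 1 */
        return bit;
    }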
[Flow chart: NEXTBIT – if CNT = 0 then B = NEXTBYTE and CNT = 8; if B = X’FF’ then B2 = NEXTBYTE: if B2 = 0 continue, if B2 = DNL process the DNL marker and terminate the scan, otherwise error; BIT = SRL B 7; CNT = CNT – 1; B = SLL B 1; return BIT.]
Figure F.18 – Procedure for fetching the next bit of compressed data
F.2.3 Sequential DCT decoding process with 8-bit precision extended to four sets of Huffman tables
This process is identical to the Baseline decoding process described in F.2.2, with the exception that the decoder shall be
capable of using up to four DC and four AC Huffman tables within one scan. Four DC and four AC Huffman tables is the
maximum allowed by this Specification.
F.2.4 Sequential DCT decoding process with arithmetic coding
This subclause describes the sequential DCT decoding process with arithmetic decoding.
The arithmetic decoding procedures for decoding binary decisions, initializing the statistical model, initializing the
decoder, and resynchronizing the decoder are listed in Table D.4 of Annex D.
Some of the procedures in Table D.4 are used in the higher level control structure for scans and restart intervals described
in F.2. At the beginning of scans and restart intervals, the probability estimates used in the arithmetic decoder are reset to
the standard initial value as part of the Initdec procedure which restarts the arithmetic coder.
The statistical models defined in F.1.4.4 also apply to this decoding process.
The decoder shall be capable of using up to four DC and four AC conditioning tables and associated statistics areas within
one scan.
F.2.4.1 Arithmetic decoding of DC coefficients
The basic structure of the decision sequence for decoding a DC difference value, DIFF, is shown in Figure F.19. The
equivalent structure for the encoder is found in Figure F.4.
[Flow chart: Decode_DC_DIFF – D = Decode(S0); if D = 0 then DIFF = 0; otherwise Decode_V(S0) and DIFF = V; done.]
Figure F.19 – Arithmetic decoding of DC difference
The context-indices used in the DC decoding procedures are defined in Table F.4 (see F.1.4.4.1.3).
The “Decode” procedure returns the value “D” of the binary decision. If the value is not zero, the sign and magnitude of
the non-zero DIFF must be decoded by the procedure “Decode_V(S0)”.
F.2.4.2 Arithmetic decoding of AC coefficients
The AC coefficients are decoded in the order that they occur in ZZ(1,...,63). The encoder procedure for the coding process
is found in Figure F.5. Figure F.20 illustrates the decoding sequence.
[Flow chart: Decode_AC_coefficients (arithmetic) – K = Kmin; loop: D = Decode(SE); if D = 1 done; K = K + 1; while D = Decode(S0) is zero, K = K + 1; Decode_V(S0); ZZ(K) = V; done if K = Se, otherwise return to the EOB decision.]
Figure F.20 – Procedure for decoding the AC coefficients
The context-indices used in the AC decoding procedures are defined in Table F.5 (see F.1.4.4.2).
In Figure F.20, K is the index to the zig-zag sequence position. For the sequential scan, Kmin = 1 and Se = 63. The
decision at the top of the loop is the EOB decision. If the EOB occurs (D = 1), the remaining coefficients in the block are
set to zero. The inner loop just below the EOB decoding decodes runs of zero coefficients. Whenever the coefficient is
non-zero, “Decode_V” decodes the sign and magnitude of the coefficient. After each non-zero coefficient is decoded, the
EOB decision is again decoded unless K = Se.
F.2.4.3 Decoding the binary decision sequence for non-zero DC differences and AC coefficients
Both the DC difference and the AC coefficients are represented as signed two’s complement 16-bit integer values. The
decoding decision tree for these signed integer values is the same for both the DC and AC coding models. Note, however,
that the statistical models are not the same.
F.2.4.3.1 Arithmetic decoding of non-zero values
Denoting either DC differences or AC coefficients as V, the non-zero signed integer value of V is decoded by the
sequence shown in Figure F.21. This sequence first decodes the sign of V. It then decodes the magnitude category of V
(Decode_log2_Sz), and then decodes the low order magnitude bits (Decode_Sz_bits). Note that the value decoded for Sz
must be incremented by 1 to get the actual coefficient magnitude.
[Flow chart: Decode_V(S) – Decode_sign_of_V; Decode_log2_Sz; Decode_Sz_bits; V = Sz + 1; if SIGN = 1 then V = –V; done.]
Figure F.21 – Sequence of procedures in decoding non-zero values of V
F.2.4.3.1.1 Decoding the sign
The sign is decoded by the procedure shown in Figure F.22.
The context-indices are defined for DC decoding in Table F.4 and AC decoding in Table F.5.
If SIGN = 0, the sign of the coefficient is positive; if SIGN = 1, the sign of the coefficient is negative.
[Flow chart: Decode_sign_of_V – SIGN = Decode(SS); if SIGN = 1 then S = SN, otherwise S = SP; done.]
Figure F.22 – Decoding the sign of V
F.2.4.3.1.2 Decoding the magnitude category
The context-index S is set in Decode_sign_of_V and the context-index values X1 and X2 are defined for DC coding in
Table F.4 and for AC coding in Table F.5.
In Figure F.23, M is set to the upper bound for the magnitude and shifted left until the decoded decision is zero. It is then
shifted right by 1 to become the leading bit of the magnitude of Sz.
[Flow chart: Decode_log2_Sz – M = 1; if Decode(S) = 0, set M = SRL M 1 and Sz = M; otherwise M = 2, S = X1; if Decode(S) = 0, exit as above; otherwise M = 4, S = X2; then while Decode(S) = 1: M = SLL M 1, S = S + 1; finally M = SRL M 1 and Sz = M; done.]
Figure F.23 – Decoding procedure to establish the magnitude category
F.2.4.3.1.3 Decoding the exact value of the magnitude
After the magnitude category is decoded, the low order magnitude bits are decoded. These bits are decoded in order of
decreasing bit significance. The procedure is shown in Figure F.24.
The context-index S is set in Decode_log2_Sz.
[Flow chart: Decode_Sz_bits – S = S + 14; loop: M = SRL M 1; while M ≠ 0: D = Decode(S); if D ≠ 0 then Sz = M OR Sz; done when M = 0.]
Figure F.24 – Decision sequence to decode the magnitude bit pattern
F.2.4.4 Decoder restart
The RSTm markers which are added to the compressed data between restart intervals have a two-byte value which cannot be generated by the coding procedures. These two-byte sequences can be located without decoding, and can therefore be used to resynchronize the decoder. RSTm markers are thus usable for error recovery.
Before error recovery procedures can be invoked, the error condition must first be detected. Errors during decoding can show up in two places:
a) the decoder fails to find the expected marker at the point where it is expecting resynchronization;
b) physically impossible data are decoded. For example, decoding a magnitude beyond the range of values allowed by the model is quite likely when the compressed data are corrupted by errors. For arithmetic decoders this error condition is extremely important to detect, as otherwise the decoder may reach a condition where it uses the compressed data very slowly.
NOTE – Some errors will not cause the decoder to lose synchronization. In addition, recovery is not possible for all errors; for example, errors in the headers are likely to be catastrophic. The two error conditions listed above, however, almost always cause the decoder to lose synchronization in a way which permits recovery.
In regaining synchronization, the decoder can make use of the modulo 8 coding restart interval number in the low order
bits of the RSTm marker. By comparing the expected restart interval number to the value in the next RSTm marker in the
compressed image data, the decoder can usually recover synchronization. It then fills in missing lines in the output data by
replication or some other suitable procedure, and continues decoding. Of course, the reconstructed image will usually be
highly corrupted for at least a part of the restart interval where the error occurred.
F.2.5 Sequential DCT decoding process with Huffman coding and 12-bit precision
This process is identical to the sequential DCT process defined for 8-bit sample precision and extended to four Huffman
tables, as documented in F.2.3, but with the following changes.
F.2.5.1 Structure of DC Huffman decode table
The general structure of the DC Huffman decode table is extended as described in F.1.5.1.
F.2.5.2 Structure of AC Huffman decode table
The general structure of the AC Huffman decode table is extended as described in F.1.5.2.
F.2.6 Sequential DCT decoding process with arithmetic coding and 12-bit precision
The process is identical to the sequential DCT process for 8-bit precision except for changes in the precision of the IDCT
computation.
The structure of the decoding procedure in F.2.4 is already defined for a 12-bit input precision.
Annex G
Progressive DCT-based mode of operation
(This annex forms an integral part of this Recommendation | International Standard)
This annex provides a functional specification of the following coding processes for the progressive DCT-based mode
of operation:
1) spectral selection only, Huffman coding, 8-bit sample precision;
2) spectral selection only, arithmetic coding, 8-bit sample precision;
3) full progression, Huffman coding, 8-bit sample precision;
4) full progression, arithmetic coding, 8-bit sample precision;
5) spectral selection only, Huffman coding, 12-bit sample precision;
6) spectral selection only, arithmetic coding, 12-bit sample precision;
7) full progression, Huffman coding, 12-bit sample precision;
8) full progression, arithmetic coding, 12-bit sample precision.
For each of these, the encoding process is specified in G.1, and the decoding process is specified in G.2. The functional
specification is presented by means of specific flow charts for the various procedures which comprise these coding
processes.
NOTE – There is no requirement in this Specification that any encoder or decoder which embodies one of the above-named
processes shall implement the procedures in precisely the manner specified by the flow charts in this annex. It is necessary only that an
encoder or decoder implement the function specified in this annex. The sole criterion for an encoder or decoder to be considered in
compliance with this Specification is that it satisfy the requirements given in clause 6 (for encoders) or clause 7 (for decoders), as
determined by the compliance tests specified in Part 2.
The number of Huffman or arithmetic conditioning tables which may be used within the same scan is four.
Two complementary progressive procedures are defined, spectral selection and successive approximation.
In spectral selection the DCT coefficients of each block are segmented into frequency bands. The bands are coded in
separate scans.
In successive approximation the DCT coefficients are divided by a power of two before coding. In the decoder the
coefficients are multiplied by that same power of two before computing the IDCT. In the succeeding scans the precision of
the coefficients is increased by one bit in each scan until full precision is reached.
An encoder or decoder implementing a full progression uses spectral selection within successive approximation. An
allowed subset is spectral selection alone.
Figure G.1 illustrates the spectral selection and successive approximation progressive processes.
G.1 Progressive DCT-based encoding processes
G.1.1 Control procedures and coding models for progressive DCT-based procedures
G.1.1.1 Control procedures for progressive DCT-based encoders
The control procedures for encoding an image and its constituent parts – the frame, scan, restart interval and MCU – are
given in Figures E.1 through E.5.
The control structure for encoding a frame is the same as for the sequential procedures. However, it is convenient to
calculate the FDCT for the entire set of components in a frame before starting the scans. A buffer which is large enough to
store all of the DCT coefficients may be used for this progressive mode of operation.
The number of scans is determined by the progression defined; the number of scans may be much larger than the number
of components in the frame.
[Figure: panel a) shows an image component as quantized DCT coefficients (blocks of coefficients 0, 1, 2, ..., 62, 63, each with bits 7 (MSB) down to 0 (LSB)); panel b) shows sequential encoding, sending all coefficients and all bits in a single scan; panel c) shows progressive encoding by spectral selection, sending coefficient 0 in the 1st scan, coefficients 1 and 2 in the 2nd scan, coefficients 3, 4 and 5 in the 3rd scan, and so on up to coefficients 61, 62 and 63 in the nth scan; panel d) shows progressive encoding by successive approximation, sending bits 7 to 4 (MSB) of all coefficients in the 1st scan, bit 3 in the 2nd scan, and so on down to bit 0 (LSB) in the 6th scan.]
Figure G.1 – Spectral selection and successive approximation progressive processes
The procedure for encoding an MCU (see Figure E.5) repetitively invokes the procedure for coding a data unit. For
DCT-based encoders the data unit is an 8 × 8 block of samples.
Only a portion of each 8 × 8 block is coded in each scan, the portion being determined by the scan header parameters Ss,
Se, Ah, and Al (see B.2.3). The procedures used to code portions of each 8 × 8 block are described in this annex. Note,
however, that where these procedures are identical to those used in the sequential DCT-based mode of operation, the
sequential procedures are simply referenced.
G.1.1.1.1 Spectral selection control
In spectral selection the zig-zag sequence of DCT coefficients is segmented into bands. A band is defined in the scan
header by specifying the starting and ending indices in the zig-zag sequence. One band is coded in a given scan of the
progression. DC coefficients are always coded separately from AC coefficients, and only scans which code DC
coefficients may have interleaved blocks from more than one component. All other scans shall have only one component.
With the exception of the first DC scans for the components, the sequence of bands defined in the scans need not follow
the zig-zag ordering. For each component, a first DC scan shall precede any AC scans.
G.1.1.1.2 Successive approximation control
If successive approximation is used, the DCT coefficients are reduced in precision by the point transform (see A.4)
defined in the scan header (see B.2.3). The successive approximation bit position parameter Al specifies the actual point
transform, and the high four bits (Ah) – if there are preceding scans for the band – contain the value of the point transform
used in those preceding scans. If there are no preceding scans for the band, Ah is zero.
Each scan which follows the first scan for a given band progressively improves the precision of the coefficients by one bit,
until full precision is reached.
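As a non-normative illustration, the point transform of A.4 can be sketched in C. The specific behaviours assumed here are drawn from A.4: an arithmetic shift right for DC coefficients and a division by 2^Al that truncates toward zero for AC coefficients.

```c
/* Sketch of the point transform (see A.4); Al is the successive
   approximation bit position low parameter from the scan header. */
int point_transform_dc(int coeff, int Al)
{
    return coeff >> Al;            /* arithmetic shift right */
}

int point_transform_ac(int coeff, int Al)
{
    return coeff / (1 << Al);      /* C division truncates toward zero */
}
```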
G.1.1.2 Coding models for progressive DCT-based encoders
The coding models defined in Annex F for the sequential DCT-based processes also apply to the progressive DCT-based encoders, but with the
following changes.
G.1.1.2.1 Progressive encoding model for DC coefficients
If Al is not zero, the point transform for DC coefficients shall be used to reduce the precision of the DC coefficients. If Ah
is zero, the coefficient values (as modified by the point transform) shall be coded, using the procedure described in Annex
F. If Ah is not zero, the least significant bit of the point transformed DC coefficients shall be coded, using the procedures
described in this annex.
G.1.1.2.2 Progressive encoding model for AC coefficients
If Al is not zero, the point transform for AC coefficients shall be used to reduce the precision of the AC coefficients. If Ah
is zero, the coefficient values (as modified by the point transform) shall be coded using modifications of the procedures
described in Annex F. These modifications are described in this annex. If Ah is not zero, the precision of the coefficients
shall be improved using the procedures described in this annex.
G.1.2 Progressive encoding procedures with Huffman coding
G.1.2.1 Progressive encoding of DC coefficients with Huffman coding
The first scan for a given component shall encode the DC coefficient values using the procedures described in F.1.2.1. If
the successive approximation bit position parameter Al is not zero, the coefficient values shall be reduced in precision by
the point transform described in Annex A before coding.
In subsequent scans using successive approximation the least significant bits are appended to the compressed bit stream
without compression or modification (see G.1.2.3), except for byte stuffing.
G.1.2.2 Progressive encoding of AC coefficients with Huffman coding
In spectral selection and in the first scan of successive approximation for a component, the AC coefficient coding model is
similar to that used by the sequential procedures. However, the Huffman code tables are extended to include coding of
runs of End-Of-Bands (EOBs). See Table G.1.
Table G.1 – EOBn code run length extensions

  EOBn code   Run length
  EOB0        1
  EOB1        2,3
  EOB2        4..7
  EOB3        8..15
  EOB4        16..31
  EOB5        32..63
  EOB6        64..127
  EOB7        128..255
  EOB8        256..511
  EOB9        512..1 023
  EOB10       1 024..2 047
  EOB11       2 048..4 095
  EOB12       4 096..8 191
  EOB13       8 192..16 383
  EOB14       16 384..32 767
The end-of-band run structure allows efficient coding of blocks which have only zero coefficients. An EOB run of length
5 means that the current block and the next four blocks have an end-of-band with no intervening non-zero coefficients.
The EOB run length is limited only by the restart interval.
The extension of the code table is illustrated in Figure G.2.
[Figure: a two-dimensional array of composite values with RRRR = 0, 1, ..., 14, 15 as rows and SSSS = 0, 1, 2, ..., 13, 14 as columns. The SSSS = 0 column contains EOB0, EOB1, ..., EOB14 for RRRR = 0 to 14 and ZRL for RRRR = 15; all other entries are the composite RRRRSSSS values.]
Figure G.2 – Two-dimensional value array for Huffman coding
The EOBn code sequence is defined as follows. Each EOBn code is followed by an extension field similar to the
extension field for the coefficient amplitudes (but with positive numbers only). The number of bits appended to the EOBn
code is the minimum number required to specify the run length.
If an EOB run is greater than 32 767, it is coded as a sequence of EOB runs of length 32 767 followed by a final EOB run
sufficient to complete the run.
At the beginning of each restart interval the EOB run count, EOBRUN, is set to zero. At the end of each restart interval
any remaining EOB run is coded.
The Huffman encoding procedure for AC coefficients in spectral selection and in the first scan of successive
approximation is illustrated in Figures G.3, G.4, G.5, and G.6.
[Flow chart: Encode_AC_coefficients_SS. K = Ss – 1; R = 0; then repeatedly K = K + 1: if ZZ(K) = 0 and K < Se, R = R + 1 and the loop continues; if ZZ(K) = 0 and K = Se, EOBRUN = EOBRUN + 1 and, if EOBRUN = X'7FFF', Encode_EOBRUN is invoked, after which the procedure is done; if ZZ(K) ≠ 0, Encode_EOBRUN is invoked, Encode_ZRL is invoked while R ≥ 16, then Encode_R_ZZ(K), and the loop continues until K = Se.]
Figure G.3 – Procedure for progressive encoding of AC coefficients with Huffman coding
In Figure G.3, Ss is the start of spectral selection, Se is the end of spectral selection, K is the index into the list of
coefficients stored in the zig-zag sequence ZZ, R is the run length of zero coefficients, and EOBRUN is the run length of
EOBs. EOBRUN is set to zero at the start of each restart interval.
If the scan header parameter Al (successive approximation bit position low) is not zero, the DCT coefficient values ZZ(K)
in Figure G.3 and figures which follow in this annex, including those in the arithmetic coding section, shall be replaced
by the point transformed values ZZ’(K), where ZZ’(K) is defined by:
ZZ’(K) = ZZ(K) / 2^Al
EOBSIZE is a procedure which returns the size of the EOB extension field given the EOB run length as input. CSIZE is a
procedure which maps an AC coefficient to the SSSS value defined in the subclauses on sequential encoding (see F.1.1
and F.1.3).
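The two helper procedures can be sketched in C as follows (non-normative; both simply count magnitude bits):

```c
/* Sketch of EOBSIZE: number of extension bits for an EOB run length.
   A run of 1 needs 0 bits (EOB0), runs of 2..3 need 1 bit (EOB1),
   and so on (Table G.1). */
int EOBSIZE(int run)
{
    int ssss = 0;
    while (run > 1) {
        run >>= 1;
        ssss++;
    }
    return ssss;
}

/* Sketch of CSIZE: the magnitude category SSSS of a non-zero
   coefficient (see F.1.1 and F.1.3). */
int CSIZE(int coeff)
{
    int a = coeff < 0 ? -coeff : coeff;
    int ssss = 0;
    while (a) {
        a >>= 1;
        ssss++;
    }
    return ssss;
}
```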
[Flow chart: Encode_EOBRUN. If EOBRUN ≠ 0: SSSS = EOBSIZE(EOBRUN); I = SSSS × 16; append EHUFSI(I) bits of EHUFCO(I); append the SSSS low order bits of EOBRUN; EOBRUN = 0.]
Figure G.4 – Progressive encoding of a non-zero AC coefficient
[Flow chart: Encode_ZRL. Append EHUFSI(X'F0') bits of EHUFCO(X'F0'); R = R – 16.]
Figure G.5 – Encoding of the run of zero coefficients
[Flow chart: Encode_R_ZZ(K). SSSS = CSIZE(ZZ(K)); I = (16 × R) + SSSS; append EHUFSI(I) bits of EHUFCO(I); if ZZ(K) < 0, ZZ(K) = ZZ(K) – 1; append the SSSS low order bits of ZZ(K); R = 0.]
Figure G.6 – Encoding of the zero run and non-zero coefficient
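Taken together, Figures G.4 to G.6 can be sketched in C as follows. Here put_bits() and the tables EHUFCO/EHUFSI are hypothetical stand-ins for the entropy bit writer and the Huffman code and size tables of Annex C, and EOBSIZE/CSIZE are the helpers sketched earlier; this is an illustration, not a normative implementation.

```c
/* Hypothetical entropy writer and Huffman code/size tables (Annex C). */
extern void put_bits(unsigned int code, int size);
extern unsigned int EHUFCO[256];
extern int          EHUFSI[256];
extern int EOBSIZE(int run);
extern int CSIZE(int coeff);

static int EOBRUN;  /* pending end-of-band run */
static int R;       /* current run of zero coefficients */

void encode_eobrun(void)                    /* Figure G.4 */
{
    if (EOBRUN != 0) {
        int ssss = EOBSIZE(EOBRUN);
        int i = ssss * 16;
        put_bits(EHUFCO[i], EHUFSI[i]);
        if (ssss > 0)                       /* SSSS low order bits of EOBRUN */
            put_bits((unsigned int)EOBRUN & ((1u << ssss) - 1), ssss);
        EOBRUN = 0;
    }
}

void encode_zrl(void)                       /* Figure G.5 */
{
    put_bits(EHUFCO[0xF0], EHUFSI[0xF0]);
    R -= 16;
}

void encode_r_zz(int zzk)                   /* Figure G.6 */
{
    int ssss = CSIZE(zzk);
    int i = 16 * R + ssss;
    put_bits(EHUFCO[i], EHUFSI[i]);
    if (zzk < 0)
        zzk -= 1;                           /* negative values: value - 1 */
    put_bits((unsigned int)zzk & ((1u << ssss) - 1), ssss);
    R = 0;
}
```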
G.1.2.3 Coding model for subsequent scans of successive approximation
The Huffman coding structure of the subsequent scans of successive approximation for a given component is similar to the
coding structure of the first scan of that component.
The structure of the AC code table is identical to the structure described in G.1.2.2. Each non-zero point transformed
coefficient that has a zero history (i.e. that has a value of ±1, and therefore has not been coded in a previous scan) is defined
by a composite 8-bit run length-magnitude value of the form:
RRRRSSSS
The four most significant bits, RRRR, give the number of zero coefficients that are between the current coefficient and the
previously coded coefficient (or the start of band). Coefficients with non-zero history (a non-zero value coded in a
previous scan) are skipped over when counting the zero coefficients. The four least significant bits, SSSS, provide the
magnitude category of the non-zero coefficient; for a given component the value of SSSS can only be one.
The run length-magnitude composite value is Huffman coded and each Huffman code is followed by additional bits:
a) One bit codes the sign of the newly non-zero coefficient. A 0-bit codes a negative sign; a 1-bit codes a positive sign.
b) For each coefficient with a non-zero history, one bit is used to code the correction. A 0-bit means no correction and a 1-bit means that one shall be added to the (scaled) decoded magnitude of the coefficient.
Non-zero coefficients with zero history are coded with a composite code of the form:
HUFFCO(RRRRSSSS) + additional bit (rule a) + correction bits (rule b)
In addition whenever zero runs are coded with ZRL or EOBn codes, correction bits for those coefficients with non-zero
history contained within the zero run are appended according to rule b above.
For the Huffman coding version of Encode_AC_Coefficients_SA the EOB is defined to be the position of the last point
transformed coefficient of magnitude 1 in the band. If there are no coefficients of magnitude 1, the EOB is defined to be
zero.
NOTE – The definition of EOB is different for Huffman and arithmetic coding procedures.
In Figures G.7 and G.8 BE is the count of buffered correction bits at the start of coding of the block. BE is initialized to
zero at the start of each restart interval. At the end of each restart interval any remaining buffered bits are appended to the
bit stream following the last EOBn Huffman code and associated appended bits.
In Figures G.7 and G.9, BR is the count of buffered correction bits which are appended to the bit stream according to rule
b. BR is set to zero at the beginning of each Encode_AC_Coefficients_SA. At the end of each restart interval any
remaining buffered bits are appended to the bit stream following the last Huffman code and associated appended bits.
G.1.3 Progressive encoding procedures with arithmetic coding
G.1.3.1 Progressive encoding of DC coefficients with arithmetic coding
The first scan for a given component shall encode the DC coefficient values using the procedures described in F.1.4.1. If
the successive approximation bit position parameter is not zero, the coefficient values shall be reduced in precision by the
point transform described in Annex A before coding.
In subsequent scans using successive approximation the least significant bits shall be coded as binary decisions using a
fixed probability estimate of 0.5 (Qe = X’5A1D’, MPS = 0).
G.1.3.2 Progressive encoding of AC coefficients with arithmetic coding
Except for the point transform scaling of the DCT coefficients and the grouping of the coefficients into bands, the first
scan(s) of successive approximation is identical to the sequential encoding procedure described in F.1.4. If Kmin is
equated to Ss, the index of the first AC coefficient index in the band, the flow chart shown in Figure F.5 applies. The
EOB decision in that figure refers to the “end-of-band” rather than the “end-of-block”. For the arithmetic coding version
of Encode_AC_Coefficients_SA (and all other AC coefficient coding procedures) the EOB is defined to be the position
following the last non-zero coefficient in the band.
NOTE – The definition of EOB is different for Huffman and arithmetic coding procedures.
The statistical model described in F.1.4 also holds. For this model the default value of Kx is 5. Other values of Kx may be
specified using the DAC marker code (Annex B). The following calculation for Kx has proven to give good results for
8-bit precision samples:
Kx = Kmin + SRL (8 + Se – Kmin) 4
This expression reduces to the default of Kx = 5 when the band is from index 1 to index 63.
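In C, with SRL expressed as a logical shift right (a non-normative sketch):

```c
/* Kx = Kmin + SRL (8 + Se - Kmin) 4.  For the band 1..63 this gives
   1 + ((8 + 63 - 1) >> 4) = 1 + 4 = 5, the default. */
int compute_Kx(int Kmin, int Se)
{
    return Kmin + ((8 + Se - Kmin) >> 4);
}
```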
[Flow chart: Encode_AC_coefficients_SA. K = Ss – 1; R = 0; BR = 0; then repeatedly K = K + 1: a coefficient with |ZZ(K)| = 1 (newly non-zero) causes Encode_EOBRUN and Append_BE_bits to be invoked, then Encode_ZRL followed by Append_BR_bits while R > 15, then Encode_R_ZZ(K) followed by Append_BR_bits; a zero coefficient increments R; a coefficient with non-zero history appends the LSB of ZZ(K) to the buffered bits and sets BR = BR + 1. When the band ends at or beyond the EOB, EOBRUN = EOBRUN + 1 and BE = BE + BR, and if EOBRUN = X'7FFF' then Encode_EOBRUN and Append_BE_bits are invoked immediately.]
Figure G.7 – Successive approximation coding of AC coefficients using Huffman coding
[Flow chart: Append_BE_bits. If BE ≠ 0: append the BE buffered bits to the bit stream; BE = 0.]
Figure G.8 – Transferring BE buffered bits from buffer to bit stream
[Flow chart: Append_BR_bits. If BR ≠ 0: append the BR buffered bits to the bit stream; BR = 0.]
Figure G.9 – Transferring BR buffered bits from buffer to bit stream
G.1.3.3 Coding model for subsequent scans of successive approximation
The procedure “Encode_AC_Coefficients_SA” shown in Figure G.10 increases the precision of the AC coefficient values
in the band by one bit.
As in the first scan of successive approximation for a component, an EOB decision is coded at the start of the band and
after each non-zero coefficient.
However, since the end-of-band index of the previous successive approximation scan for a given component, EOBx, is
known from the data coded in the prior scan of that component, this decision is bypassed whenever the current index, K,
is less than EOBx. As in the first scan(s), the EOB decision is also bypassed whenever the last coefficient in the band is
not zero. The decision ZZ(K) = 0 codes runs of zero coefficients. If the coder is at this step of the procedure, at least
one non-zero coefficient remains in the band of the block being coded. If ZZ(K) is not zero, the procedure in Figure G.11
is followed to code the value.
The context-indices in Figures G.10 and G.11 are defined in Table G.2 (see G.1.3.3.1). The signs of coefficients with
magnitude of one are coded with a fixed probability value of approximately 0.5 (Qe = X’5A1D’, MPS = 0).
G.1.3.3.1 Statistical model for subsequent successive approximation scans
As shown in Table G.2, each statistics area for subsequent successive approximation scans of AC coefficients consists of a
contiguous set of 189 statistics bins.
G.2 Progressive decoding of the DCT
The descriptions of the computation of the IDCT and the dequantization procedure contained in A.3.3 and A.3.4 apply to
the progressive operation.
Progressive decoding processes must be able to decompress compressed image data which requires up to four sets of
Huffman or arithmetic coder conditioning tables within a scan.
In order to avoid repetition, detailed flow diagrams of progressive decoder operation are not included. Decoder operation
is defined by reversing the function of each step described in the encoder flow charts, and performing the steps in reverse
order.
[Flow chart: Encode_AC_coefficients_SA (subsequent scans). K = Kmin; the end-of-band decision is bypassed while K < EOBx; otherwise, if K = EOB, Code_1(SE) is coded and the procedure is done, else Code_0(SE) is coded. A zero coefficient is coded with Code_0(S0); a non-zero coefficient is coded with CodeSA_ZZ(K). K = K + 1 and the loop continues until K = Se.]
Figure G.10 – Subsequent successive approximation scans for coding
of AC coefficients using arithmetic coding
[Flow chart: CodeSA_ZZ(K). T = LSB ZZ(K); if T = 0, Code_0(SC); if T = 1 and |ZZ(K)| > 1, Code_1(SC); if T = 1 and |ZZ(K)| = 1 (newly non-zero), Code_1(S0) followed by the sign: Code_0(SS) if ZZ(K) > 0, Code_1(SS) if ZZ(K) < 0.]
Figure G.11 – Coding non-zero coefficients for subsequent successive approximation scans
Table G.2 – Statistical model for subsequent scans of successive
approximation coding of AC coefficients

  Context-index   AC coding        Coding decision
  SE              3 × (K–1)        K = EOB
  S0              SE + 1           V = 0
  SS              Fixed estimate   Sign
  SC              S0 + 1           LSB ZZ(K) = 1
Annex H
Lossless mode of operation
(This annex forms an integral part of this Recommendation | International Standard)
This annex provides a functional specification of the following coding processes for the lossless mode of operation:
1) lossless processes with Huffman coding;
2) lossless processes with arithmetic coding.
For each of these, the encoding process is specified in H.1, and the decoding process is specified in H.2. The functional
specification is presented by means of specific procedures which comprise these coding processes.
NOTE – There is no requirement in this Specification that any encoder or decoder which embodies one of the above-named
processes shall implement the procedures in precisely the manner specified in this annex. It is necessary only that an encoder or decoder
implement the function specified in this annex. The sole criterion for an encoder or decoder to be considered in compliance with this
Specification is that it satisfy the requirements given in clause 6 (for encoders) or clause 7 (for decoders), as determined by the
compliance tests specified in Part 2.
The processes which provide for sequential lossless encoding and decoding are not based on the DCT. The processes used
are spatial processes based on the coding model developed for the DC coefficients of the DCT. However, the model is
extended by incorporating a set of selectable one- and two-dimensional predictors, and for interleaved data the ordering of
samples for the one-dimensional predictor can be different from that used in the DCT-based processes.
Either Huffman coding or arithmetic coding may be employed for the entropy coding in these lossless encoding and decoding
processes. The Huffman code table structure is extended to allow up to 16-bit precision for the input data. The arithmetic
coder statistical model is extended to a two-dimensional form.
H.1 Lossless encoder processes
H.1.1 Lossless encoder control procedures
Subclause E.1 contains the encoder control procedures. In applying these procedures to the lossless encoder, the data unit
is one sample.
Input data precision may be from 2 to 16 bits/sample. If the input data path has different precision from the input data, the
data shall be aligned with the least significant bits of the input data path. Input data is represented as unsigned integers
and is not level shifted prior to coding.
When the encoder is reset in the restart interval control procedure (see E.1.4), the prediction is reset to a default value. If
arithmetic coding is used, the statistics are also reset.
For the lossless processes the restart interval shall be an integer multiple of the number of MCU in an MCU-row.
H.1.2 Coding model for lossless encoding
The coding model developed for encoding the DC coefficients of the DCT is extended to allow a selection from a set of
seven one-dimensional and two-dimensional predictors. The predictor is selected in the scan header (see Annex B). The
same predictor is used for all components of the scan. Each component in the scan is modeled independently, using
predictions derived from neighbouring samples of that component.
H.1.2.1 Prediction
Figure H.1 shows the relationship between the positions (a, b, c) of the reconstructed neighbouring samples used for
prediction and the position of x, the sample being coded.
  c  b
  a  x
Figure H.1 – Relationship between sample and prediction samples
Define Px to be the prediction and Ra, Rb, and Rc to be the reconstructed samples immediately to the left, immediately
above, and diagonally above and to the left of the current sample. The allowed predictors, one of which is selected in the
scan header, are listed in Table H.1.
Table H.1 – Predictors for lossless coding

  Selection-value   Prediction
  0                 No prediction (See Annex J)
  1                 Px = Ra
  2                 Px = Rb
  3                 Px = Rc
  4                 Px = Ra + Rb – Rc
  5                 Px = Ra + ((Rb – Rc)/2)a)
  6                 Px = Rb + ((Ra – Rc)/2)a)
  7                 Px = (Ra + Rb)/2

  a) Shift right arithmetic operation
Selection-value 0 shall only be used for differential coding in the hierarchical mode of operation. Selections 1, 2 and 3 are
one-dimensional predictors and selections 4, 5, 6, and 7 are two-dimensional predictors.
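A non-normative C sketch of the predictor selection of Table H.1 follows; the divide by 2 in selections 5 and 6 is written as the arithmetic-right-shift described later in this subclause.

```c
/* Sketch of Table H.1: compute the prediction Px from the reconstructed
   neighbouring samples Ra (left), Rb (above) and Rc (above left). */
int predict(int sel, int Ra, int Rb, int Rc)
{
    switch (sel) {
    case 1:  return Ra;
    case 2:  return Rb;
    case 3:  return Rc;
    case 4:  return Ra + Rb - Rc;
    case 5:  return Ra + ((Rb - Rc) >> 1);   /* shift right arithmetic */
    case 6:  return Rb + ((Ra - Rc) >> 1);   /* shift right arithmetic */
    case 7:  return (Ra + Rb) / 2;
    default: return 0;  /* selection 0: no prediction (hierarchical mode) */
    }
}
```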
The one-dimensional horizontal predictor (prediction sample Ra) is used for the first line of samples at the start of the scan
and at the beginning of each restart interval. The selected predictor is used for all other lines. The sample from the line
above (prediction sample Rb) is used at the start of each line, except for the first line. At the beginning of the first line and
at the beginning of each restart interval the prediction value of 2^(P – 1) is used, where P is the input precision.
If the point transformation parameter (see A.4) is non-zero, the prediction value at the beginning of the first lines and the
beginning of each restart interval is 2^(P – Pt – 1), where Pt is the value of the point transformation parameter.
Each prediction is calculated with full integer arithmetic precision, and without clamping of either underflow or overflow
beyond the input precision bounds. For example, if Ra and Rb are both 16-bit integers, the sum is a 17-bit integer. After
dividing the sum by 2 (predictor 7), the prediction is a 16-bit integer.
For simplicity of implementation, the divide by 2 in the prediction selections 5 and 6 of Table H.1 is done by an
arithmetic-right-shift of the integer values.
The difference between the prediction value and the input is calculated modulo 2^16. In the decoder the difference is
decoded and added, modulo 2^16, to the prediction.
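The modulo 2^16 arithmetic amounts to 16-bit wraparound; a non-normative C sketch:

```c
/* Sketch of the modulo 2^16 difference (encoder) and its inverse
   (decoder); sample and Px are unsigned sample-precision values. */
unsigned int mod_diff(unsigned int sample, unsigned int Px)
{
    return (sample - Px) & 0xFFFF;           /* difference modulo 2^16 */
}

unsigned int mod_reconstruct(unsigned int Px, unsigned int diff)
{
    return (Px + diff) & 0xFFFF;             /* sum modulo 2^16 */
}
```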
H.1.2.2 Huffman coding of the modulo difference
The Huffman coding procedures defined in Annex F for coding the DC coefficients are used to code the modulo 2^16
differences. The table for DC coding contained in Tables F.1 and F.6 is extended by one additional entry. No extra bits
are appended after SSSS = 16 is encoded. See Table H.2.
Table H.2 – Difference categories for lossless Huffman coding

  SSSS   Difference values
   0     0
   1     –1,1
   2     –3,–2,2,3
   3     –7..–4,4..7
   4     –15..–8,8..15
   5     –31..–16,16..31
   6     –63..–32,32..63
   7     –127..–64,64..127
   8     –255..–128,128..255
   9     –511..–256,256..511
  10     –1 023..–512,512..1 023
  11     –2 047..–1 024,1 024..2 047
  12     –4 095..–2 048,2 048..4 095
  13     –8 191..–4 096,4 096..8 191
  14     –16 383..–8 192,8 192..16 383
  15     –32 767..–16 384,16 384..32 767
  16     32 768
H.1.2.3 Arithmetic coding of the modulo difference
The statistical model defined for the DC coefficient arithmetic coding model (see F.1.4.4.1) is generalized to a
two-dimensional form in which differences coded for the sample to the left and for the line above are used for conditioning.
H.1.2.3.1 Two-dimensional statistical model
The binary decisions are conditioned on the differences coded for the neighbouring samples immediately above and
immediately to the left from the same component. As in the coding of the DC coefficients, the differences are classified
into 5 categories: zero (0), small positive (+S), small negative (–S), large positive (+L), and large negative (–L). The two
independent difference categories combine to give 25 different conditioning states. Figure H.2 shows the two-dimensional
array of conditioning indices. For each of the 25 conditioning states probability estimates for four binary decisions are
kept.
At the beginning of the scan and each restart interval the conditioning derived from the line above is set to zero for the
first line of each component. At the start of each line, the difference to the left is set to zero for the purposes of calculating
the conditioning.
                                   Difference above (position b)
  Difference to left (position a)     0    +S    –S    +L    –L
   0                                  0     4     8    12    16
  +S                                 20    24    28    32    36
  –S                                 40    44    48    52    56
  +L                                 60    64    68    72    76
  –L                                 80    84    88    92    96
Figure H.2 – 5 × 5 Conditioning array for two-dimensional statistical model
H.1.2.3.2 Assignment of statistical bins to the DC binary decision tree
Each statistics area for lossless coding consists of a contiguous set of 158 statistics bins. The first 100 bins consist of
25 sets of four bins selected by a context-index S0. The value of S0 is given by L_Context(Da,Db), which provides a
value of 0, 4,..., 92 or 96, depending on the difference classifications of Da and Db (see H.1.2.3.1). The value for S0
provided by L_Context(Da,Db) is from the array in Figure H.2.
The remaining 58 bins consist of two sets of 29 bins, X1, ..., X15, M2, ..., M15, which are used to code magnitude
category decisions and magnitude bits. The value of X1 is given by X1_Context(Db), which provides a value of 100 when
Db is in the zero, small positive or small negative categories and a value of 129 when Db is in the large positive or large
negative categories.
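A non-normative C sketch of these two helpers follows, assuming the five difference classifications are encoded as 0 (zero), 1 (+S), 2 (–S), 3 (+L) and 4 (–L) so that the returned values match the array of Figure H.2.

```c
/* Sketch: ca and cb are the classifications (0, +S, -S, +L, -L encoded
   as 0..4) of the differences Da (left) and Db (above). */
int L_Context(int ca, int cb)
{
    return 20 * ca + 4 * cb;       /* 0, 4, ..., 92 or 96 (Figure H.2) */
}

int X1_Context(int cb)
{
    return (cb <= 2) ? 100 : 129;  /* zero/small: 100, large: 129 */
}
```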
The assignment of statistical bins to the binary decision tree used for coding the difference is given in Table H.3.
Table H.3 – Statistical model for lossless coding

  Context-index   Value               Coding decision
  S0              L_Context(Da,Db)    V = 0
  SS              S0 + 1              Sign
  SP              S0 + 2              Sz < 1 if V > 0
  SN              S0 + 3              Sz < 1 if V < 0
  X1              X1_Context(Db)      Sz < 2
  X2              X1 + 1              Sz < 4
  X3              X1 + 2              Sz < 8
  .               .                   .
  .               .                   .
  X15             X1 + 14             Sz < 2^15
  M2              X2 + 14             Magnitude bits if Sz < 4
  M3              X3 + 14             Magnitude bits if Sz < 8
  .               .                   .
  .               .                   .
  M15             X15 + 14            Magnitude bits if Sz < 2^15
H.1.2.3.3 Default conditioning bounds
The bounds, L and U, for determining the conditioning category have the default values L = 0 and U = 1. Other bounds
may be set using the DAC (Define-Arithmetic-Conditioning) marker segment, as described in Annex B.
H.1.2.3.4 Initial conditions for statistical model
At the start of a scan and at each restart, all statistics bins are re-initialized to the standard default value described in
Annex D.
H.2 Lossless decoder processes
Lossless decoders may employ either Huffman decoding or arithmetic decoding. They shall be capable of using up to four
tables in a scan. Lossless decoders shall be able to decode encoded image source data with any input precision from 2 to
16 bits per sample.
H.2.1 Lossless decoder control procedures
Subclause E.2 contains the decoder control procedures. In applying these procedures to the lossless decoder the data unit
is one sample.
When the decoder is reset in the restart interval control procedure (see E.2.4) the prediction is reset to the same value
used in the encoder (see H.1.2.1). If arithmetic coding is used, the statistics are also reset.
Restrictions on the restart interval are specified in H.1.1.
H.2.2 Coding model for lossless decoding
The predictor calculations defined in H.1.2 also apply to the lossless decoder processes.
The lossless decoders decode the differences and add them, modulo 2^16, to the predictions to create the output. The
lossless decoders shall be able to interpret the point transform parameter, and if non-zero, multiply the output of the
lossless decoder by 2^Pt.
In order to avoid repetition, detailed flow charts of the lossless decoding procedures are omitted.
Annex J
Hierarchical mode of operation
(This annex forms an integral part of this Recommendation | International Standard)
This annex provides a functional specification of the coding processes for the hierarchical mode of operation.
In the hierarchical mode of operation each component is encoded or decoded in a non-differential frame. Such frames may
be followed by a sequence of differential frames. A non-differential frame shall be encoded or decoded using the
procedures defined in Annexes F, G and H. Differential frame procedures are defined in this annex.
The coding process for a hierarchical encoding containing DCT-based processes is defined as the highest numbered
process listed in Table J.1 which is used to code any non-differential DCT-based or differential DCT-based frame in the
compressed image data format. The coding process for a hierarchical encoding containing only lossless processes is
defined to be the process used for the non-differential frames.
Table J.1 – Coding processes for hierarchical mode

  Process   Non-differential frame specification
   1    Extended sequential DCT, Huffman, 8-bit        Annex F, process 2
   2    Extended sequential DCT, arithmetic, 8-bit     Annex F, process 3
   3    Extended sequential DCT, Huffman, 12-bit       Annex F, process 4
   4    Extended sequential DCT, arithmetic, 12-bit    Annex F, process 5
   5    Spectral selection only, Huffman, 8-bit        Annex G, process 1
   6    Spectral selection only, arithmetic, 8-bit     Annex G, process 2
   7    Full progression, Huffman, 8-bit               Annex G, process 3
   8    Full progression, arithmetic, 8-bit            Annex G, process 4
   9    Spectral selection only, Huffman, 12-bit       Annex G, process 5
  10    Spectral selection only, arithmetic, 12-bit    Annex G, process 6
  11    Full progression, Huffman, 12-bit              Annex G, process 7
  12    Full progression, arithmetic, 12-bit           Annex G, process 8
  13    Lossless, Huffman, 2 through 16 bits           Annex H, process 1
  14    Lossless, arithmetic, 2 through 16 bits        Annex H, process 2
Hierarchical mode syntax requires a DHP marker segment that appears before the non-differential frame or frames. It may
include EXP marker segments and differential frames which shall follow the initial non-differential frame. The frame
structure in hierarchical mode is identical to the frame structure in non-hierarchical mode.
Either all non-differential frames within an image shall be coded with DCT-based processes, or all non-differential frames
shall be coded with lossless processes. All frames within an image must use the same entropy coding procedure, either
Huffman or arithmetic, with the exception that non-differential frames coded with the baseline process may occur in the
same image with frames coded with arithmetic coding processes.
If the non-differential frames use DCT-based processes, all differential frames except the final frame for a component shall
use DCT-based processes. The final differential frame for each component may use a differential lossless process.
If the non-differential frames use lossless processes, all differential frames shall use differential lossless processes.
For each of the processes listed in Table J.1, the encoding processes are specified in J.1, and decoding processes are
specified in J.2.
NOTE – There is no requirement in this Specification that any encoder or decoder which embodies one of the
above-named processes shall implement the procedures in precisely the manner specified by the flow charts in this annex. It is
necessary only that an encoder or decoder implement the function specified in this annex. The sole criterion for an encoder or decoder
to be considered in compliance with this Specification is that it satisfy the requirements given in clause 6 (for encoders) or clause 7 (for
decoders), as determined by the compliance tests specified in Part 2.
J.1 Hierarchical encoding
J.1.1 Hierarchical control procedure for encoding an image
The control structure for encoding of an image using the hierarchical mode is given in Figure J.1.
[Flow chart: Encode_image. Generate down-sampled images where required; append the SOI marker, tables/miscellaneous where required, and the DHP marker segment. Then, for each frame: for a differential frame, upsample the reference components and append an EXP marker segment where required, generate the differential components, invoke Encode_differential_frame, reconstruct the differential components, and reconstruct the components; for a non-differential frame, invoke Encode_frame and reconstruct the components using the matching decoder process. When no more frames remain, append the EOI marker.]
Figure J.1 – Hierarchical control procedure for encoding an image
In Figure J.1 procedures in brackets shall be performed whenever the particular hierarchical encoding sequence being
followed requires them.
In the hierarchical mode the define-hierarchical-progression (DHP) marker segment shall be placed in the compressed
image data before the first start-of-frame. The DHP segment is used to signal the size of the image components of the
completed image. The syntax of the DHP segment is specified in Annex B.
The first frame for each component or group of components in a hierarchical process shall be encoded by a
non-differential frame. Differential frames shall then be used to encode the two’s complement differences between source
input components (possibly downsampled) and the reference components (possibly upsampled). The reference
components are reconstructed components created by previous frames in the hierarchical process. For either differential or
non-differential frames, reconstructions of the components shall be generated if needed as reference components for a
subsequent frame in the hierarchical process.
Resolution changes may occur between hierarchical frames in a hierarchical process. These changes occur if
downsampling filters are used to reduce the spatial resolution of some or all of the components of the source image. When
the resolution of a reference component does not match the resolution of the component input to a differential frame, an
upsampling filter shall be used to increase the spatial resolution of the reference component. The EXP marker segment
shall be added to the compressed image data before the start-of-frame whenever upsampling of a reference component is
required. No more than one EXP marker segment shall precede a given frame.
Any of the marker segments allowed before a start-of-frame for the encoding process selected may be used before either
non-differential or differential frames.
For 16-bit input precision (lossless encoder), the differential components which are input to a differential frame are
calculated modulo 2^16. The reconstructed components calculated from the reconstructed differential components are also
calculated modulo 2^16.
If a hierarchical encoding process uses a DCT encoding process for the first frame, all frames in the hierarchical process
except for the final frame for each component shall use the DCT encoding processes defined in either Annex F or Annex
G, or the modified DCT encoding processes defined in this annex. The final frame may use a modified lossless process
defined in this annex.
If a hierarchical encoding process uses a lossless encoding process for the first frame, all frames in the hierarchical process
shall use a lossless encoding process defined in Annex H, or a modified lossless process defined in this annex.
J.1.1.1 Downsampling filter
The downsampled components are generated using a downsampling filter that is not specified in this Specification. This
filter should, however, be consistent with the upsampling filter. An example of a downsampling filter is provided in K.5.
J.1.1.2 Upsampling filter
The upsampling filter increases the spatial resolution by a factor of two horizontally, vertically, or both. Bi-linear
interpolation is used for the upsampling filter, as illustrated in Figure J.2.
[Figure: an interpolated sample x shown between adjacent lower resolution samples a and b, for the horizontal and for the vertical orientation.]
Figure J.2 – Diagram of sample positions for upsampling rules
The rule for calculating the interpolated value is:
Px = (Ra + Rb) / 2
where Ra and Rb are sample values from adjacent positions a and b of the lower resolution image and Px is the
interpolated value. The division indicates truncation, not rounding. The left-most column of the upsampled image matches
the left-most column of the lower resolution image. The top line of the upsampled image matches the top line of the lower
resolution image. The right column and the bottom line of the lower resolution image are replicated to provide the values
required for the right column edge and bottom line interpolations. The upsampling process always doubles the line length
or the number of lines.
If both horizontal and vertical expansions are signalled, they are done in sequence – first the horizontal expansion and
then the vertical.
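A non-normative C sketch of the horizontal interpolation rule follows; the vertical pass is analogous, applied to columns after the horizontal pass.

```c
/* Sketch of 1:2 horizontal upsampling (J.1.1.2).  out must hold 2*n
   samples; the right column is replicated for the last interpolation
   and the division truncates. */
void upsample_row(const unsigned short *in, unsigned short *out, int n)
{
    for (int i = 0; i < n; i++) {
        int Ra = in[i];
        int Rb = in[(i + 1 < n) ? i + 1 : n - 1];    /* replicate right edge */
        out[2 * i]     = (unsigned short)Ra;              /* existing sample */
        out[2 * i + 1] = (unsigned short)((Ra + Rb) / 2); /* Px = (Ra+Rb)/2 */
    }
}
```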
J.1.2 Control procedure for encoding a differential frame
The control procedures in Annex E for frames, scans, restart intervals, and MCU also apply to the encoding of differential
frames, and the scans, restart intervals, and MCU from which the differential frame is constructed. The differential frames
differ from the frames of Annexes F, G, and H only at the coding model level.
J.1.3 Encoder coding models for differential frames
The coding models defined in Annexes F, G, and H are modified to allow them to be used for coding of two’s complement
differences.
J.1.3.1 Modifications to encoder DCT encoding models for differential frames
Two modifications are made to the DCT coding models to allow them to be used in differential frames. First, the FDCT of
the differential input is calculated without the level shift. Second, the DC coefficient of the DCT is coded directly –
without prediction.
J.1.3.2 Modifications to lossless encoding models for differential frames
One modification is made to the lossless coding models. The difference is coded directly – without prediction. The
prediction selection parameter in the scan header shall be set to zero. The point transform which may be applied to the
differential inputs is defined in Annex A.
J.1.4 Modifications to the entropy encoders for differential frames
The coding of two’s complement differences requires one extra bit of precision for the Huffman coding of AC coefficients.
The extension to Tables F.1 and F.7 is given in Table J.2.
Table J.2 – Modifications to table of AC coefficient amplitude ranges

  SSSS   AC coefficients
  15     –32 767..–16 384, 16 384..32 767
The arithmetic coding models are already defined for the precision needed in differential frames.
J.2 Hierarchical decoding
J.2.1 Hierarchical control procedure for decoding an image
The control structure for decoding an image using the hierarchical mode is given in Figure J.3.
[Flow chart: Decode_image. If the SOI marker is absent, an error is reported. Markers are interpreted until an EOI marker (decoding is done) or a frame is found. If the DHP marker was not encountered, the non-hierarchical mode applies. Otherwise, a non-differential frame is decoded with Decode_frame; a differential frame causes the reference components to be upsampled where required, followed by Decode_differential_frame and Reconstruct_components; control then returns to interpreting markers.]
Figure J.3 – Hierarchical control procedure for decoding an image
The Interpret markers procedure shall decode the markers which may precede the SOF marker, continuing this decoding
until either a SOF or EOI marker is found. If the DHP marker is encountered before the first frame, a flag is set which
selects the hierarchical decoder at the “hierarchical?” decision point. In addition to the DHP marker (which shall precede
any SOF) and the EXP marker (which shall precede any differential SOF requiring resolution changes in the reference
components), any other markers which may precede a SOF shall be interpreted to the extent required for decoding of the
compressed image data.
If a differential SOF marker is found, the differential frame path is followed. If the EXP was encountered in the Interpret
markers procedure, the reference components for the frame shall be upsampled as required by the parameters in the EXP
segment. The upsampling procedure described in J.1.1.2 shall be followed.
The Decode_differential_frame procedure generates a set of differential components. These differential components shall
be added, modulo 2^16, to the upsampled reference components in the Reconstruct_components procedure. This creates a
new set of reference components which shall be used when required in subsequent frames of the hierarchical process.
J.2.2 Control procedure for decoding a differential frame
The control procedures in Annex E for frames, scans, restart intervals, and MCU also apply to the decoding of differential
frames and the scans, restart intervals, and MCU from which the differential frame is constructed. The differential frame
differs from the frames of Annexes F, G, and H only at the decoder coding model level.
J.2.3 Decoder coding models for differential frames
The decoding models described in Annexes F, G, and H are modified to allow them to be used for decoding of two’s
complement differential components.
J.2.3.1 Modifications to the differential frame decoder DCT coding model
Two modifications are made to the decoder DCT coding models to allow them to code differential frames. First, the IDCT
of the differential output is calculated without the level shift. Second, the DC coefficient of the DCT is decoded directly –
without prediction.
J.2.3.2 Modifications to the differential frame decoder lossless coding model
One modification is made to the lossless decoder coding model. The difference is decoded directly – without prediction. If
the point transformation parameter in the scan header is not zero, the point transform, defined in Annex A, shall be
applied to the differential output.
J.2.4 Modifications to the entropy decoders for differential frames
The decoding of two’s complement differences requires one extra bit of precision in the Huffman code table. This is
described in J.1.4. The arithmetic coding models are already defined for the precision needed in differential frames.
Annex K
Examples and guidelines
(This annex does not form an integral part of this Recommendation | International Standard)
This annex provides examples of various tables, procedures, and other guidelines.
K.1 Quantization tables for luminance and chrominance components
Two examples of quantization tables are given in Tables K.1 and K.2. These are based on psychovisual thresholding and
are derived empirically using luminance and chrominance and 2:1 horizontal subsampling. These tables are provided as
examples only and are not necessarily suitable for any particular application. These quantization values have been used
with good results on 8-bit per sample luminance and chrominance images of the format illustrated in Figure 13. Note that
these quantization values are appropriate for the DCT normalization defined in A.3.3.
If these quantization values are divided by 2, the resulting reconstructed image is usually nearly indistinguishable from the
source image.
Table K.1 – Luminance quantization table

  16  11  10  16   24   40   51   61
  12  12  14  19   26   58   60   55
  14  13  16  24   40   57   69   56
  14  17  22  29   51   87   80   62
  18  22  37  56   68  109  103   77
  24  35  55  64   81  104  113   92
  49  64  78  87  103  121  120  101
  72  92  95  98  112  100  103   99
Table K.2 – Chrominance quantization table

  17  18  24  47  99  99  99  99
  18  21  26  66  99  99  99  99
  24  26  56  99  99  99  99  99
  47  66  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
K.2 A procedure for generating the lists which specify a Huffman code table
A Huffman table is generated from a collection of statistics in two steps. The first step is the generation of the list of
lengths and values which are in accord with the rules for generating the Huffman code tables. The second step is the
generation of the Huffman code table from the list of lengths and values.
The first step, the topic of this section, is needed only for custom Huffman table generation and is done only in the
encoder. In this step the statistics are used to create a table associating each value to be coded with the size (in bits) of the
corresponding Huffman code. This table is sorted by code size.
A procedure for creating a Huffman table for a set of up to 256 symbols is shown in Figure K.1. Three vectors are defined
for this procedure:
  FREQ(V)       Frequency of occurrence of symbol V
  CODESIZE(V)   Code size of symbol V
  OTHERS(V)     Index to next symbol in chain of all symbols in current branch of code tree

where V goes from 0 to 256.
Before starting the procedure, the values of FREQ are collected for V = 0 to 255 and the FREQ value for V = 256 is set to
1 to reserve one code point. FREQ values for unused symbols are defined to be zero. In addition, the entries in
CODESIZE are all set to 0, and the indices in OTHERS are set to –1, the value which terminates a chain of indices.
Reserving one code point guarantees that no code word can ever be all “1” bits.
The search for the entry with the least value of FREQ(V) selects the largest value of V with the least value of FREQ(V)
greater than zero.
The procedure “Find V1 for least value of FREQ(V1) > 0” always selects the value with the largest value of V1 when
more than one V1 with the same frequency occurs. The reserved code point is then guaranteed to be in the longest code
word category.
[Flow chart: Code_size. Find V1 for the least value of FREQ(V1) > 0 and V2 for the next least value of FREQ(V2) > 0. If no V2 exists, the procedure is done. Otherwise FREQ(V1) = FREQ(V1) + FREQ(V2) and FREQ(V2) = 0; then CODESIZE(V1) = CODESIZE(V1) + 1, following the chain V1 = OTHERS(V1) until OTHERS(V1) = –1; then OTHERS(V1) = V2; then CODESIZE(V2) = CODESIZE(V2) + 1, following the chain V2 = OTHERS(V2) until OTHERS(V2) = –1; the procedure then repeats.]
Figure K.1 – Procedure to find Huffman code sizes
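A non-normative C sketch of the Code_size procedure, using the FREQ, CODESIZE and OTHERS vectors defined above:

```c
/* Sketch of Figure K.1: find the code size for each symbol (V = 0..256).
   OTHERS entries start at -1 and CODESIZE entries start at 0. */
void code_size(long FREQ[257], int CODESIZE[257], int OTHERS[257])
{
    for (;;) {
        int v1 = -1, v2 = -1;

        /* Find V1 for least value of FREQ(V1) > 0 (largest V on ties) */
        for (int v = 0; v <= 256; v++)
            if (FREQ[v] > 0 && (v1 < 0 || FREQ[v] <= FREQ[v1]))
                v1 = v;
        /* Find V2 for next least value of FREQ(V2) > 0 */
        for (int v = 0; v <= 256; v++)
            if (v != v1 && FREQ[v] > 0 && (v2 < 0 || FREQ[v] <= FREQ[v2]))
                v2 = v;
        if (v2 < 0)                      /* no V2 exists: done */
            return;

        FREQ[v1] += FREQ[v2];            /* merge the two least branches */
        FREQ[v2] = 0;

        for (;;) {                       /* lengthen all codes in V1's chain */
            CODESIZE[v1]++;
            if (OTHERS[v1] == -1)
                break;
            v1 = OTHERS[v1];
        }
        OTHERS[v1] = v2;                 /* chain V2's branch onto V1's */
        for (;;) {                       /* lengthen all codes in V2's chain */
            CODESIZE[v2]++;
            if (OTHERS[v2] == -1)
                break;
            v2 = OTHERS[v2];
        }
    }
}
```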
Once the code lengths for each symbol have been obtained, the number of codes of each length is obtained using the
procedure in Figure K.2. The count for each size is contained in the list, BITS. The counts in BITS are zero at the start of
the procedure. The procedure assumes that the probabilities are large enough that code lengths greater than 32 bits never
occur. Note that until the final Adjust_BITS procedure is complete, BITS may have more than the 16 entries required in
the table specification (see Annex C).
[Flow chart: Count_BITS. For I = 0 to 256, if CODESIZE(I) ≠ 0 then BITS(CODESIZE(I)) = BITS(CODESIZE(I)) + 1; then invoke Adjust_BITS.]
Figure K.2 – Procedure to find the number of codes of each size
Figure K.3 gives the procedure for adjusting the BITS list so that no code is longer than 16 bits. Since symbols are paired
for the longest Huffman code, the symbols are removed from this length category two at a time. The prefix for the pair
(which is one bit shorter) is allocated to one of the pair; then (skipping the BITS entry for that prefix length) a code word
from the next shortest non-zero BITS entry is converted into a prefix for two code words one bit longer. After the BITS
list is reduced to a maximum code length of 16 bits, the last step removes the reserved code point from the code length
count.
[Flow chart: Adjust_BITS. I = 32; while I > 16: if BITS(I) > 0, set J = I – 2 and decrement J until BITS(J) > 0, then BITS(I) = BITS(I) – 2, BITS(I – 1) = BITS(I – 1) + 1, BITS(J + 1) = BITS(J + 1) + 2, BITS(J) = BITS(J) – 1, and test BITS(I) again; otherwise I = I – 1. When I = 16, decrement I while BITS(I) = 0, then BITS(I) = BITS(I) – 1 to remove the reserved code point.]
Figure K.3 – Procedure for limiting code lengths to 16 bits
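A non-normative C sketch of Adjust_BITS, with BITS indexed 1 to 32 as in the flow charts:

```c
/* Sketch of Figure K.3: limit all code lengths to 16 bits, then remove
   the reserved code point from the length counts. */
void adjust_bits(int BITS[33])
{
    int i = 32;
    for (;;) {
        if (BITS[i] > 0) {
            int j = i - 2;        /* find next shorter non-zero length */
            while (BITS[j] == 0)
                j--;
            BITS[i] -= 2;         /* remove a pair from the longest length */
            BITS[i - 1] += 1;     /* one symbol moves to the prefix length */
            BITS[j + 1] += 2;     /* two codes one bit longer than length j */
            BITS[j] -= 1;         /* one code of length j becomes a prefix */
        } else if (--i == 16) {
            while (BITS[i] == 0)  /* find the longest remaining length */
                i--;
            BITS[i] -= 1;         /* remove the reserved code point */
            return;
        }
    }
}
```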
The input values are sorted according to code size as shown in Figure K.4. HUFFVAL is the list containing the input
values associated with each code word, in order of increasing code length.
At this point, the list of code lengths (BITS) and the list of values (HUFFVAL) can be used to generate the code tables.
These procedures are described in Annex C.
[Flow chart: Sort_input. K = 0; for I = 1 to 32: for J = 0 to 255: if CODESIZE(J) = I then HUFFVAL(K) = J and K = K + 1.]
Figure K.4 – Sorting of input values according to code size
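Non-normative C sketches of Count_BITS and Sort_input:

```c
/* Sketch of Figure K.2: count the number of codes of each size.
   BITS is indexed 1..32; Adjust_BITS (Figure K.3) is applied afterwards. */
void count_bits(const int CODESIZE[257], int BITS[33])
{
    for (int i = 0; i <= 32; i++)
        BITS[i] = 0;
    for (int v = 0; v <= 256; v++)
        if (CODESIZE[v] != 0)
            BITS[CODESIZE[v]]++;
}

/* Sketch of Figure K.4: list the symbol values in order of increasing
   code length; HUFFVAL receives at most 256 entries. */
void sort_input(const int CODESIZE[257], int HUFFVAL[256])
{
    int k = 0;
    for (int i = 1; i <= 32; i++)
        for (int j = 0; j <= 255; j++)
            if (CODESIZE[j] == i)
                HUFFVAL[k++] = j;
}
```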
K.3 Typical Huffman tables for 8-bit precision luminance and chrominance
Huffman table-specification syntax is specified in B.2.4.2.
K.3.1 Typical Huffman tables for the DC coefficient differences
Tables K.3 and K.4 give Huffman tables for the DC coefficient differences which have been developed from the average
statistics of a large set of video images with 8-bit precision. Table K.3 is appropriate for luminance components and Table
K.4 is appropriate for chrominance components. Although there are no default tables, these tables may prove to be useful
for many applications.
Table K.3 – Table for luminance DC coefficient differences

  Category   Code length   Code word
   0         2             00
   1         3             010
   2         3             011
   3         3             100
   4         3             101
   5         3             110
   6         4             1110
   7         5             11110
   8         6             111110
   9         7             1111110
  10         8             11111110
  11         9             111111110
Table K.4 – Table for chrominance DC coefficient differences

  Category   Code length   Code word
   0          2            00
   1          2            01
   2          2            10
   3          3            110
   4          4            1110
   5          5            11110
   6          6            111110
   7          7            1111110
   8          8            11111110
   9          9            111111110
  10         10            1111111110
  11         11            11111111110

K.3.2 Typical Huffman tables for the AC coefficients
Tables K.5 and K.6 give Huffman tables for the AC coefficients which have been developed from the average statistics of
a large set of images with 8-bit precision. Table K.5 is appropriate for luminance components and Table K.6 is appropriate
for chrominance components. Although there are no default tables, these tables may prove to be useful for many
applications.
Table K.5 – Table for luminance AC coefficients (sheet 1 of 4)

  Run/Size    Code length   Code word
  0/0 (EOB)    4            1010
  0/1          2            00
  0/2          2            01
  0/3          3            100
  0/4          4            1011
  0/5          5            11010
  0/6          7            1111000
  0/7          8            11111000
  0/8         10            1111110110
  0/9         16            1111111110000010
  0/A         16            1111111110000011
  1/1          4            1100
  1/2          5            11011
  1/3          7            1111001
  1/4          9            111110110
  1/5         11            11111110110
  1/6         16            1111111110000100
  1/7         16            1111111110000101
  1/8         16            1111111110000110
  1/9         16            1111111110000111
  1/A         16            1111111110001000
  2/1          5            11100
  2/2          8            11111001
  2/3         10            1111110111
  2/4         12            111111110100
  2/5         16            1111111110001001
  2/6         16            1111111110001010
  2/7         16            1111111110001011
  2/8         16            1111111110001100
  2/9         16            1111111110001101
  2/A         16            1111111110001110
  3/1          6            111010
  3/2          9            111110111
  3/3         12            111111110101
  3/4         16            1111111110001111
  3/5         16            1111111110010000
  3/6         16            1111111110010001
  3/7         16            1111111110010010
  3/8         16            1111111110010011
  3/9         16            1111111110010100
  3/A         16            1111111110010101
Table K.5 (sheet 2 of 4)

  Run/Size    Code length   Code word
  4/1          6            111011
  4/2         10            1111111000
  4/3         16            1111111110010110
  4/4         16            1111111110010111
  4/5         16            1111111110011000
  4/6         16            1111111110011001
  4/7         16            1111111110011010
  4/8         16            1111111110011011
  4/9         16            1111111110011100
  4/A         16            1111111110011101
  5/1          7            1111010
  5/2         11            11111110111
  5/3         16            1111111110011110
  5/4         16            1111111110011111
  5/5         16            1111111110100000
  5/6         16            1111111110100001
  5/7         16            1111111110100010
  5/8         16            1111111110100011
  5/9         16            1111111110100100
  5/A         16            1111111110100101
  6/1          7            1111011
  6/2         12            111111110110
  6/3         16            1111111110100110
  6/4         16            1111111110100111
  6/5         16            1111111110101000
  6/6         16            1111111110101001
  6/7         16            1111111110101010
  6/8         16            1111111110101011
  6/9         16            1111111110101100
  6/A         16            1111111110101101
  7/1          8            11111010
  7/2         12            111111110111
  7/3         16            1111111110101110
  7/4         16            1111111110101111
  7/5         16            1111111110110000
  7/6         16            1111111110110001
  7/7         16            1111111110110010
  7/8         16            1111111110110011
  7/9         16            1111111110110100
  7/A         16            1111111110110101
  8/1          9            111111000
  8/2         15            111111111000000
Table K.5 (sheet 3 of 4)

  Run/Size    Code length   Code word
  8/3         16            1111111110110110
  8/4         16            1111111110110111
  8/5         16            1111111110111000
  8/6         16            1111111110111001
  8/7         16            1111111110111010
  8/8         16            1111111110111011
  8/9         16            1111111110111100
  8/A         16            1111111110111101
  9/1          9            111111001
  9/2         16            1111111110111110
  9/3         16            1111111110111111
  9/4         16            1111111111000000
  9/5         16            1111111111000001
  9/6         16            1111111111000010
  9/7         16            1111111111000011
  9/8         16            1111111111000100
  9/9         16            1111111111000101
  9/A         16            1111111111000110
  A/1          9            111111010
  A/2         16            1111111111000111
  A/3         16            1111111111001000
  A/4         16            1111111111001001
  A/5         16            1111111111001010
  A/6         16            1111111111001011
  A/7         16            1111111111001100
  A/8         16            1111111111001101
  A/9         16            1111111111001110
  A/A         16            1111111111001111
  B/1         10            1111111001
  B/2         16            1111111111010000
  B/3         16            1111111111010001
  B/4         16            1111111111010010
  B/5         16            1111111111010011
  B/6         16            1111111111010100
  B/7         16            1111111111010101
  B/8         16            1111111111010110
  B/9         16            1111111111010111
  B/A         16            1111111111011000
  C/1         10            1111111010
  C/2         16            1111111111011001
  C/3         16            1111111111011010
  C/4         16            1111111111011011
Table K.5 (sheet 4 of 4)

Run/Size    Code length   Code word
C/5             16        1111111111011100
C/6             16        1111111111011101
C/7             16        1111111111011110
C/8             16        1111111111011111
C/9             16        1111111111100000
C/A             16        1111111111100001
D/1             11        11111111000
D/2             16        1111111111100010
D/3             16        1111111111100011
D/4             16        1111111111100100
D/5             16        1111111111100101
D/6             16        1111111111100110
D/7             16        1111111111100111
D/8             16        1111111111101000
D/9             16        1111111111101001
D/A             16        1111111111101010
E/1             16        1111111111101011
E/2             16        1111111111101100
E/3             16        1111111111101101
E/4             16        1111111111101110
E/5             16        1111111111101111
E/6             16        1111111111110000
E/7             16        1111111111110001
E/8             16        1111111111110010
E/9             16        1111111111110011
E/A             16        1111111111110100
F/0 (ZRL)       11        11111111001
F/1             16        1111111111110101
F/2             16        1111111111110110
F/3             16        1111111111110111
F/4             16        1111111111111000
F/5             16        1111111111111001
F/6             16        1111111111111010
F/7             16        1111111111111011
F/8             16        1111111111111100
F/9             16        1111111111111101
F/A             16        1111111111111110
Table K.6 – Table for chrominance AC coefficients (sheet 1 of 4)

Run/Size    Code length   Code word
0/0 (EOB)        2        00
0/1              2        01
0/2              3        100
0/3              4        1010
0/4              5        11000
0/5              5        11001
0/6              6        111000
0/7              7        1111000
0/8              9        111110100
0/9             10        1111110110
0/A             12        111111110100
1/1              4        1011
1/2              6        111001
1/3              8        11110110
1/4              9        111110101
1/5             11        11111110110
1/6             12        111111110101
1/7             16        1111111110001000
1/8             16        1111111110001001
1/9             16        1111111110001010
1/A             16        1111111110001011
2/1              5        11010
2/2              8        11110111
2/3             10        1111110111
2/4             12        111111110110
2/5             15        111111111000010
2/6             16        1111111110001100
2/7             16        1111111110001101
2/8             16        1111111110001110
2/9             16        1111111110001111
2/A             16        1111111110010000
3/1              5        11011
3/2              8        11111000
3/3             10        1111111000
3/4             12        111111110111
3/5             16        1111111110010001
3/6             16        1111111110010010
3/7             16        1111111110010011
3/8             16        1111111110010100
3/9             16        1111111110010101
3/A             16        1111111110010110
4/1              6        111010
Table K.6 (sheet 2 of 4)

Run/Size    Code length   Code word
4/2              9        111110110
4/3             16        1111111110010111
4/4             16        1111111110011000
4/5             16        1111111110011001
4/6             16        1111111110011010
4/7             16        1111111110011011
4/8             16        1111111110011100
4/9             16        1111111110011101
4/A             16        1111111110011110
5/1              6        111011
5/2             10        1111111001
5/3             16        1111111110011111
5/4             16        1111111110100000
5/5             16        1111111110100001
5/6             16        1111111110100010
5/7             16        1111111110100011
5/8             16        1111111110100100
5/9             16        1111111110100101
5/A             16        1111111110100110
6/1              7        1111001
6/2             11        11111110111
6/3             16        1111111110100111
6/4             16        1111111110101000
6/5             16        1111111110101001
6/6             16        1111111110101010
6/7             16        1111111110101011
6/8             16        1111111110101100
6/9             16        1111111110101101
6/A             16        1111111110101110
7/1              7        1111010
7/2             11        11111111000
7/3             16        1111111110101111
7/4             16        1111111110110000
7/5             16        1111111110110001
7/6             16        1111111110110010
7/7             16        1111111110110011
7/8             16        1111111110110100
7/9             16        1111111110110101
7/A             16        1111111110110110
8/1              8        11111001
8/2             16        1111111110110111
8/3             16        1111111110111000
Table K.6 (sheet 3 of 4)

Run/Size    Code length   Code word
8/4             16        1111111110111001
8/5             16        1111111110111010
8/6             16        1111111110111011
8/7             16        1111111110111100
8/8             16        1111111110111101
8/9             16        1111111110111110
8/A             16        1111111110111111
9/1              9        111110111
9/2             16        1111111111000000
9/3             16        1111111111000001
9/4             16        1111111111000010
9/5             16        1111111111000011
9/6             16        1111111111000100
9/7             16        1111111111000101
9/8             16        1111111111000110
9/9             16        1111111111000111
9/A             16        1111111111001000
A/1              9        111111000
A/2             16        1111111111001001
A/3             16        1111111111001010
A/4             16        1111111111001011
A/5             16        1111111111001100
A/6             16        1111111111001101
A/7             16        1111111111001110
A/8             16        1111111111001111
A/9             16        1111111111010000
A/A             16        1111111111010001
B/1              9        111111001
B/2             16        1111111111010010
B/3             16        1111111111010011
B/4             16        1111111111010100
B/5             16        1111111111010101
B/6             16        1111111111010110
B/7             16        1111111111010111
B/8             16        1111111111011000
B/9             16        1111111111011001
B/A             16        1111111111011010
C/1              9        111111010
C/2             16        1111111111011011
C/3             16        1111111111011100
C/4             16        1111111111011101
C/5             16        1111111111011110
Table K.6 (sheet 4 of 4)

Run/Size    Code length   Code word
C/6             16        1111111111011111
C/7             16        1111111111100000
C/8             16        1111111111100001
C/9             16        1111111111100010
C/A             16        1111111111100011
D/1             11        11111111001
D/2             16        1111111111100100
D/3             16        1111111111100101
D/4             16        1111111111100110
D/5             16        1111111111100111
D/6             16        1111111111101000
D/7             16        1111111111101001
D/8             16        1111111111101010
D/9             16        1111111111101011
D/A             16        1111111111101100
E/1             14        11111111100000
E/2             16        1111111111101101
E/3             16        1111111111101110
E/4             16        1111111111101111
E/5             16        1111111111110000
E/6             16        1111111111110001
E/7             16        1111111111110010
E/8             16        1111111111110011
E/9             16        1111111111110100
E/A             16        1111111111110101
F/0 (ZRL)       10        1111111010
F/1             15        111111111000011
F/2             16        1111111111110110
F/3             16        1111111111110111
F/4             16        1111111111111000
F/5             16        1111111111111001
F/6             16        1111111111111010
F/7             16        1111111111111011
F/8             16        1111111111111100
F/9             16        1111111111111101
F/A             16        1111111111111110
K.3.3
Huffman table-specification examples
K.3.3.1 Specification of typical tables for DC difference coding
A set of typical tables for DC component coding is given in K.3.1. The specification of these tables is as follows:
For Table K.3 (for luminance DC coefficients), the 16 bytes which specify the list of code lengths for the table are
X'00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00'
The set of values following this list is
X'00 01 02 03 04 05 06 07 08 09 0A 0B'
For Table K.4 (for chrominance DC coefficients), the 16 bytes which specify the list of code lengths for the table are
X'00 03 01 01 01 01 01 01 01 01 01 00 00 00 00 00'
The set of values following this list is
X'00 01 02 03 04 05 06 07 08 09 0A 0B'
K.3.3.2 Specification of typical tables for AC coefficient coding
A set of typical tables for AC component coding is given in K.3.2. The specification of these tables is as follows:
For Table K.5 (for luminance AC coefficients), the 16 bytes which specify the list of code lengths for the table are
X'00 02 01 03 03 02 04 03 05 05 04 04 00 00 01 7D'
The set of values which follows this list is
X'01 02 03 00 04 11 05 12 21 31 41 06 13 51 61 07'
X'22 71 14 32 81 91 A1 08 23 42 B1 C1 15 52 D1 F0'
X'24 33 62 72 82 09 0A 16 17 18 19 1A 25 26 27 28'
X'29 2A 34 35 36 37 38 39 3A 43 44 45 46 47 48 49'
X'4A 53 54 55 56 57 58 59 5A 63 64 65 66 67 68 69'
X'6A 73 74 75 76 77 78 79 7A 83 84 85 86 87 88 89'
X'8A 92 93 94 95 96 97 98 99 9A A2 A3 A4 A5 A6 A7'
X'A8 A9 AA B2 B3 B4 B5 B6 B7 B8 B9 BA C2 C3 C4 C5'
X'C6 C7 C8 C9 CA D2 D3 D4 D5 D6 D7 D8 D9 DA E1 E2'
X'E3 E4 E5 E6 E7 E8 E9 EA F1 F2 F3 F4 F5 F6 F7 F8'
X'F9 FA'
158
CCITT Rec. T.81 (1992 E)
ISO/IEC 10918-1 : 1993(E)
For Table K.6 (for chrominance AC coefficients), the 16 bytes which specify the list of code lengths for the table are
X'00 02 01 02 04 04 03 04 07 05 04 04 00 01 02 77'
The set of values which follows this list is:
X'00 01 02 03 11 04 05 21 31 06 12 41 51 07 61 71'
X'13 22 32 81 08 14 42 91 A1 B1 C1 09 23 33 52 F0'
X'15 62 72 D1 0A 16 24 34 E1 25 F1 17 18 19 1A 26'
X'27 28 29 2A 35 36 37 38 39 3A 43 44 45 46 47 48'
X'49 4A 53 54 55 56 57 58 59 5A 63 64 65 66 67 68'
X'69 6A 73 74 75 76 77 78 79 7A 82 83 84 85 86 87'
X'88 89 8A 92 93 94 95 96 97 98 99 9A A2 A3 A4 A5'
X'A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7 B8 B9 BA C2 C3'
X'C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5 D6 D7 D8 D9 DA'
X'E2 E3 E4 E5 E6 E7 E8 E9 EA F2 F3 F4 F5 F6 F7 F8'
X'F9 FA'
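The 16-byte lists above are the BITS lists of Annex C, and the value lists are the corresponding HUFFVAL lists; the code lengths and code words of Tables K.3 through K.6 can be regenerated from them. A minimal C sketch of that regeneration, following the logic of the code-generation procedures of Annex C (variable names are illustrative):

    /* Minimal sketch: regenerate Huffman code lengths and code words
     * from a 16-entry BITS list (Annex C).  huffsize[k] and huffcode[k]
     * then correspond, in order, to the k-th entry of the HUFFVAL list
     * that follows BITS in the table specification.
     */
    static void generate_codes(const unsigned char bits[16],
                               int huffsize[256], unsigned huffcode[256],
                               int *count)
    {
        int k = 0;
        for (int l = 1; l <= 16; l++)            /* expand BITS to sizes */
            for (int i = 0; i < bits[l - 1]; i++)
                huffsize[k++] = l;
        *count = k;

        unsigned code = 0;                        /* assign code words    */
        int size = huffsize[0];
        for (int j = 0; j < k; j++) {
            while (huffsize[j] > size) {          /* longer code: shift   */
                code <<= 1;
                size++;
            }
            huffcode[j] = code++;
        }
    }

Applied to the list for Table K.3, for example, this reproduces one 2-bit code, five 3-bit codes, and so on, exactly as tabulated in K.3.1.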
K.4
Additional information on arithmetic coding
K.4.1
Test sequence for a small data set for the arithmetic coder
The following 256-bit test sequence (in hexadecimal form) is structured to test many of the encoder and decoder paths:
X'00020051 000000C0 0352872A AAAAAAAA 82C02000 FCD79EF6 74EAABF7 697EE74C'
Tables K.7 and K.8 provide a symbol-by-symbol list of the arithmetic encoder and decoder operation. In these tables the
event count, EC, is listed first, followed by the value of Qe used in encoding and decoding that event. The decision D to
be encoded (and decoded) is listed next. The column labeled MPS contains the sense of the MPS, and if it is followed by
a CE (in the “CX” column), the conditional MPS/LPS exchange occurs when encoding and decoding the decision (see
Figures D.3, D.4 and D.17). The contents of the A and C registers are the values before the event is encoded and decoded.
ST is the number of X’FF’ bytes stacked in the encoder waiting for a resolution of the carry-over. Note that the A register
is always greater than X’7FFF’. (The starting value has an implied value of X’10000’.)
In the encoder test, the code bytes (B) are listed if they were completed during the coding of the preceding event. If
additional bytes follow, they were also completed during the coding of the preceding event. If a byte is listed in the
Bx column, the preceding byte in column B was modified by a carry-over.
In the decoder the code bytes are listed if they were placed in the code register just prior to the event EC.
For this file the coded bit count is 240, including the overhead to flush the final data from the C register. When the
marker X’FFD9’ is appended, a total of 256 bits are output. The actual compressed data sequence for the encoder is (in
hexadecimal form)
X'655B5144 F7969D51 7855BFFF 00FC5184 C7CEF939 00287D46 708ECBC0 F6FFD900'
Table K.7 – Encoder test sequence (sheet 1 of 7)
EC
D
MPS
11
12
13
14
15
16
17
18
19
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
0
1
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
CX
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
ST
5A1D
5A1D
2586
2586
1114
1114
1114
1114
1114
080B
080B
080B
080B
080B
080B
17B9
1182
1182
1182
1182
1182
0CEF
0CEF
0CEF
0CEF
0CEF
1518
1518
1AA9
1AA9
174E
174E
1AA9
1AA9
1AA9
174E
0000
A5E3
B43A
8EB4
D25C
C148
B034
9F20
8E0C
F9F0
F1E5
E9DA
E1CF
D9C4
D1B9
80B0
D1EE
C06C
AEEA
9D68
8BE6
F4C8
E7D9
DAEA
CDFB
C10C
CEF0
B9D8
A8C0
8E17
E6DC
CF8E
BA70
9FC7
851E
D4EA
00000000
00000000
0000978C
0000978C
00012F18
00012F18
00012F18
00012F18
00012F18
00025E30
00025E30
00025E30
00025E30
00025E30
00025E30
00327DE0
0064FBC0
0064FBC0
0064FBC0
0064FBC0
0064FBC0
00C9F780
00C9F780
00C9F780
00C9F780
00C9F780
000AB9D0
000AB9D0
005AF480
005AF480
00B5E900
00B5E900
00050A00
00050A00
00050A00
000A1400
11
11
10
10
19
19
19
19
19
18
18
18
18
18
18
14
13
13
13
13
13
12
12
12
12
12
16
16
13
13
12
12
17
17
17
16
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Bx
B
65
5B
Table K.7 – Encoder test sequence (sheet 2 of 7)
EC
D
MPS
37
38
0
0
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0
0
0
0
0
1
1
0
CX
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
ST
0
0
174E
174E
BD9C
A64E
000A1400
000A1400
16
16
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
174E
1424
1424
1424
1424
1424
1424
119C
119C
119C
119C
119C
119C
119C
0F6B
0F6B
0F6B
0F6B
0F6B
1424
1AA9
1AA9
174E
174E
174E
174E
1424
1424
1424
1424
1424
1424
119C
1424
1AA9
8F00
EF64
DB40
C71C
B2F8
9ED4
8AB0
ED18
DB7C
C9E0
B844
A6A8
950C
8370
E3A8
D43D
C4D2
B567
A5FC
F6B0
A120
8677
D79C
C04E
A900
91B2
F4C8
E0A4
CC80
B85C
A438
9014
F7E0
8CE0
A120
000A1400
00142800
00142800
00142800
00142800
00142800
00142800
00285000
00285000
00285000
00285000
00285000
00285000
00285000
0050A000
0050A000
0050A000
0050A000
0050A000
00036910
00225CE0
00225CE0
0044B9C0
0044B9C0
0044B9C0
0044B9C0
00897380
00897380
00897380
00897380
00897380
00897380
0112E700
001E6A20
00F716E0
16
15
5
5
5
5
5
4
4
4
4
4
4
4
3
3
3
3
3
7
4
4
3
3
3
3
2
2
2
2
2
2
1
6
3
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Bx
B
51
44
Table K.7 – Encoder test sequence (sheet 3 of 7)
EC
D
MPS
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
100
101
102
103
104
105
106
107
108
109
110
1
0
1
0
0
1
0
1
0
0
0
0
1
1
1
0
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
CX
CE
CE
CE
CE
CE
CE
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
ST
1AA9
2516
2516
299A
2516
2516
299A
2516
299A
2516
2516
2516
1EDF
2516
299A
32B4
2E17
2E17
32B4
32B4
3C3D
3C3D
415E
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
8677
D548
B032
9458
D57C
B066
9458
D57C
9458
D57C
B066
8B50
CC74
F6F8
9458
A668
E768
B951
B85C
85A8
CAD0
8E93
F0F4
AF96
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
00F716E0
00041570
00041570
00128230
00250460
00250460
00963EC0
012C7D80
0004B798
00096F30
00096F30
00096F30
0012DE60
009C5FA8
0274C628
0004C398
00098730
00098730
002849A8
002849A8
00A27270
00A27270
00031318
00031318
000702A0
000E7E46
001D92B4
003B9E6E
0077D304
00F01F0E
01E0D444
0002218E
0004D944
000A2B8E
0014ED44
002A538E
00553D44
3
8
8
6
5
5
3
2
8
7
7
7
6
3
1
7
6
6
4
4
2
2
8
8
7
6
5
4
3
2
1
8
7
6
5
4
3
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Bx
B
F7
96
9D
51
78
Table K.7 – Encoder test sequence (sheet 4 of 7)
EC
D
MPS
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
0
0
0
1
0
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
CX
CE
CE
CE
CE
CE
CE
CE
CE
CE
CE
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
ST
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
3C3D
375E
32B4
32B4
3C3D
3C3D
415E
4639
415E
3C3D
375E
32B4
32B4
32B4
2E17
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
9628
B3D6
F8F0
C63C
CAD0
8E93
F0F4
82BC
8C72
9628
B3D6
F8F0
C63C
9388
C1A8
00AAF38E
01567D44
0005738E
000B7D44
0017738E
002F7D44
005F738E
00BF7D44
017F738E
02FF7D44
0007738E
000F7D44
001F738E
003F7D44
007F738E
00FF7D44
01FF738E
03FF7D44
0007738E
000F7D44
001F738E
003EE71C
007DCE38
00FB9C70
00FB9C70
03F0BFE0
03F0BFE0
000448D8
0009F0DC
00145ABE
0028B57C
00516AF8
00A2D5F0
00A2D5F0
00A2D5F0
0145ABE0
2
1
8
7
6
5
4
3
2
1
8
7
6
5
4
3
2
1
8
7
6
5
4
3
3
1
1
7
6
5
4
3
2
2
2
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
Bx
B
55
BF
FF00FC
Table K.7 – Encoder test sequence (sheet 5 of 7)
EC
D
MPS
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
1
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
0
0
1
1
0
1
0
1
1
1
1
0
0
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
CX
CE
CE
CE
CE
CE
CE
CE
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
ST
2E17
32B4
32B4
2E17
299A
299A
299A
2516
2516
2516
1EDF
1EDF
1EDF
1EDF
1AA9
2516
299A
32B4
3C3D
415E
4639
415E
3C3D
415E
4639
415E
4639
415E
4639
4B85
504F
5522
504F
4B85
504F
5522
59EB
9391
B85C
85A8
A5E8
EFA2
C608
9C6E
E5A8
C092
9B7C
ECCC
CDED
AF0E
902F
E2A0
D548
9458
A668
CAD0
F0F4
82BC
8C72
9628
F0F4
82BC
8C72
82BC
8C72
82BC
F20C
970A
8D76
AA44
B3EA
970A
8D76
E150
0145ABE0
00084568
00084568
00108AD0
002115A0
002115A0
002115A0
00422B40
00422B40
00422B40
00845680
00845680
00845680
00845680
0108AD00
000BA7B8
00315FA8
00C72998
031E7530
000C0F0C
00197D44
0033738E
0066E71C
019D041C
033B6764
000747CE
000F25C4
001EC48E
003E1F44
00F87D10
01F2472E
03E48E5C
00018D60
00031AC0
0007064A
000E0C94
00383250
1
7
7
6
5
5
5
4
4
4
3
3
3
3
2
7
5
3
1
7
6
5
4
2
1
8
7
6
5
3
2
1
8
7
6
5
3
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Bx
B
51
84
C7
CE
F9
Table K.7 – Encoder test sequence (sheet 6 of 7)
EC
D
MPS
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
0
1
1
1
1
0
1
1
0
0
1
1
1
0
1
0
0
1
1
1
0
1
0
1
0
1
0
1
0
1
0
1
1
1
1
1
1
1
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
CX
CE
CE
CE
CE
CE
CE
CE
CE
CE
CE
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
ST
59EB
59EB
59EB
5522
504F
4B85
504F
4B85
4639
4B85
504F
4B85
4639
415E
4639
415E
4639
4B85
4B85
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
3C3D
375E
32B4
32B4
B3D6
B3D6
B3D6
B3D6
BD68
DA32
970A
A09E
AA32
8C72
81DA
A09E
AA32
C7F2
82BC
8C72
82BC
F20C
A687
B604
DF96
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
9628
B3D6
F8F0
C63C
0071736A
00E39AAA
0007E92A
000FD254
001FA4A8
003F4950
007FAFFA
00FFED6A
01FFDAD4
04007D9A
0000FB34
0002597E
0004B2FC
000965F8
0013D918
00282B36
0050EC94
0003B250
0003B250
000764A0
000EC940
001ECEF0
003E16E6
007CC3F4
00FA00EE
01F49804
0001A90E
0003E844
0008498E
00112944
0022CB8E
00462D44
008CD38E
0119A71C
00034E38
00069C70
00069C70
2
1
8
7
6
5
4
3
2
1
8
7
6
5
4
3
2
8
8
7
6
5
4
3
2
1
8
7
6
5
4
3
2
1
8
7
7
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Bx
B
38
39
00
28
7D
46
Table K.7 – Encoder test sequence (sheet 7 of 7)
EC
D
MPS
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
Flush:
0
1
1
1
0
1
1
0
1
0
0
1
0
1
1
1
1
1
1
0
1
1
1
0
0
1
1
1
0
1
0
0
1
1
0
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
CX
CE
CE
CE
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
ST
32B4
3C3D
3C3D
375E
32B4
3C3D
3C3D
375E
3C3D
3C3D
415E
4639
415E
4639
415E
3C3D
375E
32B4
32B4
32B4
3C3D
3C3D
375E
32B4
3C3D
415E
415E
3C3D
3C3D
415E
415E
4639
4B85
4B85
4639
4B85
9388
CAD0
8E93
A4AC
DA9C
CAD0
8E93
A4AC
DD78
A13B
F0F4
82BC
8C72
82BC
8C72
9628
B3D6
F8F0
C63C
9388
CAD0
8E93
A4AC
DA9C
CAD0
F0F4
AF96
DC70
A033
F0F4
AF96
82BC
F20C
A687
B604
8C72
81DA
00069C70
001BF510
001BF510
0037EA20
006FD440
01C1F0A0
01C1F0A0
0003E140
00113A38
00113A38
00467CD8
008E58DC
011D2ABE
023AEBA4
0006504E
000CA09C
00194138
00328270
00328270
00328270
00CB8D10
00CB8D10
01971A20
032E3440
000B70A0
002FFCCC
002FFCCC
005FF998
005FF998
01817638
01817638
0303C8E0
000F2380
000F2380
001E4700
003D6D96
007ADB2C
7
5
5
4
3
1
1
8
6
6
4
3
2
1
8
7
6
5
5
5
3
3
2
1
7
5
5
4
4
2
2
1
7
7
6
5
4
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Bx
B
70
8E
CB
C0
F6
FFD9
Table K.8 – Decoder test sequence (sheet 1 of 7)
EC
D
MPS
11
12
13
14
15
16
17
18
19
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
0
1
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
CX
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
B
5A1D
5A1D
2586
2586
1114
1114
1114
1114
1114
080B
080B
080B
080B
080B
080B
17B9
1182
1182
1182
1182
1182
0CEF
0CEF
0CEF
0CEF
0CEF
1518
1518
1AA9
1AA9
174E
174E
1AA9
1AA9
1AA9
174E
174E
174E
174E
1424
0000
A5E3
B43A
8EB4
D25C
C148
B034
9F20
8E0C
F9F0
F1E5
E9DA
E1CF
D9C4
D1B9
80B0
D1EE
C06C
AEEA
9D68
8BE6
F4C8
E7D9
DAEA
CDFB
C10C
CEF0
B9D8
A8C0
8E17
E6DC
CF8E
BA70
9FC7
851E
D4EA
BD9C
A64E
8F00
EF64
655B0000
655B0000
332AA200
332AA200
66554400
66554400
66554400
66554400
66554400
CCAA8800
CCAA8800
CCAA8800
CCAA8800
CCAA8800
CCAA8800
2FC88000
5F910000
5F910000
5F910000
5F910000
5F910000
BF228800
BF228800
BF228800
BF228800
BF228800
B0588000
B0588000
5CC40000
5CC40000
B989EE00
B989EE00
0A4F7000
0A4F7000
0A4F7000
149EE000
149EE000
149EE000
149EE000
293DC000
0
0
7
7
6
6
6
6
6
5
5
5
5
5
5
1
0
0
0
0
0
7
7
7
7
7
3
3
0
0
7
7
4
4
4
3
3
3
3
2
65 5B
51
44
F7
Table K.8 – Decoder test sequence (sheet 2 of 7)
EC
D
MPS
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0
0
0
0
0
1
1
0
1
0
1
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
CX
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
1424
1424
1424
1424
1424
119C
119C
119C
119C
119C
119C
119C
0F6B
0F6B
0F6B
0F6B
0F6B
1424
1AA9
1AA9
174E
174E
174E
174E
1424
1424
1424
1424
1424
1424
119C
1424
1AA9
1AA9
2516
2516
299A
2516
2516
299A
DB40
C71C
B2F8
9ED4
8AB0
ED18
DB7C
C9E0
B844
A6A8
950C
8370
E3A8
D43D
C4D2
B567
A5FC
F6B0
A120
8677
D79C
C04E
A900
91B2
F4C8
E0A4
CC80
B85C
A438
9014
F7E0
8CE0
A120
8677
D548
B032
9458
D57C
B066
9458
293DC000
293DC000
293DC000
293DC000
293DC000
527B8000
527B8000
527B8000
527B8000
527B8000
527B8000
527B8000
A4F70000
A4F70000
A4F70000
A4F70000
A4F70000
E6696000
1EEB0000
1EEB0000
3DD60000
3DD60000
3DD60000
3DD60000
7BAD3A00
7BAD3A00
7BAD3A00
7BAD3A00
7BAD3A00
7BAD3A00
F75A7400
88B3A000
7FBD0000
7FBD0000
9F7A8800
9F7A8800
517A2000
A2F44000
A2F44000
5E910000
2
2
2
2
2
1
1
1
1
1
1
1
0
0
0
0
0
4
1
1
0
0
0
0
7
7
7
7
7
7
6
3
0
0
5
5
3
2
2
0
B
96
9D
51
Table K.8 – Decoder test sequence (sheet 3 of 7)
EC
D
MPS
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
1
0
0
0
0
1
1
1
0
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
120
CX
CE
CE
CE
CE
CE
CE
CE
CE
CE
CE
CE
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
B
2516
299A
2516
2516
2516
1EDF
2516
299A
32B4
2E17
2E17
32B4
32B4
3C3D
3C3D
415E
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
D57C
9458
D57C
B066
8B50
CC74
F6F8
9458
A668
E768
B951
B85C
85A8
CAD0
8E93
F0F4
AF96
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
BD22F000
32F3C000
65E78000
65E78000
65E78000
CBCF0000
F1D00000
7FB95400
53ED5000
A7DAA000
A7DAA000
72828000
72828000
7E3B7E00
7E3B7E00
AF95F800
AF95F800
82BBF000
8C71E000
82BBC000
8C718000
82BB0000
8C71FE00
82BBFC00
8C71F800
82BBF000
8C71E000
82BBC000
8C718000
82BB0000
8C71F800
82BBF000
8C71E000
82BBC000
8C718000
82BB0000
8C700000
82B80000
8C6AA200
82AD4400
7
5
4
4
4
3
0
6
4
3
3
1
1
7
7
5
5
4
3
2
1
0
7
6
5
4
3
2
1
0
7
6
5
4
3
2
1
0
7
6
78
55
BF
FF 00
FC
51
Table K.8 – Decoder test sequence (sheet 4 of 7)
EC
D
MPS
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
1
0
1
0
1
0
1
0
1
0
0
0
0
0
1
0
1
1
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
CX
CE
CE
CE
CE
CE
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
3C3D
375E
32B4
32B4
3C3D
3C3D
415E
4639
415E
3C3D
375E
32B4
32B4
32B4
2E17
2E17
32B4
32B4
2E17
299A
299A
299A
2516
2516
2516
1EDF
1EDF
1EDF
1EDF
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
9628
B3D6
F8F0
C63C
CAD0
8E93
F0F4
82BC
8C72
9628
B3D6
F8F0
C63C
9388
C1A8
9391
B85C
85A8
A5E8
EFA2
C608
9C6E
E5A8
C092
9B7C
ECCC
CDED
AF0E
902F
8C548800
82811000
8BFC2000
81D04000
8A9A8000
7F0D0000
85150800
74021000
6EFE2000
47D44000
16A28000
2D450000
5A8A0000
B5140000
B5140000
86331C00
86331C00
CF747000
3FBCE000
0673C000
0CE78000
19CF0000
339F9C00
339F9C00
339F9C00
673F3800
673F3800
0714E000
0714E000
0E29C000
1C538000
1C538000
1C538000
38A70000
38A70000
38A70000
714E0000
714E0000
714E0000
714E0000
5
4
3
2
1
0
7
6
5
4
3
2
1
0
0
6
6
4
3
2
1
0
7
7
7
6
6
4
4
3
2
2
2
1
1
1
0
0
0
0
B
84
C7
CE
Table K.8 – Decoder test sequence (sheet 5 of 7)
EC
D
MPS
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
1
1
1
1
1
1
0
0
1
1
0
1
0
1
1
1
1
0
0
1
1
1
1
0
1
1
1
1
0
1
1
0
0
1
1
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
CX
CE
CE
CE
CE
CE
CE
CE
CE
CE
CE
CE
CE
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
B
1AA9
2516
299A
32B4
3C3D
415E
4639
415E
3C3D
415E
4639
415E
4639
415E
4639
4B85
504F
5522
504F
4B85
504F
5522
59EB
59EB
59EB
59EB
5522
504F
4B85
504F
4B85
4639
4B85
504F
4B85
4639
415E
4639
415E
4639
E2A0
D548
9458
A668
CAD0
F0F4
82BC
8C72
9628
F0F4
82BC
8C72
82BC
8C72
82BC
F20C
970A
8D76
AA44
B3EA
970A
8D76
E150
B3D6
B3D6
B3D6
B3D6
BD68
DA32
970A
A09E
AA32
8C72
81DA
A09E
AA32
C7F2
82BC
8C72
82BC
E29DF200
D5379000
94164000
A5610000
C6B4E400
E0879000
61E32000
4AC04000
95808000
EE560000
7D800000
81FA0000
6DCC0000
62920000
2EFC0000
BBF00000
2AD25000
55A4A000
3AA14000
75428000
19BB0000
33760000
CDD80000
8CE6FA00
65F7F400
1819E800
3033D000
6067A000
C0CF4000
64448000
3B130000
76268C00
245B1800
48B63000
2E566000
5CACC000
B9598000
658B0000
52100000
0DF8E000
7
4
2
0
6
4
3
2
1
7
6
5
4
3
2
0
7
6
5
4
3
2
0
7
6
5
4
3
2
1
0
7
6
5
4
3
2
1
0
7
F9
39
00
28
7D
46
70
Table K.8 – Decoder test sequence (sheet 6 of 7)
EC
D
MPS
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
1
1
1
0
1
0
1
0
1
0
1
0
1
0
1
1
1
1
1
1
0
1
1
1
0
1
1
0
1
0
0
1
0
1
1
1
1
1
1
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
CX
CE
CE
CE
CE
CE
CE
CE
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
4B85
4B85
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
4639
415E
3C3D
375E
32B4
32B4
32B4
3C3D
3C3D
375E
32B4
3C3D
3C3D
375E
3C3D
3C3D
415E
4639
415E
4639
415E
3C3D
375E
32B4
32B4
32B4
F20C
A687
B604
DF96
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
82BC
8C72
9628
B3D6
F8F0
C63C
9388
CAD0
8E93
A4AC
DA9C
CAD0
8E93
A4AC
DD78
A13B
F0F4
82BC
8C72
82BC
8C72
9628
B3D6
F8F0
C63C
9388
37E38000
37E38000
6FC70000
DF8E0000
82AC0000
8C520000
827C0000
8BF31C00
81BE3800
8A767000
7EC4E000
8483C000
72DF8000
6CB90000
434A0000
0D8F9600
1B1F2C00
363E5800
6C7CB000
6C7CB000
6C7CB000
2EA2C000
2EA2C000
5D458000
BA8B0000
4A8F0000
4A8F0000
951E0000
9F400000
9F400000
E9080000
72E40000
6CC3EC00
435FD800
0DB9B000
1B736000
36E6C000
6DCD8000
6DCD8000
6DCD8000
5
5
4
3
2
1
0
7
6
5
4
3
2
1
0
7
6
5
4
4
4
2
2
1
0
6
6
5
3
3
1
0
7
6
5
4
3
2
2
2
B
8E
CB
C0
F6
Table K.8 – Decoder test sequence (sheet 7 of 7)
EC
D
MPS
CX
241
1
1
242
1
1
Marker detected: zero byte fed to decoder
243
1
1
244
0
1
245
0
1
246
1
1
247
1
1
248
1
1
249
0
1
Marker detected: zero byte fed to decoder
250
1
1
251
0
1
252
0
1
CE
253
1
1
254
1
1
255
0
1
256
0
1
CE
Qe
(hexadecimal)
A
(hexadecimal)
C
(hexadecimal)
CT
3C3D
3C3D
CAD0
8E93
33E60000
33E60000
0
0
375E
32B4
3C3D
415E
415E
3C3D
3C3D
A4AC
DA9C
CAD0
F0F4
AF96
DC70
A033
67CC0000
CF980000
9EC00000
40B40000
40B40000
81680000
81680000
7
6
4
2
2
1
1
415E
415E
4639
4B85
4B85
4639
4B85
F0F4
AF96
82BC
F20C
A687
B604
8C72
75C80000
75C80000
0F200000
3C800000
3C800000
79000000
126A0000
7
7
6
4
4
3
2
K.5
Low-pass downsampling filters for hierarchical coding
In this section simple examples are given of downsampling filters which are compatible with the upsampling filter defined
in J.1.1.2.
Figure K.5 shows the weighting of neighbouring samples for simple one-dimensional horizontal and vertical low-pass
filters. The output of the filter must be normalized by the sum of the neighbourhood weights.
Horizontal:   1  2  1          Vertical:   1
                                           2
                                           1

Figure K.5 – Low-pass filter example
The centre sample in Figure K.5 should be aligned with the left column or top line of the high resolution image when
calculating the left column or top line of the low resolution image. Sample values which are situated outside of the image
boundary are replicated from the sample values at the boundary to provide missing edge values.
If the image being downsampled has an odd width or length, the odd dimension is increased by 1 by sample replication on
the right edge or bottom line before downsampling.
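A minimal C sketch (illustrative, not normative) of the horizontal form of this filter: neighbourhood weights 1-2-1, normalization by the weight sum of 4 (the rounding choice here is an assumption of the sketch), boundary replication, and 2:1 decimation with the centre sample aligned to the left column:

    /* Minimal sketch: 1-2-1 horizontal low-pass filter with 2:1
     * decimation, edge replication and rounding, per K.5.
     * 'in' has 'w' samples; 'out' receives (w + 1) / 2 samples,
     * so an odd width is effectively extended by replication.
     */
    static void downsample_row_121(const unsigned char *in, int w,
                                   unsigned char *out)
    {
        int ow = (w + 1) / 2;
        for (int i = 0; i < ow; i++) {
            int c = 2 * i;                      /* centre on left column */
            int l = (c - 1 < 0)  ? 0     : c - 1;   /* replicate left    */
            int r = (c + 1 >= w) ? w - 1 : c + 1;   /* replicate right   */
            int sum = in[l] + 2 * in[c] + in[r];
            out[i] = (unsigned char)((sum + 2) / 4);  /* normalize by 4  */
        }
    }

The vertical form is identical with the roles of rows and columns exchanged.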
K.6
Domain of applicability of DCT and spatial coding techniques
The DCT coder is intended for lossy coding in a range from quite visible loss to distortion well below the threshold for
visibility. However, in general, DCT-based processes cannot be used for true lossless coding.
The lossless coder is intended for completely lossless coding. The lossless coding process is significantly less effective
than the DCT-based processes for distortions near and above the threshold of visibility.
The point transform of the input to the lossless coder permits a very restricted form of lossy coding with the “lossless”
coder. (The coder is still lossless after the input point transform.) Since the DCT is intended for lossy coding, there may
be some confusion about when this alternative lossy technique should be used.
Lossless coding with a point transformed input is intended for applications which cannot be addressed by DCT coding
techniques. Among these are
–
true lossless coding to a specified precision;
–
lossy coding with precisely defined error bounds;
–
hierarchical progression to a truly lossless final stage.
If lossless coding with a point transformed input is used in applications which can be met effectively by DCT coding, the
results will be significantly less satisfactory. For example, distortion in the form of visible contours usually appears when
precision of the luminance component is reduced to about six bits. For normal image data, this occurs at bit rates well
above those for which the DCT gives outputs which are visually indistinguishable from the source.
K.7
Domain of applicability of the progressive coding modes of operation
Two very different progressive coding modes of operation have been defined, progressive coding of the DCT coefficients
and hierarchical progression. Progressive coding of the DCT coefficients has two complementary procedures, spectral
selection and successive approximation. Because of this diversity of choices, there may be some confusion as to which
method of progression to use for a given application.
K.7.1
Progressive coding of the DCT
In progressive coding of the DCT coefficients two complementary procedures are defined for decomposing the 8 × 8 DCT
coefficient array, spectral selection and successive approximation. Spectral selection partitions the zig-zag array of DCT
coefficients into “bands”, one band being coded in each scan. Successive approximation codes the coefficients with
reduced precision in the first scan; in each subsequent scan the precision is increased by one bit.
A single forward DCT is calculated for these procedures. When all coefficients are coded to full precision, the DCT is the
same as in the sequential mode. Therefore, like the sequential DCT coding, progressive coding of DCT coefficients is
intended for applications which need very good compression for a given level of visual distortion.
The simplest progressive coding technique is spectral selection; indeed, because of this simplicity, some applications may
choose – despite the limited progression that can be achieved – to use only spectral selection. Note, however, that the
absence of high frequency bands typically leads – for a given bit rate – to a significantly lower image quality in the
intermediate stages than can be achieved with the more general progressions. The net coding efficiency at the completion
of the final stage is typically comparable to or slightly less than that achieved with the sequential DCT.
A much more flexible progressive system is attained at some increase in complexity when successive approximation is
added to the spectral selection progression. For a given bit rate, this system typically provides significantly better image
quality than spectral selection alone. The net coding efficiency at the completion of the final stage is typically comparable
to or slightly better than that achieved with the sequential DCT.
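As a concrete illustration (the band boundaries and bit positions below are example choices, not requirements of this Specification), a progressive scan script combining spectral selection with successive approximation can be written in terms of the scan header parameters Ss, Se, Ah and Al of B.2.3:

    /* Illustrative progressive scan script for one component.
     * Ss/Se select the spectral band; Ah/Al are the successive
     * approximation bit positions.  Ah = 0 marks a first scan of
     * a band; each refinement scan has Ah equal to the Al of the
     * preceding scan of that band and Al = Ah - 1.
     */
    struct scan { int Ss, Se, Ah, Al; };

    static const struct scan script[] = {
        { 0,  0, 0, 1 },   /* DC, all but the least significant bit */
        { 1,  5, 0, 2 },   /* first AC band, reduced precision      */
        { 6, 63, 0, 2 },   /* remaining AC band, reduced precision  */
        { 1, 63, 2, 1 },   /* one refinement bit for all AC bands   */
        { 1, 63, 1, 0 },   /* final AC refinement bit               */
        { 0,  0, 1, 0 },   /* DC refinement, least significant bit  */
    };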
K.7.2
Hierarchical progression
Hierarchical progression permits a sequence of outputs of increasing spatial resolution, and also allows refinement of
image quality at a given spatial resolution. Both DCT and spatial versions of the hierarchical progression are allowed, and
progressive coding of DCT coefficients may be used in a frame of the DCT hierarchical progression.
The DCT hierarchical progression is intended for applications which need very good compression for a given level of
visual distortion; the spatial hierarchical progression is intended for applications which need a simple progression with a
truly lossless final stage. Figure K.6 illustrates examples of these two basic hierarchical progressions.
[Figure K.6, not reproduced here, sketches the two basic hierarchical progressions: a DCT path built from DCT (dif) and predicted (dif) stages, yielding bounded error on the reconstructed image, and a lossless path built from a point transform, predicted (dif) and lossless (dif) stages, yielding no error on the reconstructed image.]

Figure K.6 – Sketch of the basic operations of the hierarchical mode
K.7.2.1 DCT Hierarchical progression
If a DCT hierarchical progression uses reduced spatial resolution, the early stages of the progression can have better image
quality for a given bit rate than the early stages of non-hierarchical progressive coding of the DCT coefficients. However,
at the point where the distortion between source and output becomes indistinguishable, the coding efficiency achieved
with a DCT hierarchical progression is typically significantly lower than the coding efficiency achieved with a non-hierarchical progressive coding of the DCT coefficients.
While the hierarchical DCT progression is intended for lossy progressive coding, a final spatial differential coding stage
can be used. When this final stage is used, the output can be almost lossless, limited only by the difference between the
encoder and decoder IDCT implementations. Since IDCT implementations can differ significantly, truly lossless coding
after a DCT hierarchical progression cannot be guaranteed. An important alternative, therefore, is to use the input point
transform of the final lossless differential coding stage to reduce the precision of the differential input. This allows a
bounding of the difference between source and output at a significantly lower cost in coded bits than coding of the full
precision spatial difference would require.
K.7.2.2 Spatial hierarchical progression
If lossless progression is required, a very simple hierarchical progression may be used in which the spatial lossless coder
with point transformed input is used as a first stage. This first stage is followed by one or more spatial differential coding
stages. The first stage should be nearly lossless, such that the low order bits which are truncated by the point transform are
essentially random – otherwise the compression efficiency will be degraded relative to non-progressive lossless coding.
K.8
Suppression of block-to-block discontinuities in decoded images
A simple technique is available for suppressing the block-to-block discontinuities which can occur in images compressed
by DCT techniques.
The first few (five in this example) low frequency DCT coefficients are predicted from the nine DC values of the block
and the eight nearest-neighbour blocks, and the predicted values are used to suppress blocking artifacts in smooth areas of
the image.
The prediction equations for the first five AC coefficients in the zig-zag sequence are obtained as follows:
K.8.1
AC prediction
The sample field in a 3 by 3 array of blocks (each block containing an 8 × 8 array of samples) is modeled by a
two-dimensional second degree polynomial of the form:
P(x,y) = A1 x^2 y^2 + A2 x^2 y + A3 x y^2 + A4 x^2 + A5 x y + A6 y^2 + A7 x + A8 y + A9
The nine coefficients A1 through A9 are uniquely determined by imposing the constraint that the mean of P(x,y) over
each of the nine blocks must yield the correct DC-values.
Applying the DCT to the quadratic field predicting the samples in the central block gives a prediction of the low
frequency AC coefficients depicted in Figure K.7.
DC   x   x
x    x
x

Figure K.7 – DCT array positions of predicted AC coefficients
The prediction equations derived in this manner are as follows:
For the two dimensional array of DC values shown

DC1   DC2   DC3
DC4   DC5   DC6
DC7   DC8   DC9
The unquantized prediction equations are
AC01 = 1,13885 (DC4 – DC6)
AC10 = 1,13885 (DC2 – DC8)
AC20 = 0,27881 (DC2 + DC8 – 2 × DC5)
AC11 = 0,16213 ((DC1 – DC3) – (DC7 – DC9))
AC02 = 0,27881 (DC4 + DC6 – 2 × DC5)
The scaling of the predicted AC coefficients is consistent with the DCT normalization defined in A.3.3.
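A minimal C sketch of these equations (illustrative; the row-by-row layout of the nine DC values in an array is an assumption of the sketch):

    /* Minimal sketch: predict five low-frequency AC coefficients
     * from the 3 x 3 neighbourhood of DC values (K.8.1), with
     * dc[0..8] holding DC1..DC9 row by row.
     */
    static void predict_ac(const double dc[9],
                           double *ac01, double *ac10,
                           double *ac20, double *ac11, double *ac02)
    {
        *ac01 = 1.13885 * (dc[3] - dc[5]);                /* DC4 - DC6     */
        *ac10 = 1.13885 * (dc[1] - dc[7]);                /* DC2 - DC8     */
        *ac20 = 0.27881 * (dc[1] + dc[7] - 2.0 * dc[4]);  /* DC2+DC8-2DC5  */
        *ac11 = 0.16213 * ((dc[0] - dc[2]) - (dc[6] - dc[8]));
        *ac02 = 0.27881 * (dc[3] + dc[5] - 2.0 * dc[4]);  /* DC4+DC6-2DC5  */
    }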
K.8.2
Quantized AC prediction
The prediction equations can be mapped to a form which uses quantized values of the DC coefficients and which
computes quantized AC coefficients using integer arithmetic. The quantized DC coefficients need to be scaled, however,
such that the predicted coefficients have fractional bit precision.
First, the prediction equation coefficients are scaled by 32 and rounded to the nearest integer. Thus,
1,13885 × 32 = 36
0,27881 × 32 = 9
0,16213 × 32 = 5
The multiplicative factors are then scaled by the ratio of the DC and AC quantization factors and rounded appropriately.
The normalization defined for the DCT introduces another factor of 8 in the unquantized DC values. Therefore, in terms
of the quantized DC values, the predicted quantized AC coefficients are given by the equations below. Note that if (for
example) the DC values are scaled by a factor of 4, the AC predictions will have 2 fractional bits of precision relative to
the quantized DCT coefficients.
QAC01 = ( (Rd × Q01) + (36 × Q00 × (QDC4 – QDC6)))/(256 × Q01)
QAC10 = ( (Rd × Q10) + (36 × Q00 × (QDC2 – QDC8)))/(256 × Q10)
QAC20 = ( (Rd × Q20) + ( 9 × Q00 × (QDC2 + QDC8 – 2 × QDC5)))/(256 × Q20)
QAC11 = ( (Rd × Q11) + ( 5 × Q00 × ((QDC1 – QDC3) – (QDC7 – QDC9))))/(256 × Q11)
QAC02 = ( (Rd × Q02) + ( 9 × Q00 × (QDC4 + QDC6 – 2 × QDC5)))/(256 × Q02)
where QDCx and QACxy are the quantized and scaled DC and AC coefficient values. The constant Rd is added to get a
correct rounding in the division. Rd is 128 for positive numerators, and –128 for negative numerators.
Predicted values should not override coded values. Therefore, predicted values for coefficients which are already non-zero
should be set to zero. Predictions should be clamped if they exceed a value which would be quantized to a non-zero value
for the current precision in the successive approximation.
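As an integer-arithmetic illustration of the first of these equations (the helper name and argument types are illustrative; qdc4 and qdc6 are the quantized, scaled DC values described above):

    /* Minimal sketch: quantized prediction of AC01 using integer
     * arithmetic (K.8.2), with the rounding constant Rd chosen by
     * the sign of the numerator.
     */
    static long predict_qac01(long qdc4, long qdc6, long q00, long q01)
    {
        long num = 36L * q00 * (qdc4 - qdc6);
        long rd  = (num >= 0) ? 128L : -128L;   /* rounding constant Rd */
        return (rd * q01 + num) / (256L * q01);
    }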
K.9
Modification of dequantization to improve displayed image quality
For a progression where the first stage successive approximation bit, Al, is set to 3, uniform quantization of the DCT gives
the following quantization and dequantization levels for a sequence of successive approximation scans, as shown in
Figure K.8:
[Figure K.8, not reproduced here, shows for each successive approximation bit position Al = 3, 2, 1, 0 the threshold points “t”, the reconstruction points “r”, and the alternative reconstruction points “x” along a horizontal axis of quantized DCT coefficient values running from –8 to +8.]

Figure K.8 – Illustration of two reconstruction strategies
The column to the left labelled “Al” gives the bit position specified in the scan header. The quantized DCT coefficient magnitudes are therefore divided by 2^Al during that scan.
Referring to the final scan (Al = 0), the points marked with “t” are the threshold values, while the points marked with “r”
are the reconstruction values. The unquantized output is obtained by multiplying the horizontal scale in Figure K.8 by the
quantization value.
The quantization interval for a coefficient value of zero is indicated by the depressed interval of the line. As the bit
position Al is increased, a “fat zero” quantization interval develops around the zero DCT coefficient value. In the limit
where the scaling factor is very large, the zero interval is twice as large as the rest of the quantization intervals.
Two different reconstruction strategies are shown. The points marked “r” are the reconstruction obtained using the normal
rounding rules for the DCT for the complete full precision output. This rule seems to give better image quality when high
bandwidth displays are used. The points marked “x” are an alternative reconstruction which tends to give better images on
lower bandwidth displays. “x” and “r” are the same for slice 0. The system designer must determine which strategy is best
for the display system being used.
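The following C sketch is only meant to make the shape of the two strategies concrete; the exact reconstruction points are those plotted in Figure K.8, and the quarter-interval bias used here for the “x” strategy is an assumed illustration, not a value defined by this Specification:

    /* Illustrative only: two ways to reconstruct a DCT coefficient
     * whose Al low-order bits have not yet been received (K.9).
     * "r": midpoint of the quantization interval (normal rounding);
     * "x": reconstruction biased toward zero (quarter-interval bias
     *      assumed here purely for illustration).
     */
    static long reconstruct(int coef, int al, int q, int toward_zero)
    {
        long base = (long)coef * (1L << al);
        if (coef == 0 || al == 0)
            return base * q;               /* zero, or full precision */
        long bias = toward_zero ? (1L << al) / 4 : (1L << al) / 2;
        return (coef > 0) ? (base + bias) * q : (base - bias) * q;
    }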
K.10
Example of point transform
The difference between the arithmetic-shift-right by Pt and divide by 2^Pt can be seen from the following:
After the level shift the DC has values from +127 to –128. Consider values near zero (after the level shift), and the case
where Pt = 1:
Before          Before              After           After
level shift     point transform     divide by 2     shift-right-arithmetic 1
131             +3                  +1              +1
130             +2                  +1              +1
129             +1                  +0              +0
128             +0                  +0              +0
127             –1                  +0              –1
126             –2                  –1              –1
125             –3                  –1              –2
124             –4                  –2              –2
123             –5                  –2              –3
The key difference is in the truncation of precision. The divide truncates the magnitude; the arithmetic shift truncates the
LSB. With a divide by 2 we would get non-uniform quantization of the DC values; therefore we use the shift-right-arithmetic operation.
For positive values, the divide by 2 and the shift-right-arithmetic by 1 operations are the same. Therefore, the shift-right-arithmetic by 1 operation effectively is a divide by 2 when the point transform is done before the level shift.
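A minimal C sketch reproducing the comparison in the table above (it assumes the C implementation performs an arithmetic right shift on negative values, as virtually all do):

    /* Minimal sketch: compare divide-by-2 with arithmetic shift
     * right by 1 on level-shifted values (K.10, Pt = 1).
     */
    #include <stdio.h>

    int main(void)
    {
        for (int v = 131; v >= 123; v--) {
            int s = v - 128;        /* level shift                     */
            int d = s / 2;          /* divide: truncates the magnitude */
            int a = s >> 1;         /* shift: truncates the LSB        */
            printf("%4d %+3d %+3d %+3d\n", v, s, d, a);
        }
        return 0;
    }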
Annex L
Patents
(This annex does not form an integral part of this Recommendation | International Standard)
L.1
Introductory remarks
The user’s attention is called to the possibility that – for some of the coding processes specified in Annexes F, G, H, and J
– compliance with this Specification may require use of an invention covered by patent rights.
By publication of this Specification, no position is taken with respect to the validity of this claim or of any patent rights in
connection therewith. However, for each patent listed in this annex, the patent holder has filed with the Information
Technology Task Force (ITTF) and the Telecommunication Standardization Bureau (TSB) a statement of willingness to
grant a license under these rights on reasonable and non-discriminatory terms and conditions to applicants desiring to
obtain such a license.
The criteria for including patents in this annex are:
a)
the patent has been identified by someone who is familiar with the technical fields relevant to this
Specification, and who believes use of the invention covered by the patent is required for implementation
of one or more of the coding processes specified in Annexes F, G, H, or J;
b)
the patent-holder has written a letter to the ITTF and TSB, stating willingness to grant a license to an
unlimited number of applicants throughout the world under reasonable terms and conditions that are
demonstrably free of any unfair discrimination.
This list of patents shall be updated, if necessary, upon publication of any revisions to the Recommendation | International
Standard.
L.2
List of patents
The following patents may be required for implementation of any one of the processes specified in Annexes F, G, H, and J
which uses arithmetic coding:
US 4,633,490, December 30, 1986, IBM, MITCHELL (J.L.) and GOERTZEL (G.): Symmetrical Adaptive Data
Compression/Decompression System.
US 4,652,856, February 4, 1986, IBM, MOHIUDDIN (K.M.) and RISSANEN (J.J.): A Multiplication-free
Multi-Alphabet Arithmetic Code.
US 4,369,463, January 18, 1983, IBM, ANASTASSIOU (D.) and MITCHELL (J.L.): Grey Scale Image
Compression with Code Words a Function of Image History.
US 4,749,983, June 7, 1988, IBM, LANGDON (G.): Compression of Multilevel Signals.
US 4,935,882, June 19, 1990, IBM, PENNEBAKER (W.B.) and MITCHELL (J.L.): Probability Adaptation
for Arithmetic Coders.
US 4,905,297, February 27, 1990, IBM, LANGDON (G.G.), Jr., MITCHELL (J.L.), PENNEBAKER (W.B.),
and RISSANEN (J.J.): Arithmetic Coding Encoder and Decoder System.
US 4,973,961, November 27, 1990, AT&T, CHAMZAS (C.), DUTTWEILER (D.L.): Method and Apparatus
for Carry-over Control in Arithmetic Entropy Coding.
US 5,025,258, June 18, 1991, AT&T, DUTTWEILER (D.L): Adaptive Probability Estimator for Entropy
Encoding/Decoding.
US 5,099,440, March 24, 1992, IBM, PENNEBAKER (W.B.) and MITCHELL (J.L.): Probability Adaptation
for Arithmetic Coders.
Japanese Patent Application 2-46275, February 26, 1990, MEL, ONO (F.), KIMURA (T.), YOSHIDA (M.), and
KINO (S.): Coding System.
The following patent may be required for implementation of any one of the hierarchical processes specified in Annex H
when used with a lossless final frame:
US 4,665,436, May 12, 1987, EI, OSBORNE (J.A.) and SEIFFERT (C.): Narrow Bandwidth Signal
Transmission.
No other patents required for implementation of any of the other processes specified in Annexes F, G, H, or J had been
identified at the time of publication of this Specification.
L.3
Contact addresses for patent information
Director, Telecommunication Standardization Bureau (formerly CCITT)
International Telecommunication Union
Place des Nations
CH-1211 Genève 20, Switzerland
Tel. +41 (22) 730 5111
Fax: +41 (22) 730 5853
Information Technology Task Force
International Organization for Standardization
1, rue de Varembé
CH-1211 Genève 20, Switzerland
Tel: +41 (22) 734 0150
Fax: +41 (22) 733 3843
Program Manager, Licensing
Intellectual Property and Licensing Services
IBM Corporation
208 Harbor Drive
P.O. Box 10501
Stamford, Connecticut 08904-2501, USA
Tel: +1 (203) 973 7935
Fax: +1 (203) 973 7981 or +1 (203) 973 7982
Mitsubishi Electric Corp.
Intellectual Property License Department
1-2-3 Marunouchi, Chiyoda-ku
Tokyo 100, Japan
Tel: +81 (3) 3218 3465
Fax: +81 (3) 3215 3842
AT&T Intellectual Property Division Manager
Room 3A21
10 Independence Blvd.
Warren, NJ 07059, USA
Tel: +1 (908) 580 5392
Fax: +1 (908) 580 6355
Senior General Manager
Corporate Intellectual Property and Legal Headquarters
Canon Inc.
30-2 Shimomaruko 3-chome
Ohta-ku Tokyo 146 Japan
Tel: +81 (3) 3758 2111
Fax: +81 (3) 3756 0947
Chief Executive Officer
Electronic Imagery, Inc.
1100 Park Central Boulevard South
Suite 3400
Pompano Beach, FL 33064, USA
Tel: +1 (305) 968 7100
Fax: +1 (305) 968 7319
Annex M
Bibliography
(This annex does not form an integral part of this Recommendation | International Standard)
M.1
General references
LEGER (A.), OMACHI (T.), and WALLACE (G.K.): JPEG Still Picture Compression Algorithm, Optical Engineering,
Vol. 30, No. 7, pp. 947-954, 1991.
RABBANI (M.) and JONES (P.): Digital Image Compression Techniques, Tutorial Texts in Optical Engineering,
Vol. TT7, SPIE Press, 1991.
HUDSON (G.), YASUDA (H.) and SEBESTYEN (I.): The International Standardization of a Still Picture Compression
Technique, Proc. of IEEE Global Telecommunications Conference, pp. 1016-1021, 1988.
LEGER (A.), MITCHELL (J.) and YAMAZAKI (Y.): Still Picture Compression Algorithm Evaluated for International
Standardization, Proc. of the IEEE Global Telecommunications Conference, pp. 1028-1032, 1988.
WALLACE (G.), VIVIAN (R.) and POULSEN (H.): Subjective Testing Results for Still Picture Compression Algorithms
for International Standardization, Proc. of the IEEE Global Telecommunications Conference, pp. 1022-1027, 1988.
MITCHELL (J.L.) and PENNEBAKER (W.B.): Evolving JPEG Colour Data Compression Standard, Standards for
Electronic Imaging Systems, M. Nier, M.E. Courtot, Editors, SPIE, Vol. CR37, pp. 68-97, 1991.
WALLACE (G.K.): The JPEG Still Picture Compression Standard, Communications of the ACM, Vol. 34, No. 4, pp. 31-44, 1991.
NETRAVALI (A.N.) and HASKELL (B.G.): Digital Pictures: Representation and Compression, Plenum Press,
New York 1988.
PENNEBAKER (W.B.) and MITCHELL (J.L.): JPEG: Still Image Data Compression Standard, Van Nostrand
Reinhold, New York 1993.
M.2
DCT references
CHEN (W.), SMITH (C.H.) and FRALICK (S.C.): A Fast Computational Algorithm for the Discrete Cosine Transform,
IEEE Trans. on Communications, Vol. COM-25, pp. 1004-1009, 1977.
AHMED (N.), NATARAJAN (T.) and RAO (K.R.): Discrete Cosine Transform, IEEE Trans. on Computers, Vol. C-23,
pp. 90-93, 1974.
NARASINHA (N.J.) and PETERSON (A.M.): On the Computation of the Discrete Cosine Transform, IEEE Trans. on
Communications, Vol. COM-26, No. 6, pp. 966-968, 1978.
DUHAMEL (P.) and GUILLEMOT (C.): Polynomial Transform Computation of the 2-D DCT, Proc. IEEE ICASSP-90,
pp. 1515-1518, Albuquerque, New Mexico 1990.
FEIG (E.): A Fast Scaled DCT Algorithm, in Image Processing Algorithms and Techniques, Proc. SPIE, Vol. 1244, K.S.
Pennington and R. J. Moorhead II, Editors, pp. 2-13, Santa Clara, California, 1990.
HOU (H.S.): A Fast Recursive Algorithm for Computing the Discrete Cosine Transform, IEEE Trans. Acoust. Speech and
Signal Processing, Vol. ASSP-35, No. 10, pp. 1455-1461.
LEE (B.G.): A New Algorithm to Compute the Discrete Cosine Transform, IEEE Trans. on Acoust., Speech and Signal
Processing, Vol. ASSP-32, No. 6, pp. 1243-1245, 1984.
LINZER (E.N.) and FEIG (E.): New DCT and Scaled DCT Algorithms for Fused Multiply/Add Architectures, Proc.
IEEE ICASSP-91, pp. 2201-2204, Toronto, Canada, 1991.
VETTERLI (M.) and NUSSBAUMER (H.J.): Simple FFT and DCT Algorithms with Reduced Number of Operations,
Signal Processing, 1984.
VETTERLI (M.): Fast 2-D Discrete Cosine Transform, Proc. IEEE ICASSP-85, pp. 1538-1541, Tampa, Florida, 1985.
ARAI (Y.), AGUI (T.), and NAKAJIMA (M.): A Fast DCT-SQ Scheme for Images, Trans. of IEICE, Vol. E.71, No. 11,
pp. 1095-1097, 1988.
SUEHIRO (N.) and HATORI (M.): Fast Algorithms for the DFT and other Sinusoidal Transforms, IEEE Trans. on
Acoust., Speech and Signal Processing, Vol ASSP-34, No. 3, pp. 642-644, 1986.
M.3
Quantization and human visual model references
CHEN (W.H.) and PRATT (W.K.): Scene adaptive coder, IEEE Trans. on Communications, Vol. COM-32, pp. 225-232,
1984.
GRANRATH (D.J.): The role of human visual models in image processing, Proceedings of the IEEE, Vol. 67,
pp. 552-561, 1981.
LOHSCHELLER (H.): Vision adapted progressive image transmission, Proceedings of EUSIPCO, Vol. 83, pp. 191-194,
1983.
LOHSCHELLER (H.) and FRANKE (U.): Colour picture coding – Algorithm optimization and technical realization,
Frequenze, Vol. 41, pp. 291-299, 1987.
LOHSCHELLER (H.): A subjectively adapted image communication system, IEEE Trans. on Communications,
Vol. COM-32, pp. 1316-1322, 1984.
PETERSON (H.A.) et al: Quantization of colour image components in the DCT domain, SPIE/IS&T 1991 Symposium on
Electronic Imaging Science and Technology, 1991.
M.4
Arithmetic coding references
LANGDON (G.): An Introduction to Arithmetic Coding, IBM J. Res. Develop., Vol. 28, pp. 135-149, 1984.
PENNEBAKER (W.B.), MITCHELL (J.L.), LANGDON (G.) Jr., and ARPS (R.B.): An Overview of the Basic Principles
of the Q-Coder Binary Arithmetic Coder, IBM J. Res. Develop., Vol. 32, No. 6, pp. 717-726, 1988.
MITCHELL (J.L.) and PENNEBAKER (W.B.): Optimal Hardware and Software Arithmetic Coding Procedures for the
Q-Coder Binary Arithmetic Coder, IBM J. Res. Develop., Vol. 32, No. 6, pp. 727-736, 1988.
PENNEBAKER (W.B.) and MITCHELL (J.L.): Probability Estimation for the Q-Coder, IBM J. Res. Develop., Vol. 32,
No. 6, pp. 737-752, 1988.
MITCHELL (J.L.) and PENNEBAKER (W.B.): Software Implementations of the Q-Coder, IBM J. Res. Develop.,
Vol. 32, No. 6, pp. 753-774, 1988.
ARPS (R.B.), TRUONG (T.K.), LU (D.J.), PASCO (R.C.) and FRIEDMAN (T.D.): A Multi-Purpose VLSI Chip for
Adaptive Data Compression of Bilevel Images, IBM J. Res. Develop., Vol. 32, No. 6, pp. 775-795, 1988.
ONO (F.), YOSHIDA (M.), KIMURA (T.) and KINO (S.): Subtraction-type Arithmetic Coding with MPS/LPS
Conditional Exchange, Annual Spring Conference of IECED, Japan, D-288, 1990.
DUTTWEILER (D.) and CHAMZAS (C.): Probability Estimation in Arithmetic and Adaptive-Huffman Entropy Coders,
submitted to IEEE Trans. on Image Processing.
JONES (C.B.): An Efficient Coding System for Long Source Sequences, IEEE Trans. Inf. Theory, Vol. IT-27,
pp. 280-291, 1981.
LANGDON (G.): Method for Carry-over Control in a Fifo Arithmetic Code String, IBM Technical Disclosure Bulletin,
Vol. 23, No.1, pp. 310-312, 1980.
M.5
Huffman coding references
HUFFMAN (D.A.): A Method for the Construction of Minimum Redundancy codes, Proc. IRE, Vol. 40, pp. 1098-1101,
1952.
JPEG File Interchange Format
Version 1.02
September 1, 1992
Eric Hamilton
C-Cube Microsystems
1778 McCarthy Blvd.
Milpitas, CA 95035
+1 408 944-6300
Fax: +1 408 944-6314
E-mail: [email protected]
JPEG File Interchange Format
Version 1.02
Why a File Interchange Format
JPEG File Interchange Format is a minimal file format which enables JPEG bitstreams to
be exchanged between a wide variety of platforms and applications. This minimal format
does not include any of the advanced features found in the TIFF JPEG specification or any
application specific file format. Nor should it, for the only purpose of this simplified
format is to allow the exchange of JPEG compressed images.
JPEG File Interchange Format features
o    Uses JPEG compression
o    Uses JPEG interchange format compressed image representation
o    PC or Mac or Unix workstation compatible
o    Standard color space: one or three components. For three components, YCbCr
     (CCIR 601-256 levels)
o    APP0 marker used to specify Units, X pixel density, Y pixel density, thumbnail
o    APP0 marker also used to specify JFIF extensions
o    APP0 marker also used to specify application-specific information
JPEG Compression
Although any JPEG process is supported by the syntax of the JPEG File Interchange Format
(JFIF), it is strongly recommended that the JPEG baseline process be used for the purposes
of file interchange. This ensures maximum compatibility with all applications supporting
JPEG. JFIF conforms to the JPEG Draft International Standard (ISO DIS 10918-1).
The JPEG File Interchange Format is entirely compatible with the standard JPEG
interchange format; the only additional requirement is the mandatory presence of the
APP0 marker right after the SOI marker. Note that JPEG interchange format requires (as
does JFIF) that all table specifications used in the encoding process be coded in the
bitstream prior to their use.
Compatible across platforms
The JPEG File Interchange Format is compatible across platforms: for example, it does not
use any resource forks, supported by the Macintosh but not by PCs or workstations.
Standard color space
The color space to be used is YCbCr as defined by CCIR 601 (256 levels). The RGB
components calculated by linear conversion from YCbCr shall not be gamma corrected
(gamma = 1.0). If only one component is used, that component shall be Y.
APP0 marker used to identify JPEG FIF
The APP0 marker is used to identify a JPEG FIF file. The JPEG FIF APP0 marker is
mandatory right after the SOI marker.
The JFIF APP0 marker is identified by a zero terminated string: "JFIF". The APP0 can be
used for any other purpose by the application provided it can be distinguished from the
JFIF APP0.
The JFIF APP0 marker provides information which is missing from the JPEG stream:
version number, X and Y pixel density (dots per inch or dots per cm), pixel aspect ratio
(derived from X and Y pixel density), thumbnail.
APP0 marker used to specify JFIF extensions
Additional APP0 marker segment(s) can optionally be used to specify JFIF extensions. If
used, these segment(s) must immediately follow the JFIF APP0 marker. Decoders should
skip any unsupported JFIF extension segments and continue decoding.
The JFIF extension APP0 marker is identified by a zero terminated string: "JFXX". The
JFIF extension APP0 marker segment contains a 1-byte code which identifies the extension.
This version, version 1.02, has only one extension defined: an extension for defining
thumbnails stored in formats other than 24-bit RGB.
APP0 marker used for application-specific information
Additional APP0 marker segments can be used to hold application-specific information which does not affect the decodability or displayability of the JFIF file. Application-specific APP0 marker segments must appear after the JFIF APP0 and any JFXX APP0 segments. Decoders should skip any unrecognized application-specific APP0 segments.
Application-specific APP0 marker segments are identified by a zero terminated string
which identifies the application (not "JFIF" or "JFXX"). This string should be an
organization name or company trademark. Generic strings such as dog, cat, tree, etc.
should not be used.
Conversion to and from RGB
Y, Cb, and Cr are converted from R, G, and B as defined in CCIR Recommendation 601
but are normalized so as to occupy the full 256 levels of an 8-bit binary encoding. More precisely:

Y  = 256 * E'Y
Cb = 256 * E'Cb + 128
Cr = 256 * E'Cr + 128

where E'Y, E'Cb and E'Cr are defined as in CCIR 601. Since values of E'Y have a range of 0 to 1.0 and those for E'Cb and E'Cr have a range of -0.5 to +0.5, Y, Cb, and Cr must be clamped to 255 when they reach their maximum value.
RGB to YCbCr Conversion
YCbCr (256 levels) can be computed directly from 8-bit RGB as follows:
Y  =  0.299 R + 0.587 G + 0.114 B
Cb = -0.1687 R - 0.3313 G + 0.5 B + 128
Cr =  0.5 R - 0.4187 G - 0.0813 B + 128
NOTE - Not all image file formats store image samples in the order R0, G0,
B0, ... Rn, Gn, Bn. Be sure to verify the sample order before converting an
RGB file to JFIF.
YCbCr to RGB Conversion
RGB can be computed directly from YCbCr (256 levels) as follows:
R = Y + 1.402 (Cr-128)
G = Y - 0.34414 (Cb-128) - 0.71414 (Cr-128)
B = Y + 1.772 (Cb-128)
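For illustration, the two conversions above can be written directly in C. This is a minimal sketch under our own names (clamp255, rgb_to_ycbcr, and ycbcr_to_rgb are not part of this specification); a production implementation would more likely use scaled integer arithmetic.

typedef unsigned char u8;

/* Clamp to the 0..255 range and round to the nearest integer. */
static u8 clamp255(double v)
{
    if (v < 0.0)   return 0;
    if (v > 255.0) return 255;
    return (u8)(v + 0.5);
}

/* RGB (8-bit) to YCbCr (256 levels), per the JFIF equations above. */
void rgb_to_ycbcr(u8 r, u8 g, u8 b, u8 *y, u8 *cb, u8 *cr)
{
    *y  = clamp255( 0.299  * r + 0.587  * g + 0.114  * b);
    *cb = clamp255(-0.1687 * r - 0.3313 * g + 0.5    * b + 128.0);
    *cr = clamp255( 0.5    * r - 0.4187 * g - 0.0813 * b + 128.0);
}

/* YCbCr (256 levels) back to RGB (8-bit). */
void ycbcr_to_rgb(u8 y, u8 cb, u8 cr, u8 *r, u8 *g, u8 *b)
{
    *r = clamp255(y                        + 1.402   * (cr - 128));
    *g = clamp255(y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128));
    *b = clamp255(y + 1.772   * (cb - 128));
}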
Image Orientation
In JFIF files, the image orientation is always top-down. This means that the first image
samples encoded in a JFIF file are located in the upper left hand corner of the image and
encoding proceeds from left to right and top to bottom. Top-down orientation is used for
both the full resolution image and the thumbnail image.
The process of converting an image file having bottom-up orientation to JFIF must include inverting the order of all image lines before JPEG encoding.
Spatial Relationship of Components
Specification of the spatial positioning of pixel samples within components relative to the
samples of other components is necessary for proper image post processing and accurate
image presentation. In JFIF files, the positions of the pixels in subsampled components are defined with respect to the highest resolution component. Since components must be
sampled orthogonally (along rows and columns), the spatial position of the samples in a
given subsampled component may be determined by specifying the horizontal and vertical
offsets of the first sample, i.e. the sample in the upper left corner, with respect to the
highest resolution component.
The horizontal and vertical offsets of the first sample in a subsampled component, Xoffseti[0,0] and Yoffseti[0,0], are defined to be
Xoffseti[0,0] = ( Nsamplesref / Nsamplesi ) / 2 - 0.5
Yoffseti[0,0] = ( Nlinesref / Nlinesi ) / 2 - 0.5
where
Nsamplesref is the number of samples per line in the largest component,
Nsamplesi is the number of samples per line in the ith component,
Nlinesref is the number of lines in the largest component,
Nlinesi is the number of lines in the ith component.
Proper subsampling of components incorporates an anti-aliasing filter which reduces the
spectral bandwidth of the full resolution components.
Subsampling can easily be
accomplished using a symmetrical digital filter with an even number of taps (coefficients).
A commonly used filter for 2:1 subsampling utilizes two taps (1/2,1/2).
NOTE - This definition is compatible with industry standards such as PostScript Level 2 and QuickTime. This definition is not compatible with the conventions used by CCIR Recommendation 601-1 and other digital video formats. For these formats, pre-processing of the chrominance components is necessary prior to compression in order to ensure accurate reconstruction of the compressed image.
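A small sketch (ours, not part of the specification) of the offset formula above and of 2:1 subsampling with the two-tap (1/2,1/2) filter just mentioned. Note that averaging input pairs places the subsampled grid half a full-resolution sample from the co-sited position, in agreement with jfif_offset(2, 1) == 0.5.

/* Offset of the first sample of a subsampled component, relative to
   the highest resolution component (the JFIF positioning rule). */
double jfif_offset(int nsamples_ref, int nsamples_i)
{
    return ((double)nsamples_ref / nsamples_i) / 2.0 - 0.5;
}

/* 2:1 horizontal subsampling with the symmetrical two-tap (1/2,1/2)
   filter: each output sample is the rounded average of an input pair. */
void subsample_2to1(const unsigned char *in, int n_in, unsigned char *out)
{
    int i;
    for (i = 0; i + 1 < n_in; i += 2)
        out[i / 2] = (unsigned char)((in[i] + in[i + 1] + 1) / 2);
}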
JPEG File Interchange Format Specification
The syntax of a JFIF file conforms to the syntax for interchange format defined in Annex B
of ISO DIS 10918-1. In addition, a JFIF file uses APP0 marker segments and constrains
certain parameters in the frame header as defined below.
X'FF', SOI
X'FF', APP0, length, identifier, version, units, Xdensity, Ydensity, Xthumbnail,
Ythumbnail, (RGB)n
length      (2 bytes)   Total APP0 field byte count, including the byte
                        count value (2 bytes), but excluding the APP0
                        marker itself
identifier  (5 bytes)   = X'4A', X'46', X'49', X'46', X'00'
                        This zero terminated string ("JFIF") uniquely
                        identifies this APP0 marker. This string shall
                        have zero parity (bit 7=0).
version     (2 bytes)   = X'0102'
                        The most significant byte is used for major
                        revisions, the least significant byte for minor
                        revisions. Version 1.02 is the current released
                        revision.
units       (1 byte)    Units for the X and Y densities.
                        units = 0: no units, X and Y specify the pixel
                        aspect ratio
                        units = 1: X and Y are dots per inch
                        units = 2: X and Y are dots per cm
Xdensity    (2 bytes)   Horizontal pixel density
Ydensity    (2 bytes)   Vertical pixel density
Xthumbnail  (1 byte)    Thumbnail horizontal pixel count
Ythumbnail  (1 byte)    Thumbnail vertical pixel count
(RGB)n      (3n bytes)  Packed (24-bit) RGB values for the thumbnail
                        pixels, n = Xthumbnail * Ythumbnail
[ Optional JFIF extension APP0 marker segment(s) - see below ]
...

X'FF', SOFn, length, frame parameters

     Number of components   Nf = 1 or 3
     1st component          C1 = 1 = Y component
     2nd component          C2 = 2 = Cb component
     3rd component          C3 = 3 = Cr component

...
X'FF', EOI
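The header layout above can be emitted with a few lines of C. This is an illustrative sketch under our own naming (write_jfif_header is hypothetical); it writes SOI plus a minimal version 1.02 APP0 segment with no thumbnail.

#include <stdio.h>

/* Write the SOI marker and a minimal JFIF APP0 segment (version 1.02,
   Xthumbnail = Ythumbnail = 0). Not a full encoder. */
int write_jfif_header(FILE *f, int units, int xdensity, int ydensity)
{
    /* APP0 length: 2 (length) + 5 (identifier) + 2 (version) + 1 (units)
       + 2 + 2 (densities) + 1 + 1 (thumbnail counts) = 16 */
    unsigned char app0[] = {
        0xFF, 0xD8,                 /* SOI */
        0xFF, 0xE0,                 /* APP0 */
        0x00, 0x10,                 /* length = 16 */
        'J', 'F', 'I', 'F', 0x00,   /* zero terminated identifier */
        0x01, 0x02,                 /* version 1.02 */
        0x00,                       /* units (patched below) */
        0x00, 0x00, 0x00, 0x00,     /* Xdensity, Ydensity (patched) */
        0x00, 0x00                  /* Xthumbnail = Ythumbnail = 0 */
    };
    app0[13] = (unsigned char)units;
    app0[14] = (unsigned char)(xdensity >> 8);
    app0[15] = (unsigned char)(xdensity & 0xFF);
    app0[16] = (unsigned char)(ydensity >> 8);
    app0[17] = (unsigned char)(ydensity & 0xFF);
    return fwrite(app0, 1, sizeof app0, f) == sizeof app0 ? 0 : -1;
}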
JFIF Extension APP0 Marker Segment
Immediately following the JFIF APP0 marker segment may be a JFIF extension APP0
marker. This JFIF extension APP0 marker segment may only be present for JFIF versions
1.02 and above. The syntax of the JFIF extension APP0 marker segment is:
X'FF', APP0, length, identifier, extension_code, extension_data
length          (2 bytes)   Total APP0 field byte count, including the
                            byte count value (2 bytes), but excluding
                            the APP0 marker itself
identifier      (5 bytes)   = X'4A', X'46', X'58', X'58', X'00'
                            This zero terminated string ("JFXX")
                            uniquely identifies this APP0 marker. This
                            string shall have zero parity (bit 7=0).
extension_code  (1 byte)    Code which identifies the extension. In
                            this version, the following extensions are
                            defined:
                            = X'10'  Thumbnail coded using JPEG
                            = X'11'  Thumbnail stored using 1 byte/pixel
                            = X'13'  Thumbnail stored using 3 bytes/pixel
extension_data  (variable)  The specification of the remainder of the
                            JFIF extension APP0 marker segment varies
                            with the extension. See below for a
                            specification of extension_data for each
                            extension.
JFIF Extension: Thumbnail coded using JPEG
This extension supports thumbnails compressed using JPEG. The compressed thumbnail
immediately follows the extension_code (X'10') in the extension_data field and the length
of the compressed data must be included in the JFIF extension APP0 marker length field.
The syntax of the extension_data field conforms to the syntax for interchange format defined
in Annex B of ISO DIS 10918-1. However, no "JFIF" or "JFXX" marker segments shall
be present. As in the full resolution image of the JFIF file, the syntax of extension_data
constrains parameters in the frame header as defined below:
X'FF', SOI
...

X'FF', SOFn, length, frame parameters

     Number of components   Nf = 1 or 3
     1st component          C1 = 1 = Y component
     2nd component          C2 = 2 = Cb component
     3rd component          C3 = 3 = Cr component

...
X'FF', EOI
JFIF Extension: Thumbnail stored using one byte per pixel
This extension supports thumbnails stored using one byte per pixel and a color palette in
the extension_data field. The syntax of extension_data is:
Xthumbnail  (1 byte)     Thumbnail horizontal pixel count
Ythumbnail  (1 byte)     Thumbnail vertical pixel count
palette     (768 bytes)  24-bit RGB pixel values for the color palette.
                         The RGB values define the colors represented
                         by each value of an 8-bit binary encoding
                         (0 - 255).
(pixel)n    (n bytes)    8-bit values for the thumbnail pixels,
                         n = Xthumbnail * Ythumbnail

JFIF Extension: Thumbnail stored using three bytes per pixel
This extension supports thumbnails stored using three bytes per pixel in the extension_data
field. The syntax of extension_data is:
Xthumbnail  (1 byte)     Thumbnail horizontal pixel count
Ythumbnail  (1 byte)     Thumbnail vertical pixel count
(RGB)n      (3n bytes)   Packed (24-bit) RGB values for the thumbnail
                         pixels, n = Xthumbnail * Ythumbnail
Useful tips
o you can identify a JFIF file by looking for the following sequence: X'FF', SOI, X'FF', APP0, <2 bytes to be skipped>, "JFIF", X'00' (see the sketch following these tips).
o if you use APP0 elsewhere, be sure not to have the strings "JFIF" or "JFXX" right after
the APP0 marker.
o if you do not want to include a thumbnail, just program Xthumbnail = Ythumbnail = 0.
o be sure to check the version number in the special APP0 field. In general, if the major
version number of the JFIF file matches that supported by the decoder, the file will be
decodable.
o if you only want to specify a pixel aspect ratio, put 0 for the units field in the special
APP0 field. Xdensity and Ydensity can then be programmed for the desired aspect ratio.
Xdensity = 1, Ydensity = 1 will program a 1:1 aspect ratio. Xdensity and Ydensity should
always be non-zero.
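The first tip can be coded as follows; a sketch with our own function name, not part of the specification.

#include <string.h>

/* Returns 1 if the buffer begins with a JFIF file signature:
   X'FF' SOI, X'FF' APP0, 2-byte length (skipped), "JFIF", X'00'. */
int is_jfif(const unsigned char *buf, size_t len)
{
    return len >= 11 &&
           buf[0] == 0xFF && buf[1] == 0xD8 &&   /* SOI  */
           buf[2] == 0xFF && buf[3] == 0xE0 &&   /* APP0 */
           memcmp(buf + 6, "JFIF", 5) == 0;      /* includes the X'00' */
}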
The JPEG Still Picture Compression Standard
Gregory K. Wallace
Multimedia Engineering
Digital Equipment Corporation
Maynard, Massachusetts
Submitted in December 1991 for publication in IEEE Transactions on Consumer Electronics
This paper is a revised version of an article by the same title and author which appeared in the April 1991 issue of Communications of the ACM.

Abstract

For the past few years, a joint ISO/CCITT committee known as JPEG (Joint Photographic Experts Group) has been working to establish the first international compression standard for continuous-tone still images, both grayscale and color. JPEG's proposed standard aims to be generic, to support a wide variety of applications for continuous-tone images. To meet the differing needs of many applications, the JPEG standard includes two basic compression methods, each with various modes of operation. A DCT-based method is specified for "lossy" compression, and a predictive method for "lossless" compression. JPEG features a simple lossy technique known as the Baseline method, a subset of the other DCT-based modes of operation. The Baseline method has been by far the most widely implemented JPEG method to date, and is sufficient in its own right for a large number of applications. This article provides an overview of the JPEG standard, and focuses in detail on the Baseline method.

1 Introduction

Advances over the past decade in many aspects of digital technology - especially devices for image acquisition, data storage, and bitmapped printing and display - have brought about many applications of digital imaging. However, these applications tend to be specialized due to their relatively high cost. With the possible exception of facsimile, digital images are not commonplace in general-purpose computing systems the way text and geometric graphics are. The majority of modern business and consumer usage of photographs and other types of images takes place through more traditional analog means.

The key obstacle for many applications is the vast amount of data required to represent a digital image directly. A digitized version of a single, color picture at TV resolution contains on the order of one million bytes; 35mm resolution requires ten times that amount. Use of digital images often is not viable due to high storage or transmission costs, even when image capture and display devices are quite affordable.

Modern image compression technology offers a possible solution. State-of-the-art techniques can compress typical images from 1/10 to 1/50 their uncompressed size without visibly affecting image quality. But compression technology alone is not sufficient. For digital image applications involving storage or transmission to become widespread in today's marketplace, a standard image compression method is needed to enable interoperability of equipment from different manufacturers. The CCITT recommendation for today's ubiquitous Group 3 fax machines [17] is a dramatic example of how a standard compression method can enable an important image application. The Group 3 method, however, deals with bilevel images only and does not address photographic image compression.

For the past few years, a standardization effort known by the acronym JPEG, for Joint Photographic Experts Group, has been working toward establishing the first international digital image compression standard for continuous-tone (multilevel) still images, both grayscale and color. The "joint" in JPEG refers to a collaboration between CCITT and ISO. JPEG convenes officially as the ISO committee designated JTC1/SC2/WG10, but operates in close informal collaboration with CCITT SGVIII. JPEG will be both an ISO Standard and a CCITT Recommendation. The text of both will be identical.

Photovideotex, desktop publishing, graphic arts, color facsimile, newspaper wirephoto transmission, medical imaging, and many other continuous-tone image applications require a compression standard in order to develop significantly beyond their present state. JPEG has undertaken the ambitious task of developing a general-purpose compression standard to meet the needs of almost all continuous-tone still-image applications.

If this goal proves attainable, not only will individual applications flourish, but exchange of images across application boundaries will be facilitated. This latter feature will become increasingly important as more image applications are implemented on general-purpose computing systems, which are themselves becoming increasingly interoperable and internetworked. For applications which require specialized VLSI to meet their compression and decompression speed requirements, a common method will provide economies of scale not possible within a single application.

This article gives an overview of JPEG's proposed image-compression standard. Readers without prior knowledge of JPEG or compression based on the Discrete Cosine Transform (DCT) are encouraged to study first the detailed description of the Baseline sequential codec, which is the basis for all of the DCT-based decoders. While this article provides many details, many more are necessarily omitted. The reader should refer to the ISO draft standard [2] before attempting implementation.

Some of the earliest industry attention to the JPEG proposal has been focused on the Baseline sequential codec as a motion image compression method - of the "intraframe" class, where each frame is encoded as a separate image. This class of motion image coding, while providing less compression than "interframe" methods like MPEG, has greater flexibility for video editing. While this paper focuses only on JPEG as a still picture standard (as ISO intended), it is interesting to note that JPEG is likely to become a "de facto" intraframe motion standard as well.

2 Background: Requirements and Selection Process

JPEG's goal has been to develop a method for continuous-tone image compression which meets the following requirements:

1) be at or near the state of the art with regard to compression rate and accompanying image fidelity, over a wide range of image quality ratings, and especially in the range where visual fidelity to the original is characterized as "very good" to "excellent"; also, the encoder should be parameterizable, so that the application (or user) can set the desired compression/quality tradeoff;

2) be applicable to practically any kind of continuous-tone digital source image (i.e. for most practical purposes not be restricted to images of certain dimensions, color spaces, pixel aspect ratios, etc.) and not be limited to classes of imagery with restrictions on scene content, such as complexity, range of colors, or statistical properties;

3) have tractable computational complexity, to make feasible software implementations with viable performance on a range of CPUs, as well as hardware implementations with viable cost for applications requiring high performance;

4) have the following modes of operation:

• Sequential encoding: each image component is encoded in a single left-to-right, top-to-bottom scan;

• Progressive encoding: the image is encoded in multiple scans for applications in which transmission time is long, and the viewer prefers to watch the image build up in multiple coarse-to-clear passes;

• Lossless encoding: the image is encoded to guarantee exact recovery of every source image sample value (even though the result is low compression compared to the lossy modes);

• Hierarchical encoding: the image is encoded at multiple resolutions so that lower-resolution versions may be accessed without first having to decompress the image at its full resolution.

In June 1987, JPEG conducted a selection process based on a blind assessment of subjective picture quality, and narrowed 12 proposed methods to three. Three informal working groups formed to refine them, and in January 1988, a second, more rigorous selection process [19] revealed that the "ADCT" proposal [11], based on the 8x8 DCT, had produced the best picture quality.

At the time of its selection, the DCT-based method was only partially defined for some of the modes of operation. From 1988 through 1990, JPEG undertook the sizable task of defining, documenting, simulating, testing, validating, and simply agreeing on the plethora of details necessary for genuine interoperability and universality. Further history of the JPEG effort is contained in [6, 7, 9, 18].
3 Architecture of the Proposed Standard

The proposed standard contains the four "modes of operation" identified previously. For each mode, one or more distinct codecs are specified. Codecs within a mode differ according to the precision of source image samples they can handle or the entropy coding method they use. Although the word codec (encoder/decoder) is used frequently in this article, there is no requirement that implementations must include both an encoder and a decoder. Many applications will have systems or devices which require only one or the other.

The four modes of operation and their various codecs have resulted from JPEG's goal of being generic and from the diversity of image formats across applications. The multiple pieces can give the impression of undesirable complexity, but they should actually be regarded as a comprehensive "toolkit" which can span a wide range of continuous-tone image applications. It is unlikely that many implementations will utilize every tool - indeed, most of the early implementations now on the market (even before final ISO approval) have implemented only the Baseline sequential codec.

The Baseline sequential codec is inherently a rich and sophisticated compression method which will be sufficient for many applications. Getting this minimum JPEG capability implemented properly and interoperably will provide the industry with an important initial capability for exchange of images across vendors and applications.

4 Processing Steps for DCT-Based Coding

Figures 1 and 2 show the key processing steps which are the heart of the DCT-based modes of operation. These figures illustrate the special case of single-component (grayscale) image compression. The reader can grasp the essentials of DCT-based compression by thinking of it as essentially compression of a stream of 8x8 blocks of grayscale image samples. Color image compression can then be approximately regarded as compression of multiple grayscale images, which are either compressed entirely one at a time, or are compressed by alternately interleaving 8x8 sample blocks from each in turn.

[Figure 1. DCT-Based Encoder Processing Steps: Source Image Data (8x8 blocks) -> FDCT -> Quantizer -> Entropy Encoder -> Compressed Image Data, with Table Specifications feeding the Quantizer and the Entropy Encoder]

[Figure 2. DCT-Based Decoder Processing Steps: Compressed Image Data -> Entropy Decoder -> Dequantizer -> IDCT -> Reconstructed Image Data, with Table Specifications feeding the Entropy Decoder and the Dequantizer]

For DCT sequential-mode codecs, which include the Baseline sequential codec, the simplified diagrams indicate how single-component compression works in a fairly complete way. Each 8x8 block is input, makes its way through each processing step, and yields output in compressed form into the data stream. For DCT progressive-mode codecs, an image buffer exists prior to the entropy coding step, so that an image can be stored and then parceled out in multiple scans with successively improving quality. For the hierarchical mode of operation, the steps shown are used as building blocks within a larger framework.

4.1 8x8 FDCT and IDCT

At the input to the encoder, source image samples are grouped into 8x8 blocks, shifted from unsigned integers with range [0, 2^P - 1] to signed integers with range [-2^(P-1), 2^(P-1) - 1], and input to the Forward DCT (FDCT). At the output from the decoder, the Inverse DCT (IDCT) outputs 8x8 sample blocks to form the reconstructed image. The following equations are the idealized mathematical definitions of the 8x8 FDCT and 8x8 IDCT:

F(u,v) = (1/4) C(u) C(v) [ sum(x=0..7) sum(y=0..7) f(x,y) * cos((2x+1)u pi/16) * cos((2y+1)v pi/16) ]    (1)

f(x,y) = (1/4) [ sum(u=0..7) sum(v=0..7) C(u) C(v) F(u,v) * cos((2x+1)u pi/16) * cos((2y+1)v pi/16) ]    (2)

where: C(u), C(v) = 1/sqrt(2) for u, v = 0; C(u), C(v) = 1 otherwise.

The DCT is related to the Discrete Fourier Transform (DFT). Some simple intuition for DCT-based compression can be obtained by viewing the FDCT as a harmonic analyzer and the IDCT as a harmonic synthesizer. Each 8x8 block of source image samples is effectively a 64-point discrete signal which is a function of the two spatial dimensions x and y. The FDCT takes such a signal as its input and decomposes it into 64 orthogonal basis signals. Each contains one of the 64 unique two-dimensional (2D) "spatial frequencies" which comprise the input signal's "spectrum." The output of the FDCT is the set of 64 basis-signal amplitudes or "DCT coefficients" whose values are uniquely determined by the particular 64-point input signal.

The DCT coefficient values can thus be regarded as the relative amount of the 2D spatial frequencies contained in the 64-point input signal. The coefficient with zero frequency in both dimensions is called the "DC coefficient" and the remaining 63 coefficients are called the "AC coefficients." Because sample values typically vary slowly from point to point across an image, the FDCT processing step lays the foundation for achieving data compression by concentrating most of the signal in the lower spatial frequencies. For a typical 8x8 sample block from a typical source image, most of the spatial frequencies have zero or near-zero amplitude and need not be encoded.

At the decoder the IDCT reverses this processing step. It takes the 64 DCT coefficients (which at that point have been quantized) and reconstructs a 64-point output image signal by summing the basis signals. Mathematically, the DCT is a one-to-one mapping for 64-point vectors between the image and the frequency domains. If the FDCT and IDCT could be computed with perfect accuracy and if the DCT coefficients were not quantized as in the following description, the original 64-point signal could be exactly recovered. In principle, the DCT introduces no loss to the source image samples; it merely transforms them to a domain in which they can be more efficiently encoded.

Some properties of practical FDCT and IDCT implementations raise the issue of what precisely should be required by the JPEG standard. A fundamental property is that the FDCT and IDCT equations contain transcendental functions. Consequently, no physical implementation can compute them with perfect accuracy. Because of the DCT's application importance and its relationship to the DFT, many different algorithms by which the FDCT and IDCT may be approximately computed have been devised [16]. Indeed, research in fast DCT algorithms is ongoing and no single algorithm is optimal for all implementations. What is optimal in software for a general-purpose CPU is unlikely to be optimal in firmware for a programmable DSP and is certain to be suboptimal for dedicated VLSI.

Even in light of the finite precision of the DCT inputs and outputs, independently designed implementations of the very same FDCT or IDCT algorithm which differ even minutely in the precision by which they represent cosine terms or intermediate results, or in the way they sum and round fractional values, will eventually produce slightly different outputs from identical inputs.

To preserve freedom for innovation and customization within implementations, JPEG has chosen to specify neither a unique FDCT algorithm nor a unique IDCT algorithm in its proposed standard. This makes compliance somewhat more difficult to confirm, because two compliant encoders (or decoders) generally will not produce identical outputs given identical inputs. The JPEG standard will address this issue by specifying an accuracy test as part of its compliance tests for all DCT-based encoders and decoders; this is to ensure against crudely inaccurate cosine basis functions which would degrade image quality.
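As a point of reference, equation (1) transcribes directly into C. The following is our own deliberately naive, double-precision transcription of the idealized definition - useful as an accuracy reference against fast algorithms of the kind discussed above, not as a practical implementation.

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Idealized 8x8 FDCT, equation (1): f is the level-shifted 8x8 sample
   block, F receives the 64 DCT coefficients. */
void fdct_8x8(const double f[8][8], double F[8][8])
{
    for (int u = 0; u < 8; u++) {
        for (int v = 0; v < 8; v++) {
            double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double sum = 0.0;
            for (int x = 0; x < 8; x++)
                for (int y = 0; y < 8; y++)
                    sum += f[x][y]
                         * cos((2 * x + 1) * u * M_PI / 16.0)
                         * cos((2 * y + 1) * v * M_PI / 16.0);
            F[u][v] = 0.25 * cu * cv * sum;
        }
    }
}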
For each DCT-based mode of operation, the JPEG proposal specifies separate codecs for images with 8-bit and 12-bit (per component) source image samples. The 12-bit codecs, needed to accommodate certain types of medical and other images, require greater computational resources to achieve the required FDCT or IDCT accuracy. Images with other sample precisions can usually be accommodated by either an 8-bit or 12-bit codec, but this must be done outside the JPEG standard. For example, it would be the responsibility of an application to decide how to fit or pad a 6-bit sample into the 8-bit encoder's input interface, how to unpack it at the decoder's output, and how to encode any necessary related information.

4.2 Quantization

After output from the FDCT, each of the 64 DCT coefficients is uniformly quantized in conjunction with a 64-element Quantization Table, which must be specified by the application (or user) as an input to the encoder. Each element can be any integer value from 1 to 255, which specifies the step size of the quantizer for its corresponding DCT coefficient. The purpose of quantization is to achieve further compression by representing DCT coefficients with no greater precision than is necessary to achieve the desired image quality. Stated another way, the goal of this processing step is to discard information which is not visually significant. Quantization is a many-to-one mapping, and therefore is fundamentally lossy. It is the principal source of lossiness in DCT-based encoders.

Quantization is defined as division of each DCT coefficient by its corresponding quantizer step size, followed by rounding to the nearest integer:

FQ(u,v) = Integer Round ( F(u,v) / Q(u,v) )    (3)

This output value is normalized by the quantizer step size. Dequantization is the inverse function, which in this case means simply that the normalization is removed by multiplying by the step size, which returns the result to a representation appropriate for input to the IDCT:

FQ'(u,v) = FQ(u,v) * Q(u,v)    (4)

When the aim is to compress the image as much as possible without visible artifacts, each step size ideally should be chosen as the perceptual threshold or "just noticeable difference" for the visual contribution of its corresponding cosine basis function. These thresholds are also functions of the source image characteristics, display characteristics and viewing distance. For applications in which these variables can be reasonably well defined, psychovisual experiments can be performed to determine the best thresholds. The experiment described in [12] has led to a set of Quantization Tables for CCIR-601 [4] images and displays. These have been used experimentally by JPEG members and will appear in the ISO standard as a matter of information, but not as a requirement.

4.3 DC Coding and Zig-Zag Sequence

After quantization, the DC coefficient is treated separately from the 63 AC coefficients. The DC coefficient is a measure of the average value of the 64 image samples. Because there is usually strong correlation between the DC coefficients of adjacent 8x8 blocks, the quantized DC coefficient is encoded as the difference from the DC term of the previous block in the encoding order (defined in the following), as shown in Figure 3. This special treatment is worthwhile, as DC coefficients frequently contain a significant fraction of the total image energy.

[Figure 3. Preparation of Quantized Coefficients for Entropy Coding: differential DC encoding, DIFF = DCi - DCi-1, between blocki-1 and blocki, and the zig-zag ordering of the AC coefficients from AC01 through AC77]
Finally, all of the quantized coefficients are ordered
into the “zig-zag” sequence, also shown in Figure 3.
This ordering helps to facilitate entropy coding by
placing low-frequency coefficients (which are more
likely to be nonzero) before high-frequency
coefficients.
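Equations (3) and (4), together with the zig-zag ordering of Figure 3, can be sketched as follows. The naming is our own; the zigzag table gives, for each position in the zig-zag sequence, the corresponding row-major index in the 8x8 block.

#include <math.h>

static const int zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Equation (3): quantize, then emit in zig-zag order. F and Q are
   row-major 64-element arrays; zz receives the zig-zag sequence. */
void quantize_zigzag(const double F[64], const int Q[64], int zz[64])
{
    for (int i = 0; i < 64; i++) {
        int k = zigzag[i];
        zz[i] = (int)floor(F[k] / Q[k] + 0.5);   /* Integer Round */
    }
}

/* Equation (4): dequantize one coefficient. */
double dequantize(int fq, int q)
{
    return (double)fq * q;
}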
4.4 Entropy Coding

The final DCT-based encoder processing step is entropy coding. This step achieves additional compression losslessly by encoding the quantized DCT coefficients more compactly based on their statistical characteristics. The JPEG proposal specifies two entropy coding methods - Huffman coding [8] and arithmetic coding [15]. The Baseline sequential codec uses Huffman coding, but codecs with both methods are specified for all modes of operation.

It is useful to consider entropy coding as a 2-step process. The first step converts the zig-zag sequence of quantized coefficients into an intermediate sequence of symbols. The second step converts the symbols to a data stream in which the symbols no longer have externally identifiable boundaries. The form and definition of the intermediate symbols is dependent on both the DCT-based mode of operation and the entropy coding method.

Huffman coding requires that one or more sets of Huffman code tables be specified by the application. The same tables used to compress an image are needed to decompress it. Huffman tables may be predefined and used within an application as defaults, or computed specifically for a given image in an initial statistics-gathering pass prior to compression. Such choices are the business of the applications which use JPEG; the JPEG proposal specifies no required Huffman tables. Huffman coding for the Baseline sequential encoder is described in detail in section 7.

By contrast, the particular arithmetic coding method specified in the JPEG proposal [2] requires no tables to be externally input, because it is able to adapt to the image statistics as it encodes the image. (If desired, statistical conditioning tables can be used as inputs for slightly better efficiency, but this is not required.) Arithmetic coding has produced 5-10% better compression than Huffman for many of the images which JPEG members have tested. However, some feel it is more complex than Huffman coding for certain implementations, for example, the highest-speed hardware implementations. (Throughout JPEG's history, "complexity" has proved to be most elusive as a practical metric for comparing compression methods.)

If the only difference between two JPEG codecs is the entropy coding method, transcoding between the two is possible by simply entropy decoding with one method and entropy recoding with the other.

4.5 Compression and Picture Quality

For color images with moderately complex scenes, all DCT-based modes of operation typically produce the following levels of picture quality for the indicated ranges of compression. These levels are only a guideline - quality and compression can vary significantly according to source image characteristics and scene content. (The units "bits/pixel" here mean the total number of bits in the compressed image - including the chrominance components - divided by the number of samples in the luminance component.)

• 0.25-0.5 bits/pixel: moderate to good quality, sufficient for some applications;

• 0.5-0.75 bits/pixel: good to very good quality, sufficient for many applications;

• 0.75-1.5 bits/pixel: excellent quality, sufficient for most applications;

• 1.5-2.0 bits/pixel: usually indistinguishable from the original, sufficient for the most demanding applications.
5 Processing Steps for Predictive Lossless Coding

After its selection of a DCT-based method in 1988, JPEG discovered that a DCT-based lossless mode was difficult to define as a practical standard against which encoders and decoders could be independently implemented, without placing severe constraints on both encoder and decoder implementations.

JPEG, to meet its requirement for a lossless mode of operation, has chosen a simple predictive method which is wholly independent of the DCT processing described previously. Selection of this method was not the result of rigorous competitive evaluation as was the DCT-based method. Nevertheless, the JPEG lossless method produces results which, in light of its simplicity, are surprisingly close to the state of the art for lossless continuous-tone compression, as indicated by a recent technical report [5].

Figure 4 shows the main processing steps for a single-component image. A predictor combines the values of up to three neighboring samples (A, B, and C) to form a prediction of the sample indicated by X in Figure 5. This prediction is then subtracted from the actual value of sample X, and the difference is encoded losslessly by either of the entropy coding methods - Huffman or arithmetic. Any one of the eight predictors listed in Table 1 (under "selection-value") can be used.

[Figure 4. Lossless Mode Encoder Processing Steps: Source Image Data -> Predictor -> Entropy Encoder -> Compressed Image Data, with Table Specifications feeding the Entropy Encoder]
selection-value    prediction
      0            no prediction
      1            A
      2            B
      3            C
      4            A+B-C
      5            A+((B-C)/2)
      6            B+((A-C)/2)
      7            (A+B)/2

Table 1. Predictors for Lossless Coding

Selections 1, 2, and 3 are one-dimensional predictors and selections 4, 5, 6 and 7 are two-dimensional predictors. Selection-value 0 can only be used for differential coding in the hierarchical mode of operation. The entropy coding is nearly identical to that used for the DC coefficient as described in section 7.1 (for Huffman coding).

    C B
    A X

Figure 5. 3-Sample Prediction Neighborhood
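Table 1 translates into a small C function. This is a sketch with our own naming; a, b, and c are the neighboring samples of Figure 5, and the divisions by 2 are written here as arithmetic shifts.

/* Predictors for lossless coding (Table 1). a = left neighbor,
   b = above, c = above-left, per Figure 5. Selection-value 0
   (no prediction) is reserved for hierarchical differential coding. */
int predict(int sel, int a, int b, int c)
{
    switch (sel) {
    case 1:  return a;
    case 2:  return b;
    case 3:  return c;
    case 4:  return a + b - c;
    case 5:  return a + ((b - c) >> 1);
    case 6:  return b + ((a - c) >> 1);
    case 7:  return (a + b) >> 1;
    default: return 0;   /* selection-value 0: no prediction */
    }
}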
For the lossless mode of operation, two different codecs are specified - one for each entropy coding method. The encoders can use any source image precision from 2 to 16 bits/sample, and can use any of the predictors except selection-value 0. The decoders must handle any of the sample precisions and any of the predictors. Lossless codecs typically produce around 2:1 compression for color images with moderately complex scenes.

6 Multiple-Component Images

The previous sections discussed the key processing steps of the DCT-based and predictive lossless codecs for the case of single-component source images. These steps accomplish the image data compression. But a good deal of the JPEG proposal is also concerned with the handling and control of color (or other) images with multiple components. JPEG's aim for a generic compression standard requires its proposal to accommodate a variety of source image formats.

6.1 Source Image Formats

The source image model used in the JPEG proposal is an abstraction from a variety of image types and applications and consists of only what is necessary to compress and reconstruct digital image data. The reader should recognize that the JPEG compressed data format does not encode enough information to serve as a complete image representation. For example, JPEG does not specify or encode any information on pixel aspect ratio, color space, or image acquisition characteristics.
[Figure 6. JPEG Source Image Model: (a) a source image with multiple components C1, C2, ..., CNf; (b) characteristics of an image component Ci, with xi samples per line, yi lines, and top/bottom/left/right orientation]

Figure 6 illustrates the JPEG source image model. A source image contains from 1 to 255 image components, sometimes called color or spectral bands or channels. Each component consists of a rectangular array of samples. A sample is defined to be an unsigned integer with precision P bits, with any value in the range [0, 2^P - 1]. All samples of all components within the same source image must have the same precision P. P can be 8 or 12 for DCT-based codecs, and 2 to 16 for predictive codecs.

The ith component has sample dimensions xi by yi. To accommodate formats in which some image components are sampled at different rates than others, components can have different dimensions. The dimensions must have a mutual integral relationship defined by Hi and Vi, the relative horizontal and vertical sampling factors, which must be specified for each component. Overall image dimensions X and Y are defined as the maximum xi and yi for all components in the image, and can be any number up to 2^16. H and V are allowed only the integer values 1 through 4. The encoded parameters are X, Y, and the Hi and Vi for each component. The decoder reconstructs the dimensions xi and yi for each component, according to the following relationship shown in Equation 5:

xi = ceil( X * Hi / Hmax )  and  yi = ceil( Y * Vi / Vmax )    (5)

where ceil( ) is the ceiling function.

6.2 Encoding Order and Interleaving

A practical image compression standard must address how systems will need to handle the data during the process of decompression. Many applications need to pipeline the process of displaying or printing multiple-component images in parallel with the process of decompression. For many systems, this is only feasible if the components are interleaved together within the compressed data stream.

To make the same interleaving machinery applicable to both DCT-based and predictive codecs, the JPEG proposal has defined the concept of "data unit." A data unit is a sample in predictive codecs and an 8x8 block of samples in DCT-based codecs.

The order in which compressed data units are placed in the compressed data stream is a generalization of raster-scan order. Generally, data units are ordered from left-to-right and top-to-bottom according to the orientation shown in Figure 6. (It is the responsibility of applications to define which edges of a source image are top, bottom, left and right.) If an image component is noninterleaved (i.e., compressed without being interleaved with other components), compressed data units are ordered in a pure raster scan as shown in Figure 7.

[Figure 7. Noninterleaved Data Ordering: data units proceed left-to-right within each line, and the lines proceed top-to-bottom]
When two or more components are interleaved, each component Ci is partitioned into rectangular regions of Hi by Vi data units, as shown in the generalized example of Figure 8. Regions are ordered within a component from left-to-right and top-to-bottom, and within a region, data units are ordered from left-to-right and top-to-bottom. The JPEG proposal defines the term Minimum Coded Unit (MCU) to be the smallest group of interleaved data units. For the example shown, MCU1 consists of data units taken first from the top-left-most region of C1, followed by data units from the same region of C2, and likewise for C3 and C4. MCU2 continues the pattern as shown.

[Figure 8. Generalized Interleaved Data Ordering Example. Cs1: H1=2, V1=2; Cs2: H2=2, V2=1; Cs3: H3=1, V3=2; Cs4: H4=1, V4=1. Writing dxy for the data unit in row x, column y of a component, the first MCUs are:

MCU1 = Cs1(d00 d01 d10 d11), Cs2(d00 d01), Cs3(d00 d10), Cs4(d00)
MCU2 = Cs1(d02 d03 d12 d13), Cs2(d02 d03), Cs3(d01 d11), Cs4(d01)
MCU3 = Cs1(d04 d05 d14 d15), Cs2(d04 d05), Cs3(d02 d12), Cs4(d02)
MCU4 = Cs1(d20 d21 d30 d31), Cs2(d10 d11), Cs3(d20 d30), Cs4(d10) ]
Thus, interleaved data is an ordered sequence of MCUs, and the number of data units contained in an MCU is determined by the number of components interleaved and their relative sampling factors. The maximum number of components which can be interleaved is 4 and the maximum number of data units in an MCU is 10. The latter restriction is expressed as shown in Equation 6, where the summation is over the interleaved components:

sum over all i in interleave of ( Hi * Vi ) <= 10    (6)

Because of this restriction, not every combination of 4 components which can be represented in noninterleaved order within a JPEG-compressed image is allowed to be interleaved. Also, note that the JPEG proposal allows some components to be interleaved and some to be noninterleaved within the same compressed image.

6.3 Multiple Tables

In addition to the interleaving control discussed previously, JPEG codecs must control application of the proper table data to the proper components. The same quantization table and the same entropy coding table (or set of tables) must be used to encode all samples within a component.

JPEG decoders can store up to 4 different quantization tables and up to 4 different (sets of) entropy coding tables simultaneously. (The Baseline sequential decoder is the exception; it can only store up to 2 sets of entropy coding tables.) This is necessary for switching between different tables during decompression of a scan containing multiple (interleaved) components, in order to apply the proper table to the proper component. (Tables cannot be loaded during decompression of a scan.) Figure 9 illustrates the table-switching control that must be managed in conjunction with multiple-component interleaving for the encoder side. (This simplified view does not distinguish between quantization and entropy coding tables.)
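Equations (5) and (6) above are straightforward to apply in code; a sketch under our own naming:

/* Equation (5): reconstruct component dimensions from X, Y and the
   sampling factors, using integer ceiling division. */
void component_dims(int X, int Y, int Hi, int Vi, int Hmax, int Vmax,
                    int *xi, int *yi)
{
    *xi = (X * Hi + Hmax - 1) / Hmax;   /* ceil(X * Hi / Hmax) */
    *yi = (Y * Vi + Vmax - 1) / Vmax;   /* ceil(Y * Vi / Vmax) */
}

/* Equation (6): an interleave of n components is legal only if it has
   at most 4 components and at most 10 data units per MCU. */
int interleave_ok(const int H[], const int V[], int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += H[i] * V[i];
    return n <= 4 && sum <= 10;
}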
[Figure 9. Component-Interleave and Table-Switching Control: source image components A, B, and C enter the encoding process, which switches between Table Spec. 1 and Table Spec. 2 to produce the compressed image data]
7 Baseline and Other DCT Sequential Codecs

The DCT sequential mode of operation consists of the FDCT and Quantization steps from section 4, and the multiple-component control from section 6.3. In addition to the Baseline sequential codec, other DCT sequential codecs are defined to accommodate the two different sample precisions (8 and 12 bits) and the two different types of entropy coding methods (Huffman and arithmetic).

Baseline sequential coding is for images with 8-bit samples and uses Huffman coding only. It also differs from the other sequential DCT codecs in that its decoder can store only two sets of Huffman tables (one AC table and one DC table per set). This restriction means that, for images with three or four interleaved components, at least one set of Huffman tables must be shared by two components. This restriction poses no limitation at all for noninterleaved components; a new set of tables can be loaded into the decoder before decompression of a noninterleaved component begins.

For many applications which do need to interleave three color components, this restriction is hardly a limitation at all. Color spaces (YUV, CIELUV, CIELAB, and others) which represent the chromatic ("color") information in two components and the achromatic ("grayscale") information in a third are more efficient for compression than spaces like RGB. One Huffman table set can be used for the achromatic component and one for the chrominance components. DCT coefficient statistics are similar for the chrominance components of most images, and one set of Huffman tables can encode both almost as optimally as two.

The committee also felt that early availability of single-chip implementations at commodity prices would encourage early acceptance of the JPEG proposal in a variety of applications. In 1988 when Baseline sequential was defined, the committee's VLSI experts felt that current technology made the feasibility of crowding four sets of loadable Huffman tables - in addition to four sets of Quantization tables - onto a single commodity-priced codec chip a risky proposition.

The FDCT, Quantization, DC differencing, and zig-zag ordering processing steps for the Baseline sequential codec proceed just as described in section 4. Prior to entropy coding, there usually are few nonzero and many zero-valued coefficients. The task of entropy coding is to encode these few coefficients efficiently. The description of Baseline sequential entropy coding is given in two steps: conversion of the quantized DCT coefficients into an intermediate sequence of symbols and assignment of variable-length codes to the symbols.

7.1 Intermediate Entropy Coding Representations

In the intermediate symbol sequence, each nonzero AC coefficient is represented in combination with the "runlength" (consecutive number) of zero-valued AC coefficients which precede it in the zig-zag sequence. Each such runlength/nonzero-coefficient combination is (usually) represented by a pair of symbols:

symbol-1               symbol-2
(RUNLENGTH, SIZE)      (AMPLITUDE)

Symbol-1 represents two pieces of information, RUNLENGTH and SIZE. Symbol-2 represents the single piece of information designated AMPLITUDE, which is simply the amplitude of the nonzero AC coefficient. RUNLENGTH is the number of consecutive zero-valued AC coefficients in the zig-zag sequence preceding the nonzero AC coefficient being represented. SIZE is the number of bits used to encode AMPLITUDE - that is, to encode symbol-2, by the signed-integer encoding used with JPEG's particular method of Huffman coding.

RUNLENGTH represents zero-runs of length 0 to 15. Actual zero-runs in the zig-zag sequence can be greater than 15, so the symbol-1 value (15, 0) is interpreted as the extension symbol with runlength=16. There can be up to three consecutive (15, 0) extensions before the terminating symbol-1 whose RUNLENGTH value completes the actual runlength. The terminating symbol-1 is always followed by a single symbol-2, except for the case in which the last run of zeros includes the last (63rd) AC coefficient. In this frequent case, the special symbol-1 value (0,0) means EOB (end of block), and can be viewed as an "escape" symbol which terminates the 8x8 sample block.

Thus, for each 8x8 block of samples, the zig-zag sequence of 63 quantized AC coefficients is represented as a sequence of symbol-1, symbol-2 symbol pairs, though each "pair" can have repetitions of symbol-1 in the case of a long run-length or only one symbol-1 in the case of an EOB.
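The conversion of one block's zig-zag sequence into intermediate symbols can be sketched as follows. This is our own illustration; zz[1..63] is the zig-zag sequence of quantized AC coefficients, and emit_pair and emit_eob are hypothetical callbacks standing in for the variable-length coding step of section 7.2.

/* SIZE for a nonzero AMPLITUDE: the number of bits in its magnitude. */
int amplitude_size(int amp)
{
    int size = 0;
    if (amp < 0) amp = -amp;
    while (amp) { size++; amp >>= 1; }
    return size;
}

/* Convert zz[1..63] into (RUNLENGTH, SIZE)(AMPLITUDE) symbols. */
void encode_ac_symbols(const int zz[64],
                       void (*emit_pair)(int run, int size, int amp),
                       void (*emit_eob)(void))
{
    int run = 0;
    for (int k = 1; k < 64; k++) {
        if (zz[k] == 0) {
            run++;
            continue;
        }
        while (run > 15) {            /* (15,0) extension: runlength 16 */
            emit_pair(15, 0, 0);
            run -= 16;
        }
        emit_pair(run, amplitude_size(zz[k]), zz[k]);
        run = 0;
    }
    if (run > 0)                      /* trailing zeros: end of block */
        emit_eob();
}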
The possible range of quantized AC coefficients determines the range of values which both the AMPLITUDE and the SIZE information must represent. A numerical analysis of the 8x8 FDCT equation shows that, if the 64-point (8x8 block) input signal contains N-bit integers, then the nonfractional part of the output numbers (DCT coefficients) can grow by at most 3 bits. This is also the largest possible size of a quantized DCT coefficient when its quantizer step size has integer value 1.

Baseline sequential has 8-bit integer source samples in the range [-2^7, 2^7 - 1], so quantized AC coefficient amplitudes are covered by integers in the range [-2^10, 2^10 - 1]. The signed-integer encoding uses symbol-2 AMPLITUDE codes of 1 to 10 bits in length (so SIZE also represents values from 1 to 10), and RUNLENGTH represents values from 0 to 15 as discussed previously. For AC coefficients, the structure of the symbol-1 and symbol-2 intermediate representations is illustrated in Tables 2 and 3, respectively.

                          SIZE
               0      1   2   .  .  .   9   10

  RUN-     0   EOB
  LENGTH   .
           .          RUN-SIZE values
           .
          15   ZRL

Table 2. Baseline Huffman Coding Symbol-1 Structure

The intermediate representation for an 8x8 sample block's differential DC coefficient is structured similarly. Symbol-1, however, represents only SIZE information; symbol-2 represents AMPLITUDE information as before:

symbol-1    symbol-2
(SIZE)      (AMPLITUDE)

Because the DC coefficient is differentially encoded, it is covered by twice as many integer values, [-2^11, 2^11 - 1], as the AC coefficients, so one additional level must be added to the bottom of Table 3 for DC coefficients. Symbol-1 for DC coefficients thus represents a value from 1 to 11.

7.2 Variable-Length Entropy Coding

Once the quantized coefficient data for an 8x8 block is represented in the intermediate symbol sequence described above, variable-length codes are assigned. For each 8x8 block, the DC coefficient's symbol-1 and symbol-2 representation is coded and output first.

For both DC and AC coefficients, each symbol-1 is encoded with a variable-length code (VLC) from the Huffman table set assigned to the 8x8 block's image component. Each symbol-2 is encoded with a "variable-length integer" (VLI) code whose length in bits is given in Table 3. VLCs and VLIs both are codes with variable lengths, but VLIs are not Huffman codes. An important distinction is that the length of a VLC (Huffman code) is not known until it is decoded, but the length of a VLI is stored in its preceding VLC.

SIZE    AMPLITUDE
1       -1,1
2       -3,-2,2,3
3       -7..-4,4..7
4       -15..-8,8..15
5       -31..-16,16..31
6       -63..-32,32..63
7       -127..-64,64..127
8       -255..-128,128..255
9       -511..-256,256..511
10      -1023..-512,512..1023

Table 3. Baseline Entropy Coding Symbol-2 Structure

Huffman codes (VLCs) must be specified externally as an input to JPEG encoders. (Note that the form in which Huffman tables are represented in the data stream is an indirect specification with which the decoder must construct the tables themselves prior to decompression.) The JPEG proposal includes an example set of Huffman tables in its informational annex, but because they are application-specific, it specifies none for required use. The VLI codes, in contrast, are "hardwired" into the proposal. This is appropriate, because the VLI codes are far more numerous, can be computed rather than stored, and have not been shown to be appreciably more efficient when implemented as Huffman codes.

7.3 Baseline Encoding Example

This section gives an example of Baseline compression and encoding of a single 8x8 sample block. Note that a good deal of the operation of a complete JPEG Baseline encoder is omitted here, including creation of Interchange Format information (parameters, headers, quantization and Huffman tables), byte-stuffing, padding to byte-boundaries prior to a marker code, and other key operations. Nonetheless, this example should help to make concrete much of the foregoing explanation.

Figure 10(a) is an 8x8 block of 8-bit samples, arbitrarily extracted from a real image. The small variations from sample to sample indicate the predominance of low spatial frequencies. After subtracting 128 from each sample for the required level-shift, the 8x8 block is input to the FDCT, equation (1). Figure 10(b) shows (to one decimal place) the resulting DCT coefficients. Except for a few of the lowest frequency coefficients, the amplitudes are quite small.
(a) source image samples

139 144 149 153 155 155 155 155
144 151 153 156 159 156 156 156
150 155 160 163 158 156 156 156
159 161 162 160 160 159 159 159
159 160 161 162 162 155 155 155
161 161 161 161 160 157 157 157
162 162 161 163 162 157 157 157
162 162 161 161 163 158 158 158

(b) forward DCT coefficients

235.6  -1.0 -12.1  -5.2   2.1  -1.7  -2.7   1.3
-22.6 -17.5  -6.2  -3.2  -2.9  -0.1   0.4  -1.2
-10.9  -9.3  -1.6   1.5   0.2  -0.9  -0.6  -0.1
 -7.1  -1.9   0.2   1.5   0.9  -0.1   0.0   0.3
 -0.6  -0.8   1.5   1.6  -0.1  -0.7   0.6   1.3
  1.8  -0.2   1.6  -0.3  -0.8   1.5   1.0  -1.0
 -1.3  -0.4  -0.3  -1.5  -0.5   1.7   1.1  -0.8
 -2.6   1.6  -3.8  -1.8   1.9   1.2  -0.6  -0.4

(c) quantization table

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

(d) normalized quantized coefficients

15   0  -1   0   0   0   0   0
-2  -1   0   0   0   0   0   0
-1  -1   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0

(e) denormalized quantized coefficients

240   0 -10   0   0   0   0   0
-24 -12   0   0   0   0   0   0
-14 -13   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

(f) reconstructed image samples

144 146 149 152 154 156 156 156
148 150 152 154 156 156 156 156
155 156 157 158 158 157 156 155
160 161 161 162 161 159 157 155
163 163 164 163 162 160 158 156
163 164 164 164 162 160 158 157
160 161 162 162 162 161 159 158
158 159 161 161 162 161 159 158

Figure 10. DCT and Quantization Examples
Figure 10(c) is the example quantization table for luminance (grayscale) components included in the informational annex of the draft JPEG standard part 1 [2]. Figure 10(d) shows the quantized DCT coefficients, normalized by their quantization table entries, as specified by equation (3). At the decoder these numbers are "denormalized" according to equation (4), and input to the IDCT, equation (2). Finally, figure 10(f) shows the reconstructed sample values, remarkably similar to the originals in 10(a).

Of course, the numbers in figure 10(d) must be Huffman-encoded before transmission to the decoder. The first number of the block to be encoded is the DC term, which must be differentially encoded. If the quantized DC term of the previous block is, for example, 12, then the difference is +3. Thus, the intermediate representation is (2)(3), for SIZE=2 and AMPLITUDE=3.

Next, the quantized AC coefficients are encoded. Following the zig-zag order, the first non-zero coefficient is -2, preceded by a zero-run of 1. This yields an intermediate representation of (1,2)(-2). Next encountered in the zig-zag order are three consecutive non-zeros of amplitude -1. This means each is preceded by a zero-run of length zero, for intermediate symbols (0,1)(-1). The last non-zero coefficient is -1 preceded by two zeros, for (2,1)(-1). Because this is the last non-zero coefficient, the final symbol representing this 8x8 block is EOB, or (0,0).

Thus, the intermediate sequence of symbols for this example 8x8 block is:

(2)(3), (1,2)(-2), (0,1)(-1), (0,1)(-1), (0,1)(-1), (2,1)(-1), (0,0)

Next the codes themselves must be assigned. For this example, the VLCs (Huffman codes) from the informational annex of [2] will be used. The differential-DC VLC for this example is:

(2)      011

The AC luminance VLCs for this example are:

(0,0)    1010
(0,1)    00
(1,2)    11011
(2,1)    11100

The VLIs specified in [2] are related to the two's complement representation. They are:

(3)      11
(-2)     01
(-1)     0

Thus, the bit-stream for this 8x8 example block is as follows. Note that 31 bits are required to represent 64 coefficients, which achieves compression of just under 0.5 bits/sample:

0111111011010000000001110001010
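As noted in section 7.2, the VLIs can be computed rather than stored. A sketch (ours): a positive AMPLITUDE is sent as its SIZE low-order bits, and a negative AMPLITUDE as amplitude minus 1 taken in SIZE bits, which is the one's complement of its magnitude. This reproduces the VLIs used above: vli_bits(3, 2) is binary 11, vli_bits(-2, 2) is 01, and vli_bits(-1, 1) is 0; concatenating the VLCs and VLIs listed above yields the 31-bit stream shown.

/* Variable-length integer (VLI) for a nonzero amplitude: returns the
   SIZE low-order bits to append after the symbol-1 Huffman code. */
unsigned vli_bits(int amp, int size)
{
    if (amp < 0)
        amp += (1 << size) - 1;       /* one's complement of magnitude */
    return (unsigned)amp & ((1u << size) - 1);
}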
7.4 Other DCT Sequential Codecs

The structure of the 12-bit DCT sequential codec with Huffman coding is a straightforward extension of the entropy coding method described previously. Quantized DCT coefficients can be 4 bits larger, so the SIZE and AMPLITUDE information extend accordingly. DCT sequential with arithmetic coding is described in detail in [2].
8 DCT Progressive Mode

The DCT progressive mode of operation consists of the same FDCT and Quantization steps (from section 4) that are used by DCT sequential mode. The key difference is that each image component is encoded in multiple scans rather than in a single scan. The first scan(s) encode a rough but recognizable version of the image which can be transmitted quickly in comparison to the total transmission time, and are refined by succeeding scans until reaching a level of picture quality that was established by the quantization tables.

To achieve this requires the addition of an image-sized buffer memory at the output of the quantizer, before the input to the entropy encoder. The buffer memory must be of sufficient size to store the image as quantized DCT coefficients, each of which (if stored straightforwardly) is 3 bits larger than the source image samples. After each block of DCT coefficients is quantized, it is stored in the coefficient buffer memory. The buffered coefficients are then partially encoded in each of multiple scans.

There are two complementary methods by which a block of quantized DCT coefficients may be partially encoded. First, only a specified "band" of coefficients from the zig-zag sequence need be encoded within a given scan. This procedure is called "spectral selection," because each band typically contains coefficients which occupy a lower or higher part of the spatial-frequency spectrum for that 8x8 block. Secondly, the coefficients within the current band need not be encoded to their full (quantized) accuracy in a given scan. Upon a coefficient's first encoding, the N most significant bits can be encoded first, where N is specifiable. In subsequent scans, the less significant bits can then be encoded. This procedure is called "successive approximation." Both procedures can be used separately, or mixed in flexible combinations.

Some intuition for spectral selection and successive approximation can be obtained from Figure 11. The quantized DCT coefficient information can be viewed as a rectangle for which the axes are the DCT coefficients (in zig-zag order) and their amplitudes. Spectral selection slices the information in one dimension and successive approximation in the other.
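As a rough illustration of the two partial-encoding methods, the following C fragment (a sketch under assumed conventions, not the entropy-coding procedure of [2]) selects what one progressive scan would code from a buffered 8x8 block: a zig-zag band [band_start, band_end] for spectral selection, with the Al low-order bits deferred to later scans for successive approximation:

#include <stdio.h>
#include <stdlib.h>

void encode_partial(const int coef[64],  /* quantized block, zig-zag order */
                    int band_start,      /* spectral-selection band */
                    int band_end,
                    int Al)              /* low-order bits deferred */
{
    for (int k = band_start; k <= band_end; k++) {
        int magnitude = abs(coef[k]) >> Al;         /* keep high-order bits */
        int sent = (coef[k] < 0) ? -magnitude : magnitude;
        printf("%d ", sent);        /* a real codec would entropy-code this */
    }
    printf("\n");
}

A first scan might send a low band with a large Al; later scans send the higher bands and the deferred low-order bits, refining the image in exactly the two directions pictured in Figure 11.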
9 Hierarchical Mode of Operation

The hierarchical mode provides a "pyramidal" encoding of an image at multiple resolutions, each differing in resolution from its adjacent encoding by a factor of two in the horizontal or vertical dimension or both. The encoding procedure can be summarized as follows:

1) Filter and down-sample the original image by the desired number of multiples of 2 in each dimension.

2) Encode this reduced-size image using one of the sequential DCT, progressive DCT, or lossless encoders described previously.

3) Decode this reduced-size image and then interpolate and up-sample it by 2 horizontally and/or vertically, using the identical interpolation filter which the receiver must use.
[Figure 11. Spectral Selection and Successive Approximation Methods of Progressive Encoding. Panels shown in the original: (a) image component as quantized DCT coefficients; (b) sequential encoding; (c) progressive encoding: spectral selection; (d) progressive encoding: successive approximation.]
4) Use this up-sampled image as a prediction of the original at this resolution, and encode the difference image using one of the sequential DCT, progressive DCT, or lossless encoders described previously.

5) Repeat steps 3) and 4) until the full resolution of the image has been encoded.

The encoding in steps 2) and 4) must be done using only DCT-based processes, only lossless processes, or DCT-based processes with a final lossless process for each component.

Hierarchical encoding is useful in applications in which a very high resolution image must be accessed by a lower-resolution display. An example is an image scanned and compressed at high resolution for a very high-quality printer, where the image must also be displayed on a low-resolution PC video screen.
10 Other Aspects of the JPEG Proposal

Some key aspects of the proposed standard can only be mentioned briefly. Foremost among these are points concerning the coded representation for compressed image data specified in addition to the encoding and decoding procedures.

Most importantly, an interchange format syntax is specified which ensures that a JPEG-compressed image can be exchanged successfully between different application environments. The format is structured in a consistent way for all modes of operation. The interchange format always includes all quantization and entropy-coding tables which were used to compress the image.

Applications (and application-specific standards) are the "users" of the JPEG standard. The JPEG standard imposes no requirement that, within an application's environment, all or even any tables must be encoded with the compressed image data during storage or transmission. This leaves applications the freedom to specify default or referenced tables if they are considered appropriate. It also leaves them the responsibility to ensure that JPEG-compliant decoders used within their environment get loaded with the proper tables at the proper times, and that the proper tables are included in the interchange format when a compressed image is "exported" outside the application.

11 Standardization Schedule

JPEG's ISO standard will be divided into two parts. Part 1 [2] will specify the four modes of operation, the different codecs specified for those modes, and the interchange format. It will also contain a substantial informational section on implementation guidelines. Part 2 [3] will specify the compliance tests which will determine whether an encoder implementation, a decoder implementation, or a JPEG-compressed image in interchange format comply with the Part 1 specifications. In addition to the ISO documents referenced, the JPEG standard will also be issued as CCITT Recommendation T.81.

There are two key balloting phases in the ISO standardization process: a Committee Draft (CD) is balloted to determine promotion to Draft International Standard (DIS), and a DIS is balloted to determine promotion to International Standard (IS). A CD ballot requires four to six months of processing, and a DIS ballot requires six to nine months of processing. JPEG's Part 1 began DIS ballot in November 1991, and Part 2 began CD ballot in December 1991.

Though there is no guarantee that the first ballot of each phase will result in promotion to the next, JPEG achieved promotion of CD Part 1 to DIS Part 1 in the first ballot. Moreover, JPEG's DIS Part 1 has undergone no technical changes (other than some minor corrections) since JPEG's final Working Draft (WD) [14]. Thus, Part 1 has remained unchanged from the final WD, through CD, and into DIS. If all goes well, Part 1 should receive final approval as an IS in mid-1992, with Part 2 getting final IS approval about nine months later.

12 Conclusions

The emerging JPEG continuous-tone image compression standard is not a panacea that will solve the myriad issues which must be addressed before digital images will be fully integrated within all the applications that will ultimately benefit from them. For example, if two applications cannot exchange uncompressed images because they use incompatible color spaces, aspect ratios, dimensions, etc., then a common compression method will not help.

Some of the important applications that are already in the process of adopting JPEG compression or have stated their interest in doing so are Adobe's PostScript language for printing systems [1], the Raster Content portion of the ISO Office Document Architecture and Interchange Format [13], the future CCITT color facsimile standard, and the European ETSI videotext standard [10].

However, a great many applications are "stuck" because of storage or transmission costs, because of argument over which (nonstandard) compression method to use, or because VLSI codecs are too expensive due to low volumes. For these applications, the thorough technical evaluation, testing, selection, validation, and documentation work which JPEG committee members have performed is expected to soon yield an approved international standard that will withstand the tests of quality and time. As diverse imaging applications become increasingly implemented on open networked computing systems, the ultimate measure of the committee's success will be when JPEG-compressed digital images come to be regarded and even taken for granted as "just another data type," as text and graphics are today.
For more information

Information on how to obtain the ISO JPEG (draft) standards can be obtained by writing the author at the following address:

Digital Equipment Corporation
146 Main Street, ML01-2/U44
Maynard, MA 01754-2571

Internet: [email protected]

Floppy disks containing uncompressed, compressed, and reconstructed data for the purpose of informally validating whether an encoder or decoder implementation conforms to the proposed standard are available. Thanks to the following JPEG committee member and his company who have agreed to provide these for a nominal fee on behalf of the committee until arrangements can be made for ISO to provide them:

Eric Hamilton
C-Cube Microsystems
1778 McCarthy Blvd.
Milpitas, CA 95035

Acknowledgments

The following longtime JPEG core members have spent untold hours (usually in addition to their "real jobs") to make this collaborative international effort succeed. Each has made specific substantive contributions to the JPEG proposal: Aharon Gill (Zoran, Israel), Eric Hamilton (C-Cube, USA), Alain Leger (CCETT, France), Adriaan Ligtenberg (Storm, USA), Herbert Lohscheller (ANT, Germany), Joan Mitchell (IBM, USA), Michael Nier (Kodak, USA), Takao Omachi (NEC, Japan), William Pennebaker (IBM, USA), Henning Poulsen (KTAS, Denmark), and Jorgen Vaaben (AutoGraph, Denmark). Thanks are also due for the leadership efforts of Hiroshi Yasuda (NTT, Japan), the Convenor of JTC1/SC2/WG8 from which JPEG was spawned, Istvan Sebestyen (Siemens, Germany), the Special Rapporteur from CCITT SGVIII, and Graham Hudson (British Telecom U.K.), former JPEG chair and founder of the effort which became JPEG. The author regrets that space does not permit recognition of the many other individuals who contributed to JPEG's work.

Thanks to Majid Rabbani of Eastman Kodak for providing the example in section 7.3.

The author's role within JPEG has been supported in a great number of ways by Digital Equipment Corporation.

References

1. Adobe Systems Inc. PostScript Language Reference Manual. Second Ed. Addison Wesley, Menlo Park, Calif., 1990.
2. Digital Compression and Coding of Continuous-tone Still Images, Part 1, Requirements and Guidelines. ISO/IEC JTC1 Draft International Standard 10918-1, Nov. 1991.
3. Digital Compression and Coding of Continuous-tone Still Images, Part 2, Compliance Testing. ISO/IEC JTC1 Committee Draft 10918-2, Dec. 1991.
4. Encoding parameters of digital television for studios. CCIR Recommendations, Recommendation 601, 1982.
5. Howard, P.G., and Vitter, J.S. New methods for lossless image compression using arithmetic coding. Brown University Dept. of Computer Science Tech. Report No. CS-91-47, Aug. 1991.
6. Hudson, G.P. The development of photographic videotex in the UK. In Proceedings of the IEEE Global Telecommunications Conference, IEEE Communications Society, 1983, pp. 319-322.
7. Hudson, G.P., Yasuda, H., and Sebestyén, I. The international standardization of a still picture compression technique. In Proceedings of the IEEE Global Telecommunications Conference, IEEE Communications Society, Nov. 1988, pp. 1016-1021.
8. Huffman, D.A. A method for the construction of minimum redundancy codes. In Proceedings IRE, vol. 40, 1952, pp. 1098-1101.
9. Léger, A. Implementations of fast discrete cosine transform for full color videotex services and terminals. In Proceedings of the IEEE Global Telecommunications Conference, IEEE Communications Society, 1984, pp. 333-337.
10. Léger, A., Omachi, T., and Wallace, G. The JPEG still picture compression algorithm. In Optical Engineering, vol. 30, no. 7 (July 1991), pp. 947-954.
11. Léger, A., Mitchell, M., and Yamazaki, Y. Still picture compression algorithms evaluated for international standardization. In Proceedings of the IEEE Global Telecommunications Conference, IEEE Communications Society, Nov. 1988, pp. 1028-1032.
12. Lohscheller, H. A subjectively adapted image communication system. IEEE Trans. Commun. COM-32 (Dec. 1984), pp. 1316-1322.
13. Office Document Architecture (ODA) and Interchange Format, Part 7: Raster Graphics Content Architectures. ISO/IEC JTC1 International Standard 8613-7.
14. Pennebaker, W.B. JPEG Tech. Specification, Revision 8. Informal Working Paper JPEG-8R8, Aug. 1990.
15. Pennebaker, W.B., Mitchell, J.L., et al. Arithmetic coding articles. IBM J. Res. Dev., vol. 32, no. 6 (Nov. 1988), pp. 717-774.
16. Rao, K.R., and Yip, P. Discrete Cosine Transform--Algorithms, Advantages, Applications. Academic Press, London, 1990.
17. Standardization of Group 3 facsimile apparatus for document transmission. CCITT Recommendations, Fascicle VII.2, Recommendation T.4, 1980.
18. Wallace, G.K. Overview of the JPEG (ISO/CCITT) still image compression standard. Image Processing Algorithms and Techniques. In Proceedings of the SPIE, vol. 1244 (Feb. 1990), pp. 220-233.
19. Wallace, G., Vivian, R., and Poulsen, H. Subjective testing results for still picture compression algorithms for international standardization. In Proceedings of the IEEE Global Telecommunications Conference, IEEE Communications Society, Nov. 1988, pp. 1022-1027.
Biography
Gregory K. Wallace is currently Manager of
Multimedia Engineering, Advanced Development, at
Digital Equipment Corporation. Since 1988 he has
served as Chair of the JPEG committee (ISO/IEC
JTC1/SC2/WG10). For the past five years at DEC,
he has worked on efficient software and hardware
implementations of image compression and
processing algorithms for incorporation in
general-purpose computing systems. He received the
BSEE and MSEE from Stanford University in 1977
and 1979. His current research interests are the
integration of robust real-time multimedia
capabilities into networked computing systems.
LZW and GIF explained----Steve Blackstock
I hope this little document will help enlighten those of you out there
who want to know more about the Lempel-Ziv Welch compression algorithm, and,
specifically, the implementation that GIF uses.
Before we start, here's a little terminology, for the purposes of this
document:
"character": a fundamental data element. In normal text files, this is
just a single byte. In raster images, which is what we're interested in, it's
an index that specifies the color of a given pixel. I'll refer to an arbitrary
character as "K".
"charstream": a stream of characters, as in a data file.
"string": a number of continuous characters, anywhere from one to very
many characters in length. I can specify an arbitrary string as "[...]K".
"prefix": almost the same as a string, but with the implication that a
prefix immediately precedes a character, and a prefix can have a length of
zero. So, a prefix and a character make up a string. I will refer to an
arbitrary prefix as "[...]".
"root": a single-character string. For most purposes, this is a
character, but we may occasionally make a distinction. It is [...]K, where
[...] is empty.
"code": a number, specified by a known number of bits, which maps to a
string.
"codestream": the output stream of codes, as in the "raster data"
"entry": a code and its string.
"string table": a list of entries; usually, but not necessarily, unique.
That should be enough of that.
LZW is a way of compressing data that takes advantage of repetition of
strings in the data. Since raster data usually contains a lot of this
repetition, LZW is a good way of compressing and decompressing it.
For the moment, let's consider normal LZW encoding and decoding. GIF's
variation on the concept is just an extension from there.
LZW manipulates three objects in both compression and decompression: the
charstream, the codestream, and the string table. In compression, the
charstream is the input and the codestream is the output. In decompression,
the codestream is the input and the charstream is the output. The string table
is a product of both compression and decompression, but is never passed from
one to the other.
The first thing we do in LZW compression is initialize our string table.
To do this, we need to choose a code size (how many bits) and know how many
values our characters can possibly take. Let's say our code size is 12 bits,
meaning we can store 0->FFF, or 4096 entries in our string table. Let's also
say that we have 32 possible different characters. (This corresponds to, say,
a picture in which there are 32 different colors possible for each pixel.) To
initialize the table, we set code#0 to character#0, code #1 to character#1,
and so on, until code#31 to character#31. Actually, we are specifying that
each code from 0 to 31 maps to a root. There will be no more entries in the
table that have this property.
Now we start compressing data. Let's first define something called the
"current prefix". It's just a prefix that we'll store things in and compare
things to now and then. I will refer to it as "[.c.]". Initially, the current
prefix has nothing in it. Let's also define a "current string", which will be
the current prefix plus the next character in the charstream. I will refer to
the current string as "[.c.]K", where K is some character. OK, look at the
first character in the charstream. Call it P. Make [.c.]P the current string.
(At this point, of course, it's just the root P.) Now search through the
string table to see if [.c.]P appears in it. Of course, it does now, because
our string table is initialized to have all roots. So we don't do anything.
Now make [.c.]P the current prefix. Look at the next character in the
charstream. Call it Q. Add it to the current prefix to form [.c.]Q, the
current string. Now search through the string table to see if [.c.]Q appears
in it. In this case, of course, it doesn't. Aha! Now we get to do something.
Add [.c.]Q (which is PQ in this case) to the string table for code#32, and
output the code for [.c.] to the codestream. Now start over again with the
current prefix being just the root P. Keep adding characters to [.c.] to form
[.c.]K, until you can't find [.c.]K in the string table. Then output the code
for [.c.] and add [.c.]K to the string table. In pseudo-code, the algorithm
goes something like this:
[1] Initialize string table;
[2] [.c.] <- empty;
[3] K <- next character in charstream;
[4] Is [.c.]K in string table?
(yes: [.c.] <- [.c.]K;
go to [3];
)
(no: add [.c.]K to the string table;
output the code for [.c.] to the codestream;
[.c.] <- K;
go to [3];
)
It's as simple as that! Of course, when you get to step [3] and there
aren't any more characters left, you just output the code for [.c.] and throw
the table away. You're done.
Wanna do an example? Let's pretend we have a four-character alphabet:
A,B,C,D. The charstream looks like ABACABA. Let's compress it. First, we
initialize our string table to: #0=A, #1=B, #2=C, #3=D. The first character is
A, which is in the string table, so [.c.] becomes A. Next we get AB, which is
not in the table, so we output code #0 (for [.c.]),
and add AB to the string table as code #4. [.c.] becomes B. Next we get
[.c.]A = BA, which is not in the string table, so output code #1, and add BA
to the string table as code #5. [.c.] becomes A. Next we get AC, which is not
in the string table. Output code #0, and add AC to the string table as code
#6. Now [.c.] becomes C. Next we get [.c.]A = CA, which is not in the table.
Output #2 for C, and add CA to table as code#7. Now [.c.] becomes A. Next we
get AB, which IS in the string table, so [.c.] gets AB, and we look at ABA,
which is not in the string table, so output the code for AB, which is #4, and
add ABA to the string table as code #8. [.c.] becomes A. We can't get any more
characters, so we just output #0 for the code for A, and we're done. So, the
codestream is #0#1#0#2#4#0.
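If it helps to see that run, here is a minimal C sketch of the compression loop (my rendering, not GIF's actual code). It stores each table entry as a (prefix code, appended character) pair and uses a plain linear search rather than the hashing strategy recommended below. Fed the charstream ABACABA over the alphabet A..D, it prints the codestream #0#1#0#2#4#0:

#include <stdio.h>

#define MAX_ENTRIES 4096

int prefix_of[MAX_ENTRIES];   /* code of [...] for each entry, -1 = empty */
char char_of[MAX_ENTRIES];    /* final character K for each entry */
int table_size;

/* return the code for the string ([...]=prefix, K=c), or -1 if absent */
int find(int prefix, char c)
{
    for (int i = 0; i < table_size; i++)
        if (prefix_of[i] == prefix && char_of[i] == c)
            return i;
    return -1;
}

int main(void)
{
    const char *charstream = "ABACABA";

    /* initialize the string table with the roots A..D as codes 0..3 */
    table_size = 0;
    for (char r = 'A'; r <= 'D'; r++) {
        prefix_of[table_size] = -1;
        char_of[table_size++] = r;
    }

    int cur = -1;                      /* [.c.], the current prefix; empty */
    for (const char *p = charstream; *p; p++) {
        int code = find(cur, *p);
        if (code >= 0) {
            cur = code;                /* [.c.]K is in the table: extend */
        } else {
            printf("#%d", cur);        /* output the code for [.c.] */
            prefix_of[table_size] = cur;     /* add [.c.]K to the table */
            char_of[table_size++] = *p;
            cur = *p - 'A';            /* [.c.] <- the root K */
        }
    }
    printf("#%d\n", cur);              /* flush the final prefix */
    return 0;
}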
A few words (four) should be said here about efficiency: use a hashing
strategy. The search through the string table can be computationally
intensive, and some hashing is well worth the effort. Also, note that
"straight LZW" compression runs the risk of overflowing the string table:
getting to a code which can't be represented in the number of bits you've set
aside for codes. There are several ways of dealing with this problem, and GIF
implements a very clever one, but we'll get to that.
An important thing to notice is that, at any point during the
compression, if [...]K is in the string table, [...] is there also. This fact
suggests an efficient method for storing strings in the table. Rather than
store the entire string of K's in the table, realize that any string can be
expressed as a prefix plus a character: [...]K. If we're about to store [...]K
in the table, we know that [...] is already there, so we can just store the
code for [...] plus the final character K.
Ok, that takes care of compression. Decompression is perhaps more
difficult conceptually, but it is really easier to program.
Here's how it goes: We again have to start with an initialized string
table. This table comes from what knowledge we have about the charstream that
we will eventually get, like what possible values the characters can take. In
GIF files, this information is in the header as the number of possible pixel
values. The beauty of LZW, though, is that this is all we need to know. We
will build the rest of the string table as we decompress the codestream. The
compression is done in such a way that we will never encounter a code in the
codestream that we can't translate into a string.
We need to define something called a "current code", which I will refer
to as "<code>", and an "old-code", which I will refer to as "<old>". To start
things off, look at the first code. This is now <code>. This code will be in
the initialized string table as the code for a root. Output the root to the
charstream. Make this code the old-code <old>. *Now look at the next code, and
make it <code>. It is possible that this code will not be in the string table,
but let's assume for now that it is. Output the string corresponding to <code>
to the charstream. Now find the first character in the string you just
translated. Call this K. Add this to the prefix [...] generated by <old> to
form a new string [...]K. Add this string [...]K to the string table, and set
the old-code <old> to the current code <code>. Repeat from where I typed the
asterisk, and you're all set. Read this paragraph again if you just skimmed
it!!! Now let's consider the possibility that <code> is not in the string
table. Think back to compression, and try to understand what happens when you
have a string like P[...]P[...]PQ appear in the charstream. Suppose P[...] is
already in the string table, but P[...]P is not. The compressor will parse out
P[...], and find that P[...]P is not in the string table. It will output the
code for P[...], and add P[...]P to the string table. Then it will get up to
P[...]P for the next string, and find that P[...]P is in the table, as
the code just added. So it will output the code for P[...]P if it finds
that P[...]PQ is not in the table. The decompressor is always "one step
behind" the compressor. When the decompressor sees the code for P[...]P, it
will not have added that code to its string table yet because it needed the
beginning character of P[...]P to add to the string for the last code, P[...],
to form the code for P[...]P. However, when a decompressor finds a code that
it doesn't know yet, it will always be the very next one to be added to the
string table. So it can guess at what the string for the code should be, and,
in fact, it will always be correct. If I am a decompressor, and I see
code#124, and yet my string table has entries only up to code#123, I can
figure out what code#124 must be, add it to my string table, and output the
string. If code#123 generated the string, which I will refer to here as a
prefix, [...], then code#124, in this special case, will be [...] plus the
first character of [...]. So just add the first character of [...] to the end
of itself. Not too bad. As an example (and a very common one) of this special
case, let's assume we have a raster image in which the first three pixels have
the same color value. That is, my charstream looks like: QQQ.... For the sake
of argument, let's say we have 32 colors, and Q is the color#12. The
compressor will generate the code sequence 12,32,.... (if you don't know why,
take a minute to understand it.) Remember that #32 is not in the initial
table, which goes from #0 to #31. The decompressor will see #12 and translate
it just fine as color Q. Then it will see #32 and not yet know what that
means. But if it thinks about it long enough, it can figure out that QQ should
be entry#32 in the table and QQ should be the next string output. So the
decompression pseudo-code goes something like:
[1] Initialize string table;
[2] get first code: <code>;
[3] output the string for <code> to the charstream;
[4] <old> <- <code>;
[5] <code> <- next code in codestream;
[6] does <code> exist in the string table?
    (yes: output the string for <code> to the charstream;
          [...] <- translation for <old>;
          K <- first character of translation for <code>;
          add [...]K to the string table;
          <old> <- <code>;
    )
    (no:  [...] <- translation for <old>;
          K <- first character of [...];
          output [...]K to charstream and add it to string table;
          <old> <- <code>;
    )
[7] go to [5];
Again, when you get to step [5] and there are no more codes, you're
finished. Outputting of strings, and finding of initial characters in strings
are efficiency problems all to themselves, but I'm not going to suggest ways
to do them here. Half the fun of programming is figuring these things out!
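All the same, here is a matching C sketch of the decoder (again mine, using the same (prefix code, character) table as the compression sketch above); the else-branch handles the code-not-yet-in-table special case exactly as just described. Run on the example codestream, it prints ABACABA:

#include <stdio.h>

#define MAX_ENTRIES 4096

int prefix_of[MAX_ENTRIES];
char char_of[MAX_ENTRIES];
int table_size;

/* print the string for a code and return its first character K */
char output(int code)
{
    char first = char_of[code];
    if (prefix_of[code] >= 0) {        /* [...]K: print [...] first */
        first = output(prefix_of[code]);
        putchar(char_of[code]);
    } else {
        putchar(first);                /* a root */
    }
    return first;
}

int main(void)
{
    int codestream[] = {0, 1, 0, 2, 4, 0};   /* from the example above */
    int ncodes = 6;

    table_size = 0;                    /* roots A..D as codes 0..3 */
    for (char r = 'A'; r <= 'D'; r++) {
        prefix_of[table_size] = -1;
        char_of[table_size++] = r;
    }

    int old = codestream[0];
    output(old);                       /* the first code is always a root */
    for (int i = 1; i < ncodes; i++) {
        int code = codestream[i];
        char k;
        if (code < table_size) {
            k = output(code);          /* known code */
        } else {
            k = output(old);           /* special case: <old>'s string plus */
            putchar(k);                /* its own first character */
        }
        prefix_of[table_size] = old;   /* add [...]K to the string table */
        char_of[table_size++] = k;
        old = code;
    }
    putchar('\n');
    return 0;
}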
--Now for the GIF variations on the theme. In part of the header of a GIF
file, there is a field, in the Raster Data stream, called "code size". This is
a very misleading name for the field, but we have to live with it. What it is
really is the "root size". The actual size, in bits, of the compression codes
actually changes during compression/decompression, and I will refer to that
size here as the "compression size". The initial table is just the codes for
all the roots, as usual, but two special codes are added on top of those.
Suppose you have a "code size", which is usually the number of bits per pixel
in the image, of N. If the number of bits/pixel is one, then N must be 2: the
roots take up slots #0 and #1 in the initial table, and the two special codes
will take up slots #4 and #5. In any other case, N is the number of bits per
pixel, and the roots take up slots #0 through #(2**N-1), and the special codes
are (2**N) and (2**N + 1). The initial compression size will be N+1 bits per
code. If you're encoding, you output the codes (N+1) bits at a time to start
with, and if you're decoding, you grab (N+1) bits from the codestream at a
time. As for the special codes: <CC> or the clear code, is (2**N), and <EOI>,
or end-of-information, is (2**N + 1). <CC> tells the compressor to reinitialize the string table, and to reset the compression size to (N+1). <EOI>
means there's no more in the codestream. If you're encoding or decoding, you
should start adding things to the string table at <CC> + 2. If you're
encoding, you should output <CC> as the very first code, and then again
whenever you reach code #4095 (hex FFF), because GIF does not allow
compression sizes to be greater than 12 bits. If you're decoding, you should
reinitialize your string table when you observe <CC>. The variable
compression sizes are really no big deal. If you're encoding, you start with a
compression size of (N+1) bits, and, whenever you output the code
(2**(compression size)-1), you bump the compression size up one bit. So the
next code you output will be one bit longer. Remember that the largest
compression size is 12 bits, corresponding to a code of 4095. If you get that
far, you must output <CC> as the next code, and start over. If you're
decoding, you must increase your compression size AS SOON AS YOU write entry
#(2**(compression size) - 1) to the string table. The next code you READ will
be one bit longer. Don't make the mistake of waiting until you need to add the
code (2**compression size) to the table. You'll have already missed a bit from
the last code. The packaging of codes into a bitstream for the raster data is
also a potential stumbling block for the novice encoder or decoder. The lowest
order bit in the code should coincide with the lowest available bit in the
first available byte in the codestream. For example, if you're starting with
5-bit compression codes, and your first three codes are, say, <abcde>,
<fghij>, <klmno>, where e, j, and o are bit#0, then your codestream will start
off like:
byte#0: hijabcde
byte#1: .klmnofg
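A small C sketch of this LSB-first packing may help (my illustration; the three 5-bit code values are arbitrary stand-ins for <abcde>, <fghij>, <klmno>):

#include <stdio.h>

unsigned long bitbuf = 0;   /* accumulates codes, least significant first */
int bitcount = 0;           /* number of valid bits waiting in bitbuf */

void put_code(unsigned code, int size)
{
    bitbuf |= (unsigned long)code << bitcount;  /* lowest available bit */
    bitcount += size;
    while (bitcount >= 8) {                     /* emit completed bytes */
        printf("byte: %02X\n", (unsigned)(bitbuf & 0xFF));
        bitbuf >>= 8;
        bitcount -= 8;
    }
}

int main(void)
{
    put_code(0x1F, 5);   /* <abcde> = 11111, say */
    put_code(0x00, 5);   /* <fghij> = 00000 */
    put_code(0x1F, 5);   /* <klmno> = 11111 */
    if (bitcount > 0)    /* flush the final partial byte */
        printf("byte: %02X\n", (unsigned)(bitbuf & 0xFF));
    return 0;
}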
So the differences between straight LZW and GIF LZW are: two additional
special codes and variable compression sizes. If you understand LZW, and you
understand those variations, you understand it all!
Just as sort of a P.S., you may have noticed that a compressor has a
little bit of flexibility at compression time. I specified a "greedy" approach
to the compression, grabbing as many characters as possible before outputting
codes. This is, in fact, the standard LZW way of doing things, and it will
yield the best compression ratio. But there's no rule saying you can't stop
anywhere along the line and just output the code for the current prefix,
whether it's already in the table or not, and add that string plus the next
character to the string table. There are various reasons for wanting to do
this, especially if the strings get extremely long and make hashing difficult.
If you need to, do it.
Hope this helps out.----steve blackstock
New Technical Notes
Macintosh® Developer Support

PT 24 - MacPaint Document Format
Platforms & Tools

Revised by: Jim Reekes, June 1989
Written by: Bill Atkinson, 1983
This Technical Note describes the internal format of a MacPaint® document, which is a
standard used by many other programs. This description is the same as that found in the
“Macintosh Miscellaneous” section of early Inside Macintosh versions.
Changes since October 1988: Fixed bugs in the example code.
MacPaint documents are easy to read and write, and they have become a standard interchange
format for full–page images on the Macintosh. This Note describes the MacPaint internal
document format to help developers generate and interpret files in this format.
MacPaint documents have a file type of “PNTG,” and since they use only the data fork, you
can ignore the resource fork. The data fork contains a 512–byte header followed by
compressed data which represents a single bitmap (576 pixels wide by 720 pixels tall). At a
resolution of 72 pixels per inch, this bitmap occupies the full 8 inch by 10 inch printable area of
a standard ImageWriter printer page.
Header
The first 512 bytes of the document form a header of the following format:
• 4–byte version number (default = 2)
• 38*8 = 304 bytes of patterns
• 204 unused bytes (reserved for future expansion)
As a Pascal record, the document format could look like the following:
MPHeader = RECORD
    Version:  LONGINT;
    PatArray: ARRAY [1..38] OF Pattern;
    Future:   PACKED ARRAY [1..204] OF SignedByte;
END;
If the version number is zero, the document uses default patterns, so you can ignore the rest of
the header block, and if your program generates MacPaint documents, you can write 512 bytes
of zero for the document header. Most programs which read MacPaint documents can skip the
header when reading.
Bitmap
Following the header are 720 compressed scan lines of data which form the 576 pixel wide by
720 pixel tall bitmap. Without compression, this bitmap would occupy 51,840 bytes and chew
up disk space pretty fast; typical MacPaint documents compress to about 10K using the
_PackBits procedure to compress runs of equal bytes within each scan line. The bitmap part
of a MacPaint document is simply the output of _PackBits called 720 times, with 72 bytes
of input each time.
To determine the maximum size of a MacPaint file, it is worth noting what Inside Macintosh
says about _PackBits:
“The worst case would be when _PackBits adds one byte to the row of bytes when
packing.”
If we include an extra 512 bytes for the file header information to the size of an uncompressed
bitmap (51,840), then the total number of bytes would be 52,352. If we take into account the
extra 720 “potential” bytes (one for each row) to the previous total, the maximum size of a
MacPaint file becomes 53,072 bytes.
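For developers working outside the Macintosh Toolbox, the following C sketch (not part of the original Note) decodes PackBits data using the documented flag-byte convention: a flag byte of 0..127 copies the next flag+1 literal bytes, a flag of -1..-127 repeats the next byte 1-flag times, and -128 is skipped:

#include <stdio.h>

/* Decode packed bytes from src until dstLen bytes have been produced;
   returns the number of source bytes consumed. */
long UnpackBitsC(const unsigned char *src, unsigned char *dst, long dstLen)
{
    const unsigned char *s = src;
    long produced = 0;

    while (produced < dstLen) {
        signed char flag = (signed char)*s++;
        if (flag >= 0) {                  /* literal run of flag+1 bytes */
            for (int i = 0; i <= flag; i++)
                dst[produced++] = *s++;
        } else if (flag != -128) {        /* repeat next byte 1-flag times */
            unsigned char value = *s++;
            for (int i = 0; i < 1 - (int)flag; i++)
                dst[produced++] = value;
        }                                 /* flag == -128: no operation */
    }
    return (long)(s - src);
}

int main(void)
{
    /* 0xFB = -5: repeat 0xAA six times; 0x01: two literal bytes follow */
    const unsigned char packed[] = {0xFB, 0xAA, 0x01, 0x12, 0x34};
    unsigned char line[8];
    long used = UnpackBitsC(packed, line, 8);

    printf("consumed %ld packed bytes:", used);
    for (int i = 0; i < 8; i++)
        printf(" %02X", line[i]);
    printf("\n");
    return 0;
}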
Reading Sample
PROCEDURE ReadMPFile;
{ This is a small example procedure written in Pascal that demonstrates
  how to read MacPaint files. As a final step, it takes the data that
  was read and displays it on the screen to show that it worked.

  Caveat: This is not intended to be an example of good programming
  practice, in that the possible errors merely cause the program to exit.
  This is VERY uninformative, and there should be some sort of error handler
  to explain what happened. For simplicity, and thus clarity, those types
  of things were deliberately not included. This example will not work
  on a 128K Macintosh, since memory allocation is done too simplistically. }

CONST
    DefaultVolume   = 0;
    HeaderSize      = 512;        { size of MacPaint header in bytes }
    MaxUnPackedSize = 51840;      { maximum MacPaint size in bytes:
                                    720 lines * 72 bytes/line }

VAR
    srcPtr:      Ptr;
    dstPtr:      Ptr;
    saveDstPtr:  Ptr;
    lastDstPtr:  Ptr;
    srcFile:     INTEGER;
    srcSize:     LONGINT;
    errCode:     INTEGER;
    scanLine:    INTEGER;
    aPort:       GrafPort;
    theBitMap:   BitMap;

BEGIN
    errCode := FSOpen('MP TestFile', DefaultVolume, srcFile);  { Open the file. }
    IF errCode <> noErr THEN ExitToShell;

    errCode := SetFPos(srcFile, fsFromStart, HeaderSize);      { Skip the header. }
    IF errCode <> noErr THEN ExitToShell;

    errCode := GetEOF(srcFile, srcSize);     { Find out how big the file is, }
    IF errCode <> noErr THEN ExitToShell;    { and figure out source size. }

    srcSize := srcSize - HeaderSize;         { Remove the header from count. }
    srcPtr := NewPtr(srcSize);               { Make buffer just the right size. }
    IF srcPtr = NIL THEN ExitToShell;

    errCode := FSRead(srcFile, srcSize, srcPtr);  { Read the data into the
                                                    buffer. File marker is
                                                    past the header. }
    IF errCode <> noErr THEN ExitToShell;

    errCode := FSClose(srcFile);             { Close the file we just read. }
    IF errCode <> noErr THEN ExitToShell;

    { Create a buffer that will be used for the Destination BitMap. }
    dstPtr := NewPtrClear(MaxUnPackedSize);  { MPW library routine, see TN 219 }
    IF dstPtr = NIL THEN ExitToShell;
    saveDstPtr := dstPtr;

    { Unpack each scan line into the buffer. Note that 720 scan lines are
      guaranteed to be in the file. (They may be blank lines.) In the
      UnPackBits call, the 72 is the count of bytes done when the file was
      created. MacPaint does one scan line at a time when creating the
      file. The destination pointer is tested each time through the scan
      loop. UnPackBits should increment this pointer by 72, but in the
      case where the packed file is corrupted UnPackBits may end up
      sending bits into uncharted territory. A temporary pointer
      "lastDstPtr" is used for testing the result. }
    FOR scanLine := 1 TO 720 DO BEGIN
        lastDstPtr := dstPtr;
        UnPackBits(srcPtr, dstPtr, 72);      { bumps both pointers }
        IF ORD4(lastDstPtr) + 72 <> ORD4(dstPtr) THEN ExitToShell;
    END;

    { The buffer has been fully unpacked. Create a port that we can draw
      into. You should save and restore the current port. }
    OpenPort(@aPort);

    { Create a BitMap out of our saveDstPtr that can be copied to the
      screen. }
    theBitMap.baseAddr := saveDstPtr;
    theBitMap.rowBytes := 72;                { width of MacPaint picture }
    SetPt(theBitMap.bounds.topLeft, 0, 0);
    SetPt(theBitMap.bounds.botRight, 72*8, 720);  { maximum rectangle }

    { Now use that BitMap and draw the piece of it to the screen.
      Only draw the piece that is full screen size (portRect). }
    CopyBits(theBitMap, aPort.portBits, aPort.portRect,
             aPort.portRect, srcCopy, NIL);

    { We need to dispose of the memory we've allocated. You would not
      dispose of the dstPtr if you wish to edit the data. }
    DisposPtr(srcPtr);                       { dispose of the source buffer }
    DisposPtr(dstPtr);                       { dispose of the destination buffer }
END;
Writing Sample
PROCEDURE WriteMPFile;
{ This is a small example procedure written in Pascal that demonstrates how
  to write MacPaint files. It will use the screen as a handy BitMap to be
  written to a file. }

CONST
    DefaultVolume = 0;
    HeaderSize    = 512;          { size of MacPaint header in bytes }
    MaxFileSize   = 53072;        { maximum MacPaint file size }

VAR
    srcPtr:      Ptr;
    dstPtr:      Ptr;
    dstFile:     INTEGER;
    dstSize:     LONGINT;
    errCode:     INTEGER;
    scanLine:    INTEGER;
    aPort:       GrafPort;
    dstBuffer:   PACKED ARRAY [1..HeaderSize] OF BYTE;
    I:           LONGINT;
    picturePtr:  Ptr;
    theBitMap:   BitMap;

BEGIN
    { Make an empty buffer that is the picture size. }
    picturePtr := NewPtrClear(MaxFileSize);  { MPW library routine, see TN 219 }
    IF picturePtr = NIL THEN ExitToShell;

    { Open a port so we can get to the screen's BitMap easily. You should
      save and restore the current port. }
    OpenPort(@aPort);

    { Create a BitMap out of our picturePtr that the screen can be copied
      into. }
    theBitMap.baseAddr := picturePtr;
    theBitMap.rowBytes := 72;                { width of MacPaint picture }
    SetPt(theBitMap.bounds.topLeft, 0, 0);
    SetPt(theBitMap.bounds.botRight, 72*8, 720);  { maximum rectangle }

    { Draw the screen over into our picture buffer. }
    CopyBits(aPort.portBits, theBitMap, aPort.portRect,
             aPort.portRect, srcCopy, NIL);

    { Create the file, giving it the right Creator and File type. }
    errCode := Create('MP TestFile', DefaultVolume, 'MPNT', 'PNTG');
    IF errCode <> noErr THEN ExitToShell;

    { Open the data file to be written. }
    errCode := FSOpen('MP TestFile', DefaultVolume, dstFile);
    IF errCode <> noErr THEN ExitToShell;

    FOR I := 1 TO HeaderSize DO              { Write the header as all zeros. }
        dstBuffer[I] := 0;
    errCode := FSWrite(dstFile, HeaderSize, @dstBuffer);
    IF errCode <> noErr THEN ExitToShell;

    { Now go into a loop where we pack each line of data into the buffer,
      then write that data to the file. We are using the line count of 72
      in order to make the file readable by MacPaint. Note that
      Pack/UnPackBits can be used for other purposes. }
    srcPtr := theBitMap.baseAddr;            { point at our picture BitMap }
    FOR scanLine := 1 TO 720 DO BEGIN
        dstPtr := @dstBuffer;                { reset the pointer to bottom }
        PackBits(srcPtr, dstPtr, 72);        { bumps both ptrs }
        dstSize := ORD(dstPtr) - ORD(@dstBuffer);  { calc packed size }
        errCode := FSWrite(dstFile, dstSize, @dstBuffer);
        IF errCode <> noErr THEN ExitToShell;
    END;

    errCode := FSClose(dstFile);             { Close the file we just wrote. }
    IF errCode <> noErr THEN ExitToShell;
END;
Further Reference:
• Inside Macintosh, Volume I-135, QuickDraw
• Inside Macintosh, Volume I-465, Toolbox Utilities
• Inside Macintosh, Volume II-77, The File Manager
MacPaint is a registered trademark of Claris Corporation.
MacDraw Format
Every couple of months someone requests this information on info-mac. Attached is the front end to
a MacDraw-to-Imagen translator, in the form of a C language header. Microsoft graphics
programs also output MacDraw format.
/*
 * Description of MacDraw file
 *
 * <MacDraw file>    ::= HeadPacket <Object List> <End Object>
 * <Object List>     ::= <Object List> <Object> | <Object>
 * <Object>          ::= <Complex Object> | <Simple Object>
 * <Complex Object>  ::= <Nest Object> <Object List> <End Object>
 * <Simple Object>   ::= HeadWord <Object Body>
 * <Object Body>     ::= endObject | textObject | gridlineObject |
 *                       lineObject | rectObject | roundrectObject |
 *                       ovalObject | arcObject | freehandObject |
 *                       polyObject | nestObject
 * <Nest Object>     ::= HeadWord nestObject
 * <End Object>      ::= HeadWord endObject
 */
/* integer types */
typedef unsigned char int8;
typedef short int     int16;
typedef long int      int32;
#define NOBJECTS 11

/* packet at head of MacDraw file */
struct HeadPacket
{
    int16 unknown1[85];
    int16 PlotWidth;
    int16 PlotHeight;
    int16 PageWidth;
    int16 PageHeight;
    int16 unknown2[167];
} HeadPacket;
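As a quick usage sketch (my addition, not from the original posting): because MacDraw files come from a big-endian 68K Macintosh, it is safer to read each int16 a byte at a time than to fread whole structs, which is padding- and endian-dependent. The file name below is hypothetical:

#include <stdio.h>

/* big-endian 16-bit read */
int read_be16(FILE *fp)
{
    int hi = getc(fp);
    int lo = getc(fp);
    return (short)((hi << 8) | (lo & 0xFF));
}

int main(void)
{
    FILE *fp = fopen("test.draw", "rb");   /* hypothetical file name */
    if (fp == NULL)
        return 1;

    fseek(fp, 85 * 2, SEEK_SET);           /* skip unknown1[85] */
    printf("PlotWidth  = %d\n", read_be16(fp));
    printf("PlotHeight = %d\n", read_be16(fp));
    printf("PageWidth  = %d\n", read_be16(fp));
    printf("PageHeight = %d\n", read_be16(fp));

    fclose(fp);
    return 0;
}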
/* word at beginning of each graphical object */
struct HeadWord
{
    int8  ObjectType;
    int8  Lock;
    int16 unknown;
} HeadWord;

/* ObjectType values */
#define endObject       0
#define textObject      1
#define gridlineObject  2
#define lineObject      3
#define rectObject      4
#define roundrectObject 5
#define ovalObject      6
#define arcObject       7
#define freehandObject  8
#define polyObject      9
#define nestObject      10
/* Object #11, Paint format bitmaps, is not defined here */

/* Lock values */
#define unlocked 0
#define locked   1

/* end object delimiter */
struct End
{
    int8 LineFat;
    int8 LinePat;
    int8 FillPat;
    int8 unknown;
} End;

/* LineFat values */
#define NFat          6
#define invisibleLine 1
#define thinLine      2
#define mediumLine    3
#define thickLine     4
#define fatLine       5
#define defaultLine   2
/* fatness in rasters */
float FatTable[NFat] = {0.,0.,1.,2.,3.5,5.};
/* LinePat, FillPat values */
#define NPat               37
#define noPat              1
#define whitePat           2
#define blackPat           3
#define darkgrayPat        4
#define medgrayPat         5
#define lightgrayPat       6
#define coarsedotsPat      7
#define dotsPat            8
#define sparsedotsPat      9
#define topshinglePat      10
#define brickPat           11
#define slantbrickPat      12
#define leftdiagPat        13
#define thickleftdiagPat   14
#define dashleftdiagPat    15
#define narrowleftdiagPat  16
#define heavyleftdiagPat   17
#define dualdiagPat        18
#define horzdashpat        19
#define horzlinePat        20
#define circlePat          21
#define fourwayPat         22
#define smallhatchedPat    23
#define smalldiamondPat    24
#define rightdiagPat       25
#define thickrightdiagPat  26
#define dashrightdiagPat   27
#define narrowrightdiagPat 28
#define heavyrightdiagPat  29
#define trianglePat        30
#define vertdashpat        31
#define vertlinePat        32
#define rightshinglePat    33
#define heartPat           34
#define largehatchedPat    35
#define largediamondPat    36

/* pattern masks */
#define MPat 8
unsigned char Pat[NPat][MPat] = {
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,
0xBB,0xEE,0xBB,0xEE,0xBB,0xEE,0xBB,0xEE,
0x55,0xAA,0x55,0xAA,0x55,0xAA,0x55,0xAA,
0x88,0x22,0x88,0x22,0x88,0x22,0x88,0x22,
0x88,0x00,0x22,0x00,0x88,0x00,0x22,0x00,
0x80,0x00,0x08,0x00,0x80,0x00,0x08,0x00,
0x08,0x00,0x00,0x00,0x80,0x00,0x00,0x00,
0x80,0x80,0x41,0x3E,0x08,0x08,0x14,0xE3,
0x08,0x1C,0x22,0x41,0x80,0x01,0x02,0x04,
0xFF,0x80,0x80,0x80,0xFF,0x08,0x08,0x08,
0x01,0x80,0x40,0x20,0x10,0x08,0x04,0x02,
0x81,0xC0,0x60,0x30,0x18,0x0C,0x06,0x03,
0x11,0x88,0x44,0x00,0x11,0x88,0x44,0x00,
0x11,0x88,0x44,0x22,0x11,0x88,0x44,0x22,
0x33,0x99,0xCC,0x66,0x33,0x99,0xCC,0x66,
0x01,0x80,0x40,0x00,0x02,0x04,0x08,0x00,
0x66,0x00,0x00,0x00,0x99,0x00,0x00,0x00,
0xFF,0x00,0x00,0x00,0xFF,0x00,0x00,0x00,
0x50,0x20,0x20,0x20,0x50,0x88,0x27,0x88,
0x84,0x9F,0x80,0x80,0x04,0x04,0xE7,0x84,
0x01,0x01,0x01,0xFF,0x01,0x01,0x01,0xFF,
0x55,0x88,0x55,0x22,0x55,0x88,0x55,0x22,
0x80,0x01,0x02,0x04,0x08,0x10,0x20,0x40,
0xC0,0x81,0x03,0x06,0x0C,0x18,0x30,0x60,
0x88,0x11,0x22,0x00,0x88,0x11,0x22,0x00,
0x88,0x11,0x22,0x44,0x88,0x11,0x22,0x44,
0xCC,0x99,0x33,0x66,0xCC,0x99,0x33,0x66,
0x20,0x50,0x00,0x00,0x02,0x05,0x00,0x00,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x04,0x04,0x40,0x40,0x04,0x04,0x40,0x40,
0x03,0x84,0x48,0x30,0x0C,0x02,0x01,0x01,
0x0A,0x11,0xA0,0x40,0x00,0xB1,0x4A,0x4A,
0x40,0x40,0x40,0xFF,0x40,0x40,0x40,0x40,
0x41,0x22,0x14,0x08,0x14,0x22,0x41,0x80
};
/* text object */
struct Text
{
    int8  LineFat;
    int8  LinePat;
    int8  FillPat;
    int8  unknown1;
    int16 BoxDx;
    int16 BoxDy;
    int8  Style;
    int8  Font;
    int8  Size;
    int8  LineSpace;
    int8  Justify;
    int8  Orient;
    int8  unknown2;
    int8  CharCount;
    int16 Top;
    int16 Left;
    int16 Bottom;
    int16 Right;
    /* plus CharCount bytes */
} Text;

char TextString[256];
/* Style values */
#define plainStyle     0
#define boldStyle      1
#define italicStyle    2
#define underlineStyle 4
#define outlineStyle   8
#define shadowStyle    16
#define defaultStyle   0
/* Font values */
#define ChicagoFont 1
#define GenevaFont  2
#define NewYorkFont 3
#define MonocoFont  4
#define VeniceFont  5
#define LondonFont  6
#define AthensFont  7
#define defaultFont 1
/* Size values */
#define size9     1
#define size10    2
#define size12    3
#define size14    4
#define size18    5
#define size24    6
#define size36    7
#define size48    8
#define NTextSize 9
/* LineSpace values */
#define singleSpace  1
#define halfSpace    2
#define doubleSpace  3
#define defaultSpace 1

/* Justify values */
#define leftJustify    1
#define centerJustify  2
#define rightJustify   3
#define defaultJustify 1
/* Orient values */
#define deg0Orient       0
#define deg90Orient      3
#define deg180Orient     2
#define deg270Orient     1
#define reflect0Orient   4
#define reflect90Orient  6
#define reflect180Orient 5
#define reflect270Orient 7
/* grid line object */
struct GridLine
{
    int8  LineFat;
    int8  LinePat;
    int8  FillPat;
    int8  Arrow;
    int16 y1;
    int16 unknown1;
    int16 x1;
    int16 unknown2;
    int16 y2;
    int16 unknown3;
    int16 x2;
    int16 unknown4;
} GridLine;

/* Arrow values */
#define noArrow      0
#define rightArrow   1
#define leftArrow    2
#define bothArrow    3
#define defaultArrow 0

/* arrow length in rasters */
float ArrowSize[NFat] = {0.,0.,5.,10.,17.,25.};

/* arrow angle in radians */
#define ARROW_ANGLE .5
/* line object */
struct Line
{
    int8  LineFat;
    int8  LinePat;
    int8  FillPat;
    int8  Arrow;
    int16 y1;
    int16 unknown1;
    int16 x1;
    int16 unknown2;
    int16 y2;
    int16 unknown3;
    int16 x2;
    int16 unknown4;
} Line;

/* rectangle object */
struct Rect
{
    int8  LineFat;
    int8  LinePat;
    int8  FillPat;
    int8  Corner;
    int16 Top;
    int16 unknown1;
    int16 Left;
    int16 unknown2;
    int16 Bottom;
    int16 unknown3;
    int16 Right;
    int16 unknown4;
} Rect;

/* Corner values */
#define NCorner       6
#define zeroCorner    0
#define one8Corner    1
#define three16Corner 2
#define one4Corner    3
#define five16Corner  4
#define three8Corner  5
/* radii in inches */
float RadiusTable[NCorner] = {0.,.125,.1875,.25,.3125,.375};
/* rounded rectangle object */
struct RoundRect
{
    int8  LineFat;
    int8  LinePat;
    int8  FillPat;
    int8  Corner;
    int16 Top;
    int16 unknown1;
    int16 Left;
    int16 unknown2;
    int16 Bottom;
    int16 unknown3;
    int16 Right;
    int16 unknown4;
} RoundRect;

/* oval object */
struct Oval
{
    int8  LineFat;
    int8  LinePat;
    int8  FillPat;
    int8  unknown;
    int16 Top;
    int16 unknown1;
    int16 Left;
    int16 unknown2;
    int16 Bottom;
    int16 unknown3;
    int16 Right;
    int16 unknown4;
} Oval;

/* arc object */
struct Arc
{
    int8  LineFat;
    int8  LinePat;
    int8  FillPat;
    int8  unknown;
    int16 Top;
    int16 unknown1;
    int16 Left;
    int16 unknown2;
    int16 Bottom;
    int16 unknown3;
    int16 Right;
    int16 unknown4;
    int16 StartAngle;
    int16 NDegree;
} Arc;

/* point objects */
#define NPoint 256

struct Point
{
    int16 y;
    int16 unknown1;
    int16 x;
    int16 unknown2;
} Point[NPoint];

struct Delta
{
    char dx;
    char dy;
} Delta[NPoint];

/* freehand line object */
struct FreeHand
{
    int8  LineFat;
    int8  LinePat;
    int8  FillPat;
    int8  unknown1;
    int16 unknown2;
    int16 Bytes;
    int16 PointCount;
    int16 Top;
    int16 unknown3;
    int16 Left;
    int16 unknown4;
    int16 Bottom;
    int16 unknown5;
    int16 Right;
    int16 unknown6;
    int16 unknown7;
    int16 y1;
    int16 unknown8;
    int16 x1;
    int16 unknown9;
    /* plus Bytes-28 bytes or Bytes/2-14 dx,dy pairs */
} FreeHand;

/* polygon object */
struct Poly
{
    int8  LineFat;
    int8  LinePat;
    int8  FillPat;
    int8  unknown1;
    int16 unknown2;
    int16 Bytes;
    int16 PointCount;
    int16 unknown3;
    int16 unknown4;
    int16 Top;
    int16 unknown5;
    int16 Left;
    int16 unknown6;
    int16 Bottom;
    int16 unknown7;
    int16 Right;
    /* plus Bytes-20 or PointCount*4 bytes, PointCount x,y pairs */
} Poly;

/* nest object delimiter */
struct Nest
{
    int8  LineFat;
    int8  LinePat;
    int8  FillPat;
    int8  unknown1;
    int16 unknown2;
    int16 ObjectCount;
    int16 unknown3;
    int16 Bytes;
    int16 Top;
    int16 unknown4;
    int16 Left;
    int16 unknown5;
    int16 Bottom;
    int16 unknown6;
    int16 Right;
    int16 unknown7[5];
} Nest;
/* count Object lengths */
#include <stdio.h>
#include "MacDraw.h"

int main()
{
    printf("HeadPacket=%d\n", (int)sizeof(HeadPacket));
    printf("HeadWord=%d\n",   (int)sizeof(HeadWord));
    printf("End=%d\n",        (int)sizeof(End));
    printf("Text=%d\n",       (int)sizeof(Text));
    printf("GridLine=%d\n",   (int)sizeof(GridLine));
    printf("Line=%d\n",       (int)sizeof(Line));
    printf("Rect=%d\n",       (int)sizeof(Rect));
    printf("RoundRect=%d\n",  (int)sizeof(RoundRect));
    printf("Oval=%d\n",       (int)sizeof(Oval));
    printf("Arc=%d\n",        (int)sizeof(Arc));
    printf("FreeHand=%d\n",   (int)sizeof(FreeHand));
    printf("Poly=%d\n",       (int)sizeof(Poly));
    printf("Nest=%d\n",       (int)sizeof(Nest));
    return 0;
}
Standard MIDI File Format
Dustin Caldwell
The standard MIDI file format is a very strange beast. When viewed as a
whole, it can be quite overwhelming. Of course, no matter how you look at it,
describing a piece of music in enough detail to be able to reproduce it
accurately is no small task. So, while complicated, the structure of the midi
file format is fairly intuitive when understood.
I must insert a disclaimer here that I am by no means an expert with
midi nor midi files. I recently obtained a Gravis UltraSound board for my PC,
and upon hearing a few midi files (.MID) thought, "Gee, I'd like to be able to
make my own .MID files." Well, many aggravating hours later, I discovered that
this was no trivial task. But, I couldn't let a stupid file format stop me.
(besides, I once told my wife that computers aren't really that hard to use,
and I'd hate to be a hypocrite) So if any errors are found in this
information, please let me know and I will fix it. Also, this document's scope
does not extend to EVERY type of midi command and EVERY possible file
configuration. It is a basic guide that should enable the reader (with a
moderate investment in time) to generate a quality midi file.
1. Overview
A midi (.MID) file contains basically 2 things, Header chunks and Track
chunks. Section 2 explains the header chunks, and Section 3 explains the track
chunks. A midi file contains ONE header chunk describing the file format,
etc., and any number of track chunks. A track may be thought of in the same
way as a track on a multi-track tape deck. You may assign one track to each
voice, each staff, each instrument or whatever you want.
2. Header Chunk
The header chunk appears at the beginning of the file, and describes the
file in three ways. The header chunk always looks like:
4D 54 68 64 00 00 00 06 ff ff nn nn dd dd
The ascii equivalent of the first 4 bytes is MThd. After MThd comes the 4-byte
size of the header. This will always be 00 00 00 06, because the actual header
information will always be 6 bytes.
ff ff is the file format. There are 3 formats:
0 - single-track
1 - multiple tracks, synchronous
2 - multiple tracks, asynchronous
Single track is fairly self-explanatory - one track only. Synchronous multiple
tracks means that the tracks will all be vertically synchronous, or in other
words, they all start at the same time, and so can represent different parts
in one song. Asynchronous multiple tracks do not necessarily start at the same
time, and can be completely asynchronous.
nn nn is the number of tracks in the midi file.
dd dd is the number of delta-time ticks per quarter note. (More about this
later)
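For example (my illustration, with assumed values), a format-0 file containing one track at 96 ticks per quarter note would begin with these 14 bytes, shown here as a C array:

/* the header chunk: MThd, length 6, format 0, 1 track, 96 ticks/quarter */
unsigned char mthd[14] = {
    0x4D, 0x54, 0x68, 0x64,   /* "MThd" */
    0x00, 0x00, 0x00, 0x06,   /* header length, always 6 */
    0x00, 0x00,               /* ff ff = 0: single-track format */
    0x00, 0x01,               /* nn nn = 1: one track chunk follows */
    0x00, 0x60                /* dd dd = 96 ticks per quarter note */
};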
3. Track Chunks
The remainder of the file after the header chunk consists of track chunks.
Each track has one header and may contain as many midi commands as you like.
The header for a track is very similar to the one for the file:
4D 54 72 6B xx xx xx xx
As with the header, the first 4 bytes has an ascii equivalent. This one is
MTrk. The 4 bytes after MTrk give the length of the track (not including the
track header) in bytes.
Following the header are midi events. These events are identical to the
actual data sent and received by MIDI ports on a synth with one addition. A
midi event is preceded by a delta-time. A delta time is the number of ticks
after which the midi event is to be executed. The number of ticks per quarter
note was defined previously in the file header chunk. This delta-time is a
variable-length encoded value. This format, while confusing, allows large
numbers to use as many bytes as they need, without requiring small numbers to
waste bytes by filling with zeros. The number is converted into 7-bit bytes,
and the most-significant bit of each byte is 1 except for the last byte of the
number, which has a msb of 0. This allows the number to be read one byte at a
time, and when you see a msb of 0, you know that it was the last (least
significant) byte of the number. According to the MIDI spec, the entire
delta-time should be at most 4 bytes long.
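Here is a small C sketch (mine, following the slicing rule just described) that encodes a delta-time as a variable-length value; it turns 0x00 into 00, 0x80 into 81 00, and 0x2000 into C0 00:

#include <stdio.h>

/* encode value into out[]; returns the byte count (at most 4 per the spec) */
int write_var_len(unsigned long value, unsigned char *out)
{
    unsigned long buffer = value & 0x7F;
    int n = 0;

    while ((value >>= 7) > 0) {            /* gather 7-bit groups */
        buffer <<= 8;
        buffer |= (value & 0x7F) | 0x80;   /* msb set on all but last byte */
    }
    for (;;) {                             /* emit most significant first */
        out[n++] = (unsigned char)(buffer & 0xFF);
        if (buffer & 0x80)
            buffer >>= 8;
        else
            break;
    }
    return n;
}

int main(void)
{
    unsigned long tests[] = {0x00UL, 0x7FUL, 0x80UL, 0x2000UL};
    unsigned char buf[4];

    for (int i = 0; i < 4; i++) {
        int n = write_var_len(tests[i], buf);
        printf("%06lX ->", tests[i]);
        for (int j = 0; j < n; j++)
            printf(" %02X", buf[j]);
        printf("\n");
    }
    return 0;
}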
Following the delta-time is a midi event. Each midi event (except a
running midi event) has a command byte which will always have a msb of 1 (the
value will be >= 128). A list of most of these commands is in appendix A. Each
command has different parameters and lengths, but the data that follows the
command will have a msb of 0 (less than 128). The exception to this is a
meta-event, which may contain data with a msb of 1. However, meta-events
require a length parameter which alleviates confusion.
One subtlety which can cause confusion is running mode. This is where
the actual midi command is omitted, and the last midi command issued is
assumed. This means that the midi event will consist of a delta-time and the
parameters that would go to the command if it were included.
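A hypothetical fragment (my illustration) makes this concrete: two note-on events on channel 0, the second in running mode with its command byte omitted:

/* delta-times of zero; note numbers and velocities chosen arbitrarily */
unsigned char fragment[] = {
    0x00, 0x90, 0x3C, 0x40,   /* delta=0, note on ch.0, note 60, vel. 64 */
    0x00, 0x3E, 0x40          /* delta=0, running mode: note 62, vel. 64 */
};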
4. Conclusion
If this explanation has only served to confuse the issue more, the
appendices contain examples which may help clarify the issue. Also, 2
utilities and a graphic file should have been included with this document:
DEC.EXE - This utility converts a binary file (like .MID) to a tab-delimited
text file containing the decimal equivalents of each byte.
REC.EXE - This utility converts a tab-delimited text file of decimal values
into a binary file in which each byte corresponds to one of the decimal
values.
MIDINOTE.PS - This is the postscript form of a page showing note numbers with
a keyboard and with the standard grand staff.
Appendix A
1. MIDI Event Commands
Each command byte has 2 parts. The left nybble (4 bits) contains the actual
command, and the right nybble contains the midi channel number on which the
command will be executed. There are 16 midi channels, and 8 midi commands (the
command nybble must have a msb of 1).
In the following table, x indicates the midi channel number. Note that all
data bytes will be <128 (msb set to 0).
Hex  Binary    Data   Description
-----------------------------------------------------------------
8x   1000xxxx  nn vv  Note off (key is released)
                      nn=note number
                      vv=velocity

9x   1001xxxx  nn vv  Note on (key is pressed)
                      nn=note number
                      vv=velocity

Ax   1010xxxx  nn vv  Key after-touch
                      nn=note number
                      vv=velocity

Bx   1011xxxx  cc vv  Control Change
                      cc=controller number
                      vv=new value

Cx   1100xxxx  pp     Program (patch) change
                      pp=new program number

Dx   1101xxxx  cc     Channel after-touch
                      cc=channel number

Ex   1110xxxx  bb tt  Pitch wheel change (2000H is normal or no change)
                      bb=bottom (least sig) 7 bits of value
                      tt=top (most sig) 7 bits of value
The following table lists meta-events which have no midi channel number. They
are of the format:
FF xx nn dd
All meta-events start with FF followed by the command (xx), the length, or
number of bytes that will contain data (nn), and the actual data (dd).
Hex  Binary    Data           Description
-----------------------------------------------------------------
00   00000000  nn ssss        Sets the track's sequence number.
                              nn=02 (length of 2-byte sequence number)
                              ssss=sequence number

01   00000001  nn tt ..       Text event - any text you want.
                              nn=length in bytes of text
                              tt=text characters

02   00000010  nn tt ..       Same as text event, but used for
                              copyright info.
                              nn tt=same as text event

03   00000011  nn tt ..       Sequence or Track name
                              nn tt=same as text event

04   00000100  nn tt ..       Track instrument name
                              nn tt=same as text event

05   00000101  nn tt ..       Lyric
                              nn tt=same as text event

06   00000110  nn tt ..       Marker
                              nn tt=same as text event

07   00000111  nn tt ..       Cue point
                              nn tt=same as text event

2F   00101111  00             This event must come at the end of each
                              track

51   01010001  03 tttttt      Set tempo
                              tttttt=microseconds/quarter note

58   01011000  04 nn dd ccbb  Time Signature
                              nn=numerator of time sig.
                              dd=denominator of time sig. 2=quarter,
                              3=eighth, etc.
                              cc=number of ticks in metronome click
                              bb=number of 32nd notes to the quarter
                              note

59   01011001  02 sf mi       Key signature
                              sf=sharps/flats (-7=7 flats, 0=key of C,
                              7=7 sharps)
                              mi=major/minor (0=major, 1=minor)

7F   01111111  xx dd ..       Sequencer specific information
                              xx=number of bytes to be sent
                              dd=data
The following table lists system messages which control the entire system.
These have no midi channel number. (these will generally only apply to
controlling a midi keyboard, etc.)
Hex  Binary    Data  Description
-----------------------------------------------------------------
F8   11111000        Timing clock used when synchronization is
                     required.

FA   11111010        Start current sequence

FB   11111011        Continue a stopped sequence where left off

FC   11111100        Stop a sequence
The following table lists the numbers corresponding to notes for use in note
on and note off commands.
Octave ||                         Note Numbers
  #    ||  C  | C#  |  D  | D#  |  E  |  F  | F#  |  G  | G#  |  A  | A#  |  B
-------------------------------------------------------------------------------
  0    ||   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |  10 |  11
  1    ||  12 |  13 |  14 |  15 |  16 |  17 |  18 |  19 |  20 |  21 |  22 |  23
  2    ||  24 |  25 |  26 |  27 |  28 |  29 |  30 |  31 |  32 |  33 |  34 |  35
  3    ||  36 |  37 |  38 |  39 |  40 |  41 |  42 |  43 |  44 |  45 |  46 |  47
  4    ||  48 |  49 |  50 |  51 |  52 |  53 |  54 |  55 |  56 |  57 |  58 |  59
  5    ||  60 |  61 |  62 |  63 |  64 |  65 |  66 |  67 |  68 |  69 |  70 |  71
  6    ||  72 |  73 |  74 |  75 |  76 |  77 |  78 |  79 |  80 |  81 |  82 |  83
  7    ||  84 |  85 |  86 |  87 |  88 |  89 |  90 |  91 |  92 |  93 |  94 |  95
  8    ||  96 |  97 |  98 |  99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107
  9    || 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119
 10    || 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 |
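Since the table is simply note number = octave*12 + note, a small sketch
(names ours) recovers the octave and note name from any note number:

#include <stdio.h>

static const char *note_name[12] = { "C", "C#", "D", "D#", "E", "F",
                                     "F#", "G", "G#", "A", "A#", "B" };

/* Print a MIDI note number (0-127) as name plus octave, per the table. */
void print_note(int nn)
{
    printf("%s%d\n", note_name[nn % 12], nn / 12);   /* e.g. 60 -> "C5" */
}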
BIBLIOGRAPHY
"MIDI Systems and Control" Francis Rumsey
1990 Focal Press
"MIDI and Sound Book for the Atari ST" Bernd Enders and Wolfgang Klemme
1989 M&T Publishing, Inc.
MIDI file specs and general MIDI specs were also obtained by sending e-mail
to [email protected] with the phrase GET MIDISPEC PACKAGE
in the message.
------------------------------- DEC.CPP ------------------------------------
/*
 * file:  dec.cpp
 * by:    Dustin Caldwell ([email protected])
 */

#include <stdio.h>
#include <stdlib.h>

void helpdoc(void);

int main(int argc, char *argv[])
{
    FILE *fp;
    int ch, c;

    if (argc < 2 || (fp = fopen(argv[1], "rb")) == NULL)  /* open file to read */
    {
        printf("cannot open file %s\n", argc > 1 ? argv[1] : "(none)");
        helpdoc();
        exit(-1);
    }

    c = 0;
    ch = fgetc(fp);
    while (!feof(fp))                 /* loop for whole file */
    {
        printf("%d\t", ch);           /* print every byte's decimal equiv. */
        c++;
        if (c >= 8)                   /* print 8 numbers to a line */
        {
            c = 0;
            printf("\n");
        }
        ch = fgetc(fp);
    }
    fclose(fp);                       /* close up */
    return 0;
}

void helpdoc(void)                    /* print help message */
{
    printf("\n          Binary File Decoder\n\n");
    printf("\n Syntax:  dec binary_file_name\n\n");
    printf("by Dustin Caldwell ([email protected])\n\n");
    printf("This is a filter program that reads a binary file\n");
    printf("and prints the decimal equivalent of each byte\n");
    printf("tab-separated. This is mostly useful when piped\n");
    printf("into another file to be edited manually. eg:\n\n");
    printf("c:\\>dec sonata3.mid > son3.txt\n\n");
    printf("This will create a file called son3.txt which can\n");
    printf("be edited with any ascii editor.\n\n");
    printf("(rec.exe may also be useful, as it reencodes the\n");
    printf("ascii text file).\n\n");
    printf("Have Fun!!\n");
}
---------------------------- REC.CPP ---------------------------------
/*
 * File:  rec.cpp
 * by:    Dustin Caldwell ([email protected])
 */

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

void helpdoc(void);

int main(int argc, char *argv[])
{
    FILE *rfp, *wfp;
    int ch, c;
    char s[20];

    if (argc < 3)
    {
        helpdoc();
        exit(-1);
    }
    if ((rfp = fopen(argv[1], "r")) == NULL)     /* open the read file */
    {
        printf("cannot open file %s \n", argv[1]);
        helpdoc();
        exit(-1);
    }
    if ((wfp = fopen(argv[2], "wb")) == NULL)    /* open the write file */
    {
        printf("cannot open file %s \n", argv[2]);
        helpdoc();
        exit(-1);
    }

    c = 0;
    ch = fgetc(rfp);
    while (!feof(rfp))                /* loop for whole file */
    {
        if (isdigit(ch))              /* only use decimal digits (0-9) */
        {
            c = 0;
            while (isdigit(ch))       /* build a string containing the number */
            {
                s[c] = (char)ch;
                c++;
                ch = fgetc(rfp);
            }
            s[c] = '\0';              /* must have NUL terminator */
            fputc(atoi(s), wfp);      /* write the binary equivalent to file */
        }
        ch = fgetc(rfp);              /* loop until next number starts */
    }
    fclose(rfp);
    fclose(wfp);                      /* close up */
    return 0;
}

void helpdoc(void)                    /* print help message */
{
    printf("\n          Text File Encoder\n\n");
    printf("\n Syntax:  rec text_file_name binary_file_name\n\n");
    printf("by Dustin Caldwell ([email protected])\n\n");
    printf("This is a program that reads an ascii tab-\n");
    printf("delimited file and builds a binary file where\n");
    printf("each byte of the binary file is one of the decimal\n");
    printf("digits in the text file.\n");
    printf(" eg:\n\n");
    printf("c:\\>rec son3.txt son3.mid\n\n");
    printf("(This will create a file called son3.mid which is\n");
    printf("a valid binary file)\n\n");
    printf("(dec.exe may also be useful, as it decodes binary files)\n\n");
    printf("Have Fun!!\n");
}
From: Wilson Woo <[email protected]>
To: [email protected]
Subject: MPEG Video
THIS TEXT CONTAINS ONLY MPEG VIDEO HEADER INFO - BY WILSON WOO
It's only what I know. Please feel free to update it.
Below is information obtained from someone.
/*****************************************************************/
Sequence Header
This contains information related to one or more "group-of-pictures".
Size        Data                       Details
===================================================================
4 bytes     Sequence header code       In Hex 000001B3
12 bits     Horizontal size            In pixels
12 bits     Vertical size              In pixels
4 bits      Pel aspect ratio           See below
4 bits      Picture rate               See below
18 bits     Bit rate                   In units of 400 bits/sec
1 bit       Marker bit                 Always 1
10 bits     VBV buffer size            Minimum buffer needed to decode this
                                       sequence of pictures; in 16 kbit units
1 bit       Constrained parameter flag
1 bit       Load intra quantizer       0: false; 1: true (matrix follows)
            matrix
64 bytes    Intra quantizer matrix     Optional
1 bit       Load nonintra quantizer    0: false; 1: true (matrix follows)
            matrix
64 bytes    Nonintra quantizer matrix  Optional
            Sequence extension data    Optional
            User data                  Optional application-dependent data
===================================================================
Aspect ratios are defined by a code which represents the height and
width of the video image.
Picture rates are also defined by a code that represents the number
of pictures that may be displayed each second.
Each group of pictures has a header that contains one "I picture"
and zero or more B and P pictures. The header is concerned with
the time synchronisation for the first picture in this group, and
the closeness of the previous group to this one.
/*****************************************************************/
For picture rate:
1 = 23.976 frames/sec
2 = 24
3 = 25
4 = 29.97
5 = 30
6 = 50
7 = 59.94
8 = 60
Here is an example from the first video frame of TEST.MPG from XingMPEG.

[Hex dump of the first 256 bytes omitted]

Sequence header  = (Hex) 00 00 01 B3
Horizontal size  = 0x160 = 352
Vertical size    = 0x0F0 = 240
Pel aspect ratio = [I don't know]
Picture rate     = 4 = 29.97 frames/sec
Marker bit       = 1
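A minimal C sketch of extracting these fields from the bytes following
the 000001B3 start code; the aspect/rate byte 0x14 below is hypothetical
(the example leaves the aspect ratio code unknown), while the sizes and
rate code match the example above:

#include <stdio.h>

int main(void)
{
    /* Start code plus the next 4 header bytes; 0x14 is an assumed value. */
    unsigned char h[] = { 0x00, 0x00, 0x01, 0xB3, 0x16, 0x00, 0xF0, 0x14 };

    int horizontal = (h[4] << 4) | (h[5] >> 4);    /* 12 bits: 0x160 = 352 */
    int vertical   = ((h[5] & 0x0F) << 8) | h[6];  /* 12 bits: 0x0F0 = 240 */
    int aspect     = h[7] >> 4;                    /* 4-bit aspect ratio code */
    int rate       = h[7] & 0x0F;                  /* 4-bit rate code: 4 = 29.97 */

    printf("%d x %d, aspect code %d, rate code %d\n",
           horizontal, vertical, aspect, rate);
    return 0;
}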
MPEG-2 Technical (and sometimes political) Frequently Asked Questions
(FAQ) list.
Copyright 1994 by Chad Fogg ([email protected])
Draft 3.3 (May 10, 1994)
1. MPEG is a DCT based scheme, right?
2. What does the MPEG video syntax feature that codes video efficiently?
3. What does the syntax provide for error robustness?
4. What is the significance of each layer in MPEG video ?
5. How does the syntax facilitate parallelism?
6. I hear the encoder is not part of the standard?
7. Are some encoders better than others?
8. Can MPEG-1 encode higher sample rates than 352 x 240 x 30 Hz ?
9. What are Constrained Parameters Bitstreams (CPB) for video?
10. Why is Constrained Parameters so important?
11. Who uses constrained parameters bitstreams?
12. Are there ways of circumventing constrained parameters bitstreams for SIF
class applications and decoders ?
13. Are there any other conformance points like CPB for MPEG-1?
14. What frame rates are permitted in MPEG?
15. Special prediction switches for MPEG-2
16. What is MPEG-2 Video Main Profile and Main Level?
17. Does anybody actually use the scalability modes?
18. What's the difference between Field and Frame pictures?
19. What do B-pictures buy you?
20. Why do some people hate B-frames?
21. Why was the 16x16 area chosen?
22. Why was the 8x8 DCT size chosen?
23. What is motion compensated prediction, and why is it a pain?
24. What are the various prediction modes in MPEG-2?
24.1 Frame:
24.2 Field predictions in frame-coded pictures:
24.3 Field predictions in field-coded pictures:
24.4 16x8 predictions in field-coded pictures:
24.5 Dual Prime prediction in frame and field-coded pictures
24.6 Field and frame organized macroblocks:
25. How do you tell a MPEG-1 bitstream from a MPEG-2 bitstream?
26. What is the reasoning behind MPEG syntax symbols?
27. Why bother to research compressed video when there is a standard?
28. Where can I get a copy of the latest MPEG-2 draft?
29. What are the latest working drafts of MPEG-2 ?
30. What is the latest version of the MPEG-1 documents?
31. What is the evolution of ISO standard documents?
32. Where is a good introductory paper to MPEG?
33. What are some journals on related MPEG topics ?
34. Is there a book on MPEG video?
35. Is it MPEG-2 (Arabic numbers) or MPEG-II (roman)?
36. What happened to MPEG-3?
37. What is MPEG-4?
38. What are the scaleable modes of MPEG-2?
39. Why MPEG-2? Wasn't MPEG-1 enough?
40. What did MPEG-2 add to MPEG-1 in terms of syntax/algorithms ?
41. How do MPEG and JPEG differ?
42. How do MPEG and H.261 differ?
43. Is H.261 the de facto teleconferencing standard?
44. What is the TM rate control and adaptive quantization technique ?
45. How does the TM work?
46. What is a good motion estimation method, then?
47. Is exhaustive search "optimal" ?
48. What are some advanced encoding methods?
49. Is so-and-so really MPEG compliant ?
50. What are the tell-tale MPEG artifacts?
51. Where are the weak points of MPEG video ?
52. What are some myths about MPEG?
53. What is the color space of MPEG?
54. Don't you mean 4:1:1 ?
55. Why did MPEG choose 4:2:0 ? Isn't 4:2:2 the standard for TV?
56. What is the precision of MPEG samples?
57. What is all the fuss with cositing of chroma components?
58. How would you explain MPEG to the data compression expert?
59. How does MPEG video really compare to TV, VHS, laserdisc ?
60. What are the typical MPEG-2 bitrates and picture quality?
61. At what bitrates is MPEG-2 video optimal?
62. Why does film perform so well with MPEG ?
63. What is the best compression ratio for MPEG ?
64. Can MPEG be used to code still frames?
65. Is there an MPEG file format?
66. What are some pre-processing enhancements ?
67. Why use these "advanced" pre-filtering techniques?
68. What about post-processing enhancements?
69. Can motion vectors be used to measure object velocity?
70. How do you code interlaced video with MPEG-1 syntax?
71. Is MPEG patented?
72. How many cable box alliances are there?
73. Will there be an MPEG video tape format?
74. Where will we see MPEG in everyday life?
75. What is the best compression ratio for MPEG ?
76. Is there a MPEG CD-ROM format?
1. MPEG is a DCT based scheme, right?
The DCT and Huffman algorithms receive the most press coverage (e.g. "MPEG
is a DCT based scheme with Huffman coding"), but are in fact fairly
insignificant. The variety of coding modes signaled to the decoder as
context-dependent side information is chiefly responsible for the efficiency
of the MPEG syntax.
2. What does the MPEG video syntax feature that codes video efficiently?
A. Here are some of the statistical conditions and their syntax counterparts:

Occlusion: forward, backwards, or bi-directional temporal prediction in B
pictures.

Smooth optical flow fields: variable length coding of 1-D prediction errors
for motion vectors.

Spatial correlation beyond 8x8 sample block boundaries: 1-D prediction of
DC coefficients in consecutive intra-coded macroblocks.

High temporal correlation: variable on/off coding of prediction error at
the macroblock (no-coding) or individual block (coded block pattern) level.

Temporal de-correlation: forward, backwards, or bidirectional prediction.

Content dependent quality: locally adaptive quantization.

Temporal prediction accuracy: "half-pel" sample accuracy.

High locally correlated signal in refresh pictures (I pictures) and
prediction errors: DCT.

Subjective coding: location-dependent quantization of DCT coefficients.
3. What does the syntax provide for error robustness?
1. Byte-aligned start codes in the coded bitstream.
2. End of block codes in coded blocks.
3. Slices.
4. slice_vertical_position embedded as sub-field within slice start codes.
5. Slices commencing at regular locations in picture (MPEG-2).
4. What is the significance of each layer in MPEG video ?
Sequence:
Set of pictures sharing the same sampling dimensions, bit rate,
chromaticity (MPEG-1), and quantization matrices (MPEG-1 only).

Group of Pictures:
Random access point giving SMPTE time code within the sequence.
Guaranteed to start with an I picture.

Picture:
Samples of a common plane -- "captured" from the same time instant.

Slice:
Error resynchronization unit of macroblocks. At the commencement of a
slice, all inter-macroblock coding dependencies are reset. Likewise, all
macroblocks within a common slice can be dependently coded.

Macroblock:
Least common multiple of Y, Cb, Cr 8x8 blocks in the 4:2:0 sampling
structure. For MPEG-1, the smallest granularity of temporal prediction.

Block:
Smallest granularity of spatial decorrelation.
5. How does the syntax facilitate parallelism?
A. For MPEG-1, slices may consist of an arbitrary number of macroblocks.
The coded bitstream must first be mapped into fixed-length elements before
true parallelism in a decoder application can be exploited. Further, since
macroblocks have coding dependencies on previous macroblocks within the same
slice, the data hierarchy must be pre-processed down to the layer of DC DCT
coefficients. After this, blocks may be independently inverse transformed
and quantized, temporally predicted, and reconstructed to buffer memory.
Parallelism is usually more of a concern for encoders. Macroblock motion
estimation and some rate control stages can be processed independently. An
encoder also has the freedom to choose the slice structure.
6. I hear the encoder is not part of the standard?
A. The encoder rests just outside the normative scope of the standard,
as long as the bitstreams it produces are compliant. The decoder,
however, is almost deterministic: a given bitstream should reconstruct
to a unique set of pictures. Statistically speaking, an occasional
Least Significant Bit error is permitted, because the IDCT function is
the only non-normative stage in the decoder (the designer is free to
choose among many DCT algorithms and implementations). The IEEE 1180
test referenced in Annex A of the MPEG-1 and MPEG-2 specifications
spells out the statistical mismatch tolerance between the Reference
IDCT, which uses 64-bit floating point accuracy, and the Test IDCT.
7. Are some encoders better than others?
A. Yes. For example, the range over which a motion-compensated
prediction macroblock is searched has a great influence on final
picture quality. At a certain point a very large range can actually
become detrimental (it may encourage large differential motion
vectors). Practical ranges are usually between +/- 15 and +/- 32. As
the range doubles, for instance, the search area quadruples.
8. Can MPEG-1 encode higher sample rates than 352 x 240 x 30 Hz ?
A. Yes. The MPEG-1 syntax permits sampling dimensions as high as 4095 x
4095 x 60 frames per second.
The MPEG most people think of as "MPEG-1" is actually a kind of subset known
as the Constrained Parameters bitstream (CPB).
9. What are Constrained Parameters Bitstreams (CPB) for video?
A. MPEG-1 CPB are a limited set of sampling and bitrate parameters
designed to normalize decoder computational complexity, buffer size, and
memory bandwidth while still addressing the widest possible range of
applications. The parameter limits were intentionally designed so that a
decoder implementation would need only 4 Megabits of DRAM.
Parameter       Limit
-------------   ---------------------------
pixels/line     704
lines/picture   480 or 576
pixels*lines    352*240 or 352*288
picture rate    30 Hz
bit rate        1.862 million bits/sec
buffer size     40 Kilobytes (327,680 bits)
The sampling limits of CPB are bounded at the ever popular SIF rate:
396 macroblocks (101,376 pixels) per picture if the picture rate is
less than or equal to 25 Hz, and 330 macroblocks (84,480 pixels) per
picture if the picture rate is 30 Hz. The MPEG nomenclature loosely
defines a "pixel" or "pel" as a unit vector containing a complete
luminance sample and one fractional (0.25 in 4:2:0 format) sample from
each of the two chrominance (Cb and Cr) channels. Thus, the
corresponding bandwidth figure can be computed as:
352 samples/line x 240 lines/picture x 30 pictures/sec x 1.5 samples/pixel
or 3.8 Ms/s (million samples/sec) including chroma, but not including
blanking intervals. Since most decoders are capable of sustaining
VLC decoding at a faster rate than 1.8 Mbit/sec, the coded video bitrate
has become the most often waived parameter of CPB. An encoder which
intelligently employs the syntax tools should achieve SIF quality saturation
at about 2 Mbit/sec, whereas an encoder producing streams containing
only I (Intra) pictures might require as much as 4 Mbit/sec to achieve the
same video quality.
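A small sketch of checking a parameter set against the table and the
macroblock bound above (the function name and the rounding up to whole
macroblocks are ours):

/* Return 1 if the parameters fit MPEG-1 Constrained Parameters, else 0. */
int is_constrained(int width, int height,
                   double pics_per_sec, double bits_per_sec)
{
    int mbs = ((width + 15) / 16) * ((height + 15) / 16); /* macroblocks/picture */

    if (width > 704 || height > 576 || pics_per_sec > 30.0)
        return 0;
    if (mbs > (pics_per_sec <= 25.0 ? 396 : 330))         /* SIF bound */
        return 0;
    if (bits_per_sec > 1862000.0)                         /* often waived */
        return 0;
    return 1;
}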
10. Why is Constrained Parameters so important?
A. It is an optimum point that allows (just barely) cost effective VLSI
implementations in 1992 technology (0.8 microns). It also implies a
nominal guarantee of interoperability for decoders and encoders. Since
CPB is a canonical conformance point, MPEG devices which are not capable
of meeting SIF rates are usually not considered to be true MPEG.
11. Who uses constrained parameters bitstreams?
A. Applications which are focused on CPB are Compact Disc (White Book or
CD-I) and computer video applications. Set-top TV decoders fall into a higher
sampling rate category known as CCIR 601 or Broadcast rate.
12. Are there ways of circumventing constrained parameters bitstreams for SIF
class applications and decoders ?
A. Yes, some. Remember that CPB limits pictures by macroblock count.
416 x 240 x 24 Hz sampling rates are still within the constraints, but this
would only be of benefit in NTSC (240 lines/field) displays. Deviating from
352 samples/line could throw off many decoder implementations which possess
limited horizontal sample rate conversion abilities. Some decoders do in fact
include a few rate conversion modes, with a filter usually implemented via
binary taps (shifts and adds). Likewise, the target sample rates are usually
limited to ratios (e.g. 640, 540, 480 pixels/line, etc.). Future MPEG
decoders will likely include on-chip arbitrary sample rate converters,
perhaps capable of operating in the vertical direction (although there is
little need of this in applications using standard TV monitors, with the
possible exception of windowing in cable box graphical user interfaces).
13. Are there any other conformance points like CPB for MPEG-1?
A. Undocumented ones, yes. A second generation of decoder chips emerged on
the market about 1 year after the first wave of SIF-class decoders. Both
LSI Logic and SGS-Thomson introduced CCIR 601 class MPEG-1 decoders to fill
in the gap between canonical MPEG-1 and the emergence of MPEG-2. Under
non-disclosure agreement, C-Cube had the CL-950.
14. What frame rates are permitted in MPEG?
A. A limited set is available for the choosing in MPEG-1, although "tricks"
could be played with Systems-layer Time Stamps to convey non-standard rates.
The set is: 23.976 Hz (3-2 pulldown NTSC), 24 Hz (Film), 25 Hz (PAL/SECAM or
625/50 video), 29.97 Hz (NTSC), 30 Hz (drop-frame NTSC or component 525/60),
50 Hz (double-rate PAL), 59.94 Hz (double-rate NTSC), and 60 Hz (double-rate
drop-frame NTSC/component 525/60 video).
15. Special prediction switches for MPEG-2
                        MPEG-2 sequence
                       /               \
          progressive sequence     interlaced sequence
                                    /              \
                           Field picture       Frame picture
                                                /           \
                                 Frame or field pred.   Frame MB prediction only
                                      /        \
                                 Field dct   Frame dct
16. What is MPEG-2 Video Main Profile and Main Level?
A. MPEG-2 Video Main Profile and Main Level is analogous to MPEG-1's CPB,
with sampling limits at CCIR 601 parameters (720 x 480 x 30 Hz). Profiles
limit syntax (i.e. algorithms), whereas Levels place limits on coding
parameters (sample rates, frame dimensions, coded bitrates, etc.). Together,
Video Main Profile and Main Level (abbreviated as MP@ML) normalize complexity
within feasible limits of 1994 VLSI technology (0.5 micron), yet still meet
the needs of the majority of applications. MP@ML is the conformance point for
most cable and satellite systems.
Profiles
======
Simple: I and P pictures only. 4:2:0 sampling ratio. 8, 9, or 10 bits DC
precision.
Main: I, P, and B pictures. Dual Prime with no B-pictures only. 4:2:0
sampling ratio. 8, 9, or 10 bits sample precision.
SNR profile:
Spatial profile:
High: 8, 9, 10, or 11 bits sample precision. 4:2:2 and 4:4:4 sampling ratio.
Level
====
Simple: SIF video rate (3.041280 MHz), 4 Mbit/sec, 0.489472 Mbit VBV
buffer, 64 vertical in frame, 32 vertical in field, 1:7 f_code hor.
Main: CCIR 601 video rate (10.368 MHz), 15 Mbit/sec, 1.835008 Mbit VBV
buffer, 128 V in frame, 64 V in field, 1:8 f_code hor.
High 1440: 1440 x 1152 x 30 Hz (47.0016 MHz), 60 Mbit/sec, 7.340032 Mbit
VBV buffer, 128 V in frame, 1:9 f_code hor.
High: 1920 x 1152 x 30 Hz (62.6688 MHz), 80 Mbit/sec, 9.787392 Mbit VBV
buffer, 1:9 f_code hor.
17. Does anybody actually use the scalability modes?
A. At this time, scalability has found itself a limited number of
applications, although research is definitely underway for its use in HDTV.
Experiments have been demonstrated in Europe where, for example, PAL-rate
video (720 x 576 x 25 fps) is embedded in the same stream as HDTV rate video
(1440 x 1152 x 25 fps). The Nov. 1992 VADIS experiment divided the base layer
(PAL) and enhancement into 4 and 16 Mbit/sec channels, respectively. The U.S.
Grand Alliance favors HDTV simulcasting (separate NTSC analog and digital
HDTV broadcasts). Temporal scalability is the pet scalability mode as the
possible future solution for coding 60 Hz progressive sequences while
maintaining backwards compatibility with early-wave equipment (e.g. 1920 x
1080 x 30 Hz displays). To elaborate, the first wave receivers of the late
1990's would be limited to 60 Hz interlaced/30 Hz progressive HDTV
decoders. Essentially, 60 interlaced fields would be coded in a, for
example, 16 Mbit/sec stream in 1996, and when VLSI processes shift another
thousand or so angstroms down the wavelength scale, an 8 Mbit/sec enhancement
layer containing the coded "high pass" between 60 Hz progressive and 60 Hz
interlaced would be simulcasted or multiplexed. Several corporate mouths
have been known to water at the mention of charging the quality conscious
subscriber an extra fee for the enhancement layer.
18. What's the difference between Field and Frame pictures?
A. A frame-coded picture consists of samples from both even and odd fields.
A frame picture is coded in progressive order (an even line, then an odd
line, etc.) and in the case of MPEG-2, may optionally switch between field
and frame order on a macroblock basis. The Display Process, which is *almost*
completely outside the scope of the MPEG specification, can choose to
re-interlace the picture by displaying the odd and even lines at different
times (16 milliseconds apart for 60 Hz displays). In fact, most pictures,
regardless of whether they were coded as a Field or Frame, end up being
displayed interlaced due to the fact that most TV sets are interlaced.
19. What do B-pictures buy you?
A. Since bi-directional macroblock predictions are an average of two
macroblock areas, noise is reduced at low bit rates (like a 3-D filter, if
you will). At nominal MPEG-1 video (352 x 240 x 30, 1.15 Mbit/sec) rates, it
is said that B-frames improve SNR by as much as 2 dB (a 0.5 dB gain is
usually considered worthwhile in MPEG). However, at higher bit rates,
B-frames become less useful since they inherently do not contribute to the
progressive refinement of an image sequence (i.e. they are not used as
prediction by subsequent coded frames). Regardless, B-frames are still
politically controversial.

B pictures are interpolative in two ways: 1. predictions in the
bi-directional macroblocks are an average from block areas of two pictures;
2. B pictures fill in or interpolate the 3-D video signal over a 33 or 25
millisecond picture period without contributing to the overall signal quality
beyond that immediate point in time. In other words, a B picture, regardless
of its internal make-up of macroblock types, has a life limited to its
immediate self. As mentioned before, its energy does not propagate into
other frames. In a sense, bits spent on B pictures are wasted.
20. Why do some people hate B-frames?
A. Computational complexity, bandwidth, delay, and picture buffer size are
the four B-frame Pet Peeves. Computational complexity in the decoder is
increased since some macroblock modes require averaging between two
macroblocks. Worst case, memory bandwidth is increased an extra 15.2 MByte/s
(4:2:0 601 rates, not including any half-pel or page-mode overhead) for this
extra prediction. An extra picture buffer is needed to store the future
prediction reference (bi-directionality). Finally, extra delay is introduced
in encoding since the frame used for backwards prediction needs to be
transmitted to the decoder before the intermediate B-pictures can be decoded
and displayed.

Cable television (e.g. -- more like i.e. -- General Instruments) has been
particularly averse to B-frames since, for CCIR 601 rate video, the extra
picture buffer pushes the decoder DRAM memory requirements past the magic
8-Mbit (1 Mbyte) threshold into the evil realm of 16 Mbits (2 Mbyte),
although 8 Mbits is fine for a 352 x 480 B picture sequence. However, cable
often forgets that DRAM does not come in convenient high-volume (low cost)
8-Mbit packages as does the friendly 4-Mbit and 16-Mbit. In a few years, the
cost difference between 16 Mbit and 8 Mbit will become insignificant compared
to the bandwidth savings gained through higher compression. For the time
being, some cable boxes will start with 8-Mbit and allow future drop-in
upgrades to the full 16-Mbit.
21. Why was the 16x16 area chosen?
A. The 16x16 area corresponds to the Least Common Multiple (LCM) of 8x8
blocks, given the normative 4:2:0 chroma ratio. Starting with medium
size images, the 16x16 area provides a good balance between side
information overhead & complexity and motion compensated prediction
accuracy. In gist, 16x16 seemed like a good trade-off.
22. Why was the 8x8 DCT size chosen?
A. Experiments showed little improvement with larger sizes vs. the
increased complexity. A fast DCT algorithm will require roughly double
the arithmetic operations per sample when the transform point size is
doubled. Naturally, the best compaction efficiency has been demonstrated
using locally adaptive block sizes (e.g. 16x16, 16x8, 8x8, 8x4, and 4x4)
[See Baker and Sullivan]. Naturally, this introduces additional side
information
overhead and forces the decoder to implement programmable or hardwired
recursive DCT algorithms. If the DCT size becomes too large, then more edges
(local discontinuities) and the like become absorbed into the transform
block, resulting in wider propagation of Gibbs (ringing) and other phenomena.
Finally, with larger transform sizes, the DC term is even more critically
sensitive to quantization noise.
23. What is motion compensated prediction, and why is it a pain?
A. MCP in the decoder can be thought of as having four stages:

1. Motion vector computation.

2. Prediction retrieval: the various predictions are 16x16, 16x8, 8x4, 8x8
   plus any half-pel overhead (e.g. 17x16, 17x17, etc).

3. Filtering:
   3.1 Forming half-pel predictions through bi-linear interpolation.
   3.2 Averaging two predictions together (B macroblocks, Dual Prime).

4. Combination and ordering:
   4.1 Combining 1 or 2 predictions from stage three into upper and lower
       halves (16x8, field in frame).
   4.2 Interleaving or grouping together odd and even lines in frame
       picture predictions.

The final, combined prediction is always a 16x16 block of luminance and an
8x8 block of chrominance, just as in MPEG-1. A single motion vector can be
associated with each source, hence a macroblock can have as many as 4
motion vectors.
24. What are the various prediction modes in MPEG-2?
24.1 Frame:
Predictions are formed from a 16 x 16 pixel area in a previously
reconstructed frame. Identical to MPEG-1. There can be only one source in
forward or backward predicted macroblocks, and two sources in bi-directional
macroblocks. The prediction frame itself may have been coded as either a
frame or two fields, however once a frame is reconstructed, it is simply a
frame as far as future predictions are concerned.
24.2 Field predictions in frame-coded pictures:
Separate predictions are formed for the top (8 lines from field 1) and
bottom (8 lines from field 2) portions of the macroblock. A total of two
motion vectors in forward or backward predictions, four in bi-directional.
24.3 Field predictions in field-coded pictures:
Predictions are formed from the two most recently decoded fields. Prediction
sizes are 16x16, however the 16 lines have a corresponding projection onto a
16x32 pixel area of a frame. One motion vector for forward or backward
predictions, and two for bi-directional.
24.4 16x8 predictions in field-coded pictures:
Like field macroblocks in frame-coded pictures, the upper and lower 8 lines
in this macroblock mode can have different predictions (hence two motion
vectors). This mode compensates for the reduced temporal prediction
precision of field picture macroblocks (a result of the fact that fields
inherently possess half the number of lines that frames do). The field
prediction area projected onto a frame is restored to 16 lines. 2 motion
vectors for backwards or forwards, 4 for bi-directional.
24.5 Dual Prime prediction in frame and field-coded pictures
Predictions for the current macroblock are formed from the average of two 16
x 8 line areas from the two most recently decoded fields. Dual Prime was
devised as an alternative for B pictures in low delay applications, but still
offers many of the signal
quality benefits of B-pictures. Dual Prime requires one less prediction
picture buffer, but still retains the same instantaneous prediction bandwidth
of a B picture system. As an alternative to coding separate motion vectors
for each of the upper and lower 16x8 areas, a full motion vector is sent for
the first area, and a +1, 0, or -1 differential vector (variable length
coded) is specified for the second prediction area. A macroblock will have a
total of two full motion vectors and two differential vectors in frame-coded
pictures. Due to the prediction bandwidth overhead, Main Profile restricts
the use of Dual Prime prediction to P picture sequences only. High Profile
permits use of Dual Prime in B pictures.
24.6 Field and frame organized macroblocks:
Originally intended as a cheaper means of achieving field-decorrelation in
frame-coded pictures without the fussy overhead of separate field prediction
estimates, the dct coefficients (quantized prediction error for a given
macroblock) may be organized into either a field or frame pattern.
Essentially this means that the prediction error for the combined 16x16
macroblock may be grouped into field or frame blocks. A bit in the macroblock
header (dct_type) indicates whether the upper and lower portions of the
macroblock are to be interleaved (frame organized) or remain separated (field
organized).
25. How do you tell an MPEG-1 bitstream from an MPEG-2 bitstream?
A. All MPEG-2 bitstreams must contain specific extension headers that
*immediately* follow MPEG-1 headers. At the highest layer, for example,
the MPEG-1 style sequence_header() is followed by sequence_extension()
exclusive to MPEG-2. Some extension headers are specific to MPEG-2 profiles.
For example, sequence_scalable_extension() is not allowed in Main Profile
bitstreams.
A simple program need only scan the coded bitstream for byte-aligned start
codes to determine whether the stream is MPEG-1 or MPEG-2.
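A hedged sketch of such a scanner (the start-code values 000001B3 for a
sequence header and 000001B5 for an extension are from the standards; the
scaffolding is ours, and it assumes no start-code emulation inside the
header payload):

#include <stdio.h>

int main(int argc, char *argv[])
{
    FILE *fp;
    unsigned long word = 0xFFFFFFFFUL;   /* sliding 32-bit window of bytes */
    int c, seen_seq_header = 0;

    if (argc < 2 || (fp = fopen(argv[1], "rb")) == NULL)
        return 1;
    while ((c = getc(fp)) != EOF) {
        word = ((word << 8) | (unsigned long)c) & 0xFFFFFFFFUL;
        if (word == 0x000001B3UL)        /* sequence_header_code */
            seen_seq_header = 1;
        else if (seen_seq_header && (word & 0xFFFFFF00UL) == 0x00000100UL) {
            /* the first start code after the sequence header decides it */
            printf(word == 0x000001B5UL ? "MPEG-2\n" : "MPEG-1\n");
            break;
        }
    }
    fclose(fp);
    return 0;
}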
26. What is the reasoning behind MPEG syntax symbols?
A. Here are some of the Whys and Wherefores of MPEG symbols:
Start codes
These 32-bit byte-aligned codes provide a mechanism for cheaply
searching coded bitstreams for commencement of various layers of video
without having to actually parse variable-length codes or perform any
decoder arithmetic. Start codes also provide a mechanism for
resynchronization in the presence of bit errors.
Coded block pattern (CBP --not to be confused with Constrained
Parameters!) When the frame prediction is particularly good, the
displaced frame difference (DFD, or prediction error) tends to be small,
often with entire block energy being reduced to zero after quantization.
This usually happens only at low bit rates. Coded block patterns
prevent the need for transmitting EOB symbols in those zero coded
blocks.
DCT_coefficient_first
Each intra coded block has a DC coefficient. With coded block patterns
signaling all possible combinations of all-zero valued blocks, the
dct_coef_first mechanism assigns a different meaning to the VLC codeword
that would otherwise represent EOB as the first coefficient.
End of Block:
Saves unnecessary run-length codes. At optimal bitrates, there tends to
be few AC coefficients concentrated in the early stages of the zig-zag
vector. In MPEG-1, the 2-bit length of EOB implies that there is an
average of only 3 or 4 non-zero AC coefficients per block. In MPEG-2
Intra (I) pictures, with a 4-bit EOB code, this number is between 9 and
16 coefficients. Since EOB is required for all coded blocks, its absence
can signal that a syntax error has occurred in the bitstream.
Macroblock stuffing
A genuine pain for VLSI implementations, macroblock stuffing was
introduced
to maintain smoother, constant bitrate control in MPEG-1.
However, with normalized complexity measures and buffer management
performed a priori (pre-frame, pre-slice, and pre-macroblock) in the
MPEG-2 encoder test model, the need for such localized smoothing
evaporated. Stuffing can be achieved through virtually unlimited slice
start code padding if required. A good rule of thumb: if you find
yourself often using stuffing more than once per slice, you probably
don't have a very good rate control algorithm. Anyway, macroblock
stuffing is now illegal in MPEG-2, so don't start using it if you
haven't already.
MPEG's modified Huffman VLC tables
The VLC tables in MPEG are not Huffman tables in the true sense of
Huffman coding, but are more like the tables used in Group 3 fax. They
are entropy constrained, that is, non-downloadable and optimized for a
limited range of bit rates (sweet spots). With the exception of a few
codewords, the larger tables were carried over from the H.261 standard
of 1990. MPEG-2 added an "Intra table". Note that the dct_coefficient
tables assume positive/negative coefficient pmf
symmetry.
27. Why bother to research compressed video when there is a standard?
A. Despite the worldwide standard, many areas remain open for research:
advanced encoding and pre-processing, motion estimation, macroblock
decision models, rate control and buffer management in editing
environments, etc. There's practically no end to it.
28. Where can I get a copy of the latest MPEG-2 draft?
A. Contact your national standards body (e.g. ANSI Sales in NYC for the
U.S., British Standards Institute in the UK, etc.). A number of private
organizations offer ISO documents.
29. What are the latest working drafts of MPEG-2 ?
A. MPEG-2 has reached the voting document stage of the Draft International
Standard for: Information Technology -- Generic Coding of Moving Pictures
and Associated Audio. Recommendation H.262, ISO/IEC Draft International
Standard 13818-2. [Produced March 25, 1994; not yet approved by the voting
process.] Systems is Part 1, Video Part 2, and Audio Part 3. A committee
draft for Conformance (Part 4) is expected in November 1994, as well as the
Technical Report on Software Simulation (Part 5).
30. What is the latest version of the MPEG-1 documents?
A. Systems (ISO/IEC IS 11172-1), Video (ISO/IEC IS 11172-2), and Audio
(ISO/IEC IS 11172-3) have reached the final document stage. Part 4,
Conformance Testing, is currently a DIS.
31. What is the evolution of ISO standard documents?
A. In chronological order:

ISO/Committee notation                Author's notation
----------------------------------    -------------------------
Problem (unofficial first stage)      Barroom Witticism
New work Item (NI)                    Napkin Item
New Proposal (NP)                     Need Permission
Working Draft (WD)                    We're Drunk
Committee Draft (CD)                  Calendar Deadlock
Draft International Standard (DIS)    Doesn't Include Substance
International Standard (IS)           Induced patent Statements
32. Where is a good introductory paper to MPEG?
A. Didier Le Gall, "MPEG: A Video Compression Standard for Multimedia
Applications," Communications of the ACM, April 1991, Vol.34, No.4, pp. 47-58
33. What are some journals on related MPEG topics ?
A.
IEEE Transactions on Consumer Electronics
IEEE Transactions on Broadcasting
IEEE Transactions on Circuits and Systems for Video Technology
Advanced Electronic Imaging
Electronic Engineering Times (EE Times -- more tabloid coverage. Unfortunate
columns by Richard Doherty)
IEEE Int'l Conference on Acoustics, Speech, and Signal Processing
(ICASSP)
International Broadcasting Convention (IBC)
Society of Motion Pictures and Television Engineers (SMPTE)
SPIE conference on Visual Communications and Image Processing
SPIE conference on Video Compression for Personal Computers
IEEE Multimedia [first edition Spring 1994]
34. Is there a book on MPEG video?
A. Yes, there will be a book published sometime in 1994 by the same authors
who brought you the JPEG book (Bill Pennebaker, Joan Mitchell). Didier Le
Gall will be an additional co-author, and will ensure that digressions into,
e.g., arithmetic coding aspects are kept to a minimum :-)
35. Is it MPEG-2 (Arabic numbers) or MPEG-II (roman)?
A. Committee insiders most often use the Arabic notation with the hyphen,
e.g. MPEG-2. Only the most retentive use the official designation: Phase 2.
In fact, M.P.E.G. itself is a nickname. The official title is: ISO/IEC JTC1
SC29 WG11. The militaristic lingo has so far managed to keep the enemy
(DVI) confused and out of the picture.
ISO:  International Organization for Standardization
IEC:  International Electrotechnical Commission
JTC1: Joint Technical Committee 1
SC29: Sub-committee 29
WG11: Work Group 11 (moving pictures with... uh, audio)
36. What happened to MPEG-3?
A. MPEG-3 was to have targeted HDTV applications with sampling dimensions up
to 1920 x 1080 x 30 Hz and coded bitrates between 20 and 40 Mbit/sec. It was
later discovered that with some (compatible) fine tuning, MPEG-2 and MPEG-1
syntax worked very well for HDTV rate video. The key is to maintain an
optimal balance between sample rate and coded bit rate.
Also, the standardization window for HDTV was rapidly closing. Europe and
the United States were on the brink of committing to analog-digital
subnyquist hybrid algorithms (D-MAC, MUSE, et al).
European all-digital
projects such as HD-DIVINE and VADIS demonstrated better picture quality with
respect to bandwidth using the MPEG syntax. In the United States, the
Sarnoff/NBC/Philips/Thomson HDTV consortium had used MPEG-1 syntax from the
beginning of its all-digital proposal, and with the exception of motion
artifacts (due to limited search range in the encoder), was deemed to have
the best picture quality of all three digital proponents. HDTV is now part of
the MPEG-2 High-1440 Level and High Level toolkit.
37. What is MPEG-4?
A. MPEG-4 targets the Very Low Bitrate applications defined loosely as
having sampling dimensions up to 176 x 144 x 10 Hz and coded bit rates
between 4800 and 64,000 bits/sec.
This new standard would be used, for
example, in low bit rate videophones over analog telephone lines.
This effort is in the very early stages. Morphology, fractals, model
based, and anal retentive block transform coding are all in the
offing. MPEG-4 is now in the application identification phase.
Scaleable modes of MPEG-2
38. What are the scaleable modes of MPEG-2?
A. Scaleable video is permitted only in the High Profiles.

Currently, there are four scaleable modes in the MPEG-2 toolkit. These modes
break MPEG-2 video into different layers (base, middle, and high layers),
mostly for purposes of prioritizing video data. For example, the high
priority channel (bitstream) can be coded with a combination of extra error
correction information and/or increased signal strength (i.e. higher
Carrier-to-Noise ratio or lower Bit Error Rate) than the lower priority
channel. For example, in HDTV, the high priority bitstream (720 x 480) can
be decoded under noise conditions where the lower priority (1440 x 960)
cannot. This is part of the "graceful degradation" concept. Breaking a
video signal into two streams (base and enhancements) has a penalty,
however: usually less than 1.5 dB.

Another purpose of scalability is complexity division. A standard TV set
need only decode the 720 x 480 channel, thus requiring a less expensive
decoder processor than a TV set wishing to display 1440 x 960. This is
known as simulcasting.
A brief summary of the MPEG-2 video scalability modes:
Spatial Scalability-- Useful in simulcasting, and for feasible software
decoding of the lower resolution, base layer. This spatial domain
method codes a base layer at lower sampling dimensions (i.e.
"resolution") than the upper layers. The upsampled reconstructed lower
(base) layers are then used as prediction for the higher layers.
Data Partitioning-- Similar to JPEG's frequency progressive mode, only
the slice layer indicates the maximum number of block transform
coefficients contained in the particular bitstream (known as the
"priority break point"). Data partitioning is a frequency domain method
that breaks the block of 64 quantized transform coefficients into two
bitstreams. The first, higher priority bitstream contains the more
critical lower frequency coefficients and side informations (such as DC
values, motion vectors). The second, lower priority bitstream carries
higher frequency AC data.
SNR Scalability-- Similar to the point transform in JPEG, SNR
scalability is a spatial domain method where channels are coded at
identical sample rates, but with differing picture quality (achieved through
quantization step sizes). The higher priority bitstream contains base
layer data that can be added to a lower priority refinement layer to
construct a higher quality picture.
Temporal Scalability--- A temporal domain method useful in, e.g.,
stereoscopic video. The first, higher priority bitstream codes video
at a lower frame rate, and the intermediate frames can be coded in a
second bitstream using the first bitstream's reconstruction as prediction.
In stereoscopic vision, for example, the left video channel can be
predicted from the right channel.
Other scalability modes were experimented with in MPEG-2 video (such as
Frequency Scalability), but were eventually dropped in favor of methods
that demonstrated comparable or better picture quality with greater
simplicity.
39. Why MPEG-2? Wasn't MPEG-1 enough?
A. MPEG-1 was optimized for CD-ROM or applications at about 1.5
Mbit/sec. Video was strictly non-interlaced (i.e. progressive). The
international cooperation worked well enough for MPEG-1 that the committee
began to address applications at broadcast TV sample rates using the
CCIR 601 recommendation (720 samples/line by 480 lines per frame by 30
frames per second, or about 15.2 million samples/sec including chroma) as
the reference.

Unfortunately, today's TV scanning pattern is interlaced. This introduces a
duality in block coding: do local redundancy areas (blocks) exist
exclusively in a field or a frame (or a particle or a wave)? The answer of
course is that some blocks are one or the other at different times,
depending on motion activity. The additional man-years of experimentation
and implementation between MPEG-1 and MPEG-2 improved the method of
block-based transform coding.
40. What did MPEG-2 add to MPEG-1 in terms of syntax/algorithms ?
A. Here is a brief summary:
Sequence layer:
More aspect ratios. A minor, yet necessary part of the syntax.
Horizontal and vertical dimensions are now required to be a multiple of
16 in frame coded pictures, and the vertical dimension must be a
multiple of 32 in field coded pictures.
4:2:2 and 4:4:4 macroblocks were added in the Next profiles.
Syntax can now signal frame sizes as large as 16383 x 16383.
Syntax signals source video type (NTSC, PAL, SECAM, MAC, component) to
help post-processing and display.
Source video color primaries (609, 170M, 240M, D65, etc.) and
opto-electronic transfer characteristics (709, 624-4M, 170M, etc.) can be
indicated.
Four scaleable modes [see scalability discussion].
Picture layer:
All MPEG-2 motion vectors are specified to a half-pel sample grid.
DC precision can be user-selected as 8, 9, 10, or 11 bits.
New scalar quantization matrices may be downloaded once per picture. In High
profile, separate chrominance matrices now exist (Y and C no longer have to
share)
Concealment motion vectors were added to I-pictures in order to increase
robustness from bit errors. I pictures are the most critical and sensitive
picture in a group of pictures.
A non-linear macroblock quantization factor provides a wider dynamic
range, from 0.5 to 56, than the linear MPEG-1 range (1 to 32). Both are
sent as 5-bit FLC side information in the macroblock and slice
headers.
New Intra-VLC table for dct_coefficient_next (AC run-level events) that
is a better match for the histogram of Intra-coded pictures. EOB is 4
bits. The old table, dct_coef_next, is reserved for use in non-intra
pictures (P, B), although the new table can be used for Intra-coded
macroblocks in P and B pictures as well.
Alternate scanning pattern that (supposedly) improves entropy coding
performance over the original Zig-Zag scan used in H.261, JPEG, and MPEG-1.
The extra scanning pattern is geared towards interlaced video.
Syntax to signal an irregular 3:2 pulldown process (repeat_first_field flag).
Progressive and interlaced frame coding
Syntax to indicate source composite video characteristics useful in
post-processing operations (v-axis, field sequence, sub_carrier, phase,
burst_amplitude, etc.).
Pan & scanning syntax that tells decoder how to, for example, window a
4:3 image within a wider 16:9 aspect ratio coded image. Vertical pan
offset has 1/16th pixel accuracy.
Macroblock layer:
Macroblock stuffing is now illegal in MPEG-2 (hurray!!). If stuffing is
really needed, the encoder can pad slice start codes.
Two organizations for macroblock coefficients (interlaced and progressive)
signaled by dct_type flag.
Now only one run-level escape code (24 bits) instead of the single
(20 bits) and double escape (28 bits) in MPEG-1.
Improved mismatch control in quantization over the original oddification
method in MPEG-1. Now specifies adding or subtracting one to the 63rd
AC coefficient depending on the parity of the summed coefficients. MPEG-2
mismatch control is performed on the transform coefficients, whereas in
MPEG-1 it is applied to the quantized transform coefficients.
Many additional prediction modes (16x8 MC, field MC, Dual Prime) and,
correspondingly, macroblock modes.
Overall, MPEG-2's greatest compression improvements over MPEG-1 are:
prediction modes, Intra VLC table, DC precision, non-linear macroblock
quantization. Implementation improvements: macroblock stuffing was
eliminated.
41. How do MPEG and JPEG differ?
A. The most fundamental difference is MPEG's use of block-based motion
compensated prediction (MCP)---a method falling into the general category of
temporal DPCM.
The second most fundamental difference is in the target application.
JPEG adopts a general purpose philosophy: independence from color space
(up to 255 components per frame) and quantization tables for each
component. Extended modes in JPEG include two sample precision (8 and
12 bit sample accuracy), combinations of frequency progressive, spatial
hierarchically progressive, and amplitude (point transform) progressive
scanning modes. Further color independence is made possible thanks to
downloadable Huffman tables (up to one for each component.)
Since MPEG is targeted for a set of specific applications, there is only
one color space (4:2:0 YCbCr), one sample precision (8 bits), and one
scanning mode (sequential). Luminance and chrominance share quantization
and VLC tables. MPEG adds adaptive quantization at the macroblock (16 x
16 pixel area) layer. This permits both smoother bit rate control and
more perceptually uniform quantization throughout the picture and image
sequence. However, adaptive quantization is part of the Enhanced JPEG
charter (ISO/IEC 10918-3) currently in verification stage. MPEG variable
length coding tables are non-downloadable, and are therefore optimized
for a limited range of compression ratios appropriate for the target
applications.
The local spatial decorrelation methods in MPEG and JPEG are very
similar. Picture data is block transform coded with the two-dimensional
orthonormal 8x8 DCT, with asymmetric basis vectors about time (aka
DCT-II). The resulting 63 AC transform coefficients are mapped in a zig-zag
pattern (or alternative scan pattern in MPEG-2) to statistically
increase the runs of zeros. Coefficients of the vector are then
uniformly scalar quantized, run-length coded, and finally the run-length
symbols are variable length coded using a canonical (JPEG) or modified
Huffman (MPEG) scheme. Global frame redundancy is reduced by 1-D DPCM
of the block DC coefficients, followed by quantization and variable
length entropy coding of the quantized DC coefficient.
Frame --MCP--> 8x8 spatial block --DCT--> 8x8 frequency block
--ZZ--> zig-zag scan --Q--> quantization --RLC--> run-length coding
--VLC--> variable length coding
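To make the RLC step concrete, a toy sketch (VLC table lookup omitted;
names ours) of turning the 63 zig-zag-ordered AC coefficients into
run-level pairs:

#include <stdio.h>

/* Emit (run, level) pairs for quantized AC coefficients already in
   zig-zag order; the VLC lookup that would follow is omitted. */
void run_level(const int ac[63])
{
    int i, run = 0;
    for (i = 0; i < 63; i++) {
        if (ac[i] == 0) {
            run++;                          /* zeros preceding the next level */
        } else {
            printf("(run=%d, level=%d)\n", run, ac[i]);
            run = 0;
        }
    }
    printf("EOB\n");                        /* end of block closes the list */
}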
The similarities have made it possible for the development of hard-wired
silicon that can code both standards. Even some highly microcoded
architectures employing hardwired instruction primitives or functional
blocks benefit from JPEG/MPEG similarities. There are many additional
yet minor differences. They include:
1. In addition to the 8-bit mode, DCT and quantization precision
in MPEG has a 9-bit and 12-bit mode, respectively, exclusively in
non-intra coded macroblocks. A 1-bit expansion takes place in the
macroblock difference operation.
2. Mismatch control in MPEG-1 forces quantized coefficients to
become odd values (oddification). JPEG does not employ any mismatch
mechanism.
3. JPEG run-length coding produces run-size tokens (run of zeros,
non-zero coefficient magnitude) whereas MPEG produces fully concatenated
run-level tokens that do not require magnitude differential bits.
4. DC values in MPEG-1 are limited to 8-bit precision (a constant
stepsize of 8), whereas JPEG DC precision can occupy all possible
11 bits. MPEG-2, however, re-introduced extra DC precision, critical
even at high compression ratios.
Difference between MPEG and H.261
42. How do MPEG and H.261 differ?
A. H.261, also known as Px64, was targeted for teleconferencing
applications where motion is naturally more limited. Motion vectors are
restricted to a range of +/- 15 pixel unit displacements. Prediction
accuracy is reduced since H.261 motion vectors are specified to only
integer-pel accuracy. Other quality syntactic differences include: no
B-pictures, inferior mismatch control.
43. Is H.261 the de facto teleconferencing standard?
A. Not exactly. To date, about seventy percent of the industrial
teleconferencing hardware market is controlled by PictureTel of Mass.
The second largest market controller is Compression Labs of Silicon
Valley. PictureTel hardware includes compatibility with H.261 as a
lowest common denominator, but when in communication with other
PictureTel hardware, it can switch to a mode superior at low bit rates
(less than 300kbits/sec). In fact, over 2/3 of all teleconferencing is
done at two-times switched 56 channel (~P = 2) bandwidth. ISDN is still
expensive. In each direction, video and audio are coded at an aggregate
rate of 112 kbits/sec (2*56 kbits/sec). The PictureTel proprietary
compression algorithm is acknowledged to be a combination of spatial
pyramid, lattice vector quantizer, and an unidentified entropy coding
method. Motion compensation is considerably more refined and
sophisticated than the 16x16 integer-pel block method specified in
H.261.
The Compression Labs proprietary algorithm also offers significant
improvement over H.261 when linked to other CLI hardware. Local
decorrelation is based on a DCT-VQ hybrid.
Currently, ITU-TS (International Telecommunication Union - Telecommunication
Standardization Sector), formerly CCITT, is quietly defining an
improvement to H.261 with the participation of industry vendors.
Rate control
44. What is the TM rate control and adaptive quantization technique ?
A. The Test Model (MPEG-2) and Simulation Model (MPEG-1) were not, by
any stretch of the imagination, meant to epitomize state-of-the-art
encoding quality. They were, however, designed to exercise the syntax,
verify proposals, and test the *relative* compression performance of
proposals in a timely manner that could be duplicated by
co-experimenters. Without simplicity, there would no doubt have been
endless debates over model interpretation. Regardless of all else, more
advanced techniques would probably trespass into proprietary territory.

The final test model for MPEG-2 is TM version 5b, aka TM version 6. The
final MPEG-1 simulation model is version 3. The MPEG-2 TM rate control
method offers a dramatic improvement over the SM method. TM adds more
accurate estimation of macroblock complexity through use of limited a
priori information. Macroblock quantization adjustments are computed on
a macroblock basis, instead of once per slice.
45. How does the TM work?
A. Rate control and adaptive quantization are divided into three steps:

Step One: Bit Allocation
In Complexity Estimation, the global complexity measures assign
relative weights to each picture type (I, P, B). These weights (Xi, Xp,
Xb) are reflected by the typical coded frame size of I, P, and B
pictures (see typical frame size discussion). I pictures are usually
assigned the largest weight since they have the greatest stability
factor in an image sequence. B pictures are assigned the smallest
weight since B-picture energy does not propagate into other pictures
and B pictures are usually highly correlated with neighboring P and I
pictures.

The bit target for a frame is based on the frame type, the remaining
number of bits left in the Group of Pictures (GOP) allocation, and the
immediate statistical history of previously coded pictures.

Step Two: Rate Control
Rate control attempts to adjust bit allocation if there is significant
difference between the target bits (anticipated bits) and actual coded
bits for a block of data. If the virtual buffer begins to overflow, the
macroblock quantization step size is increased, resulting in a smaller
yield of coded bits in subsequent macroblocks. Likewise, if underflow
begins, the step size is decreased.

The Test Model approximates that the target picture has a spatially
uniform distribution of bits. This is a safe approximation since
spatial activity and perceived quantization noise are almost inversely
proportional. Of course, the user is free to design a custom
distribution, perhaps targeting more bits in areas that contain text,
for example.

Step Three: Adaptive Quantization
The final step modulates the macroblock quantization step size obtained
in Step 2 by a local activity measure. The activity measure itself is
normalized against the most recently coded picture of the same type
(I, P, or B). The activity for a macroblock is chosen as the minimum
among the four 8x8 block luminance variances. Choosing the minimum
block is part of the concept that a macroblock is no better than the
block of highest visible distortion (weakest link in the chain).
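A sketch of Step Three as we understand the Test Model's activity
normalization (treat the exact formula and its roughly [0.5, 2.0] range
as an assumption about TM5, not a quotation from it):

/* Modulate the Step-2 quantization step by normalized local activity.
   act     = minimum of the four 8x8 luminance block variances (assumed)
   avg_act = average activity of the previous picture of the same type */
double modulate_quant(double q_step2, double act, double avg_act)
{
    double n_act = (2.0 * act + avg_act) / (act + 2.0 * avg_act); /* ~[0.5, 2.0] */
    return q_step2 * n_act;   /* busy macroblocks get a coarser step size */
}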
46. What is a good motion estimation method, then?
A. When shopping for motion vectors, the three basic characteristics
are: Search range, search pattern, and matching criteria. Search
pattern has the greatest impact on finding the best vector. Hierarchical
search patterns first find the best match between downsampled images of
the reference and target pictures and then refine the vector through
progressively higher resolutions. Compared to other fast methods,
hierarchical patterns are less likely to mistake a very local distortion
minimum for the best match. Also note that subsampled search and
hierarchical search are not synonymous.
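As a concrete (and deliberately naive) illustration of a two-level
hierarchical search, consider the following C sketch. Nothing here comes
from the FAQ or any standard: the 8x8 coarse block, the +/-1 refinement,
and the assumption that the caller supplies 2:1 downsampled planes are all
illustrative choices, and border clipping and half-pel refinement are
omitted.

#include <limits.h>
#include <stdlib.h>

static long sad(const unsigned char *a, const unsigned char *b,
                int stride, int n)
{
    long s = 0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            s += labs((long)a[y * stride + x] - (long)b[y * stride + x]);
    return s;
}

/* Full search of the (2*range+1)^2 candidates around the starting vector. */
static void refine(const unsigned char *cur, const unsigned char *ref,
                   int stride, int n, int cx, int cy, int range,
                   int *mvx, int *mvy)
{
    long best = LONG_MAX;
    int sx = *mvx, sy = *mvy;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            long s = sad(cur + cy * stride + cx,
                         ref + (cy + sy + dy) * stride + (cx + sx + dx),
                         stride, n);
            if (s < best) { best = s; *mvx = sx + dx; *mvy = sy + dy; }
        }
}

void hierarchical_search(const unsigned char *cur,  const unsigned char *ref,
                         const unsigned char *cur2, const unsigned char *ref2,
                         int stride, int stride2,
                         int cx, int cy,            /* macroblock origin */
                         int coarse_range, int *mvx, int *mvy)
{
    /* Level 1: 8x8 block at half resolution, wide search from (0,0). */
    *mvx = 0; *mvy = 0;
    refine(cur2, ref2, stride2, 8, cx / 2, cy / 2, coarse_range, mvx, mvy);

    /* Level 2: double the coarse vector, then +/-1 refinement on 16x16. */
    *mvx *= 2; *mvy *= 2;
    refine(cur, ref, stride, 16, cx, cy, 1, mvx, mvy);
}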
Q. Is there a limit to the length of motion vectors?
A. The search area is unlimited, but the reconstructed motion vectors must
not:
a. Point beyond the picture boundaries:
(1 <= MV_x <= luminance width - 16) and (1 <= MV_y <= luminance height - 16).
The "- 16" arises because the motion vector origin is the upper left hand
corner of a macroblock.
b. In Constrained Parameters MPEG-1, the motion vector is limited to a
range of [-64,+63.5] luminance samples with half-pel accuracy, and
[-128,+127.5] with integer-pel accuracy. Break the constrained parameters
rules and your video sequence will not likely display on many hardware
devices.
c. In MPEG-2 Video Main Profile at Main Level, the motion vectors are
always on a half-pel co-ordinate grid, and the vertical range is
restricted to [-64, +63.5], and the horizontal limit is [-256,+255.5].
d. in MPEG-1, the syntactic limit of the motion vector is [-1024,+1023]
integer pel, horizontal and vertical.
e. in MPEG-2, the syntactic limit of the motion vector is [-2048,+2047.5]
horizontal, [-1024,+1023.5] vertical.
47. Is exhaustive search "optimal" ?
A. Definitely not in the context of block-based MCP video. Since one
motion vector represents the prediction of 256 pixels, divergent pixels
within the macroblock are misrepresented by the "global" vector. This
leads back to the general philosophy of block-based coding as an
approximation technique. In his ICASSP '93 paper, Sullivan discusses ways in
which block-based prediction schemes can solve part of this problem.
Exhaustive search may find blocks with the least distortion (displaced frame
difference) but will not produce motion vectors with the lowest entropy.
48. What are some advanced encoding methods?
Quantizer feedback: determine the dependent quantization stepsize by
modeling quantization error propagating over multiple pictures. [Uz/et
al ICASSP 93, Ortega/Vetterli/et al ICASSP 93]
Smoothness constraint placed on local activity measures: immediate blocks
outside the target macroblock are considered when selecting the macroblock
quantization stepsize. [Thomson/Savitier patent]
Horizontal variance: measure variance between columns of pixels in addition
to the traditional measure of variance along rows (lines) when making
field/frame macroblock prediction decision.
DFD energy: examine DFD energy/variance when making Intra/Non-intra
macroblock decision.
Activity measures: use total bits from a first-pass encoding of a picture or
macroblock as a measure of the activity. Coded bits are a more accurate
reflection of local complexity than variance. [Thomson/Savitier patent]
Motion vector cost: this is true for any syntax element, really. Signaling
a macroblock quantization factor or a large motion vector differential can
cost more than making up the difference with extra quantized DFD (prediction
error) bits.
The optimum can be found with some Lagrangian operator. In summary, in any
compression system with side information, there is an optimum point between
signaling overhead (e.g. prediction) and prediction error.
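A minimal sketch of that trade-off, assuming each candidate coding of a
macroblock has already been evaluated for distortion and total rate (side
information included); the type and names are invented for illustration:

/* Pick the candidate minimizing J = D + lambda * R. */
typedef struct { double distortion; double rate_bits; } Candidate;

int best_candidate(const Candidate *c, int n, double lambda)
{
    int best = 0;
    double bestJ = c[0].distortion + lambda * c[0].rate_bits;
    for (int i = 1; i < n; i++) {
        double J = c[i].distortion + lambda * c[i].rate_bits;
        if (J < bestJ) { bestJ = J; best = i; }
    }
    return best;
}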
Liberal Interpretations of the Forward DCT:
Borrowing from the concept that the DCT is simply a filter bank, a
technique that seems to be gaining popularity is basis vector shaping.
Usually this is combined with the quantization stage since the two are
tied closely together in a rate-distortion sense. The idea is to use
the basis vector shaping as a cheap alternative to pre-filtering by
combining the more desirable data adaptive properties of pre-filtering/
pre-processing into the transformation process... yet still reconstruct
a picture in the decoder using the standard IDCT that looks reasonably
like the source. Some more clever schemes will apply a form of windowing.
[Warning: watch out for eigenimage/basis vector orthogonality.]
Frequency-domain enhancements:
Enhancements are applied after the DCT (and possibly quantization) stage
to the transform coefficients. This borrows from the concept: if you
don't like the (quantized) transformed results, simply reshape them into
something you do like. Suppressing isolated small amplitudes is popular.
Temporal spreading of quantization error:
This method is similar to the original intent behind color subcarrier
phase alternation by field in the NTSC, PAL, and SECAM analog TV
standards: for stationary areas, noise does not "hang" in one location,
but dances about the image over time to give a more uniform effect.
Distribution makes it more difficult for the eye to "catch on" to
trouble spots (due to the latent temporal response curve of human
vision). Simple encoder models tend to do this naturally but will not
solve all situations.
Look-ahead and adaptive frame cycle structures: analyze picture activity
several pictures into the future, looking for scene changes or motion
statistics.
It is easy to spot encoders that do not employ any advanced encoding
techniques: reconstructed video usually contains ringing around edges,
color bleeding, and lots of noise.
49. Is so-and-so really MPEG compliant ?
A. At the very least, there are two areas of conformance/compliance in
MPEG: 1. Compliant bitstreams 2. compliant decoders. Technically
speaking, video bitstreams consisting entirely of I-frames (such as
those generated by Xing software) are syntactically compliant with the
MPEG specification. The I-frame sequence is simply a subset of the full
syntax. Compliant bitstreams must obey the range limits (e.g. motion
vectors limited to +/-128, frame sizes, frame rates, etc.) and syntax
rules (e.g. all slices must commence and terminate with a non-skipped
macroblock, no gaps between slices, etc.).
Decoders, however, cannot escape true conformance. For example, a
decoder that cannot decode P or B frames is *not* legal MPEG.
Likewise, full arithmetic precision must be obeyed before any decoder
can be called "MPEG compliant."
The IDCT, inverse quantizer, and
motion compensated predictor must meet the specification requirements...
which are fairly rigid (e.g. no more than 1 least significant bit of
error between reference and test decoders). Real-time conformance is
more complicated to measure than arithmetic precision, but it is
reasonable to expect that decoders that skip frames on reasonable
bitstreams are not likely to be considered compliant.
Artifacts
50. What are the tell-tale MPEG artifacts?
A. If the encoder did its job properly, and the user specified a proper
balance between sample rate and bitrate, there shouldn't be any visible
artifacts. However, in sub-optimal systems, you can look for:
Gibbs phenomenon/Ringing/Aliasing (too few AC bits, not enough
pre-processing)
Blockiness (not considering your neighbors before quantizing)
Posterization (too few DC bits)
Checkerboards (DCT eigenimages as a result of too few AC coefficients)
Colorbleeding (not considering color in encoder cost model, not
subtracting color at edges of objects, etc.)
51. Where are the weak points of MPEG video ?
A.
Texture patterns (rapidly alternating lines)
Sharp edges (especially text)
52. What are some myths about MPEG?
A. There are a few major myths that I am aware of:
1. Block displacements: macroblock predictions are formed out of
arbitrary 16x16 (or 16x8/16x16 in MPEG-2) areas from previously
reconstructed pictures. Many people believe that the prediction
macroblocks have boundaries that fall on interchange boundaries (pixel
0, 15, 31, 47... line 0, 15, 31, 47... etc.). In fact, motion vectors
represent relative translations with respect to the target
reconstruction macroblock coordinates. The motion vectors can point to
half-pel coordinates, requiring that the prediction macroblock be
formed via bi-linear interpolation of pixels.
2. Displaced frame (macroblock) difference construction: the prediction
error formed as the difference between the prediction macroblock and
source macroblock is coded much like an Intra macroblock. The
prediction may come from different locations (as in bi-directional
prediction--or in MPEG-2--16x8, field-in-frame, and Dual Prime), but the
DFD is always coded as a 16x16 unit.
3. Compression ratios
You hear 200:1 and 100:1 in the media. Utter rubbish. The true range
is between 16:1 and 40:1. Spreading misinformation about compression
ratios in public will catch the attention of the infamous MPEG Police.
They say mild-mannered Michael Barnsley will snap, without warning, into
violent rage if he doesn't get the upper bunk bed.
4. Picture coding types all consist of the same macroblocks
Macroblocks within I pictures are strictly intra-coded. Macroblocks
within P pictures can be either predicted or intra-coded, and in B pictures
they can be bi-directional, forward, backward, or intra. Additional
macroblock mode switches include: prediction with no motion compensation,
modified macroblock quantization, and coding (or not) of the prediction
error. The switches are concatenated into the macroblock_type side information
and variable length coded in the macroblock header.
53. What is the color space of MPEG?
A. MPEG strictly specifies the YCbCr color space, not YUV or YIQ or YPbPr
or YDrDb or any other color difference variations. Regardless of any
bitstream parameters, MPEG-1 and MPEG-2 Video Main Profile specify 4:2:0
chroma ratio, where the color difference channels (Cb, Cr) have half the
resolution (or sample grid density) in both the horizontal and vertical
directions with respect to luminance.
MPEG-2 High Profile includes an option for 4:2:2 and 4:4:4 coding.
Applications for this are likely to be broadcasting and contribution
equipment.
54. Don't you mean 4:1:1 ?
A. No, here is a table of ratios:

            CCIR 601 (60 Hz) image     Chroma sub-sampling factors
  format    Y           Cb, Cr         Vertical    Horizontal
  ------    ---------   ---------      --------    ----------
  4:4:4     720 x 480   720 x 480      none        none
  4:2:2     720 x 480   360 x 480      none        2:1
  4:2:0     720 x 480   360 x 240      2:1         2:1
  4:1:1     720 x 480   180 x 480      none        4:1
  4:1:0     720 x 480   180 x 120      4:1         4:1

3:2:2, 3:1:1, and 3:1:0 are less common variations.
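As a quick sanity check on the table, a hypothetical C helper that derives
the chroma plane dimensions from the sub-sampling factors (pass 1 for
"none"); the function is purely illustrative:

void chroma_size(int luma_w, int luma_h, int h_factor, int v_factor,
                 int *cw, int *ch)
{
    *cw = luma_w / h_factor;   /* 4:2:0 on 720 x 480: 720/2 = 360 */
    *ch = luma_h / v_factor;   /* 4:2:0 on 720 x 480: 480/2 = 240 */
}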
55. Why did MPEG choose 4:2:0 ? Isn't 4:2:2 the standard for TV?
A. At least four reasons I can think of:
1. 4:2:0 picture memory requirements are 25% less than those of 4:2:2
pictures (equivalently, 4:2:2 requires 33% more memory than 4:2:0).
MPEG-1 decoders are able to snugly fit all 3 SIF pictures (1 reconstruction &
display, 2 prediction) into 512 KBytes of buffer space. CCIR 601 is a
tighter fit into 2 MBytes.
2. The subjective difference between 4:2:0 and 4:2:2 is minimal, when
considering consumer display equipment and distribution compression ratios.
3. Vertical decimation increases compression efficiency by reducing syntax
overhead posed in an 8 block (4:2:0) macroblock structure.
4. You're compressing the hell out of the video signal, so what possible
difference can the 0:0:2 high-pass make?
Interlacing and the 62 microsecond gap between successively scanned lines
introduces some discontinuities, but most of this can be alleviated through
pre-processing.
56. What is the precision of MPEG samples?
A. By definition, MPEG samples have no more and no less than 8-bits uniform
sample precision (256 quantization levels). For luminance (which is
unsigned) data, black corresponds to level 0 and white to level 255. However,
in CCIR Recommendation 601 coding, levels 0 through 14 and 236 through 255
are reserved for blanking signal excursions. MPEG currently has no such
clipped excursion restrictions, although decoders might take care to ensure
that active samples do not exceed these limits. With three color components per
pixel, the total combination is roughly 16.8 million colors (i.e. 24-bits).
57. What is all the fuss with cositing of chroma components?
A. It is moderately important to properly co-site chroma samples,
otherwise a sort of chroma shifting effect (exhibited as a "halo") may result
when the reconstructed video is displayed. In MPEG-1 video, the chroma
samples are exactly centered between the 4 luminance samples (Fig. 1). To
maintain compatibility with the CCIR 601 horizontal chroma locations and to
simplify implementation (eliminating the need for a phase shift), MPEG-2
chroma samples are arranged as per Fig. 2.
[Figures: sample-position diagrams, preserved here as captions only.
 Fig. 1  MPEG-1 4:2:0 organization (each chroma sample centered among
         four luminance samples)
 Fig. 2  MPEG-2 4:2:0 organization (chroma samples horizontally co-sited
         with luminance)
 Fig. 3  MPEG-2 and CCIR Rec. 601 4:2:2 organization]
MPEG for the data compression expert
58. How would you explain MPEG to the data compression expert?
A. MPEG video is a block-based video scheme.
59. How does MPEG video really compare to TV, VHS, laserdisc ?
A. VHS picture quality can be achieved for source film video at about 1
million bits per second (with proprietary encoding methods). It is very
difficult to objectively compare MPEG to VHS. The response curve of
VHS places -3 dB at around 2 MHz of analog luminance bandwidth
(equivalent to 200 samples/line). VHS chroma is considerably less dense
in the horizontal direction than MPEG source video (compare 80
samples/line to 176!). From a sampling density perspective, VHS is
superior only in the vertical direction (480 luminance lines compared to
240)...
but when taking into account such things as interfield magnetic tape
crosstalk and the TV monitor Kell factor, the perceptual vertical
advantage is not all that significant. VHS is prone to such inconveniences
as timing errors (an annoyance addressed by time base correctors), whereas
digital video is fully discretized. Pre-recorded VHS is typically recorded at
very high duplication speeds (5 to 15 times real time playback speed),
opening up additional avenues for artifacts. In gist, MPEG-1 at its nominal
parameters can match VHS's sexy low-pass-filtered look.
With careful coding schemes, broadcast NTSC quality can be approximated at
about 3 Mbit/sec, and PAL quality at about 4 Mbit/sec. Of course, sports
sequences with complex spatial-temporal activity should be treated with bit
rates more like 5 and 6 Mbit/sec, respectively. Laserdisc is a tough one to
compare. Laserdiscs are encoded with composite video (NTSC or PAL).
Manufacturers of laserdisc players make claims of up to 425 TVL (or 567
samples/line) response. Thus it could be said the laserdisc has a 567 x 480 x
30 Hz "potential resolution". The carrier-to-noise ratio is typically better
than 48 dB. Timing is excellent. Yet some of the clean characteristics of
laserdisc can be achieved with MPEG-1 at 1.15 Mbit/sec (SIF rates),
especially for those areas of medium detail (low spatial activity) in the
presence of uniform motion. This may be why some people say MPEG-1 video at
1.15 Mbit/sec looks almost as good as Laserdisc or Super VHS at times.
60. What are the typical MPEG-2 bitrates and picture quality?
A. Typical coded picture sizes (in bits):

                                Picture type
                                I          P          B         Average
  MPEG-1 SIF @ 1.15 Mbit/sec    150,000     50,000     20,000     38,000
  MPEG-2 601 @ 4.00 Mbit/sec    400,000    200,000     80,000    130,000
Note: parameters assume Test Model for encoding, I frame distance of 15 (N =
15), and a P frame distance of 3 (M = 3).
Of course, among differing source material, scene changes, and use of
advanced encoder models... these numbers can be significantly different.
61. At what bitrates is MPEG-2 video optimal?
A. The Test subgroup has defined a few examples:
"Sweet spot" sampling dimensions and bit rates for MPEG-2:
Dimensions
--------------352x480x24 Hz
(progressive)
(better)
Coded rate
----------
Comments
----------------------------------------
2 Mbit/sec
Half horizontal 601. Looks almost NTSC
broadcast quality, and is a good
substitute for VHS.
544x480x30 Hz
capture
(interlaced)
4 Mbit/sec
704x480x30 Hz
(interlaced)
6 Mbit/sec
Intended for film src.
PAL broadcast quality (nearly full
of 5.4 MHz luminance carrier). Also
4:3 image dimensions windowed within 720
sample/line 16:9 aspect ratio via pan&scan.
Full CCIR 601 sampling dimensions.
[these numbers subject to change at whim of MPEG Test subgroup]
62. Why does film perform so well with MPEG ?
A. Several reasons, really:
1) The frame rate is 24 Hz (instead of 30 Hz) which is a savings of
some 20%.
2) The film source video is inherently progressive, hence no fussy
interlaced spectral frequencies.
3) The pre-digital source was severely oversampled (compare 352 x 240
SIF to 35 millimeter film at, say, 3000 x 2000 samples). This can
result in a very high quality signal, whereas most video cameras do
not oversample, especially in the vertical direction.
4) Finally, the spatial and temporal modulation transfer function (MTF)
characteristics (motion blur, etc.) of film are more amenable to
the transform and quantization methods of MPEG.
63. What is the best compression ratio for MPEG ?
A. The MPEG sweet spot is about 1.2 bits/pel Intra and .35 bits/pel
inter. Experimentation has shown that intra frame coding with the
familiar DCT-Quantization-Huffman hybrid algorithm achieves optimal
performance at about an average of 1.2 bits/sample or about 6:1
compression ratio. Below this point, artifacts become noticeable.
64. Can MPEG be used to code still frames?
A. Yes. There are, of course, advantages and disadvantages to using
MPEG over JPEG:
Disadvantages:
1. MPEG has only one color space
2. MPEG-1 and MPEG-2 Main Profile luma and chroma share quantization
and VLC tables
3. MPEG-1 is syntactically limited to 4k x 4k images, and 16k x 16k for
MPEG-2.
Advantages:
1. MPEG possesses adaptive quantization
2. With its limited still image syntax, MPEG averts any temptation to use
unnecessary, expensive, and academic encoding methods that have little
impact on the overall picture quality (you know who you are).
Philips' CD-I spec has a requirement for an MPEG still frame mode, with
double SIF image resolution. This is technically feasible mostly thanks to
the fact that only one picture buffer is needed to decode a still image
instead of three buffers.
65. Is there an MPEG file format?
A. Not exactly. The necessary signal elements that indicate image size,
picture rate, aspect ratio, etc. are already contained within the sequence
layer of the MPEG video stream. The White Book format for Karaoke and CD-I
movies specifies a range of (time-division) multiplexing strategies for audio
and video bitstreams. A directory format listing scenes and their locations
on the disc is associated with the White Book specification.
66. What are some pre-processing enhancements ?
Adaptive de-interlacing:
This method maps interlaced video from a higher sampling rate (e.g 720 x 480)
into a lower rate, progressive format (352 x 240).
The most basic algorithm
measures the correlation between two immediate macroblock fields, and if the
correlation is high enough, uses an average of both fields to form a frame
macroblock. Otherwise, a field area from one field (usually of the same
parity) is selected. More clever algorithms are much more complex than this,
and may involve median filtering, and multirate/multidimensional tools.
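A crude sketch of that field-correlation test in C; the name, threshold, and
block geometry are invented for illustration, and real de-interlacers are
considerably more elaborate:

#include <stdlib.h>

/* Sum of absolute differences between co-located samples of the two
 * fields of one 16x16 frame area (8 lines per field).  If the fields
 * track each other closely, averaging them is safe; otherwise a single
 * field should be kept and interpolated. */
int fields_correlated(const unsigned char *top, const unsigned char *bot,
                      int field_stride, long threshold)
{
    long diff = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 16; x++)
            diff += labs((long)top[y * field_stride + x]
                       - (long)bot[y * field_stride + x]);
    return diff < threshold;
}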
Pre-anti-aliasing and Pre-blockiness reduction:
A common method in still image coding is to pre-smooth the image before
encoding. For example, if pre-analysis of a frame indicates that serious
artifacts will arise if the picture were to be coded in the current condition
(i.e. below the sweet spot), a pre-anti-aliasing filter can be applied. This
can be as simple as having a smoothing severity proportional to the image
activity. The pre-filter can be global (same smoothing factor for whole
image or sequence) or locally adaptive. More complex methods will again use
multirate/multidimensional methods.
One straightforward concept from multidimensional/multirate pre-processing is
to start from source video whose resolution (sampling density) is greater than
the target source and reconstruction sample rates. This follows the basic
principles of oversampling, as found in A/D converters.
These filters emphasize the fact that most information content is contained
in the lower harmonics of a picture anyway. VHS is hardly considered to be a
sharp cut-off medium, tragically implying that the "320 x 480 potential" of
VHS is never truly realized.
67. Why use these "advanced" pre-filtering techniques?
A. Think of the DCT and quantizer as an A/D converter. Think of the DCT/Q
pre-filter as the required anti-alias prefilter found before every A/D. The
big difference of course is that the DCT quantizer assigns a varying number
of bits per transform coefficient. Judging from the normalized activity
measured in the pre-analysis stage of video encoding (assuming you even have
a pre-analysis stage) and the target buffer size status, you have a fairly
good idea of how many bits can be spared for the target macroblock, for
example.
Other pre-filtering techniques mostly take into account: texture patterns,
masking, edges, and motion activity. Many additional advanced techniques can
be applied at different immediate layers of video encoding (picture, slice,
macroblock, block, etc.).
68. What about post-processing enhancements?
Some research has been carried out in this area. Non-linear interpolation
methods have been published by Wu and Gersho (e.g. ICASSP 93), convex hull
projections for MAP (Severinson, ICASSP 93), and others. Post-processing
unfortunately defies the spirit of MPEG conformance: decoders should produce
similar reconstructions. Enhancements should ideally be done during the
pre-processing and encoding stages.
69. Can motion vectors be used to measure object velocity?
A. Motion vector information cannot be reliably used as a means of
determining object velocity unless the encoder model specifically set
out to do so. First, encoder models that optimize picture quality generate
vectors that typically minimize prediction error and, consequently,
the vectors often do not represent true object translation. Standards
converters that resample one frame rate to another (as in NTSC to PAL)
use different methods (motion vector field estimation, edge detection, et
al.) that are not concerned with optimizing ratios such as SNR vs. bitrate.
Secondly, motion vectors are not transmitted for all macroblocks anyway.
70. How do you code interlaced video with MPEG-1 syntax?
A. Two methods can be applied to interlaced video that maintain
syntactic compatibility with MPEG-1 (which was originally designed for
progressive frames only). In the field concatenation method, the
encoder model can carefully construct predictions and prediction errors
that realize good compression but maintain field integrity (distinction
between adjacent fields of opposite parity). Some pre-processing
techniques can also be applied to the interlaced source video that
would, e.g., lessen sharp vertical frequencies.
This technique is not efficient, of course. On the other hand, if the
original source was progressive (e.g. film), then it is fairly trivial to
convert the interlaced source to a progressive format before encoding.
(MPEG-2 would then only offer superior performance through greater DC
block precision, non-linear mquant, intra VLC, etc.) Reconstructed
frames are re-interlaced in the decoder Display process.
The second syntactically compatible method codes fields as separate pictures.
This approach has been acknowledged not to work as well.
71. Is MPEG patented?
A. Yes and no. Many encoding methods are patented. Approximately 11
blocking patents (that is, patents general enough to be unavoidable in any
implementation) have recently been identified.
A patent pool is being formed within MPEG where a single royalty fee would be
split among the 31 patent-holding companies.
72. How many cable box alliances are there?
A. Many.
To start with:
Scientific Atlanta (SA), Kaleida, and Motorola:
SA will build the box, Motorola the chips, and Kaleida the
O/S and user interface (using ScriptX of course).
Silicon Graphics (SGI), Scientific Atlanta, and Toshiba
For the Time Warner's Orlando trial, SGI will provide the
RISC (MIPS R4000) and software, SA will do the box again,
and Toshiba will provide the chips.
General Instruments (GI), Intel, and Microsoft:
GI will make the box and Intel will supply the special low-cost
386SL processor on which a 1MB flash EPROM executable core
of Microsoft windows and DOS will run. Microsoft will develop the
user interface.
Hewlett Packard (HP):
HP will manufacture and/or design low-cost, open-architecture set-top
decoder boxes (not a part of the Eon wireless deal). The CPU will
explicitly not be an 80x86-based processor.
CLI and Philips:
Compression Labs will provide the encoder technology and Philips
will provide the decoder technology for an ADSL system whose
transport structure will be put together by Broadband Technologies.
["These alliances subject to change at the whim of PR departments
and market forces."]
73. Will there be an MPEG video tape format?
A. Not exactly. A consortium of international companies is co-developing a
consumer digital video format using 6-millimeter-wide, metal particle
tape. Due to the initial high cost of MPEG encoders, a JPEG-like
compression method will be used for inexpensive encoding of typical
consumer source video (broadcast PAL, NTSC). The natural consequence of
still image methods is less efficient use of bandwidth: 25 Mbit/sec for
the same subjective real-time playback quality achieved at 6 Mbit/sec
possible with MPEG-2. A second bit rate mode, 50 Mbit/sec, is
designated for HDTV.
Pre-coded digital video from, e.g., broadcast sources will be directly
recorded to tape and "passed-through" as a coded bitstream to the video
decompression box upon tape playback. Assuming linear tape speed is
proportional to bit rate, the recording time of a pre-compressed
MPEG-2 program at the upper limit of 5 Mbit/sec for broadcast-quality
video would be over 20 hours. Channel coding
schemes (error correction, convolution coding, etc.), however, will
most likely be optimized for the tape medium and therefore may differ
from the channel methods for cable, terrestrial, and satellite. (A
Zenith-Goldstar S-VHS based experiment did, however, directly record the
4-VSB broadcast baseband signal of the old Zenith/AT&T HDTV proposal).
More specs (summarized from EE Times, July 5, 1993):

Tape width: 6.35 mm
Audio: two-channel 48 kHz 16-bit audio, or four-channel 32 kHz 12-bit
Tape format: metal evaporated tape, 13.5 microns thick
Cassette dimensions (millimeters):

  Size       Width   Height   Depth
  --------   -----   ------   -----
  Standard   125     78       14.6
  Small      66      48       12.2

Linear tape speeds: 18.812 mm/s (60 Hz), 18.831 mm/s (50 Hz)
Video compression: DCT based
Recording times:

             525/625 (25 Mbit/s)   HDTV (50 Mbit/s)
  Standard   4h30min               2h15min
  Small      1 hour                30 min
Participants: Matsushita, Sony, Philips, Thomson, Hitachi, Mitsubishi,
Sanyo, Sharp, Toshiba, JVC.
MPEG in everyday life
74. Where will we see MPEG in everyday life?
A. Just about wherever you see video today.
DBS (Direct Broadcast Satellite)
The Hughes/USSB DBS service will use MPEG-2 video and audio. Thomson
has exclusive rights to manufacture the decoding boxes for the first 18
months of operation. Hughes/USSB DBS will begin its U.S. service in
April 1994. Two satellites at 101 degrees West will share the power
requirements of 120 Watts per 27 MHz transponder over a total of 32
transponders. Multi source channel rate control methods will be
employed to optimally allocate bits between several programs normalized
to one 22 Mbit/sec data carrier. Bit allocation adapts to instantaneous
co-channel spatial and temporal activity. An average of 150 channels are
planned with the addition of a second set of satellites augmenting the power
level of each transponder to 240 Watts. The coded throughput of each
transponder will increase to 30 Mbit/sec.
CATV (Cable Television)
Despite conflicting opinions, the cable industry has more or less
settled on MPEG-2 video. Audio is less than settled. For example,
General Instruments (the largest U.S. consumer cable set-top box
manufacturer) has announced the planned exclusive use of Dolby AC-3.
The General Instruments DigiCipher I video syntax is similar to MPEG-2
syntax, but employs smaller macroblock predictions and no B-frames. The
DigiCipher II specification will include modes to support both the GI
and full MPEG-2 Video Main Profile syntax. Digicipher-I services such
as HBO will upgrade to DigiCipher II in 1994.
HDTV
The U.S. Grand Alliance, a consortium of companies that formerly competed
to win the U.S. terrestrial HDTV standard, has already agreed to
use the MPEG-2 Video and Systems syntax, including B-pictures. Both
interlaced (1920 x 1080 x 30 Hz) and progressive (1280 x 720 x 60 Hz)
modes will be supported. The Alliance has also settled upon a modulation
method (VSB), convolutional coding (Viterbi), and error correction
(Reed-Solomon) specification.
In September 1993, a consortium of 85 European companies signed an
agreement to fund a project known as Digital Video Broadcasting (DVB),
which will develop a standard for cable and terrestrial transmission by
the end of 1994. The scheme will use MPEG-2. This consortium has put the
final nail in the coffin of the D-MAC scheme for gradual migration
towards an all-digital, HDTV consumer transmission standard. The only
remaining analog or digital-analog hybrid system left in the world is
NHK's MUSE (which will probably be axed in a few years, as soon as that
appears to be the politically secure thing to do).
76. Is there an MPEG CD-ROM format?
A. Yes, a consortium of international companies (Matsushita, Philips,
Sony, JVC, et al.) have agreed upon a specification for MPEG video and
audio. Two-hour movies are stored on two 650 MByte compact discs. The video
rate is 1.15 Mbit/sec; the audio rate is either 128 kbit/sec or 192 kbit/sec
Layer I or Layer II (this seems to contradict the Philips 224 kbit/s audio
spec?). Although the Video, Systems, and Audio syntax are identical, the CD-I
movie format and the White Book format are not compatible.
Researchers are busy experimenting with denser and faster rate CD
formats, perhaps using green or blue laser wavelengths. One demonstration
stretched the pit and track density to its limits, improving areal density
almost two-fold.
MZ EXE Format
Intel byte order
Information from File Format List 2.0 by Max Maischein.
--------!-CONTACT_INFO----------------------
If you notice any mistakes or omissions, please let me know! It is only
with YOUR help that the list can continue to grow. Please send
all changes to me rather than distributing a modified version of the list.
This file has been authored in the style of the INTERxxy.* file list
by Ralf Brown, and uses almost the same format.
Please read the file FILEFMTS.1ST before asking me any questions. You may find
that they have already been addressed.
Max Maischein
Max Maischein, 2:244/1106.17
[email protected]
[email protected]
Corion on #[email protected]
--------!-DISCLAIMER------------------------
DISCLAIMER: THIS MATERIAL IS PROVIDED "AS IS". I verify the information
contained in this list to the best of my ability, but I cannot be held
responsible for any problems caused by use or misuse of the information,
especially for those file formats foreign to the PC, like AMIGA or SUN file
formats. If a piece of information is marked "guesswork" or undocumented, you
should check it carefully to make sure your program will not break with
an unexpected value (and please let me know whether or not it works
the same way).
Information marked with "???" is known to be incomplete or guesswork.
Some file formats were not released by their creators, others are regarded
as proprietary, which means that if your programs deal with them, you might
be looking for trouble. I don't care about this.
--------------------------------------------
The old EXE files are the EXE files executed directly by MS-DOS. They were a
major improvement over the old 64K COM files, since EXE files can span
multiple segments. An EXE file consists of three different parts: the
header, the relocation table, and the binary code.
Many programs expand the header to store their copyright information in the
executable; some of these extensions are documented below.
The format of the header is as follows:

OFFSET  Count TYPE   Description
0000h       2 char   ID='MZ' or ID='ZM'
0002h       1 word   Number of bytes in last 512-byte page of executable
0004h       1 word   Total number of 512-byte pages in executable
                     (including the last page)
0006h       1 word   Number of relocation entries
0008h       1 word   Header size in paragraphs
000Ah       1 word   Minimum paragraphs of memory allocated in addition
                     to the code size
000Ch       1 word   Maximum number of paragraphs allocated in addition
                     to the code size
000Eh       1 word   Initial SS relative to start of executable
0010h       1 word   Initial SP
0012h       1 word   Checksum (or 0) of executable
0014h       1 dword  CS:IP relative to start of executable (entry point)
0018h       1 word   Offset of relocation table;
                     40h for new-style (NE,LE,LX,W3,PE etc.) executables
001Ah       1 word   Overlay number (0h = main program)
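As an aside (not part of the original list), the fixed fields above could be
read as in the following C sketch. The values are stored little-endian
("Intel byte order"), so bytes are assembled explicitly to stay independent
of the host byte order; the function name is mine:

#include <stdio.h>
#include <stdint.h>

static uint16_t rd_word(FILE *f)              /* little-endian 16-bit read */
{
    int lo = fgetc(f), hi = fgetc(f);
    return (uint16_t)(lo | (hi << 8));
}

int read_mz_header(FILE *f)
{
    uint16_t id = rd_word(f);                 /* 0000h: 'MZ' or 'ZM' */
    if (id != 0x5A4D && id != 0x4D5A)
        return -1;                            /* not an MZ executable */
    uint16_t last_page   = rd_word(f);        /* 0002h */
    uint16_t pages       = rd_word(f);        /* 0004h */
    uint16_t reloc_count = rd_word(f);        /* 0006h */
    uint16_t hdr_paras   = rd_word(f);        /* 0008h */
    printf("%u pages (%u bytes in last), %u relocations, header %u bytes\n",
           (unsigned)pages, (unsigned)last_page,
           (unsigned)reloc_count, (unsigned)hdr_paras * 16);
    return 0;
}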
Following are the header expansions used by some other programs like TLink,
LZEXE and other linkers, encryptors and compressors; all offsets are
relative to the start of the whole header:
---new executable
OFFSET  Count TYPE   Description
001Ch       4 byte   ????
0020h       1 word   Behaviour bits ??
0022h      26 byte   reserved (0)
003Ch       1 dword  Offset of new executable header from start of
                     file (or 0 if plain MZ executable)

---Borland TLINK
OFFSET  Count TYPE   Description
001Ch       2 byte   ?? (apparently always 01h 00h)
001Eh       1 byte   ID=0FBh
001Fh       1 byte   TLink version, major in high nybble
0020h       2 byte   ??

---old ARJ self-extracting archive
OFFSET  Count TYPE   Description
001Ch       4 char   ID='RJSX' (older versions; the newer signature is
                     'aRJsf' in the first 1000 bytes of the file)

---LZEXE compressed executable
OFFSET  Count TYPE   Description
001Ch       2 char   ID='LZ'
001Eh       2 char   Version number:
                     '09' - LZExe 0.90
                     '91' - LZExe 0.91

---PKLITE compressed executable
OFFSET  Count TYPE   Description
001Ch       1 byte   Minor version number
001Dh       1 byte   Bit mapped:
                     0-3 - major version
                     4   - extra compression
                     5   - multi-segment file
001Eh       6 char   ID='PKLITE'

---LHarc 1.x self-extracting archive
OFFSET  Count TYPE   Description
001Ch       4 byte   unused???
0020h       3 byte   Jump to start of extraction code
0023h       2 byte   ???
0025h      12 char   ID='LHarc's SFX '

---LHA 2.x self-extracting archive
OFFSET  Count TYPE   Description
001Ch       8 byte   ???
0024h      10 char   ID='LHa's SFX ' (v2.10)
                     ID='LHA's SFX ' (v2.13)

---LH self-extracting archive
OFFSET  Count TYPE   Description
001Ch       8 byte   ???
0024h       8 byte   ID='LH's SFX '

---TopSpeed C 3.0 CRUNCH compressed file
OFFSET  Count TYPE   Description
001Ch       1 dword  ID=018A0001h
0020h       1 word   ID=1565h

---PKARC 3.5 self-extracting archive
OFFSET  Count TYPE   Description
001Ch       1 dword  ID=00020001h
0020h       1 word   ID=0700h

---BSA (Soviet archiver) self-extracting archive
OFFSET  Count TYPE   Description
001Ch       1 word   ID=000Fh
001Eh       1 byte   ID=A7h

---LARC self-extracting archive
OFFSET  Count TYPE   Description
001Ch       4 byte   ???
0020h      11 byte   ID='SFX by LARC '
After the header follow the relocation items, which are used to span
multiple segments. The relocation items have the following format:

OFFSET  Count TYPE   Description
0000h       1 word   Offset within segment
0002h       1 word   Segment of relocation

To get the position of the relocation within the file, you have to compute
the physical address from the segment:offset pair, which is done by
multiplying the segment by 16, adding the offset, and then adding the offset
of the binary start. Note that the raw binary code starts on a paragraph
boundary within the executable file. All segments are relative to the start
of the executable in memory, and this value must be added to every segment
if relocation is done manually.
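Expressed as code (the function name is mine), the computation just
described:

#include <stdint.h>

/* File position of a relocation target: a paragraph is 16 bytes, and the
 * binary code starts right after the header (header size in paragraphs
 * is the word at offset 0008h). */
uint32_t reloc_file_pos(uint16_t seg, uint16_t off, uint16_t header_paragraphs)
{
    uint32_t binary_start = (uint32_t)header_paragraphs * 16;
    return binary_start + (uint32_t)seg * 16 + off;
}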
EXTENSION:EXE,OVR,OVL
OCCURENCES:PC
PROGRAMS:MS-DOS
REFERENCE:Ralf Brown's Interrupt List
SEE ALSO:COM,EXE,NE EXE
NCSA HDF
Specification and Developer's Guide
Version 3.2
September 1993
University of Illinois at Urbana-Champaign
NCSA HDF Version 3.3 source code and documentation are in the public domain.
Specifically, we give to the public domain all rights for future licensing of the
source code, all resale rights, and all publishing rights.
We ask, but do not require, that the following message be included in all derived
works: Portions developed at the National Center for Supercomputing
Applications at the University of Illinois at Urbana-Champaign.
READ ME NOW
If you want to see more software like NCSA HDF, you need to send us a letter, email
or US mail, telling us what you are doing with NCSA HDF. We need to know: (1)
What science you are working on—an abstract of your work would be fine; and (2)
How NCSA HDF has helped you, for example, by increasing your productivity or
allowing you to do things you could not do before.
We encourage you to cite the use of NCSA HDF, and any other NCSA software you
have used, in your publications. A bibliography of your work would be extremely
helpful.
NOTE: This is a new kind of shareware. You share your science and successes with
us, and we can get more resources to share more software like NCSA HDF with you.
NCSA Contacts

Mail user feedback, bugs, and software and manual suggestions to:

    NCSA Software Tools Group
    HDF
    152 Computing Applications Bldg.
    605 E. Springfield Ave.
    Champaign, IL 61820

Send communications via electronic mail to one of the following:

    Bug Suggestions:
    [email protected]
    [email protected]

    All Other Communications:
    [email protected]
    [email protected]
Disclaimer
THE UNIVERSITY OF ILLINOIS GIVES NO WARRANTY, EXPRESS OR IMPLIED,
FOR THE SOFTWARE AND/OR DOCUMENTATION PROVIDED, INCLUDING,
WITHOUT LIMITATION, WARRANTY OF MERCHANTABILITY AND WARRANTY
OF FITNESS FOR A PARTICULAR PURPOSE.
Trademark Acknowledgments
Macintosh and Macintosh II are trademarks of Apple Computer Inc.
UNIX is a registered trademark of AT&T.
CRAY and UNICOS are registered trademarks and CRAY-2 and CFT77 are trademarks
of Cray Research Inc.
IBM PC is a registered trademark of International Business Machines Corporation.
MS-DOS is a registered trademark of Microsoft Corporation.
Sun is a registered trademark and Sun Workstation and Sun System 3 are trademarks
of Sun Microsystems Inc.
Table of Contents

Introduction
    Overview
    Why HDF?
    What is HDF?
    Some History
    About This Document
    Conventions Used in This Document

Chapter 1: Basic Structure of HDF Files
    Chapter Overview
    File Header
    Data Objects
    Physical Organization of HDF Files
    Sample HDF File

Chapter 2: Software Overview
    Chapter Overview
    HDF Software Layers
    Software Organization
    Some HDF Conventions

Chapter 3: General Purpose Interface
    Chapter Overview
    Introduction
    New Low Level Routines with Version 3.2
    Overview of the Interface
    Function Specifications

Chapter 4: Sets and Groups
    Chapter Overview
    Data Sets
    Groups
    Raster Image Sets (RIS)
    Scientific Data Sets
    Vsets, Vdatas, and Vgroups
    The Raster-8 Set (Obsolete)

Chapter 5: Annotations
    Chapter Overview
    General Description
    File Annotations
    Object Annotations

Chapter 6: Tag Specifications
    Chapter Overview
    Introduction
    The HDF Tag Space
    Extended Tags and Alternate Physical Storage Methods
    Tag Specifications

Chapter 7: Portability Issues
    Chapter Overview
    The HDF Environment
    Organization of Source Files
    Passing Strings Between FORTRAN and C
    Function Return Values Between FORTRAN and C
    Differences in Routine Names
    Differences Between ANSI C and Old C
    Type Differences
    Access to Library Functions

Appendix A: Tag and Extended Tag Table
    Tags
    Extended Tag Labels
Introduction
Overview
The Hierarchical Data Format (HDF) was designed to be an easy,
straight-forward, and self-describing means of sharing scientific data
among people, projects, and types of computers. An extensible header
and carefully crafted internal layers provide a system that can grow as
scientific data-handling needs evolve.
This document, the NCSA HDF Specification and Developer’s Guide,
fully defines HDF and its interfaces, discusses criteria employed in its
development, and provides guidelines for developers working on HDF
itself or building applications that employ HDF.
This introduction provides a brief overview of HDF capabilities and
design.
Why HDF?
A fundamental requirement of scientific data management is the ability
to access as much information in as many ways, as quickly and easily
as possible. A data storage and retrieval system that facilitates these
capabilities must provide the following features:
Support for scientific data and metadata
Scientific data is characterized by a variety of data types and
representations, data sets (including images) that can be extremely
large and complex, and the need to attach accompanying attributes,
parameters, notebooks, and other metadata. Metadata,
supplementary data that describes the basic data, includes
information such as the dimensions of an array, the number type of
the elements of a record, or a color lookup table (LUT).
Support for a range of hardware platforms
Data can originate on one machine only to be used later on many
different machines. Scientists must be able to access data and
metadata on as many hardware platforms as possible.
Support for a range of software tools
Scientists need a variety of software tools and utilities for easily
searching, analyzing, archiving, and transporting the data and
metadata. These tools range from a library of routines for reading
and writing data and metadata, to small utilities that simply display
an image on a console, to full-blown database retrieval systems that
provide multiple views of thousands of sets of data and metadata.
Rapid data transfer
Both the size and the dispersion of scientific data sets require that
mechanisms exist to get the data from place to place rapidly.
Extendibility
As new types of information are generated and new kinds of science
are done, a means must be provided to support them.
What is HDF?

The HDF Structure
HDF is a self-describing extensible file format using tagged objects that
have standard meanings. The idea is to store both a known format
description and the data in the same file. HDF tags describe the format
of the data because each tag is assigned a specific meaning: the tag
DFTAG_LUT stands for color palette, the tag DFTAG_RI stands for 8-bit
raster image, and so on (see Figure I.1). A program that has been written
to understand a certain set of tag types can scan the file for those tags
and process the data. This program also can ignore any data that is
beyond its scope.

Figure I.1  Raster Image Set in an HDF File. The set has three data objects
with different tags representing three different types of data: a palette,
dimensions (400 x 600), and a raster image. The palette and dimension
objects contain metadata.
The set of available data objects encompasses both primary data and
metadata. Most HDF objects are machine- and medium-independent,
physical representations of data and metadata.
HDF Tags
The HDF design assumes that we cannot know a priori what types of
data objects will be needed in the future, nor can we know how
scientists will want to view that data. As science progresses, people
will discover new types of information and new relationships among
existing data. New types of data objects (new tags) will be created to
meet these expanding needs. To avoid unnecessary proliferation of tags
and to ensure that all tags are available to potential users who need to
share data, a portable public domain library is available that interprets
all public tags. The library contains user interfaces designed to provide
views of the data that are most natural for users. As we learn more
about the way scientists need to view their data, we can add user
interfaces that reflect data models consistent with those views.
Types of Data and Structures
HDF currently supports the most common types of data and metadata
that scientists use, including multidimensional gridded data,
2-dimensional raster images, polygonal mesh data, multivariate data sets,
finite-element data, non-Cartesian coordinate data, and text.
In the future there will almost certainly be a need to incorporate new
types of data, such as voice and video, some of which might actually be
stored on other media than the central file itself. Under such
circumstances, it may become desirable to employ the concept of a
virtual file. A virtual file functions like a regular file but does not fit
our normal notion of a monolithic sequence of bits stored entirely on a
single disk or tape.
HDF also makes it possible for the user to include annotations, titles,
and specific descriptions of the data in the file. Thus, files can be
archived with human-readable information about the data and its origins.
One collection of HDF tags supports a hierarchical grouping structure
called Vset that allows scientists to organize data objects within HDF
files to fit their views of how the objects go together, much as a person
in an office or laboratory organizes information in folders, drawers,
journal boxes, and on their desktops.
Backward and Forward Compatibility
An important goal of HDF is to maximize backward and forward
compatibility among its interfaces. This is not always achievable,
because data formats must sometimes change to enhance performance,
to correct errors, or for other reasons. However, whenever possible,
HDF files should not become out of date. For example, suppose a site
falls far behind in the HDF standard so its users can only work with the
portions of the specification that are three years old. Users at this site
might produce files with their old HDF software then read them with
newer software designed to work with more advanced data files. The
newer software should still be able to read the old files.
Conversely, if the site receives files that contain objects that its HDF
software does not understand, it should still be able to list the types of
data in the file. It should also be able to access all of the older types of
data objects that it understands, despite the fact that the older types of
data objects are mixed in with new kinds of data. In addition, if the
more advanced site uses the text annotation facilities of HDF
effectively, the files will arrive with complete human-readable
descriptions of how to decipher the new tag types.
Calling Interfaces
To present a convenient user interface made up of something more
usable than a list of tag types with their associated data requirements,
HDF supports multiple calling interfaces.
The low level calling interfaces are used to manipulate tags and raw
data, for error handling, and to control the physical storage of data.
These interfaces are designed to be used by developers who are providing
the higher level interfaces for applications like raster image storage or
scientific data archiving.
The application interfaces, at the next level, include several modules
specifically designed to simplify the process of storing and accessing
specific types of data. For example, the palette interface is designed to
handle color palettes and lookup tables while the scientific data interface
is designed to handle arrays of scientific data. If you are primarily
interested in reading or writing data to HDF files, you will spend most
of your time working with the application interfaces.
The HDF utilities and NCSA applications, at the top level, are special
purpose programs designed to handle specific tasks or solve specific
problems. The utilities provide a command line interface for data
management. The applications provide solutions for problems in
specific application areas and often include a graphic user interface.
Several third party applications are also available at this level.
Machine Independence
An important issue in data file design is that of machine independence
or transportability. The HDF design defines standard representations for
storing all data types that it supports. When data is written to a file, it
is typically written in the standard HDF representation. The conversion
is handled by the HDF software and need not concern the user. Users
may override this convention and install their own conversion routines,
or they may write data to a file in the native format of the machine on
which it was generated.
Some History
In 1987 a group of users and software developers at NCSA searched for
a file format that would satisfy NCSA’s data needs. There were some
interesting candidates, but none that were in the public domain, were
targeted to scientific data, and yet were sufficiently general and
extensible. In the course of several months, borrowing concepts from
several existing formats, the group designed HDF.
The first version of HDF was implemented in the spring and summer of
1988. It included a general purpose interface and an 8-bit raster image
interface. In the fall of 1988, a scientific data set interface was designed
and implemented, enabling HDF users to store multidimensional arrays
and related data. Soon thereafter interfaces were implemented for storing
color palettes, 24-bit raster images, and annotations.
In 1989, it became clear that there was a need to support a general
grouping structure and unstructured data such as that used to represent
polyhedra in graphical applications. This led to Vsets, whose interface
routines were implemented as a separate HDF library.
Also in 1989 it became clear that the existing general purpose layer was
not sufficiently powerful to meet anticipated future needs and that the
coding could use a substantial overhaul. From this, the long process of
redesigning the lower layers of HDF began. The first version
incorporating extended tags and the new lower layers of HDF was
released in the summer of 1992 as HDF Version 3.2.
This release, HDF Version 3.3, provides alternative physical storage
methods (external and linked block data elements) through extended
tags, JPEG data compression, changes to some Vset interface functions,
access to netCDF files through a complete netCDF interface,[1]
hyperslab access routines for old-style SDS objects, and various
performance improvements.
About This Document
This document is designed for software developers who are designing
applications or routines for use with HDF files and for users who need
detailed information about HDF. Users who are interested in using HDF
to store or manipulate their data will not normally need the kind of
detail presented in this manual. They should instead consult one of the
user-level documents:
Versions 3.2 and earlier
NCSA HDF Calling Interfaces and Utilities
NCSA HDF Vset
Version 3.3
Getting Started with NCSA HDF
NCSA HDF User’s Guide
NCSA HDF Reference Manual
Someone using third-party software that uses HDF may also have to
consult a manual for that software.
Document Contents
The NCSA HDF Specification and Developer’s Guide contains the
following chapters and appendix:
Chapter 1: Basic Structure of HDF Files
Introduces and describes the components and organization of HDF
files
Chapter 2: Software Overview
Describes the organization of the software layers that make up the
basic HDF library and provides guidelines for writing HDF
software
Chapter 3: General Purpose Interface
Describes the low level HDF routines that make up the general
purpose interface
Chapter 4: Sets and Groups
Explains the roles of sets and groups in an HDF file, and describes
raster image sets, scientific data sets, and Vsets
Chapter 5: Annotations
Explains the use of annotations in HDF files
Chapter 6: Tag Specifications
Describes the tag identification space, the extended tag structure,
and all of the NCSA-supported tags
Chapter 7: Portability Issues
Describes the measures taken to maximize HDF portability across
platforms and to ensure that HDF routines are available to both C
and FORTRAN programs
[1] NetCDF is a network-transparent derivative of the original CDF (Common Data Format) developed by the
National Aeronautics and Space Administration (NASA). It is used widely in atmospheric sciences and
other disciplines requiring very large data structures. NetCDF is in the public domain and was
developed at the Unidata Program Center in Boulder, Colorado.
Appendix A: Tags and Extended Tag Labels
Presents a list of NCSA-supported HDF tags and a list of labels
used with extended tags
Conventions Used in This Document
Most of the descriptive text in this guide is printed in 10 point New
Century Schoolbook. Other typefaces have specific meanings that will
help the reader understand the functionality being described.
New concepts are sometimes presented in italics on their first
occurrence to indicate that they are defined within the paragraph.
Cross references within the specification include the title of the
referenced section or chapter enclosed in quotation marks. (E.g., See
Chapter 1, "The Basic Structure of HDF Files," for a description of the
basic HDF file structure.)
References to documents italicize the title of the document. (E.g., See
the guide Getting Started with NCSA HDF to familiarize yourself with
the basic principles of using HDF.)
Literal expressions and variables often appear in the discussion. Literal
expressions are presented in Courier while variables are presented in
italic Courier. A literal expression is any expression that would be
entered exactly as presented, e.g., commands, command options, literal
strings, and data. A variable is an expression that serves as a place
holder for some other text that would be entered. Consider the
expression cp file1 file2. cp is a command name and would be
entered exactly as it appears, so it is printed in bold Courier. But
file1 and file2 are variables, place holders for the names of real
files, so they are printed in italic bold Courier; the user would enter the
actual filenames.
This guide frequently offers sample command lines. Sometimes these
are examples of what might be done; other times they are specific
instructions to the user. Command lines may appear within running
text, as in the preceding paragraph, or on a separate line, as follows:
cp file1 file2
Command lines always include one or more literal expressions and may
include one or more variables, so they are printed in Courier and italic
Courier as described above.
Keys that are labeled with more than one character, such as the
RETURN key, are identified with all uppercase letters. Keys that are to
be pressed simultaneously or in succession are linked with a hyphen.
For example, “press CONTROL-A” means to press the CONTROL key
then, without releasing the CONTROL key, press the A key.
Similarly, “press CONTROL-SHIFT-A “ means to press the
CONTROL and SHIFT keys then, without releasing either of those,
press the A key.
Table I.1 summarizes the use of typefaces in the technical discussion
(i.e., everything except references and cross references).
Table I.1  Meaning of entry format notations

Type                 Appearance         Example     Entry Method
Literal expression   Courier            dothis      Enter the expression
(commands, literal                                  exactly as it appears.
strings, data)
Variables            Italic Courier     filename    Enter the name of the
                                                    file or the specific
                                                    data that this
                                                    expression represents.
Special keys         Uppercase          RETURN      Press the key indicated.
Key combinations     Uppercase with     CONTROL-A   While holding down the
                     hyphens between                first one or two keys,
                     key names                      press the last key.
Program listings and screen listings are presented in a boxed display in
Courier type such as in Figure I.2, “Sample Screen Listing.” When the
listing is intended as a sample that the reader will use for an exercise or
model, variables that the reader will change are printed in italic Courier.
Figure I.2  Sample screen listing

mars_53% ls -F
MinMaxer/            net.source
mars_54% cd MinMaxer
mars_55% ls -F
list.MinMaxer        minmaxer.v1.04/
mars_56% cd minmaxer.v1.04
mars_57% ls -F
COPYRIGHT            minmaxer.bin/        source.minmaxer/
README               sample/              source.triangulation/
mars_58%
Chapter 1  Basic Structure of HDF Files
Chapter Overview
This chapter introduces and describes the components and organization
of Hierarchical Data Format (HDF) files.
File Header
The first component of an HDF file is the file header (FH), which takes
up the first four bytes in an HDF file. The file header is a signature that
indicates that the file is an HDF file. Specifically, it is a 32-bit magic
number with the hexadecimal value 0e031301.
Note: To ensure portability, the programmer must ensure that the hexadecimal value in an HDF file header is written in big-endian order.
HDF assumes big-endian order in reading and writing files. The order of
bytes in the file header might be swapped on some machines when the
HDF file header is written, causing these characters to be written in
little-endian order. To maintain HDF file portability when developing
software for such machines, you must make sure the characters are read
and written in the exact order shown.
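By way of illustration, the following C fragment checks a file for the HDF signature by comparing the first four bytes, one at a time, against the values shown above. It is a sketch only, not part of the HDF library; the library routine Hishdf (described in Chapter 3) performs this check for you:

#include <stdio.h>

/* Sketch: return 1 if the file begins with the HDF magic number
   0x0e031301.  The comparison is made byte by byte, so the test is
   independent of the machine's native byte order. */
int looks_like_hdf(const char *path)
{
    static const unsigned char magic[4] = { 0x0e, 0x03, 0x13, 0x01 };
    unsigned char buf[4];
    FILE *fp = fopen(path, "rb");
    int i, ok = 0;

    if (fp != NULL) {
        if (fread(buf, 1, 4, fp) == 4) {
            ok = 1;
            for (i = 0; i < 4; i++)
                if (buf[i] != magic[i])
                    ok = 0;
        }
        fclose(fp);
    }
    return ok;
}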
Data Objects
The basic building block of an HDF file is the data object, which
contains both data and information about the data. A data object has two
parts: a 12-byte data descriptor (DD) and a data element. Figure 1.1
illustrates two data objects.
Figure 1.1  Two Data Objects

[Figure: two data descriptors, each holding "Rank and dimensions" information (2; 90 by 100), pointing to their respective data elements, which contain the array values themselves (e.g., 63.2, 54.5, 12.3, ...; 18.2, 103.6, -7.4, ...).]
As the names imply, the data descriptor provides information about the
data; the data element is the data itself. In other words, all data in an
HDF file has information about itself attached to it. In this sense, HDF
files are self-describing files.
Data Descriptor (DD)

A data descriptor (DD) has four fields: a 16-bit tag, a 16-bit reference
number, a 32-bit data offset, and a 32-bit data length. These are depicted
in Figure 1.2 and are briefly described in Table 1.1. Explanations of
each part appear in the paragraphs following Table 1.1.

Figure 1.2  A Data Descriptor (DD)

    Tag         Reference number    Offset      Length
    (16 bits)   (16 bits)           (32 bits)   (32 bits)
    \_____________________________/
      Tag/ref (data identifier)
Table 1.1  Parts of a Data Descriptor

Part                         Description
Tag/ref (data identifier)    Unique identifier for each data element
    Tag                      Type of data in a data element
    Reference number         Number distinguishing the data element from others with the same tag
Offset                       Byte offset of data element from beginning of file
Length                       Length of data element

Note: Only the full tag/ref uniquely identifies a data element.
Tag/ref (Data Identifier)
A tag and its associated reference number (abbreviated as tag/ref)
uniquely identify a data element in an HDF file. The tag/ref
combination is also known as a data identifier.
Tag
A tag is the part of a data descriptor that tells what kind of data is
contained in the corresponding data element. A tag is actually a 16-bit
unsigned integer between 1 and 65535, but every tag is also given a
name that programs can refer to instead of the number. If a DD has no
corresponding data element, its tag is DFTAG_NULL, indicating that no
data is present. A tag may never be zero.
Tags are assigned by NCSA as part of the specification of HDF. The
following ranges are to be used to guide tag assignment:
00001 – 32767 reserved for NCSA use
32768 – 64999 user-definable
65000 – 65535 reserved for expansion of the format
Chapter 6, “Tag Specifications,” provides full specifications for all
currently supported HDF tags. Appendix A, “Tags and Extended Tag
Labels,” lists the current tag assignments. See the section “Some HDF
Conventions” in Chapter 2, “Software Overview,” for more information
on allocating tags.
Reference Number
Tags are not necessarily unique in an HDF file; there may be more than
one data element of a given type. Therefore, each tag is associated with
a unique reference number in the data descriptor.
Reference numbers are not necessarily assigned consecutively, so you
cannot assume that the actual value of a reference number has any
meaning beyond providing a way of distinguishing among elements
with the same tag. Furthermore, reference numbers are only unique for
data elements with the same tag; two 8-bit raster images will never
have the same reference number but an 8-bit raster image and a 24-bit
raster image might.
Reference numbers are 16-bit unsigned integers.
Data Offset and Length
The data offset states the byte position of the corresponding data
element from the beginning of the file. The length states the number of
bytes occupied by the data element.

Note: All offsets are from the beginning of the file; they are not relative.
Offset and length are both 32-bit unsigned integers.
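In C terms, a DD can be pictured as a structure like the following sketch. The typedef names here are assumptions standing in for the fixed-width unsigned types that the HDF portability layer defines; keep in mind that the on-disk form is the packed, big-endian field sequence described above, not a C structure:

/* Illustrative in-memory form of a DD (sketch only). */
typedef unsigned short uint16;   /* assumed to be 16 bits            */
typedef unsigned long  uint32;   /* assumed to be at least 32 bits   */

typedef struct {
    uint16 tag;      /* kind of data in the element; DFTAG_NULL if none */
    uint16 ref;      /* distinguishes elements that share the same tag  */
    uint32 offset;   /* byte position of the element from start of file */
    uint32 length;   /* number of bytes occupied by the data element    */
} dd_sketch;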
DD Blocks
Data descriptors are stored physically in a linked list of blocks called
data descriptor blocks or DD blocks. The individual components of a
DD block are depicted in Figure 1.3. All of the DDs in a DD block are
assumed to contain significant data unless they have the tag
DFTAG_NULL (no data).
In addition to its DDs, each data descriptor block has a data descriptor
header (DDH). The DDH has two fields: a block size field and a next
block field. The block size field is a 16-bit unsigned integer that
indicates the number of DDs in the DD block. The next block field is a
32-bit unsigned integer giving the offset of the next DD block, if there
is one. The DDH of the last DD block in the list contains a 0 in its
next block field.
Figure 1.3  Model of a Data Descriptor Block

    | block size | next block | tag ref offset length | tag ref offset length | ...
    |<---------- DDH -------->|<--------- DD -------->|<--------- DD -------->|
Since the default number of DDs in a DD block is defined when the
HDF library is compiled, changing the default requires recompilation.
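The traversal implied by this linked list can be sketched as follows. The read16 and read32 helpers are assumptions, not HDF routines; they stand for "read a big-endian 16- or 32-bit unsigned value from fp." Hopen performs an equivalent walk when it reads the DD blocks into memory:

#include <stdio.h>

typedef unsigned short uint16;     /* assumed 16-bit unsigned type  */
typedef unsigned long  uint32;     /* assumed 32-bit unsigned type  */

extern uint16 read16(FILE *fp);    /* assumed big-endian read helper */
extern uint32 read32(FILE *fp);    /* assumed big-endian read helper */

/* Sketch: visit every DD in an HDF file. */
void walk_dd_blocks(FILE *fp)
{
    uint32 next = 4;               /* first DD block immediately follows
                                      the 4-byte file header */
    while (next != 0) {
        uint16 ndds, i;

        fseek(fp, (long)next, SEEK_SET);
        ndds = read16(fp);         /* DDH: number of DDs in this block      */
        next = read32(fp);         /* DDH: offset of next DD block (0=last) */
        for (i = 0; i < ndds; i++) {
            uint16 tag    = read16(fp);
            uint16 ref    = read16(fp);
            uint32 offset = read32(fp);
            uint32 length = read32(fp);
            /* ... examine tag, ref, offset, and length here ... */
        }
    }
}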
Data Element
A data element is the raw data portion of a data object. Its data type can
be determined by examining its tag, but other interpretive information
may be required before it can be processed properly.
Each data element is stored as a set of contiguous bytes starting at the
offset and with the length specified in the corresponding DD. (1)
Exceptions
Note that the data object identified by the tag DFTAG_MT does not
adhere to the standards described above; it consists of the tag
immediately followed by four number types. Since there can be only
one DFTAG_MT tag in an HDF file, there is no need for a reference
number. Since all the data can be stored in the DD with the tag, there
is no need for a data element and the offset and length are unnecessary.
Several other tags, such as DFTAG_NULL and DFTAG_JPEG, serve as
binary flags and convey all the required information by the mere fact of
their presence in an HDF file. These tags therefore point to no data
element and have offset and length values of 0. Consider these
examples: DFTAG_NULL indicates a data object containing no data;
DFTAG_JPEG indicates that an associated data object, indicated by
another tag, contains a JPEG data image. The descriptions of these tags
include a sink pointer symbol in the diagrams in Chapter 6.
See the related entries in Chapter 6, “Tag Specifications,” for
complete descriptions of these tags.
Physical Organization of HDF Files
The file header, DD blocks, and data elements appear in the following
order in an HDF file:
• File header
• First DD block
• Data elements
• If necessary, more DD blocks, more data elements, etc.
These relationships are summarized in Table 1.2.
The only rule governing the distribution of DD blocks and data
elements within a file is that the first DD block must follow
immediately after the file header. After that, the pointers in the DD
headers connect the DD blocks in a linked list and the offsets in the
individual DDs connect the DDs to the data elements.
Table 1.2  Summary of the Relationships among Parts of an HDF File

Part       Constituents
HDF file   FH, DD block, data, DD block, data, DD block, data, ...
FH         0x0e031301 [32-bit HDF magic number]
DD block   DDH, DD, DD, DD, ...
DDH        Number of DDs [16 bits], offset to next DD block [32 bits]
DD         Tag [16 bits], ref [16 bits], offset [32 bits], length [32 bits]
Data       Data element, data element, data element, ...

FH = file header, DD = data descriptor, DDH = DD header
(1) Some HDF software provides the capability of storing objects as a series of linked blocks or external elements, but this occurs at a higher level. At the lowest level each object with a tag/ref is stored contiguously.
Sample HDF File
We are now ready to examine a sample file. Consider an HDF file that
contains two 400-by-600 8-bit raster images as described in Table 1.3.
Table 1.3  Sample Data Objects in an HDF File

Tag         Ref   Data
DFTAG_FID   1     File identifier: user-assigned title for file
DFTAG_FD    1     File descriptor: user-assigned block of text describing overall file contents
DFTAG_LUT   1     Image palette (768 bytes)
DFTAG_ID    1     x- and y-dimensions of the 2-dimensional arrays that contain the raster images (4 bytes)
DFTAG_RI    1     First 2-dimensional array of raster image pixel data (x*y bytes)
DFTAG_RI    2     Second 2-dimensional array of pixel data (also x*y bytes)
Assuming that a DD block contains 10 DDs, the physical organization
of the file could be described by Figure 1.5.
In this instance, the file contains two raster images. The images have
the same dimensions and are to be used with the same palette, so the
same data objects for the palette (DFTAG_LUT) and dimension record
(DFTAG_ID) can be used with both images.
Figure 1.5  Physical Representation of Data Objects

Section    Item   Offset   Contents
Header     FH     0        0e031301 (HDF magic number, in hexadecimal)
DD block   DDH    4        10  0
           DD     10       DFTAG_FID   1  130     4
           DD     22       DFTAG_FD    1  134     41
           DD     34       DFTAG_LUT   1  175     768
           DD     46       DFTAG_ID    1  943     4
           DD     58       DFTAG_RI    1  947     240000
           DD     70       DFTAG_RI    2  240947  240000
           DD     82       DFTAG_NULL  (Empty)
           DD     94       DFTAG_NULL  (Empty)
           DD     106      DFTAG_NULL  (Empty)
           DD     118      DFTAG_NULL  (Empty)
Data       Data   130      sw3
           Data   134      solar wind simulation: third try. 8/8/88
           Data   175      .... (Data for the image palette)
           Data   943      400 600 (Image dimensions)
           Data   947      .... (Data for the first raster image)
           Data   240947   .... (Data for the second raster image)
Chapter 2  Software Overview
Chapter Overview
This chapter describes the HDF software organization and provides
guidelines for writing HDF software.
HDF is an amalgam of code and functionality from many sources. For
example, the netCDF code came from the Unidata Program Center, and
data compression and conversion software has been acquired from a
variety of third parties. NCSA staff wrote the code for the basic HDF
functionality and performed all of the integration work.
This document contains specifications for the NCSA-developed code and
functionality. It does not include specifications for code or
functionality from non-NCSA sources, though it does sometimes refer
to specifications provided by other sources. Only the HDF interface to
such code is specified in this document.
HDF Software Layers
There are three basic levels of HDF software:
• The HDF low level interface
• The HDF application interfaces
• HDF applications and utilities
The lowest layer, the low level interface, includes general purpose
routines that form the basis of all higher-level HDF development. The
low level routines directly execute functions such as file I/O, error
handling, memory management, and physical storage.
The application interfaces support higher level views of data and provide
the interfaces for building user-level applications. Routines to handle
raster images, palettes, annotations, scientific data sets, Vdatas and
netCDF appear at this level.
The applications and utilities are implemented at the highest level.
NCSA utilities, NCSA applications, and third party applications are all
implemented at this level.
The utilities perform general functions, such as listing the contents of
an HDF file, and more specialized functions, such as converting data
from one HDF data type to another (e.g., raster images to scientific data
sets). In general, the utilities have simple command line interfaces and
perform data management tasks.
The applications usually perform data analysis tasks and have polished
interactive user interfaces. They include the NCSA Visualization Tool
Suite, commercial software packages that use HDF, and other packages
created at NCSA and by various third party projects.
Figure 2.1 illustrates this layered implementation.
Figure 2.1  HDF Software Layers (1)

    +------------------------------------------------------------+
    | HDF Utilities | NCSA Applications | 3rd Party Applications |
    +------------------------------------------------------------+
    |                 HDF Application Interfaces                 |
    +------------------------------------------------------------+
    |                  HDF Low Level Interfaces                  |
    +------------------------------------------------------------+
    |                          HDF File                          |
    +------------------------------------------------------------+

(1) This is a simplified illustration of the HDF software layers. Though the basic principles illustrated here continue to apply, the introduction of netCDF and multiple-file HDF data structures renders the implementation considerably more complex.
The general purpose interfaces are described in detail in this document.
The application interfaces and command line utilities are described in
the document NCSA HDF Calling Interfaces and Utilities for Versions
3.2 and earlier and in the NCSA HDF User’s Guide and NCSA HDF
Reference Manual for Version 3.3. Other HDF-based software tools
should have their own manuals.
Since the NCSA user community writes programs primarily in C and
FORTRAN, all of the HDF application interfaces developed at NCSA
are callable from both C and FORTRAN programs. Since the general
purpose interface is primarily for program development, not for
applications, it provides C-callable routines only.
Software Organization

Versions and Release Numbers

Since HDF is under continual development, new releases are
periodically made available. Each new release of the HDF library is
identified by a version number.

The version number consists of three elements:

majorv   Major version number
minorv   Minor version number
rn       Release number

The version number is presented in the format majorv.minorv, followed
by the letter r and the release number rn (e.g., Version 3.2r1).
These elements are interpreted as follows:
Major version number
A new major version number is assigned when there is some
fundamental difference between a new version of the library and the
previous version. When a new major version is released, HDF users
and developers are strongly encouraged to obtain the new source code
and documentation. There will probably be added functionality in
successive major versions of the library and some obsolete code may
be deleted. Some user code may have to be modified to use the new
library.
Minor version number
A new minor version number indicates an intermediate release
between one major version and the next. Changes will probably be
significant. When a new minor version is released, users and
developers are strongly encouraged to obtain the new source code and
documentation.
Release number
A new release number is assigned when bug fixes or other small
modifications have been made. Using a new release of the same
version of the library will not usually require modifying existing
user code.
ANSI C and Portability
To ensure that HDF can be easily ported to new platforms, all versions
of the HDF source code from Version 3.2 on will be written in ANSI
standard C, with special provisions for non-ANSI compilers. For more
information about porting HDF and writing portable HDF-based code,
refer to Chapter 7, "Making HDF Portable."
Modules and Interfaces
The HDF distribution contains many source files or modules that can
be grouped into families. For example, dfp.c, dfpf.c, and dfpff.f
all share the root name dfp and, therefore, all belong to the dfp
family. In general, each family of source modules represents one HDF
application interface; the dfp family represents the HDF Palette
Interface. Exceptions to this rule will be discussed later in this section.
For each interface, there is necessarily one file that contains the C code
that provides the basic functionality of that interface. But some
interfaces may have one or two additional code modules that provide
FORTRAN callability for the interface, so families may have one, two,
or three files:
1 file
Modules of this sort are generally not calling interfaces
themselves; they provide useful support functions for actual
calling interfaces. Since they are not meant to be called by any
routine outside the HDF library, they do not need to be
FORTRAN-callable. Example: hblocks.c is called only by
internal HDF routines and has only the C-callable interface.
2 files
Although there are currently no two-file families, it is
conceivable (and desirable) that some future interface will need
only one extra source module to provide FORTRAN
compatibility. If this were to happen, there would only be two
source modules for the interface. Example: dfnew.c and
dfnewf.c would make up the New Interface.
3 files
Most current implementations of FORTRAN-callable HDF
interfaces require that character string arguments be passed to
some of their functions. Due to differences in the way C and
FORTRAN represent strings, passing strings requires that
there be a small amount of special purpose FORTRAN code
written for each function that takes a string argument.
Therefore, most FORTRAN-callable HDF interfaces consist of
three source modules:
• The primary C module
• A FORTRAN-callable C module
• A FORTRAN module
Example: dfsd.c, dfsdf.c, and dfsdff.f make up the
Scientific Data Set Interface. dfsd.c contains the basic
functionality of the interface. dfsdf.c provides the major
part of FORTRAN callability. And dfsdff.f contains the
special purpose FORTRAN code that enables passing character
string arguments.
Header Files
In addition to the source code modules discussed above, some interfaces
also have C header files associated with them that are meant to be
included by C applications programmers with the #include
preprocessor directive. They contain useful constants and data structures
for interaction with the interface from C programs. The header files can
be identified by the same name as the root name for the rest of the
family with the .h extension. For example, dfsd.h is the header file
for the Scientific Data Set Interface.
Of particular importance among the C header files are hdf.h and
hdfi.h:
hdf.h Contains all the symbolic constants and public data structures
required by HDF. hdf.h should be included by any program
that uses any of these constants or data structures.
hdfi.h Contains specific portability information about each platform
on which HDF is supported. hdfi.h is automatically
included in programs when hdf.h is included, so
programmers need not explicitly include it.
Refer to Chapter 7, “Making HDF Portable,” for more information on
hdfi.h and other portability issues.
By way of illustration, Table 2.1 lists selected families of source code
modules and header files from HDF Version 3.3.
Table 2.1  Sample HDF Version 3.3 Source Code Modules

General headers:       hdf.h, hdfi.h, hproto.h, dfivms.h
General purpose:       hfile.c, hfilef.c, hfileff.f, hkit.c, hblocks.c, hextelt.c, herr.c, herrf.c, hfile.h, herr.h
Grouping (non-Vset):   dfgroup.c, dfgroup.h
Utilities:             dfutil.c, dfutilf.c, dfutilff.f, dfutil.h
Annotations:           dfan.c, dfanf.c, dfanff.f, dfan.h
General rasters:       dfgr.c, dfgr.h, dfcomp.c, dfimcomp.c, dfrig.h
Scientific data sets:  dfsd.c, dfsdf.c, dfsdff.f, dfsd.h
Vsets:                 vg.c, vgf.c, vgff.f, vfp.c, vgi.h, vio.c, vconv.c, vparse.c, vrw.c, vsfld.c, vg.h, vproto.h
The HDF Test Suite
In addition to the source code for the HDF library, versions 3.2 and
higher include a test suite. There are two test modules: one for C and
one for FORTRAN. Each module tests all of the routines in all of the
application interfaces and in the general purpose interface. The exact
form of these test modules may vary from one release to the next;
consult the release code and online test documentation for details.
Every effort has been made to ensure that the test programs provide a
thorough and accurate assessment of the health of the HDF library.
Although the test suite will greatly improve the reliability of HDF
code, it is almost inevitable that some parts of the code will remain
untested. Therefore, no guarantees can be made on the basis of test suite
performance.
Sample HDF Programs
Each HDF release includes several sample programs to help users write
HDF programs. They illustrate some of the common techniques
employed by HDF programmers.
Some HDF Conventions
The HDF specification described in the previous chapter is not
sufficient to guarantee its success. It is also important that HDF
programmers and users adhere to certain conventions. Some guidelines
are implicit in the discussions in other sections of this document.
Others are presented in the document NCSA HDF Calling Interfaces and
Utilities (for Versions 3.2 and earlier) or in the NCSA HDF User’s
Guide and NCSA HDF Reference Manual (for Version 3.3).
Guidelines not covered elsewhere are introduced in this section.
Naming and Assigning Tags
Tags that are to be made available to a general population of HDF users
should be assigned and controlled by NCSA. Tags of this type are given
numbers in the range 1 to 32,767. If you have an application that fits
this criterion, contact NCSA at the address listed in the front matter at
the beginning of this manual and specify the tags you would like. For
each tag, your specifications should include a suggested name,
information about the type and structure of the data that the tag will
refer to, and information about how the tag will be used. Your
specifications should be similar to those contained in Chapter 6, “Tag
Specifications.” NCSA will assign a set of tags for your application
and will include your tag descriptions in the HDF documentation.
Tags in the range 32,768 to 64,999 are user-definable. That is, you can
assign them for any private application. If you use tags in this range,
be aware that they may conflict with other people's private tags.
Using Reference Numbers to Organize Data Objects

The HDF library itself uses reference numbers solely to distinguish
among objects with the same tag. While application programmers may
find it convenient to impart some meaning to reference numbers, they
should be forewarned that the HDF library will be ignorant of any such
meaning.

Note: Users are discouraged from assigning any meaning to reference numbers beyond that imparted by the HDF library.
Multiple References
Multiple references to a single data element are quite common in HDF.
The general purpose routine Hdupdd generates a new reference to data
that is already pointed to by another DD. If Hdupdd is used several
times, there may be several DDs that point to the same data element.
It is important to note that when a multiply-referenced data element is
deleted or moved, the various DDs that previously pointed to the data
element are not automatically deleted or adjusted to point to the data
element in its new location. Consequently, each DD to be deleted or
moved should be checked for multiple references and handled
appropriately.
Chapter 3  General Purpose Interface
Chapter Overview
This chapter provides a detailed description of the routines that make
up the HDF general purpose interface.
Introduction
HDF supports several interfaces which can be categorized as high level
and general purpose interfaces:
• High level interfaces support utilities and applications.
• General purpose interfaces perform basic operations on HDF files.
These levels are illustrated in Figure 3.1, “HDF Software Layers.”
Figure 3.1  HDF Software Layers

    +------------------------------------------------------------+
    | HDF Utilities | NCSA Applications | 3rd Party Applications |
    +------------------------------------------------------------+
    |                 HDF Application Interfaces                 |
    +------------------------------------------------------------+
    |                  HDF Low Level Interfaces                  |
    +------------------------------------------------------------+
    |                          HDF File                          |
    +------------------------------------------------------------+
This chapter is concerned only with the general purpose routines.
Using these routines, you will be able to build and manipulate HDF
objects of any type, including those of your own design. All HDF
applications developed at NCSA use them as basic building blocks.
The general purpose routines are all written in C but are typically
accessible from FORTRAN.
New General Purpose Routines with Version 3.2
The general purpose routines described in this chapter were new with
HDF Version 3.2, released in June 1992; they replace the routines
provided with earlier versions. The new routines provide better
performance and increased functionality and users are strongly advised
to use them in new applications. The old routines are supported through
emulation, but may be eliminated from the HDF library in a future
release.
The new lower layer incorporates the following improvements:
• More consistent data and function types
• More meaningful and extensive error reporting
• Simplification of key lower level functions
• Simplified techniques to facilitate portability
• Support for alternate forms of physical storage, such as linked
blocks storage and storage of the data portion of an object in an
external file
• A version tag to indicate which version of the HDF library last
changed a file
• Support for simultaneous access to multiple files
• Support for simultaneous access to multiple objects within a single
file
The previous lower layer was called the DF layer because all routines
began with the letters DF (e.g., DFopen and DFclose). The new
lower layer is called the H layer because all routines begin with the
letter H (e.g., Hopen, Hclose, and Hwrite). The source modules
containing these routines begin with the letter h (see Table 2.1,
“Sample HDF Version 3.3 Source Code Modules”):

hfile.c     Basic I/O routines
herr.c      Error-handling routines
hkit.c      General purpose routines
hblocks.c   Routines to support linked block storage
hextelt.c   Routines to support external storage of HDF data elements
Overview of the Interface
This section provides specifications and descriptions of the public
functions of the general purpose interface.
Opening and Closing HDF Files

These calls are used to open and close HDF files:

Hopen    Provides an access path to an HDF file and reads all of the DD blocks in the file into memory
Hclose   Closes the access path to a file
Locating Elements for Access and Getting Information

These routines locate elements or acquire other information about an
HDF file or its data objects. Except for Hendaccess, they initialize the
element that they locate and return an access ID that is used in later
references to the data element. Calls can include wildcards so that one
can search for unknown tags and reference numbers (tag/refs).

Hstartread        Locates an existing data element with matching tag/ref and returns an access ID for reading it
Hnextread         Continues the search with the same access ID
Hendaccess        Disposes of access ID for tag/ref
Hinquire          Returns access information about a data element
Hishdf            Determines whether a file is an HDF file
Hnumber           Returns the number of occurrences of a specified tag/ref in a file
Hgetlibversion    Returns version information for the current HDF library
Hgetfileversion   Returns version information for an HDF file
Reading and Writing Entire Data Elements

There are two sets of routines for reading and writing data elements.
The routines described here are used to store and retrieve entire data
elements.

Hputelement   Adds or replaces elements in a file
Hgetelement   Reads data elements in a file

A second set of routines, described in the next section, may be used if
you wish to access only part of a data element.
Reading and Writing Part of a Data Element

The second set of routines for reading and writing data elements makes
it possible to read or write all or part of a data element. One of the
access routines Hstartread or Hstartwrite must be called before
Hwrite, Hread, or Hseek can be used:

Hstartwrite   Sets up writing to the object with the supplied tag/ref. If the object exists, it will be modified; otherwise it will be created.
Hwrite        Writes data to a data element where the last write or Hseek stopped. If the space reserved is less than the length to write, then only as much as can fit is written.
Hread         Reads a portion of a data element. It starts at the last position left by an Hread or Hseek call and reads any data that remains in the element up to a specified number of bytes.
Hseek         Sets the access pointer to an offset within a data element. The next time Hread or Hwrite is called, the access occurs from the new position. The location to seek can be specified as an offset from the current location, from the start of the element, or from the end of the element.

Manipulating Data Descriptors (DDs)

These routines perform operations on DDs without doing anything with
the data to which the DDs refer:

Hdupdd    Generates new references to data that is already referenced from somewhere else
Hdeldd    Deletes a tag/ref from the list of DDs
Hnewref   Returns the next available reference number for the HDF file
Creating Special Data Elements
HDF 3.2 introduces two alternate methods of storing HDF objects:
linked blocks and external elements. In previous releases, any data
element had to be stored contiguously and all of the objects in an HDF
file had to be in the same physical file. The contiguous requirement
caused many problems, especially with regard to appending to existing
objects. If you wanted to append data to an object, the entire data
element had to be deleted and rewritten to the end of the file.
Linked blocks allow elements in a single HDF file to be noncontiguous.
External elements allow a single HDF object to be stored in an external
file.
It is not currently possible to store a single object (such as a very large
data set) in multiple files. Nor can multiple objects be stored in one
external file.
Once they are created with the following routines, these special data
elements can be accessed with the routines used for normal data
elements:
HLcreate   Creates a new linked block special data element
HXcreate   Creates a new external file special data element
These routines have two modes of operation. Calling HLcreate with a
tag/ref that does not exist in a file will create a new element with the
given tag/ref which will be stored as linked blocks. On the other hand,
if the tag/ref already exists in the file, the referenced object will be
promoted to linked block status. All data which had been stored in the
object before the promotion will be retained. HXcreate behaves
similarly.
Development Routines

The HDF library provides the following developer-level routines that
simplify the task of writing HDF applications. Most of these routines
mirror basic C library functions which are, unfortunately, not always
completely portable in their library form:

HDgettagname   Returns a pointer to a text string describing a given tag
HDgetspace     Allocates space
HDfreespace    Frees space
HDstrncpy      Copies a string from one location to another up to a given number of characters
Error Reporting
The HDF library incorporates the notion of an error stack. This allows
much of the context to be known when trying to decipher an error
message.
Error reporting is handled by the following routines:
HEprint    Prints out all of the errors on the error stack to a specified file
HEclear    Clears the error stack
HERROR     Reports an error. Pushes the following information onto the error stack: the error type, the source file name, and the line number and name of the function reporting the error
HEreport   Adds a text string to the description of the most recently reported error (only one text string per error)
Standard C does not enable the code inside a function to know the
name of the function. Therefore, to use the macro HERROR to report
errors, there must exist a variable FUNC which points to a string
containing the name of the reporting function.
Other

The Hsync routine has been defined and implemented to synchronize a
file with its image in memory. Currently it is not very useful because
the HDF software includes no buffering mechanism and the two images
are always identical. Hsync will become useful when buffering is
implemented:

Hsync   Synchronizes the stored version of an HDF file with the image in memory
Function Specifications
The terms IN: and OUT: are used as follows in this discussion:
IN:    Value as input parameter
OUT:   Value as output parameter
Opening and Closing Files

Hopen

int32 Hopen(char *path, int access, int16 ndds)

path     IN:   Name of file to be opened
access   IN:   DFACC_READ, DFACC_RDWR, DFACC_CREATE, DFACC_ALL, or DFACC_WRITE
ndds     IN:   Number of DDs in a block if this file needs to be created

Purpose
Provides an access path to an HDF file and reads all of the DD blocks in the file
into primary memory.

Return value
Returns file ID if successful and FAIL (-1) otherwise.

Description
Opens an HDF file.

The following events occur on successful exit:
• File_rec members are filled in. (File_rec is an internal HDF structure
  containing information about the opened file.)
• The requested file is opened with the relevant permission.
• Information about DDs is set up in memory.
• The file headers and initial information are set up for new files.

Access privilege codes
HDF provides several constants for use as access privilege codes as listed below.
Note that these constants are not bit-flags and should not be ORed together to
combine access modes. Doing so may cause odd behavior and, in some cases,
loss of data:

Recommended:
DFACC_READ     Open for read only. If file does not exist, error.
DFACC_RDWR     Open for read/write. If file does not exist, create it.
DFACC_CREATE   Force creation. If file exists, delete it, then open a new file for read/write (in the spirit of the UNIX System command clobber).

Others:
DFACC_ALL      Same as DFACC_RDWR (obsolete but still supported).
DFACC_WRITE    Same as DFACC_RDWR (obsolete but still supported).
Hclose
intn Hclose(int32 id)

id   IN:   The file ID of the file to be closed
Purpose
Closes the access path to the file.
Return value
Returns SUCCEED (0) if successful and FAIL (-1) otherwise.
Description
id is first validated. If valid, the function closes the access path to the file.
If there are still access elements attached to the file, the error DFE_OPENAID is
pushed onto the error stack and the file is not closed. This is a fairly common
error when developing new interfaces. See the discussion of Hendaccess below
for debugging hints.
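A minimal calling sequence might look like the following sketch, which assumes the declarations and types supplied by hdf.h; passing 0 for ndds is assumed here to select the compiled-in default number of DDs per block, and error handling is reduced to FAIL checks:

#include "hdf.h"

/* Sketch: open (or create) a file, work with it, and close it. */
{
    int32 file_id = Hopen("myfile.hdf", DFACC_RDWR, (int16)0);

    if (file_id != FAIL) {
        /* ... create and access data elements here ... */

        if (Hclose(file_id) == FAIL) {
            /* an access element is probably still attached;
               see Hendaccess and HEprint below */
        }
    }
}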
Locating Elements for Access and Getting Information
Hstartread
int32 Hstartread(int32 file_id, uint16 tag, uint16 ref)

file_id   IN:   ID of file to attach access element to
tag       IN:   Tag to search for
ref       IN:   Reference number to search for
Purpose
Locates an existing data element with matching tag/ref and returns an access ID
for reading it.
Return value
Returns access element ID if successful and FAIL (-1) otherwise.
Description
Searches the DDs for a particular tag/ref combination. If the search is successful,
an access element is created, attached to the file, and positioned at the start of
that data element; otherwise an error is returned. Searching on wildcards begins
from the beginning of the DD list. Wildcards can be used for the tag or reference
number (DFTAG_WILDCARD and DFREF_WILDCARD) and they match any values.
Hnextread
intn Hnextread(int32 access_id, uint16 tag, uint16 ref, int origin)

access_id   IN:   ID of a READ access element
tag         IN:   Tag to search for
ref         IN:   Reference number to search for
origin      IN:   Position at which to start searching
Purpose
Locates and positions a read access ID on next occurrence of tag/ref.
Return value
Returns SUCCEED (0) if successful and FAIL (-1) otherwise.
Description
Searches for the next DD that fits the tag/ref. Wildcards apply. If origin is
DF_START, searches from start of DD list; if origin is DF_CURRENT, searches
from current position. Searching from the end of the file via DF_END is not yet
implemented.
If the search is successful, then the access element is positioned at the start of
that tag/ref; otherwise, the access ID is not modified.
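For example, the following sketch steps through every data object in an open file by combining the wildcards described above with Hnextread and Hinquire; hdf.h is assumed to supply the types and constants, and error handling is omitted:

#include "hdf.h"

/* Sketch: visit every data element in a file (file_id from Hopen). */
void list_elements(int32 file_id)
{
    int32  aid, fid, length, offset, posn;
    uint16 tag, ref;
    int    acc_mode;
    int16  special;

    aid = Hstartread(file_id, DFTAG_WILDCARD, DFREF_WILDCARD);
    while (aid != FAIL) {
        Hinquire(aid, &fid, &tag, &ref, &length, &offset, &posn,
                 &acc_mode, &special);
        /* ... use tag, ref, and length here ... */
        if (Hnextread(aid, DFTAG_WILDCARD, DFREF_WILDCARD, DF_CURRENT)
                == FAIL) {
            Hendaccess(aid);   /* release the access element */
            aid = FAIL;        /* terminate the loop */
        }
    }
}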
Hstartwrite
int32 Hstartwrite(int32 file_id, uint16 tag, uint16 ref, int32 length)

file_id   IN:   ID of file to write to
tag       IN:   Tag to write to
ref       IN:   Reference number to write to
length    IN:   Length of the data element
Purpose
Creates or replaces data element with matching tag/ref.
Return value
Returns access element ID if successful and FAIL (-1) otherwise.
Description
Sets up an access element to write a data element. The DD list of the file is
searched first; if the tag/ref is found, the data element can be modified. If an
object with the corresponding tag/ref is not found, a new one is created.
Hendaccess
int32 Hendaccess(int access_id)

access_id   IN:   ID of access element to dispose of
Purpose
Disposes of access element for tag/ref.
Return value
Returns SUCCEED (0) if successful and FAIL (-1) otherwise.
Description
Disposes of an access element. Only a finite number of access elements can be
active at a given time, so it is important to call Hendaccess whenever you are
done using an element.
When developing new interfaces, a common mistake is to fail to call
Hendaccess for all of the elements accessed. When this happens, Hclose will
return FAIL and the dump of the error stack (see HEprint below) will tell how
many access elements are still active.
This can be a difficult problem to debug, as the low levels of the HDF library
have no idea who or what opened an access element and forgot to release it. A
tedious but effective means of debugging this problem is to annotate with
comments the locations where the attached count of a file record is changed.
This occurs in the files hfile.c, hblocks.c, and hextelt.c.
Hinquire
intn Hinquire(int32 access_id, int32 *pfile_id, uint16 *ptag,
              uint16 *pref, int32 *plength, int32 *poffset, int32 *pposn,
              int *paccess, int16 *pspecial)

access_id   IN:    Access element ID
pfile_id    OUT:   File ID
ptag        OUT:   Tag of the element pointed to
pref        OUT:   Reference number of the element pointed to
plength     OUT:   Length of the element pointed to
poffset     OUT:   Offset of element in the file
pposn       OUT:   Position pointed to within the data element
paccess     OUT:   Access type of this access element
pspecial    OUT:   Special code
Purpose
Returns access information for a data element.
Return value
Returns SUCCEED (0) if the access element points to some data element and
FAIL (-1) otherwise.
Description
Inquires for the statistics of the data element pointed to by the access element. If
a piece of information is not needed, a NULL can be sent in for that value.
Convenience macros for calls to Hinquire (HQueryposition, HQuerylength,
etc.) are defined in hdf.h.
Hishdf
int32 Hishdf(char *path)

path   IN:   Name of file
Purpose
Determines whether a file is an HDF file.
Return value
Returns TRUE (non-zero) if file is an HDF file and FALSE ( 0) otherwise.
Description
The decision as to whether a file is an HDF file is based solely on the magic
number stored in the first four bytes of an HDF file. Hishdf may sometimes
identify a file as an HDF file that Hopen is unable to open (e.g., an HDF file
with a corrupted DD list).
Note: Hishdf only determines whether a file is an HDF file. It does not verify that the file is readable.
Hnumber
int Hnumber(int32 file_id, uint16 tag)

file_id   IN:   File ID
tag       IN:   Tag to be counted
Purpose
Counts the number of occurrences of a tag in a file.
Return value
The number of occurrences of a tag in a file.
Hgetlibversion
Hgetlibversion(uint32 *majorv, uint32 *minorv, uint32 *release,
               char string[])

majorv    OUT:   Major version number
minorv    OUT:   Minor version number
release   OUT:   Release number
string    OUT:   Informational text string
Purpose
Gets version information for current HDF library.
Return value
Returns SUCCEED (0).
Description
Returns the version of the HDF library. The version information is compiled
into the HDF library, so it is not necessary to have any open files for this
function to execute.
Hgetfileversion
Hgetfileversion(uint32 file_id, uint32 *majorv, uint32 *minorv,
                uint32 *release, char *string)

file_id   IN:    File ID
majorv    OUT:   Major version number
minorv    OUT:   Minor version number
release   OUT:   Release number
string    OUT:   Informational text string

Purpose
Gets version information for an HDF file.

Return value
Returns SUCCEED (0) if successful and FAIL (-1) otherwise.

Description
Returns the HDF version information stored in the given file.
Reading and Writing Entire Data Elements
Hputelement
int Hputelement(int32 file_id, uint16 tag, uint16 ref, uint8 *data,
                int32 length)

file_id   IN:   File ID
tag       IN:   Tag of data element to put
ref       IN:   Reference number of data element to put
data      IN:   Pointer to buffer
length    IN:   Length of data
Purpose
Adds or replaces an element in a file.
Return value
Returns SUCCEED (0) if successful and FAIL (-1) otherwise.
Description
Writes a new data element or replaces an existing data element in an HDF file.
Uses Hwrite and its associated routines.
Hgetelement
int Hgetelement(int32 file_id, uint16 tag, uint16 ref, uint8 *data)

file_id   IN:    ID of the file to read from
tag       IN:    Tag of data element to read
ref       IN:    Reference number of data element to read
data      OUT:   Buffer to read into

Purpose
Obtains the data referred to by the passed tag/ref.

Return value
Returns SUCCEED (0) if successful and FAIL (-1) otherwise.

Description
Reads a data element from an HDF file and puts it into the buffer pointed to by
data. The space allocated for the buffer is assumed to be large enough.
Note: Hgetelement assumes that the buffer is large enough to hold the data being read. It is the user’s responsibility to prevent data loss by ensuring that this is the case.
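As an illustration, the following sketch stores a small element under a hypothetical tag from the user-definable range and reads it back; the tag value and buffer sizes are invented for the example, and hdf.h is assumed to supply the types:

#include "hdf.h"

#define MY_TAG ((uint16)40000)   /* invented tag from the user-definable range */

/* Sketch: write a 4-byte element and read it back (file_id from Hopen). */
static int round_trip(int32 file_id)
{
    uint8 out[4] = { 1, 2, 3, 4 };
    uint8 in[4];                  /* must be large enough for the element */

    if (Hputelement(file_id, MY_TAG, (uint16)1, out, (int32)4) == FAIL)
        return FAIL;
    return Hgetelement(file_id, MY_TAG, (uint16)1, in);
}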
Reading and Writing Part of a Data Element
Hread
int32 Hread(int32 access_id, int32 length, uint8 *data)

access_id   IN:    Read access element ID
length      IN:    Length of segment to read in
data        OUT:   Pointer to data array to read to
Purpose
Reads a portion of a data element.
Return value
Returns length of segment actually read if successful and FAIL (-1) otherwise.
Description
Reads in the next segment in the data element pointed to by the access element.
Hread starts at the last position left by an Hread or Hseek call and reads any
data that remains in the element up to length bytes. If the data element is too
short (less than length bytes long), Hread reads to the end of the data
element.
Hwrite
int32 Hwrite(int32 access_id, int32 length, uint8 *data)

access_id   IN:   Write access element ID
length      IN:   Length of segment to write
data        IN:   Pointer to data to write
Purpose
Writes next data segment to data element.
Return value
Returns length of segment successfully written and FAIL (-1) otherwise.
Description
Writes the data to the data element where the last Hwrite or Hseek stopped.
Hwrite starts at the last position left by an Hwrite or Hseek call, writes up to
a specified number of bytes, and leaves the write pointer at the end of the data
written. If the space reserved is less than the length to write, then only as much
as can fit is written.
It is the user’s responsibility to ensure that no two access elements are writing to
the same data element. Note that a user can interlace writes to multiple data
elements in the same file.
Hseek
intn Hseek(int32 access_id, int32 offset, int origin)

access_id   IN:   Access element ID
offset      IN:   Offset to seek to
origin      IN:   Position to seek from:
                  DF_START (0)     offset from beginning of data element
                  DF_CURRENT (1)   offset from current position
                  DF_END (2)       offset from end of data element
Purpose
Sets the access pointer to an offset within a data element. The next time Hread
or Hwrite is called, the read or write occurs from the new position.
Return value
Returns SUCCEED (0) if successful and FAIL (-1) otherwise.
Description
Sets the position of an access element in a data element so that the next Hread
or Hwrite will start from that position. origin determines the position from
which offset should be counted.
This routine fails if the access element is not associated with a data element or if
the position sought is outside of the data element.
Seeking from the end of a data element is not currently supported.
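Putting these routines together, a partial write might proceed as in the following sketch; MY_TAG, the element length, and the buffer contents are invented for the example, and the buffers are assumed to be filled in elsewhere:

#include "hdf.h"

#define MY_TAG ((uint16)40000)   /* invented tag */

/* Sketch: reserve a 200-byte element, write its first 100 bytes, then
   go back and overwrite bytes 20 through 29. */
static int32 partial_write(int32 file_id, uint8 *first, uint8 *patch)
{
    int32 aid = Hstartwrite(file_id, MY_TAG, (uint16)1, (int32)200);

    if (aid == FAIL)
        return FAIL;
    Hwrite(aid, (int32)100, first);     /* bytes 0 through 99          */
    Hseek(aid, (int32)20, DF_START);    /* seek back to byte 20        */
    Hwrite(aid, (int32)10, patch);      /* overwrite bytes 20-29       */
    return Hendaccess(aid);             /* always release the access ID */
}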
Manipulating Data Descriptors
Hdupdd
int Hdupdd(int32 file_id, uint16 tag, uint16 ref, uint16 old_tag,
           uint16 old_ref)

file_id   IN:   File ID
tag       IN:   Tag of new data descriptor
ref       IN:   Reference number of new data descriptor
old_tag   IN:   Tag of data descriptor to duplicate
old_ref   IN:   Reference number of data descriptor to duplicate
Purpose
Generates new references to data that is already referenced from somewhere
else.
Return value
Returns SUCCEED (0) if successful and FAIL (-1) otherwise.
Description
Duplicates a data descriptor so that the new tag/ref points to the same data
element pointed to by the old tag/ref.
Hdeldd
int Hdeldd(int32 file_id, uint16 tag, uint16 ref)

file_id   IN:   File ID
tag       IN:   Tag of data descriptor to delete
ref       IN:   Reference number of data descriptor to delete

Purpose
Deletes a tag/ref from the list of DDs.

Return value
Returns SUCCEED (0) if successful and FAIL (-1) otherwise.

Description
Deletes the data descriptor of tag/ref from the DD list of the file. This routine is
unsafe and may leave a file in a condition that is not usable by some routines.
Use with care.
Hnewref
uint16 Hnewref(int32 file_id)

file_id   IN:   File ID
Purpose
Returns the next available reference number.
Return value
Returns the reference number if successful and 0 otherwise.
Description
Returns a reference number that can be used with any tag to produce a unique
tag/ref. Successive calls to Hnewref will generate a strictly increasing sequence
until the highest possible reference number has been returned; then Hnewref
will return unused reference numbers starting from 1.
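For example (sketch only; MY_TAG is a hypothetical tag, and data and data_len are placeholders supplied by the caller):

#include "hdf.h"

#define MY_TAG ((uint16)40000)   /* invented tag */

/* Sketch: obtain an unused reference number and store a new element
   under it.  Returns the reference number used, or 0 on failure. */
static uint16 put_with_new_ref(int32 file_id, uint8 *data, int32 data_len)
{
    uint16 ref = Hnewref(file_id);

    if (ref == 0)
        return 0;
    if (Hputelement(file_id, MY_TAG, ref, data, data_len) == FAIL)
        return 0;
    return ref;
}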
Creating Special Data Elements
HLcreate
int32 HLcreate(int32 file_id, uint16 tag, uint16 ref,
int32 block_length, int32 number_blocks)
file_id         IN:   File ID
tag             IN:   Tag of new data element (or object)
ref             IN:   Reference number of new data element (or object)
block_length    IN:   Length of blocks to be used
number_blocks   IN:   Number of blocks to use per linked block record

Purpose
Creates a new linked block special data element.
Return value
Returns access ID for special data element if successful and FAIL (-1)
otherwise.
Description
Appending to existing HDF elements was a problem prior to HDF Version 3.2
because HDF objects had to be stored contiguously. When appending, the HDF
library forced the user to delete the existing element and rewrite it at the end of
the file. HDF Version 3.2 introduced the concept of linked blocks, which allow
unlimited appending to existing elements without copying over existing data.
This routine can be used to create an object with the given tag/ref as a linked
block element or to promote an existing element to be stored in linked blocks.
Initially, a table is set up to accommodate number_blocks linked blocks for the
specified data object. Each block has block_length bytes. If an existing object
is being promoted, block_length does not have to be the same size as the
original element.
HLcreate returns an active access ID with write permission to the linked block
element.
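The following sketch creates (or promotes) a linked block element and appends to it. The block sizes are arbitrary choices, MY_TAG is a hypothetical tag, and more_data and more_len are placeholders; because seeking from DF_END is not supported, the sketch obtains the current length with Hinquire and seeks from DF_START:

#include "hdf.h"

#define MY_TAG ((uint16)40000)   /* invented tag */

/* Sketch: store an element as linked blocks and append to it. */
static int32 append_linked(int32 file_id, uint8 *more_data, int32 more_len)
{
    int32  aid, fid, length, offset, posn;
    uint16 tag, ref;
    int    acc_mode;
    int16  special;

    /* 4096-byte blocks, 16 per linked block record: arbitrary choices */
    aid = HLcreate(file_id, MY_TAG, (uint16)1, (int32)4096, (int32)16);
    if (aid == FAIL)
        return FAIL;

    /* Find the current length, then position there to append. */
    Hinquire(aid, &fid, &tag, &ref, &length, &offset, &posn,
             &acc_mode, &special);
    Hseek(aid, length, DF_START);
    Hwrite(aid, more_len, more_data);   /* append without rewriting */
    return Hendaccess(aid);
}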
HXcreate
int32 HXcreate(int32 file_id, uint16 tag, uint16 ref,
char *extern_file_name)
file_id            IN:   File record ID
tag                IN:   Tag of the special data element to create or promote
ref                IN:   Reference number of the special data element to create or promote
extern_file_name   IN:   Name of the external file to use for the data element
Purpose
Creates a new external file special data element.
Return value
Returns access ID for special data element if successful and FAIL ( -1)
otherwise.
Description
Creates a new element in an external file or promotes an existing element to be
stored in an external file. If an existing element is to be promoted, it is deleted
(using Hdeldd) from the original file and copied into the new external file.
Distributing a single object over multiple external files is not currently
supported. In addition, one cannot place multiple objects in the same external
file.
This routine returns an active access ID with write permission to the external
element.
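Creating an external element follows the same pattern (sketch only; the external file name and MY_TAG are invented, and data and data_len are placeholders):

#include "hdf.h"

#define MY_TAG ((uint16)40000)   /* invented tag */

/* Sketch: place an element's data in the external file bigarray.dat. */
static int32 write_external(int32 file_id, uint8 *data, int32 data_len)
{
    int32 aid = HXcreate(file_id, MY_TAG, (uint16)1, "bigarray.dat");

    if (aid == FAIL)
        return FAIL;
    Hwrite(aid, data_len, data);
    return Hendaccess(aid);
}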
Development Routines
HDgettagname
char *HDgettagname(uint16 tag)
tag   IN:   Tag to look up
Purpose
Gets a meaningful description of a tag.
Return value
Returns a pointer to a string describing this tag or NULL if the tag is unknown.
Description
To reduce the amount of duplicated code, this routine can be used to map a tag
to a character string containing the name of the tag.
The string returned by this routine is guaranteed to be 30 characters or less.
HDgetspace
void *HDgetspace(uint32 qty)
qty   IN:   Number of bytes to allocate
Purpose
Allocates space.
Return value
If successful, returns a pointer to space that was allocated; otherwise returns
NULL.
Description
Uses an appropriate allocation routine on the local machine to get space.
HDfreespace
void *HDfreespace(void *ptr)

ptr   IN:   Pointer to previously-allocated space that is to be freed

Purpose
Frees space.

Return value
Returns NULL.

Description
Uses an appropriate routine on the local machine to free space. This routine is
platform dependent.
HDstrncpy
char *HDstrncpy(register char *dest, register char *source,
                int32 length)

dest     OUT:   Pointer to area to copy string to
source   IN:    Pointer to area to copy string from
length   IN:    Maximum number of bytes to copy
Purpose
Copies a string with maximum length length.
Return value
Returns address of dest.
Description
Creates a string in dest that is at most length characters long. The number of
characters must include the NULL terminator for historical reasons. Hence, if
you are working with the string Foo, you must call this copy function with the
value 4 (three characters plus the NULL terminator) in length.
Error Reporting
HEprint
void HEprint(FILE *stream, int32 level)
stream   IN:   Stream to print error messages on
level    IN:   Level of the error stack to print
Purpose
Prints information on the error stack.
Return value
Has no return value.
Description
Prints information on reported errors. If level is zero, all of the errors
currently on the error stack are printed. Output from this function is sent to the
file pointed to by stream.
The following information is printed:
• An ASCII description of the error
• The reporting routine
• The reporting routine’s source file name
• The line at which the error was reported
If the programmer has supplied extra information by means of HEreport, this
information is printed as well.
HEclear
void HEclear(void)
Purpose
Clears all information on reported errors off of the error stack.
Return value
Has no return value.
Description
Clears all of the information off of the error stack.
HERROR
void HERROR(int16 number)
number   IN:   Error number
Purpose
Reports an error.
Return value
Has no return value.
Description
Reports an error. Any function calling HERROR must have a variable FUNC
which points to a string containing the name of the function.
HERROR is implemented as a macro.
HEreport
void HEreport(char *format, ...)

format   IN:   printf-style format and arguments
Purpose
Provides extra information to the error reporting routines.
Return value
Has no return value.
Description
Provides further annotation to an error report. Only one such annotation is
remembered for each error report. The arguments to this routine follow the style
of printf.
Consider the following example from hfile.c:
char *FUNC = "Hclose";
...
if (file_rec->attach > 0) {
    file_rec->refcount++;
    HERROR(DFE_OPENAID);
    HEreport("There are still %d active aids attached", file_rec->attach);
    return FAIL;
}
Other
Hsync
int Hsync(int32 file_id)

file_id   IN:   ID of the file to synchronize

Purpose
Synchronizes on-disk HDF file with image in memory.

Return value
Returns SUCCEED.

Description
Hsync is not included in the current HDF library release because the on-disk
representation of an HDF file is always the same as its in-memory
representation. Hsync will be provided when future releases implement buffering
schemes.
Chapter 4  Sets and Groups
Chapter Overview
This chapter discusses the roles of the following sets and groups in
organizing data stored in an HDF file:
• Raster image sets (RIS)
      Raster image groups (RIG)
• Scientific data sets (SDS)
      Scientific data groups (SDG)
      Numeric data groups (NDG)
      SDG-like NDGs
• Vsets
      Vgroups
• Raster-8 sets (obsolete)
This chapter introduces several tags used in support of sets and groups.
All of these tags are fully described in Chapter 6, “Tag Specifications,”
and are listed in the table in Appendix A, “Tags and Extended Tag Labels.”
Data Sets
HDF files frequently contain several closely related data objects. Taken
together, these objects form a data set which serves a particular user
requirement. For example, five or six data objects might be used to
describe a raster image; eight or more data objects might be used to
describe the results of a scientific experiment.
The HDF mechanism for specifying and controlling data sets is the
group. The data element of a group consists of a single record listing
the tag/refs for all the objects contained in the data set. For example,
the raster image groups described in the following sections each contain
three tag/refs that point to three data objects that, taken as a set, fully
describe an 8-bit raster image.
Types of Sets
The current HDF implementation supports three kinds of sets:
Raster image set
A set containing a raster image and descriptive information such
as the image dimensions and an optional color lookup table
Scientific data set
A set containing a multidimensional array and information
describing the data in the array
Vset
A general grouping structure containing any kinds of HDF
objects that a user wishes to include
Each HDF set is defined with a minimum collection of data objects that
will make sense when the set is used. For example, every raster image
set must contain at least the following data objects:
Raster image group
The list of the members of the set
Image dimension record
The width, height, and pixel size of the raster image
Raster image data
The pixel values that make up the image
In addition to the required objects, a set may include optional data
objects. An 8-bit raster image set, for instance, often contains a palette,
or color lookup table, which defines the red, green, and blue values
associated with each pixel in the raster image.
Calling Interfaces for Sets
NCSA provides calling interfaces for all the HDF sets that it supports.
These interfaces provide routines for reading and writing the data
associated with each set. The libraries currently supported by NCSA are
callable from either C or FORTRAN programs.
In addition to the libraries, a growing number of command-line utilities
are available to manipulate sets. For example, a utility called r8tohdf
converts one or more raw raster images to HDF 8-bit raster image set
format.
The calling interfaces are described in the document NCSA HDF
Calling Interfaces and Utilities for Versions 3.2 and earlier and in the
NCSA HDF User’s Guide and NCSA HDF Reference Manual for
Version 3.3.
Groups
As discussed above, HDF data objects are frequently associated as sets.
But without some explicit identifying mechanism, there is often no
way to tie them together. To address this problem, HDF provides a
grouping mechanism called a group. A group is a data object that
explicitly identifies all of the data objects in a set.
Since a group is just another type of data object, its structure is like
that of any other data object; it includes a DD and a data element. But
instead of containing the pixel values for a raster image or the
dimensions of an array, a group data element contains a list of tag/refs
for the data objects that make up the corresponding set.
A group tag can be defined for any set. For instance, the raster image
group tag (RIG, DFTAG_RIG) is used to identify members of raster
image sets; the RIG data element lists the tag/refs for a particular raster
image set.
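In implementation terms, a group data element is nothing more than an array of tag/ref pairs. The following minimal C sketch shows one way to model it; the type names are invented here and are not part of the HDF library.

#include <stdint.h>

/* One member of a set: the tag and reference number of a data object. */
typedef struct {
    uint16_t tag;  /* e.g., DFTAG_ID */
    uint16_t ref;  /* reference number of the member */
} TagRef;

/* A group data element, viewed as a flat list of its members. */
typedef struct {
    int     n_members;  /* number of tag/ref pairs in the data element */
    TagRef *members;    /* the members of the set, in file order */
} GroupElement;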
An Example
Suppose that the two images shown in Figure 1.5, “Physical Representation of Data Objects,” are organized into two sets with group tags. Since they are raster images, they may be stored as RIGs. Figure 4.1 illustrates the use of RIGs with these images.

Figure 4.1    Physical Organization of Sample RIG Groupings
(DD entries are shown as tag, reference number, offset, length.)

Offset    Item    Contents
0         FH      0e031301 (HDF magic number)
4         DDH     10, 0L
10        DD      DFTAG_FID, 1, 130, 4
22        DD      DFTAG_FD, 1, 134, 41
34        DD      DFTAG_IP8, 1, 175, 768
46        DD      DFTAG_ID, 1, 943, 4
58        DD      DFTAG_RI, 1, 947, 240000
70        DD      DFTAG_ID, 2, 240947, 4
82        DD      DFTAG_RI, 2, 240951, 240000
94        DD      DFTAG_RIG, 1, 480951, 12
106       DD      DFTAG_RIG, 2, 480963, 12
118       DD      DFTAG_NULL (Empty)
130       Data    sw3
134       Data    solar wind simulation: third try. 8/8/88
175       Data    ... (Data for image palette)
943       Data    400, 600 ... (Data for 1st image dimension record)
947       Data    ... (Data for 1st raster image)
240947    Data    400, 600 ... (Data for 2nd image dimension record)
240951    Data    ... (Data for 2nd raster image)
480951    Data    DFTAG_IP8/1, DFTAG_ID/1, DFTAG_RI/1 (Tag/refs for 1st RIG)
480963    Data    DFTAG_IP8/1, DFTAG_ID/2, DFTAG_RI/2 (Tag/refs for 2nd RIG)
The file depicted in Figure 4.1 contains the same raster image
information as the file in Figure 1.5, but the information is organized
into two sets. Note that there is only one palette (DFTAG_IP8/1) and
that it is included in both groups.
General Features of Groups
Figure 4.1 also illustrates a number of important general features of
groups:
• The contents of a group must be consistent with one another. Since the palette (DFTAG_IP8) is designed for use with 8-bit images, the image must be an 8-bit image.
• An application program can easily process all of the images in the file by accessing the groups in the file. The non-RIG information in the example can be used or ignored, depending on the needs and capabilities of the application program.
• There is usually more than one way to group sets. For example, an extra copy of the image palette (DFTAG_IP8) could have been stored in the file so that each grouping would have its own image palette. That is not necessary in this instance because the same palette is to be used with both images. On the other hand, there are two image dimension records in this example, even though one would suffice.
• Group status does not alter the fundamental role of an HDF object; it is still accessible as an individual data object despite the fact that it also belongs to a larger set.
• A group provides an index of the members of a set. There is nothing to prevent the imposition of other groupings (indexes) that provide a different view of the same collection of data objects. In fact, HDF is designed to encourage the addition of alternate views.
The following sections formally describe raster image sets (RIS),
scientific data sets (SDS), Vsets, and several related groups. The last
section of this chapter discusses an obsolete structure known as the
raster-8 set.
Raster Image Sets (RIS)
The raster image set (RIS) provides a framework for storing images and
any number of optional image descriptors. An RIS always contains a
description of the image data layout and the image data. It may also
contain color look-up tables, aspect ratio information, color correction
information, associated matte or other overlay information, and any
other data related to the display of the image.
Raster Image Groups (RIG)
Tying everything together is the raster image group (RIG, see Figure
4.1 and the related discussion for an example). An RIG contains a list
of tag/refs that point in turn to the data objects that make up and
describe the image.
The number of entries in an RIG is variable and most of the descriptive
information is optional. Complex applications may include references
to image-modifying data, such as the color table and aspect ratio, along
with the reference to the image data itself. Simple applications may use
simple application-level calls and ignore specialized video production or
film color correction parameters.
NCSA currently supports two RIG calling interfaces: RIS8 and RIS24.
These interfaces are described in the document NCSA HDF Calling
Interfaces and Utilities for Versions 3.2 and earlier and in the NCSA
HDF User’s Guide and NCSA HDF Reference Manual for Version 3.3.
RIS Tags
RIS implementations must fully support all of the tags presented in Table 4.1.

Table 4.1    RIS Tags
Tag          Contents of Data Element
DFTAG_RIG    Raster image group
DFTAG_ID     Image dimension record
DFTAG_RI     Raster image data
With these tags, images can be stored and read from HDF files at any
bit depth, with several different component ordering schemes. As
illustrated in Figure 4.1, the RIG tag points to the collection of tag/refs
that fully describe the RIS. The data element attached to the tag
DFTAG_ID specifies the dimensions of the image, the number type of
the elements that make up its pixels, the number of elements per pixel,
the interlace scheme used, and the compression scheme used, if any.
The data element attached to the tag DFTAG_RI contains the actual raster
image data.
Figure 4.1    RIS Tags
[Diagram: the DD list contains the tag/refs RIG/1, ID/1, and RI/1. The RIG/1 data element lists the tag/refs ID/1, RI/1, and IP/1; the ID/1 data element holds the dimensions (200 x 300, etc.); the RI/1 data element holds the raster image data.]
The tags listed in Table 4.2 identify optional RIS information such as
color properties and aspect ratio. Note that the RI interface supports
only DFTAG_LUT at this time; the other tags in Table 4.2 are defined
but the interfaces have not been implemented.
Table 4.2
Optional RIS Tags
Tag
Contents of Data Element
DFTAG_XYP
XY position of image
Look-up table dimension record
Color look-up table for non true-color images
Matte channel dimension record
Matte channel data
Color correction factors
Color format designation
Aspect ratio
Machine-type override
DFTAG_LD
DFTAG_LUT
DFTAG_MD
DFTAG_MA
DFTAG_CCN
DFTAG_CFM
DFTAG_AR
DFTAG_MTO
Figure 4.2 illustrates the structure of an RIS that contains an image
palette (DFTAG_IP8).
Figure 4.2    RIS Tags for Sets Containing a Palette
[Diagram: the DD list contains the tag/refs RIG/1, ID/1, RI/1, and IP8/1. The RIG/1 data element lists the tag/refs ID/1, RI/1, and IP8/1; the ID/1 data element holds the dimensions (200 x 300, etc.).]

Raster Image Compression
HDF currently supports three raster image compression tags:
DFTAG_RLE       Run-length encoding
DFTAG_IMCOMP    Aerial averaging
DFTAG_JPEG      JPEG compression
RIG support does not require support for all compression tags. Be sure
to provide a suitable error message to the user when an unknown
compression tag is encountered.
Since new forms of data compression can be added to HDF raster
images, incompatibilities can arise between old libraries and files
created by newer libraries. For example, HDF Version 3.3 includes
JPEG compression for images. A JPEG-compressed raster image in a
file created by an HDF Version 3.3 library cannot be read by an HDF
Version 3.2 library.
Scientific Data Sets
The scientific data set (SDS) provides a framework for storing
multidimensional arrays of data with descriptive information that
enhances the data. Current specifications support the following types of
numbers in SDS arrays.
• 8-bit, 16-bit, and 32-bit signed and unsigned integers
• 32-bit and 64-bit floating point numbers
Data in an SDS can be stored either as two's complement big endian
integers, as IEEE Standard floating point numbers, or in native mode,
the format used by the machine from which they were written.
The user interface for storing and retrieving SDSs is fully described in
the document NCSA HDF Calling Interfaces and Utilities for Versions
3.2 and earlier and in the NCSA HDF User’s Guide and NCSA HDF
Reference Manual for Version 3.3.
Backward and forward compatibility
One of NCSA’s concerns in HDF development is always to maximize
backward and forward compatibility; as much as possible, any
application written to use HDF should be able to read data files written
with an older or a newer version of the libraries. To maximize this
compatibility, NCSA had to consider the following factors in upgrading
the SDS capabilities:
• Support for future variations (e.g., new number types, data compression, and new physical arrangements for SDS storage)
• Older versions of the library should be able to read new data files if the data itself can be interpreted by the older version. To do so, the older version must be able to determine whether the data in a given data object will be comprehensible to it. For example, if a newly created file contains 32-bit IEEE floating point or Cray floating point data objects, older versions of the library should be able to determine that fact and then read and interpret the data.
• New libraries must be able to read and interpret files created by older versions.
Unfortunately, such compatibility concerns yield an SDS structure
somewhat more complex than would otherwise be the case. Two
examples illustrate the problem:
• HDF 3.2 development had to accommodate the fact that HDF Version 3.1 and previous versions only supported 32-bit IEEE floating-point numbers and Cray floating-point numbers in SDSs. SDSs in HDF versions since Version 3.2 support 8-bit, 16-bit, and 32-bit signed and unsigned integers; 32-bit and 64-bit floating-point numbers; and the local machine format (native mode) for all supported architectures.
• HDF 3.3 includes support for the netCDF data model, which involved the creation of an entirely new structure for supporting netCDF objects, based on Vgroups and Vdatas. At the same time, a goal of HDF 3.3 was to harmonize the SDS and the netCDF data model, which was best accomplished by storing SDS objects in the same way that netCDF objects are stored. In order to maintain backward compatibility, two structures had to be created for every SDS or netCDF object: one that could be recognized by older HDF libraries, and the new structure.
In the following sections we describe how the first problem was solved.
A later issue of this manual will describe how the second problem was
addressed.
Internal Structures
The SDS capability was substantially enhanced for HDF Version 3.2.
Previous versions employed a structure known as a scientific data group
(SDG); Version 3.2 and subsequent versions use the numeric data group
(NDG). To accommodate the enhanced structure and to remain
compatible with previous releases, the current HDF library supports the
following scientific and numerical data groups:
SDGs
Created by old libraries and containing 32-bit IEEE and Cray floating-point data.
NDGs
Created by the newer libraries (Version 3.2 and later) and containing any acceptable floating-point or non-floating-point data. This data group will not be recognized by old libraries.
SDG-like NDGs
Created by the new library and containing IEEE 32-bit floating-point data only. The old libraries will recognize and interpret these numerical data groups correctly.
The NDG structure supports 8-bit, 16-bit, and 32-bit signed and
unsigned integers, and 32-bit and 64-bit floating-point numbers. It also
supports native mode, data sets written to HDF files in the local
machine format.
The following sections describe the SDG, NDG, and SDG-like NDG
structures.
SDG Structures
SDGs must contain at least the data objects listed in Table 4.3.

Table 4.3    Required SDG Tags
Tag          Contents of Data Element
DFTAG_SDG    Scientific data group.
DFTAG_SDD    Dimension record for array-stored data. Includes the rank (number of dimensions), the size of each dimension, and the tag/refs representing the number type of the array data and of each dimension. All SDG number types are 32-bit IEEE floating-point.
DFTAG_SD     Scientific data.
In addition to the required data objects listed above, SDGs may contain
any of the objects listed in Table 4.4. Note that the optional data
objects are the same for SDGs, NDGs, and SDG-like NDGs; the only
differences are the number types that may be used.
Table 4.4    Optional SDG, NDG, and SDG-like NDG Tags
Tag          Contents of Data Element
DFTAG_SDS    Scales of the different dimensions. To be used when interpreting or displaying the data (32-bit floating point numbers only for SDGs and SDG-like NDGs).
DFTAG_SDL    Labels for all dimensions and for the data. Each of the dimension labels can be interpreted as an independent variable; the data label is the dependent variable.
DFTAG_SDU    Units for all dimensions and for the data.
DFTAG_SDF    Format specifications to be used when displaying values of the data.
DFTAG_SDM    Maximum and minimum values of the data. (32-bit floating point numbers only for SDGs and SDG-like NDGs.)
DFTAG_SDC    Coordinate system to be used when interpreting or displaying the data.
As illustrated in Figure 4.3, the SDG tag points to the collection of
tag/refs that define the SDG.
Figure 4.3    SDG Structure
[Diagram: the DD list contains SDG/1, SDD/1, SD/1, and SDM/1. The SDG/1 data element lists the tag/refs SDD/1, SD/1, and SDM/1; SDD/1 holds the dimension record (54 x 60, etc.); SD/1 holds the data array (2.3, 2.5, ... 4.5, 4.8, ...); SDM/1 holds the maximum (11.6) and minimum (0.2).]

NDG Structures
NDGs must contain at least the data objects listed in Table 4.5.

Table 4.5    Required NDG Tags
Tag          Contents of Data Element
DFTAG_NDG    Numerical data group.
DFTAG_SDD    Dimension record for array-stored data. Includes the rank (number of dimensions), the size of each dimension, and the tag/refs representing the number types of the data and of each dimension. In HDF 3.2, the number types of dimension scales must be the same as that of the array-stored data; later implementations allow dimension scales to be typed separately.
DFTAG_SD     Scientific data.
DFTAG_NT     Number type of the data set. Default is the most recent DFSDsetNT() setting. If DFSDsetNT() has not been called, the default will be 32-bit IEEE floating-point.
In addition to these required data objects, an NDG may contain any of the data objects listed in Table 4.4, “Optional SDG, NDG, and SDG-like NDG Tags.”
As illustrated in Figure 4.4, the basic NDG and SDG structures are
identical. The first clue to the difference is that the NDG tag replaces
the SDG tag. This is a flag to prevent older libraries from stumbling
over the more important difference; the NDG data element can
accommodate data that pre-Version 3.2 libraries cannot interpret. The
new tag ensures that older libraries will not recognize the data object
and thus will not try to interpret the new data types. For example,
NDG data can include number types or a data compression scheme that
a pre-Version 3.2 library will not recognize.
Figure 4.4    NDG Structure
[Diagram: identical in layout to Figure 4.3, except that NDG/1 appears in the DD list in place of SDG/1. The NDG/1 data element lists the tag/refs SDD/1, SD/1, and SDM/1; SDD/1 holds the dimension record (54 x 60, etc.); SD/1 holds the data array; SDM/1 holds the maximum (11.6) and minimum (0.2).]
SDG-like NDG Structures
As we have said earlier,
• SDGs, the SDS grouping structure available prior to HDF Version
3.2, could include only 32-bit floating point and Cray floating point
numbers.
• NDGs, available since Version 3.2, can include 8-bit, 16-bit, and
32-bit signed and unsigned integers, and 32-bit and 64-bit floating
point numbers.
• SDG-like NDGs, also available since Version 3.2, distinguish
SDSs that can still be read by the older versions of the library.
This backward compatibility is achieved by examining every SDS that is written to an HDF file. If the SDS is compatible with older libraries, it is written to the file with both SDG and NDG structures. If it is not compatible with older libraries, only the NDG structure is used.
Table 4.6 lists the objects that SDG-like NDGs must contain.

Table 4.6    Required SDG-like NDG Tags
Tag            Contents of Data Element
DFTAG_NDG      Numerical data group.
DFTAG_SDG      Scientific data group.
DFTAG_SDLNK    The NDG and SDG linked to the scientific data set in this group.
DFTAG_SDD      Dimension record for array-stored data. Includes the rank (number of dimensions), the size of each dimension, and the tag/refs representing the number types of the data and of each dimension. In an SDG-like NDG, the number types are all 32-bit IEEE floating-point.
DFTAG_SD       Scientific data.
SDG-like NDGs can include the same optional data objects as described for SDGs and NDGs in Table 4.4, “Optional SDG, NDG, and SDG-like NDG Tags.”
Figure 4.5 illustrates the SDG-like NDG structure.
Figure 4.5    SDG-like NDG Structure
[Diagram: the DD list contains SDG/1, NDG/1, SDLNK/1, SDD/1, SD/1, and SDM/1. The SDG/1 and NDG/1 data elements each list the tag/refs SDD/1, SD/1, and SDM/1; SDLNK/1 links NDG/1 and SDG/1; SDD/1 holds the dimension record (54 x 60, etc.); SD/1 holds the data array; SDM/1 holds the maximum (11.6) and minimum (0.2).]

Compatibility with Future NDG Structures
Future HDF releases will probably support additional optional SDS
features. These features will fall into the following categories:
Optional and compatible features
Optional features that are compatible with older HDF versions even though they may not be supported in the older libraries. For example, a new time stamp attribute might be added. The time stamp would not be understood by older libraries, but it would not render them unable to read the SDS data either.
Optional and incompatible features
Optional new features that may render the data unreadable by older HDF libraries. For example, a compression attribute could be added. Older HDF libraries that contain no compression routines would not be able to read the compressed data.
A tag numbering convention has been developed to address this
problem:
Required tags
These tags are listed in Table 4.3, “Required SDG Tags,” Table
4.5, “Required NDG Tags,” and Table 4.6, “Required SDG-like
NDG Tags.” All SDSs must contain all of the tags in at least
one of these sets. (See Chapter 6, “Tag Specifications,” for the
assigned tag numbers.)
Optional-incompatible tags
Tags for new SDS features that might render the data set
unreadable by older libraries are each assigned a number t that
falls in a special range determined by the constants
DFTAG_EREQ and DFTAG_BREQ. That is, t must have a
value such that DFTAG_EREQ < t < DFTAG_BREQ.
When old software encounters a tag in this range that it is not
able to interpret, it should not process the group.
Optional-compatible tags
These tags can have any valid tag number not allocated to one
of the other two categories.
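The range test itself is trivial. A hedged C sketch follows; the function name is invented, and the two constants are defined in the HDF headers, not here.

#include <stdint.h>

/* DFTAG_EREQ and DFTAG_BREQ are declared extern only for illustration;
   the real constants come from the HDF headers. */
extern const uint16_t DFTAG_EREQ, DFTAG_BREQ;

/* Nonzero means the tag falls in the optional-incompatible range,
   so older software should not process the group containing it. */
int tag_blocks_old_readers(uint16_t t)
{
    return t > DFTAG_EREQ && t < DFTAG_BREQ;
}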
Vsets, Vdatas, and Vgroups
Vsets, Vdatas, and Vgroups enable users to create their own grouping structures. Unlike RIGs, SDGs, and NDGs, Vsets have no structure required by HDF; they are implemented almost entirely at the user level and are not specified in detail in HDF or in this document.* The only specifications define DFTAG_VG, DFTAG_VH, and DFTAG_VS and the formats of their respective data elements. A detailed discussion similar to that for the other grouping structures is, therefore, inappropriate here.
Detailed information regarding the DFTAG_VG, DFTAG_VH, and
DFTAG_VS tags can be found in Chapter 6, “Tag Specifications.”
Conceptual and usage information can be found in the document NCSA
HDF Vset Version 2.0 for HDF Versions 3.2 and earlier and in the
NCSA HDF User’s Guide and the NCSA HDF Reference Manual for
HDF Version 3.3.
Figure 4.6    Illustration of a Vset
[Diagram: a vgroup whose members include a text block (“March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...”), a palette, a vdata of numeric records, raster images, and a 3D mesh.]
An HDF Vset can contain any logical grouping of HDF data objects
within an HDF file. Vsets resemble the UNIX file system in that they
impose a basically hierarchical structure but also allow cross-linked data
objects. Unlike SDSs and RISs, Vsets have no prespecified content or
structure; users can use them to create structural relationships among
HDF objects according to their needs. Figure 4.6 illustrates a Vset.
A Vset is identified by a Vgroup, an HDF object that contains
information about the members of the Vset. The tag DFTAG_VG
identifies the Vgroup, which contains the tag/refs of its members, an optional user-specified name, an optional user-specified class, and fields that enable the Vgroup to be extended to contain more information.

* Specialists in various fields are developing application program interfaces (APIs) that are becoming accepted standard interfaces within their fields. Since these APIs are implemented with high-level HDF functionality and using the standard HDF user interface, they are user-level applications from the HDF development team’s point of view. From the final end-user’s point of view, however, these APIs create a new level of user interface. When necessary, technical specifications for these APIs and the associated interfaces will be presented by the specialized developers.
The only required Vgroup tag is the tag that defines the Vgroup itself.
Table 4.7    The Vgroup Tag
Tag         Contents of Data Element
DFTAG_VG    Vgroup
Vgroups are fully described in the document NCSA HDF Vset, Version
2.0 for Versions 3.2 and earlier and in the NCSA HDF User’s Guide
and NCSA HDF Reference Manual for Version 3.3.
The Raster-8 Set (Obsolete)
Current HDF versions use the raster image set (RIS) to manage raster
images. But before the RIS was implemented, a simpler, less flexible
set called the raster-8 set was used for storing 8-bit raster images. This
set is no longer supported in the HDF software, although it may turn
up in some older HDF files.*
Raster-8 Sets
The raster-8 set is defined by a set of tags that provide the basic information necessary to store 8-bit raster images and display them accurately without requiring the user to supply dimensions or color information. The raster-8 set tags are listed in Table 4.9.

Table 4.9    Raster-8 Set Tags
Tag          Contents of Data Element
DFTAG_RI8    8-bit raster image data
DFTAG_CI8    8-bit raster image data compressed with run-length encoding
DFTAG_II8    IMCOMP compressed image data
DFTAG_ID8    Image dimension record
DFTAG_IP8    Image palette data
Software that does not support DFTAG_CI8 or DFTAG_II8 must
provide appropriate error indicators to higher layers that might expect to
find these tags.
Compatibility Between Raster-8 and Raster Image Sets
To maintain backward compatibility with raster-8 sets, the RIS interface stores tag/refs for both types of sets. For example, if an image is stored as part of a raster image set, there is one copy each of the image dimension data, the image data, and the palette data, but there are two sets of tag/refs pointing to each data element: one for the RIS and one for the raster-8 set. The image data, for example, is associated with the tags DFTAG_RI8 and DFTAG_RI.
Note: Raster-8 set support will not be maintained in future HDF releases.

Note that future HDF releases will phase out support for the raster-8 set. Therefore, new software should not expect to find both raster-8 and RIS structures supporting 8-bit raster images. Eventually, only RIS structures will be supported.
*
In fact, during the first three years that RIS was used, the HDF software stored raster images in both RIS and raster-8 sets.
Chapter 5
Annotations
Chapter Overview
This chapter introduces annotations, HDF data objects used to annotate
HDF files and objects.
The tags introduced in this chapter are fully described in Chapter 6,
“Tag Specifications,” and are listed in the table in Appendix A, “Tags
and Extended Tag Labels.”
General Description
It is often useful to attach a text annotation to an HDF file or its
contents and to store that annotation in the same HDF file. HDF
provides this capability through the annotation data object.
The data element of an annotation is a sequence of ASCII characters that
can be associated with any of three types of objects:
• The file itself
• An individual HDF data object in the file
• A tag that identifies a data element
The current annotation interface supports only the first two.
Annotations come in two forms:
Label
A short, NULL-terminated string. Labels may not include embedded NULLs.
Description
A longer and more complex body of text of a pre-defined length. Descriptions may contain embedded NULLs.
Annotations are never required; they are used strictly at the discretion of
the creator or user of an HDF file.
Table 5.1 shows the currently defined annotation types and their
assigned tags.
Table 5.1    Annotation Tags
                      Label Types    Description Types
File annotations      DFTAG_FID      DFTAG_FD
Object annotations    DFTAG_DIL      DFTAG_DIA
Tag annotations       DFTAG_TID      DFTAG_TD
The annotation interface is fully described in the document NCSA HDF Calling Interfaces and Utilities for Versions 3.2 and earlier and in the NCSA HDF User’s Guide and NCSA HDF Reference Manual for Version 3.3.
File Annotations
Any HDF file can include label annotations (DFTAG_FID) and/or
description annotations (DFTAG_FD). The file annotation interface
routines provided in the HDF software read and write file labels and file
descriptions.
Object Annotations
HDF data object annotation is complicated by the fact that you must
uniquely identify the object being annotated. Since a tag/ref uniquely
identifies a data object, the data object that a particular annotation refers
to can be identified by storing the object's tag and reference number
with the annotation.
Note that an HDF annotation is itself a data object, so it has its own
DD. This DD has a tag/ref that points to the data element containing
the annotation. The annotation data element contains the following
information:
• The tag of the annotated object
• The reference number of the annotated object
• The annotation itself
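The following C sketch shows one plausible way to assemble such a data element. The function name and the big-endian packing are assumptions of this sketch, not a documented library routine.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assemble a DFTAG_DIA data element: the annotated object's 16-bit tag,
   its 16-bit reference number, then the annotation text (no NULL
   terminator). Returns the total length of the element. */
size_t build_dia_element(uint8_t *buf, uint16_t obj_tag, uint16_t obj_ref,
                         const char *text, size_t text_len)
{
    buf[0] = (uint8_t)(obj_tag >> 8);   /* tag of the annotated object */
    buf[1] = (uint8_t)(obj_tag & 0xFF);
    buf[2] = (uint8_t)(obj_ref >> 8);   /* ref of the annotated object */
    buf[3] = (uint8_t)(obj_ref & 0xFF);
    memcpy(buf + 4, text, text_len);    /* the annotation itself */
    return 4 + text_len;
}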
For example, suppose you have an HDF file that contains three scientific data sets (SDSs). Each SDS has its own DD consisting of the tag DFTAG_NDG and a unique reference number, as illustrated in Figure 5.1.
Figure 5.1    Three SDS Tag/refs
Tag          Ref
DFTAG_NDG    2
DFTAG_NDG    4
DFTAG_NDG    9
Suppose you wish to attach the following annotation to the second
SDS: “Data from black hole experiment 8/18/87.” This text will be
stored in a description annotation data object. The data element will
include the tag/ref, DFTAG_NDG/4, and the annotation itself. Figure 5.2
illustrates the annotation data object.
Figure 5.2    Sample Annotation Data Object
[Diagram: the annotation DD is DFTAG_DIA/2; it points to a data element containing Tag = DFTAG_NDG, Ref = 4, and the description “Data from black hole experiment 8/18/87”.]

Getting Reference Numbers for Object Annotations
To use annotation routines, you need to know the tags and reference
numbers of the objects you wish to annotate.
The following routines return the most recent reference number used in
either reading or writing the specified type of data object:
DFSDlastref    SDS data objects
DFR8lastref    RIS data objects
DFPlastref     Palettes
DFANlastref    Annotations
Reference numbers for other objects can be obtained with the routine
Hfindnextref, a general purpose HDF routine that searches an HDF
file sequentially for reference numbers associated with a given tag.
These routines are described in the document NCSA HDF Calling
Interfaces and Utilities for Versions 3.2 and earlier and in the NCSA
HDF User’s Guide and NCSA HDF Reference Manual for Version 3.3.
Chapter 6
Tag Specifications
Chapter Overview
This chapter addresses issues related to HDF tags and the data they
represent. The first section provides general information about tags and
their interpretation. The remainder of the chapter contains a complete
list of tags supported by NCSA HDF Version 3.3 and detailed tag
specifications.
The HDF Tag Space
As discussed in Chapter 1, "The Basic Structure of HDF Files," 16 bits
are allotted for an HDF tag number. This provides for 65535 possible
tags, ranging from 1 to 65535; zero (0) is not used. This tag space is
divided into three ranges:
1 – 32767        Reserved for NCSA-supported tags
32768 – 64999    Set aside as user-definable tags
65000 – 65535    Reserved for expansion of the format
No restrictions are placed on the user-definable tags. Note that tags from
this range are not expected to be unique across user-developed HDF
applications.
The rest of this chapter is devoted to the NCSA-supported tags in the
range 1 to 32767.
Extended Tags and Alternate Physical Storage Methods
Prior to HDF Version 3.2, each data element had to be stored in one
contiguous block in the basic HDF file. Version 3.2 introduced
extended tags, a mechanism supporting alternate physical data element
storage structures. All NCSA-supported tags with variable-sized data
elements can take advantage of the extended tag features.
Extended Tag Implementation
Extended tags are automatically recognized by current versions of the
HDF library and interpreted according to a description record. The
description record, a complete data element, identifies the type of
extended element and provides the relevant parameters for data retrieval.
Extended tags currently support two styles of alternate physical storage:
Linked block elements are stored in several non-contiguous blocks
within the basic HDF file.
External elements are stored in a separate file, external to the basic
HDF file.
Every NCSA-supported tag is represented in HDF libraries and files by
a tag number. NCSA-supported tags that take advantage of alternative
physical storage features have an alternative tag number, called an
extended tag number, that appears instead of the original tag number
when an alternative physical storage method is in use.
When NCSA determines that an extended tag should be defined for a
given tag, the extended tag number is determined by performing a bitwise OR of the original tag number with the hexadecimal number 0x4000. For example, the tag DFTAG_RI points to a data element containing a raster image. If the data element is stored contiguously in the same HDF file, the DD contains the tag number 302 (0x012E); if the data element is stored either in linked blocks or in an external file, the DD contains the extended tag number 16686 (0x412E).
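As a quick arithmetic check of that rule, consider the following minimal sketch; the constant name EXT_BIT is invented here and is not the library's own identifier.

#include <stdio.h>

#define DFTAG_RI 302      /* 0x012E, raster image */
#define EXT_BIT  0x4000   /* the extended-tag bit described above */

int main(void)
{
    unsigned ext = DFTAG_RI | EXT_BIT;   /* 0x012E | 0x4000 = 0x412E */
    printf("extended tag for DFTAG_RI: %u (0x%04X)\n", ext, ext);  /* 16686 */
    return 0;
}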
If a data object uses a regular tag number, its storage structure will be
exactly as described in the “Tag Specifications” section of this chapter.
Figure 6.1 illustrates this general structure with the DD pointing
directly to a single, contiguous data block.
Figure 6.1    Regular Data Object

regular_tag   ref_no   →   data_element

regular_tag     Tag number
ref_no          Reference number
data_element    The data element
If a data object uses an extended tag, the storage structure will appear
generally as illustrated in Figure 6.2. The DD will point to an extended
tag description record which in turn will point to the data.
Figure 6.2    Data Object with Extended Tag

extended_tag   ref_no   →   ext_tag_desc | data_location_information   →   data (in linked blocks or external file)

extended_tag                 Extended tag number
ref_no                       Reference number
ext_tag_desc                 A 32-bit constant defined in Hdfi.h that identifies the type of alternative storage involved. Current definitions include EXT_LINKED for linked block elements or EXT_EXTERN for external elements.
data_location_information    Information identifying and describing the linked blocks or external file
data                         The data, stored either in linked blocks or in an external file
Since the HDF tools were modified for HDF Version 3.2 to handle
extended tags automatically, the only thing the user ever has to do is
specify the use of either the linked blocks mechanism or an external
file. Once that has been specified, the user can forget about extended
tags entirely; the HDF library will manage everything correctly.
There is only one circumstance under which an HDF user will need to
be concerned with the difference between regular tag numbers and
extended tag numbers. If a user bypasses the regular HDF interface to
examine a raw HDF file, that user will have to know the extended tag
numbers, their significance, and the alternative storage structures.
Linked Block Elements
As mentioned above, data elements had to be stored as single contiguous blocks within the basic HDF file prior to HDF Version 3.2. This meant that if a data element grew larger than the allotted space, it had to be deleted from its current location and rewritten at the end of the file.
Linked blocks provide a convenient means of addressing this problem
by linking new data blocks to a pre-existing data element. Linked block
elements consist of a series of data blocks chained together in a linked
list (similar to the DD list). The data blocks must be of uniform size,
except for the first block, which is considered a special case.
The linked block data element is a description record beginning with the
constant EXT_LINKED, which identifies the linked block storage method.
The rest of the record describes the organization of the data element
stored as linked blocks. Figure 6.3 illustrates a linked block description
record.
Figure 6.3    Linked Block Description Record

extended_tag   ref_no   →   EXT_LINKED | length | first_len | blk_len | num_blk | link_ref

extended_tag    The extended tag counterpart of any NCSA standard tag (16-bit integer)
ref_no          Reference number (16-bit integer)
EXT_LINKED      Constant identifying this as a linked block description record (32-bit integer)
length          Length of entire element (32-bit integer)
first_len       Length of the first data block (32-bit integer)
blk_len         Length of successive data blocks (32-bit integer)
num_blk         Number of blocks per block table (32-bit integer)
link_ref        Reference number of first block table (16-bit integer)
The link_ref field of the description record gives the reference
number of the first linked block table for the element. This table is
identified by the tag/ref DFTAG_LINKED/link_ref and contains
num_blk entries. There may be any number of linked block tables
chained together to describe a linked block element. Figure 6.4
illustrates a linked block table.
Figure 6.4    A Linked Block Table

DFTAG_LINKED   link_ref   →   next_ref | blk_ref_1 | blk_ref_2 | ... | blk_ref_n

link_ref     Reference number for this table (16-bit integer)
next_ref     Reference number for next table (16-bit integer)
blk_ref_n    Reference number for data block (16-bit integer)
The next_ref field contains the reference number of the next linked
block table. A value of zero (0) in this field indicates that there are no
additional linked block tables associated with this element.
The blk_ref_n fields of each linked block table contain reference
numbers for the individual data blocks that make up the data portion of
the linked block element. These data blocks are identified by the tag/ref
DFTAG_LINKED/blk_ref_n as illustrated in Figure 6.5. Although it
may seem ambiguous to use the same tag to refer to two different
objects, this ambiguity is resolved by the context in which the tags
appear.
Figure 6.5    A Data Block

DFTAG_LINKED   blk_ref_n   →   data_block

blk_ref_n     Reference number for this data block (16-bit integer)
data_block    Block of actual data (size specified by first_len or blk_len in the description record)
Linked block elements can be created using the function HLcreate(),
which is discussed in Chapter 3, “The HDF General Purpose Interface.”
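For readers working directly with raw files, the following hedged C sketch decodes a description record laid out as in Figure 6.3. The struct name, the helper functions, and the big-endian packing are assumptions of this sketch, not library definitions.

#include <stdint.h>

/* In-memory mirror of the Figure 6.3 description record. */
typedef struct {
    uint32_t ext_tag_desc;  /* EXT_LINKED */
    uint32_t length;        /* length of the entire element */
    uint32_t first_len;     /* length of the first data block */
    uint32_t blk_len;       /* length of successive data blocks */
    uint32_t num_blk;       /* number of blocks per block table */
    uint16_t link_ref;      /* reference number of the first block table */
} LinkedBlockDesc;

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Decode the 22 bytes of the record, in the field order of Figure 6.3. */
void decode_linked_desc(const uint8_t buf[22], LinkedBlockDesc *d)
{
    d->ext_tag_desc = get_be32(buf);
    d->length       = get_be32(buf + 4);
    d->first_len    = get_be32(buf + 8);
    d->blk_len      = get_be32(buf + 12);
    d->num_blk      = get_be32(buf + 16);
    d->link_ref     = get_be16(buf + 20);
}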
External Elements
External elements allow the data portion of an HDF element to reside in
a separate file. The potential of external data elements is largely
unexplored in the HDF context, although other file formats (most
notably the Common Data Format, CDF, from NASA) have used
external data elements to great advantage.
Because there has been little discussion of external elements within the
HDF user community, the structure of these elements is still not
completely defined. Figure 6.6 shows a diagram of the suggested
structure for an external element.
Figure 6.6    External Element Description Record

extended_tag   ref_no   →   EXT_EXTERN | offset | length | filename

extended_tag    The extended tag counterpart of any NCSA standard tag (16-bit integer)
ref_no          Reference number (16-bit integer)
EXT_EXTERN      Constant identifying this as an external element description record (16-bit integer)
offset          Location of the data within the external file (32-bit integer)
length          Length in bytes of the data in the external file (32-bit integer)
filename        Non-null terminated ASCII string naming the external file (any length)
An external element description record begins with the constant
EXT_EXTERN, which identifies the data object as having an externally
stored data element. The rest of the description record consists of the
specific information required to retrieve the data.
External elements can be created using the function HXcreate(), which
is discussed in Chapter 3, “The HDF General Purpose Interface.”
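A comparable hedged sketch for the external element description record of Figure 6.6 follows; again, the names and byte-order packing are inventions of this example, not library definitions.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Decoded form of the Figure 6.6 description record. */
typedef struct {
    uint16_t ext_tag_desc;  /* EXT_EXTERN (16 bits, per Figure 6.6) */
    uint32_t offset;        /* location of the data in the external file */
    uint32_t length;        /* length in bytes of the data */
    char    *filename;      /* NUL-terminated copy of the file name */
} ExternalDesc;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* 'n' is the length of the whole record; the file name is everything
   after the fixed 10-byte prefix. It is not NUL-terminated on disk,
   so a terminated copy is made. Returns 0 on success. */
int decode_external_desc(const uint8_t *buf, size_t n, ExternalDesc *d)
{
    if (n < 10) return -1;
    d->ext_tag_desc = rd16(buf);
    d->offset       = rd32(buf + 2);
    d->length       = rd32(buf + 6);
    d->filename     = malloc(n - 10 + 1);
    if (d->filename == NULL) return -1;
    memcpy(d->filename, buf + 10, n - 10);
    d->filename[n - 10] = '\0';
    return 0;
}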
Tag Specifications
The following pages contain the specifications of all the NCSA-supported tags in HDF Version 3.3. Each entry contains the following information:
• The tag (in capital letters in the left margin)
• The full name of the tag (on the first line to the right)
• The type and, where possible, the amount of data in the corresponding data element (on the second line to the right). When the data element is a variable-sized data structure, such as text, a string, or a variable-sized array, the amount of data cannot be specified exactly. Where possible, a formula is provided to estimate the amount of data. The string ? bytes appears when neither the size nor the structure of the data element can be specified.
• The tag number in decimal/(hexadecimal) (on the third line to the right)
• A diagram illustrating the structure of the tag and its associated data. Since all DDs that point to a data element contain data length and data offset fields, these fields are not included in the illustrations.
• A full specification of the tag, including a description of the data element and a discussion of its intended use.
Tags are roughly grouped according to the roles they play:
• Utility tags
• Annotation tags
• Compression tags
• Raster Image tags
• Composite image tags
• Vector image tags
• Scientific data set tags
• Vset tags
• Obsolete tags
These groupings imply a general context for the use of each tag; they
are not meant to restrict their use.
Please note the subsection “Obsolete Tags.” These tags have fallen out
of use with the continuing development of HDF. They are still
recognized by the HDF library, but users should not write new objects
using them; they may eventually be dropped from the HDF
specification.
In the following discussion, the ground symbol shown beside a tag indicates that the DD for that tag includes no pointer to a data element; that is, there is never a data element associated with the tag.
Utility Tags
DFTAG_NULL
No data
0 bytes
1 (0x0001)
DFTAG_NULL   ref_no

ref_no    Reference number (16-bit integer; always 0)
This tag is used for place holding and to fill empty portions of the data
description block. The length and offset fields (not shown) of a
DFTAG_NULL DD must be zero (0).
DFTAG_VERSION
Library version number
12 bytes plus the length of a string
30 (0x001E)
DFTAG_VERSION   ref_no   →   majorv | minorv | release | string

ref_no     Reference number (16-bit integer)
majorv     Major version number (32-bit integer)
minorv     Minor version number (32-bit integer)
release    Release number (32-bit integer)
string     Non-null terminated ASCII string (any length)
The data portion of this tag contains the complete version number and a
descriptive string for the latest version of the HDF library to write to
the file.
DFTAG_NT
Number type
4 bytes
106 (0x006A)
DFTAG_NT   ref_no   →   version | type | width | class

ref_no     Reference number (16-bit integer)
version    Version number of NT information (8-bit integer)
type       Unsigned integer, signed integer, unsigned character, character, floating point, or double precision floating point (8-bit code)
width      Number of bits, all of which are assumed to be significant (8-bit code)
class      A generic value, with different interpretations depending on type: floating point, integer, or character (8-bit code)
Several values that may be used for each of the three types in the field
CLASS are listed in Table 6.1. This is not an exhaustive list.
Table 6.1    Number Type Values
Type              Mnemonic        Value
Floating point    DFNTF_NONE      0
                  DFNTF_IEEE      1
                  DFNTF_VAX       2
                  DFNTF_CRAY      3
                  DFNTF_PC        4
                  DFNTF_CONVEX    5
Integer           DFNTI_MBO       1
                  DFNTI_IBO       2
                  DFNTI_VBO       4
Character         DFNTC_ASCII     1
                  DFNTC_EBCDIC    2
                  DFNTC_BYTE      0
The number type flag is used by any other element in the file to indicate specifically what a numeric value looks like. Other tag types should contain a reference number pointer to a DFTAG_NT instead of containing their own number type definitions.
The version field allows expansion of the number type information, in
case some future number types cannot be described using the fields
currently defined. Successive versions of the DFTAG_NT may be
substantially different from the current definition, but backward
compatibility will be maintained. The current DFTAG_NT version
number is 1.
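A minimal sketch of packing the four fields, assuming the on-disk byte order matches the diagram above; the function name is invented, and the specific codes come from the HDF headers and Table 6.1.

#include <stdint.h>

/* Pack the 4-byte DFTAG_NT data element: version, type, width, class. */
void pack_nt(uint8_t out[4], uint8_t type, uint8_t width, uint8_t class_code)
{
    out[0] = 1;           /* version: the current DFTAG_NT version number */
    out[1] = type;        /* e.g., the code for floating point */
    out[2] = width;       /* number of significant bits, e.g., 32 */
    out[3] = class_code;  /* e.g., DFNTF_IEEE = 1 for IEEE floating point */
}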
DFTAG_MT
Machine type
0 bytes
107 (0x006B)
DFTAG_MT   double | float | int | char   (encoded in the 16-bit reference field)

double    Specifies method of encoding double precision floating point (4-bit code)
float     Specifies method of encoding single precision floating point (4-bit code)
int       Specifies method of encoding integers (4-bit code)
char      Specifies method of encoding characters (4-bit code)
DFTAG_MT specifies that all unconstrained or partially constrained values
in this HDF file are of the default type for that hardware. When
DFTAG_MT is set to VAX, for example, all integers will be assumed to be
in VAX byte order unless specifically defined otherwise with a
DFTAG_NT tag. Note that all of the headers and many tags, the whole
raster image set for example, are defined with bit-wise precision and
will not be overridden by the DFTAG_MT setting.
For DFTAG_MT, the reference field itself is the encoding of the DFTAG_MT
information. The reference field is 16 bits, taken as four groups of four
bits, specifying the types for double-precision floating point, floating
point, integer, and character respectively. This allows 16 generic
specifications for each type.
To the user, these will be defined constants in the header file hdf.h,
specifying the proper descriptive numbers for Sun, VAX, Cray,
Convex, and other computer systems. If there is no DFTAG_MT in a file,
the application may assume that the data in the file has been written on
the local machine; any portability problems must be addressed by the
user. For this reason, we recommend that all HDF files contain a
DFTAG_MT for maximum portability.
Currently available data encodings are listed in Table 6.2.
Table 6.2    Available Machine Types
Type                               Available Encodings
Double precision floating point    IEEE64, VAX64, CRAY128
Floating point                     IEEE32, VAX32, CRAY64
Integers                           VAX32, Intel16, Intel32, Motorola32, CRAY64
Characters                         ASCII, EBCDIC
New encodings can be added for each data type as the need arises.
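Because the DFTAG_MT information lives entirely in the 16-bit reference field, packing and unpacking it is a simple bit-manipulation exercise. The sketch below assumes the nibbles run in the double, float, integer, character order described above; the exact bit positions and the function names are assumptions of this example.

#include <stdint.h>

/* Pack four 4-bit encoding codes into the 16-bit reference field. */
uint16_t pack_mt(unsigned dbl, unsigned flt, unsigned intg, unsigned chr)
{
    return (uint16_t)(((dbl  & 0xF) << 12) |
                      ((flt  & 0xF) << 8)  |
                      ((intg & 0xF) << 4)  |
                       (chr  & 0xF));
}

/* Recover the four codes from the reference field. */
void unpack_mt(uint16_t ref, unsigned out[4])
{
    out[0] = (ref >> 12) & 0xF;  /* double-precision encoding */
    out[1] = (ref >> 8)  & 0xF;  /* single-precision encoding */
    out[2] = (ref >> 4)  & 0xF;  /* integer encoding */
    out[3] =  ref        & 0xF;  /* character encoding */
}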
Annotation Tags
DFTAG_FID
File identifier
String
100 (0x0064)
DFTAG_FID   ref_no   →   character_string

ref_no              Reference number (16-bit integer)
character_string    Non-null terminated ASCII text (any length)
This tag points to a string which the user wants to associate with this
file. The string is not null terminated. The string is intended to be a
user-supplied title for the file.
DFTAG_FD
File description
Text
101 (0x0065)
DFTAG_FD   ref_no   →   text_block

ref_no        Reference number (16-bit integer)
text_block    Non-null terminated ASCII text (any length)
This tag points to a block of text describing the overall file contents.
The text can be any length. The block is not null terminated. The text
is intended to be user-supplied comments about the file.
DFTAG_TID
Tag identifier
String
102 (0x0066)
DFTAG_TID   tag   →   character_string

tag                 Tag number to which this tag refers (16-bit integer)
character_string    Non-null terminated ASCII text (any length)
The data for this tag is a string that identifies the functionality of the
tag indicated in the space normally used for the reference number. For
example, the tag identifier for DFTAG_TID might point to data that reads
"tag identifier."
Many tags are identified in the HDF specification, so it is usually unnecessary to include their identifiers in the HDF file. But with user-defined tags or special-purpose tags, the only way for a human reader to diagnose what kind of data is stored in a file is to read tag identifiers. Use tag descriptions to define even more detail about your user-defined tags.
Note that with this tag you may make use of the user-defined tags to check for consistency. Although two persons may use the same user-defined tag, they probably will not use the same tag identifier.
DFTAG_TD
Tag description
Text
103 (0x0067)
DFTAG_TD   tag   →   text_block

tag           Tag number to which this tag refers (16-bit integer)
text_block    Non-null terminated ASCII text (any length)
The data for this tag is a text block which describes in relative detail the
functionality and format of the tag which is indicated in the space
normally occupied by the reference number. This tag is intended to be
used with user-defined tags and provides a medium for users to exchange
files that include human-readable descriptions of the data.
It is important to provide everything that a programmer might need to
know to read the data from your user-defined tag. At the minimum, you
should specify everything you would need to know in order to retrieve
your data at a later date if the original program were lost.
DFTAG_DIL
Data identifier label
String
104 (0x0068)
DFTAG_DIL   ref_no   →   obj_tag | obj_ref_no | character_string

ref_no              Reference number (16-bit integer)
obj_tag             Tag number of the data to which this label applies (16-bit integer)
obj_ref_no          Reference number of the data object to which this label applies (16-bit integer)
character_string    Non-null terminated ASCII text (any length)
The DFTAG_DIL data object consists of a tag/ref followed by a string.
The string serves as a label for the data identified by the tag/ref.
By including DFTAG_DIL tags, you can give a data object a label for
future reference. For example, DFTAG_DIL can be used to assign titles to
images.
DFTAG_DIA
Data identifier annotation
Text
105 (0x0069)
DFTAG_DIA   ref_no   →   obj_tag | obj_ref_no | text_block

ref_no        Reference number (16-bit integer)
obj_tag       Tag number of the data to which this annotation applies (16-bit integer)
obj_ref_no    Reference number of the data object to which this annotation applies (16-bit integer)
text_block    Non-null terminated ASCII text (any length)
The DFTAG_DIA data object consists of a tag/ref followed by a text
block. The text block serves as an annotation of the data identified by
the tag/ref.
With a DFTAG_DIA tag, any data object can have a lengthy, user-written
description. This can be used to include comments about images, data
sets, source code, and so forth.
Compression Tags
DFTAG_RLE
Run length encoded data
0 bytes
11 (0x000B)
DFTAG_RLE   ref_no

ref_no    Reference number (16-bit integer)
This tag is used in the DFTAG_ID compression field and in other places
to indicate that an image or section of data is encoded with a run-length
encoding scheme. The RLE method used is byte-wise. Each run is
preceded by a count byte. The low seven bits of the count byte indicate
the number of bytes (n). The high bit of the count byte indicates
whether the next byte should be replicated n times (high bit = 1), or
whether the next n bytes should be included as is (high bit = 0).
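A minimal decoder for this scheme might look like the following sketch; the function name, signature, and error handling are inventions of this example, not part of the HDF library.

#include <stddef.h>
#include <stdint.h>

/* Decode byte-wise RLE: each run starts with a count byte whose low
   seven bits give n; if the high bit is set, the next byte is repeated
   n times, otherwise the next n bytes are copied literally.
   Returns the number of bytes written, or -1 on malformed/overflowing input. */
long rle_decode(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_len)
{
    size_t i = 0, o = 0;

    while (i < in_len) {
        uint8_t count = in[i++];
        size_t  n     = count & 0x7F;

        if (o + n > out_len)
            return -1;                  /* output buffer too small */
        if (count & 0x80) {             /* replicate the next byte n times */
            if (i >= in_len) return -1;
            uint8_t v = in[i++];
            for (size_t k = 0; k < n; k++)
                out[o++] = v;
        } else {                        /* copy the next n bytes as is */
            if (i + n > in_len) return -1;
            for (size_t k = 0; k < n; k++)
                out[o++] = in[i++];
        }
    }
    return (long)o;
}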
See also:
DFTAG_ID in “Raster Image Tags”
DFTAG_NDG in “Scientific Data Set Tags”

DFTAG_IMC
IMCOMP compressed data
0 bytes
12 (0x000C)

DFTAG_IMC   ref_no

ref_no    Reference number (16-bit integer)
This tag is used in the DFTAG_ID compression field and in other places
to indicate that an image or section of data is encoded with an
IMCOMP encoding scheme. This scheme is a 4:1 aerial averaging
method which is easy to decompress. It counts color frequencies in 4x4
squares to optimize color sampling.
See also:
DFTAG_ID in “Raster Image Tags”
DFTAG_NDG in “Scientific Data Set Tags”
DFTAG_JPEG
24-bit JPEG compression information
? bytes
13 (0x000D)
DFTAG_JPEG   ref_no   →   JFIF header

ref_no    Reference number (16-bit integer)
This tag points to header information for 24-bit JPEG compressed
images. The data in this tag is identical to the header data stored in a
JFIF (JPEG File Interchange Format) file up to the start-of-frame
parameter. The start-of-frame parameter and all further data for the JPEG
image is stored in the associated DFTAG_CI data element which is the
companion to the DFTAG_JPEG element. (See the document JPEG File
Interchange Format* for a detailed description of the file format.)
DFTAG_GREYJPEG
8-bit JPEG compression information
? bytes
14 (0x000E)
DFTAG_GREYJPEG   ref_no   →   JFIF header

ref_no    Reference number (16-bit integer)

This tag points to header information for 8-bit JPEG compressed images. The data in this tag is identical to the header data stored in a JFIF (JPEG File Interchange Format) file up to the start-of-frame parameter (see the JFIF format document for further details). The start-of-frame parameter and all further data for the JPEG image is stored in the associated DFTAG_CI data element which is the companion to the DFTAG_JPEG element.
*
The document JPEG File Interchange Format has not been published in a regular periodical. An electronic copy is available
as a Postscript file from NCSA’s FTP server ftp.ncsa.uiuc.edu in the same directory as this document, NCSA
HDF Specification and Developer’s Guide. Printed copies are available from C-Cube Microsystems, 1778 McCarthy
Boulevard, Milpitas, CA 95035 (phone: 408-944-6300. Fax: 408-944-6314. Current email contact:
[email protected]).
DFTAG_CI
Compressed raster image
? bytes
303 (0x012F)

DFTAG_CI   ref_no

ref_no    Reference number (16-bit integer)

This tag points to a stream of bytes that make up a compressed image. The type of compression, together with any necessary parameters, are stored as a separate data object. For example, if DFTAG_JPEG is contained in the same raster image group, the stream of bytes contains the start-of-frame parameter and all further data for the JPEG image. Other parameters are stored in the DFTAG_JPEG object.
Raster Image Tags
DFTAG_RIG
Raster image group
n*4 bytes (where n is the number of data objects in the group)
306 (0x0132)
DFTAG_RIG   ref_no   →   tag_1 | ref_1 | tag_2 | ref_2 | ... | tag_n | ref_n

ref_no    Reference number (16-bit integer)
tag_n     Tag number for nth member of the group (16-bit integer)
ref_n     Reference number for nth member of the group (16-bit integer)
The RIG data element contains the tag/refs of all the data objects required to display a raster image correctly. An application program that deals with RIGs should read all the elements of a RIG and process those identifiers it can display correctly. Even if the application cannot process all of the objects, the objects that it can process will be usable. Table 6.3 lists the tags that may appear in an RIG.
Table 6.3    Available RIG Tags
Tag          Description
DFTAG_ID     Image dimension record
DFTAG_RI     Raster image
DFTAG_XYP    X-Y position
DFTAG_LD     LUT dimension
DFTAG_LUT    Color lookup table
DFTAG_MD     Matte channel dimension
DFTAG_MA     Matte channel
DFTAG_CCN    Color correction
DFTAG_CFM    Color format
DFTAG_AR     Aspect ratio
Example
DFTAG_ID, DFTAG_RI, DFTAG_LD, DFTAG_LUT

Assume that an image dimension record, a raster image, an LUT dimension record, and an LUT are all required to display a particular raster image correctly. These data objects can be associated in an RIG so that an application can read the image dimensions and then the image. It will then read the lookup table and display the image.
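In code, processing an RIG reduces to iterating over its 16-bit tag/ref pairs. The following hedged C sketch shows the idea; the callback type, helper function, and big-endian assumption are inventions of this example.

#include <stddef.h>
#include <stdint.h>

/* Called once per member of the group. */
typedef void (*rig_member_fn)(uint16_t tag, uint16_t ref, void *ctx);

static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Walk a RIG data element of n*4 bytes: alternating 16-bit tag and ref. */
void rig_for_each(const uint8_t *data, size_t len, rig_member_fn fn, void *ctx)
{
    for (size_t i = 0; i + 4 <= len; i += 4)
        fn(be16(data + i), be16(data + i + 2), ctx);
}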
DFTAG_ID    Image dimension    20 bytes    300 (0x012C)
DFTAG_LD    LUT dimension      20 bytes    307 (0x0133)
DFTAG_MD    Matte dimension    20 bytes    308 (0x0134)
DFTAG_ID   ref_no   →   x_dim | y_dim | DFTAG_NT / NT_ref | elements | interlace | comp_tag | comp_ref

ref_no       Reference number (16-bit integer)
x_dim        Length of x (horizontal) dimension (32-bit integer)
y_dim        Length of y (vertical) dimension (32-bit integer)
NT_ref       Reference number for number type information
elements     Number of elements that make up one entry (16-bit integer)
interlace    Type of interlacing used (16-bit integer)
             0   The components of each pixel are together.
             1   Color elements are grouped by scan lines.
             2   Color elements are grouped by planes.
comp_tag     Tag which tells the type of compression used and any associated parameters (16-bit integer)
comp_ref     Reference number of compression tag (16-bit integer)
These three dimension records have exactly the same format; they
specify the dimensions of the 2-dimensional arrays after which they are
named and provide information regarding other attributes of the data in
the array:
DFTAG_ID specifies the dimensions of a DFTAG_RI.
DFTAG_LD specifies the dimensions of a DFTAG_LUT.
DFTAG_MD specifies the dimensions of a DFTAG_MA.
Other attributes described in the image dimension record include the
number type of the elements, the number of elements per pixel, the
interlace scheme used, and the compression scheme used (if any).
For example, a 512x256 row-wise 24-bit raster image with each pixel
stored as RGB bytes would have the following values:
x_dim        512
y_dim        256
NT_ref       UINT8
elements     3 (3 elements per pixel: e.g., R, G, and B)
interlace    0 (RGB values not separated)
comp_tag     0 (no compression is used)
The diagram above illustrates the tag DFTAG_ID. The DFTAG_LD and DFTAG_MD diagrams would be identical except for the tag name in the first cell, which would be DFTAG_LD and DFTAG_MD, respectively.
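For reference, the 20 bytes of an image dimension record can be mirrored by a C structure such as the following sketch. On disk the fields are packed big-endian with no padding, so the struct is illustrative rather than directly readable from the file; the type name is invented here.

#include <stdint.h>

typedef struct {
    int32_t  x_dim;     /* horizontal dimension */
    int32_t  y_dim;     /* vertical dimension */
    uint16_t nt_tag;    /* DFTAG_NT (106) */
    uint16_t nt_ref;    /* ref of the number-type object */
    int16_t  elements;  /* elements per pixel, e.g., 3 for RGB */
    int16_t  interlace; /* 0 = pixel, 1 = scan line, 2 = plane */
    uint16_t comp_tag;  /* compression tag, 0 = none */
    uint16_t comp_ref;  /* ref of the compression object */
} ImageDimRec;          /* 4+4+2+2+2+2+2+2 = 20 bytes on disk */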
DFTAG_RI
Raster image
xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize
are specified in the corresponding DFTAG_ID)
302 (0x012E)
DFTAG_RI   ref_no

ref_no    Reference number (16-bit integer)
This tag points to raster image data. It is stored in row-major order and
must be interpreted as specified by interlace in the related
DFTAG_ID.
DFTAG_LUT
Lookup table
xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize
are specified in the corresponding DFTAG_ID)
301 (0x012D)
DFTAG_LUT   ref_no   →   P0_0 | P0_1 | ... | P0_m | P1_0 | P1_1 | ... | P1_m | ... | Pn_0 | Pn_1 | ... | Pn_m
                    OR   P0_0 | P1_0 | ... | Pn_0 | P0_1 | P1_1 | ... | Pn_1 | ... | P0_m | P1_m | ... | Pn_m

ref_no    Reference number (16-bit integer)
Pn_m      mth value of parameter n (size is specified by the DFTAG_NT in the corresponding DFTAG_LD)
The DFTAG_LUT, sometimes called a palette, is used to assign colors to
data values. When a raster image consists of data values which are
going to be interpreted through an LUT capability, the DFTAG_LUT
should be loaded along with the image.
The most common lookup table is the RGB lookup table, which has
X dimension = 256 and Y dimension = 1 with three elements per entry,
one each for red, green, and blue. The interlace will be either 0,
where the LUT values are given as RGB, RGB, RGB, ..., or 1, where the
LUT values are given as 256 reds, 256 greens, then 256 blues.
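The two orderings can be sketched in C as follows. This is
illustrative code, not part of the HDF library; the function and
parameter names are invented for the example.

    #include <stdint.h>

    /* Illustrative packing of a 256-entry RGB lookup table in the
       two interlace orders described above. */
    static void pack_lut(const uint8_t r[256], const uint8_t g[256],
                         const uint8_t b[256], int interlace,
                         uint8_t out[768])
    {
        int i;

        if (interlace == 0) {            /* RGB, RGB, RGB, ... */
            for (i = 0; i < 256; i++) {
                out[3 * i]     = r[i];
                out[3 * i + 1] = g[i];
                out[3 * i + 2] = b[i];
            }
        } else {                         /* 256 reds, 256 greens, 256 blues */
            for (i = 0; i < 256; i++) {
                out[i]       = r[i];
                out[256 + i] = g[i];
                out[512 + i] = b[i];
            }
        }
    }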
DFTAG_MA    Matte channel
xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize are
specified in the corresponding DFTAG_ID)
309 (0x0135)

ref_no      Reference number (16-bit integer)
The DFTAG_MA data object contains transparency data which can be used
to facilitate the overlaying of images. The data consists of a
2-dimensional array of unsigned 8-bit integers ranging from 0 to 255.
Each point in a DFTAG_MA indicates the transparency of the
corresponding point in a raster image of the same dimensions. A value
of 0 indicates that the data at that point is to be considered totally
transparent, while a value of 255 indicates that the data at that point is
totally opaque. It is assumed that a linear scale applies to the
transparency values, but users may opt to interpret the data in any way
they wish.
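Under the linear interpretation, compositing a foreground image over a
background might look like the following sketch (illustrative only,
not HDF library code; names are invented for the example):

    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative compositing of a foreground image over a
       background using a matte channel of the same dimensions:
       out = (matte*fg + (255 - matte)*bg) / 255. */
    static void composite(const uint8_t *fg, const uint8_t *matte,
                          const uint8_t *bg, uint8_t *out, size_t n)
    {
        size_t i;

        for (i = 0; i < n; i++)
            out[i] = (uint8_t)((matte[i] * fg[i] +
                               (255 - matte[i]) * bg[i]) / 255);
    }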
DFTAG_CCN    Color correction
52 bytes (usually)
310 (0x0136)

Layout: gamma | red_x red_y red_z | green_x green_y green_z |
blue_x blue_y blue_z | white_x white_y white_z

ref_no      Reference number (16-bit integer)
gamma       Gamma parameter (32-bit IEEE floating point)
red_x, red_y, red_z
            Red x, y, and z correction factors (32-bit IEEE floating
            point)
green_x, green_y, green_z
            Green x, y, and z correction factors (32-bit IEEE floating
            point)
blue_x, blue_y, blue_z
            Blue x, y, and z correction factors (32-bit IEEE floating
            point)
white_x, white_y, white_z
            White x, y, and z correction factors (32-bit IEEE floating
            point)
Color correction specifies the Gamma correction for the image and color
primaries for the generation of the image.
DFTAG_CFM    Color format
String
311 (0x0137)

ref_no             Reference number (16-bit integer)
character_string   Non-null terminated ASCII string (any length)
The color format data element contains a string of uppercase characters
that indicates how each element of each pixel in a raster image is to be
interpreted. Table 6.4 lists the available color format strings.
Table 6.4    Color Format String Values

String      Description
VALUE       Pseudo-color, or just a value associated with the pixel
RGB         Red, green, blue model
XYZ         Color-space model
HSV         Hue, saturation, value model
HSI         Hue, saturation, intensity
SPECTRAL    Spectral sampling method
DFTAG_AR    Aspect ratio
4 bytes
312 (0x0138)

ref_no      Reference number (16-bit integer)
ratio       Ratio of width to height (32-bit IEEE float)
The data for this tag is the visual aspect ratio for this image. The image
should be visually correct if displayed on a screen with this aspect ratio.
The data consists of one floating-point number which represents width
divided by height. An aspect ratio of 1.0 indicates a display with
perfectly square pixels; 1.33 is a standard aspect ratio used by many
monitors.
Composite Image Tags
DFTAG_DRAW    Draw
n*4 bytes (where n is the number of data objects that make up the
composite image)
400 (0x0190)

Layout: tag_1 ref_1 | tag_2 ref_2 | ... | tag_n ref_n

ref_no      Reference number (16-bit integer)
tag_n       Tag number of the nth member of the draw list (16-bit
            integer)
ref_n       Reference number of the nth member of the draw list
            (16-bit integer)
The DFTAG_DRAW data element consists of a list of tag/refs that define a
composite image. The data objects indicated should be displayed in
order. This can include several RIGs which are to be displayed
simultaneously. It can also include vector overlays, like DFTAG_T14,
which are to be placed on top of an RIG.
Some of the elements in a DFTAG_DRAW list may be instructions about
how images are to be composited (XOR, source put, anti-aliasing, etc.).
These are defined as individual tags.
DFTAG_XYP    XY position
8 bytes
500 (0x01F4)

Layout: x | y

ref_no      Reference number (16-bit integer)
x           X-coordinate (32-bit integer)
y           Y-coordinate (32-bit integer)
DFTAG_XYP is used in composites and other groups to indicate an XY
position on the screen. For this, (0,0) is the lower left corner of the
print area. X is the number of pixels to the right along the horizontal
axis and Y is the number of pixels up on the vertical axis. The X and Y
coordinates are two 32-bit integers.
For example, if DFTAG_XYP is present in a DFTAG_RIG, the DFTAG_XYP
specifies the position of the lower left corner of the raster image on the
screen.
See also:    DFTAG_DRAW in this section
Vector Image Tags
DFTAG_T14    Tektronix 4014
? bytes
602 (0x25A)

ref_no      Reference number (16-bit integer)
This tag points to a Tektronix 4014 data stream. The bytes in the data
field, when read and sent to a Tektronix 4014 terminal, will display a
vector image. Only the lower seven bits of each byte are significant.
There are no record markings or non-Tektronix codes in the data.
DFTAG_T105    Tektronix 4105
? bytes
603 (0x25B)

ref_no      Reference number (16-bit integer)
This tag points to a Tektronix 4105 data stream. The bytes in the data
field, when read and sent to a Tektronix 4105 terminal, will be
displayed as a vector image. Only the lower seven bits of each byte are
significant. Some terminal emulators will not correctly interpret every
feature of the Tektronix 4105 terminal, so you may wish to use only a
subset of the available Tektronix 4105 vector commands.
Scientific Data Set Tags
DFTAG_NDG    Numeric data group
n*4 bytes (where n is the number of data objects in the group)
720 (0x02D0)

Layout: tag_1 ref_1 | tag_2 ref_2 | ... | tag_n ref_n

ref_no      Reference number (16-bit integer)
tag_n       Tag number of nth member of the group (16-bit integer)
ref_n       Reference number of nth member of the group (16-bit
            integer)
The NDG data contains a list of tag/refs that define a scientific data
set. DFTAG_NDG supersedes the old DFTAG_SDG, which became obsolete
upon the release of HDF Version 3.2. A more complete explanation of
the relationship between DFTAG_NDG and DFTAG_SDG can be found in
Chapter 4, "Sets and Groups."

All of the members of an NDG provide information for correctly
interpreting and displaying the data. Application programs that deal
with NDGs should read all of the elements of an NDG and process those
data objects that they can use. Even if an application cannot process
all of the objects, the objects that it can understand will be usable.
Table 6.5 lists the tags that may appear in an NDG.

Table 6.5    Available NDG Tags

Tag            Description
DFTAG_SDD      Scientific data dimension record (rank and dimensions)
DFTAG_SD       Scientific data
DFTAG_SDS      Scales
DFTAG_SDL      Labels
DFTAG_SDU      Units
DFTAG_SDF      Formats
DFTAG_SDM      Maximum and minimum values
DFTAG_SDC      Coordinate system
DFTAG_CAL      Calibration information
DFTAG_FV       Fill value
DFTAG_LUT      Color lookup table
DFTAG_LD       Lookup table dimension record
DFTAG_SDLNK    Link to old-style DFTAG_SDG
Example:    DFTAG_SDD, DFTAG_SD, DFTAG_SDM

Suppose that an NDG contains a dimension record, scientific data, and
the maximum and minimum values of the data. These data objects can be
associated in an NDG so that an application can read the rank and
dimensions from the dimension record and then read the data array. If
the application needs the maximum and minimum values, it will read
them as well.
See also:    Chapter 4, "Sets and Groups"

DFTAG_SDD    Scientific data dimension record
6 + 8*rank bytes
701 (0x02BD)

Layout: rank | dim_1 | dim_2 | ... | dim_n | DFTAG_NT data_NT_ref |
DFTAG_NT scale_NT_ref_1 | DFTAG_NT scale_NT_ref_2 | ... |
DFTAG_NT scale_NT_ref_n

ref_no           Reference number (16-bit integer)
rank             Number of dimensions (16-bit integer)
dim_n            Number of values along the nth dimension (32-bit
                 integer)
data_NT_ref      Reference number of DFTAG_NT for data (16-bit
                 integer)
scale_NT_ref_n   Reference number for DFTAG_NT for the scale for the
                 nth dimension (16-bit integer)
This record defines the rank and dimensions of the array in the
scientific data set. For example, a DFTAG_SDD for a 500x600x3 array of
floating-point numbers would have the following values and components:

    Rank: 3
    Dimensions: 500, 600, and 3
    One data NT
    Three scale NTs
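The "6 + 8*rank bytes" size follows from the layout above: 2 bytes of
rank, 4 bytes per dimension, one 4-byte data NT tag/ref pair, and one
4-byte scale NT tag/ref pair per dimension. A small illustrative check
(not library code; the byte breakdown is inferred from the layout):

    #include <stdio.h>

    /* rank (2) + rank 32-bit dims (4*rank) + data NT tag/ref (4)
       + rank scale NT tag/ref pairs (4*rank) = 6 + 8*rank bytes. */
    static long sdd_size(int rank)
    {
        return 2L + 4L * rank + 4L + 4L * rank;
    }

    int main(void)
    {
        printf("%ld\n", sdd_size(3));   /* the 500x600x3 example: 30 */
        return 0;
    }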
DFTAG_SD    Scientific data
NTsize*x*y*z*... bytes (where NTsize is the size of the data NT
specified in the corresponding DFTAG_SDD and x, y, z, etc. are the
dimension sizes)
702 (0x02BE)

[Figure: a 3-dimensional array of data values]

ref_no      Reference number (16-bit integer)
This tag points to an array of scientific data. The type of the data
may be specified by a DFTAG_NT included with the SDG. If there is no
DFTAG_NT, the type of the data is floating-point in standard IEEE
32-bit format. The rank and dimensions must be stored as specified in
the corresponding DFTAG_SDD. The figure above shows a 3-dimensional
data array.
DFTAG_SDS    Scientific data scales
rank + NTsize0*x + NTsize1*y + NTsize2*z + ... bytes (where rank is
the number of dimensions, x, y, z, etc. are the dimension sizes, and
NTsize# are the sizes of each scale NT from the corresponding
DFTAG_SDD)
703 (0x02BF)

Layout: is_1 | is_2 | is_3 | ... | is_n | scale_1 | scale_2 |
scale_3 | ... | scale_n

ref_no      Reference number (16-bit integer)
is_n        A flag indicating whether a scale exists for the nth
            dimension (8-bit integer; 0 or 1)
scale_n     List of scale values for the nth dimension (type specified
            in corresponding DFTAG_SDD)
This tag points to the scales for the data set. The first n bytes indicate
whether there is a scale for the corresponding dimension (1 = yes, 0 =
no). This is followed by the scale values for each dimension. The scale
consists of a simple series of values where the number of values and
their types are specified in the corresponding DFTAG_SDD.
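A walk over such an element might look like the following sketch
(illustrative only, not HDF library code; rank, dims, and the
per-dimension scale NT sizes are assumed to have been read from the
corresponding DFTAG_SDD):

    #include <stdint.h>
    #include <stddef.h>

    /* Returns a pointer just past the last scale value. */
    static const uint8_t *skip_scales(const uint8_t *p, int rank,
                                      const int32_t *dims,
                                      const size_t *nt_size)
    {
        const uint8_t *flags = p;   /* first rank bytes: 1 = scale present */
        int n;

        p += rank;
        for (n = 0; n < rank; n++)
            if (flags[n] == 1)
                p += (size_t)dims[n] * nt_size[n];
        return p;
    }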
DFTAG_SDL    Scientific data labels
? bytes
704 (0x02C0)

Layout: label_1 | label_2 | ... | label_n

ref_no      Reference number (16-bit integer)
label_n     Null terminated ASCII string (any length)
This tag points to a list of labels for the data in each dimension of the
data set. Each label is a string terminated by a null byte (0).
DFTAG_SDU    Scientific data units
? bytes
705 (0x02C1)

Layout: unit_1 | unit_2 | ... | unit_n

ref_no      Reference number (16-bit integer)
unit_n      Null terminated ASCII string (any length)
This tag points to a list of strings specifying the units for the data and
each dimension of the data set. Each unit's string is terminated by a null
byte (0).
DFTAG_SDF    Scientific data format
? bytes
706 (0x02C2)

Layout: format_1 | format_2 | ... | format_n

ref_no      Reference number (16-bit integer)
format_n    Null terminated ASCII string (any length)
This tag points to a list of strings specifying an output format for the
data and each dimension of the data set. Each format string is terminated
by a null byte (0).
DFTAG_SDM    Scientific data max/min
8 bytes
707 (0x02C3)

Layout: max | min

ref_no      Reference number (16-bit integer)
max         Maximum value (type is specified by the data NT in the
            corresponding DFTAG_SDD)
min         Minimum value (type is specified by the data NT in the
            corresponding DFTAG_SDD)
This record contains the maximum and minimum data values in the data
set. The type of max and min are specified by the data NT of the
corresponding DFTAG_SDD.
DFTAG_SDC    Scientific data coordinates
? bytes
708 (0x02C4)

ref_no      Reference number (16-bit integer)
string      Null terminated ASCII string (any length)
This tag points to a string specifying the coordinate system for the data
set. The string is terminated by a null byte.
DFTAG_SDLNK    Scientific data set link
8 bytes
710 (0x02C6)

Layout: DFTAG_NDG NDG_ref | DFTAG_SDG SDG_ref

ref_no      Reference number (16-bit integer)
DFTAG_NDG   NDG tag (16-bit integer)
NDG_ref     NDG reference number (16-bit integer)
DFTAG_SDG   SDG tag (16-bit integer)
SDG_ref     SDG reference number (16-bit integer)
The purpose of this tag is to link together an old-style DFTAG_SDG and a
DFTAG_NDG in cases where the NDG contains 32-bit floating point data
and is, therefore, equivalent to an old SDG.
See also:    Chapter 4, "Sets and Groups"
DFTAG_CAL    Calibration information
36 bytes
731 (0x02DB)

Layout: cal | cal_err | off | off_err | data_type

ref_no      Reference number (16-bit integer)
cal         Calibration factor (64-bit IEEE float)
cal_err     Error in calibration factor (64-bit IEEE float)
off         Calibration offset (64-bit IEEE float)
off_err     Error in calibration offset (64-bit IEEE float)
data_type   Constant representing the effective data type of the
            calibrated data (32-bit integer)
This tag points to a calibration record for the associated DFTAG_SD. The
data can be calibrated by first multiplying by the cal factor, then adding
the off value. Also included in the record are errors for the calibration
factor and offset and a constant indicating the effective data type of the
calibrated data. Table 6.6 lists the available data_type values.
Table 6.6    Available Calibrated Data Types

Data Type        Description
DFTNT_INT8       Signed 8-bit integer
DFTNT_UINT8      Unsigned 8-bit integer
DFTNT_INT16      Signed 16-bit integer
DFTNT_UINT16     Unsigned 16-bit integer
DFTNT_INT32      Signed 32-bit integer
DFTNT_UINT32     Unsigned 32-bit integer
DFTNT_FLOAT32    32-bit floating point
DFTNT_FLOAT64    64-bit floating point
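Applying the calibration rule above (multiply by cal, then add off)
can be sketched as follows. This is illustrative only, not HDF library
code; 16-bit raw data is chosen just for the example.

    #include <stddef.h>

    /* calibrated = cal * raw + off */
    static void calibrate(const short *raw, double *out, size_t n,
                          double cal, double off)
    {
        size_t i;

        for (i = 0; i < n; i++)
            out[i] = cal * (double)raw[i] + off;
    }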
DFTAG_FV    Fill value
? bytes (size determined by size of data NT in corresponding
DFTAG_SDD)
732 (0x02DC)

ref_no       Reference number (16-bit integer)
fill_value   Value representing unset data in the corresponding
             DFTAG_SD (size determined by size of data NT in
             corresponding DFTAG_SDD)
This tag points to a value which has been used to indicate unset values
in the associated DFTAG_SD. The number type of the value (and,
therefore, its size) is given in the corresponding DFTAG_SDD.
Vset Tags
DFTAG_VG    Vgroup
14 + 4*nelt + namelen + classlen bytes
1965 (0x07AD)

Layout: nelt | tag_1 ... tag_n | ref_1 ... ref_n | namelen | name |
classlen | class | extag | exref | version | more

ref_no      Reference number (16-bit integer)
nelt        Number of elements in the Vgroup (16-bit integer)
tag_n       Tag of the nth member of the Vgroup (16-bit integer)
ref_n       Reference number of the nth member of the Vgroup (16-bit
            integer)
namelen     Length of the name field (16-bit integer)
name        Non-null terminated ASCII string (length given by namelen)
classlen    Length of the class field (16-bit integer)
class       Non-null terminated ASCII string (length given by
            classlen)
extag       Extension tag (16-bit integer)
exref       Extension reference number (16-bit integer)
version     Version number of DFTAG_VG information (16-bit integer)
more        Unused (2 zero bytes)
DFTAG_VG provides a general-purpose grouping structure which can be
used to impose a hierarchical structure on the tags in the group. Any
HDF tag may be incorporated into a Vgroup, including other DFTAG_VG
tags.
See also:    "Vsets, Vdatas, and Vgroups" in Chapter 4,
             "Sets and Groups"
             NCSA HDF Vsets, Version 2.0 for HDF Version 3.2 and
             earlier
             NCSA HDF User's Guide and NCSA HDF Reference Manual for
             HDF Version 3.3
DFTAG_VH    Vdata description
22 + 10*nfields + ∑fldnmlen_n + namelen + classlen bytes
1962 (0x07AA)

Layout: interlace | nvert | ivsize | nfields | type_1 ... type_n |
isize_1 ... isize_n | offset_1 ... offset_n | order_1 ... order_n |
fldnmlen_1 fldnm_1 ... fldnmlen_n fldnm_n | namelen | name |
classlen | class | extag | exref | version | more

ref_no       Reference number (16-bit integer)
interlace    Constant indicating interlace scheme used (16-bit
             integer)
nvert        Number of entries in Vdata (32-bit integer)
ivsize       Size of one Vdata entry (16-bit integer)
nfields      Number of fields per entry in the Vdata (16-bit integer)
type_n       Constant indicating the data type of the nth field of the
             Vdata (16-bit integer)
isize_n      Size in bytes of the nth field of the Vdata (16-bit
             integer)
offset_n     Offset of the nth field within the Vdata (16-bit integer)
order_n      Order of the nth field of the Vdata (16-bit integer)
fldnmlen_n   Length of the nth field name string (16-bit integer)
fldnm_n      Non-null terminated ASCII string (length given by
             corresponding fldnmlen_n)
namelen      Length of the name field (16-bit integer)
name         Non-null terminated ASCII string (length given by
             namelen)
classlen     Length of the class field (16-bit integer)
class        Non-null terminated ASCII string (length given by
             classlen)
extag        Extension tag (16-bit integer)
exref        Extension reference number (16-bit integer)
version      Version number of DFTAG_VH information (16-bit integer)
more         Unused (2 zero bytes)
DFTAG_VH provides all the information necessary to process a
DFTAG_VS.
See also:    DFTAG_VS (this section)
             "Vsets, Vdatas, and Vgroups" in Chapter 4,
             "Sets and Groups"
             NCSA HDF Vsets, Version 2.0 for HDF Version 3.2 and
             earlier
             NCSA HDF User's Guide and NCSA HDF Reference Manual for
             HDF Version 3.3
DFTAG_VS    Vdata
nvert * ∑(n=1 to nfields) isize_n * order_n bytes, where nvert,
isize_n, and order_n are specified in the corresponding DFTAG_VH
1963 (0x07AB)

ref_no      Reference number (16-bit integer)
vdata       Data block interpreted according to the corresponding
            DFTAG_VH (nvert * ∑(n=1 to nfields) isize_n * order_n
            bytes, where nvert, isize_n, and order_n are specified in
            the corresponding DFTAG_VH)
DFTAG_VS contains a block of data which is to be interpreted according
to the information in the corresponding DFTAG_VH.
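The size formula above can be expressed directly in C, as in this
illustrative sketch (not library code; names are invented for the
example):

    /* Byte count of a DFTAG_VS block from its DFTAG_VH description:
       nvert * sum over all fields of (isize_n * order_n). */
    static long vdata_size(long nvert, int nfields,
                           const int *isize, const int *order)
    {
        long per_entry = 0;
        int n;

        for (n = 0; n < nfields; n++)
            per_entry += (long)isize[n] * order[n];
        return nvert * per_entry;
    }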
See also:    DFTAG_VH (this section)
             "Vsets, Vdatas, and Vgroups" in Chapter 4,
             "Sets and Groups"
             NCSA HDF Vsets, Version 2.0 for HDF Version 3.2 and
             earlier
             NCSA HDF User's Guide and NCSA HDF Reference Manual for
             HDF Version 3.3
Obsolete Tags
DFTAG_ID8    Image dimension-8
4 bytes
200 (0x00C8)

Layout: x_dim | y_dim

ref_no      Reference number (16-bit integer)
x_dim       Length of x dimension (16-bit integer)
y_dim       Length of y dimension (16-bit integer)
The data for this tag consists of two 16-bit integers representing the
width and height of an 8-bit raster image in bytes.
This tag has been superseded by DFTAG_ID.
DFTAG_IP8    Image palette-8
768 bytes
201 (0x00C9)

Layout (256 entries):
    Red    Green   Blue
    R0     G0      B0
    R1     G1      B1
    ...
    R255   G255    B255

ref_no          Reference number (16-bit integer)
Table entries   256 triples of 8-bit integers
The data for this tag can be thought of as a table of 256 entries, each
containing one value for red, green, and blue. The first triple is palette
entry 0 and the last is palette entry 255.
This tag has been superseded by DFTAG_LUT.
DFTAG_RI8    Raster image-8
xdim*ydim bytes (where xdim and ydim are the dimensions specified in
the corresponding DFTAG_ID8)
202 (0x00CA)

ref_no       Reference number (16-bit integer)
Image data   2-dimensional array of 8-bit integers
The data for this tag is a row-wise representation of the elementary 8-bit
image data. The data is stored width-first (i.e., row-wise) and is 8 bits
per pixel. The first byte of data represents the pixel in the upper-left
hand corner of the image.
This tag has been superseded by DFTAG_RI.
DFTAG_CI8    Compressed image-8
? bytes
203 (0x00CB)

ref_no             Reference number (16-bit integer)
compressed_image   Series of run-length encoded bytes
The data for this tag is a row-wise representation of the elementary
8-bit image data. Each row is compressed using the following
run-length encoding: each run begins with a count byte whose lower
seven bits give n. The high bit of the count byte indicates whether
the following n bytes are to be reproduced exactly (high bit = 0) or
whether the following single byte is to be reproduced n times (high
bit = 1). Since DFTAG_CI8 and DFTAG_RI8 are basically interchangeable,
it is suggested that you not have a DFTAG_CI8 and a DFTAG_RI8 with the
same reference number.

This tag has been superseded by DFTAG_RLE.
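A decoder for this scheme can be sketched as follows (illustrative
only, not the HDF library's implementation):

    #include <stdint.h>
    #include <stddef.h>

    /* Decodes the run-length scheme described above; returns the
       number of bytes written to out. */
    static size_t rle_decode(const uint8_t *in, size_t in_len,
                             uint8_t *out, size_t out_cap)
    {
        size_t ip = 0, op = 0;

        while (ip < in_len) {
            uint8_t count = in[ip++];
            uint8_t n = count & 0x7f;        /* lower seven bits */

            if (count & 0x80) {              /* repeat next byte n times */
                uint8_t v;
                if (ip >= in_len)
                    break;
                v = in[ip++];
                while (n-- > 0 && op < out_cap)
                    out[op++] = v;
            } else {                         /* copy next n bytes literally */
                while (n-- > 0 && ip < in_len && op < out_cap)
                    out[op++] = in[ip++];
            }
        }
        return op;
    }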
DFTAG_II8    IMCOMP image-8
? bytes
204 (0x00CC)

ref_no             Reference number (16-bit integer)
compressed_image   Compressed image data
The data for this tag is a 4:1 compressed 8-bit image, using the
IMCOMP compression scheme.
This tag has been superseded by DFTAG_IMC.
DFTAG_SDG    Scientific data group
n*4 bytes (where n is the number of data objects in the group)
700 (0x02BC)

Layout: tag_1 ref_1 | tag_2 ref_2 | ... | tag_n ref_n

ref_no      Reference number (16-bit integer)
tag_n       Tag number of nth member of the group (16-bit integer)
ref_n       Reference number of nth member of the group (16-bit
            integer)
The SDG data element contains a list of tag/refs that define a
scientific data set. All of the members of the group provide
information required to correctly interpret and display the data.
Application programs that deal with SDGs should read all of the
elements of an SDG and process those that they can use. Even if an
application cannot process all of the objects, the objects that it can
understand will be usable.
Table 6.7 lists the tags that may appear in an SDG.

Table 6.7    Available SDG Tags

Tag            Description
DFTAG_SDD      Scientific data dimension record (rank and dimensions)
DFTAG_SD       Scientific data
DFTAG_SDS      Scales
DFTAG_SDL      Labels
DFTAG_SDU      Units
DFTAG_SDF      Formats
DFTAG_SDM      Maximum and minimum values
DFTAG_SDC      Coordinate system
DFTAG_SDT      Transposition (obsolete)
DFTAG_SDLNK    Link to new DFTAG_NDG
Example:    DFTAG_SDD, DFTAG_SD, DFTAG_SDM
Assume that a dimension record, scientific data, and the maximum and
minimum values of the data are required to read and interpret a particular
data set. These data objects can be associated in an SDG so that an
application can read the rank and dimensions from the dimension record
and then read the data array. If the application needs the maximum and
minimum values, it will read them as well.
This tag has been superseded by DFTAG_NDG.
See also:    Chapter 4, "Sets and Groups"
DFTAG_SDT    Scientific data transpose
0 bytes
709 (0x02C5)

ref_no      Reference number (16-bit integer)
The presence of this tag in a group indicates that the data pointed to by
the corresponding DFTAG_SD is in column-major order, instead of the
default row-major order. No data is associated with this tag.
This tag is no longer written by the HDF library. When it is
encountered in an old file, it is interpreted as originally intended.
Chapter 7
Portability Issues
Chapter Overview
The NCSA implementation of HDF is accessible to both C and
FORTRAN programs and is implemented on many different machines
and several operating systems. There are important differences between
C and FORTRAN, and among implementations of each language,
especially FORTRAN. There are also important differences among the
machines and operating systems that HDF supports.
If HDF is to be a portable tool, these differences must be constructively
addressed. This chapter describes many of these differences, discusses
the problems and issues associated with them, and presents the methods
employed in the HDF implementation to reduce their impact.
The HDF Environment
The list of machines and operating systems on which HDF is
implemented is steadily growing. For reasons that this chapter will
make clear, the number of NCSA-supported HDF platforms is growing
slowly. Every time a platform is added, additional code must be written
to address concerns of memory management, operating system and file
system differences, number representations, and differences in
FORTRAN and C implementations on that system.
Supported Platforms

As of this writing, NCSA supports the platforms listed in Table 7.1.

Table 7.1    NCSA-supported HDF Platforms
Hardware Platform            Operating System
Convex                       Concentrix
Cray X-MP, Y-MP, Cray 2      UNICOS
DEC Alpha                    Ultrix
DECStation                   Ultrix
HP 9000                      HPUX
IBM PC                       MS DOS, Windows 3.1
IBM RS/6000                  AIX
IBM RT                       UNIX
Macintosh                    MPW Shell
NeXT                         NeXTStep
Silicon Graphics             UNIX
Sun Sparc                    UNIX
Vax                          VMS
HDF has also been ported to several platforms that NCSA does not
currently support. These include Alliant, Apollo (Domain), HP 3000,
Stellar, Amiga, Symbolics, Fujitsu, and IBM 3090 (MVS).
Language Standards
Unfortunately, not all compilers are the same. FORTRAN compilers
often differ in the ways they pass parameters, in the identifier naming
conventions they employ, and in the number types that they support.
Similarly, though generally not as drastically, C compilers differ in the
number types that they support and in their adherence to the ANSI C
standard.
To minimize the difficulties caused by these differences, the HDF
source code is written primarily in the following dialects:
• FORTRAN 77
• ANSI C
• The original C defined by Kernighan and Ritchie [1], hereafter
  referred to as old C
Almost all platforms have C and FORTRAN compilers that adhere to
at least one of these standards.
When time and resources permit, NCSA attempts to support features or
variations in other dialects of C and FORTRAN, particularly on
platforms that are important to NCSA users. Much of the remainder of
this chapter addresses these efforts.
Guidelines

One cannot overstress the importance of following the guidelines
outlined in this chapter. It may take longer to write code and it may
be difficult to adapt your coding style, but the long-term benefits,
in terms of portability and maintenance costs, will be well worth the
effort.

[1] The version of C described in the first edition of The C
Programming Language, by Brian Kernighan and Dennis Ritchie, published
by Prentice-Hall.
Organization of Source Files
Three types of files appear in the HDF source code directory:
• Header files
• Source code files
• A makefile
Header files and source code files are organized by application area. All
of the functions that apply to a particular application area are stored in
three source files, and all the definitions and declarations that apply to
that application are stored in a corresponding header file. The makefile
describes the dependencies among the source and header files and
provides the commands required to compile the corresponding libraries
and utilities.
Header Files
Certain application modules require header files. The header file
dfan.h, for example, contains definitions and declarations that are
unique to the annotation interface.
There are also several general header files that are used in compiling the
libraries for all application areas:
hdf.h, hdfi.h [2]
hdf.h contains declarations and definitions for the common data
structures used throughout HDF, definitions of the HDF tags,
definitions of error numbers, and definitions and declarations
specific to the general purpose interface. Since hdf.h depends on
hdfi.h, it includes hdfi.h via #include.
hdfi.h contains information specific to the various NCSA-supported HDF computing environments, environmental
parameters that need to be set to particular values when compiling
the HDF libraries, and machine dependent definitions of such
things as number types and macros for reading and writing
numbers.
When porting HDF to a new system, only hdfi.h and the
makefile should need to be modified, though there may be
exceptions.
It is normally a good idea to include hdf.h (and therefore
indirectly hdfi.h) in user programs, though users usually need
not be aware of its contents.
hproto.h
This file contains ANSI C prototypes for all HDF C routines. It
must be included in ANSI C programs that call HDF routines.
constants.i
This file is for use in FORTRAN programs. It contains important
constants, such as tag values, that are defined in hdf.h. Systems
with FORTRAN preprocessors might be able to include this file
via #include statements or their equivalent.
dffunc.i
This file is for use in FORTRAN programs. It contains
declarations of all HDF FORTRAN-callable functions. Systems
with FORTRAN preprocessors might be able to include this file
via #include statements or their equivalent.
[2] In earlier implementations of HDF, these files were called df.h
and dfi.h. Starting with HDF Version 3.2, the general purpose layer of
HDF was completely rewritten and all routine names were changed from
df* to hdf*.
Source Code Files
All HDF operations are performed by routines written in C. Hence, even
FORTRAN calls to HDF result in calls to the corresponding C routines.
Because of the problems described below, the relationships between the
C routines and the corresponding FORTRAN routines can be confusing.
This section discusses the C and FORTRAN source file organization. It
is followed by discussions of problems users will face in the
FORTRAN-C interface.
HDF interfaces typically have three or four associated files. For
example, the scientific data set (SDS) interface is associated with the
following files: dfsd.h, dfsd.c, dfsdf.c, and dfsdff.f.
These files fill the following roles:
Header files
The *.h files are header files.
Normal C routines
These routines do the actual HDF work. The others are used
to transfer control and data from a FORTRAN environment to
a C environment.
These routines are in the *.c files, as in dfsd.c. Every call
to HDF, whether from C or FORTRAN, ultimately results in
a call to one of these routines.
C routines that are directly callable from FORTRAN
These routines provide recognizable function names to the
linker. They may also perform operations on data they receive
from the FORTRAN routines that call them, such as
transferring a FORTRAN string to a local C data area.
Examples are provided below.
These routines are in the *f.c files, such as dfsdf.c. The
f means that the routines can be called from FORTRAN; the
.c means that they are C source code.
FORTRAN routines that perform some operation on the
parameters that C would be unable to perform, before and/or
after calling the corresponding C routine
These routines are required, for example, when one of the
parameters is a string. The corresponding C routine has no
way of knowing the length of the string unless it is explicitly
given the length by the FORTRAN routine.
These routines are in the *ff.f files, such as dfsdff.f.
The ff means that the routines perform some FORTRAN
operation that C cannot perform and that they are to be called
from FORTRAN; the .f means that they are FORTRAN
source code.
The roles of these different types of source file types will become
clearer as we look at some of the problems that arise in interfacing C
and many different implementations of FORTRAN.
File naming conventions
The naming conventions for HDF library source code files are
complicated by several factors. Because HDF must accommodate a
wide variety of platforms, all files that will compile to object modules
must have names that are unique in the first 8 characters, ignoring case.
The difficulties involved in maintaining a FORTRAN-callable interface
to a library that is primarily written in C further complicate the naming
of source code files.
Passing Strings Between FORTRAN and C
One of the most important differences between FORTRAN and C
compilers is in the way strings are represented. Different compilers use
different data structures for strings, and supply string length information
in different ways.
Passing Strings from FORTRAN to C
When strings are passed between FORTRAN and C routines, they may need
to be converted from one representation to the other. C compilers
store strings in an array of type char, terminated by a null byte
(\0). The name of a string variable is equivalent to a pointer to the
first character in the string. FORTRAN compilers are not consistent in
the ways that they store strings.
Two pieces of information must be acquired before FORTRAN can pass a
string to C:
• The string's length
• The string's address
The string’s length is determined by invoking the standard FORTRAN
function len(), which returns the length of a string. Since C expects
a null byte at the end of a string, care must be taken that this null byte
does not overwrite useful information in the FORTRAN string.
Determining the string’s address is more difficult because of the
different ways that different FORTRAN implementations store strings.
The macro _fcdtocp (FORTRAN character descriptor to C pointer) is
used to acquire this information. _fcdtocp is one of the elements that
must be customized for each platform. The following paragraphs
discuss several existing customized implementations:
• UNICOS FORTRAN stores strings in a structure called _fcd
(FORTRAN character descriptor). _fcdtocp is a built-in UNICOS
function that returns the string’s address. (Since UNICOS provides
this function, HDF omits the corresponding macro definition on
UNICOS systems.)
• VMS FORTRAN uses a string descriptor structure that provides the
string’s address and length. When compiled under VMS, _fcdtocp
extracts the string's address from that structure.
• Most other FORTRAN compilers supported by HDF store strings just as
  C does, in character arrays with the array name identifying the
  array's address. In such situations, nothing special needs to be
  done to pass a string from FORTRAN to C, except to add a null byte.
An HDF FORTRAN call that involves passing a string results in the
following sequence of actions:
1. A FORTRAN filter routine determines the length and address in
   memory of the string. Since this filter is a FORTRAN routine, it
   can be found in the appropriate *ff.f file.
2. The FORTRAN filter then calls a C routine, to which it passes all
   parameters from the initial call, plus the string's length.
3. The C routine converts the FORTRAN string to a C string by copying
   it to a C array of type char and appending a null byte. Since this
   C routine serves as a link between a FORTRAN filter and the
   corresponding C interface call, it can be found in the appropriate
   *f.c file.
4. This C routine then calls the HDF C routine that performs the
   actual work.

This process is illustrated in Figure 7.1.
Figure 7.1. Sequence of Events When a FORTRAN Call Includes a String
as a Parameter

User's program:

    ...
    ret = dsgdim('myfile', rank, ...)
    ...

The user's FORTRAN program calls dsgdim; the parameter 'myfile' is a
string.

In the HDF library (libdf.a):

dfsdff.f:
    dsgdim()
        ...
        dsigdim(filename, rank, ..., len(filename))
        ...

The FORTRAN function dsgdim calls the C function dsigdim, adding an
extra parameter: the length of the filename parameter.

dfsdf.c:
    dsigdim()
        ...
        DFSDgetdims(fn, prank, ...)
        ...

dsigdim converts the FORTRAN string stored in filename to a C string,
then calls DFSDgetdims.

dfsd.c:
    DFSDgetdims()

DFSDgetdims performs the actual HDF function, getting the rank and
dimensions of the next scientific data set in the file.
Passing Strings from C to FORTRAN
When strings are passed from C to FORTRAN, the reverse procedure is
followed. First, a string pointer is allocated within the FORTRAN
routine's data area. (It is assumed that the space pointed to has already
been allocated, and is sufficiently large to hold the string.) The string
is then copied from the C data area to the FORTRAN data area.
Finally, the FORTRAN string's data area is padded with blanks, if
necessary.
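That copy-and-pad step might look like the following sketch
(illustrative only, not the library's actual routine; the name c2fstr
is invented for the example):

    #include <string.h>

    /* Copy a C string into a FORTRAN character variable of length
       flen, blank-padded as described above. */
    static void c2fstr(char *fstr, int flen, const char *cstr)
    {
        int n = (int)strlen(cstr);

        if (n > flen)
            n = flen;
        memcpy(fstr, cstr, (size_t)n);
        memset(fstr + n, ' ', (size_t)(flen - n));  /* pad with blanks */
    }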
Function Return Values between FORTRAN and C
When a FORTRAN routine calls a C function, it always expects a
return value from that function. Unfortunately, C functions do not
always return arguments in a FORTRAN-compatible format.
To solve this problem, some FORTRAN compilers offer the option of
controlling the form of the return value from a function. For example,
Language Systems FORTRAN for the Macintosh requires that all C
function declarations be prepended by the word pascal so that the
return value can be recognized by a FORTRAN routine that calls it, as
in:
pascal int dsgrang(void *pmax, void *pmin)
Since C always expects return values to be passed by value rather than,
say, by reference, it is important to coerce FORTRAN functions to do
the same. This is accomplished by defining a macro FRETVAL that is
prepended to the declaration of every FORTRAN-callable C function.
For example:
FRETVAL(int)
dsgrang(void *pmax, void *pmin)
If Language Systems FORTRAN is to be used, FRETVAL is defined in
hdfi.h as follows:
#if defined(MAC)    /* with LS FORTRAN */
#   define FRETVAL(x) pascal x
#endif
Differences in Routine Names

HDF generally employs standard C conventions in naming routines. But
many FORTRAN compilers impose varying restrictions on the length,
character set, and form of identifiers, some of which are considerably
more restrictive than the C conventions. Therefore, an extra effort
must be made to accommodate those FORTRAN compilers.

To address this issue, HDF defines a set of preprocessor flags in
hdfi.h. Then conditional compilation, with #ifdef statements in the
source code, produces routine names that the target system's FORTRAN
will understand.
Case Sensitivity
C compilers are case sensitive; uppercase and lowercase letters are
recognized as different characters. Many FORTRAN compilers are not
case sensitive; they allow users to use uppercase and lowercase letters
while naming routines in the source code, but the names are converted
to all uppercase or all lowercase in the object module symbol tables.
Routine name recognition problems are common when routines
compiled by a case sensitive compiler are to be linked with routines
compiled by a non-case sensitive compiler.
For example, the UNICOS FORTRAN compiler allows you to name
routines without regard to case, but produces object module symbol
tables with the routine names in all uppercase. UNICOS C, on the
other hand, performs no such conversion.
Consider the HDF routine Hopen. Hopen is written in C, so the HDF
library symbol table contains the name Hopen. Suppose you make the
following call in your UNICOS FORTRAN program:
file_id = Hopen('myfile', ...)
The FORTRAN compiler will create an object module symbol table
with the routine name HOPEN. When you link it to the HDF library, it
will find Hopen but not HOPEN, and will generate an unsatisfied
external reference error.
HDF supports the following non-case sensitive compilers:
• VMS FORTRAN
• UNICOS FORTRAN
• Language Systems FORTRAN.
All of these compilers convert identifiers to all uppercase when building
an object module symbol table. In the following discussion, they are
referred to as all-uppercase compilers.
The HDF Solution
HDF addresses the all-uppercase compiler problem in the platform-specific section of hdfi.h where the DF_CAPFNAMES flag is defined.
With conditional compilation, HDF generates all-uppercase routine
names and symbol table entries.
Once again, consider UNICOS. The UNICOS section of hdfi.h
contains the following line:
#define DF_CAPFNAMES
The *f.c files contain corresponding conditional sections that produce
all-uppercase routine names. For example, the function name Fun can
be redefined as FUN:
#ifdef DF_CAPFNAMES
#   define Fun FUN
#endif /* DF_CAPFNAMES */
Appended Underscores
Differing compiler conventions create a similar problem in their use of
the underscore ( _ ) character. Many compilers, including most C
compilers, prepend an underscore to all external symbols in the object
module symbol table. The linker then looks for external symbols in
other symbol tables with the prefixed underscore.
Many FORTRAN compilers also append an underscore to identify
external symbols. Since C compilers do not generally do this, external
references in FORTRAN-generated object modules will not recognize
externals with the same names in C-generated modules.
For example, the FORTRAN compiler on the CONVEX system places
an underscore both at the beginning and at the end of routine names,
while the C compiler places an underscore only at the beginning.
Since FUN is a C function, it appears under the name _FUN in the
object module containing it. Now suppose you make the following
call in a FORTRAN program:
x = FUN(y)
The FORTRAN compiler will create an object module symbol table
with the routine name _FUN_. When you link it to the C module, the
linker will be unable to link _FUN and _FUN_ and will generate an
unsatisfied external reference error.
The HDF Solution
Like the all-uppercase compiler problem, this issue is resolved in the
platform-specific sections of hdfi.h and with conditional sections of
code that append an underscore to C routine names on platforms where
the FORTRAN compiler expects it.
This is implemented as follows: The FNAME_POST_UNDERSCORE flag is
defined in the platform-specific section of hdfi.h for every platform
whose FORTRAN compiler requires appended underscores. Similarly,
the FNAME_PRE_UNDERSCORE flag is defined on platforms where the
FORTRAN compiler expects prepended underscores. The macro FNAME
is then defined to append and/or prepend underscores as required.
The FNAME macro is then applied to each routine in the module in which
it is actually defined (including in hproto.h), adding the appropriate
underscores.
Consider the above example in which Fun was renamed FUN. The
actual definition appears as follows:
#ifdef DF_CAPFNAMES
#   define Fun FNAME(FUN)
#endif /* DF_CAPFNAMES */
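For ANSI C compilers, the effect of FNAME can be sketched with token
pasting, keyed off the FNAME_PRE_UNDERSCORE and FNAME_POST_UNDERSCORE
flags described above. This is illustrative only; the library's actual
definition in hdfi.h must also cover pre-ANSI preprocessors, so it may
differ in form.

    #if defined(FNAME_PRE_UNDERSCORE) && defined(FNAME_POST_UNDERSCORE)
    #   define FNAME(x) _##x##_
    #elif defined(FNAME_PRE_UNDERSCORE)
    #   define FNAME(x) _##x
    #elif defined(FNAME_POST_UNDERSCORE)
    #   define FNAME(x) x##_
    #else
    #   define FNAME(x) x
    #endif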
Short Names vs. Long Names
In the C implementations supported by HDF, identifiers may be any
length with at least the first 31 characters being significant.
FORTRAN compilers differ in the maximum lengths of identifiers that
they allow, but all of those supported by HDF allow identifiers to be at
least seven characters long.
To deal with the discrepancies between identifier lengths allowed by C
and those allowed by the various FORTRAN compilers, a set of
equivalent short names has been created for use when programming in
FORTRAN. For every HDF routine with a name more than seven
characters long, there is an identical routine whose name is seven or
fewer characters long.
For example, the routines DFSDgetdims (in dfsd.c) and dsgdims
(in dfsdff.f) are functionally identical.
Differences Between ANSI C and Old C
The current HDF release supports both ANSI C and old C compilers.
ANSI C is preferred because it has many features that help ensure
portability; unfortunately, many important platforms do not support
full ANSI C. The HDF code determines whether ANSI C is available
from the flag __STDC__. If ANSI C is available on a platform, then
__STDC__ is defined by the compiler. [3]
The most noticeable difference between ANSI C and old C is in the way
functions are declared. For example, in ANSI C the function
DFSDsetdims() is declared with a single line:
int DFSDsetdims(intn rank, int32 dimsizes[])
In old C the same function is declared as follows:
int DFSDsetdims(rank, dimsizes)
intn rank;
int32 dimsizes[];
HDF accommodates these differences by defining the flag PROTOTYPE
in hdfi.h. PROTOTYPE is used for every function declaration in a
manner similar to the following example:
#ifdef PROTOTYPE
int DFSDsetdims(intn rank, int32 dimsizes[])
#else
int DFSDsetdims(rank, dimsizes)
intn rank;
int32 dimsizes[];
#endif /* PROTOTYPE */
Note that prototypes are supported by some C compilers that are not
otherwise ANSI-conformant. In such situations, PROTOTYPE is defined
even though __STDC__ is not.
Another difference between old C and ANSI C is that ANSI C supports
function prototypes with arguments. (Old C also supports function
prototypes, but without the argument list.) This feature helps in
detecting errors in the number and types of arguments. This difference
is handled by means of a macro PROTO, which is defined as follows:
#ifdef PROTOTYPE
#define PROTO(x) x
#else
#define PROTO(x) ()
#endif
This macro is applied as in the following example:
extern int32 Hopen
PROTO((char *path, intn access, int16 ndds));
When PROTOTYPE is defined, PROTO causes the argument list to stay
as it is. When PROTOTYPE is not defined, PROTO causes the argument
list to disappear.
[3] __STDC__ is generally defined by ANSI-conforming C compilers. Some
C compilers are not entirely ANSI-conforming, yet they conform well
enough that the HDF implementation can treat them as if they were. In
such cases, it is permissible to define __STDC__ by adding the option
-D__STDC__ to the cc line in the makefile.
Type Differences
Platforms and compilers also differ in the sizes of numbers that they
assign to different data types, in their representations of different
number types, and in the way they organize aggregates of numbers
(especially structures).
Size differences
The same number type can be different sizes on different platforms.
The type int, for example, is 16 bits to many IBM PC compilers, 48
bits to some supercomputer compilers, and 32 bits on most others.
This can cause problems that are difficult to diagnose in code, like the
HDF code, that depends in many places on numbers being the right
size.
HDF handles this problem by fully defining all variable types and
function data types via typedef, including the number of bits
occupied. All parameters, members of structures, and static,
automatic, and external variables are so defined.
The HDF data types include the following (types with the prefix u are
unsigned):

    int8     uint8
    int16    uint16
    int32    uint32
    float32  float64
    intn     uintn
For each machine, typedefs are declared that map all of the data types
used into the best available types. For example, int32 is defined as
follows for Sun's C compiler:
typedef long int int32;
Unfortunately, the HDF data types do not always map exactly to one of
the native data types. For example, the Cray UNICOS C compiler does
not support a 16-bit data type. In such instances, HDF uses the best
available match and care is taken to minimize potential problems.
The data types intn and uintn are for situations where it can be
determined that number type size is unimportant and that a 16-bit
integer is large enough to hold any value the number can have. In
such cases, the native integer type (or unsigned integer type) of the host
machine is used. Experience indicates that substantial performance
gains can be achieved by using intn or uintn in certain
circumstances.
Number Representation
One of the keys to producing a portable file format is to ensure that
numbers that are represented differently on different machines are
converted correctly when moved from machine to machine. HDF
provides conversion routines to convert between native representations
and a standard representation that is actually used in the HDF file. This
ensures that HDF data will always be interpreted correctly, regardless of
the platform on which it is read or written. Details of this process will
be included in a later edition of this manual.
Byte-order and Structure Representations
Even when the basic bit-representation of constants or aggregates like
structures is the same across platforms, the ways that the bits are
packed into a word and the order in which the bits are laid out can differ.
For example, DEC and Intel-based machines generally order bytes
differently from most others. And the C compiler on a Cray, with a
64-bit word, packs structures differently from those on 32-bit word
machines.
Differences in byte order among machines are handled in either of two
ways. When the data to be written (or read) includes non-integer data
and/or a large array of any type of data, conversion routines mentioned
in the previous section, “Number Representation,” are invoked. When
an individual integer is to be written (or read), an ENCODE or DECODE
macro is used.
The following ENCODE and DECODE macros are available for 16-bit and
32-bit integers:

    INT16ENCODE     INT16DECODE
    UINT16ENCODE    UINT16DECODE
    INT32ENCODE     INT32DECODE
    UINT32ENCODE    UINT32DECODE
The ENCODE macros write integers to an HDF file in a standard
format regardless of the word-size and byte order of the host machine.
Likewise, the DECODE macros read integers from a standard format in
an HDF file and provide the integers in the required byte order and word
size to the host machine.
Since the ENCODE and DECODE macros deal with both byte order and
word size, they are also used in reading and writing record-like
structures. For example, an HDF data descriptor consists of two 16-bit
fields followed by two 32-bit fields, as implied by the following C
declaration:
struct {
    uint16  tag;
    uint16  ref;
    uint32  offset;
    uint32  length;
}
Even though this structure might occupy 12 bytes on one platform or
32 bytes on another (e.g., a Cray), it must occupy exactly 12 bytes in
an HDF file. Furthermore, some machines represent the numbers
internally in different byte orders than others, but the byte order must
always be big-endian in an HDF file. The ENCODE and DECODE
macros ensure that these values are always represented correctly in HDF
files and as presented to any host machine.
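The flavor of these macros can be sketched as follows. This is
illustrative only; the actual definitions live in the machine-specific
sections of hdfi.h. Here p is a pointer of type uint8 * (using the HDF
typedefs above) that is advanced past the bytes it reads or writes,
and the file's big-endian order is produced regardless of host byte
order:

    #define UINT16ENCODE(p, i) \
        { *(p)++ = (uint8)(((uint16)(i) >> 8) & 0xff); \
          *(p)++ = (uint8)((uint16)(i) & 0xff); }

    #define UINT16DECODE(p, i) \
        { (i)  = (uint16)((*(p)++ & 0xff) << 8); \
          (i) |= (uint16)(*(p)++ & 0xff); }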
Access to Library Functions
Despite standardization efforts, function libraries often differ in
significant ways. At least three types of functions require special
treatment in the HDF implementation:
File I/O
Some platforms use 16-bit values for the element size and the
number of elements to write or read, while others use 32-bit
values. This must be considered when working with either stream
or system level I/O functions (i.e., the functions associated with
the fopen() and open() calls).
Memory allocation and release
First, 16-bit machines use a 16-bit value to indicate the number of
bytes to allocate or release at one time. Second, certain operating
systems (notably MS Windows and MAC/OS) don't have
malloc() and free() calls. These operating systems use
handles for allocating memory and require different function calls.
Memory and string manipulation
These functions (e.g., memcpy(), memcmp(), strcpy(), and
strlen()) require slightly different function names under different
memory models in MS DOS and under MS Windows than on
most other systems.
HDF accommodates these special situations by defining appropriate
macros in the machine-specific sections of hdfi.h.
Appendix A
Tags and Extended Tag Labels
The tables in this appendix list all of the NCSA-supported HDF tags
and the labels used to identify extended tags.
Tags
Table A.1 lists all the NCSA-supported HDF tags with the following
information:
Tag
The tag itself
Tag number
The regular tag number, in decimal with hexadecimal in parentheses
Extended tag number
The extended tag number used with linked blocks and external data
elements, in decimal with hexadecimal in parentheses
Full name
The tag name, a descriptive English phrase
Section
The section of Chapter 6, “Tag Specifications,” in
which the tag is discussed
Table A.1
NCSA-supported HDF Tags
Tag              Number           Extended Number   Full Name                             Section
DFTAG_AR         312 (0x0138)                       Aspect ratio                          Raster Image Tags
DFTAG_CAL        731 (0x02DB)                       Calibration information               Scientific Data Set Tags
DFTAG_CCN        310 (0x0136)                       Color correction                      Raster Image Tags
DFTAG_CFM        311 (0x0137)                       Color format                          Raster Image Tags
DFTAG_CI8        203 (0x00CB)                       Compressed image-8                    Obsolete Tags
DFTAG_DIA        105 (0x0069)                       Data identifier annotation            Annotation Tags
DFTAG_DIL        104 (0x0068)                       Data identifier label                 Annotation Tags
DFTAG_DRAW       400 (0x0190)                       Draw                                  Composite Image Tags
DFTAG_FD         101 (0x0065)                       File description                      Annotation Tags
DFTAG_FID        100 (0x0064)                       File identifier                       Annotation Tags
DFTAG_FV         732 (0x02DC)                       Fill value                            Scientific Data Set Tags
DFTAG_GREYJPEG   14 (0x000E)                        8-bit JPEG compression information    Compression Tags
DFTAG_ID         300 (0x012C)                       Image dimension                       Raster Image Tags
DFTAG_ID8        200 (0x00C8)                       Image dimension-8                     Obsolete Tags
DFTAG_II8        204 (0x00CC)                       IMCOMP image-8                        Obsolete Tags
DFTAG_IMC        12 (0x000C)                        IMCOMP compressed data                Compression Tags
DFTAG_IP8        201 (0x00C9)                       Image palette-8                       Obsolete Tags
DFTAG_JPEG       13 (0x000D)                        24-bit JPEG compression information   Compression Tags
DFTAG_LD         307 (0x0133)                       LUT dimension                         Raster Image Tags
DFTAG_LUT        301 (0x012D)                       Lookup table                          Raster Image Tags
DFTAG_MA         309 (0x0135)                       Matte channel                         Raster Image Tags
DFTAG_MD         308 (0x0134)                       Matte channel dimension               Raster Image Tags
DFTAG_MT         107 (0x006B)                       Machine type                          Utility Tags
DFTAG_NDG        720 (0x02D0)                       Numeric data group                    Scientific Data Set Tags
DFTAG_NT         106 (0x006A)                       Number type                           Utility Tags
DFTAG_NULL       1 (0x0001)                         No data                               Utility Tags
DFTAG_RI         302 (0x012E)     16686 (0x412E)    Raster image                          Raster Image Tags
DFTAG_RI8        202 (0x00CA)                       Raster image-8                        Obsolete Tags
DFTAG_RIG        306 (0x0132)                       Raster image group                    Raster Image Tags
DFTAG_RLE        11 (0x000B)                        Run length encoded data               Compression Tags
DFTAG_SD         702 (0x02BE)     17086 (0x42BE)    Scientific data                       Scientific Data Set Tags
DFTAG_SDC        708 (0x02C4)                       Scientific data coordinates           Scientific Data Set Tags
DFTAG_SDD        701 (0x02BD)                       Scientific data dimension record      Scientific Data Set Tags
DFTAG_SDF        706 (0x02C2)                       Scientific data format                Scientific Data Set Tags
DFTAG_SDG        700 (0x02BC)                       Scientific data group                 Obsolete Tags
DFTAG_SDL        704 (0x02C0)                       Scientific data labels                Scientific Data Set Tags
DFTAG_SDLNK      710 (0x02C6)                       Scientific data set link              Scientific Data Set Tags
DFTAG_SDM        707 (0x02C3)                       Scientific data max/min               Scientific Data Set Tags
DFTAG_SDS        703 (0x02BF)                       Scientific data scales                Scientific Data Set Tags
DFTAG_SDT        709 (0x02C5)                       Scientific data transpose             Obsolete Tags
DFTAG_SDU        705 (0x02C1)                       Scientific data units                 Scientific Data Set Tags
DFTAG_T105       603 (0x25B)                        Tektronix 4105                        Vector Image Tags
DFTAG_T14        602 (0x25A)                        Tektronix 4014                        Vector Image Tags
DFTAG_TD         103 (0x0067)                       Tag description                       Annotation Tags
DFTAG_TID        102 (0x0066)                       Tag identifier                        Annotation Tags
DFTAG_VERSION    30 (0x001E)                        Library version number                Utility Tags
DFTAG_VG         1965 (0x07AD)                      Vgroup                                Vset Tags
DFTAG_VH         1962 (0x07AA)                      Vdata description                     Vset Tags
DFTAG_VS         1963 (0x07AB)    18347 (0x47AB)    Vdata                                 Vset Tags
DFTAG_XYP        500 (0x01F4)                       X-Y position                          Composite Image Tags
Extended Tag Labels
Table A.2 lists labels used to identify HDF extended tags. The table
includes the following information:
Extended tag label
The label, which appears as the first element of the
extended tag description record
Physical storage method
The alternative storage method indicated by the label
Table A.2    Extended Tag Labels

Extended Tag Label    Physical Storage Method
EXT_EXTERN            External file element
EXT_LINKED            Linked block element
Preferences:
• All numbers shown in DECIMAL (if not stated otherwise)
• First byte in a file has No. 0000
• Abbreviations:
      hex. = hexadecimal, dec. = decimal
      LSB  = least significant byte
      MSB  = most significant byte
      NN   = value varies
Structure of a PCS file:

***** Header *****

Byte (hex)   Value (hex)    Meaning
0000         32             lead-in char., always hex 32
0001         02 or 03       small hoop = 02, large = 03
0002-0003    10 00          No. of colors, ALWAYS 16
0004-0043    NN NN NN NN    color definition, 4 bytes per color,
                            16 colors
0044-0045    NN NN          No. of stitches in file, LSB first.
                            Max. 65536. Stitch count does NOT (!)
                            include color changes
***** Stitches and color change records *****
*****        each nine bytes long       *****

0046-NNNN     stitch and color change records

***** stitch record *****
NNNN-NNNN     00 XX XX XX 00 YY YY YY 00
    -or-
NNNN-NNNN     00 XX XX XX 00 YY YY YY 02
              x and y coordinates, LSB first

***** color change record *****
NNNN-NNNN     02 00 00 00 00 00 00 00 03
              First byte gives the appropriate number of the
              color; last byte = 03 marks the record as a
              color change.
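Because every record is exactly nine bytes, a reader can walk the record
area in fixed strides and dispatch on the last byte. A hedged sketch of
decoding one record, following the layouts above (struct and function
names are ours):

    #include <stdint.h>

    /* One decoded 9-byte PCS record, per the tables above. */
    typedef struct {
        int  is_color_change;   /* last byte == 03 */
        long x, y;              /* stitch coordinates, LSB first */
        int  color;             /* color number, for color-change records */
    } PcsRecord;

    static PcsRecord decode_record(const uint8_t r[9])
    {
        PcsRecord out = { 0, 0, 0, 0 };
        if (r[8] == 0x03) {                      /* color change record */
            out.is_color_change = 1;
            out.color = r[0];
        } else {                                 /* stitch record (trailer 00 or 02) */
            out.x = (long)r[1] | ((long)r[2] << 8) | ((long)r[3] << 16);
            out.y = (long)r[5] | ((long)r[6] << 8) | ((long)r[7] << 16);
        }
        return out;
    }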
***** Bitmap file-name and design description *****

Def.:  MMMM = (stitch count + No. of color changes) * 9

0046+MMMM     Bitmap filename used as pattern for
              digitizing (12 characters), 00 terminated

Def.:  DDDD = MMMM + 13

0046+DDDD     File description, 00 terminated
This concludes the structure description of a typical PCS file. The hex
dump of a 2-stitch, 1-color-change PCS file designed for the 80x80 hoop
(small) with the bitmap pattern 'FEDER.BMP' and the file description
'Schnatzka Bubublick' follows:
Hex-dump of a 2-stitch design file:

Byte  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000  32 02 10 00 00 00 00 00 00 00 80 00 00 00 FF 00
0010  00 80 80 00 00 FF FF 00 80 00 80 00 FF 00 FF 00
0020  80 00 00 00 FF 00 00 00 00 80 00 00 00 FF 00 00
0030  80 80 00 00 FF FF 00 00 80 80 80 00 C0 C0 C0 00
0040  FF FF FF 00 02 00 00 00 00 00 00 00 00 00 03 00
0050  73 00 00 00 27 01 00 00 00 33 01 00 00 81 00 00
0060  00 46 45 44 45 52 2E 42 4D 50 00 00 00 00 53 63
0070  68 6E 61 74 7A 6B 61 20 42 75 62 75 62 6C 69 63
0080  6B 00
Here's the formatted TEXT dump of the same file:

Byte  0123456789ABCDEF
0000  2 ∑∑∑∑∑∑∑«∑∑∑_∑
0010  ∑««∑∑__∑«∑«∑_∑_∑
0020  «∑∑∑_∑∑∑∑«∑∑∑_∑∑
0030  ««∑∑__∑∑«««∑+++∑
0040  ___∑ ∑∑∑∑∑∑∑∑∑ ∑
0050  s∑∑∑' ∑∑∑3 ∑∑¸∑∑
0060  ∑FEDER.BMP∑∑∑∑Sc
0070  hnatzka Bubublic
0080  k∑
INTERNET-DRAFT
Portable Network Graphics
Expires: 10 Dec 1996
T. Boutell, et al
Boutell.Com, Inc.
10 June 1996
PNG (Portable Network Graphics) Specification
Version 1.0
File draft-boutell-png-spec-04.txt
Status of This Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).
Distribution of this memo is unlimited.
Abstract
This document describes PNG (Portable Network Graphics), an
extensible file format for the lossless, portable, well-compressed
storage of raster images. PNG provides a patent-free replacement for
GIF and can also replace many common uses of TIFF. Indexed-color,
grayscale, and truecolor images are supported, plus an optional alpha
channel. Sample depths range from 1 to 16 bits.
PNG is designed to work well in online viewing applications, such as
the World Wide Web, so it is fully streamable with a progressive
display option. PNG is robust, providing both full file integrity
checking and simple detection of common transmission errors. Also,
PNG can store gamma and chromaticity data for improved color matching
on heterogeneous platforms.
This specification defines a proposed Internet Media Type image/png.
Table of Contents

   1. Introduction
   2. Data Representation
      2.1. Integers and byte order
      2.2. Color values
      2.3. Image layout
      2.4. Alpha channel
      2.5. Filtering
      2.6. Interlaced data order
      2.7. Gamma correction
      2.8. Text strings
   3. File Structure
      3.1. PNG file signature
      3.2. Chunk layout
      3.3. Chunk naming conventions
      3.4. CRC algorithm
   4. Chunk Specifications
      4.1. Critical Chunks
         4.1.1. IHDR Image Header
         4.1.2. PLTE Palette
         4.1.3. IDAT Image Data
         4.1.4. IEND Image Trailer
      4.2. Ancillary Chunks
         4.2.1. bKGD Background Color
         4.2.2. cHRM Primary Chromaticities and White Point
         4.2.3. gAMA Image Gamma
         4.2.4. hIST Image Histogram
         4.2.5. pHYs Physical Pixel Dimensions
         4.2.6. sBIT Significant Bits
         4.2.7. tEXt Textual Data
         4.2.8. tIME Image Last-Modification Time
         4.2.9. tRNS Transparency
         4.2.10. zTXt Compressed Textual Data
      4.3. Summary of Standard Chunks
      4.4. Additional Chunk Types
   5. Deflate/Inflate Compression
   6. Filter Algorithms
      6.1. Filter type 0: None
      6.2. Filter type 1: Sub
      6.3. Filter type 2: Up
      6.4. Filter type 3: Average
      6.5. Filter type 4: Paeth
   7. Chunk Ordering Rules
      7.1. Behavior of PNG editors
      7.2. Ordering of ancillary chunks
      7.3. Ordering of critical chunks
   8. Miscellaneous Topics
      8.1. File name extension
      8.2. Internet media type
      8.3. Macintosh file layout
      8.4. Multiple-image extension
      8.5. Security considerations
   9. Recommendations for Encoders
      9.1. Bit depth scaling
      9.2. Encoder gamma handling
      9.3. Encoder color handling
      9.4. Alpha channel creation
      9.5. Suggested palettes
      9.6. Filter selection
      9.7. Text chunk processing
      9.8. Use of private chunks
      9.9. Private type and method codes
   10. Recommendations for Decoders
      10.1. Error checking
      10.2. Pixel dimensions
      10.3. Truecolor image handling
      10.4. Bit depth rescaling
      10.5. Decoder gamma handling
      10.6. Decoder color handling
      10.7. Background color
      10.8. Alpha channel processing
      10.9. Progressive display
      10.10. Suggested-palette and histogram usage
      10.11. Text chunk processing
   11. Glossary
   12. Appendix: Rationale
      12.1. Why a new file format?
      12.2. Why these features?
      12.3. Why not these features?
      12.4. Why not use format X?
      12.5. Byte order
      12.6. Interlacing
      12.7. Why gamma?
      12.8. Non-premultiplied alpha
      12.9. Filtering
      12.10. Text strings
      12.11. PNG file signature
      12.12. Chunk layout
      12.13. Chunk naming conventions
      12.14. Palette histograms
   13. Appendix: Gamma Tutorial
   14. Appendix: Color Tutorial
   15. Appendix: Sample CRC Code
   16. Appendix: Online Resources
   17. Appendix: Revision History
   18. References
   19. Credits
1. Introduction
The PNG format provides a portable, legally unencumbered, well-
compressed, well-specified standard for lossless bitmapped image
files.
Although the initial motivation for developing PNG was to replace
GIF, the design provides some useful new features not available in
GIF, with minimal cost to developers.
GIF features retained in PNG include:
* Indexed-color images of up to 256 colors.
* Streamability: files can be read and written serially, thus
allowing the file format to be used as a communications
protocol for on-the-fly generation and display of images.
* Progressive display: a suitably prepared image file can be
displayed as it is received over a communications link,
yielding a low-resolution image very quickly followed by
gradual improvement of detail.
* Transparency: portions of the image can be marked as
transparent, creating the effect of a nonrectangular image.
* Ancillary information: textual comments and other data can be
stored within the image file.
* Complete hardware and platform independence.
* Effective, 100% lossless compression.
Important new features of PNG, not available in GIF, include:
* Truecolor images of up to 48 bits per pixel.

* Grayscale images of up to 16 bits per pixel.

* Full alpha channel (general transparency masks).

* Image gamma information, which supports automatic display of
  images with correct brightness/contrast regardless of the
  machines used to originate and display the image.
* Reliable, straightforward detection of file corruption.
* Faster initial presentation in progressive display mode.
PNG is designed to be:
* Simple and portable: developers should be able to implement PNG
easily.
* Legally unencumbered: to the best knowledge of the PNG authors,
no algorithms under legal challenge are used. (Some
considerable effort has been spent to verify this.)
* Well compressed: both indexed-color and truecolor images are
compressed as effectively as in any other widely used lossless
format, and in most cases more effectively.
* Interchangeable: any standard-conforming PNG decoder will read
all conforming PNG files.
* Flexible: the format allows for future extensions and private
add-ons, without compromising interchangeability of basic PNG.
* Robust: the design supports full file integrity checking as
well as simple, quick detection of common transmission errors.
The main part of this specification gives the definition of the file
format and recommendations for encoder and decoder behavior. An
appendix gives the rationale for many design decisions. Although the
rationale is not part of the formal specification, reading it can
help implementors understand the design. Cross-references in the
main text point to relevant parts of the rationale. Additional
appendixes, also not part of the formal specification, provide
tutorials on gamma and color theory as well as other supporting
material.
See Rationale: Why a new file format? (Section 12.1), Why these
features? (Section 12.2), Why not these features? (Section 12.3), Why
not use format X? (Section 12.4).
Pronunciation
PNG is pronounced "ping".
2. Data Representation
This chapter discusses basic data representations used in PNG files,
as well as the expected representation of the image data.
2.1. Integers and byte order
All integers that require more than one byte will be in network
byte order: the most significant byte comes first, then the less
significant bytes in descending order of significance (MSB LSB for
two-byte integers, B3 B2 B1 B0 for four-byte integers). The
highest bit (value 128) of a byte is numbered bit 7; the lowest
bit (value 1) is numbered bit 0. Values are unsigned unless
otherwise noted. Values explicitly noted as signed are represented
in two's complement notation.
See Rationale: Byte order (Section 12.5).
2.2. Color values
Colors can be represented by either grayscale or RGB (red, green,
blue) sample data. Grayscale data represents luminance; RGB data
represents calibrated color information (if the cHRM chunk is
present) or uncalibrated device-dependent color (if cHRM is
absent). All color values range from zero (representing black) to
most intense at the maximum value for the bit depth. Note that
the maximum value at a given bit depth is (2^bitdepth)-1, not
2^bitdepth.
Sample values are not necessarily linear; the gAMA chunk specifies
the gamma characteristic of the source device, and viewers are
strongly encouraged to compensate properly.  See Gamma correction
(Section 2.7).
Source data with a precision not directly supported in PNG (for
example, 5 bit/sample truecolor) must be scaled up to the next
higher supported bit depth. This scaling is reversible with no
loss of data, and it reduces the number of cases that decoders
must cope with. See Recommendations for Encoders: Bit depth
scaling (Section 9.1) and Recommendations for Decoders: Bit depth
rescaling (Section 10.4).
2.3. Image layout
Conceptually, a PNG image is a rectangular pixel array, with
pixels appearing left-to-right within each scanline, and scanlines
appearing top-to-bottom. (For progressive display purposes, the
data may actually be transmitted in a different order; see
Interlaced data order, Section 2.6.) The size of each pixel is
determined by the bit depth, which is the number of bits per
sample in the image data.
Three types of pixel are supported:
* An indexed-color pixel is represented by a single sample
that is an index into a supplied palette. The image bit
depth determines the maximum number of palette entries, but
not the color precision within the palette.
* A grayscale pixel is represented by a single sample that is
a grayscale level, where zero is black and the largest value
for the bit depth is white.
* A truecolor pixel is represented by three samples: red (zero
= black, max = red) appears first, then green (zero = black,
max = green), then blue (zero = black, max = blue). The bit
depth specifies the size of each sample, not the total pixel
size.
Optionally, grayscale and truecolor pixels can also include an
alpha sample, as described in the next section.
Pixels are always packed into scanlines with no wasted bits
between pixels. Pixels smaller than a byte never cross byte
boundaries; they are packed into bytes with the leftmost pixel in
the high-order bits of a byte, the rightmost in the low-order
bits. Permitted bit depths and pixel types are restricted so that
in all cases the packing is simple and efficient.
PNG permits multi-sample pixels only with 8- and 16-bit samples,
so multiple samples of a single pixel are never packed into one
byte. 16-bit samples are stored in network byte order (MSB
first).
Scanlines always begin on byte boundaries.
When pixels have fewer than 8 bits and the scanline width is not
evenly divisible by the number of pixels per byte, the low-order
bits in the last byte of each scanline are wasted.  The contents
of these wasted bits are unspecified.
An additional "filter type" byte is added to the beginning of
every scanline (see Filtering, Section 2.5). The filter type byte
is not considered part of the image data, but it is included in
the datastream sent to the compression step.
2.4. Alpha channel
An alpha channel, representing transparency information on a
per-pixel basis, can be included in grayscale and truecolor PNG
images.
An alpha value of zero represents full transparency, and a value
of (2^bitdepth)-1 represents a fully opaque pixel. Intermediate
values indicate partially transparent pixels that can be combined
with a background image to yield a composite image. (Thus, alpha
is really the degree of opacity of the pixel. But most people
refer to alpha as providing transparency information, not opacity
information, and we continue that custom here.)
Alpha channels can be included with images that have either 8 or
16 bits per sample, but not with images that have fewer than 8
bits per sample. Alpha samples are represented with the same bit
depth used for the image samples. The alpha sample for each pixel
is stored immediately following the grayscale or RGB samples of
the pixel.
The color values stored for a pixel are not affected by the alpha
value assigned to the pixel. This rule is sometimes called
"unassociated" or "non-premultiplied" alpha. (Another common
technique is to store sample values premultiplied by the alpha
fraction; in effect, such an image is already composited against a
black background. PNG does not use premultiplied alpha.)
Transparency control is also possible without the storage cost of
a full alpha channel. In an indexed-color image, an alpha value
can be defined for each palette entry. In grayscale and truecolor
images, a single pixel value can be identified as being
"transparent". These techniques are controlled by the tRNS
ancillary chunk type.
If neither an alpha channel nor a tRNS chunk is present, all
pixels in the image are to be treated as fully opaque.
Viewers can support transparency control partially, or not at all.
See Rationale: Non-premultiplied alpha (Section 12.8),
Recommendations for Encoders: Alpha channel creation (Section
9.4), and Recommendations for Decoders: Alpha channel processing
(Section 10.8).
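Since PNG alpha is non-premultiplied opacity, compositing against a
background is the usual linear blend. A sketch for one 8-bit sample
(the helper name and the rounding choice are ours, not part of the
specification):

    #include <stdint.h>

    /* Blend a non-premultiplied foreground sample over a background
       sample; alpha 0 = fully transparent, 255 = fully opaque. */
    static uint8_t composite8(uint8_t fg, uint8_t bg, uint8_t alpha)
    {
        unsigned t = (unsigned)alpha * fg + (unsigned)(255 - alpha) * bg;
        return (uint8_t)((t + 127) / 255);   /* divide by 255, rounded */
    }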
2.5. Filtering
PNG allows the image data to be filtered before it is compressed.
Filtering can improve the compressibility of the data. The filter
step itself does not reduce the size of the data. All PNG filters
are strictly lossless.
PNG defines several different filter algorithms, including "none"
which indicates no filtering. The filter algorithm is specified
for each scanline by a filter type byte that precedes the filtered
scanline in the precompression datastream. An intelligent encoder
can switch filters from one scanline to the next. The method for
choosing which filter to employ is up to the encoder.
See Filter Algorithms (Chapter 6) and Rationale: Filtering
(Section 12.9).
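As a taste of the mechanism, here is a hedged sketch of undoing filter
type 1 (Sub), which Chapter 6 defines as Recon(x) = Filt(x) +
Recon(x-bpp) modulo 256 (function name ours):

    #include <stddef.h>
    #include <stdint.h>

    /* Undo the Sub filter in place on one scanline of len bytes;
       bpp is the byte distance between corresponding samples of
       adjacent pixels.  Bytes before the first full pixel have no
       left neighbor and are left unchanged. */
    static void unfilter_sub(uint8_t *line, size_t len, size_t bpp)
    {
        for (size_t i = bpp; i < len; i++)
            line[i] = (uint8_t)(line[i] + line[i - bpp]);   /* mod 256 */
    }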
2.6. Interlaced data order
A PNG image can be stored in interlaced order to allow progressive
display. The purpose of this feature is to allow images to "fade
in" when they are being displayed on-the-fly. Interlacing
slightly expands the file size on average, but it gives the user a
meaningful display much more rapidly. Note that decoders are
required to be able to read interlaced images, whether or not they
actually perform progressive display.
With interlace method 0, pixels are stored sequentially from left
to right, and scanlines sequentially from top to bottom (no
interlacing).
Interlace method 1, known as Adam7 after its author, Adam M.
Costello, consists of seven distinct passes over the image. Each
pass transmits a subset of the pixels in the image. The pass in
which each pixel is transmitted is defined by replicating the
following 8-by-8 pattern over the entire image, starting at the
upper left corner:
1 6 4 6 2 6 4 6
7 7 7 7 7 7 7 7
5 6 5 6 5 6 5 6
7 7 7 7 7 7 7 7
3 6 4 6 3 6 4 6
7 7 7 7 7 7 7 7
5 6 5 6 5 6 5 6
7 7 7 7 7 7 7 7
Within each pass, the selected pixels are transmitted left to
right within a scanline, and selected scanlines sequentially from
top to bottom. For example, pass 2 contains pixels 4, 12, 20,
etc. of scanlines 0, 8, 16, etc. (numbering from 0,0 at the upper
left corner). The last pass contains the entirety of scanlines 1,
3, 5, etc.
The data within each pass is laid out as though it were a complete
image of the appropriate dimensions. For example, if the complete
image is 16 by 16 pixels, then pass 3 will contain two scanlines,
each containing four pixels. When pixels have fewer than 8 bits,
each such scanline is padded as needed to fill an integral number
of bytes (see Image layout, Section 2.3). Filtering is done on
this reduced image in the usual way, and a filter type byte is
transmitted before each of its scanlines (see Filter Algorithms,
Chapter 6). Notice that the transmission order is defined so that
all the scanlines transmitted in a pass will have the same number
of pixels; this is necessary for proper application of some of the
filters.
Caution: If the image contains fewer than five columns or fewer
than five rows, some passes will be entirely empty. Encoder and
decoder authors must be careful to handle this case correctly. In
particular, filter type bytes are only associated with nonempty
scanlines; no filter type bytes are present in an empty pass.
See Rationale: Interlacing (Section 12.6) and Recommendations for
Decoders: Progressive display (Section 10.9).
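The pass membership of any pixel follows directly from replicating the
8-by-8 pattern shown above. A minimal sketch (the table is transcribed
from that pattern; the function name is ours):

    /* Pass number (1..7) in which pixel (x, y) is transmitted,
       numbering from (0,0) at the upper left corner. */
    static int adam7_pass(unsigned long x, unsigned long y)
    {
        static const int pattern[8][8] = {
            { 1, 6, 4, 6, 2, 6, 4, 6 },
            { 7, 7, 7, 7, 7, 7, 7, 7 },
            { 5, 6, 5, 6, 5, 6, 5, 6 },
            { 7, 7, 7, 7, 7, 7, 7, 7 },
            { 3, 6, 4, 6, 3, 6, 4, 6 },
            { 7, 7, 7, 7, 7, 7, 7, 7 },
            { 5, 6, 5, 6, 5, 6, 5, 6 },
            { 7, 7, 7, 7, 7, 7, 7, 7 },
        };
        return pattern[y & 7][x & 7];
    }

For example, adam7_pass(4, 0) returns 2, matching the note above that
pass 2 contains pixels 4, 12, 20, etc. of scanlines 0, 8, 16, etc.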
2.7. Gamma correction
PNG images can specify, via the gAMA chunk, the gamma
characteristic of the image with respect to the original scene.
Display programs are strongly encouraged to use this information,
plus information about the display device they are using and room
lighting, to present the image to the viewer in a way that
reproduces what the image's original author saw as closely as
possible. See Gamma Tutorial (Chapter 13) if you aren't already
familiar with gamma issues.
Gamma correction is not applied to the alpha channel, if any.
Alpha samples always represent a linear fraction of full opacity.
For high-precision applications, the exact chromaticity of the RGB
data in a PNG image can be specified via the cHRM chunk, allowing
more accurate color matching than gamma correction alone will
provide. See Color Tutorial (Chapter 14) if you aren't already
familiar with color representation issues.
See Rationale: Why gamma? (Section 12.7), Recommendations for
Encoders: Encoder gamma handling (Section 9.2), and
Recommendations for Decoders: Decoder gamma handling (Section
10.5).
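One common reading of the recommended decoder handling (Section 10.5)
folds the file gamma and the display exponent into a single decoding
exponent. A sketch for an 8-bit sample, assuming a conventional display
exponent such as 2.2 (names ours):

    #include <math.h>
    #include <stdint.h>

    /* Map a decoded 8-bit sample to a display sample using the
       combined exponent 1/(file_gamma * display_exponent). */
    static uint8_t gamma_correct8(uint8_t sample,
                                  double file_gamma,        /* from gAMA */
                                  double display_exponent)  /* e.g. 2.2 */
    {
        double v = pow(sample / 255.0,
                       1.0 / (file_gamma * display_exponent));
        return (uint8_t)(v * 255.0 + 0.5);
    }

In practice a decoder would build a 256-entry lookup table once rather
than calling pow for every sample.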
2.8. Text strings
A PNG file can store text associated with the image, such as an
image description or copyright notice. Keywords are used to
indicate what each text string represents.
ISO 8859-1 (Latin-1) is the character set recommended for use in
text strings [ISO-8859].  This character set is a superset of
7-bit ASCII.
Character codes not defined in Latin-1 should not be used, because
they have no platform-independent meaning. If a non-Latin-1 code
does appear in a PNG text string, its interpretation will vary
across platforms and decoders. Some systems might not even be
able to display all the characters in Latin-1, but most modern
systems can.
Provision is also made for the storage of compressed text.
See Rationale: Text strings (Section 12.10).
3. File Structure
A PNG file consists of a PNG signature followed by a series of
chunks. This chapter defines the signature and the basic properties
of chunks. Individual chunk types are discussed in the next chapter.
3.1. PNG file signature
The first eight bytes of a PNG file always contain the following
(decimal) values:
137 80 78 71 13 10 26 10
This signature indicates that the remainder of the file contains a
single PNG image, consisting of a series of chunks beginning with
an IHDR chunk and ending with an IEND chunk.
See Rationale: PNG file signature (Section 12.11).
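Checking the signature is a plain byte comparison. A minimal sketch
(the array and function names are ours):

    #include <string.h>

    /* The eight signature bytes, decimal 137 80 78 71 13 10 26 10. */
    static const unsigned char png_signature[8] =
        { 137, 80, 78, 71, 13, 10, 26, 10 };

    /* Nonzero if buf (at least 8 bytes) begins with the signature. */
    static int has_png_signature(const unsigned char *buf)
    {
        return memcmp(buf, png_signature, 8) == 0;
    }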
3.2. Chunk layout
Each chunk consists of four parts:
Length
A 4-byte unsigned integer giving the number of bytes in the
chunk's data field. The length counts only the data field, not
itself, the chunk type code, or the CRC. Zero is a valid
length. Although encoders and decoders should treat the length
as unsigned, its value must not exceed (2^31)-1 bytes.
Chunk Type
A 4-byte chunk type code. For convenience in description and
in examining PNG files, type codes are restricted to consist of
uppercase and lowercase ASCII letters (A-Z and a-z, or 65-90
and 97-122 decimal). However, encoders and decoders must treat
the codes as fixed binary values, not character strings. For
example, it would not be correct to represent the type code
IDAT by the EBCDIC equivalents of those letters. Additional
naming conventions for chunk types are discussed in the next
section.
Chunk Data
The data bytes appropriate to the chunk type, if any.  This
field can be of zero length.
CRC
A 4-byte CRC (Cyclic Redundancy Check) calculated on the
preceding bytes in the chunk, including the chunk type code and
chunk data fields, but not including the length field. The CRC
is always present, even for chunks containing no data. See CRC
algorithm (Section 3.4).
The chunk data length can be any number of bytes up to the
maximum; therefore, implementors cannot assume that chunks are
aligned on any boundaries larger than bytes.
Chunks can appear in any order, subject to the restrictions placed
on each chunk type. (One notable restriction is that IHDR must
appear first and IEND must appear last; thus the IEND chunk serves
as an end-of-file marker.) Multiple chunks of the same type can
appear, but only if specifically permitted for that type.
See Rationale: Chunk layout (Section 12.12).
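A reader that honors this layout needs only a big-endian 32-bit fetch.
A hedged sketch that locates the four parts of a chunk in a buffer,
without validating the CRC (struct and names ours; bounds checking
omitted):

    #include <stdint.h>

    typedef struct {
        uint32_t       length;  /* counts the data field only */
        const uint8_t *type;    /* 4 bytes, fixed binary values */
        const uint8_t *data;    /* length bytes, possibly zero */
        uint32_t       crc;     /* covers type and data, not length */
    } PngChunk;

    /* Read a 4-byte unsigned integer in network byte order. */
    static uint32_t read_u32be(const uint8_t *p)
    {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }

    static PngChunk parse_chunk(const uint8_t *p)
    {
        PngChunk c;
        c.length = read_u32be(p);
        c.type   = p + 4;
        c.data   = p + 8;
        c.crc    = read_u32be(p + 8 + c.length);
        return c;
    }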
3.3. Chunk naming conventions
Chunk type codes are assigned so that a decoder can determine some
properties of a chunk even when it does not recognize the type
code. These rules are intended to allow safe, flexible extension
of the PNG format, by allowing a decoder to decide what to do when
it encounters an unknown chunk. The naming rules are not normally
of interest when the decoder does recognize the chunk's type.
Four bits of the type code, namely bit 5 (value 32) of each byte,
are used to convey chunk properties. This choice means that a
human can read off the assigned properties according to whether
each letter of the type code is uppercase (bit 5 is 0) or
lowercase (bit 5 is 1). However, decoders should test the
properties of an unknown chunk by numerically testing the
specified bits; testing whether a character is uppercase or
lowercase is inefficient, and even incorrect if a locale-specific
case definition is used.
It is worth noting that the property bits are an inherent part of
the chunk name, and hence are fixed for any chunk type. Thus,
TEXT and Text would be unrelated chunk type codes, not the same
chunk with different properties. Decoders should recognize type
codes by a simple four-byte literal comparison; it is incorrect to
perform case conversion on type codes.
The semantics of the property bits are:
Ancillary bit: bit 5 of first byte
0 (uppercase) = critical, 1 (lowercase) = ancillary.
Chunks that are not strictly necessary in order to meaningfully
display the contents of the file are known as "ancillary"
chunks. A decoder encountering an unknown chunk in which the
ancillary bit is 1 can safely ignore the chunk and proceed to
display the image. The time chunk (tIME) is an example of an
ancillary chunk.
Chunks that are necessary for successful display of the file's
contents are called "critical" chunks. A decoder encountering
an unknown chunk in which the ancillary bit is 0 must indicate
to the user that the image contains information it cannot
safely interpret. The image header chunk (IHDR) is an example
of a critical chunk.
Private bit: bit 5 of second byte
0 (uppercase) = public, 1 (lowercase) = private.
A public chunk is one that is part of the PNG specification or
is registered in the list of PNG special-purpose public chunk
types. Applications can also define private (unregistered)
chunks for their own purposes. The names of private chunks
must have a lowercase second letter, while public chunks will
always be assigned names with uppercase second letters. Note
that decoders do not need to test the private-chunk property
bit, since it has no functional significance; it is simply an
administrative convenience to ensure that public and private
chunk names will not conflict. See Additional Chunk Types
(Section 4.4) and Recommendations for Encoders: Use of private
chunks (Section 9.8).
Reserved bit: bit 5 of third byte
Must be 0 (uppercase) always.
The significance of the case of the third letter of the chunk
name is reserved for possible future expansion. At the present
time all chunk names must have uppercase third letters.
(Decoders should not complain about a lowercase third letter,
however, as some future version of the PNG specification could
define a meaning for this bit. It is sufficient to treat a
chunk with a lowercase third letter in the same way as any
other unknown chunk type.)
Safe-to-copy bit: bit 5 of fourth byte
0 (uppercase) = unsafe to copy, 1 (lowercase) = safe to copy.
This property bit is not of interest to pure decoders, but it
is needed by PNG editors (programs that modify PNG files).
This bit defines the proper handling of unrecognized chunks in
a file that is being modified.
If a chunk's safe-to-copy bit is 1, the chunk may be copied to
a modified PNG file whether or not the software recognizes the
chunk type, and regardless of the extent of the file
modifications.
If a chunk's safe-to-copy bit is 0, it indicates that the chunk
depends on the image data. If the program has made any changes
to critical chunks, including addition, modification, deletion,
or reordering of critical chunks, then unrecognized unsafe
chunks must not be copied to the output PNG file. (Of course,
if the program does recognize the chunk, it can choose to
output an appropriately modified version.)
A PNG editor is always allowed to copy all unrecognized chunks
if it has only added, deleted, modified, or reordered ancillary
chunks. This implies that it is not permissible for ancillary
chunks to depend on other ancillary chunks.
PNG editors that do not recognize a critical chunk must report
an error and refuse to process that PNG file at all. The
safe/unsafe mechanism is intended for use with ancillary
chunks. The safe-to-copy bit will always be 0 for critical
chunks.
Rules for PNG editors are discussed further in Chunk Ordering
Rules (Chapter 7).
For example, the hypothetical chunk type name "bLOb" has the
property bits:
bLOb  <-- 32 bit chunk type code represented in text form
||||
|||+- Safe-to-copy bit is 1 (lower case letter; bit 5 is 1)
||+-- Reserved bit is 0     (upper case letter; bit 5 is 0)
|+--- Private bit is 0      (upper case letter; bit 5 is 0)
+---- Ancillary bit is 1    (lower case letter; bit 5 is 1)
Therefore, this name represents an ancillary, public, safe-to-copy
chunk.
See Rationale: Chunk naming conventions (Section 12.13).
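Following the advice above, the property bits should be tested
numerically rather than with locale-dependent case functions. A
minimal sketch (macro and function names ours):

    #define PNG_PROP_BIT 0x20u   /* bit 5 of each type byte */

    static int is_ancillary(const unsigned char t[4])     { return (t[0] & PNG_PROP_BIT) != 0; }
    static int is_private(const unsigned char t[4])       { return (t[1] & PNG_PROP_BIT) != 0; }
    static int reserved_bit_set(const unsigned char t[4]) { return (t[2] & PNG_PROP_BIT) != 0; }
    static int is_safe_to_copy(const unsigned char t[4])  { return (t[3] & PNG_PROP_BIT) != 0; }

For the "bLOb" example above, is_ancillary and is_safe_to_copy return
nonzero while is_private and reserved_bit_set return zero.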
3.4. CRC algorithm
Chunk CRCs are calculated using standard CRC methods with pre- and
post-conditioning, as defined by ISO 3309 [ISO-3309] or ITU-T V.42
[ITU-V42]. The CRC polynomial employed is
x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1
The 32-bit CRC register is initialized to all 1's, and then the
data from each byte is processed from the least significant bit
(1) to the most significant bit (128). After all the data bytes
are processed, the CRC register is inverted (its ones complement
is taken). This value is transmitted (stored in the file) MSB
first. For the purpose of separating into bytes and ordering, the
least significant bit of the 32-bit CRC is defined to be the
coefficient of the x^31 term.
Practical calculation of the CRC always employs a precalculated
table to greatly accelerate the computation. See Sample CRC Code
(Chapter 15).
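A table-driven implementation in the spirit of the sample code in
Chapter 15 might look as follows; 0xEDB88320 is the bit-reversed form
of the polynomial above, which matches the LSB-first bit order
described (names ours):

    #include <stddef.h>
    #include <stdint.h>

    static uint32_t crc_table[256];
    static int table_ready = 0;

    static void make_crc_table(void)
    {
        for (uint32_t n = 0; n < 256; n++) {
            uint32_t c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
            crc_table[n] = c;
        }
        table_ready = 1;
    }

    /* Register starts as all 1's and is inverted at the end. */
    static uint32_t png_crc(const uint8_t *buf, size_t len)
    {
        uint32_t c = 0xFFFFFFFFu;
        if (!table_ready)
            make_crc_table();
        for (size_t i = 0; i < len; i++)
            c = crc_table[(c ^ buf[i]) & 0xFF] ^ (c >> 8);
        return c ^ 0xFFFFFFFFu;
    }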
4. Chunk Specifications
This chapter defines the standard types of PNG chunks.
4.1. Critical Chunks
All implementations must understand and successfully render the
standard critical chunks. A valid PNG image must contain an IHDR
chunk, one or more IDAT chunks, and an IEND chunk.
4.1.1. IHDR Image Header
The IHDR chunk must appear FIRST.  It contains:

   Width:               4 bytes
   Height:              4 bytes
   Bit depth:           1 byte
   Color type:          1 byte
   Compression method:  1 byte
   Filter method:       1 byte
   Interlace method:    1 byte
Width and height give the image dimensions in pixels. They are
4-byte integers. Zero is an invalid value. The maximum for each
is (2^31)-1 in order to accommodate languages that have
difficulty with unsigned 4-byte values.
Bit depth is a single-byte integer giving the number of bits
per sample (not per pixel, except when a pixel contains just
one sample).  Valid values are 1, 2, 4, 8, and 16, although
not all values are allowed for all color types.

Color type is a single-byte integer that describes the
interpretation of the image data.  Color type codes represent
sums of the following values: 1 (palette used), 2 (color used),
and 4 (alpha channel used).  Valid values are 0, 2, 3, 4, and 6.
Bit depth restrictions for each color type are imposed to
simplify implementations and to prohibit combinations that do
not compress well. Decoders must support all legal
combinations of bit depth and color type. The allowed
combinations are:
   Color    Allowed       Interpretation
   Type     Bit Depths

   0        1,2,4,8,16    Each pixel is a grayscale sample.

   2        8,16          Each pixel is an R,G,B triple.

   3        1,2,4,8       Each pixel is a palette index;
                          a PLTE chunk must appear.

   4        8,16          Each pixel is a grayscale sample,
                          followed by an alpha sample.

   6        8,16          Each pixel is an R,G,B triple,
                          followed by an alpha sample.
Compression method is a single-byte integer that indicates the
method used to compress the image data. At present, only
compression method 0 (deflate/inflate compression with a 32K
sliding window) is defined. All standard PNG images must be
compressed with this scheme. The compression method field is
provided for possible future expansion or proprietary variants.
Decoders must check this byte and report an error if it holds
an unrecognized code. See Deflate/Inflate Compression (Chapter
5) for details.
Filter method is a single-byte integer that indicates the
preprocessing method applied to the image data before
compression. At present, only filter method 0 (adaptive
filtering with five basic filter types) is defined. As with
the compression method field, decoders must check this byte and
report an error if it holds an unrecognized code. See Filter
Algorithms (Chapter 6) for details.
Interlace method is a single-byte integer that indicates the
transmission order of the image data. Two values are currently
defined: 0 (no interlace) or 1 (Adam7 interlace). See
Interlaced data order (Section 2.6) for details.
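Pulling the seven fields out of the 13-byte IHDR data field, plus the
bit depth/color type check from the table above, might look like this
sketch (struct and helper names ours):

    #include <stdint.h>

    typedef struct {
        uint32_t width, height;
        uint8_t  bit_depth, color_type;
        uint8_t  compression, filter, interlace;
    } Ihdr;

    /* 4-byte big-endian fetch, as in the Section 3.2 sketch. */
    static uint32_t be32(const uint8_t *p)
    {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }

    static Ihdr parse_ihdr(const uint8_t d[13])
    {
        Ihdr h;
        h.width       = be32(d);
        h.height      = be32(d + 4);
        h.bit_depth   = d[8];
        h.color_type  = d[9];
        h.compression = d[10];
        h.filter      = d[11];
        h.interlace   = d[12];
        return h;
    }

    /* Allowed combinations from the table above. */
    static int depth_ok(uint8_t color_type, uint8_t depth)
    {
        switch (color_type) {
        case 0:  return depth == 1 || depth == 2 || depth == 4 ||
                        depth == 8 || depth == 16;
        case 3:  return depth == 1 || depth == 2 || depth == 4 ||
                        depth == 8;
        case 2:
        case 4:
        case 6:  return depth == 8 || depth == 16;
        default: return 0;
        }
    }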
4.1.2. PLTE Palette
The PLTE chunk contains from 1 to 256 palette entries, each a
three-byte series of the form:
   Red:   1 byte (0 = black, 255 = red)
   Green: 1 byte (0 = black, 255 = green)
   Blue:  1 byte (0 = black, 255 = blue)

The number of entries is determined from the chunk length.  A
chunk length not divisible by 3 is an error.
This chunk must appear for color type 3, and can appear for
color types 2 and 6; it is not allowed for color types 0 and 4.
If this chunk does appear, it must precede the first IDAT
chunk. There cannot be more than one PLTE chunk.
For color type 3 (indexed color), the PLTE chunk is required.
The first entry in PLTE is referenced by pixel value 0, the
second by pixel value 1, etc. The number of palette entries
must not exceed the range that can be represented in the image
bit depth (for example, 2^4 = 16 for a bit depth of 4). It is
permissible to have fewer entries than the bit depth would
allow. In that case, any out-of-range pixel value found in the
image data is an error.
For color types 2 and 6 (truecolor and truecolor with alpha),
the PLTE chunk is optional. If present, it provides a
suggested set of from 1 to 256 colors to which the truecolor
image can be quantized if the viewer cannot display truecolor
directly. If PLTE is not present, such a viewer must select
colors on its own, but it is often preferable for this to be
done once by the encoder. (See Recommendations for Encoders:
Suggested palettes, Section 9.5.)
Note that the palette uses 8 bits (1 byte) per sample
regardless of the image bit depth specification. In
particular, the palette is 8 bits deep even when it is a
suggested quantization of a 16-bit truecolor image.
There is no requirement that the palette entries all be used by
the image, nor that they all be different.
4.1.3. IDAT Image Data
The IDAT chunk contains the actual image data.
To create this
data:
* Begin with image scanlines represented as described in
Image layout (Section 2.3); the layout and total size of
this raw data are determined by the fields of IHDR.
* Filter the image data according to the filtering method
specified by the IHDR chunk. (Note that with filter
method 0, the only one currently defined, this implies
prepending a filter type byte to each scanline.)
* Compress the filtered data using the compression method
specified by the IHDR chunk.
The IDAT chunk contains the output datastream of the
compression algorithm.
To read the image data, reverse this process.
There can be multiple IDAT chunks; if so, they must appear
consecutively with no other intervening chunks. The compressed
datastream is then the concatenation of the contents of all the
IDAT chunks. The encoder can divide the compressed datastream
into IDAT chunks however it wishes. (Multiple IDAT chunks are
allowed so that encoders can work in a fixed amount of memory;
typically the chunk size will correspond to the encoder's
buffer size.) It is important to emphasize that IDAT chunk
boundaries have no semantic significance and can occur at any
point in the compressed datastream. A PNG file in which each
IDAT chunk contains only one data byte is legal, though
remarkably wasteful of space. (For that matter, zero-length
IDAT chunks are legal, though even more wasteful.)
See Filter Algorithms (Chapter 6) and Deflate/Inflate
Compression (Chapter 5) for details.
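Since IDAT boundaries carry no meaning, a decoder can simply
concatenate all IDAT data fields and hand the result to a zlib-format
inflater. A hedged sketch using the zlib library's inflate API,
assuming the whole stream fits in memory (error handling kept
minimal; real decoders stream this instead):

    #include <zlib.h>

    /* Inflate the concatenated IDAT contents (src, src_len) into a
       caller-provided buffer of the expected filtered-data size.
       Returns 0 on success. */
    static int inflate_idat(const unsigned char *src, unsigned long src_len,
                            unsigned char *dst, unsigned long dst_len)
    {
        z_stream zs = {0};              /* zalloc/zfree/opaque = Z_NULL */
        int ret;

        if (inflateInit(&zs) != Z_OK)
            return -1;
        zs.next_in   = (unsigned char *)src;
        zs.avail_in  = (uInt)src_len;
        zs.next_out  = dst;
        zs.avail_out = (uInt)dst_len;
        ret = inflate(&zs, Z_FINISH);
        inflateEnd(&zs);
        return ret == Z_STREAM_END ? 0 : -1;
    }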
4.1.4. IEND Image Trailer
The IEND chunk must appear LAST. It marks the end of the PNG
datastream. The chunk's data field is empty.
4.2. Ancillary Chunks
All ancillary chunks are optional, in the sense that encoders need
not write them and decoders can ignore them. However, encoders
are encouraged to write the standard ancillary chunks when the
information is available, and decoders are encouraged to interpret
these chunks when appropriate and feasible.
The standard ancillary chunks are listed in alphabetical order.
This is not necessarily the order in which they would appear in a
file.
4.2.1. bKGD Background Color
The bKGD chunk specifies a default background color to present
the image against. Note that viewers are not bound to honor
this chunk; a viewer can choose to use a different background.
For color type 3 (indexed color), the bKGD chunk contains:
   Palette index:  1 byte
The value is the palette index of the color to be used as
background.
For color types 0 and 4 (grayscale, with or without alpha),
bKGD contains:
   Gray:  2 bytes, range 0 .. (2^bitdepth)-1
(For consistency, 2 bytes are used regardless of the image bit
depth.) The value is the gray level to be used as background.
For color types 2 and 6 (truecolor, with or without alpha),
bKGD contains:
   Red:   2 bytes, range 0 .. (2^bitdepth)-1
   Green: 2 bytes, range 0 .. (2^bitdepth)-1
   Blue:  2 bytes, range 0 .. (2^bitdepth)-1
(For consistency, 2 bytes per sample are used regardless of the
image bit depth.) This is the RGB color to be used as
background.
When present, the bKGD chunk must precede the first IDAT chunk,
and must follow the PLTE chunk, if any.
See Recommendations for Decoders: Background color (Section
10.7).
4.2.2. cHRM Primary Chromaticities and White Point
Applications that need device-independent specification of
colors in a PNG file can use the cHRM chunk to specify the 1931
CIE x,y chromaticities of the red, green, and blue primaries
used in the image, and the referenced white point. See Color
Tutorial (Chapter 14) for more information.
The cHRM chunk contains:
   White Point x: 4 bytes
   White Point y: 4 bytes
   Red x:         4 bytes
   Red y:         4 bytes
   Green x:       4 bytes
   Green y:       4 bytes
   Blue x:        4 bytes
   Blue y:        4 bytes
Each value is encoded as a 4-byte unsigned integer,
representing the x or y value times 100000. For example, a
value of 0.3127 would be stored as the integer 31270.
cHRM is allowed in all PNG files, although it is of little
value for grayscale images.
If the encoder does not know the chromaticity values, it should
not write a cHRM chunk; the absence of a cHRM chunk indicates
that the image's primary colors are device-dependent.
If the cHRM chunk appears, it must precede the first IDAT
chunk, and it must also precede the PLTE chunk if present.
See Recommendations for Encoders: Encoder color handling
(Section 9.3), and Recommendations for Decoders: Decoder color
handling (Section 10.6).
4.2.3. gAMA Image Gamma
The gAMA chunk specifies the gamma of the camera (or simulated
camera) that produced the image, and thus the gamma of the
image with respect to the original scene. More precisely, the
gAMA chunk encodes the file_gamma value, as defined in Gamma
Tutorial (Chapter 13).
The gAMA chunk contains:
Image gamma: 4 bytes
The value is encoded as a 4-byte unsigned integer, representing
gamma times 100000. For example, a gamma of 0.45 would be
stored as the integer 45000.
If the encoder does not know the image's gamma value, it should
not write a gAMA chunk; the absence of a gAMA chunk indicates
that the gamma is unknown.
If the gAMA chunk appears, it must precede the first IDAT
chunk, and it must also precede the PLTE chunk if present.
See Gamma correction (Section 2.7), Recommendations for
Encoders: Encoder gamma handling (Section 9.2), and
Recommendations for Decoders: Decoder gamma handling (Section
10.5).
4.2.4. hIST Image Histogram
The hIST chunk gives the approximate usage frequency of each
color in the color palette. A histogram chunk can appear only
when a palette chunk appears. If a viewer is unable to provide
all the colors listed in the palette, the histogram may help it
decide how to choose a subset of the colors for display.
The hIST chunk contains a series of 2-byte (16 bit) unsigned
integers. There must be exactly one entry for each entry in
the PLTE chunk. Each entry is proportional to the fraction of
pixels in the image that have that palette index; the exact
scale factor is chosen by the encoder.
Histogram entries are approximate, with the exception that a
zero entry specifies that the corresponding palette entry is
not used at all in the image. It is required that a histogram
entry be nonzero if there are any pixels of that color.
When the palette is a suggested quantization of a truecolor
image, the histogram is necessarily approximate, since a
decoder may map pixels to palette entries differently than the
encoder did. In this situation, zero entries should not
appear.
The hIST chunk, if it appears, must follow the PLTE chunk, and
must precede the first IDAT chunk.
See Rationale: Palette histograms (Section 12.14), and
Recommendations for Decoders: Suggested-palette and histogram
usage (Section 10.10).
4.2.5. pHYs Physical Pixel Dimensions
The pHYs chunk specifies the intended resolution for display of
the image.
It contains:
   Pixels per unit, X axis: 4 bytes (unsigned integer)
   Pixels per unit, Y axis: 4 bytes (unsigned integer)
   Unit specifier:          1 byte
The following values are legal for the unit specifier:
0: unit is unknown
1: unit is the meter
When the unit specifier is 0, the pHYs chunk defines pixel
aspect ratio only; the actual size of the pixels remains
unspecified.
Conversion note: one inch is equal to exactly 0.0254 meters.
If this ancillary chunk is not present, pixels are assumed to
be square, and the physical size of each pixel is unknown.
If present, this chunk must precede the first IDAT chunk.
See Recommendations for Decoders: Pixel dimensions (Section
10.2).
4.2.6. sBIT Significant Bits
To simplify decoders, PNG specifies that only certain sample
bit depths can be used, and further specifies that sample
values should be scaled to the full range of possible values at
that bit depth. However, the sBIT chunk is provided in order
to store the original number of significant bits. This allows
decoders to recover the original data losslessly even if the
data had a bit depth not directly supported by PNG. We
recommend that an encoder emit an sBIT chunk if it has
converted the data from a lower bit depth.
For color type 0 (grayscale), the sBIT chunk contains a single
byte, indicating the number of bits that were significant in
the source data.
For color type 2 (truecolor), the sBIT chunk contains three
bytes, indicating the number of bits that were significant in
the source data for the red, green, and blue channels,
respectively.
For color type 3 (indexed color), the sBIT chunk contains three
bytes, indicating the number of bits that were significant in
the source data for the red, green, and blue components of the
palette entries, respectively.
For color type 4 (grayscale with alpha channel), the sBIT chunk
contains two bytes, indicating the number of bits that were
significant in the source grayscale data and the source alpha
data, respectively.
For color type 6 (truecolor with alpha channel), the sBIT chunk
contains four bytes, indicating the number of bits that were
significant in the source data for the red, green, blue and
alpha channels, respectively.
Each depth specified in sBIT must be greater than zero and less
than or equal to the sample depth (which is 8 for indexed-color
images, and the bit depth given in IHDR for other color types).
A decoder need not pay attention to sBIT: the stored image is a
valid PNG file of the sample depth indicated by IHDR. However,
if the decoder wishes to recover the original data at its
original precision, this can be done by right-shifting the
stored samples (the stored palette entries, for an indexed-color
image).  The encoder must scale the data in such a way
that the high-order bits match the original data.
If the sBIT chunk appears, it must precede the first IDAT
chunk, and it must also precede the PLTE chunk if present.
See Recommendations for Encoders: Bit depth scaling (Section
9.1) and Recommendations for Decoders: Bit depth rescaling
(Section 10.4).
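Recovering the original precision is a single right shift, because
the encoder is required to scale so that the high-order bits match.
A minimal sketch (function name ours):

    #include <stdint.h>

    static uint16_t recover_sample(uint16_t stored,
                                   int sample_depth,      /* 8 or 16 */
                                   int significant_bits)  /* from sBIT */
    {
        return (uint16_t)(stored >> (sample_depth - significant_bits));
    }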
4.2.7. tEXt Textual Data
Textual information that the encoder wishes to record with the
image can be stored in tEXt chunks. Each tEXt chunk contains a
keyword and a text string, in the format:
   Keyword:        1-79 bytes (character string)
   Null separator: 1 byte
   Text:           n bytes (character string)
The keyword and text string are separated by a zero byte (null
character). Neither the keyword nor the text string can
contain a null character. Note that the text string is not
null-terminated (the length of the chunk is sufficient
information to locate the ending). The keyword must be at
least one character and less than 80 characters long. The text
string can be of any length from zero bytes up to the maximum
permissible chunk size less the length of the keyword and
separator.
Any number of tEXt chunks can appear, and more than one with
the same keyword is permissible.
The keyword indicates the type of information represented by
the text string. The following keywords are predefined and
should be used where appropriate:
   Title            Short (one line) title or caption for image
   Author           Name of image's creator
   Description      Description of image (possibly long)
   Copyright        Copyright notice
   Creation Time    Time of original image creation
   Software         Software used to create the image
   Disclaimer       Legal disclaimer
   Warning          Warning of nature of content
   Source           Device used to create the image
   Comment          Miscellaneous comment; conversion from
                    GIF comment
For the Creation Time keyword, the date format defined in
section 5.2.14 of RFC 1123 is suggested, but not required
[RFC-1123]. Decoders should allow for free-format text
associated with this or any other keyword.
Other keywords may be invented for other purposes. Keywords of
general interest can be registered with the maintainers of the
PNG specification. However, it is also permitted to use
private unregistered keywords. (Private keywords should be
reasonably self-explanatory, in order to minimize the chance
that the same keyword will be used for incompatible purposes by
different people.)
Both keyword and text are interpreted according to the ISO
8859-1 (Latin-1) character set [ISO-8859]. The text string can
contain any Latin-1 character. Newlines in the text string
should be represented by a single linefeed character (decimal
10); use of other control characters in the text is
discouraged.
Keywords must contain only printable Latin-1 characters and
spaces; that is, only character codes 32-126 and 161-255
decimal are allowed. To reduce the chances for human
misreading of a keyword, leading and trailing spaces are
forbidden, as are consecutive spaces.  Note also that the
non-breaking space (code 160) is not permitted in keywords, since
it is visually indistinguishable from an ordinary space.
Keywords must be spelled exactly as registered, so that
decoders can use simple literal comparisons when looking for
particular keywords. In particular, keywords are considered
case-sensitive.
See Recommendations for Encoders: Text chunk processing
(Section 9.7) and Recommendations for Decoders: Text chunk
processing (Section 10.11).
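Splitting a tEXt data field only requires locating the null
separator; the text length then falls out of the chunk length. A
sketch (function name ours):

    #include <string.h>

    /* Split a tEXt data field into keyword and text.  The text is
       not null-terminated; its length follows from the chunk length.
       Returns 0 on success, -1 on a malformed field. */
    static int split_text(const unsigned char *data, size_t len,
                          size_t *keyword_len,
                          const unsigned char **text, size_t *text_len)
    {
        const unsigned char *sep = memchr(data, 0, len);
        if (sep == NULL)
            return -1;                       /* missing null separator */
        *keyword_len = (size_t)(sep - data);
        if (*keyword_len < 1 || *keyword_len > 79)
            return -1;                       /* keyword length out of range */
        *text     = sep + 1;
        *text_len = len - *keyword_len - 1;
        return 0;
    }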
4.2.8. tIME Image Last-Modification Time
The tIME chunk gives the time of the last image modification
(not the time of initial image creation). It contains:
   Year:   2 bytes (complete; for example, 1995, not 95)
   Month:  1 byte (1-12)
   Day:    1 byte (1-31)
   Hour:   1 byte (0-23)
   Minute: 1 byte (0-59)
   Second: 1 byte (0-60)    (yes, 60, for leap seconds; not 61,
                             a common error)
Universal Time (UTC, also called GMT) should be specified
rather than local time.
The tIME chunk is intended for use as an automatically-applied
time stamp that is updated whenever the image data is changed.
It is recommended that tIME not be changed by PNG editors that
do not change the image data. See also the Creation Time tEXt
keyword, which can be used for a user-supplied time.
4.2.9. tRNS Transparency
The tRNS chunk specifies that the image uses simple
transparency: either alpha values associated with palette
entries (for indexed-color images) or a single transparent
color (for grayscale and truecolor images). Although simple
transparency is not as elegant as the full alpha channel, it
requires less storage space and is sufficient for many common
cases.
For color type 3 (indexed color), the tRNS chunk contains a
series of one-byte alpha values, corresponding to entries in
the PLTE chunk:
   Alpha for palette index 0:  1 byte
   Alpha for palette index 1:  1 byte
   ... etc ...
Each entry indicates that pixels of the corresponding palette
index should be treated as having the specified alpha value.
Alpha values have the same interpretation as in an 8-bit full
alpha channel: 0 is fully transparent, 255 is fully opaque,
regardless of image bit depth. The tRNS chunk must not contain
more alpha values than there are palette entries, but tRNS can
contain fewer values than there are palette entries. In this
case, the alpha value for all remaining palette entries is
assumed to be 255. In the common case in which only palette
index 0 need be made transparent, only a one-byte tRNS chunk is
needed.
For color type 0 (grayscale), the tRNS chunk contains a single
gray level value, stored in the format:
   Gray:  2 bytes, range 0 .. (2^bitdepth)-1
(For consistency, 2 bytes are used regardless of the image bit
depth.) Pixels of the specified gray level are to be treated as
transparent (equivalent to alpha value 0); all other pixels are
to be treated as fully opaque (alpha value (2^bitdepth)-1).
For color type 2 (truecolor), the tRNS chunk contains a single
RGB color value, stored in the format:
   Red:   2 bytes, range 0 .. (2^bitdepth)-1
   Green: 2 bytes, range 0 .. (2^bitdepth)-1
   Blue:  2 bytes, range 0 .. (2^bitdepth)-1
(For consistency, 2 bytes per sample are used regardless of the
image bit depth.) Pixels of the specified color value are to be
treated as transparent (equivalent to alpha value 0); all other
pixels are to be treated as fully opaque (alpha value
(2^bitdepth)-1).
tRNS is prohibited for color types 4 and 6, since a full alpha
channel is already present in those cases.
Note: when dealing with 16-bit grayscale or truecolor data, it
is important to compare both bytes of the sample values to
determine whether a pixel is transparent. Although decoders
may drop the low-order byte of the samples for display, this
must not occur until after the data has been tested for
transparency. For example, if the grayscale level 0x0001 is
specified to be transparent, it would be incorrect to compare
only the high-order byte and decide that 0x0002 is also
transparent.
When present, the tRNS chunk must precede the first IDAT chunk,
and must follow the PLTE chunk, if any.
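For 16-bit data the transparency test must use both bytes of the
sample, as the note above warns. A minimal sketch for grayscale
(function name ours):

    #include <stdint.h>

    /* Compare BOTH bytes of a stored 16-bit gray sample (MSB first)
       against the tRNS gray level, before any downshifting. */
    static int is_transparent_gray16(const uint8_t sample[2],
                                     uint16_t trns_gray)
    {
        uint16_t v = (uint16_t)(((unsigned)sample[0] << 8) | sample[1]);
        return v == trns_gray;
    }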
4.2.10. zTXt Compressed Textual Data
The zTXt chunk contains textual data, just as tEXt does;
however, zTXt takes advantage of compression. zTXt and tEXt
chunks are semantically equivalent, but zTXt is recommended for
storing large blocks of text.
A zTXt chunk contains:
   Keyword:            1-79 bytes (character string)
   Null separator:     1 byte
   Compression method: 1 byte
   Compressed text:    n bytes
The keyword and null separator are exactly the same as in the
tEXt chunk. Note that the keyword is not compressed. The
compression method byte identifies the compression method used
in this zTXt chunk. The only value presently defined for it is
0 (deflate/inflate compression). The compression method byte is
followed by a compressed datastream that makes up the remainder
of the chunk. For compression method 0, this datastream
adheres to the zlib datastream format (see Deflate/Inflate
Compression, Chapter 5). Decompression of this datastream
yields Latin-1 text that is identical to the text that would be
stored in an equivalent tEXt chunk.
Any number of zTXt and tEXt chunks can appear in the same file.
See the preceding definition of the tEXt chunk for the
predefined keywords and the recommended format of the text.
See Recommendations for Encoders: Text chunk processing
(Section 9.7), and Recommendations for Decoders: Text chunk
processing (Section 10.11).
4.3. Summary of Standard Chunks
This table summarizes some properties of the standard chunk types.
Critical chunks (must appear in this order, except PLTE
is optional):
   Name  Multiple  Ordering constraints
           OK?

   IHDR    No      Must be first
   PLTE    No      Before IDAT
   IDAT    Yes     Multiple IDATs must be consecutive
   IEND    No      Must be last

Ancillary chunks (need not appear in this order):

   Name  Multiple  Ordering constraints
           OK?

   cHRM    No      Before PLTE and IDAT
   gAMA    No      Before PLTE and IDAT
   sBIT    No      Before PLTE and IDAT
   bKGD    No      After PLTE; before IDAT
   hIST    No      After PLTE; before IDAT
   tRNS    No      After PLTE; before IDAT
   pHYs    No      Before IDAT
   tIME    No      None
   tEXt    Yes     None
   zTXt    Yes     None
Standard keywords for tEXt and zTXt chunks:
   Title            Short (one line) title or caption for image
   Author           Name of image's creator
   Description      Description of image (possibly long)
   Copyright        Copyright notice
   Creation Time    Time of original image creation
   Software         Software used to create the image
   Disclaimer       Legal disclaimer
   Warning          Warning of nature of content
   Source           Device used to create the image
   Comment          Miscellaneous comment; conversion from
                    GIF comment
4.4. Additional Chunk Types
Additional public PNG chunk types are defined in the document "PNG
Special-Purpose Public Chunks" [PNG-EXTENSIONS]. Chunks described
there are expected to be less widely supported than those defined
in this specification. However, application authors are
encouraged to use those chunk types whenever appropriate for their
applications. Additional chunk types can be proposed for
inclusion in that list by contacting the PNG specification
maintainers at [email protected]
New public chunks will only be registered if they are of use to
others and do not violate the design philosophy of PNG. Chunk
registration is not automatic, although it is the intent of the
authors that it be straightforward when a new chunk of potentially
wide application is needed. Note that the creation of new
critical chunk types is discouraged unless absolutely necessary.
Applications can also use private chunk types to carry data that
is not of interest to other applications. See Recommendations for
Encoders: Use of private chunks (Section 9.8).
Decoders must be prepared to encounter unrecognized public or
private chunk type codes. Unrecognized chunk types must be
handled as described in Chunk naming conventions (Section 3.3).
5. Deflate/Inflate Compression
PNG compression method 0 (the only compression method presently
defined for PNG) specifies deflate/inflate compression with a 32K
sliding window. Deflate compression is an LZ77 derivative used in
zip, gzip, pkzip and related programs. Extensive research has been
done supporting its patent-free status. Portable C implementations
are freely available.
Deflate-compressed datastreams within PNG are stored in the "zlib"
format, which has the structure:
   Compression method/flags code: 1 byte
   Additional flags/check bits:   1 byte
   Compressed data blocks:        n bytes
   Check value:                   4 bytes
Further details on this format are given in the zlib specification
[RFC-1950].
For PNG compression method 0, the zlib compression method/flags code
must specify method code 8 ("deflate" compression) and an LZ77 window
size of not more than 32K. Note that the zlib compression method
number is not the same as the PNG compression method number. The
additional flags must not specify a preset dictionary.
The compressed data within the zlib datastream is stored as a series
of blocks, each of which can represent raw (uncompressed) data,
LZ77-compressed data encoded with fixed Huffman codes, or LZ77-compressed data encoded with custom Huffman codes. A marker bit in
the final block identifies it as the last block, allowing the decoder
to recognize the end of the compressed datastream. Further details
on the compression algorithm and the encoding are given in the
deflate specification [RFC-1951].
The check value stored at the end of the zlib datastream is
calculated on the uncompressed data represented by the datastream.
Note that the algorithm used is not the same as the CRC calculation
used for PNG chunk check values. The zlib check value is useful
mainly as a cross-check that the deflate and inflate algorithms are
implemented correctly. Verifying the chunk CRCs provides adequate
confidence that the PNG file has been transmitted undamaged.
In a PNG file, the concatenation of the contents of all the IDAT
chunks makes up a zlib datastream as specified above. This
datastream decompresses to filtered image data as described elsewhere
in this document.
It is important to emphasize that the boundaries between IDAT chunks
are arbitrary and can fall anywhere in the zlib datastream. There is
not necessarily any correlation between IDAT chunk boundaries and
deflate block boundaries or any other feature of the zlib data. For
example, it is entirely possible for the terminating zlib check value
to be split across IDAT chunks.
In the same vein, there is no required correlation between the
structure of the image data (i.e., scanline boundaries) and deflate
block boundaries or IDAT chunk boundaries. The complete image data
is represented by a single zlib datastream that is stored in some
number of IDAT chunks; a decoder that assumes any more than this is
incorrect. (Of course, some encoder implementations may emit files
in which some of these structures are indeed related. But decoders
cannot rely on this.)
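As an illustrative sketch using the zlib C library (the wrapper
name and buffer handling are assumptions; a real decoder would also
loop while output space runs out), the contents of each IDAT chunk
are simply fed to the same inflate stream:

   /* One z_stream, set up once with inflateInit(), consumes the
      contents of every IDAT chunk in turn; chunk boundaries need
      no special handling. */
   #include <zlib.h>

   int inflate_idat(z_stream *zs, unsigned char *idat, unsigned idat_len,
                    unsigned char *out, unsigned out_len)
   {
       zs->next_in   = idat;      /* contents of this IDAT chunk */
       zs->avail_in  = idat_len;
       zs->next_out  = out;
       zs->avail_out = out_len;
       return inflate(zs, Z_NO_FLUSH);  /* Z_STREAM_END after last block */
   }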
PNG also uses zlib datastreams in zTXt chunks. In a zTXt chunk, the
remainder of the chunk following the compression method byte is a
zlib datastream as specified above. This datastream decompresses to
the user-readable text described by the chunk's keyword. Unlike the
image data, such datastreams are not split across chunks; each zTXt
chunk contains an independent zlib datastream.
Additional documentation and portable C code for deflate and inflate
are available from the Info-ZIP archives at
<URL:ftp://ftp.uu.net/pub/archiving/zip/>.
6. Filter Algorithms
This chapter describes the filter algorithms that can be applied
before compression. The purpose of these filters is to prepare the
image data for optimum compression.
PNG filter method 0 defines five basic filter types:
   Type    Name

   0       None
   1       Sub
   2       Up
   3       Average
   4       Paeth
(Note that filter method 0 in IHDR specifies exactly this set of five
filter types. If the set of filter types is ever extended, a
different filter method number will be assigned to the extended set,
so that decoders need not decompress the data to discover that it
contains unsupported filter types.)
The encoder can choose which of these filter algorithms to apply on a
scanline-by-scanline basis. In the image data sent to the
compression step, each scanline is preceded by a filter type byte
that specifies the filter algorithm used for that scanline.
Filtering algorithms are applied to bytes, not to pixels, regardless
of the bit depth or color type of the image. The filtering
algorithms work on the byte sequence formed by a scanline that has
been represented as described in Image layout (Section 2.3). If the
image includes an alpha channel, the alpha data is filtered in the
same way as the image data.
When the image is interlaced, each pass of the interlace pattern is
treated as an independent image for filtering purposes. The filters
work on the byte sequences formed by the pixels actually transmitted
during a pass, and the "previous scanline" is the one previously
transmitted in the same pass, not the one adjacent in the complete
image. Note that the subimage transmitted in any one pass is always
rectangular, but is of smaller width and/or height than the complete
image. Filtering is not applied when this subimage is empty.
For all filters, the bytes "to the left of" the first pixel in a
scanline must be treated as being zero. For filters that refer to
the prior scanline, the entire prior scanline must be treated as
being zeroes for the first scanline of an image (or of a pass of an
interlaced image).
To reverse the effect of a filter, the decoder must use the decoded
values of the prior pixel on the same line, the pixel immediately
above the current pixel on the prior line, and the pixel just to the
left of the pixel above. This implies that at least one scanline's
worth of image data must be stored by the decoder at all times. Even
though some filter types do not refer to the prior scanline, the
decoder must always store each scanline as it is decoded, since the
next scanline might use a filter that refers to it.
PNG imposes no restriction on which filter types can be applied to an
image. However, the filters are not equally effective on all types
of data. See Recommendations for Encoders: Filter selection (Section
9.6).
See also Rationale: Filtering (Section 12.9).
6.1. Filter type 0: None
With the None filter, the scanline is transmitted unmodified; it
is only necessary to insert a filter type byte before the data.
6.2. Filter type 1: Sub
The Sub filter transmits the difference between each byte and the
value of the corresponding byte of the prior pixel.
To compute the Sub filter, apply the following formula to each
byte of the scanline:
Sub(x) = Raw(x) - Raw(x-bpp)
where x ranges from zero to the number of bytes representing the
scanline minus one, Raw(x) refers to the raw data byte at that
byte position in the scanline, and bpp is defined as the number of
bytes per complete pixel, rounding up to one. For example, for
color type 2 with a bit depth of 16, bpp is equal to 6 (three
samples, two bytes per sample); for color type 0 with a bit depth
of 2, bpp is equal to 1 (rounding up); for color type 4 with a bit
depth of 16, bpp is equal to 4 (two-byte grayscale sample, plus
two-byte alpha sample).
Note this computation is done for each byte, regardless of bit
depth. In a 16-bit image, each MSB is predicted from the
preceding MSB and each LSB from the preceding LSB, because of the
way that bpp is defined.
Unsigned arithmetic modulo 256 is used, so that both the inputs
and outputs fit into bytes. The sequence of Sub values is
transmitted as the filtered scanline.
For all x < 0, assume Raw(x) = 0.
To reverse the effect of the Sub filter after decompression,
output the following value:
Sub(x) + Raw(x-bpp)
(computed mod 256), where Raw refers to the bytes already decoded.
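An illustrative C sketch of computing the Sub filter (the function
name is hypothetical):

   /* raw[] is the unfiltered scanline of len bytes; bpp is as
      defined above.  Byte arithmetic wraps modulo 256. */
   #include <stddef.h>

   void filter_sub(const unsigned char *raw, unsigned char *out,
                   size_t len, size_t bpp)
   {
       size_t x;
       for (x = 0; x < len; x++) {
           unsigned char left = (x < bpp) ? 0 : raw[x - bpp]; /* Raw(x-bpp) */
           out[x] = (unsigned char)(raw[x] - left);
       }
   }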
6.3. Filter type 2: Up
The Up filter is just like the Sub filter except that the pixel
immediately above the current pixel, rather than just to its left,
is used as the predictor.
To compute the Up filter, apply the following formula to each byte
of the scanline:
Up(x) = Raw(x) - Prior(x)
where x ranges from zero to the number of bytes representing the
scanline minus one, Raw(x) refers to the raw data byte at that
byte position in the scanline, and Prior(x) refers to the
unfiltered bytes of the prior scanline.
Note this is done for each byte, regardless of bit depth.
Unsigned arithmetic modulo 256 is used, so that both the inputs
and outputs fit into bytes. The sequence of Up values is
transmitted as the filtered scanline.
On the first scanline of an image (or of a pass of an interlaced
image), assume Prior(x) = 0 for all x.
To reverse the effect of the Up filter after decompression, output
the following value:
Up(x) + Prior(x)
(computed mod 256), where Prior refers to the decoded bytes of the
prior scanline.
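An illustrative C sketch of this reversal (the function name is
hypothetical):

   /* Reverse the Up filter in place; prior[] is the decoded
      previous scanline, or all zeroes for the first scanline. */
   #include <stddef.h>

   void unfilter_up(unsigned char *raw, const unsigned char *prior,
                    size_t len)
   {
       size_t x;
       for (x = 0; x < len; x++)
           raw[x] = (unsigned char)(raw[x] + prior[x]);  /* mod 256 */
   }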
6.4. Filter type 3: Average
The Average filter uses the average of the two neighboring pixels
(left and above) to predict the value of a pixel.
To compute the Average filter, apply the following formula to each
byte of the scanline:
Average(x) = Raw(x) - floor((Raw(x-bpp)+Prior(x))/2)
where x ranges from zero to the number of bytes representing the
scanline minus one, Raw(x) refers to the raw data byte at that
byte position in the scanline, Prior(x) refers to the unfiltered
bytes of the prior scanline, and bpp is defined as for the Sub
filter.
Note this is done for each byte, regardless of bit depth. The
sequence of Average values is transmitted as the filtered
scanline.
The subtraction of the predicted value from the raw byte must be
done modulo 256, so that both the inputs and outputs fit into
bytes. However, the sum Raw(x-bpp)+Prior(x) must be formed
without overflow (using at least nine-bit arithmetic). floor()
indicates that the result of the division is rounded to the next
lower integer if fractional; in other words, it is an integer
division or right shift operation.
For all x < 0, assume Raw(x) = 0. On the first scanline of an
image (or of a pass of an interlaced image), assume Prior(x) = 0
for all x.
To reverse the effect of the Average filter after decompression,
output the following value:
Average(x) + floor((Raw(x-bpp)+Prior(x))/2)
where the result is computed mod 256, but the prediction is
calculated in the same way as for encoding. Raw refers to the
bytes already decoded, and Prior refers to the decoded bytes of
the prior scanline.
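An illustrative C sketch of the reversal (the function name is
hypothetical; note the prediction sum is formed in an int, so it
cannot overflow eight bits):

   /* Reverse the Average filter in place. */
   #include <stddef.h>

   void unfilter_average(unsigned char *raw, const unsigned char *prior,
                         size_t len, size_t bpp)
   {
       size_t x;
       for (x = 0; x < len; x++) {
           int left = (x < bpp) ? 0 : raw[x - bpp];   /* Raw(x-bpp) */
           raw[x] = (unsigned char)(raw[x] + ((left + prior[x]) >> 1));
       }
   }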
6.5. Filter type 4: Paeth
The Paeth filter computes a simple linear function of the three
neighboring pixels (left, above, upper left), then chooses as
predictor the neighboring pixel closest to the computed value.
This technique is due to Alan W. Paeth [PAETH].
To compute the Paeth filter, apply the following formula to each
byte of the scanline:
Paeth(x) = Raw(x) - PaethPredictor(Raw(x-bpp), Prior(x),
Prior(x-bpp))
where x ranges from zero to the number of bytes representing the
scanline minus one, Raw(x) refers to the raw data byte at that
byte position in the scanline, Prior(x) refers to the unfiltered
bytes of the prior scanline, and bpp is defined as for the Sub
filter.
Note this is done for each byte, regardless of bit depth.
Unsigned arithmetic modulo 256 is used, so that both the inputs
and outputs fit into bytes. The sequence of Paeth values is
transmitted as the filtered scanline.
The PaethPredictor function is defined by the following
pseudocode:
   function PaethPredictor (a, b, c)
   begin
        ; a = left, b = above, c = upper left
        p := a + b - c        ; initial estimate
        pa := abs(p - a)      ; distances to a, b, c
        pb := abs(p - b)
        pc := abs(p - c)
        ; return nearest of a,b,c,
        ; breaking ties in order a,b,c.
        if pa <= pb AND pa <= pc then return a
        else if pb <= pc then return b
        else return c
   end
The calculations within the PaethPredictor function must be
performed exactly, without overflow. Arithmetic modulo 256 is to
be used only for the final step of subtracting the function result
from the target byte value.
Note that the order in which ties are broken is critical and must
not be altered. The tie break order is: pixel to the left, pixel
above, pixel to the upper left. (This order differs from that
given in Paeth's article.)
For all x < 0, assume Raw(x) = 0 and Prior(x) = 0. On the first
scanline of an image (or of a pass of an interlaced image), assume
Prior(x) = 0 for all x.
To reverse the effect of the Paeth filter after decompression,
output the following value:
Paeth(x) + PaethPredictor(Raw(x-bpp), Prior(x), Prior(x-bpp))
(computed mod 256), where Raw and Prior refer to bytes already
decoded. Exactly the same PaethPredictor function is used by both
encoder and decoder.
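A direct C translation of the pseudocode above (the intermediate
values are ints, so the calculations cannot overflow):

   #include <stdlib.h>   /* abs() */

   int paeth_predictor(int a, int b, int c)  /* left, above, upper left */
   {
       int p  = a + b - c;    /* initial estimate */
       int pa = abs(p - a);   /* distances to a, b, c */
       int pb = abs(p - b);
       int pc = abs(p - c);
       /* return nearest of a,b,c, breaking ties in order a,b,c */
       if (pa <= pb && pa <= pc) return a;
       else if (pb <= pc) return b;
       else return c;
   }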
7. Chunk Ordering Rules
To allow new chunk types to be added to PNG, it is necessary to
establish rules about the ordering requirements for all chunk types.
Otherwise a PNG editing program cannot know what to do when it
encounters an unknown chunk.
We define a "PNG editor" as a program that modifies a PNG file and
wishes to preserve as much as possible of the ancillary information
in the file. Two examples of PNG editors are a program that adds or
modifies text chunks, and a program that adds a suggested palette to
a truecolor PNG file. Ordinary image editors are not PNG editors in
this sense, because they usually discard all unrecognized information
while reading in an image. (Note: we strongly encourage programs
handling PNG files to preserve ancillary information whenever
possible.)
As an example of possible problems, consider a hypothetical new
ancillary chunk type that is safe-to-copy and is required to appear
after PLTE if PLTE is present. If our program to add a suggested
PLTE does not recognize this new chunk, it may insert PLTE in the
wrong place, namely after the new chunk. We could prevent such
problems by requiring PNG editors to discard all unknown chunks, but
that is a very unattractive solution. Instead, PNG requires
ancillary chunks not to have ordering restrictions like this.
To prevent this type of problem while allowing for future extension,
we put some constraints on both the behavior of PNG editors and the
allowed ordering requirements for chunks.
7.1. Behavior of PNG editors
The rules for PNG editors are:
* When copying an unknown unsafe-to-copy ancillary chunk, a
PNG editor must not move the chunk relative to any critical
chunk. It can relocate the chunk freely relative to other
ancillary chunks that occur between the same pair of
critical chunks. (This is well defined since the editor
must not add, delete, modify, or reorder critical chunks if
it is preserving unknown unsafe-to-copy chunks.)
* When copying an unknown safe-to-copy ancillary chunk, a PNG
editor must not move the chunk from before IDAT to after
IDAT or vice versa. (This is well defined because IDAT is
always present.) Any other reordering is permitted.
* When copying a known ancillary chunk type, an editor need
only honor the specific chunk ordering rules that exist for
that chunk type. However, it can always choose to apply the
above general rules instead.
* PNG editors must give up on encountering an unknown critical
chunk type, because there is no way to be certain that a
valid file will result from modifying a file containing such
a chunk. (Note that simply discarding the chunk is not good
enough, because it might have unknown implications for the
interpretation of other chunks.)
These rules are expressed in terms of copying chunks from an input
file to an output file, but they apply in the obvious way if a PNG
file is modified in place.
See also Chunk naming conventions (Section 3.3).
7.2. Ordering of ancillary chunks
The ordering rules for an ancillary chunk type cannot be any
stricter than this:
* Unsafe-to-copy chunks can have ordering requirements
relative to critical chunks.
* Safe-to-copy chunks can have ordering requirements relative
to IDAT.
The actual ordering rules for any particular ancillary chunk type
may be weaker. See for example the ordering rules for the
standard ancillary chunk types (Summary of Standard Chunks,
Section 4.3).
Decoders must not assume more about the positioning of any
ancillary chunk than is specified by the chunk ordering rules. In
particular, it is never valid to assume that a specific ancillary
chunk type occurs with any particular positioning relative to
other ancillary chunks. (For example, it is unsafe to assume that
your private ancillary chunk occurs immediately before IEND. Even
if your application always writes it there, a PNG editor might
have inserted some other ancillary chunk after it. But you can
safely assume that your chunk will remain somewhere between IDAT
and IEND.)
7.3. Ordering of critical chunks
Critical chunks can have arbitrary ordering requirements, because
PNG editors are required to give up if they encounter unknown
critical chunks. For example, IHDR has the special ordering rule
that it must always appear first. A PNG editor, or indeed any
PNG-writing program, must know and follow the ordering rules for
any critical chunk type that it can emit.
8. Miscellaneous Topics
8.1. File name extension
On systems where file names customarily include an extension
signifying file type, the extension ".png" is recommended for PNG
files. Lower case ".png" is preferred if file names are case-sensitive.
8.2. Internet media type
The PNG authors intend to register "image/png" as the Internet
Media Type for PNG [RFC-1521, RFC-1590]. At the date of this
document, the media type registration process had not been
completed. It is recommended that implementations also recognize
the interim media type "image/x-png".
8.3. Macintosh file layout
In the Apple Macintosh system, the following conventions are
recommended:
* The four-byte file type code for PNG files is "PNGf". (This
code has been registered with Apple for PNG files.) The
creator code will vary depending on the creating
application.
* The contents of the data fork shall be a PNG file exactly as
described in the rest of this specification.
* The contents of the resource fork are unspecified. It may
be empty or may contain application-dependent resources.
* When transferring a Macintosh PNG file to a non-Macintosh
system, only the data fork should be transferred.
8.4. Multiple-image extension
PNG itself is strictly a single-image format. However, it may be
necessary to store multiple images within one file; for example,
this is needed to convert some GIF files. In the future, a
multiple-image format based on PNG may be defined. Such a format
will be considered a separate file format and will have a
different signature. PNG-supporting applications may or may not
choose to support the multiple-image format.
See Rationale: Why not these features? (Section 12.3).
8.5. Security considerations
A PNG file or datastream is composed of a collection of explicitly
typed "chunks". Chunks whose contents are defined by the
specification could actually contain anything, including malicious
code. But there is no known risk that such malicious code could
be executed on the recipient's computer as a result of decoding
the PNG image.
The possible security risks associated with future chunk types
cannot be specified at this time. Security issues will be
considered when evaluating chunks proposed for registration as
public chunks. There is no additional security risk associated
with unknown or unimplemented chunk types, because such chunks
will be ignored, or at most be copied into another PNG file.
The tEXt and zTXt chunks contain data that is meant to be
displayed as plain text. It is possible that if the decoder
displays such text without filtering out control characters,
especially the ESC (escape) character, certain systems or
terminals could behave in undesirable and insecure ways. We
recommend that decoders filter out control characters to avoid
this risk; see Recommendations for Decoders: Text chunk processing
(Section 10.11).
Because every chunk's length is available at its beginning, and
because every chunk has a CRC trailer, there is a very robust
defense against corrupted data and against fraudulent chunks that
attempt to overflow the decoder's buffers. Also, the PNG
signature bytes provide early detection of common file
transmission errors.
A decoder that fails to check CRCs could be subject to data
corruption. The only likely consequence of such corruption is
incorrectly displayed pixels within the image. Worse things might
happen if the CRC of the IHDR chunk is not checked and the width
or height fields are corrupted. See Recommendations for Decoders:
Error checking (Section 10.1).
A poorly written decoder might be subject to buffer overflow,
because chunks can be extremely large, up to (2^31)-1 bytes long.
But properly written decoders will handle large chunks without
difficulty.
9. Recommendations for Encoders
This chapter gives some recommendations for encoder behavior. The
only absolute requirement on a PNG encoder is that it produce files
that conform to the format specified in the preceding chapters.
However, best results will usually be achieved by following these
recommendations.
9.1. Bit depth scaling
When encoding input samples that have a bit depth that cannot be
directly represented in PNG, the encoder must scale the samples up
to a bit depth that is allowed by PNG. The most accurate
scaling method is the linear equation
output = ROUND(input * MAXOUTSAMPLE / MAXINSAMPLE)
where the input samples range from 0 to MAXINSAMPLE and the
outputs range from 0 to MAXOUTSAMPLE (which is (2^bitdepth)-1).
A close approximation to the linear scaling method can be achieved
by "left bit replication", which is shifting the valid bits to
begin in the most significant bit and repeating the most
significant bits into the open bits. This method is often faster
to compute than linear scaling. As an example, assume that 5-bit
samples are being scaled up to 8 bits. If the source sample value
is 27 (in the range from 0-31), then the original bits are:
   4 3 2 1 0
   ---------
   1 1 0 1 1

Left bit replication gives a value of 222:

   7 6 5 4 3 2 1 0
   ----------------
   1 1 0 1 1 1 1 0
   |=======|  |===|
       |      Leftmost Bits Repeated to Fill Open Bits
       |
   Original Bits
which matches the value computed by the linear equation. Left bit
replication usually gives the same value as linear scaling, and is
never off by more than one.
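For the 5-bit-to-8-bit case above, left bit replication reduces to
one shift and one OR, as in this illustrative C sketch (the
function name is hypothetical):

   unsigned scale5to8(unsigned v)       /* v in 0..31 */
   {
       return (v << 3) | (v >> 2);      /* scale5to8(27) == 222 */
   }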
A distinctly less accurate approximation is obtained by simply
left-shifting the input value and filling the low order bits with
zeroes. This scheme cannot reproduce white exactly, since it does
not generate an all-ones maximum value; the net effect is to
darken the image slightly. This method is not recommended in
general, but it does have the effect of improving compression,
particularly when dealing with greater-than-eight-bit sample
depths. Since the relative error introduced by zero-fill scaling
is small at high bit depths, some encoders may choose to use it.
Zero-fill should not be used for alpha channel data, however,
since many decoders will special-case alpha values of all zeroes
and all ones. It is important to represent both those values
exactly in the scaled data.
When the encoder writes an sBIT chunk, it is required to do the
scaling in such a way that the high-order bits of the stored
samples match the original data. That is, if the sBIT chunk
specifies a bit depth of S, the high-order S bits of the stored
data must agree with the original S-bit data values. This allows
decoders to recover the original data by shifting right. The
added low-order bits are not constrained. Note that all the above
scaling methods meet this restriction.
When scaling up source data, it is recommended that the low-order
bits be filled consistently for all samples; that is, the same
source value should generate the same sample value at any pixel
position. This improves compression by reducing the number of
distinct sample values. However, this is not a requirement, and
some encoders may choose not to follow it. For example, an
encoder might instead dither the low-order bits, improving
displayed image quality at the price of increasing file size.
In some applications the original source data may have a range
that is not a power of 2. The linear scaling equation still works
for this case, although the shifting methods do not. It is
recommended that an sBIT chunk not be written for such images,
since sBIT suggests that the original data range was exactly
0..2^S-1.
9.2. Encoder gamma handling
See Gamma Tutorial (Chapter 13) if you aren't already familiar
with gamma issues.
Proper handling of gamma encoding and the gAMA chunk in an encoder
depends on the prior history of the sample values and on whether
these values have already been quantized to integers.
If the encoder has access to sample intensity values in floating-point or high-precision integer form (perhaps from a computer
image renderer), then it is recommended that the encoder perform
its own gamma encoding before quantizing the data to integer
values for storage in the file. Applying gamma encoding at this
stage results in images with fewer banding artifacts at a given
sample bit depth, or allows smaller samples while retaining the
same visual quality.
A linear intensity level, expressed as a floating-point value in
the range 0 to 1, can be converted to a gamma-encoded sample value
by
sample = ROUND((intensity ^ encoder_gamma) * MAXSAMPLEVAL)
The file_gamma value to be written in the PNG gAMA chunk is the
same as encoder_gamma in this equation, since we are assuming the
initial intensity value is linear (in effect, camera_gamma is
1.0).
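An illustrative C sketch of this equation (the function name is
hypothetical; MAXSAMPLEVAL is (2^bitdepth)-1):

   #include <math.h>

   unsigned gamma_encode(double intensity, double encoder_gamma,
                         unsigned maxsampleval)
   {
       return (unsigned)(pow(intensity, encoder_gamma)
                         * maxsampleval + 0.5);  /* ROUND */
   }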
If the image is being written to a file only, the encoder_gamma
value can be selected somewhat arbitrarily. Values of 0.45 or 0.5
are generally good choices because they are common in video
systems, and so most PNG decoders should do a good job displaying
such images.
Some image renderers may simultaneously write the image to a PNG
file and display it on-screen. The displayed pixels should be
gamma corrected for the display system and viewing conditions in
use, so that the user sees a proper representation of the intended
scene. An appropriate gamma correction value is
screen_gc = viewing_gamma / display_gamma
If the renderer wants to write the same gamma-corrected sample
values to the PNG file, avoiding a separate gamma-encoding step
for file output, then this screen_gc value should be written in
the gAMA chunk. This will allow a PNG decoder to reproduce what
the file's originator saw on screen during rendering (provided the
decoder properly supports arbitrary values in a gAMA chunk).
However, it is equally reasonable for a renderer to apply gamma
correction for screen display using a gamma appropriate to the
viewing conditions, and to separately gamma-encode the sample
values for file storage using a standard value of gamma such as
0.5. In fact, this is preferable, since some PNG decoders may not
accurately display images with unusual gAMA values.
Computer graphics renderers often do not perform gamma encoding,
instead making sample values directly proportional to scene light
intensity. If the PNG encoder receives sample values that have
already been quantized into linear-light integer values, there is
no point in doing gamma encoding on them; that would just result
in further loss of information. The encoder should just write the
sample values to the PNG file. This "linear" sample encoding is
equivalent to gamma encoding with a gamma of 1.0, so graphics
programs that produce linear samples should always emit a gAMA
chunk specifying a gamma of 1.0.
When the sample values come directly from a piece of hardware, the
correct gAMA value is determined by the gamma characteristic of
the hardware. In the case of video digitizers ("frame grabbers"),
gAMA should be 0.45 or 0.5 for NTSC (possibly less for PAL or
SECAM) since video camera transfer functions are standardized.
Image scanners are less predictable. Their output samples may be
linear (gamma 1.0) since CCD sensors themselves are linear, or the
scanner hardware may have already applied gamma correction
designed to compensate for dot gain in subsequent printing (gamma
of about 0.57), or the scanner may have corrected the samples for
display on a CRT (gamma of 0.4-0.5). You will need to refer to
the scanner's manual, or even scan a calibrated gray wedge, to
determine what a particular scanner does.
File format converters generally should not attempt to convert
supplied images to a different gamma. Store the data in the PNG
file without conversion, and record the source gamma if it is
known. Gamma alteration at file conversion time causes
requantization of the set of intensity levels that are
represented, introducing further roundoff error with little
benefit. It's almost always better to just copy the sample values
intact from the input to the output file.
In some cases, the supplied image may be in an image format (e.g.,
TIFF) that can describe the gamma characteristic of the image. In
such cases, a file format converter is strongly encouraged to
write a PNG gAMA chunk that corresponds to the known gamma of the
source image. Note that some file formats specify the gamma of
the display system, not the camera. If the input file's gamma
value is greater than 1.0, it is almost certainly a display system
gamma, and you should use its reciprocal for the PNG gAMA.
If the encoder or file format converter does not know how an image
was originally created, but does know that the image has been
displayed satisfactorily on a display with gamma display_gamma
under lighting conditions where a particular viewing_gamma is
appropriate, then the image can be marked as having the
file_gamma:
file_gamma = viewing_gamma / display_gamma
This will allow viewers of the PNG file to see the same image that
the person running the file format converter saw. Although this
may not be precisely the correct value of the image gamma, it's
better to write a gAMA chunk with an approximately right value
than to omit the chunk and force PNG decoders to guess at an
appropriate gamma.
On the other hand, if the image file is being converted as part of
a "bulk" conversion, with no one looking at each image, then it is
better to omit the gAMA chunk entirely. If the image gamma must
be guessed at, leave it to the decoder to do the guessing.
Gamma does not apply to alpha samples; alpha is always represented
linearly.
See also Recommendations for Decoders: Decoder gamma handling
(Section 10.5).
9.3. Encoder color handling
See Color Tutorial (Chapter 14) if you aren't already familiar
with color issues.
If it is possible for the encoder to determine the chromaticities
of the source display primaries, or to make a strong guess based
on the origin of the image or the hardware running it, then the
encoder is strongly encouraged to output the cHRM chunk. If it
does so, the gAMA chunk should also be written; decoders can do
little with cHRM if gAMA is missing.
Video created with recent video equipment probably uses the CCIR
709 primaries and D65 white point [ITU-BT709], which are:
           x       y

   R       0.640   0.330
   G       0.300   0.600
   B       0.150   0.060
   White   0.3127  0.3290
An older but still very popular video standard is SMPTE-C
[SMPTE-170M]:

           x       y

   R       0.630   0.340
   G       0.310   0.595
   B       0.155   0.070
   White   0.3127  0.3290
The original NTSC color primaries have not been used in decades.
Although you may still find the NTSC numbers listed in standards
documents, you won't find any images that actually use them.
Scanners that produce PNG files as output should insert the filter
chromaticities into a cHRM chunk and the camera_gamma into a gAMA
chunk.
In the case of hand-drawn or digitally edited images, you have to
determine what monitor they were viewed on when being produced.
Many image editing programs allow you to specify what type of
monitor you are using. This is often because they are working in
some device-independent space internally. Such programs have
enough information to write valid cHRM and gAMA chunks, and should
do so automatically.
If the encoder is compiled as a portion of a computer image
renderer that performs full-spectral rendering, the monitor values
that were used to convert from the internal device-independent
color space to RGB should be written into the cHRM chunk. Any
colors that are outside the gamut of the chosen RGB device should
be clipped or otherwise constrained to be within the gamut; PNG
does not store out of gamut colors.
If the computer image renderer performs calculations directly in
device-dependent RGB space, a cHRM chunk should not be written
unless the scene description and rendering parameters have been
adjusted to look good on a particular monitor. In that case, the
data for that monitor (if known) should be used to construct a
cHRM chunk.
There are often cases where an image's exact origins are unknown,
particularly if it began life in some other format. A few image
formats store calibration information, which can be used to fill
in the cHRM chunk. For example, all PhotoCD images use the CCIR
709 primaries and D65 whitepoint, so these values can be written
into the cHRM chunk when converting a PhotoCD file. PhotoCD also
uses the SMPTE-170M transfer function, which is closely
approximated by a gAMA of 0.5. (PhotoCD can store colors outside
the RGB gamut, so the image data will require gamut mapping before
writing to PNG format.) TIFF 6.0 files can optionally store
calibration information, which if present should be used to
construct the cHRM chunk. GIF and most other formats do not store
any calibration information.
It is not recommended that file format converters attempt to
convert supplied images to a different RGB color space. Store the
data in the PNG file without conversion, and record the source
primary chromaticities if they are known. Color space
transformation at file conversion time is a bad idea because of
gamut mismatches and rounding errors. As with gamma conversions,
it's better to store the data losslessly and incur at most one
conversion when the image is finally displayed.
See also Recommendations for Decoders: Decoder color handling
(Section 10.6).
9.4. Alpha channel creation
The alpha channel can be regarded either as a mask that
temporarily hides transparent parts of the image, or as a means
for constructing a non-rectangular image. In the first case, the
color values of fully transparent pixels should be preserved for
future use. In the second case, the transparent pixels carry no
useful data and are simply there to fill out the rectangular image
area required by PNG. In this case, fully transparent pixels
should all be assigned the same color value for best compression.
Encoders should keep in mind the possibility that a decoder will
ignore transparency control. Hence, the colors assigned to
transparent pixels should be reasonable background colors whenever
feasible.
For applications that do not require a full alpha channel, or
cannot afford the price in compression efficiency, the tRNS
transparency chunk is also available.
If the image has a known background color, this color should be
written in the bKGD chunk. Even decoders that ignore transparency
may use the bKGD color to fill unused screen area.
If the original image has premultiplied (also called "associated")
alpha data, convert it to PNG's non-premultiplied format by
dividing each sample value by the corresponding alpha value, then
multiplying by the maximum value for the image bit depth, and
rounding to the nearest integer. In valid premultiplied data, the
sample values never exceed their corresponding alpha values, so
the result of the division should always be in the range 0 to 1.
If the alpha value is zero, output black (zeroes).
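An illustrative C sketch of this conversion for one sample (the
function name is hypothetical; sample and alpha are integers in
0..maxval, where maxval is (2^bitdepth)-1):

   unsigned unassociate_alpha(unsigned sample, unsigned alpha,
                              unsigned maxval)
   {
       if (alpha == 0)
           return 0;                                  /* output black */
       return (sample * maxval + alpha / 2) / alpha;  /* round to nearest */
   }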
9.5. Suggested palettes
A PLTE chunk can appear in truecolor PNG files. In such files,
the chunk is not an essential part of the image data, but simply
represents a suggested palette that viewers may use to present the
image on indexed-color display hardware. A suggested palette is
of no interest to viewers running on truecolor hardware.
If an encoder chooses to provide a suggested palette, it is
recommended that a hIST chunk also be written to indicate the
relative importance of the palette entries. The histogram values
are most easily computed as "nearest neighbor" counts, that is,
the approximate usage of each palette entry if no dithering is
applied. (These counts will often be available for free as a
consequence of developing the suggested palette.)
For images of color type 2 (truecolor without alpha channel), it
is recommended that the palette and histogram be computed with
reference to the RGB data only, ignoring any transparent-color
specification. If the file uses transparency (has a tRNS chunk),
viewers can easily adapt the resulting palette for use with their
intended background color. They need only replace the palette
entry closest to the tRNS color with their background color (which
may or may not match the file's bKGD color, if any).
For images of color type 6 (truecolor with alpha channel), it is
recommended that a bKGD chunk appear and that the palette and
histogram be computed with reference to the image as it would
appear after compositing against the specified background color.
This definition is necessary to ensure that useful palette entries
are generated for pixels having fractional alpha values. The
resulting palette will probably only be useful to viewers that
present the image against the same background color. It is
recommended that PNG editors delete or recompute the palette if
they alter or remove the bKGD chunk in an image of color type 6.
If PLTE appears without bKGD in an image of color type 6, the
circumstances under which the palette was computed are
unspecified.
9.6. Filter selection
For images of color type 3 (indexed color), filter type 0 (none)
is usually the most effective.
Filter type 0 is also recommended for images of bit depths less
than 8. For low-bit-depth grayscale images, it may be a net win
to expand the image to 8-bit representation and apply filtering,
but this is rare.
For truecolor and grayscale images, any of the five filters may
prove the most effective. If an encoder uses a fixed filter, the
Paeth filter is most likely to be the best.
For best compression of truecolor and grayscale images, we
recommend an adaptive filtering approach in which a filter is
chosen for each scanline. The following simple heuristic has
performed well in early tests: compute the output scanline using
all five filters, and select the filter that gives the smallest
sum of absolute values of outputs. (Consider the output bytes as
signed differences for this test.) This method usually
outperforms any single fixed filter choice. However, it is likely
that much better heuristics will be found as more experience is
gained with PNG.
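An illustrative C sketch of the scoring step (the function name is
hypothetical); the encoder filters the scanline all five ways and
keeps the candidate with the smallest score:

   #include <stddef.h>

   unsigned long filter_score(const unsigned char *filtered, size_t len)
   {
       unsigned long sum = 0;
       size_t x;
       for (x = 0; x < len; x++) {
           int d = (signed char)filtered[x];  /* treat byte as signed */
           sum += (unsigned long)(d < 0 ? -d : d);
       }
       return sum;
   }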
Filtering according to these recommendations is effective on
interlaced as well as noninterlaced images.
9.7. Text chunk processing
A nonempty keyword must be provided for each text chunk. The
generic keyword "Comment" can be used if no better description of
the text is available. If a user-supplied keyword is used, be
sure to check that it meets the restrictions on keywords.
PNG text strings are expected to use the Latin-1 character set.
Encoders should avoid storing characters that are not defined in
Latin-1, and should provide character code remapping if the local
system's character set is not Latin-1.
Encoders should discourage the creation of single lines of text
longer than 79 characters, in order to facilitate easy reading.
It is recommended that text items less than 1K (1024 bytes) in
size be output using uncompressed tEXt chunks. In particular, it
is recommended that the basic title and author keywords always be
output using uncompressed tEXt chunks. Lengthy disclaimers, on the
other hand, are ideal candidates for zTXt.
Placing large tEXt and zTXt chunks after the image data (after
IDAT) can speed up image display in some situations, since the
decoder won't have to read over the text to get to the image data.
But it is recommended that small text chunks, such as the image
title, appear before IDAT.
9.8. Use of private chunks
Applications can use PNG private chunks to carry information that
need not be understood by other applications. Such chunks must be
given names with lowercase second letters, to ensure that they can
never conflict with any future public chunk definition. Note,
however, that there is no guarantee that some other application
will not use the same private chunk name. If you use a private
chunk type, it is prudent to store additional identifying
information at the beginning of the chunk data.
Use an ancillary chunk type (lowercase first letter), not a
critical chunk type, for all private chunks that store information
that is not absolutely essential to view the image. Creation of
private critical chunks is discouraged because they render PNG
files unportable. Such chunks should not be used in publicly
available software or files. If private critical chunks are
essential for your application, it is recommended that one appear
near the start of the file, so that a standard decoder need not
read very far before discovering that it cannot handle the file.
If you want others outside your organization to understand a chunk
type that you invent, contact the maintainers of the PNG
specification to submit a proposed chunk name and definition for
addition to the list of special-purpose public chunks (see
Additional Chunk Types, Section 4.4). Note that a proposed public
chunk name (with uppercase second letter) must not be used in
publicly available software or files until registration has been
approved.
If an ancillary chunk contains textual information that might be
of interest to a human user, you should not create a special chunk
type for it. Instead use a tEXt chunk and define a suitable
keyword. That way, the information will be available to users not
using your software.
Keywords in tEXt chunks should be reasonably self-explanatory,
since the idea is to let other users figure out what the chunk
contains. If of general usefulness, new keywords can be
registered with the maintainers of the PNG specification. But it
is permissible to use keywords without registering them first.
9.9. Private type and method codes
This specification defines the meaning of only some of the
possible values of some fields. For example, only compression
method 0 and filter types 0 through 4 are defined. Use numbers
greater than 127 when inventing experimental or private
definitions of values for any of these fields. Numbers below 128
are reserved for possible future public extensions of this
specification. Note that use of private type codes may render a
file unreadable by standard decoders. Such codes are strongly
discouraged except for experimental purposes, and should not
appear in publicly available software or files.
10. Recommendations for Decoders
This chapter gives some recommendations for decoder behavior. The
only absolute requirement on a PNG decoder is that it successfully
read any file conforming to the format specified in the preceding
chapters. However, best results will usually be achieved by
following these recommendations.
10.1. Error checking
To ensure early detection of common file-transfer problems,
decoders should verify that all eight bytes of the PNG file
signature are correct. (See Rationale: PNG file signature,
Section 12.11.) A decoder can have additional confidence in the
file's integrity if the next eight bytes are an IHDR chunk header
with the correct chunk length.
Unknown chunk types must be handled as described in Chunk naming
conventions (Section 3.3). An unknown chunk type is not to be
treated as an error unless it is a critical chunk.
It is strongly recommended that decoders verify the CRC on each
chunk.
In some situations it is desirable to check chunk headers (length
and type code) before reading the chunk data and CRC. The chunk
type can be checked for plausibility by seeing whether all four
bytes are ASCII letters (codes 65-90 and 97-122); note that this
need only be done for unrecognized type codes. If the total file
size is known (from file system information, HTTP protocol, etc),
the chunk length can be checked for plausibility as well.
If CRCs are not checked, dropped/added data bytes or an erroneous
chunk length can cause the decoder to get out of step and
misinterpret subsequent data as a chunk header. Verifying that
the chunk type contains letters is an inexpensive way of providing
early error detection in this situation.
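An illustrative C sketch of this test (the function name is
hypothetical):

   /* Return nonzero if all four bytes of a chunk type code are
      ASCII letters (codes 65-90 or 97-122). */
   int plausible_chunk_type(const unsigned char type[4])
   {
       int i;
       for (i = 0; i < 4; i++) {
           unsigned char c = type[i];
           if (!((c >= 65 && c <= 90) || (c >= 97 && c <= 122)))
               return 0;
       }
       return 1;
   }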
For known-length chunks such as IHDR, decoders should treat an
unexpected chunk length as an error. Future extensions to this
specification will not add new fields to existing chunks; instead,
new chunk types will be added to carry new information.
Unexpected values in fields of known chunks (for example, an
unexpected compression method in the IHDR chunk) must be checked
for and treated as errors. However, it is recommended that
unexpected field values be treated as fatal errors only in
critical chunks. An unexpected value in an ancillary chunk can be
handled by ignoring the whole chunk as though it were an unknown
chunk type. (This recommendation assumes that the chunk's CRC has
been verified. In decoders that do not check CRCs, it is safer to
treat any unexpected value as indicating a corrupted file.)
10.2. Pixel dimensions
Non-square pixels can be represented (see the pHYs chunk), but
viewers are not required to account for them; a viewer can present
any PNG file as though its pixels are square.
Conversely, viewers running on display hardware with non-square
pixels are strongly encouraged to rescale images for proper
display.
10.3. Truecolor image handling
To achieve PNG's goal of universal interchangeability, decoders
are required to accept all types of PNG image: indexed-color,
truecolor, and grayscale. Viewers running on indexed-color
display hardware need to be able to reduce truecolor images to
indexed format for viewing. This process is usually called "color
quantization".
A simple, fast way of doing this is to reduce the image to a fixed
palette. Palettes with uniform color spacing ("color cubes") are
usually used to minimize the per-pixel computation. For
photograph-like images, dithering is recommended to avoid ugly
contours in what should be smooth gradients; however, dithering
introduces graininess that can be objectionable.
The quality of rendering can be improved substantially by using a
palette chosen specifically for the image, since a color cube
usually has numerous entries that are unused in any particular
image. This approach requires more work, first in choosing the
palette, and second in mapping individual pixels to the closest
available color. PNG allows the encoder to supply a suggested
palette in a PLTE chunk, but not all encoders will do so, and the
suggested palette may be unsuitable in any case (it may have too
many or too few colors). High-quality viewers will therefore need
to have a palette selection routine at hand. A large lookup table
is usually the most feasible way of mapping individual pixels to
palette entries with adequate speed.
Numerous implementations of color quantization are available. The
PNG reference implementation, libpng, includes code for the
purpose.
10.4. Bit depth rescaling
Decoders may wish to scale PNG data to a lesser bit depth (sample
precision) for display. For example, 16-bit data will need to be
reduced to 8-bit depth for use on most present-day display
hardware. Reduction of 8-bit data to 5-bit depth is also common.
The most accurate scaling is achieved by the linear equation
output = ROUND(input * MAXOUTSAMPLE / MAXINSAMPLE)
where
MAXINSAMPLE = (2^bitdepth)-1
MAXOUTSAMPLE = (2^desired_bitdepth)-1
A slightly less accurate conversion is achieved by simply shifting
right by bitdepth-desired_bitdepth places. For example, to reduce
16-bit samples to 8-bit, one need only discard the low-order byte.
In many situations the shift method is sufficiently accurate for
display purposes, and it is certainly much faster. (But if gamma
correction is being done, sample rescaling can be merged into the
gamma correction lookup table, as is illustrated in Decoder gamma
handling, Section 10.5.)
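An illustrative C sketch of both methods for the 16-bit-to-8-bit
case (the function names are hypothetical):

   unsigned char shift16to8(unsigned v)  /* drop the low-order byte */
   {
       return (unsigned char)(v >> 8);
   }

   unsigned char scale16to8(unsigned v)  /* ROUND(v * 255 / 65535) */
   {
       return (unsigned char)((v * 255u + 32767u) / 65535u);
   }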
When an sBIT chunk is present, the original pre-PNG data can be
recovered by shifting right to the bit depth specified by sBIT.
Note that linear scaling will not necessarily reproduce the
original data, because the encoder is not required to have used
linear scaling to scale the data up. However, the encoder is
required to have used a method that preserves the high-order bits,
so shifting always works. This is the only case in which shifting
might be said to be more accurate than linear scaling.
When comparing pixel values to tRNS chunk values to detect
transparent pixels, it is necessary to do the comparison exactly.
Therefore, transparent pixel detection must be done before
reducing sample precision.
10.5. Decoder gamma handling
See Gamma Tutorial (Chapter 13) if you aren't already familiar
with gamma issues.
To produce correct tone reproduction, a good image display program
must take into account the gammas of the image file and the
display device, as well as the viewing_gamma appropriate to the
lighting conditions near the display. This can be done by
calculating
gbright = sampleval / MAXSAMPLEVAL
bright = gbright ^ (1.0 / file_gamma)
vbright = bright ^ viewing_gamma
gcvideo = vbright ^ (1.0 / display_gamma)
fbval = ROUND(gcvideo * MAXFBVAL)
where MAXSAMPLEVAL is the maximum sample value in the file (255
for 8-bit, 65535 for 16-bit, etc), MAXFBVAL is the maximum value
of a frame buffer sample (255 for 8-bit, 31 for 5-bit, etc),
sampleval is the value of the sample in the PNG file, and fbval is
the value to write into the frame buffer. The first line converts
from integer samples into a normalized 0 to 1 floating point
value, the second undoes the gamma encoding of the image file to
produce a linear intensity value, the third adjusts for the
viewing conditions, the fourth corrects for the display system's
gamma value, and the fifth converts to an integer frame buffer
sample. In practice, the second through fourth lines can be
merged into
gcvideo = gbright^(viewing_gamma / (file_gamma*display_gamma))
so as to perform only one power calculation. For color images, the
entire calculation is performed separately for R, G, and B values.
It is not necessary to perform transcendental math for every
pixel. Instead, compute a lookup table that gives the correct
output value for every possible sample value. This requires only
256 calculations per image (for 8-bit accuracy), not one or three
calculations per pixel. For an indexed-color image, a one-time
correction of the palette is sufficient, unless the image uses
transparency and is being displayed against a nonuniform
background.
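An illustrative C sketch of building such a table for 8-bit samples
and an 8-bit frame buffer (the function name is hypothetical; the
second through fourth steps are merged into a single exponent as
shown above):

   #include <math.h>

   void build_gamma_table(unsigned char table[256], double file_gamma,
                          double display_gamma, double viewing_gamma)
   {
       double exponent = viewing_gamma / (file_gamma * display_gamma);
       int i;
       for (i = 0; i < 256; i++)
           table[i] = (unsigned char)(pow(i / 255.0, exponent)
                                      * 255.0 + 0.5);
   }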
In some cases even the cost of computing a gamma lookup table may
be a concern. In these cases, viewers are encouraged to have
precomputed gamma correction tables for file_gamma values of 1.0
and 0.5 with some reasonable choice of viewing_gamma and
display_gamma, and to use the table closest to the gamma indicated
in the file. This will produce acceptable results for the majority
of real files.
When the incoming image has unknown gamma (no gAMA chunk), choose
a likely default file_gamma value, but allow the user to select a
new one if the result proves too dark or too light.
In practice, it is often difficult to determine what value of
display_gamma should be used. In systems with no built-in gamma
correction, the display_gamma is determined entirely by the CRT.
Assuming a CRT_gamma of 2.5 is recommended, unless you have
detailed calibration measurements of this particular CRT
available.
However, many modern frame buffers have lookup tables that are
used to perform gamma correction, and on these systems the
display_gamma value should be the gamma of the lookup table and
CRT combined. You may not be able to find out what the lookup
table contains from within an image viewer application, so you may
have to ask the user what the system's gamma value is.
Unfortunately, different manufacturers use different ways of
specifying what should go into the lookup table, so interpretation
of the system gamma value is system-dependent. Gamma Tutorial
(Chapter 13) gives some examples.
The response of real displays is actually more complex than can be
described by a single number (display_gamma). If actual
measurements of the monitor's light output as a function of
voltage input are available, the fourth and fifth lines of the
computation above can be replaced by a lookup in these
measurements, to find the actual frame buffer value that most
nearly gives the desired brightness.
The value of viewing_gamma depends on lighting conditions; see
Gamma Tutorial (Chapter 13) for more detail. Ideally, a viewer
would allow the user to specify viewing_gamma, either directly
numerically, or via selecting from "bright surround", "dim
surround", and "dark surround" conditions. Viewers that don't
want to do this should just assume a value for viewing_gamma of
1.0, since most computer displays live in brightly-lit rooms.
When viewing images that are digitized from video, or that are
destined to become video frames, the user might want to set the
viewing_gamma to about 1.25 regardless of the actual level of room
lighting. This value of viewing_gamma is "built into" NTSC video
practice, and displaying an image with that viewing_gamma allows
the user to see what a TV set would show under the current room
lighting conditions. (This is not the same thing as trying to
obtain the most accurate rendition of the content of the scene,
which would require adjusting viewing_gamma to correspond to the
room lighting level.) This is another reason viewers might want
to allow users to adjust viewing_gamma directly.
10.6. Decoder color handling
See Color Tutorial (Chapter 14) if you aren't already familiar
with color issues.
In many cases, decoders will treat image data in PNG files as
device-dependent RGB data and display it without modification
(except for appropriate gamma correction). This provides the
fastest display of PNG images. But unless the viewer uses exactly
the same display hardware as the original image author used, the
colors will not be exactly the same as the original author saw,
particularly for darker or near-neutral colors. The cHRM chunk
provides information that allows closer color matching than that
provided by gamma correction alone.
Decoders can use the cHRM data to transform the image data from
RGB to XYZ and thence into a perceptually linear color space such
as CIE LAB. They can then partition the colors to generate an
optimal palette, because the geometric distance between two colors
in CIE LAB is strongly related to how different those colors
appear (unlike, for example, RGB or XYZ spaces). The resulting
palette of colors, once transformed back into RGB color space,
could be used for display or written into a PLTE chunk.
Decoders that are part of image processing applications might also
transform image data into CIE LAB space for analysis.
In applications where color fidelity is critical, such as product
design, scientific visualization, medicine, architecture, or
advertising, decoders can transform the image data from source_RGB
to the display_RGB space of the monitor used to view the image.
This involves calculating the matrix to go from source_RGB to XYZ
and the matrix to go from XYZ to display_RGB, then combining them
to produce the overall transformation. The decoder is responsible
for implementing gamut mapping.
Decoders running on platforms that have a Color Management System
(CMS) can pass the image data, gAMA and cHRM values to the CMS for
display or further processing.
Decoders that provide color printing facilities can use the
facilities in Level 2 PostScript to specify image data in
calibrated RGB space or in a device-independent color space such
as XYZ. This will provide better color fidelity than a simple RGB
to CMYK conversion. The PostScript Language Reference manual
gives examples of this process [POSTSCRIPT]. Such decoders are
responsible for implementing gamut mapping between source_RGB
(specified in the cHRM chunk) and the target printer. The
PostScript interpreter is then responsible for producing the
required colors.
Decoders can use the cHRM data to calculate an accurate grayscale
representation of a color image. Conversion from RGB to gray is
simply a case of calculating the Y (luminance) component of XYZ,
which is a weighted sum of the R, G, and B values. The weights
depend on the monitor type, i.e., the values in the cHRM chunk.
Decoders may wish to do this for PNG files with no cHRM chunk. In
that case, a reasonable default would be the CCIR 709 primaries
[ITU-BT709]. Do not use the original NTSC primaries, unless you
really do have an image color-balanced for such a monitor. Few
monitors ever used the NTSC primaries, so such images are probably
nonexistent these days.
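As an example, with the CCIR 709 primaries and D65 white point the
luminance weights come out to approximately the values below (a
sketch under those assumptions; the R, G, and B inputs must be
linear, i.e., already gamma-decoded):

   /* Approximate grayscale (luminance) conversion for linear
    * RGB samples, using weights derived from the CCIR 709
    * primaries and D65 white point.
    */
   double rgb_to_gray(double r, double g, double b)
   {
       return 0.2126 * r + 0.7152 * g + 0.0722 * b;
   }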
10.7. Background color
The background color given by bKGD will typically be used to fill
unused screen space around the image, as well as any transparent
pixels within the image. (Thus, bKGD is valid and useful even
when the image does not use transparency.) If no bKGD chunk is
present, the viewer must make its own decision about a suitable
background color.
Viewers that have a specific background against which to present
the image (such as Web browsers) will ignore the bKGD chunk, in
effect overriding bKGD with their preferred background color or
background image.
The background color given by bKGD is not to be considered
transparent, even if it happens to match the color given by tRNS
(or, in the case of an indexed-color image, refers to a palette
index that is marked as transparent by tRNS). Otherwise one would
have to imagine something "behind the background" to composite
against. The background color is either used as background or
ignored; it is not an intermediate layer between the PNG image and
some other background.
Indeed, it will be common that bKGD and tRNS specify the same
color, since then a decoder that does not implement transparency
processing will give the intended display, at least when no
partially-transparent pixels are present.
10.8. Alpha channel processing
In the most general case, the alpha channel can be used to
composite a foreground image against a background image; the PNG
file defines the foreground image and the transparency mask, but
not the background image. Decoders are not required to support
this most general case. It is expected that most will be able to
support compositing against a single background color, however.
The equation for computing a composited sample value is
output = alpha * foreground + (1-alpha) * background
where alpha and the input and output sample values are expressed
as fractions in the range 0 to 1. This computation should be
performed with linear (non-gamma-encoded) sample values. For
color images, the computation is done separately for R, G, and B
samples.
The following code illustrates the general case of compositing a
foreground image over a background image. It assumes that you
have the original pixel data available for the background image,
and that output is to a frame buffer for display. Other variants
are possible; see the comments below the code. The code allows
the bit depths and gamma values of foreground image, background
image, and frame buffer/CRT all to be different. Don't assume
they are the same without checking.
This code is standard C, with line numbers added for reference in
the comments below.
   01  int foreground[4];  /* image pixel: R, G, B, A */
   02  int background[3];  /* background pixel: R, G, B */
   03  int fbpix[3];       /* frame buffer pixel */
   04  int fg_maxsample;   /* foreground max sample */
   05  int bg_maxsample;   /* background max sample */
   06  int fb_maxsample;   /* frame buffer max sample */
   07  int ialpha;
   08  float alpha, compalpha;
   09  float gamfg, linfg, gambg, linbg, comppix, gcvideo;

       /* Get max sample values in data and frame buffer */
   10  fg_maxsample = (1 << fg_bit_depth) - 1;
   11  bg_maxsample = (1 << bg_bit_depth) - 1;
   12  fb_maxsample = (1 << frame_buffer_bit_depth) - 1;
       /*
        * Get integer version of alpha.
        * Check for opaque and transparent special cases;
        * no compositing needed if so.
        *
        * We show the whole gamma decode/correct process in
        * floating point, but it would more likely be done
        * with lookup tables.
        */
   13  ialpha = foreground[3];

   14  if (ialpha == 0) {
           /*
            * Foreground image is transparent here.
            * If the background image is already in the frame
            * buffer, there is nothing to do.
            */
   15      ;
   16  } else if (ialpha == fg_maxsample) {
           /*
            * Copy foreground pixel to frame buffer.
            */
   17      for (i = 0; i < 3; i++) {
   18          gamfg = (float) foreground[i] / fg_maxsample;
   19          linfg = pow(gamfg, 1.0/fg_gamma);
   20          comppix = linfg;
   21          gcvideo = pow(comppix,viewing_gamma/display_gamma);
   22          fbpix[i] = (int) (gcvideo * fb_maxsample + 0.5);
   23      }
   24  } else {
           /*
            * Compositing is necessary.
            * Get floating-point alpha and its complement.
            * Note: alpha is always linear; gamma does not
            * affect it.
            */
   25      alpha = (float) ialpha / fg_maxsample;
   26      compalpha = 1.0 - alpha;
   27      for (i = 0; i < 3; i++) {
               /*
                * Convert foreground and background to floating
                * point, then linearize (undo gamma encoding).
                */
   28          gamfg = (float) foreground[i] / fg_maxsample;
   29          linfg = pow(gamfg, 1.0/fg_gamma);
   30          gambg = (float) background[i] / bg_maxsample;
   31          linbg = pow(gambg, 1.0/bg_gamma);
               /*
                * Composite.
                */
   32          comppix = linfg * alpha + linbg * compalpha;
               /*
                * Gamma correct for display.
                * Convert to integer frame buffer pixel.
                */
   33          gcvideo = pow(comppix,viewing_gamma/display_gamma);
   34          fbpix[i] = (int) (gcvideo * fb_maxsample + 0.5);
   35      }
   36  }
Variations:
* If output is to another PNG image file instead of a frame
buffer, lines 21, 22, 33, and 34 should be changed to be
something like
/*
* Gamma encode for storage in output file.
* Convert to integer sample value.
*/
gamout = pow(comppix, outfile_gamma);
outpix[i] = (int) (gamout * out_maxsample + 0.5);
Also, it becomes necessary to process background pixels when
alpha is zero, rather than just skipping pixels. Thus, line
15 must be replaced by copies of lines 17-23, but processing
background instead of foreground pixel values.
* If the bit depth of the output file, foreground file, and
background file are all the same, and the three gamma values
also match, then the no-compositing code in lines 14-23
reduces to nothing more than copying pixel values from the
input file to the output file if alpha is one, or copying
pixel values from background to output file if alpha is
zero. Since alpha is typically either zero or one for the
vast majority of pixels in an image, this is a great
savings. No gamma computations are needed for most pixels.
* When the bit depths and gamma values all match, it may
appear attractive to skip the gamma decoding and encoding
(lines 28-31, 33-34) and just perform line 32 using
gamma-encoded sample values. Although this doesn't hurt image
quality too badly, the time savings are small if alpha
values of zero and one are special-cased as recommended
here.
* If the original pixel values of the background image are no
longer available, only processed frame buffer pixels left by
display of the background image, then lines 30 and 31 must
extract intensity from the frame buffer pixel values using
code like
/*
* Decode frame buffer value back into linear space.
*/
gcvideo = (float) fbpix[i] / fb_maxsample;
linbg = pow(gcvideo, display_gamma / viewing_gamma);
However, some roundoff error can result, so it is better to
have the original background pixels available if at all
possible.
* Note that lines 18-22 are performing exactly the same gamma
computation that is done when no alpha channel is present.
So, if you handle the no-alpha case with a lookup table, you
can use the same lookup table here. Lines 28-31 and 33-34
can also be done with (different) lookup tables.
* Of course, everything here can be done in integer
arithmetic. Just be careful to maintain sufficient
precision all the way through.
Note: in floating point, no overflow or underflow checks are
needed, because the input sample values are guaranteed to be
between 0 and 1, and compositing always yields a result that is in
between the input values (inclusive). With integer arithmetic,
some roundoff-error analysis might be needed to guarantee no
overflow or underflow.
When displaying a PNG image with full alpha channel, it is
important to be able to composite the image against some
background, even if it's only black. Ignoring the alpha channel
will cause PNG images that have been converted from an
associated-alpha representation to look wrong. (Of course, if the
alpha channel is a separate transparency mask, then ignoring alpha
is a useful option: it allows the hidden parts of the image to be
recovered.)
Even if the decoder author does not wish to implement true
compositing logic, it is simple to deal with images that contain
only zero and one alpha values. (This is implicitly true for
grayscale and truecolor PNG files that use a tRNS chunk; for
indexed-color PNG files, it is easy to check whether tRNS contains
any values other than 0 and 255.) In this simple case,
transparent pixels are replaced by the background color, while
others are unchanged. If a decoder contains only this much
transparency capability, it should deal with a full alpha channel
by treating all nonzero alpha values as fully opaque; that is, do
not replace partially transparent pixels by the background. This
approach will not yield very good results for images converted
from associated-alpha formats, but it's better than doing nothing.
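In this simple scheme, when the bit depths and gamma values all
match, the per-pixel work collapses to the following (a fragment in
the style of the code above, reusing its variable names):

   /* Simple transparency: a fully transparent pixel is replaced
    * by the background color; any nonzero alpha is treated as
    * fully opaque.  Assumes matching bit depths and gammas.
    */
   for (i = 0; i < 3; i++)
       fbpix[i] = (ialpha == 0) ? background[i] : foreground[i];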
10.9. Progressive display
When receiving images over slow transmission links, decoders can
improve perceived performance by displaying interlaced images
progressively. This means that as each pass is received, an
approximation to the complete image is displayed based on the data
received so far. One simple yet pleasing effect can be obtained
by expanding each received pixel to fill a rectangle covering the
yet-to-be-transmitted pixel positions below and to the right of
the received pixel. This process can be described by the
following pseudocode:
Starting_Row  [1..7] = { 0, 0, 4, 0, 2, 0, 1 }
Starting_Col  [1..7] = { 0, 4, 0, 2, 0, 1, 0 }
Row_Increment [1..7] = { 8, 8, 8, 4, 4, 2, 2 }
Col_Increment [1..7] = { 8, 8, 4, 4, 2, 2, 1 }
Block_Height  [1..7] = { 8, 8, 4, 4, 2, 2, 1 }
Block_Width   [1..7] = { 8, 4, 4, 2, 2, 1, 1 }
pass := 1
while pass <= 7
begin
row := Starting_Row[pass]
while row < height
begin
col := Starting_Col[pass]
while col < width
begin
visit (row, col,
min (Block_Height[pass], height - row),
min (Block_Width[pass], width - col))
col := col + Col_Increment[pass]
end
row := row + Row_Increment[pass]
end
pass := pass + 1
end
Here, the function "visit(row,column,height,width)" obtains the
next transmitted pixel and paints a rectangle of the specified
height and width, whose upper-left corner is at the specified row
and column, using the color indicated by the pixel. Note that row
and column are measured from 0,0 at the upper left corner.
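A direct C rendering of "visit()" might look like the following
sketch (the frame buffer array and its dimensions are hypothetical):

   #define FB_WIDTH  640       /* hypothetical dimensions */
   #define FB_HEIGHT 480

   static long framebuf[FB_HEIGHT][FB_WIDTH];

   /* Paint a height-by-width rectangle of the given color with
    * its upper-left corner at (row, col).  Height and width are
    * assumed already clipped to the image edges, as in the
    * pseudocode's min() calls.
    */
   void visit(int row, int col, int height, int width, long color)
   {
       int r, c;

       for (r = row; r < row + height; r++)
           for (c = col; c < col + width; c++)
               framebuf[r][c] = color;
   }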
If the decoder is merging the received image with a background
image, it may be more convenient just to paint the received pixel
positions; that is, the "visit()" function sets only the pixel at
the specified row and column, not the whole rectangle. This
produces a "fade-in" effect as the new image gradually replaces
the old. An advantage of this approach is that proper alpha or
transparency processing can be done as each pixel is replaced.
Painting a rectangle as described above will overwrite
background-image pixels that may be needed later, if the pixels
eventually received for those positions turn out to be wholly or
partially transparent. Of course, this is only a problem if the
background image is not stored anywhere offscreen.
10.10. Suggested-palette and histogram usage
In truecolor PNG files, the encoder may have provided a suggested
PLTE chunk for use by viewers running on indexed-color hardware.
If the image has a tRNS chunk, the viewer will need to adapt the
suggested palette for use with its desired background color. To
do this, replace the palette entry closest to the tRNS color with
the desired background color; or just add a palette entry for the
background color, if the viewer can handle more colors than there
are PLTE entries.
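Locating the palette entry closest to the tRNS color can be done
with a simple nearest-neighbor search (a sketch; squared RGB
distance is used here, though a perceptual space such as CIE LAB
would give better matches):

   /* Return the index of the palette entry nearest to (r,g,b). */
   int nearest_palette_entry(const unsigned char palette[][3],
                             int num_entries, int r, int g, int b)
   {
       int i, best = 0;
       long best_dist = -1;

       for (i = 0; i < num_entries; i++) {
           long dr = (long) palette[i][0] - r;
           long dg = (long) palette[i][1] - g;
           long db = (long) palette[i][2] - b;
           long dist = dr * dr + dg * dg + db * db;

           if (best_dist < 0 || dist < best_dist) {
               best_dist = dist;
               best = i;
           }
       }
       return best;
   }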
For images of color type 6 (truecolor with alpha channel), any
suggested palette should have been designed for display of the
image against a uniform background of the color specified by bKGD.
Viewers should probably ignore the palette if they intend to use a
different background, or if the bKGD chunk is missing. Viewers
can use a suggested palette for display against a different
background than it was intended for, but the results may not be
very good.
If the viewer presents a transparent truecolor image against a
background that is more complex than a single color, it is
unlikely that the suggested palette will be optimal for the
composite image. In this case it is best to perform a truecolor
compositing step on the truecolor PNG image and background image,
then color-quantize the resulting image.
The histogram chunk is useful when the viewer cannot provide as
many colors as are used in the image's palette. If the viewer is
only short a few colors, it is usually adequate to drop the
least-used colors from the palette. To reduce the number of
colors substantially, it's best to choose entirely new
representative colors, rather than trying to use a subset of the
existing palette. This amounts to performing a new color
quantization step; however, the existing palette and histogram can
be used as the input data, thus avoiding a scan of the image data.
If no palette or histogram chunk is provided, a decoder can
develop its own, at the cost of an extra pass over the image data.
Alternatively, a default palette (probably a color cube) can be
used.
See also Recommendations for Encoders: Suggested palettes (Section
9.5).
10.11. Text chunk processing
If practical, decoders should have a way to display to the user
all tEXt and zTXt chunks found in the file. Even if the decoder
does not recognize a particular text keyword, the user might be
able to understand it.
PNG text is not supposed to contain any characters outside the ISO
8859-1 "Latin-1" character set (that is, no codes 0-31 or
127-159), except for the newline character (decimal 10). But decoders
might encounter such characters anyway. Some of these characters
can be safely displayed (e.g., TAB, FF, and CR, decimal 9, 12, and
13, respectively), but others, especially the ESC character
(decimal 27), could pose a security hazard because unexpected
actions may be taken by display hardware or software. To prevent
such hazards, decoders should not attempt to directly display any
non-Latin-1 characters (except for newline and perhaps TAB, FF,
CR) encountered in a tEXt or zTXt chunk. Instead, ignore them or
display them in a visible notation such as "\nnn". See Security
considerations (Section 8.5).
Even though encoders are supposed to represent newlines as LF, it
is recommended that decoders not rely on this; it's best to
recognize all the common newline combinations (CR, LF, and CR-LF)
and display each as a single newline. TAB can be expanded to the
proper number of spaces needed to arrive at a column multiple of
8.
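A text-display loop might fold the newline variants like this (a
simplified sketch; a real viewer would also filter the control
characters discussed above and expand TABs):

   #include <stdio.h>

   /* Print tEXt/zTXt text of length len, folding CR, LF, and
    * CR-LF into single newlines.
    */
   void show_png_text(const unsigned char *text, int len)
   {
       int i;

       for (i = 0; i < len; i++) {
           if (text[i] == '\r') {          /* CR or CR-LF */
               putchar('\n');
               if (i + 1 < len && text[i + 1] == '\n')
                   i++;                    /* swallow LF of CR-LF */
           } else {
               putchar(text[i]);           /* includes bare LF */
           }
       }
   }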
Decoders running on systems with non-Latin-1 character set
encoding should provide character code remapping so that Latin-1
characters are displayed correctly. Some systems may not provide
all the characters defined in Latin-1. Mapping unavailable
characters to a visible notation such as "\nnn" is a good
fallback. In particular, character codes 127-255 should be
displayed only if they are printable characters on the decoding
system. Some systems may interpret such codes as control
characters; for security, decoders running on such systems should
not display such characters literally.
Decoders should be prepared to display text chunks that contain
any number of printing characters between newline characters, even
though encoders are encouraged to avoid creating lines in excess
of 79 characters.
11. Glossary
Alpha
A value representing a pixel's degree of transparency. The more
transparent a pixel, the less it hides the background against
which the image is presented. In PNG, alpha is really the degree
of opacity: zero alpha represents a completely transparent pixel,
maximum alpha represents a completely opaque pixel. But most
people refer to alpha as providing transparency information, not
opacity information, and we continue that custom here.
Ancillary chunk
A chunk that provides additional information. A decoder can still
produce a meaningful image, though not necessarily the best
possible image, without processing the chunk.
Byte
Eight bits; also called an octet.
Channel
The set of all samples of the same kind within an image; for
example, all the blue samples in a truecolor image. (The term
"component" is also used, but not in this specification.) A
sample is the intersection of a channel and a pixel.
Chunk
A section of a PNG file. Each chunk has a type indicated by its
chunk type name. Most types of chunks also include some data.
The format and meaning of the data within the chunk are determined
by the type name.
Chromaticity
A pair of values x,y that precisely specify the hue, though not
the absolute brightness, of a perceived color.
Composite
As a verb, to form an image by merging a foreground image and a
background image, using transparency information to determine
where the background should be visible. The foreground image is
said to be "composited against" the background.
CRC
Cyclic Redundancy Check. A CRC is a type of check value designed
to catch most transmission errors. A decoder calculates the CRC
for the received data and compares it to the CRC that the encoder
calculated, which is appended to the data. A mismatch indicates
that the data was corrupted in transit.
CRT
Cathode Ray Tube: a common type of computer display hardware.
Critical chunk
A chunk that must be understood and processed by the decoder in
order to produce a meaningful image from a PNG file.
Datastream
A sequence of bytes. This term is used rather than "file" to
describe a byte sequence that is only a portion of a file. We
also use it to emphasize that a PNG image might be generated and
consumed "on the fly", never appearing in a stored file at all.
Deflate
The name of the compression algorithm used in standard PNG files,
as well as in zip, gzip, pkzip, and other compression programs.
Deflate is a member of the LZ77 family of compression methods.
Filter
A transformation applied to image data in hopes of improving its
compressibility. PNG uses only lossless (reversible) filter
algorithms.
Frame buffer
The final digital storage area for the image shown by a computer
display. Software causes an image to appear onscreen by loading
it into the frame buffer.
Gamma
The brightness of mid-level tones in an image. More precisely, a
parameter that describes the shape of the transfer function for
one or more stages in an imaging pipeline. The transfer function
is given by the expression
output = input ^ gamma
where both input and output are scaled to the range 0 to 1.
Grayscale
An image representation in which each pixel is represented by a
single sample value representing overall luminance (on a scale
from black to white). PNG also permits an alpha sample to be
stored for each pixel of a grayscale image.
Indexed color
An image representation in which each pixel is represented by a
single sample that is an index into a palette or lookup table.
The selected palette entry defines the actual color of the pixel.
Lossless compression
Any method of data compression that guarantees the original data
can be reconstructed exactly, bit-for-bit.
Lossy compression
Any method of data compression that reconstructs the original data
approximately, rather than exactly.
LSB
Least Significant Byte of a multi-byte value.
Luminance
Perceived brightness, or grayscale level, of a color. Luminance
and chromaticity together fully define a perceived color.
LUT
Look Up Table. In general, a table used to transform data. In
frame buffer hardware, a LUT can be used to map indexed-color
pixels into a selected set of truecolor values, or to perform
gamma correction. In software, a LUT can be used as a fast way of
implementing any one-variable mathematical function.
MSB
Most Significant Byte of a multi-byte value.
Palette
The set of colors available in an indexed-color image. In PNG, a
palette is an array of colors defined by red, green, and blue
samples. (Alpha values can also be defined for palette entries,
via the tRNS chunk.)
Pixel
The information stored for a single grid point in the image. The
complete image is a rectangular array of pixels.
PNG editor
A program that modifies a PNG file and preserves ancillary
information, including chunks that it does not recognize. Such a
program must obey the rules given in Chunk Ordering Rules (Chapter
7).
Sample
A single number in the image data; for example, the red value of a
pixel. A pixel is composed of one or more samples. We use
"sample" both for color values and for the palette index values of
an indexed-color image.
Scanline
One horizontal row of pixels within an image.
Truecolor
An image representation in which pixel colors are defined by
storing three samples for each pixel, representing red, green, and
blue intensities respectively. PNG also permits an alpha sample
to be stored for each pixel of a truecolor image.
White point
The chromaticity of a computer display's nominal white value.
zlib
A particular format for data that has been compressed using
deflate-style compression. Also the name of a library
implementing this method. PNG implementations need not use the
zlib library, but they must conform to its format for compressed
data.
x^y
Exponentiation; x raised to the power y. C programmers should be
careful not to misread this notation as exclusive-or. Note that
in gamma-related calculations, zero raised to any power is valid
and should give a zero result.
12. Appendix: Rationale
(This appendix is not part of the formal PNG specification.)
This appendix gives the reasoning behind some of the design decisions
in PNG. Many of these decisions were the subject of considerable
debate. The authors freely admit that another group might have made
different decisions; however, we believe that our choices are
defensible and consistent.
12.1. Why a new file format?
Does the world really need yet another graphics format? We
believe so. GIF is no longer freely usable, but no other commonly
used format can directly replace it, as is discussed in more
detail below. We might have used an adaptation of an existing
format, for example GIF with an unpatented compression scheme.
But this would require new code anyway; it would not be all that
much easier to implement than a whole new file format. (PNG is
designed to be simple to implement, with the exception of the
compression engine, which would be needed in any case.) We feel
that this is an excellent opportunity to design a new format that
fixes some of the known limitations of GIF.
12.2. Why these features?
The features chosen for PNG are intended to address the needs of
applications that previously used the special strengths of GIF.
In particular, GIF is well adapted for online communications
because of its streamability and progressive display capability.
PNG shares those attributes.
We have also addressed some of the widely known shortcomings of
GIF. In particular, PNG supports truecolor images. We know of no
widely used image format that losslessly compresses truecolor
images as effectively as PNG does. We hope that PNG will make use
of truecolor images more practical and widespread.
Some form of transparency control is desirable for applications in
which images are displayed against a background or together with
other images. GIF provided a simple transparent-color
specification for this purpose. PNG supports a full alpha channel
as well as transparent-color specifications. This allows both
highly flexible transparency and compression efficiency.
Robustness against transmission errors has been an important
consideration. For example, images transferred across Internet
are often mistakenly processed as text, leading to file
corruption. PNG is designed so that such errors can be detected
quickly and reliably.
PNG has been expressly designed not to be completely dependent on
a single compression technique. Although deflate/inflate
compression is mentioned in this document, PNG would still exist
without it.
12.3. Why not these features?
Some features have been deliberately omitted from PNG. These
choices were made to simplify implementation of PNG, promote
portability and interchangeability, and make the format as simple
and foolproof as possible for users. In particular:
* There is no uncompressed variant of PNG. It is possible to
store uncompressed data by using only uncompressed deflate
blocks (a feature normally used to guarantee that deflate
does not make incompressible data much larger). However,
any software that does not support full deflate/inflate will
not be considered compliant with the PNG standard. The two
most important features of PNG---portability and
compression---are absolute requirements for online
applications, and users demand them. Failure to support full
deflate/inflate compromises both of these objectives.
* There is no lossy compression in PNG. Existing formats such
as JFIF already handle lossy compression well. Furthermore,
available lossy compression methods (e.g., JPEG) are far
from foolproof --- a poor choice of quality level can ruin
an image. To avoid user confusion and unintentional loss of
information, we feel it is best to keep lossy and lossless
formats strictly separate. Also, lossy compression is
complex to implement. Adding JPEG support to a PNG decoder
might increase its size by an order of magnitude. This
would certainly cause some decoders to omit support for the
feature, which would destroy our goal of interchangeability.
* There is no support for CMYK or other unusual color spaces.
Again, this is in the name of promoting portability. CMYK,
in particular, is far too device-dependent to be useful as a
portable image representation.
* There is no standard chunk for thumbnail views of images.
In discussions with software vendors who use thumbnails in
their products, it has become clear that most would not use
a "standard" thumbnail chunk. For one thing, every vendor
has a different idea of what the dimensions and
characteristics of a thumbnail should be. Also, some
vendors keep thumbnails in separate files to accommodate
varied image formats; they are not going to stop doing that
simply because of a thumbnail chunk in one new format.
Proprietary chunks containing vendor-specific thumbnails
appear to be more practical than a common thumbnail format.
It is worth noting that private extensions to PNG could easily add
these features. We will not, however, include them as part of the
basic PNG standard.
Basic PNG also does not support multiple images in one file. This
restriction is a reflection of the reality that many applications
do not need and will not support multiple images per file. (While
the GIF standard nominally allows multiple images per file, few
applications actually support it.) In any case, single images are
a fundamentally different sort of object from sequences of images.
Rather than make false promises of interchangeability, we have
drawn a clear distinction between single-image and multi-image
formats. PNG is a single-image format.
12.4. Why not use format X?
Numerous existing formats were considered before deciding to
develop PNG. None could meet the requirements we felt were
important for PNG.
GIF is no longer suitable as a universal standard because of legal
entanglements. Although just replacing GIF's compression method
would avoid that problem, GIF does not support truecolor images,
alpha channels, or gamma correction. The spec has more subtle
problems too. Only a small subset of the GIF89 spec is actually
portable across a variety of implementations, but there is no
codification of the most portable part of the spec.
TIFF is far too complex to meet our goals of simplicity and
interchangeability. Defining a TIFF subset would meet that
objection, but would frustrate users making the reasonable
assumption that a file saved as TIFF from their existing software
would load into a program supporting our flavor of TIFF.
Furthermore, TIFF is not designed for stream processing, has no
provision for progressive display, and does not currently provide
any good, legally unencumbered, lossless compression method.
IFF has also been suggested, but is not suitable in detail:
available image representations are too machine-specific or not
adequately compressed. The overall chunk structure of IFF is a
useful concept that PNG has liberally borrowed from, but we did
not attempt to be bit-for-bit compatible with IFF chunk structure.
Again this is due to detailed issues, notably the fact that IFF
FORMs are not designed to be serially writable.
Lossless JPEG is not suitable because it does not provide for the
storage of indexed-color images. Furthermore, its lossless
truecolor compression is often inferior to that of PNG.
12.5. Byte order
It has been asked why PNG uses network byte order. We have
selected one byte ordering and used it consistently. Which order
in particular is of little relevance, but network byte order has
the advantage that routines to convert to and from it are already
available on any platform that supports TCP/IP networking,
including all PC platforms. The functions are trivial and will be
included in the reference implementation.
12.6. Interlacing
PNG's two-dimensional interlacing scheme is more complex to
implement than GIF's line-wise interlacing. It also costs a
little more in file size. However, it yields an initial image
eight times faster than GIF (the first pass transmits only 1/64th
of the pixels, compared to 1/8th for GIF). Although this initial
image is coarse, it is useful in many situations. For example, if
the image is a World Wide Web imagemap that the user has seen
before, PNG's first pass is often enough to determine where to
click. The PNG scheme also looks better than GIF's, because
horizontal and vertical resolution never differ by more than a
factor of two; this avoids the odd "stretched" look seen when
interlaced GIFs are filled in by replicating scanlines.
Preliminary results show that small text in an interlaced PNG
image is typically readable about twice as fast as in an
equivalent GIF, i.e., after PNG's fifth pass or 25% of the image
data, instead of after GIF's third pass or 50%. This is again due
to PNG's more balanced increase in resolution.
12.7. Why gamma?
It might seem natural to standardize on storing sample values that
are linearly proportional to light intensity (that is, have gamma
of 1.0). But in fact, it is common for images to have a gamma of
less than 1. There are three good reasons for this:
* For reasons detailed in Gamma Tutorial (Chapter 13), all
video cameras apply a "gamma correction" function to the
intensity information. This causes the video signal to have
a gamma of about 0.5 relative to the light intensity in the
original scene. Thus, images obtained by frame-grabbing
video already have a gamma of about 0.5.
* The human eye has a nonlinear response to intensity, so
linear encoding of samples either wastes sample codes in
bright areas of the image, or provides too few sample codes
to avoid banding artifacts in dark areas of the image, or
both. At least 12 bits per sample are needed to avoid
visible artifacts in linear encoding with a 100:1 image
intensity range. An image gamma in the range 0.3 to 0.5
allocates sample values in a way that roughly corresponds to
the eye's response, so that 8 bits/sample are enough to
avoid artifacts caused by insufficient sample precision in
almost all images. This makes "gamma encoding" a much
better way of storing digital images than the simpler linear
encoding.
* Many images are created on PCs or workstations with no gamma
correction hardware and no software willing to provide gamma
correction either. In these cases, the images have had
their lighting and color chosen to look best on this
platform --- they can be thought of as having "manual" gamma
correction built in. To see what the image author intended,
it is necessary to treat such images as having a file_gamma
value in the range 0.4-0.6, depending on the room lighting
level that the author was working in.
In practice, image gamma values around 1.0 and around 0.5 are both
widely found. Older image standards such as GIF often do not
account for this fact. The JFIF standard specifies that images in
that format should use linear samples, but many JFIF images found
on the Internet actually have a gamma somewhere near 0.4 or 0.5.
The variety of images found and the variety of systems that people
display them on have led to widespread problems with images
appearing "too dark" or "too light".
PNG expects viewers to compensate for image gamma at the time that
the image is displayed. Another possible approach is to expect
encoders to convert all images to a uniform gamma at encoding
time. While that method would speed viewers slightly, it has
fundamental flaws:
* Gamma correction is inherently lossy due to quantization and
roundoff error. Requiring conversion at encoding time thus
causes irreversible loss. Since PNG is intended to be a
lossless storage format, this is undesirable; we should
store unmodified source data.
* The encoder might not know the source gamma value. If the
decoder does gamma correction at viewing time, it can adjust
the gamma (change the displayed brightness) in response to
feedback from a human user. The encoder has no such
recourse.
* Whatever "standard" gamma we settled on would be wrong for
some displays. Hence viewers would still need gamma
correction capability.
Since there will always be images with no gamma or an incorrect
recorded gamma, good viewers will need to incorporate gamma
adjustment code anyway. Gamma correction at viewing time is thus
the right way to go.
See Gamma Tutorial (Chapter 13) for more information.
12.8. Non-premultiplied alpha
PNG uses "unassociated" or "non-premultiplied" alpha so that
images with separate transparency masks can be stored losslessly.
Another common technique, "premultiplied alpha", stores pixel
values premultiplied by the alpha fraction; in effect, the image
is already composited against a black background. Any image data
hidden by the transparency mask is irretrievably lost by that
method, since multiplying by a zero alpha value always produces
zero.
Some image rendering techniques generate images with premultiplied
alpha (the alpha value actually represents how much of the pixel
is covered by the image). This representation can be converted to
PNG by dividing the sample values by alpha, except where alpha is
zero. The result will look good if displayed by a viewer that
handles alpha properly, but will not look very good if the viewer
ignores the alpha channel.
Although each form of alpha storage has its advantages, we did not
want to require all PNG viewers to handle both forms. We
standardized on non-premultiplied alpha as being the lossless and
more general case.
12.9. Filtering
PNG includes filtering capability because filtering can
significantly reduce the compressed size of truecolor and
grayscale images. Filtering is also sometimes of value on
indexed-color images, although this is less common.
The filter algorithms are defined to operate on bytes, rather than
pixels; this gains simplicity and speed with very little cost in
compression performance. Tests have shown that filtering is
usually ineffective for images with fewer than 8 bits per sample,
so providing pixelwise filtering for such images would be
pointless. For 16 bit/sample data, bytewise filtering is nearly
as effective as pixelwise filtering, because MSBs are predicted
from adjacent MSBs, and LSBs are predicted from adjacent LSBs.
The encoder is allowed to change filters for each new scanline.
This creates no additional complexity for decoders, since a
decoder is required to contain defiltering logic for every filter
type anyway. The only cost is an extra byte per scanline in the
pre-compression datastream. Our tests showed that when the same
filter is selected for all scanlines, this extra byte compresses
away to almost nothing, so there is little storage cost compared
to a fixed filter specified for the whole image. And the
potential benefits of adaptive filtering are too great to ignore.
Even with the simplistic filter-choice heuristics so far
discovered, adaptive filtering usually outperforms fixed filters.
In particular, an adaptive filter can change behavior for
successive passes of an interlaced image; a fixed filter cannot.
12.10. Text strings
Most graphics file formats include the ability to store some
textual information along with the image. But many applications
need more than that: they want to be able to store several
identifiable pieces of text. For example, a database using PNG
files to store medical X-rays would likely want to include
patient's name, doctor's name, etc. A simple way to do this in
PNG would be to invent new private chunks holding text. The
disadvantage of such an approach is that other applications would
have no idea what was in those chunks, and would simply ignore
them. Instead, we recommend that textual information be stored in
standard tEXt chunks with suitable keywords. Use of tEXt tells
any PNG viewer that the chunk contains text that might be of
interest to a human user. Thus, a person looking at the file with
another viewer will still be able to see the text, and even
understand what it is if the keywords are reasonably
self-explanatory. (To this end, we recommend spelled-out keywords, not
abbreviations that will be hard for a person to understand.
Saving a few bytes on a keyword is false economy.)
The ISO 8859-1 (Latin-1) character set was chosen as a compromise
between functionality and portability. Some platforms cannot
display anything more than 7-bit ASCII characters, while others
can handle characters beyond the Latin-1 set. We felt that
Latin-1 represents a widely useful and reasonably portable
character set. Latin-1 is a direct subset of character sets
commonly used on popular platforms such as Microsoft Windows and X
Windows. It can also be handled on Macintosh systems with a
simple remapping of characters.
There is presently no provision for text employing character sets
other than Latin-1. We recognize that the need for other character
sets will increase. However, PNG already requires that
programmers implement a number of new and unfamiliar features, and
text representation is not PNG's primary purpose. Since PNG
provides for the creation and public registration of new ancillary
chunks of general interest, we expect that text chunks for other
character sets, such as Unicode, eventually will be registered and
increase gradually in popularity.
12.11. PNG file signature
The first eight bytes of a PNG file always contain the following
values:
   (decimal)              137  80  78  71  13  10  26  10
   (hexadecimal)           89  50  4e  47  0d  0a  1a  0a
   (ASCII C notation)    \211   P   N   G  \r  \n \032  \n
This signature both identifies the file as a PNG file and provides
for immediate detection of common file-transfer problems. The
first two bytes distinguish PNG files on systems that expect the
first two bytes to identify the file type uniquely. The first
byte is chosen as a non-ASCII value to reduce the probability that
a text file may be misrecognized as a PNG file; also, it catches
bad file transfers that clear bit 7. Bytes two through four name
the format. The CR-LF sequence catches bad file transfers that
alter newline sequences. The control-Z character stops file
display under MS-DOS. The final line feed checks for the inverse
of the CR-LF translation problem.
A decoder may further verify that the next eight bytes contain an
IHDR chunk header with the correct chunk length; this will catch
bad transfers that drop or alter null (zero) bytes.
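In C, the signature check itself might look like this (a sketch;
the function name is ours and error handling is minimal):

   #include <stdio.h>
   #include <string.h>

   /* Return 1 if the file begins with the PNG signature, 0 if not. */
   int check_png_signature(FILE *fp)
   {
       static const unsigned char png_sig[8] =
           { 137, 80, 78, 71, 13, 10, 26, 10 };
       unsigned char buf[8];

       if (fread(buf, 1, 8, fp) != 8)
           return 0;
       return memcmp(buf, png_sig, 8) == 0;
   }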
Note that there is no version number in the signature, nor indeed
anywhere in the file. This is intentional: the chunk mechanism
provides a better, more flexible way to handle format extensions,
as explained in Chunk naming conventions (Section 12.13).
12.12. Chunk layout
The chunk design allows decoders to skip unrecognized or
uninteresting chunks: it is simply necessary to skip the
appropriate number of bytes, as determined from the length field.
Limiting chunk length to (2^31)-1 bytes avoids possible problems
for implementations that cannot conveniently handle 4-byte
unsigned values. In practice, chunks will usually be much shorter
than that anyway.
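Skipping a chunk then amounts to reading the length and type and
seeking past the data and CRC (a sketch; the function is
illustrative, the length is in network byte order, and error and
overflow handling are omitted):

   #include <stdio.h>

   /* Read one chunk header into type[], then skip the chunk's
    * data and its 4-byte CRC.  Returns 0 on EOF or read error,
    * 1 otherwise.
    */
   int skip_chunk(FILE *fp, unsigned char type[4])
   {
       unsigned char lenbuf[4];
       unsigned long length;

       if (fread(lenbuf, 1, 4, fp) != 4 ||
           fread(type, 1, 4, fp) != 4)
           return 0;
       length = ((unsigned long) lenbuf[0] << 24) |
                ((unsigned long) lenbuf[1] << 16) |
                ((unsigned long) lenbuf[2] << 8)  |
                 (unsigned long) lenbuf[3];
       return fseek(fp, (long) (length + 4), SEEK_CUR) == 0;
   }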
A separate CRC is provided for each chunk in order to detect
badly-transferred images as quickly as possible. In particular,
critical data such as the image dimensions can be validated before
being used.
The chunk length is excluded from the CRC so that the CRC can be
calculated as the data is generated; this avoids a second pass
over the data in cases where the chunk length is not known in
advance. Excluding the length from the CRC does not create any
extra risk of failing to discover file corruption, since if the
length is wrong, the CRC check will fail: the CRC will be computed
on the wrong set of bytes and then be tested against the wrong
value from the file.
12.13. Chunk naming conventions
The chunk naming conventions allow safe, flexible extension of the
PNG format. This mechanism is much better than a format version
number, because it works on a feature-by-feature basis rather than
being an overall indicator. Decoders can process newer files if
and only if the files use no unknown critical features (as
indicated by finding unknown critical chunks). Unknown ancillary
chunks can be safely ignored. We decided against having an
overall format version number because experience has shown that
format version numbers hurt portability as much as they help.
Version numbers tend to be set unnecessarily high, leading to
older decoders rejecting files that they could have processed
(this was a serious problem for several years after the GIF89 spec
came out, for example). Furthermore, private extensions can be
made either critical or ancillary, and standard decoders will
react appropriately; overall version numbers are no help for
private extensions.
A hypothetical chunk for vector graphics would be a critical
chunk, since if ignored, important parts of the intended image
would be missing. A chunk carrying the Mandelbrot set coordinates
for a fractal image would be ancillary, since other applications
could display the image without understanding what the image
represents. In general, a chunk type should be made critical only
if it is impossible to display a reasonable representation of the
intended image without interpreting that chunk.
The public/private property bit ensures that any newly defined
public chunk type name cannot conflict with proprietary chunks
that could be in use somewhere. However, this does not protect
users of private chunk names from the possibility that someone
else may use the same chunk name for a different purpose. It is a
good idea to put additional identifying information at the start
of the data for any private chunk type.
When a PNG file is modified, certain ancillary chunks may need to
be changed to reflect changes in other chunks. For example, a
histogram chunk needs to be changed if the image data changes. If
the file editor does not recognize histogram chunks, copying them
blindly to a new output file is incorrect; such chunks should be
dropped. The safe/unsafe property bit allows ancillary chunks to
be marked appropriately.
Not all possible modification scenarios are covered by the
safe/unsafe semantics. In particular, chunks that are dependent
on the total file contents are not supported. (An example of such
a chunk is an index of IDAT chunk locations within the file:
adding a comment chunk would inadvertently break the index.)
Definition of such chunks is discouraged. If absolutely necessary
for a particular application, such chunks can be made critical
chunks, with consequent loss of portability to other applications.
In general, ancillary chunks can depend on critical chunks but not
on other ancillary chunks. It is expected that mutually dependent
information should be put into a single chunk.
In some situations it may be unavoidable to make one ancillary
chunk dependent on another. Although the chunk property bits are
insufficient to represent this case, a simple solution is
available: in the dependent chunk, record the CRC of the chunk
depended on. It can then be determined whether that chunk has
been changed by some other program.
The same technique can be useful for other purposes. For example,
if a program relies on the palette being in a particular order, it
can store a private chunk containing the CRC of the PLTE chunk.
If this value matches when the file is again read in, then it
provides high confidence that the palette has not been tampered
with. Note that it is not necessary to mark the private chunk
unsafe-to-copy when this technique is used; thus, such a private
chunk can survive other editing of the file.
12.14. Palette histograms
A viewer may not be able to provide as many colors as are listed
in the image's palette. (For example, some colors could be
reserved by a window system.) To produce the best results in this
situation, it is helpful to have information about the frequency
with which each palette index actually appears, in order to choose
the best palette for dithering or to drop the least-used colors.
Since images are often created once and viewed many times, it
makes sense to calculate this information in the encoder, although
it is not mandatory for the encoder to provide it.
Other image formats have usually addressed this problem by
specifying that the palette entries should appear in order of
frequency of use. That is an inferior solution, because it
doesn't give the viewer nearly as much information: the viewer
can't determine how much damage will be done by dropping the last
few colors. Nor does a sorted palette give enough information to
choose a target palette for dithering, in the case that the viewer
must reduce the number of colors substantially. A palette
histogram provides the information needed to choose such a target
palette without making a pass over the image data.
13. Appendix: Gamma Tutorial
(This appendix is not part of the formal PNG specification.)
It would be convenient for graphics programmers if all of the
components of an imaging system were linear. The voltage coming from
an electronic camera would be directly proportional to the intensity
(power) of light in the scene, the light emitted by a CRT would be
directly proportional to its input voltage, and so on. However,
real-world devices do not behave in this way. All CRT displays,
almost all photographic film, and many electronic cameras have
nonlinear signal-to-light-intensity or intensity-to-signal
characteristics.
Fortunately, all of these nonlinear devices have a transfer function
that is approximated fairly well by a single type of mathematical
function: a power function.
This power function has the general equation

   output = input ^ gamma
where ^ denotes exponentiation, and "gamma" (often printed using the
Greek letter gamma, thus the name) is simply the exponent of the
power function.
By convention, "input" and "output" are both scaled to the range
0..1, with 0 representing black and 1 representing maximum white (or
red, etc). Normalized in this way, the power function is completely
described by a single number, the exponent "gamma".
So, given a particular device, we can measure its output as a
function of its input, fit a power function to this measured transfer
function, extract the exponent, and call it gamma. We often say
"this device has a gamma of 2.5" as a shorthand for "this device has
a power-law response with an exponent of 2.5". We can also talk
about the gamma of a mathematical transform, or of a lookup table in
a frame buffer, so long as the input and output of the thing are
related by the power-law expression above.
How do gammas combine?
Real imaging systems will have several components, and more than
one of these can be nonlinear. If all of the components have
transfer characteristics that are power functions, then the
transfer function of the entire system is also a power function.
The exponent (gamma) of the whole system's transfer function is
just the product of all of the individual exponents (gammas) of
the separate stages in the system.
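For example, if a video camera encodes a scene with a gamma of
0.5 and the resulting signal drives a CRT with a gamma of 2.5, the
end-to-end system has a gamma of 0.5 * 2.5 = 1.25.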
Also, stages that are linear pose no problem, since a power
function with an exponent of 1.0 is really a linear function. So
a linear transfer function is just a special case of a power
function, with a gamma of 1.0.
Thus, as long as our imaging system contains only stages with
linear and power-law transfer functions, we can meaningfully talk
about the gamma of the entire system. This is indeed the case
with most real imaging systems.
What should overall gamma be?
If the overall gamma of an imaging system is 1.0, its output is
linearly proportional to its input. This means that the ratio
between the intensities of any two areas in the reproduced image
will be the same as it was in the original scene. It might seem
that this should always be the goal of an imaging system: to
accurately reproduce the tones of the original scene. Alas, that
is not the case.
When the reproduced image is to be viewed in "bright surround"
conditions, where other white objects nearby in the room have
about the same brightness as white in the image, then an overall
gamma of 1.0 does indeed give real-looking reproduction of a
natural scene. Photographic prints viewed under room light and
computer displays in bright room light are typical "bright
surround" viewing conditions.
However, sometimes images are intended to be viewed in "dark
surround" conditions, where the room is substantially black except
for the image. This is typical of the way movies and slides
(transparencies) are viewed by projection. Under these
circumstances, an accurate reproduction of the original scene
results in an image that human viewers judge as "flat" and lacking
in contrast. It turns out that the projected image needs to have
a gamma of about 1.5 relative to the original scene for viewers to
judge it "natural". Thus, slide film is designed to have a gamma
of about 1.5, not 1.0.
There is also an intermediate condition called "dim surround",
where the rest of the room is still visible to the viewer, but is
noticeably darker than the reproduced image itself. This is
typical of television viewing, at least in the evening, as well as
subdued-light computer work areas. In dim surround conditions,
the reproduced image needs to have a gamma of about 1.25 relative
to the original scene in order to look natural.
The requirement for boosted contrast (gamma) in dark surround
conditions is due to the way the human visual system works, and
applies equally well to computer monitors. Thus, a PNG viewer
trying to achieve the maximum realism for the images it displays
really needs to know what the room lighting conditions are, and
adjust the gamma of the displayed image accordingly.
If asking the user about room lighting conditions is inappropriate
or too difficult, just assume that the overall gamma
(viewing_gamma as defined below) should be 1.0 or 1.25. That's
all that most systems that implement gamma correction do.
What is a CRT's gamma?
All CRT displays have a power-law transfer characteristic with a
gamma of about 2.5. This is due to the physical processes
involved in controlling the electron beam in the electron gun, and
has nothing to do with the phosphor.
An exception to this rule is fancy "calibrated" CRTs that have
internal electronics to alter their transfer function. If you
have one of these, you probably should believe what the
manufacturer tells you its gamma is. But in all other cases,
assuming 2.5 is likely to be pretty accurate.
There are various images around that purport to measure gamma,
usually by comparing the intensity of an area containing
alternating white and black with a series of areas of continuous
gray of different intensity. These are usually not reliable.
Test images that use a "checkerboard" pattern of black and white
are the worst, because a single white pixel will be reproduced
considerably darker than a large area of white. An image that
uses alternating black and white horizontal lines (such as the
"gamma.png" test image at
ftp://ftp.uu.net/graphics/png/images/suite/gamma.png) is much
better, but even it may be inaccurate at high "picture" settings
on some CRTs.
If you have a good photometer, you can measure the actual light
output of a CRT as a function of input voltage and fit a power
function to the measurements. However, note that this procedure
is very sensitive to the CRT's black level adjustment, somewhat
sensitive to its picture adjustment, and also affected by ambient
light. Furthermore, CRTs spread some light from bright areas of
an image into nearby darker areas; a single bright spot against a
black background may be seen to have a "halo". Your measuring
technique will need to minimize the effects of this.
Because of the difficulty of measuring gamma, using either test
images or measuring equipment, you're usually better off just
assuming gamma is 2.5 rather than trying to measure it.
What is gamma correction?
A CRT has a gamma of 2.5, and we can't change that. To get an
overall gamma of 1.0 (or somewhere near that) for an imaging
system, we need to have at least one other component of the "image
pipeline" that is nonlinear. If, in fact, there is only one
nonlinear stage in addition to the CRT, then it's traditional to
say that the CRT has a certain gamma, and that the other nonlinear
stage provides "gamma correction" to compensate for the CRT.
However, exactly where the "correction" is done depends on
circumstance.
In all broadcast video systems, gamma correction is done in the
camera. This choice was made in the days when television
electronics were all analog, and a good gamma-correction circuit
was expensive to build. The original NTSC video standard required
cameras to have a transfer function with a gamma of 1/2.2, or
about 0.45. Recently, a more complex two-part transfer function
has been adopted [SMPTE-170M], but its behavior can be well
approximated by a power function with a gamma of 0.5. When the
resulting image is displayed on a CRT with a gamma of 2.5, the
image on screen ends up with a gamma of about 1.25 relative to the
original scene, which is appropriate for "dim surround" viewing.
These days, video signals are often digitized and stored in
computer frame buffers. This works fine, but remember that gamma
correction is "built into" the video signal, and so the digitized
video has a gamma of about 0.5 relative to the original scene.
Computer rendering programs often produce linear samples. To
display these correctly, intensity on the CRT must be directly
proportional to the sample values in the frame buffer. This can
be done with a special hardware lookup table between the frame
buffer and the CRT hardware. The lookup table (often called LUT)
is loaded with a mapping that implements a power function with a
gamma of 0.4, thus providing "gamma correction" for the CRT gamma.
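As an illustrative sketch (assuming an 8-bit LUT; actual frame
buffer interfaces vary, and the function name is ours), such a
table could be computed as:

   #include <math.h>

   /* Sketch: fill a 256-entry LUT with a power function of gamma
      0.4, the reciprocal of the CRT's gamma of 2.5. */
   void load_gamma_lut(unsigned char lut[256])
   {
      int i;
      for (i = 0; i < 256; i++)
         lut[i] = (unsigned char)
            (pow(i / 255.0, 0.4) * 255.0 + 0.5);  /* round */
   }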
Thus, gamma correction sometimes happens before the frame buffer,
sometimes after. As long as images created in a particular
environment are always displayed in that environment, everything
is fine. But when people try to exchange images, differences in
gamma correction conventions often result in images that seem far
too bright and washed out, or far too dark and contrasty.
Gamma-encoded samples are good
So, is it better to do gamma correction before or after the frame
buffer?
In an ideal world, sample values would be stored in floating
point, there would be lots of precision, and it wouldn't really
matter much. But in reality, we're always trying to store images
in as few bits as we can.
If we decide to use samples that are linearly proportional to
intensity, and do the gamma correction in the frame buffer LUT, it
turns out that we need to use at least 12 bits for each of red,
green, and blue to have enough precision in intensity. With any
less than that, we will sometimes see "contour bands" or "Mach
bands" in the darker areas of the image, where two adjacent sample
values are still far enough apart in intensity for the difference
to be visible.
However, through an interesting coincidence, the human eye's
subjective perception of brightness is related to the physical
stimulation of light intensity in a manner that is very much like
the power function used for gamma correction. If we apply gamma
correction to measured (or calculated) light intensity before
quantizing to an integer for storage in a frame buffer, we can get
away with using many fewer bits to store the image. In fact, 8
bits per color is almost always sufficient to avoid contouring
artifacts. This is because, since gamma correction is so closely
related to human perception, we are assigning our 256 available
sample codes to intensity values in a manner that approximates how
visible those intensity changes are to the eye. Compared to a
linear-sample image, we allocate fewer sample values to brighter
parts of the tonal range and more sample values to the darker
portions of the tonal range.
Thus, for the same apparent image quality, images using
gamma-encoded sample values need only about two-thirds as many
bits of storage as images using linear samples.
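An illustrative sketch of the encode-then-quantize step described
above (the function name and the example exponent are ours):

   #include <math.h>

   /* Sketch: gamma-encode a linear intensity in 0.0..1.0 into an
      8-bit sample before storage; an encoding exponent of 0.4
      would compensate a CRT gamma of 2.5. */
   unsigned char gamma_encode(double linear, double encoding_gamma)
   {
      return (unsigned char)
         (pow(linear, encoding_gamma) * 255.0 + 0.5);  /* round */
   }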
General gamma handling
When more than two nonlinear transfer functions are involved in
the image pipeline, the term "gamma correction" becomes too vague.
If we consider a pipeline that involves capturing (or calculating)
an image, storing it in an image file, reading the file, and
displaying the image on some sort of display screen, there are at
least 5 places in the pipeline that could have nonlinear transfer
functions. Let's give each a specific name for their
characteristic gamma:
camera_gamma
the characteristic of the image sensor
encoding_gamma
the gamma of any transformation performed by the software
writing the image file
decoding_gamma
the gamma of any transformation performed by the software
reading the image file
LUT_gamma
the gamma of the frame buffer LUT, if present
CRT_gamma
the gamma of the CRT, generally 2.5
In addition, let's add a few other names:
file_gamma
the gamma of the image in the file, relative to the original
scene. This is
file_gamma = camera_gamma * encoding_gamma
display_gamma
the gamma of the "display system" downstream of the frame
buffer. This is
display_gamma = LUT_gamma * CRT_gamma
viewing_gamma
the overall gamma that we want to obtain to produce pleasing
images --- generally 1.0 to 1.5.
The file_gamma value, as defined above, is what goes in the gAMA
chunk in a PNG file. If file_gamma is not 1.0, we know that gamma
correction has been done on the sample values in the file, and we
could call them "gamma corrected" samples. However, since there
can be so many different values of gamma in the image display
chain, and some of them are not known at the time the image is
written, the samples are not really being "corrected" for a
specific display condition. We are really using a power function
in the process of encoding an intensity range into a small integer
field, and so it is more correct to say "gamma encoded" samples
instead of "gamma corrected" samples.
When displaying an image file, the image decoding program is
responsible for making the overall gamma of the system equal to
the desired viewing_gamma, by selecting the decoding_gamma
appropriately. When displaying a PNG file, the gAMA chunk
provides the file_gamma value. The display_gamma may be known for
this machine, or it might be obtained from the system software, or
the user might have to be asked what it is. The correct
viewing_gamma depends on lighting conditions, and that will
generally have to come from the user.
Ultimately, you should have
file_gamma * decoding_gamma * display_gamma = viewing_gamma
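An illustrative sketch of the decoder's bookkeeping (the names
follow the definitions above and are not from any actual library):

   /* Sketch: pick the decoding exponent that makes
      file_gamma * decoding_gamma * display_gamma = viewing_gamma. */
   double choose_decoding_gamma(double file_gamma,
                                double display_gamma,
                                double viewing_gamma)
   {
      return viewing_gamma / (file_gamma * display_gamma);
   }

For example, with file_gamma 0.5 (video-originated data),
display_gamma 2.5 (no LUT), and a desired viewing_gamma of 1.0,
the decoder applies a power function with exponent
1.0 / (0.5 * 2.5) = 0.8.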
Some specific examples
In digital video systems, camera_gamma is about 0.5 by declaration
of the various video standards documents. CRT_gamma is 2.5 as
usual, while encoding_gamma, decoding_gamma, and LUT_gamma are all
1.0. As a result, viewing_gamma ends up being about 1.25.
On frame buffers that have hardware gamma correction tables, and
that are calibrated to display linear samples correctly,
display_gamma is 1.0.
Many workstations and X terminals and PC displays lack gamma
correction lookup tables. Here, LUT_gamma is always 1.0, so
display_gamma is 2.5.
On the Macintosh, there is a LUT. By default, it is loaded with a
table whose gamma is about 0.72, giving a display_gamma (LUT and
CRT combined) of about 1.8. Some Macs have a "Gamma" control
panel that allows gamma to be changed to 1.0, 1.2, 1.4, 1.8, or
2.2. These settings load alternate LUTs that are designed to give
a display_gamma that is equal to the label on the selected button.
Thus, the "Gamma" control panel setting can be used directly as
display_gamma in decoder calculations.
On recent SGI systems, there is a hardware gamma-correction table
whose contents are controlled by the (privileged) "gamma" program.
The gamma of the table is actually the reciprocal of the number
that "gamma" prints, and it does not include the CRT gamma. To
obtain the display_gamma, you need to find the SGI system gamma
(either by looking in a file, or asking the user) and then
calculating
display_gamma = 2.5 / SGI_system_gamma
You will find SGI systems with the system gamma set to 1.0 and 2.2
(or higher), but the default when machines are shipped is 1.7.
A note about video gamma
The original NTSC video standards specified a simple power-law
camera transfer function with a gamma of 1/2.2 or 0.45. This is
not possible to implement exactly in analog hardware because the
function has infinite slope at x=0, so all cameras deviated to
some degree from this ideal. More recently, a new camera transfer
function that is physically realizable has been accepted as a
standard [SMPTE-170M]. It is
   Vout = 4.5 * Vin                     if Vin <  0.018
   Vout = 1.099 * (Vin^0.45) - 0.099    if Vin >= 0.018
where Vin and Vout are measured on a scale of 0 to 1. Although
the exponent remains 0.45, the multiplication and subtraction
change the shape of the transfer function, so it is no longer a
pure power function. If you want to perform extremely precise
calculations on video signals, you should use the expression above
(or its inverse, as required).
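A direct C transcription of the two-part function above, as a
sketch (the function name is ours):

   #include <math.h>

   /* Sketch: SMPTE-170M camera transfer function, with Vin and
      Vout both on a 0..1 scale. */
   double smpte_170m(double Vin)
   {
      if (Vin < 0.018)
         return 4.5 * Vin;
      return 1.099 * pow(Vin, 0.45) - 0.099;
   }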
However, PNG does not provide a way to specify that an image uses
this exact transfer function; the gAMA chunk always assumes a pure
power-law function. If we plot the two-part transfer function
above along with the family of pure power functions, we find that
a power function with a gamma of about 0.5 to 0.52 (not 0.45) most
closely approximates the transfer function. Thus, when writing a
PNG file with data obtained from digitizing the output of a modern
video camera, the gAMA chunk should contain 0.5 or 0.52, not 0.45.
The remaining difference between the true transfer function and
the power function is insignificant for almost all purposes. (In
fact, the alignment errors in most cameras are likely to be larger
than the difference between these functions.) The designers of
PNG deemed the simplicity and flexibility of a power-law
definition of gAMA to be more important than being able to
describe the SMPTE-170M transfer curve exactly.
The PAL and SECAM video standards specify a power-law camera
transfer function with a gamma of 1/2.8 or 0.36 --- not the 1/2.2
of NTSC. However, this is too low in practice, so real cameras
are likely to have their gamma set close to NTSC practice. Just
guessing 0.45 or 0.5 is likely to give you viewable results, but
if you want precise values you'll probably have to measure the
particular camera.
Further reading
If you have access to the World Wide Web, read Charles Poynton's
excellent "Gamma FAQ" [GAMMA-FAQ] for more information about
gamma.
14. Appendix: Color Tutorial
(This appendix is not part of the formal PNG specification.)
About chromaticity
The cHRM chunk is used, together with the gAMA chunk, to convey
precise color information so that a PNG image can be displayed or
printed with better color fidelity than is possible without this
information. The preceding chapters state how this information is
encoded in a PNG image. This tutorial briefly outlines the
underlying color theory for those who might not be familiar with
it.
Note that displaying an image with incorrect gamma will produce
much larger color errors than failing to use the chromaticity
data. First be sure the monitor set-up and gamma correction are
right, then worry about chromaticity.
The problem
The color of an object depends not only on the precise spectrum of
light emitted or reflected from it, but also on the observer ---
their species, what else they can see at the same time, even what
they have recently looked at! Furthermore, two very different
spectra can produce exactly the same color sensation. Color is
not an objective property of real-world objects; it is a
subjective, biological sensation. However, by making some
simplifying assumptions (such as: we are talking about human
vision) it is possible to produce a mathematical model of color
and thereby obtain good color accuracy.
Device-dependent color
Display the same RGB data on three different monitors, side by
side, and you will get a noticeably different color balance on
each display. This is because each monitor emits a slightly
different shade and intensity of red, green, and blue light. RGB
is an example of a device-dependent color model --- the color you
get depends on the device. This also means that a particular
color --- represented as say RGB 87, 146, 116 on one monitor ---
might have to be specified as RGB 98, 123, 104 on another to
produce the same color.
Device-independent color
A full physical description of a color would require specifying
the exact spectral power distribution of the light source.
Fortunately, the human eye and brain are not so sensitive as to
require exact reproduction of a spectrum. Mathematical,
device-independent color models exist that describe fairly well how a
particular color will be seen by humans. The most important
device-independent color model, to which all others can be
related, was developed by the International Lighting Committee
(CIE, in French) and is called XYZ.
In XYZ, X is the sum of a weighted power distribution over the
whole visible spectrum. So are Y and Z, each with different
weights. Thus any arbitrary spectral power distribution is
condensed down to just three floating point numbers. The weights
were derived from color matching experiments done on human
subjects in the 1920s. CIE XYZ has been an International Standard
since 1931, and it has a number of useful properties:
* two colors with the same XYZ values will look the same to
humans
* two colors with different XYZ values will not look the same
* the Y value represents all the brightness information
(luminance)
* the XYZ color of any object can be objectively measured
Color models based on XYZ have been used for many years by people
who need accurate control of color --- lighting engineers for film
and TV, paint and dyestuffs manufacturers, and so on. They are
thus proven in industrial use. Accurate, device-independent color
started to spread from high-end, specialized areas into the
mainstream during the late 1980s and early 1990s, and PNG takes
notice of that trend.
Calibrated, device-dependent color
Traditionally, image file formats have used uncalibrated,
device-dependent color. If the precise details of the original
display device are known, it becomes possible to convert the
device-dependent colors of a particular image to device-independent
ones.
Making simplifying assumptions, such as working with CRTs (which
are much easier than printers), all we need to know are the XYZ
values of each primary color and the CRT_gamma.
So why does PNG not store images in XYZ instead of RGB? Well, two
reasons. First, storing images in XYZ would require more bits of
precision, which would make the files bigger. Second, all
programs would have to convert the image data before viewing it.
Whether calibrated or not, all variants of RGB are close enough
that undemanding viewers can get by with simply displaying the
data without color correction. By storing calibrated RGB, PNG
retains compatibility with existing programs that expect RGB data,
yet provides enough information for conversion to XYZ in
applications that need precise colors. Thus, we get the best of
both worlds.
What are chromaticity and luminance?
Chromaticity is an objective measurement of the color of an
object, leaving aside the brightness information. Chromaticity
uses two parameters x and y, which are readily calculated from
XYZ:
x = X / (X + Y + Z)
y = Y / (X + Y + Z)
XYZ colors having the same chromaticity values will appear to have
the same hue but can vary in absolute brightness. Notice that x,y
are dimensionless ratios, so they have the same values no matter
what units we've used for X,Y,Z.
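A one-line transcription into C, as a sketch (the function name is
ours; black, where X = Y = Z = 0, must be special-cased to avoid
division by zero):

   /* Sketch: chromaticity coordinates from XYZ. */
   void xyz_to_xy(double X, double Y, double Z, double *x, double *y)
   {
      double sum = X + Y + Z;   /* assumed nonzero */
      *x = X / sum;
      *y = Y / sum;
   }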
The Y value of an XYZ color is directly proportional to its
absolute brightness and is called the luminance of the color. We
can describe a color either by XYZ coordinates or by chromaticity
x,y plus luminance Y. The XYZ form has the advantage that it is
linearly related to (linear, gamma=1.0) RGB color spaces.
How are computer monitor colors described?
The "white point" of a monitor is the chromaticity x,y of the
monitor's nominal white, that is, the color produced when
R=G=B=maximum.
It's customary to specify monitor colors by giving the
chromaticities of the individual phosphors R, G, and B, plus the
white point. The white point allows one to infer the relative
brightnesses of the three phosphors, which isn't determined by
their chromaticities alone.
Note that the absolute brightness of the monitor is not specified.
For computer graphics work, we generally don't care very much
about absolute brightness levels. Instead of dealing with
absolute XYZ values (in which X,Y,Z are expressed in physical
units of radiated power, such as candelas per square meter), it is
convenient to work in "relative XYZ" units, where the monitor's
nominal white is taken to have a luminance (Y) of 1.0. Given this
assumption, it's simple to compute XYZ coordinates for the
monitor's white, red, green, and blue from their chromaticity
values.
Why does cHRM use x,y rather than XYZ? Simply because that is how
manufacturers print the information in their spec sheets!
Usually, the first thing a program will do is convert the cHRM
chromaticities into relative XYZ space.
What can I do with it?
If a PNG file has the gAMA and cHRM chunks, the source_RGB values
can be converted to XYZ. This lets you:
* do accurate grayscale conversion (just use the Y component)
* convert to RGB for your own monitor (to see the original
colors)
* print the image in Level 2 PostScript with better color
fidelity than a simple RGB to CMYK conversion could provide
* calculate an optimal color palette
* pass the image data to a color management system
* etc.
How do I convert from source_RGB to XYZ?
Make a few simplifying assumptions first, like the monitor really
is jet black with no input and the guns don't interfere with one
another. Then, given that you know the CIE XYZ values for each of
red, green, and blue for a particular monitor, you put them into a
matrix m:
   m =   Xr Xg Xb
         Yr Yg Yb
         Zr Zg Zb
Here we assume we are working with linear RGB floating point data
in the range 0..1. If the gamma is not 1.0, make it so on the
floating point data. Then convert source_RGB to XYZ by matrix
multiplication:
   X       R
   Y = m   G
   Z       B
In other words, X = Xr*R + Xg*G + Xb*B, and similarly for Y and Z.
You can go the other way too:
   R        X
   G = im   Y
   B        Z
where im is the inverse of the matrix m.
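As a sketch, the forward conversion is three dot products (the
matrix values come from the monitor's primaries as described
above; the function name is ours):

   /* Sketch: linear RGB (0..1) to XYZ.  m[0] holds Xr Xg Xb,
      m[1] holds Yr Yg Yb, m[2] holds Zr Zg Zb. */
   void rgb_to_xyz(const double m[3][3], double R, double G,
                   double B, double *X, double *Y, double *Z)
   {
      *X = m[0][0]*R + m[0][1]*G + m[0][2]*B;
      *Y = m[1][0]*R + m[1][1]*G + m[1][2]*B;
      *Z = m[2][0]*R + m[2][1]*G + m[2][2]*B;
   }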
What is a gamut?
The gamut of a device is the subset of visible colors which that
device can display. (It has nothing to do with gamma.) The gamut
of an RGB device can be visualized as a polyhedron in XYZ space;
the vertices correspond to the device's black, blue, red, green,
magenta, cyan, yellow and white.
Different devices have different gamuts, in other words one device
will be able to display certain colors (usually highly saturated
ones) that another device cannot. The gamut of a particular RGB
device can be determined from its R, G, and B chromaticities and
white point (the same values given in the cHRM chunk). The gamut
of a color printer is more complex and can only be determined by
measurement. However, printer gamuts are typically smaller than
monitor gamuts, meaning that there can be many colors in a
displayable image that cannot physically be printed.
Converting image data from one device to another generally results
in gamut mismatches --- colors that cannot be represented exactly
on the destination device. The process of making the colors fit,
which can range from a simple clip to elaborate nonlinear scaling
transformations, is termed gamut mapping. The aim is to produce a
reasonable visual representation of the original image.
Further reading
References [COLOR-1] through [COLOR-5] provide more detail about
color theory.
15. Appendix: Sample CRC Code
The following sample code represents a practical implementation of
the CRC (Cyclic Redundancy Check) employed in PNG chunks. (See also
ISO 3309 [ISO-3309] or ITU-T V.42 [ITU-V42] for a formal
specification.)
The sample code is in the ANSI C programming language. Non C users
may find it easier to read with these hints:
   &
      Bitwise AND operator.

   ^
      Bitwise exclusive-OR operator. (Caution: elsewhere in this
      document, ^ represents exponentiation.)

   >>
      Bitwise right shift operator. When applied to an unsigned
      quantity, as here, right shift inserts zeroes at the left.

   !
      Logical NOT operator.

   ++
      "n++" increments the variable n.
   0xNNN
      0x introduces a hexadecimal (base 16) constant.  Suffix L
      indicates a long value (at least 32 bits).
   /* Table of CRCs of all 8-bit messages. */
   unsigned long crc_table[256];

   /* Flag: has the table been computed? Initially false. */
   int crc_table_computed = 0;

   /* Make the table for a fast CRC. */
   void make_crc_table(void)
   {
     unsigned long c;
     int n, k;

     for (n = 0; n < 256; n++) {
       c = (unsigned long) n;
       for (k = 0; k < 8; k++) {
         if (c & 1)
           c = 0xedb88320L ^ (c >> 1);
         else
           c = c >> 1;
       }
       crc_table[n] = c;
     }
     crc_table_computed = 1;
   }

   /* Update a running CRC with the bytes buf[0..len-1]--the CRC
      should be initialized to all 1's, and the transmitted value
      is the 1's complement of the final running CRC (see the
      crc() routine below). */
   unsigned long update_crc(unsigned long crc, unsigned char *buf,
                            int len)
   {
     unsigned long c = crc;
     int n;

     if (!crc_table_computed)
       make_crc_table();
     for (n = 0; n < len; n++) {
       c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
     }
     return c;
   }
   /* Return the CRC of the bytes buf[0..len-1]. */
   unsigned long crc(unsigned char *buf, int len)
   {
     return update_crc(0xffffffffL, buf, len) ^ 0xffffffffL;
   }
16. Appendix: Online Resources
(This appendix is not part of the formal PNG specification.)
This appendix gives the locations of some Internet resources for PNG
software developers. By the nature of the Internet, the list is
incomplete and subject to change.
Archive sites
The latest released versions of this document and related
information can always be found at the PNG FTP archive site,
ftp://ftp.uu.net/graphics/png/. The PNG specification is
available in several formats, including HTML, plain text, and
PostScript.
Reference implementation and test images
A reference implementation in portable C is available from the PNG
FTP archive site, ftp://ftp.uu.net/graphics/png/src/. The
reference implementation is freely usable in all applications,
including commercial applications.
Test images are available from
ftp://ftp.uu.net/graphics/png/images/.
Electronic mail
The maintainers of the PNG specification can be contacted by email at [email protected]
PNG home page
There is a World Wide Web home page for PNG at
http://quest.jpl.nasa.gov/PNG/. This page is a central location
for current information about PNG and PNG-related tools.
17. Appendix: Revision History
(This appendix is not part of the formal PNG specification.)
The PNG format has been frozen since the Ninth Draft of March 7,
1995, and all future changes are intended to be backwards compatible.
The revisions since the Ninth Draft are simply clarifications,
improvements in presentation, and additions of supporting material.
Changes since the Tenth Draft of 5 May, 1995
* Clarified meaning of a suggested-palette PLTE chunk in a
truecolor image that uses transparency
* Clarified exact semantics of sBIT and allowed bit depth
scaling procedures
* Clarified status of spaces in tEXt chunk keywords
* Distinguished private and public extension values in type
and method fields
* Added a "Creation Time" tEXt keyword
* Macintosh representation of PNG specified
* Added discussion of security issues
* Added more extensive discussion of gamma and chromaticity
handling, including tutorial appendixes
* Added a glossary
* Editing and reformatting
18. References
[COLOR-1]
Hall, Roy, Illumination and Color in Computer Generated Imagery.
Springer-Verlag, New York, 1989. ISBN 0-387-96774-5.
[COLOR-2]
Kasson, J., and W. Plouffe, "An Analysis of Selected Computer
Interchange Color Spaces", ACM Transactions on Graphics, vol 11 no
4 (1992), pp 373-405.
[COLOR-3]
Lilley, C., F. Lin, W.T. Hewitt, and T.L.J. Howard, Colour in
Computer Graphics. CVCP, Sheffield, 1993. ISBN 1-85889-022-5.
Also available from
<URL:http://info.mcc.ac.uk/CGU/ITTI/Col/colour_announce.html>
[COLOR-4]
Stone, M.C., W.B. Cowan, and J.C. Beatty, "Color gamut mapping and
the printing of digital images", ACM Transactions on Graphics, vol
7 no 3 (1988), pp 249-292.
[COLOR-5]
Travis, David, Effective Color Displays --- Theory and Practice.
Academic Press, London, 1991. ISBN 0-12-697690-2.
[GAMMA-FAQ]
Poynton, C., "Gamma FAQ".
<URL:http://www.inforamp.net/~poynton/Poynton-colour.html>
[ISO-3309]
International Organization for Standardization, "Information
Processing Systems --- Data Communication High-Level Data Link
Control Procedure --- Frame Structure", IS 3309, October 1984, 3rd
Edition.
[ISO-8859]
International Organization for Standardization, "Information
Processing --- 8-bit Single-Byte Coded Graphic Character Sets ---
Part 1: Latin Alphabet No. 1", IS 8859-1, 1987.
Also see sample files at
ftp://ftp.uu.net/graphics/png/documents/iso_8859-1.*
[ITU-BT709]
International Telecommunications Union, "Basic Parameter Values
for the HDTV Standard for the Studio and for International
Programme Exchange", ITU-R Recommendation BT.709 (formerly CCIR
Rec. 709), 1990.
[ITU-V42]
International Telecommunications Union, "Error-correcting
Procedures for DCEs Using Asynchronous-to-Synchronous Conversion",
ITU-T Recommendation V.42, 1994, Rev. 1.
[PAETH]
Paeth, A.W., "Image File Compression Made Easy", in Graphics Gems
II, James Arvo, editor. Academic Press, San Diego, 1991. ISBN
0-12-064480-0.
[POSTSCRIPT]
Adobe Systems Incorporated, PostScript Language Reference Manual,
2nd edition. Addison-Wesley, Reading, 1990. ISBN 0-201-18127-4.
[PNG-EXTENSIONS]
PNG Group, "PNG Special-Purpose Public Chunks". Available in
several formats from
ftp://ftp.uu.net/graphics/png/documents/pngextensions.*
[RFC-1123]
Braden, R., Editor, "Requirements for Internet Hosts ---
Application and Support", STD 3, RFC 1123, USC/Information
Sciences Institute, October 1989.
<URL:ftp://ds.internic.net/rfc/rfc1123.txt>
[RFC-1521]
Borenstein, N., and N. Freed, "MIME (Multipurpose Internet Mail
Extensions) Part One: Mechanisms for Specifying and Describing the
Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
September 1993.
<URL:ftp://ds.internic.net/rfc/rfc1521.txt>
[RFC-1590]
Postel, J., "Media Type Registration Procedure", RFC 1590,
USC/Information Sciences Institute, March 1994.
<URL:ftp://ds.internic.net/rfc/rfc1590.txt>
[RFC-1950]
Deutsch, P. and J-L. Gailly, "ZLIB Compressed Data Format
Specification version 3.3", RFC 1950, Aladdin Enterprises, May
1996.
<URL:ftp://ds.internic.net/rfc/rfc1950.txt>
[RFC-1951]
Deutsch, P., "DEFLATE Compressed Data Format Specification version
1.3", RFC 1951, Aladdin Enterprises, May 1996.
<URL:ftp://ds.internic.net/rfc/rfc1951.txt>
[SMPTE-170M]
Society of Motion Picture and Television Engineers, "Television
--- Composite Analog Video Signal --- NTSC for Studio
Applications", SMPTE-170M, 1994.
19. Credits
Editor
Thomas Boutell, [email protected]
Contributing Editor
Tom Lane, [email protected]
Authors
Authors' names are presented in alphabetical order.
   * Mark Adler, [email protected]
   * Thomas Boutell, [email protected]
   * Christian Brunschen, [email protected]
   * Adam M. Costello, [email protected]
   * Lee Daniel Crocker, [email protected]
   * Andreas Dilger, [email protected]
   * Oliver Fromme, [email protected]
   * Jean-loup Gailly, [email protected]
   * Chris Herborth, [email protected]
   * Alex Jakulin, [email protected]
   * Neal Kettler, [email protected]
   * Tom Lane, [email protected]
   * Alexander Lehmann, [email protected]
   * Chris Lilley, [email protected]
   * Dave Martindale, [email protected]
   * Owen Mortensen, [email protected]
   * Robert P. Poole, [email protected]
   * Glenn Randers-Pehrson, [email protected] or
     [email protected]
   * Greg Roelofs, [email protected]
   * Willem van Schaik, [email protected]
   * Guy Schalnat, [email protected]
   * Paul Schmidt, [email protected]
   * Tim Wegner, [email protected]
   * Jeremy Wohl, [email protected]
The authors wish to acknowledge the contributions of the Portable
Network Graphics mailing list and the readers of comp.graphics.
Trademarks
GIF is a service mark of CompuServe Incorporated. IBM PC is a
trademark of International Business Machines Corporation.
Macintosh is a trademark of Apple Computer, Inc. Microsoft and
MS-DOS are trademarks of Microsoft Corporation. PhotoCD is a
trademark of Eastman Kodak Company. PostScript and TIFF are
trademarks of Adobe Systems Incorporated. SGI is a trademark of
Silicon Graphics, Inc. X Window System is a trademark of the
Massachusetts Institute of Technology.
IESG Note
A disclaimer by the Internet Engineering Steering Group regarding
intellectual property claims will be inserted here.
COPYRIGHT NOTICE
Copyright (c) 1996 by: Massachusetts Institute of Technology (MIT)
This W3C specification is being provided by the copyright holders
under the following license. By obtaining, using and/or copying
this specification, you agree that you have read, understood, and
will comply with the following terms and conditions:
Permission to use, copy, and distribute this specification for any
purpose and without fee or royalty is hereby granted, provided
that the full text of this NOTICE appears on ALL copies of the
specification or portions thereof, including modifications, that
you make.
THIS SPECIFICATION IS PROVIDED "AS IS," AND COPYRIGHT HOLDERS MAKE
NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF
EXAMPLE, BUT NOT LIMITATION, COPYRIGHT HOLDERS MAKE NO
REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR
ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SPECIFICATION WILL
NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR
OTHER RIGHTS. COPYRIGHT HOLDERS WILL BEAR NO LIABILITY FOR ANY
USE OF THIS SPECIFICATION.
The name and trademarks of copyright holders may NOT be used in
advertising or publicity pertaining to the specification without
specific, written prior permission. Title to copyright in this
specification and any associated documentation will at all times
remain with copyright holders.
Security Considerations
Security issues are discussed in Security considerations (Section
8.5).
Author's Address
Thomas Boutell
PO Box 20837
Seattle, WA 98102
Phone: (206) 329-4969
EMail: [email protected]
End of PNG Specification
Adobe Photoshop 3.0 Plug-in Toolkit
Copyright © 1991-95 Adobe Systems Incorporated All rights reserved.
Portions Copyright © 1990-91 Thomas Knoll
The information in this document is furnished for informational use only, is subject to change without
notice, and should not be construed as a commitment by Adobe Systems Incorporated. Adobe Systems
Incorporated assumes no responsibility or liability for any errors or inaccuracies that may appear in this
document. The software described in this document is furnished under license and may only be used or
copied in accordance with the terms of such license.
Adobe, Adobe Premiere, Adobe Photoshop, Adobe Illustrator, Adobe Type Manager, ATM and PostScript
are trademarks of Adobe Systems Incorporated that may be registered in certain jurisdictions. Macintosh and
Apple are registered trademarks of Apple Computer, Inc. Microsoft, MS, and MS-DOS are registered
trademarks, and Windows is a trademark of Microsoft Corporation. All other products or name brands are
trademarks of their respective holders.
Most of the material for this document was derived from earlier works by Thomas Knoll, Mark Hamburg
and Zalman Stern. Additional contributions came from David Corboy, Kevin Johnston, Sean Parent and
Seetha Narayanan. It was then compiled and edited by Dave Wise.
Version History:
   11/07/94   David J. Wise           First Draft
   1/15/95    David J. Wise           First Release
   2/08/95    Seetharaman Narayanan   MS-Windows Mods.
Contents

Introduction
   How to Use This Toolkit
   Plug-In Overview
   Historical Note
   Installation
      Macintosh
      Windows
   Resources
      'PiPL's
         Definition
         Structure
         Notes
         Types
         General properties
         Code Descriptor Properties
         Filter specific properties
         Format specific properties
         Parser specific properties
      'PiMI's
   Execution
      Macintosh
      Windows
   Callback Routines
      TestAbort()
      UpdateProgress()
      ProcessEvent()
      DisplayPixels()
      GetPropertyProc()
         Property keys
      AdvanceStateProc()
      ColorServicesProc()
      Monitor Descriptions
   Callback Suites
      Buffer Suite
         AllocateBuffer()
         LockBuffer()
         UnlockBuffer()
         FreeBuffer()
         BufferSpace()
      Pseudo-Resource Suite
         CountPIResources()
         GetPIResource()
         DeletePIResource()
         AddPIResource()
      Handle Suite
         NewPIHandle()
         DisposePIHandle()
         GetPIHandleSize()
         SetPIHandleSize()
         LockPIHandle()
         UnlockPIHandle()
         RecoverSpaceProc()
   General Notes
      Macintosh
         Global Variables
         Segmentation
         About Boxes
         Configuration
      Windows
         Configuration
   About the Sample Plug-ins
      Macintosh Version
      Windows Version

Acquisition Modules
   Basics
   The AcquireRecord Structure
   Record Fields
   Calling Order
      (1) acquireSelectorPrepare
      (2) acquireSelectorStart
      (3) acquireSelectorContinue
      (4) acquireSelectorFinish
      (5) acquireSelectorFinalize
   State Machine
   Error return values
   Sample Plug-in
      DummyScan

Export Modules
   Basics
   The ExportRecord Structure
   Record Fields
   Calling Order
      (1) exportSelectorPrepare
      (2) exportSelectorStart
      (3) exportSelectorContinue
      (4) exportSelectorFinish
   State Machine
   Error return values
   Sample Plug-ins
      DummyExport
      HistoryExport
      Paths to Illustrator

Filter Modules
   Basics
   The FilterRecord Structure
   Record Fields
   Calling Order
      (1) filterSelectorParameters
      (2) filterSelectorPrepare
      (3) filterSelectorStart
      (4) filterSelectorContinue
      (5) filterSelectorFinish
   State Machine
   Error return values
   Sample Plug-in
      Dissolve

Image Format Modules
   Basics
   The FormatRecord Structure
   Image Resources
   Record Fields
   Calling Sequences
      (1) prepare
      (2) start
      (3) continue
      (4) finish
   Error return values
   Sample Plug-in
      Sample Format

Document File Formats
   Image Resource Block
   Path Resource Format
   Photoshop 3.0
   EPS
   Filmstrip
   TIFF

Load File Formats
   Introduction
   Arbitrary Map
   Brushes
   Color Table
   Colors
   Command Buttons
   Curves
   Duotone Options
   Halftone Screens
   Hue/Saturation
   Ink Colors Setup
   Custom Kernel
   Levels
   Monitor Setup
   Replace Color/Color Range
   Scratch Area
   Selective Color
   Separation Setup
   Separation Tables
   Transfer Function
Introduction
How to Use This Toolkit
The Adobe Photoshop Plug-In Toolkit is for developers who wish to write their own plug-in modules for
use with Adobe Photoshop. Photoshop plug-ins are called by Photoshop to perform specific functions, such
as acquiring an image or filtering a portion of an image.
This toolkit documentation starts with information that is common to all the plug-in types. The rest of the
document is broken up into chapters specific to each type of plug-in, or to special types of files.
The best way to use this toolkit documentation is to read this Introduction chapter, then read the chapter
specific to the type of plug-in you're writing. You should then study and understand the sample plug-ins of
the type you're writing.
Plug-In Overview
Adobe Photoshop plug-ins are separate files that contain code which allows either Adobe Systems, Inc. or
third-party developers to extend Adobe Photoshop, without actually modifying the base application. Adobe
Photoshop version 3.0 supports five kinds of plug-in modules:
1. Acquisition modules, which open an image in a new window. Acquisition modules can be used to
interface to scanners or frame grabbers, read images in unsupported or compressed file formats, or to
generate synthetic images. These modules are accessed through the Acquire sub-menu.
2. Export modules, which output an existing image. Export modules can be used to print to printers
that do not have chooser-level driver support, or to save images in unsupported or compressed file
formats. These modules are accessed through the Export sub-menu.
3. Filter modules, which modify a selected area of an existing image. These modules are inserted into
the Filter menu.
4. File format modules, which provide support for additional image formats. These appear in the format
pop-up in the Open..., Save As... and Save a Copy... dialogs.
5. Parser modules...TBD
A quick word about types is in order here. The interface files talk in terms of int32's and int16's rather than
longs and shorts when they specify 32-bit integers. VRect's are like Macintosh Rect's but they have 32-bit
coordinates. All of these types are defined in PITypes.h.
Historical Note
The concept of plug-in modules has become popular among Macintosh application developers. Perhaps the
best known example is Apple's HyperCard, with its support for XCMD's. One of the first companies to
incorporate plug-in modules into their products was Silicon Beach, in its Digital Darkroom and SuperPaint
products.
Silicon Beach's implementation of plug-in modules was well designed. Its good features include allowing
the plug-in modules to reside in individual files (rather than having to be pasted into the application using
ResEdit), allowing the plug-in modules to be placed anywhere (not just in the system folder), and allowing
for future extensions by means of a version number.
Adobe Photoshop's implementation of plug-in modules is similar to that used by Silicon Beach. It uses a
similar calling sequence, and the same version number scheme.
Unfortunately, the detailed interface for Adobe Photoshop's plug-in modules is completely different from
that used by Silicon Beach. The differences were required primarily to support color images and Adobe
Photoshop's virtual memory scheme.
Installation
Macintosh
To install a plug-in module, all the user must do is drag the module's icon to one of the following folders:
the same folder as the application or the plug-ins folder designated in the user's Photoshop preferences file.
Photoshop 3.0 searches for plug-ins in the application folder, and throughout the tree of folders underneath
the designated plug-ins folder. Aliases are followed during the search process. (Folders with names
beginning with "¬" are ignored.)
Windows
To install a plug-in module, the user must copy the plug-in into the directory referred to in the
PHOTOSHO.INI file with the profile string PLUGINDIRECTORY.
When Adobe Photoshop starts executing, it searches the files in the PLUGINDIRECTORY, looking for
plug-in modules. When it finds a plug-in, it checks its version number, and if the version is supported, it
adds the plug-in's name to the appropriate menu or to the list of extensions to be executed.
Each kind of plug-in module has its own 4-byte resource-type. For example, acquisition modules have the
code '8BAM' (Note: the actual resource-type must be specified as _8BAM in your resource files to avoid a
syntax error caused by the first character being a number). Adobe Photoshop searches for acquisition
modules by examining the resources of all files in the PLUGINDIRECTORY that have file extension .8B*,
for resources of type _8BAM. The nameID (the integer value which uniquely identifies the resource) for
each 8BAM in the file must be numbered consecutively starting at 1.
Resources
'PiPL's
Definition
A Plug-In Property List, often called a 'PiPL' (pronounced "pipple") after its resource type code, is a
flexible, extensible mechanism for representing plug-in metadata. This includes all information Photoshop
needs to identify and load the plug-in as well as flags and other static properties that control the operation of
the plug-in.
Plug-in Property Lists replace the Plug-in Module Information structure, often called a 'PiMI' (pronounced
"pimmy") after its resource type code. A 'PiMI' is a fixed format record which originally contained only a
version number. With the evolution of Photoshop's plug-in interface, this record expanded to include other
information. The addition of multiple plug-in types resulted in the PiMI becoming a variant record with
generic data at the beginning and type-specific data at the end. Further plug-in interface evolution required
more complex metadata, such as an array of allowable file types for file format plug-ins.
The combination of variant and variable sized fields in the 'PiMI' made writing resource templates for them
very difficult. Requirements for new plug-in metadata in Photoshop 3.0 introduced further complexities. The
more general and flexible 'PiPL' mechanism was designed to address these issues.
'PiMI' based plug-ins are still fully supported. This is accomplished by converting the 'PiMI' into a 'PiPL'
when the plug-in is first scanned. Since 'PiPL's are cached in Photoshop's preferences file, this conversion
only happens once.
All plug-in file types are searched for 'PiPL' resources. Historically each type of plug-in had its own file
type, as follows:
   Plug-in type                    Macintosh    Windows
                                   file type    extension
   General (any type of plug-in)   8BPI         .8bp
   Acquire modules                 8BAM         .8ba
   Export modules                  8BEM         .8be
   Filter plug-ins                 8BFM         .8bf
   File Format plug-ins            8BIF         .8bi
   Accelerator Extensions          8BXM         .8bx
   Parser plug-ins                 8BYM         .8by
Filenames and extensions are case insensitive.
Only plug-ins of the correct type are searched for 'PiMI' resources of a given type. (This is due to the pairing
up of 'PiMI' resources with an appropriate type of code resource.) File types are only a matter of convention
for 'PiPL' based plug-ins. All the above file types are searched for 'PiPL' resources and for those that are
found, the information contained therein is used to determine the type of plug-in, code location, etc.
If no 'PiPL' resources are found in a plug-in file, the 'PiMI' search algorithm is used as documented in the
following section. This allows one to place both 'PiPL' and 'PiMI' resources in a plug-in. 3.0 or later
compatible hosts will use the 'PiPL' while 2.5.1 compatible hosts will use the 'PiMI'.
Structure
The Plug-in property list has a version number and a count followed by a sequence of arbitrary length byte
containers called properties. A "C" struct definition for the plug-in property list is as follows:
   typedef struct PIPropertyList
   {
      int32 version;
      int32 count;
      PIProperty properties[1];
   } PIPropertyList;
   • version
     This denotes the version of this specification the 'PiPL' is
     formatted to. The current version is 0.

   • count
     This field holds the number of properties contained in the 'PiPL'.
     0 is a valid value denoting a 'PiPL' with no properties.

   • properties
     A variable length array of variable length property data
     structures. Holds the actual contents of the 'PiPL'.
Each property has a vendor code, a key, an ID, a length, and property data of the size indicated by the
length. The "C" struct definition for the plug-in properties is as follows:
   typedef struct PIProperty
   {
      OSType vendorID;
      OSType propertyKey;
      int32 propertyID;
      int32 propertyLength;
      char propertyData[1];  /* Implicitly aligned to multiple of 4 bytes. */
   } PIProperty;
The fields are defined as follows:

   • vendorID
     This field identifies the vendor defining this property type. This
     allows other vendors to define their own properties in a way that
     does not conflict with either Adobe or other vendors. It is
     recommended that a registered application creator code be used for
     the vendorID to ensure uniqueness. All Photoshop properties
     described in this document use the vendorID '8BIM'.

   • propertyKey
     This field specifies the type of this property. Property types used
     by Photoshop are documented below. (Think of a property type as
     similar to a resource type.)

   • propertyID
     In theory this can be used to store more than one property of a
     given type (rather like a resource ID). In practice, this field is
     always zero. It should be thought of as reserved for future use.

   • propertyLength
     This field contains the length of the propertyData field. It does
     not include any padding bytes after propertyData to achieve four
     byte alignment. This field may be zero.

   • propertyData
     A variable length field that contains the bytes which are the
     contents of this property. Any values may be contained.
Padding
Each property must be padded such that the next property begins on a four byte boundary.
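As an illustrative sketch of the arithmetic (ours, not from the
toolkit headers), the total space a property occupies is its 16
bytes of fixed fields plus the data length rounded up to a multiple
of four:

   /* Sketch: bytes from the start of one property to the start of
      the next.  16 = size of the four fixed fields (vendorID,
      propertyKey, propertyID, propertyLength). */
   int32 PropertyTotalSize(int32 propertyLength)
   {
      return 16 + ((propertyLength + 3) & ~3);
   }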
Notes
Specific properties can be extended in an upward compatible fashion by adding extra data at their end. The
length field allows an application to determine how much data is present. Optional properties can be
omitted without concern. (As opposed to a fixed length structure, where omitted fields must be given a
default value.) The 'PiPL' format is fairly portable in that everything is four byte aligned. All OSType and
int32 fields are represented in native byte order for a given platform, so the bytes of "the same" 'PiPL' will
differ between a big-endian machine (e.g. the Macintosh) and a little-endian machine (e.g. an Intel x86
based Windows machine). Although the bytes of the PiPL section of an x86 resource binary will appear
reversed compared to the Mac, you generally need not concern yourself with the difference. If you use the
pre-defined PI-types, they will be interpreted and stored correctly, as in the following example (see
PIKindProperty). If, however, an OSType has not been defined and you wish to enter it as a 4-character
series, then (since it is not interpreted as a long) you must supply the characters in reverse order (see
"MIB8").
"MIB8",
PIKindProperty,
0L,
4L,
"MFB8",
The Macintosh plug-in kit includes a resource template for the 'PiPL' type, and the Windows version of the
kit includes a "PiPL Parser" application (CnvtPiPL.exe) to transform Mac ".r" files into Windows ".rc"
files. If you are developing for the Macintosh platform, you can automatically convert your Macintosh PiPL
resources into Windows' custom PiPL format by using CnvtPiPL.exe. This lets you keep just one copy of
each PiPL, saves you the headache of converting PiPLs by hand, and eliminates the errors caused in the
process. To use CnvtPiPL.exe, pre-process your *.r file using the standard C pre-processor and pipe the
output to CnvtPiPL. The sample makefiles illustrate the process. Even if you are not developing for the
Macintosh, you are strongly encouraged to use the resource template (see any of the *.r files under the
"RIncludes" sub-directory) to create your PiPLs, as it is more intuitive, and then use CnvtPiPL.exe to
convert them. CnvtPiPL.exe takes care of all byte alignment and byte-ordering issues for you automatically.
If you use the PiPL resource template to create your PiPLs, you can safely ignore the details of the Windows
PiPL resource format in the following sections.
It is intended for 'PiPL's to collect all plug-in metadata in a single place. This includes the name and
category of the plug-in as well as all other information. (In the 'PiMI' world, the name and category were
stored as resource names on the plug-in code resource and 'PiMI' respectively.) Vendors are encouraged to
define new properties for extensions to plug-in metadata rather than introducing new resource types.
Types
int16, int32:
These are 16 and 32 bit integers respectively. They are stored within the 'PiPL' in native byte order.
OSType:
Same representation as an int32 but typically denotes a Macintosh style 4 character code like 'PiPL'.
TypeCreatorPair:
A structure of two OSTypes denoting a file type and creator code. The type code is the first field of the
structure and the creator code is second.
FlagSet:
This is an array of boolean values where the first boolean is contained in the high-order bit of the first byte.
The ninth entry would be in the high-order bit of the second byte, etc. (A bit-test sketch appears after this
list.)
PString:
A Pascal style string where the first byte gives the length of the string and the content bytes follow.
Structures:
Structures are typically represented the same way they would be in memory on the target platform. Native
padding and alignment constraints are observed.
Arrays:
Arrays are represented as a contiguous set of entries in the 'PiPL' typically with native padding and
alignment constraints observed. The length of the array is usually determined by the property length for
arrays of fixed length structures or types.
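The FlagSet layout mentioned above can be tested with a small
sketch like the following (the function name is ours, not from the
toolkit headers):

   /* Sketch: test entry n of a FlagSet.  Flag 0 is the high-order
      bit of byte 0, flag 8 the high-order bit of byte 1, etc. */
   int TestFlag(const unsigned char *flags, int n)
   {
      return (flags[n >> 3] >> (7 - (n & 7))) & 1;
   }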
General properties

Plug-in Kind    OSType
#define PIKindProperty 0x6b696e64L /* 'kind' */

This property encodes the type or kind of a plug-in. Valid values are:

   Filter                 '8BFM'
   File Parser            '8BYM'
   File Format            '8BIF'
   Accelerator Extension  '8BXM'
   Acquire Module         '8BAM'
   Export Module          '8BEM'
Version of kind specific API    int32
#define PIVersionProperty 0x76657273L /* 'vers' */

This property encodes a major and minor version number indicating which revision of the plug-in interface
this plug-in was written for. The major version number indicates incompatible changes while the minor
version number indicates incremental enhancements. The major version number is encoded in the most
significant 16 bits of the 32 bit version number; the minor version number is encoded in the least significant
16 bits.
There are separate version numbers for each kind of plug-in. The current version for a given kind of plug-in
is defined by a preprocessor macro in the header file defining the interface for that plug-in type.
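A sketch of unpacking such a version number (the macro names are
ours, not from the SDK headers):

   /* Sketch: major version in the high 16 bits, minor version in
      the low 16 bits. */
   #define VERSION_MAJOR(v) (((v) >> 16) & 0xFFFFL)
   #define VERSION_MINOR(v) ((v) & 0xFFFFL)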
Plug-in load order priority    int16
#define PIPriorityProperty 0x70727479L /* 'prty' */
This property determines the order in which this plug-in will be loaded. This is typically only important for
acceleration extensions. It can however be used to control the order in which items with the same name
show up in menus. Lower numbers (including negative ones as the field is signed) load first.
Supported image modes    FlagSet
#define PIImageModesProperty 0x6d6f6465L /* 'mode' */
This is a set of flags that determines which image modes the plug-in supports.
Required Host    OSType
#define PIRequiredHostProperty 0x686f7374L /* 'host' */
This property should be used if a plug-in relies on features of a specific host. It is typically filled in with the
applications creator code. (E.g. '8BIM' for Adobe Photoshop.)
Plug-in category    PString
#define PICategoryProperty 0x63617467L /* 'catg' */

Plug-in name    PString
#define PINameProperty 0x6e616d65L /* 'name' */
Code Descriptor Properties
Code descriptors tell Photoshop the type and location of a plug-in's code. More than one code descriptor
may be included to build a "fat" plug-in which will run on different types of machines. Photoshop will select
the best performing option. Photoshop makes sure that the callback structure is filled in with appropriate
functions for the type of code that is loaded. So for PowerPC code, native function pointers will be provided
and routine descriptor operations are not required either in calling the plug-in or for the plug-in to invoke
Photoshop callback functions.
68k code descriptor PI68KCodeDesc
#define PI68KCodeProperty 0x6d36386bL /* 'm68k' */
This descriptor indicates a 68K code resource. The type for this property is as follows:
typedef struct PI68KCodeDesc
{
OSType resourceType;
int16 resourceID;
} PI68KCodeDesc;
Any resource type may be used, but conventions for various types of plug-ins are as follows:
Filter '8BFM'
File Parser '8BYM'
File Format '8BIF'
Accelerator Extension '8BXM'
Acquire Module '8BAM'
Export Module '8BEM'
(This convention comes from Photoshop 2.5.1 where these types were required. When building a plug-in
that is backwards compatible with 2.5.1 hosts, these resource types must be used.)
68k FPU code descriptor PI68KCodeDesc
#define PI68KFPUCodeProperty 0x36386670L /* '68fp' */
This descriptor is just like a PI68KCodeDesc except it will only be used on Macintosh machines that are
equipped with FPU hardware. This allows vendors to easily ship plug-ins that take advantage of FPU
hardware but still run on non-FPU Macs.
PowerPC code fragment descriptor PICFMCodeDesc
#define PIPowerPCCodeProperty 0x70777063L /* 'pwpc' */
This descriptor indicates a PowerPC code fragment in the data fork of the plug-in file. The type for this
property is as follows:
typedef struct PICFMCodeDesc
{
long fContainerOffset;
long fContainerLength;
char fEntryName[1];
} PICFMCodeDesc;
With the fields documented as follows:
• fContainerOffset
Contains the offset within the data fork for the start of this plug-in's code fragment. This allows more
than one code fragment based plug-in per file.
• fContainerLength
Holds the length of this plug-in's code fragment. If the fragment extends to the end of the file (e.g. it is
the only fragment in the file), the container length may be 0.
• fEntryName
The entrypoint name is represented as a Pascal string and is used to look up the address of the function to
call within the fragment. If the entrypoint name is a zero length string, the default entrypoint for the code
fragment will be used. The entrypoint name allows a single code fragment to contain more than one
plug-in. (Note: in order for the Code Fragment Manager to find an entrypoint by name, that name must
be an exported symbol of the code fragment.)
Windows 32-bit DLL code descriptor PIWin32X86CodeDesc
#define PIWin32X86CodeProperty 0x77783836L /* 'wx86' */
typedef struct PIWin32X86CodeDesc
{
char fEntryName[1]; /* NUL terminated entrypoint name, padded with NULs to a 4 byte boundary */
} PIWin32X86CodeDesc;
In PiPL resource data this property appears as, for example: "MIB8" (the vendor code '8BIM' byte-reversed),
PIWin32X86CodeProperty, 0L, 12L, "ENTRYPOINT1\0" (if padding is needed, "ENTRYPNT1\0\0\0").
This code descriptor is used for 32-bit Windows DLLs. The entrypoint name is used to look up the function
which is called to invoke the plug-in. The entrypoint name is represented as a NUL terminated string. The
string may need to be padded with additional NULs to satisfy the 4 byte alignment requirement.
Note for Windows developers:
CnvtPiPL.exe does not recognize any code descriptor property other than "CodeWin32X86".
Filter specific properties
Layer case information Array
#define PIFilterCaseInfoProperty 0x66696369L /* 'fici' */
The key feature of Photoshop 3.0 is support for dynamically composited layers of image data. A layer
consists of color and transparency information for each pixel it contains. Previous versions of Photoshop did
not have a transparency component. Transparency introduces a greater richness as well as a number of
interesting problems. First off, completely transparent pixels have an undefined color. Second, filters will
likely affect transparency data as well as color data. This is especially true for filters which introduce spatial
distortions.
Photoshop 3.0 offers a fair bit of flexibility in how transparency data is presented to filters. The filter case
info property controls the filtering process and presentation of data to the plug-in. This property provides
information to Photoshop about what image data cases the plug-in supports. Photoshop then compares the
current filtering situation to the supported cases and chooses the best fitting case. The image data is then
presented in that case. If none of the supported cases are usable, the filter will be disabled.
So what are these "cases"? There are seven of them. The property is an array of seven four byte entries, one
for each case. The cases are as follows:
#define filterCaseFlatImageNoSelection 1
This is a background layer or a flat image. There is no transparency data. Nor is there a selection.
#define filterCaseFlatImageWithSelection 2
No transparency data, but a selection may be present. The selection will be presented as mask data.
#define filterCaseFloatingSelection 3
Image data with an accompanying mask.
#define filterCaseEditableTransparencyNoSelection 4
A layer with transparency editing enabled and no selection.
#define filterCaseEditableTransparencyWithSelection 5
A layer with transparency editing enabled and a selection.
#define filterCaseProtectedTransparencyNoSelection 6
A layer with transparency editing disabled and no selection.
#define filterCaseProtectedTransparencyWithSelection 7
A layer with transparency editing disabled and a selection.
Photoshop's fall through algorithm is as follows:
If the editable transparency cases are unsupported, then Photoshop will try the corresponding protected
transparency cases. This is important because it governs whether the filter will be expected to filter the
transparency data as well as the color data.
If the protected transparency case without a selection is disabled, Photoshop will fall through from there to
treating the layer data as a floating selection. As such, the transparency data will be presented via the mask
portion of the interface rather than with the input data.
Each four byte entry looks like so:
struct caseInfo {
unsigned8 inputHandling;
unsigned8 outputHandling;
unsigned8 flags;
unsigned8 reserved;
};
The inputHandling and outputHandling fields specify pre-processing and post-processing of the image data
respectively. Common values for both fields are as follows:
#define filterDataHandlingCantFilter 0
This case is not supported.
#define filterDataHandlingNone 1
Do nothing to the image data.
The next three cases are matting cases, which are useful when performing spatial distortions and blurs. You
can matte the data, process it, and then dematte to remove the added color. For these cases, the matting is
defined as follows:
mattedValue = ((unmattedValue * transparency) + 128) / 255 +
((matConstant * (255 - transparency)) + 128) / 255
Dematting is defined as follows:
unmattedValue = ((mattedValue - matConstant) ./ transparency) +
matConstant
with the ./ operator defined to be a suitable 8 bit fixed-point divide and the result value being pinned to the
range of 0 to 255.
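A straight C rendering of these formulas might look like the following sketch (Matte and Dematte are
illustrative names; unsigned8 is assumed to come from PITypes.h):
static unsigned8 Matte (unsigned8 unmatted, unsigned8 transparency, unsigned8 matConstant)
{
	return (unsigned8) (((unmatted * (long) transparency) + 128) / 255 +
	                    ((matConstant * (long) (255 - transparency)) + 128) / 255);
}

static unsigned8 Dematte (unsigned8 matted, unsigned8 transparency, unsigned8 matConstant)
{
	long value;
	if (transparency == 0)
		return matConstant;	/* fully transparent pixels have an undefined color */
	/* One plausible realization of the "./" 8 bit fixed-point divide. */
	value = (((long) matted - matConstant) * 255) / transparency + matConstant;
	if (value < 0)   value = 0;	/* pin the result to 0..255 */
	if (value > 255) value = 255;
	return (unsigned8) value;
}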
#define filterDataHandlingBlackMat 2
For the input case, matte the image data with black (0) values based on the transparency. For output,
dematte the image data using black (0) values.
#define filterDataHandlingGrayMat 3
Matte the image data with gray (128) values based on the transparency on input. Dematte the image data
using gray (128) values on output.
#define filterDataHandlingWhiteMat 4
Matte the image data with white (255) values based on the transparency on input. Dematte the image data
using white (255) values on output.
The following modes are only useful for input:
#define filterDataHandlingDefringe 5
Defringe transparent areas filling with the nearest defined pixels using taxicab distance. Note that this only
applies to fully transparent pixels.
#define filterDataHandlingBlackZap 6
Set color component of totally transparent pixels to black (0).
#define filterDataHandlingGrayZap 7
Set color component of totally transparent pixels to gray (128).
#define filterDataHandlingWhiteZap 8
Set color component of totally transparent pixels to white (255).
#define filterDataHandlingBackgroundZap 10
Set color component of totally transparent pixels to the current background color.
#define filterDataHandlingForegroundZap 11
Set color component of totally transparent pixels to the current foreground color.
The following mode is only useful for output:
#define filterDataHandlingFillMask 9
This mode results in the transparency mask automatically being filled with full opacity in the area affected
by the filter. This is only valid for the editable transparency cases. This option is provided to make it easy
to write things like Photoshop's Clouds plug-in which wants to fill an area with a value.
The flags field holds the following bits. (Note: This field is not a FlagSet. The first bit
(PIFilterDontCopyToDestinationBit) is in the least-significant bit of the flag byte.)
#define PIFilterDontCopyToDestinationBit 0
Normally Photoshop copies the source data to the destination before filtering. This gives a good default
value for any pixels the filter does not write to, but degrades performance for filters which write all the
output pixels. Setting this bit inhibits the copying behavior.
#define PIFilterWorksWithBlankDataBit 1
This flag determines whether the filter will work on "blank" areas. That is, areas that are completely
transparent. If not, an error message will be given when the filter is invoked on a blank area. This is only
valid for the editable transparency case because that is the only case where we could create opacity -- in the
protected transparency case, we would be left with what we started with: completely blank data.
#define PIFilterFiltersLayerMaskBit 2
In cases where transparency is editable, this flag determines if Layer Masks are filtered. (See the "Add Layer
Mask" item in the Layers palette menu to create a layer mask.) Setting this bit adds the layer mask to the set
of target channels if: transparency for the layer is editable (i.e., this is one of the editable transparency
cases), the bit is set, and the layer mask is specified as being positioned relative to the layer rather than the
image in Layer Mask Options. This is the same logic Photoshop uses for built-in filters like blur. The
distinction based on position is made with the assumption that layer relative masks will need to be distorted
along with the layer while image relative masks are independent of the layer.
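As an illustration only (the values are hypothetical, not a recommendation), a distortion filter that wants
gray matting in all the transparency cases might declare its case array like this:
struct caseInfo distortionCases[7] =
{
	/* inputHandling,            outputHandling,            flags, reserved */
	{ filterDataHandlingNone,    filterDataHandlingNone,    0, 0 },	/* flat image, no selection */
	{ filterDataHandlingNone,    filterDataHandlingNone,    0, 0 },	/* flat image with selection */
	{ filterDataHandlingGrayMat, filterDataHandlingGrayMat, 0, 0 },	/* floating selection */
	{ filterDataHandlingGrayMat, filterDataHandlingGrayMat, 0, 0 },	/* editable transparency, no selection */
	{ filterDataHandlingGrayMat, filterDataHandlingGrayMat, 0, 0 },	/* editable transparency with selection */
	{ filterDataHandlingGrayMat, filterDataHandlingGrayMat, 0, 0 },	/* protected transparency, no selection */
	{ filterDataHandlingGrayMat, filterDataHandlingGrayMat, 0, 0 }	/* protected transparency with selection */
};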
Format specific properties
Default file creation information TypeCreatorPair
#define PIFmtFileTypeProperty 0x666d5443 /* 'fmTC' */
Determines the default type and creator code used for files newly created with this format plug-in. On the
Windows platform, TypeCreator information is not stored (except internally), so the PIFmtFileTypeProperty
is not required; files are simply treated as type 'BINA' and creator 'mdos'. All the information regarding which
files can be read or written is obtained from the PIReadExtProperty or the PIFilteredExtProperty. PiMI
extensions are converted to PIReadExtProperty's, so use of PIFilteredExtProperty requires additional coding
if the developer is porting a 16-bit plug-in to 32-bit.
Readable types TypeCreatorPair[ ]
#define PIReadTypesProperty 0x52645479 /* 'RdTy' */
This property contains a list of type and creator pairs which the format plug-in can read.
Filtered types TypeCreatorPair[ ]
#define PIFilteredTypesProperty 0x66667454 /* 'fftT' */
This property contains a list of type and creator pairs for which the file format plug-in should be called to
determine if the file can be read. See documentation for formatSelectorFilterFile plug-in selector.
Readable extensions OSType[ ]
#define PIReadExtProperty 0x52644578 /* 'RdEx' */
This property contains a list of extensions which the format plug-in can read. The extension is stored in the
first three characters of the OSType. The fourth character must be a space.
(The extension ".abc" would be encoded as "abc ". It is 0x61626320 on a big-endian machine. It is
byte-reversed on a little-endian machine, 0x20636261, but you can still encode it as "abc ".)
Filtered extensions OSType[ ]
#define PIFilteredExtProperty 0x66667445 /* 'fftE' */
This property contains a list of extensions for which the file format plug-in should be called to determine if
the file can be read. See documentation for formatSelectorFilterFile plug-in selector.
(The extension ".abc" would be encoded as "abc ". It is 0x61626320 on a big-endian machine. It is
byte-reversed on a little-endian machine, 0x20636261, but you can still encode it as "abc ".)
Format flags FlagSet
#define PIFmtFlagsProperty 0x666d7466 /* 'fmtf' */
This property contains a set of flags which control the operation of file format plug-ins. The default value
for any flag is false.
#define PIFmtReadsAllTypesFlag 0
Obsolete.
#define PIFmtSavesImageResourcesFlag 1
Along with the pixel information for a file, Photoshop stores various pieces of auxiliary data: printing
information, pen tool paths, etc. Collectively, these are known as image resources. The plug-in format has
the option of taking responsibility for these resources by reading and writing a block of data containing the
image resources. If this flag is false, Photoshop will add the image resources to the file's resource fork, but
this will not be portable to other platforms.
#define PIFmtCanReadFlag 2
This flag should be set to true if the file format can read files.
#define PIFmtCanWriteFlag 3
This flag should be set to true if the file format can write files.
#define PIFmtCanWriteIfReadFlag 4
Flag indicating whether we can write using this plug-in if we read the file using this plug-in. For
example, the plug-in to support Adobe Premiere's Filmstrip format has the can write flag set to false because
it cannot in general be used to save files. It has this flag set to true, however, because we can save out
filmstrips that we read in using the plug-in.
Maximum supported size Point
#define PIFmtMaxSizeProperty 0x6d78737a /* 'mxsz' */
The maximum number of rows and columns that can be in an image saved in this format. Photoshop
will use this field to screen out ineligible formats.
Maximum channels int16[ ]
#define PIFmtMaxChannelsProperty 0x6d786368 /* 'mxch' */
An array of one byte counts of the maximum number of channels which can/will be saved for a given
image mode. This array is indexed by the plug-in mode constants. For example, if a format supports a
single alpha channel in RGB mode, it should set maxChannels[plugInModeRGBColor] to 4. A plug-in
may still be asked to save more channels than it reports it can support. This field exists primarily so that we
can warn the user that alpha channels will be discarded.
Parser specific properties
Parsable types TypeCreatorPair[ ]
#define PIParsableTypesProperty 0x70735459L /* 'psTY' */
This property contains a list of type and creator pairs for files which the parser plug-in can parse.
Filtered parsable types TypeCreatorPair[ ]
#define PIFilteredParsableTypesProperty 0x70735479L /* 'psTy' */
This property contains a list of type and creator pairs for files which the parser plug-in may be able to parse.
The plug-in will be called to make the determination. See the documentation for the parserSelectorCanRead
selector.
Parsable extensions OSType[ ]
#define PIParsableExtProperty 0x70734558L /* 'psEX' */
This property contains a list of extensions for files which the parser plug-in can parse.
(The extension ".abc" would be encoded as "abc ". It is 0x61626320 on a big-endian machine. It is
byte-reversed on a little-endian machine, 0x20636261, but you can still encode it as "abc ".)
Filtered parsable extensions OSType[ ]
#define PIFilteredParsableExtProperty 0x70734578L /* 'psEx' */
This property contains a list of extensions for files which the parser plug-in may be able to parse. The
plug-in will be called to make the determination. See the documentation for the parserSelectorCanRead selector.
(The extension ".abc" would be encoded as "abc ". It is 0x61626320 on a big-endian machine. It is
byte-reversed on a little-endian machine, 0x20636261, but you can still encode it as "abc ".)
Parsable clipboard types OSType[ ]
#define PIParsableClipTypesProperty 0x70734342L /* 'psCB' */
This Macintosh specific property contains a list of clipboard type codes which can be parsed by this plug-in.
'PiMI's
'PiMI' resources are superseded by the previously described 'PiPL' resources; however, for compatibility with
existing plug-ins, Photoshop 3.0 will still recognize and function with plug-ins containing only 'PiMI's.
The 'PiMI' resource consists of two pieces: general information applicable to all (or most) plug-in types and
type specific info. The general information precedes the type specific information. Since the information
is laid out serially, however, all fields must be filled in up through and including the last field supplied.
Generally, a plug-in should either just include the version number information or it should include all of the
information documented here.
A C struct definition for the 'PiMI' resource is as follows:
typedef struct PlugInInfo
{
short version;
short subVersion;
short priority;
short generalInfoSize;
short typeInfoSize;
short supportsMode;
OSType requireHost;
} PlugInInfo;
• version
The major version number for the interface used by the plug-in. This field is required.
• subVersion
The minor version number for the interface used by the plug-in. This field is required.
• priority
The priority which should be associated with this plug-in when it loads. Currently, this is only used for
extension modules.
• generalInfoSize
The size of the general plug-in information in this resource.
• typeInfoSize
The size of the type-specific plug-in information in this resource.
• supportsMode
A bitmap describing the image modes supported by the plug-in. This field applies to filter, export, and
file format plug-ins. If it is not present, Photoshop will assume that the plug-in supports all image
modes. This field is one of the ways Photoshop decides whether to dim plug-ins in menus. Since not all
hosts may respect this field, the plug-in should still check that it can handle the image mode it has been
requested to process. The bits in the bitmap correspond to the plugInMode constants in PIGeneral.h (i.e., bit
0 corresponds to bitmaps, bit 1 to grayscale, etc.).
• requireHost
If the plug-in requires a particular host proc (see below), it should specify the signature for that host proc
here. If it does not require a particular host proc, it should fill this field with spaces. Photoshop will
decline to load plug-ins which require host procs other than Photoshop's '8BIM' proc. Plug-in developers
should be aware that, again, one cannot count on host developers to check this field.
The type specific info is documented in the documentation on the various types of plug-ins.
Note that it is possible to have multiple plug-in modules in a single file, as long as the resource numbers do
not conflict (if the modules are of different types, the file type should be set to '8BPI', which is always
searched as a special case). In most cases it is not a good idea to place multiple modules in a single file,
since it reduces the user's control of which modules are installed. However, there are cases when this should
be done. One example is matched acquisition/export, though these frequently correspond to the new file
format modules. In such cases, only one of the modules should display an about box, describing both
modules. Another example is a set of closely related filters, since the decrease in user control may be offset
by an improvement in the ease with which users can do maintenance on their plug-in folders.
Execution
Macintosh
When the user takes an action that causes a plug-in module to be called, Adobe Photoshop opens the
resource fork of the file the module resides in, loads the resource into memory, locks it, and calls the routine
starting at the first byte of the resource. The Macintosh prototype is:
• pascal void PlugIn (short selector, Ptr stuff, long *data, short *result);
Windows
When the user takes an action that causes a plug-in module to be called, Adobe Photoshop does a
LoadLibrary call to load the module into memory. For each PiPL resource found in the file, Photoshop calls
GetProcAddress (routineName) where "routineName" is the name found under the "CodeWin32X86" property
to get the routine's address. If the file contains only PiMI resources and no PiPLs, Photoshop does a
GetProcAddress for each PiMI resource found in the file looking for the entry point ENTRYPOINT% where
% is the integer nameID of the PiMI resource to get the routine's address. Once Photoshop obtains the
routine's address it calls the routine following the calling conventions:
• void Plugin (short selector, void *stuff, void *data, short *result);
The parameters are to be interpreted as follows:
• selector
This is an integer operation selector code. Selector 0 always means display an about box. The
meaning of other values depends on the type of plug-in.
• stuff
This is a pointer to a parameter block. The exact nature of the parameter block depends on the type
of plug-in. In the case of the about box selector, this pointer leads to a record containing a single
platform specific 32-bit value. In the case of the Macintosh, this field contains no useful data.
• data
This is a pointer to a long integer (32-bit value) which Photoshop will maintain for the plug-in across
invocations. One standard use for this field is to store a pointer or handle to a block of memory used
to store the plug-in's "global" data. It will be zero the first time the plug-in is called.
• result
This is a pointer to the result code to be returned by the plug-in. A value of zero means that no error has
occurred. Any positive value means that execution of the plug-in should stop, but the plug-in has already
displayed any appropriate error message. (If the user cancels the operation in any way, the plug-in should
return a positive value and not report an error.) A negative value also means that execution of the plug-in
should stop, but that the host should display its standard error dialog describing the error. Each plug-in type
has one plug-in specific error code (see the header files for details) which can be returned here. Standard
Macintosh OS error codes such as memFullErr (-108) can also be used.
Callback Routines
A number of fields in the various plug-in "stuff" structures are callbacks to the host program to provide
specific services. A number of these routines are common to multiple plug-in types and are documented
here. Those specific to a single plug-in type are documented in that type's documentation. Some of these
routines are arranged in suites accessed via a pointer to a table of function pointers. Some of these routines
are also new in Photoshop 3.0 and may not be provided by other hosts including earlier versions of
Photoshop. If a host does not provide a particular routine or suite, the relevant pointer will be null.
Photoshop 3.0 has added an error code to indicate that the host does not supply necessary functionality:
#define errPlugInHostInsufficient -30900
All of the routines use Pascal calling conventions. A complete list can be found in PIGeneral.h.
The two routines guaranteed to be present if defined in the plug-in interface record provide checking for
command-period (or other requests to abort) and access to a host progress indicator:
TestAbort( )
• pascal Boolean TestAbort ( );
The plug-in should call this function several times a second during long operations to allow the user
to abort the operation. If the function returns TRUE, the operations should be aborted. As a side
effect, this changes the cursor to a watch and moves the watch hands periodically.
UpdateProgress( )
• pascal void UpdateProgress (long done, long total);
The plug-in may call this two-argument procedure periodically to update a progress indicator. The
first parameter is the number of operations completed; the second is the total number of operations.
This procedure should only be called during the actual main operation of the plug-in, not during long
operations in the preliminary user interface.
Photoshop automatically suppresses display of the progress bar during short operations.
Photoshop 3.0 provides a routine to allow some plug-in types to pass events to Photoshop for processing.
For example, when a plug-in receives a deactivate event for one of Photoshop's windows, it is considered
polite if the plug-in passes this event on to Photoshop. This routine can also be used to allow Photoshop to
do updates of its own windows by passing relevant update and null events to Photoshop. The calling
sequence for this routine is:
ProcessEvent( )
• pascal void ProcessEvent (EventRecord *event);
The parameter is actually a generic pointer and the data pointed to will depend on the platform the
plug-in is running on.
A general host callback routine may or may not be present. Its functionality and calling sequence are host
specific and are defined by the hostType field; the general functionality and calling sequence are identified by
the signature supplied with the procedure pointer. Host procs are used to support operations which are
special to some host and would not apply to most other hosts. In the case of Photoshop's callback, for
example, this is where certain callback operations are provided which may well not be supported in the long
term but which are critical to getting certain features in Photoshop working as plug-ins. This
callback is not generally useful on the Windows platform.
The next general callback routine is used to display pixels in various image modes. It takes a structure
describing a block of pixels to display:
DisplayPixels( )
• pascal OSErr DisplayPixels (const PSPixelMap *source, const VRect *srcRect, int32 dstRow,
int32 dstCol, unsigned32 platformContext);
The parameters have the following interpretations:
• source
The PSPixelMap containing the pixels to be displayed.
typedef struct PSPixelMap
{
int32 version;
VRect bounds;
int32 imageMode;
int32 rowBytes;
int32 colBytes;
int32 planeBytes;
void *baseAddr;
/* Fields new in version 1. */
PSPixelMask *mat;
PSPixelMask *masks;
int32 maskPhaseRow;
int32 maskPhaseCol;
} PSPixelMap;
The fields in this structure are as follows:
• version
The version number for this structure. The current version number is version 1. Future versions of
Photoshop may support additional parameters and will support higher version numbers for
PSPixelMap's.
• bounds
The bounds for the pixel map.
• imageMode
The mode for the image data. The supported modes are grayscale, RGB, CMYK, and Lab.
Additionally, if the mode of the document being processed is DuotoneMode or IndexedColorMode,
you can pass plugInModeDuotone or plugInModeIndexedColor.
• rowBytes
The offset from one row to the next of pixels.
• colBytes
The offset from one column to the next of pixels.
• planeBytes
The offset from one plane of pixels to the next. In RGB, the planes are ordered red, green, blue; in
CMYK, the planes are ordered cyan, magenta, yellow, black; in Lab, the planes are ordered L, a, b.
• baseAddr
The address of the byte value for the first plane of the top left pixel.
• mat
For all modes except indexed color, you can specify a mask to be used for matting correction. For
example, if you have white matted data to display, you can specify a mask in this field which will be
used to remove the white fringe. This field points to a PSPixelMask structure (see below) with a
maskDescription indicating what type of matting needs to be compensated for. If this field is NULL,
Photoshop performs no matting compensation. If the masks are chained, only the first mask in the
chain is used.
• masks
This points to a chain of PSPixelMasks which are multiplied together (with the possibility of
inversion) to establish which areas of the image are transparent and should have the checkerboard
displayed. kSimplePSMask, kBlackMatPSMask, kWhiteMatPSMask, and kGrayMatPSMask all
operate such that 255 = opaque and 0 = transparent. kInvertPSMask has 255 = transparent and 0 =
opaque.
The PSPixelMasks structure is defined as follows:
typedef struct PSPixelMask
{
struct PSPixelMask *next;
void *maskData;
int32 rowBytes;
int32 colBytes;
int32 maskDescription;
} PSPixelMask;
• next
A pointer to the next mask in the chain.
• maskData
A pointer to the mask data.
• rowBytes, colBytes
The row and column steps for the mask.
• maskDescription
The mask description value, which is one of the following:
#define kSimplePSMask 0
#define kBlackMatPSMask 1
#define kGrayMatPSMask 2
#define kWhiteMatPSMask 3
#define kInvertPSMask 4
• maskPhaseRow, maskPhaseCol
maskPhaseRow and maskPhaseCol give the phase of the checkerboard with respect to the top left
corner of the PSPixelMap.
• srcRect
The rectangle within that PSPixelMap to be displayed.
• dstRow, dstCol
The coordinates of the top left destination pixel in the current port (i.e., the destination pixel which will
correspond to the top left pixel in srcRect). The display routine does not scale the pixels, so specifying
the top left corner is sufficient to specify the destination.
• platformContext
This parameter is not used on the Macintosh since the routine simply assumes that the target is the
current port. On Windows, platformContext should be the target hDC, cast to an unsigned32. Under
other platforms, this may specify a drawing context.
The routine will do the appropriate color space conversion and CopyBits the results to the screen with
dithering. It will leave the original data intact. If it is successful, it will return noErr. Non-success is
generally due to unsupported color modes.
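For example, a plug-in with an 8-bit grayscale proxy might call DisplayPixels along these lines (a sketch;
ShowProxy and the parameter choices are illustrative, and plugInModeGrayScale is assumed from PIGeneral.h):
static OSErr ShowProxy (DisplayPixelsProc displayPixels, void *pixels,
                        int32 rows, int32 cols, int32 dstRow, int32 dstCol,
                        unsigned32 platformContext)
{
	PSPixelMap map;
	VRect srcRect;
	map.version = 1;
	map.bounds.top = 0;
	map.bounds.left = 0;
	map.bounds.bottom = rows;
	map.bounds.right = cols;
	map.imageMode = plugInModeGrayScale;
	map.rowBytes = cols;	/* one byte per pixel, rows packed tightly */
	map.colBytes = 1;
	map.planeBytes = 0;	/* single plane */
	map.baseAddr = pixels;
	map.mat = NULL;	/* no matting compensation */
	map.masks = NULL;	/* no transparency checkerboard */
	map.maskPhaseRow = 0;
	map.maskPhaseCol = 0;
	srcRect = map.bounds;	/* display the whole pixel map */
	return displayPixels (&map, &srcRect, dstRow, dstCol, platformContext);
}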
GetPropertyProc( )
The GetProperty callback is available to filter and export modules. It allows these modules to get
information about the document currently being processed.
• pascal OSErr (*GetPropertyProc) (OSType signature, OSType key, int32 index,
int32 * simpleProperty, Handle *complexProperty);
The signature and key form a pair to identify the property of interest. Photoshop's signature is always
'8BIM'. The key values are documented below and in PIProperties.h.
Properties like channel names and path names or data can be indexed. Indices generally start at 0.
Properties can consist either of an integer returned in simpleProperty or a handle returned in
complexProperty. The type of property data is documented in PIProperties.h. In the case of a complex (i.e.,
handle based) property, the plug-in is responsible for disposing of the handle it is passed via the dispose call
in the handle suite.
Properties involving strings -- e.g., channel names and path names -- are returned in a handle where the
length of the handle determines the size of the string. There is no length byte nor is the string zero
terminated.
Property keys
'nuch' (number of channels): This returns the number of channels in the document. This count will include
the transparency mask and the layer mask for the target layer if these are present. This property is simple.
'nmch' (channel name): This returns the name of the channel. The channels are indexed from zero and
consist of the composite channels, the transparency mask, the layer mask, and the alpha channels. This
property is complex.
'mode' (image mode): This returns the mode of the image using the constants defined in PIGeneral. This
property is simple.
'nupa' (number of paths): This property returns the number of paths in the document. This property is
simple.
'nmpa' (path name): This property returns the name of the indexed path. The paths are indexed starting with
zero. This property is complex.
'path' (path contents): This property returns the contents of the indexed path in the format documented in
the path resources documentation. Little-endian platforms should note that the data is stored in big-endian
form. This property is complex.
'wkpa' (work path index): This property returns the index of the work path or -1 if there is no work path.
This property is simple.
'clpa' (clipping path index): This property returns the index of the clipping path or -1 if there is no clipping
path. This property is simple.
'tgpa' (target path index): This property returns the index of the target path or -1 if there is no target path.
This property is simple.
This list is complete for Photoshop 3.0.1. Future versions of the program will almost certainly support more
properties. We will also probably add a way to write to properties from plug-ins in some future version.
There is no way to do so at this time.
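As a sketch of how a filter or export module might use this callback (GetChannelCount is an illustrative
helper name; the property data types follow the descriptions above):
static int32 GetChannelCount (GetPropertyProc getProperty)
{
	int32 count = 0;
	if (getProperty == NULL ||
	    getProperty ('8BIM', 'nuch', 0, &count, NULL) != noErr)
		count = 0;	/* callback or property unavailable */
	return count;
}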
AdvanceStateProc ( )
• pascal OSErr (*AdvanceStateProc) (void);
In a plug-in type specific manner, this callback allows the plug-in to drive Photoshop to update the buffers
used for communicating between Photoshop and the plug-in without the plug-in actually returning from the
selector call. It returns noErr if successful and a non-zero error code if something went wrong.
ColorServicesProc ( )
• pascal OSErr (*ColorServicesProc) (ColorServicesInfo *info);
typedef struct ColorServicesInfo
{
int32 infoSize;
int16 selector;
int16 sourceSpace;
int16 resultSpace;
Boolean resultGamutInfoValid;
Boolean resultInGamut;
void *reservedSourceSpaceInfo;
void *reservedResultSpaceInfo;
int16 colorComponents[4];
void *reserved;
Str255 *pickerPrompt;
} ColorServicesInfo;
The fields of this record are as follows:
• infoSize
This field must be filled in with the size of the ColorServicesInfo record in bytes. The value is used as a
version identifier in case this record is expanded in the future. It can be filled in like so:
ColorServicesInfo requestInfo;
requestInfo.infoSize = sizeof(requestInfo);
• selector
This field selects the operation performed by the ColorServices callback. At present there are two
operations available, choosing a color using the Photoshop color picker (actually, using the user's
preferred color picker), and converting color values from one color space to another. The selectors for
these are respectively:
#define plugIncolorServicesChooseColor 0
#define plugIncolorServicesConvertColor 1
Available color spaces:
#define plugIncolorServicesRGBSpace 0
#define plugIncolorServicesHSBSpace 1
#define plugIncolorServicesCMYKSpace 2
#define plugIncolorServicesLabSpace 3
#define plugIncolorServicesGraySpace 4
#define plugIncolorServicesHSLSpace 5
#define plugIncolorServicesXYZSpace 6
• sourceSpace
This field is used to indicate the color space of the input color contained in colorComponents. For
plugIncolorServicesChooseColor the input color is used as an initial value for the picker. For
plugIncolorServicesConvertColor the input color will be converted from the color space indicated by
sourceSpace to the one indicated by resultSpace.
• resultSpace
This field holds the desired color space of the result color from the ColorServices call. The result will be
contained in the colorComponents field when ColorServices returns. For the
plugIncolorServicesChooseColor selector, resultSpace can be set to plugIncolorServicesChosenSpace to
return the color in whichever color space the user chose the color. In that case, resultSpace will contain
the chosen color space on output.
• resultGamutInfoValid
This output only field indicates whether the resultInGamut field has been set. In Photoshop 3.0, this will
only be true for colors returned in the plugIncolorServicesCMYKSpace color space.
• resultInGamut
This output only field is a boolean value that indicates whether the returned color is in gamut for the
currently selected printing setup. It is only meaningful if the resultGamutInfoValid field is true.
• colorComponents
This array contains the actual color components of the input or output color. They will be included in the
components array as they are listed in the color space name. So for plugIncolorServicesRGBSpace
colorComponents[0] will contain the red (R) component, colorComponents[1] will contain the green (G)
component, colorComponents[2] will contain the blue (B) component. Components not used in the input
color space need not be filled in and components not used in the result color space are undefined.
• pickerPrompt
This field contains a pointer to a Pascal string which will be used as a prompt in the Photoshop color
picker for the plugIncolorServicesChooseColor call. NULL can be passed to indicate no prompt should
be used.
• reservedSourceSpaceInfo, reservedResultSpaceInfo, reserved
These three fields are reserved for future expansion and must be set to NULL. A parameter error will be
returned if they are not.
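Putting the pieces together, a conversion call might look like this sketch (RGBToLab is an illustrative
helper; error handling is abbreviated):
static OSErr RGBToLab (ColorServicesProc colorServices, const int16 rgb[3], int16 lab[3])
{
	ColorServicesInfo info;
	OSErr err;
	info.infoSize = sizeof (info);
	info.selector = plugIncolorServicesConvertColor;
	info.sourceSpace = plugIncolorServicesRGBSpace;
	info.resultSpace = plugIncolorServicesLabSpace;
	info.colorComponents[0] = rgb[0];
	info.colorComponents[1] = rgb[1];
	info.colorComponents[2] = rgb[2];
	info.colorComponents[3] = 0;	/* unused in RGB */
	info.pickerPrompt = NULL;
	info.reservedSourceSpaceInfo = NULL;	/* reserved fields must be NULL */
	info.reservedResultSpaceInfo = NULL;
	info.reserved = NULL;
	err = colorServices (&info);
	if (err == noErr)
	{
		lab[0] = info.colorComponents[0];
		lab[1] = info.colorComponents[1];
		lab[2] = info.colorComponents[2];
	}
	return err;
}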
Monitor Descriptions
A number of the plug-ins get passed monitor descriptions via the PlugInMonitor structure. These
descriptions basically detail the information recorded in Photoshop's Monitor Setup dialog and are passed in
a structure of the following type:
typedef struct PlugInMonitor
{
Fixed gamma;
Fixed redX;
Fixed redY;
Fixed greenX;
Fixed greenY;
Fixed blueX;
Fixed blueY;
Fixed whiteX;
Fixed whiteY;
Fixed ambient;
} PlugInMonitor;
The fields of this record are as follows:
• gamma
This field contains the monitor's gamma value or zero if the whole record is invalid.
• redX, redY, greenX, greenY, blueX, blueY
These fields specify the chromaticity coordinates of the monitor's phosphors.
• whiteX, whiteY
These fields specify the chromaticity coordinates of the monitor's white point.
• ambient
This field specifies the relative amount of ambient light in the room. Zero means a relatively dark room,
0.5 means an average room, and 1.0 means a bright room.
Callback Suites
The rest of the callback routines are organized into "suites", collections of related routines which implement
a particular functionality. The suites are described by a pointer to a record containing in order a 2 byte
version number for the suite, a 2 byte count of the number of routines in the suite - routines can be added to
the suite without incrementing the version number - and a series of ProcPtr's for the routines. Before
calling a callback defined in the suite, the plug-in needs to check the following conditions:
• The suite pointer must not be null.
• The suite version number must match the version number the plug-in wishes to use. (We do not expect
to be changing version numbers with any degree of frequency.)
• The number of routines defined in the suite must be great enough to include the routine of interest.
• The pointer for the routine of interest must not be null.
If these conditions are not met and the plug-in does not want to work around the non-availability of the
callback, the plug-in should put up an error dialog and then return to the host as if the user had canceled.
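As a concrete sketch of these checks, using the buffer suite described below (the BufferProcs field names
follow PIGeneral.h conventions and should be treated as assumptions to verify against your header):
static Boolean BufferSuiteAvailable (BufferProcs *procs)
{
	return procs != NULL
	    && procs->bufferProcsVersion == 2	/* the suite version documented below */
	    && procs->numBufferProcs >= 5	/* enough routines to include the one of interest */
	    && procs->allocateProc != NULL
	    && procs->lockProc != NULL
	    && procs->unlockProc != NULL
	    && procs->freeProc != NULL
	    && procs->spaceProc != NULL;
}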
Buffer Suite
The buffer suite provides an alternative to the memory management functions available in previous versions
of Photoshop's plug-in specification by providing a set of routines to request that the host allocate and
dispose of memory out of a pool which it manages.
Photoshop 2.5, for example, goes to a fair amount of trouble to balance the need for buffers of various sizes
against the space needed for the tiles in its virtual memory system. Growing the space needed for buffers
will result in Photoshop shrinking the number of tiles it keeps in memory.
Previous versions of the plug-in specification provide some mechanisms for interacting with this system by
letting a plug-in specify a certain amount of memory which the host should reserve for the plug-in. This
approach has two problems: (1) the memory is reserved throughout the execution of the plug-in, and (2) the
plug-in may still run up against limitations imposed by the host - for example, Photoshop 2.5 will, in large
memory configurations, allocate most of the memory at startup via a NewPtr call, and this memory will never
be available to the plug-in other than through the buffer suite. On Windows, Photoshop's memory scheme is
designed such that it allocates just enough memory to keep Windows' virtual memory manager from kicking in.
If the plug-in allocates lots of memory using GlobalAlloc ( ), this scheme will be defeated and Photoshop
will end up double-swapping, thereby degrading performance. Using the buffer suite, a plug-in can avoid doing
some of the accounting for space to be reserved. This simplifies the prepare phase for acquire, filter, and
format plug-ins. Unfortunately, export modules are expected to account for the buffer for the data requested
from the host even though the host allocates the buffer. This means that the buffer suite routines do not
really provide any help for export modules. But for other plug-ins, buffer allocations can be delayed until
they are actually needed.
Buffers are identified by BufferIDs, pointers to an opaque type.
Version 1 was purely developmental. The routines in version 2 of the suite are:
AllocateBuffer( )
• pascal OSErr AllocateBuffer (int32 size, BufferID *buffer);
This routine sets buffer to be the ID for a buffer of the requested size and returns noErr if allocation
is successful. It returns an error code if allocation is unsuccessful. Note that buffer allocation is
more likely to fail during phases where other blocks of memory are locked down for the plug-in's
benefit - e.g., during the continue calls to filter and format plug-ins.
LockBuffer( )
• pascal Ptr LockBuffer (BufferID buffer, Boolean moveHigh);
This locks the buffer so that it won't move in memory and returns a pointer to the beginning of the
buffer. It will optionally try to move the block to the high end of memory to avoid fragmentation. The
"moveHigh" parameter has no effect under MS-Windows.
UnlockBuffer( )
• pascal void UnlockBuffer (BufferID buffer);
This is the corresponding routine to unlock a buffer. A buffer can be locked multiple times and only
the final balancing unlock call will actually unlock it.
FreeBuffer( )
• pascal void FreeBuffer (BufferID buffer);
This routine releases the storage associated with a buffer. Use of the buffer's ID after calling
FreeBuffer will probably result in severe crashes.
BufferSpace( )
• pascal int32 BufferSpace (void);
This routine returns the amount of space available for buffers. This space may be fragmented so an
attempt to allocate all of the space as a single buffer may fail.
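A minimal usage sketch tying these routines together (field names as in the availability check earlier;
error handling abbreviated):
static OSErr WithTemporaryBuffer (BufferProcs *procs, int32 size)
{
	BufferID buffer;
	OSErr err = procs->allocateProc (size, &buffer);
	if (err == noErr)
	{
		Ptr p = procs->lockProc (buffer, false);
		/* ... use the `size` bytes at p ... */
		procs->unlockProc (buffer);
		procs->freeProc (buffer);	/* never touch `buffer` after this */
	}
	return err;
}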
Pseudo-Resource Suite
Macintosh only.
The second suite of callback routines provides support for storing data with and retrieving data from a
document. These routines essentially provide pseudo-resources which plug-ins can attach to documents and
use to communicate with each other. Each resource is a handle of data and is identified by a 4 character
code (ResType) and a one-based index. (**NOTE: Some sort of registry needs to be set up for resource
types. No such registry yet exists. For now, please contact AppleLink: ADOBE.WISE or
[email protected] to discuss registering a type. **)
The first fully functional version of the suite is version 3. The routines in that version are:
CountPIResources( )
• pascal int16 CountPIResources (ResType ofType);
This routine returns a count of the number of resources of a given type.
GetPIResource( )
• pascal Handle GetPIResource (ResType ofType, int16 index);
This routine returns the indicated resource for the current document or NULL if no resource exists
with that type and index. The handle returned belongs to the host and should be treated as a read
only handle.
DeletePIResource( )
• pascal void DeletePIResource (ResType ofType, int16 index);
This routine deletes the resource that would have been returned by GetPIResource. Note that since
resources are identified by index rather than ID, this will cause subsequent resources to renumber.
AddPIResource( )
• pascal OSErr AddPIResource (ResType ofType, Handle data);
This routine adds a resource of the given type at the end of the list for that type. The contents of
data are duplicated so that the plug-in retains control over the original handle. If there is not enough
memory or the document already has too many plug-in resources (the limit in Photoshop is 1000),
this routine will return memFullErr.
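As an illustration, a plug-in might attach a settings blob to the current document like this ('MYpi' is a
hypothetical, unregistered resource type; the ResourceProcs field names follow PIGeneral.h conventions and
should be verified against your header):
static OSErr SaveSettings (ResourceProcs *procs, const void *settings, int32 size)
{
	OSErr err = memFullErr;
	Handle h = NewHandle (size);	/* Mac Toolbox allocation; the suite copies the data */
	if (h != NULL)
	{
		BlockMove (settings, *h, size);
		if (procs->countProc ('MYpi') > 0)	/* replace any previous copy */
			procs->deleteProc ('MYpi', 1);
		err = procs->addProc ('MYpi', h);
		DisposeHandle (h);	/* the suite duplicated the contents */
	}
	return err;
}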
Handle Suite
The use of handles in the pseudo-resource suite poses a problem for platforms other than the Macintosh
where a direct equivalent may not exist. In those cases, Photoshop chooses a specific model for what it
expects of a handle. The following suite of routines is used primarily for cross-platform support purposes
since on the Macintosh, handles are handles. The one additional feature gained by using these routines
rather than the Macintosh toolbox is that Photoshop will account for these handles in its VM space
calculations. Hence, it is important to free any handles allocated using this suite by calling the
DisposePIHandle routine provided in this suite.
Here are the routines in version 1 of the suite:
NewPIHandle ( )
• pascal Handle NewPIHandle (int32 size);
This routine allocates a handle of the indicated size. It returns NULL if the handle could not be
allocated.
DisposePIHandle ( )
• pascal void DisposePIHandle (Handle h);
This routine disposes of the indicated handle.
GetPIHandleSize ( )
• pascal int32 GetPIHandleSize (Handle h);
This routine returns the size of the indicated handle.
SetPIHandleSize ( )
• pascal OSErr SetPIHandleSize (Handle h, int32 newSize);
This routine attempts to resize the indicated handle. It returns noErr if successful and an error code
if unsuccessful.
LockPIHandle ( )
• pascal Ptr LockPIHandle (Handle h, Boolean moveHigh);
This routine locks and dereferences the handle. Optionally, the routine will move the handle to the
high end of memory before locking it. This routine really only matters for cross platform
implementations.
UnlockPIHandle ( )
• pascal void UnlockPIHandle (Handle h);
This routine unlocks the handle. Unlike the routines for buffers, the lock and unlock calls for
handles do not nest - a single unlock call unlocks the handle no matter how many times it has been
locked. This routine really only matters for cross platform implementations.
RecoverSpaceProc ( )
• pascal void (*RecoverSpaceProc) (int32 size);
All handles allocated through the Handle Suite have their space accounted for in Photoshop's estimates of
how much image data it can make resident at one time. If you obtain a handle via the handle suite (or some
other mechanism in Photoshop) which you are supposed to dispose of using the DisposePIHandle callback
but instead dispose of in some other way (e.g., use the handle as the parameter to AddResource and then
close the resource file), then you can use this call to tell Photoshop to stop reserving space for the handle.
General Notes
Macintosh
Global Variables
Most Macintosh development systems reference global variables by using negative offsets from register A5.
If a plug-in were to try to use global variables in the standard way, its global variable space would overlap
Adobe Photoshop's global variable space, usually resulting in a quick and fiery death.
The solution is to write the code in such a way as to not require global variables. In most cases, it is
possible to replace global variables with additional procedure parameters. One case where this is impossible
is with static data, which must be preserved between calls to the plug-in. Static data can be stored by
allocating a handle, storing the static data in memory pointed to by the handle, and storing the handle in the
data parameter, which Adobe Photoshop preserves between calls. This is the approach taken in the sample
plug-in code with this kit since it allows the code to work equally well under any development environment.
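A sketch of this pattern (Globals and LockGlobals are illustrative names; classic Toolbox calls from
Memory.h are assumed):
typedef struct Globals
{
	short lastSelector;	/* example of static state to preserve */
} Globals;

/* Returns a locked pointer to the plug-in's globals, allocating them on first
   call. `data` is the host-preserved long described earlier; the caller must
   HUnlock ((Handle) *data) when finished. */
static Globals *LockGlobals (long *data)
{
	Handle h;
	if (*data == 0)
	{
		h = NewHandle (sizeof (Globals));
		if (h == NULL)
			return NULL;	/* caller should report memFullErr */
		*data = (long) h;
	}
	h = (Handle) *data;
	HLock (h);
	return (Globals *) *h;
}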
Segmentation
Macintosh 680x0 applications have a special code segment called the jump table. When a routine in one
segment calls a routine in another segment, it actually calls a small glue routine in the jump table segment.
This glue routine loads the routine's segment into memory if needed, and jumps to its actual location.
The jump table is accessed using positive offsets from register A5. Since Photoshop is already using A5 for
its jump table, the plug-in cannot use a jump table in the standard way.
The simplest way to solve this is to link all the plug-in's code into a single segment. This usually requires
setting optional compilation/link flags under most development environments if the resultant segment
exceeds 32K.
About Boxes
All five kinds of plug-in should respond to a selector value of zero, which means display an about box. The
plug-in actually has complete freedom to display any kind of about box it wishes, but to fit in smoothly with
the Adobe Photoshop interface it should obey the following conventions:
1. It should be centered on the main (menu-bar) screen, with 1/3 of the remaining space above the dialog,
and 2/3 below. Be sure to take into account the menu bar height.
2. It should not have an OK button, but should instead respond to a click anywhere in its dialog.
3. It should respond to the return and enter keys.
If you have placed multiple plug-in modules in a single file, only one of them should display an about box,
which should describe all of the modules. When Photoshop attempts to bring up the about box for a plug-in,
it will make the about box call for all of the plug-ins in the same file.
Configuration
Photoshop plug-ins may assume 128K or larger ROMs, and System 6.0.2 or later. PiPL-only plug-ins (i.e.,
Photoshop 3.0 or later) may assume System 7. Keep in mind that Photoshop will run, and thus your plug-in
may be called, on machines as old as the Mac Plus. Thus, plug-ins should not assume the presence of features
they require but should check for them: 68020 or 68030 processors, math co-processors, 256K ROMs, and Color
or 32-Bit QuickDraw. Photoshop 3.0 requires a 68020 or better, Color QuickDraw, and 32-Bit QuickDraw, so if you
restrict your plug-in so that it only runs under Photoshop 3.0, you can assume these features are present.
Windows
Configuration
Photoshop plug-ins may assume Windows 3.1 in standard or enhanced mode and an 80386 processor.
About the Sample Plug-ins
Macintosh Version
The 6 sample plug-ins included with this kit can be built using MPW. They have been tested against MPW
3.3.1.
The kit includes new header files. PIGeneral.h and PITypes.h contain definitions useful across multiple
plug-ins. PIAbout.h contains the information for the about box call for all plug-in types. PIAcquire.h,
PIExport.h, PIFilter.h, and PIFormat.h are the header files for the respective types of plug-in modules.
Also included are two sets of utilities: DialogUtilities and PIUtilities.
DialogUtilities.c and DialogUtilities.h provide general support for doing things with dialogs including
creating movable modal dialogs which make appropriate calls back to the host to update windows and such.
PIUtilities.c and PIUtilities.h contain various routines and macros to make it easier to use the host callbacks.
The macros make various assumptions about how global variables are being handled. None of the routines
worry about switching A5 worlds since the sample plug-ins do not use A5 worlds. If you do not follow the
model for dealing with globals (basically not using them) used in the sample plug-in code, you will probably
have to modify these files. Remember, it is VERY bad to call back to the host with the wrong A5 world!
Windows Version
The kit includes two sets of utilities: PIUtilities and Windows Utilities.
PIUtilities.c and PIUtilities.h contain various routines and macros to make it easier to use the host callbacks.
The macros make various assumptions about how global variables are being handled. If you do not follow
the model for dealing with globals (basically not using them) used in the sample plug-in code, you will
probably have to modify these files.
Winutils.c provides support for some Mac Toolbox functions used in PIUtilities.c, namely memory
management functions (e.g., NewHandle( ), etc.).
Structure packing for all records (i.e., FilterRecord, FormatRecord, AcquireRecord, ExportRecord and
AboutRecord) should be the default for the machine (this has changed for 32-bit plug-ins for speed reasons);
however, the Info structures (FilterInfo, FormatInfo, etc.) must be packed to byte boundaries. This means the
PiMI resource should be byte aligned as before. These packing changes are reflected in the appropriate
header files using #pragma pack(1) to set byte packing and #pragma pack( ) to restore default packing.
These pragmas work only with Microsoft Visual C++ and the Windows 32-bit SDK environment tools. If you are
using a different compiler, such as Symantec C++ or Borland C++, you have to modify the header files with
appropriate pragmas. The Borland #pragmas still appear in the header files as they did in the 16-bit plug-in
kit, but are untested.
You need a DLLInit ( ) function prototyped as
BOOL APIENTRY DLLInit (HANDLE, DWORD, LPVOID);
The actual name of this entry point is provided to the linker by the
PSDLLENTRY=DLLInit
assignment in the sample makefiles.
The way that messages are packed into wParam and lParam has changed for Win32. You will need to
ensure that your window procedures extract the appropriate information correctly. A new header file
"WinUtil.h" defines all the Win32 message crackers for cross-compilation, or you may simply change your
extractions to the Win32 versions. (See The Win32 Application Programming Interface: An Overview for
more information on Win32 message parameter packing.)
Be sure that the definitions for your Windows callback functions (dialog box functions, etc.) conform to the
Win32 model. The most common problem is the use of "WORD wParam" for callback functions. The
plug-in examples use
BOOL WINAPI MyDlgProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
which will work correctly for both 16 and 32 bit compilation.
The Windows kit also includes two executable utilities, MacToDos.exe and CnvtPiPL.exe.
MacToDos.exe lets you convert Macintosh text files into PC text files. If you happen to pick up your plug-in
kit from an Adobe server, some source files may be in Macintosh format. You may have to convert them into
PC format before compiling them.
CnvtPiPL.exe lets you convert a PiPL resource in Macintosh format (an ASCII format which conforms to the
PiPL resource template) into the Windows PiPL format.
These two utilities are included under the Util sub-directory.
Acquisition Modules
Basics
The code resource and file type for acquisition modules is '8BAM'.
The AcquireRecord Structure
The stuff parameter contains a pointer to a structure of the following type:
typedef struct AcquireRecord
{
int32 serialNumber;
TestAbortProc abortProc;
ProgressProc progressProc;
int32 maxData;
int16 imageMode;
Point imageSize;
int16 depth;
int16 planes;
Fixed imageHRes;
Fixed imageVRes;
LookUpTable redLUT;
LookUpTable greenLUT;
LookUpTable blueLUT;
void * data;
Rect theRect;
int16 loPlane;
int16 hiPlane;
int16 colBytes;
int32 rowBytes;
int32 planeBytes;
Str255 fileName;
int16 vRefNum;
Boolean dirty;
OSType hostSig;
ProcPtr hostProc;
int32 hostModes;
int16 planeMap[16];
Boolean canTranspose;
Boolean needTranspose;
Handle duotoneInfo;
int32 diskSpace;
SpaceProc spaceProc;
PlugInMonitor monitor;
void * platformData;
BufferProcs * bufferProcs;
ResourceProcs * resourceProcs;
ProcessEventProc processEvent;
Boolean canReadBack;
Boolean wantReadBack;
Boolean acquireAgain;
Boolean canFinalize;
DisplayPixelsProc displayPixels;
HandleProcs * handleProcs;
Boolean wantFinalize;
char reserved1[3];
ColorServicesProc colorServices;
AdvanceStateProc advanceState;
char reserved[216];
} AcquireRecord;
Record Fields
• serialNumber
This field contains Adobe Photoshop's serial number. Plug-in modules can use this value for copy
protection, if desired.
• abortProc
This field contains a pointer to the TestAbort callback documented in the general documentation.
• progressProc
This field contains a pointer to the UpdateProgress callback documented in the general