ITU-T Rec. H.222.0 (05/2006) Information technology

International Telecommunication Union
ITU-T
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU
H.222.0 (05/2006)
SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS
Infrastructure of audiovisual services – Transmission multiplexing and synchronization
Information technology – Generic coding of moving pictures and associated audio information: Systems
ITU-T Recommendation H.222.0
ITU-T H-SERIES RECOMMENDATIONS
AUDIOVISUAL AND MULTIMEDIA SYSTEMS

H.100–H.199  CHARACTERISTICS OF VISUAL TELEPHONE SYSTEMS
             INFRASTRUCTURE OF AUDIOVISUAL SERVICES
H.200–H.219    General
H.220–H.229    Transmission multiplexing and synchronization
H.230–H.239    Systems aspects
H.240–H.259    Communication procedures
H.260–H.279    Coding of moving video
H.280–H.299    Related systems aspects
H.300–H.349    Systems and terminal equipment for audiovisual services
H.350–H.359    Directory services architecture for audiovisual and multimedia services
H.360–H.369    Quality of service architecture for audiovisual and multimedia services
H.450–H.499    Supplementary services for multimedia
             MOBILITY AND COLLABORATION PROCEDURES
H.500–H.509    Overview of Mobility and Collaboration, definitions, protocols and procedures
H.510–H.519    Mobility for H-Series multimedia systems and services
H.520–H.529    Mobile multimedia collaboration applications and services
H.530–H.539    Security for mobile multimedia systems and services
H.540–H.549    Security for mobile multimedia collaboration applications and services
H.550–H.559    Mobility interworking procedures
H.560–H.569    Mobile multimedia collaboration inter-working procedures
             BROADBAND AND TRIPLE-PLAY MULTIMEDIA SERVICES
H.610–H.619    Broadband multimedia services over VDSL

For further details, please refer to the list of ITU-T Recommendations.
INTERNATIONAL STANDARD ISO/IEC 13818-1
ITU-T RECOMMENDATION H.222.0
Information technology – Generic coding of moving pictures and associated audio information: Systems
Summary
This Recommendation | International Standard specifies the system layer of the coding. It was developed in 1994
principally to support the combination and synchronization of video and audio coding methods defined in Parts 2 and 3 of
ISO/IEC 13818. Since 1994, this standard has been extended to support additional video coding specifications (ISO/IEC
14496-2 and ISO/IEC 14496-10), audio coding specifications (ISO/IEC 13818-7 and ISO/IEC 14496-3), system streams
(ISO/IEC 14496-1 and ISO/IEC 15938-1), IPMP (ISO/IEC 13818-11) as well as generic metadata. The system layer
supports six basic functions:
1) the synchronization of multiple compressed streams on decoding;
2) the interleaving of multiple compressed streams into a single stream;
3) the initialization of buffering for decoding start up;
4) continuous buffer management;
5) time identification; and
6) multiplexing and signalling of various components in a system stream.
An ITU-T Rec. H.222.0 | ISO/IEC 13818-1 multiplexed bit stream is either a Transport Stream or a Program Stream.
Both streams are constructed from PES packets and packets containing other necessary information. Both stream types
support multiplexing of video and audio compressed streams from one program with a common time base. The
Transport Stream additionally supports the multiplexing of video and audio compressed streams from multiple
programs with independent time bases. For almost error-free environments the Program Stream is generally more
appropriate, supporting software processing of program information. The Transport Stream is more suitable for use in
environments where errors are likely.
An ITU-T Rec. H.222.0 | ISO/IEC 13818-1 multiplexed bit stream, whether a Transport Stream or a Program Stream, is
constructed in two layers: the outermost layer is the system layer, and the innermost is the compression layer. The system
layer provides the functions necessary for using one or more compressed data streams in a system. The video and audio
parts of this Specification define the compression coding layer for audio and video data. Coding of other types of data is
not defined by this Recommendation | International Standard, but is supported by the system layer provided that the other
types of data adhere to the constraints defined in this Recommendation | International Standard.
Source
ITU-T Recommendation H.222.0 was approved on 29 May 2006 by ITU-T Study Group 16 (2005-2008) under the
ITU-T Recommendation A.8 procedure. An identical text is also published as ISO/IEC 13818-1.
FOREWORD
The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of
telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of
ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing
Recommendations on them with a view to standardizing telecommunications on a worldwide basis.
The World Telecommunication Standardization Assembly (WTSA), which meets every four years,
establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on
these topics.
The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.
In some areas of information technology which fall within ITU-T's purview, the necessary standards are
prepared on a collaborative basis with ISO and IEC.
NOTE
In this Recommendation, the expression "Administration" is used for conciseness to indicate both a
telecommunication administration and a recognized operating agency.
Compliance with this Recommendation is voluntary. However, the Recommendation may contain certain
mandatory provisions (to ensure e.g. interoperability or applicability) and compliance with the
Recommendation is achieved when all of these mandatory provisions are met. The words "shall" or some
other obligatory language such as "must" and the negative equivalents are used to express requirements. The
use of such words does not suggest that compliance with the Recommendation is required of any party.
INTELLECTUAL PROPERTY RIGHTS
ITU draws attention to the possibility that the practice or implementation of this Recommendation may
involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence,
validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others
outside of the Recommendation development process.
As of the date of approval of this Recommendation, ITU had received notice of intellectual property,
protected by patents, which may be required to implement this Recommendation. However, implementers
are cautioned that this may not represent the latest information and are therefore strongly urged to consult the
TSB patent database at http://www.itu.int/ITU-T/ipr/.
© ITU 2007
All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the
prior written permission of ITU.
CONTENTS

SECTION 1 – GENERAL
1.1 Scope
1.2 Normative references
SECTION 2 – TECHNICAL ELEMENTS
2.1 Definitions
2.2 Symbols and abbreviations
2.3 Method of describing bit stream syntax
2.4 Transport Stream bitstream requirements
2.5 Program Stream bitstream requirements
2.6 Program and program element descriptors
2.7 Restrictions on the multiplexed stream semantics
2.8 Compatibility with ISO/IEC 11172
2.9 Registration of copyright identifiers
2.10 Registration of private data format
2.11 Carriage of ISO/IEC 14496 data
2.12 Carriage of metadata
2.13 Carriage of ISO 15938 data
2.14 Carriage of ITU-T Rec. H.264 | ISO/IEC 14496-10 video
Annex A – CRC decoder model
A.0 CRC decoder model
Annex B – Digital Storage Medium Command and Control (DSM-CC)
B.0 Introduction
B.1 General elements
B.2 Technical elements
Annex C – Program Specific Information
C.0 Explanation of Program Specific Information in Transport Streams
C.1 Introduction
C.2 Functional mechanism
C.3 The Mapping of Sections into Transport Stream Packets
C.4 Repetition rates and random access
C.5 What is a program?
C.6 Allocation of program_number
C.7 Usage of PSI in a typical system
C.8 The relationships of PSI structures
C.9 Bandwidth utilization and signal acquisition time
Annex D – Systems timing model and application implications of this Recommendation | International Standard
D.0 Introduction
Annex E – Data transmission applications
E.0 General considerations
E.1 Suggestion
Annex F – Graphics of syntax for this Recommendation | International Standard
F.0 Introduction
Annex G – General information
G.0 General information
Annex H – Private data
H.0 Private data
Annex I – Systems conformance and real-time interface
I.0 Systems conformance and real-time interface
Annex J – Interfacing jitter-inducing networks to MPEG-2 decoders
J.0 Introduction
J.1 Network compliance models
J.2 Network specification for jitter smoothing
J.3 Example decoder implementations
Annex K – Splicing Transport Streams
K.0 Introduction
K.1 The different types of splicing point
K.2 Decoder behaviour on splices
Annex L – Registration procedure (see 2.9)
L.1 Procedure for the request of a Registered Identifier (RID)
L.2 Responsibilities of the Registration Authority
L.3 Responsibilities of parties requesting an RID
L.4 Appeal procedure for denied applications
Annex M – Registration application form (see 2.9)
M.1 Contact information of organization requesting a Registered Identifier (RID)
M.2 Statement of an intention to apply the assigned RID
M.3 Date of intended implementation of the RID
M.4 Authorized representative
M.5 For official use only of the Registration Authority
Annex N
Annex O – Registration procedure (see 2.10)
O.1 Procedure for the request of an RID
O.2 Responsibilities of the Registration Authority
O.3 Contact information for the Registration Authority
O.4 Responsibilities of parties requesting an RID
O.5 Appeal procedure for denied applications
Annex P – Registration application form
P.1 Contact information of organization requesting an RID
P.2 Request for a specific RID
P.3 Short description of RID that is in use and date system that was implemented
P.4 Statement of an intention to apply the assigned RID
P.5 Date of intended implementation of the RID
P.6 Authorized representative
P.7 For official use of the Registration Authority
Annex Q – T-STD and P-STD buffer models for ISO/IEC 13818-7 ADTS
Q.1 Introduction
Q.2 Leak rate from Transport Buffer
Q.3 Buffer size
Q.4 Conclusion
Annex R – Carriage of ISO/IEC 14496 scenes in ITU-T Rec. H.222.0 | ISO/IEC 13818-
R.1 Content access procedure for ISO/IEC 14496 program components within a Program Stream
R.2 Content access procedure for ISO/IEC 14496 program components within a Transport Stream
Introduction
The systems part of this Recommendation | International Standard addresses the combining of one or more elementary
streams of video and audio, as well as other data, into single or multiple streams which are suitable for storage or
transmission. Systems coding follows the syntactical and semantic rules imposed by this Specification and provides
information to enable synchronized decoding of decoder buffers over a wide range of retrieval or receipt conditions.
System coding shall be specified in two forms: the Transport Stream and the Program Stream. Each is optimized for
a different set of applications. Both the Transport Stream and Program Stream defined in this Recommendation |
International Standard provide coding syntax which is necessary and sufficient to synchronize the decoding and
presentation of the video and audio information, while ensuring that data buffers in the decoders do not overflow or
underflow. Information is coded in the syntax using time stamps concerning the decoding and presentation of coded
audio and visual data and time stamps concerning the delivery of the data stream itself. Both stream definitions are
packet-oriented multiplexes.
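As a concrete illustration of the time stamps mentioned above: presentation and decoding time stamps (PTS/DTS) are 33-bit counts of a 90 kHz clock carried in the PES packet header. The minimal Python sketch below assumes the 5-byte encoding used by the PES header syntax (a 4-bit prefix, then the 33 bits split into groups of 3, 15 and 15 bits, each group followed by a marker bit set to '1'). The function names are hypothetical, and only the time-stamp field itself is handled, not the surrounding header.

```python
def decode_timestamp(field: bytes) -> int:
    """Decode a 5-byte PTS/DTS field into its 33-bit value (90 kHz units).

    Assumed layout: 4-bit prefix, bits 32..30, marker,
    bits 29..15, marker, bits 14..0, marker.
    """
    if len(field) != 5:
        raise ValueError("PTS/DTS field must be 5 bytes")
    # Each bit group is followed by a marker bit that must be '1'.
    if not (field[0] & 1 and field[2] & 1 and field[4] & 1):
        raise ValueError("marker bit not set")
    return (((field[0] >> 1) & 0x07) << 30
            | field[1] << 22
            | (field[2] >> 1) << 15
            | field[3] << 7
            | field[4] >> 1)


def encode_timestamp(value: int, prefix: int = 0b0010) -> bytes:
    """Inverse of decode_timestamp; prefix 0b0010 is assumed (PTS only)."""
    value &= (1 << 33) - 1  # time stamps are 33 bits and wrap around
    return bytes([
        (prefix << 4) | ((value >> 29) & 0x0E) | 1,  # bits 32..30 + marker
        (value >> 22) & 0xFF,                        # bits 29..22
        ((value >> 14) & 0xFE) | 1,                  # bits 21..15 + marker
        (value >> 7) & 0xFF,                         # bits 14..7
        ((value << 1) & 0xFE) | 1,                   # bits 6..0 + marker
    ])
```

Dividing the decoded value by 90 000 gives the time stamp in seconds on the program's time base; for example, a value of 90 000 corresponds to 1 s.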
The basic multiplexing approach for single video and audio elementary streams is illustrated in Figure Intro. 1. The
video and audio data is encoded as described in ITU-T Rec. H.262 | ISO/IEC 13818-2 and ISO/IEC 13818-3. The
resulting compressed elementary streams are packetized to produce PES packets. Information needed to use PES
packets independently of either Transport Streams or Program Streams may be added when PES packets are formed.
This information is not needed and need not be added when PES packets are further combined with system level
information to form Transport Streams or Program Streams. This systems standard covers those processes to the
right of the vertical dashed line.
[Figure: video data and audio data pass through a video encoder and an audio encoder, then through a video PES packetizer and an audio PES packetizer, into a PS mux (Program Stream) and a TS mux (Transport Stream); a dashed line marks the extent of the systems specification.]
Figure Intro. 1 – Simplified overview of the scope of this Recommendation | International Standard
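The PES packets produced by the packetizers in Figure Intro. 1 all start with the same six bytes: a 24-bit packet_start_code_prefix (0x000001), an 8-bit stream_id and a 16-bit PES_packet_length. The sketch below reads only these common fields, as a hedged illustration of the PES layer. The helper names and the coarse audio/video classification of stream_id values are illustrative assumptions, and the optional header fields that follow (flags, PTS/DTS, and so on) are not parsed.

```python
from typing import NamedTuple


class PesStart(NamedTuple):
    stream_id: int
    # Number of bytes following this field; 0 means "not specified",
    # which is permitted only for video carried in a Transport Stream.
    packet_length: int


def parse_pes_start(data: bytes) -> PesStart:
    """Parse the six bytes common to every PES packet."""
    if len(data) < 6:
        raise ValueError("need at least 6 bytes")
    if data[0:3] != b"\x00\x00\x01":
        raise ValueError("missing packet_start_code_prefix 0x000001")
    return PesStart(stream_id=data[3],
                    packet_length=int.from_bytes(data[4:6], "big"))


def stream_kind(stream_id: int) -> str:
    """Coarse classification: 110x xxxx audio, 1110 xxxx video."""
    if 0xE0 <= stream_id <= 0xEF:
        return "video"
    if 0xC0 <= stream_id <= 0xDF:
        return "audio"
    return "other"  # private, padding, system streams, etc.
```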
The Program Stream is analogous to the ISO/IEC 11172 Systems layer. It results from combining one or more
streams of PES packets, which have a common time base, into a single stream.
For applications that require the elementary streams which comprise a single program to be in separate streams which
are not multiplexed, the elementary streams can also be encoded as separate Program Streams, one per elementary
stream, with a common time base. In this case the values encoded in the SCR fields of the various streams shall be
consistent.
As with a single Program Stream, all the elementary streams can then be decoded in synchronization.
The Program Stream is designed for use in relatively error-free environments and is suitable for applications which may
involve software processing of system information, such as interactive multimedia applications. Program Stream
packets may be of variable length and relatively long.
The Transport Stream combines one or more programs with one or more independent time bases into a single stream.
PES packets made up of elementary streams that form a program share a common time base. The Transport Stream is
designed for use in environments where errors are likely, such as storage or transmission in lossy or noisy media.
Transport Stream packets are 188 bytes in length.
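To make the fixed packet size concrete, the sketch below splits a byte buffer into 188-byte packets and decodes the 4-byte header that begins each one: the 0x47 sync_byte, the error and payload-start flags, the 13-bit PID, the scrambling control, adaptation_field_control and the 4-bit continuity_counter. The bit layout follows the transport packet syntax specified in the normative part of this Specification; the function and class names are hypothetical, and the buffer is assumed to start on a packet boundary.

```python
from typing import Iterator, NamedTuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


class TsHeader(NamedTuple):
    transport_error: bool
    payload_unit_start: bool
    pid: int
    scrambling: int
    adaptation_field_control: int
    continuity_counter: int


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the fixed 4-byte header of one Transport Stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("Transport Stream packets are 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("lost sync: first byte is not 0x47")
    return TsHeader(
        transport_error=bool(packet[1] & 0x80),
        payload_unit_start=bool(packet[1] & 0x40),
        # bit 0x20 of byte 1 is transport_priority (not decoded here)
        pid=((packet[1] & 0x1F) << 8) | packet[2],
        scrambling=(packet[3] >> 6) & 0x03,
        adaptation_field_control=(packet[3] >> 4) & 0x03,
        continuity_counter=packet[3] & 0x0F,
    )


def split_packets(stream: bytes) -> Iterator[bytes]:
    """Yield whole 188-byte packets; a trailing partial packet is dropped."""
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        yield stream[offset:offset + TS_PACKET_SIZE]
```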
Program and Transport Streams are designed for different applications and their definitions do not strictly follow a
layered model. It is possible and reasonable to convert from one to the other; however, one is not a subset or superset of
the other. In particular, extracting the contents of a program from a Transport Stream and creating a valid Program
Stream is possible and is accomplished through the common interchange format of PES packets, but not all of the fields
needed in a Program Stream are contained within the Transport Stream; some must be derived. The Transport Stream
may be used to span a range of layers in a layered model, and is designed for efficiency and ease of implementation in
high bandwidth applications.
The scope of the syntactical and semantic rules set forth in the systems specification differs: the syntactical rules apply to
systems layer coding only, and do not extend to the compression layer coding of the video and audio specifications; by
contrast, the semantic rules apply to the combined stream in its entirety.
The systems specification does not specify the architecture or implementation of encoders or decoders, nor those of
multiplexors or demultiplexors. However, bit stream properties do impose functional and performance requirements on
encoders, decoders, multiplexors and demultiplexors. For instance, encoders must meet minimum clock tolerance
requirements. Notwithstanding this and other requirements, a considerable degree of freedom exists in the design and
implementation of encoders, decoders, multiplexors, and demultiplexors.
Intro. 1 Transport Stream
The Transport Stream is a stream definition which is tailored for communicating or storing one or more programs of
coded data according to ITU-T Rec. H.262 | ISO/IEC 13818-2 and ISO/IEC 13818-3 and other data in environments in
which significant errors may occur. Such errors may be manifested as bit value errors or loss of packets.
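One way a receiver can notice the loss of packets is through the 4-bit continuity_counter in each Transport Stream packet header, which increments per PID for packets that carry a payload. The hedged sketch below flags counter jumps in an already-parsed packet list. It deliberately simplifies the normative rules: it accepts a repeated counter value as a duplicate packet (without limiting the number of repeats), skips null packets on PID 0x1FFF and ignores the discontinuity_indicator, so it is a diagnostic aid rather than a conformance check. The function name is hypothetical.

```python
NULL_PID = 0x1FFF


def find_discontinuities(packets):
    """Return (index, pid) for packets whose continuity_counter jumps.

    packets: iterable of (pid, adaptation_field_control, continuity_counter).
    """
    last_cc = {}  # most recent counter value seen per PID
    gaps = []
    for index, (pid, afc, cc) in enumerate(packets):
        # Only packets with a payload ('01' or '11') advance the counter.
        if pid == NULL_PID or afc not in (0b01, 0b11):
            continue
        prev = last_cc.get(pid)
        # Expect prev + 1 (mod 16); an equal value is treated as a duplicate.
        if prev is not None and cc not in ((prev + 1) % 16, prev):
            gaps.append((index, pid))
        last_cc[pid] = cc
    return gaps
```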
Transport Streams may be either fixed or variable rate. In either case the constituent elementary streams may either be
fixed or variable rate. The syntax and semantic constraints on the stream are identical in each of these cases. The
Transport Stream rate is defined by the values and locations of Program Clock Reference (PCR) fields, which in general
are separate PCR fields for each program.
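The relationship between PCRs and the transport rate can be written out explicitly. Each PCR counts the 27 MHz system clock and is carried as a 33-bit base and a 9-bit extension, with PCR = PCR_base × 300 + PCR_ext. The rate between two PCRs of the same program is the number of bytes between them divided by the elapsed time: rate = (i2 − i1) × 27 000 000 / (PCR(i2) − PCR(i1)) bytes per second, where i1 and i2 are the byte positions at which the two PCRs occur (the normative text fixes exactly which byte of the field is meant). The Python sketch below assumes the 6-byte PCR field layout (33-bit base, 6 reserved bits, 9-bit extension) and tolerates a single wrap-around of the PCR; the function names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 27_000_000
PCR_MODULUS = (1 << 33) * 300  # base wraps at 2**33, extension counts 0..299


def decode_pcr(field: bytes) -> int:
    """6-byte PCR field -> value in 27 MHz ticks (base * 300 + extension)."""
    base = (field[0] << 25 | field[1] << 17 | field[2] << 9
            | field[3] << 1 | field[4] >> 7)
    extension = (field[4] & 0x01) << 8 | field[5]  # reserved bits skipped
    return base * 300 + extension


def transport_rate(byte_i1: int, pcr1: int, byte_i2: int, pcr2: int) -> float:
    """Bytes per second between two PCRs of the same program."""
    ticks = (pcr2 - pcr1) % PCR_MODULUS  # tolerate one wrap-around
    if ticks == 0:
        raise ValueError("PCR values must differ")
    return (byte_i2 - byte_i1) * SYSTEM_CLOCK_HZ / ticks
```

For example, 188 000 bytes arriving between two PCRs that differ by 27 000 000 ticks (one second) give a rate of 188 000 bytes per second.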
There are some difficulties with constructing and delivering a Transport Stream containing multiple programs with
independent time bases such that the overall bit rate is variable. Refer to 2.4.2.2.
The Transport Stream may be constructed by any method that results in a valid stream. It is possible to construct
Transport Streams containing one or more programs from elementary coded data streams, from Program Streams, or
from other Transport Streams which may themselves contain one or more programs.
The Transport Stream is designed in such a way that several operations on a Transport Stream are possible with
minimum effort. Among these are:
1) Retrieve the coded data from one program within the Transport Stream, decode it and present the decoded results as shown in Figure Intro. 2.
2) Extract the Transport Stream packets from one program within the Transport Stream and produce as output a different Transport Stream with only that one program as shown in Figure Intro. 3.
3) Extract the Transport Stream packets of one or more programs from one or more Transport Streams and produce as output a different Transport Stream (not illustrated).
4) Extract the contents of one program from the Transport Stream and produce as output a Program Stream containing that one program as shown in Figure Intro. 4.
5) Take a Program Stream, convert it into a Transport Stream to carry it over a lossy environment, and then recover a valid, and in certain cases, identical Program Stream.
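Operation 2) can be sketched at the packet level as a PID filter. In this minimal sketch the set of PIDs belonging to the wanted program is assumed to be known already (in practice it is signalled by the Program Specific Information described in Annex C), and the function name is hypothetical. A complete remultiplexer would also have to rewrite the PSI tables and, as noted for Figure Intro. 3, correct PCR values; neither step is shown here.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def extract_pids(stream: bytes, wanted_pids: set) -> bytes:
    """Keep only the 188-byte packets whose PID is in wanted_pids.

    Assumes the input is packet-aligned; raises on a missing sync byte.
    """
    out = bytearray()
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte {offset}")
        pid = ((packet[1] & 0x1F) << 8) | packet[2]  # 13-bit PID
        if pid in wanted_pids:
            out += packet
    return bytes(out)
```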
Figure Intro. 2 and Figure Intro. 3 illustrate prototypical demultiplexing and decoding systems which take as input a
Transport Stream. Figure Intro. 2 illustrates the first case, where a Transport Stream is directly demultiplexed and
decoded. Transport Streams are constructed in two layers:
– a system layer; and
– a compression layer.
The input stream to the Transport Stream decoder has a system layer wrapped about a compression layer. Input streams
to the Video and Audio decoders have only the compression layer.
Operations performed by the prototypical decoder which accepts Transport Streams either apply to the entire Transport
Stream ("multiplex-wide operations"), or to individual elementary streams ("stream-specific operations"). The
Transport Stream system layer is divided into two sub-layers, one for multiplex-wide operations (the Transport Stream
packet layer), and one for stream-specific operations (the PES packet layer).
A prototypical decoder for Transport Streams, including audio and video, is also depicted in Figure Intro. 2 to illustrate
the function of a decoder. The architecture is not unique – some system decoder functions, such as decoder timing
control, might equally well be distributed among elementary stream decoders and the channel-specific decoder – but
this figure is useful for discussion. Likewise, indication of errors detected by the channel-specific decoder to the
individual audio and video decoders may be performed in various ways and such communication paths are not shown in
the diagram. The prototypical decoder design does not imply any normative requirement for the design of a Transport
Stream decoder. Indeed non-audio/video data is also allowed, but not shown.
[Figure: a Transport Stream containing one or multiple programs arrives over a channel at a channel-specific decoder, then a Transport Stream demultiplex and decoder, which feeds a video decoder (decoded video) and an audio decoder (decoded audio), with clock control.]
Figure Intro. 2 – Prototypical transport demultiplexing and decoding example
Figure Intro. 3 illustrates the second case, where a Transport Stream containing multiple programs is converted into a
Transport Stream containing a single program. In this case the re-multiplexing operation may necessitate the correction
of Program Clock Reference (PCR) values to account for changes in the PCR locations in the bit stream.
[Figure: a Transport Stream containing multiple programs passes from the channel through a channel-specific decoder and a Transport Stream demultiplex and decoder, producing a Transport Stream with a single program.]
Figure Intro. 3 – Prototypical transport multiplexing example
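The PCR correction mentioned above amounts to adding, to each PCR that is moved, the extra delay (in 27 MHz ticks) that the re-multiplexer has introduced for that packet, modulo the PCR range, and writing the result back into the 6-byte field. The sketch below assumes the extra delay is derived from a byte shift in an output stream of constant rate; the function names are hypothetical, the reserved bits are written as '1', and jitter-free output scheduling is assumed.

```python
SYSTEM_CLOCK_HZ = 27_000_000
PCR_MODULUS = (1 << 33) * 300  # combined base*300 + extension range


def delay_ticks(byte_shift: int, output_rate: float) -> int:
    """Extra delay (27 MHz ticks) when a packet moves byte_shift bytes later
    in an output stream assumed to run at output_rate bytes per second."""
    return round(byte_shift * SYSTEM_CLOCK_HZ / output_rate)


def corrected_pcr(pcr: int, extra_ticks: int) -> int:
    """Shift a PCR by the added delay; negative shifts also wrap correctly."""
    return (pcr + extra_ticks) % PCR_MODULUS


def encode_pcr(pcr: int) -> bytes:
    """27 MHz value -> 6-byte field: 33-bit base, 6 reserved '1' bits, 9-bit ext."""
    base, extension = divmod(pcr % PCR_MODULUS, 300)
    return bytes([
        (base >> 25) & 0xFF,
        (base >> 17) & 0xFF,
        (base >> 9) & 0xFF,
        (base >> 1) & 0xFF,
        ((base & 1) << 7) | 0x7E | (extension >> 8),
        extension & 0xFF,
    ])
```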
Figure Intro. 4 illustrates a case in which a multi-program Transport Stream is first demultiplexed and then converted
into a Program Stream.
Figures Intro. 3 and Intro. 4 indicate that it is possible and reasonable to convert between different types and
configurations of Transport Streams. There are specific fields defined in the Transport Stream and Program Stream
syntax which facilitate the conversions illustrated. There is no requirement that specific implementations of
demultiplexors or decoders include all of these functions.
[Figure: a Transport Stream containing multiple programs passes from the channel through a channel-specific decoder to a Transport Stream demultiplex and Program Stream multiplexor, which outputs a Program Stream.]
Figure Intro. 4 – Prototypical Transport Stream to Program Stream conversion
Intro. 2 Program Stream
The Program Stream is a stream definition which is tailored for communicating or storing one program of coded data
and other data in environments where errors are very unlikely, and where processing of system coding, e.g., by
software, is a major consideration.
Program Streams may be either fixed or variable rate. In either case, the constituent elementary streams may be either
fixed or variable rate. The syntax and semantic constraints on the stream are identical in each case. The Program
Stream rate is defined by the values and locations of the System Clock Reference (SCR) and mux_rate fields.
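For the Program Stream, the SCR and mux_rate together give the intended arrival time of every byte of a pack: t(i) = SCR(i′) / 27 000 000 + (i − i′) / (mux_rate × 50), where i′ is the byte position of the SCR (the normative text specifies exactly which byte), SCR(i′) is expressed in 27 MHz units (SCR_base × 300 + SCR_ext), and mux_rate is coded in units of 50 bytes per second. The short sketch below simply evaluates this expression; the function name is hypothetical and the fields are assumed to be already parsed from the pack header.

```python
SYSTEM_CLOCK_HZ = 27_000_000
MUX_RATE_UNIT = 50  # mux_rate is coded in units of 50 bytes/second


def arrival_time(byte_index: int, scr_byte_index: int, scr: int,
                 mux_rate: int) -> float:
    """Intended arrival time (s) of a byte in the pack carrying the SCR.

    scr is in 27 MHz units (SCR_base * 300 + SCR_extension).
    """
    if mux_rate <= 0:
        raise ValueError("mux_rate must be positive")
    return (scr / SYSTEM_CLOCK_HZ
            + (byte_index - scr_byte_index) / (mux_rate * MUX_RATE_UNIT))
```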
A prototypical audio/video Program Stream decoder system is depicted in Figure Intro. 5. The architecture is not unique
– system decoder functions including decoder timing control might equally well be distributed among elementary
stream decoders and the channel-specific decoder – but this figure is useful for discussion. The prototypical decoder
design does not imply any normative requirement for the design of a Program Stream decoder. Indeed, non-audio/video
data is also allowed, but not shown.
[Figure: a Program Stream arrives over a channel at a channel-specific decoder, then a Program Stream decoder, which feeds a video decoder (decoded video) and an audio decoder (decoded audio), with clock control.]
Figure Intro. 5 – Prototypical decoder for Program Streams
The prototypical decoder for Program Streams shown in Figure Intro. 5 is composed of System, Video and Audio
decoders conforming to Parts 1, 2 and 3, respectively, of ISO/IEC 13818. In this decoder, the multiplexed coded
representation of one or more audio and/or video streams is assumed to be stored or communicated on some channel in
some channel-specific format. The channel-specific format is not governed by this Recommendation | International
Standard, nor is the channel-specific decoding part of the prototypical decoder.
The prototypical decoder accepts as input a Program Stream and relies on a Program Stream Decoder to extract timing
information from the stream. The Program Stream Decoder demultiplexes the stream, and the elementary streams so
produced serve as inputs to Video and Audio decoders, whose outputs are decoded video and audio signals. Included in
the design, but not shown in the figure, is the flow of timing information among the Program Stream decoder, the Video
and Audio decoders, and the channel-specific decoder. The Video and Audio decoders are synchronized with each other
and with the channel using this timing information.
Program Streams are constructed in two layers: a system layer and a compression layer. The input stream to the
Program Stream Decoder has a system layer wrapped about a compression layer. Input streams to the Video and Audio
decoders have only the compression layer.
Operations performed by the prototypical decoder either apply to the entire Program Stream ("multiplex-wide
operations"), or to individual elementary streams ("stream-specific operations"). The Program Stream system layer is
divided into two sub-layers, one for multiplex-wide operations (the pack layer), and one for stream-specific operations
(the PES packet layer).
Intro. 3 Conversion between Transport Stream and Program Stream
It may be possible and reasonable to convert between Transport Streams and Program Streams by means of PES
packets. This results from the specification of Transport Stream and Program Stream as embodied in 2.4.1 and 2.5.1
of the normative requirements of this Recommendation | International Standard. PES packets may, with some
constraints, be mapped directly from the payload of one multiplexed bit stream into the payload of another multiplexed
bit stream. It is possible to identify the correct order of PES packets in a program to assist with this if the
program_packet_sequence_counter is present in all PES packets.
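As a minimal sketch of how the counter could be used, assume the 7-bit program_packet_sequence_counter of the PES packet header (which wraps to 0 after 127) has already been parsed out of each packet. The hypothetical helper below orders packets by their forward distance, modulo 128, from the first packet in the list. It further assumes that this first packet is the earliest one and that the set spans fewer than 128 packets; a real converter would also have to cope with lost packets.

```python
from typing import List, Tuple

COUNTER_MODULUS = 128  # program_packet_sequence_counter is a 7-bit field


def order_pes_packets(packets: List[Tuple[int, bytes]]) -> List[Tuple[int, bytes]]:
    """Hypothetical helper: return (counter, pes_bytes) pairs in program order.

    Assumes every packet carries the counter, the first list entry is the
    earliest packet, and fewer than 128 packets are being ordered.
    """
    if not packets:
        return []
    reference = packets[0][0]

    def forward_distance(item: Tuple[int, bytes]) -> int:
        # Distance from the reference counter, allowing for wrap-around at 128.
        return (item[0] - reference) % COUNTER_MODULUS

    return sorted(packets, key=forward_distance)
```

For example, counters 126, 0 and 127 are returned in the order 126, 127, 0.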
Certain other information necessary for conversion, e.g., the relationship between elementary streams, is available in
tables and headers in both streams. Such data, if available, shall be correct in any stream before and after conversion.
Intro. 4 Packetized Elementary Stream
Transport Streams and Program Streams are each logically constructed from PES packets, as indicated in the syntax
definitions in 2.4.3.6. PES packets shall be used to convert between Transport Streams and Program Streams; in some
cases the PES packets need not be modified when performing such conversions. PES packets may be much larger than
the size of a Transport Stream packet.
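To illustrate how one PES packet may occupy several Transport Stream packets, the sketch below splits a PES packet into 188-byte packets with 4-byte headers. It sets payload_unit_start_indicator only on the first packet and fills out the last one with an adaptation field of 0xFF stuffing bytes. The function name packetize_pes is hypothetical, and the sketch deliberately omits PCRs, scrambling and every other adaptation-field option.

```python
from typing import List

TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
MAX_PAYLOAD = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes


def packetize_pes(pes: bytes, pid: int, start_cc: int = 0) -> List[bytes]:
    """Hypothetical helper: split one PES packet into 188-byte TS packets.

    Sketch only: no PCR, no scrambling, no other adaptation-field options.
    The last packet is filled out with an adaptation field of stuffing bytes.
    """
    packets: List[bytes] = []
    cc = start_cc & 0x0F        # 4-bit continuity_counter
    offset = 0
    while offset < len(pes):
        chunk = pes[offset:offset + MAX_PAYLOAD]
        stuffing = MAX_PAYLOAD - len(chunk)
        pusi = 0x40 if offset == 0 else 0x00   # payload_unit_start_indicator
        afc = 0x30 if stuffing else 0x10       # '11' AF + payload, '01' payload only
        header = bytes([0x47,                  # sync_byte
                        pusi | ((pid >> 8) & 0x1F),
                        pid & 0xFF,
                        afc | cc])
        if stuffing == 0:
            adaptation = b""
        elif stuffing == 1:
            adaptation = b"\x00"               # adaptation_field_length = 0
        else:
            # adaptation_field_length, a flags byte with every flag clear, 0xFF stuffing
            adaptation = bytes([stuffing - 1, 0x00]) + b"\xff" * (stuffing - 2)
        packets.append(header + adaptation + chunk)
        offset += len(chunk)
        cc = (cc + 1) & 0x0F                   # increments for every packet with payload
    return packets
```

A 400-byte PES packet, for instance, becomes three Transport Stream packets carrying 184, 184 and 32 payload bytes.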
A continuous sequence of PES packets of one elementary stream with one stream ID may be used to construct a PES
Stream. When PES packets are used to form a PES stream, they shall include Elementary Stream Clock Reference
(ESCR) fields and Elementary Stream Rate (ES_Rate) fields, with constraints as defined in 2.4.3.8. The PES stream
data shall be contiguous bytes from the elementary stream in their original order. PES streams do not contain some
necessary system information which is contained in Program Streams and Transport Streams. Examples include the
information in the Pack Header, System Header, Program Stream Map, Program Stream Directory, Program Map Table,
and elements of the Transport Stream packet syntax.
The PES Stream is a logical construct that may be useful within implementations of this Recommendation |
International Standard; however, it is not defined as a stream for interchange and interoperability. Applications
requiring streams containing only one elementary stream can use Program Streams or Transport Streams which each
contain only one elementary stream. These streams contain all of the necessary system information. Multiple Program
Streams or Transport Streams, each containing a single elementary stream, can be constructed with a common time base
and therefore carry a complete program, i.e., with audio and video.
Intro. 5 Timing model
Systems, Video and Audio all have a timing model in which the end-to-end delay from the signal input to an encoder to
the signal output from a decoder is a constant. This delay is the sum of encoding, encoder buffering, multiplexing,
communication or storage, demultiplexing, decoder buffering, decoding, and presentation delays. As part of this timing
model all video pictures and audio samples are presented exactly once, unless specifically coded to the contrary, and the
inter-picture interval and audio sample rate are the same at the decoder as at the encoder. The system stream coding
contains timing information which can be used to implement systems which embody constant end-to-end delay. It is
possible to implement decoders which do not follow this model exactly; however, in such cases it is the decoder's
responsibility to perform in an acceptable manner. The timing is embodied in the normative specifications of this
Recommendation | International Standard, which must be adhered to by all valid bit streams, regardless of the means of
creating them.
All timing is defined in terms of a common system clock, referred to as a System Time Clock. In the Program Stream
this clock may have an exactly specified ratio to the video or audio sample clocks, or it may have an operating
frequency which differs slightly from the exact ratio while still providing precise end-to-end timing and clock recovery.
In the Transport Stream the system clock frequency is constrained to have the exactly specified ratio to the audio and
video sample clocks at all times; the effect of this constraint is to simplify sample rate recovery in decoders.
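The practical meaning of an exactly specified ratio can be shown with a little arithmetic on the nominal 27 MHz system clock. The hypothetical helper below counts system clock ticks per video frame or per audio frame using exact fractions. The frame rates and the 1152-sample audio frame size are illustrative choices, not values taken from this clause: 25 Hz and 30000/1001 Hz frames and 1152 samples at 48 kHz come out as whole numbers of ticks, while 1152 samples at 44.1 kHz do not.

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 27_000_000  # nominal system clock frequency


def ticks_per_period(rate_hz: Fraction, periods: int = 1) -> Fraction:
    """Hypothetical helper: system clock ticks spanned by `periods` cycles at rate_hz."""
    return Fraction(SYSTEM_CLOCK_HZ) * periods / Fraction(rate_hz)


# 25 Hz video: 1080000 ticks per frame; 30000/1001 Hz video: 900900 ticks per frame.
# 1152-sample audio frames at 48 kHz: 648000 ticks; at 44.1 kHz: 34560000/49 ticks.
for label, rate, n in [("25 Hz frame", Fraction(25), 1),
                       ("29.97 Hz frame", Fraction(30000, 1001), 1),
                       ("48 kHz, 1152 samples", Fraction(48000), 1152),
                       ("44.1 kHz, 1152 samples", Fraction(44100), 1152)]:
    print(label, ticks_per_period(rate, n))
```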
Intro. 6 Conditional access
Encryption and scrambling for conditional access to programs encoded in the Program and Transport Streams is
supported by the system data stream definitions. Conditional access mechanisms are not specified here. The stream
definitions are designed so that implementation of practical conditional access systems is reasonable, and there are some
syntactical elements specified which provide specific support for such systems.
Intro. 7 Multiplex-wide operations
Multiplex-wide operations include the coordination of data retrieval of the channel, the adjustment of clocks, and the
management of buffers. The tasks are intimately related. If the rate of data delivery of the channel is controllable, then
data delivery may be adjusted so that decoder buffers neither overflow nor underflow; but if the data rate is not
controllable, then elementary stream decoders must slave their timing to the data received from the channel to avoid
overflow or underflow.
Program Streams are composed of packs whose headers facilitate the above tasks. Pack headers specify intended times
at which each byte is to enter the Program Stream Decoder from the channel, and this target arrival schedule serves as a
reference for clock correction and buffer management. The schedule need not be followed exactly by decoders, but they
must compensate for deviations about it.
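For a Program Stream, the intended arrival schedule can be written out directly. Let i' be the index of the byte containing the last bit of the SCR_base field. Byte i is then intended to arrive at SCR(i')/system_clock_frequency + (i − i')/(program_mux_rate × 50), because program_mux_rate is coded in units of 50 bytes/s. The sketch below evaluates that expression as we read it from 2.5.3.4; the function names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 27_000_000  # nominal system_clock_frequency


def scr_value(scr_base: int, scr_ext: int) -> int:
    """SCR in 27 MHz ticks from its 33-bit base (90 kHz) and 9-bit extension."""
    return scr_base * 300 + scr_ext


def arrival_time(i: int, i_ref: int, scr_base: int, scr_ext: int,
                 program_mux_rate: int) -> float:
    """Hypothetical helper: intended arrival time, in seconds, of byte i.

    i_ref is the index of the byte holding the last bit of SCR_base;
    program_mux_rate is coded in units of 50 bytes/s.
    """
    return (scr_value(scr_base, scr_ext) / SYSTEM_CLOCK_HZ
            + (i - i_ref) / (program_mux_rate * 50))
```

With SCR_base = 90000 (one second) and program_mux_rate = 1000 (50 000 bytes/s), the byte 50 000 positions after i' is due two seconds after the time base origin.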
Similarly, Transport Streams are composed of Transport Stream packets with headers containing information which
specifies the times at which each byte is intended to enter a Transport Stream Decoder from the channel. This schedule
provides exactly the same function as that which is specified in the Program Stream.
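In a Transport Stream the corresponding schedule is interpolated between consecutive PCRs of the same program. The transport rate is the number of bytes between the two PCR-bearing bytes divided by the PCR difference in seconds, and the bytes in between are placed linearly. The sketch below follows that reading of 2.4.2, assuming both PCRs have already been decoded into 27 MHz ticks. It also allows for the PCR wrapping after 2^33 × 300 ticks. All names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 27_000_000
PCR_WRAP = (1 << 33) * 300  # PCR = base * 300 + ext, base is 33 bits


def transport_rate(i_prev: int, pcr_prev: int, i_next: int, pcr_next: int) -> float:
    """Hypothetical helper: bytes per second between two consecutive PCRs."""
    ticks = (pcr_next - pcr_prev) % PCR_WRAP  # tolerate one wrap of the PCR
    return (i_next - i_prev) * SYSTEM_CLOCK_HZ / ticks


def byte_arrival_time(i: int, i_prev: int, pcr_prev: int,
                      i_next: int, pcr_next: int) -> float:
    """Hypothetical helper: intended arrival time (s) of byte i, i_prev <= i <= i_next."""
    rate = transport_rate(i_prev, pcr_prev, i_next, pcr_next)
    return pcr_prev / SYSTEM_CLOCK_HZ + (i - i_prev) / rate
```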
An additional multiplex-wide operation is a decoder's ability to establish what resources are required to decode a
Transport Stream or Program Stream. The first pack of each Program Stream conveys parameters to assist decoders in
this task. Included, for example, are the stream's maximum data rate and the highest number of simultaneous video
channels. The Transport Stream likewise contains globally useful information.
The Transport Stream and Program Stream each contain information which identifies the pertinent characteristics of,
and relationships between, the elementary streams which constitute each program. Such information may include the
language spoken in audio channels, as well as the relationship between video streams when multi-layer video coding is
implemented.
Intro. 8 Individual stream operations (PES Packet Layer)
The principal stream-specific operations are:
1) demultiplexing; and
2) synchronizing playback of multiple elementary streams.
Intro. 8.1 Demultiplexing
On encoding, Program Streams are formed by multiplexing elementary streams, and Transport Streams are formed by
multiplexing elementary streams, Program Streams, or the contents of other Transport Streams. Elementary streams
may include private, reserved, and padding streams in addition to audio and video streams. The streams are temporally
subdivided into packets, and the packets are serialized. A PES packet contains coded bytes from one and only one
elementary stream.
In the Program Stream both fixed and variable packet lengths are allowed subject to constraints as specified in 2.5.1
and 2.5.2. For Transport Streams the packet length is 188 bytes. Both fixed and variable PES packet lengths are
allowed, and will be relatively long in most applications.
On decoding, demultiplexing is required to reconstitute elementary streams from the multiplexed Program Stream or
Transport Stream. Stream_id codes in Program Stream packet headers, and Packet ID codes in the Transport Stream
make this possible.
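A minimal sketch of PID-based demultiplexing follows, assuming a well-formed stream of 188-byte packets that are already aligned on the 0x47 sync byte. It reads the 13-bit PID and the adaptation_field_control bits from the 4-byte packet header and concatenates payload bytes per PID. Continuity_counter checks, error indicators and PSI parsing are deliberately left out, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def ts_payload(packet: bytes) -> bytes:
    """Hypothetical helper: return the payload of one Transport Stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a Transport Stream packet")
    afc = (packet[3] >> 4) & 0x03   # adaptation_field_control
    offset = 4
    if afc & 0x02:                  # adaptation field present
        offset += 1 + packet[4]     # adaptation_field_length byte plus its contents
    if not afc & 0x01:              # no payload in this packet
        return b""
    return packet[offset:]


def demux_by_pid(packets: Iterable[bytes]) -> Dict[int, bytes]:
    """Hypothetical helper: concatenate payload bytes per 13-bit PID."""
    streams: Dict[int, bytearray] = defaultdict(bytearray)
    for pkt in packets:
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        streams[pid] += ts_payload(pkt)
    return {pid: bytes(data) for pid, data in streams.items()}
```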
Intro. 8.2 Synchronization
Synchronization among multiple elementary streams is accomplished with Presentation Time Stamps (PTS) in the
Program Stream and Transport streams. Time stamps are generally in units of 90 kHz, but the System Clock Reference
(SCR), the Program Clock Reference (PCR) and the optional Elementary Stream Clock Reference (ESCR) have
extensions with a resolution of 27 MHz. Decoding of N-elementary streams is synchronized by adjusting the decoding
of streams to a common master time base rather than by adjusting the decoding of one stream to match that of another.
The master time base may be one of the N-decoders' clocks, the data source's clock, or it may be some external clock.
Each program in a Transport Stream, which may contain multiple programs, may have its own time base. The time
bases of different programs within a Transport Stream may be different.
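The relation between the 90 kHz and 27 MHz units can be made concrete. SCR, PCR and ESCR values are carried as a 33-bit base in 90 kHz units plus a 9-bit extension counting 0 to 299 in 27 MHz units, so the full value is base × 300 + ext. The helpers below are an illustrative sketch of that split; their names are hypothetical.

```python
from typing import Tuple

BASE_WRAP = 1 << 33  # the base field is 33 bits wide


def split_clock_reference(ticks_27mhz: int) -> Tuple[int, int]:
    """Hypothetical helper: split a 27 MHz count into (base, ext).

    base counts 90 kHz periods (300 ticks each) and wraps at 2**33;
    ext is the remaining 0..299 ticks of the 27 MHz clock.
    """
    return (ticks_27mhz // 300) % BASE_WRAP, ticks_27mhz % 300


def join_clock_reference(base: int, ext: int) -> int:
    """Hypothetical helper: recombine base and ext into 27 MHz ticks."""
    return base * 300 + ext
```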
Because PTSs apply to the decoding of individual elementary streams, they reside in the PES packet layer of both the
Transport Streams and Program Streams. End-to-end synchronization occurs when encoders save time stamps at capture
time, when the time stamps propagate with associated coded data to decoders, and when decoders use those time stamps
to schedule presentations.
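As an illustration of a time stamp in the PES packet layer, the 33-bit PTS (and DTS) is carried in five bytes. These consist of a 4-bit prefix, then the value split into groups of 3, 15 and 15 bits, each group followed by a marker_bit set to '1'. The sketch below encodes and decodes that layout as we understand it from the PES packet syntax in 2.4.3.6; the helper names are hypothetical.

```python
PTS_MASK = (1 << 33) - 1  # PTS and DTS are 33-bit counts of a 90 kHz clock


def encode_timestamp(ts: int, prefix: int = 0x2) -> bytes:
    """Hypothetical helper: pack a 33-bit PTS/DTS into the 5-byte PES header form.

    prefix is the 4-bit code ahead of the value: 0x2 for a PTS alone,
    0x3 for a PTS followed by a DTS, 0x1 for that DTS.
    """
    ts &= PTS_MASK
    return bytes([
        (prefix << 4) | (((ts >> 30) & 0x07) << 1) | 1,  # bits 32..30 + marker_bit
        (ts >> 22) & 0xFF,                               # bits 29..22
        (((ts >> 15) & 0x7F) << 1) | 1,                  # bits 21..15 + marker_bit
        (ts >> 7) & 0xFF,                                # bits 14..7
        ((ts & 0x7F) << 1) | 1,                          # bits 6..0 + marker_bit
    ])


def decode_timestamp(field: bytes) -> int:
    """Hypothetical helper: recover the 33-bit value, ignoring prefix and marker bits."""
    return (((field[0] >> 1) & 0x07) << 30
            | field[1] << 22
            | ((field[2] >> 1) & 0x7F) << 15
            | field[3] << 7
            | field[4] >> 1)
```

A PTS of 90000 (one second) encodes as the bytes 21 00 05 BF 21 in hexadecimal.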
Synchronization of a decoding system with a channel is achieved through the use of the SCR in the Program Stream and
by its analogue, the PCR, in the Transport Stream. The SCR and PCR are time stamps encoding the timing of the bit
stream itself, and are derived from the same time base used for the audio and video PTS values from the same program.
Since each program may have its own time base, there are separate PCR fields for each program in a Transport Stream
containing multiple programs. In some cases it may be possible for programs to share PCR fields. Refer to 2.4.4,
Program Specific Information (PSI), for the method of identifying which PCR is associated with a program. A program
shall have one and only one PCR time base associated with it.
Intro. 8.3 Relation to compression layer
The PES packet layer is independent of the compression layer in some senses, but not in all. It is independent in the
sense that PES packet payloads need not start at compression layer start codes, as defined in Parts 2 and 3 of
ISO/IEC 13818. For example, video start codes may occur anywhere within the payload of a PES packet, and start
codes may be split by a PES packet header. However, time stamps encoded in PES packet headers apply to presentation
times of compression layer constructs (namely, presentation units). In addition, when the elementary stream data
conforms to ITU-T Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 13818-3, the PES_packet_data_bytes shall be byte
aligned to the bytes of this Recommendation | International Standard.
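Because start codes may be split by a PES packet header, a parser that scans PES payloads for the 0x000001 prefix has to carry bytes over from one payload to the next. The generator below is a minimal sketch of that idea. It keeps the last three bytes of each payload and reports each start code once, with its offset in the reassembled elementary stream. It assumes the PES headers have already been stripped and that payloads arrive in order; the function name is hypothetical.

```python
from typing import Iterable, Iterator, Tuple

PREFIX = b"\x00\x00\x01"


def find_start_codes(payloads: Iterable[bytes]) -> Iterator[Tuple[int, int]]:
    """Hypothetical helper: yield (stream_offset, code_byte) for each 0x000001xx.

    The last three bytes of each payload are carried forward, so a start code
    split across PES packets is still found; offsets count elementary-stream bytes.
    """
    tail = b""
    consumed = 0  # elementary-stream bytes seen before the current payload
    for payload in payloads:
        buf = tail + payload
        base = consumed - len(tail)  # stream offset of buf[0]
        pos = buf.find(PREFIX)
        # Only report codes whose fourth byte is already available.
        while pos != -1 and pos + 3 < len(buf):
            yield base + pos, buf[pos + 3]
            pos = buf.find(PREFIX, pos + 1)
        consumed += len(payload)
        tail = buf[-3:]
```

For example, payloads 00 00 and 01 B3 AA 00 00 01 00 yield a sequence_header_code (0xB3) at offset 0 and a picture_start_code (0x00) at offset 5, even though the first prefix straddles the two payloads.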
Intro. 9 System reference decoder
Part 1 of ISO/IEC 13818 employs a "System Target Decoder" (STD), one for Transport Streams (refer to 2.4.2) referred
to as "Transport System Target Decoder" (T-STD) and one for Program Streams (refer to 2.5.2) referred to as "Program
System Target Decoder" (P-STD), to provide a formalism for timing and buffering relationships. Because the STD is
parameterized in terms of ITU-T Rec. H.222.0 | ISO/IEC 13818-1 fields (for example, buffer sizes) each elementary
stream leads to its own parameterization of the STD. Encoders shall produce bit streams that meet the appropriate STD's
constraints. Physical decoders may assume that a stream plays properly on its STD. The physical decoder must
compensate for ways in which its design differs from that of the STD.
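The STD itself is defined in 2.4.2 and 2.5.2. Purely as an illustration of the kind of buffering constraint an encoder must respect, the hypothetical check below tracks the fullness of a single input buffer of size buffer_size. It adds bytes at their arrival times and removes each access unit instantaneously at its decoding time. This is a simplified sketch, not the T-STD or P-STD: it ignores transport buffers, leak rates and the distinction between access units and other data, and it treats negative fullness as a stand-in for underflow.

```python
from typing import List, Tuple


def check_std_buffer(arrivals: List[Tuple[float, int]],
                     removals: List[Tuple[float, int]],
                     buffer_size: int) -> bool:
    """Hypothetical check: True if a simplified input buffer never over/underflows.

    arrivals: (arrival_time, byte_count) for data entering the buffer.
    removals: (decoding_time, access_unit_size) removed instantaneously.
    At equal times, arrivals are applied before removals (an assumption).
    """
    # Tag events with 0 (arrival) or 1 (removal) so arrivals sort first on ties.
    events = [(t, 0, n) for t, n in arrivals] + [(t, 1, -n) for t, n in removals]
    fullness = 0
    for _, _, delta in sorted(events):
        fullness += delta
        if fullness > buffer_size:
            return False  # overflow: more data than the buffer can hold
        if fullness < 0:
            return False  # underflow: access unit removed before its bytes arrived
    return True
```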
Intro. 10 Applications
The streams defined in this Recommendation | International Standard are intended to be as useful as possible to a wide
variety of applications. Application developers should select the most appropriate stream.
Modern data communications networks may be capable of supporting ITU-T Rec. H.262 | ISO/IEC 13818-2 video and ISO/IEC 13818-3 audio. A real-time transport protocol is required. The Program Stream may be suitable for transmission on such networks.
The Program Stream is also suitable for multimedia applications on CD-ROM. Software processing of the Program
Stream may be appropriate.
The Transport Stream may be more suitable for error-prone environments, such as those used for distributing
compressed bit-streams over long-distance networks and in broadcast systems.
Many applications require storage and retrieval of ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstreams on various Digital
Storage Media (DSM). A Digital Storage Media Command and Control (DSM-CC) protocol is specified in Annex B
and Part 6 of ISO/IEC 13818 in order to facilitate the control of such media.
ISO/IEC 13818-1:2007 (E)
INTERNATIONAL STANDARD
ITU-T RECOMMENDATION
Information technology – Generic coding of moving pictures and
associated audio information: Systems
SECTION 1 – GENERAL
1.1 Scope
This Recommendation | International Standard specifies the system layer of the coding. It was developed principally to
support the combination of the video and audio coding methods defined in Parts 2 and 3 of ISO/IEC 13818. The system
layer supports six basic functions:
1) the synchronization of multiple compressed streams on decoding;
2) the interleaving of multiple compressed streams into a single stream;
3) the initialization of buffering for decoding start up;
4) continuous buffer management;
5) time identification;
6) multiplexing and signalling of various components in a system stream.
An ITU-T Rec. H.222.0 | ISO/IEC 13818-1 multiplexed bit stream is either a Transport Stream or a Program
Stream. Both streams are constructed from PES packets and packets containing other necessary information. Both
stream types support multiplexing of video and audio compressed streams from one program with a common time base.
The Transport Stream additionally supports the multiplexing of video and audio compressed streams from multiple
programs with independent time bases. For almost error-free environments the Program Stream is generally more
appropriate, supporting software processing of program information. The Transport Stream is more suitable for use in
environments where errors are likely.
An ITU-T Rec. H.222.0 | ISO/IEC 13818-1 multiplexed bit stream, whether a Transport Stream or a Program Stream, is
constructed in two layers: the outermost layer is the system layer, and the innermost is the compression layer. The
system layer provides the functions necessary for using one or more compressed data streams in a system. The video
and audio parts of this Specification define the compression coding layer for audio and video data. Coding of other
types of data is not defined by this Specification, but is supported by the system layer provided that the other types of
data adhere to the constraints defined in 2.7.
1.2 Normative references
The following Recommendations and International Standards contain provisions which, through reference in this text,
constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated
were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this
Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent
edition of the Recommendations and Standards listed below. Members of IEC and ISO maintain registers of currently
valid International Standards. The Telecommunication Standardization Bureau of the ITU maintains a list of currently
valid ITU-T Recommendations.
1.2.1 Identical Recommendations | International Standards
– ITU-T Recommendation H.262 (2000) | ISO/IEC 13818-2:2000, Information technology – Generic coding of moving pictures and associated audio information: Video.
1.2.2 Paired Recommendations | International Standards equivalent in technical content
– ITU-T Recommendation H.264 (2005), Advanced video coding for generic audiovisual services.
  ISO/IEC 14496-10:2005, Information technology – Coding of audio-visual objects – Part 10: Advanced video coding.
– ITU-T Recommendation T.171 (1996), Protocols for interactive audiovisual services: coded representation of multimedia and hypermedia objects.
  ISO/IEC 13522-1:1997, Information technology – Coding of Multimedia and Hypermedia information – Part 1: MHEG object representation – Base notation (ASN.1).
1.2.3 Additional references
– ISO 639-2:1998, Codes for the representation of names of languages – Part 2: Alpha-3 code.
– ISO 8859-1:1998, Information technology – 8-bit single-byte coded graphic character sets – Part 1: Latin alphabet No. 1.
– ISO 15706:2002, Information and documentation – International Standard Audiovisual Number (ISAN).
– ISO/PRF 15706-2, Information and documentation – International Standard audiovisual number (ISAN) – Part 2: Version identifier.
– ISO/IEC 11172-1:1993, Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 1: Systems.
– ISO/IEC 11172-2:1993, Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 2: Video.
– ISO/IEC 11172-3:1993, Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 3: Audio.
– ISO/IEC 13818-3:1998, Information technology – Generic coding of moving pictures and associated audio information – Part 3: Audio.
– ISO/IEC 13818-6:1998, Information technology – Generic coding of moving pictures and associated audio information – Part 6: Extensions for DSM-CC.
– ISO/IEC 13818-7:2006, Information technology – Generic coding of moving pictures and associated audio information – Part 7: Advanced Audio Coding (AAC).
– ISO/IEC 13818-11:2004, Information technology – Generic coding of moving pictures and associated audio information – Part 11: IPMP on MPEG-2 systems.
– ISO/IEC 14496-1:2004, Information technology – Coding of audio-visual objects – Part 1: Systems.
– ISO/IEC 14496-2:2004, Information technology – Coding of audio-visual objects – Part 2: Visual.
– ISO/IEC 14496-3:2005, Information technology – Coding of audio-visual objects – Part 3: Audio.
– Recommendation ITU-R BT.601-6 (2007), Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios.
– Recommendation ITU-R BT.470-7 (2005), Conventional analogue television systems.
– Recommendation ITU-R BR.648, Digital recording of audio signals.
– ITU-T Recommendation J.17 (1988), Pre-emphasis used on sound-programme circuits.
– IEC Publication 60908:1999, Audio recording – Compact disc digital audio system.
SECTION 2 – TECHNICAL ELEMENTS
2.1 Definitions
For the purposes of this Recommendation | International Standard, the following definitions apply. If specific to a Part,
this is parenthetically noted.
2.1.1 access unit (system): A coded representation of a presentation unit. In the case of audio, an access unit is the coded representation of an audio frame.
In the case of video, an access unit includes all the coded data for a picture, and any stuffing that follows it, up to but
not including the start of the next access unit. If a picture is not preceded by a group_start_code or a
sequence_header_code, the access unit begins with the picture start code. If a picture is preceded by a group_start_code
and/or a sequence_header_code, the access unit begins with the first byte of the first of these start codes. If it is the last
picture preceding a sequence_end_code in the bitstream, all bytes between the last byte of the coded picture and the
sequence_end_code (including the sequence_end_code) belong to the access unit.
For the definition of an access unit for ITU-T Rec. H.264 | ISO/IEC 14496-10 video, see the AVC access unit definition
in 2.1.3.
2.1.2 AVC 24-hour picture (system): An AVC access unit with a presentation time that is more than 24 hours in the future. For the purpose of this definition, AVC access unit n has a presentation time that is more than 24 hours in the future if the difference between the initial arrival time t_ai(n) and the DPB output time t_o,dpb(n) is more than 24 hours.
2.1.3 AVC access unit (system): An access unit as defined for byte streams in ITU-T Rec. H.264 | ISO/IEC 14496-10 with the constraints specified in 2.14.1.
2.1.4 AVC Slice (system): A byte_stream_nal_unit as defined in ITU-T Rec. H.264 | ISO/IEC 14496-10 with nal_unit_type values of 1 or 5, or a byte_stream_nal_unit data structure with nal_unit_type value of 2 and any associated byte_stream_nal_unit data structures with nal_unit_type equal to 3 and/or 4.
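A minimal sketch of how nal_unit_type values (the low five bits of the first NAL unit header byte) map onto this definition follows. The grouping function is hypothetical and assumes that partitions B and C arrive right after the partition A they belong to; a real demultiplexer would confirm that association through slice_id.

```python
from typing import List

SLICE_TYPES = {1, 5}        # coded slice of a non-IDR / IDR picture
PARTITION_A = 2             # slice data partition A
PARTITIONS_B_C = {3, 4}     # slice data partitions B and C


def nal_unit_type(nal_header: int) -> int:
    """nal_unit_type is the low 5 bits of the first NAL unit header byte."""
    return nal_header & 0x1F


def group_avc_slices(nal_types: List[int]) -> List[List[int]]:
    """Hypothetical helper: group nal_unit_type values into AVC Slices.

    Types 1 and 5 each form a slice on their own; type 2 opens a slice that
    absorbs immediately following type 3/4 units. Other NAL units are skipped.
    """
    slices: List[List[int]] = []
    current: List[int] = []  # open partition-A group, if any
    for t in nal_types:
        if t in PARTITIONS_B_C and current:
            current.append(t)
            continue
        if current:
            slices.append(current)
            current = []
        if t in SLICE_TYPES:
            slices.append([t])
        elif t == PARTITION_A:
            current = [t]
    if current:
        slices.append(current)
    return slices
```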
2.1.5 AVC still picture (system): An AVC still picture consists of an AVC access unit containing an IDR picture, preceded by SPS and PPS NAL units that carry sufficient information to correctly decode the IDR picture. Preceding an AVC still picture, there shall be another AVC still picture or an End of Sequence NAL unit terminating a preceding coded video sequence unless the AVC still picture is the very first access unit in the video stream.
2.1.6 AVC video sequence (system): Coded video sequence as defined in 3.30 of ITU-T Rec. H.264 | ISO/IEC 14496-10.
2.1.7 AVC video stream (system): An ITU-T Rec. H.264 | ISO/IEC 14496-10 stream. An AVC video stream consists of one or more AVC video sequences.
2.1.8 bitrate: The rate at which the compressed bit stream is delivered from the channel to the input of a decoder.
2.1.9 byte aligned: A bit in a coded bit stream is byte-aligned if its position is a multiple of 8 bits from the first bit in the stream.
2.1.10 channel: A digital medium that stores or transports an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream.
2.1.11 coded B-frame: A B-frame picture or a pair of B-field pictures.
2.1.12 coded frame: A coded frame is a coded I-frame, coded B-frame or a coded P-frame.
2.1.13 coded I-frame: An I-frame picture or a pair of field pictures where the first field picture is an I-picture and
the second field picture is either an I-picture or a P-picture.
2.1.14 coded P-frame: A P-frame picture or a pair of P-field pictures.
2.1.15 coded representation: A data element as represented in its encoded form.
2.1.16 compression: Reduction in the number of bits used to represent an item of data.
2.1.17 constant bitrate: Operation where the bitrate is constant from start to finish of the compressed bit stream.
2.1.18 constrained system parameter stream; CSPS (system): A Program Stream for which the constraints
defined in 2.7.9 apply.
2.1.19 Cyclic Redundancy Check (CRC): A check value used to verify the correctness of data.
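The CRC_32 carried in PSI sections is, to our understanding, the variant commonly catalogued as CRC-32/MPEG-2: polynomial 0x04C11DB7, initial register value 0xFFFFFFFF, bits processed most significant first, and no final inversion. A straightforward bitwise sketch follows (the function name is hypothetical). Appending the CRC in big-endian order makes the CRC computed over the whole section come out as zero, which is how a receiver can check a section.

```python
def crc32_mpeg2(data: bytes) -> int:
    """Hypothetical helper: CRC-32 with polynomial 0x04C11DB7, init 0xFFFFFFFF,
    MSB-first processing and no final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24  # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc
```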
2.1.20 data element: An item of data as represented before encoding and after decoding.
2.1.21 decoded stream: The decoded reconstruction of a compressed bit stream.
2.1.22 decoder: An embodiment of a decoding process.
2.1.23 decoding (process): The process defined in this Recommendation | International Standard that reads an input coded bit stream and outputs decoded pictures or audio samples.
2.1.24 decoding time-stamp; DTS (system): A field that may be present in a PES packet header that indicates the
time that an access unit is decoded in the system target decoder.
2.1.25 digital storage media (DSM): A digital storage or transmission device or system.
2.1.26 DSM-CC: Digital storage media command and control.
2.1.27 entitlement control message (ECM): Entitlement Control Messages are private conditional access
information which specify control words and possibly other, typically stream-specific, scrambling and/or control
parameters.
2.1.28 entitlement management message (EMM): Entitlement Management Messages are private conditional
access information which specify the authorization levels or the services of specific decoders. They may be addressed to
single decoders or groups of decoders.
2.1.29 editing: The process by which one or more compressed bit streams are manipulated to produce a new
compressed bit stream. Edited bit streams meet the same requirements as streams which are not edited.
2.1.30 elementary stream; ES (system): A generic term for one of the coded video, coded audio or other coded bit
streams in PES packets. One elementary stream is carried in a sequence of PES packets with one and only one
stream_id.
2.1.31 Elementary Stream Clock Reference; ESCR (system): A time stamp in the PES Stream from which
decoders of PES streams may derive timing.
2.1.32 encoder: An embodiment of an encoding process.
2.1.33 encoding (process): A process, not specified in this Recommendation | International Standard, that reads a
stream of input pictures or audio samples and produces a coded bit stream conforming to this Recommendation.
2.1.34 entropy coding: Variable length lossless coding of the digital representation of a signal to reduce redundancy.
2.1.35 event: An event is defined as a collection of elementary streams with a common time base, an associated start
time, and an associated end time.
2.1.36 fast forward playback (video): The process of displaying a sequence, or parts of a sequence, of pictures in
display-order faster than real-time.
2.1.37 forbidden: The term "forbidden", when used in the clauses of this Recommendation | International Standard
defining the coded bit stream, indicates that the value specified shall never be used.
2.1.38 metadata: Information to describe audiovisual content and data essence in a format defined by ISO or any
other authority.
2.1.39 metadata access unit: A global structure within metadata that defines the fraction of metadata that is
intended to be decoded at a specific instant in time. The internal structure of a metadata Access Unit is defined by the
format of the metadata.
2.1.40 metadata application format: Identifies the format of the application that uses the metadata; signals
application specific information for transport of metadata.
2.1.41 metadata decoder configuration information: Data needed by a receiver to decode a specific metadata
service. Depending on the format of the metadata, decoder configuration information may or may not be needed.
2.1.42 metadata format: Identifies the coding format of metadata.
2.1.43 metadata service: Coherent set of metadata of the same format delivered to a receiver for a specific purpose.
2.1.44 metadata service id: Identifier of a specific metadata service; used for some transport methods of the
metadata.
2.1.45 metadata stream: The concatenation or collection of metadata Access Units from one or more metadata
services.
2.1.46 (multiplexed) stream (system): A bit stream composed of 0 or more elementary streams combined in a
manner that conforms to this Recommendation | International Standard.
2.1.47 layer (video and systems): One of the levels in the data hierarchy of the video and system specifications
defined in Parts 1 and 2 of this Recommendation | International Standard.
2.1.48 pack (system): A pack consists of a pack header followed by zero or more packets. It is a layer in the system
coding syntax described in 2.5.3.3.
2.1.49 packet data (system): Contiguous bytes of data from an elementary stream present in a packet.
2.1.50 packet identifier; PID (system): A unique integer value used to identify elementary streams of a program in
a single or multi-program Transport Stream as described in 2.4.3.
2.1.51 padding (audio): A method to adjust the average length of an audio frame in time to the duration of the
corresponding PCM samples, by conditionally adding a slot to the audio frame.
2.1.52 payload: Payload refers to the bytes which follow the header bytes in a packet. For example, the payload of
some Transport Stream packets includes a PES_packet_header and its PES_packet_data_bytes, or pointer_field and
PSI sections, or private data; but a PES_packet_payload consists of only PES_packet_data_bytes. The Transport Stream
packet header and adaptation fields are not payload.
2.1.53 PES (system): An abbreviation for Packetized Elementary Stream.
2.1.54 PES packet (system): The data structure used to carry elementary stream data. A PES packet consists of a
PES packet header followed by a number of contiguous bytes from an elementary data stream. It is a layer in the system
coding syntax described in 2.4.3.6.
2.1.55 PES packet header (system): The leading fields in a PES packet up to and not including the
PES_packet_data_byte fields, where the stream is not a padding stream. In the case of a padding stream the PES packet
header is similarly defined as the leading fields in a PES packet up to and not including padding_byte fields.
2.1.56 PES Stream (system): A PES Stream consists of PES packets, all of whose payloads consist of data from a
single elementary stream, and all of which have the same stream_id. Specific semantic constraints apply. Refer to
Intro. 4.
2.1.57 presentation time-stamp; PTS (system): A field that may be present in a PES packet header that indicates
the time that a presentation unit is presented in the system target decoder.
2.1.58 presentation unit; PU (system): A decoded Audio Access Unit or a decoded picture.
2.1.59 program (system): A program is a collection of program elements. Program elements may be elementary
streams. Program elements need not have any defined time base; those that do, have a common time base and are
intended for synchronized presentation.
2.1.60 Program Clock Reference; PCR (system): A time stamp in the Transport Stream from which decoder
timing is derived.
2.1.61 program element (system): A generic term for one of the elementary streams or other data streams that may
be included in a program.
2.1.62 Program Specific Information; PSI (system): PSI consists of normative data which is necessary for the
demultiplexing of Transport Streams and the successful regeneration of programs and is described in 2.4.4. An example
of privately defined PSI data is the non-mandatory network information table.
2.1.63 random access: The process of beginning to read and decode the coded bit stream at an arbitrary point.
2.1.64 reserved: The term "reserved", when used in the clauses defining the coded bit stream, indicates that the
value may be used in the future for ISO defined extensions. Unless otherwise specified within this Recommendation |
International Standard, all reserved bits shall be set to '1'.
2.1.65 scrambling (system): The alteration of the characteristics of a video, audio or coded data stream in order to
prevent unauthorized reception of the information in a clear form. This alteration is a specified process under the control
of a conditional access system.
2.1.66 source stream: A single non-multiplexed stream of samples before compression coding.
2.1.67 splicing (system): The concatenation, performed on the system level, of two different elementary streams.
The resulting system stream conforms totally to this Recommendation | International Standard. The splice may result in
discontinuities in timebase, continuity counter, PSI, and decoding.
2.1.68 start codes (system): 32-bit codes embedded in the coded bit stream. They are used for several purposes
including identifying some of the layers in the coding syntax. Start codes consist of a 24-bit prefix (0x000001) and an
8-bit stream_id as shown in Table 2-22.
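As an illustration of definition 2.1.68, the sketch below scans a byte buffer for the 24-bit prefix 0x000001 and reports the 8-bit stream_id that follows each occurrence. It is a minimal Python sketch and the function name `find_start_codes` is hypothetical. It does not interpret stream_id values, which are assigned in Table 2-22, and it assumes the whole buffer is already in memory.

```python
START_CODE_PREFIX = b"\x00\x00\x01"  # 24-bit start code prefix 0x000001


def find_start_codes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, stream_id) for every complete 32-bit start code in data."""
    found = []
    pos = data.find(START_CODE_PREFIX)
    # A start code is complete only if the stream_id byte after the prefix is present.
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, data[pos + 3]))
        pos = data.find(START_CODE_PREFIX, pos + 1)
    return found


print(find_start_codes(b"\xff\x00\x00\x01\xe0\x12\x00\x00\x01\xc0"))  # [(1, 224), (6, 192)]
```

A prefix at the very end of the buffer, with no stream_id byte after it, is deliberately not reported. A streaming parser would carry those bytes over into the next buffer.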
2.1.69 STD input buffer (system): A first-in first-out buffer at the input of a system target decoder for storage of
compressed data from elementary streams before decoding.
2.1.70 still picture: A still picture consists of a video sequence, coded as defined in ITU-T Rec. H.262 |
ISO/IEC 13818-2, ISO/IEC 11172-2 or ISO/IEC 14496-2, that contains exactly one coded picture which is intra-coded.
This picture has an associated PTS and in case of coding according to ISO/IEC 11172-2, ITU-T Rec. H.262 |
ISO/IEC 13818-2 or ISO/IEC 14496-2, the presentation time of succeeding pictures, if any, is later than that of the still
picture by at least two picture periods.
2.1.71 system header (system): The system header is a data structure defined in 2.5.3.5 that carries information
summarizing the system characteristics of an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Program Stream.
2.1.72 System Clock Reference; SCR (system): A time stamp in the Program Stream from which decoder timing is
derived.
ITU-T Rec. H.222.0 (05/2006)
5
ISO/IEC 13818-1:2007 (E)
2.1.73 system target decoder; STD (system): A hypothetical reference model of a decoding process used to define
the semantics of an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 multiplexed bit stream.
2.1.74 time-stamp (system): A term that indicates the time of a specific action such as the arrival of a byte or the
presentation of a Presentation Unit.
2.1.75 transport stream packet header (system): The leading fields in a Transport Stream packet, up to and
including the continuity_counter field.
2.1.76 variable bitrate: An attribute of Transport Streams or Program Streams wherein the rate of arrival of bytes at
the input to a decoder varies with time.
2.2 Symbols and abbreviations
The mathematical operators used to describe this Recommendation | International Standard are similar to those used in
the C-programming language. However, integer division with truncation and rounding are specifically defined. The
bitwise operators are defined assuming two's-complement representation of integers. Numbering and counting loops
generally begin from 0.
2.2.1 Arithmetic operators
+          Addition
–          Subtraction (as a binary operator) or negation (as a unary operator)
++         Increment
––         Decrement
* or ×     Multiplication
^          Power
/          Integer division with truncation of the result toward 0. For example, 7/4 and –7/–4 are truncated to 1
           and –7/4 and 7/–4 are truncated to –1.
//         Integer division with rounding to the nearest integer. Half-integer values are rounded away from 0
           unless otherwise specified. For example, 3//2 is rounded to 2, and –3//2 is rounded to –2.
DIV        Integer division with truncation of the result towards –∞.
%          Modulus operator. Defined only for positive numbers.
Sign( )    Sign(x) is defined as:
$$\mathrm{Sign}(x) = \begin{cases} 1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0 \end{cases}$$
NINT( )    Nearest integer operator. Returns the nearest integer value to the real-valued argument. Half-integer
           values are rounded away from 0.
sin        Sine
cos        Cosine
exp        Exponential
√          Square root
log10      Logarithm to base ten
loge       Logarithm to base e
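Several of these operators differ from their namesakes in common programming languages. In Python, for example, `//` floors toward –∞, which is the DIV operator above and not the rounding division defined here. The helpers below are a minimal Python sketch of /, //, DIV, Sign( ) and NINT( ) as defined in this clause. The function names are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used so that floating-point rounding does not disturb the half-integer cases.

```python
import math
from fractions import Fraction


def sign(x) -> int:
    """Sign(x): 1 if x > 0, 0 if x == 0, -1 if x < 0."""
    return (x > 0) - (x < 0)


def nint(x) -> int:
    """NINT(x): nearest integer, half-integer values rounded away from 0."""
    return sign(x) * math.floor(abs(x) + Fraction(1, 2))


def div_trunc(a: int, b: int) -> int:
    """'/': integer division with truncation of the result toward 0."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def div_round(a: int, b: int) -> int:
    """'//': integer division rounded to the nearest integer, halves away from 0."""
    return nint(Fraction(a, b))


def div_floor(a: int, b: int) -> int:
    """'DIV': integer division with truncation toward minus infinity."""
    return a // b  # Python's own // already floors


print(div_trunc(-7, 4), div_round(-3, 2), div_floor(-7, 4))  # -1 -2 -2
```

The % operator is left to the host language, because this Specification defines it only for positive operands. For those operands, C and Python agree.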
2.2.2 Logical operators
||         Logical OR
&&         Logical AND
!          Logical NOT
2.2.3 Relational operators
>          Greater than
≥          Greater than or equal to
<          Less than
≤          Less than or equal to
==         Equal to
!=         Not equal to
max [,...,]  The maximum value in the argument list
min [,...,]  The minimum value in the argument list
2.2.4 Bitwise operators
&          AND
|          OR
>>         Shift right with sign extension
<<         Shift left with 0 fill
2.2.5 Assignment
=          Assignment operator
2.2.6 Mnemonics
The following mnemonics are defined to describe the different data types used in the coded bit-stream.
bslbf           Bit string, left bit first, where "left" is the order in which bit strings are written in this
                Recommendation | International Standard. Bit strings are written as a string of 1s and 0s
                within single quote marks, e.g., '1000 0001'. Blanks within a bit string are for ease of reading
                and have no significance.
ch              Channel
gr              Granule of 3 * 32 sub-band samples in audio Layer II, 18 * 32 sub-band samples in audio
                Layer III.
main_data       The main_data portion of the bit stream contains the scale factors, Huffman encoded data,
                and ancillary information.
main_data_beg   This gives the location in the bit stream of the beginning of the main_data for the frame. The
                location is equal to the ending location of the previous frame's main_data plus 1 bit. It is
                calculated from the main_data_end value of the previous frame.
part2_length    This value contains the number of main_data bits used for scale factors.
rpchof          Remainder polynomial coefficients, highest order first
sb              Sub-band
scfsi           Scalefactor selector information
switch_point_l  Number of scalefactor band (long block scalefactor band) from which point on window
                switching is used
switch_point_s  Number of scalefactor band (short block scalefactor band) from which point on window
                switching is used
tcimsbf         Two's complement integer, msb (sign) bit first
uimsbf          Unsigned integer, most significant bit first
vlclbf          Variable length code, left bit first, where "left" refers to the order in which the variable
                length codes are written
window          Number of actual time slot in case of block_type == 2, 0 ≤ window ≤ 2.
The byte order of multi-byte words is most significant byte first.
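Together with the most-significant-byte-first rule, the bslbf, uimsbf and tcimsbf mnemonics amount to reading fields MSB-first from a byte sequence. A minimal Python sketch of such a reader follows. The class and method names are hypothetical, and the reader does no bounds checking beyond Python's own IndexError.

```python
class BitReader:
    """Reads fixed-length fields MSB-first, as bslbf, uimsbf and tcimsbf require."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit offset from the start of data

    def read_uimsbf(self, n: int) -> int:
        """Unsigned integer, most significant bit first."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def read_tcimsbf(self, n: int) -> int:
        """Two's complement integer, sign bit first."""
        value = self.read_uimsbf(n)
        if value & (1 << (n - 1)):  # sign bit set: subtract 2^n
            value -= 1 << n
        return value

    def read_bslbf(self, n: int) -> str:
        """Bit string, left bit first, returned as a string of '0' and '1' characters."""
        return format(self.read_uimsbf(n), f"0{n}b")


r = BitReader(bytes([0x47, 0x40, 0x11, 0xE0]))
print(r.read_uimsbf(8), r.read_bslbf(3), r.read_uimsbf(13), r.read_tcimsbf(4))  # 71 010 17 -2
```

Because fields may straddle byte boundaries (the 13-bit field above spans two bytes), the reader tracks a bit offset rather than a byte offset.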
2.2.7 Constants
π          3.14159265359
e          2.71828182845
2.3 Method of describing bit stream syntax
The bit streams retrieved by the decoder are described in 2.4.1 and 2.5.1. Each data item in the bit stream is in bold
type. It is described by its name, its length in bits, and a mnemonic for its type and order of transmission.
The action caused by a decoded data element in a bit stream depends on the value of that data element and on data
elements previously decoded. The decoding of the data elements and definition of the state variables used in their
decoding are described in the clauses containing the semantic description of the syntax. The following constructs are
used to express the conditions when data elements are present, and are in normal type.
Note that this syntax uses the "C"-code convention that a variable or expression evaluating to a non-zero value is equivalent
to a condition that is true:
while ( condition ) {
data_element
...
}
If the condition is true, then the group of data elements occurs next in the data stream. This
repeats until the condition is not true.
do {
data_element
...
}
while ( condition )
The data element always occurs at least once. The data element is repeated until the
condition is not true.
if ( condition ) {
data_element
...
}
If the condition is true, then the first group of data elements occurs next in the data stream.
else {
data_element
...
}
If the condition is not true, then the second group of data elements occurs next in the data
stream.
for (i = 0; i < n; i++) {
data_element
...
}
The group of data elements occurs n times. Conditional constructs within the group of data
elements may depend on the value of the loop control variable i, which is set to zero for the
first occurrence, incremented to 1 for the second occurrence, and so forth.
As noted, the group of data elements may contain nested conditional constructs. For compactness, the {} are omitted
when only one data element follows:
data_element []
data_element [] is an array of data. The number of data elements is indicated by the context.
data_element [n]
data_element [n] is the n+1th element of an array of data.
data_element [m][n]
data_element [m][n] is the m+1,n+1th element of a two-dimensional array of data.
data_element [l][m][n]
data_element [l][m][n] is the l+1,m+1,n+1th element of a three-dimensional array of data.
data_element [m..n]
is the inclusive range of bits between bit m and bit n in the data_element.
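The [m..n] notation is used, for example, when a 33-bit time stamp is split across several fields. The sketch below assumes that bits are numbered from 0 at the least significant bit, so that data_element [32..30] denotes the three most significant bits of a 33-bit value. That numbering is an assumption of the sketch, and the function name `bit_range` is hypothetical.

```python
def bit_range(value: int, m: int, n: int) -> int:
    """data_element[m..n]: inclusive bits m down to n (bit 0 = LSB, assumed)."""
    hi, lo = max(m, n), min(m, n)
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


pts = 0x1_2345_6789  # an arbitrary 33-bit example value
parts = (bit_range(pts, 32, 30), bit_range(pts, 29, 15), bit_range(pts, 14, 0))
# Reassembling the three ranges recovers the original value.
assert (parts[0] << 30) | (parts[1] << 15) | parts[2] == pts
```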
While the syntax is expressed in procedural terms, it should not be assumed that either Figure 2-1 or Figure 2-2
implements a satisfactory decoding procedure. In particular, the syntax describes only a correct and error-free input
bitstream. Actual decoders must include a means to look for start codes and sync bytes (Transport Stream) in order to
begin decoding correctly, and to identify errors, erasures or insertions while decoding. The methods to identify these
situations, and the actions to be taken, are not standardized.
2.4 Transport Stream bitstream requirements
2.4.1 Transport Stream coding structure and parameters
The ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Transport Stream coding layer allows one or more programs to be
combined into a single stream. Data from each elementary stream are multiplexed together with information that allows
synchronized presentation of the elementary streams within a program.
A Transport Stream consists of one or more programs. Audio and video elementary streams consist of access units.
Elementary Stream data is carried in PES packets. A PES packet consists of a PES packet header followed by packet
data. PES packets are inserted into Transport Stream packets. The first byte of each PES packet header is located at the
first available payload location of a Transport Stream packet.
The PES packet header begins with a 32-bit start-code that also identifies the stream or stream type to which the packet
data belongs. The PES packet header may contain decoding and presentation time stamps (DTS and PTS). The PES
packet header also contains other optional fields. The PES packet data field contains a variable number of contiguous
bytes from one elementary stream.
Transport Stream packets begin with a 4-byte prefix, which contains a 13-bit Packet ID (PID), defined in Table 2-2. The
PID identifies, via the Program Specific Information (PSI) tables, the contents of the data contained in the Transport
Stream packet. Transport Stream packets of one PID value carry data of one and only one elementary stream.
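To make the role of the PID concrete, the sketch below splits a byte buffer into Transport Stream packets and groups them by PID. It rests on three assumptions:

- Packets are 188 bytes long.
- The buffer starts on a packet boundary.
- The header follows the Table 2-2 layout:
  - sync_byte 0x47;
  - three 1-bit flags (transport_error_indicator, payload_unit_start_indicator, transport_priority);
  - the 13-bit PID;
  - a final byte holding transport_scrambling_control, adaptation_field_control and continuity_counter.

Finding which elementary stream a PID carries still requires the PSI tables. The function names are hypothetical.

```python
from collections import defaultdict

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(pkt: bytes) -> dict:
    """Decode selected fields of the 4-byte Transport Stream packet prefix."""
    if len(pkt) != TS_PACKET_SIZE or pkt[0] != SYNC_BYTE:
        raise ValueError("not an aligned 188-byte Transport Stream packet")
    return {
        "payload_unit_start_indicator": (pkt[1] >> 6) & 0x1,
        "pid": ((pkt[1] & 0x1F) << 8) | pkt[2],  # 13-bit PID
        "adaptation_field_control": (pkt[3] >> 4) & 0x3,
        "continuity_counter": pkt[3] & 0xF,
    }


def packets_by_pid(stream: bytes) -> dict:
    """Group complete packets by PID; a trailing partial packet is ignored."""
    groups = defaultdict(list)
    for off in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = stream[off:off + TS_PACKET_SIZE]
        groups[parse_ts_header(pkt)["pid"]].append(pkt)
    return dict(groups)


pkt_17 = bytes([0x47, 0x40, 0x11, 0x10]) + bytes(184)    # PID 0x0011, payload only
null_pkt = bytes([0x47, 0x1F, 0xFF, 0x10]) + bytes(184)  # PID 0x1FFF is reserved for null packets
groups = packets_by_pid(pkt_17 + null_pkt + pkt_17)
print({pid: len(p) for pid, p in groups.items()})  # {17: 2, 8191: 1}
```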
The PSI tables are carried in the Transport Stream. There are six PSI tables:
• Program Association Table;
• Program Map Table;
• Conditional Access Table;
• Network Information Table;
• Transport Stream Description Table;
• IPMP Control Information Table.
These tables contain the necessary and sufficient information to demultiplex and present programs. The Program Map
Table (Table 2-33) specifies, among other information, which PIDs, and therefore which elementary streams, are
associated to form each program. This table also indicates the PID of the Transport Stream packets which carry the PCR
for each program. The Conditional Access Table shall be present if scrambling is employed. The Network Information
Table is optional and its contents are not specified by this Recommendation | International Standard. The IPMP Control
Information Table shall be present if IPMP as described in ISO/IEC 13818-11 is used by any of the components in the
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream.
Transport Stream packets may be null packets. Null packets are intended for padding of Transport Streams. They may
be inserted or deleted by re-multiplexing processes and, therefore, the delivery of the payload of null packets to the
decoder cannot be assumed.
This Recommendation | International Standard does not specify the coded data which may be used as part of conditional
access systems. This Specification does, however, provide mechanisms for program service providers to transport and
identify this data for decoder processing, and to reference correctly data which are specified by this Specification. This
type of support is provided both through Transport Stream packet structures and in the conditional access table (refer to
Table 2-32 of the PSI).
2.4.2 Transport Stream system target decoder
The semantics of the Transport Stream specified in 2.4.3 and the constraints on these semantics specified in 2.7 require
exact definitions of byte arrival and decoding events and the times at which these occur. The definitions needed are set
out in this Recommendation | International Standard using a hypothetical decoder known as the Transport Stream
System Target Decoder (T-STD). Informative Annex D contains further explanation of the T-STD.
The T-STD is a conceptual model used to define these terms precisely and to model the decoding process during the
construction or verification of Transport Streams. The T-STD is defined only for this purpose. There are three types of
decoders in the T-STD: video, audio, and systems. Figure 2-1 illustrates an example. Neither the architecture of the
T-STD nor the timing described precludes uninterrupted, synchronized play-back of Transport Streams from a variety
of decoders with different architectures or timing schedules.
[Diagram: the i-th byte of the Transport Stream enters at time t(i) and is routed to a transport buffer. Video path: TB1 → MB1 → EB1 → D1 → O1 (rates Rx1, Rbx1). Audio path: TBn → Bn → Dn (rate Rxn). System control path: TBsys → Bsys → Dsys (rates Rxsys, Rsys). Labels mark the j-th access units A1(j), An(j) decoded at td1(j), tdn(j), and the k-th presentation units P1(k), Pn(k) presented at tp1(k), tpn(k).]
Figure 2-1 – Transport Stream system target decoder notation
The following notation is used to describe the Transport Stream system target decoder and is partially illustrated in
Figure 2-1 above.
i, i′, i″ are indices to bytes in the Transport Stream. The first byte has index 0.
j is an index to access units in the elementary streams.
k, k′, k″ are indices to presentation units in the elementary streams.
n is an index to the elementary streams.
p is an index to Transport Stream packets in the Transport Stream.
t(i) indicates the time in seconds at which the i-th byte of the Transport Stream enters the system target
decoder. The value t(0) is an arbitrary constant.
PCR(i) is the time encoded in the PCR field, measured in units of the period of the 27-MHz system clock,
where i is the byte index of the final byte of the program_clock_reference_base field.
An(j) is the j-th access unit in elementary stream n. An(j) is indexed in decoding order.
tdn(j) is the decoding time, measured in seconds, in the system target decoder of the j-th access unit in
elementary stream n.
Pn(k) is the k-th presentation unit in elementary stream n. Pn(k) results from decoding An(j). Pn(k) is
indexed in presentation order.
tpn(k) is the presentation time, measured in seconds, in the system target decoder of the k-th presentation
unit in elementary stream n.
t is time measured in seconds.
Fn(t) is the fullness, measured in bytes, of the system target decoder input buffer for elementary stream n
at time t.
Bn is the main buffer for elementary stream n. It is present only for audio elementary streams.
BSn is the size of buffer Bn, measured in bytes.
Bsys is the main buffer in the system target decoder for system information for the program that is in the
process of being decoded.
BSsys is the size of Bsys, measured in bytes.
MBn is the multiplexing buffer for elementary stream n. It is present only for video elementary streams.
MBSn is the size of MBn, measured in bytes.
EBn is the elementary stream buffer for elementary stream n. It is present only for video elementary
streams.
EBSn is the size of the elementary stream buffer EBn, measured in bytes.
TBsys is the transport buffer for system information for the program that is in the process of being
decoded.
TBSsys is the size of TBsys, measured in bytes.
TBn is the transport buffer for elementary stream n.
TBSn is the size of TBn, measured in bytes.
Dsys is the decoder for system information in Program Stream n.
Dn is the decoder for elementary stream n.
On is the re-order buffer for video elementary stream n.
Rsys is the rate at which data are removed from Bsys.
Rxn is the rate at which data are removed from TBn.
Rbxn is the rate at which PES packet payload data are removed from MBn when the leak method is used.
Defined only for video elementary streams.
Rbxn(j) is the rate at which PES packet payload data are removed from MBn when the vbv_delay method is
used. Defined only for video elementary streams.
Rxsys is the rate at which data are removed from TBsys.
Res is the video elementary stream rate coded in a sequence header.
2.4.2.1 System clock frequency
Timing information referenced in the T-STD is carried by several data fields defined in this Specification. Refer to
2.4.3.4 and 2.4.3.6. In PCR fields this information is coded as the sampled value of a program's system clock. The
PCR fields are carried in the adaptation field of the Transport Stream packets with a PID value equal to the PCR_PID
defined in the TS_program_map_section of the program being decoded.
Practical decoders may reconstruct this clock from these values and their respective arrival times. The following are
minimum constraints which apply to the program's system clock frequency as represented by the values of the
PCR fields when they are received by a decoder.
The value of the system clock frequency is measured in Hz and shall meet the following constraints:
$$27\,000\,000 - 810 \le \mathit{system\_clock\_frequency} \le 27\,000\,000 + 810$$
$$\text{rate of change of } \mathit{system\_clock\_frequency} \text{ with time} \le 75 \times 10^{-3}\ \text{Hz/s}$$
NOTE – Sources of coded data should follow a tighter tolerance in order to facilitate compliant operation of consumer recorders
and playback equipment.
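The two constraints can be checked against a series of frequency estimates, for instance estimates a decoder derives from received PCR values and their arrival times. The sketch below assumes such (time, frequency) estimates are already available; how they are obtained is outside its scope. The function name is hypothetical. For reference, ±810 Hz is ±30 ppm of 27 MHz.

```python
NOMINAL_HZ = 27_000_000
TOLERANCE_HZ = 810          # 27 MHz ± 810 Hz (±30 ppm)
MAX_SLEW_HZ_PER_S = 75e-3   # limit on the rate of change of the frequency


def clock_meets_constraints(samples: list[tuple[float, float]]) -> bool:
    """samples: (time_s, frequency_hz) estimates in increasing time order."""
    for _, f in samples:
        if abs(f - NOMINAL_HZ) > TOLERANCE_HZ:
            return False
    # The slope between successive estimates approximates the rate of change.
    for (t0, f0), (t1, f1) in zip(samples, samples[1:]):
        if abs(f1 - f0) / (t1 - t0) > MAX_SLEW_HZ_PER_S:
            return False
    return True
```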
A program's system_clock_frequency may be more accurate than required. Such improved accuracy may be transmitted
to the decoder via the System clock descriptor described in 2.6.20.
Bit rates defined in this Specification are measured in terms of system_clock_frequency. For example, a bit rate of
27 000 000 bits per second in the T-STD would indicate that one byte of data is transferred every eight (8) cycles of the
system clock.
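The example generalises directly. At a bit rate R measured against system_clock_frequency, one byte occupies 8 × system_clock_frequency / R clock cycles. The short sketch below uses the nominal 27 MHz value and exact fractions; the function name is hypothetical.

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 27_000_000  # nominal system_clock_frequency


def cycles_per_byte(bit_rate: int) -> Fraction:
    """System clock cycles between successive bytes at bit_rate (bit/s)."""
    return Fraction(8 * SYSTEM_CLOCK_HZ, bit_rate)


print(cycles_per_byte(27_000_000))  # 8
print(cycles_per_byte(2_000_000))   # 108
```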
The notation "system_clock_frequency" is used in several places in this Specification to refer to the frequency of a
clock meeting these requirements. For notational convenience, equations in which PCR, PTS, or DTS appear lead to
values of time which are accurate to some integral multiple of $(300 \times 2^{33} / \mathit{system\_clock\_frequency})$ seconds. This is due
to the encoding of PCR timing information as 33 bits of 1/300 of the system clock frequency plus 9 bits for the
remainder, and encoding as 33 bits of the system clock frequency divided by 300 for PTS and DTS.
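PTS, DTS and the PCR base are 33-bit counters of system_clock_frequency / 300 (90 kHz at the nominal frequency), so they wrap every 300 × 2^33 / system_clock_frequency seconds, roughly 26.5 hours at 27 MHz. The sketch below computes that period and a wrap-aware difference between two such 33-bit values. The helper names are hypothetical, and the difference is only meaningful when the true interval is shorter than one wrap period.

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 27_000_000
WRAP = 2 ** 33  # modulus of the 33-bit PTS/DTS/PCR_base counters

# Seconds before a 33-bit counter of system_clock_frequency / 300 wraps.
wrap_period_s = Fraction(300 * WRAP, SYSTEM_CLOCK_HZ)


def ticks_90khz_between(earlier: int, later: int) -> int:
    """Forward distance from earlier to later, allowing one wrap of the counter."""
    return (later - earlier) % WRAP


print(float(wrap_period_s) / 3600)        # about 26.5 hours
print(ticks_90khz_between(WRAP - 10, 5))  # 15
```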
2.4.2.2 Input to the Transport Stream system target decoder
Input to the Transport Stream System Target Decoder (T-STD) is a Transport Stream. A Transport Stream may contain
multiple programs with independent time bases. However, the T-STD decodes only one program at a time. In the
T-STD model all timing indications refer to the time base of that program.
Data from the Transport Stream enters the T-STD at a piecewise constant rate. The time t(i) at which the i-th byte enters
the T-STD is defined by decoding the program clock reference (PCR) fields in the input stream, encoded in the
Transport Stream packet adaptation field of the program to be decoded and by counting the bytes in the complete
Transport Stream between successive PCRs of that program. The PCR field (see equation 2-1) is encoded in two parts:
one, in units of the period of 1/300 times the system clock frequency, called program_clock_reference_base
(see equation 2-2), and one in units of the system clock frequency called program_clock_reference_extension
(see equation 2-3). The values encoded in these are computed by PCR_base(i) (see equation 2-2) and PCR_ext(i)
(see equation 2-3) respectively. The value encoded in the PCR field indicates the time t(i), where i is the index of the
byte containing the last bit of the program_clock_reference_base field.
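Specifically:
$$\mathrm{PCR}(i) = \mathrm{PCR\_base}(i) \times 300 + \mathrm{PCR\_ext}(i) \tag{2-1}$$
$$\mathrm{PCR\_base}(i) = \big((\mathit{system\_clock\_frequency} \times t(i))\ \mathrm{DIV}\ 300\big)\ \%\ 2^{33} \tag{2-2}$$
$$\mathrm{PCR\_ext}(i) = \big((\mathit{system\_clock\_frequency} \times t(i))\ \mathrm{DIV}\ 1\big)\ \%\ 300 \tag{2-3}$$
where i is the index of the byte containing the last bit of the program_clock_reference_base field.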
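Equations 2-1 to 2-3 can be evaluated directly for a given arrival time. The sketch below does so with exact rational arithmetic. It assumes the nominal 27 MHz as system_clock_frequency; a real encoder samples its own clock. DIV maps to floor division and % to Python's modulus, which agree with the definitions in 2.2.1 for the non-negative values that occur here. The function names are hypothetical.

```python
import math
from fractions import Fraction

SYSTEM_CLOCK_HZ = 27_000_000  # nominal system_clock_frequency (assumption)


def pcr_fields(t_seconds: Fraction) -> tuple[int, int]:
    """Return (PCR_base, PCR_ext) for arrival time t(i), per equations 2-2 and 2-3."""
    ticks = math.floor(SYSTEM_CLOCK_HZ * t_seconds)  # (f x t(i)) DIV 1
    pcr_base = (ticks // 300) % 2 ** 33              # DIV 300, then modulo 2^33
    pcr_ext = ticks % 300                            # remainder in 27 MHz cycles
    return pcr_base, pcr_ext


def pcr_value(pcr_base: int, pcr_ext: int) -> int:
    """Equation 2-1: PCR in units of the 27 MHz system clock."""
    return pcr_base * 300 + pcr_ext


base, ext = pcr_fields(Fraction(1))     # t(i) = 1 s
print(base, ext, pcr_value(base, ext))  # 90000 0 27000000
```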
For all other bytes the input arrival time, t(i) shown in equation 2-4 below, is computed from PCR(i″) and the transport
rate at which data arrive, where the transport rate is determined as the number of bytes in the Transport Stream between
the bytes containing the last bit of two successive program_clock_reference_base fields of the same program divided by
the difference between the time values encoded in these same two PCR fields.
$$t(i) = \frac{\mathrm{PCR}(i'')}{\mathit{system\_clock\_frequency}} + \frac{i - i''}{\mathit{transport\_rate}(i)} \tag{2-4}$$
where:
i is the index of any byte in the Transport Stream for i″ < i < i′.
i″ is the index of the byte containing the last bit of the most recent program_clock_reference_base
field applicable to the program being decoded.
PCR(i″) is the time encoded in the program clock reference base and extension fields in units of the
system clock.
The transport rate is given by:
$$\mathit{transport\_rate}(i) = \frac{(i' - i'') \times \mathit{system\_clock\_frequency}}{\mathrm{PCR}(i') - \mathrm{PCR}(i'')} \tag{2-5}$$
where:
i′ is the index of the byte containing the last bit of the immediately following
program_clock_reference_base field applicable to the program being decoded.
NOTE – i″ < i ≤ i′.
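Equations 2-4 and 2-5 amount to linear interpolation of byte arrival times between two successive PCRs of the program being decoded. The transport rate is the number of bytes between the two PCR-bearing bytes divided by the time between the two PCR values. The minimal sketch below rests on three assumptions:

- the nominal 27 MHz clock;
- PCR values already combined per equation 2-1;
- no PCR wrap-around or timebase discontinuity between the two PCRs.

The function names are hypothetical.

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 27_000_000


def transport_rate(i_prev: int, pcr_prev: int, i_next: int, pcr_next: int) -> Fraction:
    """Bytes per second between the PCRs at byte indices i_prev (i'') and i_next (i')."""
    return Fraction((i_next - i_prev) * SYSTEM_CLOCK_HZ, pcr_next - pcr_prev)


def arrival_time(i: int, i_prev: int, pcr_prev: int, i_next: int, pcr_next: int) -> Fraction:
    """t(i) in seconds for a byte with i_prev < i <= i_next."""
    rate = transport_rate(i_prev, pcr_prev, i_next, pcr_next)
    return Fraction(pcr_prev, SYSTEM_CLOCK_HZ) + (i - i_prev) / rate


# Two PCRs 1 ms apart (27 000 clock cycles) with 1880 bytes between them.
print(transport_rate(100, 0, 1980, 27_000))      # 1880000
print(arrival_time(1040, 100, 0, 1980, 27_000))  # 1/2000
```

By construction, the byte carrying the second PCR arrives exactly at PCR(i′) / system_clock_frequency, so successive intervals join without gaps.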
In the case of a timebase discontinuity, indicated by the discontinuity_indicator in the transport packet adaptation field,
the definition given in equation 2-4 and equation 2-5 for the time of arrival of bytes at the input to the T-STD is not
applicable between the last PCR of the old timebase and the first PCR of the new timebase. In this case the time of
arrival of these bytes is determined according to equation 2-4 with the modification that the transport rate used is that
applicable between the last and next to last PCR of the old timebase.
A tolerance is specified for the PCR values. The PCR tolerance is defined as the maximum inaccuracy allowed in
received PCRs. This inaccuracy may be due to imprecision in the PCR values or to PCR modification during
re-multiplexing. It does not include errors in packet arrival time due to network jitter or other causes. The
PCR tolerance is ± 500 ns.
In the T-STD model, the inaccuracy will be reflected as an inaccuracy in the calculated transport rate using
equation 2-5.
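In clock cycles, the ±500 ns tolerance is 500 × 10^-9 × 27 000 000 = 13.5 cycles of the nominal system clock. The sketch below compares a received PCR with the value it should carry. The source of that expected value (for example, a reference clock in a test set-up) and the function name are assumptions of the sketch. PCR wrap-around is ignored, and network jitter is not modelled, consistent with the definition above.

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 27_000_000
PCR_TOLERANCE_S = Fraction(500, 10**9)  # ±500 ns


def pcr_within_tolerance(received: int, expected: int) -> bool:
    """True if |received - expected| (27 MHz cycles) corresponds to at most 500 ns."""
    error_s = Fraction(abs(received - expected), SYSTEM_CLOCK_HZ)
    return error_s <= PCR_TOLERANCE_S


print(PCR_TOLERANCE_S * SYSTEM_CLOCK_HZ)  # 27/2, i.e. 13.5 cycles
```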
Transport Streams with multiple programs and variable rate
Transport Streams may contain multiple programs which have independent time bases. Separate sets of PCRs, as
indicated by the respective PCR_PID values, are required for each such independent program, and therefore the PCRs
cannot be co-located. The Transport Stream rate is piecewise constant for the program entering the T-STD. Therefore, if
the Transport Stream rate is variable it can only vary at the PCRs of the program under consideration. Since the PCRs,
and therefore the points in the Transport Stream where the rate varies, are not co-located, the rate at which the Transport
Stream enters the T-STD would have to differ depending on which program is entering the T-STD. Therefore, it is not
possible to construct a consistent T-STD delivery schedule for an entire Transport Stream when that Transport Stream
contains multiple programs with independent time bases and the rate of the Transport Stream is variable. It is
straightforward, however, to construct constant bit rate Transport Streams with multiple variable rate programs.
2.4.2.3 Buffering
Complete Transport Stream packets containing system information, for the program selected for decoding, enter the
system transport buffer, TBsys, at the Transport Stream rate. These include Transport Stream packets whose PID values
are 0, 1, 2 or 3, and all Transport Stream packets identified via the Program Association Table (see Table 2-30) as
having the program_map_PID value for the selected program. Network Information Table (NIT) data as specified by
the NIT PID is not transferred to TBsys.
NOTE 1 – Size of IPMP Control Information table could be large, and the repetition rate of this table should be adjusted to meet
the buffer requirement.
All bytes that enter the buffer TBn are removed at the rate Rxn specified below. Bytes which are part of the PES packet
or its contents are delivered to the main buffer Bn for audio elementary streams and system data, and to the multiplexing
buffer MBn for video elementary streams. Other bytes are not, and may be used to control the system. Duplicate
Transport Stream packets are not delivered to Bn, MBn, or Bsys.
The buffer TBn is emptied as follows:
– When there is no data in TBn, Rxn is equal to zero.
– Otherwise for video:
$$Rx_n = 1.2 \times R_{max}[\mathit{profile}, \mathit{level}]$$
where:
Rmax[profile, level] is specified according to the profile and level which can be found in Table 8-13 of ITU-T
Rec. H.262 | ISO/IEC 13818-2. This Table specifies the upper bound of the rate of each elementary video stream within
a specific profile and level.
Rxn is equal to 1.2 × Rmax for ISO/IEC 11172-2 constrained parameter video streams, where Rmax refers to the
maximum bitrate for a Constrained Parameters bitstream in ISO/IEC 11172-2.
For ISO/IEC 13818-7 ADTS audio:
Number of channels    Rxn [bit/s]
1-2                   2 000 000
3-8                   5 529 600
9-12                  8 294 400
13-48                 33 177 600
Channels: The number of full-bandwidth audio output channels plus the number of independently switched coupling
channel elements within the same elementary audio stream. For example, in the typical case that there are no
independently switched coupling channel elements, mono is 1 channel, stereo is 2 channels and 5.1 channel surround is
5 channels (the LFE channel is not counted).
For other audio:
$$Rx_n = 2 \times 10^{6}\ \text{bits per second}$$
For systems data:
$$Rx_n = 1 \times 10^{6}\ \text{bits per second}$$
Rxn is measured with respect to the system clock frequency.
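The choice of Rxn while TBn holds data can be summarised as a small lookup. In the sketch below the stream kinds are hypothetical labels, not stream_type values. For video the caller must supply Rmax, since Table 8-13 of ITU-T Rec. H.262 | ISO/IEC 13818-2 is not reproduced here. The same 1.2 factor covers the Constrained Parameters Rmax of ISO/IEC 11172-2. All rates are in bit/s.

```python
# (highest channel count in the row, Rxn in bit/s) for ISO/IEC 13818-7 ADTS audio
ADTS_RX_BPS = [(2, 2_000_000), (8, 5_529_600), (12, 8_294_400), (48, 33_177_600)]


def rx_n(kind: str, channels: int = 0, rmax: int = 0) -> float:
    """Rate Rxn (bit/s) at which data leave TBn while it is not empty."""
    if kind == "video":          # H.262 profile/level or 11172-2 constrained parameters
        return rmax * 6 / 5      # 1.2 x Rmax
    if kind == "adts":           # ISO/IEC 13818-7 ADTS audio, by channel count
        for max_channels, rate in ADTS_RX_BPS:
            if 1 <= channels <= max_channels:
                return rate
        raise ValueError("ADTS channel count must be 1-48")
    if kind == "audio":          # any other audio
        return 2_000_000
    if kind == "system":         # systems data
        return 1_000_000
    raise ValueError(f"unknown stream kind: {kind}")
```

The channel count follows the rule above: full-bandwidth channels plus independently switched coupling channel elements, with the LFE channel excluded.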
Complete Transport Stream packets containing system information, for the program selected for decoding, enter the
system transport buffer, TBsys, at the Transport Stream rate. These include Transport Stream packets whose PID values
are 0, 1, 2 and 3 (if present), and all Transport Stream packets identified via the Program Association Table (see
Table 2-30) as having the program_map_PID value for the selected program. Network Information Table (NIT) data as
specified by the NIT PID is not transferred to TBsys.
Bytes are removed from TBsys at the rate Rxsys and delivered to Bsys. Each byte is transferred instantaneously.
Duplicate Transport Stream packets are not delivered to Bsys.
Transport packets which do not enter any TBn or TBsys are discarded.
The transport buffer size is fixed at 512 bytes.
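The routing of packets into TBsys, a TBn, or the discard path can be sketched as a single function. The parameters are assumptions:

- `pmt_pid` is the program_map_PID listed for the selected program in the Program Association Table.
- `es_pids` maps each elementary-stream PID of that program to its buffer index n, as derived from the Program Map Table.

NIT packets reach the discard path because the NIT PID is neither 0 to 3 nor the program_map_PID. The function name is hypothetical.

```python
SYSTEM_PIDS = {0x0000, 0x0001, 0x0002, 0x0003}  # PID values 0, 1, 2 and 3


def t_std_destination(pid: int, pmt_pid: int, es_pids: dict[int, int]) -> str:
    """Return 'TBsys', 'TB<n>' or 'discard' for a packet, given the selected program."""
    if pid in SYSTEM_PIDS or pid == pmt_pid:
        return "TBsys"
    if pid in es_pids:
        return f"TB{es_pids[pid]}"
    return "discard"  # e.g. NIT, null packets, packets of other programs
```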
The elementary stream buffer sizes EBS1 through EBSn are defined for video as equal to the vbv_buffer_size as it is
carried in the sequence header. Refer to Summary of Constrained Parameters in ISO/IEC 11172-2 and Table 8-14 of
ITU-T Rec. H.262 | ISO/IEC 13818-2.
The multiplexing buffer sizes MBS1 through MBSn are defined for video as follows:
For Low and Main level:
$$MBS_n = BS_{mux} + BS_{oh} + VBV_{max}[\mathit{profile}, \mathit{level}] - \mathit{vbv\_buffer\_size}$$
where BSoh, PES packet overhead buffering, is defined as:
$$BS_{oh} = (1/750)\ \text{seconds} \times R_{max}[\mathit{profile}, \mathit{level}]$$
and BSmux, additional multiplex buffering, is defined as:
$$BS_{mux} = 0.004\ \text{seconds} \times R_{max}[\mathit{profile}, \mathit{level}]$$
and where VBVmax[profile, level] is defined in Table 8-14 of ITU-T Rec. H.262 | ISO/IEC 13818-2, Rmax[profile,
level] is defined in Table 8-13 of ITU-T Rec. H.262 | ISO/IEC 13818-2, and vbv_buffer_size is carried in the sequence
header described in 6.2.2 of ITU-T Rec. H.262 | ISO/IEC 13818-2.
For High 1440 and High level:
$$MBS_n = BS_{mux} + BS_{oh}$$
where BSoh is defined as:
$$BS_{oh} = (1/750)\ \text{seconds} \times R_{max}[\mathit{profile}, \mathit{level}]$$
and BSmux is defined as:
$$BS_{mux} = 0.004\ \text{seconds} \times R_{max}[\mathit{profile}, \mathit{level}]$$
and where Rmax[profile, level] is defined in Table 8-13 of ITU-T Rec. H.262 | ISO/IEC 13818-2.
For Constrained Parameters ISO/IEC 11172-2 bitstreams:
$$MBS_n = BS_{mux} + BS_{oh} + \mathit{vbv\_max} - \mathit{vbv\_buffer\_size}$$
where BSoh is defined as:
$$BS_{oh} = (1/750)\ \text{seconds} \times R_{max}$$
and BSmux is defined as:
$$BS_{mux} = 0.004\ \text{seconds} \times R_{max}$$
and where Rmax and vbv_max refer to the maximum bitrate and the maximum vbv_buffer_size, respectively, for a
Constrained Parameters bitstream in ISO/IEC 11172-2.
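The three MBSn cases differ only in whether the VBV slack term (VBVmax – vbv_buffer_size, or vbv_max – vbv_buffer_size for ISO/IEC 11172-2) is added. The sketch below evaluates them with exact fractions. Rmax and the VBV sizes must come from the caller, taken from the H.262 or 11172-2 tables. Every term stays in the caller's units (bits, if Rmax is in bit/s and the VBV sizes in bits); treating the sizes this way is an assumption of the sketch. The number in the usage line is illustrative only.

```python
from fractions import Fraction
from typing import Optional


def mbs_n(rmax: int, vbv_max: Optional[int] = None,
          vbv_buffer_size: Optional[int] = None) -> Fraction:
    """Multiplexing buffer size MBSn, in the units of rmax x seconds."""
    bs_oh = Fraction(1, 750) * rmax     # PES packet overhead buffering
    bs_mux = Fraction(4, 1000) * rmax   # 4 ms of additional multiplex buffering
    mbs = bs_mux + bs_oh                # High-1440 and High level
    if vbv_max is not None:             # Low/Main level, or 11172-2 constrained parameters
        if vbv_buffer_size is None:
            raise ValueError("vbv_buffer_size is required together with vbv_max")
        mbs += vbv_max - vbv_buffer_size
    return mbs


print(mbs_n(15_000_000))  # 80000 (illustrative Rmax only)
```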
A portion BSmux = 4 ms × Rmax[profile, level] of the MBSn is allocated for buffering to allow multiplexing. The
remainder is available for BSoh and may also be available for initial multiplexing.
NOTE 2 – Buffer occupancy by PES packet overhead is directly bounded in PES streams by the PES-STD which is defined
in 2.5.2.4. It is possible, but not necessary, to utilize PES streams to construct Transport Streams.
Buffer BSn
The main buffer sizes BS1 through BSn are defined as follows.
Audio
For ISO/IEC 13818-7 ADTS audio:
Number of channels    BSn [bytes]
1-2                   3 584
3-8                   8 976
9-12                  12 804
13-48                 51 216
Channels: The number of full-bandwidth audio output channels plus the number of independently switched coupling
channel elements within the same elementary audio stream. For example, in the typical case that there are no
independently switched coupling channel elements, mono is 1 channel, stereo is 2 channels and 5.1 channel surround is
5 channels (the LFE channel is not counted).
For other audio:
$$BS_n = BS_{mux} + BS_{dec} + BS_{oh} = 3584\ \text{bytes}$$
The sizes of the access unit decoding buffer BSdec and the PES packet overhead buffer BSoh are constrained by:
$$BS_{dec} + BS_{oh} \le 2848\ \text{bytes}$$
A portion (736 bytes) of the 3584 byte buffer is allocated for buffering to allow multiplexing. The rest, 2848 bytes, are
shared for access unit buffering BSdec, BSoh and additional multiplexing.
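The audio main buffer sizes can likewise be tabulated. The sketch below returns BSn for ADTS streams by channel count and the fixed 3584 bytes otherwise. It also checks that the fixed size splits into the 736-byte multiplexing portion and the 2848 bytes shared by BSdec, BSoh and additional multiplexing. The function name is hypothetical, and all sizes are in bytes.

```python
# (highest channel count in the row, BSn in bytes) for ISO/IEC 13818-7 ADTS audio
ADTS_BS_BYTES = [(2, 3_584), (8, 8_976), (12, 12_804), (48, 51_216)]
OTHER_AUDIO_BS = 3_584
BS_MUX_AUDIO = 736            # portion allocated for multiplexing
BS_DEC_PLUS_OH_MAX = 2_848    # shared by BSdec, BSoh and additional multiplexing

assert BS_MUX_AUDIO + BS_DEC_PLUS_OH_MAX == OTHER_AUDIO_BS


def audio_bs_n(adts: bool, channels: int = 0) -> int:
    """Main buffer size BSn (bytes) for an audio elementary stream."""
    if not adts:
        return OTHER_AUDIO_BS
    for max_channels, size in ADTS_BS_BYTES:
        if 1 <= channels <= max_channels:
            return size
    raise ValueError("ADTS channel count must be 1-48")
```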
Systems
The main buffer Bsys for system data is of size BSsys = 1536 bytes.
Video
For video elementary streams, data is transferred from MBn to EBn using one of two methods: the leak method or the
VBV delay method.
Leak method
The leak method transfers data from MBn to EBn using a leak rate Rbx. The leak method is used whenever any of the
following is true:
• the STD descriptor (refer to 2.6.32) for the elementary stream is not present in the Transport Stream;
• the STD descriptor is present and the leak_valid flag has a value of '1';
• the STD descriptor is present, the leak_valid flag has a value of '0', and the vbv_delay fields coded in the
video stream have the value 0xFFFF; or
• trick mode status is true (refer to 2.4.3.7).
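The conditions for the leak method and for the vbv_delay method described in this clause combine into one decision. The arguments of the sketch below are assumptions standing in for information a demultiplexer would extract:

- `leak_valid`: the leak_valid flag of the STD descriptor (2.6.32), passed as None when the descriptor is absent;
- `vbv_delay`: a representative coded vbv_delay value of the sequence;
- `trick_mode`: the trick mode status of 2.4.3.7.

The function name is hypothetical.

```python
from typing import Optional

VBV_DELAY_UNSPECIFIED = 0xFFFF


def mb_to_eb_method(leak_valid: Optional[int], vbv_delay: int, trick_mode: bool) -> str:
    """Return 'leak' or 'vbv_delay' for the MBn -> EBn transfer of a video stream."""
    if (leak_valid is None                      # no STD descriptor
            or leak_valid == 1
            or vbv_delay == VBV_DELAY_UNSPECIFIED
            or trick_mode):
        return "leak"
    return "vbv_delay"
```

A single representative vbv_delay value is enough because, within a video sequence, either all vbv_delay fields equal 0xFFFF or none do.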
For Low and Main level:
$$Rbx_n = R_{max}[\mathit{profile}, \mathit{level}]$$
For High-1440 and High level:
$$Rbx_n = \min\{1.05 \times R_{es},\ R_{max}[\mathit{profile}, \mathit{level}]\}$$
For Constrained Parameters bitstream in ISO/IEC 11172-2:
$$Rbx_n = 1.2 \times R_{max}$$
where Rmax is the maximum bit rate for a Constrained Parameters bitstream in ISO/IEC 11172-2.
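The leak rate Rbxn can be sketched the same way as the buffer sizes. The caller supplies Rmax and Res in bit/s, and the level is given as a hypothetical string label. The sketch multiplies by 21/20 and 6/5 rather than 1.05 and 1.2 so that whole-number inputs give exact results.

```python
def leak_rate_rbx(level: str, rmax: int, res: int = 0) -> float:
    """Leak rate Rbxn (bit/s) from MBn to EBn when the leak method is used."""
    if level in ("low", "main"):
        return rmax
    if level in ("high-1440", "high"):
        return min(res * 21 / 20, rmax)   # Min{1.05 x Res, Rmax}
    if level == "mpeg1-constrained":      # ISO/IEC 11172-2 constrained parameters
        return rmax * 6 / 5               # 1.2 x Rmax
    raise ValueError(f"unknown level label: {level}")
```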
If there is PES packet payload data in MBn, and buffer EBn is not full, the PES packet payload is transferred from MBn
to EBn at a rate equal to Rbxn. If EBn is full, data are not removed from MBn. When a byte of data is transferred from
MBn to EBn, all PES packet header bytes that are in MBn and immediately precede that byte, are instantaneously
removed and discarded. When there is no PES packet payload data present in MBn, no data is removed from MBn. All
data that enters MBn leaves it. All PES packet payload data bytes enter EBn instantaneously upon leaving MBn.
Vbv_delay method
The vbv_delay method specifies precisely the time at which each byte of coded video data is transferred from MBn to
EBn, using the vbv_delay values coded in the video elementary stream. The vbv_delay method is used whenever the
STD descriptor (refer to 2.6.32) for this elementary stream is present in the Transport Stream, the leak_valid flag in the
descriptor has the value '0', and vbv_delay fields coded in the video stream are not equal to 0xFFFF. If any vbv_delay
values in a video sequence are not equal to 0xFFFF, none of the vbv_delay fields in that sequence shall be equal to
0xFFFF (refer to ISO/IEC 11172-2 and ITU-T Rec. H.262 | ISO/IEC 13818-2).
When the vbv_delay method is used, the final byte of the video picture start code for picture j is transferred from MBn
to EBn at the time tdn(j) – vbv_delay(j), where tdn(j) is the decoding time of picture j, as defined above, and
vbv_delay(j) is the delay time, in seconds, indicated by the vbv_delay field of picture j. The bytes between the final
bytes of successive picture start codes (including the final byte of the second start code) are transferred into buffer EBn
at a piecewise constant rate, Rbx(j), which is specified for each picture j as follows:
$$Rbx(j) = \frac{NB(j)}{\mathit{vbv\_delay}(j) - \mathit{vbv\_delay}(j+1) + td_n(j+1) - td_n(j)} \tag{2-6}$$
where NB(j) is the number of bytes between the final bytes of the picture start codes (including the final byte of the
second start code) of pictures j and j + 1, excluding PES packet header bytes.
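Equation 2-6 and the start-code transfer time can be evaluated directly. The sketch below assumes all times are already expressed in seconds (vbv_delay values converted by the caller) and returns Rbx(j) in bytes per second, because NB(j) counts bytes; the function names are hypothetical.

```python
def start_code_transfer_time(td_j: float, vbv_delay_j: float) -> float:
    """Time at which the final byte of picture j's start code enters EBn."""
    return td_j - vbv_delay_j


def vbv_transfer_rate(nb_j: int, vbv_delay_j: float, vbv_delay_next: float,
                      td_j: float, td_next: float) -> float:
    """Rbx(j) from equation 2-6, in bytes per second.

    nb_j: NB(j), bytes between the final bytes of the start codes of pictures
          j and j + 1 (second one included), PES header bytes excluded.
    """
    interval = vbv_delay_j - vbv_delay_next + td_next - td_j
    if interval <= 0:
        raise ValueError("transfer interval must be positive")
    return nb_j / interval
```

The denominator is simply the gap between the two start-code transfer times, (tdn(j + 1) – vbv_delay(j + 1)) – (tdn(j) – vbv_delay(j)). For example, 50 000 bytes with a constant vbv_delay and 40 ms between decoding times gives 1.25 × 10^6 bytes/s.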
NOTE 3 – vbv_delay(j + 1) and tdn(j + 1) may have values that differ from those normally expected for periodic video display if
the low_delay flag in the video sequence extension is set to '1'. It may not be possible to determine the correct values by
examination of the bit stream.
The Rbx(j) derived from equation 2-6 shall be less than or equal to Rmax[profile, level] for elementary streams of stream
type 0x02 (refer to Table 2-34), where Rmax[profile, level] is defined in ITU-T Rec. H.262 | ISO/IEC 13818-2, and shall
be less than or equal to the maximum bit rate allowed for constrained parameters video elementary streams of stream
type 0x01 (refer to ISO/IEC 11172-2).
When a byte of data is transferred from MBn to EBn, all PES packet header bytes that are in MBn and immediately
precede that byte are instantaneously removed and discarded. All data that enters MBn leaves it. All PES packet payload
data bytes enter EBn instantaneously upon leaving MBn.
Removal of access units
For each elementary stream buffer EBn and main buffer Bn, all data for the access unit that has been in the buffer
longest, An(j), together with any stuffing bytes immediately preceding it that are present in the buffer at time tdn(j), is
removed instantaneously at time tdn(j). The decoding time tdn(j) is specified in the DTS or PTS fields (refer to 2.4.3.6).
Decoding times tdn(j + 1), tdn(j + 2), ... of access units without encoded DTS or PTS fields which directly follow access
unit j may be derived from information in the elementary stream. Refer to Annex C of ITU-T Rec. H.262 | ISO/IEC
13818-2, ISO/IEC 13818-3, or ISO/IEC 11172. Also refer to 2.7.5. In the case of audio, all PES packet headers that are
stored immediately before the access unit or that are embedded within the data of the access unit are removed
simultaneously with the removal of the access unit. As the access unit is removed it is instantaneously decoded to a
presentation unit.
System data
In the case of system data, data is removed from the main buffer Bsys at a rate of Rsys whenever there is at least 1 byte
available in buffer Bsys.
$$R_{sys} = \max\left(80\,000\ \text{bit/s},\ \frac{\mathit{transport\_rate}(i) \times 8\ \text{bit/byte}}{500}\right) \tag{2-7}$$
NOTE 4 – The intention of increasing Rsys in the case of high transport rates is to allow an increased data rate for the Program
Specific Information.
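Equation 2-7 reduces to a one-line computation. In this sketch, transport_rate is taken in bytes per second, which is what the explicit 8 bits/byte factor implies; the function name is hypothetical.

```python
def system_data_rate(transport_rate_bytes_per_s: float) -> float:
    """Rsys in bit/s per equation 2-7."""
    # The floor of 80 000 bit/s applies until transport_rate x 8 / 500 exceeds it.
    return max(80_000.0, transport_rate_bytes_per_s * 8 / 500)
```

The second term only takes over above 5 000 000 bytes/s (40 Mbit/s), where transport_rate × 8 / 500 exceeds 80 000 bit/s.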
Low delay
When the low_delay flag in the video sequence extension is set to '1' (see 6.2.2.3 of ITU-T Rec. H.262 |
ISO/IEC 13818-2) the EBn buffer may underflow. In this case, when the T-STD elementary stream buffer EBn is
examined at the time specified by tdn(j), the complete data for the access unit may not be present in the buffer EBn.
When this case arises, the buffer shall be re-examined at intervals of two field-periods until the data for the complete
access unit is present in the buffer. At this time the entire access unit shall be removed from buffer EBn instantaneously.
Overflow of buffer EBn shall not occur.
When the low_delay flag is set to '1', EBn underflow is allowed to occur continuously without limit. The T-STD
decoder shall remove access unit data from buffer EBn at the earliest time consistent with the paragraph above and any
DTS or PTS values encoded in the bit stream. Note that the decoder may be unable to re-establish correct decoding and
display times as indicated by DTS and PTS until the EBn buffer underflow situation ceases and a PTS or DTS is found
in the bit stream.
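Under low_delay, the removal time of an access unit that is incomplete at tdn(j) can be found by stepping forward in two-field-period increments. This sketch assumes integer time units (for example 90 kHz ticks, where one field period at 50 fields/s would be 1800 ticks), assumes that data arriving exactly at a re-examination instant counts as present, and uses a hypothetical function name.

```python
def low_delay_removal_time(td: int, complete_at: int, field_period: int) -> int:
    """Return when access unit j leaves EBn when low_delay is set.

    td:           nominal decoding time tdn(j)
    complete_at:  time at which the last byte of the access unit enters EBn
    field_period: duration of one field, in the same integer units
    """
    if complete_at <= td:
        return td                                # complete in time: normal removal
    step = 2 * field_period                      # re-examine every two field periods
    waits = -(-(complete_at - td) // step)       # ceiling division
    return td + waits * step
```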
Trick mode
When the DSM_trick_mode flag (2.4.3.6) is set to '1' in the PES packet header of a packet containing the start of a
B-type video access unit, and the trick_mode_control field is set to '001' (slow motion), '010' (freeze frame) or '100'
(slow reverse), the B-picture access unit is not removed from the video data buffer EBn until the last of the (possibly
multiple) times that any field of the picture is decoded and presented. Repetition of the presentation of fields and pictures
is defined in 2.4.3.8 under slow motion, slow reverse, and field_id_cntrl. The access unit is removed instantaneously
from EBn at the indicated time, which depends on the value of rep_cntrl.
When the DSM_trick_mode flag is set to '1' in the PES packet header of a packet containing the first byte of a picture
start code, trick mode status becomes true when that picture start code is removed from buffer EBn.
Trick mode status remains true until the T-STD receives a PES packet header in which the DSM_trick_mode flag is set
to '0' and the first byte of the picture start code following that PES packet header is removed from buffer EBn. When
trick mode status is true, buffer EBn may underflow. All other constraints that apply to normal streams are retained
while trick mode status is true.
2.4.2.4
Decoding
Elementary streams buffered in B1 through Bn and EB1 through EBn are decoded instantaneously by decoders D1
through Dn and may be delayed in re-order buffers O1 through On before being presented at the output of the T-STD.
Re-order buffers are used only in the case of a video elementary stream when some access units are not carried in
presentation order. These access units will need to be re-ordered before presentation. In particular, if Pn(k) is an
I-picture or a P-picture carried before one or more B-pictures, then it must be delayed in the re-order buffer, On, of the
T-STD before being presented. Any picture previously stored in On is presented before the current picture can be stored.
Pn(k) should be delayed until the next I-picture or P-picture is decoded. While it is stored in the re-order buffer, the
subsequent B-pictures are decoded and presented.
The time at which a presentation unit Pn(k) is presented is tpn(k). For presentation units that do not require re-ordering
delay, tpn(k) is equal to tdn(j) since the access units are decoded instantaneously; this is the case, for example, for
B-pictures. For presentation units that are delayed, tpn(k) and tdn(j) differ by the time that Pn(k) is delayed in the re-order
buffer, which is a multiple of the nominal picture period. Care should be taken to use adequate re-ordering delay from
the beginning of video elementary streams to meet the requirements of the entire stream. For example, a stream which
initially has only I- and P-pictures but later includes B-pictures should include re-ordering delay starting at the
beginning of the stream.
ITU-T Rec. H.262 | ISO/IEC 13818-2 explains re-ordering of video pictures in greater detail.
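The re-ordering rule for I-, P- and B-pictures can be sketched with a single-slot buffer standing in for On. This is an illustration only: pictures are given as hypothetical (label, type) pairs in decoding order, and the stream is assumed to need at most one held I- or P-picture at a time.

```python
def presentation_order(decode_order):
    """Return picture labels in presentation order.

    decode_order: list of (label, picture_type) pairs in decoding order,
                  picture_type being "I", "P" or "B" (hypothetical format).
    """
    presented = []
    held = None                          # picture currently stored in On
    for label, picture_type in decode_order:
        if picture_type == "B":
            presented.append(label)      # B-pictures are presented as soon as decoded
        else:
            if held is not None:
                presented.append(held)   # previous I/P picture leaves On first
            held = label                 # this I/P picture waits for the next I/P
    if held is not None:
        presented.append(held)           # end of stream: flush On
    return presented
```

For the decoding order I0 P3 B1 B2 P6 B4 B5 this yields I0 B1 B2 P3 B4 B5 P6. Because every I- and P-picture passes through On, a stream that starts with only I- and P-pictures already carries the re-ordering delay recommended above.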
2.4.2.5
Presentation
The function of a decoding system is to reconstruct presentation units from compressed data and to present them in a
synchronized sequence at the correct presentation times. Although real audio and visual presentation devices generally
have finite and different delays and may have additional delays imposed by post-processing or output functions, the
system target decoder models these delays as zero.
In the T-STD in Figure 2-1 the display of a video presentation unit (a picture) occurs instantaneously at its presentation
time, tpn(k).
In the T-STD the output of an audio presentation unit starts at its presentation time, tpn(k), when the decoder
instantaneously presents the first sample. Subsequent samples in the presentation unit are presented in sequence at the
audio sampling rate.
2.4.2.6
Buffer management
Transport Streams shall be constructed so that conditions defined in this subclause are satisfied. This subclause makes
use of the notation defined for the System Target Decoder.
TBn and TBsys shall not overflow, and each shall empty at least once every second. Bn shall neither overflow nor
underflow. Bsys shall not overflow.
EBn shall not underflow except when the low_delay flag in the video sequence extension is set to '1' (refer to 6.2.2.3 in
ITU-T Rec. H.262 | ISO/IEC 13818-2) or trick mode status is true.
When the leak method for specifying transfers is in effect, MBn shall not overflow and shall empty at least once every
second, and EBn shall not overflow.
When the vbv_delay method for specifying transfers is in effect, MBn shall neither overflow nor underflow, and EBn shall
not overflow.
The delay of any data through the System Target Decoder buffers shall be less than or equal to one second except for
still picture video data and ISO/IEC 14496 streams. Specifically: tdn(j) – t(i) ≤ 1 second for all j, and all bytes i in
access unit An(j).
For still picture video data, the delay is constrained by tdn(j) – t(i) ≤ 60 seconds for all j, and all bytes i in access
unit An(j).
For ISO/IEC 14496 streams, the delay is constrained by tdn(j) – t(i) ≤ 10 seconds for all j, and all bytes i in access
unit An(j).
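The three delay bounds can be checked per access unit as in the sketch below. The category keys and function name are hypothetical, and byte arrival times t(i) and the decoding time tdn(j) are assumed to be given in seconds.

```python
# Maximum tdn(j) - t(i), in seconds, for each kind of data (keys are hypothetical).
DELAY_LIMITS_S = {"default": 1.0, "still_picture": 60.0, "iso_iec_14496": 10.0}


def delay_ok(td_j: float, byte_arrival_times, kind: str = "default") -> bool:
    """Check tdn(j) - t(i) <= limit for every byte i of access unit An(j)."""
    limit = DELAY_LIMITS_S[kind]
    return all(td_j - t_i <= limit for t_i in byte_arrival_times)
```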
Definition of overflow and underflow
Let Fn(t) be the instantaneous fullness of T-STD buffer Bn, with
$$F_n(t) = 0 \quad \text{instantaneously before } t = t(0)$$
Overflow does not occur if:
$$F_n(t) \le BS_n \quad \text{for all } t \text{ and } n$$
Underflow does not occur if:
$$0 \le F_n(t) \quad \text{for all } t \text{ and } n$$
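The two inequalities can be tested against a buffer trace. The sketch below walks a time-ordered list of fullness changes for one buffer; it assumes the caller has already merged simultaneous arrivals and removals into single signed deltas, and the function name is hypothetical.

```python
def check_buffer(deltas, size: int) -> str:
    """Classify a time-ordered sequence of fullness changes for one buffer.

    deltas: signed byte counts (+ for data entering, - for data leaving)
    size:   buffer size BSn in bytes
    """
    fullness = 0                     # Fn(t) = 0 before t(0)
    for delta in deltas:
        fullness += delta
        if fullness > size:
            return "overflow"        # Fn(t) <= BSn violated
        if fullness < 0:
            return "underflow"       # 0 <= Fn(t) violated
    return "ok"
```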
2.4.2.7
T-STD extensions for carriage of ISO/IEC 14496 data
For decoding of ISO/IEC 14496 data carried in a Transport Stream the T-STD model is extended. T-STD parameters
for decoding of individual ISO/IEC 14496 elementary streams are defined in 2.11.2, while 2.11.3 defines
T-STD extensions and parameters for decoding of ISO/IEC 14496 scenes and associated streams.
2.4.2.8
T-STD extensions for carriage of ITU-T Rec. H.264 | ISO/IEC 14496-10 video
To define the decoding in the T-STD of ITU-T Rec. H.264 | ISO/IEC 14496-10 video streams carried in a Transport
Stream, the T-STD model needs to be extended. The T-STD extension and T-STD parameters for decoding of ITU-T
Rec. H.264 | ISO/IEC 14496-10 video streams are defined in 2.14.3.1.
2.4.3
Specification of the Transport Stream syntax and semantics
The following syntax describes a stream of bytes. Transport Stream packets shall be 188 bytes long.
2.4.3.1
Transport Stream
See Table 2-1.
Table 2-1 – Transport Stream
Syntax                                                  No. of bits   Mnemonic
MPEG_transport_stream() {
    do {
        transport_packet()
    } while (nextbits() == sync_byte)
}
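The do-while loop of Table 2-1 amounts to reading 188-byte packets for as long as the next byte is the sync_byte. A minimal sketch, assuming the input already starts on a packet boundary and that a trailing partial packet is simply dropped (the names are hypothetical; unlike the do-while form, it also checks the very first packet):

```python
from typing import Iterator

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def split_packets(data: bytes) -> Iterator[bytes]:
    """Yield 188-byte packets for as long as the next byte is sync_byte."""
    pos = 0
    while pos + TS_PACKET_SIZE <= len(data) and data[pos] == SYNC_BYTE:
        yield data[pos:pos + TS_PACKET_SIZE]
        pos += TS_PACKET_SIZE
```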
2.4.3.2
Transport Stream packet layer
See Table 2-2.
Table 2-2 – Transport packet of this Recommendation | International Standard
Syntax                                                  No. of bits   Mnemonic
transport_packet(){
    sync_byte                                           8             bslbf
    transport_error_indicator                           1             bslbf
    payload_unit_start_indicator                        1             bslbf
    transport_priority                                  1             bslbf
    PID                                                 13            uimsbf
    transport_scrambling_control                        2             bslbf
    adaptation_field_control                            2             bslbf
    continuity_counter                                  4             uimsbf
    if(adaptation_field_control == '10' || adaptation_field_control == '11'){
        adaptation_field()
    }
    if(adaptation_field_control == '01' || adaptation_field_control == '11') {
        for (i = 0; i < N; i++){
            data_byte                                   8             bslbf
        }
    }
}
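The fixed 4-byte header of Table 2-2 can be unpacked with shifts and masks. This sketch follows the bit widths in the table; the `TSHeader` tuple and the `parse_header` name are hypothetical.

```python
from typing import NamedTuple


class TSHeader(NamedTuple):
    transport_error_indicator: int
    payload_unit_start_indicator: int
    transport_priority: int
    pid: int
    transport_scrambling_control: int
    adaptation_field_control: int
    continuity_counter: int


def parse_header(packet: bytes) -> TSHeader:
    """Decode the 32-bit header that follows Table 2-2."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("expected a 188-byte packet starting with sync_byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TSHeader(
        transport_error_indicator=b1 >> 7,              # 1 bit
        payload_unit_start_indicator=(b1 >> 6) & 0x1,   # 1 bit
        transport_priority=(b1 >> 5) & 0x1,             # 1 bit
        pid=((b1 & 0x1F) << 8) | b2,                    # 13 bits
        transport_scrambling_control=b3 >> 6,           # 2 bits
        adaptation_field_control=(b3 >> 4) & 0x3,       # 2 bits
        continuity_counter=b3 & 0x0F,                   # 4 bits
    )
```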
2.4.3.3
Semantic definition of fields in Transport Stream packet layer
sync_byte – The sync_byte is a fixed 8-bit field whose value is '0100 0111' (0x47). Sync_byte emulation in the choice
of values for other regularly occurring fields, such as PID, should be avoided.
transport_error_indicator – The transport_error_indicator is a 1-bit flag. When set to '1' it indicates that at least
1 uncorrectable bit error exists in the associated Transport Stream packet. This bit may be set to '1' by entities external
to the transport layer. When set to '1' this bit shall not be reset to '0' unless the bit value(s) in error have been corrected.
payload_unit_start_indicator – The payload_unit_start_indicator is a 1-bit flag which has normative meaning for
Transport Stream packets that carry PES packets (refer to 2.4.3.6) or PSI data (refer to 2.4.4).
When the payload of the Transport Stream packet contains PES packet data, the payload_unit_start_indicator has the
following significance: a '1' indicates that the payload of this Transport Stream packet will commence with the first byte
of a PES packet and a '0' indicates no PES packet shall start in this Transport Stream packet. If the
payload_unit_start_indicator is set to '1', then one and only one PES packet starts in this Transport Stream packet. This
also applies to private streams of stream_type 6 (refer to Table 2-34).
When the payload of the Transport Stream packet contains PSI data, the payload_unit_start_indicator has the following
significance: if the Transport Stream packet carries the first byte of a PSI section, the payload_unit_start_indicator value
shall be '1', indicating that the first byte of the payload of this Transport Stream packet carries the pointer_field. If the
Transport Stream packet does not carry the first byte of a PSI section, the payload_unit_start_indicator value shall be '0',
indicating that there is no pointer_field in the payload. Refer to 2.4.4.1 and 2.4.4.2. This also applies to private streams
of stream_type 5 (refer to Table 2-34).
For null packets the payload_unit_start_indicator shall be set to '0'.
The meaning of this bit for Transport Stream packets carrying only private data is not defined in this Specification.
transport_priority – The transport_priority is a 1-bit indicator. When set to '1' it indicates that the associated packet is
of greater priority than other packets having the same PID which do not have the bit set to '1'. The transport mechanism
can use this to prioritize its data within an elementary stream. Depending on the application the transport_priority field
may be coded regardless of the PID or within one PID only. This field may be changed by channel-specific encoders or
decoders.
PID – The PID is a 13-bit field indicating the type of the data stored in the packet payload. PID value 0x0000 is
reserved for the Program Association Table (see Table 2-30). PID value 0x0001 is reserved for the Conditional Access
Table (see Table 2-32). PID value 0x0002 is reserved for the Transport Stream Description Table (see Table 2-36).
PID value 0x0003 is reserved for the IPMP Control Information Table (see ISO/IEC 13818-11), and PID values
0x0004-0x000F are reserved. PID value 0x1FFF is reserved for null packets (see Table 2-3).
Table 2-3 – PID table
Value           Description
0x0000          Program Association Table
0x0001          Conditional Access Table
0x0002          Transport Stream Description Table
0x0003          IPMP Control Information Table
0x0004-0x000F   Reserved
0x0010-0x1FFE   May be assigned as network_PID, Program_map_PID, elementary_PID, or for other purposes
0x1FFF          Null packet
NOTE – The transport packets with PID values 0x0000, 0x0001, and 0x0010-0x1FFE are allowed to carry a PCR.
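Table 2-3 and its NOTE translate into a lookup plus a range test. The dictionary, return strings and function names below are hypothetical; only the PID values come from the table.

```python
FIXED_PIDS = {
    0x0000: "Program Association Table",
    0x0001: "Conditional Access Table",
    0x0002: "Transport Stream Description Table",
    0x0003: "IPMP Control Information Table",
    0x1FFF: "Null packet",
}


def classify_pid(pid: int) -> str:
    """Map a 13-bit PID to its Table 2-3 category."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID is a 13-bit field")
    if pid in FIXED_PIDS:
        return FIXED_PIDS[pid]
    if pid <= 0x000F:
        return "Reserved"            # 0x0004-0x000F
    return "Assignable"              # 0x0010-0x1FFE


def may_carry_pcr(pid: int) -> bool:
    """True for the PID values that the NOTE allows to carry a PCR."""
    return pid in (0x0000, 0x0001) or 0x0010 <= pid <= 0x1FFE
```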
transport_scrambling_control – This 2-bit field indicates the scrambling mode of the Transport Stream packet
payload. The Transport Stream packet header, and the adaptation field when present, shall not be scrambled. In the case
of a null packet the value of the transport_scrambling_control field shall be set to '00' (see Table 2-4).
Table 2-4 – Scrambling control values
Value   Description
00      Not scrambled
01      User-defined
10      User-defined
11      User-defined
adaptation_field_control – This 2-bit field indicates whether this Transport Stream packet header is followed by an
adaptation field and/or payload (see Table 2-5).
Table 2-5 – Adaptation field control values
Value   Description
00      Reserved for future use by ISO/IEC
01      No adaptation_field, payload only
10      Adaptation_field only, no payload
11      Adaptation_field followed by payload
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 decoders shall discard Transport Stream packets with the
adaptation_field_control field set to a value of '00'. In the case of a null packet the value of the adaptation_field_control
shall be set to '01'.
continuity_counter – The continuity_counter is a 4-bit field incrementing with each Transport Stream packet with the
same PID. The continuity_counter wraps around to 0 after its maximum value. The continuity_counter shall not be
incremented when the adaptation_field_control of the packet equals '00' or '10'.
In Transport Streams, duplicate packets may be sent as two, and only two, consecutive Transport Stream packets of the
same PID. The duplicate packets shall have the same continuity_counter value as the original packet and the
adaptation_field_control field shall be equal to '01' or '11'. In duplicate packets each byte of the original packet shall be
duplicated, with the exception that in the program clock reference fields, if present, a valid value shall be encoded.
The continuity_counter in a particular Transport Stream packet is continuous when it is one greater (modulo 16, given the
wrap-around) than the continuity_counter value in the previous Transport Stream packet of the same PID, or when either of
the non-incrementing conditions (adaptation_field_control set to '00' or '10', or duplicate packets as described above) is met.
The continuity counter may be discontinuous when the discontinuity_indicator is set to '1' (refer to 2.4.3.4). In the case
of a null packet the value of the continuity_counter is undefined.
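A receiver-side continuity check might be sketched as follows. The name `cc_ok` is hypothetical; the caller is assumed to decide whether a packet is a duplicate (for example by comparing its bytes with the previous packet) and whether the discontinuity state is true, to skip null packets, whose counter is undefined, and to apply separately the stricter rules for PCR_PIDs given under discontinuity_indicator.

```python
def cc_ok(prev_cc: int, cc: int, adaptation_field_control: int,
          duplicate: bool = False, discontinuity: bool = False) -> bool:
    """Return True if cc is acceptable after prev_cc on the same PID."""
    if discontinuity:
        return True                      # discontinuity state true: a jump is allowed
    if adaptation_field_control in (0b00, 0b10) or duplicate:
        return cc == prev_cc             # counter is not incremented here
    return cc == (prev_cc + 1) % 16      # 4-bit counter wraps from 15 to 0
```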
data_byte – Data bytes shall be contiguous bytes of data from the PES packets (refer to 2.4.3.6), PSI sections (refer
to 2.4.4), packet stuffing bytes after PSI sections, or private data not in these structures as indicated by the PID. In the
case of null packets with PID value 0x1FFF, data_bytes may be assigned any value. The number of data_bytes, N, is
specified by 184 minus the number of bytes in the adaptation_field(), as described in 2.4.3.4 below.
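Since the adaptation_field() includes its own length byte, N is 184 – (1 + adaptation_field_length) when an adaptation field is present. A minimal sketch that returns where the data_bytes start and how many there are (the function name is hypothetical):

```python
def payload_span(packet: bytes) -> tuple[int, int]:
    """Return (offset, N) of the data_bytes in a 188-byte packet."""
    afc = (packet[3] >> 4) & 0x3                 # adaptation_field_control
    if afc == 0b00:
        raise ValueError("adaptation_field_control '00': packet is discarded")
    offset = 4                                   # fixed header
    if afc in (0b10, 0b11):
        offset += 1 + packet[4]                  # length byte + adaptation field body
    n = 188 - offset if afc in (0b01, 0b11) else 0
    return offset, n
```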
2.4.3.4
Adaptation field
See Table 2-6.
Table 2-6 – Transport Stream adaptation field
Syntax                                                  No. of bits   Mnemonic
adaptation_field() {
    adaptation_field_length                             8             uimsbf
    if (adaptation_field_length > 0) {
        discontinuity_indicator                         1             bslbf
        random_access_indicator                         1             bslbf
        elementary_stream_priority_indicator            1             bslbf
        PCR_flag                                        1             bslbf
        OPCR_flag                                       1             bslbf
        splicing_point_flag                             1             bslbf
        transport_private_data_flag                     1             bslbf
        adaptation_field_extension_flag                 1             bslbf
        if (PCR_flag == '1') {
            program_clock_reference_base                33            uimsbf
            reserved                                    6             bslbf
            program_clock_reference_extension           9             uimsbf
        }
        if (OPCR_flag == '1') {
            original_program_clock_reference_base       33            uimsbf
            reserved                                    6             bslbf
            original_program_clock_reference_extension  9             uimsbf
        }
        if (splicing_point_flag == '1') {
            splice_countdown                            8             tcimsbf
        }
        if (transport_private_data_flag == '1') {
            transport_private_data_length               8             uimsbf
            for (i = 0; i < transport_private_data_length; i++) {
                private_data_byte                       8             bslbf
            }
        }
        if (adaptation_field_extension_flag == '1') {
            adaptation_field_extension_length           8             uimsbf
            ltw_flag                                    1             bslbf
            piecewise_rate_flag                         1             bslbf
            seamless_splice_flag                        1             bslbf
            reserved                                    5             bslbf
            if (ltw_flag == '1') {
                ltw_valid_flag                          1             bslbf
                ltw_offset                              15            uimsbf
            }
            if (piecewise_rate_flag == '1') {
                reserved                                2             bslbf
                piecewise_rate                          22            uimsbf
            }
            if (seamless_splice_flag == '1') {
                splice_type                             4             bslbf
                DTS_next_AU[32..30]                     3             bslbf
                marker_bit                              1             bslbf
                DTS_next_AU[29..15]                     15            bslbf
                marker_bit                              1             bslbf
                DTS_next_AU[14..0]                      15            bslbf
                marker_bit                              1             bslbf
            }
            for (i = 0; i < N; i++) {
                reserved                                8             bslbf
            }
        }
        for (i = 0; i < N; i++) {
            stuffing_byte                               8             bslbf
        }
    }
}
2.4.3.5
Semantic definition of fields in adaptation field
adaptation_field_length – The adaptation_field_length is an 8-bit field specifying the number of bytes in the
adaptation_field immediately following the adaptation_field_length. The value '0' is for inserting a single stuffing byte
in a Transport Stream packet. When the adaptation_field_control value is '11', the value of the adaptation_field_length
shall be in the range 0 to 182. When the adaptation_field_control value is '10', the value of the adaptation_field_length
shall be 183. For Transport Stream packets carrying PES packets, stuffing is needed when there is insufficient
PES packet data to completely fill the Transport Stream packet payload bytes. Stuffing is accomplished by defining an
adaptation field longer than the sum of the lengths of the data elements in it, so that the payload bytes remaining after
the adaptation field exactly accommodate the available PES packet data. The extra space in the adaptation field is filled
with stuffing bytes.
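For a packet that needs no other adaptation field content, the stuffing procedure above can be sketched as follows. Assumptions: all eight flags are written as '0', the stuffing byte value is taken as 0xFF (the value given for stuffing_byte elsewhere in this Specification), and the function name is hypothetical.

```python
def stuffing_adaptation_field(payload_len: int) -> bytes:
    """Adaptation field (length byte included) that pads a packet carrying
    payload_len bytes of PES packet data to exactly 188 bytes."""
    if not 0 < payload_len <= 184:
        raise ValueError("payload_len must be in 1..184")
    af_total = 184 - payload_len          # bytes the adaptation field must occupy
    if af_total == 0:
        return b""                        # adaptation_field_control '01', no field
    if af_total == 1:
        return bytes([0])                 # adaptation_field_length = 0: one stuffing byte
    # length byte, flags byte (all flags '0'), then stuffing bytes
    return bytes([af_total - 1, 0x00]) + b"\xff" * (af_total - 2)
```

With one byte of PES data the length becomes 182, the maximum allowed when adaptation_field_control is '11'.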
This is the only method of stuffing allowed for Transport Stream packets carrying PES packets. For Transport Stream
packets carrying PSI, an alternative stuffing method is described in 2.4.4.
discontinuity_indicator – This is a 1-bit field which when set to '1' indicates that the discontinuity state is true for the
current Transport Stream packet. When the discontinuity_indicator is set to '0' or is not present, the discontinuity state is
false. The discontinuity indicator is used to indicate two types of discontinuities, system time-base discontinuities and
continuity_counter discontinuities.
A system time-base discontinuity is indicated by the use of the discontinuity_indicator in Transport Stream packets of a
PID designated as a PCR_PID (refer to 2.4.4.9). When the discontinuity state is true for a Transport Stream packet of a
PID designated as a PCR_PID, the next PCR in a Transport Stream packet with that same PID represents a sample of a
new system time clock for the associated program. The system time-base discontinuity point is defined to be the instant
in time when the first byte of a packet containing a PCR of a new system time-base arrives at the input of the T-STD.
The discontinuity_indicator shall be set to '1' in the packet in which the system time-base discontinuity occurs. The
discontinuity_indicator bit may also be set to '1' in Transport Stream packets of the same PCR_PID prior to the packet
which contains the new system time-base PCR. In this case, once the discontinuity_indicator has been set to '1', it shall
continue to be set to '1' in all Transport Stream packets of the same PCR_PID up to and including the Transport Stream
packet which contains the first PCR of the new system time-base. After the occurrence of a system time-base
discontinuity, no fewer than two PCRs for the new system time-base shall be received before another system time-base
discontinuity can occur. Further, except when trick mode status is true, data from no more than two system time-bases
shall be present in the set of T-STD buffers for one program at any time.
Prior to the occurrence of a system time-base discontinuity, the first byte of a Transport Stream packet which contains a
PTS or DTS which refers to the new system time-base shall not arrive at the input of the T-STD. After the occurrence of
a system time-base discontinuity, the first byte of a Transport Stream packet which contains a PTS or DTS which refers
to the previous system time-base shall not arrive at the input of the T-STD.
A continuity_counter discontinuity is indicated by the use of the discontinuity_indicator in any Transport Stream
packet. When the discontinuity state is true in any Transport Stream packet of a PID not designated as a PCR_PID, the
continuity_counter in that packet may be discontinuous with respect to the previous Transport Stream packet of the
same PID. When the discontinuity state is true in a Transport Stream packet of a PID that is designated as a PCR_PID,
the continuity_counter may only be discontinuous in the packet in which a system time-base discontinuity occurs. A
continuity counter discontinuity point occurs when the discontinuity state is true in a Transport Stream packet and the
continuity_counter in the same packet is discontinuous with respect to the previous Transport Stream packet of the same
PID. A continuity counter discontinuity point shall occur at most one time from the initiation of the discontinuity state
until the conclusion of the discontinuity state. Furthermore, for all PIDs that are not designated as PCR_PIDs, when the
discontinuity_indicator is set to '1' in a packet of a specific PID, the discontinuity_indicator may be set to '1' in the next
Transport Stream packet of that same PID, but shall not be set to '1' in three consecutive Transport Stream packets of that
same PID.
For the purpose of this clause, an elementary stream access point is defined as follows:
• ISO/IEC 11172-2 video and ITU-T Rec. H.262 | ISO/IEC 13818-2 video – The first byte of a video
sequence header.
• ISO/IEC 14496-2 visual – The first byte of the visual object sequence header.
• ITU-T Rec. H.264 | ISO/IEC 14496-10 video – The first byte of an AVC access unit. The SPS and PPS
parameter sets referenced in this and all subsequent AVC access units in the coded video stream shall be
provided after this access point in the byte stream and prior to their activation.
• Audio – The first byte of an audio frame.
After a continuity counter discontinuity in a Transport Stream packet which is designated as containing elementary stream data,
the first byte of elementary stream data in a Transport Stream packet of the same PID shall be the first byte of an
elementary stream access point. In the case of ISO/IEC 11172-2, or ITU-T Rec. H.262 | ISO/IEC 13818-2 or
ISO/IEC 14496-2 video, the first byte of an elementary stream access point may also be the first byte of a
sequence_end_code followed by an elementary stream access point.
Each Transport Stream packet which contains elementary stream data with a PID not designated as a PCR_PID, and in
which a continuity counter discontinuity point occurs, and in which a PTS or DTS occurs, shall arrive at the input of the
T-STD after the system time-base discontinuity for the associated program occurs. In the case where the discontinuity
state is true, if two consecutive Transport Stream packets of the same PID occur which have the same
continuity_counter value and have adaptation_field_control values set to '01' or '11', the second packet may be
discarded. A Transport Stream shall not be constructed in such a way that discarding such a packet will cause the loss of
PES packet payload data or PSI data.
After the occurrence of a discontinuity_indicator set to '1' in a Transport Stream packet which contains PSI information,
a single discontinuity in the version_number of PSI sections may occur. At the occurrence of such a discontinuity, a
version of the TS_program_map_section of the appropriate program shall be sent with section_length == 13 and the
current_next_indicator == 1, such that there are no program_descriptors and no elementary streams described. This
shall then be followed by a version of the TS_program_map_section for each affected program with the
version_number incremented by one and the current_next_indicator == 1, containing a complete program definition.
This indicates a version change in PSI data.
random_access_indicator – The random_access_indicator is a 1-bit field that indicates that the current Transport
Stream packet, and possibly subsequent Transport Stream packets with the same PID, contain some information to aid
random access at this point.
Specifically, when the bit is set to '1', the next PES packet to start in the payload of Transport Stream packets with the
current PID shall contain an elementary stream access point as defined in the semantics for the discontinuity_indicator
field. In addition, in the case of video, a presentation timestamp shall be present for the first picture following the
elementary stream access point.
In the case of audio, the presentation timestamp shall be present in the PES packet containing the first byte of the audio
frame. In the PCR_PID the random_access_indicator may only be set to '1' in Transport Stream packets containing the
PCR fields.
elementary_stream_priority_indicator – The elementary_stream_priority_indicator is a 1-bit field. It indicates,
among packets with the same PID, the priority of the elementary stream data carried within the payload of this
Transport Stream packet. A '1' indicates that the payload has a higher priority than the payloads of other Transport
Stream packets.
In the case of ISO/IEC 11172-2 or ITU-T Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 14496-2 video, this field may be
set to '1' only if the payload contains one or more bytes from an intra-coded slice.
In the case of ITU-T Rec. H.264 | ISO/IEC 14496-10 video, this field may be set to '1' only if the payload contains one
or more bytes from a slice with slice_type set to 2, 4, 7, or 9.
A value of '0' indicates that the payload has the same priority as all other packets which do not have this bit set to '1'.
PCR_flag – The PCR_flag is a 1-bit flag. A value of '1' indicates that the adaptation_field contains a PCR field coded
in two parts. A value of '0' indicates that the adaptation field does not contain any PCR field.
OPCR_flag – The OPCR_flag is a 1-bit flag. A value of '1' indicates that the adaptation_field contains an OPCR field
coded in two parts. A value of '0' indicates that the adaptation field does not contain any OPCR field.
splicing_point_flag – The splicing_point_flag is a 1-bit flag. When set to '1', it indicates that a splice_countdown field
shall be present in the associated adaptation field, specifying the occurrence of a splicing point. A value of '0' indicates
that a splice_countdown field is not present in the adaptation field.
transport_private_data_flag – The transport_private_data_flag is a 1-bit flag. A value of '1' indicates that the
adaptation field contains one or more private_data bytes. A value of '0' indicates the adaptation field does not contain
any private_data bytes.
adaptation_field_extension_flag – The adaptation_field_extension_flag is a 1-bit field which when set to '1' indicates
the presence of an adaptation field extension. A value of '0' indicates that an adaptation field extension is not present in
the adaptation field.
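The eight 1-bit fields that follow adaptation_field_length share one byte, most significant bit first, in the order of Table 2-6. A minimal decoding sketch (the tuple and function names are hypothetical):

```python
FLAG_NAMES = (
    "discontinuity_indicator", "random_access_indicator",
    "elementary_stream_priority_indicator", "PCR_flag", "OPCR_flag",
    "splicing_point_flag", "transport_private_data_flag",
    "adaptation_field_extension_flag",
)


def parse_af_flags(flags_byte: int) -> dict:
    """Split the flags byte of an adaptation field into named 0/1 values."""
    return {name: (flags_byte >> (7 - i)) & 1 for i, name in enumerate(FLAG_NAMES)}
```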
program_clock_reference_base; program_clock_reference_extension – The program_clock_reference (PCR) is a
42-bit field coded in two parts. The first part, program_clock_reference_base, is a 33-bit field whose value is given by
PCR_base(i), as given in equation 2-2. The second part, program_clock_reference_extension, is a 9-bit field whose
value is given by PCR_ext(i), as given in equation 2-3. The PCR indicates the intended time of arrival of the byte
containing the last bit of the program_clock_reference_base at the input of the system target decoder.
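The 33-bit base, the 6 reserved bits and the 9-bit extension occupy exactly 6 bytes, so the PCR can be unpacked from the bytes that follow the flags byte when PCR_flag is '1'. This sketch assumes the caller passes those 6 bytes and recombines the parts as PCR_base × 300 + PCR_ext, the same form as the OPCR in equation 2-8, giving a count of 27 MHz system clock cycles; the names are hypothetical.

```python
SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz


def parse_pcr(field: bytes) -> tuple[int, int, int]:
    """Split a 6-byte PCR field into (base, extension, full 27 MHz value)."""
    if len(field) != 6:
        raise ValueError("the PCR field occupies 48 bits (6 bytes)")
    bits = int.from_bytes(field, "big")
    base = bits >> 15                 # top 33 bits: program_clock_reference_base
    ext = bits & 0x1FF                # low 9 bits: program_clock_reference_extension
    return base, ext, base * 300 + ext
```

Dividing the third value by `SYSTEM_CLOCK_FREQUENCY` expresses the PCR in seconds.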
original_program_clock_reference_base; original_program_clock_reference_extension – The optional original
program clock reference (OPCR) is a 42-bit field coded in two parts. These two parts, the base and the extension, are coded
identically to the two corresponding parts of the PCR field. The presence of the OPCR is indicated by the OPCR_flag.
The OPCR field shall be coded only in Transport Stream packets in which the PCR field is present. OPCRs are
permitted in both single program and multiple program Transport Streams.
OPCR assists in the reconstruction of a single program Transport Stream from another Transport Stream. When
reconstructing the original single program Transport Stream, the OPCR may be copied to the PCR field. The resulting
PCR value is valid only if the original single program Transport Stream is reconstructed exactly in its entirety. This
would include at least any PSI and private data packets which were present in the original Transport Stream and would
possibly require other private arrangements. It also means that the OPCR must be an identical copy of its associated
PCR in the original single program Transport Stream.
The OPCR is expressed as follows:
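$$\mathrm{OPCR}(i) = \mathrm{OPCR\_base}(i) \times 300 + \mathrm{OPCR\_ext}(i) \tag{2-8}$$
$$\mathrm{OPCR\_base}(i) = \left( (\mathrm{system\_clock\_frequency} \times t(i)) \ \mathrm{DIV}\ 300 \right) \bmod 2^{33} \tag{2-9}$$
$$\mathrm{OPCR\_ext}(i) = \left( (\mathrm{system\_clock\_frequency} \times t(i)) \ \mathrm{DIV}\ 1 \right) \bmod 300 \tag{2-10}$$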
where:
The OPCR field is ignored by the decoder. The OPCR field shall not be modified by any multiplexor or decoder.
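The split in equations 2-8 to 2-10 can be sketched in Python as follows. This is a minimal illustration, not a normative implementation: it assumes t(i) has already been converted to an integer count of system_clock_frequency ticks (nominally 27 MHz), and the function names are hypothetical. Because the OPCR base and extension are coded identically to the PCR parts, the same helpers apply to the PCR.

```python
# Hypothetical helpers illustrating equations 2-8 to 2-10.
# Assumption: `ticks` is system_clock_frequency x t(i) as a non-negative integer.

BASE_MODULUS = 1 << 33  # the base wraps modulo 2**33
EXT_MODULUS = 300       # the extension counts 0..299


def split_clock_reference(ticks: int) -> tuple[int, int]:
    """Return (base, ext) for a clock value expressed in system-clock ticks."""
    if ticks < 0:
        raise ValueError("ticks must be non-negative")
    base = (ticks // 300) % BASE_MODULUS  # equation 2-9: (f x t(i)) DIV 300, mod 2**33
    ext = ticks % EXT_MODULUS             # equation 2-10: (f x t(i)) DIV 1, mod 300
    return base, ext


def join_clock_reference(base: int, ext: int) -> int:
    """Equation 2-8: OPCR(i) = OPCR_base(i) x 300 + OPCR_ext(i)."""
    if not (0 <= base < BASE_MODULUS and 0 <= ext < EXT_MODULUS):
        raise ValueError("base or extension out of range")
    return base * 300 + ext


# One second after the clock origin at a nominal 27 MHz system clock.
print(split_clock_reference(27_000_000))  # (90000, 0)
print(split_clock_reference(27_000_299))  # (90000, 299)
```

Since the 33-bit base wraps, join_clock_reference only reproduces the original tick count modulo 300 × 2^33.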
splice_countdown – The splice_countdown is an 8-bit field, representing a value which may be positive or negative. A
positive value specifies the remaining number of Transport Stream packets, of the same PID, following the associated
Transport Stream packet until a splicing point is reached. Duplicate Transport Stream packets and Transport Stream
packets which only contain adaptation fields are excluded. The splicing point is located immediately after the last byte
of the Transport Stream packet in which the associated splice_countdown field reaches zero. In the Transport Stream
packet where the splice_countdown reaches zero, the last data byte of the Transport Stream packet payload shall be the
last byte of a coded audio frame or a coded picture. In the case of video, the corresponding access unit may or may not
be terminated by a sequence_end_code. Transport Stream packets with the same PID, which follow, may contain data
from a different elementary stream of the same type.
The payload of the next Transport Stream packet of the same PID (duplicate packets and packets without payload being
excluded) shall commence with the first byte of a PES packet. In the case of audio, the PES packet payload shall
commence with an access point. In the case of video, the PES packet payload shall commence with an access point, or
with a sequence_end_code, followed by an access point. Thus, the previous coded audio frame or coded picture aligns
with the packet boundary, or is padded to make this so. Subsequent to the splicing point, the countdown field may also
be present. When the splice_countdown is a negative number whose value is minus n (–n), it indicates that the
associated Transport Stream packet is the n-th packet following the splicing point (duplicate packets and packets
without payload being excluded).
For the definition of an elementary stream access point, see the semantics of discontinuity_indicator.
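A receiver-side reading of splice_countdown might look like the sketch below. It assumes the 8-bit field is coded in two's complement (which is what permits the negative values described above) and that duplicate packets and packets without payload have already been discarded; the function names are hypothetical.

```python
def decode_splice_countdown(raw: int) -> int:
    """Interpret the 8-bit splice_countdown field as a signed value.

    Assumption: two's-complement coding, so 0x80..0xFF map to -128..-1.
    """
    if not 0 <= raw <= 0xFF:
        raise ValueError("splice_countdown is an 8-bit field")
    return raw - 0x100 if raw & 0x80 else raw


def describe_splice_position(raw: int) -> str:
    """Relate one packet of a PID to that PID's splicing point."""
    value = decode_splice_countdown(raw)
    if value > 0:
        return f"{value} packet(s) remain before the splicing point"
    if value == 0:
        return "splicing point immediately follows this packet"
    return f"packet {-value} after the splicing point"


for raw in (0x02, 0x01, 0x00, 0xFF, 0xFE):
    print(hex(raw), "->", describe_splice_position(raw))
```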
transport_private_data_length – The transport_private_data_length is an 8-bit field specifying the number of
private_data bytes immediately following the transport_private_data_length field. The number of private_data bytes
shall not be such that private data extends beyond the adaptation field.
private_data_byte – The private_data_byte is an 8-bit field that shall not be specified by ITU-T | ISO/IEC.
adaptation_field_extension_length – The adaptation_field_extension_length is an 8-bit field. It indicates the number
of bytes of the extended adaptation field data immediately following this field, including reserved bytes if present.
ltw_flag (legal time window_flag) – This is a 1-bit field which when set to '1' indicates the presence of the ltw_offset
field.
piecewise_rate_flag – This is a 1-bit field which when set to '1' indicates the presence of the piecewise_rate field.
seamless_splice_flag – This is a 1-bit flag which when set to '1' indicates that the splice_type and DTS_next_AU fields
are present. A value of '0' indicates that neither splice_type nor DTS_next_AU fields are present. This field shall not be
set to '1' in Transport Stream packets in which the splicing_point_flag is not set to '1'. Once it is set to '1' in a Transport
Stream packet in which the splice_countdown is positive, it shall be set to '1' in all the subsequent Transport Stream
packets of the same PID that have the splicing_point_flag set to '1', until the packet in which the splice_countdown
reaches zero (including this packet).
When this flag is set, and if the elementary stream carried in this PID is not an ITU-T Rec. H.262 | ISO/IEC 13818-2
video stream, then the splice_type field shall be set to '0000'. If the elementary stream carried in this PID is an ITU-T
Rec. H.262 | ISO/IEC 13818-2 video stream, it shall fulfil the constraints indicated by the splice_type value.
ltw_valid_flag (legal time window_valid_flag) – This is a 1-bit field which when set to '1' indicates that the value of the
ltw_offset shall be valid. A value of '0' indicates that the value in the ltw_offset field is undefined.
ltw_offset (legal time window offset) – This is a 15-bit field, the value of which is defined only if the ltw_valid_flag has
a value of '1'. When defined, the legal time window offset is in units of (300/fs) seconds, where fs is the system clock
frequency of the program that this PID belongs to, and fulfils:
$$\mathrm{offset} = t_1(i) - t(i)$$
$$\mathrm{ltw\_offset} = \mathrm{offset} \mathbin{/\!/} 1$$
where i is the index of the first byte of this Transport Stream packet, offset is the value encoded in this field, t(i) is the
arrival time of byte i in the T-STD, and t1(i) is the upper bound in time of a time interval called the Legal Time Window
which is associated with this Transport Stream packet.
The Legal Time Window has the property that if this Transport Stream packet is delivered to a T-STD starting at time t1(i),
i.e., at the end of its Legal Time Window, and all other Transport Stream packets of the same program are delivered at the
end of their Legal Time Windows, then:
• For video – The MBn buffer for this PID in the T-STD shall contain less than 184 bytes of elementary
  stream data at the time the first byte of the payload of this Transport Stream packet enters it, and no
  buffer violations in the T-STD shall occur.
• For audio – The Bn buffer for this PID in the T-STD shall contain less than BSdec + 1 bytes of elementary
  stream data at the time the first byte of this Transport Stream packet enters it, and no buffer violations in
  the T-STD shall occur.
Depending on factors including the size of the buffer MBn and the rate of data transfer between MBn and EBn, it is
possible to determine another time t0(i), such that if this packet is delivered anywhere in the interval [t0(i), t1(i)], no
T-STD buffer violations will occur. This time interval is called the Legal Time Window. The value of t0 is not defined
in this Recommendation | International Standard.
The information in this field is intended for devices such as remultiplexers which may need this information in order to
reconstruct the state of the buffers MBn.
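The arithmetic behind ltw_offset can be sketched as follows. Two assumptions are made: both times are integer counts of system-clock ticks, so one ltw_offset unit (300/fs seconds) is exactly 300 ticks; and '//' follows the usual MPEG convention of rounding to the nearest integer, halves away from zero. The function names are hypothetical.

```python
LTW_UNIT_TICKS = 300            # one ltw_offset unit = 300/fs seconds = 300 ticks
LTW_OFFSET_MAX = (1 << 15) - 1  # largest value of the 15-bit field


def ltw_offset_from_ticks(t1_ticks: int, t_ticks: int) -> int:
    """ltw_offset = offset // 1, where offset = t1(i) - t(i) in units of 300/fs."""
    diff = t1_ticks - t_ticks
    if diff < 0:
        raise ValueError("t(i) must not be later than t1(i)")
    # Round-to-nearest for a non-negative value: floor(diff / 300 + 1/2).
    value = (diff + LTW_UNIT_TICKS // 2) // LTW_UNIT_TICKS
    if value > LTW_OFFSET_MAX:
        raise ValueError("offset does not fit in the 15-bit field")
    return value


def window_end_from_offset(t_ticks: int, ltw_offset: int) -> int:
    """Recover t1(i), to within the rounding above, from t(i) and ltw_offset."""
    return t_ticks + ltw_offset * LTW_UNIT_TICKS


print(ltw_offset_from_ticks(27_003_000, 27_000_000))  # 10
```

A remultiplexer could use window_end_from_offset to recover the end of each packet's Legal Time Window when rebuilding the MBn state.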
piecewise_rate – The meaning of this 22-bit field is only defined when both the ltw_flag and the ltw_valid_flag are set
to '1'. When defined, it is a positive integer specifying a hypothetical bitrate R which is used to define the end times of
the Legal Time Windows of Transport Stream packets of the same PID that follow this packet but do not include the
legal_time_window_offset field.
Assume that the first byte of this Transport Stream packet and the N following Transport Stream packets of the same
PID have indices $A_i, A_{i+1}, \ldots, A_{i+N}$, respectively, and that the N latter packets do not have a value encoded in the field
legal_time_window_offset. Then the values $t_1(A_{i+j})$ shall be determined by:
$$t_1(A_{i+j}) = t_1(A_i) + j \times \frac{188 \times 8\ \text{bits/byte}}{R}$$
where j goes from 1 to N.
All packets between this packet and the next packet of the same PID to include a legal_time_window_offset field shall
be treated as if they had the value
$$\mathrm{offset} = t_1(A_i) - t(A_i)$$
encoded in the legal_time_window_offset field, with t1(.) computed by the formula above. t(j) is the arrival time of byte j
in the T-STD.
The meaning of this field is not defined when it is present in a Transport Stream packet with no
legal_time_window_offset field.
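The extrapolation of Legal Time Window end times can be sketched with exact fractions of a second. Here R is taken as a plain bit/s value; how the 22-bit piecewise_rate field value maps to R is not covered in this excerpt, so that conversion is deliberately left out. The nominal fs of 27 MHz and the function names are assumptions.

```python
from fractions import Fraction

TS_PACKET_BITS = 188 * 8  # one Transport Stream packet: 188 bytes x 8 bits/byte


def extrapolate_window_ends(t1_first: Fraction, rate_bps: int, n: int) -> list[Fraction]:
    """t1(A_{i+j}) = t1(A_i) + j x 188 x 8 / R, for j = 1..N (times in seconds)."""
    if rate_bps <= 0:
        raise ValueError("R must be a positive bitrate")
    return [t1_first + Fraction(j * TS_PACKET_BITS, rate_bps) for j in range(1, n + 1)]


def offset_in_ltw_units(t1: Fraction, t: Fraction, fs: int = 27_000_000) -> Fraction:
    """offset = t1 - t expressed in units of 300/fs seconds, before any rounding."""
    return (t1 - t) * fs / 300


# With R = 1504 bit/s each following window ends exactly one second later.
print([float(x) for x in extrapolate_window_ends(Fraction(1, 2), 1504, 3)])  # [1.5, 2.5, 3.5]
```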
splice_type – This is a 4-bit field. From the first occurrence of this field onwards, it shall have the same value in all the
subsequent Transport Stream packets of the same PID in which it is present, until the packet in which the
splice_countdown reaches zero (including this packet). If the elementary stream carried in that PID is not an ITU-T
Rec. H.262 | ISO/IEC 13818-2 video stream, then this field shall have the value '0000'. If the elementary stream carried
in that PID is an ITU-T Rec. H.262 | ISO/IEC 13818-2 video stream, then this field indicates the conditions that shall be
respected by this elementary stream for splicing purposes. These conditions are defined as a function of profile, level
and splice_type in Table 2-7 through Table 2-20.
In these tables, a value for 'splice_decoding_delay' and 'max_splice_rate' means that the following conditions shall be
satisfied by the video elementary stream:
1) The last byte of the coded picture ending in the Transport Stream packet in which the splice_countdown
   reaches zero shall remain in the VBV buffer of the VBV model for an amount of time equal to
   (splice_decoding_delay − ($t_{n+1} - t_n$)), where, for the purpose of this subclause:
   • n is the index of the coded picture ending in the Transport Stream packet in which the
     splice_countdown reaches zero, i.e., the coded picture referred to above.
   • $t_n$ is defined in C.3.1 of ITU-T Rec. H.262 | ISO/IEC 13818-2.
   • ($t_{n+1} - t_n$) is defined in C.9 through C.12 of ITU-T Rec. H.262 | ISO/IEC 13818-2.
   NOTE – $t_n$ is the time when coded picture n is removed from the VBV buffer, and ($t_{n+1} - t_n$) is the duration
   for which picture n is presented.
2) The VBV buffer of the VBV model shall not overflow if its input is switched at the splicing point to a
   stream of a constant rate equal to 'max_splice_rate' for an amount of time equal to
   'splice_decoding_delay'.
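The two conditions can be checked numerically, as in the following simplified sketch. It is an illustration under stated assumptions, not the normative VBV procedure: display_period stands for (t(n+1) minus t(n)), VBV removals after the splicing point are supplied by the caller as instantaneous (time, bits) pairs, underflow is not checked, and the 1 835 008-bit buffer size used in the example is only an illustrative value. All names are hypothetical.

```python
from fractions import Fraction


def required_residence_time(splice_decoding_delay: Fraction, display_period: Fraction) -> Fraction:
    """Condition 1): splice_decoding_delay - (t_{n+1} - t_n), in seconds."""
    return splice_decoding_delay - display_period


def survives_max_rate_splice(occupancy_bits, removals, max_splice_rate,
                             splice_decoding_delay, vbv_size_bits) -> bool:
    """Simplified condition 2): no overflow while filling at max_splice_rate.

    occupancy_bits: VBV fullness at the splicing point.
    removals: iterable of (seconds after the splice, bits removed) pairs.
    """
    removed = 0
    # Fullness peaks just before each instantaneous removal and at the end of the window.
    checkpoints = sorted((t, bits) for t, bits in removals if 0 <= t <= splice_decoding_delay)
    checkpoints.append((splice_decoding_delay, 0))
    for t, bits in checkpoints:
        fullness = occupancy_bits + max_splice_rate * t - removed
        if fullness > vbv_size_bits:
            return False
        removed += bits
    return True


print(required_residence_time(Fraction(120, 1000), Fraction(1, 25)))  # 2/25
```

For splice_type '0000' of Table 2-7, for instance, max_splice_rate × splice_decoding_delay = 15.0 × 10⁶ × 0.120 = 1.8 × 10⁶ bits enter the buffer during the window.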
Table 2-7 – Splice parameters Table 1
Simple Profile Main Level, Main Profile Main Level, SNR Profile Main Level (both layers), Spatial Profile High-1440 Level (base layer), High Profile Main Level (middle + base layers), Multi-view Profile Main Level (base layer) Video
splice_type    Conditions
0000           splice_decoding_delay = 120 ms; max_splice_rate = 15.0 × 10⁶ bit/s
0001           splice_decoding_delay = 150 ms; max_splice_rate = 12.0 × 10⁶ bit/s
0010           splice_decoding_delay = 225 ms; max_splice_rate = 8.0 × 10⁶ bit/s
0011           splice_decoding_delay = 250 ms; max_splice_rate = 7.2 × 10⁶ bit/s
0100-1011      Reserved
1100-1111      User-defined
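For implementers, one of these tables can be held as a lookup keyed by splice_type, as in the sketch below. Only the Table 2-7 values are transcribed; a splicer would need the table matching the stream's profile and level, and the names are hypothetical. Durations are kept as exact fractions of a second, and any code not present in the dictionary is classified as reserved or user-defined by its range.

```python
from fractions import Fraction

# splice_type -> (splice_decoding_delay in seconds, max_splice_rate in bit/s), Table 2-7 only.
TABLE_2_7 = {
    0b0000: (Fraction(120, 1000), 15_000_000),
    0b0001: (Fraction(150, 1000), 12_000_000),
    0b0010: (Fraction(225, 1000), 8_000_000),
    0b0011: (Fraction(250, 1000), 7_200_000),
}


def classify_splice_type(splice_type: int, table=TABLE_2_7) -> str:
    """Return 'defined', 'reserved' or 'user-defined' for a 4-bit splice_type."""
    if not 0 <= splice_type <= 0xF:
        raise ValueError("splice_type is a 4-bit field")
    if splice_type in table:
        return "defined"
    return "user-defined" if splice_type >= 0b1100 else "reserved"


def splice_parameters(splice_type: int, table=TABLE_2_7):
    """Return (splice_decoding_delay, max_splice_rate), or None if not defined."""
    if classify_splice_type(splice_type, table) != "defined":
        return None
    return table[splice_type]


delay, rate = splice_parameters(0b0001)
print(float(delay), rate, rate * delay)  # 0.15 12000000 1800000
```

Keeping only the defined rows in the dictionary also copes with tables whose reserved ranges are not contiguous, such as Table 2-20.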
Table 2-8 – Splice parameters Table 2
Main Profile Low Level, SNR Profile Low Level (both layers), High Profile Main Level (base layer), Multi-view Profile Low Level (base layer) Video
splice_type    Conditions
0000           splice_decoding_delay = 115 ms; max_splice_rate = 4.0 × 10⁶ bit/s
0001           splice_decoding_delay = 155 ms; max_splice_rate = 3.0 × 10⁶ bit/s
0010           splice_decoding_delay = 230 ms; max_splice_rate = 2.0 × 10⁶ bit/s
0011           splice_decoding_delay = 250 ms; max_splice_rate = 1.8 × 10⁶ bit/s
0100-1011      Reserved
1100-1111      User-defined
Table 2-9 – Splice parameters Table 3
Main Profile High-1440 Level, Spatial Profile High-1440 Level (all layers), High Profile High-1440 Level (middle + base layers), Multi-view Profile High-1440 Level (base layer) Video
splice_type    Conditions
0000           splice_decoding_delay = 120 ms; max_splice_rate = 60.0 × 10⁶ bit/s
0001           splice_decoding_delay = 160 ms; max_splice_rate = 45.0 × 10⁶ bit/s
0010           splice_decoding_delay = 240 ms; max_splice_rate = 30.0 × 10⁶ bit/s
0011           splice_decoding_delay = 250 ms; max_splice_rate = 28.5 × 10⁶ bit/s
0100-1011      Reserved
1100-1111      User-defined
Table 2-10 – Splice parameters Table 4
Main Profile High Level, High Profile High-1440 Level (all layers), High Profile High Level (middle + base layers), Multi-view Profile High Level (base layer) Video
splice_type    Conditions
0000           splice_decoding_delay = 120 ms; max_splice_rate = 80.0 × 10⁶ bit/s
0001           splice_decoding_delay = 160 ms; max_splice_rate = 60.0 × 10⁶ bit/s
0010           splice_decoding_delay = 240 ms; max_splice_rate = 40.0 × 10⁶ bit/s
0011           splice_decoding_delay = 250 ms; max_splice_rate = 38.0 × 10⁶ bit/s
0100-1011      Reserved
1100-1111      User-defined
Table 2-11 – Splice parameters Table 5
SNR Profile Low Level (base layer) Video
splice_type    Conditions
0000           splice_decoding_delay = 115 ms; max_splice_rate = 3.0 × 10⁶ bit/s
0001           splice_decoding_delay = 175 ms; max_splice_rate = 2.0 × 10⁶ bit/s
0010           splice_decoding_delay = 250 ms; max_splice_rate = 1.4 × 10⁶ bit/s
0011-1011      Reserved
1100-1111      User-defined
Table 2-12 – Splice parameters Table 6
SNR Profile Main Level (base layer) Video
splice_type    Conditions
0000           splice_decoding_delay = 115 ms; max_splice_rate = 10.0 × 10⁶ bit/s
0001           splice_decoding_delay = 145 ms; max_splice_rate = 8.0 × 10⁶ bit/s
0010           splice_decoding_delay = 235 ms; max_splice_rate = 5.0 × 10⁶ bit/s
0011           splice_decoding_delay = 250 ms; max_splice_rate = 4.7 × 10⁶ bit/s
0100-1011      Reserved
1100-1111      User-defined
Table 2-13 – Splice parameters Table 7
Spatial Profile High-1440 Level (middle + base layers) Video
splice_type    Conditions
0000           splice_decoding_delay = 120 ms; max_splice_rate = 40.0 × 10⁶ bit/s
0001           splice_decoding_delay = 160 ms; max_splice_rate = 30.0 × 10⁶ bit/s
0010           splice_decoding_delay = 240 ms; max_splice_rate = 20.0 × 10⁶ bit/s
0011           splice_decoding_delay = 250 ms; max_splice_rate = 19.0 × 10⁶ bit/s
0100-1011      Reserved
1100-1111      User-defined
Table 2-14 – Splice parameters Table 8
High Profile Main Level (all layers), High Profile High-1440 Level (base layer) Video
splice_type    Conditions
0000           splice_decoding_delay = 120 ms; max_splice_rate = 20.0 × 10⁶ bit/s
0001           splice_decoding_delay = 160 ms; max_splice_rate = 15.0 × 10⁶ bit/s
0010           splice_decoding_delay = 240 ms; max_splice_rate = 10.0 × 10⁶ bit/s
0011           splice_decoding_delay = 250 ms; max_splice_rate = 9.5 × 10⁶ bit/s
0100-1011      Reserved
1100-1111      User-defined
Table 2-15 – Splice parameters Table 9
High Profile High Level (base layer), Multi-view Profile Main Level (both layers) Video
splice_type    Conditions
0000           splice_decoding_delay = 120 ms; max_splice_rate = 25.0 × 10⁶ bit/s
0001           splice_decoding_delay = 165 ms; max_splice_rate = 18.0 × 10⁶ bit/s
0010           splice_decoding_delay = 250 ms; max_splice_rate = 12.0 × 10⁶ bit/s
0011-1011      Reserved
1100-1111      User-defined
Table 2-16 – Splice parameters Table 10
High Profile High Level (all layers), Multi-view Profile High-1440 Level (both layers) Video
splice_type    Conditions
0000           splice_decoding_delay = 120 ms; max_splice_rate = 100.0 × 10⁶ bit/s
0001           splice_decoding_delay = 160 ms; max_splice_rate = 75.0 × 10⁶ bit/s
0010           splice_decoding_delay = 240 ms; max_splice_rate = 50.0 × 10⁶ bit/s
0011           splice_decoding_delay = 250 ms; max_splice_rate = 48.0 × 10⁶ bit/s
0100-1011      Reserved
1100-1111      User-defined
Table 2-17 – Splice parameters Table 11
4:2:2 Profile Main Level Video
splice_type    Conditions
0000           splice_decoding_delay = 45 ms; max_splice_rate = 50.0 × 10⁶ bit/s
0001           splice_decoding_delay = 90 ms; max_splice_rate = 50.0 × 10⁶ bit/s
0010           splice_decoding_delay = 180 ms; max_splice_rate = 50.0 × 10⁶ bit/s
0011           splice_decoding_delay = 225 ms; max_splice_rate = 40.0 × 10⁶ bit/s
0100           splice_decoding_delay = 250 ms; max_splice_rate = 36.0 × 10⁶ bit/s
0101-1011      Reserved
1100-1111      User-defined
Table 2-18 – Splice parameters Table 12
Multi-view Profile Low Level (both layers) Video
splice_type    Conditions
0000           splice_decoding_delay = 115 ms; max_splice_rate = 8.0 × 10⁶ bit/s
0001           splice_decoding_delay = 155 ms; max_splice_rate = 6.0 × 10⁶ bit/s
0010           splice_decoding_delay = 230 ms; max_splice_rate = 4.0 × 10⁶ bit/s
0011           splice_decoding_delay = 250 ms; max_splice_rate = 3.7 × 10⁶ bit/s
0100-1011      Reserved
1100-1111      User-defined
Table 2-19 – Splice parameters Table 13
Multi-view Profile High Level (both layers) Video
splice_type    Conditions
0000           splice_decoding_delay = 120 ms; max_splice_rate = 130.0 × 10⁶ bit/s
0001           splice_decoding_delay = 150 ms; max_splice_rate = 104.0 × 10⁶ bit/s
0010           splice_decoding_delay = 240 ms; max_splice_rate = 65.0 × 10⁶ bit/s
0011           splice_decoding_delay = 250 ms; max_splice_rate = 62.4 × 10⁶ bit/s
0100-1011      Reserved
1100-1111      User-defined
Table 2-20 – Splice parameters Table 14
4:2:2 Profile High Level Video
splice_type    Conditions
0000           splice_decoding_delay = 45 ms; max_splice_rate = 300.0 × 10⁶ bit/s
0001           splice_decoding_delay = 90 ms; max_splice_rate = 300.0 × 10⁶ bit/s
0010-0011      Reserved
0100           splice_decoding_delay = 250 ms; max_splice_rate = 180.0 × 10⁶ bit/s
0101-1011      Reserved
1100-1111      User-defined
DTS_next_AU (decoding time stamp next access unit) – This is a 33-bit field, coded in three parts. In the case of
continuous and periodic decoding through this splicing point it indicates the decoding time of the first access unit
following the splicing point. This decoding time is expressed in the time base which is valid in the Transport Stream
packet in which the splice_countdown reaches zero. From the first occurrence of this field onwards, it shall have the
same value in all the subsequent Transport Stream packets of the same PID in which it is present, until the packet in
which the splice_countdown reaches zero (including this packet).
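The rule that splice_type and DTS_next_AU keep their first value until the packet in which splice_countdown reaches zero lends itself to a simple conformance check. The sketch below assumes a hypothetical per-packet record for one PID, with duplicate packets already removed and splice_countdown already decoded as a signed value; it is an illustration, not a complete analyser.

```python
from typing import NamedTuple, Optional


class SpliceFields(NamedTuple):
    """Hypothetical per-packet record for one PID, after duplicates are removed."""
    splice_countdown: int       # already decoded as a signed value
    splice_type: Optional[int]  # None when the field is absent
    dts_next_au: Optional[int]  # 33-bit value, None when the field is absent


def check_splice_fields(packets: list[SpliceFields]) -> list[str]:
    """Report changes in splice_type or DTS_next_AU before the splicing point."""
    problems = []
    first_type = None
    first_dts = None
    for index, pkt in enumerate(packets):
        if pkt.splice_type is not None:
            if first_type is None:
                first_type = pkt.splice_type
            elif pkt.splice_type != first_type:
                problems.append(f"packet {index}: splice_type changed")
        if pkt.dts_next_au is not None:
            if not 0 <= pkt.dts_next_au < (1 << 33):
                problems.append(f"packet {index}: DTS_next_AU exceeds 33 bits")
            elif first_dts is None:
                first_dts = pkt.dts_next_au
            elif pkt.dts_next_au != first_dts:
                problems.append(f"packet {index}: DTS_next_AU changed")
        if pkt.splice_countdown == 0:
            break  # the rule applies up to and including this packet
    return problems


ok = [SpliceFields(2, 0, 1000), SpliceFields(1, 0, 1000), SpliceFields(0, 0, 1000)]
print(check_splice_fields(ok))  # []
```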
stuffing_byte – This is a fixed 8-bit value equal to '1111 1111' that can be inserted by the encoder. It is discarded by the
decoder.
2.4.3.6 PES packet
See Table 2-21.
Table 2-21 – PES packet
Syntax                                          No. of bits Mnemonic
PES_packet() {
  packet_start_code_prefix                      24          bslbf
  stream_id                                     8           uimsbf
  PES_packet_length                             16          uimsbf
  if (stream_id != program_stream_map
    && stream_id != padding_stream
    && stream_id != private_stream_2
    && stream_id != ECM
    && stream_id != EMM
    && stream_id != program_stream_directory
    && stream_id != DSMCC_stream
    && stream_id != ITU-T Rec. H.222.1 type E stream) {
    '10'                                        2           bslbf
    PES_scrambling_control                      2           bslbf
    PES_priority                                1           bslbf
    data_alignment_indicator                    1           bslbf
    copyright                                   1           bslbf
    original_or_copy                            1           bslbf
    PTS_DTS_flags                               2           bslbf
    ESCR_flag                                   1           bslbf
    ES_rate_flag                                1           bslbf
    DSM_trick_mode_flag                         1           bslbf
    additional_copy_info_flag                   1           bslbf
    PES_CRC_flag                                1           bslbf
    PES_extension_flag                          1           bslbf
    PES_header_data_length                      8           uimsbf
    if (PTS_DTS_flags == '10') {
      '0010'                                    4           bslbf
      PTS [32..30]                              3           bslbf
      marker_bit                                1           bslbf
      PTS [29..15]                              15          bslbf
      marker_bit                                1           bslbf
      PTS [14..0]                               15          bslbf
      marker_bit                                1           bslbf
    }
    if (PTS_DTS_flags == '11') {
      '0011'                                    4           bslbf
      PTS [32..30]                              3           bslbf
      marker_bit                                1           bslbf
      PTS [29..15]                              15          bslbf
      marker_bit                                1           bslbf
      PTS [14..0]                               15          bslbf
      marker_bit                                1           bslbf
      '0001'                                    4           bslbf
      DTS [32..30]                              3           bslbf
      marker_bit                                1           bslbf
      DTS [29..15]                              15          bslbf
      marker_bit                                1           bslbf
      DTS [14..0]                               15          bslbf
      marker_bit                                1           bslbf
    }
    if (ESCR_flag == '1') {
      reserved                                  2           bslbf
      ESCR_base[32..30]                         3           bslbf
      marker_bit                                1           bslbf
      ESCR_base[29..15]                         15          bslbf
      marker_bit                                1           bslbf
      ESCR_base[14..0]                          15          bslbf
      marker_bit                                1           bslbf
      ESCR_extension                            9           uimsbf
      marker_bit                                1           bslbf
    }
    if (ES_rate_flag == '1') {
      marker_bit                                1           bslbf
      ES_rate                                   22          uimsbf
      marker_bit                                1           bslbf
    }
    if (DSM_trick_mode_flag == '1') {
      trick_mode_control                        3           uimsbf
      if (trick_mode_control == fast_forward) {
        field_id                                2           bslbf
        intra_slice_refresh                     1           bslbf
        frequency_truncation                    2           bslbf
      }
      else if (trick_mode_control == slow_motion) {
        rep_cntrl                               5           uimsbf
      }
      else if (trick_mode_control == freeze_frame) {
        field_id                                2           uimsbf
        reserved                                3           bslbf
      }
      else if (trick_mode_control == fast_reverse) {
        field_id                                2           bslbf
        intra_slice_refresh                     1           bslbf
        frequency_truncation                    2           bslbf
      }
      else if (trick_mode_control == slow_reverse) {
        rep_cntrl                               5           uimsbf
      }
      else
        reserved                                5           bslbf
    }
    if (additional_copy_info_flag == '1') {
      marker_bit                                1           bslbf
      additional_copy_info                      7           bslbf
    }
    if (PES_CRC_flag == '1') {
      previous_PES_packet_CRC                   16          bslbf
    }
    if (PES_extension_flag == '1') {
      PES_private_data_flag                     1           bslbf
      pack_header_field_flag                    1           bslbf
      program_packet_sequence_counter_flag      1           bslbf
      P-STD_buffer_flag                         1           bslbf
      reserved                                  3           bslbf
      PES_extension_flag_2                      1           bslbf
      if (PES_private_data_flag == '1') {
        PES_private_data                        128         bslbf
      }
      if (pack_header_field_flag == '1') {
        pack_field_length                       8           uimsbf
        pack_header()
      }
      if (program_packet_sequence_counter_flag == '1') {
        marker_bit                              1           bslbf
        program_packet_sequence_counter         7           uimsbf
        marker_bit                              1           bslbf
        MPEG1_MPEG2_identifier                  1           bslbf
        original_stuff_length                   6           uimsbf
      }
      if (P-STD_buffer_flag == '1') {
        '01'                                    2           bslbf
        P-STD_buffer_scale                      1           bslbf
        P-STD_buffer_size                       13          uimsbf
      }
      if (PES_extension_flag_2 == '1') {
        marker_bit                              1           bslbf
        PES_extension_field_length              7           uimsbf
        stream_id_extension_flag                1           bslbf
        if (stream_id_extension_flag == '0') {
          stream_id_extension                   7           uimsbf
          for (i = 0; i < PES_extension_field_length; i++) {
            reserved                            8           bslbf
          }
        }
      }
    }
    for (i = 0; i < N1; i++) {
      stuffing_byte                             8           bslbf
    }
    for (i = 0; i < N2; i++) {
      PES_packet_data_byte                      8           bslbf
    }
  }
  else if (stream_id == program_stream_map
    || stream_id == private_stream_2
    || stream_id == ECM
    || stream_id == EMM
    || stream_id == program_stream_directory
    || stream_id == DSMCC_stream
    || stream_id == ITU-T Rec. H.222.1 type E stream) {
    for (i = 0; i < PES_packet_length; i++) {
      PES_packet_data_byte                      8           bslbf
    }
  }
  else if (stream_id == padding_stream) {
    for (i = 0; i < PES_packet_length; i++) {
      padding_byte                              8           bslbf
    }
  }
}

2.4.3.7 Semantic definition of fields in PES packet
packet_start_code_prefix – The packet_start_code_prefix is a 24-bit code. Together with the stream_id that follows it, it
constitutes a packet start code that identifies the beginning of a packet. The packet_start_code_prefix is the bit string
'0000 0000 0000 0000 0000 0001' (0x000001).
stream_id – In Program Streams, the stream_id specifies the type and number of the elementary stream as defined by
the stream_id Table 2-22. In Transport Streams, the stream_id may be set to any valid value which correctly describes
the elementary stream type as defined in Table 2-22. In Transport Streams, the elementary stream type is specified in
the Program Specific Information as specified in 2.4.4.
PES_packet_length – A 16-bit field specifying the number of bytes in the PES packet following the last byte of the
field. A value of 0 indicates that the PES packet length is neither specified nor bounded and is allowed only in
PES packets whose payload consists of bytes from a video elementary stream contained in Transport Stream packets.
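Because PES_packet_length counts only the bytes after itself, the total size of a bounded PES packet is that value plus the six bytes of packet_start_code_prefix, stream_id and PES_packet_length. A minimal sketch, with hypothetical function names:

```python
START_CODE_PREFIX = b"\x00\x00\x01"
FIXED_PREFIX_BYTES = 6  # packet_start_code_prefix (3) + stream_id (1) + PES_packet_length (2)


def read_pes_packet_length(data: bytes) -> int:
    """Return the 16-bit PES_packet_length from the first six bytes of a PES packet."""
    if len(data) < FIXED_PREFIX_BYTES or data[:3] != START_CODE_PREFIX:
        raise ValueError("data does not start with a PES packet start code")
    return int.from_bytes(data[4:6], "big")


def pes_packet_total_size(pes_packet_length: int):
    """Total PES packet size in bytes, or None when the length is unbounded (value 0)."""
    if not 0 <= pes_packet_length <= 0xFFFF:
        raise ValueError("PES_packet_length is a 16-bit field")
    if pes_packet_length == 0:
        return None  # only allowed for video PES packets carried in Transport Stream packets
    return FIXED_PREFIX_BYTES + pes_packet_length


header = bytes([0x00, 0x00, 0x01, 0xE0, 0x00, 0x10])
print(pes_packet_total_size(read_pes_packet_length(header)))  # 22
```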
PES_scrambling_control – The 2-bit PES_scrambling_control field indicates the scrambling mode of the PES packet
payload. When scrambling is performed at the PES level, the PES packet header, including the optional fields when
present, shall not be scrambled (see Table 2-23).
Table 2-22 – Stream_id assignments
Stream_id   Note   stream coding
1011 1100   1      program_stream_map
1011 1101   2      private_stream_1
1011 1110          padding_stream
1011 1111   3      private_stream_2
110x xxxx          ISO/IEC 13818-3 or ISO/IEC 11172-3 or ISO/IEC 13818-7 or ISO/IEC 14496-3 audio stream number x xxxx
1110 xxxx          ITU-T Rec. H.262 | ISO/IEC 13818-2, ISO/IEC 11172-2, ISO/IEC 14496-2 or ITU-T Rec. H.264 | ISO/IEC 14496-10 video stream number xxxx
1111 0000   3      ECM_stream
1111 0001   3      EMM_stream
1111 0010   5      ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Annex A or ISO/IEC 13818-6_DSMCC_stream
1111 0011   2      ISO/IEC_13522_stream
1111 0100   6      ITU-T Rec. H.222.1 type A
1111 0101   6      ITU-T Rec. H.222.1 type B
1111 0110   6      ITU-T Rec. H.222.1 type C
1111 0111   6      ITU-T Rec. H.222.1 type D
1111 1000   6      ITU-T Rec. H.222.1 type E
1111 1001   7      ancillary_stream
1111 1010          ISO/IEC 14496-1_SL-packetized_stream
1111 1011          ISO/IEC 14496-1_FlexMux_stream
1111 1100          metadata stream
1111 1101   8      extended_stream_id
1111 1110          reserved data stream
1111 1111   4      program_stream_directory
The notation x means that the values '0' and '1' are both permitted and result in the same stream type. The stream number is given
by the values taken by the x's.
NOTE 1 – PES packets of type program_stream_map have unique syntax specified in 2.5.4.1.
NOTE 2 – PES packets of type private_stream_1 and ISO/IEC_13522_stream follow the same PES packet syntax as those for
ITU-T Rec. H.262 | ISO/IEC 13818-2 video and ISO/IEC 13818-3 audio streams.
NOTE 3 – PES packets of type private_stream_2, ECM_stream and EMM_stream are similar to private_stream_1 except no
syntax is specified after PES_packet_length field.
NOTE 4 – PES packets of type program_stream_directory have a unique syntax specified in 2.5.5.
NOTE 5 – PES packets of type DSM-CC_stream have a unique syntax specified in ISO/IEC 13818-6.
NOTE 6 – This stream_id is associated with stream_type 0x09 in Table 2-34.
NOTE 7 – This stream_id is only used in PES packets that carry data from a Program Stream or an ISO/IEC 11172-1 System
Stream in a Transport Stream (refer to 2.4.3.8).
NOTE 8 – The use of stream_id 0xFD (extended_stream_id) identifies that this PES packet employs an extended syntax to permit
additional stream types to be identified.
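Table 2-22 can be turned into a lookup, as in the sketch below; the range patterns 110x xxxx and 1110 xxxx are matched with bit masks and their stream numbers extracted from the x bits. The helper has_optional_pes_header mirrors the stream_id list in the first branch condition of Table 2-21. The names are hypothetical.

```python
NAMED_STREAM_IDS = {
    0xBC: "program_stream_map",
    0xBD: "private_stream_1",
    0xBE: "padding_stream",
    0xBF: "private_stream_2",
    0xF0: "ECM_stream",
    0xF1: "EMM_stream",
    0xF2: "ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Annex A or ISO/IEC 13818-6_DSMCC_stream",
    0xF3: "ISO/IEC_13522_stream",
    0xF4: "ITU-T Rec. H.222.1 type A",
    0xF5: "ITU-T Rec. H.222.1 type B",
    0xF6: "ITU-T Rec. H.222.1 type C",
    0xF7: "ITU-T Rec. H.222.1 type D",
    0xF8: "ITU-T Rec. H.222.1 type E",
    0xF9: "ancillary_stream",
    0xFA: "ISO/IEC 14496-1_SL-packetized_stream",
    0xFB: "ISO/IEC 14496-1_FlexMux_stream",
    0xFC: "metadata stream",
    0xFD: "extended_stream_id",
    0xFE: "reserved data stream",
    0xFF: "program_stream_directory",
}

# stream_id values for which Table 2-21 omits the optional PES header fields.
NO_OPTIONAL_HEADER = {0xBC, 0xBE, 0xBF, 0xF0, 0xF1, 0xF2, 0xF8, 0xFF}


def classify_stream_id(stream_id: int) -> str:
    if not 0 <= stream_id <= 0xFF:
        raise ValueError("stream_id is an 8-bit field")
    if (stream_id & 0xE0) == 0xC0:  # 110x xxxx
        return f"audio stream number {stream_id & 0x1F}"
    if (stream_id & 0xF0) == 0xE0:  # 1110 xxxx
        return f"video stream number {stream_id & 0x0F}"
    return NAMED_STREAM_IDS.get(stream_id, "not listed in Table 2-22")


def has_optional_pes_header(stream_id: int) -> bool:
    return stream_id not in NO_OPTIONAL_HEADER


print(classify_stream_id(0xC3))  # audio stream number 3
```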
Table 2-23 – PES scrambling control values
Value   Description
00      Not scrambled
01      User-defined
10      User-defined
11      User-defined
PES_priority – This is a 1-bit field indicating the priority of the payload in this PES packet. A '1' indicates that the
payload of this PES packet has a higher priority than the payload of a PES packet with this field set to '0'. A multiplexor
can use the PES_priority bit to prioritize its data within an elementary stream. This field shall not be changed by the
transport mechanism.
data_alignment_indicator – This is a 1-bit flag. When set to a value of '1', it indicates that the PES packet header is
immediately followed by the video syntax element or audio sync word indicated in the
data_stream_alignment_descriptor in 2.6.10 if this descriptor is present. If set to a value of '1' and the descriptor is not
present, alignment as indicated in alignment_type '01' in Table 2-53, Table 2-54 or Table 2-55 is required. When set to a
value of '0', it is not defined whether any such alignment occurs or not.
copyright – This is a 1-bit field. When set to '1' it indicates that the material of the associated PES packet payload is
protected by copyright. When set to '0' it is not defined whether the material is protected by copyright. A copyright
descriptor described in 2.6.24 is associated with the elementary stream which contains this PES packet and the
copyright flag is set to '1' if the descriptor applies to the material contained in this PES packet.
original_or_copy – This is a 1-bit field. When set to '1' the contents of the associated PES packet payload is an
original. When set to '0' it indicates that the contents of the associated PES packet payload is a copy.
PTS_DTS_flags – This is a 2-bit field. When the PTS_DTS_flags field is set to '10', the PTS fields shall be present in
the PES packet header. When the PTS_DTS_flags field is set to '11', both the PTS fields and DTS fields shall be present
in the PES packet header. When the PTS_DTS_flags field is set to '00' no PTS or DTS fields shall be present in the PES
packet header. The value '01' is forbidden.
ESCR_flag – A 1-bit flag, which when set to '1' indicates that ESCR base and extension fields are present in the PES
packet header. When set to '0' it indicates that no ESCR fields are present.
ES_rate_flag – A 1-bit flag, which when set to '1' indicates that the ES_rate field is present in the PES packet header.
When set to '0' it indicates that no ES_rate field is present.
DSM_trick_mode_flag – A 1-bit flag, which when set to '1' indicates the presence of an 8-bit trick mode field. When
set to '0' it indicates that this field is not present.
additional_copy_info_flag – A 1-bit flag, which when set to '1' indicates the presence of the additional_copy_info
field. When set to '0' it indicates that this field is not present.
PES_CRC_flag – A 1-bit flag, which when set to '1' indicates that a CRC field is present in the PES packet. When set
to '0' it indicates that this field is not present.
PES_extension_flag – A 1-bit flag, which when set to '1' indicates that an extension field exists in this PES packet
header. When set to '0' it indicates that this field is not present.
PES_header_data_length – An 8-bit field specifying the total number of bytes occupied by the optional fields and any
stuffing bytes contained in this PES packet header. The presence of optional fields is indicated in the byte that precedes
the PES_header_data_length field.
marker_bit – A marker_bit is a 1-bit field that has the value '1'.
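The fixed part of the optional PES header (the two flag bytes and PES_header_data_length, laid out as in Table 2-21) can be unpacked as in this sketch. It applies only to stream_id values that take the first branch of Table 2-21, does not parse the optional fields themselves, and uses hypothetical names; payload_offset reflects that PES_header_data_length counts every byte after itself up to the payload.

```python
def parse_pes_header_flags(packet: bytes) -> dict:
    """Decode bytes 6 to 8 of a PES packet that carries the optional header."""
    if len(packet) < 9 or packet[:3] != b"\x00\x00\x01":
        raise ValueError("too short or missing packet start code")
    b6, b7, header_data_length = packet[6], packet[7], packet[8]
    if b6 >> 6 != 0b10:
        raise ValueError("the two bits after PES_packet_length must be '10'")
    pts_dts_flags = b7 >> 6
    if pts_dts_flags == 0b01:
        raise ValueError("PTS_DTS_flags value '01' is forbidden")
    return {
        "PES_scrambling_control": (b6 >> 4) & 0x3,
        "PES_priority": (b6 >> 3) & 1,
        "data_alignment_indicator": (b6 >> 2) & 1,
        "copyright": (b6 >> 1) & 1,
        "original_or_copy": b6 & 1,
        "PTS_DTS_flags": pts_dts_flags,
        "ESCR_flag": (b7 >> 5) & 1,
        "ES_rate_flag": (b7 >> 4) & 1,
        "DSM_trick_mode_flag": (b7 >> 3) & 1,
        "additional_copy_info_flag": (b7 >> 2) & 1,
        "PES_CRC_flag": (b7 >> 1) & 1,
        "PES_extension_flag": b7 & 1,
        "PES_header_data_length": header_data_length,
        "payload_offset": 9 + header_data_length,  # first PES_packet_data_byte
    }


flags = parse_pes_header_flags(bytes([0x00, 0x00, 0x01, 0xE0, 0x00, 0x00, 0x84, 0x80, 0x05]))
print(flags["PTS_DTS_flags"], flags["payload_offset"])  # 2 14
```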
PTS (presentation time stamp) – Presentation times shall be related to decoding times as follows: The PTS is a 33-bit
number coded in three separate fields. It indicates the time of presentation, tpn(k), in the system target decoder of a
presentation unit k of elementary stream n. The value of PTS is specified in units of the period of the system clock
frequency divided by 300 (yielding 90 kHz). The presentation time is derived from the PTS according to equation 2-11
below. Refer to 2.7.4 for constraints on the frequency of coding presentation timestamps.
$$\mathrm{PTS}(k) = \left( (\mathrm{system\_clock\_frequency} \times tp_n(k)) \ \mathrm{DIV}\ 300 \right) \bmod 2^{33} \tag{2-11}$$
where tpn(k) is the presentation time of presentation unit Pn(k).
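Equation 2-11 and the 5-byte PTS layout of Table 2-21 (a 4-bit prefix, then PTS[32..30], PTS[29..15] and PTS[14..0], each followed by a marker_bit) can be sketched as below. The names are hypothetical, and tp_n(k) is assumed to be available as an integer count of system-clock ticks; the same layout serves a DTS with prefix '0001', or a PTS with prefix '0011' when a DTS follows.

```python
def pts_from_ticks(ticks: int) -> int:
    """Equation 2-11: ((system_clock_frequency x tp_n(k)) DIV 300) mod 2**33."""
    return (ticks // 300) % (1 << 33)


def encode_timestamp(value: int, prefix: int = 0b0010) -> bytes:
    """Pack a 33-bit PTS/DTS into 5 bytes with its 4-bit prefix and marker bits."""
    if not 0 <= value < (1 << 33):
        raise ValueError("timestamps are 33-bit values")
    b0 = (prefix << 4) | (((value >> 30) & 0x07) << 1) | 1  # PTS[32..30] + marker_bit
    mid = (((value >> 15) & 0x7FFF) << 1) | 1                # PTS[29..15] + marker_bit
    low = ((value & 0x7FFF) << 1) | 1                        # PTS[14..0]  + marker_bit
    return bytes([b0]) + mid.to_bytes(2, "big") + low.to_bytes(2, "big")


def decode_timestamp(field: bytes) -> int:
    """Inverse of encode_timestamp; checks the three marker bits."""
    if len(field) != 5:
        raise ValueError("a timestamp field is 5 bytes long")
    if not (field[0] & 1 and field[2] & 1 and field[4] & 1):
        raise ValueError("marker_bit not set")
    high = (field[0] >> 1) & 0x07
    mid = int.from_bytes(field[1:3], "big") >> 1
    low = int.from_bytes(field[3:5], "big") >> 1
    return (high << 30) | (mid << 15) | low


print(encode_timestamp(pts_from_ticks(27_000_000)).hex())  # 210005bf21
```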
In the case of audio, if a PTS is present in PES packet header it shall refer to the first access unit commencing in the
PES packet. An audio access unit commences in a PES packet if the first byte of the audio access unit is present in the
PES packet.
In the case of ISO/IEC 11172-2 video or ISO/IEC 14496-2 video, if a PTS is present in a PES packet header, it shall
refer to the access unit containing the first picture start code that commences in this PES packet. A picture start code
commences in a PES packet if the first byte of the picture start code is present in the PES packet. For I- and P-pictures
in non-low_delay sequences and in the case when there is no decoding discontinuity between access units (AUs) k and
k', the presentation time tpn(k) shall be equal to the decoding time tdn(k') of the next transmitted I- or P-picture (refer to
2.7.5). If there is a decoding discontinuity, or the stream ends, the difference between tpn(k) and tdn(k) shall be the same
as if the original stream had continued without a discontinuity and without ending.
NOTE 1 – A low_delay sequence is an ISO/IEC 14496-2 video sequence in which the low_delay flag is set to '1' (refer to 6.2.3 of
ISO/IEC 14496-2).
For ITU-T Rec. H.262 | ISO/IEC 13818-2 video, if a PTS is present in a PES packet header, it shall refer to the access
unit containing the first picture start code that commences in this PES packet. A picture start code commences in a PES
packet if the first byte of the picture start code is present in the PES packet. For I- and P-coded frames in non-low_delay
sequences and in the case when there is no decoding discontinuity between access units (AUs) k and k', the presentation
time tpn(k) shall be equal to the decoding time tdn(k') of the next transmitted I- or P-coded frame (refer to 2.7.5). If there
is a decoding discontinuity, or the stream ends, the difference between tpn(k) and tdn(k) shall be the same as if the
original stream had continued without a discontinuity and without ending.
NOTE 2 – A low_delay sequence is an ITU-T Rec. H.262 | ISO/IEC 13818-2 video sequence in which the low_delay flag is set
to '1' (refer to 6.2.2.3 of ITU-T Rec. H.262 | ISO/IEC 13818-2). Also note that for field pictures the presentation time refers to the
first field picture of the coded frame.
For ITU-T Rec. H.264 | ISO/IEC 14496-10 video, if a PTS is present in the PES packet header, it shall refer to the first
AVC access unit that commences in this PES packet. An AVC access unit commences in a PES packet if the first byte
of the AVC access unit is present in the PES packet. To achieve consistency between the STD model and the HRD
model defined in Annex C of ITU-T Rec. H.264 | ISO/IEC 14496-10, for each decoded AVC access unit, the PTS value
in the STD shall, within the accuracy of their respective clocks, indicate the same instant in time as the nominal DPB
output time in the HRD, defined herein as $t_{o,n,dpb}(n) = t_{r,n}(n) + t_c \times \text{dpb\_output\_delay}(n)$, where $t_{r,n}(n)$, $t_c$, and
dpb_output_delay(n) are defined as in Annex C of ITU-T Rec. H.264 | ISO/IEC 14496-10.
NOTE 3 – Different clocks may be used for derivation of PTS and $t_{o,n,dpb}(n)$.
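For AVC, the PTS target can be computed from the HRD timing quantities. The sketch below evaluates $t_{o,n,dpb}(n)$ exactly with rational arithmetic. It assumes $t_c$ = num_units_in_tick ÷ time_scale, as in the timing information of ITU-T Rec. H.264 | ISO/IEC 14496-10, and that the nominal removal time $t_{r,n}(n)$ is already known in seconds. It then converts a time in seconds to 90 kHz units in the same form as the DTS formula, equation 2-12 (DIV 300, modulo $2^{33}$). The function names are hypothetical.

```python
from fractions import Fraction

SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz


def nominal_dpb_output_time(t_r_n, num_units_in_tick: int, time_scale: int,
                            dpb_output_delay: int) -> Fraction:
    """t_o,n,dpb(n) = t_r,n(n) + t_c * dpb_output_delay(n), in seconds."""
    t_c = Fraction(num_units_in_tick, time_scale)  # clock tick (assumed definition)
    return Fraction(t_r_n) + t_c * dpb_output_delay


def to_90khz_timestamp(t_seconds) -> int:
    """((system_clock_frequency * t) DIV 300) % 2**33, for t >= 0."""
    ticks_27mhz = int(SYSTEM_CLOCK_FREQUENCY * Fraction(t_seconds))  # truncates toward zero
    return (ticks_27mhz // 300) % (1 << 33)


t_out = nominal_dpb_output_time(1, num_units_in_tick=1, time_scale=50, dpb_output_delay=4)
print(t_out, to_90khz_timestamp(t_out))  # 27/25 97200
```

Using `Fraction` avoids the rounding drift that floating-point seconds would introduce before the DIV 300 step.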
The presentation time $tp_n(k)$ shall be equal to the decoding time $td_n(k)$ for:
• audio access units;
• access units in ITU-T Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 14496-2 low delay video sequences;
• B-pictures in ISO/IEC 11172-2, ITU-T Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 14496-2 video streams.
If there is filtering in audio, it is assumed by the system model that filtering introduces no delay, hence the sample
referred to by PTS at encoding is the same sample referred to by PTS at decoding. In the case of scalable coding refer
to 2.7.6.
DTS (decoding time stamp) – The DTS is a 33-bit number coded in three separate fields. It indicates the decoding
time, $td_n(j)$, in the system target decoder of an access unit j of elementary stream n. The value of DTS is specified in
units of the period of the system clock frequency divided by 300 (yielding 90 kHz). The DTS is derived from the decoding
time according to equation 2-12:
$$\mathrm{DTS}(j) = \left( \left( \text{system\_clock\_frequency} \times td_n(j) \right) \mathbin{\mathrm{DIV}} 300 \right) \mathbin{\%} 2^{33} \qquad (2\text{-}12)$$
where $td_n(j)$ is the decoding time of access unit $A_n(j)$.
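Equation 2-12, together with the three-field coding mentioned above, can be sketched as follows. The bit layout used here is taken from the PES packet syntax table that precedes these semantics rather than from this paragraph: a 4-bit prefix, then DTS[32..30], DTS[29..15] and DTS[14..0], each followed by a marker_bit set to '1', with prefix '0001' for a DTS. Times are passed as integer counts of 27 MHz ticks so that system_clock_frequency × $td_n(j)$ is exact. The function names are hypothetical.

```python
def dts_from_27mhz_ticks(ticks: int) -> int:
    """Equation 2-12 with system_clock_frequency * td_n(j) given as 27 MHz ticks."""
    return (ticks // 300) % (1 << 33)


def encode_timestamp(prefix: int, value: int) -> bytes:
    """Pack a 33-bit PTS/DTS into 5 bytes with marker bits set to '1' (assumed layout)."""
    if not 0 <= value < (1 << 33):
        raise ValueError("timestamp must fit in 33 bits")
    return bytes([
        (prefix << 4) | (((value >> 30) & 0x07) << 1) | 1,  # prefix, [32..30], marker
        (value >> 22) & 0xFF,                               # [29..22]
        (((value >> 15) & 0x7F) << 1) | 1,                  # [21..15], marker
        (value >> 7) & 0xFF,                                # [14..7]
        ((value & 0x7F) << 1) | 1,                          # [6..0], marker
    ])


def decode_timestamp(field: bytes) -> int:
    """Inverse of encode_timestamp (marker bits are not checked)."""
    return (((field[0] >> 1) & 0x07) << 30 | field[1] << 22 |
            ((field[2] >> 1) & 0x7F) << 15 | field[3] << 7 | (field[4] >> 1) & 0x7F)


dts = dts_from_27mhz_ticks(27_000_000)  # decoding time of 1 s
print(dts, encode_timestamp(0b0001, dts).hex())  # 90000 110005bf21
```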
In the case of ISO/IEC 11172-2 video, ITU-T Rec. H.262 | ISO/IEC 13818-2 video, or ISO/IEC 14496-2 video, if a
DTS is present in a PES packet header, it shall refer to the access unit containing the first picture start code that
commences in this PES packet. A picture start code commences in a PES packet if the first byte of the picture start code
is present in the PES packet.
For ITU-T Rec. H.264 | ISO/IEC 14496-10 video, if a DTS is present in the PES packet header, it shall refer to the first
AVC access unit that commences in this PES packet. An AVC access unit commences in a PES packet if the first byte
of the AVC access unit is present in the PES packet. To achieve consistency between the STD model and the
HRD model defined in Annex C of ITU-T Rec. H.264 | ISO/IEC 14496-10, for each AVC access unit the DTS value in
the STD shall, within the accuracy of their respective clocks, indicate the same instant in time as the nominal
CPB removal time $t_{r,n}(n)$ in the HRD, as defined in Annex C of ITU-T Rec. H.264 | ISO/IEC 14496-10.
NOTE 4 – Different clocks may be used for derivation of DTS and $t_{r,n}(n)$.
In the case of scalable coding refer to 2.7.6.
ESCR_base; ESCR_extension – The elementary stream clock reference is a 42-bit field coded in two parts. The first
part, ESCR_base, is a 33-bit field whose value is given by ESCR_base(i), as given in equation 2-14. The second part,
ESCR_ext, is a 9-bit field whose value is given by ESCR_ext(i), as given in equation 2-15. The ESCR field indicates
the intended time of arrival of the byte containing the last bit of the ESCR_base at the input of the PES-STD for PES
streams (refer to 2.5.2.4).
Specifically:
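$$\mathrm{ESCR}(i) = \text{ESCR\_base}(i) \times 300 + \text{ESCR\_ext}(i) \qquad (2\text{-}13)$$
$$\text{ESCR\_base}(i) = \left( \left( \text{system\_clock\_frequency} \times t(i) \right) \mathbin{\mathrm{DIV}} 300 \right) \mathbin{\%} 2^{33} \qquad (2\text{-}14)$$
$$\text{ESCR\_ext}(i) = \left( \left( \text{system\_clock\_frequency} \times t(i) \right) \mathbin{\mathrm{DIV}} 1 \right) \mathbin{\%} 300 \qquad (2\text{-}15)$$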
where:
The ESCR and ES_rate fields (refer to semantics immediately following) contain timing information relating to the
sequence of PES streams. These fields shall satisfy the constraints defined in 2.7.3.
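The split in equations 2-13 to 2-15 can be sketched as below. The sketch assumes t(i) is supplied as an integer number of 27 MHz system clock ticks, so the DIV 1 in equation 2-15 is a no-op. The function names are hypothetical. Recombining the two parts with equation 2-13 recovers the tick count, provided no 33-bit wrap of ESCR_base has occurred.

```python
def escr_fields(ticks_27mhz: int) -> tuple:
    """Split t(i), given as 27 MHz ticks, into (ESCR_base, ESCR_ext) per equations 2-14 and 2-15."""
    escr_base = (ticks_27mhz // 300) % (1 << 33)  # 90 kHz part, 33 bits
    escr_ext = (ticks_27mhz // 1) % 300           # remainder 0..299, stored in a 9-bit field
    return escr_base, escr_ext


def escr_value(escr_base: int, escr_ext: int) -> int:
    """Equation 2-13: full ESCR in 27 MHz units."""
    return escr_base * 300 + escr_ext


base, ext = escr_fields(27_000_123)
print(base, ext, escr_value(base, ext))  # 90000 123 27000123
```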
ES_rate (elementary stream rate) – The ES_rate field is a 22-bit unsigned integer specifying the rate at which the
system target decoder receives bytes of the PES packet in the case of a PES stream. The ES_rate is valid in the PES
packet in which it is included and in subsequent PES packets of the same PES stream until a new ES_rate field is
encountered. The value of the ES_rate is measured in units of 50 bytes/second. The value '0' is forbidden. The value of
the ES_rate is used to define the time of arrival of bytes at the input of a P-STD for PES streams defined in 2.5.2.4. The
value encoded in the ES_rate field may vary from PES_packet to PES_packet.
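The sketch below converts ES_rate into a byte rate. It then models the arrival time of byte i as the time given by the most recent ESCR plus the bytes received since then, divided by that byte rate. This linear model is an assumption borrowed from the way SCR and program_mux_rate define byte arrival in a Program Stream. The exact rule for PES streams is the one in 2.5.2.4. Treat the model and the function names as illustrative.

```python
from fractions import Fraction

SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz


def es_rate_bytes_per_second(es_rate: int) -> int:
    """ES_rate is coded in units of 50 bytes/second; the value 0 is forbidden."""
    if es_rate == 0:
        raise ValueError("ES_rate value 0 is forbidden")
    return es_rate * 50


def byte_arrival_time(i: int, i_ref: int, escr_ref: int, es_rate: int) -> Fraction:
    """Arrival time in seconds of byte i under the assumed linear model.

    i_ref: index of the byte whose arrival time is given by escr_ref (27 MHz units).
    es_rate: the ES_rate value in force for bytes i_ref..i.
    """
    return (Fraction(escr_ref, SYSTEM_CLOCK_FREQUENCY)
            + Fraction(i - i_ref, es_rate_bytes_per_second(es_rate)))


print(es_rate_bytes_per_second(2000))                # 100000 (bytes/s)
print(byte_arrival_time(1000, 0, 27_000_000, 2000))  # 101/100 (seconds)
```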
trick_mode_control – A 3-bit field that indicates which trick mode is applied to the associated video stream. In cases
of other types of elementary streams, the meanings of this field and those defined by the following five bits are
undefined. For the definition of trick_mode status, refer to the trick mode section of 2.4.2.3.
When trick_mode status is false, the number of times, N, that a picture is output by the decoding process for progressive
sequences is specified for each picture by the repeat_first_field and top_field_first fields in the case of ITU-T
Rec. H.262 | ISO/IEC 13818-2 Video, and is specified through the sequence header in the case of ISO/IEC 11172-2
Video.
For interlaced sequences, when trick_mode status is false, the number of times, N, that a picture is output by the decoding
process is specified for each picture by the repeat_first_field and progressive_frame fields in
the case of ITU-T Rec. H.262 | ISO/IEC 13818-2 Video.
When trick_mode status is true, the number of times that a picture shall be displayed depends on the value of N.
When the value of this field changes or trick mode operations cease, any combination of the following may occur:
• discontinuity in the time base;
• decoding discontinuity;
• continuity counter discontinuity.
Table 2-24 – Trick mode control values
Value          Description
'000'          Fast forward
'001'          Slow motion
'010'          Freeze frame
'011'          Fast reverse
'100'          Slow reverse
'101'-'111'    Reserved
In the context of trick mode, the non-normal speed of decoding and presentation may cause the values of certain fields
defined in video elementary stream data to be incorrect. Likewise, the semantic constraint on the slice structure may be
invalid. The video syntax elements to which this exception applies are:
• bit_rate;
• vbv_delay;
• repeat_first_field;
• v_axis_positive;
• field_sequence;
• subcarrier;
• burst_amplitude;
• subcarrier_phase.
A decoder cannot rely on the values encoded in these fields when in trick mode.
Decoders are not normatively required to decode the trick_mode_control field. However, the following normative
requirements shall apply to decoders that do decode the trick_mode_control field.
fast forward – The value '000' in the trick_mode_control field. When this value is present, it indicates a fast forward
video stream and defines the meaning of the following five bits in the PES packet header. The intra_slice_refresh bit
may be set to '1', indicating that there may be missing macroblocks, which the decoder may replace with co-sited
macroblocks of previously decoded pictures. The field_id field, defined in Table 2-25, indicates which field or fields
should be displayed. The frequency_truncation field indicates that a restricted set of coefficients may be included. The
meaning of the values of this field is shown in Table 2-26.
slow motion – The value '001' in the trick_mode_control field. When this value is present, it indicates a slow motion
video stream and defines the meaning of the following five bits in the PES packet header. In the case of progressive
sequences, the picture should be displayed N × rep_cntrl times, where N is defined above.
In the case of ISO/IEC 11172-2 Video and ITU-T Rec. H.262 | ISO/IEC 13818-2 Video progressive sequences, the
picture should be displayed for N × rep_cntrl picture durations.
In the case of ITU-T Rec. H.262 | ISO/IEC 13818-2 interlaced sequences, the picture should be displayed for
N × rep_cntrl field durations. If the picture is a frame picture, the first field to be displayed is the top field if
top_field_first is '1', and the bottom field if top_field_first is '0' (refer to ITU-T Rec. H.262 | ISO/IEC 13818-2). This
field is displayed for N × rep_cntrl / 2 field durations. The other field of the picture is then displayed for
N × rep_cntrl – N × rep_cntrl / 2 field durations.
freeze frame – The value '010' in the trick_mode_control field. When this value is present, it indicates a freeze frame
video stream and defines the meaning of the following five bits in the PES packet header. The field_id field, defined in
Table 2-25, identifies which field(s) should be displayed. The field_id field refers to the first video access unit that
commences in the PES packet which contains the field_id field, unless the PES packet contains zero payload bytes. In
the latter case the field_id field refers to the most recent previous video access unit.
fast reverse – The value '011' in the trick_mode_control field. When this value is present, it indicates a fast reverse
video stream and defines the meaning of the following five bits in the PES packet header. The intra_slice_refresh bit
may be set to '1', indicating that there may be missing macroblocks, which the decoder may replace with co-sited
macroblocks of previously decoded pictures. The field_id field, defined in Table 2-25, indicates which field or fields
should be displayed. The frequency_truncation field indicates that a restricted set of coefficients may be included. The
meaning of the values of this field is shown in Table 2-26.
slow reverse – The value '100' in the trick_mode_control field. When this value is present, it indicates a slow reverse
video stream and defines the meaning of the following five bits in the PES packet header. In the case of
ISO/IEC 11172-2 Video and ITU-T Rec. H.262 | ISO/IEC 13818-2 Video progressive sequences, the picture should be
displayed for N × rep_cntrl picture durations, where N is defined above.
In the case of ITU-T Rec. H.262 | ISO/IEC 13818-2 interlaced sequences, the picture should be displayed for
N × rep_cntrl field durations. If the picture is a frame picture, the first field to be displayed is the bottom field if
top_field_first is '1', and the top field if top_field_first is '0' (refer to ITU-T Rec. H.262 | ISO/IEC 13818-2). This field is
displayed for N × rep_cntrl / 2 field durations. The other field of the picture is then displayed for N × rep_cntrl – N × rep_cntrl / 2
field durations.
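The field-by-field split for a frame picture in an interlaced ITU-T Rec. H.262 | ISO/IEC 13818-2 sequence can be sketched as follows, for both slow motion and slow reverse. Two assumptions are made. First, "/ 2" is read as integer division. Second, the second field receives whatever remains of the N × rep_cntrl field periods stated above, so that the two fields together fill that total. The function name is hypothetical.

```python
from typing import List, Tuple


def trick_field_durations(n: int, rep_cntrl: int, top_field_first: int,
                          reverse: bool = False) -> List[Tuple[str, int]]:
    """Return [(field, duration in field periods), ...] in display order.

    reverse=False models slow motion ('001'); reverse=True models slow reverse ('100').
    Assumption: '/ 2' is integer division and the second field gets the remainder.
    """
    if not 1 <= rep_cntrl <= 31:
        raise ValueError("rep_cntrl is a 5-bit field and the value 0 is forbidden")
    total = n * rep_cntrl
    first_part = total // 2
    # Slow motion starts with the top field when top_field_first is 1;
    # slow reverse starts with the bottom field in that case.
    top_first = bool(top_field_first) != reverse
    first, second = ("top", "bottom") if top_first else ("bottom", "top")
    return [(first, first_part), (second, total - first_part)]


print(trick_field_durations(2, 3, top_field_first=1))                # [('top', 3), ('bottom', 3)]
print(trick_field_durations(1, 3, top_field_first=1, reverse=True))  # [('bottom', 1), ('top', 2)]
```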
field_id – A 2-bit field that indicates which field(s) should be displayed. It is coded according to Table 2-25.
Table 2-25 – Field_id field control values
Value    Description
'00'     Display from top field only
'01'     Display from bottom field only
'10'     Display complete frame
'11'     Reserved
intra_slice_refresh – A 1-bit flag, which when set to '1', indicates that there may be missing macroblocks between
coded slices of video data in this PES packet. When set to '0' this may not occur. For more information, see ITU-T
Rec. H.262 | ISO/IEC 13818-2. The decoder may replace missing macroblocks with co-sited macroblocks of previously
decoded pictures.
frequency_truncation – A 2-bit field which indicates that a restricted set of coefficients may have been used in coding
the video data in this PES packet. The values are defined in Table 2-26.
Table 2-26 – Coefficient selection values
Value    Description
'00'     Only DC coefficients are non-zero
'01'     Only the first three coefficients are non-zero
'10'     Only the first six coefficients are non-zero
'11'     All coefficients may be non-zero
rep_cntrl – A 5-bit field that indicates the number of times each field in an interlaced picture should be displayed, or
the number of times that a progressive picture should be displayed. For interlaced pictures, whether the top field or the
bottom field is displayed first is a function of the trick_mode_control field and the top_field_first bit in the video
stream. The value '0' is forbidden.
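Decoding the trick mode byte can be sketched as below. The layout assumed here follows the PES packet syntax table rather than the text above: trick_mode_control occupies the three most significant bits and is followed by the five bits whose meaning depends on it. For fast forward and fast reverse, those five bits are field_id, intra_slice_refresh and frequency_truncation. For slow motion and slow reverse, they are rep_cntrl. For freeze frame, they are field_id followed by three reserved bits. `parse_trick_mode` is a hypothetical name.

```python
TRICK_MODES = {
    0b000: "fast_forward",
    0b001: "slow_motion",
    0b010: "freeze_frame",
    0b011: "fast_reverse",
    0b100: "slow_reverse",
}


def parse_trick_mode(byte: int) -> dict:
    """Split the 8-bit trick mode field into its sub-fields (assumed layout)."""
    control = (byte >> 5) & 0x07
    low5 = byte & 0x1F  # the "following five bits"
    fields = {"trick_mode_control": control, "mode": TRICK_MODES.get(control, "reserved")}
    if control in (0b000, 0b011):        # fast forward / fast reverse
        fields["field_id"] = (low5 >> 3) & 0x03
        fields["intra_slice_refresh"] = (low5 >> 2) & 0x01
        fields["frequency_truncation"] = low5 & 0x03
    elif control in (0b001, 0b100):      # slow motion / slow reverse
        if low5 == 0:
            raise ValueError("rep_cntrl value 0 is forbidden")
        fields["rep_cntrl"] = low5
    elif control == 0b010:               # freeze frame: field_id followed by 3 reserved bits
        fields["field_id"] = (low5 >> 3) & 0x03
    return fields


print(parse_trick_mode(0b000_10_1_01))
# fast_forward with field_id 2, intra_slice_refresh 1, frequency_truncation 1
```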
additional_copy_info – This 7-bit field contains private data relating to copyright information.
previous_PES_packet_CRC – The previous_PES_packet_CRC is a 16-bit field that contains the CRC value that
yields a zero output of the 16 registers in the decoder similar to the one defined in Annex A, but with the polynomial:
$x^{16} + x^{12} + x^{5} + 1$
after processing the data bytes of the previous PES packet, exclusive of the PES packet header.
NOTE 5 – This CRC is intended for use in network maintenance such as isolating the source of intermittent errors. It is not
intended for use by elementary stream decoders. It is calculated only over the data bytes because PES packet header data can be
modified during transport.
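A bitwise sketch of this CRC follows. It processes bytes most significant bit first with the generator $x^{16} + x^{12} + x^{5} + 1$ (0x1021). As an assumption carried over from the Annex A CRC-32 decoder model, the 16 registers start at all ones and no final inversion is applied. With those choices, running the decoder over the data bytes followed by the transmitted CRC leaves all registers at zero. `crc16_pes` is a hypothetical name.

```python
def crc16_pes(data: bytes, crc: int = 0xFFFF) -> int:
    """MSB-first CRC-16 with polynomial x^16 + x^12 + x^5 + 1.

    Assumption: registers initialised to all ones (as in the Annex A model), no final XOR.
    """
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


payload = b"previous PES packet data bytes"  # PES header bytes are excluded
crc = crc16_pes(payload)
# A receiver checking payload + CRC (big-endian) should end with all registers at zero.
print(crc16_pes(payload + crc.to_bytes(2, "big")))  # 0
```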
PES_private_data_flag – A 1-bit flag which when set to '1' indicates that the PES packet header contains private data.
When set to a value of '0' it indicates that private data is not present in the PES header.
pack_header_field_flag – A 1-bit flag which when set to '1' indicates that an ISO/IEC 11172-1 pack header or a
Program Stream pack header is stored in this PES packet header. If this field is in a PES packet that is contained in a
Program Stream, then this field shall be set to '0'. In a Transport Stream, when set to the value '0' it indicates that no
pack header is present in the PES header.
program_packet_sequence_counter_flag – A 1-bit flag which when set to '1' indicates that the
program_packet_sequence_counter, MPEG1_MPEG2_identifier, and original_stuff_length fields are present in this
PES packet. When set to a value of '0' it indicates that these fields are not present in the PES header.
P-STD_buffer_flag – A 1-bit flag which when set to '1' indicates that the P-STD_buffer_scale and P-STD_buffer_size
are present in the PES packet header. When set to a value of '0' it indicates that these fields are not present in the
PES header.
PES_extension_flag_2 – A 1-bit field which when set to '1' indicates the presence of the PES_extension_field_length
field and associated fields. When set to a value of '0' this indicates that the PES_extension_field_length field and any
associated fields are not present.
PES_private_data – This is a 16-byte field which contains private data. This data, combined with the fields before and
after, shall not emulate the packet_start_code_prefix (0x000001).
pack_field_length – This is an 8-bit field which indicates the length, in bytes, of the pack_header_field().
program_packet_sequence_counter – The program_packet_sequence_counter field is a 7-bit field. It is an optional
counter that increments with each successive PES packet from a Program Stream or from an ISO/IEC 11172-1 Stream
or the PES packets associated with a single program definition in a Transport Stream, providing functionality similar to
a continuity counter (refer to 2.4.3.2). This allows an application to retrieve the original PES packet sequence of a
Program Stream or the original packet sequence of the original ISO/IEC 11172-1 stream. The counter will wrap around
to 0 after its maximum value. Repetition of PES packets shall not occur. Consequently, no two consecutive PES packets
in the program multiplex shall have identical program_packet_sequence_counter values.
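A receiver that uses the counter to restore packet order or to spot losses might wrap it as sketched below. These are hypothetical helpers. The sketch assumes packets arrive in order, so a gap is read as lost packets. A gap that is a whole multiple of 128 packets cannot be detected with a 7-bit counter.

```python
COUNTER_MODULUS = 1 << 7  # program_packet_sequence_counter is 7 bits wide


def next_counter(value: int) -> int:
    """Value carried by the next PES packet; wraps to 0 after 127."""
    return (value + 1) % COUNTER_MODULUS


def packets_skipped(prev: int, current: int) -> int:
    """Number of PES packets missing between two received counters (in-order delivery assumed)."""
    if prev == current:
        # Repetition of PES packets shall not occur, so equal values indicate an error.
        raise ValueError("consecutive PES packets with identical counter values")
    return (current - prev - 1) % COUNTER_MODULUS


print(next_counter(127), packets_skipped(127, 0), packets_skipped(5, 8))  # 0 0 2
```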
MPEG1_MPEG2_identifier – A 1-bit flag which when set to '1' indicates that this PES packet carries information
from an ISO/IEC 11172-1 stream. When set to '0' it indicates that this PES packet carries information from a Program
Stream.
original_stuff_length – This 6-bit field specifies the number of stuffing bytes used in the original ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 PES packet header or in the original ISO/IEC 11172-1 packet header.
P-STD_buffer_scale – The P-STD_buffer_scale is a 1-bit field, the meaning of which is only defined if this PES
packet is contained in a Program Stream. It indicates the scaling factor used to interpret the subsequent
P-STD_buffer_size field. If the preceding stream_id indicates an audio stream, P-STD_buffer_scale shall have the value
'0'. If the preceding stream_id indicates a video stream, P-STD_buffer_scale shall have the value '1'. For all other stream
types, the value may be either '1' or '0'.
P-STD_buffer_size – The P-STD_buffer_size is a 13-bit unsigned integer, the meaning of which is only defined if this
PES packet is contained in a Program Stream. It defines the size of the input buffer, BSn, in the P-STD. If
P-STD_buffer_scale has the value '0', then the P-STD_buffer_size measures the buffer size in units of 128 bytes. If
P-STD_buffer_scale has the value '1', then the P-STD_buffer_size measures the buffer size in units of 1024 bytes. Thus:
If P-STD_buffer_scale == 0:
$$BS_n = \text{P-STD\_buffer\_size} \times 128 \qquad (2\text{-}16)$$
otherwise:
$$BS_n = \text{P-STD\_buffer\_size} \times 1024 \qquad (2\text{-}17)$$
The encoded value of the P-STD buffer size takes effect immediately when the P-STD_buffer_size field is received by
the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 System Target Decoder (refer to 2.7.7).
The size BSn shall be larger than or equal to the size of the CPB signalled by the CpbSize[ cpb_cnt_minus1 ] specified
by the NAL hrd_parameters() in the AVC video stream. If the NAL hrd_parameters() are not present in the AVC video
stream, then BSn shall be larger than or equal to the size of the NAL CPB for the byte stream format defined in
Annex A of ITU-T Rec. H.264 | ISO/IEC 14496-10 as 1200 × MaxCPB for the applied level.
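Equations 2-16 and 2-17 can be sketched as follows, together with the AVC lower bound that applies when no NAL hrd_parameters() are present. The function names are hypothetical. The byte-to-bit conversion in the second function is an assumption about the units of 1200 × MaxCPB.

```python
def p_std_buffer_size_bytes(scale: int, size: int) -> int:
    """BS_n from P-STD_buffer_scale and P-STD_buffer_size (equations 2-16 and 2-17)."""
    if not 0 <= size < (1 << 13):
        raise ValueError("P-STD_buffer_size is a 13-bit field")
    return size * (128 if scale == 0 else 1024)


def meets_default_nal_cpb(bs_n_bytes: int, max_cpb: int) -> bool:
    """Fallback check when no NAL hrd_parameters() are present.

    Assumption: 1200 * MaxCPB is a size in bits (MaxCPB taken from Annex A of
    ITU-T Rec. H.264 | ISO/IEC 14496-10), so BS_n is converted from bytes to bits.
    """
    return bs_n_bytes * 8 >= 1200 * max_cpb


print(p_std_buffer_size_bytes(0, 32))   # 4096 (scale '0', as required for audio: 128-byte units)
print(p_std_buffer_size_bytes(1, 224))  # 229376 (scale '1', as required for video: 1024-byte units)
```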
PES_extension_field_length – This is a 7-bit field which specifies the length, in bytes, of the data following this field
in the PES extension field up to and including any reserved bytes.
stream_id_extension_flag – A 1-bit flag, which when set to '0' indicates that a stream_id_extension field is present in
the PES packet header. The value of '1' for this flag is reserved.
stream_id_extension – In Program Streams, the stream_id_extension specifies the type and number of the elementary
stream as defined by the stream_id_extension in Table 2-27. In Transport Streams, the stream_id_extension may be set
to any valid value which correctly describes the elementary stream type as defined in Table 2-27. In Transport Streams,
the elementary stream type is specified in the Program Specific Information as specified in 2.4.4. Note that this field is
used as an extension of the stream_id defined above. This field shall not be used unless the value of stream_id is
1111 1101.
Table 2-27 – Stream_id_extension assignments
stream_id_extension      Note    stream coding
000 0000                 1       IPMP Control Information stream
000 0001                 2       IPMP stream
000 0010 … 011 1111              reserved_data_stream
100 0000 … 111 1111              private_stream
NOTE 1 – PES packets of stream_id_extension 0b000 0000 (IPMP Control Information Stream) have a unique syntax specified in
ISO/IEC 13818-11 (MPEG-2 IPMP).
NOTE 2 – PES packets of stream_id_extension 0b000 0001 (IPMP Stream) have a unique syntax specified in ISO/IEC 13818-11
(MPEG-2 IPMP).
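A lookup that follows Table 2-27 could be written as below. The function name is hypothetical. The caller is assumed to have already checked that stream_id is '1111 1101' (0xFD) and that stream_id_extension_flag is '0'.

```python
def classify_stream_id_extension(value: int) -> str:
    """Table 2-27 category of a 7-bit stream_id_extension (used only with stream_id 0xFD)."""
    if not 0 <= value <= 0x7F:
        raise ValueError("stream_id_extension is a 7-bit field")
    if value == 0b000_0000:
        return "IPMP Control Information stream"
    if value == 0b000_0001:
        return "IPMP stream"
    if value <= 0b011_1111:
        return "reserved_data_stream"
    return "private_stream"


print(classify_stream_id_extension(0x40))  # private_stream
```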
stuffing_byte – This is a fixed 8-bit value equal to '1111 1111' that can be inserted by the encoder, for example to meet
the requirements of the channel. It is discarded by the decoder. No more than 32 stuffing bytes shall be present in one
PES packet header.
PES_packet_data_byte – PES_packet_data_bytes shall be contiguous bytes of data from the elementary stream
indicated by the packet's stream_id or PID. When the elementary stream data conforms to ITU-T
Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 13818-3, the PES_packet_data_bytes shall be byte aligned to the bytes of
this Recommendation | International Standard. The byte-order of the elementary stream shall be preserved. The number
of PES_packet_data_bytes, N, is specified by the PES_packet_length field. N shall be equal to the value indicated in the
PES_packet_length minus the number of bytes between the last byte of the PES_packet_length field and the first
PES_packet_data_byte.
In the case of a private_stream_1, private_stream_2, ECM_stream, or EMM_stream, the contents of the
PES_packet_data_byte field are user definable and will not be specified by ITU-T | ISO/IEC in the future.
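The count N can be sketched as follows. For PES packets that carry the optional header fields, the bytes between the PES_packet_length field and the first data byte are:
• the two flag bytes;
• the PES_header_data_length byte;
• the PES_header_data_length bytes of optional fields and stuffing that it announces.
Hence N = PES_packet_length − 3 − PES_header_data_length. For stream_id values whose packets carry no such header (for example private_stream_2), N equals PES_packet_length. The function name is hypothetical. The sketch does not handle a PES_packet_length of 0, which signals an unbounded video PES packet in a Transport Stream.

```python
from typing import Optional


def pes_data_byte_count(pes_packet_length: int, pes_header_data_length: Optional[int] = None) -> int:
    """Number N of PES_packet_data_bytes.

    pes_header_data_length: None for stream types without the optional PES header,
    in which case the data bytes immediately follow PES_packet_length.
    """
    if pes_header_data_length is None:
        return pes_packet_length
    # 2 flag bytes + 1 PES_header_data_length byte + optional fields and stuffing
    n = pes_packet_length - 3 - pes_header_data_length
    if n < 0:
        raise ValueError("PES_header_data_length exceeds PES_packet_length")
    return n


print(pes_data_byte_count(2000, 10))  # 1987
print(pes_data_byte_count(500))       # 500
```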
padding_byte – This is a fixed 8-bit value equal to '1111 1111'. It is discarded by the decoder.
2.4.3.8 Carriage of Program Streams and ISO/IEC 11172-1 Systems streams in the Transport Stream
The Transport Stream contains optional fields to support the carriage of Program Streams and ISO/IEC 11172-1
Systems streams, in a way that allows simple reconstruction of the respective stream at the decoder.
When placing a Program Stream into a Transport Stream, Program Stream PES packets with stream_id values of
private_stream_1, ITU-T Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 11172-2 video, and ISO/IEC 13818-3 or
ISO/IEC 11172-3 audio, are carried in Transport Stream packets.
For these PES packets, when reconstructing the Program Stream at the Transport Stream decoder, the PES packet data
is copied to the Program Stream being reconstructed.
For Program Stream PES packets with stream_id values of program_stream_map, padding_stream, private_stream_2,
ECM, EMM, DSM_CC_stream, or program_stream_directory, all the bytes of the Program Stream PES packet, except
for the packet_start_code_prefix, are placed into the data_bytes fields of a new PES packet. The stream_id of this new
PES packet has the value of ancillary_stream (refer to Table 2-22). This new PES packet is then carried in Transport
Stream packets.
When reconstructing the Program Stream at the Transport Stream decoder, for PES packets with a stream_id value of
ancillary_stream_id, packet_start_code_prefix is written to the Program Stream being reconstructed, followed by the
data_byte fields from these Transport Stream PES packets.
ISO/IEC 11172-1 streams are carried within Transport Streams by first replacing ISO/IEC 11172-1 packet headers with
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 PES packet headers. ISO/IEC 11172-1 packet header field values are copied to
the equivalent ITU-T Rec. H.222.0 | ISO/IEC 13818-1 PES packet header fields.
The program_packet_sequence_counter field is included within the header of each PES packet carrying data from a
Program Stream, or an ISO/IEC 11172-1 System stream. This allows the order of PES packets in the original Program
Stream, or packets in the original ISO/IEC 11172-1 System stream, to be reproduced at the decoder.
The pack_header() field of a Program Stream, or an ISO/IEC 11172-1 System stream, is carried in the Transport Stream
in the header of the immediately following PES packet.
2.4.4 Program specific information
Program Specific Information (PSI) includes both ITU-T Rec. H.222.0 | ISO/IEC 13818-1 normative data and private
data that enable demultiplexing of programs by decoders. Programs are composed of one or more elementary streams,
each labelled with a PID. Programs, elementary streams or parts thereof may be scrambled for conditional access.
However, Program Specific Information shall not be scrambled.
In Transport Streams, Program Specific Information is classified into six table structures as shown in Table 2-28. While
these structures may be thought of as simple tables, they shall be segmented into sections and inserted in Transport
Stream packets, some with predetermined PIDs and others with user selectable PIDs.
Table 2-28 – Program specific information
Structure name                       Stream type                            Reserved PID #        Description
Program Association Table            ITU-T Rec. H.222.0 | ISO/IEC 13818-1   0x00                  Associates Program Number and Program Map Table PID
Program Map Table                    ITU-T Rec. H.222.0 | ISO/IEC 13818-1   Assigned in the PAT   Specifies PID values for components of one or more programs
Network Information Table            Private                                Assigned in the PAT   Physical network parameters such as FDM frequencies, Transponder Numbers, etc.
Conditional Access Table             ITU-T Rec. H.222.0 | ISO/IEC 13818-1   0x01                  Associates one or more (private) EMM streams each with a unique PID value
Transport Stream Description Table   ITU-T Rec. H.222.0 | ISO/IEC 13818-1   0x02                  Associates one or more descriptors from Table 2-45 to an entire Transport Stream
IPMP Control Information Table       ITU-T Rec. H.222.0 | ISO/IEC 13818-1   0x03                  Contains IPMP Tool List, Rights Container, Tool Container defined in ISO/IEC 13818-11
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 defined PSI tables shall be segmented into one or more sections that are carried
within Transport Stream packets. A section is a syntactic structure that shall be used for mapping each ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 defined PSI table into Transport Stream packets.
Along with ITU-T Rec. H.222.0 | ISO/IEC 13818-1 defined PSI tables, it is possible to carry private data tables. The
means by which private information is carried within Transport Stream packets is not defined by this Specification. It
may be structured in the same manner used for carrying of ITU-T Rec. H.222.0 | ISO/IEC 13818-1 defined PSI tables,
such that the syntax for mapping this private data is identical to that used for the mapping of ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 defined PSI tables. For this purpose, a private section is defined. If the private data is
carried in Transport Stream packets with the same PID value as Transport Stream packets carrying Program Map Tables
(as identified in the Program Association Table), then the private_section syntax and semantics shall be used. The data
carried in the private_data_bytes may be scrambled. However, no other fields of the private_section shall be scrambled.
This private_section allows data to be transmitted with a minimum of structure. When this structure is not used, the
mapping of private data within Transport Stream packets is not defined by this Recommendation | International
Standard.
Sections may be variable in length. The beginning of a section is indicated by a pointer_field in the Transport Stream
packet payload. The syntax of this field is specified in Table 2-29.
Adaptation fields may occur in Transport Stream packets carrying PSI sections.
Within a Transport Stream, packet stuffing bytes of value 0xFF may be found in the payload of Transport Stream
packets carrying PSI and/or private_sections only after the last byte of a section. In this case all bytes until the end of
the Transport Stream packet shall also be stuffing bytes of value 0xFF. These bytes may be discarded by a decoder. In
such a case, the payload of the next Transport Stream packet with the same PID value shall begin with a pointer_field of
value 0x00 indicating that the next section starts immediately thereafter.
Each Transport Stream shall contain one or more Transport Stream packets with PID value 0x0000. These Transport
Stream packets together shall contain a complete Program Association Table, providing a complete list of all programs
within the Transport Stream. The most recently transmitted version of the table with the current_next_indicator set to a
value of '1' shall always apply to the current data in the Transport Stream. Any changes in the programs carried within
the Transport Stream shall be described in an updated version of the Program Association Table carried in Transport
Stream packets with PID value 0x0000. These sections shall all use table_id value 0x00. Only sections with this value
of table_id are permitted within Transport Stream packets with PID value of 0x0000. For a new version of the PAT to
become valid, all sections (as indicated in the last_section_number) with a new version_number and with the
current_next_indicator set to '1' must exit Bsys defined in the T-STD (refer to 2.4.2). The PAT becomes valid when the
last byte of the section needed to complete the table exits Bsys.
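The validity rule for a new PAT version can be sketched as a small assembler that collects the sections of one table. The same rule is stated for the CAT below. This is a hypothetical helper. Sections are assumed to arrive already parsed and CRC-checked, and the moment a section is handed to `add_section` stands in for the moment its last byte exits Bsys.

```python
from typing import Dict, List, Optional


class TableAssembler:
    """Tracks one PSI table (e.g. the PAT on PID 0x0000) across versions (hypothetical helper)."""

    def __init__(self) -> None:
        self.current_version: Optional[int] = None
        self.current: Optional[List[bytes]] = None
        self._pending_version: Optional[int] = None
        self._pending: Dict[int, bytes] = {}

    def add_section(self, version_number: int, current_next_indicator: int,
                    section_number: int, last_section_number: int,
                    payload: bytes) -> Optional[List[bytes]]:
        """Return the complete new table when this section completes it, else None."""
        if current_next_indicator != 1:
            return None  # describes a table that is not yet applicable
        if version_number == self.current_version:
            return None  # repetition of the version already in force
        if version_number != self._pending_version:
            self._pending_version = version_number  # start collecting a new version
            self._pending = {}
        self._pending[section_number] = payload
        if all(n in self._pending for n in range(last_section_number + 1)):
            self.current_version = version_number
            self.current = [self._pending[n] for n in range(last_section_number + 1)]
            self._pending_version, self._pending = None, {}
            return self.current
        return None


asm = TableAssembler()
asm.add_section(0, 1, 0, 1, b"first half")          # None: section 1 still missing
print(asm.add_section(0, 1, 1, 1, b"second half"))  # [b'first half', b'second half']
```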
Whenever one or more elementary streams within a Transport Stream are scrambled, Transport Stream packets with a
PID value 0x0001 shall be transmitted containing a complete Conditional Access Table including CA_descriptors
associated with the scrambled streams. The transmitted Transport Stream packets will together form one complete
version of the conditional access table. The most recently transmitted version of the table with the
current_next_indicator set to a value of '1' shall always apply to the current data in the Transport Stream. Any changes
in scrambling making the existing table invalid or incomplete shall be described in an updated version of the conditional
access table. These sections will all use table_id value 0x01. Only sections with this table_id value are permitted within
Transport Stream packets with a PID value of 0x0001. For a new version of the CAT to become valid, all sections (as
indicated in the last_section_number) with a new version_number and with the current_next_indicator set to '1' must
exit Bsys. The CAT becomes valid when the last byte of the section needed to complete the table exits Bsys.
Each Transport Stream shall contain one or more Transport Stream packets with PID values which are labelled under
the program association table as Transport Stream packets containing TS program map sections. Each program listed in
the Program Association Table shall be described in a unique TS_program_map_section. Every program shall be fully
defined within the Transport Stream itself. Private data which has an associated elementary_PID field in the appropriate
Program Map Table section is part of the program. Other private data may exist in the Transport Stream without being
listed in the Program Map Table section. The most recently transmitted version of the TS_program_map_section with
the current_next_indicator set to a value of '1' shall always apply to the current data within the Transport Stream. Any
changes in the definition of any of the programs carried within the Transport Stream shall be described in an updated
version of the corresponding section of the program map table carried in Transport Stream packets with the PID value
identified as the program_map_PID for that specific program. All Transport Stream packets which carry a given
TS_program_map_section shall have the same PID value. During the continuous existence of a program, including all
of its associated events, the program_map_PID shall not change. A program definition shall not span more than one
TS_program_map_section. A new version of a TS_program_map_section becomes valid when the last byte of that
section with a new version_number and with the current_next_indicator set to '1' exits Bsys.
Sections with a table_id value of 0x02 shall contain Program Map Table information. Such sections may be carried in
Transport Stream packets with different PID values.
The Network Information Table is optional and its contents are private. If present it is carried within Transport Stream
packets that will have the same PID value, called the network_PID. The network_PID value is defined by the user and,
when present, shall be found in the Program Association Table under the reserved program_number 0x0000. If the
network information table exists, it shall take the form of one or more private_sections.
The maximum number of bytes in a section of an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 defined PSI table is
1024 bytes. The maximum number of bytes in a private_section is 4096 bytes.
The Transport Stream Description Table is optional. When present, the Transport Stream Description is carried within
Transport Stream packets that have a PID value 0x0002 as specified in Table 2-28 and shall apply to the entire
Transport Stream. Sections of the Transport Stream Description shall use a table_id value of 0x03 as specified in
Table 2-31 and its contents are restricted to descriptors specified in Table 2-45. The TS_description_section becomes
valid when the last byte of the section required to complete the table exits Bsys.
There are no restrictions on the occurrence of start codes, sync bytes or other bit patterns in PSI data, whether that data
is defined by this Recommendation | International Standard or is private.
2.4.4.1 Pointer
The pointer_field syntax is defined in Table 2-29.
Table 2-29 – Program specific information pointer
Syntax                                  No. of bits Mnemonic
pointer_field                           8           uimsbf
2.4.4.2 Semantic definition of fields in pointer syntax
pointer_field – This is an 8-bit field whose value shall be the number of bytes, immediately following the pointer_field
until the first byte of the first section that is present in the payload of the Transport Stream packet (so a value of 0x00 in
the pointer_field indicates that the section starts immediately after the pointer_field). When at least one section begins
in a given Transport Stream packet, then the payload_unit_start_indicator (refer to 2.4.3.2) shall be set to '1' and the first
byte of the payload of that Transport Stream packet shall contain the pointer. When no section begins in a given
Transport Stream packet, then the payload_unit_start_indicator shall be set to '0' and no pointer shall be sent in the
payload of that packet.
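A receiver can use payload_unit_start_indicator and pointer_field to separate two things: the end of a section continued from an earlier packet, and the start of a new section. The following minimal Python sketch shows this. Its assumptions are:
- It expects a well-formed 188-byte packet.
- It reads the header fields of the Transport Stream packet (2.4.3.2), including adaptation_field_control.
- The function name is hypothetical.
- Any 0xFF stuffing bytes after the last section are left in the returned data for the caller to discard.

```python
from typing import Optional, Tuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def split_psi_payload(packet: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Split the payload of a PSI-carrying Transport Stream packet at pointer_field.

    Returns (tail, head):
      tail: bytes that complete a section started in an earlier packet
      head: bytes from the first byte of the first new section onwards,
            or None when payload_unit_start_indicator is '0'
    """
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte Transport Stream packet")
    payload_unit_start_indicator = (packet[1] >> 6) & 0x01
    adaptation_field_control = (packet[3] >> 4) & 0x03
    if adaptation_field_control in (0b00, 0b10):
        return b"", None                      # no payload in this packet
    offset = 4
    if adaptation_field_control == 0b11:
        offset += 1 + packet[4]               # adaptation_field_length byte + field
    payload = packet[offset:]
    if not payload_unit_start_indicator:
        return payload, None                  # no pointer_field is present
    pointer_field = payload[0]
    if 1 + pointer_field > len(payload):
        raise ValueError("pointer_field points past the end of the payload")
    return payload[1:1 + pointer_field], payload[1 + pointer_field:]
```

A pointer_field of 0x00 yields an empty tail, because the first section starts immediately after the pointer_field.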
2.4.4.3 Program Association Table
The Program Association Table provides the correspondence between a program_number and the PID value of the
Transport Stream packets which carry the program definition. The program_number is the numeric label associated
with a program.
The overall table is contained in one or more sections with the following syntax. It may be segmented to occupy
multiple sections (see Table 2-30).
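A receiver can treat a segmented table as complete only when it holds every section_number from 0x00 up to last_section_number, all for a single version_number. The following is a minimal Python sketch of that bookkeeping. The class name `SectionCollector` is hypothetical, and CRC checking and current_next_indicator handling are omitted.

```python
from typing import Dict, Optional


class SectionCollector:
    """Hypothetical helper gathering the sections of one table version."""

    def __init__(self) -> None:
        self.version: Optional[int] = None
        self.last_section_number: Optional[int] = None
        self.sections: Dict[int, bytes] = {}

    def add(self, version_number: int, section_number: int,
            last_section_number: int, body: bytes) -> bool:
        """Store one section; return True once the whole table is present."""
        if section_number > last_section_number:
            raise ValueError("section_number exceeds last_section_number")
        if (version_number != self.version
                or last_section_number != self.last_section_number):
            # A new version (or a changed segmentation) restarts collection.
            self.version = version_number
            self.last_section_number = last_section_number
            self.sections = {}
        self.sections[section_number] = body
        return self.is_complete()

    def is_complete(self) -> bool:
        # Keys are limited to 0..last_section_number, so a full count means complete.
        return (self.last_section_number is not None
                and len(self.sections) == self.last_section_number + 1)
```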
Table 2-30 – Program association section
Syntax                                  No. of bits Mnemonic
program_association_section() {
  table_id                              8           uimsbf
  section_syntax_indicator              1           bslbf
  '0'                                   1           bslbf
  reserved                              2           bslbf
  section_length                        12          uimsbf
  transport_stream_id                   16          uimsbf
  reserved                              2           bslbf
  version_number                        5           uimsbf
  current_next_indicator                1           bslbf
  section_number                        8           uimsbf
  last_section_number                   8           uimsbf
  for (i = 0; i < N; i++) {
    program_number                      16          uimsbf
    reserved                            3           bslbf
    if (program_number == '0') {
      network_PID                       13          uimsbf
    }
    else {
      program_map_PID                   13          uimsbf
    }
  }
  CRC_32                                32          rpchof
}
2.4.4.4 Table_id assignments
The table_id field identifies the contents of a Transport Stream PSI section as shown in Table 2-31.
Table 2-31 – table_id assignment values
Value       Description
0x00        program_association_section
0x01        conditional_access_section (CA_section)
0x02        TS_program_map_section
0x03        TS_description_section
0x04        ISO_IEC_14496_scene_description_section
0x05        ISO_IEC_14496_object_descriptor_section
0x06        Metadata_section
0x07        IPMP_Control_Information_section (defined in ISO/IEC 13818-11)
0x08-0x3F   ITU-T Rec. H.222.0 | ISO/IEC 13818-1 reserved
0x40-0xFE   User private
0xFF        Forbidden
2.4.4.5 Semantic definition of fields in program association section
table_id – This is an 8-bit field, which shall be set to 0x00 as shown in Table 2-31.
section_syntax_indicator – The section_syntax_indicator is a 1-bit field which shall be set to '1'.
section_length – This is a 12-bit field, the first two bits of which shall be '00'. The remaining 10 bits specify the number
of bytes of the section, starting immediately following the section_length field, and including the CRC. The value in
this field shall not exceed 1021 (0x3FD).
transport_stream_id – This is a 16-bit field which serves as a label to identify this Transport Stream from any other
multiplex within a network. Its value is defined by the user.
version_number – This 5-bit field is the version number of the whole Program Association Table. The version number
shall be incremented by 1 modulo 32 whenever the definition of the Program Association Table changes. When the
current_next_indicator is set to '1', then the version_number shall be that of the currently applicable Program
Association Table. When the current_next_indicator is set to '0', then the version_number shall be that of the next
applicable Program Association Table.
current_next_indicator – A 1-bit indicator, which when set to '1' indicates that the Program Association Table sent is
currently applicable. When the bit is set to '0', it indicates that the table sent is not yet applicable and shall be the next
table to become valid.
section_number – This 8-bit field gives the number of this section. The section_number of the first section in the
Program Association Table shall be 0x00. It shall be incremented by 1 with each additional section in the Program
Association Table.
last_section_number – This 8-bit field specifies the number of the last section (that is, the section with the highest
section_number) of the complete Program Association Table.
program_number – Program_number is a 16-bit field. It specifies the program to which the program_map_PID is
applicable. When set to 0x0000, then the following PID reference shall be the network PID. For all other cases the value
of this field is user defined. This field shall not take any single value more than once within one version of the Program
Association Table.
NOTE – The program_number may be used as a designation for a broadcast channel, for example.
network_PID – The network_PID is a 13-bit field, used only in conjunction with a program_number value of 0x0000, that
specifies the PID of the Transport Stream packets which shall contain the Network Information Table. The value of the
network_PID field is defined by the user, but shall only take values as specified in Table 2-3. The presence of the
network_PID is optional.
program_map_PID – The program_map_PID is a 13-bit field specifying the PID of the Transport Stream packets
which shall contain the program_map_section applicable for the program as specified by the program_number. No
program_number shall have more than one program_map_PID assignment. The value of the program_map_PID is
defined by the user, but shall only take values as specified in Table 2-3.
CRC_32 – This is a 32-bit field that contains the CRC value that gives a zero output of the registers in the decoder
defined in Annex A after processing the entire program association section.
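The following Python sketch parses one program_association_section and verifies its CRC_32. Its assumptions are:
- The Annex A decoder corresponds to the widely used CRC-32/MPEG-2 parameters: generator polynomial 0x04C11DB7, register initialised to 0xFFFFFFFF, bits processed most-significant first, and no final inversion.
- With those parameters, running the CRC over the whole section, including the CRC_32 field, leaves the register at zero. This is the check described above.
- The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def crc32_mpeg2(data: bytes) -> int:
    """MSB-first CRC-32, polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def parse_pat_section(section: bytes) -> Tuple[int, Optional[int], Dict[int, int]]:
    """Return (transport_stream_id, network_PID or None, {program_number: program_map_PID})."""
    if section[0] != 0x00 or not section[1] & 0x80:
        raise ValueError("not a program_association_section")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length                  # one past the last CRC_32 byte
    if not 9 <= section_length <= 1021 or len(section) < end or (end - 12) % 4:
        raise ValueError("bad section_length")
    # Running the CRC over the whole section, CRC_32 included, must give zero.
    if crc32_mpeg2(section[:end]) != 0:
        raise ValueError("CRC_32 check failed")
    transport_stream_id = (section[3] << 8) | section[4]
    network_pid: Optional[int] = None
    programs: Dict[int, int] = {}
    for pos in range(8, end - 4, 4):          # 4 bytes per program loop entry
        program_number = (section[pos] << 8) | section[pos + 1]
        pid = ((section[pos + 2] & 0x1F) << 8) | section[pos + 3]
        if program_number == 0x0000:
            network_pid = pid
        else:
            programs[program_number] = pid
    return transport_stream_id, network_pid, programs
```

A section that fails the CRC check would normally be discarded. Combining the sections of a multi-section table is not shown here.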
2.4.4.6 Conditional Access Table
The Conditional Access (CA) Table provides the association between one or more CA systems, their EMM streams and
any special parameters associated with them. Refer to 2.6.16 for a definition of the descriptor() field in Table 2-32.
The table is contained in one or more sections with the following syntax. It may be segmented to occupy multiple
sections.
Table 2-32 – Conditional access section
Syntax                                  No. of bits Mnemonic
CA_section() {
  table_id                              8           uimsbf
  section_syntax_indicator              1           bslbf
  '0'                                   1           bslbf
  reserved                              2           bslbf
  section_length                        12          uimsbf
  reserved                              18          bslbf
  version_number                        5           uimsbf
  current_next_indicator                1           bslbf
  section_number                        8           uimsbf
  last_section_number                   8           uimsbf
  for (i = 0; i < N; i++) {
    descriptor()
  }
  CRC_32                                32          rpchof
}
2.4.4.7 Semantic definition of fields in conditional access section
table_id – This is an 8-bit field, which shall be set to 0x01 as specified in Table 2-31.
section_syntax_indicator – The section_syntax_indicator is a 1-bit field which shall be set to '1'.
section_length – This is a 12-bit field, the first two bits of which shall be '00'. The remaining 10 bits specify the
number of bytes of the section starting immediately following the section_length field, and including the CRC. The
value in this field shall not exceed 1021 (0x3FD).
version_number – This 5-bit field is the version number of the entire conditional access table. The version number
shall be incremented by 1 modulo 32 when a change in the information carried within the CA table occurs. When the
current_next_indicator is set to '1', then the version_number shall be that of the currently applicable Conditional Access
Table. When the current_next_indicator is set to '0', then the version_number shall be that of the next applicable
Conditional Access Table.
current_next_indicator – A 1-bit indicator, which when set to '1' indicates that the Conditional Access Table sent is
currently applicable. When the bit is set to '0', it indicates that the Conditional Access Table sent is not yet applicable
and shall be the next Conditional Access Table to become valid.
section_number – This 8-bit field gives the number of this section. The section_number of the first section in the
Conditional Access Table shall be 0x00. It shall be incremented by 1 with each additional section in the Conditional
Access Table.
last_section_number – This 8-bit field specifies the number of the last section (that is, the section with the highest
section_number) of the Conditional Access Table.
CRC_32 – This is a 32-bit field that contains the CRC value that gives a zero output of the registers in the decoder
defined in Annex A after processing the entire conditional access section.
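The CA_section has no transport_stream_id, so its descriptor loop starts at byte 8 and ends 4 bytes before the end of the section. The sketch below walks that loop and extracts the CA_system_ID and the EMM PID from each CA_descriptor. Its assumptions are:
- The descriptor layout follows the CA_descriptor of 2.6.16, which is not reproduced in this subclause: descriptor_tag 9, then descriptor_length, a 16-bit CA_system_ID, 3 reserved bits and a 13-bit CA_PID.
- CRC_32 verification is omitted.
- The function name is hypothetical.

```python
from typing import List, Tuple

CA_DESCRIPTOR_TAG = 0x09  # CA_descriptor (2.6.16)


def parse_cat_section(section: bytes) -> List[Tuple[int, int]]:
    """Return (CA_system_ID, CA_PID) for each CA_descriptor in a CA_section.

    CRC_32 is not verified in this sketch.
    """
    if section[0] != 0x01:
        raise ValueError("table_id must be 0x01 for a CA_section")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    loop_end = 3 + section_length - 4         # descriptors stop before CRC_32
    if section_length > 1021 or len(section) < loop_end + 4:
        raise ValueError("bad section_length")
    entries: List[Tuple[int, int]] = []
    pos = 8                                   # first byte after last_section_number
    while pos + 2 <= loop_end:
        tag, length = section[pos], section[pos + 1]
        if pos + 2 + length > loop_end:
            raise ValueError("descriptor runs past the end of the loop")
        body = section[pos + 2:pos + 2 + length]
        if tag == CA_DESCRIPTOR_TAG and length >= 4:
            ca_system_id = (body[0] << 8) | body[1]
            ca_pid = ((body[2] & 0x1F) << 8) | body[3]   # PID of the EMM stream
            entries.append((ca_system_id, ca_pid))
        pos += 2 + length
    return entries
```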
2.4.4.8 Program Map Table
The Program Map Table provides the mappings between program numbers and the program elements that comprise
them. A single instance of such a mapping is referred to as a "program definition". The program map table is the
complete collection of all program definitions for a Transport Stream. This table shall be transmitted in packets, the PID
values of which are selected by the encoder. More than one PID value may be used, if desired. The table is contained in
one or more sections with the following syntax. It may be segmented to occupy multiple sections. In each section, the
section_number field shall be set to zero. Sections are identified by the program_number field.
Definition for the descriptor() fields may be found in 2.6 (see Table 2-33).
Table 2-33 – Transport Stream program map section
Syntax                                  No. of bits Mnemonic
TS_program_map_section() {
  table_id                              8           uimsbf
  section_syntax_indicator              1           bslbf
  '0'                                   1           bslbf
  reserved                              2           bslbf
  section_length                        12          uimsbf
  program_number                        16          uimsbf
  reserved                              2           bslbf
  version_number                        5           uimsbf
  current_next_indicator                1           bslbf
  section_number                        8           uimsbf
  last_section_number                   8           uimsbf
  reserved                              3           bslbf
  PCR_PID                               13          uimsbf
  reserved                              4           bslbf
  program_info_length                   12          uimsbf
  for (i = 0; i < N; i++) {
    descriptor()
  }
  for (i = 0; i < N1; i++) {
    stream_type                         8           uimsbf
    reserved                            3           bslbf
    elementary_PID                      13          uimsbf
    reserved                            4           bslbf
    ES_info_length                      12          uimsbf
    for (i = 0; i < N2; i++) {
      descriptor()
    }
  }
  CRC_32                                32          rpchof
}
2.4.4.9 Semantic definition of fields in Transport Stream program map section
table_id – This is an 8-bit field, which in the case of a TS_program_map_section shall always be set to 0x02 as shown
in Table 2-31.
section_syntax_indicator – The section_syntax_indicator is a 1-bit field which shall be set to '1'.
section_length – This is a 12-bit field, the first two bits of which shall be '00'. The remaining 10 bits specify the number
of bytes of the section starting immediately following the section_length field, and including the CRC. The value in this
field shall not exceed 1021 (0x3FD).
program_number – program_number is a 16-bit field. It specifies the program to which the program_map_PID is
applicable. One program definition shall be carried within only one TS_program_map_section. This implies that a
program definition is never longer than 1016 bytes (0x3F8). See Informative Annex C for ways to deal with the cases when
that length is not sufficient. The program_number may be used as a designation for a broadcast channel, for example.
By describing the different program elements belonging to a program, data from different sources (e.g., sequential
events) can be concatenated together to form a continuous set of streams using a program_number. For examples of
applications refer to Annex C.
version_number – This 5-bit field is the version number of the TS_program_map_section. The version number shall be
incremented by 1 modulo 32 when a change in the information carried within the section occurs. Version number refers
to the definition of a single program, and therefore to a single section. When the current_next_indicator is set to '1', then
the version_number shall be that of the currently applicable TS_program_map_section. When the
current_next_indicator is set to '0', then the version_number shall be that of the next applicable
TS_program_map_section.
current_next_indicator – A 1-bit field, which when set to '1' indicates that the TS_program_map_section sent is
currently applicable. When the bit is set to '0', it indicates that the TS_program_map_section sent is not yet applicable
and shall be the next TS_program_map_section to become valid.
section_number – The value of this 8-bit field shall be 0x00.
last_section_number – The value of this 8-bit field shall be 0x00.
PCR_PID – This is a 13-bit field indicating the PID of the Transport Stream packets which shall contain the PCR fields
valid for the program specified by program_number. If no PCR is associated with a program definition for private
streams, then this field shall take the value of 0x1FFF. Refer to the semantic definition of PCR in 2.4.3.5 and Table 2-3
for restrictions on the choice of PCR_PID value.
program_info_length – This is a 12-bit field, the first two bits of which shall be '00'. The remaining 10 bits specify the
number of bytes of the descriptors immediately following the program_info_length field.
stream_type – This is an 8-bit field specifying the type of program element carried within the packets with the PID
whose value is specified by the elementary_PID. The values of stream_type are specified in Table 2-34.
NOTE – An ITU-T Rec. H.222.0 | ISO/IEC 13818-1 auxiliary stream is available for data types defined by this Specification,
other than audio, video, and DSM-CC, such as Program Stream Directory and Program Stream Map.
Table 2-34 – Stream type assignments
Value       Description
0x00        ITU-T | ISO/IEC Reserved
0x01        ISO/IEC 11172-2 Video
0x02        ITU-T Rec. H.262 | ISO/IEC 13818-2 Video or ISO/IEC 11172-2 constrained parameter video stream
0x03        ISO/IEC 11172-3 Audio
0x04        ISO/IEC 13818-3 Audio
0x05        ITU-T Rec. H.222.0 | ISO/IEC 13818-1 private_sections
0x06        ITU-T Rec. H.222.0 | ISO/IEC 13818-1 PES packets containing private data
0x07        ISO/IEC 13522 MHEG
0x08        ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Annex A DSM-CC
0x09        ITU-T Rec. H.222.1
0x0A        ISO/IEC 13818-6 type A
0x0B        ISO/IEC 13818-6 type B
0x0C        ISO/IEC 13818-6 type C
0x0D        ISO/IEC 13818-6 type D
0x0E        ITU-T Rec. H.222.0 | ISO/IEC 13818-1 auxiliary
0x0F        ISO/IEC 13818-7 Audio with ADTS transport syntax
0x10        ISO/IEC 14496-2 Visual
0x11        ISO/IEC 14496-3 Audio with the LATM transport syntax as defined in ISO/IEC 14496-3
0x12        ISO/IEC 14496-1 SL-packetized stream or FlexMux stream carried in PES packets
0x13        ISO/IEC 14496-1 SL-packetized stream or FlexMux stream carried in ISO/IEC 14496_sections
0x14        ISO/IEC 13818-6 Synchronized Download Protocol
0x15        Metadata carried in PES packets
0x16        Metadata carried in metadata_sections
0x17        Metadata carried in ISO/IEC 13818-6 Data Carousel
0x18        Metadata carried in ISO/IEC 13818-6 Object Carousel
0x19        Metadata carried in ISO/IEC 13818-6 Synchronized Download Protocol
0x1A        IPMP stream (defined in ISO/IEC 13818-11, MPEG-2 IPMP)
0x1B        AVC video stream as defined in ITU-T Rec. H.264 | ISO/IEC 14496-10 Video
0x1C-0x7E   ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Reserved
0x7F        IPMP stream
0x80-0xFF   User Private
elementary_PID – This is a 13-bit field specifying the PID of the Transport Stream packets which carry the associated
program element.
ES_info_length – This is a 12-bit field, the first two bits of which shall be '00'. The remaining 10 bits specify the
number of bytes of the descriptors of the associated program element immediately following the ES_info_length field.
CRC_32 – This is a 32-bit field that contains the CRC value that gives a zero output of the registers in the decoder
defined in Annex A after processing the entire Transport Stream program map section.
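The following is a minimal Python sketch of a TS_program_map_section parser that follows the field layout of Table 2-33. Descriptors are returned as raw bytes, CRC_32 is not verified, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ElementaryStream:
    stream_type: int
    elementary_pid: int
    es_info: bytes              # raw descriptor bytes (ES_info_length of them)


@dataclass
class ProgramDefinition:
    program_number: int
    version_number: int
    current_next_indicator: int
    pcr_pid: int
    program_info: bytes         # raw descriptor bytes (program_info_length of them)
    streams: List[ElementaryStream] = field(default_factory=list)


def parse_pmt_section(section: bytes) -> ProgramDefinition:
    """Parse one TS_program_map_section; CRC_32 is not checked here."""
    if section[0] != 0x02:
        raise ValueError("table_id must be 0x02 for a TS_program_map_section")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4              # loops stop before CRC_32
    if not 13 <= section_length <= 1021 or len(section) < end + 4:
        raise ValueError("bad section_length")
    if section[6] != 0x00 or section[7] != 0x00:
        raise ValueError("section_number and last_section_number must be 0x00")
    program_info_length = ((section[10] & 0x0F) << 8) | section[11]
    pos = 12 + program_info_length
    pmt = ProgramDefinition(
        program_number=(section[3] << 8) | section[4],
        version_number=(section[5] >> 1) & 0x1F,
        current_next_indicator=section[5] & 0x01,
        pcr_pid=((section[8] & 0x1F) << 8) | section[9],
        program_info=section[12:pos],
    )
    while pos + 5 <= end:                     # 5 fixed bytes per program element
        es_info_length = ((section[pos + 3] & 0x0F) << 8) | section[pos + 4]
        pmt.streams.append(ElementaryStream(
            stream_type=section[pos],
            elementary_pid=((section[pos + 1] & 0x1F) << 8) | section[pos + 2],
            es_info=section[pos + 5:pos + 5 + es_info_length],
        ))
        pos += 5 + es_info_length
    return pmt
```

The stream_type values can then be interpreted with Table 2-34, for example 0x1B for AVC video or 0x0F for ISO/IEC 13818-7 audio with ADTS transport syntax.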
2.4.4.10 Syntax of the Private section
When private data is sent in Transport Stream packets with a PID value designated as a Program Map Table PID in the
Program Association Table, the private_section shall be used. The private_section allows data to be transmitted with a
minimum of structure while enabling a decoder to parse the stream. The sections may be used in two ways: if the
section_syntax_indicator is set to '1', then the whole structure common to all tables shall be used; if the indicator is set
to '0', then only the fields 'table_id' through 'private_section_length' shall follow the common structure syntax and
semantics and the rest of the private_section may take any form the user determines. Examples of extended use of this
syntax are found in Informative Annex C.
A private table may be made of several private_sections, all with the same table_id (see Table 2-35).
Table 2-35 – Private section
Syntax                                  No. of bits Mnemonic
private_section() {
  table_id                              8           uimsbf
  section_syntax_indicator              1           bslbf
  private_indicator                     1           bslbf
  reserved                              2           bslbf
  private_section_length                12          uimsbf
  if (section_syntax_indicator == '0') {
    for (i = 0; i < N; i++) {
      private_data_byte                 8           bslbf
    }
  }
  else {
    table_id_extension                  16          uimsbf
    reserved                            2           bslbf
    version_number                      5           uimsbf
    current_next_indicator              1           bslbf
    section_number                      8           uimsbf
    last_section_number                 8           uimsbf
    for (i = 0; i < private_section_length-9; i++) {
      private_data_byte                 8           bslbf
    }
    CRC_32                              32          rpchof
  }
}
2.4.4.11 Semantic definition of fields in private section
table_id – This is an 8-bit field, the value of which identifies the private table to which this section belongs. Only values
defined in Table 2-31 as "user private" may be used.
section_syntax_indicator – This is a 1-bit indicator. When set to '1', it indicates that the private section follows the
generic section syntax beyond the private_section_length field. When set to '0', it indicates that the private_data_bytes
immediately follow the private_section_length field.
private_indicator – This is a 1-bit user-definable flag that shall not be specified by ITU-T | ISO/IEC in the future.
private_section_length – A 12-bit field. It specifies the number of remaining bytes in the private section immediately
following the private_section_length field up to the end of the private_section. The value in this field shall not exceed
4093 (0xFFD).
private_data_byte – The private_data_byte field is user definable and shall not be specified by ITU-T | ISO/IEC in the
future.
table_id_extension – This is a 16-bit field. Its use and value are defined by the user.
version_number – This 5-bit field is the version number of the private_section. The version_number shall be
incremented by 1 modulo 32 when a change in the information carried within the private_section occurs. When the
current_next_indicator is set to '0', then the version_number shall be that of the next applicable private_section with the
same table_id and section_number.
current_next_indicator – A 1-bit field, which when set to '1' indicates that the private_section sent is currently
applicable. When the current_next_indicator is set to '1', then the version_number shall be that of the currently
applicable private_section. When the bit is set to '0', it indicates that the private_section sent is not yet applicable and
shall be the next private_section with the same section_number and table_id to become valid.
section_number – This 8-bit field gives the number of the private_section. The section_number of the first section in a
private table shall be 0x00. The section_number shall be incremented by 1 with each additional section in this private
table.
last_section_number – This 8-bit field specifies the number of the last section (that is, the section with the highest
section_number) of the private table of which this section is a part.
CRC_32 – This is a 32-bit field that contains the CRC value that gives a zero output of the registers in the decoder
defined in Annex A after processing the entire private section.
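The section_syntax_indicator alone is enough to tell the two forms of the private_section apart. The following is a minimal Python sketch with hypothetical names. The CRC_32 of the long form is skipped, not verified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrivateSection:
    table_id: int
    private_indicator: int
    private_data: bytes
    table_id_extension: Optional[int] = None     # long form only
    version_number: Optional[int] = None
    current_next_indicator: Optional[int] = None
    section_number: Optional[int] = None
    last_section_number: Optional[int] = None


def parse_private_section(section: bytes) -> PrivateSection:
    """Parse the short (syntax indicator '0') or long ('1') form of Table 2-35."""
    table_id = section[0]
    if not 0x40 <= table_id <= 0xFE:
        raise ValueError("private sections use user-private table_id values")
    section_syntax_indicator = section[1] >> 7
    private_indicator = (section[1] >> 6) & 0x01
    length = ((section[1] & 0x0F) << 8) | section[2]
    if length > 4093 or len(section) < 3 + length:
        raise ValueError("bad private_section_length")
    if section_syntax_indicator == 0:
        # Short form: private_data_bytes follow private_section_length directly.
        return PrivateSection(table_id, private_indicator, section[3:3 + length])
    if length < 9:
        raise ValueError("long-form section too short")
    # Long form: 5 header bytes, private_section_length - 9 data bytes, CRC_32.
    return PrivateSection(
        table_id, private_indicator,
        private_data=section[8:3 + length - 4],
        table_id_extension=(section[3] << 8) | section[4],
        version_number=(section[5] >> 1) & 0x1F,
        current_next_indicator=section[5] & 0x01,
        section_number=section[6],
        last_section_number=section[7],
    )
```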
2.4.4.12 Syntax of the Transport Stream section
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 compliant bitstreams may carry the information defined in Table 2-36.
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 compliant decoders may decode the information defined in this table.
The Transport Stream Description Table is defined to support the carriage of descriptors as found in 2.6 for an entire
Transport Stream. The descriptors shall apply to the entire Transport Stream. This table uses a table_id value of 0x03 as
specified in Table 2-31 and is carried in Transport Stream packets whose PID value is 0x0002 as specified in Table 2-3.
Table 2-36 – The Transport Stream Description Table
Syntax                                  No. of bits Mnemonic
TS_description_section() {
  table_id                              8           uimsbf
  section_syntax_indicator              1           bslbf
  '0'                                   1           bslbf
  reserved                              2           bslbf
  section_length                        12          uimsbf
  reserved                              18          bslbf
  version_number                        5           uimsbf
  current_next_indicator                1           bslbf
  section_number                        8           uimsbf
  last_section_number                   8           uimsbf
  for (i = 0; i < N; i++) {
    descriptor()
  }
  CRC_32                                32          rpchof
}
2.4.4.13 Semantic definition of fields in the Transport Stream section
table_id – This is an 8-bit field, which shall be set to 0x03 as specified in Table 2-31.
section_length – This is a 12-bit field, the first two bits of which shall be '00'. The remaining 10 bits specify the number
of bytes of the section, starting immediately following the section_length field, and including the CRC. The value in
this field shall not exceed 1021 (0x3FD).
version_number – This 5-bit field is the version number of the whole Transport Stream Description Table. The version
number shall be incremented by 1 modulo 32 whenever the definition of the Transport Stream Description Table
changes. When the current_next_indicator is set to '1', then the version_number shall be that of the currently applicable
Transport Stream Description Table. When the current_next_indicator is set to '0', then the version_number shall be that
of the next applicable Transport Stream Description Table.
current_next_indicator – A 1-bit indicator, which, when set to '1', indicates that the Transport Stream Description
Table sent is currently applicable. When the bit is set to '0', it indicates that the table sent is not yet applicable and shall
be the next table to become valid.
section_number – This 8-bit field gives the number of this section. The section_number of the first section in the
Transport Stream Description Table shall be 0x00. It shall be incremented by 1 with each additional section in the
Transport Stream Description Table.
last_section_number – This 8-bit field specifies the number of the last section (that is, the section with the highest
section_number) of the complete Transport Stream Description Table.
CRC_32 – This is a 32-bit field that contains the CRC value that gives a zero output of the registers in the decoder
defined in Annex A after processing the entire Transport Stream Description section.
2.5 Program Stream bitstream requirements
2.5.1 Program Stream coding structure and parameters
The ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Program Stream coding layer allows one program of one or more
elementary streams to be combined into a single stream. Data from each elementary stream are multiplexed together
with information that allows synchronized presentation of the elementary streams within the program.
A Program Stream consists of one or more elementary streams from one program multiplexed together. Audio and
video elementary streams consist of access units.
Elementary Stream data is carried in PES packets. A PES packet consists of a PES packet header followed by packet
data. PES packets are inserted into Program Stream packs.
The PES packet header begins with a 32-bit start-code that also identifies the stream (refer to Table 2-22) to which the
packet data belongs. The PES packet header may contain either a Presentation Time Stamp (PTS) alone or both a PTS
and a Decoding Time Stamp (DTS). The PES packet header also contains other optional fields. The packet
data contains a variable number of contiguous bytes from one elementary stream.
In a Program Stream, PES packets are organized in packs. A pack commences with a pack header and is followed by
zero or more PES packets. The pack header begins with a 32-bit start-code. The pack header is used to store timing and
bitrate information.
The Program Stream begins with a system header that optionally may be repeated. The system header carries a
summary of the system parameters defined in the stream.
This Recommendation | International Standard does not specify the coded data which may be used as part of conditional
access systems. This Recommendation | International Standard does, however, provide mechanisms for program service
providers to transport and identify this data for decoder processing, and to correctly reference data which are here
specified.
2.5.2 Program Stream system target decoder
The semantics of the Program Stream and the constraints on these semantics require exact definitions of decoding
events and the times at which these events occur. The definitions needed are set out in this Specification using a
hypothetical decoder known as the Program Stream system target decoder (P-STD).
The P-STD is a conceptual model used to define these terms precisely and to model the decoding process during the
construction of Program Streams. The P-STD is defined only for this purpose. Neither the architecture of the P-STD nor
the timing described precludes uninterrupted, synchronized playback of Program Streams from a variety of decoders
with different architectures or timing schedules.
The following notation is used to describe the Program Stream system target decoder and is partially illustrated
in Figure 2-2.
i, i′ are indices to bytes in the Program Stream. The first byte has index 0.
j is an index to access units in the elementary streams.
k, k′, k″ are indices to presentation units in the elementary streams.
n is an index to the elementary streams.
t(i) indicates the time in seconds at which the i-th byte of the Program Stream enters the system
target decoder. The value t(0) is an arbitrary constant.
SCR(i) is the time encoded in the SCR field measured in units of the 27 MHz system clock where i is
the byte index of the final byte of the system_clock_reference_base field.
An(j) is the j-th access unit in elementary stream n. An(j) is indexed in decoding order.
tdn(j) is the decoding time, measured in seconds, in the system target decoder of the j-th access unit
in elementary stream n.
Pn(k) is the k-th presentation unit in elementary stream n. Pn(k) is indexed in presentation order.
tpn(k) is the presentation time, measured in seconds, in the system target decoder of the k-th
presentation unit in elementary stream n.
t is time measured in seconds.
Fn(t) is the fullness, measured in bytes, of the system target decoder input buffer for elementary
stream n at time t.
Bn is the input buffer in the system target decoder for elementary stream n.
BSn is the size of the system target decoder input buffer, measured in bytes, for elementary stream
n.
Dn is the decoder for elementary stream n.
On is the reorder buffer for video elementary stream n.
2.5.2.1 System clock frequency
Timing information referenced in P-STD is carried by several data fields defined in this Specification. The fields are
defined in 2.5.3.3 and 2.4.3.6. This information is coded as the sampled value of a system clock.
The value of the system clock frequency is measured in Hz and shall meet the following constraints:
– $27\,000\,000 - 810 \le \mathrm{system\_clock\_frequency} \le 27\,000\,000 + 810$;
– rate of change of system_clock_frequency with time $\le 75 \times 10^{-3}$ Hz/s.
The notation "system_clock_frequency" is used in several places in this Recommendation | International Standard
to refer to the frequency of a clock meeting these requirements. For notational convenience, equations in
which SCR, PTS, or DTS appear lead to values of time which are accurate to some integral multiple of
$(300 \times 2^{33} / \mathrm{system\_clock\_frequency})$ seconds. This is due to the encoding of SCR timing information as
33 bits of 1/300 of the system clock frequency plus 9 bits for the remainder, and to the encoding of PTS and DTS as
33 bits of the system clock frequency divided by 300.
2.5.2.2 Input to the Program Stream system target decoder
Data from the Program Stream enters the system target decoder. The i-th byte enters at time t(i). The time at which this
byte enters the system target decoder can be recovered from the input stream by decoding the input System Clock
Reference (SCR) fields and the program_mux_rate field encoded in the pack header. The SCR, as defined in
equation 2-18, is coded in two parts. The first part, called system_clock_reference_base (see equation 2-19), is in units
of the period of 1/300 × the system clock frequency (that is, a 90 kHz clock). The second part, called
system_clock_reference_ext (see equation 2-20), is in units of the period of the system clock frequency. In the following, the values encoded in these
fields are denoted by SCR_base(i) and SCR_ext(i). The value encoded in the SCR field indicates time t(i), where i
refers to the byte containing the last bit of the system_clock_reference_base field.
Specifically:
$$\mathrm{SCR}(i) = \mathrm{SCR\_base}(i) \times 300 + \mathrm{SCR\_ext}(i) \tag{2-18}$$
where:
$$\mathrm{SCR\_base}(i) = \big((\mathrm{system\_clock\_frequency} \times t(i)) \operatorname{DIV} 300\big) \,\%\, 2^{33} \tag{2-19}$$
$$\mathrm{SCR\_ext}(i) = \big((\mathrm{system\_clock\_frequency} \times t(i)) \operatorname{DIV} 1\big) \,\%\, 300 \tag{2-20}$$
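Equations 2-18 to 2-20 can be checked with exact integer arithmetic. The following is a minimal Python sketch with hypothetical function names. Its assumptions are:
- The clock runs at the nominal 27 MHz.
- Times are given in seconds as `Fraction` values, so the DIV operations are exact.

The sketch also encodes the frequency tolerance of 2.5.2.1. It shows that SCR_base wraps after 300 × 2^33 / system_clock_frequency seconds, which is about 26.5 hours at 27 MHz.

```python
from fractions import Fraction
from typing import Tuple

NOMINAL_SYSTEM_CLOCK_FREQUENCY = 27_000_000   # Hz


def clock_frequency_ok(frequency_hz: float) -> bool:
    """Tolerance of 2.5.2.1: 27 MHz +/- 810 Hz (30 ppm)."""
    return 27_000_000 - 810 <= frequency_hz <= 27_000_000 + 810


def scr_fields(t_seconds: Fraction,
               system_clock_frequency: int = NOMINAL_SYSTEM_CLOCK_FREQUENCY) -> Tuple[int, int]:
    """Equations 2-19 and 2-20: return (SCR_base(i), SCR_ext(i)) for a time t(i) >= 0."""
    ticks = int(system_clock_frequency * t_seconds)   # "DIV 1" truncates
    return (ticks // 300) % 2**33, ticks % 300


def scr_value(scr_base: int, scr_ext: int) -> int:
    """Equation 2-18: SCR(i) in units of the system clock."""
    return scr_base * 300 + scr_ext


# SCR_base wraps around after 300 * 2**33 / system_clock_frequency seconds.
SCR_WRAP_SECONDS = Fraction(300 * 2**33, NOMINAL_SYSTEM_CLOCK_FREQUENCY)
```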
The input arrival time, t(i), as given in equation 2-21, for all other bytes shall be constructed from SCR(i) and the rate at
which data arrives, where the arrival rate within each pack is the value represented in the program_mux_rate field in
that pack's header.
$$t(i) = \frac{\mathrm{SCR}(i')}{\mathrm{system\_clock\_frequency}} + \frac{i - i'}{\mathrm{program\_mux\_rate} \times 50} \tag{2-21}$$
where:
i′ is the index of the byte containing the last bit of the system_clock_reference_base field in the
pack header
i is the index of any byte in the pack, including the pack header
SCR(i′) is the time encoded in the system clock reference base and extension fields in units of the
system clock
program_mux_rate is a field defined in 2.5.3.3.
After delivery of the last byte of a pack there may be a time interval during which no bytes are delivered to the input of
the P-STD.
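Equation 2-21 translates directly into code. The following is a minimal Python sketch with a hypothetical function name. It assumes SCR(i′) is supplied in 27 MHz units (SCR_base × 300 + SCR_ext) and program_mux_rate as the raw field value, which counts in units of 50 bytes/s.

```python
from fractions import Fraction


def byte_arrival_time(i: int, i_prime: int, scr_i_prime: int,
                      program_mux_rate: int,
                      system_clock_frequency: int = 27_000_000) -> Fraction:
    """Equation 2-21: arrival time t(i), in seconds, of byte i of a pack.

    i may be smaller than i_prime for pack-header bytes preceding the SCR field.
    """
    if program_mux_rate <= 0:
        raise ValueError("program_mux_rate must be positive")
    return (Fraction(scr_i_prime, system_clock_frequency)
            + Fraction(i - i_prime, program_mux_rate * 50))
```

For example, take program_mux_rate = 20 000 (that is, 1 000 000 bytes/s) and SCR(i′) = 27 000 000 (1 s). The byte 1000 positions after i′ then arrives at 1.001 s.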
2.5.2.3 Buffering
The PES packet data from elementary stream n is passed to the input buffer for stream n, Bn. Transfer of byte i from the
system target decoder input to Bn is instantaneous, so that byte i enters the buffer for stream n, of size BSn, at time t(i).
Bytes present in the pack header, system headers, Program Stream Maps, Program Stream Directories, or PES packet
headers of the Program Stream, such as the SCR, DTS, PTS, and packet_length fields, are not delivered to any of the
buffers, but may be used to control the system.
The input buffer sizes BS1 through BSn are given by the P-STD buffer size parameter in the syntax in equations 2-16
and 2-17.
At the decoding time, tdn(j), all data for the access unit that has been in the buffer longest, An(j), and any stuffing bytes
that immediately precede it that are present in the buffer at the time tdn(j), are removed instantaneously at time tdn(j).
The decoding time tdn(j) is specified in the DTS or PTS fields. Decoding times tdn(j + 1), tdn(j + 2), ... of access units
without encoded DTS or PTS fields which directly follow access unit j may be derived from information in the
elementary stream. Refer to Annex C of ITU-T Rec. H.262 | ISO/IEC 13818-2, ISO/IEC 13818-3, ISO/IEC 11172-2 or
ISO/IEC 11172-3. Also refer to 2.7.5. As the access unit is removed from the buffer, it is instantaneously decoded to a
presentation unit.
The Program Stream shall be constructed and t(i) shall be chosen so that the input buffers of size BS1 through BSn
neither overflow nor underflow in the program system target decoder. That is:
$$0 \le F_n(t) \le BS_n \quad \text{for all } t \text{ and } n,$$
and:
$$F_n(t) = 0 \quad \text{instantaneously before } t = t(0).$$
Fn(t) is the instantaneous fullness of P-STD buffer Bn.
An exception to this condition is that the P-STD buffer Bn may underflow when the low_delay flag in the video
sequence header is set to '1' (refer to 2.4.2.6) or when trick_mode status is true (refer to 2.4.3.8).
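The buffer condition above can be checked offline with a simple event sweep. This is a hypothetical sketch: it aggregates bytes per event, applies arrivals before removals when timestamps tie (an assumption, since the text does not order simultaneous events), checks fullness only, and does not model the low_delay or trick_mode exceptions. `buffer_never_violates` is an illustrative name.

```python
from fractions import Fraction
from typing import Iterable, Tuple

Event = Tuple[Fraction, int]  # (time in seconds, number of bytes)


def buffer_never_violates(arrivals: Iterable[Event], removals: Iterable[Event], bs_n: int) -> bool:
    """Check 0 <= F_n(t) <= BS_n for a single P-STD buffer B_n.

    arrivals -- bytes entering B_n at time t(i)
    removals -- access units (plus preceding stuffing) removed at td_n(j)
    """
    # Sort key (time, kind): kind 0 = arrival, 1 = removal, so arrivals win ties.
    events = [(t, 0, n) for t, n in arrivals] + [(t, 1, -n) for t, n in removals]
    fullness = 0  # F_n(t) is zero before the first byte arrives
    for _, _, delta in sorted(events):
        fullness += delta
        if fullness < 0 or fullness > bs_n:
            return False
    return True
```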
For all Program Streams, the delay caused by system target decoder input buffering shall be less than or equal to one
second except for still picture video data and ISO/IEC 14496 streams. The input buffering delay is the difference in
time between a byte entering the input buffer and when it is decoded.
Specifically, in the case of no still picture video data and no ISO/IEC 14496 stream, the delay is constrained by:
$$td_n(j) - t(i) \le 1\ \text{s}$$
in the case of still picture video data, the delay is constrained by:
$$td_n(j) - t(i) \le 60\ \text{s}$$
in the case of ISO/IEC 14496 streams, the delay is constrained by:
$$td_n(j) - t(i) \le 10\ \text{s}$$
for all bytes i contained in access unit j.
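The three delay limits can be expressed as a lookup. The category keys (`"default"`, `"still_picture"`, `"iso_14496"`) and the function `input_delay_ok` are hypothetical; the caller is assumed to decide which of the cases above applies to the stream being checked.

```python
from fractions import Fraction
from typing import Iterable

# Upper bounds on td_n(j) - t(i), in seconds, for the cases listed above.
MAX_INPUT_DELAY = {
    "default": 1,         # no still picture video data, no ISO/IEC 14496 stream
    "still_picture": 60,  # still picture video data
    "iso_14496": 10,      # ISO/IEC 14496 streams
}


def input_delay_ok(td: Fraction, byte_arrival_times: Iterable[Fraction], kind: str = "default") -> bool:
    """True if no byte of access unit j entered B_n more than the allowed time before td_n(j)."""
    limit = MAX_INPUT_DELAY[kind]
    return all(td - t <= limit for t in byte_arrival_times)
```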
For Program Streams, all bytes of each pack shall enter the P-STD before any byte of a subsequent pack.
When the low_delay flag in the video sequence extension is set to '1' (refer to 6.2.2.3 of ITU-T Rec. H.262 |
ISO/IEC 13818-2), the VBV buffer may underflow. In this case when the P-STD elementary stream buffer Bn is
examined at the time specified by tdn(j), the complete data for the access unit may not be present in the buffer Bn. When
this case arises, the buffer shall be re-examined at intervals of two field-periods until the data for the complete access
unit is present in the buffer. At this time the entire access unit shall be removed from buffer Bn instantaneously.
VBV buffer underflow is allowed to occur continuously without limit. The P-STD decoder shall remove access unit
data from buffer Bn at the earliest time consistent with the paragraph above and any DTS or PTS values encoded in the
bitstream. The decoder may be unable to re-establish correct decoding and display times as indicated by DTS and PTS
until the VBV buffer underflow situation ceases and a PTS or DTS is found in the bitstream.
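The re-examination rule for low_delay streams amounts to rounding the removal time up to the next multiple of two field periods after tdn(j). A sketch, assuming the time at which the access unit becomes complete in Bn is already known; `low_delay_removal_time` is a hypothetical name.

```python
from fractions import Fraction
import math


def low_delay_removal_time(td: Fraction, complete_at: Fraction, field_period: Fraction) -> Fraction:
    """Time at which access unit j is removed from B_n when low_delay is '1'.

    td           -- decoding time td_n(j) from DTS/PTS
    complete_at  -- time at which the last byte of the access unit enters B_n
    field_period -- duration of one video field, in seconds
    """
    if complete_at <= td:
        return td  # no underflow: removal at the nominal decoding time
    step = 2 * field_period  # the buffer is re-examined every two field periods
    k = math.ceil((complete_at - td) / step)  # re-examinations needed
    return td + k * step
```

For a 25 Hz interlaced stream (field period 1/50 s), an access unit completed 0.05 s after tdn(j) is removed at tdn(j) + 0.08 s, the second re-examination.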
2.5.2.4
PES streams
It is possible to construct a stream of data as a contiguous stream of PES packets each containing data of the same
elementary stream and with the same stream_id. Such a stream is called a PES stream. The PES-STD model for a PES
stream is identical to that for the Program Stream, with the exception that the Elementary Stream Clock Reference
(ESCR) is used in place of the SCR, and ES_rate in place of program_mux_rate. The demultiplexor sends data to only
one elementary stream buffer.
Buffer sizes BSn in the PES-STD model are defined as follows:
– For ITU-T Rec. H.262 | ISO/IEC 13818-2 video:
BSn = VBVmax[profile, level] + BSoh
BSoh = (1/750) seconds × Rmax[profile, level], where VBVmax[profile, level] and Rmax[profile, level] are the maximum VBV size and bit rate per profile, level, and layer as defined in Tables 8-14 and 8-13, respectively, of ITU-T Rec. H.262 | ISO/IEC 13818-2. BSoh is allocated for PES packet header overhead.
– For ISO/IEC 11172-2 video:
BSn = VBVmax + BSoh
BSoh = (1/750) seconds × Rmax, where Rmax and VBVmax refer to the maximum bitrate and the maximum vbv_buffer_size, respectively, for a constrained parameter bitstream in ISO/IEC 11172-2.
– For ISO/IEC 11172-3 or ISO/IEC 13818-3 audio:
BSn = 2848 bytes
– For ITU-T Rec. H.264 | ISO/IEC 14496-10 video:
BSn = 1200 × MaxCPB[level] + BSoh
where MaxCPB[level] is defined in Table A.1 (Level Limits) in ITU-T Rec. H.264 | ISO/IEC 14496-10 for each level.
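The buffer size formulas reduce to simple arithmetic. The sketch below assumes VBVmax is given in bits and Rmax in bits per second, so that BSoh and BSn come out in bits, while the audio value stays in bytes as written above. For AVC this clause does not define BSoh, so `bs_n_avc_bits` takes it as an argument instead of guessing. All function names are hypothetical.

```python
from fractions import Fraction

AUDIO_BS_N_BYTES = 2848  # ISO/IEC 11172-3 or ISO/IEC 13818-3 audio, in bytes


def bs_oh_bits(r_max_bps: int) -> Fraction:
    """PES header overhead BS_oh = (1/750) s x R_max, in bits."""
    return Fraction(r_max_bps, 750)


def bs_n_mpeg_video_bits(vbv_max_bits: int, r_max_bps: int) -> Fraction:
    """BS_n for H.262 | 13818-2 video (per profile/level) or constrained 11172-2 video."""
    return vbv_max_bits + bs_oh_bits(r_max_bps)


def bs_n_avc_bits(max_cpb: int, bs_oh: Fraction) -> Fraction:
    """BS_n = 1200 x MaxCPB[level] + BS_oh for H.264 | 14496-10 video.

    BS_oh is supplied by the caller because this clause does not define it for AVC.
    """
    return 1200 * max_cpb + bs_oh
```

For instance, VBVmax = 1 835 008 bits with Rmax = 15 000 000 bit/s gives BSoh = 20 000 bits and BSn = 1 855 008 bits.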
2.5.2.5
Decoding and presentation
Decoding and presentation in the Program Stream system target decoder are the same as defined for the Transport
Stream system target decoder in 2.4.2.4 and 2.4.2.5 respectively.
2.5.2.6
P-STD extensions for carriage of ISO/IEC 14496 data
For decoding of ISO/IEC 14496 data carried in a Program Stream the P-STD model is extended. For decoding of
individual ISO/IEC 14496 elementary streams in the P-STD see 2.11.2. Clause 2.11.3 defines P-STD extensions and
parameters for decoding of ISO/IEC 14496 scenes and associated streams.
2.5.2.7
P-STD extensions for carriage of ITU-T Rec. H.264 | ISO/IEC 14496-10 Video
For decoding of ITU-T Rec. H.264 | ISO/IEC 14496-10 video streams carried in a Program Stream in the P-STD model,
see 2.14.3.2.
2.5.3
Specification of the Program Stream syntax and semantics
The following syntax describes a stream of bytes.
2.5.3.1
Program Stream
See Table 2-37.
Table 2-37 – Program Stream
Syntax                                          No. of bits   Mnemonic
MPEG2_program_stream() {
    do {
        pack()
    } while (nextbits() == pack_start_code)
    MPEG_program_end_code                       32            bslbf
}
2.5.3.2
Semantic definition of fields in Program Stream
MPEG_program_end_code – The MPEG_program_end_code is the bit string '0000 0000 0000 0000 0000 0001 1011
1001' (0x000001B9). It terminates the Program Stream.
2.5.3.3
Pack layer of Program Stream
See Tables 2-38 and 2-39.
Table 2-38 – Program Stream pack
Syntax                                          No. of bits   Mnemonic
pack() {
    pack_header()
    while (nextbits() == packet_start_code_prefix) {
        PES_packet()
    }
}
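The nesting of Tables 2-37 and 2-38 suggests a straightforward walk over a Program Stream held in memory. The sketch below is hypothetical and makes several assumptions: it handles only the pack header of Table 2-39 (14 bytes plus stuffing), skips an optional system header using its header_length, and treats a start code as a PES packet only when the byte after 0x000001 is at least 0xBC. That last check matters because the pack, system header and end codes also begin with the 24-bit packet_start_code_prefix, so a literal nextbits() comparison alone would not stop at the next pack.

```python
from typing import List, Tuple

PACK_START_CODE = b"\x00\x00\x01\xba"
SYSTEM_HEADER_START_CODE = b"\x00\x00\x01\xbb"
MPEG_PROGRAM_END_CODE = b"\x00\x00\x01\xb9"
PACKET_START_CODE_PREFIX = b"\x00\x00\x01"


def walk_program_stream(data: bytes) -> List[List[Tuple[int, int, int]]]:
    """Return, per pack, a list of (stream_id, offset, total_length) for each PES packet."""
    packs = []
    pos = 0
    while True:
        code = data[pos:pos + 4]
        if code == MPEG_PROGRAM_END_CODE:
            return packs
        if code != PACK_START_CODE or pos + 14 > len(data):
            raise ValueError(f"expected pack_start_code at offset {pos}")
        pos += 14 + (data[pos + 13] & 0x07)  # fixed pack header plus pack_stuffing_length bytes
        if data[pos:pos + 4] == SYSTEM_HEADER_START_CODE:
            pos += 6 + int.from_bytes(data[pos + 4:pos + 6], "big")  # skip system_header()
        packets = []
        while (pos + 6 <= len(data) and data[pos:pos + 3] == PACKET_START_CODE_PREFIX
               and data[pos + 3] >= 0xBC):
            length = 6 + int.from_bytes(data[pos + 4:pos + 6], "big")  # header + PES_packet_length
            packets.append((data[pos + 3], pos, length))
            pos += length
        packs.append(packets)
```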
Table 2-39 – Program Stream pack header
Syntax                                          No. of bits   Mnemonic
pack_header() {
    pack_start_code                             32            bslbf
    '01'                                        2             bslbf
    system_clock_reference_base [32..30]        3             bslbf
    marker_bit                                  1             bslbf
    system_clock_reference_base [29..15]        15            bslbf
    marker_bit                                  1             bslbf
    system_clock_reference_base [14..0]         15            bslbf
    marker_bit                                  1             bslbf
    system_clock_reference_extension            9             uimsbf
    marker_bit                                  1             bslbf
    program_mux_rate                            22            uimsbf
    marker_bit                                  1             bslbf
    marker_bit                                  1             bslbf
    reserved                                    5             bslbf
    pack_stuffing_length                        3             uimsbf
    for (i = 0; i < pack_stuffing_length; i++) {
        stuffing_byte                           8             bslbf
    }
    if (nextbits() == system_header_start_code) {
        system_header()
    }
}
2.5.3.4
Semantic definition of fields in program stream pack
pack_start_code – The pack_start_code is the bit string '0000 0000 0000 0000 0000 0001 1011 1010' (0x000001BA).
It identifies the beginning of a pack.
system_clock_reference_base; system_clock_reference_extension – The system clock reference (SCR) is a 42-bit
field coded in two parts. The first part, system_clock_reference_base, is a 33-bit field whose value is given by
SCR_base(i) as given in equation 2-19. The second part, system_clock_reference_extension, is a 9-bit field whose value
is given by SCR_ext(i), as given in equation 2-20. The SCR indicates the intended time of arrival of the byte containing
the last bit of the system_clock_reference_base at the input of the program target decoder.
The requirements on how frequently the SCR field shall be coded are given in 2.7.1.
marker_bit – A marker_bit is a 1-bit field that has the value '1'.
program_mux_rate – This is a 22-bit integer specifying the rate at which the P-STD receives the Program Stream
during the pack in which it is included. The value of program_mux_rate is measured in units of 50 bytes/second. The
value '0' is forbidden. The value represented in program_mux_rate is used to define the time of arrival of bytes at the
input to the P-STD in 2.5.2. The value encoded in the program_mux_rate field may vary from pack to pack in an ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 program multiplexed stream.
pack_stuffing_length – A 3-bit integer specifying the number of stuffing bytes which follow this field.
stuffing_byte – This is a fixed 8-bit value equal to '1111 1111' that can be inserted by the encoder, for example to meet
the requirements of the channel. It is discarded by the decoder. In each pack header no more than 7 stuffing bytes shall
be present.
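The pack header bit layout of Table 2-39 can be packed and unpacked with a single field table. The following Python sketch is illustrative only: `PACK_HEADER_LAYOUT`, `parse_pack_header` and `build_pack_header` are hypothetical names, and the builder sets the 5 reserved bits to '1', which is an assumption since this clause does not state their value.

```python
from dataclasses import dataclass

# (field name, width in bits) for the fixed 14-byte part of Table 2-39: 112 bits.
PACK_HEADER_LAYOUT = [
    ("pack_start_code", 32), ("fixed_01", 2),
    ("scr_base_32_30", 3), ("marker", 1),
    ("scr_base_29_15", 15), ("marker", 1),
    ("scr_base_14_0", 15), ("marker", 1),
    ("scr_ext", 9), ("marker", 1),
    ("program_mux_rate", 22), ("marker", 1), ("marker", 1),
    ("reserved", 5), ("pack_stuffing_length", 3),
]


@dataclass
class PackHeader:
    scr_base: int
    scr_ext: int
    program_mux_rate: int
    header_length: int  # 14 + pack_stuffing_length (any system header not included)

    @property
    def scr(self) -> int:
        """SCR in 27 MHz units (equation 2-18)."""
        return self.scr_base * 300 + self.scr_ext


def parse_pack_header(data: bytes) -> PackHeader:
    if len(data) < 14:
        raise ValueError("pack header needs at least 14 bytes")
    word, shift, f = int.from_bytes(data[:14], "big"), 112, {}
    for name, width in PACK_HEADER_LAYOUT:
        shift -= width
        value = (word >> shift) & ((1 << width) - 1)
        if name == "marker" and value != 1:
            raise ValueError("marker_bit is not '1'")
        f[name] = value
    if f["pack_start_code"] != 0x000001BA or f["fixed_01"] != 0b01:
        raise ValueError("not an ITU-T H.222.0 | ISO/IEC 13818-1 pack header")
    if f["program_mux_rate"] == 0:
        raise ValueError("program_mux_rate value 0 is forbidden")
    base = (f["scr_base_32_30"] << 30) | (f["scr_base_29_15"] << 15) | f["scr_base_14_0"]
    return PackHeader(base, f["scr_ext"], f["program_mux_rate"], 14 + f["pack_stuffing_length"])


def build_pack_header(scr_base: int, scr_ext: int, program_mux_rate: int, stuffing: int = 0) -> bytes:
    if not 0 <= stuffing <= 7:
        raise ValueError("at most 7 stuffing bytes are allowed")
    values = {
        "pack_start_code": 0x000001BA, "fixed_01": 0b01, "marker": 1,
        "scr_base_32_30": scr_base >> 30, "scr_base_29_15": scr_base >> 15,
        "scr_base_14_0": scr_base, "scr_ext": scr_ext,
        "program_mux_rate": program_mux_rate, "reserved": 0b11111,
        "pack_stuffing_length": stuffing,
    }
    word = 0
    for name, width in PACK_HEADER_LAYOUT:
        word = (word << width) | (values[name] & ((1 << width) - 1))  # mask to the field width
    return word.to_bytes(14, "big") + b"\xff" * stuffing
```

A header with SCR = 0 and program_mux_rate = 25200 encodes as 00 00 01 BA 44 00 04 00 04 01 01 89 C3 F8; the marker bits account for the recurring 0x04 and 0x01 bytes.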
2.5.3.5
System header
See Table 2-40.
Table 2-40 – Program Stream system header
Syntax                                          No. of bits   Mnemonic
system_header() {
    system_header_start_code                    32            bslbf
    header_length                               16            uimsbf
    marker_bit                                  1             bslbf
    rate_bound                                  22            uimsbf
    marker_bit                                  1             bslbf
    audio_bound                                 6             uimsbf
    fixed_flag                                  1             bslbf
    CSPS_flag                                   1             bslbf
    system_audio_lock_flag                      1             bslbf
    system_video_lock_flag                      1             bslbf
    marker_bit                                  1             bslbf
    video_bound                                 5             uimsbf
    packet_rate_restriction_flag                1             bslbf
    reserved_bits                               7             bslbf
    while (nextbits() == '1') {
        stream_id                               8             uimsbf
        '11'                                    2             bslbf
        P-STD_buffer_bound_scale                1             bslbf
        P-STD_buffer_size_bound                 13            uimsbf
    }
}
2.5.3.6
Semantic definition of fields in system header
system_header_start_code – The system_header_start_code is the bit string '0000 0000 0000 0000 0000 0001 1011
1011' (0x000001BB). It identifies the beginning of a system header.
header_length – This 16-bit field indicates the length in bytes of the system header following the header_length field.
Future extensions of this Specification may extend the system header.
rate_bound – A 22-bit field. The rate_bound is an integer value greater than or equal to the maximum value of the
program_mux_rate field coded in any pack of the Program Stream. It may be used by a decoder to assess whether it is
capable of decoding the entire stream.
audio_bound – A 6-bit field. The audio_bound is an integer in the inclusive range from 0 to 32 and is set to a value
greater than or equal to the maximum number of ISO/IEC 13818-3 and ISO/IEC 11172-3 audio streams in the Program
Stream for which the decoding processes are simultaneously active. For the purpose of this subclause, the decoding
process of an ISO/IEC 13818-3 or ISO/IEC 11172-3 audio stream is active if the STD buffer is not empty or if a
Presentation Unit is being presented in the P-STD model.
fixed_flag – The fixed_flag is a 1-bit flag. When set to '1' fixed bitrate operation is indicated. When set to '0' variable
bitrate operation is indicated. During fixed bitrate operation, the value encoded in all system_clock_reference fields in
the multiplexed ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream shall adhere to the following linear equation:
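$$\text{SCR\_base}(i) = \big((c_1 \times i + c_2)\ \text{DIV}\ 300\big)\ \%\ 2^{33} \qquad (2\text{-}22)$$
$$\text{SCR\_ext}(i) = \big((c_1 \times i + c_2)\ \text{DIV}\ 1\big)\ \%\ 300 \qquad (2\text{-}23)$$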
where:
c1 is a real-valued constant valid for all i.
c2 is a real-valued constant valid for all i.
i is the index in the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 multiplexed stream of the byte
containing the final bit of any system_clock_reference field in the stream.
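Under fixed_flag = '1' the SCR is a linear function of byte position. A hypothetical sketch of equations 2-22 and 2-23 follows. The helper `c1_for_mux_rate` reflects a reading of equation 2-21: if every pack carries the same program_mux_rate, each byte advances the 27 MHz clock by 27 000 000 / (50 × program_mux_rate) ticks. That link is an inference, not a statement of this clause.

```python
from fractions import Fraction
from typing import Tuple
import math


def fixed_rate_scr_fields(i: int, c1: Fraction, c2: Fraction) -> Tuple[int, int]:
    """SCR_base(i) and SCR_ext(i) under fixed_flag = '1' (equations 2-22 and 2-23)."""
    ticks = math.floor(c1 * i + c2)  # (c1 x i + c2) DIV 1
    return (ticks // 300) % (1 << 33), ticks % 300


def c1_for_mux_rate(program_mux_rate: int) -> Fraction:
    """27 MHz ticks per byte at a constant program_mux_rate (inferred from equation 2-21)."""
    return Fraction(27_000_000, 50 * program_mux_rate)
```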
CSPS_flag – The CSPS_flag is a 1-bit field. If its value is set to '1' the Program Stream meets the constraints defined
in 2.7.9.
system_audio_lock_flag – The system_audio_lock_flag is a 1-bit field indicating that there is a specified, constant
rational relationship between the audio sampling rate and the system_clock_frequency in the system target decoder. The
system_clock_frequency is defined in 2.5.2.1 and the audio sampling rate is specified in ISO/IEC 13818-3. The
system_audio_lock_flag may only be set to '1' if, for all presentation units in all audio elementary streams in the
Program Stream, the ratio of system_clock_frequency to the actual audio sampling rate, SCASR, is constant and equal
to the value indicated in the following table at the nominal sampling rate indicated in the audio stream.
$$\text{SCASR} = \frac{\text{system\_clock\_frequency}}{\text{audio\_sample\_rate\_in\_the\_P-STD}} \qquad (2\text{-}24)$$
The notation $\frac{X}{Y}$ denotes real division.

Nominal audio sampling frequency (kHz)   SCASR
16                                        27 000 000 / 16 000
32                                        27 000 000 / 32 000
22.05                                     27 000 000 / 22 050
44.1                                      27 000 000 / 44 100
24                                        27 000 000 / 24 000
48                                        27 000 000 / 48 000
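Because SCASR is a ratio of integers, it can be held exactly as a `Fraction`. The names below are hypothetical, and `audio_locked` is an illustrative check of whether a given number of samples spanned exactly the number of 27 MHz ticks that a locked sampling clock would imply.

```python
from fractions import Fraction

SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz

# Nominal audio sampling rates (Hz) from the table above.
NOMINAL_AUDIO_RATES_HZ = [16_000, 32_000, 22_050, 44_100, 24_000, 48_000]


def scasr(nominal_rate_hz: int) -> Fraction:
    """Exact SCASR = system_clock_frequency / audio sampling rate (equation 2-24)."""
    return Fraction(SYSTEM_CLOCK_FREQUENCY, nominal_rate_hz)


def audio_locked(samples: int, ticks_27mhz: int, nominal_rate_hz: int) -> bool:
    """True if `samples` audio samples spanned exactly samples x SCASR ticks."""
    return ticks_27mhz == samples * scasr(nominal_rate_hz)
```

At 48 kHz, SCASR = 1125/2, so 1152 samples span exactly 648 000 ticks; at 44.1 kHz, SCASR = 30000/49, so only multiples of 49 samples line up with whole ticks.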
system_video_lock_flag – The system_video_lock_flag is a 1-bit field indicating that there is a specified, constant
rational relationship between the video time base and the system clock frequency in the system target decoder. The
system_video_lock_flag may only be set to '1' if, for all presentation units in all video elementary streams in the ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 program, the ratio of system_clock_frequency to the frequency of the actual video
time base is constant.
For ISO/IEC 11172-2 and ITU-T Rec. H.262 | ISO/IEC 13818-2 video streams, if the system_video_lock_flag is set to
'1', then the ratio of system_clock_frequency to the actual video frame rate, SCFR, shall be constant and equal to the
value indicated in the following table at the nominal frame rate indicated in the video stream.
For ISO/IEC 14496-2 video streams, if the system_video_lock_flag is set to '1', then the time base of the
ISO/IEC 14496-2 video stream, as defined by vop_time_increment_resolution, shall be locked to the STC and shall be
exactly equal to N times system_clock_frequency divided by K, with N and K integers that have a fixed value within
each visual object sequence, with K greater than or equal to N.
For ITU-T Rec. H.264 | ISO/IEC 14496-10 video streams, the frequency of the AVC time base is defined by the AVC
parameter time_scale. If the system_video_lock_flag is set to '1' for an AVC video stream, then the frequency of the
AVC time base shall be locked to the STC and shall be exactly equal to N times system_clock_frequency divided by K,
with N and K integers that have a fixed value within each AVC video sequence, with K greater than or equal to N.
$$\text{SCFR} = \frac{\text{system\_clock\_frequency}}{\text{frame\_rate\_in\_the\_P-STD}} \qquad (2\text{-}25)$$

Nominal frame rate (Hz)   SCFR
23.976                    1 126 125
24                        1 125 000
25                        1 080 000
29.97                     900 900
30                        900 000
50                        540 000
59.94                     450 450
60                        450 000
The values of the ratio SCFR are exact. The actual frame rate differs slightly from the nominal rate in cases where the
nominal rate is 23.976, 29.97, or 59.94 frames per second.
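Inverting equation 2-25 gives the exact frame rate implied by each SCFR entry, which makes the remark about nominal rates concrete. The dictionary and function names are hypothetical.

```python
from fractions import Fraction

SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz

# SCFR values from the table above, keyed by nominal frame rate.
SCFR = {
    "23.976": 1_126_125, "24": 1_125_000, "25": 1_080_000, "29.97": 900_900,
    "30": 900_000, "50": 540_000, "59.94": 450_450, "60": 450_000,
}


def actual_frame_rate(nominal: str) -> Fraction:
    """Exact frame rate in the P-STD when system_video_lock_flag is '1'."""
    return Fraction(SYSTEM_CLOCK_FREQUENCY, SCFR[nominal])
```

The results show that the nominal 23.976, 29.97 and 59.94 rates are exactly 24000/1001, 30000/1001 and 60000/1001 frames per second.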
video_bound – The video_bound is a 5-bit integer in the inclusive range from 0 to 16 and is set to a value greater than
or equal to the maximum number of video streams in the Program Stream of which the decoding processes are
simultaneously active. For the purpose of this subclause, the decoding process of a video stream is active if one of the
buffers in the P-STD model is not empty, or if a Presentation Unit is being presented in the P-STD model.
packet_rate_restriction_flag – The packet_rate_restriction_flag is a 1-bit flag. If the CSPS_flag is set to '1', the packet_rate_restriction_flag indicates which constraint is applicable to the packet rate, as specified in 2.7.9. If the CSPS_flag is set to '0', the meaning of the packet_rate_restriction_flag is undefined.
reserved_bits – This 7-bit field is reserved for future use by ISO/IEC. Until otherwise specified by ITU-T | ISO/IEC it
shall have the value '111 1111'.
stream_id – The stream_id is an 8-bit field that indicates the coding and elementary stream number of the stream to
which the following P-STD_buffer_bound_scale and P-STD_buffer_size_bound fields refer.
If stream_id equals '1011 1000' the P-STD_buffer_bound_scale and P-STD_buffer_size_bound fields following the
stream_id refer to all audio streams in the Program Stream.
If stream_id equals '1011 1001' the P-STD_buffer_bound_scale and P-STD_buffer_size_bound fields following the
stream_id refer to all video streams in the Program Stream.
If the stream_id takes on any other value it shall be a byte value greater than or equal to '1011 1100' and shall be
interpreted as referring to the stream coding and elementary stream number according to Table 2-22.
Each elementary stream present in the Program Stream shall have its P-STD_buffer_bound_scale and
P-STD_buffer_size_bound specified exactly once by this mechanism in each system header.
P-STD_buffer_bound_scale – The P-STD_buffer_bound_scale is a 1-bit field that indicates the scaling factor used to
interpret the subsequent P-STD_buffer_size_bound field. If the preceding stream_id indicates an audio stream,
P-STD_buffer_bound_scale shall have the value '0'. If the preceding stream_id indicates a video stream,
P-STD_buffer_bound_scale shall have the value '1'. For all other stream types, the value of the
P-STD_buffer_bound_scale may be either '1' or '0'.
P-STD_buffer_size_bound – The P-STD_buffer_size_bound is a 13-bit unsigned integer defining a value greater than
or equal to the maximum P-STD input buffer size, BSn, over all packets for stream n in the Program Stream. If
P-STD_buffer_bound_scale has the value '0', then P-STD_buffer_size_bound measures the buffer size bound in units of
128 bytes. If P-STD_buffer_bound_scale has the value '1', then P-STD_buffer_size_bound measures the buffer size
bound in units of 1024 bytes. Thus:
if (P-STD_buffer_bound_scale == 0)
    BSn ≤ P-STD_buffer_size_bound × 128
else
    BSn ≤ P-STD_buffer_size_bound × 1024
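A hypothetical system header parser that applies the scaling rule above to each stream entry. It assumes the caller passes bytes starting at system_header_start_code, reads the 48 fixed bits after header_length by position, and ignores any bytes that follow the stream loop within header_length, since future extensions may add them. `SystemHeader` and `parse_system_header` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SystemHeader:
    rate_bound: int
    audio_bound: int
    video_bound: int
    fixed_flag: int
    csps_flag: int
    buffer_bounds: Dict[int, int] = field(default_factory=dict)  # stream_id -> bound on BS_n, bytes


def parse_system_header(data: bytes) -> SystemHeader:
    if data[:4] != b"\x00\x00\x01\xbb":
        raise ValueError("missing system_header_start_code")
    header_length = int.from_bytes(data[4:6], "big")
    body = data[6:6 + header_length]
    if len(body) != header_length or header_length < 6:
        raise ValueError("truncated system header")
    w = int.from_bytes(body[:6], "big")  # marker_bit through reserved_bits: 48 bits
    hdr = SystemHeader(
        rate_bound=(w >> 25) & 0x3FFFFF,
        audio_bound=(w >> 18) & 0x3F,
        video_bound=(w >> 8) & 0x1F,
        fixed_flag=(w >> 17) & 1,
        csps_flag=(w >> 16) & 1,
    )
    pos = 6
    while pos + 3 <= len(body) and body[pos] & 0x80:  # nextbits() == '1'
        stream_id = body[pos]
        v = int.from_bytes(body[pos + 1:pos + 3], "big")  # '11' + scale + size_bound
        scale, size_bound = (v >> 13) & 1, v & 0x1FFF
        hdr.buffer_bounds[stream_id] = size_bound * (1024 if scale else 128)
        pos += 3
    return hdr
```

In the test vector used here, the all-video entry (stream_id 0xB9) has scale '1' and bound 232, giving 237 568 bytes, and the all-audio entry (0xB8) has scale '0' and bound 32, giving 4096 bytes.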
2.5.3.7
Packet layer of Program Stream
The packet layer of the Program Stream is defined by the PES packet layer in 2.4.3.6.
2.5.4
Program Stream map
The Program Stream Map (PSM) provides a description of the elementary streams in the Program Stream and their
relationship to one another. When carried in a Transport Stream this structure shall not be modified. The PSM is present
as a PES packet when the stream_id value is 0xBC (refer to Table 2-22).
NOTE – This syntax differs from the PES packet syntax described in 2.4.3.6.
Definition for the descriptor() fields may be found in 2.6.
2.5.4.1
Syntax of Program Stream map
See Table 2-41.
Table 2-41 – Program Stream map
Syntax                                          No. of bits   Mnemonic
program_stream_map() {
    packet_start_code_prefix                    24            bslbf
    map_stream_id                               8             uimsbf
    program_stream_map_length                   16            uimsbf
    current_next_indicator                      1             bslbf
    reserved                                    2             bslbf
    program_stream_map_version                  5             uimsbf
    reserved                                    7             bslbf
    marker_bit                                  1             bslbf
    program_stream_info_length                  16            uimsbf
    for (i = 0; i < N; i++) {
        descriptor()
    }
    elementary_stream_map_length                16            uimsbf
    for (i = 0; i < N1; i++) {
        stream_type                             8             uimsbf
        elementary_stream_id                    8             uimsbf
        elementary_stream_info_length           16            uimsbf
        for (i = 0; i < N2; i++) {
            descriptor()
        }
    }
    CRC_32                                      32            rpchof
}
2.5.4.2
Semantic definition of fields in Program Stream map
packet_start_code_prefix – The packet_start_code_prefix is a 24-bit code. Together with the map_stream_id that
follows it constitutes a packet start code that identifies the beginning of a packet. The packet_start_code_prefix is the bit
string '0000 0000 0000 0000 0000 0001' (0x000001 in hexadecimal).
map_stream_id – This is an 8-bit field whose value shall be 0xBC.
program_stream_map_length – The program_stream_map_length is a 16-bit field indicating the total number of bytes
in the program_stream_map immediately following this field. The maximum value of this field is 1018 (0x3FA).
current_next_indicator – This is a 1-bit field which, when set to '1', indicates that the Program Stream Map sent is currently applicable. When the bit is set to '0', it indicates that the Program Stream Map sent is not yet applicable and shall be the next table to become valid.
program_stream_map_version – This 5-bit field is the version number of the whole Program Stream Map. The
version number shall be incremented by 1 modulo 32 whenever the definition of the Program Stream Map changes.
When the current_next_indicator is set to '1', then the program_stream_map_version shall be that of the currently
applicable Program Stream Map. When the current_next_indicator is set to '0', then the program_stream_map_version
shall be that of the next applicable Program Stream Map.
program_stream_info_length – The program_stream_info_length is a 16-bit field indicating the total length of the
descriptors immediately following this field.
marker_bit – A marker_bit is a 1-bit field that has the value '1'.
elementary_stream_map_length – This is a 16-bit field specifying the total length, in bytes, of all elementary stream
information in this program stream map. It includes the stream_type, elementary_stream_id, and
elementary_stream_info_length fields.
stream_type – This 8-bit field specifies the type of the stream according to Table 2-34. The stream_type field shall
only identify elementary streams contained in PES packets. A value of 0x05 is prohibited.
elementary_stream_id – The elementary_stream_id is an 8-bit field indicating the value of the stream_id field in the
PES packet headers of PES packets in which this elementary stream is stored.
elementary_stream_info_length – The elementary_stream_info_length is a 16-bit field indicating the length in bytes
of the descriptors immediately following this field.
CRC_32 – This is a 32-bit field that contains the CRC value that gives a zero output of the registers in the decoder
defined in Annex A after processing the entire program stream map.
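The Program Stream Map can be decoded from its length fields, and its CRC_32 can be checked by running the Annex A register over the whole map. The sketch is hypothetical in two respects: it assumes the Annex A decoder matches the common MSB-first CRC-32 with polynomial 0x04C11DB7, registers preset to all ones and no final inversion (often catalogued as CRC-32/MPEG-2), and it assumes the CRC covers the map from packet_start_code_prefix through CRC_32. The names `parse_program_stream_map`, `crc32_mpeg2` and `program_stream_map_crc_ok` are illustrative.

```python
from typing import List, NamedTuple


class EsEntry(NamedTuple):
    stream_type: int
    elementary_stream_id: int
    descriptors: bytes  # raw descriptor loop (elementary_stream_info_length bytes)


class ProgramStreamMap(NamedTuple):
    current_next_indicator: int
    program_stream_map_version: int
    program_descriptors: bytes
    streams: List[EsEntry]
    crc_32: int


def parse_program_stream_map(data: bytes) -> ProgramStreamMap:
    """Parse a program_stream_map() starting at packet_start_code_prefix (CRC not checked)."""

    def u16(pos: int) -> int:
        return int.from_bytes(data[pos:pos + 2], "big")

    if data[:4] != b"\x00\x00\x01\xbc":
        raise ValueError("not a program_stream_map packet")
    end = 6 + u16(4)  # program_stream_map_length counts the bytes after itself
    if len(data) < end:
        raise ValueError("truncated program_stream_map")
    current_next, version = data[6] >> 7, data[6] & 0x1F
    info_length = u16(8)
    program_descriptors = data[10:10 + info_length]
    pos = 10 + info_length
    es_end = pos + 2 + u16(pos)  # elementary_stream_map_length
    pos += 2
    streams = []
    while pos < es_end:
        es_info_length = u16(pos + 2)
        streams.append(EsEntry(data[pos], data[pos + 1], data[pos + 4:pos + 4 + es_info_length]))
        pos += 4 + es_info_length
    if pos != es_end or es_end + 4 != end:
        raise ValueError("inconsistent length fields")
    return ProgramStreamMap(current_next, version, program_descriptors, streams,
                            int.from_bytes(data[es_end:end], "big"))


def crc32_mpeg2(data: bytes) -> int:
    """MSB-first CRC-32, polynomial 0x04C11DB7, registers preset to all ones, no final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def program_stream_map_crc_ok(psm: bytes) -> bool:
    """Processing the map including its CRC_32 field leaves the registers at zero."""
    return crc32_mpeg2(psm) == 0
```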
2.5.5
Program Stream directory
The directory for an entire stream is made up of all the directory data carried by Program Stream Directory packets
identified with the directory_stream_id. The syntax for program_stream_directory packets is defined in Table 2-42.
NOTE 1 – This syntax differs from the PES packet syntax described in 2.4.3.6.
Directory entries may be required to reference I-pictures in a video stream as defined in ITU-T
Rec. H.262 | ISO/IEC 13818-2 and ISO/IEC 11172-2. If an I-picture that is referenced in a directory entry is preceded
by a sequence header with no intervening picture headers, the directory entry shall reference the first byte of the
sequence header. If an I-picture that is referenced in a directory entry is preceded by a group of pictures header with no
intervening picture headers and no immediately preceding sequence header, the directory entry shall reference the first
byte of the group of pictures header. Any other picture that a directory entry references shall be referenced by the first
byte of the picture header.
NOTE 2 – It is recommended that I-pictures immediately following a sequence header be referenced in directory structures so that the directory contains an entry at every point where the decoder may be reset completely.
Directory entries may be required to reference IDR pictures, or pictures associated with a recovery point SEI message, in an AVC video stream. Each such directory entry shall refer to the first byte of an AVC access unit.
Directory references to audio streams as defined in ISO/IEC 13818-3 and ISO/IEC 11172-3 shall point to the syncword of the audio frame.
NOTE 3 – It is recommended that the distance between referenced access units not exceed half a second.
Access units shall be referenced in a program_stream_directory packet in the same order that they appear in the
bitstream.
2.5.5.1
Syntax of Program Stream directory packet
See Table 2-42.
Table 2-42 – Program Stream directory packet
Syntax                                          No. of bits   Mnemonic
directory_PES_packet() {
    packet_start_code_prefix                    24            bslbf
    directory_stream_id                         8             uimsbf
    PES_packet_length                           16            uimsbf
    number_of_access_units                      15            uimsbf
    marker_bit                                  1             bslbf
    prev_directory_offset[44..30]               15            uimsbf
    marker_bit                                  1             bslbf
    prev_directory_offset[29..15]               15            uimsbf
    marker_bit                                  1             bslbf
    prev_directory_offset[14..0]                15            uimsbf
    marker_bit                                  1             bslbf
    next_directory_offset[44..30]               15            uimsbf
    marker_bit                                  1             bslbf
    next_directory_offset[29..15]               15            uimsbf
    marker_bit                                  1             bslbf
    next_directory_offset[14..0]                15            uimsbf
    marker_bit                                  1             bslbf
    for (i = 0; i < number_of_access_units; i++) {
        packet_stream_id                        8             uimsbf
        PES_header_position_offset_sign         1             tcimsbf
        PES_header_position_offset[43..30]      14            uimsbf
        marker_bit                              1             bslbf
        PES_header_position_offset[29..15]      15            uimsbf
        marker_bit                              1             bslbf
        PES_header_position_offset[14..0]       15            uimsbf
        marker_bit                              1             bslbf
        reference_offset                        16            uimsbf
        marker_bit                              1             bslbf
        reserved                                3             bslbf
        PTS[32..30]                             3             uimsbf
        marker_bit                              1             bslbf
        PTS[29..15]                             15            uimsbf
        marker_bit                              1             bslbf
        PTS[14..0]                              15            uimsbf
        marker_bit                              1             bslbf
        bytes_to_read[22..8]                    15            uimsbf
        marker_bit                              1             bslbf
        bytes_to_read[7..0]                     8             uimsbf
        marker_bit                              1             bslbf
        intra_coded_indicator                   1             bslbf
        coding_parameters_indicator             2             bslbf
        reserved                                4             bslbf
    }
}
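Adding up the bit widths in Table 2-42 gives a fixed 112 bits (14 bytes) after PES_packet_length plus 144 bits (18 bytes) per directory entry. The sketch below derives PES_packet_length from that arithmetic; it is a reading of the table rather than a formula stated in this clause, and the names are hypothetical.

```python
DIRECTORY_FIXED_BYTES = 14  # number_of_access_units through the last next_directory_offset marker
DIRECTORY_ENTRY_BYTES = 18  # one iteration of the access-unit loop in Table 2-42


def directory_pes_packet_length(number_of_access_units: int) -> int:
    """PES_packet_length implied by the bit counts in Table 2-42."""
    return DIRECTORY_FIXED_BYTES + DIRECTORY_ENTRY_BYTES * number_of_access_units


def max_access_units_per_directory_packet() -> int:
    """Largest entry count that still fits the 16-bit PES_packet_length field."""
    return (0xFFFF - DIRECTORY_FIXED_BYTES) // DIRECTORY_ENTRY_BYTES
```

On this reading, the 16-bit length field limits one directory packet to 3640 entries, well below the range of the 15-bit number_of_access_units field.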
2.5.5.2
Semantic definition of fields in Program Stream directory
packet_start_code_prefix – The packet_start_code_prefix is a 24-bit code. Together with the stream_id that follows, it
constitutes a packet start code that identifies the beginning of a packet. The packet_start_code_prefix is the bit string
'0000 0000 0000 0000 0000 0001' (0x000001 in hexadecimal).
directory_stream_id – This 8-bit field shall have a value '1111 1111' (0xFF).
PES_packet_length – The PES_packet_length is a 16-bit field indicating the total number of bytes in the
program_stream_directory immediately following this field (refer to Table 2-22).
number_of_access_units – This 15-bit field is the number of access units that are referenced in this Directory PES packet.
prev_directory_offset – This 45-bit unsigned integer gives the byte address offset of the first byte of the packet start code of the previous Program Stream Directory packet. This address offset is relative to the first byte of the start code of the packet which contains this prev_directory_offset field. The value '0' indicates that there is no previous Program Stream Directory packet.
next_directory_offset – This 45-bit unsigned integer gives the byte address offset of the first byte of the packet start
code of the next Program Stream Directory packet. This address offset is relative to the first byte of the start code of the
packet which contains this next_directory_offset field. The value '0' indicates that there is no next Program Stream
Directory packet.
packet_stream_id – This 8-bit field is the stream_id of the elementary stream that contains the access unit referenced
by this directory entry.
PES_header_position_offset_sign – This 1-bit field is the arithmetic sign for the PES_header_position_offset
described immediately following. A value of '0' indicates that the PES_header_position_offset is a positive offset. A
value of '1' indicates that the PES_header_position_offset is a negative offset.
PES_header_position_offset – This 44-bit unsigned integer gives the byte offset address of the first byte of the PES
packet containing the access unit referenced. The offset address is relative to the first byte of the start-code of the packet
containing this PES_header_position_offset field. The value '0' indicates that no access unit is referenced.
reference_offset – This 16-bit field is an unsigned integer indicating the position of the first byte of the referenced
access unit, measured in bytes relative to the first byte of the PES packet containing the first byte of the referenced
access unit.
PTS (presentation_time_stamp) – This 33-bit field is the PTS of the access unit that is referenced. The semantics of
the coding of the PTS field are as described in 2.4.3.6.
bytes_to_read – This 23-bit unsigned integer is the number of bytes in the Program Stream after the byte indicated by
reference_offset that are needed to decode the access unit completely. This value includes any bytes multiplexed at the
systems layer including those containing information from other streams.
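Locating a referenced access unit combines the split offset fields with the sign bit and reference_offset. A hypothetical sketch, assuming positions are byte offsets into the Program Stream and that the position of the directory packet's own start code is known; `join_offset`, `join_bytes_to_read` and `access_unit_position` are illustrative names.

```python
from typing import Optional


def join_offset(high: int, middle: int, low: int) -> int:
    """Reassemble a field split as [..30], [29..15], [14..0] around marker bits."""
    return (high << 30) | (middle << 15) | low


def join_bytes_to_read(high: int, low: int) -> int:
    """Reassemble bytes_to_read from its [22..8] and [7..0] parts."""
    return (high << 8) | low


def access_unit_position(directory_packet_pos: int, sign: int, pes_header_position_offset: int,
                         reference_offset: int) -> Optional[int]:
    """Absolute byte position of the referenced access unit, or None if none is referenced.

    directory_packet_pos -- position of the first start-code byte of this directory packet
    sign                 -- PES_header_position_offset_sign ('1' means a negative offset)
    """
    if pes_header_position_offset == 0:
        return None  # the value '0' indicates that no access unit is referenced
    delta = -pes_header_position_offset if sign else pes_header_position_offset
    return directory_packet_pos + delta + reference_offset
```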
intra_coded_indicator – This is a 1-bit flag. When set to '1' it indicates that the referenced access unit is not
predictively coded. This is independent of other coding parameters that might be needed to decode the access unit. For
example, this field shall be coded as '1' for video Intra frames, whereas for 'P' and 'B' frames this bit shall be coded as
'0'. For all PES packets containing data which is not from an ITU-T Rec. H.262 | ISO/IEC 13818-2 video stream, this
field is undefined (see Table 2-43).
Table 2-43 – Intra_coded indicator
Value   Meaning
0       Not Intra
1       Intra
coding_parameters_indicator – This 2-bit field is used to indicate the location of coding parameters that are needed to
decode the access units referenced. For example, this field can be used to determine the location of quantization
matrices for video frames.
Table 2-44 – Coding_parameters indicator
Value   Meaning
00      All coding parameters are set to their default values
01      All coding parameters are set in this access unit, at least one of them is not set to a default
10      Some coding parameters are set in this access unit
11      No coding parameters are coded in this access unit
2.6
Program and program element descriptors
Program and program element descriptors are structures which may be used to extend the definitions of programs and
program elements. All descriptors have a format which begins with an 8-bit tag value. The tag value is followed by an
8-bit descriptor length and data fields.
2.6.1
Semantic definition of fields in program and program element descriptors
The following semantics apply to the descriptors defined in 2.6.2 through 2.6.34.
descriptor_tag – The descriptor_tag is an 8-bit field which identifies each descriptor.
Table 2-45 provides the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 defined, ITU-T Rec. H.222.0 | ISO/IEC 13818-1
reserved, and user available descriptor tag values. An 'X' in the TS or PS columns indicates the applicability of the
descriptor to either the Transport Stream or Program Stream respectively. Note that the meaning of fields in a descriptor
may depend on which stream it is used in. Each case is specified in the descriptor semantics below.
descriptor_length – The descriptor_length is an 8-bit field specifying the number of bytes of the descriptor immediately following the descriptor_length field.
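Because every descriptor starts with the same two bytes, a descriptor loop can be walked without knowing individual descriptor formats. A minimal sketch; `iter_descriptors` is a hypothetical name, and unknown tags are simply passed through with their payloads.

```python
from typing import Iterator, Tuple


def iter_descriptors(loop: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (descriptor_tag, payload) for each descriptor in a descriptor loop."""
    pos = 0
    while pos < len(loop):
        if pos + 2 > len(loop):
            raise ValueError("truncated descriptor header")
        tag, length = loop[pos], loop[pos + 1]
        end = pos + 2 + length  # descriptor_length counts the bytes after itself
        if end > len(loop):
            raise ValueError("descriptor_length runs past the end of the loop")
        yield tag, loop[pos + 2:end]
        pos = end
```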
Table 2-45 – Program and program element descriptors
descriptor_tag   TS    PS    Identification
0                n/a   n/a   Reserved
1                n/a   n/a   Reserved
2                X     X     video_stream_descriptor
3                X     X     audio_stream_descriptor
4                X     X     hierarchy_descriptor
5                X     X     registration_descriptor
6                X     X     data_stream_alignment_descriptor
7                X     X     target_background_grid_descriptor
8                X     X     video_window_descriptor
9                X     X     CA_descriptor
10               X     X     ISO_639_language_descriptor
11               X     X     system_clock_descriptor
12               X     X     multiplex_buffer_utilization_descriptor
13               X     X     copyright_descriptor
14               X           maximum_bitrate_descriptor
15               X     X     private_data_indicator_descriptor
16               X     X     smoothing_buffer_descriptor
17               X           STD_descriptor
18               X     X     IBP_descriptor
19-26            X           Defined in ISO/IEC 13818-6
27               X     X     MPEG-4_video_descriptor
28               X     X     MPEG-4_audio_descriptor
29               X     X     IOD_descriptor
30               X           SL_descriptor
31               X     X     FMC_descriptor
32               X     X     external_ES_ID_descriptor
33               X     X     MuxCode_descriptor
34               X     X     FmxBufferSize_descriptor
35               X           multiplexbuffer_descriptor
36               X     X     content_labeling_descriptor
37               X     X     metadata_pointer_descriptor
38               X     X     metadata_descriptor
39               X     X     metadata_STD_descriptor
40               X     X     AVC video descriptor
41               X     X     IPMP_descriptor (defined in ISO/IEC 13818-11, MPEG-2 IPMP)
42               X     X     AVC timing and HRD descriptor
43               X     X     MPEG-2_AAC_audio_descriptor
44               X     X     FlexMuxTiming_descriptor
45-63            n/a   n/a   ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Reserved
64-255           n/a   n/a   User Private
2.6.2
Video stream descriptor
The video stream descriptor provides basic information which identifies the coding parameters of a video elementary
stream as described in ITU-T Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 11172-2 (see Table 2-46).
Table 2-46 – Video stream descriptor
Syntax                                          No. of bits   Mnemonic
video_stream_descriptor() {
    descriptor_tag                              8             uimsbf
    descriptor_length                           8             uimsbf
    multiple_frame_rate_flag                    1             bslbf
    frame_rate_code                             4             uimsbf
    MPEG_1_only_flag                            1             bslbf
    constrained_parameter_flag                  1             bslbf
    still_picture_flag                          1             bslbf
    if (MPEG_1_only_flag == '0') {
        profile_and_level_indication            8             uimsbf
        chroma_format                           2             uimsbf
        frame_rate_extension_flag               1             bslbf
        Reserved                                5             bslbf
    }
}
2.6.3
Semantic definitions of fields in video stream descriptor
multiple_frame_rate_flag – This 1-bit field when set to '1' indicates that multiple frame rates may be present in the
video stream. When set to a value of '0' only a single frame rate is present.
frame_rate_code – This is a 4-bit field as defined in 6.3.3 of ITU-T Rec. H.262 | ISO/IEC 13818-2, except that when
the multiple_frame_rate_flag is set to a value of '1' the indication of a particular frame rate also permits certain other
frame rates to be present in the video stream, as specified in Table 2-47:
Table 2-47 – Frame rate code
Coded as   Also includes
23.976     –
24.0       23.976
25.0       –
29.97      23.976
30.0       23.976, 24.0, 29.97
50.0       25.0
59.94      23.976, 29.97
60.0       23.976, 24.0, 29.97, 30.0, 59.94
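Table 2-47 can be read as a lookup from the coded frame rate to the set of rates a decoder should be prepared for. This sketch assumes the frame_rate_code values 1 to 8 of ITU-T Rec. H.262 | ISO/IEC 13818-2 (23.976 Hz to 60 Hz); permitted_frame_rates is a hypothetical name.

```python
# Hypothetical helper (not defined by this Specification).
# frame_rate_code -> nominal frame rate in Hz, assumed from H.262 | 13818-2.
FRAME_RATE_BY_CODE = {1: 23.976, 2: 24.0, 3: 25.0, 4: 29.97,
                      5: 30.0, 6: 50.0, 7: 59.94, 8: 60.0}

# "Also includes" column of Table 2-47, keyed by the coded rate.
ALSO_INCLUDES = {
    23.976: (),
    24.0: (23.976,),
    25.0: (),
    29.97: (23.976,),
    30.0: (23.976, 24.0, 29.97),
    50.0: (25.0,),
    59.94: (23.976, 29.97),
    60.0: (23.976, 24.0, 29.97, 30.0, 59.94),
}


def permitted_frame_rates(frame_rate_code: int, multiple_frame_rate_flag: int) -> set:
    """Frame rates that may be present in the video stream."""
    coded = FRAME_RATE_BY_CODE[frame_rate_code]
    if multiple_frame_rate_flag == 0:
        return {coded}  # only a single frame rate is present
    return {coded, *ALSO_INCLUDES[coded]}
```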
MPEG_1_only_flag – This is a 1-bit field which when set to '1' indicates that the video stream contains only
ISO/IEC 11172-2 data. If set to '0' the video stream may contain both ITU-T Rec. H.262 | ISO/IEC 13818-2 video data
and constrained parameter ISO/IEC 11172-2 video data.
constrained_parameter_flag – This is a 1-bit field which when set to '1' indicates that the video stream shall not
contain unconstrained ISO/IEC 11172-2 video data. If this field is set to '0' the video stream may contain both
constrained parameters and unconstrained ISO/IEC 11172-2 video streams. If the MPEG_1_only_flag is set to '0', the
constrained_parameter_flag shall be set to '1'.
still_picture_flag – This is a 1-bit field, which when set to '1' indicates that the video stream contains only still pictures.
If the bit is set to '0' then the video stream may contain either moving or still picture data.
profile_and_level_indication – This 8-bit field is coded in the same manner as the profile_and_level_indication fields
in the ITU-T Rec. H.262 | ISO/IEC 13818-2 video stream. The value of this field indicates a profile and level that is
equal to or higher than any profile and level in any sequence in the associated video stream. For the purposes of this
subclause, an ISO/IEC 11172-2 constrained parameter stream is considered to be a Main Profile at Low Level stream
(MP@LL).
chroma_format – This 2-bit field is coded in the same manner as the chroma_format fields in the ITU-T Rec. H.262 |
ISO/IEC 13818-2 video stream. The value of this field shall be equal to or higher than the value of the
chroma_format field in any video sequence of the associated video stream. For the purposes of this subclause, an
ISO/IEC 11172-2 video stream is considered to have a chroma_format field with the value '01', indicating 4:2:0.
frame_rate_extension_flag – This is a 1-bit flag which when set to '1' indicates that either or both the
frame_rate_extension_n and the frame_rate_extension_d fields are non-zero in any video sequences of the
ITU-T Rec. H.262 | ISO/IEC 13818-2 video stream. For the purposes of this subclause, an ISO/IEC 11172-2 video
stream is constrained to have both fields set to zero.
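A minimal Python sketch of reading the video stream descriptor of Table 2-46, assuming the input holds the whole descriptor starting at descriptor_tag. The function name is hypothetical, and reserved bits are simply ignored.

```python
# Hypothetical helper (not defined by this Specification).
def parse_video_stream_descriptor(desc: bytes) -> dict:
    """Parse a video_stream_descriptor, starting at descriptor_tag."""
    if len(desc) < 3 or desc[0] != 2:
        raise ValueError("not a video_stream_descriptor")
    b = desc[2]
    fields = {
        "multiple_frame_rate_flag": b >> 7,
        "frame_rate_code": (b >> 3) & 0x0F,
        "MPEG_1_only_flag": (b >> 2) & 0x01,
        "constrained_parameter_flag": (b >> 1) & 0x01,
        "still_picture_flag": b & 0x01,
    }
    if fields["MPEG_1_only_flag"] == 0:
        # Two further bytes are present only when MPEG_1_only_flag is '0'.
        if len(desc) < 5:
            raise ValueError("descriptor too short for the H.262 fields")
        if fields["constrained_parameter_flag"] != 1:
            raise ValueError("constrained_parameter_flag shall be '1' here")
        fields["profile_and_level_indication"] = desc[3]
        fields["chroma_format"] = desc[4] >> 6
        fields["frame_rate_extension_flag"] = (desc[4] >> 5) & 0x01
        # The low 5 bits of desc[4] are reserved.
    return fields
```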
2.6.4
Audio stream descriptor
The audio stream descriptor provides basic information which identifies the coding version of an audio elementary
stream as described in ISO/IEC 13818-3 or ISO/IEC 11172-3 (see Table 2-48).
Table 2-48 – Audio stream descriptor
Syntax                                      No. of bits   Mnemonic
audio_stream_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  free_format_flag                          1             bslbf
  ID                                        1             bslbf
  layer                                     2             bslbf
  variable_rate_audio_indicator             1             bslbf
  reserved                                  3             bslbf
}
2.6.5
Semantic definition of fields in audio stream descriptor
free_format_flag – This 1-bit field when set to '1' indicates that the audio stream may contain one or more audio
frames with the bitrate_index set to '0000'. If set to '0', then the bitrate_index is not '0000' (refer to 2.4.2.3 of ISO/IEC
13818-3) in any audio frame of the audio stream.
ID – This 1-bit field when set to '1' indicates that the ID field is set to '1' in each audio frame in the audio stream (refer
to 2.4.2.3 of ISO/IEC 13818-3).
layer – This 2-bit field is coded in the same manner as the layer field in the ISO/IEC 13818-3 or ISO/IEC 11172-3
audio streams (refer to 2.4.2.3 of ISO/IEC 13818-3). The layer indicated in this field shall be equal to or higher than the
highest layer specified in any audio frame of the audio stream.
variable_rate_audio_indicator – This 1-bit flag, when set to '0', indicates that the encoded value of the bit rate field
shall not change in consecutive audio frames which are intended to be presented without discontinuity.
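The audio stream descriptor packs its four fields into a single byte. The sketch below unpacks it; the layer-name mapping assumes the layer coding of ISO/IEC 11172-3 ('11' Layer I, '10' Layer II, '01' Layer III), and the function name is hypothetical.

```python
# Hypothetical helper (not defined by this Specification).
# Layer coding assumed from ISO/IEC 11172-3 / 13818-3.
LAYER_NAMES = {0b11: "Layer I", 0b10: "Layer II", 0b01: "Layer III"}


def parse_audio_stream_descriptor(desc: bytes) -> dict:
    """Parse an audio_stream_descriptor, starting at descriptor_tag."""
    if len(desc) < 3 or desc[0] != 3:
        raise ValueError("not an audio_stream_descriptor")
    b = desc[2]
    layer = (b >> 4) & 0x03
    return {
        "free_format_flag": b >> 7,
        "ID": (b >> 6) & 0x01,
        "layer": layer,
        "layer_name": LAYER_NAMES.get(layer, "reserved"),
        "variable_rate_audio_indicator": (b >> 3) & 0x01,
        # The low 3 bits are reserved and ignored.
    }
```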
2.6.6
Hierarchy descriptor
The hierarchy descriptor provides information to identify the program elements containing components of
hierarchically-coded video, audio, and private streams. (See Table 2-49.)
Table 2-49 – Hierarchy descriptor
Syntax                                      No. of bits   Mnemonic
hierarchy_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  reserved                                  4             bslbf
  hierarchy_type                            4             uimsbf
  reserved                                  2             bslbf
  hierarchy_layer_index                     6             uimsbf
  reserved                                  2             bslbf
  hierarchy_embedded_layer_index            6             uimsbf
  reserved                                  2             bslbf
  hierarchy_channel                         6             uimsbf
}
2.6.7
Semantic definition of fields in hierarchy descriptor
hierarchy_type – The hierarchical relation between the associated hierarchy layer and its hierarchy embedded layer is
defined in Table 2-50.
hierarchy_layer_index – The hierarchy_layer_index is a 6-bit field that defines a unique index of the associated
program element in a table of coding layer hierarchies. Indices shall be unique within a single program definition.
hierarchy_embedded_layer_index – The hierarchy_embedded_layer_index is a 6-bit field that defines the hierarchy
table index of the program element that needs to be accessed before decoding of the elementary stream associated with
this hierarchy_descriptor. This field is undefined if the hierarchy_type value is 15 (base layer).
hierarchy_channel – The hierarchy_channel is a 6-bit field that indicates the intended channel number for the
associated program element in an ordered set of transmission channels. The most robust transmission channel is defined
by the lowest value of this field with respect to the overall transmission hierarchy definition.
NOTE – A given hierarchy_channel may at the same time be assigned to several program elements.
Table 2-50 – Hierarchy_type field values
Value    Description
0        Reserved
1        Spatial Scalability
2        SNR Scalability
3        Temporal Scalability
4        Data partitioning
5        Extension bitstream
6        Private Stream
7        Multi-view Profile
8-14     Reserved
15       Base layer
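The hierarchy_embedded_layer_index links each enhancement layer to the program element that must be accessed first, so following the links down to the base layer gives a decoding order. The names below are hypothetical, and the sketch assumes the descriptors of all layers in the program have already been collected into a dictionary keyed by hierarchy_layer_index.

```python
# Hypothetical helpers (not defined by this Specification).
HIERARCHY_TYPES = {
    1: "Spatial Scalability", 2: "SNR Scalability", 3: "Temporal Scalability",
    4: "Data partitioning", 5: "Extension bitstream", 6: "Private Stream",
    7: "Multi-view Profile", 15: "Base layer",
}


def parse_hierarchy_descriptor(desc: bytes) -> dict:
    """Parse a hierarchy_descriptor, starting at descriptor_tag."""
    if len(desc) < 6 or desc[0] != 4:
        raise ValueError("not a hierarchy_descriptor")
    hierarchy_type = desc[2] & 0x0F  # upper 4 bits are reserved
    return {
        "hierarchy_type": hierarchy_type,
        "type_name": HIERARCHY_TYPES.get(hierarchy_type, "Reserved"),
        "hierarchy_layer_index": desc[3] & 0x3F,
        # Undefined for the base layer (hierarchy_type 15).
        "hierarchy_embedded_layer_index":
            None if hierarchy_type == 15 else desc[4] & 0x3F,
        "hierarchy_channel": desc[5] & 0x3F,
    }


def decode_chain(layers: dict, index: int) -> list:
    """Return layer indices in decoding order, base layer first."""
    chain, seen = [], set()
    while index is not None:
        if index in seen:
            raise ValueError("cyclic hierarchy")
        seen.add(index)
        chain.append(index)
        index = layers[index]["hierarchy_embedded_layer_index"]
    return chain[::-1]
```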
2.6.8
Registration descriptor
The registration_descriptor provides a method to uniquely and unambiguously identify formats of private data
(see Table 2-51).
Table 2-51 – Registration descriptor
Syntax                                      No. of bits   Mnemonic
registration_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  format_identifier                         32            uimsbf
  for (i = 0; i < N; i++) {
    additional_identification_info          8             bslbf
  }
}
2.6.9
Semantic definition of fields in registration descriptor
format_identifier – The format_identifier is a 32-bit value obtained from a Registration Authority as designated by
ISO/IEC JTC 1/SC 29.
additional_identification_info – The meaning of the additional_identification_info bytes, if any, is defined by the
assignee of that format_identifier, and once defined it shall not change.
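A sketch of extracting format_identifier and the optional trailing bytes. Rendering the identifier as four ASCII characters is a common convention assumed here, not a requirement of this Specification; both function names are hypothetical, and the trailing bytes are returned uninterpreted because their meaning belongs to the assignee.

```python
# Hypothetical helpers (not defined by this Specification).
def parse_registration_descriptor(desc: bytes):
    """Return (format_identifier, additional_identification_info bytes)."""
    if len(desc) < 6 or desc[0] != 5:
        raise ValueError("not a registration_descriptor")
    length = desc[1]
    format_identifier = int.from_bytes(desc[2:6], "big")
    additional = desc[6:2 + length]  # meaning defined by the assignee
    return format_identifier, additional


def as_fourcc(format_identifier: int):
    """Render as four printable ASCII characters, else return None."""
    raw = format_identifier.to_bytes(4, "big")
    if all(0x20 <= c < 0x7F for c in raw):
        return raw.decode("ascii")
    return None
```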
2.6.10
Data stream alignment descriptor
The data stream alignment descriptor describes which type of alignment is present in the associated elementary stream.
If the data_alignment_indicator in the PES packet header is set to '1' and the descriptor is present, alignment – as
specified in this descriptor – is required (see Table 2-52).
Table 2-52 – Data stream alignment descriptor
Syntax                                      No. of bits   Mnemonic
data_stream_alignment_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  alignment_type                            8             uimsbf
}
2.6.11
Semantics of fields in data stream alignment descriptor
alignment_type – Table 2-53 describes the alignment type for ISO/IEC 11172-2 video, ITU-T Rec. H.262 |
ISO/IEC 13818-2 video, or ISO/IEC 14496-2 visual streams when the data_alignment_indicator in the PES packet
header has a value of '1'. For these video streams, the first PES_packet_data_byte following the PES header shall be the
first byte of a start code of the type indicated in Table 2-53. At the beginning of a video sequence, the alignment shall
occur at the start code of the first sequence header.
NOTE – Specifying alignment type '01' from Table 2-53 does not preclude the alignment from beginning at a GOP or
SEQ header.
The definition of an access unit is given in 2.1.1.
Table 2-53 – Video stream alignment values
Alignment type   Description
00               Reserved
01               Slice, or video access unit
02               Video access unit
03               GOP, or SEQ
04               SEQ
05-FF            Reserved
Table 2-54 describes the alignment type for ITU-T Rec. H.264 | ISO/IEC 14496-10 video when the
data_alignment_indicator in the PES packet header has a value of '1'. In this case the first PES_packet_data_byte
following the PES header shall be the first byte of an AVC access unit or the first byte of an AVC slice, as signalled by
the alignment_type value.
Table 2-54 – AVC video stream alignment values
Alignment type   Description
00               Reserved
01               AVC slice or AVC access unit
02               AVC access unit
03-FF            Reserved
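For the video cases of Tables 2-53 and 2-54, the meaning of alignment_type depends on whether the stream is ITU-T Rec. H.264 | ISO/IEC 14496-10 video. The sketch below is a hypothetical lookup; the caller is assumed to know the stream type from elsewhere (for example, the Program Map Table).

```python
# Hypothetical helper (not defined by this Specification).
VIDEO_ALIGNMENT = {0x01: "Slice, or video access unit", 0x02: "Video access unit",
                   0x03: "GOP, or SEQ", 0x04: "SEQ"}            # Table 2-53
AVC_ALIGNMENT = {0x01: "AVC slice or AVC access unit",
                 0x02: "AVC access unit"}                        # Table 2-54


def describe_alignment(desc: bytes, is_avc: bool) -> str:
    """Describe alignment_type of a data_stream_alignment_descriptor."""
    if len(desc) < 3 or desc[0] != 6:
        raise ValueError("not a data_stream_alignment_descriptor")
    table = AVC_ALIGNMENT if is_avc else VIDEO_ALIGNMENT
    return table.get(desc[2], "Reserved")
```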
Table 2-55 describes the audio alignment type when the data_alignment_indicator in the PES packet header has a value
of '1'. In this case the first PES_packet_data_byte following the PES header is the first byte of an audio sync word.
Table 2-55 – Audio stream alignment values
Alignment type   Description
00               Reserved
01               Sync word
02-FF            Reserved
2.6.12
Target background grid descriptor
It is possible to have one or more video streams which, when decoded, are not intended to occupy the full display area
(e.g., a monitor). The combination of target_background_grid_descriptor and video_window_descriptors allows the
display of these video windows in their desired locations. The target_background_grid_descriptor is used to describe a
grid of unit pixels projected onto the display area. The video_window_descriptor is then used to describe, for the
associated stream, the location on the grid at which the top left pixel of the display window or display rectangle of the
video presentation unit should be displayed. This is represented in Figure 2-3.
[Figure 2-3 – Target background grid descriptor display area: a grid of horizontal size × vertical size pixels with origin 0,0 at the top left; the video is presented at the position given by the horizontal offset and vertical offset.]
2.6.13
Semantics of fields in target background grid descriptor
horizontal_size – The horizontal size of the target background grid in pixels.
vertical_size – The vertical size of the target background grid in pixels.
aspect_ratio_information – Specifies the sample aspect ratio or display aspect ratio of the target background grid.
Aspect_ratio_information is defined in ITU-T Rec. H.262 | ISO/IEC 13818-2 (see Table 2-56).
Table 2-56 – Target background grid descriptor
Syntax                                      No. of bits   Mnemonic
target_background_grid_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  horizontal_size                           14            uimsbf
  vertical_size                             14            uimsbf
  aspect_ratio_information                  4             uimsbf
}
2.6.14
Video window descriptor
The video window descriptor is used to describe the window characteristics of the associated video elementary stream.
Its values reference the target background grid descriptor for the same stream. Also see
target_background_grid_descriptor in 2.6.12 (see Table 2-57).
Table 2-57 – Video window descriptor
Syntax                                      No. of bits   Mnemonic
video_window_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  horizontal_offset                         14            uimsbf
  vertical_offset                           14            uimsbf
  window_priority                           4             uimsbf
}
2.6.15
Semantic definition of fields in video window descriptor
horizontal_offset – The value indicates the horizontal position, on the target background grid defined by the
target_background_grid_descriptor, of the top left pixel of the current video display window, or of the display rectangle
if one is indicated in the picture display extension. The top left pixel of the video window shall be one of the pixels of
the target background grid (refer to Figure 2-3).
vertical_offset – The value indicates the vertical position, on the target background grid defined by the
target_background_grid_descriptor, of the top left pixel of the current video display window, or of the display rectangle
if one is indicated in the picture display extension. The top left pixel of the video window shall be one of the pixels of
the target background grid (refer to Figure 2-3).
window_priority – The value indicates how windows overlap. A value of 0 is the lowest priority and a value of 15 is
the highest priority, i.e., windows with priority 15 are always visible.
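The target background grid descriptor and the video window descriptor share the same 14/14/4-bit payload layout, so one helper can unpack both. The sketch checks the rule that the top left pixel of the window lies on the grid and orders windows for drawing by window_priority; drawing the lowest priority first so that priority 15 ends up on top is an interpretation assumed here. All names are hypothetical, and the payloads are assumed to be the 4 bytes after descriptor_length.

```python
# Hypothetical helpers (not defined by this Specification).
def unpack_14_14_4(payload: bytes):
    """Split a 4-byte payload into (14-bit, 14-bit, 4-bit) fields."""
    v = int.from_bytes(payload[:4], "big")
    return v >> 18, (v >> 4) & 0x3FFF, v & 0x0F


def window_on_grid(grid_payload: bytes, window_payload: bytes) -> bool:
    """True when the window's top left pixel is a pixel of the grid."""
    horizontal_size, vertical_size, _aspect = unpack_14_14_4(grid_payload)
    horizontal_offset, vertical_offset, _priority = unpack_14_14_4(window_payload)
    return horizontal_offset < horizontal_size and vertical_offset < vertical_size


def paint_order(windows):
    """windows: (name, window_priority) pairs; lowest priority painted first."""
    return [name for name, _ in sorted(windows, key=lambda w: w[1])]
```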
2.6.16
Conditional access descriptor
The conditional access descriptor is used to specify both system-wide conditional access management information such
as EMMs and elementary stream-specific information such as ECMs. It may be used in both the
TS_program_map_section (refer to 2.4.4.8) and the program_stream_map (refer to 2.5.3). If any elementary stream is
scrambled, a CA descriptor shall be present for the program containing that elementary stream. If any system-wide
conditional access management information exists within a Transport Stream, a CA descriptor shall be present in the
conditional access table.
When the CA descriptor is found in the TS_program_map_section (table_id = 0x02), the CA_PID points to packets
containing program related access control information, such as ECMs. Its presence as program information indicates
applicability to the entire program. In the same case, its presence as extended ES information indicates applicability to
the associated program element. Provision is also made for private data.
When the CA descriptor is found in the CA_section (table_id = 0x01), the CA_PID points to packets containing
system-wide and/or access control management information, such as EMMs.
The contents of the Transport Stream packets containing conditional access information are privately defined
(see Table 2-58).
Table 2-58 – Conditional access descriptor
Syntax                                      No. of bits   Mnemonic
CA_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  CA_system_ID                              16            uimsbf
  reserved                                  3             bslbf
  CA_PID                                    13            uimsbf
  for (i = 0; i < N; i++) {
    private_data_byte                       8             uimsbf
  }
}
2.6.17
Semantic definition of fields in conditional access descriptor
CA_system_ID – This is a 16-bit field indicating the type of CA system applicable to the associated ECM
and/or EMM streams. The coding of this field is privately defined and is not specified by ITU-T | ISO/IEC.
CA_PID – This is a 13-bit field indicating the PID of the Transport Stream packets which shall contain either ECM or
EMM information for the CA systems as specified with the associated CA_system_ID. The contents (ECM or EMM) of
the packets indicated by the CA_PID are determined from the context in which the CA_PID is found, i.e., a
TS_program_map_section or the CA table in the Transport Stream, or the stream_id field in the Program Stream.
In Transport Streams, the presence of PID 0x03 indicates that there is IPMP as described in ISO/IEC 13818-11 used by
components in the Transport Stream. In Program Streams, the presence of stream_ID_extension value 0x00 indicates
that IPMP as described in ISO/IEC 13818-11 is used by components in the Program Stream. Within a given ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 stream, components could use both IPMP as described in ISO/IEC 13818-11 as well as
CA as defined in ISO/IEC 13818-1:2006. Compatibility between the two schemes is described in ISO/IEC 13818-11.
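A sketch of decoding the fixed part of the CA descriptor. The 3 reserved bits in front of CA_PID are masked off, and ca_pid_content approximates, from the table_id values given above, what the CA_PID packets typically carry. Function names are hypothetical and the private data is returned uninterpreted.

```python
# Hypothetical helpers (not defined by this Specification).
def parse_ca_descriptor(desc: bytes):
    """Return (CA_system_ID, CA_PID, private data bytes)."""
    if len(desc) < 6 or desc[0] != 9:
        raise ValueError("not a CA_descriptor")
    length = desc[1]
    ca_system_id = int.from_bytes(desc[2:4], "big")
    ca_pid = int.from_bytes(desc[4:6], "big") & 0x1FFF  # drop 3 reserved bits
    private_data = desc[6:2 + length]
    return ca_system_id, ca_pid, private_data


def ca_pid_content(table_id: int) -> str:
    """Typical content of CA_PID packets, from the enclosing table."""
    if table_id == 0x02:  # TS_program_map_section
        return "access control information such as ECMs"
    if table_id == 0x01:  # CA_section
        return "management information such as EMMs"
    return "unknown"
```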
2.6.18
ISO 639 language descriptor
The language descriptor is used to specify the language of the associated program element (see Table 2-59).
Table 2-59 – ISO 639 language descriptor
Syntax                                      No. of bits   Mnemonic
ISO_639_language_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  for (i = 0; i < N; i++) {
    ISO_639_language_code                   24            bslbf
    audio_type                              8             bslbf
  }
}
2.6.19
Semantic definition of fields in ISO 639 language descriptor
ISO_639_language_code – Identifies the language or languages used by the associated program element. The
ISO_639_language_code contains a 3-character code as specified by ISO 639, Part 2. Each character is coded into 8 bits
according to ISO 8859-1 and inserted in order into this 24-bit field. In the case of multilingual audio streams the
sequence of ISO_639_language_code fields shall reflect the content of the audio stream.
audio_type – The audio_type is an 8-bit field which specifies the type of stream defined in Table 2-60.
Table 2-60 – Audio type values
Value       Description
0x00        Undefined
0x01        Clean effects
0x02        Hearing impaired
0x03        Visual impaired commentary
0x04-0x7F   User Private
0x80-0xFF   Reserved
clean effects – This value indicates that the referenced program element has no language.
hearing impaired – This value indicates that the referenced program element is prepared for the hearing impaired.
visual impaired commentary – This value indicates that the referenced program element is prepared for the visually
impaired viewer.
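The ISO 639 language descriptor is a loop of 4-byte entries: three ISO 8859-1 characters followed by audio_type. A hypothetical parser, classifying audio_type according to Table 2-60 and assuming the input starts at descriptor_tag:

```python
# Hypothetical helpers (not defined by this Specification).
AUDIO_TYPES = {0x00: "Undefined", 0x01: "Clean effects",
               0x02: "Hearing impaired", 0x03: "Visual impaired commentary"}


def audio_type_name(value: int) -> str:
    if value in AUDIO_TYPES:
        return AUDIO_TYPES[value]
    return "User Private" if value <= 0x7F else "Reserved"


def parse_iso639_descriptor(desc: bytes):
    """Return a list of (language code, audio_type description)."""
    if len(desc) < 2 or desc[0] != 10:
        raise ValueError("not an ISO_639_language_descriptor")
    body = desc[2:2 + desc[1]]
    if len(body) != desc[1] or len(body) % 4:
        raise ValueError("body must be a whole number of 4-byte entries")
    entries = []
    for i in range(0, len(body), 4):
        code = body[i:i + 3].decode("latin-1")  # ISO 8859-1 characters
        entries.append((code, audio_type_name(body[i + 3])))
    return entries
```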
2.6.20
System clock descriptor
This descriptor conveys information about the system clock that was used to generate the timestamps.
If an external clock reference was used, the external_clock_reference_indicator may be set to '1'. The decoder may
optionally use the same external reference if it is available.
If the system clock is more accurate than the 30-ppm accuracy required, then the accuracy of the clock can be
communicated by encoding it in the clock_accuracy fields. The clock frequency accuracy is:
$$\text{clock\_accuracy\_integer} \times 10^{-\text{clock\_accuracy\_exponent}}\ \text{ppm} \qquad (2\text{-}26)$$
If clock_accuracy_integer is set to '0', then the system clock accuracy is 30 ppm. When the
external_clock_reference_indicator is set to '1', the clock accuracy pertains to the external reference clock
(see Table 2-61).
Table 2-61 – System clock descriptor
Syntax                                      No. of bits   Mnemonic
system_clock_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  external_clock_reference_indicator        1             bslbf
  reserved                                  1             bslbf
  clock_accuracy_integer                    6             uimsbf
  clock_accuracy_exponent                   3             uimsbf
  reserved                                  5             bslbf
}
2.6.21
Semantic definition of fields in system clock descriptor
external_clock_reference_indicator – This is a 1-bit indicator. When set to '1', it indicates that the system clock has
been derived from an external frequency reference that may be available at the decoder.
clock_accuracy_integer – This is a 6-bit integer. Together with the clock_accuracy_exponent, it gives the fractional
frequency accuracy of the system clock in parts per million.
clock_accuracy_exponent – This is a 3-bit integer. Together with the clock_accuracy_integer, it gives the fractional
frequency accuracy of the system clock in parts per million.
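Equation 2-26 and the 30 ppm default can be combined in a few lines. The sketch assumes the input holds the whole descriptor starting at descriptor_tag; the function name is hypothetical.

```python
# Hypothetical helper (not defined by this Specification).
def parse_system_clock_descriptor(desc: bytes):
    """Return (external_clock_reference_indicator, accuracy in ppm)."""
    if len(desc) < 4 or desc[0] != 11:
        raise ValueError("not a system_clock_descriptor")
    b0, b1 = desc[2], desc[3]
    external = b0 >> 7
    integer = b0 & 0x3F   # clock_accuracy_integer (6 bits)
    exponent = b1 >> 5    # clock_accuracy_exponent (3 bits)
    if integer == 0:
        return external, 30.0  # default accuracy of 30 ppm
    return external, integer * 10.0 ** (-exponent)  # equation 2-26
```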
2.6.22
Multiplex buffer utilization descriptor
The multiplex buffer utilization descriptor provides bounds on the occupancy of the STD multiplex buffer. This
information is intended for devices such as remultiplexers, which may use this information to support a desired
re-multiplexing strategy (see Table 2-62).
Table 2-62 – Multiplex buffer utilization descriptor
Syntax                                      No. of bits   Mnemonic
Multiplex_buffer_utilization_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  bound_valid_flag                          1             bslbf
  LTW_offset_lower_bound                    15            uimsbf
  reserved                                  1             bslbf
  LTW_offset_upper_bound                    15            uimsbf
}
2.6.23
Semantic definition of fields in multiplex buffer utilization descriptor
bound_valid_flag – A value of '1' indicates that the LTW_offset_lower_bound and the LTW_offset_upper_bound
fields are valid.
LTW_offset_lower_bound – This 15-bit field is defined only if the bound_valid_flag has a value of '1'. When defined,
this field has the units of (27 MHz/300) clock periods, as defined for the LTW_offset (refer to 2.4.3.4). The
LTW_offset_lower_bound represents the lowest value that any LTW_offset field would have, if that field were coded in
every packet of the stream or streams referenced by this descriptor. Actual LTW_offset fields may or may not be coded
in the bitstream when the multiplex buffer utilization descriptor is present. This bound is valid until the next occurrence
of this descriptor.
LTW_offset_upper_bound – This 15-bit field is defined only if the bound_valid_flag has a value of '1'. When defined, this
field has the units of (27 MHz/300) clock periods, as defined for the LTW_offset (refer to 2.4.3.4). The
LTW_offset_upper_bound represents the largest value that any LTW_offset field would have, if that field were coded
in every packet of the stream or streams referenced by this descriptor. Actual LTW_offset fields may or may not be
coded in the bitstream when the multiplex buffer utilization descriptor is present. This bound is valid until the next
occurrence of this descriptor.
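Both bounds are expressed in (27 MHz/300) clock periods, that is, ticks of a 90 kHz clock. The hypothetical parser below returns the bounds in ticks and in seconds, or None when bound_valid_flag is '0'; it assumes the input starts at descriptor_tag.

```python
# Hypothetical helper (not defined by this Specification).
LTW_TICK_SECONDS = 300 / 27_000_000  # one (27 MHz/300) clock period


def parse_multiplex_buffer_utilization(desc: bytes):
    """Return ((lower, upper) in ticks, (lower, upper) in seconds) or None."""
    if len(desc) < 6 or desc[0] != 12:
        raise ValueError("not a multiplex_buffer_utilization_descriptor")
    w0 = int.from_bytes(desc[2:4], "big")
    w1 = int.from_bytes(desc[4:6], "big")
    if w0 >> 15 == 0:  # bound_valid_flag
        return None
    lower, upper = w0 & 0x7FFF, w1 & 0x7FFF  # w1's top bit is reserved
    return (lower, upper), (lower * LTW_TICK_SECONDS, upper * LTW_TICK_SECONDS)
```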
2.6.24
Copyright descriptor
The copyright_descriptor provides a method to enable audiovisual works identification. This copyright_descriptor
applies to programs or program elements within programs (see Table 2-63).
Table 2-63 – Copyright descriptor
Syntax                                      No. of bits   Mnemonic
copyright_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  copyright_identifier                      32            uimsbf
  for (i = 0; i < N; i++) {
    additional_copyright_info               8             bslbf
  }
}
2.6.25
Semantic definition of fields in copyright descriptor
copyright_identifier – This field is a 32-bit value obtained from the Registration Authority.
additional_copyright_info – The meaning of the additional_copyright_info bytes, if any, is defined by the assignee of
that copyright_identifier, and once defined, it shall not change.
2.6.26
Maximum bitrate descriptor
See Table 2-64.
Table 2-64 – Maximum bitrate descriptor
Syntax                                      No. of bits   Mnemonic
maximum_bitrate_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  reserved                                  2             bslbf
  maximum_bitrate                           22            uimsbf
}
2.6.27
Semantic definition of fields in maximum bitrate descriptor
maximum_bitrate – The maximum bitrate is coded as a 22-bit positive integer in this field. The value indicates an
upper bound of the bitrate, including transport overhead, that will be encountered in this program element or program.
The value of maximum_bitrate is expressed in units of 50 bytes/second. The maximum_bitrate_descriptor is included in
the Program Map Table (PMT). Its presence as extended program information indicates applicability to the entire
program. Its presence as ES information indicates applicability to the associated program element.
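Since maximum_bitrate counts units of 50 bytes/s, one unit is 400 bit/s. The sketch converts in both directions; rounding up when encoding (so the signalled value remains an upper bound) and setting the reserved bits to '1' are assumptions of this sketch, and both names are hypothetical.

```python
# Hypothetical helpers (not defined by this Specification).
def maximum_bitrate_bps(desc: bytes) -> int:
    """Upper bound in bit/s, including transport overhead."""
    if len(desc) < 5 or desc[0] != 14:
        raise ValueError("not a maximum_bitrate_descriptor")
    value = int.from_bytes(desc[2:5], "big") & 0x3FFFFF  # drop 2 reserved bits
    return value * 50 * 8  # units of 50 bytes/s


def encode_maximum_bitrate(bps: int) -> bytes:
    value = -(-bps // 400)  # round up so the bound is not understated
    if value >= 1 << 22:
        raise ValueError("bitrate too large for 22 bits")
    # Reserved bits set to '1' (assumption of this sketch).
    return bytes([14, 3]) + ((0b11 << 22) | value).to_bytes(3, "big")
```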
2.6.28
Private data indicator descriptor
See Table 2-65.
Table 2-65 – Private data indicator descriptor
Syntax                                      No. of bits   Mnemonic
private_data_indicator_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  private_data_indicator                    32            uimsbf
}
2.6.29
Semantic definition of fields in private data indicator descriptor
private_data_indicator – The value of the private_data_indicator is private and shall not be defined by ITU-T |
ISO/IEC.
2.6.30
Smoothing buffer descriptor
This descriptor is optional and conveys information about the size of a smoothing buffer, SBn, associated with this
descriptor, and the associated leak rate out of that buffer, for the program element(s) that it refers to.
In the case of Transport Streams, bytes of Transport Stream packets of the associated program element(s) present in the
Transport Stream are input to a buffer SBn of size given by sb_size, at the time defined by equation 2-4.
In the case of Program Streams, bytes of all PES packets of the associated elementary streams are input to a buffer SBn
of size given by sb_size, at the time defined by equation 2-21.
When there is data present in this buffer, bytes are removed from this buffer at a rate defined by sb_leak_rate. The
buffer SBn shall never overflow. During the continuous existence of a program, the values of the fields of the
smoothing buffer descriptor of the different program element(s) in the program shall not change.
The meaning of the smoothing_buffer_descriptor is only defined when it is included in the PMT or the Program Stream
Map.
If, in the case of a Transport Stream, it is present in the ES info in the Program Map Table, all Transport Stream packets
of the PID of that program element enter the smoothing buffer.
If, in the case of a Transport Stream, it is present in the program information, the following Transport Stream packets
enter the smoothing buffer:
• all Transport Stream packets of all PIDs listed as elementary_PIDs in the extended program information;
• all Transport Stream packets of the PID which is equal to the PMT_PID of this section;
• all Transport Stream packets of the PCR_PID of the program.
All bytes that enter the associated buffer also exit it.
At any given time there shall be at most one descriptor referring to any individual program element and at most one
descriptor referring to the program in its entirety.
Table 2-66 – Smoothing buffer descriptor
Syntax                                      No. of bits   Mnemonic
smoothing_buffer_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  reserved                                  2             bslbf
  sb_leak_rate                              22            uimsbf
  reserved                                  2             bslbf
  sb_size                                   22            uimsbf
}
2.6.31
Semantic definition of fields in smoothing buffer descriptor
sb_leak_rate – This 22-bit field is coded as a positive integer. Its contents indicate the value of the leak rate out of the
SBn buffer for the associated elementary stream or other data in units of 400 bits/s.
sb_size – This 22-bit field is coded as a positive integer. Its contents indicate the size of the smoothing buffer SBn
for the associated elementary stream or other data in units of 1 byte (see Table 2-66).
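The sketch below decodes the two smoothing buffer parameters and performs a simplified overflow check. The simplification, assumed here and not taken from this Specification, is that each arrival enters SBn at a single instant, whereas the model of equations 2-4 and 2-21 delivers bytes at their individual arrival times. All names are hypothetical.

```python
# Hypothetical helpers (not defined by this Specification).
def parse_smoothing_buffer_descriptor(desc: bytes):
    """Return (leak rate in bit/s, buffer size in bytes)."""
    if len(desc) < 8 or desc[0] != 16:
        raise ValueError("not a smoothing_buffer_descriptor")
    sb_leak_rate = int.from_bytes(desc[2:5], "big") & 0x3FFFFF
    sb_size = int.from_bytes(desc[5:8], "big") & 0x3FFFFF
    return sb_leak_rate * 400, sb_size


def never_overflows(arrivals, sb_size: int, leak_bps: float) -> bool:
    """Simplified check; arrivals are (time_s, n_bytes) in time order.

    Each arrival enters SBn at once; between arrivals SBn drains at
    leak_bps while it holds data.
    """
    occupancy, last_t = 0.0, None
    for t, n in arrivals:
        if last_t is not None:
            occupancy = max(0.0, occupancy - (t - last_t) * leak_bps / 8)
        occupancy += n
        if occupancy > sb_size:
            return False
        last_t = t
    return True
```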
2.6.32
STD descriptor
This descriptor is optional and applies only to the T-STD model and to ITU-T Rec. H.262 | ISO/IEC 13818-2 video
elementary streams, and is used as specified in 2.4.2. This descriptor does not apply to Program Streams
(see Table 2-67).
Table 2-67 – STD descriptor
Syntax                                      No. of bits   Mnemonic
STD_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  reserved                                  7             bslbf
  leak_valid_flag                           1             bslbf
}
2.6.33
Semantic definition of fields in STD descriptor
leak_valid_flag – The leak_valid_flag is a 1-bit flag. When set to '1', the transfer of data from the buffer MBn to the
buffer EBn in the T-STD uses the leak method as defined in 2.4.2.3. If this flag has a value equal to '0', and the
vbv_delay fields present in the associated video stream do not have the value 0xFFFF, the transfer of data from the
buffer MBn to the buffer EBn uses the vbv_delay method as defined in 2.4.2.3.
2.6.34
IBP descriptor
This optional descriptor provides information about some characteristics of the sequence of frame types in an
ISO/IEC 11172-2, ITU-T Rec. H.262 | ISO/IEC 13818-2, or ISO/IEC 14496-2 video stream (see Table 2-68).
Table 2-68 – IBP descriptor
Syntax                                      No. of bits   Mnemonic
ibp_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  closed_gop_flag                           1             uimsbf
  identical_gop_flag                        1             uimsbf
  max_gop_length                            14            uimsbf
}
2.6.35
Semantic definition of fields in IBP descriptor
closed_gop_flag – This 1-bit flag when set to '1' indicates that a group of pictures header is encoded before every
I-frame and that the closed_gop flag is set to '1' in all group of pictures headers in the video sequence.
identical_gop_flag – This 1-bit flag when set to '1' indicates that the number of P-frames and B-frames between
I-frames, and the picture coding types and sequence of picture types between I-pictures, are the same throughout the
sequence, except possibly for the pictures up to the second I-picture.
max_gop_length – This 14-bit unsigned integer indicates the maximum number of coded pictures between any two
consecutive I-pictures in the sequence. The value '0' is forbidden.
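A hypothetical parser for the IBP descriptor that also enforces the rule that a max_gop_length of '0' is forbidden; the input is assumed to start at descriptor_tag.

```python
# Hypothetical helper (not defined by this Specification).
def parse_ibp_descriptor(desc: bytes) -> dict:
    if len(desc) < 4 or desc[0] != 18:
        raise ValueError("not an IBP_descriptor")
    w = int.from_bytes(desc[2:4], "big")
    max_gop_length = w & 0x3FFF  # low 14 bits
    if max_gop_length == 0:
        raise ValueError("max_gop_length value 0 is forbidden")
    return {
        "closed_gop_flag": w >> 15,
        "identical_gop_flag": (w >> 14) & 0x01,
        "max_gop_length": max_gop_length,
    }
```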
2.6.36
MPEG-4 video descriptor
For individual ISO/IEC 14496-2 streams directly carried in PES packets, as defined in 2.11.2, the MPEG-4 video
descriptor provides basic information for identifying the coding parameters of such visual elementary streams. The
MPEG-4 video descriptor does not apply to ISO/IEC 14496-2 streams encapsulated in SL-packets and in FlexMux
packets, as defined in 2.11.3.
Table 2-69 – MPEG-4 video descriptor
Syntax                                      No. of bits   Mnemonic
MPEG-4_video_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  MPEG-4_visual_profile_and_level           8             uimsbf
}
2.6.37
Semantic definition of fields in MPEG-4 video descriptor
MPEG-4_visual_profile_and_level – This 8-bit field shall identify the profile and level of the ISO/IEC 14496-2 video
stream. This field shall be coded with the same value as the profile_and_level_indication field in the Visual Object
Sequence Header in the associated ISO/IEC 14496-2 stream.
2.6.38
MPEG-4 audio descriptor
For individual ISO/IEC 14496-3 streams directly carried in PES packets, as defined in 2.11.2, the MPEG-4 audio
descriptor provides basic information for identifying the coding parameters of such audio elementary streams. The
MPEG-4 audio descriptor does not apply to ISO/IEC 14496-3 streams encapsulated in SL-packets and in FlexMux
packets, as defined in 2.11.3.
Table 2-70 – MPEG-4 audio descriptor
Syntax                                      No. of bits   Mnemonic
MPEG-4_audio_descriptor() {
  descriptor_tag                            8             uimsbf
  descriptor_length                         8             uimsbf
  MPEG-4_audio_profile_and_level            8             uimsbf
}
2.6.39
Semantic definition of fields in MPEG-4 audio descriptor
MPEG-4_audio_profile_and_level – This 8-bit field shall identify the profile and level of the ISO/IEC 14496-3 audio
stream, as specified in Table 2-71.
Table 2-71 – MPEG-4_audio_profile_and_level assignment values
Value        Description
0x00-0x0F    Reserved
0x10         Main profile, level 1
0x11         Main profile, level 2
0x12         Main profile, level 3
0x13         Main profile, level 4
0x14-0x17    Reserved
0x18         Scalable Profile, level 1
0x19         Scalable Profile, level 2
0x1A         Scalable Profile, level 3
0x1B         Scalable Profile, level 4
0x1C-0x1F    Reserved
0x20         Speech profile, level 1
0x21         Speech profile, level 2
0x22-0x27    Reserved
0x28         Synthesis profile, level 1
0x29         Synthesis profile, level 2
0x2A         Synthesis profile, level 3
0x2B-0x2F    Reserved
0x30         High quality audio profile, level 1
0x31         High quality audio profile, level 2
0x32         High quality audio profile, level 3
0x33         High quality audio profile, level 4
0x34         High quality audio profile, level 5
0x35         High quality audio profile, level 6
0x36         High quality audio profile, level 7
0x37         High quality audio profile, level 8
0x38         Low delay audio profile, level 1
0x39         Low delay audio profile, level 2
0x3A         Low delay audio profile, level 3
0x3B         Low delay audio profile, level 4
0x3C         Low delay audio profile, level 5
0x3D         Low delay audio profile, level 6
0x3E         Low delay audio profile, level 7
0x3F         Low delay audio profile, level 8
0x40         Natural audio profile, level 1
0x41         Natural audio profile, level 2
0x42         Natural audio profile, level 3
0x43         Natural audio profile, level 4
0x44-0x47    Reserved
0x48         Mobile audio internetworking profile, level 1
0x49         Mobile audio internetworking profile, level 2
0x4A         Mobile audio internetworking profile, level 3
0x4B         Mobile audio internetworking profile, level 4
0x4C         Mobile audio internetworking profile, level 5
0x4D         Mobile audio internetworking profile, level 6
0x4E-0x4F    Reserved
0x50         AAC profile, level 1
0x51         AAC profile, level 2
0x52         AAC profile, level 4
0x53         AAC profile, level 5
0x54-0x57    Reserved
0x58         High efficiency AAC profile, level 2
0x59         High efficiency AAC profile, level 3
0x5A         High efficiency AAC profile, level 4
0x5B         High efficiency AAC profile, level 5
0x5C-0xFF    Reserved
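The assignments in Table 2-71 follow a regular pattern: each profile starts at a fixed code point and its levels occupy consecutive values. As an illustration only, the Python sketch below regenerates the table from that pattern; the names _PROFILE_RANGES, AUDIO_PROFILE_AND_LEVEL and describe_audio_profile_and_level are hypothetical, and every value not listed in the table is reported as reserved.

```python
# Illustrative decoder for the MPEG-4_audio_profile_and_level field (Table 2-71).
# Each entry: (first assigned value, profile name, levels in assignment order).
_PROFILE_RANGES = [
    (0x10, "Main profile", [1, 2, 3, 4]),
    (0x18, "Scalable Profile", [1, 2, 3, 4]),
    (0x20, "Speech profile", [1, 2]),
    (0x28, "Synthesis profile", [1, 2, 3]),
    (0x30, "High quality audio profile", list(range(1, 9))),
    (0x38, "Low delay audio profile", list(range(1, 9))),
    (0x40, "Natural audio profile", [1, 2, 3, 4]),
    (0x48, "Mobile audio internetworking profile", list(range(1, 7))),
    (0x50, "AAC profile", [1, 2, 4, 5]),  # no value is assigned to level 3
    (0x58, "High efficiency AAC profile", [2, 3, 4, 5]),
]

# Expand the ranges into a value -> description dictionary.
AUDIO_PROFILE_AND_LEVEL = {
    base + offset: f"{name}, level {level}"
    for base, name, levels in _PROFILE_RANGES
    for offset, level in enumerate(levels)
}


def describe_audio_profile_and_level(value: int) -> str:
    """Return the Table 2-71 description of an 8-bit profile/level value."""
    if not 0 <= value <= 0xFF:
        raise ValueError("MPEG-4_audio_profile_and_level is an 8-bit field")
    # Every value not listed in the table is reserved.
    return AUDIO_PROFILE_AND_LEVEL.get(value, "Reserved")
```

For example, describe_audio_profile_and_level(0x52) returns "AAC profile, level 4". Generating the table from ranges keeps irregularities, such as the absent AAC level 3, visible in one place.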
2.6.40 IOD descriptor
The IOD descriptor encapsulates the InitialObjectDescriptor structure. An initial object descriptor allows access to a set
of ISO/IEC 14496 streams by identifying the ES_ID values of the ISO/IEC 14496-1 scene description and object
descriptor streams. Both the scene description stream and the object descriptor stream contain further information about
the ISO/IEC 14496 streams that are part of the scene. See Annex R for a description of the content access procedure.
The InitialObjectDescriptor is specified in 8.6.3 of ISO/IEC 14496-1.
Within a Transport Stream, the IOD descriptor shall be conveyed in the descriptor loop immediately following the
program_info_length field in the Program Map Table. If a Program Stream Map is present in a Program Stream, the
IOD descriptor shall be conveyed in the descriptor loop immediately following the program_stream_info_length field in
the Program Stream Map. More than one IOD descriptor may be associated with a program.
NOTE – This Specification does not specify how the IOD_label may be used by higher level service information to uniquely
select one of the ISO/IEC 14496 presentations identified by multiple IOD descriptors.
Table 2-72 – IOD descriptor
Syntax                              No. of bits   Mnemonic
IOD_descriptor () {
    descriptor_tag                  8             uimsbf
    descriptor_length               8             uimsbf
    Scope_of_IOD_label              8             uimsbf
    IOD_label                       8             uimsbf
    InitialObjectDescriptor ()      8             uimsbf
}
2.6.41 Semantic definition of fields in IOD descriptor
Scope_of_IOD_label – This 8-bit field specifies the scope of the IOD_label field. A value of 0x10 indicates that the
IOD_label is unique within the Program Stream or within the specific program in a Transport Stream in which the
IOD descriptor is carried. A value of 0x11 indicates that the IOD_label is unique within the Transport Stream in which
the IOD descriptor is carried. All other values of the Scope_of_IOD_label field are reserved.
IOD_label – This 8-bit field specifies the label of the IOD descriptor.
InitialObjectDescriptor () – This structure is defined in 8.6.3.1 of ISO/IEC 14496-1.
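A receiver typically needs only the two label fields before handing the InitialObjectDescriptor to an ISO/IEC 14496-1 decoder. The following Python sketch (function and dictionary names are hypothetical) splits an IOD_descriptor according to Table 2-72. It does not check the descriptor_tag value and returns the InitialObjectDescriptor as raw bytes.

```python
# Meaning of Scope_of_IOD_label; all other values are reserved.
SCOPE_OF_IOD_LABEL = {
    0x10: "unique within the Program Stream or the program in a Transport Stream",
    0x11: "unique within the Transport Stream",
}


def parse_iod_descriptor(desc: bytes) -> dict:
    """Split an IOD_descriptor (Table 2-72) into its fields."""
    if len(desc) < 2:
        raise ValueError("missing descriptor_tag/descriptor_length")
    tag, length = desc[0], desc[1]
    body = desc[2:2 + length]
    if len(body) != length or length < 2:
        raise ValueError("truncated IOD_descriptor")
    scope = body[0]
    return {
        "descriptor_tag": tag,
        "Scope_of_IOD_label": scope,
        "scope_meaning": SCOPE_OF_IOD_LABEL.get(scope, "reserved"),
        "IOD_label": body[1],
        # Decoding this structure is left to an ISO/IEC 14496-1 parser.
        "InitialObjectDescriptor": bytes(body[2:]),
    }
```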
2.6.42 SL descriptor
The SL descriptor shall be used when a single ISO/IEC 14496-1 SL-packetized stream is encapsulated in PES packets.
The SL descriptor associates the ES_ID of this SL-packetized stream with an elementary_PID in the case of a Transport
Stream or with an elementary_stream_id in the case of a Program Stream. Within a Transport Stream, the SL descriptor shall
be conveyed for the corresponding elementary stream in the descriptor loop immediately following the ES_info_length
field in the Program Map Table. If a Program Stream Map is present in a Program Stream, the SL descriptor shall be
conveyed in the descriptor loop immediately following the elementary_stream_info_length field within the Program
Stream Map.
NOTE – SL packetized streams may be used in a Program Stream. However, only one stream_id exists for ISO/IEC 14496-1
SL-packetized streams. In order to associate multiple such streams within a Program Stream to an ISO/IEC 14496-1 scene,
FlexMux has to be used and signalled appropriately by an FMC descriptor. This limitation does not exist in a Transport Stream
where the SL descriptor provides unambiguous mapping between an ISO/IEC 14496-1 ES_ID value and an ITU-T Rec. H.222.0 |
ISO/IEC 13818-1 elementary_PID value.
Table 2-73 – SL descriptor
Syntax                              No. of bits   Mnemonic
SL_descriptor () {
    descriptor_tag                  8             uimsbf
    descriptor_length               8             uimsbf
    ES_ID                           16            uimsbf
}
2.6.43 Semantic definition of fields in SL descriptor
ES_ID – This 16-bit field shall specify the identifier of an ISO/IEC 14496-1 SL-packetized stream.
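Because the SL descriptor is what links an ES_ID to an elementary_PID, a Transport Stream receiver can build that mapping from the elementary-stream loop of the PMT. A minimal Python sketch, assuming the PMT has already been parsed into (elementary_PID, descriptor list) pairs and that the caller supplies the SL_descriptor tag value; both helper names are hypothetical.

```python
import struct


def parse_sl_descriptor(desc: bytes) -> int:
    """Return the ES_ID carried in an SL_descriptor (Table 2-73)."""
    if len(desc) < 4 or desc[1] != 2:
        raise ValueError("SL_descriptor must carry exactly one 16-bit ES_ID")
    (es_id,) = struct.unpack(">H", desc[2:4])  # uimsbf, most significant byte first
    return es_id


def map_es_ids_to_pids(es_loop, sl_tag: int) -> dict:
    """Build {ES_ID: elementary_PID} from PMT elementary-stream entries.

    es_loop: iterable of (elementary_PID, [raw descriptor bytes, ...]) pairs,
    i.e. the descriptors found after each ES_info_length field.
    """
    mapping = {}
    for pid, descriptors in es_loop:
        for desc in descriptors:
            if desc and desc[0] == sl_tag:
                mapping[parse_sl_descriptor(desc)] = pid
    return mapping
```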
2.6.44 FMC descriptor
The FMC descriptor indicates that the ISO/IEC 14496-1 FlexMux tool has been used to multiplex ISO/IEC 14496-1
SL-packetized streams into a FlexMux stream before encapsulation in PES packets or ISO_IEC_14496_sections. The
FMC descriptor associates FlexMux channels to the ES_ID values of the SL-packetized streams in the FlexMux stream.
An FMC descriptor is required for each program element referenced by an elementary_PID value in a Transport Stream
and for each elementary_stream_id in a Program Stream that conveys a FlexMux stream. Within a Transport Stream,
the FMC descriptor shall be conveyed for the corresponding elementary stream in the descriptor loop immediately
following the ES_info_length field in the Program Map Table. If a Program Stream Map is present in a Program
Stream, the FMC descriptor shall be conveyed in the descriptor loop immediately following the
elementary_stream_info_length field in the Program Stream Map.
For each SL-packetized stream in a FlexMux stream, the FlexMux channel shall be identified by a single entry in the
FMC descriptor.
Table 2-74 – FMC descriptor
Syntax                                          No. of bits   Mnemonic
FMC_descriptor () {
    descriptor_tag                              8             uimsbf
    descriptor_length                           8             uimsbf
    for (i = 0; i < descriptor_length; i += 3) {
        ES_ID                                   16            uimsbf
        FlexMuxChannel                          8             uimsbf
    }
}
2.6.45 Semantic definition of fields in FMC descriptor
ES_ID – This 16-bit field specifies the identifier of an ISO/IEC 14496-1 SL-packetized stream.
FlexMuxChannel – This 8-bit field specifies the number of the FlexMux channel used for this SL-packetized stream.
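Each FMC_descriptor entry is 3 bytes long (a 16-bit ES_ID and an 8-bit FlexMuxChannel), which is why the loop in Table 2-74 advances by 3. The Python sketch below (hypothetical helper name) decodes the entries into an ES_ID to FlexMux channel map and rejects a body whose length is not a multiple of 3.

```python
import struct


def parse_fmc_descriptor(desc: bytes) -> dict:
    """Return {ES_ID: FlexMuxChannel} for an FMC_descriptor (Table 2-74)."""
    if len(desc) < 2:
        raise ValueError("missing descriptor_tag/descriptor_length")
    length = desc[1]
    body = desc[2:2 + length]
    if len(body) != length or length % 3:
        raise ValueError("body must hold whole 3-byte (ES_ID, FlexMuxChannel) entries")
    channels = {}
    for i in range(0, length, 3):
        # 16-bit ES_ID followed by 8-bit FlexMuxChannel, both uimsbf.
        es_id, channel = struct.unpack(">HB", body[i:i + 3])
        channels[es_id] = channel
    return channels
```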
2.6.46 External_ES_ID descriptor
The External_ES_ID descriptor assigns an ES_ID, as defined in ISO/IEC 14496-1, to a program element to which no
ES_ID value has been assigned by other means. This ES_ID allows a non-ISO/IEC 14496 component to be referenced in
the scene description or, for example, to be associated with an IPMP stream.
Within a Transport Stream, the assignment of an ES_ID shall be made by conveying an External_ES_ID descriptor for
the corresponding elementary stream in the descriptor loop immediately following the ES_info_length field in the
Program Map Table. If a Program Stream Map is present in a Program Stream, the External_ES_ID descriptor shall be
conveyed in the descriptor loop immediately following the elementary_stream_info_length field in the Program Stream
Map.
Table 2-75 – External_ES_ID descriptor
Syntax                              No. of bits   Mnemonic
External_ES_ID_descriptor () {
    descriptor_tag                  8             uimsbf
    descriptor_length               8             uimsbf
    External_ES_ID                  16            uimsbf
}
2.6.47 Semantic definition of fields in External_ES_ID descriptor
External_ES_ID – This 16-bit field assigns an ES_ID identifier, as defined in ISO/IEC 14496-1, to a component of a
program.
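Serializing the External_ES_ID descriptor amounts to writing the 16-bit value, most significant byte first, after the tag and length bytes. A minimal Python sketch; the function name is hypothetical and the caller is assumed to supply the descriptor_tag value assigned to this descriptor.

```python
def build_external_es_id_descriptor(tag: int, external_es_id: int) -> bytes:
    """Serialize an External_ES_ID_descriptor (Table 2-75)."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("descriptor_tag is an 8-bit field")
    if not 0 <= external_es_id <= 0xFFFF:
        raise ValueError("External_ES_ID is a 16-bit field")
    body = external_es_id.to_bytes(2, "big")  # uimsbf, most significant byte first
    return bytes([tag, len(body)]) + body
```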
2.6.48 Muxcode descriptor
The Muxcode descriptor conveys MuxCodeTableEntry structures as defined in 11.2.4.3 of ISO/IEC 14496-1.
MuxCodeTableEntries configure the MuxCode mode of FlexMux.
One or more Muxcode descriptors may be associated with each elementary_PID or elementary_stream_id, respectively,
conveying an ISO/IEC 14496-1 FlexMux stream that utilizes the MuxCode mode. Within a Transport Stream, the
Muxcode descriptor shall be conveyed for the corresponding elementary stream in the descriptor loop immediately
following the ES_info_length field in the Program Map Table. If a Program Stream Map is present in a Program
Stream, the Muxcode descriptor shall be conveyed in the descriptor loop immediately following the
elementary_stream_info_length field in the Program Stream Map.
MuxCodeTableEntries may be updated with new versions. In case of such updates, the version_number of each
Program Map Table or the program_stream_map_version of each Program Stream Map, respectively, carrying the
MuxCode descriptor in their descriptor loop shall be incremented by 1 modulo 32.
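The modulo-32 increment only touches the 5-bit version_number field. In a Program Map Table section, version_number sits in the sixth byte (index 5), between two reserved bits and the current_next_indicator. The Python sketch below (hypothetical helper name) updates that byte while preserving the neighbouring bits; recomputing the CRC_32 of the section afterwards is required but not shown.

```python
def bump_pmt_version_byte(byte5: int) -> int:
    """Increment version_number (mod 32) in byte 5 of a PMT section.

    Bit layout of that byte: reserved (2) | version_number (5) |
    current_next_indicator (1). Reserved and indicator bits are kept.
    """
    if not 0 <= byte5 <= 0xFF:
        raise ValueError("expected a single byte value")
    version = (byte5 >> 1) & 0x1F
    new_version = (version + 1) % 32
    return (byte5 & 0xC1) | (new_version << 1)
```

With version_number equal to 31 the value wraps to 0: bump_pmt_version_byte(0xFF) returns 0xC1.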
Table 2-76 – Muxcode descriptor
Syntax                              No. of bits   Mnemonic
Muxcode_descriptor () {
    descriptor_tag                  8             uimsbf
    descriptor_length               8             uimsbf
    for (i = 0; i < N; i++) {
        MuxCodeTableEntry ()
    }
}
2.6.49 Semantic definition of fields in Muxcode descriptor
MuxCodeTableEntry () – This structure is defined in 11.2.4.3 of ISO/IEC 14496-1.
2.6.50 FmxBufferSize descriptor
The FmxBufferSize descriptor conveys the size of the FlexMux buffer (FB) for each SL-packetized stream multiplexed
in a FlexMux stream.
One FmxBufferSize descriptor shall be associated with each elementary_PID or elementary_stream_id, respectively,
conveying an ISO/IEC 14496-1 FlexMux stream. Within a Transport Stream, the FmxBufferSize descriptor shall be
conveyed for the corresponding elementary stream in the descriptor loop immediately following the ES_info_length
field in the Program Map Table. If a Program Stream Map is present in a Program Stream, the FmxBufferSize
descriptor shall be conveyed in the descriptor loop immediately following the elementary_stream_info_length field
within the Program Stream Map.
Table 2-77 – FmxBufferSize descriptor
Syntax                              No. of bits   Mnemonic
FmxBufferSize_descriptor () {
    descriptor_tag                  8             uimsbf
    descriptor_length               8             uimsbf
    DefaultFlexMuxBufferDescriptor()
    for (i = 0; i < descriptor_length; i += 4) {
        FlexMuxBufferDescriptor()
    }
}
2.6.51 Semantic definition of fields in FmxBufferSize descriptor
FlexMuxBufferDescriptor() – This descriptor specifies the FlexMux buffer size for one SL-packetized stream carried
within the FlexMux stream. It is defined in 11.2 of ISO/IEC 14496-1.
DefaultFlexMuxBufferDescriptor() – This descriptor specifies the default FlexMux buffer size for this FlexMux
stream. It is defined in 11.2 of ISO/IEC 14496-1.
2.6.52 MultiplexBuffer descriptor
The MultiplexBuffer descriptor conveys the size of the multiplex buffer MBn, as well as the leak rate Rxn at which data
is transferred from transport buffer TBn into buffer MBn for a specific ITU-T Rec. H.222.0 | ISO/IEC 13818-1 program
element referenced by an elementary_PID value in the Program Map Table.
One MultiplexBuffer descriptor shall be associated with each elementary_PID that contains an ISO/IEC 14496 FlexMux
stream or SL-packetized stream, including those containing ISO_IEC_14496_sections. See 2.11.3.9 for the definition of
buffers and rates in the T-STD model for decoding of ISO/IEC 14496 content.
The MultiplexBuffer descriptor shall be conveyed in the descriptor loop immediately following the ES_info_length
field in the Program Map Table.
Table 2-78 – MultiplexBuffer descriptor
Syntax                              No. of bits   Mnemonic
MultiplexBuffer_descriptor () {
    descriptor_tag                  8             uimsbf
    descriptor_length               8             uimsbf
    MB_buffer_size                  24            uimsbf
    TB_leak_rate                    24            uimsbf
}
2.6.53 Semantic definition of fields in MultiplexBuffer descriptor
MB_buffer_size – This 24-bit field shall specify the size, in bytes, of buffer MBn of the elementary stream n that is
associated with this descriptor.
TB_leak_rate – This 24-bit field shall specify in units of 400 bits per second the rate at which data is transferred from
transport buffer TBn to multiplex buffer MBn for the elementary stream n that is associated with this descriptor.
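Both fields of the MultiplexBuffer descriptor are 24-bit unsigned integers; only TB_leak_rate needs scaling to obtain bits per second. A minimal Python sketch with a hypothetical helper name, assuming descriptor_length is 6 as implied by Table 2-78:

```python
def parse_multiplex_buffer_descriptor(desc: bytes) -> dict:
    """Decode MB_buffer_size and TB_leak_rate (Table 2-78)."""
    if len(desc) < 8 or desc[1] != 6:
        raise ValueError("MultiplexBuffer_descriptor body must be 6 bytes")
    mb_buffer_size = int.from_bytes(desc[2:5], "big")  # size of MBn in bytes
    tb_leak_rate = int.from_bytes(desc[5:8], "big")    # units of 400 bit/s
    return {
        "MB_buffer_size_bytes": mb_buffer_size,
        "TB_leak_rate_bits_per_second": tb_leak_rate * 400,
    }
```

A TB_leak_rate of 5000, for instance, corresponds to 5000 × 400 = 2 000 000 bit/s.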
2.6.54 FlexMuxTiming descriptor
See Table 2-79.
Table 2-79 – FlexMuxTiming descriptor
Syntax                              No. of bits   Mnemonic
FlexMuxTiming_descriptor () {
    descriptor_tag                  8             uimsbf
    descriptor_length               8             uimsbf
    FCR_ES_ID                       16            uimsbf
    FCRResolution                   32            uimsbf
    FCRLength                       8             uimsbf
    FmxRateLength                   8             uimsbf
}
2.6.55 Semantic definition of fields in FlexMuxTiming descriptor
FCR_ES_ID – This 16-bit field specifies the ES_ID associated with this clock reference stream.
FCRResolution – This 32-bit field specifies the resolution of the object time base in cycles per second.
FCRLength – This 8-bit field specifies the length of the fmxClockReference field in FlexMux packets with index = 238. A
length of zero shall indicate that no FlexMux packets with index = 238 are present in this FlexMux stream. FCRLength
shall take values between 0 and 64.
FmxRateLength – This 8-bit field specifies the length of the fmxRate field in FlexMux packets with index = 238.
FmxRateLength shall take values between 1 and 32.
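The fixed 8-byte body of Table 2-79 can be unpacked in one step and then checked against the permitted ranges. In this Python sketch (hypothetical helper name), both range bounds are treated as inclusive, which is an interpretation of "between" rather than an explicit statement of this Specification.

```python
import struct


def parse_flexmux_timing_descriptor(desc: bytes) -> dict:
    """Decode a FlexMuxTiming_descriptor (Table 2-79) and check its ranges."""
    if len(desc) < 10 or desc[1] != 8:
        raise ValueError("FlexMuxTiming_descriptor body must be 8 bytes")
    # 16-bit FCR_ES_ID, 32-bit FCRResolution, 8-bit FCRLength, 8-bit FmxRateLength
    fcr_es_id, fcr_resolution, fcr_length, fmx_rate_length = struct.unpack(
        ">HIBB", desc[2:10]
    )
    if fcr_length > 64:
        raise ValueError("FCRLength must lie between 0 and 64")
    if not 1 <= fmx_rate_length <= 32:
        raise ValueError("FmxRateLength must lie between 1 and 32")
    return {
        "FCR_ES_ID": fcr_es_id,
        "FCRResolution": fcr_resolution,
        "FCRLength": fcr_length,
        "FmxRateLength": fmx_rate_length,
        # FCRLength == 0 signals that no index-238 packets are present.
        "clock_reference_packets_present": fcr_length != 0,
    }
```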
2.6.56 Content labelling descriptor
The content labelling descriptor assigns a label to content; the label can be used by metadata to reference the associated
content. This label, the content_reference_id_record, is metadata application format specific. The content labelling
descriptor is associated with a content segment. For the purpose of this clause, a content segment is defined as a portion
in time of a program, an elementary stream (such as audio or video) or any combination of programs or elementary
streams. The descriptor may be included in the PMT in the descriptor loop for either the program or an elementary
stream, but may also be contained in tables not defined in this Specification, for example tables to describe segments of
programs or elementary streams. The content labelling descriptor also provides information on which content time base
is used and on the offset between the content time base and the metadata time base. When the Normal Play Time (NPT)
concept of DSM-CC, as specified in ISO/IEC 13818-6, is used as the content time base, the ID of the NPT time base is
provided. The descriptor allows for carriage of private data. See Table 2-80.
Table 2-80 – Content labelling descriptor
Syntax                                                  No. of bits   Mnemonic
Content_labeling_descriptor () {
    descriptor_tag                                      8             uimsbf
    descriptor_length                                   8             uimsbf
    metadata_application_format                         16            uimsbf
    if (metadata_application_format == 0xFFFF) {
        metadata_application_format_identifier          32            uimsbf
    }
    content_reference_id_record_flag                    1             bslbf
    content_time_base_indicator                         4             uimsbf
    reserved                                            3             bslbf
    if (content_reference_id_record_flag == '1') {
        content_reference_id_record_length              8             uimsbf
        for (i = 0; i < content_reference_id_record_length; i++) {
            content_reference_id_byte                   8             bslbf
        }
    }
    if (content_time_base_indicator == 1|2) {
        reserved                                        7             bslbf
        content_time_base_value                         33            uimsbf
        reserved                                        7             bslbf
        metadata_time_base_value                        33            uimsbf
    }
    if (content_time_base_indicator == 2) {
        reserved                                        1             bslbf
        contentId                                       7             uimsbf
    }
    if (content_time_base_indicator == 3|4|5|6|7) {
        time_base_association_data_length               8             uimsbf
        for (i = 0; i < time_base_association_data_length; i++) {
            reserved                                    8             bslbf
        }
    }
    for (i = 0; i < N; i++) {
        private_data_byte                               8             bslbf
    }
}
2.6.57 Semantic definition of fields in content labelling descriptor
metadata_application_format: The metadata_application_format is a 16-bit field, coded as defined in Table 2-81, that
specifies the application responsible for defining usage, syntax and semantics of the content_reference_id_record and of
any other privately defined fields in this descriptor. See also 2.12.1. The value 0xFFFF indicates that the format is
signalled by the value carried in the metadata_application_format_identifier field.
Table 2-81 – Metadata_application_format
Value            Description
0x0000-0x000F    Reserved
0x0010           ISO 15706 (ISAN) encoded in its binary form (see Notes 1 and 3)
0x0011           ISO 15706-2 (V-ISAN) encoded in its binary form (see Notes 2 and 3)
0x0012-0x00FF    Reserved
0x0100-0xFFFE    User defined
0xFFFF           Defined by the metadata_application_format_identifier field
NOTE 1 – For ISAN, the content_reference_id_byte is set to binary encoding and the content_reference_id_record_length
is set to 0x08.
NOTE 2 – For V-ISAN, the content_reference_id_byte is set to binary encoding and the
content_reference_id_record_length is set to 0x0C.
NOTE 3 – For interoperability amongst metadata applications that use the metadata_application_format values of 0x0010
and 0x0011, it is recommended that the content_reference_id_record_flag be set to '1' and the
content_time_base_indicator be set to 0.
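NOTES 1 and 2 fix the record length at 8 bytes for ISAN and 12 bytes for V-ISAN. The Python sketch below assembles a Content_labeling_descriptor carrying such a record, with content_time_base_indicator set to 0 as NOTE 3 recommends. It assumes the caller passes the ISAN as hexadecimal digits with any check characters already removed, and it sets the three reserved bits to '1'; the function name and the tag parameter are hypothetical.

```python
def build_isan_content_label(isan_hex: str, tag: int) -> bytes:
    """Build a Content_labeling_descriptor carrying an ISAN or V-ISAN label.

    isan_hex: 16 (ISAN) or 24 (V-ISAN) hexadecimal digits, hyphens allowed,
    check characters assumed to be stripped already.
    """
    record = bytes.fromhex(isan_hex.replace("-", ""))
    if len(record) == 8:
        app_format = 0x0010  # ISO 15706 (ISAN), NOTE 1
    elif len(record) == 12:
        app_format = 0x0011  # ISO 15706-2 (V-ISAN), NOTE 2
    else:
        raise ValueError("expected an 8-byte ISAN or a 12-byte V-ISAN")
    time_base_indicator = 0
    # content_reference_id_record_flag = 1, content_time_base_indicator (4 bits),
    # reserved bits assumed to be '111'.
    flags = 0x80 | (time_base_indicator << 3) | 0x07
    body = app_format.to_bytes(2, "big") + bytes([flags, len(record)]) + record
    return bytes([tag, len(body)]) + body
```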
metadata_application_format_identifier: The coding of this 32-bit field is fully equivalent to the coding of the
format_identifier field in the registration_descriptor, as defined in 2.6.8.
NOTE – The assigned Registration Authority for the format_identifier field is SMPTE.
content_reference_id_record_flag: The content_reference_id_record_flag is a 1-bit flag that signals the presence of a
content_reference_id_record in this descriptor.
content_time_base_indicator: The content_time_base_indicator is a 4-bit field that specifies the content time base
used. If the descriptor is associated with a program, then the content time base applies to all streams that are part of that
program. A value of 1 indicates usage of the STC, while a value of 2 indicates usage of NPT, the Normal Play Time as
defined in ISO/IEC 13818-6. Values in the range 8 to 15 indicate usage of a privately defined content time base. If
coded with a value of 0, no content time base is defined in this descriptor. If no content time base is specified for a
program or stream, then the mapping of time references in the metadata to the content is not defined in this
Specification.
Table 2-82 – Content_time_base_indicator values
Value   Description
0       No content time base defined in this descriptor
1       Use of STC
2       Use of NPT
3-7     Reserved
8-15    Use of privately defined content time base
content_reference_id_record_length: The content_reference_id_record_length is an 8-bit field that specifies the
number of content_reference_id_bytes immediately following this field. This field shall not be coded with the value '0'.
content_reference_id_byte: The content_reference_id_byte is part of a string of one or more contiguous bytes that
assigns one or more reference identifications (labels) to the content to which this descriptor is associated. The format of
this byte string is defined by the body indicated by the coded value in the metadata_application_format field.
content_time_base_value: The content_time_base_value is a 33-bit field that specifies a value in units of 90 kHz of the
content time base indicated by the content_time_base_indicator field.
metadata_time_base_value: The metadata_time_base_value is a 33-bit field that is coded in units of 90 kHz. The field
is coded with the value of the metadata time base at the instant in time in which the time base indicated by
content_time_base_indicator reaches the value encoded in the content_time_base_value field. Note that the metadata
time base may use any time-scale, but that its value is to be coded in units of 90 kHz. For example, if a SMPTE type of
time code is used, then the number of hours, minutes, seconds and frames is expressed in the corresponding number of
90-kHz units.
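The time code conversion described for metadata_time_base_value can be written out as a worked example. This Python sketch assumes a non-drop-frame time code at an integer frame rate that divides 90 000 (24, 25, 30 and 50 frame/s do; 29.97 frame/s drop-frame time code is out of scope here); the helper name is hypothetical.

```python
def smpte_to_90khz(hours: int, minutes: int, seconds: int, frames: int, fps: int) -> int:
    """Convert a non-drop-frame time code to a count of 90 kHz units."""
    if fps <= 0 or 90000 % fps:
        raise ValueError("sketch only supports integer frame rates dividing 90 000")
    if not 0 <= frames < fps:
        raise ValueError("frame number out of range for this frame rate")
    whole_seconds = hours * 3600 + minutes * 60 + seconds
    ticks = whole_seconds * 90000 + frames * (90000 // fps)
    if ticks >= 1 << 33:
        raise ValueError("value does not fit in a 33-bit field")
    return ticks
```

For example, time code 00:00:01:12 at 25 frame/s gives 90 000 + 12 × 3600 = 133 200 units of 90 kHz.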
contentId: The contentId is a 7-bit field that specifies the value of the content_Id field in the NPT Reference Descriptor
for the applied NPT time base.
time_base_association_data_length: The time_base_association_data_length is an 8-bit field that specifies the number
of reserved bytes immediately following this field. The reserved bytes can be used to carry time base association data
for time bases defined in future.
private_data_byte: The private_data_byte is an 8-bit field. The private_data_bytes represent data, the format of which
is defined privately. These bytes can be used to provide additional information as deemed appropriate. The use of these
bytes is defined by the metadata application format.
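Because most fields of Table 2-80 are conditional, a receiver has to walk the syntax in order. The Python sketch below (hypothetical names) does so; it does not check the descriptor_tag, returns application-defined records as raw bytes, and strips the 7 reserved bits that precede each 33-bit time base value by masking.

```python
def parse_content_labeling_descriptor(desc: bytes) -> dict:
    """Walk the Content_labeling_descriptor syntax of Table 2-80."""
    if len(desc) < 2:
        raise ValueError("missing descriptor_tag/descriptor_length")
    body = desc[2:2 + desc[1]]
    if len(body) != desc[1]:
        raise ValueError("truncated descriptor")
    pos = 0

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(body):
            raise ValueError("descriptor body too short for its flags")
        chunk = body[pos:pos + n]
        pos += n
        return chunk

    mask33 = (1 << 33) - 1
    out = {"metadata_application_format": int.from_bytes(take(2), "big")}
    if out["metadata_application_format"] == 0xFFFF:
        out["metadata_application_format_identifier"] = int.from_bytes(take(4), "big")
    flags = take(1)[0]
    tbi = (flags >> 3) & 0x0F  # content_time_base_indicator
    out["content_time_base_indicator"] = tbi
    if flags & 0x80:  # content_reference_id_record_flag
        out["content_reference_id_record"] = take(take(1)[0])
    if tbi in (1, 2):
        # 7 reserved bits followed by a 33-bit value, twice.
        out["content_time_base_value"] = int.from_bytes(take(5), "big") & mask33
        out["metadata_time_base_value"] = int.from_bytes(take(5), "big") & mask33
    if tbi == 2:
        out["contentId"] = take(1)[0] & 0x7F
    if 3 <= tbi <= 7:
        take(take(1)[0])  # skip reserved time base association data
    out["private_data"] = bytes(body[pos:])
    return out
```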
2.6.58 Metadata pointer descriptor
The metadata pointer descriptor points to a single metadata service and associates this metadata service with
audiovisual content in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream. The metadata is associated with the content
within the context of the descriptor. The context is defined by the location of the descriptor. In a transport stream, the
descriptor may be located in the PMT in the descriptor loop for either the program or an elementary stream, but may
also be located in tables not defined in this Specification, such as tables describing bouquets of broadcast services. The
metadata may be located in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, but the same metadata may also be
provided at alternative locations, such as the Internet.
The descriptor may contain location information of metadata that is not carried in an ITU-T Rec. H.222.0 |
ISO/IEC 13818-1 stream; the coding of the location information is metadata application format specific. The descriptor
allows for carriage of private data.
For metadata carried in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, the descriptor specifies the tools used for
such carriage. If the metadata is carried in PES packets, metadata sections, or ISO/IEC 13818-6 synchronized download
sections, the metadata_service_id field identifies the metadata service in the referenced metadata stream. If an
ISO/IEC 13818-6 carousel is used to carry the metadata, then the private data may provide information to signal the
metadata service, such as the applied value of the module_id for carriage of the metadata in a data carousel, and the file
name of the metadata when the object carousel is used.
Receivers should be aware that multiple metadata services may be pointed to from the same program or audiovisual
stream (as defined by the context of the descriptor). A unique metadata pointer descriptor shall be used to point to each
metadata service used by the program or audiovisual stream. Similarly, the same metadata service can be pointed to
from several programs or audiovisual streams by using a separate metadata pointer descriptor for each association.
Table 2-83 – Metadata pointer descriptor
Syntax                                                  No. of bits   Mnemonic
Metadata_pointer_descriptor () {
    descriptor_tag                                      8             uimsbf
    descriptor_length                                   8             uimsbf
    metadata_application_format                         16            uimsbf
    if (metadata_application_format == 0xFFFF) {
        metadata_application_format_identifier          32            uimsbf
    }
    metadata_format                                     8             uimsbf
    if (metadata_format == 0xFF) {
        metadata_format_identifier                      32            uimsbf
    }
    metadata_service_id                                 8             uimsbf
    metadata_locator_record_flag                        1             bslbf
    MPEG_carriage_flags                                 2             uimsbf
    reserved                                            5             bslbf
    if (metadata_locator_record_flag == '1') {
        metadata_locator_record_length                  8             uimsbf
        for (i = 0; i < metadata_locator_record_length; i++) {
            metadata_locator_record_byte                8             bslbf
        }
    }
    if (MPEG_carriage_flags == 0|1|2) {
        program_number                                  16            uimsbf
    }
    if (MPEG_carriage_flags == 1) {
        transport_stream_location                       16            uimsbf
        transport_stream_id                             16            uimsbf
    }
    for (i = 0; i < N; i++) {
        private_data_byte                               8             bslbf
    }
}
2.6.59 Semantic definition of fields in metadata pointer descriptor
metadata_application_format: The metadata_application_format is a 16-bit field that specifies the application
responsible for defining usage, syntax and semantics of the metadata_locator_record and any other privately
defined fields in this descriptor. The coding of this field is defined in Table 2-81 in 2.6.57.
metadata_application_format_identifier: The coding of this field is defined in subclause 2.6.57.
metadata_format: The metadata_format is an 8-bit field that indicates the format and coding of the metadata. The
coding of this field is specified in Table 2-84.
Table 2-84 – Metadata format values
Value        Description
0x00-0x0F    Reserved
0x10         ISO/IEC 15938-1 TeM
0x11         ISO/IEC 15938-1 BiM
0x12-0x3E    Reserved
0x3F         Defined by metadata application format
0x40-0xFE    Private use
0xFF         Defined by metadata_format_identifier field
The values 0x10 and 0x11 identify ISO/IEC 15938-1 defined data. The value 0x3F indicates that the format is defined
by the body indicated by the metadata_application_format field. The values in the inclusive range of 0x40 up to 0xFE
are available to signal use of private formats. The value 0xFF indicates that the format is signalled by the
metadata_format_identifier field.
metadata_format_identifier: The coding of this 32-bit field is fully equivalent to the coding of the format_identifier
field in the registration_descriptor, as defined in 2.6.8.
NOTE – SMPTE is assigned as Registration Authority for the format_identifier field.
metadata_service_id: This 8-bit field references the metadata service. It is used for retrieving a metadata service from
within a metadata stream.
metadata_locator_record_flag: The metadata_locator_record_flag is a 1-bit field which, when set to '1', indicates that
associated metadata is available at a location outside of an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, as specified
in a metadata_locator_record.
MPEG_carriage_flags: The MPEG_carriage_flags is a 2-bit field which specifies if the metadata stream containing the
associated metadata service is carried in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, and if so, whether the
associated metadata is carried in a Transport Stream or Program Stream. The coding of this field is defined in
Table 2-85.
Table 2-85 – MPEG_carriage_flags
Value   Description
0       Carriage in the same Transport Stream where this metadata pointer descriptor is carried.
1       Carriage in a different Transport Stream from where this metadata pointer descriptor is carried.
2       Carriage in a Program Stream. This may or may not be the same Program Stream in which this
        metadata pointer descriptor is carried.
3       None of the above.
metadata_locator_record_length: The metadata_locator_record_length is an 8-bit field that specifies the number of
metadata_locator_record_bytes immediately following. This field shall not be coded with the value '0'.
metadata_locator_record_byte: The metadata_locator_record_byte is part of a string of one or more contiguous bytes
that form the metadata locator record. This record specifies one or more locations outside of an ITU-T Rec. H.222.0 |
ISO/IEC 13818-1 stream. The format of the metadata locator record is defined by the metadata application signalled by
the metadata_application_format field. The record may for example contain Internet URLs that specify where the
metadata can be found, possibly in addition to their location(s) in the Transport Stream. If the MPEG_carriage_flags is
coded with the value '0', '1' or '2' and the metadata locator record is present, then this signals alternative locations for the
same metadata.
program_number: The program_number is a 16-bit field that identifies the program_number of the MPEG-2 program
in the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream in which associated metadata is carried. If the
MPEG_carriage_flags have the value '0', then the transport stream is the current one, and if the MPEG_carriage_flags
have the value '1', it is the transport stream signalled by the fields transport_stream_location and transport_stream_id.
transport_stream_location: The transport_stream_location is a 16-bit field that is defined privately. For example, this
field may be used by applications to signal the original_network_id defined by ETSI.
transport_stream_id: The transport_stream_id is a 16-bit field that identifies the Transport Stream in which associated
metadata is carried.
private_data_byte: The private_data_byte is an 8-bit field. The private_data_bytes represent data, the format of which
is defined privately. These bytes can be used to provide additional information as deemed appropriate.
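The program_number and Transport Stream fields of Table 2-83 are present only for particular MPEG_carriage_flags values, so a parser has to follow the flags. The Python sketch below (hypothetical names) walks the syntax; it does not check the descriptor_tag and returns the metadata_locator_record and private data as raw bytes, since their format is defined by the metadata application.

```python
def parse_metadata_pointer_descriptor(desc: bytes) -> dict:
    """Walk the Metadata_pointer_descriptor syntax of Table 2-83."""
    if len(desc) < 2:
        raise ValueError("missing descriptor_tag/descriptor_length")
    body = desc[2:2 + desc[1]]
    if len(body) != desc[1]:
        raise ValueError("truncated descriptor")
    pos = 0

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(body):
            raise ValueError("descriptor body too short for its flags")
        chunk = body[pos:pos + n]
        pos += n
        return chunk

    def uint(n: int) -> int:
        return int.from_bytes(take(n), "big")

    out = {"metadata_application_format": uint(2)}
    if out["metadata_application_format"] == 0xFFFF:
        out["metadata_application_format_identifier"] = uint(4)
    out["metadata_format"] = uint(1)
    if out["metadata_format"] == 0xFF:
        out["metadata_format_identifier"] = uint(4)
    out["metadata_service_id"] = uint(1)
    flags = uint(1)
    carriage = (flags >> 5) & 0x03  # MPEG_carriage_flags (Table 2-85)
    out["MPEG_carriage_flags"] = carriage
    if flags & 0x80:  # metadata_locator_record_flag
        out["metadata_locator_record"] = take(uint(1))
    if carriage in (0, 1, 2):
        out["program_number"] = uint(2)
    if carriage == 1:
        out["transport_stream_location"] = uint(2)
        out["transport_stream_id"] = uint(2)
    out["private_data"] = bytes(body[pos:])
    return out
```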
2.6.60 Metadata descriptor
The metadata descriptor specifies parameters of a metadata service carried in an MPEG-2 TS or PS. In an MPEG-2 TS,
the descriptor is included in the PMT in the descriptor loop for the elementary stream that carries the metadata service.
The descriptor specifies the format of the associated metadata, and contains the value of the metadata_service_id to
identify the metadata service to which the metadata descriptor applies. As needed, the descriptor can convey
information to identify the metadata service from a collection of metadata transmitted in a DSM-CC carousel.
Optionally, metadata application format specific private data can be carried.
The metadata descriptor also signals whether decoder configuration is required and is able to carry the decoder
configuration bytes, but this is only practical if the number of these bytes is small. If the decoder configuration
information is too large to be carried by the descriptor, it shall be contained in a metadata service. This may be within
the metadata service itself, or in another metadata service within the same program. Identification of the metadata
service that contains the decoder configuration is provided by the metadata descriptor. If a DSM-CC carousel is used to
carry the decoder configuration, then information can be provided on how to retrieve the decoder configuration from the
carousel.
Table 2-86 – Metadata descriptor
Syntax                                                  No. of bits   Mnemonic
Metadata_descriptor () {
    descriptor_tag                                      8             uimsbf
    descriptor_length                                   8             uimsbf
    metadata_application_format                         16            uimsbf
    if (metadata_application_format == 0xFFFF) {
        metadata_application_format_identifier          32            uimsbf
    }
    metadata_format                                     8             uimsbf
    if (metadata_format == 0xFF) {
        metadata_format_identifier                      32            uimsbf
    }
    metadata_service_id                                 8             uimsbf
    decoder_config_flags                                3             bslbf
    DSM-CC_flag                                         1             bslbf
    reserved                                            4             bslbf
    if (DSM-CC_flag == '1') {
        service_identification_length                   8             uimsbf
        for (i = 0; i < service_identification_length; i++) {
            service_identification_record_byte          8             bslbf
        }
    }
    if (decoder_config_flags == '001') {
        decoder_config_length                           8             uimsbf
        for (i = 0; i < decoder_config_length; i++) {
            decoder_config_byte                         8             bslbf
        }
    }
    if (decoder_config_flags == '011') {
        dec_config_identification_record_length         8             uimsbf
        for (i = 0; i < dec_config_identification_record_length; i++) {
            dec_config_identification_record_byte       8             bslbf
        }
    }
    if (decoder_config_flags == '100') {
        decoder_config_metadata_service_id              8             uimsbf
    }
    if (decoder_config_flags == '101'|'110') {
        reserved_data_length                            8             uimsbf
        for (i = 0; i < reserved_data_length; i++) {
            reserved                                    8             bslbf
        }
    }
    for (i = 0; i < N; i++) {
        private_data_byte                               8             bslbf
    }
}
2.6.61 Semantic definition of fields in metadata descriptor
metadata_application_format: The metadata_application_format is a 16-bit field that specifies the application
responsible for defining usage, syntax and semantics of the service_identification_record and any privately defined
bytes in this descriptor. The coding of this field is defined in Table 2-81.
metadata_application_format_identifier: The coding of this field is defined in 2.6.57.
metadata_format: The coding of this field is defined in 2.6.59.
metadata_format_identifier: The coding of this field is defined in 2.6.59.
metadata_service_id: This 8-bit field identifies the metadata service to which this metadata descriptor applies.
decoder_config_flags: The decoder_config_flags is a 3-bit field which indicates whether and how decoder
configuration information is conveyed.
Table 2-87 – decoder_config_flags
Value      Description
000        No decoder configuration is needed.
001        The decoder configuration is carried in this descriptor in the decoder_config_byte field.
010        The decoder configuration is carried in the same metadata service to which this metadata descriptor
           applies.
011        The decoder configuration is carried in a DSM-CC carousel. This value shall only be used if the metadata
           service to which this descriptor applies is using the same type of DSM-CC carousel.
100        The decoder configuration is carried in another metadata service within the same program, as identified
           by the decoder_config_metadata_service_id field in this metadata descriptor.
101, 110   Reserved.
111        Privately defined.
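A receiver can dispatch on decoder_config_flags to find out where the decoder configuration lives. The Python sketch below (hypothetical names) walks the Table 2-86 syntax; it does not check the descriptor_tag, returns application-defined records as raw bytes, and reports no decoder configuration fields for the values 000, 010 and 111, for which Table 2-86 defines none.

```python
def parse_metadata_descriptor(desc: bytes) -> dict:
    """Walk the Metadata_descriptor syntax of Table 2-86."""
    if len(desc) < 2:
        raise ValueError("missing descriptor_tag/descriptor_length")
    body = desc[2:2 + desc[1]]
    if len(body) != desc[1]:
        raise ValueError("truncated descriptor")
    pos = 0

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(body):
            raise ValueError("descriptor body too short for its flags")
        chunk = body[pos:pos + n]
        pos += n
        return chunk

    def uint(n: int) -> int:
        return int.from_bytes(take(n), "big")

    out = {"metadata_application_format": uint(2)}
    if out["metadata_application_format"] == 0xFFFF:
        out["metadata_application_format_identifier"] = uint(4)
    out["metadata_format"] = uint(1)
    if out["metadata_format"] == 0xFF:
        out["metadata_format_identifier"] = uint(4)
    out["metadata_service_id"] = uint(1)
    flags = uint(1)
    config = flags >> 5  # decoder_config_flags (3 bits)
    out["decoder_config_flags"] = config
    out["DSM-CC_flag"] = (flags >> 4) & 1
    if out["DSM-CC_flag"]:
        out["service_identification_record"] = take(uint(1))
    if config == 0b001:    # configuration carried in this descriptor
        out["decoder_config"] = take(uint(1))
    elif config == 0b011:  # configuration retrieved from a DSM-CC carousel
        out["dec_config_identification_record"] = take(uint(1))
    elif config == 0b100:  # configuration in another metadata service
        out["decoder_config_metadata_service_id"] = uint(1)
    elif config in (0b101, 0b110):
        take(uint(1))      # skip reserved data
    out["private_data"] = bytes(body[pos:])
    return out
```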
DSM-CC_flag: This is a one-bit flag that is set to '1' if the stream with which this descriptor is associated is carried in
an ISO/IEC 13818-6 data or object carousel.
NOTE 1 – The use of the object or data carousel is indicated by the applied stream-type value for this metadata stream.
service_identification_length: This field specifies the number of service_identification_record_bytes immediately
following.
service_identification_record_byte: This byte is part of a string of one or more contiguous bytes that specify the
service_identification_record. This record contains data on retrieval of the metadata service from a DSM-CC carousel.
The format of the metadata locator record is defined by the application indicated by the metadata application format.
When a DSM-CC object carousel is used, the record may, for example, comprise the unique object identifier (the IOP:IOR() from 11.3.1 and 5.7.2.3 of ISO/IEC 13818-6 DSM-CC) for the metadata service. Similarly, in the case of a DSM-CC data carousel, the record can, for example, provide the transaction_id and the module_id of the metadata service.
decoder_config_length: This field specifies the number of decoder_config_bytes immediately following.
decoder_config_byte: These bytes comprise the configuration information needed by the receiver to decode this service. Carriage in the metadata descriptor is intended to be used only when the configuration information is very small.
decoder_config_DSM-CC_id: This is the download identifier of the decoder configuration information when it is
transmitted in a DSM-CC data carousel, or the object identifier of the decoder configuration information if it is carried
in a DSM-CC object carousel.
NOTE 2 – The use of the object or data carousel is indicated by the applied stream-type value for this metadata stream.
dec_config_identification_record_length: This field specifies the number of dec_config_identification_record_bytes immediately following.
dec_config_identification_record_byte: This byte is part of a string of one or more contiguous bytes that specify the
dec_config_identification_record. This record specifies how to retrieve the required decoder configuration from a
DSM-CC carousel. The format of the metadata locator record is defined by the metadata application format. When a
DSM-CC object carousel is used, the record may for example comprise the unique object identifier (the IOP:IOR() from
11.3.1 and 5.7.2.3 of ISO/IEC 13818-6 DSM-CC) for the decoder configuration. Similarly, in the case of a DSM-CC data carousel, the record may, for example, provide the transaction_id and the module_id of the decoder configuration.
decoder_config_metadata_service_id: This is the value of the metadata_service_id that is assigned to the metadata service that contains the decoder configuration. The metadata service indicated by the decoder_config_metadata_service_id and the metadata service that uses that decoder configuration shall be in the same program. Hence, in a Transport Stream, the metadata descriptors for both these metadata services shall be in the same PMT. The metadata descriptor of the metadata service indicated by the decoder_config_metadata_service_id shall have a decoder_config_flags field with a value of '001', '010' or '011'.
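The cross-reference rule for decoder_config_metadata_service_id lends itself to a consistency check over the metadata descriptors of one PMT. The following Python sketch assumes the descriptors have already been parsed into the hypothetical MetadataServiceEntry records shown; it reports references to services absent from the same PMT, and references to a service whose decoder_config_flags is not '001', '010' or '011'.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetadataServiceEntry:
    """Hypothetical, simplified view of one metadata descriptor found in a PMT."""
    metadata_service_id: int
    decoder_config_flags: int
    decoder_config_metadata_service_id: Optional[int] = None


# Flag values permitted on the descriptor of the service that holds the configuration.
ALLOWED_TARGET_FLAGS = {0b001, 0b010, 0b011}


def check_config_references(pmt_entries: List[MetadataServiceEntry]) -> List[str]:
    """Check decoder_config_metadata_service_id references within one PMT."""
    by_id = {entry.metadata_service_id: entry for entry in pmt_entries}
    problems = []
    for entry in pmt_entries:
        if entry.decoder_config_flags != 0b100:
            continue  # only '100' refers to another metadata service
        target_id = entry.decoder_config_metadata_service_id
        target = by_id.get(target_id)
        if target is None:
            problems.append(
                f"service {entry.metadata_service_id}: service {target_id} is not in this PMT"
            )
        elif target.decoder_config_flags not in ALLOWED_TARGET_FLAGS:
            problems.append(
                f"service {entry.metadata_service_id}: service {target_id} has "
                f"decoder_config_flags {target.decoder_config_flags:03b}"
            )
    return problems
```

Restricting the lookup to a single PMT mirrors the requirement that both metadata services belong to the same program.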
reserved_data_length: This field specifies the number of reserved bytes immediately following.
private_data_byte: The private_data_byte is an 8-bit field. The private_data_bytes represent data, the format of which
is defined privately. These bytes can be used to provide additional information as deemed appropriate.
2.6.62 Metadata STD descriptor
This descriptor defines parameters of the STD model (defined in 2.12.10) for the processing of the metadata stream to which this descriptor is associated.
Table 2-88 – Metadata STD descriptor
Syntax                                      No. of bits   Mnemonic
Metadata_STD_descriptor () {
    descriptor_tag                          8             uimsbf
    descriptor_length                       8             uimsbf
    reserved                                2             bslbf
    metadata_input_leak_rate                22            uimsbf
    reserved                                2             bslbf
    metadata_buffer_size                    22            uimsbf
    reserved                                2             bslbf
    metadata_output_leak_rate               22            uimsbf
}
2.6.63 Semantic definition of fields in metadata STD descriptor
metadata_input_leak_rate: The metadata_input_leak_rate is a 22-bit field that specifies the leak rate for the associated
metadata stream in the T-STD model out of the buffer TBn into buffer Bn. The leak rate is specified in units of
400 bits/s. For metadata carried in a program stream, the coding of the metadata_input_leak_rate field is not specified,
as the rate into Bn equals the rate of the program stream.
metadata_buffer_size: The metadata_buffer_size is a 22-bit field that specifies the size of buffer Bn in the STD model
for the associated metadata stream. The size of Bn is specified in units of 1024 bytes.
metadata_output_leak_rate: The metadata_output_leak_rate is a 22-bit field that specifies for the associated metadata
service the leak rate in the STD model out of buffer Bn to the decoder. The leak rate is specified in units of 400 bits/s.
For metadata streams transported synchronously (stream-type 0x15 or 0x19), the metadata access units are
instantaneously removed from Bn under the control of PTS timestamps and in that case the coding of the
metadata_output_leak_rate field is not specified.
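As an example of applying these units, the following Python sketch decodes a Metadata_STD_descriptor laid out as in Table 2-88 and scales the three 22-bit fields to bits/s and bytes. The class and function names are hypothetical, the descriptor_tag value is not checked, and the two reserved bits before each field are simply masked off.

```python
from dataclasses import dataclass


@dataclass
class MetadataSTDParams:
    input_leak_rate_bps: int   # TBn -> Bn (not specified for program streams)
    buffer_size_bytes: int     # size of Bn
    output_leak_rate_bps: int  # Bn -> decoder (not specified for stream-type 0x15/0x19)


def parse_metadata_std_descriptor(data: bytes) -> MetadataSTDParams:
    """Decode a Metadata_STD_descriptor (Table 2-88); the tag value is not checked."""
    if len(data) < 11 or data[1] < 9:
        raise ValueError("Metadata_STD_descriptor needs a 9-byte body")
    # Each field occupies 24 bits: 2 reserved bits followed by a 22-bit value.
    raw = [int.from_bytes(data[i:i + 3], "big") & 0x3FFFFF for i in (2, 5, 8)]
    return MetadataSTDParams(
        input_leak_rate_bps=raw[0] * 400,   # units of 400 bits/s
        buffer_size_bytes=raw[1] * 1024,    # units of 1024 bytes
        output_leak_rate_bps=raw[2] * 400,  # units of 400 bits/s
    )
```

For a program stream, or for synchronously carried metadata (stream-type 0x15 or 0x19), the corresponding leak-rate values returned by this sketch carry no specified meaning, as stated in the semantics above.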
2.6.64 AVC video descriptor
For ITU-T Rec. H.264 | ISO/IEC 14496-10 video streams, the AVC video descriptor provides basic information for identifying coding parameters of the associated AVC video stream, such as the profile and level parameters included in the SPS of an AVC video stream.
The AVC video descriptor also signals the presence of AVC still pictures and the presence of AVC 24-hour pictures in
the AVC video stream. If this descriptor is not included in the PMT for an AVC video stream in a transport stream or in
the PSM, if present, for an AVC video stream in a program stream, then such AVC video stream shall not contain AVC
still pictures and shall not contain AVC 24-hour pictures. (See Table 2-89.)
Table 2-89 – AVC video descriptor
Syntax                                      No. of bits   Mnemonic
AVC_video_descriptor () {
    descriptor_tag                          8             uimsbf
    descriptor_length                       8             uimsbf
    profile_idc                             8             uimsbf
    constraint_set0_flag                    1             bslbf
    constraint_set1_flag                    1             bslbf
    constraint_set2_flag                    1             bslbf
    AVC_compatible_flags                    5             bslbf
    level_idc                               8             uimsbf
    AVC_still_present                       1             bslbf
    AVC_24_hour_picture_flag                1             bslbf
    reserved                                6             bslbf
}
2.6.65 Semantic definition of fields in AVC video descriptor
profile_idc, constraint_set0_flag, constraint_set1_flag, constraint_set2_flag, AVC_compatible_flags and level_idc – These fields, with the exception of AVC_compatible_flags, shall be coded according to the semantics for these fields defined in ITU-T Rec. H.264 | ISO/IEC 14496-10. The semantics of AVC_compatible_flags are exactly equal to the semantics of the field(s) defined for the 5 bits between the constraint_set2_flag and the level_idc field in the Sequence Parameter Set, as defined in ITU-T Rec. H.264 | ISO/IEC 14496-10. The entire AVC video stream to which the AVC video descriptor is associated shall conform to the profile, level and constraints signalled by these fields.
NOTE – In one or more sequences in the AVC video stream, the level may be lower than the level signalled in the AVC video descriptor, and a profile may occur that is a subset of the profile signalled in the AVC video descriptor. However, throughout the entire AVC video stream, only tools that are included in the profile signalled in the AVC video descriptor, if present, shall be used. For example, if the main profile is signalled, then the baseline profile may be used in some sequences, but only using those tools that are in the main profile. If the sequence parameter sets in an AVC video stream signal different profiles, and no additional constraints are signalled, then the stream may need examination to determine which profile, if any, the entire stream conforms to. If an AVC video descriptor is to be associated with an AVC video stream that does not conform to a single profile, then the AVC video stream must be partitioned into two or more sub-streams, so that AVC video descriptors can signal a single profile for each such sub-stream.
AVC_still_present – This 1-bit field when set to '1' indicates that the AVC video stream may include AVC still
pictures. When set to '0', then the associated AVC video stream shall not contain AVC still pictures.
AVC_24_hour_picture_flag – This 1-bit flag when set to '1' indicates that the associated AVC video stream may
contain AVC 24-hour pictures. For the definition of an AVC 24-hour picture, see 2.1.2. If this flag is set to '0', the
associated AVC video stream shall not contain any AVC 24-hour picture.
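A receiver might read the AVC video descriptor of Table 2-89 as sketched below in Python. The names are hypothetical and the descriptor_tag is not validated; the helper may_contain_avc_stills reflects the rule that an AVC video stream without this descriptor shall not contain AVC still pictures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AVCVideoDescriptor:
    profile_idc: int
    constraint_set0_flag: bool
    constraint_set1_flag: bool
    constraint_set2_flag: bool
    avc_compatible_flags: int  # the 5 bits following constraint_set2_flag
    level_idc: int
    avc_still_present: bool
    avc_24_hour_picture_flag: bool


def parse_avc_video_descriptor(data: bytes) -> AVCVideoDescriptor:
    """Parse descriptor_tag, descriptor_length and the 4-byte body of Table 2-89."""
    if len(data) < 2 or data[1] < 4 or len(data) < 2 + data[1]:
        raise ValueError("truncated AVC video descriptor")
    profile_idc, constraints, level_idc, tail = data[2:6]
    return AVCVideoDescriptor(
        profile_idc=profile_idc,
        constraint_set0_flag=bool(constraints & 0x80),
        constraint_set1_flag=bool(constraints & 0x40),
        constraint_set2_flag=bool(constraints & 0x20),
        avc_compatible_flags=constraints & 0x1F,
        level_idc=level_idc,
        avc_still_present=bool(tail & 0x80),
        avc_24_hour_picture_flag=bool(tail & 0x40),
        # the remaining 6 bits of `tail` are reserved and ignored
    )


def may_contain_avc_stills(descriptor: Optional[AVCVideoDescriptor]) -> bool:
    """Without an AVC video descriptor, the stream shall not contain AVC still pictures."""
    return descriptor is not None and descriptor.avc_still_present
```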
2.6.66 AVC timing and HRD descriptor
The AVC timing and HRD descriptor provides timing and HRD parameters of the associated AVC video stream. For each AVC video stream carried in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, the AVC timing and HRD descriptor shall be included in the PMT or in the PSM, if PSM is present in the program stream, unless the AVC video stream carries VUI parameters with the timing_info_present_flag set to '1':
• for each IDR picture; and
• for each picture that is associated with a recovery point SEI message.
Absence of the AVC timing and HRD descriptor in the PMT for an AVC video stream signals usage in the T-STD of the leak method, as defined in 2.14.3.1, for the transfer from MBn to EBn; such usage can also be signalled by the hrd_management_valid_flag set to '0' in the AVC timing and HRD descriptor. If the transfer rate into buffer EBn can be determined from HRD parameters contained in an AVC video stream, and if this transfer rate is used in the T-STD for the transfer from MBn to EBn, then the AVC timing and HRD descriptor with the hrd_management_valid_flag set to '1' shall be included in the PMT for that AVC video stream. (See Table 2-90.)
Table 2-90 – AVC timing and HRD descriptor
Syntax                                          No. of bits   Mnemonic
AVC_timing_and_HRD_descriptor () {
    descriptor_tag                              8             uimsbf
    descriptor_length                           8             uimsbf
    hrd_management_valid_flag                   1             bslbf
    reserved                                    6             bslbf
    picture_and_timing_info_present             1             bslbf
    if (picture_and_timing_info_present) {
        90kHz_flag                              1             bslbf
        reserved                                7             bslbf
        if (90kHz_flag == '0') {
            N                                   32            uimsbf
            K                                   32            uimsbf
        }
        num_units_in_tick                       32            uimsbf
    }
    fixed_frame_rate_flag                       1             bslbf
    temporal_poc_flag                           1             bslbf
    picture_to_display_conversion_flag          1             bslbf
    reserved                                    5             bslbf
}
2.6.67 Semantic definition of fields in AVC timing and HRD descriptor
hrd_management_valid_flag – This 1-bit field is only defined for use in transport streams.
When the AVC timing and HRD descriptor is associated to an AVC video stream carried in a transport stream, then the
following applies. If the hrd_management_valid_flag is set to '1', then Buffering Period SEI and Picture Timing SEI
messages, as defined in Annex C of ITU-T Rec. H.264 | ISO/IEC 14496-10, shall be present in the associated AVC
video stream. These Buffering Period SEI messages shall carry coded initial_cpb_removal_delay and
initial_cpb_removal_delay_offset values for the NAL HRD. If the hrd_management_valid_flag is set to '1', then the
transfer of each byte from MBn to EBn in the T-STD shall be according to the delivery schedule for that byte into the
CPB in the NAL HRD, as determined from the coded initial_cpb_removal_delay and initial_cpb_removal_delay_offset
values for SchedSelIdx = cpb_cnt_minus1. When the hrd_management_valid_flag is set to '0', the leak method as
defined in 2.14.3.1 shall be used for the transfer from MBn to EBn in the T-STD.
When the AVC timing and HRD descriptor is associated to an AVC video stream carried in a program stream, then the
meaning of the hrd_management_valid_flag is not defined.
picture_and_timing_info_present – This 1-bit field when set to '1' indicates that the 90kHz_flag and parameters for accurate mapping to the 90-kHz system clock are included in this descriptor.
90kHz_flag, N, K – The 90kHz_flag when set to '1' indicates that the frequency of the AVC time base is 90 kHz. For
an AVC video stream the frequency of the AVC time base is defined by the AVC parameter time_scale in VUI
parameters, as defined in Annex E of ITU-T Rec. H.264 | ISO/IEC 14496-10. The relationship between the AVC
time_scale and the STC shall be defined by the parameters N and K in this descriptor as follows.
$$\text{time\_scale} = \frac{N \times \text{system\_clock\_frequency}}{K}$$
where time_scale denotes the exact frequency of the AVC time base, with K larger than or equal to N.
If the 90kHz_flag is set to '1', then N equals 1 and K equals 300. If the 90kHz_flag is set to '0', then the values of N and
K are provided by the coded values of the N and K fields.
NOTE 1 – This allows mapping of time expressed in units of time_scale to 90-kHz units, as needed for the calculation of PTS
and DTS timestamps, for example in decoders for AVC access units for which no PTS or DTS is encoded in the PES header.
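The relationship between time_scale and the STC can be worked through with exact rational arithmetic. This Python sketch, with hypothetical function names, derives the effective N and K (1 and 300 when 90kHz_flag is '1'), computes time_scale, and converts a duration in time_scale units into the 90-kHz units used for PTS and DTS.

```python
from fractions import Fraction
from typing import Tuple

SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz (system_clock_frequency, 2.4.2)


def n_and_k(flag_90khz: bool, coded_n: int = 0, coded_k: int = 0) -> Tuple[int, int]:
    """Effective N and K: fixed at 1 and 300 when 90kHz_flag is '1'."""
    if flag_90khz:
        return 1, 300
    if not 0 < coded_n <= coded_k:
        raise ValueError("N must be positive and K must be greater than or equal to N")
    return coded_n, coded_k


def avc_time_scale(n: int, k: int) -> Fraction:
    """time_scale = N x system_clock_frequency / K, as an exact rational."""
    return Fraction(n * SYSTEM_CLOCK_FREQUENCY, k)


def to_90khz_units(ticks: int, n: int, k: int) -> Fraction:
    """Convert a duration in time_scale units to 90-kHz units."""
    return ticks * Fraction(90_000) / avc_time_scale(n, k)
```

For instance, with N = 1 and K = 1125 the AVC time base runs at 24 000 Hz, so 1001 time_scale units correspond to 3753.75 units of 90 kHz; the sketch keeps the fractional result exact and leaves any rounding to the caller.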
num_units_in_tick – Coded exactly in the same way as the num_units_in_tick field in VUI parameters in Annex E of
ITU-T Rec. H.264 | ISO/IEC 14496-10. The information provided by this field shall apply to the entire AVC video
stream to which the AVC timing and HRD descriptor is associated.
fixed_frame_rate_flag – Coded exactly in the same way as the fixed_frame_rate_flag in VUI parameters in Annex E
of ITU-T Rec. H.264 | ISO/IEC 14496-10. When this flag is set to '1', it indicates that the coded frame rate is constant
within the associated AVC video stream. When this flag is set to '0', no information about the frame rate of the
associated AVC video stream is provided in this descriptor.
temporal_poc_flag – When the temporal_poc_flag is set to '1' and the fixed_frame_rate_flag is set to '1', then the
associated AVC video stream shall carry Picture Order Count (POC) information (PicOrderCnt) whereby pictures are
counted in units of $\Delta t_{fi,dpb}(n)$, where $\Delta t_{fi,dpb}(n)$ is specified in equation E-10 of ITU-T Rec. H.264 | ISO/IEC 14496-10.
When the temporal_poc_flag is set to '0', no information is conveyed regarding any potential relationship between the
POC information in the AVC video stream and time.
NOTE 2 – This reduces the overhead necessary to signal timing for each access unit. An effective PTS and DTS can be calculated for access units for which no explicit PTS/DTS is carried. Repetition of the most recently presented field of the appropriate parity (or frame) is implied when the difference between the PTSs of the current and the next picture is greater than $2 \times \Delta t_{fi,dpb}$ (or greater than $\Delta t_{fi,dpb}$ when frame_mbs_only_flag is equal to 1).
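To illustrate NOTE 2, the following Python sketch derives an effective PTS for an access unit that carries no explicit PTS from its PicOrderCnt, given an anchor picture with a coded PTS. It assumes temporal_poc_flag and fixed_frame_rate_flag are both '1', that the POC counting unit has already been expressed in 90-kHz units (passed in as the hypothetical parameter delta_t_90k), and that rounding to the nearest tick is acceptable; all names are hypothetical.

```python
from fractions import Fraction

PTS_MODULUS = 1 << 33  # PTS values are 33-bit counts of the 90-kHz clock


def pts_from_poc(anchor_pts: int, anchor_poc: int, poc: int, delta_t_90k: Fraction) -> int:
    """Effective PTS from PicOrderCnt when pictures are counted in units of delta_t_90k.

    delta_t_90k is the POC counting unit expressed in 90-kHz units; obtaining it
    from the stream is left to the caller (assumption). The result is rounded to
    the nearest 90-kHz tick (assumption) and wrapped to 33 bits.
    """
    offset = round((poc - anchor_poc) * delta_t_90k)
    return (anchor_pts + offset) % PTS_MODULUS
```

With delta_t_90k = 3003/2, a POC difference of 4 corresponds to 6006 ticks; negative differences, such as for pictures presented before the anchor, are handled the same way.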
picture_to_display_conversion_flag – This 1-bit field when set to '1' indicates that the associated AVC video stream
may carry display information on coded pictures by providing the pic_struct field in picture_timing SEI messages (see
Annex D of ITU-T Rec. H.264 | ISO/IEC 14496-10) and/or by providing the Picture Order Count (POC) information
(PicOrderCnt), whereby pictures are counted in units of $\Delta t_{fi,dpb}(n)$ (see also the semantics of temporal_poc_flag), so
that timing information for a successive AVC access unit can be derived from the previous picture in decoding or
presentation order.
When the picture_to_display_conversion_flag is set to '0', then picture timing SEI messages in the AVC video stream, if present, shall not contain the pic_struct field, and hence the pic_struct_present_flag shall be set to '0' in the VUI parameters in the AVC video stream.
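Because N, K and num_units_in_tick are conditional, a parser for Table 2-90 has to follow both if-branches. The Python sketch below is one way to do that; the names are hypothetical, the descriptor_tag is not checked, and when 90kHz_flag is '1' it substitutes N = 1 and K = 300 as the semantics require.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class AVCTimingHRD:
    hrd_management_valid: bool
    n: Optional[int]                  # None when picture_and_timing_info_present is '0'
    k: Optional[int]
    num_units_in_tick: Optional[int]
    fixed_frame_rate: bool
    temporal_poc: bool
    picture_to_display_conversion: bool


def parse_avc_timing_hrd_descriptor(data: bytes) -> AVCTimingHRD:
    """Walk the conditional syntax of Table 2-90 (descriptor_tag is not checked)."""
    if len(data) < 2:
        raise ValueError("missing descriptor header")
    body = data[2:2 + data[1]]
    if len(body) != data[1]:
        raise ValueError("truncated AVC timing and HRD descriptor")
    pos = 0

    def take(count: int) -> bytes:
        nonlocal pos
        if pos + count > len(body):
            raise ValueError("descriptor_length too small for the signalled fields")
        chunk = body[pos:pos + count]
        pos += count
        return chunk

    first = take(1)[0]
    n = k = num_units_in_tick = None
    if first & 0x01:                      # picture_and_timing_info_present
        if take(1)[0] & 0x80:             # 90kHz_flag set: N = 1, K = 300
            n, k = 1, 300
        else:
            n, k = struct.unpack(">II", take(8))
        (num_units_in_tick,) = struct.unpack(">I", take(4))
    last = take(1)[0]
    return AVCTimingHRD(
        hrd_management_valid=bool(first & 0x80),
        n=n,
        k=k,
        num_units_in_tick=num_units_in_tick,
        fixed_frame_rate=bool(last & 0x80),
        temporal_poc=bool(last & 0x40),
        picture_to_display_conversion=bool(last & 0x20),
    )
```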
2.6.68 MPEG-2 AAC audio descriptor
For individual ISO/IEC 13818-7 streams directly carried in PES packets, the MPEG-2 AAC audio descriptor defined in Table 2-91 provides basic information for identifying the coding parameters of such audio elementary streams.
Table 2-91 – MPEG-2 AAC_audio_descriptor
Syntax                                      No. of bits   Mnemonic
MPEG-2_AAC_audio_descriptor () {
    descriptor_tag                          8             uimsbf
    descriptor_length                       8             uimsbf
    MPEG-2_AAC_profile                      8             uimsbf
    MPEG-2_AAC_channel_configuration        8             uimsbf
    MPEG-2_AAC_additional_information       8             uimsbf
}
2.6.69 Semantic definition of fields in MPEG-2 AAC audio descriptor
MPEG-2_AAC_profile – This 8-bit field indicates the AAC profile according to the index in Table 31 of
ISO/IEC 13818-7:2006.
MPEG-2_AAC_channel_configuration – This 8-bit field indicates the number and configuration of audio channels
presented to the listener by the AAC decoder for the specified program. Values in the range from 1 to 6 indicate the number and configuration of audio channels as given for "Default bitstream index number" in Table 42 of ISO/IEC 13818-7:2006. All other values indicate that the number and configuration of audio channels are undefined.
MPEG-2_AAC_additional_information – This 8-bit field indicates whether or not bandwidth extension data as
defined in ISO/IEC 13818-7:2006 is embedded in the AAC bitstream according to Table 2-92.
Table 2-92 – MPEG-2_AAC_additional_information field values
Value        Description
0x00         AAC data according to ISO/IEC 13818-7:2006
0x01         AAC data with Bandwidth Extension data present according to ISO/IEC 13818-7:2006
0x02-0xFF    Reserved
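The three one-byte fields of Table 2-91 and the value ranges of Table 2-92 can be interpreted as in the Python sketch below. It uses hypothetical names and does not check the descriptor_tag; it treats a channel configuration as defined only for values 1 to 6 and returns None for the reserved MPEG-2_AAC_additional_information range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MPEG2AACAudioDescriptor:
    profile: int                 # index into Table 31 of ISO/IEC 13818-7:2006
    channel_configuration: int   # 1..6 per Table 42 of ISO/IEC 13818-7:2006
    additional_information: int  # Table 2-92

    def channel_configuration_defined(self) -> bool:
        return 1 <= self.channel_configuration <= 6

    def bandwidth_extension_present(self) -> Optional[bool]:
        """False for 0x00, True for 0x01, None for the reserved range 0x02-0xFF."""
        return {0x00: False, 0x01: True}.get(self.additional_information)


def parse_mpeg2_aac_audio_descriptor(data: bytes) -> MPEG2AACAudioDescriptor:
    if len(data) < 5 or data[1] < 3:
        raise ValueError("truncated MPEG-2 AAC audio descriptor")
    profile, channels, extra = data[2:5]
    return MPEG2AACAudioDescriptor(profile, channels, extra)
```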
2.7 Restrictions on the multiplexed stream semantics
2.7.1 Frequency of coding the system clock reference
The Program Stream shall be constructed such that the time interval between the bytes containing the last bit of system_clock_reference_base fields in successive packs shall be less than or equal to 0.7 s. Thus:
$$t(i) - t(i') \le 0.7 \text{ s}$$
for all i and i′ where i and i′ are the indexes of the bytes containing the last bit of consecutive
system_clock_reference_base fields.
2.7.2 Frequency of coding the program clock reference
The Transport Stream shall be constructed such that the time interval between the bytes containing the last bit of program_clock_reference_base fields in successive occurrences of the PCRs in Transport Stream packets of the PCR_PID for each program shall be less than or equal to 0.1 s. Thus:
$$t(i) - t(i') \le 0.1 \text{ s}$$
for all i and i′ where i and i′ are the indexes of the bytes containing the last bit of consecutive
program_clock_reference_base fields in the Transport Stream packets of the PCR_PID for each program.
There shall be at least two (2) PCRs, from the specified PCR_PID within a Transport Stream, between consecutive PCR
discontinuities (refer to 2.4.3.4) to facilitate phase locking and extrapolation of byte delivery times.
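A multiplex analyser could check the PCR constraints of 2.7.2 on the PCRs of one PCR_PID roughly as follows. This Python sketch assumes the PCRs have already been extracted, in arrival order, as 27-MHz values (base × 300 + extension) paired with the discontinuity_indicator of the carrying packet. Since the time encoded in a PCR is the T-STD arrival time of its last byte, successive PCR values are compared directly, with wrap-around of the 33-bit base tolerated; the two-PCR count is applied only to segments bounded by two discontinuities. All names are hypothetical.

```python
from typing import List, Tuple

SYSTEM_CLOCK_FREQUENCY = 27_000_000
PCR_MODULUS = (1 << 33) * 300                # 33-bit base times 300 extension steps
MAX_PCR_GAP = SYSTEM_CLOCK_FREQUENCY // 10   # 0.1 s in 27-MHz units


def pcr_value(base: int, extension: int) -> int:
    """Combine program_clock_reference_base and _extension into 27-MHz units."""
    return base * 300 + extension


def check_pcr_spacing(pcrs: List[Tuple[int, bool]], max_gap: int = MAX_PCR_GAP) -> List[str]:
    """pcrs: (PCR in 27-MHz units, discontinuity_indicator) in arrival order, one PCR_PID."""
    problems = []
    previous = None
    count = 0
    bounded = False  # True once the current segment started at a discontinuity
    for index, (pcr, discontinuity) in enumerate(pcrs):
        if discontinuity:
            if bounded and count < 2:
                problems.append(f"PCR {index}: fewer than two PCRs since the previous discontinuity")
            bounded, count, previous = True, 0, None
        if previous is not None:
            gap = (pcr - previous) % PCR_MODULUS  # tolerate base wrap-around
            if gap > max_gap:
                problems.append(f"PCR {index}: interval of {gap / SYSTEM_CLOCK_FREQUENCY:.3f} s")
        previous = pcr
        count += 1
    return problems
```

The gap test alone, called with max_gap set to 18 900 000 (0.7 s at 27 MHz) and all discontinuity flags false, could be reused for the SCR intervals of 2.7.1.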
2.7.3 Frequency of coding the elementary stream clock reference
The Program Stream and Transport Stream shall be constructed such that if the elementary stream clock reference field is coded in any PES packets containing data of a given elementary stream the time interval in the PES_STD between the bytes containing the last bit of successive ESCR_base fields shall be less than or equal to 0.7 s. In PES Streams the ESCR encoding is required with the same interval. Thus:
$$t(i) - t(i') \le 0.7 \text{ s}$$
for all i and i′ where i and i′ are the indexes of the bytes containing the last bits of consecutive ESCR_base fields.
NOTE – The coding of elementary stream clock reference fields is optional; they need not be coded. However, if they are coded,
this constraint applies.
2.7.4 Frequency of presentation timestamp coding
The Program Stream and Transport Stream shall be constructed so that the maximum difference between coded presentation timestamps referring to each elementary video or audio stream is 0.7 s. Thus:
$$tp_n(k) - tp_n(k'') \le 0.7 \text{ s}$$
for all n, k, and k″ satisfying:
• Pn(k) and Pn(k″) are presentation units for which presentation timestamps are coded;
• k and k″ are chosen so that there is no presentation unit Pn(k′) with a coded presentation timestamp and with k < k′ < k″; and
• no decoding discontinuity exists in elementary stream n between Pn(k) and Pn(k″).
The 0.7-s constraint does not apply in the case of:
• still pictures as defined in 2.1;
• AVC still pictures;
• AVC access units with a very low frame rate, where the presentation times of successive access units differ by more than 0.7 s. In this particular case, the VUI parameters num_units_in_tick and time_scale shall be present either in the AVC video stream or in an AVC timing and HRD descriptor associated with the AVC video stream.
NOTE – The presentation time of an AVC access unit is equivalent to the DPB output time $t_{o,dpb}(n)$ defined in Annex C of ITU-T Rec. H.264 | ISO/IEC 14496-10.
2.7.5 Conditional coding of timestamps
For each elementary stream of a Program Stream or Transport Stream, a presentation timestamp (PTS) shall be encoded
for the first access unit.
A decoding discontinuity exists at the start of an access unit An(j) in an elementary stream n if the decoding time tdn(j)
of that access unit is greater than the largest value permissible given the specified tolerance on the
system_clock_frequency. For video, except when trick mode status is true or when the low_delay flag is '1', this is allowed
only at the start of a video sequence. If a decoding discontinuity exists in any elementary video or audio stream in the
Transport Stream or Program Stream, then a PTS shall be encoded referring to the first access unit after each decoding
discontinuity except when trick mode status is true.
When low_delay is '1' a PTS shall be encoded for the first access unit after an EBn or Bn underflow.
A PTS may only be present in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 video or audio elementary stream PES packet header if the first byte of a picture start code or the first byte of an audio access unit is contained in the PES packet.
A decoding_timestamp (DTS) shall appear in a PES packet header if and only if the following two conditions are met:
• a PTS is present in the PES packet header;
• the decoding time differs from the presentation time.
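The DTS rule above decides whether a PES header carries one or two timestamps. The Python sketch below assumes the PTS_DTS_flags values and the 5-byte marker-bit layout of the PES packet header syntax given earlier in this Recommendation | International Standard ('10' with prefix '0010' for a PTS only; '11' with prefixes '0011' and '0001' for a PTS followed by a DTS). The function names are hypothetical.

```python
from typing import Optional, Tuple

PTS_MASK = (1 << 33) - 1  # PTS and DTS are 33-bit values


def _timestamp_bytes(prefix: int, value: int) -> bytes:
    """Pack a 33-bit timestamp as a 4-bit prefix plus three marker-terminated pieces."""
    value &= PTS_MASK
    return bytes([
        (prefix << 4) | (((value >> 30) & 0x07) << 1) | 0x01,  # bits 32..30
        (value >> 22) & 0xFF,                                   # bits 29..22
        (((value >> 15) & 0x7F) << 1) | 0x01,                   # bits 21..15
        (value >> 7) & 0xFF,                                    # bits 14..7
        ((value & 0x7F) << 1) | 0x01,                           # bits 6..0
    ])


def encode_pts_dts(pts: int, dts: Optional[int] = None) -> Tuple[int, bytes]:
    """Return (PTS_DTS_flags, timestamp field bytes) for one PES packet header.

    A DTS is written only when it is given and differs from the PTS.
    """
    if dts is None or (dts & PTS_MASK) == (pts & PTS_MASK):
        return 0b10, _timestamp_bytes(0b0010, pts)
    return 0b11, _timestamp_bytes(0b0011, pts) + _timestamp_bytes(0b0001, dts)
```

Passing a DTS equal to the PTS yields a PTS-only header, which is exactly the "if and only if" condition stated above.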
For each AVC 24-hour picture, no explicit PTS and DTS value shall be encoded in the PES header. For such an AVC access unit, decoders shall infer the presentation time from the parameters within the AVC video stream. Therefore, each AVC video stream that contains one or more AVC 24-hour pictures:
• shall either carry picture timing SEI messages with coded values of cpb_removal_delay and dpb_output_delay; or
• shall carry VUI parameters with the fixed_frame_rate_flag set to '1' and shall carry Picture Order Count (POC) information (PicOrderCnt) whereby pictures are counted in units of $\Delta t_{fi,dpb}(n)$, where $\Delta t_{fi,dpb}(n)$ is specified in equation E-10 of ITU-T Rec. H.264 | ISO/IEC 14496-10.
NOTE 1 – The requirements in the second bullet are met if an AVC timing and HRD descriptor is associated with the AVC video stream with the fixed_frame_rate_flag set to '1' and the temporal_poc_flag set to '1'.
The following applies to AVC access units in an AVC video stream carried in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream. For each AVC access unit that does not represent an AVC 24-hour picture, a PES header with a coded PTS and, if applicable, DTS value shall be provided, unless all conditions expressed under one of the following four bullets are true:
• In the AVC video sequence the following SEI messages are present, as signalled by VUI parameters:
  a) picture timing SEI messages providing the cpb_removal_delay and the dpb_output_delay parameters; and
  b) buffering period SEI messages providing the initial_cpb_removal_delay and the initial_cpb_removal_delay_offset parameters.
  NOTE 2 – When picture timing SEI messages are present in the AVC video sequence, then these messages are present for each AVC access unit, as required by ITU-T Rec. H.264 | ISO/IEC 14496-10. When buffering period SEI messages are present in the AVC video sequence, then these messages shall be present for each IDR access unit and for each access unit that is associated with a recovery point SEI message, as required by ITU-T Rec. H.264 | ISO/IEC 14496-10.
• An AVC timing and HRD descriptor is associated with the AVC video stream and in this descriptor the fixed_frame_rate_flag is set to '1' and the temporal_poc_flag is set to '1'.
• An AVC timing and HRD descriptor is associated with the AVC video stream and in this descriptor the fixed_frame_rate_flag is set to '1', the picture_to_display_conversion_flag is set to '1', the temporal_poc_flag is set to '0' and in the AVC video sequence picture timing SEI messages with the pic_struct field are present.
• An AVC timing and HRD descriptor is associated with the AVC video stream and in this descriptor the fixed_frame_rate_flag is set to '1' and the temporal_poc_flag is set to '0' and the picture_to_display_conversion_flag is set to '0'.
NOTE 3 – In this specific case the pic_struct field is used to determine subsequent PTS values.
NOTE 4 – In this case the POC information in the AVC video stream is used to determine the subsequent PTS values.
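The AVC 24-hour picture rule and the four exemptions above can be condensed into one decision function. The following Python sketch is one possible reading; the boolean inputs are hypothetical summaries that a multiplexer would have to derive from the stream itself, and the separate rules for the first access unit and for decoding discontinuities are not modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PtsRule(Enum):
    FORBIDDEN = "no explicit PTS/DTS (AVC 24-hour picture)"
    REQUIRED = "PES header with coded PTS (and DTS if applicable)"
    MAY_OMIT = "PTS/DTS may be omitted and inferred"


@dataclass
class TimingHRDFlags:
    """Hypothetical subset of an associated AVC timing and HRD descriptor."""
    fixed_frame_rate_flag: bool
    temporal_poc_flag: bool
    picture_to_display_conversion_flag: bool


def avc_pts_rule(is_24_hour_picture: bool,
                 has_timing_and_buffering_sei: bool,
                 has_pic_struct: bool,
                 descriptor: Optional[TimingHRDFlags]) -> PtsRule:
    """Classify one AVC access unit under the conditional timestamp rules of 2.7.5."""
    if is_24_hour_picture:
        return PtsRule.FORBIDDEN
    if has_timing_and_buffering_sei:                            # first bullet
        return PtsRule.MAY_OMIT
    if descriptor is not None and descriptor.fixed_frame_rate_flag:
        if descriptor.temporal_poc_flag:                        # second bullet
            return PtsRule.MAY_OMIT
        if not descriptor.picture_to_display_conversion_flag:   # fourth bullet
            return PtsRule.MAY_OMIT
        if has_pic_struct:                                      # third bullet
            return PtsRule.MAY_OMIT
    return PtsRule.REQUIRED
```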
2.7.6 Timing constraints for scalable coding
If an audio sequence is coded using an extension bitstream, such as specified in ISO/IEC 13818-3, then corresponding
decoding/presentation units in the two layers shall have identical PTS values.
If a video sequence is coded as an SNR enhancement of another sequence, such as specified in 7.8 of ITU-T Rec. H.262
| ISO/IEC 13818-2, then the set of presentation times for both sequences shall be the same.
If a video sequence is coded as two partitions, such as specified in 7.10 of ITU-T Rec. H.262 | ISO/IEC 13818-2, then
the set of presentation times for both partitions shall be the same.
If a video sequence is coded as a spatial scalable enhancement of another sequence, such as specified in 7.7 of ITU-T Rec. H.262 | ISO/IEC 13818-2, then the following shall apply:
• If both sequences have the same frame rate, the set of presentation times for both sequences shall be the same.
  NOTE – This does not imply that the picture coding type is the same in both layers.
• If the sequences have different frame rates, the set of presentation times shall be such that as many presentation times as possible are common to both sequences.
• The picture from which the spatial prediction is made shall be one of the following:
  – the coincident or most recently decoded lower layer picture;
  – the coincident or most recently decoded lower layer picture that is an I- or P-picture;
  – the second most recently decoded lower layer picture that is an I- or P-picture, provided that the lower layer does not have the low_delay flag set to '1'.
If a video sequence is coded as a temporally scalable enhancement of another sequence, such as specified in 7.9 of ITU-T Rec. H.262 | ISO/IEC 13818-2, then the following lower layer pictures may be used as the reference. Times are relative to presentation times of:
• the coincident or most recently presented lower layer picture;
• the next lower layer picture to be presented.
2.7.7 Frequency of coding P-STD_buffer_size in PES packet headers
In a Program Stream, the P-STD_buffer_scale and P-STD_buffer_size fields shall occur in the first PES packet of each
elementary stream and again whenever the value changes. They may also occur in any other PES packet.
2.7.8 Coding of system header in the Program Stream
In a Program Stream, the system header may be present in any pack, immediately following the pack header. The system header shall be present in the first pack of a Program Stream. The values encoded in all the system headers in the Program Stream shall be identical.
2.7.9 Constrained system parameter Program Stream
A Program Stream is a "Constrained System Parameters Stream" (CSPS) if it conforms to the bounds specified in this
subclause. Program Streams are not limited to the bounds specified by the CSPS. A CSPS may be identified by means
of the CSPS_flag defined in the system header in 2.5.3.5. The CSPS is a subset of all possible Program Streams.
Packet rate
In the CSPS, the maximum rate at which packets shall arrive at the input to the P-STD is 300 packets per second if the value encoded in the rate_bound field (refer to 2.5.3.6) is less than or equal to 4 500 000 bits/s when the packet_rate_restriction_flag is set to '1', or less than or equal to 2 000 000 bits/s when the packet_rate_restriction_flag is set to '0'. For higher bit rates, the CSPS packet rate is bounded by a linear relation to the value encoded in the rate_bound field.
Specifically, for all packs p in the Program Stream, when the packet_rate_restriction_flag (refer to 2.5.3.5) is set to a value of '1',
$$NP \le \left(t(i') - t(i)\right) \times 300 \times \max\left[1, \frac{R_{max}}{4.5 \times 10^{6}}\right] \qquad (2\text{-}27)$$
and if the packet_rate_restriction_flag is set to a value of '0',
$$NP \le \left(t(i') - t(i)\right) \times 300 \times \max\left[1, \frac{R_{max}}{2.5 \times 10^{6}}\right] \qquad (2\text{-}28)$$
where:
$$R_{max} = 8 \times 50 \times \text{rate\_bound} \ \text{bit/s} \qquad (2\text{-}29)$$
NP is the number of packet_start_code_prefixes and system_header_start_codes between adjacent
pack_start_codes or between the last pack_start_code and the MPEG_program_end_code as
defined in Table 2-37 and semantics in 2.5.3.2.
t(i) is the time, measured in seconds, encoded in the SCR of pack p.
t(i′) is the time, measured in seconds, encoded in the SCR for pack p + 1, immediately following
pack p, or in the case of the final pack in the Program Stream, the time of arrival of the byte
containing the last bit of the MPEG_program_end_code.
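The packet-rate bound of (2-27) to (2-29) can be evaluated per pack with exact fractions. In this Python sketch (names hypothetical), SCR values are assumed to be given in 27-MHz units, so t(i′) − t(i) is their difference divided by 27 000 000; the thresholds are the 4.5 × 10^6 and 2.5 × 10^6 constants of equations (2-27) and (2-28), and SCR wrap-around is ignored.

```python
from fractions import Fraction

SYSTEM_CLOCK_FREQUENCY = 27_000_000


def csps_packet_limit(scr_p: int, scr_next: int, rate_bound: int,
                      packet_rate_restriction_flag: bool) -> Fraction:
    """Right-hand side of (2-27)/(2-28) for one pack, from SCRs in 27-MHz units."""
    duration = Fraction(scr_next - scr_p, SYSTEM_CLOCK_FREQUENCY)  # t(i') - t(i), seconds
    r_max = 8 * 50 * rate_bound                                    # (2-29), bit/s
    threshold = 4_500_000 if packet_rate_restriction_flag else 2_500_000
    return duration * 300 * max(Fraction(1), Fraction(r_max, threshold))


def pack_meets_csps_rate(np_count: int, scr_p: int, scr_next: int,
                         rate_bound: int, packet_rate_restriction_flag: bool) -> bool:
    """np_count: packet_start_code_prefixes plus system_header_start_codes in the pack."""
    limit = csps_packet_limit(scr_p, scr_next, rate_bound, packet_rate_restriction_flag)
    return np_count <= limit
```

For example, with rate_bound = 25 000 (Rmax = 10 Mbit/s) and packet_rate_restriction_flag set to '1', a pack spanning 0.1 s may contain at most 200/3 packets, i.e. 66.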
Decoder buffer size
In the case of a CSPS the maximum size of each input buffer in the system target decoder is bounded. Different bounds
apply for video elementary streams and audio elementary streams.
In the case of an ITU-T Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 11172-2 video elementary stream in a CSPS, the
following applies:
BSn has a size which is equal to the sum of the size of the Video Buffer Verifier (VBV) as specified in the ITU-T
Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 11172-2 stream, respectively, and an additional amount of buffering BSadd.
BSadd is specified as:
$$BS_{add} \le \max\left[6 \times 1024,\ R_{vmax} \times 0.001\right] \text{ bytes}$$
where Rvmax is the maximum bit rate of the ITU-T Rec. H.262 | ISO/IEC 13818-2 or ISO/IEC 11172-2 video elementary
stream.
In the case of an ITU-T Rec. H.264 | ISO/IEC 14496-10 video elementary stream in a CSPS, the following applies:
BSn has a size which is equal to the sum of cpb_size and an additional amount of buffering BSadd. BSadd is specified as:
$$BS_{add} \le \max\left[6 \times 1024,\ R_{vmax} \times 0.001\right] \text{ bytes}$$
where Rvmax is the maximum video bit rate of the AVC video stream, and
where cpb_size is the CpbSize[ cpb_cnt_minus1 ] size of the CPB for the byte stream format signalled in the NAL hrd_parameters() in the AVC video stream. If the NAL hrd_parameters() are not present in the AVC video stream, then the cpb_size shall be the size defined as 1200 × MaxCPB in Annex A of ITU-T Rec. H.264 | ISO/IEC 14496-10 for the applied level.
In the case of an audio elementary stream in a CSPS, the following applies:
BSn ≤ 4096 bytes
In the case of an ISO/IEC 13818-7 ADTS audio elementary stream in a CSPS, the following applies to support 8 channels:
BSn ≤ 8976 bytes
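The buffer bounds above can be collected into a few helpers. This Python sketch uses hypothetical names; it evaluates the BSadd bound literally as printed, MAX[6 × 1024, Rvmax × 0.001] with Rvmax in bits/s, and it treats CpbSize and 1200 × MaxCPB as sizes in bits (their unit in ITU-T Rec. H.264 | ISO/IEC 14496-10), converting them to whole bytes before adding BSadd.

```python
from typing import Optional

MAX_AUDIO_BSN = 4096     # bytes, audio elementary stream in a CSPS
MAX_ADTS_8CH_BSN = 8976  # bytes, ISO/IEC 13818-7 ADTS stream (8 channels)


def max_bs_add(rv_max_bps: float) -> float:
    """Upper bound on BSadd, evaluated literally as MAX[6 x 1024, Rvmax x 0.001] bytes."""
    return max(6 * 1024, rv_max_bps / 1000)


def avc_cpb_size_bytes(nal_cpb_size_bits: Optional[int], max_cpb: int) -> int:
    """cpb_size: NAL HRD CpbSize[cpb_cnt_minus1] if present, else 1200 x MaxCPB.

    Both are treated as sizes in bits and converted to whole bytes (assumption).
    """
    bits = nal_cpb_size_bits if nal_cpb_size_bits is not None else 1200 * max_cpb
    return bits // 8


def max_video_bsn(base_buffer_bytes: int, rv_max_bps: float) -> float:
    """Largest permitted BSn: VBV size or cpb_size (in bytes) plus the BSadd bound."""
    return base_buffer_bytes + max_bs_add(rv_max_bps)
```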
2.7.10 Transport Stream
Sample rate locking in Transport Streams
In the Transport Stream there shall be a specified constant rational relationship between the audio sampling rate and the
system clock frequency in the system target decoder, and likewise a specified rational relationship between the video
frame rate and the system clock frequency. The system_clock_frequency is defined in 2.4.2. The video frame rate is
specified in ITU-T Rec. H.262 | ISO/IEC 13818-2 or in ISO/IEC 11172-2. The audio sampling rate is specified in
ISO/IEC 13818-3 or in ISO/IEC 11172-3. For all presentation units in all audio elementary streams in the Transport
Stream, the ratio of system_clock_frequency to the actual audio sampling rate, SCASR, is constant and equal to the
value indicated in the following table at the nominal sampling rate indicated in the audio stream.
$$\text{SCASR} = \frac{\text{system\_clock\_frequency}}{\text{audio\_sample\_rate\_in\_the\_T-STD}} \qquad (2\text{-}30)$$
The notation $\frac{X}{Y}$ denotes real division.

Nominal audio sampling frequency (kHz)    SCASR
16                                        27 000 000 / 16 000
32                                        27 000 000 / 32 000
22.05                                     27 000 000 / 22 050
44.1                                      27 000 000 / 44 100
24                                        27 000 000 / 24 000
48                                        27 000 000 / 48 000
For all presentation units in each ISO/IEC 11172-2 video and ITU-T Rec. H.262 | ISO/IEC 13818-2 video stream in the
Transport Stream, the ratio of system_clock_frequency to the actual video frame rate, SCFR, is constant and equal to
the value indicated in the following table at the nominal frame rate indicated in the video stream.
$$\mathrm{SCFR} = \frac{\text{system\_clock\_frequency}}{\text{frame\_rate\_in\_the\_T-STD}} \qquad (2\text{-}31)$$
Nominal frame rate (Hz)    SCFR
23.976                     1 126 125
24                         1 125 000
25                         1 080 000
29.97                      900 900
30                         900 000
50                         540 000
59.94                      450 450
60                         450 000
The values of the SCFR are exact. The actual frame rate differs slightly from the nominal rate in cases where the
nominal rate is 23.976, 29.97, or 59.94 frames per second.
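Inverting SCFR gives the exact frame rate implied by each table entry. The sketch below uses hypothetical names and Python's fractions module. It shows that the 23.976, 29.97 and 59.94 Hz entries correspond exactly to 24000/1001, 30000/1001 and 60000/1001 Hz, which is the slight difference from the nominal rate mentioned above.

```python
from fractions import Fraction

SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz

# (nominal frame rate label, SCFR) pairs from the table above.
SCFR_TABLE = [
    ("23.976", 1_126_125), ("24", 1_125_000), ("25", 1_080_000),
    ("29.97", 900_900), ("30", 900_000), ("50", 540_000),
    ("59.94", 450_450), ("60", 450_000),
]


def actual_frame_rate(scfr: int) -> Fraction:
    """Exact frame rate implied by an SCFR value: system_clock_frequency / SCFR."""
    return Fraction(SYSTEM_CLOCK_FREQUENCY, scfr)


for nominal, scfr in SCFR_TABLE:
    rate = actual_frame_rate(scfr)
    print(f"nominal {nominal:>6} Hz -> exact {rate} = {float(rate):.6f} Hz")
```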
For ISO/IEC 14496-2 video streams carried in a Transport Stream, the time base of the ISO/IEC 14496-2 video stream,
as defined by vop_time_increment_resolution, shall be locked to the STC and shall be exactly equal to N times
system_clock_frequency divided by K, with N and K integers that have a fixed value within each visual object
sequence, with K greater than or equal to N.
For ITU-T Rec. H.264 | ISO/IEC 14496-10 video streams, the time base of the ITU-T Rec. H.264 | ISO/IEC 14496-10
video stream shall be locked to the system clock frequency. The frequency of the AVC time base is defined by the AVC
parameter time_scale, and this frequency shall be exactly equal to N times system_clock_frequency divided by K, with
N and K integers that have a fixed value within each AVC video sequence and K greater than or equal to N. For
example, if the time_scale is set to 90 000, then the frequency of the AVC time base is exactly equal to
system_clock_frequency divided by 300.
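The N/K condition can be checked with exact arithmetic. Reducing time_scale / system_clock_frequency to lowest terms gives the smallest valid N and K, so the requirement that K be greater than or equal to N amounts to time_scale ≤ 27 000 000. The following Python sketch uses a hypothetical function name and checks only this arithmetic relationship. Whether the time base is actually locked to the STC depends on the encoder's clock, not on the signalled value.

```python
from fractions import Fraction
from typing import Optional, Tuple

SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz


def time_base_ratio(time_scale: int) -> Optional[Tuple[int, int]]:
    """Return (N, K) in lowest terms with time_scale == N * 27 MHz / K,
    or None when the condition K >= N cannot be satisfied."""
    ratio = Fraction(time_scale, SYSTEM_CLOCK_FREQUENCY)
    n, k = ratio.numerator, ratio.denominator
    if k < n:
        return None  # any scaled pair (m*N, m*K) would also have K < N
    return n, k


print(time_base_ratio(90_000))  # (1, 300)
```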
2.8
Compatibility with ISO/IEC 11172
The Program Stream of this Recommendation | International Standard is defined to be forward compatible with
ISO/IEC 11172-1. Decoders of the Program Stream as defined in this Recommendation | International Standard shall
also support decoding of ISO/IEC 11172-1.
2.9
Registration of copyright identifiers
2.9.1
General
Parts 1, 2 and 3 of ISO/IEC 13818 provide support for the management of copyright in audiovisual works. In ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 this is done by means of a copyright descriptor, while ITU-T Rec. H.262 | ISO/IEC 13818-2
and ISO/IEC 13818-3 contain syntax fields in the elementary stream for identifying copyright holders. This
Recommendation | International Standard presents the method of obtaining and registering copyright
identifiers in ITU-T Rec. H.222.0 | ISO/IEC 13818-1.
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 specifies a unique 32-bit copyright_identifier which is a work type code
identifier (such as ISBN, ISSN, ISRC, etc.) carried in the copyright descriptor. The copyright_identifier enables
identification of a wide range of Copyright Registration Authorities. Each Copyright Registration Authority may
specify a syntax and semantics for identifying the audiovisual works or other copyrighted works within that particular
copyright organization through appropriate use of the variable-length additional_copyright_info field, which contains the
copyright number.
In the following subclause and Annexes L, M and N, the benefits and responsibilities of all parties to the registration of
copyright_identifier are outlined.
2.9.2
Implementation of a Registration Authority (RA)
ISO/IEC JTC 1 shall call for nominations for an international organization which will serve as the Registration
Authority for the copyright_identifier as defined in 2.6.24. The selected organization shall serve as the Registration
Authority. The so-named Registration Authority shall execute its duties in compliance with Annex H/JTC 1 Directives.
The registered copyright_identifier is hereafter referred to as the Registered Identifier (RID).
Upon selection of the Registration Authority, JTC 1 shall require the creation of a Registration Management Group
(RMG) which will review appeals filed by organizations whose request for a RID to be used in conjunction with
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 has been denied by the Registration Authority.
Annexes L, M and N provide information on the procedure for registering a unique copyright identifier.
2.10
Registration of private data format
The registration descriptor of ITU-T Rec. H.222.0 | ISO/IEC 13818-1 is provided by this text in order to enable users of
this Specification to unambiguously carry data when its format is not recognized by this Specification. This provision
will permit this Specification to carry all types of data while providing for a method of unambiguous identification of
the characteristics of the underlying private data.
2.10.1
General
In the following subclause and Annexes O and P, the benefits and responsibilities of all parties to the registration of
private data format are outlined.
2.10.2
Implementation of a Registration Authority (RA)
ISO/IEC JTC 1/SC 29 shall call for nominations from member bodies of ISO or National Committees of IEC which will
serve as the Registration Authority for the format_identifier as defined in 2.6.8 and 2.6.9. The selected organization
shall serve as the Registration Authority. The so-named Registration Authority shall execute its duties in compliance
with Annex H/JTC 1 Directives. The registered private data format_identifier is hereafter referred to as the Registered
Identifier (RID).
Upon selection of the Registration Authority, JTC 1 shall require the creation of a Registration Management Group
(RMG) which will review appeals filed by organizations whose request for an RID to be used in conjunction with this
Specification has been denied by the Registration Authority.
Annexes O and P provide information on the procedures for registering a unique format identifier.
2.11
Carriage of ISO/IEC 14496 data
2.11.1
Introduction
An ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream may carry individual ISO/IEC 14496-2 and 14496-3 elementary
streams as well as ISO/IEC 14496-1 audiovisual scenes with their associated streams. Typically, the ISO/IEC 14496
streams will be elements of an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 program, as defined by the PMT in a Transport
Stream and by the PSM in a Program Stream.
For the carriage of ISO/IEC 14496 data in Transport Streams and Program Streams, a distinction is made between
individual elementary streams and an ISO/IEC 14496-1 audiovisual scene with its associated streams. For carriage of
individual ISO/IEC 14496-2 and 14496-3 elementary streams, only system tools from ITU-T Rec. H.222.0 | ISO/IEC
13818-1 are used, as defined in 2.11.2. For carriage of an audiovisual ISO/IEC 14496-1 scene and associated ISO/IEC
14496 elementary streams, contained in ISO/IEC 14496-1 SL-packetized streams or FlexMux streams, tools from both
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 and ISO/IEC 14496-1 are used, as defined in 2.11.3.
Carriage of ITU-T Rec. H.264 | ISO/IEC 14496-10 video over ITU-T Rec. H.222.0 | ISO/IEC 13818-1 streams is
specified in 2.14.
2.11.2
Carriage of individual ISO/IEC 14496-2 and 14496-3 Elementary Streams in PES packets
2.11.2.1 Introduction
Individual ISO/IEC 14496-2 and 14496-3 elementary streams may be carried in PES packets as
PES_packet_data_bytes. No specific data alignment constraints apply to PES packetization. For synchronization, PTSs
and, where appropriate, DTSs are encoded in the header of the PES packet that carries the ISO/IEC 14496 elementary
stream data; the same constraints apply to PTS and DTS encoding as for ISO/IEC 13818 elementary streams. See
Table 2-93 for an overview of how to carry individual ISO/IEC 14496 streams within an ITU-T Rec. H.222.0 |
ISO/IEC 13818-1 stream.
Table 2-93 – Carriage of individual ISO/IEC 14496 streams in ITU-T Rec. H.222.0 | ISO/IEC 13818-1
Stream                    Carriage                 stream_type    stream_id
ISO/IEC 14496-2 visual    Carriage in PES packets  0x10           '1110 xxxx'
ISO/IEC 14496-3 audio     Carriage in PES packets  0x11           '110x xxxx'
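The following small Python sketch checks a (stream_type, stream_id) pair against the two rows of Table 2-93 and treats each 'x' in the stream_id pattern as a don't-care bit. The function and table names are illustrative. A real demultiplexer would take stream_type from the PMT and stream_id from the PES packet header, and would also check for the MPEG-4_video or MPEG-4_audio descriptor.

```python
from typing import Optional

# Rows of Table 2-93: (carried stream, stream_type, stream_id bit pattern).
TABLE_2_93 = [
    ("ISO/IEC 14496-2 visual", 0x10, "1110 xxxx"),
    ("ISO/IEC 14496-3 audio", 0x11, "110x xxxx"),
]


def stream_id_matches(stream_id: int, pattern: str) -> bool:
    """True if an 8-bit stream_id matches a pattern such as '1110 xxxx' ('x' = any bit)."""
    bits = pattern.replace(" ", "")
    if len(bits) != 8 or not 0 <= stream_id <= 0xFF:
        raise ValueError("expected an 8-bit pattern and an 8-bit stream_id")
    for pos, ch in enumerate(bits):
        bit = (stream_id >> (7 - pos)) & 1  # pos 0 is the most significant bit
        if ch != "x" and int(ch) != bit:
            return False
    return True


def classify(stream_type: int, stream_id: int) -> Optional[str]:
    """Return the Table 2-93 row matching both values, or None if there is none."""
    for name, row_type, pattern in TABLE_2_93:
        if row_type == stream_type and stream_id_matches(stream_id, pattern):
            return name
    return None


print(classify(0x10, 0xE0))  # ISO/IEC 14496-2 visual
```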
If a PTS or DTS is present in the PES packet header, it shall refer to the visual object that follows either the first VOP
start code or the first still texture object start code that commences in the PES packet. Each ISO/IEC 14496-2 video
stream carried by ITU-T Rec. H.222.0 | ISO/IEC 13818-1 shall contain the information required to decode the
ISO/IEC 14496-2 video stream; consequently the stream shall contain Visual Object Sequence Headers, Visual Object
Headers and Video Object Layer Headers.
In the case of an ISO/IEC 14496-3 elementary stream, the elementary stream data shall first be encapsulated in the
LATM/LOAS AudioSyncStream() transport syntax defined in ISO/IEC 14496-3 before PES packetization. If a PTS is present
in the PES packet header, it shall refer to the first audio frame that follows the first syncword that commences in the
payload of the PES packet.
Carriage of individual ISO/IEC 14496-2 and ISO/IEC 14496-3 elementary streams in PES packets shall be identified by
appropriate stream_id and stream_type values, indicating the use of ISO/IEC 14496-2 Visual or 14496-3 Audio. In
addition, such carriage shall be signalled by the MPEG-4_video descriptor or MPEG-4_audio descriptor, respectively.
These descriptors shall be conveyed in the descriptor loop for the respective elementary stream entry in the Program
Map Table in case of a Transport Stream or in the Program Stream Map, when present, in case of a Program Stream.
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 does not specify presentation of ISO/IEC 14496-2 and ISO/IEC 14496-3
elementary streams in the context of a program.
2.11.2.2 STD extensions for individual ISO/IEC 14496 elementary streams
The T-STD model includes a transport buffer TBn and a multiplex buffer Bn prior to decoding of each individual
ISO/IEC 14496 elementary stream n. Note that in the T-STD the single multiplex buffer Bn is also applied for
ISO/IEC 14496-2 video, as indicated in Figure 2-4, instead of the approach with two buffers MBn and EBn used for
ISO/IEC 13818-2 video in the T-STD. For buffers TBn and Bn and the rate Rxn between TBn and Bn the following
constraints apply.
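[Figure 2-4 (diagram): Transport Stream demultiplexing passes the i-th byte of the Transport Stream, arriving at t(i), to transport buffer TBn, which feeds multiplex buffer Bn at rate Rxn. Decoder Dn decodes the j-th access unit An(j) at tdn(j) into the k-th composition unit Cn(k) at tcn(k). The upper branch (n = 1) is video and also shows O1; the lower branch is audio.]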
Figure 2-4 – T-STD model extensions for individual ISO/IEC 14496 elementary streams
In case of carriage of an ISO/IEC 14496-2 stream:
Size BSn of Buffer Bn:
BSn = BSmux + BSoh + VBVmax[profile,level]
where:
BSoh, packet overhead buffering, is defined as:
BSoh = (1/750) seconds × max{Rmax[profile,level], 2 000 000 bit/s}
and:
BSmux, additional multiplex buffering, is defined as:
BSmux = 0.004 seconds × max{Rmax[profile,level], 2 000 000 bit/s}
Rate Rxn:
Rxn = 1.2 × Rmax[profile,level]
where:
VBVmax[profile,level] and Rmax[profile,level] are defined in ISO/IEC 14496-2 for each profile and level.
For profiles and levels for which no VBVmax value is specified, the size of Bn and the rate Rxn are user
defined.
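The buffer and rate formulas above can be evaluated directly. In this Python sketch, which uses hypothetical names, r_max and vbv_max_bits are assumed to be Rmax[profile,level] in bit/s and VBVmax[profile,level] in bits, looked up by the caller in ISO/IEC 14496-2. The inputs in the usage line are placeholders, not values from ISO/IEC 14496-2. The 2 000 000 bit/s floor applies to BSoh and BSmux, while Rxn is computed from Rmax itself.

```python
from typing import Tuple


def mpeg4_visual_tstd(r_max: float, vbv_max_bits: float) -> Tuple[float, float]:
    """Return (BSn in bits, Rxn in bit/s) for an ISO/IEC 14496-2 stream in the T-STD.

    r_max: Rmax[profile,level] in bit/s; vbv_max_bits: VBVmax[profile,level] in bits.
    Both are assumed to have been looked up in ISO/IEC 14496-2 by the caller.
    """
    r = max(r_max, 2_000_000)   # max{Rmax[profile,level], 2 000 000 bit/s}
    bs_oh = r / 750             # (1/750) s of packet overhead buffering
    bs_mux = 0.004 * r          # 0.004 s of additional multiplex buffering
    bs_n = bs_mux + bs_oh + vbv_max_bits
    rx_n = 1.2 * r_max          # Rxn scales Rmax itself, without the 2 Mbit/s floor
    return bs_n, rx_n


# Placeholder inputs, not values taken from ISO/IEC 14496-2.
bs_n, rx_n = mpeg4_visual_tstd(r_max=384_000, vbv_max_bits=163_840)
print(f"BSn = {bs_n:.0f} bits, Rxn = {rx_n:.0f} bit/s")
```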
In case of carriage of an ISO/IEC 14496-3 stream:
Size BSn of Buffer Bn for ISO/IEC 14496-3 AAC audio.
else BSn = BSmux + BSdec + BSoh = 3584 bytes
In this case the size of the access unit decoding buffer BSdec, and the PES packet overhead buffer BSoh are constrained
by:
BSdec + BSoh ≤ 2848 bytes
A portion (736 bytes) of the 3584-byte buffer is allocated to buffering that allows multiplexing. The remaining 2848 bytes
are shared among access unit buffering BSdec, BSoh and additional multiplexing.
Rate Rxn for ISO/IEC 14496-3 AAC audio is the same as defined for ISO/IEC 13818-7 ADTS audio in 2.4.2.3:
else Rxn = 2 000 000 bit/s
The P-STD model includes a multiplex buffer Bn prior to decoding of each individual ISO/IEC 14496 elementary
stream n. The size BSn of buffer Bn in the P-STD is defined by the P-STD_buffer_size field in the PES packet header.
2.11.3
Carriage of audiovisual ISO/IEC 14496-1 scenes and associated ISO/IEC 14496 streams
2.11.3.1 Introduction
This clause describes the encapsulation and signaling when an audiovisual scene represented by ISO/IEC 14496 data is
carried in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Program Stream or Transport Stream. ISO/IEC 14496 content
consists of the initial object descriptor and a variable number of streams such as object descriptor streams, scene
description streams (carrying either BIFS-Command or BIFS-Anim access units), IPMP streams, OCI streams and
audiovisual streams. Each of the ISO/IEC 14496 streams shall be contained in an SL-packetized stream and may
optionally be multiplexed into a FlexMux stream, both defined in ISO/IEC 14496-1. For carriage in ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 Program Stream or Transport Stream, these SL-packetized streams and FlexMux
streams shall contain encoded Object Clock Reference (OCR) and FlexMux Clock Reference (FCR) fields as specified
in 2.11.3.4 and in 2.11.3.5, respectively. The SL-packetized streams or FlexMux streams are then encapsulated either in
PES packets or in ISO_IEC_14496_sections prior to Transport Stream packetization and multiplexing or Program
Stream multiplexing. ISO_IEC_14496_sections are built on the long format of ITU-T Rec. H.222.0 | ISO/IEC 13818-1 sections.
2.11.3.2 Assignment of ES_ID values
An ISO/IEC 14496-1 scene carried over an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream may associate a number of
ISO/IEC 14496, ISO/IEC 13818 and other streams by the use of the ES_ID parameter. The scene and the associated
streams may be carried over the same ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, but a scene may also reference
streams carried elsewhere, for example over an IP network. How such other means of delivery are identified is not defined
in this Specification.
ISO/IEC 14496-1 defines name scoping rules for identifiers. These rules allow the same ES_ID value to be used for two
different streams within ISO/IEC 14496 content. When one or multiple ISO/IEC 14496-1 scenes are carried in an
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Program, duplicate ES_ID values shall not occur within the program such that
each ISO/IEC 14496 SL-packetized stream or ISO/IEC 14496-1 FlexMux channel has a unique ES_ID value in the
program.
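When building or analysing a program, the uniqueness rule can be checked by counting ES_ID values across all SL-packetized streams and FlexMux channels. The input mapping in this Python sketch, from a stream label to a list of ES_IDs, is a hypothetical data structure. A real tool would collect the ES_IDs from the SL and FMC descriptors.

```python
from collections import Counter
from typing import Dict, List


def duplicate_es_ids(es_ids_by_stream: Dict[str, List[int]]) -> List[int]:
    """Return, sorted, every ES_ID that occurs more than once within one program.

    es_ids_by_stream maps a hypothetical label for each SL-packetized stream or
    FlexMux channel to the ES_ID values it carries.
    """
    counts = Counter(es_id for ids in es_ids_by_stream.values() for es_id in ids)
    return sorted(es_id for es_id, n in counts.items() if n > 1)


program = {"SL stream, PID 0x0101": [1], "FlexMux PID 0x0102, channel 0": [2],
           "FlexMux PID 0x0102, channel 1": [1]}
print(duplicate_es_ids(program))  # [1]
```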
2.11.3.3 Timing of ISO/IEC 14496 scenes and associated streams
When carried over an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, the object time base of each ISO/IEC 14496
stream shall be locked to the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 STC, that is:
If X(t) = fstc(t)/fobject(t)
then the value of X(t) shall be constant at any time t.
where:
fstc(t) denotes the intended frequency of the STC at time t, i.e., 27 000 000 Hz
fobject(t) denotes the frequency of the object time base at time t
The object time base of ISO/IEC 14496 streams carried over an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream is
conveyed as follows:
– The object time base of an SL-packetized stream carried in PES packets without the use of the FlexMux
shall be conveyed by coded OCRs in the SL packet header of that stream. See 2.11.3.4.
– The object time base of SL-packetized streams carried in PES packets within a FlexMux stream shall be
conveyed by FCRs in that FlexMux stream. See 2.11.3.5. Consequently, all ISO/IEC 14496 streams
contained within the same FlexMux stream share the same object time base.
– The object time base of an SL-packetized stream carried in sections shall be conveyed by another
ISO/IEC 14496 stream within the Transport Stream or Program Stream as indicated by the OCR_ES_ID
field in the ES descriptor for that stream.
The following constraints shall apply for encoding of OCRs and FCRs in SL-packetized streams and FlexMux streams
carried over an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream:
– The OCRs and FCRs in each SL-packetized stream and each FlexMux stream associated with the same
scene shall have the same resolution.
– The resolution of OCRs and FCRs for a scene, fcr, shall be less than or equal to 90 000 Hz.
– The ratio (fstc(t)/300)/fcr shall be an integer greater than or equal to one. Consequently, the resolution
of the OCR and FCR syntax elements may only take values such as 90 000 Hz, 45 000 Hz, 30 000 Hz,
22 500 Hz, 18 000 Hz, etc.
Within the above constraints and the ISO/IEC 14496-1 constraint that the resolution fcr shall represent an integer
number of cycles per second, fcr can be selected as appropriate for the scene.
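Taken together, the constraints above (fcr ≤ 90 000 Hz, (fstc/300)/fcr an integer, and fcr an integer number of cycles per second) mean that fcr must be a divisor of 90 000. The following short Python sketch enumerates the permitted values. The function name is illustrative.

```python
from typing import List


def allowed_fcr_values(f_stc: int = 27_000_000) -> List[int]:
    """All integer resolutions fcr (Hz), in descending order, for which
    (f_stc / 300) / fcr is an integer >= 1."""
    base = f_stc // 300  # 90 000 Hz
    return [fcr for fcr in range(base, 0, -1) if base % fcr == 0]


print(allowed_fcr_values()[:6])  # [90000, 45000, 30000, 22500, 18000, 15000]
```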
The ISO/IEC 14496 time stamps coded in the SL packet header shall refer to instants of the object time base of the
stream carried in the SL packet. The resolution of each such time stamp shall be smaller by a factor of 2^k than the
resolution of the OCRs or FCRs associated with the stream, with k a non-negative integer. To achieve the same
wrap-around, the length of the time stamp fields, TimeStampLength, shall be k bits shorter than the length of the OCR
or FCR field, OCRLength or FCRLength, respectively. Hence, for each stream, the following conditions shall apply for
encoding of time stamps:
– TimeStampResolution = (OCRResolution or FCRResolution, respectively)/2^k, with k a non-negative integer.
ISO/IEC 14496-1 requires TimeStampResolution to represent an integer number of cycles per second.
– TimeStampLength = (OCRLength or FCRLength, respectively) − k.
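Both conditions can be derived mechanically from the clock reference parameters and k. In this Python sketch, which uses a hypothetical function name, the OCR field length of 33 bits in the usage line is an assumption made for illustration. OCRLength and OCRResolution are actually configured per stream as described in ISO/IEC 14496-1. Dividing the resolution by 2^k and shortening the field by k bits leaves the wrap-around period, 2^length / resolution, unchanged.

```python
from typing import Tuple


def timestamp_params(clock_ref_resolution: int, clock_ref_length: int, k: int) -> Tuple[int, int]:
    """Return (TimeStampResolution, TimeStampLength) for a given OCR/FCR
    resolution in Hz, OCR/FCR field length in bits, and exponent k."""
    if not 0 <= k < clock_ref_length:
        raise ValueError("k must be a non-negative integer smaller than the field length")
    if clock_ref_resolution % (1 << k) != 0:
        # ISO/IEC 14496-1 requires an integer number of cycles per second.
        raise ValueError("resolution / 2^k is not an integer")
    return clock_ref_resolution // (1 << k), clock_ref_length - k


print(timestamp_params(90_000, 33, 2))  # (22500, 31)
```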
The relationship between a value of the STC and the corresponding value of the object time base of a stream is
established by associating PTS fields in PES packet headers with the OCR or FCR in SL packet headers and FlexMux
Stream packets, respectively, as specified in 2.11.3.6 and 2.11.3.7.
2.11.3.4 Delivery timing of SL-packetized streams
To carry ISO/IEC 14496 content in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, ISO/IEC 14496-1 SL-packetized
streams are used. In each SL-packetized stream carried in a PES packet without the use of FlexMux, the
objectClockReference field shall be encoded as follows:
1) An objectClockReference (OCR) field shall be present in the first SL packet header of an SL-packetized
stream.
2) The SL-packetized stream shall be constructed such that the time interval between the bytes containing
the last bit of successive OCR fields is less than or equal to 0.7 s. Thus:
t(i″) − t(i′) ≤ 0.7 s
for all i′ and i″, where i′ and i″ are the indexes of the bytes containing the last bit of consecutive
OCR fields in the SL-packetized stream.
If an objectClockReference is encoded in an SL packet header, the instantBitrate field shall also be coded.
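The 0.7 s spacing rule is a simple check over the arrival times t(i) of the bytes carrying the last bit of each OCR field. The following Python sketch uses a hypothetical function name and leaves the computation of those arrival times from the multiplex to the caller. The same check applies unchanged to the FCR spacing rule of 2.11.3.5.

```python
from typing import Sequence

MAX_CLOCK_REFERENCE_INTERVAL_S = 0.7  # seconds


def clock_reference_spacing_ok(last_bit_times_s: Sequence[float]) -> bool:
    """Check t(i'') - t(i') <= 0.7 s for every pair of consecutive clock references.

    last_bit_times_s: arrival times t(i), in stream order, of the bytes that carry
    the last bit of each OCR (or FCR) field.
    """
    return all(
        later - earlier <= MAX_CLOCK_REFERENCE_INTERVAL_S
        for earlier, later in zip(last_bit_times_s, last_bit_times_s[1:])
    )


print(clock_reference_spacing_ok([0.0, 0.4, 0.9]))  # True
```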
2.11.3.5 Delivery timing of FlexMux streams
In addition to SL-packetized streams, the ISO/IEC 14496-1 FlexMux tool may also be used to carry ISO/IEC 14496 content in
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 streams. The payload of FlexMux packets shall consist of SL packets as
specified in ISO/IEC 14496-1. In each FlexMux stream carried in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, the
fmxClockReference field shall be encoded as follows:
1) An fmxClockReference (FCR) field shall be present in the first FlexMux packet of a FlexMux stream.
2) The FlexMux stream shall be constructed such that the time interval between the bytes containing the last
bit of successive FCR fields is less than or equal to 0.7 s. Thus:
t(i″) − t(i′) ≤ 0.7 s
for all i′ and i″, where i′ and i″ are the indexes of the bytes containing the last bit of consecutive
FCR fields in the FlexMux stream.
3) All ISO/IEC 14496 time stamps within the SL-packetized streams carried within a FlexMux stream shall
refer to instants of the object time base conveyed by the FCR fields in the FlexMux stream. The
SL-packetized streams carried in FlexMux packets need not carry OCR fields. If OCR fields are present,
they may be ignored.
2.11.3.6 Carriage of SL-packetized streams in PES packets
A single ISO/IEC 14496-1 SL-packetized stream may be mapped into a single PES stream. One and only one SL packet
from an SL-packetized stream shall constitute the payload of one PES packet. PES packets that carry an SL-packetized
stream shall be identified by stream_id = 0xFA in the PES packet header.
When an OCR field is coded in the SL packet header, a PTS shall be encoded in the header of the PES packet that
carries such SL packet header. This PTS shall be encoded with the 33-bit value of the 90-kHz portion of the STC that
corresponds to the value of the object time base at the instant in time indicated by the OCR.
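The following Python sketch illustrates the PTS computation. pts_from_stc takes the 33-bit, 90-kHz portion of a 27-MHz STC count, computed in the same way as the base part of a system clock reference. stc_at_ocr is a hypothetical helper that maps an OCR value onto the STC using one known (OCR, STC) anchor pair. It relies on the locking of the object time base to the STC required in 2.11.3.3 and, for brevity, ignores OCR wrap-around.

```python
SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz
PTS_MODULUS = 1 << 33                # PTS fields are 33 bits wide


def pts_from_stc(stc_27mhz: int) -> int:
    """33-bit value of the 90-kHz portion of the STC (27-MHz count / 300)."""
    return (stc_27mhz // 300) % PTS_MODULUS


def stc_at_ocr(ocr: int, ocr_resolution: int, anchor_ocr: int, anchor_stc: int) -> int:
    """STC count (27 MHz) at the instant indicated by an OCR value.

    Hypothetical helper: assumes (anchor_ocr, anchor_stc) were sampled at the same
    instant, that each OCR tick lasts 27 000 000 / ocr_resolution STC ticks, and
    that the OCR has not wrapped since the anchor.
    """
    if SYSTEM_CLOCK_FREQUENCY % ocr_resolution != 0:
        raise ValueError("ocr_resolution must divide 27 000 000")
    ticks_per_ocr = SYSTEM_CLOCK_FREQUENCY // ocr_resolution
    return anchor_stc + (ocr - anchor_ocr) * ticks_per_ocr


print(pts_from_stc(27_000_000))  # 90000
```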
The ES_ID associated with the SL-packetized stream shall be signalled by an SL descriptor as specified in 2.6.46.
2.11.3.7 Carriage of FlexMux streams in PES packets
PES packets with a payload consisting of FlexMux packets shall be identified by stream_id = 0xFB in the PES packet
header. An integer number of FlexMux packets shall constitute the payload of one PES packet, i.e., the payload of a
PES packet carrying a FlexMux stream shall start with a FlexMux packet header and shall end with the last byte of a
FlexMux packet.
If an fmxClockReference (FCR) field is encoded in one of the FlexMux packets contained in a PES packet, then a PTS
shall be encoded in the header of the PES packet that contains such FlexMux packet. This PTS shall be encoded with
the 33-bit value of the 90-kHz portion of the STC that corresponds to the value of the object time base of the FlexMux
stream at the instant in time indicated by the FCR. If multiple FlexMux packets with an encoded FCR field are
contained in a PES packet, the PTS shall correspond to the time indicated by the FCR in the first such FlexMux packet
encountered in the payload of the PES packet.
The ES_IDs associated with each SL-packetized stream conveyed in the FlexMux stream shall be signalled by an FMC
descriptor as specified in 2.6.44.
2.11.3.8 Carriage of SL packets and FlexMux packets in sections
For transport of ISO/IEC 14496 content in sections, ISO_IEC_14496_sections are defined. Only SL-packetized object
descriptor streams and scene description streams shall use ISO_IEC_14496_sections. A single ISO_IEC_14496_section
shall contain either an entire SL packet of an SL-packetized stream or an integer number of FlexMux packets each
carrying an SL packet of the same ISO/IEC 14496-1 elementary stream.
Table 2-94 shows the syntax of ISO_IEC_14496_sections defined to convey ISO/IEC 14496-1 elementary streams,
qualified by the table_id as either object descriptor or scene description stream data. Object descriptor stream data
consists of an Object Descriptor Table that comprises a number of object descriptors. The Object Descriptor Table may
be transmitted in multiple ISO_IEC_14496_sections. Scene description data consists of a Scene Description Table that
may comprise a number of BIFS commands. The Scene Description Table may be transmitted in multiple
ISO_IEC_14496_sections. It is not required that a complete table be received in order to process its payload. However,
the payload of sections shall be processed in the correct order, as indicated by the value of the section_number field in
the ISO_IEC_14496_section header bytes.
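Sections may be processed before the whole table has arrived, but they must be handled in section_number order. A receiver can therefore release each payload as soon as all lower-numbered sections have been seen. The following Python sketch implements that policy. It assumes that the input belongs to one version of one table and omits version_number and last_section_number handling. It also drops duplicate sections silently, which is a design choice of the sketch rather than a requirement of this Specification.

```python
from typing import Dict, Iterable, Iterator, Tuple


def in_order_payloads(sections: Iterable[Tuple[int, bytes]]) -> Iterator[bytes]:
    """Yield section payloads in section_number order as soon as possible.

    sections: (section_number, payload) pairs in arrival order. A section that
    arrives early is held back until every lower-numbered section has arrived.
    """
    pending: Dict[int, bytes] = {}
    next_number = 0  # section_number of the first section of a table is 0x00
    for number, payload in sections:
        if number < next_number or number in pending:
            continue  # already processed or duplicate
        pending[number] = payload
        while next_number in pending:
            yield pending.pop(next_number)
            next_number += 1


print(list(in_order_payloads([(1, b"B"), (0, b"A"), (2, b"C")])))  # [b'A', b'B', b'C']
```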
Table 2-94 – Section syntax for transport of ISO/IEC 14496 stream
Syntax                                            No. of bits   Mnemonic
ISO_IEC_14496_section() {
    table_id                                      8             uimsbf
    section_syntax_indicator                      1             bslbf
    private_indicator                             1             bslbf
    reserved                                      2             bslbf
    ISO_IEC_14496_section_length                  12            uimsbf
    table_id_extension                            16            uimsbf
    reserved                                      2             bslbf
    version_number                                5             uimsbf
    current_next_indicator                        1             bslbf
    section_number                                8             uimsbf
    last_section_number                           8             uimsbf
    if (PMT_has_SL_descriptor(current_PID)) {
        SL_Packet()
    }
    else if (PMT_has_FMC_descriptor(current_PID)) {
        for (i = 1; i < N; i++)
            FlexMuxPacket()
    }
    else {
        for (i = 1; i < N; i++)
            reserved                              8             bslbf
    }
    CRC_32                                        32            rpchof
}
table_id – This 8-bit field shall be set to '0x04' or '0x05' in case of an ISO_IEC_14496_section. A value of '0x04'
indicates an ISO_IEC_14496_scene_description_section that carries an ISO/IEC 14496-1 scene description stream. A
value of '0x05' indicates an ISO_IEC_14496_object_descriptor_section that carries an ISO/IEC 14496-1 object
descriptor stream.
section_syntax_indicator – This 1-bit field shall be set to '1'.
private_indicator – This 1-bit field shall not be specified by this Specification.
ISO_IEC_14496_section_length – This 12-bit field shall specify the number of remaining bytes in the section
immediately following the ISO_IEC_14496_section_length field up to the end of the ISO_IEC_14496_section. The
value of this field shall not exceed 4093 (0xFFD).
table_id_extension – This 16-bit field shall not be specified by this Specification; its use and value are defined by the
user.
version_number – This 5-bit field shall represent the version number of the Object Descriptor Table or Scene
Description Table respectively. The version number shall be incremented by 1 modulo 32 with each new version of the
table. Version control is at the discretion of the application.
current_next_indicator – This 1-bit field shall be set to 1.
section_number – This 8-bit field shall represent the number of the ISO_IEC_14496_section. The section_number
field of the first ISO_IEC_14496_section of the Object Descriptor Table or the Scene Description Table shall have a
value equal to 0x00. The value of section_number shall be incremented by 1 with each additional section in the table.
last_section_number – This 8-bit field shall specify the number of the last section of the Object Descriptor Table or
Scene Description Table of which this section is a part.
PMT_has_SL_descriptor(current_PID) – A pseudo function that shall be true if an SL descriptor is contained in the
descriptor loop in the Program Map Table for the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 program element that
conveys this ISO_IEC_14496_section.
SL_Packet() – A sync layer packet as specified in 10.2.2 of ISO/IEC 14496-1.
PMT_has_FMC_descriptor(current_PID) – A pseudo function that shall be true if an FMC descriptor is contained in
the descriptor loop in the Program Map Table for the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 program element that
conveys this ISO_IEC_14496_section.
FlexMuxPacket() – A FlexMux packet as specified in 11.2.4 of ISO/IEC 14496-1.
CRC_32 – This 32-bit field shall contain the CRC value that gives a zero output of the registers in the decoder defined
in Annex A after processing the entire ISO_IEC_14496_section.
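The following Python sketch parses the fixed 8-byte header of Table 2-94 and returns the remaining bytes before CRC_32 as the payload. It checks CRC_32 using the usual MPEG-2 CRC-32 parameters: polynomial 0x04C11DB7, register preset to all ones, MSB first, no final inversion. With these parameters, running the CRC over the whole section including CRC_32 yields zero, which matches the definition in Annex A. The function names are hypothetical. Interpreting the payload as SL_Packet() or FlexMuxPacket()s is left to the caller, because that depends on the PMT descriptors.

```python
import struct


def crc32_mpeg2(data: bytes) -> int:
    """CRC-32 with the Annex A parameters: polynomial 0x04C11DB7, register
    initialised to 0xFFFFFFFF, bits processed MSB first, no final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def parse_14496_section(section: bytes) -> dict:
    """Split one ISO_IEC_14496_section into its header fields and payload."""
    if len(section) < 12:  # 8 header bytes + 4 CRC_32 bytes
        raise ValueError("section too short")
    (table_id, b1, b2, table_id_extension,
     b5, section_number, last_section_number) = struct.unpack(">BBBHBBB", section[:8])
    if table_id not in (0x04, 0x05):
        raise ValueError(f"unexpected table_id 0x{table_id:02X}")
    section_length = ((b1 & 0x0F) << 8) | b2  # 12-bit ISO_IEC_14496_section_length
    if section_length > 4093 or len(section) != 3 + section_length:
        raise ValueError("ISO_IEC_14496_section_length inconsistent with data")
    if crc32_mpeg2(section) != 0:
        raise ValueError("CRC_32 check failed")
    return {
        "table_id": table_id,  # 0x04 scene description, 0x05 object descriptor
        "section_syntax_indicator": b1 >> 7,
        "private_indicator": (b1 >> 6) & 0x1,
        "section_length": section_length,
        "table_id_extension": table_id_extension,
        "version_number": (b5 >> 1) & 0x1F,
        "current_next_indicator": b5 & 0x1,
        "section_number": section_number,
        "last_section_number": last_section_number,
        "payload": section[8:-4],  # SL_Packet() or FlexMuxPacket()s, per the PMT
    }
```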
2.11.3.9 T-STD extensions
2.11.3.9.1
T-STD Model for 14496 content
Figure 2-5 shows extensions of the Transport System Target Decoder for delivery of ISO/IEC 14496 program elements
encapsulated in ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Transport Streams.
[Figure 2-5 (diagram): Transport Stream demultiplexing delivers the i-th byte of the Transport Stream at time t(i). For a FlexMux stream (branch 1), TB1 drains at Rx1 into MB1, which drains at Rbx1 into FlexMux demultiplexing (FlexMux buffer model, see ISO/IEC 14496-1). Each FlexMux channel p then passes through FlexMux buffer FB1p and decoder buffer DB1p to decoder D1p, which decodes access unit A1p(j) at td1p(j) into composition unit C1p(k) at tc1p(k), for p = 1 to m. For an SL-packetized stream (branch n), TBn drains at Rxn into MBn and then DBn, and decoder Dn decodes An(j) at tdn(j) into Cn(k) at tcn(k).]
Figure 2-5 – T-STD model for ISO/IEC 14496 content
The following notation is used in Figure 2-5 and its description:
TBn is the transport buffer.
MBn is the multiplex buffer for FlexMux stream n or for SL-packetized stream n.
FBnp is the FlexMux buffer for the elementary stream in FlexMux channel p of FlexMux stream n.
DBnp is the decoder buffer for the elementary stream in FlexMux channel p of FlexMux stream n.
DBn is the decoder buffer for elementary stream n.
Dnp is the decoder for the elementary stream in FlexMux channel p of FlexMux stream n.
Dn is the decoder for elementary stream n.
Rxn is the rate at which data are removed from TBn.
Rbxn is the rate at which data are removed from MBn.
Anp(j) is the jth access unit in the elementary stream in FlexMux channel p of FlexMux stream n. Anp(j) is
indexed in decoding order.
An(j) is the jth access unit in elementary stream n. An(j) is indexed in decoding order.
tdnp(j) is the decoding time, measured in seconds, in the system target decoder of the jth access unit in
the elementary stream in FlexMux channel p of FlexMux stream n.
tdn(j) is the decoding time, measured in seconds, in the system target decoder of the jth access unit in
elementary stream n.
Cnp(k) is the kth composition unit in the elementary stream in FlexMux channel p of FlexMux stream n.
Cnp(k) results from decoding Anp(j). Cnp(k) is indexed in composition order.
Cn(k) is the kth composition unit in elementary stream n. Cn(k) results from decoding An(j). Cn(k) is
indexed in composition order.
tcnp(k) is the composition time, measured in seconds, in the system target decoder of the kth
composition unit in the elementary stream in FlexMux channel p of FlexMux stream n.
tcn(k) is the composition time, measured in seconds, in the system target decoder of the kth
composition unit in elementary stream n.
t(i) indicates the time in seconds at which the ith byte of the Transport Stream enters the system
target decoder.
2.11.3.9.2
Processing of FlexMux streams
Complete Transport Stream packets containing data from FlexMux stream n are passed to the transport buffer for
FlexMux stream n, TBn. The size of TBn is fixed at 512 bytes. All bytes that enter TBn are removed from TBn at a rate
Rxn, specified by the TB_leak_rate field in the MultiplexBuffer descriptor associated with FlexMux stream n. When
there is no data in buffer TBn, rate Rxn is equal to zero. Duplicate Transport Stream packets are not delivered to MBn.
In case of carriage in PES packets, the PES packet header and payload data bytes are delivered to buffer MBn; all other
bytes leaving TBn do not enter MBn, and may be used to control the system. In case of carriage in
ISO_IEC_14496_sections, the section header, payload and CRC-32 data bytes are delivered to buffer MBn; all other
bytes do not enter MBn and may be used to control the system. In either case, the size of MBn shall be specified by the
MB_buffer_size field in the MultiplexBuffer descriptor.
The FlexMux Stream packet bytes in buffer MBn are all delivered to their associated FlexMux buffer at the rate
specified by the field fmxRate encoded in the FlexMux stream and in compliance with the FlexMux buffer model
defined in 11.2.9 of ISO/IEC 14496-1. Only FlexMux packet payload data bytes in FlexMux channel p of FlexMux
stream n enter buffer FBnp. FlexMux packet header bytes in FlexMux channel p of FlexMux stream n are discarded and
may be used to control the system. The rate specified by the fmxRate field shall be applicable for all FlexMux packets
in the stream immediately following the FlexMux Clock Reference channel packet up to the next encountered FlexMux
Clock Reference channel packet. When there is no FlexMux stream data present in MBn, no data is removed from MBn.
Bytes from the PES packet header or from the ISO_IEC_14496_section header that immediately precede a FlexMux
header are instantaneously removed and discarded and may be used to control the system. Bytes from the
ISO_IEC_14496_section CRC-32 fields that immediately follow the last FlexMux Stream packet in the section payload
are removed instantaneously and discarded and may be used to verify the integrity of the data. Bytes from the FlexMux
Clock Reference channel are instantaneously removed and discarded and may be used to lock the ISO/IEC 14496 object
time base to the STC. When no PES packet payload bytes or section payload bytes, respectively, are present in MBn, no data
is removed from MBn. All data that enters MBn leaves it. All PES packet payload bytes of stream n enter the FlexMux
demultiplexer instantaneously upon leaving MBn.
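The following simplified Python sketch models the TBn behaviour described above: a 512-byte buffer filled by complete 188-byte Transport Stream packets and drained at Rxn whenever it is non-empty. It makes two assumptions for brevity. First, each packet is treated as entering TBn instantaneously at its arrival time, which is a conservative simplification of byte-by-byte delivery. Second, Rxn is assumed to have been converted from the TB_leak_rate field into bytes per second by the caller. The function name is hypothetical.

```python
from typing import Optional, Sequence

TB_SIZE_BYTES = 512        # size of TBn
TS_PACKET_BYTES = 188      # one complete Transport Stream packet


def tb_overflows(arrival_times_s: Sequence[float], rx_bytes_per_s: float) -> bool:
    """Return True if TBn would exceed 512 bytes.

    arrival_times_s: non-decreasing arrival times of the Transport Stream packets
    of stream n; each packet is assumed to enter TBn instantaneously.
    rx_bytes_per_s: leak rate Rxn, assumed already converted to bytes per second.
    """
    fullness = 0.0
    last_t: Optional[float] = None
    for t in arrival_times_s:
        if last_t is not None:
            # TBn drains at Rxn while non-empty and never goes below zero.
            fullness = max(0.0, fullness - rx_bytes_per_s * (t - last_t))
        fullness += TS_PACKET_BYTES
        if fullness > TB_SIZE_BYTES:
            return True
        last_t = t
    return False


print(tb_overflows([0.0, 0.0, 0.0], rx_bytes_per_s=125_000))  # True
```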
2.11.3.9.3
Definition of FlexMux Buffer, FBnp
For each channel p of a FlexMux stream n, the size of FlexMux buffer FBnp is defined using the FmxBufferSize
descriptor. FlexMux packet payload bytes are transferred from buffer FBnp to decoder buffer DBnp in compliance with
the FlexMux buffer model defined in 11.2.9 of ISO/IEC 14496-1. Only SL packet payload bytes in FlexMux channel p
of FlexMux stream n enter buffer DBnp. The SL packet header bytes in FlexMux channel p of FlexMux stream n are
discarded and may be used to control the system.
2.11.3.9.4 Processing of SL-packetized streams
Complete Transport Stream packets containing data from SL-packetized stream n are passed to the transport buffer for
SL-packetized stream n, TBn. All bytes that enter TBn are removed at a rate Rxn, specified by the TB_leak_rate field in
the MultiplexBuffer descriptor. When there is no data in buffer TBn, rate Rxn is equal to zero. Duplicate Transport
Stream packets are not delivered to MBn.
In case of carriage in PES packets, the PES packet header and payload data bytes are delivered to buffer MBn; all other
bytes leaving TBn do not enter MBn, and may be used to control the system. In case of carriage in
ISO_IEC_14496_sections, the section header, payload and CRC-32 data bytes are delivered to buffer MBn; all other
bytes do not enter MBn and may be used to control the system. In either case the size of MBn is specified by the
MB_buffer_size field in the MultiplexBuffer descriptor.
The SL-packetized stream bytes in buffer MBn are all delivered to the decoder buffer DBn at the rate specified by the
field instantBitRate encoded in the SL-packetized stream and in compliance with the System Decoder Model defined in
7.4 of ISO/IEC 14496-1. The rate specified by the instantBitRate field shall be applicable for all data bytes in the
SL-packetized stream immediately following the instantBitRate field in the SL packet header up to the next encountered
instantBitRate field. If there are no SL-packetized stream bytes in MBn, no bytes are removed from MBn. Bytes from
the PES packet header or from the ISO_IEC_14496_section header that immediately precede an SL packet header are
instantaneously removed and discarded and may be used to control the system. Bytes from the ISO_IEC_14496_section
CRC-32 fields that immediately follow the last SL packet payload byte in the section are removed instantaneously and
discarded and may be used to verify the integrity of the data. When no PES packet payload or section payload data
bytes, respectively, are present in MBn, no data is removed from MBn. All data that enters MBn leaves it. All PES packet
payload bytes of stream n enter buffer DBn instantaneously upon leaving MBn, with the exception of the SL packet
headers. Bytes from the SL packet headers do not enter DBn and may be used to control the system. The size of decoder
buffer DBn is given by the bufferSizeDB of the DecoderConfigDescriptor defined in ISO/IEC 14496-1.
2.11.3.9.5 Buffer management
Transport streams shall be constructed so that conditions defined in this subclause are satisfied.
TBn shall not overflow and shall be empty at least once every second. MBn shall not overflow. FBnp shall not overflow.
DBnp and DBn shall neither underflow nor overflow. Underflow of DBnp occurs when one or more bytes of an access
unit are not present in DBnp at the decoding time associated with this access unit. Underflow of DBn occurs when one or
more bytes of an access unit are not present in DBn at the decoding time associated with this access unit.
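The two TBn conditions above (no overflow, empty at least once every second) can be checked with a simple leaky-bucket simulation. The Python sketch below is illustrative only and rests on stated assumptions: each Transport Stream packet enters TBn instantaneously as a whole (the T-STD actually clocks individual bytes in at their arrival times), the leak rate Rxn has already been converted from the TB_leak_rate field into bytes per second, and the function name and parameters are hypothetical.

```python
def check_transport_buffer(arrivals, tb_size, leak_rate):
    """Simplified check of the TBn rules (hypothetical helper).

    arrivals:  list of (time_s, n_bytes) sorted by time; each packet is
               assumed to enter TBn instantaneously.
    tb_size:   TBn size in bytes.
    leak_rate: Rxn in bytes per second (assumed already converted).
    Returns (overflowed, longest_non_empty_interval_s); the second value
    must not exceed 1.0 for TBn to be empty at least once every second.
    """
    occupancy = 0.0
    prev_t = 0.0
    non_empty_since = None  # instant at which TBn last became non-empty
    longest = 0.0
    overflowed = False

    def drain_until(t):
        # Leak bytes out at leak_rate between prev_t and t.
        nonlocal occupancy, non_empty_since, longest
        if occupancy == 0.0:
            return
        empty_at = prev_t + occupancy / leak_rate
        if empty_at <= t:
            # TBn runs empty before t: close the current non-empty interval.
            longest = max(longest, empty_at - non_empty_since)
            occupancy = 0.0
            non_empty_since = None
        else:
            occupancy -= leak_rate * (t - prev_t)

    for t, n in arrivals:
        drain_until(t)
        if occupancy == 0.0:
            non_empty_since = t
        occupancy += n
        if occupancy > tb_size:
            overflowed = True
        prev_t = t

    drain_until(float("inf"))  # let the last bytes leak out
    return overflowed, longest
```

For example, two 188-byte packets arriving 0.1 s apart into a 512-byte buffer leaking 1000 bytes/s never overflow, and the buffer is non-empty for about 0.376 s in total before it drains.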
2.11.3.10 Carriage within a Transport Stream
2.11.3.10.1 Overview
A Transport Stream may contain one or more programs, each described by a Program Map Table. ISO/IEC 14496
content can be conveyed in addition to the already defined stream types for such a program. Elements of the
ISO/IEC 14496 content may be conveyed in one or more ITU-T Rec. H.222.0 | ISO/IEC 13818-1 program elements
referenced by a unique PID value within a Transport Stream. As a special case, it is possible that a program within a
Transport Stream consists only of ISO/IEC 14496 program elements. ISO/IEC 14496 content associated with a program
and carried in the Transport Stream shall be referenced in the Program Map Table of that program. An initial object
descriptor shall be used to define an ISO/IEC 14496-1 scene; the use of this descriptor is specified in 2.11.3.10.2.
Carriage of ISO/IEC 14496 content in a PID is signalled by a stream_type value of 0x12 or 0x13 in the Program Map
Table in association with that PID value. A value of 0x12 indicates carriage in PES packets. The stream_id field in the
PES packet header signals whether the PES packet contains a single SL packet or a number of FlexMux packets. A
stream_type value of 0x13 in the Program Map Table indicates that the program element carries an object descriptor
stream or a BIFS-Command stream contained in sections. In this case the table_id in the section header indicates
whether the sections carry an object descriptor stream or a BIFS-Command stream. See also Table 2-95. The
section contains either a single SL packet or a number of FlexMux packets, as indicated by the presence of an SL
descriptor or a FMC descriptor respectively in the descriptor loop of the Program Map Table for the ITU-T Rec.
H.222.0 | ISO/IEC 13818-1 program element that carries the sections. When ISO/IEC 14496 content is carried, the SL
descriptor and the FMC descriptor shall specify the ES_ID for each encapsulated ISO/IEC 14496 stream. When the
assignment of ES_ID values changes, the Program Map Table shall be updated and the version_number of the PMT
shall be incremented by 1 modulo 32. An example of a content access procedure for ISO/IEC 14496 program
components within a Transport Stream is given in Annex R.
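The signalling combinations described above (and tabulated in Table 2-95) can be captured in a small lookup. This Python sketch is a hypothetical helper; the string keys and function name are illustrative assumptions, while the stream_type, stream_id and table_id values are those given in the text and Table 2-95.

```python
SL, FLEXMUX = "SL", "FlexMux"
PES, SECTIONS = "PES", "sections"
# table_id values for ISO_IEC_14496_sections (Table 2-95)
SECTION_TABLE_IDS = {"object_descriptor": 0x05, "scene_description": 0x04}


def carriage_signalling(content, encapsulation, carriage):
    """Return the signalling for one option of Table 2-95 (hypothetical helper).

    content:       "object_descriptor", "scene_description" or "other".
    encapsulation: SL (SL packets) or FLEXMUX (SL packets in FlexMux packets).
    carriage:      PES or SECTIONS.
    """
    if encapsulation not in (SL, FLEXMUX):
        raise ValueError(f"unknown encapsulation {encapsulation!r}")
    if carriage == PES:
        # stream_id '1111 1010' = single SL packet, '1111 1011' = FlexMux packets
        stream_id = 0xFA if encapsulation == SL else 0xFB
        return {"stream_type": 0x12, "stream_id": stream_id}
    if carriage == SECTIONS:
        if content not in SECTION_TABLE_IDS:
            raise ValueError("only OD and scene description streams use sections")
        # In sections, SL vs FlexMux is signalled by the SL or FMC descriptor
        # in the PMT, so both encapsulations share the same table_id.
        return {"stream_type": 0x13, "table_id": SECTION_TABLE_IDS[content]}
    raise ValueError(f"unknown carriage {carriage!r}")
```

Keeping the SL/FlexMux distinction out of the section branch mirrors the text: for sections, that choice is signalled by PMT descriptors rather than by table_id.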
Table 2-95 – ISO/IEC defined options for carriage of an ISO/IEC 14496 scene and associated streams in ITU-T Rec. H.222.0 | ISO/IEC 13818-1

| ISO/IEC 14496 stream | Encapsulation in SL packets | Encapsulation in SL packets followed by multiplex into FlexMux packets |
|---|---|---|
| ISO/IEC 14496-1 object descriptor streams | Carriage in PES packets: stream_type = 0x12, stream_id = '1111 1010'. Carriage in ISO_IEC_14496_sections: stream_type = 0x13, table_id = 0x05 | Carriage in PES packets: stream_type = 0x12, stream_id = '1111 1011'. Carriage in ISO_IEC_14496_sections: stream_type = 0x13, table_id = 0x05 |
| ISO/IEC 14496-1 scene description streams | Carriage in PES packets: stream_type = 0x12, stream_id = '1111 1010'. Carriage in ISO_IEC_14496_sections: stream_type = 0x13, table_id = 0x04 | Carriage in PES packets: stream_type = 0x12, stream_id = '1111 1011'. Carriage in ISO_IEC_14496_sections: stream_type = 0x13, table_id = 0x04 |
| All other ISO/IEC 14496 streams | Carriage in PES packets: stream_type = 0x12, stream_id = '1111 1010' | Carriage in PES packets: stream_type = 0x12, stream_id = '1111 1011' |

2.11.3.10.2 Initial Object Descriptor
In case of carriage of an ISO/IEC 14496-1 scene, the ISO/IEC 14496-1 initial object descriptor serves as the initial
access point to all associated streams. The initial object descriptor shall be conveyed in the IOD descriptor located in
the descriptor loop immediately following the program_info_length field in the Program Map Table of the program to
which the scene is associated. It contains ES_Descriptors identifying the scene description and object descriptor streams
that form part of this program. It may also contain ES_Descriptors identifying one or more associated IPMP or
OCI streams. Identification of streams is done by means of ES_IDs as specified in clause 8 of ISO/IEC 14496-1.
2.11.3.11 P-STD Model for 14496 content
Figure 2-6 shows the STD model when ISO/IEC 14496 systems data are carried in a Program Stream.
[Figure 2-6 (diagram): the i-th byte of the Program Stream enters the P-STD at time t(i) and passes through Program Stream demultiplexing into input buffers B1 … Bn. FlexMux stream 1 passes from B1 through FlexMux demultiplexing (FlexMux buffer model, see ISO/IEC 14496-1) into FlexMux buffers FB11 … FB1m and decoder buffers DB11 … DB1m; decoder D1p decodes access unit A1p(j) at td1p(j) and produces composition unit C1p(k) at tc1p(k). SL-packetized stream n passes from Bn into DBn; decoder Dn produces Cn(k) at tcn(k).]
Figure 2-6 – P-STD model for ISO/IEC 14496 Systems stream
The following notation is used in Figure 2-6 and its description:
Bn is the input buffer for FlexMux stream n or for SL-packetized stream n.
FBnp is the FlexMux buffer for the elementary stream in FlexMux channel p of FlexMux stream n.
DBnp is the decoder buffer for the elementary stream in FlexMux channel p of FlexMux stream n.
DBn is the decoder buffer for elementary stream n.
Dnp is the decoder for the elementary stream in FlexMux channel p of FlexMux stream n.
Dn is the decoder for elementary stream n.
Anp(j) is the jth access unit in the elementary stream in FlexMux channel p of FlexMux stream n. Anp(j) is indexed in decoding order.
An(j) is the jth access unit in elementary stream n. An(j) is indexed in decoding order.
tdnp(j) is the decoding time, measured in seconds, in the system target decoder of the jth access unit in the elementary stream in FlexMux channel p of FlexMux stream n.
tdn(j) is the decoding time, measured in seconds, in the system target decoder of the jth access unit in elementary stream n.
Cnp(k) is the kth composition unit in the elementary stream in FlexMux channel p of FlexMux stream n. Cnp(k) results from decoding Anp(j). Cnp(k) is indexed in composition order.
Cn(k) is the kth composition unit in elementary stream n. Cn(k) results from decoding An(j). Cn(k) is indexed in composition order.
tcnp(k) is the composition time, measured in seconds, in the system target decoder of the kth composition unit in the elementary stream in FlexMux channel p of FlexMux stream n.
tcn(k) is the composition time, measured in seconds, in the system target decoder of the kth composition unit in elementary stream n.
t(i) indicates the time in seconds at which the ith byte of the Program Stream enters the system target decoder.
2.11.3.11.1 Processing of FlexMux streams
At the input of the STD each byte in the payload of PES packets carrying a FlexMux stream n is transferred
instantaneously to buffer Bn. The i-th byte enters Bn at time t(i). PES packet header bytes do not enter buffer Bn and may
be used to control the system. The size of Bn is specified by the P-STD_buffer_size field in the header of the
PES packet that carries stream n.
The FlexMux stream packet bytes in buffer Bn are all delivered to their associated FlexMux buffer at the rate specified
by the field fmxRate encoded in the FlexMux stream and in compliance with the FlexMux buffer model defined in
11.2.9 of ISO/IEC 14496-1. Only FlexMux packet payload data bytes in FlexMux channel p of FlexMux stream n enter
buffer FBnp. FlexMux packet header bytes in FlexMux channel p of FlexMux stream n are discarded and may be used to
control the system. The rate specified by the fmxRate field shall be applicable for all FlexMux packets in the stream up
to the next encountered FlexMux Clock Reference channel packet. Bytes from the FlexMux Clock Reference channel
are instantaneously removed and discarded and may be used to lock the ISO/IEC 14496 object time base to the STC.
When there is no PES packet payload data present in Bn, no data is removed from Bn. All data that enters Bn leaves it.
All PES packet payload bytes of stream n enter the FlexMux demultiplexer instantaneously upon leaving Bn.
2.11.3.11.2 Definition of FlexMux Buffer, FBnp
For each channel p of a FlexMux stream n, the size of FlexMux buffer FBnp is defined using the FmxBufferSize
descriptor if a Program Stream Map is present in the Program Stream. FlexMux packet payload bytes are transferred
from buffer FBnp to decoder buffer DBnp in compliance with the FlexMux buffer model defined in 11.2.9 of
ISO/IEC 14496-1. Only SL packet payload bytes in FlexMux channel p of FlexMux stream n enter buffer DBnp. The
SL packet header bytes in FlexMux channel p of FlexMux stream n are discarded and may be used to control the system.
2.11.3.11.3 Processing of SL-packetized streams
At the input of the STD each byte in the payload of PES packets carrying an SL-packetized stream n is transferred
instantaneously to buffer Bn. The i-th byte enters Bn at time t(i). PES packet header bytes do not enter buffer Bn and may
be used to control the system. The size of Bn is specified by the P-STD_buffer_size field in the header of the PES
packet that carries stream n. The SL-packetized stream bytes in buffer Bn are delivered to the decoder buffer DBn at the
rate specified by the field instantBitRate encoded in the SL-packetized stream and in compliance with the System
Decoder Model defined in 7.4 of ISO/IEC 14496-1. The rate specified by the instantBitRate field shall be applicable for
all data bytes in the SL-packetized stream up to the next encountered instantBitRate field. When there is no PES packet
payload data present in Bn, no data is removed from Bn. All data that enters Bn leaves it. All bytes of stream n enter
buffer DBn instantaneously upon leaving Bn, with the exception of the SL packet headers. Bytes from the SL packet
headers do not enter DBn and may be used to control the system. The size of decoder buffer DBn is given by the
bufferSizeDB of the DecoderConfigDescriptor defined in ISO/IEC 14496-1.
2.11.3.11.4 Buffer management
Program Streams shall be constructed so that Bn does not overflow. FBnp shall not overflow. DBnp and DBn shall neither
underflow nor overflow. Underflow of DBnp occurs when one or more bytes of an access unit are not present in DBnp at
the decoding time associated with this access unit. Underflow of DBn occurs when one or more bytes of an access unit
are not present in DBn at the decoding time associated with this access unit.
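The DBn underflow condition can be made concrete with a simplified timing check. The Python sketch below assumes a constant instantBitRate, assumes Bn never runs dry (so stream bytes leave Bn back to back from a given start time), and does not check DBn overflow; the function name and the tuple layout are hypothetical. SL packet header bytes consume delivery time but, as stated above, do not enter DBn.

```python
def find_db_underflows(access_units, start_time, instant_bit_rate):
    """Return indices of access units not complete in DBn at their decoding time.

    access_units:     list of (sl_header_bytes, payload_bytes, td_seconds) in
                      decoding order (hypothetical layout).
    start_time:       time at which delivery from Bn to DBn begins, in seconds.
    instant_bit_rate: delivery rate in bits per second, assumed constant.
    """
    underflows = []
    delivered_bytes = 0
    for index, (header_bytes, payload_bytes, td) in enumerate(access_units):
        # Header bytes are removed from Bn at the same rate but are discarded
        # instead of entering DBn, so they still delay the payload.
        delivered_bytes += header_bytes + payload_bytes
        last_byte_time = start_time + 8 * delivered_bytes / instant_bit_rate
        if last_byte_time > td:
            underflows.append(index)  # one or more bytes missing at td
    return underflows
```

At 8000 bit/s, two access units of 10 header bytes and 490 payload bytes finish arriving at 0.5 s and 1.0 s, so a decoding time of 0.9 s for the second one underflows.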
2.11.3.12 Carriage within a Program Stream
2.11.3.12.1 Overview
A Program Stream contains only one program. ISO/IEC 14496 data can be conveyed in addition to the already defined
stream types for such a program. As a special case, it is also possible that a Program Stream carries only ISO/IEC 14496
data. If a Program Stream Map is present, ISO/IEC 14496 content carried in the Program Stream shall be referenced as
follows. Carriage of ISO/IEC 14496-1 scenes and associated ISO/IEC 14496 streams in SL and FlexMux packets is
indicated by the appropriate stream_id and by an initial object descriptor; the use of this descriptor is specified in
2.11.3.12.2. For each carried ISO/IEC 14496 stream the SL descriptor and the FMC descriptor shall specify the ES_ID.
When the assignment of ES_ID values changes, the Program Stream Map, if present, shall be updated and the
program_stream_map_version shall be incremented by 1 modulo 32. Note that in a Program Stream the ISO/IEC 14496
content may also be referenced by private means.
For an example of a content access procedure for ISO/IEC 14496 program components within a Program Stream, see
Annex R.
2.11.3.12.2 Initial object descriptor
In case of carriage of an ISO/IEC 14496-1 scene, the ISO/IEC 14496 initial object descriptor serves as the initial access
point to all associated streams. If a Program Stream Map is present in the Program Stream, the initial object descriptor
shall be conveyed in the IOD descriptor that is located in the descriptor loop immediately following the
program_stream_info_length field. It contains ES_Descriptors identifying the scene description and object descriptor
streams of the scene that form part of this program. It may also contain ES_Descriptors identifying one or more
associated IPMP or OCI streams. Identification of streams is done by means of ES_IDs as specified in clause 8 of
ISO/IEC 14496-1. In a Program Stream, the initial object descriptor may also be conveyed by private means.
2.12 Carriage of metadata
2.12.1 Introduction
An ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream can carry metadata. The format of the metadata may be defined by
ISO or by any other authority. This subclause defines how to carry the metadata; transport mechanisms are defined as
well as metadata-related signalling, the applied metadata timing model and extensions of the STD model for decoding
of metadata.
A metadata service is defined to be a coherent set of metadata of the same format delivered to a receiver for a specific
purpose. Metadata services are contained in metadata streams; each metadata stream carries one or more metadata
services. This Specification assumes the notion of metadata Access Units within a metadata service. The definition of a
Metadata Access Unit is metadata format specific, but each metadata service is assumed to represent a concatenation (or
a collection) of metadata Access Units.
When transporting a metadata service over an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, a unique metadata
service id is assigned to each such service. A metadata service id uniquely identifies a metadata service among all the
metadata services available in the same Transport Stream or Program Stream; it is not merely unique within a metadata stream.
The metadata service identifier is used to retrieve the metadata service and all the information needed to decode it.
Decoding of metadata may require the availability of decoder configuration data. If a metadata service carried in an
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream requires decoder configuration data for decoding, then this metadata
decoder configuration data shall be carried within the same program of the same ITU-T Rec. H.222.0 |
ISO/IEC 13818-1 stream.
Subclause 2.12.2 discusses metadata timing, while 2.12.3 provides an overview of tools that are defined for transport of
metadata over an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream. The use of available transport tools is specified in
2.12.4 up to 2.12.8, and 2.12.9 specifies metadata related signalling. Finally, the STD model for metadata decoding is
specified in 2.12.10.
Since many forms of metadata may be carried, it is essential to signal both the precise format and encoding of the
metadata, and the semantic meaning the metadata conveys. The former is signalled by the metadata format, while the
latter is signalled by the metadata application format. In other words, the metadata format conveys how the metadata
shall be decoded, while the metadata application format conveys how to use the metadata, essentially which application
uses the metadata. This division is important since it separates the encoding or representation of the metadata from its
meaning, thereby allowing an application to be agnostic of the means by which its metadata is conveyed.
2.12.2 Metadata time-line model
Metadata may refer to time codes associated with the content, for example to indicate the beginning of a content segment.
Each time indication made in the metadata refers to a certain metadata content time line specific to the actual metadata
format and/or metadata application format. For example, one metadata (application) format may use UTC, while
another metadata application format may use SMPTE time codes. To allow for transport of the content at any time over
any media, the metadata content time line is expected but not required to be transport agnostic.
When transporting content and the associated metadata over ITU-T Rec. H.222.0 | ISO/IEC 13818-1 streams, accurate
time references from the metadata to the content are to be maintained. The same is needed if the metadata is delivered
over other means. To achieve this, the time line model of Figure 2-7 is assumed in this Specification.
[Figure 2-7 (diagram): metadata timing is transport agnostic. At production, the metadata time MT is expressed on the content time line specified in the metadata (format is metadata specific: UTC, SMPTE, …). At transport, the A/V content, shown as A/V Content (1) and A/V Content (2), is placed on the delivery time line (STC). The receiver time RT is expressed on the receiver content time line (single format).]
Figure 2-7 – Timing model for delivery of content and metadata
Metadata is associated with the audiovisual content, usually in a transport agnostic way, at production or at any other stage
prior to transport. Where needed, time information expressed on the metadata content time line is embedded in the metadata
to indicate, for example, specific segments within the content; UTC or SMPTE time codes may be used for this purpose. The
time line format is independent of any time code that may or may not be embedded in the audiovisual stream itself. For
example, the metadata time line may utilize UTC, while SMPTE time codes are embedded in the video stream.
The following requirements shall be met for each metadata stream:
• no time discontinuities shall occur in the metadata content time line;
• the metadata content time line shall be locked to the sampling clock of the content;
• each time reference in the metadata stream refers to the same metadata content time line.
At transport, a transport-specific timing is associated with the content; this is the delivery time line. In the case of
transport over an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, the delivery time line is provided by the System Time
Clock, the STC. The content may be delivered as a contiguous piece of information, but it is also possible to interrupt
the delivery of the content, for example in the case of news-flash interruptions of a program; in these and other cases
time line discontinuities may occur.
When the metadata uses time references, these time references are to be associated unambiguously, in the System Target
Decoder (STD), with time values in the received content. To achieve this, a receiver content time line is
required. The STC can be used as the receiver content time line, but due to STC discontinuities that may occur, the STC
does not necessarily offer an unambiguous time association. Therefore the NPT (Normal Play Time) concept from
ISO/IEC 13818-6 DSM-CC is also available for use as the receiver content time line. In any playback mode, such as
normal, reverse, slow motion, fast forward, fast backward and still picture, the NPT provides an unambiguous time
association, independent of STC discontinuities, and independent of insertions of other content. Note that a new
NPT_reference_descriptor needs to be transmitted when the STC rolls over.
To maintain the accurate time references from metadata to the content, information is needed on how to map a metadata
time, MT, defined on the metadata content time line to the corresponding receiver time, RT, of the receiver content time
line. This is achieved by providing the offset in time (in 90-kHz units) between the metadata content time line and the
receiver content time line. The offset is provided in the content labelling descriptor. The offset conveys the value of the
metadata time base at the instant in time at which the receiver content time base reaches a specified value. See also
Figure 2-7.
The timing in metadata systems may refer to a specific picture or audio frame, for example using SMPTE time codes.
The offset in time between the metadata content time line and the receiver content time line is expressed in units of
90 kHz, and consequently the metadata time reference will translate into a 90-kHz value in receivers. To accommodate
inaccuracies, receivers shall assume that when reference is made to a picture or audio frame the closest match shall
be used. For example, the translated 90-kHz metadata time reference shall be matched with the picture or frame whose
PTS value is closest to the translated value.
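The mapping from MT to RT and the closest-match rule can be sketched as follows. This Python sketch assumes that the metadata time has already been converted to 90-kHz ticks, that the offset is given as a pair of values (the metadata time base value at the instant the receiver content time base reaches a given value), and that 33-bit wrap-around is ignored. Both function names, their parameters, and the tie-breaking rule (earlier PTS wins) are hypothetical choices, not requirements of this Specification.

```python
def metadata_to_receiver_time(mt, metadata_time_at_ref, receiver_time_at_ref):
    """Map a metadata time MT onto the receiver content time line (RT).

    All values are in 90-kHz ticks. metadata_time_at_ref is the value of the
    metadata time base at the instant the receiver content time base reaches
    receiver_time_at_ref. 33-bit wrap-around is ignored in this sketch.
    """
    return mt - metadata_time_at_ref + receiver_time_at_ref


def closest_pts(rt, pts_values):
    """Return the PTS closest to rt; ties go to the earlier PTS (an assumption)."""
    return min(pts_values, key=lambda pts: (abs(pts - rt), pts))


pts = [1_000_000 + 3600 * k for k in range(200)]  # pictures at 25 Hz
rt = metadata_to_receiver_time(901_000, 450_000, 1_000_000)  # 1_451_000
print(closest_pts(rt, pts))  # 1450000
```

Here the translated value falls 1000 ticks after one picture and 2600 ticks before the next, so the earlier picture is the closest match.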
When using NPT, during playback in any mode at any point in time the offset remains constant between the metadata
time base and the NPT time base. As long as neither STC discontinuities nor insertions with other content occur, the
same is true for the offset in time between the metadata time base and the STC time base, but only in normal playback
mode. For privately defined time lines the offset is also required to be constant, but possibly within constraints not
defined in this Specification.
When synchronous transport of metadata is applied in PES packets or by using the synchronized DSM-CC download
protocol, PTSs are assigned to the metadata. Such a PTS may, for example, indicate the point in time at which the
metadata becomes valid. This implies a priori knowledge of how to associate the metadata to the delivery timing.
However, synchronously transported metadata may also contain time references, which are to be mapped from the
metadata content time line to the receiver content time line using the specified offset between both time lines. See also
Figure 2-8.
[Figure 2-8 (diagram): as in Figure 2-7, with Metadata 1, Metadata 2 and Metadata 3 positioned on the content time line specified in the metadata (MT; format is metadata specific: UTC, SMPTE, …). At transport, they are delivered in PES packets on the delivery time line (STC) with PTS1, PTS2 and PTS3 respectively, together with A/V Content (1) and A/V Content (2); RT is expressed on the receiver content time line (single format).]
Figure 2-8 – Delivery of metadata in PES packets
2.12.3 Options for transport of metadata
To accommodate the very diverse characteristics of metadata, a variety of tools is defined to transport the metadata over
an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream.
This Specification defines two tools for synchronous delivery of the metadata:
• carriage in PES packets;
• use of DSM-CC synchronized download protocol.
In addition, this Specification defines three tools for asynchronous delivery of metadata:
• carriage in metadata sections;
• use of DSM-CC data carousels;
• use of DSM-CC object carousels.
Note that some of the asynchronous transport options support carousels and file structures. The choice of transport tool
depends on the requirements that apply to the delivery of the metadata, and the requirements of the tools, as described
in the following subclauses.
Metadata may also be carried by private means such as PES packets with stream_id value 0xBD or 0xBF
(private_stream_1 or private_stream_2) or private sections. This Specification does not specify how to use private
means for carriage of metadata, but allows for signalling of such metadata using the descriptors defined in 2.6.56 up to
2.6.63.
The basic referencing of metadata services is the same for all tools, using the metadata service id. However, there are
differences per tool. When PES packets, metadata sections, or synchronized DSM-CC download sections are used, data
from each metadata service is explicitly signalled within a metadata stream, using the metadata_service_id field.
However, when using DSM-CC carousels, this signalling is left at the discretion of metadata applications. Note that this
Specification allows for carriage of a metadata service in a DSM-CC carousel, but does not constrain how many
metadata services can be carried in one DSM-CC carousel.
Metadata decoder configuration data is signalled explicitly when carried in a metadata descriptor, in PES packets with
stream_type 0x15 and stream_id 0xFC, in metadata sections or in synchronized DSM-CC download sections. When
metadata decoder configuration data is carried in a DSM-CC carousel, the signalling of such data is required, but not
defined by this Specification; instead, such signalling is left at the discretion of applications.
2.12.4 Use of PES packets to transport metadata
PES packets provide a mechanism for synchronous transport of metadata. By means of the PTS in the PES packet
header the metadata access units are associated with a certain instant of the STC, without the need for time references in
the metadata. This implies a priori knowledge of how to associate the metadata to the delivery timing. Specific
stream_id and stream_type values are assigned to signal PES packets carrying metadata; see 2.12.9.
When using PES packets with a stream_type of 0x15 and a stream_id of 0xFC to transport the metadata, a Metadata
Access Unit Wrapper shall be used as the tool to align PES packets and the metadata Access Units, using
metadata_AU_cells. This allows random access indication, whose meaning depends on the format of the metadata, and
a cell sequence counter to identify loss of metadata_AU_cells. Each metadata Access Unit is carried and, if appropriate,
fragmented in one or more metadata_AU_cells. In each PES packet that carries metadata, the first
PES_packet_data_byte shall be the first byte of a Metadata_AU_cell. For each metadata Access Unit contained in the
same PES packet, the PTS in the PES header applies. The PTS signals the time at which the metadata Access Units are
decoded instantaneously and removed from buffer Bn in the STD. Note that the relationship between a decoded
metadata Access Unit and audiovisual content is beyond the scope of this Specification.
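A receiver handling such packets reads the PTS from the PES header and treats the payload as a sequence of Metadata_AU_cells, all sharing that PTS. The Python sketch below assumes the usual PES header layout for this stream_id (packet_start_code_prefix, stream_id, PES_packet_length, two flag bytes, PES_header_data_length, then a 5-byte PTS when PTS_DTS_flags indicate one) and skips all other optional header fields via PES_header_data_length; the function name is hypothetical and error handling is minimal.

```python
def parse_metadata_pes(pkt: bytes):
    """Return (pts, payload) for a PES packet with stream_id 0xFC.

    pts is None when the header carries no PTS. The payload should start with
    the first byte of a Metadata_AU_cell; the PTS applies to every cell in it.
    """
    if pkt[:3] != b"\x00\x00\x01":
        raise ValueError("missing packet_start_code_prefix")
    if pkt[3] != 0xFC:
        raise ValueError("not a metadata PES packet (stream_id 0xFC)")
    pes_packet_length = int.from_bytes(pkt[4:6], "big")
    if pkt[6] >> 6 != 0b10:
        raise ValueError("unexpected PES header marker bits")
    pts_dts_flags = pkt[7] >> 6
    header_data_length = pkt[8]
    pts = None
    if pts_dts_flags & 0b10:  # '10' or '11': a PTS is present
        b = pkt[9:14]
        # Reassemble the 33-bit PTS, skipping the marker bits.
        pts = ((((b[0] >> 1) & 0x07) << 30) | (b[1] << 22)
               | (((b[2] >> 1) & 0x7F) << 15) | (b[3] << 7) | (b[4] >> 1))
    end = 6 + pes_packet_length if pes_packet_length else len(pkt)
    payload = pkt[9 + header_data_length:end]
    return pts, payload
```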
A PES packet may contain a single metadata_AU_cell. This is useful if a metadata Access Unit does not fit into a single
PES packet, in which case the fragmentation of the metadata Access Unit is handled by the metadata_AU_cell.
When metadata is carried by PES packets in a Program Stream, and if a Program Stream Map is applied in that Program
Stream, then the Program Stream Map shall specify which PES packets contain the associated metadata.
2.12.4.1 Metadata Access Unit Wrapper
The metadata Access Unit Wrapper shall be used when carrying metadata Access Units in PES packets with a
stream_type of 0x15 and a stream_id value of 0xFC or in synchronized DSM-CC download sections of stream_type
0x19. The wrapper defines a structure consisting of a concatenated number of Metadata_AU_cells. By coding the size
of the contained metadata in each metadata_AU_cell, metadata agnostic parsing is possible in receivers: the parser can
retrieve the metadata and provide it to a metadata decoder without a priori knowledge of any detail of the metadata.
The Metadata_AU_cell shall be aligned with the transport; that is, the first byte of the payload of the PES packet or
synchronized DSM-CC download section shall be the first byte of a Metadata_AU_cell.
If a metadata Access Unit does not fit entirely into a metadata_AU_cell, then the metadata Access Unit shall be
fragmented into multiple metadata_AU_cells, where the fragmentation_indication in each such metadata_AU_cell
signals that the metadata_AU_cell contains a fragment.
To each Metadata_AU_cell that is contained in the same PES packet or synchronized download section, the PTS as
coded in the header of the PES packet or synchronized download section, respectively, applies.
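On the sending side, fragmentation of an Access Unit into Metadata_AU_cells follows the field layout of Table 2-97 and the cell_fragment_indication codes of Table 2-98 below. The Python sketch is a hypothetical helper built on several assumptions: sequence_number wraps modulo 256, the four reserved bits are set to '1', and decoder_config_flag and random_access_indicator are copied into every cell of the Access Unit (whether continuation cells should carry them is not settled here).

```python
def build_metadata_au_cells(au, service_id, first_sequence_number,
                            max_cell_data=0xFFFF, random_access=False,
                            decoder_config=False):
    """Split one metadata Access Unit into serialized Metadata_AU_cells.

    Returns (cells, next_sequence_number). The sequence counter is shared by
    all cells of the wrapper, whatever their metadata_service_id.
    """
    chunks = [au[i:i + max_cell_data]
              for i in range(0, len(au), max_cell_data)] or [b""]
    cells = []
    seq = first_sequence_number
    for k, chunk in enumerate(chunks):
        first, last = k == 0, k == len(chunks) - 1
        if first and last:
            frag = 0b11  # complete Access Unit in a single cell
        elif first:
            frag = 0b10  # first cell of a series
        elif last:
            frag = 0b01  # last cell of a series
        else:
            frag = 0b00  # neither first nor last
        # cell_fragment_indication(2) | decoder_config_flag(1) |
        # random_access_indicator(1) | reserved(4, assumed '1111')
        flags = (frag << 6) | (int(decoder_config) << 5) | (int(random_access) << 4) | 0x0F
        header = bytes([service_id, seq, flags]) + len(chunk).to_bytes(2, "big")
        cells.append(header + chunk)
        seq = (seq + 1) % 256
    return cells, seq
```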
Table 2-96 – Metadata Access Unit Wrapper
Syntax                                        No. of bits   Mnemonic
Metadata_AU_wrapper () {
    for (i = 0; i < N; i++) {
        Metadata_AU_cell ()
    }
}
Table 2-97 – Metadata AU cell
Syntax                                        No. of bits   Mnemonic
Metadata_AU_cell () {
    metadata_service_id                       8             uimsbf
    sequence_number                           8             uimsbf
    cell_fragment_indication                  2             bslbf
    decoder_config_flag                       1             bslbf
    random_access_indicator                   1             bslbf
    reserved                                  4             bslbf
    AU_cell_data_length                       16            uimsbf
    for (i = 0; i < AU_cell_data_length; i++) {
        AU_cell_data_byte                     8             bslbf
    }
}
metadata_service_id: This 8-bit field identifies the metadata service associated with the metadata Access Unit carried in
this metadata AU cell.
sequence_number: This 8-bit field specifies the sequence number of the metadata_AU_cell. This number increments
by one for each successive metadata_AU_cell constituting the metadata_AU_wrapper, independent of the coded value
of the metadata_service_id.
cell_fragment_indication: This 2-bit field conveys information on the metadata Access Unit carried in this
metadata_AU_cell, corresponding to Table 2-98.
Table 2-98 – Cell fragment indication
Value   Description
11      A single cell carrying a complete metadata Access Unit.
10      The first cell from a series of cells with data from one metadata Access Unit.
01      The last cell from a series of cells with data from one metadata Access Unit.
00      A cell from a series of cells with data from one metadata Access Unit, but neither the first nor the last one.
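The four values of Table 2-98 imply a simple reassembly rule: a '10' or '11' cell starts a metadata Access Unit, and a '01' or '11' cell completes it. The C sketch below applies that rule. The type au_reassembler, the function au_push, the fixed 8192-byte capacity, and the choice to drop an unfinished Access Unit when a new one starts are all assumptions of the sketch, not requirements of this Specification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values of the 2-bit cell_fragment_indication (Table 2-98). */
enum { FRAG_MIDDLE = 0x0, FRAG_LAST = 0x1, FRAG_FIRST = 0x2, FRAG_SINGLE = 0x3 };

/* Hypothetical reassembly state; the fixed capacity is an assumption. */
typedef struct {
    uint8_t buf[8192];
    size_t  len;
    int     in_progress;   /* 1 between a FIRST/SINGLE cell and completion */
} au_reassembler;

/* Feeds one cell payload. Returns 1 when buf[0..len) holds a complete
 * metadata Access Unit, 0 when more cells are needed, and -1 when the cell
 * cannot be used (continuation without a start, or AU too large). */
static int au_push(au_reassembler *r, unsigned frag,
                   const uint8_t *data, size_t n)
{
    if (frag == FRAG_FIRST || frag == FRAG_SINGLE) {
        r->len = 0;                /* any unfinished AU is dropped */
        r->in_progress = 1;
    } else if (!r->in_progress) {
        return -1;                 /* MIDDLE or LAST without a FIRST cell */
    }
    if (n > sizeof r->buf - r->len) {
        r->in_progress = 0;
        r->len = 0;
        return -1;                 /* larger than this sketch supports */
    }
    memcpy(r->buf + r->len, data, n);
    r->len += n;
    if (frag == FRAG_SINGLE || frag == FRAG_LAST) {
        r->in_progress = 0;
        return 1;                  /* complete Access Unit available */
    }
    return 0;
}
```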
random_access_indicator: This 1-bit field, when coded with the value '1', indicates that the metadata carried in this
metadata_AU_cell represents an entry point to the metadata service where decoding is possible without information
from previous metadata_AU_cells. The meaning of a random access point is defined by the format of the metadata.
decoder_config_flag: This 1-bit field signals the presence of decoder configuration information in the carried metadata
Access Unit. Note that this does not preclude the presence of metadata in the Access Unit alongside the decoder
configuration data.
AU_cell_data_length: This 16-bit field specifies the number of AU_cell_data_bytes immediately following.
AU_cell_data_byte: This 8-bit field contains contiguous bytes from a metadata Access Unit.
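The fixed part of a Metadata_AU_cell occupies 40 bits (5 bytes) ahead of the AU_cell_data_bytes, so it can be parsed with a few shifts and masks. The following C sketch, with the hypothetical names metadata_au_cell and parse_au_cell, reads one cell from a byte buffer; it ignores the reserved bits and does not check sequence_number continuity.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the 5-byte Metadata_AU_cell header (Table 2-97). */
typedef struct {
    uint8_t  metadata_service_id;
    uint8_t  sequence_number;
    uint8_t  cell_fragment_indication;   /* 2 bits */
    uint8_t  decoder_config_flag;        /* 1 bit  */
    uint8_t  random_access_indicator;    /* 1 bit  */
    uint16_t AU_cell_data_length;
    const uint8_t *AU_cell_data;         /* points into the input buffer */
} metadata_au_cell;

/* Parses one cell from p[0..n). Returns the number of bytes consumed
 * (5 + AU_cell_data_length), or 0 if the buffer is too short. */
static size_t parse_au_cell(const uint8_t *p, size_t n, metadata_au_cell *c)
{
    if (n < 5) return 0;
    c->metadata_service_id      = p[0];
    c->sequence_number          = p[1];
    c->cell_fragment_indication = (uint8_t)(p[2] >> 6);
    c->decoder_config_flag      = (uint8_t)((p[2] >> 5) & 1);
    c->random_access_indicator  = (uint8_t)((p[2] >> 4) & 1);
    /* the low 4 bits of p[2] are reserved and ignored */
    c->AU_cell_data_length      = (uint16_t)((p[3] << 8) | p[4]);
    if (n - 5 < c->AU_cell_data_length) return 0;
    c->AU_cell_data = p + 5;
    return 5u + c->AU_cell_data_length;
}
```

Because the first byte of each PES packet payload or synchronized download section payload is the first byte of a cell, a Metadata_AU_wrapper could be walked by calling parse_au_cell repeatedly and advancing by its return value until the payload is exhausted.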
2.12.5 Use of the DSM-CC synchronized download protocol to transport metadata
Besides PES packets, the DSM-CC synchronized download protocol can also be used for synchronized transport.
When synchronized DSM-CC download sections are used to transport the metadata, the Metadata Access Unit Wrapper
defined in 2.12.4.1 shall be used to encapsulate metadata Access Units. The wrapper provides a random access
indication, whose meaning depends on the format of the metadata, and a cell sequence counter with which the loss of
metadata_AU_cells can be detected. In each DSM-CC synchronized download section that carries metadata, the first
byte of the payload shall be the first byte of a Metadata_AU_cell. The PTS in the section header applies to each
metadata Access Unit contained in that DSM-CC synchronized download section. The PTS signals the time at which the
metadata Access Units are decoded instantaneously and removed from buffer Bn in the STD. Note that the relationship
between a decoded metadata Access Unit and audiovisual content is beyond the scope of this Specification. A specific
stream_type value (as detailed in Table 2-34) is assigned to signal carriage of metadata in DSM-CC synchronized
download sections.
2.12.6 Use of metadata sections to transport metadata
If asynchronous transport of metadata Access Units without a carousel delivery mechanism is needed, metadata sections
can be used. The syntax and semantics of metadata sections are defined in this subclause. Each metadata section
shall carry either one complete metadata Access Unit or a single fragment of one metadata Access Unit, as signalled by
the section_fragment_indication field.
For transport in metadata sections, the metadata Access Units are structured into one or more Metadata Tables. Each
Metadata Table contains one or more complete metadata Access Units from one or more metadata services, and may be
made up of multiple metadata sections. Conceptually, the transport mechanism of Metadata Tables is comparable to that
of Program Map Tables and Program Association Tables.
Specific stream_type and table_id values are assigned to metadata sections. Metadata decoder configuration data can
also be carried in sections, signalled by a metadata description value, as assigned by the metadata decoder configuration
descriptor.
Table 2-99 – Section syntax for transport of metadata
Syntax                                          No. of bits   Mnemonic
Metadata_section() {
    table_id                                    8             uimsbf
    section_syntax_indicator                    1             bslbf
    private_indicator                           1             bslbf
    random_access_indicator                     1             bslbf
    decoder_config_flag                         1             bslbf
    metadata_section_length                     12            uimsbf
    metadata_service_id                         8             uimsbf
    reserved                                    8             bslbf
    section_fragment_indication                 2             bslbf
    version_number                              5             uimsbf
    current_next_indicator                      1             bslbf
    section_number                              8             uimsbf
    last_section_number                         8             uimsbf
    for (i = 1; i < N; i++) {
        metadata_byte                           8             bslbf
    }
    CRC_32                                      32            rpchof
}
table_id: The table_id is an 8-bit field that shall be set to '0x06' for each metadata section.
section_syntax_indicator: This 1-bit field shall be set to '1'.
private_indicator: This 1-bit field is not specified by this Specification.
random_access_indicator: This 1-bit field, when coded with the value '1', indicates that the metadata carried in this
metadata section represents an access point to the metadata service where decoding is possible without information
from previous metadata sections. The meaning of a random access point is defined by the format of the metadata.
decoder_config_flag: This 1-bit field, when coded with the value '1', indicates that decoder configuration information
is present in the metadata Access Unit carried in this metadata section.
metadata_section_length: This 12-bit field shall specify the number of remaining bytes in the section immediately
following the metadata_section_length field, and including the CRC. The value of this field shall not exceed
4093 (0xFFD).
metadata_service_id: This 8-bit field identifies the metadata service associated with the metadata Access Unit carried in
this metadata section. Each Metadata Table may contain metadata from multiple metadata services.
section_fragment_indication: This 2-bit field conveys information on the fragmentation of the metadata Access Unit
carried in this metadata section, as specified in Table 2-100.
Table 2-100 – Section fragment indication
Value   Description
11      A single metadata section carrying a complete metadata Access Unit.
10      The first metadata section from a series of metadata sections with data from one metadata Access Unit.
01      The last metadata section from a series of metadata sections with data from one metadata Access Unit.
00      A metadata section from a series of metadata sections with data from one metadata Access Unit, but neither
        the first nor the last one.
version_number: This 5-bit field is the version number of the whole Metadata Table. The version number shall be
incremented by 1 modulo 32 whenever the information contained within the Metadata Table changes. When the
current_next_indicator is set to '1', then the version_number shall be that of the currently applicable Metadata Table.
When the current_next_indicator is set to '0', then the version_number shall be that of the next applicable Metadata
Table.
current_next_indicator: A 1-bit field, which when set to '1' indicates that the Metadata Table sent is currently
applicable. When the bit is set to '0', it indicates that the Metadata Table sent is not yet applicable and shall be the next
Metadata Table to become valid.
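The version handling described above can be sketched as follows. The type table_state and the functions table_changed and next_version are hypothetical; the sketch simply ignores sections with current_next_indicator set to '0', whereas a real receiver might store them as the next applicable Metadata Table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receiver-side record of the currently applicable table. */
typedef struct {
    bool    valid;           /* false until a current table has been seen */
    uint8_t version_number;  /* 5-bit value, 0..31 */
} table_state;

/* Returns true when a section with current_next_indicator == 1 announces
 * a Metadata Table version different from the stored one, i.e. the table
 * has to be collected again. Sections describing the next table are
 * ignored in this sketch. */
static bool table_changed(table_state *s, uint8_t version_number,
                          int current_next_indicator)
{
    version_number &= 0x1F;
    if (!current_next_indicator) return false;
    if (s->valid && s->version_number == version_number) return false;
    s->valid = true;
    s->version_number = version_number;
    return true;
}

/* Version an encoder writes after a change: incremented by 1 modulo 32. */
static uint8_t next_version(uint8_t v) { return (uint8_t)((v + 1u) % 32u); }
```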
section_number: This 8-bit field gives the number of the metadata section. The section_number of the first section in a
Metadata Table shall be 0x00. The section_number shall be incremented by 1 with each additional section in this
Metadata Table.
last_section_number: This 8-bit field specifies the number of the last section (that is, the section with the highest
section_number) of the complete Metadata Table of which this section is a part.
metadata_byte: This 8-bit field contains contiguous bytes from a metadata Access Unit.
CRC_32: This 32-bit field shall contain the CRC value that gives a zero output of the registers in the decoder defined in
Annex A after processing the entire metadata_section.
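The header of Table 2-99 occupies 8 bytes, and metadata_section_length counts the 5 header bytes that follow it, the metadata_bytes and the 4-byte CRC_32. A section therefore carries metadata_section_length – 9 metadata_bytes and is at most 3 + 4093 = 4096 bytes long. The C sketch below (hypothetical names metadata_section and parse_metadata_section) applies these rules; it does not verify the CRC_32, which is computed as described in Annex A.

```c
#include <stddef.h>
#include <stdint.h>

/* Selected fields of the Metadata_section() header (Table 2-99). */
typedef struct {
    uint8_t  random_access_indicator, decoder_config_flag;
    uint16_t metadata_section_length;      /* 12 bits */
    uint8_t  metadata_service_id;
    uint8_t  section_fragment_indication;  /* 2 bits */
    uint8_t  version_number;               /* 5 bits */
    uint8_t  current_next_indicator;
    uint8_t  section_number, last_section_number;
    const uint8_t *metadata_bytes;
    size_t   n_metadata_bytes;
} metadata_section;

/* Returns 0 on success, -1 if p[0..n) does not start with a well-formed
 * metadata section. The CRC_32 is not checked here. */
static int parse_metadata_section(const uint8_t *p, size_t n, metadata_section *s)
{
    /* table_id must be 0x06 and section_syntax_indicator must be '1' */
    if (n < 3 || p[0] != 0x06 || !(p[1] & 0x80)) return -1;
    s->random_access_indicator = (p[1] >> 5) & 1;
    s->decoder_config_flag     = (p[1] >> 4) & 1;
    s->metadata_section_length = (uint16_t)(((p[1] & 0x0F) << 8) | p[2]);
    if (s->metadata_section_length > 4093) return -1;
    /* 5 header bytes after the length field plus 4 CRC_32 bytes */
    if (s->metadata_section_length < 9 || n < 3u + s->metadata_section_length)
        return -1;
    s->metadata_service_id         = p[3];
    /* p[4] is reserved */
    s->section_fragment_indication = p[5] >> 6;
    s->version_number              = (p[5] >> 1) & 0x1F;
    s->current_next_indicator      = p[5] & 1;
    s->section_number              = p[6];
    s->last_section_number         = p[7];
    s->metadata_bytes   = p + 8;
    s->n_metadata_bytes = s->metadata_section_length - 9u;
    return 0;
}
```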
2.12.7 Use of the DSM-CC data carousel to transport metadata
The DSM-CC tools as defined in ISO/IEC 13818-6 for Data Carousels can be used if a carousel delivery mechanism is
required without the need to express the hierarchical organization of the metadata structure in the transport mechanism.
Information on the carousel in which the metadata is contained is included in the metadata descriptor defined in 2.6.60
and 2.6.61. A specific stream_type value is assigned to signal carriage of metadata in the DSM-CC data carousel. Note
that signalling of metadata services within a DSM-CC data carousel is required, but not defined by this Specification.
2.12.8 Use of the DSM-CC object carousel to transport metadata
If a carousel delivery mechanism is required with the capability to express the hierarchical organization of the metadata
structure in the transport, then the DSM-CC tools and file structures as defined in ISO/IEC 13818-6 for User to User
Object Carousels can be used. These file structures provide the tools to structure the metadata as deemed appropriate for
efficient parsing of the metadata and for expressing the hierarchical organization of the metadata. Information needed to
identify the carousel in which the metadata is contained is included in the metadata descriptor defined in 2.6.60 and
2.6.61. This may be the IOP:IOR() as defined in 11.3.1 and 5.7.2.3 of ISO/IEC 13818-6 DSM-CC. A specific
stream_type value is assigned to signal carriage of metadata in the DSM-CC object carousel. Note that signalling of
metadata services within a DSM-CC object carousel is required, but not defined by this Specification.
2.12.9 Metadata-related signalling
Metadata-related signalling covers four distinct areas:
• signalling of metadata services and streams;
• signalling of content for use by a metadata system;
• association of metadata to content; and
• signalling of decoder configuration data.
2.12.9.1 Signalling of metadata services and streams
Carriage of metadata is signalled by a stream_type value in the range 0x15 to 0x19 inclusive, which specifies which of
the five methods described in 2.12.4 to 2.12.8 is used to transport the metadata, and, where appropriate, by a stream_id
value of 0xFC indicating a metadata stream.
To uniquely identify a metadata service, the transport assigns a metadata_service_id value to each such service; the
assigned value shall be unique within the Transport Stream or Program Stream carrying the metadata service. If the
metadata is carried in PES packets with a stream_id of 0xFC, in metadata sections, or in ISO/IEC 13818-6
synchronized download sections, the assigned metadata_service_id value is signalled explicitly in the header of the
metadata_AU_cell or of the metadata section. If an ISO/IEC 13818-6 carousel is used to carry the metadata, then the
signalling of metadata services is left to the application. The metadata descriptor specifies the format of the metadata,
provides information on the decoder configuration data, and is linked to the metadata service by carrying information
on the metadata service with which it is associated.
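The mapping from stream_type to transport method can be expressed as a simple lookup. The assignments below reflect our reading of Table 2-34 and should be checked against that table; the function name metadata_transport is hypothetical. Metadata carried in PES packets is additionally identified by stream_id 0xFC, as stated above.

```c
#include <stddef.h>

/* Hypothetical lookup from a metadata stream_type (0x15..0x19) to the
 * transport method of 2.12.4 to 2.12.8. Returns NULL for other values. */
static const char *metadata_transport(unsigned stream_type)
{
    switch (stream_type) {
    case 0x15: return "PES packets (2.12.4)";
    case 0x16: return "metadata sections (2.12.6)";
    case 0x17: return "DSM-CC data carousel (2.12.7)";
    case 0x18: return "DSM-CC object carousel (2.12.8)";
    case 0x19: return "DSM-CC synchronized download protocol (2.12.5)";
    default:   return NULL;   /* not a metadata stream_type */
    }
}
```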
2.12.9.2 Signalling of content for use by a metadata system
In 2.6.56 and 2.6.57, a content labelling descriptor is defined that can be used to assign a reference specific to the
metadata application format, the content_reference_id_record, to audiovisual or any other content carried over an
MPEG-2 Transport Stream or Program Stream. The metadata system can use the content_reference_id_record as a label
to refer to such content. The content may represent, for example, a program, a stream, or segments thereof. The content
labelling descriptor also provides information on the content time base used for time referencing from the metadata,
including the constant offset in time between the metadata time base and the applied content time base. The descriptor
allows carriage of private data. The metadata_application_format may define constraints on the
content_reference_id_record, such as constraints on the time period during which it is valid.
2.12.9.3 Association of metadata to content
In 2.6.58 and 2.6.59, the metadata pointer descriptor is defined to associate a single metadata service with audiovisual
or any other content in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream. The metadata is associated with the content
within the context defined by the location of the descriptor. In a transport stream, the descriptor may be located in the
PMT, in the descriptor loop for either the program or an elementary stream, but it may also be located in tables not
defined in this Specification, such as tables describing bouquets of broadcast services.
The metadata pointer descriptor points from the content's context to the metadata service associated with that content.
The descriptor provides the value of the metadata_service_id that is assigned to the associated metadata service, as well
as one or more locations of the associated metadata. The location may, for example, be within the same Transport
Stream as the content or within another Transport Stream, but it may also be a location outside any ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 stream, such as the Internet.
2.12.9.4 Signalling decoder configuration data
Decoding of metadata may require the availability of metadata decoder configuration data. If needed, decoder
configuration data shall be contained in one of the metadata services in the same program in the same ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 stream as the metadata service. If decoder configuration data is needed to decode a
metadata service, then the metadata descriptor either carries such data or provides the information needed to retrieve
the decoder configuration data from the same or another metadata service. In a transport stream, such another service
can be found by searching the PMT for a metadata_descriptor with the metadata_service_id specified in the
decoder_config_metadata_service_id field (and with the same metadata_format and the same
metadata_application_format).
2.12.9.5 Overview of metadata signalling
Figure 2-9 provides an example of metadata signalling in which one program, the "content program", carries the
content (or essence), and the metadata is carried in a separate program, the "metadata program". In this example, the
metadata program and the content program are carried in the same transport stream.
[Diagram: the PMT of the MPEG-2 program with associated metadata lists the video and audio streams and carries a content labelling descriptor (content label) and a metadata pointer descriptor that points to the program number and metadata service id; the PMT of the MPEG-2 program carrying the metadata stream carries a metadata descriptor with the metadata service id; the metadata service in the metadata stream references the content by its label.]
Figure 2-9 – Metadata signalling and referencing
In the content program there are two metadata-related descriptors, the content_labeling descriptor and the
metadata_pointer descriptor. The content_labeling descriptor associates a label with the content; the label is
illustrated in the diagram by "content label" and encoded in the descriptor in the content_reference_id fields. The
metadata service can then use the label to refer to the essence, in whole, in part, or by a time-described segment. For
example, the content_labeling descriptor could provide the label "News of 1/1/02", and the metadata could then refer to
a specific story item in the "News of 1/1/02", for example by providing the specific timing of the story item.
The metadata pointer descriptor provides information on where the metadata service for the given content can be found.
In this example, the metadata is carried in a separate program, but it would be equally valid to carry the metadata in the
same program as the content, or to provide it by some means beyond the scope of this Specification, for instance from
a URL. This descriptor also provides the metadata_service_id value that is assigned to the metadata service. This is
required because a metadata stream could carry multiple metadata services for many different programs, and each
program needs to be able to uniquely identify its own metadata service.
In the metadata program, the metadata descriptor signals to which metadata service within a metadata stream it applies.
If used, the metadata descriptor provides details of where to find the decoder configuration information.
When a receiver decoding the content program identifies a metadata pointer descriptor in the PMT, it retrieves the
metadata descriptor from the metadata program. If needed, the decoder configuration data is retrieved first and the
decoder is configured accordingly, after which decoding of the metadata service can start.
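The receiver procedure described above can be sketched over an already-parsed list of metadata descriptors from the PMT of the metadata program. The structure metadata_desc and its needs_external_config member are hypothetical simplifications of the metadata descriptor, and the sketch matches on metadata_service_id only, whereas 2.12.9.4 additionally requires the same metadata_format and metadata_application_format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-parsed view of one metadata_descriptor found in
 * the PMT of the metadata program; only the fields used here are modelled. */
typedef struct {
    uint16_t elementary_PID;                     /* stream carrying the service */
    uint8_t  metadata_service_id;
    int      needs_external_config;              /* config in another service? */
    uint8_t  decoder_config_metadata_service_id;
} metadata_desc;

static const metadata_desc *find_service(const metadata_desc *d, size_t n,
                                         uint8_t id)
{
    for (size_t i = 0; i < n; i++)
        if (d[i].metadata_service_id == id) return &d[i];
    return NULL;
}

/* Resolves the service named by a metadata pointer descriptor and, when
 * needed, the service carrying its decoder configuration data.
 * Returns 0 on success and -1 if either lookup fails. */
static int resolve_metadata(const metadata_desc *d, size_t n, uint8_t id,
                            const metadata_desc **service,
                            const metadata_desc **config)
{
    *service = find_service(d, n, id);
    if (!*service) return -1;
    *config = NULL;
    if ((*service)->needs_external_config) {
        *config = find_service(d, n,
                               (*service)->decoder_config_metadata_service_id);
        if (!*config) return -1;
    }
    return 0;
}
```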
2.12.10 STD model for metadata
The STD model specifies normative constraints on ITU-T Rec. H.222.0 | ISO/IEC 13818-1 streams that carry metadata.
For decoding of metadata in the STD, the regular T-STD and P-STD models apply, with buffer Bn, input rate Rxn of the
metadata into Bn, and output rate Rmetadata out of Bn and into Dmetadata, the metadata decoder. See Figure 2-10.
[Diagram: metadata enters buffer Bn at rate Rxn and leaves Bn at rate Rmetadata into the metadata decoder Dmetadata.]
Figure 2-10 – Metadata decoding in the STD
The metadata enters buffer Bn at rate Rxn. In the P-STD, rate Rxn equals the rate of the program stream. In the T-STD,
rate Rxn is the rate out of TBn and is equal to the rate defined by the metadata_input_leak_rate field in the metadata
STD descriptor. The size BSn of buffer Bn is equal to the size defined in the metadata_buffer_size field in the metadata
STD descriptor. In the case of synchronous delivery, metadata decoding is instantaneous and controlled by PTSs: at
decode time, that is, when the STC equals the PTS, the associated metadata is removed instantaneously from Bn. In the
case of asynchronous delivery, the metadata is removed from Bn at a rate Rmetadata equal to the rate defined by the
metadata_output_leak_rate field in the metadata STD descriptor. Buffer Bn shall not overflow.
Note that the STD model defines constraints on the delivery of the metadata, without specifying any constraint on the
timing used in the metadata.
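For asynchronous delivery, buffer Bn behaves as a leaky bucket, and the no-overflow requirement can be checked with a discrete-time simulation. In the C sketch below, the step duration, the conversion of metadata_input_leak_rate, metadata_output_leak_rate and metadata_buffer_size into bytes per step and bytes, and the function name first_overflow are all assumptions. The check is made before the step's removal, which is slightly conservative.

```c
#include <stddef.h>
#include <stdint.h>

/* in_bytes[k]: bytes entering Bn during step k (at most Rxn x step).
 * leak: metadata_output_leak_rate converted to bytes per step.
 * bs:   metadata_buffer_size converted to bytes.
 * Returns the first step at which Bn overflows, or -1 if it never does. */
static long first_overflow(const uint32_t *in_bytes, size_t steps,
                           uint32_t leak, uint32_t bs)
{
    uint64_t fullness = 0;
    for (size_t k = 0; k < steps; k++) {
        fullness += in_bytes[k];
        if (fullness > bs) return (long)k;             /* overflow */
        fullness = fullness > leak ? fullness - leak : 0;
    }
    return -1;
}
```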
2.13 Carriage of ISO 15938 data
2.13.1 Introduction
Carriage of metadata over an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream as defined in 2.12 allows for carriage
of ISO 15938 data by appropriate coding of the metadata_format field. This subclause defines a specific instance for the
transport of ISO 15938 data. Carriage of ISO 15938 data shall meet each requirement defined in 2.12; in addition, the
requirements defined in this subclause shall apply.
2.13.2 ISO 15938 decoder configuration data
Decoding of ISO 15938 data requires the availability of decoder configuration data. Consequently, when ISO 15938
data is carried in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream, the metadata descriptor shall signal carriage of
the associated decoder configuration data in the same ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream by coding the
decoder_config_flags with one of the values '001', '010', '011' or '100'.
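A multiplexer could check this constraint as sketched below. The function name is hypothetical, and decoder_config_flags is taken as the 3-bit value coded in the metadata descriptor.

```c
#include <stdbool.h>

/* True when decoder_config_flags is one of '001', '010', '011' or '100',
 * the values permitted for ISO 15938 data. */
static bool iso15938_config_flags_ok(unsigned decoder_config_flags)
{
    unsigned v = decoder_config_flags & 0x7u;   /* keep the 3 coded bits */
    return v >= 1u && v <= 4u;
}
```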
2.14 Carriage of ITU-T Rec. H.264 | ISO/IEC 14496-10 video
2.14.1 Introduction
This Specification defines the carriage of ITU-T Rec. H.264 | ISO/IEC 14496-10 elementary streams within ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 systems, for both program and transport streams. Typically, an ITU-T Rec. H.264 |
ISO/IEC 14496-10 stream will be an element of an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 program, as defined by the
PMT in a Transport Stream or the PSM in a Program Stream. The carriage and buffer management of AVC video
streams are defined using existing parameters from this Recommendation | International Standard, such as PTS and
DTS, as well as information present within an AVC video stream.
Carriage of AVC video streams in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream defines an accurate mapping
between STD parameters and HRD parameters that may be present in an AVC video stream. Requirements are defined
for the presence of HRD parameters in the AVC video stream, to ensure that it can be verified whether each STD
requirement is met for each AVC video stream carried in a transport stream or a program stream.
NOTE 1 – Though the timing information present in the AVC video stream may not use a 90-kHz clock, the PTS and DTS
timestamps need to be expressed in units of 90 kHz.
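As an illustration of NOTE 1, a time derived from the AVC video stream's own clock can be converted to the 90-kHz units of a PTS or DTS. The C sketch below assumes the H.264 VUI clock tick of num_units_in_tick / time_scale seconds, truncates rather than rounds, and wraps the result to 33 bits; the function name is hypothetical, and the intermediate product can overflow for very large tick counts.

```c
#include <stdint.h>

/* Converts a time expressed in clock ticks of num_units_in_tick /
 * time_scale seconds into 90-kHz units, wrapped to the 33 bits of a PTS
 * or DTS field. Integer division truncates. */
static uint64_t ticks_to_90khz(uint64_t ticks, uint32_t num_units_in_tick,
                               uint32_t time_scale)
{
    uint64_t t = ticks * num_units_in_tick * 90000u / time_scale;
    return t & ((1ULL << 33) - 1);
}
```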
When an ITU-T Rec. H.264 | ISO/IEC 14496-10 stream is carried in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 stream,
the ITU-T Rec. H.264 | ISO/IEC 14496-10 coded data shall be contained in PES packets. The ITU-T Rec. H.264 |
ISO/IEC 14496-10 coded data shall comply with the byte stream format defined in Annex B of ITU-T Rec. H.264 |
ISO/IEC 14496-10, with the following constraints:
• Each AVC access unit shall contain an access unit delimiter NAL unit;
  NOTE 2 – ITU-T Rec. H.264 | ISO/IEC 14496-10 requires that an access unit delimiter NAL unit, if present, is the first
  NAL unit within an AVC access unit. Access unit delimiter NAL units simplify the detection of the boundary between
  pictures; they avoid the need to process the content of slice headers, and they are particularly useful for the Baseline
  and Extended profiles, where slice order can be arbitrary.
• Each byte stream NAL unit that carries the access unit delimiter shall contain exactly one zero_byte syntax element.
  NOTE 3 – The syntax and semantics of byte stream NAL units are defined in Annex B of ITU-T Rec. H.264 |
  ISO/IEC 14496-10.
• All Sequence and Picture Parameter Sets (SPS and PPS) necessary for decoding the AVC video stream shall be present
  within that AVC video stream.
  NOTE 4 – ITU-T Rec. H.264 | ISO/IEC 14496-10 also allows delivery of SPS and PPS by external means. This
  Specification does not provide support for such delivery, and therefore requires SPS and PPS to be carried within the
  AVC video stream.
• Each AVC video sequence that contains hrd_parameters() with the low_delay_hrd_flag set to '1' shall carry VUI
  parameters in which the timing_info_present_flag shall be set to '1'.
  NOTE 5 – If the low_delay_hrd_flag is set to '1', then buffer underflow is allowed to occur in the STD model;
  see 2.14.3 and 2.14.4. Setting the timing_info_present_flag to '1' ensures that the AVC video stream contains
  sufficient information to determine the DPB output time and the CPB removal time of AVC access units, even in the
  case of underflow.
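The first two constraints can be checked on each access unit in byte stream format. The C sketch below (hypothetical name au_starts_with_aud) assumes the access unit starts exactly at the zero_byte of its first byte stream NAL unit, so leading_zero_8bits and trailing_zero_8bits are not handled. It tests for the four bytes 0x00 0x00 0x00 0x01 (zero_byte followed by the 3-byte start code prefix) and for nal_unit_type 9, the access unit delimiter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when au[0..n) begins with one zero_byte, the start code prefix
 * 0x000001 and an access unit delimiter NAL unit (nal_unit_type 9). */
static bool au_starts_with_aud(const uint8_t *au, size_t n)
{
    if (n < 5) return false;
    if (au[0] != 0x00 || au[1] != 0x00 || au[2] != 0x00 || au[3] != 0x01)
        return false;
    return (au[4] & 0x1F) == 9;   /* nal_unit_type: low 5 bits of header */
}
```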
To provide display specific information such as aspect_ratio, it is strongly recommended that each AVC video stream
carries VUI parameters with sufficient information to ensure that the decoded AVC video stream can be displayed
correctly by receivers.
2.14.2 Carriage in PES packets
ITU-T Rec. H.264 | ISO/IEC 14496-10 video is carried in PES packets as PES_packet_data_bytes, using one of the
16 stream_id values assigned to video, while the ITU-T Rec. H.264 | ISO/IEC 14496-10 video stream is signalled by
means of the assigned stream_type value in the PMT or PSM (see Table 2-34). The highest level that may occur in an
AVC video stream as well as a profile that the entire stream conforms to should be signalled using the AVC video
descriptor. If an AVC video descriptor is associated with an AVC video stream, then this descriptor shall be conveyed
in the descriptor loop for the respective elementary stream entry in the Program Map Table in case of a Transport
Stream or in the Program Stream Map, when PSM is present, in case of a Program Stream. This Recommendation |
International Standard does not specify presentation of ITU-T Rec. H.264 | ISO/IEC 14496-10 streams in the context of
a program.
For PES packetization, no specific data alignment constraints apply. For synchronization and STD management, PTSs
and, when appropriate, DTSs are encoded in the header of the PES packet that carries the ITU-T Rec. H.264 | ISO/IEC
14496-10 video elementary stream data. For PTS and DTS encoding, the constraints and semantics apply as defined in
2.4.3.7 and 2.7.
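A PTS or DTS is coded in the PES packet header as a 33-bit time stamp split into three groups of 3, 15 and 15 bits, each followed by a marker_bit set to '1', behind a 4-bit prefix. The C sketch below (hypothetical name write_timestamp) follows that layout; the prefixes '0010', '0011' and '0001' are those used for a PTS alone, a PTS followed by a DTS, and that DTS, respectively.

```c
#include <stdint.h>

/* Writes a 33-bit time stamp into the 5-byte PES header layout:
 * prefix(4) ts[32..30](3) marker(1) ts[29..15](15) marker(1)
 * ts[14..0](15) marker(1). */
static void write_timestamp(uint8_t out[5], unsigned prefix, uint64_t ts)
{
    ts &= (1ULL << 33) - 1;
    out[0] = (uint8_t)((prefix << 4) | (((ts >> 30) & 0x07) << 1) | 1);
    out[1] = (uint8_t)(ts >> 22);                        /* ts[29..22] */
    out[2] = (uint8_t)((((ts >> 15) & 0x7F) << 1) | 1);  /* ts[21..15] */
    out[3] = (uint8_t)(ts >> 7);                         /* ts[14..7]  */
    out[4] = (uint8_t)(((ts & 0x7F) << 1) | 1);          /* ts[6..0]   */
}
```

For example, a PTS of 90 000 (one second) written with prefix '0010' gives the bytes 0x21 0x00 0x05 0xBF 0x21.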
2.14.3 STD extensions
2.14.3.1 T-STD extensions
The T-STD model includes a transport buffer TBn and a multiplex buffer MBn prior to buffer EBn for decoding of each
ITU-T Rec. H.264 | ISO/IEC 14496-10 video elementary stream n. See Figure 2-11.
Figure 2-11 – T-STD model extensions for ITU-T Rec. H.264 | ISO/IEC 14496-10 video
DPBn buffer management
Carriage of an AVC video stream over ITU-T Rec. H.222.0 | ISO/IEC 13818-1 does not impact the size of buffer DPBn.
For decoding of an AVC video stream in the STD, the size of DPBn is as defined in ITU-T Rec. H.264 | ISO/IEC
14496-10. The DPB shall be managed as specified in Annex C of ITU-T Rec. H.264 | ISO/IEC 14496-10
(clauses C.2 and C.4). A decoded AVC access unit enters DPBn instantaneously upon decoding of the AVC access unit,
hence at the CPB removal time of the AVC access unit. A decoded AVC access unit is presented at the DPB output
time. If the AVC video stream provides insufficient information to determine the CPB removal time and the DPB
output time of AVC access units, then these time instants shall be determined in the STD model from PTS and DTS
timestamps as follows:
1) The CPB removal time of AVC access unit n is the instant in time indicated by DTS(n), where DTS(n) is the DTS
   value of AVC access unit n.
2) The DPB output time of AVC access unit n is the instant in time indicated by PTS(n), where PTS(n) is the PTS
   value of AVC access unit n.
NOTE 1 – AVC video sequences in which the low_delay_hrd_flag in hrd_parameters() is set to '1' carry sufficient information to
determine the DPB output time and the CPB removal time of each AVC access unit. Hence, for AVC access units for which STD
underflow may occur, the CPB removal time and the DPB output time are defined by HRD parameters, and not by DTS and PTS
timestamps.
TBn, MBn and EBn buffer management
The input to buffer TBn and its size TBSn are specified in 2.4.2.3. For buffers MBn and EBn, and for the rate Rxn
between TBn and MBn and the rate Rbxn between MBn and EBn the following constraints apply for carriage of an ITU-T
Rec. H.264 | ISO/IEC 14496-10 stream:
Size EBSn of buffer EBn:
EBSn = cpb_size
Where cpb_size is the size CpbSize[ cpb_cnt_minus1 ] of the CPB for the byte stream format
signalled in the NAL hrd_parameters() carried in VUI parameters in the AVC video stream. If NAL
hrd_parameters() are not present in the AVC video stream, then the cpb_size shall be the size
defined as 1200 × MaxCPB in Annex A of ITU-T Rec. H.264 | ISO/IEC 14496-10 for the level of
the AVC video stream.
Size MBSn of buffer MBn:
MBSn = BSmux + BSoh + 1200 × MaxCPB[level] – cpb_size
where BSoh, packet overhead buffering, is defined as:
BSoh = (1/750) seconds × max{1200 × MaxBR[level], 2 000 000 bit/s}
and BSmux, additional multiplex buffering, is defined as:
BSmux = 0.004 seconds × max{1200 × MaxBR[level], 2 000 000 bit/s}
where MaxCPB[level] and MaxBR[level] are defined for the byte stream format in Table A.1
(Level Limits) in ITU-T Rec. H.264 | ISO/IEC 14496-10 for the level of the AVC video stream, and
where cpb_size is the size CpbSize[ cpb_cnt_minus1 ] of the CPB for the byte stream format
signalled in the NAL hrd_parameters() carried in VUI parameters in the AVC video stream. If NAL
hrd_parameters() are not present in the AVC video stream, then the cpb_size shall be the size
1200 × MaxCPB defined in Annex A of ITU-T Rec. H.264 | ISO/IEC 14496-10 for the level of the
AVC video stream.
Rate Rxn:
When there is no data in TBn, Rxn is equal to zero.
Otherwise:
Rxn = bit_rate
where bit_rate is the bit rate BitRate[ cpb_cnt_minus1 ] of data flow into the CPB for the byte
stream format signalled in the NAL hrd_parameters() carried in VUI parameters in the AVC video
stream. If NAL hrd_parameters() are not present in the AVC video stream, then the bit_rate shall be
the bit rate 1200 × MaxBR[level] defined in Annex A of ITU-T Rec. H.264 | ISO/IEC 14496-10 for
the level of the AVC video stream.
Transfer between MBn and EBn
If the AVC_timing_and_HRD_descriptor is present with the hrd_management_valid_flag set to '1',
then the transfer of data from MBn to EBn shall follow the HRD defined scheme for data arrival in
the CPB as defined in Annex C of ITU-T Rec. H.264 | ISO/IEC 14496-10.
Otherwise, the leak method shall be used to transfer data from MBn to EBn as follows:
Rate Rbxn:
Rbxn = 1200 × MaxBR[level]
where MaxBR[level] is defined for the byte stream format in Table A.1 (Level Limits) in ITU-T
Rec. H.264 | ISO/IEC 14496-10 for each level.
If there is PES packet payload data in MBn, and buffer EBn is not full, the PES packet payload is
transferred from MBn to EBn at a rate equal to Rbxn. If EBn is full, data are not removed from MBn.
When a byte of data is transferred from MBn to EBn, all PES packet header bytes that are in MBn
and precede that byte, are instantaneously removed and discarded. When there is no PES packet
payload data present in MBn, no data is removed from MBn. All data that enters MBn leaves it. All
PES packet payload data bytes enter EBn instantaneously upon leaving MBn.
Removal of AVC access units from EBn
Each AVC access unit An(j) that is present in EBn is removed instantaneously at time tdn(j). The
decoding time tdn(j) is specified by the DTS or by the CPB removal time, as derived from
information in the AVC video stream.
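The sizing formulas and the leak method above can be sketched in C as follows. The function and type names are hypothetical, the Table A.1 values MaxBR[level] and MaxCPB[level] must be supplied by the caller, divisions truncate, cpb_size is assumed not to exceed 1200 × MaxCPB[level], and the leak step treats time as discrete steps of arbitrary duration. This is an illustration under those assumptions, not a conformance tool.

```c
#include <stdint.h>

/* Buffer sizes of 2.14.3.1, in bits. */
typedef struct { uint64_t EBS, MBS, BSoh, BSmux; } avc_tstd_sizes;

static avc_tstd_sizes avc_buffer_sizes(uint64_t max_br, uint64_t max_cpb,
                                       uint64_t cpb_size)
{
    avc_tstd_sizes s;
    uint64_t r = 1200u * max_br;            /* 1200 x MaxBR[level], bit/s */
    if (r < 2000000u) r = 2000000u;         /* max{..., 2 000 000 bit/s}  */
    s.BSoh  = r / 750u;                     /* (1/750) s worth of data */
    s.BSmux = r * 4u / 1000u;               /* 0.004 s worth of data   */
    s.EBS   = cpb_size;
    s.MBS   = s.BSmux + s.BSoh + 1200u * max_cpb - cpb_size;
    return s;
}

/* One step of the leak method: PES payload moves from MBn to EBn at most
 * rbx_per_step units (Rbxn times the step duration), only while MBn holds
 * payload and EBn is not full. PES header bytes are assumed to have been
 * discarded already. Returns the amount transferred. */
typedef struct { uint64_t mb_payload, eb_fullness, eb_size; } leak_state;

static uint64_t leak_step(leak_state *s, uint64_t rbx_per_step)
{
    uint64_t n = rbx_per_step;
    if (n > s->mb_payload) n = s->mb_payload;
    if (n > s->eb_size - s->eb_fullness) n = s->eb_size - s->eb_fullness;
    s->mb_payload  -= n;
    s->eb_fullness += n;
    return n;
}
```

With the illustrative inputs MaxBR = 20 000, MaxCPB = 25 000 and cpb_size = 30 000 000 bits, the sketch gives BSoh = 32 000 bits, BSmux = 96 000 bits and MBSn = 128 000 bits.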
STD delay
The total delay of any ITU-T Rec. H.264 | ISO/IEC 14496-10 data other than AVC still picture data through the System
Target Decoder buffers TBn, MBn and EBn shall be constrained by tdn(j) – t(i) ≤ 10 seconds for all j, and all bytes i in
AVC access unit An(j).
The delay of any AVC still picture data through the System Target Decoder buffers TBn, MBn and EBn shall be
constrained by tdn(j) – t(i) ≤ 60 seconds for all j, and all bytes i in AVC access unit An(j).
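The delay bound can be checked per byte as sketched below, assuming that both the arrival time t(i) and the decoding time tdn(j) are expressed in 90-kHz units without wrap-around; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when byte i, arriving at t_in, satisfies tdn(j) - t(i) <= 10 s
 * (60 s for AVC still picture data). Times are in 90-kHz units. */
static bool std_delay_ok(uint64_t t_in, uint64_t td, bool still_picture)
{
    uint64_t limit = (still_picture ? 60u : 10u) * 90000u;
    return td >= t_in && td - t_in <= limit;
}
```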
Buffer management conditions
Transport streams shall be constructed so that the following conditions for buffer management are satisfied:
• TBn shall not overflow and shall be empty at least once every second.
• MBn, EBn and DPBn shall not overflow.
• EBn shall not underflow, except when VUI parameters are present for the AVC video sequence with the
  low_delay_hrd_flag set to '1'. Underflow of EBn occurs for AVC access unit An(j) when one or more
  bytes of An(j) are not present in EBn at the decoding time tdn(j).
NOTE 2 – An AVC video stream may carry information to determine compliance of the AVC video stream to the HRD, as
specified in Annex C of ITU-T Rec. H.264 | ISO/IEC 14496-10. The presence of this information can be signalled in a transport
stream using the AVC timing and HRD descriptor with the hrd_management_valid_flag set to '1'. Irrespective of the presence of
this information, compliance of an AVC video stream to the T-STD ensures that HRD buffer management requirements for CPBn
are met when each byte in the AVC video stream is delivered to and removed from CPBn in the HRD at exactly the same instant
in time at which the byte is delivered to and removed from EBn in the T-STD.
2.14.3.2 P-STD extensions
The P-STD model for the decoding of an ITU-T Rec. H.264 | ISO/IEC 14496-10 elementary stream includes a
multiplex buffer Bn and a decoder Dn followed by a buffer DPBn (see Figure 2-12). For each AVC video stream n, the
size BSn of buffer Bn in the P-STD is defined by the P-STD_buffer_size field in the PES packet header.
Figure 2-12 – P-STD model extensions for ITU-T Rec. H.264 | ISO/IEC 14496-10 video
DPBn buffer management
Buffer DPBn shall be managed in exactly the same way as in the T-STD; see 2.14.3.1.
Bn buffer management
The AVC access unit data enters buffer Bn as specified in 2.5.2.2. At time tdn(j), AVC access unit An(j) is decoded and
instantaneously removed from Bn. The decoding time tdn(j) is specified by the DTS or by the CPB removal time,
derived from information in the AVC video stream. Upon decoding, the AVC access unit instantaneously enters DPBn
or is output without entry into DPBn, according to the rules specified in ITU-T Rec. H.264 | ISO/IEC 14496-10.
STD delay
The total delay of any ITU-T Rec. H.264 | ISO/IEC 14496-10 data other than AVC still picture data through the System
Target Decoder buffer Bn shall be constrained by tdn(j) – t(i) ≤ 10 seconds for all j, and all bytes i in AVC access unit
An(j).
The delay of any AVC still picture data through the System Target Decoder buffer Bn shall be constrained by
tdn(j) – t(i) ≤ 60 seconds for all j, and all bytes i in AVC access unit An(j).
Buffer management conditions
Program streams shall be constructed so that the following conditions for buffer management are satisfied:
• Bn shall not overflow.
• Bn shall not underflow, except when VUI parameters are present for the AVC video sequence with the
  low_delay_hrd_flag set to '1' or when trick_mode status is true. Underflow of Bn occurs for AVC access
  unit An(j) when one or more bytes of An(j) are not present in Bn at the decoding time tdn(j).
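The buffer management conditions and the STD delay limits above can be checked offline once the timing of every byte is known. The following Python sketch is a hypothetical checker, not part of this Recommendation | International Standard: it assumes the arrival time t(i) of each byte in Bn and the decoding time tdn(j) of each access unit are already available in seconds, counts a byte that arrives exactly at tdn(j) as present, and does not model the low_delay_hrd_flag and trick_mode exceptions. The names AccessUnit and check_pstd_buffer are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessUnit:
    """AVC access unit An(j) as seen by buffer Bn (hypothetical container)."""
    byte_arrivals: List[float]   # t(i) in seconds for every byte i of An(j)
    decode_time: float           # tdn(j) in seconds
    still_picture: bool = False  # AVC still picture data: 60 s limit instead of 10 s


def check_pstd_buffer(units: List[AccessUnit], bs_n: int) -> List[str]:
    """Return violation messages for buffer Bn of size bs_n bytes (empty list if none)."""
    problems = []
    events = []  # (time, order, delta): arrivals (order 0) sort before removals (order 1)
    for j, au in enumerate(units):
        limit = 60.0 if au.still_picture else 10.0
        if any(t > au.decode_time for t in au.byte_arrivals):
            problems.append(f"underflow: bytes of An({j}) missing at tdn({j})")
        if any(au.decode_time - t > limit for t in au.byte_arrivals):
            problems.append(f"delay: An({j}) exceeds {limit:g} s in Bn")
        events.extend((t, 0, 1) for t in au.byte_arrivals)
        # An(j) is removed from Bn instantaneously at tdn(j).
        events.append((au.decode_time, 1, -len(au.byte_arrivals)))
    fullness = 0
    for _, _, delta in sorted(events):
        fullness += delta
        if fullness > bs_n:
            problems.append(f"overflow: {fullness} bytes in Bn exceeds BSn = {bs_n}")
            break
    return problems
```

In a real check, the byte arrival times would typically be derived from the SCR and program_mux_rate of the Program Stream, and the decoding times from the DTS or CPB removal times; extracting them is outside this sketch.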
Annex A
CRC decoder model
(This annex forms an integral part of this Recommendation | International Standard)
A.0 CRC decoder model
The 32-bit CRC decoder model is specified in Figure A.1.
[Figure A.1: shift register of 32 delay elements z(0) to z(31) with modulo-2 adders; the received data and CRC_32 bits
enter most significant bit first.]
Figure A.1 – 32-bit CRC decoder model
The 32-bit CRC decoder operates at bit level and consists of 14 adders '+' and 32 delay elements z(i). The input of the
CRC decoder is added to the output of z(31), and the result is provided to the input of z(0) and to one of the inputs of
each remaining adder. The other input of each remaining adder is the output of z(i), while the output of each remaining
adder is connected to the input of z(i + 1), with i = 0, 1, 3, 4, 6, 7, 9, 10, 11, 15, 21, 22 and 25. Refer to Figure A.1 above.
This is the CRC calculated with the polynomial:
x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1     (A-1)
Bytes are received at the input of the CRC decoder. Each byte is shifted into the CRC decoder one bit at a time, with the
leftmost bit (msb) first. For example, if the input is byte 0x01, the seven '0's enter the CRC decoder first, followed by
the one '1'. Before the CRC processing of the data of a section, the output of each delay element z(i) is set to its initial
value '1'. After this initialization, each byte of the section is provided to the input of the CRC decoder, including the
four CRC_32 bytes. After shifting the last bit of the last CRC_32 byte into the decoder, i.e., into z(0) after the addition
with the output of z(31), the output of all delay elements z(i) is read. In the case where there are no errors, each of the
outputs of z(i) shall be zero. At the CRC encoder, the CRC_32 field is encoded with a value that ensures this.
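A software equivalent of Figure A.1 can help when verifying sections. The Python sketch below models the shift register literally, using the adder positions i listed above, and also gives the conventional byte-oriented, msb-first formulation of the same CRC (generator mask 0x04C11DB7, i.e. polynomial (A-1) without the x^{32} term; register preset to all ones; no final inversion) that an encoder could use to compute CRC_32. The function names are illustrative.

```python
# Adder positions from the text: the output of the adder after z(i) feeds z(i + 1).
ADDER_TAPS = (0, 1, 3, 4, 6, 7, 9, 10, 11, 15, 21, 22, 25)
# Generator polynomial (A-1) without the x^32 term, as a 32-bit mask.
POLY = 0x04C11DB7


def crc_decoder_model(data: bytes) -> list:
    """Bit-level model of Figure A.1; returns the outputs of z(0)..z(31)."""
    z = [1] * 32                                       # every delay element starts at '1'
    for byte in data:
        for shift in range(7, -1, -1):                 # msb first
            feedback = ((byte >> shift) & 1) ^ z[31]   # input + output of z(31), modulo 2
            new_z = [feedback] + z[:31]                # plain shift: z(i) -> z(i + 1)
            for i in ADDER_TAPS:
                new_z[i + 1] ^= feedback               # adders between z(i) and z(i + 1)
            z = new_z
    return z


def crc32_mpeg2(data: bytes) -> int:
    """Byte-oriented equivalent: register preset to all ones, no final inversion."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc
```

An encoder would append crc32_mpeg2(section_bytes).to_bytes(4, "big") as the CRC_32 field; feeding the section plus those four bytes through crc_decoder_model then leaves every z(i) at zero, as required above.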
Annex B
Digital Storage Medium Command and Control (DSM-CC)
(This annex does not form an integral part of this Recommendation | International Standard)
B.0 Introduction
The DSM-CC protocol is a specific application protocol intended to provide the basic control functions and operations
specific to managing an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream on digital storage media. This DSM-CC is a
low-level protocol above network/OS layers and below application layers.
The DSM-CC shall be transparent in the following sense:
• it is independent of the DSM used;
• it is independent of whether the DSM is located at a local or remote site;
• it is independent of the network protocol with which the DSM-CC is interfaced;
• it is independent of the various operating systems on which the DSM is operated.
B.0.1 Purpose
Many applications of ITU-T Rec. H.222.0 | ISO/IEC 13818-1 DSM Control Commands require access to an ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 bitstream stored on a variety of digital storage media at a local or remote site. Different
DSM have their own specific control commands, so a user would need to know a different set of DSM-specific control
commands to access ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstreams on each DSM. This complicates the interface
design of an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 or ISO/IEC 11172-1 application system. To overcome this difficulty,
a set of common DSM control commands, independent of the specific DSM used, is suggested in this annex. This annex
is informative only. ISO/IEC 13818-6 defines DSM-CC extensions with a broader scope.
B.0.2 Future applications
Beyond the immediate applications supported by the current DSM control commands, future applications based on
extensions of DSM command control could include the following:
Video on demand
Video programs are provided as requested by a customer through various communication channels. The customer could
select a video program from a list of programs available from a video server. Such applications could be used by hotels,
cable TV, educational institutions, hospitals, etc.
Interactive video services
In these applications, the user provides frequent feedback controlling the manipulation of stored video and audio. These
services can include video-based games, user-controlled video tours, electronic shopping, etc.
Video networks
Various applications may wish to exchange stored audio and video data through some type of computer network. Users
could route AV information through the video network to their terminals. Electronic publishing and multimedia
applications are examples of this kind of application.
B.0.3 Benefits
Because the DSM control commands are specified independently of the DSM, end users can perform ITU-T Rec. H.222.0 |
ISO/IEC 13818-1 decoding without having to fully understand the detailed operation of the specific DSM used.
The DSM control commands give end users the assurance that ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstreams can be
played and stored with the same semantics, independent of the DSM and user interface. They are fundamental commands
for the control of DSM operation.
B.0.4 Basic functions
B.0.4.1 Stream selection
The DSM-CC provides the means to select an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream upon which to perform
the succeeding operations. Such operations include creation of a new bitstream. Parameters of this function include:
• index of the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream (the mapping between this index and a
  name meaningful to an application is outside the scope of the current DSM-CC);
• mode (retrieval/storage).
B.0.4.2 Retrieval
The DSM-CC provides the means to:
• play an identified ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream;
• play from a given presentation time;
• set the playback speed (normal or fast);
• set the playback duration (until a specified presentation time, the end of the bitstream in forward play or
  the beginning in reverse play or the issuance of a stop command);
• set the direction (forward or reverse);
• pause;
• resume;
• change the access point in the bitstream;
• stop.
B.0.4.3 Storage
The DSM-CC provides the means to:
• cause storage of a valid bitstream for a specified duration;
• cause storage to stop.
DSM-CC provides a useful but limited subset of the functionality that may be required in DSM-based ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 applications. Significant additional capabilities are expected to be added through
subsequent extensions.
B.1 General elements
B.1.1 Scope
The scope of this work consists of the development of a Recommendation | International Standard to specify a useful set
of commands for control of digital storage media on which an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream is
stored. The commands can perform remote control of digital storage media in a general way, independently of the
specific DSM, and apply to any ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream stored on a DSM.
B.1.2 Overview of the DSM-CC application
The current DSM-CC syntax and semantics cover the single-user-to-DSM application. The user's system is capable of
retrieving an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream and is also (optionally) capable of generating an ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 bitstream. The control channel over which the DSM commands and
acknowledgements are sent is shown in Figure B.1 as an out-of-band channel. If an out-of-band channel is not available,
the same function can be accomplished by inserting the DSM-CC commands and acknowledgements into the ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 bitstreams.
[Figure B.1: block diagram with User and DSM blocks linked by DSM-CC commands and DSM ACKs, two ISO/IEC 13818
bitstreams, a System encoder with video and audio bitstream inputs, and a System decoder with video and audio
bitstream outputs.]
Figure B.1 – Configuration of DSM-CC application
B.1.3 The transmission of DSM-CC commands and acknowledgements
The DSM-CC is encoded into a DSM-CC bitstream according to the syntax and semantics defined in B.2.2 through
B.2.9. The DSM-CC bitstream can be transmitted either as a stand-alone bitstream or within an ITU-T Rec. H.222.0 |
ISO/IEC 13818-1 Systems bitstream.
When the DSM-CC bitstream is transmitted in stand-alone mode, its relationship to the Systems bitstream and the
decoding process is illustrated in Figure B.2. In this case, the DSM-CC bitstream is not embedded in the Systems
bitstream. This transmission mode can be used in applications where the DSM is connected directly to the ITU-T
Rec. H.222.0 | ISO/IEC 13818-1 decoder. It can also be used in applications where the DSM-CC bitstream is
controlled and transmitted by other types of network multiplexors.
Figure B.2 – DSM-CC bitstream decoded as a stand-alone bitstream
For some applications, it is desirable to transmit the DSM-CC in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 systems
bitstream so that some features of the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 systems bitstream could be applied to the
DSM-CC bitstream as well. In this case, the DSM-CC bitstream is embedded in the systems bitstream by the systems
multiplexor.
The systems encoder encodes the DSM-CC bitstream as follows. First, the DSM-CC bitstream is packetized into a
packetized elementary stream (PES) according to the syntax described in 2.4.3.6. The PES packets are then multiplexed
into either a Program Stream (PS) or a Transport Stream (TS), according to the requirements of the transmission
medium. The decoding procedures are the inverse of the encoding procedures and are illustrated in the block
diagram of the Systems decoder depicted in Figure B.3.
In Figure B.3, the output of the Systems decoder is a video bitstream, an audio bitstream and/or a DSM-CC bitstream. The
DSM-CC bitstream is identified by the stream_id value '1111 0010', as defined in Table 2-22. Once the DSM-CC
bitstream is identified, it follows the rules specified by the T-STD or P-STD.
[Figure B.3: an ISO/IEC 13818 bitstream with DSM-CC embedded enters the System decoder, which outputs a video
bitstream, an audio bitstream and a DSM-CC bitstream; the DSM-CC bitstream is passed to a DSM-CC decoder, which
outputs the decoded DSM-CC.]
Figure B.3 – DSM-CC bitstream decoded as part of the system bitstream
B.2 Technical elements
B.2.1 Definitions
For the purposes of this Recommendation | International Standard, the following definitions apply:
B.2.1.1 DSM-CC: Digital Storage Media Command and Control. The commands specified by ITU-T Rec. H.222.0 |
ISO/IEC 13818-1 for the control of digital storage media, at a local or remote site, containing an ITU-T Rec. H.222.0 |
ISO/IEC 13818-1 bitstream.
B.2.1.2 DSM ACK: The acknowledgement from the DSM-CC command receiver to the command initiator.
B.2.1.3 MPEG bitstream: An ISO/IEC 11172-1 Systems stream, ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Program
Stream or ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Transport stream.
B.2.1.4 DSM-CC server: A system, either local or remote, used to store and/or retrieve an ITU-T Rec. H.222.0 |
ISO/IEC 13818-1 bitstream.
B.2.1.5 point of random access: A point in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream with the property
that for at least one elementary stream within the bitstream, the next access unit, 'N', completely contained in the
bitstream can be decoded without reference to previous access units, and for every elementary stream in the bitstream
all access units with the same or later presentation times are completely contained subsequently in the bitstream and can
be completely decoded by a system target decoder without access to information prior to the point of random access.
The bitstream as stored on the DSM may have certain points of random access; the output of the DSM may include
additional points of random access manufactured by the DSM's own manipulation of the stored material (e.g., storing
quantization matrices so that a sequence header can be generated whenever necessary). A point of random access has an
associated PTS, namely the actual or implied PTS of access unit 'N'.
B.2.1.6 current operational PTS value: The actual or implied PTS associated with the last point of random access
preceding the last access unit provided from the DSM from the currently selected ITU-T Rec. H.222.0 |
ISO/IEC 13818-1 bitstream. If no access unit has been provided from this ITU-T Rec. H.222.0 | ISO/IEC 13818-1
bitstream, or the DSM is incapable of providing random access into the current bitstream, then the current operational
PTS value is the PTS of the first point of random access in the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream.
B.2.1.7 DSM-CC bitstream: A sequence of bits satisfying the syntax of B.2.2.
B.2.2 Specification of DSM-CC syntax
• Every DSM control command shall commence with a start_code, as specified in Table B.1.
• Every DSM control command shall have a packet_length field specifying the number of bytes in the
  DSM-CC packet.
• When the DSM-CC bitstream is transmitted as a PES packet as defined in 2.4.3.6, the fields up to the
  packet_length field are identical to those specified in 2.4.3.6. In other words, if the DSM-CC packet is
  encapsulated in a PES packet, the PES packet start code is the only start code at the beginning of the
  packet.
• The actual control command or acknowledgement shall follow the last byte of the packet_length field.
• An acknowledgement stream shall be provided by the DSM control bitstream receiver after the requested
  operation is started or is completed, depending on the command received.
• At all times the DSM is responsible for providing a compliant ITU-T Rec. H.222.0 | ISO/IEC 13818-1
  stream. This may include manipulating the trick mode bits defined in 2.4.3.6.
Table B.1 – DSM-CC syntax
Syntax                                   No. of bits   Mnemonic
DSM_CC() {
  packet_start_code_prefix               24            bslbf
  stream_id                              8             uimsbf
  packet_length                          16            uimsbf
  command_id                             8             uimsbf
  if (command_id == 0x01) {
    control()
  } else if (command_id == 0x02) {
    ack()
  }
}
B.2.3 Semantics of fields in specification of DSM-CC syntax
packet_start_code_prefix – This is a 24-bit code. Together with the stream_id that follows it, it constitutes a DSM-CC
packet start code that identifies the beginning of a DSM-CC packet. The packet_start_code_prefix is the bit
string '0000 0000 0000 0000 0000 0001' (0x000001).
stream_id – This 8-bit field specifies the bitstream type and shall have a value '1111 0010' for the DSM-CC bitstream.
Refer to Table 2-23.
packet_length – This 16-bit field specifies the number of bytes in the DSM-CC packet immediately following the last
byte of this field.
command_id – This 8-bit unsigned integer identifies whether the bitstream is a control command or an acknowledgement
stream. The values are defined in Table B.2.
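The header fields of Table B.1 can be illustrated with a small parser. This Python sketch is hypothetical: it assumes one complete DSM_CC() packet is held in a bytes object, maps command_id using the values assigned in Table B.2, and leaves decoding of control() and ack() to the caller. The name parse_dsm_cc is illustrative.

```python
DSM_CC_STREAM_ID = 0xF2                       # stream_id '1111 0010'
COMMAND_NAMES = {0x01: "control", 0x02: "ack"}  # Table B.2; 0x03-0xFF reserved


def parse_dsm_cc(packet: bytes):
    """Return (command_name, body) where body holds the bytes after command_id."""
    if len(packet) < 7:
        raise ValueError("packet too short for the DSM_CC() header")
    if packet[0:3] != b"\x00\x00\x01":
        raise ValueError("missing packet_start_code_prefix 0x000001")
    if packet[3] != DSM_CC_STREAM_ID:
        raise ValueError(f"stream_id 0x{packet[3]:02X} is not DSM-CC")
    packet_length = int.from_bytes(packet[4:6], "big")  # bytes following this field
    payload = packet[6:6 + packet_length]
    if packet_length < 1 or len(payload) != packet_length:
        raise ValueError("packet_length inconsistent with the data")
    command_id = payload[0]
    if command_id == 0x00:
        raise ValueError("command_id 0x00 is forbidden")
    return COMMAND_NAMES.get(command_id, "reserved"), payload[1:]
```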
Table B.2 – Command_id assigned values
Value        Command_id
0x00         Forbidden
0x01         Control
0x02         Ack
0x03-0xFF    Reserved
B.2.4 Control layer
Constraints on setting flags in DSM-CC control
• At most one of the flags for select, retrieval and storage shall be set to '1' for each DSM control
  command. If none of these bits is set, the command shall be ignored.
• At most one of pause_mode, resume_mode, stop_mode, play_flag and jump_flag shall be set for each
  retrieval command. If none of these bits is set, the command shall be ignored.
• At most one of record_flag and stop_mode shall be set for each storage command. If none of these
  bits is set, the command shall be ignored.
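The three constraints can be applied mechanically before a command is acted upon. In this hypothetical Python helper, flags are passed as dicts of 0/1 values keyed by the syntax element names, with the retrieval or storage part in a separate dict because stop_mode occurs in both. Reporting a command that sets several flags as "nonconforming" is an assumption: the text only states that at most one shall be set.

```python
COMMAND_FLAGS = ("select_flag", "retrieval_flag", "storage_flag")
RETRIEVAL_FLAGS = ("pause_mode", "resume_mode", "stop_mode", "play_flag", "jump_flag")
STORAGE_FLAGS = ("record_flag", "stop_mode")


def _one_of(flags: dict, names: tuple) -> str:
    """'ignore' if no flag is set, 'nonconforming' if several are, else the flag name."""
    set_names = [name for name in names if flags.get(name, 0) == 1]
    if not set_names:
        return "ignore"
    if len(set_names) > 1:
        return "nonconforming"
    return set_names[0]


def check_control(top: dict, part=None) -> str:
    """Apply the constraints; part holds the flags of the retrieval or storage part."""
    kind = _one_of(top, COMMAND_FLAGS)
    if kind in ("ignore", "nonconforming", "select_flag"):
        return kind
    names = RETRIEVAL_FLAGS if kind == "retrieval_flag" else STORAGE_FLAGS
    return _one_of(part or {}, names)
```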
See Table B.3.
Table B.3 – DSM-CC control
Syntax                                   No. of bits   Mnemonic
control() {
  select_flag                            1             bslbf
  retrieval_flag                         1             bslbf
  storage_flag                           1             bslbf
  reserved                               12            bslbf
  marker_bit                             1             bslbf
  if (select_flag == '1') {
    bitstream_id [31..17]                15            bslbf
    marker_bit                           1             bslbf
    bitstream_id [16..2]                 15            bslbf
    marker_bit                           1             bslbf
    bitstream_id [1..0]                  2             bslbf
    select_mode                          5             bslbf
    marker_bit                           1             bslbf
  }
  if (retrieval_flag == '1') {
    jump_flag                            1             bslbf
    play_flag                            1             bslbf
    pause_mode                           1             bslbf
    resume_mode                          1             bslbf
    stop_mode                            1             bslbf
    reserved                             10            bslbf
    marker_bit                           1             bslbf
    if (jump_flag == '1') {
      reserved                           7             bslbf
      direction_indicator                1             bslbf
      time_code()
    }
    if (play_flag == '1') {
      speed_mode                         1             bslbf
      direction_indicator                1             bslbf
      reserved                           6             bslbf
      time_code()
    }
  }
  if (storage_flag == '1') {
    reserved                             6             bslbf
    record_flag                          1             bslbf
    stop_mode                            1             bslbf
    if (record_flag == '1') {
      time_code()
    }
  }
}
B.2.5 Semantics of fields in control layer
marker_bit – This is a 1-bit marker that is always set to '1' to avoid start code emulation.
reserved_bits – This 12-bit field is reserved for future use by this Recommendation | International Standard for
DSM control commands. Until otherwise specified by ITU-T | ISO/IEC it shall have the value '0000 0000 0000'.
select_flag – This 1-bit flag when set to '1' specifies a bitstream selection operation. When it is set to '0' no bitstream
selection operation shall occur.
retrieval_flag – This 1-bit flag when set to '1' specifies that a specific retrieval (playback) action will occur. The
operation starts from the current operational PTS value.
storage_flag – This 1-bit flag when set to '1' specifies that a storage operation is to be executed.
bitstream_ID – This 32-bit field is coded in three parts. The parts are combined to form an unsigned integer specifying
which ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream is to be selected. It is the DSM server's responsibility to map
the names of the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstreams stored on its DSM uniquely to a series of numbers
which could be represented by the bitstream_ID.
select_mode – This 5-bit unsigned integer specifies which mode of bitstream operation is requested. Table B.4 specifies
the defined modes.
Table B.4 – Select mode assigned values
Code         Mode
0x00         Forbidden
0x01         Storage
0x02         Retrieval
0x03-0x1F    Reserved
jump_flag – This 1-bit flag when set to '1' specifies a jump in the playback pointer to a new access unit. The new
PTS is specified by a relative time_code with respect to the current operational PTS value. This function is only valid
when the current ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream is in the "stop" mode.
play_flag – This 1-bit flag when set to '1' specifies that a bitstream is to be played for a certain time period. The speed,
direction and play duration are additional parameters in the bitstream. The play starts from the current operational PTS value.
pause_mode – This is a one-bit code specifying to pause the playback action and keep the playback pointer at the
current operational PTS value.
resume_mode – This is a one-bit code specifying to continue the playback action from the current operational
PTS value. Resume only has meaning if the current bitstream is in the "pause" state, and the bitstream will be set to the
forward play state at normal speed.
stop_mode – This is a one-bit code specifying to stop a bitstream transmission.
direction_indicator – This is a one-bit code to indicate the playback direction. If this bit is set to '1', it stands for a
forward play. Otherwise it stands for a backward play.
speed_mode – This is a 1-bit code to specify the speed scale. If this bit is set to '1', it specifies that the speed is normal
play. If this bit is set to '0', it specifies that the speed is fast play (i.e., fast forward or fast reverse).
record_flag – This is a one-bit flag specifying a request from an end user to record the bitstream to a DSM for a
specified duration or until the reception of a stop command, whichever comes first.
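The split coding of bitstream_ID (15 + 15 + 2 bits separated by marker_bits) is easiest to see in code. The Python sketch below serializes a select-only control() body following the bit widths of Table B.3 and decodes it again. The BitWriter helper and the function names are hypothetical; the 12-bit reserved field is written as '0000 0000 0000' as the semantics require.

```python
class BitWriter:
    """Accumulates fields most significant bit first (illustrative helper)."""

    def __init__(self):
        self.bits = []

    def write(self, value: int, width: int) -> None:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        self.bits.extend((value >> k) & 1 for k in range(width - 1, -1, -1))

    def to_bytes(self) -> bytes:
        if len(self.bits) % 8:
            raise ValueError("field sequence is not byte aligned")
        return int("".join(map(str, self.bits)), 2).to_bytes(len(self.bits) // 8, "big")


SELECT_MODE_STORAGE = 0x01    # Table B.4
SELECT_MODE_RETRIEVAL = 0x02  # Table B.4


def encode_select(bitstream_id: int, select_mode: int) -> bytes:
    """control() body for a select-only command (bit widths per Table B.3)."""
    w = BitWriter()
    w.write(1, 1)                                # select_flag
    w.write(0, 1)                                # retrieval_flag
    w.write(0, 1)                                # storage_flag
    w.write(0, 12)                               # reserved, '0000 0000 0000'
    w.write(1, 1)                                # marker_bit
    w.write(bitstream_id >> 17, 15)              # bitstream_id [31..17]
    w.write(1, 1)                                # marker_bit
    w.write((bitstream_id >> 2) & 0x7FFF, 15)    # bitstream_id [16..2]
    w.write(1, 1)                                # marker_bit
    w.write(bitstream_id & 0x3, 2)               # bitstream_id [1..0]
    w.write(select_mode, 5)                      # select_mode
    w.write(1, 1)                                # marker_bit
    return w.to_bytes()


def decode_select(body: bytes):
    """Recover (bitstream_id, select_mode) from a select-only control() body."""
    v = int.from_bytes(body[2:7], "big")         # the 40 bits after the first 16
    if not ((v >> 24) & 1 and (v >> 8) & 1 and v & 1):
        raise ValueError("marker_bit not set")
    bitstream_id = (((v >> 25) & 0x7FFF) << 17) | (((v >> 9) & 0x7FFF) << 2) | ((v >> 6) & 0x3)
    return bitstream_id, (v >> 1) & 0x1F
```

The marker_bits after every 15 bits keep long runs of zeros out of the body, which is how this syntax avoids start code emulation.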
B.2.6 Acknowledgement layer
Constraints on setting flags in DSM-CC acknowledgement
Only one of the ack bits specified below can be set to '1' for each DSM ack bitstream (see Table B.5).
Table B.5 – DSM-CC Acknowledgement
Syntax                                   No. of bits   Mnemonic
ack() {
  select_ack                             1             bslbf
  retrieval_ack                          1             bslbf
  storage_ack                            1             bslbf
  error_ack                              1             bslbf
  reserved                               10            bslbf
  marker_bit                             1             bslbf
  cmd_status                             1             bslbf
  if (cmd_status == '1' &&
      (retrieval_ack == '1' || storage_ack == '1')) {
    time_code()
  }
}
B.2.7 Semantics of fields in Acknowledgement layer
select_ack – This 1-bit field when it is set to '1' indicates that the ack() command is to acknowledge a select command.
retrieval_ack – This 1-bit field when set to '1' indicates that the ack() command is to acknowledge a retrieval
command.
storage_ack – This 1-bit field when set to '1' indicates that the ack() command is to acknowledge a storage command.
error_ack – This 1-bit field when set to '1' indicates a DSM error. The defined errors are EOF (end of file on forward
play or start of file on reverse play) on a stream being retrieved and Disk Full on a stream being stored. If this bit is set
to '1', cmd_status is undefined. The current bitstream is still selected.
cmd_status – This 1-bit flag set to '1' indicates that the command is accepted. When set to '0' it indicates the command
is rejected. The semantics vary according to the command received as follows:
• If select_ack is set and cmd_status is set to '1', it specifies that the ITU-T Rec. H.222.0 |
  ISO/IEC 13818-1 bitstream is selected and the server is ready to provide the selected mode of operation.
  The current operational PTS value is set to the first point of random access of the newly selected ITU-T
  Rec. H.222.0 | ISO/IEC 13818-1 bitstream. If cmd_status is set to '0', the operation has failed and no
  bitstream is selected.
• If retrieval_ack is set and cmd_status is set to '1', it specifies that the retrieval operation is initiated for all
  retrieval commands. The position of the current operational PTS pointer is reported by the succeeding
  time_code.
• For the play_flag command with infinite_time_flag != '1', a second acknowledgement will be sent. This
  will acknowledge that the play operation has ended by reaching the duration defined by the play_flag
  command.
• If cmd_status is set to '0' in a retrieval acknowledgement, the operation has failed. Possible reasons
  for this failure include an invalid bitstream_ID, jumping beyond the end of a file, or an unsupported
  function such as reverse play at normal speed.
• If storage_ack is set, it specifies that the storage operation is being started for the record_flag command
  or is completed by the stop_mode command. The PTS of the last complete access unit stored is reported
  by the succeeding time_code.
• If the recording operation is ended by reaching the duration defined by the storage_flag command,
  another acknowledgement shall be sent and the current operational PTS value after the recording shall be
  reported.
• If cmd_status is set to '0' in a storage acknowledgement, the operation has failed. Possible reasons for
  this failure include an invalid bitstream_ID or the inability of the DSM to store data.
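A receiver of acknowledgements needs to know whether a time_code() follows the fixed 16 bits of ack(). A minimal Python sketch, assuming the bytes after command_id are available; treating a missing marker_bit or more than one set ack bit as an error is an assumption about error handling, not something specified here. The function name is illustrative.

```python
ACK_BITS = ("select_ack", "retrieval_ack", "storage_ack", "error_ack")


def parse_ack_header(body: bytes) -> dict:
    """Decode the fixed 16 bits of ack(), i.e. the bytes that follow command_id."""
    if len(body) < 2:
        raise ValueError("ack() needs at least 2 bytes")
    v = int.from_bytes(body[:2], "big")
    if not (v >> 1) & 1:
        raise ValueError("marker_bit not set")
    # select_ack is bit 15, retrieval_ack 14, storage_ack 13, error_ack 12
    fields = {name: (v >> (15 - k)) & 1 for k, name in enumerate(ACK_BITS)}
    # bits 11..2 are reserved; cmd_status is the last bit (undefined if error_ack is set)
    fields["cmd_status"] = v & 1
    if sum(fields[name] for name in ACK_BITS) > 1:
        raise ValueError("more than one ack bit is set")
    fields["time_code_follows"] = bool(
        fields["cmd_status"] and (fields["retrieval_ack"] or fields["storage_ack"]))
    return fields
```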
B.2.8 Time code
Constraints on time code
• A forward operation of specified duration given by a time_code terminates after the actual or implied
  PTS of an access unit is observed such that (PTS minus the current operational PTS value at the start of
  the operation) modulo 2^{33} exceeds the duration.
• A backward operation of specified duration given by a time_code terminates after the actual or implied
  PTS of an access unit is observed such that (the current operational PTS value at the start of the operation
  minus that PTS) modulo 2^{33} exceeds the duration.
• For all the commands in the control() layer, the time_code is specified as a relative duration with respect
  to the current operational PTS value.
• For all the commands in the ack() layer, the time_code is specified by the current operational PTS value.
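The modulo 2^{33} arithmetic keeps the duration test correct when the 33-bit PTS wraps around during an operation. A Python sketch of the two termination rules; the function name is hypothetical, and all values are in cycles of the 90-kHz clock, as described for PTS in B.2.9.

```python
PTS_MODULUS = 1 << 33  # PTS values are 33-bit and wrap modulo 2^33


def operation_finished(pts: int, start_pts: int, duration: int, forward: bool = True) -> bool:
    """True once an access unit with this PTS terminates an operation of the given duration.

    start_pts is the current operational PTS value at the start of the operation.
    """
    elapsed = (pts - start_pts) if forward else (start_pts - pts)
    return elapsed % PTS_MODULUS > duration   # Python's % returns a non-negative result here
```

For example, a 5-second forward play corresponds to duration = 450 000 cycles, and the test must exceed, not merely reach, that value.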
See Table B.6.
Table B.6 – Time code
Syntax                                   No. of bits   Mnemonic
time_code() {
  reserved                               7             bslbf
  infinite_time_flag                     1             bslbf
  if (infinite_time_flag == '0') {
    reserved                             4             bslbf
    PTS [32..30]                         3             bslbf
    marker_bit                           1             bslbf
    PTS [29..15]                         15            bslbf
    marker_bit                           1             bslbf
    PTS [14..0]                          15            bslbf
    marker_bit                           1             bslbf
  }
}
B.2.9 Semantics of fields in time code
infinite_time_flag – This 1-bit flag when set to '1' indicates an infinite time period. This flag is set to '1' in applications
where a time period for a specific operation could not be defined in advance.
PTS [32..0] – The presentation timestamp of the access unit of the bitstream. Depending upon the function, this can be
an absolute value or a relative time delay in cycles of the 90-kHz system clock.
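The 33-bit PTS in time_code() is split into 3 + 15 + 15 bits separated by marker_bits. The following Python sketch encodes and decodes the structure of Table B.6. Writing the reserved bits as '0' is an assumption (only the 12-bit reserved field of control() has its value stated), and the decoder ignores them. Function names are illustrative.

```python
PTS_LIMIT = 1 << 33


def encode_time_code(pts=None) -> bytes:
    """Serialize time_code(); pts=None means infinite_time_flag = '1'."""
    if pts is None:
        return bytes([0x01])                  # 7 reserved bits (assumed '0') + infinite_time_flag
    if not 0 <= pts < PTS_LIMIT:
        raise ValueError("PTS must fit in 33 bits")
    fields = [
        (0, 7), (0, 1), (0, 4),               # reserved, infinite_time_flag = '0', reserved
        (pts >> 30, 3), (1, 1),               # PTS [32..30], marker_bit
        ((pts >> 15) & 0x7FFF, 15), (1, 1),   # PTS [29..15], marker_bit
        (pts & 0x7FFF, 15), (1, 1),           # PTS [14..0], marker_bit
    ]
    v = 0
    for value, width in fields:
        v = (v << width) | value
    return v.to_bytes(6, "big")               # 48 bits in total


def decode_time_code(data: bytes):
    """Return (pts or None for an infinite period, number of bytes consumed)."""
    if data[0] & 0x01:                        # infinite_time_flag is the last bit of byte 0
        return None, 1
    if len(data) < 6:
        raise ValueError("time_code() with a PTS needs 6 bytes")
    v = int.from_bytes(data[:6], "big")
    if not ((v >> 32) & 1 and (v >> 16) & 1 and v & 1):
        raise ValueError("marker_bit not set")
    pts = ((v >> 33) & 0x7) << 30 | ((v >> 17) & 0x7FFF) << 15 | ((v >> 1) & 0x7FFF)
    return pts, 6
```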
Annex C
Program Specific Information
(This annex does not form an integral part of this Recommendation | International Standard)
C.0 Explanation of Program Specific Information in Transport Streams
Subclause 2.4.4 contains the normative syntax, semantics and text concerning Program Specific Information. In all
cases, compliance with the constraints of 2.4.4 is required. This annex provides explanatory information on how to use
the PSI functions, and considers examples of how it may be used in practice.
C.1 Introduction
This Recommendation | International Standard provides a method for describing the contents of Transport Stream
packets for the purpose of the demultiplexing and presentation of programs. The coding specification accommodates
this function through the Program Specific Information (PSI). This annex discusses the use of PSI.
The PSI may be thought of as belonging to six tables:
1) Program Association Table (PAT);
2) TS Program Map Table (PMT);
3) Network Information Table (NIT);
4) Conditional Access Table (CAT);
5) Transport Stream Description Table (TSDT); and
6) IPMP Control Information Table (ICIT).
The contents of the PAT, PMT, CAT and TSDT are specified in this Recommendation | International Standard. ICIT is
defined in ISO/IEC 13818-11 (MPEG-2 IPMP).
The NIT is a private table, and the PID value of the Transport Stream packets which carry it is specified in the PAT.
Both the NIT and ICIT must follow the structure defined in this Recommendation | International Standard.
C.2 Functional mechanism
The tables listed above are conceptual in that they need never be regenerated in a specified form within a decoder.
While these structures may be thought of as simple tables, they may be partitioned before they are sent in Transport
Stream packets. The syntax supports this operation by allowing the tables to be partitioned into sections and by
providing a normative mapping method into Transport Stream packet payloads. A method is also provided to carry
private data in a similar format. This is advantageous because the same basic processing in the decoder can then be used
for both the PSI data and the private data, which helps to keep cost down. For advice on the optimum placing of PSI in the
Transport Stream, see Annex D.
Each section is uniquely identified by the combination of the following elements:
i) table_id
The 8-bit table_id identifies to which table the section belongs.
• Sections with table_id 0x00 belong to the Program Association Table.
• Sections with table_id 0x01 belong to the Conditional Access Table.
• Sections with table_id 0x02 belong to the TS Program Map Table.
• Sections with table_id 0x03 belong to the TS_description_section.
• Sections with table_id 0x04 belong to the ISO_IEC_14496_scene_description_section.
• Sections with table_id 0x05 belong to the ISO_IEC_14496_object_descriptor_section.
• Sections with table_id 0x06 belong to the metadata_section.
• Sections with table_id 0x07 belong to the IPMP_Control_Information_section.
Other values of the table_id can be allocated by the user for private purposes.
It is possible to set up filters looking at the table_id field to identify whether a new section belongs to a
table of interest or not.
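A filter on table_id amounts to a test on the first byte of each section. In this illustrative Python sketch the dictionary simply restates the assignments listed above, and 0xFF is reported as forbidden because that value is used for stuffing (see C.3); the helper names are hypothetical.

```python
TABLE_IDS = {
    0x00: "Program Association Table",
    0x01: "Conditional Access Table",
    0x02: "TS Program Map Table",
    0x03: "TS_description_section",
    0x04: "ISO_IEC_14496_scene_description_section",
    0x05: "ISO_IEC_14496_object_descriptor_section",
    0x06: "metadata_section",
    0x07: "IPMP_Control_Information_section",
}


def table_name(table_id: int) -> str:
    """Name of the table a section belongs to, based on its table_id."""
    if table_id == 0xFF:
        return "forbidden (stuffing byte)"
    return TABLE_IDS.get(table_id, "not listed above")


def section_filter(wanted_ids):
    """Build a predicate that keeps only sections whose first byte is a wanted table_id."""
    wanted = frozenset(wanted_ids)
    return lambda section: len(section) > 0 and section[0] in wanted
```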
ii) table_id_extension
This 16-bit field exists in the long version of a section. In the Program Association Table it is used to
identify the transport_stream_id of the stream – effectively a user-defined label which allows one
Transport Stream to be distinguished from another within a network or across networks. In the
Conditional Access Table this field currently has no meaning and is therefore marked as "reserved"
meaning that it shall be coded as 0xFFFF, but that a meaning may be defined by ITU-T | ISO/IEC in a
subsequent revision of this Recommendation | International Standard. In a TS Program Map section the
field contains the program_number, and thereby identifies the program to which the data in the section
refers. The table_id_extension can also be used as a filter point in certain cases.
iii) section_number
The section_number field allows the sections of a particular table to be reassembled in their original
order by the decoder. There is no obligation within this Recommendation | International Standard that
sections must be transmitted in numerical order, but this is recommended, unless it is desired to transmit
some sections of the table more frequently than others, e.g., due to random access considerations.
iv) version_number
When the characteristics of the Transport Stream described in the PSI change (e.g., extra programs
added, different composition of elementary streams for a given program), then new PSI data has to be
sent with the updated information as the most recently transmitted version of the sections marked as
"current" must always be valid. Decoders need to be able to identify whether the most recently received
section is identical with the section they have already processed/stored (in which case the section can be
discarded), or whether it is different, and may therefore signify a configuration change. This is achieved
by sending a section with the same table_id, table_id_extension, and section_number as the previous
section containing the relevant data, but with the next value of version_number.
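The comparison a decoder performs can be reduced to a small cache keyed by table_id, table_id_extension and section_number. This Python sketch is hypothetical and assumes only sections marked "current" are passed to it. version_number is a 5-bit field, so the value after 31 is 0; a plain inequality test handles that wrap without special cases.

```python
class SectionVersionCache:
    """Remembers the last version_number per (table_id, table_id_extension, section_number)."""

    def __init__(self):
        self._versions = {}

    def is_new(self, table_id: int, table_id_extension: int,
               section_number: int, version_number: int) -> bool:
        """False if the section repeats what is stored (discard it), True otherwise."""
        key = (table_id, table_id_extension, section_number)
        if self._versions.get(key) == version_number:
            return False
        self._versions[key] = version_number   # first sighting or a configuration change
        return True
```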
v) current_next_indicator
It is important to know at what point in the bitstream the PSI is valid. Each section can therefore be
marked as valid "now" (current) or as valid in the immediate future (next). This allows the
transmission of a future configuration in advance of the change, giving the decoder the opportunity to
prepare for the change. There is, however, no obligation to transmit the next version of a section in
advance, but if it is transmitted, then it shall be the next correct version of that section.
C.3 The Mapping of Sections into Transport Stream Packets
Sections are mapped directly into Transport Stream packets, that is to say without a prior mapping into PES packets.
Sections do not have to start at the beginning of Transport Stream packets, (although they may), because the start of the
first section in the payload of a Transport Stream packet is pointed to by the pointer_field. The presence of the
pointer_field is signalled by the payload_unit_start_indicator being set to a value of '1' in PSI packets. (In
non-PSI packets, the indicator signals that a PES packet starts in the Transport Stream packet.) The pointer_field points
to the start of the first section in the Transport Stream packet. There is never more than one pointer_field in a Transport
Stream packet, as the start of any other section can be identified by counting the length of the first and any subsequent
sections, since no gaps between sections within a Transport Stream packet are allowed by the syntax.
It is important to note that within Transport Stream packets of any single PID value, one section must be finished before
the next one is allowed to start; otherwise it is not possible to identify to which section header the data belongs. If a
section finishes before the end of a Transport Stream packet, but it is not convenient to open another section, a stuffing
mechanism is provided to fill up the space. Stuffing is performed by filling each remaining byte of the packet with the
value 0xFF. Consequently, the table_id value 0xFF is forbidden, as it would otherwise be confused with stuffing. Once a
0xFF byte has occurred immediately after the end of a section, the rest of the Transport Stream packet must be stuffed with
0xFF bytes, allowing a decoder to discard the rest of the Transport Stream packet. Stuffing can also be performed using
the normal adaptation_field mechanism.
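The rules above (a single pointer_field, sections packed without gaps, 0xFF stuffing after the last section) can be sketched in Python as follows. This assumes the 4-byte packet header and any adaptation_field have already been removed, so `payload` is only the packet payload of a PSI PID; the function name is hypothetical, and reassembling sections that continue into the next packet is left to the caller.

```python
def split_psi_payload(payload: bytes, payload_unit_start_indicator: bool):
    """Split one PSI packet payload into section pieces.

    Returns (tail_of_previous_section, sections). The last entry in
    sections may be incomplete if it continues in the next packet.
    """
    if not payload_unit_start_indicator:
        # No pointer_field: the whole payload continues a previous section.
        return payload, []
    pointer_field = payload[0]
    tail = payload[1:1 + pointer_field]        # bytes that finish the previous section
    pos = 1 + pointer_field                    # start of the first new section
    sections = []
    while pos < len(payload):
        if payload[pos] == 0xFF:
            break                              # stuffing: the rest of the packet is 0xFF
        if pos + 3 > len(payload):
            sections.append(payload[pos:])     # section header itself split across packets
            break
        section_length = ((payload[pos + 1] & 0x0F) << 8) | payload[pos + 2]
        end = pos + 3 + section_length         # 3 header bytes precede section_length bytes
        sections.append(payload[pos:end])      # truncated if end > len(payload)
        pos = end
    return tail, sections
```

Because sections follow each other without gaps, the loop only needs each section_length to find the next table_id, which is why a single pointer_field per packet is sufficient.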
C.4 Repetition rates and random access
In systems where random access is a consideration, it is recommended to retransmit PSI sections several times, even
when the configuration does not change, because in the general case a decoder needs the PSI data to identify the
contents of the Transport Stream before it can start decoding. This Recommendation | International Standard does not
place any requirements on the repetition or occurrence rate of PSI sections. Clearly, though, repeating sections
frequently helps random access applications, at the cost of an increase in the bitrate used by PSI data. If
program mappings are static or quasi-static, they may be stored in the decoder to allow faster access to the data than
waiting for it to be retransmitted. The decoder manufacturer may choose the trade-off between the amount of storage
required and the desired impact on channel acquisition time.
C.5 What is a program?
The concept of a program has a precise definition within this Recommendation | International Standard [refer to 2.1.60
program (system)]. For a Transport Stream the time base is defined by the PCR. This effectively creates a virtual
channel within the Transport Stream.
Note that this is not the same definition as is commonly used in broadcasting, where a "program" is a collection of
elementary streams not only with a common timebase, but also with a common start and end time. A series of
"broadcaster programs" (referred to in this annex as events) can be transmitted sequentially in a Transport Stream using
the same program_number to create a conventional broadcast TV channel (sometimes called a service).
Event descriptions could be transmitted in private_sections().
A program is denoted by a program_number which has significance only within a Transport Stream. The
program_number is a 16-bit unsigned integer and thus permits 65535 unique programs to exist within a Transport
Stream (program_number 0 is reserved for identification of the NIT). Where several Transport Streams are available to
the decoder (e.g., in a cable network), in order to successfully demultiplex a program, the decoder must be notified of
both the transport_stream_id (to find the right multiplex) and the program_number of the service (to find the right
program within the multiplex).
The Transport Stream mapping may be accomplished via the optional Network Information Table. Note that the
Network Information Table may be stored in decoder non-volatile memory to reduce channel acquisition time. In this
case, it needs to be transmitted only often enough to support timely decoder initialization and set-up operations. The
contents of the NIT are private, but shall at least follow the minimum section structure.
C.6 Allocation of program_number
It may not be convenient in all cases to group all the program elements that share a common clock reference
into one program. It is conceivable to have a multi-service Transport Stream with only one set of PCRs, common to all
services. In general, though, a broadcaster may prefer to split the Transport Stream logically into several programs that
all have the same PCR_PID (location of the clock reference). This method of splitting the program elements into
pseudo-independent programs has several possible uses. Two examples follow:
i) Multilingual transmissions into separate markets
One video stream may be accompanied by several audio streams in different languages. It is advisable to
include an ISO_639_language_descriptor with each audio stream to enable selection of the correct program
and audio. It is reasonable to have several program definitions with different program_numbers, where all the
programs reference the same video stream and PCR_PID but have different audio PIDs. However, it is also
reasonable and possible to list the video stream and all the audio streams as one program, provided this does
not exceed the section size limit of 1024 bytes.
ii) Very large program definitions
The maximum length of a section is 1024 bytes (including the section header and CRC_32). This means
that no single program definition may exceed this length. In the great majority of cases, even with each
program element having several descriptors, this size is adequate. However, cases may be envisaged in
very high bitrate systems that could exceed this limit. It is then generally possible to split the references
to the streams, so that they do not all have to be listed together. Some program elements could be
referenced under more than one program, and some under only one or the other, but not both.
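Whether a program definition fits within the 1024-byte limit can be checked with simple arithmetic on the TS_program_map_section field widths: 12 bytes from table_id through program_info_length, 5 bytes per elementary stream entry (stream_type, elementary_PID, ES_info_length) plus its descriptors, and a 4-byte CRC_32. The Python helpers below are a hypothetical sketch of that count, not part of this Recommendation | International Standard.

```python
PMT_HEADER_BYTES = 12     # table_id ... program_info_length
ES_ENTRY_BYTES = 5        # stream_type, elementary_PID, ES_info_length
CRC_32_BYTES = 4
MAX_SECTION_BYTES = 1024  # includes the section header and CRC_32


def pmt_section_size(program_info_bytes, es_info_bytes):
    """Size in bytes of one TS_program_map_section.

    program_info_bytes: total length of the program-level descriptors.
    es_info_bytes: one entry per elementary stream, giving its descriptor bytes.
    """
    return (PMT_HEADER_BYTES + program_info_bytes
            + sum(ES_ENTRY_BYTES + n for n in es_info_bytes) + CRC_32_BYTES)


def fits_in_one_section(program_info_bytes, es_info_bytes):
    return pmt_section_size(program_info_bytes, es_info_bytes) <= MAX_SECTION_BYTES


def max_elementary_streams(program_info_bytes, es_descriptor_bytes_each):
    """How many equally described elementary streams fit in one section."""
    room = MAX_SECTION_BYTES - PMT_HEADER_BYTES - CRC_32_BYTES - program_info_bytes
    return room // (ES_ENTRY_BYTES + es_descriptor_bytes_each)
```

For example, two elementary streams without descriptors give 26 bytes, the minimal program map used in C.9, and with a 6-byte ISO_639_language_descriptor on every stream about 91 streams fit in a single section.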
C.7 Usage of PSI in a typical system
A communications system, especially in broadcast applications, may consist of many individual Transport Streams.
Each of the four PSI data structures may appear in every Transport Stream in a system. There must always
be a complete version of the program association table listing all programs within the Transport Stream and a complete
TS program map table, containing complete program definitions for all programs within the Transport Stream. If any
streams are scrambled, then there must also be a conditional access table listing the relevant Entitlement
Management Message (EMM) streams. The NIT is entirely optional.
The PSI tables are mapped into Transport Stream packets via the section structure described above. Each section has a
table_id field in its header, allowing sections from PSI tables and private data in private_sections to be mixed in
Transport Stream packets of the same PID value, or even in the same Transport Stream packet. Such mixing is only
possible in packets labelled as carrying TS program map table sections or the NIT, however, since private sections may
not be mapped into PAT or CAT packets. Note also that within packets of the same PID, a complete section must be
transmitted before the next section can be started.
All PAT sections are required to be mapped into Transport Stream packets with PID = 0x0000, and all CA sections into
packets with PID = 0x0001. PMT sections may be mapped into packets of a user-selected PID value, listed
as the PMT_PID for each program in the Program Association Table. Likewise, the PID of the NIT-bearing Transport
Stream packets is user-selected but, if the NIT exists, it must be pointed to by the PAT entry with program_number
0x0000.
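A minimal Python sketch of building and parsing a single-section PAT illustrates the PID 0x0000 table and the program_number 0 entry for the network PID. It assumes the program_association_section layout (8-byte header, 4-byte entries of program_number, 3 reserved bits and a 13-bit PID, then CRC_32) and the MPEG-2 CRC_32 (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection, no final XOR). The function names are hypothetical, and multi-section PATs are not handled.

```python
import struct


def crc32_mpeg2(data: bytes) -> int:
    """CRC-32/MPEG-2: poly 0x04C11DB7, init 0xFFFFFFFF, MSB first, no final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def build_pat_section(transport_stream_id, version_number, programs):
    """Build a single-section PAT; programs maps program_number -> PID."""
    loop = b"".join(struct.pack(">HH", number, 0xE000 | pid)  # reserved bits '111'
                    for number, pid in sorted(programs.items()))
    section_length = 5 + len(loop) + 4          # bytes after section_length, incl. CRC_32
    header = struct.pack(">BHHBBB",
                         0x00,                     # table_id of program_association_section
                         0xB000 | section_length,  # '1', '0', reserved '11', section_length
                         transport_stream_id,
                         0xC0 | ((version_number & 0x1F) << 1) | 0x01,  # current_next = 1
                         0, 0)                     # section_number, last_section_number
    body = header + loop
    return body + struct.pack(">I", crc32_mpeg2(body))


def parse_pat_section(section: bytes) -> dict:
    """Return the fields of one program_association_section."""
    table_id, length_field, ts_id, version_byte, section_number, last_section_number = \
        struct.unpack(">BHHBBB", section[:8])
    if table_id != 0x00:
        raise ValueError("not a program_association_section")
    section = section[:3 + (length_field & 0x0FFF)]
    if crc32_mpeg2(section) != 0:               # CRC over data plus CRC_32 is 0 when intact
        raise ValueError("CRC_32 mismatch")
    programs = {}
    for offset in range(8, len(section) - 4, 4):
        number, pid_field = struct.unpack(">HH", section[offset:offset + 4])
        programs[number] = pid_field & 0x1FFF     # program_number 0 -> network PID
    return {"transport_stream_id": ts_id,
            "version_number": (version_byte >> 1) & 0x1F,
            "current_next_indicator": version_byte & 0x01,
            "section_number": section_number,
            "last_section_number": last_section_number,
            "programs": programs}
```

Running the CRC over the whole section, including the transmitted CRC_32, yields zero for an intact section, which is why the parser can check integrity without recomputing and comparing separately.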
The contents of any CA parameter streams are entirely private, but EMMs and ECMs must also be sent in Transport
Stream packets to be compliant with this Recommendation | International Standard.
Private data tables may be sent using the private_section() syntax. Such tables could be used for example in a
broadcasting environment to describe a service, an upcoming event, broadcast schedules and related information.
C.8 The relationships of PSI structures
Figure C.1 shows an example of the relationship between the four PSI structures and the Transport Stream. Other
examples are possible, but the figure shows the primary connections.
In the following subclauses, each PSI table is described.
[Figure C.1 – Program and network mapping relationships: the Program Association Table (PID 0) points, via Program 0, to the Network PID carrying the Network Information Table (private network data, which may be held in decoder NVM) and, for the other programs, to their Program Map PIDs; each Program Map Table lists the audio and video elementary stream PIDs of a program; the Conditional Access Table (PID 1) points to the EMM streams of CA systems 1 to N in the MPEG-2 Transport Stream.]
C.8.1 Program Association Table
Every Transport Stream must contain a complete, valid Program Association Table. The Program Association Table
gives the correspondence between a program_number and the PID of the Transport Stream packets that carry the
definition of that program (the PMT_PID). The PAT may be partitioned into up to 255 sections before it is mapped into
Transport Stream packets. Each section carries a part of the overall PAT. This partitioning may be desirable to
minimize data loss in error conditions: packet loss or bit errors may be localized to smaller sections of the PAT,
allowing the other sections still to be received and correctly decoded. If all PAT information is put into one section, an
error causing a changed bit in the table_id, for example, would cause the loss of the entire PAT. However, a single-section
PAT is still permitted as long as the section does not exceed the 1024-byte maximum length.
Program 0 (zero) is reserved and is used to specify the Network PID. This is a pointer to the Transport Stream packets
which carry the Network Information Table.
The Program Association Table is always transmitted without encryption.
C.8.2 Program Map Table
The Program Map Table provides the mapping between a program number and the program elements that comprise it.
This table is present in Transport Stream packets having one or more privately-selected PID values. These Transport
Stream packets may contain other private structures as defined by the table_id field. It is possible to have TS PMT
sections referring to different programs carried in Transport Stream packets having a common PID value.
This Recommendation | International Standard requires a minimum of program identification: program number, PCR
PID, stream types and program element PIDs. Additional information for either programs or elementary streams may
be conveyed by use of the descriptor() construct. Refer to C.8.6.
Private data may also be sent in Transport Stream packets denoted as carrying TS program map table sections. This is
accomplished by the use of the private_section(). In a private_section() the application decides whether version_number
and current_next_indicator represent the values of these fields for a single section or whether they are applicable to
many sections as parts of a larger private table.
NOTE 1 – Transport stream packets containing the Program Map Table are transmitted unencrypted.
NOTE 2 – It is possible to transmit information on events in private descriptors carried within the TS_program_map_section()s.
C.8.3 Conditional Access Table
The Conditional Access (CA) Table gives the association between one or more CA systems, their EMM streams and
any special parameters associated with them.
NOTE – The (private) contents of the Transport Stream packets containing EMM and CA parameters if present will, in general,
be encrypted (scrambled).
C.8.4 Network Information Table
The contents of the NIT are private and not specified by this Recommendation | International Standard. In general, it
will contain mappings of user-selected services with transport_stream_ids, channel frequencies, satellite transponder
numbers, modulation characteristics, etc.
C.8.5 Private_section()
Private_section()s can occur in two basic forms: the short form (where only the fields up to and including
section_length are included) or the long form (where all the fields up to and including last_section_number are
present, and the CRC_32 field follows the private data bytes).
Private_section()s can occur in PIDs which are labelled as PMT_PIDs or in Transport Stream packets with other PID
values which contain exclusively private_sections(), including the PID allocated to the NIT. If the Transport Stream
packets of the PID carrying the private_section()s are identified as a PID carrying private_sections (stream_type
assignment value 0x05), then only private_sections may occur in Transport Stream packets of that PID value. The
sections may be either of the short or long type.
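The short and long forms can be told apart by the section_syntax_indicator bit. The following Python sketch, with a hypothetical function name, assumes the private_section() header layout (table_id, section_syntax_indicator, private_indicator, 2 reserved bits, 12-bit private_section_length and, for the long form only, table_id_extension, version_number, current_next_indicator, section_number and last_section_number). It returns the CRC_32 of a long section without verifying it.

```python
def parse_private_section_header(section: bytes) -> dict:
    """Split a private_section() into header fields and private data bytes."""
    table_id = section[0]
    section_syntax_indicator = section[1] >> 7
    private_indicator = (section[1] >> 6) & 0x01
    length = ((section[1] & 0x0F) << 8) | section[2]   # private_section_length
    if not section_syntax_indicator:
        # Short form: private data bytes follow private_section_length directly.
        return {"form": "short", "table_id": table_id,
                "private_indicator": private_indicator,
                "data": section[3:3 + length]}
    # Long form: extension fields, private data bytes, then a 4-byte CRC_32.
    end = 3 + length
    return {"form": "long", "table_id": table_id,
            "private_indicator": private_indicator,
            "table_id_extension": (section[3] << 8) | section[4],
            "version_number": (section[5] >> 1) & 0x1F,
            "current_next_indicator": section[5] & 0x01,
            "section_number": section[6],
            "last_section_number": section[7],
            "data": section[8:end - 4],
            "crc_32": int.from_bytes(section[end - 4:end], "big")}
```

Only the long form carries version_number and current_next_indicator, so it is the form an application would choose when it wants the versioning behaviour described in C.2.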
C.8.6 Descriptors
There are several normative descriptors defined in this Recommendation | International Standard. Many more private
descriptors may also be defined. All descriptors have a common format: {tag, length, data}. Any privately defined
descriptors must adhere to this format. The data portion of these private descriptors is privately defined.
One descriptor, the CA_descriptor(), is used to indicate the location (PID value of transport packets) of ECM data
associated with program elements when it is found in a TS PMT section. When it is found in a CA section, it refers to
EMMs.
In order to extend the number of private descriptors available, the following mechanism could be used: a private
descriptor_tag could be privately defined to be constructed as a composite descriptor. This entails privately defining a
further sub_descriptor as the first field of the private data bytes of the private descriptor. The resulting structure is
shown in Tables C.1 and C.2.
Table C.1 – Composite_descriptor

Syntax                                      No. of bits   Mnemonic
Composite_descriptor() {
   descriptor_tag (privately defined)             8        uimsbf
   descriptor_length                              8        uimsbf
   for (i = 0; i < N; i++) {
      sub_descriptor()
   }
}

Table C.2 – Sub-descriptor

Syntax                                      No. of bits   Mnemonic
sub_descriptor() {
   sub_descriptor_tag                             8        uimsbf
   sub_descriptor_length                          8        uimsbf
   for (i = 0; i < N; i++) {
      private_data_byte                           8        uimsbf
   }
}
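Because every descriptor, and in the composite mechanism every sub_descriptor, follows the {tag, length, data} format, one loop can walk both levels. The Python sketch below is a hypothetical illustration: the function names are invented, and the composite tag is whatever value the private definition assigns (0xF0 in the usage example is purely illustrative).

```python
def iter_descriptors(loop: bytes):
    """Yield (tag, data) pairs from a {tag, length, data} descriptor loop."""
    pos = 0
    while pos + 2 <= len(loop):                 # need at least tag and length bytes
        tag, length = loop[pos], loop[pos + 1]
        data = loop[pos + 2:pos + 2 + length]
        if len(data) != length:
            raise ValueError("descriptor extends past the end of the loop")
        yield tag, data
        pos += 2 + length


def expand_composite(descriptors, composite_tag):
    """Replace a privately defined composite descriptor by its sub_descriptors."""
    out = []
    for tag, data in descriptors:
        if tag == composite_tag:
            # sub_descriptors reuse the {tag, length, data} layout (Table C.2)
            out.extend(("sub", sub_tag, sub_data)
                       for sub_tag, sub_data in iter_descriptors(data))
        else:
            out.append(("top", tag, data))
    return out
```

Usage: for a loop holding an ISO_639_language_descriptor (tag 0x0A) followed by a composite descriptor with the hypothetical tag 0xF0, `expand_composite(list(iter_descriptors(loop)), 0xF0)` returns the language descriptor unchanged followed by each sub_descriptor.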
C.9 Bandwidth utilization and signal acquisition time
Any implementation of an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bitstream must make reasonable bandwidth demands
for PSI information and, in applications where random access is a consideration, should promote fast signal acquisition.
This subclause analyses this issue and gives some broadcast application examples.
The packet-based nature of the Transport Stream allows for the interspersing of PSI information with fine granularity in
the multiplexed data. This provides significant flexibility in the construction and transmission of PSI.
Signal acquisition time in a real decoder is dependent on many factors, including: FDM tuning slew time,
demultiplexing time, sequence headers, I-frame occurrence rate and scrambling key retrieval and processing.
This subclause examines both the bitrate and signal acquisition time impacts of the PSI syntax in 2.4.4.4 and 2.4.4.9. It
is assumed that the Conditional Access Table does not need to be received dynamically at every program change. This
assumption is also made of the private EMM streams. This is because these streams do not contain the quickly-varying
ECM components used for program element scrambling (encryption).
Also, in the discussion below, the time to acquire and process ECMs has been neglected.
Tables C.3 and C.4 provide bandwidth usage values for a range of Transport Stream conditions. One axis of each table is
the number of programs contained in a single Transport Stream; the other axis is the frequency with which the PSI
information is transmitted in the Transport Stream.
Table C.3 – Program association table bandwidth usage (bit/s)

Frequency of PA Table       Number of programs per transport stream
information (s–1)              1         5        10        32       128
      1                     1504      1504      1504      1504      4512
     10                    15040     15040     15040     15040     45120
     25                    37600     37600     37600     37600    112800
     50                    75200     75200     75200     75200    225600
    100                   150400    150400    150400    150400    451200
NOTE – Since 46 program associations fit into one transport packet, the numbers in the table do not change until the
last column.
Table C.4 – Program map table bandwidth usage (bit/s)

Frequency of PM Table       Number of programs per transport stream
information (s–1)              1         5        10        32       128
      1                     1504      1504      3008      7520     28576
     10                    15040     15040     30080     75200    285760
     25                    37600     37600     75200    188000    714400
     50                    75200     75200    150400    376000   1428800
    100                   150400    150400    300800    601600   2857600
The frequency of PSI transmission will be a key determinant of the component of signal acquisition time due to PSI structures.
Both bandwidth usage tables assume that only the minimum program mapping information is provided. This means that
the PID values and stream types are provided with no additional descriptors. All programs in the example are composed
of two elementary streams. Program associations are 4 bytes long, while the minimal program map is 26 bytes long.
There is additional overhead associated with version numbers, section lengths, etc. This will be on the order of 1 to 3 % of
the total PSI bitrate usage in sections of moderate to maximum length (a few hundred bytes to 1024 bytes) and is
therefore ignored here.
The above assumptions allow forty-six (46) program associations to map into one Program Association Table Transport
Stream packet (if no adaptation field is present). Similarly, seven (7) TS_program_map_sections fit into a single
Transport Stream packet. It may be noted that to facilitate easy "drop/add" it is possible to transmit only one (1)
TS_program_map_section per PMT_PID. This may cause an undesirable increase in PSI bitrate usage, however.
Using a frequency of 25 Hz for the two PSI tables yields a worst-case contribution to the signal acquisition time of
approximately 80 ms (two full 40-ms transmission periods). This would only occur when the required PAT data was
"just missed" and then, once the PAT was acquired and decoded, the required PMT data was also "just missed". This
doubling of the worst-case acquisition time is one disadvantage of the extra level of indirection introduced by the PAT
structure. This effect could be reduced by coordinated transmission of related PAT and PMT packets. Presumably, the
advantage that this approach offers for "drop/add" re-multiplexing operations compensates for this disadvantage.
With the 25-Hz PSI frequency, the following examples may be constructed (all examples leave ample allowance for
various datalink, FEC, CA and routing overheads):
6-MHz CATV channel
• five 5.2-Mbit/s programs: 26.5 Mbit/s (includes transport overhead)
• total PSI bandwidth: 75.2 kbit/s
• CA bandwidth: 500 kbit/s
• total ITU-T Rec. H.222.0 | ISO/IEC 13818-1 transport bandwidth: 27.1 Mbit/s
• PSI overhead: 0.28 %
OC-3 fiber channel (155 Mbit/s)
• thirty-two 3.9-Mbit/s programs: 127.5 Mbit/s (includes transport overhead)
• total PSI bandwidth: 225.6 kbit/s
• CA bandwidth: 500 kbit/s
• total ITU-T Rec. H.222.0 | ISO/IEC 13818-1 transport bandwidth: 128.2 Mbit/s
• PSI overhead: 0.18 %
C-band satellite transponder
• 128 audio programs at 256 kbit/s: 33.5 Mbit/s (includes transport overhead)
• total PSI bandwidth: 826.4 kbit/s
• CA bandwidth: 500 kbit/s
• total ITU-T Rec. H.222.0 | ISO/IEC 13818-1 transport bandwidth: 34.7 Mbit/s
• PSI overhead: 2.4 % (would actually be lower if only one PID were used per program)
As expected, the percentage overhead increases for lower-rate services, since many more services are possible per
Transport Stream. However, the overhead is not excessive in any of these cases. PSI transmission rates higher than
25 Hz may be used to decrease the impact on channel acquisition time with only a modest increase in bitrate.
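The PSI overhead in these examples is the PAT plus PMT bandwidth at 25 Hz divided by the total transport bandwidth (programs including transport overhead, plus PSI, plus 500 kbit/s of CA data). The following hypothetical Python helper reproduces that ratio under the same packet-counting assumptions as Tables C.3 and C.4. For the CATV and OC-3 examples it gives 75.2 kbit/s and 225.6 kbit/s, matching the quoted overhead percentages; for the satellite example it gives 827.2 kbit/s, which still rounds to 2.4 %.

```python
from math import ceil


def psi_overhead(programs, payload_mbit_s, psi_hz=25, ca_kbit_s=500):
    """Return (PSI bit/s, total bit/s, PSI share in %) for one Transport Stream.

    payload_mbit_s is the program bandwidth including transport overhead.
    """
    pat_packets = ceil(programs / 46)   # 4-byte program associations per 184-byte payload
    pmt_packets = ceil(programs / 7)    # minimal 26-byte TS_program_map_sections per packet
    psi_bps = (pat_packets + pmt_packets) * 188 * 8 * psi_hz
    total_bps = payload_mbit_s * 1e6 + psi_bps + ca_kbit_s * 1e3
    return psi_bps, total_bps, 100 * psi_bps / total_bps
```

Raising `psi_hz` in this model scales the PSI bandwidth linearly, which is the trade-off against acquisition time discussed above.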
Annex D
Systems timing model and application implications of this Recommendation | International Standard
(This annex does not form an integral part of this Recommendation | International Standard)
D.0 Introduction
The ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Systems specification includes a specific timing model for the sampling,
encoding, encoder buffering, transmission, reception, decoder buffering, decoding, and presentation of digital audio and
video in combination. This model is embodied directly in the specification of the syntax and semantic requirements of
compliant ITU-T Rec. H.222.0 | ISO/IEC 13818-1 data streams. Given that a decoding system receives a compliant bit
stream that is delivered correctly in accordance with the timing model, it is straightforward to implement the decoder
such that it produces as output high-quality audio and video that are properly synchronized. There is no normative
requirement, however, that decoders be implemented in such a way as to provide such high-quality presentation output.
In applications where the data are not delivered to the decoder with correct timing, it may be possible to produce the
desired presentation output; however, such capabilities are not in general guaranteed. This informative annex describes
the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Systems timing model in detail, and gives some suggestions for
implementing decoder systems to suit some typical applications.
D.0.1 Timing model
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Systems embodies a timing model in which all digitized pictures and audio
samples that enter the encoder are presented exactly once each, after a constant end-to-end delay, at the output of the
decoder. As such, the sample rates, i.e., the video frame rate and the audio sample rate, are precisely the same at the
decoder as they are at the encoder. This timing model is diagrammed in Figure D.1:
[Figure D.1 – Constant delay model: video and audio inputs pass through their encoders and encoder buffers (variable delay) to the system coder and multiplex, then through storage or transmission to the system decoder and demultiplex, decoder buffers (variable delay) and decoders, producing the video and audio outputs; the end-to-end delay is constant.]
As indicated in Figure D.1, the delay from the input to the encoder to the output or presentation from the decoder is
constant in this model (see footnote 1), while the delay through each of the encoder and decoder buffers is variable. Not only is the
delay through each of these buffers variable within the path of one elementary stream, the individual buffer delays in
the video and audio paths differ as well. Therefore the relative location of coded bits representing audio or video in the
combined stream does not indicate synchronization information. The relative location of coded audio and video is
constrained only by the System Target Decoder (STD) model such that the decoder buffers must behave properly;
therefore coded audio and video that represent sound and pictures that are to be presented simultaneously may be
separated in time within the coded bit stream by as much as one second, which is the maximum decoder buffer delay
that is allowed in the STD model.
____________________
1) Constant delay as indicated for the entire system is required for correct synchronization; however, some deviations are possible.
Network delay is discussed as being constant. Slight deviations may be tolerated, and network adaptation may allow greater
variations of network delay. Both of these are discussed later.
The audio and video sample rates at the encoder are significantly different from one another, and may or may not have
an exact and fixed relationship to one another, depending on whether the combined stream is a Program Stream or a
Transport Stream, and on whether the System_audio_locked and System_video_locked flags are set in the Program
Stream. The duration of a block of audio samples (an audio presentation unit) is generally not the same as the duration
of a video picture.
There is a single, common system clock in the encoder, and this clock is used to create timestamps that indicate the
correct presentation and decoding timing of audio and video, as well as to create timestamps that indicate the
instantaneous values of the system clock itself at sampled intervals. The timestamps that indicate the presentation time
of audio and video are called Presentation Time Stamps (PTS). Those that indicate the decoding time are called
Decoding Timestamps (DTS), and those that indicate the value of the system clock are called the System Clock
Reference (SCR) in Program Streams and the Program Clock Reference (PCR) in Transport Streams. It is the presence
of this common system clock in the encoder, the timestamps that are created from it, the recreation of the clock in
the decoder, and the correct use of the timestamps that together make it possible to synchronize the operation of the
decoder properly.
Encoder implementations may not follow this model exactly; however, the data stream that results from the actual
encoder, storage system, network, and one or more multiplexors must follow the model precisely. (Delivery of the data
may deviate somewhat, depending on the application.) Therefore, in this annex, the term "encoder system clock" is used
to mean either the actual common system clock as described in this model or the equivalent function, however it may be
implemented.
Since the end-to-end delay through the entire system is constant, the audio and video presentations are precisely
synchronized. The construction of System bit streams is constrained such that when they are decoded by a decoder that
follows this model with the appropriately sized decoder buffers, those buffers are guaranteed neither to overflow nor to
underflow, with specific exceptions allowing intentional underflow.
In order for the decoder system to incur the precise amount of delay that causes the entire end-to-end delay to be
constant, it is necessary for the decoder to have a system clock whose frequency of operation and absolute instantaneous
value match those of the encoder. The information necessary to convey the encoder's system clock is encoded in the
SCR or PCR; this function is explained below.
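As a sketch of how the encoder's system clock value travels in a Transport Stream, the following Python helpers decode and encode the 48-bit program_clock_reference field of the adaptation field. They assume the layout of a 33-bit program_clock_reference_base in 90 kHz units, 6 reserved bits and a 9-bit program_clock_reference_extension (0 to 299), so that PCR = base × 300 + extension in 27 MHz ticks. The function names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 27_000_000


def decode_pcr(field: bytes) -> int:
    """Decode a 6-byte program_clock_reference field to 27 MHz ticks."""
    value = int.from_bytes(field[:6], "big")
    base = value >> 15                  # 33-bit base, 90 kHz units
    extension = value & 0x1FF           # 9-bit extension, 0..299
    return base * 300 + extension


def encode_pcr(ticks_27mhz: int) -> bytes:
    """Encode 27 MHz ticks as a 6-byte field, reserved bits set to '1'."""
    base = (ticks_27mhz // 300) % (1 << 33)   # the base wraps at 2**33
    extension = ticks_27mhz % 300
    return ((base << 15) | (0x3F << 9) | extension).to_bytes(6, "big")
```

A decoder typically compares successive decoded PCR values with its local clock to steer its STC frequency; that control loop is beyond this sketch.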
Decoders which are implemented in accordance with this timing model such that they present audio samples and video
pictures exactly once (with specific intentionally coded exceptions), at a constant rate, and such that decoder buffers
behave as in the model, are referred to in this annex as precisely timed decoders, or those that produce precisely timed
output. Decoder implementations are not required by this International Standard to present audio and video in
accordance with this model; it is possible to construct decoders that do not have constant delay, or equivalently do not
present each picture or audio sample exactly once. In such implementations, however, the synchronization between
presented audio and video may not be precise, and the behaviour of the decoder buffers may not follow the reference
decoder model. It is important to avoid overflow at the decoder buffers, as overflow causes a loss of data that may have
significant effects on the resulting decoding process. This annex covers primarily the operation of such precisely timed
decoders and some of the options that are available in implementing these decoders.
D.0.2 Audio and video presentation synchronization
The Systems data coded according to this Recommendation | International Standard contain timestamps concerning the
presentation and decoding of video pictures and blocks of audio samples. The pictures and blocks are called
"Presentation Units", abbreviated PU. The sets of coded bits which represent the PUs and which are included within the
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 bit stream are called "Access Units", abbreviated AU. An audio access unit is
abbreviated AAU, and a video access unit is abbreviated VAU. In ISO/IEC 13818-3 audio the term "audio frame" has
the same meaning as AAU or APU (audio presentation unit) depending on the context. A video presentation unit (VPU)
is a picture, and a VAU is a coded picture.
Some, but not necessarily all, AAUs and VAUs have PTSs associated with them. A PTS indicates the time at which the PU
that results from decoding the associated AU should be presented to the user. The audio PTSs
and video PTSs are both samples from a common time clock, which is referred to as the System Time Clock or STC.
With the correct values of audio and video PTSs included in the data stream, and with the presentation of the audio and
video PUs occurring at the time indicated by the appropriate PTSs in terms of the common STC, precise
synchronization of the presented audio and video is achieved at the decoding system. While the STC is not part of the
normative content of this Recommendation | International Standard, and the equivalent information is conveyed in this
Recommendation | International Standard via such terms as the system_clock_frequency, the STC is an important and
convenient element for explaining the timing model, and it is generally practical to implement encoders and decoders
which include an STC in some form.
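A PTS is a 33-bit sample of the STC expressed in 90 kHz units (the 27 MHz system clock divided by 300). In the PES packet header it occupies 5 bytes, split into 3, 15 and 15 bits separated by marker bits. The Python sketch below assumes that layout and the '0010' prefix used when only a PTS is present; the function names are hypothetical.

```python
PTS_CLOCK_HZ = 90_000


def decode_pts(field: bytes) -> int:
    """Decode a 5-byte PTS (or DTS) field from a PES header to 90 kHz ticks."""
    b = field
    return ((((b[0] >> 1) & 0x07) << 30)       # PTS[32..30]
            | (b[1] << 22)                     # PTS[29..22]
            | (((b[2] >> 1) & 0x7F) << 15)     # PTS[21..15]
            | (b[3] << 7)                      # PTS[14..7]
            | (b[4] >> 1))                     # PTS[6..0]


def encode_pts(pts: int, prefix: int = 0b0010) -> bytes:
    """Encode a 33-bit PTS with marker bits set to '1'."""
    pts %= 1 << 33
    return bytes([
        (prefix << 4) | ((pts >> 29) & 0x0E) | 0x01,  # prefix, PTS[32..30], marker
        (pts >> 22) & 0xFF,
        ((pts >> 14) & 0xFE) | 0x01,                  # PTS[21..15], marker
        (pts >> 7) & 0xFF,
        ((pts << 1) & 0xFE) | 0x01,                   # PTS[6..0], marker
    ])
```

For example, a decoded value of 90 000 corresponds to one second on the STC timeline.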
PTSs are required for the conveyance of accurate relative timing between audio and video, since the audio and video
PUs generally have significantly different and essentially unrelated durations. For example, audio PUs of 1152 samples
each at a sample rate of 44 100 samples per second have a duration of approximately 26.12 ms, and video PUs at a
frame rate of 29.97 Hz have a duration of approximately 33.37 ms. In general, the temporal boundaries of APUs and
VPUs rarely, if ever, coincide. Separate PTSs for audio and video provide the information that indicates the precise
temporal relation of audio and video PUs without requiring any specific relationship between the duration and interval
of audio and video PUs.
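The example durations can be checked with exact rational arithmetic. The Python sketch below assumes that "29.97 Hz" means exactly 30000/1001 Hz; the constant and function names are hypothetical. It computes both PU durations, their length in 90 kHz PTS units, and the shortest interval after which an audio and a video boundary coincide again.

```python
from fractions import Fraction
from math import gcd, lcm   # math.lcm requires Python 3.9 or later

AUDIO_PU = Fraction(1152, 44100)   # seconds per APU: 1152 samples at 44 100 samples/s
VIDEO_PU = Fraction(1001, 30000)   # seconds per picture at 30000/1001 Hz


def common_period(a: Fraction, b: Fraction) -> Fraction:
    """Smallest duration that is a whole multiple of both a and b."""
    # For reduced fractions, lcm(p/q, r/s) = lcm(p, r) / gcd(q, s).
    return Fraction(lcm(a.numerator, b.numerator), gcd(a.denominator, b.denominator))


def in_pts_ticks(seconds: Fraction) -> Fraction:
    """Express a duration in 90 kHz PTS units."""
    return seconds * 90000
```

Under these assumptions a picture lasts exactly 3003 PTS ticks, while an audio frame is not a whole number of ticks, and the two sequences of boundaries realign only every 1281.28 s (49 049 audio frames or 38 400 pictures).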
The values of the PTS fields are defined in terms of the System Target Decoder or STD, which is a fundamental
normative constraint on all System bit streams. The STD is a mathematical model of an idealized decoder which
specifies precisely the movement of all bits into and out of the decoder's buffers, and the basic semantic constraint
imposed on the bit stream is that the buffers within the STD must neither overflow nor underflow, with specific
exceptions provided for underflow in special cases. In the STD model the virtual decoder is always exactly
synchronized with the data source, and audio and video decoding and presentation are exactly synchronized. While
exact and consistent, the STD is somewhat simplified with respect to physical implementations of decoders in order to
clarify its specification and to facilitate its broad application to a variety of decoder implementations. In particular, in
the STD model each of the operations performed on the bit stream in the decoder is performed instantaneously, with the
obvious exception of the time that bits spend in the decoder buffers. In a real decoder system the individual audio and
video decoders do not perform instantaneously, and their delays must be taken into account in the design of the
implementation. For example, if video pictures are decoded in exactly one picture presentation interval 1/P, where P is
the frame rate, and compressed video data are arriving at the decoder at bit rate R, the completion of removing bits
associated with each picture is delayed from the time indicated in the PTS and DTS fields by 1/P, and the video decoder
buffer must be larger than that specified in the STD model by R/P. The video presentation is likewise delayed with
respect to the STD, and the PTS should be handled accordingly. Since the video is delayed, the audio decoding and
presentation should be delayed by a similar amount in order to provide correct synchronization. Delaying decoding and
presentation of audio and video in a decoder may be implemented for example by adding a constant to the PTS values
when they are used within the decoder.
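The compensation described above can be sketched in Python as follows. It assumes a video decoder that takes exactly one picture interval 1/P, adds the same constant to every audio and video PTS (modulo 2^33, since PTS values are 33-bit), and computes the extra video buffer R/P. The function names and the 15 Mbit/s rate in the usage note are illustrative assumptions only.

```python
from fractions import Fraction

PTS_MODULUS = 1 << 33   # PTS and DTS are 33-bit values in 90 kHz units


def decoder_delay_ticks(frame_rate: Fraction) -> int:
    """One picture interval 1/P expressed in 90 kHz ticks (rounded)."""
    return round(Fraction(90000) / frame_rate)


def local_presentation_time(pts: int, delay_ticks: int) -> int:
    """Add the same constant to every audio and video PTS, wrapping at 2**33."""
    return (pts + delay_ticks) % PTS_MODULUS


def extra_video_buffer_bits(bit_rate: int, frame_rate: Fraction) -> Fraction:
    """Additional video buffer beyond the STD size: R / P."""
    return Fraction(bit_rate) / frame_rate
```

For example, at P = 25 Hz the offset is 3600 ticks (40 ms), and with a hypothetical R of 15 Mbit/s the video buffer must be 600 000 bits larger than in the STD. Applying the same offset to the audio PTSs keeps the two presentations aligned.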
Another difference between the STD and precise practical decoder implementations is that in the STD model the explicit
assumption is made that the final audio and video output is presented to the user instantaneously and without further
delay. This may not be the case in practice, particularly with cathode-ray tube displays, and this additional delay should
also be taken into account in the design. Encoders are required to encode audio and video such that the correct
synchronization is achieved when the data is decoded with the STD. Delays in the input and sampling of audio and
video, such as video camera optical charge integration, must be taken into account in the encoder.
In the STD model proper synchronization is assumed and the timestamps and buffer behaviour are tested against this
assumption as a condition of bit stream validity. Of course in a physical decoder precise synchronization is not
automatically the case, particularly upon start-up and in the presence of timing jitter. Precise decoder timing is a goal to
be targeted by decoder designs. Inaccuracy in decoder timing affects the behaviour of the decoder buffers. These topics
are covered in more detail in later subclauses of this annex.
The STD includes Decoding Time Stamps (DTS) as well as PTS fields. The DTS refers to the time that an AU is to be
extracted from the decoder buffer and decoded in the STD model. Since the audio and video elementary stream
decoders are instantaneous in the STD, the decoding time and presentation time are identical in most cases; the only
exception occurs with video pictures which have undergone re-ordering within the coded bit stream, i.e., I- and
P-pictures in the case of non-low-delay video sequences. In cases where re-ordering exists, a temporary delay buffer in
the video decoder is used to store the appropriate decoded I- or P-picture until it should be presented. In all cases where
the decoding and presentation times are identical in the STD, i.e., all AAUs, B-picture VAUs, and I- and P-picture
VAUs within low-delay video sequences, the DTS is not coded, as it would have the same value as the PTS. Where the
values differ, both are coded if either is coded. For all AUs where only the PTS is coded, this field may be interpreted as
being both the PTS and the DTS.
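The coding rule above can be summarized in a few lines. This sketch only restates the cases listed in the text (audio, B-pictures and low-delay video never need a separate DTS). The function names and the string picture types are hypothetical conveniences, not syntax elements of the stream.

```python
from typing import Optional


def dts_differs_from_pts(is_video: bool, picture_type: str = "",
                         low_delay: bool = False) -> bool:
    """True when an AU needs a DTS distinct from its PTS: per the rule above,
    only I- and P-pictures of non-low-delay video sequences are re-ordered."""
    return is_video and picture_type in ("I", "P") and not low_delay


def effective_dts(pts: int, dts: Optional[int]) -> int:
    """When only the PTS is coded it serves as the DTS as well."""
    return pts if dts is None else dts
```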
Since PTS and DTS values are not required for every AAU and VAU, the decoder may choose to interpolate values
which are not coded. PTS values are required with intervals not exceeding 700 ms in each elementary audio and video
stream. These time intervals are measured in presentation time, that is, in the same context as the values of the fields,
not in terms of the times that the fields are transmitted and received. In cases of data streams where the system, video
and audio clocks are locked, as defined in the normative part of this Recommendation | International Standard, each AU
following one for which a DTS or PTS is explicitly coded has an effective decoding time of the sum of that for the
previous AU plus a fixed and specified difference in value of the STC. For example, in video coded at 29.97 Hz each
picture has a difference in time of 3003 cycles of the 90-kHz portion of the STC from the previous picture when the
video and system clocks are locked. The same time relationship exists for decoding successive AUs, although
re-ordering delay in the decoder affects the relationship between decoder AUs and presented PUs. When the data stream
is coded such that the video or audio clock is not locked to the system clock the time difference between decoding
successive AUs may be estimated using the same values as indicated above; however, these time differences are not
exact due to the fact that relationships between the frame rate, audio sample rate, and system clock frequency were not
exact at the encoder.
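Interpolating uncoded decoding times amounts to adding a fixed 90-kHz increment per AU. The sketch below is illustrative: it works in decoding order (presentation times of re-ordered video do not follow a simple increment), and it is exact only when the clocks are locked. The 3003-tick figure for 29.97-Hz video comes from the text; the 2160-tick figure assumes 48-kHz audio with 1152-sample frames (as in MPEG-1 Layer II), an assumption made here for illustration.

```python
from fractions import Fraction

PTS_MODULUS = 1 << 33


def ticks_per_unit(units_per_second: Fraction) -> Fraction:
    """90-kHz STC increment between successive AUs at a nominal AU rate."""
    return Fraction(90_000) / units_per_second


def interpolate_dts(last_coded: int, units_since: int,
                    units_per_second: Fraction) -> int:
    """Estimate the decoding time of the AU that comes `units_since` AUs
    (in decoding order) after the last AU with a coded time stamp.
    Exact only when the video/audio and system clocks are locked."""
    step = ticks_per_unit(units_per_second) * units_since
    return (last_coded + round(step)) % PTS_MODULUS
```

For example, `ticks_per_unit(Fraction(30000, 1001))` is 3003 and `ticks_per_unit(Fraction(48000, 1152))` is 2160.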
Note that the PTS and DTS fields do not, by themselves, indicate the correct fullness of the decoder buffers at start up
nor at any other time, and equivalently, they do not indicate the amount of time delay that should elapse upon receiving
the initial bits of a data stream before decoding should start. This information is retrieved by combining the functions of
the PTS and DTS fields and correct clock recovery, which is covered below. In the STD model, and therefore in
decoders which are modelled after it, the decoder buffer behaviour is determined completely by the SCR (or PCR)
values, the times that they are received, and the PTS and DTS values, assuming that data is delivered in accordance with
the timing model. This information specifies the time that coded data spends in the decoder buffers. The amount of data
that is in the coded data buffers is not explicitly specified, and this information is not necessary, since the timing is fully
specified. Note also that the fullness of the data buffers may vary considerably with time in a fashion that is not
predictable by the decoder, except through the proper use of the timestamps.
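Because timing, not buffer fullness, is what the stream specifies, the time an AU spends in the buffer follows from its DTS and the STC value at which its last byte arrives. This sketch assumes that arrival STC value is already known (for instance, from correct clock recovery). Both values are 33-bit, so the difference is taken modulo 2**33. The names are hypothetical.

```python
STC_BASE_MODULUS = 1 << 33   # the 90-kHz STC base and DTS both wrap at 2**33


def wrapped_difference(later: int, earlier: int) -> int:
    """Signed later - earlier for two 33-bit time stamps, assuming the true
    difference is smaller than half the 2**33 range."""
    d = (later - earlier) % STC_BASE_MODULUS
    return d - STC_BASE_MODULUS if d >= STC_BASE_MODULUS // 2 else d


def buffer_residence_ticks(arrival_stc_base: int, dts: int) -> int:
    """90-kHz ticks that the last byte of an AU spends in the decoder buffer:
    its DTS minus the STC value when that byte arrived. A negative result
    means the AU would arrive after its decoding time (buffer underflow)."""
    return wrapped_difference(dts, arrival_stc_base)
```

Applied to the first AU after acquisition, this value is the start-up delay that a decoder modelled on the STD would wait before decoding.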
In order for the audio and video PTSs to refer correctly to a common STC, a correctly timed common clock must be
made available within the decoder system. This is the subject of the next subclause.
D.0.3
System Time Clock recovery in the decoder
Within the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Systems data stream there are, in addition to the PTS and
DTS fields, clock reference timestamps. These references are samples of the system time clock, which are applicable
both to a decoder and to an encoder. They have a resolution of one cycle of the 27-MHz system clock (1/27 000 000 s), and occur at intervals
up to 100 ms in Transport Streams, or up to 700 ms in Program Streams. As such, they can be utilized to implement
clock reconstruction control loops in decoders with sufficient accuracy for all identified applications.
In the Program Stream, the clock reference field is called the System Clock Reference or SCR. In the Transport Stream,
the clock reference field is called the Program Clock Reference or PCR. In general the SCR and PCR definitions may
be considered to be equivalent, although there are distinctions. The remainder of this subclause uses the term SCR for
clarity; the same statements apply to the PCR except where otherwise noted. The PCR in Transport Streams provides
the clock reference for one program, where a program is a set of elementary streams that have a common time base and
are intended for synchronized decoding and presentation. There may be multiple programs in one Transport Stream, and
each may have an independent time base and a separate set of PCRs.
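As an illustration of how a PCR carries both parts of the clock reference, the sketch below decodes the 6-byte PCR field of a Transport Stream adaptation field: a 33-bit 90-kHz base, 6 reserved bits, then a 9-bit extension counting 27-MHz cycles (0 to 299), combined as base × 300 + extension. The SCR in a Program Stream pack header is laid out differently, with interleaved marker bits, and is not handled here. The function name is hypothetical.

```python
from typing import Tuple


def parse_pcr(field: bytes) -> Tuple[int, int, int]:
    """Decode the 6-byte PCR field of a Transport Stream adaptation field:
    33-bit program_clock_reference_base, 6 reserved bits, then the 9-bit
    program_clock_reference_extension.
    Returns (base_90kHz, extension, pcr_in_27MHz_cycles)."""
    if len(field) != 6:
        raise ValueError("the PCR field is 48 bits (6 bytes) long")
    value = int.from_bytes(field, "big")
    base = value >> 15             # top 33 bits
    extension = value & 0x1FF      # bottom 9 bits, 0..299 in valid streams
    return base, extension, base * 300 + extension
```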
The SCR field indicates the correct value of the STC when the SCR is received at the decoder. Since the SCR occupies
more than one byte of data, and System data streams are defined as streams of bytes, the SCR is defined to arrive at the
decoder when the last byte of the system_clock_reference_base field is received at the decoder. Alternatively the SCR
can be interpreted as the time that the SCR field should arrive at the decoder, assuming that the STC is already known
to be correct. Which interpretation is used depends on the structure of the application system. In applications where the
data source can be controlled by the decoder, such as a locally attached DSM, it is possible for the decoder to have an
autonomous STC frequency, and so the STC need not be recovered. In many important applications, however, this
assumption cannot be made correctly. For example, consider the case where a data stream is delivered simultaneously to
multiple decoders. If each decoder has its own autonomous STC with its own independent clock frequency, the SCRs
cannot be assured to arrive at the correct time at all decoders; one decoder will in general require the SCRs sooner than
the source is delivering them, while another requires them later. This difference cannot be made up with a finite size
data buffer over an unbounded length of time of data reception. Therefore the following addresses primarily the case
where the STC must slave its timing to the received SCRs (or PCRs).
In a correctly constructed and delivered ITU-T Rec. H.222.0 | ISO/IEC 13818-1 data stream, each SCR arrives at the
decoder at precisely the time indicated by the value of that SCR. In this context, "time" means correct value of the STC.
In concept, this STC value is the same value that the encoder's STC had when the SCR was stored or transmitted.
However, the encoding may have been performed not in real time or the data stream may have been modified since it
was originally encoded, and in general the encoder or data source may be implemented in a variety of ways such that
the encoder's STC may be a theoretical quantity.
If the decoder's clock frequency matches exactly that of the encoder, then the decoding and presentation of video and
audio will automatically have the same rate as those at the encoder, and the end-to-end delay will be constant. With
matched encoder and decoder clock frequencies, any correct SCR value can be used to set the instantaneous value of the
decoder's STC, and from that time on the decoder's STC will match that of the encoder without the need for further
adjustment. This condition remains true until there is a discontinuity of timing, such as the end of a Program Stream or
the presence of a discontinuity indicator in a Transport Stream.
In practice a decoder's free-running system clock frequency will not match the encoder's system clock frequency which
is sampled and indicated in the SCR values. The decoder's STC can be made to slave its timing to the encoder using the
received SCRs. The prototypical method of slaving the decoder's clock to the received data stream is via a phase-locked
loop (PLL). Variations of a basic PLL, or other methods, may be appropriate, depending on the specific application
requirements.
A straightforward PLL which recovers the STC in a decoder is diagrammed and described here.
Figure D.2 shows a classic PLL, except that the reference and feedback terms are numbers (STC and SCR or PCR
values) instead of signal events such as edges.
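[Figure D.2 block diagram: the SCR or PCR and the System Time Clock enter a Subtractor, giving e; e passes through a Low-pass filter and gain, giving f; f drives a Voltage-controlled oscillator whose 27-MHz output is the system clock frequency; this clock drives the Counter holding the System Time Clock, which can also be loaded directly from an SCR or PCR and is fed back to the Subtractor.]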
Figure D.2 – STC recovery using PLL
Upon initial acquisition of a new time base, i.e., a new program, the STC is set to the current value encoded in the
SCRs. Typically the first SCR is loaded directly into the STC counter, and the PLL is subsequently operated as a closed
loop. Variations on this method may be appropriate, e.g., if the values of the SCRs are suspect due to jitter or errors.
The closed-loop action of the PLL is as follows. At the moment that each SCR (or PCR) arrives at the decoder, that
value is compared with the current value of the STC. The difference is a number, which has one part in units of 90 kHz
and one part in terms of 300 times this frequency, i.e., 27 MHz. The difference value is linearized to be in a single
number space, typically units of 27 MHz, and is called "e", the error term in the loop. The sequence of e terms is input
to the low-pass filter and gain stage, which are designed according to the requirements of the application. The output of
this stage is a control signal "f" which controls the instantaneous frequency of the Voltage Controlled Oscillator (VCO).
The output of the VCO is an oscillator signal with a nominal frequency of 27 MHz; this signal is used as the system
clock frequency within the decoder. The 27-MHz clock is input to a counter which produces the current STC values,
which consist of both a 27-MHz extension, produced by dividing by 300, and a 90-kHz base value which is derived by
counting the 90-kHz results in a 33-bit counter. The 33-bit, 90-kHz portion of the STC output is used as needed for
comparison with PTS and DTS values. The complete STC is also the feedback input to the subtractor.
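The closed-loop behaviour just described can be simulated in a few lines. This is a minimal sketch, not a decoder design. It assumes a proportional-plus-integral digital loop filter (gains `kp` and `ki` chosen arbitrarily but stable for these settings), an ideal VCO whose frequency is exactly 27 MHz plus the filter output in hertz, and PCRs every 100 ms from an encoder clock running 20 ppm fast. Both counters are modelled as continuous cycle counts read out as integers, split into base and extension, and linearized as described above. All names are hypothetical.

```python
PCR_MODULUS = (1 << 33) * 300        # full 27-MHz clock reference wraps here
NOMINAL_HZ = 27_000_000


def linearize(base: int, ext: int) -> int:
    """Combine the 33-bit 90-kHz base and the 0..299 extension into 27-MHz units."""
    return base * 300 + ext


def wrapped_error(pcr: int, stc: int) -> int:
    """Error term e = PCR - STC in 27-MHz units, wrap-safe."""
    d = (pcr - stc) % PCR_MODULUS
    return d - PCR_MODULUS if d >= PCR_MODULUS // 2 else d


def simulate_pll(encoder_ppm: float, pcr_interval: float = 0.1,
                 steps: int = 300, kp: float = 2.0, ki: float = 0.1):
    """Simulate STC recovery with a PI loop filter and an ideal VCO whose
    frequency is 27 MHz plus the filter output f (in Hz).
    Returns (final VCO frequency in Hz, list of error terms e)."""
    f_enc = NOMINAL_HZ * (1 + encoder_ppm * 1e-6)
    enc_cycles = 0.0        # encoder STC in 27-MHz cycles
    dec_cycles = 0.0        # decoder STC; starts equal, as if the first PCR was loaded
    integral = 0.0
    f_vco = float(NOMINAL_HZ)   # free-running frequency before any correction
    errors = []
    for _ in range(steps):
        enc_cycles += f_enc * pcr_interval
        dec_cycles += f_vco * pcr_interval
        # Read both counters as integer 27-MHz counts, split into base/extension
        pcr_base, pcr_ext = divmod(int(enc_cycles) % PCR_MODULUS, 300)
        stc_base, stc_ext = divmod(int(dec_cycles) % PCR_MODULUS, 300)
        e = wrapped_error(linearize(pcr_base, pcr_ext), linearize(stc_base, stc_ext))
        integral += e
        f_vco = NOMINAL_HZ + kp * e + ki * integral   # low-pass filter and gain
        errors.append(e)
    return f_vco, errors


f_final, e_terms = simulate_pll(encoder_ppm=20.0)
```

In this run the error terms first grow to more than 50 cycles while the loop pulls in, then settle to a few cycles, and after 30 s the VCO frequency lies within 1 ppm of the encoder's 27 000 540 Hz. The integrator is what drives the locked-state error toward zero.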
The bounded maximum interval between successive SCRs (700 ms) or PCRs (100 ms) allows the design and
construction of PLLs which are known to be stable. The bandwidth of the PLLs has an upper bound imposed by this
interval. As shown below, in many applications the PLL required has a very low bandwidth, and so this bound typically
does not impose a significant limitation on the decoder design and performance.
If the free-running or initial frequency of the VCO is close enough to the correct (encoder's) system clock frequency, the
decoder may be able to operate satisfactorily as soon as the STC is initialized correctly, before the PLL has reached a
defined locked state. For a given decoder STC frequency which differs by a bounded amount from the frequency
encoded in the SCRs and which is within the absolute frequency bounds required by the decoder application, the effect
of the mismatch between the encoder's and the decoder's STC frequencies, if there were no PLL, would be the gradual and
unavoidable increase or decrease of the fullness of the decoder's buffers, such that overflow or underflow would occur
eventually with any finite size of decoder buffers. Therefore the amount of time allowable before the decoder's STC
frequency is locked to that of the encoder is determined by the allowable amount of additional decoder buffer size and
delay.
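The rate at which an uncorrected frequency mismatch consumes buffer margin is simple to estimate. This sketch assumes a constant offset and a constant input rate; the numbers used (30 ppm, i.e., 810 Hz at 27 MHz, 4 Mbit/s, 432 000 spare bits) are chosen only for illustration, and the function name is hypothetical.

```python
def seconds_until_buffer_limit(spare_bits: float, bit_rate: float,
                               freq_offset_ppm: float) -> float:
    """How long a decoder with an uncorrected STC frequency offset can run
    before its extra buffer margin is used up. The timing error grows by
    offset * t seconds, which is bit_rate * offset * t bits of buffer."""
    drift = abs(freq_offset_ppm) * 1e-6        # seconds of timing error per second
    if drift == 0.0:
        return float("inf")
    return spare_bits / (bit_rate * drift)


# 810 Hz / 27 MHz = 30 ppm; 432 000 spare bits at 4 Mbit/s last about an hour
print(seconds_until_buffer_limit(432_000, 4_000_000, 30))
```

Any finite margin is eventually exhausted, which is why the time allowed before lock is bounded by the added buffer size and delay.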
If the SCRs are received by the decoder with values and timing that reflect instantaneously correct samples of a constant
frequency STC in the encoder, then the error term e converges to an essentially constant value after the loop has reached
the locked state. This condition of correct SCR values is synonymous with either constant-delay storage and
transmission of the data from the encoder to the decoder, or if this delay is not constant, the effective equivalent of
constant delay storage and transmission with the SCR values having been corrected to reflect the variations in delay.
With the values of e converging to a constant, variations in the instantaneous VCO frequency become essentially zero
after the loop is locked; the VCO is said to have very little jitter or frequency slew. While the loop is in the process of
locking, the rate of change of the VCO frequency, the frequency slew rate, can be controlled strictly by the design of the
low-pass filter and gain stage. In general the VCO slew rate can be designed to meet application requirements, subject
to constraints of decoder buffer size and delay.
D.0.4
SCR and PCR jitter
If a network or a Transport Stream re-multiplexor varies the delay in delivering the data stream from the encoder or
storage system to the decoder, such variations tend to cause a difference between the values of the SCRs (or PCRs) and
the values that they should have when they are actually received. This is referred to as SCR or PCR jitter. For example,
if the delay in delivering one SCR is greater than the delay experienced by other similar fields in the same program, that
SCR is late. Similarly, if the delay is less than for other clock reference fields in the program, the field is early.
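Lateness and earliness can be quantified by comparing how far the arrival time advanced with how far the PCR value advanced. This sketch assumes arrival times are taken from a hypothetical ideal local 27-MHz reference with no frequency offset relative to the encoder; a real measurement must also remove any such drift. Positive results mean late, negative results mean early. The names are illustrative.

```python
from typing import List, Sequence, Tuple

PCR_MODULUS = (1 << 33) * 300


def pcr_offsets(samples: Sequence[Tuple[int, int]]) -> List[int]:
    """For (arrival_time_27mhz, pcr_value) pairs, return each PCR's lateness
    relative to the first one, in 27-MHz cycles (positive = late)."""
    t0, pcr0 = samples[0]
    offsets = []
    for t, pcr in samples:
        elapsed_pcr = (pcr - pcr0) % PCR_MODULUS   # PCR advance, wrap-safe
        offsets.append((t - t0) - elapsed_pcr)
    return offsets


def peak_to_peak(values: Sequence[int]) -> int:
    return max(values) - min(values)
```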
Timing jitter at the input to a decoder is reflected in the combination of the values of the SCRs and the times when they
are received. Assuming a clock recovery structure as illustrated in Figure D.2, any such timing jitter will be reflected in
the values of the error term e; and non-zero values of e induce variations in the values of f, resulting in variations in the
frequency of the 27-MHz system clock. Variations in the frequency of the recovered clock may or may not be
acceptable within decoder systems, depending on the specific application requirements. For example, in precisely timed
decoders that produce composite video output, the recovered clock frequency is typically used to generate the
composite video sample clock and the chroma sub-carrier; the applicable specifications for sub-carrier frequency
stability may permit only very slow adjustment of the system clock frequency. In applications where a significant
amount of SCR or PCR jitter is present at the decoder input and there are tight constraints on the frequency slew rate of
the STC, the constraints of reasonable additional decoder buffer size and delay may not allow proper operation.
The presence of SCR or PCR jitter may be caused for example by network transmission which incorporates packet or
cell multiplexing or variable delay of packets through the network, as may be caused by queuing delays or by variable
network access time in shared-media systems.
Multiplexing or re-multiplexing of Transport or Program Streams changes the order and relative temporal location of
data packets and therefore also of SCRs or PCRs. The change in temporal location of SCRs causes the value of
previously correct SCRs to become incorrect, since in general the time at which they are delivered via a constant delay
network is not correctly represented by their values. Similarly, a Program Stream or Transport Stream with correct
SCRs or PCRs may be delivered over a network which imposes a variable delay on the data stream, without correcting
the SCR or PCR values. The effect is once again SCR or PCR jitter, with attendant effects on the decoder design and
performance. The worst case amount of jitter which is imposed by a network on the SCRs or PCRs received at a
decoder depends on a number of factors which are beyond the scope of this Recommendation | International Standard,
including the depth of queues implemented in each of the network switches and the total number of network switches or
re-multiplexing operations which operate in cascade on the data stream.
In the case of a Transport Stream, correction of PCRs is necessary in a re-multiplex operation, creating a new Transport
Stream from one or more Transport Streams. This correction is accomplished by adding a correction term to the PCR;
this term can be computed as:
$$\Delta PCR = del_{act} - del_{const}$$
where $del_{act}$ is the actual delay experienced by the PCR, and $del_{const}$ is a constant which is used for all PCRs of that
program. The value which should be used for $del_{const}$ will depend on the strategy used by the original
encoder/multiplexor. This strategy could be, for instance, to schedule packets as early as possible, in order to allow later
transmission links to delay them. In Table D.1, three different multiplex strategies are shown together with the
appropriate value for $del_{const}$.
Table D.1 – Re-multiplexing strategy
Strategy    $del_{const}$
Early       $del_{min}$
Late        $del_{max}$
Middle      $del_{avg}$
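The correction amounts to adding del_act − del_const to the PCR, with all quantities in 27-MHz cycles and the sum wrapped at 2**33 × 300. This is a sketch; how a re-multiplexor measures the actual delay of each PCR packet is left open, and the function names are hypothetical.

```python
PCR_MODULUS = (1 << 33) * 300


def restamp_pcr(pcr: int, actual_delay: int, constant_delay: int) -> int:
    """Apply delta_PCR = del_act - del_const (all in 27-MHz cycles)."""
    return (pcr + actual_delay - constant_delay) % PCR_MODULUS


def split_pcr(pcr: int):
    """Return (program_clock_reference_base, program_clock_reference_extension)."""
    return divmod(pcr, 300)
```

With the "early" strategy of Table D.1, `constant_delay` would be del_min, so the correction is never negative.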
When designing a system, private agreements may be needed as to what strategy should be used by the
encoder/multiplexors, since this will have an effect on the ability to perform any additional re-multiplexing.
The amount of multiplex jitter allowed is not normatively bounded in this Recommendation | International Standard.
However, 4 ms is intended to be the maximum amount of jitter in a well-behaved system.
In systems which include re-multiplexors special care might be necessary to ensure that the information in the Transport
Stream is consistent. In particular, this applies to PSI and to discontinuity points. Changes in PSI tables might need to
be inserted into a Transport Stream in such a way that subsequent re-multiplexor steps never move them so far that
information becomes incorrect. For instance, a new version of a PMT section should, in some cases, not be sent within
4 ms of the data affected by the change.
Similarly, it may be necessary for an encoder/mux to avoid inserting PTS or DTS in a ±4-ms window around a
discontinuity point.
D.0.5
Clock recovery in the presence of network jitter
In applications in which there is any significant amount of jitter present in the received clock reference timestamps,
there are several choices available for decoder designs; how the decoder is designed depends in large part on the
requirements for the decoder's output signal characteristics as well as the characteristics of the input data and jitter.
Decoders in various applications may have differing requirements for the accuracy and stability of the recovered system
clock, and the degree of this stability and accuracy that is required may be considered to fall along a single axis. One
extreme of this axis may be considered to be those applications where the reconstructed system clock is used directly to
synthesize a chroma sub-carrier for use in composite video. This requirement generally exists where the presented video
is of the precisely timed type, as described above, such that each coded picture is presented exactly once, and where the
output is composite video in compliance with the applicable specifications. In that case the chroma sub-carrier, the pixel
clock, and the frame rate all have exactly specified ratios, and all of these have a defined relationship to the system
clock. The composite video sub-carrier must have at least sufficient accuracy and stability that any normal television
receiver's chroma sub-carrier PLL can lock to the sub-carrier, and the chroma signals which are demodulated using the
recovered sub-carrier do not show visible chrominance phase artifacts. The requirement in some applications is to use
the system clock to generate a sub-carrier that is in full compliance with the NTSC, PAL, or SECAM specifications,
which are typically even more stringent than those imposed by typical television receivers. For example, the SMPTE
specification for NTSC requires a sub-carrier accuracy of 3 ppm, with a maximum short term jitter of 1 ns per
horizontal line time and a maximum long term drift of 0.1 Hz per second.
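Because the sub-carrier is synthesized from the system clock with a fixed ratio, relative (ppm) errors carry over unchanged between the two. The sketch below converts the NTSC figures quoted above, taking the NTSC sub-carrier as 315/88 MHz. It shows that 3 ppm is about 10.7 Hz at the sub-carrier, and that 0.1 Hz/s of drift is about 0.028 ppm/s, or roughly 0.75 Hz/s at 27 MHz.

```python
NTSC_SUBCARRIER_HZ = 315e6 / 88      # 3.579545... MHz
SYSTEM_CLOCK_HZ = 27e6


def hz_to_ppm(delta_hz: float, nominal_hz: float) -> float:
    return delta_hz / nominal_hz * 1e6


def ppm_to_hz(ppm: float, nominal_hz: float) -> float:
    return ppm * 1e-6 * nominal_hz


accuracy_hz = ppm_to_hz(3.0, NTSC_SUBCARRIER_HZ)           # about 10.7 Hz at the sub-carrier
drift_ppm_per_s = hz_to_ppm(0.1, NTSC_SUBCARRIER_HZ)       # about 0.028 ppm/s
drift_27mhz_hz_per_s = ppm_to_hz(drift_ppm_per_s, SYSTEM_CLOCK_HZ)   # about 0.75 Hz/s
```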
In applications where the recovered system clock is not used to generate a chroma sub-carrier, it may still be used to
generate a pixel clock for video and it may be used to generate a sample clock for audio. These clocks have their own
stability requirements that depend on the assumptions made about the receiving display monitor and on the acceptable
amount of audio frequency drift, or "wow and flutter", at the decoder's output.
In applications where each picture and each audio sample are not presented exactly once, i.e., picture and audio sample
"slipping" is allowed, the system clock may have relatively loose accuracy and stability requirements. This type of
decoder may not have precise audio-video presentation synchronization, and the resulting audio and video presentation
may not have the same quality as for precisely timed decoders.
The choice of requirements for the accuracy and stability of the recovered system clock is application dependent. The
following focuses on the most stringent requirement which is identified above, i.e., where the system clock is to be used
to generate a chroma sub-carrier.
D.0.6
System clock used for chroma sub-carrier generation
The decoder design requirements can be determined from the requirements on the resulting sub-carrier and the
maximum amount of network jitter that must be accepted. Similarly, if the system clock performance requirements and
the decoder design's capabilities are known, the tolerable maximum network jitter can be determined. While it is beyond
the scope of this Recommendation | International Standard to state such requirements, the numbers which are needed to
specify the design are identified in order to clarify the statement of the problem and to illustrate a representative design
approach.
With a clock recovery PLL circuit as illustrated in Figure D.2, the recovered system clock must meet the requirements
of a worst case frequency deviation from the nominal, measured in units of ppm (parts per million), and a worst case
frequency slew rate, measured in ppm/s (ppm per second). The peak-to-peak uncorrected network timing jitter has a
value that may be specified in milliseconds. In such a PLL the network timing jitter appears as the error term e in the
diagram, and since the PLL acts as a low-pass filter on jitter at its input, the worst case effect on the 27-MHz output
frequency occurs when there is a maximum amplitude step function of PCR timing at the input. The value e then has a
maximum amplitude equal to the peak-to-peak jitter, which is represented numerically as the jitter (in seconds) times 90 000 in the
base portion of the SCR or PCR encoding, or times 27 000 000 in linearized 27-MHz units. The maximum rate of change of the output of the low-pass filter (LPF), f,
with this maximum value of e at its input, directly determines the maximum frequency slew rate of the 27-MHz output.
For any given maximum value of e and maximum rate of change of f a LPF can be specified. However, as the gain or
cut-off frequency of the LPF is reduced, the time required for the PLL to lock to the frequency represented by the SCRs
or PCRs is increased. Implementation of PLLs with very long time constants can be achieved through the use of digital
LPF techniques, and possibly analogue filter techniques. With digital LPF implementations, when the frequency term f
is the input to an analogue VCO, f is quantized by a digital to analogue converter, whose step size should be considered
when calculating the maximum slew rate of the output frequency.
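One way to turn these requirements into numbers is sketched below. It assumes, for simplicity, that the frequency response to a step in e is governed by an integrating path, df/dt = g·e, so that a jitter step of size E (in 27-MHz cycles) gives a slew of g·E. The slew limit then bounds g, and the same limit bounds how quickly any initial frequency offset can be pulled in. The figures (4 ms of jitter, 0.75 Hz/s, 810 Hz) and the names are illustrative assumptions. The same 4-ms step is 360 ticks in the 90-kHz base.

```python
def max_integral_gain(jitter_pp_s: float, max_slew_hz_per_s: float,
                      clock_hz: float = 27e6) -> float:
    """Largest integrator gain g (Hz/s per 27-MHz cycle of e) such that a
    step of e equal to the peak-to-peak jitter keeps df/dt <= max slew."""
    e_step = jitter_pp_s * clock_hz          # jitter expressed in 27-MHz cycles
    return max_slew_hz_per_s / e_step


def min_pull_in_time(initial_offset_hz: float, max_slew_hz_per_s: float) -> float:
    """Lower bound on lock time: the frequency moves no faster than the slew limit."""
    return abs(initial_offset_hz) / max_slew_hz_per_s
```

With these figures the pull-in takes at least 1080 s, which illustrates why a low-bandwidth loop implies a long locking time.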
In order to ensure that e converges to a value that approaches zero, the open loop gain of the PLL must be very high,
such as might be implemented in an integrator function in the low-pass filter in the PLL.
With a given accuracy requirement, it may be reasonable to construct the PLL such that the initial operating frequency
of the PLL meets the accuracy requirement. In this case the initial 27-MHz frequency before the PLL is locked is
sufficiently accurate to meet the stated output frequency requirement. If it were not for the fact that the decoder's buffers
would eventually overflow or underflow, this initial system clock frequency would be sufficient for long term
operation. However, from the time the decoder begins to receive and decode data until the system clock is locked to the
time and clock frequency that is represented by the received SCRs or PCRs, data is arriving at the buffers at a different
rate than it is being extracted, or equivalently the decoder is extracting access units at times that differ from those of the
System Target Decoder (STD) model. The decoder buffers will continue to become more or less full than those of the
STD according to the trajectory of recovered system clock frequency with respect to the encoder's clock frequency.
Depending on the relative initial VCO frequency and encoder system clock frequency, decoder buffer fullness is either
increasing or decreasing. Assuming this relationship is not known, the decoder needs additional data buffering to allow
for either case. The decoder should be constructed to delay all decoding operations by an amount of time that is at least
equal to the amount of time that is represented by the additional buffering that is allocated for the case of the initial
VCO frequency being greater than the encoder's clock frequency, in order to prevent buffer underflow. If the initial
VCO frequency is not sufficiently accurate to meet the stated accuracy requirements, then the PLL must reach the
locked state before decoding may begin, and there is a different set of considerations regarding the PLL behaviour
during this time and the amount of additional buffering and static delay which is appropriate.
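The extra buffering and start-up delay for the pre-lock period can be bounded from the initial VCO offset and the time the loop takes to lock. This sketch assumes that the frequency error never exceeds its initial value while the loop locks (no frequency overshoot), so the accumulated timing error is at most offset × lock time. The example figures (30 ppm, 60 s, 20 Mbit/s) and the function name are hypothetical.

```python
def prelock_margin(initial_offset_ppm: float, lock_time_s: float,
                   bit_rate: float):
    """Bound on the timing error accumulated before lock, assuming the
    frequency error stays within its initial value.
    Returns (extra_decode_delay_s, extra_buffer_bits_per_direction)."""
    timing_error_s = abs(initial_offset_ppm) * 1e-6 * lock_time_s
    return timing_error_s, timing_error_s * bit_rate


delay_s, extra_bits = prelock_margin(30, 60, 20e6)   # about 1.8 ms and 36 000 bits
```

Since the sign of the offset is unknown, the extra bits are allocated for both directions, and decoding is delayed by `delay_s` to cover the case of a fast VCO.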
A step function in the input timing jitter which produces a step function in the error term e of the PLL in Figure D.2
must produce an output frequency term f such that when it is multiplied by the VCO gain the maximum rate of change
is less than the specified frequency slew rate. The gain of the VCO is stated in terms of the amount of the change in
output frequency with respect to a change in control input. An additional constraint on the LPF in the PLL is that the
static value of e when the loop is locked must be bounded in order to bound the amount of additional buffering and
static decoding delay that must be implemented. This term is minimized when the LPF has very high DC gain.
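The effect of the DC gain can be seen from the locked-state balance: with a finite DC gain G (Hz of VCO correction per 27-MHz cycle of e), the correction G·e must cancel the initial frequency offset, so the static e equals offset / G. This sketch uses arbitrary example values; an integrator (infinite DC gain) drives the static e toward zero.

```python
def static_error_cycles(initial_offset_hz: float, dc_gain_hz_per_cycle: float) -> float:
    """Locked-state error e for a loop whose filter has finite DC gain:
    the correction dc_gain * e must cancel the initial frequency offset."""
    return initial_offset_hz / dc_gain_hz_per_cycle


def static_delay_s(error_cycles: float, clock_hz: float = 27e6) -> float:
    """Static decoding delay corresponding to a static error term."""
    return error_cycles / clock_hz
```

For example, an 810-Hz offset with a DC gain of 2 Hz per cycle leaves e at 405 cycles, i.e., 15 µs of static delay; raising the DC gain a thousandfold reduces it to below one cycle.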
Clock recovery circuits which differ somewhat from that shown in Figure D.2 may be practical. For example, it may be
possible to implement a control loop with a Numerically Controlled Oscillator (NCO) instead of a VCO, wherein the
NCO uses a fixed frequency oscillator and clock cycles are inserted or deleted from normally periodic events at the
output in order to adjust the decoding and presentation timing. There may be some difficulties with this type of
approach when used with composite video, as there is a tendency to cause either problematic phase shifts of the sub-carrier or jitter in the horizontal or vertical scan timing. One possible approach is to adjust the period of horizontal scans
at the start of vertical blanking, while maintaining the phase of the chroma sub-carrier.
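The cycle insertion and deletion described above can be sketched with a fractional accumulator. This example assumes a fixed 27-MHz oscillator and 1716 cycles per line (525-line, 59.94-Hz video). A requested correction in ppm accumulates fractionally, and a whole cycle is added to (or removed from) a line whenever the accumulator crosses ±1. Unlike the approach suggested in the text, this sketch does not restrict adjustments to lines in vertical blanking. The names are hypothetical.

```python
NOMINAL_CYCLES_PER_LINE = 1716   # 27-MHz cycles per line, 525-line/59.94-Hz video


def line_lengths(correction_ppm: float, lines: int) -> list:
    """Cycle counts for successive lines generated from a fixed oscillator,
    inserting (or deleting) one cycle whenever the accumulated fractional
    correction reaches +1 (or -1)."""
    step = NOMINAL_CYCLES_PER_LINE * correction_ppm * 1e-6   # cycles per line
    if abs(step) >= 1.0:
        raise ValueError("correction too large for one cycle per line")
    acc = 0.0
    lengths = []
    for _ in range(lines):
        acc += step
        extra = 0
        if acc >= 1.0:
            extra, acc = 1, acc - 1.0      # insert a cycle in this line
        elif acc <= -1.0:
            extra, acc = -1, acc + 1.0     # delete a cycle from this line
        lengths.append(NOMINAL_CYCLES_PER_LINE + extra)
    return lengths
```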
In summary, depending on the values specified for the requirements, it may or may not be practical to construct a
decoder which reconstructs the system clock with sufficient accuracy and stability, while maintaining desired decoder
buffer sizes and added decoding delay.
D.0.7
Component video and audio reconstruction
If component video is produced at the decoder output, the requirements for timing accuracy and stability are generally
less stringent than is the case for composite video. Typically the frequency tolerance is that which the display deflection
circuitry can accept, and the stability tolerance is determined by the need to avoid visible image displacement on the
display.
The same principles as illustrated above apply; however, the specific requirements are generally easier to meet.
Audio sample rate reconstruction again follows the same principles; however, the stability requirement is determined by
the amount of acceptable long and short term sample rate variation. Using a PLL approach as illustrated in the previous
subclause, short term deviation can be made to be very small, and longer term frequency variation is manifested as
variation in perceived pitch. Again, once specified bounds on this variation are set, specific design requirements can be
determined.
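One common way to express a sample-rate error as a pitch change is in cents (1200 per octave). This sketch assumes that the audio is replayed with a sample clock that is off by a constant number of ppm. It shows that a 30-ppm error shifts pitch by only about 0.05 cents.

```python
import math


def pitch_shift_cents(freq_offset_ppm: float) -> float:
    """Pitch change when audio is replayed with a sample clock that is off by
    `freq_offset_ppm` (1200 cents per octave)."""
    return 1200.0 * math.log2(1.0 + freq_offset_ppm * 1e-6)
```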
D.0.8
Frame slipping
In some applications where precise decoder timing is not required, the decoder's system time clock may not adjust its
operating frequency to match the frequency represented by received SCRs (or PCRs); it may have a free-running
27 MHz clock instead, while still slaving the decoder's STC to the received data. In this case the STC value must be
updated as needed to match the received SCRs. Updating the STC upon receipt of SCRs causes discontinuities in the
STC value. The magnitude of these discontinuities depends upon the difference between the decoder's 27-MHz
frequency and the encoder's 27-MHz frequency, i.e., that which is represented by the received SCRs, and upon the time interval
between successive received SCRs or PCRs. Since the decoder's 27-MHz system clock frequency is not locked to that
of the received data, it cannot be used to generate the video or audio sample clocks while maintaining the precise timing
assumptions of presenting each video and audio presentation unit exactly once and of maintaining the same picture and
audio presentation rate at the decoder and the encoder, with precise audio and video synchronization. There are multiple
possibilities for implementing decoding and presentation systems using this structure.
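The size of each STC discontinuity follows from the two factors just named: the decoder and encoder counters diverge by offset × interval seconds between updates. This sketch assumes a constant offset, with 30 ppm chosen only as an example; the function name is hypothetical.

```python
def stc_jump_cycles(freq_offset_ppm: float, update_interval_s: float,
                    clock_hz: float = 27e6) -> float:
    """STC correction (in 27-MHz cycles) applied at each SCR/PCR when a
    free-running decoder clock differs from the encoder's by freq_offset_ppm."""
    return freq_offset_ppm * 1e-6 * update_interval_s * clock_hz
```

At 30 ppm this is 81 cycles for PCRs every 100 ms and 567 cycles for SCRs every 700 ms.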
In one type of implementation, the pictures and audio samples are decoded at the time indicated by the decoder's STC,
while they are presented at slightly different times, according to the locally produced sample clocks. Depending on the
relationships of the decoder's sample clocks to the encoder's system clock, pictures and audio samples may on occasion
be presented more than once or not at all; this is referred to as "frame slipping" or "sample slipping", in the case of
audio. There may be perceptible artifacts introduced by this mechanism. The audio-video synchronization will in
general not be precise, due to the units of time over which pictures, and perhaps audio presentation units, are repeated or
deleted. Depending on the specific implementation, additional buffering in the decoder is generally needed for coded
data or decoded presentation data. Decoding may be performed immediately before presentation, and not quite at the
time indicated in the decoder's STC, or decoded presentation units may be stored for delayed and possibly repeated
presentation. If decoding is performed at the time of presentation, a mechanism is required to support deleting the
presentation of pictures and audio samples without causing problems in the decoding of predictively coded data.
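The following sketch illustrates frame slipping under simplifying assumptions of our own: picture k becomes available at time k / encoder_rate on the STC slaved to the received SCRs/PCRs, and the free-running local display presents the newest available picture at each of its ticks. The parameter names are hypothetical.

```python
from fractions import Fraction


def slip_schedule(encoder_rate, display_rate, ticks):
    """Index of the picture presented at each local display tick.

    Assumes picture k is available at time k / encoder_rate and the local
    display presents the newest available picture at time n / display_rate.
    Exact rational arithmetic avoids floating-point rounding at tick edges.
    """
    ratio = Fraction(encoder_rate) / Fraction(display_rate)
    return [int(n * ratio) for n in range(ticks)]  # int() floors non-negative values


def count_slips(shown):
    """Return (repeated presentations, pictures never presented)."""
    pairs = list(zip(shown, shown[1:]))
    repeats = sum(1 for a, b in pairs if b == a)
    drops = sum(b - a - 1 for a, b in pairs if b > a + 1)
    return repeats, drops


# Local display slightly fast: one picture is presented twice.
print(count_slips(slip_schedule(25, 26, 27)))  # (1, 0)
# Local display slightly slow: one picture is never presented.
print(count_slips(slip_schedule(26, 25, 26)))  # (0, 1)
```

The same model applies to audio "sample slipping" if the rates are audio sample rates instead of picture rates.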
D.0.9
Smoothing of network jitter
In some applications it may be possible to introduce a mechanism between a network and a decoder in order to reduce
the degree of jitter which is introduced by a network. Whether such an approach is feasible depends on the type of
streams received and the amount and type of jitter which is expected.
Both the Transport Stream and the Program Stream indicate within their syntax the rate at which the stream is intended
to be input to a decoder. These indicated rates are not precise, and cannot be used to reconstruct data stream timing
exactly. They may, however, be useful as part of a smoothing mechanism.
For example, a Transport Stream may be received from a network such that the data is delivered in bursts. It is possible
to buffer the received data and to transmit data from the buffer to the decoder at an approximately constant rate such
that the buffer remains approximately one-half full.
However, a variable rate stream should not be delivered at a constant rate, and with variable rate streams the smoothing
buffer should not always be one-half full. A constant average delay through the buffer requires a buffer fullness that
varies with the data rate. The rate at which data should be extracted from the buffer and input to the decoder can be
approximated using the rate information present in the data stream. In Transport Streams the intended rate is determined
by the values of the PCR fields and the number of Transport Stream bytes between them. In Program Streams the
intended rate is explicitly specified as the program_mux_rate, although, as specified in this Recommendation |
International Standard, the rate may drop to zero at SCR locations, i.e., when data delivered at the indicated rate would
reach an SCR before the time that the SCR indicates.
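For Transport Streams, the rate computation can be sketched as follows: the number of bytes between two PCRs, multiplied by the 27-MHz system clock frequency and divided by the PCR difference, where PCR = PCR_base × 300 + PCR_extension. The helper names are illustrative, and the byte positions are assumed to be measured consistently at the two PCR fields.

```python
SYSTEM_CLOCK_HZ = 27_000_000
PCR_MODULUS = (1 << 33) * 300  # 33-bit PCR_base; PCR_extension counts 0..299


def pcr_from_field(field: bytes) -> int:
    """Decode the 48-bit PCR field: 33-bit base, 6 reserved bits, 9-bit extension."""
    if len(field) != 6:
        raise ValueError("PCR field must be 6 bytes")
    value = int.from_bytes(field, "big")
    base = value >> 15
    extension = value & 0x1FF
    return base * 300 + extension


def transport_rate(pos_a: int, pcr_a: int, pos_b: int, pcr_b: int) -> float:
    """Intended delivery rate in bytes/s between two PCRs of the same program.

    pos_a/pos_b are byte positions of the two PCR fields in the Transport
    Stream. The PCR difference is taken modulo the PCR range, so a wrap of
    the 33-bit base between the two samples is handled.
    """
    ticks = (pcr_b - pcr_a) % PCR_MODULUS
    if ticks == 0:
        raise ValueError("PCR values must differ")
    return (pos_b - pos_a) * SYSTEM_CLOCK_HZ / ticks


# Ten 188-byte packets between PCRs that are 1 ms (27 000 ticks) apart:
print(transport_rate(0, 0, 1880, 27_000))  # 1880000.0
```

A smoothing buffer could then drain at this piecewise rate instead of a single constant rate.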
In the case of variable rate streams, the correct fullness of the smoothing buffer varies with time, and may not be
determined exactly from the rate information. In an alternative approach, the SCRs or PCRs may be used to measure the
time when data enter the buffer and to control the time when data leave the buffer. A control loop can be designed to
provide constant average delay through the buffer. It may be observed that such a design is similar to the control loop
illustrated in Figure D.2. The performance obtainable from inserting such a smoothing mechanism before a decoder can
also be achieved by cascading multiple clock recovery PLLs. The rejection of jitter from the received timing will
benefit from the combined low pass filter effect of the cascaded PLLs.
Annex E
Data transmission applications
(This annex does not form an integral part of this Recommendation | International Standard)
E.0
General considerations
• The ITU-T Rec. H.222.0 | ISO/IEC 13818-1 transport multiplex will be used to transmit data as well as video
and audio.
• Data elementary streams are not continuous, in the way that video and audio streams may appear to be in broadcast
applications.
• While it is possible to identify the beginning of a PES packet, it is not always possible to identify the end of a
PES packet by the beginning of the next PES packet, because one or more Transport packets carrying PES packets
may be lost.
E.1
Suggestion
A suitable solution is to transmit the PES packet described below immediately after the associated PES packet, so that
the end of the associated PES packet is marked explicitly. A PES packet without payload may be sent when there are no
further PES packets to send.
Table E.1 is an example of such a PES packet.
Table E.1 – PES packet header example
PES packet header fields       Values
packet_start_code_prefix       0x000001
stream_id                      assigned
PES_packet_length              0x0003
'10'                           '10'
PES_scrambling_control         '00'
PES_priority                   '0'
data_alignment_indicator       '0'
copyright                      '0'
original_or_copy               '0'
PTS_DTS_flags                  '00'
ESCR_flag                      '0'
ES_rate_flag                   '0'
DSM_trick_mode_flag            '0'
additional_copy_info_flag      '0'
PES_CRC_flag                   '0'
PES_extension_flag             '0'
PES_header_data_length         0x00
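The header of Table E.1 can be serialized as in the minimal sketch below. It assumes that the caller supplies the "assigned" stream_id; the value 0xBD (private_stream_1) in the example is only an illustrative choice. PES_packet_length = 0x0003 counts the three bytes that follow the length field.

```python
def empty_pes_packet(stream_id: int) -> bytes:
    """Build the payload-less PES packet of Table E.1 (9 bytes in total)."""
    if not 0x00 <= stream_id <= 0xFF:
        raise ValueError("stream_id is an 8-bit field")
    return bytes([
        0x00, 0x00, 0x01,  # packet_start_code_prefix
        stream_id,         # stream_id (assigned by the application)
        0x00, 0x03,        # PES_packet_length: 3 bytes follow
        0x80,              # '10', PES_scrambling_control '00', PES_priority,
                           # data_alignment_indicator, copyright, original_or_copy = 0
        0x00,              # PTS_DTS_flags '00' and the six remaining flags = 0
        0x00,              # PES_header_data_length = 0x00
    ])


print(empty_pes_packet(0xBD).hex())  # 000001bd0003800000
```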
Annex F
Graphics of syntax for this Recommendation | International Standard
(This annex does not form an integral part of this Recommendation | International Standard)
F.0
Introduction
This annex is an informative annex presenting graphically the Transport Stream and Program Stream syntax. This annex
in no way replaces any normative clause(s).
In order to produce clear drawings, not all fields have been fully described or represented. Reserved fields may be
omitted or indicated by areas with no detail. Field lengths are indicated in bits.
F.0.1
Transport Stream syntax
See Figure F.1.
Transport Stream: a sequence of 188-byte transport packets, each consisting of a header followed by a payload.
header: sync byte (8), transport error indicator (1), payload unit start indicator (1), transport priority (1), PID (13), transport scrambling control (2), adaptation field control (2), continuity counter (4); followed by the adaptation field
adaptation field: adaptation field length (8), discontinuity indicator (1), random access indicator (1), elementary stream priority indicator (1), 5 flags (5), optional fields, stuffing bytes
optional fields: PCR (42), OPCR (42), splice countdown (8), transport private data length (8), transport private data, adaptation field extension
adaptation field extension: adaptation field extension length (8), 3 flags (3), optional fields
optional fields: ltw_valid flag (1), ltw offset (15), reserved (2), piecewise rate (22), splice type (4), DTS_next_au (33)
Figure F.1 – Transport Stream syntax diagram
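The 4-byte header shown in Figure F.1 can be unpacked as in the sketch below. The function name and the dictionary output are illustrative choices. The bit positions follow the field order and lengths given in the figure.

```python
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> dict:
    """Split the 4-byte Transport Stream packet header into its fields."""
    if len(packet) != 188:
        raise ValueError("transport packets are 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error_indicator": b1 >> 7,
        "payload_unit_start_indicator": (b1 >> 6) & 0x1,
        "transport_priority": (b1 >> 5) & 0x1,
        "PID": ((b1 & 0x1F) << 8) | b2,           # 13 bits
        "transport_scrambling_control": b3 >> 6,  # 2 bits
        "adaptation_field_control": (b3 >> 4) & 0x3,
        "continuity_counter": b3 & 0x0F,          # 4 bits
    }


hdr = parse_ts_header(bytes([0x47, 0x41, 0x00, 0x1A]) + bytes(184))
print(hdr["PID"], hdr["payload_unit_start_indicator"], hdr["continuity_counter"])  # 256 1 10
```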
F.0.2
PES packet
See Figure F.2.
PES packet: packet start code prefix (24), stream id (8), PES packet length (16), optional PES header, PES packet data bytes
optional PES header: '10' (2), PES scrambling control (2), PES priority (1), data alignment indicator (1), copyright (1), original or copy (1), 7 flags (8), PES header data length (8), optional fields, stuffing bytes (0xFF) (m × 8)
optional fields: PTS (33), DTS (33), ESCR (42), ES rate (22), DSM trick mode (8), additional copy info (7), previous PES CRC (16), PES extension
PES extension: 5 flags, optional fields
PES extension optional fields: PES private data (128), pack header field (8), program packet sequence counter (8), P-STD buffer (16), PES extension field length (7), PES extension field data
Figure F.2 – PES packet syntax diagram
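The 33-bit PTS and DTS in Figure F.2 are carried in 5 bytes, interleaved with marker bits. A sketch of the decoding is shown below, together with an encoder used to check it. The helper names are hypothetical, and this sketch ignores the marker bits instead of validating them.

```python
def decode_timestamp(field: bytes) -> int:
    """Decode a 33-bit PTS or DTS from its 5-byte PES header encoding.

    Layout: 4-bit prefix, TS[32..30], marker, TS[29..15], marker,
    TS[14..0], marker. Marker bits are ignored here rather than checked.
    """
    if len(field) != 5:
        raise ValueError("PTS/DTS fields are 5 bytes")
    return (
        (((field[0] >> 1) & 0x07) << 30)
        | (field[1] << 22)
        | ((field[2] >> 1) << 15)
        | (field[3] << 7)
        | (field[4] >> 1)
    )


def encode_timestamp(prefix: int, ts: int) -> bytes:
    """Inverse of decode_timestamp; prefix is e.g. 0b0010 for a PTS-only header."""
    return bytes([
        (prefix << 4) | (((ts >> 30) & 0x07) << 1) | 1,
        (ts >> 22) & 0xFF,
        (((ts >> 15) & 0x7F) << 1) | 1,
        (ts >> 7) & 0xFF,
        ((ts & 0x7F) << 1) | 1,
    ])


pts = 0x123456789  # any value below 2**33
assert decode_timestamp(encode_timestamp(0b0010, pts)) == pts
print(pts / 90_000, "seconds on the 90-kHz time base")
```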
F.0.3
Program Association section
See Figure F.3.
table id (8), section syntax indicator (1), '0' (1), reserved (2), section length (12), transport stream id (16), reserved (2), version number (5), current next indicator (1), section number (8), last section number (8), N loop, CRC (32)
N loop entry for program number 0: program number 0 (16), reserved (3), network PID (13)
N loop entry for program number i: program number i (16), reserved (3), program map PID_i (13)
Figure F.3 – Program Association section diagram
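The sketch below walks the program loop of Figure F.3. It assumes that a complete section is already available in memory, and it does not check the CRC_32. The function name is illustrative.

```python
def parse_pat(section: bytes) -> dict:
    """Map program_number -> PID from a program_association_section.

    program_number 0 maps to the network_PID; other entries map to
    program_map_PIDs. The CRC_32 is not verified in this sketch.
    """
    if section[0] != 0x00:
        raise ValueError("table_id 0x00 expected for the PAT")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4            # stop before CRC_32
    programs = {}
    for i in range(8, end, 4):              # loop starts after the 8-byte header
        program_number = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]  # skip 3 reserved bits
        programs[program_number] = pid
    return programs


pat = bytes.fromhex("00b011" "0001" "c1" "00" "00" "0000e010" "0001f000" "00000000")
print(parse_pat(pat))  # {0: 16, 1: 4096}
```

section_length counts the bytes that follow it, CRC_32 included, so the loop ends 4 bytes before the end of the section.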
F.0.4
CA section
See Figure F.4.
table id (8), section syntax indicator (1), '0' (1), reserved (2), section length (12), reserved (18), version number (5), current next indicator (1), section number (8), last section number (8), N-loop descriptors, CRC (32)
Figure F.4 – Conditional Access section diagram
F.0.5
TS program map section
See Figure F.5.
table id (8), section syntax indicator (1), '0' (1), reserved (2), section length (12), program number (16), reserved (2), version number (5), current next indicator (1), section number (8), last section number (8), reserved (3), PCR PID (13), reserved (4), program info length (12), N loop descriptors, N loop, CRC_32 (32)
N loop: stream type (8), reserved (3), elementary PID (13), reserved (4), ES info length (12), N-loop descriptors
Figure F.5 – TS program map section diagram
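Following the layout of Figure F.5, the elementary stream loop can be walked as sketched below. This sketch skips the descriptor loops instead of decoding them, and it does not check the CRC_32. The function name is illustrative.

```python
def parse_pmt(section: bytes):
    """Return (PCR_PID, [(stream_type, elementary_PID), ...]) from a TS_program_map_section.

    Descriptor loops are skipped rather than decoded, and CRC_32 is not checked.
    """
    if section[0] != 0x02:
        raise ValueError("table_id 0x02 expected for a TS_program_map_section")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    pcr_pid = ((section[8] & 0x1F) << 8) | section[9]
    program_info_length = ((section[10] & 0x0F) << 8) | section[11]
    pos = 12 + program_info_length           # skip program-level descriptors
    end = 3 + section_length - 4             # stop before CRC_32
    streams = []
    while pos < end:
        stream_type = section[pos]
        elementary_pid = ((section[pos + 1] & 0x1F) << 8) | section[pos + 2]
        es_info_length = ((section[pos + 3] & 0x0F) << 8) | section[pos + 4]
        streams.append((stream_type, elementary_pid))
        pos += 5 + es_info_length            # skip ES-level descriptors
    return pcr_pid, streams


pmt = bytes.fromhex("02b01a" "0001" "c1" "00" "00" "e100" "f000"
                    "02e100f000" "03e101f003aabbcc" "00000000")
print(parse_pmt(pmt))  # (256, [(2, 256), (3, 257)])
```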
F.0.6
Private section
See Figure F.6.
table id (8), section syntax indicator (1), private indicator (1), reserved (2), private section length (12), followed by either:
N private data bytes (section syntax indicator '0'), or
table id extension (16), reserved (2), version number (5), current next indicator (1), section number (8), last section number (8), N private data bytes, CRC (32) (section syntax indicator '1')
Figure F.6 – Private section diagram
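The CRC_32 that ends the long form of the private section (and the other sections above) is defined so that the decoder's CRC registers give zero after the entire section has been processed. The bit-serial sketch below assumes the commonly used parameterization of that CRC (often called CRC-32/MPEG-2). The helper names are hypothetical.

```python
def crc32_mpeg2(data: bytes) -> int:
    """CRC-32 as used for section CRC_32 fields.

    Polynomial 0x04C11DB7, register initialized to 0xFFFFFFFF, bits processed
    most-significant first, no final inversion.
    """
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def append_crc(section_without_crc: bytes) -> bytes:
    """Append CRC_32 (big-endian) so that the CRC over the whole section is zero."""
    return section_without_crc + crc32_mpeg2(section_without_crc).to_bytes(4, "big")


def section_crc_ok(section: bytes) -> bool:
    """A received section, including its CRC_32, checks out when the result is zero."""
    return crc32_mpeg2(section) == 0


sec = append_crc(bytes.fromhex("00b00d0001c100000001e100"))
print(section_crc_ok(sec))
```

A table-driven version would be faster. The bit-serial form is shown because it maps directly onto a shift-register description.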
F.0.7
Program Stream
See Figure F.7.
Program Stream: pack 1, pack 2, ..., pack n
pack layer: each pack consists of a pack header followed by PES packet 1, ..., PES packet n
pack header: pack start code (32), '01' (2), SCR (42), program mux rate (22), reserved (5), pack stuffing length (3), pack stuffing bytes, system header
system header: system header start code (32), header length (16), rate bound (22), audio bound (6), fixed flag (1), CSPS flag (1), audio lock flag (1), video lock flag (1), video bound (5), ..., N loop
N loop: stream id (8), '11' (2), P-STD buffer bound scale (1), P-STD buffer size bound (13)
Figure F.7 – Program Stream diagram
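The pack header fields of Figure F.7 can be extracted as shown below, including the marker bits that the figure omits. The sketch assumes at least the first 14 bytes of the pack header are available. SCR is returned as SCR_base × 300 + SCR_extension in 27-MHz ticks, and program_mux_rate is left in its units of 50 bytes/s. The function name is illustrative.

```python
PACK_START_CODE = b"\x00\x00\x01\xba"


def parse_pack_header(hdr: bytes) -> dict:
    """Extract SCR, program_mux_rate and pack_stuffing_length from a pack header."""
    if len(hdr) < 14 or hdr[:4] != PACK_START_CODE:
        raise ValueError("not a pack header")
    v = int.from_bytes(hdr[4:10], "big")    # '01', SCR fields and marker bits (48 bits)
    if v >> 46 != 0b01:
        raise ValueError("expected '01' after pack_start_code")
    scr_base = (
        (((v >> 43) & 0x7) << 30)           # SCR_base[32..30]
        | (((v >> 27) & 0x7FFF) << 15)      # SCR_base[29..15]
        | ((v >> 11) & 0x7FFF)              # SCR_base[14..0]
    )
    scr_ext = (v >> 1) & 0x1FF              # 9-bit SCR_extension
    mux_rate = int.from_bytes(hdr[10:13], "big") >> 2  # 22 bits, then 2 marker bits
    return {
        "SCR": scr_base * 300 + scr_ext,    # in 27-MHz ticks
        "program_mux_rate": mux_rate,       # in units of 50 bytes/s
        "pack_stuffing_length": hdr[13] & 0x07,
    }


print(parse_pack_header(bytes.fromhex("000001ba440004000401000007f8")))
# {'SCR': 0, 'program_mux_rate': 1, 'pack_stuffing_length': 0}
```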
F.0.8
Program Stream map
See Figure F.8.
packet start code prefix (24), map stream id (8), program stream map length (16), current next indicator (1), reserved (2), program stream map version (5), reserved (7), program stream info length (16), N loop descriptors, elementary stream map length (16), N loops, CRC (32)
N loops: stream type (8), elementary stream id (8), elementary stream info length (16), N loop descriptors
Figure F.8 – Program Stream map diagram
Annex G
General information
(This annex does not form an integral part of this Recommendation | International Standard)
G.0
General information
G.0.1
Sync byte emulation
In the choice of PID values, it is recommended that periodic emulation of sync bytes be avoided. Such emulation
may occur within the PID field or as a combination of the PID field and adjacent flag settings. It is
recommended that emulation of the sync byte be permitted to occur in the same position of the packet header for at
most four consecutive transport packets.
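A sketch of a check that a multiplexer could run when choosing PIDs is shown below. It reports which header bytes would equal 0x47, and it measures how long an emulation persists at one position. The helper names are hypothetical. The sketch inspects only header bytes 1 and 2, because 0x47 in byte 3 would require adaptation_field_control = '00', which is reserved.

```python
SYNC_BYTE = 0x47


def emulated_positions(pid: int, payload_unit_start: int = 0,
                       transport_error: int = 0, transport_priority: int = 0) -> list:
    """Header byte positions (1 or 2) that would equal the sync byte 0x47.

    Byte 3 is not checked: 0x47 there would require
    adaptation_field_control = '00', which is reserved.
    """
    b1 = ((transport_error << 7) | (payload_unit_start << 6)
          | (transport_priority << 5) | (pid >> 8))
    b2 = pid & 0xFF
    return [pos for pos, b in ((1, b1), (2, b2)) if b == SYNC_BYTE]


def longest_run(hits: list) -> int:
    """Longest run of consecutive packets emulating the sync byte at one position."""
    best = current = 0
    for hit in hits:
        current = current + 1 if hit else 0
        best = max(best, current)
    return best


print(emulated_positions(0x0747, payload_unit_start=1))  # [1, 2]
print(emulated_positions(0x0100, payload_unit_start=1))  # []
```

Per the recommendation above, longest_run over the per-packet results for one position should stay at 4 or below.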
G.0.2
Skipped picture status and decoding process
Assume that the sequence being displayed contains only I- and P-frames. Denote the next picture to be decoded by
picture_next, and the picture currently being displayed by picture_current. Because the video encoder may skip
pictures, it is possible that not all of the bits of picture_next are present in the STD buffer EBn or Bn when the
time arrives to remove those bits for instantaneous decoding and display. When this case arises, no bits are removed
from the buffer and picture_current is displayed again. When the next picture display time arrives, if the remainder of
the bits corresponding to picture_next are now in buffer EBn or Bn, all the bits of picture_next are removed and
picture_next is displayed. If not all the bits of picture_next are in buffer EBn or Bn, the above process of
re-displaying picture_current is repeated. This process is repeated until picture_next can be displayed. Note that if a
PTS preceded picture_next in the bitstream, it will be incorrect by some multiple of the picture display interval (which
itself may depend on some parameters), and it must be ignored.
Whenever the skipped picture situation described above occurs, the encoder is required to insert a PTS before the
picture to be decoded after picture_next. This allows the decoder to immediately verify that it has correctly displayed
the received picture sequence.
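The display rule described above can be sketched as follows. This model is our own simplification: it represents buffer EBn or Bn only by the cumulative number of bits delivered by each display time, and the picture sizes and delivery figures in the example are hypothetical.

```python
def display_sequence(picture_bits, delivered_bits):
    """Picture index displayed at each display time (None before the first).

    picture_bits[k] is the size of coded picture k; delivered_bits[n] is the
    cumulative number of bits that have entered buffer EBn/Bn by display
    time n. A picture is removed and displayed only when all of its bits are
    present; otherwise the current picture is displayed again.
    """
    shown = []
    current = None
    next_pic = 0
    needed = picture_bits[0] if picture_bits else 0  # cumulative bits up to picture_next
    for total in delivered_bits:
        if next_pic < len(picture_bits) and total >= needed:
            current = next_pic                 # remove all bits of picture_next, display it
            next_pic += 1
            if next_pic < len(picture_bits):
                needed += picture_bits[next_pic]
        shown.append(current)                  # else picture_current is re-displayed
    return shown


# Picture 1 (300 bits) is late, so picture 0 is displayed twice.
print(display_sequence([100, 300, 100], [150, 250, 400, 500, 500]))  # [0, 0, 1, 2, 2]
```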
G.0.3
Selection of PID values
Applications are encouraged to use low numbered PID values (avoiding reserved values as specified in Table 2-4) and
group values together as much as possible.
G.0.4
PES start_code emulation
Three consecutive bytes having the value of a packet_start_code_prefix (0x000001), when concatenated with a
fourth byte, may emulate the first four bytes of a PES_packet_header at an unintended place in the stream.
Such so-called start code emulation is not possible in video elementary streams. It is possible in audio and data
elementary streams. It is also possible at the boundary between a PES_packet_header and a PES_packet payload, even if
the PES_packet payload is video.
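A minimal scan for such emulations is sketched below. To cover the header/payload boundary case, the scan is assumed to run over the header and payload bytes concatenated. The function name is illustrative.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def find_start_code_emulations(data: bytes) -> list:
    """Offsets at which the bytes 0x00 0x00 0x01 occur in data."""
    hits = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1:
        hits.append(pos)
        pos = data.find(START_CODE_PREFIX, pos + 1)
    return hits


# An audio-like payload that happens to contain two emulations:
print(find_start_code_emulations(bytes.fromhex("12000001c000000001")))  # [1, 6]
```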
Annex H
Private data
(This annex does not form an integral part of this Recommendation | International Standard)
H.0
Private data
Private data is any user data that is not coded according to a standard specified by ITU-T | ISO/IEC and referred to in
this Specification. The contents of this data are not specified, and shall not be specified in the future, within this
Recommendation | International Standard. The STD defined in this Specification does not cover private data, other than
the demultiplex process. A private party may define its own STD for private streams.
Private data may be carried in the following locations within the ITU-T Rec. H.222.0 | ISO/IEC 13818-1 syntax.
1) Transport Stream packet (Table 2-2)
The data bytes of the transport_packet() syntax may contain private data. Private data carried in this
format is referred to as user private in the stream_type table (Table 2-34). Transport Stream packets
containing private data may also include adaptation_field()s.
2) Transport Stream adaptation field (Table 2-6)
The presence of any optional private_data_bytes in the adaptation_field() is signalled by the
transport_private_data_flag. The number of private_data_bytes is inherently restricted by the
semantics of the adaptation_field_length field, whose value shall not exceed 183 bytes.
3) PES packet (Table 2-21)
There are two possibilities for carrying private data within PES packets. The first is within the
PES_packet_header, in the optional 16 bytes of PES_private_data. The presence of this field is
signalled by the PES_private_data_flag, whose presence is in turn signalled by the
PES_extension_flag. If present, these bytes, when considered with the adjacent fields, shall not emulate
the packet_start_code_prefix.
The second possibility is within the PES_packet_data_byte field. This may be referred to as private data
within PES packets in the stream_type table (Table 2-34). This category of private data can be split in two:
private_stream_1 refers to private data within PES packets that follow the PES_packet() syntax such
that all fields up to and including, but not limited to, PES_header_data_length are present.
private_stream_2 refers to private data within PES packets where only the first three fields shall be
present, followed by the PES_packet_data_bytes containing private data.
Note that PES packets exist within both Program Streams and Transport Streams; therefore
private_stream_1 and private_stream_2 exist within both Program Streams and Transport Streams.
4) Descriptors
Descriptors exist within Program Streams and Transport Streams. A range of private descriptors may be
defined by the user. These descriptors shall commence with descriptor_tag and descriptor_length fields.
For private descriptors, descriptor_tag may take the values 64-255, as identified in Table 2-45. These
descriptors may be placed within a program_stream_map() (Table 2-34), a CA_section() (Table 2-32), a
TS_program_map_section() (Table 2-33) and any private_section() (Table 2-35).
Specifically, private_data_bytes also appear in the CA_descriptor().
5) Private section (Table 2-35)
The private_section provides a further means of carrying private data, also in two forms. This
type of elementary stream may be identified in the stream_type table (Table 2-34) as private_data in PSI
sections. One type of private_section() includes only the first five defined fields, followed by
private data; for this structure the section_syntax_indicator shall be set to '0'. For the other
type, the section_syntax_indicator shall be set to '1' and the full syntax up to and including
last_section_number shall be present, followed by private_data_bytes and ending with the CRC_32.
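A few of the distinctions listed above can be sketched as helper functions: where PES data starts for private_stream_1 (stream_id 0xBD) and private_stream_2 (stream_id 0xBF), the user-private descriptor_tag range, and the two private_section forms. The names are hypothetical, and this sketch ignores the other stream_id values that also omit the optional PES header.

```python
PRIVATE_STREAM_1 = 0xBD
PRIVATE_STREAM_2 = 0xBF


def pes_payload_offset(pes: bytes) -> int:
    """Offset of the first PES_packet_data_byte.

    private_stream_2 packets carry only the first three fields (start code
    prefix, stream_id, PES_packet_length), so data starts at byte 6. For
    private_stream_1 and other streams using the full header, data starts
    after PES_header_data_length. Other stream_ids without the optional
    header are not handled here.
    """
    if pes[:3] != b"\x00\x00\x01":
        raise ValueError("missing packet_start_code_prefix")
    if pes[3] == PRIVATE_STREAM_2:
        return 6
    return 9 + pes[8]


def is_user_private_descriptor(tag: int) -> bool:
    """descriptor_tag values 64-255 are user private."""
    return 64 <= tag <= 255


def private_section_form(section: bytes) -> str:
    """'short' if section_syntax_indicator is '0', otherwise 'long' (ends with CRC_32)."""
    return "long" if section[1] & 0x80 else "short"


print(pes_payload_offset(bytes.fromhex("000001bf0002aabb")))  # 6
```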
Annex I
Systems conformance and real-time interface
(This annex does not form an integral part of this Recommendation | International Standard)
I.0
Systems conformance and real-time interface
Conformance for ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Program Streams and Transport Streams is specified in terms
of the normative specifications in this Recommendation | International Standard. These specifications include, among
other requirements, a System Target Decoder (T-STD and P-STD) which specifies the behaviour of an idealized
decoder when the stream is the input to such a decoder. This model, and the associated verification, do not include
information concerning the real-time delivery performance of the stream, except for the accuracy of the system clock
frequency which is represented by the Transport Stream and the Program Stream. All Transport Streams and Program
Streams must comply with this Recommendation | International Standard.
In addition, there is a real-time interface specification for input of Transport Streams and Program Streams to a decoder.
This Recommendation | International Standard allows standardization of the interface between MPEG decoders and
adapters to networks, channels, or storage media. The timing effects of channels, and the inability of practical adapters
to eliminate these effects completely, cause deviations from the idealized byte delivery schedule. While it is
not necessary for all MPEG decoders to implement this interface, implementations that include the interface shall
adhere to the specifications. This Recommendation | International Standard covers the real-time delivery behaviour of
Transport Streams and Program Streams to decoders, such that the coded data buffers in decoders are guaranteed neither to
overflow nor to underflow, and decoders are guaranteed to be able to perform clock recovery with the performance
required by their applications.
The MPEG real-time interface specifies the maximum allowable amount of deviation from the idealized byte delivery
schedule which is indicated by the Program Clock Reference (PCR) and System Clock Reference (SCR) fields encoded
in the stream.
Annex J
Interfacing jitter-inducing networks to MPEG-2 decoders
(This annex does not form an integral part of this Recommendation | International Standard)
J.0
Introduction
In this annex the expression "system stream" will be used to refer to both ITU-T Rec. H.222.0 | ISO/IEC 13818-1
Transport Streams and ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Program Streams. When the term "STD" is used, it is
understood to mean the P-STD (Program System Target Decoder) for Program Streams and the T-STD (Transport
System Target Decoder) for Transport Streams.
The intended byte delivery schedule of a system stream can be deduced by analyzing the stream. A system stream is
compliant if it can be decoded by the STD, which is a mathematical model of an idealized decoder. If a compliant
system stream is transmitted over a jitter-inducing network, the true byte delivery schedule may differ significantly
from the intended byte delivery schedule. In such cases it may not be possible to decode the system stream on such an
idealized decoder, because jitter may cause buffer overflows or underflows and may make it difficult to recover the time
base. An important example of such a jitter-inducing network is ATM.
The purpose of this annex is to provide guidance and insight to entities concerned with sending system streams over
jitter-inducing networks. Network-specific compliance models for transporting system streams are likely to be
developed for several types of networks, including ATM. The STD plus a real-time interface definition can play an
integral role in defining such models. A framework for developing network compliance models is presented in J.1.
Three examples of network encoding to enable the building of jitter-smoothing network adapters are discussed in J.2. In
the first example, a constant bitrate system stream is assumed and a FIFO is used for jitter smoothing. In the second
example, the network adaptation layer includes timestamps to facilitate jitter smoothing. In the final example, a
common network clock is assumed to be available end-to-end, and is exploited to achieve jitter smoothing.
Clause J.3 presents two examples of decoder implementations in which network-induced jitter can be accommodated. In
the first example, a jitter-smoothing network adapter is inserted between a network's output and an MPEG-2 decoder.
The MPEG-2 decoder is assumed to conform to a real-time MPEG-2 interface specification. This interface requires an
MPEG-2 decoder with more jitter tolerance than the idealized decoder of the STD. The network adapter processes the
incoming jittered bitstream and outputs a system stream whose true byte delivery schedule conforms to the real-time
specification. Example one is discussed in J.3.1. For some applications the network adapter approach will be too costly
because it requires two stages of processing. Therefore, in the second example the dejittering and MPEG-2 decoding
functions are integrated. The intermediate processing of the jitter-removal device is bypassed, so only a single stage of
clock recovery is required. Decoders that perform integrated dejittering and decoding are referred to in this annex as
integrated network-specific decoders, or simply integrated decoders. Integrated decoders are discussed in J.3.2.
In order to build either network adapters or integrated decoders a maximum value for the peak-to-peak network jitter
must be assumed. In order to promote interoperability, a peak-to-peak jitter bound must be specified for each relevant
network type.
J.1
Network compliance models
One way to model the transmission of a system stream across a jitter-inducing network is shown in Figure J.1.
The system stream is input to a network-specific encoding device that converts the system stream into a
network-specific format. Information to assist in jitter removal at the network output may be part of this format. The
network decoder comprises a network-specific decoder and an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 decoder. The
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 decoder is assumed to conform to a real-time interface specification, and could
have the same architecture as the STD with appropriate buffers made larger to provide more jitter tolerance. The
network-specific decoder removes the non-ITU-T Rec. H.222.0 | ISO/IEC 13818-1 data added by the network-specific
encoder and dejitters the network's output. The output of the network-specific decoder is a system stream that conforms
to the real-time specification.
A network target decoder (NTD) can be defined based on the above architecture. A compliant network bitstream would
be one that can be decoded by the NTD. A network decoder would be compliant if it could decode any
network bitstream that can be decoded by the NTD. A real network decoder may or may not have the architecture of
the NTD.
System stream → Network-specific encoding → Network bitstream → Jitter-inducing network → Network decoder, comprising: Network-specific decoding (includes jitter removal) → System stream → MPEG-2 decoder
Figure J.1 – Sending system streams over a jitter-inducing network
J.2
Network specification for jitter smoothing
In the case of constant bit rate system streams, jitter smoothing can be accomplished with a FIFO. Additional data that
provides specific support for dejittering is not required in the network adaptation layer. After the bytes added by the
network encoding are removed, the system stream data is placed in a FIFO. A PLL keeps the buffer approximately half
full by adjusting the output rate in response to changes in buffer fullness. In this example the amount of jitter-smoothing
achieved will depend on the size of the FIFO and the characteristics of the PLL.
Figure J.2 illustrates a second way to accomplish jitter smoothing. In this example timestamp support from a network
adaptation layer is assumed. Using this technique, both constant bit rate and variable bit rate system streams can be
dejittered.
Sender: System stream data → NDP encode (CR insert, sampled from TC) → NXP encode → Jitter-inducing network
Receiver: NXP decode → NDP decode (CR removal) → Dejittering buffer → De-jittered system stream
Control: the extracted CRs drive TC recovery; TCd = TC - J/2 controls removal of data from the dejittering buffer
Figure J.2 – Jitter-smoothing using network-layer timestamps
Assume the network adapter is designed to compensate for a peak-to-peak jitter of J seconds. The intended byte
delivery schedule is reconstructed using Clock Reference (CR) samples taken from a Time Clock (TC). The CRs and
the TC are analogous to PCRs and the STC. The NDP encode function converts each system stream packet into a
Network Data Packet (NDP). Each NDP contains a field for carrying a CR value, and the current value of the TC is
inserted into this field as the NDP leaves the NDP encoder. The Network Transport Packetization (NXP) function
encapsulates the NDPs into network transport packets. After transmission across the network, the CRs are extracted by
the NDP decoder as the NDPs enter it. The CRs are used to reconstruct the TC, for example by using a PLL. The first
MPEG-2 packet is removed from the dejittering buffer when the delayed TC (TCd) is equal to that packet's CR.
Subsequent MPEG-2 packets are removed when their CR values equal the value of TCd.
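The removal rule can be sketched as follows. The sketch assumes an idealized recovered TC that tracks the sender's TC delayed by exactly the nominal network delay, TC(t) = t - nominal_delay. A packet with clock reference CR is then released when TCd = TC - J/2 equals CR. The function name, units and example figures are hypothetical.

```python
def release_times(crs, arrivals, nominal_delay, jitter_pp):
    """Times at which packets leave the dejittering buffer.

    Assumes the recovered TC satisfies TC(t) = t - nominal_delay, so a packet
    with clock reference CR is released when TCd = TC - J/2 equals CR, i.e.
    at CR + nominal_delay + J/2.
    """
    out = []
    for cr, arrival in zip(crs, arrivals):
        release = cr + nominal_delay + jitter_pp / 2
        if arrival > release:
            raise ValueError("late packet: delay exceeded the nominal delay by more than J/2")
        out.append(release)
    return out


# CRs 10 time units apart; network delays of 97, 104 and 101 around a nominal 100.
print(release_times([0, 10, 20], [97, 114, 121], nominal_delay=100, jitter_pp=8))
# [104.0, 114.0, 124.0]: the original 10-unit spacing is restored
```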
Ignoring implementation details such as the speed of the TC clock recovery loop and the spectral purity of the TC, the
size of the dejittering buffer depends only on the maximum peak-to-peak jitter to be smoothed and the largest transport
rate that occurs in the system stream. The dejittering buffer size, Bdj, is given by
$$B_{dj} = J \, R_{max}$$
where Rmax is the maximum data rate of the system stream in bits per second, so that Bdj is expressed in bits. When
packets traversing the network experience the nominal delay, the buffer is half full. When they experience a delay
J/2 seconds longer than nominal, the buffer is empty, and when they experience a delay J/2 seconds shorter than
nominal (i.e., an advance of J/2 seconds), the buffer is full.
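The sizing rule and the fullness behaviour just described can be checked numerically. The sketch assumes a constant stream rate equal to Rmax, and its figures (2 ms of peak-to-peak jitter on a 10 Mbit/s stream) are hypothetical.

```python
def dejitter_buffer_bits(jitter_pp_s: float, r_max_bps: float) -> float:
    """B_dj = J * R_max, in bits when R_max is in bits per second."""
    return jitter_pp_s * r_max_bps


def fullness_bits(jitter_pp_s: float, rate_bps: float, extra_delay_s: float) -> float:
    """Buffer fullness when packets arrive extra_delay_s later than nominal.

    Assumes a constant rate equal to R_max: nominal delay gives half full,
    +J/2 gives empty and -J/2 (an advance) gives full.
    """
    return rate_bps * (jitter_pp_s / 2 - extra_delay_s)


# Hypothetical figures: 2 ms of peak-to-peak jitter on a 10 Mbit/s stream.
size = dejitter_buffer_bits(0.002, 10e6)
print(size / 8, "bytes")  # about 2500 bytes
```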
As a final example, in some cases a common network clock will be available end-to-end, and it may be feasible to lock
the system clock frequency to the common clock. The network adapter can smooth jitter with a FIFO. The adapter uses
PCRs or SCRs to reconstruct the original byte delivery schedule.
J.3
Example decoder implementations
J.3.1
Network adapter followed by an MPEG-2 decoder
In this implementation a network adapter conforming to the network compliance specification is connected to an
MPEG-2 decoder conforming to the real-time interface specification.
J.3.2
Integrated decoder
The example presented in J.3.1 requires two stages of processing. The first stage is necessary to dejitter the network's
output. The second stage, recovering the STC by processing PCRs or SCRs, is required for STD decoding. The example
presented in this subclause is a decoder that integrates the dejittering and decoding functions in a single system. The
STC clock is recovered directly using the jittered PCR or SCR values. For presenting this example, an MPEG-2
transport stream will be assumed.
Figure J.3 illustrates the operation of the integrated decoder. The stream of network packets input to the decoder is
assumed to be the same as the one shown in Figure J.2.
NXP and NDP decode → MPEG-2 transport stream data → PID filter → Integrated decoding/dejittering buffer → ES decode
PCRs → Clock recovery → STC; STCd = STC - J/2 controls ES decode
Figure J.3 – Integrated dejittering and MPEG-2 decoding
The incoming network packets are reassembled into MPEG-2 transport stream data by the NXP and NDP decode
functions. The jittered ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Transport Stream packets are then filtered to extract
packets with the desired PID. For the case illustrated, the PID being decoded is also carrying the PCRs. The PCR values
are sent to a PLL to recover the STC. Entire packets for the selected PID are placed in the integrated buffer. A positive
value of J/2 s is subtracted from the STC to obtain the delayed STC, STCd. Again, J is the peak-to-peak jitter the
network-savvy decoder can accommodate. The delay is introduced to guarantee that all the data required for an access
unit has arrived in the buffer when the PTS/DTS of the access unit equals the current value of the STCd.
Ignoring implementation details such as the speed of the STC clock recovery loop and the spectral purity of the STC:
$$B_{size} = B_{dec} + B_{mux} + B_{OH} + 512 + B_j = B_n + 512 + B_j$$
where $B_j = R_{max} J$ and $R_{max}$ is the maximum rate at which data is input to the PID filter. Depending on the
implementation, the integrated memory could be broken into two components as in the transport STD.
Annex K
Splicing Transport Streams
(This annex does not form an integral part of this Recommendation | International Standard)
K.0
Introduction
For the purposes of this annex, the term 'splicing' refers to the concatenation, performed at the Transport level, of two
different elementary streams, the resulting Transport Stream conforming fully to this Recommendation | International
Standard. The two elementary streams may have been generated at different locations and/or at different times, and
were not necessarily intended to be spliced together when they were generated. In the following, the 'old' stream is a
continuous elementary stream (video or audio) that is superseded by another stream (the 'new' one) from a certain point
on. This point is called the splice. It is the boundary between data belonging to the 'old' stream and data belonging to
the 'new' one.
A splice can be seamless or non-seamless:
• A seamless splice is a splice inducing no decoding discontinuity (refer to 2.7.6). This means that the
decoding time of the first access unit of the 'new' stream is consistent with the decoding time of the access unit
of the 'old' stream preceding the splice, i.e., it is equal to the decoding time that the next access unit would have had
if the 'old' stream had continued. In the following, this decoding time is called the 'seamless decoding time'.
• A non-seamless splice is a splice which results in a decoding discontinuity, i.e., the decoding time of the
first access unit of the 'new' stream is greater than the seamless decoding time.
NOTE – A decoding time lower than the seamless decoding time is forbidden.
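These definitions can be sketched as a comparison of the first DTS of the 'new' stream with the seamless decoding time. The sketch assumes that both values are 33-bit, 90-kHz DTS values on the same time base. Its treatment of wrap-around (differences in the upper half of the 2**33 range count as negative) is our own convention.

```python
DTS_MODULUS = 1 << 33  # DTS values are 33-bit counts of the 90-kHz clock


def classify_splice(first_new_dts: int, seamless_dts: int) -> str:
    """Classify a splice from the first 'new' DTS and the seamless decoding time.

    Differences are taken modulo 2**33; a difference in the upper half of
    that range is treated as negative (an assumption of this sketch).
    """
    delta = (first_new_dts - seamless_dts) % DTS_MODULUS
    if delta == 0:
        return "seamless"
    if delta < DTS_MODULUS // 2:
        return "non-seamless"
    raise ValueError("decoding time lower than the seamless decoding time is forbidden")


print(classify_splice(1000, 1000), classify_splice(1300, 1000))  # seamless non-seamless
```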
Splicing may be performed at any Transport Stream packet boundary, provided that the resulting stream is legal. But in
the general case, if nothing is known about the location of PES packet starts and access unit starts, this constraint
requires parsing not only the Transport layer but also the PES layer and the Elementary Stream layer, and may in some
cases make processing of the payload of Transport Stream packets necessary. To avoid such complex operations,
splicing should be performed at locations where the Transport Stream has favourable properties, these properties being
indicated by the presence of a splicing point.
The presence of a splicing point is indicated by the splice_flag and splice_countdown fields (refer to 2.4.3.4 for the
semantics of these fields). In the following, the Transport Stream packet in which the splice_countdown field value
reaches zero will be called 'splicing packet'. The splicing point is located immediately after the last byte of the splicing
packet.
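Because splice_countdown is an 8-bit two's complement field, a receiver can locate the splicing packet among the packets of one PID as sketched below. The input representation is our own assumption: one raw field value per packet, with None where a packet carries no splice_countdown. The function names are hypothetical.

```python
def signed8(value: int) -> int:
    """Interpret an 8-bit splice_countdown field as two's complement."""
    return value - 256 if value >= 128 else value


def find_splicing_packet(countdowns: list) -> int:
    """Index of the splicing packet among the packets of one PID.

    countdowns[i] is the raw splice_countdown byte of packet i, or None when
    that packet carries no splice_countdown. The splicing point lies just
    after the last byte of the returned packet. Returns -1 if no value
    reaches zero.
    """
    for i, raw in enumerate(countdowns):
        if raw is not None and signed8(raw) == 0:
            return i
    return -1


print(find_splicing_packet([None, 0x02, 0x01, 0x00, 0xFF]))  # 3
print(signed8(0xFF))  # -1: negative values follow the splicing packet
```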
K.1
The different types of splicing point
A splicing point can be either an ordinary splicing point or a seamless splicing point.
K.1.1
Ordinary splicing points
If the seamless_splice_flag field is not present, or if its value is zero, the splicing point is ordinary. An ordinary splicing point only signals alignment properties of the Elementary Stream:
• the splicing packet ends on the last byte of an Access Unit;
• the payload of the next Transport Stream packet with the same PID starts with the header of a PES packet;
• the payload of that PES packet starts with an Elementary Stream Access Point (or, in the case of video, with a sequence_end_code() immediately followed by an Elementary Stream Access Point).
These properties allow 'cut and paste' operations to be performed easily at the Transport level, while respecting syntactic constraints and ensuring bit stream consistency. However, an ordinary splicing point provides no information about timing or buffer properties. Consequently, with such splicing points, seamless splicing can only be achieved through private arrangements, or by analysing the payload of the Transport Stream packets and tracking buffer status and timestamp values.
K.1.2 Seamless splicing points
If the seamless_splice_flag field is present and its value is one, the splicing point carries information about certain properties of the 'old' stream. This information is not intended for decoders; its primary purpose is to facilitate seamless splicing. Such a splicing point is called a seamless splicing point. The available information is:
• The seamless decoding time, which is encoded as a DTS value in the DTS_next_AU field. This DTS value is expressed in the time base that is valid in the splicing packet.
• In the case of a video elementary stream, the constraints that were applied to the 'old' stream when it was generated, in order to facilitate seamless splicing. These constraints are given by the value of the splice_type field, in the table corresponding to the profile and level of the video stream.
Note that a seamless splicing point can be used as an ordinary splicing point by discarding this additional information. This information may also be used, where judged helpful, to perform non-seamless splicing, or for purposes other than splicing.
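The following is a minimal Python sketch of decoding the splice_type and DTS_next_AU fields. It assumes the five bytes that follow seamless_splice_flag in the adaptation field extension have already been located. It also assumes they use the usual split 33-bit timestamp layout: splice_type (4 bits), DTS_next_AU[32..30] (3 bits) and a marker bit, then 15 bits plus a marker bit, then 15 bits plus a marker bit. The encoder function is included only to make the example testable.

```python
from typing import Tuple

PTS_CLOCK_HZ = 90_000  # DTS values count ticks of a 90 kHz clock


def parse_seamless_splice(field: bytes) -> Tuple[int, int]:
    """Decode (splice_type, DTS_next_AU) from their 5-byte group.

    Bit layout: splice_type(4) DTS[32..30](3) marker(1)
                DTS[29..15](15) marker(1) DTS[14..0](15) marker(1)
    """
    if len(field) != 5:
        raise ValueError("expected 5 bytes")
    if not (field[0] & 1 and field[2] & 1 and field[4] & 1):
        raise ValueError("marker_bit not set")
    splice_type = field[0] >> 4
    dts_32_30 = (field[0] >> 1) & 0x07
    dts_29_15 = ((field[1] << 8) | field[2]) >> 1
    dts_14_0 = ((field[3] << 8) | field[4]) >> 1
    return splice_type, (dts_32_30 << 30) | (dts_29_15 << 15) | dts_14_0


def build_seamless_splice(splice_type: int, dts_next_au: int) -> bytes:
    """Inverse of parse_seamless_splice, used here only for testing."""
    hi = (dts_next_au >> 30) & 0x07
    mid = (dts_next_au >> 15) & 0x7FFF
    lo = dts_next_au & 0x7FFF
    return bytes([
        ((splice_type & 0x0F) << 4) | (hi << 1) | 1,
        mid >> 7, ((mid & 0x7F) << 1) | 1,
        lo >> 7, ((lo & 0x7F) << 1) | 1,
    ])
```

Dividing the decoded DTS_next_AU by PTS_CLOCK_HZ gives the seamless decoding time in seconds. As stated above, it is expressed in the time base valid in the splicing packet.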
K.2 Decoder behaviour on splices
K.2.1 On non-seamless splices
As described above, a non-seamless splice is a splice which results in a decoding discontinuity.
Note that with such a splice, the constraints related to decoding discontinuities (see 2.7.6) shall be fulfilled. In particular:
• a PTS shall be encoded for the first access unit of the 'new' stream (except during trick mode operation or when low_delay = '1');
ISO/IEC 13818-1:2007 (E)
• the decoding time derived from this PTS (or from the associated DTS) shall not be earlier than the seamless decoding time;
• in the case of a video elementary stream, if the splicing packet does not end on a sequence_end_code(), the 'new' stream shall begin with a sequence_end_code() immediately followed by a sequence_header().
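The decoding-time rule above can be expressed as a small classifier. The following is a minimal Python sketch. It assumes that both DTS values are 33-bit counts of the 90 kHz clock expressed in the same time base, and that they differ by less than half the 33-bit range, so that wraparound can be resolved. The function name and return labels are illustrative.

```python
PTS_MODULUS = 1 << 33  # PTS/DTS are 33-bit counters of a 90 kHz clock


def classify_splice(seamless_dts: int, first_new_dts: int) -> str:
    """Compare the first 'new' DTS with the seamless decoding time (DTS_next_AU).

    Assumes both values use the same time base and differ by less than
    half the 33-bit range, so the modular difference can be read as signed.
    """
    delta = (first_new_dts - seamless_dts) % PTS_MODULUS
    if delta >= PTS_MODULUS // 2:
        delta -= PTS_MODULUS             # interpret as a negative difference
    if delta == 0:
        return "seamless"
    if delta > 0:
        return "non-seamless"            # decoding discontinuity (see 2.7.6)
    return "forbidden"                   # earlier than the seamless decoding time
```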
In theory, because such splices introduce decoding discontinuities, they result in a non-continuous presentation of presentation units (i.e., a variable-length dead time between the display of two consecutive pictures, or between two consecutive audio frames). In practice, the result depends on how the decoder is implemented, especially for video. With some video decoders, freezing one or more pictures may be the preferred solution. See Part 4 of ISO/IEC 13818.
K.2.2 On seamless splices
Avoiding a decoding discontinuity is intended to make it possible to avoid a presentation discontinuity. In the case of audio, presentation continuity can always be ensured. In the case of video, however, presentation continuity is in theory not possible in cases 1) and 2) below:
1) The 'old' stream ends at the end of a low-delay sequence, and the 'new' stream begins with the start of a non-low-delay sequence.
2) The 'old' stream ends at the end of a non-low-delay sequence, and the 'new' stream begins with the start of a low-delay sequence.
The effects of such situations are implementation-dependent. For instance, in case 1) a picture may have to be presented for two frame periods, and in case 2) a picture may have to be skipped. However, some implementations may support such situations without any undesirable effect.
In addition, according to 6.1.1.6 of ITU-T Rec. H.262 | ISO/IEC 13818-2, a sequence_end_code() shall be present before the first sequence_header() of the 'new' stream if at least one sequence parameter (i.e., a parameter defined in the sequence header or in a sequence header extension) has different values in the two streams. The only exceptions are the parameters defining the quantization matrices. For example, if the bit rate field does not have the same value in the 'new' stream as in the 'old' one, a sequence_end_code() shall be present. Thus, if the splicing packet does not end on a sequence_end_code(), the 'new' stream shall begin with a sequence_end_code() followed by a sequence_header().
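The rule above can be sketched as a check run by a splicer. The following Python sketch assumes that the sequence-level parameters of both streams have already been parsed into dictionaries. The key names are hypothetical, and only the two quantiser matrices are treated as exempt. The 4-byte value 0x000001B7 is the H.262 sequence_end_code.

```python
SEQUENCE_END_CODE = bytes.fromhex("000001b7")  # sequence_end_code() in H.262

# Hypothetical key names; the quantization matrices are the only
# sequence parameters allowed to differ without a sequence_end_code().
QUANT_MATRIX_KEYS = frozenset({"intra_quantiser_matrix", "non_intra_quantiser_matrix"})


def sequence_end_required(old_seq: dict, new_seq: dict) -> bool:
    """True if any sequence parameter other than a quantization matrix differs."""
    keys = (set(old_seq) | set(new_seq)) - QUANT_MATRIX_KEYS
    return any(old_seq.get(k) != new_seq.get(k) for k in keys)


def needs_inserted_end_code(old_es_tail: bytes, old_seq: dict, new_seq: dict) -> bool:
    """True if the 'new' stream must start with sequence_end_code() + sequence_header().

    old_es_tail: last elementary stream bytes carried by the splicing packet.
    """
    if old_es_tail.endswith(SEQUENCE_END_CODE):
        return False          # the splicing packet already ends on the end code
    return sequence_end_required(old_seq, new_seq)
```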
Given the previous paragraph, a sequence_end_code() will be mandatory in most splices, even seamless ones. Note that ITU-T Rec. H.262 | ISO/IEC 13818-2 specifies the decoding process for video sequences (i.e., the data between a sequence_header() and a sequence_end_code()), and specifies nothing about how to handle a change of sequence. Therefore, for the behaviour of decoders when such splices are encountered, refer to Part 4 of ISO/IEC 13818.
K.2.3 Buffer overflow
Even if both elementary streams comply with the T-STD model before being spliced, the spliced stream may still overflow the STD buffers. The risk exists during the time interval in which bits of both streams are present in these buffers.
In the case of constant bit rate video, this overflow can occur when the video bit rate of the 'new' stream is greater than that of the 'old' one, provided no particular conditions have been applied to the 'old' stream and no particular precautions have been taken during splicing. It is true that buffers MBn and EBn of the T-STD do not overflow if bits are delivered to the T-STD at the 'old' rate. However, suppose the delivery rate at the input of TBn is switched to a higher value before the 'old' bits have been completely removed from the T-STD. The fullness of the STD buffers then becomes higher than if the 'old' stream had continued without splicing, and EBn and/or MBn may overflow. In the case of variable bit rate video, the same problem can occur if the delivery rate of the 'new' stream is higher than the one provided for when the 'old' stream was created. Such an overflow is forbidden.
However, the encoder generating the 'old' stream may apply additional conditions to VBV buffer management in the neighborhood of splicing points, so that provision is made for any 'new' video bit rate lower than a chosen value. For instance, at a seamless splicing point, such additional conditions can be indicated by a splice_type value. The corresponding entries in Table 2-7 through Table 2-20 give splice_decoding_delay and max_splice_rate. In that case, if the video bit rate of the 'new' stream is lower than max_splice_rate, the spliced stream is guaranteed not to overflow while bits of both streams are present in the T-STD buffer.
Where no such constraints have been applied, this problem can be avoided by introducing a dead time in the delivery of bits between the 'old' stream and the 'new' one. The dead time lets the T-STD buffers empty sufficiently before the bits of the 'new' stream are delivered. Let $t_{in}$ denote the time at which the last byte of the last access unit of the 'old' stream enters the STD, and $t_{out}$ the time at which it exits the STD. It is then sufficient to ensure that, during the time interval $[t_{in}, t_{out}]$, no more bits enter the T-STD with the spliced stream than would have entered if the 'old' stream had continued without splicing. For example, suppose the 'old' stream has a constant bit rate $R_{old}$ and the 'new' one a constant bit rate $R_{new}$. It is then sufficient to introduce a dead time $T_d$ satisfying the following relations to avoid this risk of overflow:
$$T_d \ge 0 \quad \text{and} \quad T_d \ge (t_{out} - t_{in}) \times \left(1 - \frac{R_{old}}{R_{new}}\right)$$
This follows because, during $[t_{in}, t_{out}]$, the continued 'old' stream would deliver $R_{old} \times (t_{out} - t_{in})$ bits, whereas after a dead time $T_d$ the 'new' stream delivers $R_{new} \times (t_{out} - t_{in} - T_d)$ bits. Requiring the latter not to exceed the former gives the second relation.
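The dead-time relation can be evaluated directly. The following is a minimal Python sketch. It assumes constant bit rates in bit/s and times in seconds, and that the dead time starts at $t_{in}$. The helper bits_in_window is illustrative: it counts the 'new' bits entering during $[t_{in}, t_{out}]$, so the non-overflow condition can be checked.

```python
def min_dead_time(t_in: float, t_out: float, r_old: float, r_new: float) -> float:
    """Smallest T_d with T_d >= 0 and T_d >= (t_out - t_in) * (1 - r_old / r_new)."""
    if t_out < t_in:
        raise ValueError("t_out must not precede t_in")
    if r_old <= 0 or r_new <= 0:
        raise ValueError("bit rates must be positive")
    return max(0.0, (t_out - t_in) * (1.0 - r_old / r_new))


def bits_in_window(t_in: float, t_out: float, r_new: float, dead_time: float) -> float:
    """Bits of the 'new' stream entering the T-STD during [t_in, t_out]."""
    return r_new * max(0.0, (t_out - t_in) - dead_time)
```

For example, take $t_{out} - t_{in} = 0.5$ s, $R_{old} = 4$ Mbit/s and $R_{new} = 6$ Mbit/s. The minimum dead time is then 0.5 × (1 − 4/6) ≈ 0.167 s. After that dead time, the 'new' stream delivers 6 Mbit/s × 0.333 s = 2 Mbit into the window, which is exactly what the 'old' stream would have delivered. When $R_{new} \le R_{old}$, no dead time is needed.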
Annex L
Registration procedure (see 2.9)
(This annex does not form an integral part of this Recommendation | International Standard)
L.1 Procedure for the request of a Registered Identifier (RID)
Requesters of an RID shall apply to the Registration Authority. Registration forms shall be available from the Registration Authority. The information that the requester shall provide is given in L.3. Companies and organizations are eligible to apply.
L.2 Responsibilities of the Registration Authority
The primary responsibilities of the Registration Authority administering the registration of copyright_identifiers are outlined in this subclause; certain other responsibilities may be found in the JTC 1 Directives. The Registration Authority shall:
a) implement a registration procedure for application for a unique RID in accordance with Annex H of the JTC 1 Directives;
b) receive and process the applications for allocation of the work type code identifier from a Copyright Registration Authority;
c) ascertain which applications received are in accordance with this registration procedure, and inform the requester of their assigned RID within 30 days of receipt of the application;
d) inform application providers whose request is denied, in writing, within 30 days of receipt of the application, and also inform the requesting party of the appeals process;
e) maintain an accurate register of the allocated RIDs. Revisions to the contact information and technical specifications shall be accepted and maintained by the Registration Authority;
f) make the contents of this register available upon request to any interested party;
g) maintain a database of RID request forms, both granted and denied. Parties seeking technical information on the format of private data that has a copyright_identifier shall have access to such information, which is part of the database maintained by the Registration Authority;
h) report its activities to JTC 1, the ITTF and the JTC 1/SC 29 Secretariat, or their respective assignees, annually on a mutually agreed schedule.
L.2.1 Contact information of the Registration Authority
Organization Name:
Address:
Telephone:
Fax:
L.3 Responsibilities of parties requesting an RID
The party requesting an RID for the purpose of copyright identification shall:
a) apply using the form and procedures supplied by the Registration Authority;
b) provide contact information describing how a complete description of the copyright organization can be obtained on a non-discriminatory basis;
c) include technical details of the syntax and semantics of the data format used to describe the audiovisual works or other copyrighted works within the additional_copyright_info field. Once registered, the syntax used for the additional copyright information shall not change;
d) agree to institute the intended use of the granted copyright_identifier within a reasonable time-frame;
e) maintain a permanent record of the application form and of the notification received from the Registration Authority of each granted copyright_identifier.
L.4 Appeal procedure for denied applications
The Registration Management Group (RMG) is formed to have jurisdiction over appeals relating to a denied request for an RID. The RMG shall consist of members nominated by the P- and L-members of the ISO technical body responsible for this Recommendation | International Standard. It shall have a convenor and secretariat nominated from its members. The Registration Authority is entitled to nominate one non-voting observing member.
The responsibilities of the RMG shall be:
a) to review and act on all appeals within a reasonable time-frame;
b) to inform in writing any organization that appeals for reconsideration of its petition of the RMG's disposition of the matter;
c) to review the Registration Authority's annual summary of activities;
d) to supply ISO member bodies with information concerning the scope of operation of the Registration Authority.
Annex M
Registration application form (see 2.9)
(This annex does not form an integral part of this Recommendation | International Standard)
M.1 Contact information of organization requesting a Registered Identifier (RID)
Organization Name:
Address:
Telephone:
Fax:
email:
M.2 Statement of an intention to apply the assigned RID
RID application domain: using guidelines to be provided by the Registration Authority.
M.3 Date of intended implementation of the RID
M.4 Authorized representative
Name:
Title:
Address:
Signature: __________________________________
M.5 For official use only of the Registration Authority
Registration rejected: _______
Reason for rejection of the application:
Registration granted: ________ Registration value: _______
Attachment 1 – Attachment of technical details of the registered data format.
Attachment 2 – Attachment of notification of appeal procedure for rejected applications.
Annex N
Diagram of administration structure (see 2.9)
(This annex does not form an integral part of this Recommendation | International Standard)
[Figure: administration structure. The Registration Authority, responsible for allocation, maintains a correspondence table relating copyright_identifier values to identifier systems such as ISSN, ISBN, ISMN and ISAN. The copyright_identifier indicates the meaning of the code that follows (the copyright_number) and also identifies the work type code; copyright_identifier/copyright_number pairs are shown for Video, Audio and Systems.]
Annex O
Registration procedure (see 2.10)
(This annex does not form an integral part of this Recommendation | International Standard)
O.1 Procedure for the request of an RID
Requesters of an RID shall apply to the Registration Authority. Registration forms shall be available from the
Registration Authority. The requester shall provide the information specified in O.4. Companies and organizations are
eligible to apply.
O.2 Responsibilities of the Registration Authority
The primary responsibilities of the Registration Authority administering the registration of private data format_identifiers are outlined in this annex; certain other responsibilities may be found in the JTC 1 Directives. The Registration Authority shall:
a) implement a registration procedure for application for a unique RID in accordance with the JTC 1 Directives;
b) receive and process the applications for allocation of an identifier from application providers;
c) ascertain which applications received are in accordance with this registration procedure, and inform the requester of their assigned RID within 30 days of receipt of the application;
d) inform application providers whose request is denied, in writing, within 30 days of receipt of the application, and consider resubmission of the application in a timely manner;
e) maintain an accurate register of the allocated identifiers. Revisions to format specifications shall be accepted and maintained by the Registration Authority;
f) make the contents of this register available upon request to National Bodies of JTC 1 that are members of ISO or IEC, to liaison organizations of ISO or IEC and to any interested party;
g) maintain a database of RID request forms, both granted and denied. Parties seeking technical information on the format of private data that has an RID shall have access to such information, which is part of the database maintained by the Registration Authority;
h) report its activities to JTC 1, the ITTF and the SC 29 Secretariat, or their respective designees, annually;
i) accommodate the use of existing RIDs whenever possible.
O.3 Contact information for the Registration Authority
O.4 Responsibilities of parties requesting an RID
The party requesting a format_identifier shall:
a) apply using the form and procedures supplied by the Registration Authority;
b) include a description of the purpose of the registered bit stream, and the required technical details as specified in the application form;
c) provide contact information describing how a complete description can be obtained on a non-discriminatory basis;
d) agree to institute the intended use of the granted RID within a reasonable time-frame;
e) maintain a permanent record of the application form and of the notification received from the Registration Authority of a granted RID.
O.5 Appeal procedure for denied applications
The Registration Management Group (RMG) is formed to have jurisdiction over appeals relating to denied requests for an RID. The RMG shall consist of members nominated by the P- and L-members of the ISO technical committee responsible for this Specification. It shall have a convenor and secretariat nominated from its members. The Registration Authority is entitled to nominate one non-voting observing member.
The responsibilities of the RMG shall be:
a) to review and act on all appeals within a reasonable time-frame;
b) to inform in writing any organization that appeals for reconsideration of its petition of the RMG's disposition of the matter;
c) to review the Registration Authority's annual summary of activities;
d) to supply member bodies of ISO and National Committees of IEC with information concerning the scope of operation of the Registration Authority.
Annex P
Registration application form
(This annex does not form an integral part of this Recommendation | International Standard)
P.1 Contact information of organization requesting an RID
Organization Name:
Address:
Telephone:
Fax:
email:
Telex:
P.2 Request for a specific RID
NOTE – If the system has already been implemented and is in use, fill in this item and also the item P.3 and then skip to P.6;
otherwise leave this space blank and skip to P.4.
P.3 Short description of the RID in use and the date the system was implemented
P.4 Statement of an intention to apply the assigned RID
P.5 Date of intended implementation of the RID
P.6 Authorized representative
Name:
Title:
Address:
Signature: __________________________________
P.7 For official use of the Registration Authority
Registration rejected: _______
Reason for rejection of the application:
Registration granted: ________ Registration value: _______
Attachment 1 – Attachment of technical details of the registered data format.
Attachment 2 – Attachment of notification of appeal procedure for rejected applications.
Annex Q
T-STD and P-STD buffer models for ISO/IEC 13818-7 ADTS
(This annex does not form an integral part of this Recommendation | International Standard)
Q.1 Introduction
The Transport Stream system target decoder model for audio streams is defined in 2.4.2. In this annex, the buffer model
for ISO/IEC 13818-7 ADTS is described.
ISO/IEC 13818-7 ADTS audio streams can be recognized in an ITU-T Rec. H.222.0 | ISO/IEC 13818-1 multiplex by the presence of stream_id = '110y yyyy' (binary; 'y' = "don't care") and stream_type = 0x0F, as defined in Tables 2-22 and 2-23.
Q.2 Leak rate from Transport Buffer
For audio other than ISO/IEC 13818-7 ADTS, the leak rate from the Transport Buffer is 2 Mbit/s. This rate is, however, lower than the maximum rate of ISO/IEC 13818-7 ADTS. Therefore, the leak rate for ISO/IEC 13818-7 ADTS streams is set to a value different from that for ISO/IEC 11172-3 and ISO/IEC 13818-3 audio streams.
An ISO/IEC 13818-7 ADTS elementary stream consists of one or more channels. The maximum rate of each channel is 576 kbit/s, reached when the sampling frequency is 96 kHz. Therefore, the leak rate for ISO/IEC 13818-7 ADTS is calculated as follows:
$$Rx_n = 1.2 \times R_{max} \times N \ \text{bit/s}$$
where:
$R_{max}$ is the constant 576 kbit/s defined in 3.2.2 of ISO/IEC 13818-7. It is an upper bound on the bit rate per channel of an AAC ADTS stream at the maximum sampling frequency (i.e., Fs = 96 kHz); and
$N$ is the number of audio channels that require their own decoder buffer in this elementary stream (i.e., individual channel streams in a single channel element or channel pair element, and independently switched coupling channel elements).
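The following is a minimal Python sketch of the leak-rate formula. It evaluates only $Rx_n = 1.2 \times R_{max} \times N$, using integer arithmetic (1.2 = 6/5) so that results are exact. It does not apply the tiering of Q.4, which keeps 2 Mbit/s for one or two channels. The comment on R_MAX relies on the AAC limit of 6144 bits per channel per 1024-sample frame.

```python
R_MAX = 576_000  # bit/s per channel: 6144 bits per 1024-sample frame at 96 kHz


def adts_leak_rate(n: int) -> int:
    """Rx_n = 1.2 x R_max x N in bit/s, computed exactly as 6/5 x R_max x N."""
    if n < 1:
        raise ValueError("N must be at least 1")
    return 6 * R_MAX * n // 5
```

For N = 8, 12 and 48 this gives 5 529 600, 8 294 400 and 33 177 600 bit/s, which match the leak-rate table in Q.4.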
Q.3 Buffer size
For audio other than ISO/IEC 13818-7 ADTS, the main buffer size is 3584 bytes. This size is, however, smaller than the maximum decoder input buffer size of ISO/IEC 13818-7 ADTS. Therefore, the main buffer size for ISO/IEC 13818-7 ADTS streams is set to a value different from that for ISO/IEC 11172-3 and ISO/IEC 13818-3 audio streams.
The main buffer size for ISO/IEC 13818-7 ADTS is calculated as follows:
$$BS_n = BS_{mux} + BS_{dec} + BS_{oh}$$
where $BS_{oh}$, the PES packet overhead buffering, is defined as:
$$BS_{oh} = 528 \ \text{bytes}$$
$BS_{mux}$, the additional multiplexing buffering, is defined as:
$$BS_{mux} = 0.004 \ \text{s} \times R_{max} \times N$$
and $BS_{dec}$, the access unit buffering, is defined as:
$$BS_{dec} = 6144 \ \text{bits} \times N$$
$BS_{mux}$ and $BS_{dec}$ are expressed in bits and are converted to bytes before being added to $BS_{oh}$.
where:
$R_{max}$ is the constant 576 kbit/s defined in 3.2.2 of ISO/IEC 13818-7. It is an upper bound on the bit rate per channel of an AAC ADTS stream at the maximum sampling frequency (i.e., Fs = 96 kHz); and
$N$ is the number of audio channels that require their own decoder buffer in this elementary stream (i.e., individual channel streams in a single channel element or channel pair element, and independently switched coupling channel elements).
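The following is a minimal Python sketch of the main buffer size formula. It works in integer arithmetic and converts the two bit-valued terms to bytes before adding $BS_{oh}$. Like the leak-rate sketch, it evaluates the formula only, not the conventional 3 584-byte value used for one or two channels.

```python
R_MAX = 576_000   # bit/s per channel, from 3.2.2 of ISO/IEC 13818-7
BS_OH = 528       # bytes: room for two maximum-size (264-byte) PES packet headers


def adts_main_buffer_size(n: int) -> int:
    """BS_n = BS_mux + BS_dec + BS_oh, in bytes."""
    if n < 1:
        raise ValueError("N must be at least 1")
    bs_mux_bits = 4 * R_MAX * n // 1000   # 0.004 s x R_max x N
    bs_dec_bits = 6144 * n                # decoder input buffer per channel
    return (bs_mux_bits + bs_dec_bits) // 8 + BS_OH
```

Each channel contributes 288 bytes of multiplexing buffering plus 768 bytes of access unit buffering. N = 8 and N = 48 therefore give 8 976 and 51 216 bytes, as in the Q.4 table. For N = 12 the formula gives 13 200 bytes, whereas the Q.4 table lists 12 804 bytes for 9-12 channels.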
Q.3.1 TBSn: same as other audio
In terms of the smoothing buffer, there is no difference in TBn between ISO/IEC 13818-7 ADTS and other audio streams. Consequently, it is not necessary to change TBSn, the size of TBn.
Q.3.2 BSmux: different from other audio
BSmux, the additional multiplexing buffering, shall be changed to accommodate up to 4 ms of delay jitter. This is similar to the approach taken for other streams in ITU-T Rec. H.222.0 | ISO/IEC 13818-1.
Q.3.3 BSdec: different from other audio
BSdec, the access unit buffering, is based on the decoder input buffer size of the elementary stream. As defined in 3.2.2 of ISO/IEC 13818-7, the total decoder input buffer size is 6144 bits multiplied by the number of channels that require their own decoder input buffer.
Q.3.4 BSoh: different from other audio
BSoh corresponds to the PES packet header overhead.
Subclause 2.4.2.6 states:
The delay of any data through the System Target Decoders buffers shall be less than or equal to one second except for still picture video data.
In addition, 2.7.4 states:
The Program Stream and Transport Stream shall be constructed so that the maximum difference between coded presentation time stamp referring to each elementary video or audio stream is 0.7 seconds.
BSoh shall be set to the size needed for the PES packet header overhead when an AAC stream is packetized according to the above rules. The maximum size of a PES packet header is 264 bytes: 6 bytes of packet_start_code_prefix, stream_id and PES_packet_length, 3 bytes of flags and PES_header_data_length, and up to 255 bytes of header data. Therefore, BSoh = 528 bytes, i.e., twice the maximum size of a PES packet header, ensures that at least two PES packet headers can enter the main buffer regardless of their size. As a result, PES packet headers carrying a PTS can be inserted at intervals of less than 0.7 seconds even when one second of data is in the main buffer.
Example: sampling frequency of 48 kHz
The size of a PES packet header without any optional fields except the PTS is 18 bytes. The number of access units per second is about 47. When one second of data is in the main buffer (i.e., the worst case), the PES packet header overhead fits within BSoh if two or more access units are packetized into each PES packet:
number_of_AU = 48 kHz / 1024 = 46.875 per second
(number_of_AU / 2) × 18 bytes = 421.875 bytes < BSoh
More frequent PES packet headers can fit within BSoh if the delay of any data through the main buffer is shorter than one second.
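The example arithmetic can be checked with exact fractions. The following is a minimal Python sketch. It assumes, as the example does, an 18-byte PES header carrying only a PTS and one header per aus_per_pes access units. The function name and the buffered_s parameter are illustrative.

```python
from fractions import Fraction

SAMPLES_PER_AU = 1024   # AAC access unit (frame) length in samples
PES_HEADER_BYTES = 18   # PES header with a PTS only, value used in the example
BS_OH = 528             # bytes reserved for PES packet header overhead


def pes_header_overhead(fs_hz: int, aus_per_pes: int,
                        buffered_s: Fraction = Fraction(1)) -> Fraction:
    """Bytes of PES header in the main buffer when it holds `buffered_s` seconds of data."""
    aus = Fraction(fs_hz, SAMPLES_PER_AU) * buffered_s   # access units in the buffer
    return aus / aus_per_pes * PES_HEADER_BYTES
```

At 48 kHz with two access units per PES packet, this reproduces the 421.875 bytes above. With one access unit per packet, the overhead is 843.75 bytes, which exceeds BSoh unless the buffered duration is shorter, for example half a second.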
Q.4 Conclusion
The decoder buffer model should cover the maximum buffer size; however, AAC can handle up to 48 channels and very high bit rates. Therefore, three levels of the number of channels (2, 8 and 48) are used to define the leak rate and the main buffer size. For 2 channels, the conventional leak rate and main buffer size are kept, to maintain compatibility. In the other cases (8 and 48), the formulas given above are applied.
T-STD leak rate for ISO/IEC 13818-7 ADTS audio
Number of channels    Rxn (bit/s)
1-2                   2 000 000
3-8                   5 529 600
9-12                  8 294 400
13-48                 33 177 600
Channels: The number of full-bandwidth audio output channels plus the number of independently switched coupling
channel elements within the same elementary audio stream. For example, in the typical case that there are no
independently switched coupling channel elements, mono is 1 channel, stereo is 2 channels and 5.1 channel surround is
5 channels (the LFE channel is not counted).
T-STD main buffer size for ISO/IEC 13818-7 ADTS audio
Number of channels    BSn (bytes)
1-2                   3 584
3-8                   8 976
9-12                  12 804
13-48                 51 216
Channels: as defined for the leak rate table above.
For Program Streams, the main buffer size above should be signalled in P-STD_buffer_scale and P-STD_buffer_size as follows:
Number of channels    P-STD_buffer_scale    P-STD_buffer_size
1-2                   0                     28
3-8                   0                     71
9-48                  0                     401
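The following is a minimal Python sketch of using these tables. It maps a channel count, counted as in the notes above, to its T-STD parameters, and encodes a buffer size for the P-STD fields. It relies on P-STD_buffer_scale = 0 meaning that P-STD_buffer_size counts 128-byte units, so sizes are rounded up. The tier tuple is copied from the tables; the function names are illustrative.

```python
from typing import Tuple

# (highest channel count in the tier, Rx_n in bit/s, BS_n in bytes), from the Q.4 tables
ADTS_TIERS = (
    (2, 2_000_000, 3_584),
    (8, 5_529_600, 8_976),
    (12, 8_294_400, 12_804),
    (48, 33_177_600, 51_216),
)


def adts_tstd_params(channels: int) -> Tuple[int, int]:
    """Return (Rx_n, BS_n) for a channel count, using the tiered tables."""
    if channels < 1:
        raise ValueError("at least one channel is required")
    for max_channels, leak_rate, buffer_size in ADTS_TIERS:
        if channels <= max_channels:
            return leak_rate, buffer_size
    raise ValueError("the ADTS buffer model covers at most 48 channels")


def pstd_buffer_fields(buffer_bytes: int) -> Tuple[int, int]:
    """Encode a size as (P-STD_buffer_scale, P-STD_buffer_size) in 128-byte units."""
    return 0, -(-buffer_bytes // 128)   # ceiling division
```

For 5.1 surround (5 channels, LFE not counted), the sketch returns 5 529 600 bit/s and 8 976 bytes, which encodes as P-STD_buffer_size = 71. The P-STD table groups 9-48 channels together and uses 401 (that is, 51 216 bytes) throughout.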
Annex R
Carriage of ISO/IEC 14496 scenes in ITU-T Rec. H.222.0 | ISO/IEC 13818-1
(This annex does not form an integral part of this Recommendation | International Standard)
R.1 Content access procedure for ISO/IEC 14496 program components within a Program Stream
The following provides a reference receiver acquisition procedure for accessing ISO/IEC 14496 program elements in an
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Program Stream. Here, it is assumed that the Program Stream includes a
Program Stream Map (conveyed in a PES packet having stream_id equal to 0xBC):
• Acquire the Program Stream Map.
• Identify the IOD descriptor in the first descriptor loop.
• Identify the ES_IDs of the object descriptor, scene description and other streams described within the initial object descriptor.
• Acquire the SL descriptor and FMC descriptor in the second descriptor loop for elementary_stream_ids 0xFA and 0xFB, as applicable.
• Generate from these descriptors a stream map table between ES_IDs and associated elementary_stream_id plus FlexMux channel, if applicable.
• Locate the object descriptor stream using its ES_ID and the stream map table.
• Locate other streams described in the initial object descriptor using their ES_ID and the stream map table.
• Continuously monitor the object descriptor stream and identify ES_IDs of additional streams.
• Locate the additional streams using their ES_ID and the stream map table.
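The stream map table step above can be sketched as follows. This minimal Python sketch assumes the SL and FMC descriptors have already been parsed out of the Program Stream Map into plain dictionaries. The parsing itself is not shown, and the input shapes and names are hypothetical. The table maps each ES_ID to its elementary_stream_id and, where FlexMux is used, its FlexMux channel.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class StreamLocation(NamedTuple):
    elementary_stream_id: int         # 0xFA or 0xFB
    flexmux_channel: Optional[int]    # None when the stream is not FlexMux-multiplexed


def build_stream_map(
    sl_descriptors: Dict[int, int],
    fmc_descriptors: Dict[int, List[Tuple[int, int]]],
) -> Dict[int, StreamLocation]:
    """Build the ES_ID -> location table from already-parsed PSM descriptors.

    sl_descriptors:  elementary_stream_id -> ES_ID (from an SL descriptor)
    fmc_descriptors: elementary_stream_id -> [(ES_ID, FlexMuxChannel), ...]
    """
    table: Dict[int, StreamLocation] = {}
    for stream_id, es_id in sl_descriptors.items():
        table[es_id] = StreamLocation(stream_id, None)
    for stream_id, pairs in fmc_descriptors.items():
        for es_id, channel in pairs:
            table[es_id] = StreamLocation(stream_id, channel)
    return table


def locate(table: Dict[int, StreamLocation], es_ids: List[int]) -> Dict[int, StreamLocation]:
    """Look up streams named by ES_ID (from the IOD, or later from the OD stream)."""
    return {es_id: table[es_id] for es_id in es_ids if es_id in table}
```

With the FMC descriptor of the Figure R.1 example (elementary_stream_id 0xFB, pairs (0x0011, 0x01) to (0x0014, 0x04)), the initial object descriptor's ES_IDs 0x0011 and 0x0012 resolve first. ES_IDs 0x0013 and 0x0014 resolve once the object descriptor stream announces them.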
Figure R.1 gives an example of ISO/IEC 14496 content in a Program Stream, consisting of an object descriptor stream, a scene description stream (BIFS-Command), a BIFS-Anim stream and an IPMP stream. All ISO/IEC 14496 streams are multiplexed in a single FlexMux stream.
[Figure: an MPEG-2 Program Stream carrying a Program Stream Map (PES packet with stream_id '1011 1100') and PES packets with stream_id '1111 1011' that carry a single ISO/IEC 14496-1 FlexMux stream. The first descriptor loop of the Program Stream Map holds an IOD_descriptor whose initial object descriptor lists ES_ID 0x0011 (scene description stream, BIFS-Command) and ES_ID 0x0012 (OD stream). The elementary stream entry (stream_type = 0x12, elementary_stream_id = '1111 1011') holds an FMC_descriptor mapping (ES_ID, FMC) = (0x0011, 0x01), (0x0012, 0x02), (0x0013, 0x03), (0x0014, 0x04). FlexMux channels 0x01 to 0x04 carry the BIFS-Command, OD, BIFS-Anim and IPMP streams; the object descriptor stream describes ES_ID 0x0013 (BIFS-Anim) and ES_ID 0x0014 (IPMP stream).]
Figure R.1 – Example of ISO/IEC 14496 content in a Program Stream
R.2 Content access procedure for ISO/IEC 14496 program components within a Transport Stream
The following provides a reference receiver acquisition procedure for accessing ISO/IEC 14496 program elements in an
ITU-T Rec. H.222.0 | ISO/IEC 13818-1 Transport Stream:
• Acquire the Program Map Table for the desired program.
• Identify the IOD descriptor in the first descriptor loop.
• Identify the ES_IDs of the object descriptor, scene description and other streams described within the initial object descriptor.
• Acquire the set of all SL descriptors and FMC descriptors present in the second descriptor loop for any of the elementary_PIDs.
• Generate from these descriptors a stream map table between ES_IDs and associated elementary_PID plus FlexMux channel, if applicable.
• Locate the object descriptor stream using its ES_ID and the stream map table.
• Locate other streams described in the initial object descriptor using their ES_ID and the stream map table.
• Continuously monitor the object descriptor stream and identify ES_IDs of additional streams.
• Locate the additional streams using their ES_ID and the stream map table.
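The Transport Stream procedure above can be sketched in the same way, keyed by elementary_PID. This minimal Python sketch assumes the PMT has already been parsed into a list of hypothetical PmtEntry records carrying the SL_descriptor ES_ID and/or the FMC_descriptor pairs. It resolves the initial object descriptor's ES_IDs first, then ES_IDs announced later by the object descriptor stream. In a real receiver, that stream is monitored continuously rather than passed in as a list.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class PmtEntry(NamedTuple):
    """Hypothetical pre-parsed entry of the PMT second descriptor loop."""
    stream_type: int
    elementary_pid: int
    sl_es_id: Optional[int] = None                 # ES_ID from an SL_descriptor
    fmc_pairs: Tuple[Tuple[int, int], ...] = ()    # (ES_ID, FlexMuxChannel) pairs


Location = Tuple[int, Optional[int]]  # (elementary_PID, FlexMux channel or None)


def build_stream_map(entries: Iterable[PmtEntry]) -> Dict[int, Location]:
    """Map each ES_ID to its elementary_PID and, if applicable, FlexMux channel."""
    table: Dict[int, Location] = {}
    for entry in entries:
        if entry.sl_es_id is not None:
            table[entry.sl_es_id] = (entry.elementary_pid, None)
        for es_id, channel in entry.fmc_pairs:
            table[es_id] = (entry.elementary_pid, channel)
    return table


def acquire(table: Dict[int, Location], iod_es_ids: List[int],
            od_updates: Iterable[List[int]]) -> Dict[int, Location]:
    """Locate IOD streams first, then streams announced later in the OD stream."""
    found: Dict[int, Location] = {}
    for es_id in iod_es_ids:
        if es_id in table:
            found[es_id] = table[es_id]
    for announced in od_updates:
        for es_id in announced:
            if es_id in table:
                found[es_id] = table[es_id]
    return found
```

With the Figure R.2 example values, every ES_ID resolves to its own PID with no FlexMux channel: 0x0011 to 0x0111, 0x0012 to 0x0112, 0x0013 to 0x0113, and 0x0014 to 0x0114.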
Figure R.2 gives an example of ISO/IEC 14496 program elements in a Transport Stream, consisting of an object descriptor stream, a scene description stream (BIFS-Command), and BIFS-Anim and IPMP elementary streams. The BIFS-Command and OD streams are conveyed in ISO_IEC_14496_sections. The BIFS-Anim and IPMP elementary streams are conveyed in PES packets referenced by two distinct elementary_PID values, without the use of the ISO/IEC 14496-1 FlexMux tool.
[Figure: an MPEG-2 Transport Stream in which the Program Association Section (TS packets with PID 0x0000) maps program_number 0x0001 to program_map_PID 0x0100. The TS Program Map Section (PID 0x0100) holds an IOD_descriptor in its first descriptor loop, whose initial object descriptor lists ES_ID 0x0011 (scene description stream, BIFS-Command) and ES_ID 0x0012 (OD stream). Its second descriptor loops map ES_IDs to PIDs through SL_descriptors:
• stream_type 0x13, PID 0x0111, ES_ID 0x0011: BIFS-Command stream in ISO_IEC_14496_sections;
• stream_type 0x13, PID 0x0112, ES_ID 0x0012: OD stream in ISO_IEC_14496_sections;
• stream_type 0x12, PID 0x0113, ES_ID 0x0013: BIFS-Anim stream in PES packets with stream_id '1111 1010';
• stream_type 0x12, PID 0x0114, ES_ID 0x0014: IPMP stream.
The object descriptor stream describes ES_ID 0x0013 (BIFS-Anim) and ES_ID 0x0014 (IPMP stream).]
Figure R.2 – Example of ISO/IEC 14496 content in a Transport Stream
SERIES OF ITU-T RECOMMENDATIONS
Series A  Organization of the work of ITU-T
Series D  General tariff principles
Series E  Overall network operation, telephone service, service operation and human factors
Series F  Non-telephone telecommunication services
Series G  Transmission systems and media, digital systems and networks
Series H  Audiovisual and multimedia systems
Series I  Integrated services digital network
Series J  Cable networks and transmission of television, sound programme and other multimedia signals
Series K  Protection against interference
Series L  Construction, installation and protection of cables and other elements of outside plant
Series M  Telecommunication management, including TMN and network maintenance
Series N  Maintenance: international sound programme and television transmission circuits
Series O  Specifications of measuring equipment
Series P  Telephone transmission quality, telephone installations, local line networks
Series Q  Switching and signalling
Series R  Telegraph transmission
Series S  Telegraph services terminal equipment
Series T  Terminals for telematic services
Series U  Telegraph switching
Series V  Data communication over the telephone network
Series X  Data networks, open system communications and security
Series Y  Global information infrastructure, Internet protocol aspects and next-generation networks
Series Z  Languages and general software aspects for telecommunication systems
Printed in Switzerland
Geneva, 2007