The Digital Fact Book
Edition 11 including a supplement on
A reference manual for the television broadcast & post production industries
Editor: Bob Pank
Alan Hoggarth (Omneon Video Networks)
David Monk (Texas Instruments)
Peter Powell (Peter Powell Associates)
Steve Shaw (Digital Praxis)
Mark Horton, Steve Owen, Dave Throup, Roger Thornton (Quantel)
Quantel R&D for their co-operation and many contributions and to the many contributors and commentators from the industry around the world
Copyright Quantel Limited 1988, 1989, 1990, 1991, 1992, 1994, 1996, 1998, 2000, 2002
Extract from a letter from Michael Faraday to J. Clerk Maxwell
Royal Institution, 13 November 1857
There is one thing I would be glad to ask you. When a mathematician engaged in investigating physical actions and results has arrived at his own conclusions, may they not be expressed in common language as fully, clearly, and definitely as in mathematical formulae? If so, would it not be a great boon to such as we to express them so – translating them out of their hieroglyphics that we also might work upon them by experiment? I think it must be so, because I have found that you could convey to me a perfectly clear idea of your conclusions, which, though they may give me no full understanding of the steps of your process, gave me the results neither above nor below the truth, and so clear in character that I can think and work from them.
If this is possible, would it not be a good thing if mathematicians, writing on these subjects, were to give us their results in this popular useful working state as well as in that which is their own and proper to them?
Extract from “On Giants’ Shoulders” by Melvyn Bragg,
Hodder & Stoughton 1998
Quantel and digital technology
In 1973, the company now known as Quantel developed the first practical analogue to digital converter for television applications. That innovation not only gave Quantel its name (QUANtised TELevision), it also started a process that has fundamentally changed the look of television, the way it is produced and delivered. Quantel’s contribution to the changes leading to the digital age is best traced through its products, which have extended to film and print as well as television.
Quantel demonstrates the world's first digital framestore synchroniser. TV coverage of the 1976 Montreal Olympic games is transformed: synchronised shots from an airship 'blimp' are freely used, and quarter-sized picture inserts mark the genesis of digital effects.
The first portable digital standards converter heralds high-quality, easily available conversion.
1978 The DPE 5000 series, the first commercially successful digital effects machine, popularises digital video effects.
A new series introduces digital still storage for on-air presentation.
A framestore synchroniser is now just one rack unit in height.
The original Paintbox ® creates the market for video graphics – and, thanks to continuous development, continues as the industry standard.
A new effects machine introduces the TV page turn and is the first able to manipulate 3D images in 3D space.
1986 Harry ® makes multilayering of live video a practical proposition and introduces nonlinear operation and random access to video clips.
The second-generation Paintbox is faster, smaller and more powerful than its 'classic' predecessor.
A stills store integrates the storage, presentation and management of stills.
The dynamic graphics workstation manipulates still graphics over live video.
The Effects Editor offers simultaneous layering of multiple live video sources.
The Video Design Suite is the first custom-built, dedicated graphics production and compositing centre.
1993 Dylan ® disk array technology provides fault-tolerant extended storage for non-compressed video with true random access to material; its first applications introduce online nonlinear editing.
Digital opticals for movies bring the flexibility of digital compositing and effects technology to the film industry; their accuracy and speed open new horizons for movie making.
A video server offers large-scale shared video storage for many users, each with true random access for input, editing and playout for news, sport, transmission and post production.
The integrated news and sports production system offers the sought-after, fast-track, end-to-end solution from lines-in to on-air: stories edited at journalists' desktops are immediately ready for transmission.
1999 Moving Picturebox ® extends Picturebox stills facilities to both video clips and stills. This agile, server-based integrated system handles conventionally complex sequences for on-air presentation.
An entirely new platform for media editing and compositing serves as the centre-piece of modern post. It introduces Resolution Co-existence ® – working with all TV formats together, SD and HD, as well as with digital film. Its unique ultra-fast hardware and software architecture allows scalability and access for third-party developers.
A radical, all-encompassing new concept offers total scalability in both hardware and software across post production, graphics and broadcast for multiple-resolution, team-working production environments. The post range includes a media platform, a mainstream SD/HD editor, a PC-based SD/HD editor and software for the PC. Graphics includes a high-powered, multi-resolution platform, a system for powerful SD graphics production and software for the PC. For broadcast, a totally scalable server combines broadcast and browse video within the same server, while a range of consistent, progressive software, encompassing craft editing to journalist cut-and-view, runs on any level of hardware from a laptop to a high-performance workstation.
For the latest developments please contact Quantel or visit our website www.quantel.com
Introduction to the eleventh edition
Convergence continues. SD, HD, film, Internet and DVD are merging and becoming digital media. Increasingly, production takes this into account so that one digital master can support them all and the 2k film file is now much more widely used. For example,
HBO’s Band of Brothers was the highest-budget TV series ever. It was shot on 35 mm film and scanned into digits at 2k resolution, and all post production was, of course, digital. The deliverables, including HD as well as SD, were all copied from the 2k digital master. But 2k is not just for big budget productions; it is becoming the de facto standard for digital film.
Yet there is still much discussion about the qualities of film and what digital standards are needed to carry its qualities through to the screen. This goes right back to the nature of film itself. So, since not everyone in television is closely familiar with the medium, this edition has a large new section on Digital Film. In true Digital Fact Book style it contains a good deal of background information including tutorials and opinion articles besides explanations of some of the more common terminology.
Meanwhile European HD production is growing as the cost equation has swung heavily toward HD which is starting to replace Super 16 shoots. This means post production needs a mix of HD and SD capability. The next step is handling all formats together – something that Quantel’s resolution co-existence already deals with effortlessly. With
Moore’s Law still in operation, the added cost of storage and processing for HD NLE post is becoming less of an issue. The lack of HD broadcasting in Europe soon won’t matter; we’ll be able to watch on Blu-ray disks!
The technical landscape in broadcasting and post production has continued to evolve.
At the sharp end, new news systems now create a modern office environment with a couple of technical racks at the end of the room. Behind this is a merging of computer and video technology networked and working smoothly together – with hardly a tape in sight. One landmark is Sony’s option of an Ethernet card for its IMX VTRs. Dubbed by some as the eVTR it is a clear indication of the importance of networking in production and post production.
Another very significant step is the rollout of AAF and its use in equipment from Avid,
Quantel and a growing army of others. If networking is about communication then AAF helps everyone to understand each other – not only for video and audio, but also for the metadata. Applications are wide and include uses as a ‘super EDL’ and for archiving.
Digital television, or rather digital media, is moving forward on a broad front. One thing’s certain, it will carry on evolving and changing. Hopefully the Digital Fact Book will continue to help in understanding the changes and their value.
Abbreviations used in this book:
bit – b
bits per second – b/s
byte – B
bytes per second – B/s
frames per second – fps
gigabit – Gb
gigabits per second – Gb/s
gigabyte – GB
gigabytes per second – GB/s
gigahertz – GHz
hertz – Hz
hours – h
kilobit – kb
kilobits per second – kb/s
kilobyte – kB
kilohertz – kHz
megabit – Mb
megabyte – MB
megahertz – MHz
micro – µ
milli – m
seconds – s
TV standards descriptions
TV standards are written in many different ways. The method used in the book is shown below and follows how they appear in ITU-R BT.709: i.e. –
Number of pixels per line x number of lines per frame/vertical refresh rate (in Hz) progressive or interlaced (P or I)
1) The vertical refresh rate for interlaced scans is twice the whole frame rate (two interlaced fields make up one whole frame). This assumes that interlace is only ever 2:1
(theoretically it can be greater but never is for broadcast purposes).
2) Digital standards are quoted for the active lines and pixels only.
Digital SDTV in Europe: 720 x 576/50I (analogue equivalent is 625/50I)
An HD standard in the USA: 1920 x 1080/30P
Analogue standards are described by lines per frame/vertical refresh rate, progressive or interlaced, e.g. 625/50I.
Digital technology is sweeping our industry and affects many parts of our lives. Yet we live in an analogue world. Light and sound naturally exist in analogue forms and our senses of sight and hearing are matched to that. The first machines to capture, record and manipulate pictures and sound were analogue but today it is far easier to do the jobs in the digital domain. Not only does this allow the use of the highly advanced digital components available from the computer industry but it also leads to many new capabilities that were simply impossible with analogue.
The techniques used to move between the analogue and digital worlds of television pictures are outlined here. Some of the pitfalls are shown as well as describing why the digital coding standards for standard definition and high definition television (ITU-R
BT.601 and ITU-R BT.709) are the way they are.
The digital machines used in television are generally highly complex and many represent the state-of-the-art of digital technology. The initial reason for the popularity of digital techniques was that the scale of the computer industry ensured that the necessary electronic components were both relatively easily available and continued to develop. But the preference for digits is also because of their fidelity and the power they give to handle and manipulate images. Rather than having to accurately handle every aspect of analogue signals, all digital circuits have to do is differentiate between, or generate, two electrical states – on and off, high and low, 1 and 0. This is relatively easy and so leads to superb fidelity in multi-generation recordings, no losses in passing the signal from place to place, plus the potential of processing to produce effects and many other techniques far beyond those available in analogue.
Thirty years ago the technology simply did not exist to convert television pictures into digits. Even if it could have been done there were no systems able to process the resulting data stream at anything like real-time. Today digital machines have successfully reached every aspect of television production – from scene to screen.
At the same time costs have tumbled so that all budgets can afford to take advantage of digital technology – from broadcast professionals to consumers.
From analogue to digital
Today, analogue to digital conversion may occur within the camera itself but the general principles of conversion remain the same. The process occurs in three parts: signal preparation, sampling and digitisation. Initially, digitisation involved working with television’s composite signals (PAL and NTSC) but this is now increasingly rare.
In professional circles it is the components (meaning separate signals that together make-up the full colour signal) which are digitised according to the ITU-R BT.601 digital sampling specification. ‘601’ describes sampling at standard definition and is widely
used in studio and post production operations so the digitisation of component signals is described here. Sampling for high definition, according to ITU-R BT.709, broadly follows the same principles, but works faster. Both standards define systems for 8-bit and 10-bit sampling accuracy – providing 2^8 (= 256) and 2^10 (= 1024) discrete levels with which to describe the analogue signals.
There are two types of component signals: Red, Green and Blue (RGB), and Y, R-Y, B-Y; the latter is by far the most widely used in digital television and is included in the ITU-R BT.601 and 709 specifications. The R-Y and B-Y colour difference signals carry the colour information while Y represents the luminance. Cameras, telecines, etc. generally produce RGB signals. These are easily converted to Y, R-Y, B-Y using a resistive matrix – established analogue technology.
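That matrix arithmetic is simple enough to sketch. The luma weighting below (Y = 0.299R + 0.587G + 0.114B) is the standard ITU-R BT.601 set; the function name is illustrative, not from any particular product:

```python
def rgb_to_ydiff(r, g, b):
    """Convert R, G, B values (0.0-1.0) to Y, R-Y, B-Y using the
    ITU-R BT.601 luma weighting Y = 0.299R + 0.587G + 0.114B."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, r - y, b - y

# White carries full luminance and zero colour difference:
print(rgb_to_ydiff(1.0, 1.0, 1.0))  # approximately (1.0, 0.0, 0.0)
```

Note how any pure grey input yields zero colour difference, which is why the Y signal alone remains compatible with monochrome displays.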
The analogue to digital converter (ADC) only operates correctly if the signals applied to it are correctly conditioned. There are two major elements to this. The first involves an amplifier to ensure the correct voltage and amplitude ranges for the signal are given to the ADC. For example, luminance amplitude between black and white must be set so that it does not exceed the range that the ADC will accept. The ADC has only a finite set of numbers (an 8-bit ADC can output 256 unique numbers – but no more, a 10-bit ADC has 1024 – but no more) with which to describe the signal. The importance of this is such that the ITU-R BT.601 standard specifies this set-up quite precisely, saying that, for 8-bit sampling, black should correspond to level 16 and white to level 235, and at 10-bit sampling 64 and 940 respectively. This leaves headroom for errors, noise and spikes to avoid overflow or underflow on the ADC. Similarly for the colour difference signals, zero signal corresponds to level 128 (512 for 10-bit) and full amplitude covers only 225 (900) levels.
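That level mapping can be sketched directly, assuming a normalised analogue input (0.0 for black, 1.0 for white; colour difference centred on zero); the function names are illustrative:

```python
def quantise_luma(v, bits=8):
    """Map analogue luminance v (0.0 = black, 1.0 = white) to a
    BT.601 code value: 16-235 at 8 bits, 64-940 at 10 bits."""
    black, white = (16, 235) if bits == 8 else (64, 940)
    return round(black + v * (white - black))

def quantise_chroma(v, bits=8):
    """Map a colour difference value v (-0.5 to +0.5, zero = no colour)
    to a code centred on 128 (8-bit) or 512 (10-bit), with full
    amplitude spanning 225 (or 900) levels."""
    centre, span = (128, 225) if bits == 8 else (512, 900)
    return round(centre + v * span)

print(quantise_luma(0.0), quantise_luma(1.0))  # 16 235
print(quantise_luma(1.0, bits=10))             # 940
print(quantise_chroma(0.0, bits=10))           # 512
```

Codes outside these ranges remain available as headroom for noise and spikes, exactly as the standard intends.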
[Diagram: analogue component inputs pass through amplifiers and low-pass filters to the ADCs, producing 8 or 10-bit Y, Cr and Cb]
For the second major element the signals must be low-pass filtered to prevent the passage of information beyond the luminance band limit of 5.75 MHz and the colour difference band limit of 2.75 MHz, from reaching their respective ADCs. If they did, aliasing artefacts would result and be visible in the picture (more later). For this reason low pass (anti-aliasing) filters sharply cut off any frequencies beyond the band limit.
Sampling and digitisation
The low-pass filtered signals of the correct amplitudes are then passed to the ADCs where they are sampled and digitised. Normally two ADCs are used, one for the luminance Y, and the other for both colour difference signals, R-Y and B-Y. Within the active picture the ADCs take a sample of the analogue signals (to create pixels) each time they receive a clock pulse (generated from the sync signal). For Y the clock frequency is 13.5 MHz and for each colour difference channel half that – 6.75 MHz – making a total sampling rate of 27 MHz. It is vital that the pattern of sampling is rigidly adhered to, otherwise onward systems and eventual conversion back to analogue will not know where each sample fits into the picture – hence the need for standards! Cosited sampling is used, alternately making samples of Y, R-Y, and B-Y on one clock pulse and then on the next, Y only (ie there are half the colour samples compared with the luminance). This sampling format used in 601 is generally referred to as 4:2:2 and is designed to minimise chrominance/luminance delay – any timing off-set between the colour and luminance information. Other sampling formats are used in other applications – for example 4:2:0 for MPEG-2 compression.
The amplitude of each sample is held and precisely measured in the ADC. Its value is then expressed and output as a binary number and the analogue to digital conversion is complete. Note that the digitised forms of R-Y and B-Y are referred as Cr and Cb.
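The co-sited 4:2:2 pattern described above can be sketched as a simple clock loop – purely illustrative, with Y on every 13.5 MHz tick and Cr, Cb on alternate ticks:

```python
def sample_pattern(n_clocks):
    """Return the 4:2:2 co-sited sampling pattern for n_clocks ticks
    of the 13.5 MHz clock: Y on every tick, Cr and Cb on alternate
    ticks, co-sited with the corresponding Y sample."""
    pattern = []
    for tick in range(n_clocks):
        if tick % 2 == 0:
            pattern.append("Y Cr Cb")  # co-sited luminance + colour
        else:
            pattern.append("Y")        # luminance only
    return pattern

print(sample_pattern(4))  # ['Y Cr Cb', 'Y', 'Y Cr Cb', 'Y']

# Total sample rate: 13.5 MHz (Y) + 2 x 6.75 MHz (Cr, Cb) = 27 MHz
print(13.5 + 2 * 6.75)  # 27.0
```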
Sampling (clock) frequency
The (clock) frequency at which the picture signal is sampled is crucial to the accuracy of analogue to digital conversion. The object is to be able, at some later stage, to faithfully reconstruct the original analogue signal from the digits. Clearly using too high a frequency is wasteful whereas too low a frequency will result in aliasing – so generating artefacts. Nyquist stated that for a conversion process to be able to recreate the original analogue signal, the conversion (clock) frequency must be at least twice the highest input frequency being sampled (see diagram below) – in this case, for luminance, 2 x 5.5 MHz =11.0 MHz. 13.5 MHz is chosen for luminance to take account of both the filter characteristics and the differences between the 625/50 and 525/60 television standards. It is a multiple of both their line frequencies, 15,625 Hz and
15,734.265 Hz respectively, and therefore compatible with both (see 13.5 MHz). Since each of the colour difference channels will contain less information than the Y channel
(an effective economy since our eyes can resolve luminance better than chrominance) their sampling frequency is set at 6.75 MHz – half that of the Y channel.
[Diagram: signal to be digitised. Correct: the sampling (clock) frequency is high enough to resolve the signal]
[Diagram: signal to be digitised, and the signal as ‘seen’ by the sampling system. Wrong: the signal frequency is too high for the sampling (clock) frequency, resulting in the wrong signal being seen by the sampling system]
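The folding effect can be checked numerically. This small sketch (illustrative, not from any standard) gives the frequency a sampling system would ‘see’:

```python
def apparent_frequency(f_signal, f_sample):
    """Return the frequency 'seen' after sampling: anything above the
    Nyquist limit (f_sample / 2) folds back as a lower alias."""
    f = f_signal % f_sample
    return min(f, f_sample - f)

fs = 13.5e6  # BT.601 luminance sampling; Nyquist limit is 6.75 MHz
print(apparent_frequency(5.5e6, fs) / 1e6)   # 5.5 - resolved correctly
print(apparent_frequency(10.0e6, fs) / 1e6)  # 3.5 - aliased!
```

A 10 MHz component, were it not filtered out, would appear as a spurious 3.5 MHz pattern in the picture – exactly what the anti-aliasing filters prevent.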
From digital to analogue
In the reverse process the digital information is fed to three digital to analogue converters (DACs), one each for Y, Cr and Cb (digitised R-Y and B-Y), which are clocked in the same way and with the same frequencies as was the case with the ADCs.
The output is a stream of analogue voltage samples creating a ‘staircase’ or ‘flat top’ representation similar to the original analogue signal (see figure below). The use of a sampling system imposes some frequency-dependent loss of amplitude which follows a sin x/x curve, falling as frequency rises; the usable band extends only to half the sampling frequency, known as the Nyquist frequency. For example, sampling at 13.5 MHz can resolve frequencies only up to 6.75 MHz. Although the ITU-R BT.601 set-up operates well below the zero point of the curve, the curved response is still there. This curve is corrected in the sin x/x low-pass filters which, by removing the unwanted high frequencies, smooth the output signal so it now looks the same as the original Y, R-Y, B-Y analogue inputs. For those needing RGB, this can be simply produced by a resistive matrix.
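The size of that correction is easy to estimate. Assuming the flat-top (sample-and-hold) output of the DAC, the amplitude response follows sin(x)/x with x = πf/fs:

```python
import math

def sinc_response(f, f_sample):
    """Amplitude response of a flat-top (sample-and-hold) output:
    sin(x)/x with x = pi * f / f_sample."""
    if f == 0:
        return 1.0
    x = math.pi * f / f_sample
    return math.sin(x) / x

fs = 13.5e6
print(sinc_response(5.75e6, fs))  # about 0.73 at the luminance band edge
print(sinc_response(fs / 2, fs))  # 2/pi, about 0.64, at the Nyquist frequency
```

So without correction the band edge would be nearly 3 dB down – hence the sin x/x compensation built into the reconstruction filters.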
‘Flat-top’ output from DAC
Filtered resulting signal
Today the whole analogue to digital and digital to analogue process is usually reliable and accurate. However there are inherent inaccuracies in the process. The accuracy of the clock timing is important and it should not vary in time or jitter. Also the accuracy of the ADCs in measuring the samples, though within the specification of the chip, may not be exact. This is a highly specialised task as each sample must be measured and output in just 74 nanoseconds, or 13.5 nanoseconds for HD. Equally the DACs may only be expected to be accurate to within their specification, and so they too will impose some degree of non-linearity into the signal. Even with perfect components and operation the process of sampling and reconstituting a signal is not absolutely accurate.
The output is never precisely the same as the original signal. For this reason, plus cost considerations, systems are designed so that repeated digitisation processes are, as far as possible, avoided. Today it is increasingly common for pictures to be digitised at or soon after the camera and not put back to analogue, except for monitoring, until the station output, or, with DTV, until arriving at viewers’ TV sets or set-top boxes.
A set of digital sampling frequencies in ‘link B’ of a dual (SDI or HD-SDI) link that carries half-frequency R and B signals only.
The nominal 30 frames/60 fields per second of NTSC colour television is usually multiplied by 1000/1001 (= 0.999) to produce slightly reduced rates of 29.97 and 59.94
Hz. This offset gives rise to niceties such as drop-frame timecode (dropping one frame per thousand – 33.3 seconds) and audio that also has to run at the right rate. Although having analogue origins, it has also been extended into the digital and HD world where
24 Hz becomes 23.97 and 30 frames/60 fields per second are again changed to 29.97
and 59.94 Hz. Of course, as the frame/field frequency changes, so do the line and colour subcarrier frequencies, as all are locked together. Note that this does not apply to PAL colour systems, as these always use the nominal values (25 Hz frame rate).
The reason for the 1000/1001 offset is based in monochrome legacy. Back in 1953, the NTSC colour subcarrier was specified to be half an odd multiple (455) of line frequency to minimise the visibility of the subcarrier on the picture. Then, to minimise the beats between this and the sound carrier, the latter was to be half an even multiple of line frequency and, to ensure compatibility with the millions of existing monochrome TV sets, the sound carrier was kept unchanged – at 4.5 MHz – close to 286 times the line frequency (Fl). Then, in a real tail-wags-dog episode, it was decided to make this exactly 286 times… by suitably altering the line frequency of the colour system (and hence that of the colour subcarrier and frame rate).
Here’s the maths.
Fl = frames per second x number of lines per frame
Nominally this is 30 x 525 = 15,750 Hz
But it was decided that: 286 x Fl = 4.5 MHz
So Fl = 4,500,000/286 = 15,734.265 Hz
This reduced Fl by 15,734.265/15,750 = 1000/1001, or 0.999
As all frequencies in the colour system have to be in proportion to each other, this has made:
NTSC subcarrier (Fl x 455/2) = 3.579 MHz
30 Hz frame rate (Fl/number of lines per frame) = 29.97 Hz
Following on, all digital sampling locked to video is affected so, for example, nominal 48 and 44.1 kHz embedded audio sampling becomes 47.952 and 44.056 kHz respectively.
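The whole chain of numbers above can be re-derived with exact rational arithmetic; this is just the maths of the entry, not any particular implementation:

```python
from fractions import Fraction

fl_mono = Fraction(30 * 525)          # nominal line rate: 15,750 Hz
fl_colour = Fraction(4_500_000, 286)  # sound carrier fixed at 286 x Fl

print(float(fl_colour))                     # 15734.265... Hz
print(fl_colour / fl_mono)                  # 1000/1001
print(float(fl_colour * Fraction(455, 2)))  # subcarrier: 3579545.45... Hz
print(float(fl_colour / 525))               # frame rate: 29.97002... Hz
print(float(Fraction(48_000) * Fraction(1000, 1001)))  # audio: 47952.04... Hz
```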
This is the sampling frequency of luminance in SD digital television. It is represented by the 4 in 4:2:2. The 4 is pure nostalgia as 13.5 MHz is in the region of 14.3 MHz, the sampling rate of 4 x NTSC colour subcarrier (3.58 MHz), used at the very genesis of digital television equipment.
Reasons for the choice of 13.5 MHz belong to politics, physics and legacy. Politically it had to be global and work for both 525/60 (NTSC) and 625/50 (PAL) systems. The physics is the easy part; it had to be significantly above the Nyquist frequency so that the highest luminance frequency, 5.5 MHz for 625-line PAL systems, could be faithfully reproduced from the sampled digits – i.e. sampling in excess of 11 MHz – but not so high as to produce unnecessary, wasteful amounts of data. Some maths is required to understand the legacy.
The sampling frequency had to produce a static pattern on both 525 and 625 standards, otherwise it would be very complicated to handle and, possibly, restrictive in use. In other words, the frequency must be a whole multiple of both line lengths.
The line frequency of the 625/50 system is simply 625 x 25 = 15,625 Hz
(NB 50 fields/s makes 25 frames/s)
So line length is 1/15,625 = 0.000064 or 64µs
The line frequency of the 525/60 NTSC system is complicated by the need to offset it by a factor of 1000/1001 to avoid interference when transmitted. The line frequency is 525 x 30 x 1000/1001 = 15,734.265 Hz. This makes line length 1/15,734.265 = 63.5555 µs.
The difference between the two line lengths is 64 – 63.5555 = 0.4444 µs.
This time divides into 64 µs exactly 144 times, and into 63.5555 µs exactly 143 times.
This means the lowest common frequency that would create a static pattern on both standards is 1/0.4444 MHz, or 2.25 MHz.
Now, back to the physics. The sampling frequency has to be well above 11 MHz, so 11.25 MHz (5 x 2.25) is not enough; 6 x 2.25 gives the sampling frequency that has been adopted – 13.5 MHz.
Similar arguments have been applied to the derivation of sampling for HD. Here 74.25
MHz (33 x 2.25) is used.
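The arithmetic is easy to verify: a whole number of samples must fit into the line period of both standards. Exact fractions make the check unambiguous (illustrative code, not from any standard):

```python
from fractions import Fraction

line_625 = Fraction(15_625)                 # 625/50 line rate, Hz
line_525 = Fraction(525 * 30 * 1000, 1001)  # 525/60 line rate, 15,734.265... Hz

for fs in (11_250_000, 13_500_000):         # 5 x 2.25 MHz and 6 x 2.25 MHz
    print(fs, Fraction(fs) / line_625, Fraction(fs) / line_525)
# 11250000 720 715  <- whole numbers, but too close to the 11 MHz Nyquist need
# 13500000 864 858  <- the adopted rate: 864 and 858 samples per total line
```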
Picture aspect ratio used for HDTV and some SDTV (usually digital).
Refers to 24 frames-per-second, progressive scan. This has been the frame rate of motion picture film since talkies arrived. It is also one of the rates allowed for transmission in the DVB and ATSC television standards – so they can handle film without needing any frame-rate change (3:2 pull-down for 60 fields-per-second systems or running film at 25fps for 50 Hz systems). It is now accepted as a part of television production formats – usually associated with high definition 1080 lines, progressive scan. A major attraction is a relatively easy path from this to all major television formats as well as offering direct electronic support for motion picture film and D-cinema.
24PsF (segmented frame)
A 24P system in which each frame is segmented – recorded as odd lines followed by even lines. Unlike normal television, the odd and even lines are from the same snapshot in time – exactly as film is shown today on 625/50 TV systems. This way the signal is more compatible (than normal progressive) for use with video systems, e.g. VTRs, SDTI or HD-SDI connections, mixers/switchers etc., which may also handle interlaced scans. It can also easily be viewed without the need to process the pictures to reduce flicker.
Refers to 25 frames-per-second, progressive scan. Despite the international appeal of 24P, 25P is widely used for HD productions in Europe and other countries using 50 Hz TV systems. This is a direct follow-on from the practice of shooting film for television at 25 fps.
1080/24P – The Global Production Format?
The use of 24 frames-per-second is certainly not new. Motion pictures have used it since introducing the ‘talkies’ and the TV industry has long since evolved methods for transferring it into its 50 and 60 field-per-second rates. What is new is the adoption of this frame rate, progressively scanned, for television itself. Interest is strongest in the
USA but there are reasons why those in other countries should also take note.
Speak to three different people and you may well be given three different answers as to why the idea first came about: as a good rate for animations, a match for motion pictures or something that can be efficiently translated to other frame rates. As the world adopts digital television a plethora of video formats is being proposed and adopted for standard definition (SD) as well as high definition (HD). In the USA, the ATSC digital television standard includes no less than 18 formats with a variety of image sizes using 24, 30 and
60 fps. This can leave programme-makers wondering what format to use for their programme – especially if it is to be widely distributed and to appear in good quality.
Using 1080 lines, 24 fps progressive scan is seen as a good answer not just for the USA, but as a global exchange format – hence the importance for international contributors.
The great interest in 1080/24P for production does not extend into transmission – even though the standard ATSC receiver in the USA should be able to display it. Everyone intends to translate the format into their chosen analogue or digital transmission standard. The efficiency of the translation has great appeal as it can avoid the complexity of frame rate conversion. At the same time it has the highest spatial resolution; 1920 x 1080 – 2 million pixels is the highest resolution in HDTV, and progressive scanning offers higher vertical resolution than interlaced. Any re-sizing to other formats will be down, rather than up, giving the best possible results for all formats.
[Diagram: a 1080/24P master is converted to other formats by re-sizing, a +4.17% speed-up for 50 Hz, and 3:2 pull-down for 60 Hz]
The popular HD transmission format 1080/60I (60 fields interlaced) can be produced by applying 3:2 pull-down – simply mapping the 24 original frames onto the 60 fields used for transmission – a loss-free operation well established for showing movies on TV.
Care must be taken if the resulting material is to be cut, natural edit points being five TV frames apart. Also, for graphics or further post work, clean whole frames need to be accessed, not those with two fields from different film frames. For this reason the 3:2 pull-down result is not best suited to post production. Where required, unpicking the
3:2 process and re-sequencing the result on output is a useful solution.
Other picture sizes at 720 and 480 lines are created by re-sizing the 1080-line originals.
The quality of the result depends on the algorithm used. The 1080-line pictures are large
– six times the pixel count of 480-line SD – so considerable processing power is needed for high quality, real-time results. In practice, purpose-built hardware is used.
Producing the 50 fields-per-second rate used in ‘PAL’ countries, once again replicates the method already used for motion picture film. That is to run the film at 25 fps.
Although the result is about 4 percent faster, few usually notice – though some did complain that their PAL VHS of Titanic was eight minutes short!
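That eight minutes is simple arithmetic, assuming a roughly 195-minute cinema runtime for Titanic:

```python
def pal_runtime(film_minutes):
    """Running 24 fps film at 25 fps shortens it by a factor of 24/25
    (about 4.17% faster)."""
    return film_minutes * 24 / 25

film = 195  # approximate cinema runtime of Titanic, in minutes
print(film - pal_runtime(film))  # about 7.8 minutes shorter
```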
Global or Universal?
The 1080/24P format with 16:9 picture aspect ratio has been referred to as a Global
Image Format. Certainly the ‘Global’ accolade is due as it is possible to produce high quality results from it for the world’s television standards. This means world-wide distribution can be served from a single edited master – a better situation than the two,
625/50 PAL and 525/60 NTSC, often used now.
But clearly not everyone needs to produce for HD and the use of 525 and 625-line formats will continue for very many years yet. No one yet predicts the demise of SD and it does have great potential to support HD in many ways. The choice of production standard will continue to be open with the decision depending on parameters such as content, budget and distribution requirements.
Is 1080/24P always the best solution? Starting with the highest spatial resolution makes sound technical sense but the temporal resolution is questionable as consideration must be given to content. 24 fps is a low frame rate and, as such, is quite limited in its ability to portray motion. Flicker is reduced by the display system showing each frame twice or even three times – as with film in the cinema. 24 fps should be sufficient for most programming – including dramas, documentaries and natural history.
But if there is a great deal of action, such as in sports, the motion portrayal will be found lacking. Origination at 50 or 60 fields or frames per second would be more appropriate.
Being able to use whichever standard best fits the purpose would be ideal and should be accommodated by the flexibility offered by digital technology.
In the ever-changing world of television, it is important to think wider than local use of programmes, even wider than television itself. Working between SD and HD and between 50 and 60 frames per second is more important than ever as video formats proliferate. 24 fps progressive is very important middle ground from which all current formats can easily and effectively be reached. It neatly side-steps many of the problems associated with frame-rate conversions – especially any need to make them twice! It should be the basis of a global standard.
Beyond TV there is intense and growing interest in Digital Cinema, and SMPTE established a group, DC28, to recommend standards for this. 1080/24P cameras are being used for electronic cinematography – some as an alternative to film as a source for television, and some shooting material for motion pictures. Digital viewings of Star Wars 2: Attack of the Clones have been very well received, and it joins a growing list of digitally shot features. While debate continues about standards for D-cinema, a growing number of motion pictures are being screened via the 1080/24P standard. ‘24P’ is proving an effective bridge between television and film, and some see a coming together of HD and cinema in the quality end of entertainment.
3:2 Pull-down
Method used to map the 24 fps of motion picture film onto the 30 fps (60 fields) of
525-line TV, so that one film frame occupies three TV fields, the next two, etc. It means the two fields of every other TV frame come from different film frames making operations such as rotoscoping impossible, and requiring care in editing. Quantel equipment can unravel the 3:2 sequence to allow frame-by-frame treatment and subsequently re-compose 3:2.
The 3:2 sequence repeats every 1/6 of a second, i.e. every five TV frames or four film frames, the latter identified as A-D. Only film frame A is fully on a TV frame and so exists at one timecode only, making it the only editable point of the video sequence.
Film frames (24 fps):    A      B        C      D
TV fields (30/60 fps):  a a | b b b |  c c | d d d
Edit points fall only at film frame A, where the sequence starts.
4:1:1
This is a set of sampling frequencies in the ratio 4:1:1, used to digitise the luminance and colour difference components (Y, R-Y, B-Y) of a video signal. The 4 represents 13.5 MHz (74.25 MHz at HD), the sampling frequency of Y, and the 1s each 3.375 MHz (18.5625 MHz at HD) for R-Y and B-Y (ie R-Y and B-Y are each sampled once for every four samples of Y).
With the colour information sampled at half the rate of the 4:2:2 system, this is used as a more economic form of sampling where video data rates need to be reduced. Both luminance and colour difference are still sampled on every line but the latter has half the horizontal resolution of 4:2:2, while the vertical resolution of the colour information is maintained. 4:1:1 sampling is used in DVCPRO (625 and 525 formats) as well as in 525-line DVCAM.
4:2:0
A sampling system used to digitise the luminance and colour difference components (Y, R-Y, B-Y) of a video signal. The 4 represents the 13.5 MHz (74.25 MHz at HD) sampling frequency of Y, while the R-Y and B-Y are sampled at 6.75 MHz (37.125 MHz) – effectively on every other line only (ie one line is sampled at 4:0:0, luminance only, and the next at 4:2:2).
This is used in some 625 line systems where video data rate needs to be reduced. It decreases data by 25 percent against 4:2:2 sampling and the colour information has a reasonably even resolution in both the vertical and horizontal directions. 4:2:0 is widely used in MPEG-2 and 625 DV and DVCAM.
4:2:2
A ratio of sampling frequencies used to digitise the luminance and colour difference components (Y, R-Y, B-Y) of a video signal. The term 4:2:2 denotes that for every four samples of Y there are two samples each of R-Y and B-Y, giving more chrominance bandwidth in relation to luminance than 4:1:1 sampling. Defined in ITU-R BT.601, 4:2:2 is the standard for digital studio equipment, and the terms ‘4:2:2’ and ‘601’ are commonly (but technically incorrectly) used synonymously. The sampling frequency of Y is 13.5 MHz and that of R-Y and B-Y is 6.75 MHz each, providing a maximum colour bandwidth of 3.37 MHz – enough for high quality chroma keying. For HD the sampling rates are 5.5 times greater: 74.25 MHz for Y, and 37.125 MHz for R-Y and B-Y.
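As a quick sanity check on these numbers, the raw data rate implied by a sampling ratio can be computed directly. A minimal sketch (Python, 8-bit samples; the function name is illustrative):

```python
def data_rate_mbps(y_sample_mhz, ratio, bits=8):
    # Total samples per second = Y rate scaled by the ratio terms,
    # e.g. 4:2:2 -> (4 + 2 + 2) / 4 = 2x the Y sampling rate.
    y, cb, cr = ratio
    total_msamples = y_sample_mhz * (y + cb + cr) / y
    return total_msamples * bits

rate_422 = data_rate_mbps(13.5, (4, 2, 2))   # 216.0 Mb/s for 8-bit SD
rate_411 = data_rate_mbps(13.5, (4, 1, 1))   # 162.0 Mb/s
# 4:2:0 averages one Cb and one Cr sample per four Y over a line pair,
# so its rate matches 4:1:1 - 25 percent less data than 4:2:2.
```

The same function with 74.25 MHz gives the corresponding HD figures.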
4:2:2:4
This is the same as 4:2:2 but with the key signal included as the fourth component, also sampled at 13.5 MHz (74.25 MHz at HD).
4:4:4
One of the ratios of sampling frequencies used to digitise the luminance and colour difference components (Y, B-Y, R-Y) or the RGB components of a video signal. In this ratio there is always an equal number of samples of all components. RGB 4:4:4 is commonly used in standard computer platform-based equipment and when sampling film. While it may appear to offer the potential of yet better pictures formed from more data, television recording and transmission systems are based on 4:2:2, 4:1:1, and 4:2:0
Y, B-Y, R-Y sampling so, in most cases, the gains are limited and can be negated by the conversions between the sampling systems. However, film is commonly scanned at 2k and recorded directly to disks. The signal is then kept in the RGB form all the way through the Digital Intermediate process to the film recorder.
4:4:4:4
As for 4:4:4, except that the key signal (a.k.a. alpha channel) is included as a fourth component, also sampled at 13.5 MHz (74.25 MHz at HD).
4fsc
A sampling rate locked to four times the frequency of colour subcarrier (fsc). For example, D2 and D3 digital VTRs sample composite video at the rate of 4 x colour subcarrier frequency (ie 17.7 MHz PAL and 14.3 MHz NTSC). Its use is declining as all new digital equipment is based on component video.
8 VSB / 16 VSB
AAF
The Advanced Authoring Format – an industry initiative, launched in March 1998, to create a file interchange standard for the easy sharing of media data and metadata among digital production tools and content creation applications, regardless of platform. It includes EBU/SMPTE metadata and management of pluggable effects and codecs. It allows open connections between equipment where not only video and audio are transferred but also metadata including information on how the content is composed, where it came from, etc. It can fulfil the role of an all-embracing EDL or offer the basis for a media archive that any AAF-enabled system can use.
After years of development it is now used in equipment from a number of manufacturers including Quantel.
Active line
The part of a television line that actually includes picture information. This is usually over 80 percent of the total line time.
See also: Active picture
Active picture
The area of a TV frame that carries picture information. Outside the active area there are line and field blanking which roughly, but not exactly, correspond to the areas defined for the original 525- and 625-line analogue systems. In digital versions of these, the blanked/active areas are defined by ITU-R BT.601, SMPTE RP125 and EBU-E.
For 1125-line HDTV (1080 active lines), which may have 60, 30, 25 or 24 Hz frame rates
(and more), the active lines are always the same length – 1920 pixel samples at 74.25
MHz – a time of 25.86 microseconds – defined in SMPTE 274M and ITU-R BT.709-4.
Only their line blanking differs so the active portion may be mapped pixel-for-pixel between these formats. DTV standards tend to be quoted by their active picture content, e.g. 1920 x 1080, 1280 x 720, 720 x 576.
For both 625 and 525 line formats active line length is 720 luminance samples at 13.5
MHz = 53.3 microseconds. In digital video there are no half lines as there are in analogue. The table below shows blanking for SD and some popular HD standards.
[Table: blanking for SD and popular HD standards – active lines: 576 (625/50), 487 (525/60) and 1080 (1125/60I, 1125/50I, 1125/24P); for the 625/50 system, field 1 blanking is 24 lines, field 2 blanking 25 lines and line blanking 12 µs. Total line lengths: 64 µs (625-line system), 63.5 µs (525-line system).]
ADC or A/D
Analogue to Digital Conversion. Also referred to as digitisation or quantisation.
The conversion of analogue signals into digital data – normally for subsequent use in a digital machine. For TV, samples of audio and video are taken, the accuracy of the process depending on both the sampling frequency and the resolution of the analogue amplitude information – how many bits are used to describe the analogue levels. For TV pictures 8 or 10 bits are normally used; for sound, 16 or 20 bits are common while 24 bits is also possible. The ITU-R BT.601 standard defines the sampling of video components based on 13.5 MHz, and AES/EBU defines sampling of 44.1 and 48 kHz for audio. For pictures the samples are called pixels, which contain data for brightness and colour.
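The amplitude side of the process can be sketched as simple uniform quantisation. This is an illustration only – real video converters use the ITU-R BT.601 offset code ranges (64 = black, 940 = white at 10 bits) rather than the full scale shown here:

```python
def quantise(level, bits=10):
    # Map an analogue level in the range 0.0-1.0 onto 2**bits integer
    # codes; more bits means finer amplitude resolution.
    max_code = (1 << bits) - 1
    return round(max(0.0, min(1.0, level)) * max_code)

grey_10bit = quantise(0.5, bits=10)   # one of 1024 possible levels
white_8bit = quantise(1.0, bits=8)    # 255, the top of an 8-bit scale
```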
ADSL
Asymmetrical Digital Subscriber Line – working on the copper ‘local loop’ normally used to connect phones, ADSL provides a broadband downstream channel (to the user) of maximum 1.5-6 Mb/s and a narrower band upstream channel (from the user) of maximum 16-640 kb/s, according to class. Some current implementations offer 2
Mb/s downstream and always-on connections. Its uses include high-speed Internet connections and streaming video over telephone lines.
AES/EBU
The Audio Engineering Society (AES) and the EBU (European Broadcasting Union) together have defined a standard for Digital Audio, now adopted by ANSI (American
National Standards Institute). Commonly referred to as ‘AES/EBU’, this digital audio standard permits a variety of sampling frequencies, for example CDs at 44.1 kHz, or digital VTRs at 48 kHz. 48 kHz is widely used in broadcast TV production.
Aliasing
Undesirable ‘beating’ effects caused by sampling frequencies being too low faithfully to reproduce image detail. Examples are:
1) Temporal aliasing – e.g. wagon wheel spokes apparently reversing, also movement judder seen in the output of standards converters with insufficient temporal filtering.
2) Raster scan aliasing – twinkling effects on sharp boundaries such as horizontal lines. Due to insufficient filtering this vertical aliasing and its horizontal equivalent are often seen in lower quality DVEs as detailed images are compressed.
The ‘steppiness’ of unfiltered lines presented at an angle to the TV raster is also referred to as aliasing.
A familiar term for alias effects, such as ringing, contouring and jaggy edges caused by lack of resolution in a raster image. Some can be avoided by careful filtering or dynamic rounding.
Alpha channel
Another name for key channel – a channel to carry a key signal.
Anamorphic
Generally refers to the use of 16:9 aspect ratio pictures in a 4:3 system. For example, anamorphic supplementary lenses are used to change the proportions of an image to
16:9 on the surface of a 4:3 sensor by either extending the horizontal axis or compressing the vertical. Signals from 16:9 cameras and telecines produce an
‘anamorphic’ signal which is electrically the same as with 4:3 images but will appear horizontally squashed if displayed at 4:3.
The alternative way of carrying 16:9 pictures within 4:3 systems is letterbox. Letterbox has the advantage of showing the correct 16:9 aspect ratio on 4:3 displays, however the vertical resolution is less than 16:9 anamorphic.
Anti-aliasing
Smoothing of aliasing effects by filtering and other techniques. Most, but not all, DVEs and character generators contain anti-aliasing facilities.
API
Application Programming Interface – a set of interface definitions (functions, subroutines, data structures or class descriptions) which provide a convenient interface to the functions of a subsystem. They also simplify interfacing work by insulating the application programmer from minutiae of the implementation.
Arbitrated Loop (AL)
A technique used on computer networks to ensure that the network is clear before a fresh message is sent. When it is not carrying data frames, the loop carries ‘keep-alive’ frames. Any node that wants to transmit places its own ID into a ‘keep-alive’ frame.
When it receives that frame back it knows that the loop is clear and that it can send its message.
ARC (Aspect Ratio Converter)
Aspect Ratio Converters change picture aspect ratio – usually between 16:9 and 4:3.
Other aspect ratios are also allowed for, such as 14:9. Custom values can also be used.
Technically, the operation involves independent horizontal and vertical resizing and there are a number of choices for the display of 4:3 originals on 16:9 screens and vice versa (e.g. letterbox, pillar box, full height and full width). Whilst changing the aspect ratio of pictures, the objects within should retain their original shape.
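The geometry behind these display choices is simple scaling. A sketch of the letterbox case (the function name and the 576-line example are illustrative):

```python
from fractions import Fraction

def letterbox_lines(active_lines, source_ar, display_ar):
    # A wider source shown full-width on a narrower display occupies
    # only a band of lines: active_lines * (display_ar / source_ar).
    return int(active_lines * display_ar / source_ar)

# 16:9 material letterboxed in a 4:3, 576-line frame:
used = letterbox_lines(576, Fraction(16, 9), Fraction(4, 3))  # 432 lines
```

The 432-of-576 result shows why letterbox sacrifices vertical resolution compared with 16:9 anamorphic, as noted under Anamorphic above.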
Archive
Long-term storage of information. Pictures and sound stored in digital form can be archived and recovered without loss or distortion. The storage medium must be both reliable and stable and, as large quantities of information need to be stored, cost is of major importance. Currently the lowest cost is magnetic tape but there is increasing interest in optical disks and especially DVDs – more expensive but with far better access.
VTRs offer the most cost-effective storage, highest packing density and instant viewing
– even while spooling. Non-compressed component digital formats: D1, D5 (SD) and D6
(HD) give the best quality and 50 Mb/s 3:1 compression for SD is also used (DVCPRO
50, IMX). However, VTRs are format dependent so their long-term use is limited for the world which is increasingly format co-existent.
For stills and graphics compression should be avoided as full detail is required for viewing still images. CDs, DVDs and Magneto-optical (MO) disks are convenient, giving instant access to all pictures.
Today, networking, the proliferation of video formats and the use of digital film have meant a move away from VTRs toward data recorders. DTF-2 is increasingly used for archiving.
Archiving an editing or compositing session, requiring data on all aspects of the session to be stored, becomes practical with integrated equipment (e.g. non-linear suites).
Beyond an EDL this may include parameters for colour correction, DVE, keying, layering etc. This metadata can be transferred to a removable medium such as a floppy disk or an MO. Quantel has introduced an archiving system based on AAF – a format designed for the media industry that carries both essence and metadata.
Traditionally, material is archived after its initial use – at the end of the process.
More recently some archiving has moved to the beginning. An example is news where, in some cases, new material is archived and subsequent editing, etc., accesses this.
This reflects the high value of video assets where rapidly increasing numbers of channels are seeking material.
Areal density
The density of data held on an area of the surface of a recording medium. This is one of the parameters that manufacturers of disk drives and tape recorders strive to increase. For example, some currently available high-capacity drives achieve over 15 Gb per square inch.
Artefact
Particular visible effects which are a direct result of some technical limitation. Artefacts are generally not described by traditional methods of signal evaluation. For instance, the visual perception of contouring in a picture cannot be described by a signal-to-noise ratio or linearity measurement.
ASCII
American Standard Code for Information Interchange. This is a standard computer character set used throughout the industry to represent keyboard characters as digital information. The ASCII table contains 128 characters covering all the upper and lower case characters and non-displayed controls such as carriage return, line feed etc. Variations and extensions of the basic code are used in special applications.
ASIC
Application Specific Integrated Circuit. Custom-designed integrated circuit with functions specifically tailored to an application. These replace the many discrete devices that could otherwise do the job but work up to ten times faster with reduced power consumption and increased reliability. ASICs are now only viable for very large-scale, high volume products due to high startup costs and their inflexibility.
Aspect ratio
1. – of pictures. The ratio of length to height of pictures. Nearly all TV screens are currently 4:3, i.e. four units across to three units in height but there is a growing move towards widescreen 16:9. Pictures presented this way are believed to absorb more of our attention and have obvious advantages in certain productions, such as sport. In the change towards 16:9 some in-between ratios have been used, such as 14:9.
Website: document R95-2000 at www.ebu.ch/tech_texts/tech_texts_theme.html
2. – of pixels. The aspect ratio of the area of a picture described by one pixel. The ITU-R
BT.601 digital coding standard defines luminance pixels which are not square. In the
525/60 format there are 486 active lines each with 720 samples of which 711 may be viewable due to blanking. Therefore the pixel aspect ratio on a 4:3 screen is:
486/711 x 4/3 = 0.911
(ie the pixels are 10 percent taller than they are wide)
For the 625/50 format there are 576 active lines each with 720 samples of which 702 are viewable so the pixel aspect ratio is:
576/702 x 4/3 = 1.094
(ie the pixels are 9% wider than they are tall)
The newer DTV image standards, and all those for HD, define square pixels.
Account must be taken of pixel aspect ratios when, for example, executing a DVE move: when rotating a circle, the circle must always remain circular and not become elliptical.
Another area where pixel aspect ratio is important is in the movement of images between computer platforms and television systems. Computers nearly always use square pixels so their aspect ratio must be adjusted to suit television. This change may not be real-time and its quality will depend on the processing used.
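The two ratios above follow directly from the screen shape and the sample counts. As a sketch (the function name is illustrative):

```python
def pixel_aspect(active_lines, viewable_samples, screen_ar=4 / 3):
    # Pixel width/height = screen aspect ratio divided by the ratio of
    # horizontal samples to vertical lines.
    return screen_ar * active_lines / viewable_samples

par_525 = pixel_aspect(486, 711)   # about 0.911 - taller than wide
par_625 = pixel_aspect(576, 702)   # about 1.094 - wider than tall
```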
Asynchronous (data transfer)
Carrying no separate timing information. There is no guarantee of time taken but a transfer uses only small resources as these are shared with many others. A transfer is ‘stop-go’ – depending on handshakes to check data is being received before sending more. Ethernet is asynchronous. Being indeterminate, asynchronous transfers of video files are used between storage devices, such as disks, but are not ideal for ‘live’ operations.
ATM
Asynchronous Transfer Mode (ATM) provides connections for reliable transfer of streaming data, such as television. With speeds ranging up to 10 Gbit/s, it is mostly used by telcos; 155 and 622 Mbit/s are most appropriate for television operations. Unlike Ethernet and Fibre Channel, ATM is connection-based: offering good Quality of Service (QoS) by establishing a path through the system before data is sent.
Sophisticated lower ATM Adaptation Layers (AAL) offer connections for higher layers of the protocol to run on. AAL1 supports constant bit rate, time-dependent traffic such as voice and video. AAL3/4 supports variable bit rate, delay-tolerant data traffic requiring some sequencing and/or error detection. AAL5 supports variable bit rate, delay-tolerant connection-oriented data traffic – often used for general data transfers.
ATSC
The (US) Advanced Television Systems Committee. Established in 1982 to co-ordinate the development of voluntary national technical standards for the generation, distribution and reception of high definition television. In 1995 the ATSC published “The Digital Television
Standard” which describes the US Advanced Television System. This uses MPEG-2 compression for the video and AC-3 for the audio and includes a wide range of video resolutions (as described in ATSC Table 3) and audio services (Table 2). It uses 8 and 16
VSB modulation respectively for terrestrial and cable transmission.
ATV
Advanced Television. The term used in North America to describe television with capabilities beyond those of analogue NTSC. It is generally taken to include digital television (DTV) and high definition (HDTV).
Auditory masking
The psycho-acoustic phenomenon of human hearing where what can be heard is affected by the components of the sound. For example, a loud sound will mask a soft sound close to it in frequency. Audio compression systems such as Dolby Digital and
MPEG audio use auditory masking as their basis and only code what can be heard by the human ear.
Axis (x, y, z)
Used to describe the three-dimensional axes set at right angles to each other, available in DVE manipulations. At normal (clear) x lies across the screen left to right, y up the screen bottom to top and z points into the screen. Depending on the power of the equipment and the complexity of the DVE move, several hierarchical sets of xyz axes may be in use at one time. For example, one set may be referred to the screen, another to the picture, a third offset to some point in space (reference axis) and a fourth global axis controlling any number of objects together.
Axes controlling picture movement
Background loading
Recording material into a system, usually a nonlinear editor, as a background task.
Thus the foreground task continues uninterrupted and when one job is completed, the next is already loaded – potentially increasing the throughput of the editing system.
Background task
A secondary operation that is completed while the main operation continues uninterrupted. This requires an overhead in machines’ capabilities beyond that needed for their primary foreground operation. This has particular benefits in pressured situations where time is short, or simply not available for extra operations – such as during edit sessions, live programming and transmission. Examples are Quantel’s use of
100BaseT and Gigabit Ethernet for the background transfers of pictures, video, audio and metadata. The equipment is designed so it continues its primary foreground operations during all such transfers.
Bandwidth
The amount of information that can be passed in a given time. In television a large bandwidth is needed to show sharp picture detail in real-time, and so is a factor in the quality of recorded or transmitted images. For example, ITU-R BT.601 and SMPTE RP 125 allow analogue luminance bandwidth of 5.5 MHz and chrominance bandwidth of 2.75 MHz for standard definition video. 1080-line HD has a luminance bandwidth of 30 MHz.
Digital image systems generally require large bandwidths hence the reason why many storage and transmission systems revert to compression techniques to accommodate the signal.
Betacam
An analogue component VTR system for SD television, using a half-inch cassette – very similar to the domestic Betamax. This was developed by Sony and is marketed by them and several other manufacturers. Although recording the Y, R-Y and B-Y component signals onto tape, many machines are operated with coded (PAL or NTSC) video in and out.
The system was developed to offer models for the industrial and professional markets as well as full luminance bandwidth (Betacam SP), PCM audio and SDI connections.
Betacam SX
A digital tape recording format developed by Sony which uses a constrained version of MPEG-2 compression at the 4:2:2 profile, Main Level (4:2:2P@ML). It uses half-inch tape cassettes.
Binary
Mathematical representation of a number to base 2, i.e. with only two states, 1 and 0; on and off; or high and low. This is the base of the mathematics used in digital systems and computing. Binary representation requires a greater number of digits than the base 10, or decimal, system most of us commonly use every day. For example, the base 10 number 254 is 11111110 in binary.
There are important characteristics which determine good digital equipment design. One is that the result of a binary multiplication contains as many digits as the two original numbers combined. For example:
10101111 x 11010100 = 1001000011101100
(in decimal 175 x 212 = 37,100)
Each digit is known as a bit. This example multiplies two 8-bit numbers and the result is always a 16-bit number. So, for full accuracy, all the resulting bits should be taken into account. Multiplication is a very common process in digital television equipment (e.g.
keying, mixes and dissolves).
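The worked example can be checked directly; note how the operand bit counts add to give the width needed for a full-accuracy result:

```python
a = 0b10101111            # 175, an 8-bit number
b = 0b11010100            # 212, an 8-bit number
product = a * b           # 37,100
# Full accuracy needs as many bits as both operands combined: 16 here.
bits_needed = a.bit_length() + b.bit_length()
```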
Binary digIT = bit
One mathematical bit can define two levels or states – on or off, black or white, 0 or 1, etc. – two bits can define four levels, three bits eight, and so on: generally 2^n levels, where n = the number of bits. In image terms 10 bits can define 1024 levels of brightness from black to white (with ITU-R BT.601 and 709, 64 = black and 940 = white).
Bit rate reduction (BRR)
BITC
Burnt-in Timecode. Timecode that is displayed on the video to which it refers. This is often recorded to provide precise frame references for those viewing on equipment not supplied with timecode readers – such as domestic VCRs.
Blocking and ‘Blockiness’
Blu-ray Disc
This new optical disk will be able to hold 27 GB on a single-sided, single-layer CD-sized disk using 405-nanometre blue-violet lasers. Manufacturers are aiming to eventually achieve 50 GB disks. The companies that established the basic specifications are:
Hitachi Ltd., LG Electronics Inc., Matsushita Electric Industrial Co., Ltd., Pioneer
Corporation, Royal Philips Electronics, Samsung Electronics Co. Ltd., Sharp
Corporation, Sony Corporation, and Thomson Multimedia.
For recording video and audio (AC3, MPEG-1 Layer-2, etc. supported), the technology uses the global standard MPEG-2 Transport Stream and is able to store two hours of
HDTV and reach data transfer rates up to 36 Mb/s. The same disk could store 13 hours of SD video at 3.8 Mb/s. Random accessing functions make it possible to easily edit video data captured on a video camera or play back pre-recorded video on the disc while simultaneously recording images being broadcast on TV.
Unlike CDs and DVDs, Blu-ray disks are contained within a cartridge to protect the disk from dust and fingerprints.
Broadband
General term referring to faster-than-telephone-modem connections, i.e. receiving
(download) much faster than 56 kb/s and transmitting (upload) faster than 28 kb/s.
Typically this is used to connect to the Internet via DSL or ADSL over the original copper telephone lines. Cable can offer higher data rates.
Browse
Method used with some stills stores, graphics systems and disk-based video stores to display a selection of reduced size or reduced resolution images to aid choice of stored clips or stills. For moving video, a timeline may be available so clips can be shuttled allowing the full sized images to be brought to use pre-cued.
Browse/edit facilities are used in newsroom systems to provide video editing for journalists on their desktops. The material is stored on a browse server and distributed over a network to the many users. Details differ between models but some allow frame-accurate shot selections to be made with the resulting ‘cuts decision lists’ used to conform a broadcast quality version.
Bug
An error in a computer program that causes the system to behave erratically, incorrectly or to stop altogether. The term dates from the original computers with tubes and relays, where real live bugs were attracted by the heat and light and used to get between the relay contacts.
Bus
An internal pathway for sending digital signals from one part of a system to another.
BWF
Broadcast WAV file – an audio file format based on Microsoft’s WAV. It can carry PCM or MPEG encoded audio and adds the metadata, such as a description, originator, date and coding history, needed for interchange between broadcasters.
Byte (B), kilobyte (kB), megabyte (MB), gigabyte (GB), terabyte (TB) and petabyte (PB)
1 Byte (B) = 8 bits (b) which can describe 256 discrete values (brightness, colour, etc.).
Traditionally, just as computer-folk like to start counting from zero, they also ascribe 2 raised to the powers 10, 20, 30, etc. (2^10, 2^20, 2^30, etc.) to the values kilo, mega, giga, etc., which become 1,024, 1,048,576, 1,073,741,824, etc. This can be difficult to handle for those drilled only in base-10 mathematics. Fortunately, disk drive manufacturers, who have to deal in increasingly vast numbers, describe their storage capacity in powers of
10, so a 20 GB drive has 20,000,000,000 bytes capacity. Observation suggests both systems are continuing in use… which could lead to some confusion.
1 kB = 2^10 bytes = 1,024 B
1 MB = 2^20 bytes = 1,048,576 B
1 GB = 2^30 bytes = 1.074 x 10^9 B
1 TB = 2^40 bytes = 1.099 x 10^12 B
1 PB = 2^50 bytes = 1.126 x 10^15 B
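The two conventions are easy to compare in code. A minimal sketch (the function name is illustrative):

```python
def bytes_in(prefix_rank, binary=True):
    # kilo=1, mega=2, giga=3, ...: 2**(10n) by the computer convention,
    # 10**(3n) by the disk-manufacturer convention.
    return 2 ** (10 * prefix_rank) if binary else 10 ** (3 * prefix_rank)

gb_binary = bytes_in(3)           # 1,073,741,824 B
gb_decimal = bytes_in(3, False)   # 1,000,000,000 B - about 7% smaller
```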
Approx duration held:
                  SD (601)    HD (1080)    2k film
1 GB (10^9 B)     47 sec      6.4 sec      3.5 sec
1 TB (10^12 B)    13 hrs      1.8 hrs      58 mins
1 PB (10^15 B)    550 days    74 days      40 days
(1 kB holds only a fraction of a line – about 1/5 of an HD line or 1/8 of a 2k line; 1 MB holds about one SD frame, 1/5 of an HD frame or 130 lines of 2k film.)
Currently 3.5-inch hard disk drives store from 20-180 GB. Solid-state store chips, DRAMs, increment fourfold in capacity every generation, hence the availability of 16, 64, and now 256 Mb chips (ie 256 x 2^20 bits).
A full frame of digital television, sampled at 8 bits according to ITU-R BT.601, requires just under 1 MB of storage (830 kB for 625 lines, 701 kB for 525 lines). HDTV frames are around 5–6 times larger and 2k digital film frames sampled at 10 bits are 12 MB.
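Those frame sizes follow from the sample counts: in 4:2:2 every luminance sample position carries, on average, two 8-bit values (Y plus an alternating Cb/Cr). A sketch, using decimal kilobytes as the text does:

```python
def frame_kb_422(samples_per_line, active_lines, bits=8):
    # 4:2:2: per Y sample there is also one colour-difference sample,
    # so 2 values per pixel position; blanking is excluded.
    total_bits = samples_per_line * active_lines * 2 * bits
    return total_bits // 8 // 1000

kb_625 = frame_kb_422(720, 576)   # about 829 kB - just under 1 MB
kb_525 = frame_kb_422(720, 487)   # about 701 kB
```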
Charge Coupled Device (CCD) – either assembled as a linear or two-dimensional array of light sensitive elements. Light is converted to an electrical charge proportional to the brightness impinging on each cell. The cells are coupled to a scanning system which, after analogue to digital conversion, presents the image as a series of binary digits.
Early CCD arrays were unable to work over a wide range of brightness but they now offer low noise, high resolution imaging up to HDTV level and for digital cinematography.
CCIR
Comité Consultatif International des Radiocommunications. This has been absorbed into the ITU under ITU-R.
See also: ITU
CCITT
International Telegraph and Telephone Consultative Committee. As the name suggests this was initially set up to establish standards for the telephone industry in Europe.
It has now been superseded by ITU-T so putting both radio frequency matters (ITU-R) and telecommunications under one overall United Nations body.
CDTV
Conventional Definition Television. The analogue NTSC, PAL, SECAM television systems with normal 4:3 aspect ratio pictures.
Checksum
A simple check value of a block of data intended to recognise when data bits are wrongly presented. It is calculated by adding all the bytes in a block. It is fairly easily fooled by typical errors in data transmission systems so that, for most applications, a more sophisticated system such as CRC is preferred.
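A minimal sketch shows both the calculation and the weakness mentioned – reordered bytes give the same sum:

```python
def checksum8(block: bytes) -> int:
    # Add all bytes, keep the low 8 bits.
    return sum(block) & 0xFF

good = checksum8(b"\x01\x02\x03")     # 6
swapped = checksum8(b"\x03\x02\x01")  # also 6: the error slips through
```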
Chroma keying
The process of overlaying one video signal over another, the areas of overlay being defined by a specific range of colour, or chrominance, on the background signal.
For this to work reliably, the chrominance must have sufficient resolution, or bandwidth.
PAL or NTSC coding systems restrict chroma bandwidth and so are of very limited use for making a chroma key which, for many years, was restricted to using live, RGB camera feeds.
An objective of the ITU-R BT.601 and 709 digital sampling standards was to allow high quality chroma keying in post production. The 4:2:2 sampling system allowed far greater bandwidth for chroma than PAL or NTSC and helped chroma keying, and the whole business of layering, to thrive in post production. High signal quality is still important to derive good keys so some people favour using RGB (4:4:4) for keying – despite the additional storage requirements. Certainly anything but very mild compression tends to result in keying errors appearing – especially at DCT block boundaries.
Chroma keying techniques have continued to advance and use many refinements, to the point where totally convincing composites can be easily created. You can no longer ‘see the join’ and it may no longer be possible to distinguish between what is real and what is keyed.
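At its simplest, a key decision compares each pixel's chrominance with the key colour. The toy sketch below produces only a hard-edged on/off matte – real keyers generate soft, processed mattes – and the (Cb, Cr) values are purely illustrative:

```python
def hard_matte(pixel_cbcr, key_cbcr, tolerance=30.0):
    # 0 = show background (pixel chroma is close to the key colour),
    # 1 = keep foreground. Distance is measured in the (Cb, Cr) plane.
    cb, cr = pixel_cbcr
    kb, kr = key_cbcr
    distance = ((cb - kb) ** 2 + (cr - kr) ** 2) ** 0.5
    return 0 if distance <= tolerance else 1

KEY = (90, 60)                      # illustrative green-screen chroma
keyed = hard_matte((92, 58), KEY)   # 0: replaced by the background
kept = hard_matte((128, 128), KEY)  # 1: foreground retained
```

The need for accurate chroma distances like this is why generous chroma bandwidth (4:2:2 or RGB) and low compression matter for keying.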
Chrominance
The colour part of a signal, relating to the hue and saturation but not to the brightness or luminance of the signal. Thus pure black, grey and white have no chrominance, but any coloured signal has both chrominance and luminance. Although imaging equipment registers red, green and blue, television pictures are handled and transmitted as U and V, or I and Q, or Cr and Cb, or (R-Y) and (B-Y), which all represent the chrominance information of a signal, together with the pure luminance (Y).
CineAlta
A camcorder for digital cinematography from Sony. It uses the 1080/24P or 1080/25P
HD format but is equipped for film-like operation.
Clip
The name is taken from the film industry and refers to a segment of sequential frames made during the filming of a scene. In television terms a clip is the same but represents a segment of sequential video frames. In Quantel editing systems, a clip can be a single video segment or a series of video segments spliced together. A video clip can also be recorded with audio or have audio added to it.
Quantel term for audio that has been recorded with, or added to, a video clip in an online, nonlinear editor. An audio-only clip is stored with only one frame of video
(for ident) repeated for the audio duration.
A multi-layer clip constructed in the Edit Desk in Quantel editing systems.
Gigabit Ethernet-based networking supported on Quantel’s Editbox, Henry Infinity and Clipbox.
Clone
An exact copy, indistinguishable from the original. As in copying recorded material, e.g. copy of a non-compressed recording to another non-compressed recording.
If attempting to clone compressed material care must be taken not to decompress it as part of the process or the result will not be a clone.
Coded Orthogonal Frequency Division Multiplexing – a modulation scheme which is used by the DVB digital television system. It allows for the use of either 1705 carriers
(usually known as ‘2k’), or 6817 carriers (‘8k’). Concatenated error correction is used.
The ‘2k’ mode is suitable for single transmitter operation and for relatively small single-frequency networks with limited transmitter power. The ‘8k’ mode can be used both for single transmitter operation and for large area single-frequency networks. The guard interval is selectable. The ‘8k’ system is compatible with the ‘2k’ system.
There has been much discussion about the relative merits of COFDM vs the 8-VSB scheme used in the ATSC standard. The Japanese ISDB system uses a similar scheme.
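The relationship between the two modes can be checked with simple arithmetic. This sketch assumes an 8 MHz channel and the DVB-T elementary period T = 7/64 µs; the useful symbol time is the FFT size times T, and the carrier spacing is its reciprocal:

```python
T = 7 / 64e6   # DVB-T elementary period for 8 MHz channels, seconds

def cofdm_mode(fft_size, used_carriers):
    tu = fft_size * T                    # useful symbol duration, s
    spacing = 1.0 / tu                   # inter-carrier spacing, Hz
    occupied = used_carriers * spacing   # occupied bandwidth, Hz
    return tu, spacing, occupied

mode_2k = cofdm_mode(2048, 1705)
mode_8k = cofdm_mode(8192, 6817)
```

Both modes occupy roughly the same 7.6 MHz, which is why the ‘8k’ system can remain compatible with ‘2k’: the ‘8k’ mode trades four times closer carrier spacing for a four times longer symbol – hence its suitability for large single-frequency networks.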
Colour correction (primary)
Altering the colour of material to match the requirements for the overall colour look as defined by the Director of Photography (for film) or director/producer in television.
For film, the Colourist checks and corrects each shot – which is typically done as the material, usually cut camera negative, is scanned in a telecine and recorded to a server or tape. This Primary Colour Correction separately controls the lift, gain and gamma for the red, blue and green channels and is applied to whole pictures. This may be provided within the telecine itself or from a separate unit e.g. Pandora or Da Vinci. The camera negative is usually cut with handles – adding say 20 frames extra at the beginning and end of each shot to allow for later adjustment of edits.
The wider use of diverse material such as video and film shots of various ages makes greater demands on colour correction. Some, of necessity, would be tape-to-tape rather than negative-to-tape – limiting the range of correction available. Documentaries often draw on diverse material and so do some features.
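The three primary controls can be sketched for a single channel. This is an illustrative formula, not any particular telecine or corrector's exact maths; a full primary grade applies one set of these per red, blue and green channel:

```python
def primary_correct(v, lift=0.0, gain=1.0, gamma=1.0):
    """Lift offsets the blacks, gain scales the whites and gamma
    bends the mid-tones of a normalised (0-1) channel value."""
    v = v * gain + lift
    v = min(max(v, 0.0), 1.0)      # clamp to the legal range
    return v ** (1.0 / gamma)
```

With neutral settings (lift 0, gain 1, gamma 1) the value passes through unchanged; raising gamma lifts the mid-tones while leaving black and white pinned.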
Colour correction (secondary)
In some cases a more selective form of colour correction is needed after the Primary
Correction and is applied to the corrected and recorded material. This Secondary Colour
Correction is aimed at controlling a particular colour or narrow range of colours – such as those on a car or product. Here typically the hue, gain and saturation can be changed.
There are also several methods available for defining the object or area of required colour correction such as using wipe-pattern shapes, drawing an electronic mask by hand or a combination of automatic and by-hand methods. Some of the most sophisticated tools are provided by media workstations such as Quantel’s eQ and iQ.
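A minimal sketch of the hue-selective idea, using Python's standard colorsys module; the selection logic here is invented for illustration and is far simpler than the wipe-pattern and mask-based tools described above:

```python
import colorsys

def correct_hue_range(pixels, hue_lo, hue_hi, sat_gain):
    """Scale the saturation of pixels whose hue (degrees) falls in
    [hue_lo, hue_hi]; leave all other pixels untouched."""
    out = []
    for r, g, b in pixels:
        h, l, s = colorsys.rgb_to_hls(r, g, b)
        if hue_lo <= h * 360.0 <= hue_hi:      # inside target hue range?
            s = min(s * sat_gain, 1.0)
            r, g, b = colorsys.hls_to_rgb(h, l, s)
        out.append((r, g, b))
    return out
```

For example, desaturating only the reds (hues 0-30 degrees) pulls a red car toward grey while a blue sky passes through unchanged.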
The colour range between specified references. Typically three reference sets are quoted in television: RGB; Y, R-Y, B-Y; and Hue, Saturation and Luminance (HSL). In print, Cyan, Magenta, Yellow and Black (CMYK) are used. Film is RGB. Moving material between these colour spaces is possible but requires careful attention to the accuracy of the processing involved. Operating across the media – print, film and TV, as well as between computers and TV equipment – will require conversions in colour space.
Electronic light sensors detect red, blue and green light but TV signals are usually changed into Y, R-Y and B-Y components at, or very soon after, the camera or telecine. There is some discussion about which colour space is best for post production – the main potential advantages being in the area of keying. However, with most mass storage and infrastructure being component-based, the full RGB signal is usually not available, so any of its advantages can be hard to realise.
The increasing use of disk storage and networking able to carry RGB is beginning to allow its wider use. Even so, it takes up 50 percent more storage and, for most productions, its benefits over component working are rarely noticed. One area that is fixed on RGB use is 2k film. Modern digital techniques allow the use of both RGB and Y
R-Y B-Y to best suit production requirements.
Common Image Format (CIF)
The ITU has defined common image formats. A standard definition image of 352 x 240 pixels is described for computers. For HDTV production the HD-CIF preferred format is defined in ITU-R BT.709-4 as 1920 x 1080, 16:9 aspect ratio with progressive frame rates of 24, 25 and 30 Hz (including segmented) and interlace field rates of 50 and 60
Hz. This has helped to secure the 1920 x 1080 format as the basis for international programme exchange.
The normal interpretation of a component video signal is one in which the luminance and chrominance remain as separate components, e.g. analogue components in
Betacam VTRs, digital components Y, Cr, Cb in ITU-R BT.601 and 709. RGB is also a component signal. Pure component video signals retain maximum luminance and chrominance bandwidth and the frames are independent. Component video can be edited at any frame boundary.
Luminance and chrominance are combined along with the timing reference ‘sync’ information using one of the coding standards – NTSC, PAL or SECAM – to make composite video. The process, which is an analogue form of video compression, restricts the bandwidths (image detail) of the components. In the composite result colour is literally added to the monochrome (luminance) information using a visually acceptable technique. As our eyes have far more resolving power for luminance than for colour, the colour sharpness (bandwidth) of the coded signal is reduced to far below that of the luminance. This provides a good solution for transmission but it becomes difficult, if not impossible, to accurately reverse the process (decode) into pure luminance and chrominance, which limits its use in post production.
Simultaneous multi-layering and design for moving pictures. Modern video designs often use many techniques together, such as painting, retouching, rotoscoping, keying/matting, digital effects and colour correction as well as multi-layering to create complex animations and opticals for promotions, title sequences and commercials as well as in programme content. Besides the creative element there are other important applications for compositing equipment such as image repair, glass painting and wire removal – especially in motion pictures.
The quality of the finished work, and therefore the equipment, can be crucial especially where seamless results are demanded. For example, adding a foreground convincingly over a background – placing an actor into a scene – without any tell-tale blue edges or other signs that the scene is composed.
Reduction of bandwidth or data rate for audio. Many digital schemes are in use, all of which make use of the way the ear hears (e.g. that a loud sound will tend to mask a quieter one) to reduce the information sent. Generally this is of benefit in areas where bandwidth and storage are limited, such as in delivery systems to the home.
The process of reducing the bandwidth or data rate of a video stream. The analogue broadcast standards used today, PAL, NTSC and SECAM are, in fact, compression systems which reduce the data content of their original RGB sources.
Digital compression systems analyse their picture sources to find and remove redundancy and less critical information both within and between picture frames.
The techniques were primarily developed for digital data transmission but have been adopted as a means of reducing transmission bandwidths and storage requirements on disks and VTRs.
A number of compression techniques are in regular use; these include ETSI, JPEG, Motion JPEG, DV, MPEG-1, MPEG-2 and MPEG-4. Where different techniques are used in the same stream, problems can occur and picture quality can suffer more than if the same method is used throughout.
The MPEG-2 family of compression schemes, which was originally designed for programme transmission, has been adapted for studio use in Betacam SX and IMX recorders.
While there is much debate, and new technologies continue to be developed, it remains true that the best compressed results are produced from the highest quality source pictures. Poor inputs do not compress well. Noise, which may be interpreted as important picture detail, is the enemy of compression.
The ratio of the size of data in the non-compressed digital video signal to the compressed version. Modern compression techniques start with component television signals but a variety of sampling systems are used, 4:2:2 (‘Studio’ MPEG-2), 4:2:0 (MPEG-2), 4:1:1
(DVCPRO), etc. The compression ratio should not be used as the only method to assess the quality of a compressed signal. For a given technique, greater compression can be expected to result in worse quality but different techniques give widely differing quality of results for the same compression ratio. The only sure method of judgement is to make a very close inspection of the resulting pictures – where appropriate, re-assessing their quality after onward video processing.
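The arithmetic itself is trivial – which is the point of the warning above. For instance, comparing the ITU-R BT.601 active-picture rate (8-bit 4:2:2) with a 50 Mb/s studio scheme:

```python
# Active SD picture: 720 x 576 pixels, 25 fps, 16 bits/pixel in 4:2:2
# (8-bit Y plus 8 bits of shared Cr/Cb per pixel).
active_bps = 720 * 576 * 25 * 16      # ~166 Mb/s

def compression_ratio(uncompressed_bps, compressed_bps):
    return uncompressed_bps / compressed_bps

ratio = compression_ratio(active_bps, 50e6)   # 50 Mb/s schemes -> ~3.3:1
```

The same 3.3:1 figure can describe schemes of very different quality – hence the advice to judge by inspecting pictures, not ratios.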
Video Compression in Post Production
Video compression is commonly used throughout the television production chain – in acquisition, post production and transmission. There are many types so which should be used where? And for post production and editing, need compression be used at all?
Non-compressed digital video requires large volumes of storage and high-speed data transfer for real-time working. Digital coding at standard definition (ITU-R BT.601) produces 21 MB/s of data (8-bit 4:2:2 sampling) and for high definition (ITU-R BT.709) the rate is up to 156 MB/s (10-bit 4:2:2 sampling: see Byte, Storage capacity). Keeping video non-compressed maintains the quality regardless of copying and generations, but compression is often used and is essential in some areas. All digital television home delivery – whether via DVDs or transmission (terrestrial, cable, satellite or Internet) – uses MPEG-2 coding at very high compression ratios – as much as 100:1 or more – to provide the required services over limited available bandwidth. Compression is also common at the beginning of the production chain – in acquisition, where the weight and size of camcorders need to be manageable. Most video recorders use compression, saving tape and lowering costs. Exceptions are D1, D5 and D6 (HD).
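The figures quoted can be checked with back-of-envelope arithmetic, counting active picture only (4:2:2 means two samples – one Y plus half-rate Cr/Cb – per pixel):

```python
def rate_mb_per_s(width, height, fps, bits_per_sample):
    samples_per_frame = width * height * 2    # 4:2:2 sampling
    return samples_per_frame * fps * bits_per_sample / 8 / 1e6

sd = rate_mb_per_s(720, 576, 25, 8)      # ITU-R BT.601, 8-bit
hd = rate_mb_per_s(1920, 1080, 30, 10)   # ITU-R BT.709, 10-bit
sd_per_hour_gb = sd * 3600 / 1000        # storage per hour
```

This gives roughly 21 MB/s for SD (about 75 GB per hour, the figure used later in this section) and 156 MB/s for HD at 30 frames per second.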
In the studio, different rules apply. Although savings in tape may still be required, bandwidth for high-speed connections is quite easily available while size and portability are not a priority. In deciding whether to use compression and if so, what type, consideration must be given to both upstream and downstream operations. Upstream, most material is supplied compressed on DV, Digital Betacam, SX, IMX, etc. with only high-end programmes and commercials using D1 and D5. For HD there are HDCam,
DVCPRO HD and D5 HD compressed formats and D6 non-compressed. Downstream, digital broadcasters use MPEG-2 compression and can reliably deliver very high quality pictures to homes which are clearly seen on better, larger screens. As a result, technical quality in programmes is more important than ever as MPEG-2 coders can both compress more and give better-looking results, given high-quality, low-noise input material.
The rate of expansion of post production capabilities has increased since the introduction of online nonlinear working. Freedom to use them to the full is important.
Historically, operations have been limited by generation losses caused by successive recording and processing of analogue material. Digital techniques have the potential to avoid these and provide zero loss from input to edited master. This then passes near first generation quality to the broadcaster.
In practice, some compromise is made but the object is to keep this to a minimum.
With no losses due to the storage medium itself (true for disks, nearly true for tape), the signal should remain bit-perfect provided that it undergoes no processing or decode/recode cycles on the way. One possibility may be to stay with the compression scheme (if it is open) used in the original recording – accepting it in its native form.
This way the exact same signal appears on the edited master as was presented at the input. But all editing requires that clips are cut and spliced together. If each frame of the video is self-contained – intra-frame (I-frame only) compressed – then any frame can be cut to any other without processing. This is always the case for DV compression; however MPEG-2 commonly includes inter-frame compression working over a group of pictures (GOP). Cutting this to a higher accuracy than the GOP size (up to 13 is used) requires processing – decoding and recoding. Recently I-frame only MPEG-2 compression has been introduced which uses 4:2:2 sampling – as in the Sony IMX recorders (MPEG-2 is usually 4:2:0). There is agreement that DVCPRO 50 and I-only MPEG-2 422P@ML, both at 50 Mb/s, are suitable for SD studio use (see EBU D84-1999 – Use of 50 Mbit/s MPEG compression in television programme production – available at www.ebu.ch/tech_texts.html).
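The editing constraint can be illustrated with a toy model: mark which frame boundaries can be cut without a decode/recode pass. (Simplified – in a real long-GOP stream the dependency rules around B and P frames are more involved.)

```python
def clean_cut_points(frames):
    """Indices at which a cut needs no recoding: only I frames,
    which do not depend on neighbouring frames."""
    return [i for i, f in enumerate(frames) if f == 'I']

i_only   = clean_cut_points('IIIIIIII')         # every boundary is editable
long_gop = clean_cut_points('IBBPBBPBBPBBI')    # only at GOP starts
```

With DV or I-only MPEG-2 the list covers every frame; with a long GOP, cutting anywhere else means decoding and recoding around the edit.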
There are schemes, as defined in SMPTE 319M and 327M, where ‘recoding information’
(a.k.a. Mole) is embedded in the bitstream to ensure that a decoded long GOP MPEG-2 signal can be recoded exactly as it was originally – resulting in only minimal losses.
This scheme was intended to allow regional stations to accept MPEG-2 signals and to add station IDs and local presentation to their re-transmission. It has been proved to work well but the technique is only valid where there is no change applied to the pictures – making it of limited use in post production.
Although it is said that 85 percent of edits are cuts-only, nearly every programme needs to add other transitions such as wipes and dissolves as well as DVE moves, colour correction, keying, multi-layering, paint, graphics and more. All these can only be applied to the baseband, non-compressed signal. If storage is compressed, anything more than a cut requires a decode/recode cycle – which is not compatible with the drive for quality. However, choosing edit storage to match acquisition compression can work well in a restricted environment – such as news production.
If interoperability without any loss of quality is required in an unrestricted environment, then non-compressed 601 or 709 is the only solution. It is the most common format used in digital television and it keeps flexibility of operation and quality of pictures.
In the past, compression had appeal for its reduced requirements of storage and data handling. It was needed to supply low-cost, nonlinear solutions at a time when storage costs were high. This position has been hugely changed by the onward march of disk technology. As non-compressed SD requires 21 MB/s or 75 GB/h and a couple of hours with dual real-time bandwidth is good for fast operation, considerable demands are put on disk storage and networks. Today the costs of meeting such requirements are greatly reduced and no longer a practical barrier, hence the continuing trend toward non-compressed on-line for TV professionals.
For HD, the numbers are up to seven times larger at 560 GB/h – a factor that has been covered by just 2-3 years disk development. Already full bandwidth, non-compressed HD storage has been produced in a practical form from Quantel. This is a key enabler for the theme of non-compressed post production to be continued into the HD era.
An impact of technological development is divergence as individual applications are more exactly honed to particular requirements. Digital technology is enabling this in television. In particular, compression schemes that are successful in acquisition or transmission will not necessarily be as effective or applicable in post production.
Keeping post production non-compressed offers the best solution in the drive to provide the creative and technical quality needed to serve today’s digital viewers.
It also gives the flexibility to handle the inevitable multiple compression systems once the media is distributed.
The linking together of systems in a linear manner. In digital television this often refers to the concatenation of compression systems which is a subject of concern because any compression beyond about 2:1 results in the removal of information that cannot be recovered. As the use of compression increases, so too does the likelihood that material will undergo a number of compressions between acquisition and transmission. Although the effects of one compression cycle might not be very noticeable, the impact of multiple decompressions and recompressions – with the material returned to baseband in between – can cause considerable damage. The damage is likely to be greatest where different compression schemes are concatenated in a particular signal path.
[Diagram: concatenation in the programme chain – video rushes → edit → transmission]
Cutting together recorded material according to a prepared scheme such as a rough cut or EDL. EDLs can be used to directly control conforming in an online edit suite (auto-conforming). The time to conform varies widely: a tape-based suite takes much longer than the finished programme's running time, whereas a nonlinear online suite with true random access to all material can load the material in C-mode (the order in which it was recorded rather than the order required for the finished programme), conform in a moment and still allow any subsequent adjustments to be easily made.
Note that with in-server editing, material may be loaded onto the server as an independent task, rather than involving the edit equipment itself. This circumvents the loading time so further reducing the total time to produce the finished programme. The same is also true of nonlinear edit systems with the bandwidth to support background loading.
Clearing continuous space on a disk store to allow consistent recording. This generally involves the moving of data on the disks to one area, leaving the remainder free so that recording can proceed track-to-track – without having to make random accesses.
The larger the amount of data stored, the longer consolidation may take. Careful consideration must be given to large-capacity multi-user systems, such as video servers, especially when used for transmission or on-air.
The need for consolidation arises because of the store’s inability to continuously record television frames randomly at video rate. This is taken care of by Quantel’s Frame
Magic. Recording can take place over small, scattered areas of stores so there is no need for consolidation.
[Diagram: a consolidated disk store – consolidated data followed by free space available for recording]
Constant bit rate (CBR) compression
Many compression systems are used to create a fixed rate of output data. This is usually to fit within a given bandwidth such as that available on a video tape recorder or a constant bit rate transmission channel. With video, the useful information contained in the material varies widely both spatially and with movement. For example, a football match with crowds and grass texture as well as fast panning cameras typically contains far more information than a largely static head-and-shoulders shot of a newsreader. Using constant bit rate means that the quality is altered to reduce the information to fit the fixed bit rate. In the football case, the grass may go flat during a pan with the texture reappearing when the camera is still.
As overflowing the available bit rate could have disastrous results with bits being lost, the aim is always to use just under the available bit rate. The degree of success in almost filling the available space is a measure of the quality and efficiency of the compression system.
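The behaviour can be caricatured in a few lines – the ‘complexity’ numbers and the simple multiplicative quantiser search below are invented purely for illustration:

```python
def cbr_quantiser(complexity, budget_bits, q=1.0):
    """Coarsen the quantiser until the (modelled) coded size of the
    frame fits just under the fixed per-frame bit budget."""
    while complexity / q > budget_bits:
        q *= 1.1     # each step discards more detail (flatter grass)
    return q

q_news  = cbr_quantiser(4e6, 5e6)    # simple shot: fits untouched
q_match = cbr_quantiser(40e6, 5e6)   # busy shot: heavily quantised
```

The busy frame only fits by discarding detail, which is exactly the flattened-grass effect described above.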
Pictures, sound, text, graphics, etc., that are edited and ready for delivery to customers
– typically as programmes for television.
An unwanted artefact similar to posterisation. Digital video systems exhibit contouring when insufficient quantising levels or inaccurate processing are used, or poor truncation occurs. The result is that the picture’s brightness changes in steps that are too large and become visible over relatively even brightness areas – like sky.
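A quick numerical illustration: re-quantising a smooth 8-bit ramp to too few levels produces exactly these visible steps.

```python
def quantise(value, bits):
    """Reduce an 8-bit value (0-255) to the given number of bits."""
    step = 256 >> bits          # size of each quantising step
    return (value // step) * step

ramp = list(range(256))                      # smooth gradient
coarse = [quantise(v, 3) for v in ramp]      # only 8 levels: banding
```

Each band spans 32 input values, so an even sky-like gradient turns into broad flat stripes.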
A linear track recorded onto video tape at frame frequency as a reference for the running speed of a VTR, for the positioning or reading of the video tracks and to drive a tape counter. It is a magnetic equivalent of sprocket holes in film. One of the main purposes of ‘striping’ tapes is to record a continuous control track for the pictures and audio to be added later – as in insert editing. Control tracks are not used in disk recording and nonlinear editing.
A technique for controlling the position and rotation of pictures in a DVE by dragging their corners to fit a background scene: for example, to fit a (DVE) picture into a frame hanging on a wall. Corner pinning was developed by Quantel as a practical alternative to precisely setting the many parameters needed to accurately position a picture in 3D space. It works well with graphical user interfaces, e.g. pen and tablet. It can also be combined with the data derived from four-point image tracking to substitute objects in moving images, for example replacing the licence plate on a moving vehicle.
This is a sampling technique applied to colour difference component video signals (Y,
Cr, Cb) where the colour difference signals, Cr and Cb, are sampled at a sub-multiple of the luminance, Y, frequency – for example as in 4:2:2. If co-sited sampling is applied, the two colour difference signals are sampled at the same instant, and simultaneously with a luminance sample. Co-sited sampling is the ‘norm’ for component video as it ensures the luminance and the chrominance digital information is coincident, minimising chroma/luma delay.
[Diagram: 4:2:2 co-sited sampling]
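In sketch form (an illustrative helper, not standard terminology): along one line, every luminance sample is kept while Cr and Cb are taken at every second position, coincident with the even-numbered Y sites.

```python
def cosited_422(y, cr, cb):
    """Keep all Y samples; take Cr and Cb only at even sites,
    co-sited with Y0, Y2, Y4..."""
    return y[:], cr[0::2], cb[0::2]

line = list(range(8))
y_out, cr_out, cb_out = cosited_422(line, line, line)
```

Half as many chroma samples as luma per line – the 4:2:2 pattern – with each chroma pair aligned to a luma sample so there is no chroma/luma displacement.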
Cyclic Redundancy Check – an advanced checksum technique used to recognise errors in digital data. It uses a check value calculated for a data stream by feeding it through a shifter with feedback terms ‘EXORed’ back in. It performs the same function as a checksum but is considerably harder to fool.
A CRC can detect errors but not repair them, unlike an ECC. A CRC is attached to almost any burst of data which might possibly be corrupted. On disks any error detected by a CRC is corrected by an ECC. ITU-R BT.601 and 709 data is subjected to CRCs, if an error is found the data is concealed by repeating appropriate adjacent data. Ethernet packets also use CRCs.
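The shift-and-feedback idea can be shown directly. This is the common CRC-16/CCITT form (polynomial 0x1021, initial value 0xFFFF), written out bit by bit:

```python
def crc16_ccitt(data, crc=0xFFFF):
    """Bitwise CRC: shift each bit through a 16-bit register,
    XORing in the polynomial whenever the top bit falls out set."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:                      # feedback term set?
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

A single flipped bit anywhere in the data changes the check value, which is why a CRC is so much harder to fool than a simple checksum.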
Changing video material from one HD format to another. For example, going from
720/60P to 1080/50I is a cross conversion. Note that this involves both changing picture size and the vertical scan from 60 Hz progressive to 50 Hz interlaced. Similar techniques are used as for standards conversion but the HD picture size means the processing has to work five or six times faster.
Carrier Sense Multiple Access with Collision Detection – a flow control used in Ethernet and standardised in IEEE 802.3 that sounds worse than it is. Simply put, it is like polite conversation among many people: only one person talks at once, others listen but do not interrupt; should two start together they both stop, wait a random period and only start again if no one is talking. Clearly, with many people (users) wishing to communicate it may take a while to deliver a message, which explains why Ethernet can connect many users but its data transfers are not deterministic.
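The ‘wait a random period’ step is a truncated binary exponential backoff; a sketch of the 802.3 rule (slot counting only, other details simplified):

```python
import random

def backoff_slots(collisions, rng=None):
    """After the n-th successive collision, wait a random number of
    slot times drawn from 0 .. 2^min(n, 10) - 1."""
    rng = rng or random.Random()
    k = min(collisions, 10)          # the exponent is capped at 10
    return rng.randrange(2 ** k)
```

Because the wait is random and grows with congestion, total delivery time cannot be guaranteed – the non-deterministic behaviour noted above.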
A transition at a frame boundary from one clip or shot to another. On tape a cut edit is performed by recording (dubbing) the new clip at the out-point of the last, whereas with
Frame Magic’s true random access storage no re-recording is required – there is simply an instruction to read frames in a new order. Simple nonlinear disk systems may need to shuffle, or de-fragment, their recorded data in order to achieve the required frame-to-frame access for continuous replay.
The editable frame boundaries may be restricted by video coding systems such as PAL,
NTSC, SECAM and MPEG-2. Non-compressed component video and that compressed using I-frame only compression (e.g. DV, motion JPEG or I-only MPEG-2) can be edited on any frame boundary without additional processing.
A format for digital video tape recording working to the ITU-R BT.601, 4:2:2 standard using 8-bit sampling. The tape is 19 mm wide and allows up to 94 minutes to be recorded on a cassette.
Being a component recording system it is ideal for studio or post production work with its high chrominance bandwidth allowing excellent chroma keying. Also multiple generations are possible with very little degradation and D1 equipment can integrate without transcoding to most digital effects systems, telecines, graphics devices, disk recorders, etc. Despite the advantages, D1 equipment is not extensively used in general areas of TV production due, at least partly, to its high cost.
A VTR standard for digital composite (coded) PAL or NTSC signals. It uses 19 mm tape and records up to 208 minutes on a single cassette. Neither cassettes nor recording formats are compatible with D1. Being relatively costly and not offering the advantages of component operation the format has fallen from favour. VTRs have not been manufactured for some years.
A VTR standard using half-inch tape cassettes for recording digitised composite (coded)
PAL or NTSC signals sampled at 8 bits. Cassettes are available for 50 to 245 minutes.
Since this uses a composite signal the characteristics are generally as for D2 except that the half-inch cassette size has allowed a full family of VTR equipment to be realised in one format, including a camcorder.
There is no D4. Most DVTR formats hail from Japan where 4 is regarded as an unlucky number.
A VTR format using the same cassette as D3 but recording component signals sampled to
ITU-R BT.601 recommendations at 10-bit resolution. With internal decoding D5 VTRs can play back D3 tapes and provide component outputs.
Being a non-compressed component digital video recorder means D5 enjoys all the performance benefits of D1, making it suitable for high-end post production as well as more general studio use. Besides servicing the current 625 and 525 line TV standards the format also has provision for HDTV recording by use of about 4:1 compression (D5 HD).
A digital tape format which uses a 19 mm helical-scan cassette tape to record non-compressed High Definition Television material. D6 is the only High Definition recording format defined by a recognised standard. The Thomson VooDoo Media Recorder is based on D6 technology.
D6 accepts both the European 1250/50 interlaced format and the Japanese 260M version of the 1125/60 interlaced format which uses 1035 active lines. ANSI/SMPTE
277M and 278M are D6 standards.
This is assigned to DVCPRO.
This is assigned to Digital-S.
This refers to Sony’s MPEG IMX VTRs that record I-frame only, 4:2:2-sampled MPEG-2 SD video at 50 Mb/s onto half-inch tape. In bit rate, this places IMX between Betacam SX and Digital Betacam. A Gigabit Ethernet card is available, which has caused some to dub it the eVTR as it can be considered more as a ‘storage medium’ for digital operations.
The HDCam VTR format has been assigned D11.
This is assigned to DVCPRO 100, or DVCPRO HD.
This is a file system that rotates and delivers its content into a network at a defined point in a cycle – for example, teletext pages. It is a method to make a large amount of information or data files available within a reasonably short time after a request. The data is inserted into the digital broadcast transport stream.
Machines designed to record and replay data. They usually include a high degree of error correction to ensure that the output data is absolutely correct and, due to their recording format, the data is not easily editable. This compares with video recorders which will conceal missing or incorrect data by repeating adjacent areas of picture and which are designed to allow direct access to every frame for editing.
Where data recorders are used for recording video there has to be an attendant ‘work station’ to produce signals for video and audio monitoring, whereas VTRs produce the signals directly. Although many data recorders are based on VTRs’ original designs, and vice versa, VTRs are more efficient for pictures and sound while data recorders are most appropriate for data. They are useful for archiving and, as they are format-independent, can be used in multi-format environments.
Sometimes used as a generic term for film scanner.
SMPTE Task Force On Digital Cinema. DC 28 is intended to aid digital cinema development by determining standards for picture formats, audio standards and compression, etc.
Although the HD-based digital cinema presentations have been very well received, yet higher standards may be proposed to offer something in advance of today’s cinema experience.
Advances in electronic projectors have created the possibility of cinemas operating without film. Similarly, advances in telecoms networking and server technology enable the delivery, local storage and replay of high-quality digital movie and television content.
The projector designs are mainly based on Digital Micromirror Device (DMD) and Image Light Amplifier (ILA) technology. DMD is fundamentally a pixelated display and high-end ILA is a raster scanned display. Both have their merits and both are challenging film projection in terms of screen quality. At the high end, DMD and ILA-based devices can handle many kilowatts of light; they challenge 35 mm print film projection in brightness and contrast, and their resolution challenges the film process. Colourimetry and absolute black level remain somewhat of an issue. If digital (non-film) movie making becomes technically and commercially viable, matching film colourimetry at the theatre may not be necessary.
At the low end, similar low-cost versions of these and other technologies now offer simple reliable projection of acceptable quality images suitable for sports bar, cinema club and home theatre environments.
Digital Cinema projectors can generally display 24 frame-per-second, high-resolution images as well as HDTV and SDTV. Real-time image processing is necessary to change the size of the source image to that of the projected image. Many technologies are available for this.
Accommodating a range of image sources within the Digital Cinema allows the creation of new business models for the extended use of auditoriums; such as live showing of sports events, interactive large-audience events and many others. Similarly, low-cost projectors enable small outlets for the independent filmmaker before, or instead of, release to the major cinemas, whether film or digital.
Supplying material to the projector may be by a local server for movies etc. or via cable or satellite for live television. Material such as movies can be loaded into the local servers via dedicated telecom networks, Internet or even via DVD ROMs.
One enabler for digital cinemas is digital compression. No open standards have yet been agreed but a data rate of the order of 50 Mb/sec and an image size of the order of
2k x 1k pixels have been favoured.
Digitisation of ‘film’ is also possible at the studio or the location shoot. Recent advances in acquisition technology allow real-time, high-resolution images to be captured and recorded, at 1920 x 1080 HD, 24 fps progressive. Video image capture should not be confused with electronic 24 fps acquisition. The latter employs film-like techniques, for example, capturing complete images in a single shot (not as interlaced scans) and recording them as progressive segmented frames. It also allows image sensors to
smear the image in a filmic manner – blurring motion. Digital high-resolution acquisition will get cheaper. To date Sony HDCam, Panasonic DVCPRO HD and Thomson Viper are the main contenders, all with 24P scanning.
Low acquisition costs will impact the economic model of movie making. Whilst there is no substitute for full film crews ensuring the attention to detail of lighting, sets, continuity and camerawork, for some productions a smaller ‘video style’ crew may suffice.
Quality somewhere between progressive scan SDTV and HDTV may be sufficient if the production is unlikely to be transferred back to film or where the storyline is so compelling that the image quality is of lesser importance. Low cost independently produced movies could be made and distributed using ‘e-film’ and Digital Cinema.
For image resolution, the ideal is for the camera to use the same size or a larger image matrix than the projector. This assists transparency from source to screen.
Using the same compression technology for cameras/recorders, transmission and in the servers could also help avoid or reduce image artefacts caused by concatenation. But, given sufficient overhead at acquisition, mixed compression schemes may suffice.
Ultimately there is no substitute for high data rate but it incurs costs at acquisition, distribution and display. Today, post production of whole motion pictures generally operates with non-compressed 2k images. Keeping that quality through to the cinema screen will challenge the business model.
Benefits of Digital Cinema could be unattended operation, flexibility of programme content, elimination of the multiple copies of film, choice of distribution means, digital transparency and digital security through the distribution chain. In turn this could create new business opportunities at venues large and small.
The whole process is the subject of many worldwide demonstrations. Unlike the relative simplicity and universality of 35 mm film, digital technologies are subject to rapid development and intense competition. In spite of the differences, industry groups such as
SMPTE have brought together interested parties to develop a set of technical standards from scene to screen. These are important if Digital Cinema or e-film is to match the longevity of film.
DCT
Discrete Cosine Transform – widely used as the first stage of compression of digital video pictures. DCT operates on blocks of the picture (usually 8 x 8 pixels), resolving them into frequencies and amplitudes. In itself DCT may not reduce the amount of data but it prepares it for following processes that will. JPEG, MPEG and DV compression depend on DCT.
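The principle can be sketched with a direct (and deliberately slow) pure-Python 2-D DCT-II on an 8 x 8 block; the function name is illustrative only. A flat block of identical pixel values transforms into a single DC coefficient with all other frequencies at zero – which is why DCT prepares typical picture data so well for later data reduction.

```python
import math

def dct_8x8(block):
    """2-D DCT-II of an 8x8 block of pixel values, as used in JPEG/MPEG/DV."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(0.5) if u == 0 else 1.0
            cv = math.sqrt(0.5) if v == 0 else 1.0
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * cu * cv * s
    return out

# A flat mid-grey block resolves to a single DC coefficient:
flat = [[128] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
# coeffs[0][0] is 1024.0 (8 * 128); every other coefficient is ~0
```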
Data recorders
Using D2 tape, data recorders have been developed offering (by computer standards) vast storage of data (which may be images). A choice of data transfer rates is available to suit computer interfaces. Like other computer storage media, images are not directly viewable, and editing is difficult.
dB (Decibel)
Units of measurement expressing ratios of power that use logarithmic scales to give results related to human aural or visual perception. Many different attributes are given to a reference point termed 0 dB – for example a standard level of sound or power with subsequent measurements then being relative to that reference. Many performance levels are quoted in dB – for example signal to noise ratio (S/N).
Decibels are given by the expression:
10 log10 (P1/P2)
where power levels P1 and P2 could be audio, video or any other appropriate values.
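The expression is easily evaluated; the helper name below is illustrative only:

```python
import math

def db(p1, p2):
    """Ratio of two power levels P1 and P2 in decibels: 10 log10(P1/P2)."""
    return 10 * math.log10(p1 / p2)

print(round(db(2, 1), 2))   # 3.01 – doubling the power adds about 3 dB
print(db(1_000_000, 1))     # 60.0 – e.g. a 60 dB signal-to-noise ratio
```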
Delta editing
A very efficient form of server-based nonlinear editing where only the new media created in the editing process is sent back to the server. For cuts-only editing, the stored result would simply be a shot list with durations. If transitions are created, such as dissolves, wipes, DVE moves etc., these represent new frames that are processed by the editing workstation and sent to the server to be included as a part of the shot list.
Delta editing contrasts with dub editing or some NLE technology where every frame in the final edit has to be copied to a new file.
Diagnostics
Tests to check the correct operation of hardware and software. As digital systems continue to become more complex, built-in automated testing becomes an essential part of the equipment. Some extra hardware and software has to be added to make the tests operate. Digital systems with such provisions can often be quickly assessed by a trained service engineer, so speeding repair.
DiBEG
Digital Broadcasting Experts Group, founded September 1997 to drive the growth of digital broadcasting and its international take-up by promoting the exchange of technical information and international co-operation. It is predominantly made up of
Japanese manufacturers and the Japanese Ministry of Posts and Telecoms and has produced ISDB, a specification for digital broadcasting in Japan.
Digital Betacam
A development of the original analogue Betacam VTR which records SD video and audio digitally onto a Betacam-style cassette. It uses mild intra-field compression to reduce the ITU-R BT.601 sampled video data by about 2:1. Some models can replay both digital and analogue Betacam cassettes.
Digital cinematography
Shooting ‘film’ digitally. The largest digital acquisition format available is HD at 1920 x
1080 pixels – slightly less than 2k. However, this has already been used with great success on whole features as well as an alternative to 35 mm and Super 16 mm for television. Camcorders used to date have been Sony’s CineAlta and Panasonic’s
Varicam cinema cameras (1280 x 720 but with variable frame rate up to 60 Hz).
All have film-style attachments such as feature lenses.
Some have not cared so much for the digital approach because of the television front-end processing, which is seen as limiting the room for downstream adjustments, the use of compression, less latitude (compared with camera negative) and increased depth of field due to using smaller light sensors than a 35 mm frame. The more recent introduction of the Viper FilmStream Data camera from Thomson, which allows the output of almost raw 10-bit log RGB data from its CCDs, has caused a good deal of interest thanks to its greater latitude (closer to film) and potential for later adjustment as it can output 10-bit
log images. However, recording this data requires an attendant disk pack which is not as compact as a camcorder configuration. Viper can be used in both TV and film modes.
Websites: www.sonybiz.net/cinealta www.panasonic.com/pbds www.thomsongrassvalley.com/products/cameras/viper
Digital Disk Recorder (DDR)
Disk systems that record digital video. Their application is often as a replacement for a
VTR or as video caches to provide extra digital video sources for far less cost than a
DVTR. They have the advantage of not requiring pre-rolls or spooling but they are not necessarily able to randomly access video frames in real time.
Digital keying and chroma keying
Digital keying differs from analogue chroma keying in that it can key uniquely from any one of the billion colours of component digital video. It is then possible to key from relatively subdued colours, rather than relying on highly saturated colours which can cause colour-spill problems on the foreground.
A high quality digital chroma keyer examines each of the three components Y, B-Y, R-Y of the picture and generates a linear key for each. These are then combined into a composite linear key for the final keying operation. The use of three keys allows much greater subtlety of selection than with a chrominance-only key.
Digital mixing
Digital mixing requires ‘scaling’ each of two digital signals and then adding them. A and
B represent the two TV signals and K the positional coefficient or value at any point of the mix between them (i.e. equivalent to the position of the transition arm on a switcher desk). In a digital system, K will also be a number, assumed here as 8-bit resolution to provide a smooth mix or dissolve.
Mathematically this can be shown as:
A x K = (Mix)1
B x (1-K) = (Mix)2
Result = (Mix)1 + (Mix)2
Note that such maths also applies to soft edge keys and any transparency created between two images. As such it is a fundamental part of video processing and good quality results are essential.
When two 8-bit numbers are multiplied together, the result is a 16-bit number (see
Binary). When mixing, it is important to add the two 16-bit numbers to obtain an accurate result. This result must then be truncated or rounded to 8 bits for transmission to other parts of the digital system.
Truncation by simply dropping the lower bits of the partial results (Mix)1 or (Mix)2 – to 10, or even 12 or 14 bits – will introduce inaccuracies. Hence it is important that all partial results, e.g. (Mix)1 and (Mix)2, maintain 16-bit resolution. The final rounding of the result to 8 bits can reveal visible 1-bit artefacts – but these can be avoided with Dynamic Rounding.
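The mix arithmetic can be sketched in code. The function name is illustrative, and the K/255 scaling used here is just one convention (real hardware may scale by K/256 with compensation); the point is that the two full-precision partial results are added before a single rounding step back to 8 bits.

```python
def mix8(a, b, k):
    """Digital mix of two 8-bit samples a and b under 8-bit coefficient k.
    The 16-bit-range partial products are summed at full precision and
    only the final result is rounded back to 8 bits."""
    assert 0 <= a <= 255 and 0 <= b <= 255 and 0 <= k <= 255
    partial = a * k + b * (255 - k)   # full-precision intermediate
    return (partial + 127) // 255     # one rounding step at the end

print(mix8(200, 100, 255))  # 200 – K at maximum: all A
print(mix8(200, 100, 0))    # 100 – K at minimum: all B
print(mix8(200, 100, 128))  # 150 – mid-point of the dissolve
```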
Digital-S (D9)
Assigned as D9, this is a half-inch digital tape format which uses a high-density metal particle tape running at 57.8 mm/s to record a video data rate of 50 Mb/s. The tape can be shuttled and searched up to x32 speed. Video, sampled at 4:2:2, is compressed at
3.3:1 using DCT-based intra-frame compression. Two audio channels are recorded at
16-bit, 48 kHz sampling; each is individually editable. The format also includes two cue tracks and four further audio channels in a cassette housing with the same dimensions as VHS.
Digitiser
A system which converts an analogue input to a digital representation. Examples include analogue to digital converters (ADCs), touch tablets and mice. Some of these, mouse and touch tablet for example, are systems which take a spatial measurement and present it to a computer in a digital format.
Digitising time
Time taken to record footage into a disk-based editing system. The name suggests the material is being played from an analogue source which, with the rapidly increasing use of DVTRs, is less and less the case. A better term is ‘loading’. Use of high-speed networking may enable background loading – eliminating digitising time at an edit suite.
Digitising time is often regarded as dead time but it need not be. It can be reduced if some initial selection of footage has been made – for example by logging. Also, footage can be marked while loading and so be instantly available as a rough cut on completion, so speeding the final edit. The process is sometimes referred to as Triage, particularly where it is used to select and pre-edit clips from a live feed.
Discrete 5.1 Audio
This reproduces six separate (discrete) channels – Left, Centre, Right, Left Rear, Right Rear, and sub-woofer (the .1). All five main channels have full frequency response which, together with the separate low-frequency sub-woofer, creates a three-dimensional effect.
Disk drives (fixed)
Hard or fixed disk drives comprise an assembly of up to 12 rigid platters coated with magnetic oxide, each capable of storing data on both sides. Each recording surface has an associated read/write head, and any one may be activated at a given instant.
Disk drives give rapid access to vast amounts of data and are highly reliable as they have only two moving parts – the swinging head assembly and the spinning disk. They can be written and read millions of times. The use of disks to store video has changed many aspects of production, editing and transmission.
For high capacity, disks pack data very tightly indeed. Areal density, the amount of data stored per unit area of the disk surface, is one measure of technology. Currently available drives achieve up to 15 Gb per square inch but reportedly, several times that has already been achieved in the lab, so delivered capacities are still set to grow.
For this the heads float only a few molecules off the disk surface, so that even minute imperfections in the surface can cause heating of the head assembly. Track widths are minute – currently up to 31,000 TPI (tracks per inch). As a result, high capacity disk drives have to be handled with great care, especially when running. Vibration could easily send heads off-track or crashing into the disk surface – with possible terminal consequences.
Disk drive capacity 2000-2010
It is predicted that, in this decade, development will continue at the same pace as the last decade. The graph shows this at both the 20-year historic rate of doubling every two years (41%/year) and the 60%/year achieved more recently. In late 2002 we are still tracking the 60% line.
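The two growth rates quoted above can be checked with a little compound-growth arithmetic (the function name is illustrative):

```python
import math

def years_to_double(annual_growth):
    """Years for capacity to double at a given compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

print(round(years_to_double(0.41), 2))  # 2.02 – 41%/year doubles in ~2 years
print(round(years_to_double(0.60), 2))  # 1.47 – 60%/year doubles in ~18 months
```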
In general this means ever-higher capacities continuing to become available in smaller packages at lower cost per megabyte. Those currently available range from about 20 to
180 GB for 3.5-inch drives. Peak data transfer rates, which are a factor of both data density along the tracks and rotational speed, range from 26-46 MB/s. Average rates are quoted up to 36 MB/s – clearly enough to sustain uncompressed ITU-R BT.601
video. The 156 MB/s for HD requires the use of an array of disks.
While capacities grow and data transfers become faster, access time changes relatively little. Increases in the rotation speed from 10,000 RPM to approximately 15,000 RPM, expected soon, will help reduce access times. For redundant storage and multi-channel bandwidth performance it is necessary to use specialist RAIDs. These will move on to handle more demanding requirements for multiple channels and high definition.
Display resolutions
The computer industry has developed a series of display resolutions (see below) which span television’s SD and HD, and QXGA is identical to the 2k image size used for digital film*. The availability of hardware to support these resolutions has benefited, and will continue to benefit, television and digital film. There is already a QXGA projector on offer.
All use square pixels and none correspond exactly to television formats so attention to size and aspect ratio is needed when using computer images on TV and vice versa.
                  Pixels          M pixels   Aspect ratio
VGA               640 x 480       0.3        4:3
SVGA              800 x 600       0.5        4:3
XGA               1024 x 768      0.8        4:3
SXGA              1280 x 1024     1.3        5:4
UXGA              1600 x 1280     2.0        5:4
QXGA              2048 x 1536     3.1        4:3

Video SD          720 x 576 (not square pixels)
                  720 x 480 (not square pixels)
Video HD          1920 x 1080
2k digital film   2048 x 1556
* The image area of Full Frame film 35 mm images is usually scanned to occupy 2048 x
1536 pixels. The extra 20 lines scan the black strip between successive frames which only carries image information if film is shot with an open gate.
Dither
In digital television, analogue original pictures are converted to digits: a continuous range of luminance and chrominance values is translated into a finite range of numbers.
While some analogue values will correspond exactly to numbers, others will, inevitably, fall in between. Given that there will always be some degree of noise in the original analogue signal the numbers may dither by one Least Significant Bit (LSB) between the two nearest values. This has the advantage of providing a means by which the digital system can describe analogue values between LSBs to give a very accurate digital rendition of the analogue world.
Analogue output of original analogue input via ADC/DAC process
If the image is produced by a computer, or is the result of digital processing, the dither may not exist – which can lead to contouring effects. With the use of Dynamic
Rounding dither can be intelligently added to pictures to give more accurate, better looking results.
DLP
(Texas Instruments Inc) Digital Light Processing technology, the projection and display technology which uses DMDs as the light modulator. It is a collection of electronic and optical subsystems which enable picture information to be decoded and projected as high-resolution digital colour images. DLP technology enables very compact, high brightness projectors to be made and more than one million systems had been sold by early 2002. It has already been used in high definition television sets in the USA.
Lower cost widescreen digital sets will become available in Europe in 2002.
See also: DLP Cinema, DMD
DLP Cinema
(Texas Instruments Inc) DLP Cinema technology is a version of DLP technology specifically developed for digital electronic movie presentation. It contains extended colour management and control, and enhanced contrast performance. By the middle of
2002 more than 100 DLP Cinema technology-based projectors had been installed in commercial cinemas around the world showing ‘first run’ feature films.
DMD
(Texas Instruments Inc) Digital Micromirror Device. A silicon integrated circuit used to modulate light in a wide variety of applications. The most common use is in electronic projection systems where one or more devices are used to create high quality colour images. The device is a memory circuit whose elements are arranged in a display format array. Each element contains a square aluminium mirror which can tilt about its diagonal axis. The content of the memory cell causes the mirror to move from one tilt position to the other. By changing the memory data, the mirror can be switched very rapidly to create pulses of light whose duration causes the pixel to appear at a particular brightness. DMDs are produced at different sizes according to the resolution required.
The smallest contains over 500,000 mirrors. Devices with 1280 x 1024 (SXGA) have been widely used in digital cinema applications.
See also: DLP, DLP Cinema
Dolby Digital (DD/AC-3)
A digital audio compression system that uses auditory masking for compression.
It works with from 1 to 5.1 channels of audio and can carry Dolby Surround coded two-channel material. It applies audio masking over all channels and dynamically allocates bandwidth from a ‘common pool’. Dolby Digital is a constant bit rate system supporting rates from 64 kb/s to 640 kb/s; typically 64 kb/s for mono, 192 kb/s for two-channel, 320 kb/s for 35 mm cinema 5.1, 384 kb/s for Laserdisc/DVD 5.1 and 448 kb/s for DVD 5.1.
DVD players and ATSC receivers with Dolby Digital capability can provide a backward-compatible mix-down by extracting the five main channels and coding them into analogue Dolby Surround for Pro Logic playback.
Dolby E
An audio compression scheme which can encode/decode up to eight channels plus metadata – typically 5.1 mix (six channels) and Rt/Lt (Right Total/Left Total surround) or stereo two-channel mix, etc. – onto two AES/EBU bitstreams at 1.92 Mb/s (20-bit audio at 48 kHz). Thus video recorders, typically with four channels, can support the greater channel requirements of DVD and some DTV systems (e.g. ATSC). With audio frames matching video frames, Dolby E is a professional distribution coding system for broadcast and post production which maintains quality up to 10 code/recode cycles.
See also: Dolby Digital
Dolby Surround (a.k.a. Dolby Stereo, Dolby 4:2:4 Matrix)
Analogue coding of four audio channels – Left, Centre, Right, Surround (LCRS), into two channels referred to as Right Total and Left Total (Rt, Lt). On playback, a Dolby Surround
Pro Logic decoder converts the two channels to LCRS and, optionally, a sub-woofer channel. The Pro Logic circuits are used to steer the audio and increase channel separation. The Dolby Surround system, originally developed for the cinema, is a method of carrying more audio channels but suffers from poor channel separation, a mono, limited-bandwidth surround channel and other limitations.
A Dolby Surround track can be carried by analogue audio or linear PCM, Dolby Digital and MPEG compression systems.
Dominance
Field dominance defines whether field type 1 or type 2 represents the start of a new
TV frame. Usually it is field 1 but there is no fixed rule. Dominance may go unnoticed until flash fields occur at edits made on existing cuts. Replay dominance set the opposite way to the recording can cause a juddery image display. Much equipment, including Quantel’s, allows the selection of field dominance and can handle either.
Down conversion
Down conversion is down-resing that includes changing vertical refresh rates.
For instance, moving from 1080/60I to 576/50I is a down-conversion.
Down-res
Decreasing the size of video images to fit another format. Typically this reduces an HD format to an SD format and, as the input images represent over-sampled versions of the output, the final quality should be excellent – better than an SD-shot original. Moving from 1080/60I to 480/60I is down-resing. Technically the process involves: spatial interpolation to reduce size while retaining quality, colour correction to compensate for the difference in HD and SD colour standards and possibly re-framing to fit 16:9 HD onto 4:3 SD. Note that down-res does not include any change of frame rate.
DRAM
Dynamic RAM (Random Access Memory). High density, cost-effective memory chips
(integrated circuits). RAMs are used extensively in computers and generally in digital circuit design. In digital video equipment they also make up stores to hold pictures.
Being solid state there are no moving parts and they offer the fastest access for data.
Each bit is stored on a single transistor and DRAM chips must be powered and clocked to retain data. Synchronous DRAM (SDRAM) is faster, running up to 200 MHz clock rate.
DDR SDRAM is Double Data Rate SDRAM and is increasing the performance of many of the newer PC and graphics products. Current available capacities are up to 512 Mb per chip. Their fast access has allowed DRAM to replace more expensive SRAM in some applications.
Drop-frame timecode
Alteration of timecode to match the 1000/1001 speed offset of NTSC transmissions and many newer HD video formats used in NTSC countries. 525-line NTSC at a nominal
30 fps actually runs at 29.97 fps and 1080-line HD uses the same frame rate. Even the
24 fps of film gets modified to 23.97. With timecode locked to the video, it needs to make up 1 in 1001 frames. It does this by counting two extra frames every minute while the video remains continuous. So 10:35:59:29 advances to 10:36:00:02. In addition, at every ten-minute point the jump is not done. This brings the timecode time almost exactly into step with the video.
Timecode that does not use drop-frame is called non-drop-frame timecode.
Confusion arises when the wrong one is used!
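The counting rule above – skip two frame numbers at each minute change, except at the ten-minute marks – can be sketched as a frame-count-to-timecode conversion. The function name is illustrative; 17982 is the number of real frames in ten minutes at 29.97 fps, and 18 counts are skipped per ten-minute block.

```python
def dropframe_tc(frame_number):
    """SMPTE drop-frame timecode (NTSC 29.97 fps, counted at nominal 30).
    Frame counts ;00 and ;01 are skipped at each minute change, except
    at the ten-minute marks (minutes 0, 10, 20, 30, 40, 50)."""
    ten_min_blocks = frame_number // 17982
    rem = frame_number % 17982
    adjusted = frame_number + 18 * ten_min_blocks
    if rem >= 2:
        # two extra counts for each elapsed non-tenth minute in this block
        adjusted += 2 * ((rem - 2) // 1798)
    frames = adjusted % 30
    seconds = (adjusted // 30) % 60
    minutes = (adjusted // 1800) % 60
    hours = (adjusted // 108000) % 24
    return f"{hours:02d}:{minutes:02d}:{seconds:02d};{frames:02d}"

print(dropframe_tc(1799))   # 00:00:59;29 – last frame of minute zero
print(dropframe_tc(1800))   # 00:01:00;02 – counts ;00 and ;01 are skipped
print(dropframe_tc(17982))  # 00:10:00;00 – no skip at the ten-minute mark
```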
DSL
Digital Subscriber Line. A general term for a number of techniques for delivering data over the telephone local loop (exchange to user). Referred to generically as xDSL these offer much greater data speeds than modems on analogue lines – up to 6 Mb/s.
DSS
Digital Satellite Service. One of the terms used to describe DTV services distributed via satellite.
DTF and DTF-2
Digital Tape Format for storing data on half-inch cassettes at high data density on the tape and offering fast read and write speeds. Generally it is used for long-term filebased storage and the modern DTF-2 can store 200 GB (uncompressed) per cassette with a sustained data rate of 24 MB/s. In television/digital film applications DTF is often used as the archive in a facility with networked workstations.
DTT
Digital Terrestrial Television. A term used in Europe to describe the broadcast of digital television services using terrestrial frequencies.
Dual link
The bandwidth of SDI and HD-SDI links allows the transport of uncompressed 4:2:2 sampled video and embedded digital audio. Dual links are often used to carry larger requirements – such as video with key (4:2:2:4), RGB (4:4:4) and RGB with key (4:4:4:4).
A dual link is arranged to allow some meaningful monitoring of each of the two links with standard equipment. So RGB is sent with Link A carrying full bandwidth G, half R and B (4:2:2). Link B is just half bandwidth R and B (0:2:2). RGB + Key is sent as (4:2:2) and (4:2:2).
The dual link arrangement for SD is defined in ITU-R/BT.799-2 and RP 175-1997.
Duplex
(Full duplex) refers to communications that are simultaneously two-way (send and receive) – like the telephone. Those referred to as half-duplex switch between send and receive – e.g., single-channel recorders either play or record. Multi-channel servers can be full duplex.
DV
This digital VCR format is a co-operation between Hitachi, JVC, Sony, Matsushita,
Mitsubishi, Philips, Sanyo, Sharp, Thomson and Toshiba. It uses 6.35 mm (quarter-inch) wide tape in a range of products to record 525/60 or 625/50 video for the consumer
(DV) and professional markets (Panasonic’s DVCPRO and Sony’s DVCAM). All models use digital intra-field DCT-based ‘DV’ compression (about 5:1) to record 8-bit component digital video based on 13.5 MHz luminance sampling. The consumer versions and DVCAM sample video at 4:1:1 (525/60) or 4:2:0 (625/50) and provide two 16-bit/48 or 44.1 kHz, or four 12-bit/32 kHz audio channels onto a 4-hour
30-minute standard cassette (125 x 78 x 14.6 mm) or smaller 1-hour ‘mini’ cassette (66 x 48 x 12.2 mm). The video data rate is 25 Mb/s.
DVB
Digital Video Broadcasting, the Group, with over 200 members in 25 countries, which developed the preferred scheme for digital broadcasting in Europe. The DVB Group has put together a satellite system, DVB-S, that can be used with any transponder, current or planned, a matching cable system, DVB-C, and a digital terrestrial system, DVB-T. DVB is the predominant digital TV standard around the world. Notable exceptions are ATSC in the USA and ISDB in Japan (which is similar to DVB).
DVB over IP
Expression for delivery of digital television services (DVB) to homes over broadband IP networks. Typically this will be over cable so that the supplier can achieve the ‘triple play’ – bundling voice (over-IP) telephone as well as Internet with the television service.
This has great potential for interactive television as it includes a built-in fast return link to the service provider.
DVB-T
DVB-T is a transmission scheme for terrestrial digital television. Its specification was approved by ETSI in February 1997 and DVB-T services started in the UK in autumn 1998.
As with the other DVB standards, MPEG-2 sound and vision coding are used. It uses
Coded Orthogonal Frequency Division Multiplexing (COFDM), which spreads the signals over a large number of carriers to enable it to operate effectively in very strong multipath environments. The multipath immunity of this approach means that DVB-T can operate an overlapping network of transmitting stations with a single frequency.
In the areas of overlap, the weaker of the two received signals is rejected.
DVCAM
Sony’s professional variant of native DV which records a 15-micron (15 × 10⁻⁶ m, fifteen thousandths of a millimetre) track on a metal evaporated (ME) tape. DVCAM uses DV compression of a 4:2:0 signal for 625/50 (PAL) sources and 4:1:1 for 525/60 (NTSC).
Audio is recorded in one of two forms – four 12-bit channels sampled at 32 kHz, or two
16-bit channels sampled at 48 kHz.
DVCPRO
Panasonic’s development of native DV which records an 18-micron (18 × 10⁻⁶ m, eighteen thousandths of a millimetre) track on metal particle tape. DVCPRO uses native DV compression at 5:1 from a 4:1:1, 8-bit sampled source. It uses 12 tracks per frame for
625/50 sources and 10 tracks per frame for 525/60 sources. Tape speed is 33.8
mm/s and video data rate 25 Mb/s. It includes two 16-bit digital audio channels sampled at 48 kHz and an analogue cue track. Both Linear (LTC) and Vertical Interval
Time Code (VITC) are supported.
DVCPRO 50
In many ways this is a x2 variant of DVCPRO with a tape speed of 67.7 mm/s, a video data rate of 50 Mb/s and using 3.3:1 video compression, it is aimed at the studio/higher quality end of the market. Sampling is 4:2:2 to give enhanced chroma resolution, useful in post production processes (e.g. chroma keying). Four 16-bit audio tracks are provided.
DVCPRO HD
A series of VTRs for use with HDTV, in many ways these are x2 variants of DVCPRO 50 with a tape speed of 135.4 mm/s and a data rate of 100 Mb/s. Sampling is 4:2:2.
There are eight 16-bit, 48 kHz audio tracks. Formats supported include 1080I and 720P.
DVD
Digital Versatile Disk – a high-density development of the compact disk. It is the same size as a CD but stores upwards of 4.38 GB of actual data (seven times CD capacity) on a single-sided, single-layer disk. DVDs can also be double-sided or dual-layer – storing even more data.
The capacities commonly available at present:
DVD-5    Single-sided, single-layer    4.38 GB
DVD-9    Single-sided, dual-layer      7.95 GB
DVD-10   Double-sided, single-layer    8.75 GB
DVD-18   Double-sided, dual-layer      15.9 GB
DVD-5 and DVD-9 are widely used. However the double-sided disks are quite rare, partly because they are more difficult to make and they cannot carry a label.
Future versions will have capacities rising from the current 4.38 GB to 15 GB per layer with blue laser technology in the medium term. Blu-ray disk technology now promises
27 GB per side and up to 50 GB in future.
DVD-R
Recordable DVDs with a data capacity of 4.38 GB. These are popular and low in price.
DVD-RAM
Re-recordable DVD. This is a record-many-times (around 100,000) DVD with a capacity of
4.38 GB (single-sided). For television, this medium is being used in some new camcorders and offers instant access to shot material and record loop features – useful when waiting to record an event, like a goal, to happen. At home it could provide a removable media alternative to VHS. A particular feature is that it can record and replay at the same time.
DVD-Video
This combines the DVD optical disk with MPEG-2 video compression for recording video on a CD-sized disk and has multi-channel audio, subtitles and copy protection capability.
To maximise quality and playing time DVD-Video uses variable bit rate (VBR) MPEG-2 coding where the bit rate varies with the demands of the material. Typically a 525/60 TV format, 24 fps movie would use an average bit rate of 3.5 Mb/s, but for sections with a great deal of movement it could peak at 8 or 9 Mb/s. Only 24 fps are coded onto the disk, the 3:2 pull-down conversion to 30 fps being performed in the player. This allows a
120 minute 24 fps movie to fit on a DVD-5. To store video (not film) with 50 or 60 discrete fields per second, the bit rate tends to average around 6 or 7 Mb/s, but again depends on the running time, original material and desired picture quality.
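A rough bit-budget check shows why a 120-minute, 24 fps movie at an average 3.5 Mb/s fits on a DVD-5. The 0.4 Mb/s audio allowance below is an illustrative assumption, not a DVD specification figure:

```python
# Bit budget: (video + audio rate in Mb/s) x running time, converted to GB
video_mbps = 3.5
audio_mbps = 0.4            # assumed allowance for a Dolby Digital track
seconds = 120 * 60
total_gb = (video_mbps + audio_mbps) * seconds / 8 / 1024
print(round(total_gb, 2))   # 3.43 – comfortably inside 4.38 GB
```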
Reverse Spiral Dual Layer – For continuous playback of long movies, dual layer DVD-9 disks can employ a reverse spiral so that the second layer starts where the first layer ends. This transition is supported by all DVD-Video players.
DVD-Video was the first domestic format natively to support anamorphic 16:9 video, producing pictures with full vertical resolution on 16:9 widescreen TV sets. Previously, widescreen pictures would have been letterboxed within a 4:3 video frame and their proportions changed by the 16:9 TV sets, resulting in reduced vertical resolution.
For viewers with 4:3 TV sets the DVD player can create a 16:9 letterboxed image within a 4:3 frame.
DVD-Video supports PCM, MPEG and Dolby Digital audio, which can support anything from mono, stereo, Dolby Surround to 5.1 channels. It must use at least one of the formats and can have others as well. Digital Theatre Sound (DTS) and Sony Dynamic
Digital Sound (SDDS) are options. Up to eight separate audio streams can be supported, allowing multiple languages, audio description, director’s commentary etc.
For example, a release may have 5.1 Dolby Digital English, two-channel Dolby Digital
Spanish with Dolby Surround, and mono French.
Disks can be region-coded so as only to play in a particular region (as defined in the player), a set of regions or be ‘code-free’. A region-coded disk can only play on a player that is allowed by the coding.
The region numbers are:
1. Canada, US, US Territories
2. Japan, Europe, South Africa, Middle East (including Egypt)
3. Southeast Asia, East Asia (including Hong Kong)
4. Australia, New Zealand, Pacific Islands, Central America, South America, Caribbean
5. Former Soviet Union, Indian Subcontinent, Africa (also North Korea, Mongolia)
6. China
Website www.dvddemystified.com (for DVD FAQ)
DVE
Digital Video Effects (systems). These have been supplied as separate machines but increasingly are being included as an integral part of systems. The list of effects varies but will always include picture manipulations such as zoom and position and may go on to rotations, 3D perspective, page turns, picture bending, blurs etc. Picture quality and control also vary widely.
See also: Axis, Global
DVTR
Digital Video Tape Recorder. The first DVTR for commercial use was shown in
1986, working to the ITU-R BT.601 component digital standard and the associated
D1 standard for DVTRs. It used 19 mm cassettes recording 34, 78 or (using thinner tape)
94 minutes. Today many DVTR formats are available for HD as well as SD and from professional to consumer standards. Apart from D2 and D3, all are based on component video.
Multiple generations on DVTRs do not suffer from degradation due to tape noise, moiré, etc., and dropouts are mostly invisible due to sophisticated correction and concealment techniques. However, tape is subject to wear and tear and the resulting errors and dropouts necessitate complex error concealment circuitry. In extreme cases multiple passes can introduce cumulative texturing or other artefacts. The safest haven for material that requires heavy multi-generation work is a disk-based system.
See also: D1, D2, D3, D5, D6, D7, D9, Digital Betacam, Betacam SX, DV, DVCPRO,
Media storage technology first developed by Quantel in 1993 to support SD editing and compositing systems. It has continually expanded and now provides features beyond those of generic storage solutions, including true random access to a wide range of media and the large bandwidth needed for real-time operation as well as background tasks.
It makes an excellent store for NLE systems (eQ, iQ, Infinity and Editbox), for compositing (gQ, and Paintbox FX) and for servers (sQserver and Clipbox Power).
There is RAID-3 protection so that, should a disk drive fail, operation continues and no data is lost. Implemented in Quantel’s generationQ systems it is resolution co-existent, holding any formats including SD, HD and film, and colour space independent storing
RGB, Y,Cr,Cb, etc. It does not need de-fragmenting and the total storage capacity is therefore always available.
Dynamic Rounding is a mathematical technique devised by Quantel for truncating the word length of pixels – usually to their normal 8 or 10 bits. Rather than simply losing the lower bits, it uses their information to control, via a randomiser, the dither of the LSB of the truncated result. This effectively removes any artefacts that would otherwise be visible, is non-cumulative on any number of passes and produces statistically correct results. Other attempts at a solution have involved increasing the dynamic range of equipment by raising the number of bits, usually to 10, making the size of LSBs smaller but without removing the inherent problem.
[Diagram: Dynamic Rounding, 16 bits to 8 bits – the lower-order 8 bits are compared (A > B) against the output of a pseudo-random binary sequence generator to control the dither of the LSB of the higher-order 8 bits]
There are many instances in digital systems where a number uses more bits than the system normally accommodates. This has to be rectified in a way that will keep as much information as possible and not cause noticeable defects – even after many processes.
A common example is image processing which requires that two signals are multiplied, as in digital mixing. Assuming the equipment is nominally 8-bit, the mixing produces a 16-bit result from two original 8-bit numbers. At some point this has to be truncated, or rounded, back to 8 bits either to fit within the structure of the equipment or to be accepted by external interfaces. Simply dropping the lower bits can result in visible contouring artefacts, especially when handling pure, noise-free, computer-generated pictures.
Dynamic Rounding is licensable from Quantel for SD applications and is used in a growing number of digital products both from Quantel and other manufacturers.
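The principle can be sketched in a few lines of Python. This is only an illustration of the idea described above – the lost lower bits are compared with a pseudo-random number to decide whether the LSB of the truncated result is dithered up – and it stands in for the hardware PRBS with a seeded software generator:

```python
import random

def simple_truncate(value16):
    # plain truncation: drop the lower 8 bits (average error of half an LSB)
    return value16 >> 8

def dynamic_round(value16, rng):
    # compare the lost lower 8 bits with a pseudo-random number (A > B);
    # the comparison decides whether the LSB of the result is dithered up
    high, low = value16 >> 8, value16 & 0xFF
    return min(high + (1 if low > rng.randrange(256) else 0), 255)

rng = random.Random(1)
# a 16-bit value exactly half way between 8-bit levels 110 and 111
value = 110 * 256 + 128
# plain truncation always gives 110; dynamic rounding gives 110 or 111,
# averaging close to the true value of 110.5 over many pixels
mean = sum(dynamic_round(value, rng) for _ in range(10000)) / 10000
```

Over an area of pixels the dithered results average out to the true intermediate level, which is why the technique is statistically correct and non-cumulative over multiple passes.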
Handling the Bits
Although bit depth, or dynamic resolution, for SD and HD is defined in the relevant standards, an awareness of its effects is important for connecting the digital facility and possibly recognising limitations of equipment design. The ITU-R BT.601 digital sampling standard for SD specifies that both luminance and chrominance information in a picture can be sampled at 8 or 10 bits. For HD, the much more recent ITU-R BT.709
also specifies both. Based on many years’ study of signals, parameters and, most important, viewers’ judgement, 601 originally specified 8 bits only. The analogue-digital-analogue path appeared to work well for synchronisers, standards converters and digital effects systems. Now the recorders and cameras provide higher quality pictures and the more modern post production techniques are far more demanding of digital signal accuracy. As a result, the performance of 10-bit equipment is more widely needed – especially for high-end SD post production, HD in general and for digital film.
Inherently, camera-originated images have a degree of random noise. This actually helps the digitisation process as it greatly reduces the possibility of displaying ‘banding’ due to quantising issues. Computer-generated pictures can be more testing as they can produce images that are perfectly noise-free. In the following example, assume 8 bits are used – offering 2^8, or 256, possible values (actually, only 220 are used) to describe brightness values from black to white. An image filled with a flat, uniform grey is described digitally by just one luminance value, say 110, repeated over the picture.
Another picture changes from mid-grey at its top to slightly darker grey at the base – over values 110 to 100. SDTV presents around 500 lines down the screen so the picture will be displayed as 10 bands each of 50 lines height with luminance values of 110, 109,
108 etc. Since the eye is very sensitive to the boundaries of large, even areas the very small changes may well be noticed – but may not be classified as objectionable.
A much greater problem arises when two such images are mixed, with a transition going through from the flat field to the graduated image. The mixing is a simple mathematical process (see Digital mixing) involving the multiplication of the two 8-bit values resulting in a 16-bit number. To show the true result the signal must pass through the system bus, which has been designed to be 8 bits wide, to the 8-bit output processor for conversion back to analogue for viewing. So, what happens to the 16-bit number? It has to be truncated and cut down to 8 bits. The simple way is to forget the bottom 8 bits which introduces, on average, a 50% error into the LSB of the truncated result. This way, the result will show severe contouring or banding which will be most objectionable as the bands move up or down the screen while the mix progresses – and the eye is very sensitive to movement. It was precisely this ‘banding’ that led some engineers to believe that 8 bits were not enough.
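The arithmetic can be shown with a small Python sketch. The values are illustrative only: an 8-bit mix coefficient applied to 8-bit video produces a 16-bit intermediate, and simply dropping the lower bits loses the fractional level that distinguishes adjacent bands:

```python
# half-way mix (k = 128 of 255) between grey levels 110 and 100
a, b, k = 110, 100, 128
result16 = k * a + (255 - k) * b     # 16-bit intermediate: 26780
truncated = result16 >> 8            # simple truncation: 104
exact = result16 / 255               # true mixed level: about 105.02
```

The roughly one-level error introduced by truncation is what appears as contour bands over a graduated area as the mix progresses.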
Keeping the methods the same but increasing resolution to 10 bits reduces the amplitude of the contour steps to a quarter of their previous value – and increases the number of bands by four times. The problem is smaller and less noticeable but still there. The 10-bit idea attempts to correct these symptoms, but does not address the cause which is poor truncation.
A solution, known as Dynamic Rounding, was developed by Quantel. It truncates the 16-bit words to 8 bits, taking account of the value of the lost bits to weight the amount of least significant bit (LSB) dither. This ‘intelligent’ rounding effectively makes the dynamic resolution independent of the number of bits – allowing levels between the two
LSBs to be perceived. The result is perfectly smooth; there is no contouring or banding visible. 8-bit processing with Dynamic Rounding can produce better results than 10-bit without it. Other techniques, such as pulling keys from video images and colour correction, make demands on the images beyond the needs of our eyes. So, although
8-bit working with Dynamic Rounding may look good, better all-round performance is possible with 10 bits and Dynamic Rounding.
Given that distortions may appear when the word length is truncated, attention must be paid to system workflow where both 8 and 10-bit equipment is used together. Going from
8-bit to 10-bit presents no special problem but from 10 to 8 will necessarily involve truncation and such inputs need the rounding process, otherwise distortions will be produced. A not-uncommon effect is the appearance of a green tinge on pictures.
This can be caused by the loss of the lower two bits of the 10-bit Cr and Cb digital components, effectively reducing their value by an average of half the eighth bit – the new
LSB. The combined Cr and Cb shift is toward green. It is small but noticeable in areas of low saturation. The solution does not necessarily require introducing more 10-bit equipment, but the use of a good truncation technique at the input. Another, technically less appealing, solution is to use analogue connections between digital machines – letting the ADC/DAC/ADC process make the conversion. Even if all studio equipment were 10-bit, truncation would still be required to reduce the 20-bit results back to 10 bits.
In the end the signals are delivered via MPEG-2, which is based on 8 bits.
As the ever-increasing facilities of post production become more complex so the processing techniques make more demands on the detail of the pictures themselves.
10 bits is now widely used at SD. Dynamic Rounding is a solution where word lengths of digital images have to be truncated – within a machine or to connect the machine to another.
European Broadcasting Union. An organisation comprising European broadcasters which co-ordinates production and technical interests of European broadcasting. It has within its structure a number of committees which make recommendations to ITU-R.
Error Check and Correct. This system appends check data to a data packet in a communications channel or to a data block on a disk, which allows the receiving or reading system both to detect small errors in the data stream (caused by line noise or disk defects) and, provided they are not too long, to correct them.
Edit Decision List. A list of the decisions that describe a series of edits – often recorded on a floppy disk. EDLs can be produced during an off-line session and passed to the online suite to control the conforming of the final edit. In order to work across a range of equipment there are some widely adopted standards such as CMX 3400 and 3600.
News journalists working with integrated news production systems such as Quantel’s generationQ news systems, can effectively create EDLs at their desk tops.
EDLs have been frozen in time and not kept pace with the continued development of post production. They do not carry information on DVEs, colour correction, layering, keying etc., or carry other information about ownership, rights, etc. The development of
AAF has filled the gaps.
Electronic programme guides (EPG)
DTV allows broadcasters to transmit electronic programme guides. For many, this service is considered essential to keep viewers up to date with, and enable them to navigate between, the increased number of channels DTV brings. The programme guide database allows a receiver to build an on-screen grid of programme information and contains control information to ease navigation.
Audio that is carried within an SDI data stream – so simplifying cabling and routing.
The standard (ANSI/SMPTE 272M-1994) allows up to four groups each of four mono audio channels. Generally VTRs only support Group 1 but other equipment may use more.
48 kHz synchronous audio sampling is pretty well universal in TV but the standard also includes 44.1 and 32 kHz synchronous and asynchronous sampling. Synchronous means that the audio sampling clock is locked to the associated video (1920 samples per frame in 625/50, 8008 samples per five frames in 525/60). Up to 24-bit samples are allowed but mostly only up to 20 are currently used.
48 kHz sampling means an average of just over three samples per line, so three samples per channel are sent on most lines and four occasionally – the pattern is not specified in the standard. Four channels are packed into an Ancillary Data Packet and sent once per line (hence a total of 4 x 3 = 12 or 4 x 4 = 16 audio samples per packet per line).
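The sample counts above follow directly from the frame rates; a quick check in Python (purely illustrative):

```python
# 625/50: 48 000 Hz at 25 frames/s locks to exactly 1920 samples per frame
assert 48000 // 25 == 1920
# spread over 625 lines that is just over three samples per line
samples_per_line = 1920 / 625            # = 3.072
# 525/59.94: the 30000/1001 frame rate locks to 8008 samples per 5 frames
assert 48000 * 5 * 1001 // 30000 == 8008
```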
The process of coding data so that a specific code or key is required to restore the original data. In conditional access broadcasts this is used to make transmissions secure from unauthorised reception and is often found on satellite or cable systems.
Today, the growth of digital services to homes is in danger of being held back because of the content owners’ concern about piracy – digital copies being perfect clones of their valuable assets. Encryption and content security are vital to the growth of digital media markets.
Electronic News Gathering. Term applied to a small portable outfit, with a broadcast quality TV camera, VTR and/or microwave link, usually used for news. The term was originated to distinguish between news gathering on film and video tape (electronic).
Also refers to compatible studio or portable editing equipment.
A point in a coded bit stream from which a complete picture can be decoded without first having to store data from earlier pictures. In the MPEG-2 frame sequence this can only be at an I-frame – the only frames encoded with no reference to others.
Error detection, concealment and correction
No means of digital recording is perfect. Both magnetic tape and disks suffer from a few marginal areas where recording and replay is difficult or even impossible. However the errors can be detected and some remedial action taken by concealment or correction.
The former attempts to hide the problem by making it less noticeable whereas the latter actually corrects the error so that perfect data is output.
When the recorded data is an image, an error can simply be concealed by using data from previous or following TV lines, fields or frames. The result is not guaranteed to be identical to the original but the process is relatively simple and, as important, quick.
If the stored information is from a database, a computer program or from special image processing, then 100% accuracy of data is essential. This can be ensured by recording data in a manner where any errors can be detected and the correct data calculated from other information recorded for this purpose. This is error correction.
A difference between computer systems and TV is that the latter is continuous and cannot wait for a late correction. Either the correct result must be ready in time or some other action taken – the show must go on – placing a very tight time constraint on any TV-rate error correction. In contrast, a computer can usually afford to wait a few milliseconds.
Digital VTRs monitor the error rate and provide warnings of excessive errors which, although not immediately visible, may build up during multiple passes.
Although error rates from disks are generally many times lower than those expected from digital videotape, they can still occur. To protect against this there is data redundancy and the replay of all data is checked. If an error is detected there is sufficient additional information stored to calculate and substitute the correct data.
The total failure of a disk drive can be covered and the missing data re-generated and recorded onto a new replacement – making the system highly accurate and very secure.
The material that television programmes are made of. In other words, the video, audio and any other material such as graphics and captions that are added to make up the final result.
Ethernet is a form of Local Area Network (LAN) widely used for interconnecting computers and standardised in IEEE 802.3, allowing a wide variety of manufacturers to produce compatible interfaces and extend capabilities – repeaters, bridges, etc.
The data transmission rate is 10, 100 or 1,000 Mb/s, but overheads in packaging data and packet separation mean actual throughput is less than the bit rate. Using Carrier Sense Multiple Access/Collision Detect (CSMA/CD), a would-be talker on the net, rather than waiting its turn (as on a Token Passing Ring LAN), simply waits until the cable is free.
There are many connection methods for Ethernet varying from copper to fibre optic.
Currently the three most common are:
Thinwire Ethernet using relatively low cost 50 ohm coaxial cable and BNC connectors. The maximum length, without repeaters, is 180m, which can connect up to 30 devices.
10 Base-T
The standard for 4-wire twisted pair cable using RJ connectors. This gives extremely low cost-per-node network capabilities.
100 Base-T
(a.k.a. Fast Ethernet) 100 Mb/s 4-wire twisted pair cable using RJ connectors is now becoming very popular. Similar technology to 10 Base-T but uses Cat. 5 cable.
Gigabit Ethernet
Development of existing Ethernet technology to support 1,000 Mb/s. This is specified for both fibre and copper.
10 Gigabit Ethernet
is currently a draft standard. It differs in that it will only function over optical fibre and only operate in full-duplex mode – so collision detection protocols are unnecessary. However, the packet format and the current capabilities are easily transferable to the new draft standard.
The European Telecommunications Standards Institute. Its mission is to produce lasting telecommunications standards for Europe and beyond. ETSI has 730 members from 51 countries inside and outside Europe, and represents administrations, network operators, manufacturers, service providers, research bodies and users.
A compression technique, based on DCT. Unlike MPEG, which is asymmetrical having complex coders and simpler decoders and is designed for broadcast, this is symmetrical with the same processing power at the coder and decoder. It is designed for applications where there are only a few recipients, such as contribution links and feeds to cable head ends. ETSI compression is intra-frame, simpler than MPEG and imposes less delay in the signal path, typically 120 milliseconds against around a second, enabling interviews to be conducted over satellite links without unwarranted delays. Data rate is 34 Mb/s.
A colloquial term for a Sony IMX VTR with an Ethernet connection.
The mathematical operation of EXclusive OR logic gate on a number of data bits.
For example, the EXOR of two bits is 1 only if exactly one of them is 1. The EXOR is widely used in data recovery (see RAID). If the EXOR of a number of blocks of data is stored, when one of those blocks is lost, its contents can be deduced by EXORing the undamaged blocks with the stored EXOR.
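A minimal Python sketch of that recovery principle (the block contents are arbitrary example data):

```python
def exor(blocks):
    # bytewise EXclusive OR of equal-length data blocks
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

blocks = [b"\x01\x02", b"\x0f\x00", b"\xaa\x55"]
parity = exor(blocks)                     # stored alongside the data
# if blocks[1] is lost, EXORing the survivors with the parity restores it
recovered = exor([blocks[0], blocks[2], parity])
```

This works because EXORing a value with itself cancels it out, leaving only the missing block.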
A suite of colour manipulation controls provided on some Quantel systems. Operating in
Y, Cr, Cb and RGB colour space Fettle provides sophisticated secondary colour correction tools.
An integrated set of standards developed by ANSI designed to improve data speeds between workstations, supercomputers, storage devices and displays while providing a single standard for networking storage and data transfer. It can be used point-to point, switched or in an arbitrated loop (FC-AL) connecting up to 126 devices.
Planned to run up to 4 Gb/s on a fibre-optic or twisted-pair cable, the current top rate available is 2 Gb/s and most in use is 1 Gb/s. These are nominal wire speeds but
8b/10b encoding is used to improve transmission characteristics, provide more accuracy and better error handling. With every 8-bit data byte for transmission converted into a 10-bit Transmission Character, the useful data rate is reduced by 20 percent.
Because of its close association with disk drives, its TV application is mostly, but not always, in the creation of storage networking. It can interface with SCSI, the widely used disk interface, which is key to its operation in storage networking such as SAN.
A discontinuous transfer process which treats each transferred item as a single block, neither divisible into smaller, independent elements nor part of a larger whole. As the transfer process has a recognisable beginning and end (unlike streaming) it is possible for the complete transfer to be checked and any errors corrected. This is not possible with a streaming process.
File transfer has the disadvantage that the material to be transferred has to be complete and clearly identifiable. When handling time-based material such as video and audio, as against stills, this means the complete file has to be available before transfer can start. If the whole clip is a single file, this cannot be transferred until complete. However, if the clip is sub-divided, for example into frames, the transfer can start immediately after the first frame is completed.
See also: Streaming
Changing the format of television images without changing the vertical (frame or field) refresh rate. So, starting with 1920 x 1080/50I and converting to 720 x 576/50I is a format conversion. This only alters the format spatially, changing the vertical and horizontal size which is a relatively straightforward task.
General term to define television pictures by the number of active pixels per line, number of active lines. For example, SD digital television in Europe has a format of 720 x
576 and 1920 x 1080 is an HD format.
The scattering of data over a (disk) store caused by many recording and deletion operations. Generally this will eventually result in the store becoming slow – a situation that is not acceptable for video recording or replay. The slowing is caused by the increased time needed to access randomly distributed data or free space. With such stores de-fragmentation routines arrange the data (by copying from one part of the disk to another) so that it is quickly accessible in the required order for replay. Clearly any change in replay, be it a transmission running order or the revision of an edit, could require further de-fragmentation.
Stores capable of true random access, such as Quantel’s Clipbox Power and sQServer are able to play frames in any order at video rate, and never need de-fragmentation.
Quantel term describing an advanced form of the management of video in a server.
This covers much ground but basically offers the practical goals of guaranteeing real-time access to any frame for all video connections and avoiding the deletion of any material by one user that is partly or wholly used by another.
This is achieved by implementing a number of basic design criteria including true random access that stores video material as a series of individual frames, rather than longer files, as well as an on-board real-time database management system which, among other things, tracks who is using what material. Frame Magic has been implemented in Quantel’s Clipbox Power and sQserver.
The Importance of Being Accessible
For servers, one vital statistic that should never be overlooked is access. The whole idea of servers is to offer fast, straightforward access for all its users. Video has its own specific requirements in that it constitutes a vast amount of data and, for any live work, recording or playout, suitably high data rates absolutely must be maintained. There are three particular issues involved: bandwidth, fragmentation and management which become even more critical when server-editing is required. In Quantel servers, these all come under the umbrella of Frame Magic.
Meeting bandwidth requirements is usually achieved by use of a disk array, such as a
RAID. The combined data rate of the multiple disks, suitably combined, can be made sufficient to run several live video outputs at once. However, the performance is liable to deteriorate as the system is used because successive recordings and deletions mean that the stored data becomes fragmented rather than being recorded on successive tracks. A quick look at a disk drive specification (see www.seagate.com) gives one figure for track-to-track seek time (around 1 millisecond) and another for the average
(~7-8 milliseconds) and a third for ‘full disk’ of around 17 milliseconds. Another factor is latency – waiting for the disk to rotate to the start of wanted data (see Disk drives).
As video applications demand very high data rates, fragmentation means much precious time is spent placing a disk’s read/write heads over the next required data before it can begin to be read. The result is a very significant slowdown in data rate caused by lengthening access times. One solution is to defragment the data but this can take a long time and put the server out of service – effectively temporarily alleviating the symptoms rather than addressing the cause of the problem. The other is to design the store to operate to full specification even when the data is fully fragmented – i.e. to design it to operate fully fragmented. This is the basis of Quantel’s true random access technology (see True random access).
A part of this involves storing each frame as a separate file – contrasting with simpler schemes where each file contains many frames. Thus each frame is directly and instantly addressable – without the need to access a larger file and then find the required frame. Any frame can be instantly edited to any other frame. This is perfect for editing and offers very flexible operation. Longer files may suit applications such as programme playout but may not suit working with shorter items, such as interstitials – and certainly editing would be very limited. However, this makes the management of the store more complex.
With each frame stored as a file, a ten-hour server has to keep track of a million frames.
Server management is handled to a greater or lesser degree by different suppliers but the best solution is to treat this as a function internal to the server. The rationale is that
manufacturers know exactly the characteristics of their own equipment and can make the best job of managing it. Also it should reduce overall system costs and the need for any custom-made software. This has to be a more effective solution than relying on third parties who only have access to the necessarily more limited applications interface handed on by the server manufacturer.
To keep track of all material requires a real-time database tracking every frame that is recorded, edited or replayed by any users on the server. At one level, when a frame, clip, or entire programme is replayed, the user then just has to call up one item. All the related frames are located and their order of replay controlled so that the required realtime sequence is output. Similarly, for recordings, all the spare space, however fragmented, is logged so the operator can simply record and not be concerned about exactly where the material is located on the store. This does away with the need to consolidate available disk space in order to make recordings on contiguous tracks.
It also avoids the unexpected ‘store full’ message appearing when you know only 70 percent is actually used, simply because the remaining free space cannot be accessed fast enough.
Realtime random access to every frame makes a store ideal for editing. Such servers often form the core of modern digital news installations, where many edits may be taking place at the same time and this requires more attention to the management of frames and their associated audio. At one level, such a server is able to perform cut edits – simply by instructing the server to play out the frames in the required order.
This is often driven from a journalist’s desktop working on browse-quality video supplied by a browse server which is kept in-step with the main, broadcast quality server.
Experience shows that keeping broadcast and browse servers in step is not straightforward and it absorbs more systemisation costs. A more logical solution is to place both functions under one roof and under control from one management system.
Apart from the prospect of better broadcast/browse synchronisation this cuts down on hardware and keeps responsibility for all server operation in one place.
As the server can perform live cut edits it is possible to connect edit stations that, for news, are basically applications running on journalists’ PCs. For the edit decision process, the PCs access the browse video and associated audio for journalists to make their edit decisions. The product is a form of EDL that can be instantly conformed and played directly by the server at broadcast quality. Using shared edit storage means that many editors have access to the same material used in the edits. There is a potential problem when editors wish to delete clips; has anyone else used any part of them in their edits? Without this knowledge it is all too easy to have missing shots during the news bulletin. Thus the management system needs to track not just the frames, but if they have been used in an edit by any user. If so, just those used frames are not deleted, and the rest are. This way anyone can delete their unwanted material without losing any frames in use by others. This also makes for much more efficient use of server storage.
Deletion of shared material in an edit server
Editing involves re-ordering the replay of frames.
Here two edits of the same material involve a total of six frames of storage, as the edits exist only as replay instructions of the source frames
Deleting the source clip will only remove the frames not in use by edits 1 & 2 (i.e. A and B)
Editable servers can work very efficiently with more complex edit applications where dissolves, wipes, DVEs, etc. are used. Any transition other than a cut requires processing to produce new frames, which may take place in an edit workstation, and the results passed back to the server. Other than that, the frames need never be passed to the edit workstation – thus cutting down on video transfers – saving time and saving storage.
Creating a dissolve in a random access edit/server set-up
Clip 1: frames A B C D E F
Clip 2: frames U V W X Y Z
Make an edit from the two – EDL: A→C 3f, X→Z 3f – replaying as A B C X Y Z
Change the cut to a dissolve: the transition frames B/V, C/W, D/X and E/Y are rendered by the edit workstation and sent to the server as a ‘Dissolve’ clip
A new EDL is formed – A→A 1f, B/V→E/Y 4f, Z→Z 1f – replaying as A B/V C/W D/X E/Y Z
Deletion: The source clips, Clip 1 and Clip 2, can be deleted but frames A, Z and the ‘Dissolve’ clip will be automatically retained until the edit is deleted
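The mechanism of edits-as-replay-instructions can be sketched in Python. The clip and frame names follow the example above, but the data structures are illustrative assumptions, not Quantel’s implementation:

```python
# clips are lists of stored frame identifiers; an edit is only replay instructions
clip1 = list("ABCDEF")
clip2 = list("UVWXYZ")

# EDL entries: (source clip, start index, length) -- A->C 3f then X->Z 3f
edl = [(clip1, 0, 3), (clip2, 3, 3)]
edit = [src[start + i] for src, start, n in edl for i in range(n)]

# deleting the source clips must retain any frame referenced by an edit
in_use = set(edit)
deletable = [f for f in clip1 + clip2 if f not in in_use]
```

The edit plays A B C X Y Z without copying any frames, and only the unreferenced frames (D, E, F, U, V, W) can safely be deleted.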
Access is the key to all these operations: the physical feat of true random access can only realise its full potential with suitable real-time database management. This needs to coordinate all frame usage including that in edits, both at broadcast and browse levels. There is huge potential in the use of servers for television. Making them easy to use and reliable in the pressured circumstances of broadcast operations is the real aim.
The name, coined by Quantel, given to solid-state video storage, usually built with
DRAMs. Technically it implies storage of one complete frame or picture, but the term is also used more generically to encompass the storage of a few lines to many frames.
With large DRAM capacities available, framestores are increasingly used to enhance equipment design.
The number of oscillations of a signal over a given period of time (usually one second).
For example, it defines subcarrier frequencies in analogue television colour coding systems, or clock rate frequencies in digital systems. Here are some commonly found frequencies in TV:
ITU-R BT.601 luminance sampling rate: 13.5 MHz
ITU-R BT.601 chrominance sampling rate: 6.75 MHz (for 4:2:2 sampling)
ITU-R BT.709 luminance sampling rate: 74.25 MHz
ITU-R BT.709 chrominance sampling rate: 37.125 MHz (for 4:2:2 sampling)
Although not appearing in any prominent headline, 2.25 MHz is important as the lowest common multiple of both 525 and 625 line frequencies. Multiples of 2.25 MHz make up many of the frequencies used for digital video sampling – including HD.
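That relationship is easy to verify with a little arithmetic (Python, purely illustrative):

```python
# line frequencies: 625/50 scans 625 x 25 lines/s; 525/59.94 scans 525 x 30000/1001
f_625 = 625 * 25                                 # 15 625 Hz
assert 144 * f_625 == 2_250_000                  # 144 x 625-line periods = 2.25 MHz
assert 525 * 30000 * 143 % 1001 == 0
assert 525 * 30000 * 143 // 1001 == 2_250_000    # 143 x 525-line periods = 2.25 MHz
# the 601 luminance sampling rate is a multiple: 13.5 MHz = 6 x 2.25 MHz
assert 6 * 2_250_000 == 13_500_000
```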
File Transfer Protocol. The high level Internet standard protocol for transferring files from one machine to another. FTP is usually implemented at the application level.
Full motion video
A general term for moving images displayed from a desktop platform. Its quality varies and is undefined.
A device connecting two computer networks. For example, a gateway can connect a local area network (LAN) to a SAN. This way a PC connected on an Ethernet LAN may have access to files stored on the SAN even though the PC is not SAN aware.
The signal degradation caused by successive recordings. Freshly recorded material is first generation; one re-recording, or copy, makes the second generation, etc. This is of major concern in analogue linear editing but much less so in a digital suite. Noncompressed component DVTRs should provide at least twenty generations before any artefacts become noticeable but the very best multi-generation results are possible with disk-based systems. These can re-record millions of times without causing dropouts or errors. Generations are effectively limitless.
Besides the limitations of recording, the action of processors such as decoders and coders will make a significant contribution to generation loss. The decode/recode cycle of NTSC and PAL is well known for its limitations but equal caution is needed for digital video compression systems, including MPEG and the colour space conversions that typically occur between computers handling RGB and video equipment using Y, Cr, Cb.
The top level of control in a multi-channel DVE system. A number of objects (channels) can be controlled at one time, for example to alter their opacity or to move them all together relative to a global axis – one which may be quite separate from the objects themselves. This way the viewing point of all the assembled objects can be changed.
For example, a cube assembled from six channels could be moved in 3D space as a single action from a global control.
GOP (Group Of Pictures)
General Purpose Interface. This is used for cueing equipment – usually by a contact closure. It is simple, frame accurate and therefore can easily be applied over a wide range of equipment. Being electro-mechanical it cannot be expected to be as reliable or sophisticated as pure electronic controls.
The United States grouping formed in May 1993 to propose 'the best of the best' of the competing HDTV systems. The participants were: AT&T, General Instrument Corporation, Massachusetts Institute of Technology, Philips Consumer Electronics, David Sarnoff Research Centre, Thomson Consumer Electronics and Zenith Electronics Corporation. The Grand Alliance played a big part in arriving at the ATSC digital television standard.
Term used to describe limits of accuracy or resolution – usually of editing. For example, the granularity of uncompressed component video (601 or 709) is one frame; i.e. it can be cut on any frame. The granularity of long GOP MPEG-2 is about half a second.
Graphical User Interface. A means of operating a system through the use of interactive graphics displayed on a screen. Examples in the computer world are the Apple
Macintosh and Microsoft Windows, both designed for general-purpose use and usually operated with a mouse as the pointing device.
In 1981 Quantel introduced Paintbox with its on-screen menu system operated from a pressure sensitive pen and touch tablet. The control has been further developed to cover a wide range of operations including DVEs, editing, VTR control and audio and today is applied to the whole range of Quantel products. Besides its success in offering fast and effective control, the GUI also enables easy updates to accommodate new facilities.
Short form for HDTV.
Assigned D11, this is a series of VTRs based on the Betacam principles for recording
HD video on a tape format which uses the same style of cassette shell as Digital
Betacam, although with a different tape formulation. The technology supports both
1080 and 1035 line standards. Various methods are believed to be used to reduce the video data including pre-filtering, DCT-based intra-frame compression and sampling at around 3:1:1. Together these are said to provide data reduction of between 7 and
10:1. Four non-compressed audio channels sampled at 48 kHz, 20 bits per sample, are also supported.
HDCam camcorders provide HD acquisition and recording in a compact ergonomic package. One model is specifically for 1080/24P operation and aimed at addressing some current film areas.
A D5 VTR that is able to handle component high definition signals. Using around 5:1 compression the signals connect via an HD-SDI link. HD D5 can be multi-format, operating at both SD and HD TV standards. It can replay 525-line D5 as well as HD D5 cassettes. Formats include 480/60I, 1080/24P, 1080/60I, 1080/50I, 1035/59.94I and
720/60P. The recorder can also slew between 24 and 25 Hz frame rates for PAL programme duplication from a 1080/24P master. Cassette recording times vary according to format, the longest is 155 minutes for 1080/24P.
Standardised in SMPTE 292M, this is a high definition version of the SDI (Serial Digital
Interface) used for SD television. The serial bit-stream runs at 1.485 Gb/s to carry up to
10-bit Y,Cr,Cb component video as well as audio and ancillary data. It extends the use of the coax cable and BNC connector ‘plug-and-play’ interface familiar to television operations for decades. The interface is also specified for fibre for distances up to 2 km.
High Definition Television. A television format with higher definition than SDTV. While DTV at 625 or 525 lines is usually superior to PAL and NTSC, it is generally accepted that 720-line and upward is HD. HD also has a picture aspect ratio of 16:9.
While there are many picture formats proposed and several in use, there is increasing consensus that 1080 x 1920/24P is a practical standard for global exchange. Many are planning to produce on this format.
A numbering system, often referred to as ‘Hex’, that works to base 16 and is particularly useful as a shorthand method for describing binary numbers. Decimal 0-9 are the same as Hex, then 10 is A, 11 is B, up to 15 which is F.
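The mapping between decimal, hex and binary can be seen directly with Python's built-in formatting (a small illustrative sketch; the 10-bit value 940 is the ITU-R BT.601 white level mentioned later in this book):

```python
# Each hex digit stands for exactly four bits, so hex is a compact
# shorthand for binary: an 8-bit byte is two hex digits, a 10-bit
# video sample at most three.
for value in (10, 11, 15, 255, 940):
    print(f"decimal {value:4d} = hex {value:X} = binary {value:b}")

# 10-bit white level 940 is hex 3AC
assert f"{940:X}" == "3AC"
assert int("3AC", 16) == 940
```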
High performance parallel interface (ANSI X3.283-1996). Capable of transfers up to 200 MB/s (800 MB/s with the 6400 Mb/s HiPPI under development, a.k.a. GSN), it is targeted at high performance computing and optimised for applications involving streaming large volumes of data rather than bursty network activity. The parallel connection is limited to short distances, so Serial HiPPI is now available.
A Quantel term describing the ability to instantly recall material in uncommitted form along with the associated set-up data. This allows changes to be made quickly and easily, for example a shadow could be softened or moved in a multi-layered commercial without having to find the original material or recalling the set-up data. Archiving a programme containing History means that the operator no longer needs to remember to save Packs or set-ups associated with the job as all the material and set-ups will be automatically included within the archive.
The High Speed Data Link is typically used to move uncompressed 2k, 10-bit RGB images (as used for digital film) in a facility. The data volumes involved are very large; each image is 12 MB, and at 24 fps this data amounts to 288 MB/s.
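The figures quoted above can be checked with simple arithmetic. The sketch below assumes a 2048 x 1556 scan size for '2k' (a common digital-film convention, not stated in the text):

```python
# Back-of-envelope check: a 2k RGB image at 10 bits per component.
width, height = 2048, 1556          # assumed 2k film-scan dimensions
bits_per_pixel = 3 * 10             # R, G and B at 10 bits each
bytes_per_image = width * height * bits_per_pixel // 8

print(f"bytes per image: {bytes_per_image / 1e6:.1f} MB")            # ~12 MB
print(f"data rate at 24 fps: {bytes_per_image * 24 / 1e6:.0f} MB/s") # ~287 MB/s
```

Rounding each image up to 12 MB gives the 288 MB/s quoted in the entry.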
HSDL provides an efficient transport mechanism for moving and sharing data between applications. It uses two SMPTE 292M 1.485 Gb/s serial links (HD-SDI) to provide nearly 3 Gb/s bandwidth and can result in close to real-time transfers at up to 15-20 fps. Use of the SMPTE 292M data structure means the signal can be carried by the HD-SDI infrastructure – cabling, patch panels and routers – that may already be in place for HD.
HSDL images can be imported as data into a workstation fitted out with dual HD I/O, making them available for film restoration, compositing, editing, and output to film recorders.
Archiving and transporting HSDL material can be done with data transports such as
DTF2 or D6.
Hierarchical Storage Management is a scheme responsible for the movement of files between archive and the other storage systems that make up hierarchical storage architecture. Typically there may be three layers of storage – online, near-line and offline
– that make up the hierarchy that HSM manages.
Connects many network lines together as if to make them all part of the same wire.
This allows many users to communicate but, unlike a switch, only one transaction can occur at once over the whole network.
This compresses data by assigning short codes to frequently occurring long sequences and longer ones to those that are less frequent. Assignments are held in a Huffman
Table. Huffman coding is lossless and used in video compression systems where it often contributes around a 2:1 reduction in data.
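The principle can be shown in a few lines of Python. This is a minimal sketch of Huffman table construction, not the exact tables used by any video codec:

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman table: short codes for frequent symbols."""
    # Each heap entry: [frequency, tie-break index, {symbol: code}]
    heap = [[freq, i, {sym: ""}]
            for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)          # two least frequent subtrees
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], count, merged])
        count += 1
    return heap[0][2]

data = "aaaaaaabbbccd"                    # 'a' is most frequent
table = huffman_codes(data)
coded = "".join(table[s] for s in data)
print(table)                              # 'a' gets the shortest code
print(f"{len(data) * 8} bits raw -> {len(coded)} bits coded")
```

Because the codes are prefix-free, the bitstream can be decoded unambiguously, and no information is lost.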
Integrated Digital TV receiver. For viewers to receive DTV services they require a receiver either in the form of a new television set with the tuner and digital decoder built in (IDTV) or a set top box. The ATSC in the USA has produced a number of recommendations for manufacturers. Among these it states that the receiver should be capable of appropriately decoding and displaying the video scanning formats defined in ATSC’s
Table 3 and the audio services defined in Table 2.
A receiver’s ability to receive and display the services in Table 3 does not necessarily mean that it can display all the 2 million-pixel detail of an HD image.
Standard that defines Ethernet
IEEE 1394 (a.k.a. FireWire, I-Link)
A standard for a peer-to-peer serial digital interface which can operate at 100, 200, or
400 Mb/s. IEEE 1394A specifies working up to 400 Mb/s, typically over copper cables up to 4.5 metres in length with six-pin connectors. Consumer devices use a four-pin connector. Extenders increase the maximum distance from 4.5 metres on copper cables up to about 100 metres on glass optical fibre.
The proposed IEEE 1394B standard is nearing ratification and is intended to extend both data rate and distance. Data rates of 800 Mb/s and 1,600 Mb/s are supported and the distance is extended to over 100 metres via glass optical fibre, without the use of external extenders. IEEE 1394B will also enable sending video/audio media over 100 metres of Cat-5 cable at 100 Mb/s. Consumers will be able to connect together DV devices over longer distances using readily available low cost cables. Architecturally,
IEEE 1394B supports 3,200 Mb/s, however chip sets are not yet available and it is unlikely that products will appear on the market for a few years.
The high speed and low cost of IEEE 1394A make it popular in multimedia and, more recently, digital video applications. Early uses include peer-to-peer connections for digital dub editing between camcorders, as well as interfacing VCRs, printers, PCs, TVs and digital cameras. IEEE 1394 is recognised by SMPTE and EBU as a networking technology for transport of packetized video and audio. Its isochronous data channel can provide guaranteed bandwidth for frame-accurate real-time (and faster) transfers of video and audio while its asynchronous mode can carry metadata and support I/P.
Both modes may be run simultaneously.
I-frame only (a.k.a. I-only)
A video compression scheme in which each frame is intra-frame compressed, i.e. each frame is individually defined and does not depend on any others. There are no P
(predictive) or B (bi-directional) frames in this compression scheme. This is considered preferable for studio use as edits can be made on any frame boundaries without necessarily involving processing.
All DV compression is I-frame only. MPEG-2 with a GOP of 1 is I-frame only and is used at a data rate of 50 Mb/s in Sony’s IMX VTRs.
See MPEG-2 section
Image Light Amplifier. Technology developed by Hughes-JVC for video projection up to large screen size. The scanned electronic images are displayed on a CRT which has infrared phosphors. The resulting IR image is used to control the reflection of the projector light according to the intensity of the IR. The technology has been used up to cinema screen size to show ‘digital movies’ with full 2k resolution.
Colours that force a colour system to go outside its normal bounds – or gamut. Usually these are the result of electronically processed or painted images rather than direct camera outputs. For example, removing the luminance from a high intensity blue or adding luminance to a strong yellow in a paint system may well send a subsequent
PAL or NTSC coded signal too high or low – producing at least inferior results and maybe causing technical problems. ‘Out of gamut’ detectors can be used to warn of possible problems and correction is also available. Some broadcasters reject material with illegal colours.
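A gamut detector can be as simple as converting the colour difference signals back to RGB and checking the range. The sketch below uses the Rec. 601 luminance weights; the limits and tolerance are illustrative, not any broadcaster's specification:

```python
# Minimal out-of-gamut check: recover R, G, B from Y, B-Y, R-Y using
# the Rec. 601 weights (Y = 0.299R + 0.587G + 0.114B) and flag any
# component outside the displayable 0..1 range.
def rgb_from_ydiff(y, b_y, r_y):
    r = y + r_y
    b = y + b_y
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

def is_legal(y, b_y, r_y, tol=1e-6):
    # small tolerance absorbs floating-point rounding
    return all(-tol <= c <= 1 + tol for c in rgb_from_ydiff(y, b_y, r_y))

# Saturated blue (R=0, G=0, B=1) is legal; the same colour with its
# luminance stripped to zero, as a paint system might do, is not.
print(is_legal(0.114, 0.886, -0.114))  # True
print(is_legal(0.0, 0.886, -0.114))    # False
```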
Editing at a workstation which directly edits material stored in a server. For this the workstation does not need large-scale video and audio storage but depends totally on the server store. The arrangement allows background loading of new material, via several ports if required, and playout of finished results, while avoiding any need to duplicate storage or transfer material to/from the workstation and allowing any number of connected workstations to share work. The efficiency of in-server editing allows fast throughput and is specially attractive to news as well as to post production.
This depends on using a server that can act as an edit store and perform reliable video replay and record. It also requires a powerful interface to the edit workstation. Quantel’s edit workstations such as Qedit, Qcut and eQ work with sQServers that can operate this way. The workstation/server connection is by Gigabit Ethernet.
[Diagram: several in-server editing workstations connected directly to a video & audio server; SDI ports on the server provide load & replay of video/audio.]
Interactive Television (iTV)
A service generally enabled by DTV which gives the viewer the opportunity to participate or access more information about the programme via a controller. The interactivity may be implemented by selecting different TV channels (unknown to the viewer) or by a return control path to the service provider. Besides using a phone line, DVB has devised return control paths for satellite (DVB-RCS), cable (DVB-RCC) and terrestrial (DVB-RCT).
Some consider interactivity to be the future of television – the 'killer app(lication)' that will make DTV a commercial success. Others talk of lean back (viewing) and lean forward (interaction) as very different attitudes of both body and mind, and question whether the two belong in the same place.
Compression which involves more than one frame. Inter-frame compression compares consecutive frames to remove common elements and arrive at ‘difference’ information.
MPEG-2 uses two types of inter-frame processed pictures – the ‘P’ (predictive) and ‘B’
(bi-directional) frames. As ‘P’ and ‘B’ frames are not complete in themselves but relate to other adjacent frames, they cannot be edited independently.
The reduction in vertical definition during vertical image movement due to interlaced (rather than progressive) scans. Typically this is assumed to be 30%, and is in addition to the Kell Factor (another 30% reduction), making an overall reduction of 50%. Note that when scanning film frame-per-frame (i.e. 24 or 25 fps – not 3:2 pull-down to 60 fps), or a succession of electronic frames each representing a single snapshot in time, there is no vertical movement between fields and the Interlace Factor has no effect.
Method of scanning lines down a screen – as used in most of today's television broadcasts. Each displayed picture comprises two interlaced fields: field two fills in between the lines of field one. One field displays odd lines, then the other shows even lines. For analogue systems, this is the reason for having an odd number of lines in a TV frame, e.g. 525 and 625, so that each of the two fields contains a half-line, causing the constant vertical scan to place the lines of one field between those of the other.
The technique improves the portrayal of motion and reduces picture flicker without having to increase the picture rate, and therefore the bandwidth or data rate.
Disadvantages are that it reduces vertical definition of moving images to about 70% (see
Interlace Factor) of the progressive scan definition and tends to cause some horizontal picture detail to ‘dither’. For processing, such as in DVE picture size changes, movement between fields has to be detected if the higher-quality frame-based processing is used.
There is continuing debate about the use of interlaced and progressive scans for television.
Interlaced or Progressive?
Since the start of television broadcasts interlaced scanning has been used. Meanwhile computer screens have always used progressive scans. As the two technologies have converged, and digital film has become a reality, there has been much discussion about which scanning system should be used.
Interlaced scanning has served television very well over its 50+ analogue years with the 525/60 and 625/50 formats. By scanning all the odd lines 1, 3, 5, 7, etc… and then all the even lines 2, 4, 6, 8, etc… it makes two vertical ‘field’ sweeps for every frame.
So Europe, with its 25 Hz frame rate, has 50 fields per second; similarly the USA, with its 30 Hz frame rate, has 60 fields per second. Progressive scanning is more obvious: it simply scans all the lines of a frame sequentially, to make one complete vertical sweep. As this is so obvious, why use interlace?
Film, at 24 fps, is akin to progressive scanning, and one thing film is not so good at is the portrayal of movement. Partly for this reason pans are generally kept quite slow.
Fast action, such as sports, would be very juddery on 24, 25 or 30 Hz progressive/film, while drama with less action, would generally be fine. The 2:1 interlace (two fields per frame) used for television creates twice as many vertical sweeps, so the image is refreshed twice as often. The result is a much better portrayal of movement.
There are more factors favouring interlace. Nearly everyone watches TV on cathode ray tubes (CRTs) whose phosphors have short persistence – i.e. their light fades fast after being 'lit' by the scanning electron beam. If the persistence were any longer, image movement would look 'sticky' or 'laggy'. There is also persistence in our vision, which is why the refresh rate for film was set at 24 Hz. During vertical scans at 24-30 Hz there is enough time for the phosphors to go dark over large contiguous areas, resulting in very flickery pictures. More recent technology, such as framestores in receivers, allows processing so that, for example, 24-30P scans can be displayed without flicker – typically as 48-60I images. As interlace refreshes pictures twice as fast, the flicker is much reduced. It is interesting to note that similar action is taken in cinemas with 'double shuttering' – showing each film frame twice – simply to reduce flicker.
Interlace has its downsides. While progressive scans give clear rock-steady pictures – as seen on computer screens, interlace causes horizontal and near horizontal edges to
‘twitter’ as some vital edge information is present in only one field (see diagram). Thus it appears to move up and down by a line at 25 or 30 Hz. This is hardly noticed on TV but using interlace on computer displays would make it difficult to see the fine detail and it would be very tiring on the eyes.
In addition, the ‘half-frame’ vertical scans of interlace have the effect of reducing detail in areas of pictures that contain movement. This is known as the Interlace Factor.
For normal viewing this is not a problem as our eyes cannot see as much detail in movement as when things are still. However, if a frame is frozen, then the areas where the image has changed between the two fields flicker badly. Hence freezes are usually taken from one field only or processed to resolve the differences between the fields.
[Diagram: the effect of interlace on apparent vertical position. A circle is scanned the same way on every once-per-frame vertical sweep of a progressive scan, but differently on each of the two interlaced fields, making its vertical position/height appear to change at the frame rate of the scan. This effect gives rise to the 'twittering' associated with interlace, which can make it difficult to see detail. Progressive frame rates are 24, 25 and 30 Hz for all HD sizes, plus 60 Hz for the 720-line format; interlaced field rates are 50 and 60 Hz (two fields make a frame) and there is no interlaced format for the 24 Hz frame rate.]
Such differences make it much more complex for any image processing that involves changing the size of pictures – either as a zoom or for format/standards conversion.
For best resolution, the necessary spatial interpolation needs to work from information in both fields and movement upsets the process. Some form of movement detection or working from only one field provides solutions, though the latter is less favoured as it reduces vertical resolution. As none of these problems apply to progressive scans, they are much easier to handle in image effects and processing.
As television transmission must fit within channels of limited bandwidth (6, 7 or 8 MHz depending on country), there is only so much information that can be sent. At the same time, the pictures must look sharp and clear as well as providing good portrayal of movement. Without the bandwidth limitation, and assuming that the CRTs’ scanning systems can work twice as fast, 50 or 60 Hz progressive would be ideal but, with analogue, there simply is not the bandwidth to do that and maintain spatial resolution.
Digital transmission changes much. MPEG-2 compression means that more useful picture information (and not the repeated/redundant detail) can be sent down each 'analogue' TV channel – typically five times more. Now that the bandwidth limitation has been circumvented there is a real choice between interlace and progressive. Despite the leap in technology, the fundamentals of display systems and our eyes mean that progressive is still not ideal for everything. Although some HD standards describe 50 or 60 frames progressive in the favoured 1080-line format, there are no VTRs or practical technical solutions available, so progressive scans are still limited to 25-30 Hz and will continue to render action poorly. However, processing in set-top boxes or digital receivers means it would be possible to offer 'double shuttering' to suppress flicker, if the displays can scan that fast. So, until 50-60 Hz progressive becomes widely available at 1080 lines, the best solution would be to switch between interlace and progressive to suit the content.
The ability of systems to interoperate – to understand and work with information passed from one to another. Applied to television this means video, audio and metadata from one system can be used directly by another. Digital signals may be originated in various formats and subjected to different types of compression so care is needed to maintain interoperability.
Defining the value of a new pixel from those of its near neighbours. When re-positioning or re-sizing a digital image, for dramatic effect or to change picture format, more, fewer or different pixels are required from those in the original image. Simply replicating or removing pixels causes unwanted artefacts. For far better results the new pixels have to be interpolated – calculated by making suitably weighted averages of adjacent input pixels – to produce a more transparent result. The quality of the results will depend on the techniques used and the number of pixels (points) taken into account (hence 16-point interpolation), or the area of original picture, used to calculate the result.
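The simplest weighted-average scheme is bilinear interpolation, which takes the four nearest input pixels; higher-quality systems use more points (such as the 16-point interpolation mentioned above). A minimal sketch, ignoring edge handling:

```python
# Bilinear interpolation: the new pixel at fractional position (x, y)
# is a weighted average of its four nearest input pixels.
def bilinear(image, x, y):
    x0, y0 = int(x), int(y)          # top-left of the 2x2 neighbourhood
    fx, fy = x - x0, y - y0          # fractional offsets, 0..1
    p00 = image[y0][x0]
    p01 = image[y0][x0 + 1]
    p10 = image[y0 + 1][x0]
    p11 = image[y0 + 1][x0 + 1]
    top = p00 * (1 - fx) + p01 * fx
    bottom = p10 * (1 - fx) + p11 * fx
    return top * (1 - fy) + bottom * fy

img = [[0, 100],
       [100, 200]]
print(bilinear(img, 0.5, 0.5))   # centre of the four pixels -> 100.0
```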
Interpolation between the same point in space (pixel) on successive frames. It can be used to provide motion smoothing and is extensively used in standards converters to reduce the judder caused by field rate changes – such as 50 to 60 Hz. The technique can also be adapted to create frame averaging for special effects.
Compression that occurs within one frame. The compression process only removes redundant information from within the frame itself. No account is taken of other frames.
JPEG and the ‘I’ frames of MPEG-2 are coded in this way and use DCT. In the MPEG-2 sequence only I-frames can be edited as they are the only independent frames.
Internet Protocol is the de facto standard for networking and is the widest used of the network protocols that carry the data and lie on top of physical networks and connections. Besides its Internet use it is also the main open network protocol that is supported by all major computer operating systems. IP, or specifically IPv4, describes the packet format for sending data using a 32-bit address to identify each device on the network with four eight-bit numbers separated by dots e.g. 188.8.131.52. Each IP data packet contains a source and destination address. There is now a move toward IPv6 which brings, among many other enhancements, 128-bit addressing – around 3.4 x 10^38 unique addresses – relieving IPv4's address shortage.
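The dotted-quad notation is simply a 32-bit number written as four 8-bit fields. This sketch packs and unpacks an IPv4 address (the address used is an arbitrary private-range example):

```python
# Convert between dotted-quad text and the underlying 32-bit integer.
def ip_to_int(addr):
    a, b, c, d = (int(part) for part in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def int_to_ip(value):
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

n = ip_to_int("192.168.0.1")
print(n)               # 3232235521
print(int_to_ip(n))    # 192.168.0.1
```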
Above IP are two transport layers. TCP (Transmission Control Protocol) provides reliable data delivery, efficient flow control, full duplex operation and multiplexing (simultaneous operation with many sources and destinations). It establishes a connection and detects corrupt or lost packets at the receiver and re-sends them. This TCP/IP, the most common form of IP, is used for general data transport but is slow and not ideal for video.
UDP (User Datagram Protocol) uses a series of ‘ports’ to connect data to an application.
Unlike the TCP, it adds no reliability, flow-control, or error-recovery functions but it can detect and discard corrupt packets by using checksums. This simplicity means its headers contain fewer bytes and consume less network overhead than TCP, making it useful for streaming video and audio where continuous flow is more important than replacing corrupt packets.
There are other IP applications that live above these protocols such as File Transfer
Protocol (FTP), Telnet for terminal sessions, Network File System (NFS), Simple Mail
Transfer Protocol (SMTP) and many more.
IP Datacast Forum (IPDC)
The IPDC Forum was launched in 2001 to promote and explore the capabilities of IP-based services over digital broadcast platforms (DVB and DAB). Participating companies include service providers, technology providers, terminal manufacturers and network operators.
The Forum aims to address business, interoperability and regulatory issues and encourage pilot projects.
See also: IP over DVB
IP over DVB
The delivery of IP data and services over DVB broadcast networks. Also referred to as datacasting, this takes advantage of the very wideband data delivery systems designed for the broadcast of digital television, to deliver IP-based data services – such as file transfers, multimedia, Internet and carousels, which may complement, or be instead of, TV.
Due to DVB-T’s ability to provide reliable reception to mobile as well as fixed receivers, there are possibilities to send IP-style service to people on the move. For interactivity, a return path can be established by telephone.
See also: IP Datacast Forum, Data carousel, DVB over IP
Integrated Receiver Decoder. A device that has both a demodulator and an MPEG decoder built in. This could be a digital television set or a digital set-top box.
Integrated Server Automation. This Quantel term describes the technology in its sQServers that manages all of the media, both broadcast and browse, in a single database. However large the system, every workstation can access all media for editing, playback or network transfer. ISA knows where every frame of every clip is stored and invisibly manages the movement of media from storage to editor to playout port.
It simplifies the architecture of larger, multi-server production systems, eliminating much of the automation hardware that has traditionally been required.
Integrated Services Digital Broadcasting – standard for digital broadcasting to be used in Japan. ISDB has many similarities to DVB including OFDM modulation for transmission and the flexibility to trade signal robustness against delivered data rate.
ISDB-T (terrestrial) is applicable to all channel bandwidth systems used world-wide – 6,
7, and 8 MHz. The transmitted signal comprises OFDM blocks (segments) allowing flexible services where the transmission parameters, including modulation and error correction, can be set segment-by-segment for each OFDM segment group of up to three hierarchical layers in a channel. Within one channel, the hierarchical system allows both robust SD reception for mobile and portable use and less robust HD – a form of graceful degradation.
International Organization for Standardization (commonly ISO). An international organisation that specifies international standards, including those for networking protocols, compression systems, disks, etc.
A form of data transfer that carries timing information with the data. Data is specified to arrive over a time window, but not at any specific rate within that time. ATM, IEEE 1394 and Fibre Channel can provide isochronous operation where links can be booked to provide specified transfer performance. For example, 60 TV fields can be specified for every second but their arrival may not be evenly spread through the period. As this is a guaranteed transfer it can be used for ‘live’ video but is relatively expensive on resources.
Independent Television Commission. It is responsible as a regulator, both legally and technically, for all independent programming in the United Kingdom, be it cable, satellite or terrestrial.
International Telecommunications Union. The United Nations regulatory body covering all forms of communication. The ITU sets mandatory standards and regulates the radio frequency spectrum. ITU-R (previously CCIR) deals with radio spectrum management issues and regulation while ITU-T (previously CCITT) deals with telecommunications standardisation. Suffix BT. denotes Broadcasting Television.
This standard defines the encoding parameters of digital television for studios. It is the international standard for digitising component television video in both 525 and 625 line systems and is derived from the SMPTE RP125. ITU-R BT.601 deals with both colour difference (Y, R-Y, B-Y) and RGB video, and defines sampling systems, RGB/Y, R-Y, B-Y matrix values and filter characteristics. It does not actually define the electro-mechanical interface – see ITU-R BT. 656.
ITU-R BT.601 is normally taken to refer to colour difference component digital video
(rather than RGB), for which it defines 4:2:2 sampling at 13.5 MHz with 720 luminance samples per active line and 8 or 10-bit digitising.
Some headroom is allowed. So, with 10-bit sampling, black level is at 64 (not 0) and white at level 940 (not 1023) – to minimise clipping of noise and overshoots. With 2^10 levels each for Y (luminance), Cr and Cb (the digitised colour difference signals), 2^30 – over a billion – unique colours can be defined.
The sampling frequency of 13.5 MHz was chosen to provide a politically acceptable common sampling standard between 525/60 and 625/50 systems, being a multiple of
2.25 MHz, the lowest common frequency to provide a static sampling pattern for both.
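Exact arithmetic shows why this works: 13.5 MHz, a multiple of 2.25 MHz, divides into a whole number of samples per total line in both systems, so the sampling pattern is static in each. A sketch using exact fractions:

```python
from fractions import Fraction

fs = Fraction(13_500_000)                      # BT.601 luminance sampling rate
line_rate_525 = Fraction(30_000, 1001) * 525   # 525 lines x 30000/1001 fps
line_rate_625 = Fraction(15_625)               # 625 lines x 25 fps

print(fs / line_rate_525)   # 858 samples per total line (525/59.94)
print(fs / line_rate_625)   # 864 samples per total line (625/50)
```

Both counts are whole numbers (720 of them fall in the active line), which is exactly the 'static sampling pattern' property the common 2.25 MHz base provides.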
The international standard for interconnecting digital television equipment operating to the 4:2:2 standard defined in ITU-R BT.601. It defines blanking, embedded sync words, the video multiplexing formats used by both the parallel (now rarely used) and serial interfaces, the electrical characteristics of the interface and the mechanical details of the connectors.
Recommendation for 1125/60 and 1250/50 HDTV formats defining values and a '4:2:2' and '4:4:4' sampling structure that is 5.5 times that of ITU-R BT.601. Actual rates are 74.25 MHz for luminance Y, or R, G, B, and 37.125 MHz for colour difference Cb and Cr, all at 8 or 10 bits. Note that this is an 'expanded' form of 601 and so uses non-square pixels.
The original standard referred only to 1035 and 1152 active lines corresponding to the total scan lines of 1125 and 1250. In 2000, ITU-R BT.709-4 added the 1080 active line standards and recommends that these are now used for all new productions. It also defines these 1080-line square-pixel standards as common image formats (CIF) for international exchange.
A general purpose programming language developed by Sun Microsystems and best known for its widespread use on the World Wide Web. Unlike other software, programs written in Java can run on any platform type, so long as they have a Java
Virtual Machine available.
Joint Photographic Experts Group (ISO/ITU-T). JPEG is a standard for the data compression of still pictures (intra-frame). In particular its work has been involved with pictures coded to the ITU-R BT.601 standard. It offers data compression of between two and 100 times and three levels of processing are defined: the baseline, extended and lossless encoding.
JPEG baseline compression coding, which is overwhelmingly the most common in both the broadcast and computer environments, starts with applying DCT to 8 x 8 pixel blocks of the picture, transforming them into frequency and amplitude data. This itself may not reduce data but then the generally less visible high frequencies can be divided by a high
‘quantising’ factor (reducing many to zero), and the more visible low frequencies by a lower factor. The ‘quantising’ factor can be set according to data size or picture quality requirements – effectively adjusting the compression ratio. The final stage is Huffman coding which is lossless but can further reduce data by 2:1 or more.
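The quantising stage can be sketched as follows. This is a minimal illustration only, using made-up coefficient and quantising values rather than the real JPEG tables:

```python
# Illustrative sketch of JPEG-style quantisation (not a full codec).
# A block of DCT coefficients is divided element-wise by a quantising
# matrix and rounded; many high-frequency terms fall to zero, which is
# where the data reduction comes from. All values here are invented.

def quantise(dct_block, q_matrix):
    """Divide each DCT coefficient by its quantising factor and round."""
    return [[round(c / q) for c, q in zip(row, q_row)]
            for row, q_row in zip(dct_block, q_matrix)]

# A hypothetical 4 x 4 corner of an 8 x 8 DCT block: large low-frequency
# terms top-left, small high-frequency terms bottom-right.
dct = [[620,  45,  12,   3],
       [ 40,  20,   6,   2],
       [ 10,   5,   3,   1],
       [  2,   1,   1,   0]]

# Higher quantising factors for the less visible high frequencies.
q   = [[16,  24,  40,  64],
       [24,  32,  56,  80],
       [40,  56,  96, 128],
       [64,  80, 128, 160]]

quantised = quantise(dct, q)
zeros = sum(v == 0 for row in quantised for v in row)
```

Most of the high-frequency terms become zero, and long runs of zeros are exactly what the final Huffman stage compresses so efficiently.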
Baseline JPEG coding is very similar to the I-frames of MPEG, the main difference being they use slightly different Huffman tables.
Just a bunch of disks. This could be a collection of disk drives connected on a single data bus such as Fibre Channel or SCSI. These are cheap and can offer large volumes of storage that may be shared among their users. As there are no intelligent controllers, items such as data speed and protection may well be compromised.
The vertical definition of a scanned image is only around 70% (the Kell Factor) of the line count due to a scan’s inability to show detail occurring between the lines. Note that, for interlaced scans, vertical definition is further reduced by the Interlace Factor to 50% or less overall during most vertical image movement.
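As a rough worked example of these factors (the 0.7 figures are approximations, and 576 active lines is assumed):

```python
# Effective vertical resolution using the Kell Factor (~0.7) and, for
# interlaced scans during vertical movement, the Interlace Factor
# (also taken as ~0.7 here). Both figures are approximations.

KELL = 0.7

def effective_lines(active_lines, interlaced=False, interlace_factor=0.7):
    lines = active_lines * KELL
    if interlaced:
        lines *= interlace_factor
    return round(lines)

# For 576 active lines:
progressive = effective_lines(576)        # about 403 lines
interlaced = effective_lines(576, True)   # about 282 lines - roughly 50% overall
```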
A machine-readable bar-code printed along the edge of camera negative film stock outside the perforations. It gives key numbers, film type, film stock manufacturer code, and offset from zero-frame reference mark (in perforations). It has applications in telecine for accurate film-to-tape transfer and in editing for conforming neg. cuts to EDLs.
A set of parameters defining a point in a transition, e.g. of a DVE effect. For example a keyframe may define a picture size, position and rotation. Any digital effect must have a minimum of two keyframes, start and finish, although all complex moves will use more – maybe as many as 100.
Increasingly, more parameters are becoming ‘keyframeable’, i.e. they can be programmed to transition between two, or more, states. Examples are colour correction to make a steady change of colour, and keyer settings, perhaps to make an object slowly appear or disappear.
The process of selectively overlaying an area of one picture (or clip) onto another. If the switch between the overlaid and background pictures is simply ‘hard’ this can lead to jagged edges of the overlaid, or keyed, pictures. They are usually subjected to further processing to produce ramped key edges to give a cleaner, more convincing, result.
The whole technology of deriving key signals, and the colour corrections applied to keyed image edges, has greatly expanded through the use of digital technology, so that many operations may be used together, e.g. softening the key, colour correcting key spill areas, and more.
KLV is a data encoding protocol (SMPTE 336M). The Key is a unique, registered sequence of bits that defines the type of content that is coming (video, audio, EDL, etc.) and Length
– number of bytes ahead of Value, the content ‘payload’ itself. Compliance to KLV means that a wider range of equipment and applications can understand the files.
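The triplet structure can be sketched as below. This is an illustration only – the 16-byte key shown is a placeholder, not a registered SMPTE Universal Label:

```python
# Minimal sketch of KLV packing as described above: a 16-byte Key,
# a BER-encoded Length, then the Value payload.

def ber_length(n):
    """Encode a length in BER: short form below 128, else long form."""
    if n < 128:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, 'big')
    return bytes([0x80 | len(body)]) + body

def klv_pack(key, value):
    assert len(key) == 16, "SMPTE 336M keys are 16 bytes"
    return key + ber_length(len(value)) + value

key = bytes(16)            # placeholder key, not a real Universal Label
payload = b'hello'
packet = klv_pack(key, payload)
# 16 key bytes + 1 length byte + 5 payload bytes = 22 bytes
```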
Latency (of data)
The delay between requesting and accessing data. For disk drives it refers to the delay due to disk rotation only – even though this is only one of several factors that determine the time to access data from disks. The faster a disk spins the sooner it will be at the position where the required data starts. As disk diameters have decreased, rotational (spindle) speeds have tended to increase, but there is still much variation. Modern 3.5-inch drives typically have spindle speeds of between 7,200 and 10,000 RPM, so one revolution is completed in
8 or 6 ms respectively. This is represented in the disk specification as average latency of 4 or 3 ms. It is reduced to 2 ms in the latest drives operating at 15,000 RPM.
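The quoted figures follow directly from the spindle speed, since average latency is half of one revolution:

```python
# Average rotational latency: on average the required data is half a
# revolution away when the request arrives.

def avg_latency_ms(rpm):
    rev_time_ms = 60_000 / rpm    # one revolution, in milliseconds
    return rev_time_ms / 2

# The figures quoted above:
#  7,200 RPM -> ~8.3 ms/rev, ~4.2 ms average latency
# 10,000 RPM ->    6 ms/rev,    3 ms average latency
# 15,000 RPM ->    4 ms/rev,    2 ms average latency
```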
A collection, or ‘pack’ of clip layers can be assembled to form a composite layered clip.
Layers may be background video or foreground video with their associated matte run.
The ability to compose many layers simultaneously means the result can be seen as it is composed and adjustments made as necessary.
The process of editing footage that can only be accessed or played in the sequence recorded. Tape is linear in that it has to be spooled for access to any material and can only play pictures in the order they are recorded.
With spooling, jogging and pre-rolls absorbing upwards of 40 percent of the time in a
VTR edit suite, linear editing is now considered slow. The imposition of having to record items to an edit master tape in sequence limits flexibility for later adjustments: e.g.
inserting shots between existing material may involve either starting the job again or re-dubbing the complete piece.
In linear keying the ratio of foreground to background at any point is determined on a linear scale by the level of the key (control) signal.
This form of keying provides the best possible control of key edge detail and anti-aliasing.
It is essential for the realistic keying of semi-transparent effects such as transparent shadows, through-windows shots and partial reflections.
Least Significant Bit. Binary numbers are represented by a series of ones and zeros.
Binary 1110 = Decimal 14
In this example the right-most binary digit, 0, is the LSB as it controls the lowest value in the number.
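The same example worked through: each bit's weight doubles from right to left, and the right-most (LSB) weight is 1:

```python
# Binary 1110 broken into bit weights; the right-most bit is the LSB.
def bits(value, width):
    """Return the bits of value, most significant first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]

digits = bits(0b1110, 4)    # [1, 1, 1, 0]
# weights:  8  4  2  1  ->  8 + 4 + 2 + 0 = 14; the LSB (weight 1) is 0
value = sum(bit * 2 ** i for i, bit in enumerate(reversed(digits)))
```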
Longitudinal Timecode. Timecode recorded on a linear track on tape and read by a static head. This can be easily read when the tape is moving forwards or backwards but not at freeze frame – when VITC, timecode recorded with the picture material, can be used.
A component of video, the black and white or brightness element, of an image. It is written as Y, so the Y in Y,B-Y,R-Y, YUV, YIQ, Y,Cr,Cb is the luminance information of the signal.
In a colour TV system the luminance signal is usually derived from the RGB signals, originating from cameras or telecines, by a matrix or summation of approximately:
Y = 0.3R + 0.6G + 0.1B
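The matrix above as a one-line function (RGB values assumed normalised to the 0–1 range):

```python
# Luminance from RGB using the approximate weightings given above.
def luminance(r, g, b):
    return 0.3 * r + 0.6 * g + 0.1 * b

# Green dominates: pure green gives Y = 0.6, pure blue only Y = 0.1,
# and equal R, G and B (white) sums to Y = 1.0.
```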
Magneto optical disks
Main (Level/Profile) (ML) (MP)
Frequencies of millions of cycles (or events) per second.
Data about data. Data about the video and audio but not the actual video or audio themselves. This is important for labelling and finding data – either in a ‘live’ data stream or an archive. Within studios and in transmission, digital technology allows far more information to be added. Some believe metadata will revolutionise every aspect of production and distribution. Metadata existed long before digital networks; video timecode and film frame numbers are but two examples.
Software, not hardware. This exists above the operating system to provide a middle layer offering APIs for applications programmers but it is not an application itself. An example is
Multimedia Home Platform (MHP) which is beginning to be used in set-top boxes.
A prediction for the rate of development of modern electronics. This has been expressed in a number of ways but in general states that the density of information storable in silicon roughly doubles every year. Or, the performance of silicon will double every eighteen months, with proportional decreases in cost. For more than two decades this prediction has held true.
Moore’s law initially talked about silicon but it could be applied to other aspects, such as disk drive capacity, which has doubled every two years since 1980 – a rate that has held true, or been exceeded, and still continues.
Media Object Server (protocol) – a communications protocol for newsroom computer systems (NCS) and broadcast production equipment. It is a collaborative effort between many companies to enable journalists to see, use, and control a variety of devices from their desktop computers, effectively allowing access to all work from one screen. Such devices include video and audio servers and editors, still stores, character generators and special effects machines.
MOS uses a TCP/IP-based protocol and is designed to allow integration of production equipment from multiple vendors with newsroom computers via LANs, WANs and the
Internet. It uses a ‘one-to-many’ connection strategy – multiple MOSs can be connected to a single NCS, or a single MOS to many NCSs.
A high-performance, perceptual audio compression coding scheme which exploits the properties of the human ear and brain while trying to maintain perceived sound quality.
MPEG-1 and 2 define a family of three audio coding systems of increasing complexity and performance – Layer-1, Layer-2 and Layer-3. MP3 is shorthand for Layer-3 coding.
MPEG defines the bitstream and the decoder but, to allow for future improvements, not an encoder. MP3 is claimed to achieve ‘CD quality’ at 112 – 128 kb/s – a compression of between 10 and 12:1.
Moving Picture Experts Group. This is a working group of ISO/IEC for the development of international standards for compression, decompression, processing, and coded representation of moving pictures, audio and their combination. It has also extended into metadata. Four MPEG standards were originally planned but the accommodation of HDTV within MPEG-2 has meant that MPEG-3 is now redundant. MPEG-4 is very broad and extends into multimedia applications. Work is in progress. MPEG-7 is about metadata and work has started on MPEG-21 to describe a ‘big picture’ multimedia framework.
Website: http://mpeg.telecomitalialab.com & www.mpeg.org
A compression scheme designed to work at 1.2 Mb/s, the basic data rate of CD-ROMs, so that video could be played from CDs. Its quality is not sufficient for TV broadcast.
A family of inter- and intra-frame compression systems designed to cover a wide range of requirements from ‘VHS quality’ all the way to HDTV through a series of compression algorithm ‘profiles’ and image resolution ‘levels’. With data rates from below 4 to 100
Mb/s, the family includes the compression system that delivers digital TV to homes and that puts video onto DVDs.
MPEG-2 is highly asymmetrical. Coding the video is very complex, generally producing I,
P and B-frames, but is designed to keep the decoding at the reception end as simple, and therefore cheap, as possible. Since this operates in a one-to-many world, this is a good idea.
MPEG-2 generally achieves very high compression ratios and can offer better quality pictures than Motion-JPEG for a given bit rate, but it is less editable. It uses intra-frame compression to remove redundancy within frames as well as inter-frame compression to take advantage of the redundancy contained over series of many pictures. This creates long groups of pictures (GOPs – see following) which is highly efficient but not ideal for frame-accurate editing.
Of the six profiles and four levels, creating 24 possible combinations, 12 have already been implemented (see Profiles and Levels table following). The variations defined are so wide that it would not be practical to build a universal coder or decoder. Current interest includes the Main Profile @ Main Level and Main Profile @ High Level covering current
525/60 and 625/50 broadcast television as well as DVD-video with MP@ML and HDTV with MP@HL. Besides the transmission/delivery applications which use 4:2:0 sampling, the 422 Profile, which uses 4:2:2 sampling, was designed for studio use and offers greater chrominance bandwidth which is useful for post production.
Website: http://mpeg.telecomitalialab.com & www.mpeg.org
Bi-directional predictive frames composed by assessing the difference between the previous and the next frames in a sequence of television pictures. As they contain only predictive information they do not make up a complete picture and so have the advantage of taking up much less data than the I-frames. However, to see the original picture requires information from a whole sequence (GOP) of MPEG frames to be decoded which must include an I-frame.
Blocking and ‘Blockiness’
Artefact of compression generally showing momentarily as rectangular areas of picture with distinct boundaries. This is one of the major defects of digital compression, especially MPEG, its visibility generally depending on the amount of compression used, the quality and nature of the original pictures as well as the quality of the coder.
The visible blocks may be 8 x 8 DCT blocks or ‘misplaced blocks’ – 16 x 16 pixel
macroblocks, due to the failure of motion prediction/estimation in an MPEG coder or other motion vector system, e.g. standards converter or, more commonly, the fault of the transmission system.
Rectangular areas of pictures, usually 8 x 8 pixels in size, which are individually subjected to DCT coding as part of a digital picture compression process.
Group Of Pictures. In an MPEG signal the GOP is a group of pictures or frames between successive I-frames, the others being P and/or B-frames. In the widest application, television transmission, the GOP is typically 12 frames in a 25 fps signal and 15 frames in a 30 fps signal (i.e. about half a second). Such groups are sometimes referred to as Long
GOP. Within a transmission the GOP length can vary – a new sequence starting with an
I-frame may be forced if there is a big change at the input, such as a cut.
MPEG-2 12 frame GOP
*Note: for transmission the last ‘I’-frame is played out ahead of the last two ‘B’-frames to form the sequence I1, B1, B2, P1, B3, B4, P2, B5, B6, P3, I2, B7, B8
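The reordering in the note can be expressed as a small sketch, taking the display-order GOP as shown in the figure:

```python
# Display order of the 12-frame GOP plus the following I-frame, and the
# reordering from the note above: the closing I-frame is sent ahead of
# the two B-frames that precede it, since they cannot be decoded
# without it.
display = ['I1', 'B1', 'B2', 'P1', 'B3', 'B4', 'P2',
           'B5', 'B6', 'P3', 'B7', 'B8', 'I2']
transmitted = display[:-3] + [display[-1]] + display[-3:-1]
```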
The length of a GOP determines the editability of that MPEG signal. Additional processing, possibly requiring an MPEG decode/recode, will be needed to successfully cut within a GOP. This is one of the reasons that MPEG is not generally regarded as editable (see Cut edit), although much work has been done on this subject.
Intra-frames – these compressed frames contain all required data to reconstruct a whole picture independently of any others. They produce more data than B or P frames, which cannot be decoded without reference to I-frames. Their compression technology is very similar to JPEG.
Used from Main Profile upwards, these contain only predictive information (not a whole picture) generated by looking at the difference between the present frame and the previous one. As with B-frames they hold less data than I-frames and a whole GOP must be decoded to see the picture.
Picture source format ranging from about VCR quality to full HDTV. For example:
Main Level (ML) – maximum size 720 x 576 – equivalent to ITU-R BT.601.
High Level (HL) – maximum size of 1920 x 1152 – as defined by ITU-R BT.709.
Level            Simple       Main         4:2:2        SNR*         Spatial*     High

High                          1920 x 1152                                         1920 x 1152
                              I, B, P                                             I, B, P
                              80 Mb/s                                             100 Mb/s

High-1440                     1440 x 1152                            1440 x 1152  1440 x 1152
                              I, B, P                                I, B, P      I, B, P
                              60 Mb/s                                60 Mb/s      80 Mb/s

Main             720 x 576    720 x 576    720 x 608    720 x 576                 720 x 576
                 I, P         I, B, P      I, B, P      I, B, P                   I, B, P
                 15 Mb/s      15 Mb/s      50 Mb/s      15 Mb/s                   20 Mb/s

Low                           352 x 288                 352 x 288
                              I, B, P                   I, B, P
                              4 Mb/s                    4 Mb/s

MPEG-2 Family of Profiles and Levels
*SNR and Spatial profiles are both scalable
Note: Data rates shown are the maximums permitted for the Profile/Level
A group of picture blocks, usually four (16 x 16 pixels overall), which are analysed during
MPEG coding to give an estimate of the movement of particular elements of the picture between frames. This generates the motion vectors which are then used to place the macroblocks in decoded pictures.
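A toy sketch of the block-matching idea behind motion estimation, much simplified (2 x 2 blocks and a tiny search window instead of 16 x 16 macroblocks):

```python
# Toy block matching: for one block, search a small window in the
# previous frame for the best match and return the offset as a motion
# vector. Real MPEG coders do this for every macroblock, which is why
# motion estimation dominates coder workload.

def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
                          for a, b in zip(row_a, row_b))

def block(frame, y, x, size):
    return [row[x:x + size] for row in frame[y:y + size]]

def motion_vector(prev, curr, y, x, size=2, search=2):
    """Best (dy, dx) offset locating curr's block at (y, x) within prev."""
    target = block(curr, y, x, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y + dy, x + dx
            if py < 0 or px < 0 or py + size > len(prev) or px + size > len(prev[0]):
                continue
            cost = sad(block(prev, py, px, size), target)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[1]

# A bright 2 x 2 patch moves one pixel right between frames:
prev = [[0, 9, 9, 0],
        [0, 9, 9, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
curr = [[0, 0, 9, 9],
        [0, 0, 9, 9],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
mv = motion_vector(prev, curr, 0, 2)   # the block's content came from (0, -1)
```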
Digital audio compression part of MPEG using auditory masking techniques. MPEG-1 audio specifies mono or two-channel audio which may be Dolby Surround coded at bit rates between 32 kb/s and 384 kb/s. MPEG-2 audio specifies up to 7.1 channels (but 5.1
is more common), rates up to 1 Mb/s and supports variable bit-rate as well as constant bit-rate coding. MPEG-2 handles backward compatibility by encoding a two-channel
MPEG-1 stream, then adds the 5.1/7.1 audio as an extension.
Direction and distance information used in MPEG coding and some standards converters to describe the movement of macroblocks (of picture) from one picture to the next. A large part of coder power is needed to determine the motion vectors. Their correctness is a big factor in MPEG-2 coders’ quality and efficiency.
See also: Macroblock
A collection of compression tools that make up a coding system. The profiles become more complex from left to right (see MPEG-2 section, Family of Profiles and Levels).
– 422 Profile @ Main Level carries more chroma information than 4:2:0. A constrained version is used by Betacam SX, with a 2-frame (IB) GOP producing a bitstream of 18 Mb/s, and by I-frame-only IMX VTRs at 50 Mb/s.
– Main Profile @ High Level is for HDTV and is implemented in DVB and ATSC systems with bitstreams running up to 19.4 Mb/s.
– Main Profile at Main Level covers broadcast television formats up to 720 pixels x 576 lines and 30 fps so includes 720 x 486 at 30 fps and 720 x 576 at 25 fps.
The economy of 4:2:0 sampling is used and bit rates vary from as low as 2 Mb/s on multiplexed transmissions, up to 9 Mb/s on DVD-video.
Website: http://mpeg.telecomitalialab.com & www.mpeg.org
ISO/IEC 14496. Whereas MPEG-1 and MPEG-2 are only about efficient storage and transmission of video and audio, MPEG-4 adds interactive graphics applications (synthetic content) and interactive multimedia (World Wide Web, distribution of and access to content). First published in early 1999, it has already grown to embrace a wide range of service types.
It specifies low bit rates (5-64 kb/s) for mobile and internet applications with frame rates up to 15 Hz, and images up to 352 x 288 pixels. At the higher end it works up to SD and HD resolutions and beyond, using bit rates up to 10 Mb/s. The video coding is said to be three times more complex than MPEG-2 but is already 15 percent more efficient and may rise to 50 percent. There is also more complexity for the decoders but it is hoped that Moore’s Law will take care of that.
It also includes multimedia storage, access and communication as well as viewer interaction and 3D broadcasting. Aural and visual objects (AVOs) represent the content which may be natural – from cameras or microphones, or synthetic – generated by computers. Their composition is described by the Binary Format for Scene description
(BIFS) – scene construction information to form composite audiovisual scenes from the
AVOs. Hence, a weather forecast could require relatively little data – a fixed background image with a number of cloud, sun, etc., symbols appearing and moving, audio objects to describe the action and a video ‘talking head’ all composed and choreographed as defined by the BIFS. Viewer interactivity is provided by the selection and movement of objects or the overall point of view – both visually and aurally.
QuickTime 6 and RealPlayer 8 already make use of MPEG-4. DVB is showing interest in including it in future specifications but there is no sign of a large-scale decamp from the very well established MPEG-2.
The value of information often depends on how easily it can be found, retrieved, accessed, filtered and managed. MPEG-7, formally named ‘Multimedia Content
Description Interface’, provides a rich set of standardised tools to describe multimedia content. Both human users and automatic systems that process audiovisual information are within its scope. It is intended to be the standard for description and search of the vast quantity of audio and visual content that is now becoming available – including that from private databases, broadcast and via the World Wide Web. Applications include database retrieval from digital libraries and other libraries, areas like broadcast channel selection, multimedia editing and multimedia directory services.
MPEG-7 offers a comprehensive set of audiovisual Description Tools (the metadata elements, their structure and relationships that are defined as Descriptors and
Description Schemes). It specifies a Description Definition Language (DDL) so that
material with associated MPEG-7 data can be indexed and allow fast and efficient searches. These searches will permit not only text-based inquiries, but also for scene, motion and visual content. Material may include stills, graphics, 3D models, audio, speech and video as well as information about how these elements are combined.
Besides uses in programme-making MPEG-7 could help viewers by enhancing EPGs and programme selection.
Started in June 2001, work on MPEG-21 aims to create descriptions for a multimedia framework to provide a ‘big picture’ of how the system elements relate to each other and fit together.
The resulting open framework for multimedia delivery and consumption has content creators and content consumers as focal points to give creators and service providers equal opportunities in an MPEG-21 open market. This will also give the consumers access to a large variety of content in an interoperable manner.
Standardisation work includes defining items such as Digital Item Declaration (DID),
Digital Item Identification and Description (DII&D), Intellectual Property Management and Protection (IPMP).
Most Significant Bit. Binary numbers are represented by a series of ones and zeros.
Binary 1110 = Decimal 14
In this example the left-most binary digit, 1, is the most significant bit – here representing the presence of 2³, i.e. 8.
Mean Time Between Failure. A statistical assessment of the average time taken for a unit to fail – a measure of predicted reliability. The MTBF of a piece of equipment is dependent on the reliability of each of its components. Generally the more components the lower the MTBF, so packing more into one integrated circuit can reduce the component count and increase the reliability. Modern digital components are highly reliable. Even complex electro-mechanical assemblies such as disk drives now offer
MTBFs of up to 1,000,000 hours – some 114 years! Note this does not mean a drive has been run for 114 years and failed just once, nor that it is expected to run for this period without failure, but it does show the average failure rate of many components of the same type.
The presentation of more than one medium. Strictly speaking TV is multimedia (if you have the sound up). More typically it is pictures (moving and still), sound and often text combined in an interactive environment. This implies the use of computers, with the significant amount of data this requires usually supplied either on CD-ROM or via a data link. ‘Surfing the net’ is an example. High compression ratios are used to allow the use of pictures. One of the first applications was in education; now it is commonly seen at home via the Internet or DVDs.
Multimedia has a wide meaning. Another example is in the production of material which is published in many forms. For example pictures from television productions can be transferred to print for listings magazines, to EPGs and to advertising. Such transfers are commonly handled through network connections.
A term for the group of compressed digital video channels multiplexed into a single transmission stream occupying the space of one analogue terrestrial TV channel. The term ‘Bouquet’ has also been used in this context.
The Material Exchange Format is aimed at the exchange of program material between file servers, tape streamers and digital archives. It usually contains one complete sequence but this may comprise a sequence of clips and program segments. There are six operational patterns: Simple, Compiled, Compound, Uncompiled Simple,
Uncompiled Compound and Metadata-only.
As MXF is derived from the AAF data model it integrates closely with AAF files as well as stream formats. Bridging file and streaming transfers, MXF helps move material between AAF file-based post production and streaming program replay using standard networks. This set up extends the reliable essence and metadata pathways of both formats to reach from content creation to playout. The MXF body carries the content. It can include compressed formats such as MPEG and DV as well as uncompressed video and contains an interleaved sequence of picture frames, each with audio and data essence plus frame-based metadata.
MXF has been submitted to the SMPTE as a proposed standard.
Newsroom Computer System. The name sprang up when the only computer in a TV news area was used for storing and editing the textual information available from news services. It also created the running order for the bulletin and was interfaced to many other devices around the production studio. Today the NCS lives on… but it is no longer the only computer around the newsroom!
1. In TCP/IP, the network layer is responsible for accepting IP (Internet Protocol) datagrams and transmitting them over a specific network.
2. The third layer of the OSI reference model of data communications.
Network File System. Developed by Sun Microsystems NFS allows sets of computers to access each other’s files as if they were locally stored. NFS has been implemented on many platforms and is considered an industry standard.
See also: IP
8 binary bits = 1 Byte
4 binary bits = 1 Nibble
Near Instantaneously Companded Audio Multiplex. This digital audio system, used in
Europe, uses compression techniques to present ‘very near CD’ quality stereo into analogue TV transmissions.
Irregular level fluctuations of a low order of magnitude. All analogue video signals contain random noise. Ideally for digital sampling, the noise level should not occupy more than one LSB of the digital dynamic range. Pure digitally generated signals however do not contain any noise – a fact that can be a problem under certain conditions. Generally in ITU-R BT.601 systems fine noise is invisible; coarse or large area noise may be perceptible under controlled viewing conditions.
With digital compression, noise has a new importance. Noise, which can originate from analogue sources, can be hard to distinguish from real wanted high frequency information. This means compression coders can waste valuable output bandwidth on the noise to the cost of the real pictures.
A mix of two pictures which is controlled by their luminance levels relative to each other, as well as a mix value K (between 0 and 1), e.g. set by the position of a switcher lever arm. A and B sources are scaled by factors K and 1-K but the output signal is switched to whichever has the greater instantaneous product of the scaling and the luminance values. The output of any pixel is either signal A or B but not a mix of each. So if K = 0.5, in areas where picture A is brighter than B then only A will be seen. Thus two clips of single subjects shot against a black background can be placed in one picture.
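The switching rule can be sketched per pixel (the luminance values used below are arbitrary illustration figures):

```python
# Non-additive mix: the output for each pixel is whichever scaled
# source has the greater product of mix value and luminance - never
# a blend of the two.

def nam_pixel(a, b, k):
    """Output k*A if k*A >= (1-k)*B, else (1-k)*B."""
    return k * a if k * a >= (1 - k) * b else (1 - k) * b

# With k = 0.5 the brighter source simply wins, which is why a subject
# shot against black always shows through the other picture.
```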
The term has also come to encompass some of the more exotic types of picture mixing available today: for example to describe a mix that could add smoke to a foreground picture – perhaps better termed an additive mix.
Non drop-frame timecode
Timecode that does not use drop-frame and always identifies 30 frames per second.
This way the timecode running time will not exactly match normal time. The mismatch amounts to 1:1000, an 18-frame overrun every 10 minutes. This applies where 59.94,
29.97 or 23.976 picture rates are used in 525/60 systems as well as DTV.
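The arithmetic behind the 18-frame figure:

```python
# Non-drop-frame timecode counts a nominal 30 frames per second while
# the pictures actually run at 30 x 1000/1001 (29.97...) frames per
# second, so over 10 minutes the count and the clock part company by
# about 18 frames.

nominal_fps = 30
real_fps = 30 * 1000 / 1001              # 29.97... actual frame rate
seconds = 10 * 60

frames_counted = nominal_fps * seconds   # 18,000 frames labelled
frames_shown = real_fps * seconds        # ~17,982 frames actually played
mismatch = frames_counted - frames_shown # ~18 frames every 10 minutes
```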
Nonlinear means not linear – that the recording medium is not tape and editing can be performed in a non-linear sequence – not necessarily the sequence of the programme.
It describes editing with quick access to source clips and record space – usually using computer disks to store footage. This removes the spooling and pre-rolls of VTR operations so greatly speeding work. Yet greater speed and flexibility are possible with real-time random access to any frame (true random access).
The National Television Systems Committee. A United States broadcast engineering advisory group.
NTSC (television standard)
The colour television system used in the USA, Canada, Mexico, Japan and more, where
NTSC M is the broadcast standard (M defining the 525/60 line and field format). It was defined by the NTSC in 1953.
The bandwidth of the NTSC system is 4.2 MHz for the luminance signal and 1.3 and 0.4
MHz for the I and Q colour channels.
Note that the NTSC term is often incorrectly used to describe the 525-line format even when it is in component or digital form. In both these cases, NTSC, which includes the colour coding system, is not used.
Near video on demand – rapid access to programme material on demand often achieved by providing the same programme on a number of channels with staggered start times. Many of the hundreds of TV channels now on offer will be made up of NVOD services. These are delivered by transmission servers.
The minimum sampling frequency at which an analogue signal can be faithfully sampled so that it can be accurately reconstructed from the digital result. This is twice the maximum frequency present in the signal to be sampled.
A low-pass filter is required ahead of the sampler to block all frequencies above half the sampling frequency, as any that are sampled will cause aliasing.
In practice significantly higher sampling frequencies are used in order to stay well above the Nyquist frequency, where response is zero, and so avoid the chance of producing aliases (unwanted artefacts) and the severe attenuation, according to a Sin x/x characteristic, that exists around the Nyquist point. For example in ITU-R BT.601 the maximum luminance frequency is 5.5 MHz and its sampling frequency is 13.5 MHz.
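The folding effect can be checked numerically. A 12 MHz sine sampled at the 601 luminance rate of 13.5 MHz – well below the 24 MHz its own Nyquist limit would demand – produces exactly the same sample values as a 1.5 MHz sine:

```python
# Aliasing demonstrated numerically: a frequency above half the
# sampling rate folds back as f_sample - f_signal and becomes
# indistinguishable from the alias once sampled.
import math

fs = 13.5e6              # ITU-R BT.601 luminance sampling frequency
f_signal = 12.0e6        # illegal input: above fs/2 = 6.75 MHz
f_alias = fs - f_signal  # folds back to 1.5 MHz

for n in range(8):
    t = n / fs
    assert abs(math.cos(2 * math.pi * f_signal * t)
               - math.cos(2 * math.pi * f_alias * t)) < 1e-9
```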
A decision-making process using low-cost equipment usually to produce an EDL or a rough cut which can then be conformed or referred to in a high quality online suite – so reducing decision-making time in the more expensive online environment. Most offline suites enable shot selection and the defining of basic transitions such as cuts and dissolves which are carried by EDLs. It is only with the arrival of AAF that there has been an open standard for transporting a much wider range of decisions, including DVE, colour corrections, as well as other metadata, between systems from different manufacturers.
See also: AAF
Open Media Framework Interchange is an open standard for post production interchange of digital media among applications and across platforms. A precursor to
AAF, it describes a file format and supports video, audio, graphics, animation and effects as well as comprehensive edit decision information. Transfer may be by removable disk or over a high-speed network or telephone line to another location.
Production of the complete, final edit performed at full programme quality – the buck stops here! Being higher quality than offline, time costs more but the difference is reducing. Preparation in an offline suite will help save time and money in the online.
To produce the finished edit, online has to include a wide range of tools, offer flexibility to try ideas and accommodate late changes, and to work fast to maintain the creative flow and to handle pressured situations.
Open Systems Interconnect (OSI)
The OSI Basic Reference Model describes a general structure for communications, defined by the ISO, which comprises seven layers and forms a framework for the coordination of current and future standards – but not defining the standards themselves.
Operating system (OS)
The base program that manages a computer and gives control of the functions designed for general purpose usage – not for specific applications. Common examples are MS-DOS, Windows and Linux for PCs, Mac OS for Apple Macintosh and UNIX. For actual use, for example as a word processor, specific applications software is run on top of the operating system. While general purpose operating systems allow a wide range of applications to be used they do not necessarily allow the most efficient or fastest possible use of the hardware for the application.
Disks using optical techniques for recording and replay of material. These offer large storage capacities on a small area, the most common form being the 5.25-inch compact disk and, more recently, DVDs. They are removable and have rather slower data rates than fixed magnetic disks – but faster than floppies.
Write Once, Read Many or ‘WORM’ optical disks first appeared with 2 GB capacity on each side of a 12-inch platter – useful for archiving images. Today CD-Rs or DVD-Rs can do the job.
In 1989 the read/write magneto-optical (MO) disk was introduced which can be rewritten around a million times. With its modest size, just 5.25-inches in diameter, the
ISO standard cartridge then stored 325 MB per side – offering low priced removable storage for over 700 TV pictures per disk. MO disks with capacities up to 5.2 GB became available in 2000. Besides the obvious advantages for storing TV pictures this is particularly useful where large format images are used, in print and in film for example.
CD-RWs provide up to 700 MB of low-cost re-writable storage and DVD-Rs currently offer 4.7 GB. A further development is the new Blu-ray disk technology which is still in the lab but will offer 27 GB in a single layer, single sided disk.
Oversampling
Sampling information at a higher resolution than is required for the output format.
For example, an HD picture can be regarded as an over sampled version of SD. SD pictures created from down res’d HD are generally clearer and sharper than those made directly in SD. This is because the size reduction process tends to lower noise and the output pictures are derived from more information than is available in a direct SD scan.
An increasing amount of SD material is originated this way. Similarly, 35 mm film provides an over sampled source of SD.
Pack
A set of clips, mattes and settings for DVE, colour corrector, keyer, etc. that are used together to make a video layer in a composited picture. Quantel equipment allows packs to be saved and archived so they can be used later for re-works.
PAL
Phase Alternating Line. The colour coding system for television widely used in Europe and throughout the world, almost always with the 625-line/50-field system. It was derived from the NTSC system but, by reversing the phase of the reference colour burst on alternate lines (hence Phase Alternating Line), it is able to correct for hue shifts caused by phase errors in the transmission path.
Bandwidth for the PAL-I system is typically 5.5 MHz luminance, and 1.3 MHz for each of the colour difference signals, U and V.
Note that the PAL term is frequently used to describe any 625/50I analogue format – even if it is component, or in the 576/50I digital television system where PAL coding is not used.
PAL-M
A version of the PAL standard, but using a 525-line, 60-field structure. Used only in parts of South America (e.g. Brazil).
Parallel processing
Using several processors simultaneously with the aim of increasing speed over single-processor performance. It often refers to array-processor computer hardware that carries out multiple, often identical, mathematical computations at the same time.
Generally array processors are designed with specific tasks in mind and so are not suitable for running complex operational software. Because of system-administration overheads, and because not all processors complete their tasks at the same moment and so cause waiting time, the increase in speed gained by sharing a task is generally not proportional to the number of processors available.
Because a parallel-processing computer has a very different structure, software designed to run on a single-processor system may well need major changes to run on a parallel system.
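The diminishing return from adding processors is usually summarised by Amdahl’s law. A rough sketch in Python, using purely illustrative figures (the parallelisable fraction of a real task depends entirely on the application):

```python
# Amdahl's law: overall speed-up from n processors when only a
# fraction p of the task can run in parallel (illustrative only).
def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the work parallelisable, eight processors give
# well under an 8x gain, and the ceiling is 1/(1-p) = 10x.
for n in (2, 4, 8, 64):
    print(n, round(speedup(0.9, n), 2))
```

This is why, as noted above, the gain is generally not proportional to the number of processors.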
Photo-real
Video effects that are constructed in such a way that they look totally real, not synthetic, are referred to as photo-real effects. This use of effects has increased rapidly and has changed the way many productions are shot and post produced – leading to lower budgets and better-looking results.
Achieving photo-real results requires careful planning, from the shoot and computer imagery through to compositing in post production. Excellence in keying, so that there are no telltale blue-screen haloes or colour spill, is among the many techniques required for successful results.
Pixel (or Pel)
A shortened version of ‘Picture cell’ or ‘Picture element’. The name given to one sample of picture information. Pixel can refer to an individual sample of R, G, B, luminance or chrominance, or sometimes to a collection of such samples if they are co-sited and together produce one picture element.
PLD
Programmable Logic Device. This is a family of devices that has included PROMs
(Programmable Read Only Memories), PLAs (Programmable Logic Arrays) and PALs
(Programmable Array Logic). Today FPGAs (Field Programmable Gate Arrays) are the main interest. These range in size and complexity from a few dozen up to 10 million gates to provide a compact and efficient means of implementing complex non-standard logic functions. They are widely used in Quantel equipment where FPGAs also offer a fast track for the implementation of new improvements and ideas.
Plug-in
Software from a third party that brings added functionality to a computer application.
For post production this may add highly specialised aspects to digital effects.
POTS
Plain Old Telephone Service. This is the analogue connection that most people still speak on, or connect their modems or fax machines to. Its applications have gone far beyond its initial aims.
Progressive scan
Method of scanning lines down a screen where all the lines of a picture are displayed in one continuous vertical scan. There are no fields or half pictures as with interlaced scans.
It is commonly used with computer displays and is now starting to be used for some DTV formats, e.g. – 1080/24P, 720/60P. The ‘P’ denotes progressive.
A high picture refresh rate is required to give good movement portrayal, such as for fast action and camera pans, and to avoid a flickery display. For television applications using progressive scanning, this implies a high bandwidth or data rate and high scanning rates on CRT displays. The vertical definition is equal to around 70% of the number of lines (Kell Factor) and does not show the dither of detail associated with interlaced scans.
Publishing
Term used to describe the transfer of video from an edited master to deliverables – a process that will probably require making various versions in different formats, with or without pan-and-scan, etc.
An attraction of the 1080/24P format is that it can be used to make high quality versions for any television format, and even for film. This top-down approach preserves quality as the HD image size means any resizing will be down, making big pictures smaller, rather than up-res’d blow-ups from smaller pictures. For frame rate conversion, over half a century of running movies on TV has established straightforward ways to map 24 fps onto 25 Hz and 60 Hz vertical rates for television. Also the 16:9 original images offer room for pan-and-scan outputs at 4:3 aspect ratio.
Combinations of fast play, 3:2 pull-down, down-res and ARC are applied to output the required image format, vertical rate and aspect ratio. For example, fast play of the 1080/24P master at just over 104 percent speed (25/24) produces 1080/25P. Down-res produces 16:9 images in 576 lines, and the 25 progressive frames are then read as 50 interlaced fields to create the 576/50I SD format. ARC is applied for 4:3 output. Changing from 24P to 60I vertical scans is achieved using 3:2 pull-down.
Publishing from a 1080/24P master:
Standard def TV – down-res, +4% ‘fast’ play, pan/scan 4:3; or down-res, 3:2 pull-down, pan/scan for 4:3
High def TV – +4% ‘fast’ play; or down-res, 3:2 pull-down
Film – up-res, film record
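The two frame-rate mappings mentioned above can be checked with some simple arithmetic. A brief illustrative sketch (not any particular product’s method):

```python
# 'Fast play': running 24 fps material at 25 fps is a speed-up of
# 25/24, i.e. just over 4 percent.
speedup = 25 / 24
print(f"{(speedup - 1) * 100:.2f}% fast play")

# 3:2 pull-down: alternate film frames are held for 3 then 2 video
# fields, so 24 film frames yield 60 fields - one second of 60I video.
def pulldown_fields(frames):
    return sum(3 if i % 2 == 0 else 2 for i in range(frames))

print(pulldown_fields(24))  # 60 fields from 24 frames
```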
The only exception is for film output. A popular size for this is 2k, which is only very slightly wider than 1080-line HD (2048 against 1920 pixels per line), but has significantly more lines (1536 against 1080), so the process would require a slight up-res.
Purpose-built hardware
Hardware and software built for a specific task (e.g. a DVE), not for general-purpose use (computer). Purpose-built hardware gives much improved processing speeds – between 10- and 100-fold – over systems using the same technology applied to a general-purpose architecture and operating-system software. This becomes important in image processing, where tasks require a great deal of power, especially as the demands increase in proportion to the picture size – significant for working with HDTV.
However, as standard/general-purpose platforms continue to become ever more powerful, so it can make sense to swap out some purpose-built hardware, which tends to be more costly, for software solutions. This ability to swap is a part of Quantel’s generationQ architecture.
Quantel
Apt name for the world leaders in digital television equipment – abbreviated from QUANtised TELevision. Quantel has 29 years’ experience of digital television techniques – significantly more than any other manufacturer.
Quantisation
Factor applied to DCT coefficients as a part of the process of achieving a required video compression. The coefficients relating to the least noticeable aspects of picture detail – e.g. high frequency, low amplitude – are progressively reduced so that the final data will fit into the specified data file space. This space is often fixed and relates directly to the quoted compression ratio for I-frame-only schemes such as DV. Note that the required quantisation will vary according to scene content. Given that too much data would cause problems by overflowing the allotted capacity of the file, schemes are cautious and designed to undershoot the file limit. The extent to which the files are filled is a measure of the quality of a compression scheme – a reason why the quoted ‘compression ratio’ does not tell the whole story.
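The principle can be sketched in a few lines of Python. This is a deliberate simplification – real codecs use per-frequency quantisation matrices – and the coefficient values are hypothetical:

```python
# Simplified sketch of coefficient quantisation: each DCT coefficient
# is divided by a quantisation factor and rounded. Low-amplitude,
# high-frequency terms collapse to zero, shrinking the data; a larger
# factor discards more detail so the result fits the file space.
def quantise(coeffs, q):
    return [round(c / q) for c in coeffs]

coeffs = [312, -45, 18, 7, -3, 2, 1, 0]   # hypothetical DCT coefficients
print(quantise(coeffs, 16))               # trailing terms become zero
```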
Quantising
The process of sampling an analogue waveform to provide packets of digital information to represent the original analogue signal.
RAID
Redundant Array of Independent Disks. A grouping of standard disk drives together with a RAID controller to create storage that acts as one disk and provides performance beyond that available from individual drives. Primarily designed for operation with computers, RAIDs can offer very high capacities, fast data-transfer rates and much increased security of data. The latter is achieved through disk redundancy, so that disk errors or failures can be detected and corrected.
A series of RAID configurations is defined by levels and, being designed by computer people, they start counting from zero. Different levels are suited to different applications.
Level 0
No redundancy – benefits only of speed and capacity – generated by combining a number of disks.
Level 1
Complete mirror system – two sets of disks both reading and writing the same data. This has the benefits of level 0 plus the security of full redundancy – but at twice the cost. Some performance advantage can be gained in reading because only one copy need be read, so two reads can occur simultaneously.
Level 2
An array of nine disks. Each byte is recorded with one bit on each of eight disks and a parity bit recorded to the ninth. This level is rarely, if ever, used.
Level 3
An array of n+1 disks recording 512-byte sectors on each of the n disks to create n x 512-byte ‘super sectors’, plus one 512-byte parity sector on the additional disk, which is used to check the data. The minimum unit of transfer is a whole super sector. This level is most suitable for systems in which large amounts of sequential data are transferred – such as audio and video. For these it is the most efficient RAID level, since it is never necessary to read/modify/write the parity block. It is less suitable for database-type access in which small amounts of data need to be transferred at random.
Level 4
As level 3, but individual blocks can be transferred. When data is written it is necessary to read the old data and parity blocks before writing the new data as well as the updated parity block, which reduces performance.
Level 5
As level 4, but the role of the parity disk is rotated for each block. In level 4 the parity disk receives excessive load for writes and no load for reads; in level 5 the load is balanced across the disks.
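The parity protection behind RAID levels 3-5 is a simple XOR across the data blocks, which is enough to rebuild any single lost block. A minimal sketch:

```python
from functools import reduce

# Parity block: byte-wise XOR across all data blocks. If any one
# block is lost, XOR-ing the survivors with the parity restores it.
def parity(blocks):
    return bytes(reduce(lambda a, b: a ^ b, tpl) for tpl in zip(*blocks))

data = [b"\x01\x02", b"\x0f\x00", b"\x10\xff"]
p = parity(data)

# Lose the middle block and rebuild it from the rest plus parity.
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])
```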
Soft RAID
A RAID system implemented by low-level software in the host system instead of a dedicated RAID controller. While saving on hardware, operation consumes some of the host’s power.
Remote diagnostics
Diagnostics can often be run locally from a terminal (or a PC with terminal-emulation software) or web browser. Equally the terminal/browser can be remote, connected via a modem and a phone line, or over the internet. This way any suitably configured equipment in any location (with a phone line) can be checked, and sometimes corrected, by specialist service personnel thousands of miles away.
Interdependent multiple systems, such as a video server and its ‘clients’, may require simultaneous diagnostics of all major equipment. Here, combining data links with a number of pieces of networked equipment, as with Quantel’s R-MON, effectively extends the Remote Diagnostics to larger and more complex situations.
Resolution
A measure of the finest detail that can be seen, or resolved, in a reproduced image.
Whilst it is influenced by the number of pixels in the display (e.g. high definition 1920 x 1080, broadcast SDTV 720 x 576 or 720 x 487), note that the pixel numbers do not define the overall resolution but merely that of one part of the equipment chain. The quality of lenses, display tubes, film processes, edit systems and film scanners, etc. – in fact any element in the programme stream (from scene to screen) – must be taken into account in assessing overall system resolution.
Resolution co-existence
Term originated by Quantel to describe equipment able to operate with several moving-image formats at the same time. For example, an editing system able to store and operate with any DTV production-format material, making transitions between shots, composing layers originating from more than one format (resolution) and outputting in any chosen format. Good equipment will be designed for fast operation at the largest specified TV format, e.g. 1920 x 1080 HD, and so may operate faster with smaller images, but may also be able to handle larger images.
Resolution independent
A term used to describe the notion of equipment that can operate at more than one resolution, though not necessarily at the same time. Most dedicated television equipment is designed to operate at a single resolution, although some equipment, especially that using the ITU-R BT.601 standard, can switch between the specific formats and aspect ratios of 525/60 and 625/50. More recently, the advent of the multiple formats of DTV has encouraged new equipment able to operate with a number of video standards.
By their nature computers can handle files of almost any size so, when used to handle images, they can be termed ‘resolution independent’. However, as larger images require more processing, more storage and more bandwidth so, for a given platform, the speed of operation will slow as the resolution increases.
Other considerations when changing image resolution include: the need to reformat or partition disks, to check for sufficient RAM, to allow extra time for RAM/disk caching and to select an appropriate display.
Restoration
Hiding or removing the defects acquired by old material and content. Digital technology has enabled many new and easy-to-use procedures to provide fast and affordable restoration. These range from fully automated systems – which depend on recognising generic faults and treating them – to hands-on operations that offer access to appropriate toolsets, often presented as ‘brushes’.
These have been applied to both television and to film, making available many old archives for the ever-hungry TV channels.
Return control (path)
Return control is needed for interactive television. It needs to offer only quite a low data rate, but must have little latency, as action should be followed as soon as possible by reaction.
DVB includes methods for return paths for cable (DVB-RCC), satellite (DVB-RCS) and terrestrial (DVB-RCT) services. While the cable and terrestrial schemes are devised to operate economically for individual viewers, the satellite solution is, due to cost, more appropriate for headends or groups. Interestingly, DVB-RCS has been adopted by many companies operating in the general telecoms world.
RGB
The abbreviation for the Red, Green and Blue signals, the primary colours of television.
Cameras and telecines have red, green and blue receptors, the TV screen has red, green and blue phosphors illuminated by red, green and blue guns. RGB is digitised with 4:4:4 sampling which generates 50% more data than 4:2:2.
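The 50% figure follows from counting samples. A quick check, per co-sited pair of pixels:

```python
# 4:4:4 RGB carries R, G and B for every pixel: 6 samples per two
# pixels. 4:2:2 carries Y for every pixel but Cb and Cr only for
# every other one: 4 samples per two pixels.
rgb_444 = 3 * 2        # samples per two pixels, 4:4:4
ycbcr_422 = 2 + 1 + 1  # samples per two pixels, 4:2:2
print(rgb_444 / ycbcr_422)  # 1.5, i.e. 50% more data
```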
RIP
Raster Image Process. The method of converting vector data (for example fonts) into raster image form – making it suitable for use as, or in, a television picture. Vector data is size independent and so has to be RIPped to a given size. RIPping to produce high quality results requires significant processing – especially if interactive operation is required, for example to fit the result into a background.
Rotoscoping
The practice of using frames of live footage as reference for painting animated sequences. While the painting will always be down to the skill of the artist, modern graphics equipment integrated with a video disk store makes rotoscoping, or any graphical treatment of video frames, quick and easy. This has led to many new designs and looks appearing on television, as well as more mundane practices such as image repair.
Choices for Multi-format Post Production
Among the changes introduced with digital broadcasting is the addition of many new video formats. Gone are the days of single-format operation, as the new digital standards allow many more, at both HD and SD. The ATSC DTV standard permits broadcasters to choose from 18 video formats (see Table 3) and many more are included in the DVB standard. While a particular TV station may stay with one format each for SD and HD, a post facility may be called on to work with any – according to customer requirements. Traditionally, equipment manufacturers have offered hardware for operation at only one standard, or switchable between just two – 625/50 and 525/60. More recently 4:3/16:9 operation has also been available. Suppliers of computer-based systems have long highlighted this lack of flexibility, pointing to their ‘resolution independence’. Hardware always has the potential for the fastest operation, but can this be carried through to ‘resolution co-existent’ operation to offer a real choice in multi-format post production?
Any hardware solution should extend the speed and flexibility now experienced at SD to all TV formats – including HD. Despite handling multiple formats and six-times-larger pictures, real-time operation is essential in some applications – as is high technical quality and an extended toolset at least equivalent to today’s on-line non-linear SD editing systems. Rather than switching between standards according to the job in hand, resolution co-existent operation should automatically accept any TV or multimedia sources for free use of all programme assets, and output at any chosen format.
Fig 1 shows the basic functional areas of an SD on-line non-linear system. Video is recorded to a disk store of typically about two hours’ capacity (~150 GB for non-compressed) to give random access to material. The data rate must be at least real-time – 21 MB/s – and a greater rate will aid faster operation. For picture manipulation – mixes, wipes, keying, colour correction, paint, DVE, etc. – stored video is sent to the video processor and the result is sent back to disk as a new clip. For output, the edited result is played from the store in the required order.
Figure 1. Basic functional areas – input processor, video store (2 hours, 150 GB, random access, >21 MB/s) and video processor (DVE, colour correct, keying, dissolves, etc).
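The ~21 MB/s and ~150 GB figures quoted above can be checked from the ITU-R BT.601 sampling structure. A short illustrative calculation:

```python
# Non-compressed 4:2:2 SD at 8 bits: 720 luminance plus 720
# chrominance samples per active line, 576 active lines, 25 frames/s.
samples_per_line = 720 * 2
active_lines = 576
frame_rate = 25
bytes_per_sec = samples_per_line * active_lines * frame_rate
print(bytes_per_sec / 1e6)   # ~20.7 MB/s - the ~21 MB/s quoted

two_hours = bytes_per_sec * 3600 * 2
print(two_hours / 1e9)       # ~149 GB - the ~150 GB quoted
```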
The resolution co-existent scheme is broadly similar – but differs in detail. Assuming no change of frame rate, the major addition is up/down-resing for format change at the output, and re-sizing – effectively both the same as a function already supplied for the DVE in the video processor. Re-sizing for creative effect in the DVE can be intelligently linked with any re-sizing required for source-to-output format change. This way material need only ever undergo one size change, saving time and storage while helping to maintain picture quality. Material not stored in the required output format uses the real-time up/down-resing in the output path (Fig 2) during replay.
Figure 2. Resolution co-existent solution – input processor, video store (2 hours random access, >21 MB/s), video processor, and up/down res in the output processor path.
For cut editing, the frames are read from the store and output in the order dictated by the edit decisions. If it is required to output to the TV world, this needs to be a real-time operation. As the frames are stored in their native format, the edited result may comprise different video formats. Any not corresponding to the required output format are changed by the Up/Down Res processor before passing to the output. Thus the output frames are only subjected to one process, at most.
Often editing is more complex with, at least, some DVE and compositing work. For example, the output is set to 1080I, and a 480I foreground is to be composited at exactly half-frame height onto a 720P-original background clip. For this the background is interlaced and re-sized to 1080 lines, and the foreground is re-sized to 112.5 percent (540/480) while also compensating for any pixel-aspect-ratio changes. The resulting composite is recorded back to the video store as a new 1080I clip. (The alternative, working purely in 1080I, would involve sizing all items to 1080I prior to composition, the foreground then being re-sized to 50 percent of that. This uses more storage and more processing time, and risks quality too.) Material not requiring any ‘creative’ processing remains in the store in its native format. The replay operation is then the same as for cut editing, with all required material that is stored at the chosen output standard, including the processed clips, passing directly to the output. Other material is up- or down-resed before passing to the output. Accuracy and real-time operation are both essential for this. Here 16-point bicubic spatial interpolation – existing technology already used by Quantel for exacting tasks such as outputting from video to film – offers the required accuracy. For real-time speed, processing is assigned to purpose-built hardware.
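The re-size factors in the example above are simply target line counts divided by source line counts. A quick check of the arithmetic:

```python
# Foreground: 480I scaled to half of a 1080-line frame (540 lines).
fg = 540 / 480    # 1.125, i.e. the 112.5 percent quoted
# Background: 720P scaled up to 1080 lines.
bg = 1080 / 720   # 1.5
print(fg, bg)
```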
In practice, the up/down-res may use the video processor rather than involving additional hardware. In an integrated machine, the signal path can be switched as required during replay. This imposes real-time working which was not a strict requirement for the original processor functions. When exchanging data with the disk store it was not bound by the real-time world. But any extra processing power must be seen as a benefit.
Resolution co-existent online nonlinear editing is a fast, pragmatic alternative to resolution independent solutions. The benefits of hardware-based design for handling very high data rates allow similar operational speed and flexibility to that currently enjoyed at SD. With provision for other formats outside TV standards, a wide range of electronic media can be handled in such a system.
RS-232
A standard for serial data communications, defined by EIA standard RS-232, that is designed for short distances only – up to 10 metres. It uses single-ended signalling with a conductor per channel plus a common ground, which is relatively cheap and easy to arrange but susceptible to interference – hence the distance limitation.
RS-422
Not to be confused with 4:2:2 sampling or 422P MPEG, this is a standard for serial data communications defined by EIA standard RS-422. It uses current-loop, balanced signalling with a twisted pair of conductors per channel, two pairs for bi-directional operation. It is more costly than RS-232 but has a high level of immunity to interference and can operate over reasonably long distances – up to 300 m/1000 ft. RS-422 is widely used for control links around production and post areas for a range of equipment – VTRs, mixers, etc.
RSN
Real Soon Now. A phrase coined by Jerry Pournelle to satirise the tendency in the computer industry to discuss (and even offer for sale) things that are not actually available yet.
Run-length coding
A system for compressing data. The principle is to store a pixel value along with a message detailing the number of adjacent pixels with that same value. This gives a very efficient way of storing large areas of flat colour and text, but it is not so efficient with pictures from a camera, where the random nature of the information, including noise, may mean that more data is produced than was needed for the original picture.
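A minimal sketch of the principle in Python, using made-up pixel values:

```python
# Run-length encode: store each value with the count of identical
# adjacent values; decode reverses the process.
def rle_encode(pixels):
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

def rle_decode(runs):
    return [p for p, n in runs for _ in range(n)]

flat = [255] * 6 + [0] * 3      # flat areas compress well
print(rle_encode(flat))         # two runs describe nine pixels

noisy = [1, 2, 3, 4]            # noise-like data gains nothing
print(len(rle_encode(noisy)))   # one run per pixel
```

Note how the noisy case actually needs more data than the original, as the entry describes.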
Safe area
The area of picture into which it is considered safe to place material – graphics, text or action – so that it will be viewable when received at home. Initially this was necessary with 4:3 aspect-ratio screens, as they were always overscanned to avoid showing the black that surrounds the active picture; typically 5% in from the edges was considered safe. More recently the whole safe-area issue has become far more complicated, as there are both 4:3 and 16:9 displays, as well as 4:3, 16:9 and sometimes 14:9 (a compromise version of 16:9 that is more acceptable to those viewing on 4:3 screens) aspect ratios for programme output. In the UK, action has been taken to make all commercials so that their message is conveyed whichever display aspect ratio is used. The EBU website referenced below provides a very practical reference document as a download.
Website: document R95-2000 at www.ebu.ch/tech_texts/tech_texts_theme.html
Sampling standard
A standard for sampling analogue waveforms to convert them into digital data.
The official sampling standard for 625/50 and 525/60 television is ITU-R BT.601.
ITU-R BT.709 and SMPTE 274M specify sampling for HD formats.
SAN
Storage Area Networks are now the most common method of providing shared video storage. They can offer platform-independent storage that may be accessed from, say, Windows 2000 and Unix workstations, and they allow applications direct access to shared storage by cutting out the usual client-server ‘middle men’ – providing improved workflow and better work sharing on a common store.
Figure. A SAN – workstations connected through the SAN fabric (network links and switches) to shared storage devices (RAID arrays and tape).
The design recognises that moving large amounts of data (video) is inconsistent with normal-network general-data traffic. Therefore a SAN forms a separate network connecting data-hungry workstations to a large, fast array of disks. Although any network technology could be used, Fibre Channel predominates. Its original 800 Mbit/s data rate (now doubled) and direct connections to disks are ideal for making large, fast storage networks. In practice, both basic networking and storage networking are used side-by-side to offer wide scope for sharing and transferring material. Besides disks, essential items are FC switches (if FC is used) and software for file sharing and management.
SANs are scalable but additions may be complex to implement. Currently, expansion is ultimately limited by architecture and management considerations.
Scaling
Analogue video signals have to be scaled prior to digitising in an ADC so that the full amplitude of the signal makes best use of the available levels in the digital system.
The ITU-R BT.601 digital coding standard specifies, when using 10 bits, black to be set at level 64 and white at 940. The same range of values is ascribed should RGB be used.
Computer applications tend to operate with a different scaling, with black set to level 0 and white at 1023; for colour they usually use RGB from 0-1023. However, most still keep to 8-bit accuracy, so the scale runs from 0-255. Clearly, moving between computers and TV requires processing to change colour space and scaling.
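The level re-scaling itself is a simple linear mapping. A sketch, converting full-range 8-bit computer levels to the 10-bit ITU-R BT.601 range (colour-space conversion, which would also be needed for RGB sources, is omitted):

```python
# Map 8-bit full-range computer code values (0-255) onto the 10-bit
# video range, black at 64 and white at 940.
def computer_to_video(code8):
    return round(64 + code8 * (940 - 64) / 255)

print(computer_to_video(0), computer_to_video(255))  # 64 940
```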
Schema
A collection of tables and constraints that describes the structure of a database.
It provides a level of security as no one else can interpret the stored database without the schema; it is just a collection of figures. It organises the database to allow scalability for expansion and defines efficient operation to suit a particular application.
Scrub (audio)
Replay of audio tracks at a speed and pitch corresponding to jog speed – as heard with analogue audio tape ‘scrubbing’ backwards and forwards past an audio replay head.
This feature, which is natural for analogue fixed-head recorders, may be provided on a digital system recording on disks to help set up cues.
SCSI
The Small Computer Systems Interface is a very widely used, high-data-rate, general-purpose parallel interface. A maximum of eight devices can be connected to one bus (16 for Wide SCSI) – for example a controller, and up to seven disks or devices of different sorts (hard disks, optical disks, tape drives, scanners, etc.) – and may be shared between several computers. The SCSI interface is used by manufacturers for high-performance drives, while ATA is popular for lower-performance drives.
SCSI specifies a cabling standard (50-way), a protocol for sending and receiving commands, and their format. It is intended as a device-independent interface, so the host computer needs no details about the peripherals it controls. SCSI does offer reasonable ‘plug-and-play’ operation but, for high performance, adding drives may not be as straightforward – there is considerable variation between devices, and cabling must be kept short.
SCSI has continued development over a number of years resulting in the following range of maximum transfer rates:
SCSI-1: 5 M transfers/sec (max)
Fast SCSI: 10 M transfers/sec (max)
Ultra SCSI: 20 M transfers/sec (max)
Ultra SCSI 2 (LVD): 40 M transfers/sec (max)
SCSI 160: 80 M transfers/sec (max)
SCSI 320: 160 M transfers/sec (max)
For each of these there is the 8-bit normal ‘narrow’ bus (1 byte per transfer) or the 16-bit Wide bus (2 bytes per transfer), so Wide Ultra SCSI 2 is designed to transfer data at a maximum rate of 80 MB/s and SCSI 160 at up to 160 MB/s. Continuous rates achieved from individual disk drives will be considerably less – currently up to some 46 MB/s.
SD
Short form for SDTV.
SDK
Software Developers Kit. Typically a software and documentation package to facilitate the development of applications to run on a given operating system or other application.
It provides another layer on top of an API, often including shortcuts and pre-built routines to make development easier and final operation faster.
SDTI
Serial Digital Transport Interface (SMPTE 305M). Based on SDI, this provides real-time streaming transfers. It does not define the format of the signals carried but brings the possibility to create a number of packetised data formats for broadcast use. There are direct mappings for SDTI to carry Sony SX, HD-CAM, DV-DIFF (DVCAM, DVCPRO 25/50, Digital-S) and MPEG TS.
SDTI-CP
Serial Digital Transport Interface – Contents Package. A uniform ‘container’ designed for streaming pictures (still and moving), audio, data and metadata over networks.
Developed for use on SDTI, the Contents Package can also be stored.
Packets are handled identically, no matter what they contain, enabling one network to carry any type of content. Network efficiencies are achieved through SMPTE KLV compliance which places a header of Key – the type of content, and Length – number of bytes, ahead of Value – the content. This enables network nodes, switches and bridges quickly to identify the content of the packet without having to read the entire message.
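The Key-Length-Value idea can be illustrated with a toy encoding. This is a deliberate simplification – real SMPTE KLV uses 16-byte universal-label keys and BER-coded lengths, not the 1-byte key and fixed 4-byte length used here:

```python
import struct

# Toy KLV: 1-byte key, 4-byte big-endian length, then the value.
def klv_pack(key, value):
    return bytes([key]) + struct.pack(">I", len(value)) + value

# A node reads the key and length, then can skip straight past the
# value without having to parse it - the point made above.
def klv_walk(buf):
    pos = 0
    while pos < len(buf):
        key = buf[pos]
        length = struct.unpack_from(">I", buf, pos + 1)[0]
        yield key, buf[pos + 5:pos + 5 + length]
        pos += 5 + length

stream = klv_pack(0x01, b"video-essence") + klv_pack(0x02, b"audio")
print([k for k, v in klv_walk(stream)])
```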
SDTV
Standard Definition Television. A digital television system in which the quality is approximately equivalent to that of analogue 525/60 or 625/50 systems.
Seek time (a.k.a. Positioning time)
The time taken for the read/write heads of a disk drive to be positioned over a required track. Average seek time is the time to reach any track from the centre track. Maximum positioning time is the time to reach any track from any track. A high performance modern disk will offer around 8 ms average seek time and typically twice that for the maximum. Minimum seek time to adjacent tracks is as low as 0.5 ms. These times are critical to disk performance, especially when operating with the very high data rates associated with video.
Serial Digital Interface (SDI)
The standard digital television studio connection, based on a 270 Mb/s transfer rate. This is a 10-bit, scrambled, polarity-independent interface, with common scrambling for both component ITU-R BT.601 and composite digital video, and four groups each of four channels of embedded digital audio. Most new broadcast digital equipment includes SDI, which greatly simplifies its installation and signal distribution. It uses the standard 75 ohm BNC connector and coax cable, as commonly used for analogue video, and can transmit the signal over 200 metres (depending on cable type).
Server editing
Video and audio editing that takes place within a server rather than in a workstation.
Server (file)
A storage system that provides data files to all connected users of a local network. Typically the file server is a computer with large disk storage which is able to record or send files as requested by the other connected (client) computers – the file server often appearing as another disk on their systems.
The data files are typically around a few kB in size and are expected to be delivered within moments of request.
Server (video)
A storage system that provides audio and video storage for a network of clients.
Those used in professional and broadcast applications are based on fixed disk storage.
Aside from those used for video on demand (VOD), video servers are applied in three areas of television operation: transmission, post production and news. Compared to general-purpose file servers, video servers must handle far more data, files are larger and must be continuously delivered.
There is no general specification for video servers and so the performance between models varies greatly according to storage capacity, number of real-time video channels, protection level (RAID), compression ratio and speed of access to stored material – the latter having a profound influence.
Store sizes are very large, typically from about 500 GB up to a few terabytes. Operation depends on connected devices: edit suites, automation systems, secondary servers, etc., so the effectiveness of the server’s remote control and video networking is vital to success.
Shannon Limit
In 1948, C. E. Shannon’s paper ‘A Mathematical Theory of Communication’ established Information Theory, which allows determination of the theoretical limit of any channel’s information-carrying capacity. Information Theory made possible the development of digital systems and without it much of modern communications, including the Internet, would not exist. Only very recent technology has allowed operation close to the Shannon limit – V.34 33.6 kb/s phone modems are an example.
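The limit in question is C = B log2(1 + S/N). As a rough worked example (the bandwidth and signal-to-noise figures below are illustrative assumptions, not measured values), a telephone channel of about 3.1 kHz with a 35 dB signal-to-noise ratio supports roughly 36 kb/s – which is why 33.6 kb/s V.34 modems are said to operate close to the Shannon limit.

```python
import math

def shannon_capacity(bandwidth_hz, snr_db):
    """Shannon channel capacity in bit/s: C = B * log2(1 + S/N)."""
    snr = 10 ** (snr_db / 10)          # convert dB to a power ratio
    return bandwidth_hz * math.log2(1 + snr)

# assumed phone-line figures: ~3.1 kHz bandwidth, ~35 dB SNR
capacity = shannon_capacity(3100, 35)  # roughly 36,000 bit/s
```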
Signal-to-noise ratio (S/N or SNR)
The ratio of picture signal information to noise – usually expressed in dB. Digital source equipment is theoretically capable of producing pure noise-free images, which would have an infinite signal to noise ratio. But these, by reason of their purity, may cause contouring artefacts if processed without special attention – a reason for Dynamic Rounding.
A rule of thumb to express the realistic signal to noise capability of a digital system is given by the expression:
S/N (dB) = 6N + 6 where N is the number of bits. Hence an 8-bit system gives 54 dB S/N and a 10-bit system 66 dB. This would be the noise level of continuous LSB dither and would only be produced over the whole picture by digitising a flat field (ie equal grey over the whole picture) set at a level to lie midway between two LSBs. Other test methods give a variety of results, mostly producing higher S/N figures.
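As a quick check of the rule of thumb:

```python
def rule_of_thumb_snr_db(bits):
    """Realistic S/N of an N-bit digital video system: 6N + 6 dB
    (the rule of thumb quoted in the text)."""
    return 6 * bits + 6

# 8-bit and 10-bit systems
values = (rule_of_thumb_snr_db(8), rule_of_thumb_snr_db(10))  # (54, 66)
```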
Simulcasting
The term used to describe the simultaneous transmission of analogue 4:3 and digital
16:9 for the same service. Both versions are transmitted frame accurately at the same time to ensure that no viewer is disadvantaged.
Simultaneous True Random Access
Describes access on a video server where each of its real-time video connections can access any sequence of stored frames regardless of the demands of other video connections. This implies there is no copying of material to achieve this. Such access makes control of video servers much more straightforward, and allows many independent operations to take place at the same time.
SMPTE
Society of Motion Picture and Television Engineers. A United States organisation, with international branches, which includes representatives of the broadcasters, the manufacturers and individuals working in the film and television industry. It has within its structure a number of committees which make recommendations (RP 125 for example) to the ITU-R and to ANSI in the USA.
SRAM
Static Random Access Memory. This type of memory chip in general behaves like dynamic RAM (DRAM) except that static RAMs retain data in a six-transistor cell needing only power to operate (DRAMs require clocks as well). Because of this, current available capacity is lower than DRAM – and costs are higher, but speed is also greater.
Standard platform
A computer and operating system built for general-purpose use. It cannot be used on its own but must be fitted with any, or many, of the very wide range of specific application software and additional hardware packages available. For example, the same standard platform may be used for accounting, word processing and graphics but each runs from a different software applications package and may need special hardware.
The term has become somewhat confusing in that a standard platform can be anything from a PC to a super computer. Also some applications are mutually exclusive – when
the computer’s hardware is configured for one it has to be re-configured to run another.
It is then arguable whether this is still a standard platform or whether it has metamorphosed into a dedicated system.
Standards (television)
A digital television standard defines the picture format (pixels per line and active lines), vertical refresh rate and whether the vertical scan is interlaced or progressive. For example, European SD digital television is 720 x 576/50I, and an HD standard is 1920 x 1080/50I.
Standards conversion
Changing the standard of existing television material, which may involve two processes
(four if going from and to analogue coded systems such as PAL and NTSC). The two main processes are format conversion to change the spatial (horizontal and vertical) sizes of the pictures and changing the vertical scan rate. For broadcast applications this needs to be completed retaining the maximum possible fidelity of the input content. The former process involves the relatively straightforward task of spatial interpolation – spreading the information from the original pictures over a different pixel structure. Note that the crude method of dropping or repeating lines/pixels will give very poor results and the detail of the interpolation process is important for best results.
The second process is more complex as changing the number of frames or fields per second means creating new ones or removing some – preferably without upsetting the movement in the pictures, so simply repeating or dropping fields or frames will not do.
For this the movement within the pictures has to be analysed so that in-between pictures can be synthesised. This is a very specialised area and there are highly developed techniques used on the best modern standards converters that do this very well.
Statistical multiplexing (a.k.a. Stat Mux)
This increases the overall efficiency of a multi-channel digital television transmission multiplex by varying the bit-rate of each of its channels to take only that share of the total multiplex bit-rate it needs at any one time. The share apportioned to each channel is predicted statistically with reference to its current and recent-past demands.
For example, football – generally with much action and detail (grass and crowds) – would use a higher data rate than a chat show with close-ups and far less movement. The data streams for each programme are monitored and their bit rates varied accordingly to fit the bit rate of the whole multiplex.
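The allocation step can be sketched as a proportional share of the pool. This is a toy illustration only: real stat muxes predict demand from encoder complexity measures and enforce per-channel minimum and maximum rates.

```python
def stat_mux_shares(demands_mbps, multiplex_mbps):
    """Split a fixed multiplex bit-rate between channels in proportion
    to each channel's current demand (simplified sketch)."""
    total_demand = sum(demands_mbps)
    return [multiplex_mbps * d / total_demand for d in demands_mbps]

# e.g. football (high action) vs two studio shows, sharing an 18 Mb/s multiplex
shares = stat_mux_shares([8.0, 3.0, 3.0], 18.0)
```

The channel with the most demanding pictures receives the largest share, and the shares always sum to the fixed multiplex rate.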
Stencil
A key signal used in graphics systems – such as the Quantel Paintbox and the
QPaintbox application. It can be drawn, derived from picture information, or both.
It can be used to define the area of an object, obscure part or all of an object, may be transparent and control the application of paint... and more.
Storage capacity (for video and film)
Using the ITU-R BT.601 4:2:2 digital coding standard for SD, each picture occupies a large amount of storage space – especially when related to computer storage devices such as DRAM and disks. So much so that the numbers can become confusing unless a few benchmark statistics are remembered. Fortunately the units of mega, giga and tera make it easy to express the vast numbers involved; ‘One gig’ trips off the tongue far more easily than ‘One thousand million’ and sounds less intimidating.
Storage capacities for SD video can all be worked out directly from the 601 standard.
Bearing in mind that sync words and blanking can be re-generated and added at the output, only the active picture area need be stored. In line with the modern trend of many disk drive manufacturers, kilobyte, megabyte and gigabyte are taken here to represent 10³, 10⁶ and 10⁹ respectively.
Every line of a 625/50 or 525/60 TV picture has 720 luminance (Y) samples and 360 each of two chrominance samples (Cr and Cb), making a total of 1,440 samples per line.
There are 576 active lines per picture creating 1440 x 576 = 829,440 pixels per picture.
Sampled at 8 bits per pixel (10 bits can also be used) a picture is made up of 6,635,520 bits or 829,440 8-bit bytes – generally written as 830 kB.
With 25 pictures a second there are 830 x 25 = 20,750 kbytes or 21 Mbytes per second.
For 525/60 systems, analogue transmissions have 487 active lines (480 for ATSC digital) and so 1,440 x 487 = 701,280 pixels per picture (691,200 for 480 lines).
With each pixel sampled at 8-bit resolution this format creates 5,610,240 bits, or 701.3 kbytes, per picture. At 30 frames per second this creates a total of 21,039 kbytes, or 21 Mbytes per second.
Note that both 625 and 525 line systems require approximately the same amount of storage for a given time – 21 Mbytes for every second. To store one hour takes 76
Gbytes. Looked at another way each gigabyte (GB) of storage will hold 47 seconds of non-compressed video. 10-bit sampling uses 25% more storage.
If compression is used, and assuming the sampling structure remains the same, simply divide the numbers by the compression ratio. For example, with 5:1 compression 1 GB will hold 47 x 5 = 235 seconds, and 1 hour takes 76/5 = 18 GB (approx).
There are many video formats for HD but the 1080 x 1920 format is popular. Using 4:2:2 sampling, each line has 1920 Y samples and 960 each of Cr and Cb = 3840 samples per line. So each picture has 3840 x 1080 = 4.147 M samples. For 10-bit sampling each picture has the equivalent data of 5.18 M (8-bit) bytes. Assuming 30 pictures per second these produce 155 M bytes/s – 7.4 times that of SD. An hour of storage now needs to accommodate 560 GB.
2k is used for digitising film with 4:4:4 10-bit sampling in RGB colour space. The file is
2048 x 1556 but this includes 20 lines of boundary between frames on the celluloid, so the actual picture is 2048 x 1536. This makes one frame 11.80 MB, and an hour of storage 1.04TB.
Format (H x V)       MB per picture
720 x 487            0.70 (8-bit 4:2:2)
720 x 576            0.83 (8-bit 4:2:2)
1920 x 1080          5.18 (10-bit 4:2:2)
2048 x 1556          12 (10-bit RGB)
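The benchmark figures above can be reproduced directly from the 601 sampling structure:

```python
def bytes_per_frame_601(active_lines=576, bits=8):
    """Active-picture storage for one 4:2:2 SD frame: 720 Y samples
    plus 360 each of Cr and Cb per line."""
    samples_per_line = 720 + 360 + 360            # 1440 samples per line
    return samples_per_line * active_lines * bits // 8

frame_625 = bytes_per_frame_601()                 # 829,440 bytes (~830 kB)
per_second = frame_625 * 25                       # ~21 MB/s
per_hour_gb = per_second * 3600 / 1e9             # ~75 GB (the text rounds to 76)
```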
Streaming (video and/or audio)
Refers to supplying a constant service, often real-time, of a medium. Although broadcast TV has done this from the beginning and SDI streams data, the term is one more usually associated with delivery by networks, including the Internet. The transmission comprises a stream of data packets which can be viewed/heard as they arrive though are often buffered to compensate for any short interruptions of delivery.
For the Internet, media is usually highly compressed to offer acceptable results with 28 kb/s for audio and upwards of 64 kb/s for video. There are three predominant video streaming solutions: RealNetworks with RealVideo, RealAudio and RealPlayer; Microsoft Windows Media; and Apple’s QuickTime – each with its particular advantages. As Internet transfers are not deterministic, pictures and sound may not always be constantly delivered.
Structured Query Language (SQL)
A popular language for computer database management. It is very widely used in client/server networks for PCs to access central databases and can be used with a variety of database management packages. It is data-independent and device-independent so users are not concerned with how the data is accessed. As increasing volumes of stored media content are accessible over networks, SQL is able to play a vital role in finding any required items.
Sub-pixel
A spatial resolution smaller than that described by a pixel. Although digital images are composed of pixels it can be very useful to resolve image detail to smaller than pixel size, i.e. sub-pixel. For example, the data for generating a smooth curve on the screen needs to be created to a finer accuracy than the pixel grid itself – otherwise the curve will look jagged. Again, when tracking an object in a scene or executing a DVE move, the size and position of the manipulated picture must be calculated, and the picture resolved, to a far finer accuracy than the pixels – otherwise the move will appear jerky.
[Diagram: lines of an input picture mapped to lines of an output picture – moving by a whole line compared with moving by half a line]
Moving an image with sub-pixel accuracy requires picture interpolation as its detail, that was originally placed on lines and pixels, now has to appear to be where none may have existed, e.g. between lines. The original picture has to be effectively rendered onto an intermediate pixel/line position. The example of moving a picture down a whole line is achieved relatively easily by re-addressing the lines of the output. But to move it by half a line requires both an address change and interpolation to take information from the adjacent lines and calculate new pixel values. Good DVEs work to a grid many times finer than the line/pixel structure.
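The half-line case can be sketched with simple linear interpolation. This is a minimal illustration only – real DVEs use multi-tap filters and work to a grid many times finer than the line structure.

```python
def shift_down_half_line(picture):
    """Shift a picture down by half a line. A whole-line move only
    re-addresses lines; a half-line move must interpolate new pixel
    values from each pair of adjacent input lines."""
    shifted = [picture[0]]                        # repeat the edge line
    for upper, lower in zip(picture, picture[1:]):
        shifted.append([(a + b) / 2 for a, b in zip(upper, lower)])
    return shifted

# three 2-pixel lines: new values land midway between the original lines
out = shift_down_half_line([[0, 0], [2, 2], [4, 4]])
```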
Connecting network users via a switch means that each can be sending or receiving data at the same time with the full wire-speed of the network available. This is made possible by the aggregate capacity of the switch. So, for example, an eight-port Gigabit Ethernet switch will have an aggregate capacity of 8 Gb/s. This means many simultaneous high-speed transactions taking place without interruption from other users.
Synchronous (data transfer)
This carries separate timing information (clock data) for keeping send and receive operations in step. The data bits are sent at a fixed rate so transfer times are guaranteed but transfers use more resources (than asynchronous) as they cannot be shared.
Applications include native television connections, live video streaming and SDI.
Operation depends on initial negotiation at send and receive ends but transfer is relatively fast.
Table 2 (ATSC)
Digital television offers new opportunities for the delivery of audio to the home.
The digital audio compression system documented in the ATSC standard for Digital
Television is AC-3. Table 2, Annexe B of this document shows the range of services catered for. The services may contain complete programme mixes or only a single programme element. Potentially this means that viewers can choose from a wide range of audio services as long as the receiver has the capability of decoding them.
Type of service
Main audio service: complete main (CM)
Main audio service: music and effects (ME)
Associated service: visually impaired (VI)
Associated service: hearing impaired (HI)
Associated service: dialogue (D)
Associated service: commentary (C)
Associated service: emergency (E)
Associated service: voice-over (VO)
ATSC Table 2 Audio service types
Website: www.atsc.org (document A/53)
Table 3 (ATSC)
Table 3 of the ATSC DTV Standard, Annexe A, summarises the picture formats allowable for DTV transmission in the USA. Any one of these may be compressed and transmitted.
An ATSC receiver must be able to display pictures from any of these formats.
Vertical size   Horizontal size   Aspect ratio   Frame rate
value           value             information    code
1080*           1920              1, 3           1, 2, 4, 5 (P); 4, 5 (I)
720             1280              1, 3           1, 2, 4, 5, 7, 8 (P)
480             704               2, 3           1, 2, 4, 5, 7, 8 (P); 4, 5 (I)
480             640               1, 2           1, 2, 4, 5, 7, 8 (P); 4, 5 (I)
ATSC Annexe A, Table 3 Picture formats
Aspect Ratio: 1 = square samples, 2 = 4:3 display aspect ratio,
3 = 16:9 display aspect ratio
Frame Rate: 1 = 23.976 Hz, 2 = 24 Hz, 4 = 29.97 Hz, 5 = 30 Hz, 7 = 59.94 Hz, 8 = 60 Hz
Vertical Scan: 0 = interlaced scan, 1 = progressive scan
*Note that 1088 lines are actually coded in order to satisfy the MPEG-2 requirement that the coded vertical size be a multiple of 16 (progressive scan) or 32 (interlaced scan).
The wide range of formats in Table 3 caters for schemes familiar to television and computers and takes account of different resolutions, scanning techniques and refresh rates. Although each has its own application and purpose, some of the combinations are more familiar than others, for example:
480 x 704 at 29.97 and 30 Hz frame rates, interlaced scan is very close to ITU-R BT.601
– although the 704 pixels per line is not directly compatible with 601’s 720. Other familiar standards can be seen in the table. For example, 480 x 640, square pixels, 4:3 aspect ratio and progressive scan is the VGA computer standard. It is included in the
DTV specification to allow direct digital transmission of a signal to VGA displays.
For each frame size, a range of alternative frame rates is available to provide compatibility with existing transmission systems and receivers. 29.97 Hz is needed to keep step with NTSC simulcasts. This frame rate is not required once NTSC is no longer transmitted! 30 Hz is easier to use, and does not involve considerations such as drop-frame timecode. 24 Hz progressive (24P) offers compatibility with film material.
A choice of progressive or interlaced scanning is also available for most frame sizes (see Progressive and Interlaced). Table 3 is concerned with video formats to be handled in the ATSC system rather than defining standards for video production. ATSC’s Table 1 of Annexe A refers to the standardised video production formats likely to be used as inputs to the compression system.
ATSC Annexe A, Table 1: Standardised video input formats
TGA (Targa)
An image file format widely used in computer systems. It was developed by Truevision
Inc. and there are many variations of the format.
TBC
Timebase Corrector. This is often included as a part of a VTR to correct the timing inaccuracies of the pictures coming from tape. Early models were limited by their dependence on analogue storage devices, such as glass delay lines. This meant that
VTRs, such as the original quadruplex machines, had to be mechanically highly accurate and stable to keep the replayed signal within the correction range (window) of the TBC – just a few microseconds. The introduction of digital techniques made larger stores economic so widening the correction window and reducing the need for especially accurate, expensive mechanics. The digital TBC has had a profound effect on VTR design. Quantel’s first product was a digital TBC for use with IVC VTRs.
TCP/IP
Transmission Control Protocol/Internet Protocol. A set of standards that enables the transfer of data between computers. Besides its application directly to the Internet it is also widely used throughout the computer industry. It was designed for transferring data files rather than large files of television or film pictures. Thus, although TCP/IP has the advantage of being widely compatible, it is a relatively inefficient way of moving picture files.
TIFF (.tif)
Tagged Image File Format. A bit-mapped file format for scanned images – widely used in the computer industry. There are many variations of this format.
Telecine
Device for converting film images into SD or HD video in real time. The main operational activity here is colour grading, which is executed on a shot-by-shot basis and absorbs considerable telecine time. This includes the time needed for making colour correction decisions and involves significant handling of the film – spooling and cueing – besides the actual transfer time. The output of a telecine is digital video (rather than data files).
Digital technology has moved the transfer process on. Now, adding a disk store or server creates a virtual telecine. This enables the film-to-tape transfer to run as one continuous operation: whole film spools are scanned in one pass using a single best-light pass. In this case the telecine is creating files (rather than digital video) and may be termed a Film Scanner.
Tracking (image)
Following a defined point, or points, in a series of pictures in a clip. Initially this was performed by hand, using a DVE. Not only was this laborious but it was also difficult, or impossible, to create sufficiently accurate results – usually due to the DVE keyframe settings being restricted to pixel/line accuracy. More recently image tracking has become widely used, thanks to the availability of automatic point tracking operating to sub-pixel accuracy. The tracking data can be applied to control DVE picture moves for such applications as the removal of film weave, replacing 3D objects in moving video, etc.
True Random Access
The ability to continuously read any frame, or sequence of frames, in any order at or above video (or real-time) rate. A true random access video store (usually comprising disks) allows editing which offers rather more than just quick access to material. Cuts are simply instructions for the order of replay and involve no copying, so adjustments can be made as often and whenever required. This results in instant and very flexible operation. At the same time technical concerns associated with fragmentation do not arise as the store can operate to specification when fully fragmented, ensuring full operation at all times. This aspect is particularly important for server stores.
[Diagram: frames 1–15 stored on disk. With true random access any sequence – e.g. 2, 7, 13, 4, 8 – can be replayed directly, without copying.]
TrueType
The TrueType vector font format was originally developed by Apple Computer, Inc.
The specification was later released to Microsoft. TrueType fonts are therefore supported on most operating systems. Most major type libraries are available in TrueType format.
There are also many type design tools available to develop custom TrueType fonts.
Quantel equipment supports the import of these and other commercially available fonts.
Truncation
Removal of the least significant bits (LSBs) of a digital word – as could be necessary when connecting 10-bit equipment into 8-bit equipment, or handling the 16-bit result of a digital video mix on an 8-bit system. If not carefully handled truncation can lead to unpleasant artefacts on video signals. Quantel invented Dynamic Rounding to handle the truncation of digital image data.
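The difference between crude truncation and rounding shows up even on a simple 10-bit to 8-bit conversion. Dynamic Rounding itself is a proprietary Quantel technique; the plain round-to-nearest below is only for comparison with straight LSB removal.

```python
def truncate_10_to_8(v):
    """Crudely drop the two least significant bits of a 10-bit value."""
    return v >> 2

def round_10_to_8(v):
    """Round to the nearest 8-bit value instead (clipped at 255)."""
    return min((v + 2) >> 2, 255)

# 10-bit value 511: truncation gives 127, rounding gives 128
results = (truncate_10_to_8(511), round_10_to_8(511))
```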
Uncommitted editing
Editing where the decisions are made and the edits completed but any can still easily be changed. This is possible in a true random access edit suite where the edits need only comprise the original footage and the edit instructions. Nothing is re-recorded so nothing is committed. This way, decisions about any aspect of the edit can be changed at any point during the session, regardless of where the changes are required.
Up-res
The process which increases the number of pixels used to represent an image by interpolating between existing pixels to create the same image on a larger format.
There is no implied change of vertical scan rate. Despite its name, the process does not increase the resolution of the image.
USB
Universal Serial Bus – now available as USB 2.0 which, at 480 Mb/s, offers potentially useful connectivity for media applications on PCs and Macs.
Vaporware
Software or hardware that is promised or talked about but is not yet completed – and may never be released.
Variable bit-rate (VBR) compression
While many video compression schemes are ‘constant bit rate’ – designed to produce fixed data rates irrespective of the complexity of the picture – VBR offers the possibility of fixing a constant picture quality by varying the bit-rate according to the needs of the picture. This allows images that require little data, like still frames in MPEG-2, to use little, and more to be used where it is needed to maintain quality. The result is an overall saving in storage – as on DVDs – or more efficient allocation of the total available bit-rate in a multi-channel broadcast multiplex.
Vector fonts
Fonts that are stored as vector information – sets of lengths and angles to describe each character. This offers the benefits of using relatively little storage and the type can be cleanly displayed at virtually any size. However it does require that the type is RIPped before it can be used – requiring significant processing power if it is to be used interactively for sizing and composing into a graphic. Quantel’s range of graphics and editing equipment uses vector fonts.
VFR (Variable Frame Rate)
Variable Frame Rate shooting has, until recently, only been possible with film cameras as all electronic cameras work at fixed frame rates. Panasonic’s HD Varicam has changed this and currently offers rates from 4 to 60 fps in one-frame increments.
Although film cameras offer a much wider speed range, VFR is seen as a significant step forward for digital cinematography.
Video over IP
Video projection
Video projector technology can now show 2k images on large cinema screens.
Such technology is a major part of Digital Cinema development. There are two major technologies used for large-scale projection, D-ILA and DLP Cinema.
VITC
Vertical Interval Timecode (pronounced ‘vitsy’). Timecode information in digital form, added into the vertical blanking of a TV signal. This can be read by the video heads from tape at any time pictures are displayed, even during jogging and freeze but not during spooling. This effectively complements LTC ensuring timecode can be read at any time.
VPB
An open Quantel standard file format in which full images, browse images, stencils and cut-outs are transferred, and which is used in a wide range of third-party applications. The format is based on YCrCb 4:2:2 sampling to ITU-R BT.601 specifications. There are also RGB and
CMYK file types.
VSB
Vestigial Sideband modulation – an established modulation technique which is used in the RF (radio frequency) transmission subsystem of the ATSC Digital Television Standard.
The 8 VSB system has 8 discrete amplitude levels supporting a payload data rate of 19.28
Mb/s in a 6 MHz channel. There is also a high data rate mode – 16 VSB – designed for
CATV and supporting a payload of 38.57 Mb/s but this has yet to be implemented.
There has been much technical debate in the USA about the choice of 8 VSB.
COFDM has been studied repeatedly as an option but the FCC has made it clear that the existing DTV Standard adopted in 1996 is THE Standard and will remain so.
Since the COFDM/VSB controversy arose some years ago, there have been significant improvements in the design of DTV receivers for the U.S. Further improvements continue to be made as it has become clear that the U.S. is fully committed to the existing DTV Standard.
Website: search for VSB on www.broadcast.harris.com
WAV (.wav)
An audio file format developed by Microsoft that carries audio that can be coded in many different formats. Metadata in WAV files describes the coding used. To play a WAV file requires the appropriate decoder to be supported by the playing device.
See also: Broadcast WAV
Wavelet
A compression technique in which the signal is broken down into a series of frequency bands. This can be very efficient but the processing is more complex than for DCT.
However, some early trials on D-cinema have made use of wavelet compression.
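A single level of the simplest wavelet, the Haar transform, shows the band-splitting idea. This is a sketch only – codec wavelets such as those tried for D-cinema use more sophisticated filters and several decomposition levels.

```python
def haar_level(samples):
    """One level of a Haar wavelet transform: a low-frequency band of
    pairwise averages and a high-frequency band of pairwise differences."""
    pairs = list(zip(samples[::2], samples[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high

low, high = haar_level([10, 10, 12, 14, 100, 100, 8, 6])
# smooth regions yield near-zero high-band coefficients, which code cheaply
```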
WIBNI
Wouldn’t It Be Nice If... A wish – usually referring to a hoped-for new feature on a piece of equipment.
Widescreen
A TV picture that has an aspect ratio wider than the ‘normal’ 4:3 – usually 16:9 – while still using standard 525/60 or 625/50 SD video. 16:9 is also the aspect ratio used for
HDTV. There is an intermediate scheme using 14:9 which is found to be more acceptable for those still using 4:3 displays. Widescreen is used on some analogue transmissions as well as many digital transmissions. The mixture of 4:3 and 16:9 programming and screens has greatly complicated the issue of safe areas.
See: document R95-2000 at www.ebu.ch/tech_texts/tech_texts_theme.html
Word clock
Clock information associated with AES/EBU digital audio channels. Synchronous audio sampled at 48 kHz is most commonly used in TV. The clock is needed to synchronise the audio data so it can be read.
WORM
Write Once/Read Many – describes storage devices on which data, once written, cannot be erased or re-written. Being optical, WORMs offer very high recording densities and are removable making them very useful for archiving. CD-R and DVD-R are examples.
WYSIWYG
What You See Is What You Get. Usually, but not always, referring to the accuracy of a screen display in showing how the final result will look. For example, a word processor screen showing the final layout and typeface that will appear from the printer. Or in an edit suite, does the monitor show exactly what will be placed on the master recording?
This subject requires more attention as edited masters are now commonly output to a wide variety of ‘deliverables’ such as SD video, HD video, DVD, VHS, digital projection and film. Issues such as colour, gamma and display aspect ratio may need consideration.
Y, Cr, Cb
The digital luminance and colour difference signals in ITU-R BT.601 coding. The Y luminance signal is sampled at 13.5 MHz and the two colour difference signals are sampled at 6.75 MHz, co-sited with one of the luminance samples. Cr is the digitised version of the analogue component (R-Y); likewise Cb is the digitised version of (B-Y). For the HD SMPTE 274M standard, sampling rates are 5.5 times greater – 74.25 MHz for Y and 37.125 MHz for Cr and Cb.
Y, I, Q
Convenient shorthand commonly – but incorrectly – used to describe the analogue luminance and colour difference signals in component video systems. Y is correct for luminance but I and Q are the two subcarrier modulation axes (I – In-phase and Q –
Quadrature) used in the NTSC colour coding system. Scaled and filtered versions of the
R-Y and B-Y colour difference signals are used to modulate the NTSC subcarrier in the I and Q axes respectively. The confusion arises because I and Q are associated with the colour difference signals but clearly they are not the same thing.
Note: The bandwidths of the modulated I and Q axes are different, making the success of chroma keying with the NTSC signal, whether in analogue or digital form, very limited.
This severely restricts NTSC’s use in modern post production.
Y, R-Y, B-Y
These are the analogue luminance, Y, and colour difference signals (R-Y) and (B-Y) of component video. Y is pure luminance information whilst the two colour difference signals together provide the colour information. The latter are the difference between a colour and luminance: red – luminance and blue – luminance. The signals are derived from the original RGB source (e.g. a camera or telecine).
The Y, (R-Y), (B-Y) signals are fundamental to much of television. For example in ITU-R
BT.601 it is these signals that are digitised to make 4:2:2 component digital video, in the
PAL and NTSC TV systems they are used to generate the final composite coded signal and in DTV they are sampled to create the MPEG-2 video bitstream.
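The derivation from RGB can be written out directly using the Rec. 601 luminance weights. Signals here are normalised to the range 0..1; the scaling factors applied to the colour difference signals in real systems are omitted for clarity.

```python
def rgb_to_colour_difference(r, g, b):
    """Y from the Rec. 601 weighting, then the two colour differences
    (R-Y) and (B-Y)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, r - y, b - y

# white and neutral greys carry no colour difference information
y, r_y, b_y = rgb_to_colour_difference(1.0, 1.0, 1.0)
```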
YUV
Convenient shorthand commonly – but incorrectly – used to describe the analogue luminance and colour difference signals in component video systems. Y is correct for luminance but U and V are, in fact, the two subcarrier modulation axes used in the PAL colour coding system. Scaled and filtered versions of the B-Y and R-Y colour difference signals are used to modulate the PAL subcarrier in the U and V axes respectively. The confusion arises because U and V are associated with the colour difference signals but clearly they are not the same thing. Or could it just be because YUV trips off the tongue much more easily than Y, R-Y, B-Y?
Zits
Spots that occasionally appeared in digital pictures when the technology was in its youth. These were caused by technical limitations but, now that designs have matured, zits appear only during fault conditions.
Digital Film Supplement
Editor: Bob Pank
What started over a decade ago with the use of digital effects and offline in film has grown into the much wider use of digital technology to embrace the whole scene-to-screen process. For many, Avid’s Film Composer was their first encounter with any form of digital film, and today it is commonplace for the offline decision-making process. Certainly the use of equipment like Kodak’s
Cineon, Quantel’s Domino and later Discreet’s Inferno has transformed the whole use of effects shots in motion pictures – even to the extent of changing the way pictures are made. Initially, only the effects shots were put through the film-digital-film process. Now, thanks to further developments in many areas such as digital storage, scanning, processing and projection, whole features are handled as Digital Intermediates to produce digital masters at full film resolution.
As the technology gaps are plugged, events are moving rapidly so that digital pictures and sound can carry all the way from scene to screen. This is already happening with a number of feature films, including Star Wars Episode 2:
Attack of the Clones, shot on digital cameras and digital all the way through to projection in digital, as well as traditional, cinemas. Famously, George Lucas said at NAB 2001 ‘I will never make another film… on film again.’ Clearly he is convinced of the benefits of all-digital production.
The digital scene-to-screen path comprises three distinct areas: capture/shooting, post production and distribution/projection – otherwise known as Digital Cinema. Any part of this process is interchangeable with the celluloid medium, so a combination of film shoot, digital post and film distribution is increasingly common. To a degree the choice depends on budgets, the required look and the distribution/cinema network that is available.
The use of digital technology through the ‘film’ chain is democratising the ‘film’ features business and changing the art. It is already possible to digitally shoot on DV and edit ‘films’ for very low budgets and, with digital projection, these can be shown without ever touching actual film. Digital projection itself opens the door to new possibilities for cinema operators to show a whole new swathe of digital material including live events. Major feature productions are turning to Digital Intermediates and some are using HD cameras, rather than film, for shooting where the benefits include the immediacy and interactivity of shoot and review, to ensure a good take.
After a century of celluloid the whole business is now changing. Quite rightly, the first attempts at applying digital technology involved at least equalling the quality of 35mm camera negative film. However, what cameras record and the detail that is seen in cinemas are two different things – raising the question of what is film quality? Digital projection has already shown the benefits of digital techniques and the whole digital film business is moving forward.
To understand the achievements of celluloid film and the potential of digital film, this section includes some detail about the nature of both and their scene-to-screen path. As digital technology offers viable alternatives and increasing advantages over film, there is a crossover and exchange of terminology which is best understood by having some knowledge of both areas. Here, the emphasis is on the ‘digital intermediate’ or ‘digital lab’ process, rather than acquisition and exhibition.
The film process is designed for capturing images from scenes that will be edited and copied for eventual projection onto hundreds or thousands of big cinema screens.
This has been in operation for over 100 years and so has been developed and refined to a very high degree to precisely meet these objectives. The film stocks used in cameras have a characteristic that allows them to capture a very wide range of scene brightness with good colour saturation to provide wide latitude for colour correction after processing. Intermediate stocks used to copy the one original negative are designed to be as faithful to the original as possible. Print stocks provide the high contrast needed to produce a bright and good contrast image on the projector screen to overcome the background illumination in the theatre.
Television is different in many ways. For example, the results are always instantly viewable and are delivered, sometimes live, to millions of smaller screens. The sensors used in video cameras do not presently have the wide dynamic range of film, so shooting with them has to be more carefully controlled as the ability to correct exposure faults later is more restricted. Also the viewing conditions for video are different from cinema: not many of us sit in a darkened room to watch television, so the images need to be brighter and have more contrast than for film.
The three different basic types of film stock used – camera negative, intermediate and print – each have very specific jobs. Camera negative records as much detail as possible from the original scene, both spatially and in range of light to make that original detail eventually available on a multitude of internegatives from which are produced thousands of release prints for projection.
The Film Lab
Between the camera negative and the print there are normally two intermediate stages:
Interpositive and Internegative. At each point more copies are made so that there are a large number of internegatives from which to make a much larger number of release prints. The object of these intermediate stages is purely to increase the numbers of negatives to print as clearly the precious and unique camera negative would be effectively destroyed with so much handling. The intermediate materials, interpositive and internegative, are exactly the same and designed to make, as near as possible, exact copies for each stage (with each being the negative of the previous stage).
For this requirement the material has a gamma of 1.
But the release print is not just a film representation of the shot scenes: editing, visual effects and grading – not to mention audio work – must take place in between. This post production mainly works in parallel with the film processing path, partly to reduce handling of the negative.
Film post production
Negatives cut from scene list. Rush prints viewed and checked; initial grading decided. Edited copy of rush prints used to make master scene list. Master interpositives produced & distributed to production labs. Release prints produced after final grading figures agreed.
Camera negative is printed to make the rush prints which provide the first viewing of the shot material. Note that this will be at least several hours after the shoot so hopefully all the good takes came out well! The first edit decisions about what footage is actually required are made from the rush prints and with the aid of offline editing.
The negative cutter has the responsibility of cutting the unique footage according to the scene list. Initial grading is applied as the cut negative is transferred to interpositive.
Should there be any further need of grading, instructions for this are sent with the internegatives to the print production labs. Any need of dissolves rather than cuts, or more complex visual effects, will require work from the optical printer or, increasingly, a digital film effects workstation.
Grading or Timing
Grading is the process of applying a primary colour correction to the film copying process. The original camera negative may contain lighting variations which mean that scenes shot on different days, or at different times of day, need to look the same but simply do not. By effectively controlling the colour of the light used to copy the negative to one of the intermediate stages these errors can be much reduced to produce a scene-to-scene match. Grading is carried out on a special system equipped with a video monitor displaying the current frame from the negative loaded onto it. Three controls provide settings of the red, green and blue ‘printer’ light values that adjust the amount of each of the three lights used to image the frame. These adjustments allow the operator to balance the colour and brightness of the scenes in the movie.
This results in a table of corrections linked to the edge code of the original negative.
This table may be stored on floppy disk or paper tape and used to control the optical printer making the copy. Most processing laboratories subscribe to a standard definition of the settings but this does not mean that settings defined at one processing lab can be used at another. The photochemical process is very complex and individual labs will vary. However they all aim toward a standard. The ‘neutral’ value for RGB printer lights is typically represented as between 25, 25, 25 and 27, 27, 27 – depending on which lab is used. Printing an overexposed negative will require higher values and an underexposed one lower values. A change of 1 in the value represents 1/12 of a stop adjustment in exposure. Differential adjustment of the values provides basic colour correction.
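The printer-light arithmetic can be sketched in a few lines. This is an illustration only (the function names and the choice of 25 as the neutral value are assumptions, not a real lab system): each point of printer light corresponds to 1/12 of a stop, i.e. an exposure factor of 2 to the power 1/12.

```python
# Sketch of printer-light values vs exposure (illustrative names only).
# One printer-light point = 1/12 stop = a factor of 2 ** (1/12) in exposure.

NEUTRAL = 25  # a typical lab's neutral value; some labs use 27

def stops_adjustment(light_value, neutral=NEUTRAL):
    """Exposure change, in stops, implied by a printer-light setting."""
    return (light_value - neutral) / 12.0

def exposure_factor(light_value, neutral=NEUTRAL):
    """The same change expressed as a multiplicative exposure factor."""
    return 2 ** stops_adjustment(light_value, neutral)

# A one-stop push of an underexposed negative needs +12 points:
print(stops_adjustment(37))   # 1.0 stop
print(exposure_factor(37))    # 2.0, i.e. twice the exposure
```

So a grader asking for lights of 37, 37, 37 against a neutral of 25 is printing one stop up across all three colours.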
This refers to a way that film image information is transformed into digits and stored.
It uses 10-bit data and so can describe 2^10 = 1024 discrete numbers, or levels: 0–1023 for each of the R, G and B planes in the images. However, as all electronic light sensors are linear, they produce an output proportional to the light they see – in this case, representing the transmittance of the film. This means a large portion of the numbers describes the black and dark areas, and too few are left for the light areas where
‘banding’ could be a problem – especially after digital processing. Transforming the numbers into log by use of a LUT gives a better distribution of the detail between dark and light areas and so offers good rendition over the whole brightness range without having to use more digits. A minimum of 13-bits linear converted to 10-bit log sampling means sufficient detail in the pictures is stored to allow room for downstream corrections.
This is the basis of the Kodak Cineon and SMPTE DPX formats that are widely used in the post production industry.
Short for the image size of 2048 x 1556. This is almost the same as the QXGA computer image resolution, has 3.19 Mpixels and a 1.316:1 aspect ratio – the same as full frame
35 mm film. This image size is increasingly used for digitising full frame 35 mm motion picture film sampled in RGB colour space – making each image 12 MB. Sampling is usually at 10-bit resolution and may be linear or log, depending on the application, and progressively scanned. Note that the sampling includes 20 lines of black between frames because of the use of a full-frame camera aperture. Thus the actual ‘active’ picture area is 2048 x 1536 and has a 4:3 aspect ratio. Removing the aperture creates an ‘open gate’ format which may have no black bar between frames – so all 1556 lines carry picture information.
There are other camera apertures, such as Academy, 1.85 and Cinemascope, which also sample at a nominal 2k size – see Film formats.
Although some may argue for bigger sizes, 2k is considered ideal for digital intermediates – providing enough resolution for digital film. This is borne out by analysis of overall MTF, where the practical difference in final resolution between 2k and
4k is very small. Rather than being used just for effects shots, material for whole movies is scanned and post produced in this format. This generates large amounts of data requiring 287 MB/s, or about 1 TB/h for storage. Such quantities mean there needs to be careful planning of associated high-speed infrastructure, storage and management so that data can be available at the right place at the right time. Online storage is on disks. Offline is on data storage tapes such as DTF2 and LTO.
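The 2k storage figures above are simple arithmetic, sketched below. The calculation assumes the usual Cineon/DPX layout of one 32-bit word per RGB pixel (three 10-bit samples) at 24 fps; the exact published numbers depend on whether the 20 blank lines are counted and on decimal versus binary megabytes.

```python
# Back-of-envelope data rates for full-frame 2k film scans.
# Assumes 3 x 10-bit RGB packed into one 32-bit word per pixel, 24 fps.

WIDTH, HEIGHT = 2048, 1556      # full-frame 2k, including inter-frame black
BYTES_PER_PIXEL = 4             # 32-bit word per RGB pixel
FPS = 24

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
frame_mb = frame_bytes / 2**20              # ~12 MB per frame
rate_mb_s = frame_mb * FPS                  # ~290 MB/s
tb_per_hour = rate_mb_s * 3600 / 2**20      # ~1 TB/h

print(round(frame_mb, 1), round(rate_mb_s), round(tb_per_hour, 2))
```

The result lands close to the ~287 MB/s and 1 TB/h quoted above, which shows why careful planning of storage and infrastructure is needed.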
Not being a video format means that certain facilities are not provided.
For example, there are no VTRs for live recording or playout (only possible with disks), no real-time connections and no electronic cameras – the input is always from film.
However it is a very useful mastering format that is not overly cumbersome for today’s technology. Highest quality output is available on celluloid via a film recorder and, by taking a 1920 x 1080 slice from the digital 2k images, for HD and onward publishing. In addition, very high quality SD and other resolutions can be produced, most likely by down- or cross-conversion from HD.
Digitising Film: The Need for Logs
When a camera exposes a scene onto negative film it is the start of a complex photochemical process which results in a low-contrast orange-based image appearing on the developed negative. This is the original record of the scene that needs to be digitised for the ongoing processes of colour grading, editing and any effects work to produce a digital master. The design of a digitising scheme to produce the best results requires understanding of camera negative film, electronic light sensors and the nature of digital processing.
One of the great things about camera negative film is its ability to hold information in both the deep shadows and extreme highlights as well as in the mid-exposure range.
A modern camera negative stock will have almost a 10-stop exposure range representing a 500:1 brightness range. However, a typical scene will be lit to give a contrast nearer to 100:1 (approximately 6.7 stops) or, for higher contrast, 300:1.
But print film is different and will only show a portion of the scene contrast stored on the negative. Generally, black to white on the print corresponds to a scene contrast of
200:1 so a 300:1 scene contrast means the Director of Photography (DoP) needs to make a decision on how to print the material (see Tutorial: Film Basics). Does he roll off the highlights to show the deep shadow detail or crush the shadows to reveal the detail in the highlights?
Thus camera negative film has the capacity (latitude) to absorb errors in exposure, changes of mind on how to print a scene and extremes of exposure from specular sources. When negative is digitised it is important that such latitude is maintained to give room for manoeuvre in the downstream post production processes. All this information is photo-chemically recorded into the negative and the varying exposure levels can be measured as densities off the negative. As density (D) is a logarithmic measurement related to opacity (O), so
D = Log O
Thus a density of 0.3 represents an opacity of 2, which means only half the available light is transmitted through the negative. A density of 1.0 represents a tenth, and a density of 1.6 (which is about as far as most negative stocks will go) will transmit only about 1/40 of the light.
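The density/opacity relationship D = Log O can be worked through directly; since opacity is the reciprocal of transmission, transmission falls as 10 to the minus D (function names here are illustrative):

```python
import math

# D = log10(O), where opacity O = 1 / transmission T.

def transmission(density):
    """Fraction of light passed by film of the given density."""
    return 10 ** (-density)

def density(trans):
    """Density implied by a measured transmission."""
    return -math.log10(trans)

print(transmission(0.3))              # ~0.5 : half the light
print(transmission(1.0))              # ~0.1 : a tenth
print(round(1 / transmission(1.6)))   # ~40  : about 1/40 passes
```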
The requirement is to digitise negative film images and store the resulting data.
Electronic sensors are very different from our eyes (another photochemical system) in the way they respond to light. The output of all electronic sensors is directly proportional to the light level falling on them and, as they have a finite capacity, they have a limited range of exposure that they can accurately measure. When digitising film, the output of these sensors is then converted to numbers which rise proportionately with the input level provided to the sensor.
It does not take much exposure to increase negative density by 0.3 (transmission = 1/2), significantly more exposure to get the density up to 1.0 (transmission = 1/10), and significantly more again to reach a density of 1.6 (transmission = 1/40). Feeding these intensities into our sensor arrangement shows the following results – assuming the maximum output of our sensor/digitiser is 1000.
Exposure                         Density   Transmission   Sensor output
none (black)                     0.0       1              1000
small (dark grey)                0.3       1/2            500
just below typical scene white   1.0       1/10           100
extremely bright                 1.6       1/40           25
The table shows that much of the digitiser’s capacity (number range) is used up in the dark regions, leaving only a small amount for the highlights. Simply increasing the maximum output of the digitiser (i.e. using more bits) can provide better information in the highlights but the resulting data then requires more storage space and raises distribution issues.
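The arithmetic behind the table is a one-liner per row: a linear sensor’s output is simply its maximum times the transmission for each density. A minimal sketch (the labels and the simplification that zero exposure gives zero density are illustrative):

```python
# Linear sensor (maximum output 1000) digitising negative densities.
# Output is proportional to transmission = 10 ** (-density).

MAX_OUT = 1000

for label, d in [("none (black)", 0.0),
                 ("small (dark grey)", 0.3),
                 ("just below scene white", 1.0),
                 ("extremely bright", 1.6)]:
    out = MAX_OUT * 10 ** (-d)
    print(f"{label:24s} D={d:.1f}  output ~ {out:.0f}")
```

Note how the darkest regions occupy outputs 500–1000 – half the whole number range – while everything above typical scene white is squeezed into 25–100.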
As our eyes have a nonlinear response to light, the density measurements make a very sensible coding for the negative film data. So, if we digitise the film to a high linear precision and convert the numbers, which are a direct measurement of transmission, into densities (log10 of opacity) and then save them to disk, significant storage saving can be made with no perceptible loss of quality. This method of coding is the basis of the Kodak Cineon format and SMPTE DPX format, both widely adopted in post production.
These formats commonly carry 10-bit RGB data, although other forms are possible and will be found in regular use. The 10-bit data represents numbers 0–1023 which are coded as densities from 0–2.046, so each count represents a density change of 0.002. This coding, often referred to as 10-bit log, has sufficient precision to avoid showing digital artefacts, such as banding in lighter picture areas, and can cover the whole density range available from negative film stocks. Each RGB triplet is stored as a 32-bit number, which is a considerable saving over the 40 or more bits needed for the original linear coding.
As the method commonly works with density measurements it suits the film industry.
However, there are different ways to measure densities. The general calculation is always the same but, with colour material, there are colour filters to take into account. The data files should be measures of density, either Status M or printing density. These give different colour renditions which do not materially affect processing but should be allowed for – for instance, when setting a ‘film look’ on a monitor. Using material from different sources will require some colour correction to normalise it before use.
For more information refer to the Cineon colorimetric specification.
For processing, the data can be either handled directly as 10-bit log data or converted into a linear colour space – depending on the requirements.
Processes not requiring the blending of image elements can be executed in log colour space. Here, the on-screen image will appear thin and washed out unless this is corrected by some screen correction process, for example, ‘film look’ to force the display to simulate the effect of ‘printing’ the data. With ‘film look’, an operator can safely colour correct the material and apply DVE and similar types of global adjustment to the image data. What is not technically correct is to apply dissolves, paint, cut and paste and compositing unless the system software knows how to handle this type of data. However, in this pragmatic world some often ignore this detail.
For other processes the data really should be converted to a linear RGB colour space which is designed for display without any further correction on a video monitor.
In theory, the conversion space should have at least the equivalent accuracy of intensity
– meaning at least 13-bit linear for 10-bit log to avoid losses when the result is converted back to 10-bit log. For efficiency, it is possible to do this conversion as it is needed for the painting/blending process. So the user is unaware of the complexity of the operation but will be pleased that the results are of the expected quality.
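As a rough illustration of the 10-bit log to 13-bit linear round trip discussed above, the sketch below uses a pure logarithmic transfer over the full 2.046 density range; real Cineon conversions also involve black and white reference points and the negative’s gamma, which are omitted here, and the function names are illustrative.

```python
import math

# 10-bit log code <-> 13-bit linear intensity, pure log transfer.

MAX_LOG = 1023                 # 10-bit log code values
MAX_LIN = 2 ** 13 - 1          # 13-bit linear intensities (8191)
STEP = 0.002                   # density per log code value

def log_to_lin(code):
    """10-bit log code -> 13-bit linear (high code = bright)."""
    density = (MAX_LOG - code) * STEP
    return round(MAX_LIN * 10 ** (-density))

def lin_to_log(lin):
    """13-bit linear -> 10-bit log code."""
    density = -math.log10(max(lin, 1) / MAX_LIN)
    return round(MAX_LOG - density / STEP)

# Over most of the range, 13 linear bits preserve every log code:
print(all(lin_to_log(log_to_lin(c)) == c for c in range(300, 1024)))
```

With this simplified curve the very darkest codes crowd together in 13-bit linear; in practice the density range and bit depth are chosen so that the useful region of the negative survives the round trip intact.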
Shorthand for a digital image size of 4096 x 3112 – four times the area of 2k. This is specified as a digital film format but is not often used: partly because scanning-in from, and output to, film recorders is slow, and the amount of data produced for the RGB images amounts to over 1.1 GB/s. The main use for this is in some film effects applications where the output shots need to be ‘invisibly’ intercut with the original negative.
The answer print, also called the first trial print, is the first print made from edited film and sound track. It includes fades, dissolves and other effects. It is used as the last check before running off the release prints from the internegatives.
Best light (pass)
Similar to a one light pass but, by implication, the timer has studied the film more thoroughly to select a timing light that will agree with the majority of the footage.
Camera negative (film)
Camera negative film is designed to capture as much detail as possible from scenes.
This not only refers to its spatial resolution but also its dynamic resolution. Modern camera negative stock has almost 10 stops’ exposure range and so is able to record detail in both the low-lights and the highlights which are well beyond the range that can be shown on the final print film. This provides latitude to compensate for over or under exposure during the shoot or to change the look of a scene. The latitude is engineered into the film stock by giving it a very low gamma of around 0.6.
Exposed and developed camera colour negative film has an orange tint and is low in contrast – differing greatly from the un-tinted and high contrast print film. As not only the blue, but also the red and green layers of the film are sensitive to blue light, the orange layer is added below the blue layer to stop blue light going further. All types of film stocks use orange dye but for print films it is bleached away during processing.
There are numerous stocks available. High speed stocks work well in lower lights but tend to be more grainy. The opposite is true for low speed stocks.
An RGB bitmap file format (extension .cin) developed by Kodak and widely used for storing and transferring digitised film images. It accommodates a range of film frame sizes and includes up to full Vista Vision. In all cases the digital pictures have square pixels and use 10-bit log sampling. The sampling is scaled so that each of the code values from 0-1023 represents a density difference of 0.002 – describing a total density range of 2.046, equivalent to an exposure range of around 2,570:1 or about 11.3 stops.
Note that this is beyond the range of current negative film.
The format was partly designed to hold virtually all the useful information contained in negatives and so create a useful ‘digital negative’ suitable as a source for post production processing and creating a digital master of a whole programme.
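The ~2,570:1 (11.3-stop) figure quoted above can be checked by dividing the coded density range by a typical negative gamma of 0.6 to get the range of scene exposure the coding can represent (the variable names are illustrative):

```python
import math

# Exposure range covered by Cineon's 2.046 density range, assuming a
# negative gamma (slope of the D/log E curve) of 0.6.

DENSITY_RANGE = 1023 * 0.002      # 2.046
NEG_GAMMA = 0.6

log_e_range = DENSITY_RANGE / NEG_GAMMA    # ~3.41 decades of exposure
exposure_ratio = 10 ** log_e_range         # ~2,570:1
stops = math.log2(exposure_ratio)          # ~11.3 stops

print(round(exposure_ratio), round(stops, 1))
```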
Colour timing (a.k.a. Grading)
The colour of film exposed and processed in a laboratory is controlled by separately altering the amount of time that the red, blue and green lights are used to expose the film. This is referred to as colour timing and its effect is to alter contrast of R, G and B to create a required colour balance.
In a lab, colour timing is usually applied at the point where the edited negative is copied to the master interpositive but can be done later at other points if required. In the digital film process, colour timing is applied at any required point, as required. In addition there is far more flexibility for colour control with gamma, hue, luminance, saturation as well as secondary colour correction. In addition, the results can be seen immediately and projected onto a large cinema screen and further adjusted if required. The images have precise colour settings to show the results as if output via film, or digitally.
Refers to the digital distribution and projection of cinema material. High definition television and the continuing development of digital film projectors using DLP and D-ILA technology allow high quality viewing on large screens. The lack of all-too-familiar defects such as scratches and film weave – even after many showings – already has its appeal. Besides quality issues, D-cinema introduces potential new methods for duplication and distribution, possibly by satellite, and more flexibility in screening.
The SMPTE Task Force on Digital Cinema, DC28, was set up to recommend standards.
An instrument used to measure the density of film, usually over small areas of images.
The instrument actually operates by measuring the light passing through the film, which is a measure of transmittance, and green film will transmit more light than red or blue. Also, negative film differs from positive film. So two sets of colour filters are used: Status M density is measured for camera negative and intermediate stocks (orange/yellow-based) and Status A for print film.
The density (D) of a film is expressed as the log of opacity (O).
D = Log O
Using a logarithmic expression is convenient as film opacity has a very wide range and the human sense of brightness is also logarithmic.
If supplying a digital cinema the ‘lab’ operation can be streamlined and simplified as the
‘film’ can remain in digital form for its grading and editing as well as for effects.
Adding the soundtrack allows everything to be placed onto a single digital master which is, effectively, a near-perfect copy of the scanned cut negative. Unlike film which can only be reproduced by one-to-one contact (hence all the interpositive and internegative copying to build up the numbers), digital media is reproduced by a one-to-many operation. All distribution copies, master copies, etc., can be made at one time. This is quicker, cheaper and avoids any wear on the master version.
The distribution can use digital channels. Currently DVDs are distributed and copied to hard disks at digital cinemas. There has also been direct distribution via satellite or broadband. With relatively low data speeds delivery can be overnight, whereas high data speeds can be used for live showings direct from the studio/distributor.
Digital Intermediate (DI)
Generally digital intermediate refers to a digital file or files resulting from a scan of a film
(usually negative) original that is used for the editing, effects and grading/colour correction. It is the material that is used in digital labs and constitutes the whole film.
As such it should carry all the useful information that is contained in the camera negative to provide both the latitude and sharpness of the original for which scanning at
2k resolution, 10-bit log is ideal.
It could also apply to material directly recorded from a digital TV camera – from DV through to HDCam, DVCPRO HD or Viper FilmStream Data camera. Currently there is no electronic camera available working beyond 1920 x 1080 resolution.
Negatives cut from scene list. Rush prints viewed and checked; initial grading decided. Edited copy of rush prints used to make master scene list. Film scanner → digital effects/edit workstation → any number of 1st generation or compressed copies, any format.
The film lab accepts exposed footage and eventually delivers edited and graded masters – in the form of internegatives – to the production labs to produce large numbers of release prints. Although the boundaries may vary, generally the digital lab accepts developed camera negative and outputs an edited and graded internegative master for a whole or part of a feature. However, the operational and decision-making processes in between differ greatly from those of the film lab, not least because of the interactive nature of the operation. In the digital lab, decisions become on-screen reality and are seen in full context as they are prepared – no waiting for the ‘chemical’ lab. Grading, dissolves, cuts and effects can be seen immediately and on a big screen – if needed.
The interactive process can be more creative and gives complete confidence that the decisions work well.
Using large-scale digital storage means that long sections of finished material can be sent for output to the digital lab’s film recorder, exposing 1000-foot reels at a time.
SMPTE file format for digital film images (extension .dpx) – ANSI/SMPTE 268M-1994.
This uses the same raster formats as Cineon and only differs in its file header.
The measurement of the range of brightness in a scene, expressed as a ratio or as the log10 of the ratio. Typically the lighting cameraman will try to keep the scene to less than 40:1 (log = 1.6) to avoid loss of detail in the print. A 100:1 (log = 2) contrast range in the scene is a typical maximum.
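The ratio/log bookkeeping used here is simple to check, and the same ratio can also be expressed in photographic stops (doublings); the function names are illustrative:

```python
import math

# Contrast ratios expressed as log10 values and as stops.

def log_of_ratio(ratio):
    return math.log10(ratio)

def stops(ratio):
    return math.log2(ratio)

print(round(log_of_ratio(40), 1))    # ~1.6
print(round(log_of_ratio(100), 1))   # ~2.0
print(round(stops(100), 1))          # ~6.6 stops
```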
Any film editing, other than cuts, has traditionally required the use of a film optical lab where dissolves, wipes and any compositing work for making effects shots can be carried out. The techniques are highly refined but lack interactivity and are not lossless
– hence the jump in colour and quality before and after a film dissolve. Such effects work is increasingly undertaken in digital equipment. For this the appropriate camera negative footage is scanned and stored – usually onto disks. This is then digitally processed in an effects workstation, which can be lossless irrespective of the number of effects layers, provided that the images remain uncompressed, and the results can be seen immediately. The completed effects shots are then output to a film recorder to produce new negative footage to be cut in with the rest.
Digital film effects
Selected camera negatives → film scanner → digital effects workstation. Negatives cut from scene list + digital effects shots. Master interpositives produced.
Exposure refers to the amount of light that falls on a film or light sensor. In a camera this is controlled both by time, with the shutter, and by the effective lens aperture, referred to as the f-number or T-number.
Unlike pre-HD television, which had only two image formats, 525/60I and 625/50I,
35 mm film has many. Of these the most common are Full Frame, which occupies the largest possible area of the film, Academy and Cinemascope. The scanning for these is defined in the DPX file specification as follows:
Scanning resolution   Full frame      Academy         Cinemascope
4k                    4,096 x 3,112   3,656 x 2,664   3,656 x 3,112
2k                    2,048 x 1,556   1,828 x 1,332   1,828 x 1,556
1k                    1,024 x 778     914 x 666       914 x 778
These scan sizes generally represent the valid image size within the total frame size indicated by full frame. Historically it was considered that all scanning should be done at full frame size, as this avoids the complexity of adjusting the scanner optics or raster size, with the risks associated with repeatability and stability.
In addition, different camera apertures can be used to shoot at different aspect ratios.
All these (below) are ‘four perf’ (four perforations of film per frame – a measure of the length of film used) and so all consume the same amount of stock per frame. Note that scanners (and telecines) typically change scan size to maintain full 2k images regardless of aspect ratio, so it is no longer normal for work to be scanned at a fixed full frame size.
There are many more 35 mm formats in use.
For lower-budget movies Super 16 is a good match for post production in standard definition television (601).
Equipment which inputs digital images and outputs exposed negative. CRT and laser-based recorders expose high-resolution images onto film. Currently CRT-based recorders are fastest, outputting 2k images onto camera negative stock at the rate of one per second. The fastest laser-based units take 2.2 seconds for full aperture images.
Websites: www.arri.com, www.celco.com
A general term for a device that creates a digital representation of film for direct use in digital television or for digital intermediate work. For television, film scanners are now replacing traditional telecines. For film, they should have sufficient resolution to hold the full detail of the film so that, when transferred back to film, the film-digital-film chain can appear as an essentially lossless process. For this, film scanners are able to operate at greater than HD resolutions (1920 x 1080). 2k is the most commonly used format but 3k and 4k are also in use – mainly for effects-related work. The output is data files rather than the digital video that would be expected from a traditional telecine.
The dynamic resolution needs to fit with the ongoing process. If the digital material is treated as a ‘digital camera negative’ to act as a digital intermediate, then it must retain as much of the latitude of the negative as possible. In this case the material is transferred with a best light pass and the linear electronic light sensors, often CCDs, have to sample to at least 13 bits of accuracy (describing 8192 possible levels). Using a LUT, this can be converted into 10-bit log, which holds as much of the useful information but does not ‘waste’ data by assigning too many levels to dark areas of pictures.
Note that this is different from using a telecine to transfer film to video, where normal practice is to grade the film as the transfer takes place, so additional latitude is not required in the digital state and 10-bit or 8-bit linear coding is sufficient.
Gamma has several meanings. In the video world, a CRT television monitor’s brightness is not linearly proportional to its driving voltage; the light output increases more rapidly than the drive. This factor, the gamma of the CRT, is generally taken to be 2.6.
This is compensated for in TV cameras by a gamma of 0.45, giving an overall gamma of 0.45 x 2.6 = 1.17 – adding overall contrast to help compensate for domestic viewing conditions.
In film gamma describes the average slope of the D/Log E curve over its most linear region. For negative stocks this is approximately 0.6, for intermediate stocks this is 1.0
and for print stocks 3.0. This gives a system gamma of 0.6 x 1 x 3 = 1.8. This overall boost in contrast is much reduced due to flare and auditorium lighting conditions.
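The overall contrast figures quoted above are simply the product of the individual stage gammas, which can be sketched as:

```python
def system_gamma(*stage_gammas):
    """Overall gamma of a chain is the product of its stages' gammas."""
    product = 1.0
    for g in stage_gammas:
        product *= g
    return product

# Video chain: camera (0.45) x CRT (2.6) = about 1.17
# Film chain: negative (0.6) x intermediates (1.0) x print (3.0) = 1.8
```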
Grading (film and digital)
Individually adjusting the contrast of the R, G and B content of pictures to alter the colour and look of the film. Traditionally this has been controlled in film labs by adjusting the amount of R, G and B light to alter the exposure of print to negative – known as colour timing. This is commonly applied during copying to interpositives in order to achieve a shot-to-shot and scene-to-scene match by altering the overall colour balance of the images. This is not a hue shift but a control of the proportions of R, G, and B in the final images. For film being transferred for video post production in a telecine, grading using lift, gain and gamma is executed shot-by-shot, as the film is transferred.
Today there is a growing move toward digital grading which involves one continuous best-light pass through a film scanner with the result being stored to a data server.
Grading then takes place purely in the digital domain and without film in the telecine.
Not only does this free up the scanner for other work but it also involves far less film handling and so is far quicker and involves less risk of damaging the film. Note that digital grading requires that there is sufficient latitude in the digital material to allow a wide range of adjustment. This involves using either linear sampling at 13 bits or more, or 10-bit log DPX/Cineon files.
As a part of the chemical lab film intermediate process, internegatives are made by contact printing from interpositives. These very much resemble the cut negative. The stock is the same as for interpositives: slow, very fine grain with a gamma of 1, and the developed film is orange-based. Again, to increase numbers, several internegatives are copied from each interpositive. These are then delivered to production labs for large-scale manufacture of release prints.
This is a first part of the chemical lab intermediate process where a positive print of film is produced from the cut (edited) camera negative. Interpositives are made by contact printing onto another orange-base stock. In order to preserve as much detail as possible from the negative, including its dynamic range, interpositive material is very fine grain, slow, and has a gamma of 1. During the copying process, grading controls are used to position the image density in the centre of the interpositive material’s linear range. As a part of the process of going from one camera negative to, possibly, thousands of prints, a number of interpositives are copied from the negative.
Latitude is the capacity of camera negative film to hold information over a wider brightness range than is needed for the final print. This provides a degree of freedom that is needed because it is impossible to see if the exposure is totally correct until the film comes back from the laboratory – long after the set has been struck and everyone has gone home. Latitude provides room for later adjustment in printing to compensate for over or under exposure. Using digital cinematography, it is possible to see the results immediately and make any required adjustment at the shooting stage. This procedure can reduce the need for a very wide latitude (which cannot extend to the release prints) by ensuring the lighting and set ups are always correct at the shoot.
Look-Up Table. This refers to a table of conversion factors used to transfer information between two differing but related systems. For example, there is often a requirement to look at digital image material to see what it looks like on a CRT, digitally projected and projected via film – all of which have different characteristics. A fast and relatively simple way to process the digital material so that it looks as it should on all displays is to replace the level of each pixel with a pre-computed value stored in the LUT entry corresponding to that level. Thus a linear CCD output can be processed to drive a CRT, which has a highly non-linear ‘gamma’ characteristic. In addition it provides a path between linear electronically scanned images and the logarithmic world of film.
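As a sketch of the CRT case, a gamma-correction LUT can be pre-computed once and then applied with one table lookup per pixel. The 8-bit range and the CRT gamma of 2.6 are assumptions for illustration, matching the Gamma entry above.

```python
def build_gamma_lut(gamma=1 / 2.6, bits=8):
    """Pre-compute a gamma-correction LUT (assumes an 8-bit range and
    a CRT gamma of 2.6 - illustrative values, not a standard)."""
    max_code = (1 << bits) - 1
    return [round(max_code * (v / max_code) ** gamma)
            for v in range(max_code + 1)]

def apply_lut(pixels, lut):
    # Replace each pixel level with its pre-computed table entry.
    return [lut[p] for p in pixels]
```

Because the table is built once per conversion rather than once per pixel, the per-pixel cost is a single memory read, which is what makes LUTs fast enough for real-time image paths.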
The Modulation Transfer Function is a measure of the spatial resolution carried by a film – akin to frequency response in electronic images. To assess this, the film is exposed to special test images comprising sine wave bars of successively higher frequencies. The results on the processed film are assessed by measuring its density over microscopically small areas to obtain peak-to-trough values for the different frequencies. These results should then be corrected to allow for the response of the lens, the test film itself and any D/Log E non-linearities.
In a practical film system, the film images pass through many components including the camera lens, intermediate stocks and contact printing to the projection lens.
Each of these has its own MTF and the system MTF can be calculated as follows.
MTFsystem = MTF1 x MTF2 x MTF3 etc…
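The cascade can be sketched numerically by multiplying the per-stage responses point by point. The curve values below are illustrative only, not measured data:

```python
def system_mtf(*stage_curves):
    """Multiply per-stage MTF curves, sampled at the same spatial
    frequencies, point by point to get the whole-system response."""
    combined = [1.0] * len(stage_curves[0])
    for curve in stage_curves:
        combined = [c * m for c, m in zip(combined, curve)]
    return combined

# Illustrative (made-up) responses at 5, 10, 20 and 40 cycles/mm:
lens = [0.95, 0.90, 0.75, 0.50]
negative = [0.98, 0.90, 0.70, 0.40]
print_stock = [0.97, 0.88, 0.65, 0.35]
overall = system_mtf(lens, negative, print_stock)
```

Since every stage's MTF is at most 1, the combined response can never be better than the weakest stage at any frequency, which is why losses accumulate so quickly through the print chain.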
HD, 2k, 4k: How Big is Big Enough?
Film and digital images are very different materials but both are now used to carry very detailed moving images to big screens. Film has been developed and in wide use for over 100 years whereas digital technology is the new boy on the block and has had to prove itself up to the task. The yardstick has been the quality of 35mm movie film. That in itself leaves plenty of room for discussion as there is a big difference between what is captured on the camera negative and the information shown to cinema audiences – especially those watching a print on its 100th, or even 200th, showing. Scratches, dirt and weave aside, one accepted measure of film’s spatial resolution is its MTF.
Measured as amplitude versus cycles/mm, fine-grain colour film (50 EI) is theoretically flat to 20 cycles/mm but at half amplitude by around 70. A practical measurement, made with a premium 60mm lens, was flat only to 7 cycles/mm and at half amplitude at about 25. The difference can be attributed to many factors including the performance of the lens and the inability to focus accurately through the thickness of the film emulsion.
Practical system MTF for camera and 50 EI film
The film process then continues by copying the material twice (to interpositive and internegative). The material used for this is generally very fine grain and very low speed
(there’s more time in the lab!). A lens-based printer would have the same problems as mentioned above, so it is more common to use a contact printer. This would seem ideal but we should ask what is in contact with what?
Film contact printing with a diffuse light source
The best case is that the film emulsions are in contact but that leaves the original red layer and its copy partner as much as 16 microns apart. As printers use a diffuse light source, separation is one of many opportunities for resolution loss.
MTF changes through printing: 1. Camera negative; 3. Rush print; 4. Release print
Applying the same analysis to scanning film with a CCD array it is possible to develop a mathematical model for its MTF performance. This takes into account the size of the detector cell surface and the spatial size of the modulating pattern. Here, the cycles/mm axis is scaled to represent an Academy film frame with a selection of resolutions: 1500, 3000 and 4000 pixels (the number of cells in the detector) over the width of the film frame.
Theoretical MTF for different resolutions
Once again there are various factors that mean the practical response is less than the theoretical. It is also dependent on wavelength, generally better toward blue and worse toward red. As a result, the whole-system response shows little difference between the sampling resolutions – even with 1,500-pixel sampling. Looking at 0.5 modulation, all results are close around the 20 cycles/mm frequency. There is very little gain in using 4,000 pixels against 3,000, and 2k (2,000 pixels) performs well. This is because the MTF losses of the film and camera system on the original negative dominate over those of the digital resolution.
MTF including camera, film, CCD and scanner lens:
Quantel model for film resolution
With current technology it is hard to justify the extra cost of using 4k over 2k. It would be very economical to take advantage of the rapidly developing economies available in HD and use 1920 pixels/line. The real shortcoming with HD is its 1080 lines, making a 16:9, or 1.78:1, aspect ratio – whereas Academy film is 1.316:1 (scanned at 2048 x 1556), and keeping to this size allows more flexibility to select the various aspect ratios that may be required for deliverables.
Scanning on a pixel-per-pixel basis – where the value of each output pixel is defined by the output from just one cell of the CCD – will result in aliasing on the digital image, but only at a very fine level. A better solution is to use a larger array, say twice the wanted size, and interpolate down to the required output.
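The oversample-then-interpolate idea can be sketched with a simple box filter; this crude pairwise average stands in for the scanner's real interpolation filter, which would be more sophisticated.

```python
def downsample_2x(samples):
    """Reduce a 2x-oversampled scan line to output resolution by
    averaging neighbouring pairs - a crude box filter standing in
    for a real interpolation filter. Averaging before decimating
    suppresses aliasing compared with simply taking every other
    cell's value directly."""
    assert len(samples) % 2 == 0
    return [(samples[2 * i] + samples[2 * i + 1]) / 2
            for i in range(len(samples) // 2)]
```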
Considering the state of the current technologies in film, lenses and scanners there seems little point in using higher resolutions for whole movies – the extra data will just increase the cost for little, or no benefit. The only possible requirement for greater resolution might be for effects work that requires close attention to very fine detail. That is not to say that higher resolutions may not be used in the future as technologies develop. To date, audiences have been well impressed with digital projections using
1280 x 1024 pixels which, although not a scientific test, is valid as, in the end, it’s the audience that matters.
Film that shows the shot scene as negative images. Negative material is used in several stages of the traditional film production chain that culminates in release prints.
In a one-light pass, all the material is given the same exposure during printing in a film processing laboratory. This is the simplest, quickest and cheapest way to print all the film and the results are typically used for making rushes, dailies, etc. These are often subsequently telecined and recorded to videotape as a reference for the offline decision-making process.
Short for perforations. It is a way to describe some information about the format of images on 35 mm film by how many of the perforations, or sprocket holes, are used per image. For example, Full Frame is 4 perf.
Film stock designed specifically for distribution and exhibition at cinemas. Unlike negative film, it is high contrast and low on latitude. This is designed to give the best performance when viewed at cinemas.
Obviously the release print has to be clear of the orange base so this is bleached out during processing.
The illumination used to expose film in the processing laboratory. ‘White’ light is passed through red, blue and green filters so that the exposure to each can be individually controlled. Film is contact printed, placing the new film stock against the processed film that carries the images. The amount of light can be varied to provide the required exposure to show more detail in the highlights or the shadows or to keep to the mid-range of the scene brightness. Printing an overexposed negative requires higher values, and an underexposed one lower values. A change of 1 in the value represents 1/12 of a stop adjustment in exposure. Differential adjustment of the values provides basic colour correction (timing). The values for the lights are recorded as grading (timing) numbers onto disk or paper tape.
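The 1/12-stop step works out as follows; the function names here are purely illustrative.

```python
def printer_points_to_stops(delta_points):
    """Each printer-light point is 1/12 of a stop."""
    return delta_points / 12.0

def exposure_factor(delta_points):
    """Multiplicative change in exposure for a printer-light change.

    Stops are powers of two, so +12 points = one stop = double
    the exposure, and -12 points halves it.
    """
    return 2.0 ** (delta_points / 12.0)
```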
Digital projectors input digital images and project them onto cinema-sized screens.
Huge advances in this technology in the last five years have been one of the driving forces behind digital film. There are two prominent technologies in the large projector area, D-ILA from JVC and DLP from Texas Instruments. The viewing public is generally very impressed with the results as, without film’s scratches, dirt and weave, they are treated to consistent high quality results.
The resolving power of film is a measure of its maximum spatial resolution. To assess this, the film is exposed to special test images comprising sine wave bars of successively higher frequencies. The results on the processed film are then judged by a panel of viewers – making them somewhat subjective.
Status M and Status A
A ratio of amount of light where one stop represents a doubling or halving of the light.
The operating range of film and electronic light sensors, such as CCDs, are quoted in stops. Typically, a camera’s shutter speed and the lens’s aperture setting restrict the light arriving at the sensors/film so the mid brightness of the required scene corresponds to the middle of the sensor’s or film’s sensitivity range for the required shutter speed.
Stops are simply the expression of a ratio, not absolute values. As they represent doubling of light, they are actually powers of 2. So
1 stop = x 2
2 stops = x 4
3 stops = x 8
4 stops = x 16 etc
Note that cine lenses are often marked in f-stops (white) and T-stops (red). The former is a geometric relationship between focal length and aperture and does not take into account how much light is lost within a lens. T-stops do and represent the real working values. So, on a lens that loses a full stop in transmission (i.e. a 50-percent loss), f/8 would result in the same exposure as T11. F and T values are usually close on prime lenses but zoom lenses show a greater difference.
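The f-stop/T-stop relationship above can be expressed as a small calculation. The formula (T-number = f-number divided by the square root of the transmittance) follows from light varying with the square of the aperture ratio; the function name is illustrative.

```python
import math

def t_number(f_number, transmittance):
    """T-number of a lens from its geometric f-number and the
    fraction of light it actually transmits (0 to 1)."""
    return f_number / math.sqrt(transmittance)

# A lens losing a full stop (transmittance 0.5) set at f/8:
# t_number(8, 0.5) is about 11.3, marked T11 on the barrel.
```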
As ‘best light’, but a technical grade is often a lower-contrast scan made to include all highlights and lowlights; a best-light grade is often of correct contrast but with some clipping of highlights or lowlights.
Timing and Timer
Timing refers to the amount of the separate R, G and B lights that are used to expose film in a laboratory as a part of the grading process. The term is sometimes also applied to colour correction (grading) during telecine transfers. The timer is one who decides and controls the lights’ timing.
Basic optical measurements
Film has a very different nature to television. The following technical treatment of film optical measurements explains something of how images from a single camera are moved and multiplied to thousands of screens.
We see the information on film by shining a light (incident light) through it and viewing the light transmitted through it. The amount of transmitted light is dependent on the transmittance (T) of the film.
Transmittance (T) = transmitted light/incident light
Opacity (O) is a measure of the resistance of a material to transmit light.
Opacity (O) = 1/T
For convenience, as opacity has a very wide range and our perception of light, and therefore opacity, is logarithmic, it is expressed as a logarithmic value (to base 10) called density.
Density (D) = Log10 (O) = Log10 (1/T)
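The three measurements defined so far can be computed directly; a film that passes 1% of the incident light has an opacity of 100 and a density of 2.0.

```python
import math

def transmittance(transmitted_light, incident_light):
    # T = transmitted light / incident light
    return transmitted_light / incident_light

def opacity(t):
    # O = 1 / T
    return 1.0 / t

def density(t):
    # D = log10(O) = log10(1/T)
    return math.log10(1.0 / t)
```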
Exposure (E) is the volume of light falling on a point of a film. Generally what we, and film, see is light reflected off surfaces that have a degree of reflectance. In a camera this will be limited by factors such as attenuation through the lens system (aperture setting) and exposure time.
Exposure (E) = incident light x reflectance x lens attenuation x time
Exposure can have a very wide range and so, once again is usually represented as a log value, Log10 E, and the greater the exposure, the denser the resulting film – producing a negative version of the original image. If this were true for the entire density range then:
D = G x Log10 E (a straight line of gradient G, the gamma)
Ideal D/Log E film characteristic (gradient = gamma)
Although the original aim may have been to faithfully display the original scene, this is simply not possible – a projector cannot deliver the same brightness as the sun!
Making the transmittance of the print equivalent to the reflectance of the scene is also not practical due to photochemical limitations; besides, with much less light available in projection than in the original scene, it would result in a rather dull image on the screen.
In addition, there is the dilution of contrast by extraneous light. So, to compensate for this, the print film is given more contrast than the original scene by controlling the rate of change of opacity with exposure, called gamma (G), which appears as the slope of the D/log E graph.
In practice, the linear (constant G) range of film material is limited to between the ‘toe’ and ‘shoulder’ of the real-world D/Log E characteristic.
Practical camera negative film characteristic, showing the toe, linear region (gradient = G) and shoulder
Camera negative film has a low contrast, or density range, and is designed to capture as high a range of exposure as possible – hence the gentle roll-off in the toe and shoulder.
This range is beyond that which is printable (more later) but provides room so adjustments can be made after shooting to compensate for any over or under exposure.
The graph shows information could be recovered from about –1 to +2 of Log E – a range of 10^3, or 1000:1, or about 10 stops. By placing this onto the print film characteristic, which has a much higher gamma (slope) and, therefore, density range (contrast), the negative/print differences are clear.
Comparison of print and camera negative film
For modern colour film materials the following typically apply:
D = 0.6 x Log E (gamma of negative film is approximately 0.6)
D = 3.0 x Log E (gamma of print film is approximately 3.0)
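These two characteristics can be applied to a one-stop exposure change (log10 of 2, about 0.30 in Log E) on the linear part of each curve:

```python
import math

def density_change(delta_log_e, gamma):
    """Density change on the linear part of the D/Log E curve,
    where D = gamma x Log E."""
    return gamma * delta_log_e

one_stop = math.log10(2)                  # about 0.30 in Log E
neg = density_change(one_stop, 0.6)       # about 0.18 on the negative
prt = density_change(one_stop, 3.0)       # about 0.90 on the print
```

The same exposure change produces five times the density change on the print as on the negative, which is the contrast boost described above.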
In the printing process, only the linear portion of the characteristics is normally used.
By altering the print lights, the relatively narrow exposure range of print film can be effectively moved along the negative density range for best results in the shadows or highlights.
Printable range uses only a part of the negative image
offers a free document service available as PDF downloads. Note that Technical
Standards are designated N, Recommendations R, Statements D and Information I.
provides an index. ‘Standards Scopes’ provides a useful glance of what’s available.
offers on-line purchasing. Most relevant items are listed at
Some general browsing
Keep up to date with Quantel products, news and events, access papers and comments and Quantel contact information.
DTV in the UK and general news and information on digital television worldwide.
Daily news and comment and useful industry reference.
Online version of Post Magazine, all about Post in the USA.
Daily news from Europe and beyond.
Includes editorial, news, product and professional centres.
Broadcast Engineering magazine with news, articles, archives and more.
www.broadcastnow.co.uk
The online magazine of the UK’s Broadcast Magazine.
The Audio Engineering Society
60 East 42nd Street
Tel: +1 212 661 8528
Fax: +1 212 682 0477
E-mail: [email protected]
American National Standards
11 West 42nd Street
Tel: +1 212 642 4900
Fax: +1 212 398 0023
Department of Trade and Industry
New King’s Beam House
72 Upper Ground
London SE1 9SA
Tel: +44 2072 110211
2500 Wilson Boulevard
Tel: +1 703 907 7500
European Broadcasting Union
Ancienne Route, 17a
Case Postale 67
Tel: +41 22 717 2111
Fax: +41 22 717 2200
E-mail: [email protected]
Route des Lucioles
F-06921 Sophia Antipolis
Tel: +33 (0)4 92 94 42 00
Fax: +33 (0)4 93 65 47 16
Office of Science &
1919 M Street NW
DC 20554, USA
Tel: +1 202 418 0200
Fax: +1 202 418 0232
E-mail: [email protected]
3, rue de Varembé
Case Postale 131
CH-1211 Genève 20
Tel: +41 22 919 02 11
Fax: +41 22 919 0300
E-mail: [email protected]
International Organisation for Standardisation
1, rue de Varembé
CH-1211 Genève 20
Tel: +41 22 749 01 11
Fax: +41 22 733 34 30
E-mail: [email protected]
33 Foley Street
London W1P 7LB
Tel: +44 2072 553000
Fax: +44 171 306 7753
E-mail: [email protected]
Place des Nations
CH-1211 Genève 20
Tel: + 41 22 730 5111
Fax: + 41 22 733 7256
E-mail: [email protected]
Society of Motion Picture and
595 West Hartsdale Avenue
Tel: +1 914 761 1100
Fax: +1 914 761 3115
E-mail: [email protected]
All trademarks acknowledged.
Clipbox, Clipnet, Delta Editing, Dylan, Dynamic Rounding, Editbox, Frame Magic, Hal, Henry, Paintbox,
Picturebox, Quantel eQ, Quantel gQ, Quantel iQ, Qedit pro, Resolution Co-existence,
Stencil are trademarks of Quantel Limited.