Networking and Digital Technology
Second Edition
Vlado Damjanovski
Elsevier Butterworth–Heinemann
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
Linacre House, Jordan Hill, Oxford OX2 8DP, UK
Copyright © 2005, Elsevier Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior
written permission of the publisher.
Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in
Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]
uk. You may also complete your request on-line via the Elsevier homepage (, by
selecting “Customer Support” and then “Obtaining Permissions.”
Recognizing the importance of preserving what has been written, Elsevier prints its books on acid-free
paper whenever possible.
Library of Congress Cataloging-in-Publication Data
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ISBN: 0-7506-7800-3
For information on all Elsevier Butterworth–Heinemann publications
visit our Web site at
Printed in the United States of America
05 06 07 08 09 10 10 9 8 7 6 5 4 3 2 1
1. SI units of measurement
The basic units
Derived units
Metric prefixes
2. Light and television
A little bit of history
Light basics and the human eye
Light units
Measuring object illumination in CCTV
Light onto an imaging device
Colors in television
Color temperatures and light sources
Eye persistence
3. Optics in CCTV
Lenses as optical elements
Geometrical construction of images
Aspherical lenses
F and T numbers
Depth of field
Neutral density (ND) filters
Manual, auto, and motorized iris lenses
Video- and DC-driven auto iris lenses
Auto Iris lens electronics
Image and lens formats in CCTV
Angles of view and how to determine them
Fixed focal length lenses
Zoom lenses
C- and CS-mount and back-focus
Back-focus adjustment
Optical accessories in CCTV
4. General characteristics of television systems
A little bit of history
The very basics of television
The video signal and its spectrum
Color video signal
Instruments commonly used in TV
Spectrum analyzer
Television systems around the world
5. CCTV cameras
General information about cameras
Tube cameras
CCD cameras
Sensitivity and resolution of the CCD chips
Types of charge transfer in CCDs
Pulses used in CCD for transferring charges
CCD chip as a sampler
Correlated double sampling (CDS)
Camera specifications and their meanings
Minimum illumination
Camera resolution
Signal/noise ratio (S/N)
Dynamic range of a CCD chip
Color CCD cameras
White balance
CMOS technology
Special low-light intensified cameras
Camera power supplies and copper conductors
V-phase adjustment
Camera checklist
6. CCTV monitors
General about monitors
Monitor sizes
Monitor adjustments
Impedance switch
Viewing conditions
LCD monitors
Projectors and projection monitors
Plasma display monitors
Field emission technology displays
7. Video processing equipment
Analog switching equipment
Video sequential switchers
Video matrix switchers (VMSs)
Switching and processing equipment
Quad compressors
Multiplexers (MUX)
Recording time delays
Simplex and duplex multiplexers
Video motion detectors (VMDs)
Video printers
8. Analog video recorders
A little bit of history and the basic concept
The early VCR concepts
The video home system (VHS) concept
Super VHS, Y/C, and comb filtering
Consumer VCRs for CCTV purposes
Time-lapse VCRs (TL VCRs)
9. Digital video
Why digital video?
Digital video recorders (DVRs)
The various standards
ITU-601: Merging the NTSC and PAL
The resolution of ITU-601 digitized video
The need for compression
Types of compressions
DCT as a basis
The variety of compression standards in CCTV
Motion JPEG 2000
About pixels and resolution
Dots per inch (DPI)
Psychophysiology of viewing details
Recognizing faces and license plates in CCTV
Operating systems and hard disks
Hard disk drives
The different file systems
FAT (File Allocation Table)
FAT 32 (File Allocation Table 32)
NTFS (New Technology File System)
HFS and HFS+
MTBF (Mean Time Between Failure)
10. Transmission media
Coaxial cables
The concept
Noise and electromagnetic interference
Characteristic impedance
BNC connectors
Coaxial cables and proper BNC termination
Installation techniques
Time domain reflectometer (TDR)
Twisted pair video transmission
Microwave links
RF wireless (open air) video transmission
Infrared wireless (open air) video transmission
Transmission of images over telephone lines
Cellular network
Fiber optics
Why fiber?
The concept
Types of optical fibers
Numerical aperture
Light levels in fiber optics
Light sources in fiber optics transmission
Light detectors in fiber optics
Frequencies in fiber optics transmission
Passive components
Fusion splicing
Mechanical splicing
Fiber optics multiplexers
Fiber optics cables
Installation techniques
Fiber optic link analysis
11. Networking in CCTV
The Information Technology era
Computers and networks
The main Ethernet categories
10 Mb/s Ethernet (IEEE 802.3)
Fast Ethernet (IEEE 802.3u)
Gigabit Ethernet (IEEE 802.3z)
Gigabit Ethernet over Copper (IEEE 802.3ab)
10 Gigabit Ethernet
Wireless Ethernet (IEEE 802.11)
Data speed and types of network cabling
Ethernet over coax and UTP cables
Patch and crossover cables
Fiber optics network cabling
Network concepts and components
Networking software
The Internet protocols
The OSI seven-layer model of networking
1. The Physical layer
2. The Data Link layer
3. The Network layer
4. The Transport layer
5. The Session layer
6. The Presentation layer
7. The Application layer
IP addresses
IPv4 addressing notation
IP address classes
Class A, B, and C
Private addresses
IP address Class C
IP loopback address
Zero addresses
IP address Class D and Multicast
IP address Class E and limited broadcast
IP network partitioning
Virtual private networking (VPN)
IPv6 Address Types
Reserved addresses in IPv6
Domain Name Systems (DNS)
Networking hardware
Hubs, bridges, and switches
Routers for logical segmentation
Network ports
A network analogy example
Wireless LAN
What is 802.11?
802.11 (legacy)
Certification and security
What about Bluetooth?
Putting a network system together
The IP check commands
12. Auxiliary equipment in CCTV
Pan and tilt heads
Pan and tilt domes
Preset positioning P/T heads
PTZ site drivers
Camera housings
Lighting in CCTV
Infrared lights
Ground loop correctors
Lightning protection
In-line video amplifiers/equalizers
Video distribution amplifiers (VDAs)
13. CCTV system design
Understanding the customer’s requirements
Site inspections
Designing and quoting a CCTV system
Installation considerations
Training and manuals
Handing over
Preventative maintenance
14. Video testing
The CCTV Labs test chart
Before you start testing
Use high-quality lens
Use high-quality monitor
Setup procedure
What you can test
Other important measurements
Getting the best possible picture
Measurement of the digital image compression quality
The CCTV Labs test pattern generator TPG-8
How you could use the TPG-8
TPG-8 buttons description
The TPG-8 Navigator software
Instruments used with the TPG-8
Test patterns and how to create them
Appendix A: Common terms used in CCTV
Appendix B: Bibliography and acknowledgments
Appendix C: All the CCTV links in the world
Appendix D: Book co-sponsors
About the author
Closed Circuit Television, commonly known as CCTV, is an interesting area of television technology.
It is usually used in surveillance systems, but a lot of components and concepts can be implemented
in an industrial production monitoring system, or, equally, in a hospital or university environment. So,
even though the majority of readers would be looking at this book as a great help in understanding and
designing surveillance systems, my intention was not to limit the topics to this area only.
This book you are holding in your hands is a new and enhanced version of the previous edition of
CCTV, which was published in 1999. During the past five years, so much has happened in the closed
circuit television industry that there was a need for a new and updated edition. I was pleased to see
the previous book being so highly valued by many readers, as well as constantly rated with five stars
on many web sites, including the popular This has made me even more committed to
making this new edition better and more informative. I did not change the core contents of the previous
edition, since the basics of CCTV are still the same, but I did “fine-tune” certain sections and add new
illustrations, and, most importantly, I enhanced the contents with new chapters on Digital and Networking in CCTV.
I have tried to cover the theory and practice of all components and fundamentals of CCTV. This is a
very wide area and involves various disciplines and technologies: electronics, telecommunications,
optics, fiber optics, digital image processing, programming, and, over the last few years, networking
and IP communications. So, my intention was to have a new book which still
encompasses the basic concepts of CCTV but also includes, explains, and demystifies digital CCTV,
video compression, and networking.
Analog television is a complex technology, especially for people who have never had the opportunity
to study it, but understanding digital is even harder without understanding analog CCTV. So, if you
are not familiar with analog CCTV, do not think even for a moment that you can bypass it and go
straight to digital and networking. Digital CCTV makes much more sense once you know
analog CCTV.
As with the previous edition, I had to read and learn new things myself, and then I tried to put everything into the same style and perspective as the previous chapters. Understandably, I did not want to
reinvent the wheel, but I made efforts to simplify and explain the most important aspects of these new
technologies. I would not have felt comfortable writing about these new subjects if I did not have some
practical experience (though modest, at least so far), so I have tried to approach them from a practical CCTV
perspective. Should you be interested in more in-depth knowledge of networking and digital technology, there are
numerous books I would recommend (some are listed in the Bibliography section), but this book will
give you a good summary of the basics of the relevant CCTV aspects.
As with the previous edition, I have deliberately simplified explanations of concepts and principles,
made many illustrations, tables, and graphs for better understanding, and tried to explain them in
reasonably plain language. Still, a technical-minded approach is required.
Keeping up to date with the latest technologies and products was made easier through my involvement as
editor of the international magazine for Closed Circuit Television, CCTV focus (
You can find many new technological topics and most up-to-date articles on the magazine’s web site. The
CCTV focus magazine was launched in 1999, the same year the previous edition of this book was published, and it quickly became one of the most respected magazines in the CCTV industry. It has already
been translated into Russian and most likely will be translated into Chinese and German as well. You will
find it the best extension of this book, for it is continuously updated with the latest topics, most of which
are downloadable in Acrobat PDF format.
Another associated web site that could be extremely useful is the CCTV Labs web site (
CCTV Labs is my own company, specializing in consultancy, design, training, and publishing. The CCTV
Labs web site was started in 1995 with the very first edition of the book CCTV and is now one of the longest
serving CCTV web sites on the Internet. My intention was to have as much useful information on this web
site as possible, and I am proud to say that the CCTV Labs web site is now one of the most frequently visited
CCTV web sites in the world.
The CCTV Central section of the CCTV Labs web site lists all the known CCTV businesses in the world. By
visiting the CCTV Labs web site, you have instant access to almost every CCTV product, company,
and manufacturer. Soon we will have it categorized and will include search tools for finding products
and technologies.
This book is intended for, and will be very helpful to, installers, salespeople, security managers, consultants, manufacturers, and everyone else interested in CCTV, provided they have some basic technical knowledge.
The specially designed CCTV test chart printed on the back cover of the book will help you in video-quality testing, as explained in the last chapter of the book. This will be very handy for evaluating
cameras, monitors, and transmission, but also the playback quality of recording systems, regardless of
the compression they use. For readers who need a bigger and better test chart, CCTV Labs produces a
high-resolution, light-framed A3 format of the same chart, which can be ordered from the CCTV Labs
web site. The CCTV Labs test chart has been widely accepted, to the point that over 500 manufacturers and businesses worldwide are now using it. It is a great tool to check the quality of your system
and compare it with others.
In addition to the test chart, CCTV Labs has also produced a specialized programmable test pattern
generator (CCTV Labs TPG-8), the only such tool in the industry. It offers solid reference signals for
testing a variety of system parameters. This is not part of the book, although it has been designed as a
logical extension of the CCTV Labs test chart, so we decided to offer a special discount of 10% to all
readers who decide to order it and mention this book as a reference.
So much has changed in the five years since the last edition of this book was published by Butterworth–Heinemann (now Elsevier) that the question became not whether a new edition should be written but
only when.
I would like to thank many readers who have already made numerous suggestions and corrections.
Readers who themselves write technical articles would know that no matter how many times one goes
through one’s own text, one will still find things that could be corrected or said somehow differently,
and unavoidably there will be some errors, although I did my best to eliminate them. So please
feel free to write to me if you find something that needs to be changed or corrected for future editions. I am
especially thankful to Nicolas Echave from Argentina for his observations and suggestions regarding the section on calculating the light falling onto an imaging chip, as well as to Bernard Cuzzillo from Berkeley,
California, for his suggestions regarding correct light measurements using a photo camera.
I would also like to thank my colleague Les Simmonds for his assistance in providing me with some
nice oscilloscope measurements and screen shots.
I owe special thanks to Elsevier and its staff for making this book a reality, and in particular, I would
like to thank Pam Chester, Jennifer Soucy, and Sarah Hajduk.
This book has been made possible by Elsevier, as well as the CCTV manufacturers who have believed
in me and co-sponsored this edition. These are: Ademco Video Systems, Axis Communications, Bosch
Security Systems, Dallmeier Electronic, Elbex, Fast Video Security, Geutebrück, ITV, and Pelco.
The biggest thanks should go to CCTV Labs and CCTV focus magazine for their “loss of productivity” during my engagement with this book.
Thank you for purchasing the book and I hope you enjoy reading it.
Vlado Damjanovski, B.E. Electronics
Sydney, November 2004
[email protected]
[email protected]
Web Pages:
All names and trademarks mentioned in this book are registered marks of their respective owners.
This book has 14 chapters and they are written in a logical order.
Chapter 1, SI units of measurement, introduces the basics of the units of measurement, which I think
are important to mention even though they are not strictly a CCTV subject but a general technical one. Many products, terms, and concepts exist in the world of CCTV which sooner or later need to
be referred to with a correct unit. SI units are recommended by the ISO (the International Organization for
Standardization), and if we accept these units as universal it will make our understanding of the products
and their specifications clearer and more accurate. I have also listed the common metric prefixes because
I have found that a lot of technicians do not know them. If you are an engineer or have a good technical
background, you may not find this chapter of interest, so you can go directly to Chapter 2.
Chapter 2, Light and television, starts with a little bit of history so we can gain a wider perspective
of the television revolution. Then we go to the very basics of human vision: light and the human eye.
It is necessary to explain the human eye and how it works because television relies greatly on the human eye’s physiology. It is interesting to compare the similarities between the eye’s and the camera’s
Optics in CCTV is Chapter 3, which focuses on the first, and a very important, component of a CCTV system:
the lens. Apart from the discussion on how lenses work and what their most important features are, there
is also a practical explanation of how and what to adjust (ALC and Level) on a lens, how to determine
a focal length for a particular angle of view, and, very important for CCTV, how back-focusing should
be done. C- and CS-mounts are also discussed and explained, as well as various chip sizes.
Chapter 4, General Characteristics of television systems, is very important, especially for readers
without prior knowledge of how television works. I have discussed both major standards, PAL and
NTSC. I do apologize to readers using SECAM for not going into detail on this standard. I simply
could not find sufficient literature to study it, although there are many similarities with PAL, at least in
the number of lines and fields per second used. General discussion on resolution is also included, and
more importantly the difference between a broadcast signal and CCTV video signal. Near the end of
the chapter I have also mentioned the most common instruments used in TV and what they measure. At
the end, I have included tables that show the differences between various television system subgroups,
as well as a listing of all the countries in the world with their adopted TV system.
Chapter 5, CCTV cameras, is probably the most interesting chapter in the book. It discusses at length
the concepts of CCD cameras, various designs, and camera specifications. Here, I have also included
a discussion of measurement and calculation of light coming onto a camera, power supplies, and voltage drops. I consider these very important practical issues which I have been asked about very often.
Although they seem trivial, a lot of problems have been caused by improper camera setting or powering
(unregulated or overrated power supply, thin wires, high-voltage drop). I found it suitable to discuss
this issue in the camera section because power supplies form a part of the camera assembly. I have
also included, at the end of this chapter, a very practical checklist which you or your installers can use
to make the CCTV installation trouble-free.
CCTV monitors are discussed in Chapter 6, and I have devoted space to both B/W and color monitors.
Obviously, my main concentration is on the CRT monitors, as they are the most common in CCTV
today. You will find explanations on various important issues associated with monitors, like gamma,
the impedance switch, and viewing conditions. At the end of this chapter, I have included a description
of some major new developments in the display technology. At the time of the release of the previous
edition of this book, many of these technologies were only technical news, but today some of them
have been or are being widely adopted.
In Chapter 7, Video processing equipment, I have covered the “good old” sequential switchers
and then the matrix switchers, as representatives of the “analog” processing range, and of course quads,
multiplexers, video motion detectors, and frame stores as representatives of the “early digital” range.
Chapter 8, Analog video recorders, discusses their very important role in CCTV. Although it is slowly
being forgotten, the VHS format is explained, as it is still common, and I have also included
the S-VHS format. Digital video storage, however, is becoming increasingly popular, and I found it
important to say a few words about it in a new chapter.
Chapter 9, Digital video, is the main reason for this new book. When the previous edition of
this book was published (1999), digital was only hinted at; now (2005) there is almost no system installed without
a digital video recorder or network in place. This chapter discusses all the intricacies of digital video and
why it is important to compress it. It also analyzes the various compressions and puts them in a logical
Transmission media, Chapter 10, is one of the biggest chapters, owing to the large variety of transmission types
used in CCTV. Clearly, the coaxial cable is the most common and widely accepted, so I have dedicated
most of the space to the coaxial cable concept. Through my practical experience, and I believe a lot of
readers will agree, I have found that the majority of problems in the existing or just recently installed
CCTV systems are due to bad cable installations and/or terminations. So I have devoted some space
to the actual termination techniques. In the rest of the chapter you will find explanations of the other
media, like twisted pair, microwave, RF wireless, infrared, telephone lines, and, the most important for
the near future (at least in my opinion), fiber optics. You will find quite a lot of space devoted to fiber
optics, starting with the explanation of the concepts, light sources used in fiber, cables, and installation
techniques. This technology is not as new as some may think; rather, it has become very affordable
and easier to use and thus more common in larger CCTV systems.
Chapter 11, Networking in CCTV, covers the other important new technology we now face: networking and IT. This goes hand in hand with digital CCTV, but logically comes after the
Transmission media chapter, as it belongs to the transmission section. The Networking in CCTV
chapter is not intended to substitute for the more in-depth literature you can find on networking and IT
(since there is plenty of it around), but it gives the “non-IT” reader some basic concepts and
an understanding of the increasingly important field of Information Technology.
Chapter 12, Auxiliary equipment in CCTV, includes the good old discussion on pan and tilt heads,
housings, lighting, infrared lights, ground loop correctors, lightning protection, video amplifiers, and
distribution amplifiers.
The previous twelve chapters focus on the equipment side of a CCTV system, so in Chapter 13, CCTV
System design, I discuss my understanding of how to design a CCTV system. This chapter is based
purely on practical experience and on feedback from installers and users. You do not have to accept this
as the only way to design a system, but I have certainly found it is very efficient and accurate. In this
chapter I have also included the actions taken after the system design is finished and installed. These
are: commissioning, training, and handing over. Preventative maintenance is often forgotten, but it is
an important part of a complete CCTV system offer. Even if preventative maintenance is done after
the system is finished I think it is important for this to be listed here as part of the complete picture of
The last, Chapter 14, Video testing, advises readers on the usage of the CCTV Labs test chart, which
I traditionally put on the back cover of the book in order to help you measure and test video. Many
people have found the CCTV Test Chart very useful, and, not surprisingly, it has become a de facto industry standard, so it might be interesting to know how to use it. We regularly update and enhance the
chart, adding more useful features. Now you can use the test chart not only to determine camera
resolution, but also to see if you can recognize a person at a certain distance. For the more dedicated
CCTV technicians, the same test chart is also available in A3 format, foam-framed, and printed on
nonreflective, chemical-proof paper with durable and stable colors. Also, the full description of how to
use the test chart, apart from appearing in the book, is available on our web site. Finally, there is
the new CCTV Labs test pattern generator, the first programmable test pattern generator in the industry, designed and manufactured according to our specifications. Unfortunately, we cannot include this
product with the book, but hopefully, after reading how useful it can be in your video tests, you may
decide to order it via the CCTV Labs web site (
Appendix A, Common terms used in CCTV, explains exactly what the heading says. I have tried
to include all the terms, acronyms, and names one might come across in CCTV and accompanying
In Appendix B, Bibliography and acknowledgments, you can find some interesting reference material and web sites, some of which I have used in the preparation of this book.
In Appendix C, All the CCTV Links in the World, we include a complete listing of all known CCTV
businesses in the world, courtesy of CCTV Labs and CCTV focus magazine.
I hope that this book will be very helpful and informative in all your CCTV work.
1. SI units of measurement
The basic units
The laws of physics are expressions of fundamental relationships between certain physical quantities.
There are many different quantities in physics. In order to simplify measurement and to comply with
the theory of physics, some of them are taken as basic quantities, while all others are derived from
those basic ones.
Measurements are made by comparing the magnitude of a quantity with that of a given unit of that quantity.
In physics, of which electronics and television are a part, the International System of Units, known
as SI (from the French Système international d’unités), is used.
The following are the seven basic units:
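Quantity / Unit / Symbol
length / meter / m
mass / kilogram / kg
time / second / s
electric current / ampere / A
thermodynamic temperature / kelvin / K
luminous intensity / candela / cd
amount of substance / mole / mol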
These basic units are defined by internationally recognized standards.
Until 1983, for example, the meter was defined as a certain number of wavelengths of a
specific radiation in the spectrum of krypton. In October 1983 it was redefined as the distance that light
travels in vacuum during a time of 1/299,792,458 second.
The standard of the kilogram, for example, is the mass of a particular platinum-iridium alloy
cylinder kept at the International Bureau of Weights and Measures in Sèvres, France.
The basic unit of time, the second, was defined in 1967, as a “time required for a Cesium-133 atom
to undergo 9,192,631,770 vibrations.”
The kelvin has the same scale division as the degree Celsius; only the starting point differs. Zero kelvin
(0 K), called absolute zero, is equivalent to −273.15 °C.
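Because the two scales share the same division, converting between them is a simple offset. A minimal Python sketch, assuming the exact offset of 273.15 between the scales; the function names are illustrative only and do not come from any library:

```python
# Absolute zero expressed in degrees Celsius: 0 K = -273.15 degrees C.
ABSOLUTE_ZERO_C = -273.15


def celsius_to_kelvin(t_c: float) -> float:
    """Convert a temperature in degrees Celsius to kelvins (same step size, shifted origin)."""
    return t_c - ABSOLUTE_ZERO_C


def kelvin_to_celsius(t_k: float) -> float:
    """Convert a temperature in kelvins to degrees Celsius."""
    return t_k + ABSOLUTE_ZERO_C


if __name__ == "__main__":
    print(celsius_to_kelvin(0.0))  # 273.15 (freezing point of water)
    print(kelvin_to_celsius(0.0))  # -273.15 (absolute zero)
```

Since the step size is identical, a temperature difference of 1 K is exactly the same as a difference of 1 °C; only absolute values need the offset.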
All other units in physics are defined with some combination of the above-mentioned basic units. For
example, the area of a block of land is defined by the equation:
$$P = a \times b$$
where a is the width of the block of land and b is its length. If both a and b are expressed in meters
[m], the product P will be expressed in square meters [m²]. We should mention that in mathematics multiplication
is not always represented with the × sign as above; very often a dot · is used between the factors
being multiplied, or sometimes no symbol at all.
We all know that speed, for example, is expressed in [m/s], although we quite often use [km/h]. We can
easily convert [km/h] into [m/s] by knowing how many meters there are in a kilometer and how many
seconds there are in an hour.
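The conversion amounts to multiplying by 1000 (meters in a kilometer) and dividing by 3600 (seconds in an hour), which is the same as dividing by 3.6. A minimal Python sketch; the function names are illustrative and not part of any library:

```python
METERS_PER_KILOMETER = 1000
SECONDS_PER_HOUR = 3600


def kmh_to_ms(v_kmh: float) -> float:
    """Convert a speed from kilometers per hour to meters per second."""
    return v_kmh * METERS_PER_KILOMETER / SECONDS_PER_HOUR


def ms_to_kmh(v_ms: float) -> float:
    """Convert a speed from meters per second to kilometers per hour."""
    return v_ms * SECONDS_PER_HOUR / METERS_PER_KILOMETER


if __name__ == "__main__":
    print(kmh_to_ms(36))  # 10.0
    print(ms_to_kmh(10))  # 36.0
```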
SI units are almost universally accepted in science and industry throughout the world, and we should
all be aware that measurements like “inches” for length, “miles per hour” for speed and “pounds or
stones” for weight should be used as little as possible. They often cause confusion in people from
various professions and various parts of the world. If you use SI units, more people will understand
you and your product. Also, it is easier to compare products from various parts of the world if they
use the same units.
Another very important thing to clarify is that every symbol in the SI system has a precise meaning
that depends on the case of the letter used (capital or lowercase). So, a kilometer is written as [km], not [Km] or [klm]. A
megabyte is written as [MB], not [mB]. A nanometer is written as [nm], not [Nm] and so on. As technical
people involved in closed circuit television, we should stick to these principles.
Derived units
All other physical quantities can be explained and measured using the basic units. We will not go into
the details of how they are obtained, nor is it the purpose of this book to do so, but it is important to
understand that there is always a fundamental relation between the basic and derived units.
The following are some of the derived SI units, some of which will be used in this book:
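Quantity / Unit / Symbol and definition
Area / Square meter / m²
Volume / Cubic meter / m³
Speed / Meters per second / m/s
Acceleration / Meters per second per second / m/s²
Frequency / Hertz / Hz = 1/s
Density / Kilograms per cubic meter / kg/m³
Force / Newton / N = kg·m/s²
Pressure / Pascal / Pa = N/m² = kg/(m·s²)
Torque / Newton meter / N·m
Energy, work / Joule / J = N·m
Power / Watt / W = J/s
Electric charge / Coulomb / C = A·s
Electric potential / Volt / V = W/A
Electrical resistance / Ohm / Ω = V/A
Electrical capacitance / Farad / F = C/V
Electrical conductance / Siemens / S = A/V
Magnetic flux / Weber / Wb = V·s
Magnetic flux density / Tesla / T = Wb/m²
Inductance / Henry / H = Wb/A
Illuminance / Lux / lx = lm/m²
Luminous flux / Lumen / lm = cd·sr
Luminance / Candela per square meter (nit) / nt = cd/m²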
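The “fundamental relation between the basic and derived units” can be made concrete by tracking the exponent of each base unit. The following is a minimal Python sketch, assuming a unit is represented as a dictionary of base-unit exponents (for example, m/s² becomes {"m": 1, "s": -2}); the helpers `mul`, `div`, and `fmt` are hypothetical names written only for this illustration, and only four base units (kg, m, s, A) are modeled:

```python
# A unit is represented as {base unit symbol: exponent}, e.g. m/s² -> {"m": 1, "s": -2}.
# Illustrative sketch only: not a units library, and only kg, m, s, and A are modeled.


def mul(a: dict, b: dict) -> dict:
    """Multiply two units by adding the exponents of each base unit."""
    result = dict(a)
    for base, exp in b.items():
        result[base] = result.get(base, 0) + exp
    # Drop base units whose exponents cancelled out.
    return {base: exp for base, exp in result.items() if exp != 0}


def div(a: dict, b: dict) -> dict:
    """Divide unit a by unit b by subtracting the exponents of b."""
    return mul(a, {base: -exp for base, exp in b.items()})


def fmt(unit: dict) -> str:
    """Render a unit as a product of base units, e.g. 'kg·m^2·s^-3'."""
    return "·".join(base if exp == 1 else f"{base}^{exp}" for base, exp in sorted(unit.items()))


# SI base units used below.
kg, m, s, A = {"kg": 1}, {"m": 1}, {"s": 1}, {"A": 1}

# Derived units built from their definitions in the table.
newton = div(mul(kg, m), mul(s, s))  # N = kg·m/s²
pascal = div(newton, mul(m, m))      # Pa = N/m²
joule = mul(newton, m)               # J = N·m
watt = div(joule, s)                 # W = J/s
volt = div(watt, A)                  # V = W/A
ohm = div(volt, A)                   # Ω = V/A
coulomb = mul(A, s)                  # C = A·s
farad = div(coulomb, volt)           # F = C/V

if __name__ == "__main__":
    for name, unit in [("N", newton), ("Pa", pascal), ("J", joule), ("W", watt),
                       ("V", volt), ("Ω", ohm), ("F", farad)]:
        print(f"{name:>2} = {fmt(unit)}")
```

Reducing every derived unit to base-unit exponents is a quick way to check that a definition is dimensionally consistent; for instance, multiplying ohms by amperes must give exactly the exponents of the volt.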
Metric prefixes
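When the number of units (i.e., the value) for a particular measurement is very large or very small,
there is a convention for using certain symbols before the basic unit, and each has a specific meaning.
The following are metric prefixes accepted by the international scientific and industrial community
that you may find not only in CCTV but also in other technical areas:
Prefix / Symbol / Factor
yotta / Y / 10²⁴
zetta / Z / 10²¹
exa / E / 10¹⁸
peta / P / 10¹⁵
tera / T / 10¹²
giga / G / 10⁹
mega / M / 10⁶
kilo / k / 10³
hecto / h / 10²
deca / da / 10¹
(none) / – / 10⁰ = 1
deci / d / 10⁻¹
centi / c / 10⁻²
milli / m / 10⁻³
micro / µ / 10⁻⁶
nano / n / 10⁻⁹
pico / p / 10⁻¹²
femto / f / 10⁻¹⁵
atto / a / 10⁻¹⁸
zepto / z / 10⁻²¹
yocto / y / 10⁻²⁴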
By using these prefixes, we can say 2 km, referring to 2000 meters. If we say 1.44 MB, we are thinking
of 1,440,000 bytes. Data transmission speed over networks is commonly expressed
in megabits per second (Mb/s), which is different from megabytes per second (MB/s). One byte is equal
to 8 bits, and they are denoted with lowercase “b” for bits and capital “B” for bytes, so 100 Mb/s corresponds
to 12.5 MB/s. A nanometer is 0.000000001 meters. The frequency of 12 GHz is 12 · 10⁹ = 12,000,000,000 Hz,
and so on.
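The prefix arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hand-written lookup table covering only a subset of the prefixes; `to_base_units` and `megabits_to_megabytes` are hypothetical helper names, not a standard library API. The lookup is deliberately case-sensitive, mirroring the rule that [km] is correct and [Km] is not:

```python
# Powers of ten for a subset of SI prefixes. The lookup is case-sensitive on purpose:
# "m" (milli, 10^-3) and "M" (mega, 10^6) are different prefixes, and "K" is not a prefix.
PREFIX_EXPONENTS = {
    "T": 12, "G": 9, "M": 6, "k": 3, "": 0,
    "m": -3, "µ": -6, "n": -9, "p": -12,
}

BITS_PER_BYTE = 8


def to_base_units(value: float, prefix: str) -> float:
    """Scale a value written with an SI prefix to the unprefixed unit (e.g. 2 km -> 2000 m)."""
    if prefix not in PREFIX_EXPONENTS:
        raise ValueError(f"unknown SI prefix: {prefix!r}")
    return value * 10.0 ** PREFIX_EXPONENTS[prefix]


def megabits_to_megabytes(rate_mb_s: float) -> float:
    """Convert a data rate in megabits per second (Mb/s) to megabytes per second (MB/s)."""
    return rate_mb_s / BITS_PER_BYTE


if __name__ == "__main__":
    print(to_base_units(2, "k"))       # 2 km expressed in meters
    print(to_base_units(12, "G"))      # 12 GHz expressed in hertz
    print(to_base_units(1, "n"))       # 1 nm expressed in meters
    print(megabits_to_megabytes(100))  # 100 Mb/s expressed in MB/s
```

Rejecting an unknown symbol such as "K" with an error, rather than guessing, reflects the point that every SI symbol has one precise, case-dependent meaning.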
A very common unit used these days in CCTV when handling hard disk drives is the gigabyte (GB). In decimal
terms, one gigabyte is equal to a thousand megabytes, or a million kilobytes. In binary terms, however,
1 GB is 1024 MB (1024 = 2¹⁰), and 1 MB is 1024 kB. When
hard disk manufacturers write 300 GB on their disks, this represents a decimal 300,000,000,000 bytes.
So when such a hard disk is installed in a computer, the operating system reports 279 GB. This is the
binary value, and it is obtained by dividing 300,000,000,000 by 1024 to get kB, then by 1024
again to get MB, and finally by 1024 again to get GB.
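The hard disk example can be checked with a short Python sketch. It assumes, as the text does, that the manufacturer’s figure is decimal (10⁹ bytes per GB) and that the operating system divides by 1024 three times; the function name `reported_binary_gb` is illustrative only:

```python
BYTES_PER_DECIMAL_GB = 10**9  # how disk manufacturers count a gigabyte
BINARY_STEP = 1024            # 2**10, used for each step bytes -> kB -> MB -> GB


def reported_binary_gb(advertised_gb: float) -> float:
    """Convert an advertised (decimal) disk capacity to binary gigabytes."""
    total_bytes = advertised_gb * BYTES_PER_DECIMAL_GB
    kilobytes = total_bytes / BINARY_STEP
    megabytes = kilobytes / BINARY_STEP
    return megabytes / BINARY_STEP


if __name__ == "__main__":
    gb = reported_binary_gb(300)
    print(f"300 GB disk -> {gb:.1f} binary GB")  # 279.4, shown by the OS as 279 GB
```

Each binary step is about 2.4% larger than the decimal one, so the gap grows with every prefix level: roughly 7% at the gigabyte level.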
Now that we have established the basics of a technically correct discussion, that is, introduced the
basic units of measurement, we can start with the fundamental element of all vision, including photography,
cinematography, and television: light.
2. Light and television
Let there be light.
A little bit of history
Light is one of the basic and greatest natural phenomena, vital not only for life on this planet, but
also very important for the technical advancement and ingenuity of the human mind in the visual
communication areas: photography, cinematography, television, and multimedia.
Even though it is so “basic” and we see it all the time and it is all around us, it is the single biggest
stumbling block of science. Physics, from a very simple and straightforward science at the end of the
nineteenth century, became very complex and mystical. It forced the scientists in the beginning of the
twentieth century to introduce the postulates of quantum physics, the “principles of uncertainty of the
atoms,” and much more – all in order to get a theoretical apparatus that would satisfy a lot of practical
experiments but, equally, make sense to the human mind.
This book is not written with the intent of going deeper into each of these theories, but rather I will
discuss the aspects that affect the television and video signals.
The major “problem” scientists face when researching light is that it performs a dual function: it behaves
as though it is of a wave nature (nonmaterial) – through the effects of refraction and reflection – but
it also appears as though it has material nature – through the well-known photo-effect discovered by
Heinrich Hertz in the nineteenth century and explained by Albert Einstein in 1905. As a result, the
latest trends in physics are to accept light as a phenomenon of a “dual” nature.
It would be fair at this stage, however, to give credit to at least a few major scientists in the development
of physics, and light theorists in particular, without whose work it would have been impossible to attain
today’s level of technology.
Isaac Newton was one of the first physicists to explain many natural phenomena, including light. In
the seventeenth century he explained that light has a particle nature. Later in that century, Christian
Huygens proposed an explanation of light behavior through the wave theory. Many scientists, however,
had deep respect for Newton and did not change their views until the very beginning of the nineteenth
century, when Thomas Young demonstrated the interference behavior of light. Augustin Fresnel also
performed some very convincing experiments that clearly showed that light has a wave nature.
A very important milestone was the appearance on the scientific scene of James Clerk Maxwell, who in
1873 asserted that light was a form of high-frequency electromagnetic wave. His theory predicted the
speed of light as we know it today: 300,000 km/s. Maxwell’s theory was confirmed by the experiments
of Heinrich Hertz. Hertz, however, also discovered an effect that is known as the photo-effect, where
light can eject electrons from a metal whose surface is exposed to light. What was difficult to explain
was the fact that the energy with which the electrons were ejected was independent of the light
intensity, which contradicted the wave theory. According to the wave theory, more light should give
more energy to the ejected electrons.
This stumbling block was satisfactorily explained by Einstein who used the concept of Max Planck’s
theory of quantum energy of photons, which represent minimum packets of energy carried by the light
itself. With this theory, light was given its dual nature, that is, some of the features of waves combined
with some of the features of particles.
This theory so far is the best explanation for the majority of light behavior, and that is why in CCTV
we apply this “dual approach” theory to light.
In explaining the concepts of lenses used in CCTV, we will be using, most of the time, the wave theory
of light, but we should always keep in mind that some principles, such as the operation of the CCD chip,
are based on the particle behavior of light. In such cases we will be using the material approach to
light.
Clearly, in practice, light is a mixture of both approaches, and we should always keep in mind that
they do not exclude each other.
Light basics and the human eye
Light is electromagnetic radiation. The human eye is sensitive to this radiation and perceives its
various frequencies as colors. Electromagnetic radiation obviously comes in all frequencies, i.e.,
wavelengths, as can be seen in the drawing on the right. Visible light occupies only a very small
“window” in this range. This window is between 380 nm and 780 nm. For easy remembering, however,
we take it to be roughly from 400 nm to 700 nm.
The 400 nm wavelength corresponds to violet and 700 nm to red. There is a continuous color change
from violet to blue, green, yellow, orange, and red as the wavelength increases. Many experiments and
tests have been done to check the sensitivity of an average human eye and, as can be seen from the
drawing, not all colors produce the same effect on the eye.
The electromagnetic spectrum and the human eye
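Because frequency and wavelength are linked through the speed of light (f = c/λ), the limits of the visible window can also be stated as frequencies. The helper below is a hypothetical illustration; it uses the exact defined value of c in vacuum.

```python
SPEED_OF_LIGHT_M_S = 299_792_458  # speed of light in vacuum, m/s (exact by definition)


def wavelength_nm_to_thz(wavelength_nm: float) -> float:
    """Return the frequency in THz (10**12 Hz) of light with the given wavelength in nm."""
    wavelength_m = wavelength_nm * 1e-9
    return SPEED_OF_LIGHT_M_S / wavelength_m / 1e12


if __name__ == "__main__":
    # Edges of the visible window, plus the eye's peak sensitivity at 555 nm.
    for nm in (380, 400, 555, 700, 780):
        print(f"{nm} nm -> {wavelength_nm_to_thz(nm):.0f} THz")
```

Shorter wavelengths give higher frequencies, which is why violet (about 750 THz) sits at the high-frequency end of the visible window and red (about 430 THz) at the low end.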
Green color excites the eye the most. In other words, if we have all the wavelengths of the light with an
equal amount of energy, the green will produce the highest “output” on the retina. Frequencies higher
than the violet (wavelengths shorter than 400 nm) and lower than the red (wavelengths longer than 700
nm) cannot be detected by the “average” human eye. I emphasize this “average” because human eye
sensitivity is a statistical curve. There are people who are “color blind,” which means their eye spectral
sensitivity is different (usually narrower) from the one shown. Some “color-blind” people cannot see
red color, some cannot see blue. A trained, professional eye of a painter or a photographer may develop
very high sensitivity for detecting various frequencies (colors) which might look the same to others.
Some may even extend their minimum and maximum detectable frequency limit, that is, see deeper
violet or red colors that are invisible to other individuals.
A very interesting question to ask ourselves is why the eye’s spectral sensitivity peaks in the green
color area (at around 555 nm). This can be associated with the fact that, of all the sun’s energy that
penetrates the Earth’s atmosphere, the biggest amount is contained in the wavelengths at around
555 nm.
After millions of years of evolution of life on this planet, we (and most animals) have developed
vision using the wavelengths that are most readily available (at least during the daytime). An obvious
alternative is found in the night vision of animals whose prey are warm-blooded mammals, since body
heat is nothing more than infrared radiation. Typical examples are snakes, cats, and owls. Some snakes,
for example, apart from using their eyes for general vision, also have infrared-sensitive pit organs with
which they can detect temperature changes of less than 0.5 °C (about 1 °F). Cats, including wild cats
such as leopards, pumas, and other members of the cat family, are known for their good nighttime
vision, which would mean that their near-infrared response is far better than that of the human eye.
We will concentrate on the human eye, and it is very important to understand the “construction” of it.
This will perhaps be of general interest, but we will also see a lot of conceptual similarities between
the eye and the TV camera construction.
Cross section of the eye
This cross section shows that the eye has a lens that focuses the image onto the retina. The retina is
actually the “photosensitive area,” which is composed of millions of cells, called cones and rods. These
cells can be considered a part of our nervous system. The cones are sensitive to medium and bright
light levels, and they are the cells that sense colors. The rod cells are sensitive to lower light levels,
and they do not distinguish colors. We use rod cells to see at night, which is why we cannot
distinguish colors in the dark.
The number of cones in each eye is approximately 10 million, and the number of rods is over 100
million. The cones are concentrated around the area where the optical axis passes. This area is colored
with a yellowish pigment and is called the fovea. The fovea is the central area that our brain processes
and, although it is small, it contains approximately 50,000 cones. The average focal length of an eye
(i.e., the distance between the lens and the retina when an infinitely distant object is being viewed) is
approximately 17 mm. This focal length gives an undistorted image within an angle of approximately
30°. This is also the size of the area most populated with cone cells, which is why an angle of about
30° is considered a standard angle of vision.
The concentration of cones increases toward the optical axis, peaking within only about 10°. Each of
these central cone cells is connected to the brain via its own nerve fiber, through which electrical pulses
are sent to the brain. The eye, of course, sees a much wider angle, since the retina covers nearly a 90°
angle and there are cones outside the yellow area as well, but these other cones are connected to the
nerve fibers in groups. With this area we do not see as clearly as with the single-nerve cones, and that
is why it is known as the peripheral vision area.
The brain’s “image processing section” concentrates on 30°, although we see best at around 10°. This
processing is further supported by constant eye movement in all directions, which is equivalent to a
pan/tilt head assembly in CCTV.
Eye – Camera similarities
For a single lens reflex (SLR) camera the standard angle of view of 30° is achieved with a 50 mm lens,
for a 2/3'' camera this is a 16 mm lens, for a 1/2'' camera a 12 mm lens, and for a 1/3'' camera an 8 mm
lens. In other words, images of any type of camera, taken with their corresponding standard lenses,
will be of a very similar size and perspective as when seen through our eyes.
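The standard-lens figures above can be checked with the usual angle-of-view relation, angle = 2 · arctan(w / 2f), where w is the imager width and f the focal length. This sketch assumes the horizontal angle and nominal imager widths of 8.8 mm (2/3''), 6.4 mm (1/2''), and 4.8 mm (1/3''); these widths are our assumption rather than values stated in this section, and the function name is hypothetical.

```python
import math

# Assumed nominal horizontal imager widths in mm for common CCTV formats.
IMAGER_WIDTH_MM = {'2/3"': 8.8, '1/2"': 6.4, '1/3"': 4.8}


def horizontal_angle_of_view(width_mm: float, focal_length_mm: float) -> float:
    """Full horizontal angle of view in degrees for a lens focused at infinity."""
    return math.degrees(2 * math.atan(width_mm / (2 * focal_length_mm)))


if __name__ == "__main__":
    for fmt, focal in (('2/3"', 16), ('1/2"', 12), ('1/3"', 8)):
        angle = horizontal_angle_of_view(IMAGER_WIDTH_MM[fmt], focal)
        print(f"{fmt} camera, {focal} mm lens: {angle:.1f} degrees")
```

With these assumptions the three combinations come out at roughly 30° to 33°, close to the standard angle of vision.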
Lenses with a shorter focal length give a wider angle of view and are called wide-angle lenses. Lenses
with a longer focal length narrow the view, and therefore they look as if they are bringing distant objects
closer, hence the name telephoto (“tele” meaning distant). Another matter of interest associated with
CCTV is that by knowing the focal length of the eye and the maximum iris opening of approximately
6 mm, we can find the equivalent “F-number” (discussed later in the book) of the eye:
$$F_{eye} = \frac{17\ \text{mm}}{6\ \text{mm}} \approx 2.8$$
With such a fully opened iris we can still see quite well in full moonlight (this is approximately 0.1 lux
at the object). Have this number in mind when comparing the minimum illumination characteristics
of different cameras.
The focusing that the human eye does in order to see objects at various distances is achieved by
changing the thickness of the lens. This is done by the ciliary muscles. If the eye is normal, it should
be able to focus from infinity down to a minimum distance of about 20 cm in early childhood, to 25
cm at age 20, to 50 cm at age 40, and to 5 m at age 60. When we look at something very far away, that
is, eye focused on infinity, the ciliary muscles are relaxed and the lens is thin.
If the eye cannot focus at infinity, the vision defect is called nearsightedness, or myopia. Such eyes
require glasses to help the “defective” lens of the eye focus the image on the retina. These glasses are
sometimes called reducing glasses because they have a negative focal length (and hence a negative
diopter).
The diopter is the reciprocal of the focal length of a lens, where the focal length is expressed in meters.
Reducing glasses therefore have a negative diopter. So “reducing” glasses with a diopter of −0.5, for
example, have a negative focal length of 1/(−0.5) = −2 m.
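The diopter conversion is a simple reciprocal in both directions. A minimal sketch (the function names are hypothetical):

```python
def focal_length_m(diopters: float) -> float:
    """Focal length in meters of a lens with the given optical power in diopters."""
    if diopters == 0:
        raise ValueError("a 0-diopter lens has no finite focal length")
    return 1.0 / diopters


def diopters_from_focal_length(focal_length_meters: float) -> float:
    """Optical power in diopters of a lens with the given focal length in meters."""
    if focal_length_meters == 0:
        raise ValueError("focal length must be non-zero")
    return 1.0 / focal_length_meters


if __name__ == "__main__":
    # "Reducing" glasses of -0.5 diopters have a focal length of -2 m.
    print(focal_length_m(-0.5))
```

The sign carries through: negative values describe reducing glasses for myopia, positive values the magnifying glasses used for hypermetropia.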
Another defect an eye may have is when it cannot focus on an image that is very close, that is, the
eye’s lens cannot be thickened enough for some reason. This defect is called farsightedness, or
hypermetropia. People with hypermetropia need glasses to be able to see close objects sharply. These
glasses need to have the opposite characteristics from those in the previous case, that is, they have to
be magnifying glasses with a positive focal length and diopter.
Simulation of how the eye works
Correcting eye deficiencies with glasses
Two eyes produce images that, when combined in our brain, give a stereoscopic impression of the
volume of space. If we cover one eye, it is very hard to judge the “three-dimensionality” of the space in
front of us. The distance between the eyes (60–70 mm) ensures our perception of three dimensions up
to 10–15 m away. Beyond this distance it is very hard to judge which of two objects is closer. You can
test this by trying to judge which of two objects in the air, at different long distances, is closer. If we
are looking at, let us say, two distant trees, the brain draws its conclusion from the ground and the
perspective in front of us, not from the eye’s “stereoscopic mechanism.”
It is amazing when you think about the complexity of the eye and the brain’s power for “image
processing.” We perform these operations hundreds of times a day without even thinking about it, not
to mention the fact that the images that fall on the retina are upside down, owing to the nature of the
optical refraction, and we also do not consider the eye movement in all directions when we follow
something. All of these things are being deciphered and controlled by the brain.
The “eye–brain” configuration is far superior to any camera that the human mind has invented or will
ever invent. But, as technical people, we can say that by understanding how the eye “works” and using
ever-improving visual technology, both in hardware and software, we are getting better images and
more sophisticated information about the world around us, and we can view things the human eye
cannot see or monitor places where humans cannot be present.
Experiments and testing have shown that a human eye can resolve no more than 5 to 6 lp/mm (line
pairs per millimeter). This refers to an optimum distance between eye and object of around 0.3 m, as
when we are reading fine text. This equates to a minimum viewing angle of about one-sixtieth of a
degree (1/60°). So 1/60° is considered the limit of angular discrimination for normal vision. We can use
this minimum resolvable angle to better understand and optimize the psychophysiology of viewing.
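The link between 5 to 6 lp/mm at 0.3 m and the 1/60° figure can be checked numerically. This sketch assumes that the 1/60° limit refers to a single line, that is, half a line pair; the function name is hypothetical.

```python
import math


def line_angle_degrees(lp_per_mm: float, viewing_distance_mm: float) -> float:
    """Angle subtended at the eye by one line (half a line pair), in degrees."""
    line_width_mm = 1.0 / (2.0 * lp_per_mm)
    return math.degrees(math.atan(line_width_mm / viewing_distance_mm))


if __name__ == "__main__":
    for lp in (5, 6):
        angle = line_angle_degrees(lp, 300.0)
        print(f"{lp} lp/mm at 0.3 m: {angle:.4f} deg, about 1/{1 / angle:.0f} of a degree")
```

Under that assumption the two values bracket 1/60°, at roughly 1/52° and 1/63° respectively.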
The Monitors chapter later in the book (Chapter 6) recommends a CCTV viewing distance of around
seven times the monitor height. So, we should understand that the viewing distance is an important
factor in the experience of seeing fine details in an image. A viewer gains nothing by moving closer to
the monitor than this, but the picture is not going to get any better if the viewer sits further away either.
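The 1/60° limit and the seven-times-height rule can be combined in a rough estimate of how many lines a viewer can separate over the monitor height. This is only an illustrative calculation under the assumptions stated in the comments (a hypothetical helper), not a formula from the Monitors chapter.

```python
import math

MIN_ANGLE_DEG = 1.0 / 60.0  # limit of angular discrimination for normal vision


def resolvable_lines(distance_in_heights: float) -> float:
    """Approximate number of lines, each subtending 1/60 degree, that fit into the
    vertical angle of a monitor viewed from distance_in_heights x its height."""
    vertical_angle_deg = math.degrees(2 * math.atan(0.5 / distance_in_heights))
    return vertical_angle_deg / MIN_ANGLE_DEG


if __name__ == "__main__":
    for k in (4, 7, 10):
        print(f"viewed from {k} x height: about {resolvable_lines(k):.0f} lines")
```

At seven times the height the estimate is roughly 490 lines, of the same order as the number of active lines in a standard-definition television picture.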
Human eye's resolving power
Light units
Light is a physical phenomenon but is interpreted by psychological processes in our brain. It is,
therefore, a bit more complex to measure than other physical processes. Some prerequisites have to
be established in order to make objective measurements. One of these is the bandwidth of the light
frequencies considered, and this is usually from 400 nm to 700 nm. All of the frequencies contribute
to the light energy radiated by the source.
Let us, first of all, make clear the kinds of light sources we have. The basic division is into two major
groups:
• Primary sources (the sun, street lights, tungsten lights, monitor CRTs)
• Secondary sources (all objects that do not generate light but reflect it)
We do not apply the same type of measurement when measuring the amount of light radiated by a
tungsten globe, for example, and the light reflected by an object. It is not the same if we are analyzing
light radiated from a source in all directions, or just in a narrow solid angle. These are some of the
reasons we have so many different units of light measurement.
The science that examines all these different aspects is called photometry, and the units defined are
called photometric units.
Many different units have been defined by various scientists, depending upon the point of view taken.
Because of this, CCTV camera specifications are even harder to understand and describe precisely.
But let us try to shed some light on these units and explain what they mean. We will start in a logical
order, that is, the source of the light, traveling through space, falling onto an object, and finally as it
is reflected from it.
Luminous intensity (I) is the illuminating power of a primary light source, radiated in all directions. The
unit that measures this kind of light is the candela [cd]. One candela is approximately the amount of
light energy generated by an ordinary candle. Since 1948 there has been a more precise definition of
a candela, based on the luminous intensity of a black body heated to the temperature at which platinum
solidifies from a liquid state. (Since 1979 the candela has been defined as the luminous intensity of
monochromatic radiation with a frequency of 540 × 10¹² Hz and a radiant intensity of 1/683 watt per
steradian.)
Luminous flux (F) is the luminous intensity but within a certain solid angle. The unit for luminous flux
is, therefore, obtained by multiplying the luminous intensity by the solid angle in steradians (a full
sphere has 4π ≈ 12.57 steradians), and it is measured in lumens [lm]. One lumen is produced by a
luminous intensity of 1 cd within a solid angle of one steradian.
Because the sensation of brightness depends on the human eye sensitivity, the luminous flux depends
on the wavelength as well. For example, 1 watt of light power with 555 nm color (green) produces
approximately 680 lm, whereas all other wavelengths, with the same light power, produce proportionally
fewer lumens (see the eye spectral sensitivity curve). This is why light is not expressed in watts for
vision purposes, even though, theoretically, light energy like any other energy can be expressed in watts.
Illumination (E) is the most commonly used term in CCTV, especially when referring to the camera’s
minimum illumination characteristics. Illumination is closely related to luminous flux, except that we
are now referring to objects that are secondary sources of light.
Therefore, the illumination of a surface is the amount of luminous flux on a unit area. When a luminous
flux of 1 lumen falls on an area of 1 m² (square meter), the illumination is 1 lumen per square meter,
also called a meter-candle, but better known as the lux [lx].
This means that if a light source with a luminous intensity of 1 candela is placed at the center of a
sphere of 1 meter radius, it will produce an illumination of 1 lx on the internal surface of the sphere.
Light units and their meaning
Mathematically, this relation can be described as:
$$E = \frac{F}{A} \qquad (1)$$
The flux F is, by definition, equal to the luminous intensity times the solid angle ω, i.e.,
$$F = I \cdot \omega \qquad (2)$$
From basic solid geometry, and assuming a point source of light, we can express ω through the area
A being lit and its distance d from the source:
$$\omega = \frac{A}{d^2} \qquad (3)$$
When (2) and (3) are substituted into (1), we get
$$E = \frac{I}{d^2} \qquad (4)$$
which means that the illumination falls off with the square of the distance when the lit area is
perpendicular to the light. If, however, this area is at a certain angle to the incoming light, we can
approximate the real area with its projection at an angle θ, as per the diagram shown here. In that case
formula (4) becomes:
$$E = \frac{I \cdot \cos\theta}{d^2}$$
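The inverse-square and cosine relations above translate directly into code. A minimal sketch for an ideal point source (the function names are hypothetical); it also checks the sphere example, where 1 cd at the center of a 1 m sphere gives 1 lx.

```python
import math


def illumination_lux(intensity_cd: float, distance_m: float, angle_deg: float = 0.0) -> float:
    """E = I * cos(theta) / d**2 for an ideal point source.

    angle_deg is the angle between the incoming light and the surface normal.
    """
    return intensity_cd * math.cos(math.radians(angle_deg)) / distance_m ** 2


def total_flux_lm(intensity_cd: float) -> float:
    """Flux of a source radiating intensity_cd uniformly over a full sphere (4*pi sr)."""
    return intensity_cd * 4 * math.pi


if __name__ == "__main__":
    # Sphere example: total flux spread over the inner surface area 4*pi*r**2, r = 1 m.
    print(total_flux_lm(1.0) / (4 * math.pi * 1.0 ** 2))   # E = F / A
    print(illumination_lux(1.0, 1.0))                        # E = I / d**2
    print(illumination_lux(1.0, 2.0))                        # twice the distance: a quarter
    print(illumination_lux(1.0, 1.0, 60.0))                  # 60 degrees off the normal: about half
```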
Typical levels of illumination are shown on the following drawing:
Some typical levels of illumination
Very rarely, in certain small areas and from very strong light sources, levels higher than 100,000 lx
can be experienced (in the vicinity of a strong flashlight, for example). To describe such illuminations,
higher units called phots are sometimes used. One phot is equal to 10,000 lx.
In American terminology, where square feet are still widely used instead of SI units, illumination is
expressed in lumens per square foot, better known as foot-candles. Because the square meter to square
foot ratio is nearly 10 (more precisely 10.76), it is reasonably easy to convert luxes into foot-candles
and vice versa. Basically, if an illumination is given in foot-candles, just multiply it by 10 to obtain the
approximate value in luxes, and if a value is given in luxes, divide it by 10 to convert it to foot-candles.
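For more precise conversions than the factor-of-10 rule, the exact ratio follows from 1 ft = 0.3048 m. A small sketch with hypothetical helper names:

```python
SQUARE_FOOT_IN_M2 = 0.3048 ** 2               # 1 ft = 0.3048 m exactly
LUX_PER_FOOTCANDLE = 1.0 / SQUARE_FOOT_IN_M2  # about 10.76 lx per foot-candle


def footcandles_to_lux(fc: float) -> float:
    """Convert an illumination in foot-candles (lm/ft^2) to lux (lm/m^2)."""
    return fc * LUX_PER_FOOTCANDLE


def lux_to_footcandles(lux: float) -> float:
    """Convert an illumination in lux (lm/m^2) to foot-candles (lm/ft^2)."""
    return lux / LUX_PER_FOOTCANDLE


if __name__ == "__main__":
    print(f"1 fc = {footcandles_to_lux(1):.2f} lx")
    print(f"320 lx = {lux_to_footcandles(320):.1f} fc")
```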
Luminance (L) describes the brightness of the surface of either a primary or a secondary source of
light. Since brightness has a subjective connotation, luminance is used as the objective, scientific
term. Luminance depends both on the luminous intensity of the surface itself and on the angle at which
it is being observed. It is therefore measured per unit of projected surface area perpendicular to the
viewing direction. There are quite a few units for luminance. The internationally preferred metric unit
is the nit: one nit is equal to one candela per square meter of projected surface area (I/A). If, instead
of candelas, lumens are used to describe the luminous flux of a source, the luminance will then be
expressed in apostilbs [asb]. Things get a bit more complicated when we have a surface where the
luminous flux radiated, or reflected, in a direction θ to the normal is directly proportional to cos θ. Such
a surface will appear equally bright when seen from all directions because both the reflected light and
the projected surface area follow the same cosine law. This type of surface is called a Lambertian
radiator or reflector (depending on whether the surface is a primary or secondary source of light) and is
usually described as a perfectly diffusing surface. For the purpose of measuring the luminance of such
surfaces in the metric system, a unit called the lambert was introduced. The equivalent American unit
is the foot-lambert.
How much of the illumination is seen by the camera depends not only on the intensity of the source
itself, but also on the reflectivity of the object being illuminated. Obviously, it is not the same if the
object is white as opposed to black. With the same amount of light we can, naturally, see more if the
objects are white. This is why we have to introduce another factor when talking about illumination,
and this is the percentage of object reflectivity. Reflectivity can be described with the following simple
relation:
ρ = light reflected from surface / light incident on surface = L/E
Realistically, this percentage ranges from a very low 1% for black velvet to 32% for a typical soil
surface and up to 93% for bright snow in the field of view. Caucasian human flesh has a reflectivity
factor between 19% and 35%. The CCTV Labs test chart enclosed on the back cover of this book has
an approximate reflectivity factor of 60~70%.
The reflectivity is an important factor when stating a camera’s minimum illumination because with the
same level of illumination and various reflectivity factors, an object may appear more or less bright,
indirectly affecting the camera performance.
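To see how much reflectivity matters, the sketch below applies the definition of reflectivity (reflected light divided by incident light) to the values quoted above, under the same illumination. It assumes ideal diffuse surfaces and expresses the reflected light in the same units as the incident illumination; the 10 lx scene, the 0.65 midpoint for the test chart, and the names are hypothetical.

```python
# Reflectivity factors quoted in the text, as fractions rather than percentages.
REFLECTIVITY = {
    "black velvet": 0.01,
    "typical soil": 0.32,
    "test chart (approx.)": 0.65,  # midpoint of the quoted 60~70%
    "bright snow": 0.93,
}


def reflected_light(illumination: float, reflectivity: float) -> float:
    """Light returned by an ideal diffuse surface: reflectivity x incident illumination."""
    return reflectivity * illumination


if __name__ == "__main__":
    illumination_lx = 10.0  # hypothetical scene illumination
    for surface, rho in REFLECTIVITY.items():
        print(f"{surface:22s} {reflected_light(illumination_lx, rho):5.2f}")
```

Under the same illumination, snow returns 93 times more light than black velvet, which is why a minimum illumination figure means little without the reflectivity it assumes.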
Measuring object illumination in CCTV
Very often you may have to measure and quantify the object illumination in your CCTV system. You
can use the lux meters available on the market for such measurements. When using such a meter, you
should check its measuring range. Typical lux meters have a precision down to 1 lux (which might be
sufficient for the majority of situations). Low light illuminations below 1 lux (typical for night viewing)
cannot be quantified unless you have a high-quality and expensive photographic light meter. There are
a number of known brands on the market, such as Sekonic, Minolta, and Gossen. Some of them will
even give you readings directly in luxes.
If you do not have a lux meter, the light meter of a typical single lens reflex (SLR) camera can be used
to measure the same, although such measurements will not be shown in luxes, but rather as film
exposure and F-stop settings. This can be an extremely useful tool, and here I will explain the principles
and formula for calculating such lux measurements.
Please note that most SLR cameras have a light meter, while non-SLR cameras may not necessarily
have one. So if you cannot find any indicators for exposure and aperture on your camera, you may not
be able to make use of this. Please also note that a logical prerequisite for a more accurate
measurement is to have the SLR camera set to the same field of view as the potential (or existing)
CCTV camera. For this reason the best lens to have on the SLR camera is a zoom lens, so that you can
adjust the angle of view as close to that of the CCTV camera as possible.
Photo courtesy of Pentax
A typical light measurement display in a modern SLR camera
The EV graph to lux conversion using Exposure and F-stop measurements
First, let us refresh our memory about some basic rules of photographic film exposure.
The exposure indicators on all photographic cameras are in seconds, or to be more precise, in fractions
of a second. This means that when a camera light indicator shows 125, it actually indicates 1/125 of a
second. If the exposure is longer than 1 second, it is usually denoted with an “s” after the number; for
example, “2 s” indicates 2 seconds. Standard exposure numbers are 1; 2; 4; 8; 15; 30; 60; 125; 250;
500; and 1000. These are all fractions of a second. There are cameras that can set the exposure for
longer than 1 second and shorter than 1/1000. As you may notice, each number is approximately double
the previous one, so each exposure time is approximately half the previous one.
The indicators for the lens iris opening, or aperture, are shown as “F-stop” values. So the number “5.6”
indicates F-5.6. The higher this number is, the smaller the lens iris opening is. Typical F-stop numbers
are 1.0; 1.4; 2; 2.8; 4; 5.6; 8; 11; 16; 22; 32; and 44. At each F-stop the opening is half the area it was
at the previous F-stop, that is, it transmits half the amount of light of the previous F-stop.
For a correct exposure of the film, your camera uses an internal light meter, which sets the correct
exposure time and aperture. In “Program” mode, both values are chosen automatically by the camera.
In “Aperture-priority” mode, you set the F-stop and the camera computer selects the exposure. In
“Exposure-priority” mode, it is the other way around; you select the exposure and the camera selects
the aperture (i.e., F-stop).
The relationship between reference numbers, the exposure, and the F-stop
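The F-stop series is not arbitrary: each full stop multiplies the F-number by √2, and because the light passing through the lens is proportional to the aperture area, that is, to 1/F², each stop halves the light. The sketch below (hypothetical function names) shows that the marked values are rounded versions of these powers of √2.

```python
import math


def f_stop(n: int) -> float:
    """n-th full F-stop starting from F-1.0: (sqrt 2) ** n."""
    return math.sqrt(2) ** n


def relative_light(f_number: float) -> float:
    """Light through the lens relative to F-1.0 (proportional to 1 / F**2)."""
    return 1.0 / f_number ** 2


if __name__ == "__main__":
    for n in range(12):
        f = f_stop(n)
        print(f"F-{f:4.1f}  light = {relative_light(f):.6f} of F-1.0")
```

The exact values 5.66, 11.3, 22.6, and 45.3 appear on lenses rounded to 5.6, 11, 22, and 44.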
Combinations of the exposure and F-stop can be such that they allow equal amounts of light to get
onto the film. For example, if you or the camera selects 1/30 s and F-5.6, you will produce the same
effect on the film with 1/60 s and F-4. Of course, with the latter F-stop you would have a slightly
shallower depth of field, but other than that the film will be correctly exposed too. Because of this
“equality” of the amount of light with different exposure/F-stop combinations, photographic experts have
devised an Exposure Value (EV) rating for the amount of light that can be measured by camera light
meters. We will not go into detail about exactly how the light is measured inside the camera, as this
would require a full book to cover all models and is beyond the scope of this book. In general, there are
“Averaging,” “Spot,” and “Multi-pattern” light meters, and most cameras have at least “Averaging” light
metering. This is close enough for our CCTV applications, where illumination levels can only be
determined approximately.
When you buy your photo camera, you will usually find an EV graph somewhere inside the camera’s
manual, indicating its light measurement capability. This graph should look similar to the one shown
in the picture on the previous page. Most of the time the EV graphs refer to a film (or CCD chip if it is
a digital photo camera) sensitivity of 100 ISO, which is a pretty standard film. For this reason, in our
calculations in this text, we have assumed a film setting on your camera of 100 ISO. Of course, any
other film sensitivity can be used; you just need to adjust the findings accordingly.
The EV graph is very simple to read. For example, a combination of 1/30 s and F-5.6 makes an EV
value of 10. The same exposure can be achieved with 1/60 s and F-4 since they also have a combined
exposure value of 10.
The EV scale is put together by summing up the Reference Numbers of the exposure and the F-stop
(RNt and RNf). The table is shown above and indicates that both of these, the exposure and the F-stop,
have the value “0” for an exposure of 1 second and an aperture of F-1.0. Then, the reference number
goes to 1 for the next smaller value, being ½ s for the exposure and F-1.4 for the aperture. The table
continues like that, that is, reference number 2 is given to ¼ s and F-2, and so on.
EV values are obtained by summing up these reference numbers. For example, exposure of 1/30 s
and F-2.8 have an equivalent EV of 8 because the reference number for 1/30 s is 5 and for F-2.8 is 3.
Here are simple formulas that I discovered (as published in issue 9 of the CCTV focus magazine)
which will give you a very good approximation of the RN numbers with the simple use of a scientific
calculator:
$$RN_f = 6.7 \log(\text{F-stop})$$
where F-stop is the number of the F-stop indicated by the camera light meter, that is, 5.6; 8; 11; and
so on.
$$RN_t = -3.32 \log t$$
where t is the absolute exposure time; that is, if the camera shows 1/125, this is what you put in for t.
If preferred, you could use just the number 125 (we will call it T) instead of the absolute time t, in which
case the minus sign in front of the logarithm disappears, that is, the second formula becomes:
$$RN_T = 3.32 \log T$$
Please note: the logarithms are base 10.
The EV is calculated by adding these two values:
$$EV = RN_f + RN_t = 6.7 \log(\text{F-stop}) - 3.32 \log t$$
or, if T is used instead of t:
$$EV = RN_f + RN_T$$
Let us work out one example.
If my camera, loaded with 100 ISO film, shows a 1/250 exposure setting and F-8, the EV can be
calculated as:
$$EV = RN_f + RN_t = 6.7 \log 8 - 3.32 \log(1/250) = 6.7 \times 0.9 + (-3.32) \times (-2.398) \approx 6 + 8 = 14$$
(the result is rounded).
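The reference-number formulas can be wrapped in a small function to check the examples in this section. The coefficients are the author's approximations: 3.32 ≈ 1/log 2 and 6.7 ≈ 2/log 2 (about 6.64). For comparison, the sketch also computes the standard photographic definition EV = log₂(F²/t); both function names are hypothetical.

```python
import math


def ev_approx(f_stop: float, exposure_s: float) -> float:
    """EV = RNf + RNt, with RNf = 6.7 log(F-stop) and RNt = -3.32 log(t), base-10 logs."""
    rn_f = 6.7 * math.log10(f_stop)
    rn_t = -3.32 * math.log10(exposure_s)
    return rn_f + rn_t


def ev_exact(f_stop: float, exposure_s: float) -> float:
    """Standard definition of exposure value: log2(F**2 / t)."""
    return math.log2(f_stop ** 2 / exposure_s)


if __name__ == "__main__":
    examples = [(8, 1 / 250), (5.6, 1 / 30), (4, 1 / 60), (2.8, 1 / 30), (2.8, 1 / 15)]
    for f, t in examples:
        print(f"F-{f}, 1/{round(1 / t)} s: approx {ev_approx(f, t):.2f}, exact {ev_exact(f, t):.2f}")
```

Rounded to whole numbers, both versions give 14, 10, 10, 8, and 7, matching the values quoted in the text.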
There is a simple connection between EV values and the scene illumination, described by the following
equation:
$$I_{lux} = 2.5 \times 2^{(RN_f + RN_t)} = 2.5 \times 2^{EV}$$
The right-hand side of the above equation contains 2 raised to the power of the EV number, and I_lux is
obtained in luxes. For example, if the EV value measured by the camera is 15, the approximate
illumination of the scene is:
$$I_{lux} = 2.5 \times 2^{15} = 81{,}920 \text{ lux}$$
Of course, such precision is impossible when measuring light, since many factors influence light
measurement, including the reflectivity of the surrounding objects, the primary sources of light in the
field of view (a light pole in the field of view will affect the average illumination dramatically), and so on.
We would usually approximate the above result with 82,000 luxes.
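The EV-to-lux relation and its inverse are one-liners. A sketch with hypothetical names, valid only for readings taken at the 100 ISO setting as described above:

```python
import math


def ev_to_lux(ev: float) -> float:
    """Approximate scene illumination in lux from an EV reading at 100 ISO: 2.5 * 2**EV."""
    return 2.5 * 2 ** ev


def lux_to_ev(lux: float) -> float:
    """Inverse of ev_to_lux: the EV reading expected at 100 ISO for a given illumination."""
    return math.log2(lux / 2.5)


if __name__ == "__main__":
    for ev in (1, 3, 7, 15):
        print(f"EV {ev:2d} -> about {ev_to_lux(ev):,.0f} lux")
```

These reproduce the figures used in this section: about 5 lux at EV 1, 20 lux at EV 3, 320 lux at EV 7, and 81,920 lux at EV 15.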
Please note that the “dynamic range” of the light meter EV measurement may vary from camera to
camera; better cameras have a wider range. Also, do not forget to set 100 ISO when using these
measurement instructions. If 200 ISO film is used, everything will be shifted by 1 EV, as 200 ISO film
is twice as sensitive as 100 ISO film; 400 ISO film is four times as sensitive, and the EV values will be
shifted by two. For example, if a measurement with 200 ISO film gives 16 EV, this is equivalent to a
15 EV light reading with 100 ISO film.
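Normalizing a reading taken at another ISO setting back to 100 ISO amounts to subtracting log₂(ISO/100). A minimal sketch (hypothetical function name):

```python
import math


def ev_at_iso100(ev_measured: float, iso: float) -> float:
    """Convert an EV reading taken at `iso` to its 100 ISO equivalent.

    Each doubling of film sensitivity raises the reading by 1 EV,
    so log2(iso / 100) EV is subtracted.
    """
    return ev_measured - math.log2(iso / 100)


if __name__ == "__main__":
    print(ev_at_iso100(16, 200))   # the 200 ISO example in the text
    print(ev_at_iso100(17, 400))
```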
Finally, let us work out a practical example.
If my light measurement shows an exposure of 1/15 s and F-2.8 (at the 100 ISO film setting), this would
give
$$EV_{(F\text{-}2.8,\ 1/15)} = 6.7 \log 2.8 - 3.32 \log(1/15) \approx 3 + 4 = 7$$
$$I_{lux} = 2.5 \times 2^{7} = 320 \text{ lux}$$
To convert this value into foot-candles, you need to divide it by 10, which gives around 32 foot-candles.
It should be common knowledge that a bright sunny day will give illumination of around 100,000 lux,
a typical office environment would have anything between 100 and 1000 lux, a full moon night should
produce around 0.1 lux, and so on.
A bright sunny day will give an EV reading of around 15 or 16, while for comfortable CCTV monitoring at night, street lights should produce an EV value of around 3, which converts to around 20 lux.
Be aware of your light meter’s EV range. Many cameras have an EV range between 1 and 20 EV, which means that the lowest illumination you can measure with such a camera is around 5 lux. This should be sufficient for the majority of CCTV projects, but if you want to measure even lower illumination, I suggest you take a look at professional light meters.
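The EV arithmetic above can be reproduced with a small Python sketch. It assumes that the text’s coefficients are rounded forms of base-2 logarithms (6.7 log F ≈ 2 log₂F and 3.32 log t ≈ log₂t), so it computes EV = 2 log₂F − log₂t directly. The calibration constant 2.5 and the one-EV-per-ISO-doubling shift are taken from the text; the function names are hypothetical.

```python
import math

# Assumption: 6.7*log10(F) - 3.32*log10(t) is a rounded form of
# 2*log2(F) - log2(t), because 1/log10(2) is about 3.32.
LUX_PER_FOOT_CANDLE = 10.764  # 1 foot-candle = 10.764 lux


def exposure_value(f_number: float, shutter_s: float) -> float:
    """EV for a given F-number and shutter time in seconds."""
    return 2 * math.log2(f_number) - math.log2(shutter_s)


def ev_at_iso100(ev_reading: float, iso: float) -> float:
    """Shift a reading taken at another film speed back to the 100 ISO scale.

    Each doubling of the ISO value shifts the reading by 1 EV.
    """
    return ev_reading - math.log2(iso / 100)


def lux_from_ev(ev: float) -> float:
    """Approximate scene illumination: I_lux = 2.5 * 2**EV (100 ISO)."""
    return 2.5 * 2 ** ev


if __name__ == "__main__":
    ev = round(exposure_value(2.8, 1 / 15))  # the practical example in the text
    lux = lux_from_ev(ev)
    print(f"EV {ev}: about {lux:.0f} lux, {lux / LUX_PER_FOOT_CANDLE:.1f} fc")
```

For the F-2.8, 1/15 s example this prints EV 7, 320 lux and about 29.7 fc; the text’s “divide by 10” rule gives the slightly rounder 32 fc.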
Light onto an imaging device
In order to fully understand the “light issue,” as seen by the camera, we need to know how much light
actually falls on the imaging area.
The amount of illumination at the CCD (or CMOS) chip, $E_{CCD}$, depends mostly on the luminance L of the object, but also on the F-stop of the lens, that is, its light-gathering ability. The lower the F-number (the bigger the iris opening), the more light gets through the lens, as will be explained later in the book. The illumination is also proportional to the transmittance factor τ of the lens: depending on the quality of the glass and its manufacture, as well as the inner walls of the lens mechanics, a certain percentage of the light is lost in the lens itself.
All of the above factors can be combined into the following relation:
$$E_{CCD} = \frac{\pi \cdot \tau \cdot L}{4 \cdot F^2} \tag{11}$$
L = average luminance of the object (lux)
τ = transmittance of the lens (expressed as a fraction, e.g., 0.8)
F = the actual F-stop of the lens used
π ≈ 3.14
In the next few lines we will show how this relation is obtained and approximated, so that technical people using these formulas have a clear understanding of what is assumed in order to arrive at formula (11). However, because these calculations involve slightly more complex mathematics, I suggest that readers with no interest in, or background for, this derivation simply use relation (11) as it is, knowing the values of L, τ, F, and π.
An object viewed by a camera, when lit by a light source, radiates light more or less in all directions, depending upon its reflectivity function. In practice, the majority of matte (non-glossy) objects can be approximated with a Lambertian, perfectly diffusing surface.
Lambertian diffusing surface
The flux, then, can be regarded as passing through a hemisphere of radius r and center ds. If we now consider the incremental angle dθ at an angle θ to the normal, the flux occupying the volume of revolution swept out by the angle dθ passes through an annular ring on the surface of the sphere, with width r dθ and circumference 2πr sinθ.
This elementary surface area is given by:
$$dA = 2\pi r^2 \sin\theta \, d\theta$$
and hence the solid angle ω that it subtends at the center of the sphere is given by:
$$\omega = \frac{dA}{r^2} = \frac{2\pi r^2 \sin\theta \, d\theta}{r^2} = 2\pi \sin\theta \, d\theta$$
Since for a Lambert surface the luminous intensity (flux per steradian) in a given direction falls as the cosine of the angle to the normal, if the luminous intensity of the whole surface in the direction of the normal is I, then at an angle θ it is I cosθ.
The luminous intensity dI of a small area ds is given by:
$$dI = \frac{I \cos\theta \, ds}{s} \quad [\text{lumens/steradian} = \text{candelas}]$$
Since I/s is the actual luminance L in the perpendicular direction, the above relation becomes:
$$dI = L \cos\theta \, ds$$
The elementary flux dF is equal to the elementary intensity dI times the solid angle:
$$dF = L \cos\theta \, ds \cdot 2\pi \sin\theta \, d\theta \quad [\text{lm}]$$
The total light emitted into a cone of half-angle θ can be found by integrating from 0 to θ:
$$F = \int_0^{\theta} 2\pi L \, ds \, \sin\varphi \cos\varphi \, d\varphi = \pi L \, ds \, \sin^2\theta$$
If we want the total flux radiated in all directions, we set θ = 90°, so the total flux emitted in all directions is:
$$F_t = \pi L \, ds$$
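The integration step can be checked numerically. This sketch, with hypothetical function names and arbitrary unit values for L and ds, sums dF = 2πL ds sinφ cosφ dφ with a simple midpoint rule and compares the result with the closed form π L ds sin²θ.

```python
import math


def cone_flux_numeric(L: float, ds: float, theta: float, steps: int = 10_000) -> float:
    """Midpoint-rule integral of 2*pi*L*ds*sin(phi)*cos(phi) dphi from 0 to theta."""
    h = theta / steps
    total = 0.0
    for i in range(steps):
        phi = (i + 0.5) * h  # midpoint of the i-th slice
        total += 2 * math.pi * L * ds * math.sin(phi) * math.cos(phi) * h
    return total


def cone_flux_closed_form(L: float, ds: float, theta: float) -> float:
    """Closed form from the derivation: F = pi * L * ds * sin^2(theta)."""
    return math.pi * L * ds * math.sin(theta) ** 2


if __name__ == "__main__":
    for degrees in (10, 45, 90):
        t = math.radians(degrees)
        print(degrees, cone_flux_numeric(1.0, 1.0, t), cone_flux_closed_form(1.0, 1.0, t))
```

At 90° both values approach π, which is the total flux π L ds for unit L and ds.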
Now, if we have to calculate the flux emitted into a cone with a half-angle θo smaller than 90°, as is the case when a camera views an object, the total flux Fo is given by the formula:
$$F_o = \pi L \, ds_o \sin^2\theta_o$$
If the lens transmission factor is τ, then the flux falling on the CCD or CMOS chip plane is:
$$F_{CCD} = F_o \tau = \pi \tau L \, ds_o \sin^2\theta_o$$
The illumination of the imaging chip is the flux divided by the imaging chip area $ds_{CCD}$, that is,
$$E_{CCD} = \pi \tau L \sin^2\theta_o \frac{ds_o}{ds_{CCD}} \tag{17}$$
The ratio $ds_{CCD}/ds_o$, which appears inverted in the preceding formula, is also known as the magnification ratio m of a lens. It can be approximated by the square of the ratio between the focal length f of the lens and the distance D to the object:
$$m = \left(\frac{f}{D}\right)^2 = \frac{ds_{CCD}}{ds_o} \tag{18}$$
When we substitute (18) into (17), it becomes:
$$E_{CCD} = \pi \tau L \sin^2\theta_o \left(\frac{D}{f}\right)^2 \tag{19}$$
We need to introduce here another lens ratio, f/d, where d is the diameter of the lens aperture (the iris opening). This ratio is known as the lens F-stop and will be explained in more detail in Chapter 3. For objects at a reasonably long distance from the camera (again, typical in CCTV), the following holds:
$$\tan\theta_o = \frac{d}{2D} = \frac{\sin\theta_o}{\cos\theta_o} \approx \sin\theta_o$$
This approximation can be made because, for very long distances to objects, the angle θo is very small and its cosine is very close to 1.
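To get a feel for how good this approximation is, the sketch below compares sin²θo with (d/2D)² for a hypothetical 25 mm aperture at several distances. Since sin²(arctan x) = x²/(1 + x²), the relative error works out to exactly (d/2D)², so it is negligible at typical CCTV distances. The function names and aperture size are illustrative assumptions.

```python
import math


def half_angle(d: float, D: float) -> float:
    """Half-angle theta_o subtended at the object by an aperture of diameter d at distance D."""
    return math.atan(d / (2 * D))


def small_angle_error(d: float, D: float) -> float:
    """Relative error made by replacing sin^2(theta_o) with (d / 2D)^2."""
    exact = math.sin(half_angle(d, D)) ** 2
    approx = (d / (2 * D)) ** 2
    return (approx - exact) / exact


if __name__ == "__main__":
    # Hypothetical 25 mm aperture, objects at 1 m, 10 m, and 50 m.
    for distance in (1.0, 10.0, 50.0):
        print(f"D = {distance:>4} m: relative error {small_angle_error(0.025, distance):.2e}")
```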
So we can substitute sin²θo with (d/2D)², and equation (19) becomes:
$$E_{CCD} = \pi \tau L \left(\frac{d}{2D}\right)^2 \left(\frac{D}{f}\right)^2$$
Calculating the light radiation with Lambertian diffused light source
If we sort this out, we have:
$$E_{CCD} = \pi \tau L \left(\frac{d^2}{4D^2}\right)\left(\frac{D^2}{f^2}\right) = \pi \tau L \frac{d^2}{4f^2}$$
Since the F-stop is F = f/d, this finally becomes the simplified formula for calculating the amount of light falling onto an imaging chip:
$$E_{CCD} = \frac{\pi \tau L}{4F^2} \tag{27}$$
This is a very useful formula because, for a given lens transmittance, it uses only two variables (the luminance of the object and the lens F-stop) to calculate the approximate illumination falling onto an imaging chip. The approximation we made should not be forgotten, however: the formula should be used only for rough calculations and in cases that match the conditions of the approximation, that is, a camera looking at an object with diffused light, similar to a Lambertian source (most real-life objects are like that, except mirrors and similar surfaces), at a distance that is long relative to the focal length of the lens. Usually, the lens transmittance factor τ ranges between 0.75 and 0.95. If you do not have the exact number from your lens manufacturer, a realistic transmittance factor of 0.8 can be used for calculation purposes.
Calculating the amount of light falling onto a CCD chip
Let us work out an example. If the illumination at the object plane is around 300 lx, as in an average office area (this would be $E_{object}$), the luminance can be found using the reflection coefficient ρ of the surrounding objects, that is, L = ρ $E_{object}$. As mentioned earlier, reflection factors vary substantially between objects, but we will not be far from a real office situation if we assume 50%. If the lens we are using has an iris setting of, say, F-16, the illumination at the CCD plane will be approximately $E_{CCD}$ = 0.8 · 3.14 · 300 · 0.5 / (4 · 256) ≈ 0.37 lx. This, combined with the camera’s automatic gain control (AGC), is a realistic illumination at a CCD chip plane for a full video signal. If, however, the lens iris is set to F-1.4, for example, the illumination of the CCD plane will be approximately 48 lx (using relation (11)). This is a far higher value than the CCD chip needs, and in practice it can only produce a usable video signal if an auto iris lens is used, or if the camera has a built-in electronic (or CCD) iris. If a manual iris lens is used at F-1.4 and the camera’s AGC is switched off, 48 lx at the chip will produce a saturated, or washed-out, white image.
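The office example can be recomputed with a short sketch of relation (11). It follows the text’s convention L = ρ·E_object and its default τ = 0.8; the function name is hypothetical.

```python
import math


def chip_illumination(e_object_lux: float, reflectance: float, f_number: float,
                      transmittance: float = 0.8) -> float:
    """Approximate illumination at the chip, E_CCD = pi * tau * L / (4 * F^2).

    Follows the text's convention that the object luminance is L = rho * E_object.
    """
    luminance = reflectance * e_object_lux
    return math.pi * transmittance * luminance / (4 * f_number ** 2)


if __name__ == "__main__":
    for f_stop in (16, 1.4):
        print(f"F-{f_stop}: about {chip_illumination(300, 0.5, f_stop):.2f} lx at the chip")
```

Because the F-number appears squared in the denominator, opening the iris from F-16 to F-1.4 raises the chip illumination by (16/1.4)², roughly 130 times.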
A very basic rule of thumb is that even a lens with the lowest F-number attenuates the light by a factor of 10 or more. The higher the F-number, the lower the amount of light that reaches the CCD plane; in fact, it is inversely proportional to the square of the F-number.
With the above conclusions we are actually tapping into a very interesting question raised with CCD cameras (especially B/W cameras, i.e., cameras without infrared cut filters): if the object illumination is that of a full sunny day (approximately 100,000 lx), the F-number has to be very high in order to “stop down” the light to the level required by the CCD chip, which is in the vicinity of 0.1 ~ 0.3 lx for a full video signal. Such an F-number is so high that the illumination has to be reduced in the order of 500,000 times between the object and the chip (from 100,000 lx down to 0.2 lx). Using the approximated formula (27), assuming the same values τ = 0.8 and ρ = 0.5, and assuming the camera CCD chip requires 0.2 lx for a 1 Vpp signal, we get an F-number of approximately 400.
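Rearranging formula (27) for F gives the F-number needed to bring a given scene down to the chip’s required illumination. This sketch uses the same assumptions as the paragraph above (L = ρ·E_object, τ = 0.8, ρ = 0.5, 0.2 lx at the chip); the function name is hypothetical.

```python
import math


def required_f_number(e_object_lux: float, reflectance: float, e_chip_lux: float,
                      transmittance: float = 0.8) -> float:
    """Solve E_chip = pi * tau * rho * E_object / (4 * F^2) for the F-number."""
    if e_chip_lux <= 0:
        raise ValueError("required chip illumination must be positive")
    return math.sqrt(math.pi * transmittance * reflectance * e_object_lux / (4 * e_chip_lux))


if __name__ == "__main__":
    # Full sun, rho = 0.5, tau = 0.8, and 0.2 lx needed at the chip.
    print(f"F-{required_f_number(100_000, 0.5, 0.2):.0f}")
```

With these inputs the result is roughly F-396, far beyond what an iris mechanism alone can deliver.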
This is an extraordinarily high F-number to achieve by mechanical means (the iris leaves). The precision of the leaves’ movement is limited and, more importantly, an unwanted optical effect, diffraction at the edges of the iris opening, becomes noticeable with small apertures. This means that, in practice, very high F-stops cannot be achieved by mechanical methods alone. So special optical neutral density (ND) filters are used to “help” the iris leaves achieve the high F-stops required by sensitive CCD chips. The inferior optical precision of such filters can make an image appear less sharp in very bright light, yet quite good in lower or normal light conditions.
Colors in television
Colors are a very important and complex issue in CCTV. Although some people still prefer monochrome (B/W) cameras because of their greater sensitivity and their response to the invisible infrared spectrum, color cameras have become widely accepted. In the last few years (since the previous edition of this book in 1999), an increasing number of camera manufacturers have been offering so-called Day/Night cameras, which switch to B/W mode automatically when the light level falls below a certain threshold.
Color offers valuable additional information about the objects being monitored. More importantly, the human eye captures color information more quickly than the fine details of an object. The drawback of color cameras is their poorer performance at low light levels. The reason for this is the optical infrared cut filter on color CCD chips, which attenuates the light and eliminates the invisible infrared portion of the projected image (more on this in Chapter 5). With ever-improving CCD technology, however, the minimum illumination performance of color cameras has improved dramatically: from 10 lx @ F1.4 at the object a few years ago, we now have cameras that can see down to less than 1 lx @ F1.4 at the object, or even lower.
As explained in the Light basics and the human eye section earlier in this chapter, the colors we see are actually various wavelengths of light. When we see red, for example, it is the wavelength reflected from a red object when white light shines on it. Black absorbs almost all wavelengths, whereas white reflects most of them.
The science of colors is very complex and becomes even more complicated when the natural colors
around us are reproduced by the phosphor coating of the cathode ray tubes (CRTs).
Colors in television are produced by additive mixing of three primary color phosphor dots placed next to each other. These are tiny dots, corresponding to the holes of a mask on the inside of a monitor’s CRT. A similar concept of mixing color is used in plasma and LCD monitors. We are going to explain CRTs in more detail because they are the most common in CCTV.
The actual color mixing happens when we view the monitor from the normal viewing distance (usually a meter or two): the three dots are too small to be seen individually, so they merge into one resultant color in our eyes.
Additive color mixing in television is the opposite of the mixing used in painting and printing technology, where colors are obtained by subtractive mixing.
In additive mixing, light is produced by the phosphor coating of a CRT, and adding colors makes the resultant color brighter. Therefore, to get white, all three colors need to be present in their corresponding amounts. Resultant colors are obtained by adding, hence the name additive mixing.
With subtractive mixing of colors, when we use paper or acrylic as a secondary (reflecting) source of light, colors are mixed in our eye after they are reflected from the surface. If we mix all the primary colors, we produce darker colors instead of brighter ones. The colors are mixed by reflected light, whose color is defined by the pigment, which absorbs (subtracts) certain wavelengths and reflects the rest.
Getting back to television, three colors, as mentioned, are used as primary colors: red, green, and blue,
usually referred to as RGB.
Television theory and experiments have shown that with these three primary colors most of the natural
colors can be represented (but not all).
Color images on TV are made up of three phosphor mosaics (RGB).
The RGB shadow mask
Obviously, there are three different phosphor coatings inside the color CRT, each of which radiates its own color when bombarded by the electron beam. The three primary phosphor coatings have different luminosity properties, which means that equal beam intensity produces unequal brightness. In order to compensate for these discrepancies of the primary phosphors, every color TV and monitor has a special matrix circuit that multiplies each of the color channels by a different compensating number. This can be shown by the very well-known color TV luminance equation, which is electronically applied to the three primary signals in the CRT:
$$L_{screen} = 0.3R + 0.59G + 0.11B$$
The blue phosphor produces more light than the other two; it therefore has to be multiplied by 0.11 in
order to reduce the luminance to be equal to the other two components.
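The luminance equation is a weighted sum of the three primary signals. The sketch below assumes signal levels normalized to the range 0 to 1 (an assumption; the text does not specify a scale) and uses a hypothetical function name. It shows that full white gives 1.0 because the three weights add up to 1, while equal red, green, and blue signals on their own contribute very differently.

```python
def screen_luminance(r: float, g: float, b: float) -> float:
    """L_screen = 0.3R + 0.59G + 0.11B for R, G, B signal levels normalized to 0..1."""
    for name, value in (("R", r), ("G", g), ("B", b)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return 0.3 * r + 0.59 * g + 0.11 * b


if __name__ == "__main__":
    for label, rgb in (("white", (1, 1, 1)), ("red", (1, 0, 0)),
                       ("green", (0, 1, 0)), ("blue", (0, 0, 1))):
        print(f"{label:>5}: {screen_luminance(*rgb):.2f}")
```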
In this book we will not go much deeper into the theory of colors in television, for it requires a book
on its own. It is important, however, for the reader to appreciate the complexity of the issue and accept
that all colors as seen on TV are obtained by visual additive mixing of the three primary colors of the
CRT phosphor: red, green, and blue.
Color temperatures and light sources
Very often in television, CCTV, and photography, the term color temperature is used when talking about light sources. Color temperature refers to the temperature to which an imaginary, perfectly black body has to be heated to produce light of the same color.
The theory of physics states that the spectrum of light generated by heating depends mostly on the temperature of the body and not on its material. This very important statement follows from the radiation law of the physicist Max Planck; the relationship between the peak wavelength radiated and the temperature to which the body is heated (known as Wien’s displacement law) is:
$$\lambda_m = \frac{2898}{T} \tag{29}$$
In the above relation, λm is the peak wavelength in micrometers (µm) and T is the temperature in kelvins (K).
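The peak-wavelength formula is easy to evaluate for the light sources discussed in this section. This minimal sketch assumes λm in micrometers with Wien’s constant of approximately 2898 µm·K, and takes 0.38 to 0.78 µm as approximate limits of visible light (an assumption); the names are hypothetical.

```python
WIEN_CONSTANT_UM_K = 2898.0      # micrometer-kelvins (approximate)
VISIBLE_RANGE_UM = (0.38, 0.78)  # assumed approximate limits of visible light


def peak_wavelength_um(temperature_k: float) -> float:
    """Peak wavelength of black-body radiation in micrometers."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvins)")
    return WIEN_CONSTANT_UM_K / temperature_k


def peak_region(temperature_k: float) -> str:
    """Classify where the radiation peak falls relative to the visible range."""
    peak = peak_wavelength_um(temperature_k)
    low, high = VISIBLE_RANGE_UM
    if peak > high:
        return "infrared"
    if peak < low:
        return "ultraviolet"
    return "visible"


if __name__ == "__main__":
    for t in (2800, 3200, 6500):
        print(f"{t} K: peak at {peak_wavelength_um(t):.2f} um ({peak_region(t)})")
```

For domestic (2800 K) and photographic (3200 K) tungsten lights the peak falls in the infrared, consistent with the text; a 6500 K source peaks inside the visible range.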
From the diagram on the next page it can be noted that the peaks for different temperatures are outside of the visible spectrum, that is, in the infrared region. For tungsten (wolfram) filament light, the working color temperature is around 3000 K, and more than three-quarters of the energy is radiated in the infrared region in the form of heat. Heat is nothing more than infrared light. Higher temperatures cannot be used for tungsten lights because the melting point of wolfram is around 3700 K. Increasing the temperature to more than 2800 K dramatically shortens the lifetime of the tungsten light. In today’s tungsten globes, the air is extracted from inside the bulb in order to minimize the burning of the filament. Tungsten light is good for B/W cameras, since they are more sensitive to the infrared portion of the spectrum. Color cameras have to compensate for the yellow/reddish color produced by a 2800 K light globe, typically found in domestic lighting.
For accurate testing of cameras, a light source of around 3200 K is very often specified. Such lights can be purchased from professional photographic shops, and there is a general rule of thumb that can be used to estimate the color temperature and the lumens produced by such a light source:
500 W tungsten => 3200 K (approximately 27 lumens/watt)
200 W tungsten => 2980 K (approximately 17.5 lumens/watt)
75 W tungsten => 2820 K (approximately 15.4 lumens/watt)
A typical photographic tungsten light source with 3200 K
It is known that a tungsten light source produces a yellowish image on a photographic film camera. To compensate for this, blue optical filters (the complementary color) can be fitted on the lens itself. Electronic cameras (CCTV and TV) compensate for the yellowish color shift electronically by changing the primary colors’ information by a certain percentage. Most CCTV cameras have so-called automatic white balance (AWB) circuitry that adjusts the color temperature automatically when the camera is powered up and sees a larger white area. A more advanced camera can readjust the white balance “on the fly,” that is, without powering the camera down and up again. This white balance is usually referred to as automatic tracking white (ATW) and is very practical, especially with pan/tilt/zoom (PTZ) cameras covering a larger area, part of which might be lit by tungsten light, for example, and another part by neon light.
Spectral characteristic of a black body at various temperatures
The sun, as a natural source of light, has a very high physical body temperature, but the equivalent color temperature of the light that reaches the Earth’s surface varies with the time of day and the weather conditions. This is due to the reflection and refraction of light in the atmosphere. As shown in the table of Color temperatures of various light sources on the next page, on a clear day at noon the color temperature reaches over 20,000 K, while on a cloudy day it drops to nearly 6000 K. Around sunset it is much lower still, which is why photographs taken at sunset appear reddish. The lower the color temperature, the redder the pictures appear, and the higher it is, the bluer they appear.
Standard light sources
Artificial sources of light have various color temperatures, depending on the source. The above-mentioned formula (29) applies to heat sources only, that is, sources of light where a metal is heated to a high temperature. There are, however, gas sources of light, where light is generated in a different way. Neon lights or mercury vapor lights, for example, generate light when an electromagnetic field is applied to them. The atoms are excited with enough energy to cause certain atomic transitions, and the energy is released in the form of light. This light is of a discrete character due to the quantum behavior of the atoms; the positions of the wavelengths depend on the gas used. Some of the glass tubes used with such gases are coated on the inside with a fluorescent powder that absorbs certain primary wavelengths and then re-emits a continuous secondary spectrum of visible light.
Gas sources can also be described by their color temperature, only in this case we use a so-called correlated color temperature.
For the purposes of having a reference point and correct color reproduction, standard sources of white
light have been defined. There are a few definitions (standards) used in practice. These standard sources
of white light are marked as A, B, C, D6500, and W.
Source A is the most natural standard, as it represents a tungsten (wolfram) light globe filled with some gas to reduce burning of the filament. That is why most of the later standards are based on source A. As mentioned earlier, at a certain temperature the characteristics of wolfram light coincide to a great extent with the radiation of a black body. This means the spectrum of source A, at a certain temperature, can be represented by only one detail: the temperature, which is equal to the temperature of the black body. To be precise, the real temperature of the wolfram and the temperature of the black body at which their spectra are supposed to be identical are not exactly the same; the black body is hotter by approximately 50 K. The spectrum of standard source A is defined by a color temperature of 2854 K, while the real filament temperature is approximately 2800 K. This is an insignificant difference, however, and the theoretical approximation is valid and accepted as a descriptive factor for the color temperature of such sources.
Color temperatures of various light sources
Spectral energy dissipation of various light sources
Standard source B radiates white light, similar to direct sunlight at noon. Source B can be obtained by
filtering the light from source A through a special light filter.
Similarly, by using another type of light filter, standard light source C can be obtained. The characteristics of sources B and C cannot be represented by the color temperature of a black body, as can be seen in the diagram above. However, since the color of a black body at some temperature looks similar to source B or C, we use the term correlated color temperature. So the correlated color temperature of source B is 4880 K, and for source C it is 6740 K.
In 1965 the International Commission on Illumination (CIE) suggested a new standard source of light, which is supposed to represent an average daylight color temperature and is denoted as the D standard. The recommended correlated color temperature for the D standard is 6500 K, so the standard is marked as D6500. This source of light cannot be obtained by modifying source A, but its spectral characteristic can be approximated with some other physical sources, as is the case with a correct mixture of the three phosphor coatings of the CRT of a color monitor. An important fact to remember is that D6500 is often used as a reference for color monitors.
Last, there is another, fictitious light source with a uniform distribution of radiated energy, whose spectrum looks like a flat horizontal line. It is used only for calculation purposes, and the code of this light source is W.
The human eye adapts to color temperature differences quite easily, and our brain automatically compensates for the color variation due to different light sources. Film emulsions, tubes, and camera CCD chips do not adapt in this way. When using a film camera, special films or optical filters have to be used if the color temperature needs to be corrected. With TV cameras this is achieved by electronic compensation, which can be either manual or automatic.
Finally, as already mentioned, do not forget to take into account the color temperature of the monitor screen. The majority of CRTs are 6500 K, but some of them might have a higher (9300 K) or even a lower (5600 K) temperature.
Eye persistence
For us in CCTV, it is very important to know how the human eye works, and as we will see further in
the text, we actually use an anomaly of the human eye in order to “cheat” the brain into thinking we
see “motion pictures.” This anomaly is the persistence of the human eye.
Eye persistence is the most important “eye defect” used in cinematography and television. The eye does not react instantly to changes of light intensity. There is a delay of more than a few milliseconds during which the brain gets the information about the object we are watching. This delay becomes shorter as the object’s illumination increases. Not all parts of the retina have equal persistence; the central area around the fovea has longer persistence. Eye persistence also depends on the spectral characteristics of the light source, that is, its color and brightness.
This deficiency of the eye is very important for the concept of motion pictures. As can be seen on the graph on the next page, persistence depends very much on the intensity of the light, or the brightness of the area we are looking at. The brighter the area, the faster we have to change the pictures if we do not want to notice flicker. The first movies from the beginning of the twentieth century, cartoons, and even the cartoon flip books we used to play with as kids are based on the concept of persistence.
When pictures with a logical continuity are shown in front of our eyes fast enough relative to the persistence of the eye, we see continuous moving pictures, even though each picture is, individually, still.
A movie camera records images with a speed of 24 pictures/second. This is usually enough for the film
to be projected with a very low light intensity projector, as in the beginning of the cinema revolution.
For bigger audiences, bigger and stronger light projectors were needed, as well as brighter screens (as
we have today). So it was obvious that the initial 24 pic/s speed needed increasing.
From a photographic point of view, which is very similar to the cinematographic one, it is impractical
to increase the frame rate of the movie camera from 24 pic/s to a higher rate because the exposure
time of every film frame will have to be shortened. To achieve that, the film either has to be of a higher
sensitivity, which is reflected in the bigger grain structure of the film, or the iris of the lens needs to
be opened more, which results in not-so-good pictures at lower light levels as well as a reduced depth
of field.
Neither of these two options was acceptable to cinematographers, so the solution was found in increasing the projection frequency (not the recording frequency) from 24 to 48 with a simple but clever design. This was achieved with the so-called Maltese Cross shutter, which is a circular blade cut in the shape of the Maltese Cross. It rotates in front of the projection light bulb and not only blocks the light while the film moves from one frame to the next (so the viewers do not see black lines between film frames), but also interrupts the projection while the frame is stationary (for the duration of 1/24 s), producing two flashes of the same frame. As a result, we have a projection of 48 frames/s, which is flicker-free to the eye. Clearly, only 24 different pictures are recorded each second, but the cross produces 48 flashes, and our brain perceives flicker-free, continuous moving pictures.
Persistence curve of the human eye
Television uses the same principle of eye persistence to achieve the illusion of motion, by using so-called interlaced scanning. The conceptual difference is that the images are composed not by a light projector shining through celluloid film, but by electronic scanning of a CRT screen. In television, still images are created by scanning, where a picture is formed line by line, in the same manner as when reading a book, from left to right and from top to bottom. These principles will be explained in more detail later in the book.
It is important for the reader to understand that television also projects static images which, when displayed fast enough, are seen as “motion pictures.” Whether this is done by interlaced or progressive scanning is irrelevant at this stage, but it should be noted that today’s television technology can use improved “tricks” to make the eye’s motion illusion even better.
In the world today there are three basic television systems that differ in the number of pictures per
second, the number of lines each picture is composed of, and the method of color encoding. But in all
of them the concept of producing motion is the same.
PAL: 625 scanning lines/50 interlaced pictures per second
NTSC: 525 scanning lines/60 interlaced pictures per second
SECAM: 625 scanning lines (used to be 819)/50 interlaced pictures per second
Although different in the number of scanning lines and pictures per second, the general concept is the
same from the point of view of composing picture frames field by field and line by line, scanning them
at a fast rate to make use of the persistency concept as in film.
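The relation between the interlaced pictures (fields) in the list above and full frames can be sketched in a few lines of Python. The sketch assumes the nominal rates listed (broadcast color NTSC actually runs at about 59.94 fields per second) and that each frame is made of two interlaced fields; the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TvSystem:
    name: str
    lines_per_frame: int
    fields_per_second: int  # "interlaced pictures per second" in the text

    @property
    def frames_per_second(self) -> float:
        # Two interlaced fields (odd and even lines) make up one full frame.
        return self.fields_per_second / 2

    @property
    def lines_per_field(self) -> float:
        return self.lines_per_frame / 2

    @property
    def field_duration_ms(self) -> float:
        return 1000 / self.fields_per_second


SYSTEMS = (
    TvSystem("PAL", 625, 50),
    TvSystem("NTSC", 525, 60),
    TvSystem("SECAM", 625, 50),
)

if __name__ == "__main__":
    for s in SYSTEMS:
        print(f"{s.name}: {s.frames_per_second:g} frames/s, "
              f"{s.lines_per_field:g} lines per field, {s.field_duration_ms:.2f} ms per field")
```

The half-line (312.5 lines per field for 625-line systems) is what lets the two fields interleave rather than overlap.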
The NTSC (National Television System Committee) 525-line, 30-frames-per-second system is used primarily in the United States, Canada, Greenland, Mexico, Cuba, Panama, Japan, the Philippines, Puerto Rico, and most of South America. The NTSC standard was first developed for black and white (monochrome) television in 1941. The first color TV broadcast system was implemented in the United States in 1953.
More than half of the countries in the world use one of two 625-line, 25-frame systems: the PAL (phase alternating line) system or the SECAM (séquentiel couleur à mémoire, or sequential color with memory) system.
The PAL standard was introduced in the early 1960s and implemented in most European countries,
Australia, New Zealand, China, India, and many countries in Africa and the Middle East. The PAL
standard utilizes a wider channel bandwidth than NTSC, which allows for better picture quality. Also,
the color encoding in PAL, being designed after the introduction of NTSC, offers more accurate color
reproduction and better immunity to noise.
The SECAM standard was introduced in the early 1960s and implemented in France; it is used in
parts of Europe, including countries in and around the former Soviet Union. SECAM uses the same
bandwidth as PAL but transmits the color information sequentially. The extra 100 lines in the SECAM
and PAL systems add significant detail and clarity to the video picture, but the 50 fields per second
(compared to 60 fields in the NTSC system) means that a slight flicker can sometimes be noticed. It
is interesting to note that although Russia, for example, uses SECAM for broadcast TV, the CCTV
industry there uses PAL.
With the introduction of the new digital TV standards (DTV), it is possible to have both interlaced and progressive scanning. These are usually denoted with a lowercase “i” or “p” next to the standard. For example, “1080i” refers to HDTV with 1920 × 1080 pixels and interlaced scanning.
3. Optics in CCTV
Some people take optics quality in CCTV for granted. With increasing camera resolution and the miniaturization of CCD chips, we are coming closer to the limits of optical resolution, and we need to know a bit more about optics than the average technician. This chapter discusses, again in a simplified way, the most common optical terms, concepts, and products used in CCTV.
The very first and basic concept we have to understand is the concept of refraction and reflection.
When a light ray traveling through air or a vacuum enters a denser medium, like glass or water, its speed is reduced by a factor n (always greater than 1) known as the index of refraction. Different media (transparent to light) have different indices of refraction. For example, the speed of light in air is 300,000 km/s (almost the same as in a vacuum). If a light ray enters glass, which has an index of 1.5, its speed is reduced to 200,000 km/s.
According to the wave theory of light, the reduction of the light speed is reflected in a shortened wavelength. This phenomenon is the basis of refraction. If a light ray enters the glass perpendicularly, its wavelength shortens, but when the ray exits the glass it resumes its normal speed, that is, returns to its original “air wavelength,” and continues to travel in the same direction. If, however, the light ray enters the glass at any angle other than perpendicular, interesting things happen: the front of the light ray (considered to be of a wave nature in this case) does not enter the glass all at the same time, because it arrives at an angle. The parts of the front that enter the glass first are “slowed down” first. The end result is the refraction of the light ray: the ray does not continue in the same direction but deflects slightly. This deviation depends on the density (the index of refraction) of the media.
The denser the media – that is, the higher the index of
refraction – the greater the inclination of the original
There is a very simple relation between the angles
of incidence and refraction and indices of refraction
between the two different media. This relation was
discovered by the Dutch physicist Willebrord Snell in
the early seventeenth century. By using a very simple
calculation, we can determine the angles of refraction
in various media. As we shall see later on, the same
concepts are used when calculating the angles of total
reflection and numerical aperture in fiber optics.
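As a small worked example, both the speed reduction and Snell's relation (written here in its usual form, n0 · sin θ0 = n1 · sin θ1, with the angles measured from the perpendicular to the surface) can be computed directly. This is a minimal Python sketch; the function names are illustrative, and 1.5 is the round glass index used above.

```python
import math
from typing import Optional

C_KM_PER_S = 300_000.0  # speed of light in air (almost the same as in a vacuum)


def speed_in_medium(n: float) -> float:
    """Speed of light (km/s) in a medium with index of refraction n (n >= 1)."""
    return C_KM_PER_S / n


def refraction_angle(theta_in_deg: float, n_in: float, n_out: float) -> Optional[float]:
    """Snell's law: n_in * sin(theta_in) = n_out * sin(theta_out).

    Angles are in degrees, measured from the normal (the perpendicular).
    Returns None when no refracted ray exists (total internal reflection).
    """
    s = n_in * math.sin(math.radians(theta_in_deg)) / n_out
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


def critical_angle(n_dense: float, n_rare: float) -> float:
    """Smallest angle of incidence (degrees) at which total internal reflection starts."""
    return math.degrees(math.asin(n_rare / n_dense))


print(speed_in_medium(1.5))            # 200000.0 km/s in glass
print(refraction_angle(30, 1.0, 1.5))  # air to glass: bent toward the normal
print(critical_angle(1.5, 1.0))        # glass to air: the total reflection limit
print(refraction_angle(60, 1.5, 1.0))  # None: the ray is totally reflected back into the glass
```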
The basics of refraction are graphically explained in the diagram on the previous page, where it is
assumed that a monochromatic (single frequency) light ray enters the glass. The bottom drawing also
shows that a percentage of the incident light is always reflected back into the air (or vacuum); in the
case of glass this percentage is very small.
The refraction and reflection theories will be used in the following sections when explaining lens and
fiber optics concepts.
Lenses as optical elements
There are many optical components, but the two basic types of lenses are convex and concave. A convex
lens has a positive focal length, that is, a real focus; we usually call it a magnifying glass, since it
appears to magnify objects. A concave lens has a negative focal length; its focus is virtual, and it
appears to reduce objects.
Every lens has the following important parameters:
• Optical plane (a plane passing through the center of
the lens)
• Optical axis (a line perpendicular to the optical plane,
passing through the center of the lens)
• Focus (a point where rays falling parallel to the optical
axis converge)
• Focal length (the distance between the optical plane
and the focus)
• Diopter (the inverse value of the focal length, where
the focal length is stated in meters)
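The relation between focal length and diopters is a one-line conversion. The sketch below (Python, with an illustrative function name) takes the focal length in millimeters, as CCTV lenses are usually specified, and converts it to meters before inverting it.

```python
def diopters(focal_length_mm: float) -> float:
    """Optical power in diopters: the inverse of the focal length expressed in meters.

    Convex lenses (positive focal length) give positive values,
    concave lenses (negative focal length) give negative values.
    """
    if focal_length_mm == 0:
        raise ValueError("focal length cannot be zero")
    return 1000.0 / focal_length_mm  # same as 1 / (focal_length_mm / 1000)


print(diopters(500))   # 2.0 D, a weak convex lens
print(diopters(-250))  # -4.0 D, a concave lens
print(diopters(16))    # 62.5 D, a typical CCTV focal length
```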
With respect to the physical size and the type of surface of the lens, there are many different types,
such as plano-convex, convex-concave, and plano-concave. The name describes the physical appearance
of the lens, where plano means one of the two surfaces is a plane. Different types of lenses are
combined in order to correct various distortions (aberrations) caused by different factors.
As an example of why this is
necessary, let us examine a sun
ray falling onto a prism.
We all know the rainbow effect
produced on the other side of
the prism. This happens because
the “white” rays coming from the sun are composed of all the wavelengths (that is, colors) the human
eye can see. They all enter the glass prism, whose index of refraction is higher than that of air
(n1 > n0), but the index of the glass is slightly different for each wavelength, so the different
wavelengths are refracted by slightly different amounts, thus producing the rainbow at the other end
of the prism. This is actually a decomposition of the white light. The color red has the longest
wavelength (lowest frequency); therefore, it is refracted least. The color violet has the shortest
wavelength (highest frequency); therefore, it is refracted the most.
A very similar effect is the fabulous rainbow after the rain, which is actually the refraction and reflection
of sun rays through the raindrops.
No matter how impressive this effect looks, it is unwanted in a lens.
A convex lens can be approximated with many little prisms next to each other, forming a mosaic. It is,
then, obvious that the image created by such a lens in daylight (which is actually the most common
situation) will be decomposed into the basic colors, as is the case with the prism light decomposition.
This means that when white rays fall onto a simple convex lens, the focal point will vary for different
colors. This unwanted effect is called the color distortion of a lens, or chromatic aberration.
So, it should be clearly understood that chromatic aberration happens not so much because of
imperfections in the lens manufacture (although these are not excluded), but rather because of the
physical process of decomposing white light into its basic wavelengths when the light passes through a
single lens element.
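To see how large this effect can be, the thin-lens lensmaker's equation, 1/f = (n - 1)(1/R1 - 1/R2), can be evaluated with two slightly different indices of refraction, one for red and one for blue light. The index values and the 50 mm surface radii below are assumptions chosen only for illustration (roughly typical of an ordinary crown glass), not data from the text.

```python
def thin_lens_focal_length(n: float, r1_mm: float, r2_mm: float) -> float:
    """Lensmaker's equation for a thin lens in air: 1/f = (n - 1) * (1/R1 - 1/R2).

    Sign convention: a radius is positive when its center of curvature lies on the
    side the light travels toward, so a symmetric biconvex lens has R1 > 0 and R2 < 0.
    """
    power = (n - 1.0) * (1.0 / r1_mm - 1.0 / r2_mm)
    return 1.0 / power


# Assumed, illustrative indices of one glass for red and for blue light
N_RED, N_BLUE = 1.514, 1.522

f_red = thin_lens_focal_length(N_RED, 50.0, -50.0)
f_blue = thin_lens_focal_length(N_BLUE, 50.0, -50.0)
print(f"red focus:  {f_red:.2f} mm")   # red is refracted least, so it focuses farthest
print(f"blue focus: {f_blue:.2f} mm")
print(f"difference: {f_red - f_blue:.2f} mm")
```

Even this small difference in index moves the focus by roughly three quarters of a millimeter in this example, far more than the size of a CCD pixel.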
Chromatic aberration can be minimized by combining convex and concave lenses together: a white ray is
first split by the convex lens into a “dispersed rainbow” but is then “put back together” by the concave
lens, because the concave lens has the opposite effect on the incident light.
Chromatic aberration correction
When the two lenses (convex and concave) are chosen carefully (with respect to their glass types,
thicknesses, and focal lengths), the colors come together in the focus and form practically a single
focusing point. This is achieved with a proper selection of the convex-concave pairs, preserving the
wanted combined focal length of the single-piece lens. A special transparent glue is used to join the
two lenses.
This is just a very simple example of why numerous optical elements are required to compose a lens
of a certain focal length.
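The same idea can be put into numbers for two thin lenses cemented together. A common textbook condition for such a pair (an achromatic doublet) is that the powers (1/f) of the two elements add up to the wanted power, while the powers divided by the glass dispersion, described by the Abbe number V, cancel: 1/(f1·V1) + 1/(f2·V2) = 0. The Abbe numbers below are assumed values roughly typical of a crown and a flint glass, and the function name is illustrative.

```python
def achromat_focal_lengths(f_total_mm: float, v_crown: float, v_flint: float) -> tuple:
    """Focal lengths of the convex (crown) and concave (flint) elements of a thin,
    cemented achromatic doublet with the wanted total focal length.

    Conditions used:
      phi_crown + phi_flint = 1 / f_total           (keep the wanted focal length)
      phi_crown / V_crown + phi_flint / V_flint = 0  (bring two colors to one focus)
    """
    if v_crown == v_flint:
        raise ValueError("the two glasses must have different dispersion")
    phi = 1.0 / f_total_mm
    phi_crown = phi * v_crown / (v_crown - v_flint)
    phi_flint = -phi * v_flint / (v_crown - v_flint)
    return 1.0 / phi_crown, 1.0 / phi_flint


f_convex, f_concave = achromat_focal_lengths(50.0, v_crown=64.2, v_flint=36.4)
print(f"convex element:  {f_convex:.2f} mm")   # about 21.65 mm
print(f"concave element: {f_concave:.2f} mm")  # about -38.19 mm
```

The guard clause reflects why the two elements must be made of different glasses: with equal dispersion, canceling the color error would also cancel the focusing power.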
Apart from chromatic aberration, lenses produce many other distortions, among them geometrical
(“pincushion” and “barrel”) and spherical aberrations. Each name suggests the type of distortion it adds
to the image. These can also be corrected by adding more optical elements to the group.
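Geometrical distortion is often described with a simple radial model in which every image point is moved along its radius by a factor (1 + k·r²). The sketch below uses that one-coefficient model as an assumption for illustration (real lenses need more terms): a negative k produces barrel distortion and a positive k pincushion distortion.

```python
import math


def distort(x: float, y: float, k: float) -> tuple:
    """Apply one-term radial distortion to a point in normalized image coordinates.

    The center of the image is (0, 0); r is the distance from it. Points move
    along the radius by the factor (1 + k * r**2), so the effect grows toward
    the corners: k < 0 gives barrel distortion, k > 0 gives pincushion distortion.
    """
    scale = 1.0 + k * (x * x + y * y)
    return x * scale, y * scale


corner = (0.6, 0.8)  # a point at radius 1.0
for label, k in (("barrel", -0.1), ("pincushion", 0.1)):
    xd, yd = distort(*corner, k)
    print(f"{label:>10}: corner moves to radius {math.hypot(xd, yd):.2f}")
```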
When designing a lens, optical engineers have to strike a balance between as many corrections as
possible (in order to get a good quality picture) and as few elements as possible (in order to keep the
lens economical and technologically feasible).
One can imagine how many combinations are possible when designing a lens of a particular focal length
with half a dozen (or more) different optical elements. In the past, optical engineers worked together
with mathematicians when designing a lens of a certain focal length and size, and they did hundreds and
hundreds of calculations and iterations manually. The physical size, the focal length, and the absolute
and relative positions of every element are all variables. The only way to find such a combination for
a known focal length was by painfully long calculations.
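A toy version of this trial-and-error work can be written in a few lines. The sketch assumes just two thin lenses in air, for which the combined focal length is 1/f = 1/f1 + 1/f2 - d/(f1·f2), with d the separation, and simply tries separations until the result is closest to a wanted focal length. The focal lengths and the 30 mm target are illustrative assumptions; real lens design traces many rays through thick elements and optimizes for aberrations as well.

```python
def combined_focal_length(f1_mm: float, f2_mm: float, d_mm: float) -> float:
    """Equivalent focal length of two thin lenses in air separated by d_mm."""
    power = 1.0 / f1_mm + 1.0 / f2_mm - d_mm / (f1_mm * f2_mm)
    return 1.0 / power


def search_separation(f1_mm: float, f2_mm: float, target_mm: float,
                      d_max_mm: float = 15.0, step_mm: float = 0.1) -> tuple:
    """Brute-force search: try every separation step and keep the best match."""
    best_d, best_err = 0.0, float("inf")
    for i in range(int(round(d_max_mm / step_mm)) + 1):
        d = i * step_mm
        err = abs(combined_focal_length(f1_mm, f2_mm, d) - target_mm)
        if err < best_err:
            best_d, best_err = d, err
    return best_d, best_err


# A convex 20 mm element followed by a concave -40 mm element, aiming at 30 mm
d, err = search_separation(20.0, -40.0, target_mm=30.0)
print(f"separation {d:.1f} mm -> f = {combined_focal_length(20.0, -40.0, d):.2f} mm")
```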
Obviously, the desired result was a good quality lens without going overboard with the number of
optical elements. Since this was quite a challenging task, manufacturers used to register a particular
lens design with its “recipe”: how many lenses, of what focal lengths, and at what positions they were
placed. That is why in cinematography and photography we may still see lenses of certain manufacturers
with names like Planar or Xenar. These names actually refer to patented lens designs for a particular
lens size and focal length.
Today, in the computer era, there are many professional programs for computerized optical simulation.
Within a few minutes they produce optimum results, suggesting only as many optical elements as
necessary while still correcting all visible distortions.
This is also why lenses of the same focal length are available at different prices and sizes, all giving
the same angle of view but different picture quality.
Lens quality depends on many factors, and one should not take it for granted. This is especially important
with zoom lenses, as there are so many variables in their design. Zoom lenses are widely used in most
of the bigger CCTV systems, so we should be very careful when choosing them.
There is no simple rule, so the best suggestion, again, is to do some testing and comparisons.
The factors that determine the lens quality can be summarized by the following points:
1. Lens design
• Number of elements
• Relative position
• Aberration correction in the design stage
2. Lens elements manufacture
• Glass type
• Technology and type of glass manufacturing (heating, cooling, cleanliness)
• Precision of grinding and polishing (very important)
• Antireflection coatings of the glass (layers thinner than a micrometer, for minimizing losses)
3. Lens mechanical composition
• Positional fixing and stability of the elements (shock, temperature)
• The lens’s moving mechanics (especially zooming, focusing, iris leaves)
• Internal light reflections (matte black absorption)
• Gears used for motorized lenses (plastic, metal)
4. Electronics (refers to auto iris and motorized lenses)
• Auto iris electronics quality (gain, stability)
• Power consumption (auto iris lenses usually draw little current, but some older models may require
more than a camera can supply, since the camera powers the auto iris)
• Zoom and focus control circuitry (voltages: 6, 9, or 12 volts, three- or four-wire control)
Geometrical construction of images
Images can be constructed by using simple optical and geometrical rules. As can be seen on the following
drawing, at least two rays are needed to construct the image of an object.
There are three basic rules to follow:
• For simplicity, objects at various distances are drawn with one end touching the optical axis.
• Rays that pass through the center of the lens do not change direction; that is, at its center a lens
behaves like a parallel-sided piece of glass and the ray is not deflected.
• All rays parallel to the optical axis pass through the focus after refraction.
There is a very basic lens formula worth mentioning, which we use when calculating the light falling
onto a CCD chip:
1/D + 1/d = 1/f
where D is the distance from the object to the lens, d is the distance from the lens to the image, and f
is the focal length of the lens. Note that for an object at a finite distance, d is bigger than f; only if
the object is at an infinite distance would d be equal to f.
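Solving the lens formula for d shows how little the lens-to-image distance changes in practice. In this minimal sketch (the function name is illustrative), a 16 mm lens is assumed and all distances are in millimeters.

```python
import math


def image_distance(object_mm: float, f_mm: float) -> float:
    """Solve 1/D + 1/d = 1/f for the image distance d (thin-lens approximation)."""
    if math.isinf(object_mm):
        return f_mm  # object at infinity: the image forms at the focus
    if object_mm <= f_mm:
        raise ValueError("the object must be farther away than f for a real image")
    return 1.0 / (1.0 / f_mm - 1.0 / object_mm)


for D in (math.inf, 20_000.0, 5_000.0, 1_000.0):
    d = image_distance(D, 16.0)
    print(f"D = {D:>8} mm -> d = {d:.3f} mm, {d - 16.0:.3f} mm more than f")
```

The closer the object, the farther behind the lens its image forms, which is what the focusing mechanism has to compensate for.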
Please note the positions of the images for objects at various distances. Lens focusing is achieved by
changing the distance between the lens and the image plane (which is where the CCD chip is located).
So, only when a lens is focused on an infinitely distant object does the image plane coincide with the
focal plane. In all other cases the distance between the lens and the image is bigger than the focal
length of the lens.
It should also be noted that in practice a lens is composed (as discussed earlier) of many optical
elements. Such a lens is therefore represented by an equivalent single-element lens located at the
principal point. The following drawing explains this.
A lens composed of many optical elements (or a single thick lens) has two principal points, called the
primary and secondary principal points. For a thin lens, these points coincide and are located at the
center of the lens.
The planes that pass through these principal points and are perpendicular to the optical axis are called
principal planes.
The principal planes have the following properties:
• A ray incident on the primary principal plane (and parallel to the optical axis) will leave the
secondary principal plane at the same height, traveling toward the focal point (focus).
• An incident ray directed toward the primary principal point will leave the secondary principal
point at the same angle.
• The focal length of such a lens is measured from the secondary principal plane to the focus.
Using the above properties, we can construct a geometrical image in the same manner as was shown
with the single optical element.
The secondary principal point may fall outside the group of lenses. This is the case with very short
focal length lenses. The shorter the focal length, the more optical elements have to be added to correct
various distortions, making the lens more expensive. With the reduction of CCD chip sizes (from 2/3''
down to 1/2'', then to 1/3'', and now to 1/4''), shorter focal length lenses have to be manufactured in
order to preserve the same wide angle as with the preceding chip sizes. This, in turn, has forced the
industry to reduce the C-mount back-flange distance of 17.5 mm in order for the optics to get simpler,
smaller, and cheaper. The new, shorter back-flange distance is 12.5 mm, and the corresponding standard
is referred to as the CS-mount.
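The need for shorter focal lengths on smaller chips can be quantified: for the same horizontal angle of view, the focal length scales with the chip width. The sketch below assumes the nominal chip widths commonly quoted for these formats (8.8, 6.4, 4.8, and 3.6 mm) and a distant object; the names are illustrative.

```python
import math

# Nominal horizontal chip widths in mm (commonly quoted values, assumed here)
CHIP_WIDTH_MM = {"2/3''": 8.8, "1/2''": 6.4, "1/3''": 4.8, "1/4''": 3.6}


def horizontal_angle_deg(focal_length_mm: float, chip_width_mm: float) -> float:
    """Horizontal angle of view for a distant object (image formed at the focus)."""
    return math.degrees(2.0 * math.atan(chip_width_mm / (2.0 * focal_length_mm)))


def equivalent_focal_length(f_mm: float, from_format: str, to_format: str) -> float:
    """Focal length that keeps the same angle of view on a different chip format."""
    return f_mm * CHIP_WIDTH_MM[to_format] / CHIP_WIDTH_MM[from_format]


for fmt, width in CHIP_WIDTH_MM.items():
    f = equivalent_focal_length(8.0, "2/3''", fmt)
    print(f"{fmt:>5} chip: {f:.2f} mm lens gives {horizontal_angle_deg(f, width):.1f} degrees")
```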
Drawing courtesy of Elbex
Cross section of manual iris lens
Aspherical lenses
As mentioned earlier, spherical aberration is a common distortion that appears in the majority of lenses
of a spherical type. Spherical-type lenses are the most common because they are the easiest to produce
by mechanical grinding and polishing: circular machine polishing naturally results in a lens with a
spherical surface. It can be shown that apart from the chromatic aberrations present in a single lens
element (the “color decomposition” of white light), aberration also occurs because of the spherical
profile of the lens: rays passing near the edge of the lens are focused slightly closer to the lens than
rays passing near its center. The focus is therefore not a very precise single point.
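The spherical focus error can be demonstrated with an exact ray trace through a single spherical surface. The sketch below is an illustrative model rather than a real lens: it sends rays parallel to the optical axis at different heights into a convex glass surface (radius 10 mm and index 1.5, both assumed values), applies Snell's law exactly, and reports where each ray crosses the axis. The paraxial value n·R/(n - 1) is the focus predicted for rays very close to the axis.

```python
import math


def axis_crossing_mm(h_mm: float, radius_mm: float, n: float) -> float:
    """Distance from the surface vertex at which a ray, entering parallel to the
    optical axis at height h, crosses the axis after refraction at a convex
    spherical air-to-glass surface of index n (exact Snell's law, no approximation)."""
    if not 0.0 < h_mm < radius_mm:
        raise ValueError("ray height must be between 0 and the surface radius")
    theta1 = math.asin(h_mm / radius_mm)      # angle of incidence, from the surface normal
    theta2 = math.asin(math.sin(theta1) / n)  # angle of refraction, from the normal
    # Law of sines in the triangle: hit point, center of curvature, axis crossing point
    center_to_crossing = radius_mm * math.sin(theta2) / math.sin(theta1 - theta2)
    return radius_mm + center_to_crossing


R, N = 10.0, 1.5
print(f"paraxial focus: {N * R / (N - 1.0):.2f} mm")
for h in (0.5, 2.0, 4.0, 6.0):
    print(f"ray at height {h:.1f} mm crosses the axis at {axis_crossing_mm(h, R, N):.2f} mm")
```

The rays entering farther from the axis meet it noticeably closer to the lens than the rays near the center, which is the blurred, non-point focus described above.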
Theoretically, using the physical laws of refraction, we can show (but we will not go into the details)
that a bell-shaped lens, which does not follow the spherical law, is the ideal shape for obtaining a
single focusing point without spherical distortions. The cross-section profile of such a lens is a curve
that deviates slightly from a circular shape, appearing more bell shaped. This type of lens is called an
aspherical lens.
The drawing on the next page shows this in an exaggerated form in order to help the reader understand
the difference.
Understandably, such a shape is hard to produce by regular polishing techniques, but if properly
manufactured, it offers quite a few advantages over conventional spherical lenses, including larger iris
openings (that is, lower F-stops), wider angles of view, shorter minimum object distances, and fewer
optical elements, because there are fewer aberrations to correct (resulting in lighter and smaller lens
designs).
This technology is more expensive due to the aforementioned complex polishing techniques.
Some optical companies have started producing molded aspherical lenses, avoiding the critical process
of grinding. This process does not offer the same glass quality as the regular one, but it does offer a
solution for more economical production of aspherical lenses.
The quality of such lenses is yet to be proven, but they do exist and are available in the CCTV market
as well.
What we want from a lens are sharp and clear images, free of distortions.
As already mentioned, lenses have limited resolving power, and this is especially important to have in
mind when using them in high-resolution systems.
Resolution refers to the lens’s ability to reproduce fine details. To measure this ability, a chart
consisting of black and white stripes of various densities (spatial periods) is used. The result is
usually expressed in lines per millimeter (lines/mm). When counting how many lines/mm a lens can
resolve, we count both black and white lines.
A characteristic that shows the “response” of a lens to various densities of lines/mm is called a Contrast
Transfer Function (CTF).
Theoretically, it is better to know the lens characteristics for a continuous variation from black to
white (in the form of a sine wave), and not just for stripes that abruptly change from black to white.
This would be especially suitable for TV lenses, since the optical signal is converted into an
electrical signal, in which sine waves are easier to represent and evaluate. This characteristic is
known as the Modulation Transfer Function (MTF).
In practice, however, it is much easier to produce a test chart with just black/white stripes than one
with a sine wave variation between black and white. CTF is not the same as MTF, but it is much easier to
measure and is precise enough to describe the lens’s overall characteristics.
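The connection between the two curves can be made concrete. A bar chart is a square wave, and a square wave contains a fundamental plus odd harmonics, so its contrast can be computed from the MTF with the series usually attributed to Coltman: CTF(f) = (4/π)[MTF(f) - MTF(3f)/3 + MTF(5f)/5 - ...]. The linear MTF curve and its 100 cycles/mm cutoff below are hypothetical, chosen only to show that, in this example, the bar-chart readings come out somewhat higher than the MTF at the same frequency.

```python
import math


def mtf_model(freq: float, cutoff: float = 100.0) -> float:
    """Hypothetical lens MTF: contrast falls linearly from 1 to 0 at the cutoff (cycles/mm)."""
    return max(0.0, 1.0 - freq / cutoff)


def ctf_from_mtf(freq: float, mtf, terms: int = 100) -> float:
    """Square-wave (bar chart) contrast from the sine-wave MTF using the odd-harmonic
    series CTF(f) = 4/pi * [MTF(f) - MTF(3f)/3 + MTF(5f)/5 - MTF(7f)/7 + ...]."""
    total = 0.0
    for i in range(terms):
        k = 2 * i + 1                       # 1, 3, 5, 7, ... odd harmonics
        sign = 1.0 if i % 2 == 0 else -1.0  # alternating signs
        total += sign * mtf(k * freq) / k
    return 4.0 / math.pi * total


for f in (10, 25, 40, 60):
    print(f"{f:3d} cycles/mm: MTF = {mtf_model(f):.2f}, CTF = {ctf_from_mtf(f, mtf_model):.2f}")
```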
The easiest analogy for understanding MTF is the frequency response of an audio system. In
an audio system we usually describe the output level (voltage or sound pressure) versus the audio
frequency. In optics it is similar: MTF is expressed in contrast values (from 0 to 100%) versus
spatial frequency (expressed in lines/mm), as can be seen on the previous page.
Different lenses have different MTF characteristics, depending on the quality of the glass, optical
design, and application. For example, a photographic lens typically has a better MTF than a CCTV lens.
The reason for this is simple: the photographic film structure can register over 120 lines/mm and
manufacturers need to produce better lenses in order to minimize picture deterioration when film is
blown up to a poster size.
CCD chips have a lower resolution than the grain structure of film. Technically, there is no need to go
to the “expense” of producing a lens with much higher resolution than a CCD chip. With the
miniaturization of CCD chips, however, we are actually coming closer to the film resolution limits, so
lenses need to feature better characteristics.
An average 1/2'' B/W CCD chip, for example, has approximately 500 pixels (picture elements) in the
horizontal direction. When we take into account the physical width of the 1/2'' CCD chip (6.4 mm),
we can conclude that the maximum number of vertical lines (black and white pairs) we can have is
(500/6.4)/2 ≈ 39 lines/mm. This resolution is easily achieved with most TV lenses, since the optical
technology can produce over 50 lines/mm. But for a 1/3'' B/W CCD chip (4.8 mm wide) with the same
500 pixels horizontally, we are actually talking about (500/4.8)/2 ≈ 52 lines/mm. This means that a
1/3'' CCD camera demands more from the lens resolution than a 1/2'' one.
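The two calculations above follow one simple rule, sketched here in Python (the function name is illustrative): divide the pixel count by the chip width to get pixels per millimeter, then divide by 2, because one black and one white line need two pixels. The 4.8 mm value is the nominal width of a 1/3'' chip.

```python
def max_lines_per_mm(pixels: int, chip_width_mm: float) -> float:
    """Finest black-and-white stripe pattern the chip can sample, in the text's
    convention: (pixels / width) / 2, since one black plus one white line needs two pixels."""
    return pixels / chip_width_mm / 2.0


for name, width in (("1/2''", 6.4), ("1/3''", 4.8)):
    print(f"{name} chip, 500 pixels: {max_lines_per_mm(500, width):.0f} lines/mm")
```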
Different lenses have different MTF characteristics, and sometimes it may be necessary to decide which
one to use on the basis of these characteristics.
The diagram presented here shows such an example. We
can evaluate this in the following way. Lens A has its MTF
extending into the high spatial frequency range, which means
it can resolve finer details than lens B. Lens B, however, has
better response in the lower frequencies. If we need a lens
for a high-resolution output, like film, for example, lens A
will be a better choice, but for CCTV purposes, where a CCD
chip cannot see more than 50 lines/mm, we are better off with
lens B since we will have better contrast with it.
F and T numbers
In addition to the MTF and CTF characteristics of a lens, the F-number (more commonly, the F-stop)
is also a very important parameter.
The F-number indicates the brightness of an image formed by a lens. This is usually written (engraved)
on the lens itself as F-1.4, for example, or sometimes in another form, such as 1:1.4. The F-number
depends on the focal length of the lens and the effective diameter of the area through which the light
rays pass. This area can be controlled by an assembly of mechanical leaves, which we usually refer to
as the iris.
It is important to note that the effective diameter of a lens is not the actual lens diameter, but rather
the diameter of the image of the iris as seen from the front of the lens. This image is usually called
the entrance pupil. There is also an exit pupil (the image of the iris as seen from behind the lens), as
shown on the diagram below. The actual iris diaphragm is positioned between these two pupils, which
usually also means between the two principal points.
The lower the F-number, the bigger the iris opening is, and that means more light is transmitted through
the lens. The lowest number for a particular lens is the number engraved (or written) on the lens itself,
representing the light-gathering ability of that lens.
Lenses with lower F-stops are often called faster lenses. The reason for this is that, in the early days
of photography, the more light a lens gathered (lower F-stop), the shorter the film exposure time could
be, thus allowing pictures of fast action to be taken without losing sharpness because of movement.
If a 16 mm lens has a minimum F-stop of 1.4, for example, it is usually written as 16 mm/1.4, or
sometimes as 16 mm 1:1.4. The maximum effective iris opening is equivalent to a circle with a diameter
of 16/1.4 = 11.43 mm. We say equivalent because the iris leaves usually form a triangular, square,
pentagonal, or hexagonal opening.
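Both forms of the relation, D = f/F and F = f/D, are one-liners. This Python sketch (with illustrative names) reproduces the 11.43 mm figure above.

```python
def effective_aperture_mm(focal_length_mm: float, f_number: float) -> float:
    """Diameter of the equivalent circular iris opening: D = f / F."""
    return focal_length_mm / f_number


def f_number(focal_length_mm: float, aperture_mm: float) -> float:
    """F-number from the focal length and the effective iris diameter: F = f / D."""
    return focal_length_mm / aperture_mm


print(f"{effective_aperture_mm(16.0, 1.4):.2f} mm")  # 11.43 mm for a 16 mm/1.4 lens
print(f"F-{f_number(16.0, 8.0):.1f}")                # an 8 mm opening on a 16 mm lens: F-2.0
```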
The iris position and size depend on the lens type and design.
In order to understand the sequence of the F-numbers, we will have to do some simple calculations.
Starting with the above example of a 16 mm/1.4 lens, let us find the area when the iris is fully open
(that is, at F-1.4):
$A_{1.4} = \pi (d/2)^2 = \pi (11.43/2)^2 \approx 3.14 \cdot 32.66 \approx 102.5\ \text{mm}^2$
Let us halve this area, that is, take 51.25 mm² as a new area, and calculate what the iris opening
diameter x would be:
$A_x = \pi (x/2)^2 \Rightarrow x = 2\sqrt{A_x/\pi} = 2\sqrt{51.25/\pi} \approx 8\ \text{mm}$
Now, the F-stop with an 8 mm iris opening would be 16/8 = 2, that is, F-2.
Here we have F-2 representing an area that is half that of F-1.4. If we proceed with the same
logic, we will get the following familiar numbers:
2.8; 4; 5.6; 8; 11; 16; 22; 32; etc.
All of these numbers are common to all types of lenses, and what they mean is that each successive
higher F-number transmits half the amount of light of the previous one.
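The calculation above generalizes: because the area of the opening goes with the square of its diameter, halving the light means dividing the diameter by √2, so each full stop multiplies the F-number by √2 ≈ 1.414. The minimal sketch below (illustrative names) regenerates the series for the 16 mm lens and shows the area halving at each step; the numbers marked on lenses are these values rounded.

```python
import math


def iris_area_mm2(focal_length_mm: float, f_number: float) -> float:
    """Area of the equivalent circular iris opening, pi * (D/2)**2 with D = f/F."""
    diameter = focal_length_mm / f_number
    return math.pi * (diameter / 2.0) ** 2


def full_stops(count: int) -> list:
    """Exact full F-stops: powers of sqrt(2), starting from F-1."""
    return [math.sqrt(2.0) ** k for k in range(count)]


stops = full_stops(11)  # F-1 ... F-32
for prev, cur in zip(stops, stops[1:]):
    ratio = iris_area_mm2(16.0, cur) / iris_area_mm2(16.0, prev)
    print(f"F-{cur:5.2f}: area {iris_area_mm2(16.0, cur):7.2f} mm^2 ({ratio:.2f} of the previous stop)")
```

With the exact value F-1.414 the full-open area comes out as about 100.5 mm², slightly less than the 102.5 mm² obtained above with the rounded F-1.4; the halving from stop to stop is the same either way.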
Now it should be much clearer why the same camera looks about twice as sensitive with a 16 mm/1.0 lens
as with a 16 mm/1.4 lens: F-1.0 is one full F-stop lower than F-1.4.
For zoom lenses, the F-number quoted refers to the iris opening at the shortest focal length of the
zoom lens. This is obviously the best “light-gathering number” of every zoom lens. The F-number of the
same zoom lens at a longer focal length setting (tele) is always higher than at the shorter end. But it
is wrong to assume a linear relation between the F-stop and the focal length. Namely, if an
8–80 mm/1.4 lens is in question, it has an 8/1.4 = 5.7 mm effective iris opening, and with the same
iris at 80 mm we would expect an F-stop of 80/5.7 = 14. This is simply not the case, because the result
depends on the zoom lens construction. The iris plane may vary in relation to the moving parts of the
zooming components, obeying a nonlinear law. In most cases we have much better F-stop values at the
higher focal length than this calculation indicates, but they are still worse than at the lower focal
length.
Vari-focal lenses have become very popular.
It is fair to say that every piece of glass, no matter how good it is, introduces some light loss. These
losses might be a very small percentage of the total light energy, but they should be considered if
accurate lens characteristics are required. A lens’s level of light transmission is indicated by its
transmittance factor, which is always less than 100%. This is why many professionals prefer to use
T-numbers instead of F-numbers.
The definition of a T-number takes both the F-stop and the lens transmittance into account:
T-number = 10 · F-number/√(Transmittance)
where the symbol √ means square root and the transmittance is expressed in percent. Since the
transmittance of a lens is, as mentioned, always less than 100% (usually 95 to 99%), the T-number will
always be a bit higher than the F-number. For example, if a 16 mm/1.4 lens has a transmittance of 96%,
the T-number will be 10 · 1.4/√96 ≈ 1.43.
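The formula above translates directly into code. This minimal sketch (the function name is illustrative) uses the same percent convention; dividing by √(percent) and multiplying by 10 is the same as dividing by the square root of the transmittance written as a fraction.

```python
import math


def t_number(f_number: float, transmittance_percent: float) -> float:
    """T-number = 10 * F-number / sqrt(transmittance in percent),
    which is the same as F-number / sqrt(transmittance as a fraction)."""
    if not 0.0 < transmittance_percent <= 100.0:
        raise ValueError("transmittance must be between 0 and 100 percent")
    return 10.0 * f_number / math.sqrt(transmittance_percent)


print(f"T-{t_number(1.4, 96.0):.2f}")   # the 16 mm/1.4 example with 96% transmittance: T-1.43
print(f"T-{t_number(1.4, 100.0):.2f}")  # a lossless lens would have T equal to F: T-1.40
```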
Depth of field
When a lens is focused on an object, theoretically, the whole plane passing through the object and
perpendicular to the optical axis should be in focus.
Practically, objects slightly in front of and behind the object in focus will also appear sharp. This “extra”
depth of sharpness is called depth of field.
A wide depth of field might be an undesired feature, for example, when we want an object we are
photographing to be isolated from the foreground and the background. This is very typical of portrait
shots taken with a telephoto lens, where the depth of field is very narrow.
In CCTV, however, we often want the opposite effect. We want to have as many objects in focus as
possible, no matter where the real focusing plane is.
The depth of field depends on the focal length of the lens, the F-stop, the format size of the lens
(2/3'', 1/2'', etc.), and the distance to the object in focus. A general rule is: the shorter the focal
length, the wider the depth of field; the higher the F-stop, the wider the depth of field; and the
smaller the lens format (for the same angle of view), the wider the depth of field.
The depth of field effect is explained by the so-called permissible circle of confusion. A point slightly
in front of or behind the plane in focus is projected onto the chip not as a point but as a small blurred
circle, called a circle of confusion. If the smallest picture element (pixel) of the CCD chip is equal to
or bigger than this circle, then it is obvious that we cannot see the blur, since we cannot see details
smaller than a pixel. In other words, all objects and their details whose circles of confusion fit within
a pixel will look equally sharp. From this it is clear that the size of the permissible circle of
confusion for a CCTV camera is determined by the pixel size of the CCD chip, in other words, by the chip
resolution.
It may now be understood why some short focal length lenses in CCTV, such as 2.6 or 3.5 mm, do not
have a focusing ring at all but only an iris adjustment. This is because even with the lowest F-stop
for that lens (be it 1.4 or 1.8) the depth of field is so wide that it actually shows sharp images from a
couple of centimeters in front of the lens up to infinity. There is literally no need for focusing.
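The rules above can be checked with the standard thin-lens depth-of-field formulas based on the hyperfocal distance H = f²/(N·c) + f, where N is the F-number and c the permissible circle of confusion. Following the text, c is taken as one pixel, here assumed to be 6.4 mm/500 for a 1/2'' chip; the lenses and the 5 m focus distance are illustrative. When a lens is focused at or beyond H, everything from about H/2 to infinity looks sharp, which is why very short focal length lenses can do without a focusing ring.

```python
import math


def hyperfocal_mm(f_mm: float, n: float, coc_mm: float) -> float:
    """Hyperfocal distance H = f^2 / (N * c) + f."""
    return f_mm * f_mm / (n * coc_mm) + f_mm


def dof_limits_mm(f_mm: float, n: float, coc_mm: float, focus_mm: float) -> tuple:
    """Near and far limits of acceptable sharpness (standard thin-lens formulas).

    coc_mm is the permissible circle of confusion, taken here as one pixel.
    """
    h = hyperfocal_mm(f_mm, n, coc_mm)
    near = focus_mm * (h - f_mm) / (h + focus_mm - 2.0 * f_mm)
    far = math.inf if focus_mm >= h else focus_mm * (h - f_mm) / (h - focus_mm)
    return near, far


PIXEL_MM = 6.4 / 500  # assumed: 500 pixels across a 1/2'' (6.4 mm) chip
for f, n in ((16.0, 1.4), (16.0, 8.0), (3.5, 1.4)):
    near, far = dof_limits_mm(f, n, PIXEL_MM, focus_mm=5000.0)
    print(f"f = {f} mm, F-{n}: sharp from {near / 1000:.2f} m to {far / 1000:.2f} m")
```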
As shall be explained later in the book, depth of field is an effect we should be very aware of,
especially when adjusting the so-called back-focus. If the back-focus is not adjusted properly, and a
camera is installed in daylight (that is, when the auto iris of the lens closes the iris as much as
possible because of the excessive light), the depth of field will produce sharpness even in areas that
are not really in focus.
Practical experience shows that depth of field applied in this way (when the back-focus is not done
correctly) is the biggest source of frustration for a 24-hour operating system. The reason is obvious: at
night, when the iris opens due to the low light level (provided the AI functions properly), the depth of
field narrows and the images appear out of focus even though they were sharp during the day. When an
operator complains to the installer or service people, not knowing the cause of such a problem, he or
she usually gets the service to visit during the daytime. Obviously, the problem will not be there then,
thanks to the wide daytime depth of field, and it reappears “inexplicably” at night.
The moral of the above is that the back-focus adjustment (discussed later in the book) should be done
with the iris fully open. The easiest way to get the iris fully open is to work in low light, either at
the end of the day (or at night), or by artificially reducing the daylight with external neutral density
filters (usually placed in front of the lens objective). All this is in order to reduce the depth of field
and consequently make the back-focus adjustment easier and more accurate.
Quite often, when B/W cameras with infrared lights are used, another effect is present. Because the
wavelength of infrared light is longer than that of visible light, it is refracted less, and the focused
image plane falls slightly behind the CCD chip. Refer to the section “Lenses as optical elements” for
further explanation of this phenomenon. If an image is sharp during the day, objects at the same distance
will be out of focus at night. This might be a quite noticeable and unwanted effect. In order to
minimize it, a lens should be designed with a special compensation for infrared viewing (some
manufacturers have special glass lenses for this purpose). However, a more practical and common solution
is to back-focus the camera at night with the infrared light on, in which case the depth of field is
minimal but the objects are in focus. During the day, the wider depth of field will extend the sharpness
to a wider area, compensating for the difference between the infrared and the normal light focus.
Neutral density (ND) filters
Earlier, when we discussed F-stops, we also mentioned some F-numbers – 1.4, 2, 2.8, 4, 5.6, 8, 11,
16, 22, 32, and so on. This list continues – 44, 64, 88, 128, and so on. The higher the F-number, the
smaller the iris opening is.
For photographic or movie film, F-32 is considered quite a high number. Film emulsion is not very
sensitive, so even on the sunniest days this F-stop, combined with the available shutter speeds, is
enough to compensate for the excessive light.
Film sensitivity is measured in ISO units, and the most common film we use for everyday purposes
has a sensitivity of 100 ISO units.
CCD chips are much more sensitive than a 100 ISO film, especially the B/W chips. Starting from known
light levels, the F-stop, and shutter speed of a photographic camera, the typical electronic exposure time
of a TV camera (1/50 s for CCIR/PAL), and the iris setting, we can calculate that a B/W CCD chip’s
sensitivity is close to the 100,000 ISO units mark. This is quite a high sensitivity.
Translated into everyday language, this means that CCD chips are so sensitive that low light levels are
not really the problem (although you will have a lot of customers asking you, “How sensitive is your
camera?”); the real problem is strong light.
Since television cameras use one exposure speed only, 1/50 s in CCIR and SECAM, and 1/60 s in NTSC
(not considering the CCD-iris cameras), we can only manipulate the F-stop to reduce the amount of light.
An average B/W CCD chip requires 0.1 lx at the chip to produce a full video signal. A bright sunny day
at the beach, or on the snow, produces more than 100,000 lx at the object. To reduce this to 0.1 lx, very
high F-stops, in the order of up to F-1200, need to be used. Using the basic definition for F-stop, for an
average 16 mm/1.4 lens, we will get F-1200 to be an effective iris opening of 16/1200 = 0.013 mm.
Mechanically, this is impossible to produce because of the very small size and precision required, but
also because with such a small iris we would introduce new problems such as edge diffraction of light
(known as the Fresnel effect), which will affect the picture quality.
The solution was found in the use of internal neutral density (ND) filters. These are very thin films
with circular, neutral color coatings, positioned in the middle of the lens, close to the iris plane. The
filters get less transparent toward the center of the concentric circles. The F-stop is thus achieved by
a combination of the mechanical iris (leaves) and the optical ND filter (optical attenuation). This is a
very simple and efficient way of battling strong light. The filters are called neutral because they
attenuate all wavelengths (colors) evenly, therefore not changing the color composition of the image.
The optical precision of such thin films is very important in order to preserve the lens’s MTF
characteristics as the F-stop increases. Theoretically, the resolving power of any lens is best in the
middle of the mechanical iris range, and it reduces as the F-stop goes lower or higher (this is
different from the depth of field effect), but the ND filters may reduce it even further. Whether or not
this will be obvious depends on the quality of the lens in general.
Apart from the internal ND filters, there are also external ND filters, which are not so sophisticated.
These are just precise semitransparent pieces of glass, or optical filters if you like, that attenuate
the light a certain number of times. This may be 10, 100, or 1000 times. Two or three of these can be
combined, so, for example, 10 with 1000 times will result in an ND filter with 10,000 times attenuation.
On-lens ND filters
Sometimes, and probably more correctly, the attenuation of external ND filters is expressed in F-stops.
Knowing that every next F-stop divides the light-gathering ability by 2 (50% of the previous number), we
can establish the following logic: a 100 times ND filter divides the light by 100, which lies between
$2^6 = 64$ and $2^7 = 128$. This means 100 times attenuation is approximately 6.5 F-stops. One thousand
times attenuation is close to $2^{10} = 1024$, which means approximately 10 F-stops.
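Converting an attenuation factor into F-stops is simply a base-2 logarithm, and stacking filters multiplies their factors (which adds their stops). A minimal sketch with illustrative names; note that the exact value for 100 times comes out at about 6.6 stops, close to the rounded 6.5 above.

```python
import math


def nd_stops(attenuation: float) -> float:
    """F-stops equivalent to an ND filter that divides the light by `attenuation`."""
    return math.log2(attenuation)


def stacked_attenuation(*factors: float) -> float:
    """Filters placed one behind the other multiply their attenuation factors."""
    return math.prod(factors)


for a in (10, 100, 1000, stacked_attenuation(10, 1000)):
    print(f"{a:>6}x attenuation = {nd_stops(a):.1f} F-stops")
```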
These types of ND filters are very handy, as already explained, for minimizing the depth of field for
the purposes of back-focus adjustments or AI level adjustments during the daytime.
Manual, auto, and motorized iris lenses
Manual iris (MI) lenses have an iris that is set by hand. These lenses are very common in
areas with constant light, such as shopping centers, underground car parks, and libraries. Basically,
these are areas where natural light does not noticeably affect the ambient lighting, so the artificial light
is almost constant. Any small variations are compensated for by the camera’s automatic
gain control (AGC).
With the introduction of CCD-iris cameras, however, fixed (manual) iris lenses are used in light-varying
areas as well, since the CCD electronic iris adjusts the exposure time, compensating for the light
variations.
Two major factors decide at what F-stop (iris) a manual iris lens should be set for optimum results:
• Light intensity
• Depth of field
These factors contradict each other, and that is why MI settings are always a compromise. When an MI
lens is used in very low light, or with cameras that are not very sensitive, the general tendency is to
open the iris (low F-number) as much as possible. Obviously, in such cases the depth of field will be
minimal, and, as explained previously, so will the MTF: the lens resolution at the lowest F-stop is
usually the poorest. A compromise is often the best solution (if the camera’s minimum illumination
characteristic allows for it), and the lens is set to one or two F-stops higher than the lowest (e.g.,
F-2, F-2.8).
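The suggested F-2 and F-2.8 settings follow from the standard F-number series, which grows by a factor of √2 per full stop (the light passed is proportional to 1/N²). The sketch below assumes a lens whose widest aperture is F-1.4, a hypothetical example, and prints the values one and two stops down. Lens barrels show rounded values, so F-1.98 is marked as F-2.

```python
import math


def stop_down(f_number: float, stops: float) -> float:
    """F-number after closing the iris by `stops` full stops.

    The light passed by the iris is proportional to 1 / N**2, so halving
    the light (one stop) multiplies the F-number N by sqrt(2).
    """
    return f_number * math.sqrt(2) ** stops


wide_open = 1.4  # assumed maximum aperture, for illustration only
for s in range(3):
    n = stop_down(wide_open, s)
    print(f"{s} stop(s) down: F-{n:.1f}, light {100 / 2 ** s:.0f}% of wide open")
```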
Auto iris (AI) lenses have electronic circuitry that processes the video signal coming out of the camera
and decides, on the basis of the video signal level, whether the iris should open or close.
Auto iris works as automatic electronic-optical feedback. If the video signal is too low, the electronics
tell the iris to open, and if it is too high, they tell it to close.
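The feedback principle can be illustrated with a toy simulation. Everything in it is an assumption for illustration: the camera output is modeled as scene_gain × aperture (aperture 1.0 means fully open), the target is a 0.7 V video level (the full picture range quoted in this chapter is 0 to 0.7 V), and on each step the iris corrects a fraction k of the error measured in stops. Real AI lenses do this continuously in analog electronics.

```python
def simulate_auto_iris(scene_gain, steps=30, target=0.7, k=0.3,
                       aperture=1.0, a_min=1e-3):
    """Toy auto-iris loop; returns a list of (aperture, video_level) pairs.

    scene_gain: hypothetical video level (V) the camera would produce with
    the iris fully open, so video = scene_gain * aperture.
    """
    history = []
    for _ in range(steps):
        video = scene_gain * aperture                # camera output, volts
        history.append((aperture, video))
        # Signal too high -> ratio < 1 -> iris closes; too low -> iris opens.
        # Correcting by a ratio (i.e., in stops) converges without overshoot
        # for any scene brightness as long as 0 < k < 1.
        aperture *= (target / video) ** k
        aperture = min(1.0, max(a_min, aperture))    # mechanical iris limits
    return history


for label, gain in (("dim scene", 0.5), ("bright scene", 20.0)):
    a, v = simulate_auto_iris(gain)[-1]
    print(f"{label}: aperture {a:.3f}, video {v:.2f} V")
```

In the dim scene the iris simply ends up fully open, because even then the signal stays below the target; in a real camera this is where the AGC would take over.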
In order to do this, the AI lens takes power from the camera (usually 9 V DC) as well as the video
signal, and it references the electronics of the lens and the camera with a third, common wire (called
zero, negative, or common). Quite often, you will find lenses with shielding as well. This protects
the video signal wire from strong external electromagnetic interference. Usually, the shielding does not
have to be connected to the camera body, because the connection is already made through the lens’s metal
ring when the lens is fitted on the camera. Keeping the AI cable as short as possible minimizes the unwanted
interference induced in the video signal, and this goes hand in hand with ever-decreasing
camera sizes. Be aware, however, of plastic C/CS-mount adaptors, which do not electrically connect the lens case
to the camera’s body.
The following color codes for the AI wires are widely accepted in the industry:
• Black is usually used for common,
• Red for power (derived from the camera), and
• White for video.
Some manufacturers, in order to lower manufacturing costs, have started using two-wire AI cables
(red-power and white-video) with the shield used as the common wire.
Often, lenses with four-wire cables can be found, where the fourth wire is usually green. In most
cases this is an unused wire, but in some lenses it offers remote control of the iris, usually known as
motorized iris (MRI) control. When such control is wanted, the iris opens and closes as instructed by
the voltage from a site driver (controlled by an operator), much in the same way as zoom and focus
are controlled.
The latter type of lens is the preferred one in systems with CCD-iris cameras. The reason for this is
that CCD-iris and auto iris do not work well together. If both are enabled, the electronic
iris usually works faster, and by the time the mechanical auto iris responds to the light fluctuations,
the electronic iris has already reduced the shutter exposure, forcing the auto iris to open more. The end
result is a widely opened iris and a very short electronic exposure. This gives a 1 Vpp output signal as
is expected, but the depth of field is minimal and vertical smearing is more noticeable because of
the very short exposure of the CCD chip.
Because of this, when auto iris lenses are used, it is suggested that the CCD-iris be switched off. The
electronic iris is, however, quicker and more reliable since there are no moving parts (only electronics),
although it does not control the depth of field.
So, to gain the benefits of both, motorized iris lenses are now recommended with CCD-iris cameras.
This can obviously be done only if a site driver with iris control is used. In such systems, operators
can adjust the iris according to the light level and the required depth of field, and they need to do so
only when drastic light changes occur.
The current consumption of the AI circuitry is usually below 30 mA, and it does not represent any
noticeable load on the camera power supply. Be aware, however, as mentioned earlier, that older lenses
(especially bigger zoom lenses) may demand more current drive, in which case (if a camera output
current is not sufficient), a separate 9 V DC power supply has to be used for the auto iris electronics
inside the lens.
Video- and DC-driven auto iris lenses
The classification of auto iris lenses gets a bit more confusing with respect to where the processing
circuitry is located. Apart from the “normal” AI lenses used in the majority of cases,
where the electronics are built into the lens itself and which we call video-driven AI lenses (since
they require a video signal from the camera), there are also so-called DC-driven AI lenses. These
lenses are similar to the video-driven ones, except that the processing electronics are not
inside the lens but inside the camera. The lens, in that case, has only the motor and the iris
mechanism. Clearly, when DC-driven lenses are used, the camera has to be designed to have such an
output. Instead of power, video, and common wires, there are power, DC level, and common
connections. These lenses are often called galvanometric auto iris lenses.
A DC-driven lens cannot be used on a camera that does not have that type of connector, and vice versa.
If a camera has a DC auto iris connector, you will usually find level and ALC adjustments (explained
in the following paragraphs) on the camera itself, instead of their being on the lens.
AI lenses, both fixed and zoom, have two potentiometers for adjusting the response and type of operation:
level and ALC (automatic light compensation). This also applies to DC-driven lenses, only in that case,
as mentioned above, the settings are on the camera itself.
Level adjusts the iris opening on the basis of the average level of the signal. The level is also known
as the sensitivity adjustment, because on the monitor screen it appears as a variation in the brightness of
the object. When the level potentiometer is adjusted, iris operation should be checked both by day and
by night. If the working point is shifted too high, the picture may look okay by day but very dark at night.
The opposite is also true: if the working point is shifted too low, it may be acceptable at night but too
bright in daylight. To make sure that this does not happen, the best adjustment is achieved in the late
afternoon with a little help from a torch. First, make sure the picture is as good at low light as it can
be (that is, iris fully opened). Then, shine the torch at the lens and see if the iris closes sufficiently to
show only the torch filament.
If tests cannot be conducted in the late afternoon, the alternative is to use some external ND filters.
These filters can be selected to attenuate the daylight to a level equivalent to a low light level situation,
which is usually a few lux. Then, instead of using a torch, all that is required is to remove the ND
filters and see whether and how the iris reacts.
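Choosing the filters for such a daytime test is simple arithmetic. The sketch below tries every combination of up to three of the 10, 100, and 1000 times filters described earlier and picks the one that brings the scene closest, in F-stops, to the target. The 20,000 lux daylight figure and the 2 lux target are assumptions for illustration; on site, the actual scene illumination should be measured.

```python
import itertools
import math


def pick_nd_stack(scene_lux, target_lux, filters=(10, 100, 1000), max_filters=3):
    """Return (filters_used, total_attenuation) that brings scene_lux closest
    to target_lux, comparing on a log2 scale (i.e., in F-stops)."""
    best = None
    for n in range(1, max_filters + 1):
        for combo in itertools.combinations(filters, n):
            attenuation = math.prod(combo)               # stacked filters multiply
            error_stops = abs(math.log2(scene_lux / attenuation / target_lux))
            if best is None or error_stops < best[0]:
                best = (error_stops, combo, attenuation)
    return best[1], best[2]


daylight_lux = 20_000   # assumed daytime illumination on the scene
night_lux = 2           # assumed low light level to simulate
combo, total = pick_nd_stack(daylight_lux, night_lux)
print(f"Use filters {combo}: {total} times, leaving about {daylight_lux / total:.0f} lux")
```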
ALC, as we have noted, stands for automatic light compensation. The ALC is a photometric adjustment
of the iris, and it should be thought of as “automatic backlight compensation.” The ALC part of
the auto iris circuit decides which portion of the video signal level the auto iris should react to. ALC
adjusts the video reference point for the iris operation depending on the picture contrast. In most cases,
when the signal is “rich” with details from the darkest to the brightest (0 to 0.7 V), the reference level
is in the middle. If very bright spots appear in the picture, they will
participate in the calculation of the reference point and will force
the auto iris to close to produce a video signal with “full dynamic”
range. The visual appearance then will be a high-contrast picture.
So, very bright objects (e.g., sun reflections, bright lights, windows
and similar) will force the iris to close, making the dark objects
even darker, sometimes too dark to distinguish any details. In such
situations we may change the ALC setting from the factory default
to the extreme position to make the iris disregard the bright areas
and open more than usual. This allows for the objects in shadow
to be more distinguishable.
ALC and level pots
This adjustment is equivalent to the backlight compensation found in many camcorders. The backlight
compensation is used, as the name suggests, to fight against the backlight. The idea is to tell the lens
electronics to disregard the very bright areas of the image and open the iris more in order to see details
of the darker objects in the foreground.
This is very useful when positioning the camera in hallways, for example, looking through glass doors
and against a bright background. If a person walks in the hallway, he or she will be a silhouette. When
the ALC is adjusted, the iris can be forced to open by one or two F-stops more, thus brightening the
face of the person. Similarly, the ALC can be adjusted to do the opposite job, that is, close the iris more
than it should in order to see details of the very bright background, as through the hallway door.
The ALC setting has two ends marked as Peak and Average. Toward Peak, the iris responds to the brightest
parts of the picture and closes more, as in the second example above; toward Average, it responds to the
average level and opens more, as in the first example. Factory defaults are usually in the middle of
these two positions. Please note that, in order to see the effects of the ALC adjustments, a very high-contrast scene is needed.
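The idea of a reference level that can be biased toward the brightest parts of the picture or toward its average can be shown numerically. In this simplified model (not the circuit of any real lens), the iris scales the whole picture so that a weighted mix of the peak and average levels equals 0.7 V; a weight of 1.0 means the iris reacts only to the peak level and 0.0 means it reacts only to the average. The scene values are invented: eight dark foreground areas at 0.1 V and two bright window areas at 2.0 V, measured at the current iris setting.

```python
def alc_scale(levels, weight, target=0.7):
    """Factor by which the iris changes the light so that the ALC-weighted
    level equals target.

    levels: hypothetical video levels (V) of picture areas at the current iris
    weight: 1.0 = react to the peak level, 0.0 = react to the average level
    """
    peak = max(levels)
    average = sum(levels) / len(levels)
    measured = weight * peak + (1 - weight) * average
    return target / measured


scene = [0.1] * 8 + [2.0] * 2   # dark foreground plus two bright window areas
for w in (1.0, 0.5, 0.0):
    s = alc_scale(scene, w)
    print(f"weight {w:.1f}: foreground {0.1 * s:.3f} V, window {2.0 * s:.2f} V")
```

Reacting to the peak holds the window at 0.7 V and turns the foreground into a near silhouette; reacting to the average makes the foreground about four times brighter, while the window is pushed well past the 0.7 V white level (a real camera would clip it).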
Auto iris lens electronics
Just as the optical quality of a lens cannot be taken for granted, neither can the electronics of an auto
iris lens. Different circuit designs offer different quality and precision of operation. This, combined
with the mechanical construction of the iris shutter, determines whether a lens is good, average, or bad.
The response of the iris to abrupt light changes is not instant; it takes anywhere from half a
second to two seconds. This needs to be taken into account when adjusting the level and/or ALC settings
on a lens. The delay depends on the feedback loop, that is, on the combination of electronics and mechanics. The
lens electronics have their own automatic gain control (AGC), but how effectively this combination works also depends
on the camera’s electronics, including the camera’s own AGC.
The combination of the two may produce oscillation in the auto iris operation,
usually called ringing or hunting. The ringing appears as a pulsating picture, depending on the
camera viewing direction and light conditions. It is especially common when looking against strong
light. Adjusting the level is usually sufficient to minimize it; sometimes the ALC, or both, must be adjusted. There are
unfortunate camera/lens combinations, however, where ringing cannot be eliminated. The solution is
usually to replace the lens with one of another brand. Some newer auto iris lenses come with
an additional potentiometer for adjusting the level of the lens’s AGC.
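Why such a loop can ring is easiest to see in a simulation. In this toy model the iris reacts to the video level measured `delay` steps earlier, standing in for the half-second to two-second response of the iris. The scene gain, the loop gain k, and the step count are all assumptions for illustration. With no delay, even a large k settles; with the same k and a delay, the iris overshoots and keeps swinging between fully open and far too closed; reducing k settles it again.

```python
def simulate_hunting(scene_gain, k, delay, steps=60, target=0.7, a_min=1e-3):
    """Toy auto-iris loop in which the iris reacts to the video level measured
    `delay` steps earlier. Returns the video level (V) at each step."""
    aperture = 1.0
    past = [scene_gain * aperture] * (delay + 1)   # levels before the first step
    levels = []
    for _ in range(steps):
        video = scene_gain * aperture
        levels.append(video)
        past.append(video)
        seen = past[-1 - delay]                    # the delayed measurement
        aperture *= (target / seen) ** k           # correction in stops
        aperture = min(1.0, max(a_min, aperture))  # iris fully open / closed
    return levels


for k, delay in ((1.0, 0), (1.0, 2), (0.2, 2)):
    tail = simulate_hunting(5.0, k, delay)[-16:]
    print(f"k={k}, delay={delay}: video {min(tail):.2f} to {max(tail):.2f} V")
```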
As mentioned earlier, the auto iris lens cable is usually protected with shielding that is often not
connected at the camera’s auto iris connector. The shielding’s purpose is to protect the video signal wire from picking up
noise. For it to be effective, it is sufficient for one end of the shielding to be connected to the
common of the signal electronics, which happens to be done through the lens body (the C- or CS-mount
ring) and the camera C-mount thread. With camera miniaturization, the cables are getting shorter, further
minimizing the risk of unwanted external noise interfering with the operation.
Finally, let’s remember that the AI current consumption is very low, usually below 30 mA.
Image and lens formats in CCTV
A lens sees objects with the same angle of vision in all directions, that is, the angle of vision has a
conical shape. Therefore, the image area projected by a lens has a circular shape, but the camera’s
sensitive area (CCD chip in our case) is a rectangle within the imaging circle.
In today’s television, this rectangle has an aspect ratio of 4:3, that is, 4 units in
width by 3 units in height. As mentioned at the beginning of the book, this aspect ratio was adopted from the
film format used in the early days of television.
The all-new high-definition television (HDTV) system,
whose basic standard has already been accepted, has
an aspect ratio of 16:9. The idea is to have better movie
reproduction.
The “imaging rectangles” are within the image circles,
which have all (or at least the majority) of the aberrations
corrected.
There is no point in making a lens that produces a much
bigger image circle than is required. Therefore, the lenses
are made to suit the image format, no less and no more.
There are exceptions, such as when lenses made for other
purposes, photography, for example, are used on a CCTV
camera with a special C-mount adaptor.
Today in CCTV, we have quite a few different chip sizes:
2/3'', 1/2'', 1/3'', and 1/4''. High-definition cameras and some
special-application cameras may have 1'' or even larger chip sizes. In order to understand this variety,
we should know a little bit of the history of TV.
The very first TV cameras used imaging tubes of a certain diameter and were referred to as, for example,
1'' Vidicon or 2/3'' Newvicon cameras. These dimensions referred to the actual diameter of the imaging
tube. The imaging area is a rectangle with a 4:3 aspect ratio, and its diagonal is smaller than the
tube diameter, mainly because of the size of the tube’s photosensitive area (called the target):
when the electron beam scans the imaging area, it does not go to the edges of the tube. Therefore, a
2/3'' tube camera has an imaging area, scanned by the electron beam, of approximately 8.8 × 6.6 mm.
This area gives a diagonal of approximately 11 mm, which is not equal to 2/3'' (about 17 mm
when converted into millimeters). So do not assume that CCD chip sizes are expressed as with TV screens,
where the CRT size is its diagonal.
When we say a 2/3'' CCD chip, we are really referring to a device that has an imaging area equal
to what a 2/3'' tube would have.
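The gap between the inch name and the real imaging area is easy to tabulate. The 2/3'' figures below are the ones quoted above; the 1/2'', 1/3'', and 1/4'' figures are the commonly published nominal imaging areas for those formats (the 1/2'' area is also given later in this chapter). Treat them as nominal values, since individual chips vary slightly.

```python
import math

MM_PER_INCH = 25.4

# format name: (nominal size in inches, imaging width mm, imaging height mm)
FORMATS = {
    "2/3''": (2 / 3, 8.8, 6.6),
    "1/2''": (1 / 2, 6.4, 4.8),
    "1/3''": (1 / 3, 4.8, 3.6),
    "1/4''": (1 / 4, 3.2, 2.4),
}

for name, (inches, width, height) in FORMATS.items():
    diagonal = math.hypot(width, height)   # 4:3 rectangle, so diagonal = 5/4 width
    nominal = inches * MM_PER_INCH          # what the "inch" name would suggest
    print(f"{name}: {width} x {height} mm, diagonal {diagonal:.1f} mm, "
          f"nominal {nominal:.1f} mm, ratio {diagonal / nominal:.2f}")
```

In every case the diagonal is only about two thirds of the inch figure, which is why these names are best read as format labels rather than measurements.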
When the first CCTV CCD cameras were made, the common tube size was 2/3''. The image area of
such tubes, as mentioned previously, was 8.8 × 6.6 mm, so the CCD chips designed in those days were
of the same imaging area size and they were called 2/3'' chips. The idea was to use the same lenses as
tube cameras did.
Various CCD chip sizes (actual size)
With the evolution of technology, CCDs were getting smaller, and the new chip size called 1/2'' measured
an imaging area of only 6.4 × 4.8 mm. The compatibility with the 2/3'' lenses was preserved (using the
same C-mount), but of course, the angle of view changed: it got smaller compared to when the same
type of lens was used on a 2/3'' camera.
So, new lenses were designed for the 1/2'' chips, which did not project as big an image as with 2/3'' chips.
In other words, owing to this reduction of the imaging area, lenses were designed to have the desired
focal length but with a smaller imaging circle projected, that is, a circle with a diameter sufficient to
cover a 1/2'' chip but not necessarily a 2/3'' chip. These new lenses are called 1/2'' lenses. They still have the
C-mount ring, but they are smaller, and consequently cheaper, than their 2/3'' counterparts.
The same development is now happening with 1/3'' chips, where 1/3'' lenses are made to produce an
image circle sufficient in diameter to cover only the 1/3'' chips.
An obvious problem that will occur if a 1/3'' lens is used on a 1/2'' chip is that the image corners will
be cut off (imagine a rectangle and a circle with a smaller diameter drawn inside).
The same applies when a 1/2'' lens is used on a 2/3'' chip. There is no problem, however, if a bigger
lens is used on a smaller chip. Since a lens of a bigger format will project an image circle much larger
than the actual chip size, there will be no corners cut off or any other deformation.
It should be taken into consideration, however, that the reduction in the imaging pickup area may result
in a relative resolution reduction, since a smaller area is used (see the discussion on MTF and CTF).
In addition, the excess light falling around the chip (when a larger format lens is used) may get reflected
inside the lens and CCD block, so if there are surfaces that are insufficiently neutralized with a black
matte finish, the usable image will be affected.
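The corner cut-off rule can be expressed as a simple comparison. The sketch assumes that a lens designed for a given format projects an image circle whose diameter just equals that format’s diagonal (real lenses usually leave some margin), and it uses nominal imaging areas of 8.8 × 6.6, 6.4 × 4.8, 4.8 × 3.6, and 3.2 × 2.4 mm. A lens covers a chip when its image circle is at least as large as the chip’s diagonal.

```python
import math

# Nominal imaging areas (width, height) in mm.
CHIP_AREAS = {
    "2/3''": (8.8, 6.6),
    "1/2''": (6.4, 4.8),
    "1/3''": (4.8, 3.6),
    "1/4''": (3.2, 2.4),
}


def diagonal_mm(fmt: str) -> float:
    width, height = CHIP_AREAS[fmt]
    return math.hypot(width, height)


def lens_covers_chip(lens_format: str, chip_format: str) -> bool:
    """Assumes the lens image circle diameter equals its own format's diagonal."""
    return diagonal_mm(lens_format) >= diagonal_mm(chip_format)


for lens in CHIP_AREAS:
    for chip in CHIP_AREAS:
        verdict = "OK" if lens_covers_chip(lens, chip) else "corners cut off"
        print(f"{lens} lens on {chip} chip: {verdict}")
```

The check answers only the coverage question; a larger-format lens on a smaller chip still narrows the angle of view and may add stray light, as described above.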
Angles of view and how to determine them
Different focal length lenses give different angles of view.
We quite often use the horizontal angle of view as a reference, since the vertical can be found from it:
the video signal aspect ratio is 4:3, and the horizontal and vertical angles of view are in approximately
the same ratio.
There are some very basic rules to follow when analyzing the angles of view:
• The shorter the focal length, the wider the angle of view is.
• The longer the focal length, the narrower the angle of view is.
• The smaller the CCD chip, the narrower the angle of view (with the same lens) is.
• The vertical angle of view can be easily determined if the horizontal is known.
As mentioned earlier, approximately 30° is considered a standard angle of view for whatever size
the image format is. Just to refresh our memory, 30° is taken as standard because it corresponds to
our perspective impression and what the human eye sees as normal.
The following are image formats with their corresponding standard lenses for a 30° horizontal angle
of view:
1'' = 25 mm
2/3'' = 16 mm
1/2'' = 12 mm
1/3'' = 8 mm
1/4'' = 6 mm
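These pairs can be checked with the similar-triangles geometry used later for lens selection: a chip of width w behind a lens of focal length f sees a horizontal angle of 2·arctan(w / 2f). The sketch assumes a distortion-free lens focused on a distant object and uses nominal chip widths of 12.8, 8.8, 6.4, 4.8, and 3.2 mm; the 12.8 mm figure for 1'' is the commonly published value and is not given in this chapter.

```python
import math

# Nominal chip width (mm) and the "standard" lens listed for each format.
STANDARD_LENSES = {
    "1''": (12.8, 25),
    "2/3''": (8.8, 16),
    "1/2''": (6.4, 12),
    "1/3''": (4.8, 8),
    "1/4''": (3.2, 6),
}


def horizontal_angle_deg(chip_width_mm: float, focal_mm: float) -> float:
    """Angle of view for a distortion-free lens focused far away."""
    return math.degrees(2 * math.atan(chip_width_mm / (2 * focal_mm)))


for fmt, (width, focal) in STANDARD_LENSES.items():
    print(f"{fmt} with {focal} mm: {horizontal_angle_deg(width, focal):.1f} deg")

# The same 16 mm lens on a smaller 1/2'' chip: the angle of view narrows.
print(f"16 mm on 1/2'': {horizontal_angle_deg(6.4, 16):.1f} deg")
```

All five pairs come out between roughly 29° and 33°, which is what “approximately 30°” means in practice, and the 16 mm lens drops to about 23° on the smaller chip.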
In CCTV, the widest angle of view that manufacturers offer is approximately 94°, which is achieved
with 4.8 mm for a 2/3'' CCD camera, 3.5 mm for a 1/2'', and 2.8 mm for a 1/3''.
Some unique “fish-eye” lenses offering almost a 180° angle of view are available, but these are very
specialized and show only a circular (thus the name “fish-eye”) image on the screen (within the CCD
chip image area).
Lenses do come in discrete values; that is, one cannot order any value one wants, such as 5.8 mm or
14 mm. So it is useful to know the most common focal length lenses:
2.6 mm, 3.5 mm, 4.8 mm, 6 mm, 8 mm, 12 mm, 16 mm, 25 mm, 50 mm, and 75 mm
You may find some manufacturers have 3.7 mm instead of 3.5 mm, or 5.6 mm instead of 6 mm, but
the values are very close and there is practically no difference in the angle of view.
The above values have horizontal angles of view that differ, more or less, in steps of 10°–15° from one
to the next. These are quite sufficient to cover all practical situations, but should you really require a
special focal length that is not listed above, inquire at your supplier as some manufacturers do have
manually variable-focus lenses (both MI and AI) where the focal length can be varied from 6–12 mm
or perhaps from 8–16 mm. The optical quality of such lenses, however, is not as good as that of fixed
lenses, due to the limited precision and simplicity of the moving mechanics. But again, the quality in
most cases goes with the price.
What focal length lens should be used for a particular application? This is probably the most
commonly asked question when designing a CCTV system. Many techniques can be used to determine
the angles of coverage, and which one you are going to use is entirely up to you, as long as the result
is what your customer will be happy with.
The practical methods are as follows:
• Viewfinder calculator. This is usually a circular-shaped calculator, supplied by the lens
manufacturers (ask your supplier for one), where, in order to find the lens, three things need to
be known: the CCD chip size, the distance between the camera and the object, and the width of
the object. By setting these three values, the calculator gives you the focal length in mm.
There are also ruler-shaped calculators based on the same concept.
• Optical viewfinder. This device looks like a zoom lens, but it is used not on a camera but by eye.
When you are on site, you can manually zoom in and out and set the view to what your customer
requires. A scale indicator on the viewfinder
shows the focal length of the lens that will
give you the same view on the particular
type of camera (2/3'', 1/2'', or 1/3''). In order
to see the same view that the camera would
see, you have to position yourself close to
where the camera would be installed. One
little drawback with this instrument is that
you cannot see the very wide angles; most
of the optical viewfinders only show focal
lengths down to 6 mm.
• Camcorder with a zoom lens. This is quite a simple and practical method, especially these days
when we have such a huge choice of camcorders with built-in zoom lenses. We need to know
the chip size in the camcorder in order to refer to the same-size CCTV camera, or substitute it
accordingly. Obviously, it is good to have a camcorder with a wide range of zooming, but more
importantly the lens should have an indicator of each focal length at its corresponding position.
When we go on site, we have the added advantage of showing our customer what the options
are, and we can record and document what he or she chooses.
• A simple lens formula. This seems the most complicated way of determining angles of view,
but it is actually the simplest. This formula uses the similarity of triangles, as shown in the figure
below. It is easy to understand and can therefore be easily reproduced whenever necessary. The
only thing you need to memorize is the CCD chip width of the most commonly used cameras:
8.8 mm for 2/3'', 6.4 mm for 1/2'', 4.8 mm for 1/3'', and 3.2 mm for a 1/4'' chip.
This formula gives you the focal length of the lens directly in millimeters:
$$f = \frac{w_{CCD} \cdot d}{w_{object}}$$
where $f$ is the lens focal length we are looking for (in mm), $w_{CCD}$ is the CCD chip width (in mm),
$d$ is the distance from the camera to the object (in meters), and $w_{object}$ is the width of the object
we wish to view (in meters).
The same formula can be used to find the focal length needed to see a certain object’s height, in which
case instead of $w_{CCD}$ and $w_{object}$ we work with $h_{CCD}$ and $h_{object}$, where $h$ stands for height.
• A more complicated formula. This formula gives the resulting angle of view in degrees. It is
based on elementary trigonometry and requires a scientific calculator or trigonometric tables.
$$\alpha = 2 \arctan\left(\frac{w_{object}}{2d}\right)$$
where $\alpha$ is the angle of view (in degrees), arctan is the inverse tangent trigonometric function
(you need a scientific calculator for this), which is sometimes written as $\tan^{-1}$, $w_{object}$ is the object
width (in meters), and $d$ is the distance (in meters) to the object the camera is looking at.
• A table and/or graph. This is easy to use as it does not require any calculations; however, it
requires a table or graph to be handy. The table on the next page gives only the horizontal angle
of view for a given lens, because this is most commonly required. Vertical angles can be found
approximately by applying the aspect ratio rule, that is, divide the horizontal angle by 4 and then
multiply it by 3. The approximation is good for narrow and normal angles but underestimates the
vertical angle of very wide lenses.
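The two formulas above, plus an exact version of the vertical-angle rule, fit in a few lines of Python. The example assumes a 1/3'' camera (4.8 mm chip width) that must show a 4 m wide area 20 m away; those numbers are illustrative only. The exact vertical angle scales the tangent of the half-angle by 3/4 instead of scaling the angle itself, which is why the simple 3/4 rule drifts for wide lenses.

```python
import math


def focal_length_mm(chip_width_mm, distance_m, object_width_m):
    """f = w_CCD * d / w_object (similar triangles); the result is in mm."""
    return chip_width_mm * distance_m / object_width_m


def horizontal_angle_deg(object_width_m, distance_m):
    """alpha = 2 * arctan(w_object / (2 d)), in degrees."""
    return math.degrees(2 * math.atan(object_width_m / (2 * distance_m)))


def vertical_angle_deg(horizontal_deg):
    """Exact vertical angle of a 4:3 image with the given horizontal angle."""
    half = math.radians(horizontal_deg) / 2
    return math.degrees(2 * math.atan(0.75 * math.tan(half)))


f = focal_length_mm(4.8, 20, 4)      # 1/3'' chip, 20 m away, 4 m wide scene
h = horizontal_angle_deg(4, 20)
print(f"focal length {f:.1f} mm")
print(f"horizontal {h:.1f} deg, vertical {vertical_angle_deg(h):.1f} deg "
      f"(3/4 rule: {h * 3 / 4:.1f} deg)")
print(f"90 deg lens: vertical {vertical_angle_deg(90):.1f} deg vs 3/4 rule 67.5 deg")
```

Feeding the 24 mm result back as 2·arctan(4.8 / 48) gives the same 11.4° as the object-based formula, which is a quick consistency check between the two methods.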
In all of the above methods, we have to take into account monitor overscanning as well. In other words,
most monitors do not show 100% of what the camera sees. Typically, about 10% of the picture is hidden
by the monitor’s overscanning. The viewfinder calculator may allow for this 10%.
Some professional monitors offer the underscanning feature. If you get hold of such a monitor you can
use it to determine the amount of overscanning by the normal monitor. This is very important to know
when performing camera resolution tests, as will be described later.
Fixed focal length lenses
There are two basic types of lenses (in respect to focal length) used in CCTV: fixed focal length and
variable focal length (often called zoom) lenses.
Fixed focal length lenses, as the name suggests, are designed with a fixed focal length, that is, giving
only one angle of view. Such lenses are usually designed to have minimum aberrations and maximum
resolution, so there are not many moving optical parts, except the focusing group.
The quality of a lens depends on many factors, of which the most important are the materials used (the
type of glass, mechanical assembly, gears, etc.), the processing technology, and the design itself.
When manufacturers produce a certain type of lens, they have in mind its application and use. The lens
quality aimed for is dictated by practical and market requirements. As mentioned previously, when
MTF and CTF were discussed, there is no need to go to the technical limits of precision and quality
(and consequently increase the cost) if that cannot be seen by the imaging device (CCD chips in this
case). This, however, does not mean that there is no difference among different makes and models of
the same focal length. Usually, the price goes hand in hand with the quality.
More than two decades ago, when 1'' tube cameras were used, 25 mm lenses offered a normal angle
of view (approximately 30° horizontal angle).
With the evolution of the formats (that is, with their
reduction), the focal length for the normal angle of
view was reduced, too. The mounting thread, however,
for compatibility purposes, remained the same.
With the C-mount format this thread was defined
as 1"-32UN-2A, which means it is 1" in diameter
with 32 threads/inch. When the new and smaller CS
format was introduced, the same thread was again kept
for compatibility, although the back-flange distance
was changed. This will be explained later in this chapter.
In respect to the iris, there are two major groups of fixed focal
length lenses: manual iris (MI) and automatic iris lenses (AI),
and these were described under the previous heading.
Finally, let us mention the vari-focal group of lenses. These
lenses should be classified as fixed focal length lenses,
because once they are manually set to a certain angle of
view (focal length) they have to be re-focused, unlike zoom
lenses, which, once focused, stay in focus if the angle of view
is changed.
Vari-focal lenses can be classified as manually adjustable fixed focal length lenses
Zoom lenses
In the very early days of television, when a cameraman needed a different focal length lens, he would
use a specially designed barrel, fitted with a number of fixed lenses that rotated in front of the camera.
Different focal lengths were selected from this group of fixed lenses.
This concept, though practical compared to manually changing the lenses, lacked continuity of
focal length selection, and, more importantly, optical blanking was unavoidable when a selection was being made.
That is why optical engineers had to come up with a design for a continuous focal length variation
mechanism, which got the popular name zoom. The zoom lens concept lies in the simultaneous movement
of a few groups of lenses. The movement path is obviously along the optical axis but with an optically
precise and nonlinear correlation. This makes not only the optical but also the mechanical design
very complicated and sensitive. It has, however, been accomplished, and as we all know today, zoom
lenses are very popular and practical in both CCTV and broadcast television.
With a special barrel cam mechanism, usually two groups of lenses (one called variator and the
other compensator) are moved in relation to each other so that the zooming effect is achieved while
preserving the focus at an object. As you can imagine, the mechanical precision and durability of the
moving parts are especially important for a successful zooming function.
For many perfectionists in photography, zoom lenses will never be as good as fixed ones. In the absolute
sense of the word this is very true, because the moving parts of a zoom lens must always have some
tolerance in their mechanical manufacture, which introduces more aberrations than a fixed lens
design has. Hence, the absolute optical quality of a certain focal length setting in a zoom lens can never
be as good as that of a well-designed
fixed lens of the same focal
length.
For CCTV applications,
however, where the CCD chip
resolution is nowhere near a
film structure, compromises
are possible with good results.
Continuous variation of angles
of views, without the need
to physically swap lenses, is
extremely useful and practical.
This is especially the case
where cameras are mounted
in fixed locations (as on a pole
or on top of a building) and
resolution requirements are not
as high as with film cameras.
Zoom lenses have a complex but very precise movement
of their optical elements.
It should not be ruled out, however, that as they evolve, zoom lenses will come very close to the
optical quality of fixed ones.
Zoom lenses are usually described by their zoom ratio. This is the ratio of the focal length at the
telephoto end of the zoom to the focal length at the wide-angle end. Usually, the telephoto angle
is narrower than the standard angle of vision, and the wide angle is wider than the standard angle of
vision. Since the telephoto end always has a longer focal length than the wide-angle end, the ratio is a
number larger than one.
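As a worked example, the sketch below takes an 8–80 mm lens (one of the 10× models listed below) on a 1/2'' camera, assuming the nominal 6.4 mm chip width, and prints the zoom ratio together with the horizontal angles at both ends, using the distortion-free approximation 2·arctan(w / 2f).

```python
import math


def zoom_ratio(wide_mm: float, tele_mm: float) -> float:
    """Telephoto focal length divided by wide-angle focal length."""
    return tele_mm / wide_mm


def angle_deg(chip_width_mm: float, focal_mm: float) -> float:
    """Horizontal angle of view for a distortion-free lens focused far away."""
    return math.degrees(2 * math.atan(chip_width_mm / (2 * focal_mm)))


wide, tele, chip = 8.0, 80.0, 6.4   # 8-80 mm zoom on a 1/2'' chip (6.4 mm wide)
print(f"zoom ratio {zoom_ratio(wide, tele):.0f}x")
print(f"wide end {angle_deg(chip, wide):.1f} deg, tele end {angle_deg(chip, tele):.1f} deg")
```

The angles come out at about 43.6° and 4.6°; their ratio is a little under 10 because the angle of view does not shrink exactly in proportion to the focal length.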
The most popular zoom lenses used in CCTV are:
• 6×: Six times, with 6–36 mm, 8–48 mm, 8.5–51 mm, and 12.5–75 mm being the most common examples.
• 10×: Ten times, with 6–60 mm, 8–80 mm, 10–100 mm, 11–110 mm, and 16–160 mm as the
most common examples.
• 15×: Fifteen times, with 6–90 mm and 8–120 mm as common examples.
Other ratios are available, such as 20×, or even 44× and 55×, but they are much more expensive and,
therefore, not very common.
In the last five to ten years, the miniature PTZ domes have
become very popular. Most of them have integral zoom
lenses with an optical zoom ratio of 12×, 16×, or even
18×. Usually, digital zooming of at least half a
dozen times is added onto this, which makes these little
domes extremely powerful. The digital zooming is not the
same as optical, but in some cases it may help see distant
objects a bit better. These PTZ dome cameras can have such
powerful optical zooming, and yet look so small (the typical
diameter of a PTZ dome module is around 12 cm), because
they are based on 1/4" CCD chips. The smaller the chip,
the smaller the optics. This is one of the main reasons for
chip reduction, besides the manufacturing cost. It should be
made quite clear that the precision of manufacturing zoom
lenses for 1/4" CCD chips is more demanding because of
the miniaturization.
Zoom lenses are also characterized by their F-stop (or T-number). In zoom lenses, the F-stop (as already mentioned when F-numbers were discussed) refers to the shortest focal length. For example, for an 8–80 mm/1.8 lens, F-1.8 refers to the 8 mm end. The F-stop is not constant throughout the zoom range. It usually stays the same as the focal length increases, but only up to a certain focal length. Beyond that point a so-called F-drop occurs. The focal length at which the F-drop occurs depends on the lens construction. The general rule, however, is that the smaller the entrance (front) lens, the higher the likelihood of an F-drop. This is one of the main reasons why lenses with larger zoom ratios need larger front lens elements (called the objective), so that the F-drop is kept to a minimum.
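The F-drop can be illustrated with the definition N = f / D, where D is the entrance pupil diameter. The sketch below uses a deliberately simplified model. At the rated F-number, D grows in proportion to focal length until it reaches a maximum set by the front elements. After that, D stays fixed, so N rises as f increases. The 25 mm and 40 mm pupil limits are made-up numbers for illustration, not data for any real lens.

```python
def effective_f_number(focal_mm: float, rated_f_number: float, max_pupil_mm: float) -> float:
    """Simplified F-drop model based on N = f / D.

    Up to the F-drop point the lens keeps its rated F-number, so the entrance pupil D
    grows with focal length. Once D reaches max_pupil_mm (set by the size of the front
    elements), D stops growing and the F-number rises with focal length."""
    pupil_mm = min(focal_mm / rated_f_number, max_pupil_mm)
    return focal_mm / pupil_mm


def f_drop_focal_length(rated_f_number: float, max_pupil_mm: float) -> float:
    """Focal length at which the pupil limit is reached and the F-drop begins."""
    return rated_f_number * max_pupil_mm


# Hypothetical 8-80 mm/1.8 lens with two different front element sizes.
for max_pupil in (25.0, 40.0):
    start = f_drop_focal_length(1.8, max_pupil)
    tele_n = effective_f_number(80.0, 1.8, max_pupil)
    print(f"pupil limit {max_pupil:g} mm: F-drop from {start:g} mm, F/{tele_n:.1f} at 80 mm")
```

In this toy model, the larger front element pushes the F-drop point toward the telephoto end and gives a lower F-number at 80 mm. This matches the rule of thumb stated above.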
Zoom lenses, like fixed lenses, come with manual iris, automatic iris, or motorized iris. Auto iris (AI) was explained in the previous section on fixed lenses, but zoom lenses have an additional and common subgroup with motorized iris, so we will go through the subject again.
A manual iris zoom lens would have an iris ring, which is set manually by the installer or by the
user. This is a very rare type of lens in CCTV, and it is used in special situations, such as when doing
demonstrations or camera testing.
An automatic iris zoom lens, often called auto iris (AI), is the most common type of zoom lens. This lens has an electronic circuit inside, which acts as an electro-optical feedback loop. It is usually connected to the back of the camera, which supplies its power (9 V DC) and its video signal. The lens's electronics then analyze the video signal level and act accordingly. If the signal exceeds the video level of 0.7 V, the lens closes the iris until a 0.7 V signal is obtained from the camera AI terminal. If, however, the signal is very low, the iris opens to let more light in, which raises the video level.
Auto iris connection of a zoom lens
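The feedback principle can be sketched as a simple proportional control loop. At each step the iris opens or closes by an amount proportional to the difference between the measured video level and the 0.7 V reference. The camera model here is hypothetical: video level = scene light × iris opening, clipped at 1.0 V, with `scene_light` expressed as the level the scene would give with the iris fully open. The gain value is arbitrary. Real AI lens electronics are analog circuits and are not implemented this way. The sketch shows only the logic.

```python
REFERENCE_LEVEL_V = 0.7  # video level the AI electronics aim for (from the text)


def video_level(scene_light: float, aperture: float) -> float:
    """Hypothetical camera: output level (V) proportional to scene light x iris opening,
    clipped at 1.0 V. aperture runs from 0.0 (closed) to 1.0 (fully open)."""
    return min(1.0, scene_light * aperture)


def settle_iris(scene_light: float, aperture: float = 0.5, gain: float = 0.5, steps: int = 50) -> float:
    """Proportional feedback: open the iris when the level is below 0.7 V, close it when above."""
    for _ in range(steps):
        error = REFERENCE_LEVEL_V - video_level(scene_light, aperture)
        aperture = min(1.0, max(0.0, aperture + gain * error))
    return aperture


for light in (0.3, 1.0, 2.0):
    a = settle_iris(light)
    print(f"scene light {light}: iris {a:.2f}, video {video_level(light, a):.2f} V")
```

In the dark scene (0.3) the loop can only open the iris fully, and the video level stays below 0.7 V. In the brighter scenes it settles at 0.7 V. In this toy model a much larger gain makes the loop overshoot and oscillate around the target instead of settling.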
Two adjustments are available for this type of lens (as with fixed lenses): level and ALC.
Level, as the name indicates, adjusts the reference level of the video signal that the lens electronics use to open or close the iris. This affects the brightness of the video signal. If the level is not adjusted properly (that is, with adequate sensitivity for both daylight and low-light situations), the day and night video signals will differ greatly. Obviously, the camera sensitivity has to be taken into account when adjusting the iris level for low light level situations.
ALC adjustment refers to the automatic light compensation of the iris. It is in fact very similar to the backlight compensation (BLC) found in many camcorders (as we have already explained in the fixed lenses section). This light compensation is usually applied when looking at scenes with very high contrast. The idea behind BLC operation is to open the iris more (even if there is a lot of light in the background) so that details of the objects in the foreground can be seen. A typical example is a camera looking along a hallway (with a lot of light in the background) to see the face of a person coming toward it. With a normal lens setting, the face of the person will appear very dark, because the background light will cause the iris to close. A proper ALC setting can compensate for such difficult lighting conditions. The bright background in this example will turn white, but the foreground will show details. The ALC setting actually adjusts the reference level relative to the average and peak values of the video signal. This is why the ALC marks on a lens read Peak and Average.
Auto iris zoom lens with ALC and level pots
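One way to picture the Peak/Average control is as a blend of two measurements of the same picture: its brightest value and its mean value. The sketch below is a hypothetical model, not the circuit of any real lens. `alc = 1.0` stands for the Peak end of the pot and `alc = 0.0` for the Average end. The iris settles where the blended measurement equals the 0.7 V reference. The "hallway" samples are invented numbers representing a bright background and a dark face, each given as the level it would produce with the iris fully open.

```python
from statistics import mean

REFERENCE_LEVEL_V = 0.7  # reference level used by the AI electronics (from the text)


def alc_detector(pixels: list, alc: float) -> float:
    """Blend of peak and average video level; alc=1.0 is the 'Peak' end, 0.0 the 'Average' end."""
    return alc * max(pixels) + (1.0 - alc) * mean(pixels)


def settled_aperture(pixels: list, alc: float) -> float:
    """Iris opening (0..1) at which the detector reads exactly the reference level.
    pixels are the levels (V) each part of the scene would give with the iris fully open."""
    return min(1.0, REFERENCE_LEVEL_V / alc_detector(pixels, alc))


# Hallway scene: two bright background samples, four dark samples of a face.
hallway = [2.0, 2.0, 0.2, 0.2, 0.2, 0.2]
for alc in (1.0, 0.0):
    a = settled_aperture(hallway, alc)
    print(f"ALC={alc}: iris {a:.2f}, face level {0.2 * a:.2f} V")
```

Toward Average, the iris opens further and the face becomes brighter, while the background clips to white. In a flat, low-contrast scene the peak and average values are almost equal, so turning the pot has little visible effect. This is why the ALC should be set while the camera views a high-contrast scene.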
Remember that when you start adjusting the ALC, the camera needs to be viewing a very high-contrast scene. If it sees the opposite (a low-contrast scene), the video signal will not change visibly. So, if you tweak the ALC pot while the camera views a scene with normal contrast, you may introduce a misalignment that becomes visible only when the scene lighting changes.
All of the above applies to the majority of AI lenses, which are driven, as described, by the video signal picked up from the AI connector at the back of the camera. For this reason, and because another subgroup of AI zoom lenses is not driven by the video signal from the camera, we also call this type video-driven AI.
The other subgroup of AI lenses consists of the DC-driven AI zoom lenses.
DC-driven AI lenses do not contain the video processing electronics, only the motor that opens and closes the iris. In DC-driven auto iris lenses, all of the processing is done by the camera's AI electronic section. This section outputs a DC voltage that opens and closes the iris leaves according to the video level taken from inside the camera. Cameras that have a DC AI output also have the level and ALC adjustments, but in this case they are on the camera body and not on the lens.
It should be clearly noted that video-driven AI zoom lenses cannot be used with cameras that provide only a DC AI output. Likewise, DC-driven AI lenses cannot be used with a camera that provides only a video AI output. Some cameras can drive both types of AI design, in which case a switch or separate terminals are provided for the two outputs. Pay attention to this, because a mismatch can create problems that at first seem impossible to solve. In other words, make sure that the camera and the lens use the same type of AI operation.
Some cameras can “drive” both types of AI lenses.
The advantage of video-driven AI zoom lenses is that they will work with the majority of cameras. DC-driven AI zoom lenses are cheaper. They are also less likely to show the “hunting” effect, since the camera that processes the gain also processes the iris control. Their disadvantage is that not all cameras have a DC-driven AI output. To date, video-driven AI lenses are more common.
If lenses are grouped by iris function, motorized iris lenses form the third subgroup. Their iris mechanism can be controlled remotely and set by the operator according to the light conditions. This type of zoom lens has become increasingly popular in the last few years, especially with the development of CCD-iris cameras.
In these lenses, the iris leaves are not driven by an AI circuit. Instead, a DC voltage produced by the PTZ site driver controls how far the iris opens or closes. PTZ site drivers will be explained later in Chapter 12. To put it very simply, they are boxes of electronics that receive encoded digital data for the movement of the pan/tilt head and for the zoom lens functions. They convert this data into the voltages that actually drive the PTZ assembly. For motorized iris lenses, the PTZ site driver must also have an output to drive the iris.
With a CCD-iris camera, this type of iris control is better than an automatic one. The CCD-iris (an electronic function of the CCD chip) is a faster and more reliable light-controlling section of the camera. However, it cannot substitute for the depth of field produced by the high F-stops of an optical iris. Optical and electronic irises cannot function properly if they work simultaneously. The camera usually balances at a low F-stop (wide iris opening), which gives a very narrow depth of field. It combines this with a high electronic shutter speed, which produces less efficient charge transfer (that is, high smear). This is especially obvious when such a camera/lens combination encounters a high-contrast scene. Motorized iris lenses avoid this low-quality picture. They keep the benefits of a fast and reliable CCD-iris and still provide depth of field. They obviously require intervention by an operator, but only when the picture demands it, since the CCD-iris works constantly to compensate for abrupt light variations.
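The reason a fast electronic shutter cannot replace the optical iris shows up in a short calculation. The light collected by the chip is proportional to exposure time / N². Very different combinations of F-number and shutter speed can therefore deliver the same exposure, but only the high F-number gives the wide depth of field. The F-numbers and the 1/10000 s starting point below are illustrative assumptions, not measured camera settings.

```python
def relative_exposure(f_number: float, shutter_s: float) -> float:
    """Light collected by the chip in relative units: exposure time / N^2."""
    return shutter_s / f_number ** 2


def equivalent_shutter(f_from: float, shutter_from_s: float, f_to: float) -> float:
    """Shutter time at f_to that collects the same light as (f_from, shutter_from_s)."""
    return shutter_from_s * (f_to / f_from) ** 2


# Illustrative bright-scene balance: iris wide open at F/1.4, CCD-iris at 1/10000 s.
t = equivalent_shutter(1.4, 1 / 10000, 16.0)
print(f"F/16 needs about 1/{1 / t:.0f} s for the same exposure")
```

Both settings give the same picture brightness. With the motorized iris set toward F/16, the CCD-iris runs at a much slower shutter speed, with less smear, and the optical iris supplies the depth of field.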
When ordering zoom lenses, you are expected to specify whether you want a motorized iris lens;
otherwise, the manufacturer may supply you with a standard video-driven AI zoom lens as they are
the most common.
And finally, let us mention vari-focal lenses again. Vari-focals do not have the same functionality as zoom lenses, and they should be classified in the fixed focal lenses group. They are practical when customers do not know what angle of coverage they require, but they must always be refocused manually once the angle of view (that is, the focal length) is changed.
A note of warning: be more critical of the optical quality of vari-focal lenses. Because of their additional moving elements, it is more difficult for them to achieve the same optical resolution as fixed focal lenses. Of course, in some situations vari-focals may have quite sufficient quality for the application, but trials will always let you judge better.
C- and CS-mount and back-focus
“Back-focusing” is what we call the adjustment of the lens back-flange relative to the CCD image plane. Back-focusing is very important in CCTV. Currently, there are two standards for the distance between the back-flange of a lens and the CCD image plane:
• C-mount, represented with 17.5 mm (more precisely, 17.526 mm).
This is a standard mounting, dating from the very early days of tube cameras. It consists of a metal ring with a 1 inch diameter thread (32 threads per inch) and a front surface 17.5 mm away from the image plane.
• CS-mount, represented with 12.5 mm.
This is a newer standard intended for smaller camera and lens designs. It uses the same 1 inch, 32 threads-per-inch thread as the C-mount, but it is 5 mm closer to the image plane. The intention is to keep compatibility with the old C-mount lenses (by adding a 5 mm ring) while allowing cheaper and smaller lenses, suited to smaller CCD chips, to be manufactured.
Since both of the above formats use a threaded lens mount, the lens's position relative to the CCD chip may vary slightly when the lens is screwed in. This position therefore needs a small adjustment, called the back-focus adjustment.
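The two mount standards differ only in their flange focal distance, so the spacer needed for any lens/camera pairing is a simple subtraction. The minimal sketch below uses the 17.526 mm C-mount figure from the text and the standard 12.526 mm CS-mount figure, which is consistent with the 5 mm difference above. A CS-mount lens on a C-mount camera would need a negative spacer, which is impossible, so the sketch rejects that combination. The function name is hypothetical.

```python
# Flange focal distance: lens mounting face to image plane, in mm.
FLANGE_FOCAL_DISTANCE_MM = {"C": 17.526, "CS": 12.526}


def spacer_needed_mm(lens_mount: str, camera_mount: str) -> float:
    """Thickness of the adaptor ring needed between lens and camera (0 if the mounts match)."""
    gap = FLANGE_FOCAL_DISTANCE_MM[lens_mount] - FLANGE_FOCAL_DISTANCE_MM[camera_mount]
    if gap < -1e-9:
        # The lens would have to sit closer to the chip than the camera body allows.
        raise ValueError(f"{lens_mount}-mount lens cannot reach focus on a {camera_mount}-mount camera")
    return round(gap, 3)


print(spacer_needed_mm("C", "CS"))  # C-mount lens on a CS-mount camera
print(spacer_needed_mm("C", "C"))   # matching mounts
```

The calculation gives only the nominal spacer thickness. The small thread-related variations described above still have to be corrected by the back-focus adjustment.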
In photography, for example, we never talk about back-focusing, simply because most brands use a bayonet mount, which has only one fixed position of the lens relative to the film plane. Camcorders, for that matter, come with lenses as an integral part of the unit, so the back-focus is already adjusted and never changes.
In CCTV, because of the modular concept of the camera/lens combination and the thread mount, it is a different story. Back-focus adjustment is especially important and critical when zoom lenses are used. This is because the optics-to-CCD distance in zoom lenses has to be very precise in order to achieve good focus throughout the zoom range.
If a lens is C-mount and the camera is CS,
a C/CS adaptor ring is required.
Obviously, the back-focusing adjustment applies to fixed lenses as well. With them, however, we tend not to pay attention to the distance indicator on the lens ring when focusing. Strictly speaking, when the back-focus of a fixed lens is adjusted correctly, the distance indicator should show the real distance between the camera and the objects. Most installers, however, ignore the indicator on the lens, since all they want to see is a sharp image on the monitor. This is fine, but to be precise, back-focus adjustment should apply to all lenses used in CCTV. With zoom lenses it is more critical.
An important factor to take into account when adjusting the back-focus is depth of field. The reason is very simple. CCTV cameras are most often installed in the daytime, and with an AI lens the iris will naturally be set at a high F-stop to give a good picture (assuming the AI is connected and working properly). At a high F-stop the depth of field is very large, so the image seems sharp no matter where the focus is set. At night, however, the iris opens fully because of the low light level, and the operator sees an out-of-focus picture.
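The effect can be estimated with the usual depth-of-field approximations. The hyperfocal distance is H = f²/(N·c), and the near and far limits of acceptable sharpness are H·s/(H + s) and H·s/(H − s), or infinity when s ≥ H. These approximations are valid when the subject distance s is much larger than f. The acceptable circle of confusion c = 0.01 mm is an assumed value for a small CCD chip, and the 16 mm lens focused at 10 m is only an example.

```python
import math


def depth_of_field_mm(focal_mm: float, f_number: float, subject_mm: float, coc_mm: float = 0.01):
    """Approximate near/far limits of acceptable sharpness (valid when subject >> focal length).
    coc_mm is the acceptable circle of confusion on the chip; 0.01 mm is an assumed value."""
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm)
    near = hyperfocal * subject_mm / (hyperfocal + subject_mm)
    far = hyperfocal * subject_mm / (hyperfocal - subject_mm) if subject_mm < hyperfocal else math.inf
    return near, far


# Same lens, same focus setting: daytime (F/16) versus night (F/1.4).
for n in (16, 1.4):
    near, far = depth_of_field_mm(16, n, 10_000)
    print(f"16 mm lens at F/{n}, focused at 10 m: sharp from {near / 1000:.1f} m to {far / 1000:.1f} m")
```

At F/16 everything from a little over a meter to infinity appears sharp, so a back-focus error stays hidden. At F/1.4 only a narrow band around the focus distance is sharp, and the same error becomes obvious.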
This is actually one of the most common problems with new installations. When a service call is placed, the installer usually comes during the daytime to see what the problem is. If the operator cannot explain exactly what he or she sees, the problem may not be resolved, since the picture looks great at a high F-stop.
The moral of the above is: always do the back-focus adjustment with a low F-stop (largest lens iris opening).
How do you make the iris open to maximum? The following methods are used:
• Adjust the back-focus at low light levels in the workshop (easiest).
• Adjust the back-focus in the late afternoon, on site.
• Adjust the back-focus in the daytime, on site, by using external ND filters.
Some cameras don’t require a C/CS adaptor.
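For the ND filter method, you can estimate the required attenuation from the F-numbers. The light passed by the iris scales with 1/N², so forcing the AI lens from its daytime setting to fully open requires attenuation of (N_day / N_open)². The sketch below assumes a daytime setting of F/16 and a wide-open F/1.4 lens. It picks, from the 10×, 100× and 1000× filters mentioned later in this chapter, the stack with the lowest combined attenuation that is still enough. Stacked filters multiply.

```python
import math
from itertools import combinations

# Attenuation factors of common external ND filters (as listed in this chapter).
ND_FACTORS = (10, 100, 1000)


def light_ratio(f_bright: float, f_open: float) -> float:
    """How many times more light the iris passes at f_open than at f_bright (ratio of N^2)."""
    return (f_bright / f_open) ** 2


def choose_nd(required: float):
    """Stack of the available filters (each used at most once) with the lowest combined
    attenuation that is still at least `required`. Returns None if no stack is enough."""
    best = None
    for count in range(1, len(ND_FACTORS) + 1):
        for stack in combinations(ND_FACTORS, count):
            total = math.prod(stack)  # stacked filters multiply
            if total >= required and (best is None or total < math.prod(best)):
                best = stack
    return best


# Assumed daytime setting of F/16; we want the auto iris to open to F/1.4.
needed = light_ratio(16.0, 1.4)
print(f"need about {needed:.0f}x ({math.log2(needed):.1f} stops); use {choose_nd(needed)}")
```

Extra attenuation beyond what is required simply keeps the iris fully open, which is what the back-focus adjustment needs. The picture will be somewhat darker, but it should still be usable for judging focus.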
There is one exception to the above. If a camera with CCD-iris is used, the optical iris can be opened fully even in daylight, because the CCD-iris will compensate for the excess light. With CCD-iris cameras, the back-focus can therefore easily be adjusted in daylight without being misled by the depth of field and without ND filters. Do not forget to switch the CCD-iris off after the adjustment if you decide to use auto iris.
Back-focus adjustment
In the following few paragraphs we will examine a procedure for proper back-focus adjustment. This
discussion is based on practical experience and is by no means the only procedure, but it will give you a
good understanding of what is involved in this operation. We should also clarify that often, with a new
camera and zoom lens setup, there might not be a need for back-focus adjustment. This can be easily
checked as soon as the lens is screwed onto the C or CS ring, and the camera is connected to a monitor.
Obviously, the zoom and focus functions have to be operational so that one can check if there is a need
for adjustment. The idea is to get as sharp an image as possible, and if a zoom lens is used, once it is
focused on an object, the object should stay in focus no matter what zooming position is used. If this
is not the case, then there is a need for back-focus adjustment.
One will rightly ask: “What is so complicated about adjusting the back-focus?”
The answer is a practical one, quite apart from the depth-of-field problems: most zoom lenses in CCTV do not have a distance indicator engraved on them. If a zoom lens did have one, we could set the focus ring to a particular distance and place an object at that distance. We could then adjust for a perfectly sharp image (a monitor, of course, is required) by rotating the lens together with the C-mount ring, or by moving the CCD chip back and forth with a screw mechanism on the camera. But since the majority of zoom lenses do not have these distances engraved, the hard part is finding the starting point.
All lenses have two known points on the focus ring (the limits of the focus ring rotation):
• Focus infinity “∞” (no lens focuses past this point)
• Focus at the minimum object distance (MOD)
The second point varies from lens to lens. We do not know the minimum focusing distance of a particular lens unless we have the manufacturer's specification sheet, which is usually not supplied with the lens or, quite often, is lost during installation.
This leaves us with only the focus infinity as a known point. Obviously, infinity is not literally an infinite
distance, but it is big enough to give a sharp image when the lens is set to the “∞” mark.
The longer the focal length of the lens, the longer the infinity distance that has to be selected. For a
typical CCTV zoom lens of 10× ratio, which is usually 8–80 mm, 10–100 mm, or 16–160 mm, this
distance may be anything from 200–300 m onward. From this, we can see it is impossible to simulate
this distance in the workshop, so the technician working on the back-focus needs to point the camera
out through a window, in which case external ND filters are required to minimize the effect of depth
of field (unless a CCD-iris camera is used, of course).
The next step would be to set the focus to the infinity mark. To do this, a PTZ controller would be
required, but this is obviously impractical.
I suggest, therefore, simulating the zoom and focus control voltages with a regular 9 V DC battery applied to the focus and zoom wires. Remember that lens focus and zoom control voltages range from ±6 V DC to ±9 V DC, and the lens has a very low current consumption, usually below 30 mA. A 9 V battery has plenty of capacity to drive such motors for a considerable time, at least long enough to complete the adjustment procedure.
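A rough runtime estimate supports the claim that a 9 V battery is sufficient. Ideal runtime is capacity divided by load current. The capacity of about 500 mAh for an alkaline 9 V battery is an assumed typical figure, not a specification, and the estimate ignores voltage sag and motor start-up current.

```python
def runtime_hours(capacity_mah: float, current_ma: float) -> float:
    """Idealized runtime: capacity / load current (ignores voltage sag and motor start-up)."""
    if current_ma <= 0:
        raise ValueError("current must be positive")
    return capacity_mah / current_ma


# Assumption: roughly 500 mAh for an alkaline 9 V battery; 30 mA is the lens figure from the text.
hours = runtime_hours(500.0, 30.0)
print(f"about {hours:.0f} hours of continuous zoom/focus motor drive")
```

Even if the real capacity is only half the assumed value, the battery would still last many hours. The motors run for only a few seconds at a time during the adjustment.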
There is no standard among manufacturers for the lens wire color coding, but quite often, if no
information sheet is supplied with the lens, the black wire is common, the zoom wire is red, and the
focus wire is blue. This is not a rule, so if in doubt it is not that difficult to work it out by using the
same battery and monitor. Instead of a monitor, an even more practical tool would be a viewfinder;
some call it a focus adjuster. This is a little monitor that is battery operated, with a rubber eyepiece
to protect from excessive daylight. On a bright sunny day, if a normal monitor is used in the field, it
will be almost impossible to see the picture on the screen, so it is highly recommended that you use a
viewfinder instead.
If no monitor is available at the point of adjustment, you need to work out which optical parts move when focusing and which move when zooming. This is not as trivial as it may sound, since zoom lenses are enclosed in black (or beige) boxes and no moving parts are visible. A rule of thumb, however, is that the zooming elements are not visible from the outside, while focusing is performed by the first group of lenses, called the objective. When focusing, the objective rotates around its optical axis and at the same time moves along it, either into or out of the lens. All lenses share this concept: the objective moves outward when focusing on closer distances and inward when focusing to infinity. See the section on focusing concepts, on page 64, for an explanation.
So, even if the zoom lens does not have any visible markings for distances and zoom factors, using the
above logic we can start doing the back-focus adjustment.
With the battery applied to the focus wire we need to focus the lens
to infinity. Even if we do not have a monitor, this will be achieved
when the lens objective goes to the end position on the inside of
the lens.
The next step is to point the camera at an infinity object, at a distance we have already mentioned. Trees or antennas on the horizon make good infinity objects.
Now, without changing the focus, zoom fully in and out. If the picture on the monitor looks sharp throughout the zooming range, back-focusing is not necessary.
If, however, the camera's C- or CS-mount ring is out of adjustment, the picture on the monitor will not be sharp at all zoom positions.
Then, we proceed with adjustment by either rotating the lens together with the C-ring (if the camera
is of such a type) or by shifting the CCD chip with a special back-focus adjustment screw or in some
cameras by rotating a large ring with C & CS written on it.
The first type of camera is the most common. In this case, the C- or CS-mount ring is usually secured
with miniature hexagonal locking screws. These need to be loosened prior to the adjustment but after
the zoom lens is screwed in tightly.
Then, to adjust the focus (after focusing with the battery and pointing the camera at infinity), we rotate the zoom lens, now together with the ring (which is why we loosened the ring). Again, some cameras have a special mechanism that shifts the CCD chip back or forth. This is easier, since the lens does not have to be rotated.
Either method changes the distance between the lens and the CCD chip until the picture becomes sharp. Remember that we have minimized the depth of field by opening the iris as far as possible (by simulating low light levels). Distant objects should therefore be quite easy to bring into sharp focus. Once we find the optimum, we should stop.
Some CCD cameras use miniature hexagonal screws to secure the C-mount ring to the camera.
Please note that the focus wires are not used again at this stage. The zoom lens must remain focused at infinity. We are only making sure that the lens stays focused at infinity throughout the zoom range. Also, do not be misled when the objects at infinity get smaller while zooming out. Because the image size is reduced, they may appear to be going out of focus.
The next step would be to zoom by using the 9 V battery. Watch the video picture carefully and make
sure that the objects at infinity stay in focus while zooming in or out. If this is the case, our back-focus
is nearly adjusted.
In order to confirm this, the next step would be to point the camera at an object that is only a couple
of meters away from the camera. Then we zoom in on the object and use the focusing wires to focus
on it. When focused precisely, use the zoom wires and zoom out. If the object stays in focus, that will
be confirmation of a correct back-focus adjustment.
The last step would be to tighten the little hexagonal screws (if such a camera is used) and secure the
C/CS mount ring on the camera.
If the above procedure does not succeed from the very first go, a couple of iterations might be necessary,
but the same logic applies.
As one can imagine, the mechanical design and robustness of the C-mount/CCD-chip combination are very important, especially the precision and “parallelness” of the C-ring and the CCD chip plane. A variation of only one-tenth of a millimeter at the image plane may cause a focus variation of a couple of meters at the object. With bad designs, such as a C-ring locked by only one screw, or with poor mechanical construction, problems may occur even if the above procedure is followed correctly. So the lens is not the only factor that determines picture quality; the camera's mechanical construction matters as well.
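The "tenth of a millimeter gives a couple of meters" claim can be checked with the thin-lens equation 1/f = 1/s + 1/s'. Suppose the chip sits δ behind the plane where the lens images infinity. The object that is actually sharp then lies at s = f(f + δ)/δ. The sketch below treats the lens as a single thin lens, which real zoom lenses are not, so the numbers are only indicative. The focal lengths are example values.

```python
def in_focus_object_distance_mm(focal_mm: float, shift_mm: float) -> float:
    """Thin-lens estimate (1/f = 1/s + 1/s').

    If the chip sits shift_mm behind the plane where the lens images infinity, the object
    that is actually sharp lies at s = f * (f + shift) / shift in front of the lens."""
    if shift_mm <= 0:
        raise ValueError("shift must be positive")
    image_distance = focal_mm + shift_mm
    return focal_mm * image_distance / shift_mm


for f in (8.0, 16.0, 50.0):
    s = in_focus_object_distance_mm(f, 0.1)
    print(f"{f:g} mm lens, 0.1 mm error: sharp at about {s / 1000:.1f} m instead of infinity")
```

For a 16 mm lens, the "infinity" setting would actually focus at about 2.6 m. This is consistent with the couple of meters quoted above, and it shows why a loose or tilted C-ring is so noticeable.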
We have mentioned that a monitor is required for the back-focus adjustment, which is no surprise. This is fine in the workshop, but when back-focusing has to be performed on site, a normal CRT monitor is almost impossible to use. The reason is not so much the inconvenience of needing a mains supply (240 VAC, or whatever your country uses). The bigger problem is that bright outdoor light overwhelms the brightness a CRT monitor can produce. This is why I have recommended a viewfinder monitor (like the ones used on camcorders) with a rubber eyepiece, which blocks external light and is comfortable to use. In addition, these little viewfinder monitors are battery operated and very compact. Some manufacturers make viewfinder focus adjusters with a flicker indicator that shows when objects are in focus.
Focus-adjusting tool
Small and practical tools like this one make the difference between a good and bad CCTV system
installation and/or commissioning.
Optical accessories in CCTV
Apart from fixed and zoom lenses in CCTV, we also have some optical accessories.
One of the more popular is the 2× teleconverter (also known as an extender). The teleconverter is a little device that is usually inserted between the lens and the camera. A 2× converter multiplies the focal length by a factor of 2. This means a 16 mm lens becomes 32 mm, an 8–80 mm zoom lens becomes 16–160 mm, and so on. It is important to note, however, that the F-number is multiplied by the same factor, which for a 2× converter means a loss of two F-stops. For example, a 16 mm/1.4 lens with a 2× converter becomes 32 mm/2.8. Back-focusing a lens with a 2× converter may be more complicated. It is recommended that you first back-focus the zoom lens alone and then insert the converter. Some zoom lenses have a built-in teleconverter that can be switched in and out with a special control voltage, which can be supplied by the auxiliary output of a site driver. In general, a converter reduces the optical resolution of a lens, so if there is no real need for one, it should be avoided. It should be noted that 1.5× converters also exist.
Another accessory is the external ND filter, which comes in various light attenuation factors: 10×, 100×, or 1000×. Filters can also be combined to give higher attenuation. As we have already described, external ND filters are very helpful for back-focusing and AI adjustments. Since they come as loose pieces of glass, you may have to find a way of fixing them in front of the lens objective. A simple holder could be made to make the filters easier and more practical to use.
A 100× neutral density filter
Polarizing filters may sometimes be required when a CCTV camera views through a window or into water. In most cases, reflections make it difficult to see what is beyond the glass or the water surface, and polarizing filters can minimize this effect. There is, however, a practical drawback: a polarizing filter must be rotated to the correct angle to work. On a fixed camera looking at a fixed area, that might be fine. On a PTZ (pan/tilt/zoom) camera, however, it cannot be used, because the camera is constantly repositioned and the objective rotates when focusing.
For special purposes, when the camera needs a close-up (macro) view of a very small object, the lens can be focused on objects much closer than the MOD (minimum object distance) specified by the lens manufacturer. This can be achieved with special sets of extension rings available from some lens suppliers. Surplus CS-mount adaptor rings are much easier and more practical to use. By combining one or more of them, depending on the focal length in use, macro views can be obtained. This can be useful for inspecting surface-mount PCB components and stamps, detecting fake money, and monitoring insects or other miniature objects.
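How much each adaptor ring helps depends on the focal length, as stated above. Assume the lens is set to infinity, so that its image distance is f. Adding an extension e moves the image distance to f + e. The thin-lens equation then gives an object distance of s = f(f + e)/e and a magnification of m = e/f. The sketch below assumes 5 mm rings (the C/CS mount difference) and a nominal 1/3" chip width of 4.8 mm. Both are assumptions, and the thin-lens model is only approximate for real lenses.

```python
CS_RING_MM = 5.0       # thickness of one C/CS adaptor ring (the 5 mm mount difference)
SENSOR_WIDTH_MM = 4.8  # assumed nominal width of a 1/3" CCD chip


def macro_setup(focal_mm: float, rings: int, sensor_width_mm: float = SENSOR_WIDTH_MM):
    """Thin-lens estimate for a lens set to infinity with `rings` extension rings added.

    Returns (object distance from the lens in mm, magnification, scene width covered in mm)."""
    if rings < 1:
        raise ValueError("at least one extension ring is needed")
    extension = rings * CS_RING_MM
    magnification = extension / focal_mm  # m = e / f
    object_distance = focal_mm * (focal_mm + extension) / extension
    return object_distance, magnification, sensor_width_mm / magnification


for focal, rings in ((16.0, 1), (16.0, 2), (8.0, 1)):
    dist, mag, width = macro_setup(focal, rings)
    print(f"{focal:g} mm + {rings} ring(s): object at {dist:.0f} mm, "
          f"{mag:.2f}x, scene width {width:.1f} mm")
```

With short focal lengths, a single ring already gives a strong close-up. Longer lenses need several rings for the same magnification, but they leave more working distance in front of the lens.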
4. General characteristics of television systems
This chapter discusses the theoretical fundamentals of video signals, their bandwidth and resolution.
It is intended for technical people who want to know the limits of the television system in general and
CCTV in particular.
A little bit of history
In order to understand the basic principles of television, we have to refer to the effect of eye persistence
(see Chapter 2).
Television, like cinema, uses this effect to trick our brain: by showing us still images at a very fast rate, it makes our brain believe that we see “motion pictures.”
In 1903, The Great Train Robbery, one of the first narrative films shown to the public, was produced in the Edison Laboratories. This event marked the beginning of the motion picture
revolution. Although television is considered younger than film, its concept has been under experimentation since the late nineteenth century. It all began with the discovery of the element selenium in 1817 by the Swedish chemist Jöns Berzelius. In 1873, the English engineer Willoughby Smith discovered selenium's photoconductivity: its electrical conductivity depends on the amount of light falling onto it. In 1875, G.R. Carey, an American inventor, made the very first crude television system. In it, banks of photoelectric cells produced a signal that was displayed on a bank of light bulbs, each of which emitted light in proportion to the light falling onto its photo cell. A few minor modifications were made to this concept, such as the “scanning disk” presented by Paul Nipkow in 1884, in which picture elements were scanned by a rotating mechanical disk with holes aligned in a spiral. In 1923, the first practical transmission of pictures over wires was accomplished by John Baird in England and later by Francis Jenkins in the United States of America. The first broadcast was transmitted in 1932 by the BBC in London, while experimental broadcasts were conducted in Berlin by the Fernseh Company, led by the television pioneer Professor Manfred von Ardenne. In 1931, the Russian-born engineer Vladimir Zworykin developed the first TV camera, known as the iconoscope. It used the same concept as the tube cameras and the cathode ray tube (CRT) developed later.
Baird’s television receiver, 1923
Zworykin’s iconoscope
Both of these technologies, film and TV, produce many static images per second in order to achieve
the motion effect. In TV, however, instead of projecting static images with a light projector through a
celluloid film, this is achieved with electronic beam scanning. Pictures are formed line by line, in the
same manner as when reading a book, for example, from left to right and from top to bottom (as seen
from in front of the CRT). The persistence of the phosphor coating of the monitor's CRT plays an
important role in the whole process.
The very basics of television
There are a few different television standards used worldwide today. CCIR/PAL recommendations are used throughout most of Europe, Australia, New Zealand, and most of Africa and Asia. A similar concept is used in the EIA/NTSC recommendations for television in the United States, Japan, and Canada. It is also used in the SECAM recommendations of France, Russia, Egypt, some former French colonies, and Eastern European countries. The major differences between these standards are the number of scanning lines and the frame frequency.
Before we begin the television basics, let us first explain the abbreviation terminology used in the
technical literature discussing television:
CCIR stands for Comité Consultatif International des Radiocommunications (International Radio Consultative Committee). This committee recommended the standards for B/W television accepted by most of Europe, Australia, and others. Hence, we call equipment that complies with these B/W TV standards CCIR compatible. The same type of standard, later extended to color signals, was called PAL. The name comes from the method used for color reproduction, in which the phase of the color carrier alternates on each new line: Phase Alternating Line (PAL). The majority of CCIR/PAL systems are based on 625 scanning lines and 50 fields/s, although there are variations with 525 lines.
EIA stands for Electronic Industries Association, the association that created the standard for B/W television in the United States, Canada, and Japan. There it is often referred to as RS-170, the recommendation code of the EIA proposal. When B/W TV was upgraded to color, the new standard took the name of the group that created the recommendation: the National Television System Committee (NTSC). EIA/NTSC systems are based on 525 scanning lines and 60 fields/s.
SECAM comes from the French “Séquentiel Couleur avec Mémoire” (sequential color with memory). The name describes how the color is transmitted as a sequence of chrominance signals, and why the TV receiver needs a memory device to decode the color information. SECAM was initially patented in 1956 by Henri de France and was the first European analog color television proposal, based on 819 lines and 50 fields/s. Later on, SECAM switched to 625 lines.
All of the TV standards' recommendations have adopted a 4:3 aspect ratio for the TV screen (4 units in width by 3 units in height). This is mostly because early films had a similar aspect ratio in the first days of television. The number of lines used in each TV standard dictates the other characteristics of the system, such as the signal bandwidth and resolution.
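The number of lines and the field rate together determine the line frequency, the number of lines scanned per second. This is the first of the derived characteristics mentioned above. The sketch below assumes 2:1 interlace (two fields per frame), as used by the standards described here. It uses the B/W figure of 60 fields/s for EIA. Color NTSC runs at 59.94 fields/s, which gives a slightly lower line frequency.

```python
def line_timing(total_lines: int, fields_per_second: float):
    """Interlaced scanning: two fields make one frame, so lines/s = lines x fields/2."""
    frames_per_second = fields_per_second / 2
    line_frequency_hz = total_lines * frames_per_second
    line_period_us = 1e6 / line_frequency_hz
    return line_frequency_hz, line_period_us


for name, lines, fields in (("CCIR/PAL", 625, 50), ("EIA/NTSC (B/W)", 525, 60)):
    hz, us = line_timing(lines, fields)
    print(f"{name}: {hz:.0f} lines/s, {us:.2f} us per line")
```

The two systems end up with very similar line frequencies. The higher field rate of EIA/NTSC compensates for its smaller number of lines.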
4. General characteristics of television systems
Regardless of these differences, all of the systems use the same concept of composing pictures with
electron beam scanning lines, one after another.
When a video signal, as produced by a camera, comes to the monitor input, the voltage fluctuations are converted into current fluctuations of the electron beam that bombards the phosphor coating of the cathode ray tube (CRT) as it scans line by line. The phosphor coating produces light proportional to the number of electrons, which is proportional to the voltage fluctuation. This is, of course, proportional to the light falling onto the camera CCD chip; thus, the monitor screen shows an image of what the camera has seen.
The phosphor coating of the monitor has some persistence as well – light produced by the beam does not immediately disappear with the disappearance of the beam. It continues to emit light for another few milliseconds. This means the TV screen is lit by a bright stripe that moves downward at a certain speed.
This is obviously a very simplified description of what happens to the video signal when it comes to the monitor. We will discuss monitor operation in more detail in Chapter 6, but we will use the previous information as an introduction to television principles for readers who do not have the technical background.
Many factors need to be taken into account when deciding the number of lines and the picture refresh rate. As with many things in life, these decisions have to be a compromise – a compromise between as much information as possible, in order to see a faithful reproduction of real objects, and as little information as possible, in order to transmit it economically so that it can be received by a large number of users who can afford to buy a TV receiver.
The more lines and the more pictures per second used, the wider the frequency bandwidth of the video signal will be, which in turn dictates the cost of the cameras, processing equipment, transmitters, and receivers.
The refresh rate, that is, the number of pictures composed in 1 second, was decided on the basis of the persistence characteristic of the human eye and the luminance of the CRT. Theoretically, 24 pictures per second would have been ideal because of compatibility between cinematography (widely used at the time of television's beginning) and television. Practically, however, this was impossible because of the very high luminance produced by the phosphor of the CRT, which led to a flicker effect that depends on the viewing distance and the screen luminance, as shown on the diagram on the previous page.
Through many experiments it was found that at least 48 pictures per second were required to eliminate flicker. This would have been a good number to use, because it was identical to the cinema projector frequency and would have been very practical when converting movies into television format. Still, this was not the number that was accepted. Television engineers opted for 50 pictures per second in the CCIR and 60 in the EIA recommendations. These numbers were sufficiently high for the flicker to be undetectable to the human eye, but, more importantly, they coincided with the mains frequency of 50 Hz used all over Europe and 60 Hz used in the United States, Canada, and Japan. The reason for this lies in the electronic design of the TV receivers, which initially were very dependent on the mains frequency. Had the 48-picture design been accepted, the 2 Hz difference for CCIR and the 12 Hz difference for EIA would have caused a lot of interference and irregularities in the scanning process.
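The interference argument above can be made concrete with a simplified model. Assuming, for illustration only, that mains-induced hum shows up as a dark bar on the screen, the bar moves through the picture at the difference (beat) frequency between the field rate and the mains frequency; when the two are equal, the bar stands still and is far less objectionable. The function name `hum_bar_drift_hz` is hypothetical, and the model ignores everything except the two frequencies.

```python
def hum_bar_drift_hz(field_rate_hz: float, mains_hz: float) -> float:
    """Rate (Hz) at which a mains hum bar would roll through the picture.

    Simplified model: the bar position repeats at the beat frequency
    |field rate - mains frequency|. Zero means the bar stands still.
    """
    return abs(field_rate_hz - mains_hz)


for label, field_rate, mains in [
    ("CCIR, 50 fields/s on 50 Hz mains", 50, 50),
    ("48 pictures/s on 50 Hz mains", 48, 50),
    ("EIA, 60 fields/s on 60 Hz mains", 60, 60),
    ("48 pictures/s on 60 Hz mains", 48, 60),
]:
    print(f"{label}: bar drifts at {hum_bar_drift_hz(field_rate, mains)} Hz")
```

The 2 Hz and 12 Hz results are the same differences quoted in the text for a 48-picture system.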
The big problem, though, was how to produce 50 (PAL) or 60 (NTSC) pictures per second without increasing the initial camera scan rate of 25 (or 30) pictures per second. It is not that the camera scan rate could not be doubled, but the bandwidth of the video signal would have had to be increased, thus increasing the cost of the electronics, as mentioned previously. Broadcasting channels also had to be taken into account: they would have had to be wider, and therefore fewer channels would have been available for use, without interference, in a dedicated frequency band.
All of the above forced the engineers to use a trick, similar to the Maltese Cross used in film projection, by which 50 (60) pictures could be reproduced without increasing the bandwidth. The name of this trick is interlaced scanning.
Simplified representation of the interlaced scanning
Instead of composing the pictures with 625 (525) horizontal lines by progressive scanning, the solution was found in alternately scanning the odd and even lines. In other words, instead of a single TV picture being produced by 625 (525) lines in one progressive scan, the same picture was broken into two halves, where one half was composed of only odd lines and the other of only even lines. These were scanned in such a way that they fitted precisely in between each other's lines, which is why this is called interlaced scanning. All of the lines in each half – 312.5 in the case of the CCIR signal and 262.5 in NTSC – form a so-called TV field. There are 25 odd fields and 25 even fields per second in the CCIR and SECAM systems, and 30 of each in the EIA system – a total of 50 fields per second, or 60 in EIA, following one after the other.
An odd field together with the following even field composes a so-called TV frame. Every CCIR/PAL and SECAM signal is thus composed of 25 frames per second, or 50 fields. Every EIA/NTSC signal is composed of 30 frames per second, which is equivalent to 60 fields.
The vertical sync pulses shown on an oscilloscope (left) and on a monitor with V-Hold adjustment (right)
The actual scanning on the monitor screen starts at the top left-hand corner with line 1 and then goes to line 3, leaving a space between lines 1 and 3 for line 2, which is due to come when the even lines start scanning. Initially, with the very first experiments, precise interlaced scanning was hard to achieve. The electronics needed to be very stable to produce oscillations such that the even lines fit exactly in between the odd lines. But a simple and very efficient solution was soon found in the selection of an odd number of lines, where every field would finish scanning with half a line. By preserving a linear vertical deflection (which was much easier to ensure), the half line completes the cycle in the middle of the top of the screen, thus finishing the 313th line for CCIR (263rd for EIA), after which the exact interlace was ensured for the even lines.
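The odd/even line split described above can be sketched in a few lines of code. This is a simplified model that treats each line as a whole unit and ignores the half line that real 625- and 525-line systems use to obtain the interlace; the function names are illustrative only.

```python
def split_into_fields(frame_lines):
    """Split a progressively ordered frame into its two interlaced fields.

    Lines are numbered from 1, as in the text: the odd field carries
    lines 1, 3, 5, ... and the even field carries lines 2, 4, 6, ...
    """
    odd_field = frame_lines[0::2]   # list indices 0, 2, 4 -> lines 1, 3, 5
    even_field = frame_lines[1::2]  # list indices 1, 3, 5 -> lines 2, 4, 6
    return odd_field, even_field


def weave_fields(odd_field, even_field):
    """Interleave two fields back into one frame (what the eye sees on the CRT)."""
    frame = []
    for i in range(max(len(odd_field), len(even_field))):
        if i < len(odd_field):
            frame.append(odd_field[i])
        if i < len(even_field):
            frame.append(even_field[i])
    return frame


frame = [f"line {n}" for n in range(1, 8)]   # a tiny 7-line "frame"
odd, even = split_into_fields(frame)
print(odd)   # ['line 1', 'line 3', 'line 5', 'line 7']
print(even)  # ['line 2', 'line 4', 'line 6']
assert weave_fields(odd, even) == frame

# Lines per field and fields per second for the two systems discussed above
for name, lines, frames in [("CCIR/PAL", 625, 25), ("EIA/NTSC", 525, 30)]:
    print(f"{name}: {lines / 2} lines per field, {2 * frames} fields per second")
```

Note that with an odd number of lines per frame the fields are unequal in whole lines, which is exactly why the real systems split one line in half between the two fields.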
When the electron beam completes the scanning of each line (on the right-hand side of the CRT, when seen from the front), it receives a horizontal synchronization pulse (commonly known as horizontal sync). This sync is embedded in the video signal and comes after the line video information. It tells the beam when to stop writing the video information and to fly back quickly to the left, to the beginning of the new line. Similarly, when a field finishes, a vertical sync pulse "tells" the beam when to stop "writing" the video information and to fly back quickly to the beginning of the new field. The fly-back of the electron beam is faster than the active scanning, and it serves only to reposition the beam. That is, the beam is blanked and no picture is written during these periods of the picture synthesis.
A test pattern generator signal and its waveform on an oscilloscope
In reality, even though the scanning system is called 525 TV lines (or 625 for PAL), not all of the lines are active, that is, visible on the screen. As can be seen on the NTSC and PAL TV Signal Timing Chart (on the next two pages), some of the lines are used for vertical sync equalization, others are not used, and still others are practically invisible because of the overscanning effect. Remember, no monitor or TV shows 100% of the camera video signal, except for special broadcast monitors.
If we take into account the errors in the beam interlace, the thickness of the beam, and so on in the CCIR system (and again, similar logic applies to the other standards), we cannot count more than 576 active TV lines in PAL and not more than 480 in NTSC. These are the limits of analog PAL and NTSC television.
NTSC television timing chart
PAL B television timing chart
Some of the "invisible" lines are used quite efficiently for other purposes. In the PAL Teletext concept, for example, the CCIR recommends lines 17, 18, 330, and 331, into which 8-bit digital information is inserted. The Teletext decoder in your TV or VCR accumulates the digital data from successive fields, which contain information about the weather, exchange rates, Lotto, and so on.
In some NTSC systems, line 21 carries closed captioning (i.e., subtitling information). Some of the other invisible lines are used for specially shaped vertical interval test signals (VITS), which, when measured at the receiving end, give valuable information on the quality of transmission and reception in a particular area. In CCTV, some manufacturers use the invisible lines to insert camera ID, time and date, or similar information. When the signal is recorded on a VCR, these lines are recorded too, but they are not visible on the monitor screen. However, the information is always there, embedded in the video signal. This type of information is more secure and harder to tamper with. It can be retrieved with a special TV line decoder and used whenever necessary, revealing the camera ID together with the time and date of the particular signal and, for example, the intruder in the picture.
The video signal and its spectrum
This section discusses the theoretical fundamentals of the video signal's limitations, bandwidth, and resolution. This is a complex subject whose fundamentals involve higher mathematics and electronics, but I will try to explain it in plain and simple language.
Most artificial electrical signals can be described mathematically. The mathematical description is very simple for signals that are periodic, like the mains power, for example. A periodic function can always be represented by a sum of sine waves, each of which may have a different amplitude and phase. Similar to the spectrum of white light, this is called the spectrum of an electrical signal. The more periodic the electrical signal is, the more easily it can be represented, and with fewer sine wave components. Each sine wave component can be represented by a discrete value in the frequency spectrum of the signal. The less periodic the function is, the more components will be required to reproduce the signal. Theoretically, even a nonperiodic function can be represented by a sum of various sine waves, only in such a case there will be many more sine waves to sum in order to get the nonperiodic result. In other words, the spectrum of a nonperiodic signal will have a bandwidth more densely populated with various components. The finer the details the signal has, the higher the frequencies in its spectrum will be. Very fine details in the video signal are represented by high-frequency sine waves; this is equivalent to high-resolution information. A signal rich in high frequencies will have a wider bandwidth. Even a single, but very sharp, pulse will have a very wide bandwidth.
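To illustrate the idea that sharp edges (fine detail) need more, and higher, sine wave components, the sketch below builds a square wave from its Fourier series. A unit (±1) square wave contains only odd harmonics k = 1, 3, 5, ..., with amplitudes 4/(πk); the amplitudes shrink as frequency rises, while adding more of them makes the edge steeper. The square wave is our choice of example signal, and the function names are illustrative.

```python
import math


def square_wave_harmonic_amplitude(k: int) -> float:
    """Amplitude of harmonic k in the Fourier series of a +/-1 square wave."""
    return 4 / (math.pi * k) if k % 2 == 1 else 0.0


def partial_square_wave(t: float, n_harmonics: int) -> float:
    """Sum of the first n_harmonics odd sine components at phase t (radians)."""
    return sum(
        square_wave_harmonic_amplitude(k) * math.sin(k * t)
        for k in range(1, 2 * n_harmonics, 2)
    )


# Amplitudes fall off with frequency (a convergent spectrum)
print([round(square_wave_harmonic_amplitude(k), 3) for k in (1, 3, 5, 7)])

# Steepness of the edge: how far the sum has risen just after the
# transition at t = 0 (an ideal square wave is already at 1.0 there)
t_near_edge = 0.05
for n in (1, 5, 50):
    value = partial_square_wave(t_near_edge, n)
    print(f"{n:>2} harmonics: value just after the edge = {value:.3f}")
```

With only the fundamental the edge is a slow sine; only when many high harmonics are present does the signal rise almost immediately, which is the time-domain counterpart of "sharp detail needs wide bandwidth".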
The above describes, in a very simplified way, the very important Fourier spectral theory, which states that every signal in the time domain has its image in the frequency domain. The Fourier spectral theory can be used in practice – wide-bandwidth periodic electrical signals can be explored more efficiently by analyzing their frequency spectrum. Without going deeper into the theory itself, CCTV users need to accept spectrum analysis as a very important tool for examining complex signals, such as video itself. The video signal is perhaps one of the most complex electrical signals ever produced, and its precise mathematical description is almost impossible because the signal constantly changes in the time domain. The video information (that is, the luminance and chrominance components) changes all the time. Because, however, we compose video images by periodic beam scanning, we can approximate the video signal with some form of periodic signal. One of the major components in this periodicity will be the line frequency – for CCIR and SECAM, 25 × 625 = 15,625 Hz; for EIA, 30 × 525 = 15,750 Hz.
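The line-frequency arithmetic above, together with the resulting duration of one complete line (sync included), can be written as a tiny helper. The EIA figure is the nominal 30-frame value; the exact color NTSC value differs slightly, as explained later in the section on color. The function name is illustrative.

```python
def line_frequency_hz(lines_per_frame: int, frames_per_second: float) -> float:
    """Horizontal (line) frequency = lines per frame x frames per second."""
    return lines_per_frame * frames_per_second


for name, lines, frames in [("CCIR/SECAM", 625, 25), ("EIA (nominal)", 525, 30)]:
    f_h = line_frequency_hz(lines, frames)
    period_us = 1e6 / f_h  # duration of one complete line, including sync
    print(f"{name}: {f_h:,.0f} Hz, one line every {period_us:.1f} us")
```

This gives 15,625 Hz and a 64 µs line for CCIR, and 15,750 Hz and about a 63.5 µs line for nominal EIA.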
It can be shown that the spectrum of a simplified video signal is composed of harmonics (multiples) of the line frequency, around which there are companion components on both the left- and right-hand sides (sidebands). The spacing of these components depends on the content of the video picture and the dynamics of the motion. It is also very important to note that such a spectrum, composed of harmonics and their sidebands, is convergent, which means the harmonics become smaller in amplitude as the frequency increases. An even more important conclusion from the Fourier spectral theory is that the positions of the harmonics and components in the video signal spectrum depend only on the picture analysis (4:3 ratio, 625-line interlaced scanning). The video signal energy distribution around the harmonics depends on the content of the picture, but the harmonics themselves are at exact positions because they depend only on the line frequency. In other words, the video signal dynamics and the amplitude of certain components in the sidebands will vary, but the harmonic locations (as subcarrier frequencies) will remain constant.
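The claim that a strictly line-periodic signal has spectral components only at multiples of the line frequency can be checked numerically. The sketch below assumes an arbitrary, made-up brightness profile for one "line", repeats it several times, and takes a plain discrete Fourier transform written out by hand (no external libraries). Energy appears only in DFT bins that are multiples of the number of repeated lines, i.e., at harmonics of the line frequency. A real picture changes from line to line, which is what spreads energy into the sidebands mentioned above.

```python
import cmath


def dft_magnitudes(samples):
    """Magnitude of each bin of a plain (O(N^2)) discrete Fourier transform."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples)))
        for k in range(n)
    ]


# One hypothetical "video line": a few brightness steps, 32 samples long
one_line = [0.3] * 8 + [1.0] * 4 + [0.5] * 12 + [0.8] * 8
num_lines = 8
signal = one_line * num_lines  # identical lines -> strictly line-periodic

mags = dft_magnitudes(signal)
# Look at the first half of the spectrum (the rest mirrors it for a real signal)
nonzero_bins = [k for k, m in enumerate(mags[: len(mags) // 2]) if m > 1e-9]
print(nonzero_bins)  # only multiples of num_lines, i.e. harmonics of the line frequency
assert all(k % num_lines == 0 for k in nonzero_bins)
```

Changing the content of `one_line` changes the heights of the spectral lines, but not their positions, which is the point made in the paragraph above.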
Broadcast TV channels frequency displacement example (PAL)
This is a very important conclusion. It helped find a way, in broadcast TV, to reduce the spectrum of a video signal to the minimum required bandwidth without losing too many details. There is always a compromise, of course, but since the majority of the video signal energy is concentrated around zero frequency and the first few harmonics, there is no need (and no way) to transmit the whole video spectrum. Scientists and engineers used all of these facts to determine how little of the video bandwidth needs to be used in transmission without losing too many details. As we already mentioned when discussing different TV standards, the more scanning lines used in a system, and the higher the resolution of the signal, the wider the bandwidth will be.
Taking into account the electron beam's limited size (which also dictates the smallest reproducible picture elements), the physical size of TV screens, viewing distances, and the complexity and production costs of domestic TV sets, it was concluded that 5 MHz of video bandwidth is sufficient for good reproduction of a broadcast signal. Using a wider bandwidth is possible, but the quality gain versus the expense is very low. As a matter of fact, in broadcast studios, cameras and recording and monitoring equipment are of much higher standards, with spectrums of up to 10 MHz. This is for internal use only, however, for quality recording and dubbing. Before such a signal is RF modulated and sent to the transmitting stage, it is cut down to 5 MHz video, to which about 0.5 MHz is added for the left and right audio channels. When such a signal comes to the TV transmitter stage, it is modulated so that one full sideband and only a vestige of the other are transmitted (vestigial sideband), with a total bandwidth, including the separation buffer zone, of 7 MHz (for PAL B). Note, however, that the actual usable video bandwidth in broadcast reception is only 5 MHz. For more curious readers, we should mention that in most PAL countries the video signal is modulated with amplitude modulation (AM) techniques, while the sound is frequency modulated (FM).
Similar considerations apply to NTSC signals, where the broadcast video bandwidth is around 4.2 MHz.
In CCTV, with the majority of system designs, we do not have such bandwidth limitations because we do not transmit an RF-modulated video signal, so we do not have to worry about interference between neighboring video channels. In CCTV, we use the raw video signal as it comes out of the camera, which is a basic bandwidth video, usually called baseband video. This usually bears the abbreviation CVBS, which stands for composite video burst signal. The spectrum of such a signal, as already mentioned, ranges from 0 to 10 MHz, depending on the source quality.
Bayonet Neill-Concelman (BNC) is the most common composite video input connector in CCTV.
The spectral capacity of coaxial cable as a transmission medium is much wider than this. The most commonly used 75 Ω coaxial cable, RG-59B/U, for example, can easily transmit signals of up to 100 MHz bandwidth. This applies to a limited distance of a couple of hundred meters, of course, but that is sufficient for the majority of CCTV systems. Different transmission media imply different bandwidth limitations, some wider and some narrower than coaxial cable, but most of them are considerably wider than 10 MHz.
Color video signal
When color television was introduced, it was based on monochrome signal definitions and limitations. Preserving compatibility between B/W and color TV was of primary importance. The only way color information (chroma) could be sent together with the luminance without increasing the bandwidth was to modulate the color information onto a frequency that fell exactly in between the luminance spectrum components. This means that the spectrum of the chrominance signal is interleaved with the spectrum of the luminance signal in such a way that they do not interfere. This color frequency is called the chroma subcarrier, and the most suitable frequency for PAL was found to be 4.43361875 MHz. In NTSC, using the same principle, the color subcarrier was found to be 3.579545 MHz.
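The interleaving can be expressed as a fixed ratio between the subcarrier and the line frequency. The sketch below uses two well-known relations that are not stated in the text above: for NTSC, f_sc = (455/2) × f_H, an odd multiple of half the line frequency, with f_H = 4.5 MHz / 286; for PAL, f_sc = (1135/4) × f_H + 25 Hz, a quarter-line offset plus 25 Hz. Exact fractions are used so that no rounding hides the result; the function names are illustrative.

```python
from fractions import Fraction

PAL_LINE_HZ = Fraction(15625)
NTSC_LINE_HZ = Fraction(4_500_000, 286)  # 4.5 MHz sound intercarrier / 286 (standard relation)


def pal_subcarrier_hz():
    # Quarter-line offset plus 25 Hz: (1135/4) * f_H + 25 Hz
    return Fraction(1135, 4) * PAL_LINE_HZ + 25


def ntsc_subcarrier_hz():
    # Half-line offset: an odd multiple (455) of half the line frequency
    return Fraction(455, 2) * NTSC_LINE_HZ


print(f"PAL  subcarrier: {float(pal_subcarrier_hz()) / 1e6:.8f} MHz")
print(f"NTSC subcarrier: {float(ntsc_subcarrier_hz()) / 1e6:.6f} MHz")
print(f"NTSC f_sc / f_H = {ntsc_subcarrier_hz() / NTSC_LINE_HZ}")  # 455/2
```

Both printed values match the subcarrier frequencies quoted in the text. The half-line (NTSC) and quarter-line (PAL) offsets are what place the chroma components in the gaps between the luminance harmonics.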
At this point we need to be more exact and highlight that NTSC is defined with 29.97 frames per second (more precisely, 30/1.001), not 30 (!). The reason for this is the definition of the color signal in NTSC, as proposed by the RS-170A video standard, which is based on the exact subcarrier frequency of 3.579545 MHz. The horizontal scanning frequency is defined as 2/455 times the burst (subcarrier) frequency, which gives 15,734 Hz. The vertical scanning frequency is derived from this one, and NTSC specifies it as 2/525 times the horizontal frequency. This produces 59.94 Hz for the vertical frequency (i.e., the field rate). For the purpose of generalization and simplification, however, we will usually refer to NTSC as a 60-field signal in this book.
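The NTSC timing chain described above can be followed step by step in code, starting from the 3.579545 MHz subcarrier and applying the 2/455 and 2/525 ratios given in the text.

```python
F_SC_NTSC = 3_579_545.0        # color subcarrier (burst) frequency, Hz

f_h = F_SC_NTSC * 2 / 455      # horizontal (line) frequency
f_v = f_h * 2 / 525            # vertical (field) frequency
frame_rate = f_v / 2           # two interlaced fields per frame

print(f"line rate  : {f_h:.2f} Hz")         # about 15,734 Hz
print(f"field rate : {f_v:.4f} Hz")         # about 59.94 Hz
print(f"frame rate : {frame_rate:.4f} Hz")  # about 29.97 Hz
```

The results land on 15,734 Hz, 59.94 Hz, and 29.97 Hz, showing why color NTSC runs slightly slower than the nominal 60/30 figures.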
The basics of color composition in television lie in the additive mixing of three primary color signals: red, green, and blue. So, to transmit a complete color signal, theoretically, apart from the luminance information, another three different signals are required. In the beginning of color television, this seemed impossible, especially when only between 4 and 5 MHz are used to preserve compatibility with the B/W system.
With a complex but clever procedure, this was made possible. It is beyond the scope of this book to explain such a procedure, but the following facts are important for our overall understanding of the complexity of color reproduction in television.
In a real situation, apart from the luminance signal, which is often marked as $Y = U_Y$, two more signals (not three) are combined. These are the so-called color differences $V = U_R - U_Y$ and $U = U_B - U_Y$, that is, the difference between the red and the luminance signal and between the blue and the luminance signal. Color differences are used instead of plain values for R, B (and G) because of compatibility with the B/W system. Namely, it was found that when a white or gray color is transmitted through the color system, only a luminance signal needs to be present in the CRT. In order to eliminate the color components in the system for such colors, the color differences were used.
Color burst waveform
Keeping in mind the basic relationship among the three color signals:
$$U_Y = 0.3\,U_R + 0.59\,U_G + 0.11\,U_B$$
we can show that all three primary color signals can be retrieved using the luminance and color difference signals:
$$U_R = (U_R - U_Y) + U_Y$$
$$U_B = (U_B - U_Y) + U_Y$$
$$U_G = (U_G - U_Y) + U_Y$$
For white, $U_R = U_G = U_B$; thus $U_Y = (0.3 + 0.59 + 0.11)\,U_R = U_R = U_G = U_B$. The green color difference is not transmitted; it is obtained from the luminance equation above. Because its coefficients add up to 1, that equation can be rewritten as $0.3\,(U_R - U_Y) + 0.59\,(U_G - U_Y) + 0.11\,(U_B - U_Y) = 0$, which gives:
$$U_G - U_Y = -0.51\,(U_R - U_Y) - 0.19\,(U_B - U_Y)$$
This relation shows that in color television, apart from the luminance, only two additional signals are sufficient for successful color retrieval: the red and the blue color differences (V and U), which are embedded in the CVBS signal.
Because the R, G, and B components are derived from the color difference signals by simple linear matrix equations, which in electronics can be realized with simple resistor networks, these arrangements are called color matrices.
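The color matrix equations above can be sketched as an encode/decode pair. This follows only the simplified equations in the text, with the rounded −0.51 and −0.19 coefficients; real PAL and NTSC encoders additionally scale the color difference signals, which this sketch ignores. The function names are illustrative.

```python
def encode(r, g, b):
    """Luminance and the two transmitted color differences (simplified, no scaling)."""
    y = 0.3 * r + 0.59 * g + 0.11 * b
    return y, r - y, b - y  # Y, (R - Y), (B - Y)


def decode(y, r_minus_y, b_minus_y):
    """Recover R, G, B: the 'color matrix' from the text."""
    g_minus_y = -0.51 * r_minus_y - 0.19 * b_minus_y  # rounded coefficients
    return r_minus_y + y, g_minus_y + y, b_minus_y + y


print(encode(0.8, 0.8, 0.8))  # gray: both color differences are 0 (up to float rounding)
r, g, b = decode(*encode(0.9, 0.4, 0.2))
print(round(r, 3), round(g, 3), round(b, 3))  # close to 0.9 0.4 0.2; G is approximate
```

The gray input shows why color differences keep the system B/W compatible: for any neutral color, only Y is nonzero. The small error in the recovered green comes from rounding the matrix coefficients to two decimals.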
It should be noted here that the two discussed TV standards, NTSC and PAL, base their theory of color reproduction on two different exponents of the CRT phosphor (called gamma, which will be explained in Chapter 6). NTSC assumes a gamma of 2.2, and PAL 2.8. This assumption is embedded in the signal encoding prior to transmission.
In practice, a gamma of 2.8 is a more realistic value, which is also reflected in a higher-contrast picture. Of course, the reproduced color contrast will depend on the monitor's phosphor gamma itself.
In order to combine (modulate) these color difference signals with the luminance signal for broadcast transmission, so-called quadrature amplitude modulation is used, where the two different signals (V and U) modulate a single carrier frequency (the color subcarrier). This is possible by introducing a phase difference of 90° between the two, which is the reason for the name quadrature modulation.
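Quadrature modulation, and how the receiver separates the two signals again, can be sketched numerically. Assumptions for this illustration: U and V are constant over the window, the carrier is sampled at 64 points per cycle, and the window holds a whole number of carrier cycles. U rides on the sine carrier and V on the cosine (90° shifted) carrier; multiplying by each reference phase and averaging (synchronous detection) returns each component, because over whole cycles sin·cos averages to zero. The function names are illustrative.

```python
import math


def qam_modulate(u, v, n_cycles=10, samples_per_cycle=64):
    """Chroma samples: U on the sine carrier, V on the 90-degree-shifted (cosine) carrier."""
    total = n_cycles * samples_per_cycle
    return [
        u * math.sin(2 * math.pi * i / samples_per_cycle)
        + v * math.cos(2 * math.pi * i / samples_per_cycle)
        for i in range(total)
    ]


def qam_demodulate(chroma, samples_per_cycle=64):
    """Synchronous detection: multiply by each reference phase and average.

    Over whole carrier cycles sin*cos averages to 0 and sin^2 (or cos^2)
    averages to 1/2, so doubling the average returns each component.
    """
    n = len(chroma)
    u = 2 * sum(c * math.sin(2 * math.pi * i / samples_per_cycle) for i, c in enumerate(chroma)) / n
    v = 2 * sum(c * math.cos(2 * math.pi * i / samples_per_cycle) for i, c in enumerate(chroma)) / n
    return u, v


chroma = qam_modulate(u=0.3, v=-0.2)
print([round(x, 6) for x in qam_demodulate(chroma)])  # [0.3, -0.2]
```

In a real receiver the reference phase is recovered from the color burst; if that reference is off, U and V leak into each other, which is the phase error PAL is designed to tolerate.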
In the PAL color standard, there is another clever design to minimize color signal distortions. Knowing that the human eye is more sensitive to color distortions than to changes in brightness, a special color encoding procedure was proposed so that distortions would be minimized, or at least made less visible. This is achieved by reversing the phase of the V (red color difference) component by 180° on every second line.
PAL color vectors
So, if transmission distortions occur, usually in the form of phase shifting, they will result in a color change of the same amount. But because the electronic vector representation of colors is chosen so that complementary colors are opposite each other, the errors are also complementary, and when "errored" lines next to each other are seen from a viewing distance, the errors cancel each other out. This is the reason for the name phase alternating line (PAL).
Standard order of color bars in television
Resolution is the property of a system to display fine details. The higher the resolution, the more details
we can see. The resolution of a TV picture depends on the number of active scanning lines, the quality
of the camera, the quality of the monitor, and the quality of the transmitting media.
Since we use two-dimensional imaging and display devices (CCD chips and CRTs), we distinguish two kinds of resolution: vertical and horizontal.
The vertical resolution is defined by the number of vertical elements that can be captured by a camera and reproduced on a monitor screen. When many identical vertical elements are put together in the scanning direction, we get very dense horizontal lines. This is why we say the vertical resolution tells us how many horizontal lines we can distinguish. Both black and white lines are counted, and the counting is done vertically. Clearly, this is limited by the number of scanning lines used in the system – we cannot count more than 625 lines in a CCIR system or 525 in an EIA system. If we take into account the duration of the vertical sync and equalization pulses, the invisible lines, and so on, the number of active lines comes down to 576 in CCIR and about 480 in EIA.
This is still not the actual vertical resolution. Usually, resolution is measured with a certain patterned image in front of the camera, so there are many other factors to take into account. One is that the absolute position of the supposedly high-resolution horizontal pattern can never exactly match the interlaced line pattern. Also, monitor overscanning cuts off a small portion of the video picture, the thickness of the electron beam is limited, and for color reproduction the "grille mask" is limited.
As early as 1933, Ray Kell and his colleagues found by experimenting that a correction factor of 0.7 should be applied when calculating the "real" vertical resolution. This is known as the Kell factor, and it is accepted as a pretty good approximation of the real resolution. This means that 576 has to be corrected (multiplied) by 0.7 to get the practical limit of the vertical resolution for PAL, which is approximately 400 TV lines. The same calculation applies to the NTSC signal, which gives approximately 330 TV lines of vertical resolution. This is all true in an ideal case, that is, with excellent video signal transmission.
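The Kell factor correction is a single multiplication. The sketch below applies the 0.7 factor from the text to the active line counts of both systems; the function name is illustrative.

```python
KELL_FACTOR = 0.7


def practical_vertical_resolution(active_lines, kell_factor=KELL_FACTOR):
    """Approximate vertical resolution in TV lines: active lines x Kell factor."""
    return active_lines * kell_factor


print(round(practical_vertical_resolution(576)))  # PAL: 403, i.e. roughly 400 TV lines
print(round(practical_vertical_resolution(480)))  # NTSC: 336, quoted above as roughly 330
```

The exact products are about 403 and 336, which the text rounds down to the commonly quoted 400 and 330 TV lines.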
Horizontal resolution is a somewhat different story. It is defined by the number of horizontal elements that can be captured by a camera and reproduced on a monitor screen. Similar to what we said about vertical resolution, the horizontal resolution tells us how many vertical lines can be counted.
One thing is different, however: because of the TV aspect ratio of 4:3, the width is greater than the height. So, to preserve the natural proportions of the images, we count only the vertical lines within a width equal to the picture height (i.e., three-quarters of the width). This is why we refer to horizontal resolution not just as lines but as TV lines.
The horizontal resolution of a monochrome (B/W) TV system is theoretically limited only by the cross section of the electron beam, the monitor electronics, and, naturally, the camera specifications. In reality, there are many other limitations. One is the video bandwidth applicable to the type of transmission. Even though we may have high-resolution cameras in the TV studio, we transmit only 5 MHz of the video spectrum (as discussed earlier); therefore there is no need for television manufacturers to produce TV receivers with a wider bandwidth. In CCTV, however, the video signal bandwidth is dictated mostly by the camera itself, since B/W monitors have a very high resolution (up to 1000 TV lines), which is limited only by the monitor quality, the most important factors being the electron beam precision and cross section.
A 12 MHz sweep generator is used to check the bandwidth of a high-resolution monitor (shown 9 MHz = 700 TVL).
A color system has an additional barrier: the physical size of the color mask and its pitch. The color mask is in the form of a very fine grille, which is used for color scanning with the three primary colors, red, green, and blue. The number of the grille's color picture elements (RGB dots) is determined by the size of the monitor screen and the quality of the CRT. In CCTV, color monitors with anything from 330 TV lines (horizontal resolution) up to 600 TV lines are available. The most common are the standard 14-inch monitors with around 400 TV lines of resolution. Remember, we are talking about TV lines, which in the horizontal direction gives us an absolute maximum of 400 × 4/3 = 533 vertical lines.
In CCTV, as in broadcast TV, we cannot change the vertical resolution, since we are limited to the number defined by the scanning system. That is why we rarely argue about vertical resolution. The commonly accepted figure for realistic vertical resolution is around 400 TV lines for CCIR and 330 TV lines for EIA. The horizontal resolution, however, can change; it depends on the camera's horizontal resolution, the quality of the transmission media, and the monitor. It is not rare in CCTV to come across a camera with 570 TV lines of horizontal resolution, which corresponds to a maximum of approximately 570 × 4/3 = 760 lines across the screen. This type of camera is considered a high-resolution camera. A standard-resolution B/W camera would have 400 TV lines of horizontal resolution.
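Converting a horizontal resolution quoted in TV lines into the number of lines across the full picture width is a multiplication by the 4:3 aspect ratio, as in the two examples above. The function name is illustrative.

```python
ASPECT_RATIO = 4 / 3


def lines_across_full_width(tv_lines):
    """Convert horizontal resolution in TV lines (counted over a width equal
    to the picture height) into lines across the full 4:3 picture width."""
    return tv_lines * ASPECT_RATIO


for tvl in (400, 570):
    print(f"{tvl} TVL -> about {lines_across_full_width(tvl):.0f} lines across the screen")
```

This reproduces the 533 and 760 figures used in the text.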
There is a simple relation between the bandwidth of a video signal and the corresponding number of lines. If we take one line of a video signal, whose active duration is about 52 µs, and spread 80 TV lines across it, we get a total of 80 × 4/3 ≈ 107 lines. These lines, when represented as an electrical signal, look like sine waves, so a pair of black and white lines corresponds to one period of a sine wave. Therefore, 107 lines are approximately 54 sine waves. A sine wave period would be 52 µs/54 ≈ 0.96 µs. If we apply the known relation between time and frequency (i.e., f = 1/T), we get f ≈ 1 MHz. This gives a very simple rule of thumb for the relation between the bandwidth of a signal and its resolution: approximately 80 TV lines correspond to 1 MHz of bandwidth.
Instruments commonly used in TV
It is very hard to determine any of the video signal properties with a typical electronic multimeter. There are, however, specialized instruments that, when used correctly, can describe the tested video signal precisely. These include oscilloscopes, spectrum analyzers, and vectorscopes. In most cases an oscilloscope will be sufficient, and I strongly recommend that the serious technician or engineer invest in one.
A signal can change over time slowly or quickly. What is slow and what is fast depends on many things; they are relative terms. One periodic change per second is defined as one hertz (Hz). An audio frequency of 10 kHz makes 10,000 oscillations in one second. The human ear can hear a range of frequencies from around 20 Hz up to 15,000–16,000 Hz. A video signal, as defined by the aforementioned standards, can have frequencies from nearly 0 Hz up to 5–10 MHz.
The higher the frequency, the finer the detail in the video signal.
How high we can go depends, first of all, on the pickup device (camera), but also on the transmission medium (coaxial cable, microwave, fiber optics) and the processing/displaying media (VCR, frame store, hard disk, monitor).
A time analysis of any electrical signal (as opposed to a frequency analysis) can be conducted with an electronic instrument called an oscilloscope. The oscilloscope works on principles similar to those of a TV monitor; only in this case the electron beam is deflected vertically by the video signal voltage, while horizontally, time is the only variable. With the so-called time-base adjustment, video signals can be analyzed from a whole field (20 ms) down to the horizontal sync width (5 µs).
Tektronix 1781 video measurement set
Oscilloscope measurements give the most objective indication of video signal quality, and an
oscilloscope is strongly recommended to anyone seriously involved in CCTV. First, with an oscilloscope
it is very easy to see the quality of the signal, bypassing any possible misalignment of the
brightness/contrast on a monitor. Sync/video levels can easily be checked, and the oscilloscope can
confirm whether a video signal has a proper 75-Ω termination, how far the signal has traveled
(reduction of the signal amplitude and loss of the high frequencies), and whether hum is induced in a
particular cable. Correct termination is always required for proper measurements: the input impedance
of an oscilloscope is high, so whichever way the signal is connected, the line needs to see 75 Ω at its
end. A few examples of how an oscilloscope should be connected for correct video measurement are
shown in the diagram on the previous page.
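Why termination matters follows from the voltage-divider rule. In this minimal sketch the camera output is modeled as an ideal generator with a 75 Ω output impedance and an open-circuit amplitude of 2 Vpp (chosen so that a correctly terminated line reads the standard 1 Vpp), and the oscilloscope input is modeled as a plain 1 MΩ resistor; both are simplifying assumptions, and cable losses are ignored.

```python
SOURCE_EMF_VPP = 2.0   # assumed open-circuit amplitude of the camera output
SOURCE_R = 75.0        # camera output impedance, ohms
SCOPE_R = 1e6          # assumed oscilloscope input resistance, ohms

def parallel(*resistors: float) -> float:
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors)

def measured_vpp(load_r: float) -> float:
    """Amplitude across the load, by the voltage-divider rule."""
    return SOURCE_EMF_VPP * load_r / (SOURCE_R + load_r)

cases = {
    "unterminated (scope only)": parallel(SCOPE_R),
    "terminated (scope + 75 ohm)": parallel(SCOPE_R, 75.0),
    "double-terminated (2 x 75 ohm)": parallel(SCOPE_R, 75.0, 75.0),
}
for name, load in cases.items():
    print(f"{name:31s} load {load:10.1f} ohm -> {measured_vpp(load):.2f} Vpp")
```

An unterminated line reads close to 2 Vpp and a double-terminated one about 0.67 Vpp, so a signal that looks too strong or too weak on the scope should always be checked against the termination first.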
Spectrum analyzer
Every electrical signal that changes (timewise) has an image in the frequency domain, as already
discussed in connection with Fourier theory. The frequency domain describes the signal amplitude versus frequency
instead of versus time. The representation in the frequency domain gives us a better understanding of
the composition of an electrical signal. The majority of the contents of the video signal are in the low
to medium frequencies, while fine details are contained in the higher frequencies. An instrument that
shows such a spectral composition of signals is called a spectrum analyzer.
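A spectrum analyzer displays what a Fourier transform computes: amplitude versus frequency. The sketch below uses a plain discrete Fourier transform on a hypothetical test signal made of a strong 1 kHz component (standing in for coarse content) and a weak 5 kHz component (standing in for fine detail). The sample rate and signal are illustrative assumptions, and real analyzers use swept or FFT-based hardware rather than this direct sum.

```python
import cmath
import math

def dft_amplitudes(samples):
    """Single-sided amplitude spectrum (bins 0..N/2) of a real signal, by direct DFT."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        # DC and Nyquist bins are not doubled; other bins fold in their negative-frequency twin.
        scale = 1.0 / n if k in (0, n // 2) else 2.0 / n
        amplitudes.append(abs(acc) * scale)
    return amplitudes

FS = 64_000   # assumed sample rate, Hz
N = 64        # number of samples, so bin spacing is FS / N = 1 kHz
signal = [1.0 * math.sin(2 * math.pi * 1_000 * t / FS)      # strong, coarse content
          + 0.2 * math.sin(2 * math.pi * 5_000 * t / FS)    # weak, fine detail
          for t in range(N)]

spectrum = dft_amplitudes(signal)
for k, amp in enumerate(spectrum):
    if amp > 0.01:
        print(f"{k * FS / N:6.0f} Hz: amplitude {amp:.2f}")
```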
A spectrum analyzer is an expensive device and is not really necessary in CCTV. However, if used
properly in combination with a test pattern generator of known spectral content, it can gather a lot of
valuable data. Video signal attenuation, proper cable equalization, signal quality, and so on can be
precisely determined. In broadcast TV, the spectrum analyzer is a must for making sure that the
broadcast signal falls within certain predefined standard limits.
For measuring the color characteristics of a video signal an instrument called a vectorscope is used. A
vectorscope is a variation of an oscilloscope, where the signal’s color phase is shown. The display of
a vectorscope is in polar form, where the primary colors have exact, known positions defined by angles and
radii. The vectorscope is rarely used in CCTV but could be necessary when specific colors and lighting
conditions need to be reproduced.
In most cases, a color CCD camera will have an automatic white balance that, as discussed earlier in
the color temperature section, compensates for various color temperature light sources. Sometimes,
however, with manual white balance cameras, a color test chart may need to be used, and with the help
of a vectorscope, colors can be fine-tuned to fall within certain margins, marked on the screen as little
square windows. It should be noted that a vectorscope display of the same image in NTSC is different
from the vectorscope display in PAL, and this is because of the difference of color encoding in the two
systems. PAL has vertically symmetrical color vectors, as can be seen in the photos above.
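The fixed positions on a vectorscope come from plotting the two color-difference signals as polar coordinates. This sketch assumes the standard-definition luminance weights (Y = 0.299R + 0.587G + 0.114B) and the usual scaling U = 0.492(B − Y), V = 0.877(R − Y), with gamma-corrected R, G, B between 0 and 1; angles are measured counterclockwise from the +U axis. It also shows why PAL vectors appear mirrored: PAL inverts the V component on alternate lines.

```python
import math

def to_polar(r: float, g: float, b: float, pal_alternate_line: bool = False):
    """Return (angle_deg, radius) of a color on a vectorscope display."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                      # scaled B - Y (horizontal axis)
    v = 0.877 * (r - y)                      # scaled R - Y (vertical axis)
    if pal_alternate_line:
        v = -v                               # PAL inverts V on every other line
    angle = math.degrees(math.atan2(v, u)) % 360.0
    radius = math.hypot(u, v)
    return angle, radius

BARS = {"yellow": (1, 1, 0), "cyan": (0, 1, 1), "green": (0, 1, 0),
        "magenta": (1, 0, 1), "red": (1, 0, 0), "blue": (0, 0, 1)}

for name, rgb in BARS.items():
    angle, radius = to_polar(*rgb)
    alt_angle, _ = to_polar(*rgb, pal_alternate_line=True)
    print(f"{name:8s} {angle:6.1f} deg, radius {radius:.3f} "
          f"(PAL alternate line: {alt_angle:6.1f} deg)")
```

For full-amplitude color bars this places red near 103°, yellow near 167°, and cyan near 283°; on PAL's alternate lines red lands near 257°, the mirror image about the horizontal axis.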
Many other practical instruments (designed for the broadcast industry really) can be used in CCTV.
With a little bit of understanding and willingness to learn, many features of a video component, or a
whole system, can be quantified. Some instruments combine several measuring functions in one box.
If you are serious about CCTV, these should be considered valuable tools of your trade.
Tektronix VM700 video measurement set
Television systems around the world
There are a number of variations of the three major systems: PAL, NTSC, and SECAM. Different countries
have adopted different broadcast bandwidths, color subcarrier frequencies, and sound carriers. These
variations are usually referred to with a suffix next to the name of the system a country uses.
The following tables show variations of the three major systems, and at the end of this chapter we list
most of the countries of the world with their respective standards.
With many newly designed TV sets and VCRs there is no need to know what standard you have, as
the set will automatically find the standard, but for technical people it is a good idea to know what is
in use.
With the new digital standards, hopefully, there will be much less variation around the world.
High-definition television (HDTV) has arrived. Many experiments and tests have been conducted, and
most importantly, the technology has now been developed to a stage where it can be mass produced.
Many countries have already started broadcasting HDTV and are slowly phasing out the old analog
TV. This is supposed to happen by the end of 2006 in the United States, and by the end of 2008 in
Australia. All terrestrial broadcasts will be in digital format, which will be a mix of standard definition
(SDTV) and HDTV.
Hopefully, it will not take too long for CCTV to follow suit.
The idea of high definition is to have approximately twice the resolution (horizontally and vertically,
which gives four times as much detail) and a new aspect ratio of 16:9 as opposed to the existing 4:3.
The reason for this widening of the TV screen is compatibility with the majority of movie formats. With
such resolution, HDTV offers picture quality close to that of 35 mm film and sound quality equal
to that of a compact disc.
HDTV has been worked on for over two decades now, and the first test broadcasts have been conducted
in Japan, Europe, and the United States.
In 1993 a group of institutions and companies was formed in order to evaluate the present technologies
and decide on key elements that will be at the heart of the best HDTV system. This group was called
The Grand Alliance, and some of its members include AT&T™, General Instrument Corporation™,
Massachusetts Institute of Technology (MIT), Philips™, David Sarnoff Research Centre™, Thomson™,
and Zenith™.
In 1995, the Alliance agreed to use MPEG-2 video compression and MPEG-2 system multiplexing, which
is the same format as used on DVD.
Two display modes have been proposed: interlaced and progressive (or noninterlaced) scanning.
HDTV is now one of the many digital television (DTV) standards, and it offers the best picture
quality. There are around 18 DTV formats, of which six are HDTV formats. Five of these are based
on progressive scanning and one is based on interlaced scanning.
Of the remaining formats, eight are SDTV (four wide-screen formats with a 16:9 aspect ratio, and four
conventional formats with a 4:3 aspect ratio), and the remaining four are video graphics array (VGA)
formats. Stations are free to choose which formats to broadcast.
The following formats are used in HDTV:
• 720p – 1280 × 720 pixels progressive
• 1080i – 1920 × 1080 pixels interlaced
• 1080p – 1920 × 1080 pixels progressive
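The HD frame sizes listed above can be checked against the 16:9 aspect ratio and the "twice the resolution, four times the detail" rule. The 720 × 576 standard-definition frame used for comparison is an assumption, not a figure from this section, and the comparison counts pixels only (SD pixels are not square).

```python
from math import gcd

SD = (720, 576)  # ASSUMED standard-definition reference frame (PAL digital)
HD = {"720p": (1280, 720), "1080i/1080p": (1920, 1080)}

def aspect_ratio(width: int, height: int) -> str:
    """Reduce width:height to lowest terms."""
    g = gcd(width, height)
    return f"{width // g}:{height // g}"

def detail_factor(h_scale: float, v_scale: float) -> float:
    """Scaling resolution in both directions multiplies the pixel count."""
    return h_scale * v_scale

for name, (w, h) in HD.items():
    ratio = (w * h) / (SD[0] * SD[1])
    print(f"{name:12s} {w} x {h}: aspect {aspect_ratio(w, h)}, "
          f"{w * h:,} pixels, {ratio:.1f} times the SD pixel count")

print(detail_factor(2, 2))  # twice the resolution each way -> 4 times the detail
```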
“Interlaced” or “progressive” refers to the scanning system. Interlaced scanning is the same type of
scanning as used in analog CCTV, as explained previously. With modern, larger and brighter television
sets, the limited persistence of the human eye becomes a problem, as viewers start to notice flicker.
Progressive scanning shows the whole picture, with every line produced one after another, making
full pictures 50 or 60 times a second (depending on the mains frequency region). This provides a much
smoother picture but uses slightly more bandwidth.
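The two scanning systems differ only in the order in which lines are drawn. The sketch uses a hypothetical 10-line picture (real systems use hundreds of lines) and models interlacing as two fields, the odd-numbered lines first and then the even-numbered lines.

```python
from typing import List

def progressive_order(lines: int) -> List[List[int]]:
    """One pass per picture: every line, top to bottom."""
    return [list(range(1, lines + 1))]

def interlaced_order(lines: int) -> List[List[int]]:
    """Two fields per picture: odd-numbered lines first, then even-numbered lines."""
    odd_field = list(range(1, lines + 1, 2))
    even_field = list(range(2, lines + 1, 2))
    return [odd_field, even_field]

print("progressive:", progressive_order(10))
print("interlaced: ", interlaced_order(10))
```

In a 50 Hz interlaced system each field takes 1/50 s, so any given line is refreshed only 25 times a second; progressive scanning at 50 pictures per second refreshes every line 50 times a second, which is where both the smoother picture and the extra bandwidth come from.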
The recommended viewing distance for HDTV is four times the picture height (4H), which allows for
a proper cinematic experience.
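The 4H rule is easy to turn into a distance for a given screen. For a 16:9 screen, the picture height follows from the diagonal by Pythagoras; the diagonals below are arbitrary examples.

```python
import math

def screen_height(diagonal: float, aspect_w: float = 16, aspect_h: float = 9) -> float:
    """Picture height of a screen with the given diagonal and aspect ratio."""
    return diagonal * aspect_h / math.hypot(aspect_w, aspect_h)

def viewing_distance(diagonal: float, heights: float = 4.0) -> float:
    """Recommended distance as a multiple of picture height (4H by default)."""
    return heights * screen_height(diagonal)

for diagonal_m in (0.8, 1.0, 1.5):
    print(f"{diagonal_m:.1f} m diagonal: height {screen_height(diagonal_m):.2f} m, "
          f"viewing distance about {viewing_distance(diagonal_m):.1f} m")
```

A 1 m diagonal 16:9 screen, for example, is about 0.49 m high, giving a viewing distance of roughly 2 m.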
The video compression algorithm in HDTV is MPEG-2, while the audio compression is AC-3. The
proposed transmission technique is 8-level vestigial sideband (8-VSB) modulation. The selected audio
technology is an eight-channel, CD-quality, digital surround-sound system, using one of Dolby’s cinema
surround-sound techniques.
Digital terrestrial transmission broadcast (DTTB) brings to an end the direct relationship between one
television program and one frequency. DTTB is capable of carrying either one HDTV program or
up to six services using standard definition television (SDTV), or as many as 10 services with lower
definition formatting. As with computer technology, it is possible to trade off the bit rate, the channel
width, and picture quality.
Essentially, the type of picture determines how much of the channel’s capacity is needed for transmission.
A digital terrestrial broadcast channel can carry up to 20 Mb/s of data. HDTV services would use most,
if not all, of this capacity, but an SDTV service would use considerably less depending on the nature
of the service. Fast-moving sports, for example, could require up to 10 Mb/s, and hence
possibly only two such services could be delivered at one time. By comparison, a talking-head
picture would need only about 5 Mb/s.
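The bit-rate trade-off described above is simple division. This sketch uses the text's approximate figures (about 20 Mb/s per channel, about 10 Mb/s for fast-moving sport, about 5 Mb/s for a talking head) and ignores the capacity a real multiplex reserves for audio, data, and signaling.

```python
CHANNEL_MBPS = 20.0   # approximate capacity of one digital terrestrial channel

def services_that_fit(service_mbps: float, channel_mbps: float = CHANNEL_MBPS) -> int:
    """Whole number of identical services that fit in one channel."""
    return int(channel_mbps // service_mbps)

def mix_fits(rates_mbps, channel_mbps: float = CHANNEL_MBPS) -> bool:
    """True if a mix of services with different rates fits in one channel."""
    return sum(rates_mbps) <= channel_mbps

print(services_that_fit(10.0))        # fast-moving sport: 2
print(services_that_fit(5.0))         # talking head: 4
print(mix_fits([10.0, 5.0, 5.0]))     # one sport + two talking heads: True
```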
DTTB systems can accommodate 6, 7, and 8 MHz channel spacings with minimal or no apparent cost
disadvantage. Australia uses 7 MHz channel spacing for analog services, the United States uses 6 MHz,
and Europe commonly uses 8 MHz, although there is also some 7 MHz use.
DTTB can be accommodated within the existing broadcasting frequency bands, generally in UHF but
also in VHF bands, using vacant channels adjacent to analog services.
These channels often cannot be used for additional analog services because of technical constraints
inherent in analog systems, but they can be used for DTTB because DTTB receivers are expected to
tolerate higher levels of co-channel and adjacent channel interference.
HDTV will naturally be more exciting to watch, and the clarity and resolution of the images will
allow for much bigger screens. As long as such screens are based on CRT technology, with which high
resolution can be achieved, we cannot expect a diagonal size larger than 1 m. But with one of the new
display technologies, such as the plasma display, the FED, or the DMD, not excluding the LCD (all of
which are discussed in Chapter 6), we will certainly see larger screen sizes, most probably limited only
by the room size and the viewing distance.
For CCTV, HDTV screen size will not be so critical, and high-definition CRT monitors are adequate, since
the majority of security operators and users watch the screens from a very close distance. This is not to
say, however, that new control room designs will not take a different approach, where one or two large
monitors viewed from across the room will be the main control displays.
5. CCTV cameras
The very first and most important element in the CCTV chain is the element that captures the images
– the camera.
General information about cameras
The term camera comes from the Latin camera obscura, which means “dark room.”
This type of room was an artist’s tool in the Middle Ages. Artists used a lightproof room, in the form of
a box, with a convex lens at one end and a screen that reflected the image at the other, to trace images
and later produce paintings.
In the nineteenth century, “camera” referred to a device for recording images on film or some
other light-sensitive material. It consisted of a lightproof box, a lens through which light entered
and was focused, a shutter that controlled the duration of the lens opening, and an iris that
controlled the amount of light that passed through the glass.
Joseph Nicéphore Niépce produced the first negative film image in 1826. This is considered the birth
of photography. Initially, such photographic cameras did not differ much from the camera obscura
concept. They were in the form of a black box, with a lens at the front and a film plate at the back. The
initial image setup and focusing were done on an upside-down projection, which a photographer could
see only when he or she was covered with a black sheet.
The first commercial photographic cameras had a mechanism for manual transport of the film between
exposures and a viewfinder, or eyepiece, that showed the approximate view as seen by the lens.
Today, we use the term camera in film, photography, television, and multimedia. Cameras project
images onto different targets, but they all use light and lenses.
To understand CCTV you do not need to be an expert in cameras and optics, but it helps if you understand
the basics. Many things are very similar to what we have in photography, and since every one of us
has been, or is, a family photographer, it will not be very hard to make a correlation between CCTV
and photography or home video. In photographic and film cameras, we convert the optical information
(images) into a chemical emulsion imprint (film). In television cameras, we convert the optical
information into electrical signals. They all use lenses with certain focal lengths and certain angles of
view, which differ for different image formats.
Lenses have limited resolution and certain distortions (or aberrations), but these are more obvious in
film cameras. This is because film resolution is still far better than electronic camera resolution,
although higher resolution chips are coming out daily.
To illustrate, high-resolution CCD chips in CCTV these days have about 752 × 582 pixels (picture
elements), while 100 ISO 35 mm color negative film has a resolution equivalent to about 8000 × 6000
elements (film grains). This is based on a typical film resolution of 120 line pairs per millimeter
(lp/mm).
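The film figure can be checked from the quoted 120 lp/mm. This sketch assumes the standard 36 mm × 24 mm frame of 35 mm still film and counts two picture elements per line pair (one light line and one dark line).

```python
FRAME_MM = (36, 24)   # assumed 35 mm still-frame size, width x height
LP_PER_MM = 120       # film resolution quoted in the text
CCD_PIXELS = 752 * 582

def film_elements(width_mm: float, height_mm: float, lp_per_mm: float):
    """Equivalent picture elements: two elements (light + dark) per line pair."""
    return 2 * lp_per_mm * width_mm, 2 * lp_per_mm * height_mm

w, h = film_elements(*FRAME_MM, LP_PER_MM)
print(w, h)                          # 8640 5760
print(round(w * h / CCD_PIXELS))     # 114
```

The result, 8640 × 5760, is what the text rounds to about 8000 × 6000, some 114 times the element count of a 752 × 582 CCD chip.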
Photo courtesy of Sarnoff Corp.
One of the very early television cameras from 1931
In 1997, another type of camera emerged on the market. This camera is used with computers for both
video conferencing and digital image storage. A camera like this uses a CCD chip as an imaging device,
but instead of producing an analog electronic signal or projecting the image on film, it converts the image
to digital format and stores it on a micro disk or RAM-card in the camera, so it can be transferred to a
computer. Although most such cameras produce still images, models with real-time video in digital
format are already appearing.
Tube cameras
The first experiments with television cameras, as mentioned earlier, were made in the 1930s by the
Russian-born engineer Vladimir (Vlado) Zworykin (1889–1982). His first camera, made in 1931, focused
the picture onto a mosaic of photoelectric cells. The voltage induced in each cell was a measure of the
light intensity at that point and could be transmitted as an electrical signal. The concept, with small
modifications, remained the same for decades.
Those first cameras were made with a glass tube and a light-sensitive phosphor coating on the inside
of the glass. We now call them tube cameras.
Tube cameras work on the principles of photosensitivity, based on the photo-effect. This means the
light projected onto the tube phosphor coating (called the target) has sufficient energy to cause the
ejection of electrons from the phosphor crystal structure. The number of electrons is proportional to
the light, thus forming an electrical representation of the light projection.
There were basically two main types of tubes used in the early days of CCTV: Vidicon and Newvicon.
Vidicon was cheaper and less sensitive. It had a so-called automatic target voltage control, which
effectively controlled the sensitivity of the Vidicon and, indirectly, acted as an electronic iris control,
as we know it today on CCD cameras. Therefore, Vidicon cameras worked only with manual iris lenses.
The minimum illumination required for a B/W Vidicon camera to produce a signal was about 5–10 lx
reflected from the object when using an F-1.4 lens.
A studio tube camera from 1952
Newvicon tube cameras were more sensitive (down to 1 lux) and more expensive, and required auto iris
lenses. Their physical appearance was the same as the Vidicon tube, and one could hardly determine
which type was which by just looking at the two. Only an experienced CCTV technician could notice the
slight difference in the color of the target area: the Vidicon has a dark violet color, while the Newvicon
has a dark bluish color. The electronics that control these two types of tubes are different, and on the
outside of the camera the Newvicon type has an auto iris connection.
All tube cameras use the principles of electromagnetism, where the electron beam scans the target from
the inside of the tube. The beam is deflected by the locally produced EMF, which is generated by the
camera electronics. The more light that reaches the photoconductive layer of the target, the lower its
local resistance will be. When an image is projected, it creates a potential map of itself by the
photosensitivity effect. When the analyzing electron beam scans the photosensitive layer, it
neutralizes the positive charges created, so that a current flows through the local resistance. When
the electron beam hits a particular area of the potential map, an electrical current proportional to the
amount of light is discharged. This is a very low current, in the order of picoamperes (1 pA = 10⁻¹² A),
which is fed into a video preamplifier with a very high input impedance, from which a video voltage
signal is produced. For a tube camera it is important to have a thin and uniform photo layer. This layer
produces the so-called dark current, which exists even if there is no image projected by the lens (iris
closed).
After a signal has been formed, the rest of the camera electronics add sync pulses, and at the output of
the camera we get a complete video signal, known as composite video.
There are a few important concepts used in the operation of tube cameras, which we need to briefly
explain in order to appreciate the differences between this and the new, CCD, technology.
The first concept is the physical bulkiness of the camera as such, due to the glass tube, electromagnetic
deflection yoke around the tube, and the size of the rest of the electronic components in the era when
surface mount components were unknown. This made tube cameras quite big.
The second concept is the need for a precise alternating electromagnetic field (EMF), which forces
the electron beam to scan the target area as per the television recommendations. Using an EMF for
the scanning means that an external EMF from some other source may affect the beam scanning, causing
picture distortions.
Third is the requirement for a high voltage (up to 1000 V), which accelerates the electron beam and
gives it straight paths when scanning. Consequently, high-voltage components need to be used in the
camera, which are always a potential problem for the electronic circuit’s stability. Old and high-voltage
capacitors may start leaking, moisture can create conductive air around the components, and electric
sparks may be produced.
Fourth, there is the need for a phosphor coating of the target, which converts the light energy into
electrical information. The phosphor as such is subject to constant electron bombardment that wears
it out. Therefore, the life expectancy of a tube phosphor coating is limited. With constant camera
usage, as is the case in CCTV, a couple of years will be the realistic life expectancy, after which the
picture starts to fade, or an imprinted image develops if the camera constantly looks at
the same object. As a result, in pictures from such a worn tube camera, people moving through the
scene appear as ghostlike figures, semitransparent against the imprinted image.
And the fifth feature, conceptually different from the CCD cameras used today (and, again, this feature
can be considered as a drawback) and inherently part of the tube camera design itself, consists of
geometrical distortions due to the beam hitting the target at various angles. The path of the electron
beam is shorter when it hits the center of the target as compared to when it scans the edges of the tube.
Therefore, certain distortions of the projected image are present. In a lot of tube camera designs, we
will find some magnetic and electronic corrections for such distortions, which means every time a tube
needs to be replaced, all of these adjustments have to be remade.
With the new CCD technology, none of the above problems exists in cameras. One tube feature,
however, was very hard to beat in the early days of CCD technology: the resolution of a good tube
camera. Vertical resolution depends on the scanning standard, and this is more or less the same for
both CCD and tube cameras, but the horizontal resolution (i.e., the number of vertical lines that can be
reproduced) depends on the thickness of the electron beam. Since this can be quite successfully
controlled by the electronics itself, very fine details can be reproduced (i.e., analyzed while scanning).
Initially, with the CCD design, microelectronics technology was not able to offer picture elements
(pixels) of the CCD chip smaller than the beam cross section itself. This means that in the very
beginning of CCD technology, the resolution lagged well behind that of the tube cameras.
CCD cameras
In the 1970s, when personal computers were born, experiments were made with solid-state electronic
elements called charge-coupled devices (CCD), which were initially intended to be used as memory
devices. Very soon it was found that CCDs are very sensitive to light, so they could be used more effectively
as imaging devices than as memory devices.
The basic principle of CCD operation is the storing of information as electrical charges in the
elementary cells and then, when required, shifting these charges to the output stage.
When a CCD chip is used as an imaging device, the shifting concept stays the same, but instead of
injecting charge packets as digital information (which would be the case if the CCD chip is used as a
memory device), we have a photo-effect generating electrons proportional to the amount of light
falling on the imaging area, and then, these charges are shifted out vertically and/or horizontally, in
the same manner as shift registers in digital electronics shift binary values.
Fifteen years apart, and they do the same job (tube and CCD camera).
Electrons in a CCD chip are generated by photons.
So, in effect, we have charge packets, once they have been collected in each photosensitive cell, “sliding
down” to the output stage by using charge-coupling methods. Thus, an electrical coupling is done by
means of voltage and timing manipulation of each cell, called a picture element (or pixel).
One of the pioneers of CCD technology, Gilbert Amelio, in his article “Charge Coupled Imaging
Devices” written in 1974, describes charge coupling as “a collective transfer of all the mobile electric
charge stored within a semiconductor storage element to a similar, adjacent storage element by the
external manipulation of voltages. The quantity of the stored charge in this mobile packet can vary
widely, depending on the applied voltage and on the capacitance of the storage element. The amount
of electric charge in each packet can represent information.”
CCD chips are constructed either as a line (linear CCD) or as a two-dimensional
matrix (array CCD). It is important to understand that they are composed of discrete pixels, but
CCDs are not digital devices. Each of these pixels can have any number of electrons, proportional to
the light that falls onto it, thus representing analog information.
These discrete packets of electrons are then transferred (once the exposure time is over), by simultaneous
shifting of row and column packets, to the output stage of the chip.
This is why we can say CCDs are, in essence, analog shift registers sensitive to light.
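The analog shift-register idea can be illustrated with a toy readout. This sketch assumes an idealized, lossless transfer in a tiny 2 × 3 array: on each vertical transfer every row moves down one place and the bottom row drops into a horizontal (serial) register, which is then clocked out one packet at a time to the output stage. The charge values are arbitrary electron counts; any amount is possible, which is why the chip is analog despite its discrete pixels. The register layout and readout order are illustrative assumptions, not those of a specific chip.

```python
def read_out(pixels):
    """Shift every charge packet out of a 2-D array, returning them in readout order."""
    rows = [list(row) for row in pixels]   # work on a copy; readout empties the array
    width = len(rows[0])
    output = []
    for _ in range(len(pixels)):
        # Vertical transfer: all rows move down one place; the bottom row
        # drops into the horizontal (serial) register, an empty row enters at the top.
        serial_register = rows.pop()
        rows.insert(0, [0] * width)
        # Horizontal transfer: one packet per clock reaches the output stage,
        # assumed here to sit at the left end of the serial register.
        while serial_register:
            output.append(serial_register.pop(0))
    return output

image = [[100, 2500, 40],     # top row (electron counts, arbitrary values)
         [900, 0, 1800]]      # bottom row, nearest the serial register
print(read_out(image))        # [900, 0, 1800, 100, 2500, 40]
```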
Today, CCDs are not used as memory devices, but mostly as imaging devices. They can be found in many
objects of daily use: facsimile machines use linear CCD chips; picture and OCR scanners also use linear
CCDs; many auto-focusing photographic cameras use CCD chips for auto-focusing; geographic aerial
monitoring, spacecraft planet scanning, and industrial inspection of materials also use linear CCD
cameras; and last, but not least, most of the television cameras these days, both in broadcast and
CCTV, use CCD chips.
CCD cameras have many advantages (in design) over tube cameras, although, as mentioned earlier, in
the beginning it was hard to achieve high resolution similar to what the tube cameras had. These days,
however, the technology is at such a level that high resolution is no longer a problem.
The main advantages that CCD cameras have over tube cameras are:
• Very low minimum illumination performance (down to 0.1 lx at the object);
• No geometrical distortions, due to a precise two-dimensional construction;
• Low power consumption;
• No need for high voltage for beam acceleration;
• Small size;
• No influence of external EMFs; and, most importantly,
• Practically unlimited lifetime, since generating electrons by the photo-effect does not wear out the chip.
Line scan CCD chips are used in satellite imaging.
As we said earlier, CCDs come in all shapes and sizes, but the general division is into linear and
two-dimensional matrices. Linear chips are used in applications where there is only one direction of
movement by the object (as with facsimile machines or scanners).
In CCTV we are only interested in two-dimensional matrices, the so-called 2/3", 1/2", 1/3", and 1/4"
formats. As mentioned earlier, these numbers are not the diagonal sizes of the chips, as many assume,
but rather refer to the diameter of the tube that would produce an image of that size.
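The gap between the nominal designation and the real chip size is easy to see numerically. The imaging-area dimensions below are the commonly quoted values for each format and are assumptions, not figures from this section.

```python
import math

MM_PER_INCH = 25.4
# Assumed (commonly quoted) imaging-area sizes in mm: nominal format -> (width, height)
FORMATS = {
    '2/3"': (8.8, 6.6),
    '1/2"': (6.4, 4.8),
    '1/3"': (4.8, 3.6),
    '1/4"': (3.6, 2.7),
}
NOMINAL_INCHES = {'2/3"': 2 / 3, '1/2"': 1 / 2, '1/3"': 1 / 3, '1/4"': 1 / 4}

for name, (w, h) in FORMATS.items():
    diagonal = math.hypot(w, h)
    nominal_mm = NOMINAL_INCHES[name] * MM_PER_INCH
    print(f"{name:4s} {w} x {h} mm: diagonal {diagonal:4.1f} mm, "
          f"nominal size {nominal_mm:4.1f} mm")
```

A 1/3" chip, for instance, has a diagonal of about 6 mm, well below the 8.5 mm that one-third of an inch would suggest.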
Sensitivity and resolution of the CCD chips
Comparing sensitivities will show us the advantage of CCD chips relative to the Vidicon and Newvicon
tubes, but also relative to a film emulsion.
The 100 ISO film is the most commonly used in photography, although we can buy 200 ISO film
(twice as sensitive) or 400 ISO (four times more sensitive than the 100 ISO film). Sometimes, we may
even come across 1600 ISO film, and this is usually used for extremely low light level situations (at
least in photographic terms).
It can be shown that an average B/W CCD chip has a very high light sensitivity compared to a film
emulsion. On a full sunny day, a typical 100 ISO film camera will require a setting of 1/125 s and F-16.
When the same scene is observed by a CCD camera, whose normal CCIR exposure time is 1/50 s,
a lens with approximately F-1000 needs to be used (give or take an F-stop or two, since the camera’s
AGC plays a role too). If we convert the 1/50 s to 1/125 s (2.5 times shorter), then in order to have the
same exposure the lens needs to admit 2.5 times more light, that is, an opening about 1.3 F-stops wider
(log₂ 2.5 ≈ 1.3). Since the light admitted is proportional to 1/F², this brings us from F-1000 to
approximately F-630 (1000/√2.5 ≈ 632), between F-500 and F-720 (remember the F-numbers:
1.4, 2, 2.8, 4, 5.6, 8, 11, 16, 22, 32, 44, 64, 88, 128, 180, 250, 360, 500, 720, 1000, 1400, etc.). Now,
in order to convert the sensitivity of the film emulsion from the 100 ISO settings of 1/125 s and
F-16 to the equivalent settings of a higher film sensitivity, and knowing that double sensitivity occurs
with doubling the ISO number, we get about 10.6 F-stops from F-16 to F-632, which is a factor of
(632/16)² ≈ 1,560. So, the average B/W CCD chip sensitivity, expressed in photographic ISO units, is
approximately 100 ISO × 1,560 ≈ 156,000 ISO!
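The film-speed comparison can also be computed directly from the fact that the light reaching the film or chip is proportional to t/N², where t is the exposure time and N the F-number; a sensor that needs less light for a correct exposure is proportionally "faster." This is a simplified sketch that ignores AGC, lens transmission, and spectral differences, and it uses the approximate settings quoted in the text as inputs.

```python
import math

def equivalent_iso(film_iso, film_t, film_n, cam_t, cam_n):
    """ISO rating a sensor would have if it needs (cam_t, cam_n) where film needs (film_t, film_n)."""
    film_light = film_t / film_n ** 2   # relative light the film needs
    cam_light = cam_t / cam_n ** 2      # relative light the CCD camera needs
    return film_iso * film_light / cam_light

iso = equivalent_iso(film_iso=100, film_t=1 / 125, film_n=16, cam_t=1 / 50, cam_n=1000)
stops = math.log2(iso / 100)            # sensitivity difference in F-stops
print(f"about {iso:,.0f} ISO, {stops:.1f} F-stops faster than 100 ISO film")
```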
Similarly, we will find that a color CCD camera has the equivalent sensitivity of approximately
5000 ISO, which is still very high compared to photographic standards. Admittedly, such a high
sensitivity picks up quite a lot of noise, so in practice the sensitivity is reduced somewhat in order to
minimize noise. Noise increases with temperature, and unfortunately it cannot be avoided unless the
CCD chip is very cold. For special applications, such as astronomy, CCD chips are cooled down to
−30 °C, or even lower, in order to get a clean image, with as little noise as possible.
Chemical (film) photography is slowly but surely being replaced with electronic cameras using
CCD chips. Such still cameras are not dependent on the TV standard; therefore, there is no practical
limitation on the number of pixels and aspect ratio. Even as this book is being written, manufacturers
are producing chips measuring 62 mm × 62 mm, with no less than 5120 × 5120 picture elements. As
already mentioned, these are still cameras and should not be confused with CCTV cameras.
The spectral sensitivity of CCD chips varies with various silicon substrates, but the general characteristic
is a result of the photo-effect phenomenon: longer wavelengths penetrate deeper into the CCD silicon
structure. This refers to the red and infrared light. A typical CCD chip spectral curve is shown on the
drawing below.
Even though this “penetration” may seem beneficial (CCD chips seem more sensitive), there are reasons
for preventing some of the longer waves from getting too deep inside the chip. Such wavelengths might
be so strong that they could produce electron carriers in areas that are not supposed to be exposed to
light. As a result, the picture may lose details because neighboring pixels will spill their content into
each other, losing high-resolution components and causing a “blooming effect.” The masked areas,
which are supposed to only temporarily store charges and are not supposed to be exposed to light, can
also be affected, so that noise and smear increase significantly. For these reasons, special optical
infrared cut filters have been introduced as part of a well-designed CCD camera. These filters are
optically precise plane-parallel pieces of glass, mounted on top of the CCD chips. As the name
suggests, they behave as optical low-pass filters (in terms of light frequency), with a cutoff wavelength
near 700 nm, that is, near the red end of the visible spectrum.
There are a number of manufacturers of B/W cameras, however, that prefer not to put such filters on
their chips, to make their cameras more sensitive. This might be acceptable, especially when cameras
for lower light levels need to be used or if infrared illuminators are to be part of the system. However,
from a theoretical point of view, cameras with infrared cut filters will show better resolution (compared
to the same chip without an IR cut filter), better S/N ratio, and more natural color-to-gray conversion,
at the expense of a higher minimum illumination.
Infrared cut filter modifies the CCD response.
B/W CCD camera chip without infrared cut filter
Color CCD cameras, on the other hand, must use an IR cut filter, as the CCD chip’s spectral response,
which we saw is different from that of the eye, must be made similar to the human eye’s spectral
sensitivity. This is also one of the reasons color CCD cameras are less sensitive than B/W ones.
A typical B/W CCD chip without an infrared cut filter can produce a reasonable level of video signal
with as little as 0.01 lx. The same camera with a filter will be quoted at 0.1 lx for the same object.
Color CCD camera chip with infrared cut filter
Color cameras these days are quoted to have 1 lx minimum illumination at the object, with an F-1.4
lens producing a video signal of a reasonable level (0.3 to 0.5 V).
CCD technology is now at a stage where a few million pixels are no longer a problem. In digital
photography, 6-million-pixel CCD chips are very common, and manufacturers are now trying to go
even higher. In the CCTV area, however, we are limited by the analog resolution of the TV system.
Because of that, we do not have any more than 752 × 582 pixels, for example, which is a bit over
400,000 pixels. We will have more to say later about resolution and how it is measured, but we should
also mention some interesting CCD products, which, strictly speaking, are not CCTV cameras as such
but offer high-resolution solutions.
One is the solution by Spectrum San Diego Inc., called SentryScope, which produces images of 21 million pixels using a 2048-pixel line scan CCD chip. The image is captured much like satellites scan the Earth, using a mirror that scans a wide area of around 10,000 pixels. This camera does not produce a video signal as such; it captures only a few images per second, which, however, are extremely detailed.
Photo courtesy of Spectrum San Diego Inc.
The SentryScope 21-million-pixel CCD camera gives incredible detail in one shot.
Other interesting new solutions are on the horizon. An example is the extended-definition camera by Co-Vi, which uses an extended-definition CCD chip of 1280 × 720 pixels. The image is electronically cropped to produce normal-resolution video, but the camera offers electronic panning and zooming across the nearly 1 million pixels of real estate. So the effect for the user is that, even though a fixed camera is used, it is possible to zoom electronically into objects a couple of times, or to pan around to see more details.
Photo courtesy of Co-Vi
The “nearly” HD CCTV camera
Yet another interesting solution, used by some designers, employs standard-resolution CCD cameras arranged in a matrix of 3 × 3, or even 4 × 4, cameras, all looking at the same object but fitted with narrow-angle lenses positioned so that their fields of view overlap only a little. The image obtained is then displayed on a video wall made up of 3 × 3 or 4 × 4 monitors, offering a total resolution of 3.6 million or 6.4 million pixels. The end result is a large image with plenty of detail, all of which can be recorded on a standard-definition digital video recorder.
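The pixel counts quoted in this section follow from simple multiplication. The sketch below reproduces them; the 10,000 scan positions for SentryScope and the round 400,000 pixels per camera in the matrix solution are assumptions taken from the approximate figures in the text, not from any data sheet.

```python
# Pixel-count arithmetic for the imaging devices described above.
# Assumptions: the SentryScope image is taken as 2048 pixels x about
# 10,000 scan positions, and each camera in the matrix solution is taken
# to contribute a round 400,000 pixels (0.4 Mpixel).


def megapixels(width: int, height: int) -> float:
    """Pixel count of a width x height array, in millions."""
    return width * height / 1_000_000


def matrix_megapixels(cameras_per_side: int, mpix_per_camera: float = 0.4) -> float:
    """Total pixels of an n x n camera matrix, ignoring the small overlap."""
    return cameras_per_side ** 2 * mpix_per_camera


if __name__ == "__main__":
    print(f"752 x 584 CCTV CCD:    {megapixels(752, 584):.2f} Mpixel")
    print(f"1280 x 720 Co-Vi chip: {megapixels(1280, 720):.2f} Mpixel")
    print(f"SentryScope scan:      {megapixels(2048, 10_000):.1f} Mpixel")
    for n in (3, 4):
        print(f"{n} x {n} camera matrix: {matrix_megapixels(n):.1f} Mpixel")
```

Running it gives about 0.44, 0.92, 20.5, 3.6, and 6.4 million pixels; the SentryScope estimate lands close to the 21 million quoted, the difference being due to the approximate scan width.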
Types of charge transfer in CCDs
Matrix (array) CCD chips, as used in CCTV, can be divided into three groups based on the techniques
of charge transfer.
The very first design, dating from the early 1970s, is known as frame transfer (FT). This type of CCD chip is effectively divided into two areas of equal size, one above the other: an imaging area and a masked area.
The imaging area is exposed to light for 1/50 s for a CCIR standard video (1/60 s for EIA). Then,
during the vertical sync period, all photogenerated charges (electronically representing the optical
image that falls on the CCD chip) are shifted down to the masked area (see the simplified drawing
on the next page). Basically, the whole “image frame” comes down.
Note the upside-down appearance of the projected image, since that is how it looks in a real situation; that is, the lens projects an inverted image, and the bottom right-hand pixel is recreated in the top left-hand corner when displayed on a monitor.
For the duration of the next 1/50 s, the imaging area generates the electrons of the new picture frame, while the electron packets in the masked area are shifted out horizontally, line by line. The electron packets (current) from each pixel are put together in one signal and converted into a voltage, creating the information for one TV line.
Technically, perhaps, it would be more precise to call this mode of operation “field transfer” rather than
“frame transfer,” but the term field transfer has been used since the early days of CCD development
and we will accept it as such.
This first design of the CCD chip was good. It had surprisingly good sensitivity, better than Newvicon tubes and much better than Vidicon, but it came with a new problem that was unknown to tube cameras:
vertical smearing. In the time between subsequent exposures when the charge transfer was active,
nothing stopped the light from generating more electrons. This is understandable since electronic
cameras do not have a mechanical shutter mechanism as photographic or film cameras do. So where
intense light areas were present in the image projection, vertical bright stripes would appear.
To overcome this problem, design engineers invented a new way of charge transfer called interline transfer (IT). The difference here (see the simplified drawing) is that the exposed picture is not transferred down during the vertical sync pulse period but is shifted sideways into the adjacent masked columns. The imaging and masked columns sit next to each other and interleave, hence the name interline. Since the masked pixel columns are immediately beside the imaging pixel columns, the shifting is considerably faster; therefore, there is not much time for bright light to generate an unwanted signal, the smear.
Charge-coupled device principle of operation
Frame transfer (FT) concept
To be more precise, the smear is still generated but in a considerably smaller amount. As a result, we
also have a much higher S/N ratio.
There is one drawback to IT chips, which is obvious from the concept itself: in order to fit the masked columns next to the imaging columns within the same area as the earlier FT design, the size of the light-sensitive pixels had to be reduced. This reduces the sensitivity of the chip. Compared to the benefits gained, however, this drawback is of little significance.
One new and interesting benefit is the possibility of implementing an electronic shutter in the CCD design. This is an especially attractive feature: the natural exposure time of 1/50 s (1/60 s for NTSC) can be electronically controlled and reduced to whatever shutter speed is necessary, while still producing a 1 Vpp video signal.
Interline transfer (IT) concept
Initially, with the IT chip, manual control of the CCD shutter was offered, but an automatic version soon followed. This type of control is known as an automatic CCD-iris, or electronic iris. The electronic iris removes the need for auto iris (AI) lenses, so a manual iris (MI) lens can be used with an electronic iris camera even in an outdoor installation. It should be noted, however, that an electronic iris cannot substitute for the depth of field effect produced by the mechanical iris in a lens. It should also be remembered that when the electronic iris switches to higher shutter speeds, the smear increases because of lower charge transfer efficiency.
So, when the electronic iris is enabled, it switches from the normal exposure of 1/50 s (1/60 s) to a higher shutter speed (shorter duration), depending on the light situation. In principle, exposures longer than 1/50 s (1/60 s for EIA) are not used, because motion would no longer be reproduced smoothly. With some CCD cameras, however, longer exposures are possible, and this mode of operation is called integration. In some of the latest camera designs incorporating digital signal processing, integration is automatically turned on when object illumination falls below a certain level. This is especially helpful with color cameras, as it lets them produce low light level pictures that until now were possible only with B/W cameras. The price paid for this is the loss of smoothness in motion (in integration mode we cannot have 50 fields per second), which is replaced by a motion appearance similar to playback from a time-lapse (TL) VCR.
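The decision an electronic iris makes can be sketched as a small function. This is only an illustrative model, assuming that the video level is proportional to light multiplied by exposure time; the shutter steps, the `relative_light` scale, and the eight-field integration limit are hypothetical choices, not values from any particular camera.

```python
import math

# CCIR field period; use 1 / 60 for EIA.
FIELD_PERIOD_S = 1 / 50

# Hypothetical shutter steps an electronic iris might choose from (seconds).
SHUTTER_STEPS_S = [1 / 50, 1 / 100, 1 / 250, 1 / 500,
                   1 / 1000, 1 / 2000, 1 / 4000, 1 / 10000]


def choose_exposure(relative_light: float, integration: bool = False,
                    max_fields: int = 8) -> float:
    """Return an exposure time in seconds.

    relative_light is 1.0 when the normal 1/50 s exposure just gives full
    video. The model assumes video level is proportional to light x time.
    """
    if relative_light <= 0:
        raise ValueError("relative_light must be positive")
    needed = FIELD_PERIOD_S / relative_light  # exposure that gives full video

    if needed <= FIELD_PERIOD_S:
        # Enough light: take the longest shutter step that does not exceed it.
        for step in SHUTTER_STEPS_S:
            if step <= needed:
                return step
        return SHUTTER_STEPS_S[-1]  # brighter than the fastest shutter handles

    if not integration:
        return FIELD_PERIOD_S  # normal exposure; the picture simply gets darker

    # Integration: accumulate charge over several fields (motion becomes jerky).
    fields = min(max_fields, math.ceil(needed / FIELD_PERIOD_S - 1e-9))
    return fields * FIELD_PERIOD_S
```

For example, `choose_exposure(4.0)` picks 1/250 s in bright light, while `choose_exposure(0.3, integration=True)` integrates over four fields (0.08 s), which is where the time-lapse-like motion comes from.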
Reducing the pixel size in the IT design, as we said, indirectly worsens the chip's minimum illumination performance. This problem can be solved with a very simple concept (although technologically not as easy) of putting a micro lens on top of every pixel. Micro lenses concentrate all of the light that falls on them onto a smaller area, namely the pixel itself, and effectively improve the minimum illumination performance. The most common CCD cameras in CCTV today use IT chips.
Comparison between a conventional on-chip micro lens and Sony's new Exwave concept
An electron microscope photo of the on-chip micro lens structure
A typical cross section of an IT CCD chip with a micro lens on top of every pixel is shown on the
drawing on the next page. As can be seen, the micro structure of the chip becomes quite complex when
a high-quality signal needs to be produced.
The best design so far is the latest frame interline transfer (FIT) chip, offering all the features of interline transfer plus even less smear and a better S/N ratio. As can be concluded from the simplified drawing, the FIT CCD works as an interline transfer chip in the top part, thus retaining electronic iris control, but instead of holding the image in the masked columns for the duration of the next field exposure, it shifts the image down to the better-protected masked area.
This is the reason for even less smearing in the FIT design, and there is also a gain in the S/N ratio. Micro lenses are also used here to improve the minimum illumination performance. FIT chips have an even more advanced micro structure, with many cells and areas designed to prevent excessive charges from spilling into the surrounding area, to trap thermally generated electrons, and so on. With all these refinements, FIT chips have a very high dynamic range, low smear, and a high S/N ratio, which makes them ideal for outdoor shooting and news gathering in broadcast TV. In broadcast TV, these types of cameras are usually referred to as electronic news gathering (ENG) cameras.
Given this complexity, these chips are expensive for CCTV, and their main use is in broadcast TV.
In the end we should point out that no
matter how good the camera electronics
are, if the source of information – the
CCD chip – is of an inferior quality,
the camera will be inferior too. The
opposite statement is also true. That
is, even if the CCD chip is of the best
quality, if the camera electronics cannot
process it in the best possible way, the
total package becomes second class.
A typical structure of the cross section of a CCD
chip with micro lens design
Frame interline transfer (FIT) concept
It should also be noted that most of the handful of chip manufacturers have CCD products of the same
type divided into a few different classes, depending on the pixel quality and uniformity. Different
camera manufacturers may use different classes of the same chip type. This is in the end reflected not
only in the quality but also in the price of the camera.
Array CCD chips come in various sizes.
Pulses used in CCD for transferring charges
The quality of a signal as produced by the CCD chip depends also on the pulses used for transferring
charges. These pulses are generated by an internal crystal oscillator in the camera. The oscillator frequency depends on many factors, but mostly on the number of pixels the CCD chip has, the type of charge
transfer (FT, IT, or FIT), as well as the number of phases used for each elementary shifting of charges;
namely, the elementary shifting can be achieved with a two-phase, three-phase, or four-phase shift
pulse. In CCTV, cameras with three-phase transfer pulses are the most common.
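A three-phase register can be modelled in a few lines to show why three clock phases are enough to move charge in one direction. This is a deliberately simplified sketch: in a real device the phases overlap (two gates are briefly high together) so the charge spreads and then collapses under the next gate, whereas here each clock step is reduced to a single hop. The function name and data layout are hypothetical.

```python
from typing import List, Optional, Tuple

PHASES = 3  # φ1, φ2, φ3


def three_phase_transfer(pixel_charges: List[int], steps: int
                         ) -> Tuple[List[Optional[int]], List[int]]:
    """Clock charge packets along a simplified three-phase CCD register.

    Each pixel has three gates (φ1, φ2, φ3). At every clock step the next
    phase goes high and each packet slides under the next gate; after three
    steps every packet has moved by one whole pixel. Packets leaving the last
    gate are collected by the output stage in the order they arrive.
    """
    n_gates = PHASES * len(pixel_charges)
    gates: List[Optional[int]] = [None] * n_gates  # None = empty well
    for i, charge in enumerate(pixel_charges):
        gates[PHASES * i] = charge  # start with φ1 high: charge under φ1 gates

    output: List[int] = []
    for step in range(1, steps + 1):
        high_phase = step % PHASES  # 0 = φ1, 1 = φ2, 2 = φ3
        moved: List[Optional[int]] = [None] * n_gates
        for gate, charge in enumerate(gates):
            if charge is None:
                continue
            if gate + 1 < n_gates:
                moved[gate + 1] = charge  # packet follows the high phase
            else:
                output.append(charge)  # packet reaches the output stage
        # Sanity check: every remaining packet sits under a gate of the high phase.
        assert all(c is None or g % PHASES == high_phase
                   for g, c in enumerate(moved))
        gates = moved
    return gates, output


if __name__ == "__main__":
    remaining, out = three_phase_transfer([10, 20, 30], steps=9)
    print(out)  # [30, 20, 10]: the packet nearest the output leaves first
```

Nine clock steps (three per pixel) empty a three-pixel register, which is why the transfer clock has to run several times faster than the pixel rate.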
As you can imagine, the camera’s crystal oscillator needs to have a frequency at least a few times
higher than the signal bandwidth that a camera produces. All other syncs, as well as transfer pulses,
are derived from this master frequency. The drawing shows how this charge transfer is performed with the three-phase concept.
Timing pulses in a CCD chip are derived from a master clock.
The pulses indicated with φ1, φ2, and φ3 are low-voltage pulses (usually between 0 and 5 V DC), which explains why CCD cameras, unlike tube cameras, have no need for high voltage.
The preceding schematic shows how video signal
sync pulses are created using the master clock.
This is only one of many examples, but it clearly
shows the complexity and number of pulses
generated in a CCD camera.
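The principle of deriving sync timing from one master clock can be shown with integer division. The 14.1875 MHz master clock and the divide-by-908 line counter below are assumptions, chosen only because they divide exactly down to the 15,625 Hz CCIR line rate; real cameras use whatever clock suits their particular chip.

```python
from fractions import Fraction

# Assumed example values for a CCIR/PAL camera (illustrative, not from a
# data sheet): a master clock that is an exact multiple of the line rate.
MASTER_CLOCK_HZ = 14_187_500
CLOCKS_PER_LINE = 908
LINES_PER_FRAME = 625   # CCIR/PAL
FIELDS_PER_FRAME = 2    # 2:1 interlace


def derived_rates(master_hz: int = MASTER_CLOCK_HZ,
                  clocks_per_line: int = CLOCKS_PER_LINE):
    """Divide the master clock down to line, field, and frame rates."""
    line_hz = Fraction(master_hz, clocks_per_line)
    # A field is 312.5 lines, so a counter of half-lines counts 625 per field.
    field_hz = line_hz * 2 / LINES_PER_FRAME
    frame_hz = field_hz / FIELDS_PER_FRAME
    return line_hz, field_hz, frame_hz


if __name__ == "__main__":
    line_hz, field_hz, frame_hz = derived_rates()
    print(f"line {float(line_hz):.0f} Hz, field {float(field_hz):.0f} Hz, "
          f"frame {float(frame_hz):.0f} Hz")
```

Exact fractions are used so that the division chain shows no rounding: 15,625 Hz lines, 50 Hz fields, and 25 Hz frames all come from the same clock, which is what keeps the syncs locked to each other.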
A fixed camera dome with tinted glass
CCD chip as a sampler
As we said earlier, the CCD chip used in CCTV is a two-dimensional matrix of picture elements (pixels). The resolution that such a matrix produces depends on the number of pixels and the lens resolution. Since the latter is usually higher than the resolution of the CCD chip, we tend not to consider the optical resolution a bottleneck. However, as mentioned in the section on MTF, lenses are made with a resolution suitable for a certain image size, and care should be taken to use the appropriate optics with various chip sizes.
There is another important aspect of CCD resolution to be taken into account, and this is the discontinuity of the TV line. A TV line produced by a tube camera is obtained by a continuous beam scanning
along the line. A CCD chip has discrete pixels and therefore the information contained in one TV line
is composed of discrete values from each pixel. This method does not produce digital information but
rather discrete samples. In a way, the CCD chip is an optical sampler.
As with any other sampler, we do not get the total information of each line, but only discrete values at
positions equivalent to the pixel positions.
To some, it may seem impossible to reproduce a continuous signal from only portions of it. In 1928, however, Nyquist showed that a signal can be reconstructed perfectly, without any loss of information, if the sampling frequency is at least twice the bandwidth of the signal. Samples of the signal in between the sampled points are not necessary. This theory has been proven correct and is used in many electronic samplers, such as in CD audio and digital video. The sampling frequency equal to two times the bandwidth is called the Nyquist frequency (elsewhere it is often called the Nyquist rate).
There is, however, an unwanted by-product of CCD sampling. This is the well-known Moiré pattern that occurs when shooting objects with very fine detail. It is usually obvious, for example, with a news reader wearing a coat or shirt with a very fine pattern. Mathematically, it can be described as a frequency folding over around the sampling frequency. Since the spatial sampling frequency should be twice the highest frequency in the optical image, $F_{smax}$, we can represent it, in the frequency domain, as a single frequency located at the Nyquist frequency $F_{NYQUIST}$. The basic spatial spectrum of the optical signal will be modulated around this frequency, very similar to the sideband spectrum of amplitude modulation. If a high spatial frequency exists in the optical image projected on the CCD chip, and if this frequency is higher than half of $F_{NYQUIST}$, the sidebands (after the sampling is done) will fold over into the visible basic bandwidth, and we will see the result as an unwanted pattern, known as the Moiré pattern. The Moiré frequency, $F_{NYQUIST} - F_{smax}$, is lower than $F_{NYQUIST}/2$, the highest frequency the camera can reproduce.
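Fold-over can be checked numerically: a pattern finer than half the sampling frequency produces exactly the same pixel values as a coarser pattern at the folded frequency. The sketch below assumes, for illustration only, 752 pixels per picture width as the sampling frequency (the quantity called F_NYQUIST in the text) and a fabric pattern of 500 cycles per picture width.

```python
import math


def alias_frequency(f: float, fs: float) -> float:
    """Frequency at which a component f appears after sampling at fs.

    Components above fs / 2 fold back around the nearest multiple of fs.
    """
    k = round(f / fs)
    return abs(f - k * fs)


def sample_cosine(f: float, fs: float, n: int) -> list:
    """n samples of cos(2*pi*f*x) taken at spacing 1/fs (one per pixel)."""
    return [math.cos(2 * math.pi * f * i / fs) for i in range(n)]


# Illustrative numbers: 752 pixels per picture width act as the sampling
# frequency, so anything finer than 376 cycles per width will alias.
FS = 752.0
fine_pattern = 500.0                        # cycles per picture width
moire = alias_frequency(fine_pattern, FS)   # 752 - 500 = 252 cycles per width

fine_samples = sample_cosine(fine_pattern, FS, 752)
moire_samples = sample_cosine(moire, FS, 752)
max_difference = max(abs(a - b) for a, b in zip(fine_samples, moire_samples))
```

`max_difference` comes out at rounding-error level: once sampled, the 500-cycle pattern is indistinguishable from a 252-cycle one, which is the Moiré seen on screen. Only optical low-pass filtering before the chip can prevent it.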
To minimize this unwanted effect, low-pass optical (LPO) filtering has to be done. These filters are
usually part of the CCD chip glass mask and are formed by combining several birefringent quartz
plates. The effect is similar to blurring the fine details of an optical image.
CCD chip as a sampler
Correlated double sampling (CDS)
The noise in a CCD chip has several sources. The most significant is the thermally generated noise,
but a considerable amount can be generated by the impurities of the semiconductors and the quality
of manufacture.
High noise reduces the image sensor’s dynamic range, which in turn degrades image quality.
A careful CCD device design and fabrication can minimize the noise. Also, low operating temperature can
reduce thermally generated noise. Unfortunately, the user rarely has control over these parameters.
There is, however, a signal processing technique that can be implemented in the design of the CCD
camera that reduces this noise considerably. This technique is called correlated double sampling (CDS).
The term sampling here refers to the CCD signal output sampling.
The concept of CDS is based on the fact that the same noise component is present both in the valid video and in the reference signal between charge transfers. When the output stage of the CCD chip transfers packets of electrons, they are converted to an output voltage. To do this, CCD devices typically use a floating sensing diffusion to collect signal electrons as they are shifted out of the chip. As the electrons are transferred out of the CCD, the voltage on the sensing diffusion area drops. This voltage represents the valid data and is amplified on-chip by a thermally compensated amplifier. Before the next packet of signal electrons can be transferred into the diffusion area, the area must be reset to clear the previous packet. This produces a reference reset signal that carries a thermal noise component of the same type. By subtracting one of these two values from the other, a less noisy signal is obtained.
CDS is best accomplished using two high-speed sample and hold circuits connected to the image
sensor’s output signal through a low-pass filter.
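The benefit of CDS can be illustrated with a short simulation. This is a simplified model, not a circuit design: it assumes each pixel's reset noise adds the same offset to both the reference and the video sample, while each sample also picks up independent amplifier noise. The noise magnitudes and the function name are hypothetical.

```python
import random
import statistics


def read_out(signal_levels, reset_noise_sd=5.0, amp_noise_sd=0.5, seed=1):
    """Simulate reading pixels with and without correlated double sampling.

    For each pixel, a random reset offset is shared by the reference (reset)
    sample and the video sample; each sample also gets independent amplifier
    noise. CDS subtracts the reference sample from the video sample, which
    cancels the shared offset but not the independent noise.
    Returns (single-sample readings, CDS readings).
    """
    rng = random.Random(seed)
    single, cds = [], []
    for level in signal_levels:
        reset_offset = rng.gauss(0.0, reset_noise_sd)
        reference = reset_offset + rng.gauss(0.0, amp_noise_sd)
        video = reset_offset + level + rng.gauss(0.0, amp_noise_sd)
        single.append(video)
        cds.append(video - reference)
    return single, cds


if __name__ == "__main__":
    single, cds = read_out([100.0] * 10_000)
    print(f"noise without CDS: {statistics.stdev(single):.2f}")
    print(f"noise with CDS:    {statistics.stdev(cds):.2f}")
```

With these assumed values the noise drops from about 5 units to about 0.7. The correlated part disappears entirely, while the uncorrelated part grows slightly (by about √2) because two samples are combined; this is why CDS works only on noise shared by both samples.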
We will not go into more details on how these circuits are designed, as it is beyond the scope of this
book, but it should be remembered that the CDS circuits are part of the camera electronics and not of
the CCD chip.
Correlated double sampling is one method to reduce the CCD chip noise.
Camera specifications and their meanings
The basic objective of the television camera is, as already explained, to capture images, break them up
into a series of still frames and lines, then transmit and display them onto a screen rapidly so that the
human eye perceives them as motion pictures.
We should take a number of characteristics into account when choosing a camera. Some of them are
important, others not so much, depending on the application.
It is impossible to judge a camera on the basis of only one or two characteristics from a specification sheet.
Different manufacturers use different criteria and evaluation methods, and in most cases, even if we
know how to interpret all of the numbers from a specification sheet, we still have to evaluate the picture
ourselves, relative to the picture taken with another camera.
Comparison tests are quite often the best and probably the only objective way to check camera
performance, such as smear, noise, and sensitivity.
Do not forget: the general impression of a good quality picture is a combination of many attributes
– resolution, smear, sensitivity, noise, gamma, and so on.
The human eye is not equally sensitive to all of these factors.
People with no experience would be amazed to find that a 50-line difference in resolution is sometimes
of less importance to picture quality than a correct gamma setting or a 3-dB difference in the S/N
figure, for example.
We will go through some of the most important features:
• Camera sensitivity
• Minimum illumination
• Camera resolution
• S/N ratio
• Dynamic range
Other, less important, but not wholly insignificant, features include gamma settings, dark current, spectral
response, optical low-pass filtering, AGC range in dB, power consumption, and physical size.
Camera sensitivity
The sensitivity of a camera, though clearly defined in broadcast TV, is quite often misunderstood in
CCTV and is usually confused with minimum illumination.
Sensitivity is represented by the minimum iris opening (maximum F-stop) that produces a full 1 Vpp (1 V peak-to-peak) video signal of a test chart when that same test chart is lit by exactly 2000 lx from a source with a color temperature of 3200 K. The test chart has to have a gray scale with tones from black to white, and an overall reflection coefficient of 90% for the white portion of the gray scale.
One of the standard test charts used for
such purposes is the EIA gray scale chart.
The white peak level needs to be 700
mV and the pedestal level around 20 mV.
Gamma also plays a role in the proper
reproduction of the grays and needs to
be set at 0.45. In order to establish the
sensitivity of the camera, a manual iris
lens, usually of 25 to 50 mm, is required.
In order to get a realistic measurement, the
camera’s AGC should be switched off.
When all of the above is done, the manual iris lens is closed just until the white peak level starts to drop below 700 mV, measured relative to the blanking level. The reading obtained from the lens's iris setting, such as F-4 or F-5.6, represents the camera sensitivity. The higher the number, the more sensitive the camera. When comparing different cameras, it is important to use the same light source and gray scale chart.
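Because the light passing through a lens scales with 1/F², two sensitivity figures can be compared directly. The sketch below assumes both cameras were measured under the same 2000 lx, 3200 K conditions described above; the function names are illustrative.

```python
import math


def light_ratio(f_a: float, f_b: float) -> float:
    """How much less light camera B needs than camera A.

    f_a and f_b are the F-numbers at which each camera just reaches full
    video under the same test conditions. Light through a lens scales as 1/F^2.
    """
    return (f_b / f_a) ** 2


def stops_difference(f_a: float, f_b: float) -> float:
    """The same comparison expressed in photographic stops (factors of two)."""
    return math.log2(light_ratio(f_a, f_b))


if __name__ == "__main__":
    # Camera A reaches full video at F-4, camera B at F-5.6.
    print(f"B needs {light_ratio(4, 5.6):.2f}x less light "
          f"({stops_difference(4, 5.6):.2f} stop)")
```

A camera rated at F-5.6 therefore needs roughly half the light of one rated at F-4, that is, one stop; the small shortfall from exactly 1.0 comes from the rounded F-numbers engraved on lenses.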
Measurement and screen captures
courtesy of Les Simmonds
The example above illustrates a camera sensitivity of F-5.6 for a gray scale test chart
signal reproduced as full 1 VPP video.
Minimum illumination
A camera’s minimum illumination, unlike its sensitivity, is not clearly defined in CCTV. It usually refers to the lowest light level at the object at which a given camera produces a recognizable video signal. It is therefore expressed as the illumination in lux at the object at which such a signal is obtained. The term recognizable is used very loosely and, depending on the manufacturer, may or may not be defined.
This represents one of the biggest loopholes in CCTV. Most manufacturers do not specify what video
level we should get at the camera output for the light amount specified as the minimum illumination. This
level could be 30% (of the 700 mV), sometimes 50%, and, for some, even 10% might be acceptable.
The usual wording when describing minimum illumination would be, for example: “0.1 lx at the object
with 80% reflectivity using an F-1.4 lens.”
Bear in mind, however, that with high-gain AGC circuitry in the camera, even 10% of video (70 mV) could be pumped up to appear as a much higher value than it really is. This could obviously be misleading.
Let us say, for example, that one specification states 0.01 lx at the object with an F-1.4 lens, which presumes (but does not tell you) that the AGC is switched on. Another manufacturer may be very modest in its specs, stating, let us say, a minimum illumination (where 50% of the video signal is obtained with the AGC switched off) of 0.1 lx with an F-1.4 lens. On paper, the first camera would seem much more promising than the second, although in reality the second is much better.
Another matter for discussion is that some manufacturers state the minimum illumination at the object, while others refer to the minimum illumination at the CCD chip itself. These are not the same, and the difference is large.
When the minimum illumination of a camera is stated as illumination at the object, we should also check which F-stop it applies to. Another important factor to know is the reflectivity of the object for which the illumination is stated.
If the minimum illumination is stated at the CCD chip, then not all the factors (such as reflectance and
lens transmittance) have been taken into account. So, we have to compensate for all those factors when
calculating the equivalent object illumination that is projected onto a CCD chip.
There is a rule of thumb (which I have elaborated on in the “Light onto an imaging device” section in Chapter 2) that, with an F-1.4 lens, the illumination at the chip is about 10 times lower than the illumination at the object, so a minimum illumination quoted at the chip looks about 10 times better (a lower lux number) than the equivalent figure at the object. For example, an illumination of 1 lx at the object with a reflectivity of 75% @ F-1.4 is equal to 0.1 lx illumination at the CCD chip.
Some typical levels of illumination
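The rule of thumb can be checked with a widely used approximation for a distant, diffusely reflecting object: E_chip ≈ E_object × R × T / (4F²), where R is the object reflectance and T the lens transmittance. This formula is an assumption for illustration and is not necessarily the exact form used in Chapter 2; it ignores magnification and lens vignetting.

```python
def chip_illuminance(object_lux: float, reflectance: float, f_number: float,
                     transmittance: float = 1.0) -> float:
    """Approximate illuminance on the chip for a distant object.

    Uses the common approximation E_chip = E_object * R * T / (4 * F^2),
    which ignores magnification and assumes a diffusely reflecting object.
    """
    return object_lux * reflectance * transmittance / (4 * f_number ** 2)


def object_illuminance(chip_lux: float, reflectance: float, f_number: float,
                       transmittance: float = 1.0) -> float:
    """Inverse: object illuminance needed to produce chip_lux on the chip."""
    return chip_lux * 4 * f_number ** 2 / (reflectance * transmittance)


if __name__ == "__main__":
    e = chip_illuminance(1.0, 0.75, 1.4)
    print(f"1 lx at the object, 75% reflectance, F-1.4: {e:.3f} lx at the chip")
```

With an ideal lens (T = 1) the example from the text gives about 0.096 lx at the chip, consistent with the 0.1 lx rule of thumb; `object_illuminance` converts a chip-referenced specification back into the object-referenced figure needed for a fair comparison.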
As can be concluded from the above, the real characteristics of a camera can be obscured quite easily by simply not stating all of the factors. Read the specs carefully.
A known fact is that B/W CCD cameras always have a much lower minimum illumination than color
CCD cameras.
One reason for this is the infrared cut filter on the CCD chip. As described earlier, it corrects the spectral
response of the CCD chip so that it can be closer to the human eye’s sensitivity, but it also reduces the
amount of light that falls on the chip.
The other reason is the primary color construction of a single color chip, as used in CCTV. A single pixel
of a color CCD chip is composed of three subpixels, sharing the same physical space of a single B/W
pixel. The size will be no more than one-third of a B/W pixel, indirectly reducing the sensitivity.
In the period between the previous edition of this book and this one, many CCD cameras called Day/Night cameras have appeared. These cameras usually have a color chip whose output is converted into B/W by mechanically removing the infrared cut filter and integrating the RGB pixels into one monochrome signal. The result is a more sensitive, but monochrome, camera for low light levels. Removing the infrared cut filter also extends the infrared response of the camera. Some cameras only switch to monochrome mode, integrating the RGB pixel response, without removing the infrared cut filter. Some manufacturers have gone an extra step by physically fitting two separate chips, one color and one monochrome, and using some kind of mechanical switching between the two when the light level drops below a certain threshold.
Although such designs are very
practical, the mechanical switching
design has to be extremely good
as it might eventually fail if it
is executed every day. The most
common application of these
cameras would be in areas
where nighttime viewing with
infrared light is required while
preserving the color operation at
full daylight.
The majority of color cameras
these days, even without removing
the infrared cut filter, have better
sensitivity than the human eye.
The example above shows a boy holding a candle, on the left-hand side hardly noticeable to the film camera, while the CCTV camera sees the scene quite clearly, as shown on the monitor on the right.
Camera resolution
Camera resolution is a simple concept, but it is quite often misunderstood. It is also one of the most frequently quoted parameters of a camera or complete system. When talking about the resolution of a complete system (camera-transmission-recording-monitor), the most important part is the input (i.e., the camera resolution). There are two resolutions, vertical and horizontal, and both are measured using a test chart.
Vertical resolution on the left and horizontal on the right
Vertical resolution is the maximum number of horizontal lines that a camera is capable of resolving. This number is limited to 625 horizontal lines by the CCIR/PAL standard and to 525 by the EIA/NTSC recommendations. The real vertical resolution (in both cases), however, is far from these numbers. If we take into account the vertical sync pulses, equalization lines, and so on, the maximum vertical resolution appears to be 576 lines for CCIR/PAL and 480 for EIA/NTSC. This needs to be further corrected by the Kell factor of 0.7 to get the maximum realistic vertical resolution of 400 TV lines for CCIR/PAL (see “Resolution” in Chapter 4, “General characteristics of television systems,” for a more in-depth study). A similar deduction can be applied to the EIA/NTSC signal, where the maximum realistic vertical resolution is 330 TV lines.
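The Kell-factor correction is a single multiplication; the sketch below simply applies the 0.7 factor to the active line counts given in the text.

```python
KELL_FACTOR = 0.7

# Active (picture-carrying) lines per frame, as given in the text.
ACTIVE_LINES = {"CCIR/PAL": 576, "EIA/NTSC": 480}


def realistic_vertical_tvl(active_lines: int, kell: float = KELL_FACTOR) -> float:
    """Vertical resolution in TV lines after applying the Kell factor."""
    return active_lines * kell


if __name__ == "__main__":
    for system, lines in ACTIVE_LINES.items():
        print(f"{system}: {lines} x {KELL_FACTOR} = "
              f"{realistic_vertical_tvl(lines):.0f} TVL")
```

This gives about 403 and 336 TV lines, which the text rounds to 400 and 330.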
Horizontal resolution is the maximum number of vertical lines that a camera is capable of resolving. This number is limited only by the technology and the monitor quality. These days, we have CCD cameras with horizontal resolution of more than 600 TV lines. The horizontal resolution of CCD cameras is usually 75% of the number of horizontal pixels on the CCD chip. As explained earlier, this is a result of the 4:3 aspect ratio. When counting vertical lines in order to determine horizontal resolution, we count only over a horizontal width equal to the vertical height of the monitor. The idea behind this is to have lines of equal thickness, both vertically and horizontally. So if we count the total number of vertical lines across the width of the monitor, we then have to multiply this by 3/4, which is equal to 0.75. Because of this unusual way of counting, we always refer to horizontal resolution in TV lines (TVL) and not just lines.
More accurate horizontal resolution measurement with 5% modulation depth
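The 3/4 scaling can be written as a one-line conversion from pixel count to TV lines. It is a sketch of the counting convention only and gives an upper limit; exact fractions are used so that the result is not disturbed by rounding.

```python
from fractions import Fraction


def max_horizontal_tvl(horizontal_pixels: int,
                       aspect_ratio: Fraction = Fraction(4, 3)) -> Fraction:
    """Upper limit of horizontal resolution in TV lines.

    TV lines are counted over a width equal to the picture height, i.e.
    3/4 of the full width for a 4:3 picture, so the pixel count across the
    full width is scaled by 1 / aspect_ratio.
    """
    return horizontal_pixels / aspect_ratio
```

For the 752-pixel chip mentioned earlier, `max_horizontal_tvl(752)` returns 564 TVL, matching the 75% rule; a 1000-pixel line would give 750 TVL.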
The CCTV Labs test chart is specially designed for
CCTV and is used to check resolution and many
other important details.
The important thing to observe when measuring resolution is that the video signal must be properly terminated with 75 Ω and the image must be seen in full, without the picture being overscanned (as is the case with most standard monitors). To do this, a monitor with higher resolution than the camera under test and with an underscanning feature needs to be used. The camera is then set to the best possible focus (usually at a middle F-stop, 5.6 or 8), with the whole test chart in the field of view. Also, all internal camera correcting circuits (AGC, gamma, CCD-iris) need to be switched off. Resolution can then be visually checked by noting where the resolution lines (in the form of a sharp wedge) merge into a smaller number of lines.
For example, if the test chart shows four lines as in the example below, the point where these merge into three, or two, is the limit. For the most accurate measurement, only the luminance signal should be analyzed, typically by turning the color completely down or, even better, by using the Y/C connection on the camera, if available. Since the merging of these lines does not have a clean cut, it gives only an approximate result. The visual reading error might be around 10%, which makes it very difficult to observe a difference between cameras with close resolutions, for example 460 TVL and 480 TVL. For a more precise reading, a high-quality oscilloscope with a TV line selection feature should be used. The measurement is then narrowed down to selecting the line where the modulation depth of the four lines is equal to or better than 5%. How this is calculated is shown on the drawing on the previous page; it is basically 100 × (A − B)/(A + B), where A is the highest point and B the lowest point of the measured lines. Using an oscilloscope in such a case enables us to disregard the monitor's resolution limits. To know which part of the test chart you are measuring, you need a way of identifying the measured line. Some oscilloscopes, such as the one shown in the photo, can switch to a picture display in which the measured point is indicated with a line. If your instrument cannot do this, you need to mark the position somehow so that you can recognize it on the oscilloscope waveform. In the case of the CCTV Labs test chart, we have made this even easier by having a line indicator on the left-hand side of the text.
Measurement and screen capture by Les Simmonds
Visual detection of a horizontal resolution (in the middle) is not as precise as when using a proper oscilloscope with line selection and measuring 5% modulation.
RETMA test chart
The IEEE-208 is a new recommendation for resolution measurement.
Measurement and screen capture by Les Simmonds
Highly recommended – the Tektronix TDS3012B
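The 5% modulation criterion lends itself to a small helper that scans oscilloscope readings taken at increasing TV-line positions on the chart. This is an illustrative sketch: the function names and the sample readings are hypothetical, and real A and B values would come from the oscilloscope.

```python
def modulation_depth(a: float, b: float) -> float:
    """Modulation depth in percent: 100 x (A - B) / (A + B)."""
    return 100.0 * (a - b) / (a + b)


def resolution_limit(readings, threshold_percent: float = 5.0):
    """Highest TVL position whose four-line burst still has enough modulation.

    readings: iterable of (tvl, a, b) tuples, where a and b are the highest
    and lowest points of the measured lines at that position on the chart.
    Returns None if even the coarsest position is below the threshold.
    """
    limit = None
    for tvl, a, b in sorted(readings):
        if modulation_depth(a, b) < threshold_percent:
            break
        limit = tvl
    return limit


if __name__ == "__main__":
    # Hypothetical oscilloscope readings in mV.
    readings = [(400, 600, 100), (500, 420, 300), (550, 380, 340), (600, 360, 350)]
    print(resolution_limit(readings))  # 550
```

With these made-up readings, the burst at 550 TVL still shows about 5.6% modulation, while at 600 TVL it falls to about 1.4%, so 550 TVL is reported as the resolution.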
It should also be noted that only good optics should be used when measuring resolution. Average lenses tend to have better resolution in the center of the image than in the corners (the center is always better than the corners), so with such lenses resolution measurements are best made in the central area of the test chart.
Resolution is closely related to the signal bandwidth a camera is capable of reproducing. Their correlation was explained in the earlier section on resolution, but a simple rule of thumb is that 80 TVL (TV lines) corresponds to 1 MHz of bandwidth.
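The rule of thumb can be applied directly, and its origin can be sketched too: a 1 MHz signal completes one cycle (one black and one white line) per microsecond, and TV lines are counted over only 3/4 of the picture width. The 52 µs active line duration used below is an assumed nominal CCIR/PAL value.

```python
TVL_PER_MHZ = 80  # rule of thumb quoted in the text


def bandwidth_mhz(tvl: float) -> float:
    """Approximate video bandwidth needed for a given horizontal resolution."""
    return tvl / TVL_PER_MHZ


def tvl_per_mhz(active_line_us: float = 52.0, aspect_fraction: float = 0.75) -> float:
    """Approximate origin of the rule of thumb.

    A 1 MHz signal gives one cycle (two lines) per microsecond, so
    2 * active_line_us lines across the full width; TV lines count only
    3/4 of the width of a 4:3 picture.
    """
    return 2 * active_line_us * aspect_fraction
```

`bandwidth_mhz(480)` gives 6.0 MHz, and `tvl_per_mhz()` gives 78 TVL per MHz, which rounds to the 80 used in practice.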
Practical experience shows that the human eye can hardly distinguish a resolution difference of less than
50 lines. This is not to say that the resolution is not an important factor in determining camera quality,
but small resolution differences are often hardly noticeable, especially if the resolution difference is
smaller than 10% of the total number of pixels.
Single-chip color CCD cameras (as used in CCTV) have lower resolution than B/W, again because
of the separation into three color components, yet they still have the same size chips as B/W cameras.
When an analog signal is converted into digital form, other factors need to be taken into account; these are explained in Chapter 9. Three-chip color cameras, as used in broadcast TV, have higher resolution. High-definition TV cameras, with horizontal resolution exceeding 1000 TV lines, are also available now (unfortunately we do not use these in CCTV yet).
A number of test charts now on the market can be used to evaluate camera resolution. The best known was the EIA RETMA chart, but a newer one devised by the IEEE-208 recommendation is becoming more popular. There are others which you can easily find on the Internet. Many of them are designed
to measure only one particular characteristic of a camera, but our own CCTV Labs test chart, which
was designed specifically for the CCTV industry and introduced with the first edition of this book in
1995, has become a de facto CCTV industry standard.
Over 500 manufacturers are using the CCTV Labs test chart in their measurements and comparison tests. As with the previous editions of this book, we have enclosed a reproduction of this chart at the back of the book. This is the latest version; each version over the years has introduced more measurement details. For more accurate measurements we encourage the reader to obtain the larger format (A3), more accurate reproduction of the chart available from the CCTV Labs web site, which has more accurate details and color reproduction. Our publisher has taken maximum care to reproduce the version in this book correctly, but exact reproduction is beyond our control, as the process involves color inks and printing machinery that could not have been taken into account in our chart setup.
Another approach that we encourage is the open exchange of test chart results and image captures for comparison purposes, all of which are available at the CCTV Labs web site. We welcome you to submit your results so that other readers can compare cameras and DVRs and analyze their results.
Measuring bandwidth is closely related to resolution.
For more details on what else is measured with the CCTV Labs test chart, please refer to Chapter 14.
Signal/noise ratio (S/N)
The signal-to-noise (S/N) ratio shows how clean a camera signal can be, especially at lower light levels. Noise cannot be avoided, only minimized. It depends mostly on the CCD chip quality, the electronics, and external electromagnetic influences, but also very much on the temperature of the electronics. The camera’s metal enclosure offers significant protection from external electromagnetic influences. Internal noise sources include both the passive and active components of the camera, their quality, and the circuit design, and this internal noise rises with temperature. This is why, when stating the S/N ratio, a camera manufacturer should indicate the temperature at which the measurement was taken.
Image noise is very similar to the noise on old audio tapes, only it is part of a video rather than an audio signal. On the screen, a noisy picture appears grainy or snowy, and in a color signal sparkles of color may be noticeable. Equipment may find it difficult to synchronize to extremely noisy signals, which matters more these days with the increased use of digital video recorders. Noisy pictures look even worse when captured and digitized, because compression engines treat the noise speckles as video detail.
The units for expressing ratios (including the S/N) are called decibels, written as dB. Decibels are only relative units: instead of expressing the ratio as an absolute number, its logarithm is calculated. The reasoning behind this is simple: logarithms can show big ratios as only two- or three-digit numbers, and, more importantly, signal manipulation (as when calculating the attenuation of a medium or the amplification of a system) is reduced to simple addition and subtraction. Another reason for using decibels (i.e., logarithms) is that they match the way we perceive sound and vision quantities: both the human ear and the eye respond to sound and light intensities (respectively) approximately logarithmically.
When a ratio of two numbers with the same units is calculated, the result is expressed simply in dB. If, however, a value is expressed relative to a fixed reference – for example, a voltage level relative to 1 mV – the units are called dBmV. If a power value is expressed relative to 1 µW, the units are called dBµW.
Seeing details in an image is affected not only by the resolution but by the noise also.
The general formula for voltage and current ratios is:
S/N = 20 log(Vs /Vn)
where Vs is the signal voltage and Vn is the noise voltage. Current values are used in the same way when a current ratio needs to be shown.
If a power ratio is the purpose of a comparison, the formula is a little bit different:
S/N = 10 log(P1/P2)
The different factor in front of the logarithm (20 instead of 10) comes from the relation between voltage, current, and power: for the same load, power is proportional to the square of the voltage (or current), so 10 log(V1²/V2²) = 20 log(V1/V2).
In CCTV, we use decibels mostly for calculating voltage ratios, which means the first formula will be
the one we would use.
The following table gives some dB values of voltage (current) and power ratios. Please note the difference between the two. While a 3 dB voltage difference means the compared voltage is only about 41% higher than the reference, in terms of power the same 3 dB means twice the power (a 100% increase) relative to the reference.
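The two formulas, and the 41% versus 100% difference for 3 dB, can be checked numerically. This is a sketch using only Python’s standard `math` module; the function names are our own.

```python
import math


def db_voltage(v1: float, v2: float) -> float:
    """Voltage (or current) ratio in dB: 20 * log10(V1 / V2)."""
    return 20.0 * math.log10(v1 / v2)


def db_power(p1: float, p2: float) -> float:
    """Power ratio in dB: 10 * log10(P1 / P2)."""
    return 10.0 * math.log10(p1 / p2)


def voltage_ratio(db: float) -> float:
    """Inverse of db_voltage: how many times larger the compared voltage is."""
    return 10.0 ** (db / 20.0)


def power_ratio(db: float) -> float:
    """Inverse of db_power: how many times larger the compared power is."""
    return 10.0 ** (db / 10.0)


for db in (3, 6, 10, 20, 40):
    print(f"{db:>3} dB  voltage x{voltage_ratio(db):8.3f}  power x{power_ratio(db):9.1f}")
```

For a relative unit such as dBmV the reference is simply fixed: `db_voltage(0.7, 0.001)` expresses 0.7 V relative to 1 mV, about 56.9 dB.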
The S/N ratio of a CCD camera is measured differently from that of a broadcast or transmitted signal.
In a broadcast TV signal the S/N ratio is the signal versus the accumulated noise from the transmission
to the reception end. This is defined as the ratio (in dB) of the luminance bar amplitude to the RMS
voltage of the superimposed random noise measured over a bandwidth of frequencies between 10 kHz
and 5 MHz. There are special instruments that are designed to measure this value directly from the
signal, by using some of the video insertion test signal (VITS) lines.
The S/N ratio in a CCD camera is defined as the ratio between the signal and the noise produced
by the chip combined with the camera electronics. In order to get a realistic value for the S/N ratio
of a camera, all internal circuits that modify the signal in one way or another need to be switched off or
disabled. This includes gamma, AGC, CCD-iris, and backlight compensation circuitry. The camera, as already mentioned, should be kept at room temperature. Of the few different methods used to measure
camera video noise, the easiest one is to use a special instrument called a video noise meter. This unit
selects the noise in the band between 100 kHz and 5 MHz and reads the S/N directly in decibels.
Practically, a S/N ratio of more than 48 dB is considered good for a CCTV CCD camera.
Do not forget: a 3 dB higher S/N ratio means approximately 30% less noise, since the video level does not change. So when comparing a 48 dB camera with a 51 dB camera, for example, the latter will show a considerably better picture, more noticeably at lower light levels. We should always assume that the automatic gain control (AGC) is off when stating S/N ratios. For comparison purposes, let us just mention that broadcast CCD cameras have a ratio of more than 56 dB, which is extremely good for an analog video signal.
Photo courtesy of Cohu
A camera design with Peltier cooling keeps the CCD chip operating temperature at 5ºC and reduces the noise by 85%.
Keeping the camera as cool as possible reduces the noise. Lower temperatures, in any electronic device, produce less noise. In astronomy and other industrial applications there are specially cooled cameras designed to keep the CCD as cool as possible. Temperatures below –50° C are not uncommon. For such applications, cameras are available where the CCD block has provision for a coolant to be attached to it. Some designs, like the one shown in the photo above, use Peltier cooling to keep the CCD chip always at 5º C, which reduces the noise to one-eighth of its level at normal room temperature. So it should be remembered that if good quality cameras are not used in a CCTV system, high temperatures can play a significant role in lowering the picture quality.
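The rule that 3 dB more S/N means roughly 30% less noise follows directly from S/N = 20 log(Vs/Vn) with the video level Vs held constant. The sketch below assumes exactly that (the same video level for both cameras) and also converts the Peltier example’s one-eighth noise into an equivalent S/N gain; the results are about 29% less noise and about 18 dB, respectively.

```python
import math


def relative_noise(snr_a_db: float, snr_b_db: float) -> float:
    """Noise voltage of camera B relative to camera A, at the same video level.

    From S/N = 20*log10(Vs/Vn) with Vs fixed: Vn_b / Vn_a = 10**((snr_a - snr_b) / 20).
    """
    return 10.0 ** ((snr_a_db - snr_b_db) / 20.0)


ratio = relative_noise(48.0, 51.0)
print(f"51 dB camera: {100 * (1 - ratio):.0f}% less noise than the 48 dB camera")

# Noise reduced to one-eighth, expressed as an S/N improvement in dB
print(f"1/8 of the noise = {20 * math.log10(8):.1f} dB better S/N")
```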
Dynamic range of a CCD chip
Dynamic range (DR) is seldom mentioned in CCTV camera specification sheets. Nonetheless, it is a
very important detail of the camera performance profile.
The dynamic range of a CCD chip is defined as the maximum signal charge (saturation exposure)
divided by the total RMS (root-mean-square) noise equivalent exposure. DR is similar to S/N ratio,
but it only refers to the CCD chip dynamics when handling low to bright objects in one scene. While
the S/N ratio refers to the complete signal including the camera electronics and is expressed in dB, the
DR is a pure ratio number (i.e., not a logarithm).
This number actually shows the light range a CCD chip can handle – only this light range is not expressed
with the photometric units but with the generated electrical signal. It starts from the very low light
levels, equal to the CCD chip RMS noise, and goes up to the saturation levels. Since this is a ratio of
two voltage values, it is a pure number, usually on the order of thousands. Typical values are between
1000 and 100,000. External daylight can easily saturate the CCD chip since the dynamic range of the
light variation in an outdoor environment is much wider than the range a CCD chip can handle. A bright
sunny day, for example, can easily saturate a CCD chip, especially if a camera does not have AGC, an
auto iris lens, or a CCD-iris function. An auto iris lens optically blocks the excessive light and reduces
it to whatever upper level the CCD chip can handle, whereas CCD-iris does that by electronically
reducing the exposure time of the chip below the normal field time (1/50 s for CCIR/PAL, 1/60 s for EIA/NTSC).
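Since DR is quoted as a pure ratio while S/N is quoted in dB, it can help to put both on the same scale. The sketch below assumes the 20 log convention used for voltage ratios; on that scale the typical range of 1000 to 100,000 corresponds to 60 dB to 100 dB. The chip figures (20,000 electrons at saturation, 10 electrons RMS noise) are hypothetical.

```python
import math


def dynamic_range(saturation_signal: float, rms_noise: float) -> float:
    """DR as a pure ratio: saturation signal divided by RMS noise (same units)."""
    return saturation_signal / rms_noise


def ratio_to_db(ratio: float) -> float:
    # Same 20*log10 convention as the voltage-based S/N formula.
    return 20.0 * math.log10(ratio)


# Hypothetical chip: 20,000 electrons at saturation, 10 electrons RMS noise floor
dr = dynamic_range(20_000, 10)
print(dr, round(ratio_to_db(dr), 1))
```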
When saturation levels are reached during a CCD exposure (1/50 s for PAL, or 1/60 s for NTSC), the blooming effect may become apparent, where excessive light saturates not only the picture elements (pixels) on which it falls but the adjacent ones as well. As a result, resolution and detail are lost in the bright areas. To solve this problem, a special antiblooming section is designed into most CCD chips. This section limits the amount of charge that can be collected in any pixel. When antiblooming is designed properly, no pixel can accumulate more charge than the shift registers can transfer. So, even if the dynamic range of such a signal is limited, no details are lost in the bright areas of the image. This may be extremely important in difficult lighting conditions, such as looking at car headlights or at people in a hallway against a bright background.
On the left a camera with visible smear and on the right almost invisible smear
Some camera makers, like Plettac, have introduced a special design that blocks the oversaturated areas during the digital signal processing stage. The video signal AGC circuitry then does not take extremely bright areas as the white peak reference point; much lower levels are taken as white peaks instead, making the details in the dark more recognizable.
Others use new methods of CCD chip operation where, instead of one field exposure every field time (1/50 s for PAL, or 1/60 s for NTSC), two exposures are made during this period: one very short, usually around 1/1000 s, and the other of normal duration, which depends on the amount of light. The two exposures are then combined into one field, so that the bright areas come from the short exposure, giving details in the very bright areas, and the darker areas come from the longer exposure, giving details in the dimmer parts of the same picture. The overall effect is that the dynamic range of the camera is increased a number of times. Some manufacturers call this the “superdynamic effect.”
The Panasonic superdynamic effect
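The dual-exposure idea can be illustrated with a very simplified merge. This is not any manufacturer’s actual algorithm: it assumes a linear sensor, 8-bit readings, a saturation threshold of 250, and the 1/50 s and 1/1000 s exposures mentioned above. Where the long exposure is saturated, the short exposure is scaled up by the exposure ratio (20 in this case).

```python
def fuse_exposures(long_px, short_px, t_long, t_short, saturation=250):
    """Merge two exposures of the same field into one linear-light estimate.

    Pixels not saturated in the long exposure keep their value; saturated
    pixels are replaced by the short-exposure value scaled by the exposure
    ratio, so both end up on the same (long-exposure) scale.
    """
    gain = t_long / t_short
    fused = []
    for lp, sp in zip(long_px, short_px):
        if lp < saturation:
            fused.append(float(lp))
        else:
            fused.append(sp * gain)
    return fused


# Hypothetical 8-bit readings of one row: dark corridor, then a bright window
long_row = [12, 40, 255, 255]
short_row = [0, 1, 90, 200]
print(fuse_exposures(long_row, short_row, 1 / 50, 1 / 1000))
```

The merged values exceed the 8-bit range; a real camera would still need to compress this wider range into the normal video level range, a step the sketch leaves out.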
Color CCD cameras
Color television is a very complex science in itself. The basic concept of producing colors in television is, as described earlier, to combine the three primary colors: red, green, and blue. The color mixing actually happens in our eyes when we view the monitor screen from a certain distance. The discrete colors (R, G, and B) are so small that we actually see a resultant color produced by the mixing of the three components. As mentioned earlier, this is called additive mixing, as opposed to subtractive, because by adding more colors we get more luminance, and with a correct mixture of the primary colors, white can be obtained.
Single chip color CCD cameras are
the most common in CCTV.
Most broadcast color TV cameras are made with three CCD chips, each of which receives its own color
component. The white light’s separation into R, G, and B components is done with a special optical
split-prism, which is installed between the lens and the CCD chips.
Three-chip CCD color cameras use split-prism for color separation.
The split-prism is a very expensive and precisely manufactured optical block with dichroic mirrors.
These are called three-chip color cameras and are not very commonly used in CCTV because they are
considerably more expensive than one-chip cameras. They do, however, offer very high resolution and superior technical performance.
In CCTV, single-chip color cameras are the most common. They produce a composite color video
burst signal, known as CVBS. As already discussed under “Color video signal,” in Chapter 4, the
three components of the signal that are embedded in the CVBS composition are luminance (Y) and the
color difference for red (V = R – Y) and for blue (U = B – Y). These are quadrature modulated and,
together with the luminance, combined in a composite color video signal. Then, the color monitor
circuit processes these components and obtains the pure R, G, and B signals.
In single-chip CCD color cameras, the colors may be separated using one of two filtering methods:
• RGB stripe filter, where three vertical pixel columns (stripes) are next to each other: red, green,
and blue.
• Complementary colors mosaic filter, where the CCD chip pixels are not made sensitive to R,
G, and B colors, but to the complementary colors of cyan, magenta, yellow, and green, ordered
in a mosaic.
The first type of single-chip color CCD camera has very good color reproduction and requires simpler circuits to achieve it. However, it suffers from very low horizontal resolution, usually on the order of 50% of the total number of pixels in the horizontal direction of the chip. The vertical resolution, on the other hand, achieves the full number of vertical pixels. This type of color camera can easily produce RGB color signals.
The mosaic-type single-chip color CCD camera requires more complex camera electronics, and it may lag in color reproduction quality compared to the RGB models (because of the color transformation that needs to be applied to the Cy, Mg, Ye, and Gr components), but it offers a much higher horizontal resolution of over 65% of the horizontal number of pixels.
Since the mosaic type is the most common type of camera
in CCTV, we will devote a bit more space to explaining how
color components are converted to obtain a composite color
video signal.
The mosaic filter, which is usually called color filter array
(CFA), splits the light into magenta, cyan, yellow, and green
components. As mentioned, these colors are selected as
complementary colors. So, in practice, this type of single-chip
CCD color camera uses Mg, Cy, Ye, and Gr color components
to produce the luminance signal Y, and the color differences
V = R – Y and U = B – Y. It should be noted that in single-chip color CCD cameras the light-sensitive pixels all have the same silicon structure and are not different for different colors, as some may think. It is the CFA that splits the image into color components.
In order to understand how this is
produced, see the diagram of the
color filter array on the right.
This type of CFA refers to a standard
field integration camera, that is, a
camera where the exposure time is
1/50 s for PAL or 1/60 s for NTSC.
As can be seen from the diagram,
the four cells of the horizontal
shift register contain signals of (Gr
+ Cy), (Mg + Ye), (Gr + Cy) and
(Mg + Ye), respectively. By proper
processing of these four signals,
we can get the three components
that make a composite color video
signal: the luminance (Y), the red
color difference (R - Y), and the
blue color difference (B - Y).
First, the luminance signal is obtained by the relation:
Y = ½ [(Gr + Cy) + (Mg + Ye)] = ½ (2B + 3G + 2R)
The above relation shows how luminance signal is obtained in both types of single-chip color CCD
cameras – the mosaic filter and RGB stripe filter.
The red color difference is similarly obtained through line A1:
R - Y = [(Mg + Ye) - (Gr + Cy)] = (2R - Gr)
The blue color difference is composed of line A2 values:
B - Y = [(Mg + Cy) - (Gr + Ye)] = (2B - Gr)
So, these are the two signals that, together with the luminance, are embedded in the composite video
signal and represent a PAL (or NTSC) color video signal, as per standards.
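The three relations above can be checked by substituting the ideal complementary filters Cy = G + B, Mg = R + B, Ye = R + G and Gr = G (an idealization; real filter curves overlap). The sketch stops at the raw combinations derived in the text and does not model any further scaling a camera’s processing may apply.

```python
def cfa_components(r: float, g: float, b: float):
    """Complementary-mosaic arithmetic from the text, for a uniform R, G, B patch.

    Assumes ideal complementary filters: Cy = G + B, Mg = R + B, Ye = R + G, Gr = G.
    Returns (Y, R - Y, B - Y) as the raw combinations described in the text.
    """
    cy, mg, ye, gr = g + b, r + b, r + g, g
    y = 0.5 * ((gr + cy) + (mg + ye))   # = 1/2 (2B + 3G + 2R)
    r_minus_y = (mg + ye) - (gr + cy)   # line A1: = 2R - G
    b_minus_y = (mg + cy) - (gr + ye)   # line A2: = 2B - G
    return y, r_minus_y, b_minus_y


# Example patch with more blue than red
print(cfa_components(1.0, 2.0, 3.0))
```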
New developments are continually improving CCD (and the new CMOS) imaging technology, and one of them is worth mentioning here: the multilayered single-chip color sensor developed by Foveon Inc. Instead of having separate pixels for each primary color, Foveon has invented a layering technique where colors are separated according to how deep the light penetrates into the same pixel. The result is better color reproduction and higher resolution. Cameras with Foveon’s X3 chip are already on the photographic market, and it will not be a surprise if similar chips appear in CCTV cameras.
White balance
From color cameras we require, apart from resolution and minimum illumination, good and accurate color reproduction.
The first color CCD cameras had an external color sensor (usually installed on top of the camera) whose
light measurement would influence the color processing of the camera. This was called automatic
white balance (AWB), but lacked precision owing to the discrepancy of the viewing angle between the
white sensor and the camera lens. In modern cameras, we have a through-the-lens automatic white
balance (TTL-AWB).
Generally, the initial calibration of the camera is done by exposing the CCD chip to white on power-up. This is achieved by putting a white piece of paper in front of the camera and then turning the camera on. This stores correction factors in the camera’s memory, which are then used to modify all other colors. The result depends very much on the color temperature of the light source in the area where the camera is mounted.
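The white-reference calibration can be sketched as computing one gain per color channel so that the white paper reads as neutral. This is an illustrative assumption, not a description of any particular camera: green is taken as the reference channel, and the sample reading of white paper under tungsten light is hypothetical.

```python
def white_balance_gains(white_rgb):
    """Per-channel gains that make a white reference read as neutral.

    Green is used as the reference channel (an assumption; cameras differ).
    """
    r, g, b = white_rgb
    return g / r, 1.0, g / b


def apply_gains(rgb, gains):
    """Scale each channel of an (R, G, B) reading by its stored gain."""
    return tuple(c * k for c, k in zip(rgb, gains))


# Hypothetical reading of white paper under tungsten light (reddish cast)
gains = white_balance_gains((200.0, 160.0, 100.0))
print(gains)
print(apply_gains((200.0, 160.0, 100.0), gains))
```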
Many cameras have an AWB reset button that does not require powering the camera down. How good, or sophisticated, these corrections are depends on the CCD chip itself and the white balance circuit design.
Although the majority of cameras today have AWB, there are still models with manual white balance (MWB) adjustments. In MWB cameras there are usually two settings (switch selectable): indoor and outdoor. Indoor is usually set for a light source with a color temperature of around 2800 K to 3200 K, while outdoor is usually around 5600 K to 6500 K. These correspond to average indoor and outdoor light situations. Some simpler cameras, however, may have potentiometers accessible from the outside of the camera for continuous adjustment. Setting such a color balance might be tricky without a reference camera looking at the same scene. This gets especially complicated when a number of cameras are connected to a single switcher, quad, or multiplexer.
White balance setting usually can be automatic or manual.
Newer color cameras have, apart from AWB, an automatic tracking white balance (ATWB), which continually adjusts (tracks) the color balance as the camera’s position or the light changes. This is especially practical for PTZ cameras and/or areas where there is a mix of natural and artificial light. In a CCTV system where pan and tilt head assemblies are used, it is possible while panning for a camera to come across areas lit by sources with different color temperatures, like an indoor tungsten light at one extreme and outdoor natural light at the other. ATWB tracks the light source color temperature dynamically, that is, while the camera is panning. Thus, unless you are using ATWB color cameras, you have to be very wary of the lighting conditions in the camera viewing area, not only the intensity but the color temperature as well.
Accurate color reproduction can be checked with a vectorscope.
Finally, as mentioned earlier, do not forget to take into account the monitor screen’s color temperature. The majority of color CRTs are rated at 6500 K, but some might have a higher (9300 K) or even lower (5600 K) color temperature.
CMOS technology
CCD technology is about 30 years old now. It has matured to provide excellent image quality with low noise. Although CCD chip operational fundamentals are based on MOS (metal-oxide-semiconductor) electronics, the actual manufacturing of CCD chips requires a special type of silicon technology with its own customized fabrication line.
It is technically feasible, but not economical, to use the CCD process to integrate other camera functions, like the clock drivers, timing logic, and signal processing. These are therefore normally implemented in secondary chips. Thus, most CCD cameras comprise several chips.
Apart from the need to integrate the other camera electronics into a separate chip, the Achilles heel of
all CCDs is the clock requirement. The clock amplitude and shape are critical for successful operation.
Generating correctly sized and shaped clocks is normally the function of a specialized clock-driver
chip and leads to two major disadvantages: multiple nonstandard supply voltages and high power
consumption. If the user is offered a simple single-voltage supply input, then several regulators will
be employed internally to generate these supply requirements.
In the last couple of years a new type of image chip has appeared on the market: the complementary metal-oxide-semiconductor (CMOS) chip. CMOS sensors are manufactured on standard CMOS processes using the so-called very large scale integration (VLSI) technique. This is a much cheaper and more standardized method of chip manufacturing than is the case with CCDs.
A major advantage of CMOS cameras over CCDs lies in the high level of product integration that can be achieved by implementing virtually all of the electronic camera’s functions on the same chip. CMOS technology is ideal for this: timing logic, exposure control, and A/D conversion can be put together with the sensor to make complete one-chip cameras.
CMOS imagers sense light in the same way as CCD, but from the point of sensing onward, everything
is different. The charge packets are not transferred, but they are instead detected as early as possible by
charge-sensing amplifiers, which are made from CMOS transistors. In some CMOS sensors, amplifiers
are implemented at the top of each column of pixels – the pixels themselves contain just one transistor
which is used as a charge gate, switching the contents of the pixel to the charge amplifiers. These passive
pixel CMOS sensors operate like analog dynamic random access memory (DRAM).
Conceptually, the weak point of CMOS sensors is the problem of matching the multiple different amplifiers within each sensor. Some manufacturers have overcome this problem by reducing the residual level of fixed-pattern noise to insignificant proportions. With the initial CMOS designs and prototype cameras, there were problems with low-quality, noisy images that made the technology somewhat questionable for commercial applications. Chip process variations produce a slightly different response in each pixel, which shows up in the image as snow. In addition, the chip area available to collect light is smaller than that of CCDs, making these devices less sensitive to light.
Photo courtesy of Pixim
Modern and sophisticated CMOS chip offers A/D conversion on the chip.
These issues, however, have improved considerably in the last five years. Many advances in CMOS technology have occurred in the interval between the two editions of this book. The imaging obtained from CMOS chips has improved dramatically, driven by the explosive demand for digital cameras in photography, which forced manufacturers to find ways of increasing image quality while reducing the cost of production. Some of the major CMOS manufacturers, such as Canon and Kodak, have introduced chips with 10 million pixels that produce extremely good quality images. One notable innovation is the removal of so-called fixed-pattern noise by recording the CMOS chip’s own “noise signature” and subtracting it from each exposed image, so that the corrected image appears with minimal noise.
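The noise-signature correction is essentially dark-frame subtraction. A minimal sketch, assuming the signature is estimated by averaging a few frames captured with no light reaching the chip; the four-pixel “sensor” and its values are hypothetical. Only the fixed pattern is removed this way; random noise that changes from frame to frame remains.

```python
def record_noise_signature(dark_frames):
    """Average frames taken with no light to estimate each pixel's fixed offset."""
    n = len(dark_frames)
    return [sum(px) / n for px in zip(*dark_frames)]


def correct_image(image, signature):
    """Subtract the per-pixel signature from an exposed image, clamping at zero."""
    return [max(0.0, p - s) for p, s in zip(image, signature)]


# Hypothetical 4-pixel sensor: the third pixel has a high fixed offset ("hot" pixel)
dark = [[2, 3, 30, 1], [4, 3, 28, 1], [3, 3, 29, 1]]
sig = record_noise_signature(dark)
print(sig)
print(correct_image([50, 53, 79, 51], sig))
```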
Another new design that is especially interesting for CCTV is one we only hinted at as a possibility around five years ago, and it is now a reality. A company called Pixim has developed a CMOS chip that converts the analog electron charges into a digital stream of data directly on the chip itself. This is a revolutionary and extremely powerful concept that allows many imperfections of the CMOS chip and of the projected image to be improved or fixed. One of them is accurate control of exposure, which can vary at each pixel location, allowing even higher dynamic range; another is subtraction of the so-called dark noise of the chip itself, improving the S/N of the chip.
Special low-light-intensified cameras
The CCD chips have better minimum illumination performance than the image tubes, but there is still
a limit to how low they can see. A reasonably good approximation would be that a B/W CCD camera
can see, in low light levels, as much as the human eye. Described in a technical way, normal B/W
CCD cameras can cover a light range from 10⁵ lx to 10⁻² lx. This range of light intensity is called the photopic vision area.
Sometimes, for special purposes, there is a need for an even lower light level camera. The light range below 10⁻² lx belongs to the scotopic vision area. Although the human eye cannot see this low, it is possible to get images at light levels much lower than 10⁻² lx by using the integration function available on some cameras. This is a function where an exposure time longer than 1/50 s (1/60 s for EIA) is used. Obviously, in such a case we lose the real-time effect and the camera actually becomes a kind of storage device. This might not be acceptable for viewing moving objects in low light levels, but it is a good alternative for viewing slow-moving objects in the dark. If we want to see real movement in the scotopic vision area, a special type of camera called an intensified, or low light level (LLL), camera can be used.
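The trade-off of the integration function is easy to quantify: integrating n fields collects roughly n times more light but updates the picture only 50/n times per second on a PAL camera. The sketch assumes an ideal sensor whose collected charge grows linearly with exposure time; the numbers of integrated fields are arbitrary examples.

```python
FIELD_TIME_PAL = 1 / 50  # s, normal CCIR/PAL field exposure


def integration_tradeoff(fields_integrated: int, field_time: float = FIELD_TIME_PAL):
    """Return (exposure in s, relative light collected, image updates per second).

    Assumes collected charge grows linearly with exposure time (ideal sensor).
    """
    exposure = fields_integrated * field_time
    light_gain = fields_integrated          # relative to a single field
    updates_per_second = 1.0 / exposure
    return exposure, light_gain, updates_per_second


for n in (1, 8, 32, 128):
    exp_s, gain, ups = integration_tradeoff(n)
    print(f"{n:>4} fields: {exp_s:6.2f} s exposure, x{gain} light, {ups:5.2f} images/s")
```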
Intensified cameras have an additional element, called a light intensifier, that is usually installed between the lens and the camera. The light intensifier is basically a tube that converts very low light, undetectable by the CCD chip, to a light level that can be seen by it. First, the lens projects the low light level image onto a special faceplate that acts as an electronic multiplying device, where literally every single photon of light information is amplified to a considerable signal size. The amplification is done by an avalanche effect of the electrons that the light photons release, accelerated by a high-voltage static field. The resultant electrons hit the phosphor coating at the end of the intensifier tube, causing the phosphor to glow and thus producing visible light (in the same manner as an electron beam produces light on a B/W CRT). This now visible image is then projected onto the CCD chip, and that is how a very low light level object is seen by the camera. Because of the very specific infrared wavelengths of low-level light, as well as the monochrome phosphor coating of the intensifier, LLL cameras will only display monochrome images.
Because the intensifier has a phosphor coating inside, it is to be expected that the lifetime, or more correctly, the MTBF (mean time between failures), of an intensifier tube is short. It is usually in the vicinity of a couple of thousand hours.
In order to prolong this lifetime, high F-stop lenses are necessary (with at least F-1200), especially
if the camera is to be used day and night. Also, lenses with infrared light correction should be more
More advanced and purposely built LLL cameras have a fiber optic
plate for coupling the phosphor screen of the intensifier tube to the
CCD chip. This technique avoids any further light losses and improves
picture sharpness.
Needless to say, the intensifier requires a power source in order to produce the high-voltage static field for the electrons’ acceleration. This type of intensifier can be bought separately and installed onto a camera, but specifically made integrated cameras have much better performance.
Another interesting and innovative design has been offered by PixelVision Inc. with its back-illuminated CCD camera that operates without an image intensifier. This camera, the manufacturer claims, is capable of acquiring quality images at low light levels previously attainable only with image intensifier tubes. Conventional video cameras use front-illuminated CCDs that impose some limitations on performance. PixelVision’s device is illuminated and collects charge through the back surface, permitting the image photons to enter the CCD unobstructed and allowing high-efficiency light detection in the visible and ultraviolet wavelengths. The manufacturer claims greater resolution under low light conditions through increased sensitivity, better target identification through superior contrast and resolution, lower cost, and a longer lifespan through increased reliability.
Photo courtesy of e2V technologies
A modern LLL camera
Camera power supplies and copper conductors
A typical CCD camera consumes between 3 and 4 W of power. This means that a 12 V DC camera needs roughly 300 mA of supply current, and a 24 V AC camera no more than about 200 mA. As the technology improves, cameras will consume less current.
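The current figures follow from I = P/V. A quick sketch; it ignores the power factor of AC-powered cameras and any losses in the camera’s internal regulators.

```python
def supply_current_ma(power_w: float, voltage_v: float) -> float:
    """Current drawn by a camera, I = P / V, in milliamps (power factor ignored for AC)."""
    return 1000.0 * power_w / voltage_v


for p in (3.0, 4.0):
    print(f"{p} W: {supply_current_ma(p, 12):.0f} mA at 12 V DC, "
          f"{supply_current_ma(p, 24):.0f} mA at 24 V AC")
```

With 3 to 4 W this gives 250 to 333 mA at 12 V DC, which is why roughly 300 mA per camera is a sensible planning figure.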
When powering a number of cameras from a central power supply, it is important to take the voltage
drop into account and not to overload the supply.
Another very important factor to check with DC power supplies is whether or not they are regulated. Also, if a power supply of 12 V DC/2 A is used, it is advisable to leave approximately 25% to 30% of spare capacity to minimize overheating. Be very critical when choosing a power supply. When some manufacturers quote 12 V/2 A, the 2 A may only be a maximum rating, usually specified for short peak deliveries. In other words, you cannot count on a constant load of 2 A with any 2 A supply; it really depends on the make and model. Very often, 12 V DC power supplies are actually made with a 13.8 V output, as used for charging batteries on security panels. Take this into account to minimize camera overheating, especially if there is only a short power cable run between the camera and the power supply. With a power cable run of a couple of hundred meters, usually no intervention is required, because the voltage drop along the cable absorbs the extra voltage. But if the camera is in the vicinity of the power supply, the excess power must be dissipated somewhere, and this is usually in the camera itself. To put it simply, a 12 V DC camera gets hotter if it is powered from a 13.8 V rather than a 12 V power supply, and this affects the camera’s S/N performance.
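To put a rough number on the extra heat, the sketch below assumes (hypothetically) that the camera drops any surplus voltage in a linear internal regulator, so the surplus is dissipated as (V_supply minus V_nominal) × I. The 250 mA draw is the figure used in the voltage-drop example later in this section; the 2 V cable drop is an arbitrary illustration. Cameras with switching DC/DC converters behave differently, so treat this as an estimate only.

```python
def excess_dissipation_w(supply_v: float, nominal_v: float, current_a: float,
                         cable_drop_v: float = 0.0) -> float:
    """Extra heat (W) in the camera, assuming a linear internal regulator (hypothetical model)."""
    terminal_v = supply_v - cable_drop_v           # voltage actually reaching the camera
    surplus_v = max(terminal_v - nominal_v, 0.0)   # only the excess above the nominal rating is burnt off
    return surplus_v * current_a                   # P = V * I for the surplus voltage

# 12 V camera drawing 250 mA, mounted right next to a "12 V" supply that delivers 13.8 V
near = excess_dissipation_w(13.8, 12.0, 0.25)
# Same camera at the end of a long run that drops about 2 V (illustrative figure)
far = excess_dissipation_w(13.8, 12.0, 0.25, cable_drop_v=2.0)
print(f"short run: {near:.2f} W extra, long run: {far:.2f} W extra")
```

Under these assumptions the camera next to the supply dissipates about 0.45 W more than intended, while on the long run the surplus is lost in the cable instead.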
Unregulated DC power supplies (usually in the form of plug-packs) are not good for CCD cameras. First, there is a high probability of blowing the camera’s fuse when the power is switched on, owing to voltage spikes created when the load (the camera in this instance) is turned on. Second, extra power is dissipated in the camera when more than 12 V DC is applied. Finally, if the camera does not have any further voltage regulation inside (DC/DC conversion), or if the regulation is of poor quality, the unregulated voltage ripple may get into the readout pulses, thus affecting the video signal.
On the other hand, most regulated power supplies have short-circuit protection. This means that even if the installer makes a mistake with the polarities or termination, the power supply will cut off the output, thus protecting the supply and the camera from further damage. Also, with regulated power supplies, the voltage can be adjusted to compensate for voltage drops. This is not the case with unregulated supplies.
Voltage drop has to be taken into account when powering distant cameras. This is especially critical with 12 V DC cameras, since the voltage drop at lower DC voltages is more evident. This is a result of the P = V · I formula: for a given camera power consumption, the lower the voltage, the higher the current, which in turn increases the voltage drop along a long power cable run.
Very similar logic applies when using numerous 24 V AC cameras powered from a single source
(transformer). When calculating the total amount of current required for all the cameras, always leave
at least 25% to 30% of spare capacity.
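The sizing rule above can be sketched as follows. It assumes identical cameras, estimates each camera's current from I = P/V, and interprets "spare capacity" as the fraction of the supply's continuous rating that stays unused (so the load is at most 70% to 75% of the rating). For 24 V AC transformers P/V is only an approximation, so the manufacturer's current figure should be preferred when it is available; the function name is hypothetical.

```python
def required_supply_current_a(camera_power_w: float, supply_v: float,
                              n_cameras: int, spare: float = 0.3) -> float:
    """Minimum continuous current rating (A) for n identical cameras.

    `spare` is the fraction of the supply's rating left unused (0.25-0.30 per the text),
    so the cameras load the supply to at most (1 - spare) of its rating.
    """
    if not 0.0 <= spare < 1.0:
        raise ValueError("spare must be between 0 and 1")
    per_camera_a = camera_power_w / supply_v      # I = P / V
    total_a = per_camera_a * n_cameras            # all cameras on one supply
    return total_a / (1.0 - spare)                # headroom against overheating

# Eight 4 W cameras on one 12 V DC supply, keeping 30 % spare capacity
need = required_supply_current_a(4.0, 12.0, 8, spare=0.3)
print(f"choose a supply rated for at least {need:.2f} A continuous")
```

Note that the result is a continuous rating; as discussed above, a supply's quoted current may only be a short-term peak figure.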
When AC cameras are used, attention should first be paid to the voltage rating (24 V is what the majority of AC-powered cameras require). Very often, power transformers are sold with the secondary voltage stated for a fully loaded transformer, as used with halogen lamps. This might be misleading, since under big and constant loads a transformer shows a lower voltage than it would if only one camera were connected to it.
An AC camera’s current consumption is very small (200 to 300 mA), so you should look for transformers with a 24 V AC open-circuit (no-load) rating. Last but not least is the shape of the sine wave, which can be especially critical when uninterruptible power supplies (UPS) are used. A stepped sine wave UPS may interfere with the camera electronics and phase adjustment. If a UPS is part of the CCTV system, it should always deliver a true sine wave.
The following is a very basic calculation of the voltage drop that occurs in the so-called figure-8 cable powering a single 12 V DC camera.
The typical copper wire resistance, together with the cross section and the AWG (American Wire Gauge)
is shown in the following table:
The popular figure-8 cable is, in most cases, a 14/0.20 type. The first number indicates the number of strands per conductor, and the second indicates the diameter of each strand in mm. The cross-sectional area of each conductor is 14 × (0.1 mm)² × 3.14 ≈ 0.44 mm², where 0.1 mm is the strand radius. The resistance of a copper figure-8 wire, per meter, is approximately 0.04 Ω. A typical manufacturer’s specification for the 14/0.20 states approximately 8 Ω/100 m DC loop resistance (loop meaning 2 × 100 m). Using these numbers we can calculate the average voltage drop when powering a 12 V DC camera via a 300 m cable run, using the very simple Ohm’s Law.
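The cable figures quoted above can be checked from the stranding alone. The sketch assumes a copper resistivity of about 0.0172 Ω·mm²/m (annealed copper near 20 °C) and ignores temperature and stranding effects, which is why it lands slightly below the rounded 0.04 Ω/m and 8 Ω/100 m figures.

```python
import math

COPPER_RESISTIVITY = 0.0172  # ohm * mm^2 / m, annealed copper at about 20 degrees C (assumed)

def stranded_area_mm2(strands: int, strand_diameter_mm: float) -> float:
    """Total copper cross-section of one conductor, e.g. 14/0.20 -> 14 strands of 0.20 mm."""
    radius = strand_diameter_mm / 2.0
    return strands * math.pi * radius ** 2

def resistance_per_m(area_mm2: float) -> float:
    """DC resistance (ohm) of one metre of a single conductor."""
    return COPPER_RESISTIVITY / area_mm2

area = stranded_area_mm2(14, 0.20)
r_m = resistance_per_m(area)
loop_100m = 2 * 100 * r_m  # out and back: 200 m of conductor
print(f"area = {area:.2f} mm^2, {r_m:.3f} ohm/m, loop = {loop_100m:.1f} ohm per 100 m")
```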
A realistic assumption would be that our 12 V CCD camera consumes 250 mA. This means that the power supply sees the camera as a 12 V/0.25 A = 48 Ω resistor. For 300 m of 14/0.20 cable we will have a total loop resistance of 24 Ω (2 × 300 m × 0.04 Ω/m), so the power supply will see a total resistance of 72 Ω. The 12 V will be divided between the cable resistance Rc and the camera resistance Rccd in proportion to their values; that is, we have a voltage divider. The voltage drop across the cable is therefore Vd = 12 V × 24 Ω/72 Ω = 4 V.
With a 4 V drop, the camera will most likely not work. Therefore, we have to increase the voltage (and
a plug-pack cannot do this) to at least 16 V, according to this calculation.
In practice, however, depending on the camera, we may need only about 13 V, since the camera under test may work properly with as little as 9 V (if we still assume a drop of around 4 V). This would be the case if the camera’s internal minimum requirement (due to further DC/DC regulation inside) were no higher than 9 V.
If we were to use a 24/0.20 cable instead, we would have a 15 Ω total loop resistance, and the same calculation (12 V × 15 Ω/63 Ω) would give a voltage drop of only about 2.9 V.
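The voltage-divider calculation for both cables can be written out as a short sketch. It uses exactly the model from the text: the camera is treated as a fixed resistor (12 V/0.25 A), in series with the cable loop resistance (24 Ω and 15 Ω for 300 m, as quoted above). The function name is hypothetical.

```python
def voltage_drop_v(supply_v: float, camera_current_a: float, loop_ohm: float) -> float:
    """Cable voltage drop using the resistive-divider model described in the text."""
    r_camera = supply_v / camera_current_a          # camera seen as a resistor: 12 V / 0.25 A = 48 ohm
    total = r_camera + loop_ohm                     # series circuit: cable loop + camera
    return supply_v * loop_ohm / total              # share of the supply voltage lost in the cable

# 300 m runs: 14/0.20 gives a 24 ohm loop, 24/0.20 about 15 ohm (figures from the text)
for cable, loop in (("14/0.20", 24.0), ("24/0.20", 15.0)):
    vd = voltage_drop_v(12.0, 0.25, loop)
    print(f"{cable}: drop = {vd:.2f} V, camera sees {12.0 - vd:.2f} V")
```

This is a simplification: many cameras behave more like constant-power loads, drawing more current as their terminal voltage falls, so the real drop can be somewhat larger than the divider predicts.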
The conclusion is: the thicker the cable, the smaller the loop resistance and thus the smaller the voltage drop. Raising the supply voltage with a regulated power supply unit (PSU) may also help, since the regulation range of such supplies is usually from 10 V to 16 V DC.
A similar principle applies to 24 V AC cameras, only then we are talking about RMS (root mean square) voltages; therefore, it may look as though there is a smaller voltage drop.
Ohm’s Law is valid for both AC and DC voltages, so if we try to calculate the voltage drop for a camera powered with, let us say, 24 V AC, we have to consider two things: the current consumption is lower (since the voltage is higher), and the 24 V AC we refer to is really RMS, that is, 24 × 1.41 = 33.84 Vzp (volts zero-to-peak). So, by applying Ohm’s Law, a calculation will give a lower voltage drop compared to 12 V DC power, but this is only due to the different current and voltage numbers. In other words, the lower voltage drop with 24 V AC (and even lower with 110 or 240 V AC) is not because different laws apply to AC cameras, but simply because the voltage is higher. This is in fact the same reason household power is not distributed from power stations at the level used in the household, but is raised to tens of thousands of volts. Thus the current, and with it the voltage drop due to the power cables’ resistance over long distances, becomes much smaller.
For the purposes of easy calculation and further reference, a table of the typical copper wires found on the market is located on the previous page, showing the relation between the nearest AWG number, the most common stranding technique, the area in mm², and the resistance in ohms.
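The AC versus DC argument above can be illustrated with the same divider model. The sketch assumes a 3 W camera treated as a purely resistive load at its rated voltage (R = V²/P), RMS values for the AC case, and the 24 Ω loop of 300 m of 14/0.20 cable; it ignores power factor and cable reactance, which are small at mains frequency but not zero.

```python
def drop_for_power(supply_v: float, camera_power_w: float, loop_ohm: float):
    """Return (drop in volts, drop as % of supply) for a camera modelled as a fixed resistor."""
    r_camera = supply_v ** 2 / camera_power_w       # R = V^2 / P at the rated voltage
    drop = supply_v * loop_ohm / (r_camera + loop_ohm)
    return drop, 100.0 * drop / supply_v

LOOP = 24.0  # ohm: 300 m of 14/0.20 figure-8, as in the DC example
for label, volts in (("12 V DC", 12.0), ("24 V AC (RMS)", 24.0)):
    drop, pct = drop_for_power(volts, 3.0, LOOP)
    print(f"{label}: drop {drop:.2f} V ({pct:.0f}% of supply)")

print(f"24 V RMS corresponds to about {24 * 2 ** 0.5:.1f} V zero-to-peak")
```

The same camera power and the same cable lose about a third of the supply at 12 V but only about a ninth at 24 V, purely because the higher voltage needs less current. (Using √2 rather than 1.41 gives a peak value of about 33.9 V.)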
V-phase adjustment
AC-powered cameras are usually line-locked. This means that the vertical video frequency is synchronized with the mains frequency. If all cameras in a system are locked to the same power supply, that is, to the same phase (do not forget that we can have three different phases, each displaced by 120° relative to the other two), then we will (indirectly) have synchronized cameras.
For the purpose of fine-adjusting the vertical phase of each separate line-locked camera, a V-phase adjustment is available. V-phase adjustment can not only align the vertical sync of the cameras relative to their mains frequency zero crossing, but can also compensate when cameras are powered from different mains phases.
In order to do this, a two-channel oscilloscope is required. One camera is taken as a reference, to which a monitor’s vertical adjustment is set so as to have no picture roll. The V-phase of the camera being adjusted is then set to coincide with the V-phase of the reference camera.
It should be noted that not all AC cameras are necessarily line-locked. That really depends on the camera design and the provision in the electronics for such locking. If in doubt, check with your supplier.
The majority of AC cameras have V-phase adjustment.
Line-locking cameras require a two-channel oscilloscope and V-phase adjustment on the slave cameras, to follow the master one.
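To see what the V-phase control has to correct, it helps to convert a mains phase difference into time. In a line-locked camera one field period equals one mains cycle (20 ms at 50 Hz, about 16.7 ms at 60 Hz), so cameras fed from phases 120° apart have their vertical syncs offset by one third of a field. The sketch below is a simple illustration of that conversion; the function name is hypothetical.

```python
def phase_to_delay_ms(phase_deg: float, mains_hz: float) -> float:
    """Time offset (ms) between vertical syncs locked to mains points phase_deg apart."""
    period_ms = 1000.0 / mains_hz                  # one mains cycle = one field when line-locked
    return (phase_deg % 360.0) / 360.0 * period_ms

for hz in (50.0, 60.0):
    print(f"{hz:.0f} Hz mains: 120 deg = {phase_to_delay_ms(120, hz):.2f} ms, "
          f"240 deg = {phase_to_delay_ms(240, hz):.2f} ms")
```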
Camera checklist
In order to help people involved in installations, here is a list of things to be checked before the camera
is installed in its position. Some may find this list very helpful, and others may even like to add a few
more operations, specific to their particular system. Many integral cameras (fixed and PTZ) made
in recent years do not require all of the steps listed below since many things are now factory preset.
However, there are still many camera setups that may require your thorough checking.
So, it is advisable to check the following before a camera is installed:
* Auto iris plug. This usually comes with the camera, not with the lens. Unfortunately, there is no standard among manufacturers; lately it seems that the majority of them are compatible, but it is still better to check. AI connectors of all shapes and sizes are available, although the square ones are most common. Keep the connector with the camera. It might be very hard to find a spare if you lose it. Also, keep the AI pin-wiring diagram that usually comes with the camera instructions.
* If a DC camera is used, be sure to work out which is the positive and which is the negative
end of the power plug. Sometimes the tip is positive, and sometimes it is negative. For some
DC cameras there is no need for polarity to be known, as they are auto-sensing.
* Do the back-focus in the workshop, especially if a zoom lens is used. Doing the back-focus
on site will be at least 10 times harder. Follow the procedure described in the back-focus section, until you get more practice.
* Select a suitable lens for the angle of coverage required. For this purpose you can use focal-length viewfinders, hand calculators, tables, and so on. Take into account the CCD chip size, as well as whether you have a C-mount or CS-mount camera/lens combination. In the last couple of years vari-focal lenses have often been used instead and adjusted on site. Sometimes they may not have a wide enough or narrow enough angle of view to suit the application, so a fixed focal length lens may be the answer.
* Adjust the focus for the estimated object distance at which the camera will be installed. This is not so critical for a fixed lens, but installers tend to forget to adjust the camera focus on site, or unintentionally change the focus ring. Any out-of-focus problems will not be noticed during the daytime, when the depth of field is large. They will become obvious and problematic at nighttime, when the depth of field is minimal.
* Make sure that the level setting of the auto iris is good for day and night situations. ALC adjustment is important only if a very high-contrast scene needs to be monitored. The level may
need some adjusting depending on the picture contrast.
* Get the mounting screws for the camera (if installed in a housing) and the bracket. These are
1/4" imperial thread screws, usually 10 to 15 mm in length. Sometimes trivial things like this
will slow your installation.
* Make sure the camera/lens combination fits in the housing. If a zoom lens is used, take into
account the focusing objective protrusion when focused to the minimum object distance (MOD).
This should not add more than 10 mm to the lens length.
* Set the ID of the camera if such a model is used.
* If a camera with a CCD iris is used together with an auto iris lens, switch the CCD iris off. Alternatively, use a manual iris or remote-controlled iris lens. Auto iris and CCD iris do not go together very well.
* Set a higher shutter speed if the application requires it. This is usually the case when high-speed traffic is observed and the signal is recorded on a VCR or DVR. Keep in mind, however, that with higher shutter speeds you will need more light on the object, and CCD smear may become more apparent.
* Set the power supply voltage to the required value, taking into account the voltage drop. Also, consider the current required by all cameras connected to the supply.
* If a 24 V AC camera is used and synchronization needs to be achieved, a V-phase adjustment may be necessary. You will need a two-channel oscilloscope and a reference camera for this purpose. Very often it is easier to make such an adjustment in the workshop, and when the cameras are installed on site, just make sure they are powered from the same phase. Otherwise they will be displaced by 120°, as this is the phase difference in the three-phase mains system.
* If a color camera is used, check the white balance setting. Some cameras have selectable indoor and outdoor white balance. Among the automatic white balance models you will find cameras with selectable AWB (automatic white balance) and ATW (automatic tracking white balance). In most situations ATW is the better choice.
* If a digital signal processing camera is used, set the parameters to suit the application.
* If a PTZ camera is used, set the camera ID, communication baud rate, and data line termination to the correct values.
* If a PTZ camera is used, do not forget the mounting brackets, either wall or ceiling mount,
with the suitable cable connectors, conduits, and sealants (especially for outdoor installation).
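The lens selection step in the checklist can be roughed out with the usual pinhole approximation, focal length ≈ sensor width × object distance / scene width. The sketch assumes nominal active-area widths for common CCD formats (approximate values) and ignores lens distortion and minimum object distance; C-mount versus CS-mount affects back-focus, not the focal length needed. The function and table names are hypothetical.

```python
# Nominal active-area widths (mm) of common CCD formats (approximate values)
SENSOR_WIDTH_MM = {'1/4"': 3.2, '1/3"': 4.8, '1/2"': 6.4, '2/3"': 8.8}

def focal_length_mm(sensor: str, distance_m: float, scene_width_m: float) -> float:
    """Focal length that fits scene_width_m across the image at distance_m (pinhole model)."""
    return SENSOR_WIDTH_MM[sensor] * distance_m / scene_width_m

# Example: cover a 4 m wide gateway from 20 m away
for fmt in ('1/3"', '1/2"'):
    print(f"{fmt} camera: about {focal_length_mm(fmt, 20.0, 4.0):.0f} mm lens")
```

The example shows why the chip size matters: the same field of view needs a longer focal length on a larger CCD.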
6. CCTV monitors
Monitors are often considered an unimportant investment in CCTV, compared to the other parts of a
CCTV system. It is, however, very clear that if a monitor is not of equal or better quality than a camera,
the overall system quality will be diminished. Simple but worthwhile advice is: pay as much attention
to your monitor as you do to your camera selection.
General about monitors
Monitors display a video signal from a camera after it has gone through the transmission and switching
media. The camera might be of excellent quality and resolution, but if the monitor does not reproduce
equally or better than the camera, the whole system loses in quality.
In CCTV, as in broadcast TV, the majority of monitor display units are CRTs, which means they use cathode ray tube technology, designed to convert the electrical information contained in the video signal into visual information. Today, there are many alternatives to CRTs, such as liquid crystal display (LCD) monitors, plasma displays, and rear projection monitors, but the most popular are still CRT monitors.
Monochrome monitor operation
The CRTs are coated on the inside with a phosphor layer that, when bombarded with electron beams,
converts the kinetic energy of the electrons into light radiation. Different compositions of phosphor
produce different colors. This is defined as the phosphor spectral characteristic.
For a monochrome (or B/W) CCTV system, a phosphor layer that produces neutral color is used.
Color CRTs use a mosaic of three different phosphors that produce red, green, and blue, that are called
primary colors. These are little pixels (limited by the physical size of the mask) that, when viewed
from a distance, mix into a secondary (resultant) color.
It has been proven that with the red, green, and blue primaries the majority of natural colors can be simulated. This kind of color mixing is called additive mixing because light is added by each of the primary components to produce the resultant color. This is contrary to subtractive color mixing, as in painting and printing, where the term subtractive is used because these colors are produced by reflecting light. In that case certain colors are absorbed (depending on the color pigmentation), making it a passive method of producing the resultant color.
The “In-line” type CRT has RGB pixel elements arranged in line, with every second one displaced vertically by half in order to make the most of the interlaced scanning.
The “Delta” type CRT has RGB pixels arranged in a triangle.
There are a few different technologies available for making color CRTs, based on how the red, green, and blue phosphor elements are arranged. Some of these are patented technologies, such as Sony’s popular Trinitron. The other two common ones are the “In-line”, as shown on the representation on the previous page, and the “Delta”, as shown above. These technologies are used in CCTV CRT monitors, but also in computer monitors. The maximum resolution that can be reproduced is defined first by the size of the smallest RGB elements, which make up a color dot, and by their arrangement. This is usually specified in the CRT technical data as dot-pitch. Current technology produces a smallest dot-pitch of around 0.21 mm. This then indirectly defines the smallest CRT screen size for a given resolution. This is one of the reasons small color monitors, for example, do not come in high resolution.
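The link between dot-pitch and screen size can be made concrete with a rough count of color dots across the screen width. This sketch assumes a 4:3 tube whose visible diagonal equals the nominal size (in practice it is smaller) and simply divides the width by the dot-pitch, ignoring the exact triad geometry of delta or in-line tubes, so treat the numbers as an upper bound rather than a resolution specification.

```python
def max_dots_across(diagonal_in: float, dot_pitch_mm: float, aspect=(4, 3)) -> int:
    """Rough count of colour dots (RGB triads) across the screen width."""
    w, h = aspect
    width_mm = diagonal_in * 25.4 * w / (w * w + h * h) ** 0.5   # width from diagonal
    return int(width_mm / dot_pitch_mm)

for size in (9, 14, 20):
    print(f'{size}" at 0.21 mm: about {max_dots_across(size, 0.21)} dots across')
```

With the same 0.21 mm dot-pitch, a 9'' tube has far fewer dots across its width than a 14'' one, which is the point made above about small color monitors.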
As in the human eye, an important property of the CRT phosphor is persistence. The persistence of the phosphor layer is described as the duration of the luminance after the electron bombardment has stopped. Since the light produced does not disappear abruptly, but decreases slowly, persistence is measured until the time when the luminance produced decreases to 1% of its initial value. Phosphor persistence is a useful feature because it helps minimize flicker, but it should not be longer than the TV frame duration (40 ms), as we want reproduction of dynamic images, whose movements would be blurred if the persistence were too long. The persistence of the majority of CRTs used these days is around 5 ms.
Cross section of a CRT
This is a bit more complicated with color monitors, since not all the phosphors have the same persistence (the blue phosphor has the shortest), but they are all around 5 ms.
Apart from persistence, other important properties of the phosphor used in TV monitors are efficiency and spectral characteristics.
A CRT monitor inside
Efficiency is defined by the ratio between the produced light flux and the electron beam power. The electron beam power depends on the acceleration produced by the CRT’s high voltage and on the electron beam current itself. Different phosphors have different efficiencies; they can produce different luminance with the same amount of electrons and high voltage. In color TV, the phosphor that produces green color, for example, has the highest efficiency, and the red one has the lowest. Hence, the equation:
$$U_Y = 0.3\,U_R + 0.59\,U_G + 0.11\,U_B$$
is applied to the electron beams of the R, G, and B colors in color television sets. This all happens automatically inside a color monitor, and we do not have to worry about it, but it indicates how delicate the color balance of the three primaries can be. Even a slightly stronger external magnetic field can affect the balance, which we sometimes see in the corners of monitors. To fix this, degaussing coils are used, which fire a very strong electromagnetic pulse when a monitor is turned on. Magnetic color distortion occurs frequently when loudspeakers are near a monitor, or even when two monitors sit next to each other: the magnetic field of one affects the other’s precision in reproducing red, green, or blue. To minimize this, in CCTV we use metal-cased monitors. Reproducing colors correctly on a color monitor is a delicate process. Calibrating the camera with its white balance and color temperature is only the beginning; the same process is repeated in the CRT. White balance in monitors is one of the most delicate adjustments in the manufacturing of monitors and TVs, since it is very difficult to do by eye, which adapts easily. Special color probes are used for accurate tuning.
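The weighting in the U_Y equation above can be illustrated with normalized signal amplitudes (0 to 1, an assumption for the example). Equal R, G, and B signals add up to full white, while each primary on its own contributes very different amounts, which shows why a small disturbance to one beam visibly shifts the color balance.

```python
def luminance(u_r: float, u_g: float, u_b: float) -> float:
    """U_Y = 0.3 U_R + 0.59 U_G + 0.11 U_B (weights as given in the text)."""
    return 0.3 * u_r + 0.59 * u_g + 0.11 * u_b

print(f"white:      {luminance(1, 1, 1):.2f}")
print(f"pure red:   {luminance(1, 0, 0):.2f}")
print(f"pure green: {luminance(0, 1, 0):.2f}")
print(f"pure blue:  {luminance(0, 0, 1):.2f}")
```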
CCTV monitors are basically divided into B/W and color, although lately it is almost impossible to find a B/W monitor. Because of the TV standards’ recommendations, there must be compatibility between B/W and color: a B/W video signal can be displayed on a color monitor, and a color signal can be displayed on a B/W monitor. B/W monitors have better resolution (since they have only one continuous phosphor coating) and are very useful in measuring resolution. The smallest dot element in B/W monitors is not defined by a dot-pitch (as there is none) but by the smallest electron beam cross section hitting the phosphor.
Monitor sizes
Monitors are referred to by their diagonal screen size, which is usually expressed in inches, but
sometimes in centimeters. B/W monitors have a variety of sizes; most often used are 9'' (23 cm) and 12''
(31 cm). Smaller sizes, such as 5'' (13 cm) and 7'' (18 cm), are not very practical apart from, perhaps,
vehicle rear vision systems, video intercoms and back-focus adjustments. Bigger ones are most often
used where split-screen images are required, where sizes like 15'' (38 cm), 17'' (43 cm), and 19'' (48
cm) are available.
The most popular color monitor size in CCTV is 14'' (36 cm). This size is most suitable for the viewing
distances typical in CCTV. There are 9'' monitors (some manufacturers make 10'' CRTs as well), which
quite often are more expensive than the 14'' ones. This is due to the massive production of 14'' CRTs
for the domestic market, which has brought the tube prices down. Larger color monitors, such as 17"
or 20", are also available, but they are of a better quality and therefore more expensive.
A lot of installers prefer to use a 14'' TV receiver instead of a proper monitor. This is usually due to
the price advantage. TV receivers are produced by the hundreds of thousands and they have become
very cheap. When such a display is used, you have to make sure the TV has audio/video inputs since,
as we said earlier, in CCTV we use basic bandwidth video signals. In order to display the image on the
screen the A/V channel has to be selected, that is, the TV tuner has to be bypassed. If the TV does not have an A/V input, the signal might be fed through a VCR’s A/V inputs, since a VCR modulates the video signal at its output to the VHF or UHF band (usually channels 2, 3, 4, or 36). The picture quality of a TV receiver, when compared to a monitor’s display, may or may not be equal. This depends on the CRT, the receiver quality, and the input bandwidth, which is usually made to suit a 5 MHz broadcast signal.
Another important factor to consider is that TV receivers are usually housed in a plastic shell and are
not protected against electromagnetic radiation from another set next to it. As we know, in CCTV a
few monitors may be positioned next to each other, and that is why CCTV monitors are usually housed
in metal cabinets.
9" (23 cm) and 14" (36 cm) color monitors
Monitor adjustments
CCTV monitors usually have four adjustments at the front of the unit: horizontal hold, vertical hold,
contrast, and brightness.
The horizontal hold circuit adjusts the phase of the monitor’s horizontal sync relative to the camera signal. The effect of adjusting the horizontal hold is like shifting the picture left or right. When the horizontal phase goes too far to either end, the picture becomes unstable and horizontal scanning lines break. A similar effect may appear when the horizontal sync pulses are too low or deformed; this usually happens with long coaxial cable runs (voltage drop due to significant resistance and high-frequency losses due to significant capacitance). The latter effect cannot be compensated for by the horizontal hold adjustment. By adjusting the horizontal hold, the picture can only be centered.
The vertical hold adjusts the vertical sync phase. This has the effect of compensating for various cameras’ vertical syncs. Usually, a monitor is adjusted for only one video signal, and so the picture stays stable. However, when several non-synchronized video signals are sequentially switched onto a monitor, an unwanted effect called picture roll occurs. This is perhaps the most unwanted effect in CCTV. It occurs owing to the monitor’s inability to lock quickly to the various signals as they are switched through a sequential or a matrix switcher (this is also discussed in the switcher section). This also means that various monitor designs have various locking times; better monitors lock to vertical syncs more quickly.
In CCTV, switching numerous cameras onto one monitor is the most common system design. This is why we will devote some more space to explaining the synchronization techniques used in CCTV. Very rarely are systems designed where each camera goes onto its own monitor. Not only do the system costs become prohibitive, but such systems are also impractical. First of all, physical space is required for more monitors, but more importantly, no security operator can concentrate for long periods on so many different monitors.
Contrast adjusts the dynamic range of the electron beam, thus making the picture higher or lower in contrast (the difference from black to white). It is usually used when lighting conditions change in the room where the monitors are.
Brightness is different from the contrast adjustment because it raises or lowers the DC level of the electron beam, while preserving the same dynamic range. It is adjusted when the video signal tone reproduction is not correct.
A simple rule of thumb is to have the brightness and contrast adjusted so that the viewer can see as many picture details as possible. The less light in the monitor room, the lower the contrast setting can be. By reducing the contrast, picture sharpness improves (smaller electron beam cross section), and the CRT lifetime is prolonged.
The gray scale of a test signal is used to adjust brightness and contrast to optimum setting.
Sometimes, brightness and contrast are hard to adjust properly, especially when switching different cameras with different video signals. In order to have an objective setting for the brightness and contrast, a test pattern generator that produces an electronic gray scale should be used (i.e., where the gray levels are equally spaced). Then, the contrast and brightness are adjusted so as to distinguish all of the steps equally well. After such an adjustment is made, the camera brightness and contrast can be judged more objectively. Consequently, we can decide whether a certain camera needs to have its iris level or ALC adjusted.
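An "equally spaced" gray scale simply means that the luminance levels of the staircase differ by the same amount from step to step. The sketch below computes such levels, assuming black at 0 mV and peak white at 700 mV above blanking (the usual 625-line composite figures; other standards differ slightly) and a hypothetical function name.

```python
def gray_scale_mv(steps: int = 10, black_mv: float = 0.0, white_mv: float = 700.0):
    """Equally spaced luminance levels (mV above blanking) for a staircase test signal."""
    if steps < 2:
        raise ValueError("need at least black and white")
    step = (white_mv - black_mv) / (steps - 1)
    return [round(black_mv + i * step, 1) for i in range(steps)]

print(gray_scale_mv(11))   # [0.0, 70.0, 140.0, 210.0, 280.0, 350.0, 420.0, 490.0, 560.0, 630.0, 700.0]
```

When the monitor is set correctly, each of these steps should appear as a distinct, evenly graded band from black to white.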
With time, the phosphor coating of a monitor’s CRT wears out. This is due to the constant bombardment of the phosphor layer with electrons. The lifetime expectancy of a B/W CRT is around 20,000 to 30,000 hours, which means roughly two to three and a half years of constant operation. Worn-out CRT phosphor reproduces images with very poor contrast and sharpness. Color monitors should last a little longer, because a smaller number of electrons (note that there are three separate beams for the three primary colors) is used to excite each of the three phosphors. In any case, after a few years of constant use, contrast and brightness adjustment can no longer compensate for the CRT’s ageing, and that means the monitors need replacement.
Sometimes, when a monitor is displaying one camera all the time, an imprinted image effect becomes
noticeable (as was the case with tube cameras). If brightness and contrast adjustment are used carefully
and in accordance with the ambient light, the monitor’s life can be prolonged. The same applies to the
domestic TV receivers.
Linearity and picture height are two other adjustments, and are usually located at the back of the monitor.
Linearity adjusts the vertical scanning linearity, which is reflected in the picture’s vertical symmetry. If the linearity is not properly adjusted, circles appear egg-shaped. In order to adjust monitor linearity, a test pattern generator with a circular pattern is required. Sometimes a CCD camera can be used instead (CCDs do not have geometrical distortions) by positioning it to look perpendicularly at a perfectly circular object.
Picture height, as the name suggests, adjusts the height of the picture. With an improper picture height adjustment, circles may appear elliptical. The scanning raster is also affected (increased or decreased), which indirectly changes the picture’s vertical resolution.
Most monitors have an electron beam focus adjustment, which is usually inside the monitor, close to the high-voltage unit. This adjustment controls the thickness of the electron beam when it hits the phosphor coating, indirectly affecting the sharpness of the picture. On some monitors, this adjustment may be located at the front of the monitor and could also be called aperture.
A close-up of a B/W monitor section showing 2.5 and 5 MHz test signal
Color monitors have a color adjustment as well, which increases or decreases the amount of color in a color signal. This is different from the brightness control. Color monitors are especially sensitive to static and other external magnetic fields, because color reproduction depends very much on the proper dynamic positioning of the three electron beams (red, green, and blue). Even a slight presence of another magnetic field, such as a loudspeaker next to the CRT, may affect one of the beams more than the other two. This then results in unnaturally colored spots in the areas of the screen that are close to the magnetic field. In order to combat such effects, color TV monitors have an additional element in their design called a degaussing coil. The degaussing coil is a conductor loop around the CRT, into which a strong current pulse is injected every time the monitor is turned on. This creates a short but strong electromagnetic pulse that clears any residual magnetic fields. If the external field is very strong and permanent, the degaussing coil might not be capable of clearing it.
A special high-resolution sweep generator is used to check the bandwidth response of various monitors.
Professional monitors (designed for the broadcast industry) are quite often used in bigger and better CCTV systems. They are equipped with sophisticated electronics and high-resolution CRTs whose horizontal resolution exceeds 600 TV lines. They quite often have some additional adjustments along with the ones mentioned above. These may include hue (which is actually the color itself: red, green, orange, etc.); saturation (representing the purity of the color, i.e., how much white is mixed into it, where a 100% saturated color has no white added); H-V delay (a very useful feature where horizontal and vertical syncs are delayed so that the CRT shows the signal broken up into four areas, similar to a quad, so that horizontal and vertical syncs can be visually checked); and underscan (where the monitor shows 100% of the video signal, which is especially important when testing camera resolution).
A close-up of the dot-pitch of a delta type color CRT, compared with the ruler below in millimeters.
Impedance switch
At the back of most CCTV monitors is an impedance switch next to two BNC connectors. The purpose of the impedance switch is to allow either terminating the video coaxial cable with 75 Ω (when the monitor is the last element) or leaving it in the high-impedance position if the monitor is not the last component in the video signal path.
As we have already discussed, the video sources used in CCTV are all designed to have 75 Ω output impedance, which requires the same impedance from the signal receivers (monitors in this case). Only then will we have maximum energy transfer and correct picture reproduction. If, however, the monitor is not the last element in the signal path, but perhaps another monitor is using the same signal, we then have to set the impedance of the first monitor to high (looping monitor) and set 75 Ω on the last one (terminating monitor).
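Why the termination matters can be shown with a simple voltage divider. The sketch assumes a 75 Ω source whose open-circuit amplitude is 2 Vpp (so a correctly terminated line carries the standard 1 Vpp), a "high" input of about 10 kΩ (a hypothetical value), and it ignores cable losses and reflections. Forgetting the termination roughly doubles the signal (a too bright, over-contrasted picture), while terminating twice drops it to about two thirds (a dark, flat picture).

```python
def received_vpp(source_ohm: float, load_ohm: float, open_circuit_vpp: float = 2.0) -> float:
    """Video amplitude at the load: a simple voltage divider of source and load impedance."""
    return open_circuit_vpp * load_ohm / (source_ohm + load_ohm)

def parallel(*rs: float) -> float:
    """Equivalent resistance of inputs connected in parallel on the same cable."""
    return 1.0 / sum(1.0 / r for r in rs)

HI_Z = 10_000.0  # ohm, an assumed value for the "high" switch position

print(f"correctly terminated (75 ohm):   {received_vpp(75, 75):.2f} Vpp")
print(f"unterminated (both on high):     {received_vpp(75, parallel(HI_Z, HI_Z)):.2f} Vpp")
print(f"double terminated (both 75 ohm): {received_vpp(75, parallel(75, 75)):.2f} Vpp")
```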
Most CCTV monitors have passive video inputs.
There are monitors and other devices, such as
VCRs, video printers, and video distribution
amplifiers, where the video input is active.
Active means the video signal is going through
an amplifier stage and the signal is split into
two or more components that are electronically
matched with their impedances. In such cases, The manual impedance switch is usually at
we do not have any switches to switch because
the back of monitors.
there is no need for them. In other
words, do not be confused if you
cannot see an impedance switch on
some professional monitors or VCRs.
This will simply mean that the video
input is automatically terminated with
75 Ω, and the output of it should be
treated as a new signal coming out If a monitor does not have a manual switch, it means
the electronics inside automatically terminate it.
of a camera.
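To show why the switch setting matters, the following minimal sketch treats the camera output as a 2 Vpp open-circuit source behind 75 Ω (an assumption chosen so that a correctly terminated line delivers the usual 1 Vpp) and computes the level and reflection seen at the end of the cable for three switch combinations. The 10 kΩ value for a looping ("high") input is also an assumption, and cable losses are ignored.

```python
SOURCE_OHMS = 75.0        # CCTV video output impedance
CABLE_OHMS = 75.0         # characteristic impedance of the coax
OPEN_CIRCUIT_VPP = 2.0    # assumed: gives 1 Vpp into a matched 75 ohm load
HIGH_Z_OHMS = 10_000.0    # assumed input impedance of a looping ("high") input


def parallel(*resistances: float) -> float:
    """Combined resistance of monitor inputs hanging on the same cable."""
    return 1.0 / sum(1.0 / r for r in resistances)


def load_vpp(load_ohms: float) -> float:
    """Peak-to-peak video level across the load (simple voltage divider)."""
    return OPEN_CIRCUIT_VPP * load_ohms / (SOURCE_OHMS + load_ohms)


def reflection_coefficient(load_ohms: float) -> float:
    """Fraction of the signal reflected back along the cable (0 = perfect match)."""
    return (load_ohms - CABLE_OHMS) / (load_ohms + CABLE_OHMS)


scenarios = {
    "looping + terminating (correct)": parallel(HIGH_Z_OHMS, 75.0),
    "both monitors set to 75 ohm": parallel(75.0, 75.0),
    "both monitors set to high": parallel(HIGH_Z_OHMS, HIGH_Z_OHMS),
}

for name, z in scenarios.items():
    print(f"{name:32s} load={z:8.1f} ohm  level={load_vpp(z):.2f} Vpp  "
          f"reflection={reflection_coefficient(z):+.2f}")
```

Terminating twice drops the level to about two-thirds of its proper value (a dark, flat picture), while leaving the line unterminated almost doubles it and reflects nearly all of the signal back along the cable (a bright, washed-out picture, possibly with ghosting).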
Viewing conditions
In a CCTV system the number of monitors can be quite large. It is very important to know how many
monitors can be used in a place without going overboard, as well as how to position them and what will
be the correct viewing distance for the users. Even with one monitor in the system, operators should be
aware of certain facts and recommendations, especially when they are spending the majority of their
time in front of the monitors.
CCIR Recommendation 500 (the CCIR is now part of the ITU, as the ITU-R) states that the preferred viewing conditions depend on the field frequency of the TV system, the size of the screen, and the viewing distance relative to the screen size.
Typically, for CCTV monitors, an optimum distance is around seven times the screen height. These
recommendations are based on the practical resolution limits of the human eye. In other words,
these distances give the viewer the best detail for the given resolution (assuming of course 20/20 vision).
This is explained in more detail in Chapter 9.
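The seven-screen-heights rule is easy to turn into distances. This sketch assumes a 4:3 screen whose nominal diagonal equals the visible picture diagonal (in practice a CRT shows slightly less), so the picture height is 0.6 times the diagonal.

```python
INCH_M = 0.0254
ASPECT_W, ASPECT_H = 4, 3        # assumed 4:3 CRT picture


def screen_height_m(diagonal_in: float) -> float:
    """Picture height from the nominal diagonal (4:3 gives height = 0.6 x diagonal)."""
    diag_units = (ASPECT_W ** 2 + ASPECT_H ** 2) ** 0.5   # = 5 for 4:3
    return diagonal_in * INCH_M * ASPECT_H / diag_units


def optimum_distance_m(diagonal_in: float, heights: float = 7.0) -> float:
    """Recommended viewing distance of roughly seven screen heights."""
    return heights * screen_height_m(diagonal_in)


for size in (9, 12, 14, 17, 21):
    print(f'{size:2d}" monitor: about {optimum_distance_m(size):.2f} m')
```

A 9" monitor comes out at roughly 0.96 m, in line with the 1 m figure used for a single operator, whereas a 17" monitor calls for about 1.8 m.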
The table on the next page shows recommendations only, and one should be flexible when applying
these recommendations in various circumstances, especially when considering the flicker effect in control
rooms with a large number of CRT monitors. With big systems, where perhaps a dozen monitors need
to be mounted in front of the operator(s), viewing distances may vary. It is also very important to plan
and suggest the number of operators needed for a given number of monitors and control points.
Typical distances for optimum viewing of details
It is a known fact that flicker from the vertical refresh is more noticeable in the peripheral vision of the eye. In other words, if you have many monitors to view, the vertical refresh rate of the surrounding monitors affects your vision even though you may be comfortably watching the monitor directly in front of you. For this reason, some manufacturers are now coming up with 100 Hz monitors for CCTV (this is more critical with PAL and SECAM because of their lower vertical frequency). The 100 Hz monitors simply double the 50 Hz field refresh rate, and the display looks rock-steady. Sitting in front of such monitors for a longer period is a definite advantage, and I would suggest using such monitors where the display has to be of a bigger size. The bigger the monitor screen, the more noticeable the flicker.
Another important consideration is the electrostatic radiation of larger monitors. Although this radiation is negligible for a single monitor, when a wall of monitors is installed in one room it may have a significant influence on the environment, as can usually be confirmed by the amount of dust collected by such a large number of monitors. There is a low-radiation standard for display monitors, originating in Sweden, called MPR II. This standard is also being accepted by some CCTV manufacturers, which clearly gives an advantage to systems designed with such monitors.
With large systems, visual display management is of vital importance. For example, not all monitors need to display images all the time. It may be much more effective if the operator concentrates on one or two active monitors (usually larger ones) while the rest of them are blank. In case of activity (i.e., an alarm activation, motion detection, or perhaps video fail detection), a blank monitor can be programmed to bring up the image of a preprogrammed camera. In such a case, the operator’s attention is immediately drawn to the new image and the system becomes more efficient. As an additional bonus, the monitor’s lifetime will also be prolonged. Most video matrix switchers can be programmed to do such blanking and to display alarmed cameras only when an alarm occurs.
Another subject relating to viewing conditions is the size of the monitor and its effect on the picture resolution. Clearly, whether a 9", 12", or 17" monitor is used, the resolution would still be more or less the same (assuming the same quality of electronics inside). The impression of picture sharpness, however, may be different. So, a 9" monitor will be quite okay when a single operator views it from about 1 meter. But if a 17" monitor is viewed from the same distance (usually in order to view a quad picture, for example), a full-screen image will appear to have a lower resolution than on the 9" monitor. This is only an illusion, attributable to the different viewing distance relative to the monitor’s size.
Designing an efficient control room depends on the available space, the number of operators, and the size of the system (number of cameras to be viewed).
Other types of monitors that have to be considered in CCTV today are rear-projection monitors, screen projectors, and LCD and plasma monitors. Each of them has certain advantages, life expectancy, and specifics in regard to how they are installed.
A central projector monitor is activated only when an alarm is triggered, attracting the operator’s attention.
The CRT phosphor does not have a linear characteristic. This means that if a linear signal is displayed (a continuously rising ramp from black (0 V) to white (0.7 V)), the luminance will not rise at the same rate. The monochrome monitor’s characteristic of luminance versus input signal is a power function (close to a parabola) with an exponent of 2.2.
Ideally, we would like to have a linear CCTV system. This would mean linear reproduction of the gray levels and colors. But since the CRT phosphor coating naturally lacks a linear characteristic, we have to compensate for it somehow. This compensation is most easily done at the camera end. If the CCD camera’s luminance-versus-voltage characteristic (usually linear) is electronically modified to have the inverse characteristic of the CRT (1/2.2 ≈ 0.45), we will get a linear camera-monitor system.
When these two curves (of the monitor and camera) are put together on a single diagram, they are
symmetrical around a straight line of 45°. This resembles the mathematical symbol γ (gamma), hence
the name.
In practice, if you have a camera with a gamma setting that is not complementary to the monitor’s
characteristic, the picture quality will not be as good. This is reflected in the unnatural reproduction of
gray levels where the picture has high contrast, lacking details in the middle gray range.
Most B/W monitors have a gamma of 2.2; therefore, 0.45 should be the common default setting for a B/W camera. CCD sensors themselves are naturally linear (a gamma of 1), so this correction is applied by the camera electronics.
Color monitors are especially sensitive to the gamma effect, and as mentioned earlier in Chapter 4, the
NTSC and PAL systems are designed with two different assumptions for the gamma values of the color phosphors. Theoretically, as assumed in NTSC, gamma should be 2.2, but in practice most of the phosphor
coatings are close to 2.8, as proposed by PAL. Higher values of gamma have the appearance of higher
contrast images. Clearly, this depends not only on the standard (NTSC or PAL) but also on the type of
phosphor coating inside the monitor’s CRT.
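The effect of matching (or mismatching) camera and CRT gamma can be checked numerically. In this simplified sketch the scene luminance is normalized to 0..1, the camera maps it to 0 to 0.7 V above black, and the CRT raises the normalized signal to its gamma; the overall (system) gamma is simply the product of the two exponents. Everything else in a real signal chain is ignored.

```python
CRT_GAMMA_NTSC = 2.2
CRT_GAMMA_PAL = 2.8
CAMERA_GAMMA = 0.45     # approximately 1/2.2, the usual camera setting
WHITE_V = 0.7           # white level above black, in volts


def camera_output(scene_luminance: float, gamma: float = CAMERA_GAMMA) -> float:
    """Gamma-corrected video voltage for a normalized scene luminance (0..1)."""
    return WHITE_V * scene_luminance ** gamma


def crt_luminance(video_v: float, gamma: float = CRT_GAMMA_NTSC) -> float:
    """Normalized light output of a CRT driven with video_v volts above black."""
    return (video_v / WHITE_V) ** gamma


mid_gray = 0.5
for label, cam_g, crt_g in [
    ("linear camera (gamma 1) on 2.2 CRT", 1.0, CRT_GAMMA_NTSC),
    ("0.45 camera on 2.2 CRT", CAMERA_GAMMA, CRT_GAMMA_NTSC),
    ("0.45 camera on 2.8 CRT", CAMERA_GAMMA, CRT_GAMMA_PAL),
]:
    shown = crt_luminance(camera_output(mid_gray, cam_g), crt_g)
    print(f"{label:36s} system gamma {cam_g * crt_g:.2f}, "
          f"mid-gray shown as {shown:.2f}")
```

Without correction, a mid-gray scene is reproduced at only about a fifth of full brightness, crushing the middle grays; with the 0.45 setting the system gamma is close to 1, and on a 2.8 phosphor it rises to about 1.26, which gives the higher-contrast appearance described above.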
Today, gamma is an even more sensitive issue, especially when standard computer monitors are used for displaying digital video recordings. Be aware that various operating systems, video drivers, and programs control gamma differently.
LCD monitors
LCD stands for liquid crystal display, which refers to organic substances whose light-transmitting properties change when a voltage is applied. LCD technology was introduced back in 1970, but for a long time its image quality was inferior.
The liquid crystal display consists of a liquid suspension between two
glass or plastic panels. Crystals in this suspension are naturally aligned
parallel with one another, allowing light to pass through the panel.
When electric current is applied, the crystals change orientation
and block light instead of allowing it to pass through, turning the
crystal region dark.
The concept of LCD operation is quite different from the CRT principles. Perhaps the best description, or analogy, would be that LCD monitors compared to CRT monitors are what CCDs are to tube cameras. The image is not formed by electron beam scanning, but by addressing liquid crystal cells, which are polarized in different directions when voltage is applied to their electrodes. The amount of voltage determines the angle of polarization, which in turn determines the transparency of each pixel, thus forming an element of the video picture. The early versions of liquid crystals were unstable and unsuitable for mass production. Today, things have changed. LCD monitors are also known as flat-panel, dual-scan, active matrix, and thin film transistor (TFT) displays.
LCD monitors are small and elegant.
The advantages of the LCD include the following: no need for high-voltage elements; no phosphor layer to wear out (although the backlight does age); a flat and compact appearance; no geometrical distortions; low power consumption; and no sensitivity to electromagnetic fields, as is the case with CRTs. In addition, LCD prices are going down while brightness and sharpness keep improving. This is the main reason consumers and end-users are switching from the conventional CRT to LCD. Previous LCD technologies were slower and less efficient, and provided lower contrast.
Although there are basically two kinds of LCD – DSTN (dual-scan twisted nematic, also known as
passive) and TFT (thin film transistor, also known as active matrix) – today we almost exclusively
use TFT-based panels.
LCD consists of several layers that are arranged in the following order: polarizing filter, glass, electrode, alignment layer, liquid crystals, alignment layer, electrode, glass, and polarizing filter. The cross section of the TFT LCD panel looks like a multilayer sandwich. At the outermost layer on either side are clear glass substrates. Between the substrates are the thin film transistors, the color filter panel that provides the necessary red, blue, and green primary colors, and the liquid crystal layer. At the back of the LCD is a fluorescent backlight that illuminates the screen from behind. Under normal conditions, when there is no electrical charge, the liquid crystals are in an amorphous state in which light passes through them. By subjecting the liquid crystal layer to varying amounts of electrical charge, the layer will allow different amounts of light to pass through, as the crystals orient themselves according to the charge set by the thin film transistors, the control center for the liquid crystals.
Courtesy of
The LCD concept
Just as in an ordinary CRT, the red, green, and blue liquid crystal “chambers” make up one pixel (picture element). By subjecting the red, green, and blue chambers to varying degrees of electrical charge, different colors can be achieved. As with CRTs, the size of the LCD pixels defines how many lines can be seen on the screen. The typical LCD pixel size is around 0.28 mm, which is sufficient to produce a 1024 × 768 resolution on a 14" notebook computer screen.
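The pixel-size figure can be cross-checked from the screen diagonal and the resolution. This sketch assumes square pixels that fill the stated diagonal exactly; real panels quote a viewable diagonal that can differ slightly from the nominal one.

```python
import math

MM_PER_INCH = 25.4


def pixel_pitch_mm(diagonal_in: float, h_pixels: int, v_pixels: int) -> float:
    """Approximate pixel pitch, assuming square pixels filling the diagonal."""
    diagonal_pixels = math.hypot(h_pixels, v_pixels)
    return diagonal_in * MM_PER_INCH / diagonal_pixels


print(f'14" at 1024 x 768:  {pixel_pitch_mm(14, 1024, 768):.3f} mm')
print(f'30" at 2560 x 1600: {pixel_pitch_mm(30, 2560, 1600):.3f} mm')
```

The 14" notebook case comes out at about 0.28 mm, agreeing with the typical pixel size quoted above.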
Another important parameter of the LCD screen is the pixel response time. The shorter, the better, but LCD pixels are known to respond more slowly than CRTs, which is one of the reasons LCD monitors require a lower video frequency to produce a stable image.
Also, the viewing angle is an important parameter of an LCD screen. These days it is not uncommon to have one greater than 120°.
And finally, one of the weakest parameters of LCDs (compared to CRTs) is the contrast ratio. In CRTs it is easier to produce higher contrast, even though the total brightness of a CRT may be lower than that of LCD screens, simply because the electron gun can be completely shut down when the image needs to display black. In LCD monitors the backlight is always on, and no matter how good the LCD pixels are at blocking light, when black needs to be represented, a certain amount of light still comes through. For a good LCD display, the contrast ratio should be at least 400 to 1.
The largest of them all: Apple’s
30" screen with astounding
2560 x 1600 pixels
On the positive side, LCDs do not have the CRT’s geometric,
convergence, or focus problems, and their clarity makes it
easier to view higher resolutions at smaller screen sizes.
Also, the latest LCD monitors are all digital, unlike CRTs. This means that graphics cards with digital
outputs do not have to convert the graphics information into analog form as they would with a typical CRT
monitor. Theoretically, this makes for more accurate color information and pixel placement. In contrast,
LCDs that plug into standard analog VGA ports actually have to perform a second conversion back to
digital (because LCD panels are digital devices), which can result in distracting artifacts.
Because of the precise and discrete nature of each pixel element in LCD screens, the sharpest
and best image appearance is achieved when the video card resolution is made to match
the native resolution of the LCD screen. In other words, if the LCD screen is 1280 × 1024
pixels, the video card setting on the computer should be set to this mode, not lower or higher.
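Why the native mode looks sharpest can be seen from the scale factors the monitor’s electronics must apply. In this sketch (a simplification that ignores any particular scaler design) a mode maps cleanly only when both factors are equal whole numbers; anything else forces the monitor to interpolate, and unequal factors also distort the picture geometry.

```python
from fractions import Fraction


def scale_factors(source: tuple[int, int],
                  native: tuple[int, int]) -> tuple[Fraction, Fraction]:
    """Horizontal and vertical factors needed to fill the panel."""
    return Fraction(native[0], source[0]), Fraction(native[1], source[1])


def maps_cleanly(source: tuple[int, int], native: tuple[int, int]) -> bool:
    """True only if every source pixel becomes a whole block of panel pixels,
    with the same factor in both directions (no interpolation, no distortion)."""
    sx, sy = scale_factors(source, native)
    return sx == sy and sx.denominator == 1


native = (1280, 1024)
for mode in [(1280, 1024), (1024, 768), (800, 600), (640, 480)]:
    sx, sy = scale_factors(mode, native)
    print(mode, "->", native, f"x{float(sx):.3f} / x{float(sy):.3f}",
          "clean" if maps_cleanly(mode, native) else "interpolated")
```

On a 1280 × 1024 panel only the native mode maps one-to-one; 1024 × 768, for example, needs factors of 1.25 and 1.333, so every pixel is smeared across its neighbors and the picture is slightly stretched vertically.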
Some LCD monitors have composite video, Y/C, and high-resolution RGB (computer) inputs. When such a monitor is fed a composite signal, its electronics oversample (scale) the picture in order to fit it into an XGA resolution, for example.
The following are widely accepted computer screen standards in pixels, which we may come across
in CCTV:
VGA: 640 × 480
SVGA: 800 × 600
XGA: 1024 × 768
SXGA: 1280 × 1024
SXGA+: 1400 × 1050
UXGA: 1600 × 1200
WSGA: 1640 × 1024
WUXGA: 1920 × 1200
Apple 30": 2560 × 1600
Projectors and projection monitors
Although CRT monitors are the most widely used, they can only be so big, for their physical size is limited primarily by the high voltage required to accelerate electrons over the length of the tube. The largest CRTs used in CCTV are hardly bigger than 68 cm (27"). But there are other ways of producing a larger picture, usually by projection methods. Some years ago, projection monitors were extremely big, expensive, and complicated to use and set up. They would usually consist of three separate optical systems, each projecting its own primary color. Today, video projectors are much smaller, cheaper, brighter, and easier to use and set up. In most cases, they accept a range of video inputs such as composite video, RGB (or component) video, Y/C, computer video (SVGA), and the like. Most projectors are single-lens color projectors that filter the light through an LCD film.
One of the biggest advantages of projectors is their ability to produce almost any size image required, depending upon the wall or screen size. They may not have the same brightness as CRTs, but technology advances very rapidly, and ever brighter projectors are coming onto the market.
The brightness is usually expressed in lumens, and a typical LCD projector would have over 1500 lumens, which is sufficient even for brighter rooms. Their resolution has also increased, and with the advances of LCD technology we can get projectors with resolution well over XGA and up to SXGA mode, more than sufficient for high-quality computer presentations, broadcast video, and certainly CCTV.
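Because 1 lux equals 1 lumen per square meter, the light a projector puts on the screen falls with the picture area. This rough sketch ignores lens losses and screen gain, and assumes 4:3 pictures of increasing width lit by the 1500 lumens mentioned above.

```python
def screen_illuminance_lux(lumens: float, width_m: float, height_m: float) -> float:
    """Average illuminance on the screen, ignoring lens losses and screen gain.

    1 lux = 1 lumen per square meter, so the light output is simply spread
    over the picture area.
    """
    return lumens / (width_m * height_m)


for width in (1.2, 2.0, 3.0):          # 4:3 pictures of increasing size
    height = width * 3 / 4
    lux = screen_illuminance_lux(1500, width, height)
    print(f"{width:.1f} m x {height:.2f} m picture: about {lux:.0f} lux")
```

Doubling the picture width spreads the same light over four times the area, so the screen illuminance drops by a factor of four; this is why projector output has to be chosen with both the picture size and the room lighting in mind.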
There are two main technologies here, both of which use a strong light source: LCD and DLP (digital light processing) projectors. DLP technology offers brighter and sharper images, but LCD technology is certainly getting better too.
The DLP idea was developed by Texas Instruments™ and is based on micro-mirror device technology. This is basically a memory chip with a matrix of millions of microminiature mirrors (the chip is similar in size and appearance to a CCD chip). A light source projects an image onto the DLP chip, and the mirrors reflect the image onto virtually any size screen. Each mirror is about 16 millionths of a meter (16 µm) across. The mirrors are so small that a grain of salt could obscure hundreds of them. Each mirror represents a screen pixel. All are controlled and switched on and off by the on-chip circuitry, and every one of the hundreds of switches per second is performed with great precision and accuracy. The mirrors are programmed to remain at designated reflective angles for various time periods within a single frame of motion. This permits gray-scale projection or correct color presentation. For color projection the light is beamed through a condenser lens and then through a red, green, and blue color sequential filter. The filter switching is synchronized to the video information fed to the DLP chip at a rate three times the video field rate (which results in 150 Hz switching for PAL and 180 Hz for NTSC signals).
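How mirrors that are only ever “on” or “off” can produce gray levels is easiest to see with binary-weighted pulse-width modulation: each bit of a pixel’s value keeps the mirror on for a time proportional to that bit’s weight. This is a simplified sketch that assumes 8 bits per color and an equal share of each field for the red, green, and blue periods; real DLP chips use more elaborate bit sequences.

```python
FIELD_RATE_HZ = {"PAL": 50, "NTSC": 60}
BITS = 8  # assumed 8-bit gray scale per color


def color_field_time_s(system: str) -> float:
    """Time available for one color when R, G, B are shown sequentially."""
    return 1.0 / (3 * FIELD_RATE_HZ[system])


def mirror_on_time_s(level: int, system: str) -> float:
    """Total time a mirror spends 'on' for a given 0..255 level.

    Bit k of the level keeps the mirror on for 2**k of the shortest time
    slots; the slots of all eight bits together fill the color period.
    """
    if not 0 <= level < 2 ** BITS:
        raise ValueError("level out of range")
    slot = color_field_time_s(system) / (2 ** BITS - 1)   # shortest (LSB) slot
    return sum(slot * 2 ** k for k in range(BITS) if (level >> k) & 1)


for system in FIELD_RATE_HZ:
    t = color_field_time_s(system)
    print(f"{system}: {1 / t:.0f} Hz color switching, {t * 1e3:.2f} ms per color, "
          f"shortest slot {t / (2 ** BITS - 1) * 1e6:.1f} us")
```

The perceived brightness of a pixel is the fraction of its color period that the mirror spends reflecting light into the lens, which is why the mirrors must switch so quickly and precisely.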
Filtered light is then projected onto the DLP integrated circuit, whose mirrors are switched on or off according to the digital video information written into the chip’s memory circuits. Light shining on these mirrors is then reflected into a lens to project images from the DLP surface. The full-color digitized video image created on the DLP is displayed on either a front or a rear projection screen. Depending on whether one or three DLPs are used, high-brightness projection screen sizes can range anywhere from 1.5 m to 5 m (diagonally). By using a zoom projection lens, the image size can be increased or decreased to virtually any screen size. But the most important benefits (apart from the miniature physical size itself) include equally high resolution, brightness, and color fidelity, regardless of the screen size.
Because of the individual digital processing of
each DLP pixel, this technology and these types
of projectors are also known as digital light
processing (DLP) projectors.
The heart of the DLP projectors,
Texas Instruments’ patented digital
light processing chip
Plasma display monitors
Some scientists refer to plasma as the fourth state of matter (the first three being solid, liquid, and gas). Plasma is often defined as an ionized gas. The theory of plasma is beyond the scope of this book, but we would like to mention the use of plasma in display monitors.
These monitors are made of an array of pixels, each composed of three phosphor subpixels: red, green, and blue. As opposed to CRTs, where light radiation is caused by electron bombardment, in plasma displays gas in the plasma state is used to react with the phosphors in each subpixel. In plasma displays each subpixel is individually controlled in order to get 16.7 million colors. Because each pixel is excited individually by the plasma process, there is no geometrical distortion as there is in CRTs, and picture sharpness and color richness are brought to new heights. The picture contrast is also high, typically over 400:1, making plasma displays suitable for bright areas.
Plasma display
Photo courtesy of Fujitsu
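The 16.7 million figure is what one gets if each subpixel is driven in 256 levels (8 bits); this bit depth is an assumption here rather than something stated above:

```latex
N_{\text{colors}} = 256^{3} = \left(2^{8}\right)^{3} = 2^{24} = 16\,777\,216 \approx 16.7 \times 10^{6}
```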
Since the plasma display does not require high voltage as the CRT does, larger displays are possible. Typical plasma display sizes are from 107 cm (42") up to 127 cm (50"). More importantly, however, the thickness of plasma displays is minimal, ranging from 10 to 15 cm (4–6"). This is especially attractive for aesthetic reasons, but also for rooms with limited space.
Since plasma displays are based on phosphor coatings, they also fade with time. Manufacturers usually claim 30,000 hours for the brightness to fall to 50% of its original level. This is equivalent to about three and a half years of constant operation, which is more or less the same as what CRT monitors are quoted at.
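Converting the rated half-brightness life into calendar time is simple arithmetic; the second line uses a hypothetical 12-hours-a-day duty cycle for comparison:

```latex
\frac{30\,000\ \text{h}}{24\ \text{h/day} \times 365\ \text{days/year}} = \frac{30\,000}{8760} \approx 3.4\ \text{years (continuous operation)}
\qquad
\frac{30\,000\ \text{h}}{12\ \text{h/day} \times 365\ \text{days/year}} \approx 6.8\ \text{years}
```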
Field emission technology displays
Another alternative for better image displays, but on a standard screen size and not a projection screen,
was presented by Motorola™ about five years ago. The concept is a flat display with active light
emission called field emission device (FED) technology. Instead of a single cathode ray source, as is
the case with a standard CRT display, the FEDs rely on hundreds of little cathode ray sources for each
pixel. FEDs are composed of two sheets of glass separated by a vacuum. The back glass, or cathode,
is made up of millions of tiny tips that form the source of electrons that accelerate across the vacuum.
The front glass, or anode, has layers of standard CRT phosphors.
The FED display, it is claimed, offers many of the benefits of a CRT display, but it is thinner, lighter, uses less power, and has no geometrical distortions. The addressable x-y emitter layout eliminates the nonlinearity and pincushion effects associated with standard CRT images. The companies developing the FED claim that these types of display devices will be cheaper and easier to manufacture than LCDs, and since there is no need for a single RGB gun that dictates the equivalent CRT size and appearance, FED displays can be larger, yet thinner and lighter.
7. Video processing equipment
Only very small CCTV systems use the simple camera-monitor concept. Most of the bigger ones, in
one way or another, use video switching or processing equipment before the signal is displayed on a
monitor. With the introduction of digital video recorders, these functions are slowly starting to be done
digitally, but in this chapter we are going to cover the good old analog switching equipment.
The term video processing equipment, as used here, refers to any electronic device that processes the
video signal in one way or another, such as switching between multiple video inputs, compression into
one quadrant of the screen, and boosting of the higher frequencies.
Analog switching equipment
The simplest and most common device found in small to medium-sized CCTV systems is the video sequential switcher. As the name suggests, it switches multiple video signals onto one or two video outputs sequentially, one after another.
Video sequential switchers
Since in the majority of CCTV systems we have more cameras than monitors on which to view them,
there is a need for a device that will sequentially switch from one camera signal to another. This device
is called a video sequential switcher.
Sequential switchers come in all flavors. The simplest one is the 4-way switcher; then we have 6-way,
8-way, 12-way, 16-way, and sometimes 20-way switchers. Other numbers of inputs are not excluded,
although they are rare.
The switcher’s front panel usually features a set of buttons for each input; besides the switch position for manual selection of cameras, there is a switch position for including a camera in the sequence or bypassing it. When a sequence is started, the dwell time can be changed, usually by a potentiometer. The most common and practical dwell time setting is 2 to 3 seconds. A shorter dwell time is impractical and disturbing to the operator’s eyes, while a longer one may result in the loss of information from the nondisplayed cameras. So, in a way, sequential switchers are always a compromise.
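The compromise can be quantified: with N cameras in the sequence, each one is off-screen for (N − 1) dwell periods. This short sketch also models which camera is on screen at a given moment, assuming the sequence simply contains the included (non-bypassed) cameras in order.

```python
def cycle_time_s(cameras: int, dwell_s: float) -> float:
    """Time for the sequence to visit every camera once."""
    return cameras * dwell_s


def unseen_time_s(cameras: int, dwell_s: float) -> float:
    """How long each camera goes undisplayed between its own appearances."""
    return (cameras - 1) * dwell_s


def camera_on_screen(t_s: float, sequence: list[int], dwell_s: float) -> int:
    """Camera shown at time t_s (seconds since the sequence started).

    `sequence` holds only the cameras switched into the sequence; bypassed
    cameras are simply left out of the list.
    """
    step = int(t_s // dwell_s) % len(sequence)
    return sequence[step]


for n in (4, 8, 16):
    print(f"{n:2d} cameras at 2 s dwell: cycle {cycle_time_s(n, 2):.0f} s, "
          f"each camera unseen for {unseen_time_s(n, 2):.0f} s")
```

With 16 cameras at a 2-second dwell, each camera is out of view for 30 seconds at a time, which is plenty of time to miss an event.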
Apart from the number of video inputs, sequential switchers can be divided into switchers with and
without alarm inputs.
When a sequential switcher has alarm inputs, it means that external normally open (N/O) or normally closed (N/C) voltage-free contacts can halt the scanning and display the alarmed video signal.
Various sources can be used as alarm devices. For
indoor applications the choice of suitable sensors is
often straightforward, but outdoor alarm sensors are
more critical and harder to select. There is no perfect
sensor for all applications. The range of site layouts
and environmental conditions can vary enormously.
The best help you can get in selecting a sensor is from
a specialized supplier that has both the knowledge
and the experience.
Most common are the passive infrared (PIR) detectors, door reed switches, PE beams, and video motion detectors (VMDs). When designing such systems, care should be taken to establish how the switcher behaves after an alarm goes off: how long the alarmed video input remains displayed; whether it requires manual or automatic reset; if the latter, after how many seconds the automatic reset activates; what happens when a number of alarms activate simultaneously; and so on. The answers to all of these questions are often decisive for the system’s efficiency and operation. There is no common answer, and it should be checked against the manufacturer’s specifications. Even better, test it yourself.
It is not a rule, but quite often simple sequential switchers (i.e., those without alarm inputs) have only
one video output. The alarming sequential switchers on the other hand, quite often have two video
outputs: one for video sequencing and the other for the alarmed picture. The first output is the one that
scans through all the cameras, while the second one is often called the alarmed or spot output because
it displays the alarmed picture (when the alarm activates).
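The questions above can be made concrete with a small model of one possible spot-output policy. Everything here is hypothetical (the `SpotOutput` class and its rules are not any manufacturer’s design): alarms arriving while another is displayed are queued in order, each is held for an automatic reset time, and setting that time to `None` turns it into a manual-reset system.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpotOutput:
    """Hypothetical alarm policy for a switcher's spot (alarm) output.

    - alarms that arrive while another is displayed are queued in order;
    - with auto_reset_s set, each alarm is shown for that many seconds;
    - with auto_reset_s = None, the operator must call reset().
    """
    auto_reset_s: Optional[float] = 10.0
    queue: deque = field(default_factory=deque)
    showing: Optional[int] = None
    shown_since: float = 0.0

    def alarm(self, camera: int, now: float) -> None:
        """Register an alarm; duplicates of a pending or shown camera are ignored."""
        if camera != self.showing and camera not in self.queue:
            self.queue.append(camera)
        self._advance(now)

    def reset(self, now: float) -> None:
        """Manual reset by the operator."""
        self.showing = None
        self._advance(now)

    def tick(self, now: float) -> Optional[int]:
        """Camera on the spot monitor at time `now` (None = no alarm shown)."""
        if (self.showing is not None and self.auto_reset_s is not None
                and now - self.shown_since >= self.auto_reset_s):
            self.showing = None
        self._advance(now)
        return self.showing

    def _advance(self, now: float) -> None:
        if self.showing is None and self.queue:
            self.showing = self.queue.popleft()
            self.shown_since = now


spot = SpotOutput(auto_reset_s=10)
spot.alarm(3, now=0)    # e.g. a PIR detector on camera 3
spot.alarm(7, now=2)    # e.g. a door reed switch on camera 7, queued
timeline = [spot.tick(t) for t in (1, 9, 10, 19, 20)]
print(timeline)
```

Changing a single rule, for instance letting the newest alarm pre-empt the current one instead of queueing it, changes what the operator sees during simultaneous alarms, which is exactly why the real behavior must be checked or tested.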
Video sequential switchers (or just switchers for short) are the cheapest devices that come between multiple cameras and a video monitor. This does not mean that more sophisticated sequential switchers are not available. There are models with text insertion (camera identification, time, and date), multiple configuration options via RS-232, RS-485, or RS-422 communications, and so on.
Some models like these either send power down the coaxial cable to the camera, or they send synchronization pulses to the camera via the same cable that brings video signals to the switcher. All this is done with the intention of synchronizing the cameras, which will be discussed next. Most of these more sophisticated sequential switchers can easily be expanded to the size of a miniature matrix switcher.
One of the more important aspects of switchers, regardless of how many inputs they have, is the switching technique used. Namely, when more than one camera signal is brought to the switcher inputs, it is natural for them to have different video signal phases. This is a result of the fact that every camera is, in a way, a self-contained oscillator producing the line frequency of the corresponding TV system (i.e., for CCIR 625 × 25 = 15,625 Hz and for EIA 525 × 30 = 15,750 Hz), and it is hard to imagine that half a dozen cameras could have coincident phases. This is unlikely even for only two cameras. We call such random-phase signals nonsynchronized. When nonsynchronized signals are switched through a sequential switcher, an unwanted effect appears on the monitor screen: picture-roll. Picture-roll appears owing to the discrepancies between the vertical synchronization pulses of the various cameras, which result in an eye-disturbing roll when the switcher switches from one camera to another. The picture-roll is even more obvious when recording the switched output to a VCR, because the VCR’s head needs to mechanically synchronize to the different cameras’ vertical sync pulses, while the monitor does it electronically. The only way to successfully combat the rolling effect is to synchronize the sources (i.e., the cameras).
The most proper way of synchronizing cameras is by use of an external sync generator (sync-gen, for
short). In such a case, cameras with an external sync input have to be used (please note: not every camera
can accept external sync). Various cameras have various sync inputs, but the most common are:
• Horizontal sync pulses (usually known as horizontal drive pulses or HD)
• Vertical sync pulses (usually referred to as vertical drive pulses or VD)
• Composite sync pulses (which include both HD and VD in one signal, usually referred to as
composite video sync or CVS)
In order to perform the synchronization, an extra coaxial cable has to be used between the camera and
the sync-gen (besides the one for video transmission) and the sync-gen has to have as many outputs
as there are cameras in use.
This is clearly a very
expensive exercise, although,
theoretically, it is the most
proper way to synchronize.
Some camera manufacturers
produce models where sync
pulses are sent from the
switcher to the camera via
the same coax that sends the
video signal back. The only
problem here is the need to
have all the equipment of the
same make.
There are cheaper ways to resolve the picture-roll problems, and one of the most accepted is through line-locked cameras. Line-locked cameras are either 24 V AC or 240 V AC (110 V AC in the United States and Canada, 100 V AC in Japan) powered cameras. The 50 Hz (60 Hz in the United States, Canada, and western Japan) mains frequency is the same as the vertical sync rate, so these cameras (line-locked) are made to pick up the zero crossings of the mains sine wave, and the vertical syncs are phased with the mains frequency. If all of the cameras in a system are powered from the same source (the same phase is required), then all of the cameras will be locked to the mains and thus synchronized to each other.
A line-locked camera (24 V AC) which also has an external vertical sync input terminal
The above method is the cheapest one, although it can suffer from mains phase instability caused by heavy industrial loads that are turned on and off at unpredictable intervals. Still, it is the easiest way. There is even a solution for cameras powered from different phases, in the form of the so-called V-phase adjustment. This is a potentiometer on the camera body that allows the camera electronics to cope with up to a 120° phase difference. It should be noted that low-voltage AC-powered cameras (i.e., 24 V AC) are more popular and more practical than high-voltage ones, primarily because they are safer.
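Line-locking and the V-phase adjustment can be sketched together: detect the rising zero crossings of the sampled mains, start a field at each one, and shift all field starts by the time equivalent of the V-phase setting. The sampling rate, the slightly offset simulated sine wave, and the linear interpolation between samples are assumptions made for the illustration, not how any particular camera does it.

```python
import math


def vphase_delay_ms(phase_deg: float, mains_hz: float) -> float:
    """Time shift corresponding to a mains phase offset."""
    return phase_deg / 360.0 * 1000.0 / mains_hz


def rising_zero_crossings(samples: list[float], rate_hz: float) -> list[float]:
    """Times (s) where the waveform crosses zero going positive,
    refined by linear interpolation between neighboring samples."""
    times = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            frac = -a / (b - a)
            times.append((i - 1 + frac) / rate_hz)
    return times


def vertical_sync_times(samples: list[float], rate_hz: float,
                        mains_hz: float, phase_deg: float = 0.0) -> list[float]:
    """Field starts locked to the mains, shifted by the V-phase adjustment."""
    shift_s = vphase_delay_ms(phase_deg, mains_hz) / 1000.0
    return [t + shift_s for t in rising_zero_crossings(samples, rate_hz)]


# Simulated 50 Hz mains sampled at 10 kHz for 0.1 s (small phase offset so the
# recording does not start exactly on a zero crossing).
RATE, MAINS = 10_000, 50
mains = [math.sin(2 * math.pi * MAINS * n / RATE + 0.1) for n in range(1000)]
syncs = vertical_sync_times(mains, RATE, MAINS, phase_deg=120)

print(f"120 deg = {vphase_delay_ms(120, 50):.2f} ms at 50 Hz, "
      f"{vphase_delay_ms(120, 60):.2f} ms at 60 Hz")
print("field starts (ms):", [round(t * 1000, 2) for t in syncs])
```

The field starts come out exactly one mains period (20 ms) apart, and a 120° V-phase setting corresponds to a shift of about 6.7 ms at 50 Hz or 5.6 ms at 60 Hz, which is the offset between two phases of a three-phase supply.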
Some cameras are designed to accept the video signal of the previous camera and lock to it. This is called master-slave camera synchronization. By daisy-chaining all of the cameras in such a system, synchronization can be achieved, where one is the master camera and the others are slave cameras. A coaxial cable is required between all of the cameras for this purpose, in addition to the coax for video transmission.
Still, not every sequential switcher can use the benefits of synchronized cameras. The switcher also
needs to be a vertical interval switcher. Only vertical interval switchers can switch synchronized signals
at the moment of the vertical sync pulse so that the switching is smooth and without roll. Nonvertical
interval switchers switch on a random basis rather than at a specific moment relative to the video signals.
With the vertical interval switcher, when a dwell time is adjusted to a particular value, the switcher
switches with this specific dwell time, but only when the vertical sync period occurs. By doing so,
the switching is nice and clean and happens in the vertical blanking period; that is, there is no
picture break on the monitor screen.
Normal switchers, without this design, will switch anywhere in the picture, which means the switch could fall in the middle of a picture field. So, even if the cameras are synchronized and there is no picture roll, picture breaking will still be visible to the operator owing to the abrupt transition from one signal to another in the middle of the visible picture field.
The same concept of vertical interval switching applies to the sequential switcher’s big brother, the video matrix switcher.
[Figure: Vertical sync detail]
7. Video processing equipment
Video matrix switchers (VMSs)
The video matrix switcher, as we have noted, is the big brother of the sequential switcher. The bigger
CCTV systems can only be designed with a video matrix switcher (VMS) as the brain of the system.
The name “matrix switcher” comes from the fact that the video inputs plotted against the video outputs form a matrix, in the mathematical sense. Video matrix switchers are quite often called video cross-point switchers. The cross-points are electronic switches that can route any video input to any video output at any time, while preserving the video impedance matching. Thus, one video signal can be selected on more than one output simultaneously. Several video inputs can also be assigned to one output, but in this case the output switches sequentially between them, since it is not possible to have more than one video signal on one output at any single point in time.
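The cross-point behavior just described can be captured in a short model. The following Python sketch is hypothetical (the class and method names are illustrative, not any manufacturer's control protocol): each output carries at most one input at a time, one input may be routed to several outputs, and assigning several inputs to one output can only produce sequential switching.

```python
from __future__ import annotations


class CrosspointMatrix:
    """Hypothetical software model of a video cross-point (matrix) switcher."""

    def __init__(self, num_inputs: int, num_outputs: int) -> None:
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        # output number -> input number currently switched onto it
        self._routes: dict[int, int] = {}

    def select(self, video_input: int, video_output: int) -> None:
        """Close the cross-point between an input and an output (1-based)."""
        if not 1 <= video_input <= self.num_inputs:
            raise ValueError(f"no such input: {video_input}")
        if not 1 <= video_output <= self.num_outputs:
            raise ValueError(f"no such output: {video_output}")
        # An output can carry only one signal, so this replaces any previous input.
        self._routes[video_output] = video_input

    def source_of(self, video_output: int) -> int | None:
        """Which input is currently on this output (None if nothing selected)."""
        return self._routes.get(video_output)

    def outputs_showing(self, video_input: int) -> list[int]:
        """All outputs on which this input is currently selected."""
        return sorted(o for o, i in self._routes.items() if i == video_input)

    def sequence(self, video_output: int, video_inputs: list[int], steps: int) -> list[int]:
        """Several inputs on one output can only be shown one after another."""
        shown = []
        for step in range(steps):
            self.select(video_inputs[step % len(video_inputs)], video_output)
            shown.append(self.source_of(video_output))
        return shown
```

For example, after `m = CrosspointMatrix(16, 4)`, `m.select(5, 1)` and `m.select(5, 2)`, camera 5 is on monitors 1 and 2 at once; selecting camera 7 on output 1 then leaves camera 5 on output 2 only. A real VMS makes the same decisions in hardware, with operator priorities and keyboard assignments layered on top, which the model ignores.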
VMSs are, in essence, big sequential switchers with a number of advancements:
• A VMS can have more than one operator. Remember that sequential switchers usually have buttons at the front of the unit, so only one operator can effectively control the system at any one time. Matrix switchers can have up to a dozen operators, sometimes even more, all of whom can control the system concurrently. In such a case, every operator (usually) controls one video output channel. A degree of intelligent control can be achieved, depending on the VMS in use. Different operators may have equal or different priorities, depending on their position in the security structure of the system.
• The VMS accepts many more video inputs and accommodates more outputs, as already mentioned, and, more importantly, these numbers can easily be expanded at a later date by just adding modules.
• The VMS has pan, tilt, and lens digital controllers (usually referred to as PTZ controllers). The
keyboard usually has an integral joystick, or buttons, as control inputs and at the camera end,
there is a so-called PTZ site driver (sometimes called PTZ decoder) within a box that is actually
part of the VMS. The PTZ site driver talks and listens to the matrix in digital language and drives
the pan/tilt head together with the zoom lens and perhaps some other auxiliary device (such as
wash/wipe assembly).
• The VMS generates camera identification, time and date, the operator(s) using the system, alarm messages, and similar on-screen information, superimposed on the video signal.
• The VMS has plenty of alarm inputs and outputs and can be expanded to virtually any number
required. Usually, any combination of alarms, such as N/O, N/C, and logical combinations of
them (OR, NOR, AND, NAND), is possible.
• In order for matrix switchers to perform the very complex task of managing the video and alarm signals, a microprocessor is used as the brain. With the ever-increasing demand for power and processing capacity, microprocessors are becoming cheaper and yet more powerful. These days, full-blown PCs perform these complex processes. As a consequence, setting up a VMS becomes a programming task in itself: complex, but with immense power and flexibility, offering password protection for high security, data logging, system testing, and reconfiguring via modem or network. The latest trend is the graphical user interface (GUI), using popular operating systems, with touch-sensitive screens, a graphical site layout representation that can be changed as the site changes, and much more.
• The VMSs might be very complex for the system designer or commissioner, but they are very
simple and user friendly for the operator and, more importantly, faster in emergency response.
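The alarm-input logic mentioned in the list above can be sketched as follows. This is an illustrative model only; the names `Contact`, `in_alarm`, and `combine` are hypothetical. An N/O contact signals an alarm by closing, an N/C contact by opening, and the resulting alarm states can then be combined with the logical operators OR, NOR, AND, and NAND.

```python
from __future__ import annotations

from enum import Enum


class Contact(Enum):
    NO = "normally open"    # alarm when the contact closes
    NC = "normally closed"  # alarm when the contact opens


def in_alarm(contact: Contact, closed: bool) -> bool:
    """Translate a raw contact state into an alarm state."""
    return closed if contact is Contact.NO else not closed


def combine(operator: str, states: list[bool]) -> bool:
    """Combine several alarm states with one of the logical operators."""
    if operator == "OR":
        return any(states)
    if operator == "AND":
        return all(states)
    if operator == "NOR":
        return not any(states)
    if operator == "NAND":
        return not all(states)
    raise ValueError(f"unsupported operator: {operator}")
```

For example, a hypothetical rule that should fire only when an N/C door contact has opened and an N/O detector relay has closed would be `combine("AND", [in_alarm(Contact.NC, False), in_alarm(Contact.NO, True)])`, which evaluates to `True`.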
There are only a handful of VMS manufacturers in the world, the majority of which come from the United States, England, Denmark, Germany, Japan, and Australia. Many of them have stayed with the traditional concept of cross-point switching and a little bit of programmability, usually stored in battery-backed memory. Earlier designs relying on battery-backed memory could retain data for only a few weeks without recharging. Many others, however, have adopted clever and flexible programming, with the system configuration stored on floppy disks or hard drives, preventing loss of data even if the system is without power for more than a couple of months.
[Figure: The Maxpro video matrix at Sydney's Star City Casino handles over 1000 cameras and over 800 VCRs.]
The demand for compatibility has forced many systems to become PC based, making the operation familiar to the majority of users and, at the same time, offering compatibility with many other programs and operating systems that may work alongside them.
[Figure: The large Plettac CCTV matrix at Frankfurt Airport]
The new designs of matrix switchers take almost every practical detail into account. First of all,
configuring a new system, or even reconfiguring an old one, is as easy as entering details through a
setup menu. This is, however, protected with high levels of security, which allows only authorized
people who know the appropriate access code and procedures to play around with the setup.
Next, the VMS has become so intelligent and powerful that controlling other complex devices has become possible. These include lights in buildings, air conditioning, door access control, boom gates in car parks, power, and other routine operations performed at a certain time of the day or in response to certain detectable events.
Unfortunately, there is no standard design or language for configuring and programming matrix switchers. Different manufacturers use different concepts and ideas, so it is very important to choose a proven expert for a particular system.
Matrix switchers usually come in basic configurations of 16 or 32 video inputs and 2 or 4 video outputs. Other combinations are possible, but these are the most common. Many of them come with a certain number of alarm inputs and outputs. Almost all of them, in their basic configuration, have a text insertion feature and a keyboard for control. A basic operator’s manual and other technical information should be supplied with the switcher.
[Figure: An intelligent, ergonomic, and reconfigurable matrix keyboard]
Most suppliers need to be told separately to include PTZ control modules, because in many systems only fixed cameras are used and PTZ control is not considered a must. Some makes, however, include PTZ control as standard. Even then, PTZ site drivers are not necessarily part of the VMS package. Since the number of PTZ cameras varies from system to system, it is expected to be specified when ordering. How many you can actually use in a system depends on the make and model. In most cases, a VMS uses digital control, which can address only a limited number of sites. This number also depends on the control distances, and it can be anything between 1 and 32 PTZ sites. For a higher number of sites, additional PTZ control modules need to be used.
[Figure: Some larger matrices by Pacific Communications come neatly ...]
I will repeat that, until now, there has been no compatibility between products of different manufacturers, so you cannot use, for example, a matrix switcher of one brand and PTZ site drivers of another. In most cases, when a CCTV system with a matrix switcher needs to be upgraded, you need to replace the whole system, with the exception of the cameras, lenses, monitors, and cables. It is fair to say, however, that in the period between the last edition of the book and the present one, an increasing number of matrix manufacturers have produced multifunctional driver boards, so that you can control at least a couple of different brands. Furthermore, protocol converter boxes are now available that allow users who know the protocols of the PTZ camera and the matrix switcher to have them talk to each other.
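A protocol converter essentially decodes each command from one protocol into a neutral form and re-encodes it in another. The sketch below shows only the encoding half, using Pelco-D as the target because it is a widely used PTZ protocol; the 7-byte frame layout (sync byte 0xFF, address, two command bytes, pan and tilt speeds, and a modulo-256 checksum of the five middle bytes) follows its commonly published description, and the `PtzCommand` type is a hypothetical neutral form. The decoding half depends entirely on the matrix's own protocol, which varies by make, so it is not shown. Any real converter should be checked against the camera manufacturer's documentation.

```python
from dataclasses import dataclass


@dataclass
class PtzCommand:
    """Hypothetical protocol-neutral PTZ command."""
    address: int         # site driver (receiver) address, 1-255
    pan: int = 0         # -1 = left, 0 = stop, +1 = right
    tilt: int = 0        # -1 = down, 0 = stop, +1 = up
    zoom: int = 0        # -1 = wide, 0 = stop, +1 = tele
    pan_speed: int = 0   # 0x00-0x3F
    tilt_speed: int = 0  # 0x00-0x3F


def to_pelco_d(cmd: PtzCommand) -> bytes:
    """Encode a neutral command as a 7-byte Pelco-D frame."""
    if not 1 <= cmd.address <= 255:
        raise ValueError("address must be 1-255")
    cmd1 = 0x00  # focus near / iris / camera on-off bits, unused in this sketch
    cmd2 = 0x00
    if cmd.pan > 0:
        cmd2 |= 0x02  # pan right
    elif cmd.pan < 0:
        cmd2 |= 0x04  # pan left
    if cmd.tilt > 0:
        cmd2 |= 0x08  # tilt up
    elif cmd.tilt < 0:
        cmd2 |= 0x10  # tilt down
    if cmd.zoom > 0:
        cmd2 |= 0x20  # zoom tele
    elif cmd.zoom < 0:
        cmd2 |= 0x40  # zoom wide
    pan_speed = max(0, min(cmd.pan_speed, 0x3F))
    tilt_speed = max(0, min(cmd.tilt_speed, 0x3F))
    body = [cmd.address, cmd1, cmd2, pan_speed, tilt_speed]
    checksum = sum(body) % 256  # checksum over bytes 2 to 6
    return bytes([0xFF, *body, checksum])
```

For instance, a "pan right at medium speed" command for site 1 encodes as `FF 01 00 02 20 00 23`, and an all-stop command for the same site as `FF 01 00 00 00 00 01`.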
Small systems with up to 32 cameras can easily be configured, but when more inputs and outputs are required from a matrix switcher, it is better to talk to the manufacturer’s representative and work out exactly which modules are required. This selection can make a big difference between an affordable and an expensive system, as well as between a functional and a nonfunctional one.
Because of their capability and potential to do many things other than just video switching, video matrix switchers are often referred to as CCTV management systems. This does not mean, however, that a VMS can also perform quad processing or multiplexing of signals. Quad compressors and multiplexers would still be required in addition, if such functions are to be performed.
Switching and processing equipment
Quad compressors
Because sequential switchers cannot show all of the cameras simultaneously, and because of other synchronization worries, CCTV designers had to come up with a new device called a quad compressor, sometimes known as a quad splitter.
Quad compressors, as the name suggests, put up to four cameras on a single screen by dividing the screen into four quadrants (hence the name “quad”). To do that, the video signals are first digitized and then compressed into their corresponding quadrants. The quad’s electronics perform time base correction, which means all of the signals are synchronized, so when the resultant video signal is produced, all four quadrants reside on one signal and there is no need for external synchronization.
Quad compressors are digital image processing devices with analog input and output.
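The splitting step can be illustrated with a toy model. The sketch below is a simplification that assumes four already digitized and already synchronized frames of equal, even size, stored as lists of rows of pixel values (the names are illustrative). It keeps every second pixel of every second line of each camera and places the results in the four quadrants; a real device may filter or average rather than simply drop pixels.

```python
from typing import List

Frame = List[List[int]]  # rows of pixel values (illustrative representation)


def make_quad(frames: List[Frame]) -> Frame:
    """Compose four frames (top-left, top-right, bottom-left, bottom-right) into one quad image."""
    if len(frames) != 4:
        raise ValueError("a quad needs exactly four frames")
    height, width = len(frames[0]), len(frames[0][0])
    if height % 2 or width % 2:
        raise ValueError("frame dimensions must be even")
    if any(len(f) != height or any(len(row) != width for row in f) for f in frames):
        raise ValueError("all frames must be the same size")
    # Halve each frame in both directions by keeping every second line and pixel.
    halves = [[row[::2] for row in f[::2]] for f in frames]
    top = [left + right for left, right in zip(halves[0], halves[1])]
    bottom = [left + right for left, right in zip(halves[2], halves[3])]
    return top + bottom  # same size as one input frame
```

The output has the same number of pixels as a single camera frame, which is exactly why each camera ends up with only a quarter of the framestore.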
As with any digital image processing device, we should know a few things that define the system
quality: framestore resolution expressed by the number of pixels (horizontal × vertical) and the image
processing speed.
The typical framestore capacities found in today’s quads are 512 × 512 or 1024 × 1024 pixels. The first one is fine compared to the camera resolution, but do not forget that these 512 × 512 pixels are split into four images; hence every quadrant will have 256 × 256 pixel resolution, which might only be acceptable for an average system. So, if you have a choice of quads, you should opt for the higher framestore resolution.
Apart from this, every pixel stores the gray-level information (monochrome quads) or the color information (color quads). A typical good-quality B/W quad will have 256 levels of gray, although 64 levels are sufficient for some applications. However, 16 levels of gray are too few, and the image appears too digitized. Color quads of the highest quality will have over 16 million colors, which corresponds to 256 levels of each of the three primary colors (i.e., 256³).
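The framestore and gray-level figures above reduce to simple arithmetic, sketched here with illustrative function names.

```python
from __future__ import annotations


def quadrant_resolution(width: int, height: int) -> tuple[int, int]:
    """Each of the four quadrants gets half the framestore in each direction."""
    return width // 2, height // 2


def bits_for_levels(levels: int) -> int:
    """Bits needed to store a given number of gray (or per-primary) levels."""
    return (levels - 1).bit_length()


def color_count(levels_per_primary: int) -> int:
    """Colors available with independent levels for each of the three primaries."""
    return levels_per_primary ** 3
```

A 1024 × 1024 framestore therefore still leaves 512 × 512 per quadrant, and the 16-level quad that looks "too digitized" stores only 4 bits per pixel against 8 for a 256-level one; 256 levels per primary gives 16,777,216 colors.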
The next important thing about quads is the image processing time. In the early days of quads, the digital electronics were not fast. Quite often you would notice jerky movements, as the quad could only process a few images every second. Slow-processing quads are still available today. To see smooth movement, we need electronics that will process every image at the field rate of the TV system (every 1/50 s or 1/60 s). Then we will not have motion delays in the picture, and the digitized effect will be less noticeable. We call these fast-processing quads real-time quads. Real-time and high-resolution quads are more expensive. Color quads are more expensive than monochrome ones because three framestores are needed for each channel (one for each primary color). If more than four cameras are in the system, the solution can be found in dual quads, where up to 8 cameras can be switched onto two quad images that alternate one after the other. On most of these quads, the dwell time between the two switching quads is adjustable.
Another very handy feature of most quads is the alarm input terminal. Upon receiving an alarm, the corresponding camera is switched from quad mode to full screen. Usually, this is a live mode; that is, the analog signal is shown without being processed through the framestore. This full-screen alarm activation is especially important when recording. No matter how good the quad video output may look on the monitor, when recorded onto a VHS VCR the resolution is reduced to the limits of the VCR. These limits (discussed later in the VCR section) are 240 TV lines for a color signal and about 300 for B/W. When a quad picture is replayed from the VCR, it is very hard to make out the details that were originally seen in live mode. For this reason, a system can be designed so that an alarm switches the quad to full-screen mode. The details of the recorded activity can then be examined much better. Various devices can be used for activation, but most often they are passive infrared detectors (PIR), infrared beams, video motion sensors, duress buttons, and reed switches.
As with alarming sequential switchers, it should be determined what happens after alarm activation – that
is, how long the quad stays with the full image and whether or not it requires manual acknowledgment.
These are minor details, but they make a big difference in the system design and efficiency.
Sometimes a customer might be happy with quad images recorded as quad, in which case a plain quad,
without alarm inputs, might suffice.
However, when full-screen recordings are required, care should be taken with quad compressors that offer zoom playback. They may appear the same as quads with alarm inputs, but in fact they do not record full-screen images as might be expected. Rather, they electronically blow up the recorded quadrants to full screen. The resolution of such zoomed images is only a quarter (one-half vertical and one-half horizontal) of that of a genuine full-screen recording.
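To see why zoomed playback cannot restore detail, consider this sketch. It is an illustrative model that assumes the quad image is stored as rows of pixel values and that the enlargement is done by simple pixel repetition (the actual interpolation used by any given quad is not specified here). It cuts one quadrant out of a recorded quad image and doubles it in both directions.

```python
from typing import List

Frame = List[List[int]]  # rows of pixel values (illustrative representation)


def extract_quadrant(quad: Frame, index: int) -> Frame:
    """Cut one quadrant (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right)."""
    h, w = len(quad) // 2, len(quad[0]) // 2
    row0 = (index // 2) * h
    col0 = (index % 2) * w
    return [row[col0:col0 + w] for row in quad[row0:row0 + h]]


def zoom_2x(image: Frame) -> Frame:
    """Blow an image up to double size by repeating every pixel and every line."""
    zoomed = []
    for row in image:
        wide = [p for p in row for _ in range(2)]  # repeat each pixel horizontally
        zoomed.append(wide)
        zoomed.append(list(wide))                  # repeat the line vertically
    return zoomed
```

The zoomed image fills the screen, but it holds no more distinct picture samples than the quadrant did: a quarter of what a full-screen image would have.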
Multiplexers (MUX)
The natural evolution of digital image processing equipment has made video multiplexers a better
alternative to quads, especially when recording. Multiplexers are devices that perform time division
multiplexing with video signals on their inputs, and they produce two kinds of video outputs: one for
viewing and one for recording.
The output for live viewing shows all of the cameras on a single screen simultaneously. This means that if we have a 9-way multiplexer with 9 cameras, all of them will be shown in a 3 × 3 mosaic. The same concept also applies to 4-way and 16-way multiplexers. With the majority of multiplexers, single cameras can also be selected in full screen. While the video output shows these images, the multiplexer’s VCR output sends the time division multiplexed images of all the cameras selected for recording. This time division multiplexing looks like very fast sequential switching, with the difference that the images are now synchronized to be recorded on a VCR in sequence. Some manufacturers produce multiplexers that only perform fast switching (for the purpose of recording) and full-screen display, without the mosaic display. These devices are called frame switchers, and as far as recording is concerned they work like multiplexers.
In order to understand this process, we should mention a few things about the VCR recording concept (also discussed later in this book). The video recording heads (usually two of them) are located on a 62 mm rotating drum, which performs helical scanning of the videotape that passes around it. The rotation speed depends on the TV system: for PAL it is 25 revolutions per second and for NTSC it is 30. By using two heads, positioned 180° opposite each other on the video drum, the helical scanning can read or write 50 fields each second for PAL and 60 for NTSC. This means that every TV field (composed of 312.5 lines for PAL and 262.5 for NTSC) is recorded as a slanted track on the videotape, with the tracks densely packed next to each other. When the VCR plays back the recorded information, it does so at the same speed the TV standard requires, so we once again reproduce motion pictures.
Clearly, however, because the VCR heads are electromechanical devices, the precision of the rotation speed is critical. Because of electromechanical inertia, VCRs have a longer vertical lock response time than monitors. This is the main reason for even bigger picture-roll problems when nonsynchronized cameras are recorded through a sequential switcher.
With normal recording and playback, the video heads are constantly recording or reading field after field. There are 50 (60 for NTSC) of them every second.
Instead of recording one camera for a few seconds, then another for a few seconds, and so on (which is what a sequential switcher produces), the multiplexer processes video signals in such a way that each successive TV field sent to the VCR comes from another camera (usually the next one in order of inputs).
So, in effect, we have a very fast switching signal coming out of the multiplexer that switches at the same speed at which the recording heads are recording. This speed depends on the type of VCR and the recording mode (as is the case with time-lapse VCRs, which will be discussed later in the book). This is why it is very important to set the multiplexer to an output rate suitable for the particular VCR. This selection is available on all multiplexers in their setup menu. If the particular VCR model is not listed on your multiplexer, you can either use the generic selection or, if nothing else, use trial and error to find an equivalent VCR. The major difference among time-lapse (TL) VCRs is that some are field recorders and others are frame recorders.
Apart from this output synchronization (MUX-VCR), theoretically there is also a need for input synchronization (cameras-MUX), but because multiplexers are digital image processing devices, this synchronization, that is, the time base correction (TBC) of the cameras, happens inside the multiplexer. This means that different cameras can be mixed on a multiplexer and there is no need for them to be genlocked (i.e., synchronized with each other).
Some multiplexer models on the market, however, are made to synchronize the cameras by sending sync
pulses via the same coaxial cable that brings the video signal back and then multiplex the synchronized
cameras. These multiplexers do not waste time on TBC and, therefore, are supposed to be faster.
When playback is needed, the VCR video output goes first to the multiplexer, and then the multiplexer extracts the selected camera only and sends it to the monitor. The multiplexer can display any one camera in full screen, or play back all of the recorded cameras in mosaic mode (multiple images on one screen).
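The recording and playback halves of time division multiplexing can be modelled together. In this sketch (illustrative names; it assumes each recorded field can be identified with the camera it came from, and it ignores time base correction), the encoder takes each successive field from the next camera in input order, and the decoder keeps only the fields of the selected camera. The timing helper shows that with N cameras each camera is refreshed only once every N fields.

```python
from __future__ import annotations


def multiplex(cameras: list[int], num_fields: int) -> list[tuple[int, int]]:
    """Encode: field n on tape comes from camera n mod N, in input order.

    Returns (field_number, camera) pairs; tagging each field with its
    camera is an assumption of this model.
    """
    return [(n, cameras[n % len(cameras)]) for n in range(num_fields)]


def demultiplex(tape: list[tuple[int, int]], camera: int) -> list[int]:
    """Decode on playback: keep only the field numbers of the selected camera."""
    return [n for n, cam in tape if cam == camera]


def field_times(field_numbers: list[int], field_interval_s: float) -> list[float]:
    """Convert field numbers into seconds from the start of the recording."""
    return [round(n * field_interval_s, 4) for n in field_numbers]
```

With four cameras in real-time PAL recording (one field every 0.02 s), camera 2 occupies fields 1, 5, 9, and so on, that is, one new image every 0.08 s.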
Recording time delays
The number of shots (images) taken from every camera during recording depends on the total number of cameras connected to the multiplexer and the time-lapse mode of the VCR. Therefore, it is not possible to record real-time images from all the cameras simultaneously because, as the name suggests, this is time division multiplexing.
There are, however, ways to improve performance by using external alarm triggers, usually with a
built-in activity detector (to be explained later) in the multiplexer. The best way, though, is to record
in as short a time-lapse mode as is practically possible and also to keep the number of cameras as low
as possible. Translated into plain language, if your customer can change tapes at least once a day, do
not use more than a 24-hr time-lapse recording mode. If the system is unattended over weekends,
then a 72-hr time-lapse mode should be selected. And, if the budget allows, instead of using a 16-way
multiplexer for more than 9 cameras, it would be better to use two 9-way (some manufacturers have
8-way and some 10-way) multiplexers and two VCRs. The recording frequency will then be doubled,
and two tapes will need to be used instead of one.
This is how you can calculate the time gaps between subsequent shots of each camera. Let us say we have a time-lapse VCR that records in 24-hr time-lapse mode. Earlier we stated that normal (real-time) recording VCRs make 50 shots every second in PAL and 60 in NTSC. If you open the TL VCR technical manual, you will find that when the VCR is in 24-hr mode it makes a shot every 0.16 s, and even if you do not have the manual, it is easy to calculate. When a PAL VCR records in real time, it makes a field recording every 1/50 = 0.02 s. If the TL VCR is in 24-hr time-lapse mode with a 3-hr (E180) tape, the recording frequency is 24 ÷ 3 = 8 times slower. Multiplying 0.02 by 8 gives 0.16 s. The same exercise for an NTSC VCR gives a field recording every 1/60 = 0.0167 s. For 24-hr time-lapse mode using a T120 tape, 24 ÷ 2 = 12. This means that in 24-hr time-lapse mode in the NTSC format, the TL VCR records 12 times slower to fit 24 hr onto one 2-hr tape. Thus, the update rate of each recorded field in 24-hr mode is 12 × 0.0167 = 0.2 s.
All of these calculations refer to a single camera signal; therefore, if the multiplexer has only one camera, it will make a shot every 0.16 s in PAL and every 0.2 s in NTSC. If more cameras are in the system, to calculate the refresh rate of each camera we need to multiply by the number of cameras, plus add a fraction of time that the multiplexer spends on time base correction of nonsynchronized cameras (which will usually be the case). So if we have, for example, 8 cameras to record, 8 × 0.16 = 1.28 s (PAL) and 8 × 0.2 = 1.6 s (NTSC). Adding the time spent on sync correction, the realistic time gap between subsequent shots of each camera comes to approximately 1.5 to 2 s. This is not a bad figure considering that all 8 cameras are recorded on a single tape.
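The refresh-rate arithmetic above can be written as a small calculator. This is a sketch under the text's assumptions: 50 (PAL) or 60 (NTSC) fields per second in real time, and a slowdown equal to the time-lapse mode divided by the tape's real-time length. The `tbc_overhead` parameter is a hypothetical fractional allowance for time base correction, since no exact figure is given.

```python
FIELD_RATE = {"PAL": 50, "NTSC": 60}  # real-time fields per second


def field_interval(system: str, tl_hours: float, tape_hours: float) -> float:
    """Seconds between recorded fields in a given time-lapse mode."""
    slowdown = tl_hours / tape_hours      # 24 / 3 = 8 for PAL, 24 / 2 = 12 for NTSC
    return slowdown / FIELD_RATE[system]  # 8 / 50 = 0.16 s, 12 / 60 = 0.2 s


def camera_refresh(system: str, tl_hours: float, tape_hours: float,
                   cameras: int, tbc_overhead: float = 0.0) -> float:
    """Gap between successive shots of one multiplexed camera.

    tbc_overhead is an assumed extra fraction for time base correction
    of nonsynchronized cameras (0.0 gives the ideal figure).
    """
    return cameras * field_interval(system, tl_hours, tape_hours) * (1.0 + tbc_overhead)
```

With `tbc_overhead` between 0.2 and 0.5 (an assumed range, chosen only to land near the 1.5 to 2 s estimate above), eight PAL cameras in 24-hr mode come out at roughly 1.5 to 1.9 s.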
If we have to identify an important event that happened at 3:00 P.M., for example, we can either view all of the cameras in mosaic mode and see which cameras show important activity, or we can select each one of them separately in full screen.
For some applications, 2 s might be too long a gap; this is where the alarm input or the motion activity detection can be very handy. Most multiplexers have alarm input terminals, and with these we can trigger the priority encoding mode. In priority encoding mode, the multiplexer encodes the alarmed camera on a priority basis. Say we have an alarm associated with camera 3. Instead of the normal time division multiplexing of the 8 cameras in the sequence 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, and so on, it goes 1, 3, 2, 3, 4, 3, 5, 3, 6, 3, and so on. The time gap in such a case is prolonged for all cameras other than 3. But since number 3 is the important camera at that point in time, priority encoding makes camera 3 appear with new shots every 2 × 0.16 = 0.32 s, or in practice almost 0.5 s (due to the time base correction). This is a much better response than the previously calculated 2 s for plain multiplexed encoding. It should be noted, however, that when more than one alarm is presented to the multiplexer inputs, the time gaps between subsequent camera shots are prolonged, and when all of the camera inputs are alarmed, we effectively get plain multiplexed encoding again.
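Priority encoding changes only the order in which fields are taken from the cameras. The sketch below uses one plausible scheduling rule, assumed for illustration rather than taken from any particular multiplexer: after each non-alarmed camera, it inserts the next alarmed camera. With a single alarm on camera 3 of 8 this reproduces the 1, 3, 2, 3, 4, 3 pattern above; more alarms lengthen every gap; and with every input alarmed it falls back to plain multiplexing.

```python
from __future__ import annotations

from itertools import cycle, islice


def priority_sequence(num_cameras: int, alarmed: list[int], length: int) -> list[int]:
    """Order in which camera fields are sent to the VCR (hypothetical rule)."""
    cameras = list(range(1, num_cameras + 1))
    alarm_set = set(alarmed)
    priority = [c for c in cameras if c in alarm_set]
    normal = [c for c in cameras if c not in alarm_set]
    if not priority or not normal:
        # No alarms, or every input alarmed: plain round-robin multiplexing.
        return list(islice(cycle(cameras), length))
    sequence: list[int] = []
    normal_iter, alarm_iter = cycle(normal), cycle(priority)
    while len(sequence) < length:
        sequence.append(next(normal_iter))  # next non-alarmed camera in order
        sequence.append(next(alarm_iter))   # then an alarmed camera
    return sequence[:length]


def gaps_in_fields(sequence: list[int], camera: int) -> list[int]:
    """Number of fields between successive shots of one camera."""
    positions = [i for i, c in enumerate(sequence) if c == camera]
    return [b - a for a, b in zip(positions, positions[1:])]
```

In 24-hr PAL mode each field slot is 0.16 s, so the 2-field gap for camera 3 corresponds to the 0.32 s quoted above (before time base correction is added), while each of the other seven cameras now waits 14 fields, or 2.24 s.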
If a system cannot be designed to use external alarm triggers, it is worth knowing that most multiplexers have activity motion detection built in. This is a very handy feature in which every channel of the multiplexer analyzes the changes in the video information in each of the framestore updates. When there is a change (i.e., something is moving in the field of view), the channel sets off an internal alarm, which in turn starts the priority encoding scheme. This feature can be of great assistance when replaying intrusions or events and determining the activity.
Usually, the activity motion detection can be turned on or off. When it is turned on, some MUX models allow you to configure the shape of the detection area to suit various areas or objects.
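Activity detection of this kind is, at its simplest, a comparison of successive framestore updates. The sketch below is a minimal frame-differencing model, not any manufacturer's algorithm: it counts how many pixels inside a configurable detection mask have changed by more than a threshold, and raises an internal alarm when enough of them have. Both thresholds are hypothetical values that would be tuned on site.

```python
def activity_alarm(previous, current, mask, pixel_threshold=25, area_fraction=0.02):
    """Frame-differencing activity detector for one multiplexer channel.

    previous, current: equal-sized 2D lists of luminance values (0-255 assumed).
    mask: same-sized 2D list of booleans; True marks the detection area.
    pixel_threshold: luminance change that counts as "changed" (hypothetical).
    area_fraction: share of masked pixels that must change (hypothetical).
    """
    active = changed = 0
    for prev_row, curr_row, mask_row in zip(previous, current, mask):
        for p, c, m in zip(prev_row, curr_row, mask_row):
            if m:
                active += 1
                if abs(c - p) > pixel_threshold:
                    changed += 1
    return active > 0 and changed / active >= area_fraction
```

Shaping the mask is how the detection area is fitted to the part of the scene that matters: changes outside it are simply never counted.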
Real-time time-lapse recorders have appeared on the CCTV market, and these might confuse the issue of calculating the refresh rate. They are faster recording machines in which the TL VCR’s mechanics are modified so as to record 16.7 fields per second in PAL (a field every 0.06 s) over 24 hr on an E240 tape. In the case of NTSC, around 20 fields per second (a field every 0.05 s) can be recorded for 24 hr on a T160 tape. Understandably, to calculate the refresh rate of multiplexed cameras on such a TL VCR, you would need to multiply the number of cameras by the above-mentioned field update interval.
To be fair, this is not actually real-time recording, but it is definitely better than the ordinary time-lapse mode. To the best of my knowledge, only one CCTV manufacturer – Elbex – makes real 24-hr recordings at 50 fields per second for PAL and 60 for NTSC.
The NTSC system uses tape at a higher recording speed (2 meters/minute) than PAL or SECAM (1.42
meters/minute). To confuse this issue even more, VHS tapes are marked in playing times as opposed
to tape length. Therefore, a T120 (2-hr) tape bought in the United States is not the same as an E120
(2-hr) tape bought in the UK. The U.S. T120 tape is 246 meters in length and will give 2 hours of play
time on an NTSC VCR. This same tape used on a PAL VCR will give 2 hours and 49 minutes of play
time. Conversely, a UK E120 tape is 173 meters in length and will give 2 hours of play time on a PAL
VCR. The same tape used on an NTSC VCR will give only 1 hour and 26 minutes of play time. The
following chart compares the recording times of each tape in SECAM, NTSC and PAL.
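Since play time is simply tape length divided by tape speed, the comparisons above can be checked with a few lines. The sketch uses the rounded speeds quoted in the text (2 m/min for NTSC and 1.42 m/min for PAL and SECAM) and truncates to whole minutes; the function names are illustrative.

```python
SPEED_M_PER_MIN = {"NTSC": 2.0, "PAL": 1.42, "SECAM": 1.42}  # figures quoted in the text


def play_minutes(tape_length_m: float, system: str) -> float:
    """Play time in minutes for a tape of the given length."""
    return tape_length_m / SPEED_M_PER_MIN[system]


def as_hours_minutes(minutes: float) -> str:
    whole = int(minutes)  # truncate to whole minutes
    return f"{whole // 60} h {whole % 60:02d} min"
```

The E120 (173 m) comes out at 2 h 01 min on PAL and 1 h 26 min on NTSC, matching the text. With these rounded speeds the T120 (246 m) gives about 2 h 03 min on NTSC and about 2 h 53 min on PAL, a few minutes more than the 2 h 49 min quoted; the exact play times depend on the precise tape lengths and transport speeds, so treat the sketch as an estimate.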
Simplex and duplex multiplexers
Most multiplexers will allow you to view images of any selected camera in a mosaic mode while they
are encoding. When a recorded tape needs to be viewed, as we have already mentioned, the VCR
output does not go directly to a monitor, but it has to go through the multiplexer again in order for the
images to be decoded. While doing this, the multiplexer cannot be used for recording. So, if recording
is very important and the playback needs to be used in the meantime, another multiplexer and VCR are
required. Multiplexers that can do only one thing at a time are called simplex multiplexers.
There are also duplex multiplexers, which are actually two multiplexers in the one unit, one for
recording and one for playback. Still, two VCRs will be required if both recording and playback are
required at the same time.
Some manufacturers even make multiplexers that they refer to as triplex. These have the same functionality as the duplex ones, with the addition of displaying a mixture of live and playback images on one monitor.
As with quad compressors, we can get B/W and color multiplexers. We also have a limited amount of
framestore resolution available. Needless to say, the bottleneck in the resolution reproduction will still
be the VCR itself. Many newer CCTV systems are being installed with Super VHS VCRs that offer an
improved resolution of 400 TV lines as opposed to 240 with the ordinary VHS format.
Multiplexers can successfully be used in applications other than just recording. This might be especially
useful if more than one video signal needs to be transmitted over a microwave link, for example. By
using two identical simplex multiplexers, one at each end of the link, we can transmit more than one
image in a time division multiplexed mode. In this instance, the speed of the refresh rate for each camera
is identical to what it would be if we were to record those cameras in real (3-hr) mode on a VCR.
Video motion detectors (VMDs)
A video motion detector (VMD) is a device that analyzes the video signal at its input and determines whether its contents have changed; if they have, it produces an alarm output.
With ever-evolving image processing technology, it has become possible to store and process images in a very short period of time. If this processing time is equal to or shorter than 1/50 s (PAL) or 1/60 s (NTSC), which, as we know, is the live video field period, we can process images without losing any fields and preserve the real-time motion appearance.
In the very beginning of VMD development, only analog processing was possible. Those simple VMDs are still available and perhaps still very efficient relative to their price, although they are incapable of sophisticated analysis and therefore produce high rates of false alarms. The principles of operation of analog VMDs (sometimes called video motion sensors) are very simple: a video signal taken from a camera is fed into the VMD and then onto a monitor, or whatever switching device might be used. In the analyzed video picture, little square marks (usually four) are positioned by means of a few potentiometers on the front of the VMD unit. The square marks indicate the sensor areas of the picture, and the video level in them is measured by the VMD’s electronics. As soon as this level changes to a lower or higher value, because someone or something has entered the field of view at the marked area, an alarm is produced. The sensitivity is determined by the amount of video luminance level change required to raise an alarm (usually 10% or more of the peak video signal). The alarm is usually audible, and the VMD produces a relay closure, which can be used to trigger other devices. The alarm acknowledgment can be either automatic (after a few seconds) or manual. There is also a VMD sensitivity potentiometer on the front panel of such devices, and with proper adjustment it can bring satisfactory results. There will always be false alarms activated by trees swaying in the wind, cats walking around, or light reflections, but at least the reason for the alarm can be seen when the VCR is played back (assuming the VMD is connected to a VCR).
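The analog VMD principle can be modelled as a comparison of average video levels in the marked areas. This is a simplified sketch with illustrative names: it assumes the average level of each sensor area is already available in volts, and it takes 0.7 V as the peak video level (the luminance range of a standard 1 V peak-to-peak composite signal). The 10% sensitivity is the typical figure mentioned above.

```python
PEAK_VIDEO_V = 0.7  # assumed peak luminance level above blanking, in volts


def vmd_zone_alarms(reference, current, sensitivity=0.10):
    """Return one alarm flag per sensor area (usually four).

    reference: average video level in each marked area for the quiet scene (V).
    current: present average level in the same areas (V).
    sensitivity: change required, as a fraction of peak video.
    """
    threshold = sensitivity * PEAK_VIDEO_V
    return [abs(now - ref) >= threshold for ref, now in zip(reference, current)]


def relay_closed(alarms):
    """The VMD relay closes if any sensor area is in alarm."""
    return any(alarms)
```

Because the level can move either way, both a darker object entering a bright area and a brighter one entering a dark area trip the same check.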
VMDs are often a better solution than passive infrared motion detectors (PIR), not only because the cause of the alarm can be seen, but also because they analyze exactly what the camera sees, no less and no more. When using a PIR, its angle of coverage has to match the camera’s angle of view if an efficient system is to be achieved. When a number of cameras are used, we cannot switch signals through the VMD because this would cause constant alarms; therefore, one VMD is required per camera. In systems where further processing of the video signal is done, the sensing markers can be made invisible, but they remain active.
The next step up in VMD technology is the digital video motion detector (DVMD), which is becoming ever more sophisticated and popular. This, of course, comes at a higher price, but the reliability is also much higher and the false alarm rate is lower.
One of the major differences between various DVMD manufacturers is the software algorithm and
how motion is processed. These concepts have evolved to the stage where tree movement due to wind
can be ignored, and car movement in the picture background can also be discriminated against and
excluded from the process deciding about the alarm activity. In the last few years, DVMDs that take
the perspective into account have been developed. This means that as the objects move away from
the camera, thus getting smaller in size, the VMD sensitivity increases in order to compensate for the
object’s size reduction owing to the perspective effect. This effect, we should point out, also depends
on the lens.
Many companies now produce a cheaper alternative to a full-blown stand-alone system in the form of PC card(s). The cards come with specialized software, and almost any PC can be used for VMD.
Furthermore, image snapshots can be stored on a hard disk and transmitted over telephone lines connected
to the PC. With many options available in the VMDs, a lot of time needs to be spent on a proper setup,
but the reward will be a much more reliable operation with fewer false alarms.
A special method of recording, called pre-alarm history, is becoming standard in most VMD devices. The idea behind it is very simple but extremely useful in CCTV. When an alarm triggers the VMD, the device keeps not only a number of images recorded after the alarm, but also a few recorded before it. The result is a progressive sequence of images showing not only the alarm itself, but also what preceded it.
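Pre-alarm history is essentially a circular (FIFO) buffer of the most recent images. Below is a minimal Python sketch. It assumes frames arrive one at a time together with an alarm flag from the motion detector. The class and parameter names are hypothetical.

```python
from collections import deque


class PreAlarmRecorder:
    """Keeps the last `pre_frames` images; on an alarm, stores them together with
    the alarm image and the following `post_frames` images as one event."""

    def __init__(self, pre_frames, post_frames):
        self.history = deque(maxlen=pre_frames)  # FIFO: the oldest image is dropped automatically
        self.post_frames = post_frames
        self.remaining = 0          # post-alarm images still to be collected
        self.current = None         # event being assembled, or None
        self.events = []            # completed events (lists of images)

    def _close_if_complete(self):
        if self.remaining == 0:
            self.events.append(self.current)
            self.current = None

    def add_frame(self, frame, alarm=False):
        if self.current is not None:
            # Collecting the images that follow the alarm
            self.current.append(frame)
            self.remaining -= 1
            self._close_if_complete()
        elif alarm:
            # New alarm: pre-alarm history first, then the alarm image itself
            self.current = list(self.history) + [frame]
            self.remaining = self.post_frames
            self._close_if_complete()
        self.history.append(frame)


recorder = PreAlarmRecorder(pre_frames=3, post_frames=2)
for n in range(10):
    recorder.add_frame(n, alarm=(n == 5))
print(recorder.events)  # [[2, 3, 4, 5, 6, 7]]
```

The stored event begins with the three images that preceded the alarm at image 5, which is exactly the "what preceded it" part that a plain post-alarm recording would miss.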
One of the latest developments in this area, among a few other successes, has brought to light an Australian company with its original concept of three-dimensional video motion detection. This concept offers extremely low false alarm rates by using two or more cameras to view objects from different angles. Thus, a three-dimensional volumetric protection area is defined, which, as in other VMDs, is invisible to the public but quite distinguishable to the image processing electronics. With this concept, movement in front of any one of the cameras will not trigger an alarm until the protected volume, as seen by both cameras, is disturbed. Using this concept, valuable artworks in galleries, for example, can be monitored so that the alarm does not activate every time someone passes in front of the artwork, but only when the artwork is removed from its position.
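Here is a much simplified sketch of the multi-camera idea. Each camera watches only its own 2D projection of the protected volume, and an alarm is raised only when all cameras report motion there at the same time. This illustration rests on several assumptions and is not the Australian company's method. Frames are plain lists of brightness values, and the zone coordinates, thresholds and function names are hypothetical. A real volumetric system would also need to confirm that the detections correspond to the same point in space, because an object outside the volume can occasionally appear inside both projections.

```python
def motion_mask(previous, current, threshold):
    """True where the brightness of a pixel changed by more than `threshold`."""
    return [[abs(c - p) > threshold for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]


def zone_disturbed(mask, zone, min_pixels):
    """`zone` is (row0, row1, col0, col1), end-exclusive: this camera's view of the volume."""
    row0, row1, col0, col1 = zone
    moving = sum(mask[r][c] for r in range(row0, row1) for c in range(col0, col1))
    return moving >= min_pixels


def volumetric_alarm(cameras, min_pixels):
    """`cameras` holds one (previous, current, zone, threshold) tuple per camera.

    The alarm is raised only if every camera sees motion inside its own zone,
    so movement in front of just one camera is ignored.
    """
    return all(zone_disturbed(motion_mask(prev, cur, thr), zone, min_pixels)
               for prev, cur, zone, thr in cameras)
```

The `all()` over cameras is the whole point of the design: a passer-by close to one camera fills that camera's zone but not the other's, so no alarm results.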
Quite often, alarm detection is useful not when someone
or something moves in the field of view, but rather when
something fixed is removed from its location. This can
be done with a video nonmotion detector (VNMD). This unit is very similar to the VMD, except that
additional information is collected for objects put in the field of view that are stationary for a longer
period. Movement around the selected object does not cause an alarm; an alarm is raised only when the protected object is removed from its stationary position.
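One simple way to get the VNMD behavior described above is to compare the protected object's region against a reference snapshot. The alarm is raised only if the difference persists for a number of consecutive frames, so a person briefly passing in front of the object does not trigger it. The sketch below assumes exactly that rule. The class name, the mean-difference measure and the persistence count are hypothetical and are not taken from any specific product.

```python
def region_difference(reference, frame, zone):
    """Mean absolute brightness difference inside zone = (row0, row1, col0, col1)."""
    row0, row1, col0, col1 = zone
    diffs = [abs(frame[r][c] - reference[r][c])
             for r in range(row0, row1) for c in range(col0, col1)]
    return sum(diffs) / len(diffs)


class NonMotionDetector:
    def __init__(self, reference, zone, threshold, persist_frames):
        self.reference = reference        # snapshot taken with the object in place
        self.zone = zone                  # area covering the protected object
        self.threshold = threshold        # mean difference that counts as "changed"
        self.persist_frames = persist_frames
        self.changed_count = 0

    def update(self, frame):
        """Return True (alarm) once the zone has differed from the reference
        for `persist_frames` consecutive frames."""
        if region_difference(self.reference, frame, self.zone) > self.threshold:
            self.changed_count += 1
        else:
            self.changed_count = 0        # object visible again, e.g. a passer-by moved on
        return self.changed_count >= self.persist_frames
```

Movement outside the zone never affects the result, and a brief occlusion resets the counter as soon as the object is visible again. Only a lasting change, such as the object being taken away, produces an alarm.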
In the last couple of years, some modern DSP cameras have offered VMD circuitry built inside the
camera itself. This could be quite practical in systems where recording and/or an alarm is initiated only
when a person or object moves inside the camera’s field of view.
As an alarm output, all of the above-mentioned VMDs produce a relay contact closure that can trigger additional devices in the CCTV chain, like VCRs, matrix switchers, framestores, sirens,
or similar. If you decide to use one, make sure you clarify the type of alarm output with the supplier,
for it can be anything from voltage-free N/O contacts to a logic level voltage (5 V) N/C output.
We should also mention VMDs that, apart from the motion detection, also dial remote receiving stations
and send images via telephone lines. With such devices, remote monitoring is possible from virtually
anywhere in the world. Images are sent to a receiver station only when the VMD detects movement, which saves on long-distance telephone costs.
Framestores are, conceptually, very simple electronic devices used to temporarily store images. Two important parts of a framestore are the analog-to-digital conversion (ADC) section and the random access memory (RAM) section. The ADC section converts the analog video signal into digital data, which is then stored in RAM for as long as the device is powered.
The main advantage of framestores compared to VCRs is their response time. Since they have no mechanically moving parts, the alarmed picture is stored instantly on activation. The stored image is then usually fed to a video printer or a monitor for viewing or verification.
More sophisticated framestores usually have a few framestore pages that constantly store and discard a series of images on the first-in, first-out (FIFO) principle until an alarm activates.
When that happens, it is possible to view not only the alarmed moment itself, but also a few frames
taken before the alarm event took place, thereby giving a short event history. This is the same concept
as the “pre-alarm history” used in VMDs.
Another application of framestores is the frame locking device. This device constantly processes the video signals present at its input and performs time base correction to synchronize them with its internal master clock. Since this processing takes place in real time and the framestore has a high resolution, there is no perceptible degradation of the video signal. This is a very practical device for showing (switching) nonsynchronized cameras on a single monitor. In such cases, the frame lock device acts as a synchronizer (i.e., it eliminates picture roll while the switcher cycles through the cameras).
Framestores used in CCTV are mainly divided into B/W and color. The quality of a framestore is determined, first, by its resolution, that is, the number of pixels that can be stored, and second, by the gray-level bit resolution or, in the case of color, the number of bits used to store colors. A typical good-quality framestore has more than 400 × 400 pixels; a common resolution is 752 × 480 pixels with 256 levels of gray (2⁸). For a color framestore (with three color channels of 8 bits each) we would have over 16 million colors (256 × 256 × 256 = 16,777,216).
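These storage figures translate directly into memory requirements. Here is a sketch that assumes uncompressed storage, with each pixel packed into whole bytes. Real framestores may store color differently, for example with reduced chroma resolution.

```python
def frame_bytes(width, height, bits_per_channel=8, channels=1):
    """Uncompressed memory for one stored frame, in bytes."""
    return width * height * bits_per_channel * channels // 8


def color_count(bits_per_channel=8, channels=3):
    """Number of distinct values (gray levels or colors) that can be stored."""
    return (2 ** bits_per_channel) ** channels


print(frame_bytes(752, 480))               # 360960  (monochrome, 256 gray levels)
print(frame_bytes(752, 480, channels=3))   # 1082880 (color, 8 bits per channel)
print(color_count(8, 1), color_count(8, 3))  # 256 16777216
```

A framestore that keeps several pre-alarm pages needs this amount multiplied by the number of pages.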
Video printers
Video printers are commonly used in larger systems where a hard-copy printout of a live or recorded image
is necessary for evaluation or evidence. There are two types of video printers: monochrome and color.
The monochrome video printers usually use thermal paper as an output medium, but some more expensive ones can print on plain paper. Thermal-paper video printers, used for monochrome signals, are similar in operation to a facsimile machine, and they print images whose size and resolution depend on the printer's resolution. With thermal printers the output is not as durable and stable (owing to thermal paper aging), so the printouts need to be photocopied for long-term keeping.
Color video printers print on special paper, and the printing process is similar to dye-sublimation printing, using cyan, magenta, yellow, and black filters. The printing quality produced by such technology is excellent, but the number of copies that can be produced is limited; the cartridge needs to be replaced with every new set of paper.
More sophisticated video printers have a number of controls, including titling, sharpening, printing multiple copies, and storing images in their framestores until a printout is necessary. In many instances, CCTV users do not want to invest in a video printer, so they often use the specialized services of video bureaus: the videotape is taken to a bureau, where the relevant events are extracted and printed out.
8. Analog video recorders
Video recorders were very important in CCTV until a few years ago, but since the introduction of the digital video recorder (DVR), systems sold with VCRs can be counted on the fingers of one hand. It is fair, however,
for the sake of the good old times, to reproduce this whole chapter on VCRs in this edition of the book,
just in case you come across an existing system and you want to know a little bit more about it. Special
attention is given to time-lapse VCR technology, which was a predecessor to the DVR.
A little bit of history and the basic concept
The era of tape recording began in 1935 with the appearance of AEG’s first commercial sound tape
recorder, called the Magnetophone. The tape used was a cellulose acetate tape, coated with carbonyl
iron powder. The performance of these sound recording machines, already very good for its time, steadily improved during the 1930s and 1940s, to the point where, by the end of the 1940s, much of the radio broadcast material was played off tape and was indistinguishable from live programs.
The basics of magnetic tape recording are familiar to most of us from the old audio cassette recorders. An alternating current (AC) signal, passing through an audio head winding, produces an alternating magnetic flux in a magnetically permeable metal ring called a head. In order for the magnetic flux to come out of the ring (otherwise it would stay inside the core), a narrow slit, or gap, is made at one end of the core. The magnetic field exits the core at this gap and closes its path through the air back to the other side of the gap. But if we put a magnetic tape very close to the head, the flux will pass through the tape itself, thus closing the circle. The magnetic tape is a very thin tape coated with magnetic powder, whose microscopic particles act as little magnets. By applying an external magnetic field, these little particles can be polarized in various directions, depending on the current intensity and its direction.
If the magnetic tape is stationary, no information will be recorded, except for the last state of the magnetic field. In order for an audio recording to be performed, the tape needs to move at a constant speed. The required speed depends on the resolution, that is, the highest frequency to be recorded. The faster the tape moves and the smaller the gap in the ring, the higher the frequency that can be recorded.
An analogy would be a fountain pen with a sharp tip and another one with a calligraphic tip. With the sharper tip we can write more detail and smaller fonts in the same space than with the calligraphic tip.
This is a simplified description of how audio recording is done. In real life, the audio signal is not recorded directly as it is; rather, it is amplitude modulated with a sine wave. It has been found that the
linearity of the recorded signals is then better. The tape speed, in the case of an audio cassette, was
chosen to be 4.75 cm/s. So, a half-hour recording made on one side of a C-60 cassette will take about
86 m (4.75 × 60 × 30 = 8550 cm) of tape. With a good-quality tape an audio bandwidth of approximately
50 Hz to 15,000 Hz can be recorded with a clean head. With such audio characteristics, the recording
is not impressive when compared to today’s digital CD standards. Obviously, with bigger audio tape
recorders (reel-to-reel) and a quadrupled tape speed of 19 cm/s, the recorded and reproduced bandwidth
is much better.
A similar concept to audio tape recording was initially tried on video signals, back in the 1950s, when
strange machines were designed with tape speeds close to 1000 cm/s and extraordinarily big reels.
The theory behind the tape recording showed that in order to record a monochrome video signal with
a bandwidth of only 3 MHz (for a reasonable picture quality, as opposed to only 15 kHz in audio), a
tape speed of around 3 m/s (300 cm/s) is required. For such a speed, one can calculate that for only
a one-hour recording, 3 × 60 × 60 = 10,800 m of tape is required. The quality of such a longitudinal
recording was still very poor and the equipment extremely large and difficult to deal with.
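The tape-length figures quoted above follow from length = speed × time. The recorded wavelength, λ = v/f, is the length of tape occupied by one cycle of the signal. It shows why a faster tape or a smaller gap raises the highest recordable frequency: a head cannot resolve wavelengths comparable to its gap, and the first output null occurs when λ equals the gap length. A sketch of both calculations:

```python
def tape_length_m(speed_cm_per_s, minutes):
    """Length of tape, in metres, passing the head in `minutes` at a constant speed."""
    return speed_cm_per_s * minutes * 60 / 100


def recorded_wavelength_um(speed_cm_per_s, frequency_hz):
    """Length of one recorded cycle on the tape, in micrometres (lambda = v / f)."""
    return speed_cm_per_s / frequency_hz * 1e4


print(tape_length_m(4.75, 30))               # 85.5     (one side of a C-60 cassette)
print(tape_length_m(300, 60))                # 10800.0  (one hour of longitudinal video)
print(recorded_wavelength_um(4.75, 15_000))  # 15 kHz audio on a cassette
print(recorded_wavelength_um(300, 3e6))      # 3 MHz video at 3 m/s
```

Both wavelengths come out in the range of one to a few micrometres, roughly 3 µm for cassette audio and 1 µm for the 3 MHz video case. This illustrates why recording a signal 200 times wider than audio required a tape speed so much higher.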
Knowing the size of a C-60 tape (86 m), one can imagine the physical size of reels holding 10 km of tape. Since this was very impractical, a solution was sought in a different way of achieving the tape speed relative to the video head. In the 1950s, a couple of Ampex™ engineers came up with a transverse-scan system with 4 rotating video heads, achieving an incredible tape-to-head speed of 40 m/s. This system was capable of recording up to 15 MHz of signal bandwidth and was sufficient
in quality for broadcast television. For the commercial and CCTV markets this product was far too
expensive, so other alternatives and solutions had to be developed.
The early VCR concepts
By the end of the 1950s, the concept of helical scanning was proposed. This was a much simpler system than transverse scanning, although initially all of the manufacturers offered open-reel designs, incompatible with each other. The recorders did not use cassettes yet and were not intended for domestic use.
In the 1970s, Sony™ proposed its U-matic standard, which became well established in the broadcast industry, having very good performance for its time and introducing cassettes instead of open reels.
It was 1972 when Philips™ came out with its first machine aimed at the domestic market, the N1500, which was a real milestone in VCR development. Unfortunately, however, it did not sell very well. It offered one hour of recording and had a built-in tuner, timer, and RF modulator. This led to the development of the System 2000 design, but unfortunately this happened at the time when color television appeared and a lot of people were saving money to buy color TVs instead of VCRs.
In the mid-1970s, Matsushita™ and JVC™ came out with their rival proposal, the video home system (VHS), while Sony™ proposed the Beta. So there was actually a bitter competition between System 2000, Beta, and VHS. They were similar in concept but, unfortunately, totally incompatible.
In time, VHS became the most popular and widely accepted by the domestic market. Technically
speaking, VHS was initially the poorest in quality, but it was much simpler and cheaper to make.
Over the years, a lot of improvements have been made in VHS making it a much better quality product
than it was originally, so that today in CCTV, as is the case with the domestic market, VHS is used in
more than 90% of cases. Once VHS was widely accepted, Sony came out with its 8 mm format and
then its Hi 8 mm, offering much smaller tapes and better recording quality, but JVC™ released its
Super VHS, which matched the Hi 8 quality.
As we have already mentioned, a special type of VHS VCR was developed for CCTV, called a time-lapse VCR. That is why in this book we will only cover the VHS concept. We are perhaps being a little bit unfair to the other formats that may also be in use, like U-matic, Beta, or 8 mm, but time and space allow us to concentrate only on the equipment used in the majority of systems today.
The video home system (VHS) concept
In helical scanning, the heads are located on a tilted drum that rotates at a speed equal to the video frame frequency, 25 revolutions per second for PAL and 30 for NTSC. The required tape-relative-to-head speed is achieved mainly by the head drum rotation.
In the initial video home system (VHS) design, two video heads were used, mounted 180° opposite each other on a rotating cylinder called a video drum. So when a recording or playback happens, each head records or plays back one TV field. The videotape is wound around the drum for 180°; thus, one of the two video heads is always in contact with the tape. The actual speed of the tape relative to the stationary parts of the VCR's tape compartment is 2.339 cm/s (PAL), which is approximately half the speed of an audio cassette. For NTSC the speed is a bit higher, 3.33 cm/s.
The VHS tape format is 1/2'' (12.65 mm) wide, and as can be seen from the drawing below, the width of each of the slanted tracks is approximately 0.049 mm and their length is approximately 10 cm. In so little space, information for 312.5 lines for PAL (and 262.5 for NTSC) has to be recorded. With this in mind, it becomes understandable how important the quality of the tape is, both its magnetic coating and its mechanical continuity and durability.
Apart from the video signal, which is recorded on slanted tracks, audio is also recorded on the tape,
with a stationary audio head on the top part of the tape and control tracks on the bottom.
Certain limitations are imposed on the video signal when it gets inside the VCR electronics. For starters,
the design of the VHS recording, including the size of the video drum, the rotation speed, and the
videotape quality, determine how wide a bandwidth can be recorded on the videotape.
When the video signal gets to the video input stage of the VCR, it goes through a very sharp-edged low-pass filter with a cutoff frequency of 3 MHz. This filter passes only the luminance information,
while the chrominance is extracted from the high-pass filtered portion of the same signal. The reason
for such a cutoff of the luminance is simply that more cannot be recorded. Those are the limits of the
VHS concept.
From the simple relation we introduced earlier, 3 MHz corresponds to 240 TV lines of horizontal
resolution. This is the practical limitation for a color video signal when played back. This indicates
that the VCR is almost always the bottleneck in achieving a good-quality playback picture in today’s
CCTV systems.
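The figure of 240 TV lines for 3 MHz corresponds to roughly 80 TV lines per MHz of luminance bandwidth. The sketch below shows where that rule of thumb comes from. It assumes a PAL active line duration of about 52 µs and a 4:3 picture, with TV lines counted over a width equal to the picture height. Each full cycle of the highest frequency draws one dark and one light line.

```python
def tv_lines(bandwidth_hz, active_line_s=52e-6, aspect_ratio=4 / 3):
    """Horizontal resolution, in TV lines per picture height, for a luminance bandwidth."""
    cycles = bandwidth_hz * active_line_s     # full sine cycles across the active line
    lines_across_width = 2 * cycles           # each cycle gives one dark and one light line
    return lines_across_width / aspect_ratio  # count over a width equal to the picture height


for label, bw in (("VHS", 3e6), ("S-VHS", 5e6)):
    print(label, round(tv_lines(bw)), "TV lines; rule of thumb:", round(80 * bw / 1e6))
```

The exact result is about 78 TV lines per MHz (234 for VHS, 390 for S-VHS), which is usually rounded to 80 per MHz, giving the 240 and 400 lines quoted in the text.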
When recording only monochrome signals, the low-pass filtering can be bypassed since we do not have
a color carrier. In such cases, the actual resolution will be a bit higher and depending on the tape and
VCR quality, it can be close to 300 TV lines. Many VCRs have an automatic switch for this bypass,
but on most time-lapse VCRs there is a manual switch for it.
The actual video luminance signal is not recorded directly as it is, but it is modulated, as is the case with the audio recording. In VHS, the luminance is frequency modulated (FM), with the carrier deviating from 3.8 MHz (corresponding to the sync tip) up to 4.8 MHz (corresponding to peak white). The chrominance information, which is extracted from the VCR input, is recorded directly on a down-converted carrier of 627 kHz and occupies the 0–1 MHz spectrum range. This is possible because the luminance is frequency modulated above this area.
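The luminance FM described above maps each video level to a carrier frequency between the two end points given in the text. This sketch assumes a purely linear mapping. It ignores the pre-emphasis and clipping a real VCR applies, and it assumes a 1 Vpp signal with the PAL-style 300 mV sync / 700 mV picture split, with levels measured from the sync tip. The S-VHS call uses the 5.4–7 MHz range given later in this chapter.

```python
def fm_carrier_hz(level_mv, f_sync_hz=3.8e6, f_white_hz=4.8e6,
                  sync_tip_mv=0.0, peak_white_mv=1000.0):
    """Instantaneous FM carrier frequency for a video level, using a linear mapping
    between sync tip and peak white (no pre-emphasis modelled)."""
    fraction = (level_mv - sync_tip_mv) / (peak_white_mv - sync_tip_mv)
    return f_sync_hz + fraction * (f_white_hz - f_sync_hz)


# Levels of a 1 Vpp composite signal, measured from the sync tip (assumed)
for name, mv in (("sync tip", 0), ("blanking", 300), ("peak white", 1000)):
    vhs = fm_carrier_hz(mv) / 1e6
    svhs = fm_carrier_hz(mv, f_sync_hz=5.4e6, f_white_hz=7.0e6) / 1e6
    print(f"{name:>10}: VHS {vhs:.2f} MHz, S-VHS {svhs:.2f} MHz")
```

Because the whole deviation range sits above 1 MHz in both formats, the 627 kHz color-under signal can share the tape without overlapping the luminance carrier.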
With the further development of the VHS concept a lot of improvements were introduced. Models with
four heads were produced, long play mode was offered, and pause mode stability improved considerably.
Also, audio recording, which was initially very poor with low-speed longitudinal recording, was improved in the Hi-Fi models. Instead of the initial 40 Hz–12 kHz audio bandwidth, high-fidelity sound is recorded with audio heads located on the video drum itself, rotating at the same speed as the video heads. With such a high tape-relative-to-head speed, the audio bandwidth was widened to 20 Hz–20 kHz and the signal-to-noise ratio dramatically increased from 44 dB to over 90 dB. The Hi-Fi audio channels are not recorded on separate tracks alongside the video but rather in the deeper layer of the tape and with a different azimuth angle of the recorded FM signal. This type of recording is therefore called depth multiplex recording.
Even though better tapes and video heads were manufactured, the video bandwidth could not be
improved considerably, owing to the limitations of the concept itself. Having this in mind, the VHS
inventors introduced a new and improved format called Super VHS.
Super VHS, Y/C, and comb filtering
The next major advancement in the development of VHS VCRs came in 1987 with the introduction
of the Super VHS concept. The Super VHS format improved the luminance and chrominance quality of the recorded video signals, yet preserved backward compatibility with the VHS format. Thus, the same type of video heads rotate at the same speed and the same angle.
S-VHS recorders differ from VHS basically by their wider bandwidth. This is achieved by separating the color and luminance from the composite video signal with a special comb filter and then modulating the luminance signal onto a higher and wider FM band, whose frequency now deviates from 5.4 MHz to 7 MHz. Therefore, a video luminance bandwidth of over 5 MHz can be recorded, giving 400+ TV lines of resolution. Video heads of the same physical dimensions are used, but they have better characteristics. Also, although videotapes of the same size are used, the magnetic coating is of a much better quality.
S-VHS VCRs can record and play back both VHS and S-VHS. For an S-VHS recording to be activated, an S-VHS tape must be used (the S-VHS recorder recognizes an S-VHS tape by a little slot on the cassette box). A VHS VCR cannot play back S-VHS tapes.
When color and luminance signals are combined in a composite video signal, there are always visible cross-color and cross-luminance artifacts. In order to minimize such deterioration, S-VHS recorders
permit direct input and output of the uncombined luminance and chrominance components. This pair is
called Y/C (Y stands for luminance and C for chrominance) and is found at the back of S-VHS VCRs
in the form of miniature DIN (Deutsche Industrie Normen) connectors.
If you have a video source that produces Y/C signals (like some multiplexers, VCRs, or framestores),
they can be connected to the S-VHS VCR with a special Y/C cable that is composed of two miniature
coaxial cables.
Some users erroneously believe that high-quality video can only be recorded when a Y/C signal is brought to the S-VHS. This is not true, since the S-VHS was designed primarily for recording composite
video signals. For this purpose, a special adaptive comb filter was designed for S-VHS, where the
color information is separated from the composite video signal without losing significant luminance
resolution (as is the case with the low-pass filter in VHS).
An early solution to the Y/C separation problem was to put a low-pass filter on the composite signal
and filter out the color signal above about 2.5 MHz in NTSC (above 3 MHz in PAL) to recover the
Y signal. The reduced bandwidth of the Y signal dramatically limited the resolution in the picture. A
bandpass filter was used to recover the color signal, but it was still contaminated by high-frequency
luminance crosstalk and suffered serious cross-color effects.
It is known, however, that the basic composite video signal is periodic in nature as a result of the
horizontal and vertical scanning and blanking processes. When such a signal is represented in the
frequency domain (a Fourier analysis is applied), it will be represented by harmonics in precise locations, rather than by a uniform distribution throughout the whole spectrum of the video signal.
This is a very important and fundamental fact in television signal analysis.
By picking the horizontal and vertical scanning rates and the color subcarrier frequency in particular
harmonic relationships, the Y/C separation process can be simplified. The color subcarrier frequency in
NTSC (and similar logic can be applied to PAL), Fsc, is chosen to be 3.579545 MHz (usually referred
to as simply 3.58 MHz). This corresponds to the 455th harmonic of the horizontal scanning frequency,
Fh, divided by two (as per the NTSC definitions).
Fh = 15,734.26 Hz
Fsc = 455 × Fh/2 = 3.579545 MHz
Since there are 525 lines in a video frame and a frame consists of two interlaced fields, there are 262.5
lines in a field. Therefore, the vertical field rate is:
Fv = Fh/262.5 = 59.94 Hz
There are also two fields in a frame, so the frame rate is Fv/2 = 29.97 Hz.
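The relations above can be checked numerically. This sketch uses the standard NTSC definition Fh = 4.5 MHz / 286 (the 4.5 MHz sound intercarrier divided by 286), which gives the 15,734.26 Hz quoted above. Everything else follows from the equations in the text.

```python
FH = 4.5e6 / 286        # NTSC horizontal line rate, Hz (about 15,734.26 Hz)
FSC = 455 * FH / 2      # color subcarrier: 455th harmonic of Fh/2
FV = FH / 262.5         # field rate: 262.5 lines per field
FRAME = FV / 2          # two interlaced fields per frame

print(f"Fh    = {FH:,.3f} Hz")
print(f"Fsc   = {FSC / 1e6:.6f} MHz")
print(f"Fv    = {FV:.3f} Hz")
print(f"frame = {FRAME:.3f} Hz")
# Fsc equals 227.5 x Fh, i.e. it lies halfway between two harmonics of Fh
print(f"Fsc / Fh = {FSC / FH:.1f}")
```

The last line is the key point for what follows: a ratio of 227.5, rather than a whole number, is what places the chroma energy halfway between the luminance clusters.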
Since the video signal is periodic in nature, the spectral distribution of the video frequencies is grouped
together in clusters. The Fourier analysis of a static video signal shows that the energy spectrum is
concentrated in clusters separated by 15.734 kHz, which is the horizontal scan rate. Each cluster has
sidebands with 59.94 and 29.97 Hz spacing. Therefore, the luminance signal does not have a continuous
distribution of energy across its bandwidth. Instead, it exists as clusters of energy, each separated by
15.734 kHz. These clusters are not very wide, so most of the space between them is empty.
The chrominance signal is also periodic in nature, because it appears on each horizontal scan and is
interrupted by the blanking process. Therefore, the chrominance signal will also cluster at 15.734
kHz intervals across its bandwidth. By picking the color subcarrier at an odd harmonic (455) of Fh/2,
the chroma signal clusters are centered exactly between the luminance signal clusters. Therefore,
the Y and C signals can occupy the same frequency space by this process of frequency interleaving.
This is the idea behind comb filter design. A comb filter can be designed to have a frequency
response with nulls at periodic frequency intervals. At the center frequency between the nulls, the comb
filter passes the signal. If the comb filter is tuned to be periodic at the same 15.734 kHz intervals as the
Y/C frequency interleaving, it will pass the Y signal while rejecting the C signal or vice versa.
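To illustrate the principle, here is a sketch of the simplest case, a one-line (1H) delay comb filter for NTSC. It assumes the picture content is identical on adjacent lines. Where it is not, as with vertical detail, some Y/C crosstalk remains, and this is what the more elaborate filters described below try to reduce. PAL uses a different subcarrier relationship and is not modelled here.

```python
import math

FH = 4.5e6 / 286      # NTSC line rate, Hz
FSC = 455 * FH / 2    # NTSC color subcarrier, Hz


def comb_gain(f_hz, fh_hz=FH):
    """Magnitude response of y(t) = [x(t) + x(t - 1/Fh)] / 2, a one-line-delay comb.

    H(f) = (1 + exp(-j*2*pi*f/Fh)) / 2, so |H(f)| = |cos(pi * f / Fh)|:
    1 at every multiple of Fh (luminance clusters), 0 halfway between them.
    """
    return abs(math.cos(math.pi * f_hz / fh_hz))


def separate(line, previous_line):
    """Split one line into Y and C using the previous line.

    In NTSC the subcarrier completes 227.5 cycles per line, so its phase is
    inverted on the next line: adding lines cancels C, subtracting cancels Y.
    """
    luma = [(a + b) / 2 for a, b in zip(line, previous_line)]
    chroma = [(a - b) / 2 for a, b in zip(line, previous_line)]
    return luma, chroma


print(comb_gain(100 * FH))   # at a luminance harmonic: passes
print(comb_gain(FSC))        # at the color subcarrier: rejected
```

The sum path passes luminance and rejects chroma, and the difference path does the opposite, which is the "or vice versa" in the text.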
When using Y/C cables between S-VHS components there is minimal cross-luminance and cross-color
interference, but for CCTV this is quite impractical since it requires two coaxial cables. The miniature
Y/C cable that comes with some S-VHS VCRs is a twin-coaxial cable designed for short runs only, as it
has a much higher attenuation than the popular RG-59/U. The main intention of such Y/C connections
is for dubbing purposes.
The technology of comb filtering is improving daily. Today the most advanced comb filters are employed
not only in S-VHS VCRs but also in high-quality monitors and television sets.
First came the 2D comb filter, where not only the current line of the video signal but also the previous and the next one were used to compare the color content and decide on the optimum filtering (thus, 2D). Further improvement brought 3D comb filtering and digital comb filtering, where not only the information in one TV field, but also the previous and next fields are processed for the color content (thus, 3D). New developments are further improving resolution and color fidelity.
If you have, for example, an S-VHS recorder and a TV monitor, both of them might have comb filters, but not necessarily of equal type and quality. It may happen that a better picture will be reproduced if a composite video signal is brought from the recorder and the TV is allowed to extract the color information with its own comb filter (if it is of a better design), rather than using a Y/C cable connection between the S-VHS VCR and the TV monitor.
So, using S-VHS recorders in CCTV with high-resolution color cameras and a single coaxial cable for the composite color video signal is still far superior to using VHS VCRs. The quality of the recorded
signal is ensured by the high-quality adaptive comb filter built in the S-VHS VCR, and the played
back signal will be as good as the monitor can show. If a high-resolution color monitor is used, which
would also have its own comb filter, the quality will be much better than using TV monitors designed
for commercial use. If we assume that a camera has 470 TV lines of horizontal resolution, the S-VHS
VCR has about 400 and the monitor 600 TV lines, the VCR will still be the bottleneck for the played
back resolution and the played back signal should have around 400 TV lines (providing, of course,
S-VHS tape is used).
Another minor point, not covered in many technical texts on S-VHS VCRs, concerns the LP/SP (long play/standard play) modes. The S-VHS quality is achievable in LP mode as well as in SP mode. A very minor deterioration of the recorded higher frequencies takes place because of the closer video tracks and slower tape movement, but this is almost undetectable.
Consumer VCRs for CCTV purposes
A very trivial question that I have often been asked by nontechnical people is: “Can I connect a CCTV camera to my VCR at home and record and view it on my TV?” The answer is yes, although you should be aware of the reduction in recording quality when compared to dedicated CCTV equipment.
A typical domestic VCR, apart from the RF (antenna) input, also has the audio/video inputs. In most
cases, they are in the form of phono sockets (some call them RCA connectors), one for a baseband video signal (this is, as we said earlier, what the CCTV camera gives us) and the other for an audio signal.
So, a CCTV camera video signal should be connected directly to the video input of the VCR, with an
appropriate adaptor (BNC-RCA). Then, the video output terminal of the VCR (the same type of RCA
connector) has to be connected to the video input of the TV receiver. Both the VCR and the TV have to be switched to the A/V channel, and then the CCTV camera should appear on your TV screen.
If your TV receiver does not have an A/V input, however, then the RF output of the VCR should be taken to the RF input (or the antenna input) of the TV set. Understandably, the TV now has to be tuned to the VCR channel, which in most cases should be UHF (36–39), as this is a dedicated area for VCRs, but some older models may modulate their signal in the low VHF channels 0, 1, 2, or 3. Also in this case, the VCR has to be set to the A/V channel in order to pass the CCTV camera signal from its video input to the RF output. In both of the above cases, the VCR is in between the camera and the TV. When viewing a live signal or recording, the picture is displayed on the TV, and when playing back a recorded signal, the VCR cuts the incoming live signal and shows the recorded image on the same TV.
Walls of VCRs in Sydney's Star City Casino record all cameras in real time mode, fully managed by the matrix switcher.
When compared to the dedicated CCTV time-lapse VCRs (discussed under the next heading), the disadvantages of domestic VCR models are numerous: there is no time and date inserted in the recorded video signal, there are no external alarm trigger inputs, and the maximum recording time, achieved in long play mode, is not longer than 10 hours for PAL, or about 8 hours for NTSC.
There are, however, some clear advantages: the price of a normal VCR is very low and affordable, and the images are recorded in full motion, 50 fields per second for PAL and 60 fields per second for NTSC.
Because of the above-mentioned advantages of domestic VCRs, some matrix manufacturers have
designed special hardware and software interface devices for their matrix switchers, so as to be able to
intelligently control VCRs. This is usually done by intercepting the infrared control section of the VCR,
and full control over the recorders is taken from the matrix. In large systems it is almost as expensive, if not more so, to incorporate multiplexers and TL VCRs instead. For this reason, and because of the requirement for real-time recording at all times, this solution has been especially attractive for large casino installations. With a properly designed and programmed matrix system, it is possible to fully automate and control hundreds of VCRs; only the tape changes have to be done manually.
At this point we should also mention that owing to the different recording speeds in the two television standards discussed in this book (PAL and NTSC), we also have different videotape lengths and, consequently, slightly different recording/playback times. The accompanying table on the previous page
should give sufficient information for such discrepancies. Please note that international tape marking
for PAL system machines is with “E” and for NTSC machines with “T.”
Time-lapse VCRs (TL VCRs)
Time-lapse VCRs are a special category of video recorders, developed specifically for the security industry.
The main differences between TL VHS VCRs and domestic models are the following:
• TL VCRs can record up to 960 hrs on a single 180-minute tape (PAL) or 120-minute tape (NTSC). Other time-lapse modes between 3 and 960 hrs are available: 12, 24, 48, 72, 96, 120, 168, 240, 480, and 720 hrs. This is achieved by the time-lapse stepper motor, which moves the tape in discrete steps while the video drum rotates constantly. Usually, up to the 12-hr mode, the tape moves with continuous speed, after which, starting from 24 hrs, it moves in discrete steps. The time lapse between subsequent shots increases as the mode increases. Typical times are shown in the table on the next page.
The modes mentioned refer to a 180-minute or 120-minute tape, depending on the television
system in question. If a 240-minute tape is used instead, the corresponding TL mode increases
by 1/3; that is, 24 hrs becomes 32, 72 becomes 96, and so on. The same logic applies when a
300-minute tape is used, where TL modes are increased by 2/3; that is, 24 hrs becomes 40, 72
becomes 120, and so on. Please refer to the table on the next page for more details.
When a TL VCR is recording in TL mode, no real-time movement is recorded because fewer than
50 fields (60 for NTSC) are recorded each second. The playback looks like a video playback in
Pause mode, advancing at short but regular intervals, as per the table. TL VCRs can record in any
mode and play back in any mode, regardless of the mode the tape was recorded in. In pause mode, still
frames (fields) have exceptionally good quality. When the picture is unstable, a special still lock
adjustment potentiometer, not available on domestic VCRs, can stabilize it to a perfectly still frame.
This is of great importance for verification purposes.
• TL VCRs have no tuners; that is, normal RF reception is not possible.
• TL VCRs can be triggered by an external alarm, which will cause the unit to switch instantly
from TL mode into real time for a preset duration (15 s, 30 s, 1 min, 3 min) or until the alarm
is cleared, after which it goes back into TL mode. Usually, voltage-free N/O (normally open)
contacts are expected as the alarm input. This is a very powerful function of TL VCRs. When
an alarm is recorded, most TL VCRs index the tape so that the alarmed section can be found quickly.
Some makes offer search by date and time, and others offer alarm scan as well,
which can be very convenient when more than one alarm has to be reviewed every day.
• TL VCRs pass the incoming alarms out in the form of an alarm voltage output that can be used to
trigger an additional device such as a buzzer, strobe light, or similar device.
• TL VCRs can be programmed to recycle-record, which is very useful when the tape duration
expires earlier than expected and there is no operator to replace it.
• The MTBF of a video head used in TL VCRs is usually about 10,000 hrs, which is equivalent to
about one year of constant play/record operation. After this, head replacement is recommended.
All TL VCRs have some form of indication of the head’s hourly usage. This is displayed either
on a mercury-based indicator or electronically when the setup is performed.
• Some TL VCRs can be programmed to record only one shot with each alarm input. Using this
type of recording we can fit more than 960 hrs on a single tape.
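The tape-length rule described above is a straight proportion: the recording time of a TL mode scales with the ratio of the tape actually used to the reference tape (180 minutes for PAL, 120 minutes for NTSC). The following Python sketch assumes that linear relationship holds exactly; the function and dictionary names are hypothetical, and the manufacturer's table remains the authority for a specific machine.

```python
from fractions import Fraction

# Reference tape lengths (minutes) for which nominal TL modes are quoted.
REFERENCE_TAPE_MINUTES = {"PAL": 180, "NTSC": 120}


def effective_tl_mode(nominal_hours, tape_minutes, system="PAL"):
    """Hours covered by a nominal TL mode when a different tape length is used.

    Hypothetical helper: assumes the transport steps the tape exactly as it
    would with the reference tape, so recording time scales with tape length.
    """
    ratio = Fraction(tape_minutes, REFERENCE_TAPE_MINUTES[system])
    return nominal_hours * ratio


if __name__ == "__main__":
    for tape in (240, 300):
        converted = [effective_tl_mode(mode, tape) for mode in (24, 72)]
        print(f"E-{tape}: 24 h -> {converted[0]} h, 72 h -> {converted[1]} h")
```

Running it reproduces the figures given in the text: 24 and 72 hours become 32 and 96 hours on a 240-minute tape, and 40 and 120 hours on a 300-minute tape. Exact fractions are used so that the scaled modes come out as whole hours without rounding.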
Like standard VCRs, TL VCRs also have timer settings, which means they can be programmed to
record only at certain times and on certain days.
TL VCRs are important devices in CCTV, even though they are the weakest link in the resolution chain.
Apart from their use in multiplexed recording, one of their most important features is the ability to switch
to real-time recording when an external alarm is received. Most of the models available on the market
can be switched from stop mode to real-time recording, but it is more advantageous when the same
alarm switches the VCR to real time while it is already recording in time-lapse mode. The reason is very
simple: VCRs, being electromechanical devices, have inertia. This means a fraction of a second (and
sometimes even more than a second) might be lost until the video head starts spinning and the tape is
wound around the drum. If a TL VCR is already recording in time-lapse mode, it takes only a few
milliseconds to change to real-time recording, because the tape is in place and the video heads are
already spinning. If tape wastage is a concern (because the time-lapse recording between alarms
may not be needed), the longest time-lapse mode can be selected.
Some TL VCRs, and even some domestic models referred to as Quick Start, have the tape already wound
around the heads and are ready to record even in stop mode. They respond faster than other VCRs when
the record button is pressed. Be aware that on most domestic models there is a certain time delay during
which the machine will be in standby mode, after which the tape unwinds. This could be only a minute or
two, or sometimes up to ten minutes.
A time-lapse recorder
Many installers have modified domestic VCRs for alarm recording, which is reasonably easy to
do: the record button contacts are wired in parallel with the contacts of a relay that is controlled by an
external alarm. In such cases, the VCR's warranty will be void. Another important detail is that there is no
time and date stamping when an alarm triggers such a VCR.
As mentioned earlier, the horizontal resolution of VHS VCRs is limited to 240 TV lines for a color signal
(the vertical resolution is still defined by the TV system in use). Because CCTV still uses a lot of B/W cameras,
most TL VCRs have a switch for selecting between B/W and color. When set to B/W, the video signal
bypasses the low-pass filtering used for extracting the color information from a color signal, thus
allowing an improved horizontal resolution for a B/W signal, in excess of 300 TV lines (although this also
depends very much on the tape quality and how clean the heads are).
If we want even better recording quality than the VHS format offers, we can use time-lapse Super VHS
models. They offer the same flexibility and programmability as VHS TL VCRs, with better picture
quality but at a higher price.
An S-VHS time-lapse VCR
Whichever type of video recorder you use (and this also applies to domestic VCRs), the video signal
resolution should not be taken for granted. It could be much worse than in theory if any of the
following requirements are not met:
• For starters, connect a good video signal to the VCR input. This is especially important for
the horizontal sync pulses of the signal, since they are reproduced from the tape as part of the
video signal. If the camera is very distant, with distorted syncs and color bursts (voltage drop and
high-frequency losses), the video playback will be very unstable, with the picture breaking across
the top and unstable colors. Because the tape and the heads limit the resolution even further, the
quality of the sync pulses suffers as well. How these distortions are reproduced on a monitor
screen depends very much on the monitor's sync-handling capability, but if the sync pulses (and
video information) are recorded poorly, there is little the monitor can do.
• Always use good-quality tapes. The uniformity of the magnetic coating and the quality of the film base
are very important. Good tapes not only improve the recording quality, but also prolong the life of
the video heads and VCR mechanics in general. Bad tapes (or imitations of known brands) have
a nonuniform magnetic layer, which quite often peels off, and microscopic particles accumulate
on the video heads, causing damage that costs far more than the money saved on the tape.
• Video heads need regular cleaning, but it should be done only with approved cleaning kits.
The best thing to do is to consult your local video shop or service center; they have valuable
practical experience with VCRs that you can apply to CCTV. If you do not clean your VCR for a long
period, snowy playback is what you will see. To confirm that it is a dirty head and not a bad tape or
signal (which may look similar), take a tape of a known brand that you are sure has been properly
recorded, and play it back. If the snowy picture is still there, the video heads need cleaning.
Do not confuse the snow produced by dirty heads with the need for tracking adjustment. The difference
is in the amount and location of the snow: tracking usually needs adjustment if the picture breaks up
at the bottom of the monitor.
The table at the beginning of this heading gives the number of fields recorded every second with different
TL settings, in both of the major TV systems, NTSC and PAL. The refresh rate represents the time gap
between subsequent fields.
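The relationship behind such a table can be approximated in a few lines of Python. This is an idealized sketch, assuming the recorded field rate falls in inverse proportion to the TL mode, relative to real-time recording on the reference tape (3 hours on a 180-minute PAL tape, 2 hours on a 120-minute NTSC tape); it uses the nominal 60 fields/s for NTSC (strictly 59.94), and all names are hypothetical. Individual machines may deviate, so the table remains the reference.

```python
from fractions import Fraction

# Nominal real-time field rates and real-time duration of the reference tape (hours).
FIELD_RATE = {"PAL": Fraction(50), "NTSC": Fraction(60)}
REAL_TIME_HOURS = {"PAL": Fraction(3), "NTSC": Fraction(2)}


def tl_fields_per_second(mode_hours, system="PAL"):
    """Fields recorded per second in a given TL mode (idealized)."""
    return FIELD_RATE[system] * REAL_TIME_HOURS[system] / mode_hours


def tl_refresh_interval(mode_hours, system="PAL"):
    """Time gap in seconds between subsequent recorded fields."""
    return 1 / tl_fields_per_second(mode_hours, system)


if __name__ == "__main__":
    for mode in (3, 12, 24, 72, 168, 480, 960):
        fps = tl_fields_per_second(mode)
        print(f"PAL {mode:>3} h: {float(fps):6.3f} fields/s, "
              f"refresh {float(tl_refresh_interval(mode)):.2f} s")
```

Under these assumptions, the PAL 24-hour mode records 6.25 fields per second, a refresh interval of 0.16 s, while the 960-hour mode leaves 6.4 s between fields.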
9. Digital video
All of the discussions in this book so far have involved the PAL and NTSC television standards, which refer
to analog video signals. The majority of CCTV systems today still have analog cameras, even
though an increasing number of manufacturers offer digital video (IP) cameras designed to "stream"
video over a network.
Only half a dozen years ago, the very few CCTV components that used digital video were the
framestores, quad compressors, multiplexers, and the internal circuits of digital signal processing
(DSP) cameras. Today, it is fair to say that the majority of new installations, though still working
with analog cameras, use digital video recorders for monitoring and long-term storage. Camera quality
is an important starting point in the CCTV video chain, but the quality of the recorded images
and their intelligent processing have become equally important.
In the interval since the first edition of this book (1996), there have been revolutionary developments
in TV, multimedia, video, photography, and CCTV. The majority of these developments are based on
digital technology. One of the driving forces behind the new boom in CCTV has been the switch to digital
video processing, transmission, and recording. This development gathered real momentum in the
last few years, hence the reason for a completely new edition of this book with extended discussions
on digital video, video compression, networking, and IP technology.
Only a few years ago, high-speed digital electronics capable of live video processing were too
expensive to be economical. Today, however, with the ever-increasing performance and speed
of memory chips, processors, and hard disks, and their falling prices, digital video signal
processing in real time is not only possible and affordable, but it has become the only way to
process a large number of high-quality video signals.
Digital video was first introduced in the broadcasting industry in the early 1990s. As with any new
technology, it was initially very expensive and rarely used. Today digital video is the new standard,
replacing analog video, which is nearly half a century old. It comes basically in two flavors: Standard
Definition (SDTV), with an aspect ratio of 4:3 and the quality as we know it, and High Definition
(HDTV), with an aspect ratio of 16:9 and around 5 times the number of pixels of SDTV. Many countries
around the world are already broadcasting digital video, usually in both formats (SDTV and HDTV).
Not surprisingly, HDTV is going to be the preferred choice of the consumer market, owing to its much
higher resolution and the theatrical experience one has watching movies. However, since the majority
of CCTV systems today use standard definition, in this chapter we will cover all the key features
of standard-resolution video with a 4:3 aspect ratio.
Digital video recorders (DVRs) and IP cameras have now become the main drivers of new CCTV
growth, a source of higher revenue, and an inspiration for new and intelligent system design solutions
that have blurred the line between computers, IT, networking, and CCTV.
Why digital video?
Analog signals are defined as signals that can have any value in a predefined range. Examples include
audio as well as video signals. As we know, the predefined range for a video signal is anything from 0
volts (corresponding to black) to 0.7 volts (corresponding to white).
As mentioned earlier, most CCTV cameras today produce analog signals. The main problem with
analog signals is that noise is easily induced onto them, and, as we know, in real life noise cannot
be avoided. It simply accumulates at every stage of the signal path: it starts as thermal noise at the
camera imaging chip and in the camera electronics, and more is added in the transmission media (cables)
and at the receiving end (recorders, monitors, etc.). The longer this path is, the more noise is induced.
This is where digital signals can make a big difference. One of the most important differences between
an analog and a digital signal (apart from the form itself) is the immunity to noise.
A digital signal is also affected by noise, as is the analog signal, but a digital signal can
only have two values: zero or one. Noise will only affect the signal when its level reaches
the margins that the digital circuit uses to decide whether a signal is zero or one. This means that
digital signals tolerate noise accumulation to an extent unimaginable with analog video
signals, which is why we say digital signals are virtually immune to noise. As a result,
digital transmission allows longer distances, high immunity to external EMI, and no signal degradation
(i.e., better picture quality).
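The difference between accumulating and regenerating a signal can be shown with a toy Python model. It sends the same sequence of levels through a chain of noisy stages twice: once as an "analog" signal that simply accumulates the noise, and once as a "digital" signal that every stage re-decides as 0 or 1 against a 0.5 threshold. The noise amplitude, number of stages and threshold are arbitrary assumptions, and real digital links also use error detection and correction, which this sketch ignores.

```python
import random


def transmit(levels, stages, noise_amplitude, regenerate, seed=1):
    """Send signal levels through a chain of noisy stages.

    Each stage adds uniform noise in [-noise_amplitude, +noise_amplitude].
    With regenerate=True, every stage re-decides each value as 0 or 1 using
    a 0.5 threshold, as a digital receiver or repeater would.
    """
    rng = random.Random(seed)
    signal = list(levels)
    for _ in range(stages):
        signal = [v + rng.uniform(-noise_amplitude, noise_amplitude) for v in signal]
        if regenerate:
            signal = [1.0 if v >= 0.5 else 0.0 for v in signal]
    return signal


def max_error(sent, received):
    """Largest deviation between what was sent and what arrived."""
    return max(abs(a - b) for a, b in zip(sent, received))


if __name__ == "__main__":
    bits = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
    analog = transmit(bits, stages=10, noise_amplitude=0.3, regenerate=False)
    digital = transmit(bits, stages=10, noise_amplitude=0.3, regenerate=True)
    print("analog max error: ", round(max_error(bits, analog), 3))
    print("digital max error:", max_error(bits, digital))
```

As long as the noise added at each stage stays inside the decision margin (here 0.3 against a margin of 0.5), every regenerating stage restores the original bits exactly, while the analog copy keeps drifting further from the original with each stage.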
The other important advantage of digital video signals is the possibility for digital processing and
storage. This includes image enhancement, compression, transmission, various corrections, and storage.
Another important feature is that there is no difference in image quality between copies and
the original. Whether we make one, two, or ten copies of an image captured in digital format, the
quality is exactly the same as the original, no matter what generation the copy is. Last but not least,
video captured in digital format makes it possible to check the authenticity
of a copy. This feature is very often referred to as "water-marking," and it enables the protection of
digital signals against deliberate tampering, a very important aspect for CCTV security applications.
There are two main groups of compressions used in CCTV: video and image compressions.
Digital video recorders (DVRs)
Today it seems that recording CCTV video on VCR tapes is nearly over. Five years ago, during
completion of the previous edition of this book, VCRs were still around in big numbers, and DVRs
were only starting to appear. Today this ratio is reversed.
So what are the real benefits of using DVRs in CCTV, as opposed to VCRs?
First, with the VCR’s analog method there is no direct and quick access to the desired camera, except
when using a reasonably quick Alarm Search mode (available on most TL VCRs). In VCRs the
information is stored in an analog format and cannot be further processed. The quality of VCR-recorded
video is always lower than that of the original source.
Initially, in CCTV, attempts were made to implement digital video recording on a digital audio tape (DAT)
format. Though digital, such recorded material still required a sequential search mode, which is not as
efficient as the random access used in a hard disk. Hard disks have a much higher throughput than
other digital storage media, as well as a higher capacity, and with appropriate video compression,
better than S-VHS image quality is achievable. The length of recording, which was a problem only a few
years ago, is no longer an issue. Hard disks with capacities of 300 GB are readily available, and DVRs
with internal capacities of 1200 GB (1.2 TB) are not a rarity. Multi-week recording of a number of
cameras is no longer a problem. Hard disk drives (HDD) now have fast access times, and by using good
compression it is possible to record and play back multiple images simultaneously in real time (meaning
"live" video rate). Hard disk prices are falling daily, and it is interesting to note that at the time of
writing the previous edition of this book, a single 3.5 inch HDD with a capacity of around 30 GB was
becoming available. For the same price, today in 2005, we have a nearly tenfold increase in capacity,
with 300 GB and 400 GB drives already being advertised. Because hard drives are so important, a
completely new chapter discussing all their important aspects was needed.
How many days or weeks of video recording can be stored on a 300 GB drive, for example, depends first on
the type of compression and the image quality selected for that compression. Another important
factor is whether the recording is continuous or based on video motion detection.
The latter has become very popular in CCTV as it extends the recording capacity at least two- or
threefold. Certainly, extending the hard drive storage is also possible, but providing redundancy (safety)
as well might be an important customer requirement.
Because there are so many variables, it is not easy to give a uniform answer. I am also aware that the
first question many customers ask is how many days of recording they will get; therefore, in order to
help you out, I have put two different spreadsheets on our web site ( which you
can download and use. The first one refers to multiplexed image compressions, and the other to video
compressions, which are all explained further in this chapter.
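The spreadsheets themselves are not reproduced here, but for video (temporal) compression the core arithmetic is simply the disk capacity divided by the data written per day. A minimal Python sketch, assuming a constant average bit rate per camera, decimal gigabytes as drive makers quote them, and no allowance for file-system overhead or redundancy; the function name is hypothetical and not taken from the spreadsheets.

```python
SECONDS_PER_DAY = 86_400


def recording_days_from_bitrate(capacity_gb, bitrate_mbps, cameras=1, duty_cycle=1.0):
    """Rough number of days a disk holds for streams of a given average bit rate.

    capacity_gb  -- disk capacity in decimal gigabytes (10**9 bytes)
    bitrate_mbps -- average bit rate per camera in Mb/s (10**6 bit/s)
    cameras      -- number of independently recorded streams
    duty_cycle   -- fraction of time actually recorded: 1.0 for continuous
                    recording, lower when motion detection skips quiet periods
    """
    capacity_bits = capacity_gb * 1e9 * 8
    bits_per_day = bitrate_mbps * 1e6 * cameras * duty_cycle * SECONDS_PER_DAY
    return capacity_bits / bits_per_day


if __name__ == "__main__":
    # One 4 Mb/s stream on a 300 GB drive, continuous vs. motion-triggered.
    print(round(recording_days_from_bitrate(300, 4), 1))
    print(round(recording_days_from_bitrate(300, 4, duty_cycle=0.5), 1))
```

For example, a single 4 Mb/s stream fills a 300 GB drive in just under 7 days of continuous recording; if motion detection halves the recorded time, that stretches to nearly 14 days.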
All of the above leads us to various considerations we have to keep in mind when selecting digital
compression, storage media, and data transfer rate. This is why we have to understand the theory of
digital video and image representation with various compression techniques. The following few headings
will try to explain some of the basics.
The various standards
A few international bodies develop standards for digitized video. The best known is
the International Telecommunication Union (ITU), which is the United Nations specialized agency
in the field of telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a
permanent organ of ITU. ITU-T is responsible for studying technical, operating, and tariff questions and
for issuing recommendations on them with a view to standardizing telecommunications on a worldwide
basis. The World Telecommunication Standardization Assembly (WTSA), which meets every four years,
establishes the topics for study by the ITU-T study groups which, in turn, produce recommendations
on these topics. The approval of ITU-T Recommendations is covered by the procedure laid down in
WTSA Resolution 1. In some areas of information technology that fall within ITU-T's purview, the
necessary standards are prepared on a collaborative basis with ISO and IEC.
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO and IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and nongovernmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC1. Draft International Standards adopted by the joint technical committee are circulated to
national bodies for voting. Publication as an International Standard requires approval by at least 75%
of the national bodies casting a vote.
Some recommendations, such as the latest H.264, are prepared jointly by ITU-T SG16 Q.6, also known
as VCEG (Video Coding Experts Group), and by ISO/IEC JTC1/SC29/WG11, also known as MPEG
(Moving Picture Experts Group). VCEG was formed in 1997 to maintain prior ITU-T video coding
standards and to develop new video coding standard(s) appropriate for a wide range of conversational
and nonconversational services. MPEG was formed in 1988 to establish standards for coding moving
pictures and associated audio for various applications such as digital storage media, distribution, and
communication.
It should also be made clear that even though CCTV uses video signals and we will talk about
video compression, we also make use of still image compression. In order to make a clear distinction
between the two, we will refer to them as video compression and image compression.
Video compressions use three dimensions when compressing: horizontal and vertical picture
dimensions, as well as time. As a result, such compressions are often referred to as temporal
compressions. Typical representatives of temporal (video) compressions are MPEG-1, MPEG-2,
MPEG-4, H.263, and H.264.
Image compressions use only two dimensions: the horizontal and vertical dimension of the image.
Typical representatives of image compression are JPEG and Wavelet (JPEG-2000).
The difficult challenge we face in CCTV is deciding which compression is best for a particular product
or project. There is no simple or single answer. Often it depends on how much we understand about
the differences between compressions, but more importantly on what is the intended usage. If a digital
CCTV system is designed to protect a cash teller in a bank, or a card dealer in a casino, a high image
rate would be preferred. Often live rates should be used (25 for PAL or 30 for NTSC), although in
some instances 10 images/second might be sufficient. A lower rate than this is possible but might not
be practical in such applications. Tests are often the best indicator.
Another example would be where normal human activity is recorded, like people walking in and out of
a foyer of a building. There is no need for a high rate here, as it only adds gigabytes to the storage,
which somebody eventually has to go through and analyze. This takes a lot of time, reducing the
efficiency of the system. Human activity can successfully be captured even at 2 images/second (although
the more the better), provided that the image quality is high and the compression is low. How much detail
can be seen and recognized depends on the lens angle of coverage, but if a camera produces a signal
where a person’s face can be identified when viewing in live mode, a couple of images per second of
the same scene should be sufficient for successful identification from the recording.
The other important recording technique we use in CCTV is multiplexed recording. With digital
CCTV, we tend to mimic what was done in the days of multiplexed recording when we used multiplexers
and VCRs. So a typical CCTV digital video recorder (DVR) in actual fact is a multiplexer and digital
recorder in one. In such products, image compression (as opposed to video compression) would be
more convenient as it compresses TV frames or fields treating them as still images, without regard for
which camera comes before or after the compressed one. Some would argue that the disadvantage
of image compression used in multiplexing DVRs is that it produces relatively large image files
(typically, 30 kB to 60 kB per TV field, for a good image quality), but the advantage is that each such
image is an independent entity and can be reconstructed on its own, without the need for other images
before or after it. For some legal cases this might be the preferred compression because of
such independence. This is not to say that video compression cannot stand up in a court of law; the
argument rests only on the fact that video compression reconstructs images from past
and future reference images. With image compression it is possible to have image rates much lower
than the live rate of 25 frames/second (29.97 for NTSC), making the most of the available hard drive
storage space. If we add to this the motion detection capability, which most of the multiplexed DVRs
use, we get a superior successor to the MUX+VCR combination. This is why it is possible to have
multiple cameras recorded on one DVR, each with at least a couple of images per second recording
rate, and achieve storage capacities of several days, weeks, and maybe months. This was unthinkable
only 5 to 10 years ago.
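For multiplexed DVRs using image compression, the same kind of estimate works per image rather than per bit rate. A minimal Python sketch, assuming a constant average compressed field size (the 30 kB to 60 kB range quoted above), the same update rate for every camera, decimal gigabytes, and a simple multiplier for motion detection; the camera count, rate and field size in the example are illustrative, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def recording_days_multiplexed(capacity_gb, cameras, images_per_second, kb_per_image,
                               motion_factor=1.0):
    """Rough days of storage for a multiplexed DVR using image compression.

    capacity_gb       -- disk capacity in decimal gigabytes (10**9 bytes)
    images_per_second -- recorded update rate per camera
    kb_per_image      -- average compressed field size in kB (10**3 bytes)
    motion_factor     -- storage extension from motion-detection recording
                         (1.0 = continuous; the text suggests 2 to 3)
    """
    capacity_bytes = capacity_gb * 1e9
    bytes_per_day = cameras * images_per_second * kb_per_image * 1e3 * SECONDS_PER_DAY
    return capacity_bytes / bytes_per_day * motion_factor


if __name__ == "__main__":
    # 16 cameras, 2 images/s each, 40 kB per field, on a 300 GB drive.
    continuous = recording_days_multiplexed(300, 16, 2, 40)
    with_motion = recording_days_multiplexed(300, 16, 2, 40, motion_factor=3)
    print(f"continuous: {continuous:.1f} days, with motion detection: {with_motion:.1f} days")
```

With these assumed figures a 300 GB drive holds roughly 2.7 days of continuous recording, or around 8 days if motion detection extends it threefold.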
In recorders where we want to achieve the highest
possible video quality or highest possible recording or
transmission rate, temporal compression is better suited as
it makes use of the redundancy of a video signal over time.
It does require, however, a continuous signal of the same
camera for maximum efficiency. The other advantage
of temporal (video) compression is that audio is almost
always part of such a scheme. Temporal compression
achieves better compression of the video sequence as it
uses motion prediction (not to be confused with motion
detection) so that object motion looks smoother when
playing back. Because of this fact, video compressions
are not used to multiplex cameras into one recording
system. Rather, if a DVR that uses temporal compression
has multiple camera inputs, they are usually recorded
on hard drives as independent streams.
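The idea of redundancy over time can be illustrated with plain frame differencing: when consecutive frames from the same camera barely change, the difference between them contains far fewer values worth coding than the frame itself. This Python sketch shows only that simple differencing, not the motion-compensated prediction, DCT and entropy coding that MPEG and H.26x actually use; the tiny frames and their values are made up for illustration.

```python
def frame_difference(previous, current):
    """Pixel-by-pixel difference between two frames given as lists of rows."""
    return [[c - p for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]


def reconstruct(previous, difference):
    """Rebuild the current frame from the reference frame plus the difference."""
    return [[p + d for p, d in zip(prev_row, diff_row)]
            for prev_row, diff_row in zip(previous, difference)]


def nonzero_count(frame):
    """Number of values an encoder would actually have to describe."""
    return sum(1 for row in frame for value in row if value != 0)


if __name__ == "__main__":
    # Two tiny 4 x 4 luminance frames: a static scene where one pixel changes.
    frame1 = [[16, 16, 16, 16],
              [16, 80, 80, 16],
              [16, 80, 80, 16],
              [16, 16, 16, 16]]
    frame2 = [row[:] for row in frame1]
    frame2[3][3] = 235  # a small bright object enters the corner
    diff = frame_difference(frame1, frame2)
    print("values to code, whole frame:", nonzero_count(frame2))
    print("values to code, difference: ", nonzero_count(diff))
    print("lossless rebuild:", reconstruct(frame1, diff) == frame2)
```

Here the whole frame has 16 nonzero values but the difference has only one. The sketch also shows why such schemes need a continuous signal from the same camera: if consecutive frames came from different cameras, the differences would be as large as the frames themselves.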
Another important characteristic of temporal
compression is the time delay (latency) that occurs
with video compressions such as MPEG-1 and MPEG-2.
This is a result of how the video compression is designed
to work, where video signal redundancy is reduced by
comparing past and future reference points, a technique
that requires some buffering (i.e., produces delays in
encoding and decoding). More susceptible to this effect
is MPEG-2, where high video quality is achieved with
higher bit rates, typically over 4 Mb/s, and this may
produce a latency from around half a second up to a
second. This delay is irrelevant in broadcast television
or when playing back a DVD movie for example, but it
becomes an issue when trying to control a PTZ camera
whose signal has been encoded for transmission over
LAN. It is possible, however, by combining a lower
streaming rate with a smaller group of pictures (GOP)
size, to reduce the delay to an acceptable 200 ms or less,
with unnoticeable deterioration in video quality.
The temporal (video) compressions that use lower bit
rates and are designed for video conferencing (thus a
need for bidirectional video streaming), such as H.263
and MPEG-4, have much lower lag, though lower picture
quality too.
Video and image compressions have evolved considerably
in the last 10 years. Although MPEG-2 is the dominant compression
in the broadcast and DVD industry, a new and more efficient compression may well become more
popular. At the time of writing this book, the latest and most promising video compression seems
to be H.264 (also known as MPEG-4 Part 10, or Advanced Video Coding, AVC), and the latest and most
promising image compression seems to be JPEG-2000 (based on the wavelet transform). We will see
what the future will bring.
So, let us now list all of the compressions that we have or might use in CCTV; we will cover them in
more detail later in this chapter.
• JPEG and Motion-JPEG (still image compression)
• JPEG-2000 / Wavelet and Motion JPEG-2000 (still image compression)
• MPEG-1 (video compression, uses data bit streaming between 1 ~ 3 Mb/s)
• MPEG-2 (video compression, uses data bit streaming between 1 ~ 30 Mb/s)
• MPEG-4 (video compression, uses even lower bit rates, from 9.6 kb/s ~ 1.5 Mb/s)
• MPEG-7 (new concept, offering smart object searching features)
• MPEG-21 (very new, promising larger-scale integration of the features of all MPEG standards)
• H.261 (one of the first and oldest video compressions, designed for video conferencing,
uses multiples of 64 kb/s, typical for ISDN)
• H.263 (improved H.261, works with even lower bit rates)
• H.264 / AVC (new and advanced video compression for generic audiovisual services)
• Others (proprietary and hybrid compressions)
There are also hybrid compressions, which combine the features of the two above groups, such as the
Delta Wavelet compression, the Multi-Layer JPEG, and some other proprietary formats.
It is important to acknowledge that all compressions defined by the standards can be implemented in
hardware, by dedicated processing chips, as opposed to the proprietary ones which, although they
may offer some advantages, are implemented in software running on the general-purpose operating
system and the processor in the DVR. Thus, the continuity and consistency of such compression will
depend very much on how fast the main processor is and how busy it is with other activities.
The advantages of hardware compression chips are obvious: their compression speed is constant,
independent of the other tasks performed by the main processor (web serving, backing up, remote
transmission, etc.).
Admittedly, software compression can easily be modified, and new features can be added, as it only
depends on the easily updated software code.
ITU-601: Merging the NTSC and PAL
Prior to any digital processing, the first stage is analog to digital conversion (A/D). Such a circuit may
be found inside an IP camera or a DVR. This is a stage where the analog signal is sampled and quantized
(broken into discrete values) in order to be converted to digital format. The sampling rate and levels
of quantization depend on the quality and speed of the electronics, and they define the resolution
(image quality) and the speed of the digital frame grabbing device. It is important to understand here
that, although theoretically a variety of A/D quality conversions might be used in terms of sampling
rates and quantizing levels, a television digitization standard has been established and the majority of
CCTV products use it.
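The quantizing step can be sketched in Python as a plain uniform quantizer spread across the 0 V (black) to 0.7 V (white) range given earlier in this chapter. This is a generic illustration under that assumption, not the ITU-601 coding itself: in 8-bit ITU-601 luminance, black and white are nominally coded as 16 and 235 rather than 0 and 255. The function names are hypothetical.

```python
def quantize(voltage, bits=8, v_min=0.0, v_max=0.7):
    """Map an analog level (volts) to the nearest of 2**bits equally spaced codes.

    Levels outside [v_min, v_max] are clipped, as a real A/D converter would.
    """
    clipped = min(max(voltage, v_min), v_max)
    step = (v_max - v_min) / (2 ** bits - 1)
    return round((clipped - v_min) / step)


def dequantize(code, bits=8, v_min=0.0, v_max=0.7):
    """Voltage represented by a code; the difference from the input is the quantizing error."""
    step = (v_max - v_min) / (2 ** bits - 1)
    return v_min + code * step


if __name__ == "__main__":
    for level in (0.0, 0.1, 0.35, 0.7):
        code = quantize(level)
        print(f"{level:.3f} V -> code {code:3d} -> {dequantize(code):.4f} V")
```

With 8 bits the step is 0.7 V / 255, about 2.7 mV, so the quantizing error never exceeds half of that; more bits give finer steps at the cost of faster, more expensive electronics.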
The ITU-R BT.601 recommendation specifies the digitization of an analog video signal comprising the
luminance Y, the red color-difference component, and the blue color-difference component, with a sampling
base frequency of 3.375 MHz, common to both PAL and NTSC. The luminance Y is sampled at
four times this "base" frequency (i.e., 3.375 × 4 = 13.5 MHz), and the color-difference components
at two times the base frequency (i.e., 6.75 MHz). Hence this sampling arrangement is also known
as 4:2:2 sampling. Other sampling strategies are also possible, such as 4:1:1 and 4:4:4, but the 4:2:2
is the most common in CCTV.
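The arithmetic behind the sampling ratios can be sketched in Python. The per-sample word length (8 bits here, with 10 bits as an option) is an assumption not stated in this section, and the resulting figure counts every sample on every line, blanking included, so it is the raw rate before any compression. The names are hypothetical.

```python
BASE_FREQUENCY_HZ = 3_375_000  # ITU-R BT.601 base frequency, common to PAL and NTSC

# Multiples of the base frequency for Y and the two color-difference signals.
SAMPLING_SCHEMES = {
    "4:4:4": (4, 4, 4),
    "4:2:2": (4, 2, 2),
    "4:1:1": (4, 1, 1),
}


def sampling_rates(scheme="4:2:2"):
    """(Y, blue difference, red difference) sampling frequencies in Hz."""
    return tuple(multiple * BASE_FREQUENCY_HZ for multiple in SAMPLING_SCHEMES[scheme])


def raw_bit_rate(scheme="4:2:2", bits_per_sample=8):
    """Uncompressed bit rate in bit/s, counting every sample on every line."""
    return sum(sampling_rates(scheme)) * bits_per_sample


if __name__ == "__main__":
    for scheme in SAMPLING_SCHEMES:
        rates_mhz = [rate / 1e6 for rate in sampling_rates(scheme)]
        print(scheme, rates_mhz, f"{raw_bit_rate(scheme) / 1e6:.0f} Mb/s at 8 bits")
```

At 8 bits per sample, 4:2:2 sampling produces 27 million samples, or 216 Mb/s, every second, which shows why compression is unavoidable for recording and network transmission.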
A/D conversion starts with sampling and quantizing of the analog video signal.
If we refresh our memory about PAL scanning lines and number of frames we get each second, we
can calculate that there are 625 × 25 frames = 15,625 lines each second. If we divide the 13.5 MHz
sampling rate (which is 13,500,000 times each second) by 15,625 Hz, we get 864 samples per line.
This is the quality of the sampling in PAL when using the ITU-601 recommendation of 13.5 MHz.
Since the PAL line duration is 64 µs (see the diagram), the 864 samples divide this interval
into very fine slices. It should be noted that this "slicing" includes the sync pulses as well.
The same type of calculation for NTSC, using 525 scanning lines at a 59.94 Hz field rate (the accurate
field frequency is 59.94, not 60, giving 29.97 frames per second), yields 525 × 29.97 Hz = 15,734.25 lines
each second. Dividing 13.5 MHz by 15,734.25 Hz gives 858 samples per line (after rounding), including,
again, the sync pulses.
So, just to recap, using the ITU-601 recommendation in PAL luminance sampling, we get 864
samples/line, and in NTSC we get 858 samples/line. In both cases a sampling frequency of 13.5
MHz is used.
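Both divisions can be checked exactly with Python's standard fractions module. One assumption worth stating: the sketch uses the exact NTSC frame rate of 30000/1001 Hz (which the text rounds to 29.97 Hz), and with it the NTSC result comes out at exactly 858 rather than 857.999. The constant and function names are illustrative only.

```python
from fractions import Fraction

SAMPLING_HZ = 13_500_000  # ITU-601 luminance sampling frequency

# Total lines per frame and exact frame rate; NTSC's "29.97 Hz" is 30000/1001 Hz.
SYSTEMS = {
    "PAL": (625, Fraction(25)),
    "NTSC": (525, Fraction(30000, 1001)),
}


def line_frequency(system):
    """Lines scanned per second (Hz)."""
    lines, frame_rate = SYSTEMS[system]
    return lines * frame_rate


def samples_per_line(system):
    """Luminance samples per total line, sync and blanking included."""
    return SAMPLING_HZ / line_frequency(system)


if __name__ == "__main__":
    for name in SYSTEMS:
        print(f"{name}: {float(line_frequency(name)):.2f} lines/s, "
              f"{samples_per_line(name)} samples/line")
```

This prints 864 samples per line for PAL and 858 for NTSC; the reciprocal of the PAL line frequency, 1/15,625 s, is the 64 µs line duration mentioned above.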
From the above, a very important fact about the ITU-R BT.601 can be concluded: the ITU-601 is the
first international recommendation that tries to merge the two incompatible analog composite
television standards (NTSC with 525/59.94 and PAL with 625/50) to a common component digital
sampling concept. The major achievement of Rec. 601 is the choice of a set of sampling frequencies
(13.5 MHz for luminance and 6.75 MHz for the color-difference components) that is common to both standards.
Out of the 864 samples for PAL and 858 for NTSC, the active line in both cases is given to have
720 samples. This is the maximum horizontal resolution a digitized signal using ITU-601 sampling
recommendation can have. The term resolution should be used loosely here because it has a slightly
different meaning than the analog video signal resolution expressed in TVL. We shall explain this in
more detail further in the text.
The sampling rate as recommended by ITU-601
Some of you may ask, “Why 720, and not less or more than that?” This is because 720 is divisible
by 8 (i.e., 2³), which is very useful for most of the video compressions using the discrete cosine
transformation (such as JPG, MPEG, and the H series), where video images are subdivided into blocks of
8 × 8 pixels. Often, you will find that some digital processing equipment narrows the active video
signal by 8 samples on the left and 8 on the right of the main video signal contents of 720 samples,
making an active line of 704 pixels instead of 720. This is to allow for various camera
signal fluctuations.
The vertical sampling recommendation by ITU-601 is equal to the number of active lines, which is
288 per TV field (or 576 for a full TV frame) in PAL and 240 per TV field (or 480 for a full TV frame)
in NTSC.
Therefore, the digitized TV frame according to the ITU-601 recommendation is 720 × 576 for
PAL and 720 × 480 for NTSC.
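To see why a multiple of 8 matters, the sketch below counts how many 8 × 8 DCT blocks tile each ITU-601 active frame, including the 704-sample variant mentioned above. The helper `block_grid` is hypothetical, written only for this illustration.

```python
BLOCK = 8  # DCT block size used by JPEG, MPEG and the H.26x series


def block_grid(width, height, block=BLOCK):
    """Return (blocks across, blocks down, total blocks); raise if it does not tile."""
    if width % block or height % block:
        raise ValueError(f"{width} x {height} is not a multiple of {block}")
    across, down = width // block, height // block
    return across, down, across * down


for name, w, h in [("PAL", 720, 576), ("NTSC", 720, 480), ("PAL, 704 wide", 704, 576)]:
    print(name, block_grid(w, h))
```

A width that is not a multiple of 8 would leave a partial block at the edge of every row, which the encoder would have to pad.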
The ITU-601 digitization pixel count
An example of a test chart as sampled by the ITU-601 recommendations (note the
horizontally “squashed” appearance) in PAL on the left, and how it is reproduced at
the analog video output on the right
This also shows that ITU-601 takes interlaced scanning into account, and many digital
recorders let you choose whether the recording is made in field or frame mode.
An observant reader will notice something in these numbers that sometimes makes digital CCTV
confusing, so it is worth clarifying it now: the aspect ratio of standard definition TV versus the
aspect ratio of the images produced when sampled with the ITU-601 recommendation.
As we all know, TVs and monitors in CCTV use an aspect ratio of 4:3 ≈ 1.33, and yet the aspect ratio
is 720:576 = 1.25 for PAL and 720:480 = 1.5 for NTSC. This introduces so-called “non-square” pixels
in both standards. PAL gets “horizontally squashed” pixels, which need to be stretched
out horizontally before reproduction on a 4:3 monitor, while NTSC gets “vertically squashed” pixels,
which need to be expanded vertically before display. This expansion/stretching is
done in the last stage of the decoding, just before display. It may seem an unnecessary
stage in digital, but in actual fact it makes the decoding chips cheaper and more universal, since they
are used in both PAL and NTSC.
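The non-square pixel shape can be computed directly as the display aspect ratio divided by the storage aspect ratio. This sketch assumes, as the text does, that all 720 samples of the line map onto the full 4:3 picture width; if only 704 samples are taken as the 4:3 width, the same formula gives 12/11 for PAL and 10/11 for NTSC. The function name `pixel_aspect` is illustrative.

```python
from fractions import Fraction

DISPLAY_ASPECT = Fraction(4, 3)  # standard definition 4:3 monitor


def pixel_aspect(width, height, display_aspect=DISPLAY_ASPECT):
    """Width/height of one stored pixel when the whole frame fills a 4:3 screen."""
    storage_aspect = Fraction(width, height)
    return display_aspect / storage_aspect


for name, w, h in [("PAL", 720, 576), ("NTSC", 720, 480)]:
    par = pixel_aspect(w, h)
    shape = "wider than tall" if par > 1 else "taller than wide"
    print(f"{name}: storage {float(Fraction(w, h)):.2f}, pixel aspect {par} "
          f"({float(par):.3f}), displayed {shape}")
```

A value above 1 means each stored pixel must be shown wider than it is tall (PAL), and a value below 1 means it must be shown taller than it is wide (NTSC), matching the stretching and expanding described above.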
The resolution of ITU-601 digitized video
Based on the Nyquist theorem, an analog, continuous signal can be reproduced from its discrete
samples if the sampling frequency is at least twice the highest frequency contained in the signal. Frequencies
higher than half the sampling frequency are not wanted, and if they do exist they cause aliasing (like
the well-known Moiré patterning). In order to minimize aliasing, the signal has to pass through a
low-pass filter before sampling, where frequencies higher than the upper limit (equal to half the
sampling frequency) are deliberately eliminated. An ideal brick-wall low-pass filter doesn’t exist
in practice, so the actual filter cutoff frequency is set slightly lower than what the theory requires.
This fact has a direct bearing on the frequency response and the number of horizontal picture elements
(pixels) that a digitized system can handle.
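The folding that the anti-aliasing filter is there to prevent can be shown numerically: after sampling at fs, a sinusoid above fs/2 is indistinguishable from a lower “alias” frequency. This is a minimal sketch for a single pure tone; `alias_frequency` is a hypothetical helper, not part of any library.

```python
def alias_frequency(f_signal_hz, fs_hz):
    """Apparent frequency (0 .. fs/2) of a pure tone after sampling at fs_hz."""
    f = f_signal_hz % fs_hz    # sampling cannot distinguish f from f - k * fs
    return min(f, fs_hz - f)   # fold into the 0 .. Nyquist band


FS_HZ = 13.5e6  # ITU-601 luminance sampling rate
for f in (5.0e6, 6.75e6, 8.0e6, 12.0e6):
    print(f"{f / 1e6:5.2f} MHz in -> {alias_frequency(f, FS_HZ) / 1e6:5.2f} MHz after sampling")
```

An 8 MHz component, for example, would reappear at 5.5 MHz, well inside the wanted video band, which is why it has to be removed before sampling rather than after.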
In ideal conditions, if no additional filtering were done, given the Nyquist frequency of 6.75 MHz
(i.e., a sampling rate of 13.5 MHz), the 720 pixels per active line would be equivalent to a horizontal
resolution of 3/4 × 720 = 540 TVL, as defined in analog TV (TVL is counted per picture height, hence the
factor 3/4 for a 4:3 picture).
The ITU-601 recommendation, however, specifies an anti-aliasing and reconstruction filter cutoff
of 5.75 MHz, which reduces the luminance analog horizontal resolution to 449 TVL in PAL and
455 TVL in NTSC.
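The TVL figures above can be reproduced with two small formulas: one line per pixel (scaled by 3/4 to count per picture height), or two lines per cycle of the filter cutoff frequency over the active line. The active line durations used here (52 µs for PAL and about 52.7 µs for NTSC) are our assumptions, not values stated in the text, and the results land within a TVL of the quoted 449 and 455 TVL.

```python
ASPECT_CORRECTION = 3 / 4  # TVL counts lines per picture height on a 4:3 screen


def tvl_from_pixels(active_pixels):
    """Ideal TVL: each pixel can carry one alternating black or white line."""
    return active_pixels * ASPECT_CORRECTION


def tvl_from_cutoff(cutoff_hz, active_line_s):
    """Filter-limited TVL: two lines per cycle across the active line."""
    cycles = cutoff_hz * active_line_s
    return 2 * cycles * ASPECT_CORRECTION


# Assumed analog active line durations (not given in the text).
ACTIVE_LINE_S = {"PAL": 52.0e-6, "NTSC": 52.7e-6}

print("ideal, 720 pixels:", tvl_from_pixels(720), "TVL")
for name, t in ACTIVE_LINE_S.items():
    print(f"{name}, 5.75 MHz filter: {tvl_from_cutoff(5.75e6, t):.1f} TVL")
```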
Further reduction of the resolution is introduced by the video compression itself, so in practice it is
fair to say that no video signal in digitized CCTV can have a horizontal resolution higher than
around 450 TVL. It now becomes very clear that choosing a video compression with as few losses
as possible is of paramount importance. This desire conflicts with the requirement for long recording
storage. We will discuss the various video compressions further in this chapter, but it is important
to highlight again that the above resolution limit applies to the digitized video signal before it
undergoes compression.
The difference between full frame 720 x 576 pixels image (above) and the same in
CIF size (360 x 288) sometimes can make a difference between recognizing a license
plate and not recognizing it (for example, the car on the right).
The human eye is less sensitive to color resolution, and because of this in CCTV we accept the 4:2:2
sampling strategy as good enough: the chrominance signals are subsampled by a factor of two,
at 6.75 MHz (only half the luminance sampling rate of 13.5 MHz). This results in 432 total samples per line
for PAL and 429 for the NTSC scanning standard (including the sync pulse period). So, the digital active
line accommodates 360 red color-difference and 360 blue color-difference samples in both
standards. Under ideal conditions, given the Nyquist frequency of 3.375 MHz, 360 pixels per active line
is equivalent to 3/4 × 360 = 270 TVL. Rec 601 specifies an anti-aliasing and reconstruction filter
cutoff of 2.75 MHz, resulting in a color-difference signal analog horizontal resolution on the order of
215 TVL in PAL and 218 TVL in NTSC.
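The same arithmetic applied to the 4:2:2 color-difference channels gives the chroma figures. As in the luminance case, the active line durations (52 µs for PAL, about 52.7 µs for NTSC) are assumed values rather than figures from the text, and the function name `chroma_figures` is illustrative only.

```python
CHROMA_FS_HZ = 6.75e6              # 4:2:2 colour-difference sampling (half of 13.5 MHz)
CHROMA_CUTOFF_HZ = 2.75e6          # ITU-601 colour-difference filter cutoff
ASPECT_CORRECTION = 3 / 4          # TVL counts lines per picture height on 4:3
ACTIVE_CHROMA_SAMPLES = 720 // 2   # Cb (and Cr) samples in the active line

# Line rates as in the text; active line durations are assumptions.
SYSTEMS = {"PAL": (625 * 25.0, 52.0e-6), "NTSC": (525 * 29.97, 52.7e-6)}


def chroma_figures(line_rate_hz, active_line_s):
    """Return (total Cb samples per line, ideal TVL, filter-limited TVL)."""
    total = CHROMA_FS_HZ / line_rate_hz
    ideal_tvl = ACTIVE_CHROMA_SAMPLES * ASPECT_CORRECTION
    filtered_tvl = 2 * CHROMA_CUTOFF_HZ * active_line_s * ASPECT_CORRECTION
    return total, ideal_tvl, filtered_tvl


for name, (rate, active) in SYSTEMS.items():
    total, ideal, filtered = chroma_figures(rate, active)
    print(f"{name}: {total:.1f} samples/line, ideal {ideal:.0f} TVL, "
          f"filtered {filtered:.1f} TVL")
```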
All of the above, and especially the facts about the digitized luminance, leads to a very important
conclusion when we are discussing resolution in digitized video. It should be noted that this is the
ITU-601 digitization recommendation, and as we said earlier, it is in use in the majority of digitization
products in CCTV. There is no advantage in using cameras with much higher resolution than 450 TVL
when their video is to be recorded on ITU-601 compliant recorders. This is the same argument as when
we had high-resolution cameras (460 TVL, for example) being recorded on VHS VCRs (which are limited
to 240 TVL by the low-pass filter design). The difference here is not so dramatic, but the principle is
the same: some CCTV manufacturers have lately come up with color cameras offering 520 TVL, for example.
Practically, this means you cannot see any difference between a 460 TVL, 480 TVL, or even 520 TVL camera
when these are recorded on even the best quality ITU-601 compliant DVR. More attention should be
directed to choosing a camera with a better signal/noise ratio, less smear, or better dynamic range
than to slight differences in horizontal resolution that nobody can see. If a system is designed and
used for just live monitoring, on high-quality CCTV monitors with better than 500 TVL resolution,
such a small difference in resolution might be advantageous, but unless a Y/C connection is made to the
monitors instead of composite video (and this is really very rare in CCTV), not much difference can
be seen even then.
The ITU-601 recommends a variety of sampling strategies, of which in CCTV the
4:2:2 is widely accepted.
No one can predict what the future will bring, and I am confident that sooner or later we will
have some version of high-definition CCTV cameras, which will then be accompanied by
the appropriate high-definition digitization recommendation. But until then, we should all be aware
of the limitations we face and the compromises we must make with the current systems.
If the sampling frequency is too low, aliasing may occur.
Left: TV field exported; Center: TV frame interlaced effect; Right: De-interlaced frame.
Note the jagged edges on the car when field recording is used (left) and the quality
of the same when frame recording is used (center and right).
The above is all true for horizontal resolution, but let us now talk about vertical resolution. In some
system designs, vertical resolution is as important, especially when detecting and recognizing license
plates, or faces at a distance.
The number of quantized levels in ITU-601 is chosen to be represented with 8 bits, making a
total of 256 levels (2⁸ = 256). The reason for such a choice is very practical from an engineering point
of view: no CRT can reproduce more than around 250 levels of gray, so there is no need to
sample the analog video signal with more levels than this. The number 256 is chosen because it is a power
of two, and as we know, in the digital world everything is represented with zeros and ones (i.e., with
the binary numbering system).
In actual fact, with the ITU-601 recommendations we should be aware of some more “tricks in the bag.”
Just as the sampling frequency of 13.5 MHz encompasses the whole signal in time, including the sync
pulses, the ITU-601 recommends that 8 bits are used to represent the whole amplitude range of the
signal. We can think of amplitude as the “vertical” dimension of the waveform and of time as its
“horizontal” dimension, since time runs along the lines in the horizontal direction of the display.
A comparison between full TV frame and a CIF size
So, out of the 256 combinations of 8 bits, ITU-601 uses 0 and 255 for representing the syncs, which
leaves values 1 to 254 to be used for video. The luminance black level is given a value of 16
(binary 00010000) and the white level a value of 235 (binary 11101011). The value 128 is used to
indicate that there is no chrominance in the signal.
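The mapping from the analog signal to these 8-bit codes can be sketched as follows. We assume luminance normalized to 0.0 (black) through 1.0 (white) and color difference normalized to -0.5 through +0.5; the 16 to 240 range for color-difference codes comes from ITU-601 itself, while the function names are hypothetical.

```python
BLACK, WHITE = 16, 235   # ITU-601 8-bit luminance black and white levels
CHROMA_ZERO = 128        # colour-difference code meaning "no colour"
CHROMA_SPAN = 224        # colour-difference codes run from 16 to 240


def clamp(code):
    """Keep codes within 1..254, since 0 and 255 are reserved for sync/timing."""
    return max(1, min(254, code))


def quantize_luma(v):
    """v: analog luminance normalized so that 0.0 is black and 1.0 is white."""
    return clamp(round(BLACK + (WHITE - BLACK) * v))


def quantize_chroma(c):
    """c: colour difference normalized to -0.5 .. +0.5 (0.0 means no colour)."""
    return clamp(round(CHROMA_ZERO + CHROMA_SPAN * c))


for v in (0.0, 0.25, 1.0, 1.05):
    code = quantize_luma(v)
    print(f"Y {v:4.2f} -> code {code:3d} (binary {code:08b})")
```

Values slightly outside black or white (overshoots, super-white) still fit between 1 and 254, which is one practical reason for not placing black at 0 and white at 255.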
As we mentioned earlier, the number of vertical pixels in a TV frame offered by the PAL system
is 576, while in NTSC it is 480, which corresponds to the actual number of active lines in each
standard. It is important to remind the reader that every analog camera in CCTV generates interlaced
video (50 fields/s in PAL and 59.94 fields/s in NTSC). Interlaced video consists of TV fields displaced
in time (1/50 s apart in PAL and 1/59.94 s apart in NTSC). Because of this, when digitizing video with
moving objects, the interlaced effect may show up when the recording is made in frame mode. This is a
natural TV effect, a result of the interlaced scanning. It is not an error on behalf of the digitization,
as some may think. The objects may seem blurred in the direction of movement, and the faster the
object moves the more obvious this effect is.
A full frame exported image from a DVR that uses wavelet compression
Techniques called de-interlacing can minimize or completely eliminate this effect.
Such functions are available in various photo editing programs (such as Photoshop or
PhotoPaint), and some dedicated DVR programs can also do it.
A full frame exported image from a DVR that
uses MPEG-2 compression
The result of recording in frame mode, as opposed to field mode, is twice the vertical resolution,
making object edges smoother and showing more detail in an exported image (see the examples on this
and the next page). When playing back footage recorded in frame mode, an artifact of displaying alternating
fields becomes apparent: each successive field appears to jump up and down by one line. This is
again a natural result of how interlaced television works, and it is not an error in the playback, as some
may think. The digitized fields are displaced by one line because they come from cameras
complying with the 2:1 interlaced PAL or NTSC TV standard. When recording in frame mode, the DVRs
record two fields, so the price paid for this is file sizes twice as large (since two fields are
used by the system to make up a frame).
The difference between a field recorded image (left) and a frame recorded image (right)
It is interesting, then, to ask the following question: how does digitized video recorded in field mode
(720 × 288 for PAL, or 720 × 240 for NTSC) get reproduced to appear as 720 × 576 (or 720 × 480)
on a screen or when exported? This is done simply by duplicating each line. Such duplication
produces another prominent effect: jagged edges. The human eye is more sensitive to resolution in
the horizontal direction than in the vertical, and this is perhaps the reason why the majority of DVR
systems in practice are set to record in field mode. In some systems, however, vertical resolution might be
more important, so frame recording should be used. Some DVRs simply cannot record in anything other
than field mode, as that may be the only option they have.
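Field-mode line doubling and frame-mode interleaving of the two fields (often called “weaving”) can be sketched on tiny lists standing in for 720 × 288 fields. The function names are hypothetical, and real DVRs work on full-size fields and may use more elaborate de-interlacing.

```python
def line_double(field):
    """Field-mode display: repeat every line to fill a full-height frame."""
    frame = []
    for line in field:
        frame.append(list(line))
        frame.append(list(line))  # the duplicate is what causes jagged edges
    return frame


def weave(first_field, second_field):
    """Frame mode: interleave the two fields line by line."""
    if len(first_field) != len(second_field):
        raise ValueError("fields must have the same number of lines")
    frame = []
    for a, b in zip(first_field, second_field):
        frame.append(list(a))
        frame.append(list(b))
    return frame


# Tiny 4-pixel-wide, 2-line "fields" standing in for 720 x 288 PAL fields.
first = [[0, 0, 9, 9], [0, 9, 9, 0]]
second = [[0, 9, 9, 9], [9, 9, 0, 0]]
print(line_double(first))
print(weave(first, second))
```

Line doubling only repeats information, hence the jagged edges, while weaving two fields captured 1/50 s (or 1/59.94 s) apart doubles the vertical detail but shows the interlaced effect on anything that moved in between.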
The interlaced effect explained above appears only when the compressions are of the JPG or Wavelet
type, which, as will be explained later in this chapter, work on static TV fields and treat them
as still images. In temporal video compressions, such as MPEGs and the H.26x series,
the interlaced effect is compensated for by the process of motion prediction vectors; therefore this effect
is not as noticeable.
An exported BMP from an
actual thief (below) and the
details obtained in full TV
frame (top) and CIF (below)
All of the discussion we have had so far refers to the so-called full TV frame resolution. There are a
number of compression techniques that use one-quarter of the pixels in a full TV frame (i.e., 352 × 288
or 352 × 240 pixels). This size is usually referred to as the Common Intermediate Format (CIF) and is
typically used by MPEG-1 and H.261 video compressions. The purpose is to reduce the digitized format
to an acceptable minimum data streaming size, useful for video conferencing, and comparable to VHS
image quality. When resolution is discussed in systems using MPEG-1, H.261, and others based on the
CIF size, all of the calculations and picture resolution figures we have made earlier apply after halving
them. Thus, the equivalent analog resolution of a CIF size image is around
220 TVL. The CIF image pixel real estate is one-quarter of the full frame as defined by the
ITU-601 recommendation (one-half of the horizontal number of pixels and one-half of the vertical
number of pixels). For many applications this might be of sufficient quality, and it does offer a better
image update rate when recording or transmitting (since the size is one-quarter of what it would be in
ITU-601). It is especially useful in video conferencing, which is the original application the CIF format
was designed for. The CIF resolution, before compression, is comparable to the maximum resolution
of an analog VHS video (240 TVL). This is important to consider when designing a system where face
identification or license plate recognition is required. Some discussions and CCTV specifications
refer to the full ITU-601 frame as 4CIF, indicating that the pixel count is four times the CIF size. Also,
a QCIF size is available, which refers to Quarter CIF size (i.e., 176 × 144 pixels).
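The pixel counts quoted in this section are easy to check. The sketch below compares the full ITU-601 frame with CIF and QCIF, and uses 704 × 576 for 4CIF (exactly four CIF images) alongside the 720 × 576 full frame; the 4-megapixel still camera is the example from the text.

```python
FULL_PAL, FULL_NTSC = (720, 576), (720, 480)  # ITU-601 active frames
FOUR_CIF = (704, 576)                         # exactly four CIF images
CIF, QCIF = (352, 288), (176, 144)


def pixels(size):
    """Total pixel count of a (width, height) image size."""
    width, height = size
    return width * height


print("full PAL:", pixels(FULL_PAL), " full NTSC:", pixels(FULL_NTSC))
print("CIF / 4CIF:", pixels(CIF) / pixels(FOUR_CIF))
print("CIF / full PAL frame:", round(pixels(CIF) / pixels(FULL_PAL), 3))
print("QCIF / CIF:", pixels(QCIF) / pixels(CIF))
print("4 MP still camera / full PAL frame:", round(4_000_000 / pixels(FULL_PAL), 1))
```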
It is understandable that we want the best possible picture quality. But no matter what steps we take,
the compressed image cannot be of better quality than the original. The pixel count of a digitally
recorded image from any CCTV camera, even at full frame size, is only close to 415,000 picture
elements for PAL and 345,000 for NTSC, at best. One can certainly appreciate the difference between
a 400,000-pixel image and, say, a still image from a digital photo camera with 4,000,000 pixels. So when
you get a customer asking why he gets pixelization after he zooms into a digitized image exported from
his DVR, the answer is simple: that is the number of pixels the digital image has. CCTV cameras produce
images that are far inferior to images produced by a photographic camera, be it a film camera or a digital
one, and therefore they cannot be compared. So when you set out to design a system where faces and
license plates need to be recognized, the pixel count has to be taken into account. We will say a few more
words near the end of this chapter about what current CCTV standards recommend for such systems.
Countless DVRs are available these days for digital CCTV.
The need for compression
In order to find the data streaming rate required for ITU-601 digitized video we can make some simple
calculations. Multiplying the samples of each line (864 for PAL and 858 for NTSC) by the number
of lines in the system (625 and 525), then by the number of TV frames per second (25 and 29.97), and
assuming an 8-bit representation of luminance and 8 bits for the color differences (on average 4 for Cr
and 4 for Cb per pixel, since each is sampled at half the luminance rate), we get approximately the same
bit rate for both digitized TV systems.
For PAL: 864 × 625 × 25 × (8 + 8) = 216 Mb/s, of which the active video streaming is 720 × 576 ×
25 × 16 = 166 Mb/s.
For NTSC: 858 × 525 × 29.97 × (8 + 8) = 216 Mb/s, of which, similarly, the active video streaming
is 720 × 480 × 29.97 × 16 = 166 Mb/s.
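The bit-rate arithmetic above, together with the 4:4:4 and 10-bit case, can be written out as follows. Here 16 bits per pixel represents 8-bit luminance plus, on average, 8 bits of alternating Cb/Cr, and Mb/s means 10⁶ bits per second; the function name `bit_rate` is illustrative.

```python
BITS_PER_PIXEL_422 = 8 + 8  # 8-bit Y plus, on average, 8 bits of Cb/Cr


def bit_rate(width, height, frames_per_s, bits_per_pixel=BITS_PER_PIXEL_422):
    """Uncompressed bit rate in bits per second."""
    return width * height * frames_per_s * bits_per_pixel


SYSTEMS = {
    "PAL": {"total": (864, 625), "active": (720, 576), "fps": 25.0},
    "NTSC": {"total": (858, 525), "active": (720, 480), "fps": 29.97},
}

for name, s in SYSTEMS.items():
    total = bit_rate(*s["total"], s["fps"]) / 1e6
    active = bit_rate(*s["active"], s["fps"]) / 1e6
    print(f"{name}: {total:.0f} Mb/s in total, {active:.0f} Mb/s of active video")

# 4:4:4 gives 24 bits/pixel at 8 bits per sample, and 30 bits/pixel at 10 bits.
print("4:4:4 at 10 bits vs 4:2:2 at 8 bits:", (3 * 10) / BITS_PER_PIXEL_422)
```

The factor of 1.875 for 4:4:4 combined with 10-bit sampling is what makes the rate “almost twice as big.”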
This is the bit rate for digitized, noncompressed live video streaming according to ITU-601 with
4:2:2 sampling. If the 4:4:4 sampling strategy is used, the figure grows by half again (24 instead of 16 bits
per pixel), and if 10-bit instead of 8-bit sampling is used on top of that (as is done in broadcast TV for
video editing and processing), this figure of over 166 Mb/s becomes almost twice as big.
Such a streaming rate is impractically high, since even a 100BaseT network (100 Mb/s) does not have
sufficient bandwidth to carry even one live video, let alone multiple cameras, as we have them in CCTV.
So the first and most important thing that needs to be applied to the digitized video signal is compression.
Digital CCTV would be impossible without video compression.
Various video compressions are used in broadcast television as well, on the Internet for video streaming,
for DVD recording, and so on, but the CCTV industry makes the most of them, as it goes to the extreme with
compression technologies, often making the best compromises between the highest possible compression and
maximum picture quality. This is especially important when using multiple cameras in a typical DVR
(multiplexed recording with typically 16, 18, 24, or 32 cameras). There are a variety of techniques and
standards, which offer different advantages.
Analog-to-digital conversion signal flow in a typical digital video recording system
A typical noncompressed full frame video image in PAL is about 1.24 MB in size (720 × 576 × 3
= 1,244,160 bytes), where we assume 3 color components and 8-bit sampling, so that each sample
occupies 1 byte. In CCTV we work with compressed image sizes of less than 100 kB, and often even lower than 10 kB.
When video compressions are used (instead of image compressions), the compression is not
expressed as a field or frame size in kB, but rather as a streaming rate in kb/s or Mb/s. So, for example,
high DVD-quality video compression using MPEG-2 takes around 4 Mb/s. Decent-quality MPEG-4
streaming over the Internet can be around 256 to 512 kb/s. How far you can go in increasing the
compression depends on how much detail one is willing to sacrifice from the image and on the type of
compression used, but, again, there is no doubt: compression must be applied.
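The compression ratios implied by the sizes and rates above follow from simple division. In this sketch kB is taken as 1000 bytes, the raw frame is the 3-bytes-per-pixel figure used in the text, and the raw stream is the 166 Mb/s active 4:2:2 rate from the previous section; the labels are only illustrative.

```python
RAW_FRAME_BYTES = 720 * 576 * 3   # PAL frame, 3 colour components, 1 byte each
RAW_STREAM_BPS = 166e6            # active ITU-601 4:2:2 stream from the text


def ratio(raw, compressed):
    """Compression ratio as raw size divided by compressed size."""
    return raw / compressed


print(f"raw PAL frame: {RAW_FRAME_BYTES:,} bytes")
for kb in (100, 10):
    print(f"{kb} kB per frame: about {ratio(RAW_FRAME_BYTES, kb * 1000):.0f}:1")

for label, bps in [("MPEG-2, DVD quality", 4e6),
                   ("MPEG-4, 512 kb/s", 512e3),
                   ("MPEG-4, 256 kb/s", 256e3)]:
    print(f"{label}: about {ratio(RAW_STREAM_BPS, bps):.0f}:1")
```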
A typical split screening, usually available for live viewing and playback of multiple cameras
It should be made clear that other video processing can be done on a digitized signal as well, before
or after the compression. Some processing makes a simple division and recalculation in order to put
the images into smaller screens (as is the case with split screen compressors or multiplexers); other
processing may perform sharpening (which is actually an algorithm where every pixel value of the
image is changed on the basis of the values of the pixels around it) or video motion detection, and still
other processing may reduce the noise in the signal, and so on.
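Sharpening, as described above, recomputes every pixel from its neighbours. The minimal sketch below applies a commonly used 3 × 3 sharpening kernel (our choice for illustration; DVRs may use different filters) to a grayscale image stored as lists of 0 to 255 values.

```python
KERNEL = [[0, -1, 0],
          [-1, 5, -1],
          [0, -1, 0]]  # a common 3 x 3 sharpening kernel; weights sum to 1


def sharpen(image):
    """Return a new image where each pixel is recomputed from its 3 x 3 neighbourhood."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Replicate edge pixels at the image borders.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = min(255, max(0, acc))  # keep the 8-bit range
    return out


edge = [[50, 50, 200, 200] for _ in range(3)]  # dark-to-bright vertical edge
print(sharpen(edge)[1])
```

Applied to a dark-to-bright edge, the kernel pushes the dark side darker and the bright side brighter, which is what makes edges look crisper (and also why over-sharpening amplifies noise).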
A graphical comparison between typical good-quality image and video compression
ratios used in CCTV. Note the better efficiency of the MPEG-2 for the similar quality.
When a video signal is digitized and compressed, it is possible to store it (record it) or send it via a
network, the Internet, or any other communication channel much more quickly. These are only some
of the important advantages of digitized video, which are impossible with its analog version.
The advantages of using digital networked CCTV are obvious: in many businesses, campuses, or factories,
networks are already in place. Provided there is a green light from the responsible IT people for using
the corporate Local Area Network (LAN), the digital CCTV system can easily be retrofitted in such
environments, and furthermore, the distances can be increased by simply joining adjacent LANs to form
a Wide Area Network (WAN). It is quite obvious, in the age of the Internet, that a local CCTV system
can easily become a global large-scale system connecting continents as if they were just across the street.
Local networks and network cables have their limitations as well (this is discussed in detail in the
Networking chapter), and if longer distances are to be achieved, network repeaters are required (this
is usually done by network switches and routers).
Many digital CCTV systems today are designed to make use of the Internet, and once we get on the
Internet it seems as if there is no distance limitation since the Internet takes care of the repeaters and
signal amplification needed to get from point A to point B, no matter how far apart they are.
Digital CCTV cameras are now also available (usually referred to as IP or LAN cameras) which can
really bear this name, since they can be plugged into the existing LAN and be accessed via a web
browser using their IP address, as opposed to the digital signal processing (DSP) cameras we spoke
about in Chapter 5, which produce analog composite video. Admittedly, IP cameras are still used
mainly in smaller installations, in video conferencing over the Internet, or in industrial and specialized
scientific applications. At today’s stage of technology, the image quality and refresh rate of IP cameras
are not as high as those of analog cameras, but digital image processing and compression technology
is advancing so fast that there is no doubt the time will come when IP cameras can be compared with
their analog counterparts.
A typical IP camera (there is no BNC at the back). Courtesy of Axis
The data rate for good-quality live video from multiple cameras could be very high even when
compressed, and such a data rate would require better cables. More importantly, when talking to the
IT people at a site where such a system is to be put in place, you may well find that most of them have
concerns about the data bandwidth consumed by such a digital CCTV system. So very often we will be
faced with a requirement for a digital CCTV system with controlled bandwidth, or even with its own
dedicated LAN. Such an approach requires that we learn yet another new technology: networking,
TCP/IP concepts, and everything else you need to know when switching to digital. A chapter on this
topic is also included in this latest book on CCTV.
So let us now describe each of the compression technologies as we see and use them in CCTV.
Types of compressions
CCTV has it all: JPEG, M-JPEG, Wavelet, H.263, MPEG-1, MPEG-2, JPEG-2000, MPEG-4, H.264,
and so on. There are too many different image compression techniques. How do you know which one
is best for you?
The answer is, without any doubt, not easy to find. One has to understand the concept of digitized images
and the limitation of the TV standards, on top of which digitized video and compression limitations
have to be added.
In general, there are two basic types of image/video compressions: lossless and lossy.
Lossless compression offers very low compression ratios (usually not more than 3 to 4 times) and is generally not
used in CCTV, but in broadcast and video editing. So, the compression types we will concentrate on in
this book are the lossy ones. Lossy means that certain details of the image (video) are lost and cannot
be retrieved, no matter what we do after they are compressed. Good compression is not the one that
offers the highest squeeze, but the one that offers the best compromise between quality and small
file size.
One of the most popular image compressions today is the JPG, used most often in digital photography.
We all know it, have seen it, and have experienced that a typical JPG compression of up to 10 times
hardly introduces any visible deterioration in the picture. So if you have a 4 Megapixel digital camera,
for example, it would produce a noncompressed file size of around 12 MB. This is not a small size to
work with and is not an easy task if we need to store more than a couple of such images on a typical
32 MB flash card. Yet, if we use just a normal JPG compression of 10 times, we get a bearable size of
around 1 MB to work with and no noticeable losses. The problem in CCTV is that we usually want
compressions even higher than 10 times. Do not forget that, as discussed under the previous heading, one
noncompressed TV frame, when digitized, becomes a data packet of over 1 MB in size. A 10 times
compression will make it a file of around 100 kB, which some DVRs and IP cameras use, but very often
the need for long storage pushes the demand for compression much further than this. It is not rare to
hear manufacturers quoting 100 times compression per TV field. Common sense will tell you that many
more details are lost when going to higher compressions, and furthermore some artifacts are introduced.
Again, finding the best compromise between decent quality and small file size is the real answer. It is
also fair to say that there are some smart and interesting solutions (usually proprietary) that make a good
effort at getting further reduction in size by exploiting some kind of temporal redundancy (the static
background of an image, for example, is not recorded again, but only the difference between the new
image and the previous one), which in a way is somewhat similar to the principles on which MPEG and
H series compressions are based.
A good choice of compression, camera, and lens can show vehicle license plates clearly.
Noncompressed image (720 x 576 pixels) on the left (digitized and noncompressed
around 1.2 MB) and the same on the right compressed 100X with JPG compression
But no matter what compression you decide to use, the source of your video signal, the camera,
should have the best quality signal you can get. This means a good camera and a good lens. Only when
the original video signal is optimized and shows good detail and color can the digitized video be almost
as good.
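The temporal-redundancy idea mentioned above, recording only what changed since the previous image, can be sketched as a block-by-block comparison (sometimes called conditional replenishment). This is not any particular manufacturer’s method: the 8 × 8 block size, the mean-absolute-difference test, the threshold of 10, and the name `changed_blocks` are all hypothetical choices.

```python
BLOCK = 8        # hypothetical block size
THRESHOLD = 10   # hypothetical mean absolute difference, in 0..255 code steps


def changed_blocks(prev, curr, block=BLOCK, threshold=THRESHOLD):
    """Return (block_row, block_col) of blocks that differ noticeably from prev."""
    height, width = len(curr), len(curr[0])
    changed = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            total = count = 0
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    total += abs(curr[y][x] - prev[y][x])
                    count += 1
            if total / count > threshold:
                changed.append((by // block, bx // block))
    return changed


background = [[100] * 16 for _ in range(16)]   # static 16 x 16 scene
new_image = [row[:] for row in background]
for y in range(8, 16):                          # something enters the
    for x in range(0, 8):                       # lower-left block
        new_image[y][x] = 200
print(changed_blocks(background, new_image))
```

A recorder using this idea would compress and store only the listed blocks, with their positions, and the player would paste them onto the previous image.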
You cannot have, or reproduce, a detail in a digitally recorded picture if such a detail was not
seen by the camera in the first place. This is a very basic and trivial statement, but very often I have
seen security managers want to recognize a number plate in their digitally recorded image, which was
not seen by the camera in the first place. So a simple rule of thumb would be: a digitally recorded and
replayed image cannot appear better than the original signal coming out of the camera.
It is worthwhile investing in a good-quality camera and lens. A good-quality camera is one with a high-resolution
CCD or CMOS chip, a good signal/noise ratio, good dynamic range, and good low light performance,
fitted with a good lens. Based on practical experience, it should be noted that when using CCTV cameras
for digital recording, the signal/noise ratio is of high importance for the digitized image quality.
The resolution is important, but low noise performance is probably even more important, for the
simple fact that when there is too much noise in the image, the compression engine treats
the noise speckles as if they were useful content of the captured image. So, if your camera
has a low S/N ratio (i.e., high noise content), after compression the image will look worse than it
appears when viewed live. In simple words, the higher this ratio is (50 dB+), the better quality the
digitized signal will be.
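To get a feel for what an S/N figure means after 8-bit digitization, the sketch below converts it to RMS noise measured in ITU-601 luminance codes. It assumes the S/N is the black-to-white signal span (219 codes) over RMS noise, which is a simplification; camera S/N specifications are often weighted and measured in different ways.

```python
CODE_RANGE = 235 - 16  # 8-bit codes between ITU-601 black and white


def noise_rms_codes(snr_db, signal_span=CODE_RANGE):
    """RMS noise in 8-bit codes, assuming S/N = black-to-white span / RMS noise."""
    return signal_span / 10 ** (snr_db / 20)


for snr_db in (40, 46, 50, 56):
    print(f"{snr_db} dB S/N -> about {noise_rms_codes(snr_db):.2f} codes RMS noise")
```

Under this assumption a 50 dB camera produces noise of less than one code step, while at 40 dB the noise spans about two codes, enough for the compressor to spend bits on it.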
Once a good-quality analog signal is digitized, if it uses the ITU-601 recommendation it will be nearly
as good as the original signal (assuming we are using the full TV frame resolution) after which the
compression stage is the one that further reduces the picture quality. The compression stage is, in a way,
a resolution bottleneck. An important note should be made here not to confuse the number of pixels
with the compression loss of resolution. When using a full frame image capture and compression,
the number of frame pixels will still stay the same, say 720 × 576, but the artifacts produced with the
compression may change the picture resolution appearance. This is why we say that for a full TV frame
video, the compression stage is a resolution bottleneck.
DCT as a basis
One of the most common mathematical transformations used on two-dimensional images is the Discrete
Cosine Transformation (DCT). This is the basis for almost all compression techniques used in CCTV,
with the exception of the Wavelet and JPEG-2000. So JPEG, MPEGs, and H series compressions all
use DCT in one form or another. Because of this, it is important to say a few words about it.
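The DCT is closely related to the Fourier Transformation, which is a very good method for
analyzing signals in the frequency domain. The only “problem” is that the Fourier Transformation assumes
signals that are periodic and infinite, which is never the case in reality. In practice, sampled signals of finite
length are analyzed with the Discrete Fourier Transformation (DFT), and the Fast Fourier Transformation (FFT),
introduced in the 1960s, is an efficient algorithm for computing it rather than a different transformation.
The DCT is a close relative of the DFT that uses only real cosine functions; it treats each block as if it were
mirrored at its edges, which avoids the artificial jumps at block boundaries that a plain DFT would see, and it
can be computed with FFT-style fast algorithms.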
So how does the Discrete Cosine Transformation work? Spatial redundancy is found in all video material,
be that CCTV or broadcast television. If there is a sizable object in the picture (TV field), all of the
pixels representing that object will have quite similar values. This is redundancy; that is, it is possible
to reduce the amount of information by replacing the individual pixel values with their average and
defining only the area they occupy. Large objects produce low spatial frequencies, whereas small objects produce
high spatial frequencies. Generally, these frequencies will not be present at a high level at the same
time. Digitized video has to be able to transmit the whole range of spatial frequencies, but if a frequency
analysis is performed, only those frequencies actually present need be transmitted. Consequently, an
important step in compression is to perform a spatial frequency analysis of the image.
The illustration here shows how the two-dimensional DCT works. The image is converted one block at a time, and a typical block is 8 × 8 pixels. The DCT converts the block into a block of 64 coefficients. A coefficient is a number that describes the amount of a particular spatial frequency that is present. In the figure, the pixel blocks that result from each coefficient are shown. The top-left coefficient represents the average brightness of the block and so is proportional to the arithmetic mean of all the pixels; it is called the DC component. Going across to the right, the coefficients represent increasing horizontal spatial frequency. Going downwards, the coefficients represent increasing vertical spatial frequency.
The Discrete Cosine Transformation principles
9. Digital video
Now, the DCT itself does not achieve any compression. In fact, the word length of the coefficients will be longer than that of the source pixels. What the DCT does is convert the source pixels into a form in which redundancy can be identified. Because not all spatial frequencies are simultaneously present, the DCT will output a set of coefficients where some will have substantial values, but many will have values that are almost or actually zero. If a coefficient is zero, it makes no difference whether it is sent. If a coefficient is almost zero, omitting it will have the same effect as adding that spatial frequency to the image in the opposite phase. The decision to omit a coefficient is based on how visible that small unwanted signal would be, which is defined by the compression scale. If a coefficient is too large to omit, compression can still be achieved by reducing the number of bits used to carry it. This has the same effect as adding a small amount of noise to the picture. A typical, and unwanted, artifact of DCT compression is the “blocky” appearance of highly compressed images. This is due to the DCT function being applied to each 8 × 8 block of pixels separately.
The scanning pattern when doing Inverse DCT (IDCT) on blocks of 8 × 8 pixels
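The steps just described (transform, quantize, scan) can be sketched in a few lines of Python. This is a minimal illustration, not an encoder: it uses a direct, slow textbook form of the orthonormal 8 × 8 DCT-II, a uniform quantizer with an arbitrary step of 10 (real JPEG and MPEG encoders use per-coefficient quantization tables, which are not modeled here), and the zig-zag scan that reads coefficients from low to high frequency. The function names are hypothetical.

```python
import math

N = 8  # block size used by JPEG and MPEG

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows.

    out[u][v]: u = vertical frequency (row), v = horizontal frequency (column).
    """
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):          # pixel row
                for x in range(N):      # pixel column
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Uniform quantizer: divide by step and round, so small coefficients become 0."""
    return [[round(c / step) for c in row] for row in coeffs]

def zigzag_order(n=N):
    """(row, col) pairs in zig-zag order, lowest spatial frequencies first."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

# A smooth block: brightness ramps gently from left to right.
ramp = [[100 + 2 * x for x in range(N)] for _ in range(N)]
coeffs = dct_2d(ramp)
q = quantize(coeffs, step=10)
scanned = [q[r][c] for r, c in zigzag_order()]
print(scanned[:6], "nonzero:", sum(1 for v in scanned if v != 0))
```

For this smooth block only the DC term and the lowest horizontal frequency survive quantization, so the zig-zag scan ends in a long run of zeros, which is exactly what the later entropy coding exploits.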
Readers should note that wavelet compression differs from JPEG in that it “looks at” the whole image, not blocks of 8 × 8 pixels, and hence wavelet artifacts have a “foggy” rather than a “blocky” appearance. Both DCT-based and wavelet-based compressions lose some information, and this is why they are called lossy compressions. The idea is to find the best compromise between high compression, to reduce file size, and the best image quality without too much visible loss.
The variety of compression standards in CCTV
In CCTV we use a variety of video and image compressions, probably more than in any other industry. For example, in photography, JPEG is used most often when storage space has to be saved. In broadcast television MPEG-2 is the dominant one, while on the Internet and in the computer industry, MPEG-4 has become a very common compression format.
In CCTV we find almost all types of compression in a variety of products. In order to understand them correctly, we should divide them into two main categories, as already explained under the previous few headings: compression applied to still images, which we call image compression, and compression applied to a continuous streaming video signal, which we therefore call video compression. Image compressions work on static images, while video compressions use time as an important variable when reducing image redundancy, and this is why video compressions are often referred to as temporal compressions.
Each type has its own advantages, and often it is not easy to disregard one in favor of the other. Typically, image compressions are used on multiplex DVRs, where multiple cameras are mixed and recorded onto a single set of drives. Some DVR manufacturers use two different compressions in the same machine, depending on the need: one could be for local recording, for example, and the other for remote transmission over narrow-bandwidth communication channels, where temporal compression might be more efficient. So it is important to understand them all and to keep an open mind and flexibility with respect to which one is to be used for a particular system.
Some authors classify compressions according to the standards body that proposed them, such as ITU-T or ISO. But there are also many proprietary compressions that some manufacturers offer as their own, so we cannot really use such a division. Furthermore, new developments indicate that the ITU-T and ISO/IEC groups will merge their work: they have agreed that the ITU-T and ISO/IEC JTC1 will join their efforts in the development of the emerging H.264 standard, which was initiated by the ITU-T committee.
The time progression of various video standards and the ITU-T and ISO/IEC joint work
The following are the most common image compressions used in CCTV today, in chronological order:
• JPEG – A widespread standard, in existence for over 15 years. Uses DCT-based compression. Incorporated in and used by many programs, such as image editors and web browsers.
• M-JPEG – A variation on JPEG, and not really a standard. M-JPEG stands for Motion JPEG, where each image is an independently compressed TV field or frame using JPEG compression.
• Wavelet – A very popular image compression in CCTV. Preserves detail more efficiently than JPEG, as it does not divide the image into blocks of 8 × 8 pixels.
• JPEG-2000 – A standardized version of wavelet compression. Plug-ins are available for JPEG-2000 for a variety of image editing programs and web browsers.
• Motion JPEG-2000 – Similar to M-JPEG, but this time using JPEG-2000 as a basis.
The following is the evolution of video compressions:
• H.261 – Low bit-rate technique standardized by the ITU in 1988 for audiovisual services.
• MPEG-1 – ISO standard, created as a modification of H.261 for the transfer of video onto CD
at low bit rates (at around 1.5 Mb/s).
• MPEG-2 – Introduced for broadcast-quality video: uses lower compression levels to enable transfer of high-quality video. Today used by the majority of TV stations, DVDs, cable television, and many DVR manufacturers.
• H.263 – Developed from H.261, drawing on MPEG-1 and MPEG-2, to achieve higher levels of video compression at low bit rates while maintaining high picture quality. Adopted worldwide in 1996 and revised in 1998. H.263+ and H.263++ are enhanced versions of H.263.
• MPEG-4 – Developed as an object-based compression. There are a few versions of it. Handles compression of video and audio over a wide variety of streaming rates. Suitable for anything that uses narrow bandwidths, from mobile phones and the Internet to television.
• MPEG-7 – New; defines an interoperable framework for content descriptions.
• MPEG-21 – New; describes the big picture of handling all objects in a variety of MPEGs.
• H.264 (also called MPEG-4 Part 10, or AVC) – The newest work, building on H.263 and MPEG-4, which offers a wide range of video quality, including more efficient coding for HDTV (quoted as up to three times more efficient than MPEG-2).
So let us now analyze all of them separately.
JPEG
JPEG stands for Joint Photographic Experts Group of the ISO, which is the original name of the committee that prepared the digital photographic standard.
JPEG is also the name of the standardized image compression mechanism, which uses DCT in order to reduce image redundancy. It works only with still digital images, and the image resolution is not fixed by the standard.
Although JPEG is widely used in digital photography and web-based technology, we use it in CCTV as well, where compression is applied to the digitized video (TV fields or TV frames), treating them as independent still snapshots.
JPEG has a subgroup recommendation for lossless compression (of about 2:1), but, as we mentioned earlier, in CCTV we are more interested in lossy JPEG compression, where compression factors of over 10× are possible while the loss of picture quality appears insignificant to the human eye. JPEG works by transforming blocks of 8 × 8 picture elements using the discrete cosine transformation (DCT).
JPEG is designed to exploit known limitations of the human eye, such as the fact that fine chrominance details are not perceived as well as fine luminance details. For each separate color component, the image is broken into 8 × 8 blocks that cover the entire image. These blocks form the input to the DCT. Typically, the pixel values vary slowly within an 8 × 8 block, so most of the energy is at low spatial frequencies. A transformation that can concentrate this energy into a few coefficients is the two-dimensional 8 × 8 DCT. This transformation, studied extensively for image compression, is extremely efficient for highly correlated data.
JPEG stores full-color information at 24 bits/pixel (16 million colors), compared, for example, to the Graphics Interchange Format (GIF), another compression format popular among PC users, which can store only 8 bits/pixel (256 or fewer colors). Gray-scale images do not compress by such large factors with JPEG, because the human eye is much more sensitive to brightness variations than to hue variations, and JPEG can therefore compress hue data more heavily than brightness data.
The DCT blocking used in JPEG
A 49 kB JPG TV field image of the CCTV Labs test chart; enlarged detail on the right
A 15 kB JPG TV field image of the CCTV Labs test chart; enlarged detail on the right
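To put the 49 kB and 15 kB file sizes in the captions above into perspective, here is a rough compression-factor calculation. It assumes a PAL TV field of 720 × 288 pixels stored as 24-bit color before compression and 1 kB = 1000 bytes; the captions do not state the capture format, so these are assumptions of this sketch.

```python
def raw_field_bytes(width=720, lines=288, bits_per_pixel=24):
    """Uncompressed size of one TV field (assumed PAL 720 x 288, 24-bit color)."""
    return width * lines * bits_per_pixel // 8

def compression_ratio(raw_bytes, compressed_bytes):
    return raw_bytes / compressed_bytes

raw = raw_field_bytes()  # 622,080 bytes
for size_kb in (49, 15):
    ratio = compression_ratio(raw, size_kb * 1000)  # assumes 1 kB = 1000 bytes
    print(f"{size_kb} kB JPEG field -> about {ratio:.1f}:1")
```

Under these assumptions the two images correspond to roughly 13:1 and 41:1, consistent with the “over 10 times” compression factors quoted for lossy JPEG.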
An interesting observation is that a gray-scale JPEG file is generally only about 10 to 25% smaller than a full-color JPEG file of similar visual quality. It should also be noted that JPEG is not suitable for line art (text or drawings), as the DCT does not handle very sharp black-and-white edges well.
JPEG can be used to compress data from different color spaces such as RGB (video signal), YCbCr
(converted video signal), and CMYK (images for the printing industry) as it handles colors as separate
components. The best compression results are achieved if the color components are independent
(noncorrelated), such as in YCbCr, where most of the information is concentrated in the luminance
and less in the chrominance.
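The decorrelation that YCbCr provides can be seen with a simple color conversion. This sketch uses the full-range JFIF coefficients (the BT.601 luminance weights as used in ordinary JPEG files); the function name is hypothetical. Neutral pixels land exactly on the chrominance midpoint of 128, so for typical scenes most of the variation ends up in Y.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range 8-bit RGB -> YCbCr with the JFIF coefficients (BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def to_byte(value):
        # Round to the nearest integer and clamp to the 0..255 range of 8-bit data.
        return max(0, min(255, round(value)))

    return to_byte(y), to_byte(cb), to_byte(cr)

for name, rgb in [("gray", (100, 100, 100)), ("white", (255, 255, 255)), ("red", (200, 40, 40))]:
    print(name, rgb, "->", rgb_to_ycbcr(*rgb))
```

Gray and white give Cb = Cr = 128 (no color information at all), while only the saturated red moves the chrominance values far from the midpoint.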
Since JPEG files are independent of each other, when JPEG is used in CCTV recording the images can easily be played back in reverse. Playback speed can be increased or reduced, and images can be copied as single files or groups of files.
M-JPEG
Motion JPEG (or M-JPEG) is a JPEG derivative typically used in CCTV. M-JPEG does not exist as a separate standard; rather, it is a rapid flow of JPEG images that can be played back at a sufficiently high rate to produce an illusion of motion. Because the relation between individual frames is not taken into account in M-JPEG, this method achieves relatively low compression rates compared to temporal compressions, such as H.26x or MPEG, described later. Nevertheless, M-JPEG is used by some DVR manufacturers in systems where multiple cameras are recorded.
The M-JPEG method is not internationally standardized, and JPEG does not include a transmission
standard. The implementations of different manufacturers are therefore incompatible. As a variation,
the difference between consecutive images is often also coded with the JPEG method to achieve a
further reduction in the volume of data. This differential frame method is also not standardized, so that
the decoder of the same manufacturer is required for decoding.
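Because the differential frame method is proprietary, the exact scheme is not known; the sketch below only shows the principle, with hypothetical function names. It takes the plain (lossless) difference between two consecutive frames, whereas real products would JPEG-compress the difference image, which adds loss.

```python
def frame_difference(prev, curr):
    """Encoder side: pixel-by-pixel difference between consecutive frames."""
    return [c - p for p, c in zip(prev, curr)]

def apply_difference(prev, diff):
    """Decoder side: rebuild the current frame from the previous frame plus the difference."""
    return [p + d for p, d in zip(prev, diff)]

prev = [16] * 64                 # a static 8 x 8 scene, flattened row by row
curr = list(prev)
curr[10], curr[11] = 200, 180    # a small change where something moved
diff = frame_difference(prev, curr)
print(sum(1 for d in diff if d != 0), "of", len(diff), "values are nonzero")
```

In a mostly static surveillance scene the difference image is almost entirely zeros, which compresses far better than a full frame; the catch, as noted above, is that the decoder must have the previous frame and must know the manufacturer's method.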
A 45 kB Wavelet field image of the CCTV Labs test chart; enlarged detail on the right
A 15 kB Wavelet field image of the CCTV Labs test chart; enlarged detail on the right
Wavelet
For many decades, scientists have wanted more appropriate functions than the sines and cosines that form the basis of the Fourier analysis behind the DCT, in order to approximate choppy signals. By definition, sines and cosines are nonlocal functions (they are periodic and stretch out to infinity). This is the main reason they do a very poor job of approximating sharp changes, such as high-resolution details in a finite, two-dimensional picture. This is the type of picture we most often have in surveillance time-lapse multiplexed recording, as opposed to the continuous stream of motion images in broadcast television. Wavelet analysis works differently, and it is more efficient in preserving small details.
The wavelet mathematical apparatus was first explicitly introduced by Morlet and Grossmann in their work on geophysics during the mid-1980s. As a result, wavelet compression was first used in scientific data compression, such as in astronomy and seismic research. It soon proved extremely useful in CCTV, when the Analog Devices ADV601 wavelet compression chip was introduced. Wavelet compression transforms the entire image, as opposed to the 8 × 8 sections in JPEG, and is more natural, as it follows the shape of the objects in a picture. This is why wavelet has become especially attractive for CCTV.
One of the clever wavelet ways of coding a picture and reducing the redundancy by a zig-zag method
With wavelets, we can use approximating functions that are contained in finite domains. Wavelets are
functions that satisfy certain mathematical requirements and are used in representing data or other
functions in wavelet analysis. The main difference compared to the FFT (DCT) analysis is that the
wavelets analyze the signal at different frequencies with different resolutions, i.e., many small
groups of waves, hence the name wavelet. The wavelet algorithms process data at different scales or
resolutions and try to see details and the global picture, or as some wavelet authors have said, “see
the forest and the trees” as opposed to Fourier analysis which “sees just the forest.”
Wavelets are well suited for approximating data with sharp discontinuities. The wavelet analysis
procedure is to adopt a wavelet prototype function, called an analyzing wavelet or mother wavelet.
Time analysis is performed with a contracted, high-frequency version of the prototype wavelet, whereas
frequency analysis is performed with a dilated, low-frequency version of the prototype wavelet. Because
the original signal or function can be represented in terms of a wavelet expansion (using coefficients
in a linear combination of the wavelet functions), data operations can be performed using just the
corresponding wavelet coefficients.
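The simplest wavelet, the Haar wavelet, shows these ideas on a single picture line. This is a minimal sketch using the averaging form of the Haar transform (pairwise means plus half-differences); it is not the filter used by the Analog Devices chips, and JPEG-2000 uses the LeGall 5/3 and CDF 9/7 wavelets instead. The function names are hypothetical.

```python
def haar_step(signal):
    """One level of the (averaging) Haar wavelet: pairwise means and half-differences."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Undo one level: each (mean, half-difference) pair gives back two samples."""
    out = []
    for s, d in zip(approx, detail):
        out.extend([s + d, s - d])
    return out

def haar_decompose(signal, levels):
    """Repeat the Haar step on the approximation: coarser and coarser scales."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

def haar_reconstruct(approx, details):
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx

# One picture line: flat background with a sharp edge between samples 4 and 5.
line = [50, 50, 50, 50, 50, 200, 200, 200]
approx, details = haar_decompose(line, levels=3)
print("approximation:", approx)
print("details per level:", details)
```

The flat parts of the line produce zero detail coefficients at every scale, and the edge shows up only in the coefficients whose support covers it: the “forest” is in the single approximation value, the “trees” in a few local details. The transform is exactly invertible, so any loss comes only from quantizing or dropping coefficients.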
Another interesting feature of wavelet is the “Area of Interest” or “Quality Box” function, where the presence of an object in the image can be detected, based on motion for example, and that area compressed with better quality than the rest of the same image. By using such intelligent selection, the file size is kept extremely small, yet the best detail is preserved where the important object is.
Wavelet chips offer a so-called Area of Interest, sometimes called Quality Box, shown on the right.
JPEG-2000
JPEG-2000 (ISO/IEC 15444) is basically a standardized version of wavelet compression, produced by the JPEG group. At the time when wavelet compression chips were introduced by Analog Devices, back in the 1990s, there was no common or standardized wavelet file format. The JPEG group recognized the superiority of wavelet and started working on a new standard for image compression. Its release was scheduled for the year 2000, hence the name JPEG-2000.
The JPEG-2000 standard paves the way for wide usage of wavelet compression with full compatibility between various products and programs. Many software plug-ins and hardware compression chips can be found today, and images can be exchanged between various platforms. It is possible, for example, to download a JPEG-2000 Photoshop or web-browser plug-in from the Internet. Some photo editing programs, such as Corel Photo Paint and JASC Paint Shop Pro, already embed a JPEG-2000 codec. This is the purpose of standardization: to have an exchangeable file format between a variety of programs. Many manufacturers now have JPEG-2000 hardware codec chips in their range of products.
Furthermore, the JPEG-2000 standard defines the use of embedded information about the author or source of the image or, what is interesting for us in CCTV, the originality (authenticity) of the image. There are some variations of JPEG-2000, one of which refers to motion video and is called Motion JPEG-2000.
Courtesy of Analog Devices
The new Analog Devices ADV202 chip uses JPEG2000 and promises a lot, both in HDTV and CCTV.
Motion JPEG 2000
Motion JPEG-2000 is new, and although not used in CCTV yet, it is mentioned here as a highly flexible and promising format. Because of wavelet scalability, Motion JPEG-2000 can deliver video of different frame sizes and frame rates, on the fly, from the same video stream. This is ideal for full-frame local storage along with subresolution transmission over narrow bandwidth. Motion JPEG-2000 uses only key-frame (intra-frame) compression, allowing each frame (TV field) to be independently accessed. Key-frame compression provides the extremely accurate frame-by-frame time stamps needed for surveillance and evidential procedures. This is important for multiplexed recording in CCTV, but also for video editing. Real-time encoding allows video to be compressed at capture time, enabling more efficient storage and higher-quality video transmission over a network or the web.
MPEG-1
MPEG-1 (ISO/IEC 11172) is one of the first video compression standards, proposed by the ISO's Moving Picture Experts Group soon after the introduction of H.261. It belongs to the video compression group: it works with a continuous digitized video signal and includes two channels of audio. The output visual quality at typical bit rates (as used in VCDs, for example) is comparable to that of an analog VHS VCR. The audio layer of MPEG-1 (Layer III) is the now popular MP3 audio format.
MPEG-1 is defined to work with CIF-size video sequences (352 × 288 for a PAL source; 352 × 240 for an NTSC source). The color information is sampled at half of that resolution in each direction: 176 × 144 (or 176 × 120 for NTSC). Typical video rates for MPEG-1 are between 1 Mb/s and 3 Mb/s. Around 1.5 Mb/s was the data rate achievable by the majority of CD players at the time MPEG-1 was introduced, and this was one of the major applications for MPEG-1 video compression. Up to about one hour can be stored on a 700 MB CD, which is why two CDs were needed for VCD movies.
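The one-hour figure follows from simple arithmetic on bit rate and capacity. This sketch assumes a constant 1.5 Mb/s, decimal units (1 MB = 10^6 bytes), and ignores audio and file-system overhead; the function names are hypothetical.

```python
def storage_bytes(bit_rate_bps, seconds):
    """Bytes needed to store a stream of the given bit rate for the given time."""
    return bit_rate_bps * seconds / 8

def max_duration_minutes(capacity_bytes, bit_rate_bps):
    """Playing time that fits into a given capacity at a constant bit rate."""
    return capacity_bytes * 8 / bit_rate_bps / 60

print(storage_bytes(1.5e6, 3600) / 1e6, "MB for one hour at 1.5 Mb/s")
print(round(max_duration_minutes(700e6, 1.5e6)), "minutes fit on a 700 MB CD")
```

One hour at 1.5 Mb/s needs 675 MB, and a 700 MB disc holds a little over an hour, so a typical feature film needs two discs.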
MPEG does not define compression algorithms (although it is based on DCT), but rather the compressed bit stream: the organization of digital data for recording, playback, and transmission. The actual compression algorithms are up to the individual manufacturers, and their quality may vary.
The basic idea in all temporal compressions is to predict motion from frame to frame in the temporal direction and then to use the DCT to organize the redundancy in the spatial directions. The DCTs are done on 8 × 8 blocks, and the motion prediction is done in the luminance (Y) channel on 16 × 16 blocks. In other words, a block of 16 × 16 pixels in the current frame is coded with reference to a closely matching pixel block in a previous or future frame. This describes the backward prediction mode, where frames coming later in time are sent first to allow interpolation between frames. The DCT coefficients (of either the actual data or the difference between this block and the close match) are quantized, which means that they are divided by some value to drop bits off the bottom end. Hopefully, many of the coefficients will then end up being zero. The quantization can change for every “macroblock” (a macroblock is a 16 × 16 block of Y and the corresponding 8 × 8 blocks in both U and V). The results of all of this, which include the DCT coefficients, the motion vectors, and the quantization parameters (and other side information), are encoded using Huffman-style variable-length codes with fixed tables.
Courtesy of Dallmeier
An extremely simplified representation of how predicted pictures are calculated from I frames
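Since MPEG standardizes only the bit stream, each encoder is free to choose how it finds the “close match”. One common textbook approach is exhaustive block matching with the sum of absolute differences (SAD), sketched below under that assumption. The frames are tiny and an 8 × 8 block is used so the example stays readable; the function names are hypothetical.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a block in cur and a candidate block in ref."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))

def best_motion_vector(ref, cur, cx, cy, size=16, search=4):
    """Exhaustive search: try every offset within +/-search pixels, keep the lowest SAD.

    Returns ((dx, dy), cost), where (cx + dx, cy + dy) is the top-left corner of the
    best-matching block in the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best_vec, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(ref, cur, rx, ry, cx, cy, size)
                if best_cost is None or cost < best_cost:
                    best_vec, best_cost = (dx, dy), cost
    return best_vec, best_cost

def frame_with_square(width, height, left, top, side=4, value=255):
    """Black frame with one bright square, standing in for a moving object."""
    frame = [[0] * width for _ in range(height)]
    for y in range(top, top + side):
        for x in range(left, left + side):
            frame[y][x] = value
    return frame

ref = frame_with_square(16, 16, left=4, top=4)   # reference (e.g. I) frame
cur = frame_with_square(16, 16, left=6, top=5)   # object moved 2 right, 1 down
print(best_motion_vector(ref, cur, cx=4, cy=4, size=8, search=4))
```

Note the sign convention of this sketch: the vector points from the current block back to its match in the reference frame, so an object that moved 2 pixels right and 1 down is found at offset (-2, -1). A zero cost means the block can be sent as just a vector with no residual.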
There are three types of coded frames (pictures) in MPEG-1 (the same applies to MPEG-2): the intra
frames (I), predicted frames (P), and the bidirectional frames (B).
The I pictures are basically still pictures compressed much as in JPEG, and they are used as reference pictures (frames). The P pictures are predicted from the most recently reconstructed I or P frame. Each macroblock in a P frame can either come with a vector and difference DCT coefficients for a close match in the last I or P frame, or it can just be “intra” coded (as in the I frames) if there was no good match. The B pictures are predicted from the closest two I or P pictures, one in the past and one in the future. This is why they are called bidirectional, referring to the use of past and “future” images. This, by the way, is the source of the known delay (latency) associated with MPEG encoding.
The Group of Pictures (GOP) interrelation in MPEG (a GOP size 9 is shown above)
The combination of I, P, and B pictures in MPEG is called a Group of Pictures (GOP). If a GOP is composed of only one image, that would be only the I frame, and functionally it would be equivalent to Motion JPEG; there is no temporal redundancy (saving) in such a case. When the GOP size gets to around 12 or 15, it achieves the best compromise between good compression and not too large a latency.
A typical GOP sequence of 9, which always repeats itself, would look like this: I B B P B B P B B
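Why B pictures cause delay becomes clear when the display order is compared with the order in which pictures must be transmitted. This sketch assumes an anchor (I or P) picture every third frame (M = 3, a common choice but an assumption here) and a GOP of 9; the function names are hypothetical, and trailing B pictures are simply appended at the end rather than handled as a real open GOP would be.

```python
def gop_pattern(n=9, m=3):
    """Display-order picture types for one GOP: I first, then a P every m pictures."""
    return "".join("I" if i == 0 else ("P" if i % m == 0 else "B") for i in range(n))

def coded_order(display):
    """Transmission order: each B is sent only after both of its reference pictures."""
    out, waiting_b = [], []
    for index, ptype in enumerate(display):
        if ptype == "B":
            waiting_b.append((index, ptype))   # needs the next I/P before it can be decoded
        else:
            out.append((index, ptype))         # send the anchor first...
            out.extend(waiting_b)              # ...then the B pictures that preceded it
            waiting_b = []
    return out + waiting_b                     # trailing Bs with no later anchor (sketch only)

display = gop_pattern() + "I"   # one GOP plus the I picture that starts the next GOP
print("display:", display)
print("coded:  ", "".join(t for _, t in coded_order(display)))
print("indices:", [i for i, _ in coded_order(display)])
```

Picture 3 (a P) has to be sent before pictures 1 and 2, so the encoder must hold back B pictures until the following anchor has been captured and coded, and the decoder must wait for it too.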
Latency is a new side effect of the MPEG efforts to reduce redundancy by motion prediction. This is the price MPEG pays for getting better picture quality at lower data rates. Most MPEG machines offer a choice of bit rates and GOP sizes, a combination of which can be selected to reduce the latency to below noticeable levels by choosing a higher bandwidth and a smaller GOP size. Basically, the number of pictures in a GOP defines the latency. So if, for example, we have a GOP size of 12, in PAL (25 frames per second) this is about half a second, which becomes the delay. If we add the network latency to this delay, it becomes clear why a latency of nearly a second or even more is sometimes noticeable in MPEG coding.
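The rule of thumb above (latency roughly equal to the time spanned by one GOP, plus the network delay) can be turned around to pick a GOP size for a given latency budget. This is a sketch of that simplified model only; real encoder and decoder buffering differ between products, and the 80 ms network delay in the example is a hypothetical figure.

```python
import math

def gop_duration_ms(gop_size, frames_per_second):
    """Time spanned by one GOP, the rough latency estimate used in the text."""
    return 1000.0 * gop_size / frames_per_second

def max_gop_for_budget(budget_ms, frames_per_second, network_ms=0.0):
    """Largest GOP whose duration plus network delay stays within the budget."""
    return math.floor((budget_ms - network_ms) * frames_per_second / 1000.0)

print(gop_duration_ms(12, 25))                     # PAL, GOP of 12
print(max_gop_for_budget(200, 25))                 # 200 ms target, network delay ignored
print(max_gop_for_budget(200, 25, network_ms=80))  # with a hypothetical 80 ms network delay
```

A GOP of 12 in PAL spans 480 ms; to stay near 200 ms, the GOP would have to shrink to about 5 pictures, or fewer once network delay is counted, at the cost of a higher bit rate for the same quality.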
The latency may not even be noticed in a fixed-camera CCTV system, but it could clearly be a problem with PTZ camera control. So what is the acceptable latency when controlling a live video camera over a LAN with MPEG streaming? This is really defined by human reaction speed. When driving a car, for example, around 200 ms is taken as the fastest reaction a person can have. So, if we use this as a guide, a latency of around 200 ms or less can be considered practically acceptable.
Another interesting, and positive, side effect of bidirectional macroblock prediction is noise reduction, because of the averaging.
The practical application of MPEG-1 is most often in storing video clips on CD-ROMs, but it is also used in cable television and video conferencing. There are, however, some digital recorders designed especially for CCTV applications where real-time video is recorded using the MPEG-1 technique. In these applications, real-time video is more important than high-resolution video at a lower frame rate. The majority of higher quality MPEG-2 recorders are backwards compatible with MPEG-1 and can record and play back video streams made in MPEG-1.
MPEG-2
MPEG-2 (ISO/IEC 13818) is not simply a next-generation MPEG-1, but rather another standard, targeted at higher quality digital video with audio. It was proposed by the ISO's MPEG group in 1993, and, like MPEG-1, MPEG-2 is an Emmy Award-winning standard. The MPEG-2 standard specifies the coding formats for multiplexing high-quality digital video, audio, and other data into a form suitable for transmission or storage.
MPEG-2, like MPEG-1, does not limit its recommendations to video only, but also includes audio. It should be highlighted again that MPEG-2 is not a compression scheme or technique as such, but rather a standardization of handling and processing digital data in the fastest and most optimized way. MPEG-2 encoding can produce data rates well above 18 Mb/s, although in most practical CCTV applications there is hardly any visible difference between a live camera signal and its 4 Mb/s encoded video.
MPEG-2 is designed to support a wide range of applications and services of varying bit rate, resolution, and quality. The MPEG-2 standard defines a number of profiles and four levels for ensuring the interoperability of these applications. A profile defines the set of coding tools, such as the chroma sampling format and the scalability of the bit stream. A level defines the limits for image resolution and Y (luminance) samples per second, the number of video and audio layers supported for scalable profiles, and the maximum bit rate per profile.
As a compatible extension, MPEG-2 video builds on the MPEG-1 video standard by supporting interlaced video formats and a number of other advanced features. MPEG-2 today is used in almost all broadcast television services, such as DBS (direct broadcast satellite), CATV (cable television), HDTV (high-definition television), and of course the now popular movie format, DVD. A single-layer, single-sided DVD has enough capacity to hold two hours and 13 minutes of high-quality video, surround sound, and subtitles.
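The DVD figure implies an average data rate that can be checked against the 4 Mb/s mentioned earlier for good CCTV quality. This sketch assumes the nominal 4.7 × 10^9-byte capacity of a single-layer DVD is used entirely for the program, including audio and subtitles, so it gives an upper bound on the average video rate; the function name is hypothetical.

```python
def average_bit_rate_bps(capacity_bytes, seconds):
    """Average bit rate that exactly fills the given capacity over the given time."""
    return capacity_bytes * 8 / seconds

seconds = 2 * 3600 + 13 * 60                    # 2 h 13 min = 7980 s
rate = average_bit_rate_bps(4.7e9, seconds)     # single-layer DVD: 4.7 x 10**9 bytes
print(f"about {rate / 1e6:.2f} Mb/s for video, audio, and subtitles together")
```

The result, about 4.7 Mb/s in total, is of the same order as the 4 Mb/s that gives near-live quality in CCTV recording.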
Like MPEG-1, MPEG-2 is based on GOPs made up of I, P, and B pictures. The I pictures are intracoded, that is, they can be reconstructed without any reference to other pictures. The P pictures are forward predicted from the last I picture or P picture, that is, it is impossible to reconstruct them without the data of another picture (I or P). The B pictures are both forward predicted and backward predicted from the last and next I pictures or P pictures, that is, two other pictures are necessary to reconstruct them. Because of this, P pictures and B pictures are referred to as intercoded pictures.
Motion vectors are used to predict the movement of objects between I and P frames.
In its prediction algorithm, MPEG-2 works with motion vectors. Imagine an I frame showing a circle on a white background. A following P frame shows the same circle, but at another position. Prediction means supplying a motion vector, which declares how to move the circle in the I frame to obtain the circle in the P frame. This motion vector is part of the MPEG stream, and it is divided into a horizontal and a vertical part. These parts can be positive or negative: a positive value means motion to the right or downwards, respectively, and a negative value means motion to the left or upwards. But this model assumes that every change between frames can be expressed as a simple displacement of pixels. Because that is not always the case, the MPEG stream also contains a prediction error matrix, which helps in accurate reconstruction of the motion.
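The circle example can be mimicked with a tiny picture: the encoder sends a motion vector plus the prediction error, and the decoder adds the error to the shifted reference. For simplicity this sketch applies a single vector to the whole picture, whereas MPEG sends a vector per macroblock; it follows the sign convention described above (positive means right or down), and the function names are hypothetical.

```python
def shift(picture, mvx, mvy, fill=0):
    """Motion-compensated prediction with one vector for the whole picture.

    Positive mvx moves content to the right, positive mvy moves it down.
    """
    height, width = len(picture), len(picture[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = x - mvx, y - mvy
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = picture[sy][sx]
    return out

def subtract(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def blob(width, height, left, top, value):
    """A 2 x 2 bright spot standing in for the circle."""
    pic = [[0] * width for _ in range(height)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            pic[y][x] = value
    return pic

i_frame = blob(6, 6, left=1, top=1, value=9)    # the "circle" in the I frame
p_frame = blob(6, 6, left=3, top=2, value=10)   # moved 2 right, 1 down, slightly brighter

prediction = shift(i_frame, mvx=2, mvy=1)       # encoder sends the vector (2, 1)...
error = subtract(p_frame, prediction)           # ...plus the prediction error matrix
decoded = add(prediction, error)                # decoder: prediction + error
print(sum(v != 0 for row in error for v in row), "nonzero error values")
```

The displacement alone cannot explain the change in brightness, so the error matrix carries a small correction on just four pixels; everything else is predicted exactly by the vector.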
At the beginning of the CCTV digital era (only half a dozen years ago), very few DVR manufacturers were using MPEG-2. Today many more see the benefits of high-quality digital video recording, and many previously unexplored MPEG-2 functions are used in CCTV: for example, reverse playback, slow play forward or in reverse, very fast forward or rewind at up to 1024 times the normal speed, and even recording triggered by video motion detection.
Admittedly, MPEG-2 is not designed to work with multiple multiplexed cameras, as there would hardly be any temporal redundancy to use in that case. So, a DVR working with MPEG-2 video compression will usually stream a single camera's digitized signal to a hard disk, but some manufacturers have models where multiple channels are streamed concurrently onto the same hard drive. Knowing the data rate needed for good-quality video, for example 4 Mb/s, it can easily be calculated that with today's hard disk technology it is possible to have quite a few channels in one box, even if we allow for simultaneous playback from the same drive.
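The “easily calculated” channel count can be sketched as follows. The 40 MB/s sustained disk rate and the 50% safety margin are hypothetical figures chosen for illustration, playback streams are assumed to run at the same 4 Mb/s as recording, and the function names are hypothetical too.

```python
def gb_per_day(bit_rate_bps):
    """Disk space one channel uses in 24 hours (1 GB = 10**9 bytes)."""
    return bit_rate_bps * 86_400 / 8 / 1e9

def max_record_channels(disk_mb_per_s, channel_bps, playback_streams=1, headroom=0.5):
    """Recording channels that fit within a fraction (headroom) of the disk's
    sustained transfer rate, after reserving streams for playback or export."""
    usable_bps = disk_mb_per_s * 8e6 * headroom
    total_streams = int(usable_bps // channel_bps)
    return max(0, total_streams - playback_streams)

print(f"{gb_per_day(4e6):.1f} GB per channel per day at 4 Mb/s")
print(max_record_channels(40, 4e6), "channels on a hypothetical 40 MB/s disk")
print(max_record_channels(40, 4e6, playback_streams=4), "with four concurrent read streams")
```

Bandwidth is rarely the limit under these assumptions; storage is, since each 4 Mb/s channel fills about 43 GB per day. The same function shows how concurrent playback, export, and backup streams (as in the multiplex operation described below) eat into the disk's transfer budget.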
Although MPEG-2 encoding can be done in software with reasonably fast processors, in CCTV it is always preferable to have this done by dedicated hardware compression chips, so that at least the encoding (recording) is not compromised and there are no gaps in it. Decoding (playing back) can be done by software decoders, and there are quite a few around, since MPEG-2 is a standard. Windows Media Player, Apple QuickTime, RealPlayer, and so on, are all examples of software players capable of playing MPEG-2. With some MPEG-2 DVRs it is possible to burn a CD or DVD with MPEG-2 footage, which can be played directly on a commercial DVD player.
Many high-end DVR manufacturers offer hardware encoding (recording) while doing hardware decoding on a composite or Y/C monitor, as well as software decoding from the same machine of another point in time in the past (usually referred to as Time Shift technology) for playback or backup over the network. This multiple functionality is sometimes referred to as triplex, quad-plex, or penta-plex operation. In the latter case, the following five functions would be performed concurrently: continuous recording, playing back a certain instance from the past on a composite monitor, exporting footage from the past to a local external drive or CD, for example, playing back another instance from the past via the network on a PC, and backing up to another PC or network storage via the network, all at the same time. If all these processes are reading from, or writing to, the same hard disk (which would usually be the case), then the hard drive data transfer rate needs to be able to sustain such a speed. This is one important reason some manufacturers prefer to have a single-channel MPEG-2 DVR rather than multiple channels in one box.
The same GOP idea is used in MPEG-2 as in MPEG-1
MPEG-2 is suitable for any security application because it offers the best picture quality, but it is most often used in projects and systems where quick activities are typical, such as casinos and banks. Considerations have to be made when PTZ cameras need to be controlled over LANs with MPEG-2 and latency must be taken into account, but, as mentioned earlier, this can be reduced to around 200 ms or less with smart adjustments of the bit rate and GOP size.
It is fair to mention that, because of the high data rate it uses, MPEG-2 is not suitable for remote and narrow-bandwidth communications. Many manufacturers offer MPEG-4 as an add-on compression for such applications, since MPEG-4 is more flexible and is designed to make the most of narrow bandwidths, such as Internet DSL upload speeds of 128 kb/s, 256 kb/s, or more.
MPEG-4
MPEG-4 (ISO 14496) is another MPEG standard developed only recently. It is so recent that it was not
even practically used before the previous edition of this book was published five years ago.
MPEG-4 is the result of another international effort involving hundreds of researchers and engineers
from all over the world. MPEG-4, whose formal ISO/IEC designation is ISO/IEC 14496, was finalized
in October 1998 and became an International Standard in the first months of 1999.
The MPEG-4 visual standard is developed to provide users a new level of interaction with visual
contents. It provides technologies to view, access, and manipulate objects rather than pixels, with
great error robustness at a large range of bit rates. Application areas range from digital television
and streaming video to mobile multimedia, games, and of course CCTV and surveillance.
A major difference between MPEG-4 and the previous audiovisual standards is the object-based audiovisual representation model. An object-based scene is built using individual objects that have relationships in space and time, offering a number of advantages. The MPEG-4 standard opens new frontiers in the way users will play with, create, reuse, access, and consume audiovisual content. The object-based representation approach, where a scene is modeled as a composition of objects, both natural and synthetic, with which the user may interact, is at the heart of MPEG-4 technology. Unfortunately, this handling of objects (especially synthetic ones) and the interactivity are features we cannot make any use of in CCTV.
Motion compensation is block based, with appropriate modifications for object boundaries. The block size can be 16 × 16 or 8 × 8, with half-pixel resolution. MPEG-4 also provides a mode for overlapped motion compensation. Texture coding is based on the 8 × 8 DCT, with appropriate modifications for object boundary blocks. Coefficient prediction is possible to improve coding efficiency. Static textures can be encoded using a wavelet transform. Error resilience is provided by resynchronization markers, data partitioning, header extension codes, and reversible variable-length codes. Scalability is provided for both spatial and temporal resolution enhancement. MPEG-4 provides scalability on an object basis, with the restriction that the object shape has to be rectangular. This is perhaps one of the most useful features for us in CCTV, for it allows scalable streaming over narrow bandwidths.
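Half-pixel resolution means a motion vector may point between pixels, so the predictor needs sample values that do not exist in the reference frame. The sketch below computes them by averaging neighbors with rounding up, the convention used for half-pel prediction in MPEG-1 and MPEG-2; MPEG-4 additionally has a rounding-control option that this sketch does not model. The function names are hypothetical.

```python
def half_pel_row(row):
    """Upsample a row of integer pixels by 2, inserting rounded averages between them."""
    out = []
    for a, b in zip(row, row[1:]):
        out.extend([a, (a + b + 1) // 2])   # original pixel, then the half-pel sample
    out.append(row[-1])
    return out

def half_pel_center(a, b, c, d):
    """Sample at the center of four pixels (half-pel both horizontally and vertically)."""
    return (a + b + c + d + 2) // 4

print(half_pel_row([10, 20, 21]))
print(half_pel_center(10, 20, 30, 41))
```

A vector of, for example, 2.5 pixels then simply reads from the odd positions of the upsampled reference, giving smoother prediction of slow movement than whole-pixel vectors can.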
The MPEG-4 visual standard has been explicitly optimized for three bit-rate ranges: below 64 kb/s, 64 to 384 kb/s, and 384 kb/s to 4 Mb/s.
MPEG-4 Video offers technology that covers a large range of existing applications as well as new
ones. The low bit rate and error-resilient coding allows for robust communication over limited rate
wireless channels, useful for mobile videophones, space communication, and certainly, CCTV. At high
bit rates, tools are available to allow the transmission and storage of high-quality video suitable even
for studio and other very demanding content creation applications. The standard has evolved through
many versions and supports data rates beyond those of MPEG-2.
A major application area outside our industry is interactive web-based video. Software that provides
live MPEG-4 video on a web page is very common.
MPEG-4 supports both interlaced and progressive video material (although progressive scan is rarely
used in CCTV). The supported chrominance format is 4:2:0. In this format, the number of Cb and Cr
samples is half the number of luminance samples in both the horizontal and vertical directions. Each
component can be represented with a number of bits ranging from 4 to 12.
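To make the 4:2:0 arithmetic concrete, the short Python sketch below counts the luminance and chrominance samples in one frame. It assumes even frame dimensions and a fixed bit depth per sample; the function name `samples_420` is hypothetical, chosen for illustration only.

```python
def samples_420(width: int, height: int, bits_per_sample: int = 8) -> dict:
    """Count samples in one 4:2:0 frame (even width and height assumed).

    Cb and Cr are each subsampled by 2 horizontally and by 2 vertically,
    so each chroma plane holds one quarter of the luminance samples.
    """
    luma = width * height
    chroma_each = (width // 2) * (height // 2)
    total = luma + 2 * chroma_each
    return {
        "Y": luma,
        "Cb": chroma_each,
        "Cr": chroma_each,
        "total": total,
        "raw_bits": total * bits_per_sample,  # before any compression
    }


if __name__ == "__main__":
    print(samples_420(352, 288))  # one CIF frame
```

For a CIF frame this gives 101,376 luminance samples and 25,344 samples in each of Cb and Cr, so the frame carries 1.5 samples per pixel on average instead of 3.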
As with MPEG-2, the MPEG-4 standard refers to a number of different Profiles. The MPEG-4 conformance points are defined at the Simple Profile, the Core Profile, and the Main Profile. The Simple
Profile and Core Profile address typical scene sizes of QCIF and CIF, with bit rates of 64 kb/s,
128 kb/s, 384 kb/s, and 2 Mb/s. The Main Profile addresses typical scene sizes of CIF (352 × 288),
full standard definition ITU-R 601 (720 × 576), and High Definition (1920 × 1080), with bit rates of 2
Mb/s, 15 Mb/s, and 38.4 Mb/s.
MPEG-4 is constructed as a toolbox rather than a monolithic standard, using profiles that provide solutions in these different settings. Although MPEG-4 is a rather big standard, it is structured in a way
that solutions are available to match the needs. It is the task of each implementer to extract
from the MPEG-4 standard the technological solutions adequate to their needs, which are very likely a
small subset of the standardized tools.
MPEG-4 recorders are becoming more popular in CCTV, although they do not necessarily use the
same profiles; therefore, it should not be assumed that they all offer the same visual quality.
MPEG-4 does not replace MPEG-2, as some may think, but it does offer wider flexibility at lower
bit rates, and it does offer near-live video transmission at 256 kb/s and above. Some manufacturers
even include MPEG-4 in their DVRs just for the purpose of remote connectivity and control, while
still using other compressions for the local recording.
New standardization work by ITU-T and ISO is under way, in which the latest version (profile) of MPEG-4
and H.264 is expected to bring new compression levels, so that HDTV movies can be squeezed onto a
high-capacity DVD that is also suitable for high-quality music.
Although MPEG-7 and MPEG-21 (described briefly further in the text) are not the kind of video compressions we are used to in CCTV, it is important to mention them here, because they go one step further than compression. MPEG-1 and MPEG-2 provide interoperable ways of representing audiovisual
content, commonly used in broadcast television, video editing, and CCTV as well. MPEG-4 extends
this to many more application areas through features like its extended bit-rate range, its scalability, its
error resilience, its seamless integration of different types of objects in the same scene, its interfaces
to digital rights management systems, and its powerful ways to build interactivity into content.
MPEG-7 defines an interoperable framework for content descriptions way beyond the traditional
“metadata.” MPEG-7 has descriptive elements that range from basic signal features like colors, shapes,
and sound characteristics to high-level structural information about content collections. MPEG-7 is
also unique in its tools for structuring information about content.
MPEG-7 will complement MPEG-4, not replace it. MPEG-4 defines how to represent content;
MPEG-7 specifies how to describe it. MPEG-7 and MPEG-4 form a great couple, especially when
MPEG-4 objects are used. With MPEG-7, it is now possible to exchange information about multimedia content in interoperable ways, making it easier to find content and identify just what you wanted
to use. This could be an extremely powerful set of tools for us in CCTV where weeks and months of
information can be recorded on a set of hard drives. MPEG-7 could provide the answers about how to
find a particular object, a guy in the red shirt, for example, or a blue car that was stolen in the street.
MPEG-7 information will be, without any doubt, added to broadcasts, DVR recording, and various
visual search engines. It will greatly facilitate managing multimedia content in large storage drives.
Although currently in CCTV some products (DVRs) have the intelligence to find objects in certain
areas of activity (or inactivity), with the MPEG-7 such a search will be much more flexible and more
powerful, making CCTV even more efficient. We are yet to see when this will be implemented.
MPEG-21 is also a new standard, and it is not used in CCTV yet, but for the sake of completeness we
should mention it in this section.
The MPEG-21 goal is to describe a “big picture” of how the different elements needed to build an infrastructure
for the delivery and consumption of multimedia content (existing or under development) relate to
each other. The MPEG-21 world consists of Users that interact with Digital Items. A Digital Item can
be anything from an elemental piece of content (a single picture, a sound track) to a complete collection of audiovisual works. A User can be anyone who deals with a Digital Item, from producers to
vendors to end-users. Interestingly, all Users are “equal” in MPEG-21, in the sense that they all have
their rights and interests in Digital Items, and they all need to be able to express those. For example,
usage information is valuable content in itself; an end-user will want control over its utilization. A driving force behind MPEG-21 is the notion that the digital revolution gives every consumer the chance
to play new roles in the multimedia food chain.
The H.320 standard is an ITU-T recommendation. It consists of a series of substandards that deal with
individual aspects of a complete system. For example, H.261 describes video coding, and H.221 is
responsible for multiplexing audio, video, data, and control information.
The H.320 recommendation is intended mainly for video conferencing systems and videophones and
is optimized for transmission via ISDN (Integrated Services Digital Network). At 128 kb/s (2 × ISDN
B channels), it is possible to achieve good image quality with a very good image refresh rate. Because
of its large range of bandwidth from 64 to 1920 kb/s, it can be used via almost all communications
media (LAN, WAN). In particular, because H.320 was developed for two-way video communication
between human beings, this standard is optimized for real-time transmission and does not have
the latency typical for MPEG-1 and MPEG-2.
In person-to-person communication, it is important that delays remain below 100 to 200 ms because
otherwise natural conversation is difficult. This short latency is very useful for CCTV, especially when
PTZ cameras are remotely controlled. In fact, some DVR manufacturers that use other image compressions for recording camera images use one of the H.320 standards when switching to PTZ control.
An important feature that can be implemented with H.320, in compliance with the standard, is control
over the image quality. The user can choose between resolution-optimized and motion-optimized transmission,
or set a suitable compromise.
H.320 is not limited to image coding; it also standardizes all other components of a complete transmission system. The great advantage of H.320 therefore lies in its compatibility between terminals of
different manufacturers. For example, an ISDN videophone from one manufacturer can communicate
audiovisually with an ISDN video conferencing system or an ISDN video transmitter of another manufacturer if both support the H.320 standard.
H.261 is one of the oldest video compression standards and is the standard some DVR
manufacturers started using in CCTV at the very beginning of the DVR revolution. H.261
is the actual video compression part of the H.320 video conferencing standard. At the time when
H.261 was developed (during the 1980s), the Internet was not yet in public use, and the fastest digital method
of transmission was over ISDN lines. That means that it
was optimized to compress video for transmission over ISDN lines that provide a range of bandwidths
from 64 kb/s to about 1.5 Mb/s. Like the MPEG standards, H.261 specifies formats appropriate for both
storage and transmission of compressed video. Moreover, since bandwidth over ISDN is available in
increments of 64 kb/s, the H.261 standard permits the compression options to be adjusted in a manner
that increases required bandwidth in 64 kb/s increments to get higher video quality.
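The 64 kb/s granularity can be sketched as follows. This assumes up to 30 bonded B channels (30 × 64 = 1920 kb/s, the upper end of the H.320 range quoted earlier); the helper names are hypothetical and only illustrate the arithmetic.

```python
import math

B_CHANNEL_KBPS = 64  # capacity of one ISDN B channel


def available_rates(max_channels: int = 30) -> list[int]:
    """Aggregate bit rates (kb/s) reachable by bonding 1..max_channels B channels."""
    return [n * B_CHANNEL_KBPS for n in range(1, max_channels + 1)]


def channels_needed(target_kbps: float) -> int:
    """Smallest number of B channels whose combined capacity covers target_kbps."""
    return max(1, math.ceil(target_kbps / B_CHANNEL_KBPS))


if __name__ == "__main__":
    rates = available_rates()
    print(rates[:4], "...", rates[-1])  # [64, 128, 192, 256] ... 1920
    print(channels_needed(384))         # 6
```

Each extra B channel adds exactly 64 kb/s, so a link that must carry, say, 384 kb/s of video needs six channels.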
H.261 normally works with images in CIF (Common Intermediate Format) resolution (352 × 288 for
PAL and 352 × 240 for NTSC), a format that was in fact introduced together with H.261. A quarter of
this resolution was also introduced, and it is widely known as Quarter
CIF, or QCIF (176 × 144). Although the H.261 standard in one of its documents describes a
high-resolution mode with 704 × 576 pixels, the majority of DVRs using H.261 use CIF size images,
comparable to VHS, as was the case with MPEG-1. Although video can be scaled to larger sizes on
PC screens, the lower the resolution of the actual transmitted image, the more blocky and pixelated
the viewed image becomes.
H.261 has seen its greatest application in the deployment of a variety of H.320 compliant video
conferencing systems. H.261 compression is not really that impressive for higher quality video in
CCTV, but it was most useful for remote connectivity over narrow bandwidths. I use the past tense
“was” here rather than “is” because MPEG-4 has definitely overtaken H.261 in quality for the same
narrow bandwidth.
The H.263 standard was approved around 1996 as a further development of H.261.
H.263 has been optimized specifically for low data transfer rates below 64 kb/s within
the H.320 standard, for example, for connections via modem and analog telephone lines. H.263 is
an alternative to H.261 if both ends support this standard. Especially for transmission in the mobile
radio network GSM (9600 bit/s) or in the analog telephone network, use of H.263 improves the image
quality and image refresh rate. At higher data rates the quality is comparable to H.261.
By incorporating a more efficient video-compression algorithm, the H.263 standard offers higher video
quality than H.261 at every level of bandwidth, including ISDN. H.263 allows video to be transmitted at the very low bit rates required by modems in the range of 15 to 20 kb/s. The original intent was
to enable video calls and video conferencing over conventional telephone lines. Although QCIF may
support these applications, a new image resolution, Sub-QCIF, was added to ensure this capability.
Moreover, higher resolutions were added to exploit capabilities made possible by newer transmission
and compression technologies. Sub-QCIF (SQCIF) permits video as small as 128 horizontal pixels by
96 vertical pixels to be transmitted. The other two additions support image resolutions that are four
times and sixteen times the size of the CIF image (often referred to as 4CIF and 16CIF): 704 horizontal ×
576 vertical pixels and 1408 horizontal × 1152 vertical pixels, respectively. Of all these formats, H.263 equipment
must support only SQCIF, QCIF, and CIF; all others are optional.
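The relationships between these picture formats are easy to check numerically. The sketch below simply tabulates the sizes given above and expresses each one as a multiple of the CIF pixel count; the dictionary and function names are illustrative only.

```python
# Picture formats named in the text: (width, height) in pixels.
H263_FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}
MANDATORY = {"SQCIF", "QCIF", "CIF"}  # all others are optional


def size_relative_to_cif(name: str) -> float:
    """Pixel count of a format expressed as a multiple of the CIF pixel count."""
    w, h = H263_FORMATS[name]
    cw, ch = H263_FORMATS["CIF"]
    return (w * h) / (cw * ch)


if __name__ == "__main__":
    for name, (w, h) in H263_FORMATS.items():
        status = "mandatory" if name in MANDATORY else "optional"
        print(f"{name:>6}: {w:4d} x {h:<4d} = {size_relative_to_cif(name):6.3f} x CIF ({status})")
```

QCIF comes out at exactly one quarter of CIF, and 4CIF and 16CIF at exactly four and sixteen times.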
H.264 seems to be the most promising new standard development. It is produced by the ITU-T
Video Coding Experts Group (VCEG) together with the ISO’s MPEG. This historical and collective
effort is also known as the Joint Video Team (JVT). This standard is identical to ISO MPEG-4 part 10,
also known as AVC, for Advanced Video Coding. The final drafting work on the standard was completed in May 2003. This standard merges the know-how used in H.263 and MPEG-4.
H.264 is a name related to the ITU-T line of H.26x video standards, whereas AVC relates to its ISO
MPEG roots. Some people call the standard H.264/AVC, or AVC/H.264, to emphasize the common
heritage. The name H.26L, also related to its ITU-T history, is far less common but is still used.
The intent of the H.264 project was to create a standard that would lead to fast implementations, using
low bit rates, that is, implementations that would demand little from the decoder hardware and from
the network bandwidth. H.264 contains several new features that allow it to compress video much
more effectively than older codecs. There is a new Context-Adaptive Binary Arithmetic Coding
(CABAC) used in H.264 to losslessly compress syntax elements in the video stream. H.264 also implements an in-loop de-blocking filter that helps prevent the ringing and blocking artifacts common to
other DCT-based image compression techniques. In previous video standards, motion compensation is
handled by allowing blocks in a frame to refer only to the frame before it. H.264 allows frames to be
predicted from other frames that are arbitrarily far in the past. This usually allows modest improvements
in bit rate and quality in most scenes. But, for example, in certain types of scenes with rapid repetitive
flashing, it allows a massive reduction in bit rate. These ideas, along with many other new ideas, help
H.264 to perform significantly better than MPEG-4 ASP (Advanced Simple Profile) can. H.264 can usually perform radically
better than MPEG-2 at a fraction of the bit rate. Various tests and comparisons have shown that
H.264 offers at least two to three times better efficiency than MPEG-2 for the same picture quality.
H.264 uses sophisticated prediction of macroblocks.
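What a two- to three-fold efficiency gain means for a recorder can be estimated with simple arithmetic. The sketch assumes, purely for illustration, an MPEG-2 stream of 4 Mb/s recorded continuously for 24 hours, and treats the efficiency gain as a proportional reduction in bit rate at equal picture quality; real encoders will vary with scene content. The 4 Mb/s figure and the function name are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gigabytes_per_day(bit_rate_bps: float) -> float:
    """Storage (decimal GB) for one channel recorded continuously for 24 h."""
    return bit_rate_bps * SECONDS_PER_DAY / 8 / 1e9


if __name__ == "__main__":
    mpeg2_rate = 4e6  # assumed MPEG-2 rate in bit/s, for illustration only
    for factor in (1, 2, 3):
        rate = mpeg2_rate / factor  # same quality at 1/factor of the bit rate
        print(f"{factor}x efficiency: {rate / 1e6:.2f} Mb/s -> "
              f"{gigabytes_per_day(rate):.1f} GB/day")
```

Under these assumptions, 43.2 GB per camera per day drops to 21.6 GB at twice the efficiency and 14.4 GB at three times.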
In addition, the JVT is nearing completion of the development of some extensions to the original standard that are known as the Fidelity Range Extensions. These extensions will support higher-fidelity
video coding by supporting increased sample accuracy (including 10-bit and 12-bit coding) and higher-resolution color information (including sampling structures known as YUV 4:2:2 and YUV 4:4:4).
H.264 is already widely used for video conferencing. It has also been preliminarily adopted as a mandatory part of the future DVD specification known as HD-DVD, developed by the DVD Forum.
Like many ISO video standards, H.264 has a reference implementation that can be freely downloaded. Its main purpose is to give examples of H.264 features rather than to be a useful application per se. One leader in video editing and multimedia projects, Apple Computer, has already integrated H.264 into “Tiger,” the new version of Mac OS X. Others will no doubt follow.
HD DVDs of 30 GB just starting to appear (courtesy of Toshiba)
About pixels and resolution
All of the compression methods discussed previously operate on images built from one smallest element. This is the “building
block” of any digital still image or video: the pixel. It is important to analyze the pixel more closely, because it
defines the clarity of an image and how much we can see.
Pixel is short for picture element; pixels are sometimes also referred to as pels. These are the smallest
elements of any electronic (digitized) picture. Pixels are the atoms of an image. Understanding pixels
is especially important in digital photography, but the same could be said for us in CCTV, especially
with the introduction of the digital video recorder. Pixel terminology is also used when printing leaflets
or catalogues, and also when using computer LCD screens, and yet they may not necessarily have the
same meaning as in digitized video.
Pixels can be associated with image resolution, but understanding the differences between various
kinds of pixels is very important, since we often try to recognize a small detail (such as an intruder’s face)
in a highly compressed image.
In the offset printing industry, the picture elements are usually referred to not as pixels, but as
dots. They have the same meaning: a pixel (or dot) cannot be dissected any further to obtain additional,
meaningful information about the image of which it is part. So, in very simple terms, pixels contain
elementary information about the smallest details of a picture, which is the information about the
pixel’s color and the brightness of that color. In television terms, we refer to these attributes as the
chrominance and luminance of the picture element.
RGB phosphor grill mask on a color CRT and smallest picture elements on a monitor
RGB grill mask on a color CRT. Note the half-height vertical displacement for use with interlaced scanning.
Because of the need to represent a variety of colors and shades with only a limited number of primary
colors, pixels are composed of smaller details, each representing a certain value of their primary color.
So, in fact, pixels are not really the smallest elements of a picture; only as a group of all the primary
elements do they represent a “complete” pixel.
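The luminance and chrominance attributes of a pixel can be computed from its R, G, and B values. The sketch below uses the ITU-R BT.601 luma weights for standard-definition video and assumes gamma-corrected R'G'B' inputs normalized to 0..1; the function name is hypothetical. Note how green carries by far the largest share of the luminance.

```python
def rgb_to_ycbcr_601(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert normalized R'G'B' (0..1) to Y' (0..1) and Cb, Cr (-0.5..0.5).

    Uses the ITU-R BT.601 luma weights; Cb and Cr are the scaled
    color-difference signals (B' - Y') and (R' - Y').
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772  # 1.772 = 2 * (1 - 0.114)
    cr = (r - y) / 1.402  # 1.402 = 2 * (1 - 0.299)
    return y, cb, cr


if __name__ == "__main__":
    for name, rgb in {"red": (1, 0, 0), "green": (0, 1, 0),
                      "blue": (0, 0, 1), "white": (1, 1, 1)}.items():
        y, cb, cr = rgb_to_ycbcr_601(*rgb)
        print(f"{name:>5}: Y'={y:.3f}  Cb={cb:+.3f}  Cr={cr:+.3f}")
```

White gives full luminance with zero chrominance, while a fully saturated primary moves one of the color-difference signals to its limit.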
A very important question is: Are the pixels used in digital photography, television, and printing of the
same kind? The answer is – No, they are not. The differences between various pixels are the source
of many misunderstandings and misinterpretations in many imaging industries and applications, one
of which is CCTV.
In CCTV, as all of you would know, we use red, green, and blue phosphor colors to “simulate” other
colors. With the three primary colors (RGB) (these are the primary colors in television), we can
represent almost any other color we can see with our eyes. With the appropriate luminance intensity
of the R, G, and B phosphors we can also represent the range of brightness pixels can have (from black
to white), as well as subtle tones such as skin colors. The actual color mixing occurs in our eyes when viewing the pixels
from a typical viewing distance, which is usually so large relative to the pixel size that we perceive the
three elementary dots as one resultant color dot. Its new color is the result of the additive
mixing of the red, green, and blue phosphors in the TV screen pixels.
In analog television, which the majority of us still use (and, of course, it is also used in CCTV), pixels
as elementary details do exist at both ends of the image chain: at the input (i.e., the camera end) and
at the output (i.e., the monitor end). At the camera end, we use CCD chips, where the smallest elements,
the pixels, are usually made up of red, green, and blue components. These color pixel components respond to
the red, green, and blue portions of the spectrum of the projected image, thus producing electrons proportional
to the color component of the picture element projected at that physical location. In 3 CCD chip cameras the
light is split into three color groups: red, green, and blue spectral response. This split is done with a split prism,
and each of the three color groups is then projected onto its own CCD chip. This means there are three CCD chips,
one for each of the primary colors. Clearly, cameras with 3 CCD chips produce a high-quality video signal,
in terms of both resolution and color reproduction.
RGB elements on a Delta CRT are different.
The RGB CCD chip mosaic filter
Unfortunately, in CCTV, we rarely use 3 CCD chip
cameras because they are much more expensive and are usually bulkier. What we do use most often is
single-chip color cameras. So, in a single-chip color CCD camera each pixel is actually the collection
of the red, green, and blue primary color elements at that location. It is fair to say that there are CCD
chips where primary colors are not red, green, and blue, but rather cyan, yellow, and magenta (similar to
the printing primary colors), but such CCD chips are used far less in CCTV, and we will not consider
them as an important component of a CCTV system. If we did, however, we would need to know that
the cyan, magenta, and yellow components are converted to red, green, and blue values using a lookup
table inside the camera, since the composite video signal generated at the end of the camera electronics
still needs to be represented with RGB values. As can be seen from the simplified illustration of the
single CCD chip above, the filtering of RGB colors is in the form of a mosaic, and as a result it is called
a mosaic filter. It should be noted that there are twice as many green sensors as there are red or blue ones.
This is because the majority of the luminance information is contained within the green spectrum and
the human eye is most sensitive to green. These green cells are the ones that greatly influence
the resolution of the camera.
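The 2:1 ratio of green to red or blue sensors follows directly from the repeating tile of such a mosaic. The sketch below assumes the common RGGB (Bayer-type) arrangement; the actual layout on a given CCD may differ, and the function names are hypothetical.

```python
from collections import Counter


def mosaic(rows: int, cols: int) -> list[list[str]]:
    """Build an RGGB mosaic color filter pattern as a grid of 'R', 'G', 'B'."""
    tile = [["R", "G"],
            ["G", "B"]]  # one 2 x 2 tile: 1 red, 2 green, 1 blue
    return [[tile[r % 2][c % 2] for c in range(cols)] for r in range(rows)]


def filter_counts(grid: list[list[str]]) -> Counter:
    """Count how many sensor sites sit under each color filter."""
    return Counter(cell for row in grid for cell in row)


if __name__ == "__main__":
    grid = mosaic(4, 6)
    for row in grid:
        print(" ".join(row))
    print(filter_counts(grid))  # Counter({'G': 12, 'R': 6, 'B': 6})
```

Whatever the chip size, half of all sensor sites are green, and a quarter each are red and blue.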
Although it might be logical to assume that the resolution of a single-chip color CCD camera should
be obtained by multiplying the number of horizontal (three-color) pixels by 3/4 (to account for the 4:3 aspect ratio) in
order to get the resolution in TV lines, in CCTV practice this is not the case. Because of the mosaic
composition of single-chip color cameras, and because of the way an interlaced scanning image is
obtained in television, the real single-chip color CCD resolution is approximately 70 to 80% of that
figure. So, for example, a 768 × 582 pixel CCD chip will produce approximately 768/4
× 3 × 0.8 ≈ 460 TV lines. Three-CCD color cameras have an advantage of at least 100 TV lines
extra simply because they do not use such a mosaic; instead, all pixels of all three colors are used.
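The rule of thumb above can be written as a one-line function. This is only a sketch of the text’s approximation, with the 70 to 80% mosaic and interlace factor passed in as a parameter; the function name is hypothetical.

```python
def estimated_tvl(horizontal_pixels: int, efficiency: float = 0.8,
                  aspect_w: int = 4, aspect_h: int = 3) -> float:
    """Approximate horizontal resolution (TV lines) of a single-chip color CCD.

    TV lines are counted over a width equal to the picture height, hence
    the aspect-ratio scaling; `efficiency` models the 70-80% loss due to
    the mosaic filter and interlaced scanning.
    """
    return horizontal_pixels * aspect_h / aspect_w * efficiency


if __name__ == "__main__":
    for eff in (0.7, 0.8):
        print(f"768 pixels at {eff:.0%}: {estimated_tvl(768, eff):.0f} TVL")
```

For the 768-pixel example this gives roughly 403 to 461 TV lines, consistent with the approximately 460 TV lines quoted.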
Also, an important digression would be to remind the readers that in the days before the CCD chip
cameras (when tubes were used) because of the way a video image was read off the tube face plate
(scanning with a continuous electron beam), we did not have a discrete and finite smallest picture
element (as in CCDs). Discrete picture elements were introduced with color television, when TV sets
were made with CRTs with color grill. It was this color grill that split the beam into red, green, and
blue dots.
When using monochrome tube cameras we did not talk about pixels, but rather resolution – which was
directly dependent on the smallest electron beam that can be produced by the camera and reproduced
on the monochrome TV screen. If you recall, monochrome monitors had quite a high resolution, simply
because there was no physical limitation with any kind of mechanical grill, or mesh, as introduced
later in the color television development. It was purely up to the electron beam precision (and the
electronics driving it) to reproduce the details captured by the electron beam at the camera end. So,
coming to current CCTV technology, we need to understand that the resolution of an image is defined
primarily by the source, that is, CCD camera resolution, which is dependent on the number of pixels
such a CCD chip has.
We cannot show more detail on a monitor (even if the monitor could display more) than what the
actual CCD chip has captured. Although we can always state the number of pixels a CCD chip has,
we still use TV lines to qualify the detail we get from a camera. The
resolution in TV lines is measured with test charts, and in the real world there can never be a perfect
alignment of a test chart pattern relative to the projected image on the CCD chip. As a result, the measured TV lines
show less detail than the CCD pixel count would indicate. When a video signal is reproduced
on a monitor screen, the finest visible detail is limited by the lower resolution of the two: the camera
CCD or the monitor. If we have a very low-resolution monitor (for example, a small 23 cm
CRT with a 330 TV lines specification) and our camera produces a high-resolution 480 TV lines signal,
we can only see what the monitor shows: 330 TV lines. If we have, for example, a high-definition TV
monitor capable of showing over 700 TV lines, and we put our 480 TV lines camera signal through,
we can only see what the camera resolution shows: 480 TV lines.
In order to get a complete picture of resolution measurements, it is also important to mention that
lens resolving power, or quality, is measured in lines per millimeter (l/mm) (please refer to Chapter 3).
There are optical specification charts that show the contrast produced by the lens as a function of its
resolving power in l/mm. This is usually referred to as the Modulation Transfer Function (MTF).
Here things get more complicated, because MTF counts only the black lines on a white background,
that is, line pairs (as opposed to counting both black and white lines to express the resolution in TV
lines, as we do in CCTV).
Lens resolution expressed in line pairs per millimeter
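To relate a lens figure in line pairs per millimeter to camera TV lines, one common approximation multiplies by two (one black plus one white line per pair) and by the height of the imaging area. The sketch assumes nominal 4:3 sensor heights (3.6 mm for 1/3", 4.8 mm for 1/2", 6.6 mm for 2/3") and ignores the contrast fall-off that the MTF describes; the names are hypothetical.

```python
# Nominal image-area heights of common 4:3 CCD formats, in millimeters.
SENSOR_HEIGHT_MM = {'1/3"': 3.6, '1/2"': 4.8, '2/3"': 6.6}


def lens_limit_tvl(lp_per_mm: float, sensor_height_mm: float) -> float:
    """TV lines per picture height that a lens can resolve on a given sensor.

    Each line pair contains one black and one white line, hence the factor 2.
    """
    return 2 * lp_per_mm * sensor_height_mm


if __name__ == "__main__":
    for fmt, height in SENSOR_HEIGHT_MM.items():
        print(f"{fmt} sensor, 50 lp/mm lens: {lens_limit_tvl(50, height):.0f} TVL")
```

A lens resolving 50 lp/mm would therefore limit a 1/3" camera to roughly 360 TV lines, which illustrates why smaller imaging formats demand sharper lenses for the same TV-line figure.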
Dots per inch (DPI)
The term dots per inch (DPI) is commonly used today, but because of the different definitions of the
term dot it is a source of confusion and misinterpretation (somewhat similar to what we have when
we discuss TV lines and lines in defining CCTV resolution). In printing technology we express
resolution in dots per inch (DPI). Certainly, knowing that 1" is equal to 25.4 mm, the printing
resolution can also be expressed in dots per millimeter, but this is not common. So when we say 300
DPI resolution, this means more than 10 dots per millimeter (300/25.4 ≈ 11.8). This is certainly a very tiny
dimension, and the human eye cannot distinguish two different tiny color dots when they are that close
to each other in a 300 DPI print. In order to make a comparison, it is possible to convert the CCTV
screen resolution into dots per inch.
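The conversion between the two units is a single division by 25.4 mm per inch. A minimal sketch (function names hypothetical):

```python
MM_PER_INCH = 25.4


def dpi_to_dots_per_mm(dpi: float) -> float:
    """Convert dots per inch to dots per millimeter."""
    return dpi / MM_PER_INCH


def dot_pitch_mm(dpi: float) -> float:
    """Center-to-center spacing of dots, in millimeters, at a given DPI."""
    return MM_PER_INCH / dpi


if __name__ == "__main__":
    print(f"300 DPI = {dpi_to_dots_per_mm(300):.1f} dots/mm, "
          f"dot pitch {dot_pitch_mm(300):.3f} mm")
```

At 300 DPI the dots are under a tenth of a millimeter apart, which is why they merge into one color at normal reading distance.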
But here is one big but. The mixing of colors in the printing industry is done in a completely different
way: by subtractive mixing of primary colors other than RGB, namely cyan, magenta, and yellow. Black
is added for additional dark tones, although theoretically, CMY are sufficient to produce other colors.
All of you know that we call this CMYK printing, where CMYK color space is used. So basically, we
use four different inks when printing color magazines or books, and in order to produce the resultant
color the smallest picture elements in the printing industry are produced by having all these elementary
color dots very close to each other (similarly to the TV screen mixing). The difference compared to
TV monitors is that the colors are not positioned next to each other in a line (which is typically the case
with most LCD and CRT phosphor screens these days); rather, the four color dot screens are positioned
at various angles, such as 45º for black, 75º for magenta, 90º for yellow, and 105º for cyan (see the
illustration, although, unfortunately, we could not reproduce it in color for this book). In order to produce
high-quality print magazines or brochures, the printing industry requires 300 DPI resolution. So when
we read a magazine from a normal reading distance (typically 0.5 m), the color pixels cannot be detected
by a normal eye and we only see the resultant (subtractive) color mix.
Unfortunately, we cannot show a CMYK offset printing example here, but the enlarged
section (bottom right) should show the CMYK pixel pattern typical for offset printing.
Psychophysiology of viewing details
Through experiments and testing it has been found that the most a human eye can resolve is not
more than 5 to 6 lp/mm (line pairs per millimeter). This refers to an optimum distance between eye
and object of around 0.3 m, as when you are reading a fine text. This equates to a minimum angle of
about one-sixtieth of a degree (1/60°). So, 1/60º is considered the limit of angular discrimination
for normal vision. We can use this minimum angular vision for better understanding and optimizing
the psychophysiology of the viewing.
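The link between the 1/60° angle and the 5 to 6 lp/mm figure can be checked numerically, assuming that one resolvable line (half a line pair) subtends the acuity angle. This is a simplified geometric model, not a full description of human vision; the function name is hypothetical.

```python
import math

ACUITY_DEG = 1 / 60  # limit of angular discrimination for normal vision


def resolvable_lp_per_mm(distance_mm: float, acuity_deg: float = ACUITY_DEG) -> float:
    """Finest line-pair density (lp/mm) the eye can resolve at a given distance.

    One resolvable line subtends the acuity angle; a line pair is two lines.
    """
    line_width_mm = distance_mm * math.tan(math.radians(acuity_deg))
    return 1 / (2 * line_width_mm)


if __name__ == "__main__":
    print(f"{resolvable_lp_per_mm(300):.2f} lp/mm at 0.3 m")
```

At 0.3 m this gives about 5.7 lp/mm, inside the 5 to 6 lp/mm range; doubling the distance halves the resolvable density.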
A known viewing distance parameter from Chapter 6 recommends a CCTV viewing distance of
around seven times the monitor picture height. So it is important to understand that viewing distance is an
important factor in the psychophysiological experience of seeing details in an image. There is no benefit if
a viewer gets closer to the monitor, but it is also not going to get any better if he is positioned farther
away from the monitor screen. For analog PAL, with 576 active lines, the corresponding optimum
viewing distance for details is obtained by projecting the 5 to 6 lp/mm limit to the distance where the monitor
screen is. So, for example, if we use the 7 × picture height rule for a 15" (38 cm) CCTV monitor,
whose picture height would be around 23 cm, the recommended viewing distance is around 1.6 m.
The maximum resolving power of the human eye at that distance is simply the value shown in the graphics
below divided by about 5 (because 0.3 m goes into 1.6 m a little more than five times). This
equates to around 1 lp/mm (line pairs per millimeter), or a line width of about 0.5 mm, which is roughly what each of the 576
active lines will occupy on the screen. This assumes a high-quality, high-resolution monitor, of course.
Coming much closer to such a monitor will not show any more details, nor will going further away
be of any advantage. Coming closer to a CCTV monitor has the same effect as putting a much larger
monitor in place. If you decide, for example, to put a 21" color monitor at the same distance of
1.6 m instead of the 15" monitor, the picture clarity and video details will appear much worse to the
viewer. The optimum resolution distance for a 21" CCTV monitor would be 2.1 m.
The human eye acuity angle is 1/60º.
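The 15" monitor example can be reproduced with the same acuity model. The sketch assumes a 4:3 picture whose height follows from the nominal diagonal (a real CRT’s visible area is somewhat smaller) and 576 active PAL lines; the function names are hypothetical.

```python
import math

ACUITY_DEG = 1 / 60


def picture_height_mm(diagonal_inch: float, aspect_w: int = 4, aspect_h: int = 3) -> float:
    """Picture height from the nominal screen diagonal and aspect ratio."""
    return diagonal_inch * 25.4 * aspect_h / math.hypot(aspect_w, aspect_h)


def viewing_distance_m(diagonal_inch: float, heights: float = 7.0) -> float:
    """Recommended viewing distance as a multiple of picture height."""
    return heights * picture_height_mm(diagonal_inch) / 1000


def eye_line_width_mm(distance_m: float) -> float:
    """Smallest line width the eye resolves at the given distance."""
    return distance_m * 1000 * math.tan(math.radians(ACUITY_DEG))


if __name__ == "__main__":
    h = picture_height_mm(15)   # about 229 mm
    d = viewing_distance_m(15)  # about 1.6 m
    print(f"picture height {h:.0f} mm, viewing distance {d:.2f} m")
    print(f"one PAL line: {h / 576:.2f} mm, eye limit: {eye_line_width_mm(d):.2f} mm")
```

The height of one PAL line (about 0.4 mm) and the finest line the eye resolves at 1.6 m (about 0.47 mm) come out close to each other, which is the sense in which seven picture heights is an optimum distance for this monitor.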
Similar logic and calculation will show that a high-resolution computer screen, such as a CRT
with a dot pitch of 0.21 mm, is best viewed from a distance of around 0.7 m. The majority of LCD monitors
do not have such fine pixel detail, but typically around 0.28 mm, so they will actually look slightly
better if viewed from a distance of around 1 m.
The optimum distances for maximum eye resolution, based on 1/60º acuity
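The distances quoted for computer screens follow from requiring one pixel (or dot) pitch to subtend the 1/60° acuity angle. A minimal sketch under that assumption (function name hypothetical):

```python
import math

ACUITY_DEG = 1 / 60


def optimum_distance_m(pixel_pitch_mm: float, acuity_deg: float = ACUITY_DEG) -> float:
    """Distance at which one pixel (or dot) pitch subtends the eye's acuity angle."""
    return pixel_pitch_mm / math.tan(math.radians(acuity_deg)) / 1000


if __name__ == "__main__":
    for pitch in (0.21, 0.28):
        print(f"{pitch} mm pitch: {optimum_distance_m(pitch):.2f} m")
```

This gives roughly 0.72 m for a 0.21 mm pitch and 0.96 m for a 0.28 mm pitch.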
A much better looking picture would be shown on a computer screen (viewed from the same distance),
where a typical XGA monitor of 1024 × 768 pixels has an equivalent resolution of about 91 to 92 DPI. To
see how this was obtained, divide 1024 pixels by the width (about 11.2") of a 14" LCD notebook screen.
Computer screens also have a larger area (pixel count) and a higher frame rate than we use in CCTV.
Please note that in order to display such a high-quality image on the screen, the computer has to have
a high quality video card with sufficient memory capable of processing that number of pixels (1024
× 768) with the number of colors sufficient to replicate a live scene (24-bit color, equivalent to 16.7
million combinations of RGB values). Another important note: such a display complies with neither
the PAL nor the NTSC analog standard; it is a computer XGA graphic display.
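Both the pixel density and the video memory figures can be checked with a few lines of arithmetic. The sketch assumes a 4:3 screen measured by its nominal diagonal and an uncompressed frame buffer with no padding; the function names are hypothetical.

```python
import math


def screen_width_inch(diagonal_inch: float, aspect_w: int = 4, aspect_h: int = 3) -> float:
    """Width of a screen from its diagonal and aspect ratio."""
    return diagonal_inch * aspect_w / math.hypot(aspect_w, aspect_h)


def pixels_per_inch(h_pixels: int, diagonal_inch: float) -> float:
    """Horizontal pixel density, assuming a 4:3 screen."""
    return h_pixels / screen_width_inch(diagonal_inch)


def framebuffer_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Video memory needed to hold one uncompressed frame."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    print(f"XGA on a 14-inch 4:3 screen: {pixels_per_inch(1024, 14):.1f} DPI")
    print(f"24-bit colors: {2 ** 24:,}")
    print(f"XGA frame at 24 bits: {framebuffer_bytes(1024, 768):,} bytes")
```

The density comes out at about 91.4 DPI, 24 bits give 16,777,216 color combinations, and one XGA frame needs 2,359,296 bytes of video memory.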
Printed material has even better resolution per mm than any kind of monitor. This is why you may sometimes see finer detail in a video image printed on high-quality ink jet photo paper than in the same image on a CCTV monitor. We can express monitor resolution in DPI, but it does not mean quite the same thing as in printing, primarily because we (usually) do not view a monitor screen from as close as we read this book. So, let us assume we have a very high-resolution CCTV camera image displayed on a very high-quality CCTV monitor, quoted as having 500 TV lines of horizontal resolution. If the monitor has, for example, a 38 cm (15") diagonal screen, the 500 TVL image means about 666 vertical lines across the 30 cm screen width (30 cm = 11.8"). This is because TV lines are counted over a width equal to the picture height, and a 4:3 screen is 4/3 times wider than it is high (500 × 4/3 ≈ 666). The 666 lines divided by 11.8" gives about 56 DPI! This is close to the highest resolution we can get when displaying an analog video signal, and it is defined by the video standard (PAL/NTSC).
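The TVL-to-DPI conversion above can be sketched as follows, assuming a 4:3 picture and computing the width from the nominal diagonal (15" gives 12", slightly more than the 11.8" used in the text). The function name is hypothetical.

```python
import math


def tvl_to_dpi(tv_lines: float, diagonal_in: float,
               aspect_w: int = 4, aspect_h: int = 3) -> float:
    """Approximate horizontal DPI of a monitor quoted in TV lines (TVL)."""
    # TVL are counted over a width equal to the picture height,
    # so scale by the aspect ratio to get lines across the full width.
    lines_across_width = tv_lines * aspect_w / aspect_h
    width_in = diagonal_in * aspect_w / math.hypot(aspect_w, aspect_h)
    return lines_across_width / width_in


print(f"{tvl_to_dpi(500, 15):.1f} DPI")  # 500 TVL on a 15-inch 4:3 monitor
```

Either width gives roughly 56 DPI, far below the XGA notebook figure.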
In order to print an ITU-601 TV frame on an ink jet printer, we need to understand this technology, too, so that we can optimize the printout quality. It is logical to expect that print size and resolution quality can easily be calculated, since we know, for example, that our ink jet printer has 1440 DPI. This is not so, however. The dots per inch quoted in the specification of your fine ink jet printer (such as 720 DPI or 1440 DPI) refer to the finest dots that can be produced by each of the cyan, magenta, yellow, and black ink nozzles of the print head. Adding to the confusion, this is not the same as the DPI used for magazine and book printing. The actual “natural” colors of ink jet printouts are obtained by a dithering process, which sprays ink jet dots of various sizes and combines them to produce a resultant color. In essence, ink jet color printers are binary devices in which the cyan, magenta, yellow, and black dots are either “on” (printed) or “off” (not printed), with no intermediate levels possible. This is conceptually different from the RGB phosphors on a CRT, where each phosphor can be driven at a range of intensities. A “binary” CMYK printer can print only five “solid” colors (cyan, magenta, yellow, black, and white). White is actually the unprinted paper background, but it is used as a color too. Clearly, this is not a big enough palette to deliver good color print quality, which is where halftoning comes in. This is still the case even with the newer photo-quality ink jet printers that have two additional colors, light cyan and light magenta (for better reproduction of human flesh tones). Halftoning algorithms divide a printer’s native dot resolution into a grid of halftone cells and then turn on varying numbers of dots within these cells in order to mimic a variable dot size. By carefully combining cells containing different proportions of CMYK dots, a halftoning printer can “fool” the human eye into seeing a palette of millions of colors rather than just a few.
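To illustrate the halftone-cell idea for a single ink channel, the sketch below uses one simple, well-known scheme: ordered dithering with a 4 × 4 Bayer threshold matrix. It is only an illustration; we do not know which algorithm any particular printer driver uses (many use error diffusion or proprietary methods). A 4 × 4 cell can show 17 tone levels (0 to 16 dots) per ink, at the cost of a cell grid four times coarser than the native dot grid.

```python
from typing import List

# 4 x 4 ordered-dither (Bayer) threshold matrix: each value 0..15 appears once
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def halftone_cell(level: int) -> List[List[int]]:
    """Binary 4x4 cell (1 = ink dot) approximating a tone of `level` out of 16."""
    if not 0 <= level <= 16:
        raise ValueError("level must be between 0 and 16 for a 4x4 cell")
    return [[1 if threshold < level else 0 for threshold in row] for row in BAYER_4X4]


for row in halftone_cell(6):  # 6 of 16 dots printed, i.e. a 37.5% tone
    print("".join("#" if dot else "." for dot in row))
```

Because every threshold from 0 to 15 appears exactly once, a requested level of k always turns on exactly k dots, spread evenly across the cell.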
A simple rule of thumb used by some imaging companies, such as Adobe, is to divide the ink jet DPI specified by the printer manufacturer by 4 to get the “real” color dots per inch. In practice, this means that a 720 DPI ink jet printer can reproduce 180 color dots per inch. To achieve the highest quality, it is important to use high-quality printing paper; for best results, the inks and photo paper of the corresponding printer manufacturer are recommended.
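Combining the divide-by-4 rule of thumb with the ITU-601 frame size gives a rough idea of how large a frame can be printed. This sketch assumes each image pixel is rendered as one “real” color dot; the function names are hypothetical.

```python
from typing import Tuple


def effective_color_dpi(nozzle_dpi: float, divisor: float = 4) -> float:
    """Rule-of-thumb 'real' color dots per inch from the printer's quoted DPI."""
    return nozzle_dpi / divisor


def print_size_inches(width_px: int, height_px: int, nozzle_dpi: float,
                      divisor: float = 4) -> Tuple[float, float]:
    """Print size if each image pixel is rendered as one 'real' color dot."""
    dpi = effective_color_dpi(nozzle_dpi, divisor)
    return width_px / dpi, height_px / dpi


w, h = print_size_inches(720, 576, 720)  # ITU-601 frame on a 720 DPI printer
print(f"{w:.1f} x {h:.1f} inches")
```

A 720 × 576 frame on a 720 DPI printer comes out at about 4 × 3.2 inches (roughly 10 × 8 cm) under this assumption; printing it larger simply spreads the same pixels over more paper.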
An important conclusion here is that when a digitized and compressed image is exported and given as evidence to a third party (the police, for example), it is desirable to have it in the original format, or at least exported to BMP, so that no additional compression artifacts are introduced. If you are comparing various image compressions, the most objective comparison is made with all images exported in the uncompressed BMP format, whether they are printed out on the same high-quality photo paper or compared on a computer screen.
High-quality color printers these days are cheap and should be a standard part of a CCTV system.
Recognizing faces and license plates in CCTV
In CCTV, one of the most important functions is to be able to recognize a person, an intruder, or a
group of people involved in an accident, for example.
Second on the list of most required functions is recognizing a vehicle’s license plate. Certainly, CCTV cameras and digital recorders have other applications, which are not always related to security or surveillance, but since these two are the most common, we will explain the requirements for designing and setting up a system that will guarantee successful identification of faces and license plates.
Our main problem is the limited number of pixels, both in the CCTV cameras and in the ITU-601 digitization recommendation, which, as mentioned at the beginning of this chapter, is around 400,000 pixels. So, most of the time the “trick” is to find a suitable lens and camera position so that the camera can see sufficient detail to identify people or license plates. Customers usually expect one camera to cover everything, see everything, and recognize everything. This topic has been discussed often, and it is still a stumbling block in designing various projects. Since we all work under the pressure of a budget (and the budget is a very important consideration), the tendency is to have as few cameras as possible. Yet, when there is an incident and a positive identification is required, the CCTV system designer could be blamed for having a system that cannot recognize a face or a car, even if they were captured in the camera’s field of view. Here is a simple piece of advice: do not compromise your system design; rather, educate the customers so that they understand why more cameras with certain coverage are required. If necessary, have two cameras covering a foyer entrance, for example: one with a wide, global coverage, the other with a narrower angle of view picking up faces entering the foyer. Initially, this might seem to be overkill, but when a suspected intruder is identified and captured, the system justifies its existence. This is the purpose of a surveillance system.
Finding a camera lens that gives you an angle of view for positive identification is not science fiction, and it is already known from the analog part of CCTV design. Here, we will only highlight the fact that when digitized video is used, certain image losses occur, and they have to be taken into account.
In fact, various national standards written specifically for CCTV define what views are required in order to recognize faces and license plates. These standards are not necessarily identical in all countries, but we will use the one closest to the author of this book, the Australian CCTV Standards, which should give you sufficient information for practical use and even perhaps for further
If digital recording is used, it is recommended that full PAL frame resolution and the highest picture quality be used (i.e., 704 × 576 active pixels, which is equivalent to the 720 × 576 ITU-601 frame-grabbing recommendation). Where possible, for better vertical resolution, recording of TV frames is recommended instead of TV fields, although these recommendations are valid for field recording as well.
If the target is a person and the CCTV system has an installed limiting resolution of at least 400 TV lines (most systems would have around 460 TVL), the recommended minimum target sizes for recognition are:
• For face identification the entire target person should represent not less than 100% of screen height. It is assumed that a person’s face (head) occupies around 15% of a person’s height. If a digitized image is used, the head should be not less than 90 pixels high before compression is applied.
• For face recognition the entire target person should represent not less than 50% of picture height. If a digitized image is used, the person’s image height should be not less than 288 pixels before compression is applied.
• For detection of an intruder the entire target person should represent not less than 10% of picture height. If a digitized image is used, the target person should be not less than 60 pixels high before compression is applied.
• For crowd control (monitoring) the entire target person should represent not less than 5% of picture height. If a digitized image is used, the target person should be not less than 30 pixels high before compression is applied.
• For vehicle number plate visual recognition the license plate characters should be not less than 5% of the monitor height. If a digitized image is used, the license plate should be not less than 30 pixels high before compression is applied.
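The pixel thresholds in this list correspond roughly to the stated percentages of the 576 active lines of a PAL frame (for example, 50% of 576 is 288). The sketch below checks a measured target against the person-based thresholds; the table layout and function names are our own, and the license plate check (30 pixels) would work the same way.

```python
from typing import List, Optional

FRAME_LINES = 576  # active lines of a PAL / ITU-601 frame

# Minimum person heights (pixels, before compression) from the list above
PERSON_MIN_PX = [
    ("recognition", 288),
    ("detection", 60),
    ("crowd monitoring", 30),
]
HEAD_MIN_PX_IDENTIFICATION = 90  # minimum head height for face identification


def percent_of_frame(pixels: float, frame_lines: int = FRAME_LINES) -> float:
    """Target height as a percentage of the picture height."""
    return 100.0 * pixels / frame_lines


def purposes_met(person_px: float, head_px: Optional[float] = None) -> List[str]:
    """Which of the listed purposes a target of this size satisfies."""
    met = [name for name, min_px in PERSON_MIN_PX if person_px >= min_px]
    if head_px is not None and head_px >= HEAD_MIN_PX_IDENTIFICATION:
        met.insert(0, "identification")
    return met


print(percent_of_frame(288))  # 50.0
print(purposes_met(100))      # ['detection', 'crowd monitoring']
```

Measure the target height in the uncompressed frame, since compression can only remove detail.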
The CCTV Labs test chart has display indicators that can be used to verify system compliance with all of the details listed above.
Minimum object sizes relative to the monitor display height in order to recognize or identify the objects.
Operating systems and hard disks
In order for a computer to work, it needs appropriate hardware and appropriate software that understands the environment in which it works. When a computer starts up, the first thing that runs is the BIOS (Basic Input/Output System), firmware that holds all the hardware configuration details: hard disks, video cards, keyboard, mouse, communication ports, parallel ports, and so on. Once the BIOS has set up all these details, it goes to a special section of the hard drive called the boot record area and looks for an operating system. The operating system (OS) is software that usually resides on the hard disk; when it loads, it brings up a graphical user interface (GUI) and connects the whole system into a meaningful interactive environment, loading various drivers, displaying images, and accepting and executing commands as defined by the computer user or application. The OS is what the name suggests: a system for operating the computer, and it is the basis for various applications and specialized programs, such as spreadsheet, word processing, photo editing, or video editing programs.
Many DVRs in CCTV fall into this category, since they use one of the few popular OSs and add video processing to it as a specialized application. The most common OSs used in CCTV are Windows and Linux. There are other OSs as well, such as Unix, Solaris, and Mac OS X, but none of these is commonly used in CCTV, which is why we will not compare and analyze them in depth.
Some DVRs do not load their OS from a hard disk, but rather from a chip (usually flash memory or EPROM). You may find that DVR manufacturers refer to such an OS as a Real Time Operating System (RTOS) or Embedded OS. Running a DVR with an embedded OS simplifies things, because the OS is smaller and faster to load. Also, if a hard disk fails, there is no need to install the OS from scratch (which would usually be the case when the hard disk of a DVR with its OS on that disk fails). The one important limitation of DVRs with the OS on a chip is that they are not as flexible and easy to upgrade as those with the OS loaded from a hard disk.
One of the most important requirements in security and surveillance is the stability of the OS. Long-term operation in CCTV is sometimes more demanding than that of a busy web server, and even more demanding than typical office or home computer usage. A web server can go down for a few minutes or maybe even hours for maintenance purposes, but a DVR in security is expected to run without interruption for months and years. This is a big task. The intensity of writing and reading data to and from hard disks is usually higher than on web servers, as video data is much larger than web pages or e-mails, for example. Not all operating systems and hardware are suitable for such long and uninterrupted operation. One reason why the majority of web servers on the Internet these days run on Linux is exactly that: long-term stability. This is not to say that the popular and widespread Windows is not suitable at all, but readers should be aware that current statistics show that identical hardware with Intel processors (which power the majority of PCs) performs faster and more reliably with Linux than with Windows.
The trademarks of the three main operating systems: Mac, Linux, and Windows
Linux is still a young operating system, written by a Finnish student named Linus Torvalds. It is modeled on Unix, one of the oldest and most robust industry standards, which is (unfortunately) licensed. Linux has grown so much in only 10 years of its existence because the source code for Linux is freely available to everyone (released under the GNU General Public License, GPL). When the first version of Linux was written and made freely available to everyone, the author’s only requirement was that any additional improvements or drivers written by others also be made available to everyone.
Thousands of software developers, students, and enthusiasts wholeheartedly accepted this idea of a free and open source OS. This is why Linux not only became more popular but, with more hardware drivers, also offered more applications and kept getting better. Stability is an inherent part of the Unix concept, and it is further developed and improved with new kernels and file systems. Linux comes in many “flavors” called distributions; all use the same kernel (the core of the operating system) and add a variety of programs, tools, and GUIs, all of which are license free.
So, when Linux is used in a DVR in CCTV, it offers not only a cost benefit but, maybe more importantly, long-term license independence. If a hard disk inside a DVR with Linux fails (and drives can fail regardless of the OS), re-installing the OS does not require any fees or license numbers to be entered. This is not the case when Windows is used.
Some DVR manufacturers use certain versions of Windows and have gone a step further, having their own software engineers “tweak” the Windows engine to suit their hardware better than the standard version supplied by Microsoft, hence achieving higher stability and reliability.
Others will argue that embedded OSs are an even better choice, since if the hard disk fails, the OS is still safe in the flash memory. So, in case of a failure or power loss, a DVR with an embedded OS just quickly restarts and continues recording. There is no need to reload the OS even if the drive fails; one just slots in a new one. This might be a better alternative for some. The limitations here are the smaller variety of hardware drivers, the difficulty of updating the embedded OS, and the smaller range of functions available. DVRs with a full-blown OS, whether Windows or Linux, usually have many more features and functions, since they are not limited in size the way flash memory is. An embedded OS in flash memory usually has stripped-down functionality.
Today a typical “fully loaded” PC, with many various applications, would use anywhere between 2 and 5 gigabytes (GB) of hard disk space. This would typically include the operating system (usually Microsoft Windows) and various applications such as text editors, spreadsheet programs, web browsers, and image editors. User data created with these applications can vary significantly in size, depending on whether you are working with text files only, text and images, or perhaps video clips.
Digital video recorders (DVRs) used in CCTV are an exception to this typical scenario. They are designed and intended to use the maximum hard disk space available. With a typical large hard disk these days holding 300 GB, the internal DVR hard drive capacity can be extended to over 1 TB using up to four such drives. Some larger systems may even include external SCSI or RAID storage. A typical DVR, as used in CCTV, works really hard, day and night, 24 hours a day, 7 days a week, without (ideally) ever being shut down. DVRs are without any doubt a symbiosis of software and hard disk technology. If either of the two fails, you will have a failed DVR and lose important recordings.
Today, in CCTV, there are at least a couple of hundred different DVR models.
The need for a better understanding of hard disks and their limitations is greater than ever, especially for those in CCTV. Even the most stable OS depends on hardware reliability. If the hardware fails, the OS can no longer run, even if, technically, the OS itself has not failed. The most vulnerable hardware parts of any computer are the moving parts, notably the cooling fans and the spinning hard disks. These parts most commonly fail simply because of wear and tear, increased temperature, dust, moisture, and mechanical shocks. Only a few high-end DVR manufacturers address some of these issues. The fact is, at the time of writing this book, the majority of DVR manufacturers do not even consider them. Driven by the competitive market, very few go to the extra length of using higher-quality hardware and protecting it at the design and production stage. Everything is left in the hands of suppliers and installers, and how they educate customers about the importance of the environment: having clean, air-conditioned equipment rooms, and maintaining and monitoring them.
The hard disk drives are the most important DVR hardware with moving parts (spinning disks), especially because they are the storage area. For this reason, we will devote a bit more space to them later on in this chapter.
It is beyond the scope of this book to analyze all of the intricacies of the various OSs available on DVRs today, but we will say a few words about the variety of files and filing systems used for recording various data and video on hard disks.
One of the most important electromechanical devices in DVRs
Hard disk drives
Hard disk drives are an essential part of any modern computer device, and this includes the DVRs in digital CCTV. It is therefore very important to understand how they work and to learn about their performance and limitations. A hard disk or hard drive is responsible for long-term storage of information. Unlike volatile memory (often referred to as RAM), which loses its stored information once its power supply is shut off, a hard disk stores information permanently, allowing you to save programs, files, and other data. Hard disks also have much greater storage capacities than RAM; in fact, current single hard disks may contain over 400 GB of storage space.
A hard disk consists of four basic parts: platters, a spindle, read/write heads, and integrated electronics. Platters are rigid disks, typically made of aluminum alloy or glass. Both sides of each platter are covered with a thin layer of iron oxide or other magnetizable material. The platters are mounted on a central axle or spindle, which rotates all the platters at the same speed. Read/write heads are mounted on arms that extend over both the top and bottom surfaces of each disk. There is at least one read/write head for each side of each platter. The arms jointly move back and forth between the platters’ centers and outside edges. This movement, along with the platters’ rotation, allows the read/write heads to access all areas of the platters. The integrated electronics translate commands from the computer and move the read/write heads to specific areas of the platters, thus reading and/or writing the needed data.
Main mechanical parts of a hard drive
Two major sizes of hard drives: 3.5" used in desktop machines and 2.5" used in notebook computers
Computers record data on hard disks as a series of binary bits. Each bit is stored as a magnetic polarization (north or south) on the oxide coating of a disk platter. When a computer saves data, it sends
the data to the hard disk as a series of bits. As the disk receives the bits, it uses the read/write heads to
magnetically record or “write” the bits on the platters. Data bits are not necessarily stored in succession;
for example, the data in one file may be written to several different areas on different platters. When the
computer requests data stored on the disk, the platters rotate and the read/write heads move back and
forth to the specified data area(s). The read/write heads read the data by determining the magnetic field
of each bit, positive or negative, and then relay that information back to the computer. The read/write
heads can access any area of the platters at any time, allowing data to be accessed randomly (rather than sequentially, as with a magnetic tape). Because hard disks are capable of random access, they can typically access any data within a few thousandths of a second (milliseconds).
In order for a computer operating system (OS) to know where to look for the information on the hard
disk, hard disks are organized into discrete, identifiable divisions, thus allowing the computer to easily
find any particular sequence of bits. The most basic form of disk organization is called formatting.
Formatting prepares the hard disk so that files can be written to the platters and then quickly retrieved
when needed.
Before a brand new hard drive is used, it needs to be formatted. Formatting is a method of organizing what is saved to the disk, and it depends on the operating system (OS). Hard disks must be formatted in two ways: physically and logically. Physical formatting is done before logical formatting. Formatting organizes the disk into tracks, sectors, and clusters (groups of sectors) according to the operating system used. Tracks are concentric circles written on each side of a platter (unlike the single spiral track of a record or compact disc). The tracks are identified by number, starting with track zero at the outer edge. Tracks are divided into smaller areas, or sectors, which are used to store a fixed amount of data. Sectors are usually formatted to contain 512 bytes of data (there are 8 bits in a byte). A cylinder consists of the set of tracks that lie at the same distance from the spindle on all sides of all the platters. For example, track three on every side of every platter is located at the same distance from the spindle. If you imagine these tracks as being vertically connected, the set forms the shape of a cylinder. Computer hardware and software frequently work using cylinders. When data is written to a disk in cylinders, it can be fully accessed without having to move the read/write heads. Because head movement is slow compared to disk rotation and switching between heads, cylinders greatly reduce data access time.
This is how cylinders are made: tracks on both sides of magnetic plates
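The cylinder idea is reflected in the classic cylinder/head/sector (CHS) addressing scheme, sketched below. Consecutive logical sectors fill a track, then continue on the next head (surface) of the same cylinder, and only then move to the next cylinder, so sequential data needs minimal head movement. Modern drives use zoned recording and present linear block addresses instead, so treat the geometry here (16 heads, 63 sectors per track, a classic example) as a logical model rather than the physical layout.

```python
SECTOR_BYTES = 512  # typical sector size quoted in the text


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a linear sector number."""
    if sector < 1:
        raise ValueError("CHS sector numbers start at 1")
    # Fill a whole track, then the next head (surface) in the same cylinder,
    # and only then move the heads to the next cylinder.
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def capacity_bytes(cylinders: int, heads: int, sectors_per_track: int,
                   sector_bytes: int = SECTOR_BYTES) -> int:
    """Total capacity of a drive described by its CHS geometry."""
    return cylinders * heads * sectors_per_track * sector_bytes


print(chs_to_lba(1, 0, 1, heads_per_cylinder=16, sectors_per_track=63))  # 1008
print(f"{capacity_bytes(16383, 16, 63):,} bytes")
```

With 16 heads and 63 sectors per track, 1008 sectors can be read before the heads have to move to the next cylinder.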
After a hard disk is physically formatted, the magnetic properties of the platter coating may gradually deteriorate. Consequently, it becomes more and more difficult for the read/write heads to read data from or write data to the affected platter sectors. Sectors that can no longer be used to hold data are called bad sectors. Fortunately, the quality of modern disks is such that bad sectors are rare. Furthermore, most modern computers can determine when a sector is bad; if this happens, the computer simply marks the sector as bad (so it will not be used again) and then uses an alternate sector.
After a hard disk has been physically formatted, it must also be logically formatted. Logical formatting places a file system on the disk, allowing an operating system (such as Windows or Linux) to use the available disk space to store and retrieve files. Different operating systems use different file systems, so the type of logical formatting applied depends on the OS installed.
Formatting the entire hard disk with one file system limits the number and types of operating systems
that can be installed on the disk. If, however, the disk is divided into partitions, each partition can then
be formatted with a different file system, allowing multiple operating systems to be installed. Dividing the hard disk into partitions also allows disk space to be used more efficiently.
To read or write data, the disk head must be positioned over the correct track on the rotating media.
Seek times are usually quoted to include the time it takes for the head to stop vibrating after the move
(“settling time”). Then a delay occurs until the correct data sector rotates under the head (“rotational
latency”). Modern disks use accelerated track positioning, so that the head moves faster and faster until
about the halfway point and then is decelerated to a stop at the target track. This is why the average
seek is only a few times the minimum seek. The maximum seek time is usually about twice the average seek time because the head reaches its maximum speed before the middle track of the disk. The minimum (track-to-track) seek time is the time it takes to move the heads from one track to the adjoining track. For reading large blocks of data, such as our DVR recorded footage, this is the most significant seek performance value. The average seek time is more important for random access of small amounts of data, such as traversing a directory path.
Access time is equal to the time to switch heads + the time to seek the data track + the time for the sector to rotate under the head, repeated for each subsequent sector. More heads reduce the need to mechanically seek a new track.
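For a single sector, this sum can be estimated as sketched below. The assumption here is that the rotational term is the average latency of half a revolution (60,000 / rpm / 2 ms); the drive figures used are hypothetical, chosen within the seek-time range quoted later in this section.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """On average the wanted sector is half a revolution away."""
    return 0.5 * 60_000.0 / rpm


def access_time_ms(seek_ms: float, rpm: float, head_switch_ms: float = 0.0) -> float:
    """Head switch + seek to the track + average wait for the sector (one sector)."""
    return head_switch_ms + seek_ms + average_rotational_latency_ms(rpm)


# Hypothetical drive: 9 ms average seek, 7200 rpm, 1 ms head switch
print(f"{access_time_ms(9.0, 7200, head_switch_ms=1.0):.1f} ms")
```

The seek term usually dominates, which is why sequential (track-to-track) access is so much faster than random access.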
Faster rotational speed (spindle speed) increases the maximum data transfer rate and reduces the rotational latency. The rotational latency is the additional delay in accessing a particular data sector while waiting for that sector to come under the read head. The following table illustrates typical differences between hard drives of various rotational speeds and their maximum transfer rates (discussed later in the book), which are the most important indicator of how much data we can put through the magnetic platters of the hard disks.
Spindle speed    Rotational latency (one revolution)    Maximum (burst) transfer rate
3600 rpm         16.7 ms                                60 MB/s
4500 rpm         13.3 ms                                80 MB/s
5400 rpm         11.1 ms                                100 MB/s
7200 rpm         8.3 ms                                 140 MB/s
10,000 rpm*      6.0 ms                                 200 MB/s
12,000 rpm*      5.0 ms                                 250 MB/s
15,000 rpm*      4.0 ms                                 300 MB/s
*The higher speeds require better cooling of the drive.
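The latency column can be reproduced as 60,000 / rpm milliseconds, the time for one full platter revolution (the worst-case wait for a sector); the average wait is half of that. A short sketch, with a hypothetical function name:

```python
SPINDLE_SPEEDS_RPM = [3600, 4500, 5400, 7200, 10_000, 12_000, 15_000]


def revolution_time_ms(rpm: float) -> float:
    """Time for one full platter revolution, i.e. the worst-case rotational latency."""
    return 60_000.0 / rpm


for rpm in SPINDLE_SPEEDS_RPM:
    full = revolution_time_ms(rpm)
    print(f"{rpm:>6} rpm: {full:4.1f} ms per revolution, "
          f"{full / 2:4.1f} ms average latency")
```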
Every hard disk is specified by its rotational speed, or spindle speed. Expressed in revolutions per minute (rpm), this specification gives a very good indication of drive performance. Desktop drives generally come in 5400 rpm and 7200 rpm varieties, with 7200 rpm drives averaging 10% faster (and 10 to 30% more expensive) than 5400 rpm models. High-end 10,000 rpm and 15,000 rpm hard drives offer only marginally better performance than 7200 rpm drives and cost much more, in part because they are typically SCSI drives with added reliability features. Also, hard drives with higher spindle speeds draw more current; hence they get hotter. Cooling is very important for all hard drives, and more so for the faster ones. So, for a typical DVR, hard disks with 5400 rpm or 7200 rpm are a good compromise between sufficient speed, reasonable cost, and being relatively “cool” drives.
If two drives have the same spindle speed, the seek time shows which one is better. Differences in seek times, which range from 3.9 milliseconds (ms) for ultra-fast SCSI drives to 12 ms for slower IDE drives, may be noticeable in database or search applications where the head scoots all over the platter, and also when doing a VMD or time/date search through digital recorder footage.
Location of boot records
Cache is another term used with hard drives; it refers to the amount of memory built into the drive. Designed to reduce disk reads, the cache holds a combination of the data most recently and most frequently read from disk. Large caches tend to produce greater performance benefits when multiple users access the same drive at once. Although small differences in cache size may have little bearing on performance, a smaller cache may be a sign of an older, slower drive. Operating systems try to maximize performance by minimizing the effect of mechanical activity. Keeping the most recently used data in memory reduces the need to go to the disk drive, move the disk heads, and so on. The writing of new data may also be cached and written to disk at a later, more efficient time. Other strategies include track buffering, where data sectors are read into memory while waiting for the correct sector to rotate under the head. This can eliminate the delay of rotational latency, because the other sectors of the track are already in memory once the sought sector has been read. For modern disks, this track buffering is usually handled by a memory cache on the disk drive’s built-in controller. Modern disk drives usually have between 2 MB and 4 MB of cache memory to buffer track reads and hence eliminate rotational latency. Some high-end drives have 8 MB or even 16 MB. However, the rotational speed still limits the maximum transfer rate.
Larger hard disks usually have multiple heads and platters.
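The benefit of keeping recently used data in memory can be illustrated with a least-recently-used (LRU) cache of sectors. This is a simplified sketch: plain LRU captures “most recently used” but not “most frequently used,” and the actual policies in drive firmware and operating systems are not described here and likely differ. The class and the `fake_disk` helper are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class SectorCache:
    """Least-recently-used (LRU) cache of disk sectors; illustrative only."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.sectors: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, sector: int, read_from_disk: Callable[[int], bytes]) -> bytes:
        if sector in self.sectors:
            self.hits += 1
            self.sectors.move_to_end(sector)  # now the most recently used
            return self.sectors[sector]
        self.misses += 1
        data = read_from_disk(sector)  # slow path: seek + rotational latency
        self.sectors[sector] = data
        if len(self.sectors) > self.capacity:
            self.sectors.popitem(last=False)  # evict the least recently used sector
        return data


def fake_disk(sector: int) -> bytes:
    return bytes(512)  # stands in for a real 512-byte sector read


cache = SectorCache(capacity=2)
for s in [1, 2, 1, 3, 1, 4, 2]:
    cache.read(s, fake_disk)
print(cache.hits, cache.misses)
```

Every hit is a read that never touches the mechanics of the drive; every miss pays the full seek and rotational delay.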
Despite the electronic methods of improving a hard disk’s performance, its performance is determined primarily by the mechanical characteristics of the drive. This is why external factors affecting the mechanical performance of hard disks also affect their reliability and life expectancy. Exposure to high temperatures, dust, moisture, shocks, and vibrations are external factors that can cause hard disk failures. High temperatures and dust are the two most common causes of hard disk failures we experience in practice in CCTV.
One of the very few DVR manufacturers that take good care of their hard drives by filtering the air and monitoring fans and external and internal temperature
It is not unrealistic to say that hard drives in some DVRs work even harder than hard drives on many Internet web servers. Unfortunately, the culture of customers using DVRs is not the same as the culture of customers using company or web servers, for example. DVRs are very often mistreated and put in places with minimal air conditioning, often with plenty of dust and various fumes. In all CCTV system designs, we must insist on treating hard disks the same way as when they are running inside company servers.
In the current race to make more and better DVRs, most CCTV manufacturers concentrate on higher compression or on recording more frames per second. Few among them have gone in the direction of taking care of the actual DVR environment: filtering the cooling air and measuring the fan speed, as well as the external and internal temperature. In addition to using a stable OS, these are extremely important factors influencing the longevity of DVRs. At the end of the day, it is no good having even the highest frame rate if the frames cannot be saved on a healthy hard disk.
There are a variety of interface standards between the computer and its hard disks, such as ATA, SCSI, and SATA, as well as RAID configurations, all of which are discussed later on in the book. Each one has its own advantages and disadvantages, but the bottom line is the hard disk itself.
The sustained data transfer rate of the hard disk ultimately defines how many cameras, or images per second, a DVR can handle. This is the bottleneck of data transfer, as it depends on mechanically moving parts. The sustained transfer rate is always lower than the burst transfer rate. Generally ranging from 14 MB/s to 60 MB/s (megabytes per second!), it indicates how fast data can be read from the outermost track of a hard drive’s platter into the cache. The sustained transfer rate is an important parameter of a DVR’s hard drives, since it ultimately defines the upper limit of how many pictures per second your system can record and play back.
This performance also depends on the operating system, the processor, the compression speed, file sizes, and the like, but ultimately, if the hard drive cannot cope with such throughput, the DVR cannot achieve what it is (theoretically) capable of.
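The estimate described above can be written as a small calculation. The function names, and the idea of expressing each competing task (time base correction, playback, archiving) as a multiplicative derating factor, are our own illustrative assumptions; decimal units (1 MB = 1000 kB) are used, as in the example.

```python
from typing import Sequence


def max_images_per_second(sustained_mb_s: float, image_kb: float) -> float:
    """Images per second the disk can absorb (decimal units: 1 MB = 1000 kB)."""
    return sustained_mb_s * 1000.0 / image_kb


def per_camera_rate(sustained_mb_s: float, image_kb: float, cameras: int,
                    derating: Sequence[float] = ()) -> float:
    """Theoretical pictures per second per camera, after optional derating factors."""
    rate = max_images_per_second(sustained_mb_s, image_kb) / cameras
    for factor in derating:  # e.g. 0.5 for 'at least 50% lost' to another task
        rate *= factor
    return rate


print(max_images_per_second(14, 40))                     # 350.0
print(per_camera_rate(14, 40, 16))                       # 21.875
print(per_camera_rate(14, 40, 16, derating=(0.5, 0.5)))  # 5.46875
```

With the figures used in the practical example below (14 MB/s, 40 kB images, 16 cameras, and two 50% reductions), this reproduces 350 images per second in total, about 21 per camera, and about 5 per camera after derating; the text rounds down at each step.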
Let us analyze this issue with one practical example:
Let us be conservative and assume that we have a typical and “not-so-good” hard disk with a sustained
transfer rate of only 14 MB/s. If we translate this rate into megabits per second, we multiply
14 by 8 and get a 112 Mb/s transfer rate. Let us now assume that we are recording on a DVR with
JPEG compression that records good-quality images of, let us say, 40 kB each. If the recorder does nothing else
while recording (i.e., it is not playing back), the maximum (theoretical) recording performance of such a machine is found by dividing 14 MB/s by 40 kB. This gives
350 images per second. If we have 16 cameras connected to the DVR, then the theoretical maximum recording rate will
be 350/16 ≈ 21 pictures per second per camera. This is a theoretical maximum of a DVR where no
other processes are active. In reality, the DVR has to “spend time” doing time base correction, i.e.,
synchronizing the un-synchronized cameras. This will reduce the theoretical rate by at least 50% to
10 pictures per second per camera. If we decide to play back at the same time, or archive, this would
further reduce the recording performance by another factor of at least 50%, obtaining 5 pictures per
second per camera as a theoretical maximum with such a hard drive. In addition, the intelligence of
the operating system of handling files has to be considered. And this is all true if we assume that our
example DVR does hardware JPEG compression at a faster rate, hence not “wasting” the operating
system and main processor’s time. In many practical DVRs you will find that the compression might
be done by the “proprietary software encoding scheme.” This practically means that there will be some
additional “bottlenecks” for our theoretical recording performance, which most likely would drop down
from the above calculated 5 pictures per second to maybe 1 or 2 pictures per second per camera. And,
there is one more important, and almost invisible, factor that we need to consider in this example.
The fragmentation of files during continuous recording (24 hours a day, 7 days a week) is handled by the operating system and can affect the
recording performance, especially after a longer period of recording of a few days or weeks. No number
can be attached to this performance reduction factor, for it depends on the DVR software design, but it
is going to reduce the performance further, although the actual hard disk sustained transfer rate might be
unaffected. (The hard disk “wastes time” searching for the free fragments as dictated by the operating
system.)
As can be concluded from the above example, many factors influence the performance of a digital video recorder, not just
the “DVR front end,” but other underlying and invisible processes as well. The hard drives are the starting and the ending point
in such a chain of operation.
A typical DVR in CCTV has 16 camera inputs, but 18, 24, or 32 are also possible.
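The back-of-the-envelope arithmetic of the example above can be reproduced in a few lines. This is a minimal sketch, assuming decimal units (1 MB = 1000 kB, as the example does) and treating each overhead, such as time base correction or simultaneous playback, as a simple multiplicative derating factor. The function names are illustrative only, and the 50% factors are the example's rough estimates, not measured values.

```python
# Rough DVR recording-rate estimate, following the worked example.
# Assumption: decimal units (1 MB = 1000 kB), as in the example.


def disk_limited_rate(sustained_mb_s: float, image_kb: float) -> float:
    """Theoretical images per second the disk can store, all cameras combined."""
    return sustained_mb_s * 1000 / image_kb


def per_camera_rate(sustained_mb_s: float, image_kb: float, cameras: int,
                    deratings: tuple[float, ...] = ()) -> float:
    """Images per second per camera after applying derating factors (each 0..1)."""
    rate = disk_limited_rate(sustained_mb_s, image_kb) / cameras
    for factor in deratings:
        rate *= factor  # e.g. 0.5 for time base correction
    return rate


if __name__ == "__main__":
    print(disk_limited_rate(14, 40))                 # all cameras together
    print(per_camera_rate(14, 40, 16))               # no other processes
    print(per_camera_rate(14, 40, 16, (0.5,)))       # with time base correction
    print(per_camera_rate(14, 40, 16, (0.5, 0.5)))   # plus playback/archiving
```

The sketch keeps the fractions (about 21.9, 10.9, and 5.5 pictures per second per camera) that the text rounds down to 21, 10, and 5. Software encoding and file fragmentation have no fixed cost, so in this model they would simply be further factors below 1.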
The different file systems
Each different operating system uses some kind of file system in order to write data on hard drives and
removable media, so that later on the user is able to find it and read it. Inherently, this is a fundamental
and important concept that defines the flexibility, capacity, and security of various systems. This is why
we will mention the most common ones here.
All file systems consist of structures necessary for storing and managing data. These structures typically include an operating system boot record, directories, and files.
A file system performs three main functions: it tracks the allocated and unused space; it maintains directories and filenames; and it tracks the physical coordinates where each file is stored
on the disk.
Different file systems are used by different operating systems. Some operating systems (such as Windows) can recognize only their own file systems, while others (such as Linux and Mac OS X)
can recognize several, including those of other OSs.
Some of the most common file systems in use today are:
Ext – Extended file system, designed for Linux systems
Ext2 – Extended file system 2, designed for Linux systems
Ext3 – Extended file system 3, designed for Linux systems (Ext2+journaling)
FAT – Used on DOS and Microsoft Windows, working with 12 and 16 bits
FAT32 – FAT with 32 bits
HFS – Hierarchical File System, used on older Mac OS systems
HFS+ – Used on newer Mac OS systems
HPFS – High Performance File system, used on IBM’s OS/2
ISO 9660 – Used on CD and DVD-ROM disks (Rock Ridge and Joliet are extensions to this)
JFS – IBM Journaling File system, provided in Linux, OS/2, and AIX
NTFS – Used on Windows NT-based systems (Windows 2000 and XP)
ReiserFS – File system that uses journaling, used in Linux and Unix
UDF – Packet-based file system for WORM/RW media such as CD-RW and DVD
UFS – Unix and Mac OS X File system
FAT (File Allocation Table)
Introduced by Microsoft in 1983, the File Allocation Table (FAT) is a file system that was developed
for MS-DOS and used in consumer versions of Microsoft Windows up to and including Windows ME.
With 16-bit cluster numbers (up to 65,536 clusters), even 512-byte clusters could give up to 32 MB of space – enough for the 10 MB or 20 MB
XT hard drives that were typical at the time. As hard drives larger than 32 MB were released, larger
cluster sizes were used. The use of 8192-byte clusters allowed for file system sizes up to 512 MB.
However, this increased the problem of internal fragmentation, where small files could result in a great
deal of wasted space; for example, a 1-byte file stored in an 8192-byte cluster results in 8191 bytes of
wasted space.
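The limits quoted above follow from multiplying the number of addressable clusters by the cluster size, and the wasted space from rounding every file up to whole clusters. This is a minimal sketch, assuming 2^16 addressable clusters for 16-bit FAT and ignoring the handful of cluster numbers that FAT reserves for its own use (so real limits are very slightly lower); the helper names are hypothetical.

```python
MiB = 2 ** 20


def max_volume_bytes(cluster_bytes: int, cluster_bits: int = 16) -> int:
    """Upper bound on volume size: addressable clusters x bytes per cluster."""
    return (2 ** cluster_bits) * cluster_bytes


def slack_bytes(file_bytes: int, cluster_bytes: int) -> int:
    """Bytes wasted in the last, partly filled cluster of a file."""
    remainder = file_bytes % cluster_bytes
    return 0 if remainder == 0 else cluster_bytes - remainder


if __name__ == "__main__":
    for cluster in (512, 8192, 32 * 1024):
        print(f"{cluster:>6}-byte clusters: {max_volume_bytes(cluster) // MiB} MB max")
    print(slack_bytes(1, 8192), "bytes wasted by a 1-byte file")
```

The 32 kB case yields 2,048 MB, i.e. the 2 GB ceiling of 16-bit FAT.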
The FAT file system is considered relatively uncomplicated, and because of that, it is a popular format for
floppy disks. Moreover, it is supported by virtually all
existing operating systems for personal computers, and
because of that it is often used to share data between
several operating systems booting on the same computer (a multi-boot environment). It is also used on
solid-state memory sticks and other similar devices.
The FAT file system also uses a root directory. This
directory has a maximum allowable number of entries
and must be located at a specific place on the disk or partition.
Hard disk, CD-ROM, and floppies
Although it is one of the oldest file systems, FAT is likely to remain for a long time because it is an ideal
file system for small drives, like the floppies. It is also used on other removable storage for noncomputer
devices, such as flash memory cards for digital cameras, USB flash drives, and the like.
FAT32 (File Allocation Table 32)
In 1997, Microsoft created FAT32 as an extension to the FAT concept because the possibility of growing the cluster size was exhausted. The largest cluster size in Windows FAT was 32 kB, giving a maximum volume
size of 2 GB. The new generation of FAT, FAT32, uses 32-bit cluster numbers, of
which 28 bits are currently used. In theory, this should support a
total of approximately 268,435,438 clusters, allowing for drive
sizes in the multi-terabyte range. However, due to limitations
in Microsoft’s ScanDisk utility, the FAT is not allowed to grow
beyond 4,177,920 clusters, placing the volume limit at 124.55 GB.
FAT32 is an enhancement of the FAT file system based on
32-bit file allocation table entries, rather than the 16-bit entries
used by the previous FAT system. As a result, FAT32 supports
much larger disk or partition sizes (up to 2 TB). This file system can be used by Windows 95 OSR2
and Windows 98/2000/XP. Previous versions of DOS or Windows cannot recognize FAT32 and are
thus unable to boot from or use files on a FAT32 disk or partition. The FAT32 file system uses smaller
clusters than the FAT file system, has duplicate boot records, and features a root directory that can be
of any size and can be located anywhere on the disk or partition.
NTFS (New Technology File System)
NTFS or New Technology File System is the standard file system of Microsoft Windows NT and
its descendants, Windows 2000, Windows XP, and Windows Server 2003. NTFS is a descendant of
HPFS, the file system designed by Microsoft and IBM for OS/2 as a replacement for the older FAT file
system of MS-DOS. HPFS has several improvements over FAT such as support for metadata and the
use of advanced data structures in order to improve performance, reliability, and disk space utilization. NTFS incorporates these plus additional extensions such as security access control lists and file
system journaling.
In NTFS everything that has anything to do with a file (name, creation date, access permissions, and
even contents) is written down as metadata. Internally, NTFS uses B+ trees in order to store the file
system data; although complex to implement, this allows fast access times and decreases fragmentation.
A file system journal is used in order to guarantee the integrity of the file system itself (but not of each
individual file). Systems using NTFS are known to have improved reliability, a particularly important
requirement considering the unstable nature of the older versions of Windows NT.
Because the implementation details are not published, third-party vendors have a difficult time
providing tools to handle NTFS. Currently, the Linux kernel includes a module that makes it possible
to read NTFS partitions. However, the general complexity of the file system and inadequate developer
resources, both in time and persons, have delayed the addition of full write support.
NTFS is not recommended for use on small hard disks because it uses a great deal of space for system
structures. The central system structure of the NTFS file system is the master file table (MFT). NTFS
keeps multiple copies of the critical portion of the MFT to protect against corruption and data loss.
Like FAT and FAT32, NTFS uses clusters to store data files. However, the size of the clusters is not
dependent on the size of the disk or partition. A cluster size as small as 512 bytes can be specified,
regardless of whether a partition is 6 GB or 60 GB. Using small clusters not only reduces the amount
of wasted disk space, but also reduces file fragmentation, a condition where files are broken up
over many noncontiguous clusters, resulting in slower file access. Because of its ability to use small
clusters, NTFS provides good performance on large drives. Finally, the NTFS file system supports hot
fixing, a process through which bad sectors are automatically detected and marked so that they will
not be used.
The Ext2 or second extended file system was the standard file system used on the Linux operating
system for a number of years and remains in wide use. It was initially designed by Rémy Card based on
concepts from the extended file system. It is quite fast, enough so that it is used as the standard against
which to measure many benchmarks. Its main drawback is that it is not a journaling file system. The
Ext2 file system supports a maximum disk or partition size of 4 terabytes. Its successor, Ext3, has a
journal and is compatible with Ext2.
The Ext3 or third extended file system is a journaled file
system that is coming into increasing use among users
of the Linux operating system. Although its performance
and scalability are less attractive than those of many of
its competitors such as ReiserFS and XFS, it does have
the significant advantage that users can upgrade from
the popular Ext2 file system without having to back up
and restore data.
The Ext3 file system adds a journal to what is otherwise
a valid Ext2 file system. An Ext3 file system
can be mounted and used as an Ext2 file system. All of
the file system maintenance utilities for maintaining and
repairing the Ext2 file system can also be used with the
Ext3 file system, which means Ext3 has a much more
mature and well-tested set of maintenance utilities available than do its rivals.
The ReiserFS is a general-purpose computer file system designed and implemented by a team at Namesys
led by Hans Reiser. It is currently supported by Linux and may be included in other operating systems
in the future. Introduced with version 2.4.1 of the Linux kernel, it was the first journaling file system
to be included in the standard kernel.
The most publicized advantage over what was the stock Linux file system at the time, Ext2, is that
it uses a transaction journal to record changes to file system structures. The journal allows the file
system to quickly return to a consistent state after an unscheduled system shutdown caused by a
power outage or a system crash. This feature greatly reduces the risk of file system corruption (and
the need for lengthy file system checks). ReiserFS also handles directories containing huge numbers
of small files very efficiently. Unfortunately, converting a system to ReiserFS requires users of Ext2
to completely reformat their disks, which is a disadvantage not shared by its main competitor Ext3.
Because of its advantages, many Linux distributions have made it the default file system.
HFS and HFS+
HFS Plus or HFS+ is a file system developed by Apple Computer to replace its Hierarchical File
System (HFS) as the primary file system used on Macintosh computers. It is also one of the formats
used by the iPod hard-disk based music player. HFS Plus was introduced with the January 19, 1998,
release of Mac OS 8.1. This format is also referred to as Mac OS Extended.
HFS Plus is an improved version of HFS, supporting much larger files
(64-bit length instead of 32 bit) and using Unicode (instead of MacRoman) for naming the items (files, folders). HFS Plus permits filenames
up to 255 characters in length. HFS Plus also uses a full 32-bit allocation
mapping table rather than HFS’s 16 bits. This was a serious limitation
of HFS, meaning that no disk could support more than 65,536 allocation blocks
under HFS. When disks were small, this was of little consequence, but
as they started to approach the 1 GB mark, it meant that the smallest
amount of space that any file could occupy (a single allocation block) became
excessively large, wasting significant amounts of disk space. Like HFS,
HFS Plus uses B-trees to store most volume metadata.
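The 65,536-unit limit can be turned around to show how large the smallest allocation unit has to become as volumes grow. This is a sketch under two assumptions: that the volume is divided into at most 2^16 equal allocation units (2^32 for HFS Plus), and that each unit is a whole number of 512-byte sectors. The function name is illustrative.

```python
SECTOR_BYTES = 512


def min_allocation_unit(volume_bytes: int, max_units: int = 2 ** 16) -> int:
    """Smallest allocation unit (bytes) that lets max_units units cover the volume."""
    raw = -(-volume_bytes // max_units)             # ceiling division
    return -(-raw // SECTOR_BYTES) * SECTOR_BYTES   # round up to whole sectors


if __name__ == "__main__":
    GiB = 2 ** 30
    for size in (GiB, 4 * GiB):
        hfs = min_allocation_unit(size)                 # 16-bit table (HFS)
        hfs_plus = min_allocation_unit(size, 2 ** 32)   # 32-bit table (HFS Plus)
        print(f"{size // GiB} GB volume: HFS {hfs} bytes, HFS Plus {hfs_plus} bytes")
```

On a 1 GB volume, even a 1-byte file occupies 16 kB under the 16-bit scheme, while the 32-bit table lets the unit stay at a single sector.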
With the release of the 10.2.2 update on November 11, 2002, Apple added optional journaling features
to HFS Plus for improved data reliability. These features were easily accessible in the Mac OS X Server,
but were only accessible through the command line in the standard desktop client. However, in 2003
Mac OS X version 10.3 set all HFS Plus volumes on all Macs to be journaled by default.
XFS is a high-performance journaling file system created by SGI (Silicon Graphics Inc.) for their
Irix Unix implementation. In May 2000, SGI released XFS under an open source license. It comes by
default with the 2.5.xx and 2.6.xx versions of the Linux kernel, but it was not available for the 2.4.xx
kernel, except as a patch, until version 2.4.25, when it was stable enough to be merged into the main kernel tree.
The UNIX file system (UFS) is used by many Unix operating systems. It is derived from the Berkeley
Fast File System (FFS), which itself was originally developed from FS in the first versions of UNIX
developed at Bell Labs.
Nearly all BSD Unix derivatives including FreeBSD, NetBSD, OpenBSD, NeXTStep, and Solaris, use
a variant of UFS. In Mac OS X it is available as an alternative to HFS. In Linux, partial UFS support
is available, and the native Linux Ext2 file system is derived from UFS.
Mac OS X is the latest version of the Mac OS operating system for Macintosh computers. Developed
and published by Apple Computer, it provides the stability of a Unix operating environment and adds
popular features of the traditional Macintosh user interface. The operating system was first commercially released in 2001.
The type of connection between the hard drive and the system (motherboard and CPU) is defined by
one of a few standards.
The most popular are the Enhanced Integrated Drive Electronics drives (EIDE), which are also known as Advanced
Technology Attachment (ATA) drives.
Another format that used to be very popular, but is now
used much less, is the Small Computer System Interface
(SCSI). The reason for having fewer SCSI drives in CCTV is that
the ATA drives have become comparable in speed and reliability, at a lower cost.
The ATA drives dominate the PC industry, and this is the
case with the DVRs as well.
Most modern PCs can talk to up to four EIDE drives
without any additional hardware. This is because the
EIDE controller is usually embedded in the motherboard.
The ATA (EIDE) parallel standard
Although this could also be the case with the SCSI controllers, it is not so frequent, especially in the
last few years when the speed of ATA drives has become comparable to SCSI.
This is why most DVRs in CCTV can have up to four internal hard drives, providing, of course,
there is physical space for them.
The evolution of SCSI standards
Current EIDE drives generally conform
to the ATA/100 or ATA/133 specification.
The 100 in ATA/100 indicates that up to
100 MB/s (megabytes per second) can be
transferred from the drive to the system
in short bursts, and similarly the ATA/133
indicates up to a 133 MB/s burst transfer
rate. It should be noted that the sustained
transfer rate is usually around half of
the burst rate.
Typically, only servers or large storage
machines use SCSI drives, which cost
much more and require an interface card.
SCSI is designed to “talk to” more than
four drives (usually up to 16 devices, one
of which is the SCSI card). This is one
more reason for using SCSI with larger
storage capacity machines, although it is a costlier option.
The various SCSI connectors
There are a few generations of SCSI standard; usually the latest has the fastest transfer speed.
The prevailing SCSI specifications, Ultra160 and Ultra320, support faster burst transfer rates of 160 MB/s and
320 MB/s, respectively.
RAID works with redundancy and uses hot swappable ATA drives.
Lately, with the increased demand for more storage but also increased redundancy, devices called RAID
are becoming more popular in CCTV.
RAID stands for Redundant Arrays of Inexpensive Disks, and the name describes its concept. It
combines multiple small, inexpensive disk drives (usually ATA) into an array of disk drives that yields
performance or redundancy. If one drive fails, data is not lost and there is usually a way to remove it
and replace it while running. RAID usually requires interface electronics, like the SCSI, and this array
of drives appears to the computer as a single logical storage unit or drive.
There are two considerations in selecting the right hard drives for RAID usage: drive capacity and
rotational speed. Today’s interfaces correspond exclusively to UltraATA/100 or even UltraATA/133,
so they are always fast enough. High rotational speeds allow maximum data transfer rates and minimal
access times, but they are accompanied by an increase in both heat and operating noise. RAID can
principally be used with any hard drive.
Six types of array architectures are known today, RAID-1 through RAID-6, and each provides disk
fault-tolerance (redundancy), with different tradeoffs in features and performance. In addition to these
six redundant array architectures, it has become popular to refer to a nonredundant array of disk drives
as a RAID-0 array. The following is a summary of the seven different RAID versions:
RAID-0 (Striping)
This is the fastest and most efficient array type but offers no fault-tolerance, that is, no
redundancy. So, technically speaking, RAID mode 0 does not adhere to the principles of a
RAID. Hence RAID-0 offers no advantages in terms of security. All of the data are evenly distributed to all of the existing drives, which is usually referred to as a stripe set. The only benefit
of RAID-0 is speed – the data transfer rate is ideally multiplied by the number of drives. If even one of
those drives crashes, however, all of the stored data will be lost.
RAID-1 (Mirroring)
RAID-1 is basically the complete opposite of RAID-0. The goal here is not to boost performance but to ensure data security. When reading or writing data, all drives of the array are used
simultaneously. Hence, data is written synchronously to two or more drives, which is equivalent
to a perfect backup copy – perfect because the data is always 100% up to date. RAID-1 is the
array of choice for performance-critical, fault-tolerant environments.
RAID-2 (Striping)
Striping in RAID-2 is based on the same principle as RAID-0: the stripe set distributes the
data to all drives, though not in block form, but, rather, on a bit level. This is necessary because
an Error Correcting Code (ECC) is implemented in all transaction data. Additional hard drives
are necessary to store the resulting additional volume. If you wanted to guarantee complete
data security, you would have to deploy at least 10 data disks and 4 ECC disks. The next level
would entail 32 data disks and 7 ECC disks. This explains why RAID-2 never caught on.
9. Digital video
RAID-3 (Data Striping, Dedicated Parity)
RAID-3 incorporates prudent error correction. Data is allocated byte by byte to several hard
drives, whereas the parity data is stored in a separate drive. This is exactly the disadvantage
of RAID-3, as the parity drive has to be accessed with every access. So the advantage of RAID,
bundling the disk performance by distributing access, is partially offset. RAID-3 needs a minimum of three drives and it requires quite a complex controller, which is why RAID-3, similar to
levels 4 and 5, never caught on in the mass market. RAID-3 drives are used in data-intensive or
single-user environments that access long sequential records to speed up data transfer. However,
RAID-3 does not allow multiple I/O operations to be overlapped and requires synchronized-spindle drives in order to avoid performance degradation with short records.
RAID-4 (Data Striping, Dedicated Parity)
The technology of RAID-4 is similar to that of RAID-3, except that the individual stripes
are not written in bytes but in blocks. In theory, this should speed things up, but the parity
drive still remains the bottleneck. So, RAID-4 offers no advantages over RAID-5 and does not
support multiple simultaneous write operations.
RAID-5 (Distributed Data, Distributed Parity)
RAID-5 is generally considered the best compromise between data security and performance. Not only the data, but also the parity information, is distributed to all the existing drives.
The resulting advantage is that RAID-5 is only a bit slower than RAID-3. However, failure safety
is limited, as only one hard drive can safely crash. At least three hard drives are required in
each case. RAID-5 is a good choice in multi-user environments that are not write-performance sensitive.
RAID-6 (Distributed Data, Distributed Parity)
RAID-6 is very similar to RAID-5, except that twice the amount of parity information is
stored. Although this cuts down on performance a bit, it allows up to two hard drives to crash.
It does, however, require a minimum of five drives.
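The capacity and fault-tolerance trade-offs of the main levels can be summarized numerically. This is a minimal sketch, assuming identical drives and the usual idealized formulas (controller overheads and hot spares are ignored). It does not check the minimum drive counts given above, and the function name is hypothetical.

```python
def raid_summary(level: int, drives: int, drive_gb: int) -> tuple[int, int]:
    """Return (usable capacity in GB, number of drive failures survived)."""
    if level == 0:   # striping: all capacity, no redundancy
        return drives * drive_gb, 0
    if level == 1:   # mirroring: every drive holds the same data
        return drive_gb, drives - 1
    if level == 5:   # one drive's worth of distributed parity
        return (drives - 1) * drive_gb, 1
    if level == 6:   # two drives' worth of distributed parity
        return (drives - 2) * drive_gb, 2
    raise ValueError(f"RAID-{level} is not covered by this sketch")


if __name__ == "__main__":
    for level, drives in ((0, 4), (1, 2), (5, 4), (6, 5)):
        usable, survives = raid_summary(level, drives, 250)
        print(f"RAID-{level}, {drives} x 250 GB: {usable} GB usable, "
              f"survives {survives} failure(s)")
```

Comparing RAID-5 with four drives and RAID-6 with five shows the same 750 GB of usable space, bought with one extra drive in exchange for surviving a second failure.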
RAID is becoming more popular in CCTV because it offers extended recording time and redundancy.
Most commonly used is RAID-5, although some manufacturers offer RAID-1 mirroring.
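Parity in RAID-3, 4, and 5 is computed as the bitwise exclusive OR (XOR) of the data blocks in a stripe, which is what allows any single lost block to be rebuilt from the survivors. The sketch below illustrates that principle with tiny byte strings. The rotating parity position shown for RAID-5 is only one common layout convention (controllers differ), and all names are illustrative.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """XOR of all surviving blocks (data + parity) reproduces the lost block."""
    return xor_parity(surviving)


def parity_drive(stripe: int, drives: int) -> int:
    """RAID-5: parity rotates across drives, here from the last drive backwards."""
    return (drives - 1 - stripe) % drives


if __name__ == "__main__":
    data = [b"CAM1", b"CAM2", b"CAM3"]        # one stripe on three data drives
    parity = xor_parity(data)                 # stored on the fourth drive
    recovered = rebuild_missing([data[0], data[2], parity])  # second drive failed
    print(recovered)                          # b'CAM2'
    print([parity_drive(s, 4) for s in range(4)])            # [3, 2, 1, 0]
```

Because the parity position changes from stripe to stripe, no single drive has to absorb every parity write, which is the bottleneck the text attributes to the dedicated parity drive of RAID-3 and RAID-4.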
The latest standard interface between a computer motherboard and hard disks is the Serial ATA (SATA).
The SATA has evolved from the legacy parallel ATA standard, and it has three main advantages over
its predecessor: speed, cable management, and hot-swappability.
Initially, Serial ATA was released at 150 MB/s, but it is designed to scale up quite substantially from
there. It is expected that SATA v.2.0 will double throughput to 300 MB/s, and 600 MB/s is planned
for around 2007. The current SATA transfer rate of 150 MB/s is still only 17 MB/s faster than the fastest parallel ATA interface ATA/133. Parallel buses have difficulty in reaching ever higher speeds due
to problems keeping all the data lines in sync. Serial ATA avoids this by sending the data serially. Still,
the need for such a high-speed interface
could be debated, as hard disks, being
mechanical devices, are almost always
the bandwidth bottleneck.
Physically, the cables used are the largest change. The data is carried by a light
flexible seven-conductor wire with 8 mm
wide wafer connectors on each end. It can
be anywhere up to 1 meter long. Compared to the short (45 cm), ungainly 40
or 80 conductor ribbon cables of parallel
ATA, this will come as a great relief to
system builders. In addition, airflow and
therefore cooling in equipment will be
improved. The concept of a master-slave
relationship between devices has been
dropped. SATA has only one device per cable. The connectors are keyed so that it should no longer be
possible to install cable connectors upside down, which often is a problem with IDE types.
Serial ATA hard drive connectors
Native SATA hard disks also require a different power connector
as part of the standard. It is wafer based but wider than the data
cable, so it should not be possible to confuse the two. Fifteen pins
are used to supply three different voltages if necessary – 3.3 V,
5 V, and 12 V. The same physical connections are used on 3.5"
and 2.5" (notebook) hard disks. In the transitional period between
parallel and serial ATA, various adapters are planned to convert
one to the other. To perform the serial to parallel translation or
vice versa, a bridge is used. There is a noticeable performance
penalty for such an arrangement, however, and tests conducted
in early 2003 show throughput reduced around 30 to 50%. Many
hard drive manufacturers, however, now produce native Serial
ATA hard drives.
Serial ATA cables
MTBF (Mean Time Between Failure)
A majority of hard disk manufacturers quote MTBF (Mean Time Between Failure) numbers for their
hard drives. Typical hard disk MTBF numbers are anything between 300,000 and 1,000,000 hours.
This is quite a high number, equivalent to roughly 34 to 114 years. These numbers are theoretical rather
than a guarantee. Technology does not allow hard drives to be used for more than a couple of years;
they are quickly outdated, but mathematical calculation and statistical experience offer an important
indicator about the hard disk life quality and life expectancy.
Practice has shown that hard drives fail sooner than their MTBF predicts. Some of the main reasons
(apart from the quality of manufacture) are, as we already mentioned, physical mistreatment (shocks
and vibrations), temperature (insufficient cooling), and dust.
The MTBF is based on a simple exponential distribution of failure probability:
$$\text{Failure Probability} = 1 - e^{-t/M}$$
where e is the base of the natural logarithm (e ≈ 2.718), t is the time for which this probability is calculated, and
M is the MTBF. So, for example, for a 500,000 hour MTBF drive there is a 1% probability it will fail in
7 months, 5% in 3 years, 10% in 6 years, and 50% in 40 years.
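The percentages quoted for the 500,000 hour drive can be checked by evaluating the exponential model directly. This is a minimal sketch, assuming a year of 365.25 days (8,766 hours) and a month of one twelfth of a year; the names are illustrative.

```python
import math

HOURS_PER_YEAR = 365.25 * 24  # 8,766 hours


def failure_probability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of failure by time t under the exponential model 1 - e^(-t/M)."""
    return 1.0 - math.exp(-t_hours / mtbf_hours)


if __name__ == "__main__":
    mtbf = 500_000
    periods = (("7 months", 7 / 12), ("3 years", 3), ("6 years", 6), ("40 years", 40))
    for label, years in periods:
        p = failure_probability(years * HOURS_PER_YEAR, mtbf)
        print(f"{label:>8}: {p:.1%}")
    # The quoted MTBF range expressed in years
    print(300_000 / HOURS_PER_YEAR, 1_000_000 / HOURS_PER_YEAR)
```

The results come out at about 1.0%, 5.1%, 10.0%, and 50.4%, in line with the rounded figures above. Note that the model assumes a constant failure rate, which is exactly what heat, vibration, and dust undermine in practice.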
10. Transmission media
Once the image has been captured by a lens and a camera and then converted into an electrical signal,
it is further taken to a switcher, a monitor, or a recording device.
In order for the video signal to get from point A to point B, it has to go through some kind of transmission medium. The same applies to the control-data signal.
The most common media for video and data transmission in CCTV are as follows:
• Coaxial cable
• Twisted pair cable
• Microwave link
• RF open-air transmission
• Infrared link
• Telephone line
• Fiber optics cable
• Network
For video transmission, a coaxial cable is most often used, but fiber optics is becoming increasingly
popular with its superior characteristics. Mixed means of transmission are also possible, such as video
via microwave and PTZ control data via twisted pair, for example.
We will go through all of them separately, but we will pay special attention in this chapter to the coaxial
cable and fiber optics transmission. Network transmission has become so important to CCTV in the
last half a dozen years that we have a separate chapter dedicated to this topic.
A variety of fiber optic cables
Coaxial cables
The concept
The coaxial cable is the most common medium for transmission of video signals and sometimes video
and PTZ data together. It is also known as unbalanced transmission, which comes from the concept
of the coaxial cable (sometimes called “coax” for short).
A cross section of a coax is shown to the
right. It is of a symmetrical and coaxial
construction. The video signal travels
through the center core, while the shield
is used to common the ground potential
of the end devices – the camera and
the monitor, for example. It not only
commons the ground potential, but also
serves to protect the center core from
external and unwanted electromagnetic
interference (EMI).
The idea behind the coaxial concept is to have all the unwanted EMI induced in the shield only. When
this is properly grounded, it will discharge the induced noise through the grounds at the camera and
monitor ends. Electrically, the coaxial cable closes the circuit between the source and the receiver,
where the coax core is the signal wire, while the shield is the grounding one. This is why it is called
an unbalanced transmission.
Noise and electromagnetic interference
How well the coax shield protects the center core from noise and EMI depends on the percentage
of the screening. Typically, numbers between 90 and 99% can be found in the cable manufacturer’s
specifications. Keep in mind, however, that even if the screening is 100%, it is not possible to have
100% protection from external interference. The penetration of EMI inside the coax depends on the frequency of the interference.
Theoretically, only frequencies above 50 kHz are successfully suppressed, and this is due mostly to
the skin-effect attenuation. All frequencies below this will induce current in smaller or bigger form.
The strength of this current depends on the strength of the magnetic field. Our major concern would
be, obviously, the mains frequency (50 or 60 Hz) radiation, which is present around almost all artificial
This is why we could have problems running a coaxial cable parallel to the mains. The amount of
induced electromagnetic voltage in the center core depends first on the amount of current flowing
through the mains cable, which obviously depends on the current consumption on that line. Second,
it depends on how far the coax is from the mains cable. And last, it depends on how long the cables
run together. Sometimes 100 m might have no influence, but if strong current is flowing through the
mains cable, even a 50 m run could have a major influence. When installing, try (whenever possible)
not to have the power cables and the coaxial cables very close to each other; at least 30 cm would be
sufficient to notably reduce the EMI.
The visual appearance of the induced (unwanted) mains frequency is a few thick horizontal bars slowly
scrolling either up or down. The scrolling frequency is determined by the difference between the video
field frequency and the mains frequency and can be anything from 0 to 1 Hz. This results in stationary
or very slow-moving bars on the screen.
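The drift rate of the hum bars is simply the beat between the field frequency and the mains frequency. This is a sketch, assuming the camera's field rate sits exactly at its nominal value. The NTSC color field rate of 60/1.001 Hz (about 59.94 Hz) is a standard figure, while the 49.9 Hz mains value is only an illustrative example of mains frequency drifting from nominal.

```python
import math

NTSC_FIELD_HZ = 60 / 1.001   # about 59.94 Hz
PAL_FIELD_HZ = 50.0


def hum_bar_drift(field_hz: float, mains_hz: float) -> tuple[float, float]:
    """Return (beat frequency in Hz, seconds until the bar pattern repeats its position)."""
    beat = abs(field_hz - mains_hz)
    period = math.inf if beat == 0 else 1.0 / beat
    return beat, period


if __name__ == "__main__":
    cases = ((NTSC_FIELD_HZ, 60.0), (PAL_FIELD_HZ, 50.0), (PAL_FIELD_HZ, 49.9))
    for field, mains in cases:
        beat, period = hum_bar_drift(field, mains)
        print(f"field {field:.2f} Hz, mains {mains} Hz: "
              f"{beat:.3f} Hz, repeats every {period:.1f} s")
```

An NTSC camera on 60 Hz mains gives a beat of about 0.06 Hz, so the bars take roughly 17 seconds to cycle, while a PAL camera whose field rate matches the mains exactly shows stationary bars.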
Other frequencies will be seen as various noise patterns, depending on the source. A rule of thumb is
that the higher the frequency of the induced unwanted signal, the finer the pattern on the monitor will be.
Intermittent induction, like lightning or cars passing by, will be shown as an irregular noise pattern.
Characteristic impedance
Short wires and cables used in an average electronic piece of equipment have negligible resistance,
inductance, and capacitance, and they do not affect the signal distribution. If a signal, however, needs
to be transmitted for a longer distance, many factors add up and contribute to the complex picture
of such transmission media. This especially influences high-frequency signals. Then, the resistance,
inductance, and capacitance play a considerable role and visibly affect the transmission.
A simple medium like the coaxial cable, when analyzed by electromagnetic theory, is approximated
with a network of resistors (R), inductors (L), capacitors (C), and conductances (G) per unit length (as
shown on the diagram on the previous page). For short cable runs this network has a negligible influence
on the signal, but for longer runs it becomes noticeable. In such a case the network of R, L, and C
elements becomes so significant that it acts as a crude low-pass filter that, in turn, affects the amplitude
and phase of the various components in the video signal. The higher the frequencies of the signal, the
more they are affected by these nonideal cable properties.
Each cable is uniformly built and has its own characteristic impedance, which is defined by the R,
L, C, and G per unit length.
The main advantage of unbalanced video transmission (which will be shown a little later) is that the
characteristic impedance of the medium is independent of frequency (this refers mainly to the mid and
high frequencies), while the phase shift is proportional to the frequency.
The amplitude and phase characteristics of the coax at low frequencies are strongly dependent on the
frequency itself, but since the cable length in such cases is short compared to the signal wavelength,
the influence on the signal is negligible.
When the characteristic impedance of the coaxial cable is matched to the video source output impedance
and the receiving unit input impedance, maximum energy is transferred between the source and the
receiver.
For high-frequency signals such as video, impedance matching is of paramount importance. When the
impedance is not matched, all or part of the video signal is reflected back to the source, affecting not
only the output stage itself but also the picture quality. A 100% reflection of the signal occurs when the
end of the cable is either short-circuited or left open. The total energy of the signal (voltage × current)
is transferred only when the source, transmission medium, and receiver are matched. This is why we
insist that the last element in the video signal chain should always be terminated with 75 Ω (the symbol
Ω stands for ohms).
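The short-circuit and open-circuit cases above follow from the standard voltage reflection coefficient, Γ = (Z_load − Z0)/(Z_load + Z0). The sketch below evaluates it for a 75 Ω line. The 50 Ω load and the 37.5 Ω "double-terminated" load (two 75 Ω terminations in parallel, e.g. a loop-through left terminated as well as the end of the line) are illustrative assumptions, not cases taken from the text.

```python
import math


def reflection_coefficient(z_load: complex, z0: float = 75.0) -> complex:
    """Voltage reflection coefficient (Gamma) where a line of impedance z0 meets z_load."""
    if z_load == math.inf:
        return 1.0  # open circuit: the whole signal comes back in phase
    return (z_load - z0) / (z_load + z0)


def reflected_power_fraction(z_load: complex, z0: float = 75.0) -> float:
    """Fraction of the incident power reflected back toward the source, |Gamma|^2."""
    return abs(reflection_coefficient(z_load, z0)) ** 2


if __name__ == "__main__":
    cases = [("matched 75 ohm", 75.0), ("short circuit", 0.0), ("open end", math.inf),
             ("50 ohm load", 50.0), ("double-terminated", 37.5)]
    for label, z in cases:
        g = reflection_coefficient(z)
        print(f"{label:18s} Gamma = {g.real:+.3f}, reflected power = {reflected_power_fraction(z):.1%}")
```

A short gives Γ = −1 and an open end gives Γ = +1, both total reflection; only the matched 75 Ω load gives Γ = 0.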
In CCTV, 75 Ω is taken as the characteristic impedance for all the equipment producing or receiving
video signals. This is why coaxial cable for CCTV is made with a 75 Ω characteristic impedance. This
does not prevent manufacturers from producing, say, 50 Ω equipment (which used to be the case with
some broadcast or RF equipment), but then impedance converters (passive or active) need to be used
between such sources and 75 Ω recipients.
Impedance matching is also done with the twisted pair transmitters and receivers, which will be
discussed later in this chapter.
The 75 Ω of the coax is a complex impedance, defined by the voltage/current ratio at each point
of the cable. It is not a pure resistance, and therefore it cannot be measured with an ordinary multimeter.
To calculate the characteristic impedance, we will make use of electromagnetic theory as mentioned
earlier and represent the cable with its equivalent network, composed of R, L, C, and G per unit
length. This network, as shown on the schematic diagram previously, has an impedance of:

$$Z_0 = \sqrt{\frac{R + j\omega L}{G + j\omega C}}$$

where, as already explained, R is the resistance, L is the inductance, G is the conductance, and C is
the capacitance between the center core and the shield, per unit length. The symbol j represents the
imaginary unit (square root of –1), which is used when representing complex impedance, ω = 2πf,
where f is the frequency.
If the coaxial cable is of a reasonably short length (less than a couple of hundred meters), R and G can
be ignored, which brings us to the simplified formula for the coax impedance:

$$Z_0 = \sqrt{\frac{L}{C}} \tag{49}$$

This formula simply means that the characteristic impedance does not depend on the cable length
and frequency but on the capacitance and inductance per unit length. This is not true, however,
when the length of a cable like RG-59/U exceeds a couple of hundred meters. The resistance and the
capacitance then become significant, and they do affect the video signal. For reasonably short lengths,
however, the above approximation is pretty good.
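The two impedance formulas can be compared numerically. The sketch below assumes illustrative per-meter values for an RG-59-type cable (L = 0.377 µH/m, C = 67 pF/m, R = 0.1 Ω/m, G = 0). These are typical-order figures, not values from the text, and R is held constant for simplicity even though in a real cable it rises with frequency because of the skin effect.

```python
import cmath
import math


def z0_full(r: float, l: float, g: float, c: float, f_hz: float) -> complex:
    """Characteristic impedance sqrt((R + jwL) / (G + jwC)), all per unit length."""
    w = 2 * math.pi * f_hz
    return cmath.sqrt((r + 1j * w * l) / (g + 1j * w * c))


def z0_lossless(l: float, c: float) -> float:
    """Simplified characteristic impedance sqrt(L / C), ignoring R and G."""
    return math.sqrt(l / c)


# Assumed, illustrative per-meter values for an RG-59-type cable (not from the text).
R_PER_M = 0.1       # ohm/m, held constant here for simplicity
L_PER_M = 0.377e-6  # H/m
G_PER_M = 0.0       # S/m, dielectric leakage neglected
C_PER_M = 67e-12    # F/m

if __name__ == "__main__":
    print(f"lossless approximation: {z0_lossless(L_PER_M, C_PER_M):.1f} ohm")
    for f in (1e3, 100e3, 5e6):
        z = z0_full(R_PER_M, L_PER_M, G_PER_M, C_PER_M, f)
        print(f"{f / 1e3:>8.0f} kHz: |Z0| = {abs(z):6.1f} ohm")
```

With these assumed values, √(L/C) comes out at about 75 Ω. The full formula agrees closely at video frequencies of a few megahertz, but departs strongly at very low frequencies, where R is no longer small compared with ωL. This matches the earlier remark about the coax at low frequencies.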
The cable limitations we have are mainly a result of the accumulated resistance and capacitance, which
become so high that the approximation (49) is no longer valid and the signal is distorted considerably.
This distortion shows up mainly as voltage drop, high-frequency loss, and group delay.
The most commonly used coaxial cable in CCTV is the RG-59/U, which can successfully transfer B/W
signals up to 300 m and color signals up to 200 m without in-line correctors.
The other popular cable is the RG-11/U, which is thicker and more expensive. Its maximum
recommended lengths are up to 600 m for a B/W signal and 400 m for a color signal. Note that these
numbers may vary with different manufacturers and signal quality expectations.
There are also thinner 75 Ω coaxial cables, only 2.5 mm in diameter, and even coax ribbon cables.
They are very practical for crowded areas with many video signals, such as matrix switchers with
many inputs. Their maximum cable run is much shorter than that of the thicker cables, but sufficient
for links and patches.
The difference between the B/W and color maximum runs is due to the color subcarrier of 4.43 MHz for
PAL or 3.58 MHz for NTSC. Since a long coaxial cable acts as a low-pass filter, the color information
near the top of the video band is attenuated before the lower-frequency picture details, so color is lost
first.
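To see why the subcarrier suffers first, the sketch below scales an attenuation figure from one frequency to another. It assumes that the loss is dominated by the conductors (skin effect), so that attenuation in decibels grows roughly with the square root of frequency. The reference figure of 3 dB at 1 MHz is hypothetical and serves only to show the ratio.

```python
import math


def scaled_attenuation_db(ref_db: float, ref_hz: float, f_hz: float) -> float:
    """Scale a known attenuation (dB) to another frequency.

    Assumes conductor (skin-effect) loss dominates, so dB loss is roughly
    proportional to sqrt(frequency). Dielectric loss is ignored.
    """
    return ref_db * math.sqrt(f_hz / ref_hz)


if __name__ == "__main__":
    ref_db, ref_hz = 3.0, 1e6  # hypothetical: 3 dB at 1 MHz for some cable run
    for label, f in [("1 MHz detail", 1e6), ("NTSC subcarrier", 3.58e6), ("PAL subcarrier", 4.43e6)]:
        print(f"{label:16s} {scaled_attenuation_db(ref_db, ref_hz, f):5.2f} dB")
```

Under this assumption, the PAL subcarrier loses about √4.43 ≈ 2.1 times as many decibels as 1 MHz picture detail on the same run.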
If longer runs are required, additional devices can be used to equalize and amplify the video
spectrum. Such devices are known as in-line amplifiers, cable equalizers, or cable correctors.
Depending on the amplifier (and cable) quality, double or even triple lengths are achievable.
In-line amplifiers are best used in the middle of the cable run because of the more acceptable S/N
ratio, but this is quite often impossible or impractical due to the need for a power supply and
somewhere to house the unit. So, the majority of in-line amplifiers available in CCTV are designed
to be used at the camera end, in which case we actually have pre-equalization and pre-amplification
of the video signal. There are, however, devices that are used at the monitor end, and they have 1 Vpp
output with post-equalization of the video bandwidth.
Miniature coaxial cable can save a lot of space and improve accessibility.
Starting from the above theoretical explanation of the impedance, it can be seen that the cable uniformity
along its length is of great importance for fulfilling the characteristic impedance requirements. The
cable quality depends on the precision and uniformity of the center core, the dielectric, and the
shield. These factors define the C and L values of the cable, per unit length. This is why careful
attention should be paid to the running of the cable itself and its termination. Sharp loops and
bends affect the cable uniformity and consequently the cable impedance. This results in high-frequency
losses (i.e., fine picture detail loss), as well as double images due to signal reflections. So, if a short
and good-quality cable is improperly run, with sharp bends and kinks, the picture quality will still be
far from perfect.
Bend loops with a diameter no smaller than 10 times the diameter of the coax are suggested for best
performance. This is equivalent to saying that the bending radius should not be smaller than 5 times
the diameter, or 10 times the radius, of the cable. This means an RG-59/U cable should not be bent in
a loop with a diameter smaller than 6 cm (2.5''), and an RG-11/U should not be bent in a loop smaller
than 10 cm (4'') in diameter.
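The bending rule above reduces to simple arithmetic. The sketch below uses the RG-59/U overall diameter of 6.15 mm quoted later in this chapter. The 10 mm figure for RG-11/U is an assumed round value; check the actual cable data sheet.

```python
def min_bend(cable_diameter_mm: float) -> tuple[float, float]:
    """Return (minimum bending radius, minimum loop diameter) in mm.

    Rule of thumb from the text: radius >= 5 x cable diameter,
    i.e. loop diameter >= 10 x cable diameter.
    """
    radius = 5 * cable_diameter_mm
    return radius, 2 * radius


if __name__ == "__main__":
    # 6.15 mm is quoted in the text for RG-59/U; 10 mm for RG-11/U is an assumption.
    for name, d in [("RG-59/U", 6.15), ("RG-11/U", 10.0)]:
        r, loop = min_bend(d)
        print(f"{name}: radius >= {r:.1f} mm, loop diameter >= {loop:.1f} mm")
```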
Copper is one of the best conductors for a coaxial cable. Only silver conducts better, and gold resists
corrosion better, but both are too expensive to be used for cable manufacturing. A lot of people believe
that copper-plated steel makes a better cable, but this is not correct. Copper-plated steel can only be
cheaper and perhaps stiffer, but for longer lengths in CCTV, copper is the better choice. Copper-plated
steel coaxial cables are acceptable for master antenna (MATV) installations, where the transmitted
signals are RF modulated (VHF or UHF). At such high frequencies the so-called skin effect confines
the signal current to the copper-plated surface of the conductor (the center conductor, not the shield),
so the steel underneath carries little of it. CCTV signals, as explained, are baseband signals extending
down to low frequencies, where the current penetrates into the higher-resistance steel. This is why a
copper-plated steel coaxial cable might be okay for RF signals but not necessarily for CCTV, and one
should always look for a copper coaxial cable.
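The skin effect argument can be quantified with the standard skin-depth formula δ = √(ρ / (π f μ0 μr)). The sketch below assumes copper with ρ = 1.68 × 10⁻⁸ Ω·m at room temperature and μr = 1. Steel has a much higher resistivity and permeability, so these figures apply only to the copper layer.

```python
import math

MU0 = 4 * math.pi * 1e-7   # permeability of free space, H/m
RHO_COPPER = 1.68e-8       # assumed copper resistivity at room temperature, ohm*m


def skin_depth_m(f_hz: float, rho: float = RHO_COPPER, mu_r: float = 1.0) -> float:
    """Depth at which current density falls to 1/e of its surface value."""
    return math.sqrt(rho / (math.pi * f_hz * MU0 * mu_r))


if __name__ == "__main__":
    # 15 625 Hz is the PAL line rate; 500 MHz stands in for a UHF MATV channel.
    for label, f in [("PAL line rate", 15_625), ("1 MHz", 1e6), ("5 MHz", 5e6), ("UHF 500 MHz", 500e6)]:
        print(f"{label:14s} {skin_depth_m(f) * 1e6:9.1f} um")
```

At UHF the current flows within a few micrometers of the surface, so a copper plating carries nearly all of it. At the low-frequency end of a baseband video signal, the skin depth in copper is a substantial fraction of a millimeter, which is comparable to the radius of a thin center conductor. In a copper-plated steel conductor, much of the current is then pushed into the steel.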
BNC connectors
A widely accepted coaxial cable termination in CCTV is the BNC connector. BNC stands for Bayonet
Neill-Concelman, named after its designers. There are three types: screw-on, solder, and crimp.
Crimp BNCs have proven to be the most reliable of all. They require specialized and expensive
stripping and crimping tools, but it pays to have them. Of the many installations done in the
industry, more than 50% of problems have proven to be a result of bad or incorrect termination.
An installer does not have to know or understand all the equipment used in a system (which will be
commissioned by the designer or the supplier), but if he or she does proper cable runs and terminations,
it is almost certain that the system will perform at its best.
There are various BNC products available on the market, of which the male plug is the most common.
Female sockets are also available, as well as right-angle adaptors, BNC-to-BNC adaptors (often called
“barrels”), 75 Ω terminators (or “dummy loads”), BNC-to-other-type video connectors, and so on.
Breaking the cable in the middle of its length and terminating it will contribute to some signal loss,
especially if the termination and/or BNCs are of bad quality. A good termination can result in a loss as
small as 0.3 to 0.5 dB. If there are not too many of them in one cable run, this is an insignificant loss.
There are silver-plated and even gold-plated BNC connectors designed to minimize the contact
resistance and protect the connector from oxidation, which is especially critical near the coast (salt
water and air) or in heavily industrialized areas.
A good BNC connector kit should include a gold-plated or silver-plated center tip, a BNC shell body,
a ring for crimping the shield, and a rubber sleeve (sometimes called a “strain relief boot”) to protect
the connector’s end from sharp bends and oxidation.
Coaxial cables and proper BNC termination
Avoid terminating a coaxial cable with electrical cutters or pliers. Stripping the coaxial cable to the
required length using electrical cutters is very risky. First, small pieces of copper fall around the center
core, and one can never be sure that a short circuit will not occur; even if they do not short circuit the
core and the shield, they change the impedance. Second, using normal pliers for fixing the BNC to
the coaxial cable is never reliable. All in all, these are very risky tools for terminating crimp BNCs,
and they should only be used when no other tools are available (remember to always take the utmost
care when using them).
If you are an installer, or a CCTV technician who regularly terminates coaxial cables, get yourself a
proper set of tools. These are precise cutters, a stripping tool, and a crimping tool.
Make sure you have the crimping and stripping tools for the right cable. If you are using RG-59/U
(overall diameter 6.15 mm), do not confuse it with RG-58/U (overall diameter 5 mm), even though
they look similar. For starters, they have different impedances: RG-59/U is 75 Ω, whereas RG-58/U
is 50 Ω. Next, RG-59/U is slightly thicker, particularly in the dielectric and the shield.
There are BNC connectors for the RG-58/U which look identical externally, but they are thinner on
the inside.
The best thing to do is to waste one connector and a short length of cable on a trial termination before
proceeding with the installation. Sometimes a small difference in the cable’s dimensions, even if it is
RG-59/U, may cause a lot of problems fitting the connectors properly.
Technically, a solid center core coaxial cable is better, both from the impedance point of view (the
cable is stiffer and preserves the “straightness”) and from the termination point of view. Namely, when
terminating the solid core cable, it is easier to crimp the center tip, compared to the stranded core
cable, which is too flexible. Some people may prefer a stranded center core coax, mainly because of its
flexibility, in which case care should be taken when terminating, as stray strands can easily short circuit
the center core and the shield.
If there are no other tools available, it is best to get the soldering-type BNC connectors and to terminate
the cable by soldering. Care should be taken with the soldering iron’s temperature, as well as the quality
of the soldering, since it can easily damage the insulation and affect the impedance. In this instance, a
multistranded core coax would be better.
If you have a choice of crimp connectors, look for the ones that are likely to last longer with respect to
physical use and corrosion, such as silver-plated or gold-plated BNCs. A good practice is to use “rubber
sleeves” (sometimes called “protective sleeves”) to further protect the interior of the BNC from
corrosion and to minimize bending stress from plugging and unplugging.
In special cases, as with pan/tilt domes, there might be a need for a very thin and flexible 75 Ω coaxial
cable (due to the constant panning and tilting of the camera). Such cables are available from
specialized cable manufacturers, but do not forget that you need special BNCs and tools for them.
Even though such a cable can be as thin as 2.5 mm, as is the case with the RG-179 B/U cable, its
impedance is still 75 Ω, which is achieved by the special dielectric and center core thickness. The
attenuation of such a cable is high, but when used in short runs it is acceptable.
For installations where much longer runs are needed, other 75 Ω cables are used, such as RG-11B/U
with an overall diameter of more than 9 mm. Needless to say, an RG-11 cable also needs special tools
and BNCs for termination. Some installers use machines purpose-built to strip or label coaxial cables.
Although these machines are expensive and hard to find, they do exist, and if you are involved in very
large installations they are a worthwhile investment.
In the accompanying table below, typical attenuation figures are shown for various coax cables. Please
note that the attenuation is shown in decibels and that it refers to the voltage amplitude of the video
signal. Using the decibel table shown in the section on S/N for cameras, it can be worked out that
10 dB corresponds to attenuation of the signal to about 30% of its amplitude, that is, about 0.3 Vpp for a
1 Vpp video signal. The RG-59 will attenuate the signal by 10 dB after 300 m. Such a low signal
amplitude may be insufficient for a monitor or VCR to lock onto. This is the point of attenuation where
we would usually require an amplifier to boost the signal.
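The decibel arithmetic above can be extended to a rough loss budget for a whole run. The sketch below is a simplification. It uses the RG-59 figure from the text (10 dB over 300 m, at whatever frequency that figure refers to), adds the 0.3 to 0.5 dB per good BNC joint mentioned earlier, and treats the total as voltage decibels. The two-joint scenario is hypothetical.

```python
def amplitude_after_loss(v_in_pp: float, loss_db: float) -> float:
    """Voltage amplitude left after a loss expressed in (voltage) decibels."""
    return v_in_pp * 10 ** (-loss_db / 20)


def link_loss_db(length_m: float, cable_db_per_100m: float,
                 joints: int = 0, db_per_joint: float = 0.5) -> float:
    """Cable attenuation plus an allowance for each BNC joint in the run."""
    return length_m / 100 * cable_db_per_100m + joints * db_per_joint


if __name__ == "__main__":
    rg59_db_per_100m = 10 / 3  # from the text: 10 dB after 300 m
    loss = link_loss_db(300, rg59_db_per_100m, joints=2)  # hypothetical: two joints
    print(f"total loss {loss:.1f} dB -> {amplitude_after_loss(1.0, loss):.2f} Vpp from 1 Vpp")
    print(f"an amplifier would need about {loss:.1f} dB of gain to restore 1 Vpp")
```

Under these assumptions, the 300 m run with two joints loses about 11 dB, which leaves roughly 0.28 Vpp of a 1 Vpp signal.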
Installation techniques
Prior to installation, it should be checked what cable length can be obtained from the supplier. Rolls of
approximately 300 m (1000 ft) are common, but 100 m and 500 m can also be found. Naturally, it is
better to run the cable in one piece whenever possible. If for some reason the installers need a longer
run than what they have, the cable can be extended by terminating both the installed and the extension
cable. In such a case, although it is common practice to have a BNC plug connected to another BNC
plug with a so-called barrel adaptor, it is better to minimize joining points by using one BNC plug
and one socket (i.e., “male” and “female” crimping BNCs).
Before cable laying commences, the route should be inspected for possible problems such as
feed-throughs, sharp corners, and clogged ducts. Once a viable route has been established, the cable
lengths should be arranged so that any joints, or possible in-line amplifier installations, will occur at
accessible locations.
At the location of a joint it is important to leave an adequate overlap of the cables so that sufficient material is
available for the termination operation. Generally, the overlap required does not need to be more than 1 m.
Whenever possible, the cable should be run inside a conduit of an adequate size. Conduits are available
in various lengths and diameters, depending on the number of cables and their diameters. For external
cable runs, a special conduit with better UV protection is needed. In special environments, such as
railway stations, special metal conduits need to be used. These are required because of the extremely
high electromagnetic radiation that occurs when electric trains pass.
Similar treatment should be applied when a coaxial cable is run underground. When burying a cable,
the foremost consideration should be preventing damage from excessive local loading points. Such
loading may occur when backfill material or an uneven trench profile digs into the cable. The damage
may not be obvious immediately, but the picture will get distorted due to the impedance change at the
point where the cable is deformed. The cost of digging up the cable and repairing it makes the
expenditure of extra effort during laying well worthwhile.
The best protection against cable damage is laying the cable on a bed of sand approximately 50 to 150
mm deep and backfilling with another 50 to 150 mm of sand. Due care needs to be exercised in cutting
the trench so that the bottom of the trench is fairly even and free of protrusions. Similarly, when
backfilling, do not allow soil with a high rock content to fall unchecked onto the sand and possibly put
a rock through the cable, unless your conduit is extremely strong.
The trench depth depends on the type of ground being traversed as well as the load that is expected to
be applied to the ground above the cable. A cable in solid rock may need a trench of only 300 mm or so,
whereas a trench in soft soil crossing a road should be taken down to about 1 m. A general-purpose
trench, in undemanding situations, should be 400 to 600 mm deep with 100 to 300 mm total sand
bedding.
Placing a coaxial cable on cable trays and bending it around corners requires observing the same major
rule: minimum bending radius. As mentioned, the minimum bending radius depends on the coaxial
cable size, but the general rule is that the bending radius should not be smaller than 5 times the
diameter of the cable (or 10 times the radius). The minimum bending radius must be observed even
when the cable tray does not facilitate this. The tendency to keep it neat and bend the coaxial cable to
match power and data cables on the tray must be avoided. Remember, bending coax more tightly than
the minimum bending radius allows affects the impedance of the cable and causes a loss of video signal
quality.
Coaxial cables are pulled through ducts by using a steel or plastic leader and then joining and securing
all the cables that need to go through. Some new, tough plastic leaders, called “snakes,” are becoming
more popular.
The types of cable ties normally used to tie the cables together are generally satisfactory, but remember,
excessive force should not be applied, as it squashes the coax and therefore changes the impedance.
Should a particular duct require the use of a lubricant, it is best to obtain a recommendation from the
cable manufacturer. Talcum powder and bean-bag-type polystyrene beans can also be quite useful in
reducing friction.
In some conditions the cable may already be terminated by connectors. These must be heavily protected
while drawing the cable. The holes in such a case need to be bigger.
Between the secured points of a cable it is wise to allow a little slack rather than leaving a tightly
stretched length that may respond poorly to temperature variations or vibration.
If the cable is in some way damaged during installation, leave enough extra cable length around the
damaged area so that additional BNC joiners can be inserted.
Time domain reflectometer (TDR)
When a complex and long coaxial cable installation is in question, it is very useful to get a time domain
reflectometer to help determine the location of bad cable spots.
The TDR works on a basic principle: it injects short, strong pulses into the cable and measures the
reflected energy. By determining the delay between the injected and the reflected signals, bad
termination points and/or sharp bends can be located quite accurately. This can be especially
important if the cable goes through inaccessible places.
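The TDR localization described above is a time-of-flight calculation. The pulse travels to the fault and back, so the distance is v·t/2, where v is the propagation speed in the cable. The sketch below assumes a velocity factor of 0.66, typical of solid polyethylene dielectric. Foam dielectrics are faster, so the real value should come from the cable data sheet. The polarity helper is hypothetical and relies on the reflection coefficient sign: a crushed cable brings the conductors closer, raises C, and so lowers the impedance.

```python
C = 3e8  # speed of light in vacuum, m/s


def fault_distance_m(round_trip_s: float, velocity_factor: float = 0.66) -> float:
    """Distance to a reflection from the TDR, given the round-trip delay.

    velocity_factor is an assumption (about 0.66 for solid polyethylene);
    the division by two accounts for the out-and-back path.
    """
    return velocity_factor * C * round_trip_s / 2


def fault_kind(gamma: float) -> str:
    """Hypothetical helper: rough reading of the reflection polarity."""
    if gamma > 0:
        return "impedance rises (towards an open circuit)"
    if gamma < 0:
        return "impedance falls (towards a short circuit, e.g. a crushed cable)"
    return "no mismatch"


if __name__ == "__main__":
    for delay_us in (0.5, 1.0, 2.4):
        print(f"echo after {delay_us} us -> about {fault_distance_m(delay_us * 1e-6):.0f} m")
```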
Twisted pair video transmission
Twisted pair cable is an alternative to the coaxial cable. It is useful in situations where runs longer
than a couple of hundred meters have to be made. It is especially beneficial when only two wires have
already been installed between two points.
Twisted pair cable is reasonably cheap when used with normal wires, but if a proper cable (as per
the recommendations by the manufacturers) is used, with at least 10 to 20 twists per meter and with
shielding, the price becomes much higher.
Twisted pair transmission is also called balanced video transmission.
The idea behind this is very simple and different from unbalanced (coaxial) video transmission. To
minimize external electromagnetic interference, the twisted pair trick is to convert the signal into
balanced mode and send it via twisted wires. All the unwanted electromagnetic interference and noise
will then induce an equal amount of current in both wires. This is why we need a proper twisted pair:
the idea is to have both wires equally exposed to the interference and the voltage drop. Unlike coaxial
transmission, where the grounded shield connects the zero potential of the two end points, twisted pair
video transmission does not connect the zero potentials of the two ends. So when the signal arrives at
the twisted pair receiver end, it first comes to a differential amplifier input that is well balanced and has
a good common-mode rejection ratio (CMRR). This differential amplifier reads the differential signal
between the two wires.
If the two wires have similar characteristics and enough twists per meter (the more the better), they
will be equally affected by noise, voltage drops and induced signals. With a good CMRR amplifier at
the receiver end, most of the unwanted noise will be eliminated.
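The role of CMRR can be shown with a simple linear model of the receiver. The output is taken as A_d·(v+ − v−) + A_cm·(v+ + v−)/2, with CMRR = A_d/A_cm. The 60 dB CMRR, the ±0.5 V video swing, and the 2 V of induced hum are assumed example values, not manufacturer figures.

```python
def receiver_output(v_plus: float, v_minus: float, cmrr_db: float,
                    diff_gain: float = 1.0) -> float:
    """Simple linear model of a differential receiver with finite CMRR."""
    common_gain = diff_gain / 10 ** (cmrr_db / 20)
    v_diff = v_plus - v_minus          # wanted (balanced) signal
    v_cm = (v_plus + v_minus) / 2      # interference common to both wires
    return diff_gain * v_diff + common_gain * v_cm


if __name__ == "__main__":
    video, hum = 1.0, 2.0  # assumed: 1 V differential video, 2 V induced hum
    # Well-twisted pair: the hum appears equally on both wires.
    print(receiver_output(video / 2 + hum, -video / 2 + hum, cmrr_db=60))
    # Poorly twisted pair: the hum lands on one wire only and is not rejected.
    print(receiver_output(video / 2 + hum, -video / 2, cmrr_db=60))
```

In the first case only 2 mV of the 2 V hum survives. In the second case the hum becomes part of the differential signal and passes straight through. This is why the text insists on both wires being equally exposed.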
The output impedance of the twisted pair transmitters is usually 100 Ω.
The drawback of this type of transmission is that one transmitting and one receiving unit are necessary in
addition to the cable. They increase not only the cost of the system but also the risk of having the signal
lost if either of the two components fails. Quality, however, is on the increase, and price is dropping.
If the correct cable is used, much longer distances can be achieved than with an RG-59 or even an
RG-11 cable. Manufacturers usually quote over 2000 m for B/W signals and more than 1000 m for
color, without any in-line repeaters. Furthermore, when balanced transmission is used, ground loops
are not a problem as they are with coax. Termination of the twisted pair cable does not require special
tools and connectors. All these facts make such transmission even more attractive.
I have always preferred coax installation. But after seeing a major installation at Frankfurt airport
done with twisted pair video, I found the signal quality was, to my surprise, as good as with coax. I am
now convinced that with proper equipment selection, both cable and transmitter/receiver pair, this
might be a good alternative to coax. In fact, I have seen more and more installations in the last five
years done with twisted pair, which is especially practical with DVRs, as a majority of them are
extremely sensitive to ground loops.
Microwave links
Microwave links are used for good-quality wireless video transmission.
The video signal is first modulated onto a carrier frequency that belongs in the microwave region of the
electromagnetic spectrum. The wavelengths of this region are between 1 mm and 1 m. Using the
known equation between wavelength (λ) and frequency (f):

$$\lambda = \frac{c}{f}$$

where c is the speed of light (300,000,000 m/s), we can find that the microwave region is between
300 MHz and 300 GHz. The upper end of this region borders on, and by some definitions overlaps
with, the far-infrared part of the spectrum. In practice, however, the typical frequencies used for
microwave video transmission are between 1 GHz and 10 GHz.
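The conversion λ = c/f used above is easy to check in code. The sketch below uses c = 300,000,000 m/s, as in the text. It confirms the band edges and gives the wavelengths at the 1 GHz and 10 GHz practical limits and at 2.4 GHz.

```python
C = 300_000_000  # speed of light, m/s, as used in the text


def frequency_hz(wavelength_m: float) -> float:
    """f = c / lambda"""
    return C / wavelength_m


def wavelength_m(frequency_hz: float) -> float:
    """lambda = c / f"""
    return C / frequency_hz


if __name__ == "__main__":
    print(f"1 m  -> {frequency_hz(1.0) / 1e6:.0f} MHz")
    print(f"1 mm -> {frequency_hz(1e-3) / 1e9:.0f} GHz")
    for f in (1e9, 2.4e9, 10e9):
        print(f"{f / 1e9:>4.1f} GHz -> {wavelength_m(f) * 100:.1f} cm")
```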
Since many services, such as the military, the police, ambulances, couriers, and aircraft radars, use
radio frequencies, there is a need for some regulation of frequency use. This is done on an international
level by the International Telecommunication Union (ITU) and by the local authorities in your
respective country. For Australia this was the Department of Transport and Communications, which
was recently renamed the Spectrum Management Agency. Thus, a very important fact to consider when
using microwave links in CCTV is that each frequency and microwave power needs to be approved by
the local authority in order to minimize interference with the other services using the same spectrum.
This protects the registered users from interference by new transmitters, but it is also a drawback (at
least in CCTV) of using microwaves, and one of the reasons why a lot of CCTV designers turn to
microwaves only as a last resort.
Microwave links can carry the full bandwidth of a video signal as well as other data if necessary
(including audio and/or PTZ control). The transmission bandwidth depends on the manufacturer’s
model. For a well-built unit, a 7 MHz bandwidth is typical and sufficient to send high-quality video
signals without any visible degradation.
Microwaves are usually unidirectional when a CCTV video signal is sent
from point A to point B, but they can also be bidirectional when a video
signal needs to be sent in both directions, or video in one and data in the
other. The latter is very important if PTZ cameras are to be controlled.
The encoding technique in video transmission is usually frequency modulation (FM), but amplitude
modulation (AM) can also be used. If audio and video are transmitted simultaneously, usually the video
signal is AM modulated and the audio FM, as is the case with broadcast TV signals.
A line of sight is needed between the transmitter and the receiver. In most cases, the transmitting
and receiving antennas are parabolic dishes, similar to those used for satellite TV reception.
The distances achievable with this technology depend on the transmitter output power and on the diameter
of the antenna that contributes to the gain of the transmitter and the sensitivity of the receiver.
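The dependence on transmitter power and antenna size can be sketched as a simple link budget. The sketch uses the standard free-space path loss, FSPL = 20·log10(4πdf/c), and the usual parabolic dish gain, G = η(πD/λ)². The aperture efficiency η = 0.55 is an assumption, and the transmitter power and dish sizes in the example are hypothetical. Rain, fog, and alignment losses, discussed below, are not included, so a real design needs a fade margin on top. The manufacturer's figures should be used for any real link.

```python
import math

C = 3e8  # speed of light, m/s


def fspl_db(distance_m: float, f_hz: float) -> float:
    """Free-space path loss between isotropic antennas, in dB."""
    return 20 * math.log10(4 * math.pi * distance_m * f_hz / C)


def dish_gain_dbi(diameter_m: float, f_hz: float, efficiency: float = 0.55) -> float:
    """Parabolic antenna gain; efficiency of 0.55 is an assumed typical value."""
    wavelength = C / f_hz
    return 10 * math.log10(efficiency * (math.pi * diameter_m / wavelength) ** 2)


def received_dbm(tx_dbm: float, tx_dish_m: float, rx_dish_m: float,
                 distance_m: float, f_hz: float) -> float:
    """Received power for a clear line-of-sight path (no rain or alignment loss)."""
    return (tx_dbm + dish_gain_dbi(tx_dish_m, f_hz) + dish_gain_dbi(rx_dish_m, f_hz)
            - fspl_db(distance_m, f_hz))


if __name__ == "__main__":
    f = 10e9  # hypothetical 10 GHz link with 0.6 m dishes and 20 dBm transmitter
    for d in (500, 5_000, 30_000):
        print(f"{d / 1000:5.1f} km: FSPL {fspl_db(d, f):6.1f} dB, "
              f"received {received_dbm(20, 0.6, 0.6, d, f):6.1f} dBm")
```

Every doubling of distance costs about 6 dB. Doubling the dish diameter at one end wins back about 6 dB, which is why longer links call for bigger antennas.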
Obviously, atmospheric conditions will affect the signal quality. The same microwave link that has an
excellent picture during a nice day may have considerable signal loss in heavy rain if it is not designed
properly. Fog and snow also affect the signal. If the parabolic antenna is not anchored properly, wind
may affect the links indirectly by shaking it, causing an intermittent loss of line of sight.
Many parabolic antennas come with a plastic or leather cover that
protects the actual inner parabola. This protector simultaneously
breaks the wind force and protects the sensitive parts from rain
and snow.
The fitting and stability of a microwave antenna are of paramount importance to the link. The longer
the distance, the bigger the antenna and the more secure the fittings required.
The initial line-of-sight alignment is harder to achieve for longer
distances, although better quality units have a field strength
indicator built in, which helps to make the alignment easier.
Maximum achievable transmitting distances of up to 30 km are quoted by most specialized manufacturers.
In most cases a typical CCTV application will require only a couple of hundred meters, which is often
not a problem as long as there is a line of sight.
The transmitting power and the size of the antenna required for a specific distance need to be confirmed
with the manufacturer.
For shorter distances, microwave links may use rod or other types of nonparabolic antennas, which
become very practical if dimensions are in question. The obvious security problem in such a case
would be the omnidirectional radiation of the signal, but the advantage would be a fairly wide area of
coverage.
One very interesting application that was initially developed in Australia was to use an omnidirectional
microwave with a transmitting antenna fitted on top of a race car roof, which would send signals to a
helicopter above the race track. From there it would be redirected to a TV broadcast van. With such a
setup, the so-called Race Cam allowed the television audience to see the driver’s view.
Most microwave manufacturers have RS-232 links available for camera and other remote control data,
but bear in mind that some CCTV manufacturers offer their controls in the audio bandwidth, so you
can actually use an audio channel of the microwave link (in the opposite direction of the video signal)
to control PTZ cameras.
Lately, with the introduction of digital video and networking, an increased number of “digital
microwaves” are establishing basically a network link between two points. That way, a digital video
system can easily become interlinked by simply assigning to it an IP address. The most popular “free
to use” frequency is 2.4 GHz, and here too, as in the analog microwave links, the maximum distance
achievable depends on the link power and the antenna size.
RF wireless (open air) video transmission
RF video transmission is similar to the microwave transmission in the way the modulation of signals
is done. The major differences, however, are the modulation frequency in the VHF or UHF bands and
the transmission of the signal, which is usually omnidirectional. When a directional Yagi antenna is
used (similar to the domestic ones used for reception of a specific channel), longer distances can be
achieved and there will be less interference with the surrounding area. It should be noted, however, that
depending on the regulations of your country, the radiated power cannot be above a certain limit, after
which you will require approval from your respective frequency regulatory body.
RF transmitters are usually made with video and audio inputs and the modulation techniques are similar
to those of the microwaves: video is AM modulated and audio is FM. The spectrum transmitted depends
on the make, but generally it is narrower than the microwave. This usually means 5.6 MHz, which is
sufficient to have audio and video mixed into one signal.
Consumer products with similar characteristics to those listed above are found in the so-called RF
senders, or wireless VCR links. The RF modulator is fed with the audio and video signals of the VCR
outputs and re-modulates and then transmits them so they can be picked up by another VCR in the
house. Devices like these are not made with CCTV in mind, so the distances achievable are in the
vicinity of a household area. Sometimes this might be a cheap and easy way out of a situation where
a short-distance wireless transmission is required.
Since VHF and UHF bands are for normal broadcast TV reception, you should check with your local
authority and use channels that do not interfere with the existing broadcasting. In most countries, UHF
channels 36 to 39 are deliberately not used by the TV stations because they are left for VCR-to-TV
conversion, video games, and similar uses.
10. Transmission media
The downfall of such an RF CCTV transmission is that any TV receiver at a reasonable distance can pick the signal up. Sometimes, however, this might be exactly what is wanted. This includes systems in big building complexes, where the main entrance cameras are injected into the MATV system so that tenants can view the camera on a particular channel of their TV set.
Unlike microwave links, RF transmission does not require line of sight, because RF (depending on whether it is UHF or VHF) can penetrate brick walls, wood, and other nonmetal objects. How far one can go depends on many factors, and the best approach is to test it in the particular environment in which the RF transmitter will be used.
Infrared wireless (open air) video transmission
As the heading suggests, infrared open air video transmission uses optical means to transmit a video signal. An infrared LED is used as the light carrier, and its intensity is modulated with the video signal. Effectively, this type of transmission is a hybrid between microwave and fiber optics transmission (to be discussed a little later). Instead of microwave frequencies, infrared frequencies are used (infrared frequencies are higher). And instead of sending the modulated light over a fiber optics cable (which relies on the principle of total reflection), open space is used. Line of sight is therefore required. The obvious advantage of infrared light radiation is that you do not need a special license.
To concentrate the infrared light into a narrow beam and minimize light losses due to dispersion, a lens assembly is required at the transmitter end, and another lens assembly is required at the receiver end to concentrate the light onto the photosensitive detector.
Both color and B/W, as well as audio, can be transmitted over distances of more than 1 km. Bigger
lens assemblies and more powerful LEDs, as well as a more sensitive receiving end, will provide for
even longer distances.
Special precautions have to be taken regarding the temperature around the transmitter, so that the receiver does not detect the infrared radiated by hot walls, roofs, and metal surfaces.
Understandably, weather conditions like rain, fog, and hot wind will affect infrared links more than
microwave transmission.
Transmission of images over telephone lines
First there was “slow-scan TV,” a system that sent video pictures over a telephone line at a very slow speed, usually taking many tens of seconds for a full-frame B/W picture. Then came “fast scan,” which became a popular alternative to slow scan. At the time of writing this book, almost the entire CCTV industry is turning toward the Internet as a substitute for point-to-point telephone line communication. Internet connectivity is becoming almost as widespread as the telephone. Most businesses and households have, or are getting, a fast Internet connection, usually over their existing copper telephone lines (the so-called DSL), which offers faster transmission than fast scan. Still, for completeness, and for places without Internet access, we will say a few words about typical telephone line image transmission technology in CCTV.
The slow-scan concept originates from the late 1950s, when some amateur radio operators used it, and it was later applied to CCTV. The concept is very simple. As with any other transmission, there are units at both ends of the transmission path: a transmitter and a receiver. The analog video signal of a camera is captured and converted into digital format. It is then stored in the random access memory (RAM) of the slow-scan transmitter, usually when triggered by an external alarm or upon the receiver’s request. The stored image, now in digital format, is then used to frequency modulate an audio-frequency tone that the telephone line can carry. This frequency is usually between 1 and 2 kHz, i.e., where the phone line attenuation is lowest. The receiver demodulates the signal and reassembles the picture line by line, starting from the top left-hand corner, until the complete picture is converted into an analog display (a steady picture).
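The tone modulation described above can be sketched in a few lines. This is a minimal illustration, not any particular slow-scan standard: the linear mapping of 8-bit brightness onto a 1000 to 2000 Hz band, the 8 kHz sample rate, and the names `pixel_to_freq` and `modulate_line` are all assumptions chosen for the example.

```python
import math

# Assumed tone band, following the 1-2 kHz range given in the text.
F_BLACK_HZ = 1000.0   # hypothetical tone for a black pixel (value 0)
F_WHITE_HZ = 2000.0   # hypothetical tone for a white pixel (value 255)
SAMPLE_RATE = 8000    # assumed telephone-style sampling rate, samples/s


def pixel_to_freq(value: int) -> float:
    """Map an 8-bit brightness value linearly onto the tone band."""
    if not 0 <= value <= 255:
        raise ValueError("pixel value must be 0..255")
    return F_BLACK_HZ + (F_WHITE_HZ - F_BLACK_HZ) * value / 255


def modulate_line(pixels, samples_per_pixel: int = 4):
    """Turn one scan line into audio samples, keeping the phase continuous."""
    phase = 0.0
    out = []
    for value in pixels:
        # Phase advance per sample for this pixel's tone frequency.
        step = 2 * math.pi * pixel_to_freq(value) / SAMPLE_RATE
        for _ in range(samples_per_pixel):
            out.append(math.sin(phase))
            phase += step
    return out


audio = modulate_line([0, 128, 255])  # black, mid-grey, white
```

Keeping the phase continuous between pixels avoids clicks that a telephone line would otherwise pass on as spurious high-frequency energy; the receiver would run the reverse mapping, from measured tone frequency back to brightness.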
This concept was initially very slow, but considering the unlimited distances offered by telephone
lines (provided there was a transmitter compatible with the receiver), it was an attractive concept for
remote CCTV monitoring.
The slow-scan transmitter would usually have several cameras attached to it, so the viewer could browse through all of them. Also, any camera could send an image automatically when triggered by an external alarm associated with it. Several transmitters could thus report to one or more receiving stations, each protected by a password to keep out unwanted listeners.
One way of increasing the transmission speed was to reduce the resolution of the digitized picture or to use only a quarter of the screen for each camera. The initial 32 seconds could thus effectively be reduced to 8 seconds when a single picture update was required, or, alternatively, a quad screen with four cameras could be updated every 32 seconds. If we add that other signals, such as audio or control signals for remote relay activation, could also be sent, we get a fuller picture of these historic beginnings.
Older-generation slow scan systems would take 32 seconds to send a single low-quality picture from an alarmed site to the monitoring station. Dial-up and connection time must be added to this, bringing the total to more than a minute before the first image is received. Slow scan, however, was very popular and ahead of its time. Today we have much more advanced techniques for transmitting video signals over the telephone line.
The newer technology, using the same concept but much faster image processing and compression algorithms, is called fast scan and can achieve less than 1 s per full-color picture update. The image manipulation is digital, and various compression techniques are used to increase the transmission speed even further while preserving image quality.
The most important details to take into consideration when choosing a fast-scan system are:
• The framestore resolution (in pixels)
• B/W or color
• Whether other signals can be transmitted simultaneously (often PTZ control is required, or perhaps some relay activation)
• The transmission speed
When comparing transmission speeds, be flexible, because different telephone lines and different modems give different results, which can make comparisons unfair.
For the customer it is sometimes more important just to see very roughly what is happening at the
other end of the line, as long as it is fast. Other customers may require a good definition (resolution),
regardless of the time delay.
It is also important to know what else may be connected to the system in the future. Will more than one camera input be needed, or perhaps PTZ control for one of the cameras?
Do not forget that if you require PTZ control, you have to accept a delay between a command issued from the keyboard and the picture update that shows where the camera is pointing. This might be unusual or unacceptable for some, but many manufacturers offer intelligent updates: when the joystick is used, the picture automatically switches to a smaller viewing area that remains sharp (and therefore updates faster), so you can see where the camera is pointing. It then returns to full screen as soon as the joystick is released.
Another type of system offers an additional integrated feature: video motion detection. The system
automatically sends images as soon as activity is detected in the video signal.
The normal PSTN (public switched telephone network) line has a narrow bandwidth, usually 300 to 3000 Hz, which is considered standard (measured at the 3 dB points, with levels expressed in dBm, i.e., relative to 1 mW across 600 ohms, the telephone line impedance). Some people call this type of line plain old telephone service (POTS). PSTN is analog technology; as such, the bit rate it offers is never constant, since it depends very much on the noise on the line.
Theoretically, it is impossible to send live video images of 5 MHz bandwidth over such a narrow channel. It is possible, however, to compress and encode the signal to achieve faster transmission, and most fast-scan transmitters can do this today. The technological explosion of PCs, compression algorithms, fast modems, and better telephone lines in the last few years has made it possible to transmit video images over telephone lines at rates unimaginable when the first slow-scan transmitters were introduced.
As mentioned earlier, the concept remains the same as that of slow scan, but the intelligence behind the compression schemes (what they transmit and how) has improved so much that today a color video signal with very good resolution can be sent in less than 1 s per frame. In addition, many models can also send control and audio signals.
The more sophisticated fast-scan systems use a method of image updating called conditional refresh: after the initial image is sent, only the portions that change need to be resent. This allows a much faster update rate than with basic fast-scan systems. Other manufacturers stick to full image transmission but use proprietary compression algorithms to achieve similar speeds.
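Conditional refresh can be sketched as a block-by-block comparison between the previous and the current frame. This is a hypothetical illustration, not any manufacturer's algorithm: frames are lists of rows of 8-bit values, and the 8 × 8 block size, the mean-difference threshold, and the function names are assumptions.

```python
from typing import List, Tuple

Frame = List[List[int]]


def changed_blocks(prev: Frame, curr: Frame, block: int = 8,
                   threshold: float = 10.0) -> List[Tuple[int, int]]:
    """Transmitter side: list (block_row, block_col) of blocks that changed."""
    height, width = len(curr), len(curr[0])
    changed = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            total, count = 0, 0
            for y in range(top, min(top + block, height)):
                for x in range(left, min(left + block, width)):
                    total += abs(curr[y][x] - prev[y][x])
                    count += 1
            # Mean absolute difference above threshold: the block must be resent.
            if total / count > threshold:
                changed.append((top // block, left // block))
    return changed


def apply_update(stored: Frame, curr: Frame, blocks: List[Tuple[int, int]],
                 block: int = 8) -> Frame:
    """Receiver side: overwrite only the received blocks in the stored frame.

    In a real link only the pixels of these blocks would travel over the
    line; here `curr` simply stands in for that received data.
    """
    height, width = len(stored), len(stored[0])
    for block_row, block_col in blocks:
        for y in range(block_row * block, min((block_row + 1) * block, height)):
            for x in range(block_col * block, min((block_col + 1) * block, width)):
                stored[y][x] = curr[y][x]
    return stored


# Example: a 16 x 16 scene where a small object appears in the lower-left block.
prev = [[0] * 16 for _ in range(16)]
curr = [row[:] for row in prev]
for y in range(10, 14):
    for x in range(2, 6):
        curr[y][x] = 200

blocks = changed_blocks(prev, curr)
received = apply_update([row[:] for row in prev], curr, blocks)
```

In this example only one of the four blocks is resent, a quarter of the data of a full-frame update, yet the receiver ends up with the complete current picture.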
To understand PSTN telephone line video transmission rates, let us consider this simplified example.
A typical B/W video signal with 256 × 256 pixels resolution, counting one bit per pixel for simplicity, will have 256 × 256 = 65,536 bits of information, which is equal to 64 kb (65,536/1024). (Note the binary numbers 64 = 2⁶ and 256 = 2⁸.)
To send this amount of uncompressed information over a telephone line using a normal 2400 bits per second modem (as was the case in the early days of slow scan), it would take about 27 s (65,536/2400 ≈ 27.3 s).
If the signal is compressed, however (compressions of 10, 20, or even more times are available), by say 10 times, this is reduced to about 3 s. Most fast-scan transmitters send only the first image at this speed; after that they send only the differences between pictures, dramatically reducing the update time of subsequent images to less than a second.
A color picture with the same resolution will obviously require more. A high-resolution picture of quality better than S-VHS is usually digitized in a 512 × 512 frame with 24-bit color (8 bits each for R, G, and B), which amounts to 512 × 512 × 3 = 786,432 bytes, or 768 kB. If this is compressed by 10 times, it becomes about 77 kB, or roughly 630,000 bits, which takes approximately 630,000/14,400 ≈ 44 s to transmit with a 14,400 bps modem. It all depends on the compression algorithm.
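The arithmetic of the two examples above can be checked with a small helper. This sketch assumes the line carries nothing but picture data (no dialing, handshake, or protocol overhead) and that “compressed by 10 times” simply divides the bit count by 10; `transfer_seconds` is a hypothetical name.

```python
def transfer_seconds(num_bits: float, line_bps: float, compression: float = 1.0) -> float:
    """Seconds needed to send num_bits over a line of line_bps after compression."""
    return num_bits / compression / line_bps


# B/W example: 256 x 256 pixels at one bit per pixel over a 2400 b/s modem.
bw_bits = 256 * 256
bw_raw = transfer_seconds(bw_bits, 2400)                # about 27.3 s
bw_compressed = transfer_seconds(bw_bits, 2400, 10)     # about 2.7 s

# Color example: 512 x 512 pixels, 24 bits per pixel, 14,400 b/s modem.
colour_bits = 512 * 512 * 3 * 8
colour_compressed = transfer_seconds(colour_bits, 14_400, 10)  # about 43.7 s
```

Note the factor of 8 between bytes and bits: the color frame size is usually quoted in bytes, while modem speeds are given in bits per second.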
In practice, add another few seconds for dialing, which is faster with DTMF (dual tone multifrequency) and slower with pulse dialing.
Most high security systems have a dedicated telephone line, which means that once the line is established, it stays open and no further time is lost to the modem handshake and the initial picture update delay.
In the end, we should emphasize that the theoretical maximum speed of transmission that can be achieved
with a POTS or PSTN line is somewhere around 56 kb/s. In practice, it is rarely over 32 kb/s, and if
the telephone line is old, or far from the exchange, it could be as low as 19 kb/s.
For the fastest possible transmission, ISDN (Integrated Services Digital Network) telephone lines should be used; these are available in many industrialized countries.
ISDN lines were proposed and started to be implemented in the mid-1970s, at almost the same time as CCD chips appeared.
The basic ISDN channel offers a rate of 64 kb/s, which dramatically improves the update speed of fast scan. In comparison, older PSTN modems reach 14.4 kb/s when the lines are in very good condition, and some modems can increase this further (up to 28.8 kb/s) by using hardware compression techniques.
ISDN is a digital network and transmits signals in digital format; hence its bandwidth is given not in Hz but in b/s. For special purposes, like video conferencing and cable TV (available via telephone lines), ISDN channels can be combined into broadband ISDN (B-ISDN) links, where higher rates (multiples of 64 kb/s, at least 128 kb/s) are achieved by intelligently multiplexing several channels into one.
The unit used to connect a device to an ISDN line is usually called the Terminal Adaptor (TA); its function and appearance are very similar to those of a modem used with PSTN lines. Intelligent TAs for B-ISDN links are also known as Aggregating Terminal Adaptors.
Do not forget, however, that to benefit from the wider ISDN bandwidth, both ends (transmitting and receiving) need an ISDN connection. In most countries, ISDN connections are charged per time of use.
Cellular network
Transmitting images over mobile phones is an attractive possibility with the technology available today. A mobile phone with a modem socket, combined with a notebook computer, can easily be equipped with the software and hardware needed for wireless and mobile image transmission.
The same principles and concepts discussed previously apply, except for the transmission speed, which is much lower via the cellular network.
The digital network offers better noise immunity, although its coverage at the moment is not as good as that of the analog mobile service. The digital cellular network is growing rapidly, and worldwide roaming is already possible in the majority of industrialized countries. This means that when users are overseas, their calls can be diverted to the digital network of the country they are visiting, and they can make calls without going through an operator. To activate roaming, the user needs to inform their home carrier, although even this is becoming fully automatic.
With the digital cellular network, speeds of up to 9600 b/s (9.6 kb/s) can be achieved in modem mode. Advancements in GSM hardware and software now make it possible to boost data speeds from the current 9.6 kb/s to 14.4 kb/s in a single traffic channel. By combining up to four time slots into a single connection, operators will be able to offer transmission rates of up to 57.6 kb/s, six times more than is currently available, and with the help of compression technology, data speeds can be increased even further.
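To put the line speeds quoted in this section side by side, the sketch below computes how long the 10×-compressed 512 × 512 color frame from the earlier example would take on each. The rates are the nominal figures from the text; real throughput is lower once protocol overhead and line noise are taken into account, and the dictionary labels are purely illustrative.

```python
# Nominal rates quoted in the text, in bits per second.
LINE_RATES_BPS = {
    "GSM, one traffic channel": 9_600,
    "GSM, enhanced single channel": 14_400,
    "GSM, four channels combined": 4 * 14_400,
    "PSTN, old or distant line": 19_000,
    "PSTN, typical in practice": 32_000,
    "ISDN, basic 64 kb/s channel": 64_000,
    "ISDN, two channels combined": 2 * 64_000,
}

# 512 x 512 pixels x 24 bits, compressed 10 times.
IMAGE_BITS = 512 * 512 * 24 / 10


def seconds_per_image(rate_bps: float, image_bits: float = IMAGE_BITS) -> float:
    """Idealized transfer time: picture bits divided by line rate."""
    return image_bits / rate_bps


for name, rate in LINE_RATES_BPS.items():
    print(f"{name:30s} {rate / 1000:6.1f} kb/s {seconds_per_image(rate):6.1f} s")
```

The four-channel GSM entry reproduces the “six times more” figure: 4 × 14.4 kb/s = 57.6 kb/s, which is six times 9.6 kb/s.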
Fiber optics
Fiber optics, if correctly installed and terminated, offers the best quality and most secure transmission of all. Even though it has been used in long-distance telecommunications, even across oceans, for over 30 years, it has been avoided or neglected in CCTV.
The main excuse installers have used was the fear of unknown technology, often labeled as “touchy
and sensitive” and also considered “too expensive.”
Fiber optics, however, offers many important advantages over other media and although it used to be
very expensive and complicated to terminate, it is now becoming cheaper and simpler to install.
The most important advantages of all are immunity to electromagnetic interference, more secure
transmission, wider bandwidth, and much longer distances without amplification. We will, therefore,
devote more space to it.
Why fiber?
Fiber optics is a technology that uses light as a carrier of information, be it analog or digital. This light is usually infrared, and the transmission medium is the optical fiber.
Fiber optic signal transmission offers many advantages over the existing metallic links. These are:
• It provides very wide bandwidth.
• It achieves very low attenuation, on the order of 1.5 dB/km, compared to over 30 dB/km for RG-59 coax (at 10 MHz).
• The fiber (which is dielectric) offers electrical
(galvanic) isolation between the transmitting
and receiving end; therefore, no ground loops
are possible.
• Light used as a carrier of the signal travels
entirely within the fiber. Therefore, it causes
no interference with the adjacent wires or other
optical fibers.
• The fiber is immune to nearby signals and electromagnetic interference (EMI); therefore, it is irrelevant whether the fiber optics cable passes next to a 110 VAC, 240 VAC, or 10,000 VAC line, or whether it is close to a megawatt transmitter. Even more important, lightning cannot induce any voltage in it, even if it strikes a centimeter from the fiber cable.
• A fiber optics cable is very small and light in weight.
• It is impossible to tap into a fiber optics cable without physically intercepting the signal, in which case the interception would be detected at the receiving end. This is especially important for security applications.
• Fiber is becoming cheaper every day. A basic fiber optics cable costs anywhere from $1 to $5 per meter, depending on the specific type and construction used.
Fiber optics also has some less attractive features, but these are being improved:
• Termination of fiber optics requires special tools and better precision of workmanship than
with any other media.
• Switching and routing of fiber optics signals are difficult.
Fiber optics offers more advantages than other cables.
Fiber optics has been used in telecommunications for many years and is becoming more popular in
CCTV and security in general.
As the technology for terminating and splicing fiber improves and at the same time gets cheaper, there
will be more CCTV and security systems with fiber optics.
The concept
The concept of fiber optics lies in the fundamentals of light refraction and reflection.
To some it may seem impossible that a perfectly clear fiber can constrain the light rays to stay within
the fiber as they travel many kilometers, and yet not have these rays exit through the walls along the
trip. In order to understand this effect, we have to refresh our memory about the physical principle of
total reflection.
Physicist Willebrord Snell laid down the principles of refraction and reflection in the early seventeenth century. When light enters a denser medium, not only does its speed decrease, but its direction of travel also changes, so that the light preserves the wave nature of its propagation (see Chapter 3).
The manifestation of this is a light ray that bends sharply when entering a different medium. We have all seen the “broken straw effect” in a glass of water. That is refraction.
A typical glass has an index of refraction of approximately 1.5. The higher the index, the slower the speed of light in the medium, and thus the more sharply the ray bends when it enters the surface.
Fiber optic signal transmission is based on the effect of total reflection.
The beauty of a diamond comes primarily from the rainbow of colors we see due to its high index of refraction (2.42). This is explained by the fact that a ray of natural (white) light contains all the colors (wavelengths) that white light is composed of, and each of them is refracted by a slightly different amount, so they fan out.
Fiber optics uses a special case of refraction: beyond a certain maximum incident angle, refraction turns into total reflection. This phenomenon occurs when a light ray passes from a denser medium into a sparser medium.
The accompanying drawing shows the effect of a diver viewing the sky from under the water. There is an angle beyond which he can no longer see above the water surface. This angle is called the angle of total reflection. Beyond that point he will actually see objects inside the water, and it will seem to him like looking into a mirror (assuming the water surface is perfectly still).
For the index of refraction of water (1.33), using Snell’s Law, we can calculate this angle:
$$\sin\Phi_T = \frac{1.00}{1.33} \approx 0.752 \;\Rightarrow\; \Phi_T \approx 48.8^\circ$$
The concept of fiber optics transmission follows the very same principles.
The core of a fiber optics cable has an index of refraction higher than the index of the cladding.
Thus, when a light ray travels inside the core it cannot escape it because of the total reflection.
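Snell’s Law gives the critical angle for any pair of indices. The sketch below reproduces the diver example and, for illustration, applies the same function to a core of index 1.46 and a cladding of 1.40 (the example values used in the numerical aperture discussion). Angles are measured from the normal to the boundary; the function name is illustrative.

```python
import math


def critical_angle_deg(n_dense: float, n_sparse: float) -> float:
    """Angle of incidence (from the normal) at and beyond which total reflection occurs."""
    if n_sparse >= n_dense:
        raise ValueError("total reflection needs n_dense > n_sparse")
    return math.degrees(math.asin(n_sparse / n_dense))


water_to_air = critical_angle_deg(1.33, 1.00)      # about 48.8 degrees
core_to_cladding = critical_angle_deg(1.46, 1.40)  # about 73.5 degrees
```

The steep 73.5° value means that, inside such a fiber, only rays traveling within about 16.5° of the fiber axis strike the core boundary at a grazing enough angle to be totally reflected.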
So, what we have at the fiber optics transmitting end is an LED (light-emitting diode) or LD (laser
diode) that is modulated with the transmitted signal.
In the case of CCTV the signal will be video, but similar logic applies when the signal is digital, like PTZ control, network, or other security data. When transmitting, the infrared diode is intensity modulated and pulsates with the signal variations. At the receiving end, there is basically a photodetector that receives the optical signal and converts it into an electrical one.
Fiber optics usage is based on the effect of total reflection.
Fiber optics used to be very expensive and hard to terminate, but that is no longer the case, because the technology has improved substantially. Optical technology has long been known to have many potential capabilities, but major advancements came with the mass production of cheap fundamental devices such as semiconductor light-emitting diodes, lasers, and optical fibers.
Nowadays, we are witnessing the conversion of most terrestrial hard-wired copper links to fiber.
Types of optical fibers
There are a few different types of fiber optics cables, classified by the path light waves take through the fiber.
As mentioned in the introduction, the basic idea is to use the total reflection effect that results from the different indices of refraction (n1 > n2), where n1 is the index of the inner (core) fiber and n2 is the index of the outer (cladding) fiber.
A typical representation of what we have just described is the step index fiber optics cable. Its index profile is shown here, as well as how light travels through such a cable. Note the input pulse deformation caused by the different path lengths of light rays bouncing off the cylindrical surface that divides the two regions of different index. This is called modal distortion.
The three different types of fibers
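How much a step index fiber spreads a pulse can be estimated with simple ray optics. The sketch assumes the two extreme rays: one traveling straight along the axis, and one bouncing at the critical angle, whose zigzag path is longer by the factor n_core/n_clad. It ignores mode coupling and material dispersion, and the index values 1.46 and 1.40 are illustrative.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def modal_spread_ns(length_m: float, n_core: float, n_clad: float) -> float:
    """Arrival-time difference between the axial ray and the steepest guided ray."""
    t_axial = length_m * n_core / C          # straight path at speed c/n_core
    t_steepest = t_axial * n_core / n_clad   # longer zigzag path at the critical angle
    return (t_steepest - t_axial) * 1e9


spread_1km = modal_spread_ns(1000.0, 1.46, 1.40)  # roughly 209 ns
```

A spread of roughly 0.2 µs per kilometer limits how closely successive pulses can follow one another, and it grows in proportion to the cable length.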
In order to equalize the path lengths of different rays and improve the pulse response, a graded index (or multi-mode) fiber optics cable was developed. In graded index fiber, the rays that travel longer paths move faster, so all of them take more or less the same time, causing the effect of optical standing waves.
And finally, a single-mode fiber cable is available, with an even better pulse response and almost no modal distortion.
This last type is the most expensive of all and offers the longest distances achievable with the same electronics. For CCTV applications, multi-mode and step index fibers are adequate.
The index profiles of the three types are shown above.
Numerical aperture
The light that is injected into the fiber cable may come from various angles.
Because of the different indices of air and the fiber, we can apply the theory of refraction, where Snell’s Law gives us:
$$n_0 \sin\varphi_0 = n_1 \sin\varphi_1 \tag{52}$$
Understandably, n1 is the index of the fiber core and n0 is the index of air, which is nearly 1. This gives us:
$$\sin\varphi_0 = n_1 \sin\varphi_1 \tag{53}$$
The left-hand side of the above is accepted to be a very important fiber cable property, called the numerical aperture (NA).
NA represents the light-gathering ability of a fiber optics cable. In practice, NA helps us understand how two terminated fibers can be put together and still make signal contact. A realistic value of a typical NA angle for a step index fiber cable is shown in the accompanying drawing.
To calculate NA (basically the angle φ0), it is not necessary to know the angle φ1. The following basic trigonometric transformations express NA using only the fiber indices of refraction.
Applying Snell’s Law at the core–cladding boundary and using the drawing, we get:
$$n_1 \sin(90^\circ - \varphi_1) = n_2 \sin(90^\circ - \varphi_2) \tag{54}$$
At the onset of total reflection, the refracted ray travels along the boundary, so φ2 = 0°; therefore, the above becomes:
$$n_1 \sin(90^\circ - \varphi_1) = n_2 \tag{55}$$
Since sin(90° – φ1) = cos φ1, we can write:
$$\cos\varphi_1 = \frac{n_2}{n_1} \tag{56}$$
Knowing the basic rule of trigonometry,
$$\sin^2\varphi + \cos^2\varphi = 1 \tag{57}$$
and using equation (56), we can convert (53) into a more useful relation without the sine and cosine of φ1:
$$\frac{\sin^2\varphi_0}{n_1^2} + \frac{n_2^2}{n_1^2} = 1 \tag{58}$$
$$\sin^2\varphi_0 = n_1^2 - n_2^2 \tag{59}$$
$$\mathrm{NA} = \sin\varphi_0 = \sqrt{n_1^2 - n_2^2} \tag{60}$$
Formula (60) is the well-known formula for calculating the numerical aperture of a fiber cable from the two known indices, those of the core and the cladding.
Obviously, the higher this number, the wider the angle of light acceptance of the fiber.
A realistic example would be n1 = 1.46 and n2 = 1.40, which gives NA ≈ 0.41, that is, φ0 ≈ 24.5°.
For a graded index fiber, this aperture is variable, depending on the radial position at which the index is measured, but it is generally lower than that of a step index multi-mode fiber. A single-mode 9/125 µm fiber has NA = 0.1.
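Formula (60) and the corresponding acceptance angle are easy to evaluate numerically. A minimal sketch, assuming light is launched from air (n0 = 1 by default) and using the example indices from the text; the function names are illustrative.

```python
import math


def numerical_aperture(n_core: float, n_clad: float) -> float:
    """NA = sqrt(n_core^2 - n_clad^2), formula (60)."""
    return math.sqrt(n_core ** 2 - n_clad ** 2)


def acceptance_angle_deg(na: float, n_outside: float = 1.0) -> float:
    """Half-angle phi0 of the acceptance cone, from n_outside * sin(phi0) = NA."""
    return math.degrees(math.asin(na / n_outside))


na_step = numerical_aperture(1.46, 1.40)   # about 0.414
phi0_step = acceptance_angle_deg(na_step)  # about 24.5 degrees
phi0_single = acceptance_angle_deg(0.1)    # single-mode 9/125 fiber, about 5.7 degrees
```

The comparison shows why coupling light into single-mode fiber is so much harder: its acceptance cone is roughly four times narrower than that of the step index example.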
Light levels in fiber optics
Light output power is measured in watts (like any other power), but since the light sources used in fiber optics communications are of very low power, it is more appropriate to compare an output power relative to the input one, in which case we get the well-known equation for decibels:
$$A_R = 10 \log_{10}\!\left(\frac{P_O}{P_I}\right) \tag{61}$$
However, if we compare a certain light power with an absolute value, like 1 mW, then we are talking about dBm, that is:
$$A_A = 10 \log_{10}\!\left(\frac{P}{1\ \mathrm{mW}}\right)$$
Working with decibels makes calculation of transmission levels much easier.
Negative decibels, when A_R is calculated, mean loss, and positive decibels mean gain.
In the case of A_A, a negative number of dBm represents power less than 1 mW and a positive number represents power more than 1 mW.
The definition of dB when comparing power values is as shown in equation (61), but as noted earlier, there is a slightly different definition when voltage or current is compared and expressed in decibels:
$$B_R = 20 \log_{10}\!\left(\frac{V_O}{V_I}\right)$$
Without going into the theory, it should be remembered that power decibels are calculated with 10, and voltage (and current) decibels with 20, in front of the logarithm.
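The power and voltage decibel definitions above, and conversion to and from dBm, translate directly into code. This is a plain sketch; the function names are illustrative.

```python
import math


def power_ratio_db(p_out: float, p_in: float) -> float:
    """Relative power level, 10 log(Po/Pi): negative means loss, positive gain."""
    return 10 * math.log10(p_out / p_in)


def mw_to_dbm(p_mw: float) -> float:
    """Absolute power level relative to 1 mW."""
    return 10 * math.log10(p_mw / 1.0)


def dbm_to_mw(level_dbm: float) -> float:
    """Inverse of mw_to_dbm."""
    return 10 ** (level_dbm / 10)


def voltage_ratio_db(v_out: float, v_in: float) -> float:
    """Voltage (or current) ratio uses 20 log instead of 10 log."""
    return 20 * math.log10(v_out / v_in)
```

For example, halving the power gives about −3 dB, while halving the voltage gives about −6 dB, and an LED rated at 1 dBm delivers about 1.26 mW.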
Light, when transmitted through a fiber cable, can be lost due to:
• Source coupling
• Optical splices
• Attenuation of the fiber due to nonhomogeneity
• High temperature and so on
When designing a CCTV system with fiber optics cables, it is very important to know the total attenuation, since we work with very small signals. It is therefore better to work with worst case estimates rather than average values, which helps in designing a safe, high-quality system.
For this purpose it should be known that in most cases an 850 nm LED light power output is between
1 dBm and 3 dBm, while a 1300 nm LED has a bit less power, usually from 0 dBm to 2 dBm (note:
the power is expressed relative to 1 mW).
The biggest loss of light occurs in the coupling between the LED and the fiber.
It also depends on the NA and on whether step or graded index fiber is used.
A realistic figure for source coupling losses is around –14 dB (relative to the source power output).
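A worst-case power budget simply subtracts the losses in decibels from the source level in dBm. The sketch below uses the worst-case 850 nm LED output (1 dBm) and the –14 dB coupling loss quoted above, and takes 1.5 dB/km from the attenuation figure given earlier in this chapter; the 0.5 dB per splice and the –30 dBm receiver sensitivity are hypothetical values chosen only for illustration, so real figures should be taken from the data sheets of the equipment in use.

```python
def received_dbm(source_dbm: float, coupling_loss_db: float,
                 fiber_loss_db_per_km: float, length_km: float,
                 splice_loss_db: float, splices: int) -> float:
    """Optical level at the receiver: source level minus every loss along the way."""
    return (source_dbm - coupling_loss_db
            - fiber_loss_db_per_km * length_km
            - splice_loss_db * splices)


def max_length_km(source_dbm: float, coupling_loss_db: float,
                  fiber_loss_db_per_km: float, splice_loss_db: float,
                  splices: int, sensitivity_dbm: float) -> float:
    """Longest run for which the received level still reaches the sensitivity."""
    budget_db = (source_dbm - coupling_loss_db
                 - splice_loss_db * splices - sensitivity_dbm)
    return budget_db / fiber_loss_db_per_km


SENSITIVITY_DBM = -30.0  # hypothetical receiver sensitivity

rx = received_dbm(1.0, 14.0, 1.5, 2.0, 0.5, 2)   # -17.0 dBm for a 2 km run
margin_db = rx - SENSITIVITY_DBM                  # 13.0 dB to spare
longest = max_length_km(1.0, 14.0, 1.5, 0.5, 2, SENSITIVITY_DBM)  # about 10.7 km
```

Because every term is in decibels, the budget is a simple sum; working with milliwatts instead would require multiplying the individual loss ratios.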
Light sources in fiber optics transmission
The two basic electronic components used in producing light for fiber optics
cables are:
• LEDs
• LDs
Both of these produce light in the infrared region, at wavelengths above 700 nm.
The light-generating process in both LEDs and LDs results from the recombination of electrons and holes inside a P-N junction when a forward bias current is applied. This light is actually called electroluminescence.
The recombined electron/hole pairs have less energy than each constituent had separately before the
recombination. When the holes and electrons recombine, they give up this surplus energy difference,
which leaves the point of recombination as photons (basic unit carriers of light).
The wavelength associated with this photon is determined by the equation:
$$\lambda = \frac{hc}{E}$$
where:
h is Planck’s constant, a fundamental constant in physics: 6.63 × 10⁻³⁴ J·s
c is the speed of light (3 × 10⁸ m/s)
E is the band gap energy of the P-N material
Since h and c are constant, the wavelength depends solely on the band gap energy, that is, on the material used. This is a very important conclusion.
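Because λ = hc/E, the band gap needed for a given wavelength, or the wavelength produced by a given band gap, follows directly. The sketch below uses the exact SI values of h, c, and the electron charge (the latter only to express E in electronvolts, a unit chosen here for convenience) and evaluates the band gap energies that correspond to the 850 nm and 1300 nm sources discussed in this chapter.

```python
H = 6.62607015e-34      # Planck's constant, J*s
C = 299_792_458.0       # speed of light in vacuum, m/s
EV = 1.602176634e-19    # joules per electronvolt


def wavelength_nm(band_gap_ev: float) -> float:
    """Photon wavelength for a given band gap, lambda = h*c/E."""
    return H * C / (band_gap_ev * EV) * 1e9


def band_gap_ev(wavelength_in_nm: float) -> float:
    """Band gap energy needed to emit at the given wavelength, E = h*c/lambda."""
    return H * C / (wavelength_in_nm * 1e-9) / EV


e_850 = band_gap_ev(850)    # about 1.46 eV
e_1300 = band_gap_ev(1300)  # about 0.95 eV
```

The inverse relationship is visible at once: the longer-wavelength 1300 nm source needs a material with a noticeably smaller band gap.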
For pure gallium arsenide (GaAs), λ is 900 nm. By adding small amounts of aluminium, for example, the wavelength can be lowered to 780 nm. For even shorter wavelengths, other materials, such as gallium arsenide phosphide (GaAsP) or gallium phosphide (GaP), are used.
The basic differences between an LED and an LD are in the generated wavelength spectrum and the dispersion angle of that light.
An LED generates a fairly wide spread of wavelengths around the central wavelength, as shown below. An LD has a very narrow spectrum, almost a single wavelength.
An LED P-N junction not only emits light with more frequencies than an LD, but it does so in all directions, that is, with no preferred direction of dispersion. This dispersion greatly depends on the mechanical construction of the diode and on the light absorption and reflection of the surrounding area. The radiation, however, is omnidirectional, and to narrow it down, LED manufacturers put a kind of focusing lens on top. This is still far too wide an angle to be used with a single-mode fiber cable, which is the main reason LEDs are not used as transmitting devices with single-mode fiber cables.
An LD is built of a similar
material as an LED and the light-generating process is similar,
but the junction area is much
smaller and the concentration
of the holes and electrons is
much greater. The generated
light can only exit from a very
small area. At certain current
levels, the photon generation
process gets into a resonance
and the number of generated
photons increases dramatically,
producing more photons with
the same wavelength and in
phase. Thus, the optical gain
is achieved in an organized
way and the generated light
is a coherent (in phase),
stimulated emission of light.
In fact, the word “laser” is
an abbreviation for light
amplification by stimulated
emission of radiation.
To start this stimulated emission of light, an LD requires a minimum current of 5 to 100 mA, called the threshold current. This is much higher than the threshold of normal LEDs. Once the emission starts, however, LDs produce a high optical power output with a very narrow dispersion angle.
For transmitting high frequencies and analog signals, it is important to have a light output linear with
the applied drive current, as well as a wide bandwidth.
LEDs are good with respect to linearity but not as good as LDs in high-frequency reproduction, although they do exceed 100 MHz, which, for us in CCTV, is more than sufficient.
Laser diodes can easily achieve frequencies in excess of 1 GHz.
The above can be illustrated with the same analogy as when discussing magnetic recording: Imagine
the light output spectrum of an LED and LD to be tips of pencils. The LED spectrum will represent
the thicker and the LD the thinner pencil tip. With the thinner pencil you can write smaller letters and more text in the same space; similarly, a signal modulated with an LD can contain higher frequencies.
LEDs, however, are cheaper and linear and require no special driving electronics. An LED of 850 nm
costs around $10, whereas 1300 nm is around $100. Their MTBF is extremely high (106 – 108 hrs).
LDs are more expensive, between $100 and $15,000. They are very linear once the threshold is
exceeded. Because their output depends strongly on the operating temperature, they often include a temperature
control circuit, and feedback stabilization of the output power is necessary. In return, they offer a
higher modulation bandwidth and a narrower carrier spectral width, and they launch more power into
small fibers. Their MTBF is lower than that of LEDs, although still quite high (10⁵–10⁷ hrs).
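The threshold behavior described above can be pictured with an idealized, piecewise-linear light–current (L–I) model. This is a minimal sketch, not a model of any particular device: the 20 mA threshold (within the 5 to 100 mA range quoted) and the 0.3 mW/mA slope efficiency are hypothetical values, and the weak spontaneous emission below threshold is ignored.

```python
def ld_output_mw(current_ma: float,
                 threshold_ma: float = 20.0,
                 slope_mw_per_ma: float = 0.3) -> float:
    """Idealized laser diode L-I curve.

    Below the threshold current there is (in this simplified model) no
    stimulated emission, so the output is taken as zero. Above it, the
    optical power rises linearly with the drive current.
    threshold_ma and slope_mw_per_ma are hypothetical example values.
    """
    if current_ma <= threshold_ma:
        return 0.0
    return slope_mw_per_ma * (current_ma - threshold_ma)


if __name__ == "__main__":
    for drive in (10, 20, 30, 50):
        print(f"{drive:3d} mA -> {ld_output_mw(drive):.1f} mW")
```

The straight segment above threshold is what makes an LD very linear once the threshold is exceeded, which is why, for analog modulation, the drive current is kept above threshold.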
Recently, a new LED called a super luminescent diode (SLD) has been attracting a great deal of
attention. The technical characteristics of the SLDs are in between those of the LEDs and LDs.
For CCTV applications, LEDs are sufficient light sources. LDs are more commonly used in multichannel
wide-bandwidth multiplexers or on very long single-mode fiber runs.
Light detectors in fiber optics
Devices used for detecting the optical signals on the other side of the fiber cable are known as photo
diodes. This is because the majority of them are actually one type of diode or another.
Photo diodes used in fiber technology are divided into three basic types:
• P-N photo diodes (PNPD)
• PIN photo diodes (PINPD)
• Avalanche photo diodes (APD)
The PNPD is like a normal P-N junction silicon diode that is sensitive to infrared light. Its main
characteristics are low responsivity and a long rise time.
The PINPD is a modified P-N diode where an intrinsic layer is inserted between the P and N types
of silicon. It has high responsivity and short rise times.
The APD is similar to the PINPD, but it has the advantage that almost every incident photon produces more
than one electron/hole pair, as a result of an internal chain reaction (avalanche effect). Consequently,
the APD is more sensitive than the PIN diode, but it also generates more noise.
All these basic devices are combined with amplification and “transimpedance” stages that amplify the
signal to the required current/voltage levels.
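The difference between the PIN diode and the APD can be expressed numerically. A photodiode's responsivity is R = ηqλ/(hc) amperes per watt, where η is the quantum efficiency; an APD multiplies the resulting photocurrent by its avalanche gain M. The sketch below assumes a hypothetical η of 0.8 at 850 nm, 1 µW of received light, and M = 50; real values come from the detector datasheet.

```python
PLANCK_J_S = 6.62607015e-34        # Planck constant
LIGHT_SPEED_M_S = 2.99792458e8     # speed of light in vacuum
ELECTRON_CHARGE_C = 1.602176634e-19


def responsivity_a_per_w(wavelength_nm: float, quantum_efficiency: float) -> float:
    """Photodiode responsivity R = eta * q * lambda / (h * c), in A/W."""
    wavelength_m = wavelength_nm * 1e-9
    return (quantum_efficiency * ELECTRON_CHARGE_C * wavelength_m
            / (PLANCK_J_S * LIGHT_SPEED_M_S))


def photocurrent_a(optical_power_w: float, wavelength_nm: float,
                   quantum_efficiency: float, gain: float = 1.0) -> float:
    """Detector output current; gain=1 for a PIN diode, M > 1 for an APD."""
    return responsivity_a_per_w(wavelength_nm, quantum_efficiency) * optical_power_w * gain


if __name__ == "__main__":
    received_w = 1e-6  # hypothetical 1 uW at the detector
    pin = photocurrent_a(received_w, 850, 0.8)
    apd = photocurrent_a(received_w, 850, 0.8, gain=50)  # hypothetical gain M = 50
    print(f"PIN: {pin * 1e6:.2f} uA   APD: {apd * 1e6:.1f} uA")
```

Keep in mind that the avalanche gain multiplies the noise as well as the signal, which is why the APD, although more sensitive, is also noisier.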
Frequencies in fiber optics transmission
The attenuation of optical fibers can be grouped into attenuation due to material influences and attenuation due to external influences.
Material influences include:
• Rayleigh scattering. This is due to the inhomogeneities in the fiber glass, the size of which is
small compared to the wavelength. At 850 nm, this attenuation may add up to 1.5 dB/km, reducing
to 0.3 dB/km for a wavelength of 1300 nm and 0.15 dB/km for 1550 nm.
• Material absorption. This occurs if hydroxyl ions and/or metal ions are present in the fiber.
Material absorption is much smaller than the Rayleigh scattering and usually adds up to 0.2
dB/km to the signal attenuation.
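Rayleigh scattering falls off with the fourth power of wavelength (loss proportional to 1/λ⁴). As a quick plausibility check, the sketch below scales the 1.5 dB/km figure quoted for 850 nm to the other two windows; it yields about 0.27 dB/km at 1300 nm and 0.14 dB/km at 1550 nm, close to the 0.3 and 0.15 dB/km quoted above. Only the Rayleigh component is modeled; absorption and external effects are ignored.

```python
def rayleigh_loss_db_per_km(wavelength_nm: float,
                            ref_loss_db_per_km: float = 1.5,
                            ref_wavelength_nm: float = 850.0) -> float:
    """Scale a reference Rayleigh scattering loss by (ref_wavelength / wavelength)^4."""
    return ref_loss_db_per_km * (ref_wavelength_nm / wavelength_nm) ** 4


if __name__ == "__main__":
    for wl in (850, 1300, 1550):
        print(f"{wl} nm: {rayleigh_loss_db_per_km(wl):.2f} dB/km")
```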
The external effects that influence the attenuation are:
• Micro-bending. This is mainly due to inadequate cable design, that is, inconsistency of the fiber
cable precision along its length. It can amount to several dB/km.
• Fiber geometry. This is similar to the above, but is basically due to poor control of the fiber's
drawn diameter.
The accompanying diagram on the next page shows a very important fact: that not all the wavelengths
(frequencies) have the same attenuation when sent through a fiber cable.
The wavelengths around the areas indicated with the vertical dotted lines are often called fiber optics
windows. There are three windows:
• First window at 850 nm
• Second window at 1300 nm
• Third window at 1550 nm
The first window does not really offer minimum attenuation compared with the longer wavelengths, but
it was the first one used in fiber transmission because the LEDs produced for it were reasonably
efficient and easy to make.
For short-distance applications, such as CCTV, this is still the cheapest and preferred wavelength.
The 1300 nm wavelength is becoming more commonly used in CCTV. This is the preferred wavelength
for professional telecommunications as well as CCTV systems with longer cable runs, where higher light
source cost is not a major factor. The losses at this wavelength are much lower, as can be seen from the
diagram. The difference between 850 nm and 1300 nm in attenuation is approximately 2 to 3 dB/km.
The 1550 nm wavelength has even lower losses; therefore, more future systems will be oriented toward
this window.
For illustration purposes, a typical attenuation figure of a multi-mode 62.5/125 µm fiber cable, for an
850 nm light source, is less than 3.3 dB per kilometer. If a 1300 nm source is used with the same fiber,
attenuation of less than 1 dB/km can be achieved. Therefore, longer distances can be covered with the
same fiber cable just by changing the light source. This is especially useful with analog signals,
such as video.
When an 850 nm light source is used with 62.5/125 µm cable, we can easily have a run of at least a
couple of kilometers, which in most CCTV cases is more than sufficient. However, longer distances
can be achieved by using graded-index multi-mode fiber and even longer when a 1300 nm light source is
used instead of 850 nm.
The longest runs can be achieved with single-mode fiber cable and light sources of 1300 nm and 1550 nm.
A typical attenuation figure for a 1300 nm light source is less than 0.5 dB/km, and for 1550 nm it is
less than 0.4 dB/km.
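The effect of choosing a different wavelength or fiber type can be estimated by dividing the loss that can be tolerated by the attenuation per kilometer. The sketch below uses the maximum attenuation figures quoted above and a hypothetical 10 dB budget reserved for the fiber itself. It ignores connectors, splices, and the bandwidth limits of multi-mode fiber, so treat the results as rough, attenuation-only estimates.

```python
# Maximum attenuation figures quoted in the text (dB/km).
FIBER_LOSS_DB_PER_KM = {
    "multi-mode 62.5/125, 850 nm": 3.3,
    "multi-mode 62.5/125, 1300 nm": 1.0,
    "single-mode, 1300 nm": 0.5,
    "single-mode, 1550 nm": 0.4,
}


def max_reach_km(budget_db: float, loss_db_per_km: float) -> float:
    """Distance at which fiber attenuation alone uses up the given budget."""
    return budget_db / loss_db_per_km


if __name__ == "__main__":
    budget_db = 10.0  # hypothetical loss budget left for the fiber itself
    for name, loss in FIBER_LOSS_DB_PER_KM.items():
        print(f"{name}: {max_reach_km(budget_db, loss):.1f} km")
```

Because the quoted figures are upper limits ("less than"), the distances obtained this way are on the conservative side as far as attenuation is concerned.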
Passive components
Apart from the previously mentioned photo diodes and detectors, which can also be considered as
active devices, there are some passive components used in fiber optics systems.
These are:
• Splices: permanent or semipermanent junctions between fibers.
• Connectors: junctions that allow an optical fiber to be repeatedly connected to and/or disconnected
from another fiber or to a device such as a source or detector.
• Couplers: devices that distribute optical power among two or more fibers or combine optical
power from two or more fibers into a single fiber.
• Switches: devices that can reroute optical signals under either manual or electronic control.
Fusion splicing
Two fibers are welded together, often under a
microscope. The result is usually very good, but the
equipment might be expensive.
The procedure of fusion splicing usually consists of
cleaning the fiber, cleaving, and then positioning the
two fibers in some kind of mounting blocks.
The precision of this positioning is improved by
using a microscope, which is quite often part of the
machine. When the alignment is achieved, an electric
arc is produced to weld the two fibers. Such a process
can be monitored and repeated if an unsatisfactory
joint is produced.
Losses in fusion splicing are very low, usually around
0.1 dB.
Mechanical splicing
This is probably the most common way of splicing,
owing to the inexpensive tools used with relatively
good results.
Fibers are mechanically aligned, in reference to
their surfaces and (usually) epoxied together. The
performance cannot be as good as fusion splicing,
but it may come very close to it. More importantly,
the equipment used to perform the mechanical
splicing is far less expensive.
Losses in a good mechanical splicing range between
0.1 and 0.4 dB.
Mechanical splicing is based on two main alignment techniques:
• V groove
• Axis alignment
Both of these are shown in the diagrams at right.
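Splice losses add up along a run, so the choice between fusion and mechanical splicing matters more as the number of joints grows. A minimal sketch, assuming a hypothetical run with four splices and using the per-splice figures quoted above (about 0.1 dB for fusion, 0.1 to 0.4 dB for a good mechanical splice):

```python
FUSION_LOSS_DB = 0.1               # typical loss per fusion splice
MECHANICAL_LOSS_DB = (0.1, 0.4)    # best and worst loss per good mechanical splice


def total_splice_loss_db(splice_count: int, loss_per_splice_db: float) -> float:
    """Total splice loss of a run, assuming every splice has the same loss."""
    return splice_count * loss_per_splice_db


if __name__ == "__main__":
    splices = 4  # hypothetical number of splices along the run
    fusion = total_splice_loss_db(splices, FUSION_LOSS_DB)
    best, worst = (total_splice_loss_db(splices, x) for x in MECHANICAL_LOSS_DB)
    print(f"fusion:     {fusion:.1f} dB")
    print(f"mechanical: {best:.1f} to {worst:.1f} dB")
```

Totals such as these are what enter the link analysis described later in this chapter.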
For a good connection, the fiber optics cable needs
a good termination, which is still the hardest part
of a fiber optics installation. It requires high precision, patience, and a little practice. Anyone
can learn to terminate a fiber cable, but installers who lack such skills can hire
specialized people who supply the terminals, terminate the cable, and test it. The latter is the
preferred arrangement in the majority of CCTV fiber optics installations.
Fiber optics multiplexers
These multiplexers are different from the VCR multiplexers described earlier. Fiber optics multiplexers
combine several signals into one, so that a single fiber can carry
several live signals simultaneously. They are especially practical in systems with an insufficient number of cables
(relative to the number of cameras).
There are a few different types of fiber multiplexers. The simplest and most affordable multiplexing
for fiber optics transmission is by use of wavelength division multiplexing (WDM) couplers. These
are couplers that transmit optical signals from two or more sources, operating at different wavelengths,
over the same fiber. This is possible because light rays of different wavelengths do not interfere
with each other. Thus, the capacity of the fiber cable can be increased, and if necessary, bidirectional
operation over a single fiber can be achieved.
Frequency-modulated frequency division multiplexing (FM-FDM) is a reasonably economical
design with acceptable immunity to noise and distortions, good linearity, and moderately complex
circuitry. A few brands on the market produce FM-FDM multiplexers for CCTV applications. They
are made with 4, 8, or 16 channels.
Amplitude vestigial sideband modulation, frequency division multiplexing (AVSB-FDM) is
another design, perhaps too expensive for CCTV, but very attractive for CATV, where with high-quality
optoelectronics up to 80 channels per fiber are possible.
Fully digital pulse code modulation, time division multiplexing (PCM-TDM) is another expensive
method, which multiplexes digitized signals and may become attractive as digital video gains greater
acceptance in CCTV.
Combinations of these methods are also possible.
In CCTV we would most often use the FM-FDM for more signals over a single fiber. The WDM type
of multiplexing is particularly useful for PTZ or keyboard control with matrix switchers. Video signals
are sent via separate fibers (one fiber per camera), and one of those fibers also uses WDM to send control data
in the opposite direction.
Even though fiber optics multiplexing is becoming more affordable, it is still recommended at the
planning stage of a fiber installation that at least one spare fiber be run in addition
to the one intended for use.
Fiber optics cables
The optical fiber itself is very small. The external diameter, as used in CCTV and security in general,
is only 125 µm (1 µm = 10⁻⁶ m). Glass fiber is relatively strong but can easily break when bent below a
certain minimum radius. Therefore, the aim of the cabling is to provide adequate mechanical protection
and impact and crush resistance, to preserve the minimum bending radius, to provide easy handling for
installation and service, and to ensure that the transmission properties remain stable throughout the
life of the cable.
The overall design may vary greatly and depends on the application (underwater, underground, in the air,
in conduit), the number of channels required, and similar factors. Invariably, it features some form of
tensile strength member and a tough outer sheath to provide the necessary mechanical and environmental
protection.
Fiber optics cables have various
designs, such as a simple single fiber,
loose tube (fiber inserted into a tube),
slotted core (or open channel), ribbon,
and tight buffer.
We will discuss a few of the most commonly used designs in CCTV.
Single-fiber and dual-fiber cables usually employ a fibrous strength member (aramid yarn) laid around
the secondary coated fiber. This is further protected by a plastic outer sheath.
Multi-fiber cables are made in a variety of configurations.
The simplest involves grouping a number of single-fiber cables with a central strength member within
the outer jacket. The central strength member can be high tensile steel wire, or a fiberglass reinforced
plastic rod. Cables with this design are available with 2 to 12 or more communication fibers. When a
fiberglass-reinforced plastic rod is used as the central strength member, the result is a metal-free optical fiber cable.
Constructed entirely from polymeric material and glass, such cables are intended for use in installations
within buildings. They are suitable for many applications including CCTV, security, computer links,
and instrumentation. These heavy-duty cables are made extremely rugged to facilitate pulling through conduits.
Loose tube cables are designed as a good alternative to the single core and slotted cables. The optical
fibers are protected by water-blocking gel-filled polyester tubes. This type of multi-fiber cable is
designed for direct burial or duct installation in long-haul applications. It can be air pressurized or
gel-filled for water blocking.
There are some other configurations
manufactured with slotted
polyethylene core profiles to
accommodate larger numbers of
fibers. This type is also designed
for direct burial or duct installation
in long-haul applications. It can
be air pressurized or gel filled for
water blocking.
Finally, another type of cable is
the composite optic/metallic
cable. These cables are made up
of a combination of optical fibers
and insulated copper wire and are designed for both indoor and outdoor use. These cables can be fully
filled with a water-blocking compound to protect the fibers from moisture in underground installations,
for example.
Since fiber cable is much lighter than electrical cable of the same diameter, installation is generally
easier.
The protection they have will allow fiber cables to be treated in much the same way as electrical
cables. However, care should be taken to ensure that the manufacturer’s recommended maximum
tensile and crushing forces are not exceeded.
Within a given optical cable, tension is carried by the strength members, usually fiber-reinforced plastic,
steel, Kevlar, or a combination, that protect the comparatively fragile glass fibers. If the cable tension
exceeds the manufacturer’s ratings, permanent damage to the fibers can result.
The rating to be observed, as far as installation tension is concerned, is the maximum installation
tension, expressed in Newtons or kilo-Newtons (N or kN). A typical cable has a tension rating of around
1000 N (1 kN). To get an idea of what a Newton feels like, consider that 9.8 N of tension is created on
a cable hanging vertically and supporting a mass of 1 kg. In addition, manufacturers sometimes specify
a maximum long-term tension. This is typically less than half of the maximum installation tension.
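The conversion between Newtons and an equivalent hanging mass is simple arithmetic, using the 9.8 N per kilogram mentioned above. The sketch treats the long-term limit as exactly half of the installation rating; since the text says it is typically less than half, the second figure is only an upper bound, and the manufacturer's own value should be used where available.

```python
NEWTONS_PER_KG = 9.8  # tension created by a 1 kg mass hanging vertically


def newtons_to_kg(force_n: float) -> float:
    """Mass (kg) whose weight produces the given tension (N)."""
    return force_n / NEWTONS_PER_KG


if __name__ == "__main__":
    install_rating_n = 1000.0                 # typical maximum installation tension
    long_term_bound_n = install_rating_n / 2  # long-term rating is typically below this
    print(f"installation: about {newtons_to_kg(install_rating_n):.0f} kg")
    print(f"long-term:    below about {newtons_to_kg(long_term_bound_n):.0f} kg")
```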
As with coaxial cables, optical fiber cables must not be
bent to a tighter curve than their rated minimum bending
radius. In this case, however, the reason is not the electrical
impedance change, but rather preventing the fiber from
breaking and preserving total internal reflection. The
minimum bending radius varies greatly for various cable
constructions and may even be specified at different values,
depending on the presence of various levels of tension in
the cable. Exceeding the bending radius specification will
place undue stress on the fibers and may even damage the
stiff strength members.
Whenever a cable is being handled or installed, it is most
important to keep the curves as smooth as possible.
Often, during an installation, the cable is subject
to crush stresses such as being walked on or, even
worse, driven over.
Although great care should be taken to avoid such
stresses, the cable is able to absorb such forces up
to its rated crush resistance value. Crush resistance
is expressed in N/m or kN/m of cable length. For
example, a cable with a specified crush resistance
of 10 kN/m can withstand a load of 1000 kg spread
across a full 1 m of cable length (10 Newtons is
approximately the force that results from a 1 kg
mass). If we consider a size 9 boot (European 42)
to be 100 mm wide, then the cable will support a
construction worker who weighs 100 kg standing
on one foot squarely on the cable. However, a
vehicle driving over this cable may exceed the crush
resistance spec and probably damage the cable.
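The boot example scales directly with contact length: the allowable load is the crush rating multiplied by the length of cable actually carrying the load. The sketch below repeats that arithmetic with 9.8 N/kg instead of the rounded 10 N/kg, so its figures come out slightly higher than those in the text. The 150 mm tire contact width is a hypothetical value added for comparison; a passenger car wheel typically carries a quarter of the car's mass, that is, several hundred kilograms, well above the result.

```python
NEWTONS_PER_KG = 9.8


def max_load_kg(crush_rating_n_per_m: float, contact_length_m: float) -> float:
    """Mass the cable can carry when the load is spread over contact_length_m of cable."""
    return crush_rating_n_per_m * contact_length_m / NEWTONS_PER_KG


if __name__ == "__main__":
    rating = 10_000.0  # 10 kN/m, as in the example above
    for label, length_m in (("full meter", 1.0),
                            ("100 mm boot", 0.10),
                            ("150 mm tire (hypothetical)", 0.15)):
        print(f"{label}: {max_load_kg(rating, length_m):.0f} kg")
```

A loop crossing over another cable shortens the loaded length to a few millimeters, so the same calculation gives a far lower allowable load at the crossover.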
Be careful if a cable has one loop crossing over
another: the forces on the cable due to, say,
a footstep right on the crossover will be greatly
magnified because of the smaller contact area.
Likewise in a crowded duct, a cable can be crushed
at localized stress points even though the weight upon
it may not seem excessive.
An optical cable is usually delivered wound onto
wooden drums with some form of heavy plastic
protective layer or wooden cleats around the
circumference of the drums. When handling a cable
drum, due consideration should be given to the mass
of the drum. The most vulnerable parts of a cable drum
are the outer layers of the cable. This is especially the
case when the cable drums are vertically interleaved
with each other. Then damage due to local crushing
should be of concern. To alleviate such problems,
the drums should be stacked either horizontally, or,
if vertically, with their rims touching. Do not allow
the drums to become interlocked. Also, when lifting
drums, with a forklift for example, do not apply
force to the cable surface. Instead, lift at the rims or
through the center axis.
Different types of fiber connectors
Installation techniques
Prior to installation, the cable drums should be checked for any sign of damage or mishandling. The
outer layer of a cable should be carefully examined to reveal any signs of scratching or denting. Should
a drum be suspected of having incurred damage, then it should be marked and put aside. For shorter
lengths (i.e., < 2 km) a simple continuity check can be made of the whole fiber using a penlight as a
light source. Although a fiber cable is normally used with infrared wavelengths, it also transmits visible
light, which is enough to reveal serious breaks in the cable.
Continuity of the fiber can be checked by using a penlight.
The following precautions and techniques are very similar to those described earlier for coaxial cable
installations, but since they are very important we will go through them again.
Before cable laying commences, the route should be inspected for possible problems such as feedthrough, sharp corners, and clogged ducts. Once a viable route has been established, the cable lengths
must be arranged so that should any splices occur, they will do so at accessible positions.
At the location of a splice it is important to
leave an adequate overlap of the cables so that
sufficient material is available for the splicing
operation. Generally, the overlap required is
about 5 m when the splice is of the in-line type.
A length of about 2.5 m is required where the
cable leaves the duct and is spliced.
Note that whenever a cable end is exposed it
must be fitted with a watertight end cap. Any
loose cable should be placed to avoid bending
stress or damage from passing traffic. At either
end of the cable run, special lengths are often
left depending on the configuration planned.
The foremost consideration when burying a cable
is the prevention of damage due to excessive
local point loading. Such loading occurs
when backfill material is poured onto the cable or an
uneven trench profile digs into it, thus
either puncturing the outer sheath or locally
crushing the cable. The damage may become
immediately obvious, or it may take some time
to show itself. Either way, the cost of digging up
the cable and repairing it makes the expenditure
of extra effort during laying well worthwhile.
When laying cables in trenches, a number of
precautions must be taken to avoid damage to the cable or a reduction in cable life expectancy.
The main protection against cable damage is laying the cable on a bed of sand approximately 50 to 150
mm deep and backfilling with another 50 to 150 mm of sand. Due care needs to be taken in the digging
of the trench so that the bottom of the trench is fairly even and free of protrusions. Similarly, when
backfilling, do not allow rocky soil to fall onto the sand, because a rock may be pushed through the cable.
Trench depth is dependent upon the type of ground being traversed as well as the load that is expected
to be applied to the ground above the cable. A cable in solid rock may need a trench of only 300 mm
or so, whereas a trench crossing a road in soft soil should be taken down to about 1 m. A general-purpose
trench in undemanding situations should be 400 to 600 mm deep with 100 to 300 mm total sand bedding.
The most straightforward technique is to lay the cable directly from the drum into a trench or onto a
cable tray. For very long cable runs, the drum is supported on a vehicle and allowed to turn freely on
its axis, or it can be mounted on a metal axle. As the vehicle (or person) advances, the cable is
wound off the drum straight to its resting place. Avoid excessive speed and ensure that the cable can
be temporarily tied down at regular intervals prior to its final securing.
Placing an optical cable on a cable tray is not particularly different from doing so with conventional
cables of a similar diameter. The main points to observe are, again, minimum bending radius and crush
resistance. The minimum bending radius must be observed even when the cable tray does not facilitate this.
The tendency to keep the installation neat by bending the optical cable to match other cables on the tray
must be resisted. Crush loading on cable trays can become a critical factor where the optical cable is led
across a sharp protrusion or crossed over another cable. The optical cable can then be heavily loaded by
further cables being placed on top of it or by personnel walking on the tray. Keep the cable as flat as
possible and avoid local stress points.
The pulling of optical cables through ducts is no
different from conventional cabling. At all times use
only the amount of force required but stay below the
manufacturer’s ratings.
The types of hauling eyes and cable clamps normally
used are generally satisfactory, but remember that
the strength members, not the outer sheath, must take
the load.
If a particular duct requires a lubricant, then it is best to
obtain a recommendation from the cable manufacturer.
Talcum powder and bean bag polystyrene beans can
also be quite useful in reducing friction.
In some conditions, the cable may already be terminated with connectors. These must be heavily
protected while drawing the cable. The connectors themselves must not be damaged or contaminated, and
the cable must not experience any undue stress around the connectors or their protective sheathing.
Once the cable is installed it is often necessary to tie it down. On a cable tray, the cable can be held
down simply with nylon ties. Take particular care to anchor the cable runs in areas of likely creep. On
structures that are unsuited to cable ties, some form of saddle clamp is recommended. Care is required
in the choice and use of such devices so that the cable crush resistance is not exceeded and so that the
outer jacket is not punctured by sharp edges. Clips with molded plastic protective layers are preferred
and only one clip should be used for each cable. Between the secured points of a cable it is wise to
allow a little slack rather than to leave a tightly stretched length that may respond poorly to temperature
variations or vibration.
If the cable is in some way damaged during installation, then leave enough extra cable length around
the damaged area so that an additional splice can be inserted.
The conclusion is that installation of fiber cables is not greatly different from conventional cables,
and provided a few basic concepts are observed, the installation should be trouble free.
Fiber optic link analysis
Now that we have covered the individual components of a fiber optics system – the sources, cables,
detectors, and installation techniques – we can combine them into a complete system. But before the installation,
we first have to do a link analysis, which shows how much signal loss or gain occurs in each stage of the
system. This type of analysis can be done with other transmission media, but it is especially important
with fiber optics because the power levels we are handling are very small. They are sufficient to go
over many kilometers, but can easily be lost if we do not take care of the microscopic connections and splices.
The goal of the link analysis is to determine the signal strength at each point in the overall system and
to calculate whether the power at the receiver (the detector) is sufficient for acceptable performance. If it is
not, each stage is examined and some are upgraded (usually at a higher cost), or the guaranteed performance
specifications (distance, speed, errors) are reduced.
For fiber optics systems, the link analysis should also include unavoidable variations in performance that
occur with temperature, with component aging, and from manufacturing tolerance differences
between two nearly identical devices. In this respect, fiber optics systems need more careful study than
all-electronic systems as there is greater device-to-device variation, together with larger performance
changes due to time and temperature.
As a practical example, the diagram on the next page shows a basic point-to-point fiber optics system,
which consists of an electrical data input signal, a source driver, an optical source, a 1 km optical fiber
with realistic maximum attenuation of 4 dB/km, an optical detector, and the receiver electronics.
We have assumed that the system is handling digital signals, as is the case with PTZ control, but the
logic will be very similar when analog signal budgeting is calculated.
The calculation begins with the optical output power of the source (–12 dBm in this case) and ends with
the power that is seen by the detector.
This analysis looks at each stage in the system and shows both the best and worst case power loss
(or gain) for each link as a result of various factors, such as coupling losses, path losses, normal parts
tolerance (best and worst for a specific model), temperature, and time.
The analysis also allows for an additional 5 dB signal loss that will occur if any repairs or splices are
made over the life of the system.
The conclusion of the example is that the received optical power, for a signal to be recognized, can
be anywhere between +7 dB (in the best case) and –23 dB (in the worst case) relative to the nominal
source value. Technically, +7 dB would mean amplification, which is not really what we have, but rather
it refers to the possible tolerance variations of the components. Therefore, the receiving detector must
handle a dynamic range of optical signals from –5 dBm (–12 dBm + 7 dB = –5 dBm) to –35 dBm (–12
dBm – 23 dB = –35 dBm), representing a binary 1. Of course, when the source is dark (no light, which
means binary 0), the received signal is also virtually zero (except for the system noise).
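The end figures of the example can be checked with a few lines of arithmetic: adding the best- and worst-case totals (+7 dB and –23 dB) to the –12 dBm source power gives the receiver's input range, and converting dBm to absolute power shows that the 30 dB spread is a factor of 1000. The receiver limits used at the end (–38 dBm sensitivity, –3 dBm overload) are hypothetical values for illustration only.

```python
def dbm_to_mw(power_dbm: float) -> float:
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def receiver_covers(min_rx_dbm: float, max_rx_dbm: float,
                    sensitivity_dbm: float, overload_dbm: float) -> bool:
    """True if every expected input level lies between the receiver's limits."""
    return sensitivity_dbm <= min_rx_dbm and max_rx_dbm <= overload_dbm


SOURCE_DBM = -12.0     # optical output of the source in the example
BEST_CASE_DB = 7.0     # best-case total for the whole link
WORST_CASE_DB = -23.0  # worst-case total for the whole link

max_rx_dbm = SOURCE_DBM + BEST_CASE_DB   # -5 dBm
min_rx_dbm = SOURCE_DBM + WORST_CASE_DB  # -35 dBm

if __name__ == "__main__":
    print(f"received power: {min_rx_dbm:.0f} to {max_rx_dbm:.0f} dBm")
    print(f"                {dbm_to_mw(min_rx_dbm) * 1000:.2f} to "
          f"{dbm_to_mw(max_rx_dbm) * 1000:.0f} uW")
    ratio = dbm_to_mw(max_rx_dbm) / dbm_to_mw(min_rx_dbm)
    print(f"dynamic range:  {max_rx_dbm - min_rx_dbm:.0f} dB (x{ratio:.0f})")
    # Hypothetical receiver: -38 dBm sensitivity, -3 dBm overload.
    print("receiver OK:", receiver_covers(min_rx_dbm, max_rx_dbm, -38.0, -3.0))
```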
It is understandable that a digital signal can go further in distance, using the same electronics and
fiber cable, than an analog video signal, simply because of the big error margins digital signals have.
Nevertheless, a similar analysis can be performed with analog signals. If, however, we are not prepared
to do it or do not know how, we can still get an answer to the basic question, Will it work? Unfortunately, the
answer can only be obtained once the fiber is installed. To do so we need an instrument that measures
cable continuity as well as attenuation. This is the optical time domain reflectometer.
An optical time domain reflectometer (OTDR) is an instrument that can test a fiber cable after it has
been installed, to determine any breaks, the attenuation, and the quality of termination.
The OTDR sends a light pulse into one end of the optical fiber and detects the returned light energy
versus time, which corresponds directly to the distance of light traveled.
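Because the pulse travels to the reflection point and back, the distance is half the round-trip path: d = c·t / (2n), where n is the group index of the fiber. The sketch assumes a group index of 1.468, a typical value for silica fiber used here only for illustration; an OTDR is normally set to the index given in the cable manufacturer's datasheet.

```python
LIGHT_SPEED_M_S = 2.99792458e8
GROUP_INDEX = 1.468  # assumed typical value; use the fiber datasheet figure in practice


def event_distance_m(round_trip_time_s: float, group_index: float = GROUP_INDEX) -> float:
    """Distance to a splice, connector, or break from the echo's round-trip time."""
    return LIGHT_SPEED_M_S * round_trip_time_s / (2 * group_index)


if __name__ == "__main__":
    # An echo arriving 10 microseconds after the pulse was launched.
    print(f"{event_distance_m(10e-6):.0f} m")
```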
It requires connection to only one end of the cable, and it shows discontinuities in
the optical path, such as splices, breaks, and connectors.
It uses the physical phenomenon known as Rayleigh backscattering, which occurs within the fiber, to show the signal
attenuation along the fiber’s length. As a light wave travels
through the fiber, a very small amount of incident light
in the cable is scattered back toward the source
by the atomic structure and impurities within the optical
fiber. This is then measured and shown visually on a screen
and/or printed out on a piece of paper as evidence of the
particular installation. Breaks in a fiber cable are
found most easily with an OTDR. Being an expensive
instrument, an OTDR is usually hired for a fiber optics
installation evaluation or used by the specialized people
that terminate the cable.
11. Networking in CCTV
The Information Technology era
Today’s world is, without any doubt, the world of Information Technology, or, as many would refer
to it, the IT world.
In CCTV, we are usually looking for visual information about an event, an intruder, or a procedure,
such as who entered the building before it caught fire, what procedure was followed during heart surgery,
or the license plate of a car involved in a collision.
So, how is information defined, and why is it so important?
Information is any communication or representation of knowledge such as facts, data, or opinions, in any medium or form, including textual, numerical, graphic, cartographic, narrative, or
audiovisual forms.
Human knowledge grows exponentially, and what has been achieved only in the last few decades, for
example, far exceeds the knowledge accumulated through thousands of years before that. The amount
of information in each particular human activity is so large that without proper understanding and
management of such information we would lose track of what we know and where we are heading.
Because information grows exponentially, people have seen a need for a completely new subject – IT
– that deals with such a large amount of information.
IT is part of the larger scope of things that are especially interesting for us in the CCTV industry, and it
is concerned with the hardware and software that processes information, regardless of the technology
involved, whether this is digital video recorders, computers, wireless telecommunications, or others.
Because of the large amount of information recorded in our daily lives, reliable, fast, and efficient
access to such information is of paramount importance.
Filing cabinets and mountains of papers have given way to computers that store and manage information electronically. Colleagues and friends thousands of kilometers apart can share information
instantaneously, just as hundreds of workers in a single location can simultaneously review research
data maintained on-line. Students, doctors, and scientists can study, research, and exchange information even if they are continents apart. Computer networks are the glue that binds these elements together.
The interconnection of a large number of such networks forms the global network we all know as the Internet, over which the World Wide Web runs. This only started in the 1980s, not much more than 20 years before the writing of this book, and yet most of the research and study I had to do for this book was done using the Internet.
The Internet is probably one of the most important human achievements ever.
The Internet is truly a global network, a community of knowledge and information where everybody
can join in without a passport, regardless of their skin color, age, agenda, or religion.
In order to understand the use of IT, networks, and digital technology in today’s CCTV, we have to
dedicate some pages to the networking fundamentals.
Computers and networks
Before defining a network, we have to define the basic intelligent device that is a main part of any
network. This is the computer.
Computers are so much in our daily lives that not only can we not live without them, but they are in
all avenues of our daily lives so it is difficult to define them accurately. One of the many definitions
of a computer is as an electronic device designed to accept digital information (data input), perform prescribed mathematical and logical operations at high speed (processing), and supply the
results of these operations (output). Now, this could easily refer to a digital calculator, and it would
probably be correct, but in CCTV we will use the term computer to define an electronic device that
is composed of hardware (main processor – CPU, memory, and a display output) and software
(Operating System and applications) and that executes a set of instructions defined by its software.
In the early years of the computer, numbers and high-speed calculations were the primary area of
computer usage. As the processing speed and computer power grew, the processing of images, video,
and audio became more frequent and this is the area of our interest.
Initially, in CCTV, computers were used most often in video matrix switchers to intelligently switch cameras onto monitors based on logical processing of external alarm inputs as well as manual selection. Computers are also used in monitoring stations where thousands and thousands of alarms are processed and logged. These days computers are used in many new CCTV products where digital video capturing, processing, compression, and archiving are done. The vast majority of these devices are digital video recorders – DVRs – but IP cameras, even though small, have hardware and software with the equivalent functionality of a computer.
A typical computer of a digital video recorder (DVR) (Courtesy of Fast Video Security AG)
All such computers can work on their own, but their real power becomes evident when they are put in a network environment.
A network is simply a group of two or more computers linked together.
Networking allows one computer to send information to and receive information from another. We
may not always be aware of the numerous times we access information on computer networks. The
Internet is the most conspicuous example of computer networking, linking millions of computers around
the world, but smaller networks play a role in information access on a daily basis. Many libraries and
book shops have replaced their card catalogues with computer terminals that allow patrons to search
for books far more quickly and easily. Many companies exchange all their internal information using
their own LANs; product leaflets are distributed and CCTV system designs are quoted electronically using networks.
Facsimile machines are getting less use. Many Internet search engines help millions of people find the
information they need. In each of these cases, networking allows many different computers in multiple
locations to access a shared database.
Computers in CCTV are becoming more dominant, regardless of whether they run on a full-blown
operating system (OS), such as Windows or Linux, or on an embedded OS residing on a chip. One of
the main and indispensable features of computers is their ability to connect to other computers and
share information via networks. And the fact is – networks are already in place in many businesses,
organizations and even homes. Fitting a CCTV system to such networks is just a matter of connecting
the LAN cable to a digital video recorder, to a network-ready camera, or perhaps to a computer fitted
with a special video capturing card. With some minor network settings the CCTV system can be up
and running in a very short period of time.
This ease of network retrofitting and installation is one of the major attractions (though not the only one) of networks for CCTV.
This is not to say that modern network CCTV systems have to be installed on existing networks. Many designers would actually create a completely new, separate, parallel network, simply because the system then becomes even more secure and dedicated and, most importantly, does not affect the data traffic of normal, everyday business usage.
An example of a small computer network
Once we get to this stage of having networked CCTV, there are many new issues and limitations we face and need to understand in order to further improve or modify our system.
Later in the book we are going to get deeper into each of these issues, but first, let us start with the basics of networking and then clarify some of the key concepts and terminology used.
There are a few types of network transmission configurations and methods (protocols). The best known are the Fiber Distributed Data Interface (FDDI), the Token Ring (as specified by the IEEE 802.5 standard), and the Ethernet (specified by the IEEE 802.3 standard).
Of the three, the most popular, and the one we will devote most of the space in this book to, is the Ethernet.
Ethernet is used in over 85% of the world’s LANs, primarily because it is a simple concept, easy to understand, implement, and maintain; it allows low-cost network implementations; it provides extensive topological flexibility; and it guarantees interconnection and operation of various products that comply with the Ethernet standards, regardless of the manufacturer and operating system used on the computer.
Depending upon the scale of such network configurations, we have two major groups: Local Area Networks (LANs) and Wide Area Networks (WANs).
The Local Area Network (LAN) connects many devices that are relatively close to each other, usually in the same building. A typical example is a business or a company with at least a couple of computers. Sometimes this configuration is called the Intranet.
In a classic LAN configuration, one computer is nominated as the server. It stores all of the software that controls the network, including the software that can be shared by the computers attached to the network. Computers connected to the server are referred to as clients (or workstations). On most LANs, cables are used to connect the network interface cards (NICs) in each computer.
The Wide Area Network (WAN) connects a number of devices that can be many kilometers apart. For example, if a company has offices in two major cities, hundreds of kilometers apart, their LANs would most likely be connected into a WAN, using dedicated lines leased from the local telephone company, or any available ISDN, ADSL, or other network connections.
WANs connect larger geographic areas, such as interstate, or country to country. Satellite uplinks or
dedicated transoceanic cabling can be used to connect this particular type of network. WANs can be
highly complex systems, as they may be connecting local and metropolitan networks to global communications networks like the Internet. To the user, however, a WAN will appear no more complex
than a LAN.
In comparison to WANs, LANs are faster and more reliable, but improvements in technology continue to
blur the line of demarcation and have allowed LAN technologies to connect devices tens of kilometers
apart, while at the same time greatly improving the speed and reliability of WANs. This in turn blurs
the line between the WANs and the Internet.
The Internet can be considered as the largest global WAN.
LAN and WAN example
Networking enables the user to access data from any location. This means an employee of a company
in City A can send (upload) or receive (download) a file to a colleague in City B in a few seconds. The
file can be a quotation document, a product leaflet, a program, or a digital photo.
In CCTV, we are interested mostly in video images (still images or a motion sequence), but other details
such as audio, alarms, and various data logged during the system operation, can also be accessed. The
principles in a network are the same. With the appropriate security level and a correct password, all
such CCTV information can be accessed, copied, and displayed from anywhere on the network.
Understandably, if typical analog cameras are used in a CCTV system, their video signals first have to be converted to digital (unless the cameras already produce a digital output) in order to be recognized and processed
by the network computers. We will talk more about the digitization and compression of the video later in
the book, but it is important to note that once the information is converted to digital, it is easily shared,
copied, printed, transferred, stored, and used, providing one has the correct authorization levels.
This is a very important advantage in security applications since remote sites can be monitored and
systems can be controlled from anywhere at any time. Most digital video recorders, or network cameras, are designed to allow you to view information from a remote location as if you were physically present.
A typical network camera
First, let us present a little bit of history.
In 1973, at Xerox Corporation’s Palo Alto Research Center (more commonly known as PARC), researcher Bob Metcalfe designed and tested the first Ethernet network. While working on a way to link
Xerox’s “Alto” computer to a printer, Metcalfe developed the physical method of cabling that connected
devices on the Ethernet as well as the standards that governed communication on the cable. The data
rate of such a connection was around 3 megabits per second (3 Mb/s).
Metcalfe’s original paper described Ethernet as “a branching broadcast communication system for carrying digital data packets among locally distributed computing stations. The packet transport mechanism provided by Ethernet has been used to build systems which can be viewed as either local computer networks or loosely coupled multiprocessors. An Ethernet’s shared communication facility, its Ether, is a passive broadcast medium with no central control. Coordination of access to the Ether for packet broadcasts is distributed among the contending transmitting stations using controlled statistical arbitration. Switching of packets to their destinations on the Ether is distributed among the receiving stations using packet address recognition.”
The original Metcalfe Ethernet concept drawing
After this event, a consortium of three companies – Digital Equipment Corporation (DEC), Intel, and Xerox – jointly developed the 10 Mb/s Ethernet version 1.0 specification around 1980. In 1983, the Institute of Electrical and Electronics Engineers (IEEE) produced the IEEE 802.3 standard, which was based on, and was very similar to, Ethernet version 1.0.
In 1985 the official standard IEEE 802.3 was published, which marks the beginning of a new era, leading the way to the birth of the Internet a few years later.
Ethernet has since become the most popular and most widely deployed network technology in the
world. Many of the issues involved with Ethernet are common to many network technologies, and
understanding how Ethernet addressed these issues can provide a foundation that will improve your
understanding of networking in general.
The main Ethernet categories
10 Mb/s Ethernet (IEEE 802.3)
This Ethernet category refers to the original shared media Local Area Network (LAN) technology
running at 10 Mb/s. Ethernet can run over various media such as twisted pair and coaxial. 10 Mb/s
Ethernet is distinct from other higher-speed Ethernet technologies such as Fast Ethernet, Gigabit
Ethernet, and 10 Gigabit Ethernet.
Depending on the media, 10 Mb/s Ethernet can also be referred to as:
• 10BaseT – Ethernet over Twisted Pair Media
• 10BaseF – Ethernet over Fiber Media
• 10Base2 – Ethernet over Thin Coaxial Media
• 10Base5 – Ethernet over Thick Coaxial Media
Fast Ethernet (IEEE 802.3u)
Fast Ethernet covers a number of 100 Mb/s Ethernet specifications. It offers a speed increase 10 times
that of the 10BaseT Ethernet specification, while preserving such qualities as frame format, MAC
mechanisms, and MTU. Such similarities allow the use of existing 10BaseT applications and network
management tools on Fast Ethernet networks.
Gigabit Ethernet (IEEE 802.3z)
Gigabit Ethernet builds on top of the Ethernet protocol but increases speed tenfold over Fast Ethernet
to 1000 Mb/s, or 1 gigabit per second (Gb/s). Gigabit Ethernet allows Ethernet to scale from 10/100
Mb/s at the desktop to 100 Mb/s up the riser to 1000 Mb/s in the data center.
By leveraging the current Ethernet standard as well as the installed base of Ethernet and Fast Ethernet
switches and routers, network managers do not need to retrain and relearn a new technology in order
to provide support for Gigabit Ethernet. Cisco is leading the industry by driving the standards for
Gigabit Ethernet while investing in products supporting Gigabit Ethernet, Gigabit Ethernet migration
paths, and ATM.
Gigabit Ethernet over Copper (IEEE 802.3ab)
Gigabit Ethernet over Copper (also known as 1000BaseT) is an extension of the existing Fast Ethernet standard. It specifies Gigabit Ethernet operation over the Category 5e/6 cabling systems already
installed, making it a highly cost-effective solution. As a result, most copper-based environments that
run Fast Ethernet can also run Gigabit Ethernet over the existing network infrastructure in order to
dramatically boost network performance for demanding applications.
10 Gigabit Ethernet
10 Gigabit Ethernet is basically a faster version of Ethernet. It uses the IEEE 802.3 Ethernet media
access control (MAC) protocol, the IEEE 802.3 Ethernet frame format, and the IEEE 802.3 frame size.
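10 Gigabit Ethernet is full duplex only, just like full-duplex Fast Ethernet and Gigabit Ethernet; because it does not use the collision detection of the original shared-cable Ethernet, it has no inherent collision-related distance limitations, and its reach is determined by the physical medium used. Because 10 Gigabit Ethernet is still Ethernet, it supports all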
intelligent Ethernet-based network services such as multi-protocol label switching (MPLS), Layer 3
switching, quality of service (QoS), caching, server load balancing, security, and policy-based networking. And it minimizes the user’s learning curve by supporting familiar management tools and
architectures. With a data rate of 10 Gb/s, 10 Gigabit Ethernet offers a low-cost solution to the demand
for higher bandwidth in the LAN, MAN, and WAN. The potential applications and markets for 10
Gigabit Ethernet are enormous, including enterprises, universities, telecommunication carriers, and
Internet service providers.
Wireless Ethernet (IEEE 802.11)
The acceptance and practicality of wireless communications between computers, routers, or digital video devices have made them so popular that manufacturers are bringing out better and cheaper devices all the time. After many years of proprietary products and ineffective standards, the industry has finally
decided to back one set of standards for wireless networking: the 802.11 series from the Institute of
Electrical and Electronics Engineers (IEEE). These emerging standards define wireless Ethernet, or
wireless LAN (WLAN), also referred to as Wi-Fi (Wireless Fidelity).
Most products fall into two main categories, with data transfer speeds of 11 Mb/s and 54 Mb/s, and most of them use the 2.4 GHz “free spectrum” range.
Since this is an emerging but very popular technology, we dedicate more space to wireless Ethernet at
the end of this chapter.
Data speed and types of network cabling
By definition, Ethernet is a local area technology and works with networks that traditionally operate
within a single building, connecting devices in close proximity. In the beginning coaxial cable was
used for most Ethernet networks, but twisted pair, first Category 3, then Category 5 and Category 6
became the preferred medium for small LANs.
Ethernet uses a bus or star topology (or a mix of the two) and supports data transfer rates of 10, 100, 1000, or 10,000 Mb/s. After the basic 10 Mb/s data rate Ethernet (often called 10BaseT when twisted pair cable is used), newer and faster standards were developed, notably the 100 Mb/s, also known as Fast Ethernet, the 1000 Mb/s, also known as Gigabit Ethernet, and 10 Gigabit Ethernet, which was near completion during the preparation of this book.
The Ethernet standard has grown to encompass new technologies as computer networking has matured,
but the mechanics of operation for every Ethernet network today stem from Metcalfe’s original design.
The original Ethernet described communication over a single cable shared by all devices on the network. Once a device attached to this cable, it had the ability to communicate with any other attached
device. This allows the network to expand to accommodate new devices without requiring any
modification to those devices already on the network.
The table above indicates the approximate time needed to download a file of a certain size over a variety of data bandwidth media.
The graph on the next page illustrates the data bandwidth for various standard devices and standards.
One of the most common questions in CCTV today is how quick an image update is over a network
or how long a download of certain footage will take.
In order to be able to understand and calculate this, readers should be reminded that there is a difference between bits (marked with the lower case “b”) and Bytes (marked with capital “B”). There
are 8 bits in one Byte. Therefore, when making a rough calculation as to how long it would take
to download a file over a particular data link connection, the Mb/s data transfer rate needs first to be
converted to MB/s by dividing it by 8; also, allowance should be made for traffic collision losses and
noise, which could vary anywhere from 10 to 50%. So in many worst case scenario calculations, 50%
of the data transfer rate should be used.
For example, if we have a dial-up Internet connection with a typical modem of 56 kb/s, the maximum
transfer rate will be around 6 to 7 kB/s, as the best, and around 3 kB/s as the worst case scenario. With
a dial-up PSTN connection, we still use analog modulation techniques, the quality of which can vary
greatly, depending on the line noise, distance and hardware quality, so it is possible that the worst case
scenario can be even lower than 3 kB/s. So, when using 56 kb/s modem, there is no guarantee that the
established connection will be 56 kb/s, but that represents the maximum achievable data transfer rate
when conditions are excellent. Going back to our example, if we have a 1 MB file to download, it would take around 150 seconds at best (1024 kB divided by 7 kB/s ≈ 146 s; at 6 kB/s it would be about 170 s) when a good-quality PSTN dial-up connection is used. For the same file to be transferred over an ADSL Internet connection of 512 kb/s, it would be much faster, but would still take at least 16 seconds (512 kb/s = 64 kB/s; 1024 kB divided by 64 kB/s = 16 s), and it can go up to 32 seconds if equipment and lines are of low quality. This is still much faster than the 2.5 minutes or more needed with a 56 kb/s PSTN modem connection.
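The rough calculation described above can be expressed as a short Python sketch. The function name `download_time_s` and its `efficiency` parameter are our own illustrative choices, not part of any standard; `efficiency=0.5` corresponds to the 50% worst-case allowance suggested in the text, and 1 MB is taken as 1024 kB, as in the example.

```python
def download_time_s(file_kB: float, link_kbps: float, efficiency: float = 1.0) -> float:
    """Rough transfer time in seconds for a file of file_kB kilobytes.

    link_kbps  -- nominal link rate in kilobits per second (lower-case "b")
    efficiency -- fraction of the nominal rate actually achieved; 1.0 is the
                  ideal case, 0.5 the worst-case allowance for collisions/noise
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    link_kBps = link_kbps / 8            # 8 bits per Byte
    return file_kB / (link_kBps * efficiency)


if __name__ == "__main__":
    file_kB = 1024                       # 1 MB, as in the example above
    for name, kbps in [("56 kb/s PSTN modem", 56), ("512 kb/s ADSL", 512)]:
        best = download_time_s(file_kB, kbps)
        worst = download_time_s(file_kB, kbps, efficiency=0.5)
        print(f"{name}: about {best:.0f} s at best, {worst:.0f} s worst case")
```

For the modem this uses 7 kB/s (56 kb/s divided by 8), giving about 146 s in the best case; for the ADSL link it reproduces the 16 s and 32 s figures. The result is only an estimate, since real links rarely hold a constant rate.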
In calculations as illustrated here, consideration should be given to the fact that the speed of download
of a file is as fast as the slowest speed in the chain. This means that if a computer you download from
has a limited upload speed which is much lower than your download speed, then that will define your
download time.
The same principles of data speed calculations apply to a variety of network communication and
storage devices. Each component in a computer and in a network imposes its own limitation. This is a very important consideration especially in the modern CCTV digital system designs where there is an ever growing need for faster transmission, more cameras recorded, and more
pictures per second required.
All components in such a chain of video streaming influence the total performance result. The network
is not always the bottleneck. If, for example, we have a Gigabit Ethernet in place (with matching network cards and network switches or routers), it is quite possible that a computer we have as a video
recorder uses ATA66 hard drives (peaking at 520 Mb/s) which is slower than the network itself, and as
such becomes a bottleneck in playing back multiple cameras on multiple operators’ consoles.
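The “slowest link in the chain” rule can be sketched as picking the minimum throughput among all the elements a video stream passes through. The figures below are the ones from the example above (Gigabit Ethernet versus an ATA66 drive, taken as 66 MB/s, or about 528 Mb/s); the dictionary layout and the function name `bottleneck` are hypothetical.

```python
from __future__ import annotations


def bottleneck(chain_mbps: dict[str, float]) -> tuple[str, float]:
    """Return the slowest element of a transfer chain and its rate in Mb/s."""
    slowest = min(chain_mbps, key=chain_mbps.get)
    return slowest, chain_mbps[slowest]


# Figures from the example in the text.
ATA66_MB_PER_S = 66                               # ATA66 interface peak, MB/s
chain_Mbps = {
    "Gigabit Ethernet": 1000,
    "ATA66 hard drive": ATA66_MB_PER_S * 8,       # about 528 Mb/s
}

if __name__ == "__main__":
    name, rate = bottleneck(chain_Mbps)
    print(f"Slowest element: {name} ({rate} Mb/s)")
```

Other elements, such as the upload speed of a remote site or a drive's sustained (rather than peak) transfer rate, can be added as further entries; the answer is only as good as the figures entered.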
Being aware of the totality of a digital networked system and of each single component making such system is the key to a successful implementation of this new technology we have embraced in CCTV.
Ethernet over coax and UTP cables
Ethernet coaxial cable (RG-58) has 50 ohm impedance, as opposed to the 75 ohm cable used for analog video (RG-59). Since they are very similar in size and use very similar BNC connectors, care should be taken that they are not mixed. When using Ethernet coaxial cables, correct terminations are as important as in analog video, if not more so. If a network is configured in a bus topology using coaxial cable, both ends of such a bus have to be terminated with 50 ohms. Networking with coaxial cables
is also known as unbalanced transmission, as is the case with the analog CCTV video signals, whereas
the unshielded twisted pair (UTP) is known as balanced. Coaxial networking offers longer distances
with no repeaters, but balanced transmission has other important advantages over unbalanced mostly
in eliminating external electromagnetic interferences using the common mode rejection principles (as
in twisted pair video).
Ethernet LAN using coaxial cable
“Balanced” relates to the physical geometry and the dielectric properties of a twisted pair of conductors. If two insulated conductors are physically identical to one another in diameter, concentricity, and dielectric material, and are uniformly twisted with equal length of conductor, then the pair is electrically balanced with respect to its surroundings. The degree of electrical balance depends on the design and manufacturing process. For balanced transmission, an equal voltage of opposite polarity is applied
on each conductor of a pair. The electromagnetic fields created by one conductor cancel out the
electromagnetic fields created by its “balanced” companion conductor, leading to very little radiation
from the balanced twisted pair transmission line. The same concept applies to external noise that is
induced on each conductor of a twisted pair.
A noise signal from an external source, such as radiation from a radio transmitter antenna, generates
an equal voltage of the same polarity, or “common mode voltage,” on each conductor of a pair. The
difference in voltage between conductors of a pair from this radiated signal (the “differential voltage”)
is effectively zero.
Since the desired signal on the pair is the differential signal, the interference practically does not
affect balanced transmission.
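The cancellation of induced noise on a balanced pair can be illustrated numerically. In this simplified Python sketch (an idealized model that ignores attenuation and imperfect balance), the transmitter drives +v on one wire and −v on the other, the same hypothetical noise voltage is added to both wires, and the receiver takes half the difference between them, recovering the original signal.

```python
from __future__ import annotations


def transmit_balanced(signal: list[float], noise: list[float]) -> tuple[list[float], list[float]]:
    """Idealized twisted pair: +v on wire A, -v on wire B.

    External interference induces the same voltage, with the same polarity,
    on both wires (the common-mode voltage).
    """
    wire_a = [v + n for v, n in zip(signal, noise)]
    wire_b = [-v + n for v, n in zip(signal, noise)]
    return wire_a, wire_b


def receive_differential(wire_a: list[float], wire_b: list[float]) -> list[float]:
    """A differential receiver uses only the difference between the wires,
    so the common-mode part cancels out."""
    return [(a - b) / 2 for a, b in zip(wire_a, wire_b)]


if __name__ == "__main__":
    data = [1.0, -1.0, 1.0, 1.0, -1.0]          # transmitted levels
    interference = [0.3, 0.7, -0.5, 0.2, 0.9]   # hypothetical induced noise (V)
    a, b = transmit_balanced(data, interference)
    print("wire A:  ", a)
    print("wire B:  ", b)
    print("received:", [round(x, 6) for x in receive_differential(a, b)])
```

Each wire on its own carries a noisy signal, but the difference between them does not contain the noise term at all, which is exactly the common mode rejection described above.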
The degree of electrical balance is determined by measuring the “differential voltage” and comparing it to the “common mode voltage,” with the ratio expressed in decibels (dB). When good network interface equipment, quality cables, and high-quality termination are used, Cat-5 and Cat-6 cables are easy to prepare and offer good-quality networking. This is why “Cat” networking makes up the majority of LANs today.
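The text does not give the formula for expressing balance in decibels. One common convention, used for example in longitudinal conversion loss measurements, takes 20·log10 of the ratio of the common-mode voltage to the resulting differential voltage; the sketch below assumes that convention, and the voltages in the example are hypothetical.

```python
import math


def balance_db(common_mode_v: float, differential_v: float) -> float:
    """Electrical balance of a pair in dB, assuming the 20*log10 voltage-ratio
    convention: common-mode voltage induced on the pair divided by the
    differential voltage that appears as a result. Higher is better."""
    if common_mode_v <= 0 or differential_v <= 0:
        raise ValueError("voltages must be positive")
    return 20 * math.log10(common_mode_v / differential_v)


if __name__ == "__main__":
    # Hypothetical measurement: 1 V common mode produces 10 mV differential.
    print(f"{balance_db(1.0, 0.010):.1f} dB")
```

Every tenfold reduction in the differential voltage produced by the same interference adds 20 dB to the figure.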
The term Cat refers to the classifications (categories) of UTP (unshielded twisted pair) cables. The difference in classifications of the cables is based mainly on the bandwidth, copper type, size, and electrical performance.
Currently, the most popular are the main Categories of cable – Category 3, 4, 5, 5e, and 6, all of which
are defined by the Electronic Industries Association (EIA) and the Telecommunications Industry Association (TIA) recommendations.
The EIA/TIA defines the following five categories of twisted pair cable:
• Cat-1 – Traditional telephone cable
• Cat-2 – Cable certified for data transmissions up to 4 Mb/s
• Cat-3 – Balanced 100 ohm cable and associated connecting hardware whose transmission
characteristics are specified up to 16 MHz. It is used by 10BaseT and 100BaseT4 installations. Category 3 is the most common type of previously installed cable found in corporate wiring schemes, and
it normally contains four pairs of wire.
• Cat-4 – Balanced 100 ohm cable and associated connecting hardware whose transmission
characteristics are specified up to 20 MHz. It is used by 10BaseT and 100BaseT4 installations. The
cable normally has four pairs of wire. This grade of UTP is not common.
• Cat-5 – Balanced 100 ohm cable and associated connecting hardware whose transmission
characteristics are specified up to 100 MHz. It is used by 10BaseT, 100BaseT4, and 100BaseTX installations.
The Cat-5 10/100 Ethernet cables have 8 wires, of which 4 are used for data. The remaining 4 wires are not used by 10/100 Ethernet; like the data wires, they are twisted in pairs, which gives electrical stability and resistance to electrical interference. The cable
termination (connector) is known as RJ-45 and
resembles a large telephone line connector.
An RJ-45 connector
Electrical signals propagate along a cable very quickly (typically 65% of the speed of light), but even for digital signals, as is the case with analog, the same electrical laws apply: they weaken as they travel, and electrical interference from neighboring electromagnetic devices affects the signal. The effects of voltage drop, combined with the effect of inductance and capacitance for high-frequency signals (high bit rate) and external electromagnetic interference, impose physical limitations on how far a certain cable can carry data before it gets to a repeater (switch or router). A network cable must be short enough that devices at opposite ends can receive each other’s signals clearly and with minimal delay. This places a limitation on the maximum separation between two devices, which is called the network diameter of the Ethernet network.
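To put the propagation speed into numbers, the one-way delay along a cable is simply its length divided by the signal velocity. This sketch assumes the 65% nominal velocity of propagation quoted above; real cables vary, and the manufacturer's data sheet gives the actual figure.

```python
SPEED_OF_LIGHT_M_S = 299_792_458   # in vacuum, m/s
NVP = 0.65                         # nominal velocity of propagation (65% of c), as quoted in the text


def propagation_delay_ns(length_m: float, nvp: float = NVP) -> float:
    """One-way time for a signal to travel length_m meters of cable, in ns."""
    return length_m / (nvp * SPEED_OF_LIGHT_M_S) * 1e9


if __name__ == "__main__":
    for length_m in (10, 100):
        print(f"{length_m:>3} m of cable: {propagation_delay_ns(length_m):.0f} ns")
```

At 65% of the speed of light, a 100 m run adds roughly half a microsecond of one-way delay.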
Limitations apply for other Ethernet media as well, such as wireless or fiber optic, although the maximum distances are different from those of copper.
The most common network cable, Cat-5, uses AWG24 wires (with an approximate cross-section of 0.2 mm2, which corresponds to a diameter of about 0.5 mm) and has 100 ohm impedance. Readers are reminded that AWG
(American Wire Gauge) is a system that specifies wire
size. The gauge varies inversely with the wire diameter
size, which defines the electrical resistance (the smaller
the AWG number, the larger the conductor diameter, the
smaller the resistance).
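The inverse relation between gauge number, diameter, and resistance can be checked with the standard AWG formula, d = 0.127 × 92^((36 − n)/39) mm. In the sketch below the copper resistivity (about 1.72 × 10⁻⁸ Ω·m, annealed copper at 20 °C) is an assumed value, used only to estimate conductor resistance.

```python
import math

# Assumed resistivity of annealed copper at 20 °C, in ohm-meters.
COPPER_RESISTIVITY_OHM_M = 1.72e-8


def awg_diameter_mm(gauge: int) -> float:
    """Conductor diameter from the AWG formula d = 0.127 * 92**((36 - n) / 39) mm."""
    return 0.127 * 92 ** ((36 - gauge) / 39)


def awg_area_mm2(gauge: int) -> float:
    """Cross-sectional area of a round conductor, in mm2."""
    return math.pi / 4 * awg_diameter_mm(gauge) ** 2


def conductor_resistance_ohm(gauge: int, length_m: float) -> float:
    """DC resistance of a single conductor: R = rho * L / A."""
    area_m2 = awg_area_mm2(gauge) * 1e-6
    return COPPER_RESISTIVITY_OHM_M * length_m / area_m2


if __name__ == "__main__":
    for gauge in (22, 24, 26):
        print(f"AWG{gauge}: d = {awg_diameter_mm(gauge):.3f} mm, "
              f"A = {awg_area_mm2(gauge):.3f} mm2, "
              f"R = {conductor_resistance_ohm(gauge, 100):.1f} ohm per 100 m")
```

For AWG24 this gives a diameter of about 0.51 mm and a cross-section of about 0.2 mm2, matching the figure quoted for Cat-5, and roughly 8.4 ohms per 100 m of a single conductor.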
Twisted pair cable comes in two main varieties, solid and
stranded. Solid cable supports longer runs and works
best in fixed wiring configurations like office buildings.
Stranded cable, on the other hand, is more pliable and
better suited for shorter-distance, movable cabling such
as “patch” cables.
A variation on Cat-5, called Cat-5e, is an even better performing network cable. It was ratified in 1999, formally as ANSI/TIA/EIA 568A-5, or simply Category 5e (the e stands for enhanced). Cat-5e is also 100 ohm impedance cable and is completely backward compatible with Cat-5 equipment. The enhanced electrical performance of Cat-5e ensures that the cable will support applications that require additional bandwidth, such as Gigabit Ethernet or analog video (if used in twisted pair video transmission).
A typical RJ-45 pin layout as per the EIA T-568 standard (view from the contacts side)
Cat-5e has an incremental improvement designed to enable cabling to support full-duplex Fast Ethernet
operation and Gigabit Ethernet. The main differences between Cat-5 and Cat-5e can be found in the
specifications where performance requirements have been raised slightly.
While Cat-5 components may function to some degree in a Gigabit Ethernet (at shorter distances), they
perform below standard during high data transfer scenarios. Cat-5e cables work better with gigabit
speed products. So, when using a 1000 Mb/s switch it is better to get Cat-5e cable instead of Cat-5.
The next level in the cabling hierarchy is Category 6 (ANSI/TIA/EIA-568-B.2-1), which was ratified
by the EIA/TIA in June 2002. Cat-6 provides higher performance than Cat-5e and features more
stringent specifications for crosstalk and system noise.
Also built to have 100 ohm impedance, Cat-6 cable requires a greater degree of precision in the manufacturing process compared to Cat-5. Similarly, a Cat-6 connector requires a more balanced circuit design.
This table shows typical Category Unshielded Twisted Pair specifications.
The quality of the data transmission depends upon the performance of the components of the channel.
So to transmit according to Cat-6 specifications, connectors, patch cables, patch panels, cross-connects, and cabling must all meet Cat-6 standards. The channel basically includes everything from the
wall plate to the wiring closet. The Cat-6 components are tested both individually and together for
performance. In addition, the standard calls for generic system performance so that Cat-6 components
from any vendor can be used in the channel. Cat-6 channel transmission requirements should result in a
Power-Sum Attenuation-to-Crosstalk Ratio (PS-ACR) that is greater than or equal to zero at 200 MHz.
In addition, all Cat-6 components must be backward compatible with Cat-5e, Cat-5, and Category 3.
If different category components are used with Cat-6 components, then the channel will achieve the
transmission performance of the lower category. For instance, if Cat-6 cable is used with Cat-5e connectors, the channel will perform at a Cat-5e level.
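The two channel rules above, that a mixed channel performs at the level of its lowest-category component and that the PS-ACR (power-sum crosstalk loss minus attenuation, in dB) should be at least zero at 200 MHz for Cat-6, can be sketched in Python. The component list, the dB readings, and the function names are hypothetical, not values taken from the standard.

```python
from __future__ import annotations

# Ordering of the categories discussed in the text, lowest to highest.
CATEGORY_ORDER = ["Cat-3", "Cat-5", "Cat-5e", "Cat-6"]


def channel_category(components: dict[str, str]) -> str:
    """The channel performs at the level of its lowest-category component."""
    return min(components.values(), key=CATEGORY_ORDER.index)


def ps_acr_db(psnext_db: float, attenuation_db: float) -> float:
    """Power-sum attenuation-to-crosstalk ratio in dB: PSNEXT loss minus attenuation."""
    return psnext_db - attenuation_db


if __name__ == "__main__":
    channel = {  # hypothetical installation
        "horizontal cable": "Cat-6",
        "patch panel": "Cat-6",
        "wall-plate connectors": "Cat-5e",
        "patch cords": "Cat-6",
    }
    print("Channel performs as:", channel_category(channel))

    # Hypothetical tester readings at 200 MHz (not limits from the standard).
    margin = ps_acr_db(psnext_db=33.0, attenuation_db=31.5)
    print(f"PS-ACR at 200 MHz: {margin:+.1f} dB ->", "pass" if margin >= 0 else "fail")
```

A positive PS-ACR means the received signal is still stronger than the combined crosstalk from the other pairs; in the example, the single Cat-5e connector pulls the whole channel down to Cat-5e level.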
Cat-6 cable contains four pairs of copper wire and, unlike Cat-5, utilizes all four pairs; the communication speed it supports is more than twice the speed of Cat-5e. As with all other types of twisted pair
EIA/TIA cabling, Cat-6 cable runs are limited to a maximum recommended run length of 100 m
(approximately 328 ft).
Because of its improved transmission performance
and superior immunity from external noise, systems operating over Category 6 cabling will have
fewer errors than Category 5e for current applications. This means fewer re-transmissions of lost or
corrupted data packets under certain conditions,
which translates into higher reliability.
The “fastest” copper cable currently covered by
the EIA/TIA standards is the Category 7, targeting
10 Gigabit networks, which is still under development.
Cat-7 is supposed to be fully compatible with previous standards. The Cat-7 is no longer unshielded.
The specification requirements are so high that
each pair has to be shielded, and in addition all
four pairs have to be shielded again, making the
Cat-7 the most expensive Cat cable. Also, Cat-7
no longer uses RJ-45 connectors. Many users will
argue that fiber optics is a better choice once you
see a need for such a high-performance network
cable, so we will leave this categorization for further reading elsewhere in more up-to-date books
and manuals, but readers should be aware that new
cable categories have been developed.
Cat-7 cable description
Patch and crossover cables
Two kinds of wiring schemes are available for Ethernet cables: patch and crossover cables.
Patch cables (sometimes referred to as straight cables) are used for connecting computers to hubs or switches.
The crossover cables are normally used to connect two PCs without the use of a hub, or can be used to cascade two hubs without using an uplink port.
A crossover cable is a segment of cable that crosses over pins 1&2 with pins 3&6, which are the Tx and Rx pins, so that two computers can exchange information. If a cable does not say crossover, it is a standard patch cable.
If you are not sure what type of cable you have, you can put the
two RJ-45 connectors next to each other from the same side (as
shown on the photos here) and if the wiring colors are in identical order from left to right, then it is a patch cable. If pins 1&2
have reversed color wires order, then it is a crossover. A good
practice is to always have the crossover cable color different from
the color of the majority of the patch cables used – for example, a
yellow crossover cable amongst blue-colored patch cables.
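The visual check described above can also be expressed in code. The sketch assumes the standard T568A and T568B color orders (pins 1 to 8) and a 10/100 crossover, in which pin 1 swaps with pin 3 and pin 2 with pin 6; Gigabit crossover cables, which also swap the remaining pairs, are not covered. The function names are hypothetical.

```python
from __future__ import annotations

# Wire colors for pins 1..8 under the two EIA/TIA-568 termination schemes.
T568A = ["white/green", "green", "white/orange", "blue",
         "white/blue", "orange", "white/brown", "brown"]
T568B = ["white/orange", "orange", "white/green", "blue",
         "white/blue", "green", "white/brown", "brown"]

# 10/100 crossover: pin 1 <-> pin 3 and pin 2 <-> pin 6 (0-based indices).
CROSSOVER_SWAPS = {0: 2, 2: 0, 1: 5, 5: 1}


def crossed(colors: list[str]) -> list[str]:
    """Color order seen at the far end of a 10/100 crossover cable."""
    return [colors[CROSSOVER_SWAPS.get(pin, pin)] for pin in range(8)]


def classify(end1: list[str], end2: list[str]) -> str:
    """Compare the color order of both RJ-45 plugs, viewed from the same side."""
    if end1 == end2:
        return "patch"
    if crossed(end1) == end2:
        return "crossover"
    return "unknown wiring"


if __name__ == "__main__":
    print(classify(T568B, T568B))  # both ends identical
    print(classify(T568B, T568A))  # one end T568B, the other T568A
```

In practice a 10/100 crossover cable is simply T568A at one end and T568B at the other, which is why swapping the two pairs turns one color order into the other.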
Stranded cable, as opposed to solid core, has several small gauge
wires in each separate insulation sleeve. Stranded cable is more
flexible, making it more suitable for patch cords. When using patch
cables, the recommended maximum lengths are around 10 m
(33 ft). This construction is great for the flexing and the frequent
changes that occur at the wall outlet or patch panel. The stranded
conductors do not transmit data signals as far as solid cable.
The EIA/TIA 568A standard limits the length of patch cables to 10
meters in total length. It does not mean you cannot use stranded cable
for longer runs; it is just not recommended. Some installations have
stranded cable running over 30 meters with no problems, but care
should be taken not to use stranded cable in larger installations.
Solid copper cable has one larger gauge wire in each sleeve. Solid
cable has better electrical performance than stranded cable and is
traditionally used for inside walls and through ceilings, or any type
of longer run of cable. All such Category network cables (using
solid core) are specified for a maximum length of around 100
m (328 ft) before a repeater is needed.
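The length limits above can be turned into a quick planning check. This is a rough sketch and no substitute for certification testing. It assumes the usual TIA-568 channel split: at most 90 m of solid horizontal cable plus at most 10 m of patch cords in total, giving 100 m overall. The function name `check_channel` is hypothetical.

```python
MAX_CHANNEL_M = 100.0                      # whole copper channel (about 328 ft)
MAX_PATCH_M = 10.0                         # all patch/equipment cords combined
MAX_SOLID_M = MAX_CHANNEL_M - MAX_PATCH_M  # 90 m solid-core horizontal run


def check_channel(solid_run_m, patch_cords_m):
    """Return a list of warnings for one channel; an empty list means OK."""
    warnings = []
    patch_total = sum(patch_cords_m)
    if patch_total > MAX_PATCH_M:
        warnings.append(f"patch cords {patch_total:.1f} m > {MAX_PATCH_M:.0f} m")
    if solid_run_m > MAX_SOLID_M:
        warnings.append(f"solid run {solid_run_m:.1f} m > {MAX_SOLID_M:.0f} m")
    total = solid_run_m + patch_total
    if total > MAX_CHANNEL_M:
        warnings.append(f"channel {total:.1f} m > {MAX_CHANNEL_M:.0f} m")
    return warnings


print(check_channel(85, [3, 5]))   # []
print(check_channel(95, [3, 5]))   # solid run and total both too long
```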
Patch and crossover
This is not to say that longer distances are not possible, but they depend very much on the cable quality and the intended network bandwidth. For example, if Cat-6 cable is used for up to 100 Mb/s, distances longer than 100 m may be achievable, since Cat-6 is designed to stringent specifications that target Gigabit network speeds. How much farther the cable can be run without a repeater (or a switch or router) can only be proven by testing.
A variety of tools, from inexpensive to costly, are available to verify the quality of patch or crossover cables, and every network cable installer should have at least a basic one.
One of the main sources of problems for any copper cabling, including the “Cat” types of cable, is electromagnetic interference (EMI). EMI is potentially harmful to your communications system because it can lead to signal loss and degrade the overall performance of high-speed Cat cabling. EMI in signal transmission or reception is caused by the electric or magnetic fields radiated by nearby power cables, heavy electric machinery, or fluorescent lights. This is unfortunately the nature of electrical current flowing through copper cable, and it is the basis of the interdependence of electricity and magnetism. We say “unfortunately” in this case, when discussing unwanted interference to signal cables, but in fact the same principle is used for generating electric power and driving electric motors, in which case the effect (read it as electromagnetic induction in such a case) is highly desirable.
LAN cable tester
courtesy of ABM Communications
A variety of RJ-45 crimping tools are available.
Avoiding EMI is as simple as not laying the network cable within 30 cm (1 ft) of electrical cable or, where that cannot be avoided, switching from UTP to more expensive shielded cable. These basic rules should be applied at all times.
The only time EMI is not an issue is when using fiber cables. This is simply because fiber does not conduct electricity but uses light as its transmission medium. Longer-distance and wider-bandwidth communications are usually achieved with fiber cables, because they offer not only longer distances (a couple of kilometers) but also much wider bandwidth. Most importantly, they are not subject to EMI.
Fiber optics network cabling
As was the case in analog video transmission, fiber optics has some significant technological advantages over copper.
Fiber optics can carry wider-bandwidth data over longer distances than copper.
This means less equipment and infrastructure (such as switches and wiring cabinets) is needed, thereby lowering the overall cost of the LAN. Fiber optic cable is physically much thinner and more durable than copper, taking up less space in cabling ducts and allowing a greater number of cables to be pulled through the same duct. New developments in fiber optics cabling also allow it to be tied in a knot and still function normally. As described in the section on analog video transmission over fiber in Chapter 10, fiber optics completely encloses the light pulses within the outer sheath, making it impervious to outside interference and very resistant to eavesdropping.
Another very important property of fiber optics is its immunity to any electromagnetic interference, including lightning induction. You can submerge it in water, and it is less susceptible to temperature fluctuations than copper. All these qualities make fiber optics cable the ultimate choice.
Fiber provides higher bandwidth (approximately 50 Gb/s, that is, 50 gigabits per second, over multi-mode and even higher over single-mode fiber), and it “future-proofs” a network’s cabling architecture in a way copper cannot.
Various types of fiber cables
Although users currently do not require speeds faster than Fast Ethernet in small to medium-size CCTV projects, the cost differential between copper and fiber optics will become less and less significant, making fiber optics a compelling option for a system of any size.
Fiber optics infrastructures are still more expensive than copper. Fiber optics switch ports and adapter cards cost, on average, approximately 50% more than comparable copper products. However, when you factor in the cost savings associated with fiber (such as the need for fewer repeaters and switches, and the wider bandwidth), the overall cost of a fiber optics system drops to a level comparable with a copper-based infrastructure. When you eliminate the expense of creating and maintaining extra wiring cabinets, a fiber optics LAN costs about the same as, or even less than, a copper LAN. In the past, fiber optics’ lofty price had little to do with the medium itself; most of the expense lay in transceivers and connectors. Thanks to new products in each of these areas, costs have been decreasing, pushing fiber optics use upward.
ST connectors
SC connectors
MT-RJ connectors
The maximum distance achievable with a single run of fiber depends on the type of fiber (multi-mode or single-mode) as well as on the transmitting and receiving equipment. Accurate distances can only be confirmed by testing an installation with an OTDR (Optical Time Domain Reflectometer), which naturally takes into account the quality of terminations, cable, and equipment.
Courtesy of Signamax
Various types of fiber network interface cards
Courtesy of 3M
Various types of fiber network connectors
A general rule of thumb is that multi-mode will go up
to 2 km, and single-mode usually over 20 km without
the need for repeaters.
Because of the high bandwidth and long distances it can handle, fiber optics is most often used as a network backbone, where multiple network segments are connected into a larger network, as is typical of the digital CCTV systems of casinos and large shopping centers. In such a system design, fiber-to-copper media converters are used. Many different makes and models are available on the market; they can be standalone units, or multiple converters can be housed in 19” racks.
It is worth highlighting again the importance of proper tools for testing and fault-finding fiber networks. If you consider fiber networks a serious part of your CCTV business, investing in good-quality instruments and tools is always wise. If, however, this is beyond your budget, you can always hire specialized fiber optics businesses that can perform most of the tasks on your behalf. If the fiber cable is already installed, they would typically charge per fiber connection termination, which would include an OTDR report.
Courtesy of Fluke
With proper tools everything is known – OTDR for networks.
We have already explained and described a few fiber cable termination methods in this book, under analog video transmission media. Fiber termination technology keeps getting better and easier to use, and details can be obtained from the manufacturers, so we will not go into them here. Instead, we will concentrate on Ethernet basics and components.
Typical media converters (fiber to copper)
Courtesy of Signamax
Network concepts and components
Ethernet networking follows a simple set of rules and components that govern its basic operation.
Ethernet uses the CSMA/CD access method to handle simultaneous demands, and it is one of the most widely implemented LAN standards. The acronym CSMA/CD signifies carrier-sense multiple access with collision detection and describes how the Ethernet protocol regulates communication among nodes. Although the term may seem intimidating, if we break it apart into its component concepts we will see that it describes rules very similar to those people use in polite conversation. If one person talks at the dinner table, for example, the others listen until he or she stops talking. When, in a moment of silence, somebody decides to say something, the rest of the listeners again wait until the second person finishes talking. If two or more people start to talk simultaneously during a pause, a collision occurs. In networking, this is equivalent to a data collision between two computers. The CSMA/CD protocol states that in such cases both computers stop transmitting briefly and wait a random time before they start talking again. Whoever has the shorter random wait becomes the first “speaker” of the two, and the other waits until he or she finishes. The random time gives all participants (computer stations) an equal chance in the conversation (data exchange) at the dinner table (Ethernet network) to have their say.
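In IEEE 802.3 Ethernet, the random wait in the dinner-table analogy is the truncated binary exponential backoff. After the n-th collision of a frame, a station waits a random whole number of slot times between 0 and 2^min(n, 10) - 1, and it gives up after 16 attempts. One slot is 512 bit times. Full-duplex links through switches do not need this mechanism. The sketch below shows the rule; the function names, including the two-station `dinner_table` simulation, are hypothetical.

```python
import random

SLOT_TIME_BITS = 512   # one slot time = 512 bit times (51.2 us at 10 Mb/s)
BACKOFF_LIMIT = 10     # the random range stops growing after 10 collisions
ATTEMPT_LIMIT = 16     # after 16 collisions the frame is discarded


def backoff_slots(collisions, rng=random):
    """Number of slot times to wait after the n-th collision of a frame."""
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: frame discarded")
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randint(0, 2 ** k - 1)        # uniform over 0 .. 2^k - 1


def backoff_microseconds(collisions, bit_rate=10_000_000, rng=random):
    """Convert the chosen number of slots into a waiting time."""
    return backoff_slots(collisions, rng) * SLOT_TIME_BITS / bit_rate * 1e6


def dinner_table(rng=random):
    """Two stations collide and back off until their waits differ.
    Returns the station that 'speaks' first and how many collisions it took."""
    collisions = 1
    while True:
        wait_a = backoff_slots(collisions, rng)
        wait_b = backoff_slots(collisions, rng)
        if wait_a != wait_b:
            return ("A" if wait_a < wait_b else "B"), collisions
        collisions += 1                      # same slot chosen: collide again


print(dinner_table(random.Random(42)))
```

The range doubles after each collision, so the more crowded the network, the more widely the stations spread their retries.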
To better understand these rules and components, it is important to understand the basic terminology, so
here we are going to introduce the most common ones, with a short description of what they mean.
This book is intended for the CCTV industry, and as such the Ethernet basics are somewhat condensed.
Readers interested in more details are referred to more extensive books dedicated to networking, such
as Internetworking Technologies Handbook, published by Cisco Systems.
• Network – A network is a group of computers connected together in a way that allows information to be exchanged between the computers.
• Local Area Network (LAN) – A LAN is a network of computers that are in the same general
physical location, usually within a building or a campus. If the computers are far apart (such as
across town or in different cities), then a Wide Area Network (WAN) is typically used.
• OSI layers – The Open Systems Interconnection (OSI) reference model, introduced by ISO, defines seven layers of networking.
• Node – A node is anything that is connected to the network. Although a node is typically a
computer, it can also be something like a printer or a DVR.
• Segment – A segment is any portion of a network that is separated from other parts of the
network by a switch, bridge, or router.
• Backbone – The backbone is the main cabling of a network to which all of the segments are connected. Typically, the backbone is capable of carrying more information than the individual segments. For example, each segment may have a transfer rate of 10 Mb/s (megabits per second), whereas the backbone may operate at 100 Mb/s.
• Repeater – A repeater is a network device used to extend and interconnect network segments, allowing for longer distances. Repeaters receive signals from one network segment and amplify, re-time, and re-transmit those signals to another segment. They are very similar to the in-line amplifiers we have in analog CCTV. There are limits to how many repeaters can be used one after another. Repeaters cannot perform the complex filtering or routing that the other devices listed below can.
• Hub – The hub connects multiple computers and devices into a LAN. Hubs work at the physical layer (layer 1) of the OSI model (explained further in this chapter) and connect each computer via a dedicated cable. Hubs do not perform any “intelligent” data packet switching or routing; thus, hubs with many ports will cause more data collisions and losses. Hubs basically create physical star networks, and in some respects they can be considered repeaters.
• Bridge – A bridge is a more “intelligent” data communication device that connects and enables data packet forwarding between homogeneous networks. Bridges support store-and-forward traffic switching. Bridging occurs at layer 2 of the OSI model (explained further in this chapter).
Network basic terminology illustrated
• Switch – A network switch is another “intelligent” data communication device. It is the successor to the network bridge and is now more common. While bridges have only a few ports, switches handle many more. Switches also reduce data collisions in the network segments they connect, and they provide dedicated bandwidth to each network segment.
• Router – Routers are specialized computers that send messages to their destinations along thousands of pathways. They have even higher “intelligence” than switches, as they are the crucial devices that let messages flow between networks, rather than within networks. Routing is often compared with, and mistaken for, bridging. The main difference is that routing is “more intelligent”, as it relies on the router knowing and learning the shortest path for delivering specific information from source to destination. Routing occurs at layer 3 of the OSI model.
• Network Interface Card (NIC) – Every computer (and most other devices) is connected to a
network through an NIC. In most computers, this is an Ethernet card (normally 10 or 100 Mb/s)
that is plugged into a PCI slot on the computer’s motherboard.
• Media Access Control (MAC) address – This is the physical address of any device – such
as the NIC in a computer – on the network. The MAC address, which is made up of two equal
parts, is 6 bytes long. The first 3 bytes identify the company that made the NIC; the second 3
are the serial number of the NIC itself.
• Unicasting – A unicast is a transmission from one node addressed specifically to another node.
• Multicasting – In a multicast, a node sends a packet addressed to a special group address. Devices that are interested in this group register to receive packets addressed to the group. An example might be a router sending out an update to all of the other routers.
Network switches
• Broadcasting – In a broadcast, a node sends out a packet that is intended for transmission to
all other nodes on the network.
• Data Frames – Frames are analogous to sentences in human language. In English, we have
rules for constructing our sentences: We know that each sentence must contain a subject and
a predicate. The Ethernet protocol specifies a set of rules for constructing frames. There are
explicit minimum and maximum lengths for frames, and a set of required pieces of information
that must appear in the frame. Each frame must include, for example, both a destination address
and a source address, which identify the recipient and the sender of the message.
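The MAC address and frame definitions above fit together as follows. The sketch splits a MAC address into its manufacturer part and classifies a destination as unicast, multicast, or broadcast, using the real Ethernet rule that the lowest bit of the first byte marks a group address. It then builds a frame with the minimum and maximum sizes applied. It is simplified: the preamble and start delimiter sent on the wire are omitted. The frame check sequence is computed as the standard CRC-32, which is how it is commonly done in software. The helper names and the example address are hypothetical.

```python
import struct
import zlib

BROADCAST = b"\xff" * 6
MIN_PAYLOAD, MAX_PAYLOAD = 46, 1500      # gives 64..1518-byte frames


def parse_mac(text):
    """'00:1A:2B:3C:4D:5E' (or with dashes) -> 6 raw bytes."""
    raw = bytes.fromhex(text.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("a MAC address is 6 bytes long")
    return raw


def manufacturer_id(mac):
    """First 3 bytes: the identifier assigned to the NIC manufacturer."""
    return mac[:3].hex(":")


def address_type(mac):
    """Classify a destination address as unicast, multicast or broadcast."""
    if mac == BROADCAST:
        return "broadcast"
    # The lowest bit of the first byte marks a group (multicast) address.
    return "multicast" if mac[0] & 0x01 else "unicast"


def build_frame(dst, src, ethertype, payload):
    """Destination, source, type, padded payload, then a CRC-32 check."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too long for a single frame")
    payload = payload.ljust(MIN_PAYLOAD, b"\x00")    # pad to the minimum size
    body = dst + src + struct.pack("!H", ethertype) + payload
    fcs = struct.pack("<I", zlib.crc32(body))        # frame check sequence
    return body + fcs


cam = parse_mac("00:1A:2B:3C:4D:5E")
frame = build_frame(BROADCAST, cam, 0x0800, b"hello")
print(manufacturer_id(cam), address_type(frame[:6]), len(frame))
# prints: 00:1a:2b broadcast 64
```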
Networking software
The Internet protocols
In order for various computers to talk to each other via any network, there must be a common language
of understanding, a common protocol. In networking, the term protocol refers to a set of rules that
govern communications. Protocols are to computers what language is to humans. Since this book is in
English, to understand it you must be able to read English. Similarly, for two devices on a network to
communicate successfully, they must both understand the same protocols.
Many protocols are used in today’s world of the Internet, spanning several layers of the OSI model (described later in this chapter). This book would not be complete without listing the most common ones and describing them briefly.
TCP/IP – Transmission Control Protocol / Internet Protocol
Two of the most popular protocols used in the Internet today, which together give their name to the whole TCP/IP protocol suite. They were introduced in the mid-1970s by Stanford University and Bolt Beranek and Newman (BBN) after funding by DARPA (Defense Advanced Research Projects Agency), and first became widely available with the Berkeley Software Distribution (BSD) Unix.
TCP is reliable; that is, packets are guaranteed to wind up at their target, in the correct order.
IP is the underlying protocol for all the other protocols in the TCP/IP protocol suite. IP defines the means to identify and reach a target computer on the network. Computers in the IP world are identified by unique numbers, which are known as IP addresses (explained further in this chapter).
PPP – Point-to-Point Protocol
A protocol for creating a TCP/IP connection over both synchronous and asynchronous systems. PPP provides connections for host to network or between two routers. It also has a security mechanism. PPP is well known as a protocol for connections over regular telephone lines using modems on both ends. This protocol is widely used for connecting personal computers to the Internet.
SLIP – Serial Line Internet Protocol
A point-to-point protocol used over a serial connection, and a predecessor of PPP. There is also an advanced version of this protocol known as CSLIP (Compressed Serial Line Internet Protocol), which reduces overhead on a SLIP connection by sending compressed header information when possible, thus increasing packet throughput.
FTP – File Transfer Protocol
A protocol that enables the transfer of text and binary files over a TCP connection. FTP transfers files subject to a strict mechanism of ownership and access restrictions. It is one of the most commonly used protocols over the Internet today.
Telnet
A terminal emulation protocol, defined in RFC 854, for use over a TCP connection. It enables users to log in to remote hosts and use their resources from the local host.
SMTP – Simple Mail Transfer Protocol
A protocol dedicated to sending e-mail messages originating on a local host over a TCP connection to a remote server. SMTP defines a set of rules that allows two programs to send and receive mail over the network. The protocol defines the data structure that is delivered with information regarding the sender, the recipient (or several recipients), and, of course, the mail’s body.
HTTP – Hypertext Transfer Protocol
A protocol used to transfer hypertext pages across the World Wide Web.
SNMP – Simple Network Management Protocol
A simple protocol that defines messages related to network management. Through the use of SNMP, network devices such as routers can be monitored and configured from any authorized host on the LAN.
UDP – User Datagram Protocol
A simple protocol that transfers packets of data to a remote computer. UDP does not guarantee
that packets will be received in the same order they were sent. In fact, it does not guarantee
delivery at all.
ARP – Address Resolution Protocol
In order to map an IP address into a hardware address the computer uses the ARP protocol which
broadcasts a request message that contains an IP address, to which the target computer replies
with both the original IP address and the hardware address.
NNTP – Network News Transfer Protocol
A protocol used to carry USENET postings between news clients and USENET servers.
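The ARP exchange described above can be shown by packing the 28-byte ARP message used for IPv4 over Ethernet. This is a sketch of the message contents only. In practice, the request travels inside an Ethernet frame sent to the broadcast address, and the operating system handles all of this itself. The function names and addresses are hypothetical.

```python
import socket
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"   # 28 bytes for IPv4 over Ethernet
REQUEST, REPLY = 1, 2


def arp_request(my_mac, my_ip, target_ip):
    """'Who has target_ip? Tell my_ip.' The target MAC is left as zeros."""
    return struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, REQUEST,
                       my_mac, socket.inet_aton(my_ip),
                       b"\x00" * 6, socket.inet_aton(target_ip))


def arp_reply(request, my_mac):
    """Answer a request aimed at us: echo the IP and add our hardware address."""
    _, _, _, _, _, sha, spa, _, tpa = struct.unpack(ARP_FORMAT, request)
    return struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, REPLY,
                       my_mac, tpa,       # sender is now the answering host
                       sha, spa)          # target is the original asker


def parse_arp(packet):
    _, _, _, _, op, sha, spa, _, tpa = struct.unpack(ARP_FORMAT, packet)
    return {"op": op, "sender_mac": sha.hex(":"),
            "sender_ip": socket.inet_ntoa(spa),
            "target_ip": socket.inet_ntoa(tpa)}


req = arp_request(bytes.fromhex("001a2b3c4d5e"), "192.168.1.10", "192.168.1.20")
rep = arp_reply(req, bytes.fromhex("00112233445a"))
print(parse_arp(rep))
```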
The OSI seven-layer model of networking
The basics of networking revolve around understanding the so-called seven-layer OSI model.
The model was proposed by the ISO (International Organization for Standardization) in 1984. The OSI acronym could be read as ISO backwards, but it actually stands for the Open Systems Interconnection reference model.
The OSI model describes how information from a software application in one computer moves
through a network medium to a software application in another computer. The OSI model is
considered the primary architectural model for intercomputer communications.
The idea behind such a model is to simplify the task of moving information between networked computers and make it manageable. A task, or group of tasks, is then assigned to each of the seven OSI layers.
Each layer is reasonably self-contained, so that the tasks assigned to each layer can be implemented independently.
OSI has two major components:
• An abstract model of networking (the Basic Reference Model, or seven-layer model)
• A set of concrete protocols
Parts of OSI have influenced Internet protocol development, but none more than the abstract model itself, documented in ISO 7498 and its various addenda. In this model, a networking system is divided
into layers. Within each layer, one or more entities implement its functionality. Each entity interacts
directly only with the layer immediately beneath it, and provides facilities for use by the layer above
it. Protocols enable an entity in one host to interact with a corresponding entity at the same layer in a
remote host.
The seven layers of the OSI Basic Reference Model are (from top to bottom):
Layer 7 – Application
Layer 6 – Presentation
Layer 5 – Session
Layer 4 – Transport
Layer 3 – Network
Layer 2 – Data link
Layer 1 – Physical
Many prefer to list the seven layers starting from layer 1 up to layer 7, but it does not really matter, as long as they are remembered as the basic building blocks of the whole networking technology.
A handy way to remember the layers is the sentence “All people seem to need data processing”. The first letter of each word corresponds to the first letter of a layer, starting from layer 7 and going down to layer 1.
The seven layers can be grouped into two main groups: upper layers and lower layers.
The upper layers of the OSI model deal with application issues and generally are implemented in software only. Layer 7 is the closest to the computer user, as it represents the software application passing the information to the user. Basically, both the user and the Application layer processes interact with software applications that contain a communication component.
As we go down through the layers, we get closer to the physical medium. So, the lower layers of the OSI model are closer to the hardware (although they do not exclude software) and handle the data transport issues. The lowest layers are closest to the physical medium, that is, network cards and network cables, and they are responsible for actually placing information on the network medium.
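The downward and upward flow through the layers can be pictured as wrapping and unwrapping. The toy sketch below has each layer from 7 down to 2 prepend a symbolic header on the way down. The peer layers strip them in reverse order on the way up. It also checks the memory sentence. The bracketed headers are purely illustrative and are not real protocol formats. Many real protocol stacks, TCP/IP included, do not add a separate header at every layer.

```python
# Layer names from layer 7 (top) down to layer 1 (bottom).
LAYERS = ["Application", "Presentation", "Session", "Transport",
          "Network", "Data link", "Physical"]

# First letters spell "All People Seem To Need Data Processing".
MNEMONIC = "".join(name[0] for name in LAYERS)    # "APSTNDP"


def send(data):
    """Going down: layers 7..2 each prepend their own (symbolic) header;
    layer 1 would then put the result on the wire as raw bits."""
    for number, name in zip(range(7, 1, -1), LAYERS):
        data = f"[L{number} {name}]" + data
    return data


def receive(frame):
    """Going up: layers 2..7 each strip the header added by their peer."""
    for number, name in zip(range(2, 8), reversed(LAYERS[:-1])):
        header = f"[L{number} {name}]"
        if not frame.startswith(header):
            raise ValueError(f"layer {number} header missing")
        frame = frame[len(header):]
    return frame


wire = send("camera 3 stream")
print(wire)
print(receive(wire))
```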
Let us now explain the meaning of each layer, starting from the lowest one.
1. The Physical layer
The Physical layer describes the physical properties of the various communications media, as well as
the electrical properties and interpretation of the exchanged signals. For example, this layer defines the
size of Ethernet cable, the type of connectors used, and the termination method.
The Physical layer is concerned with transmitting raw bits over a communication channel. The design
issues have to do with making sure that when one side sends a 1 bit, it is received by the other side as a
1 bit, not as a 0 bit. Typical questions here are how many volts should be used to represent a 1 and how
many for a 0, how many microseconds a bit lasts, whether transmission may proceed simultaneously
in both directions, how the initial connection is established, how it is torn down when both sides are
finished, how many pins the network connector has, and what each pin is used for. The design issues
here deal largely with mechanical, electrical, and procedural interfaces and the physical transmission
medium, which lies below the Physical layer. Physical layer design can properly be considered to be
within the electrical engineer’s domain.
2. The Data Link layer
The Data Link layer describes the logical organization of data bits transmitted on a particular medium.
This layer defines the framing, addressing, and check-summing of Ethernet packets. The main task of the Data Link layer is to transform a raw transmission facility into a line that appears free of transmission errors to the Network layer. It accomplishes this task by having the sender break the input data up into data frames (typically, a few hundred bytes), transmit the frames sequentially, and process the acknowledgment frames sent back by the receiver. Since the Physical layer merely accepts and transmits a stream of bits without any regard to meaning or structure, it is up to the Data Link layer to create and recognize frame boundaries. This can be accomplished by attaching special bit patterns to the beginning and end of the frame. If there is a chance that these bit patterns might occur in the data, special care must be taken to avoid confusion. The Data Link layer should provide error control between adjacent nodes.
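One classic way of taking the "special care" mentioned above is bit stuffing, used by HDLC-style links. Each frame starts and ends with the flag pattern 01111110. Inside the frame, the sender inserts a 0 after every run of five 1s, so the flag can never appear in the data, and the receiver removes those extra 0s again. Ethernet itself delimits frames differently, so treat this as an illustration of the general idea. The function names are hypothetical.

```python
FLAG = "01111110"   # start/end-of-frame marker (six 1s in a row)


def stuff(bits):
    """After five consecutive 1s, insert a 0 so the data never shows six."""
    out, ones = [], 0
    for bit in bits:
        out.append(bit)
        ones = ones + 1 if bit == "1" else 0
        if ones == 5:
            out.append("0")       # stuffed bit
            ones = 0
    return "".join(out)


def unstuff(bits):
    """Remove the 0 that follows every run of five 1s."""
    out, ones, skip_next = [], 0, False
    for bit in bits:
        if skip_next:             # this is the stuffed 0: drop it
            skip_next, ones = False, 0
            continue
        out.append(bit)
        ones = ones + 1 if bit == "1" else 0
        if ones == 5:
            skip_next = True
    return "".join(out)


def make_frame(bits):
    return FLAG + stuff(bits) + FLAG


def read_frame(line):
    if not (line.startswith(FLAG) and line.endswith(FLAG)):
        raise ValueError("frame boundaries not found")
    return unstuff(line[len(FLAG):-len(FLAG)])


data = "0111111101111110"         # contains the flag pattern itself
print(make_frame(data))
print(read_frame(make_frame(data)) == data)   # True
```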
Another issue that arises in the Data Link layer (and most of the higher layers as well) is how to keep
a fast transmitter from “drowning” a slow receiver in data. Some traffic regulation mechanism must be
employed in order to let the transmitter know how much buffer space the receiver has at the moment.
Frequently, flow regulation and error handling are integrated, for convenience.
If the line can be used to transmit data in both directions, this introduces a new complication for the
Data Link layer software. The problem is that the acknowledgment frames for A to B traffic compete
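for use of the line with data frames for the B to A traffic. A clever solution has been devised in the form of piggybacking, in which the acknowledgment is carried inside the header of an outgoing data frame instead of being sent as a separate frame.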
3. The Network layer
The Network layer describes how a series of exchanges over various data links can deliver data between
any two nodes in a network. This layer defines the addressing and routing structure of the Internet.
The Network layer is concerned with controlling the operation of the subnet. A key design issue is
determining how packets are routed from source to destination. Routes could be based on static tables
that are “wired into” the network and rarely changed. They could also be determined at the start of
each conversation, for example, a terminal session. Finally, they could be highly dynamic, being newly
determined for each packet, to reflect the current network load.
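Routes that follow the current network load, as described above, are often computed with a shortest-path algorithm. Link-state routing protocols such as OSPF use Dijkstra's shortest-path-first algorithm for this. The sketch below is a plain Dijkstra over a small network. The router names and link costs are hypothetical, and raising one link's cost stands in for congestion.

```python
import heapq


def shortest_paths(links, source):
    """Dijkstra's algorithm over {node: {neighbor: link cost}}.
    Returns the lowest total cost to each node and the previous hop."""
    dist = {source: 0}
    prev = {}
    queue = [(0, source)]
    done = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in done:
            continue                    # stale queue entry
        done.add(node)
        for neighbor, link_cost in links.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    return dist, prev


def route(prev, source, target):
    """Walk the previous-hop table back from target to source."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])     # KeyError if target is unreachable
    return path[::-1]


links = {   # hypothetical routers R1..R4 and link costs
    "R1": {"R2": 1, "R3": 4},
    "R2": {"R1": 1, "R3": 2, "R4": 7},
    "R3": {"R1": 4, "R2": 2, "R4": 3},
    "R4": {"R2": 7, "R3": 3},
}
dist, prev = shortest_paths(links, "R1")
print(route(prev, "R1", "R4"), dist["R4"])   # ['R1', 'R2', 'R3', 'R4'] 6
```

If the R2–R3 link becomes congested and its cost is raised, rerunning the same function moves traffic onto the direct R1–R3 link.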
If too many packets are present in the subnet at the same time, they will get in each other’s way, forming bottlenecks. The control of such congestion also belongs to the Network layer.
Since the operators of the subnet may well expect remuneration for their efforts, often some accounting
function is built into the Network layer. At the very least, the software must count how many packets
or characters or bits are sent by each customer, to produce billing information. When a packet crosses
a national border, with different rates on each side, the accounting can become complicated.
When a packet has to travel from one network to another to get to its destination, many problems
can arise. The addressing used by the second network may be different from that of the first one; the
second one may not accept the packet at all because it is too large; the protocols may differ; and so
on. It is up to the Network layer to overcome all these problems to allow the interconnecting of the
heterogeneous networks.
In broadcast networks, the routing problem is simple, so the network layer is often thin or even nonexistent.
4. The Transport layer
The Transport layer describes the quality and nature of the data delivery. This layer defines if and how retransmissions will be used to ensure data delivery. The basic function of the Transport layer is to accept data from the Session layer, split it up into smaller units if need be, pass these to the Network layer, and ensure that all the pieces arrive correctly at the other end. Furthermore, all this must be done efficiently and in a way that isolates the Session layer from the inevitable changes in the hardware technology.
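The split-and-reassemble job described above can be sketched in a few lines. This is a simplified illustration that uses byte offsets as sequence numbers. TCP also numbers the bytes it sends, although it starts from a randomly chosen initial value. The function names `segment`, `missing`, and `reassemble` are hypothetical.

```python
def segment(data, mss):
    """Split data into (sequence number, chunk) pairs of at most mss bytes;
    here the sequence number is simply the byte offset of the chunk."""
    return [(offset, data[offset:offset + mss])
            for offset in range(0, len(data), mss)]


def missing(segments, total_length, mss):
    """Offsets that never arrived, i.e. what must be retransmitted."""
    received = {seq for seq, _ in segments}
    return [off for off in range(0, total_length, mss) if off not in received]


def reassemble(segments, total_length, mss):
    """Rebuild the data even if segments arrive out of order or twice."""
    gaps = missing(segments, total_length, mss)
    if gaps:
        raise ValueError(f"retransmission needed for offsets {gaps}")
    by_offset = dict(segments)            # duplicates simply overwrite
    return b"".join(by_offset[off] for off in range(0, total_length, mss))


data = bytes(range(256)) * 4                  # 1024 bytes to send
pieces = segment(data, 100)                   # 11 segments
arrived = pieces[::-1] + pieces[:2]           # reversed order, two duplicates
print(reassemble(arrived, len(data), 100) == data)   # True
print(missing(pieces[:-1], len(data), 100))          # [1000]
```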
Under normal conditions, the Transport layer creates a distinct network connection for each transport connection required by the Session layer. If the transport connection requires a high throughput,
however, the Transport layer might create multiple network connections, dividing the data among the
network connections to improve throughput. On the other hand, if creating or maintaining a network
connection is expensive, the Transport layer might multiplex several transport connections onto the
same network connection to reduce the cost. In all cases, the Transport layer is required to make the
multiplexing transparent to the Session layer.
The Transport layer also determines what type of service to provide to the Session layer and, ultimately, to the users of the network. The most popular type of transport connection is an error-free point-to-point channel that delivers messages in the order in which they were sent. However, other kinds of transport service exist, such as the transport of isolated messages with no guarantee about the order of delivery, and the broadcasting of messages to multiple destinations. The type of service is determined when the connection is established.
The Transport layer is a true source-to-destination or end-to-end layer. In other words, a program on
the source machine carries on a conversation with a similar program on the destination machine, using
the message headers and control messages.
Many hosts are multiprogrammed, which implies that multiple connections will be entering and leaving each host. There needs to be some way to tell which message belongs to which connection. The
transport header is one place where this information could be added.
In addition to multiplexing several message streams onto one channel, the Transport layer must establish and delete connections across the network. This requires some kind of naming mechanism, so
that the process on one machine has a way of describing with whom it wishes to converse. There must
also be a mechanism to regulate the flow of information, so that a fast host cannot overrun a slow one.
Flow control between hosts is distinct from flow control between switches, although similar principles
apply to both.
5. The Session layer
The Session layer describes the organization of data sequences larger than the packets handled by lower
layers. This layer describes how request and reply packets are paired in a remote procedure call. The
Session layer allows users on different machines to establish sessions between them. A session allows
ordinary data transport, as does the transport layer, but it also provides some enhanced services useful
in some applications. A session might be used to allow a user to log into a remote time-sharing system
or to transfer a file between two machines.
One service provided by the Session layer is to manage dialogue control. Sessions can allow traffic to
go in both directions at the same time, or in only one direction at a time. If traffic can only go one way
at a time, the Session layer can help keep track of whose turn it is.
A related Session service is token management. For some protocols, it is essential that both sides do
not attempt the same operation at the same time. To manage these activities, the Session layer provides
tokens that can be exchanged. Only the side holding the token may perform the critical operation.
Another Session service is synchronization. Consider the problems that might occur when trying to complete a two-hour file transfer between two machines on a network with a one-hour mean time between crashes. After each transfer is aborted, the whole transfer will have to start over again, and will probably fail again with the next network crash. To eliminate this problem, the Session layer provides a way to insert checkpoints into the data stream, so that after a crash, only the data after the last checkpoint has to be repeated.
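The checkpoint idea can be simulated with a link that crashes regularly. In the sketch below, the hypothetical link fails on every third block sent, and a checkpoint is taken after each block, which is the simplest possible policy. With these settings, a transfer that restarted from the beginning after each crash could never get past two blocks. With checkpoints, it completes after repeating only the failed blocks. All names are hypothetical.

```python
def make_flaky_link(fail_every):
    """Hypothetical network that crashes on every n-th block sent."""
    sent = 0

    def link(chunk):
        nonlocal sent
        sent += 1
        if sent % fail_every == 0:
            raise ConnectionError("network crash")
    return link


def transfer(data, link, block_size):
    """Send data block by block; after a crash resume from the last
    checkpoint instead of starting the whole transfer again."""
    received = bytearray()
    checkpoint = 0                      # offset confirmed by both sides
    crashes = 0
    while checkpoint < len(data):
        block = data[checkpoint:checkpoint + block_size]
        try:
            link(block)
        except ConnectionError:
            crashes += 1                # only this block is repeated
            continue
        received += block
        checkpoint += len(block)        # new checkpoint after each block
    return bytes(received), crashes


data = b"x" * 1000
result, crashes = transfer(data, make_flaky_link(3), block_size=100)
print(result == data, crashes)          # True 4
```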
6. The Presentation layer
The Presentation layer describes the syntax of data being transferred. This layer describes how floating
point numbers can be exchanged between hosts with different math formats. The Presentation layer
performs certain functions that are requested sufficiently often to warrant finding a general solution for
them, rather than letting each user solve the problems. In particular, unlike all the lower layers, which
are just interested in moving bits reliably from here to there, the Presentation layer is concerned with
the syntax and semantics of the information transmitted.
A typical example of a Presentation service is encoding data in a standard, agreed-upon way. Most user
programs do not exchange random binary bit strings; they exchange things such as people’s names,
dates, amounts of money, and invoices. These items are represented as character strings, integers, floating-point numbers, and data structures composed of several simpler items. Different computers have
different codes for representing character strings, integers, and so on. In order to make it possible for
computers with different representations to communicate, the data structures to be exchanged can be
defined in an abstract way, along with a standard encoding to be used “on the wire.” The Presentation
layer handles the job of managing these abstract data structures and converting from the representation
used inside the computer to the network standard representation.
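As an illustration of a standard encoding used on the wire, Python's `struct` module can pack values in network (big-endian) byte order, so that machines with different internal representations decode the same bytes to the same values. The record layout below (a 16-bit camera number, a 32-bit timestamp and a 64-bit IEEE 754 floating-point value) is a hypothetical example, not a format defined by any CCTV standard.

```python
import struct

# Hypothetical record: 16-bit camera number, 32-bit timestamp, 64-bit float.
# "!" selects network (big-endian) byte order with standard sizes and no
# padding, so the result does not depend on the host CPU.
RECORD_FORMAT = "!HId"

def encode(camera_id, timestamp, value):
    return struct.pack(RECORD_FORMAT, camera_id, timestamp, value)

def decode(data):
    return struct.unpack(RECORD_FORMAT, data)

wire = encode(7, 1_000_000, 1.0)
print(wire.hex())      # 0007000f42403ff0000000000000
print(decode(wire))    # (7, 1000000, 1.0)
```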
The Presentation layer is also concerned with other aspects of information representation. For example,
data compression can be used here to reduce the number of bits that have to be transmitted, and cryptography is frequently required for privacy and authentication.
7. The Application layer
The Application layer describes how real work actually gets done. This layer would implement file system operations. The Application layer contains a variety of protocols that are commonly needed. For example, there are hundreds of incompatible terminal types in the world. Consider the plight of a full-screen editor that is supposed to work over a network with many different terminal types, each with different screen layouts, escape sequences for inserting and deleting text, moving the cursor, and so on.
One way to solve this problem is to define an abstract network virtual terminal for which editors and other programs can be written. To handle each terminal type, a piece of software must be written to map the functions of the network virtual terminal onto the real terminal. For example, when the editor moves the virtual terminal's cursor to the upper left-hand corner of the screen, this software must issue the proper command sequence to the real terminal to get its cursor there too. All the virtual terminal software is in the Application layer.
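The mapping idea can be sketched in a few lines of Python. Programs call abstract operations, and a per-terminal table translates them into real control sequences. The `ansi` entries are the standard ANSI/VT100 escape sequences for cursor home and clear screen; the `dumb` terminal type, the table and the function names are hypothetical, for illustration only.

```python
# Hypothetical network-virtual-terminal mapping layer.
ESC = "\x1b"

TERMINAL_TABLES = {
    "ansi": {
        "cursor_home": ESC + "[H",    # standard ANSI: move cursor to top-left
        "clear_screen": ESC + "[2J",  # standard ANSI: erase the whole display
    },
    "dumb": {                         # assumption: terminal with no cursor control
        "cursor_home": "",            # nothing can be done
        "clear_screen": "\n" * 24,    # crude fallback: scroll old text away
    },
}

def render(operation, terminal_type):
    """Translate an abstract virtual-terminal operation for a real terminal."""
    return TERMINAL_TABLES[terminal_type][operation]

print(repr(render("cursor_home", "ansi")))   # '\x1b[H'
```

The editor only ever asks for `cursor_home`; adding support for a new terminal type means adding a table, not changing the editor.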
Another Application layer function is file transfer. Different file systems have different file naming conventions, different ways of representing text lines, and so on. Transferring a file between two different
systems requires handling these and other incompatibilities. This work, too, belongs to the Application
layer, as do electronic mail, remote job entry, directory lookup, and various other general-purpose and
special-purpose facilities.
IP addresses
The Internet Protocol (IP) was created in the 1970s to support early computer networking with the Unix
operating system. Today, IP has become a standard for all modern network operating systems to communicate with each other. Many popular, higher-level protocols such as HTTP and TCP rely on IP.
The Internet Protocol (IP) address uniquely identifies the node or Ethernet device, just as a name
identifies a particular person.
No two Ethernet devices on the same network should ever have the same address.
Two versions of IP exist in production use today. Nearly all networks use IP version 4 (IPv4), but
an increasing number of educational and research networks have adopted the next generation IP version 6 (IPv6).
Since a signal on the Ethernet medium reaches every attached node, the destination address is critical to identify the intended recipient of the frame. For example, when computer B transmits to printer C, computers A and D will still receive and examine the frame. However, when a station first receives a frame, it checks the destination address to see if the frame is intended for itself. If it is not, the station discards the frame without even examining its contents.
One interesting aspect of Ethernet addressing is the implementation of a broadcast address. A frame with a destination address equal to the broadcast address (simply called a broadcast, for short) is intended for every node on the network, and every node will both receive and process it.
Understanding IP addressing is especially important for CCTV technicians who visit various sites that have their own networks. In order to connect to the network, program a DVR, or evaluate the network, one should not only have approval from the appropriate IT personnel at such a company, but should also clearly understand how to set up one's own PC to become part of the customer's network without intruding on or affecting it. Although it is possible, and perhaps much easier and safer, to connect to a DVR directly by using a crossover Cat-5 cable (that is, if you are physically close to it), it is still important to know how such an IP address can be accessed from one's own PC (which is not part of the network being visited).
Typical IP address setting in Microsoft Windows 2000
The few “classic” network ping commands mentioned at the end of this chapter may help establish
the validity of certain addresses.
IPv4 addressing notation
The most common IP address type is IPv4, which consists of 4 bytes (32 bits).
These bytes are also known as octets.
For purposes of readability, humans typically work with IP addresses in a decimal notation that uses
periods to separate each octet. For example, the IP address
11000000 10101000 01100110 01011010
shown in the binary system has its first 8 bits (octet) equivalent to the decimal representation of:
$$1\times2^7 + 1\times2^6 + 0\times2^5 + 0\times2^4 + 0\times2^3 + 0\times2^2 + 0\times2^1 + 0\times2^0 = 128 + 64 + 0 + 0 + 0 + 0 + 0 + 0 = 192$$
Similar logic applies to the other three octets, so that the decimal equivalent representation of the
previous binary IP address is:
192.168.102.90
Because each byte is 8 bits in length, each octet in an IP address ranges in value from a minimum of
0 to a maximum of 255 ($2^8 - 1$).
Therefore, the full range of IP addresses in IPv4 notation is from 0.0.0.0 through 255.255.255.255.
This represents a total of $256 \times 256 \times 256 \times 256 = 256^4 = 2^{32} = 4{,}294{,}967{,}296$ possible IP addresses.
One might think that there are enough IP addresses for almost every single person on our planet, but
do not forget that at the beginning of the twenty-first century our planet already had over 6 billion
people, more than the roughly 4.3 billion IPv4 addresses available. The growth of the Internet has been so rapid that a larger addressing space is seen as inevitable.
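The binary-to-decimal conversion described above can be checked with a short Python sketch. The helper names are hypothetical. The standard `ipaddress` module is used only to show the same address as a single 32-bit integer.

```python
import ipaddress

def octets_to_binary(address):
    """Return a dotted-decimal IPv4 address as four 8-bit binary groups."""
    return " ".join(format(int(octet), "08b") for octet in address.split("."))

def binary_to_dotted(bits):
    """Convert four space-separated binary octets to dotted-decimal notation."""
    return ".".join(str(int(group, 2)) for group in bits.split())

print(binary_to_dotted("11000000 10101000 01100110 01011010"))  # 192.168.102.90
print(octets_to_binary("192.168.102.90"))  # 11000000 10101000 01100110 01011010
print(int(ipaddress.IPv4Address("192.168.102.90")))  # 3232261722
print(2 ** 32)  # 4294967296, the size of the whole IPv4 address space
```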
IP address classes
Not all IP addresses are free for use in your local LAN, as your IT manager will no doubt tell you. In addition, even among the addresses you could use in principle, you have to find one that is free to use
and yet belongs to the same address group as your network equipment.
In order to bring some order to the many possible LANs and WANs, there are some agreed-upon rules
and address classes that all Ethernet devices obey. These are the IPv4 classes.
The IPv4 address space can be subdivided into five classes: Class A, B, C, D, and E.
Each class consists of a contiguous subset of the overall IPv4 address range.
With a few special exceptions explained later in this chapter, the values of the leftmost 4 bits of an IPv4
address determine its class as shown in this table.
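The classful rule amounts to a few bit tests on the first octet. The Python sketch below is illustrative only: the function name is hypothetical, and it ignores the special loopback, zero and broadcast ranges described later in this chapter.

```python
def ipv4_class(address):
    """Return the classful IPv4 class (A to E) from the leftmost bits of the first octet."""
    first = int(address.split(".")[0])
    if first >> 7 == 0b0:       # 0xxxxxxx -> first octet 0-127
        return "A"
    if first >> 6 == 0b10:      # 10xxxxxx -> first octet 128-191
        return "B"
    if first >> 5 == 0b110:     # 110xxxxx -> first octet 192-223
        return "C"
    if first >> 4 == 0b1110:    # 1110xxxx -> first octet 224-239
        return "D"
    return "E"                  # 1111xxxx -> first octet 240-255

for addr in ("10.1.2.3", "172.16.0.1", "192.168.1.10", "239.1.1.1", "250.0.0.1"):
    print(addr, ipv4_class(addr))
```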
Class A, B, and C
Classes A, B, and C are the three classes of addresses used on the Internet, with the exception of the private address ranges explained next.
Private addresses
When a computer or a network device resides on a private network (not on the Internet), it should use
one of the many private addresses defined by the IP standards. Such devices, when connected to the
Internet, via an ADSL modem, for example, are practically invisible to the other Internet devices which
use the other (“visible”) Class A, B, or C IP addresses.
The IP standard defines specific address ranges within Class A, Class B, and Class C reserved for use by
private networks (intranets). The following table lists these reserved ranges of the IP address space.
Nodes are effectively free to use addresses in the private ranges if they are not connected to the Internet,
or if they reside behind firewalls or other gateways that use Network Address Translation (NAT).
Private address allocation in Class A, B, and C
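The three private blocks (defined in RFC 1918, one each in Class A, B and C space) can be checked with Python's standard `ipaddress` module. This is a sketch: the helper name is hypothetical, and the blocks are listed explicitly because the module's own `is_private` flag also covers other special ranges such as loopback.

```python
import ipaddress

# RFC 1918 private (intranet) blocks, one in each of Class A, B and C space.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.0.0.0 - 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0 - 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0 - 192.168.255.255
]

def is_rfc1918_private(address):
    """True if the IPv4 address falls inside one of the three private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)

for addr in ("192.168.1.100", "172.20.5.4", "172.32.0.1", "8.8.8.8"):
    print(addr, is_rfc1918_private(addr))
```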
IP address Class C
All Class C addresses, for example, have the leftmost 3 bits set to “110,” but each of the remaining 29
bits may be set to either “0” or “1” independently (as represented by an x in these bit positions):
110xxxxx xxxxxxxx xxxxxxxx xxxxxxxx
By converting the above to dotted decimal notation, it follows that all Class C addresses fall in the
range from 192.0.0.0 through 223.255.255.255.
IP loopback address
127.0.0.1 is the loopback address in IP.
Loopback is a test mechanism of network adaptors. Messages sent to 127.0.0.1 do not get delivered
to the network. Instead, the adaptor intercepts all loopback messages and returns them to the sending
application. IP applications often use this feature to test the behavior of their network interface. On
some products this address is used to synchronize the time to a master device.
As with broadcast, IP officially reserves the entire range from 127.0.0.0 through 127.255.255.255 for
loopback purposes. Nodes should not use this range on the Internet, and it should not be considered
part of the normal Class A range.
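A minimal loopback test can be written with Python's standard `socket` module. The sketch below sends a UDP datagram to 127.0.0.1 and reads it back on the same machine, so the data never reaches the physical network. The function name and the message are illustrative.

```python
import socket

def loopback_echo(message: bytes) -> bytes:
    """Send a UDP datagram to 127.0.0.1 and read it back on the same host."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
        receiver.settimeout(2.0)          # do not wait forever if nothing arrives
        sender.sendto(message, receiver.getsockname())
        data, _ = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()

print(loopback_echo(b"loopback test"))   # b'loopback test'
```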
Zero addresses
As with the loopback range, the address range from 0.0.0.0 through 0.255.255.255 should not be considered part of the normal Class A range.
0.x.x.x addresses serve no particular function in IP, and nodes attempting to use them will be unable to
communicate properly on the Internet.
IP address Class D and Multicast
The IPv4 networking standard defines Class D addresses as reserved for multicast.
Multicast is a mechanism for defining groups of nodes and sending IP messages to that group rather
than to every node on the LAN (broadcast) or just one other node (unicast).
Multicast is used mainly on research networks, but in some CCTV systems multicasting is a required
feature. In a large digital matrix switcher with more than one user, multicasting can be
used to send the same packets of data (in our case, video images) to various operators. This reduces the data traffic, because the same packets are received by multiple operators rather than
being transmitted separately to each one.
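On the receiving side, an operator workstation joins a multicast group through a socket option. The Python sketch below uses the standard `IP_ADD_MEMBERSHIP` option. The group 239.1.1.10 and port 5004 are hypothetical (239.x.x.x is the administratively scoped block intended for use inside an organization), and a real video client would also decode the stream it receives.

```python
import ipaddress
import socket
import struct

def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure (group address + local interface) for IP_ADD_MEMBERSHIP."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a Class D (multicast) address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))

def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Return a UDP socket that has joined `group` and listens on `port`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # several viewers per host
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock
```

Each operator would call `open_multicast_receiver("239.1.1.10", 5004)` and then `recvfrom()` on the returned socket; the sender transmits each packet once, to the group address, regardless of how many operators have joined.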
As with Class E, Class D addresses should not be used by ordinary nodes on the Internet.
IP address Class E and limited broadcast
The IPv4 networking standard defines Class E addresses as reserved, which means that they
should not be used on IP networks.
Some research organizations use Class E addresses for experimental purposes. However, nodes that
try to use these addresses on the Internet will be unable to communicate properly.
A special type of IP address is the limited broadcast address 255.255.255.255. A broadcast involves delivering a message from one sender to many recipients. Senders direct an IP broadcast to 255.255.255.255
to indicate that all other nodes on the local network (LAN) should pick up that message. This broadcast
is “limited” in that it does not reach every node on the Internet, only nodes on the LAN.
Technically, IP reserves the entire range of addresses from 255.0.0.0 through 255.255.255.255 for
broadcast, and this range should not be considered part of the normal Class E range.
IP network partitioning
Computer networks consist of individual segments of network cable. The electrical properties of cabling
limit the useful size of any given segment such that even a modestly sized local area network (LAN)
will require several of them. Gateway devices such as routers and bridges connect these segments
together, though not in a perfectly seamless way.
Besides partitioning through the use of cable, subdividing of the network can also be done at a higher
level. Subnets support virtual network segments that partition traffic flowing through the cable rather
than the cables themselves. The subnet configuration often matches the segment layout one to one, but
subnets can also subdivide a given network segment.
Network addressing fundamentally organizes hosts into groups. This can improve security (by isolating
critical nodes) and can reduce network traffic (by preventing transmissions between nodes that do not
need to communicate with each other). Overall, network addressing becomes even more powerful
when introducing subnetting and/or supernetting.
Virtual private networking (VPN)
A VPN utilizes public networks to conduct private data communications. Most VPN implementations
use the Internet as the public infrastructure and a variety of specialized protocols to support private
communications through the Internet. VPN follows a client and server approach. VPN clients authenticate users, encrypt data, and otherwise manage sessions with VPN servers utilizing a technique called tunneling.
The governing bodies that administer Internet Protocol have reserved certain networks for internal uses.
In general, intranets utilizing these networks gain more control over managing their IP configuration
and Internet access. A subnet allows the flow of network traffic between hosts to be segregated based
on a network configuration. By organizing hosts into logical groups, subnetting can improve network
security and performance. Subnetting works by applying the concept of extended network addresses
to individual computer (and other network device) addresses.
An extended network address includes both a network address and additional bits that represent the
subnet number. Together, these two data elements support a two-level addressing scheme recognized
by standard implementations of IP. The network address and subnet number, when combined with the
host address, therefore support a three-level scheme.
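The three-level scheme can be illustrated with Python's standard `ipaddress` module. In this sketch, the site network 192.168.1.0/24 and the camera address are hypothetical. Borrowing 2 host bits turns one /24 network into four /26 subnets of 64 addresses each, and the remaining bits identify the host within its subnet.

```python
import ipaddress

site = ipaddress.ip_network("192.168.1.0/24")    # hypothetical site network
subnets = list(site.subnets(new_prefix=26))      # borrow 2 host bits -> 4 subnets

for net in subnets:
    print(net, net.netmask, net.num_addresses)   # e.g. 192.168.1.0/26 255.255.255.192 64

# Network address + subnet number + host number for one (hypothetical) camera.
camera = ipaddress.ip_address("192.168.1.77")
cctv_subnet = next(net for net in subnets if camera in net)
host_part = int(camera) - int(cctv_subnet.network_address)
print(cctv_subnet, host_part)   # 192.168.1.64/26 13
```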
IPv6 addressing notation
Although this addressing is not yet widespread, it is no doubt something that future networks will
make use of, if for no other reason than the number of addresses made available under such notation.
IPv6 addresses are 16 bytes (128 bits) long, rather than 4 bytes (32 bits).
This represents more than $3.4 \times 10^{38}$ possible addresses ($256^{16} = 2^{128}$).
The preferred IPv6 address form writes the eight 16-bit pieces as hexadecimal values separated by colons, in the form x:x:x:x:x:x:x:x.
In the hexadecimal representation of numbers, rather than decimal, the digits A to F stand for the decimal values
10 to 15: A is 10, B is 11, C is 12, D is 13, E is 14, and F is 15.
Note that it is not necessary to write the leading zeros in an individual field, but there must be at least
one numeral in every field.
In the coming years, as an increasing number of cell phones, PDAs, and other network appliances
expand their networking capability, this much larger IPv6 address space will probably be necessary.
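Python's standard `ipaddress` module can show the same IPv6 address in full and in shortened form. The address below uses 2001:db8::/32, the prefix reserved for documentation examples (RFC 3849). Note that the compressed form also uses the "::" shorthand for a run of all-zero fields, which goes one step beyond the leading-zero rule described above.

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001")
print(addr.exploded)     # 2001:0db8:0000:0000:0000:0000:0000:0001
print(addr.compressed)   # 2001:db8::1  (leading zeros dropped, zero run -> "::")
print(2 ** 128)          # 340282366920938463463374607431768211456
print(int("FF", 16), int("A", 16), int("F", 16))   # 255 10 15
```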
IPv6 Address Types
IPv6 does not use classes. IPv6 supports the following three IP address types:
• Unicast
• Multicast
• Anycast
Unicast and multicast messaging in IPv6 are conceptually the same as in IPv4.
IPv6 does not support broadcast, but its multicast mechanism accomplishes essentially the same effect.
Multicast addresses in IPv6 start with “FF” (255 in decimal).
Anycast in IPv6 is a variation on multicast. Whereas multicast delivers messages to all nodes in the multicast group, anycast delivers messages to any one node in the multicast group. Anycast is an advanced
networking concept designed to support the failover and load-balancing needs of applications.
Reserved addresses in IPv6
IPv6 reserves just two special addresses: 0:0:0:0:0:0:0:0 and 0:0:0:0:0:0:0:1.
IPv6 uses 0:0:0:0:0:0:0:0 internal to the protocol implementation, so nodes cannot use it for their own
communication purposes.
IPv6 uses 0:0:0:0:0:0:0:1 as its loopback address, equivalent to 127.0.0.1 in IPv4.
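The two reserved IPv6 addresses, and the multicast prefix mentioned earlier, can be recognized with Python's standard `ipaddress` module. ff02::1 is used here as an example because it is the IPv6 all-nodes multicast group, which plays the role that broadcast plays in IPv4.

```python
import ipaddress

unspecified = ipaddress.ip_address("0:0:0:0:0:0:0:0")
loopback = ipaddress.ip_address("0:0:0:0:0:0:0:1")
all_nodes = ipaddress.ip_address("ff02::1")   # all-nodes multicast group

print(unspecified.compressed, unspecified.is_unspecified)  # :: True
print(loopback.compressed, loopback.is_loopback)           # ::1 True
print(all_nodes.is_multicast)                              # True
print(ipaddress.ip_address("127.0.0.1").is_loopback)       # True (IPv4 equivalent)
```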
Domain Name Systems (DNS)
Although IP addresses allow computers and routers to identify each other efficiently, humans prefer to
work with names rather than numbers.
The Domain Name System (DNS) supports the best of both worlds.
DNS allows nodes on the public Internet to be assigned both an IP address and a corresponding name
called a domain name. For DNS to work as designed, these names must be unique worldwide. Hence,
an entire “cottage industry” has emerged around the purchasing of Internet domain names.
DNS is a hierarchical system and organizes all registered names in a tree structure.
At the base or root of the tree is a group of top-level domains, including familiar names like com,
org, and edu and numerous country-level domains like au (Australia), fi (Finland), or uk (United
Kingdom). One generally cannot purchase names at this level. However, in a well-publicized and controversial
event in 2000, the island nation of Tuvalu agreed to receive a large payment in return for rights to its
top-level domain tv.
Below this level are the second-level registered domains, which
organizations can purchase from any of numerous accredited registrars.
For nodes in the com, org, and edu domains, the Internet Corporation for Assigned Names and Numbers
(ICANN) oversees registrations. Below that, local domains are defined and
administered by the overall domain owner. DNS supports additional tree levels as well.
The period (the dot ‘.’) always separates each level of the hierarchy in DNS.
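The way the dots separate the levels can be shown with a short Python function that lists the tree levels of a name, from the top-level domain downward. The function name is hypothetical, it only splits the text of the name (it does not query any DNS server), and the names used are placeholders under example.com, a domain reserved for documentation.

```python
def hierarchy(domain_name):
    """List the DNS tree levels of a name, from the top-level domain down."""
    labels = domain_name.rstrip(".").split(".")   # a trailing dot denotes the root
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

print(hierarchy("www.example.com"))   # ['com', 'example.com', 'www.example.com']
```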
DNS is also a distributed system. The DNS database contains a list of registered domain names. It
further contains a mapping or conversion between each name and one or more IP addresses. However,
DNS requires a coordinated effort among many computers (servers); no one computer holds the entire
DNS database. Each DNS server maintains just one piece of the overall hierarchy – one level of the
tree and then only a subset or zone within that level.
The top level of the DNS hierarchy, also called the root level, is maintained by a set of 13 servers
called root name servers. These servers have gained some notoriety for their unique role on the Internet. Maintained by various independent agencies, the servers are uniquely named A, B, C, and so
on up to M. Ten of these servers reside in the United States, one in Japan, one in London, and one in
Stockholm, Sweden.
DNS works in a client/server fashion. The DNS servers respond to requests from DNS clients called
resolvers. ISPs and other organizations set up local DNS resolvers as well as servers. Most DNS servers
also act as resolvers, routing requests up the tree to higher-level DNS servers and delegating requests
to other servers. DNS servers eventually return the requested mapping (either address-to-name or
name-to-address) to the resolver.
DHCP (Dynamic Host Configuration Protocol) is a protocol that lets network administrators centrally
manage and automate the assignment of IP addresses on the corporate network. When a company sets
up its computer users with a connection to the Internet, an IP address must be assigned to each machine.
Without DHCP, the IP address must be entered manually at each computer on the corporate network.
DHCP permits a network administrator to supervise and distribute IP addresses from a central point
and automatically sends a new IP address when a computer is plugged into a different place in the
network. DHCP uses the concept of a “lease,” or amount of time that a given IP address will be valid
for a computer. Using very short leases, DHCP can dynamically reconfigure networks in which there
are more computers than there are available IP addresses.
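The lease idea can be modeled with a toy address pool. This Python sketch is a hypothetical model of the concept only; it does not implement the DHCP message exchange, and time is measured in arbitrary units. It shows how expired leases return addresses to the pool, which is what lets short leases serve more computers than there are addresses.

```python
from dataclasses import dataclass, field

@dataclass
class LeasePool:
    """Toy DHCP-style address pool with time-limited leases (hypothetical model)."""
    addresses: list
    lease_time: int                              # lease length in arbitrary time units
    leases: dict = field(default_factory=dict)   # client id -> (address, expiry time)

    def _expire(self, now):
        # Drop every lease whose expiry time has been reached.
        self.leases = {c: (a, exp) for c, (a, exp) in self.leases.items() if exp > now}

    def request(self, client, now):
        """Renew the client's lease or hand out a free address; None if the pool is full."""
        self._expire(now)
        if client in self.leases:
            address = self.leases[client][0]
        else:
            in_use = {a for a, _ in self.leases.values()}
            free = [a for a in self.addresses if a not in in_use]
            if not free:
                return None
            address = free[0]
        self.leases[client] = (address, now + self.lease_time)
        return address

pool = LeasePool(["192.168.1.100", "192.168.1.101"], lease_time=10)
print(pool.request("pc-1", now=0))    # 192.168.1.100
print(pool.request("pc-2", now=1))    # 192.168.1.101
print(pool.request("pc-3", now=2))    # None (pool exhausted)
print(pool.request("pc-3", now=11))   # 192.168.1.100 (earlier leases have expired)
```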
DNS was not designed to work with dynamic addressing such as that supported by DHCP. It requires
that fixed (static) addresses be maintained in the database. Web servers in particular require fixed IP
addresses for this reason.
Many CCTV systems that are designed to be accessible from remote locations via the Internet need to
have a fixed public IP address instead of a domain name. Such fixed addresses are available from most
Internet Service Providers (ISP) at an additional cost.
Networking hardware
Hubs, bridges, and switches
Hubs are classified as Layer 1 devices in the OSI model. Hubs connect multiple Ethernet devices in a
star configuration, so that any device connected to the hub can “see” and talk to any other device
in that group (network segment).
At the Physical layer, hubs can support little in the way of sophisticated networking. Hubs do not read
any of the data passing through them and are not aware of their source or destination. Essentially,
a hub simply receives incoming packets, possibly amplifies the electrical signal, and broadcasts these
packets out to all devices on the network, including the one that originally sent the packet. Hubs
typically come with 4, 8, 16, or 24 ports, so up to 24 devices can usually be connected to one hub.
If more devices are used, more hubs can be added.
Because Ethernet works on the principle of carrier-sense multiple access with collision detection (CSMA/CD), the more devices are connected to the hub, the more data packet collisions will occur, slowing down the network traffic. One way to reduce data congestion would be to split a single segment into multiple segments, thus creating multiple collision domains. This solution creates a different problem, for these now separate segments are not able to share information with each other. This is where network bridges and switches are used.
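Why more stations on one hub mean more collisions can be shown with a deliberately simplified model. This is not a CSMA/CD simulation, because carrier sense is ignored. The model assumes time is divided into slots and each of $n$ stations independently starts transmitting in a slot with probability $p$ ($p = 0.05$ below is an arbitrary assumption). A collision is any slot with two or more transmitters, so $P_{\text{collision}} = 1 - (1-p)^n - np(1-p)^{n-1}$.

```python
def collision_probability(n, p):
    """Probability that two or more of n stations transmit in the same slot."""
    idle = (1 - p) ** n                      # nobody transmits
    single = n * p * (1 - p) ** (n - 1)      # exactly one station transmits
    return 1 - idle - single

# Stations sharing one collision domain (e.g. one hub), each busy 5% of slots.
for n in (2, 4, 8, 16, 24):
    print(n, round(collision_probability(n, 0.05), 3))
```

Under these assumptions the collision probability rises steadily as stations are added, which is the effect that splitting the network into separate collision domains is meant to reduce.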
Photo courtesy of Micronet
Various size hubs
Bridges and switches are data communications devices that operate principally at Layer 2 of
the OSI reference model. Bridging and switching occur at the Data Link layer, which controls
data flow, handles transmission errors, provides physical (as opposed to logical) addressing, and
manages access to the physical medium. Bridges provide these functions by using various link layer
protocols that dictate specific flow control, error handling, addressing, and media-access algorithms.
Bridges and switches are not complicated devices. They analyze incoming frames, make forwarding decisions based on information contained in the frames, and forward the frames toward the
destination. In some cases, such as source-route bridging, the entire path to the destination is contained
in each frame. In other cases, such as transparent bridging, frames are forwarded one hop at a time
toward the destination.
Bridges became commercially available in the early 1980s. Like bridges in our daily lives that connect
one side of a river with another, network bridges connect one group of Ethernet devices with another. At
the time of their introduction, bridges connected and enabled packet forwarding between homogeneous
networks, but more recently, bridging between different networks has also been defined and standardized. Several kinds of bridging have proven important in internetworking: transparent
bridging is found primarily in Ethernet environments, while source-route bridging occurs primarily in
Token Ring environments. Translational bridging provides translation between the formats and transit
principles of different media types, such as Ethernet and Token Ring.
Network bridge
Bridges connect two or more network segments, increasing the network diameter (as a repeater does),
but they also help regulate traffic. They send and receive transmissions just like any other node, but
they do not function in the same way as a normal node. The bridge does not originate any traffic
of its own; it only echoes what it hears from other stations. So, one goal of the bridge is to reduce
unnecessary traffic on both segments. This is done by examining the destination address of the frame
before deciding how to handle it. If the destination address, for example, is that of station A or B (see
the illustration), then there is no need for the frame to appear on a segment where A and B are not
members. In this case, the bridge does nothing. We can say that the bridge filters or drops the frame.
If the destination address is that of station C or D, or if it is the broadcast address, then the bridge will
transmit or forward the frame onto the segments where C and D are. By forwarding packets, the bridge
allows any of the devices of different segments to communicate. In addition, by filtering packets when
appropriate, the bridge makes it possible for station A to transmit to station B at the same time that
station C transmits to station D, allowing two conversations to occur simultaneously.
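The forward/filter decision of a transparent bridge can be sketched in Python. This is a simplified illustration: the class is hypothetical, the letters A to D stand in for real 48-bit MAC addresses, and real bridges also age out old table entries. The bridge learns which port each sender lives on, then forwards, filters (drops) or floods each frame.

```python
class LearningBridge:
    """Sketch of transparent bridging: learn source addresses, then forward, filter or flood."""

    BROADCAST = "ff:ff:ff:ff:ff:ff"

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}                  # station address -> port it was last seen on

    def receive(self, frame_src, frame_dst, in_port):
        """Return the list of ports the frame is sent out of (empty list = filtered)."""
        self.table[frame_src] = in_port  # learn where the sender lives
        if frame_dst == self.BROADCAST or frame_dst not in self.table:
            return [p for p in self.ports if p != in_port]   # flood to other segments
        out_port = self.table[frame_dst]
        if out_port == in_port:
            return []                    # destination on the same segment: filter
        return [out_port]                # forward to the destination's segment

# Segment 1 holds stations A and B, segment 2 holds C and D.
bridge = LearningBridge([1, 2])
print(bridge.receive("A", "B", 1))   # [2]  (B not yet known: flood)
print(bridge.receive("B", "A", 1))   # []   (A is on the same segment: filter)
print(bridge.receive("C", "A", 2))   # [1]  (forward to A's segment)
```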
Switches are the modern counterparts of bridges, functionally equivalent but offering a dedicated segment for every node on the network. Switches are Data Link layer devices that, like bridges, enable multiple physical LAN segments to be interconnected into a single larger network.
Photo courtesy of Linksys
24-port gigabit switch
LAN switches are used to interconnect multiple LAN segments. LAN switching provides dedicated, collision-free communication between network devices, with support for multiple simultaneous conversations. LAN switches are designed to switch data frames at high speeds.
By dividing large networks into self-contained units (segments), bridges and switches provide several advantages:
• Because only a certain percentage of traffic is forwarded, a bridge or switch reduces the
unnecessary traffic and the network becomes more efficient.
• The bridge or switch will act as a firewall for some potentially damaging network errors
and will accommodate communication between a larger number of devices than would be supported on any single LAN connected to the bridge.
• Bridges and switches extend the effective length of a LAN, permitting the attachment of
distant stations that could not previously be attached.
Although bridges and switches share most relevant attributes, several distinctions differentiate these
technologies. Bridges are generally used to segment a LAN into a couple of smaller segments,
whereas switches are generally used to segment a large LAN into many smaller segments. Bridges
generally have only a few ports for LAN connectivity, and switches generally have many.
Switches can also be used to connect LANs with different media – for example, a 10 Mb/s Ethernet
LAN and a 100 Mb/s Ethernet LAN can be connected using a switch. Some switches support cut-through switching, which reduces latency and delays in the network, whereas bridges support only
store-and-forward traffic switching. Finally, switches reduce collisions on network segments because
they provide dedicated bandwidth to each network segment.
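The latency advantage of cut-through switching can be estimated with simple arithmetic. The sketch below makes simplifying assumptions: a store-and-forward device must receive a whole maximum-size standard Ethernet frame (1518 bytes) before sending it on, while a cut-through switch can start forwarding once the 6-byte destination address has arrived. The preamble, inter-frame gap and the switch's own processing time are ignored.

```python
def serialization_delay_us(num_bytes, rate_mbps):
    """Time to receive num_bytes at rate_mbps, in microseconds."""
    return num_bytes * 8 / rate_mbps        # bits / (Mbit/s) = microseconds

FRAME = 1518      # maximum standard Ethernet frame, in bytes
DEST_ADDR = 6     # destination address field, in bytes

for rate in (10, 100, 1000):
    store_and_forward = serialization_delay_us(FRAME, rate)
    cut_through = serialization_delay_us(DEST_ADDR, rate)
    print(f"{rate:>5} Mb/s: store-and-forward {store_and_forward:.2f} us, "
          f"cut-through {cut_through:.3f} us")
```

At 100 Mb/s this gives about 121 µs of waiting per hop for store-and-forward, against under half a microsecond for cut-through, under the stated assumptions.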
Modern Ethernet implementations often look nothing like their historical counterparts. Where long
runs of coaxial cable provided attachments for multiple stations in legacy Ethernet, modern Ethernet
networks use twisted pair wiring or fiber optics to connect stations in a radial pattern (star-configuration). Where legacy Ethernet networks transmitted data at 10 Mb/s, modern networks can operate at
100, 1000 Mb/s, or even 10,000 Mb/s.
Ethernet switching gave rise to another advancement: full-duplex Ethernet. Full-duplex is a data communications term that refers to the ability to send and receive data at the same time. Legacy Ethernet is half-duplex, meaning information can move in only one direction at a time. In a totally switched network, nodes only communicate with the switch and never directly with each other. Switched networks also employ either twisted pair or fiber optic cabling, both of which use separate conductors for sending and receiving data. In this type of environment, Ethernet stations can forgo the collision detection process and transmit at will, since they are the only potential devices that can access the medium. This allows end stations to transmit to the switch at the same time that the switch transmits to them, achieving a collision-free environment.
Routers for logical segmentation
Bridges and switches can reduce congestion by allowing multiple conversations to occur on different
segments simultaneously, but they have their limits in segmenting traffic as well.
An important characteristic of bridges is that they forward Ethernet broadcasts to all connected segments.
This behavior is necessary, as Ethernet broadcasts are destined for every node on the network, but it can
pose problems for bridged networks that grow too large. When a large number of stations broadcast on
a bridged network, congestion can be as bad as if all those devices were on a single segment.
Routers are “intelligent” networking components that can divide a single network into two logically
separate networks. While Ethernet broadcasts cross bridges in their search to find every node on the
network, they do not cross routers, because the router forms a logical boundary for the network.
Routers operate based on protocols that are independent of the specific networking technology, such
as Ethernet or Token Ring. This allows routers to easily interconnect various network technologies,
both local and wide area, and has led to their widespread deployment in connecting devices around the
world as part of the global Internet.
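The logical boundary formed by a router shows up in the decision every host makes before sending a packet. If the destination is on its own logical network, the host delivers the packet directly; otherwise it hands the packet to the router (the default gateway). This Python sketch uses the standard `ipaddress` module; the function name and all addresses are hypothetical.

```python
import ipaddress

def next_hop(destination, local_network, default_gateway):
    """Decide whether a packet is delivered directly or handed to the router."""
    dest = ipaddress.ip_address(destination)
    if dest in ipaddress.ip_network(local_network):
        return destination          # same logical network: deliver directly
    return default_gateway          # other network: send via the router

print(next_hop("192.168.1.50", "192.168.1.0/24", "192.168.1.1"))  # 192.168.1.50
print(next_hop("10.20.0.8", "192.168.1.0/24", "192.168.1.1"))     # 192.168.1.1
```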
Network ports
A network port is an interface for communicating with a computer program over a network. Network
ports are usually numbered, and a network implementation (like TCP or UDP) will attach a port number
to data it sends. The receiving implementation will use the attached port number to figure out which
computer program to send the data to. The combination of a port and a network address (IP-number)
is often called a socket.
There are a total of 65,536 ports on a networking device, which comes from the 16 bits allocated
to port numbers ($2^{16}$).
Not all ports of a network device are assigned to known services, but there is a general division into three groups:
• The Well-Known Ports are those from 0 through 1023.
• The Registered Ports are those from 1024 through 49151.
• The Dynamic and/or Private Ports are those from 49152 through 65535.
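The three port ranges, and the idea of a socket as an (IP address, port) pair, can be sketched in Python. The `port_group` helper is hypothetical. Binding a socket to port 0 asks the operating system to pick a free port, typically one from the dynamic range.

```python
import socket

def port_group(port):
    """Classify a TCP/UDP port number into one of the three standard ranges."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit: 0-65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"

# A socket: an IP address combined with a port number.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(("127.0.0.1", 0))      # port 0: let the OS choose a free port
    print(s.getsockname())        # ('127.0.0.1', <port chosen by the OS>)
```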
For example, some of the Well-Known Ports are:
• 20 – FTP: the file transfer protocol – data
• 21 – FTP: the file transfer protocol – control
• 22 – SSH: secure logins, file transfers (scp, sftp), and port forwarding
• 23 – Telnet: nonsecure text communications
• 25 – SMTP: Simple Mail Transfer Protocol (E-mail)
• 53 – DNS: Domain Name System
• 80 – HTTP: HyperText Transfer Protocol (www)
• 110 – POP3: Post Office Protocol (E-mail)
• 143 – IMAP4: Internet Message Access Protocol (E-mail)
• 443 – HTTPS: used for securely transferring web pages, etc.
Ports can be closed, depending on the requirements, in order to minimize the risk of external
hacker attacks. In some tightly controlled network environments, certain ports need to be opened
to make a certain function of a DVR system, for example, accessible from a remote location. This
is usually negotiated with the appropriate IT manager of the company using such a system.
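When checking whether the agreed port is actually reachable, a simple TCP connection attempt is often enough. This Python sketch uses the standard `socket.create_connection`. The helper name is hypothetical, and a DVR address such as 192.168.1.20 with web port 80 would be an assumption about the particular site. A successful connection shows that the port is open and reachable; a failure can mean a closed port, a firewall, or a wrong address.

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:      # refused, unreachable, timed out, name not resolved, ...
        return False
```

For example, `is_port_open("192.168.1.20", 80)` would test a hypothetical DVR's web port from the technician's PC.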
A network analogy example
In order to summarize all the aforementioned network concepts and devices, which for many in CCTV
might be a bit daunting, let me share with you the following analogy:
Imagine you live in a nice little town. The town could represent your Wide Area Network (WAN),
while your own suburb would represent the Local Area Network (LAN) segment. Each house, shop,
or object would represent a network device, with its own address, which is basically the IP address in
a network. All the houses in your own street have different numbers but carry the name of the street,
which is exactly how it is in the Local Area Network, where all devices have the first three groups of
IP numbers the same and the last is unique to each house. No two houses in the same street have the
same number. If one of the houses is better known by its owner’s name or the business name residing
on that address, that will be equivalent to DNS address allocation instead of the IP number (in our
analogy, the house number).
Imagine now that you have a variety of roads in your town, with many vehicles traveling in various
directions. In our networking analogy each road would represent the Ethernet media (cable), and
each vehicle driving on that road would represent a data packet. The roads are narrow and have traffic flowing in both directions, so that you cannot go any quicker than the vehicle in
front of you, and you can only use one-half of the road width, which is equivalent to half-duplex data
in network communications. If intersections are not regulated with traffic lights, you basically have
the equivalent of hub devices in networking. They do not regulate the traffic intelligently; they only
allow you to get from one street to another, but if you have many cars going in various directions, the
waiting time in front of such intersections could be quite long. This is equivalent to the data packets
collision in Ethernet terminology.
11. Networking in CCTV
On your way to the chemist shop, you might drive on a brand-new four-lane-wide road (equivalent to a
100 Mb/s network), which will get you there quite quickly because there are not that many accidents
or stops (data collisions) and because the road is pretty wide and divided (equivalent to full-duplex
Ethernet). When you get to the traffic lights before you cross over to the other side, where the shopping
center might be, it is like getting to a network bridge that separates your traffic from the shopping
center traffic. If this were a major traffic intersection with five roads joining, for example, where some
roads can take you to other parts of your town, such as the industrial area, the traffic intersection and
its intelligent traffic light switching would be equivalent to a network switch.
As in real life, each vehicle could have a different size, which is similar to what we have with
Ethernet data packets. They can have different lengths and, of course, different content, which is like
having a different number of passengers or items being transported in a vehicle.
In order for a vehicle to get from its original location (your home, for example) to the chemist shop
which is near the shopping center (your destination, for example), your driver must know the address
of the chemist shop, which is the same as having an IP address of the destination.
Let us now assume you want to go into the main shopping center, and let us also assume you have your
friend with you, who unfortunately happens to be disabled and uses a wheelchair. In order for you to
take him to the shopping center, you would take him via the wheelchair ramp that is designed for such
purposes. The wheelchair ramp is another access to the shopping center, which nondisabled people
usually do not use. The shopping center is the IP address you know and went to, but the access for your
disabled friend is via the wheelchair ramp, for he cannot get up the stairs. This is exactly the same as
having a different port on a network device, designed only for such purposes (i.e., customers). The
shopping center delivery docks would be another way into the shopping center, but they are dedicated
only to the trucks that deliver goods to the shopping center shops. Again, this is equivalent to another
port number of the same IP address (i.e., the shopping center). In CCTV applications, for example, one
DVR can have one port for accessing images, and another for accessing the time server function.
Pursuing this analogy, let us now assume that you want to leave your lovely town (your own network)
and want to go outside, which happens to be another state, where you have border control, and they
will not let you go there unless you have all the right vehicle documents and passport. This would be
equivalent to a router device in a network, equipped with a firewall control.
Hackers or viruses are equivalent to either polite door-knocking salesmen (intruders), or robbers that
want to come and steal something from your house and at the same time make a mess, or even burn
it down.
The topic of the next section, wireless LAN, is equivalent in our analogy to having helicopters instead
of cars, where you can get to any point (within a certain radius, of course, depending upon the
helicopter power and fuel) by flying through the air, without the need to build roads on the ground
(copper or fiber Ethernet). The flight control tower will still have to keep the traffic in order, which is
the equivalent of the wireless network bridge or hot-spot.
Wireless LAN
An increasing number of CCTV products and projects are starting to use wireless LAN (WLAN). The
acceptance and practicality of wireless communications between computers, routers, or digital video
devices is becoming so widespread that manufacturers are being forced to bring out even better and
cheaper devices. After many years of proprietary products and ineffective standards, the industry has
finally decided to back one set of standards for wireless networking: the 802.11 series from the Institute
of Electrical and Electronics Engineers (IEEE). These emerging standards define wireless Ethernet, or
wireless LAN (WLAN), also referred to as Wi-Fi (Wireless Fidelity).
There are, however, many “flavors” of the 802.11
standards, with more of them being issued, so
it might be useful for the CCTV users to obtain
a better understanding of which one is what (at
least until the time of writing this book).
Sales are expanding rapidly as an increasing
number of enterprises see the value of WLANs.
Growth has been helped by the Wireless Ethernet
Compatibility Alliance (WECA), which provides
conformance and interoperability testing. So
far, this group of more than 130 companies has
granted its “Wi-Fi” label of approval to more
than two hundred products conforming to the
802.11b standard.
Within the IEEE’s 802.11 series there are several
specifications, some complete and some still
under development.
Photo courtesy of Linksys
Wireless network bridge
What is 802.11?
IEEE 802.11, or Wi-Fi, denotes a set of Wireless LAN standards developed by working group 11 of the
IEEE 802 group. The term is also used specifically for the original version; to avoid confusion, that is
sometimes called “802.11 legacy.”
The 802.11 family currently includes three separate protocols that focus on encoding (a, b, g); security
was originally included but is now part of other family standards (e.g., 802.11i). Other standards in the
family (c-f, h-j, n) are service enhancement and extensions, or corrections to previous specifications.
802.11b was the first widely accepted wireless networking standard, followed, paradoxically, by 802.11a
and 802.11g.
The frequencies used by 802.11 are in the microwave range and most are subject to minimal
governmental regulation. Licenses to use this portion of the radio spectrum are not required in most
countries.
802.11 (legacy)
The original version of the standard IEEE 802.11 released in 1997 and sometimes called “802.1y”
specifies two data rates of 1 and 2 megabits per second (Mb/s) to be transmitted via infrared (IR) signals
or in the Industrial Scientific Medical frequency band at 2.4 GHz.
IR has been dropped from later revisions of the standard because it could not succeed against the
well-established IrDA protocol and has had no actual implementations. Legacy 802.11 was rapidly
succeeded by 802.11b.
802.11b has a range of about 50 meters, with the low-gain omnidirectional antennas typically used in
802.11b devices. 802.11b has a maximum throughput of 11 Mb/s; however, a significant percentage
of this bandwidth is used for communications overhead. In practice, the maximum throughput is about
5.5 Mb/s. Metal, water, and thick walls absorb 802.11b signals and decrease the range drastically.
802.11b runs in the 2.4 GHz spectrum and uses Carrier Sense Multiple Access with Collision Avoidance
(CSMA/CA) as its media access method.
With high-gain external antennas, the protocol can also be used in fixed point-to-point arrangements,
typically at ranges up to 8 kilometers (although some report success at ranges up to 80 – 120 km where
line of sight can be established). This is usually done to replace costly leased lines, or in place of very
cumbersome microwave communications gear. Current cards can operate at 11 Mb/s but will scale
back to 5.5 Mb/s, then 2 Mb/s, and then 1 Mb/s, if signal strength becomes an issue.
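For CCTV planning, the useful question is how many camera streams a link can carry at each of these rates. This Python sketch assumes, as the text notes for 11 Mb/s, that roughly half of the raw rate is lost to overhead, and applies the same 50% figure at the lower rates (an assumption); the 1 Mb/s per-stream bit rate is hypothetical.

```python
# Rates an 802.11b card steps through as signal strength drops (Mb/s)
RATES_802_11B = [11.0, 5.5, 2.0, 1.0]

# Assumption: about half of the raw rate is usable (11 Mb/s -> ~5.5 Mb/s in practice)
EFFICIENCY = 0.5

def max_streams(raw_rate_mbps: float, stream_mbps: float,
                efficiency: float = EFFICIENCY) -> int:
    """How many constant-bit-rate camera streams fit into the usable throughput."""
    usable = raw_rate_mbps * efficiency
    return int(usable // stream_mbps)

for rate in RATES_802_11B:
    print(f"{rate:>4} Mb/s link -> {max_streams(rate, 1.0)} streams of 1 Mb/s")
```

The point of the exercise is that a wireless camera link that drops from 11 to 2 Mb/s because of a weak signal may no longer carry even a few streams.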
Extensions have been made to the 802.11b protocol (e.g., channel bonding and burst transmission techniques) in order to increase speed to 22 Mb/s, 33 Mb/s, and 44 Mb/s, but the extensions are proprietary
and have not been endorsed by the IEEE.
Many companies call enhanced versions “802.11b+”.
The first widespread commercial use of the 802.11b standard for networking was made by Apple
Computer under the trademark AirPort.
In 2001, 802.11a, a faster related protocol, started shipping, although the standard had been ratified
in 1999. The 802.11a standard uses the 5 GHz band and operates at a raw speed of 54 Mb/s, with more
realistic net achievable speeds in the mid-20 Mb/s range. The speed is reduced to 48, 36, 24, 18, 12, 9,
and then 6 Mb/s if required. 802.11a has 12 nonoverlapping channels, 8 dedicated to indoor and 4 to
point-to-point use.
Different countries have different ideas about regulatory support, although the 2003 World Radiocommunication Conference made it easier to use worldwide.
802.11a has not seen wide adoption because of the high adoption rate of 802.11b and because of concerns about range: at 5 GHz, 802.11a cannot reach as far as 802.11b, other things (such as same power
limitations) being equal. It is also absorbed more readily.
Most manufacturers of 802.11a equipment countered the lack of market success by releasing dual-band/
dual-mode or tri-mode cards that can automatically handle 802.11a and b or a, b, and g as available.
Access point equipment that can support all these standards simultaneously is also available.
In June 2003, a third standard for encoding was ratified: 802.11g.
This flavor works in the 2.4 GHz band (like 802.11b) but operates
at 54 Mb/s raw, or about 24.7 Mb/s net, throughput like 802.11a. It
is fully backwards compatible with b and uses the same frequencies.
Details of making b and g work together well occupied much of the
lingering technical process. However, the presence of an 802.11b
participant reduces an 802.11g network to 802.11b speeds.
The 802.11g standard swept the consumer world of early adopters
starting in January 2003, well before ratification. Corporate users held back, and Cisco and other
big equipment makers waited until ratification. By summer 2003, announcements were flourishing.
Most of the dual-band 802.11a/b products became dual-band/trimode, supporting a, b, and g in a single card or access point.
A new feature called Super G is now integrated in certain access
points. These can boost network speeds up to 108 Mb/s by using
channel bonding. This feature may interfere with other networks
and may not support all b and g client cards. In addition, packet
bursting techniques are also available in some chipsets and products
which will also considerably increase speeds. Again, they may not
be compatible with some equipment.
The first major manufacturer to use 802.11g was Apple, under the
trademark AirPort Extreme.
Photo courtesy of Linksys
Wireless network camera
802.11b and 802.11g divide the spectrum into 14 overlapping, staggered channels of 22 megahertz
(MHz) each. Channels 1, 6, 11, and 14 have minimal overlap, and those channels (or other sets with
similar gaps) can be used where multiple networks cause interference problems. Channels 10 and 11
are the only channels that work in all parts of the world, because Spain and France have not licensed
channels 1 to 9 for 802.11b.
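The channel arithmetic behind this is simple: channels 1 to 13 have center frequencies 5 MHz apart starting at 2412 MHz, and channel 14 sits at 2484 MHz. This sketch treats each channel as exactly 22 MHz wide (a simplification, since real transmit spectra have sloping edges), so two channels share spectrum when their centers are less than 22 MHz apart. In this model channels 1, 6, and 11 are 25 MHz apart and do not overlap, while 11 and 14 are exactly 22 MHz apart, so their edges just touch.

```python
CHANNEL_WIDTH_MHZ = 22

def center_mhz(channel: int) -> int:
    """Center frequency (MHz) of a 2.4 GHz 802.11b/g channel."""
    if not 1 <= channel <= 14:
        raise ValueError("2.4 GHz channels are numbered 1 to 14")
    # Channels 1-13 are spaced 5 MHz apart from 2412 MHz; channel 14 is a special case
    return 2484 if channel == 14 else 2407 + 5 * channel

def overlap(a: int, b: int) -> bool:
    """True if the 22 MHz-wide channels a and b share any spectrum."""
    return abs(center_mhz(a) - center_mhz(b)) < CHANNEL_WIDTH_MHZ

for a, b in [(1, 6), (6, 11), (11, 14), (1, 3)]:
    print(f"channels {a} and {b}: overlap = {overlap(a, b)}")
```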
In January 2004, the IEEE announced that it would develop a new, faster wireless networking standard,
802.11n. Its real speed would be 100 Mb/s (even 250 Mb/s at the physical level), which is up to 4–5
times faster than 802.11g and perhaps 50 times faster than 802.11b. As projected, 802.11n will also offer
a better operating distance than current networks. The standardization process is expected to be
completed by the end of 2005, after the publishing date of this book, so stay tuned.
Certification and security
Because the IEEE only sets specifications but does not test equipment for compliance with them, a trade
group called the Wi-Fi Alliance runs a certification program that members pay to participate in. Virtually
all companies selling 802.11 equipment are members. The Wi-Fi trademark, owned by the group and
usable only on compliant equipment, is intended to guarantee interoperability. The Wi-Fi label means
compliant with any of 802.11a, b, or g. It also includes the security standard Wi-Fi Protected Access
or WPA. Eventually Wi-Fi will also mean equipment that implements the 802.11i security standard
(also known as WPA2). Wi-Fi certified products are also supposed to indicate the frequency band in
which they operate, 2.4 or 5 GHz.
With the proliferation of cable modems and DSL, there has arisen an ever-increasing market of people
who wish to establish small networks in their homes to share their high-speed Internet connection.
Wired Equivalent Privacy (WEP) was an encryption algorithm designed to provide wireless security
for users implementing 802.11 wireless networks. WEP was developed by a group of volunteer IEEE
members. The intention was to offer security through an 802.11 wireless network while the wireless
data was transmitted from one end point to another over radio waves. WEP was used to protect wireless communication from eavesdropping (confidentiality), prevent unauthorized access to a wireless
network (access control), and prevent tampering with transmitted messages (data integrity). Wireless
office networks are often unsecured or secured with WEP, which is easily broken. These networks frequently allow “people on the street” to connect to the Internet. Volunteer groups have also made efforts
to establish wireless community networks to provide free wireless connectivity to the public.
The Wi-Fi Protected Access (WPA) is a standards-based interoperable security specification. The
specification is designed so that only software or firmware upgrades are necessary for the existing or
legacy hardware to meet the requirements. Its purpose is to increase the level of security for existing and
future wireless LANs. WPA is an interim security solution that targets all known WEP vulnerabilities. It
will be forward compatible with the new 802.11i standard, which will be the ultimate wireless security
solution. All products are supposed to comply with the 802.11i standard once released.
What about Bluetooth?
If asked to construct a Wireless Local Area Network (WLAN), most IT managers would think of
802.11b wireless Ethernet technology. Few would consider using another short-range radio technology,
Bluetooth, on its own or in combination with 802.11b-based equipment.
The reason for its neglect is that Bluetooth has been marketed as a technology for linking devices such as
phones, headsets, PCs, digital cameras, and other peripherals, rather than as a technology for LANs.
However, Bluetooth could become a serious WLAN option, partly because a lot more Bluetooth devices have been released lately. But IT managers may think twice before supporting this technology
because 802.11b and Bluetooth use the same 2.4 GHz spectrum to transmit data, so interference is a
real possibility.
Bluetooth is also closing the gap in signal range. Some companies are testing new ceramic antennas
that will boost the range of Bluetooth to around 50 meters, up from the 10 meters currently specified
and on a par with the maximum range offered by 802.11b components.
The wireless network standards summary
The IEEE 802 network standards
Putting a network system together
Although CCTV professionals cannot replace IT managers in various companies, it is important for
them to be able to use the basic network ideas and principles in order to set up a digital camera or a
DVR on the network.
As an example (shown on the next page) I have drawn a small CCTV hybrid system, with two operators, 3 DVRs, a number of analog cameras connected to the DVRs, and two network cameras in the
The system is hybrid: the analog cameras are connected to the DVRs, while the network IP cameras
are connected to the network switch. So we have a digital recording system, but are still using analog
cameras, as is typically done in most new CCTV systems today. In addition, a network printer is
attached, as well as a network attached storage (NAS) device, on the same network switch.
This configuration allows any recorded footage to be stored on the NAS indefinitely, while the DVRs
usually record in loop mode, that is, using the first-in, first-out (FIFO) concept.
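Loop recording can be modeled as a fixed-size first-in, first-out buffer: once the disk is full, each new segment overwrites the oldest one. The sketch below uses Python's collections.deque with a maximum length; the class name, segment names, and the list standing in for the NAS are hypothetical simplifications of real video files and storage.

```python
from collections import deque

class LoopRecorder:
    """Toy model of a DVR recording in loop (FIFO) mode."""

    def __init__(self, capacity_segments: int):
        # A deque with maxlen silently discards the oldest item when it is full
        self.segments = deque(maxlen=capacity_segments)

    def record(self, segment: str) -> None:
        self.segments.append(segment)

    def archive_to_nas(self, nas: list) -> None:
        """Copy the footage currently on the DVR to the archive before it is overwritten."""
        nas.extend(self.segments)

recorder = LoopRecorder(capacity_segments=3)
for i in range(1, 6):
    recorder.record(f"seg{i}")
print(list(recorder.segments))  # ['seg3', 'seg4', 'seg5']: seg1 and seg2 were overwritten
```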
We also have depicted a PTZ camera in the system, which requires pan/tilt/zoom control from the
Main Operator’s console. In the drawing, we have illustrated that the main operator uses a standard
CCTV keyboard, but software PTZ control from the computer is also possible, which, in our case, can
be done by Operator B.
In order to have pan/tilt/zoom control function implemented, we need to explain some things associated
with PTZ data format and the transmission over networks.
It should be known that all computers (and computer-based DVRs) can use only the RS-232 data
format (input and output) on their serial ports (typically two). By design, the RS-232 format is limited
to a maximum cable length of around 15 m (approximately 50 ft). Network communications on most
DVRs and IP cameras can also transmit such data while receiving images. So if the customer wishes
to use a typical CCTV PTZ keyboard, then such a keyboard has to produce PTZ data in RS-232 format,
which is then fed into the serial port (also known as the communication port) of the Main Operator PC
station. In order for the operator to be able to control the remote PTZ camera connected to the DVR,
this data has to be transferred to the corresponding DVR with that camera. Being RS-232, this data is
produced at the serial port output of that DVR. If there is more than one PTZ camera, or the one shown
in the drawing is further away than 15 m, then a data converter needs to be used at this DVR to convert
the RS-232 data to RS-422 or RS-485, whichever is used by the particular PTZ camera. This format
(RS-422/RS-485) is designed to reach longer distances (usually up to a kilometer, or over 3000 feet)
and can address up to 32 PTZ cameras.
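PTZ control protocols differ between manufacturers, so no single data format applies. As one illustration, the sketch below builds a command frame in the widely used Pelco D protocol: 7 bytes made of a 0xFF sync byte, the camera address, two command bytes, pan and tilt speeds, and a checksum equal to the sum of bytes 2 to 6 modulo 256. Whether a given PTZ site driver understands Pelco D, and at what baud rate, must be checked against its own documentation; the function and constant names are illustrative.

```python
# Pelco D command-2 bits (one vendor's protocol; other manufacturers differ)
PAN_RIGHT = 0x02
PAN_LEFT = 0x04
TILT_UP = 0x08
TILT_DOWN = 0x10
ZOOM_TELE = 0x20
ZOOM_WIDE = 0x40

def pelco_d_frame(address: int, command2: int, pan_speed: int = 0,
                  tilt_speed: int = 0, command1: int = 0x00) -> bytes:
    """Build a 7-byte Pelco D message; all-zero command bytes mean 'stop'."""
    body = [address, command1, command2, pan_speed, tilt_speed]
    if not all(0 <= b <= 0xFF for b in body):
        raise ValueError("every field must fit in one byte")
    checksum = sum(body) % 256  # sum of bytes 2-6, modulo 256
    return bytes([0xFF, *body, checksum])

# Camera 1: pan right at medium speed, then stop
print(pelco_d_frame(1, PAN_RIGHT, pan_speed=0x20).hex(" "))  # ff 01 00 02 20 00 23
print(pelco_d_frame(1, 0x00).hex(" "))                       # ff 01 00 00 00 00 01
```

With the third-party pyserial package, such a frame could be written to a serial port with serial.Serial(port, baudrate).write(frame), passing through an RS-232 to RS-485 converter where needed; the port name and baud rate depend on the installation.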
When the system is installed and the cables, DVRs, and cameras are set up correctly, the time comes
when all these devices need to be put together to work in a seamless networked system.
Since all of these devices are connected in a LAN configuration, the first thing to do is assign each
device an IP address. The drawing shows that I have allocated each device its own address from the
private range of IP addresses. Any addresses can be used here, as long as they fall within the allowed
private address range.
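The private address ranges are defined in RFC 1918: 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. Before the devices are configured, an address plan can be checked against these ranges. The device names and addresses in this Python sketch are hypothetical, not the ones in the drawing.

```python
import ipaddress

# The three private IPv4 ranges reserved by RFC 1918
PRIVATE_RANGES = [ipaddress.ip_network(n)
                  for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(address: str) -> bool:
    """True if the IPv4 address lies inside one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in PRIVATE_RANGES)

def outside_private_range(plan: dict) -> list:
    """Return the names of devices whose planned address is not private."""
    return [name for name, addr in plan.items() if not is_rfc1918(addr)]

# Hypothetical address plan for a small hybrid system
plan = {"DVR 1": "192.168.1.21", "DVR 2": "192.168.1.22",
        "IP camera 1": "192.168.1.31", "NAS": "192.168.1.40"}
print(outside_private_range(plan))  # []
```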
If there are other devices on the network, which may not necessarily be part of the security system,
but rather part of the company computer network, care should be taken to use addresses that are not
conflicting with the security equipment IP addresses. This is the point when you would require the IT
manager of the company to give you addresses from the reserved pool they may have.
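When the IT manager hands over a reserved pool, free addresses can be picked so that they never collide with addresses already in use. In this sketch the pool (written in CIDR notation) and the set of occupied addresses are hypothetical inputs; in practice they would come from the IT department's records.

```python
import ipaddress

def allocate(pool_cidr: str, in_use: set, count: int) -> list:
    """Pick `count` free host addresses from a reserved pool, skipping those in use."""
    if count <= 0:
        return []
    free = []
    for host in ipaddress.ip_network(pool_cidr).hosts():
        if str(host) not in in_use:
            free.append(str(host))
            if len(free) == count:
                return free
    raise RuntimeError(f"only {len(free)} free addresses left in {pool_cidr}")

# Hypothetical pool 192.168.1.64/27 (hosts .65 to .94), with .65 already taken
print(allocate("192.168.1.64/27", {"192.168.1.65"}, 3))
# ['192.168.1.66', '192.168.1.67', '192.168.1.68']
```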
If the system is designed to utilize the customer’s existing network, it should be clear that the amount
of data that goes through the security system may affect the speed of the network for normal usage.
There is no easy answer as to how much this will affect the network, but it can be measured with various tools after the installation is completed. The amount of data flowing through the network clearly
depends on the CCTV system design and the intended usage. If the operators are continually watching
multiple split-screen images of all cameras on their computer screens, and in addition the IP cameras
are continually streaming digital video in order to be stored on the NAS, then the CCTV network will
be pretty busy.
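Before measuring, a rough estimate of the load can be made by adding up the bit rates of all continuous streams and comparing the total with the usable part of the link. In this sketch, the per-stream bit rates and the 60% usable-capacity figure are assumptions chosen only for illustration; real values depend on the cameras, compression settings, and network equipment.

```python
def network_load(stream_rates_mbps: list, link_mbps: float,
                 usable_fraction: float = 0.6) -> float:
    """Share of a link's usable capacity consumed by continuous video streams.

    usable_fraction is an assumed allowance for protocol overhead and other traffic.
    """
    usable = link_mbps * usable_fraction
    return sum(stream_rates_mbps) / usable

# Hypothetical example: 8 live split-screen streams at 1 Mb/s each,
# plus 2 IP cameras streaming 2 Mb/s each to the NAS
streams = [1.0] * 8 + [2.0] * 2
for link in (10, 100, 1000):
    print(f"{link:>5} Mb/s link: {network_load(streams, link):.0%} of usable capacity")
```

Under these assumptions the same 12 Mb/s of video would need 200% of a 10 Mb/s network, but only 20% of a 100 Mb/s one, which is why the network speed matters so much in the discussion that follows.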
It is fair to say, however, that no digital CCTV system installed on an existing network will block the
normal usage that existed before the system was put in place, but it will slow it down, especially if a
10 Mb/s network is used. It becomes less of an issue if the network is of a faster type (like 100 Mb/s
or gigabit). In such a case, the slowdown may be unnoticeable to a user who only reads and responds
to e-mails, but for people who are transferring huge files from one computer to another, or from the
Internet, for example, it may be noticeable.
For this reason, consideration should be given to using analog video monitors as display devices,
instead of loading the network with unnecessary "live" image viewing. It should, of course, be checked
whether the DVRs have such an output, although most of them do. If this is the case, then the operator
can use the composite monitors for day-to-day surveillance activity and use the network only for
playback or backing up footage. It is also possible to have only a small number of cameras selected for viewing
via the network (to save data bandwidth) and to program the DVR to bring up an image only when
motion is detected. In the meantime the DVRs would usually be recording continually regardless of
what display device the operator is using, so that no event is lost. Finally, many DVRs have network
data bandwidth control which is used to minimize the continual streaming to the main operator’s PC.
If none of the above approaches is possible (for whatever reason), then consideration should be given
to designing and installing a complete, separate, and independent network dedicated just for the digital
CCTV system. This might sound a bit more expensive, but it is without any doubt a better solution.
The security CCTV system becomes safer and independent, but also faster as there are no other users
on the network. This also gives an advantage in planning clean and fast data switching, and it is much
easier to use any IP addresses one wishes to use.
Some typical network symbols as used in the IT industry
The IP check commands
Some software commands found on many platforms, Windows and Linux alike, should be known to
CCTV users and could help determine whether a network device is present on the network and whether
it is visible by other computers. The most basic of these is the "ping" command.
Under Windows => Start => Run type “Command” or “cmd.”
This will open a DOS window where the ping command can be typed with the IP address of the device
you want to ping.
In the example shown, we typed
which is the internal address of our network router.
If the device is connected and has the IP address you have queried, it will respond with something
similar to what is shown in the window below. The time taken to respond is shown in milliseconds (ms).
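The same check can be scripted, for example to test a list of cameras and DVRs in one go. This Python sketch simply calls the operating system's own ping command, assuming it is on the PATH; note that the packet-count switch differs (-n on Windows, -c on Linux and macOS). The exit code is only a rough indicator: on some Windows versions ping can return success even when the reply is "Destination host unreachable."

```python
import platform
import subprocess
from typing import Optional

def ping_command(host: str, system: Optional[str] = None) -> list:
    """Build a single-packet ping command line for the given (or current) OS."""
    system = system or platform.system()
    # Windows uses -n for the packet count; Linux and macOS use -c
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]

def is_reachable(host: str, timeout_s: float = 5.0) -> bool:
    """Run ping once and report whether it exited with code 0."""
    try:
        result = subprocess.run(ping_command(host), stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A loop such as `for ip in device_list: print(ip, is_reachable(ip))`, over a hypothetical list of device addresses, gives a quick roll call of the whole system.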
Another command that is very useful and
that can give you the IP address of the
computer you are logged on, as well as its
physical MAC address, DNS, and DHCP,
is the command:
ipconfig /all
In the example shown below, the ipconfig command tells us that the computer from which we have
queried the IP address has a MAC address 00-4--F4-72-C5-F6, and it also lists the computer's IP
address, its gateway, and the addresses of the DNS servers it is connected to.
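Part of this information can also be read programmatically. The sketch below uses only Python's standard library. Note that uuid.getnode() returns a random number if it cannot find a hardware address, and that looking up the computer's own host name may return a loopback address (such as 127.0.1.1 on some Linux systems) rather than the LAN address. It does not report the gateway or DNS servers, for which ipconfig /all remains the simplest tool.

```python
import socket
import uuid

def format_mac(node: int) -> str:
    """Format a 48-bit MAC address as dash-separated hex pairs, as ipconfig shows it."""
    return "-".join(f"{(node >> shift) & 0xFF:02X}" for shift in range(40, -1, -8))

def local_summary() -> dict:
    """Host name, IP address, and MAC address of the computer running the script."""
    hostname = socket.gethostname()
    return {
        "hostname": hostname,
        # May raise socket.gaierror if the host name does not resolve
        "ip": socket.gethostbyname(hostname),
        # uuid.getnode() falls back to a random number if no MAC can be found
        "mac": format_mac(uuid.getnode()),
    }
```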
And finally, if there is a problem with the
connection, the following command may
indicate where the problem is:
tracert <destination address>
This command will show you the route
where your ping goes and where it
12. Auxiliary equipment in CCTV
Many items in CCTV can be classified as auxiliaries. Some of them are simple to understand and use;
others are very sophisticated and complex. We will start with the very popular moving mechanism,
usually called pan and tilt head.
Pan and tilt heads
When quoting or designing a CCTV system, the first question to ask is how many and what type of
cameras: fixed or pan and tilt?
Fixed cameras, as the name suggests, are cameras installed on fixed brackets, using fixed focal length
lenses and looking in one direction without change.
The alternative to fixed are moving (or
pan and tilt) cameras. They are placed on
some kind of moving platform, usually
employing a zoom lens, so the whole set
can pan and tilt in virtually all directions
and can be zoomed and focused at
various distances.
In CCTV terminology, this type of camera
is usually referred to as a PTZ camera.
Perhaps a more appropriate term would
be “PTZF camera,” referring to Pan/Tilt/
Zoom/Focus, or even more precisely
in the last few years, a “PTZFI” for an
additional iris control. “PTZ camera” is,
however, more popularly accepted, and
we will use the same abbreviation for
a camera that, apart from pan, tilt, and
zoom functions, might have focus, or even iris remote control.
A typical P/T head, as shown in the picture, has a side platform for the load (a camera with a zoom
lens in a housing). There are pan and tilt heads that have an overhead platform instead. The difference
between the two is the load rating that each can have, which depends on the load’s center of gravity. This
center is lower for side platform pan and tilt heads, which means that of the two types of heads, with
the same size motors and torque, the side platform has a better load rating. This should not be taken as
a conclusion that the overhead platform P/T heads are of inferior quality, but it is only an observation
of the load rating, which in the last few years is not as critical because camera and lens sizes, together
with housings, are getting smaller.
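The effect of the center of gravity can be put into numbers with a simplified static model: the worst-case torque the tilt motor must hold is the load's weight multiplied by the distance of its center of gravity from the tilt axis (torque = m × g × d). A lower center of gravity, closer to the tilt axis, therefore needs less torque from the same motors. Friction, acceleration, and wind load are ignored, and the mass and distances below are hypothetical.

```python
G = 9.81  # gravitational acceleration, m/s^2

def tilt_torque_nm(load_kg: float, cog_offset_m: float) -> float:
    """Worst-case gravitational torque (N·m) about the tilt axis, static model."""
    return load_kg * G * cog_offset_m

# The same 8 kg camera/lens/housing, with its center of gravity
# 5 cm versus 15 cm away from the tilt axis
for offset in (0.05, 0.15):
    print(f"offset {offset:.2f} m -> {tilt_torque_nm(8, offset):.1f} N·m")
```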
On the basis of the application, there are two major subgroups of P/T heads:
• Outdoor
• Indoor
Outdoor P/T heads fall into one of the three categories:
• Heavy duty (for loads of over 35 kg)
• Medium duty (for loads between 10 and 35 kg)
• Light duty (for loads of up to 10 kg)
With the recent camera size and weight reductions, together with the miniaturization of zoom lenses
and housings, it is very unlikely that you will need a heavy-duty P/T head these days. A medium-duty
load rating will suffice in the majority of cases.
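The load categories above translate directly into a selection rule, provided the whole payload is added up (camera, zoom lens, housing, and any wash/wipe assembly or infrared light). The component masses in this sketch are hypothetical; real figures come from the manufacturers' data sheets.

```python
def duty_class(total_load_kg: float) -> str:
    """Map a total payload mass onto the outdoor P/T head categories."""
    if total_load_kg <= 10:
        return "light duty"   # loads of up to 10 kg
    if total_load_kg <= 35:
        return "medium duty"  # loads between 10 and 35 kg
    return "heavy duty"       # loads of over 35 kg

# Hypothetical outdoor payload (kg)
payload = {"camera": 0.5, "zoom lens": 1.2, "housing": 4.0,
           "wash/wipe": 2.5, "IR light": 3.0}
print(duty_class(sum(payload.values())))  # medium duty
```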
The outdoor P/T heads
are weatherproof,
heavier, and more
robust. The reason
for this is that they need to carry heavier housings and quite
often additional devices such as wash/wipe assemblies and/or
infrared lights.
Indoor P/T heads, as the name suggests, should only be used on
premises protected from external elements, especially rain, wind
and snow. Indoor P/T heads are usually smaller and lighter and
in most cases they fall into the light-duty load category; they can
handle loads of no more than a few kilograms. Because of this,
indoor pan and tilt heads are often made of plastic molding and
have a more aesthetic appearance than the outdoor ones.
In most cases, a typical P/T head is driven by 24 V AC synchronous motors. Mains voltage P/T heads
are also available (220/240 V AC or 110 V AC), but 24 V AC is more popular because of the safety
factor (voltages below 50 V AC are classed as extra-low voltage and present a much lower risk of a
fatal electric shock) and because it is more universal, regardless of whether you are working on a
European, American, or Australian CCTV system. Most manufacturers have a 24 V AC version of all
P/T site drivers.
Pan and tilt domes
Other divisions of pan and tilt heads, on the basis of physical appearance, are also possible. In the last
few years, pan and tilt domes have become more popular. They work in the same way as the heads,
only inside the domes they usually have both the moving mechanism (P/T head) and the control
electronics. They are usually enclosed in a transparent or semitransparent dome, so they make an
acceptable appearance in aesthetically demanding interiors or exteriors. Again, thanks to camera and
lens size reductions, P/T domes are getting smaller in diameter. A few years ago, pan and tilt domes up
to 1 m in diameter were not rare, while today most of them are between 300 and 400 mm.
One of the biggest problems with P/T domes is the optical precision of the dome. It is very hard to
eliminate distortion completely, especially with heat-blown domes. Much better precision is achieved with injection
molded domes, which are more expensive. Also, thicker domes cause more distortion, especially when
the lens is zooming. So the best optical quality is achieved with thin and injection molded domes.
A lot of manufacturers, instead of going to the trouble of producing optically nondistorting domes,
concentrate the optical precision on a thin vertical strip through which the camera can see and freely
tilt up and down. The panning movement revolves the camera and the dome. Although this is a clever
solution, it can be mechanically troublesome and a limiting factor for faster movements.
Let us also mention that domes can be transparent or neutral color tinted. Transparent domes usually
have an inner mask, with an optical slot in front of the lens, while the rest of the mask is a nontransparent
black plastic. By keeping the interior dark (black zoom lenses and camera bodies), they offer a very
discreet and concealed surveillance. Very often it is impossible to judge where the camera is pointing,
which is one of the very important features of dome cameras.
Tinted domes usually
have no mask, and
so the whole dome is
transparent but tinted.
It is important in such
cases to know the F-stop attenuation of the
tinting to compensate
for the light. A
typical attenuation is
around one F-stop,
which means 50%
light attenuation.
With today’s CCD
cameras this does not
present any threat to
the picture quality.
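Since each F-stop halves the light, the fraction of light passing through a dome tinted by n F-stops is 2 to the power of minus n. A minimal sketch of that relationship:

```python
def transmission(f_stops: float) -> float:
    """Fraction of light passing through a tint that attenuates by `f_stops`."""
    return 2.0 ** -f_stops

for stops in (0.5, 1, 2):
    print(f"{stops} F-stop(s): {transmission(stops):.0%} of the light gets through")
```

One F-stop gives the 50% figure quoted above, while a two-stop tint would leave only a quarter of the light for the camera.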
The camera and zoom
lens colors are more
critical with this type of dome, so if they appear
too obtrusive from the outside a careful black matt
spray can minimize this. Utmost care should be
taken, however, to protect the lens, CCD chip, and
connectors from the paint.
Preset positioning P/T heads
Finally, another subgroup of P/T heads looks the same as all the others but is fitted with preset
potentiometers. They are usually referred to as P/T heads with PP pots.
The potentiometers are built into the head itself, mechanically coupled with each of the motors. Their
value is typically 1 kΩ or 5 kΩ and they are connected to the site driver electronics (discussed in the
next chapter). A low voltage (typically 5 V DC) is applied across the pots and the site driver electronics
reads the voltage drop over its center tap, depending on the pan or tilt position, thus allowing the
site driver to remember the particular position, which is later recalled by either manual command or
automatic alarm response.
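The position reading is a simple voltage divider: the center-tap (wiper) voltage is proportional to how far the potentiometer has turned. In this sketch the 5 V reference comes from the text, while the 350° pan travel mapped onto the pot and the stored preset values are hypothetical.

```python
V_REF = 5.0             # volts across the preset pot (typical value from the text)
PAN_TRAVEL_DEG = 350.0  # hypothetical pan range mapped onto the full pot travel

def pot_to_angle(wiper_volts: float, travel_deg: float = PAN_TRAVEL_DEG,
                 v_ref: float = V_REF) -> float:
    """Convert the wiper (center tap) voltage into a pan angle in degrees."""
    if not 0.0 <= wiper_volts <= v_ref:
        raise ValueError("wiper voltage must lie between 0 V and the reference voltage")
    return wiper_volts / v_ref * travel_deg

# Storing a preset means remembering the voltages read at that moment
presets = {1: {"pan_v": 2.5, "tilt_v": 1.2}}  # hypothetical preset 1 (door view)
print(pot_to_angle(presets[1]["pan_v"]))  # 175.0
```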
Basically, when a site driver gets an instruction to go to a preset position, it forces the pan and tilt motors
to move (the same applies to zoom and focus) until the preset potentiometers reach the preset value.
So, if a certain door is protected, for example, by using a simple reed switch we can force a camera to
automatically turn in that direction, zooming and focusing on the previously stored view of the door.
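The recall itself is a closed loop: the site driver keeps the motor running toward the stored value and stops once the reading is close enough. This simulation works in degrees for readability (a real site driver compares pot voltages), and the step size and tolerance are hypothetical. With fixed-speed motors the tolerance must be at least half a step; otherwise the head can overshoot and "hunt" back and forth around the preset.

```python
def go_to_preset(current: float, target: float, step: float = 0.5,
                 tolerance: float = 0.25, max_steps: int = 10_000) -> float:
    """Drive a simulated axis toward a stored position until within tolerance."""
    for _ in range(max_steps):
        error = target - current
        if abs(error) <= tolerance:
            return current
        # The synchronous motor runs at a fixed speed, one way or the other
        current += step if error > 0 else -step
    raise RuntimeError("preset not reached")

print(go_to_preset(current=0.0, target=90.0))  # 90.0
```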
The number of preset positions a PTZ site can store depends on the design itself, but the most common
numbers are 8, 10, 16, or 32.
A very important question is, “How precisely can the preset
positions be repeatedly recalled?” This is defined by the
mechanics, electronics, and the software and hardware design.
The precision of preset positioning is especially important with
the very fast pan and tilt units. An error of only a couple of degrees
may not be noticed when a zoom lens is fully zoomed out, but it
will make a big difference when it is fully zoomed in.
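Why zoom matters can be shown with two small calculations: the positioning error as a share of the horizontal angle of view, and how far off target the center of the picture lands at a given distance. The angles of view and the distance used here are hypothetical examples.

```python
import math

def error_share_of_view(error_deg: float, hfov_deg: float) -> float:
    """Positioning error as a fraction of the horizontal angle of view."""
    return error_deg / hfov_deg

def offset_at_distance(error_deg: float, distance_m: float) -> float:
    """Sideways miss (in meters) of the picture center at the given distance."""
    return distance_m * math.tan(math.radians(error_deg))

# A 2° preset error with a hypothetical zoom lens at its wide and tele ends
for label, hfov in (("zoomed out", 45.0), ("zoomed in", 3.0)):
    print(f"{label}: error is {error_share_of_view(2.0, hfov):.0%} of the view")
print(f"at 50 m the picture center misses by {offset_at_distance(2.0, 50.0):.2f} m")
```

The same 2° error is a barely visible shift at the wide end but moves the target almost out of the picture at the tele end.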
When ordering a pan and tilt head, you must specify that you want
preset positioning. A pan and tilt head with preset potentiometers
looks the same as an ordinary non-preset P/T head, because the
pots are fitted inside the unit.
PTZ site drivers
A very easy way of controlling a 24 V AC pan and tilt head is simply to apply 24 V AC to the motor input
for the required direction. This means pan and tilt control can be achieved by applying voltage to a
hard-wired connection (one for each of the four movement directions) relative to a common wire. So, a
total of five wires gives us full control over a typical pan and tilt head. For zoom and focus control,
another three wires are required: one for zoom (positive or negative voltage, depending on direction),
one for focus (likewise), and one common. This gives a total of eight wires for a so-called hard-wired
PTZ controller. Such controllers are the cheapest way of controlling single PTZ camera assemblies, but they
are impractical for control over distances longer than a couple of hundred meters.
In the majority of CCTV systems, however, we use digital control, which requires only a twisted pair
cable through which a matrix switcher can talk to a number of PTZ devices at the same time. These
devices are often called PTZ site drivers, PTZ decoders, or PTZ receiver drivers. They are electronic
boxes (discussed with video matrix switchers) that receive and decode the instructions from the control
keyboard for the camera’s movements: pan, tilt, zoom, and focus (and sometimes iris as well).
As mentioned earlier, there is unfortunately no common standard for control encoding schemes and
protocols among manufacturers, which means the PTZ site driver of one manufacturer cannot be used with
a matrix switcher of another.
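Since every protocol is proprietary, the following is only a hypothetical illustration of the shared-line principle: the matrix switcher sends addressed frames down one twisted pair, and each site driver acts only on frames carrying its own address. The 3-byte frame layout, the command bits, and the checksum are invented for this sketch and do not correspond to any manufacturer’s protocol.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical command bits, one per movement direction (not a real protocol).
PAN_LEFT, PAN_RIGHT, TILT_UP, TILT_DOWN = 0x01, 0x02, 0x04, 0x08
ZOOM_IN, ZOOM_OUT, FOCUS_NEAR, FOCUS_FAR = 0x10, 0x20, 0x40, 0x80

COMMAND_NAMES = {
    PAN_LEFT: "pan left", PAN_RIGHT: "pan right",
    TILT_UP: "tilt up", TILT_DOWN: "tilt down",
    ZOOM_IN: "zoom in", ZOOM_OUT: "zoom out",
    FOCUS_NEAR: "focus near", FOCUS_FAR: "focus far",
}


def encode(address: int, command: int) -> bytes:
    """Build a 3-byte frame: address, command bits, simple additive checksum."""
    return bytes([address, command, (address + command) % 256])


@dataclass
class SiteDriver:
    address: int

    def decode(self, frame: bytes) -> Optional[List[str]]:
        """Return the requested movements, or None if the frame is not for this driver."""
        if len(frame) != 3:
            return None
        address, command, checksum = frame
        if address != self.address:
            return None  # every driver on the shared pair hears every frame
        if (address + command) % 256 != checksum:
            return None  # corrupted on the line: better to ignore than to move
        return [name for bit, name in COMMAND_NAMES.items() if command & bit]


# The matrix switcher broadcasts one frame; only driver 7 acts on it.
frame = encode(7, PAN_LEFT | TILT_UP)
for driver in (SiteDriver(3), SiteDriver(7)):
    print(driver.address, driver.decode(frame))
```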
Depending on the site driver design, other functions might also be controllable, such as wash and wipe
and switching auxiliary devices on and off. PTZ drivers can also deliver power for the camera, either
12 V DC or 24 V AC.
The movement speed of P/T heads driven by 24 V AC synchronous motors (which is most often the case)
depends on the mains frequency, the load on the head, and the gearing mechanism. Typical speeds are
9°/s for panning and 6°/s for tilting. This is closely related to the torque required to move a certain
load, which in most cases exceeds 5 kg (that is: camera + zoom lens + housing). Some designs can reach a
faster pan speed of around 15°/s because of camera/lens weight reductions and different gear ratios.
Most AC-driven P/T heads, which use synchronous motors, have fixed speeds because these depend on the
mains frequency.
There are some advanced AC pan and tilt site drivers that produce an artificial frequency, lower or higher
than that of the mains, for better control of the heads. A slower speed is used for finer control (when
the zoom lens is fully zoomed in), and a faster speed is used for quicker response to emergency
situations. The control keyboard determines which speed to apply on the basis of how long the joystick
is held in any direction.
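The two ideas above, speed tied to supply frequency and a keyboard that speeds up the longer the joystick is held, can be sketched as follows. The 50 Hz rating, the 25 to 100 Hz drive range, and the 2 s linear ramp are hypothetical values, and the model considers frequency only.

```python
# Hypothetical rating: 9 deg/s pan and 6 deg/s tilt on 50 Hz mains.
RATED_FREQ_HZ = 50.0
RATED_PAN_DEG_S = 9.0
RATED_TILT_DEG_S = 6.0


def speed_at(supply_hz: float, rated_deg_s: float) -> float:
    """A synchronous motor's speed is proportional to its supply frequency."""
    return rated_deg_s * supply_hz / RATED_FREQ_HZ


def drive_frequency(hold_time_s: float, f_min: float = 25.0,
                    f_max: float = 100.0, ramp_s: float = 2.0) -> float:
    """Hypothetical keyboard rule: start at f_min and ramp linearly to f_max
    the longer the joystick is held deflected."""
    fraction = min(max(hold_time_s / ramp_s, 0.0), 1.0)
    return f_min + (f_max - f_min) * fraction


print("same head on 60 Hz mains:", round(speed_at(60.0, RATED_PAN_DEG_S), 2), "deg/s")
for held in (0.0, 1.0, 2.0):
    f = drive_frequency(held)
    pan = speed_at(f, RATED_PAN_DEG_S)
    print(f"held {held:.1f} s -> {f:.1f} Hz -> pan {pan:.1f} deg/s, 180 deg in {180 / pan:.1f} s")
```

On this assumption a head rated 9°/s on 50 Hz mains would run about 20% faster on 60 Hz mains.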
Even faster speeds can be achieved with DC-driven
stepper motors and specially designed PTZ site
drivers. Over the last few years, P/T heads (more
so, P/T domes) have become much faster, exceeding
Producing such fast P/T assemblies brings a few problems to attention: the camera moves so quickly that
accurate manual control is impossible, or at least impractical, and the mechanical construction and
durability become even more critical because of the increased forces of inertia. When such a speed is
magnified by the zoom lens magnification factor, an already fast movement appears even faster.
Therefore, a novel approach to PTZ control is required. Such solutions can be seen in some advanced
designs, achieving speeds of over 300°/s with highly accurate movement. This is achieved by combining
high electronic and mechanical precision. Details such as reducing the pan and tilt speed when the lens
is zoomed in and increasing it when the lens is zoomed out, or having the very fast preset positioning
speed of 300°/s drop to a manageable 45°/s when manual control is required, make the difference
between a fast system and a fast, user-friendly one.
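The zoom-dependent speed described above can be expressed as keeping the scene’s sweep across the monitor roughly constant. The 300°/s and 45°/s figures come from the text; the limit of half a picture width per second and the example angles of view are hypothetical.

```python
PRESET_SPEED_DEG_S = 300.0   # from the text
MANUAL_MAX_DEG_S = 45.0      # from the text
PICTURE_WIDTHS_PER_S = 0.5   # hypothetical comfort limit for an operator


def picture_widths_per_second(speed_deg_s: float, fov_deg: float) -> float:
    """How fast the scene sweeps across the monitor, in picture widths per second."""
    return speed_deg_s / fov_deg


def manual_speed(fov_deg: float) -> float:
    """Zoom-proportional manual speed, capped at the manual maximum."""
    return min(MANUAL_MAX_DEG_S, PICTURE_WIDTHS_PER_S * fov_deg)


for fov in (60.0, 10.0, 3.0):  # wide, medium, fully zoomed in (hypothetical)
    print(f"FOV {fov:4.1f} deg: manual {manual_speed(fov):5.1f} deg/s; a 300 deg/s preset move "
          f"sweeps {picture_widths_per_second(PRESET_SPEED_DEG_S, fov):5.1f} widths/s")
```

With a 3° angle of view, a 300°/s preset move sweeps about 100 picture widths every second, which is why manual control has to be slowed so drastically when zoomed in.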
A preset operation is only possible with a PTZ site driver
equipped with PP electronics. Clearly, both the P/T head
and the zoom lens should have preset potentiometers
built in.
The number of wires required between the site driver and the PTZ head is as follows: 5 wires for the
basic pan and tilt functions described earlier (pan left, pan right, tilt up, tilt down, and common), 4
wires for pan and tilt preset positioning (positive pot supply, negative pot supply, pan feedback, and
tilt feedback), 3 wires for the zoom and focus functions (sometimes 4, depending on the zoom lens), and
4 wires for zoom and focus preset positioning. This makes a total of 16 wires. The thickness of the
preset wires and the zoom and focus wires is not critical, because the currents for these functions are
very low, but the pan and tilt wires need to be considered carefully, since their gauge depends on the
pan and tilt motors’ requirements.
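The wire counts given above can be tallied in a few lines. The group sizes are taken from the text; the four-wire lens option reflects the note that some zoom lenses need one more wire.

```python
# Wire groups as listed in the text.
BASIC_PAN_TILT = 5     # pan left, pan right, tilt up, tilt down, common
PAN_TILT_PRESET = 4    # pot +, pot -, pan feedback, tilt feedback
ZOOM_FOCUS = 3         # zoom, focus, common (some lenses need 4)
ZOOM_FOCUS_PRESET = 4


def wire_count(presets: bool = True, four_wire_lens: bool = False) -> int:
    """Number of cores needed between a site driver and its PTZ head."""
    total = BASIC_PAN_TILT + (4 if four_wire_lens else ZOOM_FOCUS)
    if presets:
        total += PAN_TILT_PRESET + ZOOM_FOCUS_PRESET
    return total


print("hard-wired PTZ, no presets:", wire_count(presets=False))
print("with preset positioning:   ", wire_count())
```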
It should also be mentioned that a PTZ site driver is usually installed next to the camera. There are two
main reasons for this: practicality (long runs of 16-core cable are not needed) and maintenance (one
needs to see what the camera’s P/T head does when certain instructions are sent to the site driver).
If the situation demands, however, the site driver can be up to a couple of hundred meters away from
the camera itself.
Camera housings
In order to protect the cameras from environmental influences and/or conceal their viewing direction,
we use camera housings.
Camera housings can be very simple and straightforward to install and use, but a poor-quality housing, or
one that does not protect well against rain, snow, dust, and wind, can affect the picture quality and
shorten the camera’s lifetime.
They are available in all shapes and sizes, depending on the application and on the length of the camera
and lens. Earlier tube cameras and zoom lenses were much bigger, calling for housings as long as 1 m and
weighing over 10 kg. Nowadays, CCD cameras are getting smaller, and so are zoom lenses; as a
consequence, housings are becoming smaller too.
A lot of attention in the last few years has been paid to the aesthetics and functionality of housings,
such as easy access for maintenance and concealed cable entries.
With camera size reductions, tinted domes are now often used instead of housings, blending much better
with both interiors and exteriors.
The glass used in housings is often considered unimportant, but optical distortions and some spectral
attenuation may be present if the glass is unsuitable. Another important factor is the toughness of the
glass, for camera protection in demanding environments. Optical precision and uniformity are even more
critical with domes, because distortions of the glass (or plastic) are more apparent. Tinted domes are
often used to conceal the camera’s viewing direction, and for these the light attenuation has to be taken
into account. It is usually in the range of one F-stop, which means the camera receives half the light it
would without the dome.
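Because one F-stop halves the light, the effect of a tinted dome on the camera’s usable minimum illumination can be estimated as below. The 0.5 lx camera rating is a hypothetical example value.

```python
def transmitted_fraction(stops: float) -> float:
    """Each F-stop of attenuation halves the light."""
    return 2.0 ** -stops


def scene_lux_needed(camera_min_lux: float, dome_stops: float) -> float:
    """Scene illumination needed for the camera to receive its rated minimum behind the dome."""
    return camera_min_lux / transmitted_fraction(dome_stops)


# Hypothetical camera rated at 0.5 lx, behind a tinted dome losing one F-stop.
print(scene_lux_needed(0.5, 1.0), "lx needed at the scene")
```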
A lot of housings have provision for heaters and fans. Heaters might be required in areas where a lot of
moisture, ice, or snow is expected. Usually, about 10 W of electrical power is sufficient to produce
enough heat for a standard housing interior. The heaters can work on 12 V DC, 24 V AC, or even mains
voltage, so check with the supplier before connecting. No damage will occur if a 240 V (or even 110 V)
heater is connected to 24 V (although it will not produce sufficient heat), but the opposite is not
recommended. Also, avoid primitive improvisations without any calculations, such as connecting two 110 V
heaters in series to replace a single 240 V heater (110 + 110 = 220). The small difference (240 V instead
of 220 V) is enough to produce excessive heat, causing the heaters to burn out sooner, and may even
cause a fire.
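Both heater warnings follow from P = V²/R. The sketch below assumes 10 W heaters and treats their resistance as constant, which real heating elements only approximate.

```python
def resistance(rated_volts: float, rated_watts: float) -> float:
    """Heater resistance from its nameplate rating: R = V^2 / P."""
    return rated_volts ** 2 / rated_watts


def power(applied_volts: float, ohms: float) -> float:
    """Power dissipated in a resistance: P = V^2 / R."""
    return applied_volts ** 2 / ohms


# 1) A hypothetical 10 W, 240 V heater run from 24 V: safe, but far too cold.
r_240 = resistance(240.0, 10.0)                  # 5760 ohm
print(f"240 V heater on 24 V: {power(24.0, r_240):.2f} W")

# 2) Two hypothetical 10 W, 110 V heaters in series on 240 V mains.
r_110 = resistance(110.0, 10.0)                  # 1210 ohm each
per_heater = power(240.0, 2 * r_110) / 2         # each heater gets half of the total
print(f"each 110 V heater dissipates {per_heater:.1f} W ({per_heater / 10.0:.0%} of its rating)")
```

Because power goes with the square of voltage, the 9% overvoltage on each 110 V heater becomes roughly 19% more heat than it was designed for.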
If a heater is required for an already installed housing, it is relatively easy to improvise one with a
resistor of 30 to 50 Ω and a 20 W power rating (for 24 V AC). A normally closed (N/C) thermostat with a
low switching temperature can be used to break the circuit. Do not forget, though, that the camera itself,
consuming a couple of watts, acts as a heater, and if the housing is small and well sealed, this should
create sufficient heat to dry the moisture inside. For snowy areas, however, a proper heater is needed,
mounted close to the housing glass.
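A quick check of the improvised resistor heater, again using P = V²/R with the 24 V AC treated as an RMS value:

```python
def dissipation(volts_rms: float, ohms: float) -> float:
    """Power in a resistor fed with an RMS voltage: P = V^2 / R."""
    return volts_rms ** 2 / ohms


RATING_W = 20.0
for ohms in (30.0, 40.0, 50.0):
    p = dissipation(24.0, ohms)
    print(f"{ohms:.0f} ohm on 24 V AC: {p:.1f} W ({p / RATING_W:.0%} of a 20 W rating)")
```

At 30 Ω the resistor dissipates close to its full 20 W rating, so the upper end of the 30 to 50 Ω range, or a higher-rated resistor, leaves a more comfortable margin.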
Fans should be used in areas with very high temperatures, and sometimes they can be combined with
heaters. Fans can also run on DC or AC, but be sure to use good-quality fans, as most brushed DC fans
will sooner or later produce sparks from the brushes when rotating, which will interfere with the video
signal.
So heaters and fans are an extra burden, but if you need them, make sure you provide them with the
correct and sufficient power. They are usually set to switch on and off automatically as the temperature
rises or falls (i.e., there is no need for manual switching).
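Automatic switching of heaters and fans is normally two-point (hysteresis) control: the device switches on at one temperature and off at another, so it does not chatter around a single set point. The thresholds below are hypothetical; real housings use whatever their thermostats are set to.

```python
from typing import Tuple

# Hypothetical switching points (degrees Celsius).
HEATER_ON_C, HEATER_OFF_C = 5.0, 10.0
FAN_ON_C, FAN_OFF_C = 40.0, 35.0


class HousingClimate:
    """Two-point (hysteresis) control of a housing heater and fan."""

    def __init__(self) -> None:
        self.heater = False
        self.fan = False

    def update(self, temp_c: float) -> Tuple[bool, bool]:
        if temp_c <= HEATER_ON_C:
            self.heater = True
        elif temp_c >= HEATER_OFF_C:
            self.heater = False
        # between the two points each device keeps its previous state
        if temp_c >= FAN_ON_C:
            self.fan = True
        elif temp_c <= FAN_OFF_C:
            self.fan = False
        return self.heater, self.fan


climate = HousingClimate()
for t in (4.0, 8.0, 12.0, 41.0, 38.0, 30.0):
    print(t, climate.update(t))
```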
Special housings are required if a wash and wipe assembly is to be added to the PTZ camera, because the
wipe mechanism has to match the housing window. When a wash/wipe assembly is used, the PTZ driver
needs output controls for these functions as well. They might be 24 V, 220–240 V, or 110 V AC. Another
responsibility when using washers is to make sure that the washer bottle is always filled with a sufficient
amount of clean water.
Housings and boxes (such as PTZ site drivers) that are exposed to environmental influences are rated
with IP (Ingress Protection) numbers. The first digit indicates the degree of protection against solid
objects and dust, and the second the degree of protection against water.
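The two digits of an IP rating can be decoded with a small lookup. The short descriptions below paraphrase the first- and second-digit categories of IEC 60529; the optional additional letters the standard allows are not handled, and the standard itself should be consulted for the exact test conditions.

```python
import re
from typing import Tuple

SOLIDS = {
    "0": "no protection",
    "1": "objects larger than 50 mm",
    "2": "objects larger than 12.5 mm",
    "3": "objects larger than 2.5 mm",
    "4": "objects larger than 1 mm",
    "5": "dust-protected",
    "6": "dust-tight",
    "X": "not rated",
}
WATER = {
    "0": "no protection",
    "1": "vertically dripping water",
    "2": "dripping water, enclosure tilted up to 15 degrees",
    "3": "spraying water",
    "4": "splashing water",
    "5": "water jets",
    "6": "powerful water jets",
    "7": "temporary immersion",
    "8": "continuous immersion",
    "X": "not rated",
}


def describe_ip(code: str) -> Tuple[str, str]:
    """Split a two-digit IP code such as 'IP66' into its solids and water ratings."""
    match = re.fullmatch(r"IP([0-6X])([0-8X])", code.strip().upper())
    if match is None:
        raise ValueError(f"not a two-digit IP code: {code!r}")
    return SOLIDS[match.group(1)], WATER[match.group(2)]


print(describe_ip("IP66"))
```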
Special liquid-cooled housing for up to 1300 °C
Most camera housings are well protected from the environment, but special system designs might require
even better protection. Vandal-proof housings are required in systems where interference by people or
vehicles is expected, so a special toughened window (usually Lexan polycarbonate) needs to be used,
together with special locking screws. Tamper switches may also be added for extra security. In such
cases, the tamper alarm has to be brought back to the control center, usually through the PTZ driver,
provided it has such a facility.
And last, bullet-proof, explosion-proof, and underwater housings are also available, but they are very
rare, specially built, and very expensive. We will therefore not dedicate any space to them in this book;
should you need more details, contact your local supplier.
Lighting in CCTV
Most of the CCTV systems with outdoor cameras use both day and night light sources for better
viewing. Systems for indoor applications use, obviously, indoor (artificial) light sources, although some
may mix with daylight, as when sunlight penetrates through a window.
The sun is our daylight source, and as mentioned earlier, the illumination it produces can vary from as
low as 100 lx at sunset to 100,000 lx at noon. The color temperature of sunlight can also vary, depending
on the sun’s altitude and the atmospheric conditions, such as clouds, rain, or fog. This might not be
critical for B/W cameras, but a color system will reflect these variations.
Artificial light sources fall into three main groups, according to their spectral power content:
1. Sources that emit radiation by incandescence, such as candles, tungsten electric lamps, and
halogen lamps.
2. Sources that emit radiant energy as a result of an electrical discharge through a gas or vapor,
such as neon lamps and sodium and mercury vapor lamps.
3. Fluorescent tubes, in which a gas discharge emits visible or ultraviolet radiation within the
tube, causing phosphors on the inside surface of the tube to glow with their own spectrum.
The light sources of the first group produce a smooth and continuous spectrum, following Planck’s law of
black body radiation. These light sources are very suitable for B/W cameras because of the similarity
in the spectra, especially on the left side of the CCD chip’s spectral response curve.
The second group of light sources produces almost discrete components at particular wavelengths,
depending on the gas type.
The third group has a more continuous spectrum than the second, but it still has components of
significant level at particular wavelengths only, again depending on the type of gas and phosphor used.
The last two groups are very tricky for color cameras. Special attention should be paid to the color
temperature and the white balance capability of the cameras used with such lights.
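For the first group of sources, Planck’s law shows why an incandescent lamp suits B/W cameras so well: its output peaks at long wavelengths. The sketch assumes a filament temperature of about 3200 K, typical for tungsten-halogen lamps, and compares the radiance at 850 nm (near infrared) with that at 550 nm (green).

```python
import math

H = 6.62607015e-34        # Planck constant, J s
C = 2.99792458e8          # speed of light, m/s
K_B = 1.380649e-23        # Boltzmann constant, J/K
WIEN_B = 2.897771955e-3   # Wien displacement constant, m K


def spectral_radiance(wavelength_m: float, temp_k: float) -> float:
    """Planck's law: black body spectral radiance per unit wavelength, W/(sr m^3)."""
    return (2 * H * C ** 2 / wavelength_m ** 5) / math.expm1(
        H * C / (wavelength_m * K_B * temp_k))


def peak_wavelength_nm(temp_k: float) -> float:
    """Wien's displacement law: wavelength of maximum emission, in nm."""
    return WIEN_B / temp_k * 1e9


FILAMENT_K = 3200.0  # assumed halogen filament temperature
ratio = spectral_radiance(850e-9, FILAMENT_K) / spectral_radiance(550e-9, FILAMENT_K)
print(f"peak at {peak_wavelength_nm(FILAMENT_K):.0f} nm; "
      f"850 nm vs 550 nm radiance ratio {ratio:.2f}")
```

At this temperature the emission peak already lies in the near infrared, where B/W CCD chips remain sensitive.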
Infrared lights
When events need to be monitored at night, B/W cameras can be used in conjunction with infrared
illuminator(s). Infrared light is used because B/W CCD cameras have very good sensitivity in and near
the infrared region. These are the wavelengths longer than 700 nm. As mentioned at the beginning of
this book, the human eye can see up to 780 nm, with the sensitivity above 700 nm being very weak,
so in general we say that the human eye only sees up to 700 nm.
Monochrome CCD chips see better in the infrared part of the spectrum than the human eye. The reason
lies in the nature of the photoelectric effect itself: longer-wavelength photons penetrate the CCD
structure more deeply. The infrared response is especially high with B/W CCD chips that have no infrared
cut filter.
A few infrared wavelengths are common in CCTV infrared viewing. Which one should be used depends, first,
on the camera’s spectral sensitivity (various chip manufacturers produce chips with different spectral
sensitivities) and, second, on the purpose of the system.
The two typical infrared wavelengths used with halogen lamp illuminators are one starting from around
715 nm and the other from around 830 nm.
If the infrared lights are meant to be visible to the public, the 715 nm wavelength is the better choice.
If hidden night-time surveillance is wanted, the 830 nm wavelength (which is invisible to the human eye)
should be used.
The halogen lamp infrareds come in two versions: 300 W and 500 W. The principle of operation is very
simple: a halogen lamp produces light (with a spectrum similar to black body radiation), which then
passes through an optical long-pass filter that blocks wavelengths shorter than 715 nm (or 830 nm). This
is why we say wavelengths starting from 715 nm or starting from 830 nm: the infrared radiation is not a
single frequency but a continuous spectrum starting from the nominated wavelength.
The energy contained in the wavelengths that do not pass the filter is
reflected back and accumulated inside the infrared illuminator. There
are heat sinks on the IR light itself that help cool down the unit, but still,
the biggest reason for the short MTBF (1000–2000 hr) of the halogen
lamp is the excessive heat trapped inside the IR light.
The same description applies to the 830 nm illuminators; only in this
case we have infrared frequencies invisible to the human eye. As
mentioned earlier, 715 nm is still visible to many.
These infrared illuminators pose a certain danger, especially for installers and maintenance people.
Because the eye does not perceive the infrared light, its iris stays open, so eye damage, even
blindness, could result. This can happen only when one is very close to the illuminator at night, which
is when the eye’s iris is fully open.
The IR photo cell, being active at night, turns the light on.
The best way to check that the IR works is to feel the heat radiation with your hand; human skin senses
heat very accurately. Remember, radiant heat is infrared radiation.
The infrared illuminators are mains operated, and photo cells are used
to turn them on when daylight falls below a certain level.
Both types of halogen infrared illuminators mentioned come with
various types of dispersion lenses, and it is desirable to know what angle
of coverage is best for a situation. If the infrared beam is concentrated
to a narrow angle, the camera can see farther, provided a corresponding
narrow angle lens is used (or a zoom lens is zoomed in).
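The trade-off between beam angle and reach can be estimated with the inverse-square law. This sketch assumes the lamp’s power is spread evenly over a circular beam and ignores atmospheric absorption and the reflectance of the target, so the results are only relative.

```python
import math


def spot_diameter_m(distance_m: float, beam_deg: float) -> float:
    """Diameter of the illuminated circle at a given distance."""
    return 2 * distance_m * math.tan(math.radians(beam_deg / 2))


def range_gain(narrow_deg: float, wide_deg: float) -> float:
    """How much farther the same lamp reaches the same target illuminance with a
    narrower beam. Power spread over a circle of radius d*tan(angle/2) gives
    illuminance ~ P / (pi * d^2 * tan^2), so for fixed illuminance d ~ 1/tan."""
    return math.tan(math.radians(wide_deg / 2)) / math.tan(math.radians(narrow_deg / 2))


print(f"a 30 deg beam at 50 m lights a circle {spot_diameter_m(50, 30):.1f} m wide")
print(f"a 10 deg beam reaches {range_gain(10, 30):.2f} times farther than a 30 deg beam")
```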
Halogen lamp infrared lights offer the best illumination possible for a B/W CCD chip, but their short
lifespan has prompted new technologies, one of which is a matrix of solid-state infrared LEDs
(light-emitting diodes). This type of infrared light is made with high-output infrared LEDs, which have a
much higher efficiency than standard diodes and radiate a considerable amount of light. Such infrared
lights come in a few different power ratings: 7 W, 15 W, and 50 W. They are not as powerful as the
halogen ones, but their main advantage is an MTBF of over 100,000 hr (20 to 30 years of continuous night
operation).
How far you can see with such infrareds depends, again, on the camera in use and its spectral
characteristics. It is always advisable to conduct a site test at night for the best understanding of
The angle of dispersion is limited to the LED’s angle of radiation, which usually ranges between about
30° and 40°, if no additional optics are placed in front of the LED matrix.
Another type of IR light used in CCTV is the infrared LASER diode (LASER = light amplification by
stimulated emission of radiation). It may not be as powerful as the LEDs, but with a laser source the
wavelength is very clean and the light is coherent. A typical LASER diode radiates light in a very narrow
angle, so a small lens is used to disperse the beam (usually up to about 30°). Lasers use very little
power and concentrate coherent light into one beam, but their MTBF is shorter than that of LEDs, usually
up to about 10,000 hr (approximately two to three years of continuous night operation). The major
advantages of LASER infrared light are its very low current consumption and its small size.
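The MTBF figures quoted for halogen, LED, and laser illuminators translate into years of night-only service as follows. The 12 hours of operation per night is an assumption; with 10-hour nights, for example, the LED figure stretches to over 27 years.

```python
HOURS_PER_NIGHT = 12.0  # assumption: real night length varies with season and latitude

MTBF_HOURS = {
    "halogen lamp": 2000.0,   # upper end of the 1000-2000 hr quoted earlier
    "laser diode": 10000.0,
    "LED array": 100000.0,
}


def years_of_night_use(mtbf_h: float, hours_per_night: float = HOURS_PER_NIGHT) -> float:
    """Convert an MTBF in hours into years of dusk-to-dawn operation."""
    return mtbf_h / hours_per_night / 365.0


for source, hours in MTBF_HOURS.items():
    print(f"{source:12s}: {years_of_night_use(hours):5.1f} years")
```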
Needless to say, color cameras cannot see infrared light, because their infrared cut filter blocks it.
Some camera manufacturers, however, have come up with innovative designs that use a color CCD chip for
day viewing and at night convert the same chip to monochrome operation by removing the infrared cut
filter.
Others use a simpler method, which is to place two chips (one color and the other B/W) in the one
camera body, where the light is split by a semitransparent mirror.
Ground loop correctors
Even if all precautions have been taken during installation, problems of a specific nature often occur:
ground loops.
Ground loops are an unwanted phenomenon caused by the ground potential difference between two
distant points. It is usually the difference between the camera and the monitor point, but it could also be
between a camera and a switcher, or two cameras, especially if they are daisy-chained for synchronization
purposes. The picture appears wavy and distorted. Small ground loops may not be noticeable at all,
but substantial ones are very disturbing for the viewers. When this is the case, the only solution is to
galvanically isolate the two sides. This is usually done with a video isolation transformer, sometimes
called a ground loop corrector or even a hum bug unit.
Ground loops can be eliminated, or at least minimized, by using monitors or processing equipment with
DC restoration. DC restoration is performed in the input stage of such a device: the “wavy” video signal
is sampled at the sync pedestals so as to regenerate a video signal with a “straight” DC level. This in
effect eliminates low-frequency induction, which is the most common ground loop artifact. A better,
though more expensive, solution is to use fiber optics cable instead of coax, at least between the
distant camera(s) and the monitor end.
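DC restoration (clamping) can be simulated in a few lines. In this sketch each video line begins with a run of reference samples at the blanking (pedestal) level, and mains hum adds a slowly varying offset, nearly constant within any one line but changing from line to line. Subtracting each line’s measured pedestal error removes it. The sample counts and levels are hypothetical simplifications of a real composite signal.

```python
import math
from typing import List

BLANKING_V = 0.3        # nominal blanking (pedestal) level, with the sync tip at 0 V
REF_SAMPLES = 8         # hypothetical: the first 8 samples of each line are the reference
LINES_PER_FIELD = 312   # roughly one field of a 625-line system (assumption)


def clamp_line(samples: List[float]) -> List[float]:
    """Shift a whole line so its reference (pedestal) samples sit at BLANKING_V."""
    measured = sum(samples[:REF_SAMPLES]) / REF_SAMPLES
    offset = measured - BLANKING_V
    return [s - offset for s in samples]


def field_with_hum(clean_line: List[float], hum_vpp: float) -> List[List[float]]:
    """Add one cycle of mains hum per field: nearly constant within any single
    line, but drifting from line to line (the 'wavy' picture)."""
    field = []
    for n in range(LINES_PER_FIELD):
        hum = (hum_vpp / 2) * math.sin(2 * math.pi * n / LINES_PER_FIELD)
        field.append([s + hum for s in clean_line])
    return field


clean = [BLANKING_V] * REF_SAMPLES + [0.65] * 32   # flat mid-grey picture content
noisy = field_with_hum(clean, hum_vpp=0.4)
restored = [clamp_line(line) for line in noisy]
worst = max(abs(a - b) for line in restored for a, b in zip(line, clean))
print(f"largest remaining error after clamping: {worst:.2e} V")
```

Clamping works here only because the hum changes so little during one line; faster interference would not be removed this way.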
Lightning protection
Lightning is a natural phenomenon we can do little about. PTZ sites are particularly vulnerable because
they have video, power, and control cables concentrated in one area. Good, proper earthing is strongly
recommended in areas where intensive lightning occurs, and surge arresters (also known as spark or
lightning arresters) should, of course, be installed on all system lines (control, video, etc.). Most
good PTZ site drivers have spark arresters built in at the data input terminals and/or galvanic isolation
through the communication transformers.
Spark arresters are special devices made of two electrodes, which are
connected to the two ends of a broken cable, housed in a special gas
tube that allows excessive voltage induced by lightning to discharge
through it. They are helpful, but they do not offer 100% protection.
An important characteristic of lightning is that it is dangerous not only
when it directly hits the camera or cable but also when it strikes within
close range. The probability of having a direct lightning hit is close
to zero. The more likely situation is that lightning will strike close
by (within a couple of hundred meters of the camera). The induction
produced by such a discharge is sufficient to cause irreparable damage.
Lightning measuring over 10,000,000 V and 1,000,000 A is possible;
imagine the induction it can create.
Again, as with the ground loops, the best protection from lightning is using a fiber optics cable; with
no metal connection, no induction is possible.
In-line video amplifiers/equalizers
When coaxial cables are used for video transmission over distances longer than those recommended for
the particular coax, in-line amplifiers (sometimes called video equalizers or cable compensators) are
used.
The role of an in-line amplifier/equalizer is very straightforward: it amplifies and equalizes the video
signal, so that by the time it reaches the monitor end it is restored, more or less, to the levels it would
have if the camera were connected to a monitor right next to it.
If no amplifier is used on long runs, the total cable resistance and capacitance rise to values at which
they affect the video signal considerably, both in level and in bandwidth. Over a couple of hundred
meters of coaxial cable (RG-59), the video signal level can drop from the normal 1 Vpp down to 0.2 or
0.3 Vpp. The monitor (or VCR) cannot properly interpret such levels: the contrast is very poor, and the
syncs are low, so the picture starts breaking and rolling. In addition, the higher frequencies are
attenuated much more than the lower ones, which shows up as a loss of fine detail in the picture. Basic
electronics tells us that higher frequencies are always attenuated more, because of effects such as the
skin effect and the frequency dependence of impedance, to name just two. This is why the video signal
spectrum needs to be equalized, not just amplified.
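A rough model of why equalization is needed: when the skin effect dominates, coax loss in decibels grows with the square root of frequency and linearly with length, so an equalizer has to apply more gain at higher frequencies. The loss coefficient below is a hypothetical value, not an RG-59 datasheet figure, and dielectric loss and DC resistance are ignored.

```python
import math

# Hypothetical coefficient: dB of loss per 100 m at 1 MHz (use the cable datasheet in practice).
K_DB_PER_100M_AT_1MHZ = 2.0


def loss_db(freq_mhz: float, length_m: float, k: float = K_DB_PER_100M_AT_1MHZ) -> float:
    """Skin-effect-dominated coax loss: grows with sqrt(frequency) and linearly with length."""
    return k * math.sqrt(freq_mhz) * length_m / 100.0


def level_after(vpp_in: float, loss: float) -> float:
    """Convert a loss in dB back into a peak-to-peak voltage."""
    return vpp_in * 10 ** (-loss / 20.0)


length = 300.0
for f in (0.5, 1.0, 2.0, 5.0):
    loss = loss_db(f, length)
    print(f"{f:3.1f} MHz: loss {loss:4.1f} dB, 1 Vpp -> {level_after(1.0, loss):.2f} Vpp, "
          f"equalizer gain needed {loss:4.1f} dB")
```

The equalizer’s job is to apply the mirror image of this curve: little gain at low frequencies, progressively more toward the top of the video band.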
Obviously, every amplification stage also amplifies the noise. That is why each in-line
amplifier/equalizer comes with certain guidelines that need to be followed. Theoretically, it would be
best to insert the amplifier/equalizer in the middle of the long cable run, where the signal is still
considerably higher than the noise level. However, the middle of the cable is not a very practical place,
mainly because the amplifier needs power and has to be mounted somewhere in the field or under the
ground.
This is why most manufacturers suggest one of two other alternatives.
The first and most common alternative is to install the in-line amplifier at the camera end, often in the
camera housing itself. In such a case, we actually perform pre-amplification and pre-equalization: the
video signal is boosted and equalized to exaggerated levels, so that by the time it gets to the receiving
end (the distance should be roughly known), it drops down to 1 Vpp.
The second alternative is to place the amplifier at the monitor end, where more noise will have
accumulated along the cable, but the amplification can be controlled better; here the signal needs to be
brought up from a couple of hundred millivolts to the standard 1 Vpp. This might be more practical in
installations where there is no access to the camera itself.
In both of the above alternatives, a potentiometer is usually available at the front of the unit, with
calibrated positions for the cable length to be compensated. In any case, it is of great importance to
know the cable length being compensated for.
A number of in-line amplifiers can be used in series. For example, if 300 m of RG-59/U is the maximum
recommended length for a B/W signal, 1 km can be reached by using two amplifiers (some manufacturers
may suggest only one for runs longer than a kilometer) or maybe even three. Note that noise cannot be
avoided; it always accumulates. Furthermore, with more (two or three) in-line amplifiers, the risk of
ground loops, lightning damage, and other induced interference is even greater.
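The statement that noise always accumulates can be quantified under a simple assumption: each amplified segment adds its own uncorrelated noise, and every amplifier restores the signal to the same level, so the noise powers add. The 50 dB per-segment figure is hypothetical.

```python
import math
from typing import Sequence


def cascaded_snr_db(segment_snr_db: Sequence[float]) -> float:
    """Overall SNR of cascaded cable segments, each restored to full level by an
    amplifier. Noise powers (relative to the signal) of the segments simply add."""
    total_noise = sum(10 ** (-snr / 10.0) for snr in segment_snr_db)
    return -10.0 * math.log10(total_noise)


# Hypothetical: every amplified segment delivers 50 dB SNR on its own.
for n in (1, 2, 3):
    print(n, "segment(s):", round(cascaded_snr_db([50.0] * n), 2), "dB")
```

Under this assumption, two equal segments lose about 3 dB of signal-to-noise ratio and three lose about 4.8 dB compared with a single one.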
Again, if you know in advance that your installation has to go over half a kilometer, the best suggestion
is to use fiber optics. Many would suggest RG-11/U coaxial cable instead, with which a single run, without
an amplifier, can go up to 600–700 m, but the cost of fiber optics these days is comparable to, if not
lower than, that of RG-11/U. We have already covered the many advantages of fiber.
Video distribution amplifiers (VDAs)
Very often, a video signal has to be taken to several different devices: a switcher, a monitor, and
another switcher or quad, for example. This may not be possible with all of these units, because not all
of them have looping video inputs. Looping BNCs are most common on monitors. Usually, there is a switch
near the BNCs, marked with “75 Ω” and “High” positions. This is so-called passive input impedance
matching. If you want to loop the signal to another device, a second monitor, for example, the procedure
is to switch the first monitor to high impedance and loop the coaxial cable to the second monitor, where
the impedance setting should be at 75 Ω.
This is important, as we discussed earlier, because a camera is a 75 Ω source and it has to see 75 Ω
at the end of the line in order to have correct video transmission with 100% energy transfer
(i.e., no reflections).
Now picture a situation, very common in CCTV, where a customer wants to have two switchers at two
different locations, switching the same cameras but independently from each other. This can easily
be solved by using two video switchers, one looping and the other terminating, where we can use the
same logic as with the monitors.
In practice, however, simple and cheap switchers are usually made with just one BNC per video input,
which means that their inputs are terminating (i.e., 75 Ω input impedance with no loop-through).
It would be wrong to use BNC T adaptors to loop from one switcher to another, as many installers do.
The cameras would then see two 75 Ω terminations in parallel per channel, an incorrect impedance that
causes partial reflection of the signals, so the reproduced picture will show double imaging and
incorrect dynamics.
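The effect of a T adaptor can be quantified with the reflection coefficient Γ = (ZL − Z0) / (ZL + Z0). This simplified sketch treats the cable, the camera output, and the inputs as purely resistive 75 Ω elements; two terminating inputs in parallel then present 37.5 Ω.

```python
def parallel(*impedances: float) -> float:
    """Combined impedance of resistive terminations in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances)


def reflection_coefficient(z_load: float, z0: float = 75.0) -> float:
    """Fraction of the incident voltage wave reflected by the load."""
    return (z_load - z0) / (z_load + z0)


def relative_level(z_load: float, z_source: float = 75.0) -> float:
    """Received level relative to a correctly terminated (75 ohm) line."""
    correct = 75.0 / (z_source + 75.0)
    return (z_load / (z_source + z_load)) / correct


cases = {
    "single 75 ohm termination": 75.0,
    "T adaptor into two terminating inputs": parallel(75.0, 75.0),
    "unterminated (high impedance)": 1e6,
}
for name, z in cases.items():
    print(f"{name:40s} Z = {z:9.1f} ohm, reflection {reflection_coefficient(z):+.2f}, "
          f"level x{relative_level(z):.2f}")
```

With the T adaptor, a third of the incident voltage is reflected and the received level falls to two thirds of normal, consistent with the ghosting and wrong dynamics described above.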
The solution in such cases is the video distribution amplifier (VDA). VDAs do exactly what the name
suggests: they distribute one video input to several outputs, preserving the necessary impedance
matching. This is achieved with transistor or op-amp stages. Because active electronics are used (and
power needs to be brought to the circuit), this is called active impedance matching.
A typical VDA has one input and four outputs, but models with six, eight, or more outputs are available
as well. One VDA is necessary for each video signal, even if not all four outputs are used.
Video matrix switchers use the same concept as VDAs when distributing a single video signal to many
output channels. In bigger matrix systems, however, only a limited number of VDA stages can be cascaded,
because every new stage injects a certain amount of noise, which with analog signals cannot be avoided.
13. CCTV system design
Designing a CCTV system is a complex task, requiring at least basic knowledge of all the stages in
a system, as well as its components. But more importantly, prior to designing the system, we need to
know what the customer expects from it.
Understanding the customer’s requirements
The first and most important preparation before commencing the design is to know and understand
the customer’s requirements. Customers can be technically oriented people, and many understand
CCTV as well as you do, but most often they are not aware of the latest technical developments and
capabilities of each component.
The most important thing to understand is the general concept of surveillance the customer wants:
constant monitoring of cameras and activities by 24-hour security personnel, an unattended operation
(usually with constant recording), or perhaps a mixture of the two. Once you understand their general
requirements, it might be a good idea to explain to them what is achievable with the equipment you would
be suggesting. This is reasonably easy to accomplish with smaller and simpler systems, but once they grow
to more than 10 cameras (some of which could be PTZs), a few monitors, more than one control point, a
number of alarms, VCRs, and the like, things will get more complicated.
Many unknown variables need to be considered: What happens if a number of alarms go off
simultaneously? Which monitor should display the alarms? Will the alarms be recorded if the VCR(s)
is/are playing back? What is the level of priority for each operator? And so on.
These are the variables that define the system complexity, and, as in mathematics, in order to solve a
system with more variables, one needs to know more parameters. They can be specified by the customer,
but only after the customer has understood the technical capabilities of the equipment.
Understandably, it is imperative for you, as a CCTV expert, to know the components, hardware, and
software you would be offering, so that you can achieve what is required in the best possible way.
You can create a favorable impression in the customer’s mind if, in the end, you give him or her as much
as, or even more than, what you have promised. If you do not, the customer will be dissatisfied.
Remember that if the customer is fully satisfied the first time, chances are he or she will come back to
do business with you again.
To put it simply: Do not claim the system will do this and that if you are not certain; make sure your
system delivers what you say it will.
So, to design a good, functional system, one has to know the components used, their benefits and
limitations, how they interconnect, and how the customer wants them to be used. The first few parts are
assumed to be fulfilled, since you would not be doing this job unless you knew a few things about CCTV.
The last one, what the customer wants, can be determined during the first phone call or meeting.
Usually, the next step is to conduct a site inspection. Here is a short list of questions you should ask your
customer prior to designing the system and before or during the site inspection:
• What is the main purpose of the CCTV system?
If it is a deterrent, you need to plan for cameras and monitors that will be displayed to the public.
If it is a concealed surveillance, you will need to pay special attention to the camera type and
size, its protection, concealed cabling, and the like, as well as when it is supposed to be installed
(after hours perhaps).
• Who will be the operator(s)?
If a dedicated 24-hour guard is going to use the system, the alarm response needs to be different from
that of an unattended or partially attended system.
• Will it be a monochrome or color system?
The answer to this question will dictate the price, as well as the minimum illumination response.
Consequently, the lighting in the area needs to be looked at. A color picture will give more details
about the observed events, but if the intention is to see images in very low light levels, or with
infrared lights, there is no alternative but B/W cameras (unless the customer is prepared to pay for some of the new cameras on the market that switch between color and monochrome).
The price of a color system is dictated not only by the cameras, but also by the monitors, multiplexers, and/or quads (if any). Needless to say, sequential or matrix switchers, as well as time-lapse VCRs, are the same for both B/W and color.
• How many cameras are to be used?
A small system with up to half a dozen cameras can be easily handled by a switcher or multiplexer, but
bigger systems usually need a matrix switcher or a larger number of switchers and multiplexers.
• How many of the cameras will be fixed focal length and how many PTZ?
There is a big difference in price between the two: if a PTZ camera is used instead of a fixed one, the extra cost is in the zoom lens (as opposed to a fixed lens), the pan and tilt head or dome, the site driver, and the control keyboard. But the advantages a PTZ camera gives your customer are many times greater. If, on top of this, PTZ cameras with preset positioning are used, the system's flexibility and efficiency will far exceed those of a fixed camera system. A system with only one PTZ camera and half a dozen fixed ones is a choice that
may require a matrix switcher for control and will increase the price dramatically (compared to
a system with only fixed cameras). Alternatively, single PTZ camera control can be achieved via
a special single-camera digital or hard-wired controller, but they would also increase the price
considerably. So, if a PTZ camera is required, it would be more economical to have more than
one PTZ camera.
• How many monitors and control keyboards are required?
If it is a small system, one monitor and keyboard is the logical proposal, but once you get more
operators and/or channels to control and view simultaneously, it becomes harder to plan a practical
and efficient system. Then, an inspection of the control room is necessary in order to plan the
equipment layout and interconnection.
• Will the system be used for live monitoring (which will require an instant response to alarms),
or perhaps recording of the signals for later review and verification?
This question will define whether you need to use VCR(s) with multiplexer(s). If you have a
matrix switcher, you will still need a multiplexer or two in addition. Bear in mind that the time-lapse mode you use depends on how often the tapes can be changed, and this defines the update rate of each recorded camera. Whenever possible, choose a pair of 9-way (or 8-way) multiplexers, each recording to its own VCR, instead of one 16-way unit if you want to minimize the interval between successive recorded images of the same camera.
• What transmission media can be used on the premises?
Usually, a coaxial cable is taken as an unwritten rule and installation should be planned accordingly.
Sometimes, however, there is no choice but to use a wireless microwave or even a fiber optics
transmission, which will add considerably to the total price. If the premises are subject to regular
lightning activity, you had better propose fiber optics from the beginning and explain to the
customer the savings in the long run. So, you have to find out more about the environment in which the system is going to be installed, what is physically possible and what is not, and then plan adequate video and data transmission media.
• Lastly and probably the most important thing
to find out, if possible, is what sort of budget
is planned for such a CCTV system?
This question will define and clarify some
of the previous queries and will force you to
narrow down either the type of equipment,
the number of cameras, or how the system is
expected to work. Although this is one of the
most important factors, it should not force you
to downgrade the system to something that you
know will not operate satisfactorily.
If the budget cannot allow for the desired
system, it is still good to go back to the customer
with a system proposal that you are convinced
will work as per his or her requirements (even
if it is over budget) and another one designed
within the budget with as many features as the
budget will allow for. This will usually force
you to narrow down the number of cameras,
or change some from PTZ to fixed.
The strongest argument you should put forward when suggesting your design is that a CCTV system should be a secure one, which can only be the case if it is done properly. A well-designed system will therefore bring bigger savings in the long run.
If you present a fair and detailed explanation of how you think the system should work, the customer will usually accept the proposal.
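The effect of the multiplexer choice raised in the recording question above can be estimated with a little arithmetic. The sketch below is a simplified model, assuming PAL video (50 fields per second), a 3-hour tape, and a multiplexer that shares the recorded fields equally among its cameras; real VCRs and multiplexers differ in detail, so treat the numbers as indicative only.

```python
# Simplified model (assumptions): PAL at 50 fields/s, a 3-hour tape,
# and a multiplexer that shares the recorded fields equally among its cameras.
REAL_TIME_FIELDS_PER_S = 50.0
TAPE_HOURS = 3.0


def camera_update_interval_s(mode_hours: float, cameras_per_vcr: int) -> float:
    """Seconds between two consecutive recorded fields of the same camera."""
    # Stretching a 3-hour tape over `mode_hours` records only
    # TAPE_HOURS / mode_hours of the real-time fields.
    recorded_fields_per_s = REAL_TIME_FIELDS_PER_S * TAPE_HOURS / mode_hours
    return cameras_per_vcr / recorded_fields_per_s


if __name__ == "__main__":
    # 24-hour mode: one 16-way multiplexer on one VCR
    # versus each of two 8-way multiplexers on its own VCR.
    print(f"16 cameras per VCR: {camera_update_interval_s(24, 16):.2f} s")
    print(f" 8 cameras per VCR: {camera_update_interval_s(24, 8):.2f} s")
```

Under these assumptions, halving the number of cameras per recorder halves the time between updates of each camera (2.56 s versus 1.28 s in 24-hour mode), which is why two smaller multiplexers with their own VCRs are preferred when the budget allows.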
Site inspections
After the initial conversation with the customer and assuming you have a reasonably good idea of
what is desired, you have to make a site inspection, where you would usually collect the following information:
• Cameras: type (i.e., B/W or color, fixed or PTZ, resolution, etc.).
• Lenses: angles of view, zoom magnification ratio for zoom lenses (e.g., 12.5–75 mm, 8–80 mm).
• Camera protection: housing type (standard, weatherproof, dome, discreet, etc.), mounting.
• Light: levels, light sources in use (especially when color cameras are to be used), east/west
viewing direction. Visualize the sun’s position during various days of the year, both summer and
winter. This will be very important for overall picture quality.
• Video receiving equipment: location, control
room area, physical space, and the console.
• Monitors: resolution, size, position, mounting, and
the like.
• Power supply: type, size (always allow for more amperes than are required). Is there a need for an uninterruptible power supply (UPS)? If so, what VA rating?
• If pan/tilt heads are to be used: type, size, load rating,
control (two wire – digital or multi-core). Is there a
need for preset positioning (highly recommended
for bigger systems)? Where are they going to be
mounted? What type of brackets?
• Make a rough sketch of the area, with the
approximate initial suggestions for the camera
positions. Take into account, as much as possible,
the installer’s point of view. A small change in the
camera’s position, which will not affect the camera’s
performance, can save a lot of time and hassle for the installer and in the end, money for the
customer. An unwritten golden rule for a good picture is to try and keep the camera from directly
facing light.
• Put down the reference names of areas where the customer wants (or where you have suggested)
the cameras to be installed. Also write down the reference names of areas to be monitored because
you will need them in your documentation as reference points. Be alert for obvious “no-nos”
(in respect to installation), even if the customer wishes something to be done. Sometimes small
changes may result in high installation costs or technical difficulties that would be impossible to
solve. It is always easier to deter the customer from making changes by explaining why in the
initial stage, rather than having to do so later in the course of installation, when additional costs
will be unavoidable.
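The power supply questions in the site inspection list usually end in a short calculation. The sketch below assumes a 30% current margin, a UPS power factor of 0.7, and 25% UPS headroom; these figures, and the example loads, are hypothetical placeholders to be replaced with values from the equipment data sheets and the UPS manufacturer.

```python
def psu_current_rating_a(load_currents_a, margin=0.30):
    """Total load current plus a safety margin (30% assumed)."""
    return sum(load_currents_a) * (1.0 + margin)


def ups_va_rating(load_watts, power_factor=0.7, headroom=0.25):
    """Approximate UPS size: real power / power factor, plus headroom.

    The 0.7 power factor and 25% headroom are assumptions for illustration.
    """
    return load_watts / power_factor * (1.0 + headroom)


if __name__ == "__main__":
    # Hypothetical example: six 12 V DC cameras drawing 0.25 A each.
    print(f"PSU: at least {psu_current_rating_a([0.25] * 6):.2f} A")
    # Hypothetical control-room load of 300 W (monitors, recorders, switcher).
    print(f"UPS: about {ups_va_rating(300):.0f} VA")
```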
Designing and quoting a CCTV system
With all of the above information, as well as the product knowledge (which needs constant updating),
you need to sit down and think.
Designing a system, like designing anything new, is a form of art. As is true of many artists, your work
may not be rewarded immediately, or it may not be accepted for some reason. But think positively and
concentrate as if that is to be the best system you can propose. With a little bit of luck you may make
it the best, and tomorrow you can proudly show it to your colleagues and customers.
Different people will use different methods when designing a system. There is, however, an easy and
logical beginning.
Always start with a hand drawing of what you think the system should feature. Draw the monitors,
cameras, housings, interconnecting cables, power supplies, and so on. While drawing you will see the
physical interconnection and component requirements. Then you will not omit any of the little things
that can sometimes be forgotten, such as camera brackets, types of cable used, and cable length. Making
even a rough hand sketch will bring you to some corrections, improvements, or perhaps further inquiries
to the customer. You may, for example, have forgotten to check what the maximum distance for the
PTZ control is, or how far the operators are to be from the central video processing equipment, power
cable distances, voltage drops, and so on.
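The voltage drop mentioned above can be estimated from the loop resistance of the power cable. This is a minimal sketch for a DC (or low-frequency AC, with reactance ignored) two-conductor copper run; the resistivity is the usual figure for copper at about 20 °C, while the cable size, distance, and load in the example are hypothetical.

```python
COPPER_RESISTIVITY = 0.0172  # ohm * mm^2 / m, copper at about 20 degrees C


def voltage_drop_v(length_m, cross_section_mm2, current_a):
    """Voltage lost over a two-conductor run (current flows out and back)."""
    loop_resistance_ohm = 2 * length_m * COPPER_RESISTIVITY / cross_section_mm2
    return loop_resistance_ohm * current_a


if __name__ == "__main__":
    # Hypothetical: a 12 V DC camera drawing 0.5 A, 100 m from its supply,
    # fed through 0.5 mm^2 conductors.
    drop = voltage_drop_v(100, 0.5, 0.5)
    print(f"drop = {drop:.2f} V, camera sees {12 - drop:.2f} V")
```

In this example the camera would receive only about 8.6 V, which many 12 V cameras will not tolerate; a heavier cable, a higher supply voltage, or a local power supply would be needed.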
Once you have made the final hand drawing, you will know what equipment is required, and it is at
this point that you can make a listing of the proposed equipment. Then, perhaps, you will come to
the stage of matching camera/lens combinations. Make sure that they will fit in the housings or domes
you intend to use. This is another chance to glance through the supplier’s specifications booklet. Do not
forget to take into account some trivial things that may make installation difficult, like the coaxial cable
space behind the camera (remember, it is always good to have at least 50 mm for BNC terminations),
the focusing movement of a zoom lens (as mentioned earlier in the chapter on zoom lenses, in a lot
of zoom lenses focusing near makes the front optical element protrude for an additional couple of
millimeters), and so on.
The next stage is pricing the equipment – costs, sales tax and duty, installation costs, profit margins,
and the most important of all (especially for the customer) the total price.
Do not forget to include commissioning costs in there, although a lot of people break that up and show
the commissioning figure separately. This is more of a practical matter, since the commissioning cost
may vary considerably and it could take longer or shorter than planned. General practical experience
shows that it will always take at least three times longer than planned. Also, in the commissioning fees,
time should be allocated for the CCTV operator’s training.
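The pricing steps above can be laid out as a simple calculation. In this sketch every rate and figure is hypothetical, the markup is applied to the equipment only, sales tax is applied to the whole subtotal, and the planned commissioning hours are multiplied by three, following the rule of thumb in the text; adapt each assumption to your own business and jurisdiction.

```python
def quote_total(equipment_cost, duty_rate, markup, installation_cost,
                planned_commissioning_h, hourly_rate, sales_tax_rate,
                commissioning_factor=3.0):
    """Total quoted price under the simplified assumptions described above."""
    landed_cost = equipment_cost * (1 + duty_rate)       # cost including duty
    equipment_price = landed_cost * (1 + markup)         # markup on equipment only
    commissioning = planned_commissioning_h * commissioning_factor * hourly_rate
    subtotal = equipment_price + installation_cost + commissioning
    return subtotal * (1 + sales_tax_rate)


if __name__ == "__main__":
    # Hypothetical job: $10,000 of equipment, 5% duty, 25% markup,
    # $4,000 installation, 8 planned commissioning hours at $80/h, 10% tax.
    total = quote_total(10_000, 0.05, 0.25, 4_000, 8, 80, 0.10)
    print(f"Total: ${total:,.2f}")
```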
After this step has been completed, you need to make a final and more accurate drawing of the system
you are proposing. This can be hand drawn, but most CCTV designers these days use computers and
CAD programs. It is easier and quicker (once you get used to it), and it looks better.
Also, the hand-calculated price needs to be written in a quotation form, with a basic explanation of
how the system will work and what it will achieve. It is important for this to be written in a concise
and simple, yet precise form, because quotations and proposals (besides being read by security managers
and technical people) are also read by nontechnical people such as purchasing officers and accountants.
Often, spreadsheet programs are used for the purpose of precise calculation, and this is another chance
to double-check the equipment listing with your drawing and make sure nothing has been left out.
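The double-check described above amounts to comparing two sets of item counts. A minimal sketch, assuming both the drawing and the equipment listing can be exported as simple lists of item names (the names used here are hypothetical):

```python
from collections import Counter


def listing_discrepancies(drawing_items, quote_items):
    """Compare item counts between the system drawing and the quotation."""
    drawn, quoted = Counter(drawing_items), Counter(quote_items)
    missing = dict(drawn - quoted)   # on the drawing but short in the quote
    extra = dict(quoted - drawn)     # in the quote but not on the drawing
    return missing, extra


if __name__ == "__main__":
    drawing = ["camera"] * 4 + ["housing"] * 4 + ["bracket"] * 4
    quote = ["camera"] * 4 + ["housing"] * 4 + ["bracket"] * 3 + ["quad"]
    print(listing_discrepancies(drawing, quote))
```

Counter subtraction keeps only positive counts, so a forgotten bracket shows up as missing and an item quoted but never drawn shows up as extra.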
As with any quotation, it is more professional to have a set of brochures enclosed for the components
you are proposing.
In the quotation, you should not forget to include your company’s terms and conditions of sale which
will protect your legal position.
If the quotation is a response to a tender
invitation, you will most likely need to
submit a statement of compliance.
This is where you confirm whether
your equipment complies or does not
comply with the tender requirements.
This is also where you should highlight
any extra benefits and features
your equipment offers. In the tender,
you may also be asked to commit
yourself to the progress of the work
and supply work insurance cover, in
which case you will need a little bit of
help from your accountant and/or legal adviser.
Many specialized companies only
design and supply CCTV equipment,
in which case you will need to get
a quote from a specialized installer,
who, understandably, will need to inspect the site. It is a good practice, at the end, to have all the text,
drawings, and brochures bound in a single document, in a few copies, so as to be practical and efficient
for reviewing and discussions.
Installation considerations
If you are a CCTV system designer, you do not have to worry about how certain cables will be pulled through a ceiling or risers, or how camera poles will be mounted; that is the installer’s job. But it would be very helpful, and would save a lot of money, if you had some knowledge in that area. If nothing else, it is a
good practice, before you prepare the final quotation, to take your preferred installer on site, so that
you can take into account his or her comments and suggestions of how the practical installation should
be carried out.
First, the most important thing to consider is the type of cable to
be used for video, power, and data transmission, their distances
and protection from mechanical damage, electromagnetic
radiation, ultraviolet light, rain, salty air, and the like. For
this purpose it is handy to know the surrounding area, especially
if you have powerful electrical machinery next door, which
consumes a lot of current and could possibly affect the video
and control signals.
Powerful electric motors that start and stop often may produce
a very strong electromagnetic field and may even affect the
phase stability of the mains. This in turn will affect the camera
synchronization (if line-locked cameras are used) as well as the
monitor’s picture display.
Similarly, there might be a radio antenna installed in the
vicinity, whose radiation harmonics may influence the high-frequency signals your CCTV system uses.
Mounting considerations are also important at both the camera
and monitor end. If poles are to be installed, not only the height,
but also the elasticity of the poles is important. Steel poles, for
example, are much more elastic than concrete poles.
If a PTZ camera is installed, the zoom lens magnification factor
will also magnify the pole’s movement which could result from
wind, or vibrations from the pan/tilt head movement itself. This
magnification factor is the same as the optical magnification
(i.e., a zoom lens, when fully zoomed in, may magnify a 1
mm movement of the camera due to wind to a 1 m variation
at the object plane).
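The magnification effect described above can be quantified if the wind or pan/tilt movement is treated as a small angular tilt of the camera (a simplifying assumption). The shift at the object plane then depends only on the distance, but the shift as a fraction of the picture width grows in proportion to the focal length. The 4.8 mm default is the usual width of a 1/3" chip; the 0.1° tilt and the 8–160 mm zoom range are hypothetical values.

```python
import math


def object_shift_m(tilt_deg, distance_m):
    """Sideways shift of the field of view at the object plane."""
    return distance_m * math.tan(math.radians(tilt_deg))


def picture_shift_fraction(tilt_deg, focal_mm, sensor_width_mm=4.8):
    """The same shift expressed as a fraction of the picture width."""
    # Field width at distance D is D * w / f, so the fraction is f * tan(tilt) / w.
    return focal_mm * math.tan(math.radians(tilt_deg)) / sensor_width_mm


if __name__ == "__main__":
    tilt = 0.1  # hypothetical pole/bracket wobble, in degrees
    print(f"shift at 100 m: {object_shift_m(tilt, 100):.3f} m")
    for focal in (8, 160):  # wide and long end of a hypothetical 20x zoom
        pct = 100 * picture_shift_fraction(tilt, focal)
        print(f"f = {focal} mm: picture jumps {pct:.2f}% of its width")
```

At the wide end the wobble is barely visible (about 0.3% of the picture width), while fully zoomed in the same wobble moves the picture by almost 6% of its width, twenty times more.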
The shape of the pole is also very important:
hexagonal poles are less elastic than round ones
of the same height and diameter.
The same logic applies to camera and pan/tilt
head mounting brackets. A very cheap bracket of
a bad design can cause an unstable and oscillating
picture from even the best camera.
If the system needs to be installed in a prestigious
hotel or shopping center, the aesthetics are an
additional factor to determine the type of brackets
and mounting. It is especially important then not
to have any cables hanging.
The monitoring end demands attention to all
aspects. It needs to be durable (people will be working with the equipment day and night), aesthetically pleasing (it should look good), and practical (pictures should be easy to view without operators being tired by too much noise and flashing screens).
Since all of the cables used in a system wind
up at the monitoring end and in most cases this
is the same
room where
the equipment is located, special attention needs to be paid to cable
arrangement and protection.
Often, cables lying around on the floor for a few days (during the
installation) are subject to people walking on them, which is enough
weight to damage the cable characteristics, especially the coaxial
cable impedance. Remember, the impedance depends on the physical
relation between the center core, the insulation, and the shield. If
a bigger system is in question, it is always a better idea to propose
a raised floor, where all the cables are installed freely below the
raised floor.
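The dependence of impedance on cable geometry can be seen in the standard formula for an ideal coaxial line, Z0 = (138/√εr)·log10(D/d), where D is the inner diameter of the shield, d the diameter of the center core, and εr the relative permittivity of the insulation. The dimensions below are illustrative values, roughly those of a solid-polyethylene 75 Ω video cable, not figures for any particular product. A cable crushed underfoot is no longer round, so the formula only indicates the trend: any local change in D/d moves the impedance away from 75 Ω and causes reflections.

```python
import math


def coax_impedance_ohm(core_d_mm, shield_d_mm, eps_r):
    """Characteristic impedance of an ideal, perfectly round coaxial cable."""
    return 138.0 / math.sqrt(eps_r) * math.log10(shield_d_mm / core_d_mm)


if __name__ == "__main__":
    eps_pe = 2.25  # solid polyethylene dielectric
    nominal = coax_impedance_ohm(0.58, 3.7, eps_pe)
    # Hypothetical: the dielectric squeezed so the effective D/d ratio drops.
    squeezed = coax_impedance_ohm(0.58, 3.3, eps_pe)
    print(f"nominal {nominal:.1f} ohm, squeezed {squeezed:.1f} ohm")
```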
Sometimes, if a raised floor is not possible, many cables can be run
over a false ceiling. In such cases special care should be taken to
secure the cables, as they can become very heavy when bundled.
A hexagonal metal pole is stiffer than a circular pole.
Larger installations may want a patch panel for the video signals.
This is usually housed in a 19'' rack cabinet, and its purpose is to
break the cables with special coax link connectors so as to be able
to reroute them in case of a problem or testing.
Many installers fail to get into the habit of marking the cables
properly. Most of them would know all of the cables at the
time of installation, but two days later they can easily forget
them. Cable marking is especially critical with larger and more
complex systems.
Insist on proper and permanent cable markings as per your
drawings. There are plenty of special cable-marking systems on
the market. In addition, a list of all the numbers used on the
cables should be prepared and added to the system drawings.
Remember, good installers differ from bad ones in the way they
terminate, run, arrange, and mark the cables, as well as how they
document their work.
There is no standard for drawing CCTV system block diagrams,
as there is in electronics or architecture. Any clear drawing should be acceptable as long as you have
clearly shown the equipment used (i.e., cameras, monitors, VCRs) and their interconnection.
Many people use technical drawing aids, such as CAD programs, or other PC or Mac-based drawing
packages. Depending on the system size, it might be necessary to have two different types of
drawings: one of a CCTV block diagram showing the CCTV components’ interconnection and cabling
requirements, while the other could be a site layout with the camera positions and coverage area. In
smaller installations, just a block diagram may be sufficient.
The CCTV block diagram needs to show the system in its completeness, how the components are
interconnected, which part goes where, what type of cable is used, and where it is used.
If the site layout drawing is well prepared, it can later be used as a reference by the installer, as well
as by your customer and yourself when reviewing camera locations and reference names, and when discussing
possible changes.
When the CCTV system is installed and the job is finished, drawings may need small alterations,
depending on the changes made during the installation. After the installation, the drawings are usually
enclosed with the final documentation, which should also include manuals, brochures, and other relevant material.
Commissioning is the last and most important procedure in a CCTV system design before handing it over to the customer. It requires a thorough knowledge and understanding of both the customer’s requirements and the system’s capabilities. Quite often, CCTV equipment programming and setup are also part of this, including video matrix switcher programming, time-lapse VCR programming, camera setup, and so on.
Commissioning is usually conducted in close cooperation with the customer’s system manager and/or
operator(s), since a lot of settings and details are made to suit their work environment.
The following is a typical list of what is usually checked when commissioning:
• All wiring is correctly terminated.
• Supply voltage is correct to all appropriate parts of the system.
• Camera type and lens fitted are correct for each position.
• Operation of auto irises under various light levels is satisfactory.
• If VCRs are fitted, they are recording in the most efficient time-lapse mode (especially when multiplexed cameras are being recorded).
• If DVRs are installed, the pictures-per-second performance and image quality (compression setting) are checked.
• All system controls (pan/tilt, zoom, focus, etc.) are functioning properly.
• All pan and tilt limits are set correctly.
• Preset positions, if such cameras are used, are correct.
• The level of supplementary lighting is satisfactory.
• If a UPS is used, the system continues to work when the mains supply is disconnected, and a check is made of how long it does so.
Commissioning larger systems may take a bit longer than the smaller
ones. This is an evolution from the system on paper to the real thing,
where a lot of small and unplanned things may come up because of
new variations in the system concept. Customers, or users, can often say how they want things to be done only when they see the system taking shape. Commissioning in such cases may therefore take up to a few days.
Training and manuals
After the initial setup, programming, and commissioning are finished, the operators, or system users,
will need some form of training.
For smaller systems this is fairly straightforward and simple. Just a verbal explanation may be sufficient,
although every customer deserves a written user’s manual. This can be as simple as a laminated sheet
of paper with clearly written instructions.
Every piece of equipment should come with its own user’s manual, be it a time-lapse VCR, a camera, or a switcher, but these pieces are put together into a system with all its interconnections, and it is the system as a whole that has to be explained to the customer. Every detail should be covered, especially the alarm response and how the system is handled in such cases. This is perhaps the most important piece of information for the operators.
For larger systems, it is a
good idea to bind all the
component manuals, together
with the system drawings,
wiring details, and operator’s
instructions, in a separate
folder or a binder. Naturally,
for systems of a larger size,
training can be a more complex
task. It may even require some
special presentation with slides
and drawings so as to cover all
the major aspects.
Good systems are recognized not
only by their functionality but
also by their documentation.
Handing over
When all is finished and the customer is comfortable with what he or she is getting, it is time to hand
over the system. This is an official acceptance of the system as demonstrated and is usually backed by
the signing of appropriate documents.
It is at this point in time that the job can be considered finished and the warranty begins to be effective.
From now on, the customer takes over responsibility for the system’s integrity and operation.
If customers are happy with the job, they usually write an official note of thanks. This may be used
later, together with your other similar letters, as a reference for future customers.
Preventative maintenance
Regardless of the system’s use, the equipment gets old and dirty, and faults may develop due to various
factors. It is of benefit to the customer if you suggest preventative maintenance of the system after the
warranty expires.
This should be conducted by appropriately qualified persons and most
often it is the installer that can perform this task successfully. However,
a third party can also do this provided the documentation gives sufficient
details about the system’s construction and interconnection.
The system should be inspected at least twice a year, or in accordance with
the manufacturer’s recommendations and depending on how harsh the environment is. Where applicable, the inspection should be carried out in conjunction with a checklist, or equipment schedule, and should include checking for loose or corroded brackets; cleaning of housings or domes, monitor screens, VCR heads, and DVR air filters; hard disk upgrades; readjusting compression against the required length of recording; improving back-focus on some cameras; and so on.
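The trade-off between compression and length of recording can be estimated from the disk capacity and the average size of a compressed image. A minimal sketch, assuming a constant recording rate per camera and a hypothetical average image size; real image sizes depend on the DVR, its compression setting, and scene activity.

```python
SECONDS_PER_DAY = 86_400


def recording_days(disk_gb, cameras, pictures_per_s, avg_image_kb):
    """Days of recording before the disk is full (decimal GB and kB)."""
    bytes_per_day = cameras * pictures_per_s * avg_image_kb * 1_000 * SECONDS_PER_DAY
    return disk_gb * 1e9 / bytes_per_day


if __name__ == "__main__":
    # Hypothetical: 250 GB disk, 16 cameras at 3 pictures/s each.
    for kb in (30, 15):  # lower vs higher compression setting
        print(f"{kb} kB/image: {recording_days(250, 16, 3, kb):.1f} days")
```

Halving the average image size (more compression) doubles the recording time, at the cost of image quality, which is why this setting is worth reviewing during maintenance.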
Larger systems that include intelligent video matrix switchers may
require reprogramming of some functions, depending on the customer’s requirements.
Please handle your Test Chart with care.
The CCTV Labs Test Chart was designed primarily for indoor use,
and although it can be used outside,
please avoid direct exposure to water, rain and snow,
as well as long periods of exposure to direct sunlight.
Although the CCTV Labs Test Chart has been designed
specifically for the CCTV Industry,
it can be used to verify the quality of other visual,
transmission and recording systems.
DISCLAIMER: CCTV Labs Pty. Ltd. has designed this chart with the best intentions to offer an objective and independent
measurement of various video signal characteristics, and although all the details are as accurate as we can make them, we
do not take any responsibility for any damage or loss resulting from the use of the chart.
14. Video testing
This last chapter will attempt to explain how the CCTV Labs test chart and the CCTV Labs test pattern generator TPG-8 can be used to make certain system evaluations and measurements. These are by no means the only tools on the market. Other test charts and generators are available, but these were specifically developed for the CCTV industry and are readily available. Furthermore, the test chart is traditionally reproduced on the back cover of this book, so it is important for the reader to know how to use it. To repeat what we said earlier in the book: for more accurate resolution and color measurements, we encourage you to obtain the larger A3 format test chart from the CCTV Labs web site.
The CCTV Labs test chart
In order to help you determine your camera resolution, as well as check other video details, CCTV
Labs Pty. Ltd. has designed this special test chart in A3 format, a reproduction of which also appears
on the back cover of this book.
We have tried to make it as accurate and informative as possible and, although it can be used in broadcast applications, it should not be taken as a substitute for the various broadcast test charts. It
should be used for CCTV applications only and as a guide in comparing different equipment and/or
transmission media.
This test chart has been updated with some new features compared to the previous edition. The main addition is a set of white lines that allow you to check whether you can recognize a person at a certain distance. This procedure is based on the recommendations of VBG Installationshinweise für Optische Raumüberwachungsanlagen (ORÜA) SP 9.7/5 (installation guidelines for optical surveillance systems), and has been accepted by the European Standards, EN 50132-7, and the Australian CCTV Standards.
With this chart you can check a lot of
other details of a video signal, primarily
the resolution, but also bandwidth, monitor linearity, gamma, color reproduction,
impedance matching, reflection, and
digital recorders’ picture quality at various
compression levels.
Before you start testing
Use high-quality lens
For the best quality picture reproduction from your camera, you first have to select a very good lens (one with much better resolution than the CCD chip itself). The smaller the CCD/CMOS chip (e.g., 1/4" compared with 1/3"), the more critical the lens quality. When the number of pixels is divided by the physical chip size (its width, for example), the result is the number of pixels per millimeter. If we assume, for example, a 1/3" CCD chip with 752 horizontal pixels, and knowing that the width of a 1/3" chip is 4.8 mm, the pixel pitch will be approximately 6.4 μm, which is equivalent to about 157 pixels per mm. To match this, the lens needs an optical resolution of at least half this number (i.e., 78 lines/mm). Half the number is used because in optics, when counting lines-per-millimeter resolution, only the black lines are counted, as opposed to television, where both black and white lines are counted. Check the resolution of the lens, expressed in lines/mm, with your lens supplier. Of all the various lenses (fixed, vari-focal, zoom), the best choice would be a fixed focal-length manual iris lens.
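The arithmetic in the paragraph above can be reproduced with a few lines of code; the 4.8 mm width and 752 pixels are the example values from the text, and the halving reflects the convention that optical resolution counts only the black lines.

```python
def lens_requirement(sensor_width_mm, horizontal_pixels):
    """Pixel pitch (um), pixels per mm, and the minimum lens resolution (lines/mm)."""
    pitch_um = sensor_width_mm / horizontal_pixels * 1_000
    pixels_per_mm = horizontal_pixels / sensor_width_mm
    # Optics counts only black lines, TV counts black and white, hence the half.
    lens_lines_per_mm = pixels_per_mm / 2
    return pitch_um, pixels_per_mm, lens_lines_per_mm


if __name__ == "__main__":
    pitch, density, lens = lens_requirement(4.8, 752)
    print(f"pitch {pitch:.1f} um, {density:.0f} pixels/mm, lens >= {lens:.0f} lines/mm")
```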
Shorter focal lengths, showing angles of view wider
than 30°, should usually be avoided because of the
spherical image distortion they may introduce. A good
choice for 1/2'' CCD cameras would be a 12 mm, 16
mm, or 25 mm lens. For 1/3'' CCD cameras, an 8 mm,
12 mm, or 16 mm lens would be a good choice.
The longer focal length will force you to position the
camera further away from the test chart. For this purpose it is recommended that you get a photographic
tripod for the camera.
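The angles of view and camera-to-chart distances implied by these lens suggestions can be estimated with the usual geometric formulas. The sketch assumes chip widths of 6.4 mm (1/2") and 4.8 mm (1/3"), and a hypothetical 400 mm for the width of the chart area that has to fill the picture; measure your own chart (up to the black/white arrows) before relying on the distances.

```python
import math

CHIP_WIDTH_MM = {'1/2"': 6.4, '1/3"': 4.8}


def horizontal_angle_deg(chip_width_mm, focal_mm):
    """Horizontal angle of view of a lens on a given chip width."""
    return math.degrees(2 * math.atan(chip_width_mm / (2 * focal_mm)))


def distance_for_width_m(target_width_mm, chip_width_mm, focal_mm):
    """Approximate camera-to-chart distance for the target to fill the width.

    Simple pinhole approximation, valid when the distance is much larger than f.
    """
    return target_width_mm * focal_mm / chip_width_mm / 1_000


if __name__ == "__main__":
    chart_width_mm = 400  # hypothetical width up to the black/white arrows
    for chip, focals in (('1/2"', (12, 16, 25)), ('1/3"', (8, 12, 16))):
        width = CHIP_WIDTH_MM[chip]
        for focal in focals:
            print(f"{chip} {focal} mm: {horizontal_angle_deg(width, focal):.1f} deg, "
                  f"{distance_for_width_m(chart_width_mm, width, focal):.2f} m")
```

With these figures, 12 mm on a 1/2" chip gives just under 30°, while 8 mm on a 1/3" chip gives about 33°, so the 30° figure is best read as a guideline; a 25 mm lens on a 1/2" chip would need roughly 1.6 m to the chart.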
Use high-quality monitor
Next, you must use a high-resolution monitor with an
underscan feature in order to see 100% of what the
camera “sees.”
Most standard CCTV monitors do not have this feature,
but some brands do have it.
When testing camera resolution the best choice would
be a high-quality monochrome (B/W) monitor since
their resolution reaches 1000 TV lines in the center.
Color monitors are acceptable only if they are of broadcast, or near-broadcast, quality. To qualify
for this, a monitor should have at least 500 TV lines of horizontal resolution. Understandably, B/W
cameras that have over 500 TV lines of horizontal resolution cannot have their resolution tested with
such a monitor, but the majority of color cameras (which have up to 480 TV lines) should be okay for
testing with such a monitor.
Setup procedure
Position the chart horizontally and perpendicular to the optical axis of the lens (see the accompanying
diagram). The camera has to see a full image of the chart exactly to the black/white triangular arrows. To
see this you must switch the monitor to the underscan position so you can view 100% of the image.
If you do not have a monitor with an underscanning feature, allow approximately a 10% narrower view
of the total chart width (measuring up to the black/white arrows intersection), which might be close to
what a normal overscanning monitor would show.
This is, however, not precise enough for checking resolution. So, if you only have a standard monitor, the following little trick might substitute for the more expensive underscanning monitor.
Position the camera with its tripod as close as possible to displaying the full image. Adjust the monitor’s vertical hold so that you can see the vertical blanking sync signal (the horizontal black bar between TV fields). You should be able to set the V-hold control so that a steady horizontal bar appears somewhere in the middle of the screen. Then, try to adjust the camera with its tripod and/or lens so that you can see both the top and bottom positional triangles on the test chart
touching the edge of the black vertical blanking bar (circled in red). Once you adjust the vertical camera
position it is easy to adjust the horizontal so that the test chart picture is in the middle of the monitor
screen. Then, and only then, can you read precise data from the test chart.
Illuminate the chart with two diffused lights, one on each side, while trying to avoid light reflecting off the chart. The test chart surface has a matte finish, which minimizes reflections, but ideally the light’s angle of incidence should still be more than 45° (measured from the perpendicular to the chart) so that the chart is uniformly illuminated. You can buy a calibrated light source if you want a good illumination reference, but as a general rule the following tungsten globes produce approximately these color temperatures and luminous efficacies:
500 W tungsten: 3200 K (approximately 27 lumens/watt)
200 W tungsten: 2980 K (approximately 17.5 lumens/watt)
75 W tungsten: 2820 K (approximately 15.4 lumens/watt)
Ideally, this light source should be reflected from a white metal umbrella type lamp, similar to what
photographic studios use for diffusing the light. A uniform light on the test chart is very important for
accurate and consistent measurements.
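A rough idea of the illumination these lamps give at the chart can be obtained from the efficacy figures above. This sketch treats each globe as a bare point source radiating evenly in all directions (an assumption; a reflector or umbrella changes the result considerably) and applies the cosine and inverse-square laws; the 2 m distance and 45° incidence are hypothetical example values.

```python
import math


def chart_illuminance_lux(lamp_watts, lumens_per_watt, distance_m,
                          incidence_deg=45.0, lamps=2):
    """Illuminance on the chart from identical, bare, isotropic lamps."""
    flux_lm = lamp_watts * lumens_per_watt
    intensity_cd = flux_lm / (4 * math.pi)        # lumens spread over 4*pi sr
    per_lamp = intensity_cd * math.cos(math.radians(incidence_deg)) / distance_m ** 2
    return lamps * per_lamp


if __name__ == "__main__":
    # Two 500 W tungsten lamps (about 27 lm/W), 2 m from the chart at 45 degrees.
    print(f"{chart_illuminance_lux(500, 27, 2.0):.0f} lux")
```

A lux-meter reading at the chart remains the reference; the calculation only helps in choosing lamp sizes and distances.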
It would be an advantage to have these two lights controlled by a light dimmer, because then you can also test the camera’s minimum illumination. Naturally, if this needs to be done, the whole operation will need to be conducted in a room without any external light. Also, if you want to check the low light level performance of your camera, you will need to obtain a precise lux-meter, or perhaps use one of our methods to convert a photographic camera’s EV light reading into lux (check “CCTV focus” issue 9, or download the article from the CCTV Labs web site). When using a color camera, please note that the camera needs to be switched on after the lights have been turned on, so that the color white balance circuit detects its white point.
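As a rough cross-check on a lux-meter, a widely used relation between an incident-light exposure value (EV) at ISO 100 and illuminance is E ≈ 2.5 × 2^EV lux, which assumes an incident-meter calibration constant of about 250. This is not necessarily the method described in the referenced CCTV Focus article; the sketch below is only an approximation under that assumption, and reflected-light readings (for example, taken off the chart itself) will differ.

```python
import math

def ev_to_lux(ev, iso=100, calibration_constant=250.0):
    """Approximate illuminance (lux) from an incident-light EV reading.

    Assumes the common relation E = C * 2**EV / ISO with C ~ 250 for
    incident meters; real meters use slightly different constants.
    """
    return calibration_constant * 2.0 ** ev / iso

def lux_to_ev(lux, iso=100, calibration_constant=250.0):
    """Inverse of ev_to_lux: the EV a meter would show for a given illuminance."""
    return math.log2(lux * iso / calibration_constant)

for ev in (0, 3, 6, 10):
    print(f"EV {ev:>2} at ISO 100 -> about {ev_to_lux(ev):7.1f} lux")
```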
Setting up the test chart without the underscanning monitor
For low light level camera testing, a good and relatively consistent source of light is a standard candle placed
approximately 1 m from the test chart. By definition, this produces an illumination close to 1 lux (1 lumen per m²).
Because the candle flame cannot always be controlled to the same intensity, this illumination is only calculated
and should be taken with caution. It should, however, be a sufficiently good reference when comparing the low
light performance of various cameras. In such cases the candle should be positioned close to the camera in order
to have uniform light at the test chart plane. The best lens to use in such a case is a manually controlled zoom
lens, so that the camera can be set to have 100% of the test chart in its field of view, while the candle is next
to it: not in front (it will affect the video signal level), but also not behind the camera (it will create a shadow).
Using a candle light at around 1 m from the test chart can be useful in
comparing and testing low light camera performance
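The candle figure follows from the inverse-square law for a point source: the illuminance on a surface facing the source is the luminous intensity in candelas divided by the square of the distance in meters. A minimal sketch, treating the candle as a point source of about 1 cd (an approximation, as cautioned above), shows how quickly the figure changes if the distance is not kept at 1 m:

```python
import math

def point_source_lux(intensity_cd, distance_m, incidence_deg=0.0):
    """Illuminance (lux) from a point source: E = I * cos(theta) / d**2."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return intensity_cd * math.cos(math.radians(incidence_deg)) / distance_m ** 2

# A standard candle is taken as roughly 1 cd.
for d in (0.5, 1.0, 1.5, 2.0):
    print(f"candle at {d} m -> about {point_source_lux(1.0, d):.2f} lux")
```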
Position the camera on a tripod, or a fixed bracket, at a distance that
will allow you to see a sharp image of the full test chart. The best focus
sharpness can be achieved by looking at the center of the “Focus target”
section. Make sure the black/white arrows’ tips touch the underscanned
picture edge, or the black vertical sync bar if you are using the alternative method described above.
In the latest version of the test chart we have added concentric circles in each of the four corners, which can
also help you adjust the perpendicular position of the optical axis relative to the test chart plane. Furthermore,
these circles can also help you determine whether the CCD chip of the camera is positioned at a correct 90° relative
to the optical axis. Some cameras with lower quality back-focus mechanisms have obvious misalignment of
the CCD chip plane, which can be determined by looking at the concentric circles. In order to obtain
minimum depth of field, open the iris fully for such a measurement.
For optimum test chart sharpness, set the lens iris to the middle position (F-5.6 or F-8), which gives the
best optical resolution in most lenses, and then adjust the light dimmer to get a full dynamic range video
signal. In order to see this, an oscilloscope will be required.
Do not forget to switch off all the video processing circuits in the camera you are testing (i.e., AGC,
CCD-iris, BLC).
Make sure that all the impedances are matched, that is, that the camera sees 75 ohms at the end of the coaxial cable.
What you can test
To check the camera resolution (either vertical or horizontal), you have to determine the point at which the
four sharp triangular lines inside the circle converge into three. That is the point where the resolution limit
can be read off the chart. The example on the right shows a horizontal resolution of approximately 550 TV lines.
For a more precise reading of the horizontal resolution, as per the broadcast definition, you would need an
oscilloscope with a line selection feature. The resolution limit is then determined with the oscilloscope rather
than by relying on the visual performance of your monitor. By definition, the resolution limit is where the depth
of modulation is around 5%. To make use of such oscilloscopes, we have redesigned the test chart for easier and
more accurate measurements with line selection oscilloscopes.
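With a line selection oscilloscope, the depth of modulation of a line group can be expressed as the peak-to-peak amplitude of that group divided by the white-to-black amplitude of the large reference areas. The sketch below assumes you have already read those voltages off the scope; the function names and the sample readings are hypothetical. It then interpolates the TV line figure at which the depth falls to 5%:

```python
def modulation_depth(group_max_mv, group_min_mv, white_mv, black_mv):
    """Depth of modulation (%) of a fine line group relative to large white/black areas."""
    return 100.0 * (group_max_mv - group_min_mv) / (white_mv - black_mv)

def resolution_limit(readings, threshold=5.0):
    """Interpolate the TVL at which modulation depth falls to `threshold` percent.

    readings: (tvl, depth_percent) pairs sorted by increasing TVL.
    Returns None if the depth never crosses the threshold.
    """
    for (t0, d0), (t1, d1) in zip(readings, readings[1:]):
        if d0 >= threshold >= d1:
            if d0 == d1:
                return float(t0)
            return t0 + (d0 - threshold) * (t1 - t0) / (d0 - d1)
    return None

# Hypothetical scope readings (mV above blanking) for one selected TV line.
print(modulation_depth(392.5, 357.5, 700.0, 0.0))
readings = [(300, 40.0), (400, 20.0), (500, 8.0), (600, 2.0)]
print(resolution_limit(readings))
```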
Coordinates of TV lines for 5% depth of modulation
If the alignment with the white/black arrows is precise, then line counting starts from the top of the
monitor, with 288 active lines in a PAL TV field and 240 in an NTSC TV field.
In order to verify (or measure) a horizontal resolution of, for example, 400 TVL, the line counted as
number 133 in a PAL TV signal should show around 5% depth of modulation. For NTSC this measurement
should be around line 113. In order to see and more easily locate the depth of modulation, the
horizontal resolution measurement lines have been replicated at the very beginning of the test chart,
so that they can be selected easily and the TV line trigger locked when using an oscilloscope.
An actual photo of a camera showing around 460 TVL
Please note that these are approximate line coordinates; errors will depend on the precision of your
test chart/camera alignment, as well as the calibration and accuracy of your oscilloscope display.
Vertical resolution in cameras is seldom debated because it is limited by the number of scanning lines of the PAL
or NTSC system; nevertheless, the horizontally positioned four lines of decreasing thickness are sufficient to
measure it.
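To put a number on “limited by the scanning lines”: the active line count sets an upper bound, and in practice the usable vertical resolution is commonly estimated by multiplying it by the Kell factor, often quoted as about 0.7. The factor is an empirical rule of thumb rather than a figure taken from this chart, so treat the result as an estimate:

```python
def vertical_tvl_estimate(active_lines, kell_factor=0.7):
    """Estimated vertical resolution in TV lines (the Kell factor is a rule of thumb)."""
    return active_lines * kell_factor

for system, lines in (("PAL", 576), ("NTSC", 480)):
    print(f"{system}: {lines} active lines -> about {vertical_tvl_estimate(lines):.0f} TVL")
```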
Other important measurements
If you want to check the video bandwidth of the signal,
read the megahertz number next to the finest group of lines
where black and white lines are distinguishable.
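The megahertz markings and TV line figures are related through the duration of the active part of a TV line. One signal cycle produces one black and one white line, and TV lines are counted over a width equal to the picture height (3/4 of the picture width on a 4:3 screen), so TVL ≈ 2 × bandwidth × active line time × 3/4. The sketch below assumes active line times of about 52 µs for PAL and 52.6 µs for NTSC; it reproduces the familiar rule of thumb of roughly 80 TVL per MHz.

```python
ACTIVE_LINE_US = {"PAL": 52.0, "NTSC": 52.6}  # approximate active line durations (assumed)

def mhz_to_tvl(mhz, system="PAL"):
    """Horizontal resolution (TV lines per picture height) for a given bandwidth."""
    # MHz * microseconds = cycles per active line; 2 lines per cycle; 3/4 for picture height.
    return 2 * mhz * ACTIVE_LINE_US[system] * 3 / 4

def tvl_to_mhz(tvl, system="PAL"):
    """Bandwidth (MHz) needed to carry a given horizontal resolution."""
    return tvl / (2 * ACTIVE_LINE_US[system] * 3 / 4)

print(mhz_to_tvl(5.0))            # 5 MHz in PAL
print(round(tvl_to_mhz(400), 2))  # 400 TVL in PAL
```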
The tilted bandwidth lines (tilted at exactly 4° relative
to the horizontal axis) inside the large circle will help
you determine artifacts produced by the CCD chip
pixels, which depend on the pixel size, alignment, and
color mosaic.
The small concentric lines in the center square of the test
chart can be used for easy focusing and/or back-focus
adjustments. Prior to doing this, you should check the exact distance between
the camera and the test chart. In most cases, the distance should be measured
to the plane where the CCD chip resides. Some lenses, however, may have the
indicator of the distance referring to the front part of the lens.
The large circle reproduction will show you the linearity of your monitor, since
CCD cameras have no geometrical distortion by design. Sometimes
linearity can be more easily checked by measuring the vertical and
horizontal length of the 6 × 6 squares to the left of the focus square. These
squares are tilted exactly 4° in order to show you different artifacts
from pixel size and geometry.
The wide black and white bars on the left-hand side have a twofold
function. First, they will show you whether your impedances are matched
properly or whether you have signal reflection, that is, a spillage of the white into the black area (and the other way around),
which is a sign of reflections from the end of the line. The same bars
can be used to test long cable run quality, VCR playback, and other
transmission or reproduction media.
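The amount of reflected signal is set by the mismatch between the cable’s characteristic impedance and the load at its end, described by the reflection coefficient Γ = (Z_load − Z_0)/(Z_load + Z_0). A quick sketch for 75 Ω coax; the loads listed are examples, including the common mistake of leaving two 75 Ω terminations switched on in a looped-through chain:

```python
import math

Z0 = 75.0  # characteristic impedance of CCTV coaxial cable, ohms

def reflection_coefficient(z_load, z0=Z0):
    """Fraction of the incident voltage reflected back from the line end."""
    if math.isinf(z_load):
        return 1.0  # open (unterminated) end reflects everything
    return (z_load - z0) / (z_load + z0)

def return_loss_db(z_load, z0=Z0):
    """Return loss in dB; infinite for a perfect match."""
    gamma = abs(reflection_coefficient(z_load, z0))
    return math.inf if gamma == 0 else 20 * math.log10(1 / gamma)

loads = {"correct 75 ohm": 75.0, "double terminated": 37.5,
         "high-Z input left unterminated": 1e6, "open end": math.inf}
for name, z in loads.items():
    g = reflection_coefficient(z)
    print(f"{name:32s} gamma = {g:+.3f}, return loss = {return_loss_db(z):.1f} dB")
```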
Second, you can determine whether your camera/lens combination gives sufficient detail to recognize human activity, such
as an intrusion or holdup. For this reason you must position the
camera at such a distance that it sees a 4.5 m width at the test chart
plane. If you can distinguish the bars, then your camera/lens
combination is good for recognizing activity. Obviously, reading the bars at number 1 is better than at number 2. Use one of the
formulas described under the focal length section to find out the
distance required with the lens you have.
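The focal length formulas referred to above come from the pinhole approximation: the ratio of sensor width to focal length equals the ratio of scene width to object distance. A minimal sketch, assuming the commonly quoted nominal sensor widths (3.2 mm for 1/4", 4.8 mm for 1/3", 6.4 mm for 1/2", 8.8 mm for 2/3"), works out the camera distances for the 4.5 m activity test and the 1.5 m identification test described in this section:

```python
SENSOR_WIDTH_MM = {'1/4"': 3.2, '1/3"': 4.8, '1/2"': 6.4, '2/3"': 8.8}  # nominal widths

def distance_for_width(focal_mm, scene_width_m, sensor='1/3"'):
    """Camera-to-chart distance (m) at which the lens covers scene_width_m.

    Pinhole approximation D = f * W / w; good when D is much larger than f.
    """
    return focal_mm * scene_width_m / SENSOR_WIDTH_MM[sensor]

for width in (4.5, 1.5):  # activity test, identification test
    d = distance_for_width(8.0, width, '1/3"')
    print(f'8 mm lens on a 1/3" camera sees {width} m wide at about {d:.2f} m')
```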
The white tilted bars on the right-hand side have a similar purpose to the thicker
ones on the left-hand side. If you recognize the lines near the green letter C,
or even better B and A, when the camera is at a distance to see a 1.5 m width at
the chart, then you can identify a person at such a distance. A is better than B,
which is better than C. Again, to find out at what distance you need to position the camera so as to see a 1.5 m width, use the same formula mentioned
earlier. This test can be very useful to find out whether your camera/lens combination
gives sufficient detail. Such a measurement is even more informative
in determining the playback quality of a digital video recorder, since
there is no objective method of determining compression/decompression quality in CCTV.
Another measurement that can be done with the image of the children’s
faces is face recognition, as defined by the CCTV standards, where
a person is required to occupy 100% of the picture height, in which
case the face occupies around 15% of the test chart height. The face
sizes are made to fulfill these requirements.
The skin color of the three kids will also give you a good
indication of your system’s reproduction of Caucasian human flesh
color. In such a case you must take into account the color temperature
of your light source.
For an even more accurate color test of your camera, use the color scale at the top of the chart, which
consists of printed colors matching the color bars produced by a typical broadcast test generator. If you have
a vectorscope, you can check the color output on one of the lines scanning the color bar. As with any
color reproduction system, the color temperature of the source is very important, and in most cases it
should be a daylight source. Most ATW (Automatic Tracking White) color cameras should be able
to compensate for light sources of various color temperatures. This means that by switching the light between
natural and artificial and observing how the color reproduction adjusts, the ATW capability
of the camera can be tested indirectly.
The colors are chosen in accordance with the broadcast television standards, in the standard order starting
from the lightest: white, yellow, cyan, green, magenta, red, blue, and black. Using a vectorscope, you
can check accurate color reproduction and white balance on a camera.
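The “lightest first” order of the bars follows from the luminance equation of standard definition television, Y = 0.299R + 0.587G + 0.114B. A short check with 100% saturated bars (R, G, and B each either 0 or 1) confirms that each bar is darker than the one before it:

```python
BARS = {  # name: (R, G, B) for 100% color bars, in chart order
    "white": (1, 1, 1), "yellow": (1, 1, 0), "cyan": (0, 1, 1),
    "green": (0, 1, 0), "magenta": (1, 0, 1), "red": (1, 0, 0),
    "blue": (0, 0, 1), "black": (0, 0, 0),
}

def luminance(r, g, b):
    """SD television luminance (ITU-R BT.601 weights)."""
    return 0.299 * r + 0.587 * g + 0.114 * b

levels = [luminance(*rgb) for rgb in BARS.values()]
for name, y in zip(BARS, levels):
    print(f"{name:8s} Y = {y:.3f}")
assert all(a > b for a, b in zip(levels, levels[1:])), "bars not in descending luminance"
```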
The gray background is set to be exactly 30% gray, which, together with the gray scale at the bottom,
can be used to check the gamma setting of the camera/monitor.
This gray scale is a linear one, as opposed to some logarithmic scales you may find. A linear scale is
chosen because the majority of today’s cameras have a linear response, which makes it easy to adjust
various levels on an oscilloscope.
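Because the scale is linear, the expected oscilloscope level of each step is simply an equal fraction of the black-to-white range. The number of steps on the chart is not stated here, so the sketch below treats it as a parameter (11 steps is only an example), and it assumes a PAL-style signal with black at 0 mV and peak white at 700 mV above blanking:

```python
def gray_step_levels_mv(steps=11, black_mv=0.0, white_mv=700.0):
    """Expected luminance levels (mV above blanking) of a linear gray scale."""
    if steps < 2:
        raise ValueError("need at least two steps")
    increment = (white_mv - black_mv) / (steps - 1)
    return [black_mv + i * increment for i in range(steps)]

for i, mv in enumerate(gray_step_levels_mv()):
    print(f"step {i:2d}: {mv:6.1f} mV")
```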
The gray scale can also be used to set up the optimum contrast/brightness of a monitor. The purpose
is to set brightness and contrast so that all levels of gray are visible. Typically,
lower contrast is better, as it gives richer gray scale detail and also produces sharper images (the electron beam is thinner). More importantly, the monitor phosphor will last much longer when the settings are
made this way. Very often, in order to have optimum monitor settings, external light sources have to
be minimized. Always make an effort not to position a monitor screen facing a bright window.
Finally, we have three different sizes of license plates in the top right-hand corner. The 5% plate is the minimum requirement, as per CCTV standards, where the characters represent 5% of the test chart height. If
you can clearly read this license plate after it has gone through
your system, perhaps being recorded and played back, your
system is compliant with the standards. If you
manage to get a clear reading of the 4%, or, better still, the 3% plate,
your system has even better license plate reading capability.
Understandably, the camera has to be positioned so that it
sees 100% of the test chart.
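Since the percentages refer to the height of the test chart, which fills the picture height, they translate directly into scanning lines. The sketch below is a simple proportion, nothing more; it shows how few active lines (per frame; a single field carries half) a 5%, 4%, or 3% character occupies, and how many cover a face at 15%:

```python
ACTIVE_LINES = {"PAL": 576, "NTSC": 480}  # active lines per frame

def lines_for_fraction(percent, system="PAL"):
    """Number of active picture lines covered by an object of given % picture height."""
    return percent * ACTIVE_LINES[system] / 100

for pct in (15, 5, 4, 3):
    pal, ntsc = lines_for_fraction(pct, "PAL"), lines_for_fraction(pct, "NTSC")
    print(f"{pct:2d}% of picture height: {pal:5.1f} lines PAL, {ntsc:5.1f} lines NTSC")
```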
Getting the best possible picture
To have the best possible picture setting on a monitor follow these steps:
• Set the camera to 1 VPP video signal, while viewing the full image of the test chart.
• Set the monitor contrast pot in the middle position.
• Set the brightness pot to see all steps of the gray scale.
• While doing the above, readjust the contrast pot if necessary.
• Observe and note the light conditions in the room while setting this up, since these dictate the
contrast/brightness setting combination.
• Always use a minimum amount of light in the monitor room so that you can set the monitor
brightness pot at the lowest position. When this is the case, the electron beam
of the monitor’s CRT is at its sharpest, since it uses fewer electrons. The monitor picture is then not
only sharper, but the life expectancy of the phosphor is also prolonged.
Lately, there has been an increasing number of LCD monitors with composite video inputs. Please
be aware of the re-sampling such monitors perform in order to fit a composite analog video signal onto a
(typically) XGA screen (1024 × 768 pixels). Because of this, LCD monitors are not recommended for
resolution testing.
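The reason is that the scaling factors involved are not whole numbers, so the monitor must interpolate new pixels rather than simply repeat them. Assuming the ITU-601 frame sizes and an XGA panel:

```python
from fractions import Fraction

PANEL = (1024, 768)  # XGA
SOURCES = {"PAL": (720, 576), "NTSC": (720, 480)}

def scale_factors(system, panel=PANEL):
    """Exact horizontal and vertical scaling needed to fill the panel."""
    w, h = SOURCES[system]
    return Fraction(panel[0], w), Fraction(panel[1], h)

for system in SOURCES:
    sx, sy = scale_factors(system)
    integer = sx.denominator == 1 and sy.denominator == 1
    print(f"{system}: horizontal x{float(sx):.3f}, vertical x{float(sy):.3f}, "
          f"{'pixel repetition' if integer else 'interpolation needed'}")
```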
If image testing needs to be done using a frame grabber board on a PC, use the highest resolution you can
find, but not less than the full ITU-601 recommendation (720 × 576 for PAL, and 720 × 480 for NTSC).
Again, in such a case, native camera resolution testing cannot be performed accurately, as the signal is
digitized by the frame grabber. If, however, various digital video recorders are to be compared, then the
“artificial (digitized) resolution” can be checked and compared.
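The “artificial” resolution ceiling of a digitized signal follows from the sampling theorem: N samples across the active line can represent at most N/2 cycles, that is, N alternating black and white lines across the full picture width, or N × 3/4 TV lines per picture height on a 4:3 screen. A minimal sketch of that upper bound (real converters and anti-aliasing filters fall somewhat short of it):

```python
def max_tvl_from_samples(samples_per_line, aspect_w=4, aspect_h=3):
    """Upper bound on horizontal resolution (TVL per picture height) after digitizing."""
    cycles = samples_per_line / 2          # Nyquist limit
    lines_across_width = 2 * cycles        # one black + one white line per cycle
    return lines_across_width * aspect_h / aspect_w

print(max_tvl_from_samples(720))  # full ITU-601 frame grabber or DVR
print(max_tvl_from_samples(352))  # a recorder storing CIF-width images
```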
And last, but not least, when illumination is measured, note that the CCTV Labs test chart has approximately
60% reflectivity. For more accurate lux measurement use industrial light meters with a lux scale, such as
Gossen or Minolta models. If these are not available, please refer to Chapter 2 for instructions on how to
use a photographic camera to read lux.
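If you need to relate the illuminance falling on the chart to the luminance the camera actually sees, a matte chart can be approximated as a perfectly diffusing (Lambertian) surface, for which L = ρE/π cd/m², where ρ is the reflectivity. A sketch using the chart’s approximately 60% reflectivity (the Lambertian behavior is an assumption):

```python
import math

CHART_REFLECTIVITY = 0.60  # approximate chart reflectivity

def chart_luminance_cd_m2(illuminance_lux, reflectivity=CHART_REFLECTIVITY):
    """Luminance of an ideal diffusing surface: L = rho * E / pi."""
    return reflectivity * illuminance_lux / math.pi

for lux in (1, 10, 100, 1000):
    print(f"{lux:5d} lux on the chart -> about {chart_luminance_cd_m2(lux):8.3f} cd/m2")
```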
For the latest updates and instructions on various measurements, please visit the CCTV Labs web site
regularly at
Measurement of the digital image compression quality
The CCTV Labs test chart can also be used to determine and compare the quality of various digital
compression techniques, regardless of type. For objective measurements and comparisons, all
you need is to be able to export a still, noncompressed image from the digital recorder (usually a
bitmap, BMP).
To do this, set up a camera to see 100% of the test chart (use the underscanning monitor). Adjust the
camera and lens parameters to produce the best quality picture (1 VPP video signal, focus adjusted on
the “Focus target,” lens iris at the middle setting).
Record the video signal on the (digital) recording equipment, then play it back and export a selected
image of the full test chart. When exporting, select BMP format for maximum picture quality, since BMP
adds no compression of its own on top of the compression used in the recorder. Copy the file(s) onto a PC and
open them with photo editing software (Photoshop, Photo-Paint, and the like). Open all the images that
you want to compare and select full screen display. Switch between the various compressions and images
using “Ctrl-Tab” for the easiest comparison.
Various compression schemes have various compression artifacts. JPG, for example, produces blockiness in 8 × 8 pixel blocks, while Wavelet smears the low detail areas, as shown in Chapter 9. The
children’s faces in the test chart are the area where compression quality can be determined most easily. Other
parts of the chart, however, will give you other valuable details about picture quality resulting
from the recording/compression. This can only be learned by experimenting.
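Visual comparison can be complemented by a simple numerical figure. One widely used metric, which is not part of the procedure described in this chapter, is the peak signal-to-noise ratio (PSNR) between an uncompressed reference frame and the exported frame. The sketch below assumes both images have already been loaded as equally sized sequences of 8-bit grayscale pixel values (for example with an imaging library of your choice); the pixel values in the usage line are made up. PSNR correlates only loosely with perceived quality, so it does not replace looking at the faces and other chart details.

```python
import math

def psnr(reference, test, max_value=255):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical 2 x 2 example: one pixel differs by 2 levels, so MSE = 1.
print(round(psnr([10, 20, 30, 40], [10, 20, 30, 42]), 2))
```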
For more BMP examples visit:
JPG (on the left) and Wavelet (on the right) exports
The CCTV Labs test pattern generator TPG-8
The CCTV Labs programmable test pattern generator TPG-8 is a helpful tool in a variety of closed circuit and
broadcast television applications where certain properties and qualities
of a video signal have to be determined.
Because the TPG-8 is fully programmable (any existing or
custom made test patterns can be inserted), there is no limit
to what can be checked.
The following details, however, are the most common video
signal properties that could be analyzed:
1. Video signal bandwidth (MHz).
2. Linearity, gamma, and age of monitors.
3. Optimizing the contrast and brightness of a video
monitor display.
4. Digital video recording, playback, and export
image quality.
5. Minimum or maximum video signal levels acceptable by a device.
6. Transmission link (device) quality.
7. Ground loop problems.
8. Image distortion after digitization of images.
9. Video signal dynamic range.
10. Video signal impedance matching or line end signal reflection.
11. VCR playback quality.
12. Face identification and recognition capability of a recorder (DVR).
13. Vehicle license plate recognition capability of a recording system.
There are many more custom-designed test patterns that can be created to suit your specific needs.
CCTV Labs encourages development and exchange of such designed test patterns among users. We
will endeavor to include the most interesting on our web site for free download.
How you could use the TPG-8
In a typical CCTV system installation, if the signal displayed on a monitor is bad, any part of the video
signal path can be “blamed” for it.
Some of the factors that could be the reason for a problem in a CCTV system are:
• The camera itself
• Bad lens setting
• Incorrect back-focus
• Bad cable termination
• Ground loop
• Excessive cable length
• External electromagnetic interference
• Incorrect amplifier settings
• Bad quality (compression) recording device
• Low-resolution recording device
• Bad monitor, or bad monitor settings, and so on.
This list makes it understandable that for many installers, technicians, or engineers, it is almost
impossible to rectify a problem, since finding it by elimination could be too costly or time consuming.
This is where the CCTV Labs TPG-8 is an irreplaceable tool.
By inserting the TPG-8 at the camera end, a number of possible problem sources are automatically
eliminated: the camera, the lens, the lens setting, and the possible ground loop. This is because the
TPG-8 is a portable device, always generating constant and precise video levels, and since it is powered
from internal batteries, no ground loops can be created.
Once a test image from the TPG-8 has been inserted at the camera end, the result can be observed,
recorded, analyzed, and compared. Simply insert the same signal pattern, or better still, use another
TPG-8 at the receiving end so that the image qualities can be compared.
Since the TPG-8 always sends the same high-quality video signal, exactly 1 VPP as defined by the
PAL or NTSC analog video standards, the result can easily be read and compared.
The best part of the CCTV Labs TPG-8 is that these patterns are fully programmable and customizable.
This means that you are not limited to the eight patterns that came with your TPG-8, but you can create
your own, or download new ones, from the CCTV Labs web site.
Better still, you can customize any pattern and insert your own logo or company name in it. This is an
extremely sophisticated form of advertising for your company, whether it is a manufacturer, distributor, consultancy,
or installation business.
The TPG-8 comes with the TPG Navigator program on a CD, and all that is required is a standard PC
with USB connectivity. This program can be used to read the eight patterns currently loaded into the
TPG-8 memory, as well as to write any pattern you want as a replacement.
There are eight pattern cells, each with sufficient memory allocated to accept a standard resolution digital
signal as defined by the ITU-601 standard (i.e., 720 × 576 pixels), at 24-bit color (RGB, approximately 16.7 million colors).
These cells can be overwritten with new patterns whenever you want and as per your choice.
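The size of such a cell is easy to work out: one byte for each of the red, green, and blue values of every pixel. The sketch assumes uncompressed storage; how the TPG-8 actually packs its memory is not stated.

```python
BYTES_PER_PIXEL = 3  # 24-bit RGB, 8 bits per color

def frame_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Uncompressed size of one RGB frame in bytes."""
    return width * height * bytes_per_pixel

pal = frame_bytes(720, 576)
ntsc = frame_bytes(720, 480)
print(f"PAL frame: {pal:,} bytes; NTSC frame: {ntsc:,} bytes")
print(f"Eight PAL cells: {8 * pal / 2**20:.1f} MiB")
print(f"Colors available with 24 bits: {2**24:,}")
```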
The simple procedure of creating a new test pattern is explained further in this manual, and if you are
capable of copying your digital images from your digital camera back to your PC, you should have no
problems doing so with the TPG-8 as well.
TPG-8 buttons description
The CCTV Labs test pattern generator TPG-8 is turned on by pressing the On button and
turned off by pressing the Off button.
When the power is on, the front panel green LED is on.
Press buttons P1, P2, ... P8 to switch between the corresponding images stored in the flash-memory
of the generator.
These numbers correspond to the numbers shown in the TPG Navigator, as illustrated further in this manual.
Select your video standard (PAL or NTSC) by pressing the PAL or
NTSC button.
The TPG-8 remembers the last buttons you have pressed, so there is no
need to re-select the video standard or the last pattern you have used.
You can increase or decrease the video output signal level by pressing
the arrows ↑ and ↓. The video level changes in small steps of tens of
millivolts, so repeated presses are needed.
Holding the arrow buttons will not continue to increase or decrease the
video level.
The minimum level, when the TPG-8 is terminated with 75 Ω, goes down to 300 mVPP and the maximum up to 1.5 VPP. Such extreme levels can be used to verify the minimum sensitivity or saturation
voltage of various devices (DVRs, monitors) or transmission sets (fiber optics, twisted pair, or RF
transmitters). Should you use this feature, do not forget to reset the output level to the nominal
value of 1 VPP.
The default video signal level of 1.0 V is set by pressing the 1.0 V button.
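When you use the extreme levels to test the gain margins of transmission equipment, it can help to express them in decibels relative to the nominal 1 VPP, since amplifier and transmitter specifications are usually quoted that way. A small sketch using the 20 log10 voltage ratio:

```python
import math

NOMINAL_VPP = 1.0

def level_db(vpp, reference_vpp=NOMINAL_VPP):
    """Signal level in dB relative to the nominal 1 VPP video level."""
    return 20 * math.log10(vpp / reference_vpp)

for v in (0.3, 0.5, 0.7, 1.0, 1.4, 1.5):
    print(f"{v:.1f} VPP -> {level_db(v):+6.2f} dB")
```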
The accompanying photos show the following video output sockets of the generator (left to right):
• Composite (monochrome) video signal (CVS);
• Composite video burst signal output (CVBS);
• S-Video (Y/C) output.
Typical and most common usage is with the CVBS output.
The following connections are mounted at the bottom:
• External 5 VDC power supply socket;
• USB cable socket.
In addition, there is a yellow LED next to the USB socket
which flashes when data between a PC and the TPG-8
is exchanged.
The TPG-8 Navigator software
The TPG-8 Navigator is the program supplied on the CD together with the generator. This is the software
you need in order to load a different pattern, should you decide to make your own or to use any of
the patterns available from the CCTV Labs web site. This software is designed to run on a Windows 98,
2000, or XP machine with a USB interface.
The TPG Navigator has the following buttons (selections) and settings:
• Preview all images shows all images in the TPG-8 in a split-screen mode (when connected
with the USB cable).
• Display shows a selected image in a small screen. This is quicker than Preview and useful
when you know which pattern (button) you are looking for.
• Read shows a selected image in a larger screen. It takes longer to load but shows better pattern detail.
• Write an image, as the name suggests, writes a pattern image into the TPG-8. If the image is readable and acceptable by the TPG-8, it will be shown
in preview mode.
When an image is written into the TPG-8, it overwrites
the previous image loaded in the memory cell whose
button was pressed (Pict.1 to Pict.8).
Before the pattern is written into the TPG-8, the software shows an image of the selected file and asks
whether it needs to be written as PAL or NTSC (also selectable under the System setting).
The Level setting defines the video level that the TPG-8 produces and should typically be left at the
middle setting of 128.
Use the Video Interlace filter for smoother video output.
Show Preview window needs to be ticked if you want to see a preview of the images.
Instruments used with the TPG-8
Ideally, and for the most accurate video measurements, an oscilloscope and a vectorscope should be used with the TPG-8.
In order to see and compare various aspects and details of the variety of patterns, an oscilloscope with
line selection capability should be used. In order to compare color reproduction, a vectorscope is recommended.
Test patterns and how to create them
The video output level produced by the TPG-8 is typically 1 VPP when terminated with 75 Ω. The
actual video signal is recreated from a digital pattern as per the ITU-601 recommendation (i.e., with
720 × 576 pixels for PAL and 720 × 480 for NTSC). These are the resolution limits for analog TV,
and there is no sense (although it is technically possible) in having any higher resolution than these.
A technically minded user will notice that neither of the above pixel counts has the 4:3 aspect ratio
of the actual Standard Definition monitor screen. In PAL, for example, there are 576 active scanning lines, and to get the same aspect ratio of 4:3 (assuming square picture elements) we would require
768 horizontal elements (pixels). For the same reason, in NTSC, where there are 480 scanning lines,
we would require 640 horizontal pixels in order to get 4:3. This is where the ITU-601 standard
found “common ground” of 720 horizontal pixels for both PAL and NTSC. In order to get the
appropriate aspect ratio for PAL, the signal is “stretched” horizontally a little, whereas in NTSC it
is “compressed” horizontally, so that a 4:3 aspect ratio is obtained. These manipulations are done by the
high-precision 10-bit D/A output engine of the TPG-8, so that maximum quality is achieved.
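The amount of stretching follows directly from the figures in the paragraph above: the ratio of the square-pixel width (768 or 640) to the 720 samples actually used. The sketch below uses exact fractions and follows the simplified square-pixel reasoning given here; the formal ITU-601 pixel aspect ratios differ slightly because the nominal active picture width is a little less than 720 samples.

```python
from fractions import Fraction

def pixel_aspect_ratio(active_lines, samples_per_line=720, display_aspect=Fraction(4, 3)):
    """Width/height of one sample so that samples_per_line x active_lines fills 4:3."""
    square_pixel_width = display_aspect * active_lines  # 768 for PAL, 640 for NTSC
    return square_pixel_width / samples_per_line

for system, lines in (("PAL", 576), ("NTSC", 480)):
    par = pixel_aspect_ratio(lines)
    print(f"{system}: pixel aspect {par} ({float(par):.4f}), "
          f"{'stretched' if par > 1 else 'compressed'} horizontally")
```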
This is a normal procedure in all digital video sources, and for this reason it should be clearly explained
that the resolution of a pattern created in any photo editing program (Photoshop, Photo-Paint, Paint Shop Pro, etc.) can be either 720 × 576 or 768 × 576. In the latter case, the TPG Navigator converts the
pixel count to 720 × 576 when working in PAL. Similar logic applies when working in NTSC.
Larger pattern sizes of two, three, or four times the original are possible (1440 × 1152, for example), and in fact are recommended for better quality, as the TPG uses high-quality Mitchell filtering
to re-sample such a bitmap to the required 720 × 576.
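Mitchell filtering refers to the cubic reconstruction filter of Mitchell and Netravali. The parameters used inside the TPG Navigator are not stated, so the sketch below assumes the commonly recommended B = C = 1/3 and shows a one-dimensional version: downsampling one row of a double-size pattern to 720 samples. A full image resampler would apply the same routine to every row and then to every column.

```python
import math

def mitchell(x, b=1/3, c=1/3):
    """Mitchell-Netravali cubic kernel (B = C = 1/3 assumed)."""
    x = abs(x)
    if x < 1:
        return ((12 - 9*b - 6*c) * x**3 + (-18 + 12*b + 6*c) * x**2 + (6 - 2*b)) / 6
    if x < 2:
        return ((-b - 6*c) * x**3 + (6*b + 30*c) * x**2
                + (-12*b - 48*c) * x + (8*b + 24*c)) / 6
    return 0.0

def downsample(row, out_len, b=1/3, c=1/3):
    """Resample a list of pixel values to out_len samples (out_len <= len(row))."""
    scale = len(row) / out_len          # e.g. 1440 / 720 = 2.0
    support = 2 * scale                 # kernel widened by the scale factor
    out = []
    for j in range(out_len):
        center = (j + 0.5) * scale - 0.5  # source coordinate of output sample j
        lo = max(0, math.floor(center - support))
        hi = min(len(row) - 1, math.ceil(center + support))
        acc = wsum = 0.0
        for i in range(lo, hi + 1):
            w = mitchell((i - center) / scale, b, c)
            acc += w * row[i]
            wsum += w
        out.append(acc / wsum)          # normalize so flat areas keep their level
    return out

edge = [0.0] * 720 + [255.0] * 720      # hypothetical black-to-white edge, double size
print([round(v, 1) for v in downsample(edge, 720)[356:364]])
```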
The best file formats to use are the noncompressed TIF or BMP.
When using TIF files, care should be taken not to select LZW lossless compression in the saving options.
JPG is also accepted by the TPG, but low compression ratios (less than 10×) should be used in order
to minimize compression artifacts.
In all cases, the color space used in the photo application should be RGB only, 8 bits per color. Care
should be taken not to use any other color space, such as CMYK, in order to have accurate color reproduction.
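The file rules above can be collected into a quick pre-flight check before a pattern is written to the generator. The function below is a hypothetical helper, not part of the TPG Navigator; it only encodes the guidelines given in this section (frame sizes and their 2×, 3×, and 4× multiples, RGB color, preferred file formats), taking for NTSC the sizes implied by “similar logic” (720 × 480 or 640 × 480):

```python
BASE_SIZES = {"PAL": [(720, 576), (768, 576)], "NTSC": [(720, 480), (640, 480)]}
PREFERRED_FORMATS = {"BMP", "TIF", "TIFF"}

def check_pattern(width, height, mode, file_format, system="PAL", compression=None):
    """Return a list of warnings for a candidate test pattern (empty list = OK).

    mode: color mode name such as "RGB" or "CMYK" (8 bits per channel assumed).
    compression: e.g. "LZW" for a TIF saved with compression, or None.
    """
    warnings = []
    allowed = {(w * k, h * k) for w, h in BASE_SIZES[system] for k in (1, 2, 3, 4)}
    if (width, height) not in allowed:
        warnings.append(f"{width}x{height} is not a recommended {system} pattern size")
    if mode.upper() != "RGB":
        warnings.append(f"color mode {mode} should be converted to RGB")
    fmt = file_format.upper()
    if fmt in {"JPG", "JPEG"}:
        warnings.append("JPG accepted, but keep compression below about 10x")
    elif fmt not in PREFERRED_FORMATS:
        warnings.append(f"{file_format} is not one of the recommended formats")
    if fmt in {"TIF", "TIFF"} and compression:
        warnings.append(f"save the TIF without {compression} compression")
    return warnings

print(check_pattern(1440, 1152, "RGB", "BMP"))
print(check_pattern(800, 600, "CMYK", "TIF", compression="LZW"))
```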
All patterns supplied with the TPG-8 are also available from the CCTV Labs web site, and new ones
will be added as they become available. Please visit for updates.
CCTV Labs reserves the right to change specifications for further product improvements.
Dual analog TV standard: PAL, NTSC.
Video signal is compliant with recommendations of ITU-R BT.470-4 “Television Systems”
(former CCIR Report 624).
Video signal analog output: S-Video, Composite.
Adjustable TPG signal voltage range: 0.3 VPP–1.5 VPP.
Default level: 1.0 VPP, when terminated with 75 Ω.
Test pattern loading: via USB.
Maximum number of test images stored in the generator: 8.
Power supply:
• Built-in 3 × 1.5 V rechargeable batteries;
• External 5 VDC power supply;
• USB PC port power.
Charging the built-in batteries and generator power supply through the mains adapter:
• Source output voltage 5 ± 0.2 V;
• Power consumption minimum 5 W.
Autonomous power supply of the generator is through 3 NiMH batteries, AA type.
Autonomous operation with fully charged built-in batteries: over 3 hours.
Generator operating range:
• Ambient temperature from 10 to 35 °C;
• Maximum relative humidity 80%;
• Atmospheric pressure from 84 to 107 kPa (from 630 to 800 mm Hg).
Generator dimensions: 190 × 77 × 26 mm.
Weight: 300 g.
Some typical connections with the CCTV Labs TPG-8
Appendix A
Common terms used in CCTV
1080i. One of the resolution specs used in the HDTV. 1080i stands for resolution of 1920 ×
1080 pixels, and the little “i” means that the video is being interlaced. Other common HDTV
resolutions are 720i and 720p.
1080p. Same as above but with progressive scanning.
16:9. A standard TV aspect ratio is 4:3, whereas a Widescreen TV aspect ratio is 16:9. 16:9
fans argue that most movies are shot in a Widescreen aspect ratio; therefore, viewing them on a
Widescreen TV is much better. Many DVDs of movies shot in Widescreen format are available
in both 16:9 and 4:3 versions, but you will see the entire recorded picture with 16:9 and probably just a
portion of it with 4:3. This is known as Pan & Scan, where the area shown on your TV changes
to whichever part of the recording has the most action in it. Widescreen movies should be shown
letterboxed on normal 4:3 TVs.
4:3. A standard TV aspect ratio is 4:3. If you look at a standard TV, you will see that it is almost
square. 4:3 simply means 4 units wide by 3 units tall. Original TV programming shows fine on
a 4:3 TV because, decades ago, the television industry adopted the Academy Standard for TVs.
720i. One of the resolution specs used in the HDTV. 720i stands for resolution of 1280 × 720
pixels, and the magic little “i” means that the video is in interlaced format. Other common HDTV
resolutions are 1080i and 720p.
720p. One of the resolution specs used in the HDTV. 720p stands for resolution of 1280 × 720
pixels, and the magic little “p” means that the video is in progressive format. Other common
HDTV resolutions are 1080i and 720i.
802.11. A range of IEEE standards covering the usage of wireless internetworking.
Aberration. A term from optics that refers to anything affecting the fidelity of the image in
regards to the original scene.
AC. Alternating current.
Activity detection. Refers to a method built into some multiplexers for detecting movement
within the field of view of a camera connected to the multiplexer, which is then used to improve
that camera’s recording update rate.
AC/DC. Alternating current/direct current.
A/D (AD). Usually refers to analog-to-digital conversion.
ADC. Analog-to-digital conversion. This is usually the very first stage of an electronic device
that processes signals into digital format. The signal can be video, audio, control output, and the like.
AGC. Automatic gain control. A section in an electronic circuit that has feedback and regulates
a certain voltage level to fall within predetermined margins.
ALC. Automatic light control. A part of the electronics of an automatic iris lens that has a function
similar to backlight compensation in photography.
Aliasing. An occurrence of sampled data interference. This can occur in CCD image projection
of high spatial frequencies and is also known as Moiré patterning. It can be minimized by a
technique known as optical low-pass filtering.
Alphanumeric video generator (also text inserter). A device for providing additional
information, normally superimposed on the picture being displayed; this can range from one or
two characters to full-screen alphanumeric text. Such generators use the incoming video signal
sync pulses as a reference point for the text insertion position, which means if the video signal
is of poor quality, the text stability will also be of poor quality.
Amplitude. The maximum value of a varying waveform.
Analog signal. Representation of data by continuously varying quantities. An analog electrical
signal has a different value of volts or amperes for electrical representation of the original
excitation (sound, light) within the dynamic range of the system.
ANSI. American National Standards Institute.
Anti-aliasing. A procedure employed to eliminate or reduce (by smoothing and filtering) the
aliasing effects.
Aperture. The opening of a lens that controls the amount of light reaching the surface of the
pickup device. The size of the aperture is controlled by the iris adjustment. By increasing the
F-stop number (F/1.4, F/1.8, F/2.8, etc.), less light is permitted to pass to the pickup device.
Apostilb. A photometric unit of luminance equal to 1/π candela per square meter; a perfectly
diffusing surface emitting one lumen per square meter has a luminance of one apostilb.
Archive. Long-term off-line storage. In digital systems, pictures are generally archived onto
some form of hard disk, magnetic tape, floppy disk, or DAT cartridge.
ARP. Address Resolution Protocol.
Artifacts. Undesirable elements or defects in a video picture. These may occur naturally in the
video process and must be eliminated in order to achieve a high-quality picture. The most common
are cross-color and cross-luminance.
ASCII. American Standard Code for Information Interchange. A 128-character set that includes
the upper-case and lower-case English alphabet, numerals, special symbols, and 32 control codes.
A 7-bit binary number represents each character. Therefore, one ASCII-encoded character can
be stored in one byte of computer memory.
Aspect ratio. This is the ratio between the width and height of a television or cinema picture
display. The aspect ratio of the standard television screen is 4:3, which means four units wide
by three units high. This aspect ratio was chosen in the early days of television, when the
majority of movies were of the same format. High-definition television uses a 16:9 aspect ratio.
Aspherical lens. A lens that has an aspherical surface. It is harder and more expensive to
manufacture, but it offers certain advantages over a normal spherical lens.
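Astigmatism. A lens aberration in which rays lying in two perpendicular planes through the optical
axis come to focus at different distances, so that off-axis points are imaged as short lines or
blurred patches rather than sharp points.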
Asynchronous. Lacking synchronization. In video, a signal is asynchronous when its timing
differs from that of the system reference signal. A foreign video signal is asynchronous before a
local frame synchronizer treats it.
ATM. Asynchronous transfer mode. A transporting and switching method in which information
does not occur periodically with respect to some reference such as a frame pattern.
ATSC. Advanced Television Systems Committee (think of it as a modern NTSC). An American
committee involved in creating the high-definition television standards.
Attenuation. The decrease in magnitude of a wave, or a signal, as it travels through a medium
or an electric system. It is measured in decibels (dB).
Attenuator. A circuit that provides reduction of the amplitude of an electrical signal without
introducing appreciable phase or frequency distortion.
Auto iris (AI). An automatic method of varying the size of a lens aperture in response to changes
in scene illumination.
AWG. American wire gauge. A wire diameter specification based on the American standard. The
smaller the AWG number, the larger the wire diameter (see the reference table in Chapter 5).
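As a rough illustration of the AWG rule (smaller number, thicker wire), the conductor diameter can be computed from the geometric progression that defines the gauge: 36 AWG is 0.005 in (0.127 mm), and every 39 gauges the diameter changes by a factor of 92. This is a minimal sketch for solid conductors of gauge 0 and above; the function names are hypothetical, and the results may differ slightly from a rounded reference table.

```python
import math

def awg_diameter_mm(gauge: int) -> float:
    """Approximate bare diameter (mm) of a solid conductor of the given AWG number."""
    return 0.127 * 92 ** ((36 - gauge) / 39)

def awg_area_mm2(gauge: int) -> float:
    """Conductor cross-sectional area in square millimeters."""
    d = awg_diameter_mm(gauge)
    return math.pi * d * d / 4

for g in (18, 20, 22, 24):
    print(f"AWG {g}: {awg_diameter_mm(g):.3f} mm, {awg_area_mm2(g):.3f} mm^2")
```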
Back-focus. A procedure of adjusting the physical position of the CCD-chip/lens to achieve the
correct focus for all focal length settings (especially critical with zoom lenses).
Back porch. 1. The portion of a video signal that occurs during blanking from the end of horizontal
sync to the beginning of active video. 2. The blanking signal portion that lies between the trailing
edge of a horizontal sync pulse and the trailing edge of the corresponding blanking pulse. Color
burst is located on the back porch.
Balanced signal. In CCTV this refers to a type of video signal transmission through a twisted
pair cable. It is called balanced because the signal travels through both wires, which are therefore
equally exposed to external interference; by the time the signal reaches the receiving end, the
interference appears equally on both wires and is canceled out at the input of a differential buffer stage.
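The cancellation can be sketched numerically under idealized assumptions: interference couples identically onto both wires (common-mode noise), and the receiver is a perfect differential stage. Real receivers have finite common-mode rejection, so cancellation is never total in practice. The function names and test waveforms are illustrative only.

```python
import math

def transmit_balanced(signal, noise):
    """Drive +s on wire A and -s on wire B; the same interference lands on both wires."""
    wire_a = [s + n for s, n in zip(signal, noise)]
    wire_b = [-s + n for s, n in zip(signal, noise)]
    return wire_a, wire_b

def differential_receiver(wire_a, wire_b):
    """Ideal differential input stage: (A - B) / 2 removes anything common to both wires."""
    return [(a - b) / 2 for a, b in zip(wire_a, wire_b)]

signal = [math.sin(2 * math.pi * k / 16) for k in range(16)]
hum = [0.5 * math.sin(2 * math.pi * k / 5) for k in range(16)]   # illustrative interference
a, b = transmit_balanced(signal, hum)
recovered = differential_receiver(a, b)
print(max(abs(r - s) for r, s in zip(recovered, signal)))       # only rounding error remains
```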
Balun. This is a device used to match or transform an unbalanced coaxial cable to a balanced
twisted pair system.
Bandwidth. The complete range of frequencies over which a circuit or electronic system can
function with minimal signal loss, usually measured to the point where the response has dropped by 3 dB. In PAL systems
the bandwidth limits the maximum visible frequency to 5.5 MHz, in NTSC to 4.2 MHz. The ITU
601 luminance channel sampling frequency of 13.5 MHz was chosen to permit faithful digital
representation of the PAL and NTSC luminance bandwidths without aliasing.
Baseband. The frequency band occupied by the aggregate of the signals used to modulate a
carrier before they combine with the carrier in the modulation process. In CCTV the majority
of signals are in the baseband.
Baud. The unit of signaling (symbol) rate, named after Jean-Maurice-Émile Baudot; one baud is
one signal event per second. Baud is equivalent to bits per second only in cases where each signal
event represents exactly 1 bit. Typically, the baud settings of two devices must match if the devices
are to communicate with one another.
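When each signal event can take more than two states, the bit rate exceeds the baud rate by a factor of log2 of the number of states. A minimal sketch; the function name and the example rates are illustrative assumptions.

```python
import math

def bit_rate(baud: float, symbol_states: int) -> float:
    """Bits per second on a line signaling at `baud` symbols per second,
    where each symbol can take one of `symbol_states` distinct values."""
    return baud * math.log2(symbol_states)

print(bit_rate(9600, 2))   # two states = 1 bit per symbol -> 9600.0
print(bit_rate(2400, 16))  # 16 states = 4 bits per symbol -> 9600.0
```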
BER. Bit error rate. The ratio of received bits that are in error relative to the total number of bits
received, used as a measure of noise-induced distortion in a digital bit stream. BER is expressed
as a power of 10. For example, one bit in error in 1 million bits is a BER of 10⁻⁶.
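A minimal sketch of the BER calculation, extended to estimate the average number of errored bits per second on a stream of an assumed bit rate (the 8 Mb/s figure and the function names are illustrative only).

```python
def bit_error_rate(errored_bits: int, total_bits: int) -> float:
    """Ratio of received bits in error to the total number of bits received."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    return errored_bits / total_bits

def expected_errors_per_second(ber: float, bit_rate_bps: float) -> float:
    """Average number of errored bits per second at a given BER and bit rate."""
    return ber * bit_rate_bps

ber = bit_error_rate(1, 1_000_000)
print(ber)
print(expected_errors_per_second(ber, 8e6))  # assumed 8 Mb/s stream
```

Even a BER as low as 10⁻⁶ means several errored bits every second on a multi-megabit video stream, which is why error correction matters in digital transmission.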
Betamax. Sony’s domestic video recording format, a competitor of VHS.
B-frame. Bidirectionally predictive coded frame (or picture). This terminology is used in MPEG
video compression. The B pictures are predicted from the closest two I (intra) or P (predicted)
pictures, one in the past and one in the future. They are called bidirectional because they are
predicted from both past and future pictures.
Bias. Current or voltage applied to a circuit to set a reference operating level for proper circuit
performance, such as the high-frequency bias current applied to an audio recording head to
improve linear performance and reduce distortion.
Binary. A base 2 numbering system using the two digits 0 and 1 (as opposed to 10 digits [0–9]
in the decimal system). In computer systems, the binary digits are represented by two different
voltages or currents, one corresponding to zero and another corresponding to one. All computer
programs are executed in binary form.
Bipolar. A signal containing both positive-going and negative-going amplitude. May also contain
a zero amplitude state.
B-ISDN. Broadband Integrated Services Digital Network. An improved ISDN, composed of
an intelligent combination of several ISDN channels into one that can transmit more data per second.
Bit. A contraction of binary digit. Elementary digital information that can only be 0 or 1: the
smallest unit of information in a binary notation system. A group of 8 bits makes a byte. Bits are
also grouped into words, such as 8, 16, or 32 bits; the word length depends on the processing
system being used.
Bitmap (BMP). A pixel-by-pixel description of an image. Each pixel is a separate element. Also
a computer uncompressed image file format.
Bit rate. The digital equivalent of bandwidth, normally expressed in bits per second (b/s). A rate
given in bytes per second (B/s) is multiplied by 8 to obtain bits per second. Bit rate is used to
express the data rate at which a compressed bitstream is transmitted: the higher the bit rate, the
more information can be carried.
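Since recording capacity is usually the practical concern, the byte/bit conversion can be carried through to storage per camera. This sketch assumes a constant bit rate and decimal gigabytes (10⁹ bytes); the 2 Mb/s stream is an arbitrary example, and real compressed streams vary with scene activity. Function names are illustrative.

```python
def bytes_per_second(bits_per_second: float) -> float:
    """Convert b/s to B/s (8 bits per byte)."""
    return bits_per_second / 8

def storage_gigabytes(bits_per_second: float, hours: float) -> float:
    """Storage needed to record a constant-bit-rate stream for `hours`, in decimal GB."""
    return bytes_per_second(bits_per_second) * hours * 3600 / 1e9

# One camera streaming at an assumed 2 Mb/s for 24 hours
print(round(storage_gigabytes(2_000_000, 24), 1))  # 21.6
```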
Blackburst (color-black). A composite color video signal. The signal has composite sync,
reference burst, and a black video signal, which is usually at a level of 7.5 IRE (50 mV) above
the blanking level.
Black level. A part of the video signal close to the blanking level, but usually slightly above it
(by about 20–50 mV) in order to be distinguished from the blanking level. It electronically
represents the black part of an image, whereas peak white is 0.7 V above the blanking level
(1 V above the sync tip).
Blanking level. The beginning of the video signal information in the signal’s waveform. It resides
at a reference point taken as 0 V, which is 300 mV above the lowest part of the sync pulses. Also
known as pedestal, the level of a video signal that separates the range that contains the picture
information from the range that contains the synchronizing information.
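The IRE figures quoted under Blackburst can be converted to millivolts. This sketch assumes the NTSC (10:4) convention, in which the 140 IRE span from sync tip to peak white equals 1 V, so 1 IRE is about 7.14 mV and 7.5 IRE setup is about 53.6 mV (the "50 mV" above is a rounded figure). PAL equipment instead uses the 700 mV picture / 300 mV sync split described under Blanking level. Names are illustrative.

```python
MV_PER_IRE = 1000 / 140  # NTSC convention: 140 IRE (sync tip to peak white) = 1 V p-p

def ire_to_mv(ire: float) -> float:
    """Voltage relative to blanking (0 IRE = 0 mV) under the 140 IRE = 1 V convention."""
    return ire * MV_PER_IRE

for name, ire in [("sync tip", -40), ("blanking", 0), ("setup/black", 7.5), ("peak white", 100)]:
    print(f"{name:12s} {ire:6.1f} IRE  {ire_to_mv(ire):7.1f} mV")
```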
Blooming. The defocusing of regions of a picture where brightness is excessive.
Bluetooth. A wireless data standard, used in a variety of electronic devices for close proximity
interconnection (see Chapter 11).
BNC. Bayonet Neill–Concelman connector. It is the most popular connector in CCTV and broadcast
TV for transmitting a baseband video signal over RG-59 type coaxial cable.
B-picture. Bidirectionally predictive coded picture. This terminology is used in MPEG video
compression. The B pictures are predicted from the closest two I (intra) or P (predicted) pictures,
one in the past and one in the future. They are called bidirectional because they are predicted
from both past and future pictures.
Braid. A group of textile or metallic filaments interwoven to form a tubular structure that may
be applied over one or more wires or flattened to form a strap.
Bridge (network). A data communications device, more "intelligent" than a repeater, that connects
homogeneous networks and forwards data packets between them.
Brightness. In NTSC and PAL video signals, the brightness information at any particular instant
in a picture is conveyed by the corresponding instantaneous DC level of active video. Brightness
control is an adjustment of setup (black level, black reference).
Burst (color burst). Seven to nine cycles (NTSC) or ten cycles (PAL) of subcarrier placed near
the end of horizontal blanking to serve as the phase (color) reference for the modulated color
subcarrier. Burst serves as the reference for establishing the picture color.
Bus. In computer architecture, a path over which information travels internally among various
components of a system and is available to each of the components.
Byte. A digital word made of 8 bits (zeros and ones).
Cable equalization. The process of altering the frequency response of a video amplifier to
compensate for high-frequency losses in coaxial cable.
CAD. Computer-aided design. This usually refers to system design carried out with specialized
computer software.
Candela [cd]. A unit for measuring luminous intensity. One candela is approximately equal to
the luminous intensity of an ordinary candle. From 1948 the candela was defined in terms of the
luminous intensity of a black body heated to the temperature at which platinum solidifies from a
liquid state. Since 1979 it has been defined as the luminous intensity, in a given direction, of a
source emitting monochromatic radiation of frequency 540 × 10¹² Hz with a radiant intensity of
1/683 watt per steradian.
CATV. Community antenna television.
C-band. A range of microwave frequencies, 3.7–4.2 GHz, commonly used for satellite communications.
CCD. Charge-coupled device. The solid-state imaging device that replaced the old tubes. When first
invented, at the end of the 1960s, it was intended to be used as a memory device. It is most often
used in cameras, but also in telecine, fax machines, scanners, and so on.
CCD aperture. The proportion of the total area of a CCD chip that is photosensitive.
CCIR. Comité Consultatif International des Radiocommunications or, in English, Consultative
Committee for International Radio, the ITU standardization body (now ITU-R) that set the
standards for television used in Europe. These standards were initially monochrome; therefore,
today the term CCIR usually refers to monochrome cameras that are used in PAL countries.
CCIR 601. An international standard (now renamed ITU 601) for component digital television
that was derived from the SMPTE RP 125 and EBU 3246E standards. ITU 601 defines the
sampling systems, matrix values, and filter characteristics for Y, Cr, Cb, and RGB component
digital television. It establishes a 4:2:2 sampling scheme at 13.5 MHz for the luminance channel
and 6.75 MHz for each of the two chrominance channels, with 8-bit digitizing for each channel.
These sampling frequencies were chosen because they work for both 525-line 60 Hz and 625-line
50 Hz component video systems. The term 4:2:2 refers to the ratio of the number of luminance
channel samples to the number of chrominance channel samples; for every four luminance samples,
each chrominance channel is sampled twice.
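The 4:2:2 figures above lead directly to the uncompressed data rate and to the number of samples in one line period, which comes out as a whole number for both scanning standards (864 for 625/50, 858 for 525/59.94). A worked sketch with illustrative function names, assuming 8-bit samples as stated above:

```python
LUMA_HZ = 13.5e6     # Y sampling frequency
CHROMA_HZ = 6.75e6   # each of Cb and Cr in 4:2:2

def total_sample_rate() -> float:
    """Samples per second for Y + Cb + Cr."""
    return LUMA_HZ + 2 * CHROMA_HZ

def uncompressed_bit_rate(bits_per_sample: int) -> float:
    """Raw data rate in b/s for the given sample word length."""
    return total_sample_rate() * bits_per_sample

def samples_per_line(line_rate_hz: float) -> float:
    """Luminance samples in one complete line period (active line plus blanking)."""
    return LUMA_HZ / line_rate_hz

print(total_sample_rate() / 1e6)                  # 27.0 Msamples/s
print(uncompressed_bit_rate(8) / 1e6)             # 216.0 Mb/s
print(samples_per_line(15625))                    # 864.0 (625 lines x 25 frames/s)
print(round(samples_per_line(4.5e6 / 286), 3))    # 858.0 (525-line color NTSC)
```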
CCIR 656. The international standard (now renamed to ITU 656) defining the electrical and
mechanical interfaces for digital television equipment operating according to the ITU 601 standard.
ITU 656 defines both the parallel and serial connector pinouts, as well as the blanking, sync, and
multiplexing schemes used in both parallel and serial interfaces.
CCTV. Closed circuit television. Television system intended for only a limited number of viewers,
as opposed to broadcast TV.
CCTV camera. A unit containing an imaging device that produces a video signal in the basic bandwidth (baseband).
CCTV installation. A CCTV system, or an associated group of systems, together with all necessary
hardware, auxiliary lighting, etc., located at the protected site.
CCTV system. An arrangement consisting of a camera and lens with all ancillary equipment
required for the surveillance of a specific protected area.
CCVE. Closed circuit video equipment. An alternative acronym for CCTV.
CD. Compact disc. A media standard as proposed by Philips and Sony, where music and data
are stored in digital format.
CD-ROM. Compact disk read only memory. The total capacity of a CD-ROM when storing data
can be 640 MB or 700 MB.
CDS. Correlated double sampling. A technique used in the design of some CCD cameras that
reduces the video signal noise generated by the chip.
CFA. Color filter array. A set of optical pixel filters used in single-chip color CCD cameras to
produce the color components of a video signal.
Chip. An integrated circuit in which all the components are micro-fabricated on a tiny piece of
silicon or similar material.
Chroma crawl. An artifact of encoded video, also known as dot crawl or cross-luminance. It
occurs in the video picture around the edges of highly saturated colors as a continuous series of
crawling dots and is a result of color information being confused with luminance information by
the decoder circuits.
Chroma gain (chroma, color, saturation). In video, the gain of an amplifier as it pertains to the
intensity of colors in the active picture.
Chroma key (color key). A video key effect in which one video signal is inserted in place of
areas of a particular color in another video signal.
Chrominance. The color information of a color video signal.
Chrominance-to-luminance intermodulation (crosstalk, cross-modulation). An undesirable
change in luminance amplitude caused by superimposition of some chrominance information on
the luminance signal. Appears in a TV picture as unwarranted brightness variations caused by
changes in color saturation levels.
CIE. Commission Internationale de l'Éclairage. This is the International Commission on
Illumination, established in 1913. It defines and recommends light units.
CIF. Common Intermediate Format, refers to a digitized image with a pixel count of 352 × 288
(or 352 × 240) pixels.
Cladding. The outer layer of an optical fiber, which is also a transparent fiber material but with a
lower refractive index than the center core. This produces total internal reflection, so that the light
transmitted through the internal core stays inside.
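Total internal reflection at the core/cladding boundary depends only on the two refractive indices. The sketch below computes the critical angle and the resulting numerical aperture (the sine of the maximum acceptance half-angle in air); the indices 1.48 and 1.46 are assumed example values, not figures from this book.

```python
import math

def critical_angle_deg(n_core: float, n_cladding: float) -> float:
    """Angle of incidence (from the normal) above which light is totally
    internally reflected at the core/cladding boundary."""
    if n_cladding >= n_core:
        raise ValueError("cladding index must be lower than core index")
    return math.degrees(math.asin(n_cladding / n_core))

def numerical_aperture(n_core: float, n_cladding: float) -> float:
    """NA = sqrt(n_core^2 - n_cladding^2)."""
    return math.sqrt(n_core ** 2 - n_cladding ** 2)

n1, n2 = 1.48, 1.46  # assumed example indices for core and cladding
print(f"critical angle {critical_angle_deg(n1, n2):.1f} deg")
print(f"NA {numerical_aperture(n1, n2):.3f}")
```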
Clamping (DC). The circuit or process that restores the DC component of a signal. A video
clamp circuit, usually triggered by horizontal synchronizing pulses, reestablishes a fixed DC
reference level for the video signal. A major benefit of a clamp is the removal of low-frequency
interference, especially power line hum.
Clipping Level. An electronic limit to avoid overdriving the video portion of the television signal.
C-mount. The first standard for CCTV lens screw mounting. It is defined by a thread of 1''
(25.4 mm) in diameter with 32 threads/inch, and a back flange-to-CCD distance of 17.526 mm
(0.69''). The C-mount description applies to both lenses and cameras. C-mount lenses can be used
on both C-mount and CS-mount cameras, but on a CS-mount camera an adaptor is required.
CMYK. Cyan, magenta, yellow, and black. A color encoding system used by printers in which
colors are expressed by the “subtractive primaries” (cyan, magenta, and yellow) plus black (called
K). The black layer is added to give increased contrast and range on printing presses.
Coaxial cable. The most common type of cable used for copper transmission of video signals. It
has a coaxial cross section, where the center core is the signal conductor, while the outer shield
protects it from external electromagnetic interference.
Codec. Coder/decoder. An electronic device combining an encoder and a decoder, which
compresses and decompresses digital signals. Codecs usually also perform A/D and D/A conversion.
Color bars. A pattern generated by a video test generator, consisting of eight equal-width color
bars. Colors are white (75%), black (7.5% setup level), 75% saturated pure colors red, green,
and blue, and 75% saturated hues of yellow, cyan, and magenta (mixtures of two colors in 1:1
ratio without third color).
Color carrier. The subcarrier in a color video signal (4.43 MHz for PAL) that is modulated
with the color information. The color carrier frequency is chosen so that its spectrum interleaves
with the luminance spectrum with minimum interference.
Color difference signal. A video color signal created by subtracting luminance and/or color
information from one of the primary color signals (red, green, or blue). In the Betacam color
difference format, for example, the luminance (Y) and color difference components (R–Y and
B–Y) are derived as follows:
Y = 0.3 Red + 0.59 Green + 0.11 Blue
R–Y = 0.7 Red – 0.59 Green – 0.11 Blue
B–Y = 0.89 Blue – 0.59 Green – 0.3 Red
The G–Y color difference signal is not created because it can be reconstructed from the other
three signals. Other color difference conventions include SMPTE, EBU N10, and MII. Color
difference signals should not be referred to as component video signals. That term is reserved
for the RGB color components. In informal usage, the term component video is often used to
mean color difference signals.
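The equations above can be checked in a few lines, including the claim that G–Y need not be transmitted: because the luminance weights sum to 1, 0.3(R–Y) + 0.59(G–Y) + 0.11(B–Y) = 0, so G–Y follows from R–Y and B–Y. A minimal sketch with illustrative function names and normalized (0 to 1) RGB values:

```python
def to_color_difference(r: float, g: float, b: float):
    """Y, R-Y and B-Y using the luminance weights quoted in this entry."""
    y = 0.3 * r + 0.59 * g + 0.11 * b
    return y, r - y, b - y

def recover_g_minus_y(r_y: float, b_y: float) -> float:
    """G-Y from the other two, using 0.3(R-Y) + 0.59(G-Y) + 0.11(B-Y) = 0."""
    return -(0.3 * r_y + 0.11 * b_y) / 0.59

def to_rgb(y: float, r_y: float, b_y: float):
    """Reconstruct R, G, B from Y, R-Y and B-Y."""
    return r_y + y, recover_g_minus_y(r_y, b_y) + y, b_y + y

y, r_y, b_y = to_color_difference(0.8, 0.4, 0.2)
print(to_rgb(y, r_y, b_y))  # approximately (0.8, 0.4, 0.2)
```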
Color field. In the NTSC system, the color subcarrier is phase-locked to the line sync so that on
each consecutive line, the subcarrier phase is changed 180º with respect to the sync pulses. In the
PAL system, color subcarrier phase moves 90º every frame. In NTSC this creates four different
field types, while in PAL there are eight. In order to make clean edits, alignment of color field
sequences from different sources is crucial.
Color frame. In color television, four (NTSC) or eight (PAL) properly sequenced color fields
compose one color frame.
Color phase. The timing relationship in a video signal that is measured in degrees and keeps the
hue of a color signal correct.
Color subcarrier. The 3.58 MHz (NTSC) or 4.43 MHz (PAL) signal that carries color
information. This signal is superimposed on the luminance signal. Amplitude of the color subcarrier
represents saturation, and phase angle represents hue.
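The quoted subcarrier values are rounded. Their exact values are tied to the line frequency, which is how the interleaving with the luminance spectrum described under Color carrier is obtained: NTSC places the subcarrier at an odd multiple (455) of half the line frequency, and PAL at 283.75 times the line frequency plus 25 Hz. A sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

PAL_LINE = Fraction(15625)             # 625 lines x 25 frames/s (Hz)
NTSC_LINE = Fraction(4_500_000, 286)   # color NTSC line rate, about 15 734.27 Hz

# PAL: 283.75 cycles per line (quarter-line offset) plus 25 Hz
pal_fsc = PAL_LINE * Fraction(1135, 4) + 25
# NTSC: 455 half-line cycles, i.e. an odd multiple of half the line frequency
ntsc_fsc = NTSC_LINE * Fraction(455, 2)

print(float(pal_fsc))   # 4433618.75
print(float(ntsc_fsc))  # about 3 579 545.45
```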
Color temperature. Indicates the hue of a light source. The term comes from photography, where
the color of light is described by comparison with the hues produced when a black body (as in
physics) is heated from red through yellow to blue, which is the hottest. Color temperature is
measured in kelvins (K).
Comb filter. An electrical filter circuit that passes a series of frequencies and rejects the frequencies
in between, producing a frequency response similar to the teeth of a comb. Used on encoded
video to select the chrominance signal and reject the luminance signal, thereby reducing
cross-chrominance artifacts or, conversely, to select the luminance signal and reject the chrominance
signal, thereby reducing cross-luminance artifacts. Introduced in the S-VHS concept for better
luminance resolution.
Composite sync. A signal consisting of horizontal sync pulses, vertical sync pulses, and equalizing
pulses only, with a no-signal reference level.
Composite video signal. A signal in which the luminance and chrominance information has been
combined using one of the coding standards NTSC, PAL, SECAM, and so on.
Concave lens. A lens that has a negative focal length – the focus is virtual, and it reduces the apparent size of objects.
Contrast. A common term used in reference to the video picture dynamic range – the difference
between the darkest and the brightest parts of an image.
Convex lens. A lens that has a positive focal length – the focus is real. It is often used as a
magnifying glass, since it magnifies objects.
CPU. Central processing unit. A common term used in computers.
CRO. Cathode ray oscilloscope. See Oscilloscope.
Crosstalk. A type of interference or undesired transmission of signals from one circuit into another
circuit in the same system. Usually caused by unintentional capacitance (AC coupling).
CS-Mount. A newer standard for lens mounting. It uses the same physical thread as the C-mount,
but the back flange-to-CCD distance is reduced to 12.5 mm so that lenses can be made more
compact and less expensive. CS-mount lenses can only be used on CS-mount cameras.
CS-to-C-mount adaptor. An adaptor used to convert a CS-mount camera to C-mount to
accommodate a C-mount lens. It looks like a ring 5 mm thick, with a male thread on one side
and a female on the other, with 1'' diameter and 32 threads/inch. It usually comes packaged with
the newer type (CS-mount) of cameras.
CVBS. Composite video, blanking, and sync. In broadcast television this refers to the composite
video signal, including the color information and syncs.
D/A (also DA). Opposite to A/D, that is, digital to analog conversion.
Dark current. Leakage signal from a CCD sensor in the absence of incident light.
Dark noise. Noise caused by the random (quantum) nature of the dark current.
DAT (digital audio tape). A system initially developed for recording and playback of digitized
audio signals, maintaining signal quality equal to that of a CD. Recent developments in hardware
and software might lead to a similar inexpensive system for video archiving, recording, and playback.
dB. Decibel. A logarithmic ratio of two signals or values, usually power, but also voltage and
current. For power ratios the base-10 logarithm of the ratio is multiplied by 10; for voltage and
current ratios it is multiplied by 20.
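A minimal sketch of both forms (function names are illustrative). The voltage form assumes both voltages are measured across the same impedance. Halving the power gives about -3 dB, the figure used for bandwidth limits; halving the voltage gives about -6 dB.

```python
import math

def db_power(p_out: float, p_in: float) -> float:
    """Power ratio in decibels: 10 * log10(P_out / P_in)."""
    return 10 * math.log10(p_out / p_in)

def db_voltage(v_out: float, v_in: float) -> float:
    """Voltage (or current) ratio in decibels: 20 * log10(V_out / V_in)."""
    return 20 * math.log10(v_out / v_in)

print(round(db_power(0.5, 1.0), 2))    # -3.01 (half power)
print(round(db_voltage(0.5, 1.0), 2))  # -6.02 (half voltage)
```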
DBS. Direct broadcast satellite. Broadcasting from a satellite directly to a consumer user, usually
using a small aperture antenna.
DC. Direct current. Current that flows in only one direction, as opposed to AC.
DCT. Discrete cosine transform. A mathematical algorithm used to generate a frequency representation
of a block of video pixels. The DCT is an invertible, discrete orthogonal transformation between
the spatial (or time) domain and the frequency domain. It can be either a forward discrete cosine
transform (FDCT) or an inverse discrete cosine transform (IDCT).
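To illustrate the forward and inverse transforms, here is a direct (unoptimized, O(N²)) implementation of the orthonormal one-dimensional DCT-II and its inverse. Compression schemes apply such a transform to a block (for example 8 × 8) by transforming the rows and then the columns; practical codecs use fast factorizations. The pixel values are an arbitrary example.

```python
import math

def _scale(k: int, n: int) -> float:
    """Orthonormal scaling: sqrt(1/N) for the DC term, sqrt(2/N) otherwise."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def fdct(x):
    """Forward orthonormal DCT-II of a sequence of samples."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse transform (orthonormal DCT-III): recovers the samples from coefficients."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]

row = [52, 55, 61, 66, 70, 61, 64, 73]   # eight pixel values from one block row
coeffs = fdct(row)
print([round(c, 1) for c in coeffs])     # energy concentrates in the first coefficients
print([round(v) for v in idct(coeffs)])  # recovers the original row
```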
Decoder. A device used to recover the component signals from a composite (encoded) source.
Degauss. To demagnetize. Most often found on CRT monitors.
Delay line. An artificial or real transmission line or equivalent device designed to delay a wave
or signal for a specific length of time.
Demodulator. A device that strips the video and audio signals from the carrier frequency.
Depth of field. The range of distances in front of and behind the object in focus that appears
sharp on the screen. The depth of field increases as the focal length decreases (the shorter the
focal length, the wider the depth of field) and as the iris is closed to a higher F-number. The depth
of field always extends farther behind the object in focus than in front of it.
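The focal-length behavior described above can be reproduced with the standard thin-lens hyperfocal formulas. This sketch assumes a circle of confusion of 0.01 mm (an illustrative value for a small CCD; the appropriate value depends on chip size and viewing conditions), an aperture of F/2, and a subject 2 m away. Function names are hypothetical.

```python
def hyperfocal_mm(f_mm: float, f_number: float, coc_mm: float) -> float:
    """Hyperfocal distance H = f^2 / (N * c) + f."""
    return f_mm * f_mm / (f_number * coc_mm) + f_mm

def depth_of_field_mm(f_mm, f_number, coc_mm, subject_mm):
    """Return (near limit, far limit) of acceptable sharpness, in mm."""
    h = hyperfocal_mm(f_mm, f_number, coc_mm)
    near = subject_mm * (h - f_mm) / (h + subject_mm - 2 * f_mm)
    far = subject_mm * (h - f_mm) / (h - subject_mm) if subject_mm < h else float("inf")
    return near, far

COC = 0.01  # assumed circle of confusion (mm)
for f in (8, 16):
    near, far = depth_of_field_mm(f, 2.0, COC, 2000)
    print(f"{f} mm at F/2, focused at 2 m: {near / 1000:.2f} m to {far / 1000:.2f} m "
          f"({(2000 - near) / 1000:.2f} m in front, {(far - 2000) / 1000:.2f} m behind)")
```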
DHCP. Dynamic Host Configuration Protocol.
Dielectric. An insulating (nonconductive) material.
Differential gain. A change in the subcarrier amplitude of a video signal caused by a change in
the luminance level of the signal. The resulting TV picture will show a change in color saturation
caused by a simultaneous change in picture brightness.
Differential phase. A change in the subcarrier phase of a video signal caused by a change in
the luminance level of the signal. The hue of colors in a scene changes with the brightness of
the scene.
Digital disk recorder. A system that allows the recording of video images on a digital disk.
Digital signal. An electronic signal in which each different value of the real-life excitation
(sound, light) is represented by a different binary combination (word) standing for the analog signal.
DIN. Deutsche Industrie-Normen. German industrial standards.
Disk. A flat circular plate, coated with a magnetic material, on which data may be stored by
selective magnetization of portions of the surface. May be a flexible, floppy disk or a rigid hard
disk. It could also be a plastic compact disk (CD) or digital video disk (DVD).
Distortion. Nonproportional representation of an original.
DMD. Digital micro-mirror device. A new video projection technology that uses chips with a large
number of miniature mirrors, whose projection angle can be controlled with digital precision.
DOS. Disk operating system. A software package that makes a computer work with its hardware
devices such as hard drive, floppy drive, screen, and keyboard.
Dot pitch. The distance in millimeters between individual dots on a monitor screen. The smaller
the dot pitch the better, since it allows for more dots to be displayed and better resolution. The dot
pitch defines the resolution of a monitor. A high-resolution CCTV or computer monitor would
have a dot pitch of less than 0.3 mm.
Drop-frame time code. SMPTE time code format that counts 30 frames per second but skips two
frame numbers (not actual frames) at the start of every minute except every tenth minute, dropping
108 frame numbers every hour, to maintain the synchronization of time code with clock time. This
is necessary because the actual frame rate of NTSC color video is 29.97 frames per second rather
than an even 30 frames.
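The counting rule can be expressed as a conversion from a running frame count to a drop-frame label. This is a minimal sketch of the widely used arithmetic (function name hypothetical; hours are not wrapped at 24): nine minutes in every ten skip two labels, so each ten-minute block contains 17,982 real frames, and one hour contains 107,892.

```python
def frames_to_dropframe(frame: int) -> str:
    """Convert a 0-based count of NTSC frames to a drop-frame label HH:MM:SS;FF.

    Labels ;00 and ;01 are skipped at the start of every minute except
    minutes 00, 10, 20, 30, 40 and 50.
    """
    drop = 2                                     # labels skipped per dropping minute
    frames_per_10min = 10 * 60 * 30 - 9 * drop   # 17 982 real frames per ten minutes
    frames_per_min = 60 * 30 - drop              # 1 798 real frames in a dropping minute
    tens, rem = divmod(frame, frames_per_10min)
    label = frame + 9 * drop * tens              # 18 labels skipped per full ten minutes
    if rem > drop:
        # minute boundaries (minutes 1 to 9) already crossed in this ten-minute block
        label += drop * ((rem - drop) // frames_per_min)
    ff = label % 30
    ss = (label // 30) % 60
    mm = (label // 1800) % 60
    hh = label // 108000
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"

print(frames_to_dropframe(1799))  # 00:00:59;29
print(frames_to_dropframe(1800))  # 00:01:00;02
```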
DSP. Digital signal processing. It usually refers to the electronic circuit section of a device capable
of processing digital signals.
Dubbing. Transcribing from one recording medium to another.
Duplex. A communication system that carries information in both directions is called a duplex
system. In CCTV, duplex is often used to describe the type of multiplexer that can perform two
functions simultaneously, recording in multiplex mode and playback in multiplex mode. It can also
refer to duplex communication between a matrix switcher and a PTZ site driver, for example.
D-VHS. A new standard proposed by JVC for recording digital signals on a VHS video cassette.
DV-Mini. Mini digital video. A new format for audio and video recording on small camcorders,
adopted by the majority of camcorder manufacturers. Video and sound are recorded in a digital
format on a small cassette (66 × 48 × 12 mm), superseding S-VHS and Hi 8 quality.
Dynamic range. The difference between the smallest amount and the largest amount that a
system can represent.
EBU. European Broadcasting Union.
EDTV. Enhanced (Extended) definition television. Usually refers to the progressive scan
transmission of NTSC (also referred to as 480p) and PAL (also referred to as 576p).
EIA. Electronic Industries Association, which recommended the television standard used in
the United States, Canada, and Japan, based on 525-line interlaced scanning. Formerly known
as RMA or RETMA.
Encoder. A device that superimposes electronic signal information on other electronic signals.
Encryption. The rearrangement of the bit stream of a previously digitally encoded signal in a
systematic fashion to make the information unrecognizable until restored upon receipt of the
necessary authorization key. This technique is used for securing information transmitted over
a communication channel with the intent of excluding all other than authorized receivers from
interpreting the message. Can be used for voice, video, and other communications signals.
ENG camera. Electronic News Gathering camera. Refers to CCD cameras used in the broadcast industry.
EPROM. Erasable and programmable read only memory. An electronic chip used in many
different security products that stores software instructions for performing various operations.
Equalizer. Equipment designed to compensate for loss and delay frequency effects within a
system. A component or circuit that allows for the adjustment of a signal across a given band.
Ethernet. A local area network used for connecting computers, printers, workstations, terminals,
and so on, within the same building. Ethernet operates over twisted pair and coaxial cable at
10 Mb/s in its original form; later versions run at 100 Mb/s and 1 Gb/s. Ethernet uses CSMA/CD
(carrier sense multiple access with collision detection), a technique for sharing a common medium
(wire, coaxial cable) among several devices.
External synchronization. A means of ensuring that all equipment is synchronized to one common sync source.
FCC. Federal Communications Commission (US).
FFT. Fast Fourier Transformation.
Fiber optics. A technology designed to transmit signals in the form of pulses of light. Fiber
optic cable is noted for its properties of electrical isolation and resistance to electrostatic and
electromagnetic interference.
Field. Refers to one-half of the TV frame that is composed of either all odd or even lines. In
CCIR systems each field is composed of 625/2 = 312.5 lines, in EIA systems 525/2 = 262.5 lines.
There are 50 fields/second in CCIR/PAL and 60 in the EIA/NTSC TV system.
Film recorder. A device for converting digital data into film output. Continuous tone recorders
produce color photographs as transparencies, prints, or negatives.
Fixed focal length lens. A lens with a predetermined fixed focal length, a focusing control, and
a choice of iris functions.
Flash memory. Nonvolatile digital storage. Flash memory has slower access than SRAM or DRAM.
Flicker. An annoying picture distortion, mainly related to vertical syncs and video fields display.
Some flicker normally exists due to interlacing; more apparent in 50 Hz systems (PAL). Flicker
also shows when static images are displayed on the screen such as computer-generated text
transferred to video. Poor digital image treatment, found in low-quality system converters (going
from PAL to NTSC and vice versa), creates an annoying flicker on the screen. There are several
electronic methods to minimize flicker.
F-number. In lenses with adjustable irises, the maximum iris opening is expressed as a ratio
(focal length of the lens)/(maximum diameter of aperture). This maximum iris will be engraved
on the front ring of the lens.
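Because image illuminance is proportional to 1/N², each doubling of the F-number (for example F/1.4 to F/2.8, from the sequence under Aperture) cuts the light to a quarter, which is two stops. A short sketch of these relationships; the 8 mm lens with a 5.7 mm opening is an illustrative example, and the function names are hypothetical.

```python
import math

def f_number(focal_length_mm: float, aperture_diameter_mm: float) -> float:
    """N = focal length / aperture diameter."""
    return focal_length_mm / aperture_diameter_mm

def relative_light(n_from: float, n_to: float) -> float:
    """Factor by which image illuminance changes from F/n_from to F/n_to (proportional to 1/N^2)."""
    return (n_from / n_to) ** 2

def stops_between(n_from: float, n_to: float) -> float:
    """Number of stops between two F-numbers (one stop = a factor of 2 in light)."""
    return 2 * math.log2(n_to / n_from)

print(round(f_number(8, 5.7), 2))          # 1.4
print(round(relative_light(1.4, 2.8), 3))  # 0.25
print(round(stops_between(1.4, 2.8), 2))   # 2.0
```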
Focal length. The distance between the optical center of a lens and the principal convergent
focus point.
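In CCTV the focal length is most often used to work out the angle of view for a given imaging chip. A sketch assuming a lens focused at infinity and a 1/3'' CCD with a 4.8 mm × 3.6 mm active area (the chip size is an assumption for the example; the function name is illustrative):

```python
import math

def angle_of_view_deg(sensor_dim_mm: float, focal_length_mm: float) -> float:
    """Angle of view for a lens focused at infinity: 2 * atan(d / (2f))."""
    return math.degrees(2 * math.atan(sensor_dim_mm / (2 * focal_length_mm)))

SENSOR_W_MM = 4.8  # assumed horizontal width of a 1/3-inch CCD
for f in (4, 8, 16):
    print(f"{f:2d} mm: {angle_of_view_deg(SENSOR_W_MM, f):.1f} deg horizontal")
```

Doubling the focal length roughly halves the angle of view for narrow angles, which is why longer lenses are chosen for distant, detailed views.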
Focusing control. A means of adjusting the lens to allow objects at various distances from the
camera to be sharply defined.
Foot-candle (fc). An illumination unit used mostly in American CCTV terminology, equal to one
lumen per square foot. One foot-candle equals approximately 10 lux (more precisely, 10.76 lux).
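The conversion factor follows from the definition (one lumen per square foot) and the exact length of the foot, 0.3048 m. A minimal sketch with illustrative function names:

```python
FT_IN_M = 0.3048                      # exact by definition
LUX_PER_FOOTCANDLE = 1 / FT_IN_M**2   # 1 lm/ft^2 expressed in lm/m^2, about 10.764

def footcandles_to_lux(fc: float) -> float:
    return fc * LUX_PER_FOOTCANDLE

def lux_to_footcandles(lux: float) -> float:
    return lux / LUX_PER_FOOTCANDLE

print(round(footcandles_to_lux(1), 2))    # 10.76
print(round(lux_to_footcandles(0.5), 3))  # 0.046
```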
Fourier Transformation. Mathematical transformation of time domain functions into frequency domain functions.
Frame. Refers to a composition of lines that make one TV frame. In CCIR/PAL TV system
one frame is composed of 625 lines, while in EIA/NTSC TV system of 525 lines. There are 25
frames/second in the CCIR/PAL and 30 in the EIA/NTSC TV system. (See also Field.)
Frame-interline transfer (FIT). Refers to one of the three principles of charge transfer in CCD
chips. The other two are interline and frame transfer.
Frame store. An electronic device that digitizes a TV frame (or TV field) of a video signal and
stores it in memory. Multiplexers, fast scan transmi