Technical guide to network video.

Technologies and factors to consider for the successful deployment
of IP-based security surveillance and remote monitoring applications.
Axis’ technical guide to network video
The market for network video products has grown tremendously since Axis introduced the
industry’s first network camera in 1996. The rapid deployment of network video indicates an
irreversible shift from old, analog video technologies as network video advances with ever more
effective, innovative and easier to use products.
Huge strides have been made in video quality. HDTV surveillance cameras are becoming the
norm and more megapixel cameras are being introduced. There are cameras that can handle
challenging lighting conditions such as low light, high contrast lighting and total darkness,
enabling improved surveillance capability. Processors in cameras and video encoders are not
only faster but also smarter. In addition, efficient video compression techniques as well as a new
type of iris control, P-Iris, have been introduced.
There are more product choices to meet a variety of needs. There are smaller, more discreet—
even covert—cameras, as well as thermal network cameras. Different fields of view, from
telephoto to 360° panorama, are available. Axis’ product development has also focused on easy
and flexible installation. Outdoor cameras, for example, are weather-proofed right out of the
box. Virtually all Axis cameras and video encoders support Power over Ethernet, which simplifies
installation. Many varifocal fixed cameras (box and dome) allow the focus and angle of view to
be remotely set from a computer. Many fixed cameras also have the ability to stream vertically
oriented views that maximize coverage of vertical areas such as aisles and hallways.
Managing cameras and video streams is becoming easier. There is increased support for
intelligent video functionalities. There are also video management solutions to suit every type of
customer—whether it is a retail store with a few cameras or one involving hundreds of cameras
at multiple sites. Products that support ONVIF can be easily integrated into systems that
incorporate other ONVIF-conformant products from different manufacturers.
Greater network bandwidth is becoming more commonplace, and technologies have improved
to make the transmission of data over wired and wireless networks safer and more robust.
Progress has also been made in storage solutions, especially for small systems. Available today
are high capacity network-attached storage (NAS) solutions that provide terabytes of storage
at minimal costs and memory cards that enable weeks’ worth of video to be stored in a
camera or video encoder.
The range of network video products is widening and the scope of their capabilities is increasing.
This is reflected in the Technical Guide, which aims to provide network video users with a better
understanding of the technologies and products that are available to meet their surveillance needs.
Table of contents
Network video: overview, benefits and applications
1.1 Overview of a network video system
1.2 Benefits
1.3 Applications
1.3.1 Retail
1.3.2 Transportation
1.3.3 Banking and finance
1.3.4 City surveillance
1.3.5 Education
1.3.6 Government
1.3.7 Healthcare
1.3.8 Industrial
1.3.9 Critical infrastructure
Network cameras
2.1 What is a network camera?
2.1.1 AXIS Camera Application Platform
2.1.2 Application programming interface
2.2 Camera features for handling difficult scenes
2.2.1 Lens’ light gathering ability (F-number)
2.2.2 Iris
2.2.3 Day/night functionality
2.2.4 Infrared (IR) illuminators
2.2.5 Lightfinder technology
2.2.7 Exposure control settings
2.2.8 Wide dynamic range (WDR)
2.2.9 Thermal radiation
2.3 Camera features for ease of installation
2.3.2 Focused at delivery
2.3.3 Remote focus and zoom
2.3.4 Remote back focus
2.3.5 3-axis camera angle adjustment
2.3.6 Corridor Format
2.3.7 Pixel counter
2.4 Types of network cameras
2.4.1 Fixed network cameras
2.4.2 Fixed dome network cameras
2.4.3 Functionalities in multi-megapixel fixed and
fixed dome cameras
2.4.4 Covert network cameras
2.4.5 PTZ network cameras
2.4.6 Thermal network cameras
2.5 Guidelines for selecting a network camera
Camera elements
Light sensitivity
Lens elements
3.2.1 Field of view
3.2.2 Matching lens and sensor
3.2.3 Lens mount standards for exchangeable lens
3.2.4 F-number and exposure
3.2.5 Types of iris control
3.2.6 Depth of field
Removable IR-cut filter
Image sensors
Image scanning techniques
3.5.1 Interlaced scanning
3.5.2 Progressive scanning
Exposure control
3.6.1 Exposure priority
3.6.2 Exposure zones
3.6.3 Dynamic range
3.6.4 Backlight compensation
Installing a network camera
Video encoders
What is a video encoder?
4.1.1 Video encoder components and considerations
4.1.2 Event management and intelligent video
Standalone video encoders
Rack-mounted video encoders
Video encoders with analog PTZ cameras
Deinterlacing techniques
Video decoder
Environmental protection
Protection and ratings
External housings
Transparent coverings
Positioning a fixed camera in a housing
Vandal and tampering protection
5.5.1 Vandal-resistant ratings
5.5.2 Camera/housing design
5.5.3 Mounting
5.5.4 Camera placement
5.5.5 Intelligent video
Types of mounting
5.6.1 Ceiling mounts
5.6.2 Wall mounts
5.6.3 Pole mounts
5.6.4 Parapet mounts
Video resolutions
NTSC and PAL resolutions
VGA resolutions
Megapixel resolutions
High-definition television (HDTV) resolutions
Video compression
Compression basics
7.1.1 Video codec
7.1.2 Image compression vs. video compression
Compression formats
7.2.1 Motion JPEG
7.2.2 MPEG-4
7.2.3 H.264 or MPEG-4 Part 10/AVC
Variable and constant bit rates
Comparing standards
Audio applications
Audio support and equipment
Audio modes
Half duplex
Full duplex
Audio detection alarm
Audio compression
Sampling frequency
Bit rate
Audio codecs
Audio and video synchronization
Network technologies
Local area network and Ethernet
9.1.1 Types of Ethernet networks
9.1.2 Connecting network devices and network switch
9.1.3 Power over Ethernet
Sending data over the Internet
9.2.1 IP addressing
9.2.2 Data transport protocols for network video
Quality of Service
Network security
9.5.1 User name and password authentication
9.5.2 IP address filtering
9.5.3 IEEE 802.1X
9.5.4 HTTPS or SSL/TLS
9.5.5 VPN (Virtual Private Network)
Wireless technologies
10.1 802.11 WLAN standards
10.2 WLAN Security
10.2.1 WEP (Wired Equivalent Privacy)
10.2.2 Wi-Fi Protected Access
10.2.3 Recommendations
10.3 Wireless bridges
10.4 Wireless mesh network
Video management systems
11.1 Types of video management solutions
11.1.1 Decentralized solution for small systems - AXIS Camera Companion
11.1.2 Hosted video solution for businesses with many
small sites
11.1.3 Centralized, general client-server solution for
medium-sized systems - AXIS Camera Station
11.1.4 Customized solutions for small to big systems
from Axis’ partners
11.2 System features
11.2.1 Viewing
11.2.2 Multi-streaming
11.2.3 Video recording
11.2.4 Recording and storage
11.2.5 Event management and intelligent video
11.2.6 Administration and management features
11.2.7 Security
11.3 Integrated systems
11.3.1 Point of Sale
11.3.2 Access control
11.3.3 Building management
11.3.4 Industrial control systems
11.3.5 RFID
Bandwidth and storage considerations
12.1 Bandwidth and storage calculations
12.1.1 Bandwidth needs
12.1.2 Calculating storage needs
12.2 Edge storage
12.2.1 Edge storage with SD cards or NAS
12.3 Server-based storage
12.4 NAS and SAN
12.5 Redundant storage
12.6 System configurations
Tools and resources
Axis Communications’ Academy
Network video: overview, benefits and applications - CHAPTER 1
Network video: overview, benefits and applications
Network video, like many other kinds of communications such as e-mail, web browsing and computer telephony, is conducted over wired or wireless IP (Internet Protocol)
networks. Digital video and audio streams, as well as other data, are communicated
over the same network infrastructure. Network video provides users, particularly in
the security surveillance industry, with many advantages over traditional analog CCTV
(closed-circuit television) systems.
This chapter provides an overview of network video, as well as its benefits and
applications in various industry segments. Comparisons with an analog video
surveillance system are often made to provide a better understanding of the scope and
potential of a digital, network video system.
Overview of a network video system
Network video, often also called IP-based video surveillance or IP surveillance as it is applied in
the security industry, uses a wired or wireless IP network as the backbone for transporting
digital video, audio and other data. When Power over Ethernet (PoE) technology is applied, the
network can also be used to carry power to network video products.
A network video system allows video to be monitored and recorded from anywhere on the
network, whether it is, for instance, on a local area network (LAN) or a wide area network (WAN)
such as the Internet.
[Figure 1.1a diagram labels: Axis network cameras; Axis video encoders (AXIS Q7900 Rack, AXIS Q7406 Video Encoder); analog cameras; computer with video management; remote access from office/home computer with web browser]
Figure 1.1a A network video system comprises many different components, such as network cameras, video
encoders and video management software. The other components including the network, storage and servers are all
standard IT equipment.
The core components of a network video system consist of the network camera, the video
encoder (used to connect analog cameras to an IP network), the network, the server and storage,
and video management software. As the network camera and the video encoder are computer-based equipment, they have capabilities that cannot be matched by an analog CCTV camera. The
network camera, the video encoder and the video management software are considered the
cornerstones of an IP surveillance solution.
The network, the server and storage components involve standard IT equipment. The ability to use
common off-the-shelf equipment is one of the main benefits of network video. Other components
of a network video system include accessories, such as mountings, PoE midspans and joysticks.
Each network video component is covered in more detail in other chapters.
Benefits
A fully digital, network video surveillance system provides a host of benefits and advanced
functionalities that cannot be provided by a traditional analog video surveillance system. The
advantages include high image quality, remote accessibility, event management and intelligent
video capabilities, easy integration possibilities and better scalability, flexibility and
cost-effectiveness.
High image quality: In a video surveillance application, high image quality is essential to
be able to clearly capture an incident in progress and identify persons or objects involved.
With progressive scan and HDTV/megapixel technologies, a network camera can deliver
better image quality and higher resolution than an analog camera. For more on image
quality, see chapters 2, 3 and 6.
Image quality can also be more easily retained in a network video system than in an analog
surveillance system. With today’s analog systems that use a digital video recorder (DVR) as
the recording medium, many analog-to-digital conversions take place: first, analog signals
are converted to digital in the camera and then back to analog for transportation; then the
analog signals are digitized for recording. Captured images are degraded with every
conversion between analog and digital formats and with the cabling distance. The further
the analog video signals have to travel, the weaker they become. In a fully digital IP
surveillance system, images from a network camera are digitized once and they stay digital
with no unnecessary conversions and no image degradation due to distance traveled over a network.
Remote accessibility: Network cameras and video encoders can be configured and accessed
remotely, enabling multiple, authorized users to view live and recorded video at any time and
from virtually any networked location in the world. This is advantageous if users would like
a third-party company, such as an alarm monitoring center or law enforcement, to also
gain access to the video.
Event management and intelligent video: There is often too much recorded video and too little
time to analyze it properly. Network video products can address this problem in a few
ways. Network cameras and video encoders, for instance, can be programmed to send video
for recording only when an event, whether scheduled or triggered, occurs. This reduces
the amount of uninteresting recordings. Video recordings can also be tagged with descriptive
information called metadata to make it easier to search for and analyze video that is of
interest.
Axis network video products support intelligent video functionalities (for example, video
motion detection, active tampering alarm, audio detection, tripwire and third-party
applications such as people counting and heat mapping). They may also provide I/O
(input/output) connections to external devices such as lights. These features allow users to
define the conditions or event triggers for an alarm. When those conditions are met, the products can
automatically respond with programmed actions. Configurable actions may include video
recording to one or more sites, whether local and/or off-site for security purposes;
activating external devices such as alarms, lights and door position switches; and sending
notification messages to users. Event management functionalities can be configured using
the network video product’s web pages or using a video management software program. For
more on video management, see Chapter 11.
Figure 1.2a Setting up an event trigger using the network video product’s web page.
Easy, future-proof integration: Network video products based on open standards can be
easily integrated into a wide array of video management systems. Video from a network
camera can also be integrated into other systems such as point of sales, access control or
a building management system. An analog system, on the other hand, rarely has an open
interface for easy integration with other systems and applications. For more on integrated
systems, see Chapter 11.
Scalability and flexibility: A network video system can grow with a user’s needs—one
camera at a time, while analog systems can often only grow in steps of four or 16 at a time.
IP-based systems provide a means for network video products and other types of applications
to share the same wired or wireless network for communicating data. Video, audio, PTZ
and I/O commands, power and other data can be carried over the same cable and any number
of network video products can be added to the system without significant or costly changes
to the network infrastructure. This is not the case with an analog system. In an analog video
system, a dedicated cable (normally coax) must run directly from each camera to a viewing/
recording station. Separate pan/tilt/zoom (PTZ) and audio cables may also be required.
Network video products can also be placed and networked from virtually any location, and
the system can be as open or as closed as desired. Since a network video system is based on
standard IT equipment and protocols, it can benefit from those technologies as the system
grows. For instance, video can be stored on redundant servers placed in separate locations to
increase reliability, and tools for automatic load sharing, network management and system
maintenance can be used—none of which is possible with analog video.
Cost-effectiveness: An IP surveillance system typically has a lower total cost of ownership
than a traditional analog CCTV system. An IP network infrastructure is often already in
place and used for other applications within an organization, so a network video application
can piggyback off the existing infrastructure. IP-based networks and wireless options are
also much less expensive alternatives than traditional coaxial and fiber cabling for an
analog CCTV system. In addition, digital video streams can be routed around the world using
a variety of interoperable infrastructure. Management and equipment costs are also lower
since back-end applications and storage run on industry standard, open systems-based
servers, not on proprietary hardware such as a DVR in the case of an analog CCTV system.
A network video system may also provide insights into ways of improving a business. For
example, in retail applications, implementing network video analytics may help improve
customer flow and enhance sales.
Furthermore, network video products can support Power over Ethernet technology. PoE
enables networked devices to receive power from a PoE-enabled switch or midspan through
the same Ethernet cable that transports data (video). There is, therefore, no need for a power
outlet right at the camera location. PoE provides substantial savings in installation costs and
can increase the reliability of the system. For more on PoE, see Chapter 9.
[Figure 1.2b diagram labels: network camera with built-in PoE; camera without built-in PoE; uninterruptible power supply (UPS); PoE-enabled switch; active splitter; Power over Ethernet]
Figure 1.2b A system that uses Power over Ethernet.
Secure communication: Network video products as well as the video streams can be
secured in many ways. They include user name and password authentication, IP address
filtering, authentication using IEEE 802.1X, and data encryption using HTTPS (SSL/TLS) or
VPN. There is no encryption capability in an analog camera and no authentication
possibilities. Anyone can tap into the video or replace the signal from an analog camera with
another video signal. Network video products also have the flexibility to provide multiple
user access levels. For more on network security, see chapters 9 and 10.
Existing analog video installations, however, can migrate to a network video system and take
advantage of some of the digital benefits with the help of video encoders and devices such as
Ethernet over coax adapters, which make use of legacy coax cables. For more on video encoders
and decoders, see Chapter 4.
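Two of the measures listed under secure communication, IP address filtering and multiple user access levels, can be illustrated with a short sketch. It is not how any particular product implements them: the network ranges, the three level names and the function names below are all assumptions chosen for illustration.

```python
import ipaddress

# Hypothetical access levels, ordered from lowest to highest privilege.
ACCESS_LEVELS = ("viewer", "operator", "administrator")

# Hypothetical allow-list of networks that may reach the video product.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.168.10.0/24"),
    ipaddress.ip_network("10.0.5.16/28"),
]


def ip_allowed(client_ip: str) -> bool:
    """Return True if the client address falls inside an allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


def has_access(user_level: str, required_level: str) -> bool:
    """Return True if user_level is at least as privileged as required_level."""
    return ACCESS_LEVELS.index(user_level) >= ACCESS_LEVELS.index(required_level)


def may_perform(client_ip: str, user_level: str, required_level: str) -> bool:
    """Combine IP address filtering with a per-user access level check."""
    return ip_allowed(client_ip) and has_access(user_level, required_level)
```

Both checks must pass, so a user with the highest access level is still rejected when connecting from an address outside the allowed networks.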
Applications
Network video can be used in an almost unlimited number of applications. Most of its uses fall
under security surveillance or remote monitoring of people, places, property and operations.
Increasingly, network video is also being used to improve business efficiency as the number of
intelligent video applications grows. The following are some typical application possibilities in
key industry segments.
1.3.1 Retail
Network video systems in retail stores can significantly reduce theft,
improve staff security and optimize store management. A major benefit
of network video is that it can be integrated with a store’s EAS (electronic article surveillance) system or a POS (point of sale) system to
provide a picture and a record of shrink-related activities. The system
can enable rapid detection of potential incidents, as well as any false
alarms. Network video also offers a high level of interoperability and can give
a quick return on investment.
Network video, together with intelligent video applications, can help identify the most popular
areas of a store and provide a record of consumer activity and buying behaviors that will help
optimize the layout of a store or display. It can also count the number of people entering and
exiting a store to help, for instance, in staff planning and show when more cash registers need
to be opened because of long queues.
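The people-counting use described above comes down to simple arithmetic on entry and exit counts. The following is a minimal sketch under stated assumptions: the function names and the limit of four customers per register are hypothetical, and a real analytics application would supply the counts.

```python
import math


def occupancy(entries: int, exits: int) -> int:
    """People currently in the store, never reported below zero."""
    return max(entries - exits, 0)


def registers_needed(queue_length: int, max_queue_per_register: int = 4) -> int:
    """Registers to open so that no queue exceeds the assumed limit."""
    if queue_length <= 0:
        return 1  # keep at least one register staffed
    return math.ceil(queue_length / max_queue_per_register)
```

Clamping the occupancy at zero guards against counting errors, such as an exit that was registered without a matching entry.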
1.3.2 Transportation
Network video helps to protect passengers, staff and assets in all
modes of transport. Within public transportation, all security
cameras—from stations, terminals, buses, trains and tunnels—can be
connected to a security center. When an incident occurs, security
operators can view live video from the relevant cameras to quickly
decide on the appropriate action. At airports, network video is also
becoming a tool that is used to increase the efficiency of a wide
range of services in areas such as parking, retail, check-in, catering
services and security control.
Harbors and logistics terminals benefit from network video’s built-in detection capabilities,
which can automatically alert security staff when a perimeter is breached. Network video can
also be used to monitor traffic conditions to reduce congestion and enable quick response to
accidents. A wide variety of Axis network cameras meet tough indoor and outdoor conditions.
For use onboard vehicles such as buses and trains, Axis offers network cameras that can withstand
varying temperatures, humidity, dust, vibrations and vandalism.
1.3.3 Banking and finance
Banks have been using video surveillance for a long time, and while
most installations are still analog, network video is commonly used
for new and retrofit installations. This enables a bank to efficiently
monitor its headquarters, branch offices and ATMs from a
central location. The system can be equipped with intelligent capabilities that automatically send alerts for ATM fraud attempts such
as skimming, card jamming or cash trapping. All video can be
recorded in HDTV quality, providing detailed images of persons and
objects that facilitate investigations and positive identification.
1.3.4 City surveillance
Network video is one of the most useful tools for fighting crime and
protecting citizens. It can be used to detect and deter. The use of
wireless networks has enabled effective city-wide deployment of
network video. Installation costs can be greatly reduced with
network cameras that offer quick and reliable installation features,
including the ability to focus and configure cameras remotely over
the network. The remote surveillance capabilities of network video
have enabled police to respond quickly to crimes observed in
live view.
1.3.5 Education
From daycare centers to universities, network video systems help to
deter vandalism and increase the safety of staff and students. They
allow efficient monitoring of all indoor and outdoor facilities and
provide high quality images that enable positive identification of
persons and objects. In addition, network cameras can generate
automatic alarms. For example, if a camera is tampered with, or if
there is noise or motion in a building during off hours, real-time
images can be sent to security staff. Network video can also be used
for remote learning, for example, for students who are unable to attend lectures in person. The
system can be easily connected to an existing network infrastructure, thus keeping installation
and maintenance costs down.
1.3.6 Government
Network video can be used by law enforcement, military and border
control. It is also an efficient means to secure all kinds of public
buildings, from museums and libraries to court buildings and prisons. Cameras placed at building entrances and exits can record who
comes in and out, 24 hours a day. They can be used to prevent vandalism and increase security for staff and visitors.
1.3.7 Healthcare
Network video enables hospitals and healthcare facilities to improve
the overall safety and security of staff, patients and visitors. In case
of alarms, authorized security and hospital staff can view live video
from critical areas such as emergency rooms, psychiatric
departments and medical supply rooms to quickly get a clear view of
the situation. Network video also enables high-quality patient
monitoring, remote care from specialists and remote learning.
1.3.8 Industrial
Network video is not only an efficient tool to secure perimeters and
premises; it can also be used to monitor and increase efficiencies in
manufacturing lines, processes and logistic systems. In hazardous or
cleanroom areas, remote monitoring shortens troubleshooting and
response times. For industries with multiple production sites, network
video can greatly reduce the amount of travel required for technical
support issues.
1.3.9 Critical infrastructure
Whether it is a solar plant, an electrical substation or a waste management facility, network video can help ensure safe, secure and
uninterrupted activity every day. Production data from remote sites
can be enhanced with visual information.
IP-based surveillance systems enable new security and business possibilities for all industry
segments. Learn more from Axis case studies at
Network cameras
A wide range of network cameras are available today to meet a variety of needs in
terms of form, use, light sensitivity, resolution and environmental considerations.
This chapter provides a description of what a network camera is, the different options
and features that it may have, and the different types of cameras available: fixed
cameras, fixed domes, covert cameras, PTZ (pan/tilt/zoom) and thermal cameras. A
camera selection guide is included at the end of the chapter. For more on camera
elements, see Chapter 3.
What is a network camera?
A network camera, often also known as an IP camera, is used primarily to send video/audio over
an IP network such as a local area network (LAN) or the Internet. A network camera enables live
viewing and/or recording, either continuously, at scheduled times, on request or when triggered
by an event. Video can be saved locally and/or at a remote location, and authorized access to
video can be made wherever there is access to an IP network.
[Figure 2.1a diagram labels: Axis network camera; PoE switch; computer with video management software]
Figure 2.1a A network camera connects directly to the network.
A network camera can be described as a camera and computer combined in one unit. The main
components of a network camera include a lens, an image sensor, one or several processors,
and memory. The processors are used for image processing, compression, video analysis and
networking functionalities. The memory is used mainly for storing the network camera’s firmware (computer program), but also for storing video for shorter or longer periods of time.
Like a computer, the network camera has its own IP address, is connected directly to a wired or
wireless network and can be placed wherever there is a network connection. This differs from
a web camera, which can only operate when it is connected to a personal computer (PC) via a
USB or IEEE 1394 port and which requires software to be installed on the PC. A network camera
provides web server, FTP (File Transfer Protocol) and e-mail functionalities, and supports many
other IP network and security protocols.
In addition to capturing video, Axis network cameras provide event management and intelligent video functionalities such as video motion detection, audio detection, active tampering
alarm and autotracking. Many network cameras also offer input/output (I/O) ports that enable
connections to external devices such as motion sensors and relays (for controlling, for instance,
the locking/unlocking of doors). Event management is about defining an event that is triggered
either from features in the network video products or from other systems, and configuring the
products or the system to automatically respond to the event by, for example, recording video,
sending alert notifications and activating different devices such as doors and lights. Users can
configure network video products to only record when an event is triggered. In this way, event
management enables a surveillance system to more efficiently use network bandwidth and
storage space.
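The event management flow described above (define a trigger, then respond automatically with configured actions) can be sketched as a small rule table. This is an illustrative model only; the class names, trigger names and action functions are hypothetical and do not represent the products' actual configuration interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EventRule:
    """Maps one trigger name to the actions that should follow it."""
    trigger: str
    actions: List[Callable[[str], str]] = field(default_factory=list)


def record_video(trigger: str) -> str:
    return f"recording started ({trigger})"


def send_notification(trigger: str) -> str:
    return f"notification sent ({trigger})"


def activate_output(trigger: str) -> str:
    return f"output port activated ({trigger})"


class EventManager:
    def __init__(self) -> None:
        self._rules: Dict[str, EventRule] = {}

    def add_rule(self, rule: EventRule) -> None:
        self._rules[rule.trigger] = rule

    def handle(self, trigger: str) -> List[str]:
        """Run every action configured for the trigger, in order."""
        rule = self._rules.get(trigger)
        if rule is None:
            return []  # nothing configured, so nothing is recorded or sent
        return [action(trigger) for action in rule.actions]
```

Because an unconfigured trigger produces no actions, video is recorded only when a defined event occurs, which is how event management saves bandwidth and storage space.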
Other network camera features may include audio capabilities, built-in support for Power over
Ethernet (PoE), and a memory card slot for local storage of recordings. Axis network cameras
also support advanced security and network management features.
[Figure 2.1b diagram labels: zoom puller; P-Iris lens; internal microphone; memory card slot; iris connector; audio in; audio out; I/O terminal]
Figure 2.1b Front, back and underside of a network camera.
Network cameras can be accessed over the network by entering the product’s IP address in the
Address/Location field of a computer’s web browser. Once a connection is made with the network video product, the product’s ‘start page’, along with links to the product’s configuration
pages, is automatically displayed in the web browser.
The built-in web pages of Axis network video products enable users to, among many things,
define user access, configure camera settings, set the resolution, frame rate and compression
format (H.264/Motion JPEG), as well as action rules for when an event occurs. Managing a
network video product through its built-in web pages works when only a few cameras are involved in a system. For professional installations or systems with many cameras, the use of a
video management solution, in combination with the cameras’ built-in web pages, is recommended. For more on video management solutions, see Chapter 11.
Axis network cameras also support a host of accessories that extend the cameras’ abilities. For
example, network cameras can be connected to a fiber optic network using a media converter
switch or to coax cables using an Ethernet over coax adapter with support for Power over Ethernet.
2.1.1 AXIS Camera Application Platform
Most Axis network video products support AXIS Camera Application Platform, which enables
compatible applications—typically intelligent video applications—that are accessible from Axis’
website to be downloaded to the products. It allows the products to boost their intelligent video
capabilities with applications either from Axis or from third-party suppliers of video analytics.
An example of such an application is AXIS Cross Line Detection, which is a tripwire application
that detects and triggers an event when moving objects cross a virtual line.
Figure 2.1c AXIS Cross Line Detection is well suited for many situations, including video monitoring of building
entrances, loading docks and parking lots.
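The idea behind a tripwire can be sketched geometrically: an event is raised when the path of a tracked object between two frames intersects the virtual line. The code below is a generic segment-intersection test, not the algorithm used by AXIS Cross Line Detection; the function names and the point representation are assumptions.

```python
from typing import Tuple

Point = Tuple[float, float]


def _side(a: Point, b: Point, p: Point) -> float:
    """Cross product: positive on one side of line a->b, negative on the other, 0 on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crosses_line(line_start: Point, line_end: Point, prev: Point, curr: Point) -> bool:
    """True if the move prev->curr strictly crosses the segment line_start->line_end."""
    d1 = _side(line_start, line_end, prev)
    d2 = _side(line_start, line_end, curr)
    d3 = _side(prev, curr, line_start)
    d4 = _side(prev, curr, line_end)
    # The object must change sides of the virtual line, and the line's
    # end points must lie on opposite sides of the object's path.
    return d1 * d2 < 0 and d3 * d4 < 0
```

The sign of `d1` also indicates from which side the object approached, which a fuller implementation could use to trigger only on crossings in one direction.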
2.1.2 Application programming interface
All Axis network video products have an application programming interface (API) called VAPIX®.
VAPIX enables developers to easily integrate Axis video products and their built-in functionalities into software solutions. VAPIX also enables an Axis camera with upgraded firmware to be
backward compatible with, for example, an existing video management system.
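As a rough illustration of what integrating through an HTTP-based API can look like, the sketch below only builds a request URL for a Motion JPEG stream with a chosen resolution and frame rate. The path `/axis-cgi/mjpg/video.cgi` and the parameter names `resolution` and `fps` are assumptions here and should be checked against the VAPIX documentation for the product and firmware in use; the function name and the example address are hypothetical.

```python
from urllib.parse import urlencode


def mjpeg_stream_url(host: str, resolution: str = "640x480", fps: int = 15) -> str:
    """Build an HTTP URL requesting a Motion JPEG stream with the given settings.

    The path and parameter names are assumed, not taken from this guide.
    """
    query = urlencode({"resolution": resolution, "fps": fps})
    return f"http://{host}/axis-cgi/mjpg/video.cgi?{query}"
```

The function performs no network access; a client would still need to authenticate and, preferably, use HTTPS as discussed under network security.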
Most of Axis’ network video products are ONVIF conformant. ONVIF, which is a global, open
industry forum founded by Axis, Bosch and Sony in 2008, works to standardize the network interface of network video products of different manufacturers to ensure greater interoperability.
It gives users the flexibility to use ONVIF conformant products from different manufacturers in
a multi-vendor network video system. ONVIF has rapidly gained momentum and is today endorsed by the majority of the world’s largest manufacturers of IP video products. ONVIF now has
more than 400 member companies involved. For more information, visit
Camera features for handling difficult scenes
Security cameras face many challenges that affect their ability to provide quality video for
effective surveillance. Scenes may have changing and wide ranging light levels, and conditions
such as complete darkness, haze and smoke may present problems to getting usable video. To
address these scenarios, cameras may be equipped with a variety of features (see list below)
that are important to consider as they have an impact on image quality.
2.2.1 Lens’ light gathering ability (f-number)
Camera lenses with a small f-number have better light gathering ability. In general, the smaller
the f-number, the better the lens performs in low-light settings. Sometimes a higher f-number is
preferable for handling some types of lighting. A camera’s light sensitivity depends not only on
its lens, but also on the image sensor and image processing. More details on lenses and image
sensors are covered in Chapter 3.
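The effect of the f-number can be made concrete with the standard optical relationship that the light reaching the sensor is proportional to 1/N², where N is the f-number. The sketch below is an idealized comparison that ignores lens transmission losses. The two f-numbers are illustrative values, not figures from this guide.

```python
def relative_light(f_number: float, reference_f_number: float) -> float:
    """Light gathered relative to a reference lens, using the 1/N^2 relationship."""
    return (reference_f_number / f_number) ** 2


# Illustrative comparison: an f/1.4 lens against an f/2.8 lens.
gain = relative_light(1.4, 2.8)
print(f"f/1.4 gathers about {gain:.1f}x the light of f/2.8")
```

Halving the f-number therefore gives roughly four times as much light, which is why small differences in f-number matter in low-light scenes.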
Lenses with a manually adjustable iris are suitable for scenes with a constant light level. For
scenes with changing light levels, an automatically adjustable iris (DC-iris/P-Iris) is recommended to provide the right level of exposure. Cameras with P-Iris enable better iris control for
optimal image quality in all lighting conditions. More details are covered in Chapter 3.
2.2.3 Day/night functionality
A network camera with day/night functionality has an automatically removable infrared-cut
filter. The filter is on during daytime, enabling the camera to produce colors as the human eye
sees them. At night, the filter is removed to enable the camera to take advantage of near infrared
light and produce good quality, black and white images. This is one way of extending a network
camera’s usefulness in low-light conditions.
Figure 2.2a At left, an image in Day mode. At right, an image in Night mode.
2.2.4 Infrared (IR) illuminators
In low light or complete darkness, built-in IR LEDs in a camera or a separately installed infrared
illuminator will strengthen a camera’s ability to use near infrared light to deliver quality black
and white images. Near infrared light from the moon, street lamps or IR illuminators is not visible to the human eye, but a camera’s image sensor can detect it. (Near infrared light is just
beyond the visible part of the light spectrum and has longer wavelengths than visible light.)
IR illuminators provide different illumination distances. The illumination with built-in IR LEDs in
Axis cameras can be adjusted to match the viewing angle and can be activated automatically in
darkness, upon an event or on request from a user. Axis cameras with built-in IR LEDs simplify
installation and provide a cost-effective option. External IR illuminators, meanwhile, give installers the flexibility to choose the IR illuminator—for instance, a long range one—and place the
light where it is needed and not necessarily at the same location as the camera.
Figure 2.2b At left, Night mode image without the use of IR illuminators (whereby the camera made use of the small
amount of light coming underneath a door in the left-hand corner of the room). At right, Night mode image with IR illuminators.
2.2.5 Lightfinder technology
Cameras with Axis’ Lightfinder technology have extreme light sensitivity. Such cameras can
deliver color images in as little light as 0.18 lux or lower. This is achieved through the optimal
selection of the image sensor and lens, Axis’ image processing know-how and in-house ASIC
chip development. For more details, see the Lightfinder white paper at
Figure 2.2c Scene (with 0.4 lux of illumination at the back wall) shown at left using a camera that has switched
over to Night mode, and at right using a camera with Lightfinder technology, which is still functioning in Day mode,
providing a color image and details such as the box on the floor at the back wall.
A camera’s resolution is defined by the number of pixels in an image provided by an image sensor.
Depending on the lens used, the resolution can mean either more details in an image or a wider field
of view to cover a larger area of a scene. Cameras with megapixel sensors offer images with one
million or more pixels. With a wide viewing angle, a megapixel camera can cover a wider area
than a non-megapixel camera. With a narrow viewing angle, it can enable viewers to see
greater detail, which helps in identifying people and objects. Cameras supporting HDTV
720p (1280x720 pixels) and HDTV 1080p (1920x1080 pixels), which are approximately 1 and 2
megapixels, respectively, are gaining popularity since they follow standards that guarantee full
frame rate, high color fidelity and a 16:9 aspect ratio. For more details about image sensors and
resolution, see Chapters 3 and 6, respectively.
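The "approximately 1 and 2 megapixels" figures for the HDTV formats follow directly from the pixel dimensions given above, as this short calculation shows. The helper name is only for illustration.

```python
def megapixels(width: int, height: int) -> float:
    """Total pixel count expressed in millions of pixels."""
    return width * height / 1_000_000


formats = {"HDTV 720p": (1280, 720), "HDTV 1080p": (1920, 1080)}
for name, (w, h) in formats.items():
    print(f"{name}: {megapixels(w, h):.2f} MP, aspect ratio {w / h:.2f}:1")
```

Both formats come out at an aspect ratio of about 1.78:1, which is the 16:9 ratio mentioned above.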
2.2.7 Exposure control settings
When the level of lighting changes, Axis cameras automatically adjust to ensure optimal exposure.
The cameras also give users the option of modifying various exposure control settings in challenging
situations. For example, in low light situations, users can increase gain to enable more details to be
seen. The downside is that noise may be more visible. In low light, users can also increase the exposure
time to get a brighter image but this may lead to smearing of moving objects. Exposure zones may
also be available, enabling users to set the area of an image that should be more properly exposed.
Backlight compensation is another technique that can be used in a camera to enable objects in dark
areas to be visible against a very bright background (e.g., in front of a window/entrance).
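The trade-off between exposure time and smearing can be estimated with simple arithmetic: the blur, in pixels, is roughly the speed of the object across the image multiplied by the exposure time. The object speed and exposure times below are hypothetical values chosen only to show the trend.

```python
def motion_blur_px(speed_px_per_s: float, exposure_s: float) -> float:
    """Approximate smear length in pixels for an object moving across the image."""
    return speed_px_per_s * exposure_s


# Hypothetical object crossing the image at 600 pixels per second.
speed = 600.0
for exposure in (1 / 250, 1 / 30, 1 / 8):
    blur = motion_blur_px(speed, exposure)
    print(f"exposure {exposure:.4f} s -> about {blur:.1f} px of blur")
```

A longer exposure brightens the image but lengthens the smear in direct proportion, so the acceptable setting depends on how fast objects in the scene are expected to move.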
2.2.8 Wide dynamic range (WDR)
For surveillance scenes with very bright and dark areas, such as entrance doors in a retail/office
environment, the entrance to an indoor parking garage or tunnel, or train platforms, a camera
with wide dynamic range may provide the best solution. WDR cameras often incorporate an image
sensor that takes different exposures of a scene (e.g., a short exposure for very bright areas and a long
exposure for dark areas) and combines them into one image, enabling objects in both bright and dark
areas of a scene to be visible. For more details, see the WDR white paper at
Figure 2.2d At left, image from a conventional camera. At right, image with a WDR camera.
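The idea of combining a short and a long exposure can be illustrated with a toy merge. This is a minimal sketch of the general principle only, not the method used in any particular camera: it assumes 8-bit pixel values, a known exposure ratio, and a simple rule that prefers the long exposure unless it is clipped. Real WDR pipelines also blend smoothly and tone-map the result for display.

```python
def merge_exposures(short, long, exposure_ratio, saturation=255):
    """Merge two exposures of the same scene, pixel by pixel.

    Uses the long-exposure value where it is not clipped (dark areas),
    otherwise the short-exposure value scaled by the exposure ratio
    (bright areas). Returns values on the long-exposure brightness scale.
    """
    merged = []
    for s, l in zip(short, long):
        if l < saturation:
            merged.append(float(l))
        else:
            merged.append(float(s) * exposure_ratio)
    return merged


# Hypothetical pixel values; the long exposure is 8x the short one.
short = [2, 10, 60, 200]
long = [16, 80, 255, 255]
print(merge_exposures(short, long, exposure_ratio=8))
```

The merged values span a wider range than either exposure alone, which is what lets detail survive in both the bright and the dark parts of the scene.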
2.2.9 Thermal radiation
Besides the use of sunlight, artificial light and near infrared light, there is thermal radiation, which
can be used to generate images. A thermal network camera requires no light source. Instead it
detects thermal radiation emitted from every object with a temperature above absolute zero (0 K).
The hotter the object, the greater the radiation. Greater temperature differences produce higher
contrast thermal images. Thermal network cameras can be used to detect subjects in complete
darkness or under other challenging conditions such as smoke or light fog, or when subjects are
hiding in shadows or obscured by a complex background. Such cameras are also not blinded by
strong lights. Thermal cameras are ideal for detection purposes and can be used to complement
conventional cameras to enhance the effectiveness of a surveillance system.
Figure 2.2e At left, image from a conventional camera. At right, image from a thermal camera.
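The statement that hotter objects radiate more can be quantified with the Stefan-Boltzmann law, under which the power radiated per unit area grows with the fourth power of absolute temperature. The sketch below assumes ideal blackbody behavior (emissivity of 1), and the two temperatures are illustrative. Real surfaces emit somewhat less.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W / (m^2 * K^4)


def radiated_power_per_area(temp_kelvin: float, emissivity: float = 1.0) -> float:
    """Radiated power per unit area in W/m^2 (Stefan-Boltzmann law)."""
    return emissivity * STEFAN_BOLTZMANN * temp_kelvin ** 4


# Illustrative temperatures: skin at about 310 K, a wall at about 293 K.
person = radiated_power_per_area(310.0)
wall = radiated_power_per_area(293.0)
print(f"person: {person:.0f} W/m^2, wall: {wall:.0f} W/m^2, ratio {person / wall:.2f}")
```

Even a temperature difference of under 20 K changes the radiated power by roughly a quarter, which is the kind of contrast a thermal sensor relies on.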
2.3 Camera features for ease of installation
Axis network cameras incorporate features that make the products easy to install and use, as
well as more reliable by minimizing installation errors. They include the following.
Outdoor-ready products are ready right out of the box for installation outdoors. No separate
housing is required. The products are designed to meet a range of operating temperatures and
offer protection against dust, rain and snow. Some even meet military standards for operation
in harsh climates.
2.3.2 Focused at delivery
To make installation quicker and simpler, Axis cameras with a fixed focal lens are focused at the
factory, which eliminates the need to focus them at the installation site. This is possible since
fixed focal cameras with a wide or mid-range field of view usually have a wide depth of field
(the range where near and far objects are in focus). For an explanation of focal length, f-number and depth of field, see Chapter 3.
2.3.3 Remote focus and zoom
A varifocal camera with remote focus and zoom eliminates the need for manual focusing and field
of view adjustment at the camera location. The camera, together with the lens motor, allows the
focus and viewing angle to be remotely controlled and adjusted from a computer on the network.
2.3.4 Remote back focus
A CS-mount varifocal camera with remote back focus allows the focus to be fine-tuned remotely from a computer by enabling the image sensor to move. This functionality works even
with optional lenses.
2.3.5 3-axis camera angle adjustment
Axis’ fixed dome cameras are designed with a 3-axis camera angle adjustment that allows the
lens holder (comprising the lens and image sensor) to pan, tilt and rotate. This enables the
cameras to be mounted on a wall or ceiling. Users can then easily adjust the cameras’ direction
and level the image. The flexibility of the camera adjustment, together with the ability to rotate
the image using the cameras’ web page, enables users to get vertically oriented video streams
(Axis’ Corridor Format).
Figure 2.3a 3-axis camera angle adjustment.
2.3.6 Corridor Format
Axis’ Corridor Format enables a fixed/fixed dome camera to provide a vertically oriented video
stream. The vertical format optimizes the coverage of areas such as corridors, hallways and
aisles, maximizing image quality while eliminating bandwidth and storage waste. It enables, for
example, HDTV network cameras to deliver video with a 9:16 aspect ratio. With a fixed dome,
it is achieved first by rotating the 3-axis lens 90° (or with a fixed camera, by positioning it on
its side), and then rotating the video image back 90° in the camera’s web page.
Figure 2.3b A display of camera views using Axis’ Corridor Format.
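The geometry behind Corridor Format is a plain 90° rotation: a 16:9 frame becomes a 9:16 frame with the same number of pixels. The sketch below shows this on a tiny stand-in frame held as a list of rows. It illustrates the rotation only and says nothing about how the camera performs it internally.

```python
def rotate_90_clockwise(frame):
    """Rotate a row-major 2D frame by 90 degrees clockwise."""
    return [list(row) for row in zip(*frame[::-1])]


# Stand-in for a 16:9 frame: 9 rows of 16 pixels each.
frame = [[0] * 16 for _ in range(9)]
rotated = rotate_90_clockwise(frame)
print(f"before: {len(frame[0])} x {len(frame)}, after: {len(rotated[0])} x {len(rotated)}")
```

Because the pixel count is unchanged, the vertical stream spends its pixels on the corridor itself rather than on the walls to either side.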
2.3.7 Pixel counter
Axis’ pixel counter helps ensure that the video has sufficient resolution to meet
goals such as facial identification. It can be used to verify that the pixel resolution of an object
fulfills regulatory or customer requirements.
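The check that a pixel counter supports can also be estimated on paper before installation. The sketch below computes how many horizontal pixels fall on an object from the image width and the width of the scene at the object's distance. The face width, scene width and the required pixel count are hypothetical values for illustration. Actual requirements come from the applicable regulation or customer specification.

```python
def pixels_across_object(image_width_px: int, scene_width_m: float, object_width_m: float) -> float:
    """Horizontal pixels covering an object, given the scene width at its distance."""
    return image_width_px * object_width_m / scene_width_m


REQUIRED_FACE_PX = 80  # hypothetical requirement, not taken from this guide

# Hypothetical setup: 1920-pixel-wide image covering a 4 m wide scene, 0.16 m wide face.
face_px = pixels_across_object(1920, 4.0, 0.16)
meets_requirement = face_px >= REQUIRED_FACE_PX
print(f"{face_px:.1f} px across the face, requirement met: {meets_requirement}")
```

In this example the face falls slightly short of the assumed requirement, so the installer would narrow the field of view or choose a higher resolution.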
Figure 2.3c Axis’ pixel counter is a visual aid shaped as a frame with a corresponding counter to show the box’s
width and height. The pixel counter helps verify, for instance, that the pixel resolution of a face is enough for facial identification.
2.4 Types of network cameras
Network cameras can be classified in terms of whether they are designed for indoor use only or
for indoor and outdoor use. An outdoor camera requires an external, protective housing unless
the camera design already incorporates a protective enclosure. For more on environmental protection, see Chapter 5.
Network cameras, whether for indoor or outdoor use, can be further categorized into fixed,
fixed dome, covert, PTZ and thermal network cameras.
2.4.1 Fixed network cameras
Figure 2.4a Fixed network cameras, including models with features such as wireless, built-in IR illuminators,
HDTV/multi-megapixel, WDR, Lightfinder, outdoor-ready and vandal-resistant design.
A fixed network camera is a camera that has a fixed viewing direction once it is mounted. It
may come with a fixed, varifocal or motorized zoom lens, and the lens may be exchangeable on
some cameras. A fixed camera is the traditional camera type where the camera and the direction in which it is pointing are clearly visible. This type of camera represents the best choice in
applications where it is advantageous to make the camera very noticeable. Fixed cameras can
be installed in protective enclosures. Axis’ outdoor fixed cameras come pre-installed in housings. Fixed cameras can also be mounted on a pan/tilt motor for greater viewing flexibility.
2.4.2 Fixed dome network cameras
Figure 2.4b Fixed dome network cameras, including models with features such as panoramic view, HDTV/multi-megapixel, built-in IR illuminators, WDR, Lightfinder, outdoor-ready and vandal-resistant design.
A fixed dome network camera is a fixed camera in a dome design. It may come with a fixed,
varifocal or motorized zoom lens, and the lens may be exchangeable on some cameras.
The camera can be directed to point in any direction. Its main benefit lies in its discreet,
non-obtrusive design, as well as in the fact that it is hard to see in which direction the camera
is pointing. The camera is also tamper resistant. Axis’ fixed dome cameras provide different
types and levels of protection such as vandal- and dust-resistance, and IP66 and NEMA 4X
ratings for outdoor installations. The cameras can be mounted on a wall, ceiling or pole.
A fixed dome with a wide-angle lens and a megapixel sensor that provides a 360° field of view
is often known as a panoramic or 360° camera.
Figure 2.4c An Axis 5-megapixel 360° fixed dome camera offers multiple viewing modes such as 360° overview,
panorama, view area with digital PTZ and quad view.
2.4.3 Functionalities in multi-megapixel fixed and fixed dome cameras
Multi-megapixel fixed and fixed dome cameras are becoming more common. While the multi-megapixel resolution offers advantages, as described earlier, it also increases
bandwidth and storage requirements. However, functionalities have been developed to use
such cameras in innovative ways that help reduce bandwidth and storage needs. Some functionalities that Axis’ multi-megapixel cameras can support are described below.
> Digital PTZ: Since a multi-megapixel camera can cover a large area, the camera may enable
digital pan/tilt/zoom capability with preset positions.
> AXIS Digital Autotracking: This application, when installed in an Axis multi-megapixel
camera, aims to reduce bandwidth and storage requirements particularly in low-traffic
surveillance situations where it is unnecessary to continuously send the camera’s full view
at maximum resolution. AXIS Digital Autotracking enables the camera to automatically
detect movement in its field of view and stream the part of the view where there is activity.
The cropped viewing area zeroes in on and follows moving objects with no loss in image
quality. As the application will not lock on a single object, the view can “zoom out” to
cover moving objects in different areas of the camera’s field of view, ensuring that no
incidents are missed. When there is no movement, a scaled-down overview of the camera’s
full view is streamed. While the size of the video streams is reduced, the video quality of
zoomed-in views is maintained using the camera’s original pixel resolution. Depending on
the scenario, AXIS Digital Autotracking—in SVGA (800x600) resolution at 30 frames per
second—can reduce bandwidth/storage use by approximately 90% compared with a
continuous 2-megapixel video stream at 30 frames per second. Correspondingly, a digital
autotracking stream in VGA (640x480) at 12 frames per second can reduce bandwidth/storage use by
approximately 95% compared with a continuous 5-megapixel video stream at 12 frames per second.
Figure 2.4d At left, a scaled down 5-megapixel image. At right, AXIS Digital Autotracking provides a cropped VGA
view—with no loss in image quality—of the area where there is activity.
> Multi-view streaming: This functionality allows several cropped view areas from a multi-megapixel camera to be streamed simultaneously, simulating up to eight virtual cameras.
Each stream can be individually configured. The streams, for instance, can be sent at
different frame rates for live viewing or recording. Multi-view streaming gives users the
ability to reduce bandwidth and storage use while being able to cover a large area with just
one camera.
Figure 2.4e One multi-megapixel camera. Full overview enabling cropped view areas. Multiple virtual camera views
(up to eight views possible).
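The savings quoted for cropped streams can be sanity-checked by comparing raw pixel rates (width x height x frames per second). This is only a rough proxy: actual bandwidth depends on compression and scene content, which is why the guide's figures are stated as scenario-dependent. The sketch assumes 1920x1080 for the 2-megapixel stream and 2592x1944 for the 5-megapixel stream. Those dimensions are assumptions, since the guide gives only the megapixel counts.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Uncompressed pixels per second for a stream."""
    return width * height * fps


def reduction(cropped_rate: int, full_rate: int) -> float:
    """Fractional reduction in pixel rate when streaming the cropped view."""
    return 1 - cropped_rate / full_rate


# SVGA crop vs. an assumed 1920x1080 2-megapixel stream, both at 30 fps.
svga_vs_2mp = reduction(pixel_rate(800, 600, 30), pixel_rate(1920, 1080, 30))
# VGA crop vs. an assumed 2592x1944 5-megapixel stream, both at 12 fps.
vga_vs_5mp = reduction(pixel_rate(640, 480, 12), pixel_rate(2592, 1944, 12))
print(f"SVGA vs 2 MP: {svga_vs_2mp:.0%}, VGA vs 5 MP: {vga_vs_5mp:.0%}")
```

Pixel count alone accounts for most of the reduction. Any additional saving would have to come from factors this proxy leaves out, such as the scaled-down overview streamed when there is no movement.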
2.4.4 Covert network cameras
Covert cameras are designed to blend into the environment and be virtually impossible to
discover. They can be placed at eye-level at entrances or integrated into things such as ATM
machines for discreet or covert surveillance. They can enable close-up shots for identification
purposes or overview surveillance. Tampering risks are also reduced. Using a pin-hole lens, Axis’
indoor/outdoor covert network cameras provide resolutions of up to 1 MP, including HDTV 720p,
and come pre-mounted with an Ethernet cable for both power and data. The cameras are ideal
for use in retail stores, banks and hospitals.
[Figure labels: main unit; sensor unit (lens and image sensor)]
Figure 2.4f Covert cameras, such as an AXIS P12 Network Camera pictured above, blend easily into a variety of
environments. The sensor unit can be integrated into very small spaces, such as behind a thin metal sheet in a doorway, behind any wall, in an ATM machine or in a special casing. The main unit can be placed up to 8 m (26 ft.) away.
Figure 2.4g Covert cameras in AXIS P85 Network Camera Series, which are pre-mounted for eye-level placement,
provide discreet surveillance and the best angle of view for facial identification compared with ceiling-mounted cameras.
2.4.5 PTZ network cameras
Figure 2.4h PTZ network cameras including HDTV and outdoor-ready models, as well as (at far right) a dual PTZ
camera that combines both a visual (conventional) and thermal camera in one unit for mission-critical surveillance.
A PTZ camera provides pan, tilt and zoom functions (using manual or automatic control), enabling
wide area coverage and great details when zooming in. An Axis PTZ camera usually has the ability to pan 360°, tilt 180° or 220°, and is often equipped with a zoom lens. (A zoom lens provides
an optical zoom that maintains image resolution, as opposed to a digital zoom, which enlarges an
image with loss in image quality.)
PTZ commands are sent over the same network cable used for video transmission (no need for RS-485 wires as is the case with an analog PTZ camera). PTZ cameras with support for Power over
Ethernet (PoE/PoE+/High PoE) also do not require separate power cables, unlike an analog PTZ camera.
PTZ cameras can come in various form factors; the most common is a PTZ dome, which is
ideal for use in discreet installations due to its design, mounting (particularly in indoor, drop-ceiling mounts), and the difficulty of seeing the camera’s viewing angle. In outdoor installations, the
cameras are usually mounted on poles or walls of a building.
In operations with live monitoring, PTZ cameras can be used to follow a person or object, and
zoom in for closer inspection. In unmanned operations, automatic guard tour on PTZ cameras
can be used to monitor different areas of a scene. In guard tour mode, one PTZ network camera
can cover an area where many fixed network cameras would be needed. The main drawback is
that only one location can be monitored at any given time.
Axis’ high-end PTZ domes offer high-speed endless pan, tilt and zoom, and provide mechanical
robustness for continuous operation in guard tour mode. PTZ domes with a mechanical stop
incorporate Axis’ Auto-flip functionality to enable them to pan 360°.
Figure 2.4i At left, wide view and at right, 20x zoomed-in view with an HDTV 1080p PTZ dome, enabling texts on the
cargo ship to be read 1.6 km (1 mile) away from the camera.
Figure 2.4j At left, wide view and at right, 20x zoomed-in view with an HDTV 1080p PTZ dome, enabling the license
plate to be read 275 m (900 ft.) away from the camera.
It is worth noting that an HDTV camera with a lower zoom factor may be able to provide the same
level of detail in zoomed-in views as a lower resolution camera with a higher zoom. This was illustrated when comparing an 18x zoom, HDTV 720p Axis camera with a 4CIF, 36x zoom camera.
For details, see whitepaper on 18x vs. 36x zoom at
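The comparison above can be approximated with simple arithmetic: at full optical zoom, the number of horizontal pixels available per unit of the original 1x field of view is the horizontal resolution multiplied by the zoom factor. The sketch assumes that 4CIF is 704 pixels wide and that both cameras have the same field of view at 1x. Neither assumption is stated in this guide.

```python
def detail_at_full_zoom(horizontal_pixels: int, zoom_factor: int) -> int:
    """Horizontal pixels per width of the 1x field of view at full optical zoom."""
    return horizontal_pixels * zoom_factor


hdtv_720p_18x = detail_at_full_zoom(1280, 18)
four_cif_36x = detail_at_full_zoom(704, 36)  # 704 px width assumed for 4CIF
print(hdtv_720p_18x, four_cif_36x, f"ratio {hdtv_720p_18x / four_cif_36x:.2f}")
```

Under these assumptions the two cameras land within about 10% of each other, while the HDTV camera covers a wider area at any given level of detail.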
PTZ domes are not limited to high-end installations. Using Axis’ palm-sized, ceiling-mount PTZ
cameras, price-sensitive installations such as retail stores have the flexibility to easily change
where the cameras are pointing and use them as tools to improve store management as well as
secure premises.
Another innovative product from Axis combines an HDTV PTZ dome camera with a wide-angle
lens converter that provides a 360° field of view. AXIS P5544 PTZ Dome Network Camera can
switch between a 360° field of view for overview surveillance, and pan, tilt and zoom in with a
separate lens for close-up views in HDTV resolution and with no loss in image quality. This kind
of camera is ideal for live monitoring applications.
Figure 2.4k With the ability to cover a 360° field of view and mechanically pan, tilt and zoom in with no loss in
image quality, AXIS P5544 can cover an area more than 950 m² (10,000 sq. ft.). The above left image shows the live
view in Overview mode (with a digital magnifier in the corner) and at right, the zoomed-in view in Normal mode.
Some of the features that can be incorporated in a PTZ camera include:
> 3D privacy masking. 3D privacy masking, which is supported in most Axis PTZ cameras,
enables selected areas of a scene to be blocked or masked from viewing and recording. It
allows masking to be maintained even as the camera’s field of view changes through
panning, tilting and zooming since the masking moves with the camera’s coordinate system.
Figure 2.4l With built-in privacy masking (gray rectangles in image), the camera can guarantee privacy for areas
that should not be covered by a surveillance application.
> E-flip. When a PTZ camera is mounted on a ceiling and is used to follow a person in, for
example, a retail store, there will be situations when a person will pass just under the
camera. When following through on the person, images would be seen upside down without
the E-flip functionality. E-flip electronically rotates images 180° in such cases. It is
performed automatically and will not be noticed by an operator.
> Preset positions/guard tour. PTZ cameras enable a number of preset positions, normally
between 20 and 100, to be programmed. Once the preset positions have been set in the
camera, it is very quick for the operator to go from one position to the next. In guard tour
mode, the camera can be programmed to automatically move from one preset position to the
next in a pre-determined order or at random. Normally up to 20 guard tours can be set up
and activated during different times of the day.
> Tour recording. The tour recording functionality in PTZ cameras enables easy setup of an
automatic tour using a device such as a joystick to record an operator’s pan/tilt/zoom
movements and length of time spent at each point of interest. The tour can then be activated
at a touch of a button or at a scheduled time.
> Autotracking. Autotracking is an intelligent video functionality that will automatically
detect a moving person or vehicle and follow it within the camera’s area of coverage. Autotracking is particularly beneficial in unmanned video surveillance situations where the
occasional presence of people or vehicles requires special attention. The functionality substantially
cuts the cost of a surveillance system since fewer cameras are needed to
cover a scene. It also increases the effectiveness of the solution since it allows a PTZ
camera to record areas of a scene with activity.
> Advanced/Active Gatekeeper. Advanced Gatekeeper enables an Axis PTZ camera to pan, tilt
and zoom in to a preset position when motion is detected in a pre-defined area and return
to home position after a set time. When this is combined with the ability to continue to track
the detected object, the function is called Active Gatekeeper.
> Electronic image stabilization (EIS). In outdoor installations, PTZ cameras with zoom
factors above 20x are sensitive to vibrations and motion caused by traffic or wind. EIS helps
reduce the effects of vibration in a video. In addition to getting more useful video, EIS will
reduce the file size of the compressed image and thereby save valuable storage space.
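Of the features listed above, the guard tour is the easiest to picture as logic: visit a list of preset positions in a fixed order or at random, repeating as needed. The sketch below models only that sequencing. The preset names and the function are hypothetical and do not correspond to any camera API.

```python
import random


def guard_tour(presets, rounds=1, shuffle=False, seed=None):
    """Yield preset names in the order a guard tour would visit them."""
    rng = random.Random(seed)
    for _ in range(rounds):
        order = list(presets)
        if shuffle:
            rng.shuffle(order)  # random order within each round
        yield from order


# Hypothetical preset positions.
presets = ["entrance", "loading dock", "parking lot"]
print(list(guard_tour(presets, rounds=2)))
```

In a real system each step would also carry a dwell time and a movement speed, and the tour would be activated on a schedule as described above.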
2.4.6 Thermal network cameras
Figure 2.4m Indoor and outdoor thermal network cameras, as well as (at far right) a dual PTZ camera that combines
both a visual (conventional) and thermal camera in one unit for mission-critical surveillance.
Thermal network cameras create images based on heat that radiates from all objects. Images are
generally produced in black and white but can be artificially colored to make it easier to distinguish different shades. Thermal images are best when there are great temperature differences in
a scene; the hotter an object, the brighter it is in a thermal image.
Thermal cameras are ideal for detecting people, objects and incidents in shadows, complete darkness or in other challenging conditions such as smoke and dust. The cameras are used primarily to
detect suspicious activities as thermal images do not enable reliable identification. They, therefore,
complement and support conventional network cameras in a surveillance installation.
Thermal cameras can be used for perimeter or area protection, providing a powerful and cost-effective alternative to radio frequency intruder detection, electrified fences and flood lights. In
the dark, they provide discreet surveillance since there is no need for artificial light. In public
areas, thermal cameras can help secure dangerous or off-limit areas such as tunnels, railway
tracks and bridges. Indoor uses include building security and emergency management, enabling
humans to be detected inside a building, whether after business hours or during emergencies
such as a fire. Thermal cameras are often used in high security buildings and areas such as nuclear power plants, prisons, airports, pipelines and sensitive railway sections.
A thermal camera requires special optics since regular glass will block the thermal radiation.
Most thermal camera lenses are made using germanium, which enables infrared light and thermal radiation to pass through. How much or how far away a thermal camera can “see” or detect
depends on the lens. A wide-angle lens enables a thermal camera to have a wider field of view,
but a shorter detection range than a telephoto lens, which provides a longer detection range
with a narrower field of view.
A thermal camera also requires a special, more expensive image sensor. Detectors used for thermal
imaging can be broadly divided into two types: uncooled thermal image sensors and cooled
thermal image sensors.
Sensors in uncooled thermal cameras operate at or close to the ambient temperature and work between 8 µm and 14 µm in the long-wave infrared range. Uncooled sensors are often based
on microbolometer technology. Uncooled thermal image sensors are smaller and less expensive
than cooled image sensors. Hence, an uncooled thermal camera is more affordably priced. Such
cameras also have a longer life span.
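One way to see why the 8 µm to 14 µm band suits surveillance is Wien's displacement law, which gives the wavelength at which a blackbody radiates most strongly as b / T, with b of about 2898 µm·K. The sketch below applies it to two illustrative temperatures. It assumes ideal blackbody behavior and is not a statement about any specific sensor.

```python
WIEN_B_UM_K = 2897.77  # Wien's displacement constant in µm·K


def peak_wavelength_um(temp_kelvin: float) -> float:
    """Wavelength of peak blackbody emission, in micrometers."""
    return WIEN_B_UM_K / temp_kelvin


# Illustrative temperatures for a person and a room-temperature wall.
for label, temp in [("person, about 310 K", 310.0), ("wall, about 293 K", 293.0)]:
    print(f"{label}: peak near {peak_wavelength_um(temp):.1f} µm")
```

Both peaks fall between 9 µm and 10 µm, inside the long-wave infrared range in which uncooled sensors work.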
Cooled thermal image sensors are usually contained in a vacuum-sealed case and cooled to temperatures as low as -210 °C (-346 °F) to reduce noise created by their own thermal radiation at
higher temperatures. Cooling allows the sensors to operate in the mid-wave infrared band, approx. 3 to
5 µm (see Figure 2.4n), which provides better spatial resolution and
higher thermal contrast since such sensors can distinguish smaller temperature differences and
produce crisp, high resolution images. The disadvantages of such detectors are that they are
bulky, expensive and energy-consuming, and that the coolers must be rebuilt every 8,000 to 10,000 hours.
A thermal camera’s sensitivity to infrared radiation is expressed as its NETD value (Noise Equivalent Temperature Difference). The lower the NETD value, the better the sensitivity to infrared radiation.
[Figure: spectrum diagram with a wavelength axis in micrometers (µm)]
Figure 2.4n Conventional cameras work in the range of visible light, i.e. with wavelengths between approximately
0.4–0.7 μm. Thermal cameras, on the other hand, are designed to detect radiation in the much broader infrared
spectrum, up to around 14 μm (the distances in the spectrum above are not according to scale).
Thermal imaging technologies, which were originally developed for military use, are regulated.
In order for a thermal camera to be freely exported, the maximum frame rate cannot exceed 9
frames per second (fps). Thermal cameras with a frame rate of up to 60 fps can be sold within
the EU, Norway, Switzerland, Canada, U.S.A., Japan, Australia and New Zealand on the condition
that the buyer is registered and can be traced.
2.5 Guidelines for selecting a network camera
With the variety of network cameras available, it is useful to have some guidelines when
selecting a network camera.
> Define the surveillance goal: overview or high detail, and detection, recognition or
identification. Overview images aim to view a scene in general or view the general
movements of people. High detail images are important for identification of persons or
objects (e.g., face or license plate recognition, point-of-sales monitoring). The surveillance
goal will determine the field of view, the placement of the camera, and the type of camera/
lens required. For more on lenses, see Chapter 3.
> Area of coverage. For a given location, determine the number of areas of interest, how much
of these areas should be covered and whether the areas are located relatively close to each
other or spread far apart. The area will determine the type of camera and number of
cameras required.
- Megapixel/HDTV or lower resolution. For instance, if there are two, relatively small
areas of interest that are close to each other, an HDTV/megapixel camera with a wide-angle lens can be used instead of two lower resolution cameras.
- Fixed or PTZ. An area may be covered by several fixed/fixed dome cameras or a few PTZ
cameras. Consider that a PTZ camera with a high optical zoom can provide highly
detailed images and survey a large area. A conventional PTZ camera may provide a
brief view of one part of its area of coverage at a time, while a fixed camera will be able
to provide full coverage of its area all the time. The special PTZ dome with the
additional 360° field of view provides a middle ground where full, wide area coverage
can be provided when pan/tilt/zoom is not used. To make full use of a PTZ camera, an
operator is required or an automatic tour needs to be set up.
> Indoor or outdoor environment.
- Light sensitivity and lighting requirements. Cameras come with different light
sensitivities. There are two factors that purchasers can look at: one is the lowest
f-number on the camera lens (the lower the number, the more light sensitive it is); the
other is the lux specification (the lower, the better). The lux specification takes into
account the combined performance of several factors such as the lens, image sensor
and image processing. (Keep in mind that lux measurements on network cameras are
not comparable among different network video product vendors as there is no industry
standard for measuring light sensitivity.)
In outdoor environments, consider the use of day/night cameras. Day/night cameras
with Axis’ Lightfinder technology have extended light sensitivity, providing color
information even in dark environments. Meanwhile, cameras with built-in IR LEDs, or
with external IR illuminators, help enhance black and white video in low light and will
also provide usable video in completely dark conditions. If adding external light through
the use of a normal lamp or an IR illuminator is not an option, consider the use of
thermal cameras for detection in complete darkness.
In scenes with backlight (e.g., an indoor camera pointing at a window or door) or
scenes with a combination of very bright and dark areas, repositioning the camera may
be an answer to getting better video quality. If such scenarios are unavoidable,
consider cameras with wide dynamic range (WDR). A good WDR surveillance camera
can deliver images that capture details in both well-lit and dark areas.
- Protection. If the camera is to be placed outdoors or in environments that require
protection, select cameras with the appropriate specifications, such as IP51/52 for
indoor cameras, IP66 and NEMA 4X for outdoor cameras, IK08/10 for vandal/impact resistance, and operating temperatures that are suitable for the environment.
Specialized, external housings are also available. For more on environmental protection,
see Chapter 5.
> Overt or covert surveillance. This will help in selecting the cameras, as well as the type of
housing and mount, that offer a non-discreet or discreet installation.
Other important feature considerations that may be required of a camera include:
> Resolution. For applications that require detailed images, HDTV/megapixel cameras may be
the best option. For more on megapixel resolution, see Chapter 6.
> Compression. Axis’ latest network video products support H.264 and Motion JPEG video
compression formats. H.264 offers the greatest savings in bandwidth and storage. For more on
compression, see Chapter 7.
> Audio. If audio is required, consider whether one- or two-way audio is needed. An Axis network
camera with audio support comes with a built-in microphone and/or an input for an external
microphone and a speaker or a line out for external speakers. For more on audio, see Chapter 8.
> Event management and intelligent video. Event management is often configured using a
video management software program. Event management is enhanced with the use of input/
output ports and intelligent video functionalities in a network video product. Making
recordings based on event triggers from input ports and/or intelligent video features in a
network video product saves on bandwidth and storage use, and allows operators to take
care of more cameras since not all cameras require live monitoring unless an alarm/event
takes place. For more on event management functions, see Chapter 11.
> Edge storage. Edge storage allows an Axis network video product to create, control and
manage recordings either locally on a memory card or to network shares on a
network-attached storage (NAS) or file server. Many Axis network video products have a
built-in SD card slot or a micro version of it. When integrated with video management
software, edge storage can provide an easy video management solution for systems with a
few cameras at a site. For mission-critical installations, at remote locations or in mobile
situations, edge storage can help create a more robust and flexible video surveillance
system. For more on video management functions, see Chapter 11.
> Networking functionalities. Considerations include PoE; HTTPS encryption for encrypting
video streams before they are sent over the network; IP address filtering, which gives or
denies access rights to defined IP addresses; IEEE 802.1X to control access to a network;
IPv6; Quality of Service to prioritize traffic over a network; and wireless functionality. For
more on networking and security technologies, see Chapter 9.
> Open interface and application software. A network video product with an open interface
enables better integration possibilities with other systems. It is also important that the
product is supported by a good selection of application software, and management software
that enables easy installation and upgrades of network video products. Axis products are
supported by a variety of video management software and intelligent video applications
from Axis and more than 1,000 of its Application Development Partners. For more on video
management systems, see Chapter 11.
Another important consideration, outside of the network camera itself, is the selection of the
network video product vendor. Since needs grow and change, the vendor should be seen as a
partner, and a long-term one. This means that it is important to select a vendor that offers a full
product line of network video products and accessories that can meet the needs now and well
into the future. The vendor should also provide innovation, support, upgrades and product path
for the long term.
Once a decision has been made as to the required camera, it is a good idea to purchase one and
test its quality before setting out to order quantities of it.
Camera elements
There are a number of camera elements that have an impact on image quality and the
field of view and are, therefore, important to understand when choosing a network
camera. The elements include the light sensitivity of a camera, the type of lens, type of
image sensor and scanning technique, as well as image processing functionalities—all
of which are discussed in this chapter. Some guidelines on installation considerations
are also provided at the end.
3.1 Light sensitivity
A network camera’s light sensitivity is defined mainly by the lens and image sensor, which are
discussed in the following sections. Light sensitivity is often specified in terms of lux, which
corresponds to an illuminance level at which a camera produces an acceptable image. The
lower the lux specification, the better light sensitivity the camera has. Normally, at least 200 lux
is needed to illuminate an object so that a good quality image can be obtained. In general, the
more light on the subject, the better the image. With too little light, focusing will be difficult
and the image will be noisy and/or dark.
Illuminance     Lighting condition
100,000 lux     Strong sunlight
10,000 lux      Full daylight
500 lux         Office light
100 lux         Poorly lit room
Table 3.1a Examples of different levels of illuminance.
Different light conditions offer different illuminance. Many natural scenes have fairly complex
illumination, with both shadows and highlights that give different lux readings in different parts
of a scene. It is important, therefore, to keep in mind that one lux reading does not indicate the
light condition for a scene as a whole.
Many manufacturers specify the minimum level of illumination needed for a network camera to
produce an acceptable image. While such specifications are helpful in making light sensitivity
comparisons for cameras produced by the same manufacturer, it may not be helpful to use such
numbers to compare cameras from different manufacturers. This is because different manufacturers use different methods and have different criteria for what is an acceptable image.
To properly compare the low light performance of two different cameras, the cameras should be
placed side by side and be viewing a moving object in low light.
To capture good quality images in low light or nighttime conditions, Axis provides a variety of
solutions. They include cameras with day/night functionality, which takes advantage of
near-infrared light to produce quality black and white video; day/night cameras with Axis’
Lightfinder technology, which enables color video in very little light; and day/night cameras with
built-in infrared (IR) LEDs or an external IR illuminator to enhance the quality of black and white
video in low light or complete darkness. A thermal camera, which makes use of infrared radiation
from objects (i.e., longer wavelengths than visible light), is another alternative for detection in
complete darkness or in challenging lighting conditions. For more about Lightfinder
technology, cameras with built-in IR LEDs and thermal cameras, see Chapter 2. More information
on IR illuminators can be found on Axis’ website. For
more on day/night functionality, see Section 3.3.
3.2 Lens elements
A lens or lens assembly on a network camera performs several functions. They include:
> Defining the field of view; that is, defining how much of a scene and level of detail are to be
captured.
> Controlling the amount of light passing through to the image sensor so that an image is
correctly exposed.
> Focusing by adjusting either elements within the lens assembly or the distance between the
lens assembly and the image sensor.
3.2.1 Field of view
A consideration to take into account when selecting a camera is the field of view required; that
is, the area of coverage. The field of view is determined by the focal length of the lens and the
size of the image sensor.
A lens’ focal length is defined as the distance between the center of a single lens or a specific
point in a complicated lens assembly and the point where all the light rays converge to a point
(normally the camera’s image sensor). The longer the focal length, the narrower the field of view.
The fastest way to find out what focal length lens is required for a desired field of view is to use
a rotating lens calculator or an online lens calculator, both of which are
available from Axis. The size of a network camera’s image sensor, typically 1/4”, 1/3” and 1/2”,
must also be used in the calculation.
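The relationship between focal length, sensor size and field of view can be sketched in a few lines of Python. The example below is a minimal illustration that assumes an ideal, distortion-free lens. The sensor widths are nominal values assumed for illustration (they are not taken from this guide), and the function names are hypothetical. It is not a replacement for the lens calculators available from Axis.

```python
import math

# Nominal sensor widths in mm (assumed for illustration; real sensors
# of the same nominal format vary slightly from model to model).
SENSOR_WIDTH_MM = {"1/4": 3.2, "1/3": 4.8, "1/2": 6.4}


def horizontal_fov_deg(focal_length_mm, sensor_width_mm):
    """Horizontal angle of view for an ideal (distortion-free) lens."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


def scene_width_m(focal_length_mm, sensor_width_mm, distance_m):
    """Width of the scene covered at a given distance (similar triangles)."""
    return distance_m * sensor_width_mm / focal_length_mm


if __name__ == "__main__":
    width = SENSOR_WIDTH_MM["1/3"]
    for focal_length in (3.0, 8.0, 51.0):
        angle = horizontal_fov_deg(focal_length, width)
        covered = scene_width_m(focal_length, width, distance_m=10.0)
        print(f"{focal_length:5.1f} mm lens: {angle:5.1f} degrees, "
              f"{covered:5.2f} m wide at 10 m")
```

With these assumed widths, a longer focal length on the same sensor gives a narrower angle, and the same lens on a smaller sensor also gives a narrower angle, which is why both values are needed in the calculation.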
The field of view can be classified into three types:
> Normal view: offering the same field of view as the human eye.
> Telephoto: a narrower field of view, providing, in general, finer details than a human eye can
deliver. A telephoto lens is used when the surveillance object is either small or located far
away from the camera. A telephoto lens generally has less light gathering capability than a
normal lens.
> Wide angle: a larger field of view with less detail than in normal view. A wide-angle lens
generally provides good depth of field and fair low-light performance. Wide-angle lenses
produce geometrical distortions such as “fish-eye” and barrel effects.
Figure 3.2a Different fields of view: wide-angle view (at left); normal view (middle); telephoto (at right).
Figure 3.2b Network camera lenses with different focal lengths: wide-angle (at left); normal (middle); telephoto (at right).
There are three main types of lenses:
> Fixed lens: Such a lens offers a focal length that is fixed; that is, only one field of view
(either normal, telephoto or wide angle). A common focal length of a fixed network camera
lens is 3 mm.
> Varifocal lens: This type of lens offers a range of focal lengths, and hence, different fields of
view. The field of view can be adjusted manually or by a motor. Whenever the field of view
is changed, the user has to refocus the lens. Varifocal lenses for network cameras often
provide focal lengths that range from 3 mm to 8 mm.
> Zoom lens: Zoom lenses are like varifocal lenses in that they enable the user to select
different fields of view. However, with zoom lenses, there is no need to refocus the lens if the
field of view is changed. Focus can be maintained within a range of focal lengths, for
example, 5.1 mm to 51 mm. Lens adjustments can be either manual or motorized for remote
control. When a lens states, for example, 10x-zoom capability, it is referring to the ratio
between the lens’ longest and shortest focal length.
3.2.2 Matching lens and sensor
If a network camera comes with an exchangeable lens, it is important to select a lens suitable
for the camera. A lens made for a 1/2-inch image sensor will be large enough for 1/2-inch,
1/3-inch and 1/4-inch image sensors, but not for a 2/3-inch image sensor.
If a lens is made for a smaller image sensor than the one that is actually fitted inside the camera,
the image will have black corners (see left-hand illustration in Figure 3.2c below). If a lens is
made for a larger image sensor than the one that is actually fitted inside the camera, the field
of view will be smaller than the lens’ capability since part of the information will be “lost” outside the image sensor (see right-hand illustration in Figure 3.2c).
Figure 3.2c Examples of different lenses (1/4”, 1/3” and 1/2”) mounted onto a 1/3-inch image sensor.
When replacing a lens on a megapixel camera, a high quality lens is required since megapixel sensors have pixels that are much smaller than those on a VGA sensor (640x480 pixels). It is best to
match the lens resolution to the camera resolution in order to fully use the camera’s capability as
well as other aspects of the lens. Note that lenses may be tailored to a specific camera type in
order to reach maximum performance. Axis’ optional lenses are selected with this in mind.
3.2.3 Lens mount standards for exchangeable lenses
When changing a lens, it is also important to know what type of lens mount the network camera
has. The lens mount is the interface that connects the lens to the camera body. There are three
main mounting standards for exchangeable lenses on Axis network cameras: CS, C and M12. CS
and C mounts are used on fixed cameras, while M12 is used on lenses for fixed dome cameras.
CS and C mount both have a 1-inch thread and they look the same. What differs is the distance
from the lenses to the sensor when fitted on the camera. With CS mount, the distance between
the sensor and the lens should be 12.5 mm. With C mount, the distance should be 17.526 mm. It
is possible to mount a C-mount lens to a CS-mount camera body by using a 5 mm spacer (C/CS
adapter ring). If it is impossible to focus a camera, it is likely that the wrong type of lens is used.
An M12 lens has a metric M12 thread with a 0.5 mm pitch.
3.2.4 F-number and exposure
In low-light situations, particularly in indoor environments, an important factor to look for in a
network camera is the lens’ light-gathering ability. This can be determined by the lens’ f-number,
also known as f-stop. An f-number defines how much light can pass through a lens.
An f-number is the ratio of the lens’ focal length to the diameter of the aperture or iris as seen
from the front of the lens—normally referred to as the entrance pupil; that is, f-number = focal
length/aperture. The smaller the f-number (either short focal length relative to the aperture, or
large aperture relative to the focal length), the better the lens’ light gathering ability; that is,
more light can pass through the lens to the image sensor. In low-light situations, a smaller
f-number generally produces a better image quality. (There may be some sensors, however, that
may not be able to take advantage of a lower f-number in low-light situations due to the way
they are designed.) A higher f-number, on the other hand, increases the depth of field, which is
explained in Section 3.2.6.
F-numbers are sometimes expressed as F/x. The slash indicates division. An F/4 means the
entrance pupil is equal to the focal length divided by 4; so if a camera has a lens with a focal
length of 8 mm, light must pass through an entrance pupil that is 2 mm in diameter.
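The arithmetic above can be written as a short Python sketch. It uses only the definition given in this section (f-number = focal length / entrance pupil diameter) and the standard optical relation that the light reaching the sensor scales with the area of the entrance pupil, that is, with 1/(f-number)². The function names are hypothetical and chosen for illustration.

```python
def entrance_pupil_mm(focal_length_mm, f_number):
    """Entrance pupil diameter from the definition N = f / D."""
    return focal_length_mm / f_number


def relative_light(f_number, reference_f_number):
    """Light gathered at f_number relative to reference_f_number.

    Light through the lens is proportional to the pupil area, and
    therefore to 1 / N**2 for a given focal length.
    """
    return (reference_f_number / f_number) ** 2


if __name__ == "__main__":
    # The example from the text: an 8 mm lens at F/4.
    print(entrance_pupil_mm(8.0, 4.0), "mm entrance pupil")
    # Opening the iris from F/2.8 to F/1.4 (illustrative values).
    print(relative_light(1.4, 2.8), "times as much light")
```

Halving the f-number therefore lets through roughly four times as much light, which is why a lower f-number matters in low-light scenes.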
While lenses with automatically adjustable iris have a range of f-numbers, often only the maximum light gathering end of the range (smallest f-number) is specified.
A lens’ light-gathering ability or f-number, and the exposure time (that is, the length of time an
image sensor is exposed to light) are the two main elements that control how much light an
image sensor receives. A third element, the gain, is an amplifier that is used to make the image
brighter. However, increasing the gain also increases the level of noise (graininess) in an image,
so adjusting the exposure time or iris opening is preferred. For more on exposure control, see
Section 3.6.
3.2.5 Types of iris control: fixed, manual, auto, precise (P-Iris)
The ability to control a camera’s iris opening plays an important role in image quality. An iris is
used to maintain the optimum light level to the image sensor so that images are properly exposed. The iris can also be used to control the depth of field, which is explained in more detail
in Section 3.2.6. Iris control can be fixed or adjustable, and adjustable iris lenses can be manual or automatic. Automatic iris lenses can further be classified as either auto iris or P-Iris.
Fixed iris
With fixed iris lenses, the iris opening cannot be adjusted and is fixed at a certain f-number.
The camera can compensate for changes in the level of light by adjusting the exposure time or
using gain.
Manual iris
With manual iris lenses, the iris can be adjusted by turning a ring on the lens to open or close
the iris. This is not convenient in environments with changing light conditions, such as in outdoor surveillance applications.
Auto iris (DC and video)
There are two types of auto iris lenses: DC iris and video iris. Both use a galvanometer to automatically adjust the iris opening in response to changes in light levels. Both also use an analog
signal (often analog video signal) to control the iris opening. The difference between the two is
where the circuitry to convert the analog signal into control signals is located. In a DC-iris lens,
the circuit resides inside the camera; in a video iris, it is inside the lens.
In bright situations, a camera with an auto iris lens can be affected by diffraction and blurring
when an iris opening becomes too small. This problem is especially prominent in megapixel and
HDTV cameras since the pixels in their image sensors are smaller than those in lower resolution cameras.
Therefore, the image quality is more dependent on getting the right iris opening (aperture). In
order to optimize image quality, a camera needs to have control over the position of the iris
opening. The problem with an auto iris lens is that this control cannot be made available to the
camera or user.
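Why small pixels are more exposed to diffraction can be estimated with the standard Airy disk formula from optics, in which the diameter of the diffraction spot is approximately 2.44 × wavelength × f-number. The sketch below is an illustration only: the pixel sizes are hypothetical examples rather than specifications of any camera, green light at 0.55 μm is assumed, and the function names are invented for this example.

```python
def airy_disk_diameter_um(f_number, wavelength_um=0.55):
    """Approximate diameter of the diffraction spot in micrometers."""
    return 2.44 * wavelength_um * f_number


def spot_size_in_pixels(f_number, pixel_size_um, wavelength_um=0.55):
    """How many pixel widths the diffraction spot covers."""
    return airy_disk_diameter_um(f_number, wavelength_um) / pixel_size_um


if __name__ == "__main__":
    # Hypothetical pixel sizes: a lower resolution sensor with large
    # pixels and a megapixel sensor with small pixels.
    sensors = (("lower resolution", 5.6), ("megapixel", 2.2))
    for name, pixel_um in sensors:
        for f_number in (2.0, 8.0, 16.0):
            spread = spot_size_in_pixels(f_number, pixel_um)
            print(f"{name:16s} F/{f_number:<4g} spot covers {spread:4.1f} pixels")
```

As the iris closes (higher f-number), the spot grows and covers more of the small pixels first, so the megapixel sensor loses sharpness at iris openings that the lower resolution sensor still tolerates.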
P-Iris is an automatic, precise iris control first developed by Axis and
Kowa Company of Japan. It involves a P-Iris lens and specialized software
that optimize image quality. The system is designed to address the shortcomings of an auto-iris lens. P-Iris provides improvements in contrast,
clarity, resolution and depth of field. Having good depth of field—where
objects at different distances from the camera are in focus simultaneously—is important in the video monitoring of, for example, a long corridor or parking lot.
Figure 3.2d The P-Iris image (at right) provides greater depth of field than the old technology image (at left).
Figure 3.2e In the cropped views, the P-Iris image (at right) provides higher contrast than the old technology image (at left).
In bright situations, P-Iris limits the closing of the iris to avoid blurring (diffraction) caused
when the iris opening becomes too small. This can typically happen in cameras that use DC-iris
lenses in combination with megapixel sensors that have small pixels. Being able to avoid diffraction and at the same time benefit from an automatically controlled iris is highly valued in
outdoor video surveillance applications.
A P-Iris lens uses a motor that allows the position of the iris opening to be precisely controlled.
Together with software that is configured to optimize the performance of the lens and image
sensor, P-Iris automatically provides the best iris position for optimal image quality in all lighting conditions.
In an Axis network camera with P-Iris, the camera’s web page provides a scale of f-numbers
that ranges between the widest and smallest iris opening. This feature enables the user to adjust the preferred iris position, which is the iris position used by the automatic control for most
lighting conditions.
Figure 3.2f P-Iris enables the user to adjust the preferred iris position for most lighting conditions
P-Iris allows fixed network cameras to reach a new level of performance in image quality. The
advanced iris control is especially beneficial for megapixel/HDTV cameras and demanding video
surveillance applications.
3.2.6 Depth of field
A criterion that may be important to a video surveillance application is depth of field. Depth of
field refers to the distance in front of and beyond the point of focus where objects appear to be
sharp simultaneously. Depth of field may be important, for instance, in monitoring a parking lot,
where there may be a need to identify license plates of cars at 20, 30 and 50 meters (about 65, 100 and
165 feet) away.
Depth of field is affected by four factors: focal length, f-number, distance of the camera to the
subject, and the circle of confusion, which is a measurement of how carefully an image is
viewed. A long focal length, a large entrance pupil, a short distance between the camera and
the subject or a close-up view will limit the depth of field.
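How these factors interact can be illustrated with the standard thin-lens depth of field formulas based on the hyperfocal distance. The sketch below is an approximation and does not describe how any camera handles focus: the default circle of confusion (0.005 mm) is an assumed value, and the function names are hypothetical.

```python
import math


def hyperfocal_mm(focal_length_mm, f_number, coc_mm):
    """Hyperfocal distance H = f^2 / (N * c) + f, in millimeters."""
    return focal_length_mm ** 2 / (f_number * coc_mm) + focal_length_mm


def depth_of_field_m(focal_length_mm, f_number, focus_distance_m, coc_mm=0.005):
    """Return (near limit, far limit) of acceptable sharpness in meters."""
    f = focal_length_mm
    s = focus_distance_m * 1000.0  # focus distance in mm
    h = hyperfocal_mm(f, f_number, coc_mm)
    near = s * (h - f) / (h + s - 2 * f)
    # At or beyond the hyperfocal distance, sharpness extends to infinity.
    far = math.inf if s >= h else s * (h - f) / (h - s)
    return near / 1000.0, far / 1000.0


if __name__ == "__main__":
    # Illustrative settings: an 8 mm lens focused at 2 m.
    for f_number in (1.4, 4.0, 8.0):
        near, far = depth_of_field_m(8.0, f_number, 2.0)
        print(f"F/{f_number:<3g} sharp from {near:.2f} m to {far:.2f} m")
```

Running the sketch with different inputs reproduces the trends listed above: a longer focal length, a lower f-number (larger entrance pupil) or a shorter focus distance each narrows the range that appears sharp.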
Figure 3.2g Depth of field: Imagine a line of people standing behind each other. If the focus is in the middle of the
line, the depth of field makes it possible to identify the faces of all in front of and behind the mid-point more than 15 m
(about 50 ft.) away.
Figure 3.2h Iris opening and depth of field. The above illustration is an example of the depth of field for different
f-numbers with a focal distance of 2 m (7 ft.). A large f-number (smaller iris opening) enables objects to be in focus
over a longer range. (Depending on the pixel size, very small iris openings may blur an image due to diffraction.)
3.3 Removable IR-cut filter (Day/night functionality)
In many cameras, there is an automatically removable infrared-cut filter that sits behind a camera
lens, and in front of the image sensor. The role of an IR-cut filter is to filter out infrared light to enable
cameras to produce colors that the human eye sees. However, if the filter is removed under low light
or nighttime conditions, the camera’s sensor is able to take advantage of near-infrared light and
deliver black and white images even when there is not enough visible light.
Figure 3.3a Illustration and photo of the IR-cut (day/night) filter on the optical holder, which in this camera slides
sideways on the back side of the front guard to use the red-hued filter during the day and the clear part during the night.
Labeled parts: night filter, image sensor, optical holder, front guard and day filter.
Near-infrared light, which spans from 0.7 micrometers (μm) up to about 1.0 μm, is beyond what the
human eye can see, but most camera sensors can detect it and make use of it.
Figure 3.3b The graph shows how an image sensor responds to visible and near-IR light. Near-IR light spans the 0.7 μm
to 1.0 μm range. Graph labels: B/W mode, color mode, visible light and near-IR light. Axes: relative response versus
wavelength (μm).
Cameras with a removable IR-cut filter have day/night functionality as they deliver color video during
daytime, and during nighttime, black and white video, which reduces the image noise. They have
applications in low-light video surveillance situations, covert surveillance and in environments that
restrict the use of artificial light. An IR illuminator that provides near-infrared light can also be used
in conjunction with a day/night camera to further enhance the camera’s ability to produce high-quality video in low light or complete darkness. Day/night cameras with built-in IR illuminators are
also available.
Figure 3.3c At left, external IR illuminators; at right, two cameras with built-in IR illuminators.
3.4 Image sensors
As light passes through a lens, it is focused on the camera’s image sensor. An image sensor is made
up of many photosites and each photosite corresponds to a picture element (more commonly known
as “pixel”) on an image sensor. Each pixel on an image sensor registers the amount of light it is exposed to and converts it into a corresponding number of electrons. The brighter the light, the more
electrons are generated.
When building a camera, there are two main technologies that can be used for the camera’s
image sensor:
> CMOS (complementary metal-oxide semiconductor)
> CCD (charge-coupled device)
Figure 3.4a Image sensors: CMOS (at left); CCD (at right).
CMOS sensors are developing at a much faster pace than CCDs. The quality of CMOS sensors
has undergone dramatic improvements and they are today well suited for delivering high-performance multi-megapixel video. Compared with CCD sensors, CMOS sensors enable more integration possibilities and functions, and have a faster readout, which is advantageous when
high-resolution images are required. They also have lower power dissipation at the chip level
and a smaller system size. CMOS sensors lower the total cost for cameras since they contain all
the logic needed to build cameras around them. Megapixel CMOS sensors are more widely
available and are often less expensive than megapixel CCD sensors.
The megapixel sensors that are generally used in video surveillance cameras have smaller size
pixels than lower resolution sensors. For this reason, megapixel sensors were known to be less
light sensitive than lower resolution sensors. However, advancements in CMOS technology
make it possible for newer megapixel sensors (and hence, new multi-megapixel cameras) to
match the light sensitivity of many lower resolution sensors and cameras. While megapixel
sensors with larger pixel sizes are available, they are not often used in video surveillance cameras due to the limited availability of lenses that match them.
Image sensors with wide dynamic range are also making it possible to introduce cameras that
can simultaneously show objects in very bright and dark areas of a scene.
CCD sensors, which employ a technology that was specifically developed for the camera industry, have been in use since the 1970s and still present some benefits at moderate resolutions
and video speed. CCD sensors, however, are often more expensive and more complex to incorporate into a camera. A CCD can also consume much more power than an equivalent CMOS sensor.
For details, see the white paper on image sensors.
3.5 Image scanning techniques
Interlaced scanning and progressive scanning are the two techniques available today for reading and displaying information produced by image sensors. Network cameras can make use of
either scanning technique. Analog cameras can only make use of the interlaced scanning technique for transferring images over a coaxial cable and for displaying them on analog monitors.
3.5.1 Interlaced scanning
When an image from an interlaced image sensor is produced, two fields of lines are generated:
a field displaying the odd lines, and a second field displaying the even lines. However, to create
the odd field, information from both the odd and even lines on a sensor is combined. The same
goes for the even field, where information from both the even and odd lines is combined to
form an image on every other line.
When transmitting an interlaced image, only half the number of lines (alternating between odd
and even lines) of an image is sent at a time, which reduces the use of bandwidth by half. The
monitor, for example, a traditional TV, must also use the interlaced technique. First the odd
lines and then the even lines of an image are displayed and then refreshed alternately, at 25 frames (50 fields)
per second for PAL or 30 frames (60 fields) per second for NTSC, so that the human visual system interprets them as
complete images. All analog video formats and some modern HDTV formats are interlaced.
Although the interlacing technique creates artifacts or distortions as a result of ‘missing’ data,
they are not very noticeable on an interlaced monitor.
However, when interlaced video is shown on progressive scan monitors such as computer monitors, which scan lines of an image consecutively, the artifacts become noticeable. The artifacts, which can be seen as “tearing”, are caused by the slight delay between odd and even line
refreshes as only half the lines keep up with a moving image while the other half waits to be
refreshed. It is especially noticeable when the video is stopped and a freeze frame of the video
is analyzed.
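The size of the "tearing" effect can be estimated from the time between the two fields and the speed of the object. The sketch below is a simplified illustration: it uses the nominal field rates of 50 fields per second (PAL) and 60 fields per second (NTSC), the 20 km/h speed is only an example value, and the function names are hypothetical.

```python
def field_interval_ms(fields_per_second):
    """Time between the odd field and the even field, in milliseconds."""
    return 1000.0 / fields_per_second


def displacement_cm(speed_kmh, interval_ms):
    """Distance an object travels between the two fields, in centimeters."""
    speed_m_per_s = speed_kmh / 3.6
    return speed_m_per_s * (interval_ms / 1000.0) * 100.0


if __name__ == "__main__":
    for system, fields in (("PAL", 50), ("NTSC", 60)):
        interval = field_interval_ms(fields)
        shift = displacement_cm(20.0, interval)
        print(f"{system}: {interval:.1f} ms between fields, "
              f"object at 20 km/h moves {shift:.1f} cm")
```

An object that moves several centimeters between the odd and even fields is drawn in two different positions on alternate lines, which is what appears as tearing in a freeze frame.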
3.5.2 Progressive scanning
With a progressive scan image sensor, values are obtained for each pixel on the sensor and each
line of image data is scanned sequentially, producing a full frame image. In other words, captured images are not split into separate fields as with interlaced scanning. With progressive
scan, an entire image frame is sent over a network and when displayed on a progressive scan
computer monitor, each line of an image is put on the screen one at a time in perfect order.
Moving objects are, therefore, better presented on computer screens using the progressive
scan technique. In a video surveillance application, it can be critical in viewing details of a moving subject (e.g., a person running away). Virtually all Axis network cameras use the progressive
scan technique.
Figure 3.5a At left, an interlaced scan image shown on a progressive (computer) monitor: a freeze frame on a moving dot,
in which the first field (odd lines) and the second field (even lines) are captured 17/20 ms (NTSC/PAL) apart. At right, a
freeze frame on the same moving dot using progressive scan, shown on a computer monitor.
Figure 3.5b At left, a full-sized JPEG image (704x576 pixels) from an analog camera using interlaced scanning.
At right, a full-sized JPEG image (640x480 pixels) from an Axis network camera using progressive scan technology.
Both cameras used the same type of lens and the speed of the car was the same at 20 km/h (about 12 mph). The background
is clear in both images. However, the driver is clearly visible only in the image using progressive scan technology.
3.6 Exposure control
As mentioned earlier, exposure time has an effect on images and users can change the settings
related to exposure in a number of ways. The most important ones—exposure priority, exposure
zones, dynamic range and backlight compensation—are explained in this section.
3.6.1 Exposure priority
Bright environments require shorter exposure time. Low-light conditions require longer exposure time so that the image sensor can receive more light and thereby, improve image quality.
However, increasing the exposure time also increases motion blur and lowers the total frame
rate since a longer time is required to expose each image frame.
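The trade-off can be put into rough numbers. The sketch below is a simplification that ignores sensor readout time, so the frame rate it returns is an upper bound rather than what a particular camera will deliver. The speed, exposure times and pixel density are example values, and the function names are hypothetical.

```python
def max_frame_rate_fps(exposure_time_s):
    """Upper bound on frame rate: each frame needs at least one exposure."""
    return 1.0 / exposure_time_s


def motion_blur_px(speed_m_per_s, exposure_time_s, pixels_per_meter):
    """Approximate smear of a moving object during one exposure, in pixels."""
    return speed_m_per_s * exposure_time_s * pixels_per_meter


if __name__ == "__main__":
    # Example: a person running at 5 m/s, imaged at 100 pixels per meter.
    for exposure in (1 / 500, 1 / 30, 1 / 6):
        fps = max_frame_rate_fps(exposure)
        blur = motion_blur_px(5.0, exposure, 100.0)
        print(f"exposure {exposure:.4f} s: at most {fps:.0f} fps, "
              f"blur about {blur:.1f} pixels")
```

The longest exposure in the example collects the most light but caps the frame rate at a few frames per second and smears the moving subject across many pixels.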
In low-light conditions, Axis network cameras enable users to prioritize video quality in terms
of either movement or low noise (graininess). When rapid movement must be captured or a high frame rate
is required, a shorter exposure time/fast shutter speed is recommended, but image quality may
be reduced.
When low noise is prioritized, the gain (amplification) should be kept as low as possible to improve image quality, but frame rate may be reduced as a result. Keep in mind that in dark
conditions, setting a low gain can result in a very dark image. A large gain value makes it possible to observe a dark scene, but with increased noise.
Figure 3.6a A camera’s web page with options for setting, among other things, exposure in low-light conditions.
3.6.2 Exposure zones
Besides dealing with limited areas of high illumination, a network camera’s automatic exposure
must also decide what area of an image should determine the exposure value. For instance, the
foreground (usually the bottom section of an image) may hold more important information than
the background; for example, the sky (usually the top section of an image). The less important
areas of a scene should not determine the overall exposure. In many Axis network cameras, the
user is able to use exposure zones to select the area of a scene—center, left, right, top or bottom—that should be more correctly exposed.
3.6.3 Dynamic range
Dynamic range, as it relates to light, is the ratio between the largest and smallest illumination
values. Many scenes have high dynamic range, with areas that are very bright and very dark.
This is a problem for standard cameras, which have limited dynamic range. In such scenes or in
backlight situations where a person is in front of a bright window, a typical camera will produce
an image where objects in the dark areas will hardly be visible. To increase a camera’s dynamic
range and enable objects in dark and light areas to be seen, various techniques can be applied.
Exposure can be controlled and tone mapping can be used to increase the gain in dark areas.
Figure 3.6b Above are two images of the same scene but the image on the right better handles the dynamic range
in the scene since details in both the bright and dark areas are visible.
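Since dynamic range is a ratio, it is often expressed in decibels or in stops (doublings of light). The sketch below shows the conversion using two illuminance values from Table 3.1a purely as an example of a bright and a dark area. It uses the 20·log10 convention that is common for image sensors, and the function names are hypothetical.

```python
import math


def dynamic_range_db(brightest_lux, darkest_lux):
    """Dynamic range of a scene in decibels (20 * log10 of the ratio)."""
    return 20.0 * math.log10(brightest_lux / darkest_lux)


def dynamic_range_stops(brightest_lux, darkest_lux):
    """Dynamic range of a scene in stops (number of doublings of light)."""
    return math.log2(brightest_lux / darkest_lux)


if __name__ == "__main__":
    # Example: strong sunlight outside a window and a poorly lit room inside.
    bright, dark = 100_000.0, 100.0
    print(f"ratio  {bright / dark:.0f}:1")
    print(f"dB     {dynamic_range_db(bright, dark):.1f}")
    print(f"stops  {dynamic_range_stops(bright, dark):.1f}")
```

A camera whose own dynamic range is smaller than the figure computed for the scene has to sacrifice detail in either the bright or the dark areas unless techniques such as those described above are applied.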
3.6.4 Backlight compensation
While a camera’s automatic exposure tries to get the brightness of an image to appear as the human eye would see a scene, it can be easily fooled. Strong backlight can cause objects in the
foreground to be dark. Network cameras with backlight compensation strive to ignore limited
areas of high illumination, just as if they were not present. This enables objects in the foreground to
be seen, although the bright areas will be overexposed.
3.7 Installing a network camera
Once a network camera has been purchased, the way it is installed is just as important. Below
are some recommendations on how to best achieve high-quality video surveillance based on
camera positioning and environmental considerations.
> Surveillance objective and camera positioning. If the aim is to get an overview of an area
to be able to track the movement of people or objects, make sure a camera that is suitable
for the task is placed in a position that achieves the objective.
If the intention is to be able to identify a person or object, the camera must be positioned or
focused in a way that will capture the level of detail needed for identification purposes.
Axis’ pixel counter functionality, which is available in most Axis cameras, can be used to
verify that the pixel resolution of an object fulfills regulatory or customer requirements, for
example, for facial identification.
If a surveillance scene benefits more from a vertically oriented view, installing a camera with
Axis’ Corridor Format will be advantageous.
Cameras with varifocal lenses also enable the field of view to be adjusted, so be sure to make
the necessary adjustments and refocus to optimize the view. Local police authorities may
also be able to provide guidelines on how best to position a camera. See Chapter 2 for more
information on features such as Corridor Format and pixel counter.
> Use lots of light or add light if needed. It is normally easy and cost-effective to add strong
lamps in both indoor and outdoor situations to provide the necessary light conditions for
capturing good images.
> Avoid pointing the camera toward the sun as it will “blind” the camera and can reduce the
performance of the image sensor. If possible, position the camera with the sun shining from
behind the camera.
> Avoid backlight. This problem typically occurs when attempting to capture an object in
front of a window. To avoid this problem, reposition the camera or use curtains and
close blinds if possible. If it is not possible to reposition the camera, add frontal lighting.
Cameras with support for wide dynamic range are better at handling a backlight scenario.
> Reduce the dynamic range of the scene. In outdoor environments, viewing too much sky
results in too high a dynamic range. If the camera does not support wide dynamic range,
a solution is to mount the camera high above the ground, using a pole if needed.
> Adjust camera settings. It may be necessary at times to adjust settings for white balance,
brightness and sharpness to obtain an optimal image. In low light situations, users must also
prioritize either frame rate or image quality.
Prior to mounting a camera, it is advisable to test it first. Where the distance between the
camera and the surveillance object, and the size of the object are known or can be approximated, setting the field of view on a varifocal lens and roughly focusing it can be done prior
to installing it. Once the camera is installed, aspects such as the field of view, focus and
other settings can be fine-tuned.
Figure 3.7a A battery-powered handheld display device, such as the AXIS T8414 Installation Display, can be helpful
at the installation site for fine-tuning a camera’s settings. The AXIS T8414 connects to and powers up the camera
and gives installers an easier alternative than the use of a laptop, which may be awkward to work with when
installing a camera while on a ladder or sky lift.
> Legal considerations. Video surveillance can be restricted or prohibited by laws that vary
from country to country. It is advisable to check the laws in the local region before installing
a video surveillance system. It may be necessary, for instance, to register or get a license for
video surveillance, particularly in public areas. Signage may be required. Video recordings
may require time and date stamping. There may be rules regulating how long video should
be retained. Audio recordings may or may not be permitted.
Video encoders
Video encoders enable an existing analog CCTV video surveillance system to be
integrated with a network video system. Video encoders play a significant role in
installations where many analog cameras are to be maintained. This chapter provides
an overview of video encoders and describes the different types of video encoders
that are available. A brief discussion of deinterlacing techniques is also included, in
addition to a section on video decoders.
What is a video encoder?
A video encoder makes it possible for an analog CCTV system to migrate to a network video
system. It enables users to gain the benefits of network video without having to discard existing
analog equipment such as analog CCTV cameras and coaxial cabling.
A video encoder connects to an analog video camera via a coaxial cable and converts analog
video signals into digital video streams that are then sent over a wired or wireless IP-based
network (e.g., LAN, WLAN or Internet). To view and/or record the digital video, computer monitors and PCs can be used instead of DVRs or VCRs and analog monitors.
Figure 4.1a An illustration of how analog video cameras and analog monitors can be integrated with a network
video system using video encoders and decoders.
By using video encoders, analog video cameras of all types, such as fixed, indoor/outdoor,
dome, pan/tilt/zoom, and specialty cameras such as microscope cameras can be remotely
accessed and controlled over an IP network.
A video encoder also offers other benefits such as event management and intelligent video
functionalities, as well as advanced security measures. It may also incorporate a memory card
slot for storing recordings locally. A video encoder also provides scalability and ease of integration with other security systems.
Figure 4.1b A four-channel, standalone video encoder with audio, I/O (input/output) ports for controlling
external devices such as sensors and alarms, serial ports (RS-422/RS-485) for controlling PTZ analog cameras, Ethernet connection with Power over Ethernet support and a memory card slot for local storage of recordings.
4.1.1 Video encoder components and considerations
Axis video encoders offer many of the same functions that are available in network cameras.
Some of the main components of a video encoder include:
> Analog video input for connecting an analog camera using a coaxial cable.
> Processor for running the video encoder’s operating system, for networking and security
functionalities, for encoding analog video using various compression formats and for video
analysis. The processor determines the performance of a video encoder, normally measured in
frames per second in the highest resolution. Advanced video encoders can provide full frame
rate (30 frames per second with NTSC-based analog cameras or 25 frames per second with
PAL-based analog cameras) in the highest resolution for every video channel. Axis video
encoders also have auto sensing to automatically recognize if the incoming analog video
signal is an NTSC or PAL standard. For more on NTSC and PAL resolutions, see Chapter 6.
> Memory for storing the firmware (computer program) using Flash, as well as buffering of
video sequences (using RAM).
> Memory card slot that enables recordings to be locally stored on a memory card.
> Ethernet/Power over Ethernet port to connect to an IP network for sending and receiving
data, and for powering the unit and the attached camera if Power over Ethernet is supported.
For more on Power over Ethernet, see Chapter 9.
> Serial port (RS-232/RS-422/RS-485) often used for controlling the pan/tilt/zoom
functionality of an analog PTZ camera.
> Input/output ports for connecting external devices; for example, sensors to detect an alarm
event, and relays to activate, for instance, lights in response to an event.
> Audio in for connecting a microphone or line-in equipment and audio out for connecting to speakers.
When selecting a video encoder, key considerations for professional systems are reliability and
quality. Other considerations include the number of supported analog channels, image quality,
compression formats, resolution, frame rate and features such as pan/tilt/zoom support, audio,
event management, intelligent video, Power over Ethernet and security functionalities.
Figure 4.1c IP66-rated protective enclosure for video encoders.
Meeting environmental requirements may also be a consideration if the video encoder must
withstand such conditions as vibration, shock and extreme temperatures. In such cases, a protective enclosure or a rugged video encoder should be considered.
4.1.2 Event management and intelligent video
One of the main benefits of Axis video encoders is the ability to provide event management and
intelligent video functionalities—capabilities that cannot be provided in an analog video system.
Built-in intelligent video features such as multi-window video motion detection, audio detection
and active tampering alarm, as well as input ports for external sensors, enable a network video
surveillance system to be constantly on guard to detect an event. Once an event is detected, the
system can automatically respond with actions that may include video recording, sending alerts
such as e-mails and SMS, activating lights, opening/closing doors and sounding alarms. For more
on event management and intelligent video, see Chapter 11.
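The event-to-action flow described above can be pictured as a simple rule table. The sketch below is only an illustration: the event names, action names and functions are hypothetical and do not correspond to any Axis API.

```python
# Hypothetical rule table mapping detected events to response actions
RULES = {
    "motion_detected": ["start_recording", "send_email"],
    "tampering": ["send_sms", "sound_alarm"],
    "door_sensor_triggered": ["start_recording", "activate_lights"],
}

def actions_for(event):
    """Return the actions configured for an event (empty list if none)."""
    return list(RULES.get(event, []))

def handle(event, log):
    """Record one log line per triggered action and return the actions."""
    triggered = actions_for(event)
    for action in triggered:
        log.append(f"{event} -> {action}")
    return triggered

log = []
handle("tampering", log)
handle("unknown_event", log)  # no rule configured, so nothing is triggered
print(log)
```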
Standalone video encoders
Figure 4.2a Standalone video encoders ranging from one channel up to 16 channels, including a rugged version.
The most common type of video encoder is the standalone version, which offers single- or multi-channel connections to analog cameras. A multi-channel video encoder is ideal in situations
where there are several analog cameras located in a remote facility or a place that is a fair
distance from a central monitoring room. Through the multi-channel video encoder, video
signals from the remote cameras can then share the same network cabling, thereby reducing
cabling costs.
In situations where investments have been made in analog cameras but coaxial cables have not
yet been installed, it is best to use and position standalone video encoders close to the analog
cameras. It reduces installation costs as it eliminates the need to run new coaxial cables to a
central location since the video can be sent over an Ethernet network. It also eliminates the loss
in image quality that would occur if video were to be sent over long distances through coaxial
cables. With coaxial cables, the video quality decreases the further the signals have to travel.
A video encoder produces digital images, so there is no reduction in image quality due to the
distance traveled by a digital video stream.
Figure 4.2b An illustration of how a single-channel video encoder can be positioned next to an analog
camera in a camera housing.
Rack-mounted video encoders
Rack-mounted video encoders are beneficial in instances where there are many analog
cameras with coaxial cables running to a dedicated control room. They enable many analog
cameras to be connected and managed from one rack in a central location. A rack allows a
number of different video encoder blades to be mounted and thereby offers a flexible and
expandable, high-density solution. A video encoder blade may support one, four or six analog
cameras. A blade can be seen as a video encoder without a casing, although it cannot function
on its own since it has to be mounted in a rack to operate.
Figure 4.3a Video encoder blades and racks supporting various numbers of analog cameras and features. When the
AXIS Q7900 Rack (far right) is fully outfitted with 6-channel video encoder blades, it can connect to as many as 84
analog cameras.
Axis video encoder racks support features such as hot swapping of blades; that is, blades can
be removed or installed without having to power down the rack. The racks also provide serial
communication and input/output ports for each video encoder blade, in addition to a common
power supply and shared Ethernet network connection(s).
Video encoders with analog PTZ cameras
In a network video system, pan/tilt/zoom commands from a control board are carried over the
same IP network as for video transmission and are forwarded to the analog PTZ camera through
the video encoder’s serial port (RS-232/RS-422/RS-485). Video encoders, therefore, enable analog PTZ cameras to be controlled over long distances, even through the Internet. (In an analog
CCTV system, each PTZ camera would require separate and dedicated serial wiring from the
control board—with joystick and other control buttons—all the way to the camera.)
To control a specific PTZ camera, a driver must be uploaded to the video encoder. Many manufacturers of video encoders provide PTZ drivers for most analog PTZ cameras. A PTZ driver can also
be installed on the PC that runs the video management software program if the video encoder’s
serial port is set up as a serial server that simply passes on the commands.
Figure 4.4a An analog PTZ dome camera can be controlled via the video encoder’s serial port (e.g., RS-485), making
it possible to remotely control it over an IP network.
The most commonly used serial port for controlling PTZ functions is RS-485. One benefit of
RS-485 is that multiple PTZ cameras can be controlled using twisted pair cables in a daisy
chain connection from one dome camera to the next. The maximum length of an
RS-485 cable, without using a repeater, is 1,200 m (4,000 ft.).
Deinterlacing techniques
Video from analog cameras is designed to be viewed on analog monitors such as traditional TV
sets, which use a technique called interlaced scanning. With interlaced scanning, two consecutive interlaced fields of lines are shown to form an image. When such video is shown on a
computer screen, which uses a different technique called progressive scanning, interlacing
effects (i.e., tearing or comb effect) from moving objects can be seen. In order to reduce the
unwanted interlacing effects, different deinterlacing techniques can be employed. In advanced
Axis video encoders, users can choose between two different deinterlacing techniques: adaptive interpolation and blending.
Figure 4.5a At left, a close-up of an interlaced image shown on a computer screen; at right, the same interlaced
image with deinterlacing technique applied.
Adaptive interpolation offers the best image quality. The technique involves using only one of
the two consecutive fields and using interpolation to create the other field of lines to form a
full image.
Blending involves merging two consecutive fields and displaying them as one image so that all
fields are present. The image is then filtered to smooth out the motion artifacts or ‘comb effect’
caused by the fact that the two fields were captured at slightly different times. The blending
technique is not as processor intensive as adaptive interpolation.
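The two approaches can be sketched on a tiny grayscale frame stored as a list of rows, where even rows belong to one field and odd rows to the other. This is a simplified illustration, not the algorithm used in Axis video encoders: the interpolation shown is a plain line average (the adaptive part is not modeled), and the blending filter is an assumed two-row average.

```python
def interpolate_field(frame):
    """Keep the even rows (one field) and rebuild odd rows from their neighbors."""
    height = len(frame)
    out = [list(row) for row in frame]
    for y in range(1, height, 2):
        above = frame[y - 1]
        # At the bottom edge there is no row below, so reuse the row above
        below = frame[y + 1] if y + 1 < height else frame[y - 1]
        out[y] = [(a + b) // 2 for a, b in zip(above, below)]
    return out

def blend_fields(frame):
    """Average each row with the row below it to soften the comb effect."""
    height = len(frame)
    out = []
    for y in range(height):
        nxt = frame[y + 1] if y + 1 < height else frame[y]
        out.append([(a + b) // 2 for a, b in zip(frame[y], nxt)])
    return out

# A bright object moved between the two field captures, so even and odd
# rows disagree; this disagreement is what appears as the comb effect
interlaced = [
    [0, 200, 200, 0],
    [0, 0, 200, 200],
    [0, 200, 200, 0],
    [0, 0, 200, 200],
]
print(interpolate_field(interlaced))
print(blend_fields(interlaced))
```

In the interpolated result the object has clean edges because only one field is used, while the blended result keeps information from both fields at the cost of softened, half-intensity edges.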
Video decoder
Axis’ video decoders enable digital or analog monitors to connect to and display live video from
Axis network cameras and video encoders. The video decoders can decode digital video and audio
coming from video encoders or network cameras into analog signals, which can then be used by
analog monitors, such as traditional TV sets, and video switches. The video decoders can also
provide high-quality, digital outputs on LCD screens. They are ideal for use with a public view
monitor, and in large and small surveillance systems. The video decoders have the ability to decode and display video from many cameras sequentially; that is, decoding and showing video
from one camera for some seconds before changing to another and so on. They also have autoconnect on alarm, which will automatically display alarm-triggered video.
In situations where only live video display is required, such as with a public view monitor at a store
entrance, a video decoder offers a more cost-effective solution than connecting a monitor to the
network via a PC. A video decoder can also complement a video management system by helping
to offload the main server from decoding digital streams simply for display purposes.
Another common application for a video decoder is to use it in an analog-to-digital-to-analog
configuration for transporting video over long distances. The quality of digital video is not affected by the distance traveled, which is not the case when sending analog signals over long
distances. The only downside may be some level of latency, from 100 ms to a few seconds, depending on the distance and the quality of the network between the end points.
Figure 4.6a A video encoder and video decoder can be used to transport video over long distances, from an analog
camera to an analog monitor.
Environmental protection
Surveillance cameras are often placed in environments that are very demanding.
Cameras, video encoders and certain accessories may require protection from rain, hot
and cold environments, dust, corrosive substances, vibrations and vandalism. Various
methods may be used to meet such environmental challenges.
The sections below cover such topics as environmental protection, external housings,
coverings, positioning of fixed cameras in enclosures, vandal and tampering protection, and types of mounting.
Protection and ratings
The main environmental threats to a network video product—particularly one that is installed
outdoors—are cold, heat, water, dust and snow. Today, many indoor and outdoor Axis network
video products are designed to meet environmental challenges—right out of the box—and do
not require separate housings. This results in a more compact camera/video encoder and an
easier installation process. For example, Axis cameras that are designed to operate in temperatures up to 75 °C (167 °F) are very compact, even with a built-in active cooling system.
Camera design can also help ensure reliability and preserve a camera’s service life, especially
under extreme operating conditions. For instance, some of Axis’ fixed and PTZ dome cameras
incorporate Arctic Temperature Control, which allows the cameras to start up in temperatures as
low as -40 °C/°F without causing extra wear and tear on the cameras. The control enables different elements in the camera unit to receive power at different times. Some Axis fixed domes
without Arctic Temperature Control can also start up at -40 °C/°F and send video immediately.
The level of protection provided by enclosures, whether built-in or separate from the network
video product, is often indicated by classifications set by such standards as IP, NEMA and IK
ratings. IP stands for Ingress Protection (also sometimes known as International Protection)
and is applicable worldwide. NEMA stands for National Electrical Manufacturers Association
and is applicable in the U.S. IK ratings pertain to external mechanical impacts and are applicable internationally.
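An IP code consists of two digits: the first describes protection against solid objects and dust, the second against water. The sketch below decodes a code such as IP66 into short descriptions. The wording of the descriptions is an abbreviated paraphrase of the levels defined in IEC 60529, and the function name is hypothetical. Codes containing letters (for example, IPX4) are not handled.

```python
# First digit: protection against solid objects (paraphrased, abbreviated)
SOLIDS = {
    0: "no protection",
    1: "objects larger than 50 mm",
    2: "objects larger than 12.5 mm",
    3: "objects larger than 2.5 mm",
    4: "objects larger than 1 mm",
    5: "dust-protected",
    6: "dust-tight",
}

# Second digit: protection against water (paraphrased, abbreviated)
LIQUIDS = {
    0: "no protection",
    1: "vertically dripping water",
    2: "dripping water when tilted up to 15 degrees",
    3: "spraying water",
    4: "splashing water",
    5: "water jets",
    6: "powerful water jets",
    7: "temporary immersion",
    8: "continuous immersion",
}

def decode_ip(code):
    """Return (solid_protection, liquid_protection) for a code like 'IP66'."""
    code = code.strip().upper()
    if not (code.startswith("IP") and len(code) == 4 and code[2:].isdigit()):
        raise ValueError(f"unsupported IP code: {code!r}")
    solid, liquid = int(code[2]), int(code[3])
    if solid not in SOLIDS or liquid not in LIQUIDS:
        raise ValueError(f"unknown digit in IP code: {code!r}")
    return SOLIDS[solid], LIQUIDS[liquid]

for rating in ("IP42", "IP51", "IP66"):
    print(rating, decode_ip(rating))
```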
Figure 5.1a From left, a rugged camera designed to meet the special environment of a bus, an outdoor-ready fixed
dome, an outdoor fixed camera with Arctic Temperature Control, a PTZ dome with built-in active cooling, as well as
a rugged video encoder.
The most common environmental ratings for Axis’ indoor products are IP42, IP51 and IP52,
which provide resistance against dust and humidity/dripping water. Axis’ outdoor products usually have IP66 and NEMA 4X ratings. IP66 ensures protection against dust, rain and powerful
water jets. NEMA 4X ensures protection not only against dust, rain, and hose-directed water, but
also snow, corrosion and damage from the external build-up of ice. Some Axis cameras that are
designed for extreme environments also meet the U.S. military’s MIL-STD-810G standard for
high temperature, temperature shock, radiation, salt fog and sand. For vandal-resistant products, IK08 and IK10 are the most common ratings for resistance against impact. More on IP
ratings can be found here:
In situations where cameras may be exposed to acids, such as in the food industry, housings
made of stainless steel are required. Special enclosures may also be required for aesthetic considerations. Some specialized housings can be pressurized, submersible and bulletproofed. When
a camera is to be installed in a potentially explosive environment, other standards—such as
IECEx, which is a global certification, and ATEX, a European certification—come into play.
External housings
In instances where the demands of the environment are beyond a network video product’s operating conditions, external enclosures are required. Housings come in different sizes and qualities
and with different features.
There may be camera housings with heaters and fans (blowers) to accommodate changing temperatures. Some housings also have peripherals such as antennas for wireless applications. An
external antenna is only required if the housing is made of metal. A wireless camera inside a
plastic housing will work without the use of an external antenna.
In outdoor installations, special enclosures may also be required for video encoders and accessories such as I/O audio modules and video decoders. Critical system equipment such as power
supply, midspan and switch may also require protection from weather and vandalism.
Housings are made of either metal or plastic. When selecting an enclosure, several things need to
be considered, including:
> Easy access to the network video product
> Mounting brackets
> Clear or smoked dome cover (for dome camera housings)
> Cable management
> Temperature and other ratings (consider the need for heater, fan and sunshield)
> Power supply (12 V, 24 V, 110 V, 230 V, PoE etc.)
> Level of vandal resistance
Figure 5.2a Outdoor-ready, vandal-resistant cabinets for protecting such equipment as power supply and switches,
as well as providing a place to mount the Axis cameras. At far right, an outdoor-ready enclosure for video encoders,
I/O audio modules and video decoders.
Transparent coverings
The “window” or transparent covering of an enclosure is usually made of acrylic (PMMA) or
polycarbonate plastic. As windows act like optical lenses, they should be of high quality to
minimize their effect on image quality. When there are built-in imperfections in the clear material, clarity is compromised.
Higher demands are placed on the windows of housings for PTZ cameras. Not only do the windows have to be specially shaped in the form of a bubble, but they must also have high clarity
since imperfections such as dirt particles can be magnified, particularly when cameras with
high resolution and zoom factors are installed. In addition, if the thickness of the window is
uneven, a straight line may appear curved in the resulting image. A high-quality dome cover
should have very little impact on image quality, irrespective of the camera’s zoom level and lens
The thickness of a dome cover can be increased to withstand heavy blows, but the thicker a
covering is, the higher the chances of imperfections. Increased thickness may also create
unwanted reflections and refraction of light. Therefore, thicker coverings should meet higher
requirements if the effect on image quality is to be minimized.
A variety of dome coverings are available, including clear and smoked versions. While smoked
versions enable a more discreet installation, they also act much like sunglasses do in reducing
the amount of light available to the camera. They will, therefore, have an effect on the camera’s
light sensitivity.
Positioning a fixed camera in a housing
When installing a fixed camera in an enclosure, it is important that the lens of the camera is
positioned right up against the window to prevent any glare. Otherwise, reflections from the
camera and the background will appear in the image. To reduce reflection, special coatings can
be applied on any glass used in front of the lens. Today, Axis’ outdoor fixed cameras are delivered pre-mounted in an outdoor housing, which saves on installation time and prevents errors.
Figure 5.4a When installing a camera behind a glass, correct positioning of the camera becomes important to avoid reflections.
Vandal and tampering protection
In some surveillance applications, cameras are at risk of hostile and violent attacks. While a
camera or housing can never guarantee 100% protection from destructive behavior in every
situation, vandalism can be mitigated by considering various aspects: camera/housing design,
mounting, placement and use of intelligent video functionalities.
5.5.1 Vandal-resistant ratings
Vandal or impact resistance can be indicated by the IK rating on a camera or housing. IK ratings
specify the degree of protection that enclosures of electrical equipment can provide against
external mechanical impacts. For example, an IK10 rating means the product can withstand 20
joules of impact, which is equivalent to a drop of a 5-kg object from a height of 40 cm.
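The equivalence quoted above follows from the potential energy of a falling object, E = m × g × h. A minimal check, assuming standard gravity of 9.81 m/s² and ignoring air resistance:

```python
G = 9.81  # standard gravity in m/s^2 (approximation)

def impact_energy_joules(mass_kg, drop_height_m):
    """Potential energy released when a mass falls from the given height."""
    return mass_kg * G * drop_height_m

# The IK10 example from the text: a 5 kg object dropped from 40 cm
energy = impact_energy_joules(5, 0.40)
print(f"{energy:.1f} J")
```

The result is just under 20 joules, consistent with the figure given for IK10.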
5.5.2 Camera/housing design
The shape of the housing or camera is an important factor. A housing or a traditional fixed camera
that protrudes from a wall or ceiling is more vulnerable to attacks (e.g., kicking or hitting) than
more discreetly designed housings or casings for a fixed dome or PTZ camera. The smooth,
rounded covering of a fixed dome or a ceiling-mounted PTZ dome makes it more difficult, for
example, to block the camera’s view by trying to hang a piece of clothing over the camera. The
more a housing or camera blends into an environment or is disguised as something other than a
camera—for example, an outdoor light—the better the protection against vandalism.
Figure 5.5a Examples of vandal-resistant cameras and housings
5.5.3 Mounting
The way cameras and housings are mounted is also important. As mentioned earlier, a traditional fixed network camera or a PTZ camera whose mount protrudes from a wall or ceiling is
more vulnerable to attacks. How the cabling to a camera is mounted is also an important consideration. Maximum protection is provided when the cable is pulled directly through the wall
or ceiling behind the camera. In this way, there are no visible cables to tamper with. If this is not
possible, a conduit should be used to protect cables from attacks.
5.5.4 Camera placement
Camera placement is also an important factor in deterring vandalism. By placing a camera out
of reach on high walls or in the ceiling, many spur-of-the-moment attacks can be prevented.
The downside may be the angle of view, which to some extent can be compensated by selecting
a different lens.
5.5.5 Intelligent video
Axis’ active tampering alarm feature helps protect cameras against vandalism. It can detect if a
camera has been redirected, obscured or tampered with, and can send alarms to operators. This is
especially useful in installations with hundreds of cameras in demanding environments where
keeping track of the proper functioning of all cameras is difficult. It is also useful in situations
where no live viewing takes place and operators can be notified when cameras have been
tampered with.
Types of mounting
Cameras need to be placed in all kinds of locations and this requires a large number of variations
in the type of mounting.
Figure 5.6a Examples of mounting accessories
5.6.1 Ceiling mounts
Ceiling mounts are mainly used in indoor installations. The enclosure itself can be:
> A surface mount: mounted directly on the surface of a ceiling and, therefore, completely visible
> A drop-ceiling mount: mounted inside the ceiling with only parts of a camera and
housing (usually the clear dome cover) visible
> A pendant mount: hung from a ceiling like a pendant
5.6.2 Wall mounts
Wall mounts are often used to mount cameras inside or outside a building. The housing is
connected to an arm, which is mounted on a wall. Advanced mounts have an inside cable gland
to protect the cable. To install an enclosure at a corner of a building, a normal wall mount,
together with an additional corner adapter, can be used.
5.6.3 Pole mounts
A pole mount is often used together with a PTZ camera in locations such as a parking lot. This
type of mount usually takes into consideration the impact of wind. The dimensions of the pole
and the mount itself should be designed to minimize vibrations. Cables are often enclosed inside
the pole and outlets must be properly sealed. Some PTZ cameras have built-in electronic image
stabilization to limit the effects of wind and vibrations.
5.6.4 Parapet mounts
Parapet mounts are used for roof-mounted housings or to raise the camera for a better angle of view.
Axis provides an online tool that can help users identify the right housing and mounting accessories needed. Visit
Video resolutions
Video resolution in an analog or digital world is similar, but there are some important
differences in how it is defined. In analog video, an image consists of lines or TV-lines
since analog video technology is derived from the television industry. In a digital system, an image is made up of square pixels.
The sections below describe the different resolutions that network video can provide.
They include NTSC, PAL, VGA, megapixel and HDTV.
NTSC and PAL resolutions
NTSC (National Television System Committee) and PAL (Phase Alternating Line) resolutions are
analog video standards. They are relevant to network video since video encoders provide such
resolutions when they digitize signals from analog cameras. Older Axis PTZ network cameras
also provide NTSC and PAL resolutions since such cameras include an NTSC/PAL-compatible
camera block (which incorporates the camera sensor with integrated lens that enables zoom,
autofocus and auto-iris functions) made for analog video cameras, in conjunction with a built-in video encoder board.
Both NTSC and PAL standards originate from the television industry. NTSC has a resolution of
480 scan lines and uses a refresh rate of 60 interlaced fields per second (or 30 full frames per
second). The naming convention for this standard is 480i60, which defines the number of lines,
type of scan (“i” stands for interlaced scanning) and refresh rate. PAL has a resolution with 576
scan lines and uses a refresh rate of 50 interlaced fields per second (or 25 full frames per second). The naming convention for this standard is 576i50. The total amount of information per
second is the same in both standards.
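The claim that both standards carry the same amount of information per second can be checked by counting scan lines per second. The sketch below uses only the line counts and field rates quoted above. The dictionary layout and function name are illustrative.

```python
STANDARDS = {
    "NTSC (480i60)": {"lines": 480, "fields_per_second": 60},
    "PAL (576i50)": {"lines": 576, "fields_per_second": 50},
}

def lines_per_second(lines, fields_per_second):
    """Scan lines delivered per second; two interlaced fields make one frame."""
    frames_per_second = fields_per_second // 2
    return lines * frames_per_second

for name, spec in STANDARDS.items():
    print(name, lines_per_second(**spec), "lines per second")
```

Both standards come out at 14,400 scan lines per second: NTSC trades fewer lines per frame for more frames per second, and PAL does the opposite.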
When analog video is digitized, the maximum number of pixels that can be created is based on
the number of TV lines available to be digitized. The maximum size of a digitized image is typically D1 and the most commonly used resolution is 4CIF.
When shown on a computer screen, digitized analog video may show interlacing effects such
as tearing and shapes may be off slightly since the pixels generated may not conform to the
square pixels on the computer screen. Interlacing effects can be reduced using deinterlacing
techniques (see Chapter 4.5). Correction for the aspect ratio (the ratio of the width of an image
to its height) can be applied to video before it is displayed to ensure, for instance, that a circle
in an analog video remains a circle when shown on a computer screen.
Resolution   NTSC        PAL
D1           720 x 480   720 x 576
4CIF         704 x 480   704 x 576
2CIF         704 x 240   704 x 288
CIF          352 x 240   352 x 288
QCIF         176 x 120   176 x 144
Figure 6.1a Different NTSC and PAL image resolutions.
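The aspect ratio correction mentioned above can be sketched as a horizontal rescale. The example below assumes that the full digitized D1 image represents a 4:3 picture, which is a simplification, and computes the width the image needs on a square-pixel display. The function name is hypothetical.

```python
from fractions import Fraction

def square_pixel_size(width, height, display_aspect=Fraction(4, 3)):
    """Return (new_width, height, horizontal_scale) for a square-pixel display."""
    new_width = round(height * display_aspect)
    horizontal_scale = Fraction(new_width, width)
    return new_width, height, horizontal_scale

for label, (w, h) in {"D1 NTSC": (720, 480), "D1 PAL": (720, 576)}.items():
    new_w, new_h, scale = square_pixel_size(w, h)
    print(f"{label}: {w} x {h} -> {new_w} x {new_h} (scale {float(scale):.3f})")
```

Under this assumption an NTSC image is squeezed horizontally and a PAL image is stretched, so that a circle in the analog video remains a circle on screen.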
VGA resolutions
With 100% digital systems based on network cameras, resolutions that are derived from the computer industry and that are standardized worldwide can be provided, allowing for better flexibility.
The limitations of NTSC and PAL become irrelevant. VGA (Video Graphics Array) is a graphics
display system for PCs originally developed by IBM. The resolution is defined as 640x480 pixels.
Axis cameras today offer resolutions greater than that. They include SVGA (Super VGA), which is
800x600 pixels, and HDTV and multi-megapixel resolutions, which are explained further in the
following sections.
Format              Resolution
4CIF                704 x 480
VGA                 640 x 480
SVGA                800 x 600
HDTV 720p           1280 x 720
1 MP                1280 x 800
2 MP / HDTV 1080    1920 x 1080
~2 MP               1600 x 1200
3 MP                2048 x 1536
5 MP                2592 x 1944
Figure 6.2a Common resolutions in Axis products.
Megapixel resolutions
A network camera that offers megapixel resolution uses a megapixel sensor to deliver an image
that contains one million or more pixels. The more pixels a sensor has, the greater the potential
it has for capturing finer details and for producing a higher quality image. Megapixel network
cameras can be used to allow users to see more details (ideal for identification of people and
objects) or to view a larger area of a scene. This benefit is an important consideration in video
surveillance applications.
Megapixel resolution is one area in which network cameras excel over analog cameras. The
maximum resolution a conventional analog camera can provide after the video signal has been
digitized in a digital video recorder or a video encoder is D1, which is 720x480 pixels (NTSC) or
720x576 pixels (PAL). The D1 resolution corresponds to a maximum of 414,720 pixels or 0.4
megapixel. By comparison, a common megapixel format of 1280x1024 pixels gives a 1.3-megapixel resolution. This is more than 3 times the resolution that can be provided by analog CCTV cameras.
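The pixel counts quoted above can be checked with a few lines of arithmetic. This is only a sketch; the function names are chosen for illustration.

```python
def pixel_count(width, height):
    """Total number of pixels in an image of the given size."""
    return width * height


def megapixels(width, height):
    """Pixel count expressed in millions of pixels."""
    return pixel_count(width, height) / 1_000_000


d1_pal = pixel_count(720, 576)        # maximum digitized analog resolution (PAL)
megapixel = pixel_count(1280, 1024)   # a common megapixel format
ratio = megapixel / d1_pal            # how many times more pixels
```

Here `d1_pal` is 414,720 pixels (about 0.4 megapixel), `megapixel` is 1,310,720 pixels (about 1.3 megapixels), and `ratio` is a little over 3.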
Megapixel resolution also provides a greater degree of flexibility in terms of being able to provide images with different aspect ratios. A conventional TV monitor displays an image with an
aspect ratio of 4:3. Axis megapixel network cameras can offer the same ratio, as well as others,
such as 16:9. The advantage of a 16:9 aspect ratio is that unimportant details, usually located
in the upper and lower part of a conventional-sized image, are not present and therefore, bandwidth and storage requirements can be reduced.
Figure 6.3a Illustration of 4:3 and 16:9 aspect ratios.
High-definition television (HDTV) resolutions
The video industry has embraced HDTV formats and today, HDTV is prevalent. HDTV provides
up to five times higher resolution than standard analog TV. HDTV also has better color fidelity
(i.e., how true colors are to reality) and a 16:9 format. Defined by SMPTE (Society of Motion
Picture and Television Engineers), the two most important HDTV standards are SMPTE 296M
and SMPTE 274M.
SMPTE 296M (HDTV 720p) defines a resolution of 1280x720 pixels with high color fidelity in a
16:9 format using progressive scanning at 25/30 hertz (Hz), which corresponds to 25 or 30
frames per second depending on the country, and at 50/60 Hz (50/60 frames per second).
Countries using 25/50 Hz frequencies include those in Europe, many in Asia and Africa, Australia, and some in South America such as Argentina. Countries using 30/60 Hz include those in
North and Central America, as well as South Korea, Brazil and Saudi Arabia. Some countries like
Japan use 25/50 Hz and 30/60 Hz.
SMPTE 274M (HDTV 1080) defines a resolution of 1920x1080 pixels with high color fidelity in
a 16:9 format using either interlaced (represented by an “i” as in HDTV 1080i) or progressive
scanning (represented by a “p” as in HDTV 1080p) at 25/30 Hz and 50/60 Hz.
Compliance with the SMPTE standards indicates adherence to HDTV quality, so a compliant camera
should provide all the benefits of HDTV in resolution, color fidelity and frame rate.
The HDTV standard is based on square pixels—similar to computer screens, so HDTV video from
network video products can be shown on either HDTV screens or standard computer monitors.
With progressive scan HDTV video, no conversion or deinterlacing technique needs to be applied when the video is to be processed by a computer or displayed on a computer screen.
Video compression
Video compression technologies are about reducing and removing redundant video
data so that a digital video file can be effectively sent over a network and stored on
computer disks. With efficient compression techniques, a significant reduction in file
size can be achieved with little or no adverse effect on the video quality. The quality,
however, can be affected if the file size is further lowered by raising the compression
level for a given compression technique.
Different compression technologies, both proprietary and industry standards, are available. Most network video vendors today use standard compression techniques. Standards are important in ensuring compatibility and interoperability. They are particularly
relevant to video compression since video may be used for different purposes and, in
some video surveillance applications, needs to be viewable many years from the recording date. By deploying standards, end users are able to pick and choose from different
vendors, rather than be tied to one supplier when designing a video surveillance system.
Axis uses mostly two video compression standards: H.264 and Motion JPEG. H.264 is
the latest and most efficient video compression standard. The use of MPEG-4 Part 2
(or simply referred to as MPEG-4) is being phased out. This chapter covers the basics of
compression and provides a description of the compression standards mentioned earlier.
Compression basics
7.1.1 Video codec
The process of compression involves applying an algorithm to the source video to create a
compressed file that is ready for transmission or storage. To play the compressed file, an inverse
algorithm called decompression is applied to produce a video that shows virtually the same
content as the original source video. The time it takes to compress, send, decompress and display
a file is called latency. The more advanced the compression algorithm, the higher the latency.
A pair of algorithms that works together is called a video codec (encoder/decoder). Video
codecs of different standards are normally not compatible with each other; that is, video content that is compressed using one standard cannot be decompressed with a different standard.
For instance, an MPEG-4 Part 2 decoder will not work with an H.264 encoder. This is simply
because one algorithm cannot correctly decode the output from another algorithm. It is,
however, possible to implement many different algorithms in the same software or hardware,
which would then enable multiple formats to coexist.
Image compression vs. video compression
Different compression standards utilize different methods of reducing data, and hence, results
differ in bit rate, quality and latency. Compression algorithms fall into two types: image compression and video compression.
Image compression uses intraframe coding technology. Data is reduced within an image frame
simply by removing unnecessary information that may not be noticeable to the human eye.
Motion JPEG is an example of such a compression standard. Images in a Motion JPEG sequence
are coded or compressed as individual JPEG images.
Figure 7.1a With the Motion JPEG format, the three images in the above sequence are coded and sent as separate
unique images (I-frames) with no dependencies on each other.
Video compression algorithms such as H.264 and MPEG-4 use interframe prediction to reduce
video data between a series of frames. This involves techniques such as difference coding,
where one frame is compared with a reference frame and only pixels that have changed with
respect to the reference frame are coded. In this way, the number of pixel values that is coded
and sent is reduced. When such an encoded sequence is displayed, the images appear as in the
original video sequence.
Figure 7.1b With difference coding, only the first image (I-frame) is coded in its entirety. In the two following images (P-frames), references are made to the first picture for the static elements, i.e., the house. Only the moving parts,
i.e., the running man, are coded using motion vectors, thus reducing the amount of information that is sent and stored.
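The idea behind difference coding can be sketched in a few lines. This is a toy model, not how H.264 or MPEG-4 encoders are implemented: it treats a frame as a flat list of pixel values and stores only the pixels that differ from the reference frame. All names are hypothetical.

```python
def encode_difference(reference, frame):
    """Return (index, value) pairs for pixels that differ from the reference."""
    return [
        (index, value)
        for index, (ref_value, value) in enumerate(zip(reference, frame))
        if ref_value != value
    ]


def decode_difference(reference, changes):
    """Rebuild a frame from the reference frame and the coded changes."""
    frame = list(reference)
    for index, value in changes:
        frame[index] = value
    return frame


# A static background with one changed pixel: only that pixel is coded.
reference_frame = [10, 10, 10, 10, 10, 10]
next_frame = [10, 10, 99, 10, 10, 10]
changes = encode_difference(reference_frame, next_frame)
restored = decode_difference(reference_frame, changes)
```

In this example only one of six pixel values needs to be coded, and the decoder restores the frame exactly from the reference frame plus the changes.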
Other techniques such as block-based motion compensation can be applied to further reduce
the data. Block-based motion compensation takes into account that much of what makes up a
new frame in a video sequence can be found in an earlier frame, but perhaps in a different location. This technique divides a frame into a series of macroblocks (blocks of pixels). Block by
block, a new frame can be composed or ‘predicted’ by looking for a matching block in a reference frame. If a match is found, the encoder codes the position where the matching block is to
be found in the reference frame. Coding the motion vector, as it is called, takes up fewer bits
than if the actual content of a block were to be coded.
Figure 7.1c Illustration of block-based motion compensation (labels: search window, matching block, motion vector, earlier reference frame, target block).
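The block search described above can be sketched as an exhaustive search that compares a target block with every candidate block inside a search window in the reference frame. The matching criterion used here, the sum of absolute differences (SAD), is an assumption made for illustration; real encoders use a variety of faster search strategies and cost measures.

```python
def sad(reference, target, rx, ry, tx, ty, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(reference[ry + dy][rx + dx] - target[ty + dy][tx + dx])
    return total


def find_motion_vector(reference, target, tx, ty, size, search):
    """Search the reference frame around (tx, ty) for the best matching block.

    Returns (dx, dy, cost), where (dx, dy) points from the target block
    position to the matching block in the reference frame.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for ry in range(max(0, ty - search), min(height - size, ty + search) + 1):
        for rx in range(max(0, tx - search), min(width - size, tx + search) + 1):
            cost = sad(reference, target, rx, ry, tx, ty, size)
            if best is None or cost < best[2]:
                best = (rx - tx, ry - ty, cost)
    return best


def blank_frame(width, height):
    return [[0] * width for _ in range(height)]


# A 2 x 2 object sits at (x=1, y=1) in the reference frame and has moved
# to (x=3, y=2) in the new frame.
reference_frame = blank_frame(8, 8)
new_frame = blank_frame(8, 8)
for dy, row in enumerate([[5, 6], [7, 8]]):
    for dx, value in enumerate(row):
        reference_frame[1 + dy][1 + dx] = value
        new_frame[2 + dy][3 + dx] = value

vector = find_motion_vector(reference_frame, new_frame, tx=3, ty=2, size=2, search=3)
```

For this example the search finds a perfect match two pixels to the left and one pixel up, so the encoder would only need to code the motion vector instead of the block content.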
With interframe prediction, each frame in a sequence of images is classified as a certain type
of frame, such as an I-frame, P-frame or B-frame.
An I-frame, or intra frame, is a self-contained frame that can be independently decoded without any reference to other images. The first image in a video sequence is always an I-frame.
I-frames are needed as starting points for new viewers or resynchronization points if the transmitted bit stream is damaged. I-frames can be used to implement fast-forward, rewind and
other random access functions. An encoder will automatically insert I-frames at regular intervals or on demand if new clients are expected to join in viewing a stream. The drawback of
I-frames is that they consume many more bits, but on the other hand, they do not generate
many artifacts, which are caused by missing data.
A P-frame, which stands for predictive inter frame, makes references to parts of earlier I and/
or P frame(s) to code the frame. P-frames usually require fewer bits than I-frames, but a drawback is that they are very sensitive to transmission errors because of the complex dependency
on earlier P and/or I frames.
A B-frame, or bi-predictive inter frame, is a frame that makes references to both an earlier
reference frame and a future frame. Using B-frames increases latency.
Figure 7.1d A typical sequence with I-, B- and P-frames. A P-frame may only reference preceding I- or P-frames,
while a B-frame may reference both preceding and succeeding I- or P-frames.
When a video decoder restores a video by decoding the bit stream frame by frame, decoding
must always start with an I-frame. P-frames and B-frames, if used, must be decoded together
with the reference frame(s).
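The dependency of P-frames on earlier frames can be sketched as follows. The model is deliberately simplified: it assumes a stream of I- and P-frames only, where each P-frame references the frame immediately before it, and it reports which frames a decoder can still restore when some frames are lost. The function name is hypothetical.

```python
def decodable_frames(frame_types, lost):
    """Return a list of booleans, one per frame, telling whether it can be decoded.

    frame_types: string such as "IPPPIPPP" (I- and P-frames only).
    lost: set of frame indexes that never arrived.
    Assumption: each P-frame depends on the frame directly before it.
    """
    result = []
    chain_ok = False  # nothing can be decoded before the first I-frame
    for index, frame_type in enumerate(frame_types):
        received = index not in lost
        if frame_type == "I":
            chain_ok = received          # an I-frame restarts the chain
        elif frame_type == "P":
            chain_ok = chain_ok and received
        else:
            raise ValueError(f"unsupported frame type: {frame_type}")
        result.append(chain_ok)
    return result


status = decodable_frames("IPPPIPPP", lost={2})
```

In this example, losing the third frame also makes the fourth frame undecodable, and decoding recovers at the next I-frame.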
Axis network video products allow users to set the GOV (group of video) length, which determines how many P-frames should be sent before another I-frame is sent. By decreasing the
frequency of I-frames (having longer GOV), the bit rate can be reduced. However, if there is
congestion on the network, the video quality may decline.
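The effect of the GOV length on bit rate can be estimated with simple arithmetic. The sketch below assumes that the GOV length is the total number of frames in a group (one I-frame followed by P-frames) and uses made-up frame sizes. Actual sizes depend on the scene, resolution and compression level.

```python
def average_bit_rate(i_frame_bits, p_frame_bits, gov_length, fps):
    """Average bit rate in bits per second for a repeating GOV.

    Assumption: each GOV holds one I-frame and (gov_length - 1) P-frames.
    """
    if gov_length < 1:
        raise ValueError("gov_length must be at least 1")
    bits_per_gov = i_frame_bits + (gov_length - 1) * p_frame_bits
    return bits_per_gov * fps / gov_length


# Hypothetical frame sizes: 400 kbit per I-frame, 50 kbit per P-frame.
only_i_frames = average_bit_rate(400_000, 50_000, gov_length=1, fps=30)
short_gov = average_bit_rate(400_000, 50_000, gov_length=8, fps=30)
long_gov = average_bit_rate(400_000, 50_000, gov_length=30, fps=30)
```

With these example numbers the bit rate falls from 12 Mbit/s with I-frames only to 1.85 Mbit/s with a GOV length of 30. The trade-off is that a lost frame affects the video for longer before the next I-frame arrives.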
Besides difference coding and motion compensation, other advanced methods can be employed
to further reduce data and improve video quality. H.264, for example, supports advanced techniques
that include prediction schemes for encoding I-frames, improved motion compensation
down to sub-pixel accuracy, and an in-loop deblocking filter to smooth block edges (artifacts).
For more information on H.264 techniques, see Axis’ white paper on H.264 at
Compression formats
7.2.1 Motion JPEG
Motion JPEG or M-JPEG is a digital video sequence that is made up of a series of individual JPEG
images. (JPEG stands for Joint Photographic Experts Group.) When 16 image frames or more
are shown per second, the viewer perceives motion video. Full motion video is perceived at 25
(50 Hz) or 30 (60 Hz) frames per second.
One of the advantages of Motion JPEG is that each image in a video sequence can have the
same guaranteed quality that is determined by the compression level chosen for the network
camera or video encoder. The higher the compression level, the lower the file size and image
quality. In some situations, such as in low light or when a scene becomes complex, the image
file size may become quite large and use more bandwidth and storage space. To prevent an
increase in the bandwidth and storage used, Axis network video products allow the user to set
a maximum file size for an image frame.
Since there is no dependency between the frames in Motion JPEG, a Motion JPEG video is
robust, meaning that if one frame is dropped during transmission, the rest of the video will not
be affected.
Motion JPEG is an unlicensed standard. It has broad compatibility and may be needed when
integrating with systems that support only Motion JPEG. It is also popular in applications where
individual frames in a video sequence are required—for example, for analysis—and where lower
frame rates, typically 5 frames per second or lower, are used.
The main disadvantage of Motion JPEG is that it makes no use of any video compression techniques to reduce the data since it is a series of still, complete images. The result is that it has a
relatively high bit rate or low compression ratio for the delivered quality compared with video
compression standards such as H.264 and MPEG-4.
7.2.2 MPEG-4
When MPEG-4 is mentioned in video surveillance applications, it is usually referring to MPEG-4
Part 2, also known as MPEG-4 Visual. Like all MPEG (Moving Picture Experts Group) standards,
it is a licensed standard, so users must pay a license fee per monitoring station. MPEG-4 has,
in most applications, been replaced by the more efficient H.264 compression.
7.2.3 H.264 or MPEG-4 Part 10/AVC
H.264, also known as MPEG-4 Part 10/AVC for Advanced Video Coding, is the latest MPEG
standard for video encoding and is the current video standard of choice. This is because an
H.264 encoder can, without compromising image quality, reduce the size of a digital video file
by more than 80% compared with the Motion JPEG format and as much as 50% more than
with the MPEG-4 Part 2 standard. This means that much less network bandwidth and storage
space are required for a video file. Or seen another way, much higher video quality can be
achieved for a given bit rate.
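To get a feel for what an 80% reduction means for storage, the following sketch compares one day of recording at a hypothetical Motion JPEG bit rate with the same video at 20% of that bit rate. The 10 Mbit/s figure is an assumption for illustration, not a measured value.

```python
import math


def reduced_bit_rate(bit_rate_bps, reduction):
    """Bit rate after applying a fractional reduction (0.80 means 80% less)."""
    return bit_rate_bps * (1.0 - reduction)


def storage_bytes(bit_rate_bps, seconds):
    """Storage needed for a stream of the given bit rate and duration."""
    return bit_rate_bps * seconds / 8


SECONDS_PER_DAY = 24 * 60 * 60
MJPEG_BPS = 10_000_000  # hypothetical Motion JPEG stream

h264_bps = reduced_bit_rate(MJPEG_BPS, 0.80)
mjpeg_gb_per_day = storage_bytes(MJPEG_BPS, SECONDS_PER_DAY) / 1e9
h264_gb_per_day = storage_bytes(h264_bps, SECONDS_PER_DAY) / 1e9
```

Under these assumptions, a day of video takes about 108 GB as Motion JPEG and about 21.6 GB as H.264.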
H.264 was jointly defined by standardization organizations in the telecommunications (ITU-T’s
Video Coding Experts Group) and IT industries (ISO/IEC Moving Picture Experts Group). It is the
most widely adopted standard.
H.264 helps accelerate the adoption of megapixel/HDTV cameras since the highly efficient compression technology can reduce the large file sizes and bit rates generated without compromising
image quality. There are tradeoffs, however. While H.264 provides savings in network bandwidth
and storage costs, it requires higher performance network cameras and monitoring stations.
The Baseline Profile for H.264 uses only I- and P-frames, while the Main Profile may also use
B-frames in addition to I- and P-frames. Axis’ network video products use the H.264 Baseline
or Main Profile. The Baseline Profile allows network video products to have low latency. In
video products with more powerful processors, Axis uses the Main Profile without B-frames to
enable higher compression and at the same time low latency and maintained video quality.
Using Axis’ Main Profile H.264 compression, VGA-sized video streams are reduced by 10% to
15% and HDTV-sized video streams are reduced by 15% to 20%, compared with Axis’ Baseline
Profile H.264 compression.
Figure 7.2a H.264 profile comparison by bit rate. While maintaining the same quality, Axis’ Main Profile H.264 compression generates fewer bits per
second than its Baseline Profile H.264 compression.
Variable and constant bit rates
With MPEG-4 and H.264, users can allow an encoded video stream to have a variable or a
constant bit rate. The optimal selection depends on the application and network infrastructure.
With VBR (variable bit rate), a predefined level of image quality can be maintained regardless
of motion or the lack of it in a scene. This means that bandwidth use will increase when there
is a lot of activity in a scene and will decrease when there is no motion. This is often desirable
in video surveillance applications where there is a need for high quality, particularly if there is
motion in a scene. Since the bit rate may vary, even when an average target bit rate is defined,
the network infrastructure (available bandwidth) must be able to accommodate high throughputs.
With limited bandwidth available, the recommended mode is normally CBR (constant bit rate)
as this mode generates a constant bit rate that can be predefined by a user. The disadvantage
with CBR is that when there is, for instance, increased activity in a scene that results in a bit
rate that is higher than the target rate, the restriction to keep the bit rate constant leads to a
lower image quality and frame rate. Axis network video products allow the user to prioritize
either the image quality or the frame rate if the bit rate rises above the target bit rate.
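The difference between the two modes can be illustrated with a toy model. It assumes that each frame needs a certain number of bits to reach the predefined quality level, and that quality falls in proportion when fewer bits are available. Real rate control is considerably more sophisticated; the function names and numbers are hypothetical.

```python
def encode_vbr(needed_bits_per_frame):
    """VBR: spend whatever each frame needs, so quality stays at 1.0."""
    return [(needed, 1.0) for needed in needed_bits_per_frame]


def encode_cbr(needed_bits_per_frame, budget_bits):
    """CBR (toy model): cap each frame at the budget; quality drops when capped."""
    result = []
    for needed in needed_bits_per_frame:
        spent = min(needed, budget_bits)
        quality = spent / needed if needed else 1.0
        result.append((spent, quality))
    return result


# Two quiet frames followed by a frame with a lot of activity.
scene = [100, 100, 400]
vbr_result = encode_vbr(scene)
cbr_result = encode_cbr(scene, budget_bits=200)
```

In this model the VBR stream peaks at 400 bits for the busy frame with unchanged quality, while the CBR stream stays within 200 bits per frame and the busy frame is coded at half the quality.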
Comparing standards
When comparing the performance of MPEG standards such as MPEG-4 and H.264, it is important to note that results may vary between encoders that use the same standard. This is
because the designer of an encoder can choose to implement different sets of tools defined by
a standard. As long as the output of an encoder conforms to a standard’s format and decoder,
it is possible to make different implementations. An MPEG standard, therefore, cannot guarantee a given bit rate or quality, and comparisons cannot be properly made without first defining
how the standards are implemented in an encoder. A decoder, unlike an encoder, must implement all the required parts of a standard in order to decode a compliant bit stream. A standard
specifies exactly how a decompression algorithm should restore every bit of a compressed video.
The graph in Figure 7.4a provides a bit rate comparison, given the same level of image
quality, among the following video standards: Motion JPEG, MPEG-4 Part 2 (no motion compensation), MPEG-4 Part 2 (with motion compensation) and H.264 (Baseline Profile).
Figure 7.4a Axis’ Baseline Profile H.264 compression generated up to 50% fewer bits per second for a sample video
sequence than an MPEG-4 compression with motion compensation. The H.264 compression was at least three times
more efficient than an MPEG-4 compression with no motion compensation and at least six times more efficient than
with Motion JPEG.
Audio
While the use of audio in video surveillance systems is still not widespread, having
audio can enhance a system’s ability to detect and interpret events, as well as enable
audio communication over an IP network. The use of audio, however, can be restricted
in some countries, so it is a good idea to check with local authorities.
Topics covered in this chapter include application scenarios, audio equipment, audio
modes, audio detection alarm, audio compression and audio/video synchronization.
Audio applications
Having audio as an integrated part of a video surveillance system can be an invaluable addition
to a system’s ability to detect and interpret events and emergency situations. The ability of
audio to cover a 360° area enables a video surveillance system to extend its coverage beyond
a camera’s field of view. It can instruct a PTZ camera (or alert the operator of one) to visually
verify an audio alarm.
Audio can also be used to provide users with the ability to not only listen in on an area, but also
communicate orders or requests to visitors or intruders. For instance, if a person in a camera’s
field of view demonstrates suspicious behavior, such as loitering near a bank machine, or is seen
to be entering a restricted area, a remote security guard can send a verbal warning to the person. In a situation where a person has been injured, being able to remotely communicate with
and notify the victim that help is on the way can also be beneficial. Access control—that is, a
remote ‘doorman’ at an entrance—is another area of application. Other applications include a
remote helpdesk situation (e.g., an unmanned parking garage), and video conferencing. An
audiovisual surveillance system increases the effectiveness of a security or remote monitoring
solution by enhancing a remote user’s ability to receive and communicate information.
Audio support and equipment
Audio support can be more easily implemented in a network video system than in an analog
CCTV system. In an analog system, separate audio and video cables must be installed from
endpoint to endpoint; that is, from the camera and microphone location to the viewing/recording
location. If the distance between the microphone and the station is too long, balanced
audio equipment must be used, which increases installation costs and difficulty. In a network
video system, a network camera with audio support processes the audio and sends both audio
and video over the same network cable for monitoring and/or recording. This eliminates the
need for extra cabling, and makes synchronizing the audio and video much easier.
Figure 8.2a A network video system with integrated audio support. Audio and video streams are sent over the same
network cable.
Figure 8.2b Some video encoders have built-in audio, making it possible to add audio even if analog cameras are
used in an installation.
Figure 8.2c An example of an Axis omnidirectional condenser microphone.
A network camera or video encoder with integrated audio functionality often provides a built-in microphone and/or a mic-in/line-in jack. With mic-in/line-in support, users have the option of
using another type or quality of microphone than the one that is built into the camera or video
encoder. It also enables the network video product to connect to more than one microphone, and
the microphone can be located some distance away from the camera. The microphone should
always be placed as close as possible to the source of the sound to reduce noise. In two-way,
full-duplex mode, a microphone should face away and be placed some distance from a speaker to
reduce feedback from the speaker.
Many Axis network video products do not come with a built-in speaker. An active speaker—a
speaker with a built-in amplifier—can be connected directly to a network video product with
audio support. If a speaker does not have a built-in amplifier, it must first connect to an amplifier,
which is then connected to a network camera/video encoder.
To minimize disturbance and noise, always use a shielded audio cable and avoid running the cable
near power cables and cables carrying high frequency switching signals. Audio cables should also
be kept as short as possible. If a long audio cable is required, balanced audio equipment—that is,
cable, amplifier and microphone that are all balanced—should be used to reduce noise.
Audio modes
Depending on the application, there may be a need to send audio in only one direction or both
directions, which can be done either simultaneously or in one direction at a time. There are
three basic modes of audio communication: simplex, half duplex and full duplex.
8.3.1 Simplex
Figure 8.3a In simplex mode, audio is sent in one direction only. In this case, audio is sent by the camera to the
operator. Applications include remote monitoring and video surveillance.
Figure 8.3b In this example of a simplex mode, audio is sent by the operator to the camera. It can be used, for
instance, to provide spoken instructions to a person seen on the camera or to scare a potential car thief away from a
parking lot.
8.3.2 Half duplex
Figure 8.3c In half-duplex mode, audio is sent in both directions, but only one party at a time can send. This is
similar to a walkie-talkie.
8.3.3 Full duplex
Figure 8.3d In full-duplex mode, audio is sent to and from the operator simultaneously. This mode of communication is similar to a telephone conversation. Full duplex requires that the client PC has a sound card with support for
full-duplex audio.
Audio detection alarm
An audio detection alarm can be used as a complement to video motion detection since it can
react to events in areas too dark for the video motion detection functionality to work properly.
It can also be used to detect activity in areas outside of the camera’s view.
When sounds, such as the breaking of a window or voices in a room, are detected, they can trigger
a network camera to send and record video and audio, send e-mail or other alerts, and activate
external devices such as alarms. Similarly, alarm inputs such as motion detection and door
contacts can be used to trigger video and audio recordings. In a PTZ camera, audio detection can
trigger the camera to automatically turn to a preset location such as a specific window.
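One simple way to picture an audio detection alarm is as a level threshold on short blocks of audio samples. The sketch below uses the RMS (root mean square) level of a block. This is an assumption made for illustration; the source does not describe the detection method used in Axis products, and the function names and threshold are hypothetical.

```python
import math


def rms_level(samples):
    """Root mean square level of a block of samples in the range -1.0 to 1.0."""
    if not samples:
        raise ValueError("at least one sample is required")
    return math.sqrt(sum(sample * sample for sample in samples) / len(samples))


def audio_alarm(samples, threshold):
    """Return True when the block is louder than the configured threshold."""
    return rms_level(samples) > threshold


quiet_room = [0.01, -0.01] * 50      # low background noise
breaking_glass = [0.5, -0.5] * 50    # sudden loud sound

quiet_triggered = audio_alarm(quiet_room, threshold=0.1)
loud_triggered = audio_alarm(breaking_glass, threshold=0.1)
```

In a real system, a triggered alarm would then start the actions described above, such as recording, sending an alert or moving a PTZ camera to a preset position.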
Audio compression
Analog audio signals must be converted into digital audio through a sampling process and then
compressed to reduce the size for efficient transmission and storage. The conversion and compression are done using an audio codec, an algorithm that codes and decodes audio data.
8.5.1 Sampling frequency
There are many different audio codecs supporting different sampling frequencies and levels of
compression. Sampling frequency refers to the number of times per second a sample of an
analog audio signal is taken and is defined in hertz (Hz). In general, the higher the sampling
frequency, the better the audio quality and the greater the bandwidth and storage needs.
8.5.2 Bit rate
The bit rate is an important setting in audio since it determines the level of compression and,
thereby, the quality of the audio. In general, the higher the compression level (the lower the bit
rate), the lower the audio quality. The differences in the audio quality of codecs may be particularly noticeable at high compression levels (low bit rates), but not at low compression levels
(high bit rates). Higher compression levels may also introduce more latency or delay, but they
enable greater savings in bandwidth and storage.
The bit rates most often selected with audio codecs are between 32 kbit/s and 64 kbit/s. Audio
bit rates, as with video bit rates, are an important consideration to take into account when
calculating total bandwidth and storage requirements.
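The contribution of audio to bandwidth and storage can be estimated as follows. The video bit rate and camera count in the example are hypothetical, and the calculation ignores network protocol overhead.

```python
def audio_storage_bytes(bit_rate_kbit, hours):
    """Storage needed for an audio stream of the given bit rate and duration."""
    return bit_rate_kbit * 1000 * hours * 3600 // 8


def total_bandwidth_kbit(video_kbit, audio_kbit, cameras):
    """Combined video and audio bandwidth for a number of identical cameras."""
    return (video_kbit + audio_kbit) * cameras


day_at_64 = audio_storage_bytes(64, hours=24)
day_at_32 = audio_storage_bytes(32, hours=24)

# Ten cameras, each sending 2,000 kbit/s of video plus 64 kbit/s of audio.
site_kbit = total_bandwidth_kbit(video_kbit=2000, audio_kbit=64, cameras=10)
```

At 64 kbit/s, one day of audio takes about 691 MB per stream; at 32 kbit/s it takes half of that.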
8.5.3 Audio codecs
Axis network video products support three audio codecs. The first is AAC-LC (Advanced Audio
Coding - Low Complexity), also known as MPEG-4 AAC, which requires a license. AAC-LC,
particularly at a sampling rate of 16 kHz or higher and at a bit rate of 64 kbit/s or more, is the
recommended codec to use when the best possible audio quality is required. The other two
codecs are G.711 and G.726, which are non-licensed ITU-T standards. They have lower delay
and require less computing power than AAC-LC. G.711 and G.726 are speech codecs that are
primarily used in telephony and have low audio quality. Both have a sampling rate of 8 kHz.
G.711 has a bit rate of 64 kbit/s. Axis’ G.726 implementation supports 24 and 32 kbit/s. With
G.711, Axis’ products support only µ-law, which is one of two sound compression algorithms in
the G.711 standard. When using G.711, it is important that the client also uses the µ-law compression.
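The principle behind µ-law compression can be sketched with the continuous companding curve. Note that G.711 itself specifies a segmented 8-bit approximation of this curve, so the code below illustrates the idea rather than implementing the standard. It also shows where the 64 kbit/s bit rate comes from: 8,000 samples per second at 8 bits per sample.

```python
import math

MU = 255  # companding parameter used by µ-law


def mu_law_compress(x):
    """Map a sample in the range -1.0 to 1.0 onto the companded range -1.0 to 1.0."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
g711_bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # bits per second

quiet_sample = mu_law_compress(0.01)
round_trip = mu_law_expand(mu_law_compress(0.3))
```

Quiet samples are boosted before quantization (0.01 maps to roughly 0.23), which is why 8 bits per sample are enough for intelligible speech.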
Audio and video synchronization
Synchronization of audio and video data is handled by a media player (a computer software
program used for playing back multimedia files) or by a multimedia framework such as Microsoft DirectX, which is a collection of application programming interfaces that handles multimedia files.
Audio and video are sent over a network as two separate packet streams. In order for the client
or player to perfectly synchronize the audio and video streams, the audio and video packets
must be time-stamped. The timestamping of video packets using Motion JPEG compression
may not always be supported in a network camera. If this is the case and if it is important to
have synchronized video and audio, the video format to choose is MPEG-4 or H.264 since such
video streams, along with the audio stream, are sent using RTP (Real-time Transport Protocol),
which timestamps the video and audio packets. There are many situations, however, where
synchronized audio is less important or even undesirable; for example, if audio is to be monitored but not recorded.
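The role of timestamps in synchronization can be sketched as follows. RTP timestamps count ticks of a media clock, so a player can convert them to seconds and compare the audio and video positions. The sketch assumes a 90 kHz clock for video and an 8 kHz clock for G.711 audio, and that both streams started at the same instant with known starting timestamps. In practice players also use RTCP sender reports to relate the two streams to a common clock. All names are hypothetical.

```python
VIDEO_CLOCK_HZ = 90_000   # assumed RTP clock rate for video
AUDIO_CLOCK_HZ = 8_000    # assumed RTP clock rate for G.711 audio
RTP_WRAP = 2 ** 32        # RTP timestamps are 32-bit and wrap around


def media_time(rtp_timestamp, base_timestamp, clock_hz):
    """Seconds elapsed since the stream's first packet."""
    ticks = (rtp_timestamp - base_timestamp) % RTP_WRAP
    return ticks / clock_hz


def av_offset(video_ts, video_base, audio_ts, audio_base):
    """Positive when the video packet is later than the audio packet."""
    video_seconds = media_time(video_ts, video_base, VIDEO_CLOCK_HZ)
    audio_seconds = media_time(audio_ts, audio_base, AUDIO_CLOCK_HZ)
    return video_seconds - audio_seconds


in_sync = av_offset(video_ts=91_000, video_base=1_000, audio_ts=8_500, audio_base=500)
video_behind = av_offset(video_ts=46_000, video_base=1_000, audio_ts=8_500, audio_base=500)
```

In the second example the video packet belongs half a second earlier than the audio packet, so a player would hold back the audio or skip ahead in the video to realign them.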
Network technologies
Different network technologies are used to support and provide the many benefits of
a network video system. This chapter begins with a discussion about the local area
network, in particular, Ethernet networks and the components that support them. The use
of Power over Ethernet is also covered.
Internet communication is then addressed with discussions on IP (Internet Protocol)
addresses (what they are and how they work), including how network video products
can be accessed over the Internet. An overview of the data transport protocols used in
network video is also provided.
Other areas covered in the chapter include virtual local area networks and Quality of
Service, and the different ways of securing communication over IP networks. For more
on wireless technologies, see Chapter 10.
Local area network and Ethernet
A local area network (LAN) is a group of computers that are connected together in a localized
area to communicate with one another and share resources such as printers. Data is sent in the
form of packets, and to regulate the transmission of the packets, different technologies can be
used. The most widely used LAN technology is Ethernet and it is specified in a standard called
IEEE 802.3. (Other types of LAN networking technologies include token ring and FDDI.)
Today Ethernet uses a star topology in which the individual nodes (devices) are networked with
one another via active networking equipment such as switches. The number of networked devices in a LAN can range from two to several thousand.
The physical transmission medium for a wired LAN involves cables, mainly twisted pair or fiber
optics. A twisted pair cable consists of eight wires, forming four pairs of twisted copper wires
and is used with RJ45 plugs and sockets. The maximum cable length of a twisted pair is 100 m
(328 ft.) while for fiber, the maximum length ranges from 10 km to 70 km (6 miles to 43 miles),
depending on the type of fiber. Depending on the type of twisted pair or fiber optic cables used,
data rates today can range from 100 Mbit/s to 100,000 Mbit/s.
Figure 9.1a Twisted pair cabling includes four pairs of twisted wires, normally connected to a RJ45 plug at the end.
A rule of thumb is to always build a network with greater capacity than is currently required.
To future-proof a network, it is a good idea to design a network such that only 30% of its
capacity is used. Since more and more applications are running over networks today, higher and
higher network performance is required. While network switches (discussed below) are easy to
upgrade after a few years, cabling is normally much more difficult to replace.
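The 30% rule of thumb can be turned into a quick capacity check. The sketch below works in whole kbit/s to avoid rounding surprises; the per-camera bit rate is a hypothetical figure and protocol overhead is ignored.

```python
def max_cameras(link_kbit, camera_kbit, utilization_percent=30):
    """Number of cameras that fit within the planned share of a link."""
    usable_kbit = link_kbit * utilization_percent // 100
    return usable_kbit // camera_kbit


# Cameras streaming 4,000 kbit/s each on Fast Ethernet and Gigabit Ethernet links.
fast_ethernet = max_cameras(link_kbit=100_000, camera_kbit=4_000)
gigabit = max_cameras(link_kbit=1_000_000, camera_kbit=4_000)
```

With these numbers, a 100 Mbit/s link planned at 30% utilization carries 7 such cameras, while a 1 Gbit/s link carries 75.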
9.1.1 Types of Ethernet networks
Below are the most common types of Ethernet networks in the video surveillance industry.
Fast Ethernet
Fast Ethernet refers to an Ethernet network that can transfer data at a rate of 100 Mbit/s. It
can be based on a twisted pair or fiber optic cable. (The older 10 Mbit/s Ethernet is still installed
and used, but such networks do not provide the necessary bandwidth for some network video applications.)
Most devices that are connected to a network, such as a laptop or a network camera, are
equipped with a 100BASE-TX/10BASE-T Ethernet interface, most commonly called a 10/100
interface, which supports both 10 Mbit/s and Fast Ethernet. The type of twisted pair cable that
supports Fast Ethernet is called a Cat-5 cable.
Gigabit Ethernet
Gigabit Ethernet, which can also be based on a twisted pair or fiber optic cable, delivers a data
rate of 1,000 Mbit/s (1 Gbit/s) and is now more commonly used than Fast Ethernet. 1 or 10 Gbit/s
Ethernet may be necessary for the backbone network that connects many network cameras.
The type of twisted pair cable that supports Gigabit Ethernet is a Cat-5e cable, where all four pairs
of twisted wires in the cable are used to achieve the high data rates. Cat-5e or higher cable categories are recommended for network video systems. Most interfaces are backwards compatible
with 10 and 100 Mbit/s Ethernet and are commonly called 10/100/1000 interfaces.
For transmission over longer distances, fiber cables such as 1000BASE-SX (up to 550 m/1,804 ft.)
and 1000BASE-LX (up to 550 m with multimode optical fibers and 5,000 m or 3 miles with single-mode fibers) can be used.
Figure 9.1b Longer distances can be bridged using fiber optic cables. Fiber is typically used in the backbone of a
network.
10 Gigabit Ethernet
10 Gigabit Ethernet delivers a data rate of 10 Gbit/s (10,000 Mbit/s), and a fiber optic or twisted
pair cable can be used. 10GBASE-LX4, 10GBASE-ER and 10GBASE-SR based on an optical fiber
cable can be used to bridge distances of up to 10 km (6 miles). With a twisted pair solution, a very
high quality cable (Cat-6a or Cat-7) is required. 10 Gbit/s Ethernet is mainly used for backbones
in high-end applications that require high data rates.
9.1.2 Connecting network devices and network switch
When only two devices need to communicate directly with one another via a twisted pair cable,
a so-called crossover cable may be needed. The crossover cable simply crosses the transmission
pair on one end of the cable with the receiving pair on the other end and vice versa. Since many
devices have network interfaces that automatically detect such cases, a regular network cable
may be used.
To network multiple devices in a LAN, network equipment such as a network switch is required.
When using a network switch, a regular network cable is used. The main function of a network
switch is to forward data from one device to another on the same network. It does so efficiently, since data can be directed from one device to another without affecting other
devices on the same network.
A network switch works by registering the MAC (Media Access Control) addresses of all devices that are connected to it. (Each networking device has a unique MAC address, which is
made up of a series of numbers and letters in hexadecimal notation and is set by the manufacturer. The address is often found on the product label.) When a network switch receives data,
it forwards it only to the port that is connected to the device with the appropriate destination
MAC address.
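The forwarding behavior described above can be sketched in a few lines of Python. This is a simplified model, not a description of any particular switch: the class and method names are hypothetical, and the handling of unknown destinations (sending the frame out on all other ports until the address has been learned) is standard Ethernet switch behavior that the text above does not spell out.

```python
class LearningSwitch:
    """Toy model of a switch that learns which MAC address sits behind which port."""

    def __init__(self, port_count):
        self.port_count = port_count
        self.mac_table = {}  # MAC address -> port number

    def receive(self, in_port, src_mac, dst_mac):
        """Register the sender and return the list of ports the frame is forwarded to."""
        # Learn (or refresh) the port where the sending device is connected.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Destination not yet known: send to every port except the incoming one.
            return [p for p in range(self.port_count) if p != in_port]
        if out_port == in_port:
            # Destination is on the same port as the sender: nothing to forward.
            return []
        # Destination known: forward only to that port.
        return [out_port]
```

Once both devices have sent at least one frame, traffic between them occupies only their two ports, which is why other devices on the switch are not affected.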
Network switches typically indicate their performance in per port rates and in backplane or
internal rates (both in bit rates and in packets per second). The port rates indicate the maximum
rates on specific ports. This means that the speed of a switch, for example 100 Mbit/s, is often
the performance of each port.
Figure 9.1c With a network switch, data transfer is managed very efficiently as data traffic can be directed from one
device to another without affecting any other ports on the switch.
A network switch normally supports different data rates simultaneously. The most common rates
used to be 10/100 Mbit/s, supporting the 10 Mbit/s and Fast Ethernet standards. Today, network
switches often have 10/100/1000 interfaces, thus supporting 10 Mbit/s, Fast Ethernet and Gigabit
Ethernet simultaneously. The transfer rate and mode between a port on a switch and a connected
device are normally determined through auto-negotiation, whereby the highest common data
rate and best transfer mode are used. A network switch also allows a connected device to function in full-duplex mode, that is, to send and receive data at the same time, resulting in increased
performance.
Network switches may come with different features or functions. Some switches include the
function of a router (see Section 9.2). A switch may also support Power over Ethernet or Quality
of Service (see Section 9.4), which controls how much bandwidth is used by different applications.
9.1.3 Power over Ethernet
Power over Ethernet (PoE) provides the option of supplying devices connected to an Ethernet
network with power using the same cable as for data communication. Power over Ethernet is
widely used to power IP phones, wireless access points and network cameras in a LAN.
The main benefit of PoE is the inherent cost savings. Hiring a certified electrician and installing
a separate power line are not needed. This is advantageous, particularly in difficult-to-reach
areas. The fact that no power cable has to be installed can save, depending on the camera location, up to a few hundred dollars per camera. Having PoE also makes it easier to move a camera
to a new location, or add cameras to a video surveillance system.
Additionally, PoE can make a video system more secure. A video surveillance system with PoE
can be powered from the server room, which is often backed up with a UPS (Uninterruptible
Power Supply). This means that the video surveillance system can be operational even during a
power outage.
Due to the benefits of PoE, it is recommended for use with as many devices as possible. The
power available from the PoE-enabled switch or midspan should be sufficient for the connected devices and the devices should support power classification. These are explained in
more detail in the sections below.
802.3af standard, PoE+ and High PoE
Most PoE devices today conform to the IEEE 802.3af standard, which was published in 2003. The
IEEE 802.3af standard uses standard Cat-5 or higher cables, and ensures that data transfer is not
affected. In the standard, the device that supplies the power is referred to as the power sourcing
equipment (PSE). This can be a PoE-enabled switch or midspan. The device that receives the
power is referred to as a powered device (PD). The functionality is normally built into a network
device like a network camera, or provided in a standalone splitter (see section below).
Backward compatibility with network devices that do not support PoE is guaranteed. The standard
includes a method for automatically identifying if a device supports PoE, and only when that is
confirmed will power be supplied to the device. This also means that the Ethernet cable that is
connected to a PoE switch will not supply any power if it is not connected to a PoE-enabled device. This eliminates the risk of getting an electrical shock when installing or rewiring a network.
In a twisted pair cable, there are four pairs of twisted wires. PoE can use either the two ‘spare’
wire pairs, or overlay the current on the wire pairs used for data transmission. Switches with
built-in PoE often supply electricity through the two pairs of wires used for transferring data,
while midspans normally use the two spare pairs. A PD supports both options.
According to IEEE 802.3af, a PSE provides a voltage of 48 V DC with a maximum power of 15.4 W
per port. Considering that power loss takes place on a twisted pair cable, only 12.95 W is guaranteed for a PD. The IEEE 802.3af standard specifies various performance categories for PDs.
PSE such as switches and midspans normally supply a certain amount of power, typically 300 W
to 500 W. On a 48-port switch, that would mean 6 W to 10 W per port if all ports are connected to devices that use PoE. Unless the PDs support power classification, a full 15.4 W must
be reserved for each port that uses PoE, which means a switch with 300 W can only supply
power on 19 of the 48 ports (300 W / 15.4 W is about 19.5). However, if all devices let the switch know that they are Class 1
devices, the 300 W will be enough to supply power to all 48 ports.
Class  Minimum power level at PSE  Maximum power level used by PD
0      15.4 W                      0.44 W - 12.95 W
1      4.0 W                       0.44 W - 3.84 W
2      7.0 W                       3.84 W - 6.49 W
3      15.4 W                      6.49 W - 12.95 W
4      30 W                        12.95 W - 25.5 W
Table 9.1a Power classifications according to IEEE 802.3af and IEEE 802.3at.
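The power budget reasoning above is simple arithmetic and can be checked with a short Python sketch. The per-class PSE power levels are taken from Table 9.1a; the class numbers 0 to 4 follow the IEEE 802.3af/at numbering, and the function name is hypothetical.

```python
# Minimum power the PSE must reserve per port, by PD class (values from Table 9.1a).
PSE_WATTS_BY_CLASS = {0: 15.4, 1: 4.0, 2: 7.0, 3: 15.4, 4: 30.0}


def ports_supported(budget_watts, port_count, pd_class=0):
    """Return how many ports can be powered if every PD reports the given class.

    Class 0 is the default that applies when a PD does not support power classification.
    """
    per_port_watts = PSE_WATTS_BY_CLASS[pd_class]
    return min(port_count, int(budget_watts // per_port_watts))
```

With a 300 W budget and 15.4 W reserved per port, integer division gives 19 fully powered ports on a 48-port switch, while Class 1 devices (4.0 W each) allow all 48 ports to be powered.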
Most fixed network cameras can receive power via PoE using the IEEE 802.3af standard and are
normally identified as Class 1 or 2 devices.
Another PoE standard is IEEE 802.3at, also known as PoE+. Using PoE+, the power limit is raised
to at least 30 W via two pairs of wires from a PSE. For power requirements that are higher than
the PoE+ standard, Axis uses the term High PoE. With High PoE, the power limits are raised to at
least 60 W via four pairs of wires and 51 W is guaranteed for power over Ethernet.
PoE+ and High PoE midspans and splitters can be used for devices such as PTZ cameras with motor control, as well as cameras with heaters and fans, which require more power than can be
delivered by the IEEE 802.3af standard. For PoE+ and High PoE, the use of at least a Cat-5e or
higher cable is recommended.
Midspans and splitters
Midspans and splitters (also known as active splitters) are equipment that enable an existing
network to support Power over Ethernet.
Figure 9.1d An existing system can be upgraded with PoE functionality using a midspan and splitter. (Diagram labels: power supply, network switch, network camera with built-in PoE, network camera without built-in PoE, active splitter, Power over Ethernet.)
The midspan, which adds power to an Ethernet cable, is placed between the network switch and
the powered devices. To ensure that data transfer is not affected, the distance between the
source of the data (e.g., switch) and the network video products must not exceed
100 m (328 ft.). This means that the midspan and active splitter(s) must be placed
within that 100 m.
A splitter is used to split the power and data in an Ethernet cable into two separate cables,
which can then be connected to a device that has no built-in support for PoE. Since PoE or PoE+
only supplies 48 V DC, another function of the splitter is to step down the voltage to the appropriate level for the device; for example, 12 V or 5 V.
9.2 Sending data over the Internet
To send data between a device on one local area network to another device on another LAN, a
standard way of communicating is required since local area networks may use different types
of technologies. This need led to the development of IP addressing and the many IP-based
protocols for communicating over the Internet, which is a global system of interconnected
computer networks. Before IP addressing is discussed, some of the basic elements of Internet
communication such as routers, firewalls and Internet service providers are covered below.
To forward data packets from one LAN to another LAN via the Internet, a piece of networking equipment called a network router must be used. A router routes information from one network to
another based on IP addresses. It forwards only data packets that are to be sent to another
network. A router is most commonly used for connecting a local network to the Internet. Traditionally, routers were referred to as gateways.
A firewall is designed to prevent unauthorized access to or from a private network. Firewalls can
be implemented in both hardware and software, or a combination of both. Firewalls are frequently used to prevent unauthorized Internet users from accessing private networks that are
connected to the Internet. Messages entering or leaving the private network pass through the firewall,
which examines each message, and blocks those that do not meet the specified security criteria.
Internet connections
In order to connect a LAN to the Internet, a network connection via an Internet service provider
(ISP) must be established. When connecting to the Internet, terms such as upstream and downstream are used. Upstream describes the transfer rate (bandwidth) with which data can be uploaded from the device to the Internet; for instance, when video is sent from a network camera.
Downstream is the transfer speed for downloading files; for instance, when video is received by a
monitoring PC. In most scenarios—for example, a laptop that is connected to the Internet—the
download speed from the Internet is the most important to consider. In a network video
application with a network camera at a remote site, the upstream speed is more relevant since
data (video) from the network camera will be uploaded to the Internet. Internet technologies with
asymmetrical bandwidth such as ADSL (Asymmetric Digital Subscriber Line) may not be suitable
for network video applications since their upstream data rate may be too low.
9.2.1 IP addressing
Any device that wants to communicate with other devices via the Internet must have a unique
and appropriate IP address. IP addresses are used to identify the sending and receiving devices.
There are currently two IP versions: IP version 4 (IPv4) and IP version 6 (IPv6). The main difference between the two is that an IPv6 address is longer (128 bits compared with
32 bits for an IPv4 address). IPv4 addresses are most commonly used today.
IPv4 addresses
IPv4 addresses are grouped into four blocks, and each block is separated by a dot. Each block
represents a number between 0 and 255.
Certain blocks of IPv4 addresses have been reserved exclusively for private use. These private
IP addresses are 10.0.0.0 to 10.255.255.255, 172.16.0.0 to 172.31.255.255 and 192.168.0.0 to 192.168.255.255. Such addresses can only be used on private networks and are not allowed to
be forwarded through a router to the Internet. Every device that wants to communicate over the
Internet must have its own individual, public IP address. A public IP address is an address allocated by an Internet service provider. An ISP can allocate either a dynamic IP address, which
can change during a session, or a static address, which normally comes with an additional
monthly fee.
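Whether an address falls in one of the blocks reserved for private use can be checked with Python's standard ipaddress module. The sketch below is illustrative: the three blocks are the private IPv4 ranges 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16, and the function name and the addresses in the usage note are hypothetical.

```python
import ipaddress

# The three IPv4 blocks reserved for private use, in prefix notation.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_ipv4(text):
    """Return True if the dotted IPv4 address lies in one of the private blocks.

    Raises ValueError if the text is not a valid IPv4 address
    (for example, if a block is outside the range 0 to 255).
    """
    address = ipaddress.IPv4Address(text)
    return any(address in block for block in PRIVATE_BLOCKS)
```

A camera configured with 192.168.0.90 would be reported as private and therefore not directly reachable from the Internet, whereas an address such as 172.32.0.1 lies just outside the second block.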
A port number defines a particular service or application so that the receiving server (e.g., network camera) will know how to process the incoming data. When a computer sends data tied to
a specific application, it usually automatically adds the port number to an IP address without
the user’s knowledge.
Port numbers can range from 0 to 65535. Certain applications use port numbers that are
pre-assigned to them by the Internet Assigned Numbers Authority (IANA). For example, a web
service via HTTP is typically mapped to port 80 on a network camera.
Setting IPv4 addresses
In order for a network camera or video encoder to work in an IP network, an IP address must be
assigned to it. Setting an IPv4 address for an Axis network video product can be done mainly
in two ways: automatically using DHCP (Dynamic Host Configuration Protocol) and manually.
Manual setting can be done in two ways. One is to use the network video product’s web page
to enter the static IP address, the subnet mask, as well as the IP addresses of the default
router, the DNS (Domain Name System) server and the NTP (Network Time Protocol) server for
synchronizing the time of the network video product. The second way is to use a management
software tool such as AXIS Camera Management.
DHCP manages a pool of IP addresses, which it can assign dynamically to a network camera/
video encoder. The DHCP function is often performed by a broadband router. The broadband
router in turn is typically connected to the Internet and gets its public IP address from an
Internet service provider. Using a dynamic IP address means that the IP address for a network
device may change from day to day. With dynamic IP addresses, it is recommended that users
register a domain name for the network video product at a dynamic
DNS server, which can always tie the domain name for the product to any IP address that is
currently assigned to it. (A domain name can be registered using one of the popular dynamic
DNS sites. Axis also offers its own, called AXIS Internet Dynamic DNS
Service, which is accessible from an Axis network video product’s web page.)
Using DHCP to set an IPv4 address works as follows. When a network video product comes
online, it sends a query requesting configuration from a DHCP server. The DHCP server replies
with the configuration requested by the network video product. This normally includes the IP
address, the subnet mask, and IP addresses for the router, DNS server and NTP server. The network video product first verifies that the offered IP address is not already in use on the local
network, assigns the address to itself and can then update a dynamic DNS server with its
current IP address so that users can access the product using a domain name.
With AXIS Camera Management, the software can automatically find and set IP addresses and
show the connection status. The software can also be used to assign static, private IP addresses
for Axis network video products. This is recommended when using video management software
to access network video products. In a network video system with potentially hundreds of
cameras, a software program such as AXIS Camera Management is necessary in order to
effectively manage the system. For more on video management, see Chapter 11.
NAT (Network address translation)
When a network device with a private IP address wants to send information via the Internet, it
must do so using a router that supports NAT. Using this technique, the router can translate a
private IP address into a public IP address without the sending host’s knowledge.
Port forwarding
To access cameras that are located on a private LAN via the Internet, the public IP address of the
router should be used together with the corresponding port number for the network video product
on the private network.
Since a web service via HTTP is typically mapped to port 80, what happens when there are several network video products using port 80 for HTTP in a private network? Instead of changing the
default HTTP port number for each network video product, a router can be configured to associate a unique HTTP port number to a particular network video product’s IP address and default
HTTP port. This is a process called port forwarding.
Port forwarding works as follows. Incoming data packets reach the router via the router’s public
(external) IP address and a specific port number. The router is configured to forward any data
coming into a predefined port number to a specific device on the private network side of the
router. The router replaces the router address with the private address of the device and forwards
the data to the device. The reverse happens with outgoing data packets. The router replaces the
private IP address of the device with the router’s public IP address before the data is sent out over
the Internet. For the external client, it looks like it is communicating with the router when in fact
the sent packets originate from the device on the private network.
Figure 9.2a Thanks to port forwarding in the router, network cameras with private IP addresses on a local network
can be accessed over the Internet. In this illustration, the port mapping table in the router (external IP address of router, external port, internal IP address of network device, internal port) tells the router to forward data (an HTTP request) coming into port
8032 to port 80 of a network camera with a private IP address. The network camera can then begin
to send video.
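The address rewriting performed by the router can be modeled as a lookup table keyed by external port. The Python sketch below is a simplified illustration, not router firmware: the class names are hypothetical, and the addresses in the usage note (203.0.113.10 for the router, 192.168.0.90 for the camera) are placeholders; only the port pair 8032 and 80 comes from the figure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class PortForwardingRouter:
    """Toy router that maps external ports to devices on the private network."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.rules = {}  # external port -> (internal IP address, internal port)

    def add_rule(self, external_port, internal_ip, internal_port):
        self.rules[external_port] = (internal_ip, internal_port)

    def inbound(self, packet):
        """Rewrite the destination of a packet arriving from the Internet."""
        rule = self.rules.get(packet.dst_port)
        if packet.dst_ip != self.public_ip or rule is None:
            return None  # no mapping configured: the packet is not forwarded
        internal_ip, internal_port = rule
        return replace(packet, dst_ip=internal_ip, dst_port=internal_port)

    def outbound(self, packet):
        """Rewrite the source of a reply leaving the private network."""
        for external_port, internal in self.rules.items():
            if internal == (packet.src_ip, packet.src_port):
                return replace(packet, src_ip=self.public_ip, src_port=external_port)
        return None
```

After add_rule(8032, "192.168.0.90", 80), a request sent to 203.0.113.10 port 8032 is delivered to the camera on port 80, and the camera's reply leaves the router with source address 203.0.113.10 and source port 8032, so the external client only ever sees the router.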
Port forwarding is traditionally done by first configuring the router. Different routers have different ways of doing port forwarding, and there are websites that
offer step-by-step instructions for different routers. Usually port forwarding involves bringing up
the router’s interface using an Internet browser, and entering the public (external) IP address of
the router and a unique port number that is then mapped to the internal IP address of the specific network video product and its port number for the application.
To make the task of port forwarding easier, Axis offers the NAT traversal feature in its network
video products. NAT traversal will, when enabled, attempt to configure port mapping in a NAT
router on the network using UPnP. On the network video product’s web page, users can manually enter the IP address of the NAT router. If a router is not manually specified, then the network video product will automatically search for NAT routers on the network and select the
default router. In addition, NAT traversal will automatically select an HTTP port if none is
manually entered.
Figure 9.2b Axis network video products enable port forwarding to be set using NAT traversal.
IPv6 addresses
An IPv6 address is written in hexadecimal notation with colons subdividing the address into
eight blocks of 16 bits each; for example, 2001:0da8:65b4:05d3:1315:7c1f:0461:7847
The major advantages of IPv6, apart from the availability of a huge number of IP addresses,
include enabling a device to automatically configure its IP address using its MAC address. For
communication over the Internet, the host requests and receives from the router the necessary
prefix of the public address block and additional information. The prefix and the host’s suffix are then
used, so DHCP for IP address allocation and manual setting of IP addresses are no longer
required with IPv6. Port forwarding is also no longer needed. Other benefits of IPv6 include
renumbering to simplify switching entire corporate networks between providers, faster routing,
point-to-point encryption according to IPsec, and connectivity using the same address in
changing networks (Mobile IPv6).
An IPv6 address is enclosed in square brackets in a URL and a specific port can be addressed in
the following way: http://[2001:0da8:65b4:05d3:1315:7c1f:0461:7847]:8081/
Setting an IPv6 address for an Axis network video product is as simple as checking a box to enable IPv6 in the
product. The product will then receive an IPv6 address according to the configuration in the
network router.
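How a host can build its own IPv6 address from a router-supplied prefix and its MAC address can be sketched as follows. The text above does not name the exact mechanism; this example assumes the widely used modified EUI-64 scheme (insert ff:fe in the middle of the MAC address and flip the universal/local bit), and the function names, prefix and MAC address are hypothetical.

```python
import ipaddress


def eui64_interface_id(mac):
    """Derive a 64-bit interface identifier from a 48-bit MAC address (modified EUI-64)."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has six octets")
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return int.from_bytes(eui64, "big")


def autoconfigured_address(prefix, mac):
    """Combine a /64 prefix announced by the router with the host's interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch assumes a /64 prefix")
    return ipaddress.IPv6Address(int(network.network_address) | eui64_interface_id(mac))


def http_url(address, port):
    """Format a URL with the IPv6 address in square brackets, as described above."""
    return f"http://[{address}]:{port}/"
```

Note that the ipaddress module prints addresses in compressed form, dropping leading zeros in each block, so the example address from the text is shown as 2001:da8:65b4:5d3:1315:7c1f:461:7847; both notations refer to the same address.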
9.2.2 Data transport protocols for network video
The Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) are the IP-based protocols used for sending data. These transport protocols act as carriers for many other
protocols. For example, HTTP (Hyper Text Transfer Protocol), which is used to browse web
pages on servers around the world using the Internet, is carried by TCP.
TCP provides a reliable, connection-based transmission channel. It ensures that data sent from
one end is received on the other. TCP’s reliability through retransmission may introduce significant delays. In general, TCP is used when reliable communication is preferred over transport
latency.
UDP is a connectionless protocol and does not guarantee the delivery of data sent, thus leaving
the whole control mechanism and error-checking to the application itself. UDP provides no
retransmission of lost data and, therefore, does not introduce further delays.
FTP (File Transfer Protocol)
  Transport protocol: TCP
  Port: 21
  Common usage: Transfer of files over the Internet/intranets.
  Network video usage: Transfer of images or video from a network camera/video encoder to an FTP server or to an application.
SMTP (Simple Mail Transfer Protocol)
  Transport protocol: TCP
  Port: 25
  Common usage: Protocol for sending e-mail messages.
  Network video usage: A network camera/video encoder can send images or alarm notifications using its built-in e-mail client.
HTTP (Hyper Text Transfer Protocol)
  Transport protocol: TCP
  Port: 80
  Common usage: Used to browse the web, i.e. to retrieve web pages from web servers.
  Network video usage: The most common way to transfer video from a network camera/video encoder where the network video device essentially works as a web server making the video available for the requesting user or application.
HTTPS (Hyper Text Transfer Protocol over Secure Socket Layer)
  Transport protocol: TCP
  Port: 443
  Common usage: Used to access web pages securely using encryption.
  Network video usage: Secure transmission of video from network cameras/video encoders.
RTP (Real-time Transport Protocol)
  Transport protocol: UDP/TCP
  Port: Not defined
  Common usage: RTP standardized packet format for delivering audio and video over the Internet, often used in streaming media systems or video conferencing.
  Network video usage: A common way of transmitting H.264/MPEG-based network video, and for synchronizing video and audio since RTP provides sequential numbering and timestamping of data packets, which enable the data packets to be reassembled in the correct sequence. Transmission can be either unicast or multicast.
RTSP (Real Time Streaming Protocol)
  Transport protocol: TCP
  Port: 554
  Common usage and network video usage: Used to set up and control multimedia sessions over RTP.
Table 9.2a Common TCP/IP protocols and ports used for network video.
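The sequential numbering and timestamping that RTP provides can be seen in its fixed 12-byte header. The Python sketch below packs and parses that header and puts packets back in order by sequence number. It is a minimal illustration: the function names are hypothetical, header extensions and CSRC lists are not handled, payload type 96 in the usage note is an arbitrary choice, and sequence number wrap-around at 65535 is ignored.

```python
import struct

# Fixed RTP header: flags, marker/payload type, sequence number, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(sequence, timestamp, ssrc, payload_type, payload, marker=False):
    """Build an RTP packet with version 2 and no padding, extension or CSRC entries."""
    first_byte = 2 << 6
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(first_byte, second_byte, sequence & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def unpack_rtp(packet):
    """Split a packet into its header fields and payload."""
    first_byte, second_byte, sequence, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": first_byte >> 6,
        "marker": bool(second_byte >> 7),
        "payload_type": second_byte & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER.size:],
    }


def reorder(packets):
    """Return parsed packets sorted by sequence number (no wrap-around handling)."""
    return sorted((unpack_rtp(p) for p in packets), key=lambda fields: fields["sequence"])
```

Because UDP may deliver packets out of order, a receiver can use reorder() on a small buffer of packets before handing the payloads to the video decoder.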
9.3 VLANs
When a network video system is designed, there is often a desire to keep the network separate
from other networks, for both security and performance reasons. At first glance, the obvious
choice would be to build a separate network. While the design would be simplified, the cost of
purchasing, installing and maintaining the network would often be higher than using a technology called virtual local area network (VLAN).
VLAN is a technology for virtually segmenting networks, a functionality that is supported by most
network switches. It can be achieved by dividing network users into logical groups. Only users in
a specific group are capable of exchanging data or accessing certain resources on the network. If
a network video system is segmented into a VLAN, only the servers located on that VLAN can
access the network cameras. VLANs provide a flexible and more cost-efficient solution than a
separate network. The primary protocol used when configuring VLANs is IEEE 802.1Q, which tags
each frame or packet with extra bytes to indicate which virtual network the packet belongs to.
Figure 9.3a In this illustration, VLANs are set up over several switches. First, each of the two different LANs is segmented into VLAN 20 and VLAN 30. The links between the switches transport data from different VLANs. Only members
of the same VLAN are able to exchange data, either within the same network or over different networks. VLANs can be
used to separate a video network from an office network.
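The extra bytes that IEEE 802.1Q adds to a frame form a 4-byte tag placed after the source MAC address: a tag protocol identifier (0x8100) followed by 3 bits of priority, 1 drop-eligible bit and a 12-bit VLAN ID. The Python sketch below inserts and reads such a tag. It is illustrative only: the function names are hypothetical, and a real switch would also recompute the frame check sequence, which is left out here.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier that marks a frame as 802.1Q-tagged


def add_vlan_tag(frame, vlan_id, priority=0):
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("usable VLAN IDs range from 1 to 4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority is a 3-bit value")
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    tag_control = (priority << 13) | vlan_id  # drop-eligible bit left at 0
    tag = struct.pack("!HH", TPID_8021Q, tag_control)
    return frame[:12] + tag + frame[12:]


def read_vlan_id(frame):
    """Return the VLAN ID of a tagged frame, or None if the frame is untagged."""
    if len(frame) < 16:
        return None
    tpid, tag_control = struct.unpack_from("!HH", frame, 12)
    return tag_control & 0x0FFF if tpid == TPID_8021Q else None
```

A switch port assigned to VLAN 20 would, in this model, tag frames from a camera with add_vlan_tag(frame, 20) before sending them over a link shared with other VLANs, and the receiving switch would use read_vlan_id() to decide which ports may receive the frame.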
9.4 Quality of Service
Since different applications—for example, telephone, e-mail and surveillance video—may be
using the same IP network, there is a need to control how network resources are shared to
fulfill the requirements of each service. One solution is to let network routers and switches
operate differently on different kinds of services (voice, data, and video) as traffic passes
through the network. By using Quality of Service (QoS), different network applications can coexist on the same network without consuming each other’s bandwidth.
The term Quality of Service refers to a number of technologies such as Differentiated Services
Code Point (DSCP), which can identify the type of data in a data packet and so divide the packets into traffic classes that can be prioritized for forwarding. The main benefits of a QoS-aware
network include the ability to prioritize traffic to allow critical flows to be served before flows
with lesser priority, and greater reliability in a network by controlling the amount of bandwidth
an application may use and thus controlling bandwidth competition between applications. An
example of where QoS can be used is with PTZ commands to guarantee fast camera responses
to movement requests. The prerequisite for the use of QoS within a video network is that all
switches, routers and network video products must support QoS.
Figure 9.4a Ordinary (non-QoS aware) network. In this example, PC1 is watching two video streams from cameras 1
and 2, with each camera streaming at 2.5 Mbit/s. Suddenly, PC2 starts a file transfer from PC3. In this scenario, the File
Transfer Protocol (FTP) will try to use the full 10 Mbit/s capacity between the routers 1 and 2, while the video streams
will try to maintain their total of 5 Mbit/s. The amount of bandwidth given to the surveillance system can no longer be
guaranteed and the video frame rate will probably be reduced. At worst, the FTP traffic will consume all the available
bandwidth. (Diagram labels: PC 1, PC 2, PC 3, Camera 1, Camera 2, Switch 1, Switch 2, Router 1 and Router 2, with links marked 100 Mbit and 10 Mbit.)
Figure 9.4b QoS aware network. Here, Router 1 has been configured to use up to 5 Mbit/s of the available 10 Mbit/s
for streaming video. FTP traffic is allowed to use 2 Mbit/s, and HTTP and all other traffic can use a maximum of 3
Mbit/s. Using this division, video streams will always have the necessary bandwidth available. File transfers are considered less important and get less bandwidth, but there will still be bandwidth available for web browsing and other
traffic. Note that these maximums only apply when there is congestion on the network. If there is unused bandwidth
available, this can be used by any type of traffic. (Diagram labels: PC 1, PC 2, PC 3, Camera 1, Camera 2, Switch 1, Switch 2, Router 1 and Router 2, with links marked 100 Mbit and 10 Mbit.)
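The division of the 10 Mbit/s link in Figure 9.4b can be modeled as a two-pass allocation: each traffic class first receives up to its configured share, and any capacity left over is then handed to classes that still want more. The Python sketch below is a simplified model rather than a description of how a specific router schedules traffic; the function name is hypothetical, and handing out spare capacity in the order the classes are listed is an assumption made to keep the example short.

```python
def allocate_bandwidth(link_mbit, shares, demands):
    """Return the bandwidth given to each traffic class on a shared link.

    shares:  configured share per class (applies under congestion), in Mbit/s
    demands: bandwidth each class currently tries to use, in Mbit/s
    """
    # First pass: every class gets what it asks for, up to its configured share.
    allocation = {name: min(demands.get(name, 0), share) for name, share in shares.items()}
    spare = link_mbit - sum(allocation.values())
    # Second pass: unused capacity may be used by any class that still wants more.
    for name in shares:
        if spare <= 0:
            break
        extra = min(demands.get(name, 0) - allocation[name], spare)
        allocation[name] += extra
        spare -= extra
    return allocation
```

With shares of 5, 2 and 3 Mbit/s for video, FTP and other traffic, a file transfer that tries to use the full 10 Mbit/s is held to 2 Mbit/s while the link is congested, but it can use the whole link when no video or other traffic is present.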
9.5 Network security
There are different levels of security when it comes to securing information being sent over IP
networks. The first is authentication and authorization. The user or device identifies itself to the
network and the remote end by a user name and password, which are then verified before the
device is allowed into the system. Added security can be achieved by encrypting the data to
prevent others from using or reading the data. Common methods are SSL/TLS (the basis of
HTTPS), VPN and WEP or WPA in wireless networks. (For more on wireless security, see Chapter
10.) The use of encryption may slow down communications, depending on the kind of implementation and encryption used.
9.5.1 User name and password authentication
Using user name and password authentication is the most basic method of protecting data on
an IP network and may be sufficient where high levels of security are not required, or where the
video network is segmented off from the main network and unauthorized users would not have
physical access to the video network. The passwords can be encrypted or unencrypted when
they are sent; the former provides the best security.
Axis network video products provide multi-level password protection. Three levels are available: Administrator (full access to all functionalities), Operator (access to all functionalities
except the configuration pages) and Viewer (access only to live video).
9.5.2 IP address filtering
Axis network video products provide IP address filtering, which gives or denies access rights to
defined IP addresses. A typical configuration is to configure the network cameras to allow only
the IP address of the server that is hosting the video management software to access the network video products.
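The allow-list behavior described above can be sketched with Python's standard ipaddress module. This is an illustration of the concept, not the implementation used in any product: the class name, method name and addresses are hypothetical.

```python
import ipaddress


class IpFilter:
    """Allow access only from listed addresses or address ranges."""

    def __init__(self, allowed):
        # Each entry may be a single address ("192.168.0.10") or a range ("10.0.0.0/24").
        self.allowed = [ipaddress.ip_network(entry) for entry in allowed]

    def permits(self, client_ip):
        """Return True if the client address matches any allowed entry."""
        address = ipaddress.ip_address(client_ip)
        return any(address in network for network in self.allowed)
```

In the typical configuration mentioned above, the list would contain only the address of the server hosting the video management software, for example IpFilter(["192.168.0.10"]), so that requests from any other address are denied.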
9.5.3 IEEE 802.1X
Many Axis network video products support IEEE 802.1X, which is a method used to protect a
network from connecting with unauthorized devices. IEEE 802.1X establishes a point-to-point
connection or prevents access from the LAN port if authentication fails. IEEE 802.1X prevents
what is called “port hijacking”; that is, when an unauthorized computer gets access to a network
by getting to a network jack inside or outside a building. IEEE 802.1X is useful in network video
applications since network cameras are often located in public spaces where an openly accessible
network jack can pose a security risk. In today’s enterprise networks, IEEE 802.1X is becoming a
basic requirement for anything that is connected to a network.
In a network video system, IEEE 802.1X can work as follows: 1) A network camera that is configured for IEEE 802.1X sends a request for network access to a switch or access point; 2) the switch
or access point forwards the query to an authentication server; for instance, a RADIUS (Remote
Authentication Dial-In User Service) server such as a Microsoft Internet Authentication Service
server; 3) if authentication is successful, the server instructs the switch or access point to open
the port to allow data from the network camera to pass through the switch and be sent over the
network.
Figure 9.5a IEEE 802.1X enables port-based security and involves a supplicant (e.g., a network camera), an authenticator (e.g., a switch) and an authentication server. Step 1: network access is requested; step 2: query forwarded to
an authentication server; step 3: authentication is successful and the switch is instructed to allow the network camera to send data over the network.
9.5.4 HTTPS or SSL/TLS
HTTPS (Hyper Text Transfer Protocol Secure) is a secure communication method that sends
HTTP inside a Secure Sockets Layer (SSL) or Transport Layer Security (TLS) connection. This means
that both the HTTP messages and the data they carry are encrypted.
Many Axis network video products have built-in support for HTTPS, which makes it possible for
video to be securely viewed using a web browser. To enable an Axis network camera or video
encoder to communicate over HTTPS, a digital certificate and an asymmetric key pair must be
installed in the Axis product. The key pair is generated by the Axis product. The certificate can
either be generated and self-signed by the Axis product, or issued by a certificate authority.
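On the client side, the difference between the two certificate options can be sketched with Python's standard ssl module. A self-signed certificate is not in any system trust store, so the client has to be given that certificate explicitly, whereas a certificate issued by a public certificate authority can be checked against the default store. The function name and the idea of a PEM file exported from the camera are assumptions for illustration.

```python
import ssl
from typing import Optional


def make_camera_tls_context(trusted_cert_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client context that always verifies the camera's certificate.

    trusted_cert_file: PEM file holding the camera's self-signed certificate
    (or a private CA certificate). If None, the system trust store is used,
    which suits certificates issued by a public certificate authority.
    """
    context = ssl.create_default_context(cafile=trusted_cert_file)
    # These are already the defaults; they are repeated to make the intent explicit.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


ctx = make_camera_tls_context()  # CA-issued certificate case
print(ctx.verify_mode == ssl.CERT_REQUIRED)
```

The resulting context can be passed to a client call such as urllib.request.urlopen(url, context=ctx).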
With HTTPS, the certificate is used for authentication and encryption. This means that the
certificate allows a web browser to verify the identity of the camera or video encoder, and it
enables the communication to be encrypted using keys that are generated by public-key cryptography.
9.5.5 VPN (Virtual Private Network)
With VPN, a secure “tunnel” between two communicating devices can be created, enabling safe
and secure communication over the Internet. In such a set up, the original packet, including the
data and its header, which may contain information such as the source and destination addresses, the type of information being sent, the packet number in the sequence of packets and
the packet length, is encrypted. The encrypted packet is then encapsulated in another packet
that shows only the IP addresses of the two communicating devices (i.e., routers). This set up
protects the traffic and its contents from unauthorized access, and only devices with the correct “key” will be able to work within the VPN. Network devices between the client and the
server will not be able to access or view the data.
Figure 9.5b The difference between SSL/TLS and VPN is that in SSL/TLS only the actual data of a packet is encrypted.
With VPN, the entire packet can be encrypted and encapsulated to create a secure “tunnel”. Both technologies can
be used in parallel, but it is not recommended since each technology will add overhead and decrease the performance
of the system.
Wireless technologies
For video surveillance applications, wireless technology offers a flexible, cost-efficient
and quick way to deploy cameras, particularly over a large area as in a parking lot or a
city center surveillance application. There would be no need to pull a cable through the
ground. In older, protected buildings, wireless technology may be the only alternative
if standard Ethernet cables cannot be installed.
Axis offers cameras with built-in wireless support. Network cameras without built-in
wireless technology can still be integrated into a wireless network if a wireless bridge
is used.
10.1 802.11 WLAN standards
The most common set of standards for wireless local area networks (WLAN) is IEEE 802.11.
While there are also other standards as well as proprietary technologies, the benefit of 802.11
wireless standards is that they all operate in a license-free spectrum, which means there is no
license fee associated with setting up and operating the network. The most relevant amendments of the standards for Axis products are 802.11b, 802.11g and 802.11n.
802.11b, which was approved in 1999, operates in the 2.4 GHz range and provides data rates up
to 11 Mbit/s. 802.11g, which was approved in 2003, operates in the 2.4 GHz range and provides
data rates of up to 54 Mbit/s. WLAN products are usually 802.11b/g compliant. Most wireless
products today support 802.11n, which was approved in 2009 and which operates in the
2.4 GHz or 5 GHz band. Depending on what features in the standard are implemented, 802.11n
enables a maximum data rate of between 65 Mbit/s and 600 Mbit/s. Data rates, in practice, can
be much lower than the theoretical maximums. The forthcoming IEEE 802.11ac standard, which
will operate in the 5 GHz band, aims for even higher data rates.
When setting up a wireless network, the bandwidth capacity of the access point and the bandwidth requirements of the network devices should be considered. In general, the useful data
throughput supported by a particular WLAN standard is about half the bit rate stipulated by a
standard due to signaling and protocol overhead. With network cameras that support 802.11g,
no more than four to five of such cameras should be connected to a wireless access point.
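The rule of thumb above can be written as a short calculation. The per-camera bit rates used here (5 and 6 Mbit/s) are assumed values chosen for illustration, not figures from this guide; the 0.5 efficiency factor reflects the statement that useful throughput is about half the nominal rate.

```python
import math


def max_cameras(nominal_mbps: float, per_camera_mbps: float, efficiency: float = 0.5) -> int:
    """Estimate how many cameras one access point can carry.

    nominal_mbps: bit rate stipulated by the WLAN standard (54 for 802.11g).
    per_camera_mbps: assumed average stream bit rate of one camera.
    efficiency: share of the nominal rate left after signaling and protocol overhead.
    """
    usable_mbps = nominal_mbps * efficiency
    return math.floor(usable_mbps / per_camera_mbps)


print(max_cameras(54, 6))  # 4
print(max_cameras(54, 5))  # 5
```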
10.2 WLAN security
Due to the nature of wireless communications, anyone with a wireless device that is present
within the area covered by a wireless network can share the network and intercept data being
transferred over it unless the network is secured.
Security technologies such as WEP and WPA/WPA2 have been developed to prevent unauthorized access to the network and to
encrypt the data sent over it.
10.2.1 WEP (Wired Equivalent Privacy)
WEP was designed to prevent people without the correct key from accessing the network. It is,
however, not a recommended security technology due to its weaknesses such as keys that are
relatively short and the ease of reconstructing the keys from a relatively small amount of intercepted traffic.
10.2.2 Wi-Fi Protected Access
Wi-Fi Protected Access (WPA™) and its successor Wi-Fi Protected Access II (WPA2™) are based
on the IEEE 802.11i standard. They significantly increase wireless security by addressing the shortcomings in WEP.
WPA-Personal, also known as WPA-/WPA2–PSK (Pre-shared key), is designed for small networks and does not require an authentication server. With WPA-Personal (WPA-/WPA2-PSK),
Axis’ wireless cameras use a PSK to authenticate with the access point. The key can be entered
either as a 256-bit number, expressed as 64 hexadecimal digits (0 to 9, A to F), or as a passphrase using 8 to 63 ASCII characters. Long passphrases must be used to mitigate weaknesses with this security method.
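The two accepted key formats can be checked with a few lines of code. The sketch below also shows how a passphrase is turned into the 256-bit key: WPA/WPA2-PSK uses PBKDF2 with HMAC-SHA1, 4096 iterations and the network name (SSID) as salt. The function names and the sample SSID are assumptions for the example.

```python
import hashlib
import string


def is_valid_wpa_key(key: str) -> bool:
    """Accept 64 hexadecimal digits, or a passphrase of 8 to 63 printable ASCII characters."""
    if len(key) == 64 and all(c in string.hexdigits for c in key):
        return True
    return 8 <= len(key) <= 63 and all(32 <= ord(c) <= 126 for c in key)


def derive_psk(passphrase: str, ssid: str) -> str:
    """Derive the 256-bit pre-shared key from a passphrase, returned as 64 hex digits."""
    raw = hashlib.pbkdf2_hmac(
        "sha1", passphrase.encode("ascii"), ssid.encode("utf-8"), 4096, 32
    )
    return raw.hex()


psk = derive_psk("correct horse battery staple", "example-ssid")  # hypothetical SSID
print(len(psk), is_valid_wpa_key(psk))
```

Because the SSID is the only salt, a short or common passphrase can be attacked offline with precomputed tables, which is why long passphrases matter.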
Meanwhile, WPA-/WPA2-Enterprise is designed for large networks and requires an authentication server with the use of IEEE 802.1X. See Chapter 9 for more on IEEE 802.1X.
To simplify the process of configuring WLAN and connecting to an access point, some Axis
wireless cameras support a WLAN pairing mechanism that is compatible with Wi-Fi Protected
Setup™ push-button configuration. It involves a WLAN pairing button on the camera and an
access point with a push button configuration (PBC) button. When the buttons on both the
camera and the access point are pressed within a 120 second time frame, the devices will automatically discover each other and agree on a configuration. The WLAN pairing function
should be disabled once the camera is installed to prevent someone with physical access to the
camera from connecting the camera to a rogue access point.
Figure 10.2a Some Axis wireless cameras support a WLAN pairing mechanism that is compatible with Wi-Fi Protected
Setup™ protocol, which simplifies the process of configuring security on wireless networks.
Some security guidelines when using wireless cameras for surveillance:
> Enable the user/password login in the cameras.
> Use WPA/WPA2 and a passphrase with at least 20 random characters in a mixed
combination of lower and uppercase letters, special characters and numbers.
> Enable the encryption (HTTPS) in the wireless router/cameras. This should be done before the
keys or credentials are set for the WLAN to prevent anyone from seeing the keys as they are
sent to/configured in the camera.
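A passphrase that meets the second guideline can be generated with Python's secrets module, which draws from a cryptographically strong random source. This is a minimal sketch; the function name and the choice of character set are assumptions.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_passphrase(length: int = 20) -> str:
    """Return a random passphrase with lower case, upper case, digits and special characters."""
    if not 20 <= length <= 63:
        raise ValueError("length must be between 20 and 63 characters")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every character class is represented at least once.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate


passphrase = generate_passphrase()
print(len(passphrase))
```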
10.3 Wireless bridges
Some solutions may use standards other than the dominant IEEE 802.11, providing increased
performance and much longer distances in combination with very high security. Two commonly used technologies are microwave and laser, which can be used to connect buildings or
sites with a point-to-point high-speed data link.
10.4 Wireless mesh network
A wireless mesh network is a common solution for city center video surveillance applications
where hundreds of cameras, together with mesh routers and gateways, may be involved. Such
a network is characterized by several connection nodes that serve to receive, send as well as
relay data, providing individual and redundant connection paths between one another. Keeping
the latency down is important in applications such as live video and particularly in cases where
PTZ cameras are used.
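The idea of redundant connection paths can be illustrated with a breadth-first search over a small mesh. The topology and node names below are hypothetical; a real mesh routing protocol would also weigh link quality and latency, which this sketch ignores.

```python
from collections import deque


def find_path(links, source, target, failed=frozenset()):
    """Return the path with the fewest hops from source to target, skipping failed nodes."""
    if source in failed or target in failed:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in links.get(node, ()):
            if neighbor not in visited and neighbor not in failed:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical mesh: the camera can reach the gateway through two separate routes.
MESH = {
    "camera": ["node_a", "node_b"],
    "node_a": ["camera", "gateway"],
    "node_b": ["camera", "node_c"],
    "node_c": ["node_b", "gateway"],
    "gateway": ["node_a", "node_c"],
}

print(find_path(MESH, "camera", "gateway"))
print(find_path(MESH, "camera", "gateway", failed={"node_a"}))
```

With node_a out of service the video still reaches the gateway, but over one more hop, which is where the latency concern mentioned above comes in.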
Video management systems
An important aspect of a video surveillance system is managing video for live viewing,
recording, playback and storage, in addition to managing the network video products.
If the system consists of only one or a few cameras, viewing and some basic video
recording can be managed via the built-in web pages of the network cameras and
video encoders. When the system consists of more than a few cameras, using a network video management system—as well as the products’ built-in web pages in some
cases—is recommended.
Today, several hundred different video management systems are available, based on
different hardware and software platforms that cover different operating systems
(Windows, UNIX, Linux and Mac OS), market segments and languages.
Axis offers decentralized and centralized solutions for Windows, with support for different languages and remote access to live viewing and recording using a laptop, an
iPhone/iPad or an Android-based smartphone with Internet access. Furthermore, the
company’s network of Application Development Partners offers solutions for any system type, size or complexity. The sections below provide a description of Axis’ video
management solutions, system features, as well as integration possibilities with other
systems such as point of sale or building management.
11.1 Types of video management solutions
Video management solutions involve a combination of hardware and software platforms that may
be set up in different ways. Recording, for example, can be decentralized at many camera locations, hosted by a service provider, or centralized at one location. PC-based solutions offer flexibility and maximum performance for the specific design of the system, with the ability to add functionality, such
as increased or external storage, firewalls, virus protection and intelligent video applications.
Solutions are often tailored to the number of cameras supported. For smaller systems with less
demanding video management requirements, solutions with limited functionality are ideal. The
scalability of most video management software, in terms of the number of cameras and frames
per second that can be supported, is in most cases limited by the hardware capacity rather than
the software. Storing video files puts strain on the storage hardware because it may be required to operate on a continual basis, as opposed to only during normal business hours. In
addition, video by nature generates large amounts of data, which places high demands on the
storage solution. For more on servers and storage, see Chapter 12.
11.1.1 Decentralized solution for small systems - AXIS Camera Companion
For end users who want a simple solution for viewing and recording video even in HDTV, Axis
provides AXIS Camera Companion. It supports one to 16 cameras per site—ideal for retail
stores, offices and hotels. It is a decentralized video management solution that enables recordings to be stored on an SD/SDHC/SDXC memory card in an Axis camera or video encoder. It enables live viewing, playback of recordings, video exporting and recording settings to be made
remotely from any location with Internet access. AXIS Camera Companion allows end users
with a few site installations to access each site individually.
Figure 11.1a AXIS Camera Companion’s live view involving four cameras (at left); playback view with recording timeline (at right).
The free AXIS Camera Companion software client only needs to be used at installation for
configuring and uploading settings in the network video products. Once the network video
products are configured, the products operate independently without the need for a central PC
server or DVR. Since recordings take place locally in the video products without the use of any
network, network failure would not disrupt any recording. Network bandwidth would only be
used when live viewing or playback is required.
Using the default settings of motion-based recording, HDTV 720p resolution and 15 frames per
second, a 64 GB SDXC card can record more than a month of video.
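How long a card lasts depends on the stream bit rate and on how much of the day motion triggers a recording. The estimate below is a sketch: the 1 Mbit/s average bit rate and the 15 percent recording share are assumed values chosen to illustrate the calculation, not Axis figures.

```python
def recording_days(card_gb: float, bitrate_mbps: float, recording_fraction: float) -> float:
    """Estimate the days of motion-based recording that fit on a memory card.

    card_gb: card capacity in gigabytes (decimal, 10**9 bytes).
    bitrate_mbps: assumed average bit rate while recording, in Mbit/s.
    recording_fraction: assumed share of the day during which motion triggers recording.
    """
    card_bits = card_gb * 8 * 1000**3
    seconds_of_video = card_bits / (bitrate_mbps * 1_000_000)
    return seconds_of_video / recording_fraction / 86_400


days = recording_days(64, 1.0, 0.15)
print(round(days, 1))  # 39.5
```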
Figure 11.1b At left, an AXIS Camera Companion setup involving cameras with memory cards, PoE switch, router (for
wireless and for Internet access), laptop and smartphone. At right, viewing on a smartphone.
11.1.2 Hosted video solution for businesses with many small sites
Hosted video offers a hassle-free monitoring solution over the Internet for end users. It usually
involves a subscription to a monitoring service provider, such as a security integrator or an
alarm monitoring center that also provides services such as guards, and supports other business areas such as cash protection.
With Axis’ hosted video solution, end users’ investments are limited to the Axis camera or
video encoder and an Internet connection. There is no need to maintain the recording and
monitoring station locally. Using a web browser on a computer or smartphone, an authorized
user can connect to a service portal on the Internet to access live or recorded video. The service
is enabled by a network of hosting providers that uses AXIS Video Hosting System (AVHS)
software, which makes it easy for security integrators and alarm monitoring centers to offer
video monitoring services over the Internet. The solution is suitable for systems with a limited
number of cameras per site in single or multiple locations and is ideal for retailers such as convenience stores, gas stations, banks and small offices.
Figure 11.1c An AXIS Video Hosting System setup with video recording saved off-site. End customers access live
view and recordings by logging in to the service provider’s portal.
11.1.3 Centralized, general client-server solution for medium-sized systems –
AXIS Camera Station
AXIS Camera Station offers advanced video management functionalities, providing a complete
monitoring and recording system for up to 100 cameras per server. The software is ideal for
retail shops, hotels and schools with more than 10 cameras and a locally connected, standard
PC for running the software. It offers easy installation and setup with automatic camera discovery, a powerful Configuration Wizard and efficient management of Axis network video
products. Details about supported system features are described in Section 11.2.
As Windows client-server software, AXIS Camera Station is a centralized solution that
requires the video management software to run continuously on an on-site computer for management and recording functions. Recordings are made on the local network, either on the
same computer where the AXIS Camera Station software is installed or on separate storage devices.
Client software is provided and can be installed on any computer for viewing, playback and
administration functions, which can be done either on-site or remotely via the Internet. As
multi-site functionality is supported, the client enables users to access cameras that are supported by different AXIS Camera Station servers. This makes it possible to manage video at
many remote sites or in a large system.
AXIS Camera Station offers an open API (Application Programming Interface) for integration
with other systems such as point of sale, access control, tracking (e.g., radio-frequency identification), building management and industrial control. When video is integrated, information
from other systems can be used to trigger functions such as event-based recordings in the
network video system, and vice versa. In addition, users can benefit from having a common
interface for managing different systems.
Figure 11.1d A network video surveillance system based on an open, PC server platform with AXIS Camera Station
video management software.
11.1.4 Customized solutions for small to big systems from Axis’ partners
Axis works with more than 800 Application Development Partners globally to ensure tightly
integrated software solutions that support Axis network video products. The partners provide
a range of customized software solutions. Such solutions may offer optimized features and
advanced functionalities, tailored features for a specific industry segment or country-focused
solutions. There are also solutions that support more than 1000 cameras and multiple brands
of network video products. To find compatible applications, see
11.2 System features
A video management system can support many different features. Some of the more common
ones are listed below:
> Simultaneous viewing of video from multiple cameras
> Recording of video and audio
> Event management functions including intelligent video such as video motion detection
> Camera administration and management
> Search options and playback
> User access control and activity (audit) logging
A key function of a video management system is enabling live and recorded video to be viewed
in efficient and user-friendly ways. Most video management software applications enable
multiple users to view in different modes such as split view (to view different cameras at the
same time), full screen or camera sequence (where views from different cameras are displayed
automatically, one after the other).
Figure 11.2a AXIS Camera Station’s live view screen.
Software such as AXIS Camera Station supports Axis network video products’ multi-streaming
capability. Multiple video streams from a network camera or video encoder can be individually
configured with different frame rates, compression formats and resolutions, and sent to different recipients simultaneously. This capability optimizes the use of network bandwidth.
(Diagram labels: an analog camera connected through a video encoder; local recording/viewing at full frame rate and high resolution; remote recording/viewing at medium frame rate and medium resolution; viewing with a mobile telephone at medium frame rate and low resolution.)
Figure 11.2b Multiple, individually configurable video streams enable different frame rate video and resolution to
be sent to different recipients.
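One way to picture multi-streaming is as a set of stream profiles, one per recipient. The sketch below is a plain data model, not an Axis API; the profile names, resolutions and frame rates are assumed values that mirror the local, remote and mobile cases in the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamProfile:
    recipient: str
    compression: str
    resolution: str
    frames_per_second: int


# Hypothetical profiles for one camera, each tuned to its recipient.
PROFILES = [
    StreamProfile("local_recording", "H.264", "1280x720", 30),
    StreamProfile("remote_viewing", "H.264", "640x360", 15),
    StreamProfile("mobile_viewing", "Motion JPEG", "320x180", 15),
]


def profile_for(recipient: str) -> StreamProfile:
    """Look up the stream settings configured for a recipient."""
    for profile in PROFILES:
        if profile.recipient == recipient:
            return profile
    raise KeyError(recipient)


print(profile_for("mobile_viewing"))
```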
11.2.3 Video recording
With video management software such as AXIS Camera Station, video can be recorded manually, continuously and on trigger (by an event/alarm). Continuous and triggered recordings can
be scheduled to run at selected times during each day of the week.
Continuous recording normally uses more disk space than an event-triggered recording. An
event-triggered recording may be activated by, for example, video motion detection or external
inputs through a camera’s or video encoder’s input port. With scheduled recordings, timetables
for both continuous and event-triggered recordings can be set.
Figure 11.2c Scheduled recording settings with a combination of continuous and event-triggered recordings applied using AXIS Camera Station video management software.
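A weekly recording timetable can be modeled as a lookup from weekday and time of day to a recording mode. The timetable below is hypothetical: continuous recording during weekday business hours, event-triggered recording in the evening and at weekends, and no recording otherwise.

```python
from datetime import datetime, time

# Hypothetical timetable: weekday (0 = Monday) -> list of (start, end, mode).
SCHEDULE = {
    day: [
        (time(8, 0), time(18, 0), "continuous"),
        (time(18, 0), time(23, 59, 59), "event"),
    ]
    for day in range(5)
}
SCHEDULE[5] = SCHEDULE[6] = [(time(0, 0), time(23, 59, 59), "event")]


def recording_mode(moment: datetime) -> str:
    """Return 'continuous', 'event' or 'off' for the given moment."""
    for start, end, mode in SCHEDULE.get(moment.weekday(), []):
        if start <= moment.time() < end:
            return mode
    return "off"


print(recording_mode(datetime(2024, 1, 1, 9, 0)))   # Monday morning
print(recording_mode(datetime(2024, 1, 6, 3, 0)))   # Saturday night
```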
The quality of the recordings can be determined by selecting the video format (e.g., H.264,
MPEG-4, Motion JPEG), resolution, compression level and frame rate. These parameters will
affect the amount of bandwidth used, as well as the amount of storage space required.
Network video products may have varying frame rate capabilities depending on the resolution.
Recording and/or viewing at full frame rate (considered as 25 frames per second in 50 Hz and
30 frames per second in 60 Hz) on all cameras at all times is more than what is required for
most applications. Frame rates under normal conditions can be set lower—for example, one to
four frames per second—to dramatically decrease storage requirements. In the event of an
alarm—for instance, if video motion detection or an external sensor is triggered—a separate
stream with a higher recording frame rate can be sent.
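For Motion JPEG, where every frame is a complete image, the effect of frame rate on storage is a simple product of image size, frame rate and recording time. The 13 KB image size below is an assumed value for illustration; H.264 and MPEG-4 do not scale this linearly because they encode differences between frames.

```python
def mjpeg_storage_gb(frame_kb: float, fps: float, hours: float) -> float:
    """Storage in GB (decimal) for Motion JPEG at a given image size and frame rate."""
    return frame_kb * fps * 3600 * hours / 1_000_000


normal = mjpeg_storage_gb(13, 4, 24)    # 4 frames per second around the clock
full = mjpeg_storage_gb(13, 25, 24)     # full frame rate in a 50 Hz region
print(round(normal, 2), round(full, 2), round(full / normal, 2))
```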
11.2.4 Recording and storage
Most video management software uses the standard Windows file system for storage, so any
system drive or network share can be used for storing video. A video management software
program may enable more than one level of storage; for instance, recordings are made on a
primary hard drive (the local hard disk) and archiving takes place on local disks, network-attached drives or remote hard drives. Users may be able to specify how long images should remain on the primary hard drive before they are automatically deleted or moved to the archive
drive. Users may also be able to prevent event-triggered video from being deleted automatically by specially marking or locking it in the system.
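The two-level storage policy can be sketched as a decision function. The seven-day retention period and the field names are assumptions; actual products expose these settings through their own configuration pages.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    name: str
    age_days: int
    locked: bool = False  # set when a user marks the recording to protect it


def retention_action(rec: Recording, keep_primary_days: int = 7, archive: bool = True) -> str:
    """Decide what happens to a recording on the primary drive."""
    if rec.locked:
        return "keep"  # locked recordings are never removed automatically
    if rec.age_days <= keep_primary_days:
        return "keep"
    return "archive" if archive else "delete"


print(retention_action(Recording("lobby-0412", age_days=3)))
print(retention_action(Recording("lobby-0401", age_days=14)))
print(retention_action(Recording("incident-17", age_days=90, locked=True)))
```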
11.2.5 Event management and intelligent video
Event management is about identifying or creating an event that is triggered by inputs, whether from built-in features in the network video products or from other systems such as point-of-sale terminals or intelligent video software. The network video surveillance system can then be
configured to automatically respond to the event by, for example, recording video, sending alert
notifications and activating different devices such as doors and lights.
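The link between an event and its configured responses can be pictured as a rule table. In this sketch the event types, the response functions and the log are all hypothetical stand-ins; the responses only record what they would do.

```python
from typing import Callable, Dict, List

action_log: List[str] = []


def start_recording(source: str) -> None:
    action_log.append(f"record:{source}")


def send_alert(source: str) -> None:
    action_log.append(f"alert:{source}")


def switch_on_lights(source: str) -> None:
    action_log.append(f"lights:{source}")


# Hypothetical rules: event type -> responses to run, in order.
RULES: Dict[str, List[Callable[[str], None]]] = {
    "motion": [start_recording, send_alert],
    "door_contact": [start_recording, switch_on_lights],
}


def handle_event(event_type: str, source: str) -> int:
    """Run every response configured for the event; return how many ran."""
    actions = RULES.get(event_type, [])
    for action in actions:
        action(source)
    return len(actions)


handle_event("motion", "camera-3")
handle_event("door_contact", "camera-1")
print(action_log)
```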
Event management and intelligent video functionalities can work together to enable a video
surveillance system to more efficiently use network bandwidth and storage space. Live camera
monitoring is not required all the time since alert notifications to operators can be sent when
an event occurs. All configured responses can be activated automatically, improving response
times. Event management helps operators cover more cameras.
Both event management and intelligent video functionalities can be built in and run in
a network video product or in a video management software program. They can also be handled
by both in the sense that a video management software program can take advantage of an
intelligent video functionality that is built into a network video product. For instance, the intelligent video functionality, such as video motion detection and camera tampering, can be performed by the network video product and flagged to the management software program for
further actions to be taken. This process offers a number of benefits:
> It enables a more efficient use of bandwidth and storage space since there is no need for a
camera to continuously send video to a video management server for analysis of any
potential events. Analysis takes place at the network video product and video streams are
sent for recording and/or viewing only when an event occurs.
> It does not require the video management server to have a fast processing capability,
thereby providing some cost-savings. Conducting intelligent video algorithms is CPU
(central processing unit) intensive.
> Scalability can be achieved. If a server were to perform intelligent video algorithms, only a
few cameras could be managed at any given time. Having the intelligent functionality “at the
edge”, that is, in the network camera or video encoder, enables a fast response time and a
very large number of cameras to be managed proactively.
Figure 11.2d Event management and intelligent video enable a surveillance system to be constantly on guard in
analyzing inputs to detect an event. Once an event is detected, the system can automatically respond with actions
such as video recording and sending alerts.
Event triggers
An event can be scheduled or triggered. Events can be triggered by, for example:
> Input port(s): The input port(s) on a network camera or video encoder can be connected to
external devices such as a motion sensor, a PIR (passive infrared) detector that detects motion
based on heat emission, a door contact or a glass break detector (which detects changes in air
pressure). The range of devices that can be connected to a network video product’s input port is
almost infinite. The basic rule is that any device that can toggle between an open and closed
circuit can be connected to a network camera or a video encoder.
> Manual trigger: An operator can make use of buttons to manually trigger an event.
> Video motion detection: When a camera detects certain movement in its motion
detection window, an event can be triggered. Video motion detection (VMD) defines an
activity in a scene by analyzing image data and differences in a series of images. With VMD,
motion can be detected in any part of a camera’s view. Users can configure an “included”
window (a specific area in a camera’s view where motion is to be detected), and an
“excluded” window (an area within an “included” window that should be ignored).
Figure 11.2e Setting video motion detection in AXIS Camera Station video management software.
> Tampering: This feature, which allows a camera to detect when it has been intentionally
covered, moved or is no longer in focus, can be used to trigger an event.
> Audio detection: This enables a camera with built-in audio support to trigger an event if it
detects audio below or above a certain threshold. For more on audio detection, see Chapter 8.
> Failover recording: This means that images can be temporarily stored on a memory card in
a network camera/video encoder in case of network failure. When the network connection is
restored and the system returns to normal operation, the video management system can
retrieve and merge local video recordings seamlessly. This ensures that the user gets
uninterrupted video recordings. The functionality provides increased system reliability and
safeguards system operation.
> Temperature: If the temperature rises or falls outside of the operating range of a camera, an
event can be triggered.
Applications that are compatible with the AXIS Camera Application Platform can also be used
as triggers. See Chapter 2 for more information about AXIS Camera Application Platform.
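The video motion detection trigger described above, with its "included" and "excluded" windows, can be sketched as simple frame differencing. This is a simplified illustration, not the algorithm used in Axis products; frames are small grids of grayscale values, and the thresholds are assumed.

```python
def in_rect(x, y, rect):
    """rect is (left, top, right, bottom), with right and bottom exclusive."""
    left, top, right, bottom = rect
    return left <= x < right and top <= y < bottom


def motion_detected(prev, curr, include, exclude=None, pixel_delta=30, min_changed=3):
    """Compare two frames and report motion inside the include window.

    Pixels inside the exclude window are ignored. Motion is reported when at
    least min_changed pixels differ by more than pixel_delta.
    """
    changed = 0
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_prev, row_curr)):
            if not in_rect(x, y, include):
                continue
            if exclude is not None and in_rect(x, y, exclude):
                continue
            if abs(a - b) > pixel_delta:
                changed += 1
    return changed >= min_changed


# Two 8x8 frames; the second has a bright 2x2 object at columns 2-3, rows 2-3.
previous = [[0] * 8 for _ in range(8)]
current = [row[:] for row in previous]
for y in (2, 3):
    for x in (2, 3):
        current[y][x] = 200

print(motion_detected(previous, current, include=(0, 0, 8, 8)))
print(motion_detected(previous, current, include=(0, 0, 8, 8), exclude=(2, 2, 4, 4)))
```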
Network video products or a video management software program can be configured to respond to events all the time or at certain set times. When an event is triggered, some of the
common responses that can be configured include the following:
> Upload images or recording of video streams to specified location(s) with a specified
compression format and at a certain frame rate.
> Activate output port: The output port(s) on a network camera or video encoder can be
connected to external devices such as alarms and door relay for controlling the locking/
unlocking of doors.
> Send e-mail notification: This notifies users that an event has occurred. An image can also
be attached in the e-mail.
> Send HTTP/TCP notification: This is an alert to a video management system, which can then,
for example, initiate recordings.
> Go to a PTZ preset: This feature may be available with PTZ cameras. The camera can be
directed to point to a specified position such as a window when an event takes place, or to start
a guard tour or autotracking.
> Send an SMS (Short Message Service) with text information about the alarm or an MMS
(Multimedia Messaging Service) with an image showing the event.
> Activate an audio alert on the video management system.
> Enable on-screen pop-up, showing views from a camera where an event has been activated.
> Show procedures that the operator should follow.
In addition, pre-alarm and post-alarm image buffers can be set, enabling a network video product to send a set length and frame rate of video captured before and after an event is triggered.
This can be beneficial in helping to provide a more complete picture of an event.
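A pre-alarm buffer is typically a fixed-length queue of the most recent frames, which is flushed into the recording when an event arrives. The sketch below shows the idea with a deque; the class name and the use of integers as stand-ins for frames are assumptions.

```python
from collections import deque


class AlarmBuffer:
    """Keeps the last pre_frames frames and records post_frames after an event."""

    def __init__(self, pre_frames: int, post_frames: int):
        self._pre = deque(maxlen=pre_frames)  # oldest frames drop off automatically
        self._post_frames = post_frames
        self._remaining = 0
        self.clip = []

    def add_frame(self, frame, event: bool = False) -> None:
        if event:
            # Flush the buffered pre-alarm frames, then the triggering frame.
            self.clip.extend(self._pre)
            self._pre.clear()
            self.clip.append(frame)
            self._remaining = self._post_frames
        elif self._remaining > 0:
            self.clip.append(frame)
            self._remaining -= 1
        else:
            self._pre.append(frame)


buffer = AlarmBuffer(pre_frames=3, post_frames=2)
for number in range(10):
    buffer.add_frame(number, event=(number == 5))
print(buffer.clip)  # [2, 3, 4, 5, 6, 7]
```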
11.2.6 Administration and management features
All video management software applications provide the ability to add and configure basic
camera settings, frame rate, resolution and compression format, but some also include more
advanced functionalities, such as camera discovery and complete device management. The
larger a video surveillance system becomes, the more important it is to be able to efficiently
manage networked devices.
Software programs that help simplify the management of network cameras and video encoders
in an installation often provide the following functionalities:
> Locating and showing the connection status of video devices on the network
> Setting IP addresses
> Configuring single or multiple units
> Managing firmware upgrades of multiple units
> Managing user access rights
> Providing a configuration sheet, which enables users to obtain, in one place, an overview of
all camera and recording configurations
Figure 11.2f AXIS Camera Management software makes it easy to find, install and configure network video products.
An important part of video management is security. A network video product or video management software should enable the following possibilities:
> Define/set authorized users
> Set passwords and have the ability to encrypt passwords
> Define/set different user-access levels, for example:
- Administrator: access to all functionalities (In the AXIS Camera Station software, for
instance, an administrator can select which cameras and functionalities a user may
have access to.)
- Operator: access to all functionalities except for certain configuration pages
- Viewer: access only to live video from selected cameras
> Support IEEE 802.1X to prevent unauthorized network access. See Chapter 9 for more on
IEEE 802.1X and network security.
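The three access levels listed above can be expressed as a permission check. The action names and the per-viewer camera selection are assumptions made for the sketch; real products define their own permission sets.

```python
from typing import Optional, Set

# Hypothetical actions granted to each access level.
ROLE_ACTIONS = {
    "administrator": {"live", "playback", "export", "configure"},
    "operator": {"live", "playback", "export"},  # no configuration pages
    "viewer": {"live"},
}


def is_allowed(role: str, action: str, camera: str,
               selected_cameras: Optional[Set[str]] = None) -> bool:
    """Check whether a role may perform an action on a camera."""
    if action not in ROLE_ACTIONS.get(role, set()):
        return False
    if role == "viewer":
        # Viewers see live video only from the cameras selected for them.
        return selected_cameras is not None and camera in selected_cameras
    return True


print(is_allowed("operator", "configure", "cam-1"))
print(is_allowed("viewer", "live", "cam-2", selected_cameras={"cam-2"}))
```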
11.3 Integrated systems
When video is integrated with other systems such as point-of-sale and building management,
information from other systems can be used to trigger functions such as event-based recordings in the network video system, and vice versa. In addition, users can benefit from having a
common interface for managing different systems.
11.3.1 Point of Sale
The introduction of network video in retail environments has made the integration of video
with point-of-sale (POS) systems easier.
The integration enables all cash register transactions to be linked to actual video of the transactions. It helps catch and prevent fraud and theft from employees and customers. POS exceptions such as returns, manually entered values, line corrections, transaction cancellations, coworker purchases, discounts, specially tagged items, exchanges and refunds can be visually
verified with the captured video. A POS system with integrated video surveillance makes it
easier to find and verify suspicious activities.
Event-based recordings can be applied. For instance, a POS transaction or exception, or the
opening of a cash register drawer, can be used to trigger a camera to record and tag the recording. The scene prior to and following an event can be captured using pre- and post-event recording buffers. Event-based recordings increase the quality of the recorded material, as well
as reduce storage requirements and the amount of time needed to search for incidents.
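Linking POS exceptions to video amounts to turning each flagged transaction into a time window around its timestamp. In this sketch the transaction fields, the set of exception types and the buffer lengths are assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical subset of the POS exceptions that should be verified on video.
EXCEPTION_TYPES = {"return", "refund", "cancellation", "manual_entry"}


def clips_for_exceptions(transactions,
                         pre=timedelta(seconds=10),
                         post=timedelta(seconds=20)):
    """Return one tagged clip window per exception transaction."""
    clips = []
    for transaction in transactions:
        if transaction["type"] in EXCEPTION_TYPES:
            clips.append({
                "receipt": transaction["receipt"],
                "start": transaction["time"] - pre,   # pre-event buffer
                "end": transaction["time"] + post,    # post-event buffer
            })
    return clips


transactions = [
    {"receipt": "R-1001", "type": "sale", "time": datetime(2024, 1, 1, 10, 0, 0)},
    {"receipt": "R-1002", "type": "refund", "time": datetime(2024, 1, 1, 10, 5, 0)},
]
clips = clips_for_exceptions(transactions)
print(clips)
```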
Figure 11.3a An example of a POS system integrated with video surveillance. This screenshot displays the receipt
together with video clips of the event. Picture courtesy of Milestone Systems.
11.3.2 Access control
Integrating a video management system with a facility’s access control system allows for facility and room access to be logged with video. For example, video can be captured at all doors
when someone enters or exits a facility. This allows for visual verification when exceptional
events occur. In addition, identification of tailgating events can also be made. Tailgating occurs
when, for instance, the person who swipes his/her access card knowingly or unknowingly enables others to gain entry without having to swipe a card.
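Tailgating can be flagged by comparing card swipes from the access control system with entries counted on video. The sketch below assumes both systems deliver timestamps in seconds and that one swipe authorizes one entry within a short window; the five-second window is an assumed value.

```python
def tailgating_events(swipes, entries, window_s=5.0):
    """Return the entry times that have no matching card swipe.

    swipes: card swipe timestamps from the access control system.
    entries: timestamps of people passing the door, as detected on video.
    """
    unused = sorted(swipes)
    flagged = []
    for entry in sorted(entries):
        # An entry is authorized by an unused swipe made shortly before it.
        match = next((s for s in unused if 0 <= entry - s <= window_s), None)
        if match is None:
            flagged.append(entry)
        else:
            unused.remove(match)
    return flagged


flagged = tailgating_events(swipes=[10.0, 60.0], entries=[11.5, 13.0, 61.0])
print(flagged)  # [13.0]
```

The flagged timestamps can then be used to pull the corresponding video for visual verification.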
11.3.3 Building management
Video can be integrated into a building management system (BMS) that controls a number of
systems ranging from heating, ventilation and air conditioning (HVAC) to security, safety, energy and fire alarm systems. The following are some application examples:
> An equipment failure alarm can trigger a camera to show video to an operator, in addition
to activating alarms at the BMS.
> A fire alarm system can trigger a camera to monitor exit doors and begin recording for
security purposes. This makes it possible for first responders and building managers to assess
the situation at all emergency exits in real time and focus their efforts where they are
needed the most.
> Intelligent video can be used to detect people flowing back into a building through a door
left open or unsecured during events such as evacuations.
> Automatic video alerts can be sent when someone enters a restricted area or room.
> Information from the video motion detection functionality of a camera that is located in a
meeting room can be used with lighting and heating systems to turn the light and heat off
once the room is vacated, thereby saving energy.
11.3.4 Industrial control systems
Remote visual verification is often beneficial, and sometimes required, in complex industrial automation
systems. By having access to network video using the same interface as for monitoring a process, an operator does not have to leave the control panel to visually check on part of a process.
In addition, when an operation malfunctions, the network camera can be triggered to send
images. In some sensitive clean-room processes, or in facilities with dangerous chemicals,
video surveillance is the only way to have visual access to a process. The same goes for electrical grid systems with substations in remote locations.
Tracking systems that involve RFID (radio-frequency identification) or similar methods are used
in many applications to keep track of items. For example, tagged items in a store can be tracked
together with video footage to prevent theft or provide evidence. Another example is luggage
handling at airports whereby RFID can be used to track the luggage and direct it to the correct
destination. If it is integrated with video surveillance, there is visual evidence when luggage is
lost or damaged, and search routines can be optimized.
bandwidth and storage considerations - CHAPTER 12 127
Bandwidth and storage considerations
Network bandwidth and storage requirements are important considerations when
designing a video surveillance system. The factors include the number of cameras,
the image resolution used, the compression type and ratio, frame rates and scene
complexity. This chapter provides some guidelines on designing a system, along with
information on storage solutions and various system configurations.
12.1 Bandwidth and storage calculations
Network video products utilize network bandwidth and storage space based on their configuration. As mentioned earlier, this depends on the following:
> Number of cameras
> Continuous or event-triggered recording
> Edge recording in the camera/video encoder, server-based recording or a combination
> Number of hours per day the camera will be recording
> Frames per second
> Image resolution
> Video compression type: H.264, MPEG-4, Motion JPEG
> Scenery: Image complexity (e.g., gray wall or a forest), lighting conditions and amount of
motion (e.g., office environment or crowded train stations)
> How long data must be stored
12.1.1 Bandwidth needs
In a small surveillance system involving fewer than 10 cameras, a basic 100-megabit (Mbit)
network switch can be used without having to consider bandwidth limitations. Most companies
can implement a surveillance system of this size using their existing network. When implementing 10 or more cameras, the network load can be estimated using a few rules of thumb:
> A camera that is configured to deliver high-quality images at high frame rates will use
approx. 2 to 3 Mbit/s of the available network bandwidth.
> With more than 12 to 15 cameras, consider using a switch with a gigabit backbone. If a
gigabit-supporting switch is used, the server that runs the video management software
should have a gigabit network adapter installed.
Technologies that enable the management of bandwidth consumption include the use of VLANs
on a switched network, Quality of Service and event-triggered recordings. For more on these
topics, see chapters 9 and 11.
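As a rough illustration, the rules of thumb in this section can be turned into a small Python estimate. This is only a sketch: the function names, the default of 3 Mbit/s per camera (the upper end of the range quoted above) and the 12-camera threshold (the lower end of the 12-to-15 range) are assumptions chosen for the example. Real loads depend on resolution, compression and scene content.

```python
def estimate_network_load_mbit(num_cameras, mbit_per_camera=3.0):
    """Estimate aggregate camera traffic in Mbit/s.

    mbit_per_camera defaults to 3.0, the upper end of the 2 to 3 Mbit/s
    rule of thumb for high-quality, high-frame-rate streams (an assumption).
    """
    return num_cameras * mbit_per_camera


def should_consider_gigabit(num_cameras, camera_threshold=12):
    """Return True when the camera count exceeds the rule-of-thumb threshold."""
    return num_cameras > camera_threshold


if __name__ == "__main__":
    for cameras in (8, 16, 40):
        load = estimate_network_load_mbit(cameras)
        advice = ("gigabit backbone suggested"
                  if should_consider_gigabit(cameras)
                  else "no gigabit backbone suggested by this rule")
        print(f"{cameras} cameras: about {load:.0f} Mbit/s, {advice}")
```

Using the upper end of the per-camera range and the lower end of the camera-count range keeps the estimate on the cautious side.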
12.1.2 Calculating storage needs
One of the factors affecting storage requirements is the type of video compression used. The
H.264 compression format is by far the most efficient video compression technique available
today. Without compromising image quality, an H.264 encoder can reduce the size of a digital
video file by more than 80% compared with the Motion JPEG format. This means much less
network bandwidth and storage space are required for an H.264 video file.
Sample storage calculations for the two compression formats, H.264 and Motion JPEG, are
provided in the tables below. Because of a number of variables that affect average bit rate
levels, calculations are not so clear-cut for H.264. With Motion JPEG, there is a clear formula
because Motion JPEG consists of one individual file for each image. Storage requirements for
Motion JPEG recordings vary depending on the frame rate, resolution and level of compression.
H.264 calculation:
Approx. bit rate (Kbit/s) / 8 (bits in a byte) x 3600 s = KB per hour
KB per hour / 1000 = MB per hour
MB per hour x hours of operation per day / 1000 = GB per day
GB per day x requested period of storage (days) = Storage need (GB)
[Table: resolution (HDTV 720p, HDTV 1080p), frames per second, bit rate, hours of operation. Cell values not recoverable.]
Table 12.1a The figures above are based on continuous recording with lots of motion in a scene, e.g., at a station. With
fewer changes in a scene, the figures can be 20% lower. The amount of motion in a scene can have a big impact on the
amount of storage required.
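The H.264 calculation above can be sketched in a few lines of Python. The function name and the sample inputs are hypothetical, and the bit rate is assumed to be an average in Kbit/s, which is what the "KB per hour" step of the formula implies.

```python
def h264_storage_gb(bit_rate_kbit_s, hours_per_day, days_of_storage):
    """Apply the H.264 storage formula step by step (decimal units)."""
    kb_per_hour = bit_rate_kbit_s / 8 * 3600      # Kbit/s -> KB per hour
    mb_per_hour = kb_per_hour / 1000              # KB -> MB
    gb_per_day = mb_per_hour * hours_per_day / 1000
    return gb_per_day * days_of_storage


if __name__ == "__main__":
    # Hypothetical camera: 1000 Kbit/s average, recording 24 h/day for 30 days.
    print(round(h264_storage_gb(1000, 24, 30), 1), "GB")
```

For the hypothetical inputs shown, the steps are 450 MB per hour, 10.8 GB per day and about 324 GB for 30 days. Because the average bit rate varies with motion in the scene, the result is an estimate rather than a guarantee.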
Motion JPEG calculation:
Image size (KB) x frames per second x 3600 s = Kilobyte (KB) per hour
KB per hour / 1000 = Megabyte (MB) per hour
MB per hour x hours of operation per day / 1000 = Gigabyte (GB) per day
GB per day x requested period of storage (days) = Storage need (GB)
[Table: resolution (HDTV 720p, HDTV 1080p), frames per second, bit rate, hours of operation. Cell values not recoverable.]
Table 12.1c The figures above are based on continuous recording with lots of motion in a scene, e.g. at a station. With
fewer changes in a scene, the figures can be 20% lower. The amount of motion in a scene can have a big impact on the
amount of storage required.
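The Motion JPEG calculation can be sketched the same way. The function name and the sample image size are assumptions made for illustration; in practice the average image size depends on resolution and compression level.

```python
def mjpeg_storage_gb(image_size_kb, frames_per_second, hours_per_day, days_of_storage):
    """Apply the Motion JPEG storage formula step by step (decimal units)."""
    kb_per_hour = image_size_kb * frames_per_second * 3600
    mb_per_hour = kb_per_hour / 1000
    gb_per_day = mb_per_hour * hours_per_day / 1000
    return gb_per_day * days_of_storage


if __name__ == "__main__":
    # Hypothetical camera: 50 KB per image, 5 frames per second,
    # recording 8 hours a day, stored for 30 days.
    print(round(mjpeg_storage_gb(50, 5, 8, 30), 1), "GB")
```

With these hypothetical inputs the steps are 900 MB per hour, 7.2 GB per day and about 216 GB for 30 days.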
A helpful tool for estimating bandwidth and storage requirements is the AXIS Design Tool, which
is accessible from the Axis website.
Figure 12.1a AXIS Design Tool includes advanced project management functionality that enables bandwidth and
storage to be calculated for a large and complex system.
12.2 Edge storage
Edge storage—sometimes referred to as local storage or onboard recording—is a concept in Axis
network cameras and video encoders that allows network video products to create, control and manage recordings locally on an SD (Secure Digital) memory card, network-attached storage (NAS) or
file server.
Edge storage makes it possible to design flexible and reliable recording solutions. The benefits include
increased system reliability, high-quality video in low-bandwidth installations, recording for remote
and mobile surveillance, and integration with video management software.
AXIS Camera Companion is an example of a video management system based on edge storage,
whereby all video is recorded on the memory card in the network camera or video encoder and the
need for central storage is eliminated. A 64 GB SDXC card can record more than a month of video
using motion-based recording with HDTV 720p resolution and 15 frames per second. For more information on AXIS Camera Companion, see Chapter 11.
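How long a memory card lasts can be estimated by inverting the storage formula from Section 12.1.2. The sketch below is illustrative only: the function name, the average bit rate and the number of recording hours per day are hypothetical inputs, and it assumes the full nominal card capacity is available for video.

```python
def retention_days(card_capacity_gb, avg_bit_rate_kbit_s, recording_hours_per_day):
    """Estimate how many days of video fit on a card (decimal units)."""
    mb_per_hour = avg_bit_rate_kbit_s / 8 * 3600 / 1000
    gb_per_day = mb_per_hour * recording_hours_per_day / 1000
    if gb_per_day <= 0:
        raise ValueError("bit rate and recording hours must be positive")
    return card_capacity_gb / gb_per_day


if __name__ == "__main__":
    # Hypothetical: 64 GB card, 1000 Kbit/s average while recording,
    # motion triggers adding up to 4 hours of recording per day.
    print(round(retention_days(64, 1000, 4), 1), "days")
```

With motion-based recording, the hours actually recorded per day are usually the dominant factor, so a quiet scene extends retention considerably.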
Edge storage can work as a complement to central storage. It can record video locally when the
central system is not available, or continuously record in parallel. When used together with video
management software such as AXIS Camera Station, failover recordings can be handled. This means
that missing video clips from network disruptions or central system maintenance can be retrieved
later from the camera and merged with the central storage, ensuring the user gets uninterrupted
video recordings.
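The failover idea can be illustrated with a small gap-finding routine: given the time ranges that central storage already holds, it lists the ranges that would need to be fetched from the camera. This is a simplified sketch, not how AXIS Camera Station or any other product implements failover. The function name and the use of plain (start, end) offsets in seconds are assumptions made for the example.

```python
def find_recording_gaps(central_segments, window_start, window_end):
    """Return (start, end) ranges inside the window that central storage lacks.

    central_segments is a list of (start, end) tuples in seconds, in any order.
    """
    gaps = []
    cursor = window_start
    for seg_start, seg_end in sorted(central_segments):
        if seg_start > cursor:
            # Nothing recorded centrally between cursor and this segment.
            gaps.append((cursor, min(seg_start, window_end)))
        cursor = max(cursor, seg_end)
        if cursor >= window_end:
            break
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


if __name__ == "__main__":
    # Central storage missed the period between second 100 and second 160.
    print(find_recording_gaps([(0, 100), (160, 300)], 0, 300))
```

Each returned range is a candidate for retrieval from edge storage and merging into the central recording.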
Figure 12.2a Edge storage for redundancy (failover recording). System redundancy example: edge storage video is merged after a failure.
Additionally, edge storage can improve video forensics for systems with low network bandwidth
where video cannot be streamed at the highest quality. By supporting low bandwidth monitoring
with high-quality local recordings, users can work within bandwidth limitations and still retrieve high-quality video from incidents for detailed investigation.
Edge storage can also be used to manage recordings in remote locations and other installations
where there is intermittent or no network availability. On trains and other rail-bound vehicles, edge
storage can be used to first record video onboard and then transfer it to the central system when the
vehicle stops at a depot.
12.2.1 Edge storage with SD cards or NAS
There are pros and cons of using SD cards or NAS for edge storage. (More details on NAS are
provided in Section 12.4 below.) The following are some considerations:
> SD cards are easier to deploy and configure than NAS.
> SD cards are limited in storage compared with NAS. NAS can store terabytes of data.
> SD cards can be tampered with if reachable by unauthorized persons. A NAS can be located in
a place that is secured.
> SD cards are not a single point of failure. If the NAS or its connection is disrupted,
multiple cameras will be affected.
> The expected lifespan of the disk in a NAS is longer than an SD card’s. The NAS can also
have a RAID configuration. See Section 12.5 for more on RAID.
> SD cards may be costly to replace if the camera is mounted in hard-to-reach places such as
on a pole or wall more than 4.5 m (15 ft.) off the ground.
> NAS is the only edge storage option for cameras without an SD card slot.
12.3 Server-based storage
Server-based storage involves a PC server that is connected locally to the network video products
for video management and recording. The server would run a video management software application that records video to either the local hard disk (called direct-attached storage) or to a NAS.
Depending on a PC server’s central processing unit (CPU), network card and internal RAM
(Random Access Memory), a server can handle a certain number of cameras, frames per second
and image sizes. Most PCs can hold several hard disks, and each disk can be up to several terabytes. With the AXIS Camera Station video management software, for instance, one hard disk
is suitable for storing recordings from up to 15 cameras when using H.264, or between 8 and
10 cameras when using Motion JPEG.
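The per-disk guideline above can be used for a first-pass count of hard disks. The sketch below assumes the guideline scales linearly with the number of cameras, which the text does not state, and uses 8 cameras per disk for Motion JPEG (the cautious end of the 8-to-10 range). The function and constant names are hypothetical.

```python
import math

# Cameras per hard disk, taken from the AXIS Camera Station guideline above.
# The Motion JPEG value uses the lower end of the quoted 8-to-10 range.
CAMERAS_PER_DISK = {"h264": 15, "mjpeg": 8}


def disks_needed(num_cameras, compression="h264"):
    """Return a first-pass count of hard disks for the given camera count."""
    return math.ceil(num_cameras / CAMERAS_PER_DISK[compression])


if __name__ == "__main__":
    for codec in ("h264", "mjpeg"):
        print(codec, disks_needed(40, codec))
```

The count only reflects how many camera streams a disk can record; retention time still has to be checked with the storage formulas in Section 12.1.2.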
12.4 NAS and SAN
When the amount of stored data and management requirements exceed the limitations of
direct-attached storage, network-attached storage (NAS) or a storage area network (SAN) allows for
increased storage space, flexibility and recoverability.
Figure 12.4a Network-attached storage. Diagram labels: Axis network cameras; network switch, broadband router or corporate firewall; computer server with video management software.
NAS provides a single storage device that is directly attached to a LAN and offers shared storage
to all clients on the network. A NAS device is simple to install and easy to administer, providing a
low-cost storage solution. However, it provides limited throughput for incoming data because it
has only one network connection, which can become problematic in high-performance systems.
SANs are high-speed, special-purpose networks for storage, typically connected to one or more
servers via fiber. Users can access any of the storage devices on the SAN through the servers,
and the storage is scalable to hundreds of terabytes. Centralized storage reduces administration and provides a high-performance, flexible storage system for use in multi-server environments. Fibre Channel technology is commonly used to provide data transfers up to 16 Gbit/s
and to allow large amounts of data to be stored with a high level of redundancy.
Figure 12.4b A SAN architecture where storage devices are tied together and the servers share the storage capacity. Diagram labels: Fibre Channel links, Fibre Channel switch, RAID disks.
12.5 Redundant storage
SAN systems build redundancy into the storage device. Redundancy in a storage system allows
video, or any other data, to be saved simultaneously in more than one location. This provides a
backup for recovering video if a portion of the storage system becomes unreadable. There are
a number of options for providing this added storage layer in an IP surveillance system, including a Redundant Array of Independent Disks (RAID), data replication, server clustering and
multiple video recipients.
RAID. RAID is a method of arranging standard, off-the-shelf hard drives such that the operating system sees them as one large hard disk. A RAID setup spans data over multiple hard disk
drives with enough redundancy so that data can be recovered if one disk fails. There are different levels of RAID, ranging from practically no redundancy to a fully mirrored solution in which
there is no disruption and no loss of data in the event of a hard disk failure.
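To make the recovery idea concrete, the sketch below shows XOR parity, the technique used by parity-based RAID levels such as RAID 5. It is a toy illustration on three short byte strings, not a description of any particular RAID controller. The block contents and the helper name are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    # Three data "disks" holding one block each, plus one parity block.
    data = [b"cam1", b"cam2", b"cam3"]
    parity = xor_blocks(data)

    # Simulate losing the second disk and rebuild its block from the rest.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
```

Because XOR is its own inverse, combining the surviving blocks with the parity block reproduces the missing one. This tolerates the loss of a single disk only; mirrored setups instead keep a full copy of every block.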
Data replication. This is a common feature in many network operating systems. File servers in a
network are configured to replicate data among each other, providing a backup if one server fails.
Figure 12.5a Data replication.
Server clustering. A common server clustering method is to have two servers work with the
same storage device, such as a RAID system. When one server fails, the other identically configured server takes over. These servers can even share the same IP address, which makes the
so-called “fail-over” completely transparent for users.
Multiple video recipients. A common method to ensure disaster recovery and off-site storage in
network video is to simultaneously send the video to two different servers in separate locations.
These servers can be equipped with RAID, work in clusters, or replicate their data with servers
even further away. This is an especially useful approach when surveillance systems are in hazardous or not easily accessible areas, such as in mass-transit installations or industrial facilities.
12.6 System configurations
Small system
Using an edge storage solution such as AXIS Camera Companion, users can manage video recordings on memory cards for up to 16 cameras/video encoders. Since all video is stored on the
edge, there is no need to have dedicated recording equipment such as a server running during
operation, making the system very simple.
Figure 12.6a A small system using an edge storage solution such as AXIS Camera Companion.
Hosted video system
In a hosted video setup (often referred to as cloud computing), the system requirements are
handled by a hosting provider and a video service provider such as a security integrator or
alarm monitoring center, which in turn provides end users with access to live and recorded
video over the Internet. In an AXIS Video Hosting System (AVHS) setup, the AVHS software is
installed on a hosting provider’s server that serves as both a web and recording server. Together with the One-Click Camera Connection feature that is supported in Axis network video
products, it is easy to add cameras/encoders to the system regardless of the Internet service
provider, routers and firewall settings. The solution supports up to 10 cameras per site in single
or multiple locations.
Figure 12.6b A hosted video system involving a hosting provider with its server farm, a video service provider that
provides security services, and cameras/video encoders at the site to be monitored. End users gain access to videos by
logging on to an Internet site. Diagram labels: customer site; AVHS server and storage.
Medium system
A typical, medium-sized installation has a server with additional storage attached to it. The
storage is usually configured with RAID in order to increase performance and reliability. The
videos are normally viewed and managed from a client rather than from the recording server.
Figure 12.6c A medium system. Diagram labels: application and storage server; client (optional); RAID storage (optional).
Large centralized system
A large-sized installation requires high performance and reliability in order to manage the large
amount of data and bandwidth. This requires multiple servers with dedicated tasks. A master
server controls the system and decides which video is stored on which storage server. Because
there are dedicated storage servers, the load can be balanced across them. In such a setup, it is also
possible to scale up the system by adding more storage servers when needed and to perform maintenance without bringing down the entire system.
Figure 12.6d A large centralized system. Diagram labels: master server 1, master server 2, storage server 1, storage server 2.
Large distributed system
When multiple sites require surveillance with centralized management, distributed recording
systems may be used. Each site records and stores the video from local cameras. The master
controller can view and manage recordings at each site.
Figure 12.6e A large distributed system. Diagram labels: two storage servers with RAID.
Tools and resources
Axis offers a variety of tools and information resources to help design IP surveillance systems. Many are accessible from the Axis website.
Axis Product Selector
This tool helps you select the right cameras or video encoders for your project. A version of this
tool, AXIS Guide iPhone app, is available for use on iPhone, iPod Touch and iPad.
Axis Accessory Selector Tool
This tool helps you pick the right housing, bracket and power accessory for the cameras in your project.
AXIS Camera Companion Buyers Tool
Pick the cameras, storage and networking devices you need for a small surveillance system with
this user-friendly tool.
Axis Lens Calculator
Use the Axis Lens Calculator to easily establish for a specific camera the optimal camera placement and required focal length for a particular scene size and resolution.
AXIS Design Tool
Estimate the storage and network bandwidth needs for your system. This tool lets you experiment
with viewing, recording and compression options for each camera.
Axis Coverage Shapes for Microsoft Visio
This tool visualizes the coverage of cameras in a layout drawing to help you ensure that all critical
areas are covered.
Axis Camera Families for Autodesk® Revit®
Design surveillance systems based on Axis cameras
directly in your Autodesk Revit 3D CAD building layout.
Axis’ innovative Revit security camera families provide 3D
camera models to illustrate what the camera setup will
look like in reality and which areas the surveillance system
will cover once installed.
Intelligent Network Video: Understanding modern surveillance systems
This 390-page hardcover book is authored by Fredrik Nilsson and Axis Communications. It represents the first resource to provide detailed coverage of
advanced digital networking and intelligent video capabilities. Published in
September 2008, the book is available for purchase through Amazon, Barnes
& Noble and CRC Press, or contact your local Axis office. A second edition is
planned for release at the end of 2013.
Axis Communications’ Academy
Building your strengths in network video.
At Axis Communications, we understand that your business success depends on
continually building your strengths and staying on top of the latest technology to
offer your customers the very best.
We have designed Axis Communications’ Academy to work with every facet of
your business, providing training, tools and quick reference help for everything
your customers expect you to be an expert in—as well as the things they don’t
even know they need yet.
Whether you need instant help with a specific customer situation or comprehensive training to reach your long-term business goals, Axis Communications’
Academy has what you need, when you need it. From sales and system design to
installation, configuration and ongoing customer care.
Choose from a wide range of online tools and training as well as interactive
classes and seminars.
> Classroom training
> Online courses
> Business seminars
> Tutorials and guides
> System design tools
> Axis Certification Program
For more information,
visit Axis’ Learning Center
About Axis Communications
As the market leader in network video, Axis is leading the
way to a smarter, safer, more secure world — driving the
shift from analog to digital video surveillance. Offering
network video solutions for professional installations,
Axis’ products and solutions are based on an innovative,
open technology platform.
Axis has more than 1,400 dedicated employees in 40
locations around the world and cooperates with partners covering 179 countries. Founded in 1984, Axis is a
Sweden-based IT company listed on NASDAQ OMX
Stockholm under the ticker AXIS. For more information
about Axis, please visit our website
©2006-2013 AXIS COMMUNICATIONS, AXIS, ETRAX, ARTPEC and VAPIX are registered trademarks or trademark
applications of Axis AB in various jurisdictions. All other company names and products are trademarks or registered
trademarks of their respective companies.
Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States
and/or other countries. Mac OS, iPad, iPhone and iPod are trademarks or registered trademarks of Apple Inc. in the
United States and/or other countries. SMPTE is a registered trademark or trademark of Society of Motion Picture
and Television Engineers, Inc. in the United States and/or other countries. The UPnP® Certification Word and Logo
Mark and the UPnP Forum℠ Word and Logo Mark are trademarks or registered trademarks of UPnP Forum. SD, SDHC
and SDXC are trademarks or registered trademarks of SD-3C, LLC in the United States, other countries or both. Wi-Fi
Protected Access®, Wi-Fi Protected Setup™, WPA™ and WPA2™, are registered trademarks or trademarks of the Wi-Fi
Alliance. Autodesk and Revit are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/
or affiliates in the USA and/or other countries.
Some Axis products include software developed by the OpenSSL Project for use in the OpenSSL Toolkit
(, and cryptographic software written by Eric Young ([email protected]).