Voice Over IP (VoIP)
Ayse Yasemin Seydim
[email protected]
Southern Methodist University
EE 8302 Fall 1999
List of Figures
List of Tables
1. Introduction
2. Overview of VoIP Applications and Services
2.1. Applications
2.2. Implementation Considerations
2.3. Summary of Benefits
3. VoIP Technologies and QoS Issues
3.1. Speech Quality and Characteristics
4. VoIP Equipment, Protocols and Standards
4.1. H.323
5. An Example Software Architecture For VoIP
6. Summary and Conclusions
Figure 1. VoIP Architecture
Figure 2. VoIP Protocol Stack
Figure 3. Voice Gateway/Terminal Functions
Table 1. VoIP Network Protocols
Table 2. H.323 and Related Recommendations
Table 3. Other VoIP Protocols
With the aim of reducing communication costs, integrating voice and data networks has
become a rising priority for many companies. Organizations have been working on
solutions that would let them use the excess capacity on broadband networks for voice
and data transmission, and use the Internet and company Intranets as alternatives to
more expensive systems. At the same time, more and more companies are seeing the value
of transporting voice over IP networks to reduce telephone and facsimile costs and to
set the stage for advanced multimedia applications. Providing high-quality telephony
over IP networks is one of the key steps in the convergence of voice, fax, video, and
data communications services.
Voice over IP (VoIP) technologies vary in complexity from simple personal computer
software packages that pass digitized voice into a network to more complicated hardware
and software products. All of these products focus on using IP (Internet Protocol) as a
voice transport mechanism that provides low-cost communication. VoIP raises many
problems, such as delay, jitter, packet loss, and echo, and implementations must also
address interoperability and reliability in the design of the network. At present, using
VoIP in company Intranets is the more reasonable choice in terms of the cost/benefit
ratio. This paper provides a brief overview of the technology and of how it can be
applied to the integration of voice and data networks. It also describes the main
characteristics and a typical hardware and software architecture, and covers most VoIP
system issues, including the technical problems and Quality of Service concerns. It
also provides a brief summary of the standards for VoIP.
1. Introduction
In today’s world, most of the communication is in digital form and data is transported via
packet networks such as IP (Internet Protocol), ATM (Asynchronous Transfer Mode), and
Frame Relay. Since data traffic is growing much faster than telephone traffic, there has
been considerable interest in transporting voice over data networks. Organizations have
been working on solutions that would allow them to use the excess capacity on
broadband networks for voice and data transmission, and to use the Internet and
company Intranets as alternatives to more expensive systems.
Voice over IP (or VoIP) is defined as “the ability to make telephone calls and send
facsimiles over IP-based data networks with a suitable quality of service (QoS) and a
superior cost/benefit” [1], [2]. Equipment developers and manufacturers see VoIP as a
new opportunity to innovate and compete. The challenge for them is turning this vision
into reality by developing new VoIP-enabled equipment. For Internet Service Providers
(ISPs), the possibility of introducing usage-based pricing and increasing their traffic
volumes is very attractive. On the other hand, users (individuals and organizations) are
interested in the integration of voice and data applications as well as cost benefits.
Although support for voice communications over IP has become attractive mainly
because of the low-cost, flat-rate pricing of the public Internet, the technology
has not yet developed to the point where it can replace the services and quality provided
by the public network. In VoIP, the voice signal is digitized, compressed, sliced into
packets, and sent with other packets across the packet-switched network. At the receiving
end, the packets are reassembled and played out as a normal-sounding voice call.
Successful delivery of voice over packet networks presents a tremendous opportunity;
however, implementing the products is not straightforward given the variety of standards
and user requirements, the interoperability and scalability issues, and the need for
further research.
This paper presents a brief overview of VoIP technology with its applications
and services. Section 2 covers the applications, the implementation considerations, and
a summary of benefits. Section 3 points out the technological constraints, and Sections 4
and 5 briefly describe the hardware and software architecture. The summary and
conclusions are given in the last section.
2. Overview of VoIP Applications and Services
The role of the Public Switched Telephone Network (PSTN) has been to provide a
dedicated circuit connection between calling parties over which voice can be transported
with good quality. Networking technology has improved to include sophisticated
software-driven switching systems, cross-connect systems, and signaling networks to
control call setup and provide new services. As this evolution continues, the promise for
the future is to support all standard call services on an IP-based network, taking
advantage of the efficiencies of a shared packet network as well as the opportunity to
provide an open environment for the creation of new services.
The Internet has evolved over the last thirty years, but its growth has exploded in
the last three to four years with the use of browsers, the World Wide Web (WWW),
and electronic messaging (e.g., e-mail). The Internet, using the Internet Protocol (IP),
provides a connectionless, single-priority service that was not designed for real-time
applications such as voice conversations. However, in 1995 VocalTec Communications,
Ltd. [3] introduced a software package (Internet Phone) that allowed PC users to carry on
real-time voice conversations over the Internet. For the first time, users of PCs
with a sound card, microphone, earphone (or speaker), and Internet connection could talk
to others with similar systems anywhere in the world with no long-distance charges. The
voice quality was poor, primarily due to the delay encountered on the public Internet, but
the call was free and innovative; it demonstrated the capability of Voice over Internet
Protocol (VoIP) and triggered new markets and studies. Some figures showing the trend
in IP telephony are presented in [4], taken from the Voice on the Net Conference,
Fall 98. In 1998, there were approximately 80 million minutes of use of
IP telephony and more than one trillion minutes of POTS (Plain Old Telephone
System). In the meantime, the US market for VoIP services was roughly $157 million,
whereas the total US market for telecommunications services in 1997 was around $230
billion. IP telephony minutes are reported to be growing 40% to 50% per month,
and by 2004, 5% to 20% of long-distance calls are estimated to be carried by VoIP. The
Fax over IP market is estimated to be $25 billion. To illustrate the marketplace for these
innovations, a short list of vendors of VoIP products and services includes: Cisco, Clarent,
elemedia (software only), Ericsson, Inter-Tel, Linkon (software), Lucent, MICOM
(Nortel Networks), Motorola, NetSpeak, NeTrue, Nuera, RADVision, Telogy (software;
Texas Instruments), Vienna Systems, VocalTec, and others.
2.1 Applications
To provide access from a basic telephone, new IP Telephony Service Providers
(ITSPs) began developing gateway (footnote 1) products for voice that provided an
interface between the PSTN and the Internet. By deploying these gateways in strategic
locations, they were able to provide VoIP telephone service over high-usage and
expensive tariff paths. Since ITSPs are not considered Interexchange Carriers (IECs),
they were not subject to the access charges that are normally paid to the Local Exchange
Carriers (LECs), or to settlement charges for international calls [4]. This allowed them to
provide service at significant discounts over the rates charged by standard
telecommunication carriers. On the other hand, the quality of the voice was still subject to
the delays and uneven service provided by the public Internet. The next generation of
ITSPs began using dedicated backbone networks that were well managed and not subject
to the same variability as the public Internet.
Footnote 1. A gateway is an endpoint device, like an interface unit; it is defined as an
H.323 entity which provides real-time, two-way communications between H.323 terminals
on the LAN and other ITU terminals on a WAN, or to another H.323 Gateway [9].
A typical phone-to-phone call using an ITSP involves three separate network
segments and a multistep setup process. The caller dials an access number to connect over
the PSTN to the originating gateway. The caller then enters a Personal Identification
Number (PIN) to identify themselves for billing purposes. Many ITSPs use a debit
approach to billing, where the subscriber prepays for a certain number of minutes, which
are drawn down as calling time adds up. This provides additional savings to the ITSP,
which collects interest on the unconsumed balance of predeposited funds and does not
have to create, print, and send bills or deal with bad debts. After entering the PIN, the
caller then enters the number of the person they want to call. The originating gateway
must translate
those dialed digits into the IP address of the destination gateway that serves that calling
area. The translation tables may reside in the gateway itself or may be contained in a
centralized network resource. The originating gateway exchanges information such as
dialed digits and voice compression scheme to be used with the destination gateway, all
via the IP network. The destination gateway then places a call over the PSTN to the
destination party. The voice travels over the circuit connection in the originating LEC to
the originating gateway, through the IP network between the gateways, and over the
circuit connection in the destination LEC down to the called party.
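The translation step described above, from dialed digits to the IP address of the destination gateway, can be illustrated with a small lookup. This is only a sketch under stated assumptions: the table contents, the longest-prefix matching rule, and the function name are hypothetical and are not taken from any ITSP product.

```python
from typing import Optional

# Hypothetical translation table: dialed-number prefix -> destination gateway IP.
# Prefixes and addresses are illustrative (192.0.2.0/24 is a documentation range).
GATEWAY_TABLE = {
    "1214": "192.0.2.10",
    "1212": "192.0.2.20",
    "44": "192.0.2.30",
}


def find_destination_gateway(dialed_digits: str, table: dict) -> Optional[str]:
    """Return the gateway IP for the longest prefix matching the dialed digits."""
    for length in range(len(dialed_digits), 0, -1):
        gateway = table.get(dialed_digits[:length])
        if gateway is not None:
            return gateway
    return None  # no gateway serves this calling area
```

The table here lives in the gateway itself; as the text notes, it could equally be held in a centralized network resource and queried over the IP network.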
Corporations have taken advantage of the advances in VoIP technology as well,
locating gateways to connect onsite voice networks (e.g., via Private Branch Exchanges,
PBXs) to the corporate data network (Intranet) to support internal voice calling. This
provides potential savings by reducing the minutes-of-usage charges paid to traditional
telecommunication carriers and also by reducing the number of access lines needed for
voice into the PSTN. Corporations can sometimes also use this architecture to save
on toll calls to public numbers in areas where they have corporate sites. The call is carried
over the corporate data network to the destination city, then a local call is placed from the
corporate site to the destination number. This is called tail-end hop-off [4].
While the initial motivation for voice and fax over IP has been cost savings, the
long-term vision of VoIP is the new services that could be provided by integrating voice
and data on the same network. Creating new applications that were not possible (or
feasible) on the circuit-switched network is likely to be the main advantage of
VoIP, rather than just discount telephony. Some examples of these services include
unified messaging and Internet call centers.
“Unified messaging is loosely defined as providing a common interface to manage
all types of messages: voice mail, e-mail, fax, and ultimately multimedia messages” [4].
All message manipulation (retrieving, saving, forwarding, attaching, broadcasting, etc.) is
performed in the same standard manner. This requires a common interface and
technologies, such as text-to-speech conversion and speech recognition. Allowing all of
the information to be transported over a common network and protocol is a big step toward
realizing the full benefits of the service.
Internet Call Centers deliver personalized service at the click of a button by
combining the expert capabilities of an organization’s call center with the self-service
functionality and spontaneity of the World Wide Web [5]. In general, this allows an agent to
participate as a part of a call center from any location that has access to the Internet. Calls
and customer profile data can be coordinated and directed to any address, allowing call
distribution functions to act on data such as network congestion, agent capabilities, or
time of day when routing the call. The agent can also coordinate data interaction with a
caller who is also connected to the Internet, such as pushing relevant web pages to the
caller’s PC.
One of the immediate applications for IP telephony is real-time facsimile
transmission. Facsimile services normally use dial-up PSTN connections, at speeds up to
14.4 kb/s, between pairs of compatible fax machines. Transmission quality is affected by
network delays, machine compatibility, and analog signal quality. To operate over packet
networks, a fax interface unit must convert the data to packet form, handle the conversion
of signaling and control protocols (the T.30 and T.4 standards), and ensure complete
delivery of the scan data in the correct order. For Fax over IP (FoIP) applications,
packet loss and end-to-end delay are more critical than in voice applications;
FoIP is outside the scope of this paper.
2.2 Implementation Considerations
VoIP can be applied to almost any voice communications requirement, ranging from a
simple inter-office intercom to complex multi-point teleconferencing/shared screen
environments. The quality of voice reproduction to be provided can also be tailored
according to the application. Therefore, VoIP equipment must have the flexibility to
provide a wide range of configurations and environments and the ability to blend
traditional telephony with VoIP.
The main issues that need to be considered in VoIP implementations today are:
the quality of voice over a shared data network; the reliability of the network itself for
establishing and maintaining calls; interworking with the PSTN for consistent call setup,
billing, and maintenance; standards for VoIP call setup and interoperability between
networks and vendor products; improving the scalability and reducing the cost of gateway
products; and dealing with the uncertain regulatory environment as ITSPs begin to look
more like IECs.
Most VoIP applications that have been defined are considered to be real-time
activities. It is stated in [4] that store-and-forward voice services can be implemented
using VoIP. For example, voice messages could be prepared locally using a telephone and
delivered to an integrated voice/data mailbox using Internet or Intranet services. Voice
annotated documents, multimedia files, etc. can also become standard within office suites
in the near future. The key issue is that the real-time and store-and-forward modes of
operation need to be compatible and interoperable.
2.3 Summary of Benefits
The benefits of VoIP technology can be summarized as follows:
Cost Reduction. Although reducing long distance telephone costs is always a popular
topic and would provide a good reason for introducing VoIP, the actual savings over the
long term are still a subject of debate in the industry. Flat rate pricing is available with the
Internet and can result in considerable savings for both voice and facsimile (at least
currently). It has been estimated that up to 70% of all calls to Asia are facsimile, most of
which could be replaced by FoIP. These lower prices are based on avoiding telephony
access charges and settlement fees rather than being a fundamental reduction in resource
costs. The sharing of equipment and operation costs across both data and voice users can
also improve network efficiency since excess bandwidth on one network can be used by
the other.
Simplification. An integrated infrastructure that supports all forms of communication
allows more standardization and reduces the total equipment investment. This combined
infrastructure can support dynamic bandwidth optimization and a fault-tolerant design.
The differences between the time of day and geographic traffic patterns of voice and data
offer further opportunities for significant efficiency improvements.
Consolidation. Since people are among the most significant cost elements in a
network, any opportunity to combine operations, to eliminate points of failure, and to
consolidate accounting systems would be beneficial. In the enterprise, system
management can be provided for both voice and data services using VoIP. Universal use
of the IP protocols for all applications provides both reduced complexity and more
flexibility. Related facilities such as directory services and security services may
also be more easily shared.
Advanced Applications. Even though basic telephony and facsimile are the initial
applications for VoIP, the longer term benefits are expected to be derived from
multimedia and multi-service applications. For example, Internet commerce solutions can
combine WWW access to information with a voice call button that allows immediate
access to a call center agent from the PC. Needless to say, voice is an integral part of
conferencing systems that may also include shared screens, whiteboarding (footnote 2), etc.
Combining voice and data features into new applications will provide the greatest returns
over the longer term.
3. VoIP Technologies and QoS Issues
The goal for developers is to add telephone calling capabilities (both voice transfer and
signaling) to IP-based networks and to interconnect these with the public telephone
network and with private voice networks in such a way as to maintain current voice
quality standards and preserve the features everyone expects from the telephone. An
overall architecture is presented in Figure 1. These technologies range in complexity from
software packages that pass digitized voice to a specified destination in a network to
sophisticated hardware/software products providing toll-quality voice, directory services,
and complex voice quality assurance capabilities [6]. These products aim at providing
transparent, business-quality communication in corporate Intranets and lower-cost,
reduced-quality operation on the Internet.
[Figure 1. VoIP Architecture [2]. Block labels recoverable from the figure: System
Management; PSTN/IP Interworking; Representation and Coding; Call; Voice Transport.]
3.1. Speech Quality and Characteristics
Providing a level of quality that at least equals the PSTN (toll-quality voice) is viewed as
a basic requirement, although some experts argue that a cost versus function
versus quality trade-off should be applied. Although QoS usually refers to the fidelity of
the transmitted voice and facsimile documents, it can also be applied to network
availability (i.e., call capacity, or level of call blocking), telephone feature availability
(conferencing, calling number display, etc.), and scalability (any-to-any, universal, etc.).
Footnote 2. Whiteboarding: sharing and editing documents, photos, and drawings with
others in real time.
Although standardized measures have been developed by the ITU, the quality of
sound reproduction over a telephone network is fundamentally subjective. It has been
found that there are three factors that can profoundly impact the quality of the service [2].
Delay: Two problems that result from high end-to-end delay in a voice network
are echo and talker overlap. Echo is caused by signal reflections of the speaker’s voice
from the far-end telephone equipment back into the speaker’s ear [7]. Echo becomes a
significant problem when the round-trip delay is more than 50 milliseconds. Since echo is
perceived as a significant quality problem, VoIP systems must address the need for echo
control and implement some means of echo cancellation. Talker overlap (the problem of
one caller stepping on the other talker's speech) becomes significant if the one-way delay
becomes greater than 250 milliseconds. The end-to-end total delay is therefore the major
constraint and the driving requirement for reducing delay through a packet network. The
following are sources of delay in an end-to-end voice over packet call [7]:
a. accumulation delay (or algorithmic delay): This delay is caused by the need to
collect a frame of voice samples to be processed by the voice coder. It is related to the
type of voice coder used and varies from a single waveform sample time (0.125
milliseconds, i.e., 125 microseconds at an 8 kHz sampling rate) to many milliseconds. A
representative list of standard voice coders and their frame times is:
   G.726 ADPCM (16, 24, 32, 40 kb/s): 125 microseconds
   G.728 LD-CELP (16 kb/s): 2.5 milliseconds
   G.729 CS-ACELP (8 kb/s): 10 milliseconds
   G.723.1 Multi Rate Coder (5.3, 6.3 kb/s): 30 milliseconds
b. processing delay: This delay is caused by the actual process of encoding and
collecting the encoded samples into a packet for transmission over the packet network.
The encoding delay is a function of both the processor execution time and the type of
algorithm used. Often, multiple voice coder frames will be collected in a single packet to
reduce the packet network overhead. For example, three frames of G.729 codewords,
equaling 30 milliseconds of speech, may be collected and packed into a single packet.
c. network delay: This delay is caused by the physical medium and protocols used
to transmit the voice data and by the buffers used to remove packet jitter on the receive
side. Network delay is a function of the capacity of the links in the network and the
processing that occurs as the packets transit the network. The jitter buffers add delay that
is used to remove the packet delay variation that each packet is subjected to as it transits
the packet network. This delay can be a significant part of the overall delay because
packet-delay variations can be as high as 70 msec to 100 msec in some frame-relay
networks and IP networks.
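The delay sources listed above add up to a one-way delay budget that can be checked against the 250 millisecond talker-overlap threshold. The following is a minimal sketch of that arithmetic. The processing, network, and jitter-buffer figures in the example are assumed values chosen only for illustration, and the G.726 entry uses one sample period at an 8 kHz sampling rate (0.125 ms).

```python
# Coder frame times in milliseconds. G.726 works sample by sample, so its
# entry is one 8 kHz sample period (0.125 ms).
CODER_FRAME_MS = {
    "G.726": 0.125,
    "G.728": 2.5,
    "G.729": 10.0,
    "G.723.1": 30.0,
}

ONE_WAY_LIMIT_MS = 250.0  # talker-overlap threshold quoted in the text


def one_way_delay_ms(coder, frames_per_packet, processing_ms, network_ms, jitter_buffer_ms):
    """Sum the delay sources for one direction of a call, in milliseconds."""
    # Time needed to collect all the coder frames that go into one packet.
    collection_ms = CODER_FRAME_MS[coder] * frames_per_packet
    return collection_ms + processing_ms + network_ms + jitter_buffer_ms


# Example: three G.729 frames per packet (30 ms of speech) with assumed
# processing, network, and jitter-buffer delays.
budget = one_way_delay_ms("G.729", 3, processing_ms=10.0, network_ms=60.0,
                          jitter_buffer_ms=80.0)
within_limit = budget <= ONE_WAY_LIMIT_MS
```

With these assumed inputs the jitter buffer is the largest single term, which is consistent with the remark above that it can be a significant part of the overall delay.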
Jitter (Delay Variability): Jitter is the variation in inter-packet arrival time introduced by
the variable transmission delay over the network. Removing jitter requires collecting
packets and holding them long enough to allow the slowest packets to arrive in time to be
played in the correct sequence. This causes additional delay. The two conflicting goals of
minimizing delay and removing jitter have engendered various schemes to adapt the jitter
buffer size to match the time varying requirements of network jitter removal. This
adaptation has the explicit goal of minimizing the size and delay of the jitter buffer while
at the same time preventing buffer underflow caused by jitter.
Two approaches to adapting the jitter buffer size are given in [7]. The approach
selected will depend on the type of network the packets are traversing. The first approach
is to measure the variation of packet level in the jitter buffer over a period of time and to
incrementally adapt the buffer size to match the calculated jitter. This approach works
best with networks that provide a consistent jitter performance over time (e.g., ATM
networks). The second approach is to count the number of packets that arrive late and
create a ratio of these packets to the number of packets that are successfully processed.
This ratio is then used to adjust the jitter buffer to target a predetermined allowable late
packet ratio. This approach works best with the networks with highly variable packet
inter-arrival intervals (e.g., IP networks). In addition to the techniques described above,
the network must be configured and managed to provide minimal delay and jitter,
enabling a consistent QoS.
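The second approach, adapting the buffer toward a predetermined late-packet ratio, might be sketched as follows. The class name, step size, window length, and limits are assumptions made for this illustration; [7] describes the idea, not this code.

```python
class LateRatioJitterBuffer:
    """Adapt a jitter-buffer depth toward a target late-packet ratio (illustrative)."""

    def __init__(self, depth_ms=40.0, target_late_ratio=0.01,
                 step_ms=10.0, min_ms=10.0, max_ms=200.0, window=100):
        self.depth_ms = depth_ms
        self.target_late_ratio = target_late_ratio
        self.step_ms = step_ms
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.window = window  # packets per adaptation interval
        self.late = 0
        self.on_time = 0

    def record(self, delay_variation_ms):
        """Record one packet; it is late if its delay variation exceeds the depth."""
        if delay_variation_ms > self.depth_ms:
            self.late += 1
        else:
            self.on_time += 1
        if self.late + self.on_time >= self.window:
            self._adapt()

    def _adapt(self):
        # Ratio of late packets to successfully processed packets.
        ratio = self.late / max(self.on_time, 1)
        if ratio > self.target_late_ratio:
            # Too many late packets: deepen the buffer (more delay, fewer losses).
            self.depth_ms = min(self.depth_ms + self.step_ms, self.max_ms)
        elif self.late == 0:
            # No late packets in this window: try a shallower buffer.
            self.depth_ms = max(self.depth_ms - self.step_ms, self.min_ms)
        self.late = 0
        self.on_time = 0
```

The buffer grows quickly when packets start arriving late and shrinks only after a clean window, which reflects the trade-off between minimizing delay and removing jitter.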
Lost Packet Compensation: Because the Internet is a packet-switched,
connectionless network, the individual packets of each voice signal may travel over
separate network paths and are reassembled in the proper sequence at their ultimate
destination. While this provides for a more efficient use of network resources than the
circuit-switched PSTN, which routes a call over a single path, it also increases the
chances of packet loss. Since all voice frames are treated as data, packets may be dropped
under peak loads and during periods of congestion (caused, for example, by link failures).
Data frames are not time-sensitive, so dropped or erroneous packets can be corrected
through retransmission. Because voice transmissions are time-sensitive, however, the
normal Transmission Control Protocol (TCP) based retransmission schemes are not
suitable for lost voice packets. Packet losses greater than 10% are generally not tolerable.
Some schemes used by voice over packet software to address the problem of lost frames
are given in [7] as:
a. Interpolate for lost speech packets by replaying the last valid packet received during
the interval when the lost packet was supposed to be played out. This is a simple
method that fills the time between noncontiguous speech frames. It works well when
the incidence of lost frames is infrequent. It does not work very well when there are a
number of lost packets in a row or a burst of lost packets.
b. Send redundant information at the expense of bandwidth utilization. The basic
approach replicates and sends the nth packet of voice information along with the
(n+1)th packet. This method has the advantage of being able to exactly correct for the
lost packet. However, this approach uses more bandwidth and creates greater delay.
c. A hybrid approach uses a much lower bandwidth voice coder to provide redundant
information carried along in the (n+1)th packet. This reduces the problem of the extra
bandwidth required but fails to solve the problem of delay.
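The first two schemes can be illustrated with a short sketch. Frames are represented here as arbitrary Python values and a lost frame as None; the function names and the packet dictionary layout are hypothetical and chosen only for this example.

```python
def conceal_by_replay(frames):
    """Fill lost frames (None) by replaying the last valid frame received."""
    output = []
    last_valid = None
    for frame in frames:
        if frame is not None:
            last_valid = frame
        # If nothing valid has arrived yet, the gap stays None (silence).
        output.append(frame if frame is not None else last_valid)
    return output


def add_redundancy(frames):
    """Build packets where packet n+1 also carries a copy of frame n."""
    packets = []
    previous = None
    for seq, frame in enumerate(frames):
        packets.append({"seq": seq, "frame": frame, "previous": previous})
        previous = frame
    return packets


def recover_with_redundancy(received, total):
    """Rebuild the frame sequence from the packets that arrived.

    `received` maps sequence number -> packet; a missing key is a lost packet.
    Frame n is recovered from packet n+1 when packet n itself was lost.
    """
    frames = []
    for seq in range(total):
        if seq in received:
            frames.append(received[seq]["frame"])
        elif seq + 1 in received:
            frames.append(received[seq + 1]["previous"])
        else:
            frames.append(None)  # both copies lost, e.g. in a burst
    return frames
```

Note that recovering frame n requires waiting for packet n+1, which is the extra delay the text attributes to the redundancy schemes.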
Echo Compensation: Echo in a telephone network is caused by signal reflections
generated by the hybrid circuit that converts between a 4-wire circuit (a separate transmit
and receive pair) and a 2-wire circuit (a single pair for both transmit and receive) [10].
These reflections of the speaker’s voice are heard in the speaker’s ear. Echo is present
even in a conventional circuit-switched telephone network. However, it is acceptable
because the round-trip delays through the network are smaller than 50 msec and the echo
is masked by the normal side tone every telephone generates.
Echo becomes a problem in voice over packet networks because the round-trip
delay through the network is almost always greater than 50 msec. Thus, echo cancellation
techniques are always used. ITU standard G.165 defines performance requirements that
are currently required for echo cancellers. The ITU is defining much more stringent
performance requirements in the G.IEC specification.
Echo is generated toward the packet network from the telephone network. The
echo canceller compares the voice data received from the packet network with the voice
data being transmitted to the packet network. The echo from the telephone network
hybrid is removed by a digital filter on the transmit path into the packet network.
Maintenance of acceptable voice quality levels despite inevitable variations in
network performance (such as congestion or link failures) is achieved using techniques
such as compression, silence suppression, and QoS-enabled (footnote 3) transport
networks. Several developments in the 1990s, most notably advances in digital signal
processor (DSP) technology, high-powered network switches, and QoS-based protocols,
have combined to enable and encourage the implementation of voice over data networks.
Low-cost, high-performance DSPs can perform the compression and echo cancellation
algorithms.
Footnote 3. QoS-enabled network: a network architecture which brings end hosts closer
together by increasing performance and reducing the delay of the underlying network. In
order to do this, the network should implement service models so that services are specific
to the traffic they serve.
Software pre-processing of voice conversations can also be used to further
optimize voice quality. One technique, called silence suppression, detects whenever there
is a gap in the speech and suppresses the transfer of things like pauses, breaths, and other
periods of silence. This can amount to 50-60% of the time of a call, resulting in
considerable bandwidth conservation. Since the lack of packets is interpreted as complete
silence at the output, another function is needed at the receiving end to add "comfort
noise" to the output, so there is no perceptible or disturbing change in apparent
background noise level when silence occurs.
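As a rough illustration of silence suppression with comfort noise, the sketch below drops frames whose energy falls under a threshold and fills the resulting gaps with low-level random noise at the receiver. The energy threshold, the noise amplitude, and the function names are assumptions for this example; deployed systems use more elaborate voice activity detection.

```python
import random


def frame_energy(samples):
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in samples) / len(samples)


def suppress_silence(frames, threshold=100.0):
    """Sender side: keep speech frames, replace silent frames with None (not sent)."""
    return [f if frame_energy(f) >= threshold else None for f in frames]


def play_out(received, frame_len, noise_amplitude=4, seed=0):
    """Receiver side: insert low-level comfort noise where no frame arrived."""
    rng = random.Random(seed)
    output = []
    for frame in received:
        if frame is None:
            frame = [rng.randint(-noise_amplitude, noise_amplitude)
                     for _ in range(frame_len)]
        output.append(frame)
    return output
```

Only the frames that are not None would be packetized and sent, which is where the bandwidth saving comes from.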
Another software function that improves speech quality is echo cancellation. As
was noted earlier, echo becomes a problem whenever the round-trip delay for a call is
greater than 50 milliseconds. Sources of delay in a packet voice call include the collection
of voice samples (accumulation delay), encoding/decoding and packetizing time, jitter
buffer delays, and network transit delay. The ITU recommendation G.168 defines the
current performance requirements for echo cancellers.
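An echo canceller of the kind described above is commonly built around an adaptive digital filter. The sketch below uses a normalized least-mean-squares (NLMS) filter as one possible technique; the paper does not name a specific algorithm, and the tap count and step size here are arbitrary assumptions. It is not a G.168-compliant canceller.

```python
def nlms_echo_cancel(far_end, near_end, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `far_end` from `near_end`.

    far_end  : samples received from the packet network (sent toward the hybrid)
    near_end : samples coming back from the hybrid (echo plus any local speech)
    Returns the residual signal to transmit into the packet network.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        # Estimated echo is the filtered far-end signal.
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        # Normalize the update by the power of the samples in the filter.
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual
```

The filter can only model echo paths shorter than its tap count, so a practical canceller sizes the filter to the expected tail length of the hybrid echo.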
Engineering a VoIP network (and the equipment used to build it) involves trade-offs
among the quality of the delivered speech, the reliability of the system, and the
delays inherent in the system. Minimizing the end-to-end delay budget is one of the key
challenges in VoIP systems. Ensuring reliability in a "best effort" environment is another.
Equipment producers that offer the flexibility to configure their systems to fit the
environment and thereby optimize the quality of the voice produced will have a
competitive advantage.
4. VoIP Equipment, Protocols and Standards
VoIP equipment, which can be categorized into client, access/gateway, and carrier
class/infrastructure segments, should be configurable and sufficiently flexible to add new
techniques as they become available. Producers that make use of embedded software
can focus on how best to utilize the functions instead of on the problems
associated with implementing and testing the objects themselves. Real-time voice traffic
can be carried over IP networks in three different ways:
a. Voice trunking over IP can replace the analog or digital circuits that serve as voice
trunks (such as private links between company-owned PBXs) or PSTN-access trunks (links
between a PBX and the carrier). Voice packets are transferred between pre-defined IP
addresses, thereby eliminating the need for phone-number-to-IP-address conversions.
b. PC-to-PC voice can be provided for multimedia PCs (i.e., PCs with a microphone and
sound system) operating over an IP-based network without connecting to the PSTN.
PC applications and IP-enabled telephones can communicate using point-to-point or
multipoint sessions. This type of system may emulate an Internet chat group and
could be combined with shared data systems like multimedia solutions.
c. Telephony (any phone to any other phone) communication appears like a normal
telephone call to the caller but may actually consist of various forms of voice over packet
networks, all interconnected to the PSTN. Gateway functionality is required when
interconnecting to the PSTN or when interfacing standard telephones to a data
network.
With each type of application, there are many standards and protocols that a
designer must consider. Figure 2 presents the basic IP network protocol stack used to
implement VoIP. Brief explanations of the basic protocols are given in Table 1.
[Figure 2. VoIP Protocol Stack [2]. Layers recoverable from the figure: Network Layer
(IPv4, IPv6, IPM); Data Link Layer; Physical Layer.]
The most important consideration at the network level is to minimize unnecessary
data transfer delays. Providing sufficient node and link capacity and using congestion
avoidance mechanisms (such as prioritization, congestion control, and access controls)
can help to reduce overall delay. The ability to manage the network and optimize route
choices will reduce the effects of jitter.
Table 1. VoIP Network Protocols
Other Standards
RTP (Real-time Transport Protocol): IETF RFC 1889, a real-time end-to-end protocol
   utilizing existing transport layers for data that has real-time properties
RTCP (RTP Control Protocol): IETF RFC 1889, a protocol to monitor the QoS and to
   convey information about the participants in an ongoing session; provides feedback on
   total performance and quality so that modifications can be made
RSVP (Resource Reservation Protocol): IETF RFC 2205-2209, a general purpose signaling
   protocol allowing network resources to be reserved for a connectionless data stream,
   based on receiver-controlled requests
IA 1.0: VoIP Forum Implementation Agreement 1.0, selecting protocol options for
   interoperable VoIP
Transport Layer protocols [names not recoverable]: Internet standard Transport Layer
   protocols
IPv4, IPv6, IP multicast and various routing protocols: Internet standard Network Layer
   protocols (currently IPv4 is in widespread use), both for data transfer and routing
Various subnetworks including ATM and Frame Relay: A variety of subnetworks can be
   used to carry IP datagrams, including LANs and WANs using a variety of
   transmission [...]
SNMP (Simple Network Management Protocol): Internet standard for communications
   between a manager and a managed object
LDAP (Lightweight Directory Access Protocol): Internet standard for accessing Internet
   directory services
Other Internet application protocols: Several other application protocols are used in
   conjunction with network nodes, including FTP, Telnet, HTTP/WWW, etc.
4.1. H.323
The Internet industry is tackling the problems of network reliability and sound quality on
the Internet through the gradual adoption of standards. Efforts for setting standards are
focusing on the three basic elements of Internet telephony: the audio codec format,
transport protocols, and directory services. In May 1996, the International
Telecommunication Union (ITU) ratified the H.323 specification, which defines how
voice, data, and video traffic will be transported over IP-based local area networks; it also
incorporates the T.120 data-conferencing standard [9]. The recommendation is based on
the Real-time Transport Protocol and the RTP Control Protocol (RTP/RTCP) for managing
audio and video signals.
H.323 addresses the core Internet-telephony applications by defining how delay-sensitive traffic (i.e., voice and video) gets priority transport to ensure real-time
communications service over the Internet. (The H.324 specification defines the transport
of voice, data, and video over regular telephony networks, while H.320 defines the
protocols for transporting voice, data, and video over ISDN networks.)
H.323 is a set of recommendations, one of which is G.729 for audio codecs, which
the ITU ratified in November 1995. Despite the ITU recommendation, however, the
Voice over IP Forum in March 1997 voted to recommend the G.723.1 specification over
the G.729 standard. The industry consortium, which is led by Intel and Microsoft, agreed
to sacrifice some sound quality for the sake of greater bandwidth efficiency: G.723.1
requires 6.3 kb/s, while G.729 requires 8 kb/s. Adoption of the audio codec standard,
while an important step, is expected to improve reliability and sound quality mostly for
Intranet traffic and point-to-point IP connections. To achieve PSTN-like quality,
standards are also required that bound the delay and jitter of Internet connections.
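The codec bit rates quoted above cover only the encoded speech. On an IP network every packet also carries IP, UDP and RTP headers, so the bandwidth actually consumed per call is noticeably higher. The following sketch works through the arithmetic under stated assumptions that are not taken from the text: a 40-byte IPv4/UDP/RTP header without options or header compression, one 24-byte G.723.1 frame (30 ms of speech) per packet, and two 10-byte G.729 frames (20 ms of speech) per packet. The function name is hypothetical.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed headers, no options (assumed)


def ip_bandwidth_kbps(payload_bytes, packet_interval_ms):
    """IP-layer bit rate of one direction of a call, in kb/s."""
    bits_per_packet = (payload_bytes + HEADER_BYTES) * 8
    return bits_per_packet / packet_interval_ms  # bits per millisecond == kb/s


# Assumed packetization: one G.723.1 frame, or two G.729 frames, per packet.
g7231 = ip_bandwidth_kbps(payload_bytes=24, packet_interval_ms=30)
g729 = ip_bandwidth_kbps(payload_bytes=20, packet_interval_ms=20)
print(f"G.723.1: {g7231:.1f} kb/s, G.729: {g729:.1f} kb/s")
# G.723.1: 17.1 kb/s, G.729: 24.0 kb/s
```

Under these assumptions the headers outweigh the speech payload for both codecs, which suggests that packetization interval matters as much as the choice of codec when estimating link capacity.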
The transport protocol RTP, on which the H.323 recommendation is based,
essentially is a new protocol layer for real-time applications; RTP-compliant equipment
will include control mechanisms for synchronizing different traffic streams. However,
RTP does not have any mechanisms for ensuring the on-time delivery of traffic signals or
for recovering lost packets. RTP also does not address the so-called "quality of service"
(QoS) issue related to guaranteed bandwidth availability for specific applications.
Currently, there is a draft signaling-protocol standard aimed at strengthening the Internet’s
ability to handle real-time traffic reliably (i.e., to dedicate end-to-end transport paths for
specific sessions much like the circuit-switched PSTN does). If adopted, the resource
reservation protocol, or RSVP, will be implemented in routers to establish and maintain
requested transmission paths and quality-of-service levels.
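To make the role of RTP more concrete, the sketch below packs and unpacks the 12-byte fixed RTP header, which carries the sequence number and timestamp that receivers use for reordering and synchronization. The field layout follows the RTP specification; the helper function names and the example values (sequence number, timestamp, SSRC) are hypothetical. Payload type 4 is the value assigned to G.723 in the RTP audio/video profile.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no CSRC list, padding or extension)."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Decode the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Example values are made up; 240 timestamp units = 30 ms at an 8 kHz clock.
header = pack_rtp_header(payload_type=4, seq=1000, timestamp=240, ssrc=0x1234ABCD)
fields = unpack_rtp_header(header)
print(len(header), fields["seq"], fields["timestamp"])  # 12 1000 240
```

Nothing in this header requests or guarantees delivery; it only lets the receiver detect loss and reordering, which is consistent with the limitation described above.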
There is also a need for industry standards in the area of Internet-telephony directory services. Directories are required to ensure interoperability between
the Internet and the PSTN, and most current Internet-telephony applications involve
proprietary implementations. However, the Lightweight Directory Access Protocol (LDAP
v3.0) is said to be emerging as the basis for a new standard.
The ability to digitize and process voice streams using self-contained software
building blocks is the key to success with VoIP implementation. VoIP equipment should
comply with the H.323 standard which has been defined by the ITU to describe terminals,
equipment, and services for multimedia communication over networks (such as LANs or
the Internet) that do not provide a guaranteed QoS. H.323 is a family of software-based
standards that define various options for compression and call control. Figure 3 illustrates
the functional components of terminals that use the H.323 standards. Table 2 gives a list
of the various standards that have been adopted as part of the H.323 family.
Although H.323 is the recognized standard for VoIP terminals, there are
additional standards that are better suited to client applications, such as IP
phones. As H.323 was originally designed for the desktop, a higher priority was given to
rich functionality than to resource allocation. This has given rise to alternative
protocols, listed in Table 3, that can interoperate with H.323.
Figure 3. Voice Gateway/Terminal Functions [2] (diagram labels: SNMP Messages, IP Packages)
Table 2. H.323 and Related Recommendations
Recommendation: Brief Description
H.323: Document called "Visual telephone systems and equipment for local area
  networks which provide a non-guaranteed quality of service" (November,
H.225: Call control messages including signaling, registration, and admissions,
  and for the packetization and synchronization of media streams including
  both point-to-point and multipoint calls
H.245: Messages for opening and closing channels for media streams, and other
  commands, requests, and indications
H.261: Video codec for audiovisual services at multiples of 64 kb/s
H.263: Specifies a codec for video over the PSTN
G.711: Audio codec for 3.1 kHz bandwidth over 48, 56, and 64 kb/s channels
  (normal telephony)
G.722: Audio codec for 7 kHz bandwidth over 48, 56, and 64 kb/s channels
G.728: Audio codec for 3.1 kHz bandwidth over 16 kb/s channels
G.723, G.723.1: Audio codec for 3.1 kHz bandwidth over 5.3 and 6.3 kb/s channels
  (G.723.1 has been selected by the VoIP Forum for use with VoIP)
G.729, G.729a: Audio codec for 3.1 kHz bandwidth over 8 kb/s channels (adopted by the
  Frame Relay Forum for voice over Frame Relay)
T.120: Data and conference control
Table 3. Other VoIP Protocols
Protocol: Brief Description
SGCP (Simple Gateway Control Protocol): Simple UDP-based protocol for managing
  endpoints and connections between endpoints
SAP (Session Announcement Protocol): Protocol used by multicast session managers to
  distribute a multicast session description to a large group of recipients
SIP (Session Initiation Protocol): Protocol used to invite an individual user to take
  part in a point-to-point or unicast session
RTSP (Real-Time Streaming Protocol): Protocol used to interface to a server that will
  provide real-time data
SDP (Session Description Protocol): Describes the session for SAP, SIP and RTSP
A VoIP software solution should be designed with well-defined interfaces
between the modules. For example, the interface between the voice processing performed
on a DSP and the rest of the system must be clearly defined. This also allows the same
device to be configured to work with IP, Frame Relay, or ATM without a complete redesign. An example software architecture for VoIP is described in Section 5.
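The benefit of a well-defined module interface can be illustrated with a minimal sketch. Here the voice-handling code depends only on an abstract transport interface, so a different packet network can be substituted without touching it. All class and method names are hypothetical, and the "headers" are placeholder byte strings rather than real IP or Frame Relay encapsulation.

```python
from abc import ABC, abstractmethod


class PacketTransport(ABC):
    """Hypothetical interface between the voice modules and a packet network."""

    @abstractmethod
    def send(self, voice_frame: bytes) -> bytes:
        """Encapsulate one encoded voice frame and return the bytes to transmit."""


class IpTransport(PacketTransport):
    def send(self, voice_frame):
        return b"IP|" + voice_frame   # placeholder, not a real IP header


class FrameRelayTransport(PacketTransport):
    def send(self, voice_frame):
        return b"FR|" + voice_frame   # placeholder, not a real Frame Relay header


class VoiceGateway:
    """Voice processing code that depends only on the PacketTransport interface."""

    def __init__(self, transport: PacketTransport):
        self.transport = transport

    def forward(self, voice_frame: bytes) -> bytes:
        return self.transport.send(voice_frame)


frame = b"\x01\x02"
over_ip = VoiceGateway(IpTransport()).forward(frame)
over_fr = VoiceGateway(FrameRelayTransport()).forward(frame)
print(over_ip, over_fr)  # b'IP|\x01\x02' b'FR|\x01\x02'
```

Supporting a further network type in this sketch would only require another subclass of the transport interface.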
5. An Example Software Architecture For VoIP
Voice and telephone calling can be viewed as one of many applications for an IP network,
with software being used to support the application and interface to the network. The
emergence of VoIP is a direct result of the advances that were made in hardware and
software technologies in the early 1990s. The software functions required for voice-to-packet conversion in a VoIP terminal or gateway are described in [7] as the following modules:
- The Voice Processing module, which prepares voice samples for transmission over
the packet network. This software is typically run on a DSP.
- The Call Processing (Signaling) module, which serves as a signaling gateway
allowing calls to be established across the packet network. This software supports
E&M (wink, delay and immediate), loop, or ground start Foreign Exchange Station
(FXS) and Foreign Exchange Office (FXO).
- The Packet Processing module, which processes voice and signaling packets, adding
the appropriate transport headers prior to submitting the packets to the IP network (or
other packet networks). Signaling information is converted from telephony protocols
to the packet signaling protocol.
- The Network Management module, which provides management agent functionality,
allowing remote fault, accounting, and configuration management to be performed
from standard management systems (see the next section). The Network Management
module could include ancillary services such as support for security features, access to
dialing directories, and remote access support.
The Voice Processing module should include the following components:
- PCM (Pulse Code Modulation) Interface, which receives samples from the telephony
(PCM) interface and forwards them to the appropriate VoIP software module for
processing. The PCM interface also performs continuous phase re-sampling of output
samples to the analog interface.
- Echo Cancellation Unit, which performs echo cancellation on sampled, full-duplex voice
port signals in accordance with the ITU G.165 or G.168 standard. Since round-trip delay
for VoIP is always greater than 50 milliseconds, echo cancellation is a requirement.
Operational parameters may be programmable.
- Voice Activity/Idle Noise Detector, which suppresses packet transmission when voice
signals are not present and hence saves bandwidth. If no activity is detected for a period
of time, the voice encoder output is not transported across the network. Idle noise
levels are also measured and reported to the destination so that "comfort noise" can be
inserted into the call and the listener does not get “dead air” on the telephone.
- Tone Detector (optional), which detects the reception of DTMF (Dual-Tone
Multi-Frequency) tones and discriminates between voice and facsimile signals. The
results can be used to invoke the appropriate voice processing functions (i.e., the
decoding and packetizing of facsimile information or the compression of voice).
- Tone Generator, which generates DTMF tones and call progress tones under command of
the operating system.
- Facsimile Processing module, which provides a facsimile relay function by demodulating
the PCM data, extracting the relevant information, and packing the scan data into
packets.
- Packet Voice Protocol module, which encapsulates the compressed voice and fax data for
transmission over the data network. Each packet includes a sequence number that allows
the received packets to be delivered in the correct order. This also allows silence
intervals to be reproduced properly and lost packets to be detected.
- Voice Playout module, which buffers the packets received at the destination and
forwards them to the voice codec for playout. This module provides an adaptive jitter
buffer and a measurement mechanism that allows buffer sizes to be adapted to the
performance of the network.
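The interaction between sequence numbers and the playout buffer can be sketched as follows. This is a deliberately simplified illustration, not the design described in [7]: the buffer depth is fixed rather than adaptive, sequence number wrap-around is ignored, and the class and method names are hypothetical. One call to play_tick stands for one playout interval of the codec.

```python
class PlayoutBuffer:
    """Fixed-depth playout buffer keyed by packet sequence number."""

    def __init__(self, depth=2, first_seq=0):
        self.depth = depth          # packets to collect before playout starts
        self.next_seq = first_seq   # sequence number due at the next tick
        self.pending = {}           # seq -> payload, possibly out of order
        self.lost = []              # sequence numbers that did not arrive in time
        self.started = False

    def receive(self, seq, payload):
        """Store an arriving packet; packets that arrive too late are dropped."""
        if seq >= self.next_seq:
            self.pending[seq] = payload

    def play_tick(self):
        """Return the payload due now, or None (buffering, underrun or loss)."""
        if not self.started:
            if len(self.pending) < self.depth:
                return None                     # still filling the buffer
            self.started = True
        if self.next_seq in self.pending:
            payload = self.pending.pop(self.next_seq)
            self.next_seq += 1
            return payload
        if self.pending:                        # later packets wait: declare a loss
            self.lost.append(self.next_seq)
            self.next_seq += 1
        return None


buf = PlayoutBuffer(depth=2)
played = []
for seq, payload in [(0, "a"), (2, "c"), (1, "b"), (4, "e")]:  # 3 never arrives
    buf.receive(seq, payload)
    frame = buf.play_tick()
    if frame is not None:
        played.append(frame)
while buf.pending:                              # drain what is left
    frame = buf.play_tick()
    if frame is not None:
        played.append(frame)
print(played, buf.lost)  # ['a', 'b', 'c', 'e'] [3]
```

In the example, packets 1 and 2 arrive out of order but are played in sequence, and the gap at sequence number 3 is reported as a loss, at which point a real terminal would insert concealment or comfort noise. An adaptive buffer would additionally adjust the depth from measured arrival-time variation.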
The Call Processing (signaling) subsystem detects the presence of a new call and
collects addressing information. Various telephony signaling standards must be
supported. A number of functions must be performed if full telephone calling is to be
supported. The interface to the telephone network must be monitored to collect incoming
commands and responses. The signaling protocols must be terminated and the
information must be extracted. The signaling information must be mapped into a format
that can be used to establish a session across the packet network. Telephone numbers
(E.164 dial addresses) must be converted into IP addresses (with the possible need for an
external reference to a directory service). Two approaches to dialing are being used:
single stage (dial the destination number and use automatic route selection functions), and
two stage (dial the VoIP gateway number, then dial the real destination).
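The conversion of a dialed E.164 number into the IP address of a gateway can be sketched as a longest-prefix lookup, which is one simple way to realize the automatic route selection used in single-stage dialing. The routing table, the function names and the addresses are hypothetical; the addresses come from the 192.0.2.0/24 documentation range, and a real system would consult a directory service rather than an in-memory table.

```python
# Hypothetical routing table: E.164 prefix -> IP address of the serving gateway.
GATEWAYS = {
    "1214": "192.0.2.10",
    "1": "192.0.2.1",
    "44": "192.0.2.44",
}


def normalize(dialed):
    """Strip punctuation so '+1 (214) 555-0100' becomes '12145550100'."""
    return "".join(ch for ch in dialed if ch.isdigit())


def resolve_gateway(dialed, table=GATEWAYS):
    """Return the gateway for the longest matching prefix, or None."""
    digits = normalize(dialed)
    for length in range(len(digits), 0, -1):
        gateway = table.get(digits[:length])
        if gateway is not None:
            return gateway
    return None


print(resolve_gateway("+1 (214) 555-0100"))  # 192.0.2.10
print(resolve_gateway("+1 303 555 0100"))    # 192.0.2.1
print(resolve_gateway("+81 3 5550 0100"))    # None
```

Trying the longest prefix first lets a specific entry (an area code) override a general one (a country code), so routes can be refined without changing existing entries.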
6. Summary and Conclusions
Data traffic has traditionally been forced to fit onto the voice network. The Internet,
however, has created an opportunity to reverse this integration strategy: voice and facsimile
can now be carried over IP networks, with the integration of video and other multimedia
applications close behind. The Internet and its underlying TCP/IP protocol suite have become
the driving force for new technologies, with the unique challenges of real-time voice
being the latest in a series of developments. Consequently, the market for VoIP products
is established and growing rapidly.
Several factors will influence future developments in VoIP products and services.
Currently, the most promising areas for VoIP are corporate Intranets and commercial
extranets. Another influential element in the ongoing Internet-telephony evolution is the
VoIP gateway. As these gateways evolve from PC-based platforms to robust embedded
systems, each will be able to handle hundreds of simultaneous calls. Consequently,
corporations will deploy large numbers of them in an effort to reduce the expenses
associated with high-volume voice, fax, and video-conferencing traffic. The economics of
placing all traffic (data, voice, and video) over an IP-based network will pull companies in
this direction, simply because IP will act as a unifying agent, regardless of the underlying
architecture (i.e., leased lines, frame relay, or ATM) of an organization’s network.
Implementations of VoIP systems must provide interoperability, since in a public
environment different products will need to be able to work together. Using common
software that is compatible with all applicable standards can significantly reduce the cost of
product development. The VoIP network, whether by design or through management,
should be fault-tolerant with only a very small likelihood of complete failure. In
particular, the gateway between the telephone and VoIP systems needs to be highly
reliable. Sufficient capacity must be available in the VoIP system and its gateways to
minimize the likelihood of blocked and dropped calls. This will be especially
important when the network is shared with data traffic that may cause congestion.
Mechanisms for admission control should be available for both the voice and data traffic,
with prioritization policies set.
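The relationship between gateway capacity and call blocking can be estimated with the Erlang B formula from classical teletraffic theory. The original text does not name this model, so it is offered here only as one common way to size capacity; it assumes Poisson call arrivals and that blocked calls are cleared rather than queued. The function names and the traffic figures are hypothetical.

```python
def erlang_b(offered_erlangs, channels):
    """Blocking probability for `channels` circuits offered `offered_erlangs`.

    Uses the recurrence B(A, 0) = 1, B(A, n) = A*B(A, n-1) / (n + A*B(A, n-1)).
    """
    blocking = 1.0
    for n in range(1, channels + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


def channels_needed(offered_erlangs, target_blocking):
    """Smallest number of channels whose blocking does not exceed the target."""
    n = 0
    while erlang_b(offered_erlangs, n) > target_blocking:
        n += 1
    return n


print(f"{erlang_b(1, 1):.3f}")   # 0.500: one channel offered 1 erlang
print(f"{erlang_b(2, 2):.3f}")   # 0.400
# Channels a gateway would need for 10 erlangs at a 1% blocking target:
print(channels_needed(10, 0.01))
```

An admission control function could use such an estimate to decide how many simultaneous calls to accept before voice traffic begins to crowd out data, or the reverse.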
There is potential for extremely high growth rates in VoIP systems, especially if
they prove equal to the PSTN in perceived quality at much lower cost. VoIP systems
must be flexible enough to grow to serve very large user populations, to allow a mix of
public and private services, and to adapt to local regulations. The need for large numbers
of addressable points may force the use of improved Internet protocols such as IPv6.
Internet network capacity will also need to be reconsidered.
Telephone systems assume that any telephone can call any other telephone and
allow conferencing of multiple telephones across wide areas. With VoIP, this will be driven by
functions that map between telephone numbers and other types of packet network
addresses, specifically IP addresses. There must, of course, exist gateways that allow every
device to be reachable. On the other hand, many are claiming significant economic
advantages to the implementation of VoIP. These are often based on flat rate prices for
Internet service, the fact that services such as the "Internet 911" are not required, and the
fact that there is no regulatory prohibition against interconnection of telephone systems with IP
systems. Also assumed is that higher performance compression will not be used in the
telephone network to reduce costs. If circumstances change, the motivation for VoIP
purely for cost avoidance reasons may change also.
References
[1] “Voice Over IP (VoIP)”, Tech Papers, http://www.protocols.com/papers/voip.html,
[2] J. Ryan, Voice Over IP (VoIP), The Technology Guide Series,
http://www.techguide.com, 1998.
[3] “About VocalTec”, http://www.vocaltec.com/about/aboutus.htm
[4] Understanding Voice over IP, Telecommunications Research Associates, 1999.
[5] “Lucent Technologies announces Internet Call Centers”
[6] “Voice Over IP Unite As Technologies Mature”,
[7] E.B. Morgan, “Voice Over Packet” White Paper, Telogy Networks, 1998.
[8] Internet Telephony Tutorial, http://www.webproforum.com/int_tele, 1999.
[9] A Primer on the H.323 Series Standard,
[10] R.C. Levine, EE8302 Digital Telephony Class Notes, Southern Methodist
University, TX, USA, Fall 1999.