Voice Over IP (VoIP)
by Ayse Yasemin Seydim
Southern Methodist University
EE 8302, Fall 1999

TABLE OF CONTENTS
List of Figures
List of Tables
Abstract
1. Introduction
2. Overview of VoIP Applications and Services
2.1. Applications
2.2. Implementation Considerations
2.3. Summary of Benefits
3. VoIP Technologies and QoS Issues
3.1. Speech Quality and Characteristics
4. VoIP Equipment, Protocols and Standards
4.1. H.323
5. An Example Software Architecture For VoIP
6. Summary and Conclusions
References

LIST OF FIGURES
Figure 1. VoIP Architecture
Figure 2. VoIP Protocol Stack
Figure 3. Voice Gateway/Terminal Functions

LIST OF TABLES
Table 1. VoIP Network Protocols
Table 2. H.323 and Related Recommendations
Table 3. Other VoIP Protocols

ABSTRACT

With the aim of reducing communication costs, integrating voice and data networks has become a rising priority for many companies. Organizations have been working on solutions that would let them use the excess capacity on broadband networks for voice and data transmission, and use the Internet and company Intranets as alternatives to more expensive systems. At the same time, more and more companies see the value of transporting voice over IP networks to reduce telephone and facsimile costs and to set the stage for advanced multimedia applications. Providing high-quality telephony over IP networks is one of the key steps in the convergence of voice, fax, video, and data communications services. Voice over IP (VoIP) technologies vary in complexity from simple personal computer software packages that pass digitized voice into a network to more complicated hardware and software products. All of these products focus on using IP (Internet Protocol) as a voice transport mechanism that provides low-cost communication.
VoIP raises many problems that must be considered, such as delay, jitter, packet loss, and echo, and implementations must also address interoperability and reliability in the design of the network. For now, using VoIP in company Intranets is the more reasonable choice in terms of cost/benefit ratio. This paper provides a brief overview of the technology and how it can be applied to the integration of voice and data networks. It also gives the main characteristics and a typical hardware and software architecture, and covers most VoIP system issues, including the technical problems and Quality of Service concerns. Finally, it provides a brief summary of the standards for VoIP.

1. INTRODUCTION

In today’s world, most communication is in digital form, and data is transported via packet networks such as IP (Internet Protocol), ATM (Asynchronous Transfer Mode), and Frame Relay. Since data traffic is growing much faster than telephone traffic, there has been considerable interest in transporting voice over data networks. Organizations have been working on solutions that would allow them to use the excess capacity on broadband networks for voice and data transmission, as well as to use the Internet and company Intranets as alternatives to more expensive systems. Voice over IP (or VoIP) is defined as “the ability to make telephone calls and send facsimiles over IP-based data networks with a suitable quality of service (QoS) and a superior cost/benefit” [1], [2]. Equipment developers and manufacturers see VoIP as a new opportunity to innovate and compete. The challenge for them is turning this vision into reality by developing new VoIP-enabled equipment. For Internet Service Providers (ISPs), the possibility of introducing usage-based pricing and increasing their traffic volumes is very attractive.
On the other hand, users (individuals and organizations) are interested in the integration of voice and data applications as well as in the cost benefits.

Although support for voice communications over IP has become attractive mainly because of the low-cost, flat-rate pricing of the public Internet, the technology has not yet developed to the point where it can replace the services and quality provided by the public network. In VoIP, the voice signal is digitized, compressed, sliced into packets, and sent with other packets across the packet-switched network. At the receiving end, the packets are reassembled and played out as a normal-sounding voice call. Successful delivery of voice over packet networks presents a tremendous opportunity; however, implementing the products is not straightforward, given the variety of standards and user requirements, the interoperability and scalability issues, and the need for further research.

This paper presents a brief overview of VoIP technology with its applications and services. Section 2 includes the implementation considerations and a summary of benefits. Section 3 points out the technological constraints, and Sections 4 and 5 briefly describe the hardware and software architecture. A summary and conclusions are given in the last section.

2. OVERVIEW OF VoIP APPLICATIONS AND SERVICES

The role of the Public Switched Telecommunications Network (PSTN) has been to provide a dedicated circuit connection between calling parties, over which voice can be transported with good quality. The networking technology has been improved to include sophisticated software-driven switching systems, cross-connect systems, and signaling networks to control setup and provide new services. As this evolution continues, the promise for the future is to support all standard call services on an IP-based network and to take advantage of the efficiencies of a shared packet network, as well as the opportunity to provide an open environment for the creation of new services.
The Internet has evolved over the last thirty years, but has exploded in growth in the last three to four years with the use of browsers, the World Wide Web (WWW), and electronic messaging (e-mail). The Internet, using the Internet Protocol (IP), provides a connectionless, single-priority service that was not designed for real-time applications such as voice conversations. However, in 1995 VocalTec Communications, Ltd. [3] introduced a software package (Internet Phone) that allowed PC users to carry on real-time voice conversations over the Internet. For the first time, users of PCs with a sound card, microphone, earphone (or speaker), and Internet connection could talk to others with similar systems anywhere in the world with no long-distance charges. The voice quality was poor, primarily due to the delay encountered on the public Internet, but the call was free and innovative; it demonstrated the capability of Voice over Internet Protocol (VoIP) and triggered new markets and studies.

Some figures showing the trend in IP telephony are presented in [4], taken from the Voice on the Net Conference, Fall 98. It is stated that in 1998 there were approximately 80 million minutes of use of IP telephony, against more than one trillion minutes of POTS (Plain Old Telephone System). The US market for VoIP services was roughly $157 million, whereas the total US market for telecommunications services in 1997 was around $230 billion. It is pointed out that IP telephony minutes are growing 40% to 50% per month, and that by 2004 an estimated 5% to 20% of long-distance calls will be carried by VoIP. The Fax over IP market is estimated to be $25 billion.
To illustrate the marketplace for these innovations, a short list of vendors of VoIP products and services includes: Cisco, Clarent, elemedia (software only), Ericsson, Inter-Tel, Linkon (software), Lucent, MICOM (Nortel Networks), Motorola, NetSpeak, NeTrue, Nuera, RADVision, Telogy (software, Texas Instruments), Vienna Systems, VocalTec, and others.

2.1 Applications

To provide access from a basic telephone, new IP Telephony Service Providers (ITSPs) began developing gateway products (see footnote 1) for voice that provided an interface between the PSTN and the Internet. By deploying these gateways in strategic locations, they were able to provide VoIP telephone service over high-usage, high-tariff paths. Since ITSPs are not considered Interexchange Carriers (IECs), they were not subject to the access charges that are normally paid to the Local Exchange Carriers (LECs), or to settlement charges for international calls [4]. This allowed them to provide service at significant discounts over the rates charged by standard telecommunication carriers. On the other hand, the quality of the voice was still subject to the delays and uneven service of the public Internet. The next generation of ITSPs began using dedicated backbone networks that were well managed and not subject to the same variability as the public Internet.

Footnote 1: A gateway is an endpoint device, like an interface unit. It is defined as an H.323 entity which provides real-time, two-way communications between H.323 terminals on the LAN and other ITU terminals on a WAN, or to another H.323 gateway [9].

A typical phone-to-phone call using an ITSP involves three separate network segments and a multistep setup process. The caller dials an access number to connect over the PSTN to the originating gateway. The caller then enters a Personal Identification Number (PIN) to identify themselves for billing purposes.
Many ITSPs use a debit approach to billing, where the subscriber prepays for a certain number of minutes, which are reduced as calling time adds up. This provides additional savings to the ITSP, which collects interest on the unconsumed balance of predeposited funds and does not have to create, print, and send bills or deal with bad debts. After entering the PIN, the caller enters the number of the person they want to call. The originating gateway must translate those dialed digits into the IP address of the destination gateway that serves that calling area. The translation tables may reside in the gateway itself or in a centralized network resource. The originating gateway exchanges information, such as the dialed digits and the voice compression scheme to be used, with the destination gateway, all via the IP network. The destination gateway then places a call over the PSTN to the destination party. The voice travels over the circuit connection in the originating LEC to the originating gateway, through the IP network between the gateways, and over the circuit connection in the destination LEC to the called party.

Corporations have taken advantage of the advances in VoIP technology as well, locating gateways to connect on-site voice networks (e.g., via Private Branch Exchanges, PBXs) to the corporate data network (the Intranet) to support internal voice calling. This provides potential savings by reducing the minutes-of-usage charges that are paid to traditional telecommunication carriers and by reducing the number of access lines needed for voice into the PSTN. Corporations can sometimes also use this architecture to save on toll calls to public numbers in areas where they have corporate sites. The call is carried over the corporate data network to the destination city, and a local call is then placed from the corporate site to the destination number. This is called tail-end hop-off [4].
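The digit translation step performed by the originating gateway can be illustrated with a small lookup. The following is a minimal sketch, not a description of any vendor's gateway: it assumes the translation table is keyed by dialed-number prefixes and that the longest matching prefix wins. The table contents, the addresses (taken from documentation address ranges), and the function name `find_gateway` are all hypothetical.

```python
# Hypothetical translation table: dialed-number prefix -> destination gateway IP.
# Prefixes and addresses are illustrative only; a real table could live in the
# gateway itself or in a centralized network resource.
GATEWAY_TABLE = {
    "1214": "192.0.2.10",     # gateway serving one specific calling area
    "1": "192.0.2.1",         # fallback gateway for the rest of that country code
    "4420": "198.51.100.20",
    "44": "198.51.100.2",
}


def find_gateway(dialed_digits, table=GATEWAY_TABLE):
    """Return the gateway IP for the longest matching prefix, or None."""
    # Try the full number first, then progressively shorter prefixes.
    for length in range(len(dialed_digits), 0, -1):
        gateway = table.get(dialed_digits[:length])
        if gateway is not None:
            return gateway
    return None  # no gateway serves this number


if __name__ == "__main__":
    print(find_gateway("12145551234"))
    print(find_gateway("33123456789"))
```

Longest-prefix matching is only one possible design; it lets a provider add a gateway for a single city without touching the entry that covers the rest of the country.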
While the initial motivation for voice and fax over IP has been cost savings, the long-term vision of VoIP lies in the new services that could be provided by integrating voice and data on the same network. Creating new applications that were not possible (or feasible) on the circuit-switched network is likely to be the main advantage of VoIP, rather than just discount telephony. Some examples of these services include unified messaging and Internet call centers.

“Unified messaging is loosely defined as providing a common interface to manage all types of messages: voice mail, e-mail, fax, and ultimately multimedia messages” [4]. All message manipulation (retrieving, saving, forwarding, attaching, broadcasting, etc.) is performed in the same standard manner. This requires a common interface and technologies such as text-to-speech conversion and speech recognition. Allowing all of the information to be transported over a common network and protocol is a big step toward realizing the full benefits of the service.

Internet call centers deliver personalized service at the click of a button by combining the expert capabilities of an organization's call center with the self-service functionality and spontaneity of the World Wide Web [5]. In general, this allows an agent to participate as part of a call center from any location that has access to the Internet. Calls and customer profile data can be coordinated and directed to any address, allowing call distribution functions to act on data such as network congestion, agent capabilities, or time of day when routing the call. The agent can also coordinate data interaction with a caller who is also connected to the Internet, such as pushing relevant web pages to the caller’s PC.

One of the immediate applications for IP telephony is real-time facsimile transmission. Facsimile services normally use dial-up PSTN connections, at speeds up to 14.4 kb/s, between pairs of compatible fax machines.
Transmission quality is affected by network delays, machine compatibility, and analog signal quality. To operate over packet networks, a fax interface unit must convert the data to packet form, handle the conversion of signaling and control protocols (the T.30 and T.4 standards), and ensure complete delivery of the scan data in the correct order. For Fax over IP (FoIP), packet loss and end-to-end delay are more critical than in voice applications; FoIP is outside the scope of this paper.

2.2 Implementation Considerations

VoIP can be applied to almost any voice communications requirement, ranging from a simple inter-office intercom to complex multi-point teleconferencing/shared screen environments. The quality of voice reproduction can also be tailored to the application. Therefore, VoIP equipment must have the flexibility to support a wide range of configurations and environments and the ability to blend traditional telephony with VoIP.

The main issues that need to be considered in VoIP implementations today are: the quality of voice over a shared data network; the reliability of the network itself for establishing and maintaining calls; interworking with the PSTN for consistent call setup, billing, and maintenance; standards for VoIP call setup and interoperability between networks and vendor products; improving the scalability and reducing the cost of gateway products; and dealing with the uncertain regulatory environment as ITSPs begin to look more like IECs.

Most VoIP applications that have been defined are considered to be real-time activities. However, it is stated in [4] that store-and-forward voice services can also be implemented using VoIP. For example, voice messages could be prepared locally using a telephone and delivered to an integrated voice/data mailbox using Internet or Intranet services. Voice-annotated documents, multimedia files, etc. may also become standard within office suites in the near future.
The key issue is that the real-time and store-and-forward modes of operation need to be compatible and interoperable.

2.3 Summary of Benefits

The benefits of VoIP technology can be summarized as follows:

Cost Reduction. Although reducing long-distance telephone costs is always a popular topic and would provide a good reason for introducing VoIP, the actual savings over the long term are still a subject of debate in the industry. Flat-rate pricing is available with the Internet and can result in considerable savings for both voice and facsimile (at least currently). It has been estimated that up to 70% of all calls to Asia are facsimile, most of which could be replaced by FoIP. These lower prices are based on avoiding telephony access charges and settlement fees rather than on a fundamental reduction in resource costs. Sharing equipment and operating costs across both data and voice users can also improve network efficiency, since excess bandwidth on one network can be used by the other.

Simplification. An integrated infrastructure that supports all forms of communication allows more standardization and reduces the total equipment investment. This combined infrastructure can support dynamic bandwidth optimization and a fault-tolerant design. The differences between the time-of-day and geographic traffic patterns of voice and data offer further opportunities for significant efficiency improvements.

Consolidation. Since people are among the most significant cost elements in a network, any opportunity to combine operations, to eliminate points of failure, and to consolidate accounting systems would be beneficial. In the enterprise, system management can be provided for both voice and data services using VoIP. Universal use of the IP protocols for all applications provides both reduced complexity and more flexibility. Related facilities such as directory services and security services may also be shared more easily.

Advanced Applications.
Even though basic telephony and facsimile are the initial applications for VoIP, the longer-term benefits are expected to come from multimedia and multi-service applications. For example, Internet commerce solutions can combine WWW access to information with a voice call button that allows immediate access to a call center agent from the PC. Voice is also an integral part of conferencing systems that may include shared screens, whiteboarding (see footnote 2), etc. Combining voice and data features into new applications will provide the greatest returns over the longer term.

3. VoIP TECHNOLOGIES AND QoS ISSUES

The goal for developers is to add telephone calling capabilities (both voice transfer and signaling) to IP-based networks and to interconnect these with the public telephone network and with private voice networks in a way that maintains current voice quality standards and preserves the features everyone expects from the telephone. An overall architecture is presented in Figure 1. These technologies range in complexity from software packages that pass digitized voice to a specified destination in a network to sophisticated hardware/software products providing toll-quality voice, directory services, and complex voice quality assurance capabilities [6]. These products aim to provide transparent, business-quality communication in corporate Intranets and lower-cost, reduced-quality operation in the Internet.

[Figure 1. VoIP Architecture [2]. Blocks: System Management; PSTN/IP Interworking; Speech Representation and Coding; Telephone Call Control; Voice Transport.]

3.1. Speech Quality and Characteristics

Providing a level of quality that at least equals the PSTN (toll-quality voice) is viewed as a basic requirement, although some experts argue that a cost versus function versus quality trade-off should be applied.
Although QoS usually refers to the fidelity of the transmitted voice and facsimile documents, it can also be applied to network availability (i.e., call capacity, or level of call blocking), telephone feature availability (conferencing, calling number display, etc.), and scalability (any-to-any, universal, expandable).

Footnote 2: Whiteboarding is sharing and editing documents, photos, and drawings with others in real time.

Although standardized measures have been developed by the ITU, the quality of sound reproduction over a telephone network is fundamentally subjective. Three factors have been found to have a profound impact on the quality of the service [2].

Delay: Two problems that result from high end-to-end delay in a voice network are echo and talker overlap. Echo is caused by reflections of the speaker’s voice from the far-end telephone equipment back into the speaker’s ear [7]. Echo becomes a significant problem when the round-trip delay is more than 50 milliseconds. Since echo is perceived as a significant quality problem, VoIP systems must address the need for echo control and implement some means of echo cancellation. Talker overlap (the problem of one caller stepping on the other talker's speech) becomes significant if the one-way delay becomes greater than 250 milliseconds. The key constraint when engineering a packet network for voice is therefore the total end-to-end delay. The following are sources of delay in an end-to-end voice over packet call [7]:

a. accumulation delay (or algorithmic delay): This delay is caused by the need to collect a frame of voice samples to be processed by the voice coder. It is related to the type of voice coder used and varies from a single waveform sample time (0.125 milliseconds, i.e., 125 microseconds at the 8 kHz telephony sampling rate) to many milliseconds.
A representative list of standard voice coders and their frame times is:
• G.726 ADPCM (16, 24, 32, 40 kb/s): 0.125 milliseconds
• G.728 LD-CELP (16 kb/s): 2.5 milliseconds
• G.729 CS-ACELP (8 kb/s): 10 milliseconds
• G.723.1 Multi-Rate Coder (5.3, 6.3 kb/s): 30 milliseconds

b. processing delay: This delay is caused by the actual process of encoding and collecting the encoded samples into a packet for transmission over the packet network. The encoding delay is a function of both the processor execution time and the type of algorithm used. Often, multiple voice coder frames are collected in a single packet to reduce the packet network overhead. For example, three frames of G.729 codewords, equaling 30 milliseconds of speech, may be collected and packed into a single packet.

c. network delay: This delay is caused by the physical medium and protocols used to transmit the voice data and by the buffers used to remove packet jitter on the receive side. Network delay is a function of the capacity of the links in the network and the processing that occurs as the packets transit the network. The jitter buffers add delay in order to remove the delay variation that each packet is subjected to as it transits the packet network. This delay can be a significant part of the overall delay, because packet-delay variations can be as high as 70 msec to 100 msec in some frame-relay and IP networks.

Jitter (Delay Variability): Jitter is the variation in inter-packet arrival time introduced by the variable transmission delay over the network. Removing jitter requires collecting packets and holding them long enough to allow the slowest packets to arrive in time to be played in the correct sequence. This causes additional delay. The two conflicting goals of minimizing delay and removing jitter have engendered various schemes to adapt the jitter buffer size to match the time-varying requirements of network jitter removal.
This adaptation has the explicit goal of minimizing the size and delay of the jitter buffer while at the same time preventing buffer underflow caused by jitter. Two approaches to adapting the jitter buffer size are given in [7]. The approach selected will depend on the type of network the packets are traversing. The first approach is to measure the variation of the packet level in the jitter buffer over a period of time and to incrementally adapt the buffer size to match the calculated jitter. This approach works best with networks that provide consistent jitter performance over time (e.g., ATM networks). The second approach is to count the number of packets that arrive late and form the ratio of these packets to the number of packets that are successfully processed. This ratio is then used to adjust the jitter buffer to target a predetermined allowable late-packet ratio. This approach works best with networks with highly variable packet inter-arrival intervals (e.g., IP networks). In addition to the techniques described above, the network must be configured and managed to provide minimal delay and jitter, enabling a consistent QoS.

Lost Packet Compensation: Because the Internet is a packet-switched, connectionless network, the individual packets of each voice signal may travel over separate network paths and must be reassembled in the proper sequence at their ultimate destination. While this makes more efficient use of network resources than the circuit-switched PSTN, which routes a call over a single path, it also increases the chances of packet loss. Since all voice frames are treated as data, packets may be dropped under peak loads and during periods of congestion (caused, for example, by link failures). Because voice transmissions are time-sensitive, the normal Transmission Control Protocol (TCP) based retransmission schemes are not suitable. Packet losses greater than 10% are generally not tolerable.
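Returning to the second jitter-buffer approach described above (targeting an allowable late-packet ratio), the adaptation loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from [7]: the class name, the fixed step size, the measurement window, and the default values are all hypothetical, and a packet is treated as late whenever its delay variation exceeds the current buffer delay.

```python
class LateRatioJitterBuffer:
    """Adapts the playout delay toward a target late-packet ratio (sketch)."""

    def __init__(self, delay_ms=40, target_ratio=0.01, step_ms=10,
                 min_ms=10, max_ms=200, window=100):
        self.delay_ms = delay_ms          # current jitter buffer delay
        self.target_ratio = target_ratio  # allowable late/on-time ratio (assumed)
        self.step_ms = step_ms            # adjustment per window (assumed)
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.window = window              # packets per measurement interval
        self.late = 0
        self.on_time = 0

    def on_packet(self, delay_variation_ms):
        """Record one packet; it is late if it arrives after its playout time."""
        if delay_variation_ms > self.delay_ms:
            self.late += 1
        else:
            self.on_time += 1
        if self.late + self.on_time >= self.window:
            self._adapt()

    def _adapt(self):
        # Ratio of late packets to successfully processed packets.
        ratio = self.late / max(self.on_time, 1)
        if ratio > self.target_ratio:
            # Too many late packets: grow the buffer (more delay, fewer losses).
            self.delay_ms = min(self.max_ms, self.delay_ms + self.step_ms)
        elif ratio < self.target_ratio:
            # Comfortably below target: shrink the buffer to cut delay.
            self.delay_ms = max(self.min_ms, self.delay_ms - self.step_ms)
        self.late = 0
        self.on_time = 0
```

The sketch makes the trade-off explicit: every increase in `delay_ms` reduces late packets but adds directly to the end-to-end delay budget.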
Data frames, by contrast, are not time-sensitive, and dropped or erroneous data packets can be corrected through retransmission. Lost voice packets cannot be dealt with in this manner. Some schemes used by voice over packet software to address the problem of lost frames are given in [7] as:
• Interpolate for lost speech packets by replaying the last valid packet received during the interval when the lost packet was supposed to be played out. This is a simple method that fills the time between noncontiguous speech frames. It works well when lost frames are infrequent. It does not work very well when several packets in a row, or a burst of packets, are lost.
• Send redundant information at the expense of bandwidth utilization. The basic approach replicates the nth packet of voice information and sends it along with the (n+1)th packet. This method has the advantage of being able to correct exactly for the lost packet. However, it uses more bandwidth and creates greater delay.
• A hybrid approach uses a much lower bandwidth voice coder to provide the redundant information carried along in the (n+1)th packet. This reduces the extra bandwidth required but does not solve the problem of delay.

Echo Compensation: Echo in a telephone network is caused by signal reflections generated by the hybrid circuit that converts between a 4-wire circuit (a separate transmit and receive pair) and a 2-wire circuit (a single pair for both transmit and receive) [10]. These reflections of the speaker’s voice are heard in the speaker’s ear. Echo is present even in a conventional circuit-switched telephone network. However, it is acceptable there because the round-trip delays through the network are smaller than 50 msec and the echo is masked by the normal side tone every telephone generates.
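As a brief aside on the lost-frame schemes listed earlier in this section, the first two can be sketched in a few lines. This is an illustrative sketch only: the function names and the data layout (a list with `None` for missing frames, and a dictionary mapping sequence numbers to a payload plus a copy of the previous payload) are assumptions made for the example, not structures defined in [7].

```python
def conceal_losses(frames, silence=b""):
    """Scheme 1: replace each missing frame (None) with the last valid frame."""
    played = []
    last_valid = silence  # nothing to replay before the first valid frame
    for frame in frames:
        if frame is None:
            played.append(last_valid)
        else:
            played.append(frame)
            last_valid = frame
    return played


def recover_with_redundancy(received, count):
    """Scheme 2: packet n+1 carries a copy of packet n.

    `received` maps sequence number -> (payload, copy_of_previous_payload).
    Returns payloads 0..count-1, with None where nothing can be recovered.
    """
    out = []
    for seq in range(count):
        if seq in received:
            out.append(received[seq][0])
        elif seq + 1 in received:
            # Exact recovery, but only once the next packet has arrived,
            # which is where the extra delay comes from.
            out.append(received[seq + 1][1])
        else:
            out.append(None)
    return out
```

The two can be combined: redundancy recovers isolated losses exactly, and replay fills whatever gaps remain after a burst.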
Echo becomes a problem in voice over packet networks because the round-trip delay through the network is almost always greater than 50 msec. Thus, echo cancellation techniques are always used. ITU standard G.165 defines performance requirements that are currently required for echo cancellers. The ITU is defining much more stringent performance requirements in the G.IEC specification. Echo is generated toward the packet network from the telephone network. The echo canceller compares the voice data received from the packet network with the voice data being transmitted to the packet network. The echo from the telephone network hybrid is removed by a digital filter on the transmit path into the packet network.

On the other hand, acceptable voice quality levels are maintained despite inevitable variations in network performance (such as congestion or link failures) by using techniques such as compression, silence suppression, and QoS-enabled transport networks (see footnote 3). Several developments in the 1990s, most notably advances in digital signal processor (DSP) technology, high-powered network switches, and QoS-based protocols, have combined to enable and encourage the implementation of voice over data networks. Low-cost, high-performance DSPs can perform the compression and echo cancellation algorithms efficiently.

Footnote 3: A QoS-enabled network is a network architecture which brings end hosts closer together by increasing the performance and reducing the delay of the underlying network. To do this, the network should implement service models so that services are specific to the traffic they serve.

Software pre-processing of voice conversations can also be used to further optimize voice quality. One technique, called silence suppression, detects gaps in the speech and suppresses the transfer of pauses, breaths, and other periods of silence. These can amount to 50-60% of the time of a call, resulting in considerable bandwidth conservation. Since the lack of packets is interpreted as complete silence at the output, another function is needed at the receiving end to add "comfort noise" to the output, so that there is no perceptible or disturbing change in the apparent background noise level when silence occurs.

Another software function that improves speech quality is echo cancellation. As noted earlier, echo becomes a problem whenever the round-trip delay for a call is greater than 50 milliseconds. Sources of delay in a packet voice call include the collection of voice samples (accumulation delay), encoding/decoding and packetizing time, jitter buffer delays, and network transit delay. The ITU recommendation G.168 defines the performance requirements that are currently required for echo cancellers.

Engineering a VoIP network (and the equipment used to build it) involves trade-offs among the quality of the delivered speech, the reliability of the system, and the delays inherent in the system. Minimizing the end-to-end delay budget is one of the key challenges in VoIP systems. Ensuring reliability in a "best effort" environment is another. Equipment producers that offer the flexibility to configure their systems to fit the environment, and thereby optimize the quality of the voice produced, will have a competitive advantage.

4. VoIP EQUIPMENT, PROTOCOLS AND STANDARDS

VoIP equipment, which can be categorized into client, access/gateway, and carrier class/infrastructure segments, should be configurable and sufficiently flexible to add new techniques as they become available. Producers that make use of embedded software can focus on how best to use the functions instead of on the problems associated with implementing and testing the objects themselves.
Real-time voice traffic can be carried over IP networks in three different ways:
• Voice trunks can replace the analog or digital circuits that are serving as voice trunks (such as private links between company-owned PBXs) or PSTN-access trunks (links between a PBX and the carrier). Voice packets are transferred between pre-defined IP addresses, thereby eliminating the need for phone number to IP address conversions.
• PC-to-PC voice can be provided for multimedia PCs (i.e., PCs with a microphone and sound system) operating over an IP-based network without connecting to the PSTN. PC applications and IP-enabled telephones can communicate using point-to-point or multipoint sessions. This type of system may emulate an Internet chat group and could be combined with shared data systems such as multimedia solutions.
• Telephony (any phone to any other phone) communications appear like a normal telephone call to the caller but may actually consist of various forms of voice over packet network, all interconnected to the PSTN. Gateway functionality is required when interconnecting to the PSTN or when interfacing standard telephones to a data network.

With each type of application, there are many standards and protocols that a designer must consider. Figure 2 presents the basic IP network protocol stack used to implement VoIP. Brief explanations of the basic protocols are given in Table 1.

[Figure 2. VoIP Protocol Stack [2]. Layers, top to bottom: H.323; RTP, RTCP, RSVP; UDP/TCP; Network Layer (IPv4, IPv6, IPM); Data Link Layer; Physical Layer.]

The most important consideration at the network level is to minimize unnecessary data transfer delays. Providing sufficient node and link capacity and using congestion avoidance mechanisms (such as prioritization, congestion control, and access controls) can help to reduce overall delay. The ability to manage the network and optimize route choices will reduce the effects of jitter.

Table 1. VoIP Network Protocols

Standard | Description
RTP (Real-time Transport Protocol) | IETF RFC 1889, a real-time end-to-end protocol utilizing existing transport layers for data that has real-time properties
RTCP (RTP Control Protocol) | IETF RFC 1889, a protocol to monitor the QoS and to convey information about the participants in an ongoing session; provides feedback on total performance and quality so that modifications can be made
RSVP (Resource Reservation Protocol) | IETF RFC 2205-2209, a general-purpose signaling protocol allowing network resources to be reserved for a connectionless data stream, based on receiver-controlled requests
IA 1.0 | VoIP Forum Implementation Agreement 1.0, selecting protocol options for interoperable VoIP
TCP, UDP | Internet standard Transport Layer protocols
IPv4, IPv6, IP multicast and various routing protocols | Internet standard Network Layer protocols (currently IPv4 is in widespread use), both for data transfer and routing
Various subnetworks including ATM and Frame Relay | A variety of subnetworks can be used to carry IP datagrams, including LANs and WANs using a variety of transmission techniques
SNMP (Simple Network Management Protocol) | Internet standard for communications between a manager and a managed object
LDAP (Lightweight Directory Access Protocol) | Internet standard for accessing Internet directory services
Other Internet application protocols | Several other application protocols are used in conjunction with network nodes, including FTP, Telnet, http/WWW, etc.

4.1. H.323

The Internet industry is tackling the problems of network reliability and sound quality on the Internet through the gradual adoption of standards. Efforts to set standards are focusing on the three basic elements of Internet telephony: the audio codec format, transport protocols, and directory services.
In May 1996, the International Telecommunication Union (ITU) ratified the H.323 specification, which defines how voice, data, and video traffic will be transported over IP-based local area networks; it also incorporates the T.120 data-conferencing standard [9]. The recommendation is based on the Real-time Transport Protocol and RTP Control Protocol (RTP/RTCP) for managing audio and video signals.

H.323 addresses the core Internet-telephony applications by defining how delay-sensitive traffic (i.e., voice and video) gets priority transport to ensure real-time communications service over the Internet. (The H.324 specification defines the transport of voice, data, and video over regular telephony networks, while H.320 defines the protocols for transporting voice, data, and video over ISDN networks.)

H.323 is a set of recommendations, one of which is G.729 for audio codecs, which the ITU ratified in November 1995. Despite the ITU recommendation, however, the Voice over IP Forum voted in March 1997 to recommend the G.723.1 specification over the G.729 standard. The industry consortium, which is led by Intel and Microsoft, agreed to sacrifice some sound quality for the sake of greater bandwidth efficiency: G.723.1 requires 6.3 kb/s, while G.729 requires 8 kb/s. Adoption of the audio codec standard, while an important step, is expected to improve reliability and sound quality mostly for Intranet traffic and point-to-point IP connections.

To achieve PSTN-like quality, standards are required to guarantee the delay and jitter of Internet connections. The transport protocol RTP, on which the H.323 recommendation is based, is essentially a new protocol layer for real-time applications; RTP-compliant equipment will include control mechanisms for synchronizing different traffic streams. However, RTP does not have any mechanisms for ensuring the on-time delivery of traffic signals or for recovering lost packets.
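Although RTP does not recover lost packets, its fixed header carries a sequence number and a timestamp, which let a receiver notice gaps and align streams. The sketch below packs and parses the 12-byte fixed header laid out in RFC 1889 (version, marker, payload type, sequence number, timestamp, SSRC), with padding, extension, and CSRC count left at zero. The function names and the sample values are assumptions made for illustration, and the gap check deliberately ignores sequence-number wraparound.

```python
import struct

RTP_VERSION = 2
# Network byte order: two single bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit synchronization source (SSRC) identifier.
HEADER_FORMAT = "!BBHII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build a fixed RTP header with no padding, no extension, no CSRCs."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(HEADER_FORMAT, byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(packet):
    """Parse the fixed header from the start of a packet."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(
        HEADER_FORMAT, packet[:HEADER_SIZE])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def missing_sequence_numbers(received):
    """List sequence numbers absent between the lowest and highest seen.

    Simplification: wraparound past 65535 is not handled.
    """
    if not received:
        return []
    seen = set(received)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


header = pack_rtp_header(payload_type=0, seq=7, timestamp=160, ssrc=0x1234ABCD)
fields = unpack_rtp_header(header)
print(fields)
print(missing_sequence_numbers([1, 2, 4, 5, 8]))
```

Detecting a gap is all the receiver can do at this layer; what to do about it (conceal, interpolate, or ignore) is left to the application.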
RTP also does not address the so-called "quality of service" (QoS) issue related to guaranteed bandwidth availability for specific applications. Currently, there is a draft signaling-protocol standard aimed at strengthening the Internet’s ability to handle real-time traffic reliably (i.e., to dedicate end-to-end transport paths for specific sessions, much as the circuit-switched PSTN does). If adopted, the Resource Reservation Protocol, or RSVP, will be implemented in routers to establish and maintain requested transmission paths and quality-of-service levels.

There is also a need for industry standards in the area of Internet-telephony directory services. Directories are required to ensure interoperability between the Internet and the PSTN, and most current Internet-telephony applications involve proprietary implementations. However, the Lightweight Directory Access Protocol (LDAP v3.0) is said to be emerging as the basis for a new standard.

The ability to digitize and process voice streams using self-contained software building blocks is the key to success with VoIP implementation. VoIP equipment should comply with the H.323 standard, which has been defined by the ITU to describe terminals, equipment, and services for multimedia communication over networks (such as LANs or the Internet) that do not provide a guaranteed QoS. H.323 is a family of software-based standards that define various options for compression and call control. Figure 3 illustrates the functional components of terminals that use the H.323 standards. Table 2 lists the various standards that have been adopted as part of the H.323 family.

Although H.323 is the recognized standard for VoIP terminals, there are additional standards that are better suited to client applications, such as IP phones. As H.323 was originally designed for the desktop, a higher priority was given to rich functionality than to resource allocation.
This has given rise to alternative protocols, listed in Table 3, that can interoperate with H.323.

Figure 3. Voice Gateway/Terminal Functions [2] (diagram labels: Speech, Signaling, Voice Processing, Call Processing, Packet Processing, Network Management, SNMP Messages, IP Packets)

Table 2. H.323 and Related Recommendations

Recommendation: Brief Description
H.323: Document called "Visual telephone systems and equipment for local area networks which provide a non-guaranteed quality of service" (November, 1996)
H.225: Call control messages including signaling, registration, and admissions, and for the packetization and synchronization of media streams, including both point-to-point and multipoint calls
H.245: Messages for opening and closing channels for media streams, and other commands, requests, and indications
H.261: Video codec for audiovisual services at multiples of 64 kb/s
H.263: Specifies a codec for video over the PSTN
G.711: Audio codec for 3.1 kHz bandwidth over 48, 56, and 64 kb/s channels (normal telephony)
G.722: Audio codec for 7 kHz bandwidth over 48, 56, and 64 kb/s channels
G.728: Audio codec for 3.1 kHz bandwidth over 16 kb/s channels
G.723, G.723.1: Audio codec for 3.1 kHz bandwidth over 5.3 and 6.3 kb/s channels (G.723.1 has been selected by the VoIP Forum for use with VoIP)
G.729, G.729a: Audio codec for 3.1 kHz bandwidth over 8 kb/s channels (adopted by the Frame Relay Forum for voice over Frame Relay)
T.120: Data and conference control

Table 3. Other VoIP Protocols

Protocol: Brief Description
SGCP (Simple Gateway Control Protocol): Simple UDP-based protocol for managing endpoints and connections between endpoints
SAP (Session Announcement Protocol): Protocol used by multicast session managers to distribute a multicast session description to a large group of recipients
SIP (Session Initiation Protocol): Protocol used to invite an individual user to take part in a point-to-point or unicast session
RTSP (Real-Time Streaming Protocol): Protocol used to interface to a server that will provide real-time data
SDP (Session Description Protocol): Describes the session for SAP, SIP and RTSP

A VoIP software solution should be designed with well-defined interfaces between the modules; for example, the interface between the voice processing performed on a DSP and the rest of the system must be clearly defined. This also allows the same device to be configured to work with IP, Frame Relay, or ATM without a complete redesign. An example software architecture for VoIP is described in Section 5.

5. AN EXAMPLE SOFTWARE ARCHITECTURE FOR VoIP

Voice and telephone calling can be viewed as one of many applications for an IP network, with software being used to support the application and interface to the network. The emergence of VoIP is a direct result of the advances that were made in hardware and software technologies in the early 1990s. The software functions required for voice-to-packet conversion in a VoIP terminal or gateway are described in [7] as:

• The Voice Processing module, which prepares voice samples for transmission over the packet network. This software typically runs on a DSP.
• The Call Processing (Signaling) module, which serves as a signaling gateway allowing calls to be established across the packet network. This software supports E&M (wink, delay and immediate), loop, or ground start Foreign Exchange Station (FXS) and Foreign Exchange Office (FXO).
• The Packet Processing module, which processes voice and signaling packets, adding the appropriate transport headers prior to submitting the packets to the IP network (or other packet networks). Signaling information is converted from telephony protocols to the packet signaling protocol.
• The Network Management module, which provides management agent functionality, allowing remote fault, accounting, and configuration management to be performed from standard management systems (see the next section). The Network Management module could include ancillary services such as support for security features, access to dialing directories, and remote access support.

The Voice Processing module must include a PCM (Pulse Code Modulation) Interface, which receives samples from the telephony (PCM) interface and forwards them to the appropriate VoIP software module for processing. The PCM interface performs continuous phase re-sampling of output samples to the analog interface.

The module should also include an Echo Cancellation Unit, which performs echo cancellation on sampled, full-duplex voice port signals in accordance with the ITU G.165 or G.168 standard. Since round-trip delay for VoIP is always greater than 50 milliseconds, echo cancellation is a requirement. Operational parameters may be programmable.

A Voice Activity/Idle Noise Detector, which suppresses packet transmission when voice signals are not present (and hence saves additional bandwidth), must also be included. If no activity is detected for a period of time, the voice encoder output will not be transported across the network. Idle noise levels are also measured and reported to the destination so that "comfort noise" can be inserted into the call and the listener does not get "dead air" on the telephone.

The module may include a Tone Detector, which detects the reception of DTMF (Dual-Tone Multi-Frequency) tones and discriminates between voice and facsimile signals. These detections can be used to invoke the appropriate voice processing functions (i.e., the decoding and packetizing of facsimile information or the compression of voice). The Tone Generator will generate DTMF tones and call progress tones under command of the operating system.
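The Tone Generator and Tone Detector described above can be sketched in a few lines. The example below synthesizes a DTMF digit as the sum of its two standard keypad frequencies and identifies it again with the Goertzel algorithm, a common choice for single-frequency detection. The 8000 Hz sample rate, the 50 millisecond tone length, and all function names are assumptions for illustration; a practical detector would also apply power thresholds and twist checks so that speech or silence is not reported as a digit, and this sketch does not attempt voice/facsimile discrimination.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed narrowband telephony sampling rate
ROW_FREQS = (697, 770, 852, 941)       # low-group DTMF frequencies in Hz
COL_FREQS = (1209, 1336, 1477, 1633)   # high-group DTMF frequencies in Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_samples(key, duration_s=0.05, amplitude=0.5):
    """Generate one DTMF digit as the sum of its row and column sinusoids."""
    for r, row in enumerate(KEYPAD):
        if len(key) == 1 and key in row:
            f_low, f_high = ROW_FREQS[r], COL_FREQS[row.index(key)]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    n = round(SAMPLE_RATE * duration_s)
    return [amplitude * (math.sin(2 * math.pi * f_low * t / SAMPLE_RATE)
                         + math.sin(2 * math.pi * f_high * t / SAMPLE_RATE))
            for t in range(n)]


def goertzel_power(samples, freq):
    """Signal power near one frequency, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_dtmf(samples):
    """Pick the strongest row and column frequency (no thresholds applied)."""
    row = max(range(4), key=lambda i: goertzel_power(samples, ROW_FREQS[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, COL_FREQS[i]))
    return KEYPAD[row][col]


tone = dtmf_samples("5")
print(len(tone), detect_dtmf(tone))
```

Goertzel evaluates only the eight frequencies of interest, which is cheaper than a full FFT and fits the DSP setting the text describes.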
An additional Facsimile Processing module will provide a facsimile relay function by demodulating the PCM data, extracting the relevant information, and packing the scan data into packets.

The Packet Voice Protocol module encapsulates the compressed voice and fax data for transmission over the data network. Each packet includes a sequence number that allows the received packets to be delivered in the correct order. The sequence number also allows silence intervals to be reproduced properly and lost packets to be detected.

At the destination, a Voice Playout module buffers the received packets and forwards them to the voice codec for playout. This module provides an adaptive jitter buffer and a measurement mechanism that allows buffer sizes to be adapted to the performance of the network.

The Call Processing (Signaling) subsystem detects the presence of a new call and collects addressing information. Various telephony signaling standards must be supported, and a number of functions must be performed if full telephone calling is to be supported. The interface to the telephone network must be monitored to collect incoming commands and responses. The signaling protocols must be terminated and the information extracted. The signaling information must be mapped into a format that can be used to establish a session across the packet network. Telephone numbers (E.164 dial addresses) must be converted into IP addresses (with the possible need for an external reference to a directory service). Two approaches to dialing are in use: single stage (dial the destination number and use automatic route selection functions), and two stage (dial the VoIP gateway number, then dial the real destination).

6. SUMMARY AND CONCLUSIONS

Data traffic has traditionally been forced to fit onto the voice network.
The Internet, on the other hand, has created an opportunity to reverse this integration strategy: voice and facsimile can now be carried over IP networks, with the integration of video and other multimedia applications close behind. The Internet and its underlying TCP/IP protocol suite have become the driving force for new technologies, with the unique challenges of real-time voice being the latest in a series of developments. Consequently, the market for VoIP products is established and is growing rapidly.

Several factors will influence future developments in VoIP products and services. Currently, the most promising areas for VoIP are corporate Intranets and commercial extranets. Another influential element in the ongoing Internet-telephony evolution is the VoIP gateway. As these gateways evolve from PC-based platforms to robust embedded systems, each will be able to handle hundreds of simultaneous calls. Consequently, corporations will deploy large numbers of them in an effort to reduce the expenses associated with high-volume voice, fax, and video-conferencing traffic. The economics of placing all traffic (data, voice, and video) over an IP-based network will pull companies in this direction, simply because IP will act as a unifying agent, regardless of the underlying architecture (i.e., leased lines, Frame Relay, or ATM) of an organization’s network.

Implementations of VoIP systems must provide interoperability, since in a public environment different products will need to work together. Using common software that complies with all applicable standards can significantly reduce the cost of product development. The VoIP network, whether by design or through management, should be fault-tolerant, with only a very small likelihood of complete failure. In particular, the gateway between the telephone and VoIP systems needs to be highly reliable.
Sufficient capacity must be available in the VoIP system and its gateways to minimize the likelihood of blocked and dropped calls. This will be especially important when the network is shared with data traffic that may cause congestion. Mechanisms for admission control should be available for both the voice and data traffic, with prioritization policies set.

There is potential for extremely high growth rates in VoIP systems, especially if they prove equal to the PSTN in perceived quality at much lower cost. VoIP systems must be flexible enough to grow to serve very large user populations, to allow a mix of public and private services, and to adapt to local regulations. The need for large numbers of addressable points may force the use of improved Internet protocols such as IPv6. Internet network capacity should also be reconsidered.

Telephone systems assume that any telephone can call any other telephone and that multiple telephones can be conferenced across wide areas. In VoIP, this will depend on functions that map between telephone numbers and other types of packet network addresses, specifically IP addresses. There must, of course, exist gateways that allow every device to be reachable.

On the other hand, many are claiming significant economic advantages for the implementation of VoIP. These claims are often based on flat rate prices for Internet service, on the fact that services such as "Internet 911" are not required, and on the absence of any regulatory prohibition against interconnection of telephone systems with IP systems. It is also assumed that higher performance compression will not be used in the telephone network to reduce costs. If circumstances change, the motivation for VoIP purely for cost avoidance reasons may change as well.

REFERENCES

[1] “Voice Over IP (VoIP)”, Tech Papers, http://www.protocols.com/papers/voip.html
[2] J. Ryan, Voice Over IP (VoIP), The Technology Guide Series, http://www.techguide.com, 1998.
[3] “About VocalTec”, http://www.vocaltec.com/about/aboutus.htm
[4] Understanding Voice over IP, Telecommunications Research Associates, 1999.
[5] “Lucent Technologies announces Internet Call Centers”, http://public1.lucent.com/press/1097/971007.bca.html
[6] “Voice Over IP Unite As Technologies Mature”, http://www.micom.com/international/tech/vipunite.htm
[7] E.B. Morgan, “Voice Over Packet” White Paper, Telogy Networks, 1998.
[8] Internet Telephony Tutorial, http://www.webproforum.com/int_tele, 1999.
[9] A Primer on the H.323 Series Standard, http://www.databeam.com/h323/h323primer.html
[10] R.C. Levine, EE8302 Digital Telephony Class Notes, Southern Methodist University, TX, USA, Fall 1999.