432 CHAPTER 14 / CELLULAR WIRELESS NETWORKS

Table 14.3 IS-95 Forward Link Channel Parameters

Channel                        Sync     Paging            Traffic Rate Set 1                  Traffic Rate Set 2
Data rate (bps)                1200     4800     9600     1200     2400     4800     9600     1800     3600     7200     14,400
Code repetition                2        2        1        8        4        2        1        8        4        2        1
Modulation symbol rate (sps)   4800     19,200   19,200   19,200   19,200   19,200   19,200   19,200   19,200   19,200   19,200
PN chips/modulation symbol     256      64       64       64       64       64       64       64       64       64       64
PN chips/bit                   1024     256      128      1024     512      256      128      682.67   341.33   170.67   85.33

IS-95

The most widely used second-generation CDMA scheme is IS-95, which is primarily deployed in North America. The transmission structures on the forward and reverse links differ and are described separately.

IS-95 Forward Link

Table 14.3 lists forward link channel parameters. The forward link consists of up to 64 logical CDMA channels, each occupying the same 1228-kHz bandwidth (Figure 14.10a). The forward link supports four types of channels:

[Figure 14.10 IS-95 Channel Structure: (a) forward channels: 64 Walsh-code channels in 1.228 MHz, comprising the pilot channel (0), paging channels (1-7), the synchronization channel (32), and traffic channels 1-55 on the remaining code channels; (b) reverse channels: distinct long-code channels in 1.228 MHz, comprising access channels 1-32 and user-specific long-code traffic channels 1-62]

• Pilot (channel 0): A continuous signal on a single channel. This channel allows the mobile unit to acquire timing information, provides a phase reference for the demodulation process, and provides a means for signal strength comparison for the purpose of handoff determination. The pilot channel consists of all zeros.

• Synchronization (channel 32): A 1200-bps channel used by the mobile station to obtain identification information about the cellular system (system time, long code state, protocol revision, etc.).
• Paging (channels 1 to 7): Contain messages for one or more mobile stations.

• Traffic (channels 8 to 31 and 33 to 63): The forward channel supports 55 traffic channels. The original specification supported data rates of up to 9600 bps. A subsequent revision added a second set of rates up to 14,400 bps.

Note that all of these channels use the same bandwidth. The chipping code is used to distinguish among the different channels. For the forward channel, the chipping codes are the 64 orthogonal 64-bit codes derived from a 64 × 64 matrix known as the Walsh matrix (discussed in [STAL05]).

Figure 14.11 shows the processing steps for transmission on a forward traffic channel using rate set 1. For voice traffic, the speech is encoded at a data rate of 8550 bps. After additional bits are added for error detection, the rate is 9600 bps. The full channel capacity is not used when the user is not speaking: during quiet periods the data rate is lowered to as little as 1200 bps. The 2400-bps rate is used to transmit transients in the background noise, and the 4800-bps rate is used to mix digitized speech and signaling data.

The data or digitized speech is transmitted in 20-ms blocks with forward error correction provided by a convolutional encoder with rate 1/2, thus doubling the effective transmission rate to a maximum of 19.2 kbps. For lower data rates, the encoder output bits (called code symbols) are replicated to yield the 19.2-kbps rate. The data are then interleaved in blocks to reduce the effects of errors by spreading them out.

Following the interleaver, the data bits are scrambled. The scrambling serves as a privacy mask and prevents the sending of repetitive patterns, which in turn reduces the probability of users sending at peak power at the same time. The scrambling is accomplished by means of a long code that is generated as a pseudorandom number from a 42-bit-long shift register. The shift register is initialized with the user's electronic serial number.
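The orthogonality of the 64 Walsh rows is what allows all forward channels to share the same band. As a minimal sketch (not from the text; the function names are our own), the recursive Walsh construction and an orthogonality check can be written as:

```python
# Sketch: build an n x n Walsh (Hadamard) matrix over {0, 1} by recursive
# doubling, then confirm that all 64 rows are mutually orthogonal when
# mapped to +/-1 form, which is the property the forward channels rely on.

def walsh(n):
    """Return the n x n Walsh matrix over {0, 1}; n must be a power of 2."""
    m = [[0]]
    while len(m) < n:
        # W_2k = [[W_k, W_k], [W_k, NOT W_k]]
        m = [row + row for row in m] + \
            [row + [b ^ 1 for b in row] for row in m]
    return m

def correlate(a, b):
    """Correlation of two rows in +/-1 form; 0 means orthogonal."""
    return sum((1 - 2 * x) * (1 - 2 * y) for x, y in zip(a, b))

w = walsh(64)
assert all(correlate(w[i], w[j]) == 0
           for i in range(64) for j in range(64) if i != j)
print("all 64 rows mutually orthogonal")
```

Each row correlates to 64 with itself and to 0 with every other row, so a receiver that correlates against its assigned row rejects the other 63 channels.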
The output of the long code generator is at a rate of 1.2288 Mbps, which is 64 times the 19.2-kbps rate, so only one bit in 64 is selected (by the decimator function). The resulting stream is XORed with the output of the block interleaver.

The next step in the processing inserts power control information in the traffic channel, to control the power output of the antenna. The power control function of the base station robs the traffic channel of bits at a rate of 800 bps; these are inserted by stealing code bits. The 800-bps channel carries information directing the mobile unit to increment, decrement, or keep stable its current output level. This power control stream is multiplexed into the 19.2 kbps by replacing some of the code bits, using the long code generator to encode the bits.

[Figure 14.11 IS-95 Forward Link Transmission: forward traffic channel information bits (172, 80, 40, or 16 b/frame; 8.6, 4.0, 2.0, or 0.8 kbps) → add frame quality indicators for 9600 and 4800 bps rates (9.2, 4.4, 2.0, or 0.8 kbps) → add 8-bit encoder tail (9.6, 4.8, 2.4, or 1.2 kbps) → convolutional encoder (n, k, K) = (2, 1, 9) (19.2, 9.6, 4.8, or 2.4 kbps code symbols) → symbol repetition (19.2 ksps) → block interleaver → XOR with long code for user m (1.2288 Mbps, decimated to 19.2 kbps) → multiplexor with 800-bps power control bits → XOR with Walsh code n and PN chips (1.2288 Mbps) → QPSK modulator → transmitted signal]

The next step in the process is the DS-SS function, which spreads the 19.2 kbps to a rate of 1.2288 Mbps using one row of the 64 × 64 Walsh matrix. One row of the matrix is assigned to a mobile station during call setup. If a 0 bit is presented to the XOR function, then the 64 bits of the assigned row are sent. If a 1 is presented, then the bitwise complement of the row is sent. Thus, the final bit rate is 1.2288 Mbps.
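The forward DS-SS step described above reduces, per code symbol, to XORing the symbol with each chip of the assigned Walsh row. A minimal sketch (illustrative, not the standard's exact implementation; the row shown is just one valid Walsh row):

```python
# Sketch: spread one code symbol with an assigned 64-chip Walsh row.
# A 0 symbol sends the row unchanged; a 1 symbol sends its complement,
# i.e. each output chip is symbol XOR chip. 19.2 ksps x 64 = 1.2288 Mcps.

def spread_symbol(symbol, walsh_row):
    return [symbol ^ chip for chip in walsh_row]

def despread_symbol(chips, walsh_row):
    """Correlate in +/-1 form; the sign of the sum recovers the symbol."""
    corr = sum((1 - 2 * c) * (1 - 2 * w) for c, w in zip(chips, walsh_row))
    return 0 if corr > 0 else 1

row = [0, 1] * 32                     # one (hypothetical) assigned Walsh row
for sym in (0, 1):
    chips = spread_symbol(sym, row)
    assert len(chips) == 64           # 64 chips per code symbol
    assert despread_symbol(chips, row) == sym
print("spread/despread round trip ok")
```

Because the complement of a row correlates to −64 against the row itself, the receiver recovers the symbol from the sign of a single correlation.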
This digital bit stream is then modulated onto the carrier using a QPSK modulation scheme. Recall from Chapter 5 that QPSK involves creating two bit streams that are separately modulated (see Figure 5.11). In the IS-95 scheme, the data are split into I and Q (in-phase and quadrature) channels and the data in each channel are XORed with a unique short code. The short codes are generated as pseudorandom numbers from a 15-bit-long shift register.

Table 14.4 IS-95 Reverse Link Channel Parameters

Channel                               Access    Traffic-Rate Set 1                    Traffic-Rate Set 2
Data rate (bps)                       4800      1200     2400     4800     9600      1800     3600     7200     14,400
Code rate                             1/3       1/3      1/3      1/3      1/3       1/2      1/2      1/2      1/2
Symbol rate before repetition (sps)   14,400    3600     7200     14,400   28,800    3600     7200     14,400   28,800
Symbol repetition                     2         8        4        2        1         8        4        2        1
Symbol rate after repetition (sps)    28,800    28,800   28,800   28,800   28,800    28,800   28,800   28,800   28,800
Transmit duty cycle                   1         1/8      1/4      1/2      1         1/8      1/4      1/2      1
Code symbols/modulation symbol        6         6        6        6        6         6        6        6        6
PN chips/modulation symbol            256       256      256      256      256       256      256      256      256
PN chips/bit                          256       128      128      128      128       256/3    256/3    256/3    256/3

IS-95 Reverse Link

Table 14.4 lists reverse link channel parameters. The reverse link consists of up to 94 logical CDMA channels, each occupying the same 1228-kHz bandwidth (Figure 14.10b). The reverse link supports up to 32 access channels and up to 62 traffic channels. The traffic channels in the reverse link are mobile unique: each station has a unique long code mask based on its electronic serial number. The long code mask is a 42-bit number, so there are 2^42 − 1 different masks. The access channel is used by a mobile to initiate a call, to respond to a paging channel message from the base station, and for a location update.

Figure 14.12 shows the processing steps for transmission on a reverse traffic channel using rate set 1. The first few steps are the same as for the forward channel.
For the reverse channel, the convolutional encoder has a rate of 1/3, thus tripling the effective transmission rate to a maximum of 28.8 kbps. The data are then block interleaved.

The next step is a spreading of the data using the Walsh matrix. The way in which the matrix is used, and its purpose, are different from that of the forward channel. In the reverse channel, the data coming out of the block interleaver are grouped in units of 6 bits. Each 6-bit unit serves as an index to select a row of the 64 × 64 Walsh matrix (2^6 = 64), and that row is substituted for the input. Thus the data rate is expanded by a factor of 64/6 to 307.2 kbps. The purpose of this encoding is to improve reception at the base station. Because the 64 possible codings are orthogonal, the block coding enhances the decision-making algorithm at the receiver and is also computationally efficient (see [PETE95] for details). We can view this Walsh modulation as a form of block error-correcting code with (n, k) = (64, 6) and dmin = 32. In fact, all distances are 32.

The data burst randomizer is implemented to help reduce interference from other mobile stations (see [BLAC99b] for a discussion). The operation involves using the long code mask to smooth the data out over each 20-ms frame.

[Figure 14.12 IS-95 Reverse Link Transmission: reverse traffic channel information bits (172, 80, 40, or 16 b/frame; 8.6, 4.0, 2.0, or 0.8 kbps) → add frame quality indicators for 9600 and 4800 bps rates → add 8-bit encoder tail (9.6, 4.8, 2.4, or 1.2 kbps) → convolutional encoder (n, k, K) = (3, 1, 9) (28.8, 14.4, 7.2, or 3.6 kbps code symbols) → symbol repetition (28.8 ksps) → block interleaver → 64-ary orthogonal modulator (4.8 ksps modulation symbols; 307.2 kcps Walsh chips) → data burst randomizer (long code mask; 1.2288-Mbps long code generator) → OQPSK modulator → transmitted signal]
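The 64-ary orthogonal modulator described above can be sketched directly: take code symbols six at a time, treat each group as an index 0..63, and emit the corresponding Walsh row. (A minimal sketch, not the standard's exact bit ordering; names are our own.)

```python
# Sketch: reverse-link 64-ary orthogonal modulation. 6 code symbols select
# one 64-chip Walsh row, so the rate expands by 64/6 (28.8 ksps -> 307.2 kcps).

def walsh(n):
    """n x n Walsh matrix over {0, 1}, built by recursive doubling."""
    m = [[0]]
    while len(m) < n:
        m = [r + r for r in m] + [r + [b ^ 1 for b in r] for r in m]
    return m

W64 = walsh(64)

def orthogonal_modulate(symbols):
    """Map each 6-symbol group to the Walsh row it indexes."""
    assert len(symbols) % 6 == 0
    out = []
    for i in range(0, len(symbols), 6):
        index = int("".join(map(str, symbols[i:i + 6])), 2)  # 6 bits -> 0..63
        out.extend(W64[index])
    return out

chips = orthogonal_modulate([1, 0, 1, 1, 0, 0] * 8)  # 48 symbols -> 8 rows
assert len(chips) == 8 * 64                          # rate expands by 64/6
```

Since the 64 rows are mutually orthogonal, the base station can decide among them by correlation, which is the receiver advantage the text describes.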
The next step in the process is the DS-SS function. In the case of the reverse channel, the long code unique to the mobile is XORed with the output of the randomizer to produce the 1.2288-Mbps final data stream. This digital bit stream is then modulated onto the carrier using an offset QPSK modulation scheme. This differs from the forward channel in the use of a delay element in the modulator (Figure 5.11) to produce orthogonality. The reason the modulators are different is that in the forward channel, the spreading codes are orthogonal, all coming from the Walsh matrix, whereas in the reverse channel, orthogonality of the spreading codes is not guaranteed.

14.4 THIRD-GENERATION SYSTEMS

The objective of the third generation (3G) of wireless communication is to provide fairly high-speed wireless communications to support multimedia, data, and video in addition to voice. The ITU's International Mobile Telecommunications for the year 2000 (IMT-2000) initiative has defined the ITU's view of third-generation capabilities as follows:

• Voice quality comparable to the public switched telephone network
• 144-kbps data rate available to users in high-speed motor vehicles over large areas
• 384 kbps available to pedestrians standing or moving slowly over small areas
• Support (to be phased in) for 2.048 Mbps for office use
• Symmetrical and asymmetrical data transmission rates
• Support for both packet-switched and circuit-switched data services
• An adaptive interface to the Internet to reflect efficiently the common asymmetry between inbound and outbound traffic
• More efficient use of the available spectrum in general
• Support for a wide variety of mobile equipment
• Flexibility to allow the introduction of new services and technologies

More generally, one of the driving forces of modern communication technology is the trend toward universal personal telecommunications and universal communications access.
The first concept refers to the ability of a person to identify himself or herself easily and use conveniently any communication system in an entire country, over a continent, or even globally, in terms of a single account. The second refers to the capability of using one's terminal in a wide variety of environments to connect to information services (e.g., to have a portable terminal that will work in the office, on the street, and on airplanes equally well). This revolution in personal computing will obviously involve wireless communication in a fundamental way. Personal communications services (PCSs) and personal communication networks (PCNs) are names attached to these concepts of global wireless communications, and they also form objectives for third-generation wireless.

Generally, the technology planned is digital using time division multiple access or code division multiple access to provide efficient use of the spectrum and high capacity. PCS handsets are designed to be low power and relatively small and light. Efforts are being made internationally to allow the same terminals to be used worldwide.

Alternative Interfaces

Figure 14.13 shows the alternative schemes that have been adopted as part of IMT-2000. The specification covers a set of radio interfaces for optimized performance in different radio environments.

[Figure 14.13 IMT-2000 Terrestrial Radio Interfaces: IMT-DS Direct spread (W-CDMA), IMT-MC Multicarrier (cdma2000), and IMT-TC Time code (TD-CDMA) for CDMA-based networks; IMT-SC Single carrier (TDD) for TDMA-based networks; IMT-FT Frequency-time (DECT) for TDMA- and FDMA-based networks]

A major reason for the inclusion of five alternatives was to enable a smooth evolution from existing first- and second-generation systems. The five alternatives reflect the evolution from the second generation.
Two of the specifications grow out of the work at the European Telecommunications Standards Institute (ETSI) to develop UMTS (universal mobile telecommunications system) as Europe's 3G wireless standard. UMTS includes two standards. One of these is known as wideband CDMA, or W-CDMA. This scheme fully exploits CDMA technology to provide high data rates with efficient use of bandwidth. Table 14.5 shows some of the key parameters of W-CDMA. The other European effort under UMTS is known as IMT-TC, or TD-CDMA. This approach is a combination of W-CDMA and TDMA technology. IMT-TC is intended to provide an upgrade path for the TDMA-based GSM systems.

Another CDMA-based system, known as cdma2000, has a North American origin. This scheme is similar to, but incompatible with, W-CDMA, in part because the standards use different chip rates. Also, cdma2000 uses a technique known as multicarrier, not used with W-CDMA.

Two other interface specifications are shown in Figure 14.13. IMT-SC is primarily designed for TDMA-only networks. IMT-FT can be used by both TDMA and FDMA carriers to provide some 3G services; it is an outgrowth of the Digital European Cordless Telecommunications (DECT) standard.

Table 14.5 W-CDMA Parameters

Channel bandwidth              5 MHz
Forward RF channel structure   Direct spread
Chip rate                      3.84 Mcps
Frame length                   10 ms
Number of slots/frame          15
Spreading modulation           Balanced QPSK (forward); dual-channel QPSK (reverse); complex spreading circuit
Data modulation                QPSK (forward); BPSK (reverse)
Coherent detection             Pilot symbols
Reverse channel multiplexing   Control and pilot channel time multiplexed; I and Q multiplexing for data and control channels
Multirate                      Various spreading and multicode
Spreading factors              4 to 256
Power control                  Open and fast closed loop (1.6 kHz)
Spreading (forward)            Variable-length orthogonal sequences for channel separation; Gold sequences of length 2^18 for cell and user separation
Spreading (reverse)            Same as forward; different time shifts in I and Q channels

CDMA Design Considerations

The dominant technology for 3G systems is CDMA. Although three different CDMA schemes have been adopted, they share some common design issues. [OJAN98] lists the following:

• Bandwidth: An important design goal for all 3G systems is to limit channel usage to 5 MHz. There are several reasons for this goal. On the one hand, a bandwidth of 5 MHz or more improves the receiver's ability to resolve multipath when compared to narrower bandwidths. On the other hand, available spectrum is limited by competing needs, and 5 MHz is a reasonable upper limit on what can be allocated for 3G. Finally, 5 MHz is adequate for supporting data rates of 144 and 384 kbps, the main targets for 3G services.

• Chip rate: Given the bandwidth, the chip rate depends on desired data rate, the need for error control, and bandwidth limitations. A chip rate of 3 Mcps or more is reasonable given these design parameters.

• Multirate: The term multirate refers to the provision of multiple fixed-data-rate logical channels to a given user, in which different data rates are provided on different logical channels. Further, the traffic on each logical channel can be switched independently through the wireless and fixed networks to different destinations. The advantage of multirate is that the system can flexibly support multiple simultaneous applications from a given user and can efficiently use available capacity by only providing the capacity required for each service. Multirate can be achieved with a TDMA scheme within a single CDMA channel, in which a different number of slots per frame are assigned to achieve different data rates.
All the subchannels at a given data rate would be protected by error correction and interleaving techniques (Figure 14.14a). An alternative is to use multiple CDMA codes, with separate coding and interleaving, and map them to separate CDMA channels (Figure 14.14b).

[Figure 14.14 Time and Code Multiplexing Principles [OJAN98]: (a) time multiplexing: parallel services are time multiplexed with outer coding/interleaving; (b) code multiplexing: each parallel service has its own outer coding/interleaving and is carried on a separate code channel]

14.5 RECOMMENDED READING AND WEB SITES

[BERT94] and [ANDE95] are instructive surveys of cellular wireless propagation effects. [BLAC99b] is one of the best technical treatments of second-generation cellular systems. [TANT98] contains reprints of numerous important papers dealing with CDMA in cellular networks. [DINA98] provides an overview of both PN and orthogonal spreading codes for cellular CDMA networks. [OJAN98] provides an overview of key technical design considerations for 3G systems. Another useful survey is [ZENG00]. [PRAS00] is a much more detailed analysis.

ANDE95 Anderson, J.; Rappaport, T.; and Yoshida, S. "Propagation Measurements and Models for Wireless Communications Channels." IEEE Communications Magazine, January 1995.

BERT94 Bertoni, H.; Honcharenko, W.; Maciel, L.; and Xia, H. "UHF Propagation Prediction for Wireless Personal Communications." Proceedings of the IEEE, September 1994.

BLAC99b Black, U. Second-Generation Mobile and Wireless Networks. Upper Saddle River, NJ: Prentice Hall, 1999.

DINA98 Dinan, E., and Jabbari, B. "Spreading Codes for Direct Sequence CDMA and Wideband CDMA Cellular Networks." IEEE Communications Magazine, September 1998.

OJAN98 Ojanpera, T., and Prasad, G. "An Overview of Air Interface Multiple Access for IMT-2000/UMTS." IEEE Communications Magazine, September 1998.
PRAS00 Prasad, R.; Mohr, W.; and Konhauser, W., eds. Third-Generation Mobile Communication Systems. Boston: Artech House, 2000.

TANT98 Tantaratana, S., and Ahmed, K., eds. Wireless Applications of Spread Spectrum Systems: Selected Readings. Piscataway, NJ: IEEE Press, 1998.

ZENG00 Zeng, M.; Annamalai, A.; and Bhargava, V. "Harmonization of Global Third-Generation Mobile Systems." IEEE Communications Magazine, December 2000.

Recommended Web sites:

• Cellular Telecommunications and Internet Association: An industry consortium that provides information on successful applications of wireless technology.

• CDMA Development Group: Information and links for IS-95 and CDMA generally.

• 3G Americas: A trade group of Western Hemisphere companies supporting a variety of second- and third-generation schemes. Includes industry news, white papers, and other technical information.

14.6 KEY TERMS, REVIEW QUESTIONS, AND PROBLEMS

Key Terms

adaptive equalization; Advanced Mobile Phone Service (AMPS); base station; cellular network; code division multiple access (CDMA); diffraction; diversity; fading; fast fading; flat fading; first-generation (1G) network; forward channel; frequency diversity; frequency reuse; handoff; mobile radio; power control; reflection; reuse factor; reverse channel; scattering; second-generation (2G) network; selective fading; slow fading; space diversity; third-generation (3G) network

Review Questions

14.1. What geometric shape is used in cellular system design?
14.2. What is the principle of frequency reuse in the context of a cellular network?
14.3. List five ways of increasing the capacity of a cellular system.
14.4. Explain the paging function of a cellular system.
14.5. What is fading?
14.6. What is the difference between diffraction and scattering?
14.7. What is the difference between fast and slow fading?
14.8. What is the difference between flat and selective fading?
14.9. What are the key differences between first- and second-generation cellular systems?
14.10. What are the advantages of using CDMA for a cellular network?
14.11. What are the disadvantages of using CDMA for a cellular network?
14.12. What are some key characteristics that distinguish third-generation cellular systems from second-generation cellular systems?

Problems

14.1 Consider four different cellular systems that share the following characteristics. The frequency bands are 825 to 845 MHz for mobile unit transmission and 870 to 890 MHz for base station transmission. A duplex circuit consists of one 30-kHz channel in each direction. The systems are distinguished by the reuse factor, which is 4, 7, 12, and 19, respectively.
a. Suppose that in each of the systems, the cluster of cells (4, 7, 12, 19) is duplicated 16 times. Find the number of simultaneous communications that can be supported by each system.
b. Find the number of simultaneous communications that can be supported by a single cell in each system.
c. What is the area covered, in cells, by each system?
d. Suppose the cell size is the same in all four systems and a fixed area of 100 cells is covered by each system. Find the number of simultaneous communications that can be supported by each system.

14.2 Describe a sequence of events similar to that of Figure 14.6 for
a. a call from a mobile unit to a fixed subscriber
b. a call from a fixed subscriber to a mobile unit

14.3 An analog cellular system has a total of 33 MHz of bandwidth and uses two 25-kHz simplex (one-way) channels to provide full duplex voice and control channels.
a. What is the number of channels available per cell for a frequency reuse factor of (1) 4 cells, (2) 7 cells, and (3) 12 cells?
b. Assume that 1 MHz is dedicated to control channels but that only one control channel is needed per cell. Determine a reasonable distribution of control channels and voice channels in each cell for the three frequency reuse factors of part (a).
14.4 A cellular system uses FDMA with a spectrum allocation of 12.5 MHz in each direction, a guard band at the edge of the allocated spectrum of 10 kHz, and a channel bandwidth of 30 kHz. What is the number of available channels?

14.5 For a cellular system, FDMA spectral efficiency is defined as η_a = (B_c × N_T)/B_w, where
B_c = channel bandwidth
B_w = total bandwidth in one direction
N_T = total number of voice channels in the covered area
What is an upper bound on η_a?

14.6 Walsh codes are the most common orthogonal codes used in CDMA applications. A set of Walsh codes of length n consists of the n rows of an n × n Walsh matrix. That is, there are n codes, each of length n. The matrix is defined recursively as follows:

W_1 = (0)        W_2n = ( W_n   W_n  )
                        ( W_n   W̄_n )

where n is the dimension of the matrix and the overscore denotes the logical NOT of the bits in the matrix. The Walsh matrix has the property that every row is orthogonal to every other row and to the logical NOT of every other row. Show the Walsh matrices of dimensions 2, 4, and 8.

14.7 Demonstrate that the codes in an 8 × 8 Walsh matrix are orthogonal to each other by showing that multiplying any code by any other code produces a result of zero.

14.8 Consider a CDMA system in which users A and B have the Walsh codes (−1 1 −1 1 −1 1 −1 1) and (−1 −1 1 1 −1 −1 1 1), respectively.
a. Show the output at the receiver if A transmits a data bit 1 and B does not transmit.
b. Show the output at the receiver if A transmits a data bit 0 and B does not transmit.
c. Show the output at the receiver if A transmits a data bit 1 and B transmits a data bit 1. Assume the received power from both A and B is the same.
d. Show the output at the receiver if A transmits a data bit 0 and B transmits a data bit 1. Assume the received power from both A and B is the same.
e. Show the output at the receiver if A transmits a data bit 1 and B transmits a data bit 0.
Assume the received power from both A and B is the same.
f. Show the output at the receiver if A transmits a data bit 0 and B transmits a data bit 0. Assume the received power from both A and B is the same.
g. Show the output at the receiver if A transmits a data bit 1 and B transmits a data bit 1. Assume the received power from B is twice the received power from A. This can be represented by showing the received signal component from A as consisting of elements of magnitude (+1, −1) and the received signal component from B as consisting of elements of magnitude (+2, −2).
h. Show the output at the receiver if A transmits a data bit 0 and B transmits a data bit 1. Assume the received power from B is twice the received power from A.

PART FOUR Local Area Networks

The trend in local area networks (LANs) involves the use of shared transmission media or shared switching capacity to achieve high data rates over relatively short distances. Several key issues present themselves. One is the choice of transmission medium. Whereas coaxial cable was commonly used in traditional LANs, contemporary LAN installations emphasize the use of twisted pair or optical fiber. In the case of twisted pair, efficient encoding schemes are needed to enable high data rates over the medium. Wireless LANs have also assumed increased importance. Another design issue is that of access control.

ROAD MAP FOR PART FOUR

Chapter 15 Local Area Network Overview

The essential technology underlying all forms of LANs comprises topology, transmission medium, and medium access control technique. Chapter 15 examines the first two of these elements. Four topologies are in common use: bus, tree, ring, and star. The most common transmission media for local networking are twisted pair (unshielded and shielded), coaxial cable (baseband and broadband), optical fiber, and wireless (microwave and infrared).
These topologies and transmission media are discussed, with the exception of wireless, which is covered in Chapter 17. The increasing deployment of LANs has led to an increased need to interconnect LANs with each other and with WANs. Chapter 15 also discusses a key device used in interconnecting LANs: the bridge.

Chapter 16 High-Speed LANs

Chapter 16 looks in detail at the topologies, transmission media, and MAC protocols of the most important LAN systems in current use; all of these have been defined in standards documents. The most important of these is Ethernet, which has been deployed in versions at 10 Mbps, 100 Mbps, 1 Gbps, and 10 Gbps. Then the chapter looks at Fibre Channel.

Chapter 17 Wireless LANs

Wireless LANs use one of three transmission techniques: spread spectrum, narrowband microwave, and infrared. Chapter 17 provides an overview of wireless LAN technology and applications. The most significant set of standards defining wireless LANs are those defined by the IEEE 802.11 committee. Chapter 17 examines this set of standards in depth.

CHAPTER 15 LOCAL AREA NETWORK OVERVIEW

15.1 Background
15.2 Topologies and Transmission Media
15.3 LAN Protocol Architecture
15.4 Bridges
15.5 Layer 2 and Layer 3 Switches
15.6 Recommended Reading and Web Site
15.7 Key Terms, Review Questions, and Problems

The whole of this operation is described in minute detail in the official British Naval History, and should be studied with its excellent charts by those who are interested in its technical aspect. So complicated is the full story that the lay reader cannot see the wood for the trees. I have endeavored to render intelligible the broad effects.
—The World Crisis, Winston Churchill

KEY POINTS

• A LAN consists of a shared transmission medium and a set of hardware and software for interfacing devices to the medium and regulating the orderly access to the medium.

• The topologies that have been used for LANs are ring, bus, tree, and star.
A ring LAN consists of a closed loop of repeaters that allow data to circulate around the ring. A repeater may also function as a device attachment point. Transmission is generally in the form of frames. The bus and tree topologies are passive sections of cable to which stations are attached. A transmission of a frame by any one station can be heard by any other station. A star LAN includes a central node to which stations are attached.

• A set of standards has been defined for LANs that specifies a range of data rates and encompasses a variety of topologies and transmission media.

• In most cases, an organization will have multiple LANs that need to be interconnected. The simplest approach to meeting this requirement is the bridge.

• Hubs and switches form the basic building blocks of most LANs.

We turn now to a discussion of local area networks (LANs). Whereas wide area networks may be public or private, LANs usually are owned by the organization that is using the network to interconnect equipment. LANs have much greater capacity than wide area networks, to carry what is generally a greater internal communications load. In this chapter we look at the underlying technology and protocol architecture of LANs. Chapters 16 and 17 are devoted to a discussion of specific LAN systems.

15.1 BACKGROUND

The variety of applications for LANs is wide. To provide some insight into the types of requirements that LANs are intended to meet, this section provides a brief discussion of some of the most important general application areas for these networks.

Personal Computer LANs

A common LAN configuration is one that supports personal computers. With the relatively low cost of such systems, individual managers within organizations often independently procure personal computers for departmental applications, such as spreadsheet and project management tools, and Internet access.
But a collection of department-level processors will not meet all of an organization’s needs; central processing facilities are still required. Some programs, such as econometric forecasting models, are too big to run on a small computer. Corporate-wide data files, such as accounting and payroll, require a centralized facility but should be accessible to a number of users. In addition, there are other kinds of files that, although specialized, must be shared by a number of users. Further, there are sound reasons for connecting individual intelligent workstations not only to a central facility but to each other as well. Members of a project or organization team need to share work and information. By far the most efficient way to do so is digitally. Certain expensive resources, such as a disk or a laser printer, can be shared by all users of the departmental LAN. In addition, the network can tie into larger corporate network facilities. For example, the corporation may have a building-wide LAN and a wide area private network. A communications server can provide controlled access to these resources. LANs for the support of personal computers and workstations have become nearly universal in organizations of all sizes. Even those sites that still depend heavily on the mainframe have transferred much of the processing load to networks of personal computers. Perhaps the prime example of the way in which personal computers are being used is to implement client/server applications. For personal computer networks, a key requirement is low cost. In particular, the cost of attachment to the network must be significantly less than the cost of the attached device. Thus, for the ordinary personal computer, an attachment cost in the hundreds of dollars is desirable. For more expensive, high-performance workstations, higher attachment costs can be tolerated. 
Backend Networks and Storage Area Networks

Backend networks are used to interconnect large systems such as mainframes, supercomputers, and mass storage devices. The key requirement here is for bulk data transfer among a limited number of devices in a small area. High reliability is generally also a requirement. Typical characteristics include the following:

• High data rate: To satisfy the high-volume demand, data rates of 100 Mbps or more are required.
• High-speed interface: Data transfer operations between a large host system and a mass storage device are typically performed through high-speed parallel I/O interfaces, rather than slower communications interfaces. Thus, the physical link between station and network must be high speed.
• Distributed access: Some sort of distributed medium access control (MAC) technique is needed to enable a number of devices to share the transmission medium with efficient and reliable access.
• Limited distance: Typically, a backend network will be employed in a computer room or a small number of contiguous rooms.
• Limited number of devices: The number of expensive mainframes and mass storage devices found in the computer room generally numbers in the tens of devices.

Typically, backend networks are found at sites of large companies or research installations with large data processing budgets. Because of the scale involved, a small difference in productivity can translate into a sizable difference in cost. Consider a site that uses a dedicated mainframe computer. This implies a fairly large application or set of applications. As the load at the site grows, the existing mainframe may be replaced by a more powerful one, perhaps a multiprocessor system. At some sites, a single-system replacement will not be able to keep up; equipment performance growth rates will be exceeded by demand growth rates. The facility will eventually require multiple independent computers.
Again, there are compelling reasons for interconnecting these systems. The cost of system interruption is high, so it should be possible, easily and quickly, to shift applications to backup systems. It must be possible to test new procedures and applications without degrading the production system. Large bulk storage files must be accessible from more than one computer. Load leveling should be possible to maximize utilization and performance.

It can be seen that some key requirements for backend networks differ from those for personal computer LANs. High data rates are required to keep up with the work, which typically involves the transfer of large blocks of data. The equipment for achieving high speeds is expensive. Fortunately, given the much higher cost of the attached devices, such costs are reasonable.

A concept related to that of the backend network is the storage area network (SAN). A SAN is a separate network to handle storage needs. The SAN detaches storage tasks from specific servers and creates a shared storage facility across a high-speed network. The collection of networked storage devices can include hard disks, tape libraries, and CD arrays. Most SANs use Fibre Channel, which is described in Chapter 16. Figure 15.1 contrasts the SAN with the traditional server-based means of supporting shared storage. In a typical large LAN installation, a number of servers and perhaps mainframes each has its own dedicated storage devices. If a client needs access to a particular storage device, it must go through the server that controls that device. In a SAN, no server sits between the storage devices and the network; instead, the storage devices and servers are linked directly to the network. The SAN arrangement improves client-to-storage access efficiency, as well as direct storage-to-storage communications for backup and replication functions.
[Figure 15.1 The Use of Storage Area Networks [HURW98]: (a) server-based storage, in which each server or mainframe has its own storage devices; (b) storage area network, in which servers, mainframes, and storage devices attach directly to the network]

High-Speed Office Networks

Traditionally, the office environment has included a variety of devices with low- to medium-speed data transfer requirements. However, applications in today's office environment would overwhelm the limited speeds (up to 10 Mbps) of traditional LANs. Desktop image processors have increased network data flow by an unprecedented amount. Examples of these applications include fax machines, document image processors, and graphics programs on personal computers and workstations. Consider that a typical page with 200 picture elements, or pels (black or white points), per inch resolution (which is adequate but not high resolution) generates 3,740,000 bits (8.5 inches × 11 inches × 40,000 pels per square inch). Even with compression techniques, this will generate a tremendous load. In addition, disk technology and price/performance have evolved so that desktop storage capacities of multiple gigabytes are common. These new demands require LANs with high speed that can support the larger numbers and greater geographic extent of office systems as compared to backend systems. (A picture element, or pel, is the smallest discrete scanning-line sample of a facsimile system, which contains only black-white information, with no gray scales; a pixel is a picture element that contains gray-scale information.)

Backbone LANs

The increasing use of distributed processing applications and personal computers has led to a need for a flexible strategy for local networking. Support of premises-wide data communications requires a networking service that is capable of spanning the distances involved and that interconnects equipment in a single (perhaps large) building
or a cluster of buildings. Although it is possible to develop a single LAN to interconnect all the data processing equipment of a premises, this is probably not a practical alternative in most cases. There are several drawbacks to a single-LAN strategy:

• Reliability: With a single LAN, a service interruption, even of short duration, could result in a major disruption for users.
• Capacity: A single LAN could be saturated as the number of devices attached to the network grows over time.
• Cost: A single LAN technology is not optimized for the diverse requirements of interconnection and communication. The presence of large numbers of low-cost microcomputers dictates that network support for these devices be provided at low cost. LANs that support very-low-cost attachment will not be suitable for meeting the overall requirement.

A more attractive alternative is to employ lower-cost, lower-capacity LANs within buildings or departments and to interconnect these networks with a higher-capacity LAN. This latter network is referred to as a backbone LAN. If confined to a single building or cluster of buildings, a high-capacity LAN can perform the backbone function.

15.2 TOPOLOGIES AND TRANSMISSION MEDIA

The key elements of a LAN are

• Topology
• Transmission medium
• Wiring layout
• Medium access control

Together, these elements determine not only the cost and capacity of the LAN, but also the type of data that may be transmitted, the speed and efficiency of communications, and even the kinds of applications that can be supported. This section provides a survey of the major technologies in the first two of these categories. It will be seen that there is an interdependence among the choices in different categories. Accordingly, a discussion of pros and cons relative to specific applications is best done by looking at preferred combinations.
This, in turn, is best done in the context of standards, which is a subject of a later section.

Topologies

In the context of a communication network, the term topology refers to the way in which the end points, or stations, attached to the network are interconnected. The common topologies for LANs are bus, tree, ring, and star (Figure 15.2). The bus is a special case of the tree, with only one trunk and no branches.

[Figure 15.2 LAN Topologies: (a) bus, showing taps, terminating resistance, and the flow of data; (b) tree, rooted at a headend; (c) ring, formed of repeaters; (d) star, with a central hub, switch, or repeater]

Bus and Tree Topologies

Both bus and tree topologies are characterized by the use of a multipoint medium. For the bus, all stations attach, through appropriate hardware interfacing known as a tap, directly to a linear transmission medium, or bus. Full-duplex operation between the station and the tap allows data to be transmitted onto the bus and received from the bus. A transmission from any station propagates the length of the medium in both directions and can be received by all other stations. At each end of the bus is a terminator, which absorbs any signal, removing it from the bus.

The tree topology is a generalization of the bus topology. The transmission medium is a branching cable with no closed loops. The tree layout begins at a point known as the headend. One or more cables start at the headend, and each of these may have branches. The branches in turn may have additional branches to allow quite complex layouts. Again, a transmission from any station propagates throughout the medium and can be received by all other stations.

Two problems present themselves in this arrangement. First, because a transmission from any one station can be received by all other stations, there needs to be some way of indicating for whom the transmission is intended. Second, a mechanism is needed to regulate transmission.
To see the reason for this, consider that if two stations on the bus attempt to transmit at the same time, their signals will overlap and become garbled. Or consider that one station decides to transmit continuously for a long period of time. To solve these problems, stations transmit data in small blocks, known as frames. Each frame consists of a portion of the data that a station wishes to transmit, plus a frame header that contains control information. Each station on the bus is assigned a unique address, or identifier, and the destination address for a frame is included in its header.

Figure 15.3 illustrates the scheme. In this example, station C wishes to transmit a frame of data to A. The frame header includes A's address. As the frame propagates along the bus, it passes B. B observes the address and ignores the frame. A, on the other hand, sees that the frame is addressed to itself and therefore copies the data from the frame as it goes by.

So the frame structure solves the first problem mentioned previously: It provides a mechanism for indicating the intended recipient of data. It also provides the basic tool for solving the second problem, the regulation of access. In particular, the stations take turns sending frames in some cooperative fashion. This involves putting additional control information into the frame header, as discussed later. With the bus or tree, no special action needs to be taken to remove frames from the medium. When a signal reaches the end of the medium, it is absorbed by the terminator.

Ring Topology

In the ring topology, the network consists of a set of repeaters joined by point-to-point links in a closed loop. The repeater is a comparatively simple device, capable of receiving data on one link and transmitting them, bit by bit, on the other link as fast as they are received.
The links are unidirectional; that is, data are transmitted in one direction only, so that data circulate around the ring in one direction (clockwise or counterclockwise). Each station attaches to the network at a repeater and can transmit data onto the network through the repeater. As with the bus and tree, data are transmitted in frames. As a frame circulates past all the other stations, the destination station recognizes its address and copies the frame into a local buffer as it goes by. The frame continues to circulate until it returns to the source station, where it is removed (Figure 15.4). Because multiple stations share the ring, medium access control is needed to determine at what time each station may insert frames.

[Figure 15.3 Frame Transmission on a Bus LAN: C transmits a frame addressed to A; the frame is not addressed to B, so B ignores it; A copies the frame as it goes by]

Star Topology

In the star LAN topology, each station is directly connected to a common central node. Typically, each station attaches to a central node via two point-to-point links, one for transmission and one for reception. In general, there are two alternatives for the operation of the central node. One approach is for the central node to operate in a broadcast fashion. A transmission of a frame from one station to the node is retransmitted on all of the outgoing links. In this case, although the arrangement is physically a star, it is logically a bus: A transmission from any station is received by all other stations, and only one station at a time may successfully transmit. In this case, the central element is referred to as a hub. Another approach is for the central node to act as a frame-switching device. An incoming frame is buffered in the node and then retransmitted on an outgoing link to the destination station. These approaches are explored in Section 15.5.
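The address-and-filter behavior described above can be sketched in a few lines of Python. The Frame and Station classes here are illustrative only, not part of any LAN standard: every station "hears" every frame on the shared medium, but only the addressee copies it.

```python
# Sketch (illustrative, not a standard format): frames on a broadcast
# medium carry a destination address in their header; every station
# sees every frame, but only the addressee copies the data.
from dataclasses import dataclass, field


@dataclass
class Frame:
    dest: str       # destination address carried in the frame header
    src: str        # source address
    payload: bytes


@dataclass
class Station:
    address: str
    received: list = field(default_factory=list)

    def on_signal(self, frame: Frame) -> None:
        # Every station observes the frame; only the addressee copies it.
        if frame.dest == self.address:
            self.received.append(frame.payload)


def broadcast(frame: Frame, stations: list) -> None:
    # Bus, hub, or ring: the frame propagates past every station.
    for s in stations:
        s.on_signal(frame)


a, b, c = Station("A"), Station("B"), Station("C")
broadcast(Frame(dest="A", src="C", payload=b"hello"), [a, b, c])
assert a.received == [b"hello"]          # A copies the frame
assert b.received == [] == c.received    # B ignores it; C sent it
```

The same filtering logic applies whether the shared medium is a bus, a tree, a ring, or a hub-based star; only the removal of the frame from the medium differs.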
[Figure 15.4 Frame Transmission on a Ring LAN: (a) C transmits a frame addressed to A; (b) the frame is not addressed to B, so B ignores it; (c) A copies the frame as it goes by; (d) C absorbs the returning frame]

Choice of Topology

The choice of topology depends on a variety of factors, including reliability, expandability, and performance. This choice is part of the overall task of designing a LAN and thus cannot be made in isolation, independent of the choice of transmission medium, wiring layout, and access control technique. A few general remarks can be made at this point.

There are four alternative media that can be used for a bus LAN:

• Twisted pair: In the early days of LAN development, voice-grade twisted pair was used to provide an inexpensive, easily installed bus LAN. A number of systems operating at 1 Mbps were implemented. Scaling twisted pair up to higher data rates in a shared-medium bus configuration is not practical, so this approach was dropped long ago.
• Baseband coaxial cable: A baseband coaxial cable is one that makes use of digital signaling. The original Ethernet scheme makes use of baseband coaxial cable.
• Broadband coaxial cable: Broadband coaxial cable is the type of cable used in cable television systems. Analog signaling is used at radio and television frequencies. This type of system is more expensive and more difficult to install and maintain than baseband coaxial cable. This approach never achieved popularity and such LANs are no longer made.
• Optical fiber: There has been considerable research relating to this alternative over the years, but the expense of the optical fiber taps and the availability of better alternatives have resulted in the demise of this option as well.

Thus, for a bus topology, only baseband coaxial cable has achieved widespread use, primarily for Ethernet systems.
Compared to a star-topology twisted pair or optical fiber installation, the bus topology using baseband coaxial cable is difficult to work with. Even simple changes may require access to the coaxial cable, movement of taps, and rerouting of cable segments. Accordingly, few if any new installations are being attempted. Despite its limitations, there is a considerable installed base of baseband coaxial cable bus LANs.

Very-high-speed links over considerable distances can be used for the ring topology. Hence, the ring has the potential of providing the best throughput of any topology. One disadvantage of the ring is that a single link or repeater failure could disable the entire network.

The star topology takes advantage of the natural layout of wiring in a building. It is generally best for short distances and can support a small number of devices at high data rates.

Choice of Transmission Medium

The choice of transmission medium is determined by a number of factors. It is, we shall see, constrained by the topology of the LAN. Other factors come into play, including

• Capacity: to support the expected network traffic
• Reliability: to meet requirements for availability
• Types of data supported: tailored to the application
• Environmental scope: to provide service over the range of environments required

The choice is part of the overall task of designing a local network, which is addressed in Chapter 16. Here we can make a few general observations.

Voice-grade unshielded twisted pair (UTP) is an inexpensive, well-understood medium; this is the Category 3 UTP referred to in Chapter 4. Typically, office buildings are wired to meet the anticipated telephone system demand plus a healthy margin; thus, there are no cable installation costs in the use of Category 3 UTP. However, the data rate that can be supported is generally quite limited, with the exception of very small LANs.
Category 3 UTP is likely to be the most cost-effective for a single-building, low-traffic LAN installation. Shielded twisted pair and baseband coaxial cable are more expensive than Category 3 UTP but provide greater capacity. Broadband cable is even more expensive but provides even greater capacity. However, in recent years, the trend has been toward the use of high-performance UTP, especially Category 5 UTP. Category 5 UTP supports high data rates for a small number of devices, but larger installations can be supported by the use of the star topology and the interconnection of the switching elements in multiple star-topology configurations. We discuss this point in Chapter 16.

Optical fiber has a number of attractive features, such as electromagnetic isolation, high capacity, and small size, which have attracted a great deal of interest. As yet the market penetration of fiber LANs is low; this is primarily due to the high cost of fiber components and the lack of skilled personnel to install and maintain fiber systems. This situation is beginning to change rapidly as more products using fiber are introduced.

15.3 LAN PROTOCOL ARCHITECTURE

The architecture of a LAN is best described in terms of a layering of protocols that organize the basic functions of a LAN. This section opens with a description of the standardized protocol architecture for LANs, which encompasses physical, medium access control (MAC), and logical link control (LLC) layers. The physical layer encompasses topology and transmission medium, and is covered in Section 15.2. This section provides an overview of the MAC and LLC layers.

IEEE 802 Reference Model

Protocols defined specifically for LAN and MAN transmission address issues relating to the transmission of blocks of data over the network. In OSI terms, higher layer protocols (layer 3 or 4 and above) are independent of network architecture and are applicable to LANs, MANs, and WANs.
Thus, a discussion of LAN protocols is concerned principally with lower layers of the OSI model. Figure 15.5 relates the LAN protocols to the OSI architecture (Figure 2.11). This architecture was developed by the IEEE 802 LAN standards committee and has been adopted by all organizations working on the specification of LAN standards. It is generally referred to as the IEEE 802 reference model. (This committee has developed standards for a wide range of LANs. See Appendix D for details.)

Working from the bottom up, the lowest layer of the IEEE 802 reference model corresponds to the physical layer of the OSI model and includes such functions as

• Encoding/decoding of signals
• Preamble generation/removal (for synchronization)
• Bit transmission/reception

In addition, the physical layer of the 802 model includes a specification of the transmission medium and the topology. Generally, this is considered "below" the lowest layer of the OSI model. However, the choice of transmission medium and topology is critical in LAN design, and so a specification of the medium is included.

[Figure 15.5 IEEE 802 Protocol Layers Compared to OSI Model: the scope of the IEEE 802 standards covers the logical link control, medium access control, and physical layers, corresponding to the OSI data link and physical layers; upper-layer protocols access LLC through LLC service access points (LSAPs)]

Above the physical layer are the functions associated with providing service to LAN users. These include

• On transmission, assemble data into a frame with address and error detection fields.
• On reception, disassemble frame, and perform address recognition and error detection.
• Govern access to the LAN transmission medium.
• Provide an interface to higher layers and perform flow and error control.

These are functions typically associated with OSI layer 2.
The set of functions in the last bullet item are grouped into a logical link control (LLC) layer. The functions in the first three bullet items are treated as a separate layer, called medium access control (MAC). The separation is done for the following reasons:

• The logic required to manage access to a shared-access medium is not found in traditional layer 2 data link control.
• For the same LLC, several MAC options may be provided.

Figure 15.6 illustrates the relationship between the levels of the architecture (compare Figure 2.9). Higher-level data are passed down to LLC, which appends control information as a header, creating an LLC protocol data unit (PDU). This control information is used in the operation of the LLC protocol. The entire LLC PDU is then passed down to the MAC layer, which appends control information at the front and back of the packet, forming a MAC frame. Again, the control information in the frame is needed for the operation of the MAC protocol. For context, the figure also shows the use of TCP/IP and an application layer above the LAN protocols.

[Figure 15.6 LAN Protocols in Context: application data gains a TCP header to form a TCP segment, an IP header to form an IP datagram, an LLC header to form an LLC protocol data unit, and finally a MAC header and MAC trailer to form a MAC frame]

Logical Link Control

The LLC layer for LANs is similar in many respects to other link layers in common use. Like all link layers, LLC is concerned with the transmission of a link-level PDU between two stations, without the necessity of an intermediate switching node. LLC has two characteristics not shared by most other link control protocols:

1. It must support the multiaccess, shared-medium nature of the link (this differs from a multidrop line in that there is no primary node).
2. It is relieved of some details of link access by the MAC layer.
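The encapsulation sequence of Figure 15.6 can be sketched as successive wrapping of a byte string. The header contents here are placeholder tags standing in for the fixed binary formats of the real protocols; only the nesting order is the point.

```python
# Sketch of the Figure 15.6 encapsulation: each layer prepends its
# header, and the MAC layer also appends a trailer (e.g., a CRC).
# The literal tags below are placeholders, not real header formats.
def encapsulate(app_data: bytes) -> bytes:
    tcp_segment = b"TCPHDR" + app_data           # TCP header + application data
    ip_datagram = b"IPHDR" + tcp_segment         # IP header + TCP segment
    llc_pdu     = b"LLCHDR" + ip_datagram        # LLC header + IP datagram
    mac_frame   = b"MACHDR" + llc_pdu + b"CRC"   # MAC header + LLC PDU + trailer
    return mac_frame


frame = encapsulate(b"hello")
# The original data sits innermost, wrapped by each successive layer:
assert frame == b"MACHDR" + b"LLCHDR" + b"IPHDR" + b"TCPHDR" + b"hello" + b"CRC"
```

On reception the process runs in reverse: the MAC layer strips its header and trailer, hands the LLC PDU up, and so on until the application data emerges.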
Addressing in LLC involves specifying the source and destination LLC users. Typically, a user is a higher-layer protocol or a network management function in the station. These LLC user addresses are referred to as service access points (SAPs), in keeping with OSI terminology for the user of a protocol layer. We look first at the services that LLC provides to a higher-level user, and then at the LLC protocol.

LLC Services

LLC specifies the mechanisms for addressing stations across the medium and for controlling the exchange of data between two users. The operation and format of this standard is based on HDLC. Three services are provided as alternatives for attached devices using LLC:

• Unacknowledged connectionless service: This service is a datagram-style service. It is a very simple service that does not involve any of the flow- and error-control mechanisms. Thus, the delivery of data is not guaranteed. However, in most devices, there will be some higher layer of software that deals with reliability issues.
• Connection-mode service: This service is similar to that offered by HDLC. A logical connection is set up between two users exchanging data, and flow control and error control are provided.
• Acknowledged connectionless service: This is a cross between the previous two services. It provides that datagrams are to be acknowledged, but no prior logical connection is set up.

Typically, a vendor will provide these services as options that the customer can select when purchasing the equipment. Alternatively, the customer can purchase equipment that provides two or all three services and select a specific service based on application. The unacknowledged connectionless service requires minimum logic and is useful in two contexts. First, it will often be the case that higher layers of software will provide the necessary reliability and flow-control mechanism, and it is efficient to avoid duplicating them.
For example, TCP could provide the mechanisms needed to ensure that data is delivered reliably. Second, there are instances in which the overhead of connection establishment and maintenance is unjustified or even counterproductive (for example, data collection activities that involve the periodic sampling of data sources, such as sensors and automatic self-test reports from security equipment or network components). In a monitoring application, the loss of an occasional data unit would not cause distress, as the next report should arrive shortly. Thus, in most cases, the unacknowledged connectionless service is the preferred option. The connection-mode service could be used in very simple devices, such as terminal controllers, that have little software operating above this level. In these cases, it would provide the flow control and reliability mechanisms normally implemented at higher layers of the communications software. The acknowledged connectionless service is useful in several contexts. With the connection-mode service, the logical link control software must maintain some sort of table for each active connection, to keep track of the status of that connection. If the user needs guaranteed delivery but there are a large number of destinations for data, then the connection-mode service may be impractical because of the large number of tables required. An example is a process control or automated factory environment where a central site may need to communicate with a large number of processors and programmable controllers. Another use of this is the handling of important and time-critical alarm or emergency control signals in a factory. Because of their importance, an acknowledgment is needed so that the sender can be assured that the signal got through. Because of the urgency of the signal, the user might not want to take the time first to establish a logical connection and then send the data. 
LLC Protocol

The basic LLC protocol is modeled after HDLC and has similar functions and formats. The differences between the two protocols can be summarized as follows:

• LLC makes use of the asynchronous balanced mode of operation of HDLC, to support connection-mode LLC service; this is referred to as type 2 operation. The other HDLC modes are not employed.
• LLC supports an unacknowledged connectionless service using the unnumbered information PDU; this is known as type 1 operation.
• LLC supports an acknowledged connectionless service by using two new unnumbered PDUs; this is known as type 3 operation.
• LLC permits multiplexing by the use of LLC service access points (LSAPs).

All three LLC protocols employ the same PDU format (Figure 15.7), which consists of four fields. The DSAP (Destination Service Access Point) and SSAP (Source Service Access Point) fields each contain a 7-bit address, which specify the destination and source users of LLC. One bit of the DSAP indicates whether the DSAP is an individual or group address. One bit of the SSAP indicates whether the PDU is a command or response PDU. The format of the LLC control field is identical to that of HDLC (Figure 7.7), using extended (7-bit) sequence numbers.

[Figure 15.7 LLC PDU in a Generic MAC Frame Format: the MAC frame consists of MAC control, destination MAC address, source MAC address, the LLC PDU, and a CRC; the LLC PDU consists of a 1-octet DSAP field (I/G bit plus DSAP value), a 1-octet SSAP field (C/R bit plus SSAP value), a 1- or 2-octet LLC control field, and a variable-length information field]

For type 1 operation, which supports the unacknowledged connectionless service, the unnumbered information (UI) PDU is used to transfer user data. There is no acknowledgment, flow control, or error control. However, there is error detection and discard at the MAC level. Two other PDUs are used to support management functions associated with all three types of operation. Both PDUs are used in the following fashion.
An LLC entity may issue a command (C/R bit = 0) XID or TEST. The receiving LLC entity issues a corresponding XID or TEST in response. The XID PDU is used to exchange two types of information: types of operation supported and window size. The TEST PDU is used to conduct a loopback test of the transmission path between two LLC entities. Upon receipt of a TEST command PDU, the addressed LLC entity issues a TEST response PDU as soon as possible.

With type 2 operation, a data link connection is established between two LLC SAPs prior to data exchange. Connection establishment is attempted by the type 2 protocol in response to a request from a user. The LLC entity issues a SABME PDU to request a logical connection with the other LLC entity. (SABME stands for Set Asynchronous Balanced Mode Extended. It is used in HDLC to choose ABM and to select extended sequence numbers of seven bits. Both ABM and 7-bit sequence numbers are mandatory in type 2 operation.) If the connection is accepted by the LLC user designated by the DSAP, then the destination LLC entity returns an unnumbered acknowledgment (UA) PDU. The connection is henceforth uniquely identified by the pair of user SAPs. If the destination LLC user rejects the connection request, its LLC entity returns a disconnected mode (DM) PDU.

Once the connection is established, data are exchanged using information PDUs, as in HDLC. The information PDUs include send and receive sequence numbers, for sequencing and flow control. The supervisory PDUs are used, as in HDLC, for flow control and error control. Either LLC entity can terminate a logical LLC connection by issuing a disconnect (DISC) PDU.

With type 3 operation, each transmitted PDU is acknowledged. A new (not found in HDLC) unnumbered PDU, the Acknowledged Connectionless (AC) Information PDU, is defined. User data are sent in AC command PDUs and must be acknowledged using an AC response PDU.
To guard against lost PDUs, a 1-bit sequence number is used. The sender alternates the use of 0 and 1 in its AC command PDUs, and the receiver responds with an AC PDU with the opposite number of the corresponding command. Only one PDU in each direction may be outstanding at any time.

Medium Access Control

All LANs and MANs consist of collections of devices that must share the network's transmission capacity. Some means of controlling access to the transmission medium is needed to provide for an orderly and efficient use of that capacity. This is the function of a medium access control (MAC) protocol. The key parameters in any medium access control technique are where and how. Where refers to whether control is exercised in a centralized or distributed fashion. In a centralized scheme, a controller is designated that has the authority to grant access to the network. A station wishing to transmit must wait until it receives permission from the controller. In a decentralized network, the stations collectively perform a medium access control function to determine dynamically the order in which stations transmit. A centralized scheme has certain advantages, including

• It may afford greater control over access for providing such things as priorities, overrides, and guaranteed capacity.
• It enables the use of relatively simple access logic at each station.
• It avoids problems of distributed coordination among peer entities.

The principal disadvantages of centralized schemes are

• It creates a single point of failure; that is, there is a point in the network that, if it fails, causes the entire network to fail.
• It may act as a bottleneck, reducing performance.

The pros and cons of distributed schemes are mirror images of the points just made. The second parameter, how, is constrained by the topology and is a tradeoff among competing factors, including cost, performance, and complexity.
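The type 3 alternating-bit acknowledgment described earlier in this section can be sketched as follows. The dictionary-based PDU representation is illustrative only and is not the 802.2 wire format; it shows only the sequence-bit logic.

```python
# Sketch of the type 3 (acknowledged connectionless) exchange: the
# sender alternates a 1-bit sequence number on AC command PDUs, the
# receiver acknowledges with the opposite bit, and a repeated sequence
# bit exposes a duplicated PDU. Only one PDU may be outstanding at a
# time. PDU fields here are illustrative, not the 802.2 encoding.
class Type3Sender:
    def __init__(self):
        self.seq = 0                    # 1-bit sequence number

    def next_command(self, data: bytes) -> dict:
        pdu = {"seq": self.seq, "data": data}
        self.seq ^= 1                   # alternate 0, 1, 0, ...
        return pdu


class Type3Receiver:
    def __init__(self):
        self.expected = 0
        self.delivered = []

    def on_command(self, pdu: dict) -> dict:
        if pdu["seq"] == self.expected:
            self.delivered.append(pdu["data"])   # new PDU: accept it
            self.expected ^= 1
        # else: duplicate (e.g., a retransmission after a lost ack);
        # discard it but still acknowledge.
        return {"ack_seq": pdu["seq"] ^ 1}       # opposite of command's bit


tx, rx = Type3Sender(), Type3Receiver()
p1 = tx.next_command(b"alarm")
ack = rx.on_command(p1)
assert ack["ack_seq"] == 1 and rx.delivered == [b"alarm"]

# A duplicate of p1 is detected by its stale sequence bit and discarded:
rx.on_command(p1)
assert rx.delivered == [b"alarm"]
```

This is the same stop-and-wait, alternating-bit idea used in many simple link protocols; the 1-bit number suffices precisely because only one PDU may be outstanding in each direction.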
In general, we can categorize access control techniques as being either synchronous or asynchronous. With synchronous techniques, a specific capacity is dedicated to a connection. This is the same approach used in circuit switching, frequency division multiplexing (FDM), and synchronous time division multiplexing (TDM). Such techniques are generally not optimal in LANs and MANs because the needs of the stations are unpredictable. It is preferable to be able to allocate capacity in an asynchronous (dynamic) fashion, more or less in response to immediate demand. The asynchronous approach can be further subdivided into three categories: round robin, reservation, and contention.

464 CHAPTER 15 / LOCAL AREA NETWORK OVERVIEW

Round Robin

With round robin, each station in turn is given the opportunity to transmit. During that opportunity, the station may decline to transmit or may transmit subject to a specified upper bound, usually expressed as a maximum amount of data transmitted or time for this opportunity. In any case, the station, when it is finished, relinquishes its turn, and the right to transmit passes to the next station in logical sequence. Control of sequence may be centralized or distributed. Polling is an example of a centralized technique.

When many stations have data to transmit over an extended period of time, round-robin techniques can be very efficient. If only a few stations have data to transmit over an extended period of time, then there is considerable overhead in passing the turn from station to station, because most of the stations will not transmit but simply pass their turns. Under such circumstances other techniques may be preferable, largely depending on whether the data traffic has a stream or bursty characteristic. Stream traffic is characterized by lengthy and fairly continuous transmissions; examples are voice communication, telemetry, and bulk file transfer.
Bursty traffic is characterized by short, sporadic transmissions; interactive terminal-host traffic fits this description.

Reservation

For stream traffic, reservation techniques are well suited. In general, for these techniques, time on the medium is divided into slots, much as with synchronous TDM. A station wishing to transmit reserves future slots for an extended or even an indefinite period. Again, reservations may be made in a centralized or distributed fashion.

Contention

For bursty traffic, contention techniques are usually appropriate. With these techniques, no control is exercised to determine whose turn it is; all stations contend for time in a way that can be, as we shall see, rather rough and tumble. These techniques are of necessity distributed in nature. Their principal advantage is that they are simple to implement and, under light to moderate load, efficient. For some of these techniques, however, performance tends to collapse under heavy load.

Although both centralized and distributed reservation techniques have been implemented in some LAN products, round-robin and contention techniques are the most common.

MAC Frame Format

The MAC layer receives a block of data from the LLC layer and is responsible for performing functions related to medium access and for transmitting the data. As with other protocol layers, MAC implements these functions making use of a protocol data unit at its layer. In this case, the PDU is referred to as a MAC frame. The exact format of the MAC frame differs somewhat for the various MAC protocols in use. In general, all of the MAC frames have a format similar to that of Figure 15.7. The fields of this frame are as follows:

• MAC Control: This field contains any protocol control information needed for the functioning of the MAC protocol. For example, a priority level could be indicated here.
• Destination MAC Address: The destination physical attachment point on the LAN for this frame.
• Source MAC Address: The source physical attachment point on the LAN for this frame.
• LLC: The LLC data from the next higher layer.
• CRC: The Cyclic Redundancy Check field (also known as the frame check sequence, FCS, field). This is an error-detecting code, as we have seen in HDLC and other data link control protocols (Chapter 7).

In most data link control protocols, the data link protocol entity is responsible not only for detecting errors using the CRC but also for recovering from those errors by retransmitting damaged frames. In the LAN protocol architecture, these two functions are split between the MAC and LLC layers. The MAC layer is responsible for detecting errors and discarding any frames that are in error. The LLC layer optionally keeps track of which frames have been successfully received and retransmits unsuccessful frames.

15.4 BRIDGES

In virtually all cases, there is a need to expand beyond the confines of a single LAN, to provide interconnection to other LANs and to wide area networks. Two general approaches are used for this purpose: bridges and routers. The bridge is the simpler of the two devices and provides a means of interconnecting similar LANs. The router is a more general-purpose device, capable of interconnecting a variety of LANs and WANs. We explore bridges in this section and look at routers in Part Five.

The bridge is designed for use between local area networks (LANs) that use identical protocols for the physical and link layers (e.g., all conforming to IEEE 802.3). Because the devices all use the same protocols, the amount of processing required at the bridge is minimal. More sophisticated bridges are capable of mapping from one MAC format to another (e.g., to interconnect an Ethernet and a token ring LAN).

Because the bridge is used in a situation in which all the LANs have the same characteristics, the reader may ask, why not simply have one large LAN?
Depending on circumstance, there are several reasons for the use of multiple LANs connected by bridges:

• Reliability: The danger in connecting all data processing devices in an organization to one network is that a fault on the network may disable communication for all devices. By using bridges, the network can be partitioned into self-contained units.
• Performance: In general, performance on a LAN declines with an increase in the number of devices or the length of the wire. A number of smaller LANs will often give improved performance if devices can be clustered so that intranetwork traffic significantly exceeds internetwork traffic.
• Security: The establishment of multiple LANs may improve security of communications. It is desirable to keep different types of traffic (e.g., accounting, personnel, strategic planning) that have different security needs on physically separate media. At the same time, the different types of users with different levels of security need to communicate through controlled and monitored mechanisms.
• Geography: Clearly, two separate LANs are needed to support devices clustered in two geographically distant locations. Even in the case of two buildings separated by a highway, it may be far easier to use a microwave bridge link than to attempt to string coaxial cable between the two buildings.

Functions of a Bridge

Figure 15.8 illustrates the action of a bridge connecting two LANs, A and B, using the same MAC protocol. In this example, a single bridge attaches to both LANs; frequently, the bridge function is performed by two "half-bridges," one on each LAN. The functions of the bridge are few and simple:

• Read all frames transmitted on A and accept those addressed to any station on B.
• Using the medium access control protocol for B, retransmit each frame on B.
• Do the same for B-to-A traffic.
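These three functions can be captured in a minimal sketch. The station-address assignments follow the example of Figure 15.8 (stations 1 through 10 on LAN A, 11 through 20 on LAN B); the dictionary-based frame representation is purely illustrative.

```python
# Sketch of the two-LAN bridge behavior described above. Station
# numbering follows Figure 15.8; the frame format is illustrative.

STATIONS = {
    "LAN A": set(range(1, 11)),    # stations 1..10
    "LAN B": set(range(11, 21)),   # stations 11..20
}

def bridge(frame, arriving_lan):
    """Relay a frame between LAN A and LAN B, unmodified."""
    other = "LAN B" if arriving_lan == "LAN A" else "LAN A"
    # Accept only frames addressed to a station on the other LAN,
    # then repeat them there with exactly the same bit pattern.
    if frame["dest"] in STATIONS[other]:
        return (other, frame)
    return None   # frame is local traffic; the bridge ignores it

lan, relayed = bridge({"src": 1, "dest": 12, "data": b"x"}, "LAN A")
assert lan == "LAN B" and relayed == {"src": 1, "dest": 12, "data": b"x"}
assert bridge({"src": 1, "dest": 2, "data": b"x"}, "LAN A") is None
```

Note that the relayed frame is returned unchanged, mirroring the point made below that a bridge neither modifies nor re-encapsulates the frames it forwards.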
Several design aspects of a bridge are worth highlighting:

• The bridge makes no modification to the content or format of the frames it receives, nor does it encapsulate them with an additional header. Each frame to be transferred is simply copied from one LAN and repeated with exactly the same bit pattern on the other LAN. Because the two LANs use the same LAN protocols, it is permissible to do this.
• The bridge should contain enough buffer space to meet peak demands. Over a short period of time, frames may arrive faster than they can be retransmitted.
• The bridge must contain addressing and routing intelligence. At a minimum, the bridge must know which addresses are on each network to know which frames to pass. Further, there may be more than two LANs interconnected by a number of bridges. In that case, a frame may have to be routed through several bridges in its journey from source to destination.
• A bridge may connect more than two LANs.

In summary, the bridge provides an extension to the LAN that requires no modification to the communications software in the stations attached to the LANs. It appears to all stations on the two (or more) LANs that there is a single LAN on which each station has a unique address. The station uses that unique address and need not explicitly discriminate between stations on the same LAN and stations on other LANs; the bridge takes care of that.

Bridge Protocol Architecture

The IEEE 802.1D specification defines the protocol architecture for MAC bridges.
Within the 802 architecture, the endpoint or station address is designated at the MAC level. Thus, it is at the MAC level that a bridge can function. Figure 15.9 shows the simplest case, which consists of two LANs connected by a single bridge. The LANs employ the same MAC and LLC protocols. The bridge operates as previously described. A MAC frame whose destination is not on the immediate LAN is captured by the bridge, buffered briefly, and then transmitted on the other LAN. As far as the LLC layer is concerned, there is a dialogue between peer LLC entities in the two endpoint stations. The bridge need not contain an LLC layer because it is merely serving to relay the MAC frames.

[Figure 15.8: Bridge Operation]

[Figure 15.9: Connection of Two LANs by a Bridge]

Figure 15.9b indicates the way in which data are encapsulated using a bridge. Data are provided by some user to LLC. The LLC entity appends a header and passes the resulting data unit to the MAC entity, which appends a header and a trailer to form a MAC frame. On the basis of the destination MAC address in the frame, it is captured by the bridge. The bridge does not strip off the MAC fields; its function is to relay the MAC frame intact to the destination LAN. Thus, the frame is deposited on the destination LAN and captured by the destination station.

The concept of a MAC relay bridge is not limited to the use of a single bridge to connect two nearby LANs.
If the LANs are some distance apart, then they can be connected by two bridges that are in turn connected by a communications facility. The intervening communications facility can be a network, such as a wide area packet-switching network, or a point-to-point link. In such cases, when a bridge captures a MAC frame, it must encapsulate the frame in the appropriate packaging and transmit it over the communications facility to a target bridge. The target bridge strips off these extra fields and transmits the original, unmodified MAC frame to the destination station.

Fixed Routing

There is a trend within many organizations to an increasing number of LANs interconnected by bridges. As the number of LANs grows, it becomes important to provide alternate paths between LANs via bridges for load balancing and reconfiguration in response to failure. Thus, many organizations will find that static, preconfigured routing tables are inadequate and that some sort of dynamic routing is needed.

Consider the configuration of Figure 15.10. Suppose that station 1 transmits a frame on LAN A intended for station 6. The frame will be read by bridges 101, 102, and 107. For each bridge, the addressed station is not on a LAN to which the bridge is attached. Therefore, each bridge must make a decision whether or not to retransmit the frame on its other LAN, in order to move it closer to its intended destination. In this case, bridge 102 should repeat the frame on LAN C, whereas bridges 101 and 107 should refrain from retransmitting the frame. Once the frame has been transmitted on LAN C, it will be picked up by both bridges 105 and 106. Again, each must decide whether or not to forward the frame. In this case, bridge 105 should retransmit the frame on LAN F, where it will be received by the destination, station 6. Thus we see that, in the general case, the bridge must be equipped with a routing capability.
[Figure 15.10: Configuration of Bridges and LANs, with Alternate Routes]

When a bridge receives a frame, it must decide whether or not to forward it. If the bridge is attached to two or more networks, then it must decide whether or not to forward the frame and, if so, on which LAN the frame should be transmitted.

The routing decision may not always be a simple one. Figure 15.10 also shows that there are two routes between LAN A and LAN E. Such redundancy provides for higher overall internet availability and creates the possibility for load balancing. In this case, if station 1 transmits a frame on LAN A intended for station 5 on LAN E, then either bridge 101 or bridge 107 could forward the frame. It would appear preferable for bridge 107 to forward the frame, since it will involve only one hop, whereas if the frame travels through bridge 101, it must suffer two hops. Another consideration is that there may be changes in the configuration. For example, bridge 107 may fail, in which case subsequent frames from station 1 to station 5 should go through bridge 101. So we can say that the routing capability must take into account the topology of the internet configuration and may need to be dynamically altered.

A variety of routing strategies have been proposed and implemented in recent years. The simplest and most common strategy is fixed routing. This strategy is suitable for small internets and for internets that are relatively stable. In addition, two groups within the IEEE 802 committee have developed specifications for routing strategies. The IEEE 802.1 group has issued a standard for routing based on the use of a spanning tree algorithm.
The token ring committee, IEEE 802.5, has issued its own specification, referred to as source routing. In the remainder of this section, we look at fixed routing and the spanning tree algorithm, which is the most commonly used bridge routing algorithm.

For fixed routing, a route is selected for each source-destination pair of LANs in the configuration. If alternate routes are available between two LANs, then typically the route with the least number of hops is selected. The routes are fixed, or at least change only when there is a change in the topology of the internet.

The strategy for developing a fixed routing configuration for bridges is similar to that employed in a packet-switching network (Figure 12.2). A central routing matrix is created, to be stored perhaps at a network control center. The matrix shows, for each source-destination pair of LANs, the identity of the first bridge on the route. So, for example, the route from LAN E to LAN F begins by going through bridge 107 to LAN A. Again consulting the matrix, the route from LAN A to LAN F goes through bridge 102 to LAN C. Finally, the route from LAN C to LAN F is directly through bridge 105. Thus the complete route from LAN E to LAN F is bridge 107, LAN A, bridge 102, LAN C, bridge 105.

From this overall matrix, routing tables can be developed and stored at each bridge. Each bridge needs one table for each LAN to which it attaches. The information for each table is derived from a single row of the matrix. For example, bridge 105 has two tables, one for frames arriving from LAN C and one for frames arriving from LAN F. The table shows, for each possible destination MAC address, the identity of the LAN to which the bridge should forward the frame.

Once the directories have been established, routing is a simple matter. A bridge copies each incoming frame on each of its LANs. If the destination MAC address corresponds to an entry in its routing table, the frame is retransmitted on the appropriate LAN.
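The lookup just described can be sketched briefly. The entries below follow the text's example route from LAN E to LAN F (bridge 107, LAN A, bridge 102, LAN C, bridge 105); note that a real per-bridge table is keyed by destination MAC address, which is compressed here to a destination LAN for readability. Any table contents beyond that one route are assumptions for illustration.

```python
# Sketch of fixed routing. Central matrix entries follow the E-to-F
# example in the text; everything else is illustrative.

# Central routing matrix: (source LAN, destination LAN) -> first bridge.
CENTRAL_MATRIX = {
    ("LAN E", "LAN F"): "bridge 107",
    ("LAN A", "LAN F"): "bridge 102",
    ("LAN C", "LAN F"): "bridge 105",
}

# Per-bridge table derived from one row of the matrix: at bridge 102,
# frames arriving from LAN A and destined for LAN F go out on LAN C.
# (A real table maps destination MAC addresses, not LANs.)
BRIDGE_102_FROM_LAN_A = {"LAN F": "LAN C"}

def forward(bridge_table, dest_lan):
    """Return the LAN on which to retransmit, or None to discard."""
    return bridge_table.get(dest_lan)

assert forward(BRIDGE_102_FROM_LAN_A, "LAN F") == "LAN C"   # relay toward F
assert forward(BRIDGE_102_FROM_LAN_A, "LAN B") is None      # not our route
```

Because the tables are derived offline from the central matrix, per-frame processing at the bridge reduces to a single dictionary lookup, which is exactly the simplicity advantage (and the inflexibility) noted below.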
The fixed routing strategy is widely used in commercially available products. It requires that a network manager manually load the data into the routing tables. It has the advantage of simplicity and minimal processing requirements. However, in a complex internet, in which bridges may be dynamically added and in which failures must be allowed for, this strategy is too limited.

The Spanning Tree Approach

The spanning tree approach is a mechanism in which bridges automatically develop a routing table and update that table in response to changing topology. The algorithm consists of three mechanisms: frame forwarding, address learning, and loop resolution.

Frame Forwarding

In this scheme, a bridge maintains a forwarding database for each port attached to a LAN. The database indicates the station addresses for which frames should be forwarded through that port. We can interpret this in the following fashion. For each port, a list of stations is maintained. A station is on the list if it is on the "same side" of the bridge as the port. For example, for bridge 102 of Figure 15.10, stations on LANs C, F, and G are on the same side of the bridge as the LAN C port, and stations on LANs A, B, D, and E are on the same side of the bridge as the LAN A port.

When a frame is received on any port, the bridge must decide whether that frame is to be forwarded through the bridge and out through one of the bridge's other ports. Suppose that a bridge receives a MAC frame on port x. The following rules are applied:

1. Search the forwarding database to determine if the MAC address is listed for any port except port x.
2. If the destination MAC address is not found, forward the frame out all ports except the one from which it was received. This is part of the learning process described subsequently.
3. If the destination address is in the forwarding database for some port y, then determine whether port y is in a blocking or forwarding state.
For reasons explained later, a port may sometimes be blocked, which prevents it from receiving or transmitting frames.
4. If port y is not blocked, transmit the frame through port y onto the LAN to which that port attaches.

Address Learning

The preceding scheme assumes that the bridge is already equipped with a forwarding database that indicates the direction, from the bridge, of each destination station. This information can be preloaded into the bridge, as in fixed routing. However, an effective automatic mechanism for learning the direction of each station is desirable. A simple scheme for acquiring this information is based on the use of the source address field in each MAC frame.

The strategy is this. When a frame arrives on a particular port, it clearly has come from the direction of the incoming LAN. The source address field of the frame indicates the source station. Thus, a bridge can update its forwarding database for that port on the basis of the source address field of each incoming frame. To allow for changes in topology, each element in the database is equipped with a timer. When a new element is added to the database, its timer is set. If the timer expires, then the element is eliminated from the database, since the corresponding direction information may no longer be valid. Each time a frame is received, its source address is checked against the database. If the element is already in the database, the entry is updated (the direction may have changed) and the timer is reset. If the element is not in the database, a new entry is created, with its own timer.

Spanning Tree Algorithm

The address learning mechanism described previously is effective if the topology of the internet is a tree; that is, if there are no alternate routes in the network. The existence of alternate routes means that there is a closed loop.
For example, in Figure 15.10, the following is a closed loop: LAN A, bridge 101, LAN B, bridge 104, LAN E, bridge 107, LAN A.

To see the problem created by a closed loop, consider Figure 15.11. At time t0, station A transmits a frame addressed to station B. The frame is captured by both bridges. Each bridge updates its database to indicate that station A is in the direction of LAN X, and retransmits the frame on LAN Y. Say that bridge a retransmits at time t1 and bridge b a short time later, at t2. Thus B will receive two copies of the frame. Furthermore, each bridge will receive the other's transmission on LAN Y. Note that each transmission is a frame with a source address of A and a destination address of B. Thus each bridge will update its database to indicate that station A is in the direction of LAN Y. Neither bridge is now capable of forwarding a frame addressed to station A.

[Figure 15.11: Loop of Bridges]

To overcome this problem, a simple result from graph theory is used: For any connected graph, consisting of nodes and edges connecting pairs of nodes, there is a spanning tree of edges that maintains the connectivity of the graph but contains no closed loops. In terms of internets, each LAN corresponds to a graph node, and each bridge corresponds to a graph edge. Thus, in Figure 15.10, the removal of one (and only one) of bridges 107, 101, and 104 results in a spanning tree. What is desired is to develop a simple algorithm by which the bridges of the internet can exchange sufficient information to automatically (without user intervention) derive a spanning tree. The algorithm must be dynamic. That is, when a topology change occurs, the bridges must be able to discover this fact and automatically derive a new spanning tree. The spanning tree algorithm developed by IEEE 802.1, as the name suggests, is able to develop such a spanning tree.
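Before looking at what the algorithm requires, the frame-forwarding rules and address-learning timers described above can be combined in one short sketch. The port names and the ageing value are illustrative, and the loop-resolution message exchange itself is omitted; only the per-frame behavior is shown.

```python
# Sketch of a learning bridge: forwarding rules 1-4 plus address
# learning with per-entry timers. Port names and AGEING_TIME are
# illustrative; the spanning tree message exchange is omitted.

AGEING_TIME = 300.0   # seconds before an unrefreshed entry is dropped

class LearningBridge:
    def __init__(self, ports):
        self.ports = set(ports)
        self.blocked = set()      # ports placed in the blocking state
        self.db = {}              # MAC address -> (port, last-seen time)

    def receive(self, frame, port, now):
        # Address learning: the source lies in the direction of the
        # arriving port; create or refresh the entry and its timer.
        self.db[frame["src"]] = (port, now)
        entry = self.db.get(frame["dest"])
        if entry is not None and now - entry[1] > AGEING_TIME:
            del self.db[frame["dest"]]       # timer expired: forget it
            entry = None
        if entry is None:
            # Rule 2: unknown destination, flood all other open ports.
            return self.ports - {port} - self.blocked
        out, _ = entry
        if out == port or out in self.blocked:
            return set()          # Rules 1 and 3: discard the frame
        return {out}              # Rule 4: forward through port y

br = LearningBridge(["p1", "p2", "p3"])
# Unknown destination B: flood everywhere except the arrival port.
assert br.receive({"src": "A", "dest": "B"}, "p1", now=0.0) == {"p2", "p3"}
# B replies; A is now known to be reachable through p1.
assert br.receive({"src": "B", "dest": "A"}, "p2", now=1.0) == {"p1"}
# A frame whose destination lies back out the arrival port is discarded.
assert br.receive({"src": "C", "dest": "B"}, "p2", now=2.0) == set()
```

The flooding in rule 2 is what makes the loop problem of Figure 15.11 so damaging, and it is why the blocking state maintained by the spanning tree exchange is essential.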
All that is required is that each bridge be assigned a unique identifier and that costs be assigned to each bridge port. In the absence of any special considerations, all costs could be set equal; this produces a minimum-hop tree. The algorithm involves a brief exchange of messages among all of the bridges to discover the minimum-cost spanning tree. Whenever there is a change in topology, the bridges automatically recalculate the spanning tree.

15.5 LAYER 2 AND LAYER 3 SWITCHES

In recent years, there has been a proliferation of types of devices for interconnecting LANs that goes beyond the bridges discussed in Section 15.4 and the routers discussed in Part Five. These devices can conveniently be grouped into the categories of layer 2 switches and layer 3 switches. We begin with a discussion of hubs and then explore these two concepts.

Hubs

Earlier, we used the term hub in reference to a star-topology LAN. The hub is the active central element of the star layout. Each station is connected to the hub by two lines (transmit and receive). The hub acts as a repeater: When a single station transmits, the hub repeats the signal on the outgoing line to each station. Ordinarily, the line consists of two unshielded twisted pairs. Because of the high data rate and the poor transmission qualities of unshielded twisted pair, the length of a line is limited to about 100 m. As an alternative, an optical fiber link may be used. In this case, the maximum length is about 500 m.

Note that although this scheme is physically a star, it is logically a bus: A transmission from any one station is received by all other stations, and if two stations transmit at the same time there will be a collision.

Multiple levels of hubs can be cascaded in a hierarchical configuration. Figure 15.12 illustrates a two-level configuration. There is one header hub (HHUB) and one or more intermediate hubs (IHUB).
[Figure 15.12: Two-Level Star Topology]

Each hub may have a mixture of stations and other hubs attached to it from below. This layout fits well with building wiring practices. Typically, there is a wiring closet on each floor of an office building, and a hub can be placed in each one. Each hub could service the stations on its floor.

Layer 2 Switches

In recent years, a new device, the layer 2 switch, has replaced the hub in popularity, particularly for high-speed LANs. The layer 2 switch is also sometimes referred to as a switching hub.

To clarify the distinction between hubs and switches, Figure 15.13a shows a typical bus layout of a traditional 10-Mbps LAN. A bus is installed that is laid out so that all the devices to be attached are in reasonable proximity to a point on the bus. In the figure, station B is transmitting. This transmission goes from B, across the lead from B to the bus, along the bus in both directions, and along the access lines of each of the other attached stations. In this configuration, all the stations must share the total capacity of the bus, which is 10 Mbps.

A hub, often in a building wiring closet, uses a star wiring arrangement to attach stations to the hub. In this arrangement, a transmission from any one station is received by the hub and retransmitted on all of the outgoing lines. Therefore, to avoid collision, only one station can transmit at a time. Again, the total capacity of the LAN is 10 Mbps. The hub has several advantages over the simple bus arrangement. It exploits standard building wiring practices in the layout of cable.
In addition, the hub can be configured to recognize a malfunctioning station that is jamming the network and to cut that station out of the network. Figure 15.13b illustrates the operation of a hub. Here again, station B is transmitting. This transmission goes from B, across the transmit line from B to the hub, and from the hub along the receive lines of each of the other attached stations.

[Figure 15.13: LAN Hubs and Switches]

We can achieve greater performance with a layer 2 switch. In this case, the central hub acts as a switch, much as a packet switch or circuit switch. With a layer 2 switch, an incoming frame from a particular station is switched to the appropriate output line to be delivered to the intended destination. At the same time, other unused lines can be used for switching other traffic. Figure 15.13c shows an example in which B is transmitting a frame to A and at the same time C is transmitting a frame to D. So, in this example, the current throughput on the LAN is 20 Mbps, although each individual device is limited to 10 Mbps.

The layer 2 switch has several attractive features:

1. No change is required to the software or hardware of the attached devices to convert a bus LAN or a hub LAN to a switched LAN. In the case of an Ethernet LAN, each attached device continues to use the Ethernet medium access control protocol to access the LAN. From the point of view of the attached devices, nothing has changed in the access logic.
2. Each attached device has a dedicated capacity equal to that of the entire original LAN, assuming that the layer 2 switch has sufficient capacity to keep up with all attached devices.
For example, in Figure 15.13c, if the layer 2 switch can sustain a throughput of 20 Mbps, each attached device appears to have a dedicated capacity for either input or output of 10 Mbps.
3. The layer 2 switch scales easily. Additional devices can be attached to the layer 2 switch, with the capacity of the switch increased correspondingly.

Two types of layer 2 switches are available as commercial products:

• Store-and-forward switch: The layer 2 switch accepts a frame on an input line, buffers it briefly, and then routes it to the appropriate output line.
• Cut-through switch: The layer 2 switch takes advantage of the fact that the destination address appears at the beginning of the MAC (medium access control) frame. The layer 2 switch begins repeating the incoming frame onto the appropriate output line as soon as it recognizes the destination address.

The cut-through switch yields the highest possible throughput but at some risk of propagating bad frames, because the switch is not able to check the CRC prior to retransmission. The store-and-forward switch involves a delay between sender and receiver but boosts the overall integrity of the network.

A layer 2 switch can be viewed as a full-duplex version of the hub. It can also incorporate logic that allows it to function as a multiport bridge. [BREY99] lists the following differences between layer 2 switches and bridges:

• Bridge frame handling is done in software. A layer 2 switch performs the address recognition and frame forwarding functions in hardware.
• A bridge can typically analyze and forward only one frame at a time, whereas a layer 2 switch has multiple parallel data paths and can handle multiple frames at a time.
• A bridge uses store-and-forward operation. With a layer 2 switch, it is possible to have cut-through instead of store-and-forward operation.
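The capacity contrast among the shared bus, the hub, and the layer 2 switch can be made concrete with a little arithmetic, following the 10-Mbps figures used above (the function names are, of course, just illustrative):

```python
# Aggregate capacity in the text's 10-Mbps example: a shared bus or
# hub carries one transmission at a time, while a layer 2 switch can
# carry up to n/2 simultaneous frames between disjoint station pairs.

LINE_RATE_MBPS = 10

def shared_capacity(n_stations):
    # Bus or hub: the single 10-Mbps channel is shared by everyone.
    return LINE_RATE_MBPS

def switched_capacity(n_stations):
    # Layer 2 switch: disjoint pairs exchange frames in parallel,
    # each at the full line rate.
    return (n_stations // 2) * LINE_RATE_MBPS

assert shared_capacity(4) == 10
assert switched_capacity(4) == 20   # e.g., B to A while C to D, as in Figure 15.13c
```

This is the sense in which each attached device "appears to have" a dedicated 10-Mbps link: the aggregate grows with the number of stations, provided the switch fabric itself can sustain the load.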
Because a layer 2 switch has higher performance and can incorporate the functions of a bridge, the bridge has suffered commercially. New installations typically include layer 2 switches with bridge functionality rather than bridges.

Layer 3 Switches

Layer 2 switches provide increased performance to meet the needs of high-volume traffic generated by personal computers, workstations, and servers. However, as the number of devices in a building or complex of buildings grows, layer 2 switches reveal some inadequacies. Two problems in particular present themselves: broadcast overload and the lack of multiple links.

A set of devices and LANs connected by layer 2 switches is considered to have a flat address space. The term flat means that all users share a common MAC broadcast address. Thus, if any device issues a MAC frame with a broadcast address, that frame is to be delivered to all devices attached to the overall network connected by layer 2 switches and/or bridges. In a large network, frequent transmission of broadcast frames can create tremendous overhead. Worse, a malfunctioning device can create a broadcast storm, in which numerous broadcast frames clog the network and crowd out legitimate traffic.

A second performance-related problem with the use of bridges and/or layer 2 switches is that the current standards for bridge protocols dictate that there be no closed loops in the network. That is, there can only be one path between any two devices. Thus, it is impossible, in a standards-based implementation, to provide multiple paths through multiple switches between devices. This restriction limits both performance and reliability.

To overcome these problems, it seems logical to break up a large local network into a number of subnetworks connected by routers. A MAC broadcast frame is then limited to only the devices and switches contained in a single subnetwork.
Furthermore, IP-based routers employ sophisticated routing algorithms that allow the use of multiple paths between subnetworks going through different routers. However, the problem with using routers to overcome some of the inadequacies of bridges and layer 2 switches is that routers typically do all of the IP-level processing involved in the forwarding of IP traffic in software. High-speed LANs and high-performance layer 2 switches may pump millions of packets per second, whereas a software-based router may only be able to handle well under a million packets per second. To accommodate such a load, a number of vendors have developed layer 3 switches, which implement the packet-forwarding logic of the router in hardware.

There are a number of different layer 3 schemes on the market, but fundamentally they fall into two categories: packet by packet and flow based. The packet-by-packet switch operates in the same way as a traditional router. Because the forwarding logic is in hardware, the packet-by-packet switch can achieve an order of magnitude increase in performance compared to the software-based router. A flow-based switch tries to enhance performance by identifying flows of IP packets that have the same source and destination. This can be done by observing ongoing traffic or by using a special flow label in the packet header (allowed in IPv6 but not IPv4). Once a flow is identified, a predefined route can be established through the network to speed up the forwarding process. Again, huge performance increases over a pure software-based router are achieved.

Figure 15.14 is a typical example of the approach taken to local networking in an organization with a large number of PCs and workstations (thousands to tens of thousands). Desktop systems have links of 10 Mbps to 100 Mbps into a LAN controlled by a layer 2 switch. Wireless LAN connectivity is also likely to be available for mobile users.
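The flow-based idea can be sketched in a few lines. This is a toy model, not a description of any actual product: the cache keyed on (source, destination) and the `route_lookup` callback are our illustrative simplifications of the slow routing path.

```python
from typing import Callable, Dict, Tuple

class FlowBasedForwarder:
    """Toy flow-based layer 3 switch: the first packet of a flow goes
    through the (slow) routing function; later packets of the same
    (src, dst) flow take the precomputed fast path."""

    def __init__(self, route_lookup: Callable[[str], str]):
        self.route_lookup = route_lookup          # slow, router-style lookup
        self.flow_cache: Dict[Tuple[str, str], str] = {}
        self.slow_path_hits = 0

    def forward(self, src: str, dst: str) -> str:
        flow = (src, dst)
        if flow not in self.flow_cache:           # first packet of this flow
            self.slow_path_hits += 1
            self.flow_cache[flow] = self.route_lookup(dst)
        return self.flow_cache[flow]              # established fast path

# Usage: three packets of one flow trigger only one slow lookup.
fwd = FlowBasedForwarder(lambda dst: "port-to-" + dst.rsplit(".", 1)[0])
ports = [fwd.forward("10.0.1.5", "10.0.2.9") for _ in range(3)]
print(ports[0], fwd.slow_path_hits)
```

The hardware analogue replaces the Python dictionary with a fast lookup table (e.g. a CAM), but the division of labor between a slow route-resolution path and a fast per-flow path is the same.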
Layer 3 switches are at the local network's core, forming a local backbone. Typically, these switches are interconnected at 1 Gbps and connect to layer 2 switches at from 100 Mbps to 1 Gbps. Servers connect directly to layer 2 or layer 3 switches at 1 Gbps or possibly 100 Mbps. A lower-cost software-based router provides WAN connection. The circles in the figure identify separate LAN subnetworks; a MAC broadcast frame is limited to its own subnetwork.

Figure 15.14 Typical Premises Network Configuration

15.6 RECOMMENDED READING AND WEB SITE

The material in this chapter is covered in much more depth in [STAL00]. [REGA04] and [FORO02] also provide extensive coverage. [METZ99] is an excellent treatment of layer 2 and layer 3 switches, with a detailed discussion of products and case studies. Another comprehensive account is [SEIF00].

FORO02 Forouzan, B., and Chung, S. Local Area Networks. New York: McGraw-Hill, 2002.
METZ99 Metzler, J., and DeNoia, L. Layer 2 Switching. Upper Saddle River, NJ: Prentice Hall, 1999.
REGA04 Regan, P. Local Area Networks. Upper Saddle River, NJ: Prentice Hall, 2004.
SEIF00 Seifert, R. The Switch Book. New York: Wiley, 2000.
STAL00 Stallings, W. Local and Metropolitan Area Networks, Sixth Edition. Upper Saddle River, NJ: Prentice Hall, 2000.

Recommended Web site:
• IEEE 802 LAN/MAN Standards Committee: Status and documents for all of the working groups

15.7 KEY TERMS, REVIEW QUESTIONS, AND PROBLEMS

Key Terms

bridge; bus topology; hub; layer 2 switch; layer 3 switch; local area network (LAN); logical link control; medium access control (MAC); ring topology; spanning tree; star topology; storage area network (SAN); switch; tree topology

Review Questions
15.1. How do the key requirements for computer room networks differ from those for personal computer local networks?
15.2. What are the differences among backend LANs, SANs, and backbone LANs?
15.3. What is network topology?
15.4. List four common LAN topologies and briefly describe their methods of operation.
15.5. What is the purpose of the IEEE 802 committee?
15.6. Why are there multiple LAN standards?
15.7. List and briefly define the services provided by LLC.
15.8. List and briefly define the types of operation provided by the LLC protocol.
15.9. List some basic functions performed at the MAC layer.
15.10. What functions are performed by a bridge?
15.11. What is a spanning tree?
15.12. What is the difference between a hub and a layer 2 switch?
15.13. What is the difference between a store-and-forward switch and a cut-through switch?

Problems
15.1 Instead of LLC, could HDLC be used as a data link control protocol for a LAN? If not, what is lacking?
15.2 An asynchronous device, such as a teletype, transmits characters one at a time with unpredictable delays between characters. What problems, if any, do you foresee if such a device is connected to a LAN and allowed to transmit at will (subject to gaining access to the medium)? How might such problems be resolved?
15.3 Consider the transfer of a file containing one million 8-bit characters from one station to another. What is the total elapsed time and effective throughput for the following cases:
a. A circuit-switched, star-topology local network. Call setup time is negligible and the data rate on the medium is 64 kbps.
b. A bus topology local network with two stations a distance D apart, a data rate of B bps, and a frame size of P with 80 bits of overhead per frame. Each frame is acknowledged with an 88-bit frame before the next is sent. The propagation speed on the bus is 200 m/μs. Solve for:
1. D = 1 km, B = 1 Mbps, P = 256 bits
2. D = 1 km, B = 10 Mbps, P = 256 bits
3. D = 10 km, B = 1 Mbps, P = 256 bits
4. D = 1 km, B = 50 Mbps, P = 10,000 bits
c. A ring topology local network with a total circular length of 2D, with the two stations a distance D apart. Acknowledgment is achieved by allowing a frame to circulate past the destination station, back to the source station, with an acknowledgment bit set by the destination. There are N repeaters on the ring, each of which introduces a delay of one bit time. Repeat the calculation for each of b1 through b4 for N = 10; 100; 1000.
15.4 Consider a baseband bus with a number of equally spaced stations with a data rate of 10 Mbps and a bus length of 1 km.
a. What is the mean time to send a frame of 1000 bits to another station, measured from the beginning of transmission to the end of reception? Assume a propagation speed of 200 m/μs.
b. If two stations begin to transmit at exactly the same time, their packets will interfere with each other. If each transmitting station monitors the bus during transmission, how long before it notices an interference, in seconds? In bit times?
15.5 Repeat Problem 15.4 for a data rate of 100 Mbps.
15.6 At a propagation speed of 200 m/μs, what is the effective length added to a ring by a bit delay at each repeater?
a. At 1 Mbps
b. At 40 Mbps
15.7 A tree topology is to be provided that spans two buildings. If permission can be obtained to string cable between the two buildings, one continuous tree layout will be used. Otherwise, each building will have an independent tree topology network and a point-to-point link will connect a special communications station on one network with a communications station on the other network. What functions must the communications stations perform? Repeat for ring and star.
15.8 System A consists of a single ring with 300 stations, one per repeater. System B consists of three 100-station rings linked by a bridge. If the probability of a link failure is Pl, a repeater failure is Pr, and a bridge failure is Pb, derive an expression for parts (a) through (d):
a. Probability of failure of system A
b. Probability of complete failure of system B
c. Probability that a particular station will find the network unavailable, for systems A and B
d. Probability that any two stations, selected at random, will be unable to communicate, for systems A and B
e. Compute values for parts (a) through (d) for Pl = Pb = Pr = 10^-2.
15.9 Draw figures similar to Figure 15.9 for a configuration in which
a. Two LANs are connected via two bridges that are connected by a point-to-point link.
b. Two LANs are connected via two bridges that are connected by an X.25 packet-switching network.
15.10 For the configuration of Figure 15.10, show the central routing matrix and the routing tables at each bridge.

CHAPTER 16 HIGH-SPEED LANS
16.1 The Emergence of High-Speed LANs
16.2 Ethernet
16.3 Fibre Channel
16.4 Recommended Reading and Web Sites
16.5 Key Terms, Review Questions, and Problems
Appendix 16A Digital Signal Encoding for LANs
Appendix 16B Performance Issues
Appendix 16C Scrambling

Congratulations. I knew the record would stand until it was broken.
Yogi Berra

KEY POINTS
• The IEEE 802.3 standard, known as Ethernet, now encompasses data rates of 10 Mbps, 100 Mbps, 1 Gbps, and 10 Gbps. For the lower data rates, the CSMA/CD MAC protocol is used. For the 1-Gbps and 10-Gbps options, a switched technique is used.
• Fibre Channel is a switched network of nodes designed to provide high-speed linkages for such applications as storage area networks.
• A variety of signal encoding techniques are used in the various LAN standards to achieve efficiency and to make the high data rates practical.
Recent years have seen rapid changes in the technology, design, and commercial applications for local area networks (LANs). A major feature of this evolution is the introduction of a variety of new schemes for high-speed local networking. To keep pace with the changing local networking needs of business, a number of approaches to high-speed LAN design have become commercial products. The most important of these are
• Fast Ethernet and Gigabit Ethernet: The extension of 10-Mbps CSMA/CD (carrier sense multiple access with collision detection) to higher speeds is a logical strategy because it tends to preserve the investment in existing systems.
• Fibre Channel: This standard provides a low-cost, easily scalable approach to achieving very high data rates in local areas.
• High-speed wireless LANs: Wireless LAN technology and standards have at last come of age, and high-speed standards and products are being introduced.

Table 16.1 lists some of the characteristics of these approaches. The remainder of this chapter fills in some of the details on Ethernet and Fibre Channel. Chapter 17 covers wireless LANs.

16.1 THE EMERGENCE OF HIGH-SPEED LANS

Personal computers and microcomputer workstations began to achieve widespread acceptance in business computing in the early 1980s and have now achieved the status of the telephone: an essential tool for office workers.
Table 16.1 Characteristics of Some High-Speed LANs
Fast Ethernet | Data rate: 100 Mbps | Transmission media: UTP, STP, optical fiber | Access method: CSMA/CD | Supporting standard: IEEE 802.3
Gigabit Ethernet | Data rate: 1 Gbps, 10 Gbps | Transmission media: UTP, shielded cable, optical fiber | Access method: Switched | Supporting standard: IEEE 802.3
Fibre Channel | Data rate: 100 Mbps to 3.2 Gbps | Transmission media: Optical fiber, coaxial cable, STP | Access method: Switched | Supporting standard: Fibre Channel Association
Wireless LAN | Data rate: 1 Mbps to 54 Mbps | Transmission media: 2.4-GHz, 5-GHz microwave | Access method: CSMA/Polling | Supporting standard: IEEE 802.11

Until relatively recently, office LANs provided basic connectivity services—connecting personal computers and terminals to mainframes and midrange systems that ran corporate applications, and providing workgroup connectivity at the departmental or divisional level. In both cases, traffic patterns were relatively light, with an emphasis on file transfer and electronic mail. The LANs that were available for this type of workload, primarily Ethernet and token ring, are well suited to this environment.

In recent years, two significant trends have altered the role of the personal computer and therefore the requirements on the LAN:
• The speed and computing power of personal computers have continued to enjoy explosive growth. Today's more powerful platforms support graphics-intensive applications and ever more elaborate graphical user interfaces to the operating system.
• MIS organizations have recognized the LAN as a viable and indeed essential computing platform, resulting in the focus on network computing. This trend began with client/server computing, which has become a dominant architecture in the business environment, and continues with the more recent intranetwork trend. Both of these approaches involve the frequent transfer of potentially large volumes of data in a transaction-oriented environment.

The effect of these trends has been to increase the volume of data to be handled over LANs and, because applications are more interactive, to reduce the acceptable delay on data transfers.
The earlier generation of 10-Mbps Ethernets and 16-Mbps token rings are simply not up to the job of supporting these requirements. The following are examples of requirements that call for higher-speed LANs:
• Centralized server farms: In many applications, there is a need for user, or client, systems to be able to draw huge amounts of data from multiple centralized servers, called server farms. An example is a color publishing operation, in which servers typically contain hundreds of gigabytes of image data that must be downloaded to imaging workstations. As the performance of the servers themselves has increased, the bottleneck has shifted to the network.
• Power workgroups: These groups typically consist of a small number of cooperating users who need to draw massive data files across the network. Examples are a software development group that runs tests on a new software version, or a computer-aided design (CAD) company that regularly runs simulations of new designs. In such cases, large amounts of data are distributed to several workstations, processed, and updated at very high speed for multiple iterations.
• High-speed local backbone: As processing demand grows, LANs proliferate at a site, and high-speed interconnection is necessary.

16.2 ETHERNET

The most widely used high-speed LANs today are based on Ethernet and were developed by the IEEE 802.3 standards committee. As with other LAN standards, there is both a medium access control layer and a physical layer, which are considered in turn in what follows.

IEEE 802.3 Medium Access Control

It is easier to understand the operation of CSMA/CD if we look first at some earlier schemes from which CSMA/CD evolved.

Precursors

CSMA/CD and its precursors can be termed random access, or contention, techniques. They are random access in the sense that there is no predictable or scheduled time for any station to transmit; station transmissions are ordered randomly.
They exhibit contention in the sense that stations contend for time on the shared medium. The earliest of these techniques, known as ALOHA, was developed for packet radio networks. However, it is applicable to any shared transmission medium. ALOHA, or pure ALOHA as it is sometimes called, specifies that a station may transmit a frame at any time. The station then listens for an amount of time equal to the maximum possible round-trip propagation delay on the network (twice the time it takes to send a frame between the two most widely separated stations) plus a small fixed time increment. If the station hears an acknowledgment during that time, fine; otherwise, it resends the frame. If the station fails to receive an acknowledgment after repeated transmissions, it gives up. A receiving station determines the correctness of an incoming frame by examining a frame check sequence field, as in HDLC. If the frame is valid and if the destination address in the frame header matches the receiver’s address, the station immediately sends an acknowledgment. The frame may be invalid due to noise on the channel or because another station transmitted a frame at about the same time. In the latter case, the two frames may interfere with each other at the receiver so that neither gets through; this is known as a collision. If a received frame is determined to be invalid, the receiving station simply ignores the frame. ALOHA is as simple as can be, and pays a penalty for it. Because the number of collisions rises rapidly with increased load, the maximum utilization of the channel is only about 18%. To improve efficiency, a modification of ALOHA, known as slotted ALOHA, was developed. In this scheme, time on the channel is organized into uniform slots whose size equals the frame transmission time. Some central clock or other technique is needed to synchronize all stations. Transmission is permitted to begin only at a slot boundary. Thus, frames that do overlap will do so totally. 
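The 18% limit, and the corresponding limit for slotted ALOHA, follow from the classical Poisson-offered-load analysis, under which pure ALOHA throughput is S = Ge^(-2G) and slotted ALOHA throughput is S = Ge^(-G), where G is the offered load in frames per frame time. A quick numerical check of the maxima (the formulas are standard results, not derived in the text):

```python
import math

def pure_aloha_throughput(g: float) -> float:
    # S = G * e^(-2G): a frame succeeds only if no other frame starts
    # within one frame time before or after it (vulnerable period = 2).
    return g * math.exp(-2 * g)

def slotted_aloha_throughput(g: float) -> float:
    # S = G * e^(-G): slotting halves the vulnerable period to one slot.
    return g * math.exp(-g)

# Scan offered loads; the maxima land at G = 0.5 and G = 1, respectively.
loads = [i / 1000 for i in range(1, 3000)]
print(round(max(map(pure_aloha_throughput, loads)), 3))     # 0.184, i.e. 1/(2e)
print(round(max(map(slotted_aloha_throughput, loads)), 3))  # 0.368, i.e. 1/e
```

The factor-of-two gain from slotting comes entirely from halving the vulnerable period: a slotted frame can only collide with frames transmitted in the same slot.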
This increases the maximum utilization of the system to about 37%.

Both ALOHA and slotted ALOHA exhibit poor utilization. Both fail to take advantage of one of the key properties of both packet radio networks and LANs, which is that propagation delay between stations may be very small compared to frame transmission time. Consider the following observations. If the station-to-station propagation time is large compared to the frame transmission time, then, after a station launches a frame, it will be a long time before other stations know about it. During that time, one of the other stations may transmit a frame; the two frames may interfere with each other and neither gets through. Indeed, if the distances are great enough, many stations may begin transmitting, one after the other, and none of their frames get through unscathed. Suppose, however, that the propagation time is small compared to frame transmission time. In that case, when a station launches a frame, all the other stations know it almost immediately. So, if they had any sense, they would not try transmitting until the first station was done. Collisions would be rare because they would occur only when two stations began to transmit almost simultaneously. Another way to look at it is that a short propagation delay provides the stations with better feedback about the state of the network; this information can be used to improve efficiency.

The foregoing observations led to the development of carrier sense multiple access (CSMA). With CSMA, a station wishing to transmit first listens to the medium to determine if another transmission is in progress (carrier sense). If the medium is in use, the station must wait. If the medium is idle, the station may transmit. It may happen that two or more stations attempt to transmit at about the same time. If this happens, there will be a collision; the data from both transmissions will be garbled and not received successfully.
To account for this, a station waits a reasonable amount of time after transmitting for an acknowledgment, taking into account the maximum round-trip propagation delay and the fact that the acknowledging station must also contend for the channel to respond. If there is no acknowledgment, the station assumes that a collision has occurred and retransmits. One can see how this strategy would be effective for networks in which the average frame transmission time is much longer than the propagation time. Collisions can occur only when more than one user begins transmitting within a short time interval (the period of the propagation delay). If a station begins to transmit a frame, and there are no collisions during the time it takes for the leading edge of the packet to propagate to the farthest station, then there will be no collision for this frame because all other stations are now aware of the transmission. The maximum utilization achievable using CSMA can far exceed that of ALOHA or slotted ALOHA. The maximum utilization depends on the length of the frame and on the propagation time; the longer the frames or the shorter the propagation time, the higher the utilization. With CSMA, an algorithm is needed to specify what a station should do if the medium is found busy. Three approaches are depicted in Figure 16.1. One algorithm is nonpersistent CSMA. A station wishing to transmit listens to the medium and obeys the following rules: 1. If the medium is idle, transmit; otherwise, go to step 2. 2. If the medium is busy, wait an amount of time drawn from a probability distribution (the retransmission delay) and repeat step 1. 
Figure 16.1 CSMA Persistence and Backoff (Nonpersistent: transmit if idle; if busy, wait a random time and repeat the process; if collision, back off. 1-Persistent: transmit as soon as the channel goes idle; if collision, back off. P-Persistent: transmit with probability P as soon as the channel goes idle; otherwise, delay one time slot and repeat the process; if collision, back off.)

The use of random delays reduces the probability of collisions. To see this, consider that two stations become ready to transmit at about the same time while another transmission is in progress; if both stations delay the same amount of time before trying again, they will both attempt to transmit at about the same time. A problem with nonpersistent CSMA is that capacity is wasted because the medium will generally remain idle following the end of a transmission even if there are one or more stations waiting to transmit.

To avoid idle channel time, the 1-persistent protocol can be used. A station wishing to transmit listens to the medium and obeys the following rules:
1. If the medium is idle, transmit; otherwise, go to step 2.
2. If the medium is busy, continue to listen until the channel is sensed idle; then transmit immediately.

Whereas nonpersistent stations are deferential, 1-persistent stations are selfish. If two or more stations are waiting to transmit, a collision is guaranteed. Things get sorted out only after the collision.

A compromise that attempts to reduce collisions, like nonpersistent, and reduce idle time, like 1-persistent, is p-persistent. The rules are as follows:
1. If the medium is idle, transmit with probability p, and delay one time unit with probability (1 - p). The time unit is typically equal to the maximum propagation delay.
2. If the medium is busy, continue to listen until the channel is idle and repeat step 1.
3. If transmission is delayed one time unit, repeat step 1.
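The p-persistent rules can be written as a short transmit routine. This is an illustrative sketch: the `channel_idle` carrier-sense callback and the slot accounting are hypothetical scaffolding, and 1-persistent behavior falls out as the special case p = 1.

```python
import random

def p_persistent_attempt(channel_idle, p: float,
                         rng=random.random, max_slots=10_000) -> int:
    """p-persistent CSMA transmit decision. `channel_idle` is a
    hypothetical carrier-sense callback, consulted once per time slot.
    Returns the number of slots waited before transmitting."""
    waited = 0
    for _ in range(max_slots):
        # Rule 2: if the medium is busy, keep listening until it is idle.
        if not channel_idle():
            waited += 1
            continue
        # Rule 1: medium idle -- transmit with probability p ...
        if rng() < p:
            return waited
        # ... otherwise delay one time unit and repeat (rule 3).
        waited += 1
    raise RuntimeError("gave up waiting for the channel")

# On an always-idle channel with p = 0.1, the expected number of
# iterations of rule 1 is 1/p = 10, i.e. about 9 delayed slots on average.
random.seed(1)
waits = [p_persistent_attempt(lambda: True, 0.1) for _ in range(10_000)]
print(sum(waits) / len(waits))
```

Running the same experiment with p = 1 always returns 0 waited slots, which is exactly the selfish 1-persistent behavior described above.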
The question arises as to what is an effective value of p. The main problem to avoid is one of instability under heavy load. Consider the case in which n stations have frames to send while a transmission is taking place. At the end of the transmission, the expected number of stations that will attempt to transmit is equal to the number of stations ready to transmit times the probability of transmitting, or np. If np is greater than 1, on average multiple stations will attempt to transmit and there will be a collision. What is more, as soon as all these stations realize that their transmission suffered a collision, they will be back again, almost guaranteeing more collisions. Worse yet, these retries will compete with new transmissions from other stations, further increasing the probability of collision. Eventually, all stations will be trying to send, causing continuous collisions, with throughput dropping to zero. To avoid this catastrophe, np must be less than one for the expected peaks of n; therefore, if a heavy load is expected to occur with some regularity, p must be small. However, as p is made smaller, stations must wait longer to attempt transmission. At low loads, this can result in very long delays. For example, if only a single station desires to transmit, the expected number of iterations of step 1 is 1/p (see Problem 16.2). Thus, if p = 0.1, at low load, a station will wait an average of 9 time units before transmitting on an idle line.

Description of CSMA/CD

CSMA, although more efficient than ALOHA or slotted ALOHA, still has one glaring inefficiency. When two frames collide, the medium remains unusable for the duration of transmission of both damaged frames. For long frames, compared to propagation time, the amount of wasted capacity can be considerable. This waste can be reduced if a station continues to listen to the medium while transmitting. This leads to the following rules for CSMA/CD:
1. If the medium is idle, transmit; otherwise, go to step 2.
2. If the medium is busy, continue to listen until the channel is idle, then transmit immediately.
3. If a collision is detected during transmission, transmit a brief jamming signal to assure that all stations know that there has been a collision and then cease transmission.
4. After transmitting the jamming signal, wait a random amount of time, referred to as the backoff, then attempt to transmit again (repeat from step 1).

Figure 16.2 illustrates the technique for a baseband bus. The upper part of the figure shows a bus LAN layout. At time t0, station A begins transmitting a packet addressed to D. At t1, both B and C are ready to transmit. B senses a transmission and so defers. C, however, is still unaware of A's transmission (because the leading edge of A's transmission has not yet arrived at C) and begins its own transmission. When A's transmission reaches C, at t2, C detects the collision and ceases transmission. The effect of the collision propagates back to A, where it is detected some time later, t3, at which time A ceases transmission.

Figure 16.2 CSMA/CD Operation (stations A, B, C, D on a bus; snapshots of the signal on the bus at times t0 through t3)

With CSMA/CD, the amount of wasted capacity is reduced to the time it takes to detect a collision. Question: How long does that take? Let us consider the case of a baseband bus and consider two stations as far apart as possible. For example, in Figure 16.2, suppose that station A begins a transmission and that just before that transmission reaches D, D is ready to transmit. Because D is not yet aware of A's transmission, it begins to transmit. A collision occurs almost immediately and is recognized by D.
However, the collision must propagate all the way back to A before A is aware of the collision. By this line of reasoning, we conclude that the amount of time that it takes to detect a collision is no greater than twice the end-to-end propagation delay. An important rule followed in most CSMA/CD systems, including the IEEE standard, is that frames should be long enough to allow collision detection prior to the end of transmission. If shorter frames are used, then collision detection does not occur, and CSMA/CD exhibits the same performance as the less efficient CSMA protocol.

For a CSMA/CD LAN, the question arises as to which persistence algorithm to use. You may be surprised to learn that the algorithm used in the IEEE 802.3 standard is 1-persistent. Recall that both nonpersistent and p-persistent have performance problems. In the nonpersistent case, capacity is wasted because the medium will generally remain idle following the end of a transmission even if there are stations waiting to send. In the p-persistent case, p must be set low enough to avoid instability, with the result of sometimes atrocious delays under light load. The 1-persistent algorithm, which means, after all, that p = 1, would seem to be even more unstable than p-persistent due to the greed of the stations. What saves the day is that the wasted time due to collisions is mercifully short (if the frames are long relative to propagation delay), and with random backoff, the two stations involved in a collision are unlikely to collide on their next tries.

To ensure that backoff maintains stability, IEEE 802.3 and Ethernet use a technique known as binary exponential backoff. A station will attempt to transmit repeatedly in the face of repeated collisions. For the first 10 retransmission attempts, the mean value of the random delay is doubled. This mean value then remains the same for 6 additional attempts.
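This truncated binary exponential backoff schedule can be sketched as follows. Two details here are standard 802.3 practice rather than statements from the text, so treat them as assumptions: the delay is drawn uniformly over [0, 2^k - 1] slot times with k capped at 10, and the slot time shown is the classic 51.2-μs figure for 10-Mbps operation.

```python
import random

SLOT_TIME_US = 51.2    # classic slot time for 10-Mbps 802.3 (assumed)
ATTEMPT_LIMIT = 16     # station gives up and reports an error after 16 attempts
BACKOFF_LIMIT = 10     # mean delay stops doubling after 10 attempts

def backoff_slots(attempt: int, rng=random.randrange) -> int:
    """Truncated binary exponential backoff: after the nth collision,
    wait a random integer number of slots in [0, 2**min(n, 10) - 1]."""
    if attempt > ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: giving up, reporting error")
    k = min(attempt, BACKOFF_LIMIT)
    return rng(2 ** k)

# The mean backoff doubles for the first 10 attempts, then stays flat:
for n in (1, 2, 10, 11, 16):
    mean_slots = (2 ** min(n, BACKOFF_LIMIT) - 1) / 2
    print(n, mean_slots * SLOT_TIME_US)   # mean backoff in microseconds
```

Note how the cap at k = 10 produces exactly the behavior the text describes: the mean delay doubles for the first 10 attempts and then remains the same for the next 6, after which the station gives up.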
After 16 unsuccessful attempts, the station gives up and reports an error. Thus, as congestion increases, stations back off by larger and larger amounts to reduce the probability of collision. The beauty of the 1-persistent algorithm with binary exponential backoff is that it is efficient over a wide range of loads. At low loads, 1-persistence guarantees that a station can seize the channel as soon as it goes idle, in contrast to the non- and p-persistent schemes. At high loads, it is at least as stable as the other techniques. However, one unfortunate effect of the backoff algorithm is that it has a last-in first-out effect; stations with no or few collisions will have a chance to transmit before stations that have waited longer.

For a baseband bus, a collision should produce substantially higher voltage swings than those produced by a single transmitter. Accordingly, the IEEE standard dictates that the transmitter will detect a collision if the signal on the cable at the transmitter tap point exceeds the maximum that could be produced by the transmitter alone. Because a transmitted signal attenuates as it propagates, there is a potential problem: If two stations far apart are transmitting, each station will receive a greatly attenuated signal from the other. The signal strength could be so small that when it is added to the transmitted signal at the transmitter tap point, the combined signal does not exceed the CD threshold. For this reason, among others, the IEEE standard restricts the maximum length of coaxial cable to 500 m for 10BASE5 and 200 m for 10BASE2.

A much simpler collision detection scheme is possible with the twisted-pair star-topology approach (Figure 15.12). In this case, collision detection is based on logic rather than sensing voltage magnitudes. For any hub, if there is activity (signal) on more than one input, a collision is assumed. A special signal called the collision presence signal is generated.
This signal is generated and sent out as long as activity is sensed on any of the input lines. This signal is interpreted by every node as an occurrence of a collision.

MAC Frame

Figure 16.3 depicts the frame format for the 802.3 protocol. It consists of the following fields:
• Preamble: A 7-octet pattern of alternating 0s and 1s used by the receiver to establish bit synchronization.
• Start Frame Delimiter (SFD): The sequence 10101011, which indicates the actual start of the frame and enables the receiver to locate the first bit of the rest of the frame.
• Destination Address (DA): Specifies the station(s) for which the frame is intended. It may be a unique physical address, a group address, or a global address.
• Source Address (SA): Specifies the station that sent the frame.
• Length/Type: Length of LLC data field in octets, or Ethernet Type field, depending on whether the frame conforms to the IEEE 802.3 standard or the earlier Ethernet specification. In either case, the maximum frame size, excluding the Preamble and SFD, is 1518 octets.
• LLC Data: Data unit supplied by LLC.
• Pad: Octets added to ensure that the frame is long enough for proper CD operation.
• Frame Check Sequence (FCS): A 32-bit cyclic redundancy check, based on all fields except preamble, SFD, and FCS.

Figure 16.3 IEEE 802.3 Frame Format (field widths in octets: Preamble 7, SFD 1, DA 6, SA 6, Length 2, LLC data plus Pad 46 to 1500, FCS 4)

IEEE 802.3 10-Mbps Specifications (Ethernet)

The IEEE 802.3 committee has defined a number of alternative physical configurations. This is both good and bad. On the good side, the standard has been responsive to evolving technology. On the bad side, the customer, not to mention the potential vendor, is faced with a bewildering array of options.
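Returning briefly to the frame format: the relationship between the Pad field and the frame-size limits in Figure 16.3 can be checked with a short sketch. The helper names are ours; the 46-octet minimum for data plus pad and the field widths are taken from the figure.

```python
MIN_DATA_PLUS_PAD = 46          # octets (from Figure 16.3)
MAX_DATA_PLUS_PAD = 1500
HEADER_TRAILER = 6 + 6 + 2 + 4  # DA + SA + Length + FCS, excluding Preamble/SFD

def pad_length(llc_octets: int) -> int:
    """Octets of padding needed so the frame is long enough for
    proper collision detection (the role of the Pad field)."""
    if llc_octets > MAX_DATA_PLUS_PAD:
        raise ValueError("LLC data too large for one frame")
    return max(0, MIN_DATA_PLUS_PAD - llc_octets)

def frame_length(llc_octets: int) -> int:
    """Total frame length in octets, excluding Preamble and SFD."""
    return HEADER_TRAILER + llc_octets + pad_length(llc_octets)

print(frame_length(1))     # 64: minimum frame (1 octet of data, 45 of pad)
print(frame_length(1500))  # 1518: the maximum frame size cited in the text
```

The 64-octet minimum that falls out of these figures is exactly the "long enough to allow collision detection" requirement discussed earlier: short LLC payloads are padded up rather than sent as short frames.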
However, the committee has been at pains to ensure that the various options can be easily integrated into a configuration that satisfies a variety of needs. Thus, the user that has a complex set of requirements may find the flexibility and variety of the 802.3 standard to be an asset.

To distinguish the various implementations that are available, the committee has developed a concise notation:

⟨data rate in Mbps⟩⟨signaling method⟩⟨maximum segment length in hundreds of meters⟩

The defined alternatives for 10 Mbps are as follows:¹
• 10BASE5: Specifies the use of 50-ohm coaxial cable and Manchester digital signaling.² The maximum length of a cable segment is set at 500 meters. The length of the network can be extended by the use of repeaters. A repeater is transparent to the MAC level; because it does no buffering, it does not isolate one segment from another. So, for example, if two stations on different segments attempt to transmit at the same time, their transmissions will collide. To avoid looping, only one path of segments and repeaters is allowed between any two stations. The standard allows a maximum of four repeaters in the path between any two stations, extending the effective length of the medium to 2.5 kilometers.
• 10BASE2: Similar to 10BASE5 but uses a thinner cable, which supports fewer taps over a shorter distance than the 10BASE5 cable. This is a lower-cost alternative to 10BASE5.
• 10BASE-T: Uses unshielded twisted pair in a star-shaped topology. Because of the high data rate and the poor transmission qualities of unshielded twisted pair, the length of a link is limited to 100 meters. As an alternative, an optical fiber link may be used. In this case, the maximum length is 500 m.

1. There is also a 10BROAD36 option, specifying a 10-Mbps broadband bus; this option is rarely used.
2. See Section 5.1.
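The notation can be unpacked mechanically, as the following illustrative helper shows (`parse_designation` is not part of any standard; note that a numeric suffix is only nominal: 10BASE2's "2" suggests 200 m even though the actual limit is 185 m):

```python
import re

def parse_designation(name: str):
    """Split an IEEE 802.3 designation into its three parts:
    <data rate in Mbps><signaling method><segment length or medium code>.

    A numeric suffix gives the nominal maximum segment length in
    hundreds of meters; a letter suffix ("T", "F", ...) names the
    medium instead. Purely illustrative.
    """
    m = re.fullmatch(r"(\d+)(BASE|BROAD)-?(\w+)", name)
    rate, signaling, suffix = int(m.group(1)), m.group(2), m.group(3)
    segment_m = int(suffix) * 100 if suffix.isdigit() else None
    return (rate, signaling, segment_m if segment_m is not None else suffix)
```

For example, "10BASE5" parses to a 10-Mbps baseband system with 500-m segments, while "10BASE-T" yields the medium code "T" (twisted pair).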
Table 16.2 IEEE 802.3 10-Mbps Physical Layer Medium Alternatives

                             10BASE5                 10BASE2                 10BASE-T                 10BASE-FP
Transmission medium          Coaxial cable (50 ohm)  Coaxial cable (50 ohm)  Unshielded twisted pair  850-nm optical fiber pair
Signaling technique          Baseband (Manchester)   Baseband (Manchester)   Baseband (Manchester)    Manchester/on-off
Topology                     Bus                     Bus                     Star                     Star
Maximum segment length (m)   500                     185                     100                      500
Nodes per segment            100                     30                      —                        33
Cable diameter               10 mm                   5 mm                    0.4 to 0.6 mm            62.5/125 µm

• 10BASE-F: Contains three specifications: a passive-star topology for interconnecting stations and repeaters with up to 1 km per segment; a point-to-point link that can be used to connect stations or repeaters at up to 2 km; and a point-to-point link that can be used to connect repeaters at up to 2 km.

Note that 10BASE-T and 10BASE-F do not quite follow the notation: "T" stands for twisted pair and "F" stands for optical fiber. Table 16.2 summarizes the remaining options. All of the alternatives listed in the table specify a data rate of 10 Mbps.

IEEE 802.3 100-Mbps Specifications (Fast Ethernet)
Fast Ethernet refers to a set of specifications developed by the IEEE 802.3 committee to provide a low-cost, Ethernet-compatible LAN operating at 100 Mbps. The blanket designation for these standards is 100BASE-T. The committee defined a number of alternatives to be used with different transmission media. Table 16.3 summarizes key characteristics of the 100BASE-T options. All of the 100BASE-T options use the IEEE 802.3 MAC protocol and frame format. 100BASE-X refers to a set of options that use two physical links between nodes: one for transmission and one for reception. 100BASE-TX makes use of shielded twisted pair (STP) or high-quality (Category 5) unshielded twisted pair (UTP). 100BASE-FX uses optical fiber. In many buildings, any of the 100BASE-X options requires the installation of new cable.
For such cases, 100BASE-T4 defines a lower-cost alternative that can use Category 3, voice-grade UTP in addition to the higher-quality Category 5 UTP.³ To achieve the 100-Mbps data rate over lower-quality cable, 100BASE-T4 dictates the use of four twisted-pair lines between nodes, with the data transmission making use of three pairs in one direction at a time. For all of the 100BASE-T options, the topology is similar to that of 10BASE-T, namely a star-wire topology.

3. See Chapter 4 for a discussion of Category 3 and Category 5 cable.

Table 16.3 IEEE 802.3 100BASE-T Physical Layer Medium Alternatives

                         100BASE-TX                              100BASE-FX        100BASE-T4
Transmission medium      2 pair, STP; or 2 pair, Category 5 UTP  2 optical fibers  4 pair, Category 3, 4, or 5 UTP
Signaling technique      MLT-3                                   4B5B, NRZI        8B6T, NRZ
Data rate                100 Mbps                                100 Mbps          100 Mbps
Maximum segment length   100 m                                   100 m             100 m
Network span             200 m                                   400 m             200 m

100BASE-X
For all of the transmission media specified under 100BASE-X, a unidirectional data rate of 100 Mbps is achieved by transmitting over a single link (single twisted pair, single optical fiber). For all of these media, an efficient and effective signal encoding scheme is required. The one chosen is referred to as 4B/5B-NRZI. This scheme is further modified for each option; see Appendix 16A for a description. The 100BASE-X designation includes two physical medium specifications, one for twisted pair, known as 100BASE-TX, and one for optical fiber, known as 100BASE-FX. 100BASE-TX makes use of two pairs of twisted-pair cable, one pair used for transmission and one for reception. Both STP and Category 5 UTP are allowed. The MLT-3 signaling scheme is used (described in Appendix 16A). 100BASE-FX makes use of two optical fiber cables, one for transmission and one for reception. With 100BASE-FX, a means is needed to convert the 4B/5B-NRZI code group stream into optical signals.
The technique used is known as intensity modulation. A binary 1 is represented by a burst or pulse of light; a binary 0 is represented by either the absence of a light pulse or a light pulse at very low intensity.

100BASE-T4
100BASE-T4 is designed to produce a 100-Mbps data rate over lower-quality Category 3 cable, thus taking advantage of the large installed base of Category 3 cable in office buildings. The specification also indicates that the use of Category 5 cable is optional. 100BASE-T4 does not transmit a continuous signal between packets, which makes it useful in battery-powered applications. For 100BASE-T4 using voice-grade Category 3 cable, it is not reasonable to expect to achieve 100 Mbps on a single twisted pair. Instead, 100BASE-T4 specifies that the data stream to be transmitted is split up into three separate data streams, each with an effective data rate of 33⅓ Mbps. Four twisted pairs are used. Data are transmitted using three pairs and received using three pairs. Thus, two of the pairs must be configured for bidirectional transmission. As with 100BASE-X, a simple NRZ encoding scheme is not used for 100BASE-T4; that would require a signaling rate of 33 Mbps on each twisted pair and would not provide synchronization. Instead, a ternary signaling scheme known as 8B6T is used (described in Appendix 16A).

Full-Duplex Operation
A traditional Ethernet is half duplex: a station can either transmit or receive a frame, but it cannot do both simultaneously. With full-duplex operation, a station can transmit and receive simultaneously. If a 100-Mbps Ethernet ran in full-duplex mode, the theoretical transfer rate would become 200 Mbps. Several changes are needed to operate in full-duplex mode. The attached stations must have full-duplex rather than half-duplex adapter cards. The central point in the star wire cannot be a simple multiport repeater but rather must be a switching hub.
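The 4B/5B-NRZI encoding used by the 100BASE-X options (detailed in Appendix 16A) can be sketched as follows; the code-group table is the standard 4B/5B data-symbol assignment, and the function itself is only illustrative:

```python
# Standard 4B/5B data code groups: each 4-bit nibble maps to a 5-bit
# group chosen so that no code group has more than one leading 0 or
# more than two trailing 0s (guaranteeing frequent transitions).
CODE_4B5B = [
    "11110", "01001", "10100", "10101", "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111", "11010", "11011", "11100", "11101",
]

def encode_4b5b_nrzi(data: bytes) -> str:
    """4B/5B followed by NRZI, as a string of signal levels.

    Each nibble becomes a 5-bit code group; NRZI then sends a binary 1
    as a transition in signal level and a binary 0 as no transition.
    """
    bits = "".join(CODE_4B5B[b >> 4] + CODE_4B5B[b & 0x0F] for b in data)
    level, out = 0, []
    for bit in bits:
        if bit == "1":
            level ^= 1          # transition on every 1
        out.append(str(level))  # hold the level on 0
    return "".join(out)
```

Note the 25% overhead: every 4 data bits become 5 signal elements, which is why 100BASE-X signals at 125 Mbaud to carry 100 Mbps.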
With a switching hub, each station constitutes a separate collision domain. In fact, there are no collisions, and the CSMA/CD algorithm is no longer needed. However, the same 802.3 MAC frame format is used, and the attached stations can continue to execute the CSMA/CD algorithm, even though no collisions can ever be detected.

Mixed Configuration
One of the strengths of the Fast Ethernet approach is that it readily supports a mixture of existing 10-Mbps LANs and newer 100-Mbps LANs. For example, the 100-Mbps technology can be used as a backbone LAN to support a number of 10-Mbps hubs. Many of the stations attach to 10-Mbps hubs using the 10BASE-T standard. These hubs are in turn connected to switching hubs that conform to 100BASE-T and that can support both 10-Mbps and 100-Mbps links. Additional high-capacity workstations and servers attach directly to these 10/100 switches. These mixed-capacity switches are in turn connected to 100-Mbps hubs using 100-Mbps links. The 100-Mbps hubs provide a building backbone and are also connected to a router that provides connection to an outside WAN.

Gigabit Ethernet
In late 1995, the IEEE 802.3 committee formed a High-Speed Study Group to investigate means for conveying packets in Ethernet format at speeds in the gigabits per second range. The strategy for Gigabit Ethernet is the same as that for Fast Ethernet. While defining a new medium and transmission specification, Gigabit Ethernet retains the CSMA/CD protocol and Ethernet format of its 10-Mbps and 100-Mbps predecessors. It is compatible with 100BASE-T and 10BASE-T, preserving a smooth migration path. As more organizations move to 100BASE-T, putting huge traffic loads on backbone networks, demand for Gigabit Ethernet has intensified. Figure 16.4 shows a typical application of Gigabit Ethernet. A 1-Gbps switching hub provides backbone connectivity for central servers and high-speed workgroup hubs.
Each workgroup LAN switch supports both 1-Gbps links, to connect to the backbone LAN switch and to support high-performance workgroup servers, and 100-Mbps links, to support high-performance workstations, servers, and 100-Mbps LAN switches.

Media Access Layer
The 1000-Mbps specification calls for the same CSMA/CD frame format and MAC protocol as used in the 10-Mbps and 100-Mbps versions of IEEE 802.3. For shared-medium hub operation (Figure 15.13b), there are two enhancements to the basic CSMA/CD scheme:
• Carrier extension: Carrier extension appends a set of special symbols to the end of short MAC frames so that the resulting block is at least 4096 bit-times in duration, up from the minimum 512 bit-times imposed at 10 and 100 Mbps. This is done so that the frame length of a transmission is longer than the propagation time at 1 Gbps.
• Frame bursting: This feature allows for multiple short frames to be transmitted consecutively, up to a limit, without relinquishing control for CSMA/CD between frames. Frame bursting avoids the overhead of carrier extension when a single station has a number of small frames ready to send.

Figure 16.4 Example Gigabit Ethernet Configuration (a 1-Gbps switching hub connects central servers and 100/1000-Mbps workgroup hubs over 1-Gbps and 100-Mbps links)

With a switching hub (Figure 15.13c), which provides dedicated access to the medium, the carrier extension and frame bursting features are not needed. This is because data transmission and reception at a station can occur simultaneously without interference and with no contention for a shared medium.

Physical Layer
The current 1-Gbps specification for IEEE 802.3 includes the following physical layer alternatives (Figure 16.5):
• 1000BASE-SX: This short-wavelength option supports duplex links of up to 275 m using 62.5-µm multimode fiber or up to 550 m using 50-µm multimode fiber. Wavelengths are in the range of 770 to 860 nm.
• 1000BASE-LX: This long-wavelength option supports duplex links of up to 550 m of 62.5-µm or 50-µm multimode fiber or 5 km of 10-µm single-mode fiber. Wavelengths are in the range of 1270 to 1355 nm.
• 1000BASE-CX: This option supports 1-Gbps links among devices located within a single room or equipment rack, using copper jumpers (specialized shielded twisted-pair cable that spans no more than 25 m). Each link is composed of a separate shielded twisted pair running in each direction.
• 1000BASE-T: This option makes use of four pairs of Category 5 unshielded twisted pair to support devices over a range of up to 100 m.

Figure 16.5 Gigabit Ethernet Medium Options (log scale)

The signal encoding scheme used for the first three Gigabit Ethernet options just listed is 8B/10B, which is described in Appendix 16A. The signal encoding scheme used for 1000BASE-T is 4D-PAM5, a complex scheme whose description is beyond our scope.

10-Gbps Ethernet
With gigabit products still fairly new, attention has turned in the past several years to a 10-Gbps Ethernet capability. The principal driving requirement for 10 Gigabit Ethernet is the increase in Internet and intranet traffic.
A number of factors contribute to the explosive growth in both Internet and intranet traffic:
• An increase in the number of network connections
• An increase in the connection speed of each end station (e.g., 10-Mbps users moving to 100 Mbps, analog 56-kbps users moving to DSL and cable modems)
• An increase in the deployment of bandwidth-intensive applications such as high-quality video
• An increase in Web hosting and application hosting traffic

Initially, network managers will use 10-Gbps Ethernet to provide high-speed, local backbone interconnection between large-capacity switches. As the demand for bandwidth increases, 10-Gbps Ethernet will be deployed throughout the entire network, including server farm, backbone, and campuswide connectivity. This technology enables Internet service providers (ISPs) and network service providers (NSPs) to create very high-speed links at low cost between co-located, carrier-class switches and routers. The technology also allows the construction of metropolitan area networks (MANs) and WANs that connect geographically dispersed LANs between campuses or points of presence (PoPs). Thus, Ethernet begins to compete with ATM and other wide area transmission and networking technologies. In most cases where the customer requirement is data and TCP/IP transport, 10-Gbps Ethernet provides substantial value over ATM transport for both network end users and service providers:
• No expensive, bandwidth-consuming conversion between Ethernet packets and ATM cells is required; the network is Ethernet, end to end.
• The combination of IP and Ethernet offers quality of service and traffic policing capabilities that approach those provided by ATM, so that advanced traffic engineering technologies are available to users and providers.
• A wide variety of standard optical interfaces (wavelengths and link distances) has been specified for 10-Gbps Ethernet, optimizing its operation and cost for LAN, MAN, or WAN applications.

Figure 16.6 illustrates potential uses of 10-Gbps Ethernet. Higher-capacity backbone pipes will help relieve congestion for workgroup switches, where Gigabit Ethernet uplinks can easily become overloaded, and for server farms, where 1-Gbps network interface cards are already in widespread use. The goals for maximum link distance cover a range of applications: from 300 m to 40 km. The links operate in full-duplex mode only, using a variety of optical fiber physical media.

Four physical layer options are defined for 10-Gbps Ethernet (Figure 16.7). The first three of these have two suboptions: an "R" suboption and a "W" suboption. The R designation refers to a family of physical layer implementations that use a signal encoding technique known as 64B/66B. The R implementations are designed for use over dark fiber, meaning a fiber optic cable that is not in use and that is not connected to any other equipment. The W designation refers to a family of physical layer implementations that also use 64B/66B signaling but that are then encapsulated to connect to SONET equipment. The four physical layer options are
• 10GBASE-S (short): Designed for 850-nm transmission on multimode fiber. This medium can achieve distances up to 300 m. There are 10GBASE-SR and 10GBASE-SW versions.
• 10GBASE-L (long): Designed for 1310-nm transmission on single-mode fiber. This medium can achieve distances up to 10 km. There are 10GBASE-LR and 10GBASE-LW versions.
Figure 16.6 Example 10 Gigabit Ethernet Configuration (backbone switches linked by 10-Gbps pipes, with workgroup switches and a server farm attached at 1 Gbps and 10/100 Mbps)

Figure 16.7 10-Gbps Ethernet Distance Options (log scale)

• 10GBASE-E (extended): Designed for 1550-nm transmission on single-mode fiber. This medium can achieve distances up to 40 km. There are 10GBASE-ER and 10GBASE-EW versions.
• 10GBASE-LX4: Designed for 1310-nm transmission on single-mode or multimode fiber. This medium can achieve distances up to 10 km. This medium uses wavelength-division multiplexing (WDM) to multiplex the bit stream across four light waves.

The success of Fast Ethernet, Gigabit Ethernet, and 10-Gbps Ethernet highlights the importance of network management concerns in choosing a network technology. Both ATM and Fibre Channel, explored later, may be technically superior choices for a high-speed backbone, because of their flexibility and scalability. However, the Ethernet alternatives offer compatibility with existing installed LANs, network management software, and applications. This compatibility has accounted for the survival of a nearly 30-year-old technology (CSMA/CD) in today's fast-evolving network environment.

16.3 FIBRE CHANNEL
As the speed and memory capacity of personal computers, workstations, and servers have grown, and as applications have become ever more complex with greater reliance on graphics and video, the requirement for greater speed in delivering data to the processor has grown. This requirement affects two methods of data communications with the processor: I/O channel and network communications.
An I/O channel is a direct point-to-point or multipoint communications link, predominantly hardware based and designed for high speed over very short distances. The I/O channel transfers data between a buffer at the source device and a buffer at the destination device, moving only the user contents from one device to another, without regard to the format or meaning of the data. The logic associated with the channel typically provides the minimum control necessary to manage the transfer plus hardware error detection. I/O channels typically manage transfers between processors and peripheral devices, such as disks, graphics equipment, CD-ROMs, and video I/O devices.

A network is a collection of interconnected access points with a software protocol structure that enables communication. The network typically allows many different types of data transfer, using software to implement the networking protocols and to provide flow control, error detection, and error recovery. As we have discussed in this book, networks typically manage transfers between end systems over local, metropolitan, or wide area distances.

Fibre Channel is designed to combine the best features of both technologies: the simplicity and speed of channel communications with the flexibility and interconnectivity that characterize protocol-based network communications. This fusion of approaches allows system designers to combine traditional peripheral connection, host-to-host internetworking, loosely coupled processor clustering, and multimedia applications in a single multiprotocol interface.
The types of channel-oriented facilities incorporated into the Fibre Channel protocol architecture include
• Data-type qualifiers for routing frame payload into particular interface buffers
• Link-level constructs associated with individual I/O operations
• Protocol interface specifications to allow support of existing I/O channel architectures, such as the Small Computer System Interface (SCSI)

The types of network-oriented facilities incorporated into the Fibre Channel protocol architecture include
• Full multiplexing of traffic between multiple destinations
• Peer-to-peer connectivity between any pair of ports on a Fibre Channel network
• Capabilities for internetworking to other connection technologies

Depending on the needs of the application, either channel or networking approaches can be used for any data transfer. The Fibre Channel Industry Association, which is the industry consortium promoting Fibre Channel, lists the following ambitious requirements that Fibre Channel is intended to satisfy [FCIA01]:
• Full-duplex links with two fibers per link
• Performance from 100 Mbps to 800 Mbps on a single line (full-duplex 200 Mbps to 1600 Mbps per link)
• Support for distances up to 10 km
• Small connectors
• High-capacity utilization with distance insensitivity
• Greater connectivity than existing multidrop channels
• Broad availability (i.e., standard components)
• Support for multiple cost/performance levels, from small systems to supercomputers
• Ability to carry multiple existing interface command sets for existing channel and network protocols

The solution was to develop a simple generic transport mechanism based on point-to-point links and a switching network. This underlying infrastructure supports a simple encoding and framing scheme that in turn supports a variety of channel and network protocols.
Fibre Channel Elements
The key elements of a Fibre Channel network are the end systems, called nodes, and the network itself, which consists of one or more switching elements. The collection of switching elements is referred to as a fabric. These elements are interconnected by point-to-point links between ports on the individual nodes and switches. Communication consists of the transmission of frames across the point-to-point links. Each node includes one or more ports, called N_ports, for interconnection. Similarly, each fabric-switching element includes multiple ports, called F_ports. Interconnection is by means of bidirectional links between ports. Any node can communicate with any other node connected to the same fabric using the services of the fabric. All routing of frames between N_ports is done by the fabric. Frames may be buffered within the fabric, making it possible for different nodes to connect to the fabric at different data rates. A fabric can be implemented as a single fabric element with attached nodes (a simple star arrangement) or as a more general network of fabric elements, as shown in Figure 16.8. In either case, the fabric is responsible for buffering and for routing frames between source and destination nodes.

Figure 16.8 Fibre Channel Network (nodes attached to a Fibre Channel switching fabric)

The Fibre Channel network is quite different from the IEEE 802 LANs. Fibre Channel is more like a traditional circuit-switching or packet-switching network, in contrast to the typical shared-medium LAN. Thus, Fibre Channel need not be concerned with medium access control issues. Because it is based on a switching network, Fibre Channel scales easily in terms of N_ports, data rate, and distance covered. This approach provides great flexibility. Fibre Channel can readily accommodate new transmission media and data rates by adding new switches and F_ports to an existing fabric.
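The fabric's routing responsibility described above can be sketched with a toy model (the `Switch` and `Frame` classes are illustrative inventions, not Fibre Channel constructs, and a real fabric is a mesh rather than a simple chain of neighbors):

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    dst_port: str          # destination N_port address
    payload: bytes

@dataclass
class Switch:
    ports: set = field(default_factory=set)   # locally attached N_ports
    neighbor: "Switch | None" = None          # next switch in the fabric

    def route(self, frame: Frame) -> str:
        """Deliver locally if the destination N_port is attached here;
        otherwise hand the frame to an adjacent switch. The nodes
        themselves do no routing, mirroring the text above."""
        if frame.dst_port in self.ports:
            return f"delivered to {frame.dst_port}"
        if self.neighbor is not None:
            return self.neighbor.route(frame)
        raise LookupError("no route to " + frame.dst_port)
```

The point of the sketch is the division of labor: an end node only manages its single point-to-point link, while all forwarding decisions live inside the fabric.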
Thus, an existing investment is not lost with an upgrade to new technologies and equipment. Further, the layered protocol architecture accommodates existing I/O interface and networking protocols, preserving the preexisting investment.

Fibre Channel Protocol Architecture
The Fibre Channel standard is organized into five levels. Each level defines a function or set of related functions. The standard does not dictate a correspondence between levels and actual implementations, with a specific interface between adjacent levels. Rather, the standard refers to the level as a "document artifice" used to group related functions. The layers are as follows:
• FC-0 Physical Media: Includes optical fiber for long-distance applications, coaxial cable for high speeds over short distances, and shielded twisted pair for lower speeds over short distances
• FC-1 Transmission Protocol: Defines the signal encoding scheme
• FC-2 Framing Protocol: Deals with defining topologies, frame format, flow and error control, and grouping of frames into logical entities called sequences and exchanges
• FC-3 Common Services: Includes multicasting
• FC-4 Mapping: Defines the mapping of various channel and network protocols to Fibre Channel, including IEEE 802, ATM, IP, and the Small Computer System Interface (SCSI)

Fibre Channel Physical Media and Topologies
One of the major strengths of the Fibre Channel standard is that it provides a range of options for the physical medium, the data rate on that medium, and the topology of the network (Table 16.4).

Transmission Media
The transmission media options that are available under Fibre Channel include shielded twisted pair, video coaxial cable, and optical fiber. Standardized data rates range from 100 Mbps to 3.2 Gbps. Point-to-point link distances range from 33 m to 10 km.
Table 16.4 Maximum Distance for Fibre Channel Media Types

                          800 Mbps   400 Mbps   200 Mbps   100 Mbps
Single-mode fiber         10 km      10 km      10 km      —
50-µm multimode fiber     0.5 km     1 km       2 km       —
62.5-µm multimode fiber   175 m      1 km       1 km       —
Video coaxial cable       50 m       71 m       100 m      100 m
Miniature coaxial cable   14 m       19 m       28 m       42 m
Shielded twisted pair     28 m       46 m       57 m       80 m

Topologies
The most general topology supported by Fibre Channel is referred to as a fabric or switched topology. This is an arbitrary topology that includes at least one switch to interconnect a number of end systems. The fabric topology may also consist of a number of switches forming a switched network, with some or all of these switches also supporting end nodes.

Routing in the fabric topology is transparent to the nodes. Each port in the configuration has a unique address. When data from a node are transmitted into the fabric, the edge switch to which the node is attached uses the destination port address in the incoming data frame to determine the destination port location. The switch then either delivers the frame to another node attached to the same switch or transfers the frame to an adjacent switch to begin routing the frame to a remote destination.

The fabric topology provides scalability of capacity: as additional ports are added, the aggregate capacity of the network increases, thus minimizing congestion and contention and increasing throughput. The fabric is protocol independent and largely distance insensitive. The technology of the switch itself and of the transmission links connecting the switch to nodes may be changed without affecting the overall configuration. Another advantage of the fabric topology is that the burden on nodes is minimized. An individual Fibre Channel node (end system) is only responsible for managing a simple point-to-point connection between itself and the fabric; the fabric is responsible for routing between ports and error detection.
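For configuration checks, Table 16.4 can be captured directly as a lookup structure (an illustrative sketch; the names are ours, and the distances follow the table above):

```python
# Table 16.4 as a lookup structure: medium -> {rate in Mbps: max
# distance in meters}, with None where the table shows no entry.
FC_MAX_DISTANCE_M = {
    "single-mode fiber":       {800: 10_000, 400: 10_000, 200: 10_000, 100: None},
    "50-um multimode fiber":   {800: 500,    400: 1_000,  200: 2_000,  100: None},
    "62.5-um multimode fiber": {800: 175,    400: 1_000,  200: 1_000,  100: None},
    "video coaxial cable":     {800: 50,     400: 71,     200: 100,    100: 100},
    "miniature coaxial cable": {800: 14,     400: 19,     200: 28,     100: 42},
    "shielded twisted pair":   {800: 28,     400: 46,     200: 57,     100: 80},
}

def media_for(rate_mbps: int, span_m: float):
    """Media whose maximum distance at `rate_mbps` covers `span_m`."""
    return sorted(m for m, d in FC_MAX_DISTANCE_M.items()
                  if d[rate_mbps] is not None and d[rate_mbps] >= span_m)
```

For example, an 800-Mbps link spanning 400 m can run only on fiber, whereas short intra-rack hops admit the copper options as well.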
In addition to the fabric topology, the Fibre Channel standard defines two other topologies. With the point-to-point topology there are only two ports, and these are directly connected, with no intervening fabric switches. In this case there is no routing. The arbitrated loop topology is a simple, low-cost topology for connecting up to 126 nodes in a loop. The arbitrated loop operates in a manner roughly equivalent to the token ring protocols that we have seen. Topologies, transmission media, and data rates may be combined to provide an optimized configuration for a given site. Figure 16.9 is an example that illustrates the principal applications of Fibre Channel.

Figure 16.9 Five Applications of Fibre Channel: (1) linking high-performance workstation clusters; (2) connecting mainframes to each other; (3) giving server farms high-speed pipes; (4) clustering disk farms; (5) linking LANs and WANs (shown via an ATM WAN) to the backbone.

Prospects for Fibre Channel
Fibre Channel is backed by an industry interest group known as the Fibre Channel Industry Association, and a variety of interface cards for different applications are available. Fibre Channel has been most widely accepted as an improved peripheral device interconnect, providing services that can eventually replace such schemes as SCSI. It is a technically attractive solution to general high-speed LAN requirements but must compete with Ethernet and ATM LANs. Cost and performance issues should dominate the manager's consideration of these competing technologies.

16.4 RECOMMENDED READING AND WEB SITES
[STAL00] covers in greater detail the LAN systems discussed in this chapter. [SPUR00] provides a concise but thorough overview of all of the 10-Mbps through 1-Gbps 802.3 systems, including configuration guidelines for a single segment of each media type, as well as guidelines for building multisegment Ethernets using a variety of media types.
Two excellent treatments of both 100-Mbps and Gigabit Ethernet are [SEIF98] and [KADA98]. A good survey article on Gigabit Ethernet is [FRAZ99]. [SACH96] is a good survey of Fibre Channel. A short but worthwhile treatment is [FCIA01].

FCIA01 Fibre Channel Industry Association. Fibre Channel Storage Area Networks. San Francisco: Fibre Channel Industry Association, 2001.
FRAZ99 Frazier, H., and Johnson, H. "Gigabit Ethernet: From 100 to 1,000 Mbps." IEEE Internet Computing, January/February 1999.
KADA98 Kadambi, J.; Crayford, I.; and Kalkunte, M. Gigabit Ethernet. Upper Saddle River, NJ: Prentice Hall, 1998.
SACH96 Sachs, M., and Varma, A. "Fibre Channel and Related Standards." IEEE Communications Magazine, August 1996.
SEIF98 Seifert, R. Gigabit Ethernet. Reading, MA: Addison-Wesley, 1998.
SPUR00 Spurgeon, C. Ethernet: The Definitive Guide. Cambridge, MA: O'Reilly and Associates, 2000.
STAL00 Stallings, W. Local and Metropolitan Area Networks, Sixth Edition. Upper Saddle River, NJ: Prentice Hall, 2000.
Recommended Web sites:
• Interoperability Lab: University of New Hampshire site for equipment testing for high-speed LANs
• Charles Spurgeon's Ethernet Web Site: Provides extensive information about Ethernet, including links and documents
• IEEE 802.3 10-Gbps Ethernet Task Force: Latest documents
• Fibre Channel Industry Association: Includes tutorials, white papers, links to vendors, and descriptions of Fibre Channel applications
• CERN Fibre Channel Site: Includes tutorials, white papers, links to vendors, and descriptions of Fibre Channel applications
• Storage Network Industry Association: An industry forum of developers, integrators, and IT professionals who evolve and promote storage networking technology and solutions

16.5 KEY TERMS, REVIEW QUESTIONS, AND PROBLEMS

Key Terms
1-persistent CSMA; ALOHA; binary exponential backoff; carrier sense multiple access (CSMA); carrier sense multiple access with collision detection (CSMA/CD); collision; Ethernet; Fibre Channel; full-duplex operation; nonpersistent CSMA; p-persistent CSMA; repeater; scrambling; slotted ALOHA

Review Questions
16.1 What is a server farm?
16.2 Explain the three persistence protocols that can be used with CSMA.
16.3 What is CSMA/CD?
16.4 Explain binary exponential backoff.
16.5 What are the transmission medium options for Fast Ethernet?
16.6 How does Fast Ethernet differ from 10BASE-T, other than the data rate?
16.7 In the context of Ethernet, what is full-duplex operation?
16.8 List the levels of Fibre Channel and the functions of each level.
16.9 What are the topology options for Fibre Channel?

Problems
16.1 A disadvantage of the contention approach for LANs, such as CSMA/CD, is the capacity wasted due to multiple stations attempting to access the channel at the same time. Suppose that time is divided into discrete slots, with each of N stations attempting to transmit with probability p during each slot.
What fraction of slots are wasted due to multiple simultaneous transmission attempts?
16.2 For p-persistent CSMA, consider the following situation. A station is ready to transmit and is listening to the current transmission. No other station is ready to transmit, and there will be no other transmission for an indefinite period. If the time unit used in the protocol is T, show that the average number of iterations of step 1 of the protocol is 1/p and that therefore the expected time that the station will have to wait after the current transmission is T(1/p - 1). Hint: Use the equality Σ (i = 1 to ∞) iX^(i-1) = 1/(1 - X)^2.
16.3 The binary exponential backoff algorithm is defined by IEEE 802 as follows: The delay is an integral multiple of slot time. The number of slot times to delay before the nth retransmission attempt is chosen as a uniformly distributed random integer r in the range 0 ≤ r < 2^K, where K = min(n, 10). Slot time is, roughly, twice the round-trip propagation delay. Assume that two stations always have a frame to send. After a collision, what is the mean number of retransmission attempts before one station successfully retransmits? What is the answer if three stations always have frames to send?
16.4 Describe the signal pattern produced on the medium by the Manchester-encoded preamble of the IEEE 802.3 MAC frame.
16.5 Analyze the advantages of having the FCS field of IEEE 802.3 frames in the trailer of the frame rather than in the header of the frame.
16.6 The most widely used MAC approach for a ring topology is token ring, defined in IEEE 802.5. The token ring technique is based on the use of a small frame, called a token, that circulates when all stations are idle. A station wishing to transmit must wait until it detects a token passing by. It then seizes the token by changing one bit in the token, which transforms it from a token to a start-of-frame sequence for a data frame.
The station then appends and transmits the remainder of the fields needed to construct a data frame. When a station seizes a token and begins to transmit a data frame, there is no token on the ring, so other stations wishing to transmit must wait. The frame on the ring will make a round trip and be absorbed by the transmitting station. The transmitting station will insert a new token on the ring when both of the following conditions have been met: (1) The station has completed transmission of its frame. (2) The leading edge of the transmitted frame has returned (after a complete circulation of the ring) to the station.
a. An option in IEEE 802.5, known as early token release, eliminates the second condition just listed. Under what conditions will early token release result in improved utilization?
b. Are there any potential disadvantages to early token release? Explain.
16.7 For a token ring LAN, suppose that the destination station removes the data frame and immediately sends a short acknowledgment frame to the sender rather than letting the original frame return to the sender. How will this affect performance?
16.8 Another medium access control technique for rings is the slotted ring. A number of fixed-length slots circulate continuously on the ring. Each slot contains a leading bit to designate the slot as empty or full. A station wishing to transmit waits until an empty slot arrives, marks the slot full, and inserts a frame of data as the slot goes by. The full slot makes a complete round trip, to be marked empty again by the station that marked it full. In what sense are the slotted ring and token ring protocols the complement (dual) of each other?
16.9 Consider a slotted ring of length 10 km with a data rate of 10 Mbps and 500 repeaters, each of which introduces a 1-bit delay. Each slot contains room for one source address byte, one destination address byte, two data bytes, and five control bits, for a total length of 37 bits. How many slots are on the ring?
16.10 With 8B6T coding, the effective data rate on a single channel is 33 Mbps with a signaling rate of 25 Mbaud. If a pure ternary scheme were used, what is the effective data rate for a signaling rate of 25 Mbaud?
16.11 With 8B6T coding, the DC algorithm sometimes negates all of the ternary symbols in a code group. How does the receiver recognize this condition? How does the receiver discriminate between a negated code group and one that has not been negated? For example, the code group for data byte 00 is +-00+- and the code group for data byte 38 is the negation of that, namely, -+00-+.
16.12 Draw the MLT-3 decoder state diagram that corresponds to the encoder state diagram of Figure 16.10.
16.13 For the bit stream 0101110, sketch the waveforms for NRZ-L, NRZI, Manchester, Differential Manchester, and MLT-3.
16.14 Consider a token ring system with N stations in which a station that has just transmitted a frame releases a new token only after the station has completed transmission of its frame and the leading edge of the transmitted frame has returned (after a complete circulation of the ring) to the station.
a. Show that utilization can be approximated by 1/(1 + a/N) for a < 1 and by 1/(a + a/N) for a > 1.
b. What is the asymptotic value of utilization as N increases?
16.15 a. Verify that the division illustrated in Figure 16.18a corresponds to the implementation of Figure 16.17a by calculating the result step by step using Equation (16.7).
b. Verify that the multiplication illustrated in Figure 16.18b corresponds to the implementation of Figure 16.17b by calculating the result step by step using Equation (16.8).
16.16 Draw a figure similar to Figure 16.17 for the MLT-3 scrambler and descrambler.
APPENDIX 16A DIGITAL SIGNAL ENCODING FOR LANS

In Chapter 5, we looked at some of the common techniques for encoding digital data for transmission, including Manchester and differential Manchester, which are used in some of the LAN standards. In this appendix, we examine some additional encoding schemes referred to in this chapter.

4B/5B-NRZI

This scheme, which is actually a combination of two encoding algorithms, is used for 100BASE-X. To understand the significance of this choice, first consider the simple alternative of an NRZ (nonreturn to zero) coding scheme. With NRZ, one signal state represents binary one and another signal state represents binary zero. The disadvantage of this approach is its lack of synchronization. Because transitions on the medium are unpredictable, there is no way for the receiver to synchronize its clock to the transmitter. A solution to this problem is to encode the binary data to guarantee the presence of transitions. For example, the data could first be encoded using Manchester encoding. The disadvantage of this approach is that the efficiency is only 50%. That is, because there can be as many as two transitions per bit time, a signaling rate of 200 million signal elements per second (200 Mbaud) is needed to achieve a data rate of 100 Mbps. This represents an unnecessary cost and technical burden. Greater efficiency can be achieved using the 4B/5B code. In this scheme, encoding is done 4 bits at a time; each 4 bits of data are encoded into a symbol with five code bits, such that each code bit contains a single signal element; the block of five code bits is called a code group. In effect, each set of 4 bits is encoded as 5 bits. The efficiency is thus raised to 80%: 100 Mbps is achieved with 125 Mbaud. To ensure synchronization, there is a second stage of encoding: Each code bit of the 4B/5B stream is treated as a binary value and encoded using nonreturn to zero inverted (NRZI) (see Figure 5.2).
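Taken together, the two stages can be sketched in a few lines of Python. This is an illustrative sketch, not production code: the dictionary holds the sixteen data code groups of Table 16.5, and the function names are ours.

```python
# 4B/5B substitution followed by NRZI, the two encoding stages of
# 100BASE-X. The table holds the 16 data code groups of Table 16.5.

FOUR_B_FIVE_B = {
    "0000": "11110", "0001": "01001", "0010": "10100", "0011": "10101",
    "0100": "01010", "0101": "01011", "0110": "01110", "0111": "01111",
    "1000": "10010", "1001": "10011", "1010": "10110", "1011": "10111",
    "1100": "11010", "1101": "11011", "1110": "11100", "1111": "11101",
}

def encode_4b5b(bits):
    """Map each 4-bit nibble to its 5-bit code group (80% efficiency)."""
    return "".join(FOUR_B_FIVE_B[bits[i:i + 4]] for i in range(0, len(bits), 4))

def nrzi(code_bits, level=0):
    """NRZI: a 1 toggles the signal level, a 0 leaves it unchanged."""
    out = []
    for b in code_bits:
        if b == "1":
            level ^= 1
        out.append(level)
    return out

# Even an all-zeros byte yields transitions, so the receiver can
# recover timing:
print(encode_4b5b("00000000"))        # 1111011110
print(nrzi(encode_4b5b("00000000")))  # [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
```

Note that the code groups were chosen so that the guarantee of no more than three consecutive zeros holds even across code group boundaries.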
In this code, a binary 1 is represented with a transition at the beginning of the bit interval and a binary 0 is represented with no transition at the beginning of the bit interval; there are no other transitions. The advantage of NRZI is that it employs differential encoding. Recall from Chapter 5 that in differential encoding, the signal is decoded by comparing the polarity of adjacent signal elements rather than the absolute value of a signal element. A benefit of this scheme is that it is generally more reliable to detect a transition in the presence of noise and distortion than to compare a value to a threshold.

Now we are in a position to describe the 4B/5B code and to understand the selections that were made. Table 16.5 shows the symbol encoding; each 5-bit code group pattern is shown together with its interpretation. Because we are encoding 4 bits with a 5-bit pattern, only 16 of the 32 possible patterns are needed for data encoding.

Table 16.5 4B/5B Code Groups

Data Input (4 bits)   Code Group (5 bits)   Interpretation
0000                  11110                 Data 0
0001                  01001                 Data 1
0010                  10100                 Data 2
0011                  10101                 Data 3
0100                  01010                 Data 4
0101                  01011                 Data 5
0110                  01110                 Data 6
0111                  01111                 Data 7
1000                  10010                 Data 8
1001                  10011                 Data 9
1010                  10110                 Data A
1011                  10111                 Data B
1100                  11010                 Data C
1101                  11011                 Data D
1110                  11100                 Data E
1111                  11101                 Data F
                      11111                 Idle
                      11000                 Start of stream delimiter, part 1
                      10001                 Start of stream delimiter, part 2
                      01101                 End of stream delimiter, part 1
                      00111                 End of stream delimiter, part 2
                      00100                 Transmit error
                      Other                 Invalid codes

The codes selected to represent the 16 4-bit data blocks are such that a transition is present at least twice for each code group. No more than three zeros in a row are allowed across one or more code groups. The encoding scheme can be summarized as follows: 1.
A simple NRZ encoding is rejected because it does not provide synchronization; a string of 1s or 0s will have no transitions.
2. The data to be transmitted must first be encoded to assure transitions. The 4B/5B code is chosen over Manchester because it is more efficient.
3. The 4B/5B code is further encoded using NRZI so that the resulting differential signal will improve reception reliability.
4. The specific 5-bit patterns for the encoding of the 16 4-bit data patterns are chosen to guarantee no more than three zeros in a row, to provide for adequate synchronization.

Those code groups not used to represent data are either declared invalid or assigned special meaning as control symbols. These assignments are listed in Table 16.5. The nondata symbols fall into the following categories:
• Idle: The idle code group is transmitted between data transmission sequences. It consists of a constant flow of binary ones, which in NRZI comes out as a continuous alternation between the two signal levels. This continuous fill pattern establishes and maintains synchronization and is used in the CSMA/CD protocol to indicate that the shared medium is idle.
• Start of stream delimiter: Used to delineate the starting boundary of a data transmission sequence; consists of two different code groups.
• End of stream delimiter: Used to terminate normal data transmission sequences; consists of two different code groups.
• Transmit error: This code group is interpreted as a signaling error. The normal use of this indicator is for repeaters to propagate received errors.

MLT-3

Although 4B/5B-NRZI is effective over optical fiber, it is not suitable as is for use over twisted pair. The reason is that the signal energy is concentrated in such a way as to produce undesirable radiated emissions from the wire. MLT-3, which is used on 100BASE-TX, is designed to overcome this problem. The following steps are involved:
1. NRZI to NRZ conversion.
The 4B/5B NRZI signal of the basic 100BASE-X is converted back to NRZ.
2. Scrambling. The bit stream is scrambled to produce a more uniform spectrum distribution for the next stage.
3. Encoder. The scrambled bit stream is encoded using a scheme known as MLT-3.
4. Driver. The resulting encoding is transmitted.

The effect of the MLT-3 scheme is to concentrate most of the energy in the transmitted signal below 30 MHz, which reduces radiated emissions. This in turn reduces problems due to interference. The MLT-3 encoding produces an output that has a transition for every binary one and that uses three levels: a positive voltage (+V), a negative voltage (-V), and no voltage (0). The encoding rules are best explained with reference to the encoder state diagram shown in Figure 16.10:

[Figure 16.10 MLT-3 Encoder State Diagram]

1. If the next input bit is zero, then the next output value is the same as the preceding value.
2. If the next input bit is one, then the next output value involves a transition:
(a) If the preceding output value was either +V or -V, then the next output value is 0.
(b) If the preceding output value was 0, then the next output value is nonzero, and that output is of the opposite sign to the last nonzero output.

Figure 16.11 provides an example. Every time there is an input of 1, there is a transition. The occurrences of +V and -V alternate.

8B6T

The 8B6T encoding algorithm uses ternary signaling. With ternary signaling, each signal element can take on one of three values (positive voltage, negative voltage, zero voltage). A pure ternary code is one in which the full information-carrying capacity of the ternary signal is exploited. However, pure ternary is not attractive for the same reasons that a pure binary (NRZ) code is rejected: the lack of synchronization.
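Returning to MLT-3 for a moment, the rules above amount to a small three-level state machine. The following Python sketch is ours (illustrative names, not from the standard); it represents +V, 0, and -V as 1, 0, and -1.

```python
def mlt3(bits):
    """MLT-3: hold the level on a 0; on a 1, step through the
    cycle ... 0, +V, 0, -V, 0 ... (rules 1 and 2 above)."""
    level = 0          # current output level
    last_nonzero = -1  # so the first nonzero output comes out as +1
    out = []
    for b in bits:
        if b == "1":
            if level != 0:
                level = 0              # rule 2(a): leave +V or -V for 0
            else:
                level = -last_nonzero  # rule 2(b): opposite of last nonzero
                last_nonzero = level
        out.append(level)              # rule 1: a 0 changes nothing
    return out

# Each 1 produces a transition, and +V and -V alternate:
print(mlt3("110111"))  # [1, 0, 0, -1, 0, 1]
```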
However, there are schemes, referred to as block-coding methods, that approach the efficiency of ternary and overcome this disadvantage. A new block-coding scheme known as 8B6T is used for 100BASE-T4. With 8B6T, the data to be transmitted are handled in 8-bit blocks. Each block of 8 bits is mapped into a code group of 6 ternary symbols. The stream of code groups is then transmitted in round-robin fashion across the three output channels (Figure 16.12). Thus the ternary transmission rate on each output channel is

(6/8) × 33 1/3 = 25 Mbaud

[Figure 16.11 Example of MLT-3 Encoding]
[Figure 16.12 8B6T Transmission Scheme]

Table 16.6 shows a portion of the 8B6T code table; the full table maps all possible 8-bit patterns into a unique code group of 6 ternary symbols. The mapping was chosen with two requirements in mind: synchronization and DC balance. For synchronization, the codes were chosen so as to maximize the average number of transitions per code group. The second requirement is to maintain DC balance, so that the average voltage on the line is zero. For this purpose, all of the selected code groups either have an equal number of positive and negative symbols or an excess of one positive symbol. To maintain balance, a DC balancing algorithm is used. In essence, this algorithm monitors the cumulative weight of all code groups transmitted on a single pair. Each code group has a weight of 0 or 1.
To maintain balance, the algorithm may negate a transmitted code group (change all + symbols to - symbols and all - symbols to + symbols), so that the cumulative weight at the conclusion of each code group is always either 0 or 1.

Table 16.6 Portion of 8B6T Code Table

Data Octet  6T Code Group    Data Octet  6T Code Group    Data Octet  6T Code Group    Data Octet  6T Code Group
00          +-00+-           10          +0+--0           20          00-++-           30          +-00-+
01          0+-+-0           11          ++0-0-           21          --+00+           31          0+--+0
02          +-0+-0           12          +0+-0-           22          ++-0+-           32          +-0-+0
03          -0++-0           13          0++-0-           23          ++-0-+           33          -0+-+0
04          -0+0+-           14          0++--0           24          00+0-+           34          -0+0-+
05          0+--0+           15          ++00--           25          00+0+-           35          0+-+0-
06          +-0-0+           16          +0+0--           26          00-00+           36          +-0+0-
07          -0+-0+           17          0++0--           27          --+++-           37          -0++0-
08          -+00+-           18          0+-0+-           28          -0-++0           38          -+00-+
09          0-++-0           19          0+-0-+           29          --0+0+           39          0-+-+0
0A          -+0+-0           1A          0+-++-           2A          -0-+0+           3A          -+0-+0
0B          +0-+-0           1B          0+-00+           2B          0--+0+           3B          +0--+0
0C          +0-0+-           1C          0-+00+           2C          0--++0           3C          +0-0-+
0D          0-+-0+           1D          0-+++-           2D          --00++           3D          0-++0-
0E          -+0-0+           1E          0-+0-+           2E          -0-0++           3E          -+0+0-
0F          +0--0+           1F          0-+0+-           2F          0--0++           3F          +0-+0-

8B/10B

The encoding scheme used for Fibre Channel and Gigabit Ethernet is 8B/10B, in which each 8 bits of data is converted into 10 bits for transmission. This scheme has a similar philosophy to the 4B/5B scheme discussed earlier. The 8B/10B scheme, developed and patented by IBM for use in its 200-megabaud ESCON interconnect system [WIDM83], is more powerful than 4B/5B in terms of transmission characteristics and error detection capability. The developers of this code list the following advantages:
• It can be implemented with relatively simple and reliable transceivers at low cost.
• It is well balanced, with minimal deviation from the occurrence of an equal number of 1 and 0 bits across any sequence.
• It provides good transition density for easier clock recovery.
• It provides useful error detection capability.

The 8B/10B code is an example of the more general mBnB code, in which m binary source bits are mapped into n binary bits for transmission. Redundancy is built into the code to provide the desired transmission features by making n > m. The 8B/10B code actually combines two other codes, a 5B/6B code and a 3B/4B code. The use of these two codes is simply an artifact that simplifies the definition of the mapping and the implementation; the mapping could have been defined directly as an 8B/10B code. In any case, a mapping is defined that maps each of the possible 8-bit source blocks into a 10-bit code block. There is also a function called disparity control. In essence, this function keeps track of the excess of zeros over ones or ones over zeros. An excess in either direction is referred to as a disparity. If there is a disparity, and if the current code block would add to that disparity, then the disparity control block complements the 10-bit code block. This has the effect of either eliminating the disparity or at least moving it in the opposite direction of the current disparity.

64B/66B

The 8B/10B code results in an overhead of 25%. To achieve greater efficiency at a higher data rate, the 64B/66B code maps a block of 64 bits into an output block of 66 bits, for an overhead of just 3%. This code is used in 10-Gbps Ethernet. Figure 16.13 illustrates the process.

[Figure 16.13 Encoding Using 64B/66B: (a) data octets only: two synchronization bits 01 followed by a 64-bit scrambled data field; (b) mixed data/control block: synchronization bits 10, an 8-bit type field, and a combined 56-bit scrambled data/control field]

The entire Ethernet frame, including control fields, is considered "data" for this process. In addition, there are nondata symbols, called "control," which include those defined for the 4B/5B code discussed previously plus a few other symbols. For a 64-bit block consisting only of data octets, the entire block is scrambled.
Two synchronization bits, with values 01, are prepended to the scrambled block. For a block consisting of a mixture of control and data octets, a 56-bit block is used, which is scrambled; a 66-bit block is formed by prepending two synchronization bits, with values 10, and an 8-bit control type field, which defines the control functions included with this block. In both cases, scrambling is performed using the polynomial 1 + X^39 + X^58. See Appendix 16C for a discussion of scrambling. The two-bit synchronization field provides block alignment and a means of synchronizing when long streams of bits are sent. Note that in this case, no specific coding technique is used to achieve the desired synchronization and frequency of transitions. Rather, the scrambling algorithm provides the required characteristics.

APPENDIX 16B PERFORMANCE ISSUES

The choice of a LAN or MAN architecture is based on many factors, but one of the most important is performance. Of particular concern is the behavior (throughput, response time) of the network under heavy load. In this appendix, we provide an introduction to this topic. A more detailed discussion can be found in [STAL00].

The Effect of Propagation Delay and Transmission Rate

In Chapter 7, we introduced the parameter a, defined as

a = (Propagation time)/(Transmission time)

In that context, we were concerned with a point-to-point link, with a given propagation time between the two endpoints and a transmission time for either a fixed or average frame size. It was shown that a could be expressed as

a = (Length of data link in bits)/(Length of frame in bits)

This parameter is also important in the context of LANs and MANs, and in fact determines an upper bound on utilization. Consider a perfectly efficient access mechanism that allows only one transmission at a time. As soon as one transmission is over, another station begins transmitting. Furthermore, the transmission is pure data; no overhead bits.
What is the maximum possible utilization of the network? It can be expressed as the ratio of total throughput of the network to its data rate:

U = Throughput/Data rate    (16.1)

Now define, as in Chapter 7:
R = data rate of the channel
d = maximum distance between any two stations
V = velocity of signal propagation
L = average or fixed frame length

The throughput is just the number of bits transmitted per unit time. A frame contains L bits, and the amount of time devoted to that frame is the actual transmission time (L/R) plus the propagation delay (d/V). Thus

Throughput = L/(d/V + L/R)    (16.2)

But by our preceding definition of a,

a = (d/V)/(L/R) = Rd/LV    (16.3)

Substituting (16.2) and (16.3) into (16.1),

U = 1/(1 + a)    (16.4)

Note that this differs from Equation (7.4) in Appendix 7A. This is because the latter assumed a half-duplex protocol (no piggybacked acknowledgments).

So utilization varies with a. This can be grasped intuitively by studying Figure 16.14, which shows a baseband bus with two stations as far apart as possible (worst case) that take turns sending frames. If we normalize time such that frame transmission time = 1, then the propagation time = a. For a < 1, the sequence of events is as follows:
1. A station begins transmitting at t0.
2. Reception begins at t0 + a.
3. Transmission is completed at t0 + 1.
4. Reception ends at t0 + 1 + a.
5. The other station begins transmitting.

[Figure 16.14 The Effect of a on Utilization for Baseband Bus]
[Figure 16.15 The Effect of a on Utilization for Ring]

For a > 1, events 2 and 3 are interchanged.
In both cases, the total time for one "turn" is 1 + a, but the transmission time is only 1, for a utilization of 1/(1 + a). The same effect can be seen to apply to a ring network in Figure 16.15. Here we assume that one station transmits and then waits to receive its own transmission before any other station transmits. The identical sequence of events just outlined applies.

Typical values of a range from about 0.01 to 0.1 for LANs and 0.1 to well over 1.0 for MANs. Table 16.7 gives some representative values for a bus topology. As can be seen, for larger and/or higher-speed networks, utilization suffers. For this reason, the restriction of only one frame at a time is lifted for high-speed LANs.

Table 16.7 Representative Values of a

Data Rate (Mbps)   Frame Size (bits)   Network Length (km)   a      1/(1 + a)
1                  100                 1                     0.05   0.95
1                  1,000               10                    0.05   0.95
1                  100                 10                    0.5    0.67
10                 100                 1                     0.5    0.67
10                 1,000               1                     0.05   0.95
10                 1,000               10                    0.5    0.67
10                 10,000              10                    0.05   0.95
100                35,000              200                   2.8    0.26
100                1,000               50                    25     0.04

Finally, the preceding analysis assumes a "perfect" protocol, for which a new frame can be transmitted as soon as an old frame is received. In practice, the MAC protocol adds overhead that reduces utilization. This is demonstrated in the next subsection.

Simple Performance Model of CSMA/CD

The purpose of this section is to give the reader some insight into the performance of CSMA/CD by developing a simple performance model. It is hoped that this exercise will aid in understanding the results of more rigorous analyses. For this model we assume a local network with N active stations and a maximum normalized propagation delay of a. To simplify the analysis, we assume that each station is always prepared to transmit a frame. This allows us to develop an expression for maximum achievable utilization (U).
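Before continuing, note that the a and 1/(1 + a) columns of Table 16.7 can be reproduced from a = Rd/LV. The propagation velocity is not stated in the table; the sketch below assumes 2 × 10^8 m/s, which yields matching values to rounding.

```python
V = 2e8  # assumed signal propagation velocity, m/s

def utilization(rate_mbps, frame_bits, length_km):
    """Return (a, U) for a bus of the given data rate, frame size,
    and length, using a = R*d/(L*V) and U = 1/(1 + a)."""
    a = (rate_mbps * 1e6) * (length_km * 1e3) / (frame_bits * V)
    return a, 1 / (1 + a)

for R, L, d in [(1, 100, 1), (10, 1000, 10), (100, 35000, 200), (100, 1000, 50)]:
    a, u = utilization(R, L, d)
    print(f"R={R} Mbps, L={L} bits, d={d} km: a={a:.2f}, U={u:.2f}")
```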
Although this should not be construed to be the sole figure of merit for a local network, it is the single most analyzed figure of merit and does permit useful performance comparisons.

Consider time on a bus medium to be organized into slots whose length is twice the end-to-end propagation delay. This is a convenient way to view the activity on the medium; the slot time is the maximum time, from the start of transmission, required to detect a collision. Assume that there are N active stations. Clearly, if each station always has a frame to transmit and does so, there will be nothing but collisions on the line. So we assume that each station restrains itself to transmitting during an available slot with probability P.

Time on the medium consists of two types of intervals. First is a transmission interval, which lasts 1/(2a) slots. Second is a contention interval, which is a sequence of slots with either a collision or no transmission in each slot. The throughput, normalized to system capacity, is the proportion of time spent in transmission intervals.

To determine the average length of a contention interval, we begin by computing A, the probability that exactly one station attempts a transmission in a slot and therefore acquires the medium. This is the binomial probability that any one station attempts to transmit and the others do not:

A = (N choose 1) P(1 - P)^(N-1) = NP(1 - P)^(N-1)

This function takes on a maximum over P when P = 1/N:

A = (1 - 1/N)^(N-1)

We are interested in the maximum because we want to calculate the maximum throughput of the medium. It should be clear that the maximum throughput will be achieved if we maximize the probability of successful seizure of the medium. Therefore, the following rule should be enforced: During periods of heavy usage, a station should restrain its offered load to 1/N. (This assumes that each station knows the value of N.
To derive an expression for maximum possible throughput, we live with this assumption.) On the other hand, during periods of light usage, maximum utilization cannot be achieved because the load is too low; this region is not of interest here.

Now we can estimate the mean length of a contention interval, w, in slots:

E[w] = Σ (i = 1 to ∞) i × Pr[i slots in a row with a collision or no transmission, followed by a slot with one transmission]
     = Σ (i = 1 to ∞) i(1 - A)^i A

The summation converges to

E[w] = (1 - A)/A

We can now determine the maximum utilization, which is the length of a transmission interval as a proportion of a cycle consisting of a transmission and a contention interval:

U = (1/2a) / [(1/2a) + (1 - A)/A] = 1 / [1 + 2a(1 - A)/A]    (16.5)

Figure 16.16 shows normalized throughput as a function of a for two values of N. Throughput declines as a increases. This is to be expected. Figure 16.16 also shows throughput as a function of N. The performance of CSMA/CD decreases because of the increased likelihood of collision or no transmission. It is interesting to note the asymptotic value of U as N increases. We need to know that lim (N → ∞) (1 - 1/N)^(N-1) = 1/e. Then we have

lim (N → ∞) U = 1 / (1 + 3.44a)    (16.6)

APPENDIX 16C SCRAMBLING

For some digital data encoding techniques, a long string of binary zeros or ones in a transmission can degrade system performance. Also, other transmission properties, such as spectral properties, are enhanced if the data are more nearly of a random nature rather than constant or repetitive. A technique commonly used to improve signal quality is scrambling and descrambling. The scrambling process tends to make the data appear more random.
[Figure 16.16 CSMA/CD Throughput as a Function of a and N]

The scrambling process consists of a feedback shift register, and the matching descrambler consists of a feedforward shift register. An example is shown in Figure 16.17. In this example, the scrambled data sequence may be expressed as follows:

Bm = Am ⊕ Bm-3 ⊕ Bm-5    (16.7)

where ⊕ indicates the exclusive-or operation. The shift register is initialized to contain all zeros. The descrambled sequence is

Cm = Bm ⊕ Bm-3 ⊕ Bm-5
   = (Am ⊕ Bm-3 ⊕ Bm-5) ⊕ Bm-3 ⊕ Bm-5
   = Am ⊕ (Bm-3 ⊕ Bm-3) ⊕ (Bm-5 ⊕ Bm-5)
   = Am    (16.8)

[Figure 16.17 Scrambler and Descrambler]
[Figure 16.18 Example of Scrambling with P(X) = 1 + X^3 + X^5]

As can be seen, the descrambled output is the original sequence. We can represent this process with the use of polynomials. Thus, for this example, the polynomial is P(X) = 1 + X^3 + X^5. The input is divided by this polynomial to produce the scrambled sequence. At the receiver, the received scrambled signal is multiplied by the same polynomial to reproduce the original input. Figure 16.18 is an example using the polynomial P(X) and an input of 101010100000111.⁴ The scrambled transmission, produced by dividing by P(X) (100101), is 101110001101001.
When this number is multiplied by P(X), we get the original input. Note that the input sequence contains the periodic sequence 10101010 as well as a long string of zeros. The scrambler effectively removes both patterns.

⁴ We use the convention that the leftmost bit is the first bit presented to the scrambler; thus the bits can be labeled A0A1A2…. Similarly, the polynomial is converted to a bit string from left to right. The polynomial B0 + B1X + B2X^2 + … is represented as B0B1B2….

For the MLT-3 scheme, which is used for 100BASE-TX, the scrambling equation is

Bm = Am ⊕ X9 ⊕ X11

In this case the shift register consists of nine elements, used in the same manner as the 5-element register in Figure 16.17. However, in the case of MLT-3, the shift register is not fed by the output Bm. Instead, after each bit transmission, the register is shifted one unit up, and the result of the previous XOR is fed into the first unit. This can be expressed as:

Xi(t) = Xi-1(t - 1),    2 ≤ i ≤ 9
X1(t) = X9(t - 1) ⊕ X11(t - 1)

If the shift register contains all zeros, no scrambling occurs (we just have Bm = Am), and the above equations produce no change in the shift register. Accordingly, the standard calls for initializing the shift register with all ones and re-initializing the register to all ones when it takes on a value of all zeros.
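Equations (16.7) and (16.8) are easy to check in code. The sketch below (function names are ours) uses the five-stage register of Figure 16.17, initialized to zeros, and reproduces the Figure 16.18 example.

```python
def scramble(a_bits):
    """Feedback scrambler of Equation (16.7): Bm = Am ⊕ Bm-3 ⊕ Bm-5."""
    b = []
    for m, a in enumerate(a_bits):
        b.append(a ^ (b[m - 3] if m >= 3 else 0) ^ (b[m - 5] if m >= 5 else 0))
    return b

def descramble(b_bits):
    """Feedforward descrambler of Equation (16.8): Cm = Bm ⊕ Bm-3 ⊕ Bm-5."""
    return [bm ^ (b_bits[m - 3] if m >= 3 else 0) ^ (b_bits[m - 5] if m >= 5 else 0)
            for m, bm in enumerate(b_bits)]

A = [int(x) for x in "101010100000111"]  # input of Figure 16.18
B = scramble(A)
print("".join(map(str, B)))              # 101110001101001
assert descramble(B) == A                # descrambling recovers the input
```

With taps at 13/33 or 20/33 instead of 3/5, and a correspondingly longer register, the same structure gives the two 4D-PAM5 scrambling equations discussed next.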
For the 4D-PAM5 scheme, two scrambling equations are used, one in each direction:

Bm = Am ⊕ Bm-13 ⊕ Bm-33
Bm = Am ⊕ Bm-20 ⊕ Bm-33

CHAPTER 17 WIRELESS LANS

17.1 Overview
17.2 Wireless LAN Technology
17.3 IEEE 802.11 Architecture and Services
17.4 IEEE 802.11 Medium Access Control
17.5 IEEE 802.11 Physical Layer
17.6 IEEE 802.11 Security Considerations
17.7 Recommended Reading and Web Sites
17.8 Key Terms, Review Questions, and Problems

Investigators have published numerous reports of birds taking turns vocalizing; the bird spoken to gave its full attention to the speaker and never vocalized at the same time, as if the two were holding a conversation. Researchers and scholars who have studied the data on avian communication carefully write that (a) the communication code of birds such as crows has not been broken by any means; (b) probably all birds have wider vocabularies than anyone realizes; and (c) greater complexity and depth are recognized in avian communication as research progresses.
—The Human Nature of Birds, Theodore Barber

KEY POINTS
• The principal technologies used for wireless LANs are infrared, spread spectrum, and narrowband microwave.
• The IEEE 802.11 standard defines a set of services and physical layer options for wireless LANs.
• The IEEE 802.11 services include managing associations, delivering data, and security.
• The IEEE 802.11 physical layer includes infrared and spread spectrum and covers a range of data rates.

In just the past few years, wireless LANs have come to occupy a significant niche in the local area network market. Increasingly, organizations are finding that wireless LANs are an indispensable adjunct to traditional wired LANs, to satisfy requirements for mobility, relocation, ad hoc networking, and coverage of locations difficult to wire. This chapter provides a survey of wireless LANs.
We begin with an overview that looks at the motivations for using wireless LANs and summarizes the various approaches in current use. The next section examines the three principal types of wireless LANs, classified according to transmission technology: infrared, spread spectrum, and narrowband microwave. The most prominent specification for wireless LANs was developed by the IEEE 802.11 working group. The remainder of the chapter focuses on this standard.

17.1 OVERVIEW

As the name suggests, a wireless LAN is one that makes use of a wireless transmission medium. Until relatively recently, wireless LANs were little used. The reasons for this included high prices, low data rates, occupational safety concerns, and licensing requirements. As these problems have been addressed, the popularity of wireless LANs has grown rapidly. In this section, we survey the key wireless LAN application areas and then look at the requirements for and advantages of wireless LANs.

Wireless LAN Applications

[PAHL95] lists four application areas for wireless LANs: LAN extension, cross-building interconnect, nomadic access, and ad hoc networks. Let us consider each of these in turn.

LAN Extension

Early wireless LAN products, introduced in the late 1980s, were marketed as substitutes for traditional wired LANs. A wireless LAN saves the cost of the installation of LAN cabling and eases the task of relocation and other modifications to network structure. However, this motivation for wireless LANs was overtaken by events. First, as awareness of the need for LANs became greater, architects designed new buildings to include extensive prewiring for data applications. Second, with advances in data transmission technology, there is an increasing reliance on twisted pair cabling for LANs and, in particular, Category 3 and Category 5 unshielded twisted pair.
Most older buildings are already wired with an abundance of Category 3 cable, and many newer buildings are prewired with Category 5. Thus, the use of a wireless LAN to replace wired LANs has not happened to any great extent.

However, in a number of environments, there is a role for the wireless LAN as an alternative to a wired LAN. Examples include buildings with large open areas, such as manufacturing plants, stock exchange trading floors, and warehouses; historical buildings with insufficient twisted pair and where drilling holes for new wiring is prohibited; and small offices where installation and maintenance of wired LANs is not economical. In all of these cases, a wireless LAN provides an effective and more attractive alternative. In most of these cases, an organization will also have a wired LAN to support servers and some stationary workstations. For example, a manufacturing facility typically has an office area that is separate from the factory floor but that must be linked to it for networking purposes. Therefore, typically, a wireless LAN will be linked into a wired LAN on the same premises. Thus, this application area is referred to as LAN extension.

Figure 17.1 indicates a simple wireless LAN configuration that is typical of many environments. There is a backbone wired LAN, such as Ethernet, that supports servers, workstations, and one or more bridges or routers to link with other networks. In addition, there is a control module (CM) that acts as an interface to a wireless LAN. The control module includes either bridge or router functionality to link the wireless LAN to the backbone. It includes some sort of access control logic, such as a polling or token-passing scheme, to regulate the access from the end systems. Note that some of the end systems are standalone devices, such as a workstation or a server. Hubs or other user modules (UMs) that control a number of stations off a wired LAN may also be part of the wireless LAN configuration.
The configuration of Figure 17.1 can be referred to as a single-cell wireless LAN; all of the wireless end systems are within range of a single control module.

[Figure 17.1 Example Single-Cell Wireless LAN Configuration: a backbone Ethernet with a server, 10-Mbps and 100-Mbps Ethernet switches, a bridge or router, a control module (CM), and user modules (UMs)]

Another common configuration, suggested by Figure 17.2, is a multiple-cell wireless LAN. In this case, there are multiple control modules interconnected by a wired LAN. Each control module supports a number of wireless end systems within its transmission range. For example, with an infrared LAN, transmission is limited to a single room; therefore, one cell is needed for each room in an office building that requires wireless support.

Cross-Building Interconnect

Another use of wireless LAN technology is to connect LANs in nearby buildings, be they wired or wireless LANs. In this case, a point-to-point wireless link is used between two buildings. The devices so connected are typically bridges or routers. This single point-to-point link is not a LAN per se, but it is usual to include this application under the heading of wireless LAN.

Nomadic Access

Nomadic access provides a wireless link between a LAN hub and a mobile data terminal equipped with an antenna, such as a laptop computer or notepad computer. One example of the utility of such a connection is to enable an employee returning from a trip to transfer data from a personal portable computer to a server in the office. Nomadic access is also useful in an extended environment such as a campus or a business operating out of a cluster of buildings.
[Figure 17.2 Example Multiple-Cell Wireless LAN Configuration: three control modules (CMs), each serving a cell of user modules (UMs) on a different frequency, interconnected by a 100-Mbps Ethernet switch with a bridge or router]

In both of these cases, users may move around with their portable computers and may wish access to the servers on a wired LAN from various locations.

Ad Hoc Networking

An ad hoc network is a peer-to-peer network (no centralized server) set up temporarily to meet some immediate need. For example, a group of employees, each with a laptop or palmtop computer, may convene in a conference room for a business or classroom meeting. The employees link their computers in a temporary network just for the duration of the meeting.

Figure 17.3 suggests the differences between a wireless LAN that supports LAN extension and nomadic access requirements and an ad hoc wireless LAN. In the former case, the wireless LAN forms a stationary infrastructure consisting of one or more cells with a control module for each cell. Within a cell, there may be a number of stationary end systems. Nomadic stations can move from one cell to another. In contrast, there is no infrastructure for an ad hoc network. Rather, a peer collection of stations within range of each other may dynamically configure themselves into a temporary network.

[Figure 17.3 Wireless LAN Configurations: (a) an infrastructure wireless LAN with a high-speed backbone wired LAN, cells, and a nomadic station; (b) an ad hoc LAN]

Wireless LAN Requirements

A wireless LAN must meet the same sort of requirements typical of any LAN, including high capacity, ability to cover short distances, full connectivity among attached stations, and broadcast capability. In addition, there are a number of requirements specific to the wireless LAN environment.
The following are among the most important requirements for wireless LANs:

• Throughput: The medium access control protocol should make as efficient use as possible of the wireless medium to maximize capacity.
• Number of nodes: Wireless LANs may need to support hundreds of nodes across multiple cells.
• Connection to backbone LAN: In most cases, interconnection with stations on a wired backbone LAN is required. For infrastructure wireless LANs, this is easily accomplished through the use of control modules that connect to both types of LANs. There may also need to be accommodation for mobile users and ad hoc wireless networks.
• Service area: A typical coverage area for a wireless LAN has a diameter of 100 to 300 m.
• Battery power consumption: Mobile workers use battery-powered workstations that need to have a long battery life when used with wireless adapters. This suggests that a MAC protocol that requires mobile nodes to monitor access points constantly or engage in frequent handshakes with a base station is inappropriate. Typical wireless LAN implementations have features to reduce power consumption while not using the network, such as a sleep mode.
• Transmission robustness and security: Unless properly designed, a wireless LAN may be especially vulnerable to interference and eavesdropping. The design of a wireless LAN must permit reliable transmission even in a noisy environment and should provide some level of security from eavesdropping.
• Collocated network operation: As wireless LANs become more popular, it is quite likely for two or more wireless LANs to operate in the same area or in some area where interference between the LANs is possible. Such interference may thwart the normal operation of a MAC algorithm and may allow unauthorized access to a particular LAN.
• License-free operation: Users would prefer to buy and operate wireless LAN products without having to secure a license for the frequency band used by the LAN.
• Handoff/roaming: The MAC protocol used in the wireless LAN should enable mobile stations to move from one cell to another.
• Dynamic configuration: The MAC addressing and network management aspects of the LAN should permit dynamic and automated addition, deletion, and relocation of end systems without disruption to other users.

17.2 WIRELESS LAN TECHNOLOGY

Wireless LANs are generally categorized according to the transmission technique that is used. All current wireless LAN products fall into one of the following categories:

• Infrared (IR) LANs: An individual cell of an IR LAN is limited to a single room, because infrared light does not penetrate opaque walls.
• Spread spectrum LANs: This type of LAN makes use of spread spectrum transmission technology. In most cases, these LANs operate in the ISM (industrial, scientific, and medical) microwave bands so that no Federal Communications Commission (FCC) licensing is required for their use in the United States.

Infrared LANs

Optical wireless communication in the infrared portion of the spectrum is commonplace in most homes, where it is used for a variety of remote control devices. More recently, attention has turned to the use of infrared technology to construct wireless LANs. In this section, we begin with a comparison of the characteristics of infrared LANs with those of radio LANs and then look at some of the details of infrared LANs.

Strengths and Weaknesses

Infrared offers a number of significant advantages over microwave approaches. The spectrum for infrared is virtually unlimited, which presents the possibility of achieving extremely high data rates. The infrared spectrum is unregulated worldwide, which is not true of some portions of the microwave spectrum.
In addition, infrared shares some properties of visible light that make it attractive for certain types of LAN configurations. Infrared light is diffusely reflected by light-colored objects; thus it is possible to use ceiling reflection to achieve coverage of an entire room. Infrared light does not penetrate walls or other opaque objects. This has two advantages: First, infrared communications can be more easily secured against eavesdropping than microwave; and second, a separate infrared installation can be operated in every room in a building without interference, enabling the construction of very large infrared LANs.

Another strength of infrared is that the equipment is relatively inexpensive and simple. Infrared data transmission typically uses intensity modulation, so that IR receivers need to detect only the amplitude of optical signals, whereas most microwave receivers must detect frequency or phase.

The infrared medium also exhibits some drawbacks. Many indoor environments experience rather intense infrared background radiation, from sunlight and indoor lighting. This ambient radiation appears as noise in an infrared receiver, requiring the use of transmitters of higher power than would otherwise be required and also limiting the range. However, increases in transmitter power are limited by concerns of eye safety and excessive power consumption.

Transmission Techniques

Three alternative transmission techniques are in common use for IR data transmission: the transmitted signal can be focused and aimed (as in a remote TV control); it can be radiated omnidirectionally; or it can be reflected from a light-colored ceiling.

Directed-beam IR can be used to create point-to-point links. In this mode, the range depends on the emitted power and on the degree of focusing. A focused IR data link can have a range of kilometers. Such ranges are not needed for constructing indoor wireless LANs.
However, an IR link can be used for cross-building interconnect between bridges or routers located in buildings within a line of sight of each other.

One indoor use of point-to-point IR links is to set up a ring LAN. A set of IR transceivers can be positioned so that data circulate around them in a ring configuration. Each transceiver supports a workstation or a hub of stations, with the hub providing a bridging function.

An omnidirectional configuration involves a single base station that is within line of sight of all other stations on the LAN. Typically, this station is mounted on the ceiling. The base station acts as a multiport repeater. The ceiling transmitter broadcasts an omnidirectional signal that can be received by all of the other IR transceivers in the area. These other transceivers transmit a directional beam aimed at the ceiling base unit.

In a diffused configuration, all of the IR transmitters are focused and aimed at a point on a diffusely reflecting ceiling. IR radiation striking the ceiling is reradiated omnidirectionally and picked up by all of the receivers in the area.

Spread Spectrum LANs

Currently, the most popular type of wireless LAN uses spread spectrum techniques.

Configuration

Except for quite small offices, a spread spectrum wireless LAN makes use of a multiple-cell arrangement, as was illustrated in Figure 17.2. Adjacent cells make use of different center frequencies within the same band to avoid interference. Within a given cell, the topology can be either hub or peer to peer. The hub topology is indicated in Figure 17.2. In a hub topology, the hub is typically mounted on the ceiling and connected to a backbone wired LAN to provide connectivity to stations attached to the wired LAN and to stations that are part of wireless LANs in other cells. The hub may also control access, as in the IEEE 802.11 point coordination function, described subsequently.
The hub may also control access by acting as a multiport repeater with similar functionality to Ethernet multiport repeaters. In this case, all stations in the cell transmit only to the hub and receive only from the hub. Alternatively, and regardless of access control mechanism, each station may broadcast using an omnidirectional antenna so that all other stations in the cell may receive; this corresponds to a logical bus configuration.

One other potential function of a hub is automatic handoff of mobile stations. At any time, a number of stations are dynamically assigned to a given hub based on proximity. When the hub senses a weakening signal, it can automatically hand off to the nearest adjacent hub.

A peer-to-peer topology is one in which there is no hub. A MAC algorithm such as CSMA is used to control access. This topology is appropriate for ad hoc LANs.

Transmission Issues

A desirable, though not necessary, characteristic of a wireless LAN is that it be usable without having to go through a licensing procedure. The licensing regulations differ from one country to another, which complicates this objective. Within the United States, the FCC has authorized two unlicensed applications within the ISM band: spread spectrum systems, which can operate at up to 1 watt, and very low power systems, which can operate at up to 0.5 watts. Since the FCC opened up this band, its use for spread spectrum wireless LANs has become popular.

In the United States, three microwave bands have been set aside for unlicensed spread spectrum use: 902–928 MHz (915-MHz band), 2.4–2.4835 GHz (2.4-GHz band), and 5.725–5.825 GHz (5.8-GHz band). Of these, the 2.4-GHz band is also used in this manner in Europe and Japan. The higher the frequency, the higher the potential bandwidth, so the three bands are in increasing order of attractiveness from a capacity point of view. In addition, the potential for interference must be considered.
There are a number of devices that operate at around 900 MHz, including cordless telephones, wireless microphones, and amateur radio. There are fewer devices operating at 2.4 GHz; one notable example is the microwave oven, which tends to have greater leakage of radiation with increasing age. At present there is little competition in the 5.8-GHz band; however, the higher the frequency band, in general, the more expensive the equipment.

17.3 IEEE 802.11 ARCHITECTURE AND SERVICES

In 1990, the IEEE 802 Committee formed a new working group, IEEE 802.11, specifically devoted to wireless LANs, with a charter to develop a MAC protocol and physical medium specification. The initial interest was in developing a wireless LAN operating in the ISM (industrial, scientific, and medical) band. Since that time, the demand for WLANs, at different frequencies and data rates, has exploded. Keeping pace with this demand, the IEEE 802.11 working group has issued an ever-expanding list of standards (Table 17.1). Table 17.2 briefly defines key terms used in the IEEE 802.11 standard.
Table 17.1 IEEE 802.11 Standards

IEEE 802.11
  Medium access control (MAC): One common MAC for WLAN applications
  Physical layer: Infrared at 1 and 2 Mbps
  Physical layer: 2.4-GHz FHSS at 1 and 2 Mbps
  Physical layer: 2.4-GHz DSSS at 1 and 2 Mbps
IEEE 802.11a
  Physical layer: 5-GHz OFDM at rates from 6 to 54 Mbps
IEEE 802.11b
  Physical layer: 2.4-GHz DSSS at 5.5 and 11 Mbps
IEEE 802.11c
  Bridge operation at 802.11 MAC layer
IEEE 802.11d
  Physical layer: Extend operation of 802.11 WLANs to new regulatory domains (countries)
IEEE 802.11e
  MAC: Enhance to improve quality of service and enhance security mechanisms
IEEE 802.11f
  Recommended practices for multivendor access point interoperability
IEEE 802.11g
  Physical layer: Extend 802.11b to data rates > 20 Mbps
IEEE 802.11h
  Physical/MAC: Enhance IEEE 802.11a to add indoor and outdoor channel selection and to improve spectrum and transmit power management
IEEE 802.11i
  MAC: Enhance security and authentication mechanisms
IEEE 802.11j
  Physical: Enhance IEEE 802.11a to conform to Japanese requirements
IEEE 802.11k
  Radio resource measurement enhancements to provide interface to higher layers for radio and network measurements
IEEE 802.11m
  Maintenance of IEEE 802.11-1999 standard with technical and editorial corrections
IEEE 802.11n
  Physical/MAC: Enhancements to enable higher throughput
IEEE 802.11p
  Physical/MAC: Wireless access in vehicular environments
IEEE 802.11r
  Physical/MAC: Fast roaming (fast BSS transition)
IEEE 802.11s
  Physical/MAC: ESS mesh networking
IEEE 802.11T
  Recommended practice for the evaluation of 802.11 wireless performance
IEEE 802.11u
  Physical/MAC: Interworking with external networks

Table 17.2 IEEE 802.11 Terminology

Access point (AP): Any entity that has station functionality and provides access to the distribution system via the wireless medium for associated stations
Basic service set (BSS): A set of stations controlled by a single coordination function
Coordination function: The logical function that determines when a station operating within a BSS is permitted to transmit and may be able to receive PDUs
Distribution system (DS): A system used to interconnect a set of BSSs and integrated LANs to create an extended service set (ESS)
Extended service set (ESS): A set of one or more interconnected BSSs and integrated LANs that appear as a single BSS to the LLC layer at any station associated with one of these BSSs
MAC protocol data unit (MPDU): The unit of data exchanged between two peer MAC entities using the services of the physical layer
MAC service data unit (MSDU): Information that is delivered as a unit between MAC users
Station: Any device that contains an IEEE 802.11 conformant MAC and physical layer

The Wi-Fi Alliance

The first 802.11 standard to gain broad industry acceptance was 802.11b. Although 802.11b products are all based on the same standard, there is always a concern whether products from different vendors will successfully interoperate. To meet this concern, the Wireless Ethernet Compatibility Alliance (WECA), an industry consortium, was formed in 1999. This organization, subsequently renamed the Wi-Fi (Wireless Fidelity) Alliance, created a test suite to certify interoperability for 802.11b products. The term used for certified 802.11b products is Wi-Fi. Wi-Fi certification has been extended to 802.11g products. The Wi-Fi Alliance has also developed a certification process for 802.11a products, called Wi-Fi5. The Wi-Fi Alliance is concerned with a range of market areas for WLANs, including enterprise, home, and hot spots.

IEEE 802.11 Architecture

Figure 17.4 illustrates the model developed by the 802.11 working group. The smallest building block of a wireless LAN is a basic service set (BSS), which consists of some number of stations executing the same MAC protocol and competing for access to the same shared wireless medium. A BSS may be isolated or it may connect to a backbone distribution system (DS) through an access point (AP).
The AP functions as a bridge and a relay point. In a BSS, client stations do not communicate directly with one another. Rather, if one station in the BSS wants to communicate with another station in the same BSS, the MAC frame is first sent from the originating station to the AP, and then from the AP to the destination station. Similarly, a MAC frame from a station in the BSS to a remote station is sent from the local station to the AP and then relayed by the AP over the DS on its way to the destination station. The BSS generally corresponds to what is referred to as a cell in the literature. The DS can be a switch, a wired network, or a wireless network.

[Figure 17.4 IEEE 802.11 Architecture: an extended service set consisting of two basic service sets (stations STA 1 through STA 4, with STA 1 as AP, and stations STA 5 through STA 7, with STA 5 as AP) interconnected by a distribution system, with a portal to an IEEE 802.x LAN]

When all the stations in the BSS are mobile stations, with no connection to other BSSs, the BSS is called an independent BSS (IBSS). An IBSS is typically an ad hoc network. In an IBSS, the stations all communicate directly, and no AP is involved.

A simple configuration is shown in Figure 17.4, in which each station belongs to a single BSS; that is, each station is within wireless range only of other stations within the same BSS. It is also possible for two BSSs to overlap geographically, so that a single station could participate in more than one BSS. Further, the association between a station and a BSS is dynamic. Stations may turn off, come within range, and go out of range.

An extended service set (ESS) consists of two or more basic service sets interconnected by a distribution system. Typically, the distribution system is a wired backbone LAN but can be any communications network. The extended service set appears as a single logical LAN to the logical link control (LLC) level.
Figure 17.4 indicates that an access point (AP) is implemented as part of a station; the AP is the logic within a station that provides access to the DS by providing DS services in addition to acting as a station. To integrate the IEEE 802.11 architecture with a traditional wired LAN, a portal is used. The portal logic is implemented in a device, such as a bridge or router, that is part of the wired LAN and that is attached to the DS.

IEEE 802.11 Services

IEEE 802.11 defines nine services that need to be provided by the wireless LAN to provide functionality equivalent to that which is inherent to wired LANs. Table 17.3 lists the services and indicates two ways of categorizing them.

Table 17.3 IEEE 802.11 Services

Service | Provider | Used to support
Association | Distribution system | MSDU delivery
Authentication | Station | LAN access and security
Deauthentication | Station | LAN access and security
Disassociation | Distribution system | MSDU delivery
Distribution | Distribution system | MSDU delivery
Integration | Distribution system | MSDU delivery
MSDU delivery | Station | MSDU delivery
Privacy | Station | LAN access and security
Reassociation | Distribution system | MSDU delivery

1. The service provider can be either the station or the DS. Station services are implemented in every 802.11 station, including AP stations. Distribution services are provided between BSSs; these services may be implemented in an AP or in another special-purpose device attached to the distribution system.

2. Three of the services are used to control IEEE 802.11 LAN access and confidentiality. Six of the services are used to support delivery of MAC service data units (MSDUs) between stations. The MSDU is a block of data passed down from the MAC user to the MAC layer; typically this is an LLC PDU. If the MSDU is too large to be transmitted in a single MAC frame, it may be fragmented and transmitted in a series of MAC frames. Fragmentation is discussed in Section 17.4.
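Fragmentation of an oversized MSDU can be sketched simply. This is an illustrative helper only, not the 802.11 fragmentation procedure (which carries fragment numbers and a more-fragments flag in the frame header); the function name and threshold parameter are our own, with the default of 2304 bytes chosen because that is the maximum 802.11 MSDU size.

```python
def fragment_msdu(msdu: bytes, threshold: int = 2304):
    """Split an MSDU into fragments no larger than `threshold` bytes.

    Each fragment would be carried in its own MAC frame; every
    fragment except the last is flagged so the receiver knows
    more fragments follow and can reassemble in order.
    """
    frags = [msdu[i:i + threshold]
             for i in range(0, len(msdu), threshold)] or [b""]
    # Each entry: (fragment number, more-fragments flag, payload)
    return [(n, n < len(frags) - 1, f) for n, f in enumerate(frags)]

# A 5000-byte MSDU becomes three MAC-frame payloads: 2304, 2304, 392 bytes.
frames = fragment_msdu(b"x" * 5000)
```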
Following the IEEE 802.11 document, we next discuss the services in an order designed to clarify the operation of an IEEE 802.11 ESS network. MSDU delivery, which is the basic service, has already been mentioned. Services related to security are discussed in Section 17.6.

Distribution of Messages within a DS

The two services involved with the distribution of messages within a DS are distribution and integration. Distribution is the primary service used by stations to exchange MAC frames when the frame must traverse the DS to get from a station in one BSS to a station in another BSS. For example, suppose a frame is to be sent from station 2 (STA 2) to STA 7 in Figure 17.4. The frame is sent from STA 2 to STA 1, which is the AP for this BSS. The AP gives the frame to the DS, which has the job of directing the frame to the AP associated with STA 5 in the target BSS. STA 5 receives the frame and forwards it to STA 7. How the message is transported through the DS is beyond the scope of the IEEE 802.11 standard.

If the two stations that are communicating are within the same BSS, then the distribution service logically goes through the single AP of that BSS.

The integration service enables transfer of data between a station on an IEEE 802.11 LAN and a station on an integrated IEEE 802.x LAN. The term integrated refers to a wired LAN that is physically connected to the DS and whose stations may be logically connected to an IEEE 802.11 LAN via the integration service. The integration service takes care of any address translation and media conversion logic required for the exchange of data.

Association-Related Services

The primary purpose of the MAC layer is to transfer MSDUs between MAC entities; this purpose is fulfilled by the distribution service. For that service to function, it requires information about stations within the ESS that is provided by the association-related services.
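The dependency just described, in which distribution relies on association information, can be sketched in Python using the STA 2 to STA 7 example. This is a toy model under stated assumptions: a dictionary maps each station to its AP, and all class and method names are invented for illustration; none of them come from the standard.

```python
class DistributionSystem:
    """Toy DS: records which AP each station is associated with."""
    def __init__(self):
        self.assoc = {}  # station name -> AccessPoint

    def associate(self, station, ap):
        # Association-related services populate this mapping.
        self.assoc[station] = ap

    def distribute(self, frame):
        # Distribution service: direct the frame to the AP serving
        # the destination station.
        return self.assoc[frame["dst"]].deliver(frame)

class AccessPoint:
    def __init__(self, name, ds):
        self.name, self.ds = name, ds

    def from_station(self, frame):
        # Originating station -> its AP -> the DS.
        return self.ds.distribute(frame)

    def deliver(self, frame):
        # Target AP -> destination station in its own BSS.
        return f"{frame['dst']} got frame from {frame['src']} via {self.name}"

ds = DistributionSystem()
ap1, ap5 = AccessPoint("STA1", ds), AccessPoint("STA5", ds)
ds.associate("STA2", ap1)
ds.associate("STA7", ap5)
print(ap1.from_station({"src": "STA2", "dst": "STA7"}))
# -> STA7 got frame from STA2 via STA5
```

The point of the sketch is that the DS cannot route the frame without the association table, which is exactly why the standard requires a station to be associated before the distribution service will carry its traffic.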
Before the distribution service can deliver data to or accept data from a station, that station must be associated. Before looking at the concept of association, we need to describe the concept of mobility. The standard defines three transition types, based on mobility:

• No transition: A station of this type is either stationary or moves only within the direct communication range of the communicating stations of a single BSS.
• BSS transition: This is defined as a station movement from one BSS to another BSS within the same ESS. In this case, delivery of data to the station requires that the addressing capability be able to recognize the new location of the station.
• ESS transition: This is defined as a station movement from a BSS in one ESS to a BSS within another ESS. This case is supported only in the sense that the station can move. Maintenance of upper-layer connections supported by 802.11 cannot be guaranteed. In fact, disruption of service is likely to occur.

To deliver a message within a DS, the distribution service needs to know where the destination station is located. Specifically, the DS needs to know the identity of the AP to which the message should be delivered in order for that message to reach the destination station. To meet this requirement, a station must maintain an association with the AP within its current BSS. Three services relate to this requirement:

• Association: Establishes an initial association between a station and an AP. Before a station can transmit or receive frames on a wireless LAN, its identity and address must be known. For this purpose, a station must establish an association with an AP within a particular BSS. The AP can then communicate this information to other APs within the ESS to facilitate routing and delivery of addressed frames.
• Reassociation: Enables an established association to be transferred from one AP to another, allowing a mobile station to move from one BSS to another.
• Disassociation: A notification from either a station or an AP that an existing association is terminated. A station should give this notification before leaving an ESS or shutting down. However, the MAC management facility protects itself against stations that disappear without notification.

17.4 IEEE 802.11 MEDIUM ACCESS CONTROL

The IEEE 802.11 MAC layer covers three functional areas: reliable data delivery, access control, and security. This section covers the first two topics.

Reliable Data Delivery

As with any wireless network, a wireless LAN using the IEEE 802.11 physical and MAC layers is subject to considerable unreliability. Noise, interference, and other propagation effects result in the loss of a significant number of frames. Even with error correction codes, a number of MAC frames may not successfully be received. This situation can be dealt with by reliability mechanisms at a higher layer, such as TCP. However, timers used for retransmission at higher layers are typically on the order of seconds. It is therefore more efficient to deal with errors at the MAC level. For this purpose, IEEE 802.11 includes a frame exchange protocol. When a station receives a data frame from another station, it returns an acknowledgment (ACK) frame to the source station. This exchange is treated as an atomic unit, not to be interrupted by a transmission from any other station. If the source does not receive an ACK within a short period of time, either because its data frame was damaged or because the returning ACK was damaged, the source retransmits the frame.

Thus, the basic data transfer mechanism in IEEE 802.11 involves an exchange of two frames. To further enhance reliability, a four-frame exchange may be used. In this scheme, a source first issues a Request to Send (RTS) frame to the destination. The destination then responds with a Clear to Send (CTS).
After receiving the CTS, the source transmits the data frame, and the destination responds with an ACK. The RTS alerts all stations that are within reception range of the source that an exchange is under way; these stations refrain from transmission in order to avoid a collision between two frames transmitted at the same time. Similarly, the CTS alerts all stations that are within reception range of the destination that an exchange is under way. The RTS/CTS portion of the exchange is a required function of the MAC but may be disabled.

Medium Access Control

The 802.11 working group considered two types of proposals for a MAC algorithm: distributed access protocols, which, like Ethernet, distribute the decision to transmit over all the nodes using a carrier sense mechanism; and centralized access protocols, which involve regulation of transmission by a centralized decision maker. A distributed access protocol makes sense for an ad hoc network of peer workstations (typically an IBSS) and may also be attractive in other wireless LAN configurations that consist primarily of bursty traffic. A centralized access protocol is natural for configurations in which a number of wireless stations are interconnected with each other and some sort of base station that attaches to a backbone wired LAN; it is especially useful if some of the data is time sensitive or high priority.

The end result for 802.11 is a MAC algorithm called DFWMAC (distributed foundation wireless MAC) that provides a distributed access control mechanism with an optional centralized control built on top of that. Figure 17.5 illustrates the architecture. The lower sublayer of the MAC layer is the distributed coordination function (DCF). DCF uses a contention algorithm to provide access to all traffic. Ordinary asynchronous traffic directly uses DCF. The point coordination function (PCF) is a centralized MAC algorithm used to provide contention-free service.
PCF is built on top of DCF and exploits features of DCF to assure access for its users. Let us consider these two sublayers in turn.

[Figure 17.5: IEEE 802.11 Protocol Architecture. Logical link control sits above the MAC layer. Within the MAC layer, the point coordination function (PCF) provides contention-free service on top of the distributed coordination function (DCF), which provides contention service. Physical layer options: IEEE 802.11 (2.4-GHz frequency-hopping spread spectrum, 2.4-GHz direct-sequence spread spectrum, and infrared, each at 1 and 2 Mbps); IEEE 802.11a (5-GHz orthogonal FDM at 6, 9, 12, 18, 24, 36, 48, and 54 Mbps); IEEE 802.11b (2.4-GHz direct-sequence spread spectrum at 5.5 and 11 Mbps); and IEEE 802.11g (2.4-GHz at 6, 9, 12, 18, 24, 36, 48, and 54 Mbps).]

Distributed Coordination Function

The DCF sublayer makes use of a simple CSMA (carrier sense multiple access) algorithm. If a station has a MAC frame to transmit, it listens to the medium. If the medium is idle, the station may transmit; otherwise the station must wait until the current transmission is complete before transmitting. The DCF does not include a collision detection function (i.e., CSMA/CD) because collision detection is not practical on a wireless network. The dynamic range of the signals on the medium is very large, so that a transmitting station cannot effectively distinguish incoming weak signals from noise and the effects of its own transmission.

To ensure the smooth and fair functioning of this algorithm, DCF includes a set of delays that amounts to a priority scheme. Let us start by considering a single delay known as an interframe space (IFS). In fact, there are three different IFS values, but the algorithm is best explained by initially ignoring this detail. Using an IFS, the rules for CSMA access are as follows (Figure 17.6):

1. A station with a frame to transmit senses the medium. If the medium is idle, it waits to see if the medium remains idle for a time equal to IFS. If so, the station may transmit immediately.
[Figure 17.6: IEEE 802.11 Medium Access Control Logic. A flowchart of the access rules described here: a station with a frame to transmit senses the medium; if idle, it waits IFS and transmits if the medium is still idle; if busy, it waits until the current transmission ends, waits IFS, and performs exponential backoff while the medium is idle before transmitting.]

2. If the medium is busy (either because the station initially finds the medium busy or because the medium becomes busy during the IFS idle time), the station defers transmission and continues to monitor the medium until the current transmission is over.

3. Once the current transmission is over, the station delays another IFS. If the medium remains idle for this period, then the station backs off a random amount of time and again senses the medium. If the medium is still idle, the station may transmit. During the backoff time, if the medium becomes busy, the backoff timer is halted and resumes when the medium becomes idle.

4. If the transmission is unsuccessful, which is determined by the absence of an acknowledgment, then it is assumed that a collision has occurred.

To ensure that backoff maintains stability, binary exponential backoff, described in Chapter 16, is used. Binary exponential backoff provides a means of handling a heavy load. Repeated failed attempts to transmit result in longer and longer backoff times, which helps to smooth out the load. Without such a backoff, the following situation could occur: Two or more stations attempt to transmit at the same time, causing a collision. These stations then immediately attempt to retransmit, causing a new collision.
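The contention-window doubling at the heart of binary exponential backoff can be sketched in a few lines of Python. This is an illustration only, not the full 802.11 timing state machine, and the CWmin/CWmax slot values (31 and 1023) are assumed typical DSSS parameters rather than anything stated in the text above:

```python
import random

CW_MIN, CW_MAX = 31, 1023  # assumed 802.11 DSSS contention-window bounds, in slots

def backoff_slots(retries: int) -> int:
    """Pick a random backoff; the contention window roughly doubles per failed attempt."""
    cw = min(CW_MAX, (CW_MIN + 1) * (2 ** retries) - 1)
    return random.randint(0, cw)

# After each failed transmission the window grows: 31, 63, 127, 255, 511, 1023 slots
for r in range(6):
    print(r, min(CW_MAX, (CW_MIN + 1) * (2 ** r) - 1))
```

The doubling spreads retransmission attempts of colliding stations over an ever-larger window, which is what smooths out the load under contention.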
The preceding scheme is refined for DCF to provide priority-based access by the simple expedient of using three values for IFS:

• SIFS (short IFS): The shortest IFS, used for all immediate response actions, as explained in the following discussion

• PIFS (point coordination function IFS): A midlength IFS, used by the centralized controller in the PCF scheme when issuing polls

• DIFS (distributed coordination function IFS): The longest IFS, used as a minimum delay for asynchronous frames contending for access

Figure 17.7a illustrates the use of these time values. Consider first the SIFS. Any station using SIFS to determine transmission opportunity has, in effect, the highest priority, because it will always gain access in preference to a station waiting an amount of time equal to PIFS or DIFS.

[Figure 17.7: IEEE 802.11 MAC Timing. (a) Basic access method: after a busy medium, immediate access is allowed if the medium is free longer than DIFS; otherwise a station defers, waits DIFS, and selects a slot in the contention window using binary exponential backoff before transmitting the next frame. (b) PCF superframe construction: each superframe of fixed nominal length begins with an optional contention-free period under PCF (during which asynchronous traffic defers), followed by a contention period under DCF of variable length; a busy medium at the superframe boundary causes the PCF to defer, foreshortening the actual superframe period.]

The SIFS is used in the following circumstances:

• Acknowledgment (ACK): When a station receives a frame addressed only to itself (not multicast or broadcast), it responds with an ACK frame after waiting only for an SIFS gap. This has two desirable effects. First, because collision detection is not used, the likelihood of collisions is greater than with CSMA/CD, and the MAC-level ACK provides for efficient collision recovery. Second, the SIFS can be used to provide efficient delivery of an LLC protocol data unit (PDU) that requires multiple MAC frames.
In this case, the following scenario occurs. A station with a multiframe LLC PDU to transmit sends out the MAC frames one at a time. Each frame is acknowledged by the recipient after SIFS. When the source receives an ACK, it immediately (after SIFS) sends the next frame in the sequence. The result is that once a station has contended for the channel, it will maintain control of the channel until it has sent all of the fragments of an LLC PDU.

• Clear to Send (CTS): A station can ensure that its data frame will get through by first issuing a small Request to Send (RTS) frame. The station to which this frame is addressed should immediately respond with a CTS frame if it is ready to receive. All other stations receive the RTS and defer using the medium.

• Poll response: This is explained in the following discussion of PCF.

The next longest IFS interval is the PIFS. This is used by the centralized controller in issuing polls and takes precedence over normal contention traffic. However, those frames transmitted using SIFS have precedence over a PCF poll. Finally, the DIFS interval is used for all ordinary asynchronous traffic.

Point Coordination Function

PCF is an alternative access method implemented on top of the DCF. The operation consists of polling by the centralized polling master (point coordinator). The point coordinator makes use of PIFS when issuing polls. Because PIFS is smaller than DIFS, the point coordinator can seize the medium and lock out all asynchronous traffic while it issues polls and receives responses.

As an extreme, consider the following possible scenario. A wireless network is configured so that a number of stations with time-sensitive traffic are controlled by the point coordinator while remaining traffic contends for access using CSMA. The point coordinator could issue polls in a round-robin fashion to all stations configured for polling. When a poll is issued, the polled station may respond using SIFS.
If the point coordinator receives a response, it issues another poll using PIFS. If no response is received during the expected turnaround time, the coordinator issues a poll.

If the discipline of the preceding paragraph were implemented, the point coordinator would lock out all asynchronous traffic by repeatedly issuing polls. To prevent this, an interval known as the superframe is defined. During the first part of this interval, the point coordinator issues polls in a round-robin fashion to all stations configured for polling. The point coordinator then idles for the remainder of the superframe, allowing a contention period for asynchronous access.

Figure 17.7b illustrates the use of the superframe. At the beginning of a superframe, the point coordinator may optionally seize control and issue polls for a given period of time. This interval varies because of the variable frame size issued by responding stations. The remainder of the superframe is available for contention-based access. At the end of the superframe interval, the point coordinator contends for access to the medium using PIFS. If the medium is idle, the point coordinator gains immediate access and a full superframe period follows. However, the medium may be busy at the end of a superframe. In this case, the point coordinator must wait until the medium is idle to gain access; this results in a foreshortened superframe period for the next cycle.

MAC Frame

[Figure 17.8: IEEE 802.11 MAC Frame Format. Fields in order, with lengths in octets: Frame Control, FC (2); Duration/Connection ID, D/I (2); Address (6); Address (6); Address (6); Sequence Control, SC (2); Address (6); Frame Body (0 to 2312); CRC (4).]

Figure 17.8 shows the 802.11 frame format. This general format is used for all data and control frames, but not all fields are used in all contexts. The fields are as follows:

• Frame Control: Indicates the type of frame (control, management, or data) and provides control information.
Control information includes whether the frame is to or from a DS, fragmentation information, and privacy information.

• Duration/Connection ID: If used as a duration field, indicates the time (in microseconds) the channel will be allocated for successful transmission of a MAC frame. In some control frames, this field contains an association, or connection, identifier.

• Addresses: The number and meaning of the 48-bit address fields depend on context. The transmitter address and receiver address are the MAC addresses of stations joined to the BSS that are transmitting and receiving frames over the wireless LAN. The service set ID (SSID) identifies the wireless LAN over which a frame is transmitted. For an IBSS, the SSID is a random number generated at the time the network is formed. For a wireless LAN that is part of a larger configuration, the SSID identifies the BSS over which the frame is transmitted; specifically, the SSID is the MAC-level address of the AP for this BSS (Figure 17.4). Finally, the source address and destination address are the MAC addresses of stations, wireless or otherwise, that are the ultimate source and destination of this frame. The source address may be identical to the transmitter address, and the destination address may be identical to the receiver address.

• Sequence Control: Contains a 4-bit fragment number subfield, used for fragmentation and reassembly, and a 12-bit sequence number used to number frames sent between a given transmitter and receiver.

• Frame Body: Contains an MSDU or a fragment of an MSDU. The MSDU is an LLC protocol data unit or MAC control information.

• Frame Check Sequence: A 32-bit cyclic redundancy check.

We now look at the three MAC frame types.

Control Frames

Control frames assist in the reliable delivery of data frames. There are six control frame subtypes:

• Power Save-Poll (PS-Poll): This frame is sent by any station to the AP (access point).
Its purpose is to request that the AP transmit a frame that has been buffered for this station while the station was in power-saving mode.

• Request to Send (RTS): This is the first frame in the four-way frame exchange discussed under the subsection on reliable data delivery at the beginning of Section 17.4. The station sending this message is alerting a potential destination, and all other stations within reception range, that it intends to send a data frame to that destination.

• Clear to Send (CTS): This is the second frame in the four-way exchange. It is sent by the destination station to the source station to grant permission to send a data frame.

• Acknowledgment: Provides an acknowledgment from the destination to the source that the immediately preceding data, management, or PS-Poll frame was received correctly.

• Contention-Free (CF)-End: Announces the end of a contention-free period that is part of the point coordination function.

• CF-End + CF-Ack: Acknowledges the CF-End. This frame ends the contention-free period and releases stations from the restrictions associated with that period.

Data Frames

There are eight data frame subtypes, organized into two groups. The first four subtypes define frames that carry upper-level data from the source station to the destination station. The four data-carrying frames are as follows:

• Data: This is the simplest data frame. It may be used in both a contention period and a contention-free period.

• Data + CF-Ack: May only be sent during a contention-free period. In addition to carrying data, this frame acknowledges previously received data.

• Data + CF-Poll: Used by a point coordinator to deliver data to a mobile station and also to request that the mobile station send a data frame that it may have buffered.

• Data + CF-Ack + CF-Poll: Combines the functions of the Data + CF-Ack and Data + CF-Poll into a single frame.

The remaining four subtypes of data frames do not in fact carry any user data.
The Null Function data frame carries no data, polls, or acknowledgments. It is used only to carry the power management bit in the frame control field to the AP, to indicate that the station is changing to a low-power operating state. The remaining three frames (CF-Ack, CF-Poll, CF-Ack + CF-Poll) have the same functionality as the corresponding data frame subtypes in the preceding list (Data + CF-Ack, Data + CF-Poll, Data + CF-Ack + CF-Poll) but without the data.

Management Frames

Management frames are used to manage communications between stations and APs. Functions covered include management of associations (request, response, reassociation, disassociation, and authentication).

17.5 IEEE 802.11 PHYSICAL LAYER

The physical layer for IEEE 802.11 has been issued in four stages. The first part, simply called IEEE 802.11, includes the MAC layer and three physical layer specifications, two in the 2.4-GHz band (ISM) and one in the infrared, all operating at 1 and 2 Mbps. IEEE 802.11a operates in the 5-GHz band at data rates up to 54 Mbps. IEEE 802.11b operates in the 2.4-GHz band at 5.5 and 11 Mbps. IEEE 802.11g also operates in the 2.4-GHz band, at data rates up to 54 Mbps. Table 17.4 provides some details. We look at each of these in turn.

Original IEEE 802.11 Physical Layer

Three physical media are defined in the original 802.11 standard:

• Direct sequence spread spectrum (DSSS) operating in the 2.4-GHz ISM band, at data rates of 1 Mbps and 2 Mbps. In the United States, the FCC (Federal Communications Commission) requires no licensing for the use of this band. The number of channels available depends on the bandwidth allocated by the various national regulatory agencies. This ranges from 13 in most European countries to just one available channel in Japan.

• Frequency-hopping spread spectrum (FHSS) operating in the 2.4-GHz ISM band, at data rates of 1 Mbps and 2 Mbps.
The number of channels available ranges from 23 in Japan to 70 in the United States.

Table 17.4 IEEE 802.11 Physical Layer Standards

                              802.11           802.11a             802.11b          802.11g
Available bandwidth           83.5 MHz         300 MHz             83.5 MHz         83.5 MHz
Unlicensed frequency          2.4-2.4835 GHz   5.15-5.35 GHz       2.4-2.4835 GHz   2.4-2.4835 GHz
  of operation                DSSS, FHSS       OFDM                DSSS             DSSS, OFDM
                                               5.725-5.825 GHz
                                               OFDM
Number of nonoverlapping      3 (indoor/       4 indoor            3 (indoor/       3 (indoor/
  channels                    outdoor)         4 (indoor/outdoor)  outdoor)         outdoor)
                                               4 outdoor
Data rate per channel         1, 2 Mbps        6, 9, 12, 18, 24,   1, 2, 5.5,       1, 2, 5.5, 6, 9, 11, 12,
                                               36, 48, 54 Mbps     11 Mbps          18, 24, 36, 48, 54 Mbps
Compatibility                 802.11           Wi-Fi5              Wi-Fi            Wi-Fi at 11 Mbps
                                                                                    and below

• Infrared at 1 Mbps and 2 Mbps, operating at a wavelength between 850 and 950 nm.

Direct Sequence Spread Spectrum

Up to three nonoverlapping channels, each with a data rate of 1 Mbps or 2 Mbps, can be used in the DSSS scheme. Each channel has a bandwidth of 5 MHz. The encoding scheme that is used is DBPSK (differential binary phase shift keying) for the 1-Mbps rate and DQPSK for the 2-Mbps rate.

Recall from Chapter 9 that a DSSS system makes use of a chipping code, or pseudonoise sequence, to spread the data rate and hence the bandwidth of the signal. For IEEE 802.11, a Barker sequence is used. A Barker sequence is a binary {-1, +1} sequence {s(t)} of length n with the property that its autocorrelation values R(τ) satisfy |R(τ)| ≤ 1 for all 1 ≤ |τ| ≤ (n - 1), where the autocorrelation is defined by¹

    R(τ) = Σ (k = 1 to n - τ) B_k B_(k+τ)

and the B_k are the bits of the sequence. Further, the Barker property is preserved under the following transformations:

    s(t) → (-1)^t s(t)        s(t) → -s(t)        s(t) → -s(n - 1 - t)

as well as under compositions of these transformations.
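The defining autocorrelation property is easy to verify numerically. A short Python sketch using the 11-chip sequence that IEEE 802.11 DSSS employs (the sequence itself appears in this section; the helper function is illustrative):

```python
def autocorr(seq, tau):
    """Aperiodic autocorrelation of a +/-1 sequence at shift tau."""
    return sum(seq[k] * seq[k + tau] for k in range(len(seq) - tau))

# 11-chip Barker sequence used by IEEE 802.11 DSSS (the mapping for binary 1)
b11 = [+1, -1, +1, +1, -1, +1, +1, +1, -1, -1, -1]

print(autocorr(b11, 0))                          # peak: 11
print([autocorr(b11, t) for t in range(1, 11)])  # sidelobes: [0, -1, 0, -1, 0, -1, 0, -1, 0, -1]
```

The sharp peak at zero shift and the sidelobes bounded by 1 in magnitude are exactly what make the sequence robust against multipath: delayed copies of the signal correlate only weakly with the receiver's reference.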
Only the following Barker sequences are known:

    n = 2:   + -
    n = 3:   + + -
    n = 4:   + + - +
    n = 5:   + + + - +
    n = 7:   + + + - - + -
    n = 11:  + - + + - + + + - - -
    n = 13:  + + + + + - - + + - + - +

IEEE 802.11 DSSS uses the 11-chip Barker sequence. Each data binary 1 is mapped into the sequence {+ - + + - + + + - - -}, and each binary 0 is mapped into the sequence {- + - - + - - - + + +}. Important characteristics of Barker sequences are their robustness against interference and their insensitivity to multipath propagation.

Frequency-Hopping Spread Spectrum

Recall from Chapter 9 that an FHSS system makes use of multiple channels, with the signal hopping from one channel to another based on a pseudonoise sequence. In the case of the IEEE 802.11 scheme, 1-MHz channels are used. The details of the hopping scheme are adjustable. For example, the minimum hop rate for the United States is 2.5 hops per second. The minimum hop distance in frequency is 6 MHz in North America and most of Europe and 5 MHz in Japan.

¹ See Appendix J for a discussion of correlation and orthogonality.

For modulation, the FHSS scheme uses two-level Gaussian FSK for the 1-Mbps system. The bits zero and one are encoded as deviations from the current carrier frequency. For 2 Mbps, a four-level GFSK scheme is used, in which four different deviations from the center frequency define the four 2-bit combinations.

Infrared

The IEEE 802.11 infrared scheme is omnidirectional rather than point to point. A range of up to 20 m is possible. The modulation scheme for the 1-Mbps data rate is known as 16-PPM (pulse position modulation). In pulse position modulation (PPM), the input value determines the position of a narrow pulse relative to the clocking time. The advantage of PPM is that it reduces the output power required of the infrared source. For 16-PPM, each group of 4 data bits is mapped into one of the 16-PPM symbols; each symbol is a string of 16 pulse positions.
Each 16-pulse string consists of fifteen 0s and one binary 1. For the 2-Mbps data rate, each group of 2 data bits is mapped into one of four 4-pulse-position sequences. Each sequence consists of three 0s and one binary 1. The actual transmission uses an intensity modulation scheme, in which the presence of a signal corresponds to a binary 1 and the absence of a signal corresponds to binary 0.

IEEE 802.11a

Channel Structure IEEE 802.11a makes use of the frequency band called the Unlicensed National Information Infrastructure (UNII), which is divided into three parts. The UNII-1 band (5.15 to 5.25 GHz) is intended for indoor use; the UNII-2 band (5.25 to 5.35 GHz) can be used either indoors or outdoors; and the UNII-3 band (5.725 to 5.825 GHz) is for outdoor use.

IEEE 802.11a has several advantages over IEEE 802.11b/g:

• IEEE 802.11a utilizes more available bandwidth than 802.11b/g. Each UNII band provides four nonoverlapping channels, for a total of 12 across the allocated spectrum.

• IEEE 802.11a provides much higher data rates than 802.11b and the same maximum data rate as 802.11g.

• IEEE 802.11a uses a different, relatively uncluttered frequency spectrum (5 GHz).

Coding and Modulation Unlike the 2.4-GHz specifications, IEEE 802.11a does not use a spread spectrum scheme but rather uses orthogonal frequency division multiplexing (OFDM). Recall from Section 11.2 that OFDM, also called multicarrier modulation, uses multiple carrier signals at different frequencies, sending some of the bits on each channel. This is similar to FDM. However, in the case of OFDM, all of the subchannels are dedicated to a single data source. To complement OFDM, the specification supports the use of a variety of modulation and coding alternatives. The system uses up to 48 subcarriers that are modulated using BPSK, QPSK, 16-QAM, or 64-QAM.
Subcarrier frequency spacing is 0.3125 MHz, and each subcarrier transmits at a rate of 250 kbaud. A convolutional code at a rate of 1/2, 2/3, or 3/4 provides forward error correction. The combination of modulation technique and coding rate determines the data rate.

[Figure 17.9: IEEE 802 Physical-Level Protocol Data Units. (a) IEEE 802.11a physical PDU: PLCP preamble (12 OFDM symbols); Signal field (1 OFDM symbol, coded OFDM with BPSK at r = 1/2), containing Rate (4 bits), r (1 bit), Length (12 bits), P (1 bit), and Tail (6 bits) subfields; and Data field (a variable number of OFDM symbols at the rate indicated in the Signal field), containing Service (16 bits), MPDU, Tail (6 bits), and Pad subfields. (b) IEEE 802.11b physical PDU: PLCP Preamble (Sync, 56 bits; SFD, 16 bits) at 1 Mbps using DBPSK; PLCP Header (Signal, 8 bits; Service, 8 bits; Length, 16 bits; CRC, 16 bits) at 2 Mbps using DQPSK; and MPDU at 2, 5.5, or 11 Mbps.]

Physical-Layer Frame Structure The primary purpose of the physical layer is to transmit medium access control (MAC) protocol data units (MPDUs) as directed by the 802.11 MAC layer. The PLCP sublayer provides the framing and signaling bits needed for the OFDM transmission, and the PMD sublayer performs the actual encoding and transmission operation.

Figure 17.9a illustrates the physical layer frame format. The PLCP Preamble field enables the receiver to acquire an incoming OFDM signal and synchronize the demodulator. Next is the Signal field, which consists of 24 bits encoded as a single OFDM symbol. The Preamble and Signal fields are transmitted at 6 Mbps using BPSK.
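The 802.11a rates quoted earlier follow directly from these parameters: data rate = 48 subcarriers × 250 ksymbols/s × coded bits per subcarrier × code rate. A quick Python sketch of the arithmetic (the pairing of modulations with code rates is assumed from the 802.11a rate set, not stated in the text above):

```python
# Coded bits carried per subcarrier symbol by each modulation
MOD_BITS = {"BPSK": 1, "QPSK": 2, "16-QAM": 4, "64-QAM": 6}

def ofdm_rate_mbps(modulation: str, code_rate: float) -> float:
    """48 data subcarriers at 250 kbaud each, scaled by modulation and FEC rate."""
    return 48 * 0.25 * MOD_BITS[modulation] * code_rate  # 0.25 Mbaud per subcarrier

# The eight rate steps, as modulation/code-rate pairs (assumed assignments):
pairs = [("BPSK", 1/2), ("BPSK", 3/4), ("QPSK", 1/2), ("QPSK", 3/4),
         ("16-QAM", 1/2), ("16-QAM", 3/4), ("64-QAM", 2/3), ("64-QAM", 3/4)]
print([ofdm_rate_mbps(m, r) for m, r in pairs])
# -> [6.0, 9.0, 12.0, 18.0, 24.0, 36.0, 48.0, 54.0]
```

This reproduces exactly the 6, 9, 12, 18, 24, 36, 48, and 54 Mbps rates listed for IEEE 802.11a.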
The Signal field consists of the following subfields:

• Rate: Specifies the data rate at which the data field portion of the frame is transmitted.

• r: Reserved for future use.

• Length: Number of octets in the MAC PDU.

• P: An even parity bit for the 17 bits in the Rate, r, and Length subfields.

• Tail: Consists of 6 zero bits appended to the symbol to bring the convolutional encoder to zero state.

The Data field consists of a variable number of OFDM symbols transmitted at the data rate specified in the Rate subfield. Prior to transmission, all of the bits of the Data field are scrambled (see Appendix 16C for a discussion of scrambling). The Data field consists of four subfields:

• Service: Consists of 16 bits, with the first 7 bits set to zeros to synchronize the descrambler in the receiver, and the remaining 9 bits (all zeros) reserved for future use.

• MAC PDU: Handed down from the MAC layer. The format is shown in Figure 17.8.

• Tail: Produced by replacing the six scrambled bits following the MPDU end with 6 bits of all zeros; used to reinitialize the convolutional encoder.

• Pad: The number of bits required to make the Data field a multiple of the number of bits in an OFDM symbol (48, 96, 192, or 288).

IEEE 802.11b

IEEE 802.11b is an extension of the IEEE 802.11 DSSS scheme, providing data rates of 5.5 and 11 Mbps in the ISM band. The chipping rate is 11 MHz, which is the same as the original DSSS scheme, thus providing the same occupied bandwidth. To achieve a higher data rate in the same bandwidth at the same chipping rate, a modulation scheme known as complementary code keying (CCK) is used.

The CCK modulation scheme is quite complex and is not examined in detail here. Figure 17.10 provides an overview of the scheme for the 11-Mbps rate. Input data are treated in blocks of 8 bits at a rate of 1.375 MHz (8 bits/symbol × 1.375 MHz = 11 Mbps).
Six of these bits are mapped into one of 64 code sequences derived from a 64 × 64 matrix known as the Walsh matrix (discussed in [STAL05]). The output of the mapping, plus the two additional bits, forms the input to a QPSK modulator.

[Figure 17.10: 11-Mbps CCK Modulation Scheme. Data input is demultiplexed 1:8 at 1.375 MHz; six bits pick one of 64 complex codes and, together with differential modulation of the remaining two bits, produce the I and Q outputs at 11 MHz.]

An optional alternative to CCK is known as packet binary convolutional coding (PBCC). PBCC provides for potentially more efficient transmission at the cost of increased computation at the receiver. PBCC was incorporated into 802.11b in anticipation of its need for higher data rates for future enhancements to the standard.

Physical-Layer Frame Structure IEEE 802.11b defines two physical-layer frame formats, which differ only in the length of the preamble. The long preamble of 144 bits is the same as used in the original 802.11 DSSS scheme and allows interoperability with other legacy systems. The short preamble of 72 bits provides improved throughput efficiency.

Figure 17.9b illustrates the physical layer frame format with the short preamble. The PLCP Preamble field enables the receiver to acquire an incoming signal and synchronize the demodulator. It consists of two subfields: a 56-bit Sync field for synchronization and a 16-bit start-of-frame delimiter (SFD). The preamble is transmitted at 1 Mbps using differential BPSK and Barker code spreading.

Following the preamble is the PLCP Header, which is transmitted at 2 Mbps using DQPSK. It consists of the following subfields:

• Signal: Specifies the data rate at which the MPDU portion of the frame is transmitted.

• Service: Only 3 bits of this 8-bit field are used in 802.11b. One bit indicates whether the transmit frequency and symbol clocks use the same local oscillator. Another bit indicates whether CCK or PBCC encoding is used. A third bit acts as an extension to the Length subfield.
• Length: Indicates the length of the MPDU field by specifying the number of microseconds necessary to transmit the MPDU. Given the data rate, the length of the MPDU in octets can be calculated. For any data rate over 8 Mbps, the length extension bit from the Service field is needed to resolve a rounding ambiguity.

• CRC: A 16-bit error detection code used to protect the Signal, Service, and Length fields.

The MPDU field consists of a variable number of bits transmitted at the data rate specified in the Signal subfield. Prior to transmission, all of the bits of the physical layer PDU are scrambled (see Appendix 16C for a discussion of scrambling).

IEEE 802.11g

IEEE 802.11g extends 802.11b to data rates above 20 Mbps, up to 54 Mbps. Like 802.11b, 802.11g operates in the 2.4-GHz range, and thus the two are compatible. The standard is designed so that 802.11b devices will work when connected to an 802.11g AP, and 802.11g devices will work when connected to an 802.11b AP, in both cases using the lower 802.11b data rate.

Table 17.5 Estimated Distance (m) Versus Data Rate

    Data Rate (Mbps)    802.11b    802.11a    802.11g
    1                   90+        —          90+
    2                   75         —          75
    5.5(b)/6(a/g)       60         60+        65
    9                   —          50         55
    11(b)/12(a/g)       50         45         50
    18                  —          40         50
    24                  —          30         45
    36                  —          25         35
    48                  —          15         25
    54                  —          10         20

IEEE 802.11g offers a wider array of data rate and modulation scheme options. IEEE 802.11g provides compatibility with 802.11 and 802.11b by specifying the same modulation and framing schemes as these standards for 1, 2, 5.5, and 11 Mbps. At data rates of 6, 9, 12, 18, 24, 36, 48, and 54 Mbps, 802.11g adopts the 802.11a OFDM scheme, adapted for the 2.4-GHz band; this is referred to as ERP-OFDM, with ERP standing for extended rate physical layer. In addition, an ERP-PBCC scheme is used to provide data rates of 22 and 33 Mbps.

The IEEE 802.11 standards do not include a specification of speed versus distance objectives.
Different vendors will give different values, depending on environment. Table 17.5, based on [LAYL04], gives estimated values for a typical office environment.

17.6 IEEE 802.11 SECURITY CONSIDERATIONS

There are two characteristics of a wired LAN that are not inherent in a wireless LAN.

1. In order to transmit over a wired LAN, a station must be physically connected to the LAN. On the other hand, with a wireless LAN, any station within radio range of the other devices on the LAN can transmit. In a sense, there is a form of authentication with a wired LAN, in that it requires some positive and presumably observable action to connect a station to a wired LAN.

2. Similarly, in order to receive a transmission from a station that is part of a wired LAN, the receiving station must also be attached to the wired LAN. On the other hand, with a wireless LAN, any station within radio range can receive. Thus, a wired LAN provides a degree of privacy, limiting reception of data to stations connected to the LAN.

Access and Privacy Services

IEEE 802.11 defines three services that provide a wireless LAN with these two features:

• Authentication: Used to establish the identity of stations to each other. In a wired LAN, it is generally assumed that access to a physical connection conveys authority to connect to the LAN. This is not a valid assumption for a wireless LAN, in which connectivity is achieved simply by having an attached antenna that is properly tuned. The authentication service is used by stations to establish their identity with stations they wish to communicate with. IEEE 802.11 supports several authentication schemes and allows for expansion of the functionality of these schemes. The standard does not mandate any particular authentication scheme, which could range from relatively insecure handshaking to public-key encryption schemes.
However, IEEE 802.11 requires mutually acceptable, successful authentication before a station can establish an association with an AP.

• Deauthentication: This service is invoked whenever an existing authentication is to be terminated.

• Privacy: Used to prevent the contents of messages from being read by other than the intended recipient. The standard provides for the optional use of encryption to assure privacy.

Wireless LAN Security Standards

The original 802.11 specification included a set of security features for privacy and authentication that, unfortunately, were quite weak. For privacy, 802.11 defined the Wired Equivalent Privacy (WEP) algorithm. The privacy portion of the 802.11 standard contained major weaknesses. Subsequent to the development of WEP, the 802.11i task group has developed a set of capabilities to address the WLAN security issues. In order to accelerate the introduction of strong security into WLANs, the Wi-Fi Alliance promulgated Wi-Fi Protected Access (WPA) as a Wi-Fi standard. WPA is a set of security mechanisms that eliminates most 802.11 security issues and was based on the current state of the 802.11i standard. As 802.11i evolves, WPA will evolve to maintain compatibility. WPA is examined in Chapter 21.

17.7 RECOMMENDED READING AND WEB SITES

[PAHL95] and [BANT94] are detailed survey articles on wireless LANs. [KAHN97] provides good coverage of infrared LANs. [ROSH04] provides a good up-to-date technical treatment of IEEE 802.11. Another useful book is [BING02]. [OHAR99] is an excellent technical treatment of IEEE 802.11. Another good treatment is [LARO02]. [CROW97] is a good survey article on the 802.11 standards but does not cover IEEE 802.11a and IEEE 802.11b. A brief but useful survey of 802.11 is [MCFA03]. [GEIE01] has a good discussion of IEEE 802.11a. [PETR00] summarizes IEEE 802.11b. [SHOE02] provides an overview of IEEE 802.11g. [XIAO04] discusses 802.11e.
BANT94 Bantz, D., and Bauchot, F. "Wireless LAN Design Alternatives." IEEE Network, March/April 1994.
BING02 Bing, B. Wireless Local Area Networks. New York: Wiley, 2002.
CROW97 Crow, B., et al. "IEEE 802.11 Wireless Local Area Networks." IEEE Communications Magazine, September 1997.
GEIE01 Geier, J. "Enabling Fast Wireless Networks with OFDM." Communications System Design, February 2001. (www.csdmag.com)
KAHN97 Kahn, J., and Barry, J. "Wireless Infrared Communications." Proceedings of the IEEE, February 1997.
LARO02 LaRocca, J., and LaRocca, R. 802.11 Demystified. New York: McGraw-Hill, 2002.
MCFA03 McFarland, B., and Wong, M. "The Family Dynamics of 802.11." ACM Queue, May 2003.
OHAR99 O'Hara, B., and Petrick, A. IEEE 802.11 Handbook: A Designer's Companion. New York: IEEE Press, 1999.
PAHL95 Pahlavan, K.; Probert, T.; and Chase, M. "Trends in Local Wireless Networks." IEEE Communications Magazine, March 1995.
PETR00 Petrick, A. "IEEE 802.11b—Wireless Ethernet." Communications System Design, June 2000. (www.commsdesign.com)
ROSH04 Roshan, P., and Leary, J. 802.11 Wireless LAN Fundamentals. Indianapolis: Cisco Press, 2004.
SHOE02 Shoemake, M. "IEEE 802.11g Jells as Applications Mount." Communications System Design, April 2002. (www.commsdesign.com)
XIAO04 Xiao, Y. "IEEE 802.11e: QoS Provisioning at the MAC Layer." IEEE Communications Magazine, June 2004.

Recommended Web sites:

• Wireless LAN Association: Gives an introduction to the technology, including a discussion of implementation considerations and case studies from users. Links to related sites.
• The IEEE 802.11 Wireless LAN Working Group: Contains working group documents plus discussion archives.
• Wi-Fi Alliance: An industry group promoting the interoperability of 802.11 products with each other and with Ethernet.
17.8 KEY TERMS, REVIEW QUESTIONS, AND PROBLEMS

Key Terms

access point (AP)
ad hoc networking
Barker sequence
basic service set (BSS)
complementary code keying (CCK)
coordination function
distributed coordination function (DCF)
distribution system (DS)
extended service set (ESS)
infrared LAN
LAN extension
narrowband microwave LAN
nomadic access
point coordination function (PCF)
spread spectrum LAN
wireless LAN

Review Questions

17.1 List and briefly define four application areas for wireless LANs.
17.2 List and briefly define key requirements for wireless LANs.
17.3 What is the difference between a single-cell and a multiple-cell wireless LAN?
17.4 What are some key advantages of infrared LANs?
17.5 What are some key disadvantages of infrared LANs?
17.6 List and briefly define three transmission techniques for infrared LANs.
17.7 What is the difference between an access point and a portal?
17.8 Is a distribution system a wireless network?
17.9 List and briefly define IEEE 802.11 services.
17.10 How is the concept of an association related to that of mobility?

Problems

17.1 Consider the sequence of actions within a BSS depicted in Figure 17.11. Draw a timeline, beginning with a period during which the medium is busy and ending with a period in which the CF-End is broadcast from the AP. Show the transmission periods and the gaps.

[Figure 17.11 (Configuration for Problem 17.1) shows an AP exchanging frames with Subscribers A, B, and C; the labeled frames are: 1. Beacon; 2. Data + CF-Poll; 3. Data; 4. Data + CF-Ack + CF-Poll; 5. Data + CF-ACK; 6. CF-Poll; 7. Data (Null); 8. CF-End.]

17.2 Find the autocorrelation for the 11-bit Barker sequence as a function of τ.

17.3 a. For the 16-PPM scheme used for the 1-Mbps IEEE 802.11 infrared standard,
a1. What is the period of transmission (time between bits)?
For the corresponding infrared pulse transmission,
a2.
What is the average time between pulses (1 values) and the corresponding average rate of pulse transmission?
a3. What is the minimum time between adjacent pulses?
a4. What is the maximum time between pulses?
b. Repeat (a) for the 4-PPM scheme used for the 2-Mbps infrared standard.

17.4 For IEEE 802.11a, show how the modulation technique and coding rate determine the data rate.

17.5 The 802.11a and 802.11b physical layers make use of data scrambling (see Appendix 16C). For 802.11, the scrambling equation is

P(X) = 1 + X^4 + X^7

In this case the shift register consists of seven elements, used in the same manner as the five-element register in Figure 16.17. For the 802.11 scrambler and descrambler,
a. Show the expression with exclusive-or operators that corresponds to the polynomial definition.
b. Draw a figure similar to Figure 16.17.

PART FIVE
Internet and Transport Protocols

We have dealt, so far, with the technologies and techniques used to exchange data between two devices. Part Two dealt with the case in which the two devices share a transmission link. Parts Three and Four were concerned with the case in which a communication network provides a shared transmission capacity for multiple attached end systems. In a distributed data processing system, much more is needed. The data processing systems (workstations, PCs, servers, mainframes) must implement a set of functions that will allow them to perform some task cooperatively. This set of functions is organized into a communications architecture and involves a layered set of protocols, including internetwork, transport, and application-layer protocols. In Part Five, we examine the internetwork and transport protocols. Before proceeding with Part Five, the reader is advised to revisit Chapter 2, which introduces the concept of a protocol architecture and discusses the key elements of a protocol.
ROAD MAP FOR PART FIVE

Chapter 18 Internet Protocols

With the proliferation of networks, internetworking facilities have become essential components of network design. Chapter 18 begins with an examination of the requirements for an internetworking facility and the various design approaches that can be taken to satisfy those requirements. The remainder of the chapter deals with the use of routers for internetworking. The Internet Protocol (IP) and the new IPv6 are examined.

Chapter 19 Internetwork Operation

Chapter 19 begins with a discussion of multicasting across an internet. Then issues of routing and quality of service are explored. The traffic that the Internet and these private internetworks must carry continues to grow and change. The demand generated by traditional data-based applications, such as electronic mail, Usenet news, file transfer, and remote logon, is sufficient to challenge these systems. But the driving factors are the heavy use of the World Wide Web, which demands real-time response, and the increasing use of voice, image, and even video over internetwork architectures. These internetwork schemes are essentially datagram packet-switching technology with routers functioning as the switches. This technology was not designed to handle voice and video and is straining to meet the demands placed on it. While some foresee the replacement of this conglomeration of Ethernet-based LANs, packet-based WANs, and IP-datagram-based routers with a seamless ATM transport service from desktop to backbone, that day is far off. Meanwhile, the internetworking and routing functions of these networks must be engineered to meet the load. Chapter 19 looks at some of the tools and techniques designed to meet the new demand, beginning with a discussion of routing schemes, which can help smooth out load surges. The remainder of the chapter looks at recent efforts to provide a given level of quality of service (QoS) to various applications.
The most important elements of this new approach are integrated services and differentiated services.

Chapter 20 Transport Protocols

The transport protocol is the keystone of the whole concept of a computer communications architecture. It can also be one of the most complex of protocols. Chapter 20 examines in detail transport protocol mechanisms and then discusses two important examples, TCP and UDP. The bulk of the chapter is devoted to an analysis of the complex set of TCP mechanisms and of TCP congestion control schemes.

CHAPTER 18 INTERNET PROTOCOLS

18.1 Basic Protocol Functions
18.2 Principles of Internetworking
18.3 Internet Protocol Operation
18.4 Internet Protocol
18.5 IPv6
18.6 Virtual Private Networks and IP Security
18.7 Recommended Reading and Web Sites
18.8 Key Terms, Review Questions, and Problems

The map of the London Underground, which can be seen inside every train, has been called a model of its kind, a work of art. It presents the underground network as a geometric grid. The tube lines do not, of course, lie at right angles to one another like the streets of Manhattan. Nor do they branch off at acute angles or form perfect oblongs.

—King Solomon's Carpet, Barbara Vine (Ruth Rendell)

KEY POINTS

• Key functions typically performed by a protocol include encapsulation, fragmentation and reassembly, connection control, ordered delivery, flow control, error control, addressing, and multiplexing.
• An internet consists of multiple separate networks that are interconnected by routers. Data are transmitted in packets from a source system to a destination across a path involving multiple networks and routers. Typically, a connectionless or datagram operation is used.
• A router accepts datagrams and relays them on toward their destination and is responsible for determining the route, much the same way as packet-switching nodes operate.
• The most widely used protocol for internetworking is the Internet Protocol (IP).
IP attaches a header to upper-layer (e.g., TCP) data to form an IP datagram. The header includes source and destination addresses, information used for fragmentation and reassembly, a time-to-live field, a type-of-service field, and a checksum.
• A next-generation IP, known as IPv6, has been defined. IPv6 provides longer address fields and more functionality than the current IP.

The purpose of this chapter is to examine the Internet Protocol, which is the foundation on which all of the internet-based protocols and internetworking are based. First, it will be useful to review the basic functions of networking protocols. This review serves to summarize some of the material introduced previously and to set the stage for the study of internet-based protocols in Parts Five and Six. We then move to a discussion of internetworking. Next, the chapter focuses on the two standard internet protocols: IPv4 and IPv6. Finally, the topic of IP security is introduced. Refer to Figure 2.5 to see the position within the TCP/IP suite of the protocols discussed in this chapter.

18.1 BASIC PROTOCOL FUNCTIONS

Before turning to a discussion of internet protocols, let us consider a rather small set of functions that form the basis of all protocols. Not all protocols have all functions; this would involve a significant duplication of effort. There are, nevertheless, many instances of the same type of function being present in protocols at different levels. We can group protocol functions into the following categories:

• Encapsulation
• Fragmentation and reassembly
• Connection control
• Ordered delivery
• Flow control
• Error control
• Addressing
• Multiplexing
• Transmission services

Encapsulation

For virtually all protocols, data are transferred in blocks, called protocol data units (PDUs). Each PDU contains not only data but also control information. Indeed, some PDUs consist solely of control information and no data.
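The encapsulation idea can be illustrated with a short sketch (Python; the header layout, field sizes, and checksum are invented for illustration, not taken from any particular protocol):

```python
import struct

def encapsulate(src: int, dst: int, payload: bytes) -> bytes:
    """Form a PDU: control information (header) prepended to the data."""
    checksum = sum(payload) % 65536                  # toy error-detecting code
    header = struct.pack("!HHHH", src, dst, len(payload), checksum)
    return header + payload                          # header + data = PDU

def decapsulate(pdu: bytes):
    """Strip the header and verify the error-detecting code."""
    src, dst, length, checksum = struct.unpack("!HHHH", pdu[:8])
    payload = pdu[8:8 + length]
    if sum(payload) % 65536 != checksum:
        raise ValueError("error detected; discard the PDU")
    return src, dst, payload
```

A PDU built with `encapsulate(1, 2, b"data")` carries sender and receiver addresses and an error-detecting code alongside the data itself, the kinds of control information a real header contains.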
The control information falls into three general categories:

• Address: The address of the sender and/or receiver may be indicated.
• Error-detecting code: Some sort of frame check sequence is often included for error detection.
• Protocol control: Additional information is included to implement the protocol functions listed in the remainder of this section.

The addition of control information to data is referred to as encapsulation. Data are accepted or generated by an entity and encapsulated into a PDU containing that data plus control information. Typically, the control information is contained in a PDU header; some data link layer PDUs include a trailer as well. Numerous examples of PDUs appear in the preceding chapters [e.g., TFTP (Figure 2.13), HDLC (Figure 7.7), frame relay (Figure 10.16), ATM (Figure 11.4), LLC (Figure 15.7), IEEE 802.3 (Figure 16.3), IEEE 802.11 (Figure 17.8)].

Fragmentation and Reassembly¹

A protocol is concerned with exchanging data between two entities. Usually, the transfer can be characterized as consisting of a sequence of PDUs of some bounded size. Whether the application entity sends data in messages or in a continuous stream, lower-level protocols typically organize the data into blocks. Further, a protocol may need to divide a block received from a higher layer into multiple blocks of some smaller bounded size. This process is called fragmentation. There are a number of motivations for fragmentation, depending on the context. Among the typical reasons for fragmentation are the following:

• The communications network may only accept blocks of data up to a certain size. For example, an ATM network is limited to blocks of 53 octets; Ethernet imposes a maximum size of 1526 octets.

¹The term segmentation is used in OSI-related documents, but in protocol specifications related to the TCP/IP protocol suite, the term fragmentation is used. The meaning is the same.
• Error control may be more efficient with a smaller PDU size. With smaller PDUs, fewer bits need to be retransmitted when a PDU suffers an error.
• More equitable access to shared transmission facilities, with shorter delay, can be provided. For example, without a maximum block size, one station could monopolize a multipoint medium.
• A smaller PDU size may mean that receiving entities can allocate smaller buffers.
• An entity may require that data transfer comes to some sort of "closure" from time to time, for checkpoint and restart/recovery operations.

There are several disadvantages to fragmentation that argue for making PDUs as large as possible:

• Because each PDU contains a certain amount of control information, smaller blocks have a greater percentage of overhead.
• PDU arrival may generate an interrupt that must be serviced. Smaller blocks result in more interrupts.
• More time is spent processing smaller, more numerous PDUs.

All of these factors must be taken into account by the protocol designer in determining minimum and maximum PDU size. The counterpart of fragmentation is reassembly. Eventually, the segmented data must be reassembled into messages appropriate to the application level. If PDUs arrive out of order, the task is complicated.

Connection Control

An entity may transmit data to another entity in such a way that each PDU is treated independently of all prior PDUs. This is known as connectionless data transfer; an example is the use of the datagram, described in Chapter 10. While this mode is useful, an equally important technique is connection-oriented data transfer, of which the virtual circuit, also described in Chapter 10, is an example. Connection-oriented data transfer is preferred (even required) if stations anticipate a lengthy exchange of data and/or certain details of their protocol must be worked out dynamically. A logical association, or connection, is established between the entities.
Three phases occur (Figure 18.1):

• Connection establishment
• Data transfer
• Connection termination

[Figure 18.1, The Phases of a Connection-Oriented Data Transfer, shows two protocol entities exchanging, over time: connection request, connection accept, data PDUs with an acknowledgment (multiple exchanges), terminate-connection request, and terminate-connection accept.]

With more sophisticated protocols, there may also be connection interrupt and recovery phases to cope with errors and other sorts of interruptions. During the connection establishment phase, two entities agree to exchange data. Typically, one station will issue a connection request (in connectionless fashion) to the other. A central authority may or may not be involved. In simpler protocols, the receiving entity either accepts or rejects the request and, in the former case, the connection is considered to be established. In more complex proposals, this phase includes a negotiation concerning the syntax, semantics, and timing of the protocol. Both entities must, of course, be using the same protocol. But the protocol may allow certain optional features and these must be agreed upon by means of negotiation. For example, the protocol may specify a PDU size of up to 8000 octets; one station may wish to restrict this to 1000 octets.

Following connection establishment, the data transfer phase is entered. During this phase both data and control information (e.g., flow control, error control) are exchanged. Figure 18.1 shows a situation in which all of the data flow in one direction, with acknowledgments returned in the other direction. More typically, data and acknowledgments flow in both directions. Finally, one side or the other wishes to terminate the connection and does so by sending a termination request. Alternatively, a central authority might forcibly terminate a connection.
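As a rough sketch, the responder's view of these three phases can be written as a small state machine (Python; the state and PDU names here are illustrative only, not from any standard):

```python
class ProtocolEntity:
    """Toy responder for the three phases: establishment, transfer, termination."""

    # Legal (current state, incoming PDU) -> next state transitions.
    TRANSITIONS = {
        ("CLOSED", "connection-request"): "ESTABLISHED",
        ("ESTABLISHED", "data"): "ESTABLISHED",        # data transfer phase
        ("ESTABLISHED", "terminate-request"): "CLOSED",
    }

    def __init__(self):
        self.state = "CLOSED"

    def receive(self, pdu: str) -> str:
        key = (self.state, pdu)
        if key not in self.TRANSITIONS:
            # e.g., a data PDU arriving before any connection is established
            raise ValueError(f"protocol error: {pdu!r} in state {self.state}")
        self.state = self.TRANSITIONS[key]
        return self.state
```

A real protocol would add the negotiation step described above (e.g., agreeing to restrict the PDU size from 8000 to 1000 octets) as well as connection interrupt and recovery states.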
A key characteristic of many connection-oriented data transfer protocols is that sequencing is used (e.g., HDLC, IEEE 802.11). Each side sequentially numbers the PDUs that it sends to the other side. Because each side remembers that it is engaged in a logical connection, it can keep track of both outgoing numbers, which it generates, and incoming numbers, which are generated by the other side. Indeed, one can essentially define a connection-oriented data transfer as a transfer in which both sides number PDUs and keep track of both incoming and outgoing numbers. Sequencing supports three main functions: ordered delivery, flow control, and error control.

Sequencing is not found in all connection-oriented protocols. Examples include frame relay and ATM. However, all connection-oriented protocols include in the PDU format some way of identifying the connection, which may be a unique connection identifier or a combination of source and destination addresses.

Ordered Delivery

If two communicating entities are in different hosts² connected by a network, there is a risk that PDUs will not arrive in the order in which they were sent, because they may traverse different paths through the network. In connection-oriented protocols, it is generally required that PDU order be maintained. For example, if a file is transferred between two systems, we would like to be assured that the records of the received file are in the same order as those of the transmitted file, and not shuffled. If each PDU is given a unique number, and numbers are assigned sequentially, then it is a logically simple task for the receiving entity to reorder received PDUs on the basis of sequence number. A problem with this scheme is that, with a finite sequence number field, sequence numbers repeat (modulo some maximum number). Evidently, the maximum sequence number must be greater than the maximum number of PDUs that could be outstanding at any time.
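Reordering on the basis of sequence numbers can be sketched as follows (Python; a 3-bit sequence number field is assumed purely for illustration):

```python
MOD = 8   # 3-bit sequence number field: numbers repeat modulo 8

class Reorderer:
    """Receiver side: release PDUs to the application in sequence order."""

    def __init__(self):
        self.expected = 0     # next in-order sequence number
        self.held = {}        # out-of-order PDUs, keyed by sequence number

    def receive(self, seq: int, pdu: str) -> list:
        # Hold the PDU, then release the longest in-order run now available.
        # For the numbering to stay unambiguous, the sender must keep the
        # number of outstanding PDUs well below MOD (see the text).
        self.held[seq] = pdu
        released = []
        while self.expected in self.held:
            released.append(self.held.pop(self.expected))
            self.expected = (self.expected + 1) % MOD
        return released
```

If PDU 1 arrives before PDU 0, `receive(1, ...)` releases nothing; the later `receive(0, ...)` then releases both PDUs in order.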
In fact, the maximum number may need to be twice the maximum number of PDUs that could be outstanding (e.g., selective-repeat ARQ; see Chapter 7).

Flow Control

Flow control is a function performed by a receiving entity to limit the amount or rate of data that is sent by a transmitting entity. The simplest form of flow control is a stop-and-wait procedure, in which each PDU must be acknowledged before the next can be sent. More efficient protocols involve some form of credit provided to the transmitter, which is the amount of data that can be sent without an acknowledgment. The HDLC sliding-window technique is an example of this mechanism (Chapter 7).

Flow control is a good example of a function that must be implemented in several protocols. Consider Figure 18.2, which repeats Figure 2.1. The network will need to exercise flow control over host A via the network access protocol, to enforce network traffic control. At the same time, B's network access module has finite buffer space and needs to exercise flow control over A's transmission; it can do this via the transport protocol. Finally, even though B's network access module can control its data flow, B's application may be vulnerable to overflow. For example, the application could be hung up waiting for disk access. Thus, flow control is also needed over the application-oriented protocol.

Error Control

Error control techniques are needed to guard against loss or damage of data and control information. Typically, error control is implemented as two separate

²The term host refers to any end system attached to a network, such as a PC, workstation, or server.
functions: error detection and retransmission. To achieve error detection, the sender inserts an error-detecting code in the transmitted PDU, which is a function of the other bits in the PDU. The receiver checks the value of the code on the incoming PDU. If an error is detected, the receiver discards the PDU. Upon failing to receive an acknowledgment to the PDU in a reasonable time, the sender retransmits the PDU. Some protocols also employ an error correction code, which enables the receiver not only to detect errors but, in some cases, to correct them.

[Figure 18.2, TCP/IP Concepts, shows Hosts A and B, each with applications X and Y attached to ports (service access points) above TCP and IP, communicating through Router J across Network 1 (N1) and Network 2 (N2); labels mark the logical TCP connection between ports, the global internet address at the IP level, the subnetwork attachment point addresses, and the network access protocols (#1 and #2).]

As with flow control, error control is a function that must be performed at various layers of protocol. Consider again Figure 18.2. The network access protocol should include error control to assure that data are successfully exchanged between station and network. However, a packet of data may be lost inside the network, and the transport protocol should be able to recover from this loss.

Addressing

The concept of addressing in a communications architecture is a complex one and covers a number of issues, including

• Addressing level
• Addressing scope
• Connection identifiers
• Addressing mode

During this discussion, we illustrate the concepts using Figure 18.2, which shows a configuration using the TCP/IP architecture. The concepts are essentially the same for the OSI architecture or any other communications architecture. Addressing level refers to the level in the communications architecture at which an entity is named.
Typically, a unique address is associated with each end system (e.g., workstation or server) and each intermediate system (e.g., router) in a configuration. Such an address is, in general, a network-level address. In the case of the TCP/IP architecture, this is referred to as an IP address, or simply an internet address. In the case of the OSI architecture, this is referred to as a network service access point (NSAP). The network-level address is used to route a PDU through a network or networks to a system indicated by a network-level address in the PDU.

Once data arrive at a destination system, they must be routed to some process or application in the system. Typically, a system will support multiple applications and an application may support multiple users. Each application and, perhaps, each concurrent user of an application is assigned a unique identifier, referred to as a port in the TCP/IP architecture and as a service access point (SAP) in the OSI architecture. For example, a host system might support both an electronic mail application and a file transfer application. At minimum each application would have a port number or SAP that is unique within that system. Further, the file transfer application might support multiple simultaneous transfers, in which case each transfer is dynamically assigned a unique port number or SAP.

Figure 18.2 illustrates two levels of addressing within a system. This is typically the case for the TCP/IP architecture. However, there can be addressing at each level of an architecture. For example, a unique SAP can be assigned to each level of the OSI architecture.

Another issue that relates to the address of an end system or intermediate system is addressing scope. The internet address or NSAP address referred to previously is a global address. The key characteristics of a global address are as follows:

• Global nonambiguity: A global address identifies a unique system. Synonyms are permitted.
That is, a system may have more than one global address.
• Global applicability: It is possible at any global address to identify any other global address, in any system, by means of the global address of the other system.

Because a global address is unique and globally applicable, it enables an internet to route data from any system attached to any network to any other system attached to any other network. Figure 18.2 illustrates that another level of addressing may be required. Each network must maintain a unique address for each device interface on the network. Examples are a MAC address on an IEEE 802 network and an ATM host address. This address enables the network to route data units (e.g., MAC frames, ATM cells) through the network and deliver them to the intended attached system. We can refer to such an address as a network attachment point address.

The issue of addressing scope is generally only relevant for network-level addresses. A port or SAP above the network level is unique within a given system but need not be globally unique. For example, in Figure 18.2, there can be a port 1 in system A and a port 1 in system B. The full designation of these two ports could be expressed as A.1 and B.1, which are unique designations.

The concept of connection identifiers comes into play when we consider connection-oriented data transfer (e.g., virtual circuit) rather than connectionless data transfer (e.g., datagram). For connectionless data transfer, a global identifier is used with each data transmission. For connection-oriented transfer, it is sometimes desirable to use only a connection identifier during the data transfer phase. The scenario is this: Entity 1 on system A requests a connection to entity 2 on system B, perhaps using the global address B.2. When B.2 accepts the connection, a connection identifier (usually a number) is provided and is used by both entities for future transmissions.
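This scenario can be sketched as follows (Python; the names and the identifier scheme are illustrative): the connection request carries the full global address, after which data PDUs carry only the short connection identifier.

```python
import itertools

class Responder:
    """Entity B.2: assigns a connection identifier when accepting a request."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.peers = {}                  # connection identifier -> global address

    def accept(self, caller: str) -> int:
        # Connection establishment uses full global addresses (e.g., "A.1").
        conn_id = next(self._ids)
        self.peers[conn_id] = caller
        return conn_id

    def receive_data(self, conn_id: int, pdu: str) -> str:
        # During data transfer only the short identifier is carried; the
        # responder recovers the peer's global address from its own state.
        return f"PDU on connection {conn_id} from {self.peers[conn_id]}: {pdu}"
```

After `accept("A.1")` returns an identifier, neither side needs to repeat the full source and destination addresses on each data PDU.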
The use of a connection identifier has several advantages:

• Reduced overhead: Connection identifiers are generally shorter than global identifiers. For example, in the frame relay protocol (discussed in Chapter 10), connection request packets contain both source and destination address fields. After a logical connection, called a data link connection, is established, data frames contain a data link connection identifier (DLCI) of 10, 16, or 23 bits.
• Routing: In setting up a connection, a fixed route may be defined. The connection identifier serves to identify the route to intermediate systems, such as packet-switching nodes, for handling future PDUs.
• Multiplexing: We address this function in more general terms later. Here we note that an entity may wish to enjoy more than one connection simultaneously. Thus, incoming PDUs must be identified by connection identifier.
• Use of state information: Once a connection is established, the end systems can maintain state information relating to the connection. This enables such functions as flow control and error control using sequence numbers. We see examples of this with HDLC (Chapter 7) and IEEE 802.11 (Chapter 17).

Figure 18.2 shows several examples of connections. The logical connection between router J and host B is at the network level. For example, if network 2 is a frame relay network, then this logical connection would be a data link connection. At a higher level, many transport-level protocols, such as TCP, support logical connections between users of the transport service. Thus, TCP can maintain a connection between two ports on different systems.

Another addressing concept is that of addressing mode. Most commonly, an address refers to a single system or port; in this case it is referred to as an individual or unicast address. It is also possible for an address to refer to more than one entity or port. Such an address identifies multiple simultaneous recipients for data.
For example, a user might wish to send a memo to a number of individuals. The network control center may wish to notify all users that the network is going down. An address for multiple recipients may be broadcast, intended for all entities within a domain, or multicast, intended for a specific subset of entities. Table 18.1 illustrates the possibilities.

Table 18.1 Addressing Modes

Destination   Network Address   System Address   Port/SAP Address
Unicast       Individual        Individual       Individual
Multicast     Individual        Individual       Group
              Individual        All              Group
              All               All              Group
Broadcast     Individual        Individual       All
              Individual        All              All
              All               All              All

Multiplexing

Related to the concept of addressing is that of multiplexing. One form of multiplexing is supported by means of multiple connections into a single system. For example, with frame relay, there can be multiple data link connections terminating in a single end system; we can say that these data link connections are multiplexed over the single physical interface between the end system and the network. Multiplexing can also be accomplished via port names, which also permit multiple simultaneous connections. For example, there can be multiple TCP connections terminating in a given system, each connection supporting a different pair of ports.

Multiplexing is used in another context as well, namely the mapping of connections from one level to another. Consider again Figure 18.2. Network 1 might provide a connection-oriented service. For each process-to-process connection established at the next higher level, a data link connection could be created at the network access level. This is a one-to-one relationship, but it need not be so. Multiplexing can be used in one of two directions. Upward multiplexing, or inward multiplexing, occurs when multiple higher-level connections are multiplexed on, or share, a single lower-level connection.
This may be needed to make more efficient use of the lower-level service or to provide several higher-level connections in an environment where only a single lower-level connection exists. Downward multiplexing, or splitting, means that a single higher-level connection is built on top of multiple lower-level connections, and the traffic on the higher connection is divided among the various lower connections. This technique may be used to provide reliability, performance, or efficiency.

Transmission Services

A protocol may provide a variety of additional services to the entities that use it. We mention here three common examples:

• Priority: Certain messages, such as control messages, may need to get through to the destination entity with minimum delay. An example would be a terminate-connection request. Thus, priority could be assigned on a message basis. Additionally, priority could be assigned on a connection basis.
• Quality of service: Certain classes of data may require a minimum throughput or a maximum delay threshold.
• Security: Security mechanisms, restricting access, may be invoked.

All of these services depend on the underlying transmission system and any intervening lower-level entities. If it is possible for these services to be provided from below, the protocol can be used by the two entities to exercise those services.

18.2 PRINCIPLES OF INTERNETWORKING

Packet-switching and packet-broadcasting networks grew out of a need to allow the computer user to have access to resources beyond that available in a single system. In a similar fashion, the resources of a single network are often inadequate to meet users' needs. Because the networks that might be of interest exhibit so many differences, it is impractical to consider merging them into a single network. Rather, what is needed is the ability to interconnect various networks so that any two stations on any of the constituent networks can communicate.
Table 18.2 lists some commonly used terms relating to the interconnection of networks, or internetworking. An interconnected set of networks, from a user's point of view, may appear simply as a larger network. However, if each of the constituent networks retains its identity and special mechanisms are needed for communicating across multiple networks, then the entire configuration is often referred to as an internet. Each constituent network in an internet supports communication among the devices attached to that network; these devices are referred to as end systems (ESs). In addition, networks are connected by devices referred to in the ISO documents as intermediate systems (ISs). Intermediate systems provide a communications path

Table 18.2 Internetworking Terms

Communication Network: A facility that provides a data transfer service among devices attached to the network.

Internet: A collection of communication networks interconnected by bridges and/or routers.

Intranet: An internet used by a single organization that provides the key Internet applications, especially the World Wide Web. An intranet operates within the organization for internal purposes and can exist as an isolated, self-contained internet, or may have links to the Internet.

Subnetwork: Refers to a constituent network of an internet. This avoids ambiguity because the entire internet, from a user's point of view, is a single network.

End System (ES): A device attached to one of the networks of an internet that is used to support end-user applications or services.

Intermediate System (IS): A device used to connect two networks and permit communication between end systems attached to different networks.

Bridge: An IS used to connect two LANs that use similar LAN protocols. The bridge acts as an address filter, picking up packets from one LAN that are intended for a destination on another LAN and passing those packets on.
The bridge does not modify the contents of the packets and does not add anything to the packet. The bridge operates at layer 2 of the OSI model.

Router: An IS used to connect two networks that may or may not be similar. The router employs an internet protocol present in each router and each end system of the network. The router operates at layer 3 of the OSI model.

and perform the necessary relaying and routing functions so that data can be exchanged between devices attached to different networks in the internet. Two types of ISs of particular interest are bridges and routers. The differences between them have to do with the types of protocols used for the internetworking logic. In essence, a bridge operates at layer 2 of the open systems interconnection (OSI) seven-layer architecture and acts as a relay of frames between similar networks; bridges are discussed in Chapter 15. A router operates at layer 3 of the OSI architecture and routes packets between potentially different networks. Both the bridge and the router assume that the same upper-layer protocols are in use. We begin our examination of internetworking with a discussion of the basic principles of internetworking. We then examine the most important architectural approach to internetworking: the connectionless router.

Requirements

The overall requirements for an internetworking facility are as follows (we refer to Figure 18.2 as an example throughout):

1. Provide a link between networks. At minimum, a physical and link control connection is needed. (Router J has physical links to N1 and N2, and on each link there is a data link protocol.)
2. Provide for the routing and delivery of data between processes on different networks. (Application X on host A exchanges data with application X on host B.)
3. Provide an accounting service that keeps track of the use of the various networks and routers and maintains status information.
4.
Provide the services just listed in such a way as not to require modifications to the networking architecture of any of the constituent networks. This means that the internetworking facility must accommodate a number of differences among networks. These include:

• Different addressing schemes: The networks may use different endpoint names and addresses and directory maintenance schemes. Some form of global network addressing must be provided, as well as a directory service. (Hosts A and B and router J have globally unique IP addresses.)
• Different maximum packet size: Packets from one network may have to be broken up into smaller pieces for another. This process is referred to as fragmentation. (N1 and N2 may set different upper limits on packet sizes.)
• Different network access mechanisms: The network access mechanism between station and network may be different for stations on different networks. (For example, N1 may be a frame relay network and N2 an Ethernet network.)
• Different timeouts: Typically, a connection-oriented transport service will await an acknowledgment until a timeout expires, at which time it will retransmit its block of data. In general, longer times are required for successful delivery across multiple networks. Internetwork timing procedures must allow successful transmission that avoids unnecessary retransmissions.
• Error recovery: Network procedures may provide anything from no error recovery up to reliable end-to-end (within the network) service. The internetwork service should not depend on, nor be interfered with by, the nature of the individual network's error recovery capability.
• Status reporting: Different networks report status and performance differently. Yet it must be possible for the internetworking facility to provide such information on internetworking activity to interested and authorized processes.
• Routing techniques: Intranetwork routing may depend on fault detection and congestion control techniques peculiar to each network. The internetworking facility must be able to coordinate these to route data adaptively between stations on different networks.
• User access control: Each network will have its own user access control technique (authorization for use of the network). These must be invoked by the internetwork facility as needed. Further, a separate internetwork access control technique may be required.
• Connection, connectionless: Individual networks may provide connection-oriented (e.g., virtual circuit) or connectionless (datagram) service. It may be desirable for the internetwork service not to depend on the nature of the connection service of the individual networks.

The Internet Protocol (IP) meets some of these requirements. Others require additional control and application software, as we shall see in this chapter and the next.

Connectionless Operation

In virtually all implementations, internetworking involves connectionless operation at the level of the Internet Protocol. Whereas connection-oriented operation corresponds to the virtual circuit mechanism of a packet-switching network (Figure 10.10), connectionless-mode operation corresponds to the datagram mechanism of a packet-switching network (Figure 10.9). Each network protocol data unit is treated independently and routed from source ES to destination ES through a series of routers and networks. For each data unit transmitted by A, A makes a decision as to which router should receive the data unit. The data unit hops across the internet from one router to the next until it reaches the destination network. At each router, a routing decision is made (independently for each data unit) concerning the next hop. Thus, different data units may travel different routes between source and destination ES.
All ESs and all routers share a common network-layer protocol known generically as the Internet Protocol. An Internet Protocol (IP) was initially developed for the DARPA internet project, published as RFC 791, and has become an Internet Standard. Below this Internet Protocol, a protocol is needed to access a particular network. Thus, there are typically two protocols operating in each ES and router at the network layer: an upper sublayer that provides the internetworking function, and a lower sublayer that provides network access. Figure 18.3 shows an example.

[Figure 18.3 Example of Internet Protocol Operation: end system A on LAN 1 and end system B on LAN 2 communicate through routers X and Y across a frame relay WAN. The figure shows the protocol architecture at each device (TCP and IP in the end systems; IP, LLC/MAC, and LAPF in the routers) and the format of the data unit at times t1 through t16 as headers (TCP-H, IP-H, LLC-H, MAC-H, FR-H) and trailers (MAC-T, FR-T) are added and removed.]

18.3 INTERNET PROTOCOL OPERATION

In this section, we examine the essential functions of an internetwork protocol. For convenience, we refer specifically to the Internet Standard IPv4, but the narrative in this section applies to any connectionless Internet Protocol, such as IPv6.

Operation of a Connectionless Internetworking Scheme

IP provides a connectionless, or datagram, service between end systems. There are a number of advantages to this approach:

• A connectionless internet facility is flexible. It can deal with a variety of networks, some of which are themselves connectionless. In essence, IP requires very little from the constituent networks.
• A connectionless internet service can be made highly robust. This is basically the same argument made for a datagram network service versus a virtual circuit service. For a further discussion, see Section 10.5.
• A connectionless internet service is best for connectionless transport protocols, because it does not impose unnecessary overhead.

Figure 18.3 depicts a typical example using IP, in which two LANs are interconnected by a frame relay WAN. The figure shows the operation of the Internet Protocol for data exchange between host A on one LAN (network 1) and host B on another LAN (network 2) through the WAN, along with the protocol architecture and format of the data unit at each stage. The end systems and routers must all share a common Internet Protocol. In addition, the end systems must share the same protocols above IP. The intermediate routers need only implement up through IP.

The IP at A receives blocks of data to be sent to B from higher layers of software in A (e.g., TCP or UDP). IP attaches a header (at time t1) specifying, among other things, the global internet address of B. That address is logically in two parts: network identifier and end system identifier. The combination of IP header and upper-level data is called an Internet Protocol data unit (PDU), or simply a datagram. The datagram is then encapsulated with the LAN protocol (LLC header at t2; MAC header and trailer at t3) and sent to the router, which strips off the LAN fields to read the IP header (t6). The router then encapsulates the datagram with the frame relay protocol fields (t8) and transmits it across the WAN to another router. This router strips off the frame relay fields and recovers the datagram, which it then wraps in LAN fields appropriate to LAN 2 and sends it to B. Let us now look at this example in more detail.
End system A has a datagram to transmit to end system B; the datagram includes the internet address of B. The IP module in A recognizes that the destination (B) is on another network. So the first step is to send the data to a router, in this case router X. To do this, IP passes the datagram down to the next lower layer (in this case LLC) with instructions to send it to router X. LLC in turn passes this information down to the MAC layer, which inserts the MAC-level address of router X into the MAC header. Thus, the block of data transmitted onto LAN 1 includes data from a layer or layers above TCP, plus a TCP header, an IP header, an LLC header, and a MAC header and trailer (time t3 in Figure 18.3).

Next, the packet travels through network 1 to router X. The router removes MAC and LLC fields and analyzes the IP header to determine the ultimate destination of the data, in this case B. The router must now make a routing decision. There are three possibilities:

1. The destination station B is connected directly to one of the networks to which the router is attached. If so, the router sends the datagram directly to the destination.
2. To reach the destination, one or more additional routers must be traversed. If so, a routing decision must be made: To which router should the datagram be sent? In both cases 1 and 2, the IP module in the router sends the datagram down to the next lower layer with the destination network address. Please note that we are speaking here of a lower-layer address that refers to this network.
3. The router does not know the destination address. In this case, the router returns an error message to the source of the datagram.

In this example, the data must pass through router Y before reaching the destination. So router X constructs a new frame by appending a frame relay (LAPF) header and trailer to the IP datagram. The frame relay header indicates a logical connection to router Y.
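The encapsulation steps just described can be sketched schematically. This is a toy illustration only: the header and trailer labels are plain string tags, not real protocol formats.

```python
# A schematic sketch of the encapsulation in Figure 18.3. The header
# and trailer labels are string tags, not real protocol formats.

def host_a_transmit(app_data):
    tcp = "TCP-H|" + app_data            # transport header
    ip = "IP-H|" + tcp                   # t1: IP header with B's global address
    llc = "LLC-H|" + ip                  # t2: LLC header
    return "MAC-H|" + llc + "|MAC-T"     # t3: MAC header names router X

def router_x_forward(mac_frame):
    # t6: strip the LAN fields to recover the datagram ...
    datagram = mac_frame.removeprefix("MAC-H|LLC-H|").removesuffix("|MAC-T")
    # t8: ... then re-encapsulate for the frame relay WAN, toward router Y
    return "FR-H|" + datagram + "|FR-T"

frame = router_x_forward(host_a_transmit("data"))
# frame is now "FR-H|IP-H|TCP-H|data|FR-T": the IP datagram is carried
# intact, and only the network-specific wrapping has changed.
```

The key point the sketch makes is that the IP datagram (IP-H and everything inside it) passes through router X untouched.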
When this frame arrives at router Y, the frame header and trailer are stripped off. The router determines that this IP data unit is destined for B, which is connected directly to a network to which this router is attached. The router therefore creates a frame with a layer-2 destination address of B and sends it out onto LAN 2. The data finally arrive at B, where the LAN and IP headers can be stripped off. At each router, before the data can be forwarded, the router may need to fragment the datagram to accommodate a smaller maximum packet size limitation on the outgoing network. If so, the data unit is split into two or more fragments, each of which becomes an independent IP datagram. Each new data unit is wrapped in a lower-layer packet and queued for transmission. The router may also limit the length of its queue for each network to which it attaches so as to avoid having a slow network penalize a faster one. Once the queue limit is reached, additional data units are simply dropped. The process just described continues through as many routers as it takes for the data unit to reach its destination. As with a router, the destination end system recovers the IP datagram from its network wrapping. If fragmentation has occurred, the IP module in the destination end system buffers the incoming data until the entire original data field can be reassembled. This block of data is then passed to a higher layer in the end system.3 This service offered by IP is an unreliable one. That is, IP does not guarantee that all data will be delivered or that the data that are delivered will arrive in the proper order. It is the responsibility of the next higher layer (e.g., TCP) to recover from any errors that occur. This approach provides for a great deal of flexibility. With the Internet Protocol approach, each unit of data is passed from router to router in an attempt to get from source to destination. 
Because delivery is not guaranteed, there is no particular reliability requirement on any of the networks. Thus, the protocol will work with any combination of network types. Because the sequence of delivery is not guaranteed, successive data units can follow different paths through the internet. This allows the protocol to react to both congestion and failure in the internet by changing routes.

3 Appendix L provides a more detailed example, showing the involvement of all protocol layers.

Design Issues

With that brief sketch of the operation of an IP-controlled internet, we now examine some design issues in greater detail:

• Routing
• Datagram lifetime
• Fragmentation and reassembly
• Error control
• Flow control

As we proceed with this discussion, note the many similarities with design issues and techniques relevant to packet-switching networks. To see the reason for this, consider Figure 18.4, which compares an internet architecture with a packet-switching network architecture.

[Figure 18.4 The Internet as a Network (based on [HIND83]): (a) a packet-switching network architecture, with stations S1 and S2 attached to packet-switching nodes P1, P2, P3 joined by transmission links T1, T2, T3; (b) an internetwork architecture, with the same stations attached to routers R1, R2, R3 joined by networks N1, N2, N3.]

The routers (R1, R2, R3) in the internet correspond to the packet-switching nodes (P1, P2, P3) in the network, and the networks (N1, N2, N3) in the internet correspond to the transmission links (T1, T2, T3) in the networks. The routers perform essentially the same functions as packet-switching nodes and use the intervening networks in a manner analogous to transmission links.

Routing

For the purpose of routing, each end system and router maintains a routing table that lists, for each possible destination network, the next router to which the internet datagram should be sent. The routing table may be static or dynamic. A static table, however, could contain alternate routes if a particular router is unavailable.
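A static table with alternate routes, as just described, might be sketched like this (the network and router names are hypothetical):

```python
# A sketch of a static routing table holding a primary and an alternate
# next-hop router per destination network. All names are hypothetical.

routing_table = {
    "N2": ("router-J", "router-K"),   # (primary, alternate)
    "N3": ("router-K", "router-J"),
}

def next_hop(dest_network, reachable):
    """Return the primary next hop if it is currently reachable, falling
    back to the alternate; None means no usable route exists."""
    primary, alternate = routing_table[dest_network]
    if primary in reachable:
        return primary
    if alternate in reachable:
        return alternate
    return None

assert next_hop("N2", {"router-J", "router-K"}) == "router-J"
assert next_hop("N2", {"router-K"}) == "router-K"   # primary unavailable
```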
A dynamic table is more flexible in responding to both error and congestion conditions. In the Internet, for example, when a router goes down, all of its neighbors will send out a status report, allowing other routers and stations to update their routing tables. A similar scheme can be used to control congestion. Congestion control is particularly important because of the mismatch in capacity between local and wide area networks. Chapter 19 discusses routing protocols.

Routing tables may also be used to support other internetworking services, such as security and priority. For example, individual networks might be classified to handle data up to a given security classification. The routing mechanism must ensure that data of a given security level are not allowed to pass through networks not cleared to handle such data.

Another routing technique is source routing. The source station specifies the route by including a sequential list of routers in the datagram. This, again, could be useful for security or priority requirements. Finally, we mention a service related to routing: route recording. To record a route, each router appends its internet address to a list of addresses in the datagram. This feature is useful for testing and debugging purposes.

Datagram Lifetime

If dynamic or alternate routing is used, the potential exists for a datagram to loop indefinitely through the internet. This is undesirable for two reasons. First, an endlessly circulating datagram consumes resources. Second, we will see in Chapter 20 that a transport protocol may depend on the existence of an upper bound on datagram lifetime. To avoid these problems, each datagram can be marked with a lifetime. Once the lifetime expires, the datagram is discarded. A simple way to implement lifetime is to use a hop count. Each time that a datagram passes through a router, the count is decremented. Alternatively, the lifetime could be a true measure of time.
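The hop-count form of lifetime can be sketched in a few lines; the dictionary-based datagram here is illustrative only.

```python
# A sketch of the hop-count form of datagram lifetime: each router
# decrements the count and discards the datagram when it reaches zero.
# The dictionary datagram is illustrative only.

def router_process(datagram):
    datagram["lifetime"] -= 1
    if datagram["lifetime"] <= 0:
        return None            # lifetime expired: discard the datagram
    return datagram            # otherwise forward toward the next hop

d = {"lifetime": 2}
assert router_process(d) is not None   # first router: lifetime now 1
assert router_process(d) is None       # second router: expired, discarded
```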
This requires that the routers must somehow know how long it has been since the datagram or fragment last crossed a router, to know by how much to decrement the lifetime field. This would seem to require some global clocking mechanism. The advantage of using a true time measure is that it can be used in the reassembly algorithm, described next.

Fragmentation and Reassembly

Individual networks within an internet may specify different maximum packet sizes. It would be inefficient and unwieldy to try to dictate uniform packet size across networks. Thus, routers may need to fragment incoming datagrams into smaller pieces, called segments or fragments, before transmitting on to the next network.

If datagrams can be fragmented (perhaps more than once) in the course of their travels, the question arises as to where they should be reassembled. The easiest solution is to have reassembly performed at the destination only. The principal disadvantage of this approach is that fragments can only get smaller as data move through the internet. This may impair the efficiency of some networks. However, if intermediate router reassembly is allowed, the following disadvantages result:

1. Large buffers are required at routers, and there is the risk that all of the buffer space will be used up storing partial datagrams.
2. All fragments of a datagram must pass through the same router. This inhibits the use of dynamic routing.

In IP, datagram fragments are reassembled at the destination end system. The IP fragmentation technique uses the following information in the IP header:

• Data Unit Identifier (ID)
• Data Length4
• Offset
• More Flag

The ID is a means of uniquely identifying an end-system-originated datagram. In IP, it consists of the source and destination addresses, a number that corresponds to the protocol layer that generated the data (e.g., TCP), and an identification supplied by that protocol layer.
The Data Length is the length of the user data field in octets, and the Offset is the position of a fragment of user data in the data field of the original datagram, in multiples of 64 bits. The source end system creates a datagram with a Data Length equal to the entire length of the data field, with Offset 0, and a More Flag set to 0 (false). To fragment a long datagram into two pieces, an IP module in a router performs the following tasks:

1. Create two new datagrams and copy the header fields of the incoming datagram into both.
2. Divide the incoming user data field into two portions along a 64-bit boundary (counting from the beginning), placing one portion in each new datagram. The first portion must be a multiple of 64 bits (8 octets).
3. Set the Data Length of the first new datagram to the length of the inserted data, and set the More Flag to 1 (true). The Offset field is unchanged.
4. Set the Data Length of the second new datagram to the length of the inserted data, and add the length of the first data portion divided by 8 to the Offset field. The More Flag remains the same.

Figure 18.5 gives an example in which two fragments are created from an original IP datagram. The procedure is easily generalized to an n-way split. In this example, the payload of the original IP datagram is a TCP segment, consisting of a

4 In the IPv6 header, there is a Payload Length field that corresponds to Data Length in this discussion. In the IPv4 header, there is a Total Length field whose value is the length of the header plus data; the data length must be calculated by subtracting the header length.
[Figure 18.5 Fragmentation Example. Original IP datagram: IP header (20 octets), TCP header (20 octets), TCP payload (384 octets); Data Length 404 octets, Segment Offset 0, More = 0. First fragment: IP header (20 octets), TCP header (20 octets), partial TCP payload (188 octets); Data Length 208 octets, Segment Offset 0, More = 1. Second fragment: IP header (20 octets), partial TCP payload (196 octets); Data Length 196 octets, Segment Offset 26 64-bit units (208 octets), More = 0.]

TCP header and application data. The IP header from the original datagram is used in both fragments, with the appropriate changes to the fragmentation-related fields. Note that the first fragment contains the TCP header; this header is not replicated in the second fragment, because all of the IP payload, including the TCP header, is transparent to IP. That is, IP is not concerned with the contents of the payload of the datagram.

To reassemble a datagram, there must be sufficient buffer space at the reassembly point. As fragments with the same ID arrive, their data fields are inserted in the proper position in the buffer until the entire data field is reassembled, which is achieved when a contiguous set of data exists starting with an Offset of zero and ending with data from a fragment with a false More Flag.

One eventuality that must be dealt with is that one or more of the fragments may not get through: The IP service does not guarantee delivery. Some method is needed to decide when to abandon a reassembly effort to free up buffer space. Two approaches are commonly used. First, assign a reassembly lifetime to the first fragment to arrive. This is a local, real-time clock assigned by the reassembly function and decremented while the fragments of the original datagram are being buffered. If the time expires prior to complete reassembly, the received fragments are discarded.
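The two-way fragmentation tasks and the reassembly-complete test described above can be sketched as follows. Datagrams are simplified to dictionaries, and the Offset is kept in 8-octet (64-bit) units, as in IP.

```python
# A sketch of the two-way fragmentation tasks (steps 1-4 above) and the
# reassembly-complete test. Datagrams are simplified to dictionaries;
# Offset is in 8-octet (64-bit) units, as in IP.

def fragment(datagram, first_len):
    """Split one datagram into two; first_len (octets) must be a
    multiple of 8."""
    first, second = dict(datagram), dict(datagram)      # step 1: copy headers
    data = datagram["data"]
    first["data"], second["data"] = data[:first_len], data[first_len:]  # step 2
    first["length"], first["more"] = first_len, 1                       # step 3
    second["length"] = len(data) - first_len                            # step 4
    second["offset"] = datagram["offset"] + first_len // 8
    return first, second

def reassembly_complete(fragments):
    """True when a contiguous run of data exists from Offset 0 through a
    fragment with a false More flag."""
    need = 0                                  # next expected offset
    for frag in sorted(fragments, key=lambda f: f["offset"]):
        if frag["offset"] != need:
            return False                      # a gap remains
        need += frag["length"] // 8
        if frag["more"] == 0:
            return True                       # final fragment, no gaps
    return False                              # final fragment not yet seen

# The Figure 18.5 numbers: a 404-octet data field split 208/196, with
# the second fragment at offset 26 (208/8).
original = {"data": bytes(404), "length": 404, "offset": 0, "more": 0}
f1, f2 = fragment(original, 208)
assert reassembly_complete([f1, f2]) and not reassembly_complete([f1])
```

Running the sketch on the Figure 18.5 numbers reproduces the lengths and offsets shown in the figure.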
A second approach is to make use of the datagram lifetime, which is part of the header of each incoming fragment. The lifetime field continues to be decremented by the reassembly function; as with the first approach, if the lifetime expires prior to complete reassembly, the received fragments are discarded.

Error Control

The internetwork facility does not guarantee successful delivery of every datagram. When a datagram is discarded by a router, the router should attempt to return some information to the source, if possible. The source Internet Protocol entity may use this information to modify its transmission strategy and may notify higher layers. To report that a specific datagram has been discarded, some means of datagram identification is needed. Such identification is discussed in the next section. Datagrams may be discarded for a number of reasons, including lifetime expiration, congestion, and FCS error. In the latter case, notification is not possible because the source address field may have been damaged.

Flow Control

Internet flow control allows routers and/or receiving stations to limit the rate at which they receive data. For the connectionless type of service we are describing, flow control mechanisms are limited. The best approach would seem to be to send flow control packets, requesting reduced data flow, to other routers and source stations. We will see one example of this with the Internet Control Message Protocol (ICMP), discussed in the next section.

18.4 INTERNET PROTOCOL

In this section, we look at version 4 of IP, officially defined in RFC 791. Although it is intended that IPv4 will ultimately be replaced by IPv6, it is currently the standard IP used in TCP/IP networks. The Internet Protocol (IP) is part of the TCP/IP suite and is the most widely used internetworking protocol.
As with any protocol standard, IP is specified in two parts (see Figure 2.9):

• The interface with a higher layer (e.g., TCP), specifying the services that IP provides
• The actual protocol format and mechanisms

In this section, we examine first IP services and then the protocol. This is followed by a discussion of IP address formats. Finally, the Internet Control Message Protocol (ICMP), which is an integral part of IP, is described.

IP Services

The services to be provided across adjacent protocol layers (e.g., between IP and TCP) are expressed in terms of primitives and parameters. A primitive specifies the function to be performed, and the parameters are used to pass data and control information. The actual form of a primitive is implementation dependent. An example is a procedure call.

IP provides two service primitives at the interface to the next higher layer. The Send primitive is used to request transmission of a data unit. The Deliver primitive is used by IP to notify a user of the arrival of a data unit. The parameters associated with the two primitives are as follows:

• Source address: Internetwork address of sending IP entity.
• Destination address: Internetwork address of destination IP entity.
• Protocol: Recipient protocol entity (an IP user, such as TCP).
• Type-of-service indicators: Used to specify the treatment of the data unit in its transmission through component networks.
• Identification: Used in combination with the source and destination addresses and user protocol to identify the data unit uniquely. This parameter is needed for reassembly and error reporting.
• Don't fragment identifier: Indicates whether IP can fragment data to accomplish delivery.
• Time to live: Measured in seconds.
• Data length: Length of data being transmitted.
• Option data: Options requested by the IP user.
• Data: User data to be transmitted.
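Because the form of a primitive is implementation dependent, one way to picture the parameter lists is as procedure signatures. The following is a sketch with hypothetical signatures, not a real API:

```python
# A sketch (hypothetical signatures, not a real API) of the Send and
# Deliver primitives and their parameters.

def send(source, destination, protocol, type_of_service, identification,
         dont_fragment, time_to_live, options, data):
    """Request transmission of a data unit."""
    return {"src": source, "dst": destination, "proto": protocol,
            "tos": type_of_service, "id": identification,
            "df": dont_fragment, "ttl": time_to_live,
            "opts": options, "len": len(data), "data": data}

def deliver(datagram):
    """Notify the IP user of an arriving data unit. The identification,
    don't-fragment, and time-to-live parameters of Send have no
    counterpart in Deliver."""
    return {k: datagram[k] for k in
            ("src", "dst", "proto", "tos", "opts", "len", "data")}
```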
The identification, don't fragment identifier, and time to live parameters are present in the Send primitive but not in the Deliver primitive. These three parameters provide instructions to IP that are not of concern to the recipient IP user. The options parameter allows for future extensibility and for inclusion of parameters that are usually not invoked. The currently defined options are as follows:

• Security: Allows a security label to be attached to a datagram.
• Source routing: A sequenced list of router addresses that specifies the route to be followed. Routing may be strict (only identified routers may be visited) or loose (other intermediate routers may be visited).
• Route recording: A field is allocated to record the sequence of routers visited by the datagram.
• Stream identification: Names reserved resources used for stream service. This service provides special handling for volatile periodic traffic (e.g., voice).
• Timestamping: The source IP entity and some or all intermediate routers add a timestamp (precision to milliseconds) to the data unit as it goes by.

Internet Protocol

The protocol between IP entities is best described with reference to the IP datagram format, shown in Figure 18.6.

[Figure 18.6 IPv4 Header. The header contains, in order: Version (4 bits), IHL (4 bits), DS (6 bits), ECN (2 bits), Total Length (16 bits), Identification (16 bits), Flags (3 bits), Fragment Offset (13 bits), Time to Live (8 bits), Protocol (8 bits), Header Checksum (16 bits), Source Address (32 bits), Destination Address (32 bits), Options (variable), and Padding (variable); the minimum header length is 20 octets.]

The fields are as follows:

• Version (4 bits): Indicates version number, to allow evolution of the protocol; the value is 4.
• Internet Header Length (IHL) (4 bits): Length of header in 32-bit words. The minimum value is five, for a minimum header length of 20 octets.
• DS/ECN (8 bits): Prior to the introduction of differentiated services, this field was referred to as the Type of Service field and specified reliability, precedence, delay, and throughput parameters. This interpretation has now been superseded.
The first six bits of this field are now referred to as the DS (Differentiated Services) field, discussed in Chapter 19. The remaining 2 bits are reserved for an ECN (Explicit Congestion Notification) field, currently in the process of standardization. The ECN field provides for explicit signaling of congestion in a manner similar to that discussed for frame relay (Section 13.5).
• Total Length (16 bits): Total datagram length, including header plus data, in octets.
• Identification (16 bits): A sequence number that, together with the source address, destination address, and user protocol, is intended to identify a datagram uniquely. Thus, this number should be unique for the datagram's source address, destination address, and user protocol for the time during which the datagram will remain in the internet.
• Flags (3 bits): Only two of the bits are currently defined. The More bit is used for fragmentation and reassembly, as previously explained. The Don't Fragment bit prohibits fragmentation when set. This bit may be useful if it is known that the destination does not have the capability to reassemble fragments. However, if this bit is set, the datagram will be discarded if it exceeds the maximum size of an en route network. Therefore, if the bit is set, it may be advisable to use source routing to avoid networks with small maximum packet size.
• Fragment Offset (13 bits): Indicates where in the original datagram this fragment belongs, measured in 64-bit units. This implies that fragments other than the last fragment must contain a data field that is a multiple of 64 bits in length.
• Time to Live (8 bits): Specifies how long, in seconds, a datagram is allowed to remain in the internet. Every router that processes a datagram must decrease the TTL by at least one, so the TTL is similar to a hop count.
• Protocol (8 bits): Indicates the next higher level protocol that is to receive the data field at the destination; thus, this field identifies the type of the next header in the packet after the IP header. Example values are TCP = 6, UDP = 17, and ICMP = 1. A complete list is maintained at http://www.iana.org/assignments/protocol-numbers.
• Header Checksum (16 bits): An error-detecting code applied to the header only. Because some header fields may change during transit (e.g., Time to Live, fragmentation-related fields), this is reverified and recomputed at each router. The checksum is formed by taking the ones complement of the 16-bit ones complement addition of all 16-bit words in the header. For purposes of computation, the checksum field is itself initialized to a value of zero.5
• Source Address (32 bits): Coded to allow a variable allocation of bits to specify the network and the end system attached to the specified network, as discussed subsequently.
• Destination Address (32 bits): Same characteristics as the source address.
• Options (variable): Encodes the options requested by the sending user.
• Padding (variable): Used to ensure that the datagram header is a multiple of 32 bits in length.
• Data (variable): The data field must be an integer multiple of 8 bits in length. The maximum length of the datagram (data field plus header) is 65,535 octets.

It should be clear how the IP services specified in the Send and Deliver primitives map into the fields of the IP datagram.

IP Addresses

The source and destination address fields in the IP header each contain a 32-bit global internet address, generally consisting of a network identifier and a host identifier.

Network Classes

The address is coded to allow a variable allocation of bits to specify network and host, as depicted in Figure 18.7. This encoding provides flexibility in assigning addresses to hosts and allows a mix of network sizes on an internet.
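Returning to the Header Checksum field described above, the ones complement computation can be sketched in a few lines of Python. The field values in the sample header below are illustrative, not taken from the text.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """Ones complement of the 16-bit ones complement sum of the header.
    The Header Checksum field must be zero while this is computed."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total > 0xFFFF:                      # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Illustrative 20-octet header: Version/IHL, DS/ECN, Total Length,
# Identification, Flags/Fragment Offset, TTL, Protocol (6 = TCP),
# Checksum (0 for computation), Source Address, Destination Address.
header = struct.pack('!BBHHHBBH4s4s', 0x45, 0, 40, 1, 0, 64, 6, 0,
                     bytes([192, 228, 17, 57]), bytes([192, 228, 17, 33]))
checksum = ipv4_checksum(header)

# A router verifying the header sums all words including the stored
# checksum; the folded sum is all ones, so recomputing the checksum
# over the completed header yields 0.
completed = header[:10] + struct.pack('!H', checksum) + header[12:]
assert ipv4_checksum(completed) == 0
```

The end-around carry step is what makes this ones complement rather than ordinary modular arithmetic; carries out of the 16-bit sum are folded back into the low-order bits before the final complement.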
The three principal network classes are best suited to the following conditions:

• Class A: Few networks, each with many hosts
• Class B: Medium number of networks, each with a medium number of hosts
• Class C: Many networks, each with a few hosts

5 A discussion of this checksum is contained in Appendix K.

[Figure 18.7 IPv4 Address Formats: Class A = 0 + Network (7 bits) + Host (24 bits); Class B = 10 + Network (14 bits) + Host (16 bits); Class C = 110 + Network (21 bits) + Host (8 bits); Class D = 1110 + multicast; Class E = 11110 + future use]

In a particular environment, it may be best to use addresses all from one class. For example, a corporate internetwork that consists of a large number of departmental local area networks may need to use Class C addresses exclusively. However, the format of the addresses is such that it is possible to mix all three classes of addresses on the same internetwork; this is what is done in the case of the Internet itself. A mixture of classes is appropriate for an internetwork consisting of a few large networks, many small networks, plus some medium-sized networks.

IP addresses are usually written in dotted decimal notation, with a decimal number representing each of the octets of the 32-bit address. For example, the IP address 11000000 11100100 00010001 00111001 is written as 192.228.17.57.

Note that all Class A network addresses begin with a binary 0. Network addresses with a first octet of 0 (binary 00000000) and 127 (binary 01111111) are reserved, so there are 126 potential Class A network numbers, which have a first dotted decimal number in the range 1 to 126. Class B network addresses begin with a binary 10, so that the range of the first decimal number in a Class B address is 128 to 191 (binary 10000000 to 10111111). The second octet is also part of the Class B network address, so that there are 2^14 = 16,384 Class B addresses. For Class C addresses, the first decimal number ranges from 192 to 223 (binary 11000000 to 11011111).
The total number of Class C addresses is 2^21 = 2,097,152.

Subnets and Subnet Masks

The concept of subnet was introduced to address the following requirement. Consider an internet that includes one or more WANs and a number of sites, each of which has a number of LANs. We would like to allow arbitrary complexity of interconnected LAN structures within an organization while insulating the overall internet against explosive growth in network numbers and routing complexity. One approach to this problem is to assign a single network number to all of the LANs at a site. From the point of view of the rest of the internet, there is a single network at that site, which simplifies addressing and routing. To allow the routers within the site to function properly, each LAN is assigned a subnet number. The host portion of the internet address is partitioned into a subnet number and a host number to accommodate this new level of addressing.

Within the subnetted network, the local routers must route on the basis of an extended network number consisting of the network portion of the IP address and the subnet number. The bit positions containing this extended network number are indicated by the address mask. The use of the address mask allows the host to determine whether an outgoing datagram is destined for a host on the same LAN (send directly) or another LAN (send the datagram to a router). It is assumed that some other means (e.g., manual configuration) are used to create address masks and make them known to the local routers.

Table 18.3a shows the calculations involved in the use of a subnet mask. Note that the effect of the subnet mask is to erase the portion of the host field that refers to an actual host on a subnet. What remains is the network number and the subnet number.

Figure 18.8 shows an example of the use of subnetting. The figure shows a local complex consisting of three LANs and two routers.
To the rest of the internet, this complex is a single network with a Class C address of the form 192.228.17.x, where the leftmost three octets are the network number and the rightmost octet contains a host number x. Both routers R1 and R2 are configured with a subnet mask with the value 255.255.255.224 (see Table 18.3a).

Table 18.3 IP Addresses and Subnet Masks [STEI95]

(a) Dotted decimal and binary representations of IP address and subnet masks

                           Binary Representation                  Dotted Decimal
IP address                 11000000.11100100.00010001.00111001    192.228.17.57
Subnet mask                11111111.11111111.11111111.11100000    255.255.255.224
Bitwise AND of address
and mask (resultant
network/subnet number)     11000000.11100100.00010001.00100000    192.228.17.32
Subnet number              11000000.11100100.00010001.001         1
Host number                00000000.00000000.00000000.00011001    25

(b) Default subnet masks

                           Binary Representation                  Dotted Decimal
Class A default mask       11111111.00000000.00000000.00000000    255.0.0.0
Example Class A mask       11111111.11000000.00000000.00000000    255.192.0.0
Class B default mask       11111111.11111111.00000000.00000000    255.255.0.0
Example Class B mask       11111111.11111111.11111000.00000000    255.255.248.0
Class C default mask       11111111.11111111.11111111.00000000    255.255.255.0
Example Class C mask       11111111.11111111.11111111.11111100    255.255.255.252

[Figure 18.8 Example of Subnetworking: LAN X (subnet number 1, net/subnet ID 192.228.17.32) with hosts A (192.228.17.33, host 1) and B (192.228.17.57, host 25); LAN Y (subnet number 2, net/subnet ID 192.228.17.64) with host C (192.228.17.65, host 1), connecting routers R1 and R2; LAN Z (subnet number 3, net/subnet ID 192.228.17.96) with host D (192.228.17.97, host 1); R1 also connects the complex to the rest of the internet]
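The mask arithmetic of Table 18.3a can be reproduced directly; a short sketch using Python's standard ipaddress module, with the addresses from the example:

```python
import ipaddress  # Python standard library

addr = ipaddress.IPv4Address('192.228.17.57')
mask = ipaddress.IPv4Address('255.255.255.224')

# The bitwise AND of address and mask yields the network/subnet number.
subnet = ipaddress.IPv4Address(int(addr) & int(mask))
# The remaining (unmasked) bits are the host number on that subnet.
host = int(addr) & ~int(mask) & 0xFFFFFFFF

print(subnet)  # 192.228.17.32
print(host)    # 25

# A host makes its routing decision the same way: if the masked
# destination matches its own masked address, it delivers directly.
own = ipaddress.IPv4Address('192.228.17.33')
same_lan = (int(addr) & int(mask)) == (int(own) & int(mask))
print(same_lan)  # True: both hosts are on subnet 1 (LAN X)
```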
For example, if a datagram with the destination address 192.228.17.57 arrives at R1 from either the rest of the internet or from LAN Y, R1 applies the subnet mask to determine that this address refers to subnet 1, which is LAN X, and so forwards the datagram to LAN X. Similarly, if a datagram with that destination address arrives at R2 from LAN Z, R2 applies the mask and then determines from its forwarding database that datagrams destined for subnet 1 should be forwarded to R1. Hosts must also employ a subnet mask to make routing decisions. The default subnet mask for a given class of addresses is a null mask (Table 18.3b), which yields the same network and host number as the non-subnetted address.

Internet Control Message Protocol (ICMP)

The IP standard specifies that a compliant implementation must also implement ICMP (RFC 792). ICMP provides a means for transferring messages from routers and other hosts to a host. In essence, ICMP provides feedback about problems in the communication environment. Examples of its use are when a datagram cannot reach its destination, when the router does not have the buffering capacity to forward a datagram, and when the router can direct the station to send traffic on a shorter route. In most cases, an ICMP message is sent in response to a datagram, either by a router along the datagram's path or by the intended destination host.
[Figure 18.9 ICMP Message Formats: (a) destination unreachable, time exceeded, source quench; (b) parameter problem; (c) redirect; (d) echo, echo reply; (e) timestamp; (f) timestamp reply; (g) address mask request; (h) address mask reply. Each format begins with Type (8 bits), Code (8 bits), and Checksum (16 bits); formats (a) through (c) also carry the IP header plus the first 64 bits of the original datagram]

Although ICMP is, in effect, at the same level as IP in the TCP/IP architecture, it is a user of IP. An ICMP message is constructed and then passed down to IP, which encapsulates the message with an IP header and then transmits the resulting datagram in the usual fashion. Because ICMP messages are transmitted in IP datagrams, their delivery is not guaranteed and their use cannot be considered reliable.

Figure 18.9 shows the format of the various ICMP message types. An ICMP message starts with a 64-bit header consisting of the following:

• Type (8 bits): Specifies the type of ICMP message.
• Code (8 bits): Used to specify parameters of the message that can be encoded in one or a few bits.
• Checksum (16 bits): Checksum of the entire ICMP message. This is the same checksum algorithm used for IP.
• Parameters (32 bits): Used to specify more lengthy parameters.

These fields are generally followed by additional information fields that further specify the content of the message.
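A sketch of building a message with this 64-bit header follows: it packs an echo request (type 8, per RFC 792) with the identifier and sequence number carried in the Parameters field. The build_echo helper is illustrative, not part of any standard API.

```python
import struct

def icmp_checksum(msg: bytes) -> int:
    # Same ones complement algorithm as the IP header checksum,
    # but computed over the entire ICMP message.
    if len(msg) % 2:
        msg += b'\x00'
    total = 0
    for i in range(0, len(msg), 2):
        total += (msg[i] << 8) | msg[i + 1]
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

ECHO_REQUEST = 8  # ICMP type for echo; the echo reply uses type 0

def build_echo(identifier: int, seq: int, payload: bytes = b'') -> bytes:
    # Type, Code, Checksum (zero during computation), then the 32-bit
    # Parameters field holding the Identifier and Sequence Number.
    msg = struct.pack('!BBHHH', ECHO_REQUEST, 0, 0, identifier, seq) + payload
    checksum = icmp_checksum(msg)
    return msg[:2] + struct.pack('!H', checksum) + msg[4:]

msg = build_echo(identifier=0x1234, seq=1, payload=b'ping')
assert icmp_checksum(msg) == 0  # a correctly built message verifies to zero
```

The recipient of such an echo message would copy the identifier, sequence number, and payload into an echo reply (type 0), allowing the sender to match replies to requests.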
In those cases in which the ICMP message refers to a prior datagram, the information field includes the entire IP header plus the first 64 bits of the data field of the original datagram. This enables the source host to match the incoming ICMP message with the prior datagram. The reason for including the first 64 bits of the data field is that this will enable the IP module in the host to determine which upper-level protocol or protocols were involved. In particular, the first 64 bits would include a portion of the TCP header or other transport-level header.

The destination unreachable message covers a number of contingencies. A router may return this message if it does not know how to reach the destination network. In some networks, an attached router may be able to determine if a particular host is unreachable and returns the message. The destination host itself may return this message if the user protocol or some higher-level service access point is unreachable. This could happen if the corresponding field in the IP header was set incorrectly. If the datagram specifies a source route that is unusable, a message is returned. Finally, if a router must fragment a datagram but the Don't Fragment flag is set, the datagram is discarded and a message is returned.

A router will return a time exceeded message if the lifetime of the datagram expires. A host will send this message if it cannot complete reassembly within a time limit.

A syntactic or semantic error in an IP header will cause a parameter problem message to be returned by a router or host. For example, an incorrect argument may be provided with an option. The Parameter field contains a pointer to the octet in the original header where the error was detected.

The source quench message provides a rudimentary form of flow control.
Either a router or a destination host may send this message to a source host, requesting that it reduce the rate at which it is sending traffic to the internet destination. On receipt of a source quench message, the source host should cut back the rate at which it is sending traffic to the specified destination until it no longer receives source quench messages. The source quench message can be used by a router or host that must discard datagrams because of a full buffer. In that case, the router or host will issue a source quench message for every datagram that it discards. In addition, a system may anticipate congestion and issue source quench messages when its buffers approach capacity. In that case, the datagram referred to in the source quench message may well be delivered. Thus, receipt of a source quench message does not imply delivery or nondelivery of the corresponding datagram.

A router sends a redirect message to a host on a directly connected network to advise the host of a better route to a particular destination. The following is an example, using Figure 18.8. Router R1 receives a datagram from host C on network Y, to which R1 is attached. R1 checks its routing table and obtains the address for the next router, R2, on the route to the datagram's internet destination network, Z. Because R2 and the host identified by the internet source address of the datagram are on the same network, R1 sends a redirect message to C. The redirect message advises the host to send its traffic for network Z directly to router R2, because this is a shorter path to the destination. The router forwards the original datagram to its internet destination (via R2). The address of R2 is contained in the parameter field of the redirect message.

The echo and echo reply messages provide a mechanism for testing that communication is possible between entities. The recipient of an echo message is obligated to return the message in an echo reply message.
An identifier and sequence number are associated with the echo message, to be matched in the echo reply message. The identifier might be used like a service access point to identify a particular session, and the sequence number might be incremented on each echo request sent.

The timestamp and timestamp reply messages provide a mechanism for sampling the delay characteristics of the internet. The sender of a timestamp message may include an identifier and sequence number in the parameters field and include the time that the message is sent (originate timestamp). The receiver records the time it received the message and the time that it transmits the reply message in the timestamp reply message. If the timestamp message is sent using strict source routing, then the delay characteristics of a particular route can be measured.

The address mask request and address mask reply messages are useful in an environment that includes subnets: they allow a host to learn the address mask for the LAN to which it connects. The host broadcasts an address mask request message on the LAN. The router on the LAN responds with an address mask reply message that contains the address mask.

Address Resolution Protocol (ARP)

Earlier in this chapter, we referred to the concepts of a global address (IP address) and an address that conforms to the addressing scheme of the network to which a host is attached (subnetwork address). For a local area network, the latter address is a MAC address, which provides a physical address for a host port attached to the LAN. Clearly, to deliver an IP datagram to a destination host, a mapping must be made from the IP address to the subnetwork address for that last hop. If a datagram traverses one or more routers between source and destination hosts, then the mapping must be done in the final router, which is attached to the same subnetwork as the destination host.
If a datagram is sent from one host to another on the same subnetwork, then the source host must do the mapping. In the following discussion, we use the term system to refer to the entity that does the mapping. For mapping from an IP address to a subnetwork address, a number of approaches are possible, including:

• Each system can maintain a local table of IP addresses and matching subnetwork addresses for possible correspondents. This approach does not accommodate easy and automatic additions of new hosts to the subnetwork.
• The subnetwork address can be a subset of the network portion of the IP address. However, the entire internet address is only 32 bits long, and for most subnetwork types (e.g., Ethernet) the subnetwork address field is longer than 32 bits.
• A centralized directory can be maintained on each subnetwork that contains the IP-subnet address mappings. This is a reasonable solution for many networks.
• An address resolution protocol can be used. This is a simpler approach than the use of a centralized directory and is well suited to LANs.

RFC 826 defines an Address Resolution Protocol (ARP), which allows dynamic distribution of the information needed to build tables to translate an IP address A into a 48-bit Ethernet address; the protocol can be used for any broadcast network. ARP exploits the broadcast property of a LAN; namely, that a transmission from any device on the network is received by all other devices on the network. ARP works as follows:

1. Each system on the LAN maintains a table of known IP-subnetwork address mappings.
2. When a subnetwork address is needed for an IP address, and the mapping is not found in the system's table, the system uses ARP directly on top of the LAN protocol (e.g., IEEE 802) to broadcast a request. The broadcast message contains the IP address for which a subnetwork address is needed.
3. Other hosts on the subnetwork listen for ARP messages and reply when a match occurs.
The reply includes both the IP and subnetwork addresses of the replying host.
4. The original request includes the requesting host's IP address and subnetwork address. Any interested host can copy this information into its local table, avoiding the need for later ARP messages.
5. The ARP message can also be used simply to broadcast a host's IP address and subnetwork address, for the benefit of others on the subnetwork.

18.5 IPv6

The Internet Protocol (IP) has been the foundation of the Internet and virtually all multivendor private internetworks. This protocol is reaching the end of its useful life and a new protocol, known as IPv6 (IP version 6), has been defined to ultimately replace IP.6

6 The currently deployed version of IP is IP version 4; previous versions of IP (1 through 3) were successively defined and replaced to reach IPv4. Version 5 is the number assigned to the Stream Protocol, a connection-oriented internet-layer protocol; hence the use of the label version 6.

We first look at the motivation for developing a new version of IP and then examine some of its details.

IP Next Generation

The driving motivation for the adoption of a new version of IP was the limitation imposed by the 32-bit address field in IPv4. With a 32-bit address field, it is possible in principle to assign 2^32 different addresses, which is over 4 billion possible addresses. One might think that this number of addresses was more than adequate to meet addressing needs on the Internet. However, in the late 1980s it was perceived that there would be a problem, and this problem began to manifest itself in the early 1990s. Reasons for the inadequacy of 32-bit addresses include the following:

• The two-level structure of the IP address (network number, host number) is convenient but wasteful of the address space. Once a network number is assigned to a network, all of the host-number addresses for that network number are assigned to that network. The address space for that network may
be sparsely used, but as far as the effective IP address space is concerned, if a network number is used, then all addresses within the network are used.
• The IP addressing model generally requires that a unique network number be assigned to each IP network whether or not it is actually connected to the Internet.
• Networks are proliferating rapidly. Most organizations boast multiple LANs, not just a single LAN system. Wireless networks have rapidly assumed a major role. The Internet itself has grown explosively for years.
• Growth of TCP/IP usage into new areas will result in a rapid growth in the demand for unique IP addresses. Examples include using TCP/IP to interconnect electronic point-of-sale terminals and for cable television receivers.
• Typically, a single IP address is assigned to each host. A more flexible arrangement is to allow multiple IP addresses per host. This, of course, increases the demand for IP addresses.

So the need for an increased address space dictated that a new version of IP was needed. In addition, IP is a very old protocol, and new requirements in the areas of address configuration, routing flexibility, and traffic support had been defined.

In response to these needs, the Internet Engineering Task Force (IETF) issued a call for proposals for a next generation IP (IPng) in July of 1992. A number of proposals were received, and by 1994 the final design for IPng emerged. A major milestone was reached with the publication of RFC 1752, "The Recommendation for the IP Next Generation Protocol," issued in January 1995. RFC 1752 outlines the requirements for IPng, specifies the PDU formats, and highlights the IPng approach in the areas of addressing, routing, and security. A number of other Internet documents defined details of the protocol, now officially called IPv6; these include an overall specification of IPv6 (RFC 2460), an RFC dealing with the addressing structure of IPv6 (RFC 2373), and numerous others.
IPv6 includes the following enhancements over IPv4:

• Expanded address space: IPv6 uses 128-bit addresses instead of the 32-bit addresses of IPv4. This is an increase of address space by a factor of 2^96. It has been pointed out [HIND95] that this allows on the order of 6 × 10^23 unique addresses per square meter of the surface of the earth. Even if addresses are very inefficiently allocated, this address space seems inexhaustible.
• Improved option mechanism: IPv6 options are placed in separate optional headers that are located between the IPv6 header and the transport-layer header. Most of these optional headers are not examined or processed by any router on the packet's path. This simplifies and speeds up router processing of IPv6 packets compared to IPv4 datagrams.7 It also makes it easier to add additional options.
• Address autoconfiguration: This capability provides for dynamic assignment of IPv6 addresses.
• Increased addressing flexibility: IPv6 includes the concept of an anycast address, for which a packet is delivered to just one of a set of nodes. The scalability of multicast routing is improved by adding a scope field to multicast addresses.
• Support for resource allocation: IPv6 enables the labeling of packets belonging to a particular traffic flow for which the sender requests special handling. This aids in the support of specialized traffic such as real-time video.

7 The protocol data unit for IPv6 is referred to as a packet rather than a datagram, which is the term used for IPv4 PDUs.

All of these features are explored in the remainder of this section.

IPv6 Structure

An IPv6 protocol data unit (known as a packet) has the following general form: a 40-octet IPv6 header, followed by zero or more extension headers, followed by the transport-level PDU.

The only header that is required is referred to simply as the IPv6 header.
This is of fixed size with a length of 40 octets, compared to 20 octets for the mandatory portion of the IPv4 header (Figure 18.6). The following extension headers have been defined:

• Hop-by-Hop Options header: Defines special options that require hop-by-hop processing
• Routing header: Provides extended routing, similar to IPv4 source routing
• Fragment header: Contains fragmentation and reassembly information
• Authentication header: Provides packet integrity and authentication
• Encapsulating Security Payload header: Provides privacy
• Destination Options header: Contains optional information to be examined by the destination node

The IPv6 standard recommends that, when multiple extension headers are used, they appear in the following order:

1. IPv6 header: Mandatory, must always appear first
2. Hop-by-Hop Options header
3. Destination Options header: For options to be processed by the first destination that appears in the IPv6 Destination Address field plus subsequent destinations listed in the Routing header
4. Routing header
5. Fragment header
6. Authentication header
7. Encapsulating Security Payload header
8. Destination Options header: For options to be processed only by the final destination of the packet

Figure 18.10 shows an example of an IPv6 packet that includes an instance of each header, except those related to security. Note that the IPv6 header and each extension header include a Next Header field. This field identifies the type of the immediately following header. If the next header is an extension header, then this field contains the type identifier of that header. Otherwise, this field contains the protocol identifier of the upper-layer protocol using IPv6 (typically a transport-level protocol), using the same values as the IPv4 Protocol field. In Figure 18.10, the upper-layer protocol is TCP; thus, the upper-layer data carried by the IPv6 packet consist of a TCP header followed by a block of application data.
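The Next Header chaining can be illustrated with a small sketch. The numeric type codes below are the IANA-assigned protocol numbers (Hop-by-Hop = 0, Routing = 43, Fragment = 44, and so on); the walk_headers helper and its chain argument are simplifications that stand in for parsing actual packet octets.

```python
# IANA-assigned Next Header values for the IPv6 extension headers.
EXTENSION_HEADERS = {
    0: 'Hop-by-Hop Options',
    43: 'Routing',
    44: 'Fragment',
    50: 'Encapsulating Security Payload',
    51: 'Authentication',
    60: 'Destination Options',
}
UPPER_LAYER = {6: 'TCP', 17: 'UDP'}  # same values as the IPv4 Protocol field

def walk_headers(first_next_header, chain):
    """Follow Next Header values until an upper-layer protocol is reached.
    `chain` maps each extension header type to the Next Header value it
    carries -- a stand-in for parsing the actual packet octets."""
    path, nh = [], first_next_header
    while nh in EXTENSION_HEADERS:
        path.append(EXTENSION_HEADERS[nh])
        nh = chain[nh]
    path.append(UPPER_LAYER.get(nh, 'protocol %d' % nh))
    return path

# The packet of Figure 18.10: hop-by-hop options, routing, fragment,
# and destination options headers, then a TCP segment.
chain = {0: 43, 43: 44, 44: 60, 60: 6}
print(walk_headers(0, chain))
# ['Hop-by-Hop Options', 'Routing', 'Fragment', 'Destination Options', 'TCP']
```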
We first look at the main IPv6 header and then examine each of the extensions in turn.

[Figure 18.10 IPv6 Packet with Extension Headers (containing a TCP segment): the mandatory 40-octet IPv6 header, followed by optional Hop-by-Hop Options, Routing, Fragment (8 octets), and Destination Options extension headers, then the TCP header (20 octets plus optional variable part) and application data; each header's Next Header field identifies the header that follows]

[Figure 18.11 IPv6 Header: Version, DS, ECN, Flow Label, Payload Length, Next Header, Hop Limit, Source Address, Destination Address]

IPv6 Header

The IPv6 header has a fixed length of 40 octets, consisting of the following fields (Figure 18.11):

• Version (4 bits): Internet protocol version number; the value is 6.
• DS/ECN (8 bits): Available for use by originating nodes and/or forwarding routers for differentiated services and congestion functions, as described for the IPv4 DS/ECN field.
• Flow Label (20 bits): May be used by a host to label those packets for which it is requesting special handling by routers within a network; discussed subsequently.
• Payload Length (16 bits): Length of the remainder of the IPv6 packet following the header, in octets. In other words, this is the total length of all of the extension headers plus the transport-level PDU.
• Next Header (8 bits): Identifies the type of header immediately following the IPv6 header; this will be either an IPv6 extension header or a higher-layer header, such as TCP or UDP.
• Hop Limit (8 bits): The remaining number of allowable hops for this packet. The hop limit is set to some desired maximum value by the source and decremented by 1 by each node that forwards the packet. The packet is discarded if Hop Limit is decremented to zero.
This is a simplification over the processing required for the Time to Live field of IPv4. The consensus was that the extra effort in accounting for time intervals in IPv4 added no significant value to the protocol. In fact, IPv4 routers, as a general rule, treat the Time to Live field as a hop limit field.
• Source Address (128 bits): The address of the originator of the packet.
• Destination Address (128 bits): The address of the intended recipient of the packet. This may not in fact be the intended ultimate destination if a Routing header is present, as explained subsequently.

Although the IPv6 header is longer than the mandatory portion of the IPv4 header (40 octets versus 20 octets), it contains fewer fields (8 versus 12). Thus, routers have less processing to do per header, which should speed up routing.

Flow Label

RFC 3697 defines a flow as a sequence of packets sent from a particular source to a particular (unicast, anycast, or multicast) destination for which the source desires special handling by the intervening routers. A flow is uniquely identified by the combination of a source address, a destination address, and a nonzero 20-bit flow label. Thus, all packets that are to be part of the same flow are assigned the same flow label by the source.

From the source's point of view, a flow typically will be a sequence of packets that are generated from a single application instance at the source and that have the same transfer service requirements. A flow may comprise a single TCP connection or even multiple TCP connections; an example of the latter is a file transfer application, which could have one control connection and multiple data connections. A single application may generate a single flow or multiple flows. An example of the latter is multimedia conferencing, which might have one flow for audio and one for graphic windows, each with different transfer requirements in terms of data rate, delay, and delay variation.
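Since a flow is identified by the (source address, destination address, flow label) triple, a router-side flow table can be sketched as a mapping keyed on that triple. The table structure and the handling attributes below are hypothetical, for illustration only.

```python
# Hypothetical router-side flow table, keyed on the identifying triple.
flow_table = {}

def register_flow(src, dst, label, handling):
    # The zero label is reserved to mean "no flow"; valid labels run
    # from 1 through 2**20 - 1.
    assert 1 <= label <= 2**20 - 1
    flow_table[(src, dst, label)] = handling

def lookup(src, dst, label):
    if label == 0:          # no special handling requested
        return None
    return flow_table.get((src, dst, label))

# Illustrative handling attributes for one real-time audio flow.
register_flow('2001:db8::1', '2001:db8::2', 0x5A5A,
              {'queue': 'realtime', 'max_delay_ms': 20})
print(lookup('2001:db8::1', '2001:db8::2', 0x5A5A))
# {'queue': 'realtime', 'max_delay_ms': 20}
```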
From the router's point of view, a flow is a sequence of packets that share attributes that affect how those packets are handled by the router. These include path, resource allocation, discard requirements, accounting, and security attributes. The router may treat packets from different flows differently in a number of ways, including allocating different buffer sizes, giving different precedence in terms of forwarding, and requesting different quality of service from networks.

There is no special significance to any particular flow label. Instead, the special handling to be provided for a packet flow must be declared in some other way. For example, a source might negotiate or request special handling ahead of time from routers by means of a control protocol, or at transmission time by information in one of the extension headers in the packet, such as the Hop-by-Hop Options header. Examples of special handling that might be requested include some sort of nondefault quality of service and some form of real-time service.

In principle, all of a user's requirements for a particular flow could be defined in an extension header and included with each packet. If we wish to leave the concept of flow open to include a wide variety of requirements, this design approach could result in very large packet headers. The alternative, adopted for IPv6, is the flow label, in which the flow requirements are defined prior to flow commencement and a unique flow label is assigned to the flow. In this case, the router must save flow requirement information about each flow.

The following rules apply to the flow label:

1. Hosts or routers that do not support the Flow Label field must set the field to zero when originating a packet, pass the field unchanged when forwarding a packet, and ignore the field when receiving a packet.
2.
All packets originating from a given source with the same nonzero Flow Label must have the same Destination Address, Source Address, Hop-by-Hop Options header contents (if this header is present), and Routing header contents (if this header is present). The intent is that a router can decide how to route and process the packet by simply looking up the flow label in a table and without examining the rest of the header.
3. The source assigns a flow label to a flow. New flow labels must be chosen (pseudo-) randomly and uniformly in the range 1 to 2^20 - 1, subject to the restriction that a source must not reuse a flow label for a new flow within the lifetime of the existing flow. The zero flow label is reserved to indicate that no flow label is being used.
This last point requires some elaboration. The router must maintain information about the characteristics of each active flow that may pass through it, presumably in some sort of table. To forward packets efficiently and rapidly, table lookup must be efficient. One alternative is to have a table with 2^20 (about 1 million) entries, one for each possible flow label; this imposes an unnecessary memory burden on the router. Another alternative is to have one entry in the table per active flow, include the flow label with each entry, and require the router to search the entire table each time a packet is encountered. This imposes an unnecessary processing burden on the router. Instead, most router designs are likely to use some sort of hash table approach. With this approach a moderate-sized table is used, and each flow entry is mapped into the table using a hashing function on the flow label. The hashing function might simply be the low-order few bits (say, 8 or 10) of the flow label or some simple calculation on the 20 bits of the flow label. In any case, the efficiency of the hash approach typically depends on the flow labels being uniformly distributed over their possible range.
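The hash table approach can be sketched in a few lines of Python, here using the low-order 10 bits of the flow label as the hash. This is a toy illustration, not how any real router implements it:

```python
class FlowTable:
    """Toy flow-state table hashed on the low-order bits of the flow label.
    A flow is identified by (source address, destination address, label),
    so each bucket stores that triple alongside the flow's state."""
    HASH_BITS = 10

    def __init__(self):
        self.buckets = [[] for _ in range(1 << self.HASH_BITS)]

    def _bucket(self, label):
        # Hash is simply the low-order HASH_BITS bits of the 20-bit label.
        return self.buckets[label & ((1 << self.HASH_BITS) - 1)]

    def insert(self, src, dst, label, state):
        self._bucket(label).append(((src, dst, label), state))

    def lookup(self, src, dst, label):
        for key, state in self._bucket(label):
            if key == (src, dst, label):
                return state
        return None   # no state: handle the packet as if the label were zero
```

Rule 3's requirement that labels be chosen uniformly at random is what keeps such buckets evenly loaded.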
Hence requirement number 3 in the preceding list.

IPv6 Addresses

IPv6 addresses are 128 bits in length. Addresses are assigned to individual interfaces on nodes, not to the nodes themselves. (In IPv6, a node is any device that implements IPv6; this includes hosts and routers.) A single interface may have multiple unique unicast addresses. Any of the unicast addresses associated with a node's interface may be used to uniquely identify that node. The combination of long addresses and multiple addresses per interface enables improved routing efficiency over IPv4. In IPv4, addresses generally do not have a structure that assists routing, and therefore a router may need to maintain a huge table of routing paths. Longer internet addresses allow for aggregating addresses by hierarchies of network, access provider, geography, corporation, and so on. Such aggregation should make for smaller routing tables and faster table lookups. The allowance for multiple addresses per interface would allow a subscriber that uses multiple access providers across the same interface to have separate addresses aggregated under each provider's address space. IPv6 allows three types of addresses:
• Unicast: An identifier for a single interface. A packet sent to a unicast address is delivered to the interface identified by that address.
• Anycast: An identifier for a set of interfaces (typically belonging to different nodes). A packet sent to an anycast address is delivered to one of the interfaces identified by that address (the "nearest" one, according to the routing protocols' measure of distance).
• Multicast: An identifier for a set of interfaces (typically belonging to different nodes). A packet sent to a multicast address is delivered to all interfaces identified by that address.

Hop-by-Hop Options Header

The Hop-by-Hop Options header carries optional information that, if present, must be examined by every router along the path.
This header consists of the following fields (Figure 18.12a):
• Next Header (8 bits): Identifies the type of header immediately following this header.
• Header Extension Length (8 bits): Length of this header in 64-bit units, not including the first 64 bits.
• Options: A variable-length field consisting of one or more option definitions. Each definition is in the form of three subfields: Option Type (8 bits), which identifies the option; Length (8 bits), which specifies the length of the Option Data field in octets; and Option Data, which is a variable-length specification of the option.
[Figure 18.12 IPv6 Extension Headers: (a) Hop-by-Hop Options header; Destination Options header; (b) Fragment header; (c) Generic Routing header; (d) Type 0 Routing header]
It is actually the lowest-order five bits of the Option Type field that are used to specify a particular option. The high-order two bits indicate the action to be taken by a node that does not recognize this option type, as follows:
• 00—Skip over this option and continue processing the header.
• 01—Discard the packet.
• 10—Discard the packet and send an ICMP Parameter Problem message to the packet's Source Address, pointing to the unrecognized Option Type.
• 11—Discard the packet and, only if the packet's Destination Address is not a multicast address, send an ICMP Parameter Problem message to the packet's Source Address, pointing to the unrecognized Option Type.
The third highest-order bit specifies whether the Option Data field does not change (0) or may change (1) en route from source to destination. Data that may change must be excluded from authentication calculations, as discussed in Chapter 21.
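These bit conventions can be decoded mechanically. A brief Python sketch (the names are ours, for illustration only):

```python
# Actions encoded in the two high-order bits of the Option Type field.
SKIP, DISCARD, DISCARD_SEND_ICMP, DISCARD_SEND_ICMP_IF_UNICAST = 0, 1, 2, 3

def decode_option_type(option_type: int):
    """Split an 8-bit Option Type into the three conventions described
    above: the action on an unrecognized option (high-order two bits),
    whether the Option Data may change en route (third highest-order
    bit), and the option identifier (low-order five bits)."""
    action = (option_type >> 6) & 0b11
    may_change = bool(option_type & 0b0010_0000)
    option_id = option_type & 0b0001_1111
    return action, may_change, option_id
```

For example, Option Type 0 (the Pad1 option) decodes to "skip if unrecognized, immutable en route, option 0."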
These conventions for the Option Type field also apply to the Destination Options header. Four hop-by-hop options have been specified so far:
• Pad1: Used to insert one byte of padding into the Options area of the header.
• PadN: Used to insert N bytes (N ≥ 2) of padding into the Options area of the header. The two padding options ensure that the header is a multiple of 8 bytes in length.
• Jumbo payload: Used to send IPv6 packets with payloads longer than 65,535 octets. The Option Data field of this option is 32 bits long and gives the length of the packet in octets, excluding the IPv6 header. For such packets, the Payload Length field in the IPv6 header must be set to zero, and there must be no Fragment header. With this option, IPv6 supports packet sizes of more than 4 billion octets. This facilitates the transmission of large video packets and enables IPv6 to make the best use of available capacity over any transmission medium.
• Router alert: Informs the router that the contents of this packet are of interest to the router and that it should handle any control data accordingly. The absence of this option in an IPv6 datagram informs the router that the packet does not contain information needed by the router and hence can be safely routed without further packet parsing. Hosts originating IPv6 packets are required to include this option in certain circumstances. The purpose of this option is to provide efficient support for protocols such as RSVP (Chapter 19) that generate packets that need to be examined by intermediate routers for purposes of traffic control. Rather than requiring the intermediate routers to look in detail at the extension headers of a packet, this option alerts the router when such attention is required.

Fragment Header

In IPv6, fragmentation may only be performed by source nodes, not by routers along a packet's delivery path.
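The 8-octet granularity that the Fragment header imposes (offsets are expressed in 64-bit units) can be illustrated with a simplified source-side fragmenter. This is a sketch under assumed conditions, not the full RFC 2460 procedure; construction of the actual headers is omitted:

```python
def fragment(payload: bytes, frag_space: int):
    """Split payload so that every fragment except the last is a
    multiple of 8 octets; frag_space is the room left for fragment
    data after all headers.  Returns (offset in 8-octet units,
    M flag, data) triples."""
    per_frag = (frag_space // 8) * 8       # round down to an 8-octet multiple
    assert per_frag > 0, "not enough room for any fragment data"
    frags, off = [], 0
    while off < len(payload):
        chunk = payload[off:off + per_frag]
        more = off + len(chunk) < len(payload)
        frags.append((off // 8, int(more), chunk))
        off += len(chunk)
    return frags
```

Because each fragment starts on an 8-octet boundary, the 13-bit offset field can address positions in an original payload of up to 65,528 octets.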
To take full advantage of the internetworking environment, a node must perform a path discovery algorithm that enables it to learn the smallest maximum transmission unit (MTU) supported by any network on the path. With this knowledge, the source node will fragment, as required, for each given destination address. Otherwise the source must limit all packets to 1280 octets, which is the minimum MTU that must be supported by each network. The Fragment header consists of the following fields (Figure 18.12b):
• Next Header (8 bits): Identifies the type of header immediately following this header.
• Reserved (8 bits): For future use.
• Fragment Offset (13 bits): Indicates where in the original packet the payload of this fragment belongs, measured in 64-bit units. This implies that fragments (other than the last fragment) must contain a data field that is a multiple of 64 bits long.
• Res (2 bits): Reserved for future use.
• M Flag (1 bit): 1 = more fragments; 0 = last fragment.
• Identification (32 bits): Intended to uniquely identify the original packet. The identifier must be unique for the packet's source address and destination address for the time during which the packet will remain in the internet. All fragments with the same identifier, source address, and destination address are reassembled to form the original packet.
The fragmentation algorithm is the same as that described in Section 18.3.

Routing Header

The Routing header contains a list of one or more intermediate nodes to be visited on the way to a packet's destination. All routing headers start with a 32-bit block consisting of four 8-bit fields, followed by routing data specific to a given routing type (Figure 18.12c). The four 8-bit fields are as follows:
• Next Header: Identifies the type of header immediately following this header.
• Header Extension Length: Length of this header in 64-bit units, not including the first 64 bits.
• Routing Type: Identifies a particular Routing header variant.
If a router does not recognize the Routing Type value, it must discard the packet.
• Segments Left: Number of route segments remaining; that is, the number of explicitly listed intermediate nodes still to be visited before reaching the final destination.
The only specific Routing header format defined in RFC 2460 is the Type 0 Routing header (Figure 18.12d). When using the Type 0 Routing header, the source node does not place the ultimate destination address in the IPv6 header. Instead, that address is the last address listed in the Routing header (Address[n] in Figure 18.12d), and the IPv6 header contains the destination address of the first desired router on the path. The Routing header will not be examined until the packet reaches the node identified in the IPv6 header. At that point, the IPv6 and Routing header contents are updated and the packet is forwarded. The update consists of placing the next address to be visited in the IPv6 header and decrementing the Segments Left field in the Routing header.

Destination Options Header

The Destination Options header carries optional information that, if present, is examined only by the packet's destination node. The format of this header is the same as that of the Hop-by-Hop Options header (Figure 18.12a).

18.6 VIRTUAL PRIVATE NETWORKS AND IP SECURITY

In today's distributed computing environment, the virtual private network (VPN) offers an attractive solution to network managers. In essence, a VPN consists of a set of computers that interconnect by means of a relatively unsecure network and that make use of encryption and special protocols to provide security. At each corporate site, workstations, servers, and databases are linked by one or more local area networks (LANs). The LANs are under the control of the network manager and can be configured and tuned for cost-effective performance.
The Internet or some other public network can be used to interconnect sites, providing a cost savings over the use of a private network and offloading the wide area network management task to the public network provider. That same public network provides an access path for telecommuters and other mobile employees to log on to corporate systems from remote sites. But the manager faces a fundamental requirement: security. Use of a public network exposes corporate traffic to eavesdropping and provides an entry point for unauthorized users. To counter this problem, the manager may choose from a variety of encryption and authentication packages and products. Proprietary solutions raise a number of problems. First, how secure is the solution? If proprietary encryption or authentication schemes are used, there may be little reassurance in the technical literature as to the level of security provided. Second is the question of compatibility. No manager wants to be limited in the choice of workstations, servers, routers, firewalls, and so on by a need for compatibility with the security facility. This is the motivation for the IP Security (IPSec) set of Internet standards.

IPSec

In 1994, the Internet Architecture Board (IAB) issued a report titled "Security in the Internet Architecture" (RFC 1636). The report stated the general consensus that the Internet needs more and better security and identified key areas for security mechanisms. Among these were the need to secure the network infrastructure from unauthorized monitoring and control of network traffic and the need to secure end-user-to-end-user traffic using authentication and encryption mechanisms. To provide security, the IAB included authentication and encryption as necessary security features in the next-generation IP, which has been issued as IPv6. Fortunately, these security capabilities were designed to be usable both with the current IPv4 and the future IPv6.
This means that vendors can begin offering these features now, and many vendors now have some IPSec capability in their products. The IPSec specification now exists as a set of Internet standards.

Applications of IPSec

IPSec provides the capability to secure communications across a LAN, across private and public WANs, and across the Internet. Examples of its use include the following:
• Secure branch office connectivity over the Internet: A company can build a secure virtual private network over the Internet or over a public WAN. This enables a business to rely heavily on the Internet and reduce its need for private networks, saving costs and network management overhead.
• Secure remote access over the Internet: An end user whose system is equipped with IP security protocols can make a local call to an Internet service provider (ISP) and gain secure access to a company network. This reduces the cost of toll charges for traveling employees and telecommuters.
• Establishing extranet and intranet connectivity with partners: IPSec can be used to secure communication with other organizations, ensuring authentication and confidentiality and providing a key exchange mechanism.
• Enhancing electronic commerce security: Even though some Web and electronic commerce applications have built-in security protocols, the use of IPSec enhances that security. IPSec guarantees that all traffic designated by the network administrator is both encrypted and authenticated, adding an additional layer of security to whatever is provided at the application layer.
The principal feature of IPSec that enables it to support these varied applications is that it can encrypt and/or authenticate all traffic at the IP level. Thus, all distributed applications, including remote logon, client/server, e-mail, file transfer, Web access, and so on, can be secured. Figure 18.13 shows a typical scenario of IPSec usage.
An organization maintains LANs at dispersed locations. Nonsecure IP traffic is conducted on each LAN. For traffic offsite, through some sort of private or public WAN, IPSec protocols are used. These protocols operate in networking devices, such as a router or firewall, that connect each LAN to the outside world. The IPSec networking device will typically encrypt and compress all traffic going into the WAN, and decrypt and decompress traffic coming from the WAN; these operations are transparent to workstations and servers on the LAN. Secure transmission is also possible with individual users who dial into the WAN. Such user workstations must implement the IPSec protocols to provide security.

Benefits of IPSec

Some of the benefits of IPSec are as follows:
• When IPSec is implemented in a firewall or router, it provides strong security that can be applied to all traffic crossing the perimeter. Traffic within a company or workgroup does not incur the overhead of security-related processing.
• IPSec in a firewall is resistant to bypass if all traffic from the outside must use IP and the firewall is the only means of entrance from the Internet into the organization.
• IPSec is below the transport layer (TCP, UDP) and so is transparent to applications. There is no need to change software on a user or server system when IPSec is implemented in the firewall or router. Even if IPSec is implemented in end systems, upper-layer software, including applications, is not affected.
• IPSec can be transparent to end users. There is no need to train users on security mechanisms, issue keying material on a per-user basis, or revoke keying material when users leave the organization.
• IPSec can provide security for individual users if needed. This is useful for offsite workers and for setting up a secure virtual subnetwork within an organization for sensitive applications.
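To give a flavor of the authentication side of IPSec, the following Python sketch computes and checks an HMAC-based integrity check value over a packet's bytes. This is illustrative only: real IPSec covers specific header fields and uses the algorithms and keys negotiated for a security association, which may differ from the HMAC-SHA-256 assumed here.

```python
import hashlib
import hmac

def icv(packet: bytes, key: bytes) -> bytes:
    """Integrity check value over the packet bytes (HMAC-SHA-256 here;
    the actual algorithm is a property of the security association)."""
    return hmac.new(key, packet, hashlib.sha256).digest()

def verify(packet: bytes, key: bytes, tag: bytes) -> bool:
    """Recompute the ICV and compare in constant time, so that the
    comparison itself does not leak timing information."""
    return hmac.compare_digest(icv(packet, key), tag)
```

Any modification of the packet in transit, even a single bit, causes verification to fail, which is the property that lets a receiver detect tampering by eavesdroppers on the public network.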
[Figure 18.13 An IP Security Scenario: user systems and LAN-attached networking devices with IPSec exchanging IPSec-protected traffic over a public (Internet) or private network]

IPSec Functions

IPSec provides three main facilities: an authentication-only function referred to as Authentication Header (AH), a combined authentication/encryption function called Encapsulating Security Payload (ESP), and a key exchange function. For VPNs, both authentication and encryption are generally desired, because it is important both to (1) assure that unauthorized users do not penetrate the virtual private network and (2) assure that eavesdroppers on the Internet cannot read messages sent over the virtual private network. Because both features are generally desirable, most implementations are likely to use ESP rather than AH. The key exchange function allows for manual exchange of keys as well as an automated scheme. IPSec is explored in Chapter 21.

18.7 RECOMMENDED READING AND WEB SITES

[RODR02] provides clear coverage of all of the topics in this chapter. Good coverage of internetworking and IPv4 can be found in [COME06] and [STEV94]. [SHAN02] and [KENT87] provide useful discussions of fragmentation. [LEE05] is a thorough technical description of IPv6. [KESH98] provides an instructive look at present and future router functionality. [METZ02] and [DOI04] describe the IPv6 anycast feature. For the reader interested in a more in-depth discussion of IP addressing, [SPOR03] offers a wealth of detail.
COME06 Comer, D. Internetworking with TCP/IP, Volume I: Principles, Protocols, and Architecture. Upper Saddle River, NJ: Prentice Hall, 2006.
DOI04 Doi, S., et al.
"IPv6 Anycast for Simple and Effective Communications." IEEE Communications Magazine, May 2004.
HUIT98 Huitema, C. IPv6: The New Internet Protocol. Upper Saddle River, NJ: Prentice Hall, 1998.
KENT87 Kent, C., and Mogul, J. "Fragmentation Considered Harmful." ACM Computer Communication Review, October 1987.
KESH98 Keshav, S., and Sharma, R. "Issues and Trends in Router Design." IEEE Communications Magazine, May 1998.
LEE05 Lee, H. Understanding IPv6. New York: Springer-Verlag, 2005.
METZ02 Metz, C. "IP Anycast." IEEE Internet Computing, March 2002.
RODR02 Rodriguez, A., et al. TCP/IP Tutorial and Technical Overview. Upper Saddle River, NJ: Prentice Hall, 2002.
SHAN02 Shannon, C.; Moore, D.; and Claffy, K. "Beyond Folklore: Observations on Fragmented Traffic." IEEE/ACM Transactions on Networking, December 2002.
SPOR03 Sportack, M. IP Addressing Fundamentals. Indianapolis, IN: Cisco Press, 2003.
STEV94 Stevens, W. TCP/IP Illustrated, Volume 1: The Protocols. Reading, MA: Addison-Wesley, 1994.

Recommended Web sites:
• IPv6: Information about IPv6 and related topics.
• IPv6 Working Group: Chartered by IETF to develop standards related to IPv6. The Web site includes all relevant RFCs and Internet drafts.
• IPv6 Forum: An industry consortium that promotes IPv6-related products. Includes a number of white papers and articles.

18.8 KEY TERMS, REVIEW QUESTIONS, AND PROBLEMS

Key Terms: broadcast, datagram lifetime, end system, fragmentation, intermediate system, Internet, Internet Control Message Protocol (ICMP), Internet Protocol (IP), internetworking, intranet, IPv4, IPv6, multicast, reassembly, router, segmentation, subnet, subnet mask, subnetwork, traffic class, unicast

Review Questions
18.1 Give some reasons for using fragmentation and reassembly.
18.2 List the requirements for an internetworking facility.
18.3 What are the pros and cons of limiting reassembly to the endpoint as compared to allowing en route reassembly?
18.4 Explain the function of the three flags in the IPv4 header.
18.5 How is the IPv4 header checksum calculated?
18.6 What is the difference between the traffic class and flow label fields in the IPv6 header?
18.7 Briefly explain the three types of IPv6 addresses.
18.8 What is the purpose of each of the IPv6 header types?

Problems
18.1 Although not explicitly stated, the Internet Protocol (IP) specification, RFC 791, defines the minimum packet size a network technology must support to allow IP to run over it.
a. Read Section 3.2 of RFC 791 to find out that value. What is it?
b. Discuss the reasons for adopting that specific value.
18.2 In the discussion of IP, it was mentioned that the identifier, don't fragment identifier, and time-to-live parameters are present in the Send primitive but not in the Deliver primitive because they are only of concern to IP. For each of these parameters, indicate whether it is of concern to the IP entity in the source, the IP entities in any intermediate routers, and the IP entity in the destination end systems. Justify your answer.
18.3 What is the header overhead in the IP protocol?
18.4 Describe some circumstances where it might be desirable to use source routing rather than let the routers make the routing decision.
18.5 Because of fragmentation, an IP datagram can arrive in several pieces, not necessarily in the correct order. The IP entity at the receiving end system must accumulate these fragments until the original datagram is reconstituted.
a. Consider that the IP entity creates a buffer for assembling the data field in the original datagram. As assembly proceeds, the buffer will contain blocks of data and "holes" between the data blocks. Describe an algorithm for reassembly based on this concept.
b. For the algorithm in part (a), it is necessary to keep track of the holes. Describe a simple mechanism for doing this.
18.6 A 4480-octet datagram is to be transmitted and needs to be fragmented because it will pass through an Ethernet with a maximum payload of 1500 octets. Show the Total Length, More Flag, and Fragment Offset values in each of the resulting fragments.
18.7 Consider a header that consists of 10 octets, with the checksum in the last two octets (this does not correspond to any actual header format), with the following content (in hexadecimal): 01 00 F6 F7 F4 F5 F2 03 00 00
a. Calculate the checksum. Show your calculation.
b. Show the resulting packet.
c. Verify the checksum.
18.8 The IP checksum needs to be recalculated at routers because of changes to the IP header, such as the lifetime field. It is possible to recalculate the checksum from scratch. Suggest a procedure that involves less calculation. Hint: Suppose that the value in octet k is changed by Z = new_value - old_value; consider the effect of this change on the checksum.
18.9 An IP datagram is to be fragmented. Which options in the option field need to be copied into the header of each fragment, and which need only be retained in the first fragment? Justify the handling of each option.
18.10 A transport-layer message consisting of 1500 bits of data and 160 bits of header is sent to an internet layer, which appends another 160 bits of header. This is then transmitted through two networks, each of which uses a 24-bit packet header. The destination network has a maximum packet size of 800 bits. How many bits, including headers, are delivered to the network-layer protocol at the destination?
18.11 The architecture suggested by Figure 18.2 is to be used. What functions could be added to the routers to alleviate some of the problems caused by the mismatched local and long-haul networks?
18.12 Should internetworking be concerned with a network's internal routing? Why or why not?
18.13 Provide the following parameter values for each of the network classes A, B, and C. Be sure to consider any special or reserved addresses in your calculations.
a.
Number of bits in network portion of address
b. Number of bits in host portion of address
c. Number of distinct networks allowed
d. Number of distinct hosts per network allowed
e. Integer range of first octet
18.14 What percentage of the total IP address space does each of the network classes represent?
18.15 What is the difference between the subnet mask for a Class A address with 16 bits for the subnet ID and a Class B address with 8 bits for the subnet ID?
18.16 Is the subnet mask 255.255.0.255 valid for a Class A address?
18.17 Given a network address of 192.168.100.0 and a subnet mask of 255.255.255.192,
a. How many subnets are created?
b. How many hosts are there per subnet?
18.18 Given a company with six individual departments and each department having ten computers or networked devices, what mask could be applied to the company network to provide the subnetting necessary to divide up the network equally?
18.19 In contemporary routing and addressing, the notation commonly used is called classless interdomain routing or CIDR. With CIDR, the number of bits in the mask is indicated in the following fashion: 192.168.100.0/24. This corresponds to a mask of 255.255.255.0. If this example would provide for 256 host addresses on the network, how many addresses are provided with the following?
a. 192.168.100.0/23
b. 192.168.100.0/25
18.20 Find out about your network. Using the command "ipconfig", "ifconfig", or "winipcfg", we can learn not only our IP address but other network parameters as well. Can you determine your mask, gateway, and the number of addresses available on your network?
18.21 Using your IP address and your mask, what is your network address? This is determined by converting the IP address and the mask to binary and then proceeding with a bitwise logical AND operation.
For example, given the address 172.16.45.0 and the mask 255.255.224.0, we would discover that the network address would be 172.16.32.0.
18.22 Compare the individual fields of the IPv4 header with the IPv6 header. Account for the functionality provided by each IPv4 field by showing how the same functionality is provided in IPv6.
18.23 Justify the recommended order in which IPv6 extension headers appear (i.e., why is the Hop-by-Hop Options header first, why is the Routing header before the Fragment header, and so on).
18.24 The IPv6 standard states that if a packet with a nonzero flow label arrives at a router and the router has no information for that flow label, the router should ignore the flow label and forward the packet.
a. What are the disadvantages of treating this event as an error, discarding the packet, and sending an ICMP message?
b. Are there situations in which routing the packet as if its flow label were zero will cause the wrong result? Explain.
18.25 The IPv6 flow mechanism assumes that the state associated with a given flow label is stored in routers, so they know how to handle packets that carry that flow label. A design requirement is to flush flow labels that are no longer being used (stale flow labels) from routers.
a. Assume that a source always sends a control message to all affected routers, deleting a flow label when the source finishes with that flow. In that case, how could a stale flow label persist?
b. Suggest router and source mechanisms to overcome the problem of stale flow labels.
18.26 The question arises as to which packets generated by a source should carry nonzero IPv6 flow labels. For some applications, the answer is obvious. Small exchanges of data should have a zero flow label because it is not worth creating a flow for a few packets. Real-time flows should have a flow label; such flows are a primary reason flow labels were created. A more difficult issue is what to do with peers sending large amounts of best-effort traffic (e.g., TCP connections).
Make a case for assigning a unique flow label to each long-term TCP connection. Make a case for not doing this.
18.27 The original IPv6 specifications combined the Traffic Class and Flow Label fields into a single 28-bit Flow Label field. This allowed flows to redefine the interpretation of different values of priority. Suggest reasons why the final specification includes the Priority field as a distinct field.
18.28 For Type 0 IPv6 routing, specify the algorithm for updating the IPv6 and Routing headers by intermediate nodes.

CHAPTER 19 INTERNETWORK OPERATION
19.1 Multicasting
19.2 Routing Protocols
19.3 Integrated Services Architecture
19.4 Differentiated Services
19.5 Service Level Agreements
19.6 IP Performance Metrics
19.7 Recommended Reading and Web Sites
19.8 Key Terms, Review Questions, and Problems

She occupied herself with studying a map on the opposite wall because she knew she would have to change trains at some point. Tottenham Court Road must be that point, an interchange from the black line to the red. This train would take her there, was bearing her there rapidly now, and at the station she would follow the signs, for signs there must be, to the Central Line going westward.
—King Solomon's Carpet, Barbara Vine (Ruth Rendell)

KEY POINTS
• The act of sending a packet from a source to multiple destinations is referred to as multicasting. Multicasting raises design issues in the areas of addressing and routing.
• Routing protocols in an internet function in a similar fashion to those used in packet-switching networks. An internet routing protocol is used to exchange information about reachability and traffic delays, allowing each router to construct a next-hop routing table for paths through the internet. Typically, relatively simple routing protocols are used between autonomous systems within a larger internet and more complex routing protocols are used within each autonomous system.
• The integrated services architecture is a response to the growing variety and volume of traffic experienced in the Internet and intranets. It provides a framework for the development of protocols such as RSVP to handle multimedia/multicast traffic and provides guidance to router vendors on the development of efficient techniques for handling a varied load.
• The differentiated services architecture is designed to provide a simple, easy-to-implement, low-overhead tool to support a range of network services that are differentiated on the basis of performance. Differentiated services are provided on the basis of a 6-bit label in the IP header, which classifies traffic in terms of the type of service to be given by routers for that traffic.

As the Internet and private internets grow in scale, a host of new demands march steadily into view. Low-volume TELNET conversations are leapfrogged by high-volume client/server applications. To this has been added more recently the tremendous volume of Web traffic, which is increasingly graphics intensive. Now real-time voice and video applications add to the burden. To cope with these demands, it is not enough to increase internet capacity. Sensible and effective methods for managing the traffic and controlling congestion are needed. Historically, IP-based internets have been able to provide a simple best-effort delivery service to all applications using an internet. But the needs of users have changed. A company may have spent millions of dollars installing an IP-based internet designed to transport data among LANs but now finds that new real-time, multimedia, and multicasting applications are not well supported by such a configuration. The only networking scheme designed from day one to support both traditional TCP and UDP traffic and real-time traffic is ATM.
However, reliance on ATM means either constructing a second networking infrastructure for real-time traffic or replacing the existing IP-based configuration with ATM, both of which are costly alternatives. Thus, there is a strong need to be able to support a variety of traffic, with a variety of quality-of-service (QoS) requirements, within the TCP/IP architecture. This chapter looks at the internetwork functions and services designed to meet this need.

We begin this chapter with a discussion of multicasting. Next we explore the issue of internetwork routing algorithms. Next, we look at the Integrated Services Architecture (ISA), which provides a framework for current and future internet services. Then we examine differentiated services. Finally, we introduce the topics of service level agreements and IP performance metrics. Refer to Figure 2.5 to see the position within the TCP/IP suite of the protocols discussed in this chapter.

19.1 MULTICASTING

Typically, an IP address refers to an individual host on a particular network. IP also accommodates addresses that refer to a group of hosts on one or more networks. Such addresses are referred to as multicast addresses, and the act of sending a packet from a source to the members of a multicast group is referred to as multicasting.

Multicasting has a number of practical applications. For example,

• Multimedia: A number of users “tune in” to a video or audio transmission from a multimedia source station.
• Teleconferencing: A group of workstations form a multicast group such that a transmission from any member is received by all other group members.
• Database: All copies of a replicated file or database are updated at the same time.
• Distributed computation: Intermediate results are sent to all participants.
• Real-time workgroup: Files, graphics, and messages are exchanged among active group members in real time.

Multicasting done within the scope of a single LAN segment is straightforward.
IEEE 802 and other LAN protocols include provision for MAC-level multicast addresses. A packet with a multicast address is transmitted on a LAN segment. Those stations that are members of the corresponding multicast group recognize the multicast address and accept the packet. In this case, only a single copy of the packet is ever transmitted. This technique works because of the broadcast nature of a LAN: A transmission from any one station is received by all other stations on the LAN.

In an internet environment, multicasting is a far more difficult undertaking. To see this, consider the configuration of Figure 19.1; a number of LANs are interconnected by routers. Routers connect to each other either over high-speed links or across a wide area network (network N4). A cost is associated with each link or network in each direction, indicated by the value shown leaving the router for that link or network.

[Figure 19.1 Example Configuration: networks N1 through N6 interconnected by routers A through F over links L1 through L5, with a multicast server on N1 and group members on N3, N5, and N6]

Suppose that the multicast server on network N1 is transmitting packets to a multicast address that represents the workstations indicated on networks N3, N5, N6. Suppose that the server does not know the location of the members of the multicast group. Then one way to assure that the packet is received by all members of the group is to broadcast a copy of each packet to each network in the configuration, over the least-cost route for each network. For example, one packet would be addressed to N3 and would traverse N1, link L3, and N3. Router B is responsible for translating the IP-level multicast address to a MAC-level multicast address before transmitting the MAC frame onto N3. Table 19.1 summarizes the number of packets generated on the various links and networks in order to transmit one packet to a multicast group by this method.
In this table, the source is the multicast server on network N1 in Figure 19.1; the multicast address includes the group members on N3, N5, and N6. Each column in the table refers to the path taken from the source host to a destination router attached to a particular destination network. Each row of the table refers to a network or link in the configuration of Figure 19.1. Each entry in the table gives the number of packets that traverse a given network or link for a given path.

Table 19.1 Traffic Generated by Various Multicasting Strategies

                  (a) Broadcast                (b) Multiple Unicast      (c) Multicast
         S→N2  S→N3  S→N5  S→N6  Total    S→N3  S→N5  S→N6  Total
  N1       1     1     1     1     4        1     1     1     3              1
  N2       1                       1
  N3             1                 1        1                 1              1
  N4                   1     1     2              1     1     2              2
  N5                   1           1              1           1              1
  N6                         1     1                    1     1              1
  L1
  L2
  L3             1                 1        1                 1              1
  L4                   1     1     2              1     1     2              1
  L5
  Total    2     3     4     4    13        3     4     4    11              8

A total of 13 copies of the packet are required for the broadcast technique.

Now suppose the source system knows the location of each member of the multicast group. That is, the source has a table that maps a multicast address into a list of networks that contain members of that multicast group. In that case, the source need only send packets to those networks that contain members of the group. We could refer to this as the multiple unicast strategy. Table 19.1 shows that in this case, 11 packets are required.

Both the broadcast and multiple unicast strategies are inefficient because they generate unnecessary copies of the source packet. In a true multicast strategy, the following method is used:

1. The least-cost path from the source to each network that includes members of the multicast group is determined. This results in a spanning tree¹ of the configuration. Note that this is not a full spanning tree of the configuration. Rather, it is a spanning tree that includes only those networks containing group members.

2. The source transmits a single packet along the spanning tree.

3.
The packet is replicated by routers only at branch points of the spanning tree.

Figure 19.2a shows the spanning tree for transmissions from the source to the multicast group, and Figure 19.2b shows this method in action. The source transmits a single packet over N1 to router D. D makes two copies of the packet, to transmit over links L3 and L4. B receives the packet from L3 and transmits it on N3, where it is read by members of the multicast group on the network. Meanwhile, C receives the packet sent on L4. It must now deliver that packet to both E and F. If network N4 were a broadcast network (e.g., an IEEE 802 LAN), then C would only need to transmit one instance of the packet for both routers to read. If N4 is a packet-switching WAN, then C must make two copies of the packet and address one to E and one to F. Each of these routers, in turn, retransmits the received packet on N5 and N6, respectively. As Table 19.1 shows, the multicast technique requires only eight copies of the packet.

[Figure 19.2 Multicast Transmission Example: (a) spanning tree from source to multicast group; (b) packets generated for multicast transmission]

Requirements for Multicasting

In ordinary unicast transmission over an internet, in which each datagram has a unique destination network, the task of each router is to forward the datagram along the shortest path from that router to the destination network. With multicast transmission, the router may be required to forward two or more copies of an incoming datagram.

¹ The concept of spanning tree was introduced in our discussion of bridges in Chapter 15. A spanning tree of a graph consists of all the nodes of the graph plus a subset of the links (edges) of the graph that provides connectivity (a path exists between any two nodes) with no closed loops (there is only one path between any two nodes).
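The per-strategy totals in Table 19.1 can be checked with a short sketch. The network and link names follow Figure 19.1; the least-cost paths and spanning-tree hops are taken from the discussion above (N4 appears twice in the multicast tree because router C, facing a packet-switching WAN, must address separate copies to E and F):

```python
# Least-cost paths from the source: each entry lists the networks and
# links one copy of the packet crosses to reach a destination network.
paths = {
    "N2": ["N1", "N2"],
    "N3": ["N1", "L3", "N3"],
    "N5": ["N1", "L4", "N4", "N5"],
    "N6": ["N1", "L4", "N4", "N6"],
}

def packet_count(dests):
    """Total packets when a separate copy follows each least-cost path."""
    return sum(len(paths[d]) for d in dests)

# Broadcast: one copy toward every network in the configuration.
broadcast = packet_count(["N2", "N3", "N5", "N6"])

# Multiple unicast: copies only toward networks containing group members.
multiple_unicast = packet_count(["N3", "N5", "N6"])

# True multicast: one packet per traversal of the source-rooted spanning
# tree (Figure 19.2); N4 is listed twice for the two copies C sends.
multicast_tree_hops = ["N1", "L3", "N3", "L4", "N4", "N4", "N5", "N6"]
multicast = len(multicast_tree_hops)

print(broadcast, multiple_unicast, multicast)  # 13 11 8
```

The three printed values reproduce the totals row of Table 19.1.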
In our example, routers D and C both must forward two copies of a single incoming datagram. Thus, we might expect that the overall functionality of multicast routing is more complex than unicast routing. The following is a list of required functions:

1. A convention is needed for identifying a multicast address. In IPv4, Class D addresses are reserved for this purpose. These are 32-bit addresses with 1110 as their high-order 4 bits, followed by a 28-bit group identifier. In IPv6, a 128-bit multicast address consists of an 8-bit prefix of all ones, a 4-bit flags field, a 4-bit scope field, and a 112-bit group identifier. The flags field currently indicates only whether this address is permanently assigned or not. The scope field indicates the scope of applicability of the address, ranging from a single network to global.

2. Each node (router or source node participating in the routing algorithm) must translate between an IP multicast address and a list of networks that contain members of this group. This information allows the node to construct a shortest-path spanning tree to all of the networks containing group members.

3. A router must translate between an IP multicast address and a network multicast address in order to deliver a multicast IP datagram on the destination network. For example, in IEEE 802 networks, a MAC-level address is 48 bits long; if the highest-order bit is 1, then it is a multicast address. Thus, for multicast delivery, a router attached to an IEEE 802 network must translate a 32-bit IPv4 or a 128-bit IPv6 multicast address into a 48-bit IEEE 802 MAC-level multicast address.

4. Although some multicast addresses may be assigned permanently, the more usual case is that multicast addresses are generated dynamically and that individual hosts may join and leave multicast groups dynamically.
Thus, a mechanism is needed by which an individual host informs routers attached to the same network as itself of its inclusion in and exclusion from a multicast group. IGMP, described subsequently, provides this mechanism.

5. Routers must exchange two sorts of information. First, routers need to know which networks include members of a given multicast group. Second, routers need sufficient information to calculate the shortest path to each network containing group members. These requirements imply the need for a multicast routing protocol. A discussion of such protocols is beyond the scope of this book.

6. A routing algorithm is needed to calculate shortest paths to all group members.

7. Each router must determine multicast routing paths on the basis of both source and destination addresses.

The last point is a subtle consequence of the use of multicast addresses. To illustrate the point, consider again Figure 19.1. If the multicast server transmits a unicast packet addressed to a host on network N5, the packet is forwarded by router D to C, which then forwards the packet to E. Similarly, a packet addressed to a host on network N3 is forwarded by D to B. But now suppose that the server transmits a packet with a multicast address that includes hosts on N3, N5, and N6. As we have discussed, D makes two copies of the packet and sends one to B and one to C. What will C do when it receives a packet with such a multicast address? C knows that this packet is intended for networks N3, N5, and N6. A simple-minded approach would be for C to calculate the shortest path to each of these three networks. This produces the shortest-path spanning tree shown in Figure 19.3. As a result, C sends two copies of the packet out over N4, one intended for N5 and one intended for N6. But it also sends a copy of the packet to B for delivery on N3. Thus B will receive two copies of the packet, one from D and one from C.
This is clearly not what was intended by the host on N1 when it launched the packet. To avoid unnecessary duplication of packets, each router must route packets on the basis of both source and multicast destination. When C receives a packet intended for the multicast group from a source on N1, it must calculate the spanning tree with N1 as the root (shown in Figure 19.2a) and route on the basis of that spanning tree.

[Figure 19.3 Spanning Tree from Router C to Multicast Group: C reaches N3 via L5 and router B, and reaches N5 and N6 via N4 and routers E and F]

Internet Group Management Protocol (IGMP)

IGMP, defined in RFC 3376, is used by hosts and routers to exchange multicast group membership information over a LAN. IGMP takes advantage of the broadcast nature of a LAN to provide an efficient technique for the exchange of information among multiple hosts and routers. In general, IGMP supports two principal operations:

1. Hosts send messages to routers to subscribe to and unsubscribe from a multicast group defined by a given multicast address.
2. Routers periodically check which multicast groups are of interest to which hosts.

IGMP is currently at version 3. In IGMPv1, hosts could join a multicast group and routers used a timer to unsubscribe group members. IGMPv2 enabled a host to request to be unsubscribed from a group. The first two versions used essentially the following operational model:

• Receivers have to subscribe to multicast groups.
• Sources do not have to subscribe to multicast groups.
• Any host can send traffic to any multicast group.

This paradigm is very general, but it also has some weaknesses:

1. Spamming of multicast groups is easy. Even if there are application-level filters to drop unwanted packets, these packets still consume valuable resources in the network and in the receiver that has to process them.

2. Establishment of the multicast distribution trees is problematic. This is mainly because the location of sources is not known.

3.
Finding globally unique multicast addresses is difficult. It is always possible that another multicast group uses the same multicast address.

IGMPv3 addresses these weaknesses by

1. Allowing hosts to specify the list of hosts from which they want to receive traffic. Traffic from other hosts is blocked at routers.
2. Allowing hosts to block packets that come from sources that send unwanted traffic.

The remainder of this section discusses IGMPv3.

IGMP Message Format

All IGMP messages are transmitted in IP datagrams. The current version defines two message types: Membership Query and Membership Report.

A Membership Query message is sent by a multicast router. There are three subtypes: a general query, used to learn which groups have members on an attached network; a group-specific query, used to learn if a particular group has any members on an attached network; and a group-and-source-specific query, used to learn if any attached device desires reception of packets sent to a specified multicast address, from any of a specified list of sources. Figure 19.4a shows the message format, which consists of the following fields:

• Type: Defines this message type.
• Max Response Code: Indicates the maximum allowed time before sending a responding report in units of 1/10 second.
• Checksum: An error-detecting code, calculated as the 16-bit ones complement addition of all the 16-bit words in the message. For purposes of computation, the Checksum field is itself initialized to a value of zero. This is the same checksum algorithm used in IPv4.
• Group Address: Zero for a general query message; a valid IP multicast group address when sending a group-specific query or group-and-source-specific query.
• S Flag: When set to one, indicates to any receiving multicast routers that they are to suppress the normal timer updates they perform upon hearing a query.
• QRV (querier’s robustness variable): If nonzero, the QRV field contains the RV value used by the querier (i.e., the sender of the query). Routers adopt the RV value from the most recently received query as their own RV value, unless that most recently received RV was zero, in which case the receivers use the default value or a statically configured value. The RV dictates how many times a host will retransmit a report to assure that it is not missed by any attached multicast routers.

• QQIC (querier’s query interval code): Specifies the QI value used by the querier, which is a timer for sending multiple queries. Multicast routers that are not the current querier adopt the QI value from the most recently received query as their own QI value, unless that most recently received QI was zero, in which case the receiving routers use the default QI value.

[Figure 19.4 IGMPv3 Message Formats: (a) membership query message (Type 17), with Max Resp Code, Checksum, Group Address, Resv/S/QRV, QQIC, Number of Sources, and Source Addresses fields; (b) membership report message (Type 34), with Checksum, Number of Group Records, and Group Records fields; (c) group record, with Record Type, Aux Data Len, Number of Sources, Multicast Address, Source Addresses, and Auxiliary Data fields]

• Number of Sources: Specifies how many source addresses are present in this query. This value is nonzero only for a group-and-source-specific query.

• Source Addresses: If the number of sources is N, then there are N 32-bit unicast addresses appended to the message.

A Membership Report message consists of the following fields:

• Type: Defines this message type.
• Checksum: An error-detecting code, calculated as the 16-bit ones complement addition of all the 16-bit words in the message.
• Number of Group Records: Specifies how many group records are present in this report.
• Group Records: If the number of group records is M, then there are M group records appended to the message.

A group record includes the following fields:

• Record Type: Defines this record type, as described subsequently.
• Aux Data Length: Length of the Auxiliary Data field, in 32-bit words.
• Number of Sources: Specifies how many source addresses are present in this record.
• Multicast Address: The IP multicast address to which this record pertains.
• Source Addresses: If the number of sources is N, then there are N 32-bit unicast addresses appended to the message.
• Auxiliary Data: Additional information pertaining to this record. Currently, no auxiliary data values are defined.

IGMP Operation

The objective of each host in using IGMP is to make itself known as a member of a group with a given multicast address to other hosts on the LAN and to all routers on the LAN. IGMPv3 introduces the ability for hosts to signal group membership with filtering capabilities with respect to sources. A host can either signal that it wants to receive traffic from all sources sending to a group except for some specific sources (called EXCLUDE mode) or that it wants to receive traffic only from some specific sources sending to the group (called INCLUDE mode).

To join a group, a host sends an IGMP membership report message, in which the group address field is the multicast address of the group. This message is sent in an IP datagram with the same multicast destination address. In other words, the Group Address field of the IGMP message and the Destination Address field of the encapsulating IP header are the same. All hosts that are currently members of this multicast group will receive the message and learn of the new group member. Each router attached to the LAN must listen to all IP multicast addresses in order to hear all reports.
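The field layout of Figure 19.4a and the ones-complement checksum rule can be sketched in a few lines. The sketch below builds a general query; the Max Resp Code, QRV, and QQIC values chosen here are illustrative, not mandated by the text:

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit ones-complement sum of all 16-bit words (RFC 1071); the
    Checksum field itself is zero while the sum is computed."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

def general_query(max_resp_code=100, s_flag=0, qrv=2, qqic=125):
    """IGMPv3 general query: Type 0x11, zero Group Address, no sources."""
    resv_s_qrv = (s_flag << 3) | qrv               # Resv(4) | S(1) | QRV(3)
    body = struct.pack("!BBH4sBBH",
                       0x11, max_resp_code, 0,     # checksum zero for now
                       b"\x00" * 4,                # group address: general query
                       resv_s_qrv, qqic, 0)        # number of sources = 0
    csum = internet_checksum(body)
    return body[:2] + struct.pack("!H", csum) + body[4:]

msg = general_query()
print(len(msg))               # 12 octets with an empty source list
print(internet_checksum(msg)) # 0: a valid message sums to all ones
```

Verification exploits the ones-complement property: summing the message with its checksum in place yields all ones, so the function returns zero.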
To maintain a valid current list of active group addresses, a multicast router periodically issues an IGMP general query message, sent in an IP datagram with an all-hosts multicast address. Each host that still wishes to remain a member of one or more multicast groups must read datagrams with the all-hosts address. When such a host receives the query, it must respond with a report message for each group to which it claims membership.

Note that the multicast router does not need to know the identity of every host in a group. Rather, it needs to know that there is at least one group member still active. Therefore, each host in a group that receives a query sets a timer with a random delay. Any host that hears another host claim membership in the group will cancel its own report. If no other report is heard and the timer expires, a host sends a report. With this scheme, only one member of each group should provide a report to the multicast router.

When a host leaves a group, it sends a leave group message to the all-routers static multicast address. This is accomplished by sending a membership report message with the INCLUDE option and a null list of source addresses; that is, no sources are to be included, effectively leaving the group. When a router receives such a message for a group that has group members on the reception interface, it needs to determine if there are any remaining group members. For this purpose, the router uses the group-specific query message.

Group Membership with IPv6

IGMP was defined for operation with IPv4 and makes use of 32-bit addresses. IPv6 internets need this same functionality. Rather than define a separate version of IGMP for IPv6, its functions have been incorporated into the new version of the Internet Control Message Protocol (ICMPv6). ICMPv6 includes all of the functionality of ICMPv4 and IGMP.
For multicast support, ICMPv6 includes both a group-membership query and a group-membership report message, which are used in the same fashion as in IGMP.

19.2 ROUTING PROTOCOLS

The routers in an internet are responsible for receiving and forwarding packets through the interconnected set of networks. Each router makes routing decisions based on knowledge of the topology and traffic/delay conditions of the internet. In a simple internet, a fixed routing scheme is possible. In more complex internets, a degree of dynamic cooperation is needed among the routers. In particular, the router must avoid portions of the network that have failed and should avoid portions of the network that are congested. To make such dynamic routing decisions, routers exchange routing information using a special routing protocol for that purpose. Information is needed about the status of the internet, in terms of which networks can be reached by which routes, and the delay characteristics of various routes.

In considering the routing function, it is important to distinguish two concepts:

• Routing information: Information about the topology and delays of the internet
• Routing algorithm: The algorithm used to make a routing decision for a particular datagram, based on current routing information

Autonomous Systems

To proceed with our discussion of routing protocols, we need to introduce the concept of an autonomous system. An autonomous system (AS) exhibits the following characteristics:

1. An AS is a set of routers and networks managed by a single organization.
2. An AS consists of a group of routers exchanging information via a common routing protocol.
3. Except in times of failure, an AS is connected (in a graph-theoretic sense); that is, there is a path between any pair of nodes.

A shared routing protocol, which we shall refer to as an interior router protocol (IRP), passes routing information between routers within an AS.
The protocol used within the AS does not need to be implemented outside of the system. This flexibility allows IRPs to be custom tailored to specific applications and requirements. It may happen, however, that an internet will be constructed of more than one AS. For example, all of the LANs at a site, such as an office complex or campus, could be linked by routers to form an AS. This system might be linked through a wide area network to other ASs. The situation is illustrated in Figure 19.5. In this case, the routing algorithms and information in routing tables used by routers in different ASs may differ. Nevertheless, the routers in one AS need at least a minimal level of information concerning networks outside the system that can be reached.

[Figure 19.5 Application of Exterior and Interior Routing Protocols: autonomous system 1 (routers R1 through R4, subnetworks 1.1 through 1.4) and autonomous system 2 (routers R5 through R8, subnetworks 2.1 through 2.4), with an interior router protocol operating within each AS and an exterior router protocol operating between them]

We refer to the protocol used to pass routing information between routers in different ASs as an exterior router protocol (ERP).² We can expect that an ERP will need to pass less information than an IRP, for the following reason. If a datagram is to be transferred from a host in one AS to a host in another AS, a router in the first system need only determine the target AS and devise a route to get into that target system. Once the datagram enters the target AS, the routers within that system can cooperate to deliver the datagram; the ERP is not concerned with, and does not know about, the details of the route followed within the target AS.

In the remainder of this section, we look at what are perhaps the most important examples of these two types of routing protocols: BGP and OSPF. But first, it is useful to look at a different way of characterizing routing protocols.
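This division of labor between interior and exterior protocols can be illustrated with a toy forwarding lookup. All names below are hypothetical, loosely following Figure 19.5: the IRP supplies next hops for subnetworks inside the AS, while the ERP only identifies which AS a remote network belongs to and which border router leads toward it:

```python
# Hypothetical tables for a router inside autonomous system 1.
intra_as = {"net1.2": "R2", "net1.3": "R3"}      # IRP-derived next hops
net_to_as = {"net2.1": "AS2", "net2.3": "AS2"}   # ERP-derived reachability
as_gateway = {"AS2": "R5"}                       # border router toward each AS

def next_hop(dest_net):
    """Route within the AS when possible; otherwise hand the datagram
    toward the border router for the destination's autonomous system."""
    if dest_net in intra_as:
        return intra_as[dest_net]
    target_as = net_to_as[dest_net]   # the ERP only identifies the target AS
    return as_gateway[target_as]      # routing inside that AS is its problem

print(next_hop("net1.3"))   # R3: delivered by the interior protocol
print(next_hop("net2.3"))   # R5: forwarded toward AS2's border router
```

Note that the exterior table never records paths inside AS2, matching the observation that the ERP does not know the details of the route followed within the target AS.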
Approaches to Routing

Internet routing protocols employ one of three approaches to gathering and using routing information: distance-vector routing, link-state routing, and path-vector routing.

Distance-vector routing requires that each node (router or host that implements the routing protocol) exchange information with its neighboring nodes. Two nodes are said to be neighbors if they are both directly connected to the same network. This is the approach used in the first-generation routing algorithm for ARPANET, as described in Section 12.2. For this purpose, each node maintains a vector of link costs for each directly attached network and distance and next-hop vectors for each destination. The relatively simple Routing Information Protocol (RIP) uses this approach.

Distance-vector routing requires the transmission of a considerable amount of information by each router. Each router must send a distance vector to all of its neighbors, and that vector contains the estimated path cost to all networks in the configuration. Furthermore, when there is a significant change in a link cost or when a link is unavailable, it may take a considerable amount of time for this information to propagate through the internet.

Link-state routing is designed to overcome the drawbacks of distance-vector routing. When a router is initialized, it determines the link cost on each of its network interfaces. The router then advertises this set of link costs to all other routers in the internet topology, not just neighboring routers. From then on, the router monitors its link costs. Whenever there is a significant change (a link cost increases or decreases substantially, a new link is created, an existing link becomes unavailable), the router again advertises its set of link costs to all other routers in the configuration.
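Once a router holds every other router's advertised link costs, it can compute its routes entirely locally. The sketch below, using a hypothetical three-router topology, applies Dijkstra's algorithm to derive the first hop toward each destination, which is exactly what a next-hop routing table records:

```python
import heapq

def first_hops(graph, source):
    """Dijkstra over the full topology (reconstructable by any link-state
    router), returning the first hop toward each reachable destination."""
    dist = {source: 0}
    hops = {}
    pq = [(0, source, None)]              # (cost, node, first hop used)
    while pq:
        d, node, fh = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue                      # stale queue entry
        for nb, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                # Leaving the source, the first hop is the neighbor itself;
                # otherwise it is inherited from the path being extended.
                hops[nb] = nb if node == source else fh
                heapq.heappush(pq, (nd, nb, hops[nb]))
    return hops

graph = {
    "R1": {"R2": 1, "R3": 4},
    "R2": {"R1": 1, "R3": 1},
    "R3": {"R1": 4, "R2": 1},
}
print(first_hops(graph, "R1"))   # {'R2': 'R2', 'R3': 'R2'}
```

R1 reaches R3 via R2 (cost 2) rather than directly (cost 4), so its table lists R2 as the first hop for both destinations.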
Because each router receives the link costs of all routers in the configuration, each router can construct the topology of the entire configuration and then calculate the shortest path to each destination network. Having done this, the router can construct its routing table, listing the first hop to each destination. Because the router has a representation of the entire network, it does not use a distributed version of a routing algorithm, as is done in distance-vector routing. Rather, the router can use any routing algorithm to determine the shortest paths. In practice, Dijkstra’s algorithm is used. The Open Shortest Path First (OSPF) protocol is an example of a routing protocol that uses link-state routing. The second-generation routing algorithm for ARPANET also uses this approach.

² In the literature, the terms interior gateway protocol (IGP) and exterior gateway protocol (EGP) are often used for what are referred to here as IRP and ERP. However, because the terms IGP and EGP also refer to specific protocols, we avoid their use to define the general concepts.

Both link-state and distance-vector approaches have been used for interior router protocols. Neither approach is effective for an exterior router protocol. In a distance-vector routing protocol, each router advertises to its neighbors a vector listing each network it can reach, together with a distance metric associated with the path to that network. Each router builds up a routing database on the basis of these neighbor updates but does not know the identity of intermediate routers and networks on any particular path. There are two problems with this approach for an exterior router protocol:

1. This distance-vector protocol assumes that all routers share a common distance metric with which to judge router preferences. This may not be the case among different ASs.
If different routers attach different meanings to a given metric, it may not be possible to create stable, loop-free routes.

2. A given AS may have different priorities from other ASs and may have restrictions that prohibit the use of certain other ASs. A distance-vector algorithm gives no information about the ASs that will be visited along a route.

In a link-state routing protocol, each router advertises its link metrics to all other routers. Each router builds up a picture of the complete topology of the configuration and then performs a routing calculation. This approach also has problems if used in an exterior router protocol:

1. Different ASs may use different metrics and have different restrictions. Although the link-state protocol does allow a router to build up a picture of the entire topology, the metrics used may vary from one AS to another, making it impossible to perform a consistent routing algorithm.

2. The flooding of link-state information to all routers implementing an exterior router protocol across multiple ASs may be unmanageable.

An alternative, known as path-vector routing, is to dispense with routing metrics and simply provide information about which networks can be reached by a given router and the ASs that must be crossed to get there. The approach differs from a distance-vector algorithm in two respects: First, the path-vector approach does not include a distance or cost estimate. Second, each block of routing information lists all of the ASs visited in order to reach the destination network by this route.

Because a path vector lists the ASs that a datagram must traverse if it follows this route, the path information enables a router to perform policy routing. That is, a router may decide to avoid a particular path in order to avoid transiting a particular AS. For example, information that is confidential may be limited to certain kinds of ASs.
Or a router may have information about the performance or quality of the portion of the internet that is included in an AS that leads the router to avoid that AS. Examples of performance or quality metrics include link speed, capacity, tendency to become congested, and overall quality of operation. Another criterion that could be used is minimizing the number of transit ASs.

Table 19.2 BGP-4 Messages

Open: Used to open a neighbor relationship with another router.
Update: Used to (1) transmit information about a single route and/or (2) list multiple routes to be withdrawn.
Keepalive: Used to (1) acknowledge an Open message and (2) periodically confirm the neighbor relationship.
Notification: Sent when an error condition is detected.

Border Gateway Protocol

The Border Gateway Protocol (BGP) was developed for use in conjunction with internets that employ the TCP/IP suite, although the concepts are applicable to any internet. BGP has become the preferred exterior router protocol for the Internet.

Functions

BGP was designed to allow routers, called gateways in the standard, in different autonomous systems (ASs) to cooperate in the exchange of routing information. The protocol operates in terms of messages, which are sent over TCP connections. The repertoire of messages is summarized in Table 19.2. The current version of BGP is known as BGP-4 (RFC 1771).

Three functional procedures are involved in BGP:

• Neighbor acquisition
• Neighbor reachability
• Network reachability

Two routers are considered to be neighbors if they are attached to the same network. If the two routers are in different autonomous systems, they may wish to exchange routing information. For this purpose, it is necessary first to perform neighbor acquisition. In essence, neighbor acquisition occurs when two neighboring routers in different autonomous systems agree to exchange routing information regularly.
A formal acquisition procedure is needed because one of the routers may not wish to participate. For example, the router may be overburdened and so may not want to be responsible for traffic coming in from outside the system. In the neighbor acquisition process, one router sends a request message to the other, which may either accept or refuse the offer. The protocol does not address the issue of how one router knows the address or even the existence of another router, nor how it decides that it needs to exchange routing information with that particular router. These issues must be dealt with at configuration time or by active intervention of a network manager.

To perform neighbor acquisition, two routers send Open messages to each other after a TCP connection is established. If each router accepts the request, it returns a Keepalive message in response.

Once a neighbor relationship is established, the neighbor reachability procedure is used to maintain the relationship. Each partner needs to be assured that the other partner still exists and is still engaged in the neighbor relationship. For this purpose, the two routers periodically issue Keepalive messages to each other.

19.2 / ROUTING PROTOCOLS 619

The final procedure specified by BGP is network reachability. Each router maintains a database of the networks that it can reach and the preferred route for reaching each network. When a change is made to this database, the router issues an Update message that is broadcast to all other routers implementing BGP. Because the Update message is broadcast, all BGP routers can build up and maintain their routing information.

BGP Messages

Figure 19.6 illustrates the formats of all of the BGP messages.
Each message begins with a 19-octet header containing three fields, as indicated by the shaded portion of each message in the figure:

[Figure 19.6 BGP Message Formats — all four messages share a 16-octet Marker field, a 2-octet Length field, and a 1-octet Type field. The Open message adds Version (1 octet), My Autonomous System (2), Hold Time (2), BGP Identifier (4), Optional Parameters Length (1), and Optional Parameters (variable). The Update message adds Unfeasible Routes Length (2), Withdrawn Routes (variable), Total Path Attributes Length (2), Path Attributes (variable), and Network Layer Reachability Information (variable). The Keepalive message consists of the header alone. The Notification message adds Error Code (1), Error Subcode (1), and Data (variable).]

• Marker: Reserved for authentication. The sender may insert a value in this field that would be used as part of an authentication mechanism to enable the recipient to verify the identity of the sender.
• Length: Length of the message in octets.
• Type: Type of message: Open, Update, Notification, or Keepalive.

To acquire a neighbor, a router first opens a TCP connection to the neighbor router of interest. It then sends an Open message. This message identifies the AS to which the sender belongs and provides the IP address of the router. It also includes a Hold Time parameter, which indicates the number of seconds that the sender proposes for the value of the Hold Timer. If the recipient is prepared to open a neighbor relationship, it calculates a value of the Hold Timer that is the minimum of its own Hold Time and the Hold Time in the Open message. This calculated value is the maximum number of seconds that may elapse between the receipt of successive Keepalive and/or Update messages by the sender.

The Keepalive message consists simply of the header. Each router issues these messages to each of its peers often enough to prevent the Hold Timer from expiring.
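The Hold Timer negotiation described above — each side adopts the smaller of the two proposed Hold Times — can be sketched as follows. The function names and the one-third keepalive interval are illustrative conventions, not mandated names from the BGP specification.

```python
def negotiate_hold_time(local_proposal: int, peer_proposal: int) -> int:
    """Session Hold Time is the minimum of our configured Hold Time and
    the value carried in the peer's Open message. A result of 0 means
    keepalives are disabled entirely."""
    hold_time = min(local_proposal, peer_proposal)
    # BGP-4 requires the Hold Time to be either 0 or at least 3 seconds;
    # an unacceptable value would trigger a Notification (Open message error).
    if 0 < hold_time < 3:
        raise ValueError("unacceptable Hold Time")
    return hold_time

def keepalive_interval(hold_time: int) -> float:
    """Keepalives must arrive often enough that the peer's Hold Timer never
    expires; one-third of the negotiated Hold Time is a common choice."""
    return hold_time / 3.0

print(negotiate_hold_time(90, 180))   # 90: the smaller proposal wins
print(keepalive_interval(90))         # 30.0 seconds between Keepalives
```

With a 90-second Hold Timer and Keepalives every 30 seconds, two consecutive Keepalive losses can be tolerated before the neighbor relationship is torn down.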
The Update message communicates two types of information:

• Information about a single route through the internet. This information is available to be added to the database of any recipient router.
• A list of routes previously advertised by this router that are being withdrawn.

An Update message may contain one or both types of information. Information about a single route through the network involves three fields: the Network Layer Reachability Information (NLRI) field, the Total Path Attributes Length field, and the Path Attributes field. The NLRI field consists of a list of identifiers of networks that can be reached by this route. Each network is identified by its IP address, which is actually a portion of a full IP address. Recall that an IP address is a 32-bit quantity of the form {network, host}. The left-hand or prefix portion of this quantity identifies a particular network.

The Path Attributes field contains a list of attributes that apply to this particular route. The following are the defined attributes:

• Origin: Indicates whether this information was generated by an interior router protocol (e.g., OSPF) or an exterior router protocol (in particular, BGP).
• AS_Path: A list of the ASs that are traversed for this route.
• Next_Hop: The IP address of the border router that should be used as the next hop to the destinations listed in the NLRI field.
• Multi_Exit_Disc: Used to communicate some information about routes internal to an AS. This is described later in this section.
• Local_Pref: Used by a router to inform other routers within the same AS of its degree of preference for a particular route. It has no significance to routers in other ASs.
• Atomic_Aggregate, Aggregator: These two fields implement the concept of route aggregation. In essence, an internet and its corresponding address space can be organized hierarchically (i.e., as a tree). In this case, network addresses are structured in two or more parts.
All of the networks of a given subtree share a common partial internet address. Using this common partial address, the amount of information that must be communicated in NLRI can be significantly reduced.

The AS_Path attribute actually serves two purposes. Because it lists the ASs that a datagram must traverse if it follows this route, the AS_Path information enables a router to implement routing policies. That is, a router may decide to avoid a particular path to avoid transiting a particular AS. For example, information that is confidential may be limited to certain kinds of ASs. Or a router may have information about the performance or quality of the portion of the internet that is included in an AS that leads the router to avoid that AS. Examples of performance or quality metrics include link speed, capacity, tendency to become congested, and overall quality of operation. Another criterion that could be used is minimizing the number of transit ASs.

The reader may wonder about the purpose of the Next_Hop attribute. The requesting router will necessarily want to know which networks are reachable via the responding router, but why provide information about other routers? This is best explained with reference to Figure 19.5. In this example, router R1 in autonomous system 1 and router R5 in autonomous system 2 implement BGP and acquire a neighbor relationship. R1 issues Update messages to R5, indicating which networks it can reach and the distances (network hops) involved. R1 also provides the same information on behalf of R2. That is, R1 tells R5 what networks are reachable via R2. In this example, R2 does not implement BGP. Typically, most of the routers in an autonomous system will not implement BGP. Only a few routers will be assigned responsibility for communicating with routers in other autonomous systems. A final point: R1 is in possession of the necessary information about R2, because R1 and R2 share an interior router protocol (IRP).
The second type of update information is the withdrawal of one or more routes. In this case, the route is identified by the IP address of the destination network.

Finally, the Notification message is sent when an error condition is detected. The following errors may be reported:

• Message header error: Includes authentication and syntax errors.
• Open message error: Includes syntax errors and options not recognized in an Open message. This message can also be used to indicate that a proposed Hold Time in an Open message is unacceptable.
• Update message error: Includes syntax and validity errors in an Update message.
• Hold timer expired: If the sending router has not received successive Keepalive and/or Update and/or Notification messages within the Hold Time period, then this error is communicated and the connection is closed.
• Finite state machine error: Includes any procedural error.
• Cease: Used by a router to close a connection with another router in the absence of any other error.

BGP Routing Information Exchange

The essence of BGP is the exchange of routing information among participating routers in multiple ASs. This process can be quite complex. In what follows, we provide a simplified overview.

Let us consider router R1 in autonomous system 1 (AS1), in Figure 19.5. To begin, a router that implements BGP will also implement an internal routing protocol such as OSPF. Using OSPF, R1 can exchange routing information with other routers within AS1, build up a picture of the topology of the networks and routers in AS1, and construct a routing table. Next, R1 can issue an Update message to R5 in AS2. The Update message could include the following:

• AS_Path: The identity of AS1
• Next_Hop: The IP address of R1
• NLRI: A list of all of the networks in AS1

This message informs R5 that all of the networks listed in NLRI are reachable via R1 and that the only autonomous system traversed is AS1.
Suppose now that R5 also has a neighbor relationship with another router in another autonomous system, say R9 in AS3. R5 will forward the information just received from R1 to R9 in a new Update message. This message includes the following:

• AS_Path: The list of identifiers {AS2, AS1}
• Next_Hop: The IP address of R5
• NLRI: A list of all of the networks in AS1

This message informs R9 that all of the networks listed in NLRI are reachable via R5 and that the autonomous systems traversed are AS2 and AS1. R9 must now decide if this is its preferred route to the networks listed. It may have knowledge of an alternate route to some or all of these networks that it prefers for reasons of performance or some other policy metric. If R9 decides that the route provided in R5's Update message is preferable, then R9 incorporates that routing information into its routing database and forwards this new routing information to other neighbors. This new message will include an AS_Path field of {AS3, AS2, AS1}.

In this fashion, routing update information is propagated through the larger internet, consisting of a number of interconnected autonomous systems. The AS_Path field is used to assure that such messages do not circulate indefinitely: If an Update message is received by a router in an AS that is included in the AS_Path field, that router will not forward the update information to other routers.

Routers within the same AS, called internal neighbors, may exchange BGP information. In this case, the sending router does not add the identifier of the common AS to the AS_Path field. When a router has selected a preferred route to an external destination, it transmits this route to all of its internal neighbors. Each of these routers then decides if the new route is preferred, in which case the new route is added to its database and a new Update message goes out.
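The propagation rules above — prepend your own AS identifier when advertising to an external peer, and discard any update whose AS_Path already contains your AS — can be sketched as follows. This is a simplified model of the AS_Path mechanics, not a full BGP implementation; the function and field names mirror the discussion in the text.

```python
def forward_update(my_as, as_path, nlri):
    """Return the Update to send to an external peer, or None if
    propagating it would create a routing loop."""
    if my_as in as_path:
        # Our AS already appears in the path: the route has looped
        # back to us, so we do not forward it to other routers.
        return None
    # Prepend our AS so downstream routers see the full transit path.
    return {"AS_Path": [my_as] + as_path, "NLRI": nlri}

# R5 in AS2 forwards R1's update (AS_Path = [AS1]) on to R9 in AS3:
upd = forward_update("AS2", ["AS1"], ["networks in AS1"])
print(upd["AS_Path"])   # ['AS2', 'AS1'], matching the example in the text

# If the resulting {AS3, AS2, AS1} route ever arrived back at AS1,
# the loop check would suppress it:
print(forward_update("AS1", ["AS3", "AS2", "AS1"], ["networks in AS1"]))  # None
```

The loop check is what lets BGP dispense with the count-to-infinity safeguards of distance-vector protocols: the full path itself reveals the loop.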
When there are multiple entry points into an AS that are available to a border router in another AS, the Multi_Exit_Disc attribute may be used to choose among them. This attribute contains a number that reflects some internal metric for reaching destinations within an AS. For example, suppose in Figure 19.5 that both R1 and R2 implement BGP and both have a neighbor relationship with R5. Each provides an Update message to R5 for network 1.3 that includes a routing metric used internal to AS1, such as a routing metric associated with the OSPF interior router protocol. R5 could then use these two metrics as the basis for choosing between the two routes.

Open Shortest Path First (OSPF) Protocol

The OSPF protocol (RFC 2328) is now widely used as the interior router protocol in TCP/IP networks. OSPF computes a route through the internet that incurs the least cost based on a user-configurable metric of cost. The user can configure the cost to express a function of delay, data rate, dollar cost, or other factors. OSPF is able to equalize loads over multiple equal-cost paths.

Each router maintains a database that reflects the known topology of the autonomous system of which it is a part. The topology is expressed as a directed graph. The graph consists of the following:

• Vertices, or nodes, of two types:
  1. router
  2. network, which is in turn of two types
     a. transit, if it can carry data that neither originate nor terminate on an end system attached to this network
     b. stub, if it is not a transit network
• Edges of two types:
  1. graph edges that connect two router vertices when the corresponding routers are connected to each other by a direct point-to-point link
  2. graph edges that connect a router vertex to a network vertex when the router is directly connected to the network

Figure 19.7, based on one in RFC 2328, shows an example of an autonomous system, and Figure 19.8 is the resulting directed graph.
The mapping is straightforward:

• Two routers joined by a point-to-point link are represented in the graph as being directly connected by a pair of edges, one in each direction (e.g., routers 6 and 10).
• When multiple routers are attached to a network (such as a LAN or packet-switching network), the directed graph shows all routers bidirectionally connected to the network vertex (e.g., routers 1, 2, 3, and 4 all connect to network 3).
• If a single router is attached to a network, the network will appear in the graph as a stub connection (e.g., network 7).
• An end system, called a host, can be directly connected to a router, in which case it is depicted in the corresponding graph (e.g., host 1).
• If a router is connected to other autonomous systems, then the path cost to each network in the other system must be obtained by some exterior router protocol (ERP). Each such network is represented on the graph by a stub and an edge to the router with the known path cost (e.g., networks 12 through 15).

[Figure 19.7 A Sample Autonomous System]

A cost is associated with the output side of each router interface. This cost is configurable by the system administrator. Arcs on the graph are labeled with the cost of the corresponding router output interface. Arcs having no labeled cost have a cost of 0. Note that arcs leading from networks to routers always have a cost of 0.

A database corresponding to the directed graph is maintained by each router. It is pieced together from link state messages from other routers in the internet. Using Dijkstra's algorithm (see Section 12.3), a router calculates the least-cost path to all destination networks. The result for router 6 of Figure 19.7 is shown as a tree in Figure 19.9, with R6 as the root of the tree.
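The least-cost calculation that each OSPF router performs over its topology database can be sketched with Dijkstra's algorithm. The small directed graph below is illustrative only (the node names echo Figure 19.7, but the topology and costs are invented for the example); arc costs sit on the output side of each node, as in OSPF.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: {neighbor: cost}}; returns least cost from source to each node."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry for an already-improved node
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Hypothetical topology: R6 reaches two networks via neighboring routers.
g = {"R6": {"R3": 6, "R10": 7},
     "R3": {"N3": 1},
     "R10": {"N6": 1},
     "N3": {}, "N6": {}}
print(dijkstra(g, "R6"))
```

Keeping only the first hop toward each destination then yields the routing table, as in Table 19.3.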
The tree gives the entire route to any destination network or host. However, only the next hop to the destination is used in the forwarding process. The resulting routing table for router 6 is shown in Table 19.3. The table includes entries for routers advertising external routes (routers 5 and 7). For external networks whose identity is known, entries are also provided.

[Figure 19.8 Directed Graph of Autonomous System of Figure 19.7]

[Figure 19.9 The SPF Tree for Router R6]

19.3 INTEGRATED SERVICES ARCHITECTURE

To meet the requirement for QoS-based service, the IETF is developing a suite of standards under the general umbrella of the Integrated Services Architecture (ISA). ISA, intended to provide QoS transport over IP-based internets, is defined in overall terms in RFC 1633, while a number of other documents are being developed to fill in the details. Already, a number of vendors have implemented portions of the ISA in routers and end-system software. This section provides an overview of ISA.

Internet Traffic

Traffic on a network or internet can be divided into two broad categories: elastic and inelastic. A consideration of their differing requirements clarifies the need for an enhanced internet architecture.

Elastic Traffic

Elastic traffic is that which can adjust, over wide ranges, to changes in delay and throughput across an internet and still meet the needs of its applications. This is the traditional type of traffic supported on TCP/IP-based internets and is the type of traffic for which internets were designed.
Applications that generate such traffic typically use TCP or UDP as a transport protocol. In the case of UDP, the application will use as much capacity as is available up to the rate at which the application generates data. In the case of TCP, the application will use as much capacity as is available up to the maximum rate at which the end-to-end receiver can accept data. Also with TCP, traffic on individual connections adjusts to congestion by reducing the rate at which data are presented to the network; this is described in Chapter 20.

Applications that can be classified as elastic include the common applications that operate over TCP or UDP, including file transfer (FTP), electronic mail (SMTP), remote login (TELNET), network management (SNMP), and Web access (HTTP). However, there are differences among the requirements of these applications. For example,

• E-mail is generally insensitive to changes in delay.
• When file transfer is done interactively, as it frequently is, the user expects the delay to be proportional to the file size and so is sensitive to changes in throughput.
• With network management, delay is generally not a serious concern. However, if failures in an internet are the cause of congestion, then the need for SNMP messages to get through with minimum delay increases with increased congestion.
• Interactive applications, such as remote logon and Web access, are sensitive to delay.

Table 19.3 Routing Table for R6

Destination    Next Hop    Distance
N1             R3          10
N2             R3          10
N3             R3          7
N4             R3          8
N6             R10         8
N7             R10         12
N8             R10         10
N9             R10         11
N10            R10         13
N11            R10         14
H1             R10         21
R5             R5          6
R7             R10         8
N12            R10         10
N13            R5          14
N14            R5          14
N15            R10         17

It is important to realize that it is not per-packet delay that is the quantity of interest. As noted in [CLAR95], observation of real delays across the Internet suggests that wide variations in delay do not occur.
Because of the congestion control mechanisms in TCP, when congestion develops, delays increase only modestly before the arrival rate from the various TCP connections slows down. Instead, the QoS perceived by the user relates to the total elapsed time to transfer an element of the current application. For an interactive TELNET-based application, the element may be a single keystroke or a single line. For a Web access, the element is a Web page, which could be as little as a few kilobytes or could be substantially larger for an image-rich page. For a scientific application, the element could be many megabytes of data.

For very small elements, the total elapsed time is dominated by the delay time across the internet. However, for larger elements, the total elapsed time is dictated by the sliding-window performance of TCP and is therefore dominated by the throughput achieved over the TCP connection. Thus, for large transfers, the transfer time is proportional to the size of the file and the degree to which the source slows due to congestion.

It should be clear that even if we confine our attention to elastic traffic, a QoS-based internet service could be of benefit. Without such a service, routers are dealing evenhandedly with arriving IP packets, with no concern for the type of application and whether a particular packet is part of a large transfer element or a small one. Under such circumstances, and if congestion develops, it is unlikely that resources will be allocated in such a way as to meet the needs of all applications fairly. When inelastic traffic is added to the mix, the results are even more unsatisfactory.

Inelastic Traffic

Inelastic traffic does not easily adapt, if at all, to changes in delay and throughput across an internet. The prime example is real-time traffic. The requirements for inelastic traffic may include the following:

• Throughput: A minimum throughput value may be required.
Unlike most elastic traffic, which can continue to deliver data with perhaps degraded service, many inelastic applications absolutely require a given minimum throughput.

• Delay: An example of a delay-sensitive application is stock trading; someone who consistently receives later service will consistently act later, and with greater disadvantage.
• Jitter: The magnitude of delay variation, called jitter, is a critical factor in real-time applications. Because of the variable delay imposed by the Internet, the interarrival times between packets are not maintained at a fixed interval at the destination. To compensate for this, the incoming packets are buffered, delayed sufficiently to compensate for the jitter, and then released at a constant rate to the software that is expecting a steady real-time stream. The larger the allowable delay variation, the longer the real delay in delivering the data and the greater the size of the delay buffer required at receivers. Real-time interactive applications, such as teleconferencing, may require a reasonable upper bound on jitter.
• Packet loss: Real-time applications vary in the amount of packet loss, if any, that they can sustain.

These requirements are difficult to meet in an environment with variable queuing delays and congestion losses. Accordingly, inelastic traffic introduces two new requirements into the internet architecture. First, some means is needed to give preferential treatment to applications with more demanding requirements. Applications need to be able to state their requirements, either ahead of time in some sort of service request function, or on the fly, by means of fields in the IP packet header. The former approach provides more flexibility in stating requirements, and it enables the network to anticipate demands and deny new requests if the required resources are unavailable. This approach implies the use of some sort of resource reservation protocol.
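The jitter-compensation buffering described above — hold each packet until a fixed playout deadline so the stream emerges at a constant rate — can be sketched as follows. This is a simplified fixed-delay playout model with invented timings; real receivers typically adapt the buffer delay dynamically.

```python
def playout_times(arrivals, period, buffer_delay):
    """arrivals[i]: arrival time (ms) of packet i, which was sent at i*period.
    Each packet is played at send_time + buffer_delay; a packet that arrives
    after its playout deadline is lost (it is too late to be useful)."""
    played, lost = [], []
    for i, arrival in enumerate(arrivals):
        playout = i * period + buffer_delay
        (played if arrival <= playout else lost).append(i)
    return played, lost

# Packets sent every 20 ms; network delay varies between 40 and 65 ms.
arrivals = [40, 85, 95, 115]             # packet i was sent at i*20 ms
print(playout_times(arrivals, 20, 60))   # 60 ms buffer: packet 1 arrives too late
print(playout_times(arrivals, 20, 80))   # 80 ms buffer: every packet is usable
```

The trade-off stated in the text is visible here: the larger buffer delay absorbs all the jitter, but every packet is now delivered 80 ms after it was sent rather than 60 ms.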
A second requirement in supporting inelastic traffic in an internet architecture is that elastic traffic must still be supported. Inelastic applications typically do not back off and reduce demand in the face of congestion, in contrast to TCP-based applications. Therefore, in times of congestion, inelastic traffic will continue to supply a high load, and elastic traffic will be crowded off the internet. A reservation protocol can help control this situation by denying service requests that would leave too few resources available to handle current elastic traffic.

ISA Approach

The purpose of ISA is to enable the provision of QoS support over IP-based internets. The central design issue for ISA is how to share the available capacity in times of congestion.

For an IP-based internet that provides only a best-effort service, the tools for controlling congestion and providing service are limited. In essence, routers have two mechanisms to work with:

• Routing algorithm: Most routing protocols in use in internets allow routes to be selected to minimize delay. Routers exchange information to get a picture of the delays throughout the internet. Minimum-delay routing helps to balance loads, thus decreasing local congestion, and helps to reduce delays seen by individual TCP connections.
• Packet discard: When a router's buffer overflows, it discards packets. Typically, the most recent packet is discarded. The effect of lost packets on a TCP connection is that the sending TCP entity backs off and reduces its load, thus helping to alleviate internet congestion.

These tools have worked reasonably well. However, as the discussion in the preceding subsection shows, such techniques are inadequate for the variety of traffic now coming to internets. ISA is an overall architecture within which a number of enhancements to the traditional best-effort mechanisms are being developed. In ISA, each IP packet can be associated with a flow.
RFC 1633 defines a flow as a distinguishable stream of related IP packets that results from a single user activity and requires the same QoS. For example, a flow might consist of one transport connection or one video stream distinguishable by the ISA. A flow differs from a TCP connection in two respects: a flow is unidirectional, and there can be more than one recipient of a flow (multicast). Typically, an IP packet is identified as a member of a flow on the basis of source and destination IP addresses, port numbers, and protocol type. The flow identifier in the IPv6 header is not necessarily equivalent to an ISA flow, but in the future the IPv6 flow identifier could be used in ISA.

ISA makes use of the following functions to manage congestion and provide QoS transport:

• Admission control: For QoS transport (other than default best-effort transport), ISA requires that a reservation be made for a new flow. If the routers collectively determine that there are insufficient resources to guarantee the requested QoS, then the flow is not admitted. The protocol RSVP is used to make reservations.
• Routing algorithm: The routing decision may be based on a variety of QoS parameters, not just minimum delay. For example, the routing protocol OSPF, discussed in Section 19.2, can select routes based on QoS.
• Queuing discipline: A vital element of the ISA is an effective queuing policy that takes into account the differing requirements of different flows.
• Discard policy: A discard policy determines which packets to drop when a buffer is full and new packets arrive. A discard policy can be an important element in managing congestion and meeting QoS guarantees.

ISA Components

Figure 19.10 is a general depiction of the implementation architecture for ISA within a router. Below the thick horizontal line are the forwarding functions of the router; these are executed for each packet and therefore must be highly optimized.
[Figure 19.10 Integrated Services Architecture Implemented in Router — background components (routing protocols, reservation protocol, admission control, management agent) maintain the routing database and traffic control database used by the forwarding components (classifier and route selection, packet scheduler with QoS queuing and best-effort queuing).]

The remaining functions, above the line, are background functions that create data structures used by the forwarding functions. The principal background functions are as follows:

• Reservation protocol: This protocol is used to reserve resources for a new flow at a given level of QoS. It is used among routers and between routers and end systems. The reservation protocol is responsible for maintaining flow-specific state information at the end systems and at the routers along the path of the flow. RSVP is used for this purpose. The reservation protocol updates the traffic control database used by the packet scheduler to determine the service provided for packets of each flow.
• Admission control: When a new flow is requested, the reservation protocol invokes the admission control function. This function determines if sufficient resources are available for this flow at the requested QoS. This determination is based on the current level of commitment to other reservations and/or on the current load on the network.
• Management agent: A network management agent is able to modify the traffic control database and to direct the admission control module in order to set admission control policies.
• Routing protocol: The routing protocol is responsible for maintaining a routing database that gives the next hop to be taken for each destination address and each flow.

These background functions support the main task of the router, which is the forwarding of packets. The two principal functional areas that accomplish forwarding are the following:

• Classifier and route selection: For the purposes of forwarding and traffic control, incoming packets must be mapped into classes.
A class may correspond to a single flow or to a set of flows with the same QoS requirements. For example, the packets of all video flows, or the packets of all flows attributable to a particular organization, may be treated identically for purposes of resource allocation and queuing discipline. The selection of class is based on fields in the IP header. Based on the packet's class and its destination IP address, this function determines the next-hop address for this packet.

• Packet scheduler: This function manages one or more queues for each output port. It determines the order in which queued packets are transmitted and the selection of packets for discard, if necessary. Decisions are made based on a packet's class, the contents of the traffic control database, and current and past activity on this outgoing port. Part of the packet scheduler's task is that of policing, which is the function of determining whether the packet traffic in a given flow exceeds the requested capacity and, if so, deciding how to treat the excess packets.

ISA Services

ISA service for a flow of packets is defined on two levels. First, a number of general categories of service are provided, each of which provides a certain general type of service guarantee. Second, within each category, the service for a particular flow is specified by the values of certain parameters; together, these values are referred to as a traffic specification (TSpec). Currently, three categories of service are defined:

• Guaranteed
• Controlled load
• Best effort

An application can request a reservation for a flow for a guaranteed or controlled load QoS, with a TSpec that defines the exact amount of service required. If the reservation is accepted, then the TSpec is part of the contract between the data flow and the service. The service agrees to provide the requested QoS as long as the flow's data traffic continues to be described accurately by the TSpec.
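The classifier function described above can be sketched as a mapping from a packet's header fields to a class. The 5-tuple key (source and destination addresses, ports, and protocol) follows the flow identification described earlier in this section; the specific rules, addresses, and class names below are purely illustrative.

```python
def classify(packet, rules, default="best-effort"):
    """packet: dict of 5-tuple header fields.
    rules: list of (predicate, class_name) pairs, checked in order.
    Packets matching no rule fall through to best-effort service."""
    for predicate, class_name in rules:
        if predicate(packet):
            return class_name
    return default

# Hypothetical rules: one reserved video flow, plus an aggregate class
# for all flows attributable to one organization's address block.
rules = [
    (lambda p: (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
               == ("10.0.0.5", "10.0.1.9", 5004, 5004, "UDP"), "video-flow-1"),
    (lambda p: p["src"].startswith("10.0.2."), "org-A-aggregate"),
]

pkt = {"src": "10.0.0.5", "dst": "10.0.1.9",
       "sport": 5004, "dport": 5004, "proto": "UDP"}
print(classify(pkt, rules))   # video-flow-1
```

The packet scheduler would then consult the traffic control database entry for the returned class to decide queuing and discard treatment.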
Packets that are not part of a reserved flow are by default given a best-effort delivery service.

Before looking at the ISA service categories, one general concept should be defined: the token bucket traffic specification. This is a way of characterizing traffic that has three advantages in the context of ISA:

1. Many traffic sources can be defined easily and accurately by a token bucket scheme.
2. The token bucket scheme provides a concise description of the load to be imposed by a flow, enabling the service to determine easily the resource requirement.
3. The token bucket scheme provides the input parameters to a policing function.

A token bucket traffic specification consists of two parameters: a token replenishment rate R and a bucket size B. The token rate R specifies the continually sustainable data rate; that is, over a relatively long period of time, the average data rate to be supported for this flow is R. The bucket size B specifies the amount by which the data rate can exceed R for short periods of time. The exact condition is as follows: During any time period T, the amount of data sent cannot exceed RT + B.

[Figure 19.11 Token Bucket Scheme — a token generator adds tokens to a bucket of size B bits at a rate of R bps: (1) the router puts tokens into the bucket at a predetermined rate; (2) tokens can accumulate up to the bucket size, and excess tokens are discarded; (3) arriving traffic seeks admittance to the network; (4) the router's queue regulator requests tokens equal to the size of the next packet; (5) if tokens are available, the packet is queued for transmission; (6) if tokens are not available, the packet is either queued for transmission but marked as excess, buffered for later transmission, or discarded.]

Figure 19.11 illustrates this scheme and explains the use of the term bucket. The bucket represents a counter that indicates the allowable number of octets of IP data that can be sent at any time.
The bucket fills with octet tokens at the rate of R (i.e., the counter is incremented R times per second), up to the bucket capacity (up to the maximum counter value). IP packets arrive and are queued for processing. An IP packet may be processed if there are sufficient octet tokens to match the IP data size. If so, the packet is processed and the bucket is drained of the corresponding number of tokens. If a packet arrives and there are insufficient tokens available, then the packet exceeds the TSpec for this flow. The treatment for such packets is not specified in the ISA documents; common actions are relegating the packet to best-effort service, discarding the packet, or marking the packet in such a way that it may be discarded in the future.

Over the long run, the rate of IP data allowed by the token bucket is R. However, if there is an idle or relatively slow period, the bucket capacity builds up, so that at most an additional B octets above the stated rate can be accepted. Thus, B is a measure of the degree of burstiness of the data flow that is allowed.

Guaranteed Service

The key elements of the guaranteed service are as follows:

• The service provides assured capacity, or data rate.
• There is a specified upper bound on the queuing delay through the network. This must be added to the propagation delay, or latency, to arrive at the bound on total delay through the network.
• There are no queuing losses. That is, no packets are lost due to buffer overflow; however, packets may be lost due to failures in the network or changes in routing paths.

With this service, an application provides a characterization of its expected traffic profile, and the service determines the end-to-end delay that it can guarantee.
One category of applications for this service is those that need an upper bound on delay so that a delay buffer can be used for real-time playback of incoming data, and that do not tolerate packet losses because of the degradation in the quality of the output. Another example is applications with hard real-time deadlines. The guaranteed service is the most demanding service provided by ISA. Because the delay bound is firm, the bound has to be set at a large value to cover rare cases of long queuing delays.

Controlled Load

The key elements of the controlled load service are as follows:

• The service tightly approximates the behavior visible to applications receiving best-effort service under unloaded conditions.
• There is no specified upper bound on the queuing delay through the network. However, the service ensures that a very high percentage of the packets do not experience delays that greatly exceed the minimum transit delay (i.e., the delay due to propagation time plus router processing time with no queuing delays).
• A very high percentage of transmitted packets will be successfully delivered (i.e., almost no queuing loss).

As was mentioned, the risk in an internet that provides QoS for real-time applications is that best-effort traffic is crowded out. This is because best-effort types of applications employ TCP, which will back off in the face of congestion and delays. The controlled load service guarantees that the network will set aside sufficient resources so that an application that receives this service will see a network that responds as if these real-time applications were not present and competing for resources.

The controlled load service is useful for applications that have been referred to as adaptive real-time applications [CLAR92]. Such applications do not require an a priori upper bound on the delay through the network.
Rather, the receiver measures the jitter experienced by incoming packets and sets the playback point to the minimum delay that still produces a sufficiently low loss rate (e.g., video can be adaptive by dropping a frame or delaying the output stream slightly; voice can be adaptive by adjusting silent periods).

Queuing Discipline

An important component of an ISA implementation is the queuing discipline used at the routers. Routers traditionally have used a first-in-first-out (FIFO) queuing discipline at each output port. A single queue is maintained at each output port. When a new packet arrives and is routed to an output port, it is placed at the end of the queue. As long as the queue is not empty, the router transmits packets from the queue, taking the oldest remaining packet next.

There are several drawbacks to the FIFO queuing discipline:

• No special treatment is given to packets from flows that are of higher priority or are more delay sensitive. If a number of packets from different flows are ready to be forwarded, they are handled strictly in FIFO order.
• If a number of smaller packets are queued behind a long packet, then FIFO queuing results in a larger average delay per packet than if the shorter packets were transmitted before the longer packet. In general, flows of larger packets get better service.
• A greedy TCP connection can crowd out more altruistic connections. If congestion occurs and one TCP connection fails to back off, other connections along the same path segment must back off more than they would otherwise have to do.

To overcome the drawbacks of FIFO queuing, some sort of fair queuing scheme is used, in which a router maintains multiple queues at each output port (Figure 19.12). With simple fair queuing, each incoming packet is placed in the queue for its flow. The queues are serviced in round-robin fashion, taking one packet from each nonempty queue in turn. Empty queues are skipped over.
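The round-robin service just described can be sketched as follows. This is a simplified model for illustration only (real routers must also account for packet lengths and, in WFQ, per-queue weights); the class name FairQueue is invented here.

```python
from collections import OrderedDict, deque

class FairQueue:
    """Per-flow queues serviced round-robin, one packet per nonempty queue.

    Illustrative sketch of simple fair queuing, not a router implementation.
    """

    def __init__(self):
        self.queues = OrderedDict()   # flow_id -> deque of queued packets

    def enqueue(self, flow_id, packet):
        # Each incoming packet is placed in the queue for its flow.
        self.queues.setdefault(flow_id, deque()).append(packet)

    def service_cycle(self):
        """One round-robin cycle: take one packet from each nonempty queue."""
        sent = []
        for flow_id, q in self.queues.items():
            if q:                     # empty queues are skipped over
                sent.append(q.popleft())
        return sent
```

Note how the fairness property falls out directly: a greedy flow only lengthens its own queue, since it still gets exactly one packet per cycle.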
This scheme is fair in that each busy flow gets to send exactly one packet per cycle. Further, this is a form of load balancing among the various flows. There is no advantage in being greedy. A greedy flow finds that its queue becomes long, increasing its delays, whereas other flows are unaffected by this behavior.

A number of vendors have implemented a refinement of fair queuing known as weighted fair queuing (WFQ). In essence, WFQ takes into account the amount of traffic through each queue and gives busier queues more capacity without completely shutting out less busy queues. In addition, WFQ can take into account the amount of service requested by each traffic flow and adjust the queuing discipline accordingly.

[Figure 19.12: FIFO and Fair Queuing. (a) FIFO queuing: packets from flows 1 through N share a single queue feeding the multiplexed output. (b) Fair queuing: each flow has its own queue, and the queues are serviced in turn onto the multiplexed output.]

Resource ReSerVation Protocol (RSVP)

RFC 2205 defines RSVP, which provides supporting functionality for ISA. This subsection provides an overview.

A key task, perhaps the key task, of an internetwork is to deliver data from a source to one or more destinations with the desired quality of service (QoS), such as throughput, delay, delay variance, and so on. This task becomes increasingly difficult on any internetwork with increasing numbers of users, data rates of applications, and use of multicasting. To meet these needs, it is not enough for an internet to react to congestion. Instead a tool is needed to prevent congestion by allowing applications to reserve network resources at a given QoS.

Preventive measures can be useful in both unicast and multicast transmission. For unicast, two applications agree on a specific quality of service for a session and expect the internetwork to support that quality of service. If the internetwork is heavily loaded, it may not provide the desired QoS and instead deliver packets at a reduced QoS.
In that case, the applications may have preferred to wait before initiating the session or at least to have been alerted to the potential for reduced QoS. A way of dealing with this situation is to have the unicast applications reserve resources in order to meet a given quality of service. Routers along an intended path could then preallocate resources (queue space, outgoing capacity) to assure the desired QoS. If a router could not meet the resource reservation because of prior outstanding reservations, then the applications could be informed. The applications may then decide to try again at a reduced QoS reservation or may decide to try later. Multicast transmission presents a much more compelling case for implementing resource reservation. A multicast transmission can generate a tremendous amount of internetwork traffic if either the application is high-volume (e.g., video) or the group of multicast destinations is large and scattered, or both. What makes the case for multicast resource reservation is that much of the potential load generated by a multicast source may easily be prevented. This is so for two reasons: 1. Some members of an existing multicast group may not require delivery from a particular source over some given period of time. For example, there may be two “channels” (two multicast sources) broadcasting to a particular multicast group at the same time. A multicast destination may wish to “tune in” to only one channel at a time. 2. Some members of a group may only be able to handle a portion of the source transmission. For example, a video source may transmit a video stream that consists of two components: a basic component that provides a reduced picture quality, and an enhanced component. Some receivers may not have the processing power to handle the enhanced component or may be connected to the internetwork through a subnetwork or link that does not have the capacity for the full signal. 
Thus, the use of resource reservation can enable routers to decide ahead of time if they can meet the requirement to deliver a multicast transmission to all designated multicast receivers and to reserve the appropriate resources if possible.

Internet resource reservation differs from the type of resource reservation that may be implemented in a connection-oriented network, such as ATM or frame relay. An internet resource reservation scheme must interact with a dynamic routing strategy that allows the route followed by packets of a given transmission to change. When the route changes, the resource reservations must be changed. To deal with this dynamic situation, the concept of soft state is used. A soft state is simply a set of state information at a router that expires unless regularly refreshed from the entity that requested the state. If a route for a given transmission changes, then some soft states will expire and new resource reservations will invoke the appropriate soft states on the new routers along the route. Thus, the end systems requesting resources must periodically renew their requests during the course of an application transmission.

Based on these considerations, the specification lists the following characteristics of RSVP:

• Unicast and multicast: RSVP makes reservations for both unicast and multicast transmissions, adapting dynamically to changing group membership as well as to changing routes, and reserving resources based on the individual requirements of multicast members.
• Simplex: RSVP makes reservations for unidirectional data flow. Data exchanges between two end systems require separate reservations in the two directions.
• Receiver-initiated reservation: The receiver of a data flow initiates and maintains the resource reservation for that flow.
• Maintaining soft state in the internet: RSVP maintains a soft state at intermediate routers and leaves the responsibility for maintaining these reservation states to end users.
• Providing different reservation styles: These allow RSVP users to specify how reservations for the same multicast group should be aggregated at the intermediate switches. This feature enables a more efficient use of internet resources.
• Transparent operation through non-RSVP routers: Because reservations and RSVP are independent of the routing protocol, there is no fundamental conflict in a mixed environment in which some routers do not employ RSVP. These routers will simply use a best-effort delivery technique.
• Support for IPv4 and IPv6: RSVP can exploit the Type-of-Service field in the IPv4 header and the Flow Label field in the IPv6 header.

19.4 DIFFERENTIATED SERVICES

The Integrated Services Architecture (ISA) and RSVP are intended to support QoS capability in the Internet and in private internets. Although ISA in general and RSVP in particular are useful tools in this regard, these features are relatively complex to deploy. Further, they may not scale well to handle large volumes of traffic because of the amount of control signaling required to coordinate integrated QoS offerings and because of the maintenance of state information required at routers. As the burden on the Internet grows, and as the variety of applications grows, there is an immediate need to provide differing levels of QoS to different traffic flows. The differentiated services (DS) architecture (RFC 2475) is designed to provide a simple, easy-to-implement, low-overhead tool to support a range of network services that are differentiated on the basis of performance.

Several key characteristics of DS contribute to its efficiency and ease of deployment:

• IP packets are labeled for differing QoS treatment using the existing IPv4 (Figure 18.6) or IPv6 (Figure 18.11) DS field.
Thus, no change is required to IP. • A service level agreement (SLA) is established between the service provider (internet domain) and the customer prior to the use of DS. This avoids the need to incorporate DS mechanisms in applications. Thus, existing applications need not be modified to use DS. • DS provides a built-in aggregation mechanism. All traffic with the same DS octet is treated the same by the network service. For example, multiple voice connections are not handled individually but in the aggregate. This provides for good scaling to larger networks and traffic loads. • DS is implemented in individual routers by queuing and forwarding packets based on the DS octet. Routers deal with each packet individually and do not have to save state information on packet flows. Today, DS is the most widely accepted QoS mechanism in enterprise networks. Although DS is intended to provide a simple service based on relatively simple mechanisms, the set of RFCs related to DS is relatively complex. Table 19.4 summarizes some of the key terms from these specifications. Services The DS type of service is provided within a DS domain, which is defined as a contiguous portion of the Internet over which a consistent set of DS policies are administered. Typically, a DS domain would be under the control of one administrative entity. The services provided across a DS domain are defined in an SLA, which is a service contract between a customer and the service provider that specifies the forwarding service that the customer should receive for various classes of packets. A customer may be a user organization or another DS domain. Once the SLA is established, the customer submits packets with the DS octet marked to indicate the packet class. The service provider must assure that the customer gets at least the agreed QoS for each packet class. 
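Although applications need not be modified to use DS, an application can set the DS octet itself. On most Unix-like systems the DSCP occupies the upper six bits of the old IPv4 TOS octet, which is settable through the standard IP_TOS socket option. A hedged sketch follows; the EF codepoint value 101110 is from the DS RFCs, while the function name is illustrative.

```python
import socket

# Expedited forwarding codepoint, 101110 binary (46 decimal).
EF_DSCP = 0b101110

def make_marked_socket(dscp=EF_DSCP):
    """Create a UDP socket whose outgoing packets carry the given DSCP.

    The DSCP occupies bits 7..2 of the TOS octet, so it is shifted left
    by 2; the low two (ECN) bits are left as 00.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return s
```

Every datagram sent on such a socket is then handled by DS routers according to the marked class, with no per-flow state anywhere in the network.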
To provide that QoS, the service provider must configure the appropriate forwarding policies at each router (based on DS octet value) and must measure the performance being provided for each class on an ongoing basis. If a customer submits packets intended for destinations within the DS domain, then the DS domain is expected to provide the agreed service. If the destination is beyond the customer's DS domain, then the DS domain will attempt to forward the packets through other domains, requesting the most appropriate service to match the requested service.

A draft DS framework document lists the following detailed performance parameters that might be included in an SLA:

• Detailed service performance parameters such as expected throughput, drop probability, latency
• Constraints on the ingress and egress points at which the service is provided, indicating the scope of the service
• Traffic profiles that must be adhered to for the requested service to be provided, such as token bucket parameters
• Disposition of traffic submitted in excess of the specified profile

Table 19.4 Terminology for Differentiated Services

Behavior Aggregate: A set of packets with the same DS codepoint crossing a link in a particular direction.
Classifier: Selects packets based on the DS field (BA classifier) or on multiple fields within the packet header (MF classifier).
DS Boundary Node: A DS node that connects one DS domain to a node in another domain.
DS Codepoint: A specified value of the 6-bit DSCP portion of the 8-bit DS field in the IP header.
DS Domain: A contiguous (connected) set of nodes, capable of implementing differentiated services, that operate with a common set of service provisioning policies and per-hop behavior definitions.
DS Interior Node: A DS node that is not a DS boundary node.
DS Node: A node that supports differentiated services. Typically, a DS node is a router. A host system that provides differentiated services for applications in the host is also a DS node.
Dropping: The process of discarding packets based on specified rules; also called policing.
Marking: The process of setting the DS codepoint in a packet. Packets may be marked on initiation and may be re-marked by an en route DS node.
Metering: The process of measuring the temporal properties (e.g., rate) of a packet stream selected by a classifier. The instantaneous state of that process may affect marking, shaping, and dropping functions.
Per-Hop Behavior (PHB): The externally observable forwarding behavior applied at a node to a behavior aggregate.
Service Level Agreement (SLA): A service contract between a customer and a service provider that specifies the forwarding service a customer should receive.
Shaping: The process of delaying packets within a packet stream to cause it to conform to some defined traffic profile.
Traffic Conditioning: Control functions performed to enforce rules specified in a TCA, including metering, marking, shaping, and dropping.
Traffic Conditioning Agreement (TCA): An agreement specifying classifying rules and traffic conditioning rules that are to apply to packets selected by the classifier.

The framework document also gives some examples of services that might be provided:

1. Traffic offered at service level A will be delivered with low latency.
2. Traffic offered at service level B will be delivered with low loss.
3. Ninety percent of in-profile traffic delivered at service level C will experience no more than 50 ms latency.
4. Ninety-five percent of in-profile traffic delivered at service level D will be delivered.
5. Traffic offered at service level E will be allotted twice the bandwidth of traffic delivered at service level F.
6. Traffic with drop precedence X has a higher probability of delivery than traffic with drop precedence Y.

The first two examples are qualitative and are valid only in comparison to other traffic, such as default traffic that gets a best-effort service. The next two examples are quantitative and provide a specific guarantee that can be verified by measurement on the actual service without comparison to any other services offered at the same time. The final two examples are a mixture of quantitative and qualitative.

DS Field

Packets are labeled for service handling by means of the 6-bit DS field in the IPv4 header or the IPv6 header. The value of the DS field, referred to as the DS codepoint, is the label used to classify packets for differentiated services.

[Figure 19.13: DS Field. (a) The differentiated services codepoint occupies bits 0 through 5 of the octet. The codepoint 000000 is the default behavior; the class selector codepoints 000000, 001000, 010000, 011000, 100000, 101000, 110000, and 111000 indicate increasing priority; the codepoint 101110 is the expedited forwarding (EF) behavior. (b) Codepoints for the assured forwarding PHB combine a class subfield (001 = class 1 up through 100 = class 4, the best service) with a drop precedence subfield (010 = low, most important; 100 = medium; 110 = high, least important).]

Figure 19.13a shows the DS field. With a 6-bit codepoint, there are in principle 64 different classes of traffic that could be defined. These 64 codepoints are allocated across three pools of codepoints, as follows:

• Codepoints of the form xxxxx0, where x is either 0 or 1, are reserved for assignment as standards.
• Codepoints of the form xxxx11 are reserved for experimental or local use.
• Codepoints of the form xxxx01 are also reserved for experimental or local use but may be allocated for future standards action as needed.

Within the first pool, several assignments are made in RFC 2474. The codepoint 000000 is the default packet class.
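The three-pool allocation just listed reduces to a test on the low-order bits of the codepoint. A small sketch (the function name is illustrative):

```python
def codepoint_pool(dscp):
    """Classify a 6-bit DS codepoint into its allocation pool.

    Pool 1 (xxxxx0): reserved for assignment as standards.
    Pool 2 (xxxx11): reserved for experimental or local use.
    Pool 3 (xxxx01): experimental/local, reclaimable for standards.
    """
    if not 0 <= dscp <= 0b111111:
        raise ValueError("DS codepoint is a 6-bit value")
    if dscp & 0b1 == 0:
        return 1            # xxxxx0: standards pool
    if dscp & 0b11 == 0b11:
        return 2            # xxxx11: experimental/local
    return 3                # xxxx01: experimental/local, reclaimable
```

The standards pool thus holds exactly half of the 64 codepoints, including the default (000000) and EF (101110) assignments.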
The default class is the best-effort forwarding behavior in existing routers. Such packets are forwarded in the order that they are received as soon as link capacity becomes available. If other higher-priority packets in other DS classes are available for transmission, these are given preference over best-effort default packets.

Codepoints of the form xxx000 are reserved to provide backward compatibility with the IPv4 precedence service. To explain this requirement, we need to digress to an explanation of the IPv4 precedence service. The IPv4 type of service (TOS) field includes two subfields: a 3-bit precedence subfield and a 4-bit TOS subfield. These subfields serve complementary functions. The TOS subfield provides guidance to the IP entity (in the source or router) on selecting the next hop for this datagram, and the precedence subfield provides guidance about the relative allocation of router resources for this datagram. The precedence field is set to indicate the degree of urgency or priority to be associated with a datagram. If a router supports the precedence subfield, there are three approaches to responding:

• Route selection: A particular route may be selected if the router has a smaller queue for that route or if the next hop on that route supports network precedence or priority (e.g., a token ring network supports priority).
• Network service: If the network on the next hop supports precedence, then that service is invoked.
• Queuing discipline: A router may use precedence to affect how queues are handled. For example, a router may give preferential treatment in queues to datagrams with higher precedence.

RFC 1812, Requirements for IP Version 4 Routers, provides recommendations for queuing discipline that fall into two categories:

• Queue service
(a) Routers SHOULD implement precedence-ordered queue service.
Precedence-ordered queue service means that when a packet is selected for output on a (logical) link, the packet of highest precedence that has been queued for that link is sent.
(b) Any router MAY implement other policy-based throughput management procedures that result in other than strict precedence ordering, but it MUST be configurable to suppress them (i.e., use strict ordering).

• Congestion control. When a router receives a packet beyond its storage capacity, it must discard it or some other packet or packets.
(a) A router MAY discard the packet it has just received; this is the simplest but not the best policy.
(b) Ideally, the router should select a packet from one of the sessions most heavily abusing the link, given that the applicable QoS policy permits this. A recommended policy in datagram environments using FIFO queues is to discard a packet randomly selected from the queue. An equivalent algorithm in routers using fair queues is to discard from the longest queue. A router MAY use these algorithms to determine which packet to discard.
(c) If precedence-ordered queue service is implemented and enabled, the router MUST NOT discard a packet whose IP precedence is higher than that of a packet that is not discarded.
(d) A router MAY protect packets whose IP headers request the maximize reliability TOS, except where doing so would be in violation of the previous rule.
(e) A router MAY protect fragmented IP packets, on the theory that dropping a fragment of a datagram may increase congestion by causing all fragments of the datagram to be retransmitted by the source.
(f) To help prevent routing perturbations or disruption of management functions, the router MAY protect packets used for routing control, link control, or network management from being discarded. Dedicated routers (i.e., routers that are not also general purpose hosts, terminal servers, etc.)
can achieve an approximation of this rule by protecting packets whose source or destination is the router itself. The DS codepoints of the form xxx000 should provide a service that at minimum is equivalent to that of the IPv4 precedence functionality.

DS Configuration and Operation

Figure 19.14 illustrates the type of configuration envisioned in the DS documents. A DS domain consists of a set of contiguous routers; that is, it is possible to get from any router in the domain to any other router in the domain by a path that does not include routers outside the domain. Within a domain, the interpretation of DS codepoints is uniform, so that a uniform, consistent service is provided.

[Figure 19.14: DS Domains. Hosts attach to DS domains; border components provide classifier, meter, marker, and shaper/dropper functions, while interior components provide classifier and queue management functions.]

Routers in a DS domain are either boundary nodes or interior nodes. Typically, the interior nodes implement simple mechanisms for handling packets based on their DS codepoint values. This includes queuing discipline to give preferential treatment depending on codepoint value, and packet-dropping rules to dictate which packets should be dropped first in the event of buffer saturation. The DS specifications refer to the forwarding treatment provided at a router as per-hop behavior (PHB). This PHB must be available at all routers, and typically PHB is the only part of DS implemented in interior routers.

The boundary nodes include PHB mechanisms but more sophisticated traffic conditioning mechanisms are also required to provide the desired service. Thus, interior routers have minimal functionality and minimal overhead in providing the DS service, while most of the complexity is in the boundary nodes. The boundary node function can also be provided by a host system attached to the domain, on behalf of the applications at that host system.
The traffic conditioning function consists of five elements: • Classifier: Separates submitted packets into different classes. This is the foundation of providing differentiated services. A classifier may separate traffic only on the basis of the DS codepoint (behavior aggregate classifier) or based on multiple fields within the packet header or even the packet payload (multifield classifier). • Meter: Measures submitted traffic for conformance to a profile. The meter determines whether a given packet stream class is within or exceeds the service level guaranteed for that class. • Marker: Re-marks packets with a different codepoint as needed. This may be done for packets that exceed the profile; for example, if a given throughput is guaranteed for a particular service class, any packets in that class that exceed the throughput in some defined time interval may be re-marked for best effort handling. Also, re-marking may be required at the boundary between two DS domains. For example, if a given traffic class is to receive the highest supported priority, and this is a value of 3 in one domain and 7 in the next domain, then packets with a priority 3 value traversing the first domain are remarked as priority 7 when entering the second domain. • Shaper: Delays packets as necessary so that the packet stream in a given class does not exceed the traffic rate specified in the profile for that class. • Dropper: Drops packets when the rate of packets of a given class exceeds that specified in the profile for that class. Figure 19.15 illustrates the relationship between the elements of traffic conditioning. After a flow is classified, its resource consumption must be measured. The metering function measures the volume of packets over a particular time interval to determine a flow’s compliance with the traffic agreement. If the host is bursty, a simple data rate or packet rate may not be sufficient to capture the desired traffic characteristics. 
A token bucket scheme, such as that illustrated in Figure 19.11, is an example of a way to define a traffic profile to take into account both packet rate and burstiness.

If a traffic flow exceeds some profile, several approaches can be taken. Individual packets in excess of the profile may be re-marked for lower-quality handling and allowed to pass into the DS domain. A traffic shaper may absorb a burst of packets in a buffer and pace the packets over a longer period of time. A dropper may drop packets if the buffer used for pacing becomes saturated.

[Figure 19.15: DS Traffic Conditioner. Packets pass through a classifier; a meter observes the classified stream and influences the marker and the shaper/dropper.]

Per-Hop Behavior

As part of the DS standardization effort, specific types of PHB need to be defined, which can be associated with specific differentiated services. Currently, two standards-track PHBs have been issued: expedited forwarding PHB (RFCs 3246 and 3247) and assured forwarding PHB (RFC 2597).

Expedited Forwarding PHB

RFC 3246 defines the expedited forwarding (EF) PHB as a building block for low-loss, low-delay, and low-jitter end-to-end services through DS domains. In essence, such a service should appear to the endpoints as providing close to the performance of a point-to-point connection or leased line.

In an internet or packet-switching network, a low-loss, low-delay, and low-jitter service is difficult to achieve. By its nature, an internet involves queues at each node, or router, where packets are buffered waiting to use a shared output link. It is the queuing behavior at each node that results in loss, delays, and jitter. Thus, unless the internet is grossly oversized to eliminate all queuing effects, care must be taken in handling traffic for EF PHB to assure that queuing effects do not result in loss, delay, or jitter above a given threshold. RFC 3246 declares that the intent of the EF PHB is to provide a PHB in which suitably marked packets usually encounter short or empty queues.
The relative absence of queues minimizes delay and jitter. Furthermore, if queues remain short relative to the buffer space available, packet loss is also kept to a minimum.

The EF PHB is designed to configure nodes so that the traffic aggregate (the flow of packets associated with a particular service for a particular user) has a well-defined minimum departure rate. (Well-defined means "independent of the dynamic state of the node"; in particular, independent of the intensity of other traffic at the node.) The general concept outlined in RFC 3246 is this: The border nodes control the traffic aggregate to limit its characteristics (rate, burstiness) to some predefined level. Interior nodes must treat the incoming traffic in such a way that queuing effects do not appear. In general terms, the requirement on interior nodes is that the aggregate's maximum arrival rate must be less than the aggregate's minimum departure rate.

RFC 3246 does not mandate a specific queuing policy at the interior nodes to achieve the EF PHB. The RFC notes that a simple priority scheme could achieve the desired effect, with the EF traffic given absolute priority over other traffic. So long as the EF traffic itself did not overwhelm an interior node, this scheme would result in acceptable queuing delays for the EF PHB. However, the risk of a simple priority scheme is that packet flows for other PHB traffic would be disrupted. Thus, some more sophisticated queuing policy might be warranted.

Assured Forwarding PHB

The assured forwarding (AF) PHB is designed to provide a service superior to best effort but one that does not require the reservation of resources within an internet and does not require the use of detailed discrimination among flows from different users. The concept behind the AF PHB was first introduced in [CLAR98] and is referred to as explicit allocation.
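The simple priority scheme mentioned for the EF PHB can be sketched as a two-level queue in which EF traffic is always served first. This is one possible realization only (RFC 3246 mandates no specific queuing policy), and the class name is invented for this sketch; note the risk discussed above, that unchecked EF traffic would starve the other queue entirely.

```python
from collections import deque

class PriorityScheduler:
    """Strict priority: the EF queue always drains before the default queue.

    A sketch of one non-mandated way to realize the EF PHB.
    """

    def __init__(self):
        self.ef = deque()        # expedited forwarding aggregate
        self.default = deque()   # all other traffic

    def enqueue(self, packet, is_ef=False):
        (self.ef if is_ef else self.default).append(packet)

    def dequeue(self):
        if self.ef:
            return self.ef.popleft()       # EF has absolute priority
        if self.default:
            return self.default.popleft()
        return None                        # both queues empty
```

As long as the border nodes limit the EF aggregate's arrival rate, the EF queue stays short and suitably marked packets see little or no queuing delay.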
The AF PHB is more complex than explicit allocation, but it is useful to first highlight the key elements of the explicit allocation scheme:

1. Users are offered the choice of a number of classes of service for their traffic. Each class describes a different traffic profile in terms of an aggregate data rate and burstiness.
2. Traffic from a user within a given class is monitored at a boundary node. Each packet in a traffic flow is marked in or out based on whether it does or does not exceed the traffic profile.
3. Inside the network, there is no separation of traffic from different users or even traffic from different classes. Instead, all traffic is treated as a single pool of packets, with the only distinction being whether each packet has been marked in or out.
4. When congestion occurs, the interior nodes implement a dropping scheme in which out packets are dropped before in packets.
5. Different users will see different levels of service because they will have different quantities of in packets in the service queues.

The advantage of this approach is its simplicity. Very little work is required by the internal nodes. Marking of the traffic at the boundary nodes based on traffic profiles provides different levels of service to different classes.

The AF PHB defined in RFC 2597 expands on the preceding approach in the following ways:

1. Four AF classes are defined, allowing the definition of four distinct traffic profiles. A user may select one or more of these classes to satisfy requirements.
2. Within each class, packets are marked by the customer or by the service provider with one of three drop precedence values. In case of congestion, the drop precedence of a packet determines the relative importance of the packet within the AF class. A congested DS node tries to protect packets with a lower drop precedence value from being lost by preferably discarding packets with a higher drop precedence value.
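The four classes and three drop precedence values enumerated above map onto a regular pattern of recommended codepoints in RFC 2597: the 6-bit DSCP for AFxy is the 3-bit class number followed by the 2-bit drop precedence and a trailing 0 bit. A short sketch that generates the twelve recommended values:

```python
def af_dscp(af_class, drop_precedence):
    """DS codepoint for assured forwarding pair AFxy (RFC 2597):
    3-bit class (1-4), then 2-bit drop precedence (1 = low ... 3 = high),
    then a 0 bit."""
    assert 1 <= af_class <= 4 and 1 <= drop_precedence <= 3
    return (af_class << 3) | (drop_precedence << 1)

# Print the twelve recommended AF codepoints:
for c in range(1, 5):
    print("  ".join(f"AF{c}{d} = 0b{af_dscp(c, d):06b}" for d in range(1, 4)))
# First row: AF11 = 0b001010  AF12 = 0b001100  AF13 = 0b001110
```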
This approach is still simpler to implement than any sort of resource reservation scheme but provides considerable flexibility. Within an interior DS node, traffic from the four classes can be treated separately, with different amounts of resources (buffer space, data rate) assigned to the four classes. Within each class, packets are handled based on drop precedence. Thus, as RFC 2597 points out, the level of forwarding assurance of an IP packet depends on

• how much forwarding resources have been allocated to the AF class to which the packet belongs;
• the current load of the AF class; and, in case of congestion within the class,
• the drop precedence of the packet.

RFC 2597 does not mandate any mechanisms at the interior nodes to manage the AF traffic. It does reference the RED algorithm as a possible way of managing congestion. Figure 19.13b shows the recommended codepoints for AF PHB in the DS field.

19.5 SERVICE LEVEL AGREEMENTS

A service level agreement (SLA) is a contract between a network provider and a customer that defines specific aspects of the service that is to be provided. The definition is formal and typically defines quantitative thresholds that must be met. An SLA typically includes the following information:

• A description of the nature of service to be provided: A basic service would be IP-based network connectivity of enterprise locations plus access to the Internet. The service may include additional functions such as Web hosting, maintenance of domain name servers, and operation and maintenance tasks.
• The expected performance level of the service: The SLA defines a number of metrics, such as delay, reliability, and availability, with numerical thresholds.
• The process for monitoring and reporting the service level: This describes how performance levels are measured and reported.
The types of service parameters included in an SLA for an IP network are similar to those provided for frame relay and ATM networks. A key difference is that, because of the unreliable datagram nature of an IP network, it is more difficult to realize tightly defined constraints on performance, compared to the connection-oriented frame relay and ATM networks.

Figure 19.16 shows a typical configuration that lends itself to an SLA. In this case, a network service provider maintains an IP-based network. A customer has a number of private networks (e.g., LANs) at various sites. Customer networks are connected to the provider via access routers at the access points. The SLA dictates service and performance levels for traffic between access routers across the provider network. In addition, the provider network links to the Internet and thus provides Internet access for the enterprise.

[Figure 19.16: Typical Framework for Service Level Agreement — customer networks attach through access routers to the ISP network, which in turn connects to the Internet.]

For example, for the Internet Dedicated Service provided by MCI (http://global.mci.com/terms/us/products/dsl), the SLA includes the following items:

• Availability: 100% availability.
• Latency (delay): Average round-trip transmissions of ≤ 45 ms between access routers in the contiguous United States. Average round-trip transmissions of ≤ 90 ms between an access router in the New York metropolitan area and an access router in the London metropolitan area. Latency is calculated by averaging sample measurements taken during a calendar month between routers.
• Network packet delivery (reliability): Successful packet delivery rate of ≥ 99.5%.
• Denial of service (DoS): Responds to DoS attacks reported by the customer within 15 minutes of the customer opening a complete trouble ticket. MCI defines a DoS attack as more than 95% bandwidth utilization.
• Network jitter: Jitter is defined as the variation or difference in the end-to-end delay between received packets of an IP or packet stream. Jitter performance will not exceed 1 ms between access routers.

An SLA can be defined for the overall network service. In addition, SLAs can be defined for specific end-to-end services available across the carrier's network, such as a virtual private network, or differentiated services.

19.6 IP PERFORMANCE METRICS

The IP Performance Metrics Working Group (IPPM) is chartered by IETF to develop standard metrics that relate to the quality, performance, and reliability of Internet data delivery. Two trends dictate the need for such a standardized measurement scheme:

1. The Internet has grown and continues to grow at a dramatic rate. Its topology is increasingly complex. As its capacity has grown, the load on the Internet has grown at an even faster rate. Similarly, private internets, such as corporate intranets and extranets, have exhibited similar growth in complexity, capacity, and load. The sheer scale of these networks makes it difficult to determine quality, performance, and reliability characteristics.
2. The Internet serves a large and growing number of commercial and personal users across an expanding spectrum of applications. Similarly, private networks are growing in terms of user base and range of applications. Some of these applications are sensitive to particular QoS parameters, leading users to require accurate and understandable performance metrics.

A standardized and effective set of metrics enables users and service providers to have an accurate common understanding of the performance of the Internet and private internets.
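As a concrete illustration of the SLA jitter definition above (variation in end-to-end delay between received packets), one simple way to compute delay variation over a trace of per-packet delays is shown below; the delay values are made up for the example:

```python
def jitter(delays):
    """Jitter in the sense of the SLA item above: differences in
    end-to-end delay between successively received packets, reported
    here as the maximum absolute delay variation (same units as input)."""
    variations = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return max(variations)

# One-way delays (ms) observed for four consecutive packets (invented values):
print(jitter([42.0, 42.5, 41.75, 42.0]))   # 0.75
```

An SLA verifier would compare this figure (or a percentile of the variations) against the contracted bound, such as the 1 ms threshold quoted above.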
Measurement data is useful for a variety of purposes, including

• supporting capacity planning and troubleshooting of large complex internets;
• encouraging competition by providing uniform comparison metrics across service providers;
• supporting Internet research in such areas as protocol design, congestion control, and quality of service;
• verification of service level agreements.

Table 19.5 lists the metrics that have been defined in RFCs at the time of this writing. Table 19.5a lists those metrics that result in a value estimated based on a sampling technique. The metrics are defined in three stages:

• Singleton metric: The most elementary, or atomic, quantity that can be measured for a given performance metric. For example, for a delay metric, a singleton metric is the delay experienced by a single packet.
• Sample metric: A collection of singleton measurements taken during a given time period. For example, for a delay metric, a sample metric is the set of delay values for all of the measurements taken during a one-hour period.
• Statistical metric: A value derived from a given sample metric by computing some statistic of the values defined by the singleton metric on the sample. For example, the mean of all the one-way delay values on a sample might be defined as a statistical metric.

The measurement technique can be either active or passive. Active techniques require injecting packets into the network for the sole purpose of measurement. There are several drawbacks to this approach. The load on the network is increased. This, in turn, can affect the desired result. For example, on a heavily loaded network, the injection of measurement packets can increase network delay, so that the measured delay is greater than it would be without the measurement traffic. In addition, an active measurement policy can be abused for denial-of-service attacks disguised as legitimate measurement activity. Passive techniques observe and extract metrics from existing traffic.
This approach can expose the contents of Internet traffic to unintended recipients, creating security and privacy concerns. So far, the metrics defined by the IPPM working group are all active.

Table 19.5 IP Performance Metrics

(a) Sampled metrics

• One-Way Delay. Singleton definition: Delay = dT, where Src transmits the first bit of a packet at T and Dst receives the last bit of that packet at T + dT. Statistical definitions: percentile, median, minimum, inverse percentile.
• Round-Trip Delay. Singleton definition: Delay = dT, where Src transmits the first bit of a packet at T and Src receives the last bit of the packet immediately returned by Dst at T + dT. Statistical definitions: percentile, median, minimum, inverse percentile.
• One-Way Loss. Singleton definition: Packet loss = 0 (signifying successful transmission and reception of the packet); = 1 (signifying packet loss). Statistical definition: average.
• One-Way Loss Pattern. Singleton definitions: Loss distance: pattern showing the distance between successive packet losses in terms of the sequence of packets. Loss period: pattern showing the number of bursty losses (losses involving consecutive packets). Statistical definitions: number or rate of loss distances below a defined threshold, number of loss periods, pattern of period lengths, pattern of interloss period lengths.
• Packet Delay Variation. Singleton definition: Packet delay variation (pdv) for a pair of packets within a stream of packets is the difference between the one-way delays of the selected packets. Statistical definitions: percentile, inverse percentile, jitter, peak-to-peak pdv.

(Src = IP address of a host; Dst = IP address of a host)

(b) Other metrics

• Connectivity. General definition: Ability to deliver a packet over a transport connection. Metrics: one-way instantaneous connectivity, two-way instantaneous connectivity, one-way interval connectivity, two-way interval connectivity, two-way temporal connectivity.
• Bulk Transfer Capacity. General definition: Long-term average data rate (bps) over a single congestion-aware transport connection.
Here BTC = (data sent)/(elapsed time).

For the sample metrics, the simplest technique is to take measurements at fixed time intervals, known as periodic sampling. There are several problems with this approach. First, if the traffic on the network exhibits periodic behavior, with a period that is an integer multiple of the sampling period (or vice versa), correlation effects may result in inaccurate values. Also, the act of measurement can perturb what is being measured (for example, injecting measurement traffic into a network alters the congestion level of the network), and repeated periodic perturbations can drive a network into a state of synchronization (e.g., [FLOY94]), greatly magnifying what might individually be minor effects. Accordingly, RFC 2330 (Framework for IP Performance Metrics) recommends Poisson sampling. This method uses a Poisson distribution to generate random time intervals with the desired mean value.

Most of the statistical metrics listed in Table 19.5a are self-explanatory. The percentile metric is defined as follows: The xth percentile is the smallest value y such that at least x% of the measurements are ≤ y. The inverse percentile of x for a set of measurements is the percentage of all values ≤ x.

Figure 19.17 illustrates the packet delay variation metric. This metric is used to measure jitter, or variability, in the delay of packets traversing the network. The singleton metric is defined by selecting two packet measurements and measuring the difference in the two delays. The statistical measures make use of the absolute values of the delays.

[Figure 19.17: Model for Defining Packet Delay Variation. Legend: I1, I2 = times that mark the beginning and ending of the interval in which the packet stream from which the singleton measurement is taken occurs; MP1, MP2 = source and destination measurement points; P(i) = ith measured packet in a stream of packets; dTi = one-way delay for P(i).]
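The percentile and inverse percentile statistics above can be computed directly from their definitions, turning a sample metric (a set of singletons) into statistical metrics. A sketch, using invented measurements:

```python
def F(x, measurements):
    """Percentage of the measurements that are <= x (RFC 2330's F(x))."""
    return 100.0 * sum(1 for m in measurements if m <= x) / len(measurements)

def percentile(y, measurements):
    """The yth percentile: the smallest observed value x with F(x) >= y."""
    for x in sorted(measurements):
        if F(x, measurements) >= y:
            return x

def inverse_percentile(x, measurements):
    """The percentage of all values <= x."""
    return F(x, measurements)

data = [3, 5, 5, 8, 12]                 # an invented sample of delay singletons
print(percentile(50, data))             # 5   (F(3) = 20%, F(5) = 60% >= 50%)
print(inverse_percentile(8, data))      # 80.0
```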
Table 19.5b lists two metrics that are not defined statistically. Connectivity deals with the issue of whether a transport-level connection is maintained by the network. The current specification (RFC 2678) does not detail specific sample and statistical metrics but provides a framework within which such metrics could be defined. Connectivity is determined by the ability to deliver a packet across a connection within a specified time limit. The other metric, bulk transfer capacity, is similarly specified (RFC 3148) without sample and statistical metrics but begins to address the issue of measuring the transfer capacity of a network service with the implementation of various congestion control mechanisms.

19.7 RECOMMENDED READING AND WEB SITES

A number of worthwhile books provide detailed coverage of various routing algorithms: [HUIT00], [BLAC00], and [PERL00]. [MOY98] provides a thorough treatment of OSPF.

Perhaps the clearest and most comprehensive book-length treatment of Internet QoS is [ARMI00]. [XIAO99] provides an overview and overall framework for Internet QoS as well as integrated and differentiated services. [CLAR92] and [CLAR95] provide valuable surveys of the issues involved in internet service allocation for real-time and elastic applications, respectively. [SHEN95] is a masterful analysis of the rationale for a QoS-based internet architecture. [ZHAN95] is a broad survey of queuing disciplines that can be used in an ISA, including an analysis of FQ and WFQ. [ZHAN93] is a good overview of the philosophy and functionality of RSVP, written by its developers. [WHIT97] is a broad survey of both ISA and RSVP. [CARP02] and [WEIS98] are instructive surveys of differentiated services, while [KUMA98] looks at differentiated services and supporting router mechanisms that go beyond the current RFCs. For a thorough treatment of DS, see [KILK99].
Two papers that compare IS and DS in terms of services and performance are [BERN00] and [HARJ00]. [VERM04] is an excellent survey of service level agreements for IP networks. [BOUI02] covers the more general case of data networks. [MART02] examines limitations of IP network SLAs compared to data networks such as frame relay. [CHEN02] is a useful survey of Internet performance measurement issues. [PAXS96] provides an overview of the framework of the IPPM effort.

ARMI00 Armitage, G. Quality of Service in IP Networks. Indianapolis, IN: Macmillan Technical Publishing, 2000.
BERN00 Bernet, Y. "The Complementary Roles of RSVP and Differentiated Services in the Full-Service QoS Network." IEEE Communications Magazine, February 2000.
BLAC00 Black, U. IP Routing Protocols: RIP, OSPF, BGP, PNNI & Cisco Routing Protocols. Upper Saddle River, NJ: Prentice Hall, 2000.
BOUI02 Bouillet, E.; Mitra, D.; and Ramakrishnan, K. "The Structure and Management of Service Level Agreements in Networks." IEEE Journal on Selected Areas in Communications, May 2002.
CARP02 Carpenter, B., and Nichols, K. "Differentiated Services in the Internet." Proceedings of the IEEE, September 2002.
CHEN02 Chen, T. "Internet Performance Monitoring." Proceedings of the IEEE, September 2002.
CLAR92 Clark, D.; Shenker, S.; and Zhang, L. "Supporting Real-Time Applications in an Integrated Services Packet Network: Architecture and Mechanism." Proceedings, SIGCOMM '92, August 1992.
CLAR95 Clark, D. Adding Service Discrimination to the Internet. MIT Laboratory for Computer Science Technical Report, September 1995. Available at http://ana-www.lcs.mit.edu/anaWeb/papers.html
HARJ00 Harju, J., and Kivimaki, P. "Cooperation and Comparison of DiffServ and IntServ: Performance Measurements." Proceedings, 23rd Annual IEEE Conference on Local Computer Networks, November 2000.
HUIT00 Huitema, C. Routing in the Internet. Upper Saddle River, NJ: Prentice Hall, 2000.
KILK99 Kilkki, K. Differentiated Services for the Internet.
Indianapolis, IN: Macmillan Technical Publishing, 1999.
KUMA98 Kumar, V.; Lakshman, T.; and Stiliadis, D. "Beyond Best Effort: Router Architectures for the Differentiated Services of Tomorrow's Internet." IEEE Communications Magazine, May 1998.
MART02 Martin, J., and Nilsson, A. "On Service Level Agreements for IP Networks." Proceedings, IEEE INFOCOM '02, 2002.
MOY98 Moy, J. OSPF: Anatomy of an Internet Routing Protocol. Reading, MA: Addison-Wesley, 1998.
PAXS96 Paxson, V. "Toward a Framework for Defining Internet Performance Metrics." Proceedings, INET '96, 1996. http://www-nrg.ee.lbl.gov
PERL00 Perlman, R. Interconnections: Bridges, Routers, Switches, and Internetworking Protocols. Reading, MA: Addison-Wesley, 2000.
SHEN95 Shenker, S. "Fundamental Design Issues for the Future Internet." IEEE Journal on Selected Areas in Communications, September 1995.
VERM04 Verma, D. "Service Level Agreements on IP Networks." Proceedings of the IEEE, September 2004.
WEIS98 Weiss, W. "QoS with Differentiated Services." Bell Labs Technical Journal, October–December 1998.
WHIT97 White, P., and Crowcroft, J. "The Integrated Services in the Internet: State of the Art." Proceedings of the IEEE, December 1997.
XIAO99 Xiao, X., and Ni, L. "Internet QoS: A Big Picture." IEEE Network, March/April 1999.
ZHAN93 Zhang, L.; Deering, S.; Estrin, D.; Shenker, S.; and Zappala, D. "RSVP: A New Resource ReSerVation Protocol." IEEE Network, September 1993.
ZHAN95 Zhang, H. "Service Disciplines for Guaranteed Performance Service in Packet-Switching Networks." Proceedings of the IEEE, October 1995.

Recommended Web sites:

• Inter-Domain Routing working group: Chartered by IETF to revise BGP and related standards. The Web site includes all relevant RFCs and Internet drafts.
• OSPF working group: Chartered by IETF to develop OSPF and related standards. The Web site includes all relevant RFCs and Internet drafts.
• RSVP Project: Home page for RSVP development.
• IP Performance Metrics working group: Chartered by IETF to develop a set of standard metrics that can be applied to the quality, performance, and reliability of Internet data delivery services. The Web site includes all relevant RFCs and Internet drafts.

19.8 KEY TERMS, REVIEW QUESTIONS, AND PROBLEMS

Key Terms

autonomous system (AS), Border Gateway Protocol (BGP), broadcast address, classifier, Differentiated Services (DS), distance-vector routing, dropper, elastic traffic, exterior router protocol, inelastic traffic, Integrated Services Architecture (ISA), interior router protocol, Internet Group Management Protocol, jitter, link-state routing, marker, meter, multicast address, multicasting, neighbor acquisition, neighbor reachability, network reachability, Open Shortest Path First (OSPF), path-vector routing, per-hop behavior (PHB), quality of service (QoS), queuing discipline, Resource ReSerVation Protocol (RSVP), shaper, unicast address

Review Questions

19.1 List some practical applications of multicasting.
19.2 Summarize the differences among unicast, multicast, and broadcast addresses.
19.3 List and briefly explain the functions that are required for multicasting.
19.4 What operations are performed by IGMP?
19.5 What is an autonomous system?
19.6 What is the difference between an interior router protocol and an exterior router protocol?
19.7 Compare the three main approaches to routing.
19.8 List and briefly explain the three main functions of BGP.
19.9 What is the Integrated Services Architecture?
19.10 What is the difference between elastic and inelastic traffic?
19.11 What are the major functions that are part of an ISA?
19.12 List and briefly describe the three categories of service offered by ISA.
19.13 What is the difference between FIFO queuing and WFQ queuing?
19.14 What is the purpose of a DS codepoint?
19.15 List and briefly explain the five main functions of DS traffic conditioning.
19.16 What is meant by per-hop behavior?

Problems

19.1 Most operating systems include a tool named "traceroute" (or "tracert") that can be used to determine the path packets follow to reach a specified host from the system the tool is being run on. A number of sites provide Web access to the "traceroute" tool, for example:
http://www.supporttechnique.net/traceroute.ihtml
http://www.t1shopper.com/tools/traceroute
Use the "traceroute" tool to determine the path packets follow to reach the host williamstallings.com.
19.2 A connected graph may have more than one spanning tree. Find all spanning trees of this graph:
19.3 In the discussion of Figure 19.1, three alternatives for transmitting a packet to a multicast address were discussed: broadcast, multiple unicast, and true multicast. Yet another alternative is flooding. The source transmits one packet to each neighboring router. Each router, when it receives a packet, retransmits the packet on all outgoing interfaces except the one on which the packet is received. Each packet is labeled with a unique identifier so that a router does not flood the same packet more than once. Fill out a matrix similar to those of Table 19.1 and comment on the results.
19.4 In a manner similar to Figure 19.3, show the spanning tree from router B to the multicast group.
19.5 IGMP specifies that query messages are sent in IP datagrams that have the Time to Live field set to 1. Why?
19.6 In IGMPv1 and IGMPv2, a host will cancel sending a pending membership report if it hears another host claiming membership in that group, in order to control the generation of IGMP traffic. However, IGMPv3 removes this suppression of host membership reports. Analyze the reasons behind this design decision.
19.7 IGMP Membership Queries include a "Max Resp Code" field that specifies the maximum time allowed before sending a responding report.
The actual time allowed, called the Max Resp Time, is represented in units of 1/10 second and is derived from the Max Resp Code as follows. If Max Resp Code < 128, then

Max Resp Time = Max Resp Code

If Max Resp Code ≥ 128, the field is interpreted as a floating-point value, with bit 0 set to 1, a 3-bit exponent (exp) in bits 1 through 3, and a 4-bit mantissa (mant) in bits 4 through 7, and

Max Resp Time = (mant | 0x10) << (exp + 3)   in C notation
              = (mant + 16) × 2^(exp + 3)

Explain the motivation for the smaller values and the larger values.
19.8 Multicast applications call an API function on their sockets in order to ask the IP layer to enable or disable reception of packets sent from some specific IP address(es) to a specific multicast address. For each of these sockets, the system records the desired multicast reception state. In addition to these per-socket multicast reception states, the system must maintain a multicast reception state for each of its interfaces, which is derived from the per-socket reception states. Suppose four multicast applications run on the same host and participate in the same multicast group, M1. The first application uses an EXCLUDE{A1, A2, A3} filter. The second one uses an EXCLUDE{A1, A3, A4} filter. The third one uses an INCLUDE{A3, A4} filter. And the fourth one uses an INCLUDE{A3} filter. What is the resulting multicast state (multicast address, filter mode, source list) for the network interface?
19.9 Multicast applications commonly use UDP or RTP (Real-Time Transport Protocol; discussed in Chapter 24) as their transport protocol. Multicast applications do not use TCP as their transport protocol. What is the problem with TCP?
19.10 With multicasting, packets are delivered to multiple destinations. Thus, in case of errors (such as routing failures), one IP packet might trigger multiple ICMP error packets, leading to a packet storm. How is this potential problem avoided? Hint: Consult RFC 1122.
19.11 BGP's AS_PATH attribute identifies the autonomous systems through which routing information has passed.
How can the AS_PATH attribute be used to detect routing information loops?
19.12 BGP provides a list of autonomous systems on the path to the destination. However, this information cannot be considered a distance metric. Why?
19.13 RFC 2330 (Framework for IP Performance Metrics) defines percentile in the following way. Given a collection of measurements, define the function F(x), which for any x gives the percentage of the total measurements that were ≤ x. If x is less than the minimum value observed, then F(x) = 0%. If it is greater than or equal to the maximum value observed, then F(x) = 100%. The yth percentile refers to the smallest value of x for which F(x) ≥ y. Consider that we have the following measurements: -2, 7, 7, 4, 18, -5. Determine the following percentiles: 0, 25, 50, 100.
19.14 For the one-way and two-way delay metrics, if a packet fails to arrive within a reasonable period of time, the delay is taken to be undefined (informally, infinite). The threshold of reasonable is a parameter of the methodology. Suppose we take a sample of one-way delays and get the following results: 100 ms, 110 ms, undefined, 90 ms, 500 ms. What is the 50th percentile?
19.15 RFC 2330 defines the median of a set of measurements to be equal to the 50th percentile if the number of measurements is odd. For an even number of measurements, sort the measurements in ascending order; the median is then the mean of the two central values. What is the median value for the measurements in the preceding two problems?
19.16 RFC 2679 defines the inverse percentile of x for a set of measurements to be the percentage of all values ≤ x. What is the inverse percentile of 103 ms for the measurements in Problem 19.14?
19.17 When multiple equal-cost routes to a destination exist, OSPF may distribute traffic equally among the routes. This is called load balancing.
What effect does such load balancing have on a transport layer protocol, such as TCP?
19.18 It is clear that if a router gives preferential treatment to one flow or one class of flows, then that flow or class of flows will receive improved service. It is not as clear that the overall service provided by the internet is improved. This question is intended to illustrate an overall improvement. Consider a network with a single link modeled by an exponential server with mean service time Ts = 1, and consider two classes of flows with Poisson arrival rates of λ1 = λ2 = 0.25 that have utility functions U1 = 4 − 2Tq1 and U2 = 4 − Tq2, where Tqi represents the average queuing delay to class i. Thus, class 1 traffic is more sensitive to delay than class 2. Define the total utility of the network as V = U1 + U2.
a. Assume that the two classes are treated alike and that FIFO queuing is used. What is V?
b. Now assume a strict priority service so that packets from class 1 are always transmitted before packets in class 2. What is V? Comment.
19.19 Provide three examples (each) of elastic and inelastic Internet traffic. Justify each example's inclusion in their respective category.
19.20 Why does a Differentiated Services (DS) domain consist of a set of contiguous routers? How are the boundary node routers different from the interior node routers in a DS domain?
19.21 The token bucket scheme places a limit on the length of time at which traffic can depart at the maximum data rate. Let the token bucket be defined by a bucket size B octets and a token arrival rate of R octets/second, and let the maximum output data rate be M octets/second.
a. Derive a formula for S, which is the length of the maximum-rate burst. That is, for how long can a flow transmit at the maximum output rate when governed by a token bucket?
b. What is the value of S for B = 250 KB, R = 2 MB/s, and M = 25 MB/s?
Hint: The formula for S is not so simple as it might appear, because more tokens arrive while the burst is being output.
19.22 In RSVP, because the UDP/TCP port numbers are used for packet classification, each router must be able to examine these fields. This requirement raises problems in the following areas:
a. IPv6 header processing
b. IP-level security
Indicate the nature of the problem in each area, and suggest a solution.

CHAPTER 20 TRANSPORT PROTOCOLS

20.1 Connection-Oriented Transport Protocol Mechanisms
20.2 TCP
20.3 TCP Congestion Control
20.4 UDP
20.5 Recommended Reading and Web Sites
20.6 Key Terms, Review Questions, and Problems

The foregoing observations should make us reconsider the widely held view that birds live only in the present. In fact, birds are aware of more than immediately present stimuli; they remember the past and anticipate the future.
—The Minds of Birds, Alexander Skutch

KEY POINTS

• The transport protocol provides an end-to-end data transfer service that shields upper-layer protocols from the details of the intervening network or networks. A transport protocol can be either connection oriented, such as TCP, or connectionless, such as UDP.
• If the underlying network or internetwork service is unreliable, such as with the use of IP, then a reliable connection-oriented transport protocol becomes quite complex. The basic cause of this complexity is the need to deal with the relatively large and variable delays experienced between end systems. These large, variable delays complicate the flow control and error control techniques.
• TCP uses a credit-based flow control technique that is somewhat different from the sliding-window flow control found in X.25 and HDLC. In essence, TCP separates acknowledgments from the management of the size of the sliding window.
• Although the TCP credit-based mechanism was designed for end-to-end flow control, it is also used to assist in internetwork congestion control.
When a TCP entity detects the presence of congestion in the Internet, it reduces the flow of data onto the Internet until it detects an easing in congestion.

In a protocol architecture, the transport protocol sits above a network or internetwork layer, which provides network-related services, and just below application and other upper-layer protocols. The transport protocol provides services to transport service (TS) users, such as FTP, SMTP, and TELNET. The local transport entity communicates with some remote transport entity, using the services of some lower layer, such as the Internet Protocol. The general service provided by a transport protocol is the end-to-end transport of data in a way that shields the TS user from the details of the underlying communications systems.

We begin this chapter by examining the protocol mechanisms required to provide these services. We find that most of the complexity relates to reliable connection-oriented services. As might be expected, the less the network service provides, the more the transport protocol must do. The remainder of the chapter looks at two widely used transport protocols: Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). Refer to Figure 2.5 to see the position within the TCP/IP suite of the protocols discussed in this chapter.
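The decoupling of acknowledgment from credit described in the key points above can be sketched as a simplified model; this is not TCP's actual segment processing, and the class and field names are invented for illustration:

```python
class CreditFlowControl:
    """Sketch of TCP-style credit allocation: the receiver grants credit
    (a window W) together with an acknowledgment number AN; the sender
    may transmit octets in the range [AN, AN + W)."""

    def __init__(self, ack, window):
        self.ack = ack          # next octet expected (AN)
        self.window = window    # octets of credit granted (W)

    def may_send(self, seq, length):
        # Sender-side check: does [seq, seq + length) fall inside the credit?
        return seq >= self.ack and seq + length <= self.ack + self.window

    def acknowledge(self, new_ack, new_window):
        # The receiver can acknowledge data without enlarging credit, or
        # grant new credit without new data: the two are decoupled.
        self.ack, self.window = new_ack, new_window

fc = CreditFlowControl(ack=1000, window=1400)
print(fc.may_send(1000, 1000))   # True
print(fc.may_send(2000, 1000))   # False (would exceed 1000 + 1400)
fc.acknowledge(2000, 1400)       # data acknowledged, same credit regranted
print(fc.may_send(2000, 1000))   # True
```

Shrinking `new_window` while acknowledging is also how the credit mechanism assists congestion control: the flow onto the internet is throttled without withholding acknowledgments.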