ERROR-RESILIENT REAL-TIME VIDEO COMMUNICATIONS OVER IP-BASED LANS

Guo-Shiang Ma and Ming-Syan Chen∗
Electrical Engineering Department
National Taiwan University
Taipei, Taiwan, ROC

Jeng-Chun Chen†
Philips Research East Asia†
Philips Corporation
Taipei, Taiwan, ROC

ABSTRACT

Video communication over IP networks involves more error-control issues than video communication over ATM networks. First, IP networks provide neither prioritized transmission nor guaranteed QoS for data packets. Second, video streams need to be packetized carefully before transmission over IP networks in order to have good error resilience characteristics. Finally, upon the loss of IP packets, the mitigation of spatial and temporal error propagation in IP networks is very different from that in ATM networks. In this paper, we develop an error-resilient video communication system based on the H.323 family of ITU-T recommendations. With a low-delay backward channel for packet receipt reports, bidirectional error resilience mechanisms are employed to avoid error spreading in the temporal domain. Two operation modes, i.e., ACK and NACK, are devised for two communicating parties to eliminate error propagation after either one frame lag or one round trip delay without consuming much additional bandwidth as compared to simple INTRA updates. Our system can also adaptively choose an operation mode to maintain a satisfactory video quality at the decoder under variable channel error conditions. Experimental results show that our error control scheme is suitable for video transmission over IP-based LANs at packet loss rates up to 20%.

1. INTRODUCTION

The Internet is in essence a packet-switched network with variable bandwidth, variable transmission delay, and variable error characteristics. Therefore, real-time video communication over the Internet needs to take various packet loss conditions into consideration.
In particular, error resiliency in video applications via error detection, recovery, and concealment is the key to satisfactory video quality. There is an extensive literature on error control and concealment for video communication, especially over ATM networks. However, little attention has been paid to error resiliency in real-time video communication over IP networks. Readers are referred to  for a discussion of the differences between circuit-switched and packet-based videoconferencing solutions.

There has been a lot of research on error concealment at the decoder over ATM networks . However, most postprocessing techniques could fail if a single packet contains rows of macroblocks. Note that transmitting many small IP packets over the Internet consumes a large amount of bandwidth in packet headers. This trade-off makes ordinary error concealment mechanisms less effective, which in turn leads our system design to incorporate feedback-based error control. As will be seen in our performance study later, utilizing a backward channel can effectively maintain the visual quality at a satisfactory level.

∗ The corresponding author of this paper.

The encoder can encode input video sequences according to variable channel error situations when a separate backward channel is available for the decoder to send packet loss reports. This channel can usually be established and maintained easily in a small videoconference. The most attractive feature is that error propagation can be removed completely and quickly at the decoder, even without subsequent full INTRA-coded frames. If enough memory resources are available at the encoder, it can encode frames using some previous correctly decoded frames for reference, thus removing error propagation at the decoder. This technique has two operation modes, i.e., NACK and ACK modes. As will be seen later, the NACK mode is preferred when the packet loss rate is low since it places little burden on the output bandwidth.
The ACK mode works better at high packet loss rates since a single packet loss only affects one frame. Note that the ACK mode generally consumes more bandwidth since acknowledged frames rather than adjacent frames are used for reference. Thus, the codec is recommended to use a coarser quantizer if rate control is required to meet a certain bandwidth requirement.

In our system, we focus on transport-level control and the selection of reference pictures. H.261 video streams are packetized on a GOB basis in our modified RFC 2032  scheme. The RTP module then takes care of the transmission of these video data packets. A simple temporal error concealment technique is employed at the decoder to conceal packet errors with a short processing delay.

We implement bi-directional error control mechanisms in our H.323 video conferencing software to deal with packet errors and error propagation. Our error resilience mechanism establishes an out-of-band data channel that serves as a backward channel for the decoder to send packet receipt reports. This connection-oriented channel is assumed to have a short delay; otherwise, late backward reports would lose their value to the encoder. To suppress error propagation while maintaining coding efficiency, i.e., the efficiency of compression, our system is able to switch between two operation modes in the presence of variable network conditions. We then observe how effectively picture quality is maintained without the large additional bandwidth required by full INTRA-frame refresh.

The contribution of this paper includes the design of error-resilient mechanisms and the performance analysis of the NACK and ACK operation modes. In our prototype model, we adopted the design concept of the Reference Picture Selection (RPS) option of H.263+ . We use PSNR and bit rates to evaluate the performance of both operation modes at different packet loss rates.
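Since PSNR serves as the objective quality measure throughout, the following minimal sketch shows how the luminance PSNR of a decoded frame could be computed and averaged over a sequence. The function names `y_psnr` and `average_psnr`, the flat-list frame representation, and the 8-bit peak value of 255 are assumptions made for illustration; they are not taken from the VCS implementation.

```python
import math


def y_psnr(reference, decoded, peak=255):
    """Return the PSNR in dB between two equally sized luminance planes.

    Both planes are flat sequences of pixel values (an illustrative
    representation, not the codec's internal one).
    """
    if len(reference) != len(decoded) or not reference:
        raise ValueError("planes must be non-empty and of equal size")
    # Mean squared error over all luminance samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical planes: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)


def average_psnr(frame_pairs):
    """Average the per-frame PSNR over (reference, decoded) pairs."""
    values = [y_psnr(ref, dec) for ref, dec in frame_pairs]
    return sum(values) / len(values)
```

A sequence-level figure would then be obtained by passing one (original, decoded) pair per frame to `average_psnr`.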
Our experimental results show that the NACK mode is appropriate at low packet loss rates since it requires fewer memory resources and consumes less bandwidth. However, it is sensitive to packet loss and could fail when encountering successive, bursty packet losses. The ACK mode is very insensitive to packet loss because a packet error only affects a single frame. We have observed that the visual quality can be maintained in the ACK mode even at high packet loss rates up to 20%. Clearly, if some other performance measures are of interest, our system can evaluate the performance of both modes and switch between them accordingly.

This paper is organized as follows. Details of our system model are described in Section 2. We present experimental results and performance evaluation in Section 3. Conclusions are drawn in Section 4.

[Figure 1, sender-side block labels: Input Video Sequence, Video Compressor, Packetizer, RTP Send, Forward Video Channel, Session Management, Backward Channel, LAN]

2. DESIGN OF ERROR-RESILIENT VIDEO COMMUNICATION

The checksum field in the IP header ensures the integrity of the IP packet, providing protection against bit errors. Thus, forward error correction (FEC), which consumes bandwidth, is not justifiable on IP networks. IP-based LANs are characterized as having adequate bandwidth for video applications and providing a bit-error-free environment. In this study, we assume a maximum packet loss rate of 20%.

To validate our error-resilient design for video communication, we develop an ITU-T H.323  compliant videoconference application with a focus on highly interactive but small video conferences. To boost error resiliency, this software application takes advantage of two-way communication mechanisms such as acknowledgments of correctly received video packets via backward channel messages.

Packet losses can severely degrade the quality of future decoded pictures at the decoder since an IP packet usually carries an integer number of MB rows.
Clearly, applying postprocessing techniques for error concealment alone is not enough. The ITU Recommendation H.245  allows the encoder and decoder to build an out-of-band channel on which the decoder can return packet loss information. To exploit this feature, our design makes use of a connection-oriented data channel. On receiving backward channel messages from the decoder, the encoder can take the packet loss into account and encode severely damaged MBs in INTRA mode and less affected MBs in INTER mode.

2.1. H.323 Video Conference System (VCS)

H.323 Video Conference System (VCS) is our software for video conferencing between two parties, provided by courtesy of Philips Innovation Center, Taipei (PICT). In our joint research project, VCS has served as an excellent testbed because of its full H.323 functionality. There are seven major subsystems in this H.323 VCS software: Session Management, Q.931 Management, H.245 Management, Channel Management, RAS Management, RTP Management, and Socket Management.

As shown in Figure 1, the Data Channel is used as a separate backward channel for interactive error concealment. Since it is built upon TCP sockets, all backward messages reach the encoder in order and intact. The Session Management is responsible for passing these messages to its forward video channel. All received back-channel messages are then processed by the encoder before a new input video frame is encoded. The encoder can both free useless frame buffers on ACK messages and use the decoder’s requested frame for reference on NACK messages.

2.2. Coding Control for NACK mode

If no packet loss is present during video transmission, the encoder can always encode data with the expected coding efficiency. However, if the encoder does not know that some packets did not reach the other end, subsequent decoded frames can suffer from a degradation in visual quality because of predictive coding.
With NACK or ACK messages from the decoder, the encoder can react to them right away to eliminate error propagation at the decoder.

[Figure 1: Video transmission model of VCS. Receiver-side block labels: RTP Receive, Depacketizer, Packet Loss Simulator, Video Decompressor, Local Display Device, Reverse Video Channel, Session Management, Backward Channel]

[Figure 2: Illustration of operation in the NACK mode. Encoder and decoder frame timelines with annotations "Back up frame 2", "Encode frame 7 using frame 2 for reference", "Keep only frames 2 and 7", and "Decode frame 7 using the backup frame for reference". Back-channel messages: (1) ACK(1, 1), (2) ACK(2, 2), (3) NACK(3, 2), (4) ACK(4, 7), (5) ACK(8, 8)]

When operating in the NACK mode without any packet loss, the decoder keeps sending backward acknowledgment messages for the encoder to maintain frame buffers. By checking the TR (Temporal Reference) field in an ACK message, the encoder can then safely release frame buffers prior to the acknowledged frame. The best coding efficiency is maintained since the time lag between the reference frame and the current one is always one frame interval.

After a packet loss, the decoder sends an NACK with the RTR (Requested Temporal Reference) field set to the temporal reference of some previous correctly decoded frame kept at the decoder. Upon the receipt of this NACK message, the encoder then encodes data using the decoder’s requested frame for reference. The TRP (Temporal Reference for Prediction) field in our modified RFC 2032 header is set to the RTR value in this NACK message to inform the decoder which backup frame is used for reference.

When a transmission error occurs, the decoder sends through the backward channel a negative acknowledgment message to inform the encoder. Consider the execution scenario in Figure 2 as an example. In Figure 2, assume that frame 3 cannot be correctly decoded.
The decoder then backs up frame 2 into its additional frame buffer and sends backward NACK(3, 2) requesting the encoder to use frame 2 for reference. On receipt of the NACK for frame 3 before the encoding of frame 7, the encoder can therefore use frame 2 for reference and free the memory resources for frame 3 through frame 6 since these frames are all corrupt at the decoder. Until the encoded data of frame 7 arrives, the decoder suffers from using inconsistent frames for reference for the period of one round trip delay. Note that the decoder requires only one additional frame buffer in the NACK operation mode. If the round trip delay is short, which is assumed to be one of the characteristics of LANs, the impact of packet losses at the decoder can be removed after an acceptable period of time. The advantage of this mechanism over simple INTRA updates lies in the increased coding efficiency.

silience mechanisms. We chose an H.261 codec over an H.263 codec since the PVRG-P64 Codec is more suitable for our frame-based experiment and since bandwidth is not our main concern in our target network environment of local area networks. Our proposed mechanism is in fact applicable to different video coding standards as long as an RTP payload-specific header format (e.g., RFC 2429 for H.263+) and a back-channel message format are available.

To simulate real-time video conferencing, our system encodes at 15 fps (frames per second) the first 800 frames of a typical videophone sequence (Mother and Daughter) in the QCIF format. This video sequence has little motion, and its background scene does not change frequently, as is typical of a video-conferencing setting. We conducted two sets of experiments, one in the NACK mode and the other in the ACK mode. Each set has four different packet loss rates so as to observe the impact of packet loss. A Poisson process is used to generate packet loss patterns for different packet loss rates.
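One plausible way to generate such loss patterns is sketched below. The paper does not state how the Poisson process is parameterized, so the choices here are assumptions: loss events arrive with exponentially distributed gaps measured in packet slots, the event rate is chosen so that a slot contains at least one event with probability equal to the target loss rate, and a fixed seed makes the pattern repeatable. The function name `loss_pattern` is hypothetical.

```python
import math
import random


def loss_pattern(num_packets, loss_rate, seed=0):
    """Return a list of booleans, True where a packet is dropped.

    Loss events follow a Poisson process over packet slots; a packet is
    dropped if at least one event falls into its slot.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    lost = [False] * num_packets
    if loss_rate == 0.0:
        return lost
    # Event rate per slot such that P(at least one event) equals loss_rate.
    rate = -math.log(1.0 - loss_rate)
    rng = random.Random(seed)  # private generator: same seed, same pattern
    t = rng.expovariate(rate)
    while t < num_packets:
        lost[int(t)] = True
        t += rng.expovariate(rate)
    return lost
```

With 800 frames of three packets each, a pattern for the 10% case could be drawn once as `loss_pattern(2400, 0.10, seed=1)` and then replayed for each operation mode.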
The same packet loss patterns are used in both sets for fair performance comparison.

The video channel of our H.323 VCS software feeds the H.261 codec 15 input video frames every second. Each frame is first encoded and packetized into three packets on a GOB basis. Note that a modified RFC 2032 header is prefixed to each packetized data packet. The video channel then passes all packets to its RTP module for real-time transmission. Bit rates of 128 kbps are used to simulate high-bandwidth LAN connections. A constant quantizer of 8 is used for encoding the whole sequence. We compare the robustness to errors of an H.261 video coder with only error concealment to another H.261 coder that takes advantage of a feedback channel. The backward channel is error-free since it transports back-channel messages by TCP.

3.2. Experiments in the NACK and ACK Modes

2.3. Coding Control for ACK mode

When operated in the ACK mode, the decoder sends ACK messages to acknowledge all correctly decoded frames and the encoder uses for reference only the requested frame indicated in the back-channel message. The coding performance is lower even when no transmission errors occur since the time lag between the reference frame and the current frame is more than one round trip delay. However, error propagation can be avoided entirely since only acknowledged frames are used for reference.

Consider the execution scenario in Figure 3 as an example. The decoder acknowledges and backs up every correctly decoded frame until an error occurs to frame 4. Since frame 5 is encoded using frame 1 for reference, the decoder can still correctly decode frame 5 in reference to its backup frame 1 without any error propagated from corrupt frame 4. Note that no succeeding frames would be in reference to frame 4. The decoder can thus avoid error propagation after the error to frame 4 entirely. When the decoder receives data for frame 6, it can safely free the frame buffer for frame 1 and use frame 2 for reference.
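The encoder-side behavior described for the two modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the VCS implementation: the class name `ReferenceSelector` and its methods are hypothetical, frames are identified only by their temporal reference, the ACK mode falls back to the previous frame until the first ACK arrives, and a NACK that requests a frame no longer stored leads to INTRA coding.

```python
from typing import List, Optional


class ReferenceSelector:
    """Encoder-side bookkeeping of stored frames and reference choice."""

    def __init__(self, mode: str):
        if mode not in ("NACK", "ACK"):
            raise ValueError("mode must be 'NACK' or 'ACK'")
        self.mode = mode
        self.stored: List[int] = []            # temporal references kept in memory
        self.acked: Optional[int] = None       # newest acknowledged frame
        self.requested: Optional[int] = None   # frame requested by a pending NACK

    def on_ack(self, tr: int) -> None:
        """Frame `tr` was decoded correctly: release the buffers before it."""
        self.acked = tr
        self.stored = [t for t in self.stored if t >= tr]

    def on_nack(self, rtr: int) -> None:
        """The decoder requests `rtr` as reference: drop every other buffer."""
        self.stored = [t for t in self.stored if t == rtr]
        # Assumption: if the requested frame is gone, the next frame is INTRA.
        self.requested = rtr if self.stored else None

    def encode(self, tr: int) -> Optional[int]:
        """Return the reference for frame `tr`; None means INTRA coding."""
        previous = self.stored[-1] if self.stored else None
        if self.requested is not None:
            ref = self.requested               # answer a pending NACK once
            self.requested = None
        elif self.mode == "ACK" and self.acked is not None:
            ref = self.acked                   # only acknowledged frames
        else:
            ref = previous                     # adjacent-frame prediction
        self.stored.append(tr)
        return ref
```

Replaying the NACK scenario (ACKs for frames 1 and 2, then a NACK requesting frame 2 before frame 7 is encoded) makes `encode(7)` return 2 and leaves only frames 2 and 7 stored. In the ACK mode with a lag of four frames, frames 5, 6, and 7 are predicted from frames 1, 2, and 3 respectively.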
To reduce the number of additional frame buffers at the decoder by half, every two ACK messages could request the same previous reference frame.

It should be noted that enough previous frames must be kept at the encoder to cover the maximum round trip delay of NACK and ACK messages. However, the number of additional frame buffers at the decoder can be reduced to one in the NACK mode. Storage reduction by half is possible in the ACK mode at the cost of slightly increased bit rates.

The average peak signal-to-noise ratio (PSNR) is used as a distortion measure of objective quality. Note that PSNR is of less interest when a feedback channel is available, since attention should also be paid to the relationship between the average bit rate and the recovery time lag. To illustrate the trade-offs between compression performance and video quality, we measure frame sizes and PSNR values. The forced updating in H.261 stipulates that a macroblock should be INTRA-coded at least once for every 132 times of its transmission. The PVRG-P64 Codec simply INTRA-codes one whole frame every 133 frames. Here we present the results for frame 400 through frame 532 in Figures 4 and 5.

In Figure 4, each of the frame size curves with different loss rates (5%, 10%, and 20%) is compared with that at zero packet loss rate. In the NACK mode, each frame error can result in a sudden frame size increase of some subsequent frame. As the packet loss rate rises above 10%, the variance of frame size can be quite dramatic. This can be a burden to a constant bit rate coder. Note that as the loss rate increases, the overall bit rate increases accordingly, implying that the NACK mode is not a good option at high loss rates due to the increases in both the bit rate and the number of corrupt frames at the decoder.

All four frame size curves look similar to one another in the ACK mode despite different loss rates.
Since only acknowledged frames are used for prediction, packet loss has little impact on the average bit rate as long as the loss rate is not beyond 50% and the round trip delay for the back-channel is short. From Table 1, we can see that the average bit rate is in the range of 120-124 kbps. Unlike in the NACK mode, there is no obvious bit rate increase in the ACK mode as the loss rate rises.

When the packet loss rate is below 5% in the NACK mode, the average frame size is slightly larger than 615 bytes and the average bit rate is below 80 kbps as shown in Table 1. In the ACK mode, however, the average frame size is 1014 bytes and the average bit rate is 120 kbps even at

[Figure 3: Illustration of operation in the ACK mode. Encoder and decoder frame timelines with annotations "Back up frame", "Encode frame 7 using frame 3 for reference", "Keep frames 3 through 7", and "Decode frame 7 using frame 3 for reference". Back-channel messages: (1) ACK(1, 1), (2) ACK(2, 2), (3) ACK(3, 3), (4) ACK(5, 5), (5) ACK(6, 6)]

3. PERFORMANCE ANALYSIS

3.1.
System Model for Experiments

To perform our experiments, we modified the public-domain PVRG-P64 Codec 1.1  to incorporate error concealment and re-

[Figure 4: Encoded frame size over frame sequence number for PB = 0.05, 0.1, 0.2. The operation mode is NACK (left) and ACK (right). Each panel plots Frame Size (Bytes) against Frame Sequence Number (400 to 520) for one loss rate together with the 0% curve.]

[Figure 5: Y-PSNR performance over frame sequence number for PB = 0.05, 0.1, 0.2. The operation mode is NACK (left) and ACK (right). Each panel plots Y-PSNR (dB) against Frame Sequence Number (400 to 520) for one loss rate together with the 0% curve.]

zero loss rate. In other words, the ACK mode requires 50% more bandwidth than the NACK mode does even when no packet loss is present.

From Figure 5 we can see that the NACK mode is more sensitive to packet loss in terms of PSNR values than the ACK mode. Note that PSNR values fluctuate due to the selection of reference frames.
Without the capability of selecting reference frames, the curve would have gone down to a level where blocking artifacts and blurring could be very annoying. It is possible to improve the PSNR performance if sophisticated error concealment techniques are employed, which is orthogonal to the main theme of this paper. When the frame rate is high and the round trip delay is short, the time period of affected frames can be tolerable. Again, the ACK mode has a better PSNR performance than the NACK mode since the adverse impact of a packet error is confined to a single frame.

4. CONCLUSIONS

This paper explains the design concept of our error-resilient video communication system over IP-based LANs. We have implemented bi-directional error control mechanisms in our H.323 video conferencing software to deal with packet loss under various network conditions. We observe that our system is effective in maintaining picture quality while saving the large amount of bandwidth that full INTRA-frame refresh would require. Our experimental results have shown that the NACK mode is appropriate at low packet loss rates because it requires less memory and less bandwidth. Also, the visual quality can be maintained in our ACK mode even at high packet loss rates up to 20%.

Table 1: Comparison of NACK and ACK modes in terms of averaged measures over 665 frames

Loss rate                                  0%               5%               10%              20%
                                           NACK    ACK      NACK    ACK      NACK    ACK      NACK    ACK
Ave. video bit rate (kbps)                 73.3    120.8    78.7    121.6    81.7    122.8    87.3    124.0
Ave. Y-PSNR (dB)                           34.965  35.084   34.712  35.009   34.325  34.921   33.498  34.53
No. of affected frames (total = 665)       0       0        396     101      511     209      575     453
Ave. no. of frames buffered at encoder     5.41    5.55     4.29    5.79     3.95    6.26     3.61    6.12

The average PSNR values in Table 1 are measured on 665 frames (from frame 134 through frame 798) of the test sequence. We can see that the ACK mode is not sensitive to packet loss in terms of PSNR values. The ACK mode outperforms the NACK mode primarily because only acknowledged frames are used for reference.
From Table 1 we can also see that the average bit rate for the ACK mode is quite high even without packet loss. Therefore, the system we devised starts with the NACK mode and switches to the ACK mode when a threshold is met. This threshold needs to take into account the bit rate, the packet loss rate, the number of additional memory buffers, the number of affected frames at the decoder, etc. Clearly, this design issue is system-dependent, since each factor carries a different weight in different network environments and on different execution platforms.

5. REFERENCES

Stanford University Portable Video Research Group. PVRG-P64 Codec 1.1. 1993.
T. Turletti and C. Huitema. RTP Payload Format for H.261 Video Streams. 1996.
International Telecommunication Union. ITU-T Recommendation H.245: Control Protocol for Multimedia Communication. 1996.
International Telecommunication Union. ITU-T Recommendation H.323: Visual Telephone Systems and Equipment for Local Area Networks which Provide a Non-Guaranteed Quality of Service. May 1996.
International Telecommunication Union. ITU-T Recommendation H.263 Version 2: Video Coding for Low Bitrate Communication. 1998.
Y. Wang and Q.-F. Zhu. Error Control and Concealment for Video Communication: A Review. Proceedings of the IEEE, 86(5):974–997, May 1998.
M. H. Willebeek-LeMair and Z.-Y. Shae. Videoconferencing over Packet-Based Networks. IEEE Journal on Selected Areas in Communications, 15(6):1101–1114, August 1997.