Transparent Satellite/Wireless TCP Bandwidth Acceleration

Abstract
While the transition to IP internetworking in space-based and other wireless aerospace applications
has a tremendous upside, there are significant challenges of communications efficiency and
compatibility to overcome. This paper describes a very high efficiency, low-risk, incremental
architecture for migrating to IP internetworking based on the use of proxies. In addition to
impressive gains in communications bandwidth, the architecture provides encapsulation of
potentially volatile decisions such as particular vendors and network technologies.
The specific benchmarking architecture is a NetAcquire Corporation COTS telemetry system that
includes built-in TCP-Tranquility (also known as SCPS-TP) and Reed-Solomon Forward Error
Correction capabilities as well as a specialized proxy-capable network stack. Depending on
network conditions, we will show that the effective bandwidth for satellite transmissions can be
increased by as much as a factor of one hundred with no external changes to existing
internetworking equipment.
Key Words
Telemetry distribution networks, COTS internetworking, TCP-Tranquility, and SCPS-TP.
Introduction
Using commercial off-the-shelf (COTS) networking equipment to carry telemetry data over an
Internet Protocol (IP) internetwork has the potential to greatly reduce cost and complexity as well as
to improve the reliability, flexibility and functionality of large-scale telemetry distribution networks.
In this paper we consider the problem of delivering real-time telemetry over an internetwork that has
a relatively high bit error rate (BER) and long latency. These conditions are typically caused by a
space-resident segment being included in the internetwork, but a wireless segment can also
produce these network properties. Our approach uses the NetAcquire architecture to create a
legacy-friendly system architecture that addresses the challenges of space-based communication.
Physical Structure
Figure 1 shows a typical physical layout. Telemetry is collected from an object, be it a satellite,
spacecraft, aircraft, or other vehicle, by a ground receiver. The telemetry is sent over the local area
network into the facility’s network, which can route data to other facilities via several means
including one or more satellite links. Data arriving at a remote facility is then routed through the
facility network to specific users.
Figure 1: Physical system layout (components shown in gray are volume COTS: space relay, satellite modems, and LAN routers or switches).
100% IP Architecture
In this architecture, the Object has an IP address and users access the Object directly. IP traffic
moves across the opaque Object-to-Ground Receiver link, likely encapsulated in a serial framing
format, but potentially using a wireless network format such as IEEE 802.11.
Figure 2: 100% IP Architecture (Object A to User B, any protocol over IP).
The main appeal of this architecture is that it is “100% IP”. IP is inherently designed as a common
language for communication in a heterogeneous system.
However, the main appeal of this architecture is also a key reason that we did not adopt it: it is
inflexible in that it requires every system component to communicate natively in IP, even if this is not
the most attractive solution in current environments. For many existing systems this would require
wide-scale, simultaneous equipment upgrades and thus would prove to be too costly and too risky
for many projects. In short, 100% IP does not offer a straightforward incremental path forward.
We do not suggest that moving towards 100% IP is a bad idea. On the contrary, there are many
compelling reasons to make this move: engineers and support staff only need to know one
technology, effort is not expended developing and maintaining proprietary technology with similar
functionality, various COTS components are less expensive and more reliable than their proprietary
equivalents, and infrastructure items such as test frameworks can be shared more readily.
However, we will show that it is not necessary to mandate a complete switch to IP technology in a
single step.
There are additional technical problems with a 100% IP architecture. First, the presumably
expensive Object to Ground Receiver link is used for retransmission of data lost anywhere in the
network. Under the assumption that the internetwork is not especially reliable, it is likely that data
will be lost within the internetwork and therefore unnecessary retransmission over this last hop link
will be performed. This space-link bandwidth is very expensive; it would be preferable to have data
lost in the internetwork retransmitted by an element in the local facility rather than the Object itself.
Another technical problem with the 100% IP approach is that both the Object and the users directly
participate in the data transport protocol. This means that changing this protocol would likely require
changing software in both of these locations. This is not ideal since a space-based Object may
have limited upgrade capabilities and the User PCs and workstations are almost certainly
administered independently. Although protocols tend to evolve slowly, over a long-lived project it is
nearly certain that the protocol will be enhanced. At this point in time, many protocols are in the
initial deployment phase and are therefore even more likely to be updated. The protocol also may
be replaced due to changes in the environment such as new technology adoption or changing
usage patterns.
Modular Architecture
A high efficiency design places proxies at the Object and User sites. Data is transferred in multiple
stages: from Object A to the Ground Receiver proxy; from the Ground Receiver to the User-Side
Proxy; from the User-Side Proxy to the User. Each step is done independently so that any errors
can be corrected once, on the segment where they occur.
Figure 3: Modular Architecture (Object A to User B through the proxies, any protocol over IP).
A key advantage of this architecture is that it is legacy-friendly. The technology used to
communicate between the Object and the Ground Receiver and between the User-Side Proxy and
the User is opaque, meaning that currently existing Object and User protocols can be used if the
proxies can speak them (these protocols do not need to change).
This also allows for different user-side proxies to speak different protocols. Therefore it is not
necessary to update every site simultaneously when switching to the telemetry over IP architecture,
so long as the proxy can be configured to speak the local site protocol.
If general Internet users are to access the Object, a “public access” proxy can be installed where
convenient, for example, at the Object’s down station site, at a central command site, at a
university or other research facility. The public Internet generally does not have the bit error rate
problems that we are concerned with, so there is no great need for the proxy to be located close
to the actual users. Also, since different proxies are free to use different communications protocols,
the Internet proxy can employ a data transport appropriate for mass dissemination, such as
unreliable IP multicast, while the core user base proxies simultaneously employ a reliable transport.
From an architecture perspective this is a major improvement, while from the user perspective it
need not even be visible. In both architectures the user identifies the Object with a host name or IP
address. In the 100% IP architecture, this address identifies the Object directly, while in the Modular
architecture, the address refers to a proxy. This distinction is not apparent to the user. The
difference might be visible to a single user traveling to multiple sites, since the local proxy IP
address would differ. However, this can be hidden by using a common host name (e.g., “sat42”)
that maps to the local proxy address at each site.
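As a toy illustration of this per-site mapping (the site names and proxy addresses below are hypothetical, not drawn from any deployed system), each site resolves the same common name to its own local proxy:

```python
# Hypothetical per-site name tables: every site maps the common name
# "sat42" to that site's own local user-side proxy.
SITE_HOSTS = {
    "site-a": {"sat42": "10.1.0.5"},  # site A's user-side proxy
    "site-b": {"sat42": "10.2.0.9"},  # site B's user-side proxy
}

def resolve(site: str, name: str) -> str:
    """Look up a host name in the given site's local table."""
    return SITE_HOSTS[site][name]

print(resolve("site-a", "sat42"))  # → 10.1.0.5
print(resolve("site-b", "sat42"))  # → 10.2.0.9
```

In practice this role is played by per-site DNS or hosts files rather than application code.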
This technique is also applicable on a larger scale, and may be especially relevant for an Internet
proxy. High-volume web sites use host names in a similar way to transparently
route users to the most appropriate server replica[1]. This would allow for the establishment of
proxy servers on different continents so that users worldwide can experience smooth data transfer.
Finally, this architecture encapsulates the protocol used to move data over the internetwork, making
it much easier to upgrade or replace this protocol as necessary. The proxies need to be upgraded
to the new protocol (likely in phases) but neither the Object nor the user base is affected. This
evolvability is important because “the protocol” between the proxies is potentially several
protocols—in addition to basic data transfer, the proxies may be involved in data security, fault
tolerance, user authentication, auditing, etc.
The encapsulation of the proxy-to-proxy protocol has an important implication: it enables a vendor-specific protocol to be run between the proxies without tying the system to the vendor in the long
term. The rationale is that the proxies can be replaced with relatively little disruption to the system.
Naturally, any substituted proxy solution would need to have adequate capabilities, but technology
choice is otherwise unconstrained.
The modularization approach addresses all of the limitations identified for the 100% IP architecture
and therefore is our architecture of choice.
Communication Properties
There are several communication properties that determine which network protocols are suitable for
a given system. Three key properties are:
1. Reliable vs. unreliable: is every bit of data needed or can some loss be tolerated?
2. Ordered vs. unordered: does data need to be delivered in a fixed order or is
reordering tolerated?
3. Timeliness: does the data need to be transmitted in real-time or can it be sent in bulk at a later time?
Our system ensures reliable, ordered, real-time transport of data, which is a requirement for most
telemetry applications.
In addition to these properties, the arrangement of the communication is another important
consideration. Our application has only a small number of data consumers per data source and
little, if any, commonality between different consumers’ network paths.
Based on these properties, TCP and its variants are the best-suited protocols from the Internet protocol suite.
Standard TCP’s Challenges
The core challenge is to find a TCP variant that is able to move data effectively across the high bit
error rate and high latency internetwork. Standard TCP[2,3] was not designed for this type of
network environment.
The problem can be understood in the context of TCP’s core data delivery algorithm. This algorithm
allows TCP to provide the reliability and ordering properties that our application requires.
There is a fixed amount of buffer space available on the receiver allocated to each TCP connection.
The sender will not send data on a connection unless buffer space at the receiver is guaranteed to
be available. The receiver sends an acknowledgement message for every packet of data it receives
in sequence, informing the sender that space has been freed up and that more data can be sent.
When a packet gets lost due to bit errors (see below), the receiver notices the missing data,
transmits an indication of the problem to the sender, and the sender retransmits the lost data.
Assuming that the retransmission is delivered successfully, this recovery adds one round-trip time
(RTT) latency for receiving the lost packet. While the receiver is waiting for the retransmission, all
the later data received is held up at the receiver because the application requires delivery of the
data in order. Therefore, on the receiver side, data stops flowing for one RTT.
The data stream is broken up into discrete packets typically around 1460 bytes in length; these are
the smallest units of data transfer. The packets contain a weak checksum that allows for detection
of most bit errors but not for error correction. If a corrupted packet arrives at the receiver, it is
dropped as if it was lost in transit. Therefore, a single bit error causes the loss of 1460 bytes of data
and, even worse, puts TCP into its undesirable retransmission mode.
There are a variety of other challenges but these are the two main obstacles to achieving
reasonable throughput over the internetwork.
Managing Retransmits
TCP’s standard scheme for dealing with retransmits can only support one retransmit at a time,
which means one retransmit per RTT. For a space-based internetwork with a 500ms one-way
delay, 1 Mbps data rate, 1000 byte packets, and 1e-5 BER, 10 packets are lost on average per
RTT—so being able to recover from one loss per RTT would not be sufficient.
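The ten-losses-per-RTT figure can be reproduced with a short calculation, assuming independent bit errors:

```python
# Expected packet losses per round trip: the number of packets in
# flight per RTT times the probability that a packet of b bits is hit
# by at least one bit error, 1 - (1 - BER)**b.

def expected_losses_per_rtt(ber: float, rate_bps: float,
                            packet_bytes: int, one_way_delay_s: float) -> float:
    rtt = 2 * one_way_delay_s
    packets_per_rtt = rate_bps * rtt / (packet_bytes * 8)
    p_loss = 1 - (1 - ber) ** (packet_bytes * 8)
    return packets_per_rtt * p_loss

# 500 ms one-way delay, 1 Mbps, 1000-byte packets, 1e-5 BER.
print(f"{expected_losses_per_rtt(1e-5, 1e6, 1000, 0.5):.1f}")  # ≈ 9.6, i.e. about 10
```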
There are several proposals for enhancing TCP’s retransmission capability, but based on an
evaluation by NASA's Glenn Research Center [4], TCP-Tranquility (SCPS-TP) provides the best
performance. TCP-Tranquility, known formally as SCPS-TP, is a backwards-compatible extension
to TCP for dealing specifically with the problems of communication in a stressed environment.
SCPS-TP[5] is one of several protocols in the SCPS package from the CCSDS. The acronym
SCPS officially abbreviates “Space Communications Protocol Specification” but “Stressed
Communications Protocol Specification” has been proposed[6] so as to include other
communications environments with similar properties (notably wireless communication). “TP”
abbreviates “Transport Protocol”.
We chose to use TCP-Tranquility’s selective negative acknowledgement (SNACK) feature. SNACK
allows the receiver to communicate multiple missing data segments (“holes”) in the received data
explicitly. There is no limit to the number of holes that may be reported, as multiple SNACK
messages can be sent. However, even with SNACK the data stream will encounter a throughput
limit because retransmits and even the SNACK messages themselves can also be lost.
The sender still needs to wait for positive acknowledgement of the reception of any retransmitted
packets, so SNACK does not make filling in the holes faster or change the delay pattern on the
receiver side. If only one packet is dropped per round trip then SNACK will behave essentially the
same as standard TCP.
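The receiver-side bookkeeping that SNACK enables can be sketched as follows; this illustrates the hole computation only and does not reflect the SCPS-TP wire format:

```python
# Illustration of how a receiver could derive the "holes" that a
# SNACK report communicates, given the byte ranges received so far.

def find_holes(received, next_expected, highest_seen):
    """Return (start, end) byte ranges missing between next_expected
    and highest_seen, given a list of received (start, end) ranges."""
    holes = []
    cursor = next_expected
    for start, end in sorted(received):
        if start > cursor:
            holes.append((cursor, start))  # a gap before this segment
        cursor = max(cursor, end)
    if cursor < highest_seen:
        holes.append((cursor, highest_seen))  # trailing gap
    return holes

# Segments for bytes 0-1000 and 2000-3000 arrived; 4000 bytes were sent.
print(find_holes([(0, 1000), (2000, 3000)], 0, 4000))
# → [(1000, 2000), (3000, 4000)]
```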
TCP-Tranquility is not widely implemented in commodity operating systems, but, as explained in
the Modular Architecture section, our modular architecture encapsulates this protocol within the
proxies, so the lack of mainstream OS support is irrelevant. In fact, TCP-Tranquility as a whole is
not intended to gain widespread deployment because it is designed specifically for stressed
communications rather than the public Internet[6].
Reducing Packet Loss
The second weakness that we addressed was that bit errors cause packet loss. Our strategy was to
develop a novel Reed-Solomon Forward Error Correction (FEC) capability at the TCP level. Each
packet contains redundant data that allows a given number of bit errors to be corrected. If the
receiver sees that FEC was applied, it will forgo the weak TCP checksum and use the much
stronger FEC to detect and correct any bit errors that may have occurred in transit.
This is a substantial improvement: not only is bandwidth saved by avoiding data retransmission due
to bit errors, but also the time-consuming TCP retransmit operation is avoided. The use of FEC
adds about 3.3% overhead to the data but can correct at least 4 bit errors per 247 bytes of data.
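These figures are consistent with a Reed-Solomon code in the style of RS(255, 247), i.e. 8 parity bytes appended to 247 data bytes; the exact code parameters below are our assumption, not a documented implementation detail:

```python
# Assumed RS(255, 247)-style parameters: 8 parity bytes per 247 data
# bytes. A Reed-Solomon code with 2t parity symbols corrects up to t
# symbol (byte) errors, hence at least t bit errors.
DATA_BYTES = 247
PARITY_BYTES = 8

overhead = PARITY_BYTES / DATA_BYTES  # redundancy added, ~3.2-3.3%
correctable = PARITY_BYTES // 2       # byte errors correctable per block
print(f"overhead: {overhead:.1%}, correctable byte errors: {correctable}")
```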
Even though external COTS internetwork components do not specifically support our FEC
technology, the packets are still routed correctly and the scheme is effective. This is because
packets are forward error corrected at the TCP level and the internetwork routes packets at the IP
level. Some compatibility problems may be encountered if firewalls are present, since they inspect
the TCP information. In this case, the firewall may drop the packet unnecessarily if it uses the
weaker TCP checksum to determine the fidelity of the packet.
The only errors our FEC scheme cannot defend against are in the headers. There are typically 22
bytes of header that are not covered by FEC. If a bit error occurs in this region, the packet will be
dropped within the internetwork, the same as in standard TCP.
Figure 4 illustrates the advantage of using FEC to reduce retransmissions on poor networks. The
data is based on four typical bit error rates and a packet size of 1000 bytes. It does not include
multiple retransmissions, which would amplify the difference further. Note that lower numbers are
better and that the scales of both axes are logarithmic: at an error rate of 1e-6, FEC eliminates
retransmissions completely; at 1e-5, FEC reduces retransmissions by a factor of 10; and at 1e-4,
FEC reduces retransmissions by a factor of over 40.
Figure 4: Theoretical reduction of packet retransmissions due to Forward Error Correction
(retransmission count required for a 1-megabyte transfer, with and without FEC, versus bit error rate).
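The shape of Figure 4's curves can be approximated with a simple first-order model (our own reconstruction, not the paper's exact calculation): without FEC, any bit error in a packet drops it; with FEC, only errors in the roughly 22 uncovered header bytes do.

```python
# First-order retransmission model for a 1 MB transfer of 1000-byte
# packets, assuming independent bit errors. With FEC, only the ~22
# header bytes outside FEC coverage can cause a drop.
PACKET_BYTES = 1000
HEADER_BYTES = 22
PACKETS = 1_000_000 // PACKET_BYTES

def retransmits(ber: float, with_fec: bool) -> float:
    vulnerable_bits = (HEADER_BYTES if with_fec else PACKET_BYTES) * 8
    p_drop = 1 - (1 - ber) ** vulnerable_bits
    return PACKETS * p_drop

for ber in (1e-6, 1e-5, 1e-4):
    print(f"BER {ber:g}: {retransmits(ber, False):7.1f} without FEC, "
          f"{retransmits(ber, True):6.2f} with FEC")
```

The model ignores FEC decoding failures (more than 4 byte errors in one block), which are negligible at these error rates.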
These advanced network protocols are available as a NetAcquire “Extreme Network” product option
for the NetAcquire satellite gateway system. The capabilities are configured on a per-Ethernet-port
basis. Both TCP-Tranquility and FEC automatically detect whether the remote host supports the
enhanced protocol, and will fall back to standard TCP/IP if the enhancements are not supported.
This allows COTS PCs and workstations to access the units even if the advanced network
capabilities are engaged.
This deployment strategy further isolates the use of specific vendor technology at the proxies. For
example, NetAcquire systems have extensive built-in processing capabilities like data compression
and real-time data analysis, and it is desirable to enable this vendor-specific functionality in a way
that is transparent to end-user applications. In a NetAcquire proxy architecture, this extended
functionality can occur in parallel with the basic proxy functions.
The system architecture uses two proxies to increase system modularity, with the benefits of
increased flexibility, compatibility, and evolvability. The proxy test platform described below uses
two off-the-shelf NetAcquire C-SIO units (the C-SIO product specifically includes PCM serial I/O
capabilities). In addition, the NetAcquire “Extreme Network” processing option was installed.
No custom proxy software needs to be developed for this application—NetAcquire C-SIO’s
standard product capabilities make proxy and gateway configuration trivial. The proxies on both
ends can communicate with other system components via serial, TCP/IP, and UDP/IP. The
NetAcquire proxies can also perform a wide variety of data processing functions such as
decommutation, data reformatting, data compression, archiving, and general-purpose
computations. On the data consumer side, the proxy has the additional option of using a real-time
publish/subscribe protocol for transmitting data updates to users.
The real-time foundation of the NetAcquire Server platform is important in this system: NetAcquire
systems run a real-time operating system (RTOS) instead of a desktop operating system like
Windows or Unix. Without this RTOS, the proxy would be susceptible to unexpected delays. When
dealing with extreme networks, “unusual” system states such as large amounts of retransmissions
and large amounts of buffered data are not really unusual. In a system with a desktop operating
system, these “unusual” conditions can cause sudden, unexpected operating system and network
delays. With an RTOS and with other real-time software extensions, one can be assured that the
system will function as expected even under degraded conditions.
Test Procedure
Two NetAcquire Server units were connected via a simulated degraded internetwork connection.
The degraded network link simulator was capable of adding between 0 and 1000 milliseconds of
delay and adding bit errors at probabilities ranging from 1e-3 to 1e-9. Server #1 received
synchronous serial data from a Bit Error Rate Testing (BERT) device capable of both detecting bit
errors and reporting latency. The data was read and sent over a TCP/IP, TCP-Tranquility/IP, or
TCP-Tranquility&FEC/IP connection to Server #2. Server #2 output the received data via a serial
interface back to the BERT device.
Figure 5: System configuration for simulation testing (serial data from the BERT device into Server #1;
TCP, TCP-Tranquility, or TCP-Tranquility with FEC over IP across the degraded network link to
Server #2; serial data back to the BERT device).
The goal of the test was to determine the limits of each protocol as the internetwork connection
became increasingly degraded. We chose three test conditions:
1. Baseline: no additional latency and 1e-9 bit error rate.
2. Moderate degradation: 350 ms one-way latency and 1e-5 bit error rate.
3. Severe degradation: 700 ms one-way latency and 1e-4 bit error rate.
The test consisted of setting the bit error testing device to a given bit rate and observing the system
for 30 minutes. If the data was transmitted successfully and the delay was constant, then the rate
was considered sustainable and a higher rate was attempted. 2048 Kbit/S was the maximum rate
tested. This proved to be a high enough rate to differentiate the protocols.
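The search for each protocol's limit can be sketched as a simple rate ladder (the rate steps and the observation callback below are illustrative, not the actual test harness; in the real test each rate was observed for 30 minutes on the BERT device):

```python
# Illustrative rate ladder: step up until a rate fails to sustain
# error-free transfer with constant delay.
RATES_KBPS = [64, 128, 256, 512, 1024, 2048]  # 2048 kbit/s was the max tested

def max_sustainable_rate(is_sustainable) -> int:
    """Return the highest rate (kbit/s) at which the observation
    callback reports sustained, constant-delay transfer, or 0."""
    best = 0
    for rate in RATES_KBPS:
        if is_sustainable(rate):  # e.g. observe the link for 30 minutes
            best = rate
        else:
            break
    return best

# Toy stand-in for a real observation: pretend 512 kbit/s is the limit.
print(max_sustainable_rate(lambda r: r <= 512))  # → 512
```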
The following graph in Figure 6 shows the performance of the various protocol configurations under
various WAN conditions.
Figure 6: Throughput results for internetwork simulation testing (throughput in Kbit/S for each
protocol variant at 0 ms / 1e-9 BER, 350 ms / 1e-5 BER, and 700 ms / 1e-4 BER).
The circles indicate complete breakdown of the protocol due to the network conditions. Network
delays are one-way and the TCP window size was set to a large value. The system was run at
various rates to determine the maximum rate at which the time offset between sender and receiver
was constant (i.e., the network is keeping up). The maximum speed for the telemetry stream used
for testing was 2048 Kbit/S, so the readings of 2048 on the graph do not represent maximum
system speeds.
For a baseline (non-degraded) network connection, all three protocol variants perform equally
well. As soon as significant transmission errors and communication delay are introduced, plain TCP
quickly becomes unusable and exhibits essentially zero throughput. Furthermore, as bit error rates
continue to increase, TCP-Tranquility also becomes unusable and throughput drops to zero. Only the
combination of TCP-Tranquility and FEC provides good throughput under the worst network
conditions.
Conclusion
While the transition to space-based IP internetworking has a tremendous upside, the path to the
goal is not clear. In this paper we address two specific challenges: the need for incremental system
migration and the requirement to support error-prone and high-delay space-communications links.
We presented the technology used by NetAcquire systems for addressing these challenges.
NetAcquire systems offer both native TCP-Tranquility (also known as SCPS-TP) and Reed-Solomon Forward Error Correction capabilities built into their network stack. In addition, a real-time
operating system ensures that the system will behave as expected even in extreme conditions. The
functionality of a NetAcquire interconnect is exposed using a proxy architecture that provides COTS
tools for implementing gateways to legacy systems.
Finally, the actual benchmark results provide compelling evidence for the importance of addressing
unique space-segment architecture demands in real-world systems.
References
1. Akamai Technologies Inc.
2. Information Sciences Institute, “Transmission Control Protocol DARPA Internet Program Protocol
Specification,” RFC 793, Internet Engineering Task Force, September, 1981.
3. Jacobson, Van, Braden, Bob, and Borman, Dave, “TCP Extensions for High Performance,” RFC
1323, Internet Engineering Task Force, May, 1992.
4. Lawas-Grodek, Frances, Tran, Diepchi, Dimond, Robert, and Ivancic, William, "SCPS-TP, TCP
and Rate-Based Protocol Evaluation For High Delay, Error Prone Links," Paper T1-20, Proceedings
of Space Ops 2002, Houston, TX, October 9-12, 2002.
5. Space Communications Protocol Specification (SCPS)—Transport Protocol (SCPS-TP). Blue
Book. Issue 1. May 1999. Consultative Committee for Space Data Systems (CCSDS).
6. Cosper, Amy, "Maximized efficiency in stressed environments," Satellite Broadband, May, 2002.
Copyright 2003, NetAcquire Corporation. All rights reserved. Permission granted for publication in
2003 International Telemetry Conference proceedings.