ANC4 (2017)

Lewis Mackenzie
ANC4 Topics
•
Internet architecture and routing;
•
Ethernet and VLANs;
•
Congestion control and quality of service;
•
Overlay networks;
•
Multimedia transmission;
•
Communications system implementation.
There are no compulsory texts for this course. All essential
material is provided.
For background reading: see ANC4 Moodle page.
1. Internet Architecture and Routing
Internet Structure I
• Hosts and private LANs (mostly wired Ethernets or wireless 802.11 networks) are attached to the Internet via edge networks run by ISPs.
• Customer networks may be attached to ISPs via ADSL/VDSL, cable, wireless (e.g. GPRS, 3G, 4G etc), dedicated fibre etc.
• Edge (consumer) ISPs link to higher level ISPs forming what is loosely called “Internet core” (networks which provide transit to other ISPs).
• In addition to ISPs there are dedicated private networks which do not offer transit.
[Figure: hosts and a private LAN attached to ISP networks, which connect to the Internet core.]
Internet Structure II
•
Internet hosts use IP to transmit/receive data. IP packets are routed geographically. Internet is
divided into regions called autonomous systems (AS) each controlled by single routing authority. E.g
JANET is AS786 (for an accurate definition see RFC 1930)
[Figure: two ASs, each with interior routers and border gateways, linked at a peering point.]
•
AS connected to others via border gateways (BGs). AS-AS interface may involve peering, voluntary
settlement-free (sender keeps all) interconnection for traffic exchange or transit, where one AS is a
customer of the other. Peering may be private (point-point) or multiple parties may link at peering
point such as LINX (London Internet Exchange) under publicly agreed rules: see www.linx.net.
Private peering agreements are political and usually negotiated between ostensible equals.
•
Interior routers handle local traffic. Each router has routing table indexed on network portion of IP
address. BG may send incoming packet to another BG or an interior router.
Internet Structure III
•
AS may be a stub (only one external connection), multihomed (>1 connection
but no through traffic) or transit (allows through traffic).
•
Categorize AS’s as follows:
– Tier 1 :can access the whole Internet settlement free (everybody else is a customer
or a peer). Only ~12 such networks exist. Global Internet backbones (e.g. AT&T,
NTT, Sprint).
– Tier 2 :peers with some networks but buys transit (is a customer) to reach some
parts of Internet (typically national carriers)
– Tier 3 : no peering, buys transit from (usually) a Tier 2 AS. Many consumer ISPs.
•
BG tables are big. Border Gateway Protocol (BGP) used to update.
Optimisation not feasible; aim is reachability (given settlement and peering
relationships).
•
Within an AS interior routers use an interior gateway protocol (IGP) to update
routing tables. Commonest are Routing Information Protocol (RIP) and Open
Shortest Path First Protocol (OSPF). These try to optimise routes.
Communication Architectures
•
Series of layers composing communications system form a communication
architecture.
– specifies functionality of each layer and rules for interaction between layers. Each
layer has one or more protocols.
•
Revision: Compare OSI & Internet architectures!
•
In an architecture a layer may support multiple protocols (e.g. Internet
transport layer has TCP and UDP).
•
Implementation of architecture with specified protocols for each layer is a
protocol stack.
•
In Internet architecture at network layer all networks use IP. Currently primary
version is still IPv4. We focus here on this unless otherwise stated.
•
Subnet access layer in Internet sometimes called “Layer 2”. Why?
•
Following this logic the IP Layer is Layer 3 and the Transport Layer is Layer 4.
Internet “Layer 2”
•
Can actually consist of complex sub-architecture with multiple sublayers.
•
Provides data pipes between routers or routers and hosts.
•
Different possibilities:
– Point-to-point (e.g. LAPB, PPP)
– Broadcast Multiple Access
– Non-broadcast Multiple Access (NBMA)
Encapsulation
•
Consider TCP connection to host attached to an Ethernet.
– On local net data is transmitted as Ethernet frames (max payload 1500 bytes).
– Within each Ethernet frame is an IP packet.
– Within each IP packet is a TCP segment.
[Figure: Ethernet Header | IP Header | TCP Header | TCP Data | Ethernet Trailer]
• On any TCP/IP network,
– max IP packet size is called maximum transmission unit (MTU);
– max data in a TCP segment is called maximum segment size (MSS).
•
In example above MTU=1500, MSS=1460 (if no options are used).
•
MSS can be set for a given direction when a TCP connection is set up (uses the TCP
header MSS option)
•
Most systems will attempt to avoid necessity for fragmentation by keeping packets
smaller than the minimum MTU they are likely to encounter (see later).
•
Absolute minimum MTU for IPv4 is 68 bytes; for IPv6 1280 bytes
•
All IPv4 hosts must be willing to receive packets of at least 576 bytes (for IPv6 1280
bytes).
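The MTU/MSS arithmetic above can be sketched as follows. This is a minimal illustration assuming minimum 20-byte IPv4 and TCP headers; the function name and the option-length parameters are hypothetical, not part of any library.

```python
IPV4_HEADER_MIN = 20  # bytes, IHL = 5 (no options)
TCP_HEADER_MIN = 20   # bytes, header length = 5 (no options)

def mss_for_mtu(mtu: int, ip_options: int = 0, tcp_options: int = 0) -> int:
    """Largest TCP payload that fits in one IP packet of size `mtu`."""
    return mtu - (IPV4_HEADER_MIN + ip_options) - (TCP_HEADER_MIN + tcp_options)

print(mss_for_mtu(1500))  # Ethernet payload limit -> 1460
print(mss_for_mtu(576))   # size every IPv4 host must accept -> 536
```

Any options in either header reduce the space left for data, which is why 1460 holds only when no options are used.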
IPv4 Datagram Format
• IHL (Internet Header Length) is a 4-bit field giving number of 32-bit words in header (5 common). Total Length is in bytes (max 2^16 - 1 = 65,535 bytes).
• 6 bits DSCP allows for priorities but not widely used; 2 bits ECN is for congestion control. This byte was formerly known as Type of Service (also not much used).
• Fragments from same datagram have same Identifier. 3 flags: one unused, MF (More Fragments) indicates more fragments to come, DF means Don’t Fragment.
• 13-bit Fragment Offset says where current fragment comes in its parent datagram (units are 8 bytes).
•
Time-to-live used to limit packet lifetimes:
decremented each hop, kills packet when 0.
Often set to 30.
•
Protocol field tells which type of payload
datagram is carrying: e.g. protocol 6 is TCP,
17 is UDP.
•
Checksum is taken over the header only.
[Figure: IPv4 header layout. Fields in order: Version, IHL, DSCP, ECN, Total Length; Identifier, Flags, Frag Offset; TTL, Protocol, Header Checksum; Source Address; Destination Address; Options (usually absent); Data.]
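As a rough illustration of the field layout, the fixed 20-byte IPv4 header can be unpacked with Python's struct module. The function name, dictionary keys and sample addresses below are illustrative choices, and options are ignored.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte part of an IPv4 header (options ignored)."""
    (ver_ihl, dscp_ecn, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl_words": ver_ihl & 0x0F,
        "dscp": dscp_ecn >> 2,
        "ecn": dscp_ecn & 0x03,
        "total_length": total_len,
        "identifier": ident,
        "df": bool(flags_frag & 0x4000),   # Don't Fragment
        "mf": bool(flags_frag & 0x2000),   # More Fragments
        "frag_offset_bytes": (flags_frag & 0x1FFF) * 8,  # units are 8 bytes
        "ttl": ttl,
        "protocol": proto,                 # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hand-built sample header: DF set, TTL 64, carrying TCP (checksum left as 0).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0x1234, 0x4000, 64, 6, 0,
                     socket.inet_aton("10.0.0.2"), socket.inet_aton("192.0.2.1"))
hdr = parse_ipv4_header(sample)
print(hdr["version"], hdr["ihl_words"], hdr["df"], hdr["protocol"])
```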
IPv6 Datagram Format (Fixed Header)
• Traffic Class 8-bit field equivalent to DSCP+ECN in IPv4
• Flow Label (20 bits) used to associate packets in flows for real-time applications (see later)
• Payload Length is in bytes and includes any extension headers. Set to zero when jumbogram option (RFC 2675) is in use.
•
Next Header (8-bits) carries type of next header.
Usually set to cargo protocol number (same as
IPv4) but may indicate extension header.
•
Hop Limit is like TTL field of IPv4
•
Source and destination addresses are 128 bit
IPv6 format.
•
Several extension header types are defined: e.g
Hop-by-Hop Options, Destination Options,
Authentication (IPSec), Encapsulating Security
Payload (also IPSec), Fragment, Routing etc.
[Figure: IPv6 fixed header layout. Fields in order: Version, Traffic Class, Flow Label; Payload Length, Next Header, Hop Limit; Source Address; Destination Address; Data.]
TCP Segment Format
•
Exchange of TPDUs called segments. Header of five 32-bit words. Each segment carries a
sequence number. TCP sees transmission as stream of (data) bytes. Every byte in stream
has a number. Sequence number of segment is number of its first data byte.
Acknowledgement number, if present, is number of next expected byte.
• There are 9 1-bit flags: NS, CWR, ECE, URG, ACK, PSH, RST, SYN, FIN used for e.g. setting up connections (via 3 way handshake).
• Header length (4 bits) gives number of 32 bit words (including options). Next 3-bit field is reserved.
• Window size is number of bytes receiver will accept (sliding window flow control).
• Checksum is over TCP segment plus pseudo-header.
• If URG = 1, urgent pointer gives number of urgent bytes that follow sequence number.
• Options have several uses: e.g. MSS negotiation (Option 2, length 4 bytes) in SYN segments.
[Figure: TCP segment layout in 32-bit rows. Source port number, Destination port number; Sequence number; Acknowledgement number; Header length, Flags, Window size; Checksum, Urgent pointer; Options (0 or more words); Data.]
Path MTUs
• Minimum MTU encountered along a route is called the path MTU or PMTU.
• Most modern systems will try to establish the PMTU for a TCP connection by initially sending IP packets (of a size in accordance with the initial MSS negotiation) with DF set and then seeing what happens. The technique is called PMTU discovery (RFC 1191) and it relies on the Internet Control Message Protocol (ICMP).
• If a router cannot forward a packet because it is too big for the next hop and DF is set, it will discard the packet and inform the sender via an ICMP packet (Type 3, Code 4). This “Packet Too Big” (PTB) message contains a 16-bit next-hop MTU (for the inaccessible network), allowing the sender to adjust the next data packet accordingly.
• Some firewalls block ICMP in which case a failed PMTUD attempt receives no PTB reply. This is called a PMTU black hole (see RFC 2923). This can lead to occasional or sustained packet loss and failure of the transport connection.
• Can use ping tool to determine PMTU (Windows syntax):
ping -f -n <number of pings> -l <size> <dest IP addr>
• Ping uses an ICMP Echo Request packet which has an 8 byte header and is contained in an IP packet (20 byte header) so the PMTU is 28 bytes bigger than the largest successful <size>.
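The discovery loop can be sketched as a small simulation. This is a toy model, not RFC 1191 itself: the path is assumed to be a plain list of per-hop MTUs, every router is assumed to report its next-hop MTU, and both function names are hypothetical.

```python
def discover_pmtu(link_mtus, initial_mtu):
    """Simulate PMTU discovery over a list of per-hop link MTUs.

    Each probe is sent with DF set. The first hop whose MTU is smaller than
    the probe drops it and reports its next-hop MTU, as an ICMP Type 3
    Code 4 message would; the sender then retries at that size.
    Returns (path MTU, number of probes sent).
    """
    size = initial_mtu
    probes = 0
    while True:
        probes += 1
        too_small = next((m for m in link_mtus if m < size), None)
        if too_small is None:
            return size, probes   # probe reached the destination
        size = too_small          # adopt the reported next-hop MTU

def pmtu_from_ping(largest_ok_size):
    # The ping size counts ICMP data only: add 8 (ICMP header) + 20 (IP header).
    return largest_ok_size + 8 + 20

print(discover_pmtu([1500, 1492, 1400, 1500], 1500))  # (1400, 3)
print(pmtu_from_ping(1472))                           # 1500
```

If the ICMP reports were blocked, the loop would never learn a smaller size, which is the black hole case described above.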
More on PMTUD
•
MSS negotiation uses MTU of local network of host. This leads to assumed PMTU based
on minimum MTU of end networks. If no MSS is specified the Internet default of 536 is
assumed. Commonly the initial PMTU is 1500. Why?
•
After MSS negotiation, PMTU discovery is conducted at the IP layer. If a reduction in the
PMTU is detected (due to an intermediate hop) it must be reported to the transport layer
which is responsible for packetization.
•
If ICMP is blocked, some routers attached to links with MTUs lower than 1500, reduce the
MSS of TCP connections passing through them. This is called TCP clamping but is not an
ideal solution. Why?
•
A robust mechanism for PMTUD in the absence of ICMP is described in RFC 4821. This
requires active participation by the transport layer and is called Packetization Layer
PMTUD (PLPMTUD).
•
A PMTU is associated with a path to a particular destination. Re-computation for each
active destination is desirable at regular but not too frequent intervals. This leads to the
concept of cached PMTU values held by the IP Layer with each entry having a finite
lifetime (10 minutes is typical).
•
PMTU is also defined for IPv6 (see RFC 1981).
ICMP
•
ICMP (RFC 792) carries control messages between hosts and routers. Generated in
various circumstances including diagnostics, routing or IP errors (see also RFC 1122:
Requirements for Internet Hosts).
•
Carried in IP datagrams with Protocol Number 1.
•
First byte of header is 8-bit type field, second is 8-bit code (additional info). This is
followed by 16-bit checksum and a number of 32-bit words (format is determined by
type + code) e.g. Type 8 (Echo Request) can carry arbitrary data (must be included in
the Echo Reply).
•
Some example message types.
– Type 3: destination unreachable (can’t forward datagram). Code gives reason (4 means datagram is too big: see PMTUD).
– Types 9/10: router advertisement/router solicitation (identify routers on local network: see RFC 1256).
– Type 5: routing redirect (tell source there is a better router to use).
– Type 4: source quench (sender slow down); deprecated by RFC 6633 (2012).
– Types 8/0: echo request/reply (used in ping and tracert tools).
– Types 13/14: timestamp request/reply.
– Type 11: time exceeded (sent to source when a packet has reached TTL limit).
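To make the header layout concrete, here is a sketch that builds an Echo Request (Type 8, Code 0) and fills in the 16-bit checksum. The identifier and sequence-number fields follow the Echo format in RFC 792 and are not described in the notes above; the function names and payload are illustrative.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    # Checksum is computed with the checksum field itself set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

packet = build_echo_request(ident=0x1234, seq=1, payload=b"anc4")
# A receiver verifies by summing the whole message: the result must be 0.
print(internet_checksum(packet) == 0)
```

Sending such a packet needs a raw socket and usually administrator rights, so the sketch stops at constructing the bytes.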
More on ICMP
• ICMP is used for reporting non-transient error conditions or for querying network with request/reply.
– No port numbers
– No client-server concept
– No need for services and ports to be listening.
• Special conditions (RFC 792)
– No ICMP error messages in response to ICMP error messages
– No ICMP error messages in response to multicast or broadcast IP packet.
– No ICMP error messages in response to packet not from a unique host
– For fragmented datagram errors, ICMP replies only to first fragment
•
ICMP is widely used but not designed for security and is vulnerable to exploitation. An
attacker can use the protocol to gather information (reconnaissance), launch denial of
service attacks or act to implement covert channels.
•
IPv6 has very similar ICMPv6 defined in RFC 4443
ICMP Security Issues
•
ICMP sweep: attacker sends ICMP requests to target network range; sees who replies (automated pings).
•
Traceroute (Windows tracert) relies on a response with ICMP Type 11 “Time exceeded” packets and can be
used to map out a network. Traceroute can probe with ICMP or UDP packets. UDP probes often sent to Port
53 (DNS port) to try to penetrate firewalls. Attacker wants to learn firewall filtering policies/open ports.
•
Firewalk is a tool that traceroutes a network firewall, then pings hosts one hop behind and waits for a Type 11
message: if it gets one, the ICMP request got past the firewall and the host(s) exist(s); if not, the attacker can
deduce something about the firewall filter rules. Learning such rules greatly helps the planning of an attack
•
Inverse Mapping: attacker sends ICMP replies. Often firewall allows these even if blocking incoming requests.
If router replies with Type 3 Code 1 packets (Host unreachable) this shows hosts exist at given addresses.
•
OS Fingerprinting: different OSs respond slightly differently in the data they include in a Destination
Unreachable response. This can give away info about which OS a host is running.
•
Router impersonation: Attacker can hijack the router solicitation protocol
•
Ping of death: attacker sends oversized packet (via fragmentation)
•
Denial of service (Smurf attack): spoof IP address of host and send ICMP Echo Request to directed
broadcast address. Replies overwhelm target.
•
ICMP tunnelling exploits arbitrary data length in ICMP echo traffic to transfer information between an infected
machine and a client (e.g. Loki, Phrack Magazine, 49/6, 1996)
−
Requires compromised target machine so Loki software has admin/root privileges.
−
Very hard to detect. Only defence (besides blocking ICMP) is Intrusion Detection System (IDS)
analysis.
ICMPv6
•
ICMPv6 (RFC 4443) has many similarities to ICMPv4 but here also supports other
protocols such as Neighbour Discovery Protocol (NDP), Secure Neighbour Discovery
(SEND), Multicast Router Discovery (MRD)
–
NDP subsumes role of ARP and Router Solicitation/Advertisement
•
Fixed header is just: 8-bit Type, 8-bit Code, 16-bit Checksum.
• Types 0-127 are used to signal error conditions
– Type 1: Destination Unreachable
– Type 2: Packet Too Big
– Type 3: Time Exceeded
• Types 128-255 are information messages.
– Some like ICMPv4, e.g. Echo Request (128), Echo Reply (129).
– Some are extensions supporting NDP, SEND, MRD, etc.
• ICMPv6 has many of the same vulnerabilities as ICMPv4. E.g.
– Covert channels are possible via Echo Request/Reply
– Router Impersonation via NDP Router Advertisement (Type 134)
• Fragmentation by router is not allowed in IPv6, so any packet that exceeds its PMTU will elicit a PTB response.
Layer Addressing
• Layer 2 address is network point of attachment (NPA) or MAC address:
– format varies but most common is IEEE 802 format 48-bits;
– flat namespace;
– each interface has independent MAC address;
– hardwired.
• Layer 3 addresses are 32-bit IP numbers:
– one per interface for any system using Layer 3 routing (one-one correspondence with MAC addresses)
– in bridged (layer 2 switched) system can have multiple MAC addresses to one IP number
– IP addresses are hierarchical (network number concatenated with host number);
– soft address.
• 16-bit Layer 4 subaddresses (port numbers) identifying TCP/UDP ports:
– Uses system of well-known ports associated with particular services;
• In Windows use ipconfig /all to determine MAC and IP addresses for each interface on an Internet host. In Unix ifconfig is similar.
Address Resolution
•
Typically host or router wants to forward IP packet to destination locally
connected (broadcast LAN or NBMA network).
– Use Layer 2
– Must resolve IP address to Layer 2 address.
•
In broadcast LAN usually use address resolution protocol (ARP): local
broadcast on LAN.
[Figure: stations A to E on a broadcast LAN, shown twice. An ARP request with IP address D is broadcast on the LAN; an ARP response with MAC address for D is returned.]
•
Stations maintain ARP cache (examine using arp -a).
•
Conventional ARP relies on LAN broadcast so in NBMA network, need
different approach: e.g. ATMARP in ATM
ARP
• ARP (RFC 826) is general address resolution mechanism. Packet format is shown:
– HTYPE specifies LAN network type (Ethernet = 1)
– PTYPE specifies internetwork protocol type (IPv4 is 0x0800)
– HLEN is hardware address length
– PLEN is protocol address length
– Operation is 1 for request, 2 for reply.
[Figure: ARP packet layout in 16-bit rows. Hardware Type (HTYPE); Protocol Type (PTYPE); HLEN, PLEN; Operation; Sender hardware address (SHA); Sender protocol address (SPA); Target hardware address (THA); Target protocol address (TPA).]
• ARP probe is request sent with SPA = 0.0.0.0
– Used to check that a newly assigned IP address is not already in use (assignment error): RFC 5227.
• ARP can make announcements (aka gratuitous ARP message), usually broadcast as ARP request containing sender’s address in TPA field (SPA = TPA).
– Used to update caches with new ARP information so as to cut down on future traffic
• Like ICMP, ARP is vulnerable to attack. ARP spoofing involves the generation of fake ARP messages to poison ARP caches, opening way to e.g. man-in-the-middle and denial of service attacks.
Exercise: Explain how these attacks might be conducted and suggest ways to defend against them.
Neighbour Discovery Protocol (IPv6)
• NDP (RFC 4861) uses five different ICMPv6 types:
– Router Solicitation
– Router Advertisement
– Redirect
– Neighbour Solicitation
– Neighbour Advertisement
• NS messages are multicast when resolving an address and unicast when checking reachability.
– Uses solicited-node multicast address formed from bottom 24 bits of target’s IPv6 address (prefixed by ff02:0:0:0:0:1:ff00::/104).
– This is more efficient than broadcast and only goes to nodes with same bottom 24 address bits.
– Solicited NA messages are targeted back at originator of NS; unsolicited NA messages go to all-nodes multicast (same approach as ARP).
•
NDP has same security issues as ARP.
•
Incorporation of Router Discovery allows a new node to auto-configure its
address
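The solicited-node multicast address used by Neighbour Solicitation can be derived with Python's ipaddress module. The function name and the example target (from the 2001:db8::/32 documentation range) are illustrative.

```python
import ipaddress

def solicited_node_multicast(target: str) -> ipaddress.IPv6Address:
    """Combine the ff02::1:ff00:0/104 prefix with the target's low 24 bits."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    prefix = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(prefix | low24)

print(solicited_node_multicast("2001:db8::abcd:1234:5678"))  # ff02::1:ff34:5678
```

Only nodes whose own addresses end in the same 24 bits join this group, so most hosts on the link never see the solicitation.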
Sending data via TCP (the story top-down)
•
Client app submits data to TCP layer giving destination IP address and port
number (or equivalently, socket ID).
•
TCP layer prepares TCP segment consistent with MSS (established at
outset) and attaches 3 word IP pseudo-header
– source and destination IP addresses, protocol number, overall segment length
– TCP (and UDP) header checksum computed over segment + pseudo-header
– pseudo-header is not transmitted but is a means of passing info between TCP
and IP layers. If packet gets misdirected (changed destination IP address),
checksum will fail.
•
TCP layer submits segment + PH to IP layer
•
IP layer prepares IP packet containing segment
•
IP layer invokes routing function and consults forwarding table to determine
next hop (to destination host or next router).
•
IP layer determines forwarding MAC address (via e.g. ARP).
•
IP layer submits IP packet to network access layer.
IP Addressing
•
Recall that IP addresses are divided into 5 classes A to E.
•
A-C addresses are divided into 2 fields: network number and host number.
– A from 0.0.0.0 - 127.0.0.0
– B from 128.0.0.0 - 191.255.0.0
– C from 192.0.0.0 - 223.255.255.0
•
Class D addresses (224.0.0.0-239.255.255.255) are for multicasting.
•
Class E addresses are reserved.
•
Three ranges are reserved for private networks:
– 10.0.0.0 (1 Class A network)
– 172.16.0.0 – 172.31.0.0 (16 Class B networks)
– 192.168.0.0 – 192.168.255.0 (256 Class C networks)
•
Network with private addresses must not be directly connected to Internet (must use
address translation at gateway).
•
Network number space is under control of IANA but allocations are delegated to
Regional Internet Registries.
Address Masks
•
Rigid use of class A, B, C networks (classful addressing) has proved inefficient.
•
Motivated introduction of classless addressing where network number can be any
number of bits long.
•
Can indicate split using address mask (also quoted in dotted decimal notation).
– bitwise AND of mask with address gives network number
[Figure: a 32-bit IP number (bits 0 to 31) split into network number and host number. The mask shown has 24 one-bits followed by 8 zero-bits; one-bits mark the network number, zero-bits the host number.]
•
Classless Inter-Domain Routing (CIDR) notation uses /n suffix after address to
indicate number of bits in network number.
– Thus 130.209.240.50/20 means a network number of 130.209.240.0, host
number of 50
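The mask arithmetic for this example can be checked with Python's ipaddress module; the variable names are illustrative.

```python
import ipaddress

iface = ipaddress.ip_interface("130.209.240.50/20")
mask = int(iface.netmask)
addr = int(iface.ip)

# Bitwise AND with the mask gives the network number;
# AND with the inverted mask gives the host number.
network_number = ipaddress.IPv4Address(addr & mask)
host_number = addr & ~mask & 0xFFFFFFFF

print(iface.netmask)    # 255.255.240.0
print(network_number)   # 130.209.240.0
print(host_number)      # 50
```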
Special Addresses
•
Network address: address with host number of 0.
•
Network number zero means “this network”.
•
0.0.0.0 means “this host” (used when host doesn’t know its IP
number).
•
255.255.255.255 (limited broadcast address): all hosts on current
physical network.
•
Directed broadcast address (host number all 1s) means all hosts on
all subnets of current network.
•
Loopback address (any address beginning with 127) goes straight
from host’s output to its input.
Forwarding
•
Internet routing nodes use destination IP address to direct IP packets along
next hop.
•
Use network number of destination IP address to consult routing table
(a.k.a. forwarding table).
•
Each table entry refers to some network A and has the structure:
network address | network mask | gateway | interface | metric
• Where
–
network address & network mask are the address and mask of A respectively;
–
gateway is IP address where any packet trying to reach A should be sent next;
–
interface is the IP number of the routing node’s interface that should be used;
–
metric is the number of hops still to go to A.
•
If A is directly attached to the node, gateway is the same as interface.
•
If A is a remote network (no interface to node) gateway is IP address of a
router on the local network.
Matching Routing Table Entries
• For each entry, destination address in IP packet is bitwise ANDed with network mask and result compared with network address. If there is a match, entry is used.
• If there are multiple matches, the longest network mask is preferred
– E.g. packet with address 130.209.240.50 and table with entries
130.209.240.0   255.255.240.0   …
130.209.0.0     255.255.0.0     …
– Will be routed by the first entry
[Figure: Destination Address and Entry Mask are combined by bitwise AND; the result is compared with the Entry Network Address.]
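The matching rule can be sketched as a longest-prefix lookup. The first two entries are the ones from the example; the third is a default route (0.0.0.0 with mask 0.0.0.0) added as an assumption, and the gateway names are made-up labels.

```python
import ipaddress

# (network address, network mask, gateway); gateways are placeholder labels.
TABLE = [
    ("130.209.240.0", "255.255.240.0", "gw-A"),
    ("130.209.0.0",   "255.255.0.0",   "gw-B"),
    ("0.0.0.0",       "0.0.0.0",       "default"),
]

def lookup(dest: str):
    """Return the gateway of the matching entry with the longest mask."""
    d = int(ipaddress.IPv4Address(dest))
    best = None
    for net, mask, gw in TABLE:
        m = int(ipaddress.IPv4Address(mask))
        if (d & m) == int(ipaddress.IPv4Address(net)):
            # For contiguous masks, a longer mask is numerically larger.
            if best is None or m > best[0]:
                best = (m, gw)
    return best[1] if best else None

print(lookup("130.209.240.50"))  # gw-A (both 130.209 entries match; /20 wins)
print(lookup("130.209.1.1"))     # gw-B
print(lookup("8.8.8.8"))         # default
```

A linear scan is enough to show the rule; real routers use tries or hardware tables for speed.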
Host routing tables
•
Each network host (assume one network interface) has an IP number, N, a
mask called a subnet mask, S, which identifies the network it belongs to, and
an address for a local default gateway, G.
– Host uses these items to construct routing table.
•
Every routing table will contain a default route with network number 0.0.0.0
and mask 0.0.0.0.
– Every address matches this, so it will be used if no other match is found.
•
Every routing table includes a loopback route which matches any address
of the form 127.x.y.z and routes to the special loopback address 127.0.0.1.
•
Other routes always present include: directly attached network; local host,
network broadcast; limited broadcast and multicast.
•
To examine the forwarding table on a Windows host use:
route print
Routers
•
Routers have two or more NICs.
– Multihomed conventional computer can act as a router (e.g. Windows Server
includes software to act as an IP router);
– Modern high performance routers use ASICs to minimise delay
•
IP layer on router uses routing table to decide how to forward any packet
arriving on any interface.
•
Routing tables are indexed on network number.
– For some routers (e.g. in backbone networks), routing tables can be very large.
– Routing table sizes can be reduced by grouping geographically close networks
with adjacent network numbers into one entry.
•
Routers generally use adaptive routing with a routing protocol (e.g. RIP,
OSPF, BGP etc) operating between routers to allow new information to
update routes.
•
Routers use a routing algorithm to compute new optimal routes on the basis
of information conveyed by routing protocol.
Subnetting
•
Subnetting allows a network to be split into smaller sections, or subnets,
by splitting original host number field into subnet field and new host
number field.
•
This is done by using an extended subnet mask.
•
Example: split network 130.209.240.0/24 into 2 equal subnets
130.209.240.0/25 and 130.209.240.128/25.
Network Address for 130.209.240.0/24
1000 0010 1101 0001 1111 0000 0000 0000
Network Address for 130.209.240.0/25
1000 0010 1101 0001 1111 0000 0000 0000
Network Address for 130.209.240.128/25
1000 0010 1101 0001 1111 0000 1000 0000
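The same split can be reproduced with Python's ipaddress module, which extends the mask by one bit to produce the two subnets.

```python
import ipaddress

parent = ipaddress.ip_network("130.209.240.0/24")

# Extending the mask by one bit yields two equal subnets.
for subnet in parent.subnets(prefixlen_diff=1):
    print(subnet, subnet.netmask, subnet.num_addresses)
```

Each /25 has the extended subnet mask 255.255.255.128 and 128 addresses, including the network and broadcast addresses.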
Subnet Routing
•
Hosts in different subnets have different subnet numbers. Routing table will
send datagrams targeted to other subnets to a router even if the targets are
on same physical network
•
External routers are not aware of subnets in a network. Only when a
datagram enters a network does it get routed according to subnet.
Supernetting
• Allocate multiple Class C networks with adjacent addresses to one organisation in same geographical vicinity.
– E.g. use 22-bit mask to group 4 Class C networks together
• Routers far from such supernetted networks need only one entry
– xxxxxxxx.xxxxxxxx.xxxxxx00.00000000   255.255.252.0   …
• Routers nearby have four distinct entries
– xxxxxxxx.xxxxxxxx.xxxxxx00.00000000   255.255.255.0   …
– xxxxxxxx.xxxxxxxx.xxxxxx01.00000000   255.255.255.0   …
– xxxxxxxx.xxxxxxxx.xxxxxx10.00000000   255.255.255.0   …
– xxxxxxxx.xxxxxxxx.xxxxxx11.00000000   255.255.255.0   …
•
This reduces size of routing tables in backbone routers.
•
Group of supernetted networks is a CIDR (Classless Inter-domain Routing) block.
•
Note that since masks are used to describe extent of block, the number of component
networks must be a power of two.
•
In CIDR environment, routing protocols must be able to exchange mask information.
RIPv1 is not CIDR compliant; RIPv2 and OSPF are.
•
A network address/mask combination may refer to one or multiple networks. For the
purposes of routing such a combination is called a routing prefix.
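Aggregation into a CIDR block can be illustrated with ipaddress.collapse_addresses. The four 192.168.x.0/24 networks are arbitrary examples chosen to be adjacent and aligned on a 4-network boundary.

```python
import ipaddress

four = [ipaddress.ip_network(f"192.168.{i}.0/24") for i in range(4, 8)]

# Four aligned, adjacent /24s collapse into a single /22 routing prefix.
block = list(ipaddress.collapse_addresses(four))
print(block, block[0].netmask)

# Three of them cannot be described by one mask: two prefixes remain.
three = list(ipaddress.collapse_addresses(four[:3]))
print(three)
```

The second case shows why the number of component networks in one block must be a power of two.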
Routing Algorithms
•
View abstractly first.
•
Aim is to establish best route between every source-destination pair
in current circumstances.
•
Shortest path routing:
– Assign cost (e.g. hops, latency, financial) to every link
– Find paths that minimise cumulative cost
•
Two broad approaches:
– Source routing (path is computed at source and sent with packet)
– Per-hop routing (each node makes routing decision)
•
Per hop requires routing tables at each node
•
Best approach is to allow each node to compute its own table.
•
Internet uses per-hop routing.
Routing: general
•
Routing can be static (one table computation) or adaptive.
– If adaptive routing is used, how often should updates be undertaken?
•
Per-hop routing tables can be computed centrally and issued, but
this is not popular. Why?
•
Internet uses decentralised adaptive routing. Routers exchange
routing information to allow adaptation of tables: RIP, OSPF, BGP.
•
General adaptive routing problems include:
– Different picture of network at different nodes (may lead to looping)
– Path oscillation
•
Two common approaches to computing shortest paths:
– Link state routing (Dijkstra algorithm);
– Distance vector routing (Bellman Ford algorithm).
Link state routing
•
Each node checks which nodes are attached to it by sending
HELLO packets.
– Functioning connection with a neighbour is an adjacency.
– Subsequent periodic use of HELLOs allows routers to detect failures of
links or neighbours.
– Once a node knows its neighbours it can establish link costs using any
chosen metric (e.g. hops, bandwidth, time delay).
– Newly discovered link costs are flooded as messages called link state
advertisements (LSAs) to all other network routers.
•
Every node maintains database of link costs
– Used to compute a shortest path tree for each node.
– Can be run in centralised or distributed form.
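A minimal sketch of the shortest path tree computation, using Dijkstra's algorithm over a link database. The database is assumed to be a dictionary of per-node neighbour costs; the four-node topology and the function name are invented for illustration.

```python
import heapq

def shortest_path_tree(lsdb, root):
    """Dijkstra over a link database {node: {neighbour: cost}}.

    Returns {destination: (total cost, first hop from root)}.
    """
    result = {}
    heap = [(0, root, None)]
    done = set()
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in done:
            continue                      # stale entry, better path already found
        done.add(node)
        result[node] = (cost, first_hop)
        for nbr, link_cost in lsdb[node].items():
            if nbr not in done:
                hop = nbr if node == root else first_hop
                heapq.heappush(heap, (cost + link_cost, nbr, hop))
    return result

lsdb = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
print(shortest_path_tree(lsdb, "A"))
```

The first-hop value is what a router would install as the next hop in its forwarding table for each destination.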
Distance-vector routing
•
In this approach, each router exchanges with its neighbours the distance
column (distance vector), suitably indexed, from its routing table.
•
A router compares its own current routes against the new ones it has been
offered and adjusts its table accordingly if it is offered a better one.
•
Dangers:
– Looping
– Bad news travels slowly (count-to-infinity problem)
•
Can be executed centrally but is primarily designed for distributed use.
•
No need for link database at every node so more economical in terms of
storage.
•
In practice, not as stable as link state routing, but easier to implement.
•
Enhancements to the algorithm such as split horizon (do not advertise
routes back in the direction from which they were learned) can help speed
convergence in some situations.
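The exchange of distance vectors can be sketched as synchronous rounds. This is a simplified model: all routers exchange at once, there is no split horizon or timeout handling, and 16 is used as infinity purely as a convention. The three-router chain and the function name are invented for illustration.

```python
INFINITY = 16  # 'unreachable' value, chosen here for illustration

def dv_round(tables, links):
    """One exchange: each router is offered its neighbours' distance vectors
    and adopts any strictly better route.

    tables: {router: {dest: (distance, next_hop)}}
    links:  {router: {neighbour: cost}}
    Returns True if any table changed.
    """
    snapshot = {r: dict(t) for r, t in tables.items()}  # vectors as sent
    changed = False
    for router, nbrs in links.items():
        for nbr, cost in nbrs.items():
            for dest, (d, _) in snapshot[nbr].items():
                offered = min(d + cost, INFINITY)
                current = tables[router].get(dest, (INFINITY, None))[0]
                if offered < current:
                    tables[router][dest] = (offered, nbr)
                    changed = True
    return changed

# Chain A - B - C with unit link costs; each router starts knowing only itself.
links = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1}}
tables = {r: {r: (0, None)} for r in links}
rounds = 0
while dv_round(tables, links):
    rounds += 1
print(rounds, tables["A"])
```

Good news spreads one hop per round here; the count-to-infinity problem arises on the reverse path, when a link fails and stale routes are still being offered.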
RIP
•
RIP (v2 is RFC 2453) uses distance-vector routing.
•
Router advertises distances in hops every 30 secs
– RIP message is list of routes (address, mask, next hop, distance)
– Messages are carried in UDP PDUs
•
RIP has maximum hop count of 15
– In RIP 16 is infinity (unreachable)
– Only usable on small to medium networks
•
RIP v2 has several enhancements:
– supports classless addressing;
– supports authentication (why?)
– uses multicasts instead of broadcasts on multiaccess networks
– supports route tags.
OSPF
•
OSPF (RFC 2328) uses link state routing with any metric.
•
Every router builds a Link State Database (LSDB).
– Composed of Link State Advertisements (LSAs)
•
Routers find neighbours with HELLO packets
– Build an adjacency
– Routers synchronise LSDBs with neighbours when adjacency formed.
•
Routers must keep LSDBs up to date
– If local change is detected, router sends a Link State Update Packet to send new
LSAs to rest of network (flooding via multicast)
– Routers continue to check status of neighbours in case of fault
•
To control size of task, OSPF can divide an AS into several routing areas.
– Connected by area border routers (ABRs)
– Area identified by 32-bit Area ID .
– Always have one backbone area (Area 0).
Network Address Translation
• NAT router translates between private address and one or more public ones.
• In a common situation, there are fewer public IP addresses than private IP/port pairs wanting Internet
access at the same time. Use process called NAPT in RFC 2663 (summarised below).
• NAT router table maps private address/port pair (IPPriv, s) onto an external address/port pair (IPPub, s1),
called a server reflexive address (SRA). It is this that is the source address in any public packets that are
sent to the target of the communication.
• Router records (IPPriv, s) and (IPPub, s1) together with protocol in use (TCP, UDP) in an internal table.
These 5 items form a so-called 5-tuple. This table entry remains valid until inactivity timeout deletes it.
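[Figure: private host A (10.0.0.2, source port s) sends packets through the NAPT router (private address 10.0.0.1, public address a.b.c.d) to remote host x.y.z.w, destination port d. Packets P1 to P4 show the translation: outbound source (10.0.0.2, s) becomes (a.b.c.d, s1), and replies addressed to (a.b.c.d, s1) are mapped back to (10.0.0.2, s). Table columns: Source Port, New Address, Public IP, Protocol (TCP).]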
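The mapping table can be sketched as a pair of dictionaries. This is a simplified model with assumptions stated plainly: it reuses the same external pair for every target and applies no inbound filtering (full-cone behaviour), ports are handed out sequentially from an arbitrary starting value, and inactivity timeouts are omitted. The class name and the public address 203.0.113.5 (a documentation address) are illustrative.

```python
import itertools

class Napt:
    """Toy NAPT table: (protocol, private IP, private port) <-> public port."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self.out = {}    # (proto, priv_ip, priv_port) -> (pub_ip, pub_port)
        self.back = {}   # (proto, pub_port) -> (priv_ip, priv_port)

    def outbound(self, proto, priv_ip, priv_port):
        """Return the external pair used as source in public packets."""
        key = (proto, priv_ip, priv_port)
        if key not in self.out:
            pub_port = next(self._ports)
            self.out[key] = (self.public_ip, pub_port)
            self.back[(proto, pub_port)] = (priv_ip, priv_port)
        return self.out[key]

    def inbound(self, proto, pub_port):
        """Return the private pair for a reply, or None if no entry exists."""
        return self.back.get((proto, pub_port))

nat = Napt("203.0.113.5")
print(nat.outbound("TCP", "10.0.0.2", 5000))   # new entry created
print(nat.outbound("TCP", "10.0.0.2", 5000))   # same entry reused
print(nat.inbound("TCP", 40000))               # reply mapped back
print(nat.inbound("TCP", 40001))               # no entry: None
```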
NAT Traversal.
•
NAT connections are initiated by private hosts
•
But once a 5-tuple exists, questions arise:
1. If same (IPPriv, s) initiates a connection to a new target, can same SRA be used?
2. Can remote device initiate a connection through the existing 5-tuple?
• In most forms answer to Q1 is yes. Exception is symmetric NAT.
• Answer to Q2 varies:
– Full cone NAT: any remote device can initiate.
– Address restricted cone NAT: any port on same remote IP can initiate
– Port restricted cone NAT: only same remote IP and port can initiate
•
Some protocols fail over NAT because private host is required to communicate its contact address
in segment payload. Unfortunately this address is private, so peer device cannot make contact.
Examples include VoIP, gaming and peer-peer protocols.
•
To let such protocols work a private host needs to be able to discover its SRA.
•
One means of doing this is via Session Traversal Utilities for NAT (STUN) which uses a STUN
server on the public Internet to reveal a private client’s SRA (RFC 5389).
–
Unfortunately STUN does not work with symmetric NAT. Why?
•
For symmetric NAT solution is TURN (Traversal Using Relays around NAT): see RFC 5766. Uses
TURN relay server to avoid symmetric NAT problem BUT expensive in resource and delay.
•
Interactive Connectivity Establishment (ICE) seeks best approach in given scenario (RFC 5245)
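The three cone behaviours for Q2 can be expressed as a single filtering check. The sketch below assumes a mapping already exists and that the router remembers which remote (IP, port) pairs the private host has contacted through it; the function name and the string labels are hypothetical.

```python
def inbound_allowed(nat_type, contacted, remote):
    """Decide whether a packet from `remote` may use an existing mapping.

    contacted: set of (ip, port) pairs the private host has already sent to
               through this mapping.
    remote:    (ip, port) of the device now sending to the SRA.
    """
    remote_ip, _ = remote
    if nat_type == "full_cone":
        return True  # any remote device can initiate
    if nat_type == "address_restricted":
        # any port on a previously contacted remote IP
        return any(ip == remote_ip for ip, _ in contacted)
    if nat_type == "port_restricted":
        # only the exact remote IP and port contacted before
        return remote in contacted
    raise ValueError(f"unknown NAT type: {nat_type}")


contacted = {("198.51.100.7", 3478)}
```

With this model a STUN-discovered SRA is directly usable by arbitrary peers only behind a full cone NAT; the restricted forms additionally require the private host to send to the peer first.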
2. Layer 2
Ethernet Operation
•
Ethernet uses frames with format:
Field                    Bytes
Preamble                 7
Start-frame delimiter    1
Dest address             2 or 6
Source address           2 or 6
Type or Length           2
Data                     0-1500
Pad                      0-46
Checksum                 4
•
Checksum is CRC-32.
•
Ethernet addresses are 48-bits long and globally unique.
•
Data field size can be 0-1500 bytes.
•
Minimum length 64 bytes. If data field too small, pad field must compensate. 64 bytes
corresponds to time to cross max network size twice (51.2 µs at 10 Mbps and 5.12 µs at
100 Mbps). If a packet collides, its transmitting station must know (and abort) before it
finishes sending it. Malformed short packet formed by collision is a runt.
•
Type field identifies payload (e.g. 0x0800 is IP).
•
In IEEE 802.3 “Type” replaced by “Length” (for compatibility Type > 0x05DC). 802.3
uses IEEE simplified variant of HDLC, Logical Link Control (LLC) for LANs. 0x8870 is
reserved to indicate a jumbo frame (up to 9000 bytes) option used in Gigabit Ethernet.
•
LLC and IP can be thought of as operating at layers higher than MAC. IEEE views
MAC and LLC as sublayers of OSI Data-Link Layer.
•
IP packets carried over Ethernet may have to be fragmented (max IP size, 64Kbytes).
•
There is a required inter-frame gap of 12 bytes.
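The padding and timing figures on this slide follow from simple arithmetic, sketched below. The helper names are hypothetical, and 6-byte addresses are assumed, so the header plus checksum occupy 18 of the 64 bytes.

```python
MIN_FRAME = 64                   # bytes, destination address through checksum
HEADER_AND_CRC = 6 + 6 + 2 + 4   # two 6-byte addresses, type/length, CRC-32


def pad_bytes(data_len):
    """Pad needed so that a frame with `data_len` data bytes reaches 64 bytes."""
    if not 0 <= data_len <= 1500:
        raise ValueError("data field must be 0-1500 bytes")
    return max(0, MIN_FRAME - HEADER_AND_CRC - data_len)


def min_frame_time_us(bit_rate_bps):
    """Time in microseconds to transmit a minimum-length frame."""
    return MIN_FRAME * 8 / bit_rate_bps * 1e6
```

An empty data field needs the full 46 bytes of pad, and 512 bits at 10 Mbps take 51.2 µs, matching the values quoted above.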
Bridges
•
Bridges are intended to link LANs together
[Figure: two hosts, each with an LLC or IP layer above MAC and Physical layers, attached to LAN 1 and LAN 2 respectively. The bridge between the LANs has a Physical and MAC layer on each side, joined by a relay function.]
• Transforms one MAC frame format to another
– No packet (layer 3) level functionality
– Relies on connected LANs having similar formats
•
IEEE 802 defined transparent bridges (IEEE 802.1D)
– Transparently translates frames from one 802 format to another
– Routes based on intelligent interpretation of destination MAC address
Routing without routers
•
Bridges may be used to construct internetworks.
– Route without Layer 3 involvement
•
Layer 2 addresses have no hierarchical structure
– Point of attachment of hosts may be altered at any time
• Transparent bridges try to learn where specific addresses are currently located
– Observe source addresses (backward learning)
– Build filter table (output port indexed on destination MAC address)
– Flood frames whose destination is unknown to all ports
– Can lead to looping behaviour
• 802.1D tries to build spanning tree which all bridges share
– Use spanning tree protocol (STP) with BPDUs directed to a reserved multicast address
– Establish one root bridge
– Each LAN gets one designated bridge forwarding frames from root direction.
– Original STP can be quite slow to converge (10s of seconds)
– Rapid STP (RSTP) specified in 802.1w is order of magnitude faster (see 802.1D-2004)
– Shortest Path Bridging (SPB) in 802.1aq allows all links to be active (802.1aq-2012)
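Backward learning and flooding can be sketched as follows. This is a toy model under stated assumptions: the class name is hypothetical, table entries never age out, and no spanning tree is run, so it is only safe on a loop-free topology.

```python
class LearningBridge:
    """Toy transparent bridge: learns from source addresses, floods unknowns."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.filter_table = {}  # destination MAC -> output port

    def receive(self, in_port, src, dst):
        """Return the set of ports on which the frame is forwarded."""
        self.filter_table[src] = in_port      # backward learning
        out_port = self.filter_table.get(dst)
        if out_port is None:
            return self.ports - {in_port}     # unknown destination: flood
        if out_port == in_port:
            return set()                      # destination is on the arrival LAN
        return {out_port}


bridge = LearningBridge([1, 2, 3])
```

The first frame from A to B is flooded; once B has replied, later frames to B leave on a single port.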
Remote bridging
•
To bridge LANs not directly connected can tunnel through an
intermediate network: remote bridging.
•
MAC frames encapsulated in backbone network PDUs.
•
Internet may be used as vehicle to connect multiple private LANs
and individuals: forms a virtual private network (VPN).
[Figure: two remote bridges joined by a backbone network. A MAC frame arriving at one remote bridge is encapsulated in a backbone network packet for remote transmission and delivered as a MAC frame by the other remote bridge.]
Layer 2 Switches
•
Layer 2 switches are similar in function to bridges but:
– Faster (based on more recent technology);
– More restricted (usually connect only LANs of same type)
•
Ethernet switches developed from 10BaseT technology
– Very widely used for 100Mbps, almost exclusively for gigabit Ethernet
– Can be used to connect segments together
•
Allow full duplex Ethernet (802.3x-1997)
– No collisions, frames discarded if switch buffers full
– Effectively no concept of collision domain
– Increases maximum size of Ethernet
– PAUSE frame defined to implement flow control
VLANs
•
Usually each LAN is a broadcast domain.
•
Using switches, an (Ethernet) LAN may be divided into several VLANs
(virtual LANs), each its own broadcast domain.
•
VLANs may be port based (isolated), overlapping, MAC address based,
protocol based; or IP subnet based.
– In every case, intelligence required is integrated into the switch
[Figure: a Layer 2 switch whose ports are divided between VLAN 1 and VLAN 2. Each switch port belongs to just one VLAN.]
• Challenge is VLANs distributed over several switches.
• Problem is equivalent to ensuring multicast frames are directed to hosts in a multicast group
802.1p
•
802.1D handles multicasts by forwarding on all ports
•
802.1p aims to enhance support for groups across interconnected switches
•
802.1p allows stations to join groups and modify filter tables so that frames
multicast to a group are forwarded intelligently.
•
802.1p defined a generic protocol called GARP (generic attribute
registration protocol)
– Allowed stations to request receipt of traffic with specific attribute (e.g. multicast
group membership, via derived GMRP or VLAN membership via GVRP).
– Filter tables in GARP compliant switches then updated accordingly
– This feature of 802.1p has since been superseded by a successor protocol, MRP
(Multiple Registration Protocol) defined in 802.1ak addendum to 802.1Q-2005.
•
802.1p also defines frame priority scheme for differentiating types of traffic
– Priority scheme supported by frame format of 802.1Q
– Switches forward higher priority frames first
– Requires multiple queues at outputs.
802.1Q
•
802.1Q is aimed at supporting VLANs
•
Each VLAN is allocated a 12-bit VLAN ID.
•
802.1Q adds a 4-byte tag to Ethernet frame containing VID and a 3-bit priority field.
• Most stations are not 802.1Q aware. Untagged frames are tagged at port where they
enter a switch.
– Tag includes port VID (PVID) and priority.
– Frame then transferred to every port willing to accept that VID
• Links joining switches (trunks) carry tagged frames
Field                    Bytes
Preamble                 7
Start-frame delimiter    1
Dest address             2 or 6
Source address           2 or 6
802.1Q Tag               4
Type or Length           2
Data                     0-1500
Pad                      0-46
Checksum                 4

802.1Q Tag contents: 0x8100 (16 bits), User Priority (3 bits), Drop Eligible Indicator (DEI, 1 bit), VID (12 bits).
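As an illustration of the tag layout, the sketch below packs and unpacks the 4-byte 802.1Q tag: the 16-bit value 0x8100, followed by the 3-bit priority, the 1-bit DEI and the 12-bit VID. The function names are hypothetical.

```python
import struct

TPID = 0x8100  # first two bytes of the tag, identifying an 802.1Q frame


def pack_tag(priority, dei, vid):
    """Build the 4-byte tag from priority (0-7), DEI (0/1) and VID (0-4095)."""
    if not (0 <= priority < 8 and dei in (0, 1) and 0 <= vid < 4096):
        raise ValueError("field out of range")
    tci = (priority << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID, tci)  # network byte order


def unpack_tag(tag):
    """Return (priority, dei, vid) from a 4-byte tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13, (tci >> 12) & 1, tci & 0x0FFF
```

For example, priority 5 on VLAN 100 yields the bytes 81 00 A0 64.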
Routers vs Switches
• Switches were initially much faster than routers
– No need to process IP headers
– Layer 2 switch consults forwarding table in hardware (c.f. ATM switching)
• One solution is label switching
– Identify flows (collections of packets with same endpoints)
– Attach a label to each packet
– Construct forwarding table of labels against next hop (Ethernet address).
– Switch packets on flow without looking at their headers.
• Many proprietary ways of doing this but IETF standard is multiprotocol label switching
(MPLS)
– Operates within an MPLS domain.
– Effectively creates virtual circuits to carry IP packets through domain.
– Allows use of quality of service differentiators.
– Can be used with protocols other than IP.
• Another option is the routing switch which is a router with routing functions
implemented via high-speed ASICs: sometimes called wire-speed router.
Multicast Support at Layer 2
•
Layer 2 supports multicast.
• 802.1D switches can flood multicast frames
• Better solution is MMRP (Multiple MAC Registration Protocol): 802.1ak-2007 MRP applied to
multicast.
[Figure: hosts A to G attached to three interconnected L2 switches.]
•
MMRP can carry any protocol but usually traffic is IP multicast
•
Alternative in this case to MMRP is IGMP snooping
•
Layer 2 switches watch IGMP messages and decide where
stations that have joined or left IP multicast groups lie.
•
Filter tables are adapted accordingly.
•
IGMP snooping is covered by RFC 4541 (informational).
3. Congestion Control and QoS
Congestion in IP networks
•
Congestion occurs when router receives more packets than it can transmit:
– Buffers fill up.
– Packets are discarded.
•
Congestion control may attempt avoidance or response or both.
– Can be responsibility of hosts (slow down send rate) or routers (selective
discard).
•
TCP has end-end congestion response.
– Reduces rate in response to packet drop.
– What about UDP-type communications?
– Not suitable for isochronous traffic
•
Concept of flow as sequence of packets passing along same route is useful
for router-based control.
– Routers can act on flows to control congestion and prioritise traffic
– Can do this even if flow is not TCP-based.
•
Routers handle congestion via a queuing algorithm
Quality of Service
•
Basic Internet operates a best effort service
– All packets treated the same.
•
Some types of traffic need guarantees
– Implies a need for a service model with different levels of quality of service
(QoS).
•
Main issue is allocation of network resources (bandwidth, router buffer
space etc) so as to satisfy competing requests fairly
•
Closely connected with congestion control (how network reacts when
excessive demands are made on resources)
•
Resource allocation can be responsibility of hosts or routers or both.
•
Best effort service network typically uses hosts with feedback to reduce
demand (e.g. TCP uses implicit feedback) .
•
Support for multiple QoS levels is usually via resource reservation (e.g. to a
flow) and relies on router behaviour.
TCP Tahoe Congestion Control
• TCP Tahoe refers to TCP with original Jacobson congestion
control algorithm.
• TCP sender maintains congestion window (cwnd).
– Incoming ACKs clock outgoing packets.
– When connection starts, congestion window set to 1 segment.
– ~doubles every round trip time (RTT) until it reaches preset
threshold (ssthresh): this is called slow start.
– Then increases by 1 every RTT (additive increase) unless timeout
occurs: this is the congestion avoidance phase.
• On timeout, sender assumes congestion
– Set ssthresh = cwnd/2 (multiplicative decrease)
– Re-initiates slow start from cwnd of 1
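The window evolution described on this slide can be traced with a per-RTT model. This is a coarse sketch, not an implementation of TCP: cwnd is counted in whole segments, one update is applied per RTT, and the floor of 2 segments on the threshold is an assumption borrowed from later TCP specifications rather than from the slide.

```python
def tahoe_step(cwnd, ssthresh, timeout):
    """Advance one RTT; returns the new (cwnd, ssthresh) in segments."""
    if timeout:
        # Multiplicative decrease of the threshold, then restart slow start.
        return 1, max(cwnd // 2, 2)
    if cwnd < ssthresh:
        # Slow start: double each RTT, but do not overshoot the threshold.
        return min(cwnd * 2, ssthresh), ssthresh
    # Congestion avoidance: additive increase.
    return cwnd + 1, ssthresh


def trace(rtts, ssthresh, timeouts=()):
    """cwnd at the start of each RTT; `timeouts` lists RTTs ending in a timeout."""
    cwnd, history = 1, []
    for rtt in range(rtts):
        history.append(cwnd)
        cwnd, ssthresh = tahoe_step(cwnd, ssthresh, rtt in timeouts)
    return history
```

With an initial threshold of 8 and a timeout in RTT 5, `trace(10, 8, timeouts={5})` gives the familiar sawtooth: exponential growth to 8, linear growth to 10, then a restart from 1 with the threshold halved to 5.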
TCP Reno
•
TCP Reno (RFC 2001 then RFC 2581)
•
when 3 duplicate ACKs are received:
– Congestion less likely so no need to go back to slow start
– fast retransmit (retransmit segment without waiting for timeout)
– fast recovery
• set ssthresh = cwnd/2 and set cwnd = ssthresh + 3
• wait until new data (retransmitted packet) is ACKed
• then go straight to additive increase from ssthresh
[Figure: DTR against time on a Reno connection (assume send window = cwnd), marking 3 duplicate ACKs, fast recovery and a timeout.]
Limitations of TCP Reno
• Time sequence diagram (right) shows fast retransmit in Reno
• Note that after fast retransmit, sender may still get ACKs requesting missing
frame up to one RTT from retransmit.
• Reno can get into trouble if more than one packet goes missing in a single
window of data. Why?
• Improved version called TCP NewReno (RFC 3782) attempts to fix this.
• Also note more complex TCP SACK (RFC 2018) which uses selective
acknowledgements.
[Figure: time sequence diagram of fast retransmit in Reno. Segments SN=1 to SN=8 are sent; the receiver repeats ACK(2) while SN=2 is missing; SN=2 is retransmitted and finally ACK(8) is returned.]
TCP Vegas
•
TCP Vegas (Brakmo, O’Malley & Peterson, 1994)
– proactive approach to congestion detection
– try to predict congestion by measuring current
throughput on the connection against expected
throughput in an uncongested network (delay rather
than loss-based)
• this can be estimated by comparing current
throughput against the best achieved so far;
– if difference is small cwnd is increased; if large,
cwnd is decreased
– attempt to correct for congestion before it takes
hold while maintaining throughput
– TCP Vegas not widely deployed
– Other delay-based approaches include FAST TCP
(commercial) and Compound TCP (Microsoft)
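The Vegas algorithm:

$$\text{ExpectedT} = \frac{\text{cwnd}}{\text{BaseRTT}}, \qquad \text{CurrentT} = \frac{\text{cwnd}}{\text{RTT}}$$

$$\text{Diff} = \text{ExpectedT} - \text{CurrentT}$$

If $\text{Diff} < \alpha$ then $\text{cwnd} = \text{cwnd} + 1$
If $\text{Diff} > \beta$ then $\text{cwnd} = \text{cwnd} - 1$

BaseRTT is lowest RTT observed so far.
$\alpha$, $\beta$ constants.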
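The Vegas window adjustment translates almost directly into code. The sketch below follows the slide's formulation, in which the two thresholds are compared against a throughput difference; the function name and the example threshold values used in any call are assumptions.

```python
def vegas_update(cwnd, base_rtt, rtt, alpha, beta):
    """Return the new cwnd after one Vegas comparison.

    cwnd is in segments, base_rtt and rtt in seconds; alpha and beta are
    thresholds on the throughput difference (segments per second).
    """
    expected = cwnd / base_rtt   # throughput if the network were uncongested
    current = cwnd / rtt         # throughput actually being achieved
    diff = expected - current
    if diff < alpha:
        return cwnd + 1          # little queuing: probe for more bandwidth
    if diff > beta:
        return cwnd - 1          # queues building: back off before loss
    return cwnd                  # between the thresholds: leave unchanged
```

When the measured RTT equals BaseRTT the difference is zero and the window grows; as RTT rises above BaseRTT the difference grows and the window is eventually reduced without any packet having been lost.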
Queuing
• Queuing algorithm dictates how to handle packets arriving for given output. Includes:
– scheduling discipline (which packet should be transmitted next) and
– drop discipline (which packet to throw away).
• Simplest algorithm is FIFO with tail drop
– commonest in Internet routers.
– but single FIFO queue creates restrictions
• Can use multiple queue scheduling to separate flows out for fair treatment or to allow priorities.
• Could use round-robin scheduling (one packet from each queue in turn) but some applications may send big packets
thus obtaining unfair advantage.
– fair queuing counters this using a timestamp system (see right and next slide)
– can be adapted to include carefully controlled priorities (weighted fair queuing)
[Figure: fair queuing. Flows 1 to 3 each feed a separate queue. When packet, P, arrives allocate a numeric timestamp, TS(P), using a fair queuing algorithm; transmit packet with smallest timestamp.]
Fair Queuing
•
Aim is to compensate for varying packet size in different flows (queues).
• Suppose that:
– When packet, P, arrives at an output, X, it is allocated to a queue, Qi.
– X is currently transmitting a packet PT, from one of its queues (not necessarily Qi).
– The estimated transmission time for P is t(P)
• If there is a packet, PQ , in front of P in Qi , P is time-stamped with value:
TS(P) = TS(PQ) + t(P).
• If Qi is empty, P is time-stamped with value:
TS(P) = TS(PT) + t(P).
•
The fair queuing algorithm always transmits the packet with the smallest
timestamp.
•
Easy to adapt this to implement weighted fair queuing by scaling transmit
times.
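The time-stamping rules above can be sketched as a small scheduler. The class and method names are hypothetical; TS(PT) is modelled as the timestamp of the packet most recently selected for transmission, and weighting is done by dividing the transmit time by a per-packet weight.

```python
class FairQueue:
    """Toy fair-queuing scheduler for one output, following the slide's rules."""

    def __init__(self, rate_bps):
        self.rate_bps = rate_bps
        self.queues = {}        # flow id -> list of (timestamp, packet id)
        self.current_ts = 0.0   # TS(PT): timestamp of packet being transmitted

    def enqueue(self, flow, pkt_id, size_bytes, weight=1.0):
        t = size_bytes * 8 / self.rate_bps / weight   # (scaled) t(P)
        queue = self.queues.setdefault(flow, [])
        # TS(PQ) if a packet is queued ahead of P, otherwise TS(PT).
        previous = queue[-1][0] if queue else self.current_ts
        queue.append((previous + t, pkt_id))

    def dequeue(self):
        """Transmit the head-of-queue packet with the smallest timestamp."""
        heads = [(q[0][0], flow) for flow, q in self.queues.items() if q]
        if not heads:
            return None
        _, flow = min(heads)
        ts, pkt_id = self.queues[flow].pop(0)
        self.current_ts = ts
        return pkt_id
```

On an 8000 bit/s output, a flow sending 400-byte packets gets timestamps 0.4, 0.8, 1.2 while a flow sending 1000-byte packets gets 1.0, 2.0, so two small packets are sent before the first large one rather than strictly alternating as round-robin would.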
RED
•
Random Early Detection (RED)
proposed for use with TCP (Floyd et al
1993; see RFC 2309)
– Drop discipline to avoid congestion
– drop packets before buffers full.
– TCP sources slow as soon as
packets get lost.
– Every packet arriving at a given
queue is dropped with a probability
determined by the current average
queue length computed as shown.
$$\text{AvgLen} = (1 - \text{Weight}) \times \text{AvgLen} + \text{Weight} \times \text{SampleLen}, \qquad 0 < \text{Weight} < 1$$

SampleLen is length of queue when sample taken.
If AvgLen is between MinThresh and MaxThresh, each packet dropped with probability P.
Below MinThresh P = 0; above MaxThresh P = 1.
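A minimal sketch of the RED drop decision follows. The slide does not say how P varies between the two thresholds; the linear ramp up to a ceiling `max_p` used here is an assumption based on the usual description of RED, and the class name and default parameter values are hypothetical.

```python
import random


class RedQueue:
    """Toy RED drop decision based on a weighted average queue length."""

    def __init__(self, min_thresh, max_thresh, weight=0.002, max_p=0.1,
                 rng=random.random):
        self.min_thresh = min_thresh
        self.max_thresh = max_thresh
        self.weight = weight
        self.max_p = max_p      # assumed ceiling of the linear ramp
        self.rng = rng
        self.avg_len = 0.0

    def update_average(self, sample_len):
        """AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen."""
        self.avg_len = ((1 - self.weight) * self.avg_len
                        + self.weight * sample_len)
        return self.avg_len

    def drop_probability(self):
        if self.avg_len < self.min_thresh:
            return 0.0
        if self.avg_len >= self.max_thresh:
            return 1.0
        span = self.max_thresh - self.min_thresh
        return self.max_p * (self.avg_len - self.min_thresh) / span

    def should_drop(self, sample_len):
        """Update the average for an arriving packet and decide its fate."""
        self.update_average(sample_len)
        return self.rng() < self.drop_probability()
```

Because the decision uses the average rather than the instantaneous length, a short burst does not trigger drops, while sustained growth does so before the buffer is actually full.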
Explicit Congestion Notification (ECN)
•
The ECN scheme (RFC 3168) needs compliant
transport protocol (E.g. TCP)
•
Uses two extra bits in the IP Header and two in the
TCP header.
•
IP bits (in old TOS byte) are: ECT (ECN Capable
Transport) and CE (Congestion Experienced)
•
TCP bits are ECE (ECN Echo) and CWR
(Congestion Window Reduced)
•
When ECN is in effect, IP packets start out with ECT
set, CE cleared.
•
If an ECN compliant router encounters congestion
and sees a packet with ECT set, it can choose to
forward with CE set instead of discarding.
•
At receiver, transport layer sets ECE bit in response.
Sender then reacts as with packet drop but also
sends next segment with CWR set.
[Figure: sender emits IP packet with ECT=1, CE=0; congested network forwards it with ECT=1, CE=1; receiving TCP sets ECE=1; sending TCP acts as with packet drop and sets CWR=1.]
UDP Issues
•
UDP was originally used for short request-response transfers
•
More recently used for streaming multimedia, VoIP, gaming etc.
–
TCP is too constrained for these applications
–
retransmission, in-order delivery get in the way;
–
congestion control is not smooth
•
Recently, significant increase in proportion of UDP on Internet
• But UDP has no congestion control
– bad for Internet as a whole.
• Suggestion 1: Add congestion control mechanism to UDP
– TCP Friendly Rate Control (TFRC) outlined in RFC 3448 can be used with RTP (common
cargo for UDP). See also update in RFC 5348.
  • TFRC continually measures RTT and receive rates (c.f. TCP Vegas) and uses equation-based
  control to then adjust send rates
  • Feedback packets are used to convey measurements back to the sender
  • Congestion response smoother than TCP but slower (only use when smooth response is needed).
• Suggestion 2: Design new protocol, “UDP+” including congestion control
mechanisms but omitting unwanted features of TCP
– DCCP (Datagram Congestion Control Protocol)
• At other pole is UDP-lite (RFC 3828): reduced error checking compared to UDP.
SYN flooding attacks
•
A SYN flood is a denial of service attack:
– Attacker sends many TCP SYN requests to open spurious connections to server
– Server responds with SYN ACK, but attacker never returns ACK (third element of
3-way handshake), leaving server to wait for half-open (embryonic) connection to time out.
•
In TCP, half-open connection allocates resources on server (records details
of connection) while it waits for ACK.
– By constantly initiating spurious connections, attacker aims to exhaust space
available on server to handle legitimate connections.
•
Attacker may spoof packets to make them seem to have multiple sources.
– makes defence based on identifying attacking IP addresses very difficult.
•
A SYN cookie (TCP: Bernstein 1996) or Init cookie (DCCP) mounts defence:
– server dispenses with local record of half-open connection
– instead encodes state of connection in SYN ACK response and returns to client.
– If client wants to open connection, must return state information with ACK.
– If the client is spurious (no ACK is returned) no buffer space reserved on server.
– SYN cookie supports only limited state (no TCP options, 8 allowable MSS values)
– DCCP Init cookie allows more state.
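The idea of encoding connection state in the SYN ACK can be sketched as below. This is not Bernstein's actual bit layout: here the 32-bit initial sequence number is assumed to hold a 29-bit keyed hash of the connection details plus a 3-bit index into a table of 8 MSS values. The secret, the table contents, the `counter` argument (standing in for a slowly changing time value) and all function names are hypothetical.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"
# Eight example MSS values; the real table is implementation specific.
MSS_VALUES = (536, 1024, 1220, 1360, 1400, 1440, 1452, 1460)


def make_cookie(client, server, client_isn, mss_index, counter):
    """Server ISN for the SYN ACK: 29-bit MAC plus 3-bit MSS index."""
    msg = repr((client, server, client_isn, counter)).encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    mac29 = int.from_bytes(digest[:4], "big") >> 3
    return (mac29 << 3) | mss_index


def check_cookie(cookie, client, server, client_isn, counter):
    """Validate the cookie echoed in the final ACK; return the MSS or None."""
    mss_index = cookie & 0b111
    expected = make_cookie(client, server, client_isn, mss_index, counter)
    return MSS_VALUES[mss_index] if cookie == expected else None
```

The server stores nothing after sending the SYN ACK. When the ACK arrives it recomputes the hash from the packet's own fields, and only a client that really received the SYN ACK can present a matching value; the 3-bit index is why only 8 MSS values can be represented.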
Traffic types and QoS
•
Distinguish between real-time and elastic applications (e.g. file transfer).
•
Real-time (aka isochronous) applications vary in type.
– Typically application data unit (audio sample, video frame, real time command)
must be used at destination at some playback time.
– Need predictable latency but can use startup delay and playback buffer to give
some tolerance.
– Some real-time applications are tolerant of loss but there are also intolerant real-time applications.
– Some real-time applications can accommodate small variations in playback point
(delay adaptive) or rate (rate adaptive) by degrading gracefully.
•
QoS must provide suitable service for various traffic types
•
Two broad approaches to QoS
– Fine-grained provides QoS to flows.
– Coarse-grained provides QoS to classes of data
Integrated Services
•
Fine grained approach to Internet QoS called integrated services (IntServ).
– Not widely used (does not scale well) but worth studying
• Service classes:
–
guaranteed for intolerant real-time applications (see RFC 2212)
–
controlled load for tolerant adaptive real-time applications (see RFC 2211)
•
A traffic flow may be multicast and is initially defined by a Tspec generated by the
sender and based on a traffic shaping algorithm.
•
Each receiver chooses a Tspec it wants but may add capabilities of its own via an
Rspec (only used in guaranteed service): may e.g. allow faster clearing of queues.
•
Integrated Services approach requires:
–
Admission control —can flow be supported by network? This can be hard on Ethernet
networks.
–
RSVP (Resource Reservation Protocol) —setup protocol for an admitted flow, reserves
resources at routers (RFC 2205).
–
Policing —does packet adhere to agreed TSpec.
–
Packet Classification —associate packet with reservation (use IP addresses, port numbers,
protocol number). May use 802.1p priorities or IP DSCP field (formerly TOS)
–
Packet Scheduling —needs sophisticated queuing.
Traffic Shaping
•
Policing involves traffic shaping (rate limiting).
•
Traffic shaper delays metered traffic if it tries to exceed terms of its contract
(e.g. Tspec in IntServ).
•
IP typically metered with token bucket algorithm.
– “bucket” contains tokens needed to send a unit of data.
– Tokens arrive at some predetermined rate, r, and are placed in bucket unless
bucket is full (bucket size is b).
– Tokens used up every time packet is sent.
– Allows burstiness up to maximum of bucket size but will impose peak bandwidth
limit, p.
– r, b and p are main elements of Tspec in IntServ.
– also used in other QoS approaches.
•
Metered packets stored in FIFO buffer until they can be sent
– If buffer overflows packets discarded.
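The token bucket itself is only a few lines. The sketch below models r and b only: the peak rate limit p and the FIFO buffer for delayed packets are left out, time is passed in explicitly so the example is deterministic, and the class and method names are hypothetical.

```python
class TokenBucket:
    """Toy token bucket with fill rate r (tokens/s) and bucket size b."""

    def __init__(self, rate, size):
        self.rate = rate
        self.size = size
        self.tokens = size   # assume the bucket starts full
        self.last = 0.0      # time of the last refill, in seconds

    def _refill(self, now):
        # Tokens arrive at rate r and are discarded once the bucket is full.
        self.tokens = min(self.size,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_send(self, now, units):
        """Spend `units` tokens if available; False means the packet must wait."""
        self._refill(now)
        if units <= self.tokens:
            self.tokens -= units
            return True
        return False
```

A full bucket lets a burst of b units through at once; after that the source is held to the long-term rate r, and idle periods never accumulate more than b tokens.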
RSVP
•
Creates illusion of virtual circuits in IP network
•
Receivers make resource reservations
•
Sender sends PATH message with TSpec to
receivers.
•
Each router on (multicast) path records reverse path.
May modify TSpec
• Receivers return RESV message with Flowspec (Tspec + Rspec)
– Passed back up route
– Requirements may be combined at some routers to serve multiple receivers
– RSpec may give routers licence to send a little faster and increase delays a little (slack) where useful.
• Reservations time out (no explicit delete): soft state.
– Receivers repeat RESVs regularly (~30s)
– Senders repeat PATH regularly (~30s).
Differentiated Services
• Alternative to IntServ is differentiated services (DiffServ) (RFCs 2474, 2475)
– mark packets at edge of an administrative area (DS domain).
– 6 bits of old TOS field have been redefined as differentiated services code points (DSCP),
used to identify classes.
– Different classes of packets are then treated differently by routers (known as per hop
behaviours or PHBs)
• PHB classes have been defined as follows.
– Expedited forwarding (EF) guarantees immediate forwarding for EF packets (number of EF
packets in area must be limited) RFC 3246. Aimed at real-time services like voice and video.
– Assured forwarding (AF) is a group of 4 classes each with 3 levels (RFC 2597). Can use e.g.
RED with differently handled packet types to discriminate between priorities: Weighted
RED (WRED). Flexible use to be determined by service provider.
– Use Weighted Fair Queuing to handle different DSCP packets in different queues.
•
Packets marked (classified) at edge of domain by DS boundary nodes. Interior nodes
forward marked packets according to pre-agreed rules.
•
Actual behaviour is dependent on service provider (may differ between domains).
•
Some form of admission control is necessary to allow a boundary node to know how to
classify a packet. Traffic may then be conditioned (metered, shaped and policed) to
prevent a source from injecting more at a given level than has been agreed.
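To make the code points concrete, the sketch below computes the recommended DSCP values and their position in the old TOS byte. The numeric values (EF = 46, AFxy = 8x + 2y) come from the RFCs cited above rather than from the slide, and the helper names are hypothetical.

```python
EF = 46  # recommended expedited forwarding code point (binary 101110)


def af_dscp(af_class, drop_level):
    """Recommended code point for assured forwarding class AFxy."""
    if not (1 <= af_class <= 4 and 1 <= drop_level <= 3):
        raise ValueError("AF has 4 classes, each with 3 levels")
    return 8 * af_class + 2 * drop_level


def tos_from_dscp(dscp, low_bits=0):
    """Place the 6-bit DSCP in the top of the old TOS byte."""
    return (dscp << 2) | low_bits


def dscp_from_tos(tos_byte):
    """Recover the 6-bit DSCP from the old TOS byte."""
    return tos_byte >> 2
```

A boundary node marking a voice packet as EF would therefore write 0xB8 into the old TOS byte, and an interior node recovers the class by shifting the byte right by two bits.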
Internet QoS: a critique
•
Work on Internet QoS has been going on for over 15 years. Yet there is no
wide-scale deployment.
•
In practice QoS is restricted to LANs, VPNs and specialist providers.
•
QoS is complicated to implement and this makes it unattractive to graft onto
a working system unless a very clear business benefit can be
demonstrated. See, e.g. (copies on ANC4 web page).
– Bell, “Failure to thrive: QoS and the Culture of Operational Networking”
– Davie, “Deployment Experience with Differentiated Services”
– both from ACM SIGCOMM 2003 Workshops
•
Some oppose deployment of QoS on the Internet as a whole. There are
practical and philosophical reasons for such a stance:
– Practically it is argued that QoS is only needed if network resources are scarce so
instead we should always ensure significant overprovisioning.
– Philosophically, opposition is tied to arguments about network neutrality.
Proponents fear that QoS will result in capacity being shared in such a way that
the “rich win at the expense of the poor” (c.f. artificial scarcity).
4. Real World Implementation:
The Windows Communications System
Windows communication system
•
Windows communications architecture
has always supported protocol stacks
other than TCP/IP but latter is primary.
•
Layer 2 functions implemented by NIC
hardware and drivers which adhere to
Network Driver Interface Specification
(NDIS), which exposes device
independent functions that can be
used to program against Layer 2.
•
Above NDIS are protocol drivers such as
Tcpip.sys.
•
the TCP/IP Protocol drivers interact with
kernel mode clients via Network
Programming Interfaces (NPIs).
•
A user networking API like Winsock is
implemented by a kernel mode NPI
client and user mode DLLs.
[Figure: layered architecture. User mode: application process, API DLLs. Kernel mode: kernel mode NPI clients, NPIs, protocol drivers (network/transport layer functionality), NDIS wrapper, NDIS miniport driver.]
NDIS
•
Network Driver Interface Specification (NDIS): API
developed by Microsoft and 3Com.
•
Allows NDIS compliant communications drivers to
communicate with each other and with OS via
standard calls.
•
Network Interface Card (NIC) is controlled by an
NDIS (compatible) miniport driver.
•
NDIS consists of a library (sometimes called the
wrapper) of functions which can be called by a
protocol driver or a miniport driver.
•
Protocol driver “lower edge” first binds to a miniport
driver for chosen NIC. Then NIC can send and
receive.
•
Miniport driver calls NDIS library to communicate with
protocol driver, with OS and also for setting up NIC.
•
NDIS miniport driver is more portable than device
driver not using NDIS.
•
Current version is 6.6 (Windows 10, Server 2016).
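[Figure: protocol driver calls NdisXxx functions in the NDIS wrapper; NDIS calls ProtocolXxx functions in the protocol driver and MiniportXxx functions in the miniport driver; miniport driver calls NdisXxx functions, including other NdisXxx calls used to reach the NIC.]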
Implementing the Protocol Stack
•
Protocol driver implements combined transport and network functionality with data-link framing
for selected Layer 2.
–
handles segmentation, reassembly, acknowledgement etc
•
Transport functions accessible to kernel mode clients via an NPI. Examples are the Transport
Layer NPI (TLNPI) which gives access to Tcpip.sys and the Windows 2000/XP TDI interface it
replaced.
•
An NPI makes functions available to user-mode applications via NPI clients in kernel mode,
accessed through user mode DLLs.
•
Winsock uses the driver Afd.sys (Ancillary Function Driver) as its kernel mode client, and the
core user-mode library Ws2_32.dll provides the API to user programs.
•
Multiple protocol drivers can bind to a single miniport.
•
Protocol driver and miniport driver communicate via NDIS library, e.g. when protocol driver
wants to send data or miniport driver wants to deliver data.
•
Intermediate or filter drivers can be inserted between the protocol driver and miniport driver.
These all communicate via the NDIS library. An example is the network capture driver of the
Microsoft Network Monitor tool and its successor Microsoft Message Analyzer.
The NIC
• Incoming packets are stored in NIC FIFO according to filter
currently in effect (e.g. exact address, promiscuous unicast,
promiscuous multicast, VLAN). See diagram.
• Interrupt is generated which schedules a service routine
associated with the miniport driver.
• Typically data is then transferred by Direct Memory Access
(DMA) to a buffer in main memory controlled by protocol
driver. Further memory-memory copy typically needed to
transfer data to application.
• DMA device must have access to physical memory
addresses. NDIS uses NET_BUFFER structure to store data.
Includes Memory Descriptor List (MDL) which describes
virtual memory buffer layout in physical memory.
• Outgoing data is sent via NDIS call to miniport driver API.
NDIS library passes calls to miniport driver via MiniportXxx
calls.
• DMA is then arranged to transfer from memory buffer to NIC
FIFO. Protocol driver notified via NDIS call by miniport driver.
[Figure: arriving Ethernet frames enter the NIC, which interrupts the miniport driver. The miniport driver initiates a DMA transfer (control of bus-mastering DMAC) into a buffer in main memory and notifies the protocol driver via the NDIS library.]
Scalable Networking
•
Substantial overhead in processing TCP/IP protocol stack:
–
Comms drivers must segment, reassemble, checksum, incoming and outgoing data as well
as handle protocol functions.
–
Many interrupts from NICs (usually constrained to a single core even when, as usual in
modern systems, there are several present).
–
Memory-to-memory copying
•
These issues become more critical at high networking speeds and especially on
servers.
•
Number of techniques have been developed to reduce this overhead. Usually rely
on more intelligent network hardware with suitable OS support.
•
Several scalable networking techniques were built into the redesigned communications
system introduced in Windows v6.0 (Longhorn), marketed as Vista and Server 2008:
–
TCP Chimney Offload
–
Receive Side Scaling
–
NetDMA (in conjunction with Intel’s I/O Acceleration Technology). This was deprecated in
Windows Server 2012 and Windows 8 largely due to lack of take-up.
–
IPSec Offload
TCP Offload
•
A TCP Offload Engine (TOE) is a technology
incorporated in some NICs which allows the card
to process the entire TCP/IP stack.
• Full offload supports connection management as
well as data transfer.
• Windows supports partial (stateless) offload,
known as TCP Chimney Offload: connections
handled by software as usual but data transfer on
a TCP connection can be handled by the NIC.
• Chimney offload does not apply to non-TCP
packets.
• TOE has been criticised as being difficult to patch
(and thus possibly less secure), may run out of
hardware resources, is proprietary and may have
a limited lifetime (given rate of technological
advance).
• It is argued that some of these criticisms have
less applicability to partial offload approaches.
[Figure: below the application, a switch function sends non-chimney traffic through the software Layer 4, Layer 3 and Layer 2 and sends offloaded traffic through the TCP Chimney, with state updates passing between the two paths. Both paths reach the NIC via the NDIS 6.0 miniport driver.]
Receive Side Scaling
•
When data arrives on a connection an interrupt is
generated.
• Modern servers have multiple CPUs/cores to help
share load but to distribute arriving data from TCP
over many cores is problematic since for any
given connection, the protocol driver requires in-order delivery.
• Windows Receive Side Scaling (RSS) tries to
solve this by hashing on selected fields of the
packet header and using an indirection table to
decide which core will handle that packet.
• All packets on a connection are then handled by
the same core.
• If system is sufficiently flexible (e.g. Message
Signalled Interrupt or MSI support) ISR can be
made to run on the chosen core. Otherwise ISR
always runs on one core and work is distributed
from there.
[Figure: incoming packet; hash on header; indirection table; select dest core; multiple cores.]
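The hash-and-indirection step can be sketched as follows. Real RSS hardware uses its own hash function over selected header fields; CRC-32 is used here purely as a stand-in, and the function name and table contents are hypothetical.

```python
import zlib


def rss_core(src_ip, src_port, dst_ip, dst_port, indirection_table):
    """Pick the core for a packet from its connection identifiers."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    flow_hash = zlib.crc32(key)  # stand-in for the NIC's hash function
    # The hash indexes the table; the table entry names the core.
    return indirection_table[flow_hash % len(indirection_table)]


# 16-entry table spreading flows over 4 cores (an illustrative layout).
table = [0, 1, 2, 3] * 4
core = rss_core("192.0.2.10", 51000, "198.51.100.1", 80, table)
```

Every packet of a connection hashes to the same table entry and so reaches the same core, preserving in-order delivery, while the OS can rebalance load simply by rewriting table entries.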
5. Overlay Networks
Overlay Networks
• An overlay network, A, is a logical network
implemented on top of another underlying
network, B.
– Nodes in A are typically hosts or nodes of B. In
many cases only hosts are used.
– Links in A are typically tunnels through B.
• For example:
– The Internet can be seen as an overlay of the
telephone network.
– A VPN is a private overlay of some more public
network (e.g. the Internet, a carrier network etc).
– Mbone and 6-BONE were routing overlays used
on the Internet to test and deploy new forms of
IP routing.
• Overlay networks based on Internet hosts are of particular interest.
– Internet has limited capacity for experimentation with new protocols because it provides an essential
service (this leads to what is sometimes called ossification)
– However as an underlying network it provides direct connectivity between any two hosts or nodes.
– Internet hosts and, in some cases, routers, can then be interconnected by tunnels into any desired
overlay topology.
[Figure: an overlay network of nodes and links drawn above an underlying network of hosts and nodes; each overlay link corresponds to a tunnel through the underlying network.]
End-system multicast
• IP multicast has not been universally deployed.
• Alternative is end-system multicast.
– Host-only overlay network using underlying Internet.
– Less efficient than IP multicast but much more so than multiple unicasts (see diagram).
– Links are UDP tunnels (application-friendly).
– Important to efficiency how links are chosen (why?).
• Members of multicast network keep measuring round-trip times to estimate best tunnels.
– Current best tunnels used to form underlying mesh of tunnels (this is the overlay network proper).
– Multicast algorithm (e.g. DVMRP) now run to form multicast tree using mesh edges only.
– When host joins network it notifies existing member and forms tunnel to it.
– When host leaves, neighbours reconfigure tunnels.
– Mesh must be maintained as join and leave operations will result in sub-optimal tunnels.
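The mesh-then-tree idea can be sketched as follows. This is a minimal illustration only: the hosts and RTT values are hypothetical, and a minimum spanning tree (Prim's algorithm) stands in for a real multicast routing protocol such as DVMRP. It compares how many copies the source must send under multiple unicast and under the overlay tree.

```python
import heapq

# Hypothetical measured round-trip times (ms) between overlay members.
RTT = {
    ("A", "B"): 10, ("A", "C"): 40, ("A", "D"): 45,
    ("B", "C"): 35, ("B", "D"): 50, ("C", "D"): 5,
}

def rtt(u, v):
    """Look up the symmetric RTT between two hosts."""
    return RTT[(u, v)] if (u, v) in RTT else RTT[(v, u)]

def build_tree(hosts, source):
    """Prim's algorithm: grow a tree from the source using the lowest-RTT tunnels."""
    in_tree = {source}
    children = {h: [] for h in hosts}
    frontier = [(rtt(source, h), source, h) for h in hosts if h != source]
    heapq.heapify(frontier)
    while len(in_tree) < len(hosts):
        cost, parent, child = heapq.heappop(frontier)
        if child in in_tree:
            continue                      # already reached by a cheaper tunnel
        in_tree.add(child)
        children[parent].append(child)
        for h in hosts:
            if h not in in_tree:
                heapq.heappush(frontier, (rtt(child, h), child, h))
    return children

hosts = ["A", "B", "C", "D"]
tree = build_tree(hosts, "A")
unicast_copies_from_source = len(hosts) - 1   # one copy per receiver
overlay_copies_from_source = len(tree["A"])   # one copy per tree child
print(tree, unicast_copies_from_source, overlay_copies_from_source)
```

With these assumed RTTs the source sends a single copy and the other hosts relay it, which is where the saving over multiple unicast comes from.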
[Figure: the same set of hosts connected by multiple unicast, by IP multicast and by end-system multicast]
Peer-Peer Networks
•
Peer-peer (P2P) networks attempt to decentralise resources with limited or no
server support.
•
Implemented as host-only (application layer) overlay networks
•
First achieved prominence with music sharing systems like Napster and more
general file-sharing systems like Kazaa.
•
Networks are dynamic (peers can join or leave at any time) and resources are
distributed across all peers.
•
An unstructured P2P network attempts to address issues of peer discovery and
indexing (resource location) without a priori organisation.
•
A structured peer-peer network will attempt to organise connections between peers
and sometimes also resources using techniques like distributed hash tables.
•
Unstructured networks can be:
–
Centralised (central server used for indexing) e.g. Napster
–
Hybrid (special infrastructure supernodes but no single central server) e.g. Kazaa
–
Pure (all peers are entirely equal) e.g. Gnutella
Gnutella: pure unstructured P2P
•
Peers organise into overlay network connected by UDP tunnels.
• A peer, A, is connected to all the others it knows about but this is only a small subset of the entire network.
• If A wants an object it queries its neighbours.
– If one has the required item, it responds and A can access it.
– If no neighbour has the item, each neighbour forwards the query to all of its own neighbours and so on (effectively query is flooded across network).
• Queries contain Query ID (QID) and record of the upstream neighbour from which it came but no trace of source.
– QIDs used to damp flooding.
– Responses sent upstream until they return to A.
• Nodes discover new nodes when they respond to a query.
– Such a discovered node may be retained as a new neighbour.
– Neighbours periodically check each other for life via PING and PONG messages.
• Flooding is an expensive mechanism for broadcasting:
– Other options exist that try to reduce load (e.g. probabilistic forwarding).
– Similar problems encountered broadcasting in mobile ad hoc networks (MANETs).
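The flooding procedure can be sketched as below. This is a toy model, not the Gnutella wire protocol: the topology, the stored object and the hop limit (ttl) are assumptions, and per-peer state is held in one dictionary for brevity. It shows duplicates being dropped by QID and the response retracing the upstream records.

```python
from collections import deque

# Hypothetical overlay: each peer knows only a few neighbours.
NEIGHBOURS = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D"],
}
HOLDINGS = {"E": {"song.mp3"}}   # which peer stores which objects

def flood_query(source, wanted, qid, ttl=4):
    """Flood a query; return (path followed by the response, messages sent)."""
    seen = {(source, qid): None}     # (peer, QID) -> upstream neighbour
    queue = deque([(source, ttl)])
    messages = 0
    while queue:
        peer, hops_left = queue.popleft()
        if wanted in HOLDINGS.get(peer, set()):
            # The response retraces the upstream records back to the source.
            path = [peer]
            while seen[(path[-1], qid)] is not None:
                path.append(seen[(path[-1], qid)])
            return path, messages
        if hops_left == 0:
            continue
        for n in NEIGHBOURS[peer]:
            messages += 1
            if (n, qid) in seen:     # QID already seen here: drop duplicate
                continue
            seen[(n, qid)] = peer
            queue.append((n, hops_left - 1))
    return None, messages

path, messages = flood_query("A", "song.mp3", qid=1)
print(path, messages)
```

Even in this five-peer network nine query messages are sent to locate one object, which illustrates why flooding is considered expensive.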
Structured P2P networks
• Here we try to map objects onto nodes in a distributed fashion. Issues are:
– How do we map?
– How do we route request for object to correct node?
• Natural approach to distribute objects is to use hash function but:
– In simple hashing need to decide in advance how many buckets;
– Not suitable for network where peers can join and leave.
•
A distributed hash table (DHT) is a decentralised distributed lookup system that allows any peer in
an overlay network to retrieve an object given its key (hash) in some very large keyspace (e.g. 160
bits).
•
Most DHT systems use consistent hashing to partition keyspace and key-based routing mechanism.
•
In consistent hashing define a distance function that measures “closeness” of any two keys. Then:
–
Generate a key in the space from each object’s ID (e.g. SHA-1 hash of the name).
–
Generate a key in the space from each peer’s address (e.g. IP number)
–
Allocate a new object to the peer with key closest to the object’s own key.
–
A host owns the key-subspace consisting of all keys that are closer to it than any other peer.
–
If peers leave or join, only the key-subspace of neighbours (in key-space) is affected.
•
In key-based routing, for any key, k, every node, A, that does not own k has at least one overlay
neighbour whose key is closer to k’s than A’s. A forwards to the neighbour whose key is closest to
k’s.
•
See Rowstron & Druschel’s paper on the Pastry system (ANC4 website)
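A minimal sketch of the consistent hashing steps above, assuming a ring-shaped distance function over a 160-bit SHA-1 keyspace (the slide does not fix a particular distance function; Pastry and Kademlia each use their own). Peer addresses and object names are hypothetical. Key-based routing is not modelled here; ownership is computed with global knowledge of all peers.

```python
import hashlib

BITS = 160                      # SHA-1 keyspace
SPACE = 1 << BITS

def key(name):
    """Map a string (object name or peer address) to a 160-bit key."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def distance(a, b):
    """Assumed distance function: shortest way round a ring of 2**160 keys."""
    d = abs(a - b)
    return min(d, SPACE - d)

def owner(obj, peers):
    """The peer whose key is closest to the object's key owns the object."""
    return min(peers, key=lambda p: distance(key(p), key(obj)))

peers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
objects = [f"file-{i}" for i in range(200)]

before = {o: owner(o, peers) for o in objects}
after = {o: owner(o, peers + ["10.0.0.5"]) for o in objects}
moved = [o for o in objects if before[o] != after[o]]
print(len(moved), "of", len(objects), "objects moved")
```

Every object that changes owner moves to the newly joined peer; no object moves between existing peers, which is the property that makes the scheme suitable for networks where peers join and leave.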
BitTorrent
•
BitTorrent is a peer-peer file sharing protocol designed by Bram Cohen (2001).
•
Files are divided into pieces downloadable separately.
•
Each file shared by its own independent overlay network or swarm.
•
Swarm starts as one node with complete file.
•
Node that wants file, joins swarm as leecher. When it has downloaded a piece it becomes
another source for that piece.
•
Once a node in the swarm has a complete copy it becomes a seed.
•
In conventional BitTorrent, swarm has a server called its tracker that keeps track of current
swarm membership.
•
To join swarm a new peer, A, uses a .torrent file typically downloaded from a web server to
locate the tracker which it then contacts.
•
Tracker replies with partial list of peers to which A connects and with which it can exchange
data. Linked peers inform each other of which pieces they have completed.
•
Newer versions of BitTorrent support trackerless swarms using a DHT implemented over a
peer-finder network which spans swarms. DHT based on Kademlia protocol. Client
acquires hashed version of desired item called a magnet link and uses this to search DHT
for peer information. Replaces .torrent file, though both systems may run together.
Operation of BitTorrent
•
A peer-peer network is much more efficient than a central server at distributing a file
to multiple recipients. Why?
•
However any peer-peer network will only function well if participants upload as well as
download. BitTorrent was designed to try to ensure that free-riders are contained.
•
BitTorrent enforces fair behaviour by swarm members. A peer chokes
another if the latter is not uploading at required rate.
•
When a new peer, A, joins a swarm the tracker allocates it a set of swarm neighbours,
with which it forms TCP connections. Each neighbour sends a list of the pieces it holds.
•
A requests pieces, usually adopting a strategy of rarest first. Why? As A accumulates
pieces, it receives requests itself.
•
BitTorrent uses a fairness approach sometimes called tit-for-tat as follows.
–
A prioritises those neighbours who are sending bits to it fastest selecting the 4 most active.
Neighbours not in this group are said to be choked.
–
It also chooses a fifth neighbour at random and optimistically unchokes it.
–
The slowest of the 5 unchoked neighbours is replaced by a new optimistically unchoked
candidate every 10 seconds.
–
In this way new peers can participate in the swarm but free-riders are discriminated against.
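The selection rules described above can be sketched as follows. This is an illustration under stated assumptions, not client code: the neighbour names, rates and piece sets are hypothetical, and timers (such as the 10 second replacement interval) are left out. The first function picks the 4 fastest senders plus one random optimistic unchoke; the second picks a piece by rarest first.

```python
import random

def choose_unchoked(upload_rate_to_me, regular_slots=4, rng=random):
    """Return (regular, optimistic): fastest senders plus one random other."""
    ranked = sorted(upload_rate_to_me, key=upload_rate_to_me.get, reverse=True)
    regular = ranked[:regular_slots]
    others = ranked[regular_slots:]
    optimistic = rng.choice(others) if others else None
    return regular, optimistic

def rarest_first(my_pieces, neighbour_pieces):
    """Pick the missing piece held by the fewest neighbours."""
    counts = {}
    for pieces in neighbour_pieces.values():
        for p in pieces - my_pieces:
            counts[p] = counts.get(p, 0) + 1
    return min(counts, key=lambda p: (counts[p], p)) if counts else None

# Hypothetical rates (kB/s) at which each neighbour is currently sending to A.
rates = {"P1": 120, "P2": 15, "P3": 300, "P4": 0, "P5": 80, "P6": 45, "P7": 0}
regular, optimistic = choose_unchoked(rates, rng=random.Random(1))
choked = set(rates) - set(regular) - {optimistic}

have = {0}                                            # pieces A already holds
neighbours = {"P1": {0, 1, 2}, "P3": {1, 2}, "P5": {1, 3}}
print(regular, optimistic, sorted(choked), rarest_first(have, neighbours))
```

Note that P4 and P7, which send nothing, can only ever be served through the single optimistic slot.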
Content Distribution Networks
• Client-server interaction is subject to several potential bottlenecks: client-side, server-side and network-based.
• Server-side and network bottlenecks are especially problematic when a server simultaneously becomes the target of a large number of clients: e.g. a flash crowd or a distributed DoS attack.
• Solution commonly adopted is a content distribution network (CDN):
– use geographically distributed surrogate servers to cache pages held by the main backend servers
– use redirector functions to intercept client requests and forward them to most appropriate surrogate.
• Redirection may be achieved via DNS request routing (return different server addresses to clients) and URL rewriting by surrogate servers.
• Alternatively redirection can be done by physical proxies which direct the client to an appropriate server.
– Choice of server can be made using a DHT hashing on URL and server ID.
– Mechanism must also take account of server load and network proximity.
– DHT means no communication between redirectors needed.
– Examples include Kankan and Coral CDNs.
• Yet another approach uses IP anycast (RFC 1546) with BGP so a single IP address locates nearest server; however this is not responsive to dynamic changes in traffic.
[Figure: clients reach surrogates through redirectors; surrogates cache content from the backend servers]
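One way to hash on URL and server ID so that redirectors need not communicate is rendezvous (highest-random-weight) hashing, sketched below. This is an assumption chosen for illustration; the slides do not say which scheme a given CDN uses. Server names and the URL are hypothetical, load is reduced to a simple "overloaded" set, and network proximity is ignored.

```python
import hashlib

def score(url, server_id):
    """Hash the (URL, server ID) pair to a pseudo-random score."""
    digest = hashlib.sha1(f"{url}|{server_id}".encode()).digest()
    return int.from_bytes(digest, "big")

def pick_surrogate(url, servers, overloaded=frozenset()):
    """Highest-scoring server that is not marked overloaded."""
    candidates = [s for s in servers if s not in overloaded]
    return max(candidates, key=lambda s: score(url, s))

servers = ["s1.cdn.example", "s2.cdn.example", "s3.cdn.example"]
url = "http://www.example.com/video/intro.mp4"

# Two independent redirectors given the same inputs agree without talking.
redirector_1 = pick_surrogate(url, servers)
redirector_2 = pick_surrogate(url, list(reversed(servers)))
fallback = pick_surrogate(url, servers, overloaded={redirector_1})
print(redirector_1, redirector_2, fallback)
```

Because each redirector computes the same scores locally, requests for one URL converge on one surrogate, which keeps that surrogate's cache warm.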
More on CDNs
• CDN requires major infrastructure. Only largest content providers operate their own CDNs (e.g. YouTube/Google, Amazon); others use a 3rd party, e.g. Akamai.
• Infrastructure must be distributed globally. Two strategies:
– Enter deep (e.g. Akamai) places server clusters inside ISP access networks.
– Bring home (e.g. Limelight, YouTube) places smaller number of larger server clusters equidistant from multiple Tier 1 ISP Points of Presence (PoPs).
• DNS request routing (see RFC 3568) used with DNS extensions (RFC 6891).
• Consider case where content provider (CP) uses third party CDN provider.
– Client, A, visits content provider (CP) site, B, and accesses content url.
– B returns DNS query, q, for content.
– User host submits q to local DNS server, L.
– L passes q to CP’s authoritative DNS server, C.
– C returns name, H, of a host in CDN network to L.
– L consults CDN provider’s authoritative DNS server, D.
– D returns hostname, Si, of most suitable surrogate server to L.
– L returns Si to A.
– Client now uses second DNS query to resolve IP address of Si.
• Surrogate server is chosen by some cluster selection strategy which may be based on simple geography or, in more sophisticated implementations, by measuring RTTs to client networks.
[Figure: client A requests content url from site B, which returns q; A sends q to local DNS server L; L queries C in the CP network and receives H; L queries D in the CDN network (surrogates S1, S2, ..., Si, ..., Sn) and receives Si, which it returns to A]
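The resolution chain above can be mimicked with plain dictionary lookups. This is only a sketch of the sequence of steps: every hostname, region label and address is hypothetical (addresses come from documentation ranges), real resolvers exchange DNS messages, and cluster selection is reduced to simple geography.

```python
# C: the CP's authoritative server maps the query name q to a CDN name H.
CP_AUTHORITATIVE = {"video.cp.example": "a1.cdn.example"}
# D: the CDN's authoritative server knows one surrogate per region.
SURROGATES = {"eu": "s1.cdn.example", "us": "s2.cdn.example"}
SURROGATE_IPS = {"s1.cdn.example": "192.0.2.10",
                 "s2.cdn.example": "198.51.100.20"}

def cdn_authoritative(cdn_name, client_region):
    """D: choose a surrogate Si by simple geography (assumed strategy)."""
    return SURROGATES.get(client_region, SURROGATES["us"])

def local_dns_resolve(q, client_region):
    """L: follow the chain q -> H (from C) -> Si (from D)."""
    h = CP_AUTHORITATIVE[q]
    return cdn_authoritative(h, client_region)

si = local_dns_resolve("video.cp.example", "eu")   # L returns Si to client A
ip = SURROGATE_IPS[si]                             # second lookup: Si -> IP
print(si, ip)
```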
6. Multimedia Networking
Transmitting Video
• Video has high bit-rate, e.g. 40Mbps for Blu-ray 1080p.
• But it can be compressed effectively by exploiting spatial redundancy (within an image) and temporal redundancy (repetition between frames).
– E.g. with H.264/MPEG-4 AVC (lossy) compression, 1080p can be delivered at less than 8Mbps (including 160kbps audio), although with some loss of definition.
– See http://arstechnica.com/apple/2012/03/the-ars-itunes-1080p-vs-blu-ray-shootout/ for an interesting comparison.
• When streaming video, multiple versions may be sent at same time at different quality levels and bit rates.
• Newer H.265 (HEVC) compression better than halves bit rates for same quality and is designed to support UHD (4K).
[Figure: image taken from the Ars Technica article above, depicting a scene from the (vampire) movie “30 Days of Night”]
– Left hand image is iTunes 1080p (~4Mbps)
– Right hand image is Blu-ray (~40Mbps)
PCM
•
An audio waveform of bandwidth B Hz can be converted to digital data to any
desired degree of accuracy by:
–
Sampling at a rate >2B samples/sec, the so-called Nyquist rate (this follows from The
Sampling Theorem)
–
quantising each sample to a set of allowable levels. M allowable levels coded with log2 M bits.
•
Levels need not be equally spaced. More accuracy if we space low levels closer together
(companding).
•
This technique is called pulse code modulation (PCM).
•
PCM transmission requires DAC and ADC. A PCM codec (coder-decoder) includes both.
•
Easy to use time division multiplexing (TDM) to multiplex interleaved samples.
[Figure: analogue baseband s(t) is sampled by an ADC and quantised to levels 0 to 7; sample values 6, 7, 6, 5, 3 are sent as binary coded signals 110, 111, 110, 101, 011; a DAC reconstructs the received analogue baseband s(t)]
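A minimal sketch of the quantise-and-code step, assuming uniformly spaced levels over an input range of -1 to +1 (the range and the test sample 0.72 are assumptions; companding is not modelled). The sample values 6, 7, 6, 5, 3 are the ones shown in the figure for M = 8.

```python
import math

def quantise(sample, levels, lo=-1.0, hi=1.0):
    """Map an analogue sample in [lo, hi] to the nearest of `levels` uniform levels."""
    step = (hi - lo) / (levels - 1)
    index = round((sample - lo) / step)
    return max(0, min(levels - 1, index))   # clip out-of-range samples

def encode(level, levels):
    """Code a level index with log2(levels) bits."""
    bits = int(math.log2(levels))
    return format(level, f"0{bits}b")

M = 8                                   # 8 levels -> 3 bits per sample
sample_values = [6, 7, 6, 5, 3]         # level indices from the figure
coded = [encode(v, M) for v in sample_values]
print(coded, quantise(0.72, M))
```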
Transmitting Audio
•
POTS (Plain Old Telephone Service) samples at 8kHz and quantises to 256 levels (8 bits per sample), a
64kbps stream. Standard telephony codec is ITU G.711 using PCM.
•
CD audio samples at 44.1kHz (why?) and quantises to 16-bit accuracy, giving 705.6kbps
PCM for mono, 1.411Mbps PCM for stereo.
•
Rarely transmit PCM audio directly over Internet. Instead use compression.
•
For near CD quality audio common compression algorithms are MPEG 1 layer 3 (MP3) and
Advanced Audio Coding (AAC).
•
For speech, a codec such as ITU G.726 can compress telephone quality voice to 32kbps
(e.g. used in international trunk lines and in DECT) using ADPCM (adaptive differential
PCM). Additional modes are also available including one at 16kbps.
•
Other codecs can compress voice still further but require more processing power and carry
lower-quality penalties. Include G.729 (8kbps) and G.723.1 (5.3kbps) sometimes used in
VoIP.
•
Wideband audio codecs, such as ITU G.722, improve on telephone quality by capturing
frequencies up to 7kHz in the analogue signal. (cf 3.4kHz in POTS).
•
Finally recall that users are much more sensitive to glitches in audio than in video streams.
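The bit rates quoted above follow directly from sample rate times bits per sample times channels. A short check, with the codec rates taken from the bullets above and the compression ratios computed relative to G.711:

```python
def pcm_rate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000

pots = pcm_rate_kbps(8_000, 8)                 # 256 levels -> 8 bits
cd_mono = pcm_rate_kbps(44_100, 16)
cd_stereo = pcm_rate_kbps(44_100, 16, channels=2)

# Codec rates quoted above, expressed as compression relative to G.711.
codec_kbps = {"G.726": 32, "G.729": 8, "G.723.1": 5.3}
ratios = {name: round(pots / rate, 1) for name, rate in codec_kbps.items()}
print(pots, cd_mono, cd_stereo, ratios)
```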
Streaming Stored Video and Audio
• Supports retrieval of audio and video files stored on remote servers.
– Most important issue is server to client bandwidth.
– Technology relies on client-side playback buffer to compensate for network jitter.
– 3 approaches in general use.
• UDP streaming carries chunks of video using RTP (Real-time Transport Protocol) over UDP.
– Per-stream separate control connections must be maintained between client and server, often using RTSP (Real-time Streaming Protocol), for pause, resume etc. commands.
• HTTP streaming accesses stored video and audio using TCP connections.
– Access targets via URLs.
– Poor characteristics of TCP compensated by larger playback buffers & video prefetching.
– Video repositioning can use byte-range header in a GET request.
– HTTP streaming is popular (e.g. YouTube) because: no need for a separate control connection; and HTTP/TCP can often cross firewalls and NAT routers more easily than UDP.
• Dynamic Adaptive Streaming over HTTP (DASH) enhances HTTP streaming as follows:
– Use multiple versions of video at different qualities. URLs of these held in manifest file on server requested first by client.
– Client specifies quality to use each time a chunk of video is requested. Adapts requests if observed available bandwidth changes.
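The client-side adaptation in DASH can be sketched as below. DASH does not prescribe a selection algorithm, so the rule used here (highest bit rate within 80% of a smoothed throughput estimate) is an assumption, as are the manifest contents, URL templates and download measurements.

```python
# Hypothetical manifest: bit rate (kbps) -> URL template for that version.
MANIFEST = {
    400: "video_400k/chunk{n}.mp4",
    1500: "video_1500k/chunk{n}.mp4",
    4000: "video_4000k/chunk{n}.mp4",
}

def choose_version(throughput_kbps, safety=0.8):
    """Highest bit rate that fits within a fraction of measured throughput."""
    budget = throughput_kbps * safety
    fitting = [rate for rate in MANIFEST if rate <= budget]
    return max(fitting) if fitting else min(MANIFEST)

def estimate_throughput(previous_kbps, chunk_bits, seconds, alpha=0.5):
    """Smooth the per-chunk download rate with a moving average."""
    sample = chunk_bits / seconds / 1000
    return (1 - alpha) * previous_kbps + alpha * sample

estimate = 2000.0                       # initial guess in kbps
requests = []
# (bits downloaded, seconds taken) observed for successive chunks.
for n, (bits, secs) in enumerate([(20e6, 2.0), (2e6, 4.0), (1e6, 5.0)]):
    rate = choose_version(estimate)
    requests.append(MANIFEST[rate].format(n=n))
    estimate = estimate_throughput(estimate, bits, secs)
print(requests)
```

The client steps up to the 4000kbps version after a fast download and drops back once the observed rate falls.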
Voice over IP
• Carrying telephone voice over IP (VoIP) increasingly popular.
– Aim to consolidate voice, fax (FoIP) and data using IP as vehicle;
– Reduced cost and increased flexibility.
• Voice digitised and packetized by VoIP codec. Various options. E.g.
– G.711 standard PCM. Packet payload variable, typically 20ms of samples (160 bytes).
– G.723.1 compressed to 5.3/6.4kbps. Payload: 30ms, 24 bytes (6.4kbps)/ 20 bytes (5.3kbps).
• For successful transport of voice, IP network must attempt to control delay and jitter.
– High delays accentuate echo (>50ms) and encourage talker overlap (>250ms). 3 components: accumulation delay; processing delay; network delay.
– Jitter compensation requires buffering and playback (but adds to network delay).
• Low bandwidth and congestion will inevitably degrade VoIP performance.
• VoIP payload carried by RTP over UDP. TCP is not suitable for reasons discussed.
• RTP carries sequence numbers and timestamps to aid playout.
• Header lengths are 12 (RTP) + 8 (UDP) + 20 (IP) = 40 bytes.
– Required bit rate for G.711 is 80kbps (each way) and for G.723.1 (6.4kbps) about 17kbps.
– Note that this ignores Layer 2 overhead (e.g. Ethernet framing raises the G.711 figure to about 87kbps).
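Working the arithmetic from the payload sizes and header lengths quoted above gives 80kbps for G.711 and about 17kbps for G.723.1 at the IP level. Adding an assumed 18 bytes of Ethernet framing per packet (14-byte header plus 4-byte FCS, ignoring preamble and inter-frame gap) gives roughly 87kbps for G.711.

```python
HEADERS = {"RTP": 12, "UDP": 8, "IP": 20}   # bytes, as listed above

def voip_rate_kbps(payload_bytes, packet_ms, l2_bytes=0):
    """Bits sent per millisecond, which is numerically the rate in kbps."""
    packet_bytes = payload_bytes + sum(HEADERS.values()) + l2_bytes
    return packet_bytes * 8 / packet_ms

g711 = voip_rate_kbps(160, 20)               # 20 ms of 64 kbps PCM per packet
g7231 = voip_rate_kbps(24, 30)               # 30 ms at 6.4 kbps per packet
# Assumption: 18 bytes of Ethernet framing per packet.
g711_ethernet = voip_rate_kbps(160, 20, l2_bytes=18)
print(g711, round(g7231, 1), g711_ethernet)
```

For G.723.1 the 40 bytes of headers outweigh the 24-byte payload, which is why header overhead dominates at low codec rates.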
Implementing VoIP
• VoIP is a challenging use of IP technology. Various problems:
• Receive buffers are much more limited than in streaming. Receiver tries to play out each packet at same playout delay, q, from its (time-stamped) time of generation at sender.
– q includes a variable buffer delay to compensate for jitter.
– If q is too short some packets may miss playout time.
– q can be varied adaptively from one talk spurt to another if receiver measures average network delay and variance.
• Packet loss can be addressed via loss recovery schemes.
– Forward error correction (FEC) sends redundant information to help recover lost packets.
– Packet interleaving tries to reduce effects of losses by manifesting any loss as several slightly damaged chunks rather than one completely lost one.
– Error concealment uses interpolation to try to replace the audio lost with a missing packet.
• Example: Skype. This is proprietary (now owned by Microsoft) - details are not published and have to be deduced. Optionally carries video in qualities from 30kbps to 1Mbps.
– Many codecs used but typical audio quality is wideband at 16000 samples/sec.
– Audio and video go over UDP by default. TCP is used when UDP streams are blocked.
– Uses P2P technology for user location and NAT traversal, using super-peers and relays.
– Super-peers carry distributed index of usernames to current IP addresses and assist in establishment of NAT-NAT connections. Relays carry packets between such connections.
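The adaptive playout delay mentioned above can be sketched with exponentially weighted averages of delay and delay deviation, re-tuning q only at the start of each talk spurt. The weights u and k, and the packet trace, are assumptions for illustration; in practice sender and receiver clocks are not synchronised, so the measured "delay" includes an unknown constant offset.

```python
def adaptive_playout(packets, u=0.1, k=4):
    """packets: (send_ts, recv_ts, first_in_talk_spurt) tuples, times in ms.

    Returns the playout time chosen for each packet.
    """
    d = None          # smoothed estimate of network delay
    v = 0.0           # smoothed estimate of delay deviation
    q = None          # playout delay used for the current talk spurt
    playout_times = []
    for send_ts, recv_ts, first in packets:
        delay = recv_ts - send_ts
        if d is None:
            d = float(delay)
        else:
            d = (1 - u) * d + u * delay
            v = (1 - u) * v + u * abs(delay - d)
        if first or q is None:
            q = d + k * v             # only re-tune q between talk spurts
        playout_times.append(send_ts + q)
    return playout_times

# Hypothetical trace: 20 ms packets, jittery delays, second spurt at t=1000.
trace = [(0, 50, True), (20, 95, False), (40, 85, False),
         (1000, 1060, True), (1020, 1075, False)]
times = adaptive_playout(trace)
late = [recv > play for (_, recv, _), play in zip(trace, times)]
print(times, late)
```

In this trace the second packet misses its playout time because q was set before any jitter had been observed; the larger q chosen for the second talk spurt absorbs similar jitter.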
VoIP Signalling
•
VoIP must allow for calls between VoIP clients (hard or soft VoIP phones) and
between VoIP clients and PSTN phones.
• Two main approaches to Internet telephony: ITU & IETF
– Both systems use similar codecs over RTP for media transport;
– Differ in signalling approach.
• ITU approach is H.323
– Comprises multiple protocols including those for media transport.
– Includes H.225 to provide call-setup signalling similar to that in ISDN.
• IETF approach based on SIP (Session Initiation Protocol: RFC 3261 - 3265)
• SIP is used to initiate user-user communication:
– initiator addresses user using SIP URI (Uniform Resource Identifier) of form sip:user@domain (like email address)
– SIP proxies direct calls to user devices
– user registers location(s) with domain SIP proxy.
– SIP proxy for given domain located via DNS; call forwarded to known user location(s).
– Initiating SIP call includes SDP (Session Description Protocol: RFC 4566, which obsoletes RFC 2327) message which includes details such as codecs to be used etc.
SIP
•
SIP allows calls to be established in VoIP, video conferencing & many messaging services.
•
Network endpoints in SIP conversation are user agents (UAs). Caller is UAC, callee is UAS.
•
Aim is to initiate call between 2 or more users each using a single UA. Here we only consider 2 party calls.
For an introduction to SIP conferencing see RFC 4353.
•
SIP is an out-of-band text-based signalling protocol, carried by UDP (or TCP) using well-known port 5060.
•
SIP call begins with INVITE message. Contains SDP description of desired call, including IP of sender.
UAS accepts call with response headed with value 200 (OK). Failed call returns 4xx code. UAC replies
with ACK message for three-way handshake. SDP reply is included in 200 response.
• If UA knows IP address of its peer and this is accessible, SIP communication can be direct.
– Often however only SIP address is known;
– IP address may change due to DHCP assignment or use of multiple UAs by one user.
– Often UA may be behind NAT router or firewall.
• When no direct initiation possible SIP proxy is used: directs INVITE to the SIP proxy for the UAS.
– Multiple proxies may be encountered. Each adds a Via line to the INVITE.
– This proxy needs to locate the UAS and does so using a SIP registrar (usually associated with the proxy itself). When a user activates a SIP device it registers with local registrar.
– Reveals the UAS IP and allows the call to be delivered.
– Response is sent back along the route taken by the INVITE.
– Conversation now proceeds as (typically) UDP/RTP transfer between two ports identified in SDP exchange.
• Note that the receiving SIP proxy may re-vector INVITE (e.g. to voicemail) if call is not answered.
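To make the INVITE and its SDP body concrete, the sketch below assembles a minimal text message and shows a proxy prepending its own Via line. It is illustrative only: the users, hosts, addresses, Call-ID, tags and branch values are hypothetical, only a small set of headers is included, and a real UA would generate and validate these fields through a SIP stack.

```python
def build_invite(caller, callee, caller_ip, rtp_port, call_id, branch):
    """Assemble a minimal SIP INVITE carrying an SDP offer for one audio stream."""
    user = caller.split("@")[0]
    sdp = "\r\n".join([
        "v=0",
        f"o={user} 0 0 IN IP4 {caller_ip}",
        "s=call",
        f"c=IN IP4 {caller_ip}",              # where the caller wants media sent
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",      # payload type 0 = G.711 mu-law
        "a=rtpmap:0 PCMU/8000",
    ]) + "\r\n"
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {caller_ip}:5060;branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{caller_ip}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode())}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + sdp

def add_via(message, proxy_host, branch):
    """A proxy forwarding the request prepends its own Via header."""
    start_line, rest = message.split("\r\n", 1)
    via = f"Via: SIP/2.0/UDP {proxy_host};branch={branch}"
    return f"{start_line}\r\n{via}\r\n{rest}"

invite = build_invite("alice@a.example", "bob@b.example", "192.0.2.1", 49170,
                      "abc123@192.0.2.1", "z9hG4bK-1")
forwarded = add_via(invite, "proxy.a.example", "z9hG4bK-2")
print(forwarded)
```

The stacked Via lines are what allow the response to retrace the route taken by the INVITE.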
RTP
•
RTP (RFC 3550) can be used for transmitting a variety of forms of real-time data. Such data is organised
into streams of associated RTP packets. Often multiple streams for one app, e.g. streams for video and
audio each way (4 streams total). Coordination via separate RTP Control Protocol (RTCP).
•
When RTP is used, profile is attached, defining allowable payload types and their associated payload
formats.
–
Profile used in VoIP is Audio and Video Conferences with Minimal Control (RFC 3551)
•
Usually runs on top of UDP but can also be used with DCCP, SCTP etc.
• Basic RTP header is 12 bytes long. CSRC (Contributing SouRCe) identifiers are only used when streams are passed through a mixer. Extension header use is open.
• Header fields of note:
– Ver is 2-bit version number (current Ver = 2)
– Payload type (profile dependent) gives content of current packet (e.g. PCM, H.264 etc)
– Sequence number
– Timestamp field uses sampling clock at sender and gives sampling instant of 1st sample in packet data. Clock increases by 1 for every sample.
– Synchronisation Source Identifier (SSRC) identifies source of RTP stream (finer grain than IP number & port).
[Figure: RTP packet header, 32 bits wide. First word: Ver, P, X, CC, M, Payload Type, Sequence Number. Then Timestamp, SSRC Identifier, CSRC Identifiers ..., Extension Headers ...]
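The fixed 12-byte header can be packed and unpacked with Python's struct module, as sketched below. The sequence numbers, timestamps and SSRC are hypothetical; payload type 0 is G.711 mu-law in the RFC 3551 profile. CSRC lists and extension headers are not handled.

```python
import struct

def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    """Build the fixed 12-byte header: Ver=2, no padding, no extension, no CSRCs."""
    byte0 = 2 << 6                                  # Ver=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | (payload_type & 0x7F)   # M bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc)

def unpack_rtp_header(data):
    """Decode the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6, "padding": (byte0 >> 5) & 1,
        "extension": (byte0 >> 4) & 1, "csrc_count": byte0 & 0x0F,
        "marker": byte1 >> 7, "payload_type": byte1 & 0x7F,
        "seq": seq, "timestamp": timestamp, "ssrc": ssrc,
    }

# Two consecutive G.711 packets: 160 samples each, so the timestamp steps by 160
# while the sequence number steps by 1.
first = pack_rtp_header(0, seq=1000, timestamp=32000, ssrc=0x1234ABCD)
second = pack_rtp_header(0, seq=1001, timestamp=32000 + 160, ssrc=0x1234ABCD)
print(len(first), unpack_rtp_header(second))
```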