Research Report No. 2007:02

Peer-to-Peer Traffic Measurements

Dragos Ilie
David Erman

Department of Telecommunication Systems, School of Engineering,
Blekinge Institute of Technology, S-371 79 Karlskrona, Sweden

© 2007 by Dragos Ilie and David Erman. All rights reserved.
Blekinge Institute of Technology Research Report No. 2007:02
ISSN 1103-1581
Published 2007. Printed by Kaserntryckeriet AB, Karlskrona 2007, Sweden.
This publication was typeset using LaTeX.

Abstract

The global Internet has emerged to become an integral part of everyday life. The Internet is now as fundamental a part of the infrastructure as the telephone system or the road network. Peer-to-Peer (P2P) is the logical antithesis of the Client-Server (CS) paradigm, which has been the ostensibly predominant paradigm for IP-based networks since their inception. Current research indicates that P2P applications are responsible for a substantial part of Internet traffic. New P2P services are developed and released at a high pace, and the number of users embracing new P2P technology is increasing fast. It is therefore important to understand the impact of the new P2P services on the existing Internet infrastructure and on legacy applications. This report describes a measurement infrastructure geared towards P2P network traffic collection and analysis, and presents measurement results for two P2P applications: Gnutella and BitTorrent.

Contents

1 Introduction
  1.1 Motivation

2 Peer-to-Peer Protocols
  2.1 P2P Evolution
  2.2 P2P Definitions
  2.3 Distributed Hash Tables
  2.4 P2P and Ad-Hoc Networks
  2.5 P2P and File Sharing
  2.6 P2P and the Grid
  2.7 P2P and Sensor Networks

3 Protocol Descriptions
  3.1 BitTorrent
  3.2 Gnutella
4 Measurement Software Description
  4.1 Passive measurements
  4.2 Network infrastructure
  4.3 TCP Reassembly Framework
  4.4 Application Logging
  4.5 Log Formats

5 BitTorrent
  5.1 Measurement details
  5.2 Aggregate results
  5.3 Swarm size dynamicity

6 Gnutella
  6.1 Session Statistics
  6.2 Message Statistics
  6.3 Transfer Rate Statistics

A BitTorrent Application Log DTD

B Acronyms

Bibliography

List of Figures

3.1 BitTorrent handshake procedure.
3.2 Example announce GET request.
3.3 Example scrape GET request.
3.4 BitTorrent handshake procedure.
3.5 Example of a Gnutella session.
4.1 Measurement network infrastructures.
4.2 Measurement procedures.
4.3 Extract from BitTorrent XML log file.
4.4 Sample BitTorrent log file.

5.1 Temporal structure of measurements.
5.2 Swarm size for measurement 6.
5.3 Swarm size for measurement 10.

6.1 Gnutella Transfer Rates.
6.2 Gnutella Transfer Rates at IP layer.

List of Tables

2.1 P2P and CS content models.

5.1 Measurement summary.
5.2 Content summary.
5.3 Download time and average download rate summary.
5.4 Session and peer summary.
5.5 Downstream protocol message summary.
5.6 Upstream protocol message summary.

6.1 Incoming session statistics.
6.2 Outgoing session statistics.
6.3 Incoming + Outgoing session statistics.
6.4 Message size statistics.
6.5 Message duration statistics.
6.6 Message interarrival time statistics.
6.7 Message interdeparture time statistics.
6.8 Handshake message rate statistics.
6.9 Handshake byte rate statistics.
6.10 PING-PONG message rate statistics.
6.11 PING-PONG byte rate statistics.
6.12 QUERY-QUERY HIT message rate statistics.
6.13 QUERY-QUERY HIT byte rate statistics.
6.14 QRP and HSEP message rate statistics.
6.15 QRP and HSEP byte rate statistics.
6.16 PUSH and BYE message rate statistics.
6.17 PUSH and BYE byte rate statistics.
6.18 VENDOR and UNKNOWN message rate statistics.
6.19 VENDOR and UNKNOWN byte rate statistics.
6.20 Gnutella (all type) message rate statistics.
6.21 Gnutella (all type) byte rate statistics.
6.22 IP byte rate statistics.

Chapter 1

Introduction

The global Internet has emerged to become an integral part of everyday life.
The Internet is now as fundamental a part of the infrastructure as the telephone system or the road network. The driving factor pushing the acceptance and widespread usage of the Internet was the introduction of the World-Wide Web (WWW) by Tim Berners-Lee in 1989. The WWW provided ways of accessing information at rates and in amounts then unimagined, and quickly became the Internet "killer application". In May 1999, ten years after the advent of the WWW, Shawn Fanning introduced Napster, arguably the first modern P2P application. The Napster application and protocols were the first to allow users to share files among each other without the need for a central storage server. Very quickly, Napster became immensely popular, and the P2P revolution began. Since the advent of Napster, P2P systems have become widespread with the emergence of file-sharing applications such as Gnutella, KaZaA and eDonkey. These systems generated headlines across the globe when the Recording Industry Association of America (RIAA, http://www.riaa.com) and the Motion Picture Association of America (MPAA, http://www.mpaa.org) filed lawsuits against file-sharing users suspected of copyright infringement. The lawsuits are partly responsible for the embrace of the term P2P as an equivalent for illegal file-sharing. Fortunately, the concept of P2P networking is broader than that, and P2P systems have many useful and legal applications. The P2P paradigm is the logical antithesis of the CS paradigm, which has been the ostensibly predominant paradigm for IP-based networks since their inception. This is, however, only true to a certain degree, as the idea of sharing among equals has been imbued in the Internet since the early days of the network. Two examples supporting this statement are the e-mail system employed in the Internet and the Domain Name System (DNS).
Both protocols are so tightly connected to the inner workings of the Internet that it is impossible to imagine the degree of usage the Internet sees today in their absence. Once an e-mail has left the user's mail software, it is routed among mail transfer agents (MTAs), all acting as equally valued message forwarders. The DNS is the first distributed information database, and implements a hierarchical mapping scheme, which is comparable to the multi-layered P2P systems that have appeared in the last few years, such as Gnutella. The major difference between legacy P2P systems such as DNS and e-mail and new systems such as Gnutella, Napster and eDonkey is that the older systems work as part of the network core, while the new applications are typically application-layer protocols run by edge-node applications. This shift of the edge nodes from acting as service users to being both service providers and users is significantly changing the characteristics of network traffic.

This report describes a measurement infrastructure geared towards P2P network traffic collection and analysis. The current chapter has provided a brief introduction, which is followed by the motivation for the work described in the report. Chapter two continues with a discussion of P2P networks and related network types. Chapter three provides fairly detailed descriptions of the two protocols measured for the report, BitTorrent and Gnutella. Chapter four briefly explains various network measurement techniques and then presents the measurement infrastructure and software developed at Blekinge Institute of Technology (BTH). Chapters five and six present some of the results obtained from the network traffic measurements of the BitTorrent and Gnutella protocols, respectively.
1.1 Motivation

The research group at the Department of Telecommunication Systems at BTH has traditionally been involved in research areas related to Internet traffic measurements. Lately, P2P applications appear to be responsible for the predominant part of Internet traffic [1]. New P2P services are developed and released at a high pace. The number of users embracing new P2P technology is also increasing fast. It is therefore important to understand the impact of the new P2P services on the existing Internet infrastructure and on legacy applications. It was natural in this context for the research focus at BTH to turn towards P2P traffic. This can be seen as part of a broader effort to adapt new services to coexist with older ones, as well as to optimize the overall online user experience. We applied several constraints in selecting which P2P protocols to study: open protocol specifications, an open-source software implementation, a large user base, and access to discussion groups where system developers can be contacted. Open protocol specifications were essential because they mean that no reverse engineering or other guess-work is necessary to decode protocol messages, allowing the research to focus on measurements and analysis. Access to source code enabled us to understand how certain protocol features, not covered by the specifications, were implemented by various vendors. A large user base guaranteed that enough peers would be available for measurements. Access to system developers in discussion groups helped us sort out some of the more intricate details of the specifications and implementations. Both Gnutella and BitTorrent satisfied all these constraints. The P2P research at BTH has two goals. The first is to establish analytical traffic models that will offer new insights into the properties of P2P traffic. This will hopefully lead to more scalable P2P networks and more efficient and fair resource utilization.
The second goal is to research new P2P services that can be built on top of the existing P2P infrastructure, leveraging the advantages offered by P2P systems (e.g., efficient data dissemination and load balancing). In particular, our group is planning to perform research on issues related to routing in overlay networks: Quality of Service (QoS) guarantees, reliability, scalability, and multipath routing and transport [2]. Furthermore, we would like to investigate the applicability of our results in ad-hoc, grid and sensor networks.

Chapter 2

Peer-to-Peer Protocols

The concept of P2P protocols in relation to data communications is quite broad. Generally, it means that nodes engaged in mutual data exchange are capable of equivalent functionality. This is in contrast to pure CS protocols, where nodes may either serve or be served data. A more formal definition of P2P protocols is provided in Section 2.2.

2.1 P2P Evolution

The earliest recorded use of the term "peer-to-peer" occurred in 1984, in the context of IBM's Advanced Peer to Peer Networking (APPN) architecture [3]. This was the result of multiple enhancements to the Systems Network Architecture (SNA). Although early networking protocols such as NNTP and SMTP were working in a P2P fashion – indeed, the original ARPANET was designed as a P2P system – the term P2P did not become mainstream before the appearance of Napster in the fall of 1999. Napster was the first P2P service with the goal of providing users with easy means of finding music files (MP3s). The architecture of Napster was built around a central server that was used to index the music files shared by client nodes. This approach is called a centralized directory. The centralized directory allowed Napster to give a very rapid answer as to which hosts stored a particular file. The actual file transfer occurred directly between the host looking for the file and the host storing the file.
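The centralized directory can be sketched in a few lines of Python. This is a toy illustration of the architecture only; the class, the peer addresses and the file names are invented for the example and do not reflect Napster's actual protocol:

```python
# Toy sketch of a centralized directory: the server only indexes
# which peers share which files; the download itself is peer-to-peer.

class DirectoryServer:
    """Central index mapping file names to the peers sharing them."""

    def __init__(self):
        self.index = {}  # file name -> set of peer addresses

    def register(self, peer, files):
        # A peer announces the files it is willing to share.
        for name in files:
            self.index.setdefault(name, set()).add(peer)

    def lookup(self, name):
        # The server answers the query; transfers bypass it entirely.
        return sorted(self.index.get(name, set()))

server = DirectoryServer()
server.register("10.0.0.1:6699", ["song_a.mp3", "song_b.mp3"])
server.register("10.0.0.2:6699", ["song_b.mp3"])

print(server.lookup("song_b.mp3"))  # both peers share this file
print(server.lookup("song_c.mp3"))  # unknown file: empty result
```

The sketch also makes the fragility discussed below concrete: remove the single `DirectoryServer` instance and no peer can locate anything, even though all content remains on the peers.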
The success of Napster quickly became a source of serious concern for the major record companies, who rapidly filed a lawsuit against Napster on grounds of copyright infringement. The lawsuit made Napster immensely popular, attracting millions of additional users. However, Napster could not withstand the pressure of the lawsuit, and in July 2001 it was forced to shut down the central server. Without the central server, the client nodes could no longer search for files. Thus, the fragility of a centralized directory system became clear. Napster is one of the first generation P2P applications as defined by [3]. Following the advent of Napster, several other P2P applications emerged: similar in appearance, but altogether different beasts in detail. Gnutella [4], which was released by Justin Frankel of Winamp fame in early 2000, opted to implement a fully distributed system, with no central authority. The same year saw the emergence of the FreeNet system. FreeNet was the brainchild of Ian Clarke, who had written his Master's thesis on a distributed, anonymous and decentralized information storage and retrieval system. This system later became FreeNet. FreeNet's major difference to previous P2P systems was the complete anonymity that it offered users. The fully distributed architecture was resilient to node failures and was also immune to service disruptions of the type experienced by Napster. However, experience with Gnutella showed that fully distributed P2P systems may lead to scalability problems due to massive amounts of signaling traffic [5]. By late 2000 and early 2001, the P2P boom had started in earnest, and applications such as KaZaA, DirectConnect, SoulSeek and eDonkey started appearing. These systems usually provided some form of community-like features, such as chat rooms and forums, in addition to the file-sharing services provided by previous systems.
KaZaA, which uses the FastTrack protocol, introduced the concept of supernodes in order to solve scalability problems similar to those experienced by Gnutella. Each supernode manages a number of regular nodes and exchanges information about them with other supernodes. Regular nodes upload file lists and search requests to their supernode. The search requests are processed solely among supernodes. Regular peers establish direct HyperText Transfer Protocol (HTTP) connections in order to download files. Gnutella resolved the scalability problem in a similar way; in Gnutella, supernodes are called ultrapeers. The last few years have seen the evolution of the old systems to better utilize network resources, and also the emergence of new systems with a specific focus on efficient bandwidth utilization. The prime example of these is the BitTorrent system. Also, new systems tend to focus on using Distributed Hash Tables (DHTs). DHTs force network topology and data storage to follow specific mathematical structures in order to optimize various parameters (e.g., to minimize delay or number of hops). They are seen as a promising alternative to the flooding algorithms required by routing in unstructured P2P networks.

2.2 P2P Definitions

There is no clear consensus regarding an exact definition of a P2P system. Schollmeier makes an attempt in [6] to define a P2P network. In general, the notion of a P2P network appears to be leaning towards some form of utilization of edge node resources by other edge node resources. The resource in question is commonly accepted to be files, and much research is being done on efficient localisation and placement of files. There also seems to be some consensus regarding the idea of pure and hybrid systems. A P2P network is defined in [6] as a network in which the service offered by the system is provided by the participating nodes, which share part of some local resource pool, such as disk space, files, CPU processing time, etc.
A pure P2P network is one in which any given participant may be removed without the system experiencing loss of service. Examples of this type of network are Gnutella, FastTrack and FreeNet. A hybrid P2P network is one in which a central authority of some sort is necessary for the system to function properly. Note that, in contrast to the CS model, the central authority in a hybrid network rarely shares resources – this functionality is still provided by the participating peers. The central authority is commonly an indexing server for files, or provides a peer localisation service. Examples of this type of network are Napster, eDonkey and DirectConnect. It is also possible to take a resource view of the two types of P2P networks described above. We consider the functions of content insertion, distribution and control, and how they are performed in P2P and CS networks. We summarize these in Table 2.1.

Insertion

Insertion is the function of adding content to the resource pool of a network. We refer here to insertion in the sense of providing the content, so that in both pure and hybrid networks content is inserted by the participating peers. This is analogous to the peers sharing content. In a CS system, however, content is always provided by the server, and thus also "shared" by the server.

Distribution

Distribution is the function of retrieving content from the resource pool of a network. Again, P2P systems lack central content localization, thus content is disseminated in a distributed fashion. This does not necessarily mean that parts of the same content are retrieved from different sources, i.e. swarming, but rather that the parts, e.g. files, of the total resource pool are retrieved from different sources.
By hybrid CS systems, we here refer to Content Delivery Networks (CDNs), such as Akamai [7], and redundant server systems in which several servers provide the same content but are accessed by clients from a single Uniform Resource Locator (URL). This is a very common model for Web servers in the Internet today.

Control

Control is the function of managing the resource pool of a network, such as admission control, resource localization, etc. This is the primary function that separates the two types of P2P networks. The peers participating in fully distributed networks are required to assist in the control mechanisms of the network, while the hybrid systems may rely on a central authority for this. Of course, the clients in CS systems have no responsibility towards the network control functionality.

Table 2.1: P2P and CS content models.

               Pure P2P     Hybrid P2P   Hybrid CS            Pure CS
Insertion      Distributed  Distributed  Central              Central
Distribution   Distributed  Distributed  Central/Distributed  Central
Control        Distributed  Central      Central              Central

In addition to the definitions above, [3] also classifies P2P systems according to their "generation". In this classification scheme, hybrid systems such as Napster are considered first generation systems, while fully decentralized systems such as FastTrack and Gnutella are second generation systems. A third generation is discussed as an improvement upon the first two with respect to features such as redundancy, reliability or anonymity.

2.3 Distributed Hash Tables

Hash tables are data structures used for quick lookups of information (data). A hash record is defined as a key with the corresponding value. Hash keys are typically numerical values or strings, while the hash values are indexes into an array and are therefore usually numerical. The array itself is called the hash table. A hash function operates on a given hash key, producing a corresponding unique hash value (index) that points to a location in the hash table.
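As a toy illustration, the following sketch maps string keys to array indexes. The hash function and table size are arbitrary choices made for the example; real hash functions are designed to spread keys far more evenly. The same mapping idea reappears in DHTs, discussed next, where the index identifies a node in the network rather than a slot in a local array:

```python
# Toy hash table: a key is hashed to an index into a fixed-size array.
TABLE_SIZE = 8

def toy_hash(key: str) -> int:
    # Sum of character codes modulo the table size; purely illustrative.
    return sum(ord(c) for c in key) % TABLE_SIZE

table = [None] * TABLE_SIZE
for key, value in [("alice.mp3", "host A"), ("bob.mp3", "host B")]:
    table[toy_hash(key)] = (key, value)  # store record at the hashed index

print(toy_hash("alice.mp3"), table[toy_hash("alice.mp3")])
# → 4 ('alice.mp3', 'host A')
```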
This is the location where the data is stored, or the best place to start a search for it. The quality of a hash table and its hash function is related to the probability of collisions. A collision happens when two or more keys point to the same location in the hash table. This problem can be solved by enlarging and rearranging the hash table, but that will in general lead to severe performance degradation. DHTs are hash tables spread across many nodes in a network. Each node participating in a DHT is responsible for a subset of the DHT keys. When the hash function is given a key, it will produce a hash value that identifies the node responsible for that particular key. In general, DHTs require the use of a structured overlay network. This means that the network topology must follow a particular mathematical (geometric) structure (e.g., a circle as in Chord [8] or a hypercube as in CAN [9]). The structure is typically mapped onto some interval equivalent to ℝ or ℝⁿ. Each key gets a well-defined position (coordinates) in the specific space interval. A host that joins the DHT is allocated a position in the same space interval. The host will then "own" the coordinates in some neighbourhood of its position. This is how keys are distributed to hosts. Search algorithms can then exploit the properties of the structure to quickly locate which host owns which key. This is called DHT routing or key-based routing. For example, CAN uses a d-dimensional Cartesian coordinate space, and its routing exploits this property by choosing the smallest Cartesian distance to the destination. A good comparison of the tradeoffs between different DHT geometries is presented in [10]. Although DHTs appear to be a very efficient form of information dissemination, storage and lookup, they also suffer from a number of problems [11]:

Scalability: The cost to maintain the integrity of the DHT when nodes join or leave the overlay is rather high.
Load balancing: Some keys are much more popular than others. Hosts responsible for these keys must handle a disproportionally high volume of traffic compared to other nodes. This phenomenon is referred to as "hot spots".

Search flexibility: DHTs cannot handle keyword searches efficiently, at least not as efficiently as exact searches.

Heterogeneity adaptation: In a heterogeneous environment, nodes have different storage, memory, transmission and processing capabilities. Since the topology in a DHT must adapt to a specific structure, it usually does not have enough flexibility to take these heterogeneity factors into account.

Current research is addressing these problems [12].

2.4 P2P and Ad-Hoc Networks

During the late 1990s, cheap wireless network cards for portable computers became widely available, with 802.11 emerging as the de-facto standard. They were used mostly in office-like environments together with one or more wireless base stations, also called Access Points (APs). The base stations offer services such as frequency allocation, authentication and authorization. The maximum distance from the AP at which radio communication with it is still possible defines the (radio) cell radius. In the absence of repeaters or high-performance antennas, the wireless cards have a limited range on the order of a few hundred meters. In order to communicate with wireless units that are outside this range, a unit forwards the data to the nearest AP which, using either the wired or wireless network, further forwards the data to the destination. The type of wireless network described above is called a Wireless LAN (WLAN) in infrastructure mode. When only one AP is used, the units' movement is confined to the cell radius. This setup is called a Basic Service Set (BSS). Sometimes it is desirable to cover a larger area, such as a whole building or campus. This can be done by using several APs that perform handover when wireless units move between them.
In this case, the set of participating APs and wireless units is called an Extended Service Set (ESS). The ESS mode hides implementation details (e.g., communication between base stations) from the network layer, presenting it with what appears to be a regular link layer. Thus, the handovers are transparent to the network layer [13]. Infrastructure mode may not be an option for certain types of environments, such as geographically isolated sites, disaster areas or dynamic battlefields. In this type of scenario, the wireless units must be able to communicate with each other in the absence of base stations. Two units that are outside radio range from each other and wish to exchange data will use intermediate units (within radio range) to route the data to the destination. This type of communication mode is called ad-hoc mode, and the network is called a mobile ad-hoc network (MANET). In essence, each wireless host in the ad-hoc WLAN acts as a router for the other hosts. There is currently a large variety of ad-hoc routing protocols. A complete description is outside the scope of this report, but the interested reader may start with [14]. The important point for this discussion is that each wireless unit supplies and demands the same type of services from the other WLAN units. This is a typical form of P2P computing according to the P2P definition in [6].

2.5 P2P and File Sharing

File sharing is almost as old as operating systems themselves. Early solutions include the UNIX remote copy (rcp) command and the File Transfer Protocol (FTP). They were quickly followed by full-fledged network file systems such as NFS and SAMBA. Common to these protocols (with the exception of rcp) is that they were shaped around the CS paradigm, with the servers being the entities storing and serving files. A client that wants to share files must upload them to a server to make them available to other clients. Instant messaging systems such as ICQ [15], Yahoo!
Messenger [16] and Microsoft Messenger [17] attempted to solve this problem by implementing a mechanism similar to rcp. Users could thus share files with each other without having to store them on a central server. In fact, this was the first form of P2P file-sharing. Napster further extended this idea by implementing efficient file search facilities. In the public eye, P2P is synonymous with file sharing. While other applications that may be termed P2P, e.g. the SETI@home project [18], distributed.net [19] and ZetaGrid [20], have been rather successful in attracting a user base, no other type of service comes close to attracting the number of users that file-sharing services have. Services such as those mentioned here are examples of altruistic systems, in the sense that the participating peers provide CPU processing power and time to a common resource pool that is then used to perform various complex calculations, such as calculating fast Fourier transforms of galactic radio data, code-breaking, or finding roots of the Riemann zeta function. One of the reasons for the difference in number of users could be that the incentive to altruistically share resources, without gaining anything other than some virtual fame or the feel-good points of having contributed to the greater good of humanity, seems to be low. Most file-sharing P2P systems employ some form of admission scheme in which peers are not allowed to join the system or download from it unless they are sharing an adequate amount of files. This provides a dual incentive: first, a peer wanting to join the network must¹ provide a sort of entry token in the form of shared files, and second, peers joining the system know that there is a certain amount of content provided to them once they join.

¹ Not in all systems, but in most hybrid systems.
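The entry-token idea can be sketched as a simple admission check. The thresholds and the accounting below are invented for illustration; real systems use more elaborate schemes, such as upload/download ratios or the rating systems mentioned later:

```python
# Toy admission check: a peer may join only if it shares "enough" content.
# Both thresholds are illustrative and not taken from any real system.
MIN_SHARED_FILES = 10
MIN_SHARED_BYTES = 50 * 2**20  # 50 MiB

def may_join(shared_files: int, shared_bytes: int) -> bool:
    """Admit a peer only if it contributes a minimum amount of content."""
    return (shared_files >= MIN_SHARED_FILES
            and shared_bytes >= MIN_SHARED_BYTES)

print(may_join(25, 200 * 2**20))  # True: enough files and data shared
print(may_join(3, 500 * 2**20))   # False: too few shared files
```

A check based purely on counts and sizes is also what makes the "junk file" problem discussed below possible: the check cannot tell desirable content from filler.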
The BitTorrent P2P system is one of the most prominent networks in enforcing incentives, though admission control in a BitTorrent network is typically handled through web-based registration systems. As not all files are equally desirable in every system, files not belonging to the general category of files handled in a specific P2P network should not be allowed in. For instance, a network such as Napster, which only managed digital music files, might not be interested in peers sharing text files. For systems that require a large amount of file data to be shared as an admission scheme, this becomes a problem. Peers may share "junk files" just to gain access to the network. Junk files are files that are not really requested or desired in the network. This practice is usually frowned upon, but is hard to come to grips with. Some systems, such as eDonkey, have implemented a rating system in which peers are punished for sharing junk files. Similar to junk files, there are also "fakes" or "decoys". Fakes are files inserted into the network that masquerade under a filename that does not represent the actual content, or files that contain modified versions of the same content. By adding fakes into the network, the real content is made more difficult to find. This problem is alleviated by using various hashing techniques on the files, instead of relying only on the filenames to identify the content. An example of this is the insertion of a faked Madonna single, in which the artist had overlaid a phrase on top of her newly released single. While file sharing in and of itself is not an illegal technology and has several non-copyright-infringing uses, the ease with which peers may share copyrighted material has drawn the attention of the MPAA (Motion Picture Association of America) and RIAA (Recording Industry Association of America).
These organizations are of the view that the sharing of material under the copyrights of their members is seriously harming their revenue streams by decreasing sales. In 2004, the MPAA and RIAA started suing individuals for sharing copyrighted material. However, not all copyright holders and artists agree on this course of action, nor do they agree on the detrimental effect file sharing has on sales or artistic expression. Several smaller record labels have embraced the distribution of samples of their artists' music online, and artists have formed coalitions against what they feel is the oppressive behaviour of the larger record labels.

More recently, P2P systems have been employed by corporations to distribute large files such as Linux distributions, game demos and patches. Many companies make use of the BitTorrent system for this, as it provides for substantial savings in bandwidth costs.

2.6 P2P and the Grid

The goal of distributed computing is to provide a cost-efficient alternative to expensive supercomputers by harnessing the collective processing power of a group of general-purpose workstations. This approach led to computer clusters such as Beowulf [21]. Computer clusters typically involve tens or hundreds of computers owned by a company or organization, which are interconnected in a LAN configuration. With the growing popularity of the Internet, scientists and engineers have been looking at ways of distributing computations to Internet-connected computers with free computing cycles to share. This approach typically involves several hundred thousand computers spread across the world and is referred to as Internet computing. It differs from clusters in two ways: the number of hosts participating and the geographic spread.

Internet computing is part of a greater effort called Grid computing. Grid computing takes on a holistic view, focusing not only on distributing computations but also catering for resource allocation, job scheduling, security and collaboration between users and organizations. It does that by providing open architecture and protocol specifications and by defining services to be provided by grid members. Examples of Grid computing efforts include the Globus Toolkit [22, 23] and BOINC [24].

P2P is related to Grid and Internet computing through the direct exchange of resources and services that takes place between peers. The above-mentioned ZetaGrid and SETI@home could be viewed as a type of "proto-grid"², and the distributed screensaver ElectricSheep [25] is a more light-hearted variant of the "proto-grid" systems. The main difference between the Grid and P2P is that "Grid computing addresses infrastructure but not yet failure, whereas P2P addresses failure but not yet infrastructure" [26]. For a good comparison of P2P and Grid computing see [26] and [27].

2.7 P2P and Sensor Networks

The term sensor network refers to a wireless communication network consisting of a rather large number (hundreds to thousands) of small-sized, densely deployed, electronic devices (sensors) that perform measurements in their immediate vicinity and transfer the results to a computation center in a hop-by-hop fashion [28]. The geographic area covered by a sensor network is called a sensor field.

Sensor networks are similar to ad-hoc networks with reference to data routing. Both types of networks require each node to actively participate in routing decisions (and data forwarding) in order to allow for communications across distances larger than the radio range of a single node. The main difference is that sensors have very stringent requirements for low-power and low-computation operation. The low-power requirement generally means that nodes have a smaller radio range than in ad-hoc networks. This is the reason why they are densely deployed.
The low-computation requirement means that they typically use CPUs with limited capabilities in order to conserve energy (i.e., battery power). This means that they often employ special-purpose routing protocols that require less computation.

A major difference between sensor networks and other types of networks (e.g., ad-hoc, grid and wired) is that sensor networks are data-centric as opposed to node-centric. For example, in a data-centric network the user or the application will ask which node (e.g., sensor) is exceeding a specific temperature. In contrast, in a node-centric network the user or application would rather ask what the temperature of a specific node is [29]. A data-centric approach will typically require attribute-based naming. An attribute is a name for a particular type of data. Lists of attributes define the type of data a node is interested in. For example, the attribute list "grid=50, temperature>30C" asks for data from grid 50 from sensors that measure a temperature exceeding 30 degrees Celsius.

The data-centric approach, coupled with the low-power, low-computation operation requirements and the number of sensors in a sensor field, makes it impractical to monitor each node at all times. Therefore, the focus is instead on self-organizing sensor networks that cluster nodes in a manner that facilitates local coordination in pursuit of global goals [29]. This led to research into localized algorithms for coordination. Currently, directed diffusion appears to be a very strong candidate. In directed diffusion, nodes establish gradients of interest for specific data. The gradients lead to the natural formation of routes between data providers (sources) and data consumers (sinks). Details on directed diffusion are found in [30].
In order to minimize the volume of data transfers, each node in a sensor network performs data fusion (data aggregation). Data fusion works by combining data described by a specific attribute, which was received from different nodes. Data-centric routing may lead to problems related to implosion and overlap deficiencies [31]. The implosion problem is caused by multiple nodes sending the exact same data to their neighbours. The overlap problem is similar to implosion and appears when nodes covering different geographical areas send overlapping data. Data fusion attempts to solve both problems.

P2P networks and sensor networks share similarities with ad-hoc routing. Further, P2P networks are becoming more data-centric, in particular when they use DHTs. For example, in a DHT-based file sharing application the users are interested in which nodes host a specific file rather than being interested in what files a specific node is hosting.

² The term proto-grid is used to denote a first-generation grid that may have left out some of the requirements for a true grid implementation.

Chapter 3

Protocol Descriptions

3.1 BitTorrent

BitTorrent is a P2P protocol for content distribution and replication designed to quickly, efficiently and fairly replicate data [32, 33]. The BitTorrent system may be viewed as comprising two protocols and a set of resource metadata. The two protocols are used for communication among peers, and for the communication with a central network entity called the tracker. The metadata provides all the information needed for a peer to join a BitTorrent distribution swarm and to verify correct reception of a resource. We use the following terminology for the rest of the report: a BitTorrent swarm refers to all the network entities partaking in the distribution of a specific resource.
When we refer to the BitTorrent protocol or protocol in singular, we mean the peer–peer protocol, while explicitly referring to the tracker protocol for the peer–tracker communication. The collection of protocols (peer, tracker and metadata) is referred to as the BitTorrent protocol suite or protocol suite.

In contrast to many other P2P protocols, such as eDonkey [34], DirectConnect [35] and KaZaA [36], the BitTorrent protocol suite does not provide any resource query or lookup functionality. Nor does it provide any chat or messaging facilities. The protocols rather focus on fair and effective replication and distribution of data. The signaling is geared towards an efficient dissemination of data only. Fairness in the BitTorrent system is implemented by enforcing tit-for-tat exchange of content between peers. Non-uploading peers are only allowed to download very small amounts of data, making the download of a complete resource very time consuming if a peer does not share downloaded parts of the resource.

With one exception, the protocols operate over the Transmission Control Protocol (TCP) and use swarming, i.e., peers download parts, so-called pieces, of the content from several peers simultaneously. The rationale for this is that it is more efficient in terms of network load, as the load is shared across links between peers. This results in a more evenly distributed network utilization than in conventional CS distribution systems such as, e.g., FTP or HTTP.

The size of the pieces is fixed on a per-resource basis and cannot be changed. The default piece size is 2^18 bytes (256 KiB). The selection of an appropriate piece size is a fairly important issue. If the piece size is small, re-downloading a failed piece is fast, while the amount of extra data needed to describe all the data in the resource grows. Larger piece sizes mean less metadata, but longer re-download times.
3.1.1 BitTorrent Encoding

BitTorrent uses a simple encoding scheme for most of its protocol messages and associated data. This encoding scheme is known as bencoding. The scheme allows for data structuring and type definition, and currently supports four data types: strings, integers, lists and dictionaries.

strings Strings are encoded length-prefixed. The length should be given in base ten and ASCII coded. The length should be followed by a colon, immediately followed by the specified number of characters as string data. Note that the string encoding does not necessarily mean that the string data are humanly readable, i.e., in the printable ASCII range. Strings may carry any valid 8-bit value, and are commonly used to carry binary data. Example: 3:BTH encodes the string "BTH".

integers Integers are encoded by enclosing a base ten ASCII coded numerical string in i and e. Negative numbers are accepted, but leading zeroes are not, except for the value 0 itself. Example: i23e encodes the integer 23.

lists Lists are encoded by enclosing any valid bencoded types, including other lists, in l and e. More than one type is allowed. Example: l3:agei30ee encodes a list containing the string "age" and the integer 30.

dictionaries Dictionaries are encoded by enclosing (key, value) pairs in d and e. The keys must be bencoded strings and the values may be any valid bencoded type, including other dictionaries. Example: d3:agei30e4:name5:james5:likesl4:food5:drinkee encodes the structure:

age: 30
name: james
likes: {food, drink}

3.1.2 Resource Metadata

A peer interested in downloading some content by using BitTorrent must first obtain a set of metadata, the so-called torrent file, to be able to join the set of peers engaging in the distribution of the specific content.
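Since the torrent file is just such a bencoded dictionary, the encoding rules described in Section 3.1.1 fit in a few lines of Python. The sketch below is illustrative and not taken from any client; note that canonical torrent files additionally require dictionary keys to appear in sorted order, which the sketch enforces.

```python
def bencode(value):
    """Encode a Python value using BitTorrent's bencoding rules."""
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, str):
        return bencode(value.encode())
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys must be strings; canonical bencoding sorts them.
        items = sorted((k.encode() if isinstance(k, str) else k, v)
                       for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value)!r}")
```

For example, `bencode(["age", 30])` yields `b"l3:agei30ee"`, matching the list example above.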
The metadata needed to join a BitTorrent swarm consists of the network address information (in BitTorrent terminology called the announce URL) of the tracker and resource information such as file and piece size. The torrent file itself is a bencoded version of the associated meta information.

An important part of the resource information is a set of Secure Hash Algorithm One (SHA-1) [37, 38] hash values¹, each value corresponding to a specific piece of the resource. These hash values are used to verify the correct reception of a piece. When rejoining a swarm, the client must recalculate the hash for each downloaded piece. This is a very intensive operation with regard to both CPU usage and disk I/O, which has resulted in certain alternative BitTorrent clients storing information regarding which pieces have been successfully downloaded within a specific field in the torrent file.

A separate SHA-1 hash value, the info field, is also included in the metadata. This value is used as an identification of the current swarm, and the hash value appears in both the tracker and peer protocols. The value is obtained by hashing the bencoded info dictionary of the metadata. Of course, if a third-party client has added extra fields to the torrent file that may change intermittently, such as the resume data mentioned above, these fall outside the info dictionary and are not taken into account when calculating the info-field hash value. The metadata as defined by the original BitTorrent design does not contain any information regarding the peers participating in a swarm, though this information is added by some alternative clients to lessen the strain on trackers when rejoining a swarm.

¹ These are also known as message digests.

3.1.3 Network Entities and Protocols

A BitTorrent swarm is composed of peers and at least one tracker. The peers are responsible for content distribution among each other.
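The per-piece verification described in Section 3.1.2 can be produced and checked with a short sketch. The helper names are ours, but the "pieces" field of a real torrent file is exactly such a concatenation of 20-byte SHA-1 digests, one per piece.

```python
import hashlib

PIECE_SIZE = 2 ** 18  # default piece size: 256 KiB

def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> bytes:
    """Concatenated SHA-1 digests, one 20-byte digest per piece,
    as stored in the torrent file's resource metadata."""
    digests = []
    for off in range(0, len(data), piece_size):
        digests.append(hashlib.sha1(data[off:off + piece_size]).digest())
    return b"".join(digests)

def verify_piece(piece: bytes, index: int, hashes: bytes) -> bool:
    """Check a downloaded piece against its stored SHA-1 digest."""
    expected = hashes[index * 20:(index + 1) * 20]
    return hashlib.sha1(piece).digest() == expected
```

This is the computation a client must repeat for every downloaded piece when rejoining a swarm, which explains the CPU and disk I/O cost noted above.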
Peers locate other peers by communicating with the tracker, which keeps peer lists for each swarm. A swarm may continue to function even after the loss of the tracker, but no new peers are able to join. To be functional, the swarm initially needs at least one connected peer that has the entire content. Such peers are denominated seeds, while peers that do not have the entire content, i.e., downloading peers, are denominated leechers.

The BitTorrent protocols (apart from the metadata distribution protocol) are the tracker protocol and the peer protocol. The tracker protocol is either an HTTP-based protocol or a UDP-based compact protocol, while the peer protocol is a BitTorrent-specific binary protocol. Peer-to-tracker communication usually takes place using HTTP, with peers issuing HTTP GET requests and the tracker returning the results of the query in the corresponding HTTP response. The purpose of the peer request to the tracker is to locate other peers in the distribution swarm and to allow the tracker to record simple statistics of the swarm. The peer sends a request containing information about itself and some basic statistics to the tracker, which responds with a randomly selected subset of all peers engaged in the swarm.

The Peer Protocol

The peer protocol, also known as the peer wire protocol, operates over TCP and uses in-band signaling. Signaling and data transfer are done in the form of a continuous bi-directional stream of length-prefixed protocol messages over a common TCP byte stream. A BitTorrent session is equivalent to a TCP session, and there are no protocol entities for tearing down a BitTorrent session beyond the TCP teardown itself. Connections between peers are single TCP sessions, carrying both data and signaling traffic. Once a TCP connection between two peers is established, the initiating peer sends a handshake message containing the peer id and info field hash (Figure 3.1).
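The exact byte layouts are not reproduced in the text; the following sketch is based on the official wire format [41], in which the handshake is a fixed 68-byte message and every subsequent message carries a four-byte big-endian length prefix. The helper names are ours, not from any client.

```python
import struct

PROTOCOL = b"BitTorrent protocol"

# Peer wire message IDs; a keep-alive has no ID at all.
CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED = 0, 1, 2, 3
HAVE, BITFIELD, REQUEST, PIECE, CANCEL = 4, 5, 6, 7, 8

def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Serialize the 68-byte handshake:
    <pstrlen><pstr><8 reserved bytes><info_hash><peer_id>."""
    assert len(info_hash) == 20 and len(peer_id) == 20
    return bytes([len(PROTOCOL)]) + PROTOCOL + b"\x00" * 8 + info_hash + peer_id

def pack_message(msg_id=None, payload=b"") -> bytes:
    """Frame a message as <4-byte big-endian length><id><payload>;
    called without arguments it produces a keep-alive (length 0)."""
    if msg_id is None:
        return struct.pack(">I", 0)
    body = bytes([msg_id]) + payload
    return struct.pack(">I", len(body)) + body

def pack_request(index: int, begin: int, length: int) -> bytes:
    """request message: piece index, offset within the piece, block length."""
    return pack_message(REQUEST, struct.pack(">III", index, begin, length))
```

The length prefix is what allows both signaling and data to share a single TCP byte stream, as described above.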
If the receiving peer replies with the corresponding information, the BitTorrent session is considered to be opened and the peers start exchanging messages across the TCP streams. Otherwise, the TCP connection is closed. Immediately following the handshake procedure, each peer sends information about the pieces of the resource it possesses. This is done only once, and only by using the first message after the handshake. The information is sent in a bitfield message, consisting of a stream of bits, with each bit index corresponding to a piece index.

Figure 3.1: BitTorrent handshake procedure.

The BitTorrent peer wire protocol has the following protocol messages:

piece The only payload-related protocol message. The message contains one sub-piece.

request The request message is the method a peer wishing to download uses to notify the sending peer which sub-pieces are desired.

cancel If a peer has previously sent a request message, this message may be used to withdraw the request before it has been serviced. Mostly used during end-game mode².

interested This message is sent by a peer to another peer to notify it that the former intends to download some data. See Section 3.1.4 for a description of this and the following three messages.

not interested This is the negation of the previous message. It is sent when a peer no longer wants to download.

choke This message is sent by a data-transmitting peer to notify the receiving peer that it will no longer be allowed to download.

unchoke The negation of the previous message. Sent by a transmitting peer to a peer that has previously sent an interested message to the former.

have After a completed download, the peer sends this message to all its connected peers to notify them of which parts of the data are available from the peer.
bitfield Only sent during the initial BitTorrent handshake, and is then exchanged between the connecting peers. Contains a bitfield indicating which pieces the peer has.

keepalive Empty message, used to keep a connection alive.

² End-game mode occurs when a peer has only very few pieces left to download. The peer requests these pieces from all connected peers, downloads from whoever answers the quickest, and cancels the rest of the requests.

The Tracker Protocol

The tracker is accessed by HTTP or HTTPS GET requests. The default listening port is 6969. The tracker address, port and top-level directory are specified in the announce url field in the torrent file for a specific swarm.

Tracker queries

Tracker queries are encoded as part of the GET URL, in which binary data such as the info hash and peer id are escaped as described in RFC 1738 [39]. The query is added to the base URL by appending a question mark, ?, as described in RFC 2396 [40]. The query itself is a sequence of parameter=value pairs, separated by ampersands, &, and possibly escaped. An example request is given in Figure 3.2.

GET /announce?info_hash=n%05hV%A9%BA%20%FC%29%12%1Ap%D4%12%5D%E6U%0A%85%E1&\
peer_id=M3-4-2--d0241ecc3a07&port=6881&key=0fcca260&uploaded=0&downloaded=0&\
left=663459840&compact=1&event=started HTTP/1.0

Figure 3.2: Example announce GET request.

Each announce request must include the following parameters:

info_hash The SHA-1 hash of the value contained in the info field in the torrent file.

peer_id A 20-byte string uniquely identifying the requesting peer. There is no consensus regarding the generation of this value, but several distinct types of ID generation have appeared that may be used to identify which client a peer is running. There is some disagreement between the official protocol description [41] and the Wiki [33]: the original specification states that this field most likely will have to be URL escaped, while the other claims that it must not be escaped.
port The listening port of the client. The default port range for the reference client is 6881–6889. Each active swarm needs a separate port in the default client, but third-party clients have implemented single-port functionality.

uploaded The total number of bytes uploaded to all peers in the swarm, encoded in base ten ASCII. The specification does not state whether this takes re-transmits into account or not.

downloaded The total number of bytes downloaded from all peers in the swarm, encoded in base ten ASCII. The specification does not state whether this takes re-transmits into account or not.

left The total number of bytes left to download, also encoded in base ten ASCII.

The following parameters may optionally be included:

compact If set to 1, the tracker response will not be a proper bencoded datum as described below, but rather a binary list of peer addresses and ports. This list is encoded as a six-byte datum for each peer, in which the first four bytes are the IP address of the peer and the last two bytes are the peer's listening port. This saves quite a bit of bandwidth, but is only usable in an IPv4 environment.

numwant Specifies the number of peers that the requesting peer is requesting from the tracker.

event May be one of:

started The first request to the tracker must include this parameter–value pair.

stopped If shutting down, this should be specified to indicate graceful shutdown.

completed Included to notify the tracker once a download is complete; should not be included when joining a swarm with the full content.

key Used as a session identifier.

Tracker replies

The tracker HTTP response, unless the compact parameter is 1, is a bencoded dictionary with the following fields:

interval Indicates the number of seconds between subsequent requests to the tracker.

complete Number of seeds in the swarm.

incomplete Number of leechers in the swarm.

peers Contains a list of dictionaries.
Each dictionary in this list has the following keys:

peer id The peer id parameter that the peer has reported to the tracker.

ip IP address or DNS name of the peer.

port Listening port of the peer.

If the request fails for some reason, the dictionary only contains a failure reason key, whose value is a string indicating the reason for the failed request.

Tracker UDP protocol extension

To lower the bandwidth usage of heavily loaded trackers, a UDP-based tracker protocol has been proposed [42]. The UDP tracker protocol is not part of the official BitTorrent specification, but has been implemented in some of the third-party clients and trackers. Compared to the standard HTTP-based protocol, the UDP protocol uses about 50% less bandwidth. It also has the advantage of being stateless, as opposed to the stateful TCP connections required by the HTTP scheme, which means that a tracker is less likely to run out of resources due to things like half-open TCP connections.

The scrape convention

Web scraping is the name for the procedure of parsing a web page to extract information from it. In BitTorrent, trackers are at liberty to implement functionality that allows peers to request information regarding a specific swarm without resorting to error-prone web-scraping techniques. If the last name in the announce url, i.e., the name after the last /-character, is announce, then the tracker supports scraping by using the announce url with the name announce replaced by scrape. The scrape request may contain an info hash parameter, as shown in Figure 3.3, or be completely without parameters.

GET /scrape?info_hash=n%05hV%A9%BA%20%FC)%12%1Ap%D4%12%5D%E6U%0A%85%E1 HTTP/1.0

Figure 3.3: Example scrape GET request.

The tracker will respond with a bencoded dictionary containing information about all files that the tracker is currently tracking.
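The tracker-side conventions described so far (announce query construction, the compact peer list and the scrape convention) can be sketched as follows. The helper names and the tracker host are illustrative, but the encodings follow the rules just described: binary parameters are percent-escaped, and each compact-format peer is four IPv4 address bytes followed by a two-byte big-endian port.

```python
import socket
import struct
from urllib.parse import urlencode

def announce_url(base, info_hash, peer_id, port, uploaded=0, downloaded=0,
                 left=0, event=None, compact=1):
    """Build an announce GET URL; urlencode percent-escapes the raw
    20-byte info_hash and peer_id values as required by RFC 1738."""
    params = {"info_hash": info_hash, "peer_id": peer_id, "port": port,
              "uploaded": uploaded, "downloaded": downloaded,
              "left": left, "compact": compact}
    if event:
        params["event"] = event
    return base + "?" + urlencode(params)

def scrape_url(announce: str):
    """Apply the scrape convention: replace the last path component
    'announce' with 'scrape'; None if the tracker cannot be scraped."""
    head, _, last = announce.rpartition("/")
    if not last.startswith("announce"):
        return None
    return head + "/scrape" + last[len("announce"):]

def parse_compact_peers(blob: bytes):
    """Decode a compact tracker response: six bytes per peer, four for
    the IPv4 address and two for the big-endian listening port."""
    return [(socket.inet_ntoa(blob[i:i + 4]),
             struct.unpack(">H", blob[i + 4:i + 6])[0])
            for i in range(0, len(blob), 6)]
```

For instance, `scrape_url("http://tracker.example.com/announce")` yields `"http://tracker.example.com/scrape"`.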
The dictionary has a single key, files, whose value is another dictionary whose keys are the 20-byte binary info hash values of the torrents on the specific tracker. Each value of these keys contains another dictionary with the following fields:

complete Number of seeds in the swarm.

downloaded Number of registered completed events for the swarm.

incomplete Number of leechers in the swarm.

name This optional field contains the name of the file as defined in the name field in the torrent file.

3.1.4 Peer States

A peer maintains two states for each peer relationship. These states are known as the interested and choked states. The interested state is imposed by the requesting peer on the serving peer, while for the choked state the opposite is true. If a peer is being choked, it will not be sent any data by the serving peer until unchoking occurs. Thus, unchoking is usually equivalent to uploading.

The interested state indicates whether other peers have parts of the sought content. Interest should be expressed explicitly, as should lack of interest. That means that a peer wishing to download notifies the sending peer (where the sought data is) by sending an interested message, and as soon as the peer no longer needs any other data, a not interested message is issued. Similarly, for a peer to be allowed to download, it must have received an unchoke message from the sending peer. Once a peer receives a choke message, it will no longer be allowed to download. This allows the sending peer to keep track of the peers that are likely to immediately start downloading when unchoked. A new connection starts out choked and not interested, and a peer with all data, i.e., a seed, is never interested.

In addition to the two states described above, some clients add a third state – the snubbed state.
A peer relationship enters this state when a peer purports that it is going to send a specific sub-piece, but fails to do so before a timeout occurs (typically 60 seconds). The local peer then considers itself snubbed by the non-cooperating peer, and will no longer consider sub-pieces requested from this peer to be requested at all.

3.1.5 Sharing Fairness and Bootstrapping

The choke/unchoke and interested/not interested mechanisms provide fairness in the BitTorrent protocol. As it is the transmitting peer that decides whether to allow a download or not, peers not sharing content will be reciprocated in the same manner. To allow peers that have no content to join the swarm and start sharing, a mechanism called optimistic unchoking is employed. Optimistic unchoking means that, from time to time, a peer with content will allow even a non-sharing peer to download. This will allow the peer to share the small portions of data received and thus enter into a data exchange with other peers.

Figure 3.4: BitTorrent piece request and transfer procedure.

This means that while sharing resources is not strictly enforced, it is strongly encouraged. It also means that peers that have not been able to configure their firewalls and/or Network Address Translation (NAT) routers properly will only be able to download the pieces altruistically shared by peers through the optimistic unchoking scheme.

3.1.6 Data Transfer

Data transfer is done in parts of a piece (called sub-pieces, blocks or chunks) at a time, by issuing a request message. Sub-piece sizes are typically 16384 or 32768 bytes. To allow TCP to increase throughput, several requests are usually sent back-to-back. Each request should result in the corresponding sub-piece being transmitted.
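The download eligibility rules of Sections 3.1.4 and 3.1.5 reduce to a handful of per-connection flags. The following is an illustrative sketch (not code from any client) of the state a peer keeps per connection; a client's request loop would consult can_download() before issuing request messages.

```python
from dataclasses import dataclass

@dataclass
class PeerState:
    """Per-connection flags, one set per peer relationship.
    A new connection starts out choked and not interested."""
    am_choking: bool = True        # we refuse to upload to the peer
    am_interested: bool = False    # we want data the peer has
    peer_choking: bool = True      # the peer refuses to upload to us
    peer_interested: bool = False  # the peer wants data we have

    def can_download(self) -> bool:
        # We may request sub-pieces only after expressing interest
        # and after receiving an unchoke message.
        return self.am_interested and not self.peer_choking
```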
If the sub-piece is not received within a certain time (typically one minute), the non-transmitting peer is snubbed, i.e., it is punished by not being allowed to download, even if unchoked. Data transfer is done by sending a piece message, which contains the requested sub-piece (Figure 3.4). Once the entire piece, i.e., all sub-pieces, has been received, and the SHA-1 hash of the piece has been verified, a have message is sent to all connected peers. The have message allows other peers in the swarm to update their internal information on which pieces are available from which peers.

End-game mode

When a peer is approaching completion of the download, it sends out requests for the remaining data to all currently connected peers to quickly finish the download. This is known as the end-game mode. Once a requested sub-piece is received, the peer sends out cancel messages to all peers that have not yet sent the requested data. Without the end-game mode, there is a tendency for peers to download the final pieces from the same peer, which may be on a slow link [41].

3.1.7 BitTorrent Performance Issues

Even though BitTorrent has become very popular among home users and widely deployed in corporate environments, there are still some issues currently being addressed for the next version of BitTorrent. The most pressing issue is the load on the central tracker authority. There are two main problems related to the tracker: peak load and redundancy.

Many trackers also handle more than a single swarm. The most popular trackers handle several hundred swarms simultaneously. It is not uncommon for popular swarms to contain hundreds or even thousands of peers. Each of these peers connects to the tracker every 30 minutes by default to request new peers and provide transfer statistics. An initial peer request to the tracker results in about 2–3 kB of response data. If these requests are evenly spread out temporally, the tracker can usually handle the load.
However, if a particularly desired resource is made available, this may severely strain the tracker, as it will be subject to a mass accumulation of connections from requesting peers, akin to a distributed denial-of-service attack. This is also known as the flash-crowd effect [43]. It is imperative for a swarm to have a functioning tracker if the swarm is to gain new peers, since without the tracker, new peers have no location from which to receive peer addresses.

Tracker redundancy is currently being explored and two alternatives are studied: backup trackers and distributing the tracking functionality in the swarm itself. An extension to the current protocol exists that adds a field, announce-list, to the metadata, which contains URLs to alternate trackers. No good way of distributing the tracking in the swarm has yet been found, but a network of distributed trackers has been proposed. Proposals of peers sending their currently connected peers to each other have also cropped up, but again, no consensus has been reached. Additionally, DHT functionality has been implemented in third-party clients to address this problem [44]. A beta version of the reference client also has support for DHT functionality.

Another important problem is the initial sharing delay. If a torrent has large piece sizes, e.g., larger than 2 MB, the time before a peer has downloaded an entire piece and can start sharing it can be quite substantial. It would be preferable to have varying verification granularities for the data in the swarm, so that a downloading peer does not have to wait for an entire piece before it can begin verifying hashes of the data. One way to do this would be to use a mechanism known as Merkle trees [45], which allows for varying granularity. By using this mechanism, a peer may start sharing after having downloaded only a small amount of data (on about the same order as the sub-piece sizes).
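As a generic illustration of the Merkle tree idea (a sketch of the general technique, not the specific construction proposed for BitTorrent), a binary hash tree over per-block digests lets a peer verify data at block granularity against a single root value, instead of waiting for a whole piece:

```python
import hashlib

def merkle_root(leaf_hashes):
    """Compute the root of a binary Merkle tree over per-block SHA-1
    digests by pairwise hashing; odd levels duplicate the last node.
    A block can then be verified with a logarithmic-length digest chain."""
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha1(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

With such a scheme, the metadata only needs to carry the root hash, while peers exchange the intermediate digests needed to verify each block as it arrives.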
3.1.8 Super Seeding

When a swarm is fairly new, i.e., there are few seeds in the swarm and peers have little of the shared resource, it makes sense to try to distribute the pieces of the content evenly to the downloading peers. This speeds up the dissemination of the entire content in the swarm. A normal seed announces itself as having all pieces during the initial handshaking procedure, thus leaving the piece selection up to the downloading peer. Seeds have usually been in the swarm longer. This means that they are likely to have a better view of which pieces are the most rare in the swarm, and thus most suitable to be inserted first. As soon as peers start receiving the rare pieces, other peers can download them from those peers instead of from seeds. This further balances the load in the swarm and increases its performance.

A seed that employs super seeding does not advertise having any pieces at all during the handshake. As peers connect to the, in effect, hidden seed, it instead sends have messages on a per-peer basis to entice specific peers to download a particular piece.

This mechanism is most effective in new swarms, or when there is a high peer-to-seed ratio and the peers have little data. It is not recommended for everyday use. As certain peers might have heuristics governing which swarms to be part of, a swarm containing only super seeds might be discarded. This is because peers cannot detect the super seed as a seed, thus assuming that the swarm is unseeded. This decreases the overall performance of the swarm.

3.2 Gnutella

Gnutella is a decentralized P2P system. Participants can share any type of resource, although the currently available specification covers only file resources. The first "official" Gnutella protocol was version 0.4 [4]. Soon, Gnutella version 0.6 [46] was released, with improvements based on the lessons learned from version 0.4.
The protocol is easily extendable, which has led to a variety of proprietary and non-proprietary extensions (e.g., Ultrapeers and the Query Routing Protocol (QRP)). For a while, the two protocol versions lived side by side and improvements were merged from the v0.6 line into the legacy v0.4 line. However, it seems that July 1st 2003 was a sort of “flag day” when Gnutella v0.4 peers were blocked from the network3. The activities of Gnutella peers can be divided into two main categories: signaling and user data exchange (further referred to as data exchange). The signaling activities are concerned with discovering the network topology and locating resources. Data exchange occurs when a peer has located a resource of interest (e.g., a document file). The peer downloads files over direct HTTP connections.

3.2.1 Ultrapeers and Leaf Nodes

Initially, the Gnutella network (referred to as the Gnet from now on) was non-hierarchical. However, experience has shown that the abundance of signaling was a major threat to the scalability of the network [5]. Limewire (a company promoting an enhanced Gnutella servent4) suggested the introduction of a two-level hierarchy: ultrapeers (UPs) and leaf nodes (LNs). UPs are faster nodes in the sense that they are connected to high-capacity links and have a large amount of CPU power available. LNs maintain a single connection to their ultrapeer. A UP maintains 10–100 connections, one for each LN, and 1–10 connections to other UPs [47]. The UPs do signaling on behalf of the LNs, thus shielding them from large volumes of signaling traffic. A UP does not necessarily have leaf nodes – it can work standalone. Some servents may not be capable of becoming leaf nodes or ultrapeers for various reasons (e.g., they lack the required functionality). In this case, they are labeled legacy nodes. In order to improve the overall scalability of the Gnet and to preserve bandwidth, UPs and LNs may refuse to connect to legacy nodes.
According to the Gnutella Development Forum (GDF) mailing list, the Gnutella community has recently adopted what is called support for high outdegree5. This implies that UPs maintain at least 32 connections to other UPs and 100–300 connections to different leaf nodes. LNs are recommended to maintain a small number of connections to UPs. The numbers may differ slightly between different Gnutella vendors. The claim is that high-outdegree support allows a peer to connect to the majority of Gnet peers in four hops or less.

3 This was discovered in the source code for gtk-gnutella-0.92. The software checks if the current date is later than July 1 2003. If true, it disables Gnutella v0.4 signaling.
4 Servent denotes a software entity that acts both as a client and as a server. The name is a combination of the words SERVer and cliENT.
5 First mentioned in [48].

3.2.2 Peer Signaling

Peer signaling can be divided into the following categories: peer discovery, resource query, ultrapeer routing and miscellaneous signaling. Peer discovery is done mainly through the use of Gnutella Web Cache (GWC) servers and PING and PONG messages. Query signaling consists of QUERY and QUERY HIT messages. Ultrapeer routing can employ various schemes, but the recommended one is the QRP. Ultrapeers signal among themselves using PING and PONG messages. Finally, there are some miscellaneous messages flowing in the Gnet such as PUSH, Horizon Size Estimation Protocol (HSEP) or other messages based on proprietary Gnutella extensions.

3.2.3 Peer Discovery

A Gnutella node that wants to join the overlay must first have information about the listening socket6 of at least one other peer that is already a member of the overlay. This is referred to as the bootstrap problem. The old way to solve the bootstrap problem was to visit a web site that published up-to-date lists of known peers.
The first step was to select one of the peers listed on the page, cut-and-paste its address (i.e., the listening socket) from the Web browser into the Gnutella servent and try to open a connection to it. This process would continue until at least one connection was successfully opened. At this point the PING–PONG traffic would, hopefully, reveal more peers to which the servent could connect. The addresses of newly found peers were cached in the local hostcache and reused when the servent application was restarted. Since peers in general have a short life span [49] (i.e., they enter and leave the network very often) the hostcache kept by each node often got outdated. Gnutella Web Cache (GWC) servers7 try to solve this problem. Each GWC server is essentially an HTTP server serving a list of active peers with associated listening sockets. The Web page is typically rendered by a Common Gateway Interface (CGI) script or Java servlet, which is also capable of updating the list contents. UPs update the list continuously, ensuring that new peers can always join the overlay. A list of available GWC servers is maintained at the main GWebCache web site. This list contains only GWC servers that have elected to register themselves. Unofficial GWC servers exist as well. New Gnutella peers implement the following bootstrap procedure: upon start they connect to the main GWebCache Web site, obtain the list of GWC systems, try to connect to a number of them, and finally end up building their own hostcache. Alternatively, the node can connect to an unofficial GWC system or connect directly to a node in the Gnet. The last option requires a priori knowledge about the listening socket of a Gnet node. Recently, it was observed that GWC servers were becoming overloaded. There appeared to be two reasons behind the heavy load: an increase in the number of GWC-capable servents and the appearance of a large number of misbehaving servents. 
The UDP Host Cache (UHC) protocol was suggested as a way to alleviate the problem. The protocol works as a distributed bootstrap system, transforming UHC-enabled servents into GWC servers [50].

6 By socket, we refer to the tuple <host address, protocol, port>.
7 Also abbreviated as GWebCache servers.

3.2.4 Signaling Connection Establishment

Assuming a Gnutella servent has obtained the socket address (i.e., the IP address and port pair) of a peer, it will attempt to establish a full-duplex TCP connection. The explanation below will use typical TCP terminology, calling the servent that has done the TCP active open the client and its peer the server. Once the TCP connection is in place, a handshaking procedure takes place between the client and the server:

1. The client sends the string GNUTELLA CONNECT/0.6<CR><LF> where <CR> is the ASCII code for carriage return and <LF> is the ASCII code for line feed.

2. The client sends all capability headers in a format similar to HTTP and ends with <CR><LF> on an empty line, e.g.,

User-Agent: BearShare/1.0<CR><LF>
X-Ultrapeer: True<CR><LF>
Pong-Caching: 0.1<CR><LF>
<CR><LF>

3. The server responds with the string GNUTELLA/0.6 <status-code> <status-string><CR><LF>. The <status-code> follows the HTTP specification, with code 200 meaning success. The <status-string> is a short human-readable description of the status code (e.g., when the code is 200 the string will typically be set to OK).

4. The server sends all capability headers as described in step 2.

5. The client parses the server response to compute the smallest set of common capabilities available. If the client still wishes to connect, it will send GNUTELLA/0.6 <status-code> <status-string><CR><LF> to the server with the <status-code> set to 200. If the capabilities do not match, the client will set the <status-code> to an error code and close the TCP connection.
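The client side of steps 1–3 can be sketched as plain byte-string assembly and parsing. The header values are the example ones from step 2; no real network connection is made here, only the wire format is shown.

```python
def build_connect_request(headers):
    """Assemble the client's opening handshake (steps 1 and 2)."""
    lines = ["GNUTELLA CONNECT/0.6"]
    lines += ["%s: %s" % kv for kv in headers.items()]
    # Capability headers end with <CR><LF> on an empty line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def parse_response_status(data):
    """Extract <status-code> and <status-string> from the server's
    'GNUTELLA/0.6 <status-code> <status-string>' reply (step 3)."""
    status_line = data.split(b"\r\n", 1)[0].decode("ascii")
    proto, code, reason = status_line.split(" ", 2)
    if proto != "GNUTELLA/0.6":
        raise ValueError("not a Gnutella 0.6 handshake reply")
    return int(code), reason

req = build_connect_request({"User-Agent": "BearShare/1.0",
                             "X-Ultrapeer": "True"})
code, reason = parse_response_status(b"GNUTELLA/0.6 200 OK\r\n\r\n")
```

In a real servent, `req` would be written to the TCP socket and the reply parsed from the socket's read buffer before switching to binary Gnutella messages.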
If the handshake is successful, the client and the server start exchanging binary Gnutella messages over the existing TCP connection. The TCP connection lasts until one of the peers decides to terminate the session. At that point the peer ending the connection has the opportunity to send an optional Gnutella BYE message. Then the peer closes the TCP connection. Modern servents include an X-Try header in their response if they reject a connection. The header contains a list of socket addresses of recently active servents, to which the other peer can try to connect. The purpose of the X-Try header is to increase connectivity and reduce the need to contact a GWC server.

3.2.5 Compressed Message Streams

If the capability set used by the peers includes stream compression, then all data on the TCP connection, with the exception of the initial handshake, is compressed [51]. The type of compression algorithm can be selected in the capability header, but the currently supported algorithm is deflate, which is implemented in zlib [52].

3.2.6 Gnutella Message Headers

Each Gnutella message starts with a generic header that contains the following:

• Message ID/GUID (Globally Unique ID) to uniquely identify messages on the Gnet. Leaving out some details, the GUID is a mixture of the node’s Ethernet MAC address and a timestamp [53].

• Payload type code that identifies the type of Gnutella message (e.g., PONG messages have payload type 0x01).

• Time-To-Live (TTL) to limit the signaling radius and its adverse impact on the network. Messages with TTL > 15 are dropped8.

• Hop count to inform receiving peers how far the message has traveled (in hops).

• Payload length to describe the total length of the message following this header. The next generic Gnutella message header is located exactly this number of bytes from the end of this header.

The generic Gnutella header is followed by the actual message, which may have its own headers.
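Assuming the commonly documented 23-byte layout for this header (a 16-byte GUID, one byte each for payload type, TTL and hop count, and a 4-byte little-endian payload length), a minimal parser sketch looks as follows:

```python
import struct

# GUID (16 bytes), payload type, TTL, hops, payload length (little-endian).
HEADER = struct.Struct("<16sBBBI")

def parse_header(buf):
    """Decode one generic Gnutella message header from a byte buffer."""
    guid, ptype, ttl, hops, plen = HEADER.unpack_from(buf)
    if ttl > 15:                 # messages with TTL > 15 are dropped
        raise ValueError("TTL exceeds protocol limit")
    return {"guid": guid, "type": ptype, "ttl": ttl,
            "hops": hops, "length": plen}

# Example: a PONG (payload type 0x01) carrying a 14-byte payload,
# with TTL 7 and 2 hops traveled; the GUID here is a dummy value.
raw = b"\x00" * 16 + bytes([0x01, 7, 2]) + struct.pack("<I", 14)
hdr = parse_header(raw)
```

The payload length field gives exactly the offset to the next generic header, which is what makes stream framing possible.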
Also, the message may contain vendor extensions. Vendor extensions are used when a specific type of servent wants to implement experimental functionality not covered by the standard specifications. The vendor extensions should be implemented using the Gnutella Generic Extension Protocol (GGEP) [54], since the protocol provides a transparent way for regular servents to interact with the vendor servents.

3.2.7 PING–PONG Messages

Each successfully connected pair of peers starts periodically sending PING messages to each other. The receiver of a PING message decrements the TTL in the Gnutella header. If the TTL is greater than zero, the node increments the hop counter in the message header and then forwards the message to all its directly connected peers, with the exception of the one from which the message came. Note that PING messages do not carry any user data (not even the sender’s listening socket). This means that the payload length field in the Gnutella header is set to zero. PONG messages are sent only in response to PING messages. More than one PONG message can be sent in response to one PING. The PONG messages are returned on the reverse path used by the corresponding PING message. Each PONG message contains detailed information about one active Gnutella peer. It also contains the same GUID as the PING message that triggered it. The PONG receiver can, optionally, attempt to connect to the peer described in the message. UPs use the same scheme; however, they do not forward PINGs and PONGs to/from the LNs attached to them.

3.2.8 QUERY and QUERY_HIT Messages

A Gnutella peer wishing to locate a specific resource (e.g., a file) must assemble a QUERY message. The message describes the desired resource using a text string. For a file resource this is the file name. In addition, the minimum speed (i.e., upload rate) of servents that should respond to this message is specified as well.
There may be additional extensions attached to the message (e.g., proprietary extensions), but those are outside the scope of this document. In Gnutella v0.4, the QUERY message is sent to all peers located one hop away, over the signaling connections established during the handshake. Peers receiving a QUERY message forward it to all directly connected peers unless the TTL field indicates otherwise. The newer Gnutella v0.6 attempts to alleviate the problems of the previous version by introducing a form of selective forwarding called dynamic query [48]. A dynamic query first probes how popular the targeted content is. This is done by using a low TTL value in the QUERY message, which is sent to a very limited number of directly connected peers. A large number of replies indicates popular content, whereas a low number of replies implies rare content. For rare content, the QUERY TTL value and the number of directly connected peers receiving the message are gradually increased. This procedure is repeated until enough results are received or until a theoretical limit on the number of QUERY message receivers is reached. This form of resource discovery requires all LNs to rely on UPs for their queries (i.e., LNs do not perform dynamic queries). If a peer that has received the QUERY message is able to serve the resource, it should respond with a QUERY HIT message. The GUID of the QUERY HIT message must be the same as the one in the QUERY message that triggered the response. The QUERY HIT message lists each resource name that matches the resource description from the QUERY message9, along with the resource size in bytes and other information. In addition, the QUERY HIT messages contain the listening socket which should be used by the message receiver when it wants to download the resource.

8 Nodes that support high outdegree will drop messages with TTL > 4.
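The dynamic query escalation can be sketched as the loop below. The initial TTL, fanout and the escalation factors are purely illustrative, not values mandated by [48]; `probe` stands in for sending a QUERY with the given TTL to `fanout` neighbours and counting replies.

```python
def dynamic_query(probe, enough=50, max_receivers=10000):
    """Iteratively widen a query: start with a low TTL and few
    neighbours, and escalate only while results remain scarce."""
    ttl, fanout = 1, 3
    results, contacted = 0, 0
    while results < enough and contacted < max_receivers:
        results += probe(ttl, fanout)     # send QUERY, count replies
        contacted += fanout
        ttl, fanout = ttl + 1, fanout * 2  # escalate for rare content
    return results, contacted

# Popular content: the first narrow probe already returns plenty.
hits, peers_asked = dynamic_query(lambda ttl, fanout: 100)
```

Popular content terminates after the first cheap probe, while rare content widens the search until the receiver limit is hit, which is exactly the bandwidth-saving behaviour the mechanism aims for.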
The Gnutella specification discourages the use of messages with sizes greater than 4 kB. Consequently, several QUERY HIT messages may be issued by the same servent in response to a QUERY message. The QUERY HIT receiver must establish a direct HTTP connection to the listening socket described by the message (Section 3.2.13) in order to download the data. If the QUERY HIT sender (i.e., the resource owner) is behind a firewall, incoming connections will typically not be accepted. To work around this problem, when a firewall is detected, the downloader must send a PUSH message over the signaling connection. The message will be routed in the reverse direction along the path taken by the received QUERY HIT message. The resource owner can use the information in the PUSH message to establish a TCP connection to the downloader. The downloader can then use the HTTP GET method to retrieve the resource. For details, see Section 3.2.10. Some servents use the metadata extension mechanism [55] to allow for richer queries. The idea is that metadata (e.g., author, genre, publisher) is associated with the files shared by a servent. Other servents can query those files not only by file name, but also by the metadata fields.

3.2.9 Query Routing Protocol

The mission of ultrapeers is to reduce the burden put on the network by peer signaling. They achieve this goal by eliminating PING messages among leaf nodes and by employing query routing. There are various schemes for ultrapeer query routing, but the recommended scheme is the QRP [56]. Ultrapeers signal among themselves by using PING and PONG messages. QRP was introduced in order to mitigate the adverse effects of the flooding used by Gnutella file queries and is based on a modified version of Bloom filters [57]. The idea is to break a query into individual keywords and have a hash function applied to each keyword. Given a keyword, the hash function returns an index to an element in a finite discrete vector.
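The keyword-hashing idea can be sketched as follows. The hash function (CRC32 here) and the small table size are stand-ins, not the actual QRP algorithm specified in [56]; the vector entries hold hop distances, with a sentinel value meaning "no matching resource at any distance".

```python
import zlib

TABLE_SIZE = 64      # real QRP tables are far larger (e.g., 65536 slots)
INFINITY = 7         # sentinel: keyword not reachable through this peer

def slot(keyword):
    # Stand-in for the QRP hash function defined in [56].
    return zlib.crc32(keyword.lower().encode()) % TABLE_SIZE

def build_table(shared_names):
    """A leaf's route table: slot -> minimum hop distance (0 = local)."""
    table = [INFINITY] * TABLE_SIZE
    for name in shared_names:
        for kw in name.replace("_", " ").replace(".", " ").split():
            table[slot(kw)] = 0
    return table

def matches(table, query):
    """Forward a query only if every keyword hits a reachable slot."""
    return all(table[slot(kw)] < INFINITY for kw in query.split())

leaf = build_table(["linux_redhat_7.0.iso", "vacation_photos.zip"])
```

As with Bloom filters, hash collisions can produce false positives (a query forwarded to a leaf with no real match), but never false negatives, so correctness is preserved while most useless forwarding is avoided.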
Each entry in the vector is the minimum distance, expressed in hops, to a peer holding a resource that matches the keyword in the query. Queries are forwarded only to leaf nodes that have resources that match all the keywords. This substantially limits the bandwidth used by queries. Peers run the hash algorithm over the resources they share and exchange the routing tables (i.e., hop vectors) at regular intervals. Individual peers (legacy or ultrapeer nodes) may run QRP and exchange routing tables among themselves [58]. However, the typical scenario is that legacy nodes do not use QRP, leaf nodes send route table updates only to ultrapeers, and ultrapeers propagate these tables only to directly connected ultrapeers.

9 For example, the string linux could identify a resource called linux_redhat_7.0.iso as well as a resource called linux_installation_guide.txt.gz. Thus, this query would yield two potential results. Both results will be returned to the QUERY sender.

3.2.10 PUSH Messages

PUSH messages are used by peers that want to download resources from peers located behind firewalls that prevent incoming TCP connections. The downloader sends a PUSH message over the existing TCP connection, which was set up during the handshake phase. The PUSH message contains the listening socket of the sender. The host behind the firewall can then attempt to establish a TCP connection to the listening socket described in the message. If the TCP connection is established successfully, the host behind the firewall sends the following string over the signaling connection:

GIV <File Index>:<Servent Identifier>/<File Name><LF><LF>

The <File Index> and <Servent Identifier> are the values found in the corresponding PUSH message and <File Name> is the name of the resource requested. Upon receipt of the message, the receiver issues an HTTP GET request on the newly established TCP connection.
GET /get/<File Index>/<File Name> HTTP/1.1<CR><LF>
User-Agent: Gnutella<CR><LF>
Connection: Keep-Alive<CR><LF>
Range: bytes=0-<CR><LF>
<CR><LF>

3.2.11 BYE Messages

The BYE message is an optional message used when a peer wants to inform its neighbours that it will close the signaling connection. The message contains an error code along with an error string. The message is sent only to hosts that have indicated during handshake that they support BYE messages.

3.2.12 Horizon Size Estimation Protocol Messages

The Horizon Size Estimation Protocol (HSEP) is used to obtain estimates of the number of reachable resources (i.e., nodes, shared files and shared kilobytes of data) [59]. Hosts that support HSEP announce this as part of the capability set exchange during the Gnutella handshake. If the hosts on each side of a connection support HSEP, they start exchanging HSEP messages approximately every 30 seconds. The HSEP message consists of n_max triples. Each triple describes the number of nodes, files and kilobytes of data estimated at the corresponding number of hops from the node sending the message. The n_max value is the maximum number of hops supported by the protocol, with 10 hops being the recommended value [59]. The horizon size estimation can be used to quantify the quality of a connection: the higher the number of reachable resources, the higher the quality of the connection.

3.2.13 Data Exchange (File Transfer)

Data exchange takes place over a direct HTTP connection between a pair of peers. Both HTTP 1.0 and HTTP 1.1 are supported, but the use of HTTP 1.1 is strongly recommended. Most notably, the use of features such as persistent connections and range requests is encouraged. A range request allows a peer to continue an unfinished transfer from where it left off. Furthermore, it allows servents to utilize swarming, which is the technique of retrieving different parts of a file from different peers.
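The ranged download request described above can be assembled as in this sketch. The file index and name are made-up example values; resuming an interrupted transfer simply means using a non-zero start offset in the Range header.

```python
def build_get(file_index, file_name, start=0, end=""):
    """Assemble the Gnutella download request with a Range header.
    An open-ended range ('bytes=<start>-') requests the rest of the file."""
    return ("GET /get/%d/%s HTTP/1.1\r\n"
            "User-Agent: Gnutella\r\n"
            "Connection: Keep-Alive\r\n"
            "Range: bytes=%s-%s\r\n"
            "\r\n" % (file_index, file_name, start, end)).encode("ascii")

# Resume a transfer from byte 1048576 (the first 1 MB already on disk).
req = build_get(2468, "linux_redhat_7.0.iso", start=1048576)
```

With Keep-Alive, a swarming downloader can reuse the same connection to issue several such requests for different byte ranges of the file.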
Swarming is not part of the Gnutella protocol, and regular Gnutella servents (i.e., servents that do not explicitly support swarming) can be engaged in swarming without being aware of it. From their point of view, a peer is requesting a range of bytes for a particular resource. The intelligence is located at the peer downloading the data. The persistent connection feature is useful for swarming. It allows a peer to make several requests for different byte ranges in a file over the same HTTP connection. Fig. 3.5 shows a simple Gnet scenario, involving three legacy peers. It is assumed that Peer A has obtained the listening socket of Peer B from a GWC server. Using the socket descriptor, Peer A attempts to connect to Peer B. In this example, Peer B already has a signaling connection to Peer C.

Figure 3.5: Example of a Gnutella session (message ladder between Peer A, Peer B and Peer C: handshake, PING/PONG exchange, QUERY/QUERY HIT over the established TCP connection, and a separate HTTP GET/response connection).

The first three messages between Peer A and Peer B illustrate the establishment of the signaling connection between the two peers. The two peers may exchange capabilities during this phase as well. The next phase encompasses the exchange of network topology information with the help of PING and PONG messages. The messages are sent over the TCP connection established previously (i.e., during the peer handshake). It is observed that PING messages are forwarded by Peer B between Peer A and Peer C in both directions, and that PONG messages follow the reverse path taken by the corresponding PING message. At a later time, Peer A sends a QUERY message, which is forwarded by Peer B to Peer C. In this example, only Peer C is able to serve the resource, which is illustrated by the QUERY HIT message.
The QUERY and QUERY HIT messages use the existing TCP connection, just like the PING and PONG messages. Again, it is observed that the QUERY HIT message follows the reverse path taken by the corresponding QUERY message. Finally, Peer A opens a direct HTTP connection to Peer C and downloads the resource by using the HTTP GET method. The resource contents are returned in the HTTP response message. The exchange of PING–PONG and QUERY–QUERY HIT messages continues until one of the peers tears down the TCP connection. A Gnutella BYE message may be sent as notification that the signaling connection will be closed.

3.2.14 Other Features

Gnutella has support for many other features which are important, but not within the scope of this document. The remainder of this section will present some of these features briefly.

Hash/URN Gnutella Extensions

The Hash/URN Gnutella Extensions (HUGE) specification [60] provides a way to identify files by Uniform Resource Names (URN) and in particular by SHA-1 hash values. The advantages of using HUGE and SHA-1 are that files with the same content but different names can be discovered through the QUERY–QUERY HIT mechanism, and that the file integrity can be checked upon download by recomputing the SHA-1 hash value.

File Magnets and Magma Lists

Building on HUGE, file magnets represent the bridge between the Web and P2P networks. Web pages can include special URL links (file magnets), which encode URNs to resources available on the P2P network. When a user clicks on such a link, the web browser will transfer the URN to the local Gnutella servent, which will perform a query on the Gnet. A Magma list is a list of file magnets, e.g., the favourite documents, music or pictures shared by a Gnutella user.

Download Mesh

In order to speed up file downloads and to distribute the load among servents, when a peer sends a QUERY HIT message it includes a list of peers that are known to have the same file (i.e., the download mesh). The simplest way a servent can build such a list is to remember the list it obtained itself when it downloaded the file. Download meshes require support for the HUGE extension. The main benefit of using a download mesh is the ability to perform efficient swarming, i.e., to be able to download quickly and simultaneously from several locations.

Partial File Sharing

Partial File Sharing (PFS) is an optimization of swarming and download meshes. Servents that support PFS do not wait to download the whole file before replying to matching QUERY messages. If a servent requests the file before its download has completed, the servent that has the partial file will set the Content-Range header in the HTTP reply, informing the other peer about the amount of available data.

Passive/Active Remote Queueing

Most servents limit the number of uploads that can occur simultaneously, in order to preserve bandwidth. When Passive/Active Remote Queueing (PARQ) is used, download requests are queued at the servent hosting the file [61]. The specification allows servents to check their place in the download queue. It also allows hosts to become temporarily (at most 5 minutes) unavailable. This feature can come in handy if a servent crashes or temporarily loses its Internet connection.

Firewall-to-Firewall (Reliable UDP) File Transfer

Firewall-to-Firewall (F2F) allows two firewalled hosts, both of them connected to the Internet through NAT servers, to transfer files between themselves over UDP [62]. The technique of opening the UDP ports in the firewall is known as “UDP hole punching” and is documented fairly well in [63].

LAN Multicast

Recently, a specification was made available which allows servents located on the same LAN to take advantage of IP multicast to transfer files [64]. The advantage of LAN multicast is that file transfers are more efficient due to IP multicast and generally much faster due to higher bandwidth, lower latency and a lower number of hops.
Chapter 4

Measurement Software Description

Traffic measurements have been used within the networking research community for a very long time, going back to the early teletraffic researchers, such as Conny Palm and Engset [65]. Williamson [66] identifies four main reasons for the usefulness of network traffic measurements: network troubleshooting, protocol debugging, workload characterization and performance evaluation. For the present work, only the latter two are considered, with an emphasis on workload characterization. When deciding on making measurements, there are two avenues from which to choose: active or passive measurements. Active measurements entail actively probing a network, either with artificially generated traffic or by having a node join the network as an active participant. Probing with artificial traffic is analogous to system identification using impulses in, e.g., vibration experiments or acoustical environments. A passive measurement is one where the network is silently monitored without any intrusion.

4.1 Passive measurements

Passive measurements are commonly used when data on “real” networks are desired, for instance for use in trace-driven simulations, model validation or bottleneck identification. Essentially, this technique observes a live network without interacting with it. Depending on the level of accuracy desired, different measurement options are available. For coarse-grained measurements, on a timescale on the order of seconds, there is the possibility of using SNMP and RMON to gather information from networking hardware. This is usually used as part of normal network operations, and is not very useful for protocol evaluations and per-flow performance evaluation. Per-flow information is available in, e.g., Cisco’s NetFlow [67], but again, no packet inspection is available. Finer-grained measurements with full packet inspection capabilities are available in both hardware and software configurations.
Software configurations provide measurement accuracies in the tens of microseconds, while dedicated measurement hardware gives nanosecond accuracy. There are two main approaches to performing passive application layer measurements for network traffic. In the first approach, called application logging, the traffic is measured by the application itself. The other approach is to obtain measurements indirectly, by monitoring traffic at the link layer and performing application flow reassembly using a specially designed application. This approach is referred to as flow reassembly. A mixture of these two approaches is possible as well.

4.1.1 Application Logging

Unless already supported, application logging requires changes in the application software to record incoming and outgoing network data and other events of interest, e.g., transitions between application states, failures, CPU and memory usage, etc. This implies modifying the application source code. In the case of open source software these changes can be performed rather straightforwardly. However, for closed source software one would need to negotiate an agreement with the vendor to obtain and modify the source code. The advantage of application logging is that measurement data is readily available from within the application. Measurement code embedded at relevant places in the application can continuously monitor all variables of interest. On the other hand, the main disadvantage associated with this method (apart from the licensing issue discussed above) is related to timestamp accuracy. The accuracy of the timestamp is affected by three main factors: drift in the frequency of the crystal controlling the system clock, latency in the Operating System (OS) and latency in the network stack. The frequency drift of the crystal is due to temperature changes and age. Its influence on the timestamp is rather small when compared to the other two factors.
Latency in the OS refers to the delay between the time when a user-space process requests a timestamp from the operating system and the time when the timestamp is available to the process. The delay is largely accounted for by scheduling in the kernel, when the calling process is temporarily preempted by other processes. The problem becomes increasingly worse for interpreted programs, e.g., the reference BitTorrent client, which is a Python script. In this case the timestamps are subject to additional scheduling imposed by the interpreter. A significant amount of queueing and scheduling occurs in the TCP/IP stack as well, especially in the routines for IP and TCP reassembly. The effect is that timestamps at the application layer are only indicative of the actual time when packets enter or leave the link layer at the node in question.

4.1.2 Flow Reassembly

The flow reassembly method attempts to address some of these problems by moving the measurements closer to the network. Link-layer measurements have enjoyed a long tradition in the networking community. However, since the interest is now moving towards what happens at the application layer, one needs to develop dedicated software able to decode application layer messages from the observed link-layer traffic, essentially replicating parts of the application of interest. Flow reassembly involves three stages: link-layer capture, transport stream reassembly (e.g., TCP reassembly) and application message decoding. A plethora of link-layer capture software is available under very liberal licenses on the Internet, e.g., tcpdump and ethereal [68, 69]. The common denominator for this software is that in a shared-medium LAN such as Ethernet, the capturing software forces the network interface to work in promiscuous mode, thus enabling it to monitor all traffic in the LAN. An issue to consider carefully when selecting capture software is the timestamping operation.
The operation should be performed as close as possible to the place where the frame is read from the network card (if possible, in the network driver or on the card itself). Failure to do so may lead to inaccuracies similar to those in the case of application logging. Transport stream reassembly deals with missing or duplicate packets and with packets arriving in the wrong order. At the IP layer this involves reassembly of IP fragments. At the TCP layer, the transport stream reassembly replicates the TCP reassembly functionality of the network stack. The main problem with regard to TCP reassembly is to obtain the same TCP state transitions that occurred at the time when the traffic was recorded. This is particularly hard to do in a heterogeneous network environment, since different OSs handle special TCP conditions in different ways. For example, retransmitted segments may overlap the data received previously and one must decide whether to keep the old or the new data. Windows and UNIX take opposing views of this scenario [70]. A solution to minimize the inconsistencies in protocol implementations is to use a traffic normalizer [71]. Similar problems apply to the reassembly of IP fragments. Application message decoding uses the reassembled transport layer flows to obtain the application messages exchanged by end-points. The main advantage of flow reassembly is that it provides a more accurate view of how the application affects the network. Furthermore, flow reassembly can be run on a dedicated host different from the hosts participating in the application session. Such a dedicated host also has the possibility to analyze all traffic passing by the recording interface. In contrast, application logging can only provide information about the flows in which the measuring host is an active participant. Furthermore, the flow reassembly method can save link-layer traffic to disk for off-line analysis.
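As a sketch of the application message decoding stage, the function below frames a reassembled TCP byte stream into application messages using a fixed-offset length field. A Gnutella-style 23-byte header with a little-endian payload length at offset 19 is assumed purely for illustration; any length-prefixed protocol can be framed the same way.

```python
def decode_stream(buf, header_len=23):
    """Split a reassembled TCP byte stream into application messages.
    Returns the complete messages plus any trailing partial data,
    which is kept until the next batch of reassembled bytes arrives."""
    msgs = []
    while len(buf) >= header_len:
        plen = int.from_bytes(buf[19:23], "little")  # payload length field
        total = header_len + plen
        if len(buf) < total:
            break                                    # wait for more data
        msgs.append(buf[:total])
        buf = buf[total:]
    return msgs, buf

# Two back-to-back messages (5-byte and empty payload) plus a partial header:
m1 = b"\x00" * 16 + bytes([0x01, 7, 0]) + (5).to_bytes(4, "little") + b"abcde"
m2 = b"\xff" * 16 + bytes([0x00, 4, 1]) + (0).to_bytes(4, "little")
msgs, rest = decode_stream(m1 + m2 + b"\x00" * 10)
```

In a real decoder, this framing step sits on top of the TCP reassembly output, and each extracted message is then passed on to protocol-specific parsing and statistics collection.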
A major disadvantage associated with flow reassembly is that all application state must be inferred from the recorded network traffic. This is not always possible, since certain application state transitions may be independent of network events. Another disadvantage is that a lot of existing functionality (e.g., IP and TCP reassembly) is duplicated. A well-known programming mantra states that the probability of encountering bugs increases in proportion to the volume of new code. An even more serious problem is related to the link-layer capture. On heavily loaded links, the hardware may not be able to record all data and will start dropping frames. This has an impact on the host performing the measurements, but not necessarily on the host participating in the application-layer session.

Off-line traffic analysis features, similar to those found in flow reassembly, can be implemented using application logging by adding suitable message recording points in the software application. This means, in fact, a measurement method that is a mixture of application logging and flow reassembly. Such a mixed methodology has the advantages of both methods, e.g., no need to infer application state from link-layer traces, and no need to decide beforehand what statistics to collect.

4.2 Network infrastructure

The P2P measurement infrastructure developed at BTH consists of peer nodes and protocol decoding software. Tcpdump [68] and tcptrace [72] are used for traffic recording and protocol decoding. Although the infrastructure is currently geared towards P2P protocols, it can easily be extended to measure other protocols running over TCP. Furthermore, we plan to develop similar modules to measure UDP-based applications as well. The BTH measurement nodes run the Gentoo Linux 1.4 operating system, with kernel version 2.6.5. Each node is equipped with an Intel Celeron 2.4 GHz processor, 1 GB RAM, a 120 GB hard drive, and a 10/100 Fast Ethernet network interface.
The network interface is connected to a 100 Mbit switch in the lab at the Telecommunication Systems department, which is further connected through a router to the GigaSUNET backbone (Fig. 4.1(a)). Our experience with the current setup has been that the traffic recording step alone accounts for about 70 % of the total time taken by measurements. Protocol decoding is not possible while the hosts are recording traffic. The main reason is that the protocol decoding phase, which is I/O intensive, requires large amounts of CPU power and RAM. To overcome this problem, we are proposing a distributed measurement infrastructure similar to the one shown in Fig. 4.1(b).

Figure 4.1: Measurement network infrastructures. (a) Measurement setup; (b) distributed measurement setup.

When used in the distributed infrastructure, the P2P nodes are equipped with an additional network interface, which we refer to as the management interface. P2P traffic is recorded from the primary interface and stored in a directory on the disk. The directory is exported using the Network File System (NFS) over the management interface. Data processing workstations can read recorded data over NFS as soon as it is available. Optionally, the data processing workstations can be located in a private LAN or VPN in order to increase security, save IP address space and decrease the number of collisions on the Ethernet segment. In this case, the Internet access router provides Internet access to the workstations, if needed.

4.3 TCP Reassembly Framework

Each measurement node has tcpdump version 3.8.3 installed on it. When the node is running measurements, tcpdump is started before the Gnutella servent in order to avoid missing any connections.
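As a sketch of how such a capture process might be launched, the helper below builds a tcpdump invocation with the file rotation and well-known-port filtering used in this setup. The interface name and file prefix are placeholders; the flags follow the standard tcpdump manual (-C rotates output files after the given number of millions of bytes, -s 0 captures full frames).

```python
def tcpdump_command(interface="eth0", out_prefix="trace", file_size_mb=600):
    """Build a tcpdump command line: full frames, ~600 MB file rotation
    (so each file fits on a recordable CD), skipping traditional services."""
    return [
        "tcpdump",
        "-i", interface,          # capture interface (placeholder name)
        "-s", "0",                # full snap length: keep entire frames
        "-C", str(file_size_mb),  # rotate output files at ~600 MB
        "-w", out_prefix,         # trace file prefix (trace, trace1, ...)
        "not port 80 and not port 21 and not port 22",  # drop HTTP/FTP/SSH
    ]

# The list would be handed to subprocess.Popen(tcpdump_command())
# before the servent is started, so no connection is missed.
print(" ".join(tcpdump_command()))
```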
Tcpdump can also be run on a different node in the network, provided that the ultrapeer's switch port is mirrored to the port where the tcpdump host is recording, or that the switch is replaced with a hub to which both the tcpdump host and the ultrapeer are connected. During the data collection stage, tcpdump collects Ethernet frames from the switch port where the ultrapeer node is connected. Since most P2P applications can use dynamic ports, all traffic reaching the switch port must be collected. However, to increase the performance during data collection and data processing, one can turn off most or all server software on the ultrapeer node. It is possible, in addition, to apply a filter to tcpdump that drops packets belonging to traditional services running on well-known ports (e.g., HTTP, FTP, SSH).

The volume of collected data can be quite large, e.g., the resulting trace file could grow well beyond 2 GB in less than one day, which is larger than most standard filesystems can handle without modification. This is directly related to the number of peers the servent is allowed to connect to. In the case of Gnutella we observed on average 130 peers (100 leaf nodes and 30 ultrapeers) and collected approximately 33 GB of trace data in eleven days. The solution was to have tcpdump spread the recorded data across several files, each 600 MB in size. This file size was chosen such that each data file is small enough to fit on a recordable CD.

Figure 4.2: Measurement procedures (data collection with tcpdump, TCP reassembly, application message flow reassembly, log parsing, log data reduction, postprocessing and analysis).

4.3.1 TCP Reassembly

Assuming that the measured P2P application runs over TCP, the next step is to reassemble the TCP segments into a flow of ordered bytes. The TCP reassembly module builds on the TCP engine available in tcptrace. The module reads the tcpdump traces in the order they were created.
Each trace is scanned for TCP connections. When found, they are stored in a list of connection records. Further, when a new TCP segment is found in the trace file, the module scans the connection list, comparing the socket pair of the segment with each entry in the list. If no entry matches the socket pair of the new segment, a new connection is considered to be found and a record is created for it, which is then added to the connection list. Otherwise, the connection record matching the socket pair is retrieved and sent together with the new segment to the TCP reassembly engine.

The TCP reassembly engine is similar to the one used by the FreeBSD TCP/IP stack, as described in [73]. For each active connection, the reassembly engine keeps a doubly linked list, which is referred to as the reassembly list. When given a connection record and a new segment, it retrieves the correct reassembly list and inserts the new segment at the correct place in the list. The reassembly engine is capable of handling out-of-order segments as well as forward and backward overlapping between segments.

4.3.2 Application Data Flow Reassembly

Whenever new data is available, the application data reassembly module is notified. Upon notification, it asks the TCP reassembly module for a new segment from the reassembly list corresponding to the socket pair received with the notification. When it receives the new segment, it interprets the contents according to the specification of the protocol it decodes. Since application messages may span several segments, and since a segment may contain data from two consecutive messages, each segment is appended to the end of a data buffer before further processing, thus creating a contiguous data flow containing at least one application message.

Gnutella Reassembly

In the case of a new Gnutella connection, the application reassembly module first waits for the handshake phase to begin.
If the handshake fails, the connection is marked invalid and is eventually discarded by the memory manager. If the handshake is successful, the application reassembly module scans the capability lists sent by the nodes involved in the TCP connection. If the nodes have agreed to compress the data, the connection is marked as compressed. Further segments received from the TCP reassembly module for this connection are first sent to the decompressor before being appended to the data buffer. The decompressor uses the inflate() function of zlib [52] to decompress the data available in the new segment. Upon successful decompression, the decompressed data is appended to the data buffer.

Immediately after the handshake phase, the application reassembly module attempts to find the Gnutella message header of the first message. Using the payload length field, it is able to discover the beginning of the second message. This is the only way in the Gnutella protocol to discover message boundaries and thus track application state changes. Based on the message type field in the message header, the corresponding decoding function is called, which outputs a message record to the log file. The message records follow a specific format required by the postprocessing stage.

BitTorrent Reassembly

The BitTorrent reassembler works in much the same fashion as the Gnutella reassembler. It is however less complex, since the BitTorrent protocol is substantially less complex than the Gnutella protocol. Each BitTorrent message is prefixed by its length, carried in a 32-bit big-endian word, followed by a one-byte message type, which makes the decoding straightforward. The timestamp of the first TCP segment of each message is recorded, along with the timestamp of the last segment. In the case of single-segment messages (all messages except the piece messages), the first and last segments are the same. A rudimentary HTTP parser is also available, which is used to parse tracker responses.
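The message-boundary logic described for the Gnutella reassembler can be sketched as follows. The sketch assumes the Gnutella 0.6 header layout (a 23-byte header: 16-byte GUID, one-byte type, TTL and hops, and a little-endian 32-bit payload length), and the message bytes are synthetic; it is an illustration, not the actual measurement code.

```python
import struct

HEADER_LEN = 23  # 16-byte GUID, type, TTL, hops, 4-byte payload length

def split_messages(buf: bytes):
    """Split a reassembled Gnutella byte stream into complete messages.
    Returns (messages, leftover) where leftover awaits more TCP segments."""
    messages = []
    while len(buf) >= HEADER_LEN:
        msg_type = buf[16]                                 # payload descriptor
        payload_len, = struct.unpack_from("<I", buf, 19)   # little-endian length
        total = HEADER_LEN + payload_len
        if len(buf) < total:
            break                      # message spans into the next segment
        messages.append((msg_type, buf[HEADER_LEN:total]))
        buf = buf[total:]              # advance to the next message boundary
    return messages, buf

# Two back-to-back messages; the second is truncated mid-payload.
ping = bytes(16) + bytes([0x00, 7, 0]) + struct.pack("<I", 0)
pong = bytes(16) + bytes([0x01, 7, 0]) + struct.pack("<I", 14) + b"partial"
msgs, rest = split_messages(ping + pong)
print(len(msgs), msgs[0][0], len(rest))
```

The leftover bytes are exactly what the data-buffer scheme above retains until the next segment arrives from the TCP reassembly module.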
4.3.3 Data Compression and Postprocessing

Since the logs can grow quite large, they can be processed through an optional stage of data compression. The compression is achieved by using the on-the-fly deflate compression offered by zlib. Additional data reduction can be achieved if the user is willing to sacrifice some detail by aggregating data over time. The postprocessing module interprets the (optionally compressed) log data and is able to demultiplex it based on different types of constraints: message type, IP address, port number, etc. The data output format of this stage is suitable as input to numerical computation software such as MATLAB and to standard UNIX text processing software such as sed, awk and perl.

4.4 Application Logging

Application logging is commonly used in server software to enable traceability of errors and client requests. In certain server applications, such as critical business systems and other high-security systems, server logs are very important for detecting intrusion attempts and for estimating the severity of security breaches. In other applications, logs are a useful tool for performance analysis. However, client applications do not usually provide much in terms of logging. If logging is available, it usually provides rather coarse-grained information, such as application start and other very high-level application events. It is unusual for an application to provide the amount of log detail needed to analyze its network performance. To provide adequate detail in application logs, it is necessary to modify the application in such a way that it provides both the detailed event information needed and a way to store this information in a log file or database.
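One way such detailed event logs can be stored compactly is the on-the-fly deflate compression mentioned in Section 4.3.3. The sketch below is a hypothetical illustration built on zlib's streaming compressor (the class and method names are ours, not those of the measurement software); compressed bytes reach the sink as records are ejected, so no uncompressed log ever exists on disk.

```python
import io
import time
import zlib

class CompressedLog:
    """Eject timestamped event records through zlib's streaming deflate."""
    def __init__(self, sink):
        self.sink = sink
        self.compressor = zlib.compressobj()

    def eject(self, message, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        record = ("%.6f %s\n" % (ts, message)).encode()
        self.sink.write(self.compressor.compress(record))

    def close(self):
        # Flush whatever deflate still buffers internally.
        self.sink.write(self.compressor.flush())

buf = io.BytesIO()
log = CompressedLog(buf)
log.eject("piece 688 0 16384", timestamp=1084428235.373197)
log.eject("request 688 81920", timestamp=1084428235.375262)
log.close()
print(zlib.decompress(buf.getvalue()).decode(), end="")
```

The record layout (UNIX timestamp first, then message type) matches the common log format defined in Section 4.5.2.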
In applications that are based on an event loop with a central managing component, obtaining the relevant information is a fairly easy task, as the events being handled contain all information relevant to the specific event. By adding a timestamp, these may then be ejected to a log file or database. On the other hand, in a threaded and less centralised application, this becomes a more difficult task, as events may not be handled through a single component.

An additional issue with client-side logging is the deployment of the modified clients. It is important to have a large enough number of users to provide representative data. Also, not all users may agree to run a modified client. One of the most difficult problems relates to the non-availability of client source code. For example, most proprietary software does not provide the source code for the application, making modification impossible without substantial reverse engineering. Log storage may become an issue if, for instance, the application is running on an embedded system where there is no storage available except for internal memory. Also, if measurements are performed over a long period of time and/or there is a large number of events, the application logs may grow prohibitively large.

4.4.1 BitTorrent Software Modifications

The reference BitTorrent client version1 is written in the python programming language [74]. Python is an interpreted and interactive language with object-oriented features that combines syntactical clarity with powerful components and system-level functionality. This makes the process of extending software written in the language less complicated than in a compiled and syntactically more demanding language such as C or Java. The client is written as an event-based program, reacting to incoming protocol messages and internal timers. The internal timers trigger actions such as tracker requests, unchoking of peers and network timeouts.
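A hedged sketch of how the message handlers of such an event-based client could be instrumented: each message-specific handler is wrapped so that a timestamped record is ejected before the original logic runs. The class and handler names here are hypothetical, chosen for illustration; they are not the reference client's actual identifiers.

```python
import functools
import time

def logged(event_type, log):
    """Wrap a message-specific handler so each invocation ejects a
    timestamped record; the message type is implied by the wrapped function."""
    def decorate(handler):
        @functools.wraps(handler)
        def wrapper(self, connection, *args):
            log.append((time.time(), event_type, args))  # eject, then handle
            return handler(self, connection, *args)
        return wrapper
    return decorate

log = []

class Downloader:                  # stands in for the client's event component
    @logged("request", log)
    def got_request(self, connection, piece, begin, length):
        pass                       # original handler logic stays unchanged

Downloader().got_request("peer-1", 688, 16384, 16384)
ts, event, args = log[0]
print(event, args)
```

Because the wrapper sits on the message-specific routine, the piece number and subpiece offset arrive already parsed, which is exactly the advantage of intercepting there rather than in the main loop.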
For the purpose of the present work, the incoming network message handling routines are the important part. These are mainly located in a single software component, which handles all incoming events. This component consists of a function containing the main loop (which receives the network messages) and several message-specific functions, invoked from the main loop, that handle the incoming messages. While it is possible to intercept the messages in the main loop, it is much easier to do so in the specific message handling routines. There are two major reasons for this:

• The message type is already implicitly given by the call of the function.
• Message-specific information is provided automatically, without the need to write extra parsing code. For instance, in the case of a piece request message being intercepted in the main loop, it would have been necessary to parse the incoming message to find information such as piece number and subpiece index.

1 Version 3.4.1, released on March 11, 2004.

Before saving the ejected log messages to disk, they are compressed using the zlib library [52]. This is beneficial with regard to both disk storage and the amount of disk I/O performed. The CPU overhead of the compression is practically negligible on the measurement computers. Finally, extra parameters have been added to the application to allow changing the filename of the log file, along with code to automatically generate a date- and time-stamped filename if none is given.

4.5 Log Formats

Selecting a log format that provides a suitable amount of information is an important issue. It is important to capture enough information to make relevant statistical analysis possible, while at the same time keeping the sizes of the log files at a manageable level.
This problem is most noticeable when designing a log format for application logs, as it is not possible to re-run a specific measurement a second time if one has chosen too small a subset of metrics to log. Packet captures are less affected by this, but are not impervious to similar effects in the case of, for instance, a capture size too small for the recorded packets, which loses parts of the payload data. In both cases, information is irretrievably lost. Complete packet captures that contain all data transmitted on a link may be used to re-generate log files as needed. This is, however, often a very time-consuming process, and it is preferable to avoid it whenever possible.

4.5.1 BitTorrent XML Log Format

The eXtensible Markup Language (XML) [75] has a number of attractive features that make it a good choice as a log format. XML is by concept and design made to be easily parsable by a computer, while at the same time being at least semi-readable by humans. Some of the salient advantages of using XML as a log format are:

Parsability There are several XML parsing libraries available for a plethora of languages, including, but not limited to, perl, C, C++, python and MATLAB. This makes writing log parsers much easier, since it is not necessary to write an application-specific parser for the log format.

Extensibility It is easy to add new log fields, and new log fields do not necessitate changing the parser. This is very useful when deciding what information goes into the log, as fields may be added and removed easily.

Validation The number and types of fields are easily verifiable, and this check is usually performed as part of the XML validation process provided by the parsing library.

Two drawbacks of using XML as a log format are that the parsing is slightly slower, and the memory requirements substantially higher, than with an application-specific parser.
In particular, it is rarely possible to use Document Object Model (DOM) parsers to parse the log files. These parsers maintain a representation of the entire XML file in memory and, with log files in the gigabyte range, the amount of memory required is substantial. Simpler parsers, such as Simple API for XML (SAX) parsers, are therefore used. These parse the document on an element-by-element basis, removing the need to keep the entire document in memory. This solution unfortunately also means that the transformation capabilities provided by eXtensible Stylesheet Language Transformations (XSLT) cannot be used, and specific software making use of the provided SAX parsers must be created. A third drawback is that using XML adds metadata, which in turn makes the storage requirements for the logs higher.

XML documents are text documents comprised of elements and attributes. Attributes are contained within the elements, and usually carry element-specific information and modifiers. The XML document type used for the BitTorrent log files comprises only two elements: EVENTLIST and EVENT. The EVENTLIST element carries information regarding the torrent-file used for the measurement and the settings that were used for the BitTorrent client during the measurement session. Figure 4.3 shows two excerpts from such an XML document.

Every EVENT element contains the attributes type and timestamp. The timestamp attribute signifies the time at which this event was ejected to the log file, expressed as a UNIX timestamp, i.e., the number of seconds elapsed since 00:00:00 UTC, January 1, 1970. The type field denotes the event type. The various values for the type-attribute are:

announce The only tracker-related event type available. It is ejected into the log file when the peer communicates with the tracker to request more peers. This element carries the following attributes:

uploaded Denotes the number of subpiece bytes this peer has sent to other peers since it was launched.
downloaded Denotes the number of subpiece bytes this peer has received from other peers since the client was launched.

left Denotes the number of bytes of the resource that remain to be downloaded.

last This parameter is undocumented in both the official protocol specification and the Wiki.

trackerid Used by the tracker for maintaining state.

event Is one of started, none or completed. The value started should be used when sending the initial tracker announce message, and only then. The value none is used when transmitting the periodic updates to the tracker, while the value completed is sent exactly once to the tracker, when the download is complete.

numwant Denotes the number of new peer addresses the peer is requesting from the tracker.

start_dl This element is ejected for every newly initiated TCP connection to a peer. Note that it does not necessarily imply that the BitTorrent handshake will be completed.

connect This element is ejected after every completed BitTorrent handshake.

unchoke, choke, interested, not interested, request, piece, have, cancel These element types are ejected for each sent or received corresponding BitTorrent protocol message.

send The send-element is the equivalent of the piece-message, but for the subpieces the local peer transmits.

done This element is ejected once a download completes fully, and should only appear once per log file.

<EVENT type="announce" timestamp="1084428217.041880" uploaded="0" downloaded="0" left="661127168" last="None" trackerid="None" event="started" numwant="50"/>
<EVENT type="start_dl" timestamp="1084428218.717552" src="220.233.6.19" port="6881" nconns="1"/>
<EVENT type="connect" timestamp="1084428218.717867" src="220.233.6.19" srcid="-AZ2084-v8j2jYQi0GOq" nconns="1"/>
<EVENT type="interested" timestamp="1084428219.429775" dst="220.233.6.19" dstid="-AZ2084-v8j2jYQi0GOq" nconns="1"/>
<EVENT type="request" timestamp="1084428229.588833" dst="220.233.6.19" dstid="-AZ2084-v8j2jYQi0GOq" piece="688" begin="0" nconns="2"/>
<EVENT type="piece" timestamp="1084428235.373197" src="220.233.6.19" srcid="-AZ2084-v8j2jYQi0GOq" piece="688" begin="0" length="16384"/>
. . .
<EVENT type="have" direction="out" timestamp="1084430940.120433" piece="1762" nconns="15" down="661110784"/>
<EVENT type="announce" timestamp="1084430940.133816" uploaded="22118400" downloaded="661127168" left="0" last="None" trackerid="None" event="completed" numwant="50"/>

Figure 4.3: Extract from BitTorrent XML log file.

The various peer-related event types carry event-specific information in additional attributes. These attributes are:

src, dst These attributes indicate the source and destination IP address of the sending or receiving peer, respectively. Valid for all event types.

srcid, dstid These attributes indicate the peer ID of the sending or receiving peer, respectively. The content of these attributes is encoded using the python functions repr and xml.saxutils.escape.
The function repr returns a unique string representation of its input parameter, and the escape function returns an XML-escaped version of its input. Recall that the peer ID is a binary 20-byte value. The peer ID is first processed by the repr function to convert any non-printable character to its python hexadecimal representation, i.e., the characters \x followed by the hexadecimal value. This string is then made into a valid XML attribute by the xml.saxutils.escape function, i.e., by converting XML special characters such as & to &amp;, with the exception of the quotation character, ", which is encoded using the python hexadecimal encoding (\x22). For a complete list of XML entity encodings see [75]. Valid for all event types except start_dl.

piece Denotes which piece a specific message refers to. Valid for types piece, cancel, send, have and request.

begin Starting byte of a subpiece reference. Used together with the length parameter to denote a specific subpiece. Valid for types piece, cancel, send and request.

length Number of content data bytes received or sent in a single piece message. Valid for types piece, send and cancel.

down Denotes the number of downloaded and SHA1-verified bytes. Only valid for type have.

nconns This attribute denotes the number of currently connected peers at the time of event ejection. This includes both locally and remotely initiated connections. Valid for all event types.

port Indicates the TCP port of the remote peer. Valid for type start_dl only.

direction Only valid for types have and bitfield. Used for differentiating between sent and received messages of these types. If the attribute is present and contains the value out, the message was sent by the measurement peer; otherwise it was received.

txtime The difference in time between the sending of the first subpiece of a piece and the reception of the last subpiece of the piece.
rxtime The difference in time between the first request of a piece and the reception of the last subpiece of the piece.

4.5.2 Common Log Formats

To facilitate the re-use of parsing software, it was decided, as part of the measurement infrastructure, that parsed traces should be written to log files that adhere to a common format. This format is defined as follows:

• Fields are separated by spaces (ASCII code 32).
• Fields are defined as:
  1. The first field should always be the UNIX timestamp for the event.
  2. The second field should contain the message type, if any.
  3. Any following field may contain arbitrary information.

This is a simple and flexible logging scheme that allows us to use standard UNIX tools such as sed, awk and perl to parse the files without resorting to writing specialized parsers for each log file. In fact, the parsing software only assumes that the first field is a UNIX timestamp, and that the rest of the fields are arbitrary. It is however recommended that the second field is a message type. Figure 4.4 shows a portion of a log file generated from a BitTorrent application log.

1088079499.265901 piece
1088079499.267193 request 1311 196608
1088079499.269075 have 593
1088079499.275710 piece
1088079499.282697 have 6690

Figure 4.4: Sample BitTorrent log file.

Chapter 5

BitTorrent

The reported measurements have been done by having instances of the BitTorrent client software join several distribution swarms. An instrumented version of the reference BitTorrent client has been used, to avoid potentially injecting non-standard protocol messages into the swarm. The client was instrumented to log all incoming and outgoing protocol messages together with a UNIX timestamp. The BitTorrent client is implemented in python, an interpreted programming language. The drawback of this is that the accuracy of the timestamps is reduced compared to the actual arrival times of the carrying IP datagrams.
By comparing the actual timestamps of back-to-back messages at the application level with those of the corresponding TCP segments, we have found that the accuracy is approximately 10 ms. Most of the traffic reported in this publication was collected over a three-week period at two measurement points in Blekinge, Sweden. The first measurement point was the networking lab at BTH, Karlskrona, which is connected to the Internet through a 100 Mbps Ethernet network. The second measurement point was placed at a local ISP with a 5 Mbps link. Both measurement points were running the Gentoo Linux operating system on standard PC hardware.

For the initial measurements, twelve measurements have been done, each of them with a duration of two to seven days (Table 5.1). This first set of measurements was done purely with the instrumented client. An additional measurement, with both application logging and packet capturing running simultaneously, has also been performed, for a total of thirteen measurements. At the first measurement point, no other significant software was running simultaneously with the BitTorrent client. At the second measurement point, the BitTorrent client was running as a normal application, together with other software such as web browsers and mail software. The first measurement point can thus be viewed as a dedicated BitTorrent client, while the second corresponds to normal desktop PC usage patterns.

5.1 Measurement details

Measurements 1 through 3 (Table 5.1) have been done with a single instance of the instrumented BitTorrent client running. As TCP is known to be very aggressive in using the network, this was done to minimize the effects of several clients competing for the available bandwidth and to establish a point of reference for the rest of the client sessions. Measurements 4 through 8 were started and run simultaneously, as were measurements 11 and 12.
The other measurements were done with some temporal overlap, as shown in Figure 5.1.

Table 5.1: Measurement summary.

Number  Records    Start       Duration          Location
1       10770695   2004-05-03  2 days, 20 hours  BTH
2       10653466   2004-05-06  3 days, 19 hours  BTH
3       10990569   2004-05-12  4 days, 4 hours   BTH
4       12567283   2004-05-17  7 days            BTH
5       13691459   2004-05-17  7 days            BTH
6       11754838   2004-05-17  7 days            BTH
7       1943636    2004-05-17  7 days            BTH
8       7321166    2004-05-17  7 days            BTH
9(a)    687046     2004-05-13  3 days, 7 hours   ISP
10      2881803    2004-05-18  5 days, 23 hours  ISP
11      9252170    2004-05-22  7 days            ISP
12      5599997    2004-05-22  7 days            ISP
13      14803678   2004-06-26  7 days            BTH

(a) Unfortunately, the original data for this measurement was lost due to hardware failure. Thus, most analysis is not performed on this data, and only summary statistics are provided.

Figure 5.1: Temporal structure of measurements (timeline from May 1 to May 29).

An important issue regarding traffic measurements in P2P networks is copyright, as the most popular content in these networks is usually copyrighted material. To circumvent this problem, we joined BitTorrent swarms distributing several popular Linux operating system distributions. Notably, we joined both the RedHat Fedora Core 2 (FC2) test and release versions. The FC2 'Tettnang' version was released on May 18th, while the rest of the content was available at the start of the measurements. This gave us a unique opportunity to study the dynamic nature of the FC2 swarms. The contents of the measured swarms are reported in Table 5.2. Two of the swarms have been measured from both measurement points to allow for comparisons, one with temporal overlap and another without.

5.2 Aggregate results

In this section we report some of the more salient results obtained from our measurements. We first summarise the download times and rates in Table 5.3.
It is observed that the time before our peer went into seeding mode varies from roughly 20 minutes up to 6.5 hours. As the content sizes vary with each measurement, we also provide the average download rate for the entire content, i.e., the size/time ratio. The download rates also show a large disparity, ranging from roughly 120 kB/s to over 1.3 MB/s, with the first three measurements clearly being the most demanding in terms of bandwidth utilization.

Table 5.2: Content summary.

Content                                     Pieces  Size    Measurement
RedHat Fedora Core 2 test3 CD Images        8465    2.2 GB  1–3
RedHat Fedora Core 2 test3 DVD Image        16708   4.3 GB  6, 10
Slackware Linux Install Disk 1              2501    650 MB  4
Slackware Linux Install Disk 2              2627    670 MB  5
Dynebolic Linux 1.3                         2522    650 MB  7, 9
Knoppix Linux 3.4                           2753    700 MB  8
RedHat Fedora Core 2 'Tettnang' CD Images   8719    2.2 GB  12, 13
RedHat Fedora Core 2 'Tettnang' DVD Image   16673   4.3 GB  11

Table 5.3: Download time and average download rate summary.

#   Download Time (s)  Rate (bytes/s)
1   1930               1149520
2   1932               1147908
3   1681               1319445
4   2607               251424
5   3397               202644
6   23000              190416
7   1237               534282
8   6005               120153
9   2723               242776
10  23475              186570
11  19431              224927
12  9106               250989
13  2951               774420

A summary of session sizes and durations is reported in Table 5.4. We also provide the number of sessions, unique peer IPs and peer client IDs. Measurement 6 clearly stands out here, both with regard to mean session size and session length. Also, the maximum session size for this measurement is over twice that of any of the other measurements. The mean session size is also about twice that of the corresponding measurement of the same content (measurement 10). As measurements 6 and 10 have the top two session sizes, it is probable that the session size is related to the total content size (4.3 GB). The minimum session lengths are all 0, indicating that they are all shorter than the accuracy provided for by the application logs.
These very short sessions are also indicated in the minimum session sizes, and correspond to sessions containing only a handshake or an interrupted handshake. Another pertinent feature is the ratio of the number of unique IPs to the number of unique peer IDs for measurement 8. The ratio for that measurement is roughly 0.25, while none of the other measurements are below 0.5. This might indicate either users stopping and restarting their clients several times, or users sharing IPs, such as peers behind NAT.

Table 5.4: Session and peer summary.

#   Sessions  Session length (s)           Session size (MB)               Peers(a)
              Mean  Max     Min  Std       Mean   Max      Min(b)  Std     ID    IP
1   29712     343   98991   0    2741      27.49  647.26   73      70.65   2024  1314
2   46022     233   117605  0    2316      27.15  646.03   73      64.05   1876  1394
3   28687     465   171074  0    3614      28.54  539.20   73      61.70   1913  1319
4   13493     750   143707  0    3942      49.88  671.99   73      100.65  1813  1143
5   12354     910   180298  0    4504      57.08  668.53   73      116.10  1747  962
6   10685     1207  223235  0    7016      74.25  3117.79  73      247.74  1033  619
7   4444      218   46478   0    1642      49.96  431.13   78      76.48   279   184
8   17287     231   87026   0    1972      33.11  695.94   73      109.31  1656  406
9   3043      294   29163   0    1719      21.62  408.05   78      42.27   193   166
10  9701      652   267497  0    5907      37.78  1499.85  73      109.08  444   305
11  43939     448   141509  0    3791      17.22  475.86   73      52.73   1841  1067
12  68288     197   292241  0    2580      8.31   987.89   73      30.63   2177  1152
13  52833     465   483996  0    4036      32.2   1652.83  73      99.4    3930  2440

(a) Unique peer client IDs and IP addresses.
(b) This column is measured in bytes.

Table 5.5 summarises the number of messages received, per message type. In addition, the "new conn." column shows the number of incoming connection requests collected in our measurements. The request and have messages clearly dominate in terms of number of messages sent, while the interested and not interested messages are the least common.
This is valid for all the measurements, except for measurement 2, which has almost 5 times more incoming interested messages than the measurement with the second highest number of interested messages. The high number of request and have messages found in our measurements is expected, as the peer is acting as a seed for most of the time spent in the swarm. When seeding, a peer never receives piece messages, and downloading peers must request data with request messages, thus explaining their high number. The have messages are accounted for by the fact that every completed piece download results in such a message being transmitted.

Table 5.5: Downstream protocol message summary.

#   request  not int.  piece   new conn.  bitfield  unchoke  have     int.   choke  cancel
1   3316470  504       135615  29746      28024     27120    3651835  2905   26314  6500
2   3044768  489       135797  46047      45054     19117    3984881  14602  18061  9059
3   3276644  493       135682  28714      27092     40705    3941658  2430   39955  7628
4   5596270  406       40167   13502      12935     29628    1206000  2041   28640  14643
5   6163605  401       42176   12364      11827     32325    1197813  2059   31452  11508
6   4501907  191       277261  10688      9659      24239    2090892  2147   23639  6244
7   810019   52        40371   4445       4370      290      198885   230    122    1255
8   3347256  766       44328   17292      16623     9270     404038   2012   8579   18999
9   217336   37        40426   3045       2996      1114     139472   259    956    3061
10  838379   79        268429  9703       9181      13015    570367   692    11936  9085
11  1835910  470       268575  43957      42848     54090    4713440  2573   52458  17313
12  1118110  348       139943  68297      67373     37925    2619333  3242   36872  25047
13  8113100  711       139702  52865      50304     60524    6293438  9477   58925  24632

The summary of the outgoing messages in Table 5.6 again shows the very low number of interested and not interested messages. The major bulk of the outgoing messages is, however, accounted for by the piece messages. This is again an expected result, as request messages generate a piece message in response.
The absence of transmitted choke messages for measurement 7 indicates that there has been a continuous exchange of data between peers. As for the request and have messages, these are tightly coupled to the number of pieces present in the content. The number of request messages is higher because each request message corresponds to only a single subpiece.

Table 5.6: Upstream protocol message summary.

#   request  piece    not int.  unchoke  bitfield  int.  have   choke  cancel
1   137007   3251948  63        11792    29714     68    8465   9553   970
2   137271   2964836  63        17471    46020     70    8465   13301  894
3   136738   3189175  62        16545    28682     64    8465   14085  1011
4   42709    5468908  76        25476    13489     86    2501   22740  855
5   44862    6032599  146       25759    12353     157   2627   23749  725
6   291200   4394389  91        23166    10661     197   16708  18943  555
7   40497    808844   18        4445     4444      18    2522   0      140
8   47413    3296616  100       19380    17281     136   2753   8672   423
9   40906    213693   16        3192     3042      19    2522   193    220
10  285650   753074   71        21304    9673      214   16708  15222  611
11  281921   1660868  67        35698    43927     157   16673  31279  812
12  145517   960802   76        49093    68271     125   8719   34570  701
13  141316   7940342  80        27332    52830     97    8719   23527  807

5.3 Swarm size dynamicity

After having downloaded all the data for a given torrent, the peer disconnects all connected seeds and starts acting as a seed itself. It is interesting to compare the seed phases of measurements 6, 10 and 11. The data in the first two was the test release of the RedHat Fedora Core 2 Linux distribution, while the data in the last swarm was the final release of the same version. The final version was released on May 18. This event can be clearly seen at around 12:00 in Figure 5.2(b), at which time peers start disconnecting, most likely due to the new version of the distribution being released. The decrease in connected peers can also be seen in Figure 5.3(b).
Figure 5.2: Swarm size for measurement 6 ((a) leeching phase, May 17; (b) seeding phase, May 17–25).

Figure 5.3: Swarm size for measurement 10 ((a) leeching phase, May 18; (b) seeding phase, May 18–25).

Chapter 6

Gnutella

Two sets of Gnutella measurements were performed at BTH. The first set was done in the early development phase of the measurement infrastructure (spring 2004), while the second set of measurements was performed one year later. During that year the Gnet underwent major changes in terms of new protocol features. Functionality such as dynamic query, download mesh, PFS and UDP file transfers was adopted by a majority of servents, thus profoundly changing the characteristics of Gnutella network traffic. The research group at BTH has therefore decided to focus the analysis efforts on the more recent measurements, since they were deemed to better reflect the current and future directions of Gnutella traffic characteristics. As a consequence, this report presents the recent set of measurements. All results presented here were obtained from an 11-day-long link-layer packet trace collected at BTH with the measurement infrastructure presented in Section 4.2. The Gnutella application flow reassembly was performed as described in Section 4.3. The gtk-gnutella open source servent was configured to run as an ultrapeer and to maintain 32–40 connections to other ultrapeers and 100 connections to leaf nodes.
The number of connections is the vendor-preconfigured value, which is close to the values suggested in [48, 47]. Although gtk-gnutella is capable of operation over UDP, this functionality was turned off. Consequently, the ultrapeer used only TCP for its traffic. No other applications, with the exception of an SSH daemon, were running on the ultrapeer for the duration of the measurements. One SSH connection was used to regularly check the status of the measurements and the amount of free disk space. The SSH connection was idle for most of the time. The firewall was turned off during the measurements.

6.1 Session Statistics

A Gnutella session is defined here as the set of Gnutella messages exchanged over a TCP connection between two directly connected peers that have successfully completed the Gnutella handshake. The session lasts until the TCP connection is closed by either FIN or RST TCP segments. The session duration is computed as the time from the instant when the first handshake message (CLI HSK) is recorded (at the link layer) until the measured time of the last Gnutella message on the same TCP connection. The session is not considered closed until both sides have sent FIN (or RST) segments. An incoming session is defined as a session for which the CLI HSK message was received by the ultrapeer at BTH. Outgoing sessions are sessions for which the CLI HSK message was sent by the ultrapeer at BTH. Table 6.1 and Table 6.2 show the duration (in seconds), the number of exchanged messages and the number of bytes for incoming and outgoing sessions, respectively. Table 6.3 shows the same statistics when no distinction is made between incoming and outgoing sessions. In the column denoted "Samples", the first number shows the number of valid Gnutella sessions used to compute the statistics.
A Gnutella session is considered valid (in the sense that it is used to compute session statistics) if the Gnutella handshake was completed successfully and at least one Gnutella message was transferred between the two hosts participating in the session. The number in parentheses is the total number of observed sessions, valid and invalid. In this case, only 30% of all sessions were valid (31.5% and 13.4% when considering only incoming and outgoing sessions, respectively).

Table 6.1: Incoming session statistics.

Type          Max                 Min   Mean    Median  Std       Samples
Duration (s)  767553 (8.9 days)   0.03  517.30  0.86    6780.99   173711 (551168)
Messages      7561532 (7.6M)      4     585.18  11      22580.99  173711 (551168)
Bytes         535336627 (535.3M)  780   53059   1356    2034418   173711 (551168)

Table 6.2: Outgoing session statistics.

Type          Max                 Min   Mean      Median    Std       Samples
Duration (s)  470422 (5.4 days)   0.12  3949.86   2459.10   11170.80  7094 (52904)
Messages      2644660 (2.6M)      6     23145.15  15716.50  58627.75  7094 (52904)
Bytes         182279191 (182.3M)  1574  2173564   1457360   4458468   7094 (52904)

Table 6.3: Incoming + outgoing session statistics.

Type          Max                 Min   Mean     Median  Std       Samples
Duration (s)  767553 (8.9 days)   0.03  651.98   0.87    7036.85   180805 (604072)
Messages      7561532 (7.6M)      4     1470.34  11      25375.64  180805 (604072)
Bytes         535336627 (535.3M)  780   136258   1357    2219411   180805 (604072)

The tables show that outgoing sessions transfer about 40 times more data than incoming sessions. Furthermore, it appears that for incoming sessions, a few sessions transfer the majority of the data. This can be seen by comparing the mean and median values for messages and bytes. The explanation can be found by comparing the mean and median duration values for incoming sessions: it can be observed that most incoming sessions have a very short duration (< 1 second). Currently, no good reason could be found for this behavior. Most connections were terminated with a BYE message with code 200 Node Bumped.
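The session duration and validity rules above can be sketched in a few lines of Python. The record layout assumed here (a list of (timestamp, message type) pairs per TCP connection, starting with the CLI HSK message) is an illustration of ours, not the layout used by the actual analysis software.

```python
# Sketch of the per-session statistics described in Section 6.1.
# Hypothetical record layout: a list of (timestamp, msg_type) pairs
# for one TCP connection, whose first entry is the CLI HSK message.

def session_duration(messages, handshake_completed):
    """Return the session duration in seconds for a valid session,
    or None for an invalid one.

    A session counts as valid only if the Gnutella handshake
    completed and at least one further message was transferred.
    """
    if not handshake_completed or len(messages) < 2:
        return None                          # invalid: excluded
    return messages[-1][0] - messages[0][0]  # last message - CLI HSK

msgs = [(0.00, "CLI HSK"), (0.40, "SER HSK"), (0.86, "QUERY")]
print(session_duration(msgs, True))                 # 0.86
print(session_duration([(0.0, "CLI HSK")], True))   # None
```

Applying this filter over all observed connections yields the valid-session counts reported in the "Samples" columns.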
6.2 Message Statistics

Table 6.4 displays the message size statistics for each Gnutella message type. The type UNKNOWN denotes messages with a valid Gnutella header but an unrecognized message type; such messages are either experimental or corrupted. The type ALL is used for statistics computed over all messages, irrespective of type. It can be observed that, on average, QUERY HIT and QRP messages have the largest size. They are closely followed by handshake messages, where the capability headers account for most of the data. It is interesting to notice that the maximum size of QUERY HIT messages is 39 kB, which is an order of magnitude greater than the 4 kB specified by [46].

Table 6.4: Message size statistics.

Type       Max    Min  Mean    Median  Std      Samples
CLI HSK    696    22   336.91  328     65.69    604072
SER HSK    2835   23   386.83  369     145.69   597896
FIN HSK    505    23   107.92  76      88.55    212162
PING       34     23   25.48   23      3.88     4151799
PONG       464    37   74.96   61      38.68    43727188
QUERY      376    26   70.17   55      46.40    129078986
QUERY HIT  39161  58   590.28  358     1223.58  13242329
QRP        4124   29   608.60  540     596.70   1158596
HSEP       191    47   70.39   71      28.15    538834
PUSH       49     49   49.00   49      0.00     63040718
BYE        148    35   40.02   37      15.84    167726
VENDOR     177    31   36.45   33      19.51    10195389
UNKNOWN    43     23   23.53   23      3.24     38
ALL        39161  22   93.45   49      303.26   266715733

The message duration statistic can be useful to infer waiting times at the application layer when a message is divided across two or more TCP segments. The statistic is defined as the time difference between the first and last TCP segments that were used to transport the message. When a message uses only one TCP segment, the time duration for that message is zero.

Table 6.5: Message duration statistics in seconds (100 µs resolution).
Type       Max        Min  Mean    Median  Std     Samples
CLI HSK    349.3015   0    0.0308  0       1.0412  604072
SER HSK    52.2645    0    0.0032  0       0.1350  597896
FIN HSK    68.6295    0    0.0057  0       0.2838  212162
PING       251.2914   0    0.0273  0       0.6309  4151799
PONG       2355.8650  0    0.0077  0       0.5881  43727188
QUERY      2355.8650  0    0.0035  0       1.3271  129078986
QUERY HIT  480.8159   0    0.0243  0       1.0260  13242329
QRP        753.1904   0    0.1883  0       1.6019  1158596
HSEP       74.0482    0    0.0017  0       0.2186  538834
PUSH       135.5155   0    0.0023  0       0.2017  63040718
BYE        148.7292   0    0.0386  0       0.5194  167726
VENDOR     391.3439   0    0.0117  0       0.2451  10195389
UNKNOWN    1.0418     0    0.2995  0       0.4294  38
ALL        2355.8650  0    0.0065  0       0.9968  266715733

From the median column in Table 6.5 it can be observed that at least 50% of the messages require just one TCP segment. The PONG and QUERY message rows contain extreme values for the maximum duration, 2355.9 seconds (approximately 39 minutes). These values are probably the result of malfunctioning or experimental Gnutella servents. Table 6.6 shows interarrival times for messages received by the BTH ultrapeer and Table 6.7 shows interdeparture times for messages sent by the BTH ultrapeer. Summing the number of samples over the individual message types does not add up to the value shown for message type ALL. The reason is that the analysis software ignores messages that generate negative interarrival/interdeparture times. Such negative times appear because the application flow reassembly handles several (typically more than one hundred) connections at the same time. On each connection the timestamps of arriving packets are monotonically increasing. However, the interarrival/interdeparture statistics presented here are computed across all connections. To ensure monotonically increasing timestamps even in this case, new messages from arbitrary connections are stored in a buffer which is sorted by timestamp. The size of the buffer is limited to 500000 entries due to memory management issues.
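The timestamp-sorted buffer described above can be sketched with a bounded min-heap. This is a simplified illustration under our own assumptions (record layout, drop policy, and a tiny capacity for demonstration; the text cites 500000 entries for the actual software).

```python
import heapq

def reorder(arrivals, capacity=4):
    """Merge messages arriving from many connections into one stream
    with monotonically increasing timestamps.

    Messages (timestamp, conn_id) are buffered in a min-heap of
    bounded size; whenever the buffer overflows, the oldest message
    is emitted. A message arriving with a timestamp older than the
    last emitted one would yield a negative interarrival time, so it
    is discarded, mirroring the behavior described in the text.
    """
    heap, emitted, dropped = [], [], 0
    last_ts = float("-inf")
    for ts, conn_id in arrivals:
        if ts < last_ts:
            dropped += 1                 # arrived too late: discard
            continue
        heapq.heappush(heap, (ts, conn_id))
        if len(heap) > capacity:
            oldest = heapq.heappop(heap)
            emitted.append(oldest)
            last_ts = oldest[0]
    while heap:                          # drain the buffer at the end
        emitted.append(heapq.heappop(heap))
    return emitted, dropped

arrivals = [(1.0, "a"), (3.0, "b"), (2.0, "a"), (5.0, "b"), (4.0, "a"), (0.5, "c")]
print(reorder(arrivals, capacity=2))
# ([(1.0, 'a'), (2.0, 'a'), (3.0, 'b'), (4.0, 'a'), (5.0, 'b')], 1)
```

A larger buffer tolerates longer reordering delays at the cost of memory, which is the trade-off discussed next.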
Table 6.20 shows that on average there are about 280 incoming and outgoing messages per second. This means that the buffer can store about 30 minutes of average traffic, and much less during traffic bursts. If there are messages that are delayed (see Table 6.5) due to TCP retransmissions or other events, they will reach the buffer too late and will be discarded.

Table 6.6: Message interarrival time statistics (100 µs resolution).

Type       Max         Min     Mean       Median  Std        Samples
CLI HSK    28.4591     0.0001  1.7246     1.1256  1.8644     551148
SER HSK    5185.0490   0.0001  19.6294    0.2090  92.1849    48432
FIN HSK    1118.9920   0.0001  5.3165     2.1942  19.4800    178783
PING       13.5871     0.0001  0.2762     0.1931  0.2726     3457169
PONG       2.2624      0.0001  0.1404     0.0979  0.1383     9086918
QUERY      1.4514      0.0001  0.0343     0.0240  0.0340     59010007
QUERY HIT  19.2778     0.0001  0.1842     0.0976  0.2661     6932327
QRP        50.0632     0.0001  2.0475     1.0534  2.8707     478451
HSEP       1780.4420   0.0003  6.1560     4.3834  8.4758     154742
PUSH       40.1396     0.0001  0.0677     0.0405  0.1157     24934450
BYE        1119.5930   0.0001  5.9160     2.3591  22.3494    160695
VENDOR     30.8037     0.0001  0.4346     0.2207  0.5993     9669915
UNKNOWN    51576.8600  3.0680  2075.3190  6.9379  9298.3600  35
ALL        9.8299      0.0001  0.02436    0.0169  0.0243     114663084

Table 6.7: Message interdeparture time statistics (100 µs resolution).
Type       Max        Min     Mean      Median   Std       Samples
CLI HSK    5189.2340  0.0002  17.9655   0.1273   88.8506   52902
SER HSK    28.4595    0.0003  1.7298    1.1287   1.8712    549456
FIN HSK    5185.5150  0.0006  28.4784   0.3305   110.2372  33373
PING       20.5910    0.0001  1.3773    0.5077   2.1342    694550
PONG       2.7215     0.0001  0.1573    0.1012   0.1682    34639367
QUERY      12.1151    0.0001  0.0295    0.0003   0.0541    70066326
QUERY HIT  19.2818    0.0001  0.2188    0.1285   0.2885    6309719
QRP        603.3599   0.0001  2.6350    0.0004   19.8572   680103
HSEP       358.3067   0.0001  2.5020    1.4089   5.8293    384084
PUSH       76.5303    0.0001  0.0429    0.0003   0.1713    38105019
BYE        3849.4550  0.0001  134.8121  77.2090  187.7784  7033
VENDOR     64.6689    0.0001  1.8253    1.1124   2.4838    525269
UNKNOWN    N/A        N/A     N/A       N/A      N/A       1
ALL        1.5450     0.0001  0.0178    0.0003   0.0353    152047214

The large interarrival and interdeparture times for handshake messages occur because once a servent reaches the preset number of connections, it will no longer accept or attempt new connections until one or more of the existing connections is closed. This behavior would also explain the large interarrival and interdeparture times for BYE messages.

6.3 Transfer Rate Statistics

This section presents transfer rates in bytes/second and in messages/second for all Gnutella message types considered in this report. Each statistic is computed from 950568 samples, which is equivalent to approximately 11 days.

Table 6.8: Handshake message rate statistics.

Type     Dir  Max  Min  Mean  Median  Std
CLI HSK  IN   12   0    0.58  0       0.79
CLI HSK  OUT  30   0    0.06  0       0.56
SER HSK  IN   20   0    0.05  0       0.48
SER HSK  OUT  12   0    0.58  0       0.79
FIN HSK  IN   9    0    0.19  0       0.46
FIN HSK  OUT  18   0    0.04  0       0.34

Table 6.9: Handshake byte rate statistics.

Type     Dir  Max    Min  Mean  Median  Std
CLI HSK  IN   4126   0    187   0       258
CLI HSK  OUT  14519  0    27    0       273
SER HSK  IN   12507  0    31    0       289
SER HSK  OUT  4001   0    212   0       306
FIN HSK  IN   982    0    15    0       42
FIN HSK  OUT  4474   0    9     0       94

Table 6.10: PING–PONG message rate statistics.
Type  Dir  Max  Min  Mean   Median  Std
PING  IN   72   0    3.64   3       1.94
PING  OUT  17   0    0.73   0       1.56
PONG  IN   130  0    9.56   9       4.33
PONG  OUT  433  0    36.44  36      19.12

Table 6.11: PING–PONG byte rate statistics.

Type  Dir  Max    Min  Mean  Median  Std
PING  IN   1665   0    92    92      50
PING  OUT  503    0    19    0       45
PONG  IN   17043  0    1213  1173    541
PONG  OUT  26050  0    2235  2162    1179

Table 6.12: QUERY–QUERY HIT message rate statistics.

Type       Dir  Max  Min  Mean   Median  Std
QUERY      IN   347  0    62.08  60      19.64
QUERY      OUT  875  0    73.71  69      34.08
QUERY HIT  IN   531  0    7.29   5       9.82
QUERY HIT  OUT  272  0    6.64   5       7.39

Table 6.13: QUERY–QUERY HIT byte rate statistics.

Type       Dir  Max      Min  Mean  Median  Std
QUERY      IN   24101    0    4441  4317    1426
QUERY      OUT  46424    0    5088  4702    2511
QUERY HIT  IN   1736791  0    4868  1912    23917
QUERY HIT  OUT  360235   0    3355  1837    5229

Table 6.14: QRP and HSEP message rate statistics.

Type  Dir  Max  Min  Mean  Median  Std
QRP   IN   45   0    0.50  0       0.98
QRP   OUT  283  0    0.72  0       7.18
HSEP  IN   20   0    0.16  0       0.41
HSEP  OUT  23   0    0.40  0       0.68

Table 6.15: QRP and HSEP byte rate statistics.

Type  Dir  Max     Min  Mean  Median  Std
QRP   IN   47340   0    389   0       1408
QRP   OUT  152820  0    353   0       3660
HSEP  IN   940     0    8     0       21
HSEP  OUT  2185    0    32    0       58

Table 6.16: PUSH and BYE message rate statistics.

Type  Dir  Max   Min  Mean   Median  Std
PUSH  IN   1068  0    26.23  23      19.34
PUSH  OUT  4091  0    40.09  32      37.32
BYE   IN   40    0    0.17   0       0.43
BYE   OUT  118   0    0.01   0       0.15

Table 6.17: PUSH and BYE byte rate statistics.

Type  Dir  Max     Min  Mean  Median  Std
PUSH  IN   52332   0    1285  1127    948
PUSH  OUT  200459  0    1964  1568    1829
BYE   IN   1720    0    6     0       16
BYE   OUT  4956    0    1     0       11

Table 6.18: VENDOR and UNKNOWN message rate statistics.

Type     Dir  Max   Min  Mean   Median  Std
VENDOR   IN   6385  0    10.17  1       76.17
VENDOR   OUT  24    0    0.55   0       0.80
UNKNOWN  IN   1     0    0.00   0       0.01
UNKNOWN  OUT  1     0    0.00   0       0.00

Table 6.19: VENDOR and UNKNOWN byte rate statistics.
Type     Dir  Max     Min  Mean  Median  Std
VENDOR   IN   210702  0    347   33      2514
VENDOR   OUT  2197    0    44    0       81
UNKNOWN  IN   23      0    0     0       0.1
UNKNOWN  OUT  43      0    0     0       0.1

Table 6.20: Gnutella (all type) message rate statistics.

Dir  Max   Min  Mean    Median  Std
IN   6471  0    120.63  111     84
OUT  4164  0    159.96  153     61

Table 6.21 shows the aggregate transfer rates for all message types. Table 6.22 provides the summary statistics for the IP byte rates. It is interesting to note that the mean and median IP byte rates are very similar to the corresponding Gnutella byte rates shown in Table 6.21. These values alone would indicate that the compression of Gnutella messages does not yield large gains. However, if one takes into consideration the maximum and standard deviation values, it can be observed that compression removes much of the burstiness from the application layer, leading to smoother traffic patterns. This effect can also be seen by comparing Figure 6.1 to Figure 6.2.

Table 6.21: Gnutella (all type) byte rate statistics.

Dir  Max      Min  Mean   Median  Std
IN   1745341  0    12883  10113   24287
OUT  370825   0    13338  12062   7624

Table 6.22: IP byte rate statistics.

Dir  Max     Min  Mean   Median  Std
IN   249522  0    11536  10961   4075
OUT  176986  0    12668  12037   5722
Figure 6.1: Gnutella transfer rates, July 1–13 ((a) incoming, (b) outgoing, (c) incoming + outgoing Gnutella byte rate).

Figure 6.2: Gnutella transfer rates at the IP layer, July 1–13 ((a) incoming, (b) outgoing, (c) incoming + outgoing IP byte rate).
54 Jul 11 00 12 Jul 12 00 Jul 13 12 00 Appendix A BitTorrent Application Log DTD <!ELEMENT <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ELEMENT <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST EVENTLIST (#PCDATA | EVENT)*> EVENTLIST start_timestamp CDATA #IMPLIED> EVENTLIST peertype CDATA #IMPLIED> EVENTLIST version CDATA #IMPLIED> EVENTLIST bound_ip CDATA #IMPLIED> EVENTLIST bound_port CDATA #IMPLIED> EVENTLIST tracker_ip CDATA #IMPLIED> EVENTLIST tracker_port CDATA #IMPLIED> EVENTLIST peer_id CDATA #IMPLIED> EVENTLIST pieces CDATA #IMPLIED> EVENTLIST piecesize CDATA #IMPLIED> EVENTLIST nfiles CDATA #IMPLIED> EVENTLIST totlen CDATA #IMPLIED> EVENTLIST max_slice_length CDATA #IMPLIED> EVENTLIST rarest_first_cutoff CDATA #IMPLIED> EVENTLIST ip CDATA #IMPLIED> EVENTLIST download_slice_size CDATA #IMPLIED> EVENTLIST snub_time CDATA #IMPLIED> EVENTLIST rerequest_interval CDATA #IMPLIED> EVENTLIST max_uploads CDATA #IMPLIED> EVENTLIST saveas CDATA #IMPLIED> EVENTLIST min_uploads CDATA #IMPLIED> EVENTLIST spew CDATA #IMPLIED> EVENTLIST max_upload_rate CDATA #IMPLIED> EVENTLIST minport CDATA #IMPLIED> EVENTLIST http_timeout CDATA #IMPLIED> EVENTLIST timeout_check_interval CDATA #IMPLIED> EVENTLIST display_interval CDATA #IMPLIED> EVENTLIST max_initiate CDATA #IMPLIED> EVENTLIST max_message_length CDATA #IMPLIED> EVENTLIST upload_rate_fudge CDATA #IMPLIED> EVENTLIST check_hashes CDATA #IMPLIED> EVENTLIST min_peers CDATA #IMPLIED> EVENTLIST keepalive_interval CDATA #IMPLIED> EVENTLIST maxport CDATA #IMPLIED> EVENTLIST request_backlog CDATA 
#IMPLIED> EVENTLIST bind CDATA #IMPLIED> EVENTLIST max_rate_period CDATA #IMPLIED> EVENTLIST url CDATA #IMPLIED> EVENTLIST statfile CDATA #IMPLIED> EVENTLIST report_hash_failures CDATA #IMPLIED> EVENTLIST timeout CDATA #IMPLIED> EVENTLIST responsefile CDATA #IMPLIED> EVENTLIST max_allow_in CDATA #IMPLIED> EVENT (#PCDATA)> EVENT uploaded CDATA #IMPLIED> EVENT downloaded CDATA #IMPLIED> EVENT left CDATA #IMPLIED> EVENT last CDATA #IMPLIED> EVENT trackerid CDATA #IMPLIED> EVENT event CDATA #IMPLIED> EVENT numwant CDATA #IMPLIED> EVENT port CDATA #IMPLIED> EVENT txtime CDATA #IMPLIED> EVENT rxtime CDATA #IMPLIED> 55 APPENDIX A. BITTORRENT APPLICATION LOG DTD <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST <!ATTLIST EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT EVENT rxstart CDATA #IMPLIED> direction CDATA #IMPLIED> down CDATA #IMPLIED> dst CDATA #IMPLIED> dstid CDATA #IMPLIED> nconns CDATA #IMPLIED> type CDATA #IMPLIED> timestamp CDATA #IMPLIED> src CDATA #IMPLIED> srcid CDATA #IMPLIED> piece CDATA #IMPLIED> begin CDATA #IMPLIED> length CDATA #IMPLIED> 56 Appendix B Acronyms AD Anderson-Darling HTTP HyperText Transfer Protocol AP Access Point HUGE Hash/URN Gnutella Extensions BTH Blekinge Institute of Technology HSEP Horizon Size Estimation Protocol BSS Basic Structure Service IID Independent and Identically Distributed ISP Internet Service Provider KS Kolmogorov-Smirnov LN Leaf Node LRD Long-Range Dependence MLE Maximum Likelihood Estimation ML Maximum-Likelihood CCDF Complementary Cumulative Distribution Function CDN Content Delivery Network CGI Common Gateway Interface CS Client-Server CVM Cramér-von Mises DHT Distributed Hash Table DNS Domain Name System NAT Network Address Translation DTD Document Type Definition NFS Network Filesystem DOM Document Object Model NNTP Network News Transfer Protocol EDF Empirical Distribution Function OS Operating System EPDF 
Experimental Probability Density Function P2P Peer-to-Peer ESS Extended Service Set PDF Probability Density Function F2F Firewall-to-Firewall PFS Partial File Sharing FTP File Transfer Protocol PIT Probability Integral Transform GGEP Gnutella Generic Extension Protocol MPAA Motion Picture Association of America PARQ Passive/Active Remote Queueing PSTN Public Switched Telephone Network GWC Gnutella Web Cache QQ Quantile-Quantile HSEP Horizon Size Estimation Protocol QoS Quality of Service 57 APPENDIX B. ACRONYMS QRP Query Routing Protocol TTL Time-To-Live RIAA Recording Industry Association of America UDP User Datagram Protocol UHC UDP Host Cache UP Ultrapeer URI Uniform Resource Indicator SHA-1 Secure Hash Algorithm One URL Universal Resource Locator SMTP Simple Mail Transfer Protocol URN Uniform Resource Names SNMP Simple Network Management Protocol UUCP Unix to Unix Copy Protocol RMON Remote Monitoring SAX Simple API for XML SRD Short-Range Dependence VoIP TCP Transport Control Protocol WWW World-Wide Web 58 Voice over IP Bibliography [1] Cachelogic. P2P in 2005. http://cachelogic.com/research/p2p2005.php, 2005. [2] Adrian Popescu. Routing on overlay networks: Developments and challenges. IEEE Communications Magazine, Vol. 43(8):22–23, August 2005. [3] Wikipedia Encyclopedia. Peer-to-peer. http://en.wikipedia.org/wiki/P2p, August 2005. [4] Clip2. The Annotated Gnutella Protocol Specification v0.4. The Gnutella Developer Forum (GDF), 1.8th edition, July 2003. http://groups.yahoo.com/group/the gdf/files/Development/. [5] Jordan Ritter. Why Gnutella can’t scale. No, really., February 2001. http://www.darkridge.com-/˜jpr5-/doc-/gnutella.html. [6] Rüdiger Schollmeier. A definition of peer-to-peer networking for the classification of peer-topeer architectures and applications. In Proceedings of the First International Conference on Peer-to-Peer Computing. IEEE, 2001. [7] Akamai. http://www.akamai.com, August 2005. 
[8] Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking, 11(1):17–32, February 2003.
[9] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A scalable content-addressable network. In Proceedings of ACM SIGCOMM 2001, pages 161–172, San Diego, CA, August 2001. ACM Press.
[10] Krishna P. Gummadi, Ramakrishna Gummadi, Steven D. Gribble, Sylvia Ratnasamy, Scott Shenker, and Ion Stoica. The impact of DHT routing geometry on resilience and proximity. In Proceedings of ACM SIGCOMM 2003, pages 381–394, Karlsruhe, Germany, August 2003. ACM Press.
[11] Yatin Chawathe, Sylvia Ratnasamy, and Lee Breslau. Making gnutella-like P2P systems scalable. In Proceedings of ACM SIGCOMM 2003, August 2003.
[12] Miguel Castro, Manuel Costa, and Antony Rowstron. Peer-to-peer overlays: structured, unstructured, or both? Technical report, Microsoft Research, Cambridge, UK, July 2004.
[13] Pablo Brenner. A Technical Tutorial on the 802.11 Protocol. BreezeCOM Wireless Communications, 1997.
[14] Xiaoyan Hong, Kaixin Xu, and Mario Gerla. Scalable routing protocols for mobile ad hoc networks. IEEE Network, pages 11–21, August 2002.
[15] ICQ. http://www.icq.com, August 2005.
[16] Yahoo! Messenger. http://messenger.yahoo.com, August 2005.
[17] MSN Messenger. http://messenger.msn.com, August 2005.
[18] The SETI@home Project. SETI@home – the search for extraterrestrial intelligence. http://setiathome.ssl.berkeley.edu/, February 2005.
[19] distributed.net. http://distributed.net, February 2005.
[20] ZetaGrid. http://www.zetagrid.net/, February 2005.
[21] Beowulf. http://www.beowulf.org, December 2005.
[22] Ian Foster and Carl Kesselman. Globus: A metacomputing infrastructure toolkit.
The International Journal of Supercomputer Applications and High Performance Computing, 11(2):115–128, 1997.
[23] Ian Foster and Carl Kesselman. The Globus project: A status report. In Proceedings of the Seventh Heterogeneous Computing Workshop, pages 4–18, March 1998.
[24] David P. Anderson. BOINC: A system for public-resource computing and storage. In Fifth IEEE/ACM International Workshop on Grid Computing, pages 4–10, November 2004.
[25] Electricsheep. http://electricsheep.org, December 2005.
[26] Ian Foster and Adriana Iamnitchi. On death, taxes, and the convergence of peer-to-peer and grid computing. In 2nd International Workshop on Peer-to-Peer Systems (IPTPS '03), Berkeley, CA, USA, February 2003.
[27] Jonathan Ledlie, Jeff Schneidman, Margo Seltzer, and John Huth. Scooped, again. In Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS '03), Berkeley, CA, USA, February 2003.
[28] Ian F. Akyildiz, Weilian Su, Yogesh Sankarasubramaniam, and Erdal Cayirci. A survey on sensor networks. IEEE Communications Magazine, pages 102–114, August 2002.
[29] Deborah Estrin, Ramesh Govindan, John S. Heidemann, and Satish Kumar. Next century challenges: Scalable coordination in sensor networks. In Proceedings of the ACM/IEEE International Conference on Mobile Computing and Networking, pages 263–270, Seattle, WA, USA, August 1999. ACM.
[30] Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin, John Heidemann, and Fabio Silva. Directed diffusion for wireless sensor networking. IEEE/ACM Transactions on Networking, 11(1):2–16, February 2003.
[31] Joanna Kulik, Wendi Heinzelman, and Hari Balakrishnan. Negotiation-based protocols for disseminating information in wireless sensor networks. Wireless Networks, 8(2/3):169–185, 2002.
[32] Bram Cohen. BitTorrent. http://www.bittorrent.com/, March 2006.
[33] BitTorrent specification. http://wiki.theory.org/BitTorrentSpecification, February 2005.
[34] eDonkey.
http://www.edonkey.com, February 2005.
[35] NeoModus. DirectConnect. http://www.neo-modus.com, February 2005.
[36] Sharman Networks. KaZaA. http://www.kazaa.com, February 2005.
[37] National Institute of Standards and Technology. Specifications for secure hash standard. http://www.itl.nist.gov/fipspubs/fip180-1.htm, April 1995. FIPS PUB 180-1.
[38] D. Eastlake 3rd and P. Jones. US Secure Hash Algorithm 1 (SHA1), September 2001. RFC 3174.
[39] T. Berners-Lee, L. Masinter, and M. McCahill. Uniform Resource Locators (URL), December 1994. RFC 1738.
[40] T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax, August 1998. RFC 2396.
[41] Bram Cohen. BitTorrent protocol specification. http://www.bitconjurer.org/BitTorrent/protocol.html, February 2005.
[42] Olaf van der Spek. BitTorrent UDP-tracker protocol extension. http://libtorrent.sourceforge.net/udp_tracker_protocol.html, February 2005.
[43] J.A. Pouwelse, P. Garbacki, D.H.J. Epema, and H.J. Sips. The BitTorrent P2P file-sharing system: Measurements and analysis. In 4th International Workshop on Peer-to-Peer Systems (IPTPS '05), February 2005.
[44] Azureus. http://azureus.sourceforge.net/, August 2005.
[45] J. Chapweske. Tree hash exchange format. http://open-content.net/specs/draft-jchapweske-thex-02.html, February 2005.
[46] Tor Klingberg and Raphael Manfredi. Gnutella 0.6. The Gnutella Developer Forum (GDF), 2002-06-draft edition, June 2002. http://groups.yahoo.com/group/the_gdf/files/Development/.
[47] Anurag Singla and Christopher Rohrs. Ultrapeers: Another Step Towards Gnutella Scalability. Lime Wire LLC, 1.0 edition, November 2002. http://groups.yahoo.com/group/the_gdf/files/Development/.
[48] Adam A. Fisk. Gnutella Dynamic Query Protocol. LimeWire LLC, 0.1 edition, May 2003. http://groups.yahoo.com/group/the_gdf/files/Proposals/Working_Proposals/search/Dynamic_Querying.
[49] Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble.
A measurement study of peer-to-peer file sharing systems. In Proceedings of Multimedia Computing and Networking (MMCN), January 2002.
[50] Gnutella protocol development. http://www.the-gdf.org, December 2005.
[51] Raphael Manfredi. Gnutella Traffic Compression. The Gnutella Developer Forum (GDF), January 2003. http://groups.yahoo.com/group/the_gdf/files/Development/.
[52] Jean-loup Gailly and Mark Adler. zlib. http://www.gzip.org/zlib, August 2005.
[53] P. Leach, M. Mealling, and R. Salz. A Universally Unique IDentifier (UUID) URN Namespace, July 2005. RFC 4122.
[54] Jason Thomas. Gnutella generic extension protocol (GGEP). http://rfc-gnutella.sourceforge.net/src/GnutellaGenericExtensionProtocol.0.51.html, February 2002.
[55] Sumeet Thadani. Metadata extension. http://www.the-gdf.org/wiki/index.php?title=Metadata_Extension, 2001.
[56] Christopher Rohrs. Query Routing for the Gnutella Network. Lime Wire LLC, 1.0 edition, May 2002. http://groups.yahoo.com/group/the_gdf/files/Development/.
[57] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, July 1970. ISSN 0001-0782.
[58] Adam A. Fisk. Gnutella Ultrapeer Query Routing. Lime Wire LLC, 0.1 edition, May 2003. http://groups.yahoo.com/group/the_gdf/files/Proposals/Working_Proposals/search/Ultrapeer_QRP/.
[59] Thomas Schürger. Horizon size estimation on the Gnutella network v0.2, March 2004. http://www.menden.org/gnutella/hsep.html.
[60] G. Mohr. Hash/URN gnutella extensions (HUGE) v0.94. http://groups.yahoo.com/group/the_gdf/files/Proposals/Working_Proposals/HUGE/, April 2002.
[61] Raphael Manfredi. Passive/active remote queueing (PARQ). http://groups.yahoo.com/group/the_gdf/files/Proposals/Working_Proposals/QUEUE/, May 2003. Version 1.0.a.
[62] Reliable UDP (RUDP) file transfer spec 1.0. http://groups.yahoo.com/group/the_gdf/files/Proposals/Pending_Proposals/F2F/, February 2005.
[63] Pyda Srisuresh, Bryan Ford, and Dan Kegel. State of Peer-to-Peer (P2P) communication across Network Address Translators (NATs). Internet Draft, October 2005. draft-srisuresh-behave-p2p-state-01.txt.
[64] Sam Berlin, Andrew Mickish, Julian Qian, and Sam Darwin. Multicast in Gnutella. Document Revision 2, November 2004. http://groups.yahoo.com/group/the_gdf/files/Proposals/Working_Proposals/LAN_Multicast/.
[65] Conny Palm. Intensitätsschwankungen im Fernsprechverkehr (Intensity variations in telephone traffic). PhD thesis, Royal Institute of Technology, 1943.
[66] Carey Williamson. Internet traffic measurement. 2001.
[67] B. Claise. Cisco Systems NetFlow Services Export Version 9. Cisco Systems, October 2004. RFC 3954.
[68] Van Jacobson, C. Leres, and S. McCanne. Tcpdump. http://www.tcpdump.org, August 2005.
[69] Gerald Combs and contributors. Ethereal: A network protocol analyzer. http://www.ethereal.com, November 2005.
[70] Thomas H. Ptacek and Timothy N. Newsham. Insertion, evasion, and denial of service: Eluding network intrusion detection. Technical report, Secure Networks, Inc., January 1998.
[71] Mark Handley and Vern Paxson. Network intrusion detection: Evasion, traffic normalization, and end-to-end protocol semantics. In Proceedings of the 10th USENIX Security Symposium, Washington, D.C., USA, August 2001. USENIX Association.
[72] Shawn Ostermann. Tcptrace. http://www.tcptrace.org, August 2005.
[73] Gary R. Wright and W. Richard Stevens. TCP/IP Illustrated: The Implementation, volume 2. Addison-Wesley, 1995. ISBN 0-201-63354-X.
[74] Guido van Rossum et al. Python. http://www.python.org, August 2005.
[75] W3C. Extensible Markup Language (XML) 1.0, 2004.