A new Content Distribution Network architecture - PlentyCast

CAO WEI QIU

Master of Science Thesis
Stockholm, Sweden 2004
IMIT/LCN 2004-05

A new Content Distribution Network architecture - PlentyCast
by Cao Wei Qiu
KTH/IMIT and SCINT
[email protected] or [email protected]
2004-April-30
Stockholm, Sweden

A thesis presented to the Royal Institute of Technology, Stockholm in partial fulfillment of the requirements for the degree of Master of Science in Internetworking

Academic Advisor and Examiner: G. Q. Maguire Jr.
Department of Microelectronics and Information Technology (IMIT)
Royal Institute of Technology (KTH)

Industry Supervisor: Lars-Erik Eriksson
Swedish Center of Internet Technology (SCINT)

Signature: ________________________  Date: ________________________
Signature: ________________________  Date: ________________________

Abstract in English

Content Distribution Networks have existed for some years. They have attracted attention in both academic research and industry, and involve the following problem domains: content replica placement, content location and routing, swarm intelligence, and overlay network self-organization for this type of distributed system. In this project, we propose a novel Content Distribution Network architecture – PlentyCast. This study focuses on improving access latency, network scalability, content availability, and infrastructure performance, and on lowering bandwidth consumption for Content Distribution Networks. Outstanding problems such as flash crowds, denial-of-service attacks, and the difficulty of traffic engineering caused by Peer-to-Peer traffic are also addressed.

Abstract in Swedish

Mediadistributionsnätverk har funnits några år. De har fått uppmärksamhet i både akademisk forskning och i industrin och kännetecknas av följande frågor: placering av innehållskopior, lokalisering av innehåll och routing, svärmintelligens, överlagrade nätverks självorganisering för denna typ av fördelade system. I denna rapport studeras en ny nätverksarkitektur för innehållsfördelning - PlentyCast. Denna studie fokuserar på tillgångslatens, nätverksskalbarhet, hög innehållstillgång, låg bandbreddskonsumtion, och förbättrad infrastrukturprestanda för Innehållsfördelningsnätverk.

Acknowledgement

Thanks to Mr. Bengt Källbäck for recommending me for this position at SCINT. Thanks to Mr. Lars-Erik Eriksson at SCINT for selecting me for this project; he greatly supported my work by taking time to discuss content distribution with me. Thanks to David Jonsson from the Interactive Institute for suggesting the idea of using Peer-to-Peer techniques. Thanks to my teacher Professor Vlad Vlassov for many discussions. Thanks to my colleagues at SCINT, Kjell Torkelsson, Eriksson B. Svante, Lennart Helleberg, and Staffan Dahlberg, who gave me their warm-hearted assistance during my work at SCINT. My parents' encouragement from China gave me a great deal of confidence, enabling me to solve every problem regarding living and working in Sweden. Thanks for their deep love and support in my life! My godparents – Eva Lodén and Ragnar Lodén – greatly helped me in my life and work during my days in Sweden. Their love made it much easier to live and work in Sweden. Thanks to Per Pedersen at Ericsson for his strong recommendation to the Royal Institute of Technology. My dream is to work as a teacher in academia or as a coach in industry. This is because Professor Gerald Q. Maguire Jr., my academic advisor and examiner, became my model for this dream. I appreciate his high expectations for my examination and his very helpful advisory role.
This model impressed me and awoke my enthusiasm for teaching or coaching. Thank you, Chip! A well-known principle has been demonstrated once again: an achievement is never due to just one person, but to many people.

Table of Contents

Chapter 1 Introduction to Content Distribution Network
1.1 Content distribution over the Internet
1.2 Internet structure
1.3 Internet bottlenecks
1.3.1 First-mile bottleneck
1.3.2 Peering bottleneck problem
1.3.3 Backbone bottleneck
1.3.4 Last mile bottleneck
1.4 CDN technologies
1.4.1 A system overview
1.4.2 A typical architecture
1.4.3 Traditional CDN criteria
1.4.4 Core mechanisms
1.4.4.1 Server placement
1.4.4.1.1 Theoretical problem models and solutions
1.4.4.1.2 Heuristic approaches
1.4.4.2 Replica placement
1.4.4.2.1 A typical cost model
1.4.4.2.2 Discussions of replica placement algorithms criteria
1.4.4.3 Replica management
1.4.4.3.1 Strong consistency (Client validation; Server invalidation; Adaptive Leases; Propagation and Invalidation Combination)
1.4.4.3.2 Weak consistency (Adaptive TTL; Piggyback Invalidation; The Distributed Object Consistency Protocol)
1.4.4.4 Server location and request routing
1.4.4.4.1 Server location
1.4.4.4.1.1 Multicast vs. Agent
1.4.4.4.1.2 Routing layer vs. application layer
1.4.4.4.2 Request routing
1.4.4.4.2.1 Transport-Layer Request-Routing
1.4.4.4.2.2 Single Reply
1.4.4.4.2.3 Multiple Replies
1.4.4.4.2.4 Multi-Level Resolution
1.4.4.4.2.5 NS Redirection
1.4.4.4.2.6 CNAME Redirection
1.4.4.4.2.7 Anycast
1.4.4.4.2.8 Object Encoding
1.4.4.4.2.9 DNS Request-Routing Limitations
1.4.4.4.2.10 Application-Layer Request-Routing
1.4.4.4.2.11 Header Inspection
1.4.4.4.2.12 URL-Based Request-Routing
1.4.4.4.2.13 302 Redirection
1.4.4.4.2.14 In-Path Element
1.4.4.4.2.15 Header-Based Request-Routing
1.4.4.4.2.16 Site-Specific Identifiers
1.4.4.4.2.17 Content Modification
1.4.4.4.2.18 Combination of Multiple Mechanisms
1.4.4.5 Self-organization
1.5 Discussion
1.5.1 Large content for large numbers of users
1.5.2 Denial of service attack
1.5.3 Scalability issue
1.5.4 Self-organization in next generation of CDNs

Chapter 2 Introduction to Peer-to-Peer
2.1 A definition of P2P
2.2 Problem domains and well-known approaches
2.2.1 Decentralization
2.2.2 Scalability
2.2.3 Self-organization
2.2.4 Anonymity
2.2.5 Cost of ownership
2.2.6 Ad hoc connectivity
2.2.7 Performance
2.2.7.1 Replication
2.2.7.2 Caching
2.2.7.3 Intelligent routing and peering
2.2.7.4 Security
2.2.7.5 Digital Right Management
2.2.7.6 Reputation
2.2.7.7 Accountability
2.2.8 Transparency and Usability
2.2.9 Fault-resilience
2.2.10 Manageability
2.2.11 Interoperability
2.3 Core techniques
2.3.1 Location and routing
2.3.1.1 Centralized directory model
2.3.1.2 Flooding requests model
2.3.1.3 Distributed Hashing Table model
2.3.1.4 Plaxton location and routing
2.3.1.5 DHT algorithms benchmarking
2.3.2 Overlay network mapping
2.3.2.1 Proximity routing
2.3.2.2 Proximity neighbor/server selection
2.4 Discussion

Chapter 3 Introduction to Swarm in Content Delivery
3.1 An overview of swarm in content delivery
3.2 Core techniques in swarm content delivery
3.2.1 Splitting large files
3.2.2 Initiated publishing
3.2.3 Mesh construction
3.2.4 Peer and content identification
3.2.5 Content/peer location
3.2.6 Fault resiliency
3.2.7 End User bandwidth
3.2.8 ISP infrastructure
3.3 An introduction of Forward Error Correction codes

Chapter 4 Introduction to Mobile Agents
4.1 A definition
4.2 What problems can Mobile Agents solve?
4.3 Core techniques in Mobile Agents
4.4 Overview of the remaining chapters

Chapter 5 Problem Statement
5.1 PlentyCast design goals
5.1.1 Improved access latency
5.1.2 Improve network scalability
5.1.3 Improve content availability
5.1.4 Lower bandwidth consumption
5.1.5 Improve infrastructure performance
5.2 Problem modeling
5.3 Discussion of the criteria

Chapter 6 A Novel Architecture – PlentyCast
6.1 System overview
6.2 High level system architecture
6.3 PlentyCast Client
6.3.1 Active binning
6.3.2 SNMP client
6.3.3 Peer lookup service
6.3.4 Peer Selection
6.4 Landmark server
6.4.1 Placement
6.4.2 Download & upload monitor
6.4.3 Load balancing
6.5 Distribution system
6.5.1 Object splitter
6.5.2 FEC encoder
6.5.3 Block distributor
6.6 Replica server
6.6.1 Placement
6.6.2 Storage and delivery
6.6.3 Cone loading
6.6.4 Active binning
6.7 Location and routing system
6.7.1 Policy engine
6.7.2 Server selector
6.7.3 Block meter
6.7.4 Content manager
6.8 Accounting system
6.9 System characteristics analysis and discussion
6.9.1 Case study 1: Normal access mode
6.9.2 Case study 2: Flash Crowd and DDoS mode
6.9.3 Case study 3: ADSL users traffic
6.9.4 System characteristics

Chapter 7 Conclusion and Future Work
7.1 Conclusion
7.2 Future work

Appendix 1: A typical program showing how to split a large file into pieces
Appendix 2: Snapshot of using a file splitting tool (freeware)
Appendix 3: Record of a movie file downloaded via BitTorrent

List of Figures

Figure 1. An overview of how content is distributed or delivered to its users
Figure 2. Four classes of bottlenecks on today's Internet
Figure 3. Overview of a typical CDN
Figure 4. A typical CDN architecture
Figure 5. Replica consistency overview
Figure 6. HP DOCP Architecture
Figure 7. Content request routing mechanisms
Figure 8. Centralized request model
Figure 9. Flooding request model
Figure 10. DHT Model
Figure 11. Overlay concept
Figure 12. Binning Strategy concept
Figure 13. Benchmark between client-server and peer-to-peer swarm in content delivery
Figure 14. Swarming flow overview
Figure 15. Match and mismatch in P2P overlay mapping
Figure 16. Agent Taxonomy [133]
Figure 17. Network layers involved in migration
Figure 18. Migration implemented in Java
Figure 19. CDN usability decomposition
Figure 20. Problem model
Figure 21. PlentyCast overview
Figure 22. PlentyCast high level architecture
Figure 23. Cone loading
Figure 24. PlentyCast system characteristics

List of Tables

Table 1. DHT algorithms benchmarking
Table 2. Class of Internet users
Table 3. Benchmark of swarm systems
Table 4. Correlations between latency and its factors
Table 5. Accounting Database header
Table 6. System characteristics clarification

Chapter 1 Introduction to Content Distribution Network

In this chapter, I will give an introduction to Content Distribution Network technology. This includes examining three aspects: (1) the problems to be resolved in this realm, (2) the core techniques that have been used to support this approach, and (3) the open issues in each realm. The following chapters are arranged as follows: the second chapter introduces Peer-to-Peer technology, and the third chapter gives an introduction to swarming content delivery and Forward Error Correction techniques.
A compact introduction to Mobile Agent technology is given in chapter four. After covering all of the technologies that interest us in this project, a problem statement in chapter 5 elaborates each of the goals we have set. There I explain our motivation and understanding of each problem we are interested in solving, the problem model, and the research criteria for this project. In chapter 6, we present our proposal for a highly usable, scalable, and reliable Content Distribution architecture. At the end of that chapter, I conduct case studies to evaluate whether PlentyCast fulfills the project goals. Finally, we conclude our work and highlight our future work.

1.1 Content distribution over the Internet

Figure 1. An overview of how content is distributed or delivered to its users

When the Internet bubble was bursting in 1998, many people realized that publishing on a web site is only one step in hosting a web site; the most important goal is to get the web content delivered to the users over the network. Figure 1 shows an overview of how content is distributed or delivered to the users. Here, a client first sends a request to a content server via an application layer protocol such as HTTP [38]. After the request has been accepted by the content server, the server sends the content to the client over network links across different routers and/or switches. From a hardware perspective, client and server are similar; both are likely to be connected to the Internet via an Ethernet Network Interface Card. The major difference between client and server is related to how their software1 is structured.

The Internet connects the users and the content providers. In Figure 1, there are three actors: the user, the content provider, and the ISPs who provide the network between the user and the content provider. They each have different requirements based on their own role. From a user's perspective, the expectation is fast access to the content they want at any time they want it; in addition, the user expects good quality of the delivered content. From the content provider's perspective, the expectation is that their content should be maximally available to all the users who want to access it, limited only by the performance of their content server. From the ISPs' perspective, they expect a large number of users to utilize their access networks while minimizing the bandwidth consumption on interconnections to other networks, and they expect high performance from their network infrastructure.

1.2 Internet structure

By definition, an internet is a network of networks. The Internet is the best-known example, being made up of thousands of different networks (also called Autonomous Systems or ASs) that communicate by using the IP protocol (see Figure 2). These networks range from large backbone providers such as UUNet and BBN to small local ISPs such as Swipnet in Stockholm's Solna. Each of these networks is a complex entity in itself, physically being made up of routers, switches, fiber, microwave, ATM, Ethernet, etc. All of these components work together to transport packets through the network toward their destinations. In order for the Internet to function as a single global network interconnecting everyone, all of these individual networks must connect to each other and exchange traffic. This happens through a process called peering.
When two networks decide to connect and exchange traffic, a connection called a peering session is established between a pair of routers, each located at the border of one of the two networks. These two routers periodically exchange routing information, thus informing each other of the destinations that are reachable through their respective networks. There exist thousands of peering points on the Internet, each falling into one of two categories: public or private. Public peering occurs at major Internet interconnection points such as MAE-East, MAE-West, and the Ameritech NAP, while private peering arrangements bypass these points. Peering can either be free, as between Tier-1 ISPs, or one network may purchase a connection to another, as when Tier-3 and Tier-2 ISPs connect to Tier-1 ISPs.

Once the networks are interconnected at peering points, the routing protocols running on every Internet router move packets in such a way as to transport each data packet to its correct destination. For scalability reasons, there are two types of routing protocols directing traffic on the Internet today. Interior Gateway Protocols (IGPs) such as OSPF and RIP create routing paths within individual networks or ASs, while an Exterior Gateway Protocol (EGP), the Border Gateway Protocol (BGP), is used to send traffic between different networks. Interior gateway protocols often use detailed information about network topology, bandwidth, and link delays to compute routes through a network for incoming packets. Since this approach does not scale up to large networks composed of separate administrative domains, BGP is used to link the individual networks together to form the Internet. BGP creates routing paths by simply minimizing the number of individual networks (called Autonomous Systems) a packet must traverse. While this approach does not guarantee that the routes are even close to optimal, it supports a global Internet by scaling to handle thousands of ASs and by allowing each of them to implement its own independent routing policies within the AS.

Peering points and routing protocols thus connect the disparate networks of the Internet into one cooperating infrastructure. Connecting to one of these networks automatically provides access to all Internet users and servers. This structure of the Internet as an interconnection of individual networks is the key to its scalability. It enables distributed administration and control of all aspects of the Internet system: routing, addressing, and internetworking. However, inherent in this architecture are four types of bottlenecks that, left unaddressed, can slow down performance and decrease the ability of the Internet to handle an exponentially growing number of users, services, and traffic. These bottlenecks are described in the following sections.

1 Definition of client-server: http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?client and http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?client-server, accessed on 2004-01-19 22:45

1.3 Internet bottlenecks

Here, bottleneck refers to a network performance bottleneck2. It occurs when the desired data transmission rate between two end systems exceeds the available link capacity along the path for a certain period of time, for a given topology. Consequently, it degrades network performance by increasing the packet loss rate, increasing end-to-end latency, and introducing jitter3. However, since we only consider static content, jitter is irrelevant here.
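As a simple numerical illustration of this definition, the short Python sketch below compares the aggregate desired rate of a set of clients with the capacity of a shared link; all numbers are hypothetical examples, not measurements.

# Toy illustration of a network performance bottleneck: the aggregate desired
# transmission rate exceeds the capacity of a shared link.
link_capacity_mbps = 100.0   # assumed capacity of a shared link
clients = 1000               # assumed number of concurrent clients
desired_rate_mbps = 0.5      # assumed desired rate per client

aggregate_demand = clients * desired_rate_mbps            # 500 Mbit/s
overload_factor = aggregate_demand / link_capacity_mbps   # 5.0

print("aggregate demand: %.1f Mbit/s" % aggregate_demand)
print("overload factor: %.1fx -> congestion, packet loss, higher latency" % overload_factor)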
These problems can be divided into four classes: first-mile, backbone, peering, and last-mile. The following figure depicts an overview of these problems.

Figure 2 Four classes of bottlenecks on today's Internet

2 http://en.wikipedia.org/wiki/Performance_problem accessed on 2004-01-11 10:25
3 http://en.wikipedia.org/wiki/Jitter accessed on 2004-01-11 10:53

1.3.1 First-mile bottleneck

The first-mile problem appears when the link capacity between the local ISP and the content server limits the number of users who want to access the content server. Intuitively, the capacity of the link between the local ISP and its customer, the content server, is constant over a typically long period of time. The number of arriving user requests, on the other hand, follows a random distribution because request arrival is a stochastic process. If a small number of clients access a content server, the total desired access rate will be less than the link capacity; in this case, the packet loss rate and latency will be acceptable. But when the content server becomes a hot spot, viz. when large numbers of clients access the same content server, the total desired access rate will exceed the maximal bandwidth that the link can provide. In this case, first, a high packet loss rate will result from the link congestion unless the ISP can replicate responses for common requests. Second, high latency can cause very long response times for user requests. Third, the high traffic load can overwhelm the CPU or memory resources of the content server, ultimately bringing down this server. Together these constitute the first-mile problem.

One solution is to increase the link capacity, but this is not optimal: because client requests arrive stochastically, the extra bandwidth is wasted once the peak hours have passed, and it is not a cost-effective solution from the content provider's perspective. Since CDNs replicate or cache the content in replica servers or cache proxies distributed across many different ASs, the first-mile link budget is spread over many links between the accessing clients and the content replicas (replica servers or proxies). In this way, bandwidth between the local ISP and the hot-spot content server is saved to a great extent. In addition, this relieves the original content server from potential overload. The link budget of a content provider can be reduced to a small number because each CDN needs only a few links between the content server and the replica servers for content updating, since the CDN can distribute and create replicas within its own overlay network.

1.3.2 Peering bottleneck problem

The peering bottleneck occurs for two major reasons. First, lower-tier ISPs rent their upstream links from higher-tier ISPs, but ISPs within the same tier do not constantly pay each other for the links that connect them. This leads to a peering problem: these links often run at a fixed capacity for a long time. Second, the installation of new circuits to expand the capacity of these links occurs much more slowly than the actual traffic demand increases, due to many technical and non-technical reasons. Imagine that thousands of new users only need a mouse click, but actually installing new links takes at least 3-5 months. Therefore, many ISPs don't wait until a link is saturated before expanding its capacity. For example, Telia4 starts to expand its peering links when traffic reaches 50% of capacity.
This seems to make the peering bottlenecks less serious than the other bottlenecks. However, with increasing numbers of broadband Internet users, more bandwidth-intensive applications, and the slow upgrade of links, bottlenecks in the peering groups among Tier-1 ISPs can occur. This class of bottlenecks we call peering bottlenecks.

4 www.telia.se accessed on 2004-01-18 19:21. Telia is one of the largest ISPs in Europe and North America.

1.3.3 Backbone bottleneck

For the same reasons, the backbone of the Internet is under increasing pressure from traffic demands. However, the speed of installation of new capacity is even slower than for peering link expansion. Thus traffic engineering is used to maintain reasonable Quality of Service for upper layer services. However, it is hard to shape the traffic for today's increasingly diversified services due to the complexity of the trade-offs amongst upper-tier ISPs and local access users and their limited bandwidth. The difficulties of doing traffic engineering and of making new capacity expansion decisions may cause backbone bottlenecks on the Internet.

1.3.4 Last mile bottleneck

If the end user's bandwidth-intensive applications run over a low-bandwidth dialup or cable-modem link to their local ISP, this may become a bottleneck on the last-mile link. This class of bottlenecks exists for the connection between end users and their local ISP. Increasingly, the last-mile problem seems to be resolved by Digital Subscriber Line access, which has quickly been rolled out around the world [13]. However, it actually only relaxes the bandwidth constraint between the user's modem and the xDSL distribution point (a DSLAM5). Unfortunately, the primary link from the ISP's core switches to the distribution cabinet often becomes another bottleneck. Furthermore, this can potentially turn the peering and infrastructure bottlenecks into more serious problems. Even though this backhaul link is easier to expand than the other bottleneck links, and the number of subscribers attached to it is much easier to predict than for the other bottlenecks, one does not know what bandwidth-intensive applications the end users are likely to run. When an ISP does traffic engineering, it must avoid blocking certain bandwidth-intensive applications, or this could cause the end users to immediately subscribe to a competitor's network. While the xDSL rollout resolves the last-mile problem, it challenges traffic engineering and existing peering and first-mile solutions.

5 Digital Subscriber Line Access Multiplexer.

To sum up, the problems that a CDN faces are to meet the requirements of the end user, the content provider, and the ISP in the context of these four classes of Internet bottlenecks.

1.4 CDN technologies

1.4.1 A system overview

As we described in the previous sections, a CDN's goal is to (1) minimize latency, (2) maximize content availability, and (3) minimize bandwidth consumption on ISP networks. Therefore, we can define a CDN as an overlay network which is used to transfer a large amount of frequently requested data to clients in a short time. More systematic definitions have been given, such as: "protocols and appliances created exclusively for the location, download, and usage tracking of content" [24]. This means that a CDN provides:
(1) A way to distribute content and applications globally through a network of strategically placed servers;
(2) A way to track, manage, and report on the distribution and delivery of content; and
(3) A way to provide better, faster, and more reliable delivery of content and applications to users.

The following figure depicts an overview of a Content Distribution Network.

Figure 3 Overview of a typical CDN

This overview shows that content servers delegate copies of the content to replica servers. When a user accesses content delegated to the CDN, they actually obtain the content not from the original content server, but from a replica server, or even from multiple replica servers. However, before they get access to the content, their requests are redirected by the redirection servers. These servers tell the users to access specific replica servers which are strategically close to the clients. The users can now download the content as if the content server were situated a short distance away (i.e. a smaller number of hops compared to accessing the original content server), since they actually download from nearby replica servers. Thus the latency is decreased. From another aspect, the content server is in less danger of being overwhelmed when large numbers of clients access the content, since the CDN replicates the content across many servers. Furthermore, load balancing causes the client access traffic to be evenly distributed across different replica servers. Thus, the CDN significantly reduces the workload at the original content server. In addition, the first-mile6 bottleneck is alleviated by distributing the traffic load over all the replica servers close to the clients, and consumption of bandwidth at the last-mile, peering, and even backbone bottlenecks is proportionally reduced. This relaxes the pressure on the ISP for traffic engineering.

6 Please refer to 2.1.4

1.4.2 A typical architecture

In a typical CDN, the following components are mandatory: client, replica servers, original server, billing and charging system, request routing system, distribution system, and accounting system. The relationship amongst these components (indicated with numbered lines in Figure 4) is described as follows:

Figure 4. A typical CDN architecture

(1) The original server delegates its Universal Resource Locator name space, for objects to be distributed and delivered by the CDN, to the request routing system.
(2) The original server publishes content that is to be distributed and delivered by the CDN to the distribution system.
(3) The distribution system sends content to the replica servers. In addition, this system interacts with the request routing system through feedback to assist in replica server selection for clients.
(4) The client requests documents from the original server. However, due to URL name space delegation, the client request is redirected to the request routing system (redirection server).
(5) The request routing system routes the request to a suitable replica server.
(6) The selected replica server then delivers the content to the client. Additionally, the replica server sends accounting information for delivered content to the accounting system.
(7) The accounting system collects all accounting information and then processes the content access records as input to the billing system; statistics are fed back to the request routing system for better redirection of future requests.
(8) The billing system uses the content detail records to work out how much shall be charged to or paid to each content provider [2].

Following the above flow, we can describe the components of this CDN system as follows.

A Client is a Hyper Text Transfer Protocol user application. The user's requirement is to be able to access the content at any time he or she wants.
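Before describing the remaining components, the redirection flow in steps (4)-(6) above can be illustrated with a minimal Python sketch. The replica names, the distance and load values, and the selection rule are placeholder assumptions for illustration only, not a description of any particular CDN product.

# Illustrative sketch of CDN request routing (steps 4-6 above).
# The replica list and the per-replica "distance" values are made up; a real
# CDN would derive them from measurements, topology, and accounting feedback.

REPLICAS = {
    "replica-eu.example.net": {"distance_ms": 12, "load": 0.40},
    "replica-us.example.net": {"distance_ms": 95, "load": 0.10},
    "replica-as.example.net": {"distance_ms": 180, "load": 0.05},
}

def route_request(client_id):
    """Step (5): the request routing system selects a suitable replica server.
    Here we simply prefer the lowest latency, breaking ties by lower load."""
    return min(REPLICAS, key=lambda r: (REPLICAS[r]["distance_ms"], REPLICAS[r]["load"]))

def fetch(client_id, url_path):
    """Steps (4) and (6): the client's request is redirected, and the content is
    then delivered from the chosen replica instead of the original content server."""
    replica = route_request(client_id)
    # In a real system this would be a DNS answer or an HTTP redirect;
    # here we just report which server would serve the object.
    return "client %s downloads %s from %s" % (client_id, url_path, replica)

if __name__ == "__main__":
    print(fetch("client-42", "/movies/trailer.mpg"))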
Replica server: this is the most important type of component in a CDN. Its major features are: (1) archiving copies of content data strategically, considering the granularity of the content data; (2) communicating within its peering group in order to achieve load balancing of client traffic; (3) pushing content data strategically according to information from the distribution system; and (4) generating accounting and statistical data.

The location and routing system is mainly responsible for redirecting the client's request from the original content server to specific replica servers. It gets updates from the original content servers to determine at what data granularity requests shall be redirected, and it utilizes feedback from the accounting system for better redirection based upon certain access metrics, e.g. page hit rate.

The distribution system is the channel used to deliver content data to the different replica servers. Nowadays, CDNs usually have large numbers of replica servers and they are usually spread widely. How to distribute content data to each replica server and how to manage all the replica servers are the major responsibilities of the distribution system. There are at least two popular channels used by CDN distribution systems: one is terrestrial Internet links and the other is satellite links (broadcast is used in this case). Many CDN operators choose to construct an overlay network connecting the replica servers into a tree in order to manage all nodes across the Internet.

The accounting system collects all types of statistical data for different uses in other components, such as the billing and charging system, the location and routing system, and the distribution system.

The content server is the CDN's customer, who is willing to pay for distribution services over a Content Distribution Network.

1.4.3 Traditional CDN criteria

In general, we now understand how a CDN works in such a typical architecture. However, what makes a good CDN system? To answer this question we examine the desired attributes of a traditional CDN system. In general, it should have the following properties: fast access, robustness, transparency, scalability, efficiency, adaptiveness, stability, load balancing, interoperability, simplicity [1], and security.

Fast access: from a user perspective, access latency7 is the most important measure of CDN usability. A good CDN aims to decrease content access latency. In particular, it should provide the user with lower latency, on average, than would be the case without employing a CDN system.

Robustness: from a user perspective, this means high content availability, which is another important quality of a CDN. Users expect to receive content whenever they want. From a system design point of view, robustness means that (1) a small number of replica servers or redirection servers might crash, but this should not bring down the entire CDN; and (2) the CDN should recover gracefully in the case of failures. These two attributes actually require good self-organization of the CDN in order to achieve fault resiliency; otherwise the users will see either failed requests or high delay.

7 Please refer to section 2.1.1 for a detailed explanation of latency.

Transparency: a CDN system should be transparent to the user; the only things the user should notice are faster responses and higher content availability.
This requires that the CDN be independent of the user's client.

Scalability: given the explosive growth in network size and density over the last decades, and continuing exponential growth in the near future, a key to success in such an environment is scalability. A CDN should scale well with both increasing size and increasing density of the Internet. This requires all protocols employed in the caching system to be as lightweight as possible.

Efficiency: from an ISP's point of view, efficiency includes two aspects. First, how much overhead does the CDN impose on the Internet? The additional load of the CDN should be as small as possible; this requires that the quantity of signaling packets be as small as possible. Second, the mechanisms of a CDN should avoid over-utilizing critical network resources, e.g. increasing the pressure on the DNS [47] service.

Adaptiveness: it is desirable for a CDN to adapt to dynamically changing user demands and to the changing network environment. For instance, a CDN must be able to deal with the flash crowd [136] problem for some special content servers. Adaptation involves several aspects of the system: replica management, request routing, replica server placement, content placement, etc. This increases content availability for the content provider, while load balancing increases robustness.

Stability: from an ISP's point of view, this means that a CDN shall not introduce instability into the network. For instance, a naïve CDN routing system distributing requests based upon network information could result in oscillation due to the instability of the Internet. This oscillation will increase the CDN's cost for content replication and request routing, and thus potentially lead to higher latency in delivering content to a user.

Load balancing: the CDN should distribute the load evenly throughout the entire overlay network. This effectively avoids a single point of failure caused by overloading particular replica servers or redirection servers. From the content provider's point of view, this feature alleviates the first-mile bottleneck. From an ISP's point of view, it reduces demands on their network bandwidth.

Interoperability: as the Internet grows in scale and coverage, it spans a wider range of hardware and software architectures. For instance, on the last mile, xDSL lines connect vast numbers of households. Additionally, NAT and firewalls are becoming more and more popular. A good CDN must adapt to a wide range of network architectures.

Simplicity: simple mechanisms are always easier to implement correctly, system maintenance is likely to be lower in cost, and simple mechanisms are more likely to be accepted as international standards.

Security is always an important property of today's distributed systems. CDN network security mainly addresses the problems of Digital Right Management of licensed content. Another aspect of security is to secure the CDN network itself. However,
In the following sections, I will try to explain the problems from each of these aspects and describe some state-of-the-art approaches for solving them. 1.4.4.1 Server placement 1.4.4.1.1 Theoretical problem models and solutions As one of major goal of a CDN, it is very important that a CDN minimize latency between clients and the content server. The problem is where to place servers on the Internet. Intuitively, where to place replica servers (which contains copy of content ) is directly in related to average content access latency for clients. Therefore, optimizing server placement must succeed in minimizing access latency. In a traditional CDN, replica server and content replica are coupled. Thus server placement becomes the basis of replica distribution. Theoretically, there are three models used so far to formulate the server placement problem. They are Minimum KCenter problem, the location facility problem [25] and the Minimum K-Center problem and location facility problem with constraints (also quite popularly) [26]. Minimum K-Center problem Given N servers, select K (K<N) centers (facilities), and then for each location j which is assigned to center i ( i in N ), we shall always have cost djcij (where dj denotes the demand of the node j, cij denotes the distance between i and j). The goal is to select the K centers to minimize the sum of these costs. Location facility problem Given a set of locations I at which facilities may be built, building a facility at location I incurs a cost of Fi. Each client j must be assigned to one facility, incurring a cost djcij. The objective is to find a solution of the minimum total cost. The difference between this model and K-Center problem is the number of centers. K-Center model place a limitation i number of the centers (which is the K), but the Location facility model is open to varying from N. Limited K-center and location facility problem In [26], it is denoted as a capacity version problem. In this model, we place more service constraints on both the location facility and the centers. For instance, a mount of services a center can provide and maximal number of requests which a facility can serve can be constraints. This enables the server and replica placement problem to be formulated as either a limited or unlimited location facility problem, or a limited or unlimited minimal K-center problem. Based upon these problem models, there were many solutions which had been developed. For this NP-hard minimum K center problem, if we are willing to tolerate inaccuracies within a factor of 2, i.e. the maximum distance between a node and the nearest center being no worse than twice the maximum in the optimal case, the 10 problem is solvable in O (N | E |) time [28] The algorithm can be described briefly as follows: Given a graph G = (V, E) and all its edges arranged in non-decreasing order by edge cost. c: c(e1)≤ c(e2)≤… ≤ c(em), let Gi=(Vi, Ei), where Ei={e1, e2, …ei}. A square graph of G, G² is the graph containing V and edge (U,V) wherever there is a path between u and v in G of at most two hops, u ≠ v – some edges in G² are pseudo edges, in that they do not exist in G. G = (V, E) is a subset of V’ is included in V such that , for all u, v belongs to V’ , the edge (u , v) is not in E. An independent set of nodes G² thus a set of nodes in G that are at least three hops apart in G. We define a maximal independent set M as an independent set V’ such that all nodes in V – V’ are at most one hop apart from nodes in V’. 
The outline of the minimum K-center algorithm from [28] is as follows:
1. Construct G1², G2², G3², …, Gm².
2. Compute a maximal independent set Mi for each Gi².
3. Find the smallest i such that |Mi| ≤ K, say i = j.
4. Mj is the set of K centers.

1.4.4.1.2 Heuristic approaches

Since the theoretical approaches are computationally expensive and do not consider the network or the workload, they are difficult to apply in realistic situations [29] and may not be suitable for a CDN. Therefore, other heuristic and suboptimal algorithms have been proposed which consider practical aspects of the CDN, such as network load, traffic patterns, and network topology [26], [29], [30]. They offer relatively low computational complexity.

After comparing different algorithms (a tree-based algorithm, a greedy algorithm, a random algorithm, a hot-spot algorithm, and a super-optimal algorithm) in simulation, Qiu et al. [26] found that the greedy algorithm has the best performance, is less computationally expensive, and is relatively insensitive to imperfect data. The basic idea of the greedy algorithm is as follows. Suppose M servers are to be placed amongst N potential sites. It chooses one site at a time. In the first iteration, it evaluates each of the N potential sites individually to determine its suitability for hosting a server: it computes the cost associated with each site under the assumption that accesses from all clients converge at that site, and picks the site that yields the lowest cost, i.e. the lowest bandwidth consumption. In the second iteration, it searches for a second site that, in conjunction with the site already selected, yields the lowest cost. In general, in computing the cost, the algorithm assumes that clients direct their accesses to the nearest server, i.e. the one that can be reached with the lowest cost. The iteration continues until M servers have been chosen. To support this greedy approach, one usual method is to partition the graph into trees; the K Hierarchically well-Separated Tree [27] (K-HST) is one of the typical representations.

However, greedy placement requires knowledge of the client locations in the network and of all pairwise inter-node distances. This information in many cases may not be available; for instance, the use of NAT and firewalls might prevent locating clients. In [29], a topology-informed placement strategy has been proposed. Assuming that nodes with the highest outdegree8 can reach more nodes with smaller latency, servers are placed on candidate hosts in descending order of outdegree. These are called transit nodes, due to the assumption that nodes at transit points in the core of the Internet will have the highest outdegrees. In most cases, Autonomous System gateways will be chosen as the transit nodes. However, due to the inaccuracy of AS-level topology information, the authors of [29] exploited router-level topology information and showed that this results in better performance than simply using AS-level routing information. Going deeper into the network, they found that each LAN associated with a router is a potential site to place a server, rather than each AS being a site.

8 In a directed graph, we say that a vertex has outdegree x if there are (exactly) x edges leaving that vertex.

To sum up, considering the most up-to-date solutions for server placement, the greedy and topology-informed placement strategies are both well developed. Edge computing [9] has been inspired by these solutions. However, to the best of our knowledge, the application of these approaches in the real world is not yet well explored.
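As an illustration of the greedy placement procedure described above, the following Python sketch chooses M sites one at a time, each time adding the candidate site that most reduces the total client-to-nearest-server cost. The clients, candidate sites, and distance values are hypothetical placeholders; this is a sketch of the general idea only, not the exact algorithm evaluated in [26].

# Illustrative greedy server placement: iteratively pick the candidate site that
# minimizes the total cost, assuming each client uses its nearest chosen site.
# The distance matrix (clients x candidate sites) is an arbitrary example.

distances = {
    "client-1": {"site-A": 3, "site-B": 9, "site-C": 7},
    "client-2": {"site-A": 8, "site-B": 2, "site-C": 5},
    "client-3": {"site-A": 9, "site-B": 8, "site-C": 1},
}

def total_cost(chosen, dist):
    """Sum, over all clients, of the distance to the nearest chosen site."""
    return sum(min(row[s] for s in chosen) for row in dist.values())

def greedy_placement(dist, m):
    """Pick m sites, one per iteration, greedily minimizing the total cost."""
    candidates = {s for row in dist.values() for s in row}
    chosen = []
    for _ in range(m):
        best = min(candidates - set(chosen),
                   key=lambda s: total_cost(chosen + [s], dist))
        chosen.append(best)
    return chosen

if __name__ == "__main__":
    print(greedy_placement(distances, 2))  # ['site-C', 'site-A'] for this data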
1.4.4.2 Replica placement

Similar to server placement, replica placement is also a facility location problem; the difference from server placement is a greater concern with user access patterns. The first problem model was formulated in [32]. It considers distance, cache size, and access frequency. As a distance metric, the authors chose a hierarchical distance model in order to better approximate the actual internetwork.

1.4.4.2.1 A typical cost model

In [33], the authors developed a cost model for object placement over the Internet based upon object size, storage capacity in an Autonomous System, and the distance between Autonomous Systems. Their cost function is the average number of hops a request must traverse, taken over all ASs and all objects: it is computed from dij(x), the shortest distance from AS i to a copy of object j under the placement x, where J is the number of objects, I is the number of ASs, and Sij is the number of bytes of storage in AS j for the ith object. Given a target number of hops T, the decision problem asks whether there is a placement x whose average number of hops is at most T, subject to the storage capacity constraints of the ASs. They have proved that this problem is NP-hard, which means that for a large number of objects and ASs it is not feasible to solve it optimally [34]. Based on this, they adopted an approach similar to [32], using heuristic algorithms to solve the placement problem. The algorithms they evaluated were: Random, Popularity, Greedy-Single, and Greedy-Global. Similarly, the authors of [32] investigated purely local algorithms, including MFUPlace, LRU replacement, and GreedyDual replacement, as well as cooperative placement algorithms, including an optimal placement algorithm, a simple near-optimal placement algorithm, a Greedy placement algorithm, an Amortized placement algorithm, and a Hierarchical GreedyDual algorithm. Eventually, both [33] and [32] concluded that a cooperative approach is the best one, and [32] also identified the client access traffic pattern as a key challenge for replica placement. In [32], with simulations based on a Zipf [35] model, the authors find that Peer-to-Peer (explained in chapter 2) offers a good way out of this NP-hard problem and achieves optimal replica placement and replacement. In [36], the authors consider the replica placement problem at different granularities. Similar to [32], they established a cost model, and they also believe that a cooperative placement and replacement strategy is the better approach. The major contribution of their work is to introduce a cluster-based replication strategy. Their comparison of the state to be maintained and the computational cost of different mechanisms (per web site, per cluster, and per URL) shows that a cluster-based replication scheme is relatively good. In a simulation of the MSNBC web site, cluster-based replication outperformed the other strategies. In particular, their incremental clustering scheme is very useful in improving content availability during flash crowds at popular web sites, since it adapts well to user access patterns.
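As an illustration of the greedy, storage-constrained style of heuristic discussed above, below is a small Python sketch of a Greedy-Global-like placement loop. It is a simplified approximation, not the algorithm from [33]: the hop-count matrix, per-AS storage capacities, and object sizes are invented placeholders, and the gain computation only counts the benefit for the hosting AS, whereas a full model would recompute the benefit for every AS.

def greedy_global_placement(hops, obj_size, capacity):
    """Repeatedly place the single (AS, object) replica that gives the
    largest reduction in hop count, while storage lasts.

    hops[i][j]   - current distance from AS i to the nearest copy of object j
    obj_size[j]  - size of object j in bytes
    capacity[i]  - remaining storage (bytes) in AS i
    """
    n_as, n_obj = len(hops), len(obj_size)
    placement = set()              # chosen (as_index, obj_index) pairs

    while True:
        best, best_gain = None, 0
        for i in range(n_as):
            for j in range(n_obj):
                if (i, j) in placement or obj_size[j] > capacity[i]:
                    continue
                # placing a copy of j in AS i drives hops[i][j] to zero;
                # the hop count removed is taken as the gain (simplification)
                gain = hops[i][j]
                if gain > best_gain:
                    best, best_gain = (i, j), gain
        if best is None:
            return placement       # no placement improves the cost any more
        i, j = best
        placement.add((i, j))
        capacity[i] -= obj_size[j]
        hops[i][j] = 0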
1.4.4.2.2 Discussion of replica placement algorithm criteria

A replica placement strategy decides what content is to be replicated and where, such that some objective function is optimized under a given traffic pattern and a set of resource constraints. In the objective function, the following metrics should be taken into consideration:
A. Reads: the rate of read accesses by a client to an object. This might also be expressed as the probability of an access to an object within a time unit.
B. Writes: the rate of write accesses by a client to an object.
C. Distance: the distance between a client and an origin content/replica server, represented by a metric such as latency.
D. Storage cost: the cost of storing an object at a replica server. This might reflect the size of the object, the throughput of the server, or whether a replica is already present at the server.
E. Content size: the size of the object in bytes.
F. Access time: a time stamp indicating the last time the object was accessed at a replica server.
G. Hit ratio: the hit ratio of any replica along the path.
In addition, constraint primitives can be added, such as: the storage capacity of a replica server, the load capacity of a replica server, the node bandwidth capacity of a replica server, the link capacity between a client and the replica server, the number of replicas to be disseminated, the location of the original copy, the delay tolerated by the CDN, and the availability of a certain object in the CDN. In [37], the authors made an intensive study of many replica placement algorithms and proposed sophisticated metrics to evaluate them. In particular, they summarized all of the cost functions in their paper, which provides a comprehensive understanding of which constraints were considered in each replica placement scheme. To sum up, replica placement and replacement have been well researched for CDNs. Ultimately, a placement algorithm is about how to disseminate replicas under resource and Quality of Service constraints.

1.4.4.3 Replica management

How to maintain data consistency between the master copy of the content on the content server and the replicas on the replica servers is one of the most important questions for all CDNs. If a change occurs, the object must be updated on the replica servers; it is important that a content user does not get a stale version of the requested content. From the CDN's perspective, the overhead traffic of these updates over the overlay network should be minimized. This is called the replica (or cache) coherency or consistency problem, and it is depicted in Figure 5. In the following subsections, I will explain two types of replica management strategies: strong consistency and weak consistency. There are many object attributes in HTTP [38] which can assist replica servers in maintaining cache coherency.

Figure 5. Replica consistency overview

1.4.4.3.1 Strong consistency

Client validation
This approach is also called polling-every-time. The client treats cached resources as potentially out-of-date on each access and sends an If-Modified-Since header with each access to the resource. This approach can lead to many 304 responses (the HTTP response code for "Not Modified") from the server if the resource does not actually change.

Server invalidation
Upon detecting a resource change, the server sends invalidation messages to all clients that have recently accessed, and potentially cached, the resource [49]. This approach requires a server to keep track of which clients to invalidate when a resource changes, which can become cumbersome when the number of clients is large and thus leads to a scalability problem. In addition, the lists themselves can become out-of-date, causing the server to send invalidation messages to clients who are no longer caching the resource, which causes unnecessary traffic.
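A minimal sketch of the client-validation (polling-every-time) approach described above, using only Python's standard library; the host and path are placeholders. On each access the cached copy is revalidated with If-Modified-Since, and a 304 answer means the cached bytes may be reused.

import http.client

HOST, PATH = "www.example.com", "/index.html"   # placeholder origin server

cached_body = None
cached_last_modified = None

def fetch():
    """Conditional GET: ask the server only for content newer than our copy."""
    global cached_body, cached_last_modified
    conn = http.client.HTTPConnection(HOST, timeout=10)
    headers = {}
    if cached_last_modified:
        headers["If-Modified-Since"] = cached_last_modified
    conn.request("GET", PATH, headers=headers)
    resp = conn.getresponse()
    if resp.status == 304:            # Not Modified: cached copy is still valid
        resp.read()
        return cached_body
    cached_body = resp.read()         # 200: refresh the cache
    cached_last_modified = resp.getheader("Last-Modified")
    return cached_body

if __name__ == "__main__":
    fetch()      # first access fills the cache
    fetch()      # second access is validated and may be answered with 304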
Adaptive leases
The server employs a lease mechanism which determines for how long it should propagate invalidations to the proxies [59]. This work also presents policies for computing appropriate lease durations so as to balance the trade-off between state space overhead and control message overhead.

Propagation and invalidation combination
Fei [60] proposed a smart propagation policy in a hybrid approach (propagation plus invalidation). The rationale is to distinguish when to use unicast for invalidation and when to use multicast to propagate the updates. In this propagation policy, the author uses the following notation: U is the update rate of the object/document at the origin content server, R is the total request rate, N is the number of replicas of the object, and ε is a factor expressing the relative efficiency of unicast and multicast [61], where −0.34 < ε < 0.30. For each object, the CDN chooses propagation when the request rate R is high enough, relative to the update rate U and the number of replicas N (scaled by ε), that pushing the updates is cheaper than invalidating; otherwise it uses invalidation. Intensive simulation results show that this method significantly reduces the traffic generated in maintaining replica consistency.

1.4.4.3.2 Weak consistency

Adaptive TTL
Similar to the Time-to-Live of a packet in IPv4 [51], adaptive TTL [52] handles the problem by adjusting a document's time-to-live based on observations of its lifetime. Adaptive TTL takes advantage of the fact that if a file has not been modified for a long time, it tends to stay unchanged. Thus the time-to-live attribute of a document is assigned to be a percentage of the document's current "age", which is the current time minus the last-modified time of the document. Studies [52] have shown that adaptive TTL can keep the probability of stale documents within reasonable bounds (<5%). Most proxy servers (e.g. [53], [54], [55]) use this mechanism. However, there are several drawbacks with this expiration-based coherence [50]. First, users must wait for expiration checks to occur even if they could tolerate some staleness in the requested page. Second, if a user is not satisfied with the staleness of a returned document, they have no choice but to send a no-cache request (using the Pragma header) to reload the entire document from its home site. Third, the mechanism provides no strong guarantee regarding document staleness. Fourth, users cannot specify the degree of staleness they are willing to tolerate. Finally, when the user aborts a document load, caches often abort the document load as well.

Piggyback invalidation
The authors of [56], [57], [58] proposed piggyback mechanisms to improve the effectiveness of cache coherency. Piggyback Cache Validation (PCV) [56] capitalizes on requests sent from the proxy cache to the server: in the simplest case, whenever a proxy cache has a reason to communicate with a server, it piggybacks a list of cached, but potentially stale, resources from that server for validation. The basic idea of the Piggyback Server Invalidation (PSI) mechanism [57] is for servers to piggyback on a reply to a proxy the list of resources that have changed since the last access by this proxy; the proxy invalidates cached entries on the list and can extend the lifetime of entries not on the list. They also proposed a hybrid approach which combines the PSI and PCV techniques to achieve the best overall performance [58]. The choice of mechanism depends on a time parameter: if the time since the proxy's last access is small, the PSI mechanism is used, while the PCV mechanism is used to explicitly validate cache contents over longer intervals.
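The adaptive-TTL heuristic described above is easy to state in code. The sketch below is a simplified illustration; the 20% aging factor and the example timestamps are assumptions, not values taken from [52].

import time

AGE_FACTOR = 0.2   # assumed: TTL is 20% of the document's current age

def adaptive_ttl(last_modified, now=None):
    """Return a freshness lifetime (seconds) for a cached document.

    The longer a document has gone unmodified, the longer we are willing
    to serve it without revalidation.
    """
    now = now or time.time()
    age = max(0.0, now - last_modified)
    return AGE_FACTOR * age

def is_fresh(fetched_at, last_modified, now=None):
    """A cached copy is fresh while its time in the cache is below the TTL."""
    now = now or time.time()
    return (now - fetched_at) < adaptive_ttl(last_modified, now)

if __name__ == "__main__":
    now = time.time()
    one_week_ago = now - 7 * 24 * 3600
    # fetched an hour ago, document last changed a week ago -> still fresh
    print(is_fresh(fetched_at=now - 3600, last_modified=one_week_ago))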
The Distributed Object Consistency Protocol
Researchers at HP proposed a protocol to enhance the HTTP cache control mechanism. This protocol focuses on two goals: reducing response time and reducing server demand. The Distributed Object Consistency Protocol (DOCP) [62] defines a new set of HTTP headers to provide object consistency between content origin servers and edge proxy servers. DOCP distributes the ability to serve objects authoritatively on behalf of a content provider throughout the network. This middleware-like architecture represents the requesting client by its master and the content server by its slave. Figure 6 depicts an overview of this architecture.

Figure 6. HP DOCP architecture

1.4.4.4 Server location and request routing

The location and routing system of a CDN must be able to serve a client request by redirecting it to a replica server located as near as possible to the requesting client. In fact, server location and request routing are two aspects of the same problem of implementing a request service: from a client's perspective, we can formulate it as a server location problem; from the CDN's perspective, we formulate it as a request routing problem for a client who requests certain content. In the following sections, I will explain these two views one after another.

1.4.4.4.1 Server location

Similar to the problem of placing replica servers and content replicas, how a client can locate the best server in terms of proximity metrics and replica server load is another important issue in a CDN system. Since in most CDNs a replica of specific content is stored on some server, the client will locate the replica if it selects the right server.

1.4.4.4.1.1 Multicast vs. agent

These two techniques can be considered reactive and proactive approaches, respectively. In the former, when a client needs to find a server, the CDN sends the request to all its replica servers in a certain multicast group (servers are catalogued by service type). The client then chooses the server that generates the quickest response from that group. The disadvantage is the high overhead of the messages sent to all the servers in one or more groups. This has been studied in [43]. The agent approach is more efficient: an agent probes different servers periodically and maintains a list of servers with the most up-to-date load information for each server. Agents at different locations can use their own protocols to coordinate with each other. When a client requests a type of server, the agent selects a suitable server for the client.
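Below is a minimal sketch of the proactive, agent-based selection described above: an agent periodically probes the replica servers by timing a TCP connection attempt and hands out the currently best one. The server addresses, the probe interval, and the use of connect time as the sole metric are assumptions made for illustration.

import socket
import time

# hypothetical replica servers the agent watches
SERVERS = [("replica1.example.com", 80), ("replica2.example.com", 80)]
PROBE_INTERVAL = 30          # seconds between probe rounds (assumed)

latency = {}                 # server -> last measured connect time (seconds)

def probe(server, timeout=2.0):
    """Measure the TCP connect time to a server; None means unreachable."""
    start = time.time()
    try:
        with socket.create_connection(server, timeout=timeout):
            return time.time() - start
    except OSError:
        return None

def probe_all():
    for server in SERVERS:
        latency[server] = probe(server)

def best_server():
    """Return the reachable server with the lowest measured latency."""
    reachable = {s: t for s, t in latency.items() if t is not None}
    return min(reachable, key=reachable.get) if reachable else None

if __name__ == "__main__":
    while True:              # the agent's main loop
        probe_all()
        print("current choice:", best_server())
        time.sleep(PROBE_INTERVAL)

In a real deployment the agent would also exchange its measurements with agents at other locations and weigh in server load and service type, as discussed above.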
1.4.4.4.1.2 Routing layer vs. application layer

In [44], the authors proposed using Anycast to select the nearest server for a client. However, it assumes that all servers offer the same services, so selection among different services cannot be done unless policy constraints are programmed into the routers. In contrast, application-layer location services can provide better service differentiation, load information, and even bandwidth information for the clients. Similar to the previous approach, an agent can act as a monitor of routing-layer traffic and decide to send updates to a database at a rendezvous point. Updates are made on demand by the agent, so much of the traffic overhead can be reduced [45]. However, this can potentially lead to a single point of failure when a large number of clients are sending requests to the CDN. The most important metrics in server selection are: the distance between the client and the replica server, the server load, the service type, and the bandwidth on the link. The above techniques are quite widely used in today's CDNs.

1.4.4.4.2 Request routing

In request routing, we address the problem of deciding which replica server can best serve a given client request, in terms of some metrics. These metrics can be, for example, replica server load (where we choose the replica server with the lowest load), end-to-end latency (where we choose the replica server that offers the shortest response time to the client), or distance (where we choose the replica server that is closest to the client). According to the IETF classification [46], there are four categories and eighteen types of request routing mechanisms; Figure 7 depicts all of them. Since request routing has been well studied and standardized, I will just summarize these results in the following paragraphs. For details, please see RFC 3568 [163].

Figure 7. Content request routing mechanisms

1.4.4.4.2.1 Transport-Layer Request-Routing

At the transport layer, finer levels of granularity can be achieved by closer inspection of the client's requests. In this approach, the Request-Routing system inspects the information available in the first packet of the client's request to make surrogate selection decisions. Inspection of the client's request provides the client's IP address, port information, and layer 4 protocol. The acquired data can be used in combination with user-defined policies and other metrics to select the surrogate that is best suited to serve the request. In general, the forward-flow traffic (client to newly selected surrogate) will flow through the surrogate originally chosen by DNS, while the reverse-flow traffic (surrogate to client), which normally carries much more data than the forward flow, typically takes the direct path. Because of the overhead involved, transport-layer Request-Routing is better suited for long-lived sessions such as FTP [161] and RTSP [162]; however, it can also be used to direct clients away from overloaded surrogates. In general, transport-layer Request-Routing can be combined with DNS-based techniques. As stated earlier, DNS-based methods resolve client requests to domains or sub-domains based on the IP address of the client's DNS server. Hence, the DNS-based methods can be used as a first step in deciding on an appropriate surrogate, with more accurate refinement made by the transport-layer Request-Routing system.

1.4.4.4.2.2 Single Reply

In this approach, the DNS server is authoritative for the entire DNS domain or a sub-domain. The DNS server returns the IP address of the best surrogate in an A record to the requesting DNS server. The IP address of the surrogate could also be a virtual IP (VIP) address of the best set of surrogates for the requesting DNS server.

1.4.4.4.2.3 Multiple Replies

In this approach, the Request-Routing DNS server returns multiple replies, such as several A records for various surrogates. Common implementations of client-site DNS servers cycle through the multiple replies in a round-robin fashion. The order in which the records are returned can be used to direct multiple clients using a single client-site DNS server.
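To illustrate the single-reply and multiple-replies styles of DNS Request-Routing above, here is a small Python sketch of the decision a Request-Routing DNS server might make. The surrogate table, the coarse first-octet proximity rule, and the addresses are hypothetical; a real implementation would sit inside an authoritative DNS server and use proper topology or measurement data.

from itertools import cycle

# hypothetical surrogate pools keyed by a coarse client-resolver prefix
SURROGATES = {
    "192": ["192.0.2.10", "192.0.2.11"],        # "near" pool for 192.x.x.x resolvers
    "198": ["198.51.100.10", "198.51.100.11"],
}
DEFAULT_POOL = ["203.0.113.10", "203.0.113.11"]
_round_robin = {}            # pool id -> cycling iterator

def single_reply(resolver_ip):
    """Return one A record: the 'best' surrogate for this client-site resolver."""
    pool = SURROGATES.get(resolver_ip.split(".")[0], DEFAULT_POOL)
    return pool[0]

def multiple_replies(resolver_ip):
    """Return the whole pool, rotated so successive queries spread the load."""
    key = resolver_ip.split(".")[0]
    pool = SURROGATES.get(key, DEFAULT_POOL)
    it = _round_robin.setdefault(key, cycle(range(len(pool))))
    start = next(it)
    return pool[start:] + pool[:start]

if __name__ == "__main__":
    print(single_reply("192.0.2.53"))       # one "best" surrogate
    print(multiple_replies("192.0.2.53"))   # rotated list on each call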
1.4.4.4.2.4 Multi-Level Resolution

In this approach, multiple Request-Routing DNS servers can be involved in a single DNS resolution. The rationale for utilizing multiple Request-Routing DNS servers in a single DNS resolution is that it allows complex decisions to be distributed from a single server to multiple, more specialized, Request-Routing DNS servers. The most common mechanisms used to insert multiple Request-Routing DNS servers into a single DNS resolution are NS and CNAME records. An example would be the case where a higher-level DNS server operates within a region, directing the DNS lookup to a more specific DNS server within that region to provide a more accurate resolution.

1.4.4.4.2.5 NS Redirection

A DNS server can use NS records to redirect the authority for the next-level domain to another Request-Routing DNS server. This technique allows multiple DNS servers to be involved in the name resolution process. For example, a client-site DNS server resolving a.b.example.com would eventually request a resolution of a.b.example.com from the name server authoritative for example.com. The name server authoritative for this domain might be a Request-Routing DNS server. In this case, the Request-Routing DNS server can either return a set of A records or redirect the resolution of the request a.b.example.com, using NS records, to the DNS server that is authoritative for b.example.com. One drawback of using NS records is that the number of Request-Routing DNS servers is limited by the number of parts in the DNS name. This problem results from the DNS policy that causes a client-site DNS server to abandon a request if no additional parts of the DNS name are resolved in an exchange with an authoritative DNS server. A second drawback is that the last DNS server can determine the TTL of the entire resolution process: the last DNS server can return its own NS record in the authoritative section of its response, and the client will use this cached NS record for further request resolutions until it expires. Another drawback is that some implementations of BIND voluntarily cause timeouts, to simplify their implementation, in cases where an NS-level redirect points to a name server for which no valid A record is returned or cached. This is especially a problem if the domain of the name server does not match the domain currently being resolved, since in this case the A records that might be passed in the DNS response are discarded for security reasons. A final drawback is the added delay in resolving the request due to the use of multiple DNS servers.

1.4.4.4.2.6 CNAME Redirection

In this scenario, the Request-Routing DNS server returns a CNAME record to direct resolution to an entirely new domain. (CNAME stands for canonical name: a host's official name, as opposed to an alias. The official name is the first hostname listed for its Internet address in the hostname database, /etc/hosts, or the Network Information Service map hosts.byaddr. A host with multiple network interfaces may have more than one Internet address, each with its own canonical name and zero or more aliases; a host's canonical name can be found with nslookup by setting querytype=CNAME and entering the hostname.) In principle, the new domain might employ a new set of Request-Routing DNS servers. One disadvantage of this approach is the additional overhead of resolving the new domain name; the main advantage is that the number of Request-Routing DNS servers is independent of the format of the domain name.

1.4.4.4.2.7 Anycast

Anycast [44] is a network service applicable to situations where a host, application, or user wishes to locate a host that supports a particular service but, if several servers support that service, does not particularly care which one is used.
In an Anycast service, a host transmits a datagram to an Anycast address and the network is responsible for providing best-effort delivery of the datagram to at least one, and preferably only one, of the servers that accept datagrams for that Anycast address. The motivation for Anycast is that it considerably simplifies the task of finding an appropriate server. For example, users, instead of consulting a list of servers and choosing the closest one, could simply type the name of the service and be connected to the nearest server. By using Anycast, DNS resolvers would no longer have to be configured with the IP addresses of their servers, but could instead send a query to a well-known DNS Anycast address. Furthermore, to combine measurement and redirection, the Request-Routing DNS server can advertise an Anycast address as its IP address, with the same address used by multiple physical DNS servers. In this scenario, the Request-Routing DNS server that is closest to the client-site DNS server in terms of OSPF and BGP routing will receive the packet containing the DNS resolution request, and it can use this information to make a Request-Routing decision. Drawbacks of this approach are that the DNS server reached may not actually be the closest to the client in terms of routing; that routing protocols are typically not load sensitive, so the closest server may not be the one with the least network latency; and that server load is not considered during the Request-Routing process.

1.4.4.4.2.8 Object Encoding

Since only DNS names are visible during DNS Request-Routing, some solutions encode the object type, an object hash, or similar information into the DNS name. This might vary from a simple division of objects based on object type (such as images.a.b.example.com and streaming.a.b.example.com) to a sophisticated scheme in which the domain name contains a unique identifier (such as a hash) of the object. The obvious advantage is that object information is available at resolution time. The disadvantage is that the client-site DNS server has to perform multiple resolutions to retrieve a single Web page, which might increase rather than decrease the overall latency.
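A minimal sketch of the object-encoding idea above: the object's URL path is hashed into the hostname so that the identity of the object becomes visible to the Request-Routing DNS server at resolution time. The domain cdn.example.com and the 16-hex-digit truncation are arbitrary choices made for illustration.

import hashlib

CDN_DOMAIN = "cdn.example.com"        # hypothetical CDN domain

def encode_object(url_path, object_type):
    """Build a DNS name that carries both the object type and an object hash."""
    digest = hashlib.sha1(url_path.encode("utf-8")).hexdigest()[:16]
    return f"{digest}.{object_type}.{CDN_DOMAIN}"

if __name__ == "__main__":
    # prints something like '6cc...e2.images.cdn.example.com'
    print(encode_object("/2004/mars/spirit-panorama.jpg", "images"))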
1.4.4.4.2.9 DNS Request-Routing Limitations

This section lists some of the limitations of DNS-based Request-Routing techniques. DNS only allows resolution at the domain level, whereas an ideal request resolution system would service requests at the level of individual objects. In DNS-based Request-Routing systems, servers may be required to return DNS entries with short time-to-live (TTL) values in order to be able to react quickly in the face of outages; this in turn may increase the volume of requests to DNS servers. Some DNS implementations do not always adhere to the DNS standards; for example, many implementations do not honor the DNS TTL field. DNS Request-Routing is based only on knowledge of the client's DNS server, as client addresses are not relayed within DNS requests. This limits the ability of the Request-Routing system to determine a client's proximity to the surrogate. Furthermore, DNS servers can request and allow recursive resolution of DNS names. For recursive resolution of requests, the Request-Routing DNS server will not be exposed to the IP address of the client's DNS server; it will only see the address of the DNS server that is recursively requesting the information on behalf of the client-site DNS server. For example, imgs.example.com might be resolved by a CDN, but the request for the resolution might come from dns1.example.com as a result of the recursion. Users that share a single client-site DNS server will be redirected to the same set of IP addresses during the TTL interval; this might lead to overloading of the surrogate during a flash crowd unless different sites are given different answers. Some implementations of BIND can cause DNS timeouts to occur while handling exceptional situations; for example, timeouts can occur for NS redirections to unknown domains. Finally, DNS-based request routing techniques can overburden third-party DNS servers, which should not be allowed. RFC 2782 [164] provides warnings on the use of DNS for load balancing; readers are encouraged to read that RFC for a better understanding of these limitations.

1.4.4.4.2.10 Application-Layer Request-Routing

Application-layer Request-Routing systems perform deeper examination of the client's packets, beyond the transport-layer header. Deeper examination of the client's packets provides fine-grained Request-Routing control, down to the level of individual objects. The process can be performed in real time at the time of the object request. Exposure to the client's IP address, combined with fine-grained knowledge of the requested objects, enables application-layer Request-Routing systems to provide better control over the selection of the best surrogate.

1.4.4.4.2.11 Header Inspection

Some application-level protocols, such as HTTP, RTSP, and SSL [165], provide hints in the initial portion of the session about how the client request should be directed. These hints may come from the URL of the content or from other parts of the MIME request header, such as Cookies.

1.4.4.4.2.12 URL-Based Request-Routing

Application-level protocols such as HTTP and RTSP describe the requested content by its URL. In many cases, this information is sufficient to disambiguate the content and suitably direct the request. In most cases, it may be sufficient to make the Request-Routing decision just by examining the prefix or suffix of the URL.

1.4.4.4.2.13 302 Redirection

In this approach, the client's request is first resolved to a virtual surrogate. The surrogate then returns an application-specific code, such as 302 (in the case of HTTP or RTSP), to redirect the client to the actual delivery node. This technique is relatively simple to implement. However, the main drawback of this method is the additional latency involved in sending the redirect message back to the client and in the client having to resolve a new address.
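A minimal sketch of the 302-redirection step described above, using Python's standard library: a virtual surrogate answers every request with a redirect to a delivery node. The choose_delivery_node logic and the hostnames are placeholders; a real system would pick the node from load and proximity metrics.

from http.server import BaseHTTPRequestHandler, HTTPServer

def choose_delivery_node(path, client_ip):
    """Placeholder selection logic: a real CDN would use load/proximity data."""
    return "delivery1.cdn.example.com"

class RedirectingSurrogate(BaseHTTPRequestHandler):
    def do_GET(self):
        node = choose_delivery_node(self.path, self.client_address[0])
        self.send_response(302)                       # HTTP "Found" redirect
        self.send_header("Location", f"http://{node}{self.path}")
        self.end_headers()

if __name__ == "__main__":
    # listen on localhost:8080 and redirect every GET to the delivery node
    HTTPServer(("127.0.0.1", 8080), RedirectingSurrogate).serve_forever()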
1.4.4.4.2.14 In-Path Element

In this technique, an In-Path element is present in the network in the forwarding path of the client's request. The In-Path element provides transparent interception of the transport connection: it examines the client's content requests and performs the Request-Routing decisions. The In-Path element then splices the client connection to a connection with the appropriate delivery node and passes along the content request. In general, the return path would also go through the In-Path element; however, it is possible to arrange for a direct return by passing the address translation information to the surrogate or delivery node through some proprietary means. The primary disadvantage of this method is the performance implication of URL parsing in the path of the network traffic; however, since the return traffic is generally much greater than the forward traffic, the parsing burden falls only on the smaller forward flow. The technique allows the traffic to be partitioned among a set of delivery nodes by the content objects identified by URLs. This allows object-specific control of server loading; for example, requests for non-cacheable object types may be directed away from a cache.

1.4.4.4.2.15 Header-Based Request-Routing

This technique involves using HTTP attributes such as Cookie, Language, and User-Agent in order to select a surrogate. Cookies can be used by a web site to identify a customer or a session; cookie-based Request-Routing thus provides content service differentiation based on the client. This approach works provided that the cookies belong to the client. In addition, it is possible to direct the connections of a multi-session transaction to the same server, to achieve session-level persistence. The Language header can be used to direct traffic to a language-specific delivery node. The User-Agent header helps identify the type of client device; for example, a voice browser, PDA, or cell phone can indicate the type of delivery node that has content specialized to handle the request.

1.4.4.4.2.16 Site-Specific Identifiers

Site-specific identifiers help authenticate and identify a session from a specific user; this information may be used to direct a content request. An example of a site-specific identifier is the SSL session identifier, which is generated by a web server and used by the web client in succeeding sessions to identify itself and avoid an entirely new security authentication exchange. In order to inspect the session identifier, an In-Path element observes the responses of the web server and determines the session identifier, which is then used to associate the session with a specific server; the remaining sessions are directed based on the stored session identifier.

1.4.4.4.2.17 Content Modification

This technique enables a content provider to take direct control over Request-Routing decisions without the need for specific switching devices or directory services in the path between the client and the origin server. Basically, a content provider can directly communicate to the client the best surrogate that can serve the request. Decisions about the best surrogate can be made on a per-object basis or can depend on a set of metrics. The overall goal is to improve scalability and the performance of delivering the modified content, including all embedded objects. In general, the method takes advantage of content objects that consist of a basic structure that includes references to additional, embedded objects. For example, most web pages consist of an HTML document that contains plain text together with some embedded objects, such as GIF or JPEG images. The embedded objects are referenced using embedded HTML directives, which direct the client to retrieve the embedded objects from the origin server.
A content provider could modify the references to embedded objects so that they are fetched from the best surrogate; this technique is also known as URL rewriting. Content modification techniques must not violate the architectural concepts of the Internet [48]. Special consideration must be given to ensuring that the task of modifying the content is performed in a manner consistent with RFC 3238 [48], which specifies the architectural considerations for intermediaries that perform operations or modifications on content. The basic types of URL rewriting are discussed in the following subsections.

1.4.4.4.2.17.1 A-priori URL Rewriting

In this scheme, a content provider rewrites the embedded URLs before the content is placed on the origin server. In this case, URL rewriting can be done either manually or by using software tools that parse the content and replace embedded URLs. A-priori URL rewriting alone does not allow client specifics to be considered in Request-Routing; however, it can be used in combination with DNS Request-Routing to direct the related DNS queries into the domain name space of the service provider, with dynamic Request-Routing based on client specifics then performed using the DNS approach.

1.4.4.4.2.17.2 On-Demand URL Rewriting

On-demand, or dynamic, URL rewriting modifies the content when the client request reaches the origin server. At this time, the identity of the client is known and can be considered when rewriting the embedded URLs. In particular, an automated process can determine, on demand, which surrogate would best serve the requesting client. The embedded URLs can then be rewritten to direct the client to retrieve the objects from that surrogate rather than from the origin server.

1.4.4.4.2.17.3 Content Modification Limitations

Content modification as a Request-Routing mechanism suffers from several limitations. For example, the first request from a client to a specific site must be served from the origin server. Content that has been modified to include references to nearby surrogates rather than to the origin server should be marked as non-cacheable, or alternatively such pages can be marked as cacheable only for a relatively short period of time; rewritten URLs on cached pages can cause problems, because they can become outdated and point to surrogates that are no longer available or no longer good choices.

1.4.4.4.2.18 Combination of Multiple Mechanisms

There are environments in which a combination of different mechanisms is beneficial and advantageous over using any one of the proposed mechanisms alone. The following example illustrates how the mechanisms can be used in combination. A basic problem of DNS Request-Routing is its resolution granularity, which allows resolution at the per-domain level only; per-object redirection cannot easily be achieved. However, content modification can be used together with DNS Request-Routing to overcome this problem: with content modification, references to different objects on the same origin server can be rewritten to point into different domain name spaces, and using DNS Request-Routing, requests for those objects can then be directed dynamically to different surrogates.
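A minimal sketch of on-demand URL rewriting as described above: embedded object references in an HTML page are rewritten to point at the surrogate chosen for the requesting client. The regular expression, the surrogate-selection stub, and the hostnames are assumptions for illustration, and the pattern ignores many real-world HTML corner cases.

import re

def best_surrogate_for(client_ip):
    """Placeholder: a real CDN would use proximity and load measurements."""
    return "surrogate1.cdn.example.com"

# naive pattern for src/href attributes pointing at the origin server
ORIGIN = "www.example.com"
EMBEDDED = re.compile(r'(src|href)="http://%s(/[^"]*)"' % re.escape(ORIGIN))

def rewrite(html, client_ip):
    surrogate = best_surrogate_for(client_ip)
    return EMBEDDED.sub(r'\1="http://%s\2"' % surrogate, html)

if __name__ == "__main__":
    page = ('<img src="http://www.example.com/logo.gif"> '
            '<a href="http://www.example.com/movie.mpg">movie</a>')
    print(rewrite(page, "192.0.2.7"))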
1.4.4.5 Self-organization

Users, content providers, and their Internet Service Providers are the three major actors in every CDN system. Users want fast access to the content, content providers want high availability of their content, and the ISPs want no network congestion and minimal bandwidth consumption for the traffic over their networks. Therefore a good CDN system must find the best trade-offs to meet all of the above requirements. Examples of such systems include Cisco's Boomerang algorithm in their patented Self-Organizing Distributed Architecture (SODA). In Cisco's own description:

Cisco Content routers utilize an extremely fast site-selection algorithm (Boomerang) to have each site respond to a Domain Name System query with the IP address of a server at the site. The Boomerang process selects the site that has the least amount of network delay between the site and the client's DNS server at that exact instant in time. The first response through the network wins the race and is used by the client to connect to a server at the requested web site. Cisco's content networking solutions for enterprise CDNs also use SODA technology. SODA enables the system to intelligently route requests and deliver content to the end user. The SODA algorithm learns about the network, thus performance improves with time and usage. When a browser selects content from a specific Web site, the content delivery network will look at the source IP address and redirect the request to the optimal content engine. Boomerang and SODA attempt to ensure the fastest delivery of content regardless of location, providing the highest availability and site response. SODA provides for the ability to redirect user requests to the best content router by allowing CDN devices to automatically organize themselves into a single cooperating system. Because the CDN devices collectively determine the fastest route, they also provide the fastest delivery of high-bandwidth content. [76]

The location and routing system of Cisco's CDN performs the functions described in section 1.4.4.4.2. The most interesting part of the Boomerang process is that the SODA algorithm learns about the underlying network, so that performance to the end user improves with usage. When new requests arrive for certain content, the content router selects a replica server for the client based on metrics such as presence of the content, geographical location, originating network, current network conditions, and current replica server health. Their content router actually integrates a redirection server and a layer-3 router in one cabinet. Thus it measures congestion in the underlying network before deciding to which content replica server each client request is redirected, optimizing layer-3 performance by avoiding congested links, which certainly benefits client access performance. However, in their architecture the self-organization relies on proprietary infrastructure, i.e. the Cisco CR-4450 content router [77].

1.5 Discussion

After examining the problems and approaches of server placement, replica placement, replica management, and server location and request routing in CDNs, we find several issues that are very interesting for future CDNs.

1.5.1 Large content for large numbers of users

Nowadays, publishing large content (e.g. movies, music, and video-on-demand) on web sites is becoming more and more popular because of the high demand from Internet users. Content distribution networks therefore face a challenge in delivering such large content objects. On one hand, this requires a great deal of bandwidth and predictable transmission delay to achieve high data availability; on the other hand, the number of content providers turning to CDNs to better serve their customers is growing rapidly.
Thereby, large and highly dynamic content distribution will raise many issues concerning the design and architecture of future CDNs, such as: How to replicate the large amount of content? How to use limited bandwidth to transmit this content? How to scale up to the large number of users who want to access this large content? How to distribute the load of the CDN when frequently accessed content sites appear? When the Spirit rover landed on Mars, NASA published many pictures and videos on their web site; the high hit rate [63], [64] indicates the challenges a CDN will face when a large number of users request large content objects. Content Distribution Network Interconnection [67], [74] has been studied for some time; the idea is to interconnect or peer CDNs, motivated by concerns such as CDN performance, security, accounting [67], and economics [68]. I would guess that eTouch [69], Speedra [70], and Sprint [71] had to address this scenario in order to distribute NASA's traffic for the Mars event. But this is a very expensive solution, as I discuss in section 5.1.3. Is there a better way to handle this large volume of traffic? I believe this should be a research focus for the next generation of CDNs.

1.5.2 Denial of service attack

Beyond normal access requests to the content server, there is another fact which should not be ignored: abnormal access requests, i.e. requests that are part of a Denial of Service (DoS) attack. These attacks typically flood a network or server with bogus request packets, rendering it unavailable to handle legitimate requests. Despite increased awareness about security issues, denial of service attacks remain a challenging problem. According to a Computer Security Institute survey, for example, the number of respondents indicating that their sites had been the victim of a DoS attack rose from 27% in 2000 to 38% in 2001 [72]. News about DoS can be found in the section "Selected news reports/interviews/panel discussions" of [73]. The key problem is that Distributed Denial of Service (DDoS) attacks use legitimate access methods (for example, the attacker sends TCP SYN packets which will be accepted by the CDN servers) to overwhelm the content server or the replica servers of a CDN. The study in [72] and CDI [66] choose server-side techniques to solve this problem. As we know, all security mechanisms incur overhead for checking and acting. Can we find a more cost-effective solution to this problem?

1.5.3 Scalability issue

The Internet population is still increasing exponentially. How well a CDN (or a set of CDNs) can work as the number of accessing clients continuously increases is a problem for today's CDNs. CDI [66] connects many CDNs to increase scalability through peering efforts; the rationale is to expand the number of replica servers and offer better load balancing amongst them. However, management overhead becomes inevitable when they have to coordinate. Can we have a simpler solution than that?

1.5.4 Self-organization in the next generation of CDNs

For large content distribution, large bandwidth consumption is inevitable. Thereby, making the best trade-off amongst decreasing access latency, maximizing content data availability, and optimizing network performance becomes more difficult than for delivering small and medium-sized content in today's CDNs.
Together with new techniques such as Peer-to-Peer and swarm intelligence, what requirements should be implemented in the self-organization of CDNs? This should be answered by the next generation of CDN systems.

Chapter 2 Introduction to Peer-to-Peer

2.1 A definition of P2P

There are many definitions of Peer-to-Peer technology. In this report, I use this definition from The Free Encyclopedia [166]:

Generally, a Peer-to-Peer (or P2P) computer network refers to any network that does not have fixed clients and servers, but a number of peer nodes that function as both clients and servers to the other nodes on the network. This model of network arrangement is contrasted with the client-server model. Any node is able to initiate or complete any supported transaction. Peer nodes may differ in local configuration, processing speed, network bandwidth, and storage quantity. Popular examples of P2P are file-sharing networks.

According to this definition, we can understand that equivalence is the most important property of such systems. Another important property is decentralization. Equivalence is reflected by decentralization in a P2P system or application, because P2P refers to a class of systems or applications that employ distributed resources to perform a crucial function in a decentralized manner [79]. The reason why P2P computing has become so famous is that it solves the problem of utilizing many distributed systems in a better way than client-server computing. Moreover, it brings significant convenience to its users for their many different communication demands at work and at play, for example online collaboration, music sharing, online gaming, and instant messaging. This makes Peer-to-Peer one of the hottest Internet trends!

2.2 Problem domains and well-known approaches

Differing from CDNs, P2P technology is a type of communication/computation pattern [80]. This means that P2P computational mechanisms can be applied in many sub-areas in the realm of distributed systems, such as distributed computing for finance and biotechnology, file sharing, collaboration under fault-tolerance and real-time constraints, and communication middleware. In this report, we will not describe all the problems which P2P can address in these sub-areas and disciplines. Instead, we will focus on its properties and then select the features of P2P which can be used in our PlentyCast design. In the following subsections, I use the P2P characteristics described by [79] as a base and then extend their discussion.

2.2.1 Decentralization

P2P models question the wisdom of storing and processing data only on centralized servers and accessing the content via request-response protocols. In traditional client-server models, the information is concentrated in centrally located servers and distributed through networks to client computers that act primarily as user interface devices. Such centralized systems are ideal for some applications and tasks; for example, access rights and security are traditionally more easily managed in centralized systems. However, the topology of centralized systems may result in inefficiencies, bottlenecks, and wasted resources. Furthermore, although hardware performance and cost have improved, centralized repositories are expensive to set up and hard to maintain: they require human intelligence to build, and to keep the information they contain relevant and current.
One of the strengths of decentralized systems is the emphasis on the user's ownership and control of the data and resources. In my view, when there is a long distance between the user and the content objects, managing resources and data in centralized systems has more overhead, slower response, and a greater probability of errors than letting users own the resources and data. However, in fully decentralized systems every peer is an equal participant. This makes design difficult, because there is no centralized server with a global view of all the peers in the network or the files they provide. This is why many P2P systems are built with a hybrid approach, as in Napster [92], where there is a central directory of files but peers download files from other nodes directly. In fully decentralized file systems, such as Freenet [42] and Gnutella [40], a big problem is discovering the nearest elements of the network. For instance, a new Gnutella node must learn the IP address of one peer; it can then send Ping packets to discover the rest of the peers and cache the host list of the peers it can reach. Therefore, the problem is to make a trade-off between pure P2P and hybrid P2P with respect to the focus of the system's features, such as file sharing, collaboration and computation, or platform. This categorization has a direct effect on the self-organization and scalability of a system (self-organization is discussed in section 2.2.3, and scalability in section 2.2.2) because of the loose coupling to any infrastructure.

2.2.2 Scalability

An immediate benefit of decentralization is improved scalability [81]. Scalability is limited by the amount of centralized operations (e.g. synchronization and coordination) that need to be performed, the amount of state that needs to be maintained, the inherent parallelism in the application, and the programming model used to represent the computation. There have been many attempts to attack the scalability problem. Napster attacked it by having the peers download music files directly from the peers that possess the requested document; as a result, Napster was able to scale to over 6 million users at the peak of its service. In contrast, SETI@home [90] focuses on a task that is almost completely parallel: it harnesses the computing power available over the Internet to analyze data collected from its telescopes, with the goal of searching for extraterrestrial life. SETI@home has close to 3.5 million users so far. Content Addressable Network (CAN) [82], Chord [83], Oceanstore [84], and PAST [85] dictate a consistent mapping between an object key and the hosting node; therefore, an object is always reachable as long as its host is connected. In such an overlay network, each node only maintains the addresses of a small number of other nodes. This limits the state information to be maintained by each node, and thus scalability is increased. It is claimed that they can scale to billions of users, millions of servers, and 10¹³ records. The trade-off between scalability and self-organization should be investigated in this case.
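Systems such as Chord achieve the consistent key-to-node mapping mentioned above by hashing both node identifiers and object keys onto the same circular identifier space and assigning each key to the first node whose identifier follows it. The sketch below is a simplified, non-networked illustration of that idea; the peer names and the use of a 16-bit ring are arbitrary assumptions.

import bisect
import hashlib

RING_BITS = 16                     # small identifier ring, for illustration only

def _h(value):
    """Hash a string onto the identifier ring."""
    digest = hashlib.sha1(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % (2 ** RING_BITS)

class Ring:
    """Chord-style mapping: a key is stored on its successor node."""

    def __init__(self, node_names):
        self.nodes = sorted((_h(n), n) for n in node_names)

    def lookup(self, key):
        ids = [i for i, _ in self.nodes]
        pos = bisect.bisect_right(ids, _h(key)) % len(self.nodes)
        return self.nodes[pos][1]

if __name__ == "__main__":
    ring = Ring(["peerA", "peerB", "peerC", "peerD"])
    for obj in ["movie.mpg", "song.mp3", "index.html"]:
        print(obj, "->", ring.lookup(obj))

Because only the hash function and the sorted node list determine where a key lives, any node can compute the mapping locally, which is what keeps the per-node state small in such overlays.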
2.2.3 Self-organization

Self-organization is defined as "a process where the organization (constraints and redundancy) of a system spontaneously increases, i.e., without this increase being controlled by the environment or an encompassing or otherwise external system" [86]. In P2P systems, self-organization is needed because of scalability, fault resilience (explained in section 2.2.9), the intermittent connection of resources, and the cost of ownership (explained in section 2.2.5). Since unpredictable numbers of users access the network in an ad hoc manner, adaptation should be implemented to handle the changes caused by peers connecting to and disconnecting from the system. It can also be a very important feature when a peer discovers the network. An additional concern in self-organization is how to adapt to the variance of network latency and bandwidth, not only to intermittent peer availability. In this case, self-organization should be embedded in the lower-level communication APIs of the platform of a P2P system. A number of academic systems and products address self-organization, such as OceanStore, Pastry, Tapestry, Chord, and CAN. In OceanStore [84], self-organization is applied to the location and routing infrastructure: because of intermittent peer availability, as well as variances in network latency and bandwidth, the infrastructure is continuously adapting its routing and location support. In Pastry [88], self-organization is handled through protocols for node arrivals and departures based on a fault-tolerant overlay network. Client requests are guaranteed to be routed within a bounded number of steps, and on average in fewer steps than the worst case. Also, file replicas are distributed and storage is randomized for load balancing. In [87], the authors proposed a secondary overlay, layered on top of CAN [82], Chord [83], Tapestry [89], or Pastry [88], that exploits knowledge of the underlying network characteristics. The secondary overlay builds a location layer between "supernodes", nodes that are situated near network access points, such as gateways to administrative domains. By associating local nodes with their nearby supernode, messages across the wide area can take advantage of the highly connected network infrastructure between these supernodes to shortcut across distant network domains, greatly reducing point-to-point routing distance and network bandwidth usage.

2.2.4 Anonymity

An important goal of anonymity is to allow people to use such systems without concern for legal or other ramifications. Ultimately, it would make censorship of digital content impossible. Here the content creator, the participants, the overlay operators, and the content itself are not identifiable. There are six technical approaches so far: multicast, spoofing the sender's address, identity spoofing, covert paths, intractable aliases, and non-voluntary placement. There are several actors to consider with respect to anonymity: the publisher, reader, server, and document are the main actors that can be anonymous [42]. In P2P, how anonymous the operator, the participants, and the content creators should be is one of the major problems to be resolved, because one of the business requirements is to enable the content provider to know who has accessed his or her content.

2.2.5 Cost of ownership

One of the premises of P2P computing is shared ownership.
It reduces the cost of owning the system and the content, and the cost of maintenance. This means that the management overhead of controlling resources and data is reduced compared with the client-server paradigm. From Napster [92] and SETI@home [90] to BitTorrent [91], this has always been an advantage of P2P file-sharing applications. We will make use of this characteristic to enable a large number of computers to join the P2P overlay network. The direct consequence of content sharing is that the operator can concentrate more on its services instead of on infrastructure maintenance and expansion. However, how the costs shall be shared between the peers and the servers is an important issue, because fairness is the basic principle that should apply. The degree of cost sharing reflects the decentralization and self-organization.

2.2.6 Ad hoc connectivity

The ad hoc nature of connectivity strongly affects all classes of P2P systems. Basically, it means that some nodes in the network may be available all the time, some only part of the time, and some not at all. According to measurements of the Mojo Nation P2P network, 80-84% of nodes fell into the availability group of connecting only once, for less than one hour; 16-20% remained connected for longer than one hour; and a significant fraction stayed connected less than 24 hours before being permanently disconnected [93]. In traditional networks it is considered an exception for a node to be unavailable, but in a P2P system this is very common, and understanding this is very important. Therefore, in content-sharing P2P systems and applications, users expect to be able to access content intermittently, subject to the connectivity of the content providers. In systems with higher guarantees, such as service-level agreements, the ad hoc nature is reduced by redundant service providers, but parts of a provider's servers (and hence content) may still be unavailable. How to handle the intermittent availability of certain participants is a major problem to be resolved: replicating and/or relocating the content is challenging in such a dynamic environment, and this directly affects the system design for fault resilience (explained in section 2.2.9) and self-organization. Furthermore, not everything will be connected to the Internet. Even under these circumstances, ad hoc groups of people form ad hoc networks in order to collaborate. The supporting ad hoc networking infrastructures, such as 802.11b, Bluetooth, and infrared, have only a limited radius of accessibility. Therefore, both P2P systems and applications need to be designed to tolerate sudden disconnections and ad hoc additions to groups of peers.

2.2.7 Performance

Performance is a significant concern in P2P systems. P2P systems aim to improve performance by aggregating the distributed storage capacity (e.g. Napster, Gnutella) and computing cycles (e.g. SETI@home) of devices spread across a network. Because of the decentralized nature of these models, performance is influenced by three types of resources: processing, storage, and networking. In particular, networking delays can be significant in wide area networks. Bandwidth is a major factor when a large number of messages are propagated in the network and large files are being transferred among many peers; this limits the scalability of the system.
Performance in this context does not mean millisecond-level performance; rather, it tries to answer questions such as how long it takes to retrieve a file or how much bandwidth a query will consume. In centrally coordinated systems (e.g. Napster, SETI@home, and BitTorrent), coordination between peers is controlled and mediated by a central server, although the peers may also later contact each other directly; this makes these systems vulnerable to the problems facing centralized servers. To overcome the limitations of a centralized coordinator, different hybrid P2P architectures [94] have been proposed which distribute the functionality of the coordinator over multiple indexing servers that cooperate with each other to satisfy user requests. DNS is another example of a hierarchical P2P system that improves performance by defining a tree of coordinators, with each coordinator responsible for a peer group; communication between peers in different groups is achieved through a higher-level coordinator. In BitTorrent, trackers are designed to coordinate the peers' downloading and to track their upload metrics. In decentrally coordinated systems such as Gnutella and Freenet, there is no central coordinator; communication is handled individually by each peer. Typically, they use message-forwarding mechanisms to search for information and data. The problem with such systems is that they end up sending a large number of messages over many hops from one peer to another. Each hop adds to the bandwidth consumed on the communication links and to the time required to get results for the queries. The bandwidth for a search query is proportional to the number of messages sent, which in turn is proportional to the number of peers that must process the request before finding the data [98]. There are three key approaches to optimizing performance: replication, caching, and intelligent routing.

2.2.7.1 Replication

Replication puts copies of objects/files closer to the requesting peers, thus minimizing the distance between the peers requesting and the peers providing the objects. Changes to data objects have to be propagated to all the object replicas. Oceanstore uses an update propagation scheme, based on conflict resolution, that supports a wide range of consistency semantics. The geographic distribution of the peers helps to reduce congestion at both the peers and the network. In combination with intelligent routing, replication helps to minimize delay by sending requests to closely located peers. Replication also helps to cope with the disappearance of peers: because peers tend to be personal computers (PCs) rather than dedicated servers, there is no guarantee that a peer will not be disconnected from the network arbitrarily. The key is replication of redundant data across the dynamic overlay in order to achieve maximal object data availability.

2.2.7.2 Caching

Caching reduces the path length required to fetch a file/object and therefore the number of messages exchanged between the peers. Reducing such transmissions is important because the communication latency between the peers is a serious performance bottleneck facing P2P systems. In Freenet, for example, when a file is found and propagated to the requesting node, the file is cached locally at all the nodes on the return path. More efficient caching strategies can be used to cache large amounts of data. The goal of caching is to minimize peer access latencies, to maximize query throughput, and to balance the workload in the system.
The object replicas can be used for load balancing and latency reduction. 2.2.7.3 Intelligent routing and peering To realize the potential of P2P networks, it is important to understand and exploit the social interactions between the peers. The pioneering work in studying the social connections among people is the "small-world phenomenon" initiated by Milgram [95]. The goal of his experiment was to find short chains of acquaintances linking pairs of people in the United States who did not know one another. Using booklets of postcards he discovered that Americans in the 1960s were, on average, about six acquaintances away from each other. Adamic et al. have explored the power-law distribution of P2P networks, and have introduced local search strategies that use high-degree nodes and have costs that scale sub-linearly with the size of the network [96]. Ramanathan et al. [2001] determine "good" peers based on interest, and dynamically manipulate the connections between peers to guarantee that peers with a high degree of similar interests are connected closely. Establishing a good set of peers reduces the number of messages broadcast in the network and the number of peers that process a request before a result is found [97]. A number of academic systems, such as Oceanstore and Pastry, improve performance by proactively moving the data in the network (see also Section 2.1.1.6). The advantage of these approaches is that peers decide whom to contact and when to add/drop a connection based on local information only. 2.2.7.4 Security Extensive research on this topic in P2P networks has occurred because of requirements for anonymity (see also section 2.2.4), DRM [101], AAA [99], firewalls [103], and NAT [102]. Contemporary research such as Publius [115] addresses encryption using a public key and multiple private keys in an asymmetric manner. Recent improvements reduce the cost of Byzantine agreement, guiding future research regarding asynchrony in P2P networks [99]. Anonymity of the receiving peer, the sending peer, and the content creator shall be considered in a P2P system; anonymity should be designed according to the business requirements described in chapter 2. We will not focus on DRM, because we focus on open content; open content distribution is free for everyone to use. However, we need to mention how P2P will be affected by DRM. Distributed computing P2P systems require execution of some code on peer machines. It is crucial to protect the peer machines from potentially malicious code and to protect the code from a malicious peer machine. Protecting a peer machine typically involves (1) enforcing safety properties such that the external code will not crash the host, or will only access the host data in a type-safe way, and (2) enforcing security properties to prevent sensitive data from being leaked to malicious parties. Techniques to enforce these properties include sandboxing, safe languages (e.g., Java), virtual machines, etc. 2.2.7.5 Digital Rights Management P2P file sharing makes file copying easy. It is necessary to protect authors from having their intellectual property stolen. One way to handle this problem is to add a signature to the file that makes it recognizable (the signature remains attached to the file contents) although the file contents do not seem affected.
This technique, referred to as watermarking or steganography [105], has been experimented with by the RIAA [104] to protect audio files such as MP3s by hiding the copyright information in the file in inaudible ways. 2.2.7.6 Reputation Free riders are a common problem in P2P file sharing systems: users who only download files/objects but do not share content with others. However, some users share lots of content with others. Therefore, one metric to identify whether a participant is "good" is their reputation. Accountability mechanisms should be developed to identify peers with different reputations; one approach is cross-rating. But ad hoc networks may contain unauthenticated and untrusted users. When multiple people are downloading the same file at the same time, they upload pieces of the file to each other. Not only does this redistribute the cost of downloading, but it also eliminates free riding. Thus node reputation relates both to security and to self-organization. In BitTorrent, there is a very neat strategy to resolve this free-rider problem: downloading while uploading [4]. 2.2.7.7 Accountability This is another challenge in P2P: how to provide fair anonymity. Only if the overlay operator is able to know the identity of the creator and the delivery sites of specific content can he ensure that the content creator and his or her content are traceable. This is not a technical problem, but rather a question of how anonymity is implemented and whom the operator can trust to hold such information. Firewalls and NAT are other challenges for P2P systems. Since most firewalls block inbound TCP traffic, and many applications abuse port 80, which was only designed for HTTP, communication between participants becomes difficult. If the participants are hidden behind different firewalls, communication becomes even harder. A common solution is configuring the port used while installing the P2P software, but we need to look into this further and find out which method works best. It may be rational to put more effort on the operator's side in connection with other system features; for instance, a P2P relay/reflector server can be developed to find the peers if they are behind firewalls. We can apply the same method for peers behind Network Address Translators. As we described in section 2.1.1.2, every node can be assigned a node ID based upon a hashing function. In this way, every peer can be addressed and indexed. This will efficiently overcome the problems caused by firewalls and NATs. The key issue is how to map the key of the object and the key of the node to achieve the best locality (i.e. minimal distance). 2.2.8 Transparency and Usability There are many things which should be transparent in distributed systems. These include transparency of location, access, concurrency, replication, failure, mobility, scaling, etc. [104]. For example, in a file sharing system, P2P users' experience of locating content will directly benefit from naming/addressing transparency. Besides this transparency, administrative transparency will deliver user-friendly configuration and make user client upgrades easier. Similar to interoperability, network- and device-transparent P2P systems will work on the Internet, intranets, and private networks, using high-speed or dial-up links.
They should also be device transparent, which means they should work on a variety of devices, such as handheld personal digital assistants (PDAs), desktops, cell phones, and tablets. As an application layer overlay, the system should be designed to work on different devices such as desktops, laptops, PDAs, cell phones, TV set-top boxes, and game stations. On the operator's side, automatic and transparent authentication of the user and the user agent can significantly reduce complexity from the user's perspective. Mobile users should be supported so that downloads of files/objects can proceed independently of whether the user is connected to the Internet at a given moment. On the other hand, a user can simply use a web interface rather than other protocols. Thus, naming/addressing transparency, administrative transparency, network and device transparency, transparent authentication, and mobility transparency can significantly enhance a P2P network's usability. 2.2.9 Fault-resilience Decentralization eliminates a central point of failure; this has significant advantages. However, the solution to this problem varies when the system spans multiple hosts or networks. For example, it must consider disconnections/unreachability, partitions, and node failures. The problem is how to attach to those peers still connected in the presence of such failures. How do we connect to the remaining peers when the current neighbor disappears because a link is broken? How do we resume the computation from before the failure in order to continue the collaboration when connectivity is restored? Perhaps we should look into SETI@home and the other @home-style distributed computing projects [108], as they attack this problem by partitioning the computations. Another method to attack disconnection and node failure is a relay service. In the Groove [110] P2P system, these problems are handled by using special nodes called relays, which store any updates or communications temporarily until the destination peer reappears in the network. Similarly, Magi [109] queues the messages at the source until the presence of the destination peer is detected. Yahoo [111] and ICQ [112] also adopted a similar approach for when a peer is offline or disconnected. Another problem is the non-availability of a resource. The resource might not be available because of node failure, network link failure, because a node has gone offline, etc. The popular method to resolve this problem is to replicate the resources or content. In this method, there are two approaches: passive and active replication. P2P systems such as Napster and Gnutella implement both a passive and an active uncontrolled replication mechanism based upon the file's popularity. It would be nice to provide persistent replication nodes to guarantee resource availability, but in that case we need to look into more active replication strategies and policies. As I introduced in chapter 1, this can definitely exploit synergy with a CDN's replication strategies. In such a replication mechanism, we should not send replicas to those nodes which disappear, go offline, or flap frequently. Before sending replicas to a specific replication node, a policy should be established to select good replication locations. Oceanstore has implemented such a mechanism. The inspiration for such a policy can be borrowed from BGP's route flap dampening mechanism [113]. Thus a penalty can be assigned according to the history of the destination peers that are candidates for becoming replication peers. During transmission, a heartbeat message can be sent from the resource peer. Whenever there is no heartbeat from the resource, the receiving peer will notice the failure of the resource and will then select a new replica. This method was used in [114]; a minimal sketch of such a monitor is given below.
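The following is a minimal sketch of such a heartbeat-based failure detector. It is not PlentyCast code; the peer names, interval, and timeout values are hypothetical and chosen only for illustration.

```python
import time
import threading

HEARTBEAT_INTERVAL = 5.0   # seconds between heartbeats (assumed value)
HEARTBEAT_TIMEOUT = 15.0   # silence longer than this marks a replica as failed (assumed)

class ReplicaMonitor:
    """Tracks the last heartbeat seen from each replica peer."""

    def __init__(self, replicas):
        self.last_seen = {peer: time.time() for peer in replicas}
        self.lock = threading.Lock()

    def on_heartbeat(self, peer):
        # Called whenever a heartbeat message arrives from the resource peer.
        with self.lock:
            self.last_seen[peer] = time.time()

    def failed_replicas(self):
        # Replicas that have been silent for longer than the timeout.
        now = time.time()
        with self.lock:
            return [p for p, t in self.last_seen.items() if now - t > HEARTBEAT_TIMEOUT]

def select_new_replica(candidates, failed):
    """Pick the first candidate replica not currently considered failed."""
    for peer in candidates:
        if peer not in failed:
            return peer
    return None

# Hypothetical usage: a downloading peer checks for silent replicas and re-selects.
monitor = ReplicaMonitor(["peerA", "peerB"])
monitor.on_heartbeat("peerA")
print(select_new_replica(["peerA", "peerB"], monitor.failed_replicas()))  # -> 'peerA'
```

A receiving peer would call on_heartbeat() whenever a heartbeat arrives and periodically consult failed_replicas() to decide whether to switch to a new replica, which is the behaviour described above.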
Anonymous publishing systems such as Freenet and Publius ensure availability by controlled replication. Oceanstore maintains a two-layered hierarchy of replicas and, by monitoring administrative domains, avoids sending replicas to locations with a highly correlated probability of failure. However, because a resource in a P2P system could be more than just a file (such as a proxy to the Internet, shared storage space, or shared computing power), the concepts of replicated file systems have to be extended to cover additional types of resources. Grid computing solutions (e.g. Legion [167]) provide resilience against node failures by restarting computations on different nodes. 2.2.10 Manageability A challenging aspect of P2P is that the system maintenance responsibility is completely distributed and needs to be addressed by each peer to ensure availability. This is quite different from client-server systems, where availability is a server-side responsibility. The managerial issues include the following aspects: (1) An operator of such a network must be able to upgrade and maintain the system across all peers. Adopting either a centralized management approach or a self-managing approach will definitely affect decentralization, self-organization, fault resilience, and many other properties of the system. (2) In file-sharing applications, P2P systems enable easy and fast retrieval of the data by distributing the data to caches located at the edges of the network. The location of the data is not known by the retriever, potentially even after the data is retrieved. Freenet, for example, stores the data in many locations on the path between the provider and the retriever, so the whole notion of hosting a file becomes meaningless. Files move freely among the peers and are allowed to disappear even if they are currently being downloaded. This has some important implications. For example, the question is who is accountable for the files (see Section 2.1.1.8.4). Also, how can we ensure that the entire data content is downloaded, i.e., how do we cope with the unreliability of the peers? (3) P2P has the following implications for the IT industry: accountability, control, manageability, and standards. The first three are very closely related. Accountability is emphasized in centralized systems, where access is monitored through logins, accounts, and the logging of activities. Accountability is more difficult to achieve in client-server systems, because of interactions with multiple clients. It is weakest in P2P systems, because of equal rights and distributed functionality among the peers. Similar reasoning applies to control. In centralized and client-server systems, control is exercised at one or more well-defined points, whereas it is harder to achieve in P2P systems, as control is entirely distributed. Therefore, more and more P2P systems adopt a hybrid structure for their P2P network. 2.2.11 Interoperability According to earlier distributed systems research, the following are requirements for interoperability in P2P: The systems must interoperate. The protocols used in inter-system communication, for instance HTTP, XML, SOAP, etc., must be in common. They must exchange signaling and traffic data across systems.
The systems must determine if they are compatible at both the lower and higher levels. Systems must advertise and maintain the same level of security, QoS, and reliability. A P2P system's protocols must be able to be ported into other P2P systems. Interoperability means the ability of systems, units, or forces to provide services to and accept services from other systems, units, or forces, and to use the services so exchanged to enable them to operate effectively together [116]. There have been many attempts at P2P system interoperability. One solution is to port a P2P middleware platform such as JXTA [117], BEEP [118], etc.; another is to build a gateway that translates between the protocols [119]. In the PlentyCast implementation, we can choose to develop our prototype upon the JXTA platform, as it is open source, claims to be a de facto standard, and has adaptations for different OS platforms (including clear Java APIs for developers). 2.3 Core techniques There are many core techniques, including location and routing, overlay construction, and security mechanisms. We only explain the former two in this paper. 2.3.1 Location and routing In the following subsections, I will explain the most well-known location and routing mechanisms in P2P: the centralized directory model, the flooding request model, and the Distributed Hash Table model. Then I will focus on introducing some outstanding Distributed Hash Table routing mechanisms. 2.3.1.1 Centralized directory model This model was made popular by Napster and Audiogalaxy [120]. The peers of the community connect to a central directory where they publish information about the content they offer for sharing (see Figure 8). Upon request from a peer, the central index matches the request with the best peer in its directory, where the best peer could be the one that is cheapest, fastest, or most available, depending on the user's needs. A file exchange then occurs directly between the two peers. This model requires some managed infrastructure (the directory server), which hosts information about all participants in the community. This gives the model scalability limits, because it requires bigger servers when the number of requests increases, and larger storage when the number of users increases. However, Napster's experience showed that (except for legal issues) the model was relatively strong and efficient. 2.3.1.2 Flooding requests model The flooding model is different from the central index one. This is a pure P2P model in which no advertisement of shared resources occurs. Instead, each request from a peer is flooded (broadcast) to directly connected peers, which themselves flood their neighbor peers, and so on, until the request is answered or a maximum number of flooding steps occurs (see steps 2, 7, 6, 5 in Figure 9).
Figure 8. Centralized request model
Figure 9. Flooding request model
This model, which is used by Gnutella, requires a lot of network bandwidth for discovery, and hence does not prove to be very scalable, but it is efficient in limited communities such as a company network. To solve this problem, some companies have been developing "super-peer" client software, which concentrates many of the requests. This leads to much lower network bandwidth consumption, at the expense of higher CPU consumption. Caching of recent search requests is also used to improve scalability. A TTL-limited flooding search is sketched below.
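The sketch below illustrates the flooding request model with a hop limit (TTL), as used by Gnutella-style systems. It is a simplified, synchronous simulation over a hypothetical peer graph, not an implementation of any particular protocol.

```python
from collections import deque

def flood_search(peers, neighbours, start, wanted, max_hops=4):
    """Breadth-first flooding of a query with a hop limit (TTL).

    `peers` maps peer id -> set of files it shares; `neighbours` maps
    peer id -> list of directly connected peers.  Returns the peers on
    which the file was found, or an empty list if the TTL expired first."""
    visited = {start}
    frontier = deque([(start, 0)])
    hits = []
    while frontier:
        node, hops = frontier.popleft()
        if wanted in peers[node]:
            hits.append(node)
        if hops >= max_hops:
            continue
        for nxt in neighbours[node]:
            if nxt not in visited:          # avoid re-flooding the same peer
                visited.add(nxt)
                frontier.append((nxt, hops + 1))
    return hits

# A small hypothetical community: the query reaches "p3" within 2 hops.
peers = {"p0": set(), "p1": set(), "p2": set(), "p3": {"song.mp3"}}
neighbours = {"p0": ["p1", "p2"], "p1": ["p3"], "p2": [], "p3": []}
print(flood_search(peers, neighbours, "p0", "song.mp3"))   # -> ['p3']
```

Every additional hop multiplies the number of messages, which is exactly the bandwidth problem described above; caching answers or concentrating queries at super-peers reduces this fan-out.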
2.3.1.3 Distributed Hash Table model In response to the scaling problems exposed by the centralized directory model and the flooding request model above, several research groups have (independently) proposed a new generation of scalable P2P systems that support a distributed hash table (DHT), such as Tapestry, Pastry, Chord, Content Addressable Networks (CAN), and Oceanstore. In these systems, files are associated with a key (produced, for instance, by hashing the file name and its contents, or the IP address and host name). Each peer in the network is assigned a random key, knows a given number of other peers, and is responsible for storing a certain range of keys. Each peer routes a file towards the peer whose key is most similar to the file's key; this process is repeated until the nearest peer key is the current peer's key. There is one basic operation in these DHT systems, lookup(key), which returns the identity (e.g., the IP address) of the node storing the object with that key. When a request originator issues lookup(key), the lookup is routed through the overlay network to the node responsible for that key. Then the document is transferred back to the request originator, while each peer participating in the routing keeps a local copy of the file. An example is shown in Figure 10.
Figure 10. DHT model
This operation allows nodes to put and get files based on their key, thereby supporting a hash-table-like interface. This DHT functionality has proved to be useful in large distributed systems. However, it has the problem that the file keys must be known before posting a request for a given file, hence it is more difficult to implement a search than in the flooding requests model. Also, network partitioning can lead to an island problem, where the community splits into independent sub-communities that do not have links to each other. Therefore, we must find the best trade-off amongst the geometry, distance, and algorithm of a specific DHT routing mechanism [121]. 2.3.1.4 Plaxton location and routing Plaxton et al. [122] developed what was probably the first routing algorithm that could be scalably used by DHTs. Although not intended for use in P2P systems, because it assumes a relatively static node population, it provides very efficient routing of lookups. The routing algorithm works by "correcting" a single digit at a time; for example, if node number 36278 receives a lookup query with key 36912, which matches its first two digits, the routing algorithm forwards the query to a node that matches the first three digits (e.g., node 36955). To do this, a node needs to have, as neighbors, nodes that match each prefix of its own identifier but differ in the next digit. For a system of n nodes, each node has on the order of O(log n) neighbors. As one digit is corrected each time the query is forwarded, the routing path is at most O(log n) overlay (or application-level) hops. This algorithm has the additional property that if the n² node-to-node latencies (or "distances" according to some metric) are known, the routing tables can be chosen to minimize the expected path latency and, moreover, the latency of the overlay path between two nodes is within a constant factor of the latency of the direct underlying network path.
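As an illustration of this digit-by-digit correction, the following minimal sketch picks the next hop by longest prefix match against the key (Plaxton and Tapestry actually correct digits of a suffix, but the idea is symmetric). The routing table contents are hypothetical and serve only to reproduce the example above.

```python
def next_hop(current_id, key, neighbours):
    """Plaxton-style routing sketch: forward to a neighbour that matches the
    key in more leading digits than the current node does.  Returns None when
    no better match exists (the current node is then the root for this key)."""
    def shared_prefix(a, b):
        n = 0
        while n < len(a) and a[n] == b[n]:
            n += 1
        return n

    have = shared_prefix(current_id, key)
    better = [n for n in neighbours if shared_prefix(n, key) > have]
    # pick the neighbour with the longest prefix match on the key
    return max(better, key=lambda n: shared_prefix(n, key)) if better else None

# The example from the text: node 36278 forwards a query for key 36912
# towards a node matching one more digit, e.g. 36955 (hypothetical table).
print(next_hop("36278", "36912", ["10222", "36955", "36118", "47320"]))  # -> '36955'
```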
The Plaxton location and routing system provides several desirable properties for both routing and location: Simple fault handling: Because routing only requires that nodes match a certain suffix, there is potential to route around any single link or server failure by choosing another node with a similar suffix. Scalability: It is inherently decentralized, and all routing is done using locally available data in the DHT. Without a point of centralization, the only possible bottleneck is at the node originating the request. Exploiting locality: With a reasonably distributed namespace, resolving each additional digit of a suffix reduces the number of satisfying candidates by a factor of the ID base b (the number of nodes that satisfy a suffix with one more digit specified decreases geometrically). The path taken to the root node by the publisher or server S storing the object O and the path taken by client C will likely converge quickly, because the number of nodes to route to drops geometrically with each additional hop. Therefore, queries for local objects are likely to quickly run into a router with a pointer to the object's location. Proportional route distance: Plaxton has proven that the total network distance traveled by a message during both the location and the routing phase is proportional to the underlying network distance, assuring us that routing on the Plaxton overlay incurs a reasonable overhead. There are, however, serious limitations to the original Plaxton scheme: Global knowledge: To achieve a unique mapping between document identifiers and root nodes, the Plaxton scheme requires global knowledge at the time that the Plaxton mesh is constructed. This global knowledge greatly complicates the process of adding and removing nodes from the network. Root node vulnerability: As a location mechanism, the root node for an object is a single point of failure, because it is the node that every client relies on to provide an object's location information. Whereas intermediate nodes in the location process are interchangeable, a corrupted or unreachable root node would make objects invisible to distant clients whose queries do not encounter, on their way to the root, any intermediate node holding a pointer to the object. Lack of ability to adapt: While the location mechanism exploits good locality, the Plaxton scheme lacks the ability to adapt to dynamic query patterns, such as distant hotspots. Correlated access patterns to objects are not exploited, and potential trouble spots are not corrected before they cause overload or congestion problems in a wide area. Similarly, the static nature of the Plaxton mesh means that insertions can only be handled by using global knowledge to re-compute the function that maps objects to root nodes [123]. 2.3.1.5 DHT algorithms benchmarking In Table 1 below, we can see the differences between the DHT algorithms.
Table 1. DHT algorithms benchmarking

Chord
  Parameters: N = number of peers in the network
  Hops to locate data: log N
  Routing path length: log N
  Notification on joins and leaves: (log N)²
  Reliability: replicate data on multiple consecutive peers; appropriate retries on failure
  Overlay geometry: Ring

CAN
  Parameters: d = number of dimensions; N = number of peers in the network
  Hops to locate data: d·N^(1/d)
  Routing path length: 2·d
  Notification on joins and leaves: 2·d
  Reliability: multiple peers responsible for each data item; appropriate retries on failure
  Overlay geometry: Hypercube

Tapestry
  Parameters: N = number of peers in the network; b = base of the chosen identifier
  Hops to locate data: log_b N
  Routing path length: log_b N
  Notification on joins and leaves: log N
  Reliability: replicate data across multiple peers; keep track of multiple paths to each peer
  Overlay geometry: Tree + Ring (hybrid)

Pastry
  Parameters: b = base of the chosen identifier; N = number of peers in the network
  Hops to locate data: log_b N
  Routing path length: b·log_b N + b
  Notification on joins and leaves: log N
  Reliability: replicate data across multiple peers; keep track of multiple paths to each peer
  Overlay geometry: Tree + Ring (hybrid)

Oceanstore
  Parameters: N = number of servers in the network; n = number of hashing functions; i = size of the filter (these are not needed until the false positive rate is tuned)
  Hops to locate data: log N
  Routing path length: log N
  Notification on joins and leaves: log N
  Reliability: floating replica technique; prefetching and proactive migration; Bayou-like conflict-resolution update technique
  Overlay geometry: Tree

Five main algorithms have implemented the DHT routing model: Chord, CAN, Tapestry, Pastry, and Oceanstore [124]. The goals of each algorithm are similar. The primary goals are to reduce the number of P2P hops that must be taken to locate a document of interest and to reduce the amount of routing state that must be kept at each peer. Each of the five algorithms either guarantees logarithmic bounds with respect to the size of the peer community, or argues that logarithmic bounds can be achieved with high probability. The differences between the approaches are minimal; however, each is more suitable for slightly different environments. In Chord, each peer keeps track of log N other peers (where N is the total number of peers in the community). When peer joins and leaves occur, the highly optimized version of the algorithm only needs to notify on the order of (log N)² other peers of the change. In CAN, each peer keeps track of only a small number of other peers. Only this set of peers is affected during insertion and deletion, making CAN more suitable for dynamic communities. However, the trade-off in this case lies in the fact that the smaller the routing table of a CAN peer, the longer the length of searches. Tapestry and Pastry are very similar. The primary benefit of these algorithms over the other two is that they actively try to reduce the latency of each P2P hop in addition to reducing the number of hops taken during a search. Table 1 summarizes their benchmarking according to studies [5], [123], [124]. The Chord algorithm models the identifier space as a one-dimensional ring overlay geometry [121]. Peers are assigned IDs based on a hash of the IP address of the peer. When a peer joins the network, it contacts a gateway peer and routes toward its successor. The routing table (finger table) at each peer n contains entries for other peers, where the i-th entry refers to the peer that succeeds n by at least 2^(i-1) on the identifier circle. To route to another peer, the routing table at each hop is consulted and the message is forwarded toward the desired peer (a small sketch of this lookup is given below). When the successor of the new peer is found, the new peer takes responsibility for the set of documents that have identifiers less than or equal to its identifier and establishes its routing table. It then updates the routing state of all other peers in the network that are affected by the insertion.
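The following is a small, self-contained sketch of a Chord-style finger table and a simplified greedy lookup over a 6-bit identifier space. It only illustrates the idea described above (route through fingers until the key's successor is reached); it is not the exact Chord protocol, and the ring of node identifiers is hypothetical.

```python
def finger_targets(n, m=6):
    """Identifier targets for node n's finger table: n + 2**i (mod 2**m)."""
    return [(n + 2 ** i) % (2 ** m) for i in range(m)]

def successor(ring_nodes, ident, m=6):
    """First node clockwise from `ident` on the identifier circle."""
    space = 2 ** m
    return min(ring_nodes, key=lambda n: (n - ident) % space)

def chord_lookup(ring_nodes, start, key, m=6):
    """Greedy Chord-style lookup: jump to the finger that gets closest to the
    key without passing it, until the key's successor is the current node."""
    space = 2 ** m
    node, path = start, [start]
    while successor(ring_nodes, key, m) != node:
        fingers = [successor(ring_nodes, t, m) for t in finger_targets(node, m)]
        candidates = [f for f in fingers
                      if (key - f) % space < (key - node) % space]
        node = (min(candidates, key=lambda f: (key - f) % space)
                if candidates else successor(ring_nodes, key, m))
        path.append(node)
    return path

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # hypothetical 6-bit Chord ring
print(chord_lookup(nodes, start=8, key=54))       # -> [8, 42, 51, 56]
```

On this example ring, a lookup for key 54 started at node 8 visits nodes 8, 42, and 51 before reaching node 56, the key's successor.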
To increase the robustness of the algorithm, each document can be stored at some number of successive peers; therefore, if a single peer fails, the network can be repaired and the document can be found at another peer. CAN models the identifier space as d-dimensional, and the overlay belongs to a type of hypercube structure. Each peer keeps track of its neighbors in each dimension. When a new peer joins the network, it randomly chooses a point in the identifier space and contacts the peer currently responsible for that point. The contacted peer splits the space for which it is responsible into two pieces and transfers responsibility for half of it to the new peer. The new peer also contacts all of the neighbors to update their routing entries. To increase the robustness of this algorithm, the entire identifier space can be replicated to create two or more "realities". In each reality, each peer is responsible for a different set of information. Therefore, if a document cannot be found in one reality, a peer can use the routing information for a second reality to find the desired information. Thus it achieves quite good reliability. Tapestry and Pastry are very similar and are based on the idea of a Plaxton mesh. Their overlay geometry is structured as a ring and tree topology. Identifiers are assigned based on a hash of the IP address of each peer. When a peer joins the network, it contacts a gateway peer and routes toward the peer in the network with the ID that most closely matches its own ID. Routing state for the new peer is built by copying the routing state of the peers along the path toward the new peer's location. For a given peer n, its routing table will contain i levels, where the i-th level contains references to b nodes (where b is the base of the identifier) that have identifiers matching n in the last i positions. Routing is based on a longest-suffix protocol that selects as the next hop the peer whose suffix matches the desired location in the greatest number of positions. Robustness in this protocol relies on the fact that at each hop multiple nodes, and hence multiple paths, may be traversed. OceanStore is based on attenuated Bloom filters and a Plaxton mesh. Its overlay geometry is a tree structure, used first to create a consistent distributed directory of objects and second to route object location requests. It uses a double routing mechanism: (1) attenuated Bloom filters as the primary step, which allow the queried content to be retrieved efficiently with high probability; (2) Plaxton routing whenever the first algorithm fails. The first mechanism can fail because Bloom filters can sometimes be misleading owing to false positives. In Oceanstore the misleading behavior happens when attenuated Bloom filters indicate that two or more routes can lead to the requested object. This conflict cannot be avoided entirely, but it is possible to lower the probability of misleading behavior by choosing appropriate parameters for the Bloom filter, such as the number of hashing functions or the filter size (width/bits of the object ID). To locate an object, if the local filter does not find it, the node contacts the neighbor whose filter indicates a possible match for the object key; if the neighbor does not match, the query is forwarded to the next possible filter, and so forth, until the key is matched. As the inverse of the location procedure, a replica can find its server during the replication phase of the object.
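The first, probabilistic stage of this two-step lookup rests on the Bloom filter. The sketch below shows a plain (non-attenuated) Bloom filter only to make the false-positive behavior concrete; the filter size and number of hash functions are arbitrary illustration values.

```python
import hashlib

class BloomFilter:
    """A plain Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive `hashes` independent bit positions from one SHA-1 per index.
        for i in range(self.hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        return all(self.bits[p] for p in self._positions(item))

f = BloomFilter()
f.add("object-42")
print(f.might_contain("object-42"))   # True
print(f.might_contain("object-99"))   # almost certainly False, but could be a false positive
```

An attenuated Bloom filter, as used by OceanStore, keeps one such filter per outgoing link and per distance level, so a set bit roughly means that an object with this name is probably reachable within a given number of hops along that link.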
Different from the previous systems, when used as a network storage system OceanStore uses a floating replica strategy to replicate objects and uses prefetching and proactive migration to achieve high reliability of shared data objects. To solve the problem of replica coherency across its overlay, it exploits a mechanism similar to Bayou [125]. Also, its location and routing costs have logarithmic bounds with respect to the number of servers/peers. 2.3.2 Overlay network mapping We argue that any solution that does not consider the underlying network structure will produce useless results for overlay networks. Fortunately, we have found that many people have the same thoughts as we have. The overlay network turns out to be an abstract set of hosts that play an active role in the data location process. As shown in Figure 11, the hosts in the overlay network are a subset of the total number of hosts present. Note how end-to-end paths between two hosts can be different in the two layers; in particular, a single edge in the overlay graph can include multiple edges in the corresponding underlying one.
Figure 11. Overlay concept
This lack of path correspondence between the two layers makes the routing process more difficult: a route regarded as optimal in the overlay graph might not be optimal in the underlying graph. This problem is one of the most significant in the current implementations of DHTs. On the one hand, most designs take forwarding decisions at each hop based on the neighborhood relationship in the overlay network. Depending on how the overlay network has been built, this can lead to awful results. For example, a node at KTH may have its neighbor nodes in Europe, and hence its path to a node at Stanford, California, USA may traverse distant nodes in Europe. Ideally, one would like to improve routing performance by avoiding such unnecessary high latency hops. Thus, a fundamental challenge in using large-scale overlay networks is to incorporate IP-level topological information in the construction of the overlay to improve routing performance. This problem has been attracting more and more attention from researchers [123], [126], [127], [128], [129]. Basically, they look at this problem as a proximity problem. This includes two important steps in an overlay network: proximity routing and proximity neighbor selection. 2.3.2.1 Proximity routing Proximity routing was first proposed in CAN. It involves no changes to routing table construction and maintenance, because routing tables are built without taking network proximity into account. However, each node measures the Round Trip Time (RTT) to each neighbor (routing table entry) and forwards messages to the neighbor with the maximum ratio of progress in the d-dimensional space to RTT. As the number of neighbors is small (2d on average) and neighbors are spread randomly over the network topology, the distance to the nearest neighbor is likely to be significantly larger than the distance to the nearest node in the overlay. Additionally, this approach trades off the number of hops in the path against the network distance traversed in each hop: it can even increase the number of hops. Because of these limitations the technique is less efficient than geographical layout. Proximity routing has also been used in a version of Chord.
Here, a small number of nodes are maintained in each finger table entry rather than only one, and, as I explained previously, a message is forwarded to the topologically closest node among those entries whose node ID is closer to the message's key. As all entries are chosen from a special region of the ID space, the expected topological distance to the nearest of the entries is likely to be much larger than the distance to the nearest node in the overlay. Furthermore, it appears that all these entries have to be maintained for this technique to be effective, because not all entries can be used for all keys. This increases the overhead of node joins and the size of routing tables. In summary, proximity routing offers some improvement in routing performance, but this improvement is limited by the fact that a small number of nodes sampled from special portions of the node ID space are not likely to be among the nodes that are closest in the network topology. In [126], a binning strategy was proposed to solve this proximity problem (see Figure 12). The rationale behind this scheme is that topologically close nodes are likely to have the same ordering of landmarks and hence will belong to the same bin. They use relative distances to achieve their "binning" strategy, i.e. latencies measured to a well-known set of landmark machines. A node measures its round-trip time to each of these landmarks and orders the landmarks by increasing RTT. Thus, based on its delay measurements to the different landmarks, every node has an associated ordering of landmarks. This ordering represents the "bin" the node belongs to. However, we can do better than just using the ordering to define a bin. They divide the range of possible latency values into a number of levels. For example, we might divide the range of possible latency values into 3 levels: level 0 for latencies in the range [0, 80] ms, level 1 for latencies between [80, 160] ms, and level 2 for latencies greater than 160 ms. We then augment the landmark ordering of a node with a level vector, one level number corresponding to each landmark in the ordering.
Figure 12. Binning Strategy concept
To illustrate, consider node A in Figure 12. Its distances to landmarks L1, L2 and L3 are 200 ms, 60 ms and 100 ms respectively. Hence its ordering of landmarks is L2L3L1. Using the 3 levels defined above, node A's level vector corresponding to its ordering of landmarks is "0 1 2". Thus, node A's bin is "L2L3L1:012" (a small sketch of this computation is given below). They find that the landmark scheme scales well in forming bins when about 10 machines are placed in the landmark farm. According to their extensive simulations on power-law network topologies, their binning scheme does a reasonable job of placing nearby nodes into the same bin. This can be a good approach to achieving topology-aware node selection: simply put, the nearest node to select is likely to exist in the same bin or a similar bin.
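The bin computation just described is simple enough to sketch directly. The landmark names, RTT values, and level boundaries below are taken from the example above; the code is only an illustration of the scheme in [126], not the authors' implementation.

```python
def bin_of(rtts, levels=((0, 80), (80, 160), (160, float("inf")))):
    """Compute a node's bin from its RTTs (in ms) to the landmark machines.

    The bin is the landmark ordering by increasing RTT, augmented with one
    level number per landmark, exactly as in the example in the text."""
    ordering = sorted(rtts, key=rtts.get)               # e.g. ['L2', 'L3', 'L1']

    def level(ms):
        return next(i for i, (lo, hi) in enumerate(levels) if lo <= ms < hi)

    vector = "".join(str(level(rtts[lm])) for lm in ordering)
    return "".join(ordering) + ":" + vector

# Node A from the text: 200 ms to L1, 60 ms to L2, 100 ms to L3 -> 'L2L3L1:012'
print(bin_of({"L1": 200, "L2": 60, "L3": 100}))
```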
2.3.2.2 Proximity neighbor/server selection The locality properties of Tapestry and Pastry derive from mechanisms that build routing tables which take network proximity into account. They attempt to minimize the distance, according to the proximity metric, to each of the nodes that appear in a node's routing table, subject to the constraints on node ID prefixes. Pastry relies on the following properties of its routing tables: 1. Proximity invariant: Each entry in a node's routing table refers to a node that is near, according to the proximity metric, among all live Pastry nodes with the appropriate node ID prefix (Pastry node IDs are 160 bits). As a result of the proximity invariant, a message is normally forwarded in each routing step to a nearby node, according to the proximity metric, among all nodes whose node ID shares a longer prefix with the key. Moreover, the expected distance traveled in each consecutive routing step increases exponentially, because the density of nodes decreases exponentially with the length of the prefix match. From this property, one can derive two distinct properties of Pastry with respect to network locality: distance traveled and route convergence. 2. Total distance traveled: The expected distance of the last routing step tends to dominate the total distance traveled by a message. As a result, the average total distance traveled by a message exceeds the distance between the source and destination nodes only by a small constant value. 3. Local route convergence: The paths of two Pastry messages sent from nearby nodes with identical keys tend to converge, in the proximity space, at a node near the source nodes. To see this, observe that in each consecutive routing step the messages travel exponentially larger distances towards an exponentially shrinking set of nodes. Thus, the probability of route convergence increases in each step, even if earlier (smaller) routing steps have moved the messages farther apart. This result is of significance for caching applications layered on Pastry. The routing algorithms in Pastry and Tapestry claim to allow effective proximity neighbor selection because there is freedom to choose nearby routing table entries from among a large set of nodes. CAN also proposed a limited form of proximity neighbor selection in which several nodes are assigned to the same zone in the d-dimensional space. Each node periodically gets a list of the nodes in a neighboring zone and measures the RTT to each of them. The node with the lowest RTT is chosen as the neighbor for that zone. This technique is less effective than those used in Tapestry and Pastry because each routing table entry is chosen from a small set of nodes. In the distributed binning algorithm, the server selection process occurs as follows: (1) if there exist one or more servers within the same bin as the client, then the client is redirected to a random server from its own bin; (2) if no server exists within the same bin as the client, then an existing server is selected at random from the set of servers whose bin is most similar to the client's bin, where the degree of similarity between two bins is defined as the number of positions in their landmark orderings on which they match (a small sketch of this selection follows below). Three types of selection metrics have been compared: the bin-based similarity just described, the Hotz metric [130] (inter-node distance estimated from node-to-landmark distances), and the Cartesian distance (over an n-dimensional space, where n is the number of landmarks). Although the performance of Hotz-distance-based selection is competitive with the other two schemes, the researchers in [126] conclude that we really do not need to work very hard to achieve good server selection. Hence, in designing such topology inference systems, one might argue that the simplicity, scalability, and practicality of the system should be goals as important as prediction accuracy.
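The two-step selection rule above can be sketched as follows. The bin strings reuse the format from the binning example, and the server names are hypothetical; this is an illustration of the rule, not the implementation from [126].

```python
import random

def bin_similarity(bin_a, bin_b):
    """Number of positions on which two landmark orderings agree."""
    order_a = bin_a.split(":")[0]
    order_b = bin_b.split(":")[0]
    # landmark names are two characters long here ('L1', 'L2', ...)
    pairs = zip([order_a[i:i + 2] for i in range(0, len(order_a), 2)],
                [order_b[i:i + 2] for i in range(0, len(order_b), 2)])
    return sum(1 for a, b in pairs if a == b)

def select_server(client_bin, servers):
    """`servers` maps server name -> bin string.  Prefer a server in the same
    bin; otherwise pick one whose ordering matches in the most positions."""
    same_bin = [s for s, b in servers.items() if b == client_bin]
    if same_bin:
        return random.choice(same_bin)
    return max(servers, key=lambda s: bin_similarity(client_bin, servers[s]))

servers = {"s1": "L1L2L3:001", "s2": "L2L3L1:012", "s3": "L2L1L3:011"}
print(select_server("L2L3L1:002", servers))   # -> 's2' (same landmark ordering)
```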
In practice, this server selection might be implemented by having the client include its bin information in a DNS query. DNS name servers could maintain the bin information for the servers holding their content (for example, CNN's name server might maintain the bin information for the Web servers holding CNN content). Name servers might then use the above scheme to select a server for the requesting client. 2.4 Discussion Equivalence is the essence of Peer-to-Peer technology. It enables the different attributes of a P2P system, such as decentralization, scalability, self-organization, anonymity, cost of ownership, ad hoc connectivity, performance, security, DRM, transparency and usability, fault-resilience, manageability, and interoperability. File-sharing applications have shown excellent scalability, decentralization, cost of ownership, and transparency and usability. When we use this technology in our work, we shall take advantage of these attributes in order to achieve our project goals. Chapter 3 Introduction to Swarm in content delivery A recent performance study [11] shows that swarming scales with offered load up to several orders of magnitude beyond what a basic web server can manage. Most impressively, swarming enables a web server to gracefully cope with a flash crowd, with minimal effect on client performance. 3.1 An overview of swarm in content delivery The main constraint on scalable content delivery is the web's dependence on a client-server model, which is inherently limited in its ability to scale to large numbers of clients. As the load on a web server increases, it must either begin refusing clients or else all clients will suffer from long download times. This makes it difficult for a website with limited bandwidth to serve large files or a large number of requesting clients, particularly in a flash crowd event. Many Peer-to-Peer systems have addressed this problem by using similar swarm techniques, such as Gnutella, Swarmcast [145] and Onion Networks [144], BitTorrent [146], CoopNet [8], Pseudoserving [137], the Backslash system [138], PROOFS [139], and Swarming [11]. Basically, swarm intelligence [132] is one branch of Artificial Intelligence [131]. Swarming in content delivery is a peer-to-peer content delivery mechanism that utilizes parallel download among a mesh of cooperating peers. Compared with content delivery in the client-server paradigm, swarming creates a mesh amongst the clients, and the content data is transmitted within this mesh. Thus it significantly speeds up downloads for a user, reduces content server load, and enhances content data availability. This is depicted in Figure 13.
Figure 13. Benchmark between client-server and peer-to-peer swarm in content delivery
In the following subsections, I will explain what key issues must be addressed in these swarm techniques. 3.2 Core techniques in swarm content delivery There are some commonalities amongst swarm content delivery systems such as Gnutella, BitTorrent, CoopNet, Pseudoserving, the Backslash system, PROOFS, and Swarming. First of all, they need to split a large file into small blocks/pieces. Secondly, they eventually form a full mesh amongst the peers, each of which carries a different portion of the file or the complete file. Thirdly, in the mesh, every peer locates and downloads blocks from other peers via a cooperation protocol. Some of them choose a strategy of downloading and uploading simultaneously on one peer, such as Pareto efficiency in BitTorrent and the asymmetric mechanism in Swarming. Before we go deeper into the techniques of swarming, I would like to examine the procedure of swarm content delivery.
In general, most swarm systems choose a many-to-one and one-to-many transmission model so that they can achieve a many-to-many mesh. The key is how to form such a mesh for a given piece of content. Figure 14 depicts the flow of swarm content delivery.
Figure 14. Swarming flow overview
There are six important phases in this workflow: 1. A new peer requests the content. In this phase, a new peer sends a request to the content directory in order to download specific content. Here content publishing becomes important: what information is important to the requesting peer/client in terms of the overall swarm strategy? 2. The content directory sends the new requesting peer the necessary information. This includes a list of suitable siblings for this new peer to connect to and partial content blocks, or just a list of the peers that hold blocks of that content. In this phase, it is important for the new peer to select the right peers to download the content from, in terms of metrics such as distance, bandwidth, etc. 3. The new peer requests blocks from the peers on the list. In this phase, the new peer contacts the peers given by the content directory. 4. The new peer downloads from the others concurrently. Many-to-one transmission has been established in this phase: the blocks of the content are sent to the new peer by those who downloaded them earlier. During the transmission, fault resilience places important requirements on the requesting peer; how to deal with a counterpart's disconnection or failure becomes very important for the download duration. 5. The new peer sends blocks to others. Now the one-to-many pattern has also been established, and a full mesh for this specific content is created amongst the downloading peers, which meanwhile should also upload. Due to the very high frequency of join and leave behavior in the mesh, how to deal with this dynamic situation becomes another important question to answer. 6. When the new peer completes its download, it leaves the mesh. Another new peer arrives and the process repeats from (1). Beyond such sequential join events, what happens if multiple peers join the system simultaneously? In the following sections, we will examine three typical swarm systems, Swarmcast & Onion Networks (the people working on Onion Networks are largely the same people who previously worked on the Swarmcast project), BitTorrent, and Swarming, in terms of the key issues that arise during their content delivery. Table 3 summarizes the different approaches to these key issues in the three typical P2P swarm systems. 3.2.1 Splitting large files In general, this is done by opening the large file and reading the condition parameters, for example the block size or the number of lines. The program keeps reading the large file and writes to a temporary file; when the condition is met, it closes the temporary file and names (or encrypts) it according to the naming policy. The read pointer then moves to the next character in the original large file, writing continues into a new file, and so forth. When the end of the large file is reached, the program ends. For detailed information please refer to Appendix 2; a simplified sketch is also given below. I have also tried some commercial products for splitting large files, for example Fast File Splitter [140]. They can provide more features such as encryption; see the snapshot in Appendix 1 for how it was done. There are many commercial file splitters on the market [141][142][143].
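The splitting procedure described above can be sketched as follows. This is only a minimal illustration using a fixed block size and a SHA-1-based naming policy; the block size, directory layout, and naming convention are assumptions, not the splitter of Appendix 2.

```python
import hashlib
import os

def split_file(path, block_size=256 * 1024, out_dir="blocks"):
    """Split a large file into fixed-size blocks; each block file is named
    after the block index and the SHA-1 digest of its contents."""
    os.makedirs(out_dir, exist_ok=True)
    names = []
    with open(path, "rb") as src:
        index = 0
        while True:
            block = src.read(block_size)
            if not block:
                break                      # end of the large file reached
            name = hashlib.sha1(block).hexdigest()
            with open(os.path.join(out_dir, f"{index:05d}_{name}"), "wb") as dst:
                dst.write(block)
            names.append(name)
            index += 1
    return names

def join_blocks(names, out_path, in_dir="blocks"):
    """Reassemble the original file from the blocks, in order."""
    with open(out_path, "wb") as dst:
        for index, name in enumerate(names):
            with open(os.path.join(in_dir, f"{index:05d}_{name}"), "rb") as src:
                dst.write(src.read())
```

join_blocks() shows the inverse operation a downloading peer performs once it holds all blocks of the file.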
3.2.2 Initiated publishing Just as replicas must be placed before clients access content in a CDN, a swarm delivery system also faces the challenge of placing the content correctly. The following aspects should be considered: 1. A content directory server needs to decide when to initiate (or stop) swarming for a particular file, based on current server performance and the popularity of the file, since it is not cost-effective to carry on swarming delivery when there are only a few users accessing the content, especially if these users are very far away from each other. 2. While swarming happens, the content directory server needs to decide to what portion of the clients to send just a single block and how many to serve with the entire file. When a large number of clients access the content directory, the server cost for sending replies is high. Depending on the strategy in (1), careful consideration of what to send is needed: if you send too many entire file copies to the clients it may overload your server, and if you send too few blocks to the clients the swarm might not form effectively. 3. In such a dynamic hybrid P2P architecture, it is a challenge for the server to distribute load to the clients via swarming delivery. The content directory server should trade off between balancing its own load and high availability of the content data in order to provide effective swarming delivery. The goal of swarming delivery is to take advantage of high content availability to achieve fast downloads for user requests. 3.2.3 Mesh construction To deliver the content in a many-to-one and one-to-many transmission model, the key is to form and maintain mesh connections between the clients/peers that request the same content. This is the basis for a peer to concurrently download different blocks from other peers while simultaneously uploading what it already has. While doing swarming delivery, the client must deal with long-term dynamics and must determine when to add or drop a peer. In addition, because the client is using parallel download, it must decide which blocks to download from which server peer, while coping with the fact that each peer may potentially have a different set of blocks. This raises the questions of what protocols should be used to communicate in order to maintain the mesh, and how much network status about other peers should be maintained when peers join and leave with very high frequency. 3.2.4 Peer and content identification A client needs to locate other peers that have the desired content so that it can use them as server peers. Whether a peer needs to be identified is a tricky question. If peers are expected to know the global status of the overlay network, identification is a must, for example in Chord, CAN, Pastry, and Tapestry. But if a peer does not need global status, peer identification can be ignored, as in the Swarming delivery of [11]. The advantage of the former method is not only the global status knowledge, but also a good way to penetrate NATs or firewalls, because a peer ID can bootstrap the IP address of that peer, which is usually hard to obtain behind a NAT or firewall. Each content object must have an identification for several reasons. Firstly, the blocks' identifications rely on the content identification. Secondly, it avoids name collisions between different content with the same file name. Thirdly, a unique ID for the content avoids high costs of data coherency. Fourthly, it of course makes content location easy. To maintain data integrity for the blocks or pieces of the content, a unique name for each block must also be available; a small sketch of such hash-based naming is given below.
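A minimal sketch of such identification might look as follows: the content ID is a SHA-1 digest of the whole object, and each block ID binds the block's data to the content ID and the block index. The digest choice and the naming format are assumptions for illustration only.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Identifier for a whole content object: SHA-1 over its bytes."""
    return hashlib.sha1(data).hexdigest()

def block_ids(data: bytes, block_size: int = 64 * 1024):
    """Identifiers for each fixed-size block, tied to the content ID so that
    identical blocks belonging to different content do not collide."""
    cid = content_id(data)
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    ids = [hashlib.sha1(f"{cid}:{n}:".encode() + b).hexdigest()
           for n, b in enumerate(blocks)]
    return cid, ids

def verify_block(cid: str, n: int, block: bytes, expected_id: str) -> bool:
    """Integrity check a downloading peer performs on a received block."""
    return hashlib.sha1(f"{cid}:{n}:".encode() + block).hexdigest() == expected_id
```

A downloading peer can then verify each received block against its advertised ID before passing it on to other peers in the mesh.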
3.2.5 Content/peer location Once a client has located potential peers, it needs to decide which peers, and how many of them, it should use for parallel download. These are difficult choices because the client does not know ahead of time the average bandwidth available from each server peer. In addition, the bandwidth is changing all the time. In particular, the client does not know whether the bottleneck of the connection will be local or remote. The optimal selection process involves the following metrics: distance, bandwidth, CPU load, memory usage, and the number of processes on a peer. 3.2.6 Fault resiliency The larger the network is, the more frequently join and leave events will occur [93]. Due to this high frequency of join and leave events, it is possible that, in the middle of swarming delivery, some peers leave gracefully (with notification) or ungracefully (without notification). This can prolong the download time of the entire file, for example when an uploading peer refuses to upload for its own reasons. In this case, a re-selection process must occur. Moreover, this relates to how to exploit redundancy of connections or peers in order to re-select quickly. In addition, a strategy that makes download and upload happen simultaneously on one peer is desirable, to increase the probability that each node contributes to the mesh. This should ensure that a peer uploads to others in the mesh before it has obtained all the blocks/pieces. However, whether a fairness principle shall be chosen should also be specified, because more uploading peers naturally leads to shorter download times for everyone. Moreover, how the downloading and uploading connections of a peer are adjusted can also be important for enhancing system performance. 3.2.7 End user bandwidth From an access speed perspective, there are three types of users attached to the last mile of today's Internet: dialup users, broadband users, and office users.

            Dialup users   Broadband users   Office users
  Downlink  56 Kbps        1536 Kbps         43 Mbps
  Uplink    33 Kbps        128 Kbps          43 Mbps

Table 2. Classes of Internet users

In the mesh, if every peer is downloading and uploading, those peers behind broadband modems will have a bottleneck on uploading their blocks to others, because ADSL only provides asymmetric bandwidth to its users. In Table 2, a typical broadband user has only 128 Kbps of upload bandwidth, which is only 1/12 of their download bandwidth. The problem is that a single upstream pipe cannot meet the demand of another peer. Even worse, because of the fairness attribute of TCP's rate control [168], if the upstream path is congested, the downstream performance suffers as well. In other words, if a computer is serving files on the slow side of a link, it cannot easily download simultaneously on the fast side. How to detect ADSL peers and minimize the negative effect brought by this asymmetry is one of the big challenges in swarming delivery. Intuitively, dialup users cannot upload very fast, and therefore cannot offer much download bandwidth to others; on the other hand, their download bandwidth is not so different from their uplink. In the same mesh, we can expect that the download duration will be longer if the peers are behind dialup modems. In other words, broadband or office users will have a shorter download time for a given large file.
However, what if broadband users, dialup users, and office users co-exist in the same mesh? Will the ADSL users be negatively impacted by all the other users in the mesh?

3.2.8 ISP infrastructure

As I described in sections 2.2.3 and 2.3.2, an ISP's requirements and the underlying network proximity should not be ignored in swarming delivery. In Figure 16, a traffic generator simulated a Gnutella mesh. In such a mesh network, if many logical connections exist on one physical link, congestion of that physical link will most likely occur. Moreover, if the nodes are not mapped onto the underlying network in an appropriate manner (Figure 15), the download duration for a file and the efficiency of the mesh might be negatively impacted.

Figure 15. Match and mismatch in P2P overlay mapping

In addition, the larger the mismatch between the network infrastructure and the P2P application's virtual topology, the bigger the "stress" on the infrastructure [148]. This can increase the co-channel interference in the physical media before the link gets completely congested. According to my measurements, a large movie file downloaded via BitTorrent came from about 20 hops away from my machine. It is unlikely that traffic is localized in the mesh21. Most of the connections came across the peering links with Sprint.

Initiated publishing
- Swarmcast & Onion Networks: Breaks a large file into packets/pieces with FEC coding, then the packets are distributed randomly to the requesting clients; only a portion of the original packets is distributed. An entire file must be present at one of the clients before it is Swarmcast in a mesh.
- BitTorrent: The .torrent file contains information about the file (its length, name, and hashing information) and the URL of a tracker. A downloader is started from a node which already has the complete file (the 'origin').
- Swarming: Only distributes partial blocks of the file together with a gossip22 to all the requesting clients (conservative); the root server ensures that a variety of blocks is sent to the clients.

Mesh construction
- Swarmcast & Onion Networks: A temporary mesh network for a specific file; a node leaves the mesh when file reconstruction is successful; ignorant of the global state.
- BitTorrent: Bencoding-formatted messages between the tracker and the metainfo (.torrent) file. BitTorrent peer protocol: downloaders periodically check in with the tracker to keep it informed of their progress, and upload to and download from each other via direct connections.
- Swarming: Views the mesh as a collaborative delivery system, where server peers with larger portions of the file or higher bandwidth will tend to serve greater numbers of clients; downstream peers generally get pieces from upstream peers who received the content earlier. During mesh formation, individual blocks are propagated along a tree that starts at the root server. A server-based peer list together with gossip is used in the beginning, and gossiping is used for peer discovery; no DHT.

Peer/content identification
- Swarmcast & Onion Networks: SHA-1 hash key for each piece.
- BitTorrent: A string of length 20 which the downloader uses as its id; each downloader generates its own id at random at the start of a new download (this value will almost certainly have to be changed if the same id already exists). SHA-1 hash key for each piece.
- Swarming: Each peer is identified only by its IP address; redirection is used in case large numbers of clients overwhelm the root server; in the initiation stage, the root server supplies an initial set of peers.

Content/peer location
- Swarmcast & Onion Networks: Packet selection prefers transfers from sources close together; closest distance (hops) has priority.
- BitTorrent: Piece selection with sub-piece priority; random first – when starting to download; rarest first – replicate the rarest pieces as soon as possible; endgame mode – for the last-piece problem.
- Swarming: Priority on more blocks – the server peer that carries more of the blocks the client needs is selected first; if multiple peers provide the same content, distance and bandwidth are considered; parallel downloading from multiple server peers.

Fault resiliency
- Swarmcast & Onion Networks: When a node receives a packet, it rebroadcasts it to other nodes. FEC creates additional repair packets and enables file reconstruction with a subset of the packets; utility is maximized and equal, i.e., every packet is equally important to the delivery of the content. The system requires that no explicit feedback be provided: a sender should not care what happens to a piece of data after sending it. This allows scalable transmission across broadcast or high-latency/high-error networks.
- BitTorrent: Tit-for-tat – downloading while uploading; the choking algorithm deals with uploading-peer refusal; the optimistic unchoking algorithm deals with snubbing (uploading) peers; pipelining deals with the delay between sending pieces (by keeping 5 requests active at once and sending sub-pieces on the five pipelines).
- Swarming: Drops a server peer when it runs out of blocks or disconnects; invokes the peer selection process immediately when a drop happens; selects blocks from a server peer or the content server; no limit on the number of parallel connections.

End User bandwidth
- Swarmcast & Onion Networks: Download and upload rates tend to be the same. Often, the sending side of a data transfer is the bottleneck for broadband users (i.e., ADSL). To deal with this problem, given the current infrastructure of the Web, the number of hops between the customer and the server is minimized; Swarmcast can do this through its aggressive mesh-creation policy. Swarmcast simply sends out random packets, which allows even modem users to contribute to the system, so a user on a cable modem could saturate his or her connection by downloading from a few dozen modem users in parallel. The only limit is an individual downloader's maximum bandwidth.
- BitTorrent: Sometimes, limiting your upload rate will increase your download rate. This is especially true for asymmetric connections such as cable and ADSL, where the outbound bandwidth is much smaller than the inbound bandwidth. If you see very high upload rates and low download rates, this is probably the case. The reason is the nature of TCP/IP: every packet received must be acknowledged with a small outbound packet. If the outbound link is saturated with BitTorrent data, the latency of these TCP/IP ACKs will rise, causing poor efficiency. The BT choking algorithm addresses this.
- Swarming: Results also demonstrate that broadband users do not see a significant performance increase when small numbers of office users participate in swarming; low-speed users will naturally decrease swarming performance for broadband users, but will not introduce significant problems.

ISP infrastructure
- Swarmcast & Onion Networks: No concern for ISP network bottlenecks; quick scaling model.
- BitTorrent: No concern for ISP network bottlenecks.
- Swarming: Localizes the traffic from the ISP to the LAN, thus alleviating the bottlenecks on ISP networks.

Table 3. Benchmark of swarm systems

21 Please see Appendix 3.
22 A gossip message contains a list of peers that are willing to serve portions of the same file. For each server peer, the message lists the peer's IP address, a list of blocks the peer is known to have, and a time stamp indicating the freshness of this information.
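To make the "rarest first" entry in Table 3 concrete, the following is a minimal illustrative sketch (added for clarity; the class and method names are hypothetical, and this is not BitTorrent's or PlentyCast's actual code). It counts how many known peers advertise each piece and selects a missing piece held by the fewest peers, so that rare pieces are replicated as early as possible.

import java.util.*;

// Illustrative rarest-first selection: given the pieces already downloaded and
// the piece sets advertised by known peers, choose the missing piece that the
// fewest peers hold.
class RarestFirstSelector {
    static OptionalInt nextPiece(int totalPieces, Set<Integer> havePieces,
                                 List<Set<Integer>> peerPieceSets) {
        int bestPiece = -1;
        int bestCount = Integer.MAX_VALUE;
        for (int piece = 0; piece < totalPieces; piece++) {
            if (havePieces.contains(piece)) continue;   // already have this piece
            int count = 0;
            for (Set<Integer> peer : peerPieceSets)
                if (peer.contains(piece)) count++;
            if (count > 0 && count < bestCount) {       // piece must be available somewhere
                bestCount = count;
                bestPiece = piece;
            }
        }
        return bestPiece < 0 ? OptionalInt.empty() : OptionalInt.of(bestPiece);
    }
}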
3.3 An introduction to Forward Error Correction codes

Error correcting codes, or Forward Error Correction (FEC) techniques, are a classical mechanism to protect information against errors or losses during transmission. The principle of FEC is to add redundancy to the transmitted information, so that receivers can still recover the whole transmitted data even after experiencing errors during transmission. The main method of FEC used in the network domain is the use of block codes. These codes consider a group of k packets and compute n-k redundancy packets (n>k). The fundamental property of FEC permits the receiver to reconstruct the k source packets as soon as it correctly receives any k packets amongst the n disseminated ones. (This property holds only for maximum distance separable (MDS) codes, e.g., Reed-Solomon codes [149].) This property enables very high reliability for data transmission, especially in a highly dynamic network. FEC is used in several kinds of applications: wireless transmission (e.g., satellite, cellular phones), data storage (e.g., CD-ROM, DVD), and computer memory (e.g., RDRAM). In a networking context, FEC is classically used at the lower layers (physical and link layer) in detection/correction mode. In higher layers, similar software techniques such as CRCs or checksums are used to detect corrupted packets. In recent years, with the improvement in the performance of personal computers, it has become possible to implement FEC encoding and decoding in software at user level (i.e., at the transport or application layer). For example, most reliable multicast transport protocols use FEC [150]. In the data storage area, FEC is used to protect data against the failure of storage devices. Small scratches on a CD-ROM are corrected by FEC applied directly to the encoded information. Failures of hard disks can be protected against by FEC-based systems such as RAID technology [151], [152].

As we know from the previous sections in this chapter, large content can be split into small fixed-length blocks and transmitted in a mesh. There are two very important attributes of such a mesh. One is that the join and leave behavior of each node is stochastic and occurs at a very high frequency, as we show in section 5.1.3. The other is that the total volume of data being transmitted in the mesh is relatively large, sometimes by several orders of magnitude. Thus the reliability of data transmission becomes a big challenge in a mesh created by swarming delivery. There are three fundamental ways of achieving high reliability in swarming delivery.

The first is to use Automatic ReQuest for retransmission (ARQ). With ARQ, receivers use a back channel to the sender to send requests for retransmission of lost packets. ARQ works well for one-to-one reliable protocols, as evidenced by the pervasive success of TCP/IP. ARQ has also been an effective reliability tool for one-to-many reliable protocols, and in particular for some reliable IP multicast protocols. However, for one-to-very-many reliable protocols, ARQ has limitations, including the feedback implosion problem: because many receivers are transmitting back to the sender, a back channel is needed to carry all these requests from the receivers. Another limitation is that receivers may experience different packet loss patterns, and thus receivers may be delayed by the retransmission of packets that other receivers have lost but that they themselves have already received.
This may also cause wasteful use of bandwidth, retransmitting packets that have already been received by many of the receivers.

The second is the Data Carousel approach [153]. With Data Carousel, the sender partitions the object into equal-length pieces of data, which we hereafter call source symbols, places them into packets, and then continually cycles through and sends these packets. Receivers continually receive packets until they have received a copy of each packet. Data Carousel has the advantage that it requires no back channel, because no data flows from the receivers to the sender. However, Data Carousel also has limitations. For example, if a receiver loses a packet in one round of transmission, it must wait an entire round before it has a chance to receive that packet again. This may also cause wasteful use of bandwidth, as the sender continually cycles through and transmits the packets until no receiver is missing a packet [150].

The third is the FEC-encoded approach. The rationale of FEC in swarming delivery is that a file becomes a set of n+m blocks. Depending on the FEC encoding technique, the first n blocks may or may not be the n source blocks. In the first case, they simply result from splitting the original file, and the next m blocks carry the redundancy introduced by the FEC encoding. In the second case, the original information contained in the n blocks is diffused over all n+m blocks (viz. the source information can be in any of the n+m packets). When publishing, the various blocks are simply distributed over the mesh, for instance as is done in the swarming initiation stage in [11]. This distribution is ensured using a native service of the P2P architecture. The dissemination algorithm can be based either on the natural dissemination caused by the users' downloads between peers, or on a more specific dissemination algorithm of the P2P system. Eventually, the distribution algorithm is similar to the approaches in section 1.2.4.2, because it is the same problem of placing the n+m blocks amongst K peers. In the latter case, we shall consider distance and bandwidth metrics for the dissemination, because this more accurately reflects the cost for a peer when downloading. In addition, the study in [11] finds that when the various blocks are disseminated over the network, downloading a complete file is equivalent to downloading any n distinct blocks among the n+m available ones. As there are more choices between the different blocks to get, the availability and the robustness of the system are increased. When these blocks are disseminated over the P2P network, the searching service helps to determine the closest ones by considering a certain cost function (e.g., the greatest bandwidth). Then, the n closest blocks are downloaded in the classical way. The original data file can finally be reconstructed by decoding these n blocks. Moreover, they report the following figures:

- FEC block dissemination: for a 10-block file and an FEC encoding rate of 1/2 (i.e., 50% redundancy), each block is duplicated 5 times, giving 100 blocks in total. File downloading has an average cost per peer of 13.99.
- Block dissemination without FEC: for a 10-block file without FEC encoding, each block is duplicated 10 times, giving a total of 100 blocks. File downloading has an average cost per peer of 17.03.
- Entire-file replication: for a 10-block file without FEC encoding, each block is duplicated 10 times, again giving a total of 100 blocks.
In this case, file downloading has an average cost per peer of 16.85 [11]. Note that the cost function is based purely on the number of hops between nodes. Intuitively, we can see that the approach of block distribution with FEC encoding saves cost in a P2P overlay. Based on this, we shall choose FEC codes for the blocks. A good cost model should consider metrics such as user download speed, bandwidth, server load, and network load. In addition, different download strategies, such as fairness in downloading (downloading and uploading at once) and concurrent downloading, should be taken into account.

Chapter 4 Introduction to Mobile Agents

4.1 A Definition

There are many definitions from academia and industry regarding Agent technology, because it is a relatively mature technology. I think the following definition broadly defines what an agent is: An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future. It includes the following properties:

Property (other names): Meaning
- reactive (sensing and acting): responds in a timely fashion to changes in the environment
- autonomous: exercises control over its own actions
- goal-oriented (pro-active, purposeful): does not simply act in response to the environment
- temporally continuous: is a continuously running process
- communicative (socially able): communicates with other agents, perhaps including people
- learning (adaptive): changes its behavior based on its previous experience
- mobile: able to transport itself from one machine to another
- flexible: actions are not scripted
- character: believable "personality" and emotional state

It has the following taxonomy:

Figure 16. Agent Taxonomy [133]

Intuitively, the mobile agent belongs to the class of Task-specific Agents, and its defining property is being able to transport itself from one machine to another. Therefore, we can accept the following definition of a Mobile Agent:

"A mobile agent is a program that can migrate from host to host in a network of heterogeneous computer systems and fulfill a task specified by its owner. It works autonomously and communicates with other agents and host systems. During the self-initiated migration, the agent carries all its code and the complete execution state with it. Mobile agent systems build the environment in which mobile agents can exist. Migration of agents is based on an infrastructure that has to provide the necessary services in the network. The infrastructure is a set of agent servers that run on platforms (nodes) within a possibly heterogeneous network. Each agent server hides the vendor-specific aspects of its host platform and offers standardized services to an agent that is docking on to such a server concluding migration. Services include access to local resources and applications, e.g. web servers, the local exchange of information between agents via message passing, basic security services, creation of new agents, etc." [134]

4.2 What problems can mobile agents solve?

Agents are a tool for analyzing systems, not an absolute characterization that divides the world into agents and non-agents [135]. Agent technology has been integrated into many areas of computer science, such as: objects and distributed object architectures, adaptive learning systems, artificial intelligence, expert systems, genetic algorithms, distributed processing, distributed algorithms, collaborative online social environments, security, etc.
Since mobility is one of a mobile agent's attributes, it can be used to solve many problems in the domains where agents can be integrated. Agent technology solves, or promises to solve, several problems in different domains. Mobile agents address the nagging client/server network bandwidth problem. Network bandwidth in a distributed application is a valuable resource. A transaction or query between a client and the server may require many round trips over the wire to complete. Each trip creates network traffic and consumes bandwidth. In a system with many clients and/or many transactions, the total bandwidth requirements may exceed the available bandwidth, resulting in poor performance for the application as a whole. By creating an agent to handle the query or transaction, and sending the agent from the client to the server, network bandwidth consumption is reduced: instead of intermediate results and information passing over the link, only the agent needs to be sent.

Here is a related situation. A CDN is one of the typical examples, as we described in chapter 1. In the design of a traditional client/server architecture, the architect divides the roles of the client and server pieces very precisely – from bottom to top – at design time. The architect makes decisions about where a particular piece of functionality will reside based on network bandwidth constraints (remember the previous problem), network traffic, transaction volume, number of clients and servers, and many other factors. If these estimates are wrong, or the architect makes bad decisions, the performance of the application will suffer. Unfortunately, once the system has been built and the performance measured, it is often difficult or impossible to change the design and fix the problems. Architectures based on mobile agents are potentially much less affected by this problem. Fewer decisions must be made at design time, and the system is much more easily modified after it is built. Mobile agent architectures that support adaptive network load balancing could do much of the redesign automatically.

Agent architectures also address the problems created by intermittent or unreliable network connections. In most network applications today, the network connection must be alive and healthy the entire time a transaction or query is taking place. If the connection goes down, the client often must start the transaction or query from the beginning, if it can restart it at all. Agent technology allows a client to dispatch an agent handling a transaction or query into the network while the network connection is alive. The client can then go offline. The agent will handle the transaction or query on its own, and present the result back to the client when it re-establishes the connection. Agent technology also attempts to solve (via adaptation, learning, and automation) the problem of getting a computer to do real thinking for us. It is a difficult problem; the artificial intelligence community has been battling these issues for two decades or more.

4.3 Core techniques in mobile agents

The key technique is how to migrate the program from its original host to its destination. Before we work out how to migrate the agent, let us understand which network layers are involved in the migration procedure, as shown in Figure 17.

Figure 17. Network layers involved in migration

We assume that the physical path between the client and the server has been established in advance. What remains is to set up a logical connection on the TCP or IP layer.
Since the migration of the program happens at the application layer, there must be a communication channel between the client process and the server process, viz. a logical path (a network connection) must be established between them on the TCP or IP layer according to RFC 793, RFC 791, RFC 919, RFC 922, and RFC 950. Fortunately, you don't have to do the work yourself. Sockets are an innovation of Berkeley Unix that allow the programmer to treat a network connection as just another stream into which bytes can be written and from which bytes can be read. Historically, sockets are an extension of one of Unix's most important ideas: that all I/O should look like file I/O to the programmer, whether you're working with a keyboard, a graphics display, a regular file, or a network connection. Sockets screen the programmer from low-level details of the network, such as media types, packet sizes, packet retransmission, network addresses, and more. This abstraction has proved to be immensely useful and has long since traveled from its origins in Berkeley Unix to all breeds of Unix, plus Windows and the Macintosh. Thereby, to establish a network connection between the client and the server, you only need to perform the following steps:

1. Connect to a remote machine
2. Send data
3. Receive data
4. Close a connection
5. Bind to a port
6. Listen for incoming data
7. Accept connections from remote machines on the bound port

There are many programming-language-specific ways to realize the above operations. For instance, in Java we have the following steps to establish a network connection:

1. The program creates a new socket with a Socket( ) constructor.
2. The socket attempts to connect to the remote host.
3. Once the connection is established, the local and remote hosts get input and output streams from the socket and use those streams to send data to each other. This connection is full-duplex; both hosts can send and receive data simultaneously. What the data means depends on the protocol; different commands are sent to an FTP server than to an HTTP server. There will normally be some agreed-upon hand-shaking followed by the transmission of data from one to the other.
4. When the transmission of data is complete, one or both sides close the connection. Some protocols, such as HTTP 1.0, require the connection to be closed after each request is serviced. Others, such as FTP, allow multiple requests to be processed in a single connection.

Figure 18. Migration implemented in Java

Given a network connection between the client process and the server process, the next step is to wrap the program and send it via this channel. There are also programming-language-specific ways to do so; we choose Java as an example again. In Java, statically, an agent is a class defined as "extends Thread implements Serializable". This means that the class is a child class of the Thread class (a default Java library class) and that it implements a default library interface called "Serializable". In the runtime environment, it can be a thread activated by a client or server process. When you instantiate such a class, you get a serializable object; then, using the four steps above, you can send this object to the client or the server, which unmarshals the object, obtains its member methods, and outputs the content that this agent carries, as depicted in Figure 19.
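The following is a minimal sketch of the pattern just described (the names DemoAgent, dispatch, and receive are hypothetical and chosen only for illustration; this is not PlentyCast code). Note that plain Java serialization, as used here, transfers the agent's data fields rather than its running execution state, and it assumes that the receiving JVM already has the agent's class definition available.

import java.io.*;
import java.net.*;

// A minimal agent following the pattern above: it extends Thread and implements
// Serializable so that its state can be marshalled and shipped over a socket.
class DemoAgent extends Thread implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String task;                        // state carried with the agent
    DemoAgent(String task) { this.task = task; }
    @Override public void run() {
        System.out.println("Agent executing task: " + task);
    }
}

public class AgentMigrationSketch {
    // Sender side: marshal the agent and push it through the socket's output stream.
    static void dispatch(String host, int port, DemoAgent agent) throws IOException {
        try (Socket socket = new Socket(host, port);
             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            out.writeObject(agent);                   // send the agent object itself
        }
    }

    // Receiver side: accept one connection, unmarshal the agent, and activate it as a thread.
    static void receive(int port) throws IOException, ClassNotFoundException {
        try (ServerSocket server = new ServerSocket(port);
             Socket socket = server.accept();
             ObjectInputStream in = new ObjectInputStream(socket.getInputStream())) {
            DemoAgent agent = (DemoAgent) in.readObject();
            agent.start();                            // the agent resumes as a local thread
        }
    }
}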
4.4 Overview of the remaining chapters

As you can see from the previous chapters, there are many problems in CDNs, P2P, and Agents, and I have listed many solutions to these problems. However, which problems are we interested in? Why do we want to solve them? How do we solve them? In the following chapter 5, I will explain our goals and use this list to bootstrap solutions to all the problems we are interested in solving. In chapter 6, we will propose a novel CDN architecture to solve the problems described in chapter 5. At the end of this paper, in chapter 7, we will summarize our contribution and highlight future work for further development, because we believe that PlentyCast has very good potential as a commercial application in the future.

Chapter 5 Problem statement

In this chapter, I begin by decomposing the five goals we set up for PlentyCast in this project. Secondly, I will describe the relationships amongst them and identify the high-level problems that emerge. Third, I will describe the methods I used in this project to attack these problems and the criteria by which I select from different existing approaches to create an integrated solution.

5.1 PlentyCast design goals

In conventional CDNs, there are many problems needing further research in the following areas: cache placement/replacement, cache coherency, caching contents, user access pattern prediction, load balancing, proxy placement, dynamic data caching, etc. [1]. Nowadays, new forms of content delivery such as video-on-demand, streaming content for real-time events, scalability, built-in security mechanisms, etc., are major challenges for next-generation CDNs [2]. Since our aim is to deliver large static content in PlentyCast, our goals have been narrowed down so that we do not address all the outstanding CDN problems listed above. In this project, we have selected the following goals in our PlentyCast system design, in order to achieve high system reliability compared with other CDN designs. The five goals are: improved access latency, improved network scalability, improved content availability, lower bandwidth consumption, and improved infrastructure performance compared to conventional CDNs. In the following sections, I will explain each of these goals in more detail.

5.1.1 Improved access latency

A web user can access web content via a web application such as a web browser. It is not difficult to identify the two properties which are most important for the user when accessing content: the first is the time delay before the content is visualized, and the second is the quality of the content obtained. If we look at the access time, it is the access latency due to the protocol stacks and the network connection between the web server and the user agent. The quality is determined both by the underlying network quality of service (such as specified for QoS in RFC 2211 and RFC 2212) and by the web applications on both ends (i.e., the coding and decoding needed for specific types of content, such as streaming audio or video). Since we focus on static content in this paper, we assume the server has nearly infinite time in advance to encode the content. In other words, we assume the coding and decoding are well designed for all the content that PlentyCast will deliver; thus the issues of content quality are excluded from this paper. In this section, we only look at latency problems. Latency is part of the CDN usability (as depicted in Figure 19) because it represents one of the main user requirements in a CDN system.
As a CDN delivers content from one end to another over the Internet, quality and latency are both important metrics for evaluating the CDN's distribution/delivery quality of service. Latency for static content is determined by the following 5 aspects: (1) the number of requesting clients, (2) the network distance between the client and the content server, (3) the size of the content, (4) the link capacity between the client and the content server, and (5) the content server performance. Intuitively, when more clients access the web server, more bandwidth will be consumed along the link, but the link capacity is constant for a certain period of time; this could potentially lead to congestion along the link between the clients and the server. For a more detailed description please refer to section 2.1.4.

Figure 19. CDN usability decomposition

When every routing protocol is working well, i.e. routing in the LAN and in the external network is running normally, IP layer packets will traverse optimal (shortest path) routes. The number of network hops between the client node and the server node could be one of the metrics used to measure latency. However, today's Border Gateway Protocol (an external routing protocol which is popular in today's Internet) appears to be quite unstable due to many pathological routes residing in BGP routers [20]. Furthermore, this can hide the real number of hops between end systems. Some experiments also tell us that the number of AS-hops is an unreliable metric for indicating network latency [22]. Thus, rather than using network hops to measure the latency between the content server and its accessing clients, we choose RTT as our latency metric. If the system is well tuned, the file size then directly determines the data transfer latency. As we know from section 1.1.1, there are four Internet bottlenecks. Physical link capacity is another aspect of the same problem and impacts the end-to-end transmission latency. Certainly, when the client access rates do not exceed the link capacity, additional latency (i.e. beyond transmission latency) relates only to other factors. When the bandwidth demand from clients exceeds the link capacity, and/or the link capacity is reduced due to traffic shaping by ISPs, congestion will result in "Page not found" errors or even worse behavior for some of the clients.

The content server used to be a very serious bottleneck before CDNs were established. This is because content distribution relies directly on the server's CPU speed and memory capacity. It was quite easy to overwhelm a content server if more than a few hundred clients accessed it concurrently. After CDNs were introduced, the CDN spread the server load over its server farm (containing cache proxies and replica servers). This greatly alleviates the bottleneck due to the CPU and memory limitations of a content server. However, dynamic web content is increasing exponentially, and today's CDNs have a problem distributing dynamic content. The most advanced technique is to let interaction happen between the content server and its accessing client, as is done in Edge Side Includes [9]. Thus, if there are a large number of users who request dynamic content, the problem of server performance reappears. Despite increasing hardware capacity in CPU and memory, there are some additional ways to solve this problem, such as using agent technologies for real-time scheduling (however, we do not address this kind of problem in our project).
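As a rough illustration of how these factors combine (a simplified first-order model added here for clarity, not a formula taken from the studies cited above), the time to fetch a static object of size S over a path with round-trip time RTT and bottleneck bandwidth B, from a server that adds a processing delay T_server, can be approximated as

T_{access} \approx c \cdot RTT + \frac{S}{B} + T_{server}

where c accounts for connection set-up and protocol handshakes. Congestion caused by a large number of concurrent clients appears in this model as a reduced effective B and an inflated RTT.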
To sum up the access latency problems in CDNs, the following table lists the relationship between each of these factors and latency.

Factor (measurement unit): value that increases latency / value that decreases latency
- Distance, RTT23 (milliseconds): longer / shorter
- Distance, hops (number): more / fewer
- Number of clients (number): larger / smaller
- Content data size (bytes): larger / smaller
- Link capacity, bandwidth (bit/s): lower / higher
- Content server performance, CPU speed (MHz): lower / higher
- Content server performance, memory size (bytes): smaller / larger

Table 4. Correlations between latency and its factors

23 RTT: Round Trip Time; in the traceroute command, it is the time for a packet to traverse back and forth between destination and source.

Intuitively, every factor can change the latency independently. For instance, if a larger number of clients access the content server, this could lead to longer latency in delivering the content to the user. The other factors have the same property. If all factors combine unfavorably (the distance metrics are longer, the content is larger, the link capacity is smaller, the content server performance metrics are relatively smaller, and the number of clients is larger), then the latency will become very long. However, this only reflects one aspect of the problem – how latency can occur. What we want to find are the best trade-offs: how we can achieve improved access latency by altering some of these metrics is our focus.

Ultimately, latency is one of the most important metrics for web content access. Decreasing the latency between the content server and a user agent has become one of the important missions of a CDN. As I described in chapter 1, CDNs replicate the content close to (i.e., at a short network distance from) the user who requests specific content. This decreases the latency significantly by shortening the distance between clients and content servers. For instance, Akamai placed 13,000 servers across 1,000 networks in 63 countries [3]. In PlentyCast, we adopt the same approach to resolve the latency problem, but we expect to reduce latency more than conventional approaches have.

Most conventional CDNs decrease latency with two techniques: one is to intelligently place the content, via replication, near the requesting user (shortening the distance between content and user); the other is to intelligently locate and route the content data to the requesting user. These are certainly correct solutions for reducing latency. However, we introduce a third technique to improve access latency in PlentyCast – swarm delivery. Basically, the mechanism is to partition large content into small fragments; these small pieces of content can be distributed near the requesting user. When a user's access is authorized, all these smaller fragments are transmitted from different locations to the destination (forming a multi-point-to-single-point traffic pattern). At the destination, the user agent receives all the fragments concurrently and assembles them to restore a complete copy of the content. This appears to be very effective in reducing the latency for a client accessing large content, as demonstrated by some new Peer-to-Peer software such as BitTorrent [4]. Thus we argue that the existing CDN techniques, with the addition of the swarm delivery technique, can deal with large content. This enables PlentyCast to reduce access latency better than the approaches used in current CDN networks.
5.1.2 Improve network scalability

Scalability means the ability of a network system to increase its numbers of servers and clients smoothly. In general, the more clients a CDN can serve, and the better its self-organization when clients leave and join at a high rate, the better scalability the CDN has. Nowadays, most CDN systems are designed following the client-server paradigm. As we know, this centralized system pattern exposes synchronization and coordination problems when the network becomes very large or the topology changes highly dynamically. As a new computation paradigm, Peer-to-Peer demonstrates excellent scalability and has great potential for content distribution in very large scale networks. An immediate benefit of decentralization is improved scalability [5]. File sharing applications are good examples, such as Freenet [42], Gnutella [40], Kazza [41], and BitTorrent. There have been many attempts to discover how to use P2P in a CDN [6], [7], [8]. We can use P2P technology in a CDN to take advantage of clients' CPU, storage, and memory for content distribution in certain circumstances. Intuitively, if we include clients as part of the CDN, content data availability can be significantly increased because the storage space is extensively expanded. PlentyCast can scale to large numbers of clients accessing the content, compared to traditional CDNs, particularly when extreme events happen in the content delivery network, such as "Flash Crowds" and Denial of Service attacks on a content server (see section 5.1.3). In PlentyCast, content location and routing, and self-organization, must be considered when we want to achieve this goal. Moreover, content distribution, selection, and content lookup algorithms are key issues. This is what we want to focus on in the remainder of the report, in order to achieve better scalability than conventional CDNs.

5.1.3 Improve content availability

Content availability means that content data is available for any requesting user at any point in time over the network. In general, it is achieved mainly through redundancy, i.e., increasing the number of places where the content data is stored, and hence the number of nodes where it can be reached. Intuitively, replicating content data in a replica server farm is one of the major techniques to achieve high content data availability in a CDN. Nowadays, one of the key technologies of conventional CDNs is Edge Side Includes [9]. This technology enables us to cache or replicate content as close to the user as possible. Today's Internet has been partitioned into thousands of Autonomous Systems, all connected via the Border Gateway Protocol over physical links. Users sit at the leaf nodes within each AS. Within an AS, if we place the content cache or replicate the content in a CDN server at the edge of this AS, it will significantly shorten the distance metrics between the content and any content user (or potential user) within this AS. When a user attempts to access the original content server, he or she does not access the content from the original content server, but instead receives the content from the replica server or cache proxy at the edge of the same AS in which the user node is located. In this way, on one hand, the latency between the content data and the user has been significantly decreased. On the other hand, data replication/redundancy increases content availability. However, how much is data availability improved by such technology?
9/1124 gave us a comprehensive lesson about "Flash Crowds"! After the World Trade Center collapsed on September 11, 2001, thousands of Internet users accessed the MSNBC web servers (MSNBC was a customer of one of the biggest CDN operators) within a very short time – this is a so-called Flash Crowd. This behavior made the web site unreachable because the servers were overwhelmed. When this is done maliciously, the consequence is recognized as a Denial of Service attack. The following was documented by Microsoft's CoopNet project during 9/11:

We use traces collected at MSNBC during the flash crowd of Sep 11, 2001 for our evaluation. The flash crowd started at around 1:00 pm GMT (9:00 am EDT) and persisted for the rest of the day. The peak request rate was three orders of magnitudes more than the average. We report simulation results for the beginning of the flash crowd, between 1:00 pm to 3:00 pm GMT. There were over 300,000 requests during the 2-hour period. However, only 6% or 18,000 requests were successfully served at an average rate of 20 Mbps with a mean session duration of 20 minutes. Unsuccessful requests were not used in the analysis because of the lack of content byte range and session duration information. [10]

We think this problem is not simply due to CDN scalability. Ultimately, this is a problem of content data availability. The lessons to be learned from this event are not only how to deal with Denial of Service attacks, but more importantly, how to make data highly available for certain large instances of interesting content while protecting the content server from being overwhelmed by a Flash Crowd. Recently, the NASA25 spacecraft Spirit landed on Mars and almost swamped NASA's web site with 109 million hits in 24 hours [63]. "To handle the Mars traffic, NASA is paying $1.5 million to eTouch Systems Corp., Speedera Networks Inc., and Sprint Corp. beyond the $3.5 million it pays them each year to handle NASA's more popular Web sites, said Jeanne Holm, NASA's Web portal manager." [19]. This reflects the fact that conventional CDNs can only deal with a flash crowd by interconnecting CDNs (solutions such as CDI) and/or increasing the number of replica servers and/or cache proxies, at high expense. To the best of our knowledge, there are few other solutions that enhance content availability for content providers.

24 http://www.cnn.com/2001/US/09/11/chronology.attack/ accessed on 2004-01-18 16:40
25 http://www.nasa.gov/externalflash/sts107/index.html accessed on 2004-02-07 16:14

Despite the technical difficulties, there is another important fact we should not ignore – the lack of good content. Why do very large numbers of users go to NASA, CNN, MSNBC, The Wall Street Journal online, or NASDAQ for reading or watching news? This lack of quality content on the Internet causes large numbers of users to access a relatively small number of content servers. This is likely to last for a relatively long period in the future, especially for large-size content. Hence, traffic hot spots are easily formed. Enhancement of data availability is the key to having this subset of content distributed to relatively large numbers of Internet users. Differing from Edge Side Includes [9] technology and CDI [67], we argue that P2P content caching should not be restricted to CDN server farms, but should use the many clients' storage too. This can create a much larger storage space, enabling higher data availability.
By introducing swarm delivery in PlentyCast, data can be more widely distributed and more quickly located and transmitted to the destination. Furthermore, a downloading-while-uploading strategy shall be adopted in swarm delivery. Originally, this strategy was adopted to address the problem of free-riding [4] in Peer-to-Peer file-sharing applications. Additionally, we use it for a third purpose: gracefully absorbing Flash Crowds and counteracting Denial of Service attacks. As downloading is fed by the other clients' uploading, the more clients that request the same content, the more content data will be distributed, thus relieving the original content server and the CDN replica servers. A disadvantage of swarm delivery is that it may occupy too many ports and steal too much bandwidth from other (normal) client applications [11]. Ultimately, realizing swarm delivery and the download-while-uploading strategy relies on the careful design of the algorithms in the content distribution, location, and routing systems of PlentyCast.

5.1.4 Lower bandwidth consumption

As we described in paragraph 1.1.1, there are four bottlenecks in today's Internet. Based upon our understanding of the existing approaches to these four classes of Internet bottlenecks26, we will describe our solutions for alleviating these bottlenecks in our network. Many people think Peer-to-Peer file sharing applications save a lot of bandwidth due to their peer-equivalence property, i.e., peer A can retransmit the content to peer B, thus saving bandwidth between peer B and the content server. My viewpoint is that this only saves bandwidth on the first mile (as a CDN does), but indirectly puts increased load on peering and backbone links unless the peers are carefully chosen. Given the pressure from broadband DSL subscribers, I do not think peering and backbone/infrastructure traffic engineering is an easy mission. Additionally, P2P users usually consume more bandwidth than normal users. Imagine how much traffic 230 million Kazza users will generate on the Internet. According to an industry estimate: "P2P now accounts for 50 to 70 percent of all Internet traffic. KaZaA alone has more than 230 million client downloads, with about 900 million files available for sharing, representing about 6,500 terabytes" [14].

26 Please refer to 1.1.1

Ultimately this expansion in bandwidth is inevitable, because layer seven services are becoming more and more diversified and more and more bandwidth-intensive applications have been unleashed. Reducing bandwidth consumption is difficult in a conventional CDN, but it will be possible using PlentyCast. We use an intelligent CDN distribution strategy to alleviate the first-mile bottleneck, P2P to solve the last-mile bottleneck, and self-organization to solve the bottlenecks at the peering points and in the infrastructure/backbone. Therefore, we can reduce the bandwidth consumption at each of the bottlenecks in our CDN, which potentially leads to reducing the bandwidth required over the entire Internet.

5.1.5 Improve infrastructure performance

As we know, traffic engineering is the key task for ISPs in avoiding network performance and quality of service problems. How to achieve reasonable network performance is not the only question when we construct an overlay on top of the ISP's layer 3 network. As a matter of fact, new types of Internet traffic models have been discovered by academic researchers, and many new types of service applications have been unleashed. This is reflected in more and more new types of traffic patterns forming.
ISPs' traffic engineering and planning becomes more and more difficult because highly dynamic traffic patterns appear within relatively short periods of time. For instance, upgrading routers/switches to adapt to P2P traffic control (as P-cube [15] does) lags behind the deployment of new P2P applications [18]. On one hand, finding solutions in the underlying network is interesting, since this layer is supposed to guarantee QoS at layer 3. But the news is not optimistic: the skewed traffic of P2P over large networks [17] shows 20% of users generating 80% of the Internet traffic. The consequence is that the bandwidth of other web applications is significantly consumed. Even though an administrator can shut down P2P systems on a campus [158], shutting down P2P will result in fast customer churn at an ISP. These dilemmas call for a new way of thinking about how to resolve this traffic engineering bottleneck in order to maximize network performance. This holds not only for P2P traffic, but also for all the new types of layer seven traffic that will be generated. This should be the way to handle bandwidth-intensive applications; otherwise, they will not exist for very long. For instance, P2P systems will not survive if they annoy both ISPs and the record industry (a class of Content Provider). Therefore, we must find harmonized solutions for all the different types of applications in today's and tomorrow's Internet. This means that the solution must consider all the requirements of the different actors in the networks.

This goal makes the PlentyCast design distinct from many other CDNs. When we attack this problem, there is a very important philosophical motivation we advocate:

The survival of the fittest – Charles Robert Darwin

If we examine the problem of Internet infrastructure performance, I believe that co-adaptation is the key to solving this type of problem. This means that a solution providing benefits for both the infrastructure and the applications makes a good solution for everyone. However, it is naïve to think that every P2P system devised will be infrastructure-aware, since they have mostly been used for decentralized file sharing. Thus we choose to describe our PlentyCast approach, because a CDN is more manageable than P2P file-sharing systems. As one of the pillars of PlentyCast, self-organization will be used to provide fault tolerance for the overlay network. In addition, PlentyCast's self-organization means that the system can re-organize overlay traffic to optimize the underlying network's performance. This will enable our CDN to adapt much better to the underlying network. Ultimately, this maximizes the underlying network's performance for content distribution. To the best of our knowledge, this is the first study of this approach in a CDN; we will refer to it as a Both Infrastructure and Overlay Network Based Solution (BIONBS).

5.2 Problem modeling

Our approach in designing PlentyCast is to realize these goals by exploiting the following system features – overlay self-organization, intelligent content location and routing, and swarm delivery. All the goals are inter-related, as depicted in Figure 20.

Figure 20. Problem model

In this picture, the questions are concentrated on the underlying layer. The following questions should be asked:

A. What functions shall we design in order to use each core technology?
B. How do we design these functions?

To answer these two questions, we must answer the following questions:

1. What requirements shall we consider from a user perspective?
2. What requirements shall we consider from a content provider perspective?
3. What requirements shall we consider from an ISP (multi-tier) perspective?
4. What technologies shall we choose to realize those requirements?

Following this, we can define a set of sub-problems for each of the core technologies:

Smart replica placement:
1) How to identify each content replica?
2) How to place content as close to users as possible?
3) How many replicas of the content shall be placed close to end users?
4) How to replace content in terms of self-organization?

Intelligent replica location and routing:
1) How to locate a replica in the content space?
2) How to choose the best replica to download, in terms of location, routing, and self-organization metrics?
3) How to relocate new replicas while downloading?
4) How many replicas shall be chosen for download?
5) How to organize content and its host/s efficiently in a dynamic environment?
6) How to compute the location and routing metrics?
7) How to acquire the underlying traffic engineering metrics?

Swarming delivery:
1) What information shall be carried in the metadata for each fragment of content?
2) How to record the host of each piece of metadata?
3) How to ensure the integrity of metadata during transmission?
4) How to become aware of the other pieces of the same content?
5) How to split and rebuild the content locally?
6) How to avoid expensive negotiation between hosts in swarming?

Self-organization:
1) How to detect faults during content transportation?
2) How to isolate faulty nodes?
3) How to signal location and routing to relocate content following faults?
4) How to re-organize overlay traffic in terms of metrics from the infrastructure?

5.3 Discussion of the criteria

According to distributed system theory, there are many aspects that should be addressed in such a large system design. However, due to the limited time and resources of this thesis project, I will focus only on the major problems addressed by this system architecture. After an intensive literature study, we have chosen to focus on self-organization, replica placement, swarm intelligence, and location & routing. We believe these four technologies are the most outstanding candidate solutions to the problems to be addressed by this design. These mechanisms will form the backbone of PlentyCast. There is much contemporary research addressing these four areas. In replica placement, swarm intelligence, and location & routing, there is no major difference between our approach and others. With regard to self-organization, we consider infrastructure awareness to be one of the important system metrics. This makes a big difference from the other approaches. In chapters 3, 4, 5 and 6, we first decompose the problem domain into sub-problems. This is followed by a study of relevant literature from leading industrial vendors' research laboratories such as HP Labs, IBM Research, Microsoft Research, and AT&T Research, and from organizations such as ACM, IEEE, and IETF. Third, I looked at academic research journals by following the citations in many papers. As we know, end users, content providers, and ISPs are the three major customers of CDNs. A good CDN system should not be skewed towards one of these customers over the others. In this paper, we focus on mechanisms and/or algorithms. No simulations were conducted, but we believe this should be part of our future work. An industry team will implement a prototype of this design following my thesis project.

Chapter 6 A novel architecture – PlentyCast

6.1 System overview

Figure 21. PlentyCast overview
As with most CDN topologies, PlentyCast consists of two types of servers: replica servers and landmarks. They can be physically co-located at one site or placed at different sites. However, they function in different ways, as will be explained in later sections; therefore, we separate them in Figure 21 in order to indicate their differences. In the PlentyCast overlay, we classify all the elements into three different sub-layers. The first is overlay X; intuitively, this sub-layer consists only of replica servers. Layer Y consists of landmarks on the edges of the designated ASs. The third layer is formed by the PlentyCast clients and is called overlay Z. An obvious attribute of this layer is that the number of its elements is not predictable. However, we include these clients as part of our CDN strategy in certain circumstances. All three layers are mapped onto the underlying physical network. The physical connections can be shared or provided by separate links. On top of overlay X, content servers are connected to the entire CDN. There are six key subsystems and two assisting systems in our architecture. The six key systems include:

1. PlentyCast client
2. Landmark server
3. Replica server
4. Location and routing system
5. Distribution system
6. Accounting system

6.2 High level system architecture

Figure 22. PlentyCast high level architecture

There are another three systems that appear to be external to this CDN system: the content server, and the billing and charging systems. Following the general work flow, the description of this architecture is as follows.

1. Content object copies are sent to the distribution system. In the Object Splitter, each copy is split into fixed-length blocks, and the blocks are coded by the FEC encoder with redundancy information.
2. The distributor places the coded blocks over the replica servers following a certain policy.
3. After the distribution has finished, a notification must be sent to the location and routing system to update the replica locations.
4. When a PlentyCast client requests specific content, the local landmark will forward this request to the location and routing system to select the most suitable replica servers for the client.
5. If the location and routing system finds that the popularity of the content is not too high and the load on the local replica servers is not high, then a list of local servers will be sent to the PlentyCast client, via the landmark, for its selection.
5'. If the location and routing system finds that the popularity of the content is too high or the load on the local replica servers is very high, then it will instruct the distributor to disseminate more blocks to more replica servers, and a list of replica servers will be sent to the landmark for reference. Meanwhile, the location and routing system will send an order for swarming delivery to the landmark; the landmark will search amongst its peering group and its local database to look up blocks existing in any active PlentyCast clients within a certain radius of the content space (such as on the same LAN or within the same AS). Then a list of all candidates (servers and PlentyCast peering clients) is sent to the PlentyCast client.
6. According to the selection process in the PlentyCast client, only some replica servers will be selected. Then a many-to-one concurrent download is conducted. The client downloads all the blocks, decodes them, and then reconstructs the content locally.
6'. According to the selection process in the PlentyCast client, some servers and PlentyCast clients will be chosen to start/join a swarming delivery. A mesh is then formed to download and upload the blocks of this content, as described in section 3.2.2. The landmark periodically updates the status of the clients in the mesh. If a sufficient number of blocks of the content exists amongst the PlentyCast clients in the mesh, the landmark shall send a new list of PlentyCast peers to the client for re-selection of the nearest blocks.
7. The landmark periodically checks the downloading status of the downloading clients, and then reports to the accounting system the popularity of this content and the number of replicas actually alive in PlentyCast clients. Meanwhile, the replica server also sends the hit rate of the content and its current load to the accounting system.
8. The accounting system provides data for the location & routing system, the billing and charging system, and the customer – the content provider.

6.3 PlentyCast client

In this user agent, most functions are the same as in a web browser. It can, for example, send an HTTP request for any content which the user wants to access. The differences appear in the following aspects:

6.3.1 Active binning

Every PlentyCast client periodically pings a set of landmarks to obtain the RTT to each of the landmarks during its joining phase. This is used to classify different replica servers into those with the same level, as described in section 2.3.2.1. The level vector for each client is stored and sent to the landmark. This information will assist in replica server selection.

6.3.2 SNMP client

To be able to monitor the downloading and/or uploading status in this client, an SNMP MIB27 for the throughput of IP/TCP connections shall be queried.

27 Management Information Base.

Mobile agent client

Since the landmark will periodically request the download and/or upload status of the client, a report-on-demand mechanism must exist between the client and the landmark. In study [155], a very efficient way to monitor a large-scale and dynamic network with small bandwidth consumption was developed. This can be adopted for the report-on-demand mechanism between PlentyCast clients and landmarks.

6.3.3 Peer lookup service

If swarming delivery is needed, a PlentyCast client should be able to locate the blocks being downloaded, either via one of the DHT systems such as Tapestry (as described in section 2.3.1.3) or via a platform such as JXTA [117]. In this case, the landmark is the "super peer". It monitors the downloading and uploading status of each client who is accessing content. It provides a list of the PlentyCast clients that are downloading and uploading content when swarming delivery is needed. Thus a PlentyCast client can locate the blocks of that content amongst the peers given by the landmark. In a PlentyCast client, either of these two communication mechanisms shall be implemented.

6.3.4 Peer Selection

Since the n+m blocks of content are encoded with FEC codes, the m blocks are redundant blocks: only n blocks are needed to reconstruct the content. Thereby, selecting n distinct blocks amongst the n+m blocks is the key to this selection process. The following steps are mandatory in the selection algorithm:

1. Identify how many blocks are needed to reconstruct this content.
2. Identify the peers and/or servers who carry any blocks of the content file, and the number of those peers, in the list given by the landmark.
3. Identify the peers who have a download request for this content.
Any peer whose portion of the content is lower than 100% shall be uploaded to.
4. Select n' peers and/or servers (n' ≤ n) from the group in step 2. The preferred peers or servers are those with:
a. Shortest distance
b. Most blocks
c. Largest bandwidth
d. Latest download or upload time
If the total number of blocks accumulated reaches n, or the end of the list is reached, then the selection process ends. Naturally, the remaining n-n' peer/s in the list are identified as backup peers.
5. Set up connections to the servers and peers in groups 3 and 4.
6. Read in the blocks from all the hosts in group 4 and save them, then send out blocks to all peers in group 3. (It is possible for two peers to download from and upload to each other at the same time.)
7. If any sending peer/s of the blocks in step 1 is disconnected, then promote redundant peer/s from group 5 to group 4.
8. If the blocks cannot be found in any backup peers, or there are no backup peers to be selected, then the Peer lookup service is activated.

6.4 Landmark server

Redirection executor

When a request is received, it should be forwarded to the location and routing system, while a notification is sent to the accounting system to record the object's popularity. If a routing decision for the PlentyCast client's request is received from the location and routing system, the executor shall forward this decision to the client. The decision is usually a list of replica servers which carry the blocks for the client's request. When swarming delivery should be performed, the executor adds the PlentyCast client peers to the list of replica servers, and then sends the list to the requesting client/s. A hybrid solution can be chosen amongst the techniques in section 1.4.4.4.2 for the executor.

6.4.1 Placement

The landmarks must be placed at the edges of the ASs. The hierarchy shall be maintained in order to map the overlay network closely to the Internet structure. Thus they can provide good measurement results for the binning technique used to locate the nearest servers28, as explained in section 2.3.2.1.

28 How close the servers are depends on how the level vectors have been defined; see section 2.3.2.1.

6.4.2 Download & upload monitor

It is very important that the landmark knows the download and upload status of each PlentyCast client in the mesh during the swarming delivery. Thereby, the monitor keeps tracking each PlentyCast peer's behavior and performance. In addition, the degree of content saturation in the mesh must be monitored. This will provide information for peer selection when nodes join, and for re-selection when faults happen or when self-organization occurs in the middle of a transmission. This can be implemented either by using mobile agents [155] or by using a platform like JXTA.

6.4.3 Load balancing

If the landmark is overloaded, load balancing shall be applied to the landmarks. A policy of load balancing must be established in the landmark server farm at each site. The following key metrics should be used in such a load balancing process: the CPU load and memory must be monitored, the number of processes in the node must be monitored, and the number of PlentyCast clients. If the value of any of these metrics exceeds its configured threshold, new requests must be forwarded to the landmark's neighbor peers. This means that the landmark shall run on top of a P2P service. I recommend that the same Peer-to-Peer service platform or approach be adopted as for the other subsystems, such as the Peer lookup service and the download & upload monitor.
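A minimal sketch of the threshold check described in section 6.4.3 is given below (the class name, field names, and threshold values are hypothetical; in a real landmark the metric values would come from the SNMP/mobile-agent monitoring discussed in section 6.3).

// Hypothetical sketch of the landmark's load-balancing decision.
class LandmarkLoadBalancer {
    // Configured thresholds (illustrative values only).
    private final double maxCpuLoad = 0.80;      // 80% CPU utilisation
    private final double maxMemoryUse = 0.75;    // 75% memory utilisation
    private final int maxProcesses = 500;
    private final int maxClients = 10000;

    // Returns true if a new request should be forwarded to a neighboring landmark peer.
    boolean shouldForward(double cpuLoad, double memoryUse, int processes, int clients) {
        return cpuLoad > maxCpuLoad
                || memoryUse > maxMemoryUse
                || processes > maxProcesses
                || clients > maxClients;
    }
}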
6.5 Distribution system

6.5.1 Object splitter
If a copy of a large file is obtained from the content server, it shall be split into fixed-length blocks. Every block shall be in the size range of 32 to 64 Kbytes. An example design and implementation can be seen in Appendix 1.

6.5.2 FEC encoder
This component encodes the blocks generated by the object splitter. The rationale is described in section 3.3. The actual implementation could follow the reference models in [156][157].

6.5.3 Block distributor
Every block of the content shall be assigned a unique ID. Assume l is the number of replica servers. In the first round of distribution, if n+m > l then every replica server gets (n+m)/l blocks. If n+m ≤ l then randomly choose l−(n+m) blocks from the total n+m blocks and distribute them to the remaining replica servers, making sure that no server receives duplicate blocks during the distribution. This can be done with either the Tapestry or the JXTA platform.
In the second round of distribution, the number of blocks shall be increased on certain replica servers. However, this time the distribution is not so simple, since the location and routing system will have noticed in advance that certain content is popular, according to the reports from the replica servers and the landmarks. The scale of this dissemination shall be smaller than in the first round, out of concern for the replica server load and the overlay traffic load, but the portion of the content held by this group should be increased. The location and routing system provides a list of k replica servers (from its server selection process) and the portion of the content to add (x%). As in the first round, variety should be maintained. Thereby the following steps are mandatory in this match-making algorithm (a sketch is given at the end of section 6.6.2 below):
1. Identify how many blocks are included in this content.
2. Check the server list against the accounting system to find out which block IDs already exist on the servers, and exclude those blocks.
3. Put the remaining blocks of the content in a list.
4. Work out the number of blocks y to be added by the formula y = x% ∙ (n+m).
5. Select y blocks from the list in step 3 and assign them to k buckets evenly.
6. Send the k buckets to the k replica servers in the list. (Another scenario is to let the distributor directly establish one-to-many connections and spread the y blocks evenly over the servers in a round-robin manner; the efficiency of these two scenarios should be compared in future work.)

6.6 Replica server

6.6.1 Placement
Replica servers shall be placed at the edges of the ASes; preferably, they can be placed in the Internet Data Centers of ISPs. Since we disseminate a variety of blocks of the content across the different servers, a replica server only acts as storage for different blocks; in some cases it never holds a complete copy of the content. Thus we argue against the approaches in section 1.4.4.1, in order to avoid the NP-hard placement problem.

6.6.2 Storage and delivery
On one hand, the replica server stores the blocks of the content sent by the distributor of the distribution system. On the other hand, it uploads blocks to the PlentyCast clients or peers who request the content. Whenever it downloads blocks on behalf of a client request, it should cache the blocks that have been delivered; in other words, it increases the number of blocks of the content in its storage until the entire content is cached locally. It should report its server load, available storage, and the hit rate of specific content to the accounting system.
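Returning to the block distributor of section 6.5.3, the following minimal Python sketch illustrates the second-round match-making steps: excluding block IDs the chosen servers already hold, computing y = x% ∙ (n+m), and spreading the selected blocks evenly over k buckets. The function and parameter names are illustrative assumptions; in the real system the block inventory would come from the accounting system and the buckets would be pushed over the chosen P2P platform.

# Sketch of the second-round match-making in section 6.5.3 (illustrative names).
def second_round_buckets(all_blocks, servers, blocks_on_server, x_percent):
    """all_blocks: the n+m block IDs of the content.
    servers: the k replica servers chosen by the location and routing system.
    blocks_on_server: dict mapping a server to the set of block IDs it stores.
    x_percent: portion of the content to add, e.g. 25 for 25%.
    Returns a dict mapping each server to its bucket (list of block IDs)."""
    already = set().union(*(blocks_on_server.get(s, set()) for s in servers))
    candidates = [b for b in all_blocks if b not in already]      # steps 2-3
    y = int(round(x_percent / 100.0 * len(all_blocks)))           # step 4
    chosen = candidates[:y]                                       # step 5
    buckets = {s: [] for s in servers}
    for i, block in enumerate(chosen):                            # spread evenly
        buckets[servers[i % len(servers)]].append(block)
    return buckets                                                # step 6: send buckets

# Example: 12 blocks (n+m = 12), 3 servers, add 25% of the content.
blocks = list(range(12))
have = {"rs1": {0, 1}, "rs2": {2}, "rs3": set()}
print(second_round_buckets(blocks, ["rs1", "rs2", "rs3"], have, 25))
# -> {'rs1': [3], 'rs2': [4], 'rs3': [5]}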
6.6.3 Cone loading
If there is a list of k buckets, they will be assigned to a group of k replica servers. Each i-th replica server (i = 1, 2, ..., k) should use a Peer-to-Peer lookup service to find all the blocks in the i-th bucket and download them from peers across the replica overlay. While the i-th server is downloading blocks from its peers, it should at the same time upload those blocks, plus the blocks it already stores, to the requesting client. This procedure is named "cone loading". The following figure depicts this behaviour in the overlay.
Figure 23. Cone loading
In the above figure, A and B are the requesting clients for the content. They download concurrently from the replica servers. The servers download the blocks which they do not have from other peers via the Peer-to-Peer lookup service. While replica servers 1-10 are uploading the content blocks, they should cache the blocks which they are downloading from other peers or from each other. The traffic generated in the PlentyCast overlay can be high the first time the content is downloaded, but as more clients request the content, more blocks will be found in the overlay. The bound on the total storage should be considered in the replica server overlay X.

6.6.4 Active binning
Every replica server should periodically ping a set of landmarks to obtain the RTT to each of the landmarks. This is used to bin together replica servers that have the same level, as described in section 2.3.2.1. The level vectors can be stored either in each replica server or in the database of the location and routing system; this choice can be settled in future simulation work. To achieve low cost and fast content location, we recommend choosing either Tapestry or JXTA as the basic P2P service, i.e., to provide peer discovery and content location.

6.7 Location and routing system

6.7.1 Policy engine
The major feature of the location and routing system is to locate suitable replica servers for the user, and to push content reactively or proactively to the right group(s) of replica servers, in order to achieve high data availability for the requested content. The following metrics shall be considered by this policy engine: content hit rate and bin. The small difference between our binning strategy and the one described in section 2.3.2.1 is that we add server load and bandwidth to the level vectors; similarly, we can bin the replica server load and its bandwidth into different levels too. A level vector will be described in the following format:
Since more metrics have been added to the binning, the location and routing system should be able to make a better selection. If a PlentyCast client request is received, but the popularity of the content (i.e., its hit rate) exceeds the threshold of the PlentyCast system, or the load of the replica servers appears to be high, an order shall be issued to the landmark so that swarming delivery can be initiated to serve this request.

6.7.2 Server selector
If a PlentyCast client request is received, the server selector executes the same distributed binning algorithm as described in section 2.3.2.2, with two additional metrics, server load and bandwidth, checked by the binning algorithm. The lower the load and the higher the available bandwidth level, the better the server is for the client.
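The following minimal Python sketch illustrates this server selection: clients and servers are compared by their per-landmark RTT level vectors, and candidates in the matching bin are ranked by the two extra metrics, binned server load and binned bandwidth. The data layout, the fallback to the nearest bin, and the limit on the number of returned servers are illustrative assumptions, not part of the binning algorithm of section 2.3.2.2.

# Sketch of the extended binning server selection in section 6.7.2 (illustrative).
def select_servers(client_levels, servers, max_servers=8):
    """client_levels: the client's per-landmark RTT levels, e.g. [0, 2, 1].
    servers: list of dicts with 'id', 'levels' (per-landmark RTT levels),
    'load_level' and 'bandwidth_level' (binned; lower load and higher
    bandwidth are better). Returns the IDs of the best-ranked servers."""
    same_bin = [s for s in servers if s["levels"] == client_levels]
    if not same_bin:
        # assumed fallback: take the servers whose level vectors differ the least
        same_bin = sorted(servers, key=lambda s: sum(
            abs(a - b) for a, b in zip(client_levels, s["levels"])))
    ranked = sorted(same_bin, key=lambda s: (s["load_level"], -s["bandwidth_level"]))
    return [s["id"] for s in ranked[:max_servers]]

# Example: two servers share the client's bin; the lightly loaded one ranks first.
servers = [
    {"id": "rs1", "levels": [0, 2, 1], "load_level": 2, "bandwidth_level": 3},
    {"id": "rs2", "levels": [0, 2, 1], "load_level": 1, "bandwidth_level": 3},
    {"id": "rs3", "levels": [1, 2, 2], "load_level": 0, "bandwidth_level": 4},
]
print(select_servers([0, 2, 1], servers))   # -> ['rs2', 'rs1']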
Once the servers are selected, a server list is sent to the client for block selection as described in section 6.3.

6.7.3 Block meter
After the group of servers has been chosen for the client, 100% of the content should be loaded into this group. As described in section 6.6, the replica servers are responsible for collecting all the blocks of this content and then uploading them concurrently to the requesting client. The client chooses the blocks and replicas by itself. This selection applies to the following cases of client requests. The first is when the client is a new client (no blocks of the content have been downloaded yet). The second occurs when the client has already partially downloaded the content (a portion of the blocks) and requests that more blocks be delivered from the replica servers.

6.7.4 Content manager
Periodically, the hit rate of all content should be checked. If the hit rate of certain content is relatively low, then the number of replica blocks disseminated in the replica servers can be reduced. The following steps are mandatory in this process:
1. Check the hit rate for each specific content object in the accounting system.
2. Sort the content hit rates by replica server.
3. Use the Peer-to-Peer service to locate the content replicas.
4. Delete replicas that have been unused for a specific time period (this parameter can be tuned according to the bound on storage in each server).
5. Update the accounting database.

6.8 Accounting system
There are two classes of data that should be accounted for: the hit rate of each content element, and the server load of each replica server. The database header should cover the mandatory keys depicted in Table 5.

Server ID | Current load | Current bandwidth | Storage capacity | Content ID
Current hit rate | Number of blocks of the content | Content ID
Table 5. Accounting database header

The primary key is the Server ID and the secondary key is the Content ID. In this way, a content ID is bound to a server ID.

6.9 System characteristics analysis and discussion
In this section, we conduct a set of case studies, going through our system with different traffic demands from clients, and demonstrate how we achieve our five goals while processing these different traffic demands in our CDN system.

6.9.1 Case study 1: Normal access mode
In this case, we assume that the content has been published on a web site and that the users' access rate does not exceed the bound of the normal access mode. A PlentyCast client on the Internet finds the landmarks in its cache via a probe. After selecting the three closest landmarks, it records its own bin, and then sends a request (content URL and bin level) for this content to the nearest landmark via HTTP. The landmark forwards this request to the location and routing system. The bin of this client is compared with the bins of the servers in the accounting database. If a similar bin exists, those servers are selected for the request. Since this is the first time a client is accessing the content, the hit rate is not considered. After the servers are chosen, the distribution system issues a list of buckets for the selected servers. Each replica server locates the blocks contained in its bucket via the P2P lookup service, then downloads them from replica server peers in overlay X. All blocks new to a replica server are cached. The list of servers is then sent to the PlentyCast client for block selection by the client.
Since this is the first time the content is downloaded, all servers will be chosen to upload its blocks. A cone loading is formed between the grouped replica servers and this client. Since the blocks are relatively small (32-64 Kbytes per block), downloading and uploading happen between replica server peers, and the replica overlay is relatively stable, the latency until the client obtains the complete content should not be longer than a one-to-one download from the content server. Another important aspect is that the client only needs to download n blocks, thanks to the FEC-coded blocks, even though the number of blocks prepared in the group of designated servers is n+m. Thereby, redundancy plus concurrent download (cone loading) provides high availability of the content data and reduces user access latency. This might increase the bandwidth consumption and the load on the replica servers; however, we argue that this only happens when the content is first accessed. When additional clients access this content, more blocks will be distributed to clients from nearby. We estimate that the latency would be reduced by several orders of magnitude, and the bandwidth consumption will be reduced because less and less downloading happens in the overlay. The only increasing cost is storage in the replica servers, as more and more replicas of the entire content are created. Fortunately, storage is cheaper than other resources in the network (e.g., the yearly rental of a 2 Mbit/s link). In addition, our content manager acts as a garbage collector which periodically cleans out deprecated content replicas. Thus, before any flash crowd or DDoS happens, the system increases the number of replicas within the performance bound of the server overlay, and thereby improves content availability and reduces user access latency. Since we adopt a binning strategy for server selection, clients choose the server(s) closest to them. This location-aware technique ensures that the traffic will not flow over long distances. It significantly reduces bandwidth consumption at two of the major Internet bottlenecks, specifically the peering bottleneck and the backbone bottleneck, on the overlay network. Another advantage of binning is that it localizes access traffic, which potentially contributes to improved infrastructure performance.

6.9.2 Case study 2: Flash Crowd and DDoS mode
We now examine the system behaviour under a high access-rate traffic pattern. In this case, we assume that a large number of PlentyCast clients access specific content via the landmarks, either within a short period of time or simultaneously (as in a DDoS). The alarm caused by such events is reported to the location and routing system by the accounting system (in effect it is reported by the landmarks, since the accounting system collects the hit rate values). The location and routing system starts the server selection process for a number of the clients' routing requests. Based on the number of requests which cannot be handled, it issues orders to the landmarks (as if those requests were bounced back). When a landmark receives such an order, it starts a swarming delivery process, and a mesh for this content is formed. The binning process in each client gradually localizes the access traffic within each LAN. The upload and download policy in each client creates more and more replicas in the mesh.
Thus the P2P overlay Z enables the clients' high-rate access traffic to be counteracted by the clients themselves (i.e., these clients in turn act as servers to propagate the content). Under a high access-rate traffic pattern, the PlentyCast system uses P2P swarming delivery techniques to solve the problems of flash crowds and DDoS. As in case 1, PlentyCast can significantly decrease the users' access latency and reduce the bandwidth consumption on the Internet by letting client access to the content grow gradually within the same LAN. Given the nature of a hybrid P2P overlay, network scalability is significantly improved. Since the data torrent has been organized into overlays Z to X, the bottlenecks at the peering points, the backbone, and the first mile are largely alleviated. In the last mile, bandwidth consumption can be very high, but the torrent of traffic eventually stays on a LAN (where link-layer multicast can be used). Thus it will be easier for a local ISP to handle these events: in a flash crowd the ISP needs to do nothing, and against DDoS the ISP can easily apply a detection tool such as [159].

6.9.3 Case study 3: ADSL user traffic
In the normal access mode described in section 6.9.1, a PlentyCast client behind an ADSL modem takes advantage of the broadband downlink, usually 1-8 Mbit/s. This, combined with FEC, enables very fast download of content. However, the asymmetry causes problems, as described in section 3.2, when swarming delivery happens: the upstream connection cannot upload the same amount of content as is being downloaded. Even worse, because of the details of TCP's rate control, if the upstream path is congested, the downstream performance suffers as well. In PlentyCast, we adopted mechanisms from both Swarmcast and BitTorrent. The block selection process in a PlentyCast client uses distance metrics to choose the closest peers for download and upload; this exploits ADSL peers located in the same LAN. Thus the TCP performance is improved and we can potentially avoid the upstream path becoming clogged. In addition, since the block size is limited to a quite small size (32-64 Kbytes per block), this also alleviates the last-mile bottleneck. We exploit the choke algorithm from BitTorrent [4]: as an exception to simultaneous upload and download, a PlentyCast client can stop uploading to specific peers. If a PlentyCast client is located behind an ADSL modem, it executes a Pause algorithm during swarming delivery. Every 10 seconds, the ADSL peer evaluates the upload rate of the peers that upload to it. It chooses the set of peers whose summed upload rate matches its uplink rate; it then downloads only from these peers and pauses the others. To save the overhead of setting up the TCP sessions again, the ADSL peer keeps the connections to the paused peers open. Every 30 seconds, the ADSL peer resumes one of the paused peers regardless of that peer's upload rate. In this way, the ADSL peer alleviates the bottleneck brought about by TCP's "fairness" property. In accordance with the results in study [11], when different users (dialup, broadband, and office users) co-exist in the same data mesh, the swarming performance of each class of users is not affected significantly. In PlentyCast, we thus periodically make each ADSL peer behave like a symmetric peer, whose download and upload rates tend to be equal, while we do not give up the advantage of broadband, i.e., the downlink can still be saturated during other periods.
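A minimal Python sketch of the Pause algorithm described above is given here. The 10-second re-evaluation and 30-second resume intervals follow the text; the peer bookkeeping, the rate measurement, and the random choice of which paused peer to resume are illustrative assumptions.

# Sketch of the Pause algorithm for ADSL clients in section 6.9.3 (illustrative).
import random

class Peer:
    def __init__(self, peer_id, upload_rate_to_me=0.0):
        self.peer_id = peer_id
        self.upload_rate_to_me = upload_rate_to_me  # bytes/s this peer sends us
        self.paused = False   # True: TCP connection kept open, download paused

def reevaluate(peers, my_uplink_rate):
    """Run every 10 seconds: keep downloading from the best uploaders whose
    summed rate roughly fills the small ADSL uplink, and pause the rest
    (their connections stay open to avoid new TCP set-up costs)."""
    total = 0.0
    for p in sorted(peers, key=lambda p: p.upload_rate_to_me, reverse=True):
        if total < my_uplink_rate:
            p.paused = False
            total += p.upload_rate_to_me
        else:
            p.paused = True
    return [p for p in peers if not p.paused]

def optimistic_resume(peers):
    """Run every 30 seconds: resume one paused peer regardless of its rate,
    analogous to BitTorrent's optimistic unchoke."""
    paused = [p for p in peers if p.paused]
    if paused:
        random.choice(paused).paused = False

# Example: a 64 kbyte/s uplink keeps only the two fastest uploaders active.
peers = [Peer("p1", 50e3), Peer("p2", 30e3), Peer("p3", 10e3)]
print([p.peer_id for p in reevaluate(peers, 64e3)])   # -> ['p1', 'p2']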
Thus our system can contribute to enhanced performance of the infrastructure.

6.9.4 System characteristics
Since PlentyCast is a new type of CDN and primarily uses P2P, agent, and error correction coding techniques, we argue that the following criteria and attributes of CDNs and P2P systems (described in section 1.2.3 and section 2.2) should be used to evaluate our system. They are:
1. Fast access
2. Robustness
3. Transparency
4. Efficiency
5. Adaptive
6. Reliability
7. Simplicity
8. Security
9. Decentralization
10. Scalability
11. Self-organization
12. Anonymity
13. Cost of ownership
14. Ad-hoc connectivity
15. Performance
16. Digital Rights Management
17. Usability and Transparency
18. Fault-resilience
19. Manageability
20. Interoperability
Figure 24. PlentyCast system characteristics
Using the above criteria, we define five grades for each of these attributes, from highest to lowest: Very good, Good, Acceptable, Not good, and Worse. Table 6 explains the motivation behind our system's characteristics: it grades each attribute and lists the design features supporting each grade. The supporting features cited in the table are: cone loading and swarming download; the intelligent location and routing system; high content availability and fault resilience; FEC, which reduces overheads and gives a good solution to the Internet bottlenecks; flash crowd and DoS awareness; the extended binning strategy; the load-balancing design in the landmarks; the use of techniques verified by good practice; security in the signalling system and data integrity enforced by secure hashing; the hybrid architecture and P2P service design; the automatic switch between normal mode and high-traffic mode; localization of the intensive traffic during flash crowds and DoS; the Pause algorithm for ADSL users; cost sharing through swarming delivery; landmark load sharing; the swarming download and upload strategy with an aggressive content/peer selection algorithm; the adaptive distribution system; replica server caching; and the accounting system together with the landmarks' self-management of swarming. Other notes in the table are that the PlentyCast client is inherited from a web browser, that there are small configuration problems with NAT and firewalls, that anonymity and DRM are not yet designed, and that a solution for NAT and firewall traversal still needs to be developed.
Table 6. System characteristics clarification

Chapter 7 Conclusion and future work
In this project, we have made an intensive literature study in the areas of CDN, P2P, swarming, and error correction coding. Based on our findings, we proposed a novel CDN architecture to achieve improved access latency, improved network scalability, improved content availability, lower bandwidth consumption, and improved infrastructure performance. Finally, we find that our CDN, PlentyCast, can in theory accomplish all of these goals.

7.1 Conclusion
Large content distribution has become a key issue on the Internet. P2P file-sharing techniques have been recognized as very efficient for large content distribution.
Swarming content delivery increases user access speed by several orders of magnitude. It is inevitable that P2P swarming techniques will be used in the next generation of CDNs. One signal is very clear: a new generation of CDNs will emerge very soon. Big companies like Intel have already built P2P services that use all hosts to share large content across the enterprise network [160]. In the media industry, large content streaming delivery over the Internet has become an inevitable trend. Meanwhile, the war between the traditional distribution and record industry on one side and P2P companies and users on the other has become a barrier to P2P distribution of content on the Internet (for instance, consider the battle between the RIAA and the P2P companies in the USA). However, P2P delivery will soon be very significant because of its attractive feature for Internet users: it delivers large content quickly.

7.2 Future work
First of all, we will build a software prototype system in the very near future, and this prototype will be tested at a real event in 2005. An intensive evaluation will be conducted for all the features we described in the PlentyCast architecture. There are still open questions in the area of large content distribution, such as:
(i) How to deliver dynamic large content, for example real-time streaming media distribution. The biggest challenge in real-time streaming content distribution is the synchronization between the voice stream and the video stream over long distances in terrestrial networks: the longer the distance, the harder they are to synchronize. In addition, improving the user access rate for such content is even harder. To the best of our knowledge, there is no optimal solution in this area; the only known attempt is the MDC (Multiple Description Coding) solution in Microsoft CoopNet [8].
(ii) How to efficiently protect these applications against malicious peers. A DDoS attack typically first uses email to propagate a virus to a large number of innocent clients, and then activates web requests towards a targeted web server on a certain day. In our architecture, we use landmarks to manage overlay Z. If malicious PlentyCast clients mount a DDoS attack against targeted landmarks, this will have very serious consequences. We must investigate how to handle this kind of situation.
(iii) What AAA mechanism shall be used in such a CDN for limited distribution. A PlentyCast client shall be authenticated, authorized, and accounted for in an efficient way, in order to avoid the problem in (ii) and to provide a base for comprehensive DRM solutions for some specific content distribution.
(iv) How to develop a Digital Rights Management solution based on such a CDN. Most content can be distributed without any copyright or licensing concerns, but some specific content, such as licensed software and copyrighted movie files, must not be infringed and may only be made available for viewing; for such content a DRM solution must be developed in our CDN.
(v) How to run a good business model for this type of CDN. What we address in the business model is the value chain amongst users, content providers, and ISPs. Firstly, if the users use our network, they contribute their resources (CPU, memory, and storage) for delivery; in effect they have already paid something whenever they download. This avoids the "need users to pay" problem which most users dislike.
Secondly, a content provider still pays the CDN operator for content distribution, but can pay less than today, because today's CDN is considered an expensive solution that not all web content providers can afford. Thirdly, the CDN operators shall also pay less than today, because their cost is shared with the users and our solution saves bandwidth in their networks. Fourthly, the Tier-1 and Tier-2 ISPs' traffic engineers are happy to see a relatively predictable traffic pattern appear in their dimensioning tools. The Tier-3 local ISPs in our solution will have more traffic on their LANs, but this is much better than before: today's Tier-3 ISPs face both the users' large traffic volumes and expensive long-haul links to their upstream ISPs, and our solution actually alleviates this downlink bottleneck. The LAN expansion shall also be more fun than long-haul expansion; for example, they may ask their subscribers to expand their access networks to higher-capacity links such as FDDI, wireless LAN, etc. However, a deeper investigation of this is needed.

Introduction to the author:
Cao Weiqiu (Mike) graduated with a B.Eng. degree in computer applications from the China National University of Defense Technology in July 1994. He worked at Ericsson from 1994 to 2002. During this period, he worked with switches, optical transmission systems, and telecommunication network management systems in both engineering and managerial roles. While studying at the Royal Institute of Technology (KTH), he examined internetworking technology in many aspects.

References
[1] Jia Wang, A Survey of Web Caching Schemes for the Internet. ACM SIGCOMM Computer Communication Review, Volume 29, Issue 5 (October 1999), pages 36-46.
[2] Gang Peng, CDN: Content Distribution Network. http://citeseer.ist.psu.edu/peng03cdn.html accessed on 2004-03-12 18:25
[3] http://www.akamai.com accessed on 2004-03-12 18:22
[4] Bram Cohen, Incentives Build Robustness in BitTorrent. http://bitconjurer.org/BitTorrent/bittorrentecon.pdf accessed on 2004-03-12 18:20
[5] Dejan S. Milojicic, Vana Kalogeraki, Rajan Lukose, Kiran Nagaraja, Jim Pruyne, Bruno Richard, Sami Rollins, Zhichen Xu. Peer-to-Peer Computing. HP Laboratories Palo Alto, HPL-2002-57 (R.1), July 3rd, 2003.
[6] Kunwadee Sripanidkulchai, Bruce Maggs, and Hui Zhang. Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems. Infocom 2003.
[7] Elisa Turrini, Fabio Panzieri. Using P2P Techniques for Content Distribution Internetworking: A Research Proposal. http://csdl.computer.org/comp/proceedings/p2p/2002/1810/00/18100171.pdf accessed on 2004-03-12 18:17
[8] http://research.microsoft.com/~padmanab/projects/coopnet/ accessed on 2004-01-05 18:01
[9] http://www.esi.org/ accessed on 2004-01-05 19:08
[10] Venkata N. Padmanabhan, Helen J. Wang, and Philip A. Chou. Distributing Streaming Media Content Using Cooperative Networking. Accessed at http://research.microsoft.com/projects/coopnet/
[11] Daniel Stutzbach, Daniel Zappala, and Reza Rejaie, Swarming: Scalable Content Delivery for the Masses. January 2004 (Technical Report UO-CIS-TR-2004-1).
[12] Akamai white paper - Internet Bottlenecks. http://www.akamai.com/en/html/services/white_paper_library.html accessed on 2004-01-06 22:22
[13] Broadband Access in Korea: Experience and Future Perspective, by Yong-Kyung Lee and Dongmyun Kee, IEEE Communications Magazine, December 2003, pp. 30-36.
[14] Controlling P2P Traffic, accessed on LIGHT Reading on 2004-01-07 01:05 http://www.lightreading.com/document.asp?doc_id=44435 [15] www.p www.p-cube.com. Accessed on 2004-01-07 01:16am [16] http://www.calresco.org/sos/sosfaq.htm#1.2 accessed on 2004-01-07 01:38am [17] Analyzing Peer-to-Peer Traffic Across Large Networks, by Shubho Sen and Jia Wang, to appear in ACM/IEEE Transactions on Networking, 2004. Accessed on 2004-01-07 20:27 [18] http://www.openp2p.com/pub/q/p2p_category Access on 2004-01-07 20:45 [19] http://www.eweek.com/article2/0,4149,1431490,00.asp accessed on 200301-11 11:37 [20] Craig Labovitz, G. Robert Malan, and Farnam Jahanian, Originals of Internet Instability, http://citeseer.nj.nec.com/labovitz99origins.html , accessed on 2004-01-16 17:00 [21] http://www.ietf.org/rfc/rfc894.txt and http://www.ietf.org/rfc/rfc1042.txt http://www.ietf.org/rfc/rfc1042.txt, accessed on 2004-01-16 18:25 89 [22] Katia Obraczka and Fabio Silva, Network Latency Metrics for Server Proximity http://citeseer.nj.nec.com/obraczk http://citeseer.nj.nec.com/obraczka00network.html, accessed on 2004-01-16 19:34 [23] Mohammed J., Kabir, Apache Server Administrator's Handbook, IDG Books, pp.68-69, [24] Charles D. Cranor et al., Enhanced Streaming Services in a Content Distribution Network, IEEE Internet Computing, July 2001. [25] M. Charikar, and S. Guha. Improved Combinatorial Algorithms for the Facility Location and K-Median Problems. In Proc. of the 40th Annual IEEE Conference on Foundations of Computer Science, 1999. [26] Lili Qiu, Venkata N. Padmanabhan, and Deoffery M. Voelker. On the placement of web server replicas. In Proceedings of IEEE INFOCOM 2001 Conference, Anchorage, Alaska USA, April 2001. [27] Year Brutal. Probabilistic approximation of metric space and its algorithmic applications. In 37th Annual IEEE Symposium on Foundations of Computer Science, October 1996 [28] Vijay Varian. Approximation methods. Springer-Verlag, 1999 [29] Pavlin Radoslavov, Ramesh Govindan, and Deborah Estrin TopologyInformed Internet Replica Placement (2001) Proceedings of WCW'01: Web Caching and Content Distribution Workshop, Boston, MA [30] K. Kangasharju, J. Roberts, and K. Ross. Object replication strategies in content distribution networks. In 6th International Web Caching Workshop and Content Delivery Workshop, Boston, MA, 2001. [31] Gary Audin. Reality check on five-nines. Business Communications Review, pages 22–27, May 2002. [32] Madhukar R. Korupolu Michael Dahlin Coordinated Placement and Replacement for Large-Scale Distributed Caches (1998) [33] Jussi Kangasharju, James Roberts, and Keith W. Ross, Object Replication Strategies in Content Distribution Networks (2001) [34] http://mathworld.wolfram.com/NP http://mathworld.wolfram.com/NP-HardProblem.html accessed on 2004-0201 20:55 [35] http://www.nist.gov/dads/HTML/zipfslaw.html www.nist.gov/dads/HTML/zipfslaw.html accessed on 2004-02-01 21:17 [36] Yan Chen, Lili Qiu, Weiyu Chen, Luan Nguyen, Randy H. Katz, Efficient and Adaptive Web Replication using Content Clustering. http://citeseer.nj.nec.com/596760.html accessed on 2004-02-02 13:03 [37] M. Kafisson, C. Karamanolis, and M. Mahalingam. A Framework for Evaluating Replica Placement Algorithm. Technical report, HP Lab, 2002. 
http://www.hpl.hp.com/personal/Magnus_Karlsson/papers/rp_framework.pdf accessed on 2004-02-02 14:45 [38] Hyper Text Transfer Protocol HTTP RFC 1945 http://www.ietf.org/rfc/rfc1945.txt?number=1945 accessed on 2004-01-19 http://www.ietf.o 22:24 [39] http://en.wikipedia.org/wiki/OSI_model accessed on 2004-02-06 2004 19:57 [40] http://www.gnutella.com/ www.gnutella.com/ accessed on 2004-02-07 15:51 [41] http://www.kazaa.com/us/index.htm accessed on 2004-02-07 15:51 [42] http://www.freenetproject.org/ accessed on 2004-02-07 15:51 [43] D. R. Boggs. Internet Broadcasting. PhD thesis, Electrical Engineering Dept., Stanford University, January 1982. Also Tech. Rep. CSL-83-3, Xerox PARC, Palo Alto, Calif. 90 [44] C. Partridge, T. Mendez, and W. Milliken. RFC 1546: Host Anycasting service, November 1993 http://www.ietf.org/rfc/rfc1546.txt?number=1546 accessed 2004-02-07 18:53 [45] Z. Fei, S. Bhattacharjee, E. Zegura, and M. Ammar. A Novel Server Selection Technique for Improving the Response Time of a Replicated Service. In Proc. of IEEE Infocom, March 1998.Page 783-791 [46] http://www.ietf.org/rfc/rfc3568.txt?number=3568 RFC 3568 Known Content Network (CN) Request-Routing Mechanisms. Accessed on 2004-0207 21:27 [47] Domain Name Service DNS RFC 1591 http://www.ietf.org/rfc/rfc1591.txt?number=1591 accessed on 2004-01-25 21:26 [48] IAB Architectural and Policy Considerations for Open Pluggable Edge Services http://www.ietf.org/rfc/rfc3238.txt?number=3238 accessed on 200402-08 9:59 [49] E. Cohen, B. Krishnamurthy, and J. Rexford, Efficient algorithms for predicting requests to Web servers, Proceedings of Infocom’99. [50] A. Dingle and T. Partl. Web cache coherence. In Proc Fifth International WWW Conference, May 1996. [51] RFC 791 http://www.ietf.o http://www.ietf.org/rfc/rfc0791.txt?number=791 accessed on 200402-07 16:04 [52] V. Cate, Alex - a global file system, Proceedings of the 1992 USENIX File System Workshop, pp. 1-12, May 1992. [53] A. Chankhunthod, P. B. Danzig, C. Neerdaels, M. F. Schwartz, and K. J. Worrel, A hierarchical Internet object cache, Usenix’96, January 1996. [54] A. Luotonen and K. Altis, World Wide Web proxies, Computer Networks and ISDN Systems, First International Conference on WWW, April 1994. [55] D. Wessels, Intelligent caching for World-Wide Webobjects, Proceedings of INET’95, Honolulu, Hawaii, June 1995. [56] B. Krishnamurthy and C. E.Wills, Study of piggyback cache validation for proxy caches in the World Wide Web, Proceedingsof the 1997 USENIX Symposium on Internet Technology and Systems, pp. 1-12, December 1997. [57] B. Krishnamurthy and C. E. Wills, Piggyback server invalidation for proxy cache coherency, Proceedings of the WWW-7 Conference, pp. 185-194, 1998. [58] B. Krishnamurthy and C. E. Wills, Proxy cache coherency and replacement towards a more complete picture, ICDC99, June 1999. [59] V. Duvvri, P. Shenoy, and R. Tewari. Adaptive leases: A strong consistency mechanism for the world wide web. In INFOCOM, 2000. [60] Z. Fei. A Novel Approach to Managing Consistency in Content Distribution Networks. In Proceedings of the 6th Workshop on Web Caching and Content Distribution, Boston, MA, June 2001. [61] J. Chuang and M. Sirbu, “Pricing multicast communication: A cost based approach,” in Proceedings of INET’98, July 1998. Geneva, Switzerland. [62] John Dilley, Martin Arlitt, Stephane Perret, and Tai Jin. The Distributed Object Consistency Protocol. Technical report, Hewlett-Packard Labs Technical Reports, 1999. 
[63] http://thewhir.com/marketwatch/nas010604.cfm accessed on 2004-01-11 13:54 [64] http://www.eweek.com/print_article/0,3048,a=116052,00.asp accessed on 2004-02-09 19:49 91 [65] http://www.contentalliance.org/docs/draft http://www.contentalliance.org/docs/draft-green-cdnp-gen-arch-02.html accessed on 2004-02-09 20:01 [66] http://www.ietf.org/rfc/rfc3570.txt?number=3570 accessed on 2004-02-09 20:04 [67] http://www1.ietf.org/mail http://www1.ietf.org/mail-archive/ietf-announce/Current/msg24223.html accessed on 2004-02-09 20:07 [68] Alexandros Biliris, Chuck Cranor, Fred Douglis, Michael Rabinovich, Sandeep Sibal, Oliver Spatscheck, Walter Sturm. CDN Brokering. Proceedings of WCW'01 [69] http://www.etouch.net/ accessed on 2004-02-09 20:20 [70] http://www.speedera.com/ accessed on 2004-02-09 20:22 [71] http://www.sprint.com/ accessed on 2004-02-09 20:23 [72] K. Lee, S. Chari, A. Shaikh, S. Sahu, and P. Cheng. Protecting Content Distribution Networks. IBM Research Report RC 22566, September [73] http://staff.washington.edu/dittrich/misc/ddos/ accessed 2004-02-09 20:50 [74] L. Amini, A. Shaikh, H. Schulzrinne. Effective Peering for Multi-provider Content Delivery Services. to appear in Proc. of IEEE INFOCOM, June 2004. [75] http://www.cisco.com/ accessed on 2004-02-10 17:04 [76] http://newsroom.cisco.com/dlls/innovators/content_netwrk/boomerang.html accessed on 2004-02-10 16:54 [77] http://www.cisco.com/univercd/cc/td/doc/pcat/cr4450.htm accessed on 2004-02-10 18:21 [78] http://e http://en.wikipedia.org/wiki/Peer-to-Peer accessed on 2004-02-10 21:36 [79] Dejan S. Milojicic, Vana Kalogeraki, Rajan Lukose, Kiran Nagaraja1, Jim Pruyne, Bruno Richard, Sami Rollins 2 , Zhichen Xu. Peer-to-Peer Computing. HP Laboratories Palo Alto. HPL-2002-57 (R.1) July 3rd , 2003 [80] http://en.wikipedia.org/wiki/Pattern accessed 2004-02-11 19:35 [81] http://en.wikipedia.org/wiki/Scalability accessed on 2004-01-11 21:46 [82] Sylvia Ratnasamy, Mark Handley, Richard Karp, Scott Shenker. Application-level Multicast using Content-Addressable Networks (2001). Lecture Notes in Computer Science. Proceedings of the Third International COST264 Workshop on Networked Group Communication table of contents. Pages: 14 - 29 [83] Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications (2001). [84] http://oceanstore.cs.berkeley.edu/ accessed on 2004-02-11 23:18 [85] P. Druschel and A. Rowstron. "Past: Persistent and anonymous storage in a peer-to-peer networking environment," in Proc. the 8th IEEE Work. Hot Topics in Operating Systems (HotOS), Germany, May 2001, pp. 65--70. [86] Principia Cybernetica Web http://pespmc1.vub.ac.be/SELFORG.html accessed on 2004-02-11 23:24 [87] Ben Y. Zhao, Yitao Duan, Ling Huang Anthony D. Joseph, John D. Kubiatowicz. Brocade: Landmark Routing on Overlay Networks (2002). [88] ROWSTRON, A., AND DRUSCHEL, P. Pastry: Scalable, distributed object location and routing for large-scale Peer-to-Peer systems. In Proceedings of IFIP/ACM Middleware 2001 (November 2001). [89] Zhao B. Y., Kubiatowicz J. D., and Joseph A. D. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Tech. Rep. UCB/CSD-01-1141, UC Berkeley, EECS, 2001. 
92 [90] http://set http://setiathome.ssl.berkeley.edu/ accessed on 2004-02-12 15:37 [91] http://bitconjurer.org/BitTorrent/ accessed on 2004-02-12 15:39 [92] http://www.napster.com/ accessed on 2004-02-12 15:50 [93] Bryce Wilcox-O’Hearn. Experience deploying a large-scale emergent network First international workshop, IPTPS 2002 Cambridge, MA, USA, March 2002, LNCS 2429 [94] Beverly Yang, Hector Garcia-Molina. Comparing Hybrid Peer-to-Peer Systems (2001). The VLDB Journal [95] S. Milgram, "The small world problem," Psychology Today, 61(1) (1967). [96] The Small World Web. 2000. Adamic, L. Technical Report, Xerox Palo Alto Research Center. [97] M. K. Ramanathan, V. Kalogeraki, J. Pruyne, Finding Good Peers in the Peer-to-Peer Networks. International Parallel and Distributed Computing Symposium, 2001. Fort Lauderdale, Florida, April 2002. [98] Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems (2002). Proceedings of Multimedia Computing and Networking 2002 (MMCN '02) [99] Miguel Castro and Barbara Liskov. Practical Byzantine Fault Tolerance (1999). OSDI: Symposium on Operating Systems Design and Implementation [100] http://www.ietf.org/rfc/rfc2989.txt?number=2989 www.ietf.org/rfc/rfc2989.txt?number=2989 accessed on 200402-12 21:15 [101] http://en.wikipedia.org/wiki/Digital_rights_management accessed on 2004-02-12 21:16 [102] http://www.ietf.org/rfc/rfc1631.txt?number=1631 accessed on 200402-12 21:21 [103] http://www.ietf.org/rfc/rfc3093.txt?number=3093 accessed on 200402-12 21:25 [104] G. Coulouris, J. Dollimore, and T. Kindberg. Distributed Systems Concepts and Design. Addison-Wesley, third edition, 2001. [105] S. Katzenbeisser and F.A.P. Petitcolas. Information Hiding: Techniques for Steganography and Digital Watermarking," Artech House. [106] http://www.riaa.com/ accessed on 2004-02-13 08:15 [107] Miguel Castro. Practical Byzantine Fault Tolerance (2001). OSDI: Symposium on Operating Systems Design and Implementation [108] http://www.stanford.edu/group/pandegroup/genome/ accessed on 2004-02-13 9:47 [109] http://www.endeavors.com/securecollaboration.html accessed on 2004-02-13 10:31 [110] http://www.groove.net/ accessed on 2004-02-13 10:33 [111] http://www.yahoo.com/ accessed on 2004-02-13 10:50 [112] http://web.icq.com/ accessed on 2004-02-13 10:52 [113] S. Halabi and D. McPherson. Internet Routing Architectures. Cisco Press, 2nd edition., 2001. Chapter 6 Tuning BGP Capabilities [114] Ludmila Cherkasova and Jangwon Lee. FastReplica: Efficient Large File Distribution within Content Delivery Networks. HPL-2003-43 20030319 External [115] Marc Waldman, Aviel Rubin, and Lorrie Cranor. Publius: A robust, tamper-evident, censorship-resistant web publishing system (2000). Proc. 9th USENIX Security Symposium 93 [116] http://en.wikipedia.org/wiki/Interoperability accessed on 2004-02-13 18:41 [117] http://www.jxta.org/ accessed on 2004-02-13 18:43 [118] http://www.beepcore.org/beepcore/docs/wp-p2p.jsp accessed on http://www.beepcore.org/beepc accessed on 2004-02-13 18:45 [119] Siu Man Lui and Sai Ho Kwok, Interoperability of Peer-To-Peer File Sharing. ACM SIGecom Exchanges Volume 3 , Issue 3 Summer, 2002. Pages: 25 – 33, [120] http://www.audiogalaxy.com/ accessed on 2004-02-14 16:15 [121] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, and I. Stoica. 
The Impact of DHT Routing Geometry on Resilience and Proximity Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications table of contents. Karlsruhe, Germany SESSION: Peer-to-peer table of contents Pages: 381 394 2003 [122] C. Greg Plaxton, Rajmohan Rajaraman, and Andréa W. Richa. Accessing nearby copies of replicated objects in a distributed environment. ACM Symposium on Parallel Algorithms and Architectures. [123] Roberto Rinaldi and Marcel Waldvogel. Routing and Data Location in Overlay Peer-to-Peer Networks. IBM Research. Zurich Research Laboratory [124] David Bindel, Yan Chen, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Christopher Wells, Ben Zhao, and John Kubiatowicz. OceanStore: An Extremely WideArea Storage System (2000). http://citeseer.ist.psu.edu/bindel00oceanstore.html accessed on 2004-03-12 17:46 [125] A. Demers, K. Petersen, M. Spreitzer, D. Terry, M. Theimer, and B.Welch. The Bayou architecture: Support for data sharing among mobile users. In Proc. of IEEE Workshop on Mobile Computing Systems & Applications, Dec. 1994. [126] Sylvia Ratnasamy, Mark Handley, Richard Karp, Scott Shenker. Topologically-Aware Overlay Construction and Server Selection (2002). Proceedings of IEEE INFOCOM'02 [127] Xin Yan Zhang, Qian Zhang, Member, Zhensheng Zhang, Gang Song and Wenwu Zhu. A Construction of Locality-Aware Overlay Network: mOverlay and Its Performance. IEEE IEEE Journal on Selected Areas in Communications. [128] Rolf Winter, Thomas Zahn, and Jochen Schiller. Institute of Computer Science. Topology-Aware Overlay Construction in Dynamic Networks. Freie Universität Berlin, Germany [129] Miguel Castro, Peter Druschel, and Y. Charlie Hu. Topology-aware routing in structured peer-to-peer overlay networks (2002). Antony Rowstron. Microsoft Research [130] S. Hotz, “Routing information organization to support scalable routing with heterogeneous path requirements,” Tech. Rep. PhD thesis (draft), University of Southern California, 1994. [131] http://en.wikipedia.org/wiki/Artificial_intelligence accessed on 200402-19 11:57 [132] http://www.molbio.ku.dk/MolBioPages/abk/PersonalPages/Jesper/Swa rm.html accessed on 2004-02-19 23:05 94 [133] http://www.msci.memphis.edu/~franklin/AgentProg.html accessed on 2004-02-21 12:12 [134] http://tracy.informatik.uni-jena.de/research.html accessed on 2004-02http://tracy.informatik.uni 21 12:12 [135] Russell, Stuart J. and Norvig, Peter, Artificial Intelligence: A Modern Approach, Prentice Hall, 1995. [136] http://en.wikipedia.org/wiki/Flash_crowd accessed on 2004-02-124 20:33 [137] K. Kong, and D. Ghosal. Mitigating server-side congestion in the Internet through Pseudoserving. IEEE/ACM Transactions on Networking 7, 4 (Aug. 1999), 530-544. [138] Tyron Stading, Petros Maniatis, and Mary Baker. Peer-to-Peer Caching Schemes to Address Flash Crowds. In 1st International Workshop on Peer-toPeer Systems (IPTPS 2002), March 2002. [139] Angelos Stavrou, Dan Rubenstein, and Sambit Sahu. A Lightweight, Robust P2P System to Handle Flash Crowds. In IEEE ICNP, November 2002. 
[140] Fast File Splitter: http://www.maros http://www.maros-tools.com/products/FFS/ accessed on 2004-02-25 10:26 [141] Turbo File Split: http://www.computer http://www.computersoftware.org/utilities/file_compression/turbo_file_split.asp accessed on 200402-25 10:22 [142] NET energy http://www.computer http://www.computersoftware.org/utilities/file_compression/turbo_file_split.asp accessed on 200402-25 10:27 [143] Easy File Splitter http://www.filesplitter.net/ accessed on accessed on 2004-02-25 10:27 [144] http://onionnetworks.com/ accessed on 2004-02-27 9:50 [145] http://sourceforge.net/projects/swarmcast/ accessed 2004-02-27 9:51 [146] http://bitconjurer.org/BitTorrent/protocol.html accessed 2004-02-27 11:25 [147] http://www.cs.ucr.edu/~csyiazti/courses/cs204/project/html/final.html# http://www.cs.ucr.edu/~csyiazti/courses/cs204 foot1728 accessed on 2004-02-29 23:25 [148] Matei Ripeanu and Ian Foster, Mapping Gnutella Network, 1st International Workshop on Peer-to-Peer Systems, Cambridge, Massachusetts, March 2002 [149] I. S. Reed and G. Solomon, “Polynomial Codes Over Certain Finite Field”, J. SIAM, vol. 8, pp. 300-304, 1960. [150] M. Luby, L. Vicisano, J. Gemmell, L. Rizzo, M. Handley, J. Crowcroft. RFC 3453: The Use of Forward Error Correction (FEC) in Reliable Multicast. December 2002 http://www.ietf.org/rfc/rfc3453.txt?number=3453 accessed on 2004-03-04 18:26 [151] D. Patterson, G. Gibson, R. Katz. A case for redundant arrays of inexpensive disks (RAID), Proceedings of ACM SIGMOD '88, 1988. [152] Ann Chervenak. Tertiary Storage: An Evaluation of New Applications PhD thesis in University of California at Berkeley (1994) PhD thesis [153] M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman, "Efficient Erasure Correcting Codes", IEEE Transactions on Information Theory, Special Issue: Codes on Graphs and Iterative Algorithms, pp. 569584, Vol. 47, No. 2, February 2001. 95 [154] J. Lacan, L. Lancérica, and L. Dairaine, When FEC Speed up Data Access in P2P Networks. Proceedings of IDMS'02 Conference (Interactive Distributed Multimedia Systems), Coimbra, Portugal, 2002 [155] Koon-Seng Lim and Rolf Stadler. Developing pattern-based management programs. CTR Technical Report 503-01-01 August 6, 2001 http://www.ctr.columbia.edu/~stadler/papers/MMNS01 http:// www.ctr.columbia.edu/~stadler/papers/MMNS01-Lim-StadlerLong.pdf accessed on 2004-03-05 10:49 [156] http://www.fokus.gmd.de/research/cc/mobis/products/fec_old/content. html accessed on 2004-03-07 16:15 [157] http://info.iet.unipi.it/~luigi/fec.html accessed on 2004-03-07 16:16 [158] Patric Hadenius. "Relieving Peer-to-Peer Pressure" Technology Review (02/25/04) http://www.technologyreview.com/ accessed on 2004-0228 20:00 [159] http://www.securiteam.com/tools/5BP0G000IW.html accessed on 2004-03-10 21:48 [160] http://www.intel.com/business/newstech/peertopeer.pdf accessed on 2004-03-12 15:32 [161] FTP stands for File Transfer Protocol RFC http://www.w3.org/Protocols/rfc959/Overview.html accessed on 2004-02-07 http://www. 
23:09 [162] RTSP stands for Real Time Streaming Protocol http://www.ietf.org/rfc/rfc2326.txt?number=2326 accessed on 2004-02-07 23:09 [163] http://www.ietf.org/rfc/rfc3568.txt?number=3568 accessed 2004-02-07 22:31 [164] RFC 2782 http://www.ietf.org/rfc/rfc2782.txt?number= http://www.ietf.org/rfc/rfc2782.txt?number=2782 accessed on 2004-02-07 23:34 [165] RFC 2246 http://www.ietf.org/rfc/rfc2246.txt?number=2246 accessed on 2004-02-07 23:40 [166] http://en.wikipedia.org/wiki/Peer-to-Peer accessed on 2004-04-02 http://en.wi 18:31 [167] http://legion.virginia.edu/ accessed on 2004-04-04 20:18 [168] http://www.ietf.org/rfc/rfc2914.txt?number=2914 accessed on 2004 http://www.ietf.org/rfc/rfc2 04-05 10:40 96 Appendix 1: An typical program of how to split a large file into pieces The split program splits large text files into smaller pieces. It is written in Awk. By default, the output files are named `xaa', `xab', and so on. Each file has 1000 lines in it, with the likely exception of the last file. To change the number of lines in each file, you supply a number on the command line preceded with a minus, e.g., `500' for files with 500 lines in them instead of 1000. To change the name of the output files to something like `myfileaa', `myfileab', and so on, you supply an additional argument that specifies the filename. The program first sets its defaults, and then tests to make sure there are not too many arguments. It then looks at each argument in turn. The first argument could be a minus followed by a number. If it is, this happens to look like a negative number, so it is made positive, and that is the count of lines. The data file name is skipped over, and the final argument is used as the prefix for the output file names. # split.awk -- do split in awk # Arnold Robbins, [email protected], [email protected], Public Public Domain Domain # May 1993 # usage: split [-num] [sourcefile] [destinationfile] BEGIN { outfile = "x" count = 1000 if (ARGC > 4) usage() # default i = 1 if (ARGV[i] ~ /^-[0-9]+$/) { count = -ARGV[i] ARGV[i] = "" i++ } # test argv in case reading from stdin instead of file if (i in ARGV) i++ # skip data file name if (i in ARGV) { outfile = ARGV[i] ARGV[i] = "" } s1 = s2 = "a" out = (outfile s1 s2) } The next rule does most of the work. tcount (temporary count) tracks how many lines have been printed to the output file so far. If it is greater than count, it is time to close the current file and start a new one. s1 and s2 track the current suffixes for the file name. If they are both `z', the file is just too big. Otherwise, s1 moves to the next letter in the alphabet and s2 starts over again at `a'. { if (++tcount > count) { close(out) 97 if (s2 == "z") { if (s1 == "z") { printf("split: %s is too large to split\n", \ FILENAME) > "/dev/stderr" exit 1 } s1 = chr(ord(s1) + 1) s2 = "a" } else s2 = chr(ord(s2) + 1) out = (outfile s1 s2) tcount = 1 } print > out } The usage function simply prints an error message and exits. function usage( e) { e = "# usage: split [-num] [sourcefile] [destinationfile]" print e > "/dev/stderr" exit 1 } The variable e is used so that the function fits nicely on the page. This program is a bit sloppy; it relies on awk to close the last file for it automatically, instead of doing it in an END rule. 98 Appendix 2: Snapshoot of using a file splitting tool(freeware) In this operation, I split a file Berlin Stockholm.rar by Fast File Splitter. Then I got 199 blocks of this file and every piece is 50Kb except for the last block with only 28Kb. 
Use Join bottom, they can be restored the original file Stockholm.rar in 2-3 seconds. 99 Appendix 3: Record of a movie file downloaded via BitTorrent Microsoft Windows 2000 [Version 5.00.2195] (C) Copyright 1985-2000 Microsoft Corp. C:\Documents and Settings\Administrator>ipconfig /all Windows 2000 IP Configuration Host Name . . . . . . . Primary DNS Suffix . . Node Type . . . . . . . IP Routing Enabled. . . WINS Proxy Enabled. . . DNS Suffix Search List. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . : : : : : : singingbridge Mixed No No swipnet.se Ethernet adapter Local Area Connection: Connection-specific DNS Suffix . : swipnet.se Description . . . . . . . . . . . : Realtek RTL8139/810X Family PCI Fast Ethern et NIC Physical Address. DHCP Enabled. . . Autoconfiguration IP Address. . . . Subnet Mask . . . Default Gateway . DHCP Server . . . DNS Servers . . . Lease Obtained. . Lease Expires . . . . . . . . . . Enabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . : : : : : : : : : : 00-E0-00-99-5F-53 Yes Yes 213.100.32.29 255.255.255.192 213.100.32.1 130.244.196.150 130.244.127.169 2004 年 2 月 28 日 10:16:08 2004 年 2 月 28 日 12:16:08 ************************************host information*************************************** C:\Documents and Settings\Administrator>netstat -a Active Connections Proto Local Address Foreign Address State TCP singingbridge:echo singingbridge:0 LISTENING TCP singingbridge:discard singingbridge:0 LISTENING TCP singingbridge:daytime singingbridge:0 LISTENING TCP singingbridge:qotd singingbridge:0 LISTENING TCP singingbridge:chargen singingbridge:0 LISTENING TCP singingbridge:epmap singingbridge:0 LISTENING TCP singingbridge:1025 singingbridge:0 LISTENING TCP singingbridge:1026 singingbridge:0 LISTENING TCP singingbridge:1039 singingbridge:0 LISTENING TCP singingbridge:1097 singingbridge:0 LISTENING TCP singingbridge:1365 singingbridge:0 LISTENING TCP singingbridge:1370 singingbridge:0 LISTENING TCP singingbridge:1473 singingbridge:0 LISTENING TCP singingbridge:1488 singingbridge:0 LISTENING TCP singingbridge:1491 singingbridge:0 LISTENING TCP singingbridge:1497 singingbridge:0 LISTENING TCP singingbridge:1498 singingbridge:0 LISTENING TCP singingbridge:6000 singingbridge:0 LISTENING TCP singingbridge:6001 singingbridge:0 LISTENING TCP singingbridge:1028 singingbridge:0 LISTENING TCP singingbridge:56666 singingbridge:0 LISTENING TCP singingbridge:netbios-ssn singingbridge:0 LISTENING TCP singingbridge:1365 220.170.35.125:8882 ESTABLISHED TCP singingbridge:1370 210-85-158-205.cm.apol.com.tw:8883 ESTABLISHED (used tracert to see number of hops, 20 hops till the last timeout) TCP singingbridge:1473 221.216.102.211:6000 ESTABLISHED TCP singingbridge:1488 221.205.107.136:1883 ESTABLISHED TCP singingbridge:1491 218.80.60.77:6881 ESTABLISHED TCP singingbridge:1497 61.149.22.39:6881 ESTABLISHED TCP singingbridge:1498 218.61.139.44:6000 ESTABLISHED TCP singingbridge:6000 61.191.173.136:2155 TIME_WAIT TCP singingbridge:6000 61.191.173.136:2578 TIME_WAIT 100 TCP singingbridge:6000 81-223-102-114.Fuenfhaus.Xdsl-line.inode.at:33124 TIME_WAIT TCP singingbridge:6000 81-223-102-114.Fuenfhaus.Xdsl-line.inode.at:33231 TIME_WAIT TCP singingbridge:6000 dsl-210-11-209-190.syd.level10.net.au:3931 TIME_WAIT (used tracert to see number of hops 22) TCP singingbridge:6000 210.51.228.209:26429 ESTABLISHED TCP singingbridge:6000 211.97.62.204:2369 TIME_WAIT TCP singingbridge:6000 
  TCP    singingbridge:6000     218.11.2.106:13269      TIME_WAIT
  TCP    singingbridge:6000     218.11.2.106:13305      TIME_WAIT
  TCP    singingbridge:6000     218.69.1.10:2044        ESTABLISHED
  TCP    singingbridge:6000     218.246.236.64:65181    ESTABLISHED
  TCP    singingbridge:6001     61.172.29.12:25138      ESTABLISHED   (used tracert to see number of hops, 23 hops)
  TCP    singingbridge:6001     61.185.210.132:44722    ESTABLISHED
  TCP    singingbridge:6001     61.185.237.132:34403    ESTABLISHED
  TCP    singingbridge:6001     61.187.64.210:36890     ESTABLISHED
  TCP    singingbridge:6001     chenpc-216.tamu.edu:4474    ESTABLISHED   (used tracert to see number of hops, 22 hops)
  TCP    singingbridge:6001     202.105.131.102:3086    ESTABLISHED
  TCP    singingbridge:6001     202.105.131.102:3708    TIME_WAIT
  TCP    singingbridge:6001     202.110.201.218:38689   ESTABLISHED
  TCP    singingbridge:6001     dsl-210-11-209-190.syd.level10.net.au:4999    ESTABLISHED
  TCP    singingbridge:6001     210.51.228.209:12572    TIME_WAIT
  TCP    singingbridge:6001     210.51.228.209:21800    TIME_WAIT
  TCP    singingbridge:6001     210.51.228.209:31034    TIME_WAIT
  TCP    singingbridge:6001     210.51.228.209:41882    ESTABLISHED
  TCP    singingbridge:6001     210.51.228.209:50026    TIME_WAIT
  TCP    singingbridge:6001     210.53.6.74:46445       ESTABLISHED   (used tracert to see number of hops)
  TCP    singingbridge:6001     210.211.11.42:4092      ESTABLISHED
  TCP    singingbridge:6001     211.93.112.98:3221      ESTABLISHED
  TCP    singingbridge:6001     211.147.255.124:4758    ESTABLISHED
  TCP    singingbridge:6001     211.158.78.6:1055       ESTABLISHED
  TCP    singingbridge:6001     211.160.16.2:45335      ESTABLISHED
  TCP    singingbridge:6001     211.167.203.34:1499     ESTABLISHED
  TCP    singingbridge:6001     211.167.203.34:3184     TIME_WAIT
  TCP    singingbridge:6001     211.167.203.34:3231     TIME_WAIT
  TCP    singingbridge:6001     218.0.237.158:3009      ESTABLISHED
  TCP    singingbridge:6001     218.7.35.160:3678       ESTABLISHED
  TCP    singingbridge:6001     218.11.2.106:10795      ESTABLISHED
  TCP    singingbridge:6001     218.13.209.176:21977    ESTABLISHED
  TCP    singingbridge:6001     218.66.88.249:3677      ESTABLISHED   (used tracert to see number of hops)
  TCP    singingbridge:6001     218.66.88.249:3788      ESTABLISHED
  TCP    singingbridge:6001     218.66.88.249:3826      ESTABLISHED
  TCP    singingbridge:6001     218.66.210.154:1929     ESTABLISHED
  TCP    singingbridge:6001     218.69.1.10:6781        TIME_WAIT
  TCP    singingbridge:6001     218.72.151.18:3194      ESTABLISHED
  TCP    singingbridge:6001     218.76.145.96:3547      ESTABLISHED
  TCP    singingbridge:6001     218.79.90.178:4761      ESTABLISHED
  TCP    singingbridge:6001     218.79.133.4:4564       ESTABLISHED
  TCP    singingbridge:6001     218.80.60.77:3487       ESTABLISHED
  TCP    singingbridge:6001     218.80.60.77:3488       ESTABLISHED
  TCP    singingbridge:6001     218.80.60.77:4440       TIME_WAIT
  TCP    singingbridge:6001     218.80.60.77:4448       TIME_WAIT
  TCP    singingbridge:6001     218.80.60.77:4727       TIME_WAIT
  TCP    singingbridge:6001     218.86.231.211:3260     ESTABLISHED
  TCP    singingbridge:6001     218.109.32.9:3647       ESTABLISHED
  TCP    singingbridge:6001     219.138.154.173:30083   ESTABLISHED
  TCP    singingbridge:6001     220.186.180.109:4417    ESTABLISHED   (used tracert to see number of hops)
  TCP    singingbridge:6001     220.196.170.3:3909      ESTABLISHED
  TCP    singingbridge:6001     221.204.33.79:3473      ESTABLISHED
  TCP    singingbridge:6001     221.209.150.13:4341     ESTABLISHED   (used tracert to see number of hops)
  UDP    singingbridge:echo            *:*
  UDP    singingbridge:discard         *:*
  UDP    singingbridge:daytime         *:*
  UDP    singingbridge:qotd            *:*
  UDP    singingbridge:chargen         *:*
  UDP    singingbridge:netbios-ns      *:*
  UDP    singingbridge:netbios-dgm     *:*
  UDP    singingbridge:isakmp          *:*
  UDP    singingbridge:router          *:*
  UDP    singingbridge:4500            *:*

******************************BitTorrent Peers connected to the host*******************************

C:\Documents and Settings\Administrator>tracert dsl-210-11-209-190.syd.level10.net.au

Tracing route to dsl-210-11-209-190.syd.level10.net.au [210.11.209.190]
over a maximum of 30 hops:

  1    10 ms   <10 ms   <10 ms  c213-100-32-1.swipnet.se [213.100.32.1]
  2   <10 ms   <10 ms   <10 ms  cty3-core.gigabiteth1-1.swip.net [130.244.189.1]
  3   <10 ms   <10 ms   <10 ms  stb1-core.srp2-0.swip.net [130.244.194.246]
  4   <10 ms   <10 ms   <10 ms  sl-gw10-sto-5-0.sprintlink.net [80.77.97.21]
  5   <10 ms   <10 ms   <10 ms  sl-bb20-sto-8-0.sprintlink.net [80.77.96.37]
  6   <10 ms   <10 ms   <10 ms  sl-bb21-sto-15-0.sprintlink.net [80.77.96.34]
  7   <10 ms   <10 ms    10 ms  sl-bb21-cop-12-0.sprintlink.net [213.206.129.33]
  8    10 ms   <10 ms    10 ms  sl-bb20-cop-15-0.sprintlink.net [80.77.64.33]
  9    80 ms    90 ms    90 ms  sl-bb21-msq-10-0.sprintlink.net [144.232.19.29]
 10    90 ms    90 ms    90 ms  sl-bb22-rly-15-3.sprintlink.net [144.232.19.98]
 11   150 ms   151 ms   160 ms  sl-bb22-sj-10-0.sprintlink.net [144.232.20.186]
 12   150 ms   151 ms   160 ms  sl-bb20-sj-15-0.sprintlink.net [144.232.3.166]
 13   150 ms   161 ms   150 ms  sl-st20-pa-15-1.sprintlink.net [144.232.20.42]
 14   151 ms   150 ms   160 ms  sl-newzeal-1-0.sprintlink.net [144.223.243.18]
 15   300 ms   310 ms   311 ms  p5-0.sjbr1.global-gateway.net.nz [202.37.246.202]
 16   301 ms   310 ms   311 ms  p1-0.sybr3.global-gateway.net.nz [202.50.116.193]
 17   301 ms   310 ms   310 ms  p4-0.sybr2.global-gateway.net.nz [202.50.119.86]
 18   300 ms   311 ms   310 ms  con3.sybr2.global-gateway.net.nz [202.37.246.238]
 19   460 ms   491 ms   310 ms  pos2-0-0.bdr2.hay.connect.com.au [210.8.219.242]
 20   311 ms   300 ms   301 ms  g0-2.cor6.hay.connect.com.au [210.8.134.92]
 21   311 ms   310 ms   311 ms  DLEV140780-5.gw.connect.com.au [210.8.226.141]
 22  1322 ms  2603 ms   721 ms  dsl-210-11-209-190.syd.level10.net.au [210.11.209.190]

Trace complete.

C:\Documents and Settings\Administrator>tracert chenpc-216.tamu.edu

Tracing route to chenpc-216.tamu.edu [165.91.170.208]
over a maximum of 30 hops:

  1   <10 ms   <10 ms   <10 ms  c213-100-32-1.swipnet.se [213.100.32.1]
  2   <10 ms   <10 ms   <10 ms  htg3-core.gigabiteth4-0.swip.net [130.244.189.2]
  3   <10 ms   <10 ms   <10 ms  stb1-core.srp2-0.swip.net [130.244.194.246]
  4   <10 ms   <10 ms   <10 ms  sl-gw10-sto-5-0.sprintlink.net [80.77.97.21]
  5   <10 ms   <10 ms   <10 ms  sl-bb21-sto-8-0.sprintlink.net [80.77.96.41]
  6   <10 ms    10 ms    10 ms  sl-bb21-cop-12-0.sprintlink.net [213.206.129.33]
  7    10 ms    10 ms    10 ms  sl-bb20-cop-15-0.sprintlink.net [80.77.64.33]
  8    10 ms    20 ms    20 ms  sl-bb20-ham-13-0.sprintlink.net [213.206.129.54]
  9    20 ms    21 ms    20 ms  sl-bb21-ams-14-0.sprintlink.net [213.206.129.49]
 10    20 ms    20 ms    20 ms  sl-bb20-ams-15-0.sprintlink.net [217.149.32.33]
 11    40 ms    30 ms    40 ms  213.206.131.46
 12    30 ms    40 ms    40 ms  ae-0-55.mp1.Amsterdam1.Level3.net [213.244.165.97]
 13    30 ms    40 ms    30 ms  so-1-0-0.mp1.London2.Level3.net [212.187.128.49]
 14    90 ms   100 ms   100 ms  so-1-0-0.bbr1.Washington1.Level3.net [212.187.128.138]
 15   121 ms   130 ms   130 ms  unknown.Level3.net [209.247.9.102]
 16   131 ms   130 ms   130 ms  so-6-0.ipcolo1.Dallas1.Level3.net [4.68.112.178]
 17   130 ms   130 ms   140 ms  p0-0.texasamu.bbnplanet.net [4.25.100.2]
 18   131 ms   130 ms   140 ms  csce-7--dmzf-ci-g-10.net.tamu.edu [165.91.254.4]
 19   130 ms   130 ms   140 ms  csce-1.net.tamu.edu [165.91.2.2]
 20   130 ms   130 ms   140 ms  evan-oc22-1.net.tamu.edu [128.194.1.72]
 21     *        *        *     Request timed out.
 22   130 ms   141 ms   130 ms  chenpc-216.tamu.edu [165.91.170.208]

Trace complete.

C:\Documents and Settings\Administrator>tracert 61.172.29.12

Tracing route to 61.172.29.12 over a maximum of 30 hops

  1   <10 ms   <10 ms   <10 ms  c213-100-32-1.swipnet.se [213.100.32.1]
  2   <10 ms   <10 ms   <10 ms  htg3-core.gigabiteth4-0.swip.net [130.244.189.2]
  3   <10 ms   <10 ms   <10 ms  stb1-core.srp2-0.swip.net [130.244.194.246]
  4   <10 ms   <10 ms    10 ms  sl-gw10-sto-5-0.sprintlink.net [80.77.97.21]
  5   <10 ms   <10 ms   <10 ms  sl-bb21-sto-8-0.sprintlink.net [80.77.96.41]
  6    11 ms    10 ms    10 ms  sl-bb21-cop-12-0.sprintlink.net [213.206.129.33]
  7    10 ms    10 ms    10 ms  sl-bb20-cop-15-0.sprintlink.net [80.77.64.33]
  8    81 ms    90 ms    90 ms  sl-bb21-msq-10-0.sprintlink.net [144.232.19.29]
  9    90 ms    90 ms    90 ms  sl-bb22-rly-15-3.sprintlink.net [144.232.19.98]
 10   150 ms   151 ms   160 ms  sl-bb22-sj-10-0.sprintlink.net [144.232.20.186]
 11   150 ms   151 ms   160 ms  sl-bb24-sj-13-0.sprintlink.net [144.232.3.214]
 12   150 ms   150 ms   151 ms  sl-st21-pa-15-2.sprintlink.net [144.232.9.10]
 13   170 ms   170 ms   171 ms  sl-china4-1-0.sprintlink.net [144.223.243.62]
 14   641 ms   651 ms   641 ms  202.97.51.205
 15   651 ms   671 ms   661 ms  202.97.33.89
 16   651 ms   661 ms   651 ms  202.101.63.253
 17   661 ms   661 ms   661 ms  218.1.1.153
 18   691 ms   691 ms   691 ms  218.1.1.17
 19   671 ms   671 ms   671 ms  218.1.10.162
 20   691 ms   681 ms   691 ms  218.1.10.182
 21     *        *        *     Request timed out.
 22     *        *        *     Request timed out.
 23   681 ms   640 ms   661 ms  61.172.29.12

Trace complete.
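The listings above were collected by hand: netstat was first used to list the BitTorrent peers connected to the host, and tracert was then run towards a few of those peers in order to count the number of hops to them. As an illustration only (this sketch is not part of the original measurements), the same procedure could be automated on a Windows host roughly as follows; the local ports 6000 and 6001 are taken from the netstat listing above, and the function names are arbitrary.

# Sketch: enumerate established peer connections with netstat, then run
# tracert towards each peer and count the hops (assumes a Windows host).
import re
import subprocess

def peer_addresses(local_ports=("6000", "6001")):
    """Return the remote IPs of ESTABLISHED TCP connections on the given local ports."""
    out = subprocess.run(["netstat", "-n", "-p", "TCP"],
                         capture_output=True, text=True).stdout
    peers = set()
    for line in out.splitlines():
        fields = line.split()
        # A netstat data row looks like: TCP  local-addr:port  remote-addr:port  STATE
        if len(fields) == 4 and fields[0] == "TCP" and fields[3] == "ESTABLISHED":
            local_port = fields[1].rsplit(":", 1)[-1]
            remote_ip = fields[2].rsplit(":", 1)[0]
            if local_port in local_ports:
                peers.add(remote_ip)
    return sorted(peers)

def hop_count(address, max_hops=30):
    """Run tracert towards the address and count the hop lines in its output."""
    out = subprocess.run(["tracert", "-h", str(max_hops), address],
                         capture_output=True, text=True).stdout
    return len([l for l in out.splitlines() if re.match(r"\s*\d+\s", l)])

if __name__ == "__main__":
    for ip in peer_addresses():
        print(ip, hop_count(ip), "hops")

Note that hops answered with "Request timed out" are still counted, just as they were counted in the manual traces above.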