Scaling Network Verification using Symmetry and Surgery

Gordon D. Plotkin† Nikolaj Bjørner‡ Nuno P. Lopes‡ Andrey Rybalchenko‡ George Varghese‡
LFCS, University of Edinburgh† Microsoft Research‡
[email protected] {nbjorner,nlopes,rybal,george}@microsoft.com

Abstract

On the surface, large data centers with ∼10^5 stations and nearly a million routing rules are complex and hard to verify. However, these networks are highly regular by design; for example, they employ fat tree topologies with backup routers interconnected by redundant patterns. To exploit these regularities, we introduce network transformations: given a reachability formula ϕ and a network N, we transform N into a network N̄ that is simpler to verify, and ϕ into a corresponding transformed formula ϕ̄, such that (for example) ϕ is valid in N if and only if ϕ̄ is valid in N̄. Our network transformations exploit network surgery (in which irrelevant or redundant sets of nodes, headers, ports, or rules are "sliced" away) and network symmetry (say between backup routers). The validity of these transformations is established using a formal theory of networks. In particular, using Van Benthem-Hennessy-Milner style bisimulation, we show that one can generally associate bisimulations to transformations connecting networks and formulas with their transforms. Our work is a development in an area of current wide interest: applying programming language techniques (in our case bisimulation and modal logic) to problems in switching networks. We provide experimental evidence that our network transformations can speed up the task of verifying the communication between all pairs of Virtual Machines in a large datacenter network with ∼100,000 VMs by 65×. An all-pair reachability calculation, which formerly took 5.5 days, can be done in 2 hours, and can be easily parallelized to complete in minutes.
Categories and Subject Descriptors D.2.4 [Software Engineering]: Software/Program Verification

General Terms Verification, Theory

Keywords Network Verification, Symmetries, Semantics

1. Introduction

Cloud services such as Dropbox, Google, iCloud, Amazon, and Azure contain up to a million inexpensive servers connected by a data center network. A single service request such as a Web search is split among hundreds of servers that communicate to produce the response. As our dependency on networks increases, the reliability of data center networks becomes increasingly critical. However, surveys show [19, 31] that network outages are quite common. Thus verification technologies that proactively prevent potential network disruptions are valuable.

A network, as shown in Figure 1, consists of boxes (routers, switches, firewalls, henceforth referred to as routers) that forward packets from input ports to output ports. We abstract the dataplane, or forwarding component of a router, as a set of rules that map predicates on packet headers (e.g., 32-bit IP addresses starting with 101) to output ports; some rules called Access Control Lists (ACLs) use complex predicates (e.g., sources from outside Microsoft that send SQL packets) on sets of fields to decide which packets must be dropped. The rules may also prescribe changes in packet headers. The output ports of each router are physically connected to the input ports of other routers as specified by a network topology. The network program, the composition of all the router programs following the topology, then maps packets from entry points in the network through the network to exit points. This abstraction only enables qualitative reasoning, for example about packet reachability or about looping. It abstracts away quantitative issues, such as performance metrics (e.g., delay), and ignores the control plane that builds the forwarding rules.
The correctness requirements [17] are also simple: they specify the headers that can communicate between hosts, together with predicates on the paths the headers traverse. So wherein, therefore, lies the complexity that justifies the emerging area of network verification [15–17, 22, 30]? First, manual rules added by operators interact with the rules computed by automatic routing protocols. Second, the forwarding rules typically involve load balancing by which a packet can be sent by many possible routes to its destination. Third, routers sometimes rewrite packets. Fourth, while the program structure is simple, the state space is large. Forwarding rules operate on at least the IP and TCP headers and other fields such as MPLS [18]. Thus, conservatively, we have a space of 2^80 headers, millions of rules, and a large number N of servers (as many as 1 million) that can communicate. Checking reachability across N^2 pairs of stations for 2^80 headers to find a network policy violation is a scaling challenge.

Early systems [22] found a single violation using SAT solvers. Later systems [16, 17] found all violations for medium sized networks using symbolic execution. NetPlumber [15] introduced efficient incremental analysis. Finally, Yang and Lam [30] used predicate abstraction to rewrite router rules in a form that is much more efficient to verify. However, these improvements do not target the scaling challenge. For example, it was reported in [10] that verifying reachability for all pairs of stations took around 1 day, even on a small university network. Verifying all pairs in a large data center in a few minutes (the time scale at which a network can be reconfigured) seemed out of reach for earlier techniques.

[Figure 1. Data center network schematic, with Core, Aggregation, Edge, and Leaf levels.]

1.1 Scaling Verification using Network Transformations

Fortunately, most large networks are designed using design patterns that enforce regularities that we can exploit.
For example, data center networks are often arranged in fat-trees (Figure 1), which are leveled graphs where routers at a given level are symmetrically connected to multiple routers at adjacent levels. This is done for load balancing and resiliency reasons. For example, in Figure 2, R3 and R4 are symmetrically placed: if they are interchanged, and the ports on R3, R4 and the corresponding ports of their neighbors R1, R2 and R5 are correspondingly interchanged, then the global topology stays the same. Thus, by design, one may expect the rule sets of symmetrically placed routers at a given level (e.g., R3 and R4) to be symmetrical. This suggests that symmetrical routers such as R3 and R4 can be replaced by a single equivalent (with respect to packet forwarding) router as in Figure 2 without changing the qualitative properties of the network. Doing so reduces the rules in the new network, and in the limit can transform a "fat tree" into a "thin tree".

[Figure 2: on the left, a network with routers R1 to R5, where R5 has ports p1 and p2 and stations X and Y attached; on the right, the network it transforms to, with R3 and R4 merged.]
Figure 2. Suppose that interchanging R3 and R4 as well as corresponding marked ports on their neighbours, e.g., p1 and p2 on R5, leaves reachability the same. Then R3 and R4 can be merged into a single router in the transformed network on the right.

Further, the hierarchical structure implies that communication within subtrees should stay local. Thus two stations X and Y attached to the same edge (or ToR, for Top of Rack switch) R5 (see Figure 2) should typically communicate only via R5. Thus, for verification of the communication between X and Y, the rest of the network is irrelevant as long as one can prove that traffic from X to Y does not "leak" out from the ToR R5. This suggests a general notion of slicing away irrelevant rules and portions of the network. We refer to such slicing of ports and rules as network "surgery". One can also slice networks in terms of headers.
For example, if in reality the two backup routers R3 and R4 of Figure 2 are nearly identical except for two local addresses h1 in R3 and h2 in R4, we might hope to slice the network into two equivalent networks operating on disjoint sets of headers, one of which has R3 and R4 perfectly symmetric, and one of which has the standard topology but with a smaller set of rules (say dealing with only h1 and h2).

Both symmetry-induced and slicing-based transformations are network transformations that transform the original network into a sequence of simpler networks with equivalent forwarding, but with smaller size in terms of boxes, rules, and links. If the verification effort scales with size, and the transformations are efficient, the overall verification will be faster.

To proceed, we need a model of networks and a way to write reachability specifications. To this end we use an operational semantics treating each router as a state machine. We then use a standard modal logic defined on the states to describe desired behaviors. The logic permits assertions that certain (anonymous) transitions occur, that a packet header has reached a certain port, or that a packet header has a certain property.

To connect network transformations with the logic we generally show that they yield bisimulations (in the sense of Van Benthem and Park and Milner [28]) between a network N and its transform N̄. We then employ the Van Benthem-Hennessy-Milner principle, as used in modal logic and process calculus [24, 28], to show the validity of a transformation using bisimulation. In process calculus this principle states that if there is a bisimulation between two processes P and P′, then P has a given property ϕ if, and only if, P′ does.
In our case, as, for example, N̄ may have different ports from N, it may be necessary to modify the formula ϕ asserted of N to another, ϕ̄, asserted of N̄ (the analogous situation in process calculus would be when comparing processes with different action alphabets, see, e.g., [4]).

We propose two different ways to employ the Van Benthem-Hennessy-Milner technique for network verification. One is to reduce the size of N, for example, by slicing away parts of the network irrelevant to the proposition, or else by exploiting symmetry either to remove essentially duplicate rules, or to merge symmetrically placed routers with symmetrical semantics. The other is to reduce the number of properties to be verified, by finding symmetries between propositions about symmetrically placed parts of the network or by identifying equivalent packet headers. In doing so, it may be useful to pass through a sequence of such transformations, producing a sequence N1 ∼1 … ∼n−1 Nn of bisimilar networks.

Our contributions (with a paper outline) are:

1. A Theory of Network Dataplanes: To formalize network transformation, we introduce a network model (Sections 2.1, 2.2), a modal logic (Section 2.3), and a proof technique based on bisimulation to verify networks (Section 3). After describing surgeries, we describe a compositional way of generating bisimulations (Section 5). A side effect of all this machinery is the ability to formalize earlier concepts such as slicing [16], and (a generalized version of) Yang-Lam equivalence [30].

2. A toolbox of network transformations: We provide a toolbox of network transformations (Examples 1 through 4) based on surgery (Section 4) and header merging and symmetry (Section 6) that can be applied iteratively to simplify the verification task. There are generally bisimulations corresponding to each. As with earlier work (e.g., [7]), we exploit symmetry over state spaces.
However, to do this we exploit the structure of the network domain, where the state consists of both headers and locations, and the program is distributed over boxes and rules, enabling symmetries to be constructed from both topological and semantic components.

3. Scaling comprehensive Network Verification: We show (Section 7) how one might scale comprehensive network evaluation for large production data centers over all pairs of stations. In particular, for a Microsoft data center in Singapore with ∼820K rules, we reduce the time to verify reachability between all pairs of VMs from 5.5 days to 2 hours on a single core using only simple transformations, a 65× reduction. Earlier work [15–17, 30] did not report the running times for reachability between all N^2 pairs.

2. Networks and their logics

As a foundation for our theory, we provide a formal definition of switching networks as graphs of interconnected boxes of various kinds, such as routers, switches, bridges, or firewalls. This is network syntax. Having a syntax available, we then define network semantics in terms of suitable kinds of transition relations. The semantics, in its turn, supports a suitable modal logic, which can be used to specify network properties that we wish to verify.

2.1 Network syntax

Networks are constructed using a box signature that consists of a set of boxes b ∈ Box, with each box having a given finite collection K ⊆fin ℕ of ports, written b : K; we write Box for the signature. (We use a set, rather than a sequence, of ports to prepare the ground for network surgery in which we slice away ports to transform a network.) Given such a box signature, a network N consists of

• a finite set NodeN of nodes,
• a box assignment function βN : NodeN → BoxN,
• a symmetric, 1-1, irreflexive, port connection or link relation γN on the set PortN of network ports, where:

PortN =def {(n, i) | n ∈ NodeN, βN(n) : K, and i ∈ K}

The idea behind having both boxes and nodes is that the same box (e.g., backup router) can be replicated at different points of the network; one can think of the nodes as being box ids attached by βN to their boxes. We use p to range over ports, and write the pairs (n, i) as n.i. The connection relation γN specifies how ports are connected to each other and hence encodes the network topology. It is symmetric as packets can flow in either direction; it is 1-1 to model physical connection; it is irreflexive to avoid tight loops. Not all ports need be connected to another port; such unconnected ports are called external; the others are called internal.

Definitions in this style of various forms of networks made up out of boxes with ports can be found in other contexts, for example sharing graphs [12], bigraphs [25], or cyclic networks [13]; other definitions can be found in the network literature, for example [32].

2.2 Network semantics

For the network semantics, we first assume given a set PacN of packet headers ranged over by h (below we may just say "packet"); we call PacN the header space (of N). Packet headers are typically bit strings (hence the 2^80 possibilities alluded to in the introduction), but one may use any convenient set. For each box b : K we then have the set of box-located packets, where such a located packet is a pair, written h @ i, of a packet h and a box location, that is, a box port i (with i ∈ K). In other words, a box-located packet is a specific packet located at a specific port of a box.

Next, for each box b ∈ Box, we assume given a transition relation

h @ i −→Box,b h′ @ i′

between box-located packets. Note that packets may be changed (rewritten) in a transition as well as their location. When the signature can be understood from the context, we will generally just write h @ i −→b h′ @ i′. These transitions are typically given by the data plane of a router, via forwarding tables, ACLs (access control lists), and packet rewriting. We do not specify here what these may be, or how they induce transition relations (however, apart from Section 7, our work is independent of such considerations). The box transitions arise from packets moving from an input port of a router, through its internal ports, and, possibly changed, on to an output port, following its set of rules.

In many switching networks only packet locations change, as there are no packet rewriting rules. That is, packet headers are not changed by any of the boxes in the network. Formally, for any box b : K in the network we have:

h @ i −→Box,b h′ @ i′ ⟹ h′ = h

for all i, i′ ∈ K and h, h′. This holds, for example, for the Singapore network considered in Section 7.

Next, packet exit ports j may be independent of their entry ports i. Formally, we additionally have for any box b : K in the network:

h @ i1 −→Box,b h′ @ i′ ⟺ h @ i2 −→Box,b h′ @ i′

for all i1, i2, i′ ∈ K and h, h′. This happens when the forwarding tables are "centrally located" and are used to route all incoming messages to exit ports, independently of their entry port, and when ACLs are given per exit port; the Singapore network again provides an example. In this case the semantics can be equivalently described in terms of "rules" of the form h ↦ I where I ⊆ K, with one such rule for each h, namely the one where

I =def {j ∈ K | ∃i ∈ K. h @ i →Box,b h @ j}

With the box transition semantics available, we define the network semantics in terms of two transition relations between states: an internal one, moving to internal network nodes, and an external one, moving to external network nodes. The set, StatesN, of states of the network N consists of the network-located packets; these are pairs, written h @ p, of a packet h and a network location, that is, a network port p.
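The syntactic conditions of Section 2.1 can be illustrated with a small sketch. The encoding below is an assumption made for illustration only, not the authors' implementation: a signature is a dictionary from hypothetical box names to port sets, a port n.i is a pair (n, i), and γ is a set of ordered port pairs containing both directions of each link.

```python
from typing import Dict, FrozenSet, Set, Tuple

Port = Tuple[str, int]  # a network port n.i, represented as the pair (n, i)

def network_ports(beta: Dict[str, str], sig: Dict[str, FrozenSet[int]]) -> Set[Port]:
    """Port_N = {(n, i) | beta(n) : K and i in K}."""
    return {(n, i) for n, b in beta.items() for i in sig[b]}

def is_link_relation(gamma: Set[Tuple[Port, Port]], all_ports: Set[Port]) -> bool:
    """Check that gamma is a symmetric, 1-1, irreflexive relation on ports."""
    on_ports = all(p in all_ports and q in all_ports for p, q in gamma)
    symmetric = all((q, p) in gamma for p, q in gamma)
    irreflexive = all(p != q for p, q in gamma)
    # Each port has at most one partner; given symmetry this also gives injectivity.
    partner: Dict[Port, Port] = {}
    one_to_one = all(partner.setdefault(p, q) == q for p, q in gamma)
    return on_ports and symmetric and irreflexive and one_to_one

def external_ports(gamma: Set[Tuple[Port, Port]], all_ports: Set[Port]) -> Set[Port]:
    """External ports are those not connected to any other port."""
    return all_ports - {p for p, _ in gamma}

# Hypothetical example: two nodes carrying the same "tor" box, linked by port 1.
SIG = {"tor": frozenset({0, 1})}
BETA = {"r1": "tor", "r2": "tor"}
PORTS = network_ports(BETA, SIG)
GAMMA = {(("r1", 1), ("r2", 1)), (("r2", 1), ("r1", 1))}
```

In this example both nodes share one box, which mirrors the remark that the same box can be replicated at different points of the network; ports r1.0 and r2.0 come out as external.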
The internal network transition relation:

h @ n.i −→N h′ @ n′.i′

holds if, and only if, for some j, h @ i −→β(n) h′ @ j and n.j γ n′.i′. In some sense we are doing two transitions in one step, in that a packet first follows a box transition from input port i to output port j in the same box, and then moves to the input port i′ of the router to which j is connected (as specified by the port connection relation γ). A similar "short-circuiting" is done in the Header Space model [16].

The external network transition relation:

h @ n.i ⇝N h′ @ n′.i′

holds iff n′ = n and h @ i −→β(n) h′ @ i′, where n.i′ is an external port. We write ↠N for the total network transition relation →N ∪ ⇝N.

Below, when they can be understood from the context, we generally omit network suffixes from NodeN, βN and so on. However, network suffixes are essential when we describe transformations between networks.

The double packet/location nature of states gives the subject a special interest. Note that our states only keep track of the location of one packet, not of several as would be natural if we were interested in concurrent aspects of network operation. Our purpose, as elsewhere in the network verification literature, see, e.g., [33], is rather to verify properties such as reachability, concerning only one packet at a time.

A notion of network slice will prove important. In its most general form, a slice is a subset S of States, the set of states. We say that a slice is a network invariant if, in addition, we have:

h @ p ∈ S ∧ (h @ p ↠N h′ @ p′) ⟹ h′ @ p′ ∈ S

A less general, but useful case, already considered in [16], is where the slice is a product H × P of sets H and P of packets and ports.

2.3 A Network logic

We next spell out a small natural modal logic for such networks that supports the application of the Van Benthem-Hennessy-Milner principle while being enough to cover typical network requirements such as reachability relations between hosts.
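The internal and external network transitions of Section 2.2, and the network invariant condition on slices, can be sketched as follows. This is a minimal illustration under assumed encodings (box transitions as sets of pairs of (header, port) pairs, states as (header, (node, port)) pairs, γ as a set of ordered port pairs); the box name, node names and header "a" are hypothetical.

```python
def internal_steps(beta, gamma, box_trans):
    """h @ n.i -> h' @ n'.i'  iff  for some j: h @ i ->_beta(n) h' @ j and n.j gamma n'.i'."""
    link = dict(gamma)  # gamma is 1-1 and symmetric, so a dict view is adequate
    steps = set()
    for n, b in beta.items():
        for (h, i), (h2, j) in box_trans[b]:
            if (n, j) in link:
                steps.add(((h, (n, i)), (h2, link[(n, j)])))
    return steps

def external_steps(beta, gamma, box_trans):
    """External step: a box transition whose exit port n.j is unconnected."""
    link = dict(gamma)
    steps = set()
    for n, b in beta.items():
        for (h, i), (h2, j) in box_trans[b]:
            if (n, j) not in link:
                steps.add(((h, (n, i)), (h2, (n, j))))
    return steps

def is_invariant(slice_states, steps):
    """A slice S is invariant if every step starting in S also ends in S."""
    return all(t in slice_states for s, t in steps if s in slice_states)

# Hypothetical network: two ToR nodes, port 0 faces a host, port 1 is the uplink.
BETA = {"r1": "tor", "r2": "tor"}
GAMMA = {(("r1", 1), ("r2", 1)), (("r2", 1), ("r1", 1))}
BOX_TRANS = {"tor": {(("a", 0), ("a", 1)), (("a", 1), ("a", 0))}}

INTERNAL = internal_steps(BETA, GAMMA, BOX_TRANS)
EXTERNAL = external_steps(BETA, GAMMA, BOX_TRANS)
TOTAL = INTERNAL | EXTERNAL
ALL_A = {("a", (n, i)) for n in BETA for i in (0, 1)}
```

Here the slice of all states with header "a" is an invariant, whereas the singleton slice containing only a @ r1.0 is not, since the packet moves on to r2.1.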
We first assume given a collection of packet formulas α to specify properties of packets, together with a satisfaction relation h |= α for them, where h ∈ PacN. For example, such packet formulas may consist of wildcard expressions such as 101xxxxx for packets with (let's say) eight bit destination IP addresses that start with 101, as in Header Space Analysis [16]. One can then define a largely standard modal logic by the following grammar of N-formulas:

ϕ ::= α | @ p | ⊥ | ¬ϕ | ϕ ∧ ϕ | ♦ϕ | Fϕ

The modal worlds are the network states. The formula α expresses that α holds of the state packet; @ p expresses that the state location is p; ♦ϕ expresses that one can reach a state where ϕ holds in zero or more internal steps from the current state; and Fϕ expresses that one can reach a state where ϕ holds by an external step from the current state. Other connectives are definable: the other propositional ones; the modality □, expressing necessarily reaching, having taken zero or more internal steps; ♦T ϕ abbreviating ♦(ϕ ∨ Fϕ) and the corresponding □T ϕ; and @ p1, …, pn (for n > 0) abbreviating @ p1 ∨ … ∨ @ pn. More expressive logics can be obtained, for example, by adding fixed-points µX.ϕ[X], but the simpler logic seems sufficiently expressive.

Given a network N, and a semantics for it, we obtain a semantics for the logic, following the above ideas. As usual, we define a satisfaction relation h @ p |=N ϕ (that in network N the packet h satisfies the formula ϕ when located at port p) by structural induction on formulas. The clauses for the boolean formulas are as usual; the others are:

h @ p |=N α ⟺def h |= α
h @ p |=N @ p′ ⟺def p = p′
h @ p |=N ♦ϕ ⟺def ∃h′, p′. h @ p →N* h′ @ p′ ∧ h′ @ p′ |=N ϕ
h @ p |=N Fϕ ⟺def ∃h′, p′. h @ p ⇝N h′ @ p′ ∧ h′ @ p′ |=N ϕ

There are then two natural validity notions for a given network N (reflecting the double nature of states): one where a header validates a formula if the header satisfies the formula when located at any port; and the other where all headers satisfy the formula when located at any port:

h |=N ϕ ⟺ ∀p. h @ p |=N ϕ and |=N ϕ ⟺ ∀h. h |=N ϕ

(There is also a third validity p |=N ϕ, but we do not know of a use for it.)

Let us give a few examples to illustrate the expressiveness of the logic. First,

|=N α ∧ @ p1 ⇒ ♦(@ p2, p3 ∧ ♦F(α′ ∧ @ p4))

holds if, whenever a network packet satisfying α is at port p1, then it can reach an external port p4 via p2 or p3, being meanwhile transformed into a packet satisfying α′.

In the same vein, suppose a network slice S is defined by a boolean combination of atomic formulas, ϕ. Then the slice is a network invariant if, and only if, |=N ϕ ⇒ □T ϕ holds. Suppose that S′ is another slice, defined by a boolean combination of atomic formulas, ϕ′. Then |=N ϕ ⇒ ¬♦T ϕ′ holds if the second slice cannot be accessed from the first. This models the standard notion of slicing (a form of network virtualization that requires isolation between slices) considered in [16, 17], but our formalism can express alternative notions.

[Figure 3. Replacing the core by a hub. Left: the original network, in which boxes R1 and R2, with attached machines V1 and V2, communicate through a complex core; right: the network obtained by hub surgery, in which the core is replaced by a hub H.]

Continuing, suppose that p is an internal port. Then |=N α ∧ @ p ⇒ □ @ p holds if any packet with header satisfying α and reaching p is dropped (or loops through p forever). Next, h |=N @ p ⇒ ♦(¬ @ p ∧ ♦ @ p) holds if any packet with header h loops through p via some other port, returning to p with its header possibly being transformed. This is a generic loop, in the terminology of [16].
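To make the satisfaction clauses of Section 2.3 concrete, here is a small explicit-state evaluator. It is only a sketch under assumptions: formulas are encoded as nested tuples (an encoding chosen here for illustration), packet formulas are Python predicates, and the transition relations are the hypothetical two-ToR example with a single header "a", given directly as sets of state pairs.

```python
def reach(state, internal):
    """States reachable from `state` in zero or more internal steps."""
    seen, todo = {state}, [state]
    while todo:
        s = todo.pop()
        for a, b in internal:
            if a == s and b not in seen:
                seen.add(b)
                todo.append(b)
    return seen

def sat(state, f, internal, external):
    """h @ p |= f, by structural induction on the tuple-encoded formula f."""
    h, p = state
    tag = f[0]
    if tag == "pkt":   # packet formula alpha, given as a predicate on headers
        return f[1](h)
    if tag == "at":    # @ p1, ..., pn, given as a set of ports
        return p in f[1]
    if tag == "bot":
        return False
    if tag == "not":
        return not sat(state, f[1], internal, external)
    if tag == "and":
        return sat(state, f[1], internal, external) and sat(state, f[2], internal, external)
    if tag == "dia":   # zero or more internal steps
        return any(sat(t, f[1], internal, external) for t in reach(state, internal))
    if tag == "F":     # one external step
        return any(sat(t, f[1], internal, external) for s, t in external if s == state)
    raise ValueError(tag)

def or_(f, g):
    return ("not", ("and", ("not", f), ("not", g)))

def implies(f, g):
    return ("not", ("and", f, ("not", g)))

def dia_T(f):
    """The abbreviation dia_T f = dia (f or F f)."""
    return ("dia", or_(f, ("F", f)))

def valid_for_header(h, f, ports, internal, external):
    """h |= f: the header satisfies f when located at any port."""
    return all(sat((h, p), f, internal, external) for p in ports)

# Hypothetical transitions: a @ r1.0 -> a @ r2.1 internally, then externally to r2.0.
PORTS = {("r1", 0), ("r1", 1), ("r2", 0), ("r2", 1)}
INTERNAL = {(("a", ("r1", 0)), ("a", ("r2", 1))), (("a", ("r2", 0)), ("a", ("r1", 1)))}
EXTERNAL = {(("a", ("r1", 1)), ("a", ("r1", 0))), (("a", ("r2", 1)), ("a", ("r2", 0)))}

REACH_EXT = implies(("at", {("r1", 0)}), ("dia", ("F", ("at", {("r2", 0)}))))
REACH_INT = implies(("at", {("r1", 0)}), ("dia", ("at", {("r2", 0)})))
REACH_T = implies(("at", {("r1", 0)}), dia_T(("at", {("r2", 0)})))
```

In this toy network r2.0 is an external port, so it is reached from r1.0 only through a final external step: the formulas using F or ♦T are valid for header "a", while the purely internal ♦ formula is not. Explicit enumeration is used only for readability and would not scale to realistic header spaces.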
Such generic loops need not be infinite; but if one has an invariant α available, one can detect some infinite loops: |=N α ∧ @ p ⇒ ♦(¬ @ p ∧ ♦(α ∧ @ p)) holds if any packet with header satisfying α loops through p, via some other port, with its header, however transformed, still satisfying α.

2.4 A first surgery

Our first example of a surgery shows how to replace a core by a hub, while preserving reachability formulas.

Example 1: Replacing a core by a hub

Suppose our network N consists of a "core" Ncore through which ToRs (or other boxes) communicate via a single link. Suppose too that packet headers are not changed by any of the boxes in the network. Then if the core is obstruction-free, in a suitable sense, reachability queries between network machines accessing the core via those boxes hold if, and only if, they hold for a greatly reduced network in which the core is replaced by a hub: see Figure 3.

As shown there, the core network Ncore has external ports k′ and l (at nodes a and b, say). Fix a "traffic set" T ⊆ PacN of packet headers; typically this will be the packets addressed to the VMs handled by R2. Then we assume the core is obstruction-free for T between a.k′ and b.l, i.e., that, for all h ∈ T we have:

h @ a.k′ −→*Ncore h @ b.l

The network N̄ is the same as N except that the core has been replaced by a hub H with ports k′ and l. Regarding its semantics we assume only that, for all h:

h @ k′ →H h @ l

In both cases we have boxes (ToRs, say) R1 and R2 (at nodes r1 and r2, say) to which (for example, virtual machines) V1 and V2 are connected, with ports as shown. Regarding logic, both networks have the same packet formulas, and the evident collections of port formulas. Then, for any h and p, we have that the reachability formula ϕreach, namely:

αT ∧ @ r1.i′ ⇒ ♦ @ v2.j′

is satisfied by h @ p in N if, and only if, it is satisfied by h @ p in N̄ (we are assuming that we have a packet formula αT defining T, i.e., such that T = {h | h |= αT}).
This is easily verified by direct consideration of the semantics of the formula in each of the two networks, coupled with the no-obstruction assumption. For either N or N̄, h @ p |= ϕreach holds if, and only if, we have

(h ∈ T ∧ p = r1.i′) ⟹ h @ p |= ♦ @ v2.j′

and we can then calculate, for h ∈ T:

h @ r1.i′ |=N ♦ @ v2.j′
⟺ h @ r1.i′ →*N h @ v2.j′
⟺ h @ i′ →R1 h @ k ∧ h @ a.k′ →*Ncore h @ b.l ∧ h @ r2.l′ →R2 h @ v2.j′
⟺ h @ i′ →R1 h @ k ∧ h @ r2.l′ →R2 h @ v2.j′
⟺ h @ i′ →R1 h @ k ∧ h @ k′ →H h @ l ∧ h @ r2.l′ →R2 h @ v2.j′
⟺ h @ r1.i′ →*N̄ h @ v2.j′
⟺ h @ r1.i′ |=N̄ ♦ @ v2.j′

In the calculation we use the fact that the network does not alter packet headers and, in the third equivalence, the fact that Ncore is obstruction-free for T between a.k′ and b.l.

3. Network bisimulations

To show the validity of network transformations, our proof technique is to define bisimulations ∼ between two networks N and N̄ built using the same box signature, but possibly with different box semantics, for example between the networks on the left and right in Figure 2. We take these to be relations between network-located packets which are both bisimulations between −→N* and −→N̄* and bisimulations between ⇝N and ⇝N̄, each in the usual sense (note they are then also bisimulations between ↠*N and ↠*N̄). It is natural to consider, in particular, one-step bisimulations between N and N̄; these are bisimulations between both −→N and −→N̄ and ⇝N and ⇝N̄ (and so also between ↠N and ↠N̄).

One can form relations between states from relations ∼Pa and ∼Po between the respective sets of packets and ports by:

h @ p ∼ h̄ @ p̄ ⟺def h ∼Pa h̄ ∧ p ∼Po p̄

Then, given a relation ∼Po between ports, there is a largest relation ∼Pa between packets which, so combined with ∼Po, forms a (one-step) network bisimulation between N and N̄. We remark that, while there certainly are largest such (one-step) bisimulations, they do not seem very useful. The bisimulations relative to a fixed port relation are sensitive to ports visited and so do prove useful; they are analogous to those considered in [4].

Given a bisimulation ∼ between N and N̄ we wish to transfer properties between the first network and the second, as is usual with bisimulations, so that proving a formula in the first network is equivalent to proving a transformed formula in the second network. As we have two logics here, speaking of different sets of packets and ports, we need to correlate formulas. To that end we define a correlation relation ∼For between N-formulas and N̄-formulas by taking ϕ ∼For ϕ̄ to hold if, and only if, for all h, p, h̄, p̄ we have:

h @ p ∼ h̄ @ p̄ ⟹ (h @ p |=N ϕ ⟺ h̄ @ p̄ |=N̄ ϕ̄)

We then have the expected proposition expressing the Van Benthem-Hennessy-Milner principle for networks:

PROPOSITION 1. The formula correlation relation ∼For is closed under the logical connectives, that is, the following implications hold:

⊥ ∼For ⊥
ϕ ∼For ϕ̄ ⟹ ¬ϕ ∼For ¬ϕ̄
ϕ ∼For ϕ̄ and ψ ∼For ψ̄ ⟹ ϕ ∧ ψ ∼For ϕ̄ ∧ ψ̄
ϕ ∼For ϕ̄ ⟹ ♦ϕ ∼For ♦ϕ̄
ϕ ∼For ϕ̄ ⟹ Fϕ ∼For Fϕ̄

PROOF. This is a standard proof. There are five cases:

1. As ⊥ never holds, we have ⊥ ∼For ⊥.

2. Suppose that ϕ ∼For ϕ̄ and that h @ p ∼ h̄ @ p̄. Then h @ p |=N ¬ϕ holds iff h @ p |=N ϕ does not hold iff h̄ @ p̄ |=N̄ ϕ̄ does not hold (as we have ϕ ∼For ϕ̄) iff h̄ @ p̄ |=N̄ ¬ϕ̄ holds.

3. Suppose that ϕ ∼For ϕ̄ and that ψ ∼For ψ̄ and h @ p ∼ h̄ @ p̄. Then h @ p |=N ϕ ∧ ψ holds iff h @ p |=N ϕ and h @ p |=N ψ both hold iff h̄ @ p̄ |=N̄ ϕ̄ and h̄ @ p̄ |=N̄ ψ̄ both hold (as we have ϕ ∼For ϕ̄ and ψ ∼For ψ̄) iff h̄ @ p̄ |=N̄ ϕ̄ ∧ ψ̄ holds.

4. Suppose that ϕ ∼For ϕ̄ and that h @ p ∼ h̄ @ p̄. Assume that h @ p |=N ♦ϕ holds, in order to show that h̄ @ p̄ |=N̄ ♦ϕ̄ does. By the semantics of ♦ there are then h′, p′ such that h @ p →*N h′ @ p′ and h′ @ p′ |=N ϕ holds. As ∼ is a bisimulation between N and N̄ and h @ p ∼ h̄ @ p̄, there are h̄′, p̄′ such that h̄ @ p̄ →*N̄ h̄′ @ p̄′ and h′ @ p′ ∼ h̄′ @ p̄′. As ϕ ∼For ϕ̄, we then have that h̄′ @ p̄′ |=N̄ ϕ̄ holds, and so h̄ @ p̄ |=N̄ ♦ϕ̄ holds, as required. The proof that h @ p |=N ♦ϕ holds if h̄ @ p̄ |=N̄ ♦ϕ̄ does is similar.

5. This is similar to the previous case.

In case ∼ is built from relations ∼Pa (on packets) and ∼Po (on ports), as above, there are natural sufficient conditions for the correlations to hold between atomic formulas. Recall that atomic formulas are either of the form α (sets of packets) or @ p1, …, pn (sets of ports). So, α ∼For ᾱ holds if

∀h, h̄. h ∼Pa h̄ ⟹ (h |= α ⟺ h̄ |= ᾱ)

and @ p1, …, pm ∼For @ p̄1, …, p̄n holds if

∀p, p̄. p ∼Po p̄ ⟹ (p ∈ {p1, …, pm} ⟺ p̄ ∈ {p̄1, …, p̄n})

These conditions are also necessary, except in evident trivial cases.

We remark that there is no commitment to the above logic for purposes of network verification. Proposition 1 demonstrates a wide range of properties that can be transferred between bisimilar networks. However, one is free to express such properties using whatever logical or automata-theoretic means one desires. For example, one can translate our logic into Datalog, so one can use [20].

4. Network surgery

We next give three examples of network transformations by various forms of surgery, whether at the levels of graphs, headers, or rules. The networks are related to their transforms by bisimulations; these, in their turn, allow application of appropriate versions of the Van Benthem-Hennessy-Milner principle, justified by Proposition 1. We also revisit Example 1, and show why the obstruction-freeness assumption made there does not justify a helpful bisimulation.

Our first example of a network transformation may appear too trivial to be useful (removing disconnected sets of nodes).
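Since the surgeries in this section are all justified by exhibiting bisimulations, it may help to see the one-step bisimulation conditions of Section 3 as a check on finite relations. The sketch below is illustrative only: states are opaque strings, the relations are given explicitly, and the example (two symmetric routers merged into one, in the spirit of Figure 2) is a hypothetical toy rather than data from the paper.

```python
def successors(steps):
    """Index a transition relation (a set of state pairs) by source state."""
    succ = {}
    for s, t in steps:
        succ.setdefault(s, set()).add(t)
    return succ

def is_bisimulation(rel, steps_a, steps_b):
    """Check the usual forth and back conditions for `rel` w.r.t. one pair of relations."""
    succ_a, succ_b = successors(steps_a), successors(steps_b)
    for s, t in rel:
        # Forth: every move of s is matched by a move of t into a related state.
        for s2 in succ_a.get(s, ()):
            if not any((s2, t2) in rel for t2 in succ_b.get(t, ())):
                return False
        # Back: every move of t is matched by a move of s into a related state.
        for t2 in succ_b.get(t, ()):
            if not any((s2, t2) in rel for s2 in succ_a.get(s, ())):
                return False
    return True

def is_one_step_network_bisimulation(rel, int_a, ext_a, int_b, ext_b):
    """One-step network bisimulation: a bisimulation for internal and for external steps."""
    return is_bisimulation(rel, int_a, int_b) and is_bisimulation(rel, ext_a, ext_b)

# Hypothetical network N: x forwards to r3 or r4, both forward to y, y exits to out.
INT_N = {("x", "r3"), ("x", "r4"), ("r3", "y"), ("r4", "y")}
EXT_N = {("y", "out")}
# Hypothetical transform: r3 and r4 merged into a single router r.
INT_M = {("x", "r"), ("r", "y")}
EXT_M = {("y", "out")}
REL = {("x", "x"), ("r3", "r"), ("r4", "r"), ("y", "y"), ("out", "out")}

# Variant in which r4 drops the packet, so r4 is no longer symmetric to r3.
INT_BROKEN = {("x", "r3"), ("x", "r4"), ("r3", "y")}
```

With these definitions REL passes the check for the symmetric network and fails for the variant in which r4 drops traffic, which is the situation where merging the two routers would be unsound.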
However, as Figure 4 shows, it becomes effective after the following example, a more complex surgery called slicing.

[Figure 4: three panels showing a network with router R1, node sets Nodes 1 and Nodes 2, and stations X and Y; the first is transformed to the second via slicing of irrelevant portions, and the second to the third via removal.]
Figure 4. When considering the reachability between stations in the same subtree, the remainder of the network can be sliced away (Example 3), leaving a disconnected set of Nodes that can be removed by Example 2.

Example 2: Removing disconnected components

When a network N consists of two disconnected subnetworks we can verify their properties separately. Suppose its nodes split into two disjoint subsets Node1 and Node2 which are disconnected in that for all ports n1.i1 and n2.i2:

n1 ∈ Node1 ∧ n2 ∈ Node2 ⟹ ¬ n1.i1 γ n2.i2

Then N naturally splits into two subnetworks N1 and N2 with respective node sets Node1 and Node2 and with βNj and γNj being the evident restrictions of βN and γN, for j = 1, 2. Note that a port is internal (external) in an Nj iff it is in N.

We then have that N is bisimilar to each of the Nj. For j = 1, 2, one can take as (one-step) bisimulation relation

h @ n.i ∼j h̄ @ n̄.ī ⟺def h @ n.i = h̄ @ n̄.ī ∧ n ∈ Nodej

The logic for Nj has as formulas those of N that only mention ports with nodes in Nodej. For such formulas, Proposition 1 tells us that for all packets h and Nj-ports n.i we have:

h @ n.i |=N ϕ ⟺ h @ n.i |=Nj ϕ

We now introduce the more powerful slicing transformation illustrated in Figure 4. When verifying the reachability between certain pairs of stations such as X and Y, under certain conditions it is possible to cut away irrelevant portions of the network; we formalize this as slicing.

Example 3: Slicing

Suppose we have a slice of the form S = H × P of a network N, where H is a subset of the set of headers and P is a subset of the set of ports. We wish to define a corresponding slice network N|S obtained by cutting off all the ports not in P and restricting the header space to H. We restrict ourselves to the case where P is connection invariant, meaning that for all p, p′ ∈ PortN, we have:

p γN p′ ⟹ (p ∈ P ⟺ p′ ∈ P)

Thus we are going to cut off the port at one end of a link if, and only if, we cut off the port at its other end; in other words, we are cutting off either external ports or whole links.

For slicing to work, we will need the slice to be a network invariant, as defined above. In the case of slices of the form H × P, network invariance can be rephrased in terms of natural no-leakage and header invariance conditions on network boxes. For every box b : K occurring in the network (i.e., in the range of β) say that a box port pair (i, j) ∈ K^2 is P-leaky if, and only if, for some p ∈ P and q ∉ P we have:

∃n. β(n) = b ∧ p = n.i ∧ n.j γ q

and say that a box port pair (i, j) ∈ K^2 is P-connected if, and only if, for some p, q ∈ P we have:

∃n. β(n) = b ∧ p = n.i ∧ (n.j γ q ∨ (q = n.j ∧ q is external))

Then the slice is a network invariant if, and only if, the following two conditions hold:

• No leakage: For all b : K, i ∈ K, and h ∈ H:

h @ i −→b h′ @ j ⟹ (i, j) is not P-leaky

• Header invariance: For all b : K, h ∈ H, and P-connected pairs (i, j):

h @ i −→b h′ @ j ⟹ h′ ∈ H

In the case where the slice is only on headers it evidently suffices to check invariance of H for all pairs of ports of all boxes occurring in the network.

To define the slice network N|S, we begin by defining the slice signature Box|S. For every box b : K and every I ⊆ K, we assume available a Box|S box b|I : I; intuitively, b|I is obtained from b by slicing off all of b's ports that are in K\I.
We take the header space to be H and its semantics is the restriction of that of b to H, that is, for all h, h′ ∈ H and i, i′ ∈ I we have:

h @ i →Box|S,b|I h′ @ i′ ⇐⇒ h @ i →Box,b h′ @ i′

The slice network N|S is then obtained as follows: its node set is that of N; its box assignment function is given by:

βN|S(n) = βN(n)|{i ∈ K | n.i ∈ P}

where β(n) : K; and its connection relation γN|S is the restriction of γN to N|S's ports, i.e., to P.

We note that the set of ports of N|S is P, as anticipated. (For n.i is such a port iff i ∈ K and n.i ∈ P, where β(n) : K. But if n.i ∈ P then n.i ∈ PortN and so i ∈ K; so we see that n.i is indeed such a port iff it is in PortN|S.) Note too that, as P is connection invariant, a port p ∈ P is an internal (external) port of N|S if, and only if, it is of N. In the particular case when we are only slicing on headers, that is when P is the set of all ports, N|S can be taken to be N, although one evidently still needs the restricted signature.

We now show that, restricted to S, the transition relations of N and N|S are the same.

LEMMA 1.
1. The internal (external) transition relation of N|S is contained in that of N.
2. The restriction of the internal (external) transition relation of N to the slice S is contained in that of N|S.

PROOF.
1. Suppose, first, that h @ n.i −→N|S h′ @ n′.i′ in order to show that h @ n.i −→N h′ @ n′.i′. Then for some j we have h @ i −→Box|S,β(n) h′ @ j and also n.j γN|S n′.i′. We then have h @ i −→Box,β(n) h′ @ j and n.j γN n′.i′ and so h @ n.i −→N h′ @ n′.i′.
Next, suppose instead that there is an external N|S-transition from h @ n.i to h′ @ n′.i′. Then we have h @ i −→Box|S,β(n) h′ @ i′ and n.i′ is an external port of N|S. We then have that h @ i −→Box,β(n) h′ @ i′ and, by the above remark on external ports of N|S, that n.i′ is an external port of N, and so there is an external N-transition from h @ n.i to h′ @ n′.i′.
2. For the second part consider two states h @ n.i and h′ @ n′.i′ in S. Then suppose, first, that h @ n.i −→N h′ @ n′.i′ to show that h @ n.i −→N|S h′ @ n′.i′. Then h @ i −→Box,βN(n) h′ @ j and n.j γN n′.i′, for some j. So, first, as n.i ∈ P (since h @ n.i is in S), i is in the set I =def {i ∈ K | n.i ∈ P} where βN(n) : K. Then, as n′.i′ ∈ P (since h′ @ n′.i′ ∈ S) and as P is connection invariant and n.j γN n′.i′, we see that n.j ∈ P and so that j ∈ I also holds. It follows that h @ i −→Box,βN(n)|I h′ @ j. Then, as h, h′ ∈ H, we finally have h @ i −→Box|S,βN|S(n) h′ @ j. Next as n.j γN n′.i′ and n.j, n′.i′ ∈ P we further have that n.j γN|S n′.i′. Combining the last two facts, we have h @ n.i −→N|S h′ @ n′.i′, as required.
Next, suppose instead, that h @ i −→Box,βN(n) h′ @ i′ and n.i′ is an external port of N. Here, as h @ n.i and h′ @ n′.i′ are in S, n.i and n′.i′ are in P; therefore, arguing as above, we see that h @ i −→Box|S,βN|S(n) h′ @ i′ holds. But n.i′ is an external port of N|S as it is an external port of N. So we find that there is an external N|S-transition from h @ n.i to h′ @ n′.i′, concluding the proof.

Turning to logic, for the logic of N|S, we take the N-formulas that mention only ports of N|S. There is a natural potential bisimulation ∼S between N and N|S, defined by:

h @ p ∼S h̄ @ p̄ ⇐⇒def h @ p = h̄ @ p̄ ∈ S

THEOREM 1. The relation ∼S is a one-step bisimulation between →N and →N|S if, and only if, S is a network invariant of N. In that case, for any N|S formula ϕ and any state h @ p ∈ S we have:

h @ p |=N ϕ ⇐⇒ h @ p |=N|S ϕ

PROOF. Suppose that S is an invariant. To show that ∼S is a bisimulation between →N and →N|S, we suppose that h @ p ∼S h̄ @ p̄ (i.e., that h @ p = h̄ @ p̄ ∈ S) and verify the usual two conditions. First, if h @ p →N h′ @ p′ then, as S is an invariant, we have h′ @ p′ ∈ S. So, applying Lemma 1, part 2, we have that h @ p →N|S h′ @ p′. As h′ @ p′ ∼S h′ @ p′ (as h′ @ p′ ∈ S) we have verified the first condition.
Second, if h @ p →N|S h′ @ p′ then h′ @ p′ ∈ S and so h @ p →N h′ @ p′, by Lemma 1, part 1. The second condition follows. One shows ∼S an external transition bisimulation similarly.

Conversely, suppose that ∼S is a one-step bisimulation between N and N|S. To show S invariant, suppose that h @ p ∈ S and that h @ p has an external N-transition to h′ @ p′. Then h @ p ∼S h @ p and so, as ∼S is a bisimulation, we have that h @ p has an external N|S-transition to h̄′ @ p̄′ for some h̄′ @ p̄′ with h′ @ p′ ∼S h̄′ @ p̄′, that is, with h′ @ p′ = h̄′ @ p̄′ ∈ S. Thus S is indeed an invariant.

The final part follows from the assumption that ∼S is a bisimulation by applying Proposition 1 and the remarks made immediately thereafter.

[Figure 5: boxes A, B, C, and D forwarding header h. Transform: remove the h rule in B, redirect in A.]
Figure 5. When considering forwarding header h, redirecting traffic from A to only use C and dropping the relevant rule in B does not impact the reachability of h.

Our next surgery (redirection), formalized in Example 4, keeps the topology unchanged but drops rules. This, in turn, involves redirecting the traffic covered by the dropped rules. First, notice that A, in Figure 5, forwards packets with header h to both B and C. This is done for good reasons in practice such as load balancing. However, given that certain conditions hold, to compute the reachable set of packets addressed to h it suffices to redirect traffic at A to only go to C, and drop the rules dealing with this traffic from B. The advantage of this transformation is that it can be applied repeatedly to gradually drain the rules out of backup routers, to make a much simpler reduced network.

Example 4: Redirecting traffic Suppose we have a network N in which traffic T ⊆ PacN flows from a node a to a node b and then, without change, to a node d, but there is an alternative flow, in which the same traffic moves from a to yet another node c from which it also flows without change to d. Then, as long as the box at d treats the traffic in the same way, by whichever of the two routes it arrives, we can transform the network by rerouting the traffic T that passes through b through c instead, and removing the relevant rules from the box at b. This results in another network N̄, with fewer rules. While this transformation is clearly far from general, it seems widely applicable to data center switching networks because of the symmetry present there. In Section 6 below we point out how one could pick out candidate nodes a, b, c, etc., using symmetry considerations.

We next present the transformation and its background assumptions precisely. The nodes a, b, c, and d are supposed all distinct.

[Figure 6: nodes a, b, c, d with boxes A, B, C, D and ports i, j, i′, k, j′, l, k′, l′, before and after the transformation.]
Figure 6. The formal setup for Example 4.

Let the boxes at these nodes be A : KA, B : KB, C : KC, and D : KD, respectively. Suppose that a and b are connected by ports a.i and b.i′; that b and d are connected by ports b.k and d.k′; that a and c are connected by ports a.j and c.j′; and that c and d are connected by ports c.l and d.l′. Note that we then also have that i and j are distinct, as are i′ and k; j′ and l; and k′ and l′. This scheme is depicted in Figure 6.

The background assumptions are that for any h ∈ T the following hold:

h @ i′ −→B h′ @ q ⇐⇒ h′ = h ∧ q = k

which says that T traffic entering at i′ is forwarded by B unchanged at k,

h @ j′ −→C h′ @ q ⇐⇒ h′ = h ∧ q = l

which says that T traffic entering at j′ is forwarded by C unchanged at l, and

h @ k′ −→D h′ @ q ⇐⇒ h @ l′ −→D h′ @ q

which says that D treats T traffic at k′ and l′ identically.

The network N̄ is obtained from N by replacing the boxes A : KA and B : KB by boxes Ā : KA and B̄ : KB.
The transition rules of Ā are the same as those of A except for transitions to the ports i or j, where we put instead:

h @ q →Ā h′ @ i ⇐⇒def h @ q →A h′ @ i ∧ h′ ∉ T
h @ q →Ā h′ @ j ⇐⇒def h @ q →A h′ @ j ∨ (h @ q →A h′ @ i ∧ h′ ∈ T)

The transition rules of B̄ are the same as those for B except for transitions between ports i′ and k where we have:

h @ i′ →B̄ h′ @ k ⇐⇒def h @ i′ →B h′ @ k ∧ h ∉ T

Turning to the bisimulation relation ∼, we take h @ p ∼ h̄ @ p̄ to hold if, and only if, h = h̄ and one of the following two conditions also holds:

- p = p̄ and, if h ∈ T, then p ≠ b.i′
- h ∈ T and one of the following two conditions holds:
  p = b.i′ ∧ p̄ = c.j′
  p = d.k′ ∧ p̄ = d.l′

Note that this bisimulation relation is not the composition of separate relations between packets and ports.

THEOREM 2. The relation ∼ is a one-step bisimulation between −→N and −→N̄.

PROOF. We first show ∼ a bisimulation of the −→ relation. We consider all the cases when a relation h @ p ∼ h̄ @ p̄ holds, and verify the bisimulation condition in each. We always have h = h̄.

Suppose first that p = p̄. Then either h ∉ T or else p ≠ b.i′. There are two cases:

In the first case, we assume first that p is not a port of the node a. In this case, a →-successor state of h @ p of either network cannot be of the form h′ @ b.i′ and so is ∼-related to itself. Further, both networks have the same →-successor states, as the relevant transitions are the same in both networks. This is evident in the case of the c and the d nodes where the corresponding boxes are the same in both networks. In the case of the node b, the boxes B and B̄ have the same transitions, except for those from i′; however in this case we then have h ∉ T and so B and B̄ have the same transitions from h @ i′. As the →-successor states are the same in both networks and are ∼-related to themselves, the bisimilarity condition holds in this case.

In the second case we assume that p has the form a.q, and consider the possible forms h′ @ q′ of the A and Ā successors of h @ q. These are the same if h′ ∉ T or else q′ ≠ i and q′ ≠ j. The corresponding →-successor states in the two networks are then equal and in the ∼ relation to themselves. Otherwise, h′ ∈ T and q′ = i or q′ = j, and we note:

− Every A successor h′ @ i can be paired off with the Ā successor h′ @ j, and h′ @ b.i′ ∼ h′ @ c.j′.
− Every A successor h′ @ j can be paired off with the Ā successor h′ @ j, and h′ @ c.j′ ∼ h′ @ c.j′; and this accounts for all the remaining Ā successors of the form h″ @ j with h″ ∈ T.
− There are no Ā successors of h @ q of the form h′ @ i.

This verifies the bisimilarity condition in this case.

Suppose next that p ≠ p̄. Then h ∈ T, and there are two cases:

In the first case p = b.i′ and p̄ = c.j′. So, employing the first two background assumptions, the only possible →-successor state of N is h @ d.k′, and the only possible next such state of N̄ is h @ d.l′. As h @ d.k′ ∼ h @ d.l′, the bisimulation condition holds in this case.

In the second case, p = d.k′ and p̄ = d.l′. By the third background assumption the h′ @ q reachable from h @ k′ and those reachable from h @ l′ by a D transition are the same, and so too, therefore, are the states of the two networks reachable from h @ p and h @ p̄ by a →-transition. Further, for such a q, d.q will either be an external port or else be connected to a port other than b.i′ (as that port is connected to another port). So if h′ @ p′ is reachable by a → transition from h @ p (equivalently from h @ p̄) then we have h′ @ p′ ∼ h′ @ p′. The bisimulation condition therefore holds in this case too.

We next show that ∼ is a bisimulation of the external transition relation. The two networks have the same external transition relation. Also, h @ p ∼ h @ p whenever there is an external transition from h @ p.
The only case where ∼ relates two different states, one of which may make an external transition, is h @ d.k′ ∼ h @ d.l′, with h ∈ T, and in that case one can make an external transition to a state h′ @ p′ iff the other can, as D treats T traffic at k′ and l′ identically. Putting these together, we see that ∼ is a bisimulation of the external transition relation.

Turning to logic we take the logic of N̄ to be that of N, and consider formula correlations. For packet header formulas we evidently always have

α ∼For α

For port formulas we assume available a boolean combination ϕT of packet header formulas defining T. We then have the following correlations:

@ p ∼For @ p if p is not b.i′, c.j′, d.k′, or d.l′,
¬ϕT ∧ @ p ∼For ¬ϕT ∧ @ p if p is c.j′, d.k′, or d.l′,
ϕT ∧ (@ b.i′ ∨ @ c.j′) ∼For ϕT ∧ @ c.j′
and
ϕT ∧ (@ d.k′ ∨ @ d.l′) ∼For ϕT ∧ (@ d.k′ ∨ @ d.l′)

So, in particular, if we are "away from the surgery" the same assertions hold. More precisely, if p is not b.i′, c.j′, d.k′, or d.l′, and ϕ does not mention any of these ports, then, for any h, we have:

h @ p |=N ϕ ⇐⇒ h @ p |=N̄ ϕ

There are variations on this transformation. Regarding the definition of Ā, one can instead leave its transitions to i unchanged, when the proof that ∼ is a one-step bisimulation relation goes through unchanged. It may also happen that the transitions to j are unchanged, as the ones being added are already there. This is common in the core where ports such as i and j are treated symmetrically for the usual redundancy and performance reasons.

Next, let us assume, as is common in switching networks, that packets are forwarded without being transformed. We can then weaken the condition that D treats T traffic at k′ and l′ identically. Given a subset G of ports (to be "guarded"), we define a forwarding relation ≡TG on ports. It is the least relation such that p ≡TG q if, and only if, one of the following holds:

• p = q, or
• p, q ∉ G and both of the following hold:
  ∀h ∈ T, p′. h @ p →N h @ p′ ⇒ ∃q′. h @ q →N h @ q′ ∧ p′ ≡TG q′
  ∀h ∈ T, q′. h @ q →N h @ q′ ⇒ ∃p′. h @ p →N h @ p′ ∧ p′ ≡TG q′

Then ≡TG is an equivalence relation and if p ∈ G and p ≡TG q then p = q. The weaker condition on D is then that d.k′ ≡TG d.l′. One can define a bisimulation exactly as before, except that one replaces the second clause in the case where h ∈ T (i.e., the one that p = d.k′ and p̄ = d.l′) by p ≡TG p̄. As before, the same assertions hold if we are away from the surgery, i.e., the ports b.i′, c.j′, or any ports in the ≡TG-relation to a different port, and so, in particular, any port in G. As before, one may also leave the transitions to i of Ā unchanged.

Returning to Example 1, let us see why there need not be a relevant bisimulation between the two networks, by which we mean a bisimulation allowing the two atomic formulas @ r1.i0 and @ v2.j0 to be transferred. The problem is that although a packet h ∈ T can go from r1.i0 to v2.j0 via the core, it might also enter the core and then go somewhere else and then be dropped. Now consider the following reachability formula (generally stronger than ϕreach)

αT ∧ @ r1.i0 ⇒ ♦ @ v2.j0

This holds in N̄, but may fail to hold in N. However, if that is the case, then, by Proposition 1, there can be no bisimulation between N and N̄ that allows the two atomic formulas to be transferred.

5. Composite bisimulations

We now see how composite bisimulations can be obtained. These use "topological" relations between networks and their signatures to compose together bisimulation relations between the transition relations of boxes to form bisimulations between the networks. This contrasts with the bisimulations in Section 4 which are rather constructed in an "ad hoc" way, according to the need at hand.

Suppose we have two signatures Box and Box̄.
Then a signature relation between the two signatures consists of:

• a box relation ∼B between Box and Box̄, and
• for each so-related pair of boxes b : K, b̄ : K̄, a box port relation ∼Po^{b,b̄} ⊆ K × K̄ between their ports.

Suppose next that we have two networks N and N̄ built over the two signatures. Then a topological relation between N and N̄ consists of a signature relation between the two signatures, as above, together with a node relation ∼N between their sets of nodes such that:

• if n ∼N n̄ then βN(n) ∼B βN̄(n̄), and
• the network port relation ∼Po between PortN and PortN̄ defined by:
  n.i ∼Po n̄.ī ⇐⇒def n ∼N n̄ ∧ i ∼Po^{βN(n),βN̄(n̄)} ī
  is a bisimulation of γ by γ̄.

(Note that then if n.i ∼Po n̄.ī then n.i is external iff n̄.ī is.)

Turning to the box transition relations, assume first that we have two header spaces PacN and PacN̄. Then a packet header relation is a relation ∼Pa between the two header spaces, PacN and PacN̄. Assume next that we have a semantics for each of the signatures, that is, we have families of transition relations −→N,b (b ∈ Box) and −→N̄,b̄ (b̄ ∈ Box̄) for the two signatures, as considered above. Then, given a signature relation between the two signatures as above, a packet header relation ∼Pa is a signature bisimulation between the two signatures if, and only if, the box-located packet relation ∼Blp^{b,b̄}, defined by:

h @ i ∼Blp^{b,b̄} h̄ @ ī ⇐⇒def h ∼Pa h̄ ∧ i ∼Po^{b,b̄} ī

is a bisimulation between −→b and −→b̄, whenever b ∼B b̄.

Given a topological relation (comprising a signature relation and a node relation) and a packet relation as above, we have a relation ∼Pa between the two sets of packets and a relation ∼Po between the two sets of ports. So we can define a relation ∼Net between the corresponding located packet sets as remarked in Section 3, i.e., by putting:

h @ p ∼Net h̄ @ p̄ ⇐⇒def h ∼Pa h̄ ∧ p ∼Po p̄

We then have:

PROPOSITION 2. If ∼Pa is a signature bisimulation, then ∼Net is a one-step bisimulation relation between N and N̄.

PROOF. Assume ∼Pa is a signature bisimulation. We give the first halves of the proofs that ∼Net is an internal transition relation bisimulation and that it is an external one; the second halves are similar.

Suppose h @ n.i ∼Net h̄ @ n̄.ī. Then h ∼Pa h̄, n ∼N n̄, and i ∼Po^{βN(n),βN̄(n̄)} ī, and so h @ i ∼Blp^{βN(n),βN̄(n̄)} h̄ @ ī.

If, first, h @ n.i →N h′ @ n′.i′, then h @ i →βN(n) h′ @ j and n.j γ n′.i′, for some j. As ∼Blp^{βN(n),βN̄(n̄)} is a bisimulation of →βN(n) by →βN̄(n̄), there are h̄′, j̄ such that h̄ @ ī →βN̄(n̄) h̄′ @ j̄ and h′ @ j ∼Blp^{βN(n),βN̄(n̄)} h̄′ @ j̄ (and so h′ ∼Pa h̄′ and j ∼Po^{βN(n),βN̄(n̄)} j̄). Next, as n ∼N n̄ and j ∼Po^{βN(n),βN̄(n̄)} j̄, n.j ∼Po n̄.j̄. So as n.j γ n′.i′ and ∼Po is a bisimulation of γ by γ̄, there are n̄′ and ī′ such that n̄.j̄ γ̄ n̄′.ī′ and n′.i′ ∼Po n̄′.ī′. Then, as h̄ @ ī →βN̄(n̄) h̄′ @ j̄ and n̄.j̄ γ̄ n̄′.ī′, we have that h̄ @ n̄.ī →N̄ h̄′ @ n̄′.ī′. Finally, as h′ ∼Pa h̄′ and n′.i′ ∼Po n̄′.ī′ we also have h′ @ n′.i′ ∼Net h̄′ @ n̄′.ī′, as required.

If, instead, there is an external N-transition from h @ n.i to h′ @ n′.i′, then h @ i →βN(n) h′ @ i′ and n.i′ is external. So, as ∼Blp^{βN(n),βN̄(n̄)} is a bisimulation of →βN(n) by →βN̄(n̄), there are h̄′, ī′ such that h̄ @ ī →βN̄(n̄) h̄′ @ ī′ and h′ @ i′ ∼Blp^{βN(n),βN̄(n̄)} h̄′ @ ī′ (and so h′ ∼Pa h̄′ and i′ ∼Po^{βN(n),βN̄(n̄)} ī′). Next, as n ∼N n̄ and i′ ∼Po^{βN(n),βN̄(n̄)} ī′, n.i′ ∼Po n̄.ī′. But then as n.i′ is external, we see that n̄.ī′ is external too. So, as h̄ @ ī →βN̄(n̄) h̄′ @ ī′, there is an external N̄-transition from h̄ @ n̄.ī to h̄′ @ n̄.ī′. Finally, as h′ ∼Pa h̄′ and n′.i′ ∼Po n̄′.ī′ we also have that h′ @ n′.i′ ∼Net h̄′ @ n̄′.ī′, as required.

6. Network symmetries and merging headers

We use composite symmetries in two ways. The first is to define various notions of symmetries, particularly the topological ones that typically hold in data centers, and then show how local symmetries can be found and also how to divide out networks by symmetry groups. The second is to construct equivalences between packet headers that allow us to replace assertions about one header by an assertion about an equivalent one, or to go further and replace headers by their equivalence classes.

Example 5: Network symmetry At the most general level, a symmetry of a network N is a permutation πN of network states which preserves and reflects both →∗N and the reflexive-transitive closure of the external transition relation of N. That is, for all network states h @ p and h′ @ p′, we have:

h @ p −→∗N h′ @ p′ ⇐⇒ πN(h @ p) −→∗N πN(h′ @ p′)

and similarly for the external transition relation. More strictly, a one-step symmetry of N is required to preserve and reflect both →N and the external transition relation of N. One could also require that the permutation is composed from separate permutations of the headers and the network ports (note that if the header space is finite, then the reflection condition is redundant).

The network structure provides a good source of exploitable symmetries, and so we focus on composite symmetries which are simply those composite bisimulations between a network and itself where all the relations are bijections. (We note in passing the evident fact that all these various classes of symmetries form groups.) Let us spell this out in functional terms. Assume given a signature Box. Then a signature symmetry over Box consists of:

• a box permutation πBo : Box ≅ Box, and
• for all boxes b : K, a box port bijection πPo^b : K ≅ K̄, where πBo(b) : K̄.

Next, given a network N over Box, a topological symmetry of N is a signature symmetry, together with a node permutation πNo : Node ≅ Node such that:

• For all n, we have πBo(β(n)) = β(πNo(n)), and
• For all ports n.i and n⁺.i⁺, we have:
  n.i γ n⁺.i⁺ ⇐⇒ πPo(n.i) γ πPo(n⁺.i⁺)

where the port bijection πPo : Port ≅ Port is defined by:

πPo(n.i) = πNo(n).πPo^{β(n)}(i)  (n.i ∈ Port)

We can associate a graph ΓN to every network N. It has nodes those of the network and has relation RN where:

RN(n, n′) ⇐⇒def ∃n.i, n′.i′ ∈ Port. n.i γ n′.i′

Then every topological symmetry πNo is a symmetry of ΓN.
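The two conditions defining a topological symmetry can be checked mechanically on a small example. The following is a minimal sketch in Python, not the authors' implementation: the function names and the four-node toy network (two routers r3 and r4 sharing the neighbors r1 and r2) are hypothetical, links are stored as ordered port pairs in both directions, and the box port maps are assumed to be bijections. Since the induced port map is a bijection, the ⇐⇒ condition on γ holds exactly when the image of γ under the port map equals γ.

```python
def port_image(port, beta, pi_node, pi_port):
    """pi_Po(n.i) = pi_No(n) . pi_Po^{beta(n)}(i)."""
    node, i = port
    return (pi_node[node], pi_port[beta[node]][i])


def is_topological_symmetry(beta, gamma, pi_box, pi_port, pi_node):
    """Check the two topological-symmetry conditions on a finite network."""
    # The node map must be a permutation of the node set.
    if sorted(pi_node) != sorted(pi_node.values()):
        return False
    # Condition 1: pi_Bo(beta(n)) = beta(pi_No(n)) for every node n.
    if any(pi_box[beta[n]] != beta[pi_node[n]] for n in beta):
        return False
    # Condition 2: the induced port bijection preserves and reflects gamma.
    image = {
        (port_image(p, beta, pi_node, pi_port),
         port_image(q, beta, pi_node, pi_port))
        for (p, q) in gamma
    }
    return image == gamma


# Hypothetical toy network: r1 and r2 each link to both r3 and r4.
beta = {"r1": "B1", "r2": "B2", "r3": "B3", "r4": "B4"}
links = [(("r1", 0), ("r3", 0)), (("r1", 1), ("r4", 0)),
         (("r2", 0), ("r3", 1)), (("r2", 1), ("r4", 1))]
gamma = set(links) | {(q, p) for (p, q) in links}

# Candidate symmetry: swap r3 and r4, and swap the two ports of r1 and r2.
pi_node = {"r1": "r1", "r2": "r2", "r3": "r4", "r4": "r3"}
pi_box = {"B1": "B1", "B2": "B2", "B3": "B4", "B4": "B3"}
swap, same = {0: 1, 1: 0}, {0: 0, 1: 1}
pi_port = {"B1": swap, "B2": swap, "B3": same, "B4": same}

symmetric = is_topological_symmetry(beta, gamma, pi_box, pi_port, pi_node)
# Leaving the ports of r1 fixed breaks the link condition.
broken = is_topological_symmetry(
    beta, gamma, pi_box, {**pi_port, "B1": same}, pi_node)
print(symmetric, broken)
```

In this sketch the swap of r3 and r4 is accepted only when the ports of r1 and r2 are permuted accordingly, which mirrors the way the box port bijections are tied to the node permutation in the definition.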
Turning to the semantics, suppose we have a header space PacN, and families of transition relations −→N,b (b ∈ Box), as usual. Then, given a signature symmetry as above, a packet bijection πPa : PacN ≅ PacN is a symmetry of the signature semantics, if, for all h, h′, b : K, and all i, i′ ∈ K, we have:

h @ i →b h′ @ i′ ⇐⇒ πPa(h) @ πPo^b(i) →πBo(b) πPa(h′) @ πPo^b(i′)

Figure 2 provides an example of a topological symmetry where the boxes R3 and R4, and their nodes would be permuted, but all others would be fixed, and where the ports are permuted as shown. If, further, the two boxes contained the same rules (modulo box port bijections) then the identity packet bijection would provide a signature semantics symmetry.

Given a topological symmetry πNo (comprising a signature symmetry and a node permutation) and a packet bijection πPa as above, we can define a bijection of the located packet set by:

πNet(h @ n.i) =def πPa(h) @ πPo(n.i)

and we have the following corollary of Proposition 2:

COROLLARY 1. If πPa is a symmetry of the signature semantics, then πNet is a one-step symmetry of N.

Turning to the logic, assume that for each packet formula α there is a formula απ such that:

h |= απ ⇐⇒ πPa⁻¹(h) |= α

We then obtain an evident homomorphic definition of ϕπ for N-formulas ϕ, where:

(@ p)π = @ πPo(p)
(¬ϕ)π = ¬ϕπ
(ϕ ∧ ψ)π = ϕπ ∧ ψπ
(♦ϕ)π = ♦ϕπ
(Fϕ)π = Fϕπ

Proposition 1 then tells us that, for any N-formula ϕ, we have:

h @ p |= ϕ ⇐⇒ πPa(h) @ πPo(p) |= ϕπ

We can use symmetry considerations to pick out candidates for traffic-redirection surgery. We would like to find two symmetrically placed nodes b and c which are candidates for redirecting traffic (from b to c), so wish to find a box symmetry πBo and an associated topological symmetry πNo which switches b and c. Let us say that the symmetry is local if πNo leaves all the other nodes invariant. For such a local symmetry b and c will have the same ΓN neighbors. If there is only one link between any two nodes, as is common, the signature symmetry is also then determined. So the suggestion is to look for two nodes with the same neighbors, and use the node connections to determine the port correlations between them (two ports are correlated if they link to the same neighbor). One can then choose pairs of such ports (other than any ports connecting b and c) to search for candidates for i′ and j′, or for k and l, for traffic redirection. Figure 2 again provides an example. The two required nodes are those corresponding to the two boxes R3 and R4.

We next sketch how to quotient a network N by a group of topological symmetries. In this way we may be able to "slim" fat trees, as discussed above. First, let us briefly recall the relevant background material (and see, e.g. [3, §17]). An action of a group G on a set X is a map · : G × X → X, such that the following two equations hold:

(g′ · g) · x = g′ · (g · x)
e · x = x

The orbit equivalence relation is then defined on X by:

x ∼G y ⇐⇒def ∃g ∈ G. g · x = y

and we write X/G for the set of equivalence classes [x] of elements x of X under this equivalence relation.

Under componentwise composition, the topological symmetries (πBo, πPo, πNo) form a group TopN, with three actions: on Box; on BoxPort =def {b.i | b : K, i ∈ K}; and on Node, where:

(πBo, πPo, πNo) · b =def πBo(b)
(πBo, πPo, πNo) · (b.i) =def πBo(b).πPo^b(i)
(πBo, πPo, πNo) · n =def πNo(n)

Let G be a subgroup of TopN. Let Box/G, BoxPort/G, and Node/G be the collections of orbit equivalence classes for each of its three actions, inherited from TopN. We define a signature on Box/G by setting [b] : {[b.i] | i ∈ K}, for each b : K (ignoring that the [b.i] are not natural numbers). Then we can define a quotient network N/G over the signature. It has node set Node/G, box assignment function βN/G, where βN/G([n]) =def [βN(n)], and connection relation γN/G where

m.q γN/G m′.q′ ⇐⇒def ∃n, n′, b, b′, i, i′. n.i γN n′.i′ ∧ βN(n) = b ∧ βN(n′) = b′ ∧ m = [n] ∧ m′ = [n′] ∧ q = [b.i] ∧ q′ = [b′.i′]

Assume next that the identity packet permutation is a symmetry of the signature semantics, given any signature component of any element of G. Then we can define a transition relation for Box/G, keeping the packet set the same as that of N, by:

h @ q →c h′ @ q′ ⇐⇒def ∃b, i, i′. h @ i →b h′ @ i′ ∧ c = [b] ∧ q = [b.i] ∧ q′ = [b.i′]

So we have defined the syntax and semantics of the quotient network N/G, as desired. There is a one-step bisimulation between N and N/G, formed by combining the identity relation on packets and the relation ∼Po on ports, where:

n.i ∼Po [n].[n.i] ⇐⇒def n ∈ [n] ∧ n.i ∈ [n.i]

Finally, assuming the evident logic for N/G, we have:

@ p1, . . . , pm ∼For [p] if [p] = {p1, . . . , pm}.

Figure 2 again provides an example, assuming it depicts a composite symmetry as discussed above. The permutation group G is that generated by the symmetry, when, for example, the box orbit equivalence classes would be singletons except for {R3, R4}. The right-hand side of the figure depicts the net after division by the symmetry group, with, for example, R3 depicting {R3, R4}. In general, one might have a number of such pairs of symmetrically placed boxes, when G would be the group generated by the various pair symmetries.

Example 6: Merging headers Suppose we have a signature Box with semantics given as above. Then a binary (signature) invariant is a bisimulation ∼Pa between the signature and itself. Spelling this out, what is required is that ∼Pa is a relation on the set of packet headers such that, for any h, h̄, if h ∼Pa h̄ then, for all boxes b : K and i, i′ ∈ K:

(i) h @ i −→b h′ @ i′ =⇒ ∃h̄′. h′ ∼Pa h̄′ ∧ h̄ @ i −→b h̄′ @ i′

holds for all h′, as does

(ii) h̄ @ i −→b h̄′ @ i′ =⇒ ∃h′. h′ ∼Pa h̄′ ∧ h @ i −→b h′ @ i′

for all h̄′.
Taking the other relations to be the relevant identity relations, Proposition 2 applies, and one obtains a one-step bisimulation of N by itself. Proposition 1 then tells us that for any formula ϕ and headers h ∼PacN h̄ we have:

h |=N ϕ ⇐⇒ h̄ |=N ϕ

provided ϕ is built, using the connectives, from formulas of the form @ p or formulas that are ∼Pa-invariant, by which is meant that

∀h, h̄. h ∼Pa h̄ =⇒ (h |=N ψ ⇐⇒ h̄ |=N ψ)

holds, and which are boolean combinations ψ of basic formulas.

We can go further, and identify equivalent packet headers. First, we can assume that ∼Pa is a partial equivalence relation (i.e., that it is transitive and symmetric, but not necessarily reflexive); for if it is not, one need only consider its transitive symmetric closure. The cases where reflexivity holds are given by the domain of ∼Pa, H =def {h | h ∼Pa h} (and H × Port is easily seen to be a network invariant). Restricted to its domain, ∼Pa is an equivalence relation and we can change the set of packet headers to its set of equivalence classes. That is, we change the header space to:

PacN/∼Pa =def {[h] | h ∈ H}

Having done so, we can define a new semantics for boxes, where:

[h] @ i −→b [h′] @ i′ ⇐⇒def h @ i −→b h′ @ i′

Then we can define the network N/∼Pa; this is the same as N, except for the above changes in packet headers and box semantics. As may be expected, N and N/∼Pa are one-step bisimilar, taking the packet header bisimulation ∈PacN to be membership:

h ∈PacN [h̄] ⇐⇒def h ∈ [h̄] (⇐⇒ h ∼Pa h̄)

and the other relations to be the relevant diagonals. Turning to logic, we take the basic formulas of N/∼Pa to be the ∼Pa-invariant boolean combinations ψ of basic formulas and set:

[h] |= ψ ⇐⇒ h |= ψ

Then, applying Proposition 1, we find that, for any N/∼Pa formula ϕ, and any h ∈ H:

h |=N ϕ ⇐⇒ [h] |=N/∼Pa ϕ

The maximal signature bisimulation provides a natural choice of binary invariant; it is automatically an equivalence relation.
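Conditions (i) and (ii) and the passage to equivalence classes can be illustrated on a toy signature. The sketch below is a hedged illustration in Python rather than the construction used in the experiments: the names `class_of`, `is_binary_invariant`, and `quotient`, the single box "r", and the four headers are all hypothetical, the candidate relation is assumed to be an equivalence given as a partition covering every header, and box transitions are listed explicitly as tuples. For an equivalence relation, (i) and (ii) together say that equivalent headers have the same successors up to equivalence, which is what the check compares.

```python
def class_of(partition):
    """Map each header to the (frozen) block of the partition containing it."""
    return {h: frozenset(block) for block in partition for h in block}


def is_binary_invariant(partition, transitions):
    """Check conditions (i) and (ii) for the equivalence given by `partition`.

    `transitions` holds tuples (box, h, i, h2, i2), read as h @ i -->box h2 @ i2.
    """
    cls = class_of(partition)
    # Successors of each (box, header, in-port), recorded up to equivalence.
    succ = {}
    for (b, h, i, h2, i2) in transitions:
        succ.setdefault((b, h, i), set()).add((cls[h2], i2))
    box_ports = {(b, i) for (b, _, i, _, _) in transitions}
    for block in partition:
        for (b, i) in box_ports:
            seen = {frozenset(succ.get((b, h, i), set())) for h in block}
            if len(seen) > 1:  # two equivalent headers disagree
                return False
    return True


def quotient(partition, transitions):
    """Box semantics on equivalence classes, induced by representatives."""
    cls = class_of(partition)
    return {(b, cls[h], i, cls[h2], i2) for (b, h, i, h2, i2) in transitions}


# Hypothetical box "r": headers 0 and 1 leave on port 1, headers 2 and 3 on port 2.
transitions = ({("r", h, 0, h, 1) for h in (0, 1)}
               | {("r", h, 0, h, 2) for h in (2, 3)})
good = [{0, 1}, {2, 3}]   # merges headers that are treated alike
bad = [{0, 2}, {1, 3}]    # merges headers that are forwarded differently

merged = quotient(good, transitions)
print(is_binary_invariant(good, transitions),
      is_binary_invariant(bad, transitions),
      len(transitions), len(merged))
```

With the partition `good`, the four explicit transitions collapse to two transitions on classes, a small instance of the reduction in header space that merging is meant to provide.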
In the case where the signature semantics does not change packet headers, that is where packet headers are forwarded unchanged, one can give a more explicit form of the maximal signature bisimulation, first (implicitly) considered by Yang and Lam [30] in the particular case where the output ports of the signature transition relations do not depend on the input ports. To see this, first note that in the case where packet headers are forwarded unchanged, ∼Pa is a signature invariant if, and only if, whenever h ∼Pa h then, for all boxes b : K and i, i0 ∈ K, we have: h @ i →b h @ i0 ⇐⇒ h @ i →b h @ i0 Next, identifying predicates on headers with sets of headers, given a set H of predicates, one can define an equivalence relation on headers by taking h ∼H h to hold if, and only if, for all H ∈ H we have: h ∈ H ⇐⇒ h ∈ H Then Yang and Lam’s atomic predicates are the equivalence classes of this relation. Taking H to be the sets {h | h @ i →b h @ i0 }, where b : K and i, i0 ∈ K, one obtains an equivalence relation ≡YL slightly generalising that of Yang and Lam, which we therefore call Yang-Lam equivalence. Yang and Lam demonstrated that impressive efficiencies in verification could be achieved by replacing packet headers by (representations of) their equivalence classes (as discussed above) since there can be many fewer equivalence classes than headers. in the networks they considered. As we shall see this also holds for the Singapore network. Comparing the above reformulation of signature invariants with Yang-Lam equivalence we see that ∼Pa is a signature invariant if, and only if, ∼Pa ⊆ ≡YL . This proves the first part of the following theorem: T HEOREM 3. 
Under the assumption that the signature semantics does not change packet headers, the following are the same:
• the maximal signature bisimulation
• Yang-Lam equivalence
Further, in case all boxes occur in N (i.e., are in the range of βN), then they are also the same as:
• the maximal relation on headers which, when combined with the identity relation on ports (as in Section 3), forms a bisimulation.

PROOF. For the proof of the second part of the theorem, the condition that a packet header relation ∼Pa forms a bisimulation when combined with the identity relation on ports is clearly equivalent to asking that, whenever h ∼Pa h′, for all nodes n, i ∈ K (where β(n) : K), and ports q we have:

h @ n.i →N h @ q ⇐⇒ h′ @ n.i →N h′ @ q (∗)

Fix a node n, and an i ∈ K (where b : K, setting b =def β(n)), and consider the possible q for which (∗) holds. There are three cases. In the first, q = n.j for some j and q is external. Here, inspection of the definition of →N tells us that (∗) is equivalent to

h @ i →b h @ j ⇐⇒ h′ @ i →b h′ @ j

In the second, q = n′.j′ and n.j γ n′.j′ for some port n′.j′ and j ∈ K. Inspection of the definition of →N again tells us that (∗) is equivalent to

h @ i →b h @ j ⇐⇒ h′ @ i →b h′ @ j

The third case is when q has neither of these forms, in which case (∗) is trivially true (more precisely, both sides of the equivalence are false). As every port at any node is either internal or external, and as every box is in the range of β, we see from the characterisation of signature bisimulations in the case that headers are left unchanged in signature transitions that ∼Pa forms a bisimulation when combined with the identity relation on ports iff it is a signature bisimulation.

7. Experiments

We describe our benchmark network and then show the use of variants of the surgeries of Examples 1 and 4, together with a header equivalence relation, to obtain large speedups compared to experiments described in earlier work [20].
The first is systematically applicable to all data center networks with a distinguished core; the others are applied automatically, in seconds, on large benchmarks; we also briefly describe how they are implemented.

7.1 Setup

As an initial experimental test of some of the ideas considered in this paper, we worked on a Microsoft production data center located in Singapore. This is a fairly large switching network, with 52 core routers, each with about 800 forwarding rules (but no ACLs), and with 90 ToRs with about 800 rules and 100 ACLs each. In total, this network has about 820K forwarding and ACL rules and is a reasonable example of a complex data center.

We used parsers to automatically extract routing tables from the Arista and Cisco devices in this network using a “show ip route” command. This produced a set of routing tables including the ECMP routing options. Each router is also annotated with a name from which it is easy to syntactically determine its level in the topology (e.g., router names starting with HL denote Host Leaf or Edge routers). This is standard practice; we did not have to add extra annotations manually. The rules in routers map headers to a set of next hop IP addresses; using a simple call to the Domain Name Service we map these next hops to router names to automatically extract the topology of the network. We use the NoD tools [20] to encode the networks. All results were obtained by running NoD [20] queries on both the original network and various transformations of it.

7.2 Speeding up All-Pairs Reachability

Our first experiment computed all-pairs reachability between client VMs on the Singapore data center; that is, for each pair of VMs we compute (a representation of) the set of headers of those packets that can travel from one VM to the other. This is ideally what needs to be done periodically in operating data centers to help catch configuration errors in routers.
Naively done, this requires a number of queries quadratic in the number of client VMs (Virtual Machines), and these usually number in the hundreds of thousands. For example, without exploiting the surgeries in this paper, NoD takes 131 hours (∼5.5 days) to prove that client VMs can reach each other using a single query encoding all-pairs reachability at once.

We did the following experiment based on our knowledge of how this network operates. We followed the idea of the simplest surgery, described in Example 1, automatically rewriting the core by a hub, much as shown in Figure 3. The hub was, however, more complicated: it connected all pairs of ToRs, not just one, and it took account of the fact that ToRs could be connected to the core by more than one port. We then applied NoD to the transformed network, running the same all-pairs reachability query. This now took only 2 hours.

Of course, to be fair one has to count the time to prove that the core network does indeed behave like the hub, i.e., that it does not filter out packets. The simplest approach is to use NoD itself to prove (see Figure 3) that, for all pairs of edge routers R1 and R2, all headers h (with source address in the prefix corresponding to R1 and destination address in the prefix corresponding to R2) reach R2 when sent from R1. This “proof” does not even require rewriting the network; it only requires computing reachability between every pair of edge routers in the original network. This “brute-force” verification that the core can be replaced by a hub still took only 1 hour and 40 minutes to complete. This is faster than the original 131 hours because all virtual machines such as V1 connected to R1 in Figure 3 are aggregated by a single prefix. Thus the original N² scaling is reduced to (N/64)², as edge routers typically have 64 ports.
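The arithmetic behind these figures can be checked with a short Python sketch. The VM count used here is a hypothetical value chosen to be divisible by 64; the running times and the port count are the ones quoted above, and the helper names are ours.

```python
def all_pairs(n: int) -> int:
    """Number of ordered (source, destination) pairs among n endpoints."""
    return n * n


def speedup(before_hours: float, after_hours: float) -> float:
    """Ratio of the running time before a change to the time after it."""
    return before_hours / after_hours


PORTS_PER_EDGE_ROUTER = 64  # typical figure quoted in the text
N_VMS = 102_400             # hypothetical VM count, divisible by 64

vm_pairs = all_pairs(N_VMS)                               # N^2
prefix_pairs = all_pairs(N_VMS // PORTS_PER_EDGE_ROUTER)  # (N/64)^2
reduction = vm_pairs // prefix_pairs                      # 64^2, independent of N

hub_query_hours = 2.0            # all-pairs query on the hub network
hub_proof_hours = 1 + 40 / 60    # brute-force check that the core acts as a hub
total_hours = hub_query_hours + hub_proof_hours
overall = speedup(131, total_hours)

print(f"pairs: {vm_pairs:,} -> {prefix_pairs:,} (factor {reduction})")
print(f"total: {total_hours:.2f} h, speedup about {overall:.0f}x")
```

Aggregating the VMs behind each edge router by a single prefix divides the number of pairs by 64² = 4096 whatever the value of N, which is why the brute-force check of the core is affordable.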
Below we will give a more sophisticated verification, which is much faster, taking only seconds, that exploits local rule surgery, much like Example 4. However, even using brute-force verification that the core can be replaced by a hub, the overall time drops from 131 hours to 3 hours and 40 minutes (about 3.67 hours), a speedup of roughly 36×.

7.3 Accelerating other Queries

We next apply the local surgery ideas introduced in Example 4 to speed up four sets of earlier experiments described in [20]. These checked common “beliefs” in the case of the Singapore data center. The first two beliefs checked were that neither Internet addresses nor customer VMs can access protected fabric controllers for security reasons. The third was that all “utility boxes” can reach all “fabric controllers”, and the fourth was that all “service boxes” can reach all “fabric controllers”. The definitions of fabric controllers, utility boxes, and service boxes are unimportant; the reader should think of these as classes of boxes with different reachability privileges.

The four queries each took several minutes to complete in the original network without the use of surgeries. Instead, we implemented a version of the Example 4 rule surgery that used the fact that the network forwards packets unchanged and without regard to entry ports, in which case we can describe the network transition relation using rules of the form h ↦ I, as described in Section 2.2. We then obtained the results given in Table 1. Note that, in contrast to the times in minutes on the original network, the same queries on the transformed network often ran in a couple of seconds after surgery, owing to the reduction in the number of rules.

The essence of Example 4 is to remove rules with respect to some set of headers T that do not impact reachability of headers h in T with respect to some set of ports G.
We take T to be a singleton consisting of a single header h, and G to be the set of ToR ports connecting to VMs, and write ≡h for ≡TG, the forwarding equivalence relation defined in Example 4.

[Figure 7: routers A, B, C, and D with rules rA, rB, rC, and rD for header h, and ports k, l, i, j, and e. The equivalences are derived in the order e ≡h e ⇒ i ≡h j ⇒ k ≡h l. A second diagram shows the network after redirection, with r′A in place of rA and without rB.]

Figure 7. We fix a header h and then inductively establish the port forwarding equivalence relation ≡h. Some rules can then be dropped after redirecting traffic, in accordance with the equivalence relation.

For example, consider Figure 7, where, for illustration, we suppose that k, l, i, and j are not in G; the rule rA sends h out only to k and l; the rule rB sends it out only to i; the rule rC sends it out only to j; and the rule rD sends it out only to e. Then ports i and j at D are forwarding equivalent with respect to header h because they forward to the same external port e. This inductively implies that ports k at B and l at C are equivalent because they respectively forward to the equivalent ports i and j. We can then redirect the h-traffic through A to go only to C via l, changing rA to the rule r′A. Note that the new network has the same forwarding relation ≡h as before, so we can do similar redirections for other rules. Now suppose that in Figure 7 the only way that h-traffic could reach B is via k. Then, after the redirections, the rule rB will be inaccessible and so can be pruned, as can any other inaccessible rules.

We implemented this idea with two data structures. First, we have to deal with the large number of potential headers used in the forwarding plane. For IPv4, this is up to 2³² headers even if considering only the destination IP for building equivalence classes (as we do). We mapped the original set of potential headers into a smaller set of equivalence classes where headers are in the same equivalence class if there are no two rules that can distinguish them.
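As an illustration of this notion of header equivalence, here is a minimal Python sketch that groups the headers of a toy 4-bit address space by the set of rule predicates matching them. It enumerates the header space explicitly, which is only feasible at toy sizes and is not how the classes are computed in practice; the prefix-string encoding and the function names are assumptions made for the example.

```python
from collections import defaultdict


def matches(prefix: str, header: int, width: int) -> bool:
    """True if the `width`-bit binary form of `header` starts with `prefix`."""
    return format(header, f"0{width}b").startswith(prefix)


def header_classes(prefixes, width):
    """Group headers by which prefixes match them.

    Two headers land in the same class exactly when every prefix in
    `prefixes` either matches both of them or matches neither.
    """
    groups = defaultdict(list)
    for h in range(2 ** width):
        signature = tuple(matches(p, h, width) for p in prefixes)
        groups[signature].append(h)
    return sorted(groups.values())


# Three hypothetical destination prefixes over 4-bit addresses.
classes = header_classes(["1", "10", "0110"], width=4)
for c in classes:
    print(c)
```

Here 16 headers fall into 4 classes, since three overlapping prefixes can distinguish only a few regions of the address space. The same effect, at scale, is what reduces 2³² destination addresses to a few thousand classes.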
For this purpose we used a data structure called disjoint decomposed normal form (ddNF), described in [5]. The associated equivalence relation is theoretically coarser than Yang-Lam equivalence, but we observe in [5] that the number of extra partitions is insignificant. Despite the fact that the number of rules in our data set was close to a million, the number of header equivalence classes was only around 4000, consistent with the results of [30]. It takes under four seconds to compute these equivalence classes for our network.

Next, we split rules so that each rule operates on a single header equivalence class. Then, for each such class h we compute the forwarding equivalence relation ≡h on ports, illustrated above. The algorithm refines port equivalence relations p ∼h q, represented as maps from header equivalence classes h to partitions on ports. Initially, the map maps each such h to the discrete partition. Then, in the style of congruence closure algorithms, we use union-find structures to maintain partitions [29]: until reaching the fixed point ≡h, for each class of headers h, we merge partitions containing p and q not in G if {p′ | h @ p −→N h @ p′} and {q′ | h @ q −→N h @ q′} are element-wise ∼h-equivalent, as are {p′ | h @ p N h @ p′} and {q′ | h @ q N h @ q′}. The element-equivalence check is fast because we can use the union-find root to find canonical equivalence class representatives and we can maintain the sets as sorted lists.

Finally, we rewrite all rules in all routers using the header and port equivalence relations: each rule in every router is rewritten to redirect traffic to the canonical representative of the equivalence relation between ports. Then rules that are no longer reachable, because all the rules that previously directed headers to them had their ports renamed, are garbage collected.
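The merging loop can be sketched in Python for a single header class as follows. This is a simplified illustration rather than our implementation: it models only one successor relation per port, the names (UnionFind, forwarding_equivalence, redirect) are hypothetical, every successor is assumed to appear in the port list, and the canonical representative is simply the smallest port name in each class.

```python
class UnionFind:
    """Disjoint sets with path halving; the smaller root is kept."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        keep, drop = min(ra, rb), max(ra, rb)
        self.parent[drop] = keep
        return True


def forwarding_equivalence(ports, succ, protected):
    """Merge ports outside `protected` whose successor sets agree
    up to the current partition, until nothing changes."""
    uf = UnionFind(ports)
    mergeable = sorted(p for p in ports if p not in protected)
    changed = True
    while changed:
        changed = False
        seen = {}  # canonical successor set -> first port with that set
        for p in mergeable:
            image = frozenset(uf.find(s) for s in succ.get(p, ()))
            if image in seen:
                changed |= uf.union(seen[image], p)
            else:
                seen[image] = p
    return uf


def redirect(targets, uf):
    """Rewrite a rule's output ports to their canonical representatives."""
    return {uf.find(t) for t in targets}


# Toy version of Figure 7: k forwards to i, l to j, and i and j to e.
ports = ["e", "i", "j", "k", "l"]
succ = {"k": {"i"}, "l": {"j"}, "i": {"e"}, "j": {"e"}}
uf = forwarding_equivalence(ports, succ, protected={"e"})
new_rA = redirect({"k", "l"}, uf)  # rule rA originally sends h to k and l
```

On this toy input the first pass merges i with j and then k with l, and the second pass confirms the fixed point. The redirected rule keeps a single output port; which member of the class is kept depends only on the choice of representative, so the sketch keeps k where the discussion of Figure 7 keeps l.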
After this transformation (which can be thought of as a set of transformations, one for each header equivalence class), we reran the original four queries described in the NoD paper [20], obtaining the results given in Table 1. In this experiment, we transformed a network with nearly a million rules to a new network with just over 10K rules. Not shown is the time to parse text files containing the data center network (i.e., translate from Cisco format to Datalog), which is about 6 seconds, and the time to perform the surgeries, which is 4 seconds. The overall speedups obtained ranged from 15× to 360×. We reiterate that both the identification and the rewriting involved are completely automatic and very fast.

Experiment                                        pre-op    post surgery
Internet Reaches Protected Fabric                 12 min    2s
VM Reaches Protected Fabric                       12 min    48s
Utility boxes can reach all fabric controllers    4 min     1.7s
Service boxes can reach all fabric controllers    6 min     1.6s

Table 1. Speedups for belief checking experiments in [20]

A major obstacle of the classical symmetry reduction program in model checking [6, 8, 14] is that it is often computationally hard to even identify the symmetries. We are much faster because we do not aim to find all symmetries, and we work at the fine structure of rules. Even naively, finding which pairs of rules are equivalent is only quadratic in complexity (per header class). The congruence closure algorithm described above is even more efficient. It is noteworthy that this particular surgery does not even require identifying which routers are backups for each other because we focus on rule and not box equivalence.

We complete the story by connecting the two experiments. Recall that Experiment 1 used a brute-force verification that the core can be replaced by a hub that took 1 hour and 40 minutes. We can use the rule surgeries just described to reduce even this time to under a second. Suppose the Edge Routers of a data-center are tor₁, . . .
, torₙ and we wish to check that, for example, torᵢ is reachable on the address range that it owns from all the other ToRs on any of the ports by which they connect to the core (we use the terms ToR and Edge Router synonymously). Then the typical data-center configuration ensures that all the ToRs, other than torᵢ, are forwarding equivalent for any packet h in the address range (by which we mean that the core ports they connect to are all ≡h-equivalent). We checked that this was the case for the Singapore data-center for all its ToRs, using the computed forwarding equivalence relations. We observe that this cuts down a quadratic number of routes to a linear number of representatives for the pairwise routes, as, for example, to check reachability of torᵢ from the others, one need only check reachability from any one of the core ports one of the others connects to. We checked these reachability queries on the Singapore data-center using a simple depth-first search algorithm on the transition graph of the heavily reduced network. The bottom line is that with more efficient verification, the overall time for all-pairs verification dropped from 131 hours to about 2 hours, a 65× speedup.

8. Related work

Symmetry reduction: Symmetry reduction has a long history in the model checking literature where it was used for verifying concurrent systems [6, 8, 14]. Ip and Dill [14] trace the use of symmetry for automatic verification to the 1980s [1, 21]. Symmetry is formally defined in terms of a certain permutation of participating processes specified by a group G. After symmetries are identified, model checking is done on a simpler structure quotiented by G. The permutations are usually discovered on the state components, i.e., they deal with the data aspects of the program. By contrast, we target particular permutations that arise from the replication of routers for load balancing and redundancy reasons.
Perhaps it is fair to say that our symmetries are driven by the “control-flow graph” of the network. Further, exploiting symmetries does not require modifying the network verification engine since it is done by modifying the network itself (and adjusting the property). Two practical difficulties with the classical symmetry reduction program are firstly finding and verifying the group G, and secondly dealing with the fact that real structures do not have perfect symmetries. We proceed differently. Rather than calculating the whole group, our aim is to find particular symmetries and divide the network by them, rather than its transition relation, intending to verify the simpler network in place of the original one. It is not hard to find such symmetries, at least as regards the topology of the network, especially for fat trees where we need only look at routers on the same level. However it may be that the topological symmetry does not preserve the transition relation, as there are a few differences in the rules of the two routers. So, rather than actually construct the quotiented network (which would anyway be expensive, even with perfect symmetry) we use the topological symmetry (possibly implicitly) to remove redundant rules, as illustrated in Section 4. Software verification methods for multi-threaded programs usually explore symmetry in the local data maintained by (almost) identical processes by applying the so-called thread modular proof rule, where the induction principle is adjusted to reflect the symmetry [9]. Our approach performs a surgery before the verification engine is run, as opposed to within the verification engine. Bisimulation: Bisimulation is at the core of NetKAT’s [2] decision procedure for its equational theory [11]. However, its current implementation does not exploit symmetries. Figure 3 in [11] suggests that on fat tree topologies the running time appears to grow rapidly with the number of hosts. 
Our approach, by contrast, relies on bisimulation in the preprocessing phase, but not the actual verification step. Our notions of symmetry and surgery might be beneficial for (and easily integrated into) the NetKAT decision procedure, perhaps as a preprocessing step. A similar approach could be adopted for other network verification tools that check forwarding properties [17, 30]. Flowlog [27] employs partial evaluation and weakening to simplify the network verification problem. Partial evaluation is not currently considered in our system but could be a valuable addition. Weakening is more challenging as our current setup is intimately connected with the notion of bisimulation, but not simulation. Kuai [23] relies on partial order reduction when checking properties of SDNs. Our traffic redirection can be seen as an instance of partial order reduction that is applied statically.

Semantics: While NetKAT [2] and related work [26] provide a semantically-oriented theory of networks, there is currently no provision for exploiting symmetry and network transformation. Note also that the research program in NetKAT is primarily top-down: defining a policy language and synthesizing networks that meet these policies. By contrast, our agenda is bottom-up: we start with existing networks and analyze their reachability policies.

9. Conclusion

If network verification is to become an integral part of operational procedure in large networks, then the time for comprehensive verification (all pairs of stations, all properties) should be less than the average time of a network reconfiguration. Reconfiguration typically takes hours, but is set to decrease to seconds in the presence of virtual machine migration [15] to optimize resource usage. Surveying the initial work on network verification, Zhang, Malik, and McGeer [33] say . . . initial results are based on modest sized systems.
However, overall, both FSM- and SAT-based approaches will need to be tested for larger scale systems, e.g. entire data centers or large scale enterprise networks. This will likely need development of new ideas in their solutions, or at the least adaptation of scaling techniques used in other domains. For example, large data centers are likely to have symmetry in their structure. This may enable the use of parametric model checking techniques [39], or symmetry reduction in model-checking and SAT-based techniques. Their application will open up new challenges.

Our paper addresses the challenge of verifying large scale networks by developing a theory of network transformation. Our experimental results show that the apparent complexity of a well-structured data center network (ostensibly 2³² headers, ∼820K rules) can reduce to ∼4,000 header equivalence classes and ∼10,000 rules after suitable transformations. In some sense, we are extending the research program of [30] which reduces the number of header equivalence classes but not the number of rules (or ports or routers). The final simpler underlying structure may not be as surprising as it seems at first: if it were more complex, it would be beyond the understanding of the humans who design and operate the network.

The initial experimental evidence in this paper shows that easy-to-code versions of simple transformations (Examples 1 and 4) can provide large speedups, reducing the time for comprehensive verification from days to 2 hours. Since the all-pairs verification task is easily parallelized, this suggests that comprehensive evaluation can be done using a 32-core machine in under 4 minutes, which is practical for immediate deployment. Other transformations such as slicing (see Examples 2 and 3) may well bring this time down further. The aim would be to achieve verification in the order of seconds, possibly also using incremental verification techniques as in [15].
Note that earlier results in [15] and [30] were not for all pairs, but only for single queries and for much smaller networks. The specific transformations we found useful for data center networks in our experiments may carry over to two other important classes of networks: enterprise networks and Internet Service Provider Networks. While neither uses fat-tree topologies, there are regularities in these designs that perhaps could be exploited using the methods of this paper. For example, most enterprise networks use a core network that interconnects a number of leaf networks, and Points of Presence (POPs) in ISP networks often use complete mesh topologies. Even if new transformations are needed, the bisimulation proof techniques may well still be applicable. While we have only implicitly touched on modularity (when we replaced the core by a wire in Section 7), we plan to extend the theory in this paper to allow modular verification; this would correspond to the compositionality properties of bisimulation in the process calculus, though, as there, composing properties of subsystems will no doubt present challenges. Other avenues for research include scaling quantitative verification (e.g., bandwidth and delay and not just reachability) and control plane verification [10] using network transformations. Finally, note that our semantics is relational, modeling nondeterminism. In particular partiality is used to model both dropped packets and infinite loops. One could instead adopt other semantic frameworks to model other aspects of networks, for example to distinguish packet dropping from infinite loops, or to model multicasting and/or probabilistic choice. We anticipate that one could then still follow the program of this paper and connect network verification with the then relevant notions of bisimulation. References [1] S. Aggarwal, R. Kurshan, and K. Sabnani. A calculus for protocol specification and validation. 
Protocol Specification, Testing, and Verification, 3(1), 1983. [2] C. J. Anderson, N. Foster, A. Guha, J.-B. Jeannin, D. Kozen, C. Schlesinger, and D. Walker. NetKAT: semantic foundations for networks. In POPL, 2014. [3] M. A. Armstrong. Groups and Symmetry. Springer, 1988. [4] S. Arun-Kumar. On bisimilarities induced by relations on actions. In SEFM, 2006. [5] N. Bjørner, G. Juniwal, R. Mahajan, S. A. Seshia, and G. Varghese. ddnf: An efficient data structure for header spaces. Technical report, Microsoft Research, November 2015. [6] E. M. Clarke, T. Filkorn, and S. Jha. Exploiting symmetry in temporal logic model checking. In CAV, 1993. [7] E. Emerson and A. Sistla. Symmetry and model checking. Formal Methods in System Design, 9(1-2):105–131, 1996. [8] E. A. Emerson and A. P. Sistla. Symmetry and model checking. In CAV, 1993. [9] C. Flanagan and S. Qadeer. Thread-modular model checking. In SPIN, 2003. [10] A. Fogel, S. Fung, L. Pedrosa, M. Walraed-Sullivan, R. Govindan, R. Mahajan, and T. Millstein. A general approach to network configuration analysis. In NSDI, 2015. [11] N. Foster, D. Kozen, M. Milano, A. Silva, and L. Thompson. A coalgebraic decision procedure for NetKAT. In POPL, 2015. [12] M. Hasegawa. Models of Sharing Graphs: A Categorical Semantics of let and letrec. PhD thesis, University of Edinburgh, 1997. [13] M. Hasegawa, M. Hofmann, and G. Plotkin. Finite dimensional vector spaces are complete for traced symmetric monoidal categories. In Pillars of Computer Science: Essays Dedicated to Boris (Boaz) Trakhtenbrot on the Occasion of His 85th Birthday, pages 367–385. Springer Berlin Heidelberg, 2008. [14] N. Ip and D. Dill. Better verification through symmetry. Formal Methods in System Design, 9(1), 1996. [15] P. Kazemian, M. Chang, H. Zeng, G. Varghese, N. McKeown, and S. Whyte. Real time network policy checking using header space analysis. In NSDI, 2013. [16] P. Kazemian, G. Varghese, and N. McKeown. Header space analysis: static checking for networks. 
In NSDI, 2012. [17] A. Khurshid, X. Zou, W. Zhou, M. Caesar, and P. B. Godfrey. VeriFlow: verifying network-wide invariants in real time. In NSDI, 2013. [18] J. F. Kurose and K. Ross. Computer Networking: A Top-Down Approach Featuring the Internet. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2nd edition, 2002. [19] Z. Li, M. Liang, L. O’Brien, and H. Zhang. The cloud’s cloudy moment: A systematic survey of public cloud service outage. International Journal of Cloud Computing and Services Science (IJCLOSER), 2(5):321–331, 2013. [20] N. P. Lopes, N. Bjørner, P. Godefroid, K. Jayaraman, and G. Varghese. Checking beliefs in dynamic networks. In NSDI, 2015. [21] B. Lubachevsky. An approach to automating the verification of compact parallel coordination programs. Acta Informatica, 21(2), 1984. [22] H. Mai, A. Khurshid, R. Agarwal, M. Caesar, P. B. Godfrey, and S. T. King. Debugging the data plane with Anteater. In SIGCOMM, 2011. [23] R. Majumdar, S. D. Tetali, and Z. Wang. Kuai: A model checker for software-defined networks. In FMCAD, 2014. [24] R. Milner. Communication and Concurrency. Prentice-Hall, 1989. [25] R. Milner. The Space and Motion of Communicating Agents. Cambridge University Press, 2009. [26] C. Monsanto, N. Foster, R. Harrison, and D. Walker. A compiler and run-time system for network programming languages. In POPL, 2012. [27] T. Nelson, A. D. Ferguson, M. J. G. Scheer, and S. Krishnamurthi. Tierless programming and reasoning for software-defined networks. In NSDI, 2014. [28] D. Sangiorgi. On the origins of bisimulation and coinduction. ACM Trans. Program. Lang. Syst., 31(4):15:1–15:41, May 2009. [29] R. E. Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22(2):215–225, 1975. [30] H. Yang and S. Lam. Real-time verification of network properties using atomic predicates. In ICNP, 2013. [31] H. Zeng, P. Kazemian, G. Varghese, and N. McKeown. Automatic test packet generation. In CoNEXT, 2012. [32] S. Zhang and S.
Malik. SAT based verification of network data planes. In ATVA, 2013. [33] S. Zhang, S. Malik, and R. McGeer. Verification of computer switching networks: An overview. In ATVA, 2012.
