Resource-Efficient Real-Time Scheduling Using Credit-Controlled Static-Priority Arbitration

Resource-Efficient Real-Time Scheduling Using Credit-Controlled Static-Priority Arbitration Firew Siyoum, Benny Akesson, Sander Stuijk, Kees Goossens, Henk Corporaal Eindhoven University of Technology, Department of Electrical Engineering, email:{f.m.siyoum, k.b.akesson, s.stuijk, k.g.w.goossens, h.corporaal}@tue.nl Abstract—A present-day System-on-Chip (SoC) runs a wide range of applications with diverse real-time requirements. Resources, such as processors, interconnects and memories, are shared between these applications to reduce cost. Resource sharing causes temporal interference, which must be bounded by a suitable resource arbiter. System-level analysis techniques use the service guarantee of the arbiter to ensure that realtime requirements of these applications are satisfied. A service guarantee that underestimates the minimum service provided by an arbiter results in more allocation of resources than needed to satisfy latency and throughput requirements. For instance, a linear service guarantee cannot accurately capture bursty service provision by many priority-based schedulers, such as Credit-Controlled Static Priority (CCSP) and Priority-Budget Scheduling (PBS). As a result, the timing analysis of these arbiters becomes too pessimistic. This leads to unnecessary cost penalties since some SoC resources, such as SDRAM bandwidth, are scarce and expensive. This paper addresses this problem for the CCSP arbiter. The two main contributions are: (1) a piecewise linear service guarantee that accurately captures bursty service provisioning, and (2) an equivalent dataflow model of the new service guarantee, which is an essential component to integrate the arbiter with dataflow-based system-level design techniques that analyze the worst-case latency and throughput of real-time applications. The new service guarantee enables efficient resource utilization under CCSP arbitration. Experimental results of an H.263 video decoder application show that memory bandwidth savings from 26% up to 67% can be achieved by using the new service guarantee as compared to the existing linear service guarantee. (PBS) [3] allow low latency to bursty requestors, which are users whose request rate fluctuate significantly over time. A linear service guarantee based on the latency-rate (LR) server model [4] cannot capture the bursty service provision by these arbiters. Therefore, it provides a pessimistic WCRT that leads to exaggerated latency and diminished throughput. As a consequence, a designer has to over-allocate resources to satisfy latency and throughput requirements. This may overshadow the benefits of these arbiters for SoC resource sharing. CCSP, in particular, has three key properties that make it suitable for scheduling access to shared MPSoC resources [2], [5]: (1) it bounds temporal interference and guarantees each requestor a minimum allocated service in a time interval, (2) it has an efficient allocation mechanism that enables high resource utilization, and (3) it has a small area hardware implementation that is also fast enough to keep up with the speed of SoC resources and make a scheduling decision every clock cycle. As shown in Figure 1, the current service guarantee of CCSP, referred to as a latency-rate service guarantee, is a linear lower bound on the provided service. However, like many priority-based arbiters, a requestor under CCSP can have bursty provided service intervals, as illustrated in the figure. As a result, the latency-rate service guarantee gives a pessimistic WCRT for CCSP arbitration. System-level timing analysis based on this service guarantee, consequently, results in over-allocation of SoC resources. I. I NTRODUCTION Convergence of application domains in consumer electronics requires a system to run a wide range of applications with diverse real-time requirements. To satisfy the computational requirements of these applications at low power consumption, sophisticated Multi-Processor Systems-on-Chip (MPSoC) with heterogeneous processing elements are used [1]. To reduce cost, system resources, such as processors, interconnects and memories, are shared between multiple users, which we refer to as requestors. However, resource sharing causes temporal interference between requestors, which must be bounded by a resource arbiter to guarantee a certain amount of service to each requestor. The minimum service an arbiter guarantees a requestor in an interval is referred to as its service guarantee. This guarantee bounds the worst-case response times (WCRT) of requestors, which is used by system-level analysis techniques to determine if real-time requirements, such as latency and throughput, of applications are satisfied. A service guarantee that underestimates the minimum service provided by an arbiter results in more allocation of resources than needed, which we refer to as over-allocation. For instance, priority-based schedulers, such as Credit-Controlled Static Priority (CCSP) [2] and Priority-Budget Scheduling Accumulated service units Index Terms—real-time; SoC; CCSP; service guarantee; over allocation; arbitration; dataflow; latency-rate server requested service provided service improvement in WCRT bursty provided service interval new bi−rate guarantee existing latency−rate guarantee Time [service cycle] Fig. 1. New piecewise linear vs. existing linear service guarantee. In this work, we show how to address the over-allocation problem through a piecewise linear service guarantee. Piecewise linear bounds have been previously used for trafficshaping [6]–[8], as well as computing response time of arbiters [9]. However, these works focus on regulating unbounded requested services to make service guarantee analysis feasible. On the other hand, our model is a piecewise linear lower bound on the worst-case provided service. The model is an improved service guarantee that accurately characterizes the provided service. Most importantly, this work shows how a piecewise linear service guarantee can be represented with an equivalent dataflow model. This brings the service guarantee from individual access requests to the application level. Ultimately, this allows us to compute the worst-case latency and throughput of applications that run on MPSoCs using dataflow-based system-level design techniques [10], [11]. This paper has two main contributions: (1) a piecewise linear service guarantee for CCSP, which we refer to as birate service guarantee, as shown in Figure 1, and (2) an equivalent dataflow model of the bi-rate service guarantee. As a result of the new bi-rate model, a given SoC resource under CCSP arbitration can support more requestors, or a given set of requestors can be accommodated with less resource capacity. Experiments on traced traffic of an H.263 video decoder show that we can save from 27% up to 67% of memory bandwidth, compared to the existing latency-rate service guarantee. The remaining part of this paper is organized as follows. In Section II, a review of related work is presented. Section III recaps the CCSP arbiter and presents the formal model used in this paper. The new bi-rate service guarantee and the associated dataflow model, which are the two major contributions of this paper, are presented in Sections IV and V, respectively. Section VI covers the experimental setup and obtained results. Finally, the paper concludes in Section VII, summarizing the highlights of this work. II. R ELATED WORK Simple and starvation-free schedulers are natural candidates to arbitrate resource accesses in real-time MPSoCs. This is because they provide upper bounds on temporal interference and have small and fast hardware implementations. Several of such arbiters have been proposed, including TDMA [12], and extensions of Round-Robin arbitration, such as Weighted Round-Robin [13], and Deficit Round-Robin [14]. These arbiters offer requestors a fixed allocated budget (rate), which is replenished within a fixed frame size (period). These framebased arbiters guarantee each requestor a minimum service proportional to its allocated rate. However, they fail to efficiently support diverse latency requirements without overallocating resources. For instance, they cannot provide low latency to a requestor with low bandwidth requirement without allocating more slots in a frame. This problem is referred to as coupling between latency and rate [15]. Over-allocation in these types of arbiters may also result from a similar coupling between allocation granularity and latency. For instance, a single slot in a TDMA table with four entries corresponds to 25% of the bandwidth. Over-allocation occurs if the requestor requires any less. The over-allocation can be reduced by increasing the number of entries, although, this increases the WCRT of all requestors sharing the resource [2], [15]. Priority Budget Scheduling (PBS) [3] addresses the coupling between latency and rate using priorities. However, it only improves the latency of a single bursty requestor with high priority and still couples allocation granularity and latency. On the other hand, efforts to decouple the allocation granularity from latency have been made in [16]–[18], although the resulting arbiters couple latency and rate, and are unable to efficiently distinguish diverse latency requirements. CCSP circumvents the shortcomings of existing resource arbiters, since (1) it efficiently supports diverse latency requirements, and (2) offers a small hardware implementation that runs sufficiently fast and keep up with resources like SDRAM memory controllers [19]. It has furthermore been shown in [2] that CCSP belongs to the class of LR servers, and hence offers a minimum guaranteed service to its requestors in an interval. This service guarantee is used at system-level to determine if latency and throughput requirements of applications are satisfied. However, the latency-rate service guarantee is a linear lower bound that cannot capture the bursty service provided by many priority-based schedulers. As a result, it yields a pessimistic WCRT for requestors under CCSP. This ultimately leads to unnecessarily excess resource allocations to attain the latency and throughput requirements. This paper presents a piecewise linear service guarantee for CCSP that captures the bursty provided service by modeling two distinct modes in the arbiter, namely when a requestor has enough budget for a bursty service, and when it does not. Our analysis is based on service curves, commonly used in analysis of timing bounds and buffer requirements in the field of communication networks [20]–[23]. A common similarity between these works is their use of traffic-shaping to enforce a certain request arrival curve. Piecewise linear bounds have also been employed for traffic-shaping/policing in [6]–[8], as well as for computing response time of real-time arbiters by bounding resource access requests [9]. However, these traffic shapers are applied to the requested service, whereas our model is a piecewise lower-bound on the provided service. Dataflow graphs allow analysis of the worst-case latency and throughput of real-time systems [24], [25]. State-of-theart dataflow-based system-level analysis techniques [10], [11] require various aspects of the system, such as computation, buffers and arbitration, to be modeled with dataflow components. Modeling resource sharing with dataflow components is not trivial, as equivalence in temporal behavior with the service guarantees of the arbiters should be proved using rigorous algebraic steps [26], [27]. In [28], a general dataflow model for arbiters in the class of LR servers is presented that also covers CCSP. However, the basis of this dataflow model is the current linear LR service guarantee, and hence suffers from the associated over-allocation problem for priority-based arbiters. In [29], a three-actor dataflow model is presented for the PBS arbiter. However, this model only targets shared memory access and the latency improvement applies only to the single high-priority requestor. In this paper, we present a new dataflow model for CCSP based on our new piecewise linear service guarantee that applies to any MPSoC resource as well as to requestors with any priority level. III. BACKGROUND The CCSP arbiter was originally proposed in [2]. A small and fast hardware implementation of the arbiter was presented in [5]. In this section, we iterate the basic operation of the CCSP arbiter and key definitions that are essential for a complete presentation of this work. In Section III-A, the formal model used in this paper is discussed. In Section III-B and III-C, the basic operation of the CCSP arbiter and the current service guarantee are explained using this formal model. A. Formal Model The analysis of the CCSP arbiter uses service curves [20] to model the interaction between the requestors and the resource. We use an abstract resource view where a service unit is the access granularity of the resource. Time is discrete and a time unit, referred to as a service cycle, is defined as the time required to serve such a service unit. We use closed discrete time intervals and hence [τ, t] includes all cycles in the sequence hτ, τ + 1, ..., t − 1, ti. A service curve is a cumulative function of service units. For any service curve ξ, ξ(t) denotes its value at service cycle t. We furthermore use ξ(τ, t) = ξ(t + 1) − ξ(τ ) to denote the difference in values between the endpoints of the closed interval [τ, t]. The set of requestors is denoted as R. The k th request from requestor r ∈ R is ωrk and its size in service units is s(ωrk ) ∈ N+ . Arriving requests from each requestor to the resource are placed in separate buffers in front of the resource. The arriving requests from a requestor are captured by the requested service curve, w, as defined in Definition 1. Definition 1 (Requested service curve): The requested service curve of a requestor r ∈ R is denoted wr (t) : N → N, where wr (0) = 0 and wr (t + 1) = wr (t) + s(ωrk ) wr (t) ωrk arrived at t + 1 no request arrived at t + 1 The scheduled requestor at time t is denoted γ(t) : N → R ∪ {∅} and receives service of one service unit. The service provided by the resource to a requestor is captured by the provided service curve, w′ , as given in Definition 2. Definition 2 (Provided service curve): The provided service curve of a requestor r ∈ R is denoted wr′ (t) : N → N, where wr′ (0) = 0 and ′ wr (t) + 1 γ(t) = r ′ wr (t + 1) = wr′ (t) γ(t) 6= r At any time t the difference between the requested service curve and the provided service curve gives us the number of service units waiting in the buffer for service. This is called the backlog of the requestor, defined in Definition 3. The concepts of requested service curve, provided service curve and backlog are illustrated in Figure 2. Definition 3 (Backlog): The backlog of a requestor r ∈ R at any time t is denoted qr (t) : N → N, and is defined as qr (t) = wr (t) − wr′ (t). Accumulated service units w q(τ ) w′ s(ω k ) τ the allocated burstiness (σr′ ) and the allocated rate (ρ′r ), as given in Definition 4. These two parameters, in combination, determine the fraction of resource capacity that the requestor is allocated. Definition 4 (Allocated service): The service allocation of a requestor r ∈ R is the pair (σr′ , ρ′r )P∈ R+ × R+ . For ′ a valid allocation, it should hold that ∀r∈R ρr ≤ 1 and ′ ∀r ∈ R : σr ≥ 1. CCSP uses a continuous budget replenishment policy, which is based on the concept of active periods, which is defined in Definition 5. Intuitively, the active period of a requestor is the maximum interval of time where (1) it is backlogged and/or (2) it is live, as defined in Definition 6. A requestor is said to be live at time t if the cumulative of the requested service units is not less than the total service the requestor would get if it were continuously getting service at its allocated rate. A requestor in its active period interval is said to be active and Rta denotes the set of active requestors at time t. Definition 5 (Active period): An active period of a requestor r ∈ R is defined as the maximum time interval [τ1 , τ2 ], such that ∀t ∈ [τ1 , τ2 ] : qr (t) > 0 ∨ wr (τ1 − 1, t − 1) ≥ ρ′r · (t − τ1 + 1). Definition 6 (Live requestor): A requestor r ∈ R is defined as live at a time t during an active period [τ1 , τ2 ] if wr (τ1 − 1, t − 1) ≥ ρ′r · (t − τ1 + 1). Potential refers to the amount of budget a requestor has and it is defined in Definition 7. The allocated burstiness, σ ′ , determines the initial potential of a requestor at the start of an active period, while the allocated rate, ρ′ , corresponds to the speed with which the budget is replenished during every service cycle in an active period. Together these two parameters determine the upper bound on the provided service of a requestor, ŵ′ , as illustrated in Figure 3. A high allocated burstiness entitles a requestor to more service before exhausting its potential, forcing it to surrender the resource to lower priority requestors. For every service unit a requestor is provided, the potential is decremented by one. Figure 3 illustrates the relationships between allocated service (ρ′ , σ ′ ), potential (π(t)) and active period ([τ1 , τ2 ] and [τ3 , τ4 ]). Definition 7 (Potential): The potential of a requestor r ∈ R is denoted πr (t) : N → R, where πr (0) = σr′ and πr (t) + ρ′r − 1 πr (t + 1) = πr (t) + ρ′r ′ σr Time [service cycle] Requested and provided service curves and related concepts. B. CCSP Arbiter A CCSP arbiter that arbitrates a set of requestors R sharing a common system resource consists of a rate regulator and a scheduler [2]. The scheduler uses a static-priority scheduling scheme. Each requestor is assigned a unique priority level and the set of requestors that have higher priority than a requestor r ∈ R is denoted Rr+ . In addition, each requestor has an allocated service that consists of two parameters: live line Accumulated service units Fig. 2. r ∈ Rta ∧ γ(t) = r r ∈ Rta ∧ γ(t) 6= r r∈ / Rta ∧ γ(t) 6= r upper bound on provided service w w′ π(t) ρ′ σ′ τ1 t τ2 τ3 τ4 Time [service cycle] Fig. 3. Allocated service, potential and active period. A requestor is said to be eligible for scheduling: (1) if it is backlogged i.e. qr (t) > 0 and (2) if it has enough potential for at least one service unit i.e. πr (t) ≥ 1 − ρ′r (since the potential of an active requestor increments by ρ′r every service cycle). The set of eligible requestors at time t is denoted Rte . The scheduler of CCSP is non-work-conserving and schedules the highest priority eligible requestor every service cycle. C. Latency-rate Service Guarantee Previous work [2] on the service guarantee of the CCSP arbiter shows that a requestor is guaranteed service according to its allocated rate, ρ′ , after a maximum latency, Θ. This linear service guarantee defines a lower bound, w̌′ , on the provided service curve during an active period (cf. Figure 4). This lower bound, which we refer to as the latency-rate service guarantee, shows that CCSP belongs to the class of LR servers [4] and it is used to derive the WCRT of requests. Based on this latency-rate service guarantee, an active requestor r ∈ R is guaranteed a minimum service during an active period [τ1 , τ2 ] according to the following relation: ∀t ∈ [τ1 , τ2 ] : w̌r′ (τ1 , t) = max(0, ρ′r · (t − τ1 + 1 − Θr )), where X σs′ Θr = ∀s∈Rr+ 1− X ρ′s (1) ∀s∈Rr+ IV. A RBITER A NALYSIS FOR B I - RATE S ERVICE G UARANTEE In this section, we present a piecewise linear service guarantee for the CCSP arbiter. The service guarantee of an arbiter is the minimal service it can provide to a requestor, irrespective of the situation of other requestors that are sharing the resource. This minimal service is normally computed by considering the worst-case scenario that leads to the WCRT. For a requestor under the CCSP arbiter, this worst-case scenario happens when it experiences the maximum interference from higher priority requestors. This is stated in Lemma 1 and proven in [2]. Lemma 1 (Maximum interference): The maximum interference experienced by a requestor r ∈ R during an interval [τ1 , τ2 ] occurs when all higher priority requestors start an active period at τ1 and remain active ∀t ∈ [τ1 , τ2 ], and equals X σs′ + ρ′s · (τ2 − τ1 + 1) (2) îr (τ1 , τ2 ) = ∀s∈Rr+ As discussed in Section III-C, the existing service guarantee of CCSP guarantees a requestor r ∈ R its allocated rate, ρ′r , after a maximum latency, Θr . However, in CCSP a requestor can be temporarily served at a rate higher than its allocated rate after its maximum latency. Intuitively, this can be explained as follows. At the end of the maximum latency, a requestor can have an accumulated potential from two sources, according to Definition 7: (1) from its allocated burstiness i.e. if σr′ > 1 and (2) from potential accumulated while being blocked by higher priority requestors, i.e. during the maximum latency, Θr . As a result, at the end of the maximum latency, a requestor has a potential that is equal to the sum of these two. When a requestor is at the end of its maximum latency, it implies that higher priority requestors have utilized their accumulated potential. Thus, they have to accumulate potential at their respective allocated rate during multiple service cycles, before they are eligible to access the resource again. Consequently, the requestor can use the resource whenever it is not used by higher priority requestors. This means, the requestor can get service at a higher rate ρ∗r ≥ ρ′r , where X ρ′s . (3) ρ∗r = 1 − ∀s∈Rr+ The requestor receives service at this higher rate as long as its potential does not go below 1 − ρ′r , which is the minimum potential a requestor needs to have to be eligible for scheduling. The service cycle at which the potential drops below 1 − ρ′r is referred to as the boundary cycle. After the boundary cycle the requestor has to wait multiple service cycles and accumulate potential at its allocated rate to be eligible. Therefore, it receives service at its allocated rate, ρ′r . If the potential does not drop below 1 − ρ′r throughout the active period, the boundary cycle is considered to be the last service cycle of the active period. In this case, the requestor receives all its service units in the active period with the higher service rate, ρ∗r . Definition 8 (Boundary Cycle): The boundary cycle of a requestor r ∈ R is defined as the maximum service cycle τrb during an active period [τ1 , τ2 ] of the worst-case scenario, such that ∀t ∈ [τ1 , τrb ] : πr (t) > 1 − ρ′r . (4) The boundary cycle can be determined based on Definition 8, as shown in Lemma 2. Note that in this paper, we only consider requestors for which ρ∗r > ρ′r , since otherwise the higher service rate is equal to the allocated rate. The case where ρ∗r = ρ′r can occur only P to the lowest priority requestor in a fully loaded resource i.e. ∀r∈R ρ′r = 1. Lemma 2: For a requestor r ∈ R with active period [τ1 , τ2 ] of the worst-case scenario, the boundary cycle τrb = min(τ2 , ⌊tx ⌋) where X σs′ σr′ −1+ρ′r + (5) ∀s∈Rr+ t x = τ1 + ∗ ′ ρ −ρ r r Proof: The potential of a requestor r at time t ∈ [τ1 , τ2 ] is given in Equation (6). The term σr′ + ρ′r · (t − τ1 ) is the potential the requestor gets and wr′ (τ1 , t − 1) is the potential it spends for the service units it receives in the interval [τ1 , t−1], according to Definition 7. πr (t) = σr′ + ρ′r · (t − τ1 ) − wr′ (τ1 , t − 1). (6) The service provided to requestor r during the worst-case scenario is given by Equation (7), where îr is the maximum interference, given by Lemma 1. It is computed by subtracting the maximum interference from higher priority requestors from the total service units available in the interval. wr′ (τ1 , t − 1) = t − τ1 − îr (τ1 , t − 1) (7) Since the higher service rate of CCSP is not captured in the latency-rate service guarantee, the timing analysis results in resource over-allocation to satisfy real-time requirements of requestors. To solve this problem, we propose a piecewise linear service guarantee, which we refer to as bi-rate service guarantee. The bi-rate service guarantee takes the higher service rate into account and improves the WCRT of requests, as shown in Figure 4. ∆ in the figure illustrates the improvement by using a bi-rate service guarantee. and the service guarantee is determined solely by the higher ˇr′ (τ1 , t) = w̌r′h (τ1 , t). In contrast, for rate linear equation, i.e. w̌ b ′a t > τr , w̌r (τ1 , t) is the minimum and the service guarantee ˇr′ (τ1 , t) = w̌r′a (τ1 , t). is determined by the allocated rate, i.e. w̌ If τ2 , i.e. the end of the active period, is less than tx , then the requestor gets service at its higher rate throughout the active period. w̌r′h (τ1 , t) ρ∗r w w ρ′ τb Fig. 4. τ2 Improving WCRT by piecewise linear model. Under the bi-rate service guarantee, a requestor r is guaranteed a minimum service at two different rates, ρ∗r and ρ′r , after a maximum latency Θr . These two rates correspond to the cases when a requestor has enough potential to access the resource at a high rate, and when it does not. The maximum latency, Θr , is the same as the one in the latencyrate service guarantee, which is given in Equation (1). The bi-rate guarantee is defined based on two linear equations: higher-rate guarantee w̌r′h and allocated-rate guarantee w̌r′a , which are given in Equations (8) and (9), respectively. w̌r′h (τ1 , t) = ρ∗r · (t − τ1 + 1 − Θr ) (8) w̌r′a (τ1 , t) (9) = ρ′r where Γr = − · (t − τ1 + 1 − Γr ) σr′ + ρ∗r − 1 ρ′r The new bi-rate service guarantee is, then, the minimum of the two at any time. Hence, during an active period [τ1 , τ2 ], a ˇr′ (τ1 , t), requestor r ∈ R is guaranteed a minimum service, w̌ as given in Equation (10). ˇr′ (τ1 , t) = max(0, min(w̌r′h (τ1 , t), w̌r′ (τ1 , t))) w̌ a Fig. 5. Higher-rate and allocated-rate guarantee. Time [service cycle] τrh = τrb − Θr . Time [service cycle] Θ Θr To guarantee a provided service at the higher rate, ρ∗ , the requestor has to ask for service at this rate. Thus, the new service guarantee applies only to requestors that have a minimum service request rate of ρ∗ at least up to time point τ h (cf. Figure 4), given in Equation (11). We refer to such requestors as high-rate requestors, defined in Definition 9. ρ∗ τh τ1 Γr ′ Accumulated Service Units ∆ ρ′r τrb τ1 live line w̌′ (latency-rate) ˇ′ (bi-rate) w̌ w̌r′a (τ1 , t) Accumulated Service Units Putting Equation (7) into Equation (6), gives us the complete relation for πr (t). Solving for tx from the relation πr (tx ) = 1 − ρ′r results in Equation (5). Then the boundary cycle τrb is the largest integer less than tx , since time is discrete, and does not exceed τ2 , which is the end of the active period. This gives us the relation τrb = min(τ2 , ⌊tx ⌋). (10) Γr is computed such that the intersection of the two linear equations, w̌r′h (τ1 , t) and w̌r′a (τ1 , t), is at tx , given in Equation (5). Hence, solving the two linear equations simultaneously shows that their intersection point is, indeed, at tx . If τ2 , which is the end of the active period, is greater than this intersection point tx , then the boundary cycle τrb = ⌊tx ⌋ according to Lemma 2. Therefore, as illustrated in Figure 5, for t ≤ τrb , w̌r′h (τ1 , t) is the minimum of the two equations (11) Definition 9: (High-rate requestor) A requestor r ∈ R is a high-rate requestor during an active period [τ1 , τ2 ] if ∀t ∈ [τ1 , τ2 ]: ∗ t ≤ τrh ρr · (t − τ1 + 1) wr (τ1 , t) ≥ wr (τ1 , τrh ) + ρ′r · (t − τrh ) Otherwise Next, we present the mathematical proof of this new service guarantee, but first two important lemmas are provided to simplify this proof. Equation (6) shows an interesting relationship between the provided service and potential. By rearranging, we arrive at Lemma 3. Lemma 3: For a requestor r ∈ R during an active period [τ1 , τ2 ], it holds that ∀t ∈ [τ1 , τ2 ] πr (t) ≤ σr′ − ρ′r ⇐⇒ wr′ (τ1 , t − 1) ≥ ρ′r · (t − τ1 + 1) (12) From Lemma 3, it has also been proved in [2] that the eligibility of a requestor can be determined by looking at its potential, as given in Lemma 4. Lemma 4: For a requestor r ∈ R during an active period [τ1 , τ2 ], it holds that ∀t ∈ [τ1 , τ2 ] πr (t) > σr′ − ρ′r =⇒ r ∈ Rte . (13) Theorem 1 defines the bi-rate service guarantee, making use of the concepts and results presented so far. Theorem 1: (Bi-rate service guarantee) During any active period [τ1 , τ2 ] and ∀t ∈ [τ1 , τ2 ], a high-rate requestor r ∈ R is guaranteed a minimum service according to Equation (10). Proof: The strategy of the proof is showing that the service guarantee holds for four different types of time intervals [τi , τj ] that cover all possible scenarios during the active period. Case 1: ∀t ∈ [τi , τj ] ∧ τj ≤ τrb : qr (t) > 0. This case covers time intervals before the boundary cycle where the requestor is backlogged. For this case, we want to show that a minimum service is guaranteed at the higher rate of the requestor; i.e. ˇr′ (τ1 , t) = w̌r′h (τ1 , t). Since we are before the boundary w̌ cycle, we know that ∀t ∈ [τi , τj ], πr (t) > 1 − ρ′r , according to Definition 8. This means the requestor is eligible, since the definition of this case states that it is also backlogged. The resource provides (τj − τi + 1) service units in the interval. Taking into account the maximum interference possible from higher priority requestors (cf. Lemma 1), the minimum service available to r is given as: ˇr′ (τi , τj ) = τj − τi + 1 − îr (τi , τj ). w̌ (14) Expanding and rearranging the right side of Equation (14) ˇr′ (τi , τj ) = ρ∗r · (τj − τi + 1 − Θr ), proving a minimum gives w̌ service at the higher rate for all timing intervals of this type. Case 2: ∀t ∈ [τi , τj ] ∧ τj ≤ τrb : qr (t) = 0. This case covers time intervals before the boundary cycle where the requestor is not backlogged. For time intervals of this type, we want to prove that a minimum service can be guaranteed at the higher rate. By Definition 3, the backlog of a requestor is the difference between the requested service and the provided service. If there is no backlog, this means the requested and provided services are equal i.e. wr (t) = wr′ (t). Since r is a high-rate requestor, it holds that the requested service wr (τ1 , t) ≥ ρ∗r · (t − τ1 + 1) for the interval [τ1 , τrh ]. Therefore, for this interval, wr′ (τ1 , t) ≥ ρ∗r ·(t−τ1 +1) ≥ w̌r′h (τ1 , t). Since wr (t) is a non-decreasing function, it also holds for the interval [τrh , τrb ] that wr′ (τ1 , t) ≥ w̌r′h (τ1 , t). Therefore, w̌r′h (τ1 , t) is the guaranteed service. Case 3: ∀t ∈ [τi , τj ] ∧ τi > τrb : πr (t) > σr′ − ρ′r . This case considers intervals after the boundary cycle where the requestor has more potential than its allocated burstiness. After the boundary cycle, the requestor is no more in the higher rate service interval of the active period. Therefore, in this interval we want to prove that a minimum service can be guaranteed ˇr′ (τ1 , t) = w̌r′a (τ1 , t). Since πr (t) > at its allocated rate, i.e. w̌ ′ ′ σr − ρr , the requestor is eligible according to Lemma 4. Similar arguments as in Case 1 of this proof regarding the available service units and the maximum interference give the guaranteed service as ρ∗r · (τj − τi + 1 − Θr ), which equals w̌r′h (τi , τj ). For t > τrb , w̌r′a (τi , τj ) is less than w̌r′h (τi , τj ). Therefore, w̌r′a (τi , τj ) can be taken as the guaranteed service. Case 4: ∀t ∈ [τi , τj ] ∧ τi > τrb : πr (t) ≤ σr′ − ρ′r . This case considers time intervals after the boundary cycle where the potential of the requestor is less than its allocated burstiness. The strategy of the proof is first to show that wr′ (τ1 , t − 1) ≥ w̌r′a (τ1 , t). Then, based on Definition 2 of provided service curve, it follows that wr′ (τ1 , t) ≥ wr′ (τ1 , t − 1) ≥ w̌r′a (τ1 , t). 1) The provided service wr′ (τ1 , t − 1) can be expanded as shown in Equation (15), since for any service curve ξ, ξ(τ, t) = ξ(t + 1) − ξ(τ ). wr′ (τ1 , t − 1) = wr′ (τ1 , τrb − 1)+ wr′ (τrb , τrb ) + wr′ (τrb + 1, t − 1). (15) We proceed by bounding the three terms in Equation (15) individually. 2) At τrb , requestor r is scheduled, because its potential drops, which happens only when a requestor gets service, according to Definition 7. This implies that wr′ (τrb + 1) = wr′ (τrb ) + 1. Using this, wr′ (τrb , τrb ) can be formulated as follows. wr′ (τrb , τrb ) = wr′ (τrb + 1) − wr′ (τrb ) = 1 3) From Case 1 and 2 of this proof, we already proved that wr′ (τ1 , τrb − 1) ≥ ρ∗r · (τrb − τ1 − Θr ) In addition, computing the intersection of the two linear equations ρ∗r · (τrb − τ1 − Θr ) and ρ′r · (τrb − τ1 ) + σr′ − 1 shows that for ∀t > τrb it holds that wr′ (τ1 , τrb − 1) ≥ ρ∗r · (τrb − τ1 − Θr ) ≥ ρ′r · (τrb − τ1 ) + σr′ − 1. 4) Since πr (t) ≤ σr′ − ρ′r we have from Lemma 3 that wr′ (τrb + 1, t − 1) ≥ ρ′r · (t − τrb ) 5) Now putting, the results from steps 2, 3 and 4 into Equation (15), we get: wr′ (τ1 , t − 1) ≥ 1 + ρ′r · (τrb − τ1 ) + σr′ − 1 + ρ′r · (t − τrb ) = 1 + ρ′r · (t − τ1 ) + σr′ − 1 ≥ ρ′r + ρ∗r + ρ′r · (t − τ1 ) + σr′ − 1 (Since 1 > ρ′r + ρ∗r ) = ρ′r · (t − τ1 + 1) + ρ∗r + σr′ − 1 = ρ′r · (t − τ1 + 1 − Γr ) ˇr′ (τ1 , t) = Hence, we can guarantee a minimal service w̌ ρ′r · (t − τ1 + 1 − Γr ) = w̌r′a (τ1 , t). In summary, the bursty service a requestor can receive depends mainly on its allocated burstiness and the allocated burstiness of its higher priority requestors. Increasing its allocated burstiness increases its potential at the start of the active period. Increasing the allocated burstiness of higher priority requestors enables the requestor to accumulate more potential at its allocated rate while being blocked by them. High accumulated potential means a requestor can receive service at a high rate, as it does not have to wait to accumulate potential to be eligible. The bi-rate service guarantee integrates this fact through the boundary cycle parameter, τ b , and guarantees each requestor high-rate service until its boundary cycle for every active period. This results in an improvement in the WCRT of requests as compared to latency-rate service guarantee that does not consider this bursty provided service. The reduction in the WCRTs of individual requests lead to improvements in latency and throughput of applications. Since designers are concerned in satisfying these real-time requirements, a system-level representation of the service gurantee is needed. Dataflow-based system-level design technique is one efficient approach to achieve this goal. In the next section, we present a dataflow model of the bi-rate service guarantee. A. Dataflow-based System-level Analysis A DFG [32] is a directed graph that consists of actors that are connected through channels. A channel represents a FIFO buffer through which actors communicate by sending tokens. A token represents an abstract data unit. The connection point between an actor and a channel is referred to as a port. An example DFG consisting of two actors X, Y and two channels cxx , cxy is shown in Figure 6(a). Channels that have the same source and destination actor are referred to as self-edges (e.g. channel cxx ). A number next to a black dot over a channel represents the number of initial tokens that are available in the channel at the beginning of the execution of the graph. A variety of DFGs exist today with different levels of analyzability, expressiveness and implementation efficiency [33]. Homogeneous Synchronous Dataflow graph (HSDFG) is one of them. In HSDFG, when there exists at least one token on every input port of an actor, the actor fires. At the end of the firing, the actor produces one token on each of its output ports. The execution duration of a single firing of an actor Y is denoted χ(Y ) ∈ R+ . Depending on the system aspect a given actor models, the implication of the execution duration may vary. For instance, for an actor that models a software process, the execution duration may represent its worst-case execution time (WCET) for a given processor type. Figure 6(b) shows an example binding of actors X and Y onto an example MPSoC platform. The platform is a multi-tile architecture, where each tile comprises a processing unit (P), a local memory (M) and a network interface (NI). By introducing dataflow components that model various sorts of architectural aspects, such as computation, storage, arbitration and communication, into the DFG, an architectureaware DFG can be created. A simple example of an Tile 1 P 1 cxx M Tile 2 X P NI Y M n1 NI X cxy N Communication Network cxy Memory Controller Y (a) DFG Fig. 6. X 1 n2 Shared Memory Y 1 (b) Binding of actors (c) Architectureaware DFG Use of DFGs for system-level design and analysis of MPSoCs architecture-aware HSDFG is shown in Figure 6(c). In this graph the initial tokens n1 and n2 model the allocated buffer sizes for the channel cxy on Tile 1 and Tile 2, respectively. The inter-tile communication latency between actors X and Y is modeled by the execution time of actor N , χ(N ). Timing analysis and scheduling of this system can be carried out on the architecture-aware DFG. For example, the worst-case throughput of the application is the inverse of the maximum cycle mean (MCM) of the graph, which is the largest average execution duration among all cycles in the graph [24]. The approach presented in this section can also be applied to complex graphs that use more expressive DFGs. Detailed discussions on dataflow-based system-level design and analysis techniques can be found in [10], [11] and [33]. B. Dataflow Model for CCSP arbitration We present a HSDFG model, which is shown in Figure 7, for service provision using CCSP arbitration for a requestor r ∈ R. The dataflow model consists of three actors: latency actor (Lr ), higher-rate actor (Hr ) and allocated-rate actor (Ar ). The number of initial tokens hr and the execution durations of the three actors are determined by the requestor’s allocated service and the allocated services of its higherpriority requestors, Rr+ . The number of initial tokens hr is given in Equation (16), where sr , provided in Equation (17), refers to the number of service units that can be served at the higher rate during every active period. k j ′ r (16) hr = sr − (sr −2)·ρ ρ∗ r sr = j Θr −Γr 1 − ρ1∗ ρ′ r r k (17) 1 Accumulated service units V. DATAFLOW M ODEL A dataflow graph (DFG) is a directed graph that allows high-level modeling and analysis of real-time applications. These graphs have efficient analysis techniques to compute throughput and buffer requirements of applications that are running on MPSoCs. They can capture cyclic data dependencies between processes of an application, which exist in many real-life systems. Because of these reasons, DFGs play an indispensable role in today’s MPSoC design flows that carry out application binding and resource scheduling [12], [30]. In addition, dataflow design tools enable fast design-spaceexploration (DSE) of MPSoCs to find arbiter configurations and other system parameters that satisfy latency and throughput requirements of applications [10], [11], [27]. State-of-the-art dataflow-based system-level analysis techniques require that the various aspects of the system, such as computation, storage and arbitration, to be modeled with dataflow components. Dataflow modeling of arbitration is not trivial, as equivalence in temporal behavior with the service guarantees of the arbiters should be proved using rigorous algebraic steps [26]–[29]. In this section, we present a dataflow model for CCSP arbitration, based on the bi-rate service guarantee previously derived in Section IV. First in Section V-A, we briefly introduce how DFGs can be used for system-level design and analysis of MPSoCs. Section V-B then presents the dataflow model for the CCSP arbiter, along with the intuition behind its execution. Due to space reasons, the complete mathematical proof of the dataflow model is not presented in this paper, but it is provided in [31]. Ar hr Lr ρ′r sr Hr ρ∗r Θr 1 (a) CCSP dataflow model Fig. 7. Time [service cycle] (b) Rate of provided service Dataflow model for CCSP arbitration. The execution durations of the three actors for a requestor r ∈ R are given in Equations (18), (19) and (20). The case where hr = 1 is unique from all other cases (hr > 1), since this implies that actors Hr and Ar cannot be executed in parallel. This is because there can exist a maximum of only one token at a time over their cyclic dependency. The execution duration of actor Ar , χ(Ar ) for hr = 1 is hence different from the other cases, as shown in Equation (20). χ(Lr ) = Θr (18) 1 ρ∗r (19) χ(Hr ) = χ(Ar ) = ( 1 ρ′r 1 ρ′r − 1 ρ∗ r if hr > 1 if hr = 1 (20) Figure 8 demonstrates the use of the CCSP dataflow model in the context of system-level design by extending Figure 6(c) to model CCSP arbitrated shared memory communication between actors X and Y of Figure 6(a). Actor N of Figure 6(c) is replaced by two CCSP dataflow models for the write and read latencies of actors X and Y . Actors Nx and Ny model the latencies over the communication network. 1 1 Ax Ay hy hx X Nx Lx Hx Ny Ly 1 1 Hy Y 1 1 n2 n1 token by actor Hr marks the finishing time of the j th service unit, denoted F(j). F(j) is determined by the arrival time of tokens from its three input channels, shown in Figure 7(a). These are: (1) the arrival of the j th token from the output of actor Lr , E(j) + Θr , (2) the finishing time of the previous firing, F(j − 1), which models that only one service unit can be served at a time and (3) the production time of the (j − hr )th token from actor Ar . This is because there are already hr initial tokens in this channel, representing the number of service units that can be served at the higher rate, ρ∗r , before the potential is exhausted at the boundary cycle. For every service unit served at the higher rate, one token is consumed from these initial tokens. As long as there are tokens available on this channel, service units can be served at the higher rate. This is because in order to fire, actor Hr does not have to wait for the production of tokens by actor Ar . However, after all tokens are consumed, the rate of production of tokens by actor Hr is determined by the rate of production of tokens by the actor Ar . According to Equation (20), this implies that service units are served at the allocated rate of the requestor, ρ′r . Notice that for hr = 1, the rate of token production by actor Hr is the sum of the execution durations of both actors Hr and Ar . This is because the two actors cannot be fired simultaneously, since there can exist a maximum of only one token over the cyclic dependency at a time. Bringing the above three limiting factors together, the finishing time of service units can be expressed with a maxplus equation [34] that bounds the completion time of service units, as shown in Equation (21). G(j) denotes the finishing time of the j th token at the output of the allocated-rate actor. cxy Fig. 8. Architecture-aware DFG with shared memory communication. The rate of service provision by the CCSP dataflow model, shown in Figure 7(b), is equivalent to the bi-rate service guarantee, previously given in Equation (10). In the remaining part of this section, the intuition behind the rate of service provision of the dataflow model is explained. Arriving requests at the input of the CCSP arbiter can be multiple service units long, i.e. for the k th request s(ω k ) ≥ 1. Each service unit is represented by a single token in our dataflow model. The arrival of a request at the input of the arbiter is analogous to the arrival of multiple tokens at the input of actor Lr . This means the arrival time of the j th token at actor Lr , denoted E(j), marks the arrival time of the j th service unit at the input of the arbiter. Actor Lr produces a single token for every arriving token after an execution duration of Θr service cycles, which models the maximum waiting time every service unit encounters. The production time of the j th token by the latency actor equals E(j) + Θr . It is important to notice that actor Lr does not have a selfedge with initial tokens. The number of initial tokens on a self-edge determines the auto-concurrency of the actor, which is the number of possible simultaneous firings of the actor. This implies that multiple tokens can wait in parallel, which accurately reflects what actually happens at the input of the arbiter. On the other hand, actors Hr and Ar have self-edges, each with a single initial token that implies only one service unit can be served at a time by the resource. Output tokens produced by actor Hr represent completed service units. This means that the production time of the j th max(E(j) + Θr , F(j − 1), F(j) = G(j − h)) + 0 1 ρ∗ r j>0 Otherwise (21) The mathematical proof of equivalence between Equation (21) and the bi-rate service guarantee, previously given in Equation (10), is provided in [31]. VI. E XPERIMENT R ESULTS This section presents experimental results that demonstrate the implications of our new bi-rate service guarantee for a real-life application. First, in Section VI-A, the experimental setup is presented. Then, Section VI-B shows improvements in guaranteed throughput by applying the bi-rate service guarantee. Finally, Section VI-C demonstrates how the throughput improvement translates to resource savings when satisfying real-time requirements. A. Experimental Setup Our experimental setup emulates a MPSoC with a SRAM memory controller that uses the CCSP arbiter to arbitrate memory accesses from multiple requestors. We limit the number of requestors to five, r1 to r5 , and they are assigned unique priority levels in decreasing order, giving r1 the highest and r5 the lowest priorities. Each requestor ri has an associated allocated rate (ρ′ri ) and allocated burstiness (σr′ i ). In real-time system design, the resource allocation of requestors should guarantee that each of them satisfies their timing requirements in the worst-case interference scenario. Priority assignment Memory controller models (ri ) 1 r1 H.263 decoder memory access trace −clock cycle −number of service units ... L ri Sr i w̌r′ i r2 w 5 7 x 10 6 5 4 3 2 Requested Service Curve ˇr′ ) Bi−rate Service Guarantee (w̌ 3 1 0 0 1 r3 and the application is assigned priority level 3 in the CCSP arbiter. We return to the experiment with these parameters later. The modified requested service curves (according to both service guarantees) are not shown in this figure, since they ˇr′ ) at this graph overlap with the two lower bounds (w̌r′ 3 and w̌ 3 scale. This graph shows the improvement in latency by using the bi-rate service guarantee, compared to the latency-rate rate service guarantee. Accumulated Service [service unit] The objective of this experiment is to demonstrate improvements in throughput and resource allocation by applying the bi-rate service guarantee as compared to the existing latencyrate service guarantee. For this purpose, a MATLAB module is written that models a simple SRAM memory controller with a 32-bit interface, both according to the latency-rate dataflow model of [28] and our new dataflow model, presented in Section V. Given a requested service curve, wri , the module computes two lower bounds on the provided service curve, ˇr′ , based on the latency-rate and the bi-rate dataflow w̌r′ i and w̌ i models, respectively. The experiment setup is illustrated in Figure 9, where the execution duration of actor Sri of the LR server model is given by χ(Sri ) = 1/ρ′ri . ′ Latency−rate Service Guarantee (w̌r3 ) 1 2 3 4 5 6 Time [clock cycle] 7 8 9 7 x 10 Ari r4 hri L ri Hr i ˇr′ w̌ i r5 1 Fig. 9. The experimental setup. A requested service curve w is generated by tracing memory transactions of a real-life application. The SimIt-ARM [35] cycle-accurate simulator is used to trace all external memory transactions (cache misses) from a StrongArm processor that runs an H.263 video decoder application (written by Telenor research, a popular public-domain implementation of H.263). The simulator is configured with a 16 Kbyte instruction cache with 32-byte blocks and a 16 Kbyte write-back data cache with 32-byte blocks. In addition, the memory reading and writing latencies of the simulator are set to zero while generating the trace. The service unit of the memory controller module is set to 32-bytes (the cache line) and the service cycle to 8 clock cycles. In the experiment, it is assumed that all memory read and write transactions stall the processor. The stalling of the processor is taken into account by modifying the requested service curve w in such a manner that the arrival time of every transaction (requested service unit) is delayed by the response time of the previous transaction. This is important as the trace is originally generated by setting the write and read latencies of the simulator to zero. B. Throughput Improvement In this experiment, we intend to demonstrate differences in guaranteed throughput according to the latency-rate and bi-rate service guarantees. Assuming the traced traffic represents the worst-case memory access pattern, we compute the throughput by taking the inverse of the total time required to decode one complete frame. This gives us the minimum throughput in frames per second (fps), since the worst-case memory access pattern yields the maximum total time duration to decode one frame. Figure 10 shows the service curves while the system decodes one complete frame where ∀i ρ′ri = 0.15, ∀i σr′ i = 2 Fig. 10. Requested service (generated with zero memory access latency) and lower bounds for the provided service based on latency-rate and bi-rate service guarantees. Table I shows the improvements in throughput as the assigned priority of the application is varied from 1 to 5. The allocated rate is fixed at 0.15 for all cases and the allocated burstiness of all requestors is varied together from 1 to 4. The result shows that requestors with average priority improve more than both high and low priority requestors. This is because high-priority requestors have smaller Θ, which implies that they accumulate less potential while being blocked and low-priority requestors have smaller ρ∗ relative to highpriority requestors (cf. Equation (3)). Though it is not shown here, we are also able to observe from this experiment that the achievable throughput improvements for different allocated rates have similar trends as the results shown in Table I. TABLE I T HROUGHPUT IMPROVEMENTS IN %. Priority 1 2 3 4 5 ∀i σr′ i = 1 27 36 44 41 24 ∀i σr′ i = 2 30 48 45 28 15 ∀i σr′ i = 3 30 54 36 21 11 ∀i σr′ i = 4 31 53 30 17 9 It is important to note that these achieved improvements are neither at the cost of additional latency penalty of lower priority requestors nor by additional allocated resource capacity. It is a direct result of the tighter bi-rate service guarantee that accurately models the service provided by the CCSP arbiter. C. Efficient Resource Utilization The improvements in throughput can be traded for resource savings, since the throughput requirement of a real-time requestor can now be satisfied with less allocated resources. Assume that our application runs on a 266 MHz processor and has a throughput requirement of 15 fps. Figure 11 shows all combinations of service allocations and assigned priorities that satisfy this requirement, based on both the latency-rate and bi-rate guarantees. Latency−rate model Bi−rate model 0.9 0.8 Resource saving 0.3 Priority 1 Priority 2 Priority 3 0.28 Priority 1 Priority 2 Priority 3 90 Priority 1 Priority 2 Priority 3 80 0.7 0.26 70 0.6 0.5 0.4 resource saving (%) allocated rate allocated rate 0.24 0.22 0.2 60 50 40 0.3 30 0.18 0.2 20 0.16 0.1 0 0 1 2 3 4 5 6 7 8 9 1011 allocated burstiness (a) Latency-rate model 0.14 10 0 1 2 3 4 5 6 7 8 9 1011 allocated burstiness 0 1 2 3 4 5 6 7 8 9 1011 allocated burstiness (b) Bi-rate model (c) Resource saving Fig. 11. Resource allocation for throughput-constrained application: a) shows latency-rate cannot capture burstiness and b) shows bi-rate can. It can be seen that by using the bi-rate service guarantee and without allocating any additional burstiness (keeping σr′ i = 1), the resource allocation can be reduced by 67% (from ρ′r = 0.68 to ρ′r = 0.22) at priority level 3, by 43% at priority level 2 and by 26% at priority level 1. The resource savings further increase when increasing the allocated burstiness. For instance, at priority level 3 and σr′ 3 = 10, the resource allocation can be reduced by 72% (from 0.68 to 0.19). These improvements are achieved without imposing any additional latency on lower-priority requestors, since the arbiter configuration is the same in both lower-bound computations. This experiment shows that the latency-rate service guarantee cannot capture bursty provided service and, hence, fails to completely decouple latency and rate. In this use case, the allocated rate must be increased at all priority levels to satisfy the throughput requirement of 15 fps. This results in substantial over-allocation of the resource. For instance, when the application is assigned priority level 3, 46% of the memory bandwidth is wasted due to the pessimism of the latency-rate model, which allocates 68% of the bandwidth. In contrast, the bi-rate service guarantee, presented in this paper, accurately models the behavior of the CCSP arbiter and accomplishes this at 22%, and consequently, enables efficient resource utilization by avoiding over-allocation of the resource. VII. C ONCLUSIONS The Credit-Controlled Static-Priority (CCSP) arbiter is suitable for scheduling shared System-on-Chip resources, such as memories. The existing linear service guarantee of the arbiter fails to capture that service may be provided in a bursty manner, resulting in over-allocation of resources to satisfy realtime requirements. In this paper, we address this drawback through a new piecewise linear (bi-rate) service guarantee. In addition, for system-level analysis, we present a novel dataflow model of the arbiter, based on the new service guarantee. As a result of the new bi-rate guarantee, a given resource under CCSP arbitration can support more requestors, or a given set of requestors can be accommodated with less resource capacity. Experimental results on traced traffic of an H.263 video decoder application show that savings from 26% up to 67% in memory bandwidth can be achieved by using the new bi-rate guarantee compared to the existing linear guarantee. R EFERENCES [1] “International Technology Roadmap for Semiconductors (ITRS),” 2009. [2] B. Akesson et al., “Real-Time Scheduling Using Credit-Controlled Static-Priority Arbitration,” in Proc. RTCSA, 2008. [3] M. Steine et. al, “A priority-based budget scheduler with conservative dataflow model,” in DSD, 2009. [4] D. Stiliadis and A. Varma, “Latency-rate servers: a general model for analysis of traffic scheduling algorithms,” IEEE/ACM Trans., 1998. [5] B. Akesson et al., “Efficient Service Allocation in Hardware Using Credit-Controlled Static-Priority Arbitration,” in Proc. RTCSA, 2009. [6] A. Francini and F. Chiussi, “Minimum-latency dual-leaky-bucket shapers for packet multiplexers: Theory and implementation,” in IWQoS, 2000. [7] H. J. Chao and J. S. Hong, “Design of an ATM shaping multiplexer with guaranteed output burstiness,” Computer Systems Science and Engineering, 1996. [8] F. Guillemin et. al, “Extremal traffic and bounds for the mean delay of multiplexed regulated traffic streams,” in INFOCOM, 2002. [9] S. Stein et. al, “A polynomial-time algorithm for computing response time bounds in static priority scheduling employing multi-linear workload bounds,” in ECRTS, 2010. [10] A. Bonfietti et. al, “An efficient and complete approach for throughputmaximal SDF allocation and scheduling on multi-core platforms,” in DATE, 2010. [11] C. Lee et. al, “A systematic design space exploration of MPSoC based on synchronous data flow specification,” Journal of Signal Processing Systems, Springer, 2010. [12] A. Hansson et. al, “CoMPSoC: A template for composable and predictable multi-processor system on chips,” ACM TODAES, vol. 14, no. 1, 2009. [13] M. Katevenis et. al, “Weighted round-robin cell multiplexing in a general-purpose ATM switch chip,” IEEE J. Sel. Areas Commun., vol. 9, no. 8, Oct. 1991. [14] M. Shreedhar and G. Varghese, “Efficient fair queueing using deficit round robin,” in Proc. SIGCOMM, 1995. [15] H. Zhang, “Service disciplines for guaranteed performance service in packet-switching networks,” Proc. IEEE, vol. 83, no. 10, 1995. [16] C. Kalmanek et. al, “Rate controlled servers for very high-speed networks,” Proc. GLOBECOM, 1990. [17] S. S. Kanhere and H. Sethu, “Fair, efficient and low-latency packet scheduling using nested deficit round robin,” High Performance Switching and Routing, 2001 IEEE Workshop on, 2001. [18] D. Saha et al., “Carry-over round robin: a simple cell scheduling mechanism for ATM networks,” IEEE/ACM Trans. Netw., vol. 6, no. 6, 1998. [19] B. Akesson et. al, “Predator: a predictable SDRAM memory controller,” in Proc. CODES+ISSS, 2007. [20] J.-Y. L. Boudec and P. Thiran, Network calculus: a theory of deterministic queuing systems for the Internet. Springer-Verlag New York, Inc., 2001. [21] R. Cruz, “A calculus for network delay. I. Network elements in isolation,” IEEE Trans. Inf. Theory, vol. 37, no. 1, 1991. [22] G. Kesidis and T. Konstantopoulos, “Worst-case performance of a buffer with independent shaped arrival processes,” IEEE Communications Letters, 2000. [23] E. Wandeler et. al, “Performance analysis of greedy shapers in real-time systems,” in DATE, 2006. [24] A. Ghamarian et. al, “Throughput analysis of synchronous data flow graphs,” in ACSD, 2006. [25] M. Geilen and S. Stuijk, “Worst-case performance analysis of synchronous dataflow scenarios,” in CODES/ISSS, 2010. [26] M. Bekooij et. al, “Performance guarantees by simulation of process,” in SCOPES, 2005. [27] M. Bekooij et. al, “Dataflow analysis for real-time embedded multiprocessor system design,” in chapter 15. Dynamic and Robust Streaming Between Connected CE Devices. Kluwer Academic Publishers, 2005. [28] M. Wiggers et. al, “Modelling run-time arbitration by latency-rate servers in dataflow graphs,” in SCOPES, 2007. [29] J. Staschulat and M. Bekooij, “Dataflow models for shared memory access latency analysis,” in EMSOFT, 2009. [30] A. Kumar et. al, “Multi-Processor System-Level Synthesis for Multiple Applications on Platform FPGA,” in FPL, 2007. [31] F. Siyoum et. al, “Dataflow Model for Credit-Controlled Static-Priority Arbitration,” http://www.es.ele.tue.nl/esreports/, TU/Eindhoven, Tech. Rep., 2010. [32] E. Lee and D. Messerschmitt, “Synchronous data flow,” Proc. IEEE, vol. 75, no. 9, 1987. [33] S. Stuijk, “Predictable Mapping of Streaming Applications on Multiprocessors,” Ph.D. dissertation, Eindhoven University of Technology, 2007. [34] B. Heidergott et. al, Max Plus at Work: Modeling and Analysis of Synchronized Systems. Princeton University Press, 2006. [35] http://simit-arm.sourceforge.net/.
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Related manuals
Download PDF
advertisement