# Applying Pay-Burst-Only-Once Principle for Periodic Power

Gang Chen (Technische Universität Muenchen), Kai Huang (Technische Universität Muenchen and Sun Yat-sen University), Christian Buckl (Fortiss GmbH), and Alois Knoll (Technische Universität Muenchen)

Pipelined computing is a promising paradigm for embedded system design. Designing a power management policy to reduce the power consumption of a pipelined system with nondeterministic workload is, however, nontrivial. In this article, we study the problem of energy minimization for coarse-grained pipelined systems under hard real-time constraints and propose new approaches based on an inverse use of the pay-burst-only-once principle. We formulate the problem by means of the resource demands of individual pipeline stages and propose two new approaches, a quadratic programming-based approach and a fast heuristic, to solve it. In the quadratic programming approach, the problem is transformed into a standard quadratic programming problem with a box constraint and then solved by a standard quadratic programming solver. Observing that the problem is NP-hard, the fast heuristic is designed to solve the problem more efficiently. Our approach is scalable with respect to the number of pipeline stages. Simulation results using real-life applications are presented to demonstrate the effectiveness of our methods.

Categories and Subject Descriptors: C.3 [Special-Purpose and Application-Based Systems]—Real-Time and Embedded Systems

General Terms: Algorithms

Additional Key Words and Phrases: Scheduling, energy, pay-burst-only-once, periodic power management, real-time system

ACM Reference Format: Gang Chen, Kai Huang, Christian Buckl, and Alois Knoll. 2015. Applying pay-burst-only-once principle for periodic power management in hard real-time pipelined multiprocessor systems. ACM Trans. Des. Autom. Electron. Syst. 20, 2, Article 26 (February 2015), 27 pages.
DOI: http://dx.doi.org/10.1145/2699865

A preliminary version of a portion of this article appeared in Proceedings of the Conference on Design Automation and Test in Europe 2013. This work has been partly supported by the China Scholarship Council and the German BMBF projects ECU (grant no. 13N11936) and Car2X (grant no. 13N11933). The authors would like to thank their sponsors for their support.

Authors' addresses: G. Chen, Department of Informatics, Technische Universität Muenchen (TUM), Boltzmannstraße 3, 85748 Garching bei München, Germany; K. Huang (corresponding author), School of Mobile Information Engineering, Sun Yat-sen University, Zhu Hai, China; email: [email protected]; C. Buckl, Fortiss GmbH, Germany; A. Knoll, Department of Informatics, Technische Universität Muenchen (TUM), Boltzmannstraße 3, 85748 Garching bei München, Germany.

ACM Transactions on Design Automation of Electronic Systems, Vol. 20, No. 2, Article 26, Pub. date: February 2015.

## 1. INTRODUCTION

With increasing requirements for high performance, multicore architectures are believed to be the major solution for future embedded systems. Many real-time applications, especially streaming applications, can be executed on multiple processors simultaneously to achieve parallel processing. When real-time applications are executed
on multicore architectures powered by batteries, minimizing the energy consumption is one of the major design goals, because an energy-efficient design will increase the lifetime, increase the reliability, and decrease the heat dissipation of the system. Pipelined computing is a promising paradigm for embedded system design, which can, in principle, provide high throughput and low energy consumption [Carta et al. 2007]. For instance, a streaming application can be split into a sequence of functional blocks that are computed by a pipeline of processors, where power-gating techniques can be applied to achieve energy efficiency. Performance constraints of a streaming application are usually imposed on two principal metrics, namely throughput and latency. Latency is the main concern for applications such as video/telephone conferencing and automatic pattern recognition, where latency beyond a certain bound cannot be tolerated. In the case of pipelined real-time systems, the latency of a streaming application can be expressed as the end-to-end deadline requirement within which the application must be processed through the pipeline.

Designing the scheduling policy for the pipeline stages under the requirements of both energy efficiency and timing guarantees is, however, nontrivial. In general, energy efficiency and timing guarantees are conflicting objectives; techniques that reduce the energy consumption of the system usually pay the price of longer execution times, and vice versa. Previous work on this topic either requires precise timing information of the system [Yu and Prasanna 2002; Xu et al. 2007] or tackles only soft real-time requirements [Javaid et al. 2011b; Carta et al. 2007]. In practice, however, such precise timing of task arrivals might not be guaranteed. Thus, the previous approaches cannot guarantee the worst-case deadline and cannot be applied to those embedded systems where violating deadlines could be disastrous.
Compared to the preceding work, ours tackles a pipelined event stream with nondeterministic workloads in hard real-time systems through an inverted use of the pay-burst-only-once principle for energy efficiency. This article studies the energy minimization problem of coarse-grained pipelined systems under hard real-time requirements. We consider a streaming application that is split into a sequence of coarse-grained functional blocks which are mapped to a pipeline architecture for processing. The workload of the streaming application is abstracted as an event stream, and the event arrivals of the stream are modeled as arrival curves in the interval domain [Le Boudec and Thiran 2001]. The event stream has an end-to-end deadline requirement, that is, the time by which any event in the stream travels through the pipeline should be no longer than this required deadline. The objective is thereby to find the optimal scheduling policies for the individual stages of the pipeline with minimal energy consumption while the deadline requirement of the event stream is guaranteed.

Intuitively, the problem can be solved by partitioning the end-to-end deadline into sub-deadlines for the individual pipeline stages and optimizing the energy consumption based on the partitioned sub-deadlines. However, any partition strategy based on the end-to-end deadline, and any follow-up optimization method, will count the burst of the event stream multiple times, which inevitably overestimates the needed resource for every pipeline stage and leads to poor energy savings. A motivating example in Section 4 demonstrates this drawback in detail. Therefore, a more sophisticated method is needed to tackle this problem. In this article, we develop a new approach to solve the energy minimization problem for pipelined multiprocessor embedded systems while guaranteeing the worst-case end-to-end delay. This article summarizes and extends the results presented in Chen et al. [2013].
Our idea for solving this problem lies in an inverse use of the well-known pay-burst-only-once principle [Le Boudec and Thiran 2001]. Rather than directly partitioning the end-to-end deadline, we compute for the entire pipeline one service curve that serves as a constraint for the minimal resource demand. The energy minimization problem is then formulated with respect to the individual resource demands of the pipeline stages. To solve this problem, we propose two heuristics, that is, a quadratic programming heuristic and a fast heuristic. In the quadratic programming heuristic, the minimization problem is transformed into a standard quadratic programming problem with a box constraint and then solved by a standard solver. Observing that the formulated problem is NP-hard, we present a fast heuristic that finds a suboptimal solution by analyzing the properties of the optimal solution, running with complexity O(mn) (where m and n are the number of stages and the number of sample steps, respectively). For simplicity, we consider power-gating energy minimization and use the periodic dynamic power management of Huang et al. [2009b, 2011a] to reduce the leakage power, that is, to periodically turn the processors of the pipeline on and off. In this work, we compute the periodic power management schemes offline, and the fixed Ton/Toff values for the processors of every pipeline stage are applied at runtime. With this approach, we can not only guarantee the overall end-to-end deadline requirement but also retrieve the pay-burst-only-once phenomenon, achieving a significant reduction of the energy consumption. In addition, our methods are scalable with respect to the number of pipeline stages. The contributions of this article are summarized as follows.
- A new method is developed to solve the energy minimization problem for pipelined multiprocessor embedded systems by inversely using the pay-burst-only-once principle.
- A minimization problem is formulated based on the needed resources of the individual stages of the pipeline architecture, and the formulation is transformed into a standard quadratic programming problem with box constraints. The formulated problem is proved to be NP-hard.
- A quadratic programming heuristic is developed to solve the formulated problem, and a formal proof is provided to show the correctness of our approach, that is, the guarantee on the end-to-end deadline requirement.
- A fast heuristic is developed to solve the formulated problem, running with complexity O(mn).

The rest of the article is organized as follows. Section 2 reviews related work in the literature. Section 3 presents the basic models and the definition of the studied problem. Section 4 presents the motivating example, and Section 5 describes the proposed approach. Experimental evaluation is presented in Section 6, and Section 7 concludes.

## 2. RELATED WORK

Pipelined computing is a promising paradigm for embedded system design, which can in principle provide high performance and low energy consumption. Pipelined multiprocessor systems are widely applied as a viable platform for high-performance implementation of multimedia applications [Shee and Parameswaran 2007; Javaid and Parameswaran 2009; Shee et al. 2006; Karkowski and Corporaal 1997]. Energy optimization for pipelined multiprocessor systems is an interesting topic for which a number of techniques have been proposed in the literature. Carta et al. [2007] and Alimonda et al. [2009] proposed a feedback control technique for dynamic voltage/frequency scaling (DVFS) in a pipelined MPSoC architecture with soft real-time constraints, aimed at minimizing energy consumption with throughput guarantees.
Each pipelined processor is associated with a dedicated controller that monitors the occupancy level of the queues to determine when to increase or decrease the voltage/frequency level of the processor. Javaid et al. [2011b] proposed an adaptive pipelined MPSoC architecture and a runtime balancing approach based on workload prediction to achieve energy efficiency. The authors in Javaid et al. [2011a] proposed a dynamic power management scheme for adaptive pipelined MPSoCs, in which the duration of idle periods is determined based on future workload prediction and used to select an appropriate power state for the idle processor. These approaches, however, assume soft real-time constraints and cannot be applied to hard real-time systems.

There are also methods [Davare et al. 2007; Hong et al. 2011; de Langen and Juurlink 2006, 2009; Liu et al. 2014; Yu and Prasanna 2002] for hard real-time systems. To guarantee the end-to-end delay, the authors in Liu et al. [2014] studied the problem of minimizing the number of processors required for scheduling end-to-end deadline-constrained streaming applications modeled as CSDF graphs, where the actors of a CSDF graph are executed as strictly periodic tasks. In Davare et al. [2007], the authors optimized periods for dependent tasks in hard real-time distributed automotive systems in order to meet the end-to-end constraints. In Hong et al. [2011], the authors proposed a distributed approach to assign local deadlines to periodic tasks on distributed systems to meet the end-to-end deadline constraints. To reduce the energy consumption, Yu and Prasanna [2002] presented an integer linear programming (ILP) formulation for the problem of frequency assignment for a set of periodic independent tasks on a heterogeneous multiprocessor system.
The authors in de Langen and Juurlink [2006, 2009] proposed leakage-aware scheduling heuristics that reduce the energy consumption by translating real-time applications with periodic tasks into DAGs using the frame-based scheduling paradigm and considering the trade-offs among DVFS, DPM, and the number of processors. These methods, however, require precise timing information such as periodic real-time events. In practice, this precise timing information of task arrivals might not be determined in advance. The nondeterminism in the timing of event arrivals results from two main causes: (a) an event may be triggered by the physical environment, which, in general, cannot be accurately predicted; (b) when a distributed system is considered, an event might be triggered by other events on different processing components, where variable execution workloads make the prediction of precise information on event arrivals extremely complicated. In the aforesaid research, there is no guarantee on when an event will arrive. Therefore, these approaches cannot be applied to guarantee the worst-case deadline in embedded systems where violating deadlines could be disastrous. Unlike previous work, we focus on improving energy efficiency in hard real-time embedded systems while guaranteeing that the system satisfies the worst-case deadline constraint.

To model irregular event arrivals, Real-Time Calculus (RTC) [Thiele et al. 2000], which is based on network calculus [Le Boudec and Thiran 2001], can be applied. Specifically, the arrival curve in RTC models an upper bound and a lower bound on the number of event arrivals, or on the demanded computation, within a specified time interval. Considering DVFS systems, Maxiaguine et al. [2005] computed a safe frequency at periodic intervals to prevent buffer overflow of a system. By adopting RTC models, Chen et al. [2009] explored the schedulability of the online DVFS scheduling algorithms proposed in Yao et al.
[1995]. Combining optimistic and pessimistic DVFS scheduling, Perathoner et al. [2010] presented an adaptive scheme for the scheduling of arbitrary event streams. When only dynamic power management (DPM) is considered, Huang et al. [2009b, 2011a] presented an algorithm to find periodic time-driven patterns to turn the processor on/off for energy saving. Online algorithms are proposed in Huang et al. [2009a, 2011b] and Lampka et al. [2011] to adaptively control the power mode of a system, procrastinating the processing of arrived events as long as possible. In one algorithm in Huang et al. [2009a, 2011b], a tight bound on event arrivals is computed based on historical information of event arrivals in the recent past. Instead of using historical information, a dynamic counter technique [Lampka et al. 2011] is used to predict the future workload. Compared to the preceding work, the distinct difference of ours is that we can tackle the correlation of a pipelined event stream by an inverted use of the pay-burst-only-once principle. With this new method, retrieving this correlation of the same event stream between different pipeline stages, we can compute longer deadlines for each pipeline stage and reduce the overall power consumption of the system.

## 3. MODELS AND PROBLEM DEFINITION

### 3.1. Hardware Model

Fig. 1. System model.

The hardware architecture we have chosen is a simplified one with no shared cache or shared bus among the processing cores. The processing cores are connected in a pipelined fashion via dedicated FIFOs. We consider the system with the pipeline architecture shown in Figure 1(a). The subtasks of a partitioned application are mapped to and executed on different processors. The processors communicate data only through distributed memory units.
Each memory unit can be organized as one or several FIFOs. The data communication and synchronization among processors are realized by blocking read and write software primitives. This kind of hardware architecture has been realized in Nikolov et al. [2008]. As the service curve of each stage can be computed offline for energy efficiency by our proposed approaches, the worst-case FIFO size of each stage can be determined by applying the analysis approach in Wandeler et al. [2006].

Each processor in the pipelined system has three power consumption modes, namely active, standby, and sleep, as shown in Figure 1(b). To serve events, the processor must be in active mode with power consumption $P_a$. When there is no event to process, the processor can switch to sleep mode with lower power consumption $P_\sigma$. However, switching from sleep mode to active mode causes an additional energy and latency penalty, denoted as $E_{sw,on}$ and $t_{sw,on}$, respectively. To prevent the processor from frequent mode switches, the processor can stay in standby mode with power consumption $P_s$, which is less than $P_a$ but more than $P_\sigma$, that is, $P_a > P_s > P_\sigma$. Moreover, the mode switch from active (standby) mode to sleep mode causes energy and time overheads, denoted by $E_{sw,sleep}$ and $t_{sw,sleep}$, respectively. Considering the overhead of switching the system from active mode to sleep mode, the system break-even time $T_{BET}$ denotes the minimum time length that the system should stay in sleep mode. If the interval in which the system can stay in sleep mode is smaller than $T_{BET}$, the mode-switch overheads are larger than the energy savings, and switching modes is therefore not worthwhile. The break-even time $T_{BET}$ can be defined as follows:

$$T_{BET} = \max\left\{ t_{sw},\ \frac{E_{sw}}{P_s - P_\sigma} \right\}, \quad (1)$$

where $t_{sw} = t_{sw,on} + t_{sw,sleep}$ and $E_{sw} = E_{sw,on} + E_{sw,sleep}$.
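As a quick illustration of Eq. (1), the break-even time can be computed directly from the mode-switch overheads and the standby/sleep power gap. The following is a minimal sketch with illustrative parameter values, not values taken from the article's experiments:

```python
def break_even_time(t_sw_on, t_sw_sleep, e_sw_on, e_sw_sleep, p_s, p_sigma):
    """Break-even time T_BET of Eq. (1): the minimum sleep interval for
    which switching to sleep mode actually saves energy."""
    t_sw = t_sw_on + t_sw_sleep        # total switching latency
    e_sw = e_sw_on + e_sw_sleep        # total switching energy
    return max(t_sw, e_sw / (p_s - p_sigma))
```

For instance, with a total switching latency of 2 time units, a total switching energy of 16 energy units, and a standby/sleep power gap of 4, sleeping only pays off for intervals of at least 4 time units.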
### 3.2. Energy Model

The analytical processor energy model in Martin et al. [2002], Wang and Mishra [2010], Jejurikar et al. [2004], and de Langen and Juurlink [2009] is adopted in this article; its accuracy has been verified with SPICE simulation [Martin et al. 2002; Wang and Mishra 2010; de Langen and Juurlink 2009]. The dynamic power consumption of the core at one voltage/frequency level $(V_{dd}, f)$ is given by

$$P_{dyn} = C_{eff} \cdot V_{dd}^2 \cdot f, \quad (2)$$

where $V_{dd}$ is the supply voltage, $f$ the operating frequency, and $C_{eff}$ the effective switching capacitance. The cycle length $t_{cycle}$ is given by a modified alpha-power model

$$t_{cycle} = \frac{L_d \cdot K_6}{(V_{dd} - V_{th})^\alpha}, \quad (3)$$

where $K_6$ is a technology constant and $L_d$ is estimated by the average logic depth of all instructions' critical paths in the processor. The threshold voltage $V_{th}$ is given as

$$V_{th} = V_{th1} - K_1 \cdot V_{dd} - K_2 \cdot V_{bs}, \quad (4)$$

where $V_{th1}$, $K_1$, $K_2$ are technology constants and $V_{bs}$ is the body bias voltage. The static power is mainly contributed by the subthreshold leakage current $I_{subn}$, the reverse bias junction current $I_j$, and the number of devices in the circuit $L_g$. It can be presented as

$$P_{sta} = L_g \cdot (V_{dd} \cdot I_{subn} + |V_{bs}| \cdot I_j), \quad (5)$$

where the reverse bias junction current $I_j$ is approximated as a constant and the subthreshold leakage current $I_{subn}$ is determined as

$$I_{subn} = K_3 \cdot e^{K_4 V_{dd}} \cdot e^{K_5 V_{bs}}, \quad (6)$$

where $K_3$, $K_4$, and $K_5$ are technology constants. To avoid the junction leakage power overriding the gain in lowering $I_{subn}$, $V_{bs}$ should be constrained between 0 and $-1$ V. Thus, the power consumption in active mode and in standby mode, that is, $P_a$ and $P_s$, under one voltage/frequency level $(V_{dd}, f)$ can be respectively computed as

$$P_a = P_{dyn} + P_{sta} + P_{on}, \quad (7)$$
$$P_s = P_{sta} + P_{on}, \quad (8)$$

where $P_{on}$ is the inherent power needed to keep the processor on.

### 3.3. Task Model

This article considers streaming applications that can be split into a sequence of tasks.
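The power model of Eqs. (2) through (8) in Section 3.2 can be sketched in a few lines. The constants below follow the 70nm technology parameters commonly quoted with Martin et al. [2002]; here they are illustrative assumptions, not values prescribed by this article:

```python
import math

# Illustrative 70nm-style technology constants (assumed, after Martin et al. [2002]).
K1, K2, K3, K4, K5 = 0.063, 0.153, 5.38e-7, 1.83, 4.19
K6, VTH1, LD, ALPHA = 5.26e-12, 0.244, 37.0, 1.5
CEFF, LG, IJ, P_ON = 0.43e-9, 4.0e6, 4.8e-10, 0.1

def max_frequency(v_dd, v_bs):
    """Operating frequency from the alpha-power cycle length, Eqs. (3)-(4)."""
    v_th = VTH1 - K1 * v_dd - K2 * v_bs          # threshold voltage, Eq. (4)
    t_cycle = LD * K6 / (v_dd - v_th) ** ALPHA   # cycle length, Eq. (3)
    return 1.0 / t_cycle

def mode_powers(v_dd, v_bs, f):
    """Active and standby power (P_a, P_s) at (v_dd, f), Eqs. (2), (5)-(8)."""
    p_dyn = CEFF * v_dd ** 2 * f                               # Eq. (2)
    i_subn = K3 * math.exp(K4 * v_dd) * math.exp(K5 * v_bs)    # Eq. (6)
    p_sta = LG * (v_dd * i_subn + abs(v_bs) * IJ)              # Eq. (5)
    return p_dyn + p_sta + P_ON, p_sta + P_ON                  # Eqs. (7)-(8)
```

Note that $V_{bs}$ must stay in $[-1, 0]$ V, as stated above, for the subthreshold term to dominate the junction leakage.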
As shown in Figure 1(a), an H.263 decoder is represented as four tasks (i.e., PD1, deQ, IDCT, MC) implemented in a pipelined fashion [Oh and Ha 2002]. To model the workload of the application, the concept of the arrival curve $\alpha(\Delta) = [\alpha^u(\Delta), \alpha^l(\Delta)]$, originating from network calculus [Le Boudec and Thiran 2001], is adopted. $\alpha^u(\Delta)$ and $\alpha^l(\Delta)$ provide the upper and lower bounds on the number of arrival events of the stream S in any time interval $\Delta$. Many other traditional timing models of event streams can be unified in the concept of arrival curves. For example, a periodic event stream with period $p$ can be modeled by a pair of staircase functions $\bar\alpha^u(\Delta) = \left\lfloor \frac{\Delta}{p} \right\rfloor + 1$ and $\bar\alpha^l(\Delta) = \left\lfloor \frac{\Delta}{p} \right\rfloor$. For a sporadic event stream with minimal interarrival distance $p$ and maximal interarrival distance $p'$, the upper and lower arrival curves are $\bar\alpha^u(\Delta) = \left\lfloor \frac{\Delta}{p} \right\rfloor + 1$ and $\bar\alpha^l(\Delta) = \left\lfloor \frac{\Delta}{p'} \right\rfloor$, respectively. Moreover, a widely used model to specify an arrival curve is the PJD model, where the arrival curve is characterized by period $p$, jitter $j$, and minimal interarrival distance $d$. In the PJD model, the upper arrival curve can be determined as $\bar\alpha^u(\Delta) = \min\left\{ \left\lceil \frac{\Delta + j}{p} \right\rceil, \left\lceil \frac{\Delta}{d} \right\rceil \right\}$. Figure 2 depicts arrival curves for the previous cases.

Fig. 2. Examples for arrival curves: (a) periodic events with period p; (b) events with minimal interarrival distance p and maximal interarrival distance p′ = 1.3p; (c) events with period p, jitter j = p, and minimal interarrival distance d = 0.75p.

Analogous to arrival curves, which provide an abstract event stream model, a tuple $\beta(\Delta) = [\beta^u(\Delta), \beta^l(\Delta)]$ defines an abstract resource model that provides upper and lower bounds on the available resources in any time interval $\Delta$. For further details, refer to Thiele et al. [2000].
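The staircase arrival curves above translate directly into code. A minimal sketch (the function names are ours, not from the RTC toolbox):

```python
import math

def alpha_u_periodic(delta, p):
    """Upper arrival curve of a periodic stream: floor(delta/p) + 1 for delta > 0."""
    return math.floor(delta / p) + 1 if delta > 0 else 0

def alpha_l_periodic(delta, p):
    """Lower arrival curve of a periodic stream: floor(delta/p)."""
    return math.floor(delta / p) if delta > 0 else 0

def alpha_u_pjd(delta, p, j, d):
    """Upper arrival curve of the PJD model: min(ceil((delta+j)/p), ceil(delta/d))."""
    if delta <= 0:
        return 0
    return min(math.ceil((delta + j) / p), math.ceil(delta / d))
```

The sporadic case reuses `alpha_u_periodic` with the minimal interarrival distance and `alpha_l_periodic` with the maximal one.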
Note that arrival curves are event based, meaning they specify the number of events of the stream in a time interval, while service curves are based on the amount of computation time. Therefore, a service curve $\beta$ has to be transformed into $\bar\beta$ to indicate the number of events of the stream that the processor can process in a specified time interval. Supposing that the execution time of an event is $c$, the transformation of the service curves can be done by $\bar\beta^l = \left\lfloor \frac{\beta^l}{c} \right\rfloor$ and $\bar\beta^u = \left\lceil \frac{\beta^u}{c} \right\rceil$. With these definitions, a processor with lower service curve $\bar\beta^{Gl}(\Delta)$ is said to satisfy the deadline $D$ for the event stream specified by $\alpha^u(\Delta)$ if the following condition holds:

$$\bar\beta^{Gl}(\Delta) \geq \alpha^u(\Delta - D), \quad \forall \Delta \geq 0. \quad (9)$$

Note that we adopt the same assumption as Maxiaguine et al. [2005], Huang et al. [2009a, 2009b], Lampka et al. [2011], and Chen et al. [2009] and consider the worst-case execution time (WCET) of each task to be predefined and given as a system input. As mentioned in the previous section, the hardware architecture that we have chosen is a simplified one with no shared cache or shared bus among the processing cores. In this sense, we can safely assume the WCETs of the running tasks as system inputs.

### 3.4. Problem Statement

This article considers periodic power management [Huang et al. 2009b], which periodically turns a processor on and off. In each period $T = T_{on} + T_{off}$, it switches the processor to active (standby) mode for $T_{on}$ time units, followed by $T_{off}$ time units in sleep mode, as shown in Figure 1(b). Given a time interval $L$, where $L \gg T$ and $\frac{L}{T}$ is an integer, suppose that $\gamma(L)$ is the number of events of event stream S served in $L$. If all the served events finish within $L$, the energy consumption $E(L, T_{on}, T_{off})$ of applying this periodic scheme is

$$E(L, T_{on}, T_{off}) = \frac{L}{T_{on} + T_{off}}(E_{sw,on} + E_{sw,sleep}) + \frac{L \cdot T_{on}}{T_{on} + T_{off}} P_s + \frac{L \cdot T_{off}}{T_{on} + T_{off}} P_\sigma + c \cdot \gamma(L)(P_a - P_s)$$
$$= \frac{L \cdot E_{sw}}{T_{on} + T_{off}} + \frac{L \cdot T_{on}(P_s - P_\sigma)}{T_{on} + T_{off}} + L \cdot P_\sigma + c \cdot \gamma(L)(P_a - P_s),$$

where $E_{sw}$ stands for $E_{sw,on} + E_{sw,sleep}$ for brevity. Given a sufficiently large $L$, without changing the scheduling policy, minimizing the energy consumption $E(L, T_{on}, T_{off})$ of a single processor amounts to finding $T_{off}$ and $T_{on}$ such that the average idle power consumption $P(T_{on}, T_{off})$ is minimized:

$$P(T_{on}, T_{off}) \overset{\text{def}}{=} \frac{1}{L}\left( \frac{L \cdot E_{sw}}{T_{on} + T_{off}} + \frac{L \cdot T_{on} \cdot (P_s - P_\sigma)}{T_{on} + T_{off}} \right) = \frac{E_{sw} + T_{on} \cdot (P_s - P_\sigma)}{T_{on} + T_{off}}. \quad (10)$$

By defining $K = \frac{T_{on}}{T_{on} + T_{off}}$, the average idle power consumption $P$ in (10) can be expressed in terms of $T_{off}$ and $K$ ($0 \leq K \leq 1$) as follows:

$$P(K, T_{off}) \overset{\text{def}}{=} \frac{E_{sw}}{T_{off}} + \left( (P_s - P_\sigma) - \frac{E_{sw}}{T_{off}} \right) \cdot K. \quad (11)$$

By analyzing (11), it is obvious that the following properties hold.

Property 1. $\forall T_{off}$ with $T_{off} \geq \frac{E_{sw}}{P_s - P_\sigma}$, $P(K, T_{off})$ reaches its minimum when $K$ reaches its minimum.

Property 2. $\forall T_{off}$ with $T_{off} < \frac{E_{sw}}{P_s - P_\sigma}$, $P(K, T_{off})$ reaches its minimum of $P_s - P_\sigma$ when $K = 1$.

According to Properties 1 and 2, when $T_{off} > \frac{E_{sw}}{P_s - P_\sigma}$ holds, the processing unit should be turned on as briefly as possible in each period. When $T_{off} \leq \frac{E_{sw}}{P_s - P_\sigma}$ holds, the processing unit should stay on all the time with $T_{off} = 0$. In this context, $\frac{E_{sw}}{P_s - P_\sigma}$ can be seen as the break-even time of the processing unit. Based on (10), the energy minimization problem of an $m$-stage pipeline can be formulated as minimizing the function

$$P(\mathbf{T}_{on}, \mathbf{T}_{off}) = \sum_{i=1}^{m} \frac{E_{sw}^i + T_{on}^i \cdot \left( P_s^i - P_\sigma^i \right)}{T_{on}^i + T_{off}^i}, \quad (12)$$

where $\mathbf{T}_{on} = [T_{on}^1\, T_{on}^2 \ldots T_{on}^m]$ and $\mathbf{T}_{off} = [T_{off}^1\, T_{off}^2 \ldots T_{off}^m]$. Now we can define the studied problem as follows. Given a pipelined platform with $m$ stages, an event stream S processed by this pipeline, and an end-to-end deadline requirement $D$, find a set of periodic power management schemes characterized by $\mathbf{T}_{on}$ and $\mathbf{T}_{off}$ that minimizes the average idle power consumption $P$ defined in (12) while guaranteeing that the worst-case end-to-end delay does not exceed $D$.
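Properties 1 and 2 are easy to check numerically from Eq. (11); a small sketch with illustrative power numbers (assumed values, chosen only so the arithmetic is exact):

```python
def avg_idle_power(k, t_off, e_sw, p_s, p_sigma):
    """Average idle power P(K, T_off) of Eq. (11)."""
    return e_sw / t_off + ((p_s - p_sigma) - e_sw / t_off) * k

# Illustrative parameters: E_sw = 4, P_s = 2.5, P_sigma = 0.5.
e_sw, p_s, p_sigma = 4.0, 2.5, 0.5
t_bet = e_sw / (p_s - p_sigma)                 # break-even T_off = 2.0

# Property 1: above the break-even T_off, a smaller K is better.
assert avg_idle_power(0.1, 2 * t_bet, e_sw, p_s, p_sigma) < \
       avg_idle_power(0.9, 2 * t_bet, e_sw, p_s, p_sigma)

# Property 2: below it, K = 1 is optimal and P collapses to P_s - P_sigma.
assert avg_idle_power(1.0, 0.5 * t_bet, e_sw, p_s, p_sigma) == p_s - p_sigma
```

The sign of the coefficient of $K$ in Eq. (11) is exactly what flips between the two regimes.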
## 4. MOTIVATING EXAMPLE

A phenomenon called pay-burst-only-once is well known and gives a tighter upper estimate on the delay when an end-to-end service curve is derived prior to the delay computation [Fidler 2003]. When a workload flow with a burst traverses a number of stages in sequence, the effect of the burst of the flow on the end-to-end delay bound is the same as if the flow traversed only one node. The end-to-end delay bound computed with this property can be tighter than the sum of the delay bounds of the individual nodes.

Fig. 3. Motivating example.

This section presents a motivating example where an event stream passes through a two-stage pipeline with a deadline requirement $D$. For simplicity, arrival curves in leaky-bucket form and service curves in rate-latency form [Le Boudec and Thiran 2001] are used. In this representation, an arrival curve is modeled as $\alpha(\Delta) = b + r \cdot \Delta$, where $b$ is the burst and $r$ the leaky rate. Correspondingly, a service curve is modeled as $\beta(\Delta) = R \cdot (\Delta - T)$, where $R$ is the service rate and $T$ the delay. A graphical illustration of the example is shown in Figure 3, where $D = 20$, $b = 5$, $r = 0.5$, and $R_1 = R_2 = 1$.

We first inspect the strategy of partitioning the end-to-end deadline and using the partitioned sub-deadlines for the two pipeline stages. For simplicity, we split $D$ equally, that is, $D/2$ for each stage. As shown in Figure 3(a), given the deadline requirement $D/2$ for the first pipeline stage, we obtain the maximal $T_1 = \frac{D}{2} - \frac{b}{R_1} = 5$, corresponding to the minimal service demand $\beta_1 = \Delta - 5$. Deriving the minimal $\beta_2$ for the second stage of the pipeline is more involved: we need the output arrival curve $\alpha'$ from the first stage. According to Le Boudec and Thiran [2001], $\alpha'(\Delta) = b + r \cdot T_1 + r \cdot \Delta$.
Now again, with a deadline requirement $D/2$ for $\alpha'$, we have $T_2 = \frac{D}{2} - \frac{b + r \cdot T_1}{R_2} = 2.5$.

Let us take a closer look at this solution. According to the concatenation theorem $\beta_{R_1,T_1} \otimes \beta_{R_2,T_2} = \beta_{\min(R_1,R_2),\,T_1+T_2}$, we get a concatenated service curve $\beta = \Delta - (T_1 + T_2) = \Delta - 7.5$. With this concatenated service curve, the maximal overall end-to-end deadline for $\beta_1$ and $\beta_2$ is 12.5, which is far stricter than $D$. This example indicates that the $\beta_1$ and $\beta_2$ obtained by partitioning the end-to-end deadline are too pessimistic. The pessimism comes from paying the burst $\frac{b}{R_1}$ again for the second stage of the pipeline, as well as the additional delay $\frac{r \cdot T_1}{R_2}$ inherited from the first stage, as the pay-burst-only-once principle points out. These effects accumulate at every stage of the pipeline, leading to even more pessimistic results as the number of pipeline stages increases. In addition, computing the resource demand of each stage requires the lower bound of the output arrival curve from the previous stage, and computing this output curve requires numerical min-plus convolution, which incurs considerable computational and memory overheads. In conclusion, the strategy based on partitioning the end-to-end deadline is not a viable approach, in particular for pipelined systems with many stages.

On the other hand, one can first derive the delay $T$ of the total concatenated service demand $\beta^l_T$; in this case $T = D - \frac{b}{\min(R_1, R_2)} = 15$, as shown in Figure 3(b). Any partition based on this $T$ results in smaller but valid service curves for each pipeline stage, as we can always retrieve the original end-to-end deadline by means of the pay-burst-only-once principle. For example, with an equal partition of $T$, both $T_1$ and $T_2$ are 7.5 and $D$ is still preserved. This is the basic idea of our approach, which is presented in the next section.
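The arithmetic of this example can be replayed in a few lines (a sketch of the numbers above, not of the article's general method):

```python
# Two-stage example: D = 20, burst b = 5, rate r = 0.5, R1 = R2 = 1.
D, b, r, R1, R2 = 20.0, 5.0, 0.5, 1.0, 1.0

# Strategy 1: split the end-to-end deadline equally (D/2 per stage).
T1 = D / 2 - b / R1                          # latency budget of stage 1: 5.0
T2 = D / 2 - (b + r * T1) / R2               # stage 2 pays the burst again: 2.5
# Concatenated curve beta = Delta - (T1 + T2); its end-to-end delay bound:
d_partition = (T1 + T2) + b / min(R1, R2)    # 12.5, far stricter than D = 20

# Strategy 2 (inverse use): derive the total latency budget first, then split.
T_total = D - b / min(R1, R2)                # 15.0, e.g. 7.5 per stage
```

The gap between 12.5 and 20 is precisely the price of paying the burst (and the first stage's jitter) twice.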
## 5. PROPOSED APPROACH

Our approach lies in an inverse use of the pay-burst-only-once principle, as mentioned in the previous section. Rather than directly partitioning the end-to-end deadline, we compute one service curve for the entire pipeline, which serves as a constraint for the minimal resource demand. The energy minimization problem is then formulated with respect to the resource demands of the individual pipeline stages. To solve this minimization problem, the formulation is transformed into a quadratic programming form and solved by a two-phase heuristic.

Without loss of generality, a pipelined system with $m$ heterogeneous stages ($m \geq 2$) is considered. The processor of the $i$th stage can provide the minimal service $\beta_i^{Gl}$. Since periodic power management is considered, the minimal service $\beta_i^{Gl}$ can be modeled by a $(T_{on}^i, T_{off}^i)$ pair:

$$\beta_i^{Gl}(\Delta) = T_{on}^i \cdot \left\lceil \frac{\Delta - T_{off}^i}{T_{on}^i + T_{off}^i} \right\rceil \otimes \Delta. \quad (13)$$

The derivation of Eq. (13) is presented in Lemma A.1 in the appendix. In addition, to obtain a tight lower bound of the service curve of the entire pipeline, we restrict $T_{on}^i$ to be a multiple of the worst-case execution time $c_i$, that is, $T_{on}^i = n_i c_i$, $n_i \in \mathbb{N}^+$.

### 5.1. Problem Formulation

Regarding the problem formulation, we first present an approximation approach (see Lemma 5.1) to derive a lower bound of the PPM service curve. Using this approximated curve, we derive the concatenated service curve directly (see Lemma 5.2), which can be used to guarantee the real-time properties (see Theorem 5.3). Then, the energy minimization problem is formulated with respect to the resource demands of the individual pipeline stages. Before presenting the formulation, we first state a few basics. By defining $K_i = \frac{T_{on}^i}{T_{on}^i + T_{off}^i}$, we have the following two lemmas.

LEMMA 5.1. $\bar\beta_i^{Gl}(\Delta) \geq \frac{K_i}{c_i}\left(\Delta - T_{off}^i - c_i\right)$.

PROOF. According to the definition of the min-plus convolution operation, the inequality $\lfloor a + b \rfloor \geq \lfloor a \rfloor + \lfloor b \rfloor$, and Eq.
(13), we have

    β̄_i^Gl(Δ) ≥ ⌊ T_on^i · ⌈(Δ − T_off^i)/(T_on^i + T_off^i)⌉ / c_i ⌋ ⊗ ⌊Δ/c_i⌋.

With the restriction T_on^i = n_i·c_i, n_i ∈ N+, the identity ⌊n_i·⌈a⌉⌋ = n_i·⌈a⌉, and ⌈a⌉ ≥ a, we have

    ⌊ T_on^i · ⌈(Δ − T_off^i)/(T_on^i + T_off^i)⌉ / c_i ⌋ = n_i · ⌈(Δ − T_off^i)/(T_on^i + T_off^i)⌉
                                                         ≥ n_i · (Δ − T_off^i)/(T_on^i + T_off^i)
                                                         = (K_i/c_i) · (Δ − T_off^i).

According to ⌊a⌋ ≥ a − 1, we have ⌊Δ/c_i⌋ ≥ (1/c_i)·(Δ − c_i). According to the rule for the min-plus convolution of rate-latency service curves, β_{R1,T1} ⊗ β_{R2,T2} = β_{min(R1,R2),T1+T2} in Le Boudec and Thiran [2001], and K_i ≤ 1, we have

    (K_i/c_i)·(Δ − T_off^i) ⊗ (1/c_i)·(Δ − c_i) = min(K_i/c_i, 1/c_i)·(Δ − T_off^i − c_i) = (K_i/c_i)·(Δ − T_off^i − c_i).

This gives the right-hand side of the inequality. □

LEMMA 5.2. ⊗_{i=1}^m β̄_i^Gl(Δ) ≥ min_{i=1..m}(K_i/c_i) · (Δ − Σ_{i=1}^m (T_off^i + c_i)).

PROOF. According to the rule for the min-plus convolution of rate-latency service curves, β_{R1,T1} ⊗ β_{R2,T2} = β_{min(R1,R2),T1+T2} in Le Boudec and Thiran [2001], and Lemma 5.1, we have

    ⊗_{i=1}^m β̄_i^Gl(Δ) ≥ ⊗_{i=1}^m (K_i/c_i)·(Δ − T_off^i − c_i) = min_{i=1..m}(K_i/c_i) · (Δ − Σ_{i=1}^m (T_off^i + c_i)). □

With Lemma 5.2, we state the next theorem.

THEOREM 5.3. Assume an event stream modeled by an arrival curve α is processed by an m-stage pipeline whose lower service curve at each stage is defined by a (T_on^i, T_off^i) pair. The pipelined system satisfies an end-to-end deadline D if the following condition holds:

    min_{i=1..m}(K_i/c_i) · (Δ − Σ_{i=1}^m (T_off^i + c_i)) ≥ α^u(Δ − D),  ∀Δ ≥ 0.    (14)

PROOF. By Lemma 5.2, the left-hand side of Eq. (14) is a lower bound of ⊗_{i=1}^m β̄_i^Gl, the concatenated service curve of the pipeline. With ⊗_{i=1}^m β̄_i^Gl(Δ) ≥ α^u(Δ − D), the end-to-end delay of the pipeline is no more than D according to the pay-burst-only-once principle. Therefore, the theorem holds. □

The left-hand side of inequality Eq.
(14) can be considered as a bounded delay function bdf(Δ, ρ0, b0) = max(0, ρ0·(Δ − b0)) with slope ρ0 = min_{i=1..m}(K_i/c_i) and bounded delay b0 = Σ_{i=1}^m (T_off^i + c_i). For the stream S with deadline D, a set of minimum bounded delay functions bdf_min(Δ, ρ, b) can be derived by varying b (see Section 5.2). Therefore, we should find a solution [K⃗, T⃗_off] such that the resulting bounded delay function bdf(Δ, ρ0, b0) is no less than the minimum bounded delay function bdf_min(Δ, ρ, b). We can thus formulate our optimization problem as follows:

    minimize    P(K⃗, T⃗_off)
    subject to  min_{i=1..m} (K_i/c_i) ≥ ρ,
                Σ_{i=1}^m (T_off^i + c_i) ≤ b,                      (15)
                0 ≤ K_i ≤ 1,   i = 1, …, m,
                T_off^i ≥ 0,   i = 1, …, m,

where K⃗ = [K_1, …, K_m]. P(K⃗, T⃗_off) is obtained as follows by applying the transformation K_i = T_on^i/(T_on^i + T_off^i) to the average power consumption (10) of each stage:

    P(K⃗, T⃗_off) = Σ_{i=1}^m [ E_sw^i·(1 − K_i)/T_off^i + (P_s^i − P_σ^i)·K_i ].

The advantage of formulation (15) is twofold. First, the service curves of the individual pipeline stages are the variables of the optimization problem, which, on the one hand, avoids paying the burst multiple times and, on the other hand, avoids costly curve computations during the optimization. Second, this formulation allows us to analyze the problem with more efficient methods, as presented in the following sections.

5.2. Quadratic Programming Transformation

How to solve the minimization problem (15) is not obvious. The constraints b and ρ are not fixed values and, in addition, these two constraints are correlated. For a fixed b, the minimum bounded delay function bdf_min(Δ, ρ, b) can be determined by computing

    ρ = inf { ρ : bdf(Δ, ρ, b) ≥ α^u(Δ − D), ∀Δ ≥ 0 }.    (16)

In this article, we conduct the optimization by varying b and computing ρ for every possible b.
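For staircase PJD arrival curves, the infimum in Eq. (16) can be computed by enforcing ρ·(Δ − b) ≥ k + 1 just after each step of α^u(Δ − D). The sketch below is our own illustration with assumed parameters; it is not the paper's implementation (which relies on the RTC toolbox), and it assumes α^u steps to k + 1 just after Δ_k = D + max(k·p − j, k·d).

```python
import math

def alpha_u(x, p, j, d):
    """Upper arrival curve of the PJD event model (period p, jitter j,
    minimal inter-arrival distance d)."""
    if x <= 0:
        return 0
    return min(math.ceil((x + j) / p), math.ceil(x / d))

def rho_for_b(b, D, p, j, d, k_max=10000):
    """Smallest slope rho with max(0, rho*(delta - b)) >= alpha_u(delta - D)
    for all delta (Eq. (16)). alpha_u(delta - D) steps to k+1 just after
    delta_k = D + max(k*p - j, k*d), so it suffices to enforce
    rho*(delta_k - b) >= k + 1 at every step point (k_max must be large
    enough that later steps are dominated)."""
    rho = 0.0
    for k in range(k_max):
        delta_k = D + max(k * p - j, k * d)
        assert delta_k > b, "infeasible: burst arrives before the bounded delay"
        rho = max(rho, (k + 1) / (delta_k - b))
    return rho
```

For p = 10, j = 5, d = 2, D = 15, and b = 5, this yields ρ = 2/15, attained at the second step point; the long-run slope 1/p = 0.1 is dominated.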
For a fixed b, we can transform (15) into a quadratic programming problem with box constraints (QPB), as stated in the following lemma.

LEMMA 5.4. The minimization problem (15) can be transformed into the following quadratic programming problem with box constraints:

    minimize_{x=[x_1 … x_m]}  xᵀQx
    subject to                0 ≤ x_i ≤ √(E_sw^i · (1 − ρ·c_i)),  i = 1, …, m,    (17)

where Q = A − B, A is an m×m matrix of ones, and B is an m×m diagonal matrix whose i-th diagonal element is (b − Σ_{j=1}^m c_j)·(P_s^i − P_σ^i)/E_sw^i. Denote by x* the optimal solution of the QPB problem (17); then the optimal solution of (15) is obtained as K_i = 1 − (x_i*)²/E_sw^i and T_off^i = x_i*/(Σ_{j=1}^m x_j*) · (b − Σ_{j=1}^m c_j).

PROOF. With the Cauchy–Bunyakovsky–Schwarz inequality, we get

    ( Σ_{i=1}^m E_sw^i·(1 − K_i)/T_off^i ) · ( Σ_{i=1}^m T_off^i ) ≥ ( Σ_{i=1}^m √(E_sw^i·(1 − K_i)) )².

The minimum value of Σ_{i=1}^m E_sw^i·(1 − K_i)/T_off^i is therefore (Σ_{i=1}^m √(E_sw^i·(1 − K_i)))² / (b − Σ_{j=1}^m c_j), attained when

    T_off^i = √(E_sw^i·(1 − K_i)) / ( Σ_{j=1}^m √(E_sw^j·(1 − K_j)) ) · ( b − Σ_{j=1}^m c_j ).

The optimization (15) can then be reformulated as

    minimize_{K_1,…,K_m}  ( Σ_{i=1}^m √(E_sw^i·(1 − K_i)) )² / ( b − Σ_{j=1}^m c_j ) + Σ_{i=1}^m (P_s^i − P_σ^i)·K_i
    subject to            ρ·c_i ≤ K_i ≤ 1,  i = 1, …, m.

By defining x_i = √(E_sw^i·(1 − K_i)), formulation (15) is transformed into the QPB problem (17). □

Note that there is a feasible region for b. To guarantee T_off^i ≥ 0 for all i, the bounded delay b must not be less than Σ_{i=1}^m c_i. According to (14), the maximum slope ρ of the bounded delay function cannot exceed 1/max_{i=1..m} c_i. Correspondingly, we derive the minimum bounded delay function bdf_min(Δ, 1/max_{i=1..m} c_i, b). By inverting (16), we can derive the maximum delay b^u by (18), which guarantees that none of the resulting K_i exceeds 1.
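The construction of Lemma 5.4 can be sketched concretely. In the snippet below all numbers are assumed (loosely inspired by Table II), and the box is brute-forced on a grid instead of calling a finite branch-and-bound QPB solver as the paper does; for these parameters every B_i ≫ 1, so Q is not positive semi-definite (the NP-hard regime of Theorem 5.5) and the minimum sits on a corner of the box.

```python
import itertools
import math

# Illustrative QPB instance for Lemma 5.4 (all values assumed).
E_sw  = [483e-6, 483e-6]     # switching energy per stage (J)
P_s   = [0.390, 0.390]       # standby power (W)
P_sig = [50e-6, 50e-6]       # sleep power (W)
c     = [0.010, 0.015]       # worst-case execution times (s)
b     = 0.200                # scanned bounded delay (s)
rho   = 20.0                 # slope from Eq. (16) (1/s)

m = len(c)
slack = b - sum(c)           # b - sum(c_j) > 0 must hold
# Q = A - B with A the all-ones matrix and B diagonal.
B = [slack * (P_s[i] - P_sig[i]) / E_sw[i] for i in range(m)]

def qpb_objective(x):
    # x^T (A - B) x = (sum_i x_i)^2 - sum_i B_i x_i^2
    return sum(x) ** 2 - sum(B[i] * x[i] ** 2 for i in range(m))

# Box constraint 0 <= x_i <= sqrt(E_sw^i (1 - rho c_i)), sampled on a grid.
hi = [math.sqrt(E_sw[i] * (1 - rho * c[i])) for i in range(m)]
grid = [[t / 200 * hi[i] for t in range(201)] for i in range(m)]
x_star = min(itertools.product(*grid), key=qpb_objective)

# Recover the solution of (15) from x*.
K     = [1 - x_star[i] ** 2 / E_sw[i] for i in range(m)]
T_off = [x_star[i] / sum(x_star) * slack for i in range(m)]
```

With the corner optimum, the recovery yields K_i = ρ·c_i and a T_off split that spends the whole slack b − Σc_j, as the lemma predicts.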
In summary, the feasible region b ∈ [b^l, b^u] is bounded as follows:

    b^u = sup { d : bdf(Δ, 1/max_{i=1..m} c_i, d) ≥ α^u(Δ − D), ∀Δ ≥ 0 },
    b^l = Σ_{i=1}^m c_i.    (18)

5.3. Quadratic Programming Heuristic

With the preceding information, we can now present the overall algorithm for the energy-minimization problem defined in Section 3.4. Basically, the bounded delay b is scanned with step size ε within the range [b^l, b^u]. For each b, we first solve the subproblem (17) with a QPB solver, and then the obtained solution is repaired to fulfill further constraints (this will be explained later on). The pseudocode of the algorithm is depicted in Algorithm 1.

ALGORITHM 1: Quadratic Programming Heuristic
Input: α^u, b^l, b^u, step size ε, and P_min = ∞
Output: K⃗_opt, T⃗_off,opt
 1: for b = b^l to b^u with step ε do
 2:   compute ρ by Eq. (16);
 3:   obtain K⃗ and T⃗_off by solving (17);
 4:   repair K⃗ and T⃗_off;
 5:   if P(K⃗, T⃗_off) < P_min then
 6:     K⃗_opt ← K⃗; T⃗_off,opt ← T⃗_off;
 7:     P_min ← P(K⃗_opt, T⃗_off,opt);
 8:   end if
 9: end for

THEOREM 5.5. If ∃i ∈ {1, 2, …, m} such that E_sw^i/(P_s^i − P_σ^i) < b − Σ_{j=1}^m c_j, then the problem is NP-hard.

PROOF. If there exists a stage p_i for which E_sw^i/(P_s^i − P_σ^i) < b − Σ_{j=1}^m c_j holds, the matrix Q in Lemma 5.4 is not positive semi-definite. QPB is then a nonconvex quadratic programming problem, which is NP-hard [Jeyakumar et al. 2006]. □

To solve the subproblem (line 3 in Algorithm 1), we apply an existing QPB solver. According to Theorem 5.5, QPB is NP-hard when the scanned bounded delay b is big enough (i.e., E_sw^i/(P_s^i − P_σ^i) < b − Σ_{j=1}^m c_j), and it is in general difficult to solve the problem optimally. Nevertheless, there are approximation schemes [Fu et al. 1998] that can efficiently solve nonconvex QPB, and many excellent off-the-shelf software packages [Chen and Burer 2012] are available.
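Under the same staircase reasoning used for Eq. (16), the bound b^u of Eq. (18) can be computed by inverting the per-step constraints. The sketch below uses assumed PJD parameters and is our own illustration; it assumes α^u(Δ − D) steps to k + 1 just after Δ_k = D + max(k·p − j, k·d).

```python
def b_bounds(D, p, j, d, c, k_max=10000):
    """Feasible region [b_l, b_u] of the bounded delay (Eq. (18)).
    With the maximal slope 1/max(c), the per-step constraint
    (delta_k - b)/max(c) >= k + 1 at every step point
    delta_k = D + max(k*p - j, k*d) of alpha_u(delta - D) gives
    b <= delta_k - (k+1)*max(c); b_u is the tightest of these."""
    c_max = max(c)
    assert p >= c_max, "long-run demand must not exceed the maximum slope"
    b_l = sum(c)
    b_u = min(D + max(k * p - j, k * d) - (k + 1) * c_max
              for k in range(k_max))
    return b_l, b_u
```

For p = 10, j = 5, d = 2, D = 15, and c = [3, 4], this gives b^l = 7 and b^u = 11 (the k = 0 step is the tightest).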
In this article, the state-of-the-art finite branch-and-bound algorithm [Chen and Burer 2012] is applied to solve our QPB problem.

After obtaining a pair K⃗ and T⃗_off, the repair phase (line 4 in Algorithm 1) is conducted to fulfill further constraints. This repair scheme is presented in Algorithm 2. First of all, the resulting T_off^i of pipeline stage i may be smaller than t_sw^i. In the case T_off^i < t_sw^i, turning off the processor of stage i is not possible; the solution for stage i is therefore repaired to [K_i, T_off^i] = [1, 0], that is, stage i stays on all the time (line 2 in Algorithm 2). However, this repair step leads to a loss of sleep time, recorded as a budget ΔT_i for each such stage (line 3 in Algorithm 2); we try to reassign this loss to the other stages at the end of the algorithm (lines 21–32 in Algorithm 2) to further reduce the power consumption. Second, the resulting T_on^i may not be a multiple of c_i, which is one of our basic requirements. Repair steps are conducted to make T_on^i a multiple of c_i (lines 6–20 in Algorithm 2). To keep the resulting on-ratio equal to K_i, T_off^i should be adjusted to T_on^i/K_i − T_on^i; ΔT_i indicates how much the sleep time of stage i must be adjusted compared to the original T_off^i (line 14 in Algorithm 2). If ΔT_i > 0 holds, T_on^i has decreased and stage i should decrease its sleep time T_off^i to keep K_i constant (line 16 in Algorithm 2); this yields the loss ΔT_i, which can be reassigned to prolong the sleep time of other stages. ΔT_i ≤ 0 indicates that T_on^i has increased and the stage would have to increase its sleep time T_off^i to keep K_i constant. In this case, we keep T_off^i unchanged, which results in an increase of K_i and a power-consumption increase ΔE_i (line 18 in Algorithm 2). In the end, the total loss Q is reassigned to the stages with ΔT_i < 0 to reduce the power consumption further (lines 21–32 in Algorithm 2).
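The core rounding step of the repair (lines 6–20 of Algorithm 2) can be sketched as follows. The function is our own paraphrase under assumed semantics (round T_on down to a multiple of c, round up instead if the implied sleep slot would be shorter than t_sw, then shrink or keep T_off), not the paper's code.

```python
import math

def repair_ton(K, t_off, c, t_sw):
    """Round T_on = K*t_off/(1-K) to a multiple of the WCET c.
    Returns (new K, new T_off, freed sleep budget)."""
    t_on = K * t_off / (1 - K)                   # implied on-time
    t_on_new = max(c, math.floor(t_on / c) * c)  # round down to a multiple of c
    if t_on_new / K - t_on_new < t_sw:           # sleep slot would be too short
        t_on_new = math.ceil(t_on / c) * c       # round up instead
    target = t_on_new / K - t_on_new             # T_off that keeps K constant
    if target <= t_off:
        return K, target, t_off - target         # budget freed for other stages
    # cannot extend the sleep time: keep T_off, so K (and power) increases
    return t_on_new / (t_on_new + t_off), t_off, 0.0
```

For example, repair_ton(0.5, 5, 2, 1) rounds T_on from 5 down to 4, shrinks T_off to 4, and frees a budget of 1; repair_ton(0.4, 6, 5, 1) must round T_on up to 5 and instead lets K grow to 5/11.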
The reassignment heuristic uses the power increase ΔE_i as a metric to decide which stage should be served first. Specifically, the heuristic iterates through all stages that need compensation; in each iteration, it picks the stage with the maximum power increase ΔE_i and increases its T_off^i without letting the resulting on-ratio drop below the original K_i. The reassignment heuristic terminates when there is no loss left to reassign or no stage needs compensation. It is worth noting that the repair phase still guarantees that the repaired solution satisfies the constraints, as stated in Lemma 5.6.

LEMMA 5.6. The solution repaired by Algorithm 2 satisfies the constraints of (15).

PROOF. The operations in lines 2–20 do not increase the term Σ_{i=1}^m T_off^i and do not decrease any on-ratio below the original K_i, so the constraints of (15) remain satisfied. The reassignment heuristic (lines 21–32) redistributes the total loss Q to the stages that need compensation and increases their sleep times T_off^i without increasing the total sleep time Σ_{i=1}^m T_off^i and without decreasing any on-ratio below the original K_i. Thus, the solution repaired by Algorithm 2 satisfies the constraints of (15). □

5.4. Fast Heuristic

In Section 5.3, we presented a quadratic programming heuristic based on the QPB transformation. According to Theorem 5.5, QPB is NP-hard when the scanned bounded delay b is big enough. If the bounded delay b is scanned in n steps, the heuristic of Section 5.3 has to solve this NP-hard problem up to n times, which is time consuming.
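Both heuristics ultimately stand on the service-curve bounds of Section 5.1, so a cheap numeric sanity check is useful. The sketch below evaluates the PPM curve in its max form (Eq. (21)) and its min-plus-convolution form (Eq. (20)/(13)) on a grid, checks that the two agree, and checks the rate-latency lower bound of Lemma 5.1; all stage parameters are assumed.

```python
import math

def beta_max_form(delta, t_on, t_off):
    """PPM lower service curve, max form (Eq. (21))."""
    T = t_on + t_off
    return max(math.floor(delta / T) * t_on,
               delta - math.ceil(delta / T) * t_off)

def beta_conv_form(delta, t_on, t_off):
    """Same curve via the min-plus convolution form (Eq. (20)): the inf of
    delta - s + t_on*ceil((s - t_off)/T) over 0 <= s <= delta is attained
    at s = delta or at a jump point s = t_off + k*T, so only those
    candidates are evaluated."""
    T = t_on + t_off
    cands = [delta] + [t_off + k * T
                       for k in range(int(delta // T) + 1)
                       if t_off + k * T <= delta]
    return min(delta - s + t_on * math.ceil((s - t_off) / T) for s in cands)

# Assumed stage parameters: c = 1, T_on = 2c, T_off = 3, so K = 0.4.
c, t_on, t_off = 1, 2, 3
K = t_on / (t_on + t_off)
for delta in range(0, 200):
    exact = beta_max_form(delta, t_on, t_off)
    assert exact == beta_conv_form(delta, t_on, t_off)      # Eq. (20) == Eq. (21)
    events = math.floor(exact / c)                          # event-based curve
    assert events >= (K / c) * (delta - t_off - c) - 1e-9   # Lemma 5.1
```

The loop passing for every Δ on the grid mirrors Lemma A.1 (the two curve forms coincide) and the conservativeness of the linear lower bound that both heuristics optimize against.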
ALGORITHM 2: Repair Scheme
Input: solution of the QPB problem [K⃗, T⃗_off]
Output: repaired solution [K⃗′, T⃗_off′]
 1: compute the stage set S1 = { p_i | T_off^i < t_sw^i };
 2: repair [K_i, T_off^i] of each stage p_i ∈ S1 as [1, 0];
 3: update budget ΔT_i ← T_off^i and power increase ΔE_i ← 0 for each stage p_i ∈ S1;
 4: compute T_on^i and the stage set S2 = { p_i | T_off^i ≥ t_sw^i };
 5: for each stage p_i ∈ S2 do
 6:   if T_on^i < c_i then
 7:     T_on^i′ ← c_i;
 8:   else
 9:     T_on^i′ ← ⌊T_on^i/c_i⌋ · c_i;
10:     if T_on^i′/K_i − T_on^i′ < t_sw^i then
11:       T_on^i′ ← ⌈T_on^i/c_i⌉ · c_i;
12:     end if
13:   end if
14:   compute budget ΔT_i = T_off^i − (T_on^i′/K_i − T_on^i′);
15:   if ΔT_i ≥ 0 then
16:     T_off^i′ ← T_on^i′/K_i − T_on^i′; ΔE_i ← 0;
17:   else
18:     T_off^i′ ← T_off^i; ΔE_i ← P(T_on^i′, T_off^i′) − P(T_on^i, T_off^i);
19:   end if
20: end for
21: compute total budget Q = Σ_{ΔT_i > 0} ΔT_i;
22: while Q > 0 do
23:   find the stage p_i with maximum power increase ΔE_i;
24:   if ΔT_i < 0 then
25:     compute the available allocation allo = min(Q, |ΔT_i|);
26:     T_off^i′ ← T_off^i′ + allo; ΔT_i ← ΔT_i + allo;
27:     ΔE_i ← P(T_on^i′, T_off^i′) − P(T_on^i, T_off^i);
28:     Q ← Q − allo;
29:   else
30:     break;
31:   end if
32: end while
33: update [K⃗′, T⃗_off′];

Besides, in its first optimization step, the quadratic programming heuristic does not consider the break-even time constraint (i.e., that the T_off^i of pipeline stage i must not be smaller than T_BET^i), which can also make the result pessimistic. To overcome these drawbacks, we present a fast heuristic that finds a suboptimal solution in O(mn) time, where m is the number of stages and n the number of scanned values of b. Different from the heuristic in Section 5.3, we consider the break-even time constraint already in the optimization phase and partition the stage set P into two sets according to this constraint, rather than decoupling the break-even time constraint from the optimization. Based on this stage-set partition, we can derive a suboptimal solution as stated in Lemma 5.7.

LEMMA 5.7.
Given a fixed bounded delay b, denote [K⃗, T⃗_off] as the optimal solution for the problem. Partition the stage set P into two subsets S1 and S2, where S1 = { p_i | T_off^i < T_BET^i } and S2 = { p_i | T_off^i ≥ T_BET^i }. Then the optimal solution [K⃗, T⃗_off] can be determined as follows:

(1) For each stage p_i ∈ S1, [K_i, T_off^i] = [1, 0].
(2) For each stage p_i ∈ S2, [K_i, T_off^i] = [ρ·c_i, x_i], where x_i = w_i/(Σ_{p_j∈S2} w_j) · (b − Σ_{i=1}^m c_i) and w_i = √(E_sw^i·(1 − ρ·c_i)).

PROOF. For the stage subset S2, T_off^i ≥ T_BET^i ≥ E_sw^i/(P_s^i − P_σ^i) holds. The average power consumption P(K_i, T_off^i) attains its minimum at K_i = ρ·c_i according to Property 1. Thus, the average power consumption of the stage subset S2 can be written as Σ_{p_i∈S2} w_i²/T_off^i + Σ_{p_i∈S2} ρ·c_i·(P_s^i − P_σ^i) with the constraint Σ_{p_i∈S2} T_off^i ≤ b − Σ_{i=1}^m c_i − Σ_{p_i∈S1} T_off^i. According to the Cauchy–Bunyakovsky–Schwarz inequality, the optimal average power consumption of the stage subset S2 is given by (19) and attained when [K_i, T_off^i] = [ρ·c_i, w_i/(Σ_{p_j∈S2} w_j)·(b − Σ_{i=1}^m c_i − Σ_{p_i∈S1} T_off^i)]:

    Σ_{p_i∈S2} P(K_i, T_off^i) = ( Σ_{p_i∈S2} w_i )² / ( b − Σ_{i=1}^m c_i − Σ_{p_i∈S1} T_off^i ) + Σ_{p_i∈S2} ρ·c_i·(P_s^i − P_σ^i).    (19)

According to (19), the average power consumption of the stage subset S2 attains its minimum when Σ_{p_i∈S1} T_off^i attains its minimum.

For the stage set S1, there are two cases. (a) T_BET^i = t_sw^i: here T_off^i < t_sw^i, so turning off the processor of stage i is not possible, as stated for the repair scheme, due to the hardware requirement that the sleep time T_off^i must not be smaller than the transition overhead t_sw^i; the solution for stage i is therefore forced to [K_i, T_off^i] = [1, 0]. (b) T_BET^i = E_sw^i/(P_s^i − P_σ^i): here, by Property 2, the average power consumption of the stage subset S1 attains its minimum at [K_i, T_off^i] = [1, 0].
At this point, Σ_{p_i∈S1} T_off^i attains its minimum of 0, and thus the average power consumption of the stage subset S2 attains its minimum. □

According to Lemma 5.7, the optimal solution can be derived directly once the stage partition P = {S1, S2} is determined. The optimal solution could thus be found by exhaustively exploring all possible stage partitions, with complexity O(2^m). As the number of stages increases, this complexity grows exponentially. To reduce it, a fast stage-partition scheme is proposed in this article. In this scheme, we first greedily put all stages into the stage set S2 = { p_i | T_off^i ≥ T_BET^i } (i.e., we assume all stages can enter sleep mode). Under this greedy partitioning, we compute the optimal T⃗_off according to Lemma 5.7, as described in lines 1 and 2 of Algorithm 3. Then we assign the stages by checking whether the resulting optimal T_off^i under the greedy partition is at least T_BET^i (lines 3–9 of Algorithm 3). The feasibility of this partition scheme is guaranteed by Lemma 5.8.

ALGORITHM 3: Greedy Partition Scheme
Input: ρ, b, P
Output: S1, S2
 1: compute w_i = √(E_sw^i·(1 − ρ·c_i)) for each stage p_i;
 2: compute x_i = w_i/(Σ_{p_j∈P} w_j) · (b − Σ_{i=1}^m c_i) for each stage p_i;
 3: for each p_i ∈ P do
 4:   if x_i < T_BET^i then
 5:     insert stage p_i into set S1;
 6:   else
 7:     insert stage p_i into set S2;
 8:   end if
 9: end for

LEMMA 5.8. The stage partition P = {S1, S2} generated by Algorithm 3 is feasible.

PROOF. In Algorithm 3, w_i/(Σ_{p_j∈P} w_j)·(b − Σ_{i=1}^m c_i) ≥ T_BET^i holds for every stage in S2. According to Lemma 5.7, T_off^i in S2 is determined as w_i/(Σ_{p_j∈S2} w_j)·(b − Σ_{i=1}^m c_i). Since S2 ⊆ P, we get

    T_off^i ≥ w_i/(Σ_{p_j∈P} w_j)·(b − Σ_{i=1}^m c_i) ≥ T_BET^i

for every stage in S2; thus, the stage partition generated by Algorithm 3 is feasible. □
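Algorithm 3 together with the closed form of Lemma 5.7 is compact enough to sketch directly. The code below is our own paraphrase with assumed inputs (per-stage break-even times t_bet, a fixed ρ and b), not the paper's implementation.

```python
import math

def fast_partition(E_sw, c, t_bet, rho, b):
    """Greedy partition (Algorithm 3) plus the closed-form solution of
    Lemma 5.7: first assume every stage can sleep, split the slack
    b - sum(c) proportionally to w_i = sqrt(E_sw^i (1 - rho c_i)), then
    move stages whose share is below their break-even time into the
    always-on set S1."""
    m = len(c)
    slack = b - sum(c)
    w = [math.sqrt(E_sw[i] * (1 - rho * c[i])) for i in range(m)]
    share = [w[i] / sum(w) * slack for i in range(m)]
    S1 = [i for i in range(m) if share[i] < t_bet[i]]
    S2 = [i for i in range(m) if share[i] >= t_bet[i]]
    K, T_off = [1.0] * m, [0.0] * m          # stages in S1 stay always on
    w2 = sum(w[i] for i in S2)
    for i in S2:                             # Lemma 5.7, case (2)
        K[i], T_off[i] = rho * c[i], w[i] / w2 * slack
    return K, T_off, S1, S2
```

By Lemma 5.8 the S2 shares can only grow when the S1 stages are removed from the denominator, so the resulting partition stays feasible.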
For each b, we first obtain a suboptimal partition by the greedy partition scheme of Algorithm 3; the optimal solution under the obtained partition can then be determined directly. The pseudocode of the algorithm is depicted in Algorithm 4.

ALGORITHM 4: Fast Heuristic
Input: α^u, b^l, b^u, step size ε, P_min = ∞
Output: [K_i, T_off^i], i = 1, …, m
 1: for b = b^l to b^u with step ε do
 2:   compute ρ by Eq. (16);
 3:   generate the feasible partition S1 and S2 by Algorithm 3;
 4:   obtain K⃗ and T⃗_off according to Lemma 5.7;
 5:   repair K⃗ and T⃗_off by Algorithm 2;
 6:   if P(K⃗, T⃗_off) < P_min then
 7:     K⃗_opt ← K⃗; T⃗_off,opt ← T⃗_off;
 8:     P_min ← P(K⃗_opt, T⃗_off,opt);
 9:   end if
10: end for

6. PERFORMANCE EVALUATIONS

In this section, we demonstrate the effectiveness of our approach. We compare three approaches: (1) the pay-burst-only-once algorithm based on quadratic programming (PBOOA-QP) presented in Section 5.3; (2) the pay-burst-only-once algorithm based on the fast heuristic (PBOOA-FH) presented in Section 5.4; and (3) the deadline partition algorithm (DPA). DPA partitions the end-to-end deadline into subdeadlines for the individual pipeline stages and explores all possible deadline-partition combinations to find the one with the minimum energy consumption. For each deadline-partition combination, DPA uses the scheme in Huang et al. [2009b] to minimize the energy consumption of the individual pipeline stages and thereby optimize the overall energy consumption. To show the effects of our scheme, we report the average idle power computed by Eq. (10) as well as the computation times of all schemes. The simulation is implemented in Matlab using the RTC toolbox [Wandeler and Thiele 2006], and the finite branch-and-bound algorithm [Chen and Burer 2012] is used to solve QPB. All results are obtained on a 2.83 GHz processor with 4GB memory.

Table I.
Constants for 70nm Technology [Martin et al. 2002; Wang and Mishra 2010]

    Const   Value           Const   Value           Const   Value
    K1      0.063           K6      5.26 × 10^−12   Vth1    0.244
    K2      0.153           K7      −0.144          Ij      4.8 × 10^−10
    K3      5.38 × 10^−7    Vdd     [0.5, 1]        Ceff    0.43 × 10^−9
    K4      1.83            Vbs     [−1, 0]         Ld      37
    K5      4.19            α       1.5             Lg      4 × 10^6

Table II. Power Parameters

    Vdd     Pa      Ps      Pσ     Esw     tsw
    0.7V    656mW   390mW   50μW   483μJ   10ms

6.1. Simulation Setup

The experiments are conducted based on the classical energy model of a 70nm-technology processor in Martin et al. [2002], Wang and Mishra [2010], Jejurikar et al. [2004], and Chen et al. [2014], whose accuracy has been verified with SPICE simulation. Table I lists the energy parameters for 70nm technology [Martin et al. 2002; Wang and Mishra 2010; Jejurikar et al. 2004; Chen et al. 2014]. According to Jejurikar et al. [2004], executing at Vdd = 0.7V is more energy efficient than executing at lower voltage levels. To minimize the overall energy consumption of the system, we assume the processor runs at this critical frequency level when it is in the active state. From Wang and Mishra [2010] and Jejurikar et al. [2004], the body-bias voltage Vbs is obtained as −0.7V. From Jejurikar et al. [2004], the idle-related power Pon is obtained as 100mW, and the power consumption in sleep mode Pσ is set to 50μW. From Jejurikar et al. [2004], we obtain the energy overhead Esw of a state transition as 483μJ. We set the time overhead tsw of a state transition to 10ms. According to the energy parameters in Table I and the energy model in Section 3.2, we calculate the corresponding active power Pa and standby power Ps at voltage level Vdd = 0.7V. Table II lists all power parameters used in the experiments.

An event stream is specified by the PJD model with period p, jitter j, and minimal interarrival distance d. It is worth noting that a worst-case execution time c is associated with the service curve of each stage, as stated in Section 3.3.
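The parameters of Table II can be plugged directly into the average-power expression used in the formulation. The snippet below is an illustration under our reading of the model (switching energy amortized over one on/off period, plus standby power while on and sleep power while off), not the paper's code.

```python
# Power parameters from Table II.
E_sw  = 483e-6     # J, state-transition energy overhead
P_s   = 0.390      # W, standby power while on
P_sig = 50e-6      # W, power in sleep mode

def avg_power(t_on, t_off):
    """Average power over one periodic power-management cycle."""
    K = t_on / (t_on + t_off)
    return E_sw / (t_on + t_off) + P_s * K + P_sig * (1 - K)

# With K = t_on/(t_on + t_off), note E_sw/(t_on + t_off) equals
# E_sw*(1 - K)/t_off, the form appearing in P(K, T_off) of Section 5.1.
```

For instance, t_on = 100ms and t_off = 400ms give K = 0.2 and roughly 79mW of average power; most of it is the standby term, which is why maximizing the sleep share under the deadline constraint pays off.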
The jitter j and the relative deadline D of the stream are defined as j = ϕ·p and D = γ·p, respectively, and vary with the corresponding factors. To evaluate the effectiveness of our approach, we conduct experiments with three applications, collecting results while varying the deadline and jitter through the factors γ and ϕ. In the following, we give a brief overview of the three applications.

The H.263 decoder application [Oh and Ha 2002] is modeled by four tasks: packet decoding (PD1), an inverse quantization operation (deQ), an inverse DCT operation (IDCT), and motion compensation (MC). The execution time of each subtask of the H.263 decoder can be found in Oh and Ha [2002]. The activation period of the H.263 decoder is 100ms, with the jitter and the end-to-end deadline varied.

The MP3 decoder application is implemented in a pipelined fashion [Oh and Ha 2002] and can be split into five tasks: packet decoding (PD2), Huffman decoding (HD), an inverse quantization operation (deQ), an inverse DCT operation (IDCT), and antialiasing (FB). The execution time of each subtask of the MP3 decoder can be found in Oh and Ha [2002]. The activation period of the MP3 decoder is 100ms, with the jitter and the end-to-end deadline varied.

Table III. Average Power Savings with Respect to DPA

               H.263      MP3       TDE       H.263      MP3       TDE
               2-stages   2-stages  2-stages  3-stages   3-stages  3-stages
    PBOOA-QP   10.46%     11.57%    39.62%    23.59%     26.37%    30.60%
    PBOOA-FH   10.46%     11.57%    39.65%    23.31%     25.69%    34.09%

Time Delay Equalization (TDE) comes from the GMTI (Ground Moving Target Indicator) application obtained from the StreamIt benchmarks [Thies and
The Time Delay Equalization (TDE) application contains 4 tasks, including tasks like FFT reorder, combined DFT, FFT reorder, and combined IDFT. We set the activation period of the consumer application as 30ms. 6.2. Simulation Result We first evaluate how the power consumptions of the compared approaches change as the jitter and deadline vary. Cases of 2-stage and 3-stage pipeline architectures with homogeneous 70nm processors are evaluated. We vary the jitter factor ϕ from 0–3 with step 0.5 and the deadline factor γ from 1.5–2 with step 0.5. The simulation results of the three approaches are shown in Figure 4. In the figure, each line represents the average energy consumption under the varied jitter factor settings with the fixed deadline factor and task mapping. From these, we can make the following observations: (1) pay-bustonly-once-based approaches always outperform the deadline partition approach for all settings on both pipeline architectures. We list average normailized power savings of PBOOA-QP and PBOOA-FH with respect to DPA in Table III; (2) the average idle power consumptions of the three approaches increase as jitter increases, since bigger jitter requires longer Ton to gurantee the worst-case end-to-end deadline; (3) the average idle power consumptions of the three approaches decrease as end-to-end deadline increases. This is expected becasue the loose end-to-end deadline requirement could result in smaller execution time Ton and longer sleep time Toff ; (4) one interesting obervation is that pay-bust-only-once-based approaches can achieve more power savings on 3-stage pipeline rather than 2-stage pipeline for different jitter and deadline settings. This is because DPA on 3-stage pipeline pay burst more times than 2-stage pipeline, which leads PBOOA-QP and PBOOA-FH to achieve more power savings on 3-satge pipeline. Next, we conduct the experiment to show the impact of time overhead of state transition tsw on the effectiveness of our approaches. 
An H.263 application with jitter factor ϕ = 0.5 and deadline factor γ = 1 runs on a 3-stage pipeline architecture with homogeneous 70nm processors. We vary the state-transition time overhead tsw from 5ms to 15ms with a fixed step size of 1ms. Figure 5 illustrates the average power consumption of the three compared approaches. We can observe that our approaches find efficient solutions and outperform DPA for all tsw settings. Besides, as tsw increases, the average power consumption of DPA grows faster than that of the pay-burst-only-once-based approaches. This is because DPA generates less idle time, since it pays the burst many times compared to the pay-burst-only-once-based approaches, as shown in Section 4. An increase of tsw reduces the opportunities to turn off the processor, so entering sleep mode becomes more difficult for DPA.

Then, we discuss the impact of the period setting on the effectiveness of the approaches. The MP3 application with jitter factor ϕ = 1 and deadline factor γ = 1.5 runs on a 2-stage pipeline architecture with homogeneous 70nm processors, where we vary the period from 70ms to 130ms with a fixed step size of 10ms. Figure 6 illustrates the average power consumption of the three compared approaches under the different period settings. From Figure 6, we can see that the pay-burst-only-once-based approaches outperform DPA at all period settings. Furthermore, the average power consumption of all approaches decreases when the period increases. This is expected because a bigger application period prolongs the idle intervals.

Fig. 4. Average idle power consumption for three applications on 2-stage and 3-stage pipeline architectures.

In the end, we demonstrate the scalability of our approaches. We test them on up to a 20-stage heterogeneous pipeline.
The execution times of the subtasks mapped on each stage are randomly generated between 5ms and 15ms. According to the power model presented in Section 3.2, the power profile of each stage is generated by randomly selecting a voltage Vdd between 0.5V and 0.8V. The activation period of the event stream is 40ms with jitter factor ϕ = 1. The end-to-end deadline for the test case with n stages is set to n · 20ms. The state-transition overheads tsw and Esw of the different stages are randomly selected from [1ms, 5ms] and [400μJ, 800μJ], respectively. Based on the observation that the deadline partition algorithm may suffer from deadline-combination explosion and costly computation, we set the search step to 5 for the three compared approaches.

Fig. 5. Average power consumption with varying tsw.
Fig. 6. Average power consumption with varying period.

Figure 7 shows the power consumption and computation overhead on the different pipeline architectures. From this figure, we make the following observations: (1) as shown in Figure 7(a), the computation overhead of the deadline partition algorithm increases exponentially. When the stage number exceeds 10, DPA fails to generate results within the time budget of 8 hours. For the 9-stage pipeline, DPA takes almost 420 minutes, which is 9182× longer than for the 3-stage pipeline. This is expected because the number of deadline combinations increases exponentially with the stage number. In addition, as the stage number increases, so does the time for computing the resource demand of each subsequent stage, which requires the lower bound of the output arrival curve of the previous stage.
Fig. 7. Computation time and power consumption for the heterogeneous pipelined system.

Computing this output curve requires numerical min-plus convolutions that incur considerable computational and memory overheads; (2) compared to the deadline partition algorithm, the pay-burst-only-once-based approaches are fast, and their computation time increases slowly with the stage number, especially for PBOOA-FH. For the 20-stage pipeline, the PBOOA-QP approach takes 3.7 minutes, 124× more computing time than for the 3-stage case; PBOOA-FH takes only 0.08 minutes to generate the result, only 7.5× more than for the 3-stage case; (3) in terms of average idle power consumption, the pay-burst-only-once-based approaches are more energy efficient than the deadline partition algorithm. In Figure 7(b), we can see that the PBOOA-QP and PBOOA-FH approaches outperform DPA on all pipeline architectures, indicating that our approaches are not only faster but also more energy efficient than DPA. Besides, as observed in the prior experiments, the gap in power consumption between the deadline partition algorithm and the pay-burst-only-once-based algorithms increases with the stage number. This is expected because, as the stage number increases, the number of times DPA pays the burst also increases. In contrast, the proposed approaches pay the burst only once, which leads to a tighter end-to-end delay bound and prolongs the idle intervals of the stages for energy efficiency; (4) the PBOOA-FH approach achieves almost identical average idle power consumption to the PBOOA-QP approach with an almost 10× speedup. In some cases, PBOOA-FH even achieves more energy savings than PBOOA-QP.
This is because, in contrast to the PBOOA-QP approach, PBOOA-FH integrates the break-even time constraints into the optimization phase, which allows it to find better solutions in such cases.

7. CONCLUSION

This article presents new approaches to minimize the energy consumption of pipelined systems. Targeting streaming applications with nondeterministic workload arrivals under hard real-time constraints, our approaches not only guarantee the original end-to-end deadline requirement but also preserve the pay-burst-only-once property, resulting in a significant reduction of both energy consumption and computation overhead. Moreover, our approaches are scalable with respect to the number of pipeline stages. Simulation results demonstrate their effectiveness. In the future, we intend to extend our approaches with dynamic voltage-frequency scaling (DVFS) to also reduce the dynamic power of pipelined systems. Another interesting direction is to target multidimensional issues, such as energy and thermal constraints, simultaneously. In addition, combining our approaches with the mapping of the application is also deemed worthy of future work.

APPENDIX

LEMMA A.1. The service curve of periodic power management specified by Ton and Toff can be represented as

    β^Gl(Δ) = Ton · ⌈(Δ − Toff)/(Ton + Toff)⌉ ⊗ Δ.    (20)

PROOF. According to Huang et al. [2009b], the service curve of periodic power management specified by Ton and Toff can be represented as

    β^Gl(Δ) = max( ⌊Δ/(Ton + Toff)⌋ · Ton, Δ − ⌈Δ/(Ton + Toff)⌉ · Toff ).    (21)

This proof derives Eq. (20), showing that Eqs. (21) and (20) are equivalent.
According to the definition of the min-plus convolution,
$$\beta^{Gl}(\Delta) = \Delta \otimes \left( T_{on} \cdot \left\lceil \frac{\Delta - T_{off}}{T_{on} + T_{off}} \right\rceil \right) = \inf_{0 \le s \le \Delta} \left( \Delta - s + T_{on} \cdot \left\lceil \frac{s - T_{off}}{T_{on} + T_{off}} \right\rceil \right).$$

We make the following transformations:
$$T = T_{on} + T_{off}, \qquad \Delta = k \cdot T + r,\ k \in \mathbb{N},\ 0 \le r < T, \qquad s = k_s \cdot T + r_s,\ k_s \in \mathbb{N},\ 0 \le r_s < T.$$

Then, we have
$$\beta^{Gl}(\Delta) = \inf_{0 \le s \le \Delta} \left( (k - k_s) \cdot T + (r - r_s) + T_{on} \cdot k_s + T_{on} \cdot \left\lceil \frac{r_s - T_{off}}{T} \right\rceil \right). \quad (22)$$

As $s \le \Delta$, there are two possibilities for the parameters $r$ and $r_s$: (1) when $k_s = k$, $r_s \le r$ must hold to ensure $s \le \Delta$; (2) when $k_s \le k - 1$, there is no constraint between $r$ and $r_s$, because $k_s \le k - 1$ is already sufficient to guarantee $s \le \Delta$.

Case 1: $k_s \le k - 1$. For this case, there is no constraint between $r$ and $r_s$; thus Eq. (22) yields Eq. (23):
$$\begin{aligned} \beta^{Gl}(\Delta) &= \inf_{0 \le s \le \Delta} \left( (k - k_s) \cdot T + (r - r_s) + k_s \cdot T_{on} + T_{on} \cdot \left\lceil \frac{r_s - T_{off}}{T} \right\rceil \right) \\ &= \inf_{0 \le s \le \Delta} \left( k \cdot T + r - k_s \cdot (T - T_{on}) - r_s + T_{on} \cdot \left\lceil \frac{r_s - T_{off}}{T} \right\rceil \right) \\ &= \inf_{0 \le s \le \Delta} \left( k \cdot T + r - k_s \cdot T_{off} - r_s + T_{on} \cdot \left\lceil \frac{r_s - T_{off}}{T} \right\rceil \right). \end{aligned} \quad (23)$$

—When $T_{off} < r_s < T$ holds, Eq. (23) yields Eq. (24):
$$\beta^{Gl}(\Delta) = \inf_{0 \le s \le \Delta} \left( k \cdot T + r - k_s \cdot T_{off} - r_s + T_{on} \right) > k \cdot T + r - k_s \cdot T_{off} - T + T_{on} = k \cdot T + r - k_s \cdot T_{off} - T_{off} \ge k \cdot T + r - k \cdot T_{off}. \quad (24)$$

—When $0 \le r_s \le T_{off}$ holds, Eq. (23) yields Eq. (25), with the infimum attained at $r_s = T_{off}$, $k_s = k - 1$:
$$\beta^{Gl}(\Delta) = \inf_{0 \le s \le \Delta} \left( k \cdot T + r - k_s \cdot T_{off} - r_s \right) = k \cdot T + r - k \cdot T_{off}. \quad (25)$$

For the preceding two cases, the infimum for the case $k_s \le k - 1$ can be obtained from Eqs. (24) and (25) as Eq. (26):
$$\beta^{Gl}_{k_s \le k - 1}(\Delta) = k \cdot T + r - k \cdot T_{off} = \Delta - k \cdot T_{off}. \quad (26)$$

Case 2: $k_s = k$. For this case, $r_s \le r$ must hold to ensure $s \le \Delta$; thus Eq. (22) yields Eq. (27):
$$\beta^{Gl}(\Delta) = \inf_{0 \le s \le \Delta} \left( (r - r_s) + k \cdot T_{on} + T_{on} \cdot \left\lceil \frac{r_s - T_{off}}{T} \right\rceil \right). \quad (27)$$

As $r_s$ is constrained by $r$, there are two cases for $r$.

—$r \le T_{off}$.
For this case, we have $0 \le r_s \le r \le T_{off}$; thus Eq. (27) yields Eq. (28), with the infimum attained at $r_s = r$:
$$\beta^{Gl}_{k_s = k,\ r \le T_{off}}(\Delta) = \inf_{0 \le s \le \Delta} \left( (r - r_s) + k \cdot T_{on} \right) = k \cdot T_{on}. \quad (28)$$

By integrating the cases $k_s = k$ and $k_s \le k - 1$, Eqs. (26) and (28) yield Eq. (29):
$$\beta^{Gl}_{r \le T_{off}}(\Delta) = \min\left( \Delta - k \cdot T_{off},\ k \cdot T_{on} \right) = k \cdot T_{on}. \quad (29)$$

—$r > T_{off}$. For this case, there are two subcases for $r_s$, namely $0 \le r_s \le T_{off} < r < T$ and $T_{off} < r_s \le r < T$.

—$0 \le r_s \le T_{off} < r < T$. For this case, Eq. (27) yields Eq. (30), with the infimum attained at $r_s = T_{off}$:
$$\beta^{Gl}(\Delta) = \inf_{0 \le s \le \Delta} \left( (r - r_s) + k \cdot T_{on} \right) = r - T_{off} + k \cdot T_{on} < T_{on} + k \cdot T_{on}. \quad (30)$$

—$T_{off} < r_s \le r < T$. For this case, Eq. (27) yields Eq. (31), with the infimum attained at $r_s = r$:
$$\beta^{Gl}(\Delta) = \inf_{0 \le s \le \Delta} \left( (r - r_s) + k \cdot T_{on} + T_{on} \right) = T_{on} + k \cdot T_{on}. \quad (31)$$

For the prior two subcases, the infimum for the case $k_s = k$, $r > T_{off}$ can be obtained from Eqs. (30) and (31) as Eq. (32):
$$\beta^{Gl}_{k_s = k,\ r > T_{off}}(\Delta) = r - T_{off} + k \cdot T_{on} = \Delta - (k + 1) \cdot T_{off}. \quad (32)$$

By integrating the cases $k_s = k$ and $k_s \le k - 1$, Eqs. (32) and (26) yield Eq. (33):
$$\beta^{Gl}_{r > T_{off}}(\Delta) = \min\left( \Delta - k \cdot T_{off},\ \Delta - (k + 1) \cdot T_{off} \right) = \Delta - (k + 1) \cdot T_{off}. \quad (33)$$

With Eqs. (29) and (33), we can obtain the service curve as Eq. (34):
$$\beta^{Gl}(\Delta) = \begin{cases} k \cdot T_{on}, & r \le T_{off} \\ \Delta - (k + 1) \cdot T_{off}, & r > T_{off} \end{cases} \quad (34)$$

When $0 \le r \le T_{off}$ holds, we have $k \cdot T_{on} \ge \Delta - (k + 1) \cdot T_{off}$ and $k = \left\lfloor \frac{\Delta}{T_{on} + T_{off}} \right\rfloor$. When $r > T_{off}$ holds, we have $k \cdot T_{on} < \Delta - (k + 1) \cdot T_{off}$ and $k + 1 = \left\lceil \frac{\Delta}{T_{on} + T_{off}} \right\rceil$.

Then, we can obtain the service curve as Eq. (35):
$$\beta^{Gl}(\Delta) = \max\left( k \cdot T_{on},\ \Delta - (k + 1) \cdot T_{off} \right). \quad (35)$$

By transforming Eq. (35), we obtain Eq. (21); thus the service curve of periodic power management can be represented as Eq. (20).

REFERENCES

A. Alimonda, S. Carta, A. Acquaviva, A. Pisano, and L. Benini. 2009. A feedback-based approach to DVFS in data-flow applications.
IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 28, 11, 1691–1704.

S. Carta, A. Alimonda, A. Pisano, A. Acquaviva, and L. Benini. 2007. A control theoretic approach to energy-efficient pipelined computation in MPSoCs. ACM Trans. Embedd. Comput. Syst. 6, 4.

G. Chen, K. Huang, C. Buckl, and A. Knoll. 2013. Energy optimization with worst-case deadline guarantee for pipelined multiprocessor systems. In Proceedings of the Design, Automation and Test in Europe Conference (DATE'13).

G. Chen, K. Huang, and A. Knoll. 2014. Energy optimization for real-time multiprocessor system-on-chip with optimal DVFS and DPM combination. ACM Trans. Embedd. Comput. Syst. 13, 3.

J. Chen and S. Burer. 2012. Globally solving nonconvex quadratic programming problems via completely positive programming. Math. Program. Comput. 4, 1, 33–52.

J. J. Chen, N. Stoimenov, and L. Thiele. 2009. Feasibility analysis of on-line DVS algorithms for scheduling arbitrary event streams. In Proceedings of the 30th IEEE Real-Time Systems Symposium (RTSS'09).

A. Davare, Q. Zhu, M. Di Natale, C. Pinello, S. Kanajan, and A. Sangiovanni-Vincentelli. 2007. Period optimization for hard real-time distributed automotive systems. In Proceedings of the 44th ACM/IEEE Design Automation Conference (DAC'07).

P. de Langen and B. Juurlink. 2006. Leakage-aware multiprocessor scheduling for low power. In Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS'06).

P. de Langen and B. Juurlink. 2009. Leakage-aware multiprocessor scheduling. J. Signal Process. Syst. 57, 1, 73–88.

M. Fidler. 2003. Extending the network calculus pay bursts only once principle to aggregate scheduling. In Proceedings of the 2nd International Workshop on Quality of Service in Multiservice IP Networks (QoS-IP'03). 19–34.

M. Fu, Z. Luo, and Y. Ye. 1998. Approximation algorithms for quadratic programming. J. Combinat. Optim. 2, 1, 29–50.

S. Y. Hong, T. Chantem, and X. S. Hu. 2011.
Meeting end-to-end deadlines through distributed local deadline assignments. In Proceedings of the 32nd IEEE Real-Time Systems Symposium (RTSS'11).

K. Huang, J. J. Chen, and L. Thiele. 2011a. Energy-efficient scheduling algorithms for periodic power management for real-time event streams. In Proceedings of the 17th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA'11).

K. Huang, L. Santinelli, J. J. Chen, L. Thiele, and G. C. Buttazzo. 2009a. Adaptive dynamic power management for hard real-time systems. In Proceedings of the 30th IEEE Real-Time Systems Symposium (RTSS'09).

K. Huang, L. Santinelli, J. J. Chen, L. Thiele, and G. C. Buttazzo. 2009b. Periodic power management schemes for real-time event streams. In Proceedings of the 48th IEEE International Conference on Decision and Control (CDC'09).

K. Huang, L. Santinelli, J. J. Chen, L. Thiele, and G. C. Buttazzo. 2011b. Applying real-time interface and calculus for dynamic power management in hard real-time systems. Real-Time Syst. 47, 2, 163–193.

H. Javaid and S. Parameswaran. 2009. A design flow for application specific heterogeneous pipelined multiprocessor systems. In Proceedings of the 46th ACM/IEEE Annual Design Automation Conference (DAC'09).

H. Javaid, M. Shafique, J. Henkel, and S. Parameswaran. 2011a. System-level application-aware dynamic power management in adaptive pipelined MPSoCs for multimedia. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'11).

H. Javaid, M. Shafique, S. Parameswaran, and J. Henkel. 2011b. Low-power adaptive pipelined MPSoCs for multimedia: An H.264 video encoder case study. In Proceedings of the 48th ACM/EDAC/IEEE Design Automation Conference (DAC'11).

R. Jejurikar, C. Pereira, and R. Gupta. 2004.
Leakage aware dynamic voltage scaling for real-time embedded systems. In Proceedings of the 41st ACM/IEEE Design Automation Conference (DAC'04).

V. Jeyakumar, A. M. Rubinov, and Z. Y. Wu. 2006. Sufficient global optimality conditions for non-convex quadratic minimization problems with box constraints. J. Global Optim. 36, 3, 471–481.

I. Karkowski and H. Corporaal. 1997. Design of heterogeneous multi-processor embedded systems: Applying functional pipelining. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'97). 156.

K. Lampka, K. Huang, and J. J. Chen. 2011. Dynamic counters and the efficient and effective online power management of embedded real-time systems. In Proceedings of the 7th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'11).

J. Y. Le Boudec and P. Thiran. 2001. Network Calculus: A Theory of Deterministic Queuing Systems for the Internet. Springer.

D. Liu, J. Spasic, J. T. Zhai, T. Stefanov, and G. Chen. 2014. Resource optimization of CSDF-modeled streaming applications with latency constraints. In Proceedings of the Design, Automation and Test in Europe Conference (DATE'14).

S. M. Martin, K. Flautner, T. Mudge, and D. Blaauw. 2002. Combined dynamic voltage scaling and adaptive body biasing for lower power microprocessors under dynamic workloads. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'02).

A. Maxiaguine, S. Chakraborty, and L. Thiele. 2005. DVS for buffer-constrained architectures with predictable QoS-energy tradeoffs. In Proceedings of the IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'05).

H. Nikolov, T. Stefanov, and E. Deprettere. 2008. Systematic and automated multiprocessor system design, programming, and implementation. IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 27, 3, 542–555.

H. Oh and S. Ha. 2002.
Hardware-software cosynthesis of multi-mode multi-task embedded systems with real-time constraints. In Proceedings of the 10th International Symposium on Hardware/Software Codesign (CODES+ISSS'02).

S. Perathoner, K. Lampka, N. Stoimenov, L. Thiele, and J. J. Chen. 2010. Combining optimistic and pessimistic DVS scheduling: An adaptive scheme and analysis. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'10).

S. L. Shee, A. Erdos, and S. Parameswaran. 2006. Heterogeneous multiprocessor implementations for JPEG: A case study. In Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'06).

S. L. Shee and S. Parameswaran. 2007. Design methodology for pipelined heterogeneous multiprocessor system. In Proceedings of the 44th Annual Design Automation Conference (DAC'07).

L. Thiele, S. Chakraborty, and M. Naedele. 2000. Real-time calculus for scheduling hard real-time systems. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS'00).

W. Thies and S. Amarasinghe. 2010. An empirical characterization of stream programs and its implications for language and compiler design. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT'10).

E. Wandeler and L. Thiele. 2006. Real-Time Calculus (RTC) toolbox. http://www.mpa.ethz.ch/Rtctoolbox.

E. Wandeler, L. Thiele, M. Verhoef, and P. Lieverse. 2006. System architecture evaluation using modular performance analysis - A case study. Int. J. Softw. Tools Technol. Transfer 8, 6, 649–667.

W. X. Wang and P. Mishra. 2010. Leakage-aware energy minimization using dynamic voltage scaling and cache reconfiguration in real-time systems. In Proceedings of the 23rd International Conference on VLSI Design (VLSID'10).

R. B. Xu, R. Melhem, and D. Mosse. 2007. Energy-aware scheduling for streaming applications on chip multiprocessors.
In Proceedings of the 28th IEEE International Real-Time Systems Symposium (RTSS'07).

F. Yao, A. Demers, and S. Shenker. 1995. A scheduling model for reduced CPU energy. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS'95).

Y. Yu and V. K. Prasanna. 2002. Power-aware resource allocation for independent tasks in heterogeneous real-time systems. In Proceedings of the 9th IEEE International Conference on Parallel and Distributed Systems (ICPADS'02).

Received November 2013; revised July 2014; accepted November 2014
