# Energy Optimization with Worst-Case Deadline Guarantee for Pipelined Multiprocessor Systems

Gang Chen (TU Munich, Germany, [email protected]), Kai Huang (TU Munich, Germany, [email protected]), Christian Buckl (fortiss GmbH, Germany, [email protected]), Alois Knoll (TU Munich, Germany, [email protected])

**Abstract**—Pipelined computing is a promising paradigm for embedded system design. Designing the scheduling policy for a pipelined system is, however, more involved. In this paper, we study the problem of energy minimization for coarse-grained pipelined systems under hard real-time constraints and propose a method based on an inverse use of the pay-burst-only-once principle. We formulate the problem by means of the resource demands of individual pipeline stages and solve it by quadratic programming. Our approach is scalable w.r.t. the number of pipeline stages. Simulation results using real-life applications as well as commercial processors are presented to demonstrate the effectiveness of our method.

## I. Introduction

Pipelined computing is a promising paradigm for embedded system design, which can in principle provide high performance and low energy consumption [1]. For instance, a streaming application can be split into a sequence of functional blocks that are computed by a pipeline of processors, where clock/power-gating techniques can be applied to achieve energy efficiency. Designing the scheduling policy for the pipeline stages under the requirements of both energy efficiency and timing guarantees is, however, non-trivial. In general, energy efficiency and timing guarantees are conflicting objectives, i.e., techniques that reduce the energy consumption of the system usually pay the price of longer execution times, and vice versa. Previous work on this topic either requires precise timing information of the system [15], [14], [12] or tackles only soft real-time requirements [6], [1].
In the context of hard real-time systems, little work has been published that can handle non-deterministic workloads. This paper studies the energy-minimization problem of coarse-grained pipelined systems under hard real-time requirements. We consider a streaming application that is split into a sequence of coarse-grained functional blocks which are mapped onto a pipeline architecture for processing. The workload of the streaming application is abstracted as an event stream, and the event arrivals of the stream are modeled as arrival curves in the interval domain [7]. The event stream has an end-to-end deadline requirement, i.e., the time by which any event in the stream travels through the pipeline should be no longer than this required deadline. The objective is thereby to find the optimal scheduling policies for the individual stages of the pipeline with minimal energy consumption while the deadline requirement of the event stream is guaranteed.

978-3-9815370-0-0/DATE13/©2013 EDAA

Intuitively, the problem can be solved by partitioning the end-to-end deadline into sub-deadlines for the individual pipeline stages and optimizing the overall energy consumption based on the partitioned sub-deadlines. However, any partition strategy based on the end-to-end deadline, together with the follow-up optimization, will suffer from counting the burst of the event stream multiple times, which inevitably overestimates the needed resource for each pipeline stage and leads to poor energy savings. A motivation example in Section IV will demonstrate this drawback in detail. Therefore, a more sophisticated method is needed to tackle this problem. Our idea lies in an inverse use of the known pay-burst-only-once principle [7]. Rather than directly partitioning the end-to-end deadline, we compute for the entire pipeline one service curve which serves as a constraint for the minimal resource demand.
The energy minimization problem is then formulated with respect to the individual resource demands of the pipeline stages and is solved with standard quadratic programming. For simplicity, we consider power-gating energy minimization and use periodic dynamic power management to reduce the leakage power, i.e., we periodically turn the processors of the pipeline on and off. Note that the basic idea can also be applied to clock-gating energy reduction. With this approach, we can not only guarantee the overall end-to-end deadline requirement but also retrieve the pay-burst-only-once phenomenon, resulting in a significant reduction of the energy consumption. In addition, our method is scalable with respect to the number of pipeline stages. The contributions of this paper are summarized as follows:

- A new method is developed to solve the energy-minimization problem for pipelined multi-processor embedded systems by inversely using the pay-burst-only-once principle.
- We derive a formulation of the minimization problem based on the needed resource of the individual stages of the pipeline architecture, and a transformation of the formulation to a standard quadratic programming problem with box constraints.
- A two-phase heuristic is developed to solve the formulated problem, and a formal proof is provided to show the correctness of our approach, i.e., the guarantee on the end-to-end deadline requirement.
- We conduct simulations using real-life applications as well as commercial processors to demonstrate the effectiveness of our method.

The rest of the paper is organized as follows: Section II reviews related work in the literature. Section III presents the basic models and the definition of the studied problem. Section IV presents the motivation example and Section V describes the proposed approach. The experimental evaluation is presented in Section VI and Section VII concludes the paper.

## II. Related Work

Energy optimization for pipelined multiprocessor systems is an interesting topic for which a number of techniques have been proposed in the literature. For instance, approaches based on control theory [1] and runtime workload prediction [6] have been proposed, targeting energy minimization under soft real-time constraints. There are also methods [15], [14], [12] for hard real-time systems, but these methods require precise timing information of task arrivals, e.g., periodic arrivals. In practice, however, this precise timing information might not be known in advance, since the arrival times of tasks depend on many non-functional factors, e.g., environmental impacts. There is also work on hard real-time systems that allows non-deterministic task arrivals. By using the arrival curve model [7] to abstract task arrivals into the time interval domain, techniques based on dynamic frequency scaling [8], [10] and dynamic power management [5], [4] have recently been proposed for uni-processor systems. Nevertheless, how to cope with multiple processors is not yet clear. In this paper, we present an approach to derive energy-efficient scheduling with hard real-time constraints for pipelined multiprocessor systems using the arrival curve model.

## III. Models and Problem Definition

### A. Hardware Model

We consider a system with the pipeline architecture shown in Fig. 1(a). Each processor in the pipelined system has three power consumption modes, namely active, standby, and sleep, as shown in Fig. 1(b). To serve events, the processor must be in the active mode with power consumption $P_a$. When there is no event to process, the processor can switch to the sleep mode with lower power consumption $P_\sigma$. However, mode switching from sleep to active causes additional energy and latency penalties, denoted as $E_{sw,on}$ and $t_{sw,on}$, respectively.
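These mode-switch overheads imply a break-even sleep interval: powering down only pays off if the time spent asleep is long enough to amortize the total switch energy (written $E_{sw} = E_{sw,on} + E_{sw,sleep}$ later in the paper), compared against remaining in the standby mode described next. This break-even analysis is standard dynamic power management reasoning rather than a result of the paper; the numeric values below are the PXA270 figures quoted later in Section VI.

```python
# Sleeping for T_off saves (P_s - P_sigma) * T_off relative to standby,
# but costs the one-off mode-switch energy E_sw = E_sw,on + E_sw,sleep.
# The break-even off-interval is therefore E_sw / (P_s - P_sigma).
def break_even_time(E_sw, P_s, P_sigma):
    return E_sw / (P_s - P_sigma)

# PXA270 figures from Section VI: E_sw = 10.19 mJ, P_s = 0.260 W,
# P_sigma = 0.0154 W.
t_be = break_even_time(10.19e-3, 0.260, 0.0154)
print(f"{t_be * 1e3:.1f} ms")  # -> 41.7 ms
```

Off-intervals shorter than this threshold cannot save energy by sleeping, which is precisely why a standby mode is worth having.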
To prevent the processor from frequent mode switches, it can stay in the standby mode with power consumption $P_s$, which is less than $P_a$ but more than $P_\sigma$, i.e., $P_a > P_s > P_\sigma$. Moreover, a mode switch from the active (standby) mode to the sleep mode causes energy and time overheads, denoted by $E_{sw,sleep}$ and $t_{sw,sleep}$, respectively.

### B. Task Model

This paper considers streaming applications that can be split into a sequence of tasks. As shown in Fig. 1(a), an H.263 decoder is represented as four tasks (i.e., PD1, deQ, IDCT, MC) implemented in a pipelined fashion [9].

Fig. 1. System model: (a) the H.263 decoder (PD1 → deQ → IDCT → MC) mapped, via FIFO buffers, onto a four-processor pipeline; (b) the power model of a processor (active $P_a$, standby $P_s$, sleep $P_\sigma$, alternating between $T_{on}$ and $T_{off}$).

To model the workload of the application, the concept of an arrival curve $\alpha(\Delta) = [\alpha^u(\Delta), \alpha^l(\Delta)]$, originating from Network Calculus [7], is adopted. $\alpha^u(\Delta)$ and $\alpha^l(\Delta)$ provide upper and lower bounds on the number of events of the stream $S$ arriving in any time interval of length $\Delta$. Analogous to arrival curves, which provide an abstract event stream model, a tuple $\beta(\Delta) = [\beta^u(\Delta), \beta^l(\Delta)]$ defines an abstract resource model which provides upper and lower bounds on the available resources in any time interval $\Delta$. Note that arrival curves are event-based, while service curves are based on the amount of computation time. Supposing that the execution time of an event is $c$, the transformation of the service curves into the event domain can be done by $\bar\beta^l = \lfloor \beta^l / c \rfloor$ and $\bar\beta^u = \lfloor \beta^u / c \rfloor$. With these definitions, a processor with lower service curve $\bar\beta^{Gl}(\Delta)$ is said to satisfy the deadline $D$ for the event stream specified by $\alpha^u(\Delta)$ if the following condition holds:

$$\bar\beta^{Gl}(\Delta) \ge \alpha^u(\Delta - D), \quad \forall \Delta \ge 0 \tag{1}$$

### C. Problem Statement

This paper considers periodic power management [4] that periodically turns a processor on and off.
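As a quick numeric sketch of this periodic scheme (its average idle power is derived in closed form as Eqns. (2) and (3) below), the following uses hypothetical timing values together with the PXA270 power figures from Section VI:

```python
# Average idle power of one processor under periodic power management:
# per period T_on + T_off it pays the switching energy E_sw once plus
# the standby/sleep power gap during T_on.  (The sleep-power floor and
# the active-mode surcharge are workload-dependent constants, so only
# this term matters for the optimization.)
def avg_idle_power(T_on, T_off, E_sw, P_s, P_sigma):
    return (E_sw + T_on * (P_s - P_sigma)) / (T_on + T_off)

# m-stage pipeline: the objective is simply the sum over the stages.
def pipeline_idle_power(T_on, T_off, E_sw, P_s, P_sigma):
    return sum(avg_idle_power(*stage)
               for stage in zip(T_on, T_off, E_sw, P_s, P_sigma))

# Hypothetical 2-stage setting (seconds/watts/joules); the power figures
# are the PXA270 values from Section VI.
P = pipeline_idle_power([0.2, 0.3], [0.5, 0.4],
                        [10.19e-3, 10.19e-3],
                        [0.260, 0.260], [0.0154, 0.0154])
print(round(P, 4))  # -> 0.2038
```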
In each period $T = T_{on} + T_{off}$, the processor is switched to the active (standby) mode for $T_{on}$ time units, followed by $T_{off}$ time units in the sleep mode, as shown in Fig. 1(b). Consider a time interval $L$, where $L \gg T$ and $L/T$ is an integer. Suppose that $\gamma(L)$ is the number of events of the event stream $S$ served in $L$. If all served events finish within $L$, the energy consumption $E(L, T_{on}, T_{off})$ of this periodic scheme is

$$
\begin{aligned}
E(L, T_{on}, T_{off}) &= \frac{L}{T_{on} + T_{off}}(E_{sw,on} + E_{sw,sleep}) + \frac{L \cdot T_{on}}{T_{on} + T_{off}} P_s + \frac{L \cdot T_{off}}{T_{on} + T_{off}} P_\sigma + c \cdot \gamma(L)(P_a - P_s) \\
&= \frac{L \cdot E_{sw}}{T_{on} + T_{off}} + \frac{L \cdot T_{on}(P_s - P_\sigma)}{T_{on} + T_{off}} + L \cdot P_\sigma + c \cdot \gamma(L)(P_a - P_s)
\end{aligned}
$$

where $E_{sw}$ denotes $E_{sw,on} + E_{sw,sleep}$ for brevity. Given a sufficiently large $L$ and without changing the scheduling policy, minimizing the energy consumption $E(L, T_{on}, T_{off})$ of a single processor amounts to finding $T_{off}$ and $T_{on}$ such that the average idle power consumption $P(T_{on}, T_{off})$ is minimized:

$$
P(T_{on}, T_{off}) \overset{\mathrm{def}}{=} \frac{1}{L}\left(\frac{L \cdot E_{sw}}{T_{on}+T_{off}} + \frac{L \cdot T_{on}(P_s - P_\sigma)}{T_{on}+T_{off}}\right) = \frac{E_{sw} + T_{on}(P_s - P_\sigma)}{T_{on} + T_{off}} \tag{2}
$$

Based on (2), the energy minimization problem of an $m$-stage pipeline can be formulated as minimizing the following function:

$$
P(\vec{T}_{on}, \vec{T}_{off}) = \sum_{i=1}^{m} \frac{E^i_{sw} + T^i_{on}(P^i_s - P^i_\sigma)}{T^i_{on} + T^i_{off}} \tag{3}
$$

where $\vec{T}_{on} = [T^1_{on}\ T^2_{on} \ldots T^m_{on}]$ and $\vec{T}_{off} = [T^1_{off}\ T^2_{off} \ldots T^m_{off}]$. The problem studied in this paper can now be defined as follows: given a pipelined platform with $m$ stages, an event stream $S$ processed by this pipeline, and an end-to-end deadline requirement $D$, find a set of periodic power managements, characterized by $\vec{T}_{on}$ and $\vec{T}_{off}$, that minimizes the average idle power consumption $P$ defined in Eqn. (3) while guaranteeing that the worst-case end-to-end delay does not exceed $D$.

## IV. Motivation Example

This section presents a motivation example, where an event stream passes through a 2-stage pipeline with a deadline requirement $D$.
For simplicity, arrival curves in leaky-bucket form and service curves in rate-latency form [7] are used. In this representation, an arrival curve is modeled as $\alpha(\Delta) = b + r \cdot \Delta$, where $b$ is the burst and $r$ is the leak rate. Correspondingly, a service curve is modeled as $\beta(\Delta) = R \cdot (\Delta - T)$, where $R$ is the service rate and $T$ is the delay. A graphical illustration of the example is shown in Fig. 2, where $D = 20$, $b = 5$, $r = 0.5$, and $R_1 = R_2 = 1$.

We first inspect the strategy of partitioning the end-to-end deadline and using the partitioned sub-deadlines for the two pipeline stages. For simplicity, we split $D$ equally, i.e., $D/2$ for each stage. As shown in Fig. 2, given the deadline requirement $D/2$ for the first pipeline stage, we obtain the maximal $T_1 = \frac{D}{2} - \frac{b}{R_1} = 5$, corresponding to the minimal service demand $\beta_1 = \Delta - 5$. Deriving the minimal $\beta_2$ for the second stage is more involved, since we need the output arrival curve $\alpha'$ of the first stage. According to [7], $\alpha'(\Delta) = b + r \cdot T_1 + r \cdot \Delta$. Now, again with a deadline requirement of $D/2$ for $\alpha'$, we have $T_2 = \frac{D}{2} - \frac{b + r \cdot T_1}{R_2} = 2.5$.

Fig. 2. Motivation example: curves $\alpha$, $\alpha'$, $\beta_1$, $\beta_2$, and $\beta^{Tl}$, with $D_1 = D_2 = 10$, $D = 20$, $T_1 = 5$, $T_2 = 2.5$, and $T = 15$.

Let us take a closer look at this solution. According to the concatenation theorem $\beta_{R_1,T_1} \otimes \beta_{R_2,T_2} = \beta_{\min(R_1,R_2),\,T_1+T_2}$, we get the concatenated service curve $\beta = \Delta - (T_1 + T_2) = \Delta - 7.5$. With this concatenated service curve, the maximal end-to-end delay guaranteed by $\beta_1$ and $\beta_2$ is 12.5, which is far stricter than $D$. This indicates that the $\beta_1$ and $\beta_2$ obtained by partitioning the end-to-end deadline are too pessimistic. The pessimism comes from paying the burst $b/R_1$ a second time at the second stage of the pipeline, as well as the additional delay $r \cdot T_1 / R_2$ inherited from the first stage, as the pay-burst-only-once principle points out.
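The numbers of this example can be reproduced in a few lines; they rely only on two standard network-calculus facts: the delay bound $T + b/R$ of a leaky-bucket stream over a rate-latency server (for $R \ge r$), and the concatenation theorem.

```python
# Leaky-bucket arrival (b, r) through rate-latency servers (R, T):
# delay bound is T + b/R (for R >= r), and two servers concatenate to
# (min(R1, R2), T1 + T2) by the min-plus concatenation theorem.
D, b, r, R1, R2 = 20.0, 5.0, 0.5, 1.0, 1.0

# Deadline-partitioning strategy: D/2 per stage.
T1 = D / 2 - b / R1                  # latency budget of stage 1
b_out = b + r * T1                   # burst of the output curve alpha'
T2 = D / 2 - b_out / R2              # latency budget of stage 2

# End-to-end delay actually guaranteed by the concatenated server:
delay = (T1 + T2) + b / min(R1, R2)  # far below D = 20

# Inverse strategy: total latency budget T of the end-to-end server
# beta(delta) = delta - T that still meets D (pay the burst only once):
T_total = D - b / min(R1, R2)
print(T1, T2, delay, T_total)        # -> 5.0 2.5 12.5 15.0
```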
These effects accumulate at every stage of the pipeline, leading to even more pessimistic results as the number of pipeline stages increases. In addition, computing the resource demand of each stage requires the lower bound of the output arrival curve from the previous stage; computing this output curve requires numerical min-plus convolutions, which incur considerable computational and memory overheads. In conclusion, the strategy based on partitioning the end-to-end deadline is not a viable approach, in particular for pipelined systems with many stages.

On the other hand, one can first derive the total service demand $\beta^{Tl}$, in this case with $T = 15$. Any partition based on this $T$ results in smaller but valid service curves for the individual pipeline stages, as we can always retrieve the original end-to-end deadline by means of the pay-burst-only-once principle. For example, with an equal partition of $T$, both $T_1$ and $T_2$ are 7.5 and $D$ is still preserved. This is the basic idea of our approach, presented in the next section.

## V. Proposed Approach

Our approach lies in an inverse use of the pay-burst-only-once principle, as mentioned in the previous section. Rather than directly partitioning the end-to-end deadline, we compute one service curve for the entire pipeline which serves as a constraint for the minimal resource demand. The energy minimization problem is then formulated with respect to the resource demands of the individual pipeline stages. To solve this minimization problem, the formulation is transformed into a quadratic programming form and solved by a two-phase heuristic. Without loss of generality, a pipelined system with $m$ heterogeneous stages ($m \ge 2$) is considered. The processor of the $i$-th stage provides the minimal service $\beta_i^{Gl}$.
Since periodic power management is considered, the minimal service $\beta_i^{Gl}$ can be modeled as a $(T^i_{on}, T^i_{off})$ pair:

$$\beta_i^{Gl}(\Delta) = \left(T^i_{on} \left\lceil \frac{\Delta - T^i_{off}}{T^i_{on} + T^i_{off}} \right\rceil\right) \otimes \Delta \tag{4}$$

In addition, to obtain a tight lower bound on the service curve of the entire pipeline, we restrict $T^i_{on}$ to be a multiple of the worst-case execution time $c_i$, i.e., $T^i_{on} = n_i c_i$, $n_i \in \mathbb{N}^+$.

### A. Problem Formulation

Before presenting the formulation, we first state a few basics. Defining $K_i = \frac{T^i_{on}}{T^i_{on} + T^i_{off}}$, we have the following two lemmas.

**Lem. 1:** $\bar\beta_i^{Gl}(\Delta) \ge \frac{K_i}{c_i}(\Delta - T^i_{off} - c_i)$

*Proof:*

$$
\begin{aligned}
\bar\beta_i^{Gl}(\Delta) &\ge \left\lfloor \frac{T^i_{on}}{c_i} \left\lceil \frac{\Delta - T^i_{off}}{T^i_{on} + T^i_{off}} \right\rceil \right\rfloor \otimes \left\lfloor \frac{\Delta}{c_i} \right\rfloor \\
&\ge n_i \left( \frac{\Delta - T^i_{off}}{T^i_{on} + T^i_{off}} \right) \otimes \frac{1}{c_i}(\Delta - c_i) \\
&\ge \frac{K_i}{c_i}(\Delta - T^i_{off} - c_i)
\end{aligned}
$$

**Lem. 2:** $\bigotimes_{i=1}^{m} \bar\beta_i^{Gl} \ge \min_{i=1}^{m}\left(\frac{K_i}{c_i}\right) \left(\Delta - \sum_{i=1}^{m}(T^i_{off} + c_i)\right)$

*Proof:* It can be directly derived from the definition of the min-plus convolution [7] and Lem. 1.

With Lem. 2, we state the theorem below.

**Thm. 1:** Assume an event stream modeled with arrival curve $\alpha$ is processed by an $m$-stage pipeline, and the lower service curve of each pipeline stage is defined by a $(T^i_{on}, T^i_{off})$ pair. The pipelined system satisfies an end-to-end deadline $D$ if the following condition holds:

$$\min_{i=1}^{m}\left(\frac{K_i}{c_i}\right)\left(\Delta - \sum_{i=1}^{m}(T^i_{off} + c_i)\right) \ge \alpha^u(\Delta - D) \tag{5}$$

*Proof:* By Lem. 2, the left-hand side of the inequality is a lower bound of $\bigotimes_{i=1}^{m} \bar\beta_i^{Gl}$, which is the concatenated service curve of the pipeline. With $\bigotimes_{i=1}^{m} \bar\beta_i^{Gl} \ge \alpha^u(\Delta - D)$, the end-to-end delay of the pipeline is no more than $D$, according to the pay-burst-only-once principle. Therefore, the theorem holds.

The left-hand side of inequality (5) can be considered as a bounded-delay function $bdf(\Delta, \rho_0, b_0) = \max(0, \rho_0(\Delta - b_0))$ with slope $\rho_0 = \min_{i=1}^{m}(K_i/c_i)$ and bounded delay $b_0 = \sum_{i=1}^{m}(T^i_{off} + c_i)$. For the stream $S$ with deadline $D$, a set of minimum bounded-delay functions $bdf_{min}(\Delta, \rho, b)$ can be derived by varying $b$ (see Section V-B). Therefore, we must find a solution $[\vec{K}, \vec{T}_{off}]$ such that the resulting bounded-delay function $bdf(\Delta, \rho_0, b_0)$ is no less than the minimum bounded-delay function $bdf_{min}(\Delta, \rho, b)$. We can thus formulate our optimization problem as follows:

$$
\begin{aligned}
\underset{\vec{K}, \vec{T}_{off}}{\text{minimize}} \quad & P(\vec{K}, \vec{T}_{off}) \\
\text{subject to} \quad & \min_{i=1}^{m}\left(\frac{K_i}{c_i}\right) \ge \rho \\
& \sum_{i=1}^{m}(T^i_{off} + c_i) \le b \\
& 0 \le K_i \le 1, \quad i = 1, \ldots, m \\
& T^i_{off} \ge 0, \quad i = 1, \ldots, m
\end{aligned} \tag{6}
$$

where $\vec{K} = [K_1 \ldots K_m]$, and $P(\vec{K}, \vec{T}_{off})$ is obtained by applying the transformation $K_i = \frac{T^i_{on}}{T^i_{on} + T^i_{off}}$ to (3):

$$P(\vec{K}, \vec{T}_{off}) = \sum_{i=1}^{m} \left( \frac{E^i_{sw}(1 - K_i)}{T^i_{off}} + (P^i_s - P^i_\sigma) K_i \right)$$

The advantage of formulation (6) is two-fold. First of all, the service curves of the individual pipeline stages are the variables of the optimization problem, which on the one hand overcomes the problem of paying the burst multiple times and on the other hand avoids costly computations during the optimization. Second, this formulation allows us to use a more efficient method to analyze the problem, which is presented in the following sections.

### B. Quadratic Programming Transformation

How to solve the minimization problem (6) is not obvious: the constraints $b$ and $\rho$ are not fixed values, and moreover these two constraints are correlated. For a fixed $b$, the minimum bounded-delay function $bdf_{min}(\Delta, \rho, b)$ can be determined by computing $\rho$:

$$\rho = \inf\{\rho : bdf(\Delta, \rho, b) \ge \alpha^u(\Delta - D),\ \forall \Delta \ge 0\} \tag{7}$$

In this paper, we conduct the optimization by varying $b$ and computing $\rho$ for every possible $b$. For a fixed $b$, we can transform (6) into a quadratic programming problem with box constraints (QPB), as stated in the following lemma.

**Lem. 3:** The minimization problem (6) can be transformed into the following quadratic programming problem with box constraints:

$$
\begin{aligned}
\underset{\vec{x} = [x_1 \ldots x_m]}{\text{minimize}} \quad & \vec{x}^T Q \vec{x} \\
\text{subject to} \quad & 0 \le x_i \le \sqrt{E^i_{sw}(1 - \rho c_i)}, \quad i = 1, \ldots, m
\end{aligned} \tag{8}
$$

where $Q = A - B$, $A$ is the $m \times m$ matrix of ones, and $B$ is the $m \times m$ diagonal matrix with $i$-th diagonal element $\frac{(b - \sum_{j=1}^{m} c_j)(P^i_s - P^i_\sigma)}{E^i_{sw}}$. Denoting by $\vec{x}^*$ the optimal solution of the QPB problem (8), the solution of (6) is obtained with

$$K_i = 1 - \frac{(x_i^*)^2}{E^i_{sw}} \quad \text{and} \quad T^i_{off} = \frac{x_i^*}{\sum_{j=1}^{m} x_j^*}\left(b - \sum_{j=1}^{m} c_j\right)$$

*Proof:* With the Cauchy–Bunyakovsky–Schwarz inequality, we get

$$\sum_{i=1}^{m} T^i_{off} \cdot \sum_{i=1}^{m} \frac{E^i_{sw}(1 - K_i)}{T^i_{off}} \ge \left(\sum_{i=1}^{m} \sqrt{E^i_{sw}(1 - K_i)}\right)^2$$

The minimum value of $\sum_{i=1}^{m} \frac{E^i_{sw}(1-K_i)}{T^i_{off}}$ is therefore $\frac{\left(\sum_{i=1}^{m}\sqrt{E^i_{sw}(1-K_i)}\right)^2}{b - \sum_{j=1}^{m} c_j}$, attained when

$$T^i_{off} = \frac{\sqrt{E^i_{sw}(1 - K_i)}}{\sum_{j=1}^{m}\sqrt{E^j_{sw}(1 - K_j)}}\left(b - \sum_{j=1}^{m} c_j\right)$$

The optimization (6) can then be reformulated as

$$
\begin{aligned}
\underset{K_1, \ldots, K_m}{\text{minimize}} \quad & \frac{\left(\sum_{i=1}^{m}\sqrt{E^i_{sw}(1 - K_i)}\right)^2}{b - \sum_{j=1}^{m} c_j} + \sum_{i=1}^{m}(P^i_s - P^i_\sigma) K_i \\
\text{subject to} \quad & \rho c_i \le K_i \le 1, \quad i = 1, \ldots, m
\end{aligned}
$$

By defining $x_i = \sqrt{E^i_{sw}(1 - K_i)}$, formulation (6) is transformed into the QPB problem (8).

Note that there is a feasible region for $b$. To guarantee that all resulting $T^i_{off} \ge 0$, the bounded delay $b$ must not be less than $\sum_{i=1}^{m} c_i$. According to (5), the maximum slope $\rho$ of the bounded-delay function cannot exceed $\frac{1}{\max_{i=1}^{m} c_i}$. Correspondingly, we derive the minimum bounded-delay function $bdf_{min}(\Delta, \frac{1}{\max_{i=1}^{m} c_i}, b)$. By inverting (7), we can derive the maximum delay $b^u$ by (9), which guarantees that none of the resulting $K_i$ exceeds 1. In summary, the feasible region $b \in [b^l, b^u]$ is bounded as follows:

$$
b^u = \sup\left\{d : bdf\left(\Delta, \tfrac{1}{\max_{i=1}^{m} c_i}, d\right) \ge \alpha^u(\Delta - D),\ \forall \Delta \ge 0\right\}, \qquad b^l = \sum_{i=1}^{m} c_i \tag{9}
$$

### C. Two-Phase Heuristic

With the above information, we can now present the overall algorithm for the energy minimization problem defined in Section III-C. Basically, the bounded delay $b$ is scanned with step $\epsilon$ within the range $[b^l, b^u]$. For each $b$, we first solve the sub-problem (8) with a QPB solver.
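One iteration of this scan can be sketched as follows. The sketch assumes a leaky-bucket upper arrival curve $\alpha^u(\Delta) = b_a + r\Delta$, for which the infimum in (7) is simply the leak rate $r$ whenever $b \le D - b_a/r$ (the shifted arrival curve is itself a bounded-delay curve of slope $r$). The projected gradient loop is a simple stand-in for the finite B&B QPB solver used in the paper, and all numeric parameters are made up (abstract units, not the paper's values).

```python
import numpy as np

# Stream: leaky-bucket alpha^u(d) = b_a + r*d, end-to-end deadline D.
# Two hypothetical stages with WCETs c_i and made-up power/energy figures.
b_a, r, D = 5.0, 0.5, 20.0
c    = np.array([1.0, 1.5])      # WCET c_i
E_sw = np.array([1.0, 1.0])      # switching energy E^i_sw
dP   = np.array([4.0, 5.0])      # P^i_s - P^i_sigma

b = 8.0                          # one bounded-delay value of the scan
assert b <= D - b_a / r          # feasibility of this b (see lead-in)
rho = r                          # Eqn. (7) for a leaky-bucket alpha^u

# Lem. 3: build the QPB  min x^T Q x,  0 <= x_i <= sqrt(E_sw,i(1-rho*c_i)).
m = len(c)
slack = b - c.sum()                       # b - sum_j c_j
Q = np.ones((m, m)) - np.diag(slack * dP / E_sw)
ub = np.sqrt(E_sw * (1.0 - rho * c))

# Projected gradient descent on the (indefinite) quadratic; it only
# seeks a good stationary point, not a certified global optimum.
x = ub / 2.0
for _ in range(5000):
    x = np.clip(x - 0.005 * 2.0 * Q @ x, 0.0, ub)

# Recover the periodic power-management parameters (Lem. 3).
K = 1.0 - x**2 / E_sw                     # duty cycles K_i
T_off = x / x.sum() * slack               # off intervals T^i_off
print(K, T_off)
```

With these (deliberately sleep-friendly) power numbers, the solver pushes every stage to its minimum duty cycle $K_i = \rho c_i$ and distributes the whole slack $b - \sum_j c_j$ over the off intervals.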
Then, the obtained solution is repaired to fulfill further constraints (explained below). The pseudo-code of the algorithm is depicted in Algo. 1.

**Algorithm 1** PBOOA
Input: $\alpha^u$, $b^l$, $b^u$, $\epsilon$, and $P_{min} = \infty$
Output: $\vec{K}_{opt}$, $\vec{T}_{off,opt}$
1: for $b = b^l$ to $b^u$ with step $\epsilon$ do
2:&nbsp;&nbsp;&nbsp;compute $\rho$ by Eqn. (7);
3:&nbsp;&nbsp;&nbsp;obtain $\vec{K}$ and $\vec{T}_{off}$ by solving (8);
4:&nbsp;&nbsp;&nbsp;repair $\vec{K}$ and $\vec{T}_{off}$;
5:&nbsp;&nbsp;&nbsp;if $P(\vec{K}, \vec{T}_{off}) < P_{min}$ then
6:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\vec{K}_{opt} \leftarrow \vec{K}$; $\vec{T}_{off,opt} \leftarrow \vec{T}_{off}$;
7:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$P_{min} \leftarrow P(\vec{K}_{opt}, \vec{T}_{off,opt})$;
8:&nbsp;&nbsp;&nbsp;end if
9: end for

To solve the sub-problem (Line 3 in Algo. 1), we apply an existing QPB solver. According to [2], when $Q$ is positive semi-definite, QPB is solvable in polynomial time. Otherwise, QPB is a non-convex quadratic programming problem, which is NP-hard. Nevertheless, there are approximation schemes [3] that can efficiently solve non-convex QPB, and many excellent off-the-shelf software packages [2] are available. In this paper, the state-of-the-art finite B&B algorithm [2] is applied to solve our QPB problem.

**Algorithm 2** Repair Scheme
Input: solution of the QPB problem: $[\vec{K}, \vec{T}_{off}]$
Output: $[\vec{K}', \vec{T}'_{off}]$
1: compute the stage set $S_1 = \{p_i \mid T^i_{off} < t^i_{sw}\}$;
2: repair $[K', T'_{off}]$ of each stage $p \in S_1$ as $[1, 0]$;
3: compute the loss $Q = \sum_{p_i \in S_1} T^i_{off}$;
4: reassign $Q$ to the stage $p$ with maximum power savings;
5: compute $T^i_{on}$ and the stage set $S_2 = \{p_i \mid T^i_{off} \ge t^i_{sw}\}$;
6: for each stage $p \in S_2$ do
7:&nbsp;&nbsp;&nbsp;if $T_{on} < c$ then
8:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$T'_{on} \leftarrow c$; $T'_{off} \leftarrow T_{off}$;
9:&nbsp;&nbsp;&nbsp;else
10:&nbsp;&nbsp;$T'_{on} \leftarrow \lfloor T_{on}/c \rfloor \cdot c$; $T'_{off} \leftarrow T'_{on}/K - T'_{on}$;
11:&nbsp;&nbsp;if $T'_{off} < t_{sw}$ then
12:&nbsp;&nbsp;&nbsp;&nbsp;$T'_{on} \leftarrow \lceil T_{on}/c \rceil \cdot c$; $T'_{off} \leftarrow T_{off}$;
13:&nbsp;&nbsp;end if
14:&nbsp;&nbsp;end if
15: end for

After obtaining a pair $\vec{K}$ and $\vec{T}_{off}$, the repair phase (Line 4 in Algo. 1) is conducted to fulfill further constraints. This repair scheme is presented in Algo. 2. First of all, the resulting $T^i_{off}$ of pipeline stage $i$ may be smaller than $t^i_{sw}$; in that case, turning off the processor of stage $i$ is not possible.
Therefore, the solution for stage $i$ is repaired to $[K_i', T^{i\prime}_{off}] = [1, 0]$, i.e., stage $i$ stays on all the time (Line 2 in Algo. 2). However, this repair step leads to a loss of sleep time $Q$ (Line 3 in Algo. 2). We tentatively assign the loss $Q$ to each stage in turn, setting $T_{off} = T_{off} + Q$, and compute the resulting power savings by comparison with the previous solution; $Q$ is then assigned to the stage with the maximum power saving (Line 4 in Algo. 2). Second, the resulting $T^i_{on}$ may not be a multiple of $c_i$, which is one of our basic requirements. The repair steps of Lines 5–15 in Algo. 2 make $T^i_{on}$ a multiple of $c_i$. It is worth noting that the repaired solution still satisfies the constraints.

## VI. Performance Evaluations

In this section, we demonstrate the effectiveness of our approach. We compare our approach (PBOOA) with the deadline partition approach (DPA), where DPA partitions the end-to-end deadline into sub-deadlines for the individual pipeline stages and optimizes the overall energy consumption by using the scheme in [4] to minimize the energy consumption of each individual stage. The simulation is implemented in Matlab using the RTC toolbox [13], and the finite B&B algorithm [2] is used to solve the QPB. All results are obtained on a 2.83 GHz processor with 4 GB memory.

### A. Simulation Setup

The H.263 decoder shown in Fig. 1(a) is used as the test application. The execution time of each subtask of the H.263 decoder can be found in [9]. The event stream is specified by the PJD model. The activation period of the application is 300 ms with an end-to-end deadline constraint of 600 ms. Regarding the processors of the pipeline architecture, we consider the Marvell PXA270 processor and source its power profile from [11]. The standby power $P_s$ and the sleep power $P_\sigma$ are 0.260 W and 0.0154 W, respectively, with a switching time overhead $t_{sw}$ of 0.067 s and an energy overhead $E_{sw}$ of 10.19 mJ.

### B. Simulation Results

We first evaluate how the power consumption of the two approaches changes as the jitter varies. The cases of 2-stage and 3-stage pipeline architectures with homogeneous PXA270 processors are evaluated, with the jitter of the stream varied from 0 to 840 ms. The simulation results are shown in Fig. 3.

Fig. 3. Power consumption of PBOOA and DPA on 2-stage and 3-stage homogeneous pipelined systems with varying jitter: (a) 2-stage, (PD1, deQ) → P1, (IDCT, MC) → P2; (b) 3-stage, (PD1, deQ) → P1, IDCT → P2, MC → P3.

From the figures, we can make the following observations: (1) PBOOA always outperforms DPA on both pipeline architectures; on average, PBOOA achieves 16.8% and 20.09% normalized power savings w.r.t. DPA on the 2-stage and 3-stage architectures, respectively. (2) PBOOA achieves larger power savings on the 3-stage pipeline than on the 2-stage pipeline for every jitter setting; the reason is that DPA on the 3-stage pipeline pays the burst more times than on the 2-stage platform. (3) The power consumption of both approaches increases with the jitter, since a bigger jitter requires a longer $T_{on}$ to guarantee the worst-case end-to-end deadline.

Second, we demonstrate the scalability of our approach. We test it on heterogeneous pipelines with up to 9 stages and a jitter of 300 ms.

Fig. 4. Computation time and power consumption for heterogeneous pipelined systems.
The power profiles of the processors are randomly generated, with ranges set according to the PXA270 processor. Fig. 4 shows the power consumption and the computation overhead on the different pipelines. From this figure, we can make the following observations: (1) The DPA approach is time-consuming. For the case of the 3-stage pipeline, DPA takes almost four hours, which is 65 times longer than PBOOA on the same pipeline; in addition, the 4-stage case needs 15 times more computing time than the 3-stage case. When the number of stages exceeds 4, the deadline partition approach fails to provide a result due to expiration of the time budget. (2) PBOOA is considerably faster. The 2-stage case takes about three minutes, and even the 9-stage pipeline needs only 2.5 times more computing time than the 2-stage case.

## VII. Conclusion

This paper presents a new approach to minimize the energy consumption of pipelined systems. Our approach can tackle streaming applications with non-deterministic workload arrivals under hard real-time constraints. It not only guarantees the original end-to-end deadline requirement but also retrieves the pay-burst-only-once phenomenon, resulting in a significant reduction of both the energy consumption and the computation overhead. Moreover, our approach is scalable with respect to the number of pipeline stages. Regarding future work, it is an interesting problem to combine our approach with the consideration of the mapping of the application.

## Acknowledgment

This work has been partly funded by the German BMBF projects ECU (grant number: 13N11936) and Car2X (grant number: 13N11933).

## References

[1] S. Carta, A. Alimonda, A. Pisano, A. Acquaviva, and L. Benini. A control theoretic approach to energy-efficient pipelined computation in MPSoCs. ACM Transactions on Embedded Computing Systems, 2007.
[2] J. Chen and S. Burer. Globally solving nonconvex quadratic programming problems via completely positive programming. Mathematical Programming Computation, 2012.
[3] M. Fu, Z.-Q. Luo, and Y. Ye. Approximation algorithms for quadratic programming. Journal of Combinatorial Optimization, 1998.
[4] K. Huang, L. Santinelli, J.-J. Chen, L. Thiele, and G. Buttazzo. Periodic power management schemes for real-time event streams. In CDC, 2009.
[5] K. Huang, L. Santinelli, J.-J. Chen, L. Thiele, and G. Buttazzo. Applying real-time interface and calculus for dynamic power management in hard real-time systems. Real-Time Systems, 2011.
[6] H. Javaid, M. Shafique, S. Parameswaran, and J. Henkel. Low-power adaptive pipelined MPSoCs for multimedia: An H.264 video encoder case study. In DAC, 2011.
[7] J. Le Boudec and P. Thiran. Network Calculus: A Theory of Deterministic Queuing Systems for the Internet. Springer, 2001.
[8] A. Maxiaguine, S. Chakraborty, and L. Thiele. DVS for buffer-constrained architectures with predictable QoS-energy tradeoffs. In CODES+ISSS, 2005.
[9] H. Oh and S. Ha. Hardware-software cosynthesis of multi-mode multi-task embedded systems with real-time constraints. In CODES+ISSS, 2002.
[10] S. Perathoner, K. Lampka, N. Stoimenov, L. Thiele, and J.-J. Chen. Combining optimistic and pessimistic DVS scheduling: An adaptive scheme and analysis. In ICCAD, 2010.
[11] Marvell PXA270. http://www.marvell.com/application-processors.
[12] K. Srinivasan and K. S. Chatha. Integer linear programming and heuristic techniques for system-level low power scheduling on multiprocessor architectures under throughput constraints. Integration, the VLSI Journal, 2007.
[13] E. Wandeler and L. Thiele. Real-Time Calculus (RTC) Toolbox. http://www.mpa.ethz.ch/Rtctoolbox, 2006.
[14] R. Xu, R. Melhem, and D. Mosse. Energy-aware scheduling for streaming applications on chip multiprocessors. In RTSS, 2007.
[15] Y. Yu and V. Prasanna. Power-aware resource allocation for independent tasks in heterogeneous real-time systems. In ICPADS, 2002.
