Energy Optimization with Worst-Case Deadline Guarantee for Pipelined Multiprocessor Systems
Gang Chen, TU Munich, Germany
Kai Huang, TU Munich, Germany
Christian Buckl, fortiss GmbH, Germany
Alois Knoll, TU Munich, Germany
Abstract—Pipelined computing is a promising paradigm for
embedded system design. Designing the scheduling policy for a pipelined system is, however, more involved. In this paper, we study the problem of energy minimization for coarse-grained pipelined systems under hard real-time constraints and propose a method based on an inverse use of the pay-burst-only-once principle. We formulate the problem by means of the resource demands of individual pipeline stages and solve it by quadratic programming. Our approach is scalable w.r.t. the number of pipeline stages. Simulation results using real-life applications as well as commercial processors are presented to demonstrate the effectiveness of our method.
I. INTRODUCTION
Pipelined computing is a promising paradigm for embedded
system design, which can in principle provide high performance and low energy consumption [1]. For instance, a
streaming application can be split into a sequence of functional blocks that are computed by a pipeline of processors
where clock/power-gating techniques can be applied to achieve
energy efficiency.
Designing the scheduling policy for the pipeline stages under the requirements of both energy efficiency and timing guarantees is, however, non-trivial. In general, energy efficiency and timing guarantees are conflicting objectives, i.e., techniques that reduce the energy consumption of the system usually come at the price of longer execution times, and vice versa. Previous work on this topic either requires precise timing information about the system [15], [14], [12] or tackles only soft real-time requirements [6], [1]. In the context of hard real-time systems, little work has been published that can handle non-deterministic workloads.
This paper studies the energy-minimization problem of coarse-grained pipelined systems under hard real-time requirements. We consider a streaming application that is split into a sequence of coarse-grained functional blocks, which are mapped to a pipeline architecture for processing. The workload of the streaming application is abstracted as an event stream, and the event arrivals of the stream are modeled as arrival curves in the interval domain [7]. The event stream has an end-to-end deadline requirement, i.e., the time any event of the stream takes to travel through the pipeline must not exceed this deadline. The objective is thus to find scheduling policies for the individual pipeline stages that minimize the energy consumption while guaranteeing the deadline requirement of the event stream.
978-3-9815370-0-0/DATE13/©2013 EDAA
Intuitively, the problem can be solved by partitioning the end-to-end deadline into sub-deadlines for individual pipeline stages and optimizing the overall energy consumption based on the partitioned sub-deadlines. However, any partitioning strategy based on the end-to-end deadline, together with the follow-up optimization, suffers from paying the burst of the event stream multiple times, which inevitably overestimates the resources needed by each pipeline stage and leads to poor energy savings. A motivating example in Section IV demonstrates this drawback in detail. Therefore, a more sophisticated method is needed to tackle this problem.
Our idea for solving this problem lies in an inverse use of the well-known pay-burst-only-once principle [7]. Rather than directly partitioning the end-to-end deadline, we compute one service curve for the entire pipeline, which serves as a constraint on the minimal resource demand. The energy-minimization problem is then formulated with respect to the individual resource demands of the pipeline stages and is solved with standard quadratic programming. For simplicity, we consider power-gating energy minimization and use periodic dynamic power management to reduce the leakage power, i.e., to periodically turn the processors of the pipeline on and off. Note that the basic idea can also be applied to clock-gating energy reduction. With this approach, we can not only guarantee the overall end-to-end deadline requirement but also retrieve the pay-burst-only-once phenomenon, resulting in a significant reduction of the energy consumption. In addition, our method is scalable with respect to the number of pipeline stages. The contributions of this paper are summarized as follows:
• A new method is developed to solve the energy-minimization problem for pipelined multiprocessor embedded systems by inversely using the pay-burst-only-once principle.
• We derive a formulation of the minimization problem based on the resources needed by individual stages of the pipeline architecture, and a transformation of this formulation into a standard quadratic programming problem with box constraints.
• A two-phase heuristic is developed to solve the formulated problem, and a formal proof is provided to show the correctness of our approach, i.e., the guarantee on the end-to-end deadline requirement.
• We conduct simulations using real-life applications as well as commercial processors to demonstrate the effectiveness of our method.
The rest of the paper is organized as follows: Section II reviews related work in the literature. Section III presents the basic models and the definition of the studied problem. Section IV presents the motivating example, and Section V describes the proposed approach. Experimental evaluation is presented in Section VI, and Section VII concludes the paper.
II. RELATED WORK
Energy optimization for pipelined multiprocessor systems is an interesting topic for which a number of techniques have been proposed in the literature. For instance, approaches based on control theory [1] and runtime workload prediction [6] have been proposed, targeting energy minimization under soft real-time constraints. There are also methods [15], [14], [12] for hard real-time systems. However, these methods require precise timing information about task arrivals, e.g., periodic arrivals. In practice, such precise timing information might not be known in advance, since the arrival times of tasks depend on many non-functional factors, e.g., environmental impacts.
There is also a body of work on hard real-time systems that allows non-deterministic task arrivals. By using the arrival curve model [7] to abstract task arrivals into the time-interval domain, techniques based on dynamic frequency scaling [8], [10] and dynamic power management [5], [4] have recently been proposed for uniprocessor systems. Nevertheless, how to cope with multiple processors is not yet clear. In this paper, we present an approach to derive energy-efficient schedules with hard real-time constraints for pipelined multiprocessor systems using the arrival curve model.
III. MODELS AND PROBLEM DEFINITION
A. Hardware Model
We consider a system with the pipeline architecture shown in Fig. 1(a). Each processor in the pipelined system has three power consumption modes, namely active, standby, and sleep, as shown in Fig. 1(b). To serve events, the processor must be in active mode with power consumption $P_a$. When there is no event to process, the processor can switch to sleep mode with lower power consumption $P_\sigma$. However, switching from sleep mode to active mode causes additional energy and latency penalties, denoted by $E_{sw,on}$ and $t_{sw,on}$, respectively. To prevent the processor from switching modes too frequently, the processor can stay in standby mode with power consumption $P_s$, which is less than $P_a$ but more than $P_\sigma$, i.e., $P_a > P_s > P_\sigma$. Moreover, switching from active (standby) mode to sleep mode causes energy and time overheads, denoted by $E_{sw,sleep}$ and $t_{sw,sleep}$, respectively.
B. Task Model
This paper considers streaming applications that can be split into a sequence of tasks. As shown in Fig. 1(a), an H.263 decoder is represented as four tasks (i.e., PD1, deQ, IDCT, MC) implemented in a pipelined fashion [9]. To model the workload of the application, the concept of the arrival curve $\alpha(\Delta) = [\alpha^u(\Delta), \alpha^l(\Delta)]$, which originates from Network Calculus [7], is adopted. $\alpha^u(\Delta)$ and $\alpha^l(\Delta)$ provide the upper and lower bounds on the number of arriving events of the stream S in any time interval of length $\Delta$. Analogous to arrival curves, which provide an abstract event stream model, a tuple $\beta(\Delta) = [\beta^u(\Delta), \beta^l(\Delta)]$ defines an abstract resource model, which provides upper and lower bounds on the available resources in any time interval of length $\Delta$. Note that arrival curves are event-based, whereas service curves are based on the amount of computation time. Supposing that the execution time of an event is c, the service curves can be transformed by $\bar\beta^l = \lfloor \beta^l / c \rfloor$ and $\bar\beta^u = \lfloor \beta^u / c \rfloor$.

Fig. 1. System model. (a) H.263 decoder on pipeline hardware architecture: PD1, deQ, IDCT, and MC run on Processor1 to Processor4, connected by FIFOs. (b) Power model of a processor: active ($P_a$) or standby ($P_s$) during $T_{on}$, sleep ($P_\sigma$) during $T_{off}$.

With these definitions, a processor with lower service curve $\bar\beta^{Gl}(\Delta)$ is said to satisfy the deadline D for the event stream specified by $\alpha^u(\Delta)$ if the following condition holds:

$$\bar\beta^{Gl}(\Delta) \ge \alpha^u(\Delta - D), \quad \forall \Delta \ge 0 \qquad (1)$$
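As an illustration only, condition (1) can be spot-checked on a finite grid of interval lengths. The Python sketch below assumes integer time units (e.g., ms), a PJD-style upper arrival curve without the minimum-distance term, $\alpha^u(\Delta) = \lceil (\Delta + J)/P \rceil$ for $\Delta > 0$, and a rate-latency lower service curve converted to events by $\lfloor \beta^l / c \rfloor$. The helper names and the horizon are hypothetical, and a finite grid is a sanity check rather than a proof for all $\Delta \ge 0$.

```python
def pjd_upper(delta: int, period: int, jitter: int) -> int:
    """Upper arrival curve ceil((delta + J) / P) for delta > 0 (minimum distance ignored)."""
    if delta <= 0:
        return 0
    return -(-(delta + jitter) // period)  # integer ceiling division


def rate_latency_events(delta: int, latency: int, wcet: int) -> int:
    """Event-based lower service curve floor(max(0, delta - latency) / c)."""
    return max(0, delta - latency) // wcet


def meets_deadline(service, arrival, deadline: int, horizon: int) -> bool:
    """Condition (1): service(delta) >= arrival(delta - D) for 0 <= delta <= horizon."""
    return all(service(d) >= arrival(d - deadline) for d in range(horizon + 1))


if __name__ == "__main__":
    alpha = lambda d: pjd_upper(d, period=300, jitter=0)
    ok = meets_deadline(lambda d: rate_latency_events(d, 550, 50), alpha, 600, 3000)
    late = meets_deadline(lambda d: rate_latency_events(d, 560, 50), alpha, 600, 3000)
    print(ok, late)  # True False
```

With P = 300 ms, a WCET of 50 ms and D = 600 ms, a latency of 550 ms passes the check, while 560 ms fails at Δ = 601, where the first event must already be finished but less than one WCET of service has been delivered.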
C. Problem Statement
This paper considers periodic power management [4], which periodically turns a processor on and off. In each period $T = T_{on} + T_{off}$, the processor is switched to active (standby) mode for $T_{on}$ time units, followed by $T_{off}$ time units in sleep mode, as shown in Fig. 1(b). Consider a time interval L, where $L \gg T$ and $L/T$ is an integer. Suppose that $\gamma(L)$ is the number of events of event stream S served in L. If all the served events finish within L, the energy consumption $E(L, T_{on}, T_{off})$ of this periodic scheme is

$$
\begin{aligned}
E(L, T_{on}, T_{off}) &= \frac{L}{T_{on}+T_{off}}\left(E_{sw,on}+E_{sw,sleep}\right) + \frac{L\cdot T_{on}}{T_{on}+T_{off}}P_s + \frac{L\cdot T_{off}}{T_{on}+T_{off}}P_\sigma + c\cdot\gamma(L)(P_a - P_s)\\
&= \frac{L\cdot E_{sw}}{T_{on}+T_{off}} + \frac{L\cdot T_{on}(P_s - P_\sigma)}{T_{on}+T_{off}} + L\cdot P_\sigma + c\cdot\gamma(L)(P_a - P_s),
\end{aligned}
$$

where $E_{sw} = E_{sw,on} + E_{sw,sleep}$ for brevity. Given a sufficiently large L and an unchanged scheduling policy, the terms $L\cdot P_\sigma$ and $c\cdot\gamma(L)(P_a - P_s)$ do not depend on $T_{on}$ and $T_{off}$. Minimizing the energy consumption $E(L, T_{on}, T_{off})$ of a single processor therefore amounts to finding $T_{on}$ and $T_{off}$ such that the average idle power consumption $P(T_{on}, T_{off})$ is minimized:

$$P(T_{on}, T_{off}) \overset{\text{def}}{=} \frac{1}{L}\left(\frac{L\cdot E_{sw}}{T_{on}+T_{off}} + \frac{L\cdot T_{on}(P_s - P_\sigma)}{T_{on}+T_{off}}\right) = \frac{E_{sw} + T_{on}(P_s - P_\sigma)}{T_{on}+T_{off}} \qquad (2)$$
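The two forms of $E(L, T_{on}, T_{off})$ and the definition of $P(T_{on}, T_{off})$ in (2) can be cross-checked numerically. The sketch below uses the PXA270 figures quoted in Section VI ($P_s$ = 0.260 W, $P_\sigma$ = 0.0154 W, $E_{sw}$ = 10.19 mJ); the active power $P_a$, the execution time c, the event count $\gamma(L)$, and the $(T_{on}, T_{off})$ pair are hypothetical values chosen only for illustration.

```python
def energy(L, t_on, t_off, e_sw, p_a, p_s, p_sigma, c, gamma):
    """E(L, T_on, T_off), first form: switching + standby + sleep + extra active energy."""
    period = t_on + t_off
    return (L / period * e_sw
            + L * t_on / period * p_s
            + L * t_off / period * p_sigma
            + c * gamma * (p_a - p_s))


def idle_power(t_on, t_off, e_sw, p_s, p_sigma):
    """Average idle power P(T_on, T_off) of Eq. (2)."""
    return (e_sw + t_on * (p_s - p_sigma)) / (t_on + t_off)


if __name__ == "__main__":
    # PXA270 values from Section VI; P_A, C, GAMMA and (T_on, T_off) are hypothetical.
    E_SW, P_S, P_SIGMA = 0.01019, 0.260, 0.0154
    P_A, C, GAMMA, L = 0.5, 0.05, 1000, 300.0
    e = energy(L, 0.2, 0.3, E_SW, P_A, P_S, P_SIGMA, C, GAMMA)
    p = idle_power(0.2, 0.3, E_SW, P_S, P_SIGMA)
    # Second form of E: only the L * P term depends on (T_on, T_off).
    print(e, L * p + L * P_SIGMA + C * GAMMA * (P_A - P_S))
```

Both printed values agree (about 52.086 J here), and only the $L \cdot P(T_{on}, T_{off})$ term changes with $(T_{on}, T_{off})$, which is why minimizing (2) suffices.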
Based on (2), the energy-minimization problem of an m-stage pipeline can be formulated as minimizing the following function:

$$P(\vec T_{on}, \vec T_{off}) = \sum_{i=1}^{m} \frac{E_{sw}^i + T_{on}^i\,(P_s^i - P_\sigma^i)}{T_{on}^i + T_{off}^i} \qquad (3)$$

where $\vec T_{on} = [T_{on}^1\ T_{on}^2 \ldots T_{on}^m]$ and $\vec T_{off} = [T_{off}^1\ T_{off}^2 \ldots T_{off}^m]$. We can now define the problem studied in this paper as follows:
Given a pipelined platform with m stages, an event stream S processed by this pipeline, and an end-to-end deadline requirement D, find a set of periodic power managements characterized by $\vec T_{on}$ and $\vec T_{off}$ that minimizes the average idle power consumption P defined in Eqn. (3), while guaranteeing that the worst-case end-to-end delay does not exceed D.
IV. MOTIVATING EXAMPLE
This section presents a motivating example, in which an event stream passes through a 2-stage pipeline with a deadline requirement D. For simplicity, arrival curves in leaky-bucket form and service curves in rate-latency form [7] are used. In this representation, an arrival curve is modeled as $\alpha(\Delta) = b + r\cdot\Delta$, where b is the burst and r is the leaky rate. Correspondingly, a service curve is modeled as $\beta(\Delta) = R\cdot(\Delta - T)$, where R is the service rate and T is the delay. A graphical illustration of the example is shown in Fig. 2, where D = 20, b = 5, r = 0.5, and $R_1 = R_2 = 1$.
We first inspect the strategy of partitioning the end-to-end deadline and using the partitioned sub-deadlines for the two pipeline stages. For simplicity, we split D equally, i.e., D/2 for each stage. As shown in Fig. 2, given the deadline requirement D/2 for the first pipeline stage, we obtain the maximal $T_1 = \frac{D}{2} - \frac{b}{R_1} = 5$, corresponding to the minimal service demand $\beta_1 = \Delta - 5$. Deriving the minimal $\beta_2$ for the second stage of the pipeline is more involved, since we need the output arrival curve $\alpha'$ of the first stage. According to [7], $\alpha'(\Delta) = b + r\cdot T_1 + r\cdot\Delta$. Now, again with a deadline requirement D/2 for $\alpha'$, we have $T_2 = \frac{D}{2} - \frac{b + r\cdot T_1}{R_2} = 2.5$.
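Fig. 2. Motivating example: arrival curve α, output arrival curve α′ of the first stage, service curves β1 and β2 obtained from the deadline partition, and total service demand $\beta^{Tl}$, plotted over Δ (D = 20, D1 = D2 = 10, T1 = 5, T2 = 2.5, T = 15).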
Let us take a closer look at this solution. According to the concatenation theorem $\beta_{R_1,T_1} \otimes \beta_{R_2,T_2} = \beta_{\min(R_1,R_2),\,T_1+T_2}$, we get the concatenated service curve $\beta = \Delta - (T_1 + T_2) = \Delta - 7.5$. With this concatenated service curve, the worst-case end-to-end delay under $\beta_1$ and $\beta_2$ is bounded by $(T_1 + T_2) + b/\min(R_1, R_2) = 12.5$, which is far stricter than the required D = 20. This example indicates that $\beta_1$ and $\beta_2$ obtained by partitioning the end-to-end deadline are too pessimistic.
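The deadline-partition computation above generalizes to m stages in a few lines. The sketch below assumes a leaky-bucket input (b, r), identical rate-latency stages with rate R ≥ r, and an equal split D/m; each stage's latency is $T_i = D/m - b_i/R$, and the burst seen by the next stage grows to $b_i + r\,T_i$, as in $\alpha'$. Exact fractions avoid rounding; the function names are hypothetical.

```python
from fractions import Fraction


def dpa_latencies(D, b, r, R, m):
    """Equal deadline partition for a leaky-bucket stream through m rate-latency stages.

    Stage i gets sub-deadline D/m, so its latency is T_i = D/m - b_i/R, and the
    burst passed on to the next stage grows to b_{i+1} = b_i + r * T_i.
    """
    sub_deadline = Fraction(D) / m
    burst, latencies = Fraction(b), []
    for _ in range(m):
        t_i = sub_deadline - burst / R
        if t_i < 0:
            raise ValueError("sub-deadline too short for the accumulated burst")
        latencies.append(t_i)
        burst += r * t_i
    return latencies


def end_to_end_bound(latencies, b, R):
    """Delay bound of the concatenated stages, paying the burst once: sum(T_i) + b/R."""
    return sum(latencies) + Fraction(b) / R


if __name__ == "__main__":
    lat = dpa_latencies(20, 5, Fraction(1, 2), 1, 2)
    print(lat, end_to_end_bound(lat, 5, 1))  # T1 = 5, T2 = 5/2, bound 25/2 = 12.5
```

For D = 20, b = 5, r = 0.5, R = 1 and m = 2, this reproduces $T_1 = 5$, $T_2 = 2.5$ and a delay bound of 12.5; with m = 3 the bound shrinks further to 95/12 ≈ 7.9, illustrating how the pessimism accumulates with the number of stages.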
The pessimism comes from paying the burst $b/R_2$ again at the second stage of the pipeline, as well as the additional delay $r\cdot T_1/R_2$ inherited from the first stage, as the pay-burst-only-once principle points out. These effects accumulate at every stage of the pipeline, leading to even more pessimistic results as the number of pipeline stages increases. In addition, computing the resource demand of each stage requires the output arrival curve of the previous stage. Computing this output curve requires numerical min-plus convolution, which incurs considerable computational and memory overheads. In conclusion, the strategy based on partitioning the end-to-end deadline is not a viable approach, in particular for pipelined systems with many stages.
On the other hand, one can first derive the total service demand $\beta^{Tl}$ of the entire pipeline, in this case with delay $T = D - b/\min(R_1, R_2) = 15$. Any partition of this T results in smaller but valid service curves for the individual pipeline stages, since we can always retrieve the original end-to-end deadline by means of the pay-burst-only-once principle. For example, with an equal partition of T, both $T_1$ and $T_2$ are 7.5, and D is still preserved. This is the basic idea of our approach, which is presented in the next section.
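The contrasting computation, deriving one total latency budget first and splitting it afterwards, is sketched below under the same leaky-bucket and rate-latency assumptions. It relies on the standard bound $T + b/\min_i R_i$ for a leaky-bucket stream with $r \le \min_i R_i$ through concatenated rate-latency stages; the function names and the split shares are hypothetical.

```python
from fractions import Fraction


def total_latency_budget(D, b, rates):
    """Largest total latency T with T + b / min(R_i) <= D (burst paid only once)."""
    return Fraction(D) - Fraction(b) / min(rates)


def split_and_check(D, b, rates, shares):
    """Split T by the given shares and recompute the end-to-end delay bound."""
    if sum(shares) != 1:
        raise ValueError("shares must sum to 1")
    total = total_latency_budget(D, b, rates)
    latencies = [total * Fraction(s) for s in shares]
    delay = sum(latencies) + Fraction(b) / min(rates)
    return latencies, delay


if __name__ == "__main__":
    print(split_and_check(20, 5, [1, 1], [Fraction(1, 2), Fraction(1, 2)]))
```

With D = 20, b = 5 and $R_1 = R_2 = 1$, the budget is T = 15, and an equal split gives $T_1 = T_2 = 7.5$ while the end-to-end bound stays exactly at D = 20; any other split of the same T preserves D as well.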
V. PROPOSED APPROACH
Our approach lies in an inverse use of the pay-burst-only-once principle, as mentioned in the previous section. Rather than directly partitioning the end-to-end deadline, we compute one service curve for the entire pipeline, which serves as a constraint on the minimal resource demand. The energy-minimization problem is then formulated with respect to the resource demands of the individual pipeline stages. To solve this minimization problem, the formulation is transformed into a quadratic programming form and solved by a two-phase heuristic.
Without loss of generality, a pipelined system with m heterogeneous stages (m ≥ 2) is considered. The processor of the i-th stage provides the minimal service $\beta_i^{Gl}$. Since periodic power management is considered, the minimal service $\beta_i^{Gl}$ can be modeled by a $(T_{on}^i, T_{off}^i)$ pair:

$$\beta_i^{Gl}(\Delta) = \left(T_{on}^i \left\lceil \frac{\Delta - T_{off}^i}{T_{on}^i + T_{off}^i} \right\rceil\right) \otimes \Delta \qquad (4)$$

where ⊗ denotes min-plus convolution [7]. In addition, to obtain a tighter lower bound on the service curve of the entire pipeline, we restrict $T_{on}^i$ to be a multiple of the worst-case execution time $c_i$, i.e., $T_{on}^i = n_i c_i$, $n_i \in \mathbb{N}^+$.
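The periodic service curve (4) and the linear bound stated in Lemma 1 below can be compared numerically. This Python sketch assumes integer time units and integer $T_{on}^i = n_i c_i$ and $T_{off}^i$; in that case the infimum of the min-plus convolution in (4) is attained at an integer point (the end of a sleep phase or Δ itself), so scanning integer s is exact. The function names and parameter values are hypothetical, and the grid check illustrates rather than replaces the proof.

```python
from fractions import Fraction


def periodic_service(delta: int, t_on: int, t_off: int) -> int:
    """Eq. (4): staircase T_on * ceil((s - T_off) / T) min-plus convolved with s -> s."""
    period = t_on + t_off
    best = None
    for s in range(delta + 1):
        steps = max(0, -((t_off - s) // period))     # ceil((s - t_off) / period)
        value = t_on * steps + (delta - s)           # staircase(s) + identity(delta - s)
        best = value if best is None else min(best, value)
    return best


def lemma1_holds(t_on: int, t_off: int, wcet: int, horizon: int) -> bool:
    """Check floor(beta / c) >= (K / c) * (delta - T_off - c) for delta = 0..horizon."""
    if t_on % wcet:
        raise ValueError("T_on must be a multiple of the WCET c")
    k = Fraction(t_on, t_on + t_off)
    return all(periodic_service(d, t_on, t_off) // wcet >= k / wcet * (d - t_off - wcet)
               for d in range(horizon + 1))


if __name__ == "__main__":
    print([periodic_service(d, 6, 4) for d in (3, 7, 10, 14, 20)], lemma1_holds(6, 4, 2, 120))
```

For $T_{on} = 6$, $T_{off} = 4$ and c = 2, the curve stays at 0 up to Δ = 4, rises to 6 at Δ = 10, and stays flat until Δ = 14, as expected for a worst case that starts with a full sleep phase.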
A. Problem Formulation
Before presenting the formulation, we first state a few preliminaries. By defining $K_i = \frac{T_{on}^i}{T_{on}^i + T_{off}^i}$, we have the following two lemmas.
Lem. 1: $\bar\beta_i^{Gl}(\Delta) \ge \frac{K_i}{c_i}\left(\Delta - T_{off}^i - c_i\right)$.
Proof:

$$
\begin{aligned}
\bar\beta_i^{Gl}(\Delta) &\ge \left\lfloor \frac{T_{on}^i}{c_i} \left\lceil \frac{\Delta - T_{off}^i}{T_{on}^i + T_{off}^i} \right\rceil \right\rfloor \otimes \left\lfloor \frac{\Delta}{c_i} \right\rfloor \\
&\ge n_i \left(\frac{\Delta - T_{off}^i}{T_{on}^i + T_{off}^i}\right) \otimes \frac{1}{c_i}\left(\Delta - c_i\right) \\
&\ge \frac{K_i}{c_i}\left(\Delta - T_{off}^i - c_i\right)
\end{aligned}
$$

where the second step uses $T_{on}^i = n_i c_i$, $\lceil y \rceil \ge y$ and $\lfloor y/c_i \rfloor \ge (y - c_i)/c_i$, and the last step uses $n_i/(T_{on}^i + T_{off}^i) = K_i/c_i$ together with $K_i \le 1$.

Lem. 2: $\bigotimes_{i=1}^{m} \bar\beta_i^{Gl} \ge \min_{i=1}^{m}\left(\frac{K_i}{c_i}\right)\left(\Delta - \sum_{i=1}^{m}\left(T_{off}^i + c_i\right)\right)$.
Proof: It can be directly derived from the definition of min-plus convolution [7] and Lem. 1.

With Lem. 2, we state the following theorem.
Thm. 1: Assume that an event stream modeled with arrival curve α is processed by an m-stage pipeline and that the lower service curve of each pipeline stage is defined by a $(T_{on}^i, T_{off}^i)$ pair. Then the pipelined system satisfies an end-to-end deadline D if the following condition holds:

$$\min_{i=1}^{m}\left(\frac{K_i}{c_i}\right)\left(\Delta - \sum_{i=1}^{m}\left(T_{off}^i + c_i\right)\right) \ge \alpha^u(\Delta - D) \qquad (5)$$

Proof: In Lem. 2, the right-hand side of the inequality is a lower bound on $\bigotimes_{i=1}^{m}\bar\beta_i^{Gl}$, which is the concatenated service curve of the pipeline. With $\bigotimes_{i=1}^{m}\bar\beta_i^{Gl} \ge \alpha^u(\Delta - D)$, the end-to-end delay of the pipeline is no more than D, according to the pay-burst-only-once principle. Therefore, the theorem holds.

The left-hand side of inequality (5) can be considered as a bounded-delay function $bdf(\Delta, \rho_0, b_0) = \max(0, \rho_0(\Delta - b_0))$ with slope $\rho_0 = \min_{i=1}^{m}(K_i/c_i)$ and bounded delay $b_0 = \sum_{i=1}^{m}(T_{off}^i + c_i)$. For the stream S with deadline D, a set of minimum bounded-delay functions $bdf_{min}(\Delta, \rho, b)$ can be derived by varying b (see Section V-B). We therefore seek a solution $[\vec K, \vec T_{off}]$ such that the resulting bounded-delay function $bdf(\Delta, \rho_0, b_0)$ is no less than a minimum bounded-delay function $bdf_{min}(\Delta, \rho, b)$. Accordingly, we formulate our optimization problem as follows:

$$
\begin{aligned}
\underset{\vec K,\, \vec T_{off}}{\text{minimize}}\quad & P(\vec K, \vec T_{off}) \\
\text{subject to}\quad & \min_{i=1}^{m}\left(\frac{K_i}{c_i}\right) \ge \rho \\
& \sum_{i=1}^{m}\left(T_{off}^i + c_i\right) \le b \\
& 0 \le K_i \le 1, \quad i = 1, \ldots, m \\
& T_{off}^i \ge 0, \quad i = 1, \ldots, m
\end{aligned} \qquad (6)
$$

where $\vec K = [K_1 \ldots K_m]$. $P(\vec K, \vec T_{off})$ is obtained by applying the transformation $K_i = \frac{T_{on}^i}{T_{on}^i + T_{off}^i}$ to (3):

$$P(\vec K, \vec T_{off}) = \sum_{i=1}^{m}\left(\frac{E_{sw}^i\,(1 - K_i)}{T_{off}^i} + (P_s^i - P_\sigma^i)\,K_i\right)$$

The advantage of formulation (6) is two-fold. First, the service curves of the individual pipeline stages are the variables of the optimization problem, which on the one hand overcomes the problem of paying the burst multiple times and on the other hand avoids the costly ⊗ computation during the optimization. Second, this formulation allows us to use a more efficient method to analyze the problem, as presented in the following sections.

B. Quadratic Programming Transformation
How to solve the minimization problem (6) is not obvious.
The constraint values b and ρ are in fact not fixed. In addition, these two values are correlated. For a fixed b, the minimum bounded-delay function $bdf_{min}(\Delta, \rho, b)$ can be determined by computing ρ:

$$\rho = \inf\left\{\rho : bdf(\Delta, \rho, b) \ge \alpha^u(\Delta - D),\ \forall \Delta \ge 0\right\} \qquad (7)$$
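For a concrete arrival curve, (7) reduces to a finite maximization. The sketch below assumes the PJD upper curve without the minimum-distance term, $\alpha^u(x) = \lceil (x + J)/P \rceil$ for x > 0, integer time units, and b < D. Because $\alpha^u(\Delta - D)/(\Delta - b)$ decreases along every step of the staircase, the infimum in (7) is the largest ratio taken at the step starts, together with the long-run limit 1/P; the function name is hypothetical.

```python
from fractions import Fraction


def min_slope(b, deadline, period, jitter):
    """rho of Eq. (7) for alpha_u(x) = ceil((x + J) / P), x > 0 (minimum distance ignored).

    Step n of alpha_u starts just after x = max(0, (n - 1) * P - J); on each step
    alpha_u(delta - D) / (delta - b) decreases, so the supremum is the largest
    ratio n / (start_n + D - b) at a step start, or the long-run rate 1 / P.
    """
    if b >= deadline:
        raise ValueError("sketch assumes b < D")
    n_first = jitter // period + 1           # steps already active at x = 0+
    candidates = [Fraction(1, period)]       # covers the monotone tail n > n_first + 1
    for n in range(1, n_first + 2):
        start = max(0, (n - 1) * period - jitter)
        candidates.append(Fraction(n, start + deadline - b))
    return max(candidates)
```

For example, with P = 300 ms, J = 840 ms, D = 600 ms and b = 100 ms, the binding step is the fourth event, giving ρ = 1/140 events per ms; without jitter and with b = 200 ms, the long-run rate 1/300 dominates.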
In this paper, we conduct the optimization by varying b and computing ρ for every possible b. For a fixed b, we can transform (6) into a quadratic programming problem with box constraints (QPB), as stated in the following lemma.
Lem. 3: For a fixed b, the minimization problem in (6) can be transformed into the following quadratic programming problem with box constraints:

$$
\begin{aligned}
\underset{\vec x = [x_1 \ldots x_m]}{\text{minimize}}\quad & \vec x^{T} Q\, \vec x \\
\text{subject to}\quad & 0 \le x_i \le \sqrt{E_{sw}^i\,(1 - \rho\, c_i)}, \quad i = 1, \ldots, m
\end{aligned} \qquad (8)
$$

where $Q = A - B$, A is the $m \times m$ matrix of ones, and B is the $m \times m$ diagonal matrix whose i-th diagonal element is $\frac{(b - \sum_{j=1}^{m} c_j)(P_s^i - P_\sigma^i)}{E_{sw}^i}$. Denoting by $\vec x^*$ the optimal solution of the QPB problem in (8), the optimal solution of (6) is obtained with $K_i = 1 - \frac{(x_i^*)^2}{E_{sw}^i}$ and $T_{off}^i = \frac{x_i^*}{\sum_{j=1}^{m} x_j^*}\left(b - \sum_{j=1}^{m} c_j\right)$.
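Proof: By the Cauchy-Bunyakovsky-Schwarz inequality, we get

$$\sum_{i=1}^{m}\frac{E_{sw}^i(1 - K_i)}{T_{off}^i}\cdot\sum_{i=1}^{m}T_{off}^i \ge \left(\sum_{i=1}^{m}\sqrt{E_{sw}^i(1 - K_i)}\right)^2.$$

Since $\sum_{i=1}^{m} T_{off}^i \le b - \sum_{j=1}^{m} c_j$ by the second constraint of (6), the minimum value of $\sum_{i=1}^{m}\frac{E_{sw}^i(1-K_i)}{T_{off}^i}$ is $\frac{(\sum_{i=1}^{m}\sqrt{E_{sw}^i(1-K_i)})^2}{b - \sum_{j=1}^{m} c_j}$, which is attained when the following equation holds:

$$T_{off}^i = \frac{\sqrt{E_{sw}^i(1 - K_i)}}{\sum_{j=1}^{m}\sqrt{E_{sw}^j(1 - K_j)}}\left(b - \sum_{j=1}^{m} c_j\right).$$

Then the optimization problem in (6) can be reformulated as:

$$
\begin{aligned}
\underset{K_1, K_2, \ldots, K_m}{\text{minimize}}\quad & \frac{\left(\sum_{i=1}^{m}\sqrt{E_{sw}^i(1-K_i)}\right)^2}{b - \sum_{j=1}^{m} c_j} + \sum_{i=1}^{m}(P_s^i - P_\sigma^i)K_i \\
\text{subject to}\quad & \rho\, c_i \le K_i \le 1, \quad i = 1, \ldots, m
\end{aligned}
$$

By defining $x_i = \sqrt{E_{sw}^i(1 - K_i)}$, the objective becomes $\vec x^T Q \vec x/(b - \sum_{j=1}^{m} c_j) + \sum_{i=1}^{m}(P_s^i - P_\sigma^i)$, i.e., a positive multiple of $\vec x^T Q \vec x$ plus a constant, and the constraints become the box constraints of (8). Hence formulation (6) can be transformed into the QPB problem in (8).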
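The mapping in Lem. 3 can be exercised directly. The sketch below builds Q = A − B and the box bounds of (8) for a fixed b, maps a point $\vec x$ back to $(\vec K, \vec T_{off})$, and evaluates $P(\vec K, \vec T_{off})$. By the proof above, the result equals $\vec x^T Q \vec x/(b - \sum_j c_j) + \sum_i (P_s^i - P_\sigma^i)$, so minimizing (8) minimizes (6). Plain Python lists are used, no QPB solver is included, and all names and numbers are hypothetical.

```python
import math


def build_qpb(b, wcet, e_sw, p_s, p_sigma, rho):
    """Q = A - B and the box upper bounds of problem (8) for a fixed b."""
    m, slack = len(wcet), b - sum(wcet)                        # slack = b - sum_j c_j
    q = [[1.0] * m for _ in range(m)]                          # A: all ones
    for i in range(m):
        q[i][i] -= slack * (p_s[i] - p_sigma[i]) / e_sw[i]     # subtract B_ii
    upper = [math.sqrt(e_sw[i] * (1.0 - rho * wcet[i])) for i in range(m)]
    return q, upper


def recover(x, b, wcet, e_sw):
    """K_i = 1 - x_i^2 / E_sw^i and T_off^i = x_i / sum(x) * (b - sum c_j), as in Lem. 3."""
    slack, total = b - sum(wcet), sum(x)
    k = [1.0 - xi * xi / e for xi, e in zip(x, e_sw)]
    t_off = [slack * xi / total if total > 0 else 0.0 for xi in x]
    return k, t_off


def idle_power_k(k, t_off, e_sw, p_s, p_sigma):
    """P(K, T_off) of (6); a stage with T_off = 0 is always on and never switches."""
    return sum((e * (1.0 - ki) / ti if ti > 0 else 0.0) + (ps - pq) * ki
               for ki, ti, e, ps, pq in zip(k, t_off, e_sw, p_s, p_sigma))


def quad_form(q, x):
    """x^T Q x."""
    return sum(q[i][j] * x[i] * x[j] for i in range(len(x)) for j in range(len(x)))
```

Note that Q is generally indefinite here (its diagonal can be negative), which is why the general non-convex QPB machinery discussed in Section V-C is needed.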
Note that there is a feasible region for b. To guarantee that all the resulting $T_{off}^i \ge 0$, the bounded delay b must not be less than $\sum_{i=1}^{m} c_i$. According to (5) and $K_i \le 1$, the slope ρ of the bounded-delay function cannot exceed $\frac{1}{\max_{i=1}^{m} c_i}$. Correspondingly, we derive the minimum bounded-delay function $bdf_{min}(\Delta, \frac{1}{\max_{i=1}^{m} c_i}, b)$. By inverting (7), we can derive the maximum delay $b^u$ by (9), which guarantees that none of the resulting $K_i$ exceeds 1. In summary, the feasible region $b \in [b^l, b^u]$ is bounded as follows:

$$
\begin{aligned}
b^u &= \sup\left\{d : bdf\left(\Delta, \frac{1}{\max_{i=1}^{m} c_i}, d\right) \ge \alpha^u(\Delta - D),\ \forall \Delta \ge 0\right\} \\
b^l &= \sum_{i=1}^{m} c_i
\end{aligned} \qquad (9)
$$
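Under the same PJD assumptions (integer time units, minimum distance ignored), the upper end $b^u$ of (9) can also be computed exactly: $(\Delta - d)/\max_i c_i \ge \alpha^u(\Delta - D)$ holds for all Δ if and only if d does not exceed $D + x_n - n\max_i c_i$ at the start $x_n$ of every step n of $\alpha^u$. The sketch assumes $P > \max_i c_i$, so that only the first few steps can be binding; the function name is hypothetical.

```python
def feasible_b_range(wcet, deadline, period, jitter):
    """Hypothetical helper: [b^l, b^u] of Eq. (9) for a PJD stream (minimum distance ignored).

    b^u = sup{d : (delta - d) / c_max >= alpha_u(delta - D) for all delta}
        = min over steps n of alpha_u of (D + start_n - c_max * n),
    where step n of alpha_u(x) = ceil((x + J) / P) starts just after
    start_n = max(0, (n - 1) * P - J).
    """
    c_max = max(wcet)
    if period <= c_max:
        raise ValueError("sketch assumes P > max c_i")
    n_first = jitter // period + 1       # steps already active at x = 0+
    # Beyond n_first + 1 the expression grows by P - c_max > 0 per step.
    b_up = min(deadline + max(0, (n - 1) * period - jitter) - c_max * n
               for n in range(1, n_first + 2))
    return sum(wcet), b_up
```

For WCETs of 50 ms and 80 ms, D = 600 ms and P = 300 ms, this gives [130, 520] without jitter and [130, 340] with J = 840 ms.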
C. Two-Phase Heuristic
With the above information, we can now present the overall algorithm for the energy-minimization problem defined in Section III-C. Basically, the bounded delay b is scanned with step ε within the range $[b^l, b^u]$. For each b, we first solve the sub-problem (8) with a QPB solver. Then, the obtained solution is repaired to fulfill further constraints (explained below). The pseudocode of the algorithm is given in Algo. 1.
Algorithm 1 PBOOA
Input: $\alpha^u$, $b^l$, $b^u$, ε, and $P_{min} = \infty$
Output: $\vec K_{opt}$, $\vec T_{off,opt}$
1: for $b = b^l$ to $b^u$ with step ε do
2:   compute ρ by Eqn. (7);
3:   obtain $\vec K$ and $\vec T_{off}$ by solving (8);
4:   repair $\vec K$ and $\vec T_{off}$;
5:   if $P(\vec K, \vec T_{off}) < P_{min}$ then
6:     $\vec K_{opt} \leftarrow \vec K$; $\vec T_{off,opt} \leftarrow \vec T_{off}$;
7:     $P_{min} \leftarrow P(\vec K_{opt}, \vec T_{off,opt})$;
8:   end if
9: end for
To solve the sub-problem (Line 3 in Algo. 1), we apply an existing QPB solver. According to [2], when Q is positive semi-definite, QPB is solvable in polynomial time. Otherwise, QPB is a non-convex quadratic programming problem, which is NP-hard. Nevertheless, there are approximation schemes [3] that can efficiently solve non-convex QPB, and many excellent off-the-shelf software packages [2] are available. In this paper, the state-of-the-art finite B&B algorithm [2] is applied to solve our QPB problem.
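To show how the pieces of Algorithm 1 fit together, the sketch below scans b, computes ρ, solves (8), maps the solution back with Lem. 3, and keeps the best $P(\vec K, \vec T_{off})$. It is not the paper's implementation: it assumes a leaky-bucket arrival curve $\alpha^u(x) = b_0 + r\,x$ (for which (7) gives $\rho = \max(b_0/(D - b), r)$ whenever b < D), replaces the finite B&B solver of [2] with a brute-force grid search that is only practical for very few stages, and omits the repair of Line 4. The function name and all parameter values are hypothetical.

```python
import itertools
import math


def pbooa(wcet, e_sw, p_s, p_sigma, burst, rate, deadline, b_lo, b_hi, eps, grid=41):
    """Hypothetical sketch of Algo. 1 without the repair step of Line 4.

    Assumes alpha_u(x) = burst + rate * x, so Eq. (7) gives
    rho = max(burst / (D - b), rate), and solves (8) by grid search
    (a stand-in for the finite B&B solver, only usable for small m).
    Returns (P_min, b, K, T_off).
    """
    m = len(wcet)
    best = (math.inf, None, None, None)
    for s in range(int(round((b_hi - b_lo) / eps)) + 1):      # Line 1
        b = b_lo + s * eps
        slack = b - sum(wcet)                                  # b - sum_j c_j
        if slack <= 0 or b >= deadline:
            continue
        rho = max(burst / (deadline - b), rate)                # Line 2
        if rho * max(wcet) > 1:
            continue                                           # some K_i would exceed 1
        upper = [math.sqrt(e_sw[i] * (1 - rho * wcet[i])) for i in range(m)]
        diag = [slack * (p_s[i] - p_sigma[i]) / e_sw[i] for i in range(m)]
        axes = [[u * g / (grid - 1) for g in range(grid)] for u in upper]
        # Line 3: minimize x^T (A - B) x over the box by brute force.
        x = min(itertools.product(*axes),
                key=lambda v: sum(v) ** 2 - sum(d * vi * vi for d, vi in zip(diag, v)))
        total = sum(x)
        k = [1 - xi * xi / e for xi, e in zip(x, e_sw)]        # Lem. 3 mapping
        t_off = [slack * xi / total if total > 0 else 0.0 for xi in x]
        power = sum((e * (1 - ki) / ti if ti > 0 else 0.0) + (ps - pq) * ki
                    for e, ki, ti, ps, pq in zip(e_sw, k, t_off, p_s, p_sigma))
        if power < best[0]:                                    # Lines 5-7
            best = (power, b, k, t_off)
    return best
```

A grid of g points per stage costs $g^m$ objective evaluations for each b, which is exactly why a dedicated QPB solver is needed once m grows.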
Algorithm 2 Repair Scheme
Input: solution of the QPB problem: $[\vec K, \vec T_{off}]$
Output: $[\vec K', \vec T'_{off}]$
1: compute the stage set $S_1 = \{p_i \mid T_{off}^i < t_{sw}^i\}$;
2: repair $[K', T'_{off}]$ of each stage $p \in S_1$ as [1, 0];
3: compute the loss $Q = \sum_{p_i \in S_1} T_{off}^i$;
4: reassign Q to the stage p with maximum power savings;
5: compute $T_{on}$ and the stage set $S_2 = \{p_i \mid T_{off}^i \ge t_{sw}^i\}$;
6: for each stage $p \in S_2$ do
7:   if $T_{on} < c$ then
8:     $T'_{on} \leftarrow c$; $T'_{off} \leftarrow T_{off}$;
9:   else
10:    $T'_{on} \leftarrow \lfloor T_{on}/c \rfloor\, c$; $T'_{off} \leftarrow T'_{on}/K - T'_{on}$;
11:    if $T'_{off} < t_{sw}$ then
12:      $T'_{on} \leftarrow \lceil T_{on}/c \rceil\, c$; $T'_{off} \leftarrow T_{off}$;
13:    end if
14:  end if
15: end for
After obtaining a pair of $\vec K$ and $\vec T_{off}$, the repair phase (Line 4 in Algo. 1) is conducted to fulfill further constraints. This repair scheme is presented in Algo. 2. First, the resulting $T_{off}^i$ of pipeline stage i may be smaller than $t_{sw}^i$. In the case that $T_{off}^i < t_{sw}^i$, turning off the processor of stage i is not possible. Therefore, the solution for stage i is repaired to $[K_i', T_{off}^{i\prime}] = [1, 0]$, i.e., stage i is on all the time (Line 2 in Algo. 2). However, this repair step leads to a loss of sleep time Q (Line 3 in Algo. 2). We try assigning the loss Q to each stage by $T_{off} = T_{off} + Q$ and compute the resulting power savings compared with the previous solution. Then, Q is assigned to the stage with the maximum power saving (Line 4 in Algo. 2). Second, the resulting $T_{on}^i$ may not be a multiple of $c_i$, which is one of our basic requirements. The repair steps in Lines 5 to 15 of Algo. 2 make $T_{on}^i$ a multiple of $c_i$. It is worth noting that the repair phase still guarantees that the repaired solution satisfies the constraints.
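The repair steps of Algo. 2 can be sketched as follows. The text does not state how $K_j$ behaves when the lost sleep time Q is added to a stage; this sketch assumes $K_j$ is kept fixed (so $T_{on}^j$ grows with $T_{off}^j$), which preserves both constraints of (6). It also assumes per-stage switching times $t_{sw}^i$; the function names are hypothetical.

```python
import math


def stage_power(k, t_off, e_sw, p_s, p_sigma):
    """Per-stage term of P(K, T_off); an always-on stage (T_off = 0) never switches."""
    switching = e_sw * (1.0 - k) / t_off if t_off > 0 else 0.0
    return switching + (p_s - p_sigma) * k


def repair(k, t_off, wcet, t_sw, e_sw, p_s, p_sigma):
    """Hypothetical sketch of Algo. 2; returns repaired copies of K and T_off."""
    k, t_off = list(k), list(t_off)
    m = len(k)
    s1 = [i for i in range(m) if t_off[i] < t_sw[i]]          # Line 1
    loss = sum(t_off[i] for i in s1)                           # Line 3: lost sleep time Q
    for i in s1:                                               # Line 2: always on
        k[i], t_off[i] = 1.0, 0.0
    s2 = [i for i in range(m) if i not in s1]

    def saving(j):                                             # K_j kept fixed (assumption)
        old = stage_power(k[j], t_off[j], e_sw[j], p_s[j], p_sigma[j])
        new = stage_power(k[j], t_off[j] + loss, e_sw[j], p_s[j], p_sigma[j])
        return old - new

    if loss > 0 and s2:                                        # Line 4
        t_off[max(s2, key=saving)] += loss
    for i in s2:                                               # Lines 5-15
        if k[i] >= 1.0:
            continue
        t_on, c = k[i] * t_off[i] / (1.0 - k[i]), wcet[i]
        if t_on < c:
            new_on, new_off = c, t_off[i]
        else:
            new_on = math.floor(t_on / c) * c
            new_off = new_on / k[i] - new_on                   # keeps K_i unchanged
            if new_off < t_sw[i]:
                new_on, new_off = math.ceil(t_on / c) * c, t_off[i]
        k[i], t_off[i] = new_on / (new_on + new_off), new_off
    return k, t_off
```

Rounding $T_{on}$ down while keeping $K_i$ shortens $T_{off}$, and rounding up while keeping $T_{off}$ raises $K_i$; neither change increases $\sum_i (T_{off}^i + c_i)$ or lowers $\min_i K_i/c_i$, which is one way to see why the repaired solution remains feasible.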
VI. PERFORMANCE EVALUATION
In this section, we demonstrate the effectiveness of our approach. We compare our approach (PBOOA) with a deadline partition approach (DPA), which partitions the end-to-end deadline into sub-deadlines for individual pipeline stages and minimizes the energy consumption of each pipeline stage using the scheme in [4]. The simulation is implemented in Matlab using the RTC toolbox [13], and the finite B&B algorithm [2] is used to solve the QPB problems. All results are obtained on a 2.83 GHz processor with 4 GB memory.
A. Simulation Setup
The H.263 decoder shown in Fig. 1(a) is used as the test application. The execution time of each subtask of the H.263 decoder can be found in [9]. The event stream is specified by the PJD (period, jitter, minimum inter-arrival distance) model. The activation period of the application is 300 ms with an end-to-end constraint of 600 ms. For the processors of the pipeline architecture, we consider the Marvell PXA270 processor and take its power profile from [11]. The standby power $P_s$ and sleep power $P_\sigma$ are 0.260 W and 0.0154 W, respectively, and the switching time overhead $t_{sw}$ and energy overhead $E_{sw}$ are 0.067 s and 10.19 mJ, respectively.
B. Simulation Result
We first evaluate how the power consumption of the two approaches changes as the jitter varies. Cases of 2-stage and 3-stage pipeline architectures with homogeneous PXA270 processors are evaluated. We vary the jitter of the stream from 0 to 840 ms. The simulation results are shown in Fig. 3. From the figures, we can make the following observations: (1) PBOOA always outperforms DPA for both pipeline architectures. On average, PBOOA achieves 16.8% and 20.09% normalized power savings w.r.t. DPA on the 2-stage and 3-stage pipeline architectures, respectively; (2) PBOOA achieves more power savings on the 3-stage pipeline than on the 2-stage pipeline for the different jitter settings. The reason is that DPA on the 3-stage pipeline pays the burst more times than on the 2-stage platform, which allows PBOOA to achieve more power savings on the 3-stage pipeline; (3) the power consumption of both approaches increases as the jitter increases, since a larger jitter requires a longer $T_{on}$ to guarantee the worst-case end-to-end deadline.

Fig. 3. Power consumption of PBOOA and DPA on 2-stage and 3-stage homogeneous pipelined systems with varying jitter: (a) 2-stage: (PD1, deQ)→P1, (IDCT, MC)→P2; (b) 3-stage: (PD1, deQ)→P1, IDCT→P2, MC→P3. Each panel plots the average idle power (Watt) of DPA and PBOOA and the normalized power savings (NPS) against the jitter (msec).
Fig. 4. Computation time and power consumption for heterogeneous pipelined systems: power (Watt) and computation time (min) of PBOOA and DPA versus the number of pipeline stages.
Second, we demonstrate the scalability of our approach. We test our approach on heterogeneous pipelines with up to 9 stages and a jitter of 300 ms. The power profiles of the processors are randomly generated, with ranges set according to the PXA270 processor. Fig. 4 shows the power consumption and computation overhead for the different pipelines. From this figure, we can make the following observations: (1) The DPA approach is time-consuming. For the 3-stage pipeline, DPA takes almost four hours, which is 65 times longer than PBOOA on the same pipeline. In addition, the 4-stage case needs 15 times more computing time than the 3-stage case. When the number of cores exceeds 4, the deadline partition approach fails to provide a result within the time budget. (2) PBOOA is considerably faster. The 2-stage case takes about three minutes. Even for the 9-stage pipeline, PBOOA needs only 2.5 times more computing time than for the 2-stage case.
VII. CONCLUSION
This paper presents a new approach to minimize the energy consumption of pipelined systems. Our approach can tackle streaming applications with non-deterministic workload arrivals under hard real-time constraints. It not only guarantees the original end-to-end deadline requirement but also retrieves the pay-burst-only-once phenomenon, resulting in a significant reduction in both energy consumption and computing overhead. Moreover, our approach is scalable with respect to the number of pipeline stages. As future work, it would be interesting to combine our approach with the mapping of the application onto the pipeline.
ACKNOWLEDGMENT
This work has been partly funded by German BMBF
projects ECU (grant number: 13N11936) and Car2X (grant
number: 13N11933).
REFERENCES
[1] S. Carta, A. Alimonda, A. Pisano, A. Acquaviva, and L. Benini. A control theoretic approach to energy-efficient pipelined computation in MPSoCs. ACM Transactions on Embedded Computing Systems, 2007.
[2] J. Chen and S. Burer. Globally solving nonconvex quadratic programming problems via completely positive programming. Mathematical Programming Computation, 2012.
[3] M. Fu, Z.-Q. Luo, and Y. Ye. Approximation algorithms for quadratic programming. Journal of Combinatorial Optimization, 1998.
[4] K. Huang, L. Santinelli, J.-J. Chen, L. Thiele, and G. Buttazzo. Periodic power management schemes for real-time event streams. In CDC, 2009.
[5] K. Huang, L. Santinelli, J.-J. Chen, L. Thiele, and G. Buttazzo. Applying real-time interface and calculus for dynamic power management in hard real-time systems. Real-Time Systems, 2011.
[6] H. Javaid, M. Shafique, S. Parameswaran, and J. Henkel. Low-power adaptive pipelined MPSoCs for multimedia: An H.264 video encoder case study. In DAC, 2011.
[7] J.-Y. Le Boudec and P. Thiran. Network Calculus: A Theory of Deterministic Queuing Systems for the Internet. Springer, 2001.
[8] A. Maxiaguine, S. Chakraborty, and L. Thiele. DVS for buffer-constrained architectures with predictable QoS-energy tradeoffs. In CODES+ISSS, 2005.
[9] H. Oh and S. Ha. Hardware-software cosynthesis of multi-mode multi-task embedded systems with real-time constraints. In CODES+ISSS, 2002.
[10] S. Perathoner, K. Lampka, N. Stoimenov, L. Thiele, and J.-J. Chen. Combining optimistic and pessimistic DVS scheduling: An adaptive scheme and analysis. In ICCAD, 2010.
[11] Marvell PXA270. http://www.marvell.com/application-processors.
[12] K. Srinivasan and K. S. Chatha. Integer linear programming and heuristic techniques for system-level low power scheduling on multiprocessor architectures under throughput constraints. Integration, the VLSI Journal, 2007.
[13] E. Wandeler and L. Thiele. Real-Time Calculus (RTC) Toolbox. http://www.mpa.ethz.ch/Rtctoolbox, 2006.
[14] R. Xu, R. Melhem, and D. Mosse. Energy-aware scheduling for streaming applications on chip multiprocessors. In RTSS, 2007.
[15] Y. Yu and V. Prasanna. Power-aware resource allocation for independent tasks in heterogeneous real-time systems. In ICPADS, 2002.