Applying Pay-Burst-Only-Once Principle for Periodic Power Management in Hard Real-Time Pipelined Multiprocessor Systems
GANG CHEN, Technische Universität Muenchen
KAI HUANG, Technische Universität Muenchen and Sun Yat-sen University
CHRISTIAN BUCKL, Fortiss GmbH
ALOIS KNOLL, Technische Universität Muenchen
Pipelined computing is a promising paradigm for embedded system design. Designing a power management policy that reduces the power consumption of a pipelined system with a nondeterministic workload is, however, nontrivial. In this article, we study the problem of energy minimization for coarse-grained pipelined systems under hard real-time constraints and propose new approaches based on an inverse use of the pay-burst-only-once principle. We formulate the problem by means of the resource demands of individual pipeline stages and propose two new approaches, a quadratic-programming-based approach and a fast heuristic, to solve it. In the quadratic programming approach, the problem is transformed into a standard quadratic program with box constraints and then solved by a standard quadratic programming solver. Observing that the problem is NP-hard, we design the fast heuristic to solve the problem more efficiently. Our approaches are scalable with respect to the number of pipeline stages. Simulation results using real-life applications are presented to demonstrate the effectiveness of our methods.
Categories and Subject Descriptors: C.3 [Special-Purpose and Application-Based Systems]: Real-Time and Embedded Systems
General Terms: Algorithms
Additional Key Words and Phrases: Scheduling, energy, pay-burst-only-once, periodic power management,
real-time system
ACM Reference Format:
Gang Chen, Kai Huang, Christian Buckl, and Alois Knoll. 2015. Applying pay-burst-only-once principle for
periodic power management in hard real-time pipelined multiprocessor systems. ACM Trans. Des. Autom.
Electron. Syst. 20, 2, Article 26 (February 2015), 27 pages.
DOI: http://dx.doi.org/10.1145/2699865
1. INTRODUCTION
With increasing requirements for high performance, multicore architectures are believed to be the major solution for future embedded systems. Many real-time applications, especially streaming applications, can be executed on multiple processors simultaneously to achieve parallel processing. When real-time applications are executed
A preliminary version of a portion of this article appears in Proceedings of the Conference on Design Automation and Test in Europe 2013.
This work has been partly supported by China Scholarship Council, German BMBF project ECU (grant no.
13N11936) and Car2X (grant no. 13N11933).
The authors would like to thank their sponsors for their support.
Authors’ addresses: G. Chen, Department of Informatics, Technische Universität Muenchen (TUM),
Boltzmannstraße 3, 85748, Garching bei München, Germany; K. Huang (corresponding author), School of Mobile Information Engineering, Sun Yat-sen University, Zhu Hai, China; email: [email protected];
C. Buckl, Fortiss GmbH, Germany; A. Knoll, Department of Informatics, Technische Universität Muenchen
(TUM), Boltzmannstraße 3, 85748, Garching bei München, Germany.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by
others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to
post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions
from [email protected]
© 2015 ACM 1084-4309/2015/02-ART26 $15.00
DOI: http://dx.doi.org/10.1145/2699865
ACM Transactions on Design Automation of Electronic Systems, Vol. 20, No. 2, Article 26, Pub. date: February 2015.
on multicore architectures powered by batteries, minimizing the energy consumption
is one of the major design goals, because an energy-efficient design will increase the
lifetime, increase the reliability, and decrease the heat dissipation of the system.
Pipelined computing is a promising paradigm for embedded system design, which
can, in principle, provide high throughput and low energy consumption [Carta et al.
2007]. For instance, a streaming application can be split into a sequence of functional
blocks that are computed by a pipeline of processors where power-gating techniques
can be applied to achieve energy efficiency.
Performance constraints of a streaming application are usually imposed on two principal metrics, namely throughput and latency. Latency is the main concern for applications such as video/telephone conferencing and automatic pattern recognition, where latency beyond a certain bound cannot be tolerated. In the case of pipelined real-time systems, the latency of a streaming application can be expressed as an end-to-end deadline within which the application must be processed through the pipeline.
Designing a scheduling policy for the pipeline stages under the requirements of both energy efficiency and timing guarantees is, however, nontrivial. In general, energy efficiency and timing guarantees are conflicting objectives: techniques that reduce the energy consumption of the system usually pay the price of longer execution times, and vice versa. Previous work on this topic either requires precise timing information about the system [Yu and Prasanna 2002; Xu et al. 2007] or tackles only soft real-time requirements [Javaid et al. 2011b; Carta et al. 2007]. In practice, however, such precise timing of task arrivals might not be guaranteed. Thus, the previous approaches cannot guarantee the worst-case deadline and cannot be applied to those embedded systems where violating deadlines could be disastrous. In contrast to the preceding work, our work tackles a pipelined event stream with nondeterministic workloads in hard real-time systems through an inverse use of the pay-burst-only-once principle for energy efficiency.
This article studies the energy minimization problem of coarse-grained pipelined
systems under hard real-time requirements. We consider a streaming application that
is split into a sequence of coarse-grained functional blocks which are mapped to a
pipeline architecture for processing. The workload of the streaming application is abstracted as an event stream and the event arrivals of the stream are modeled as the
arrival curves in the interval domain [Le Boudec and Thiran 2001]. The event stream
has an end-to-end deadline requirement, that is, the time by which any event in the
stream travels through the pipeline should be no longer than this required deadline.
The objective is thereby to find those optimal scheduling policies for individual stages
of the pipeline with minimal energy consumption while the deadline requirement of
the event stream is guaranteed.
Intuitively, the problem can be solved by partitioning the end-to-end deadline into sub-deadlines for individual pipeline stages and optimizing the energy consumption based on the partitioned sub-deadlines. However, any partition strategy based on the end-to-end deadline, together with the follow-up optimization, will count the burst of the event stream multiple times, which inevitably overestimates the resource needed for every pipeline stage and leads to poor energy savings. A motivation example in Section 4 demonstrates this drawback in detail. Therefore, a more sophisticated method is needed to tackle this problem.
In this article, we develop a new approach to solve the energy minimization problem for pipelined multiprocessor embedded systems while guaranteeing the worst-case end-to-end delay. The article summarizes and extends the results presented in Chen et al. [2013]. Our idea for solving this problem lies in an inverse use of the well-known pay-burst-only-once principle [Le Boudec and Thiran 2001]. Rather than directly partitioning the
end-to-end deadline, we compute for the entire pipeline one service curve that serves
as a constraint for the minimal resource demand. The energy minimization problem
is then formulated with respect to the individual resource demands of pipeline stages.
To solve this problem, we propose two heuristics, namely a quadratic programming heuristic and a fast heuristic. In the quadratic programming heuristic, the minimization problem is transformed into a standard quadratic programming problem with box constraints and then solved by a standard solver. Observing that the formulated problem is NP-hard, we present a fast heuristic that finds a suboptimal solution by analyzing the properties of the optimal solution, running with complexity O(mn) (where m and n are the number of stages and the number of sample steps, respectively). For simplicity, we consider power-gating energy minimization and use the periodic dynamic power management of Huang et al. [2009b, 2011a] to reduce the leakage power, that is, to periodically turn the processors of the pipeline on and off. In this work, we compute periodic power management schemes offline, and the fixed Ton/Toff values for the processors of every pipeline stage are applied at runtime. With this approach, we can not only guarantee the overall end-to-end deadline requirement but also retrieve the pay-burst-only-once phenomenon, achieving a significant reduction in energy consumption. In addition, our methods are scalable with respect to the number of pipeline stages. The contributions of this article are summarized as follows.
—A new method is developed to solve the energy minimization problem for pipelined
multiprocessor embedded systems by inversely using the pay-burst-only-once
principle.
—A minimization problem is formulated based on the resource demands of the individual stages of the pipeline architecture, and the formulation is transformed into a standard quadratic programming problem with box constraints. The formulated problem is proved to be NP-hard.
—A quadratic programming heuristic is developed to solve the formulated problem, and a formal proof is provided to show the correctness of our approach, that is, the guarantee on the end-to-end deadline requirement.
—A fast heuristic is developed to solve the formulated problem, running with complexity O(mn).
The rest of the article is organized as follows. Section 2 reviews related work in the
literature. Section 3 presents basic models and the definition of the studied problem.
Section 4 presents the motivation example and Section 5 describes the proposed approach. Experimental evaluation is presented in Section 6, and Section 7 concludes.
2. RELATED WORK
Pipelined computing is a promising paradigm for embedded system design, which can
in principle provide high performance and low energy consumption. Pipelined multiprocessor systems are widely applied as a viable platform for high performance implementation of multimedia applications [Shee and Parameswaran 2007; Javaid and
Parameswaran 2009; Shee et al. 2006; Karkowski and Corporaal 1997]. Energy optimization for pipelined multiprocessor systems is an interesting topic where a number
of techniques have been proposed in the literature. Carta et al. [2007] and Alimonda
et al. [2009] proposed a feedback control technique for dynamic voltage/frequency scaling (DVFS) in a pipelined MPSoC architecture with soft real-time constraints, aimed
at minimizing energy consumption with throughput guarantees. Each pipelined processor is associated with a dedicated controller that monitors the occupancy level of the
queues to determine when to increase or decrease the voltage-frequency levels of the
processor. Javaid et al. [2011b] proposed an adaptive pipelined MPSoC architecture
and a runtime balancing approach based on workload prediction to achieve energy
efficiency. The authors in Javaid et al. [2011a] proposed a dynamic power management
scheme for adaptive pipelined MPSoCs. In this work, the duration of idle periods is determined based on future workload prediction and used to select an appropriate power
state for the idle processor. However, the prior approaches assume soft real-time constraints and therefore cannot be applied to hard real-time systems.
There are also methods [Davare et al. 2007; Hong et al. 2011; de Langen and
Juurlink 2006, 2009; Liu et al. 2014; Yu and Prasanna 2002] for hard real-time systems.
To guarantee the end-to-end delay, the authors in Liu et al. [2014] studied the problem of minimizing the number of processors required for scheduling end-to-end deadline-constrained streaming applications modeled as CSDF graphs, where the actors of a CSDF are executed as strictly periodic tasks. In Davare et al. [2007], the authors optimized periods for dependent tasks on hard real-time distributed automotive systems
in order to meet the end-to-end constraints. In Hong et al. [2011], the authors proposed
a distributed approach to assign local deadlines for periodic tasks on distributed systems to meet the end-to-end deadline constraints. To reduce the energy consumption,
Yu and Prasanna [2002] presented an integer linear programming (ILP) formulation
for the problem of frequency assignment of a set of periodic independent tasks on a
heterogeneous multiprocessor system. The authors in de Langen and Juurlink [2006,
2009] proposed leakage-aware scheduling heuristics to reduce the energy consumption
by translating real-time applications with periodic tasks to DAGs using the frame-based scheduling paradigm and considering the trade-offs among DVFS, DPM, and the number of processors. These methods, however, require precise timing information such as periodic real-time events, which in practice might not be determined in advance. The nondeterminism in the
timing of event arrivals results from two main causes: (a) an event may be triggered by the physical environment, which, in general, cannot be accurately predicted; and (b) in a distributed system, an event might be triggered by other events on different processing components, where variable execution workloads make precise prediction of event arrivals extremely complicated. The aforesaid research thus provides no guarantee on when an event will arrive.
Therefore, these approaches cannot be applied to guarantee the worst-case deadline
in embedded systems where violating deadlines could be disastrous. Unlike previous
work, we focus on improving energy efficiency in hard real-time embedded systems
while guaranteeing the system satisfies the worst-case deadline constraint.
To model irregular event arrivals, Real-Time Calculus (RTC) [Thiele et al. 2000],
which is based on network calculus [Le Boudec and Thiran 2001], can be applied.
Specifically, the arrival curve in the RTC models an upper bound and a lower bound
of the number of event arrivals or the demand of computation under a specified time
interval domain. Considering the DVFS system, Maxiaguine et al. [2005] computed a
safe frequency at periodic intervals to prevent buffer overflow of a system. By adopting RTC models, Chen et al. [2009] explored the schedulability for the online DVFS
scheduling algorithms proposed in Yao et al. [1995]. Combining optimistic and pessimistic DVFS scheduling, Perathoner et al. [2010] presented an adaptive scheme for
the scheduling of arbitrary event streams. When only considering dynamic power management (DPM), Huang et al. [2009b, 2011a] presented an algorithm to find periodic
time-driven patterns to turn on/off the processor for energy saving. Online algorithms
are proposed in Huang et al. [2009a, 2011b] and Lampka et al. [2011] to adaptively
control the power mode of a system, procrastinating the processing of arrived events as
late as possible. In one algorithm in Huang et al. [2009a, 2011b], a tight bound of event
arrivals is computed based on historical information of event arrivals in the recent
past. Instead of using historical information, a dynamic counter technique [Lampka
et al. 2011] is used to predict the future workload. Compared to preceding work, the
Fig. 1. System model.
distinct difference of our work is that it tackles the correlation of a pipelined event stream through an inverse use of the pay-burst-only-once principle. With this new method, which retrieves the correlation of the same event stream across different pipeline stages, we can compute longer deadlines for each pipeline stage and reduce the overall power consumption of the system.
3. MODELS AND PROBLEM DEFINITION
3.1. Hardware Model
The hardware architecture we have chosen is a simplified one with no shared cache
and shared bus among different processing cores. The processing cores are connected
in a pipelined fashion via dedicated FIFOs. We consider the system with the pipeline architecture shown in Figure 1(a). Subtasks of a partitioned application are mapped to and executed on different processors. The processors communicate data only through
distributed memory units. Each memory unit can be organized as one or several FIFOs. The data communication and synchronization among processors are realized by
blocking read and write SW primitives. This kind of hardware architecture has been
realized in Nikolov et al. [2008]. As the service curve of each stage can be computed for
energy efficiency by our proposed approaches offline, the worst-case FIFO size of each
stage can be determined by applying the analysis approach in Wandeler et al. [2006].
Each processor in the pipelined system has three power consumption modes, namely
active, standby, and sleep modes, as shown in Figure 1(b). To serve events, the processor
must be in the active mode with power consumption Pa . When there is no event to
process, the processor can switch to sleep mode with lower power consumption Pσ .
However, mode switching from sleep mode to active mode will cause additional energy
and latency penalty, respectively denoted as Esw,on and tsw,on. To prevent the processor
from frequent mode switches, the processor can stay at standby mode with power
consumption Ps , which is less than Pa but more than Pσ , that is, Pa > Ps > Pσ .
Moreover, the mode switch from active (standby) mode to sleep mode will cause energy
and time overhead, respectively denoted by Esw,sleep and tsw,sleep.
Considering the overhead of switching the system from active mode to sleep mode, the system break-even time TBET denotes the minimum length of time that the system should stay in sleep mode. If the interval during which the system can stay in sleep mode is smaller than TBET, the mode-switch overheads are larger than the energy saving, and switching modes is therefore not worthwhile. The break-even time TBET can be defined as follows:

TBET = max{ tsw, Esw / (Ps − Pσ) },   (1)

where tsw = tsw,on + tsw,sleep and Esw = Esw,on + Esw,sleep.
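Eq. (1) can be evaluated directly. The sketch below is a minimal illustration; all numeric parameter values are placeholders, not figures from the article.

```python
# Break-even time of a power-gated processor, Eq. (1):
#   T_BET = max(t_sw, E_sw / (P_s - P_sigma)),
# with t_sw = t_sw,on + t_sw,sleep and E_sw = E_sw,on + E_sw,sleep.
# All numeric values below are illustrative, not taken from the article.

def break_even_time(t_sw_on, t_sw_sleep, e_sw_on, e_sw_sleep, p_s, p_sigma):
    """Minimum sleep interval for which switching to sleep mode pays off."""
    t_sw = t_sw_on + t_sw_sleep   # total mode-switch latency (s)
    e_sw = e_sw_on + e_sw_sleep   # total mode-switch energy (J)
    return max(t_sw, e_sw / (p_s - p_sigma))

# 0.4 ms total switching latency, 0.3 mJ switching energy,
# standby power 0.5 W, sleep power 0.1 W -> T_BET = 0.75 ms.
t_bet = break_even_time(0.2e-3, 0.2e-3, 0.15e-3, 0.15e-3, 0.5, 0.1)
```

For idle intervals shorter than this value, remaining in standby mode consumes less energy than power-gating.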
3.2. Energy Model
The analytical processor energy model used in Martin et al. [2002], Wang and Mishra [2010], Jejurikar et al. [2004], and de Langen and Juurlink [2009] is adopted in this article; its accuracy has been verified with SPICE simulation [Martin et al. 2002; Wang
and Mishra 2010; de Langen and Juurlink 2009]. The dynamic power consumption of
the core on one voltage/frequency level (Vdd, f ) can be given by
Pdyn = Ceff · Vdd^2 · f,   (2)

where Vdd is the supply voltage, f the operating frequency, and Ceff the effective switching capacitance. The cycle length tcycle is given by a modified alpha power model:

tcycle = (Ld · K6) / (Vdd − Vth)^α,   (3)
where K6 is a technology constant and Ld is estimated by the average logic depth of the critical paths of all instructions in the processor. The threshold voltage Vth is given as
Vth = Vth1 − K1 · Vdd − K2 · Vbs ,
(4)
where Vth1 , K1 , K2 are technology constants and Vbs is the body bias voltage.
The static power is mainly contributed by the subthreshold leakage current Isubn and the reverse bias junction current Ij, scaled by the number of devices in the circuit Lg. It can be represented as
Psta = Lg · (Vdd · Isubn + |Vbs | · I j ),
(5)
where the reverse bias junction current I j is approximated as a constant and the
subthreshold leakage current Isubn can be determined as
Isubn = K3 · e K4 Vdd · e K5 Vbs ,
(6)
where K3, K4, and K5 are technology constants. To avoid the junction leakage power overriding the gain in lowering Isubn, Vbs should be constrained between 0 and −1 V. Thus, the power consumption in active mode and in standby mode, that is, Pa and Ps, under one voltage/frequency level (Vdd, f) can be respectively computed as
Pa = Pdyn + Psta + Pon,
(7)
Ps = Psta + Pon,
(8)
where Pon is an inherent power needed for keeping the processor on.
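The model of Eqs. (2)-(8) can be composed directly. In the sketch below, all technology constants (K1-K5, Vth1, Ij, Lg, Ceff, Pon) are placeholder values chosen for illustration, not the constants used in the article.

```python
import math

# Power model of Eqs. (2)-(8). All constants are illustrative
# placeholders, not the technology values used in the article.
K1, K2, K3, K4, K5 = 0.063, 0.153, 5.4e-9, 1.83, 4.19
VTH1, IJ, LG = 0.244, 4.8e-10, 4.0e6
C_EFF, P_ON = 1.0e-9, 0.1

def v_th(v_dd, v_bs):
    """Threshold voltage, Eq. (4)."""
    return VTH1 - K1 * v_dd - K2 * v_bs

def p_dyn(v_dd, f):
    """Dynamic power, Eq. (2): Ceff * Vdd^2 * f."""
    return C_EFF * v_dd ** 2 * f

def p_sta(v_dd, v_bs):
    """Static power, Eqs. (5)-(6), with Vbs constrained to [-1, 0]."""
    assert -1.0 <= v_bs <= 0.0
    i_subn = K3 * math.exp(K4 * v_dd) * math.exp(K5 * v_bs)   # Eq. (6)
    return LG * (v_dd * i_subn + abs(v_bs) * IJ)              # Eq. (5)

def mode_powers(v_dd, f, v_bs=-0.5):
    """Active and standby power, Eqs. (7)-(8)."""
    static = p_sta(v_dd, v_bs)
    return p_dyn(v_dd, f) + static + P_ON, static + P_ON

p_a, p_s = mode_powers(1.0, 1.0e9)
```

By construction, the active power exceeds the standby power by exactly the dynamic term Pdyn.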
3.3. Task Model
This article considers streaming applications that can be split into a sequence of tasks.
As shown in Figure 1(a), an H.263 decoder is represented as four tasks (i.e., PD1,
deQ, IDCT, MC) implemented in a pipelined fashion [Oh and Ha 2002]. To model
the workload of the application, the concept of the arrival curve α(Δ) = [α^u(Δ), α^l(Δ)], originating from network calculus [Le Boudec and Thiran 2001], is adopted. α^u(Δ) and α^l(Δ) provide the upper and lower bounds on the number of arrival events of the stream S in any time interval Δ. Many other traditional timing models of event streams can be unified in the concept of arrival curves. For example, a periodic event stream with period p can be modeled by a pair of step functions ᾱ^u(Δ) = ⌊Δ/p⌋ + 1 and ᾱ^l(Δ) = ⌊Δ/p⌋. For a sporadic event stream with minimal interarrival distance p and maximal interarrival distance p′, the upper and lower arrival curves are ᾱ^u(Δ) = ⌊Δ/p⌋ + 1 and ᾱ^l(Δ) = ⌊Δ/p′⌋, respectively. Moreover, a widely used model to specify an arrival curve is the PJD model, in which the arrival curve is characterized by period p, jitter j, and minimal
Fig. 2. Examples of arrival curves: (a) periodic events with period p; (b) events with minimal interarrival distance p and maximal interarrival distance p′ = 1.3p; (c) events with period p, jitter j = p, and minimal interarrival distance d = 0.75p.
interarrival distance d. In the PJD model, the upper arrival curve can be determined as ᾱ^u(Δ) = min{⌈(Δ + j)/p⌉, ⌈Δ/d⌉}. Figure 2 depicts arrival curves for the previous cases.
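The step-function bounds above translate directly into code. The sketch below implements the periodic and PJD upper arrival curves; all parameter values are illustrative.

```python
import math

# Upper arrival curves (number of events in any interval of length
# delta). Parameter values below are illustrative.

def alpha_u_periodic(delta, p):
    """Periodic stream with period p: floor(delta/p) + 1."""
    return math.floor(delta / p) + 1

def alpha_u_pjd(delta, p, j, d):
    """PJD model: min(ceil((delta + j)/p), ceil(delta/d))."""
    return min(math.ceil((delta + j) / p), math.ceil(delta / d))

# p = 10: at most 3 events can fall into an interval of length 25.
assert alpha_u_periodic(25, 10) == 3
# With jitter j = p = 10 and d = 0.75*p = 7.5 (as in Figure 2(c)),
# the minimal interarrival distance d caps the burst in tiny intervals.
assert alpha_u_pjd(0.1, 10, 10, 7.5) == 1
assert alpha_u_pjd(20, 10, 10, 7.5) == 3
```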
Analogous to arrival curves, which provide an abstract event stream model, a tuple β(Δ) = [β^u(Δ), β^l(Δ)] defines an abstract resource model that provides upper and lower bounds on the available resources in any time interval Δ. For further details, refer to Thiele et al. [2000]. Note that arrival curves are event based, meaning they specify the number of events of the stream in an interval of time, while service curves are based on the amount of computation time. Therefore, a service curve β has to be transformed into β̄ to indicate the number of events of the stream that the processor can process in a specified time interval. Supposing that the execution time of an event is c, the transformation of the service curves can be done by β̄^l = ⌊β^l/c⌋ and β̄^u = ⌈β^u/c⌉. With these definitions, a processor with lower service curve β̄^Gl(Δ) is said to satisfy the deadline D for the event stream specified by α^u(Δ) if the following condition holds:

β̄^Gl(Δ) ≥ α^u(Δ − D), ∀Δ ≥ 0.   (9)
Note that we adopt the same assumption as Maxiaguine et al. [2005], Huang et al. [2009a, 2009b], Lampka et al. [2011], and Chen et al. [2009] and assume that the worst-case execution time (WCET) of each task is predefined and treated as a system input in this article. As mentioned in the previous section, the hardware architecture we have chosen is a simplified one with no shared cache or shared bus among the processing cores. In this sense, we can safely assume the WCETs of the running tasks are system inputs.
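Condition (9) can be checked numerically by sampling the interval domain. The sketch below uses real-valued leaky-bucket and rate-latency curves for simplicity; all parameters are illustrative.

```python
# Numerical check of condition (9): a lower service curve beta_bar_l
# satisfies deadline D for an upper arrival curve alpha_u iff
#   beta_bar_l(delta) >= alpha_u(delta - D)  for all delta >= 0.
# Curves are real-valued here for simplicity; parameters are illustrative.

def meets_deadline(alpha_u, beta_bar_l, deadline, horizon, step=0.1):
    """Sample the interval domain [0, horizon] with the given step."""
    n = int(horizon / step)
    for i in range(n + 1):
        delta = i * step
        if delta >= deadline and beta_bar_l(delta) < alpha_u(delta - deadline):
            return False
    return True

# Leaky-bucket arrivals (burst b = 5, rate r = 0.5) served by a
# rate-latency curve beta(delta) = R * max(0, delta - T), R = 1, T = 10.
# The delay bound is T + b/R = 15, so D = 16 is met while D = 12 is not.
alpha = lambda delta: 5 + 0.5 * delta
beta = lambda delta: max(0.0, delta - 10)
assert meets_deadline(alpha, beta, deadline=16, horizon=200)
assert not meets_deadline(alpha, beta, deadline=12, horizon=200)
```

Sampling up to a finite horizon suffices here because both curves are eventually linear and the service rate exceeds the arrival rate.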
3.4. Problem Statement
This article considers periodic power management [Huang et al. 2009b], which periodically turns a processor on and off. In each period T = Ton + Toff, it switches the processor to active (standby) mode for Ton time units, followed by Toff time units in sleep mode, as shown in Figure 1(b). Given a time interval L, where L ≫ T and L/T is an integer, suppose that γ(L) is the number of events of event stream S served within L. If all the served events finish within L, the energy consumption E(L, Ton, Toff) incurred by applying this periodic scheme is
E(L, Ton, Toff) = L/(Ton + Toff) · (Esw,on + Esw,sleep)
               + L·Ton/(Ton + Toff) · Ps + L·Toff/(Ton + Toff) · Pσ
               + c · γ(L) · (Pa − Ps)
             = L·Esw/(Ton + Toff) + L·Ton·(Ps − Pσ)/(Ton + Toff)
               + L·Pσ + c · γ(L) · (Pa − Ps),
where Esw stands for Esw,on + Esw,sleep for brevity. Given a sufficiently large L and without changing the scheduling policy, minimizing the energy consumption E(L, Ton, Toff) of a single processor amounts to finding Toff and Ton such that the average idle power consumption P(Ton, Toff) is minimized:

P(Ton, Toff) := (L·Esw/(Ton + Toff) + L·Ton·(Ps − Pσ)/(Ton + Toff)) / L
             = (Esw + Ton·(Ps − Pσ)) / (Ton + Toff).   (10)

By defining K = Ton/(Ton + Toff), the average idle power consumption P in (10) can be expressed in terms of Toff and K (0 ≤ K ≤ 1) as follows:

P(K, Toff) := Esw/Toff + ((Ps − Pσ) − Esw/Toff) · K.   (11)
By analyzing (11), it is obvious that the following properties hold.
Property 1. For all Toff with Toff ≥ Esw/(Ps − Pσ), P(K, Toff) attains its minimum when K attains its minimum.

Property 2. For all Toff with Toff < Esw/(Ps − Pσ), P(K, Toff) attains its minimum, Ps − Pσ, when K = 1.

According to Properties 1 and 2, when Toff > Esw/(Ps − Pσ) holds, the processing unit should be turned on as briefly as possible in each period. When Toff ≤ Esw/(Ps − Pσ) holds, the processing unit should stay on all the time, with Toff = 0. In this context, Esw/(Ps − Pσ) can be seen as the break-even time of the processing unit.
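The role of Esw/(Ps − Pσ) as a break-even threshold can be verified numerically. The sketch below evaluates the average idle power of Eq. (10) with illustrative parameter values.

```python
# Average idle power of the periodic scheme, Eq. (10):
#   P(Ton, Toff) = (Esw + Ton * (Ps - Psigma)) / (Ton + Toff).
# Parameter values are illustrative placeholders.

ESW = 0.3      # total mode-switch energy (J)
PS = 0.5       # standby power (W)
PSIGMA = 0.1   # sleep power (W)

def avg_idle_power(t_on, t_off):
    return (ESW + t_on * (PS - PSIGMA)) / (t_on + t_off)

bet = ESW / (PS - PSIGMA)   # break-even threshold, 0.75 s here

# Property 1: for Toff >= Esw/(Ps - Psigma), shrinking K = Ton/(Ton+Toff)
# (i.e., shortening Ton) reduces the average idle power.
assert avg_idle_power(0.1, 1.0) < avg_idle_power(0.5, 1.0)

# Property 2: for Toff < Esw/(Ps - Psigma), P only approaches its minimum
# Ps - Psigma as K -> 1, i.e., the processor should simply stay on.
assert avg_idle_power(1000.0, 0.5) > PS - PSIGMA
```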
Based on (10), the energy minimization problem of an m-stage pipeline can be formulated as minimizing the function

P(Ton, Toff) = Σ_{i=1}^{m} (Esw^i + Ton^i · (Ps^i − Pσ^i)) / (Ton^i + Toff^i),   (12)

where Ton = [Ton^1 Ton^2 ... Ton^m] and Toff = [Toff^1 Toff^2 ... Toff^m]. Now we can define the studied problem as follows.
Given a pipelined platform with m stages, an event stream S processed by this pipeline, and an end-to-end deadline requirement D, find a set of periodic power management schemes, characterized by Ton and Toff, that minimizes the average idle power consumption P defined in (12) while guaranteeing that the worst-case end-to-end delay does not exceed D.
4. MOTIVATION EXAMPLE
A phenomenon called pay-burst-only-once is well known and gives a tighter upper estimate of the delay when an end-to-end service curve is derived prior to the delay computation [Fidler 2003]. When a workload flow with a burst traverses a number of stages in sequence, the effect of the burst of the flow on the end-to-end delay bound is
Fig. 3. Motivation example.
the same as if the flow traversed only one node. The end-to-end delay bound computed
with this property can be tighter than the sum of delay bounds of each node.
This section presents a motivation example where an event stream passes through
a two-stage pipeline with a deadline requirement D. For simplicity, arrival curves in
the leaky-bucket form and service curves in rate-latency form [Le Boudec and Thiran
2001] are used. In this representation, an arrival curve is modeled as α() = b + r · ,
where b is the burst and r the leaky rate. Correspondingly, a service curve is modeled
as β() = R · ( − T ), where R is service rate and T the delay. A graphical illustration
of the example is shown in Figure 3, where D = 20, b = 5, r = 0.5, and R1 = R2 = 1.
We first inspect the strategy of partitioning the end-to-end deadline and using the partitioned sub-deadlines for the two pipeline stages. For simplicity, we split D equally, that is, D/2 for each stage. As shown in Figure 3(a), given the deadline requirement D/2 for the first pipeline stage, we obtain the maximal T1 = D/2 − b/R1 = 5, corresponding to the minimal service demand β1 = Δ − 5. Deriving the minimal β2 for the second stage of the pipeline is more involved: we need the output arrival curve α′ from the first stage. According to Le Boudec and Thiran [2001], α′(Δ) = b + r·T1 + r·Δ. Now again, with a deadline requirement D/2 for α′, we have T2 = D/2 − (b + r·T1)/R2 = 2.5.
Let us take a close look at this solution. According to the concatenation theorem β_{R1,T1} ⊗ β_{R2,T2} = β_{min(R1,R2),T1+T2}, we get a concatenated service curve β = Δ − (T1 + T2) = Δ − 7.5. With this concatenated service curve, the maximal overall end-to-end deadline for β1 and β2 is 12.5, which is far more strict than D. This example indicates that the β1 and β2 obtained by partitioning the end-to-end deadline are too pessimistic.
The reason for the pessimism is paying the burst b/R2 again for the second stage of the pipeline, as well as the additional delay r·T1/R2 from the first stage, as the pay-burst-only-once principle points out. These effects will be accumulated for every stage of
the pipeline, leading to even more pessimistic results as the number of pipeline stages
increases. In addition, computing the resource demand of each stage requires the lower bound of the output arrival curve from the previous stage. Computing this output curve requires numerical min-plus convolution, which incurs considerable computational and memory overheads. In conclusion, the strategy based on partitioning the end-to-end deadline is not a viable approach, in particular for pipelined systems with many stages.
On the other hand, one can first derive the total concatenated service demand with latency T, in this case T = 15, as shown in Figure 3(b). Any partition based on this T will result in smaller but valid service curves for each pipeline stage, as we can always retrieve the original end-to-end deadline by means of the pay-burst-only-once principle. For
example, by an equal partition of T, both T1 and T2 are 7.5 and D is still preserved. This is the basic idea of our approach, which is presented in the next section.
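The numbers in this example can be reproduced with a few lines of arithmetic; the sketch below follows the leaky-bucket/rate-latency formulas used above.

```python
# Reproducing the two-stage example: leaky-bucket arrivals
# alpha(delta) = b + r*delta, rate-latency service
# beta(delta) = R*(delta - T); the delay bound is T + b/R.

b, r, D = 5.0, 0.5, 20.0
R1 = R2 = 1.0

# (a) Partitioning D equally: each stage gets deadline D/2 = 10.
T1 = D / 2 - b / R1                 # latency budget of stage 1: 5
b_out = b + r * T1                  # output burst of stage 1: 7.5
T2 = D / 2 - b_out / R2             # latency budget of stage 2: 2.5

# Concatenated curve: rate min(R1, R2), latency T1 + T2 = 7.5, so the
# guaranteed end-to-end delay bound is only 12.5, far stricter than D.
e2e_partitioned = (T1 + T2) + b / min(R1, R2)
assert e2e_partitioned == 12.5

# (b) Inverse use of pay-burst-only-once: derive the total latency
# budget T = D - b/min(R1, R2) = 15 first, then split it, e.g. 7.5 each.
T_total = D - b / min(R1, R2)
assert T_total == 15.0
assert (T_total / 2 + T_total / 2) + b / min(R1, R2) == D
```

The burst b is paid only once in strategy (b), which is why the per-stage latency budgets (7.5 each) are larger than those of strategy (a).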
5. PROPOSED APPROACH
Our approach lies in an inverse use of the pay-burst-only-once principle, as mentioned
in the previous section. Rather than directly partitioning the end-to-end deadline, we
compute one service curve for the entire pipeline, which serves as a constraint for
the minimal resource demand. The energy minimization problem is then formulated
with respect to the resource demands for individual pipeline stages. To solve this
minimization problem, the formulation is transformed into a quadratic programming
form and solved by a 2-phase heuristic.
Without loss of generality, a pipelined system with m heterogeneous stages (m ≥ 2) is
considered. The processor of the i stage can provide minimal βiGl service. Since periodic
i
and
power management is considered, the minimal service βiGl can be modeled as a Ton
i
Toff pair:
− Ti off
i
βiGl () = Ton
⊗ .
(13)
i + Ti
Ton
off
The derivation of Eq. (13) is presented in Lemma A.1 in the appendix section. In
addition, to obtain a tight lower bound of the service curve of the entire pipeline, we
i
i
restrict Ton
as a multiple of the worst-case execution time ci , that is, Ton
= ni ci , ni ∈ N + .
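The two representations of the PPM service curve, the convolution form of Eq. (13) and the max form of Eq. (21) in the appendix, can be cross-checked numerically by evaluating the min-plus convolution on a grid. The Ton/Toff values below are hypothetical, and the grid is chosen so that the relevant breakpoints fall on grid points.

```python
import math

def beta_exact(delta, Ton, Toff):
    # Eq. (21): PPM lower service curve (Huang et al. [2009b]).
    P = Ton + Toff
    return max(math.floor(delta / P) * Ton,
               delta - math.ceil(delta / P) * Toff, 0.0)

def beta_conv(delta, Ton, Toff, grid=0.5):
    # Eq. (13): Ton * ceil((s - Toff)/(Ton+Toff)) min-plus convolved with s.
    P = Ton + Toff
    def stair(s):
        return Ton * math.ceil((s - Toff) / P) if s > Toff else 0.0
    # (f (x) g)(delta) = inf over s of f(s) + g(delta - s), with g(s) = s.
    steps = int(round(delta / grid))
    return min(stair(k * grid) + (delta - k * grid) for k in range(steps + 1))

Ton, Toff = 2.0, 3.0  # hypothetical on/off budget of one stage
for d in [0.0, 1.0, 3.5, 5.0, 8.0, 12.0]:
    assert abs(beta_conv(d, Ton, Toff) - beta_exact(d, Ton, Toff)) < 1e-9
```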
5.1. Problem Formulation
Regarding the problem formulation, we first present an approximation approach (see Lemma 5.1) to derive a lower bound of the PPM service curve. By using this approximated curve, we derive the concatenated service curve directly (see Lemma 5.2), which can be used to guarantee the real-time properties (see Theorem 5.3). Then, the energy minimization problem is formulated with respect to the resource demands of the individual pipeline stages. Before presenting the formulation, we first state a few basics. By defining Ki = Ton^i / (Ton^i + Toff^i), we have the following two lemmas.

LEMMA 5.1. β̄_i^{Gl}(Δ) ≥ (Ki/ci) · (Δ − Toff^i − ci).
PROOF. According to the definition of the min-plus convolution operation, the inequality ⌊a⌋ + ⌊b⌋ ≤ ⌊a + b⌋, and Eq. (13), we have

    β̄_i^{Gl}(Δ) ≥ ⌊(Ton^i/ci) · ⌈(Δ − Toff^i) / (Ton^i + Toff^i)⌉⌋ ⊗ ⌊Δ/ci⌋.

With the restriction Ton^i = ni·ci, ni ∈ N+, and ⌈a⌉ ≥ a, we have

    ⌊(Ton^i/ci) · ⌈(Δ − Toff^i) / (Ton^i + Toff^i)⌉⌋ = ni · ⌈(Δ − Toff^i) / (Ton^i + Toff^i)⌉ ≥ (Ki/ci) · (Δ − Toff^i).

According to ⌊a⌋ ≥ a − 1, we have ⌊Δ/ci⌋ ≥ (1/ci) · (Δ − ci).

According to the rule for the min-plus convolution of rate-latency service curves, β_{R1,T1} ⊗ β_{R2,T2} = β_{min(R1,R2), T1+T2} in Le Boudec and Thiran [2001], and Ki ≤ 1, we have

    (Ki/ci) · (Δ − Toff^i) ⊗ (1/ci) · (Δ − ci) = min(Ki/ci, 1/ci) · (Δ − Toff^i − ci) = (Ki/ci) · (Δ − Toff^i − ci).

Then, we get the right side of the inequality.
LEMMA 5.2. ⊗_{i=1}^m β̄_i^{Gl}(Δ) ≥ min_{i=1}^m (Ki/ci) · (Δ − Σ_{i=1}^m (Toff^i + ci)).

PROOF. According to the rule for the min-plus convolution of rate-latency service curves, β_{R1,T1} ⊗ β_{R2,T2} = β_{min(R1,R2), T1+T2} in Le Boudec and Thiran [2001], and Lemma 5.1, we have

    ⊗_{i=1}^m β̄_i^{Gl}(Δ) ≥ ⊗_{i=1}^m (Ki/ci) · (Δ − Toff^i − ci) = min_{i=1}^m (Ki/ci) · (Δ − Σ_{i=1}^m (Toff^i + ci)).
With Lemma 5.2, we state the next theorem.

THEOREM 5.3. Assume an event stream modeled by arrival curve α is processed by an m-stage pipeline and the lower service curve of each pipeline stage is defined by a (Ton^i, Toff^i) pair. The pipelined system satisfies an end-to-end deadline D if the following condition holds:

    min_{i=1}^m (Ki/ci) · (Δ − Σ_{i=1}^m (Toff^i + ci)) ≥ α^u(Δ − D),  ∀Δ ≥ 0.    (14)

PROOF. In Lemma 5.2, the right-hand side of the inequality is a lower bound of ⊗_{i=1}^m β̄_i^{Gl}, which is the concatenated service curve of the pipeline. With ⊗_{i=1}^m β̄_i^{Gl}(Δ) ≥ α^u(Δ − D), the end-to-end delay of the pipeline is no more than D according to the pay-burst-only-once principle. Therefore, the theorem holds.
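Condition (14) can be checked numerically once an upper arrival curve α^u is fixed. The sketch below uses the PJD event model (period p, jitter j, minimal interarrival distance d, as later in Section 6) with the standard staircase upper curve, a finite evaluation horizon, and hypothetical pipeline and stream parameters; it is an illustrative approximation, not the authors' implementation.

```python
import math

def alpha_u(delta, p, j, d):
    # Standard upper arrival curve of the PJD event model (in events).
    if delta <= 0:
        return 0
    return min(math.ceil((delta + j) / p), math.ceil(delta / d))

def satisfies_deadline(K, c, Toff, D, p, j, d, horizon, step=1.0):
    # Theorem 5.3 / Eq. (14): min_i(K_i/c_i)*(Delta - sum_i(Toff_i + c_i))
    # must dominate alpha^u(Delta - D) for all Delta on the grid.
    rho0 = min(k / ci for k, ci in zip(K, c))
    b0 = sum(t + ci for t, ci in zip(Toff, c))
    for i in range(int(horizon / step) + 1):
        delta = i * step
        lhs = max(0.0, rho0 * (delta - b0))
        if lhs < alpha_u(delta - D, p, j, d):
            return False
    return True

# Hypothetical 2-stage pipeline and PJD stream (illustrative values only).
c = [2.0, 3.0]; K = [0.5, 0.6]; Toff = [10.0, 10.0]
ok = satisfies_deadline(K, c, Toff, D=60.0, p=10.0, j=5.0, d=2.0, horizon=1000.0)
```

For these values the condition holds (`ok` is true); shrinking D far enough, e.g. to 10, makes the check fail.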
The left-hand side of inequality (14) can be considered as a bounded delay function bdf(Δ, ρ0, b0) = max(0, ρ0 · (Δ − b0)) with slope ρ0 = min_{i=1}^m (Ki/ci) and bounded delay b0 = Σ_{i=1}^m (Toff^i + ci). For the stream S with deadline D, a set of minimum bounded delay functions bdf_min(Δ, ρ, b) can be derived by varying b (see Section 5.2). Therefore, we should find a solution [K, Toff] such that the resulting bounded delay function bdf(Δ, ρ0, b0) is no less than the minimum bounded delay function bdf_min(Δ, ρ, b). We can thus formulate our optimization problem as follows:

    minimize_{K, Toff}  P(K, Toff)
    subject to  min_{i=1}^m (Ki/ci) ≥ ρ,
                Σ_{i=1}^m (Toff^i + ci) ≤ b,
                0 ≤ Ki ≤ 1,  i = 1, …, m,
                Toff^i ≥ 0,  i = 1, …, m,    (15)
where K = [K1, …, Km]. P(K, Toff) is obtained as follows by applying the transformation Ki = Ton^i / (Ton^i + Toff^i) to the average power consumption (10) of each stage:

    P(K, Toff) = Σ_{i=1}^m (Esw^i · (1 − Ki)/Toff^i + (Ps^i − Pσ^i) · Ki).
The advantage of formulation (15) is twofold. First, the service curves of the individual pipeline stages are the variables of the optimization problem, which, on the one hand, overcomes the problem of paying the burst multiple times and, on the other hand, avoids the costly min-plus computation of output arrival curves during the optimization. Second, this formulation allows us to use a more efficient method to analyze the problem, as presented in the following sections.
5.2. Quadratic Programming Transformation
How to solve the minimization problem (15) is not obvious. The constraints b and ρ,
indeed, are not fixed values and in addition these two constraints are correlated. For
a fixed b, the minimum bounded delay function bdf_min(Δ, ρ, b) can be determined by computing ρ:

    ρ = inf{ρ : bdf(Δ, ρ, b) ≥ α^u(Δ − D), ∀Δ ≥ 0}.    (16)
In this article, we conduct the optimization by varying b and computing ρ for every
possible b. For a fixed b, we can transform (15) into a quadratic programming problem
with box constraints (QPB), as stated in the following lemma.
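For a staircase α^u, the infimum in (16) can be approximated by a finite scan over Δ: every Δ with positive demand α^u(Δ − D) forces ρ ≥ α^u(Δ − D)/(Δ − b), and any positive demand at Δ ≤ b makes the bounded delay b infeasible. The following sketch assumes a PJD upper arrival curve, a finite horizon, and hypothetical parameters.

```python
import math

def alpha_u(delta, p, j, d):
    # Standard upper arrival curve of the PJD event model (in events).
    if delta <= 0:
        return 0
    return min(math.ceil((delta + j) / p), math.ceil(delta / d))

def min_slope(b, D, p, j, d, horizon, step=0.5):
    # Eq. (16): smallest slope rho with bdf(Delta, rho, b) >= alpha^u(Delta-D)
    # for all Delta >= 0, evaluated on a finite grid (an approximation).
    rho = 0.0
    for i in range(int(horizon / step) + 1):
        delta = i * step
        demand = alpha_u(delta - D, p, j, d)
        if demand == 0:
            continue
        if delta <= b:
            return math.inf  # demand arises before the bounded delay b elapses
        rho = max(rho, demand / (delta - b))
    return rho

rho = min_slope(b=20.0, D=60.0, p=10.0, j=5.0, d=2.0, horizon=2000.0)
```

Here ρ converges toward the long-term rate 1/p = 0.1 from below as the horizon grows; an oversized b yields infinity, signalling infeasibility.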
LEMMA 5.4. The minimization problem (15) can be transformed into the following quadratic programming problem with box constraints:

    minimize_{x = [x1, …, xm]}  xᵀ·Q·x
    subject to  0 ≤ xi ≤ √(Esw^i · (1 − ρ·ci)),  i = 1, …, m,    (17)

where Q = A − B, A is an m×m matrix of ones, and B is an m×m diagonal matrix whose i-th diagonal element is (b − Σ_{j=1}^m cj) · (Ps^i − Pσ^i) / Esw^i.

Denote by x* the optimal solution of the QPB problem (17). Then the optimal solution of (15) can be obtained with Ki = 1 − (xi*)²/Esw^i and Toff^i = (xi* / Σ_{j=1}^m xj*) · (b − Σ_{j=1}^m cj).
PROOF. With the Cauchy-Buniakowski-Schwarz inequality, we get

    (Σ_{i=1}^m Toff^i) · (Σ_{i=1}^m Esw^i · (1 − Ki)/Toff^i) ≥ (Σ_{i=1}^m √(Esw^i · (1 − Ki)))².

The minimum value of Σ_{i=1}^m Esw^i · (1 − Ki)/Toff^i, namely (Σ_{i=1}^m √(Esw^i · (1 − Ki)))² / (b − Σ_{j=1}^m cj), is obtained when the following equation holds:

    Toff^i = (√(Esw^i · (1 − Ki)) / Σ_{j=1}^m √(Esw^j · (1 − Kj))) · (b − Σ_{j=1}^m cj).
Then the optimization formulation (15) can be rewritten as

    minimize_{K1, …, Km}  (Σ_{i=1}^m √(Esw^i · (1 − Ki)))² / (b − Σ_{j=1}^m cj) + Σ_{i=1}^m (Ps^i − Pσ^i) · Ki
    subject to  ρ·ci ≤ Ki ≤ 1,  i = 1, …, m.

By defining xi = √(Esw^i · (1 − Ki)), formulation (15) is transformed into the QPB problem (17).
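The transformation of Lemma 5.4 can be spelled out programmatically. The sketch below builds Q = A − B, maps a QPB vector x back to (Ki, Toff^i), and verifies that the QPB objective differs from the idle power P only by the constant offset Σ_i (Ps^i − Pσ^i). All stage parameters are hypothetical, and no QPB solver is invoked.

```python
# Sketch of the transformation in Lemma 5.4 (illustrative parameters).
m = 3
Esw = [480e-6, 500e-6, 520e-6]   # switching energies (J), assumed
Pdiff = [0.39, 0.35, 0.30]       # Ps_i - Psigma_i (W), assumed
c = [0.01, 0.015, 0.02]          # worst-case execution times (s), assumed
b, rho = 0.5, 10.0               # bounded delay (s) and slope, assumed

slack = b - sum(c)
# Q = A - B: A is all ones, B is diagonal with
# B_ii = slack * (Ps_i - Psigma_i) / Esw_i.
Q = [[1.0 - (slack * Pdiff[i] / Esw[i] if i == j else 0.0)
      for j in range(m)] for i in range(m)]

def qp_objective(x):
    return sum(x[i] * Q[i][j] * x[j] for i in range(m) for j in range(m))

def recover(x):
    # Optimal K_i and Toff_i from a QPB vector x (Lemma 5.4).
    K = [1.0 - x[i] ** 2 / Esw[i] for i in range(m)]
    s = sum(x)
    Toff = [x[i] / s * slack for i in range(m)]
    return K, Toff

def idle_power(K, Toff):
    # Objective of (15): sum_i Esw_i(1-K_i)/Toff_i + (Ps_i - Psigma_i) K_i.
    return sum(Esw[i] * (1 - K[i]) / Toff[i] + Pdiff[i] * K[i]
               for i in range(m))

# For any feasible x, P(K, Toff) = x^T Q x / slack + sum_i Pdiff_i,
# so minimizing x^T Q x minimizes the idle power.
x = [0.5 * (Esw[i] * (1 - rho * c[i])) ** 0.5 for i in range(m)]
K, Toff = recover(x)
lhs = idle_power(K, Toff)
rhs = qp_objective(x) / slack + sum(Pdiff)
assert abs(lhs - rhs) < 1e-9
```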
Note that there is a feasible region for b. To guarantee that all resulting Toff^i ≥ 0, the bounded delay b should not be less than Σ_{i=1}^m ci. According to (14), the maximum slope ρ of the bounded delay function cannot exceed 1/max_{i=1}^m ci. Correspondingly, we derive the minimum bounded delay function bdf_min(Δ, 1/max_{i=1}^m ci, b). By inverting (16), we can derive the maximum delay b^u by (18), which guarantees that no resulting Ki exceeds 1. In summary, the feasible region b ∈ [b^l, b^u] can be bounded as follows:

    b^u = sup{d : bdf(Δ, 1/max_{i=1}^m ci, d) ≥ α^u(Δ − D), ∀Δ ≥ 0},
    b^l = Σ_{i=1}^m ci.    (18)
5.3. Quadratic Programming Heuristic
With the preceding information, we can now present the overall algorithm for the energy minimization problem defined in Section 3.4. Basically, the bounded delay b is scanned with step ε within the range [b^l, b^h]. For each b, we first solve the subproblem (17) with a QPB solver, and then the obtained solution is repaired to fulfill further constraints (this will be explained later on). The pseudocode of the algorithm is depicted in Algorithm 1.

ALGORITHM 1: Quadratic Programming Heuristic
Input: α^u, b^l, b^h, step ε, and Pmin = ∞
Output: K_opt, Toff_opt
1: for b = b^l to b^h with step ε do
2:   compute ρ by Eq. (16);
3:   obtain K and Toff by solving (17);
4:   repair K and Toff;
5:   if P(K, Toff) < Pmin then
6:     K_opt ← K; Toff_opt ← Toff;
7:     Pmin ← P(K_opt, Toff_opt);
8:   end if
9: end for
THEOREM 5.5. If ∃i ∈ {1, 2, …, m} such that Esw^i / (Ps^i − Pσ^i) < b − Σ_{j=1}^m cj, then the problem is NP-hard.

PROOF. If there exists a stage pi for which the condition Esw^i / (Ps^i − Pσ^i) < b − Σ_{j=1}^m cj holds, the matrix Q in Lemma 5.4 is not positive semidefinite. Thus, the QPB is a nonconvex quadratic programming problem, which is NP-hard [Jeyakumar et al. 2006].
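The nonconvexity can be exhibited directly: when some diagonal entry B_ii exceeds 1 (which is exactly the condition of Theorem 5.5), the unit vector e_i is a direction of negative curvature for Q = A − B, while the all-ones direction can still have positive curvature. A small sketch with assumed diagonal values:

```python
# Assume stage 3 violates the bound, so B_33 > 1 and Q is indefinite.
m = 3
Bii = [0.5, 0.8, 2.0]
Q = [[1.0 - (Bii[i] if i == j else 0.0) for j in range(m)] for i in range(m)]

def quad(Q, x):
    # Quadratic form x^T Q x.
    return sum(x[i] * Q[i][j] * x[j] for i in range(m) for j in range(m))

assert quad(Q, [0.0, 0.0, 1.0]) < 0  # e_3: negative curvature (1 - 2.0)
assert quad(Q, [1.0, 1.0, 1.0]) > 0  # ones vector: m - sum(B_ii) = 5.7 > 0
```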
ACM Transactions on Design Automation of Electronic Systems, Vol. 20, No. 2, Article 26, Pub. date: February 2015.
26:14
G. Chen et al.
To solve the subproblem (line 3 in Algorithm 1), we apply an existing QPB solver. According to Theorem 5.5, the QPB is NP-hard when the scanned bounded delay b is big enough (i.e., Esw^i / (Ps^i − Pσ^i) < b − Σ_{j=1}^m cj). It is in general difficult to solve the problem optimally. Nevertheless, there are approximation schemes [Fu et al. 1998] that can efficiently solve the nonconvex QPB, and there are many excellent off-the-shelf software packages [Chen and Burer 2012] available. In this article, the state-of-the-art finite B&B algorithm [Chen and Burer 2012] is applied to solve our QPB problem.
After obtaining a pair of K and Toff, the repair phase (line 4 in Algorithm 1) is conducted to fulfill further constraints. This repair scheme is presented in Algorithm 2. First of all, the resulting Toff^i of pipeline stage i may be smaller than tsw^i. In the case where Toff^i < tsw^i, turning off the processor of stage i is not possible; therefore, the solution for stage i is repaired to [Ki, Toff^i] = [1, 0], that is, stage i is on all the time (line 2 in Algorithm 2). However, this repair step leads to a loss of sleep time for each such stage, which is collected into the total budget Q (line 21 in Algorithm 2). We record this loss and try to reassign it to the remaining stages at the end of the algorithm (lines 21–32 in Algorithm 2) to minimize the power consumption further. Second, the resulting Ton^i may not be a multiple of ci, which is one of our basic requirements. Repair steps are conducted to make Ton^i a multiple of ci (lines 6–20 in Algorithm 2). To keep the resulting K'i constant with respect to Ki, Toff^i should be adjusted to T'on^i/Ki − T'on^i, and ΔTi indicates how much the sleep time of stage i should be adjusted compared to the original Toff^i (line 14 in Algorithm 2). If ΔTi > 0 holds, Ton^i has decreased and stage i should decrease its sleep time Toff^i to keep Ki constant (line 16 in Algorithm 2), which results in the loss ΔTi; this part can be reassigned to prolong the sleep time of other stages. ΔTi ≤ 0 indicates that Ton^i has increased and the stage should increase its sleep time Toff^i to keep Ki constant. For this case, we keep T'off^i equal to the original Toff^i, which results in an increase of Ki and a power-consumption increase ΔEi (line 18 in Algorithm 2). In the end, the total loss Q should be reassigned to the stages with ΔTi < 0 to reduce the power consumption further (lines 21–32 in Algorithm 2). The reassignment heuristic uses the power increase ΔEi as a metric to decide which stage should be served first. Specifically, the heuristic iterates through all stages that need to compensate and, in each iteration, picks the stage with the maximum power increase ΔEi and increases its T'off^i without causing K'i < Ki. The reassignment heuristic terminates when there is no loss left to reassign or no stage needs to compensate. It is worth noting that the repair phase still guarantees that the repaired solution satisfies the constraints, as stated in Lemma 5.6.
LEMMA 5.6. The solution repaired by Algorithm 2 satisfies the constraints in (15).

PROOF. The operations in lines 2–20 do not increase the term Σ_{i=1}^m Toff^i and do not cause K'i < Ki, which satisfies the constraints in (15). The reassignment heuristic (lines 21–32) reassigns the total loss Q to the stages that need to compensate and increases their sleep times T'off^i without increasing the total sleep time Σ_{i=1}^m Toff^i and without causing K'i < Ki. Thus, the solution repaired by Algorithm 2 satisfies the constraints in (15).
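The per-stage core of the repair phase (Algorithm 2, lines 6–20) can be sketched as follows. The cross-stage reassignment of the budget Q (lines 21–32) is omitted, the numeric inputs are hypothetical, and we assume 0 < K < 1 so that Ton = K/(1−K)·Toff is well defined.

```python
import math

def repair_stage(K, Toff, c, tsw):
    # Simplified per-stage repair: make Ton a multiple of c while ensuring
    # K' >= K and Toff' <= Toff (assumes 0 < K < 1).
    if Toff < tsw:
        return 1.0, 0.0                     # cannot sleep: stay on
    Ton = K / (1.0 - K) * Toff              # from K = Ton/(Ton+Toff)
    if Ton < c:
        Ton_r = c
    else:
        Ton_r = math.floor(Ton / c) * c     # round down to a multiple of c
        if Ton_r / K - Ton_r < tsw:         # resulting sleep slot too short
            Ton_r = math.ceil(Ton / c) * c  # round up instead
    Toff_r = min(Toff, Ton_r / K - Ton_r)   # never exceed the Toff budget
    K_r = Ton_r / (Ton_r + Toff_r)
    return K_r, Toff_r

K_r, Toff_r = repair_stage(K=0.5, Toff=55.0, c=10.0, tsw=5.0)
```

For this input, Ton = 55 is rounded down to 50 and the sleep time shrinks to 50, keeping K' = K = 0.5; in Algorithm 2 the freed 5 units would enter the budget Q.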
5.4. Fast Heuristic
In Section 5.3, we presented a quadratic programming heuristic based on the QPB transformation. According to Theorem 5.5, the QPB is NP-hard when the scanned bounded delay b is big enough. Assume that the bounded delay b is scanned in n steps; the heuristic in Section 5.3 then needs to solve this NP-hard problem several times, which is time consuming.
ALGORITHM 2: Repair Scheme
Input: solution of the QPB problem: [K, Toff]
Output: [K', T'off]
1: compute the stage set S1 = {pi | Toff^i < tsw^i};
2: repair [Ki, Toff^i] of the stages pi ∈ S1 as [1, 0];
3: update budget ΔTi ← Toff^i and power increase ΔEi ← 0 for stages pi ∈ S1;
4: compute Ton^i and the stage set S2 = {pi | Toff^i ≥ tsw^i};
5: for each stage pi ∈ S2 do
6:   if Ton^i < ci then
7:     T'on^i ← ci;
8:   else
9:     T'on^i ← ⌊Ton^i/ci⌋ · ci;
10:    if T'on^i/Ki − T'on^i < tsw^i then
11:      T'on^i ← ⌈Ton^i/ci⌉ · ci;
12:    end if
13:  end if
14:  compute budget ΔTi = Toff^i − (T'on^i/Ki − T'on^i);
15:  if ΔTi ≥ 0 then
16:    T'off^i ← T'on^i/Ki − T'on^i; ΔEi ← 0;
17:  else
18:    T'off^i ← Toff^i; ΔEi ← P(T'on^i, T'off^i) − P(Ton^i, Toff^i);
19:  end if
20: end for
21: compute the total budget Q = Σ_{ΔTi>0} ΔTi;
22: while Q > 0 do
23:   find the stage pi with maximum power increase ΔEi;
24:   if ΔTi < 0 then
25:     compute the available allocation allo = min(Q, |ΔTi|);
26:     T'off^i ← T'off^i + allo; ΔTi ← ΔTi + allo;
27:     ΔEi ← P(T'on^i, T'off^i) − P(Ton^i, Toff^i);
28:     Q ← Q − allo;
29:   else
30:     break;
31:   end if
32: end while
33: update [K', T'off];
Besides, in its first optimization step, the quadratic programming heuristic does not consider the break-even time constraint (i.e., that Toff^i of pipeline stage i must not be smaller than T_BET^i), which can also make the result pessimistic. To overcome these drawbacks, we present a fast heuristic that finds a suboptimal solution in O(mn) time (m is the stage number and n the number of scanned values of b). Different from the heuristic in Section 5.3, we consider the break-even time constraint in the optimization phase and partition the stage set P into two stage sets according to this constraint, rather than decoupling the break-even time constraint from the optimization. Based on this stage-set partition, we can derive a suboptimal solution as stated in Lemma 5.7.
LEMMA 5.7. Given a fixed bounded delay b, denote by [K, Toff] the optimal solution for the problem. Partition the stage set P into two subsets S1 and S2, where S1 = {pi | Toff^i < T_BET^i} and S2 = {pi | Toff^i ≥ T_BET^i}. Then, the optimal solution [K, Toff] can be determined as follows:

(1) For the stages pi ∈ S1, [Ki, Toff^i] = [1, 0].
(2) For the stages pi ∈ S2, [Ki, Toff^i] = [ρ·ci, xi], where xi = (wi / Σ_{pj∈S2} wj) · (b − Σ_{i=1}^m ci) and wi = √(Esw^i · (1 − ρ·ci)).
PROOF. For the stage subset S2, Toff^i ≥ T_BET^i ≥ Esw^i / (Ps^i − Pσ^i) holds. The average power consumption P(Ki, Toff^i) attains its minimum at Ki = ρ·ci according to Property 1. Thus, the average power consumption of the stage subset S2 can be transformed into Σ_{pi∈S2} wi²/Toff^i + Σ_{pi∈S2} ρ·ci·(Ps^i − Pσ^i) with the constraint Σ_{pi∈S2} Toff^i ≤ b − Σ_{i=1}^m ci − Σ_{pi∈S1} Toff^i. According to the Cauchy-Buniakowski-Schwarz inequality, the optimal average power consumption of the stage subset S2 is given by (19) when [Ki, Toff^i] = [ρ·ci, (wi / Σ_{pj∈S2} wj) · (b − Σ_{i=1}^m ci − Σ_{pi∈S1} Toff^i)] holds:

    Σ_{pi∈S2} P(Ki, Toff^i) = (Σ_{pi∈S2} wi)² / (b − Σ_{i=1}^m ci − Σ_{pi∈S1} Toff^i) + Σ_{pi∈S2} ρ·ci·(Ps^i − Pσ^i).    (19)

According to (19), the average power consumption of the stage subset S2 attains its minimum when Σ_{pi∈S1} Toff^i attains its minimum.

For the stage set S1, there are two cases. (a) T_BET^i = tsw^i: in this case of Toff^i < tsw^i, turning off the processor of stage i is not possible, as stated in the repair scheme, due to the hardware requirement that the sleep time Toff^i of the processor must not be smaller than the overhead tsw^i. Thus, the solution for stage i is forced to [Ki, Toff^i] = [1, 0] for pi ∈ S1. (b) T_BET^i = Esw^i / (Ps^i − Pσ^i): in this case, by Property 2, the average power consumption of the stage subset S1 attains its minimum at [Ki, Toff^i] = [1, 0] for pi ∈ S1. At this point, Σ_{pi∈S1} Toff^i attains its minimum of 0, and thus the average power consumption of the stage subset S2 attains its minimum.
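Lemma 5.7 together with the greedy partition of Algorithm 3 below yields a direct, solver-free construction. The sketch first tentatively places every stage in S2 and then demotes the stages whose tentative Toff falls below their break-even time; all stage parameters are hypothetical and given in arbitrary consistent units.

```python
def fast_solution(stages, b, rho):
    # Closed form of Lemma 5.7 under the greedy partition (sketch).
    # Each stage: dict with keys c (WCET), Esw, Tbet (hypothetical values).
    slack = b - sum(s["c"] for s in stages)
    w = [(s["Esw"] * (1.0 - rho * s["c"])) ** 0.5 for s in stages]
    wsum = sum(w)
    # Tentatively assume every stage can sleep; keep in S2 only the stages
    # whose tentative Toff meets their break-even time.
    S2 = [i for i, s in enumerate(stages) if w[i] / wsum * slack >= s["Tbet"]]
    w2sum = sum(w[i] for i in S2)
    sol = []
    for i, s in enumerate(stages):
        if i in S2:
            sol.append((rho * s["c"], w[i] / w2sum * slack))  # [rho*c_i, x_i]
        else:
            sol.append((1.0, 0.0))                            # always-on stage
    return sol

stages = [{"c": 2.0, "Esw": 400.0, "Tbet": 8.0},
          {"c": 3.0, "Esw": 500.0, "Tbet": 60.0}]
sol = fast_solution(stages, b=100.0, rho=0.1)
```

For these inputs the second stage cannot reach its break-even time and is kept always on, so the entire slack b − Σci = 95 becomes sleep time of the first stage; demoting stages only enlarges the Toff of the remaining S2 stages, which is why the partition stays feasible (Lemma 5.8).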
According to Lemma 5.7, the optimal solution can be derived directly once the stage partition P = {S1, S2} is determined. Thus, the optimal solution can be derived by exhaustively exploring all possible stage partitions, with complexity O(2^m). As the stage number increases, this complexity increases exponentially. To reduce it, a fast stage-partition scheme is proposed in this article. In this scheme, we first greedily put all stages into the stage set S2 = {pi | Toff^i ≥ T_BET^i} (i.e., we assume all stages can enter sleep mode). Under this greedy partitioning, we compute the optimal Toff according to Lemma 5.7, as described in lines 1 and 2 of Algorithm 3. Then, we can assign the stages by checking whether the resulting optimal Toff^i under the greedy partition is no smaller than T_BET^i (see lines 3–9 in Algorithm 3). The feasibility of this partition scheme is guaranteed by Lemma 5.8.
ALGORITHM 3: Greedy Partition Scheme
Input: ρ, b, P
Output: S1, S2
1: compute wi = √(Esw^i · (1 − ρ·ci)) for each stage pi;
2: compute xi = (wi / Σ_{pj∈P} wj) · (b − Σ_{i=1}^m ci) for each stage pi;
3: for pi ∈ P do
4:   if xi < T_BET^i then
5:     insert stage pi into set S1;
6:   else
7:     insert stage pi into set S2;
8:   end if
9: end for

LEMMA 5.8. The stage partition P = {S1, S2} generated by Algorithm 3 is feasible.

PROOF. In Algorithm 3, (wi / Σ_{pj∈P} wj) · (b − Σ_{i=1}^m ci) ≥ T_BET^i holds for the stages in S2. According to Lemma 5.7, Toff^i in S2 is determined as (wi / Σ_{pj∈S2} wj) · (b − Σ_{i=1}^m ci). As S2 ⊆ P, we get Toff^i ≥ (wi / Σ_{pj∈P} wj) · (b − Σ_{i=1}^m ci) ≥ T_BET^i for the stages in S2; thus, the stage partition generated by Algorithm 3 is feasible.
For each b, we can first obtain a suboptimal partition by the greedy partition scheme depicted in Algorithm 3, and then the optimal solution under the obtained partition can be determined. The pseudocode of the algorithm is depicted in Algorithm 4.

ALGORITHM 4: Fast Heuristic
Input: α^u, b^l, b^h, step ε, Pmin = ∞
Output: [Ki, Toff^i] for i = 1, …, m
1: for b = b^l to b^h with step ε do
2:   compute ρ by Eq. (16);
3:   generate the feasible partition S1 and S2 by Algorithm 3;
4:   obtain K and Toff according to Lemma 5.7;
5:   repair K and Toff by Algorithm 2;
6:   if P(K, Toff) < Pmin then
7:     K_opt ← K; Toff_opt ← Toff;
8:     Pmin ← P(K_opt, Toff_opt);
9:   end if
10: end for
6. PERFORMANCE EVALUATIONS
In this section, we demonstrate the effectiveness of our approach. We compare three approaches: (1) the pay-burst-only-once algorithm based on quadratic programming (PBOOA-QP) presented in Section 5.3; (2) the pay-burst-only-once algorithm based on the fast heuristic (PBOOA-FH) presented in Section 5.4; and (3) the deadline partition algorithm (DPA). DPA partitions the end-to-end deadline into sub-deadlines for the individual pipeline stages and explores all possible deadline-partition combinations to find the one with the minimum energy consumption. For each deadline-partition combination, DPA uses the scheme in Huang et al. [2009b] to minimize the energy consumption of the individual pipeline stages, thereby optimizing the overall energy consumption. To show the effects of our scheme, we report the average idle power computed by Eq. (10) as well as the computation time of all schemes. The simulation is implemented in Matlab using the RTC toolbox [Wandeler and Thiele 2006], and the finite B&B algorithm [Chen and Burer 2012] is used to solve the QPB. All results are obtained on a 2.83 GHz processor with 4GB memory.
Table I. Constants for 70nm Technology [Martin et al. 2002; Wang and Mishra 2010]

  Const  Value          Const  Value           Const  Value
  K1     0.063          K6     5.26 × 10^−12   Vth1   0.244
  K2     0.153          K7     −0.144          Ij     4.8 × 10^−10
  K3     5.38 × 10^−7   Vdd    [0.5, 1]        Ceff   0.43 × 10^−9
  K4     1.83           Vbs    [−1, 0]         Ld     37
  K5     4.19           α      1.5             Lg     4 × 10^6

Table II. Power Parameters

  Vdd     Pa       Ps       Pσ      Esw      tsw
  0.7V    656mW    390mW    50μW    483μJ    10ms
6.1. Simulation Setup
The experiments are conducted based on the classical energy model of a 70nm technology processor in Martin et al. [2002], Wang and Mishra [2010], Jejurikar et al. [2004], and Chen et al. [2014], whose accuracy has been verified with SPICE simulation. Table I lists the energy parameters under 70nm technology [Martin et al. 2002; Wang and Mishra 2010; Jejurikar et al. 2004; Chen et al. 2014]. According to Jejurikar et al. [2004], executing at Vdd = 0.7V is more energy efficient than executing at lower voltage levels. To minimize the overall energy consumption of the system, we assume that the processor runs at this critical frequency level when it is in the active state. From Wang and Mishra [2010] and Jejurikar et al. [2004], the body bias voltage Vbs is obtained as −0.7V. From Jejurikar et al. [2004], the idle-related power Pon is obtained as 100mW and the power consumption in sleep mode Pσ is set to 50μW. From Jejurikar et al. [2004], the energy overhead Esw of a state transition is obtained as 483μJ. We set the time overhead tsw of a state transition to 10ms. According to the energy parameters in Table I and the energy model in Section 3.2, we can calculate the corresponding active power Pa and standby power Ps under voltage level Vdd = 0.7V. Table II lists all power parameters used in the experiments.
An event stream is specified by the PJD model with period p, jitter j, and minimal interarrival distance d. It is worth noting that a worst-case execution time c is associated with the service curve of each stage, as stated in Section 3.3. The jitter j and the relative deadline D of the stream are defined as j = φ·p and D = γ·p, respectively, and vary according to the corresponding factors.

To evaluate the effectiveness of our approach, we conduct experiments with three applications. We collected results for these applications with the deadline and jitter varied through the corresponding factors γ and φ. In the following, we give a brief overview of the three applications. The H.263 decoder application [Oh and Ha 2002] was modeled by four tasks consisting of packet decoding (PD1), an inverse quantization operation (deQ), an inverse DCT operation (IDCT), and motion compensation (MC). The execution time of each subtask in the H.263 decoder application can be found in Oh and Ha [2002]. The activation period of the H.263 decoder application is 100ms, with the jitter and the end-to-end deadline varied. The MP3 decoder application is implemented in a pipelined fashion [Oh and Ha 2002] and can be split into five tasks, including packet decoding (PD2), Huffman decoding (HD), an inverse quantization operation (deQ), an inverse DCT operation (IDCT), and antialiasing (FB). The execution time of each subtask in the MP3 decoder application can be found in Oh and Ha [2002]. The activation period of the MP3 decoder application is 100ms, with the jitter and the end-to-end deadline varied. Time Delay Equalization (TDE) comes from the GMTI (Ground Moving Target Indicator) application obtained from the StreamIt benchmarks [Thies and
Table III. Average Power Savings with Respect to DPA

              H.263      MP3        TDE        H.263      MP3        TDE
              2-stage    2-stage    2-stage    3-stage    3-stage    3-stage
  PBOOA-QP    10.46%     11.57%     39.62%     23.59%     26.37%     30.60%
  PBOOA-FH    10.46%     11.57%     39.65%     23.31%     25.69%     34.09%
Amarasinghe 2010]. The TDE application contains four tasks: FFT reorder, combined DFT, FFT reorder, and combined IDFT. We set the activation period of the TDE application to 30ms.
6.2. Simulation Result
We first evaluate how the power consumption of the compared approaches changes as the jitter and deadline vary. Cases of 2-stage and 3-stage pipeline architectures with homogeneous 70nm processors are evaluated. We vary the jitter factor φ from 0 to 3 with step 0.5 and the deadline factor γ from 1.5 to 2 with step 0.5. The simulation results of the three approaches are shown in Figure 4. In the figure, each line represents the average energy consumption under the varied jitter-factor settings with a fixed deadline factor and task mapping. From these, we can make the following observations: (1) the pay-burst-only-once-based approaches always outperform the deadline partition approach for all settings on both pipeline architectures. We list the average normalized power savings of PBOOA-QP and PBOOA-FH with respect to DPA in Table III; (2) the average idle power consumption of the three approaches increases as the jitter increases, since a bigger jitter requires a longer Ton to guarantee the worst-case end-to-end deadline; (3) the average idle power consumption of the three approaches decreases as the end-to-end deadline increases. This is expected because a looser end-to-end deadline requirement allows a smaller execution time Ton and a longer sleep time Toff; (4) one interesting observation is that the pay-burst-only-once-based approaches achieve more power savings on the 3-stage pipeline than on the 2-stage pipeline for the different jitter and deadline settings. This is because DPA on the 3-stage pipeline pays the burst more times than on the 2-stage pipeline, which leads PBOOA-QP and PBOOA-FH to achieve more power savings on the 3-stage pipeline.
Next, we conduct an experiment to show the impact of the time overhead of state transitions tsw on the effectiveness of our approaches. An H.263 application with jitter factor φ = 0.5 and deadline factor γ = 1 runs on a 3-stage pipeline architecture with homogeneous 70nm processors. We vary the time overhead of state transitions tsw from 5ms to 15ms with a fixed step size of 1ms. Figure 5 illustrates the average power consumption of the three compared approaches. In this figure, we can observe that our approaches find efficient solutions and outperform DPA in all tsw settings. Besides, when tsw increases, the average power consumption of DPA increases faster than that of the pay-burst-only-once-based approaches. This is because DPA generates less idle time, as it pays the burst many times compared to the pay-burst-only-once-based approaches, as shown in Section 4. The increase of tsw reduces the opportunities for turning off the processor, which means that entering sleep mode becomes more difficult for DPA.
Then, we discuss the impact of the period setting on the effectiveness of the approaches. The MP3 application with jitter factor φ = 1 and deadline factor γ = 1.5 runs on a 2-stage pipeline architecture with homogeneous 70nm processors, where we vary the period from 70ms to 130ms with a fixed step size of 10ms. Figure 6 illustrates the average power consumption of the three compared approaches under the different period settings. From Figure 6, we can see that the pay-burst-only-once-based approaches outperform DPA at all period settings. Furthermore, the average power consumption of all approaches decreases when the period increases. This is expected because a bigger application period prolongs the idle intervals.
Fig. 4. Average idle power consumption for three applications on 2-stage and 3-stage pipeline architectures.
Finally, we demonstrate the scalability of our approaches. We test them on up to a 20-stage heterogeneous pipeline. The execution times of the subtasks mapped on each stage are randomly generated between 5ms and 15ms. According to the power model presented in Section 3.2, the power profile of each stage is generated by randomly selecting a voltage Vdd between 0.5V and 0.8V. The activation period of the event stream is 40ms with jitter factor φ = 1. The end-to-end deadline for the test case
Fig. 5. Average power consumption with varying tsw .
Fig. 6. Average power consumption with varying period.
with n stages is determined as n · 20ms, where n is the stage number. The overhead values of the state transition, tsw and Esw, of the different stages are randomly selected in [1ms, 5ms] and [400μJ, 800μJ], respectively. Based on the observation that the deadline partition algorithm may suffer from deadline-combination explosion and costly computation, we set the search step to 5 for the three compared approaches. Figure 7 shows the power consumption and computation overhead on the different pipeline architectures. From this figure, we have the following observations: (1) as shown in Figure 7(a), the computation overhead of the deadline partition algorithm increases exponentially. When the stage number exceeds 10, DPA fails to generate results due to the expiration of the time budget of 8 hours. For the case of the 9-stage pipeline, DPA takes almost 420 minutes, which is 9182× longer than the 3-stage pipeline case. This is expected because the number of deadline combinations increases exponentially with the stage number. In addition, as the stage number increases, the time for computing the resource demand of each following stage, which requires the lower bound of the output arrival curve from the previous stage, increases.
Fig. 7. Computation time and power computation for heterogeneous pipelined system.
Computing this output curve requires a numerical min-plus convolution, which incurs considerable computational and memory overhead; (2) compared to the deadline partition algorithm, the pay-burst-only-once-based approaches are fast, and their computation time grows slowly with the number of stages, especially for PBOOA-FH. For the 20-stage pipeline, the PBOOA-QP approach takes 3.7 minutes, 124× more computing time than in the 3-stage case, while PBOOA-FH takes only 0.08 minutes, merely 7.5× more than in the 3-stage case; (3) with respect to average idle power consumption, the pay-burst-only-once-based approaches are more energy efficient than the deadline partition algorithm. In Figure 7(b), PBOOA-QP and PBOOA-FH consistently outperform DPA for all pipeline architectures, indicating that our approaches are not only faster but also more energy efficient than the DPA approach. Moreover, as observed in the prior experiments, the gap in power consumption between the deadline partition algorithm and the pay-burst-only-once-based algorithms widens as the number of stages grows. This is expected because, as the stage number increases, the number of times DPA must pay the burst also increases. In contrast, the proposed approaches pay the burst only once, which yields a tighter end-to-end delay bound and prolongs the idle intervals of the stages for energy efficiency; (4) the PBOOA-FH approach achieves almost identical average idle power consumption to the PBOOA-QP approach with almost a 10× speedup. In some cases, PBOOA-FH even achieves more energy savings than PBOOA-QP. This is because, in contrast to PBOOA-QP, PBOOA-FH integrates break-even time constraints into the optimization phase, which can lead it to better solutions.
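The cost noted in observation (1) can be seen from a direct numerical implementation of the min-plus convolution used to propagate output arrival curves. The sketch below is a minimal illustration on hypothetical curves sampled over an integer grid, not the toolbox implementation used in the experiments; it makes the quadratic cost in the number of samples explicit.

```python
def min_plus_conv(f, g):
    """Numeric min-plus convolution: (f ⊗ g)(t) = inf_{0<=s<=t} f(t-s) + g(s).

    f and g are curves sampled on the integer grid 0..n-1. The nested loops
    cost O(n^2) time per stage, which is why propagating output arrival
    curves stage by stage becomes expensive for long pipelines.
    """
    n = len(f)
    return [min(f[t - s] + g[s] for s in range(t + 1)) for t in range(n)]

# Hypothetical inputs: a staircase-like upper arrival curve and a
# rate-latency service curve (rate 3, latency 4), sampled over 50 slots.
alpha = [0 if t == 0 else min(5 + 2 * t, 4 * t) for t in range(50)]
beta = [max(0, 3 * (t - 4)) for t in range(50)]
out = min_plus_conv(alpha, beta)
```

On finer grids or longer time horizons, both the O(n²) running time and the memory to store intermediate curves grow quickly, which matches the observed slowdown of the deadline partition algorithm.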
7. CONCLUSION
This article presents new approaches to minimize the energy consumption of pipelined systems. Targeting streaming applications with nondeterministic workload arrivals under hard real-time constraints, our approaches not only guarantee the original end-to-end deadline requirement but also preserve the pay-burst-only-once phenomenon, resulting in a significant reduction in both energy consumption and computing overhead. Moreover, our approaches are scalable with respect to the number of pipeline stages. Simulation results demonstrate the effectiveness of our approaches. In the future, we intend to extend our approaches with dynamic voltage-frequency scaling (DVFS) to reduce the dynamic power of pipelined systems. Another interesting direction for future work is to target multidimensional issues such as energy and thermal
constraints simultaneously. In addition, combining our approaches with the mapping of the application onto the architecture is also a worthwhile direction for future work.
APPENDIX
LEMMA A.1. The service curve of periodic power management specified by $T_{on}^i$ and $T_{off}^i$ can be represented as follows:

$$\beta_i^{Gl}(\Delta) = T_{on}^i \cdot \left\lceil \frac{\Delta - T_{off}^i}{T_{on}^i + T_{off}^i} \right\rceil \otimes \Delta. \qquad (20)$$

PROOF. According to Huang et al. [2009b], the service curve of periodic power management specified by $T_{on}$ and $T_{off}$ can be represented as Eq. (21):

$$\beta^{Gl}(\Delta) = \max\left( \left\lfloor \frac{\Delta}{T_{on} + T_{off}} \right\rfloor \cdot T_{on},\; \Delta - \left\lceil \frac{\Delta}{T_{on} + T_{off}} \right\rceil \cdot T_{off} \right). \qquad (21)$$

This proof presents the derivation of Eq. (20), the alternative representation of the service curve of periodic power management, and shows that Eqs. (21) and (20) are equivalent. According to the definition of the min-plus convolution,

$$\beta^{Gl}(\Delta) = T_{on} \cdot \left\lceil \frac{\Delta - T_{off}}{T_{on} + T_{off}} \right\rceil \otimes \Delta = \inf_{0 \le s \le \Delta} \left\{ \Delta - s + T_{on} \cdot \left\lceil \frac{s - T_{off}}{T_{on} + T_{off}} \right\rceil \right\}.$$

We make the following transformations:

$$T = T_{on} + T_{off}, \qquad \Delta = k \cdot T + r, \; k \in \mathbb{N}, \; 0 \le r < T, \qquad s = k_s \cdot T + r_s, \; k_s \in \mathbb{N}, \; 0 \le r_s < T.$$

Then, we have

$$\beta^{Gl}(\Delta) = \inf_{0 \le s \le \Delta} \left\{ (k - k_s) \cdot T + (r - r_s) + T_{on} \cdot k_s + T_{on} \cdot \left\lceil \frac{r_s - T_{off}}{T} \right\rceil \right\}. \qquad (22)$$

As $s \le \Delta$, there are two possibilities for the parameters $k_s$ and $r_s$: (1) when $k_s \le k - 1$, there is no constraint between $r$ and $r_s$, because $k_s \le k - 1$ alone is sufficient to guarantee $s \le \Delta$; (2) when $k_s = k$, $r_s \le r$ must hold for $s \le \Delta$.

Case 1: $k_s \le k - 1$. In this case there are no constraints between $r$ and $r_s$, so Eq. (22) yields Eq. (23):

$$\begin{aligned} \beta^{Gl}(\Delta) &= \inf_{0 \le s \le \Delta} \left\{ (k - k_s) \cdot T + (r - r_s) + k_s \cdot T_{on} + T_{on} \cdot \left\lceil \frac{r_s - T_{off}}{T} \right\rceil \right\} \\ &= \inf_{0 \le s \le \Delta} \left\{ k \cdot T + r - k_s \cdot (T - T_{on}) - r_s + T_{on} \cdot \left\lceil \frac{r_s - T_{off}}{T} \right\rceil \right\} \\ &= \inf_{0 \le s \le \Delta} \left\{ k \cdot T + r - k_s \cdot T_{off} - r_s + T_{on} \cdot \left\lceil \frac{r_s - T_{off}}{T} \right\rceil \right\}. \end{aligned} \qquad (23)$$
—When $T_{off} < r_s < T$ holds, the ceiling term equals one, and Eq. (23) gives Eq. (24):

$$\begin{aligned} \beta^{Gl}(\Delta) &= \inf_{0 \le s \le \Delta} \left\{ k \cdot T + r - k_s \cdot T_{off} - r_s + T_{on} \right\} \\ &> k \cdot T + r - k_s \cdot T_{off} - T + T_{on} \\ &= k \cdot T + r - k_s \cdot T_{off} - T_{off} \\ &\ge k \cdot T + r - k \cdot T_{off}. \end{aligned} \qquad (24)$$

—When $0 \le r_s \le T_{off}$ holds, the ceiling term vanishes, and Eq. (23) gives Eq. (25):

$$\beta^{Gl}(\Delta) = \inf_{0 \le s \le \Delta} \left\{ k \cdot T + r - k_s \cdot T_{off} - r_s \right\} = k \cdot T + r - k \cdot T_{off} \quad (\text{at } r_s = T_{off}, \; k_s = k - 1). \qquad (25)$$

For the preceding two subcases, the infimum for the case $k_s \le k - 1$ is obtained from Eqs. (24) and (25) as Eq. (26):

$$\beta^{Gl}_{k_s \le k - 1}(\Delta) = k \cdot T + r - k \cdot T_{off} = \Delta - k \cdot T_{off}. \qquad (26)$$

Case 2: $k_s = k$. In this case $r_s \le r$ must hold for $s \le \Delta$, so Eq. (22) yields Eq. (27):

$$\beta^{Gl}(\Delta) = \inf_{0 \le s \le \Delta} \left\{ (r - r_s) + k \cdot T_{on} + T_{on} \cdot \left\lceil \frac{r_s - T_{off}}{T} \right\rceil \right\}. \qquad (27)$$

As $r_s$ is constrained by $r$, there are two cases for $r$.

—$r \le T_{off}$. In this case $0 \le r_s \le r \le T_{off}$, so Eq. (27) gives Eq. (28):

$$\beta^{Gl}_{k_s = k, \, r \le T_{off}}(\Delta) = \inf_{0 \le s \le \Delta} \left\{ (r - r_s) + k \cdot T_{on} \right\} = k \cdot T_{on} \quad (\text{at } r_s = r). \qquad (28)$$

Integrating the cases $k_s = k$ and $k_s \le k - 1$ via Eqs. (26) and (28), we obtain Eq. (29):

$$\beta^{Gl}_{r \le T_{off}}(\Delta) = \min\left( \Delta - k \cdot T_{off}, \; k \cdot T_{on} \right) = k \cdot T_{on}. \qquad (29)$$

—$r > T_{off}$. In this case there are two subcases for $r_s$, namely $0 \le r_s \le T_{off} < r < T$ and $T_{off} < r_s \le r < T$.

—$0 \le r_s \le T_{off} < r < T$. Eq. (27) gives Eq. (30):

$$\beta^{Gl}(\Delta) = \inf_{0 \le s \le \Delta} \left\{ (r - r_s) + k \cdot T_{on} \right\} = r - T_{off} + k \cdot T_{on} < T_{on} + k \cdot T_{on} \quad (\text{at } r_s = T_{off}). \qquad (30)$$

—$T_{off} < r_s \le r < T$. Eq. (27) gives Eq. (31):

$$\beta^{Gl}(\Delta) = \inf_{0 \le s \le \Delta} \left\{ (r - r_s) + k \cdot T_{on} + T_{on} \right\} = T_{on} + k \cdot T_{on} \quad (\text{at } r_s = r). \qquad (31)$$

For the prior two subcases, the infimum of $\beta^{Gl}_{k_s = k, \, r > T_{off}}(\Delta)$ is obtained from Eqs. (30) and (31) as Eq. (32):

$$\beta^{Gl}_{k_s = k, \, r > T_{off}}(\Delta) = r - T_{off} + k \cdot T_{on} = \Delta - (k + 1) \cdot T_{off}. \qquad (32)$$
Integrating the cases $k_s = k$ and $k_s \le k - 1$ via Eqs. (32) and (26), we obtain Eq. (33):

$$\beta^{Gl}_{r > T_{off}}(\Delta) = \min\left( \Delta - k \cdot T_{off}, \; \Delta - (k + 1) \cdot T_{off} \right) = \Delta - (k + 1) \cdot T_{off}. \qquad (33)$$

With Eqs. (29) and (33), we can obtain the service curve as Eq. (34):

$$\beta^{Gl}(\Delta) = \begin{cases} k \cdot T_{on} & r \le T_{off} \\ \Delta - (k + 1) \cdot T_{off} & r > T_{off} \end{cases}. \qquad (34)$$

When $0 \le r \le T_{off}$ holds, we have $k \cdot T_{on} \ge \Delta - (k + 1) \cdot T_{off}$ and $k = \lfloor \Delta / (T_{on} + T_{off}) \rfloor$. When $r > T_{off}$ holds, we have $k \cdot T_{on} < \Delta - (k + 1) \cdot T_{off}$ and $k + 1 = \lceil \Delta / (T_{on} + T_{off}) \rceil$. Then, we can obtain the service curve as Eq. (35):

$$\beta^{Gl}(\Delta) = \max\left( k \cdot T_{on}, \; \Delta - (k + 1) \cdot T_{off} \right). \qquad (35)$$

Substituting these values of $k$ and $k + 1$ into Eq. (35) yields Eq. (21); hence the service curve of periodic power management can equivalently be represented as Eq. (20).
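The equivalence established in Lemma A.1 can also be checked numerically. The sketch below uses assumed integer parameters ($T_{on} = 3$, $T_{off} = 2$); for integer parameters the infimum in Eq. (20) is attained on the integer grid, so an exhaustive search suffices.

```python
def iceil(a, b):
    # Exact integer ceiling of a / b for b > 0 (avoids float rounding).
    return -(-a // b)

def beta_eq21(delta, t_on, t_off):
    # Closed form of Eq. (21): max(floor(D/T)*T_on, D - ceil(D/T)*T_off).
    T = t_on + t_off
    return max((delta // T) * t_on, delta - iceil(delta, T) * t_off)

def beta_eq20(delta, t_on, t_off):
    # Eq. (20): min-plus convolution of T_on*ceil((s - T_off)/T) with the
    # identity function, evaluated by exhaustive search over 0 <= s <= delta.
    T = t_on + t_off
    return min(delta - s + t_on * iceil(s - t_off, T) for s in range(delta + 1))

# The two representations of the periodic power-management service curve agree.
assert all(beta_eq20(d, 3, 2) == beta_eq21(d, 3, 2) for d in range(1, 200))
```

For example, with $T_{on} = 3$, $T_{off} = 2$ and $\Delta = 8$ (so $k = 1$, $r = 3 > T_{off}$), both forms yield $\Delta - (k+1) \cdot T_{off} = 4$, as predicted by Eq. (34).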
REFERENCES
A. Alimonda, S. Carta, A. Acquaviva, A. Pisano, and L. Benini. 2009. A feedback-based approach to DVFS in
data-flow applications. IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 28, 11, 1691–1704.
S. Carta, A. Alimonda, A. Pisano, A. Acquaviva, and L. Benini. 2007. A control theoretic approach to energy-efficient pipelined computation in MPSoCs. ACM Trans. Embedd. Comput. Syst. 6, 4.
G. Chen, K. Huang, C. Buckl, and A. Knoll. 2013. Energy optimization with worst-case deadline guarantee
for pipelined multiprocessor systems. In Proceedings of the Design, Automation and Test in Europe
Conference (DATE’13).
G. Chen, K. Huang, and A. Knoll. 2014. Energy optimization for real-time multiprocessor system-on-chip
with optimal DVFS and DPM combination. ACM Trans. Embedd. Comput. Syst. 13, 3.
J. Chen and S. Burer. 2012. Globally solving nonconvex quadratic programming problems via completely
positive programming. Math. Program. Comput. 4, 1, 33–52.
J. J. Chen, N. Stoimenov, and L. Thiele. 2009. Feasibility analysis of on-line DVS algorithms for scheduling
arbitrary event streams. In Proceedings of the 30th IEEE Real-Time Systems Symposium (RTSS’09).
A. Davare, Q. Zhu, M. Di Natale, C. Pinello, S. Kanajan, and A. Sangiovanni-Vincentelli. 2007. Period
optimization for hard real-time distributed automotive systems. In Proceedings of 44th ACM/IEEE
Design Automation Conference (DAC’07).
P. de Langen and B. Juurlink. 2006. Leakage-aware multiprocessor scheduling for low power. In Proceedings
of the 20th International Parallel and Distributed Processing Symposium (IPDPS’06).
P. de Langen and B. Juurlink. 2009. Leakage-aware multiprocessor scheduling. J. Signal Process. Syst. 57,
1, 73–88.
M. Fidler. 2003. Extending the network calculus pay bursts only once principle to aggregate scheduling.
In Proceedings of the 2nd International Workshop on Quality of Service in Multiservice IP Networks
(QoS-IP’03). 19–34.
M. Fu, Z. Luo, and Y. Ye. 1998. Approximation algorithms for quadratic programming. J. Combinat. Optim.
2, 1, 29–50.
S. Y. Hong, T. Chantem, and X. S. Hu. 2011. Meeting end-to-end deadlines through distributed local deadline
assignments. In Proceedings of the 32nd IEEE Real-Time Systems Symposium (RTSS’11).
K. Huang, J. J. Chen, and L. Thiele. 2011a. Energy-efficient scheduling algorithms for periodic power management for real-time event streams. In Proceedings of the 17th IEEE International Conference on
Embedded and Real-Time Computing Systems and Applications (RTCSA’11).
K. Huang, L. Santinelli, J. J. Chen, L. Thiele, and G. C. Buttazzo. 2009a. Adaptive dynamic power management for hard real-time systems. In Proceedings of the 30th IEEE Real-Time Systems Symposium
(RTSS’09).
K. Huang, L. Santinelli, J. J. Chen, L. Thiele, and G. C. Buttazzo. 2009b. Periodic power management schemes
for real-time event streams. In Proceedings of the 48th IEEE International Conference on Decision and
Control (CDC’09).
K. Huang, L. Santinelli, J. J. Chen, L. Thiele, and G. C. Buttazzo. 2011b. Applying real-time interface and
calculus for dynamic power management in hard real-time systems. Real-Time Syst. 47, 2, 163–193.
H. Javaid and S. Parameswaran. 2009. A design flow for application specific heterogeneous pipelined multiprocessor systems. In Proceedings of the 46th ACM/IEEE Annual Design Automation Conference
(DAC’09).
H. Javaid, M. Shafique, J. Henkel, and S. Parameswaran. 2011a. System-level application-aware dynamic
power management in adaptive pipelined MPSoCs for multimedia. In Proceedings of the IEEE/ACM
International Conference on Computer-Aided Design (ICCAD’11).
H. Javaid, M. Shafique, S. Parameswaran, and J. Henkel. 2011b. Low-power adaptive pipelined MPSoCs for
multimedia: An H.264 video encoder case study. In Proceedings of the 48th ACM/EDAC/IEEE Design
Automation Conference (DAC’11).
R. Jejurikar, C. Pereira, and R. Gupta. 2004. Leakage aware dynamic voltage scaling for real-time embedded
systems. In Proceedings of the 41st ACM/IEEE Design Automation Conference (DAC’04).
V. Jeyakumar, A. M. Rubinov, and Z. Y. Wu. 2006. Sufficient global optimality conditions for non-convex
quadratic minimization problems with box constraints. J. Global Optim. 36, 3, 471–481.
I. Karkowski and H. Corporaal. 1997. Design of heterogeneous multi-processor embedded systems: Applying functional pipelining. In Proceedings of the International Conference on Parallel Architectures and
Compilation Techniques (PACT’97). 156.
K. Lampka, K. Huang, and J. J. Chen. 2011. Dynamic counters and the efficient and effective online power
management of embedded real-time systems. In Proceedings of the 7th IEEE/ACM/IFIP International
Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS’11).
J. Y. Le Boudec and P. Thiran. 2001. Network Calculus: A Theory of Deterministic Queuing Systems for the
Internet. Springer.
D. Liu, J. Spasic, J. T. Zhai, T. Stefanov, and G. Chen. 2014. Resource optimization of CSDF-modeled streaming
applications with latency constraints. In Proceedings of the Design, Automation and Test in Europe
Conference (DATE’14).
S. M. Martin, K. Flautner, T. Mudge, and D. Blaauw. 2002. Combined dynamic voltage scaling and adaptive
body biasing for lower power microprocessors under dynamic workloads. In Proceedings of IEEE/ACM
International Conference on Computer-Aided Design (ICCAD’02).
A. Maxiaguine, S. Chakraborty, and L. Thiele. 2005. DVS for buffer-constrained architectures with predictable QoS-energy tradeoffs. In Proceedings of the IEEE/ACM International Conference on Hardware/
Software Codesign and System Synthesis (CODES+ISSS’05).
H. Nikolov, T. Stefanov, and E. Deprettere. 2008. Systematic and automated multiprocessor system design,
programming, and implementation. IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 27, 3, 542–555.
H. Oh and S. Ha. 2002. Hardware-software cosynthesis of multi-mode multi-task embedded systems with
real-time constraints. In Proceedings of the 10th International Symposium on Hardware/Software Codesign (CODES+ISSS’02).
S. Perathoner, K. Lampka, N. Stoimenov, L. Thiele, and J. J. Chen. 2010. Combining optimistic and pessimistic
DVS scheduling: An adaptive scheme and analysis. In Proceedings of the IEEE/ACM International
Conference on Computer-Aided Design (ICCAD’10).
S. L. Shee, A. Erdos, and S. Parameswaran. 2006. Heterogeneous multiprocessor implementations for JPEG:
A case study. In Proceedings of the 4th International Conference on Hardware/Software Codesign and
System Synthesis (CODES+ISSS’06).
S. L. Shee and S. Parameswaran. 2007. Design methodology for pipelined heterogeneous multiprocessor
system. In Proceedings of the 44th Annual Design Automation Conference (DAC’07).
L. Thiele, S. Chakraborty, and M. Naedele. 2000. Real-time calculus for scheduling hard real-time systems.
In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS’00).
W. Thies and S. Amarasinghe. 2010. An empirical characterization of stream programs and its implications for language and compiler design. In Proceedings of the 19th International Conference on Parallel
Architectures and Compilation Techniques (PACT’10).
E. Wandeler and L. Thiele. 2006. Real-time calculus (RTC) toolbox. http://www.mpa.ethz.ch/Rtctoolbox.
E. Wandeler, L. Thiele, M. Verhoef, and P. Lieverse. 2006. System architecture evaluation using modular
performance analysis - A case study. Int. J. Softw. Tools Technol. Transfer 8, 6, 649–667.
W. X. Wang and P. Mishra. 2010. Leakage-aware energy minimization using dynamic voltage scaling and
cache reconfiguration in real-time systems. In Proceedings of the 23rd International Conference on VLSI
Design (VLSID’10).
R. B. Xu, R. Melhem, and D. Mosse. 2007. Energy-aware scheduling for streaming applications on chip multiprocessors. In Proceedings of the 28th IEEE International Real-Time Systems Symposium (RTSS’07).
F. Yao, A. Demers, and S. Shenker. 1995. A scheduling model for reduced CPU energy. In Proceedings of the
36th Annual Symposium on Foundations of Computer Science (FOCS’95).
Y. Yu and V. K. Prasanna. 2002. Power-aware resource allocation for independent tasks in heterogeneous
real-time systems. In Proceedings of the 9th IEEE International Conference on Parallel and Distributed
Systems (ICPADS’02).
Received November 2013; revised July 2014; accepted November 2014