Thermodynamic Aspects of Confidentiality Pasquale Malacaria , Fabrizio Smeraldi

Thermodynamic Aspects of Confidentiality
Pasquale Malacaria∗, Fabrizio Smeraldi
School of Electronic Engineering and Computer Science, Queen Mary University of London,
Mile End Road, London E1 4NS, UK
∗ Corresponding author.
Email addresses: pm@eecs.qmul.ac.uk (Pasquale Malacaria), fabri@eecs.qmul.ac.uk (Fabrizio Smeraldi)
Preprint submitted to Elsevier
November 4, 2013
Abstract
We analyse secure computation as a physical process and connect it to recent
advances in security, namely Quantitative Information Flow.
Using a classic thermodynamic argument involving the second principle and
reversibility we show that any deterministic computation, where the final state
of the system is observable, must dissipate at least W kB T ln 2. Here W is the
information theoretic notion of security as defined in Quantitative Information
Flow, kB the Boltzmann constant and T the temperature of the environment.
Such minimum dissipation is also an upper bound on another probabilistic quantification of confidentiality introduced by Smith.
We then explore the thermodynamics of timing channels in Brownian computers. Here the low energies involved lead to the emergence of new timing
channels arising directly from the entropy variations related to computation.
Keywords: Information Theory, Security, Thermodynamics
1. Introduction
The study of computation as a physical process connects to a long tradition
straddling the fields of physics and computer science. Thermodynamic aspects
of computation have been considered, among others, in the works of Bennett
[4], Feynman [14], Landauer [21]. One of the main achievements is arguably the
realisation that computation, per se, does not require any energy dissipation [4].
Key to this result is the demonstration that any computation can in principle
be embedded into a reversible computation, and these can be carried out at no
energy cost.
However there are application areas of great practical importance that are
concerned with intrinsically irreversible computations, confidentiality being the
foremost among them. In this work the intuitive connection between the physics
of irreversible computation and confidentiality is made formal. Instrumental to
establishing a formal connection is recent research in confidentiality under the
name of Quantitative Information Flow [11, 12] where the leakage of confidential
information is quantified in terms of information theory.
Quantitative Information Flow (QIF) was introduced to address over-restrictive
definitions of confidentiality. Ideally, a secure system would disclose no
confidential information at all. In practice, no usable system has this desirable “zero leakage”, or non-interference, property. Any password-protected
system leaks some information to an attacker even by refusing access: the attacker learns that the attempted password is not the correct one.
Because of the unavoidability of leakage QIF provides an alternative approach to confidentiality: it aims to measure the leakage and so to provide support for a risk assessment of the security threat. Measuring leakage is achieved
by measuring the information about the secret data an attacker can infer by
observing the system. For example, attempting to guess a PIN at random at an ATM will generate two possible observations: (1) the PIN is
accepted (probability of acceptance 0.0001), (2) the PIN is rejected (probability
of rejection 0.9999). A standard measure of information is Shannon’s entropy.
When evaluated on these probabilities it allows the inference that the attacker
has gained 0.00147 out of the total 13.2 bits of information about the secret
PIN in this attack: an insignificant leak unless the attack is repeated multiple
times. More generally, given an initial distribution on the confidential data and
a deterministic program P whose sole input is the confidential data, the leakage
is defined as the Shannon entropy of the probability distribution associated to
possible observable outputs. This definition is consistent with the naive “zero
leakage” definition: it is easy to prove that a program leaks no confidential
information if and only if its output has zero entropy [11].
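The ATM figures quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name shannon_entropy is ours, and we assume a four-digit PIN drawn uniformly from 10,000 values, which is consistent with the stated acceptance probability of 0.0001.

```python
from math import log2

def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Assumption: a four-digit PIN, uniformly distributed over 10,000 values.
n_pins = 10_000
p_accept = 1 / n_pins

# Leakage of one guess: entropy of the observable (accepted / rejected).
leak_bits = shannon_entropy([p_accept, 1 - p_accept])
# Total information in the secret: entropy of the uniform PIN.
secret_bits = log2(n_pins)

print(f"leakage of one guess: {leak_bits:.5f} bits")
print(f"entropy of the secret: {secret_bits:.2f} bits")
```

The leakage is the entropy of the two-outcome observation, not of the PIN itself, which is why it is so small compared with the entropy of the secret.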
Quantitative Information Flow has been applied among others to side channel attacks analysis [17, 18, 9], to measure confidentiality leaks in the Linux
Kernel [20], to database security analysis [2], to the analysis of anonymity protocols [6, 7], to side channels leaks in web applications [32] and the avoidance
of fault masking [10].
1.1. Contributions
In the first part of the paper (Sections 4 and 5) we consider quasi-static
transformations. This setting allows us to prove fundamental bounds valid for
a generic computational system. These bounds were first published in [23].
In Section 4 we start from the second principle of thermodynamics and its
application to the erasure of information (Landauer principle). Using a classical
thermodynamic argument based on reversibility, we prove a lower bound on
dissipation for secure computation. Up to the multiplicative factor kB T ln 2,
this bound is given by the notion of security W from Quantitative Information
Flow (equation 18 and proposition 3).
Section 4.4 demonstrates that net energy expenditure is not required if we
allow probabilistic operators into the language. In that case work can actually be
extracted by the system (inequality 21), although erasure remains an irreversible
operation. Again the energy is bounded by the quantity W which is in this case
non-positive.
Section 5 investigates the thermodynamics of Smith’s definition of leakage
and proves that remaining uncertainty is in general a lower bound on W (proposition 6). The bound becomes an equality if and only if the remaining uncertainty coincides with the difference between the work needed to reset the input
and output registers when they are in their maximally disordered state (proposition 4 and 7). To the best of our knowledge this is the first connection between
guessability and thermodynamics. Finally both measures are order related to
the magnitude of dissipation in section 5.1.
In order to study timing channels the second part of the paper (Sections
6, 7) deals with a class of computers with low energy requirements and governed by a specific dynamics, namely Brownian computers. Following classic
arguments from Feynman and Bennett we investigate energy dissipation in such
computers and the relation between energy and speed of computation (section
6.3). We then apply these arguments to timing channels (section 7) and show
the existence of length-of-computation timing channels (section 7.1) as well as
of a novel class of entropy timing channels (section 7.2). Trade-offs between
these channels are illustrated in section 7.3.
2. Background
2.1. Notations
Given probabilities µ1 , . . . , µN , the Shannon entropy of the distribution is
defined as
\[ H(\mu_1, \dots, \mu_N) = -\sum_{1 \le i \le N} \mu_i \log(\mu_i) \tag{1} \]
where log denotes the logarithm in base 2. The entropy of a random variable
X is
\[ H(X) = -\sum_{x \in X} \mu(X = x) \log(\mu(X = x)). \]
Given two random variables X, Y the conditional entropy H(X|Y ) is defined
as H(X, Y ) − H(Y ) where H(X, Y ) is the entropy of the joint random variable
(X, Y ). H(X|Y ) measures the uncertainty on X knowing Y .
Mutual information is defined as I(X; Y ) = H(X)−H(X|Y ) and conditional
mutual information as I(X; Y |Z) = H(X|Z) − H(X|Y, Z). Mutual information
is a measure of the correlation between X and Y , i.e. it quantifies how much
information they share.
2.2. Basic definitions and properties
A general definition of leakage assumes an information processing system
having inputs h, l where h are the confidential inputs and l are the public inputs
and a set of observables P probabilistically related to the inputs. The leakage of
confidential data h to the observables P given public input l is then defined as the
difference in the uncertainty about the secret before and after the observations
and is measured using conditional mutual information [11]:
\[ I(h; P \mid l) = H(h \mid l) - H(h \mid P, l) \tag{2} \]
In simple terms this is what the attacker has learned about the secret by observing the system.
In the case of a deterministic program where the sole input is h and observations P are the outputs, definition (2) reduces to mutual information I(h; P )
and the following holds:
\begin{align}
I(h; P) &= H(h) - H(h \mid P) \tag{3} \\
        &= H(P) - H(P \mid h) \tag{4} \\
        &= H(P) \tag{5}
\end{align}
where the second equality holds because mutual information is symmetric and
the third equality holds because the outputs of a program only depend on h,
hence H(P |h) = 0.
Notice that as discussed in [24] (resp. [20]) the restriction to h being the
sole input is not a limitation in the theory (resp. in practice). In this work we
will assume that the final memory state of the system is observable.
The key quantity in this paper is W = H(h) − H(P ).
Proposition 1. For deterministic programs with sole input h the following are
equivalent:
1. W = H(h) − H(P )
2. W = H(h) − I(h; P )
3. W = H(h|P )
The equivalences follow easily from equations 3, 4 and 5.
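As an illustration of Proposition 1, the three expressions for W can be evaluated on a small example. The sketch below assumes a uniform two-bit secret and the program l = h % 2; the function and variable names are ours, chosen for illustration.

```python
from collections import defaultdict
from math import log2, isclose

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def program(h):
    """Deterministic program whose sole input is the secret h."""
    return h % 2

# Assumption: a uniformly distributed two-bit secret.
mu_h = {h: 0.25 for h in range(4)}

# Distribution of the observable output P.
mu_P = defaultdict(float)
for h, p in mu_h.items():
    mu_P[program(h)] += p

# Joint distribution of (h, P); P is a function of h.
mu_hP = {(h, program(h)): p for h, p in mu_h.items()}

H_h, H_P, H_hP = entropy(mu_h), entropy(mu_P), entropy(mu_hP)
I_hP = H_h + H_P - H_hP   # mutual information I(h; P)

W1 = H_h - H_P            # W = H(h) - H(P)
W2 = H_h - I_hP           # W = H(h) - I(h; P)
W3 = H_hP - H_P           # W = H(h|P) = H(h, P) - H(P)

print(W1, W2, W3)
```

All three values coincide: one of the two secret bits is leaked by the parity and one bit remains protected.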
In its formulation W = H(h|P ) the quantity W is known in the Quantitative
Information Flow community as security [24, 11]. The reason why the first
definition of W is the chosen one is related to its generality beyond deterministic
systems and will be clarified in section 4.4.
Consider now the equivalent formulation
W = H(h) − I(h; P ).
In words this says that W is the amount of secret that has not been leaked by
the program, i.e. the secret protected by the program.
A contribution of this paper is to show that W ln(2)kB T represents also the
thermodynamic work to be done on the system to protect the confidential data
(equation 18). In other words W ln(2)kB T is the minimum amount of energy
any system implementing that program must dissipate in order to protect that
confidential data.
To understand the basic properties of W it helps to introduce equivalence
relations induced by the observations: we say that two confidential values are
equivalent if the program will produce the same output when given those inputs
[24, 20].
Proposition 2. The maximum and minimum of W are as follows:
1. W is maximal for the distribution on the secret which is uniform on the
largest equivalence class and 0 on all other points. This maximal value is
log(|e|) where e is the largest equivalence class of confidential values and
|e| its cardinality.
2. W is minimal for the distribution on the secret which is 0 everywhere
apart for only one point in each equivalence class (and is uniform on these
points). The minimal value is 0.
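The two extremes in Proposition 2 can be illustrated on the parity program over a two-bit secret, whose equivalence classes are {0, 2} and {1, 3}. This is a sketch under our own choice of program and distributions; the helper security is a hypothetical name.

```python
from collections import defaultdict
from math import log2, isclose

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def security(mu_h, program):
    """W = H(h) - H(P) for a deterministic program with sole input h."""
    mu_P = defaultdict(float)
    for h, p in mu_h.items():
        mu_P[program(h)] += p
    return entropy(mu_h.values()) - entropy(mu_P.values())

def parity(h):
    return h % 2

# Uniform on one largest equivalence class, zero elsewhere: W = log|e| = 1 bit.
w_max = security({0: 0.5, 2: 0.5}, parity)

# Exactly one point per equivalence class, uniform on these points: W = 0.
w_min = security({0: 0.5, 1: 0.5}, parity)

print(w_max, w_min)
```

In the first case the output is constant, so nothing is leaked; in the second the output identifies the secret, so nothing is protected.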
As a sanity check consider W = H(h|P ) = 0: that means that an attacker
will have no uncertainty about the secret given the observations, hence everything has been leaked, i.e. no work has been done to protect the secret. This
case also covers the situation where the program has no confidential input: these
are computations with no security constraints and so reversible computations
in the sense of Bennett [4].
At the other end W is maximal when H(h|P ) = H(h), that is when the
observations and the secret are independent - hence all bits of the secret have
been protected. A particular case of this is the computation of a constant
function.
3. The thermodynamics of computation
While a thorough review of the physics of computation is beyond the scope
of this work, we believe that an overview of the salient points of this discussion
can actually help the reader locate our contribution in the wider subject area.
3.1. Modelling computation
Most of the key conclusions in the thermodynamics of computation can be
arrived at through the analysis of relatively simple physical models of computation involving idealised objects such as perfectly rigid spheres, a single molecule
of a perfect gas, or quantum systems with few degrees of freedom (in the case of
quantum computers). A useful physical model of computation is given by a set
of (idealised) billiard balls arranged in a particular way, that are set into motion starting from an input condition and will eventually evolve through a series
of perfectly elastic collisions to a configuration representing the output state.
For the sake of this argument, we can assume that the presence of a ball in a
particular position at the beginning (or respectively the end) of a computation
represents a 1 state, while its absence represents a 0 state. A suitably complex
system of billiard balls can in principle carry out any computation [16, 14], with
some qualifications that will become clear below.
3.2. Time-symmetry and reversibility
The most important feature of the billiard-ball model is its reversibility, that
directly derives from the symmetry of the laws of mechanics with respect to time.
Concretely, the total energy of the balls is conserved during a computation;
it is then sufficient to reflect the balls backwards into the computer at the
end of computation for this to be undone as the balls return to their starting
position. Since the position of the balls encodes the state of the computer,
this implies that all the functions computed are logically reversible — that
is, one-to-one — and they are computed at zero energy cost. More generally,
the time symmetry of physical laws and the existence of universal reversible
gates such as the controlled-XOR gate introduced by Fredkin and Toffoli [16]
imply that all logically reversible functions can be computed reversibly without
energy expenditure. Since a non-injective function y = f (x) can easily be
made invertible by enriching the output with the input (f˜(x) = (x, f (x))), all
computations can in principle be done reversibly and without any minimum
energy expenditure; however, there are evidently cases (security being one of
them) where it is clearly not desirable to do so.
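The embedding of a non-injective function into an invertible one can be sketched as follows. The choice of logical AND as the function f is ours, purely for illustration.

```python
from itertools import product

def f(x):
    """A non-injective function on two bits: logical AND."""
    a, b = x
    return a & b

def f_tilde(x):
    """Reversible embedding: the output is enriched with the input."""
    return (x, f(x))

inputs = list(product((0, 1), repeat=2))

# f maps four inputs onto two outputs, so it cannot be undone.
f_is_injective = len({f(x) for x in inputs}) == len(inputs)

# f_tilde maps four inputs onto four distinct outputs, so it can.
f_tilde_is_injective = len({f_tilde(x) for x in inputs}) == len(inputs)

print(f_is_injective, f_tilde_is_injective)
```

The price of invertibility is that the input survives in the output, which is exactly what a confidential computation cannot allow.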
3.3. The Second Principle and irreversibility
Irreversibility arises in Thermodynamics from the statistical study of a high
number of copies of the same system. This gives rise to one of the most powerful
and all-encompassing concepts of a time arrow, encoded in the Second Principle. The Second Principle associates to each system a state function S known
as entropy, and states that when the system undergoes a transformation, the
following inequality holds:
\[ \Delta S \ge \frac{\delta Q}{T} \tag{6} \]
where δQ is the heat absorbed by the system at temperature T and the equality
sign holds for reversible transformations only. Since entropy is a function of
the state of the system, this means that an isolated system (that cannot dump
heat into the environment) will tend to evolve irreversibly towards states with
higher entropy. For a computer, that is generally modelled as being in equilibrium with a single heat source at temperature T (the environment), Equation
6 implies that if the machine is returned to its initial state (∆S = 0) after an
irreversible transformation, a certain amount of energy is dissipated as heat into
the environment during the process:
\[ 0 = \Delta S > \oint \frac{\delta Q}{T} = \frac{1}{T} \oint \delta Q = \frac{\Delta Q}{T} \tag{7} \]
as the inequality sign will strictly hold in (6) during at least part of the transformation. Since the computer is reverted to its initial state, the energy dispersed
as heat must be compensated by doing an equal amount of work on the system.
In terms of the microstates of the system, i.e. of a complete specification of
all its degrees of freedom, entropy can be written as
\[ S = -k_B \sum_i p_i \log p_i, \tag{8} \]
which bears a striking analogy to the Shannon entropy H in Equation 1. Indeed, since logical states in a computation are in one-to-one correspondence
with the physical states of the computer, Equation 6 provides a direct way to
relate changes in the information content of a register of the computer to energy consumption. In particular, any reduction of the information content of
the register (for instance, a reset operation) will result in a negative ∆S and
thus require heat to be dispersed into the environment - and work to be done
on the system if conservation of energy is to hold. The quantitative relation
between the erasure of information and dissipation is beautifully brought out
by a computational take on a puzzling conceptual experiment, i.e. Maxwell’s
demon. We will briefly review this argument in the next section.
3.4. From Maxwell’s demon to the Landauer principle
Another consequence of Equation 6 is that, since ∆S = 0 over a transformation that ultimately reverts the system to its initial state, it is impossible to
build a thermal machine that has as its only effect the transformation of energy
from a single source of heat into work — in order to balance the entropy cheque,
some heat will need to be dumped into a reservoir at lower temperature. This
is known as the Kelvin statement of the Second Principle, and its hypothetical
violation as a Perpetual Motion of the second kind.
An intriguing conceptual attempt on the Kelvin statement was produced by
Maxwell with his demon. Maxwell considered a simple system consisting of a
perfect gas contained in two chambers communicating via a trap door in the
partition. The trap door is operated by a hypothetical agent (the demon) that,
by cleverly opening and closing it, is able to group the fastest molecules into
one side of the partition, thus creating a pressure difference that can then be
used to produce work for free. As the process can be repeated at will, this
would be a perpetual motion of the second kind. Various attempts have been
made at exorcising the demon, notably focusing on the cost to the demon of
measuring the position and speed of the particle prior to making a decision
on opening the trap door, or on the temperature and thermal agitation of the
demon itself. However, the modern consensus is that measurements can be
performed at arbitrarily low cost [22]. Rather, the demon itself is viewed as a
computing machine that must have at least one bit of memory — in order for
it to know whether it should open the trap-door to let the particle through or
not. Safeguarding the Second Principle requires that the cost for the demon of
resetting its memory to prepare it for another run is precisely kB T ln 2.
This compelling argument about the interaction between information and
physical entropy is given by Bennett and illustrated by the simplified version of
Maxwell’s argument shown in figure 1. The system consists of a single particle
in a box. In the initial state (1) the observer (demon) is completely ignorant of
the whereabouts of the particle (see right side of figure). Next (2) the observer,
at cost zero, inserts a partition separating the box in two chambers. He now
knows for sure that the particle is on the left hand side or on the right hand side
but he doesn’t know which one. In the next step (3) he observes the box and
so acquires information on the position of the particle (left or right side); this
Figure 1: Interaction between information and physical entropy. (Panel labels: Physical system; Observer information; Left; Right.)
observation requires no energy expenditure. Using this knowledge and again
at cost zero (4) he can then push a piston in the empty chamber up to the
partition. Next (5), again at cost zero, he can remove the partition and let
the pressure exerted by the particle push the piston: the work generated by
the system by pushing the piston is kB T ln 2. His knowledge is now that the
particle was on the left hand side (or the right hand side). Eventually the
expansion process terminates (6) and the box is seemingly back to the initial
state. So far kB T ln 2 energy has been extracted from the system. The only
difference between the initial state of the system (1) and this sixth step of the
process is that the observer still has information about the origin of the particle
(1 bit of information). Hence resetting the observer’s memory to the initial
state of complete ignorance (7) ≡ (1) must require minimum work kB T ln 2 to
compensate for the work extracted in step (5).
This is an instance of the celebrated Landauer principle [21]:
The minimum cost for the cancellation of one bit of information is
kB T ln 2.
This principle has very recently also been experimentally demonstrated [31]. As
argued by Bennett [4], Feynman [14], this intrinsic cost of cancelling information
is the key consideration in the thermodynamics of computation.
4. Physical model of secure computation
4.1. A simple two state register
In this section, we derive a few basic results on the energetic cost of erasing
information using a simple and rather idealised physical model. While having a
specific model is useful to understand the type of reasoning involved, our final
results do not depend on the model and have quite general applicability.
In its simplest version, our model of a one-bit system consists of one molecule
of a perfect gas contained inside a box divided in two chambers by a partition.
The two chambers are labelled with the states 0 and 1; the system is in thermal
equilibrium with a heat reservoir at temperature T . If the particle has equal
probability of being in either chamber, we can reset the system by removing
the central partition and use an (idealised) piston to compress the gas into the
chamber marked 0. For a perfect gas, PV = nkB T , where n is the number of
molecules, V the volume and P the pressure. We assume that the two chambers
have unit volume. In this case, the work done by the system during compression
is
\[ \int_2^1 P \, dV = k_B T \int_2^1 \frac{1}{V} \, dV = -(\ln 2) k_B T \tag{9} \]
that agrees with Landauer’s principle.
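To give a sense of scale, the compression integral can be evaluated numerically. The sketch below assumes a temperature of 300 K, which is our choice and does not appear in the text; the midpoint-rule helper work_by_gas is likewise an illustrative name.

```python
from math import log

K_B = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)
T = 300.0            # assumption: room temperature, in kelvin

def work_by_gas(v_start, v_end, steps=100_000):
    """Midpoint-rule integral of P dV = (k_B T / V) dV for one molecule."""
    dv = (v_end - v_start) / steps
    return sum(K_B * T / (v_start + (i + 0.5) * dv) * dv for i in range(steps))

# Compression from volume 2 to volume 1: the result is negative,
# i.e. work has to be done on the gas.
w_compress = work_by_gas(2.0, 1.0)

# Closed form of the bound for one bit.
landauer = K_B * T * log(2)

print(f"numerical integral: {w_compress:.4e} J")
print(f"k_B T ln 2:         {landauer:.4e} J")
```

At this assumed temperature the cost of resetting one bit is of the order of 10^-21 J, many orders of magnitude below the dissipation of conventional hardware.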
A more interesting case is obtained if we assume that the particle is found
in the two chambers with different probabilities µ1 and µ2 (we assume without
loss of generality that µ1 > µ2 ). It is useful to consider an ensemble of identical
boxes, each containing a single molecule of an ideal gas. The molecule is in
the left half of the box in proportion µ1 of the boxes and in the right half
in proportion µ2 . Assume that the partition of each box is actually a piston
initially placed in the centre position, and that all the shafts are joined together.
Since more particles will hit one of the pistons on the left hand side than on the
right hand side, the pistons will move to the right until the pressure on both
sides is equalised. This expansion can be used to extract work from the system,
leaving it in the maximally disordered state; the energy thus obtained can then
be offset against the work needed for a reset. Figure 2 illustrates the idea for a
system consisting of two states with probabilities 1/3, 2/3.
Again assuming that each chamber initially has unit volume, and averaging
across the ensemble, we have Pi = µi kB T , with P1 being the pressure in the left
chamber and P2 the pressure in the right chamber. The volumes of the chambers
at the end of the expansion obey µ1 /V1 = µ2 /V2 , which is the condition for the
pressure to equilibrate. Since V1 + V2 = 2, we have V1 = 2µ1 at the end of the
expansion.
Figure 2: A two-state system with probabilities 1/3 and 2/3 before (A) and after (B) the expansion.
Therefore, the work done by the system during expansion is:
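\begin{align}
\frac{W_{exp}}{k_B T} &= \frac{1}{k_B T} \int_1^{2\mu_1} (P_1 - P_2) \, dV = \int_1^{2\mu_1} \frac{\mu_1}{V} \, dV - \int_1^{2\mu_1} \frac{\mu_2}{2 - V} \, dV \notag \\
&= \mu_1 \ln 2\mu_1 - \mu_2 \ln \frac{2 - 1}{2 - 2\mu_1} \notag \\
&= \mu_1 \ln 2\mu_1 + \mu_2 \ln 2\mu_2 \notag \\
&= \mu_1 \ln 2 + \mu_1 \ln \mu_1 + \mu_2 \ln 2 + \mu_2 \ln \mu_2 \notag \\
&= \ln 2 - H(\mu_1, \mu_2) \ln 2 \tag{10}
\end{align}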
where for convenience we have divided both sides by kB T .
After the expansion we can reposition at no cost the pistons at one end of
the combined chambers and we are left to reset the same maximally disordered
system we considered above, which according to Equation 9 can be done at a
cost (ln 2)kB T . We conclude that the work needed to reset a two–state system
with probabilities µ1 , µ2 is
(ln 2)kB T − Wexp = H(µ1 , µ2 )kB T ln 2.
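The two-state result can be cross-checked numerically by integrating the pressure difference directly. The sketch below works in units of kB T and uses the probabilities 2/3 and 1/3 of Figure 2; the midpoint-rule integrator and all names are ours.

```python
from math import log, log2, isclose

def h2(mu1):
    """Shannon entropy in bits of the distribution (mu1, 1 - mu1)."""
    return -sum(p * log2(p) for p in (mu1, 1 - mu1) if p > 0)

def w_exp_numeric(mu1, steps=200_000):
    """Midpoint-rule integral of (P1 - P2) dV / (k_B T) from V = 1 to V = 2*mu1."""
    mu2 = 1 - mu1
    dv = (2 * mu1 - 1) / steps
    total = 0.0
    for i in range(steps):
        v = 1 + (i + 0.5) * dv                     # volume of the left chamber
        total += (mu1 / v - mu2 / (2 - v)) * dv    # pressure difference times dV
    return total

mu1 = 2 / 3                                  # probabilities as in Figure 2
w_exp = w_exp_numeric(mu1)                   # work extracted by the expansion
w_exp_closed = log(2) - h2(mu1) * log(2)     # closed form, equation 10
w_reset = log(2) - w_exp                     # net reset work, in units of k_B T

print(w_exp, w_exp_closed, w_reset)
```

The net reset work agrees with H(µ1, µ2) ln 2, i.e. the Shannon entropy of the register converted from bits to natural units.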
4.2. The multiple–state case
We now introduce a generalisation of the above perfect gas model to an
N −state system, able to represent N distinct logical states with probabilities
µ1 , . . . , µN . We will use this conceptual model to compute the work required to
reset the representation of an arbitrary distribution of logical states.
Our generalised physical model consists of a box with N chambers, each
initially of unit volume. The partitions of the chambers are pistons attached
to separate shafts that can be actuated independently. Figure 3 illustrates the
idea. The box contains exactly one molecule of ideal gas, that is found in the
i-th chamber with probability µi (again, it is useful to think of an ensemble of
such boxes in a fraction µi of which the particle is found in chamber i).
We again assume, for convenience, that the chambers are arranged in order
of decreasing probability of containing the particle (the general case can be
treated in a similar way by letting the pistons expand in different predetermined
directions). In order to reset the system we start by performing a series of
reversible expansions between adjacent cells, followed by removing the partitions
between cells that have been brought into equilibrium. Specifically, we begin
by expanding the first (leftmost) chamber against the second (storing the work
done in the process somewhere). We then remove the partition between the first
two chambers and expand the resulting joint volume against the third chamber.
This process is iterated until the system is brought to its maximally disordered
state and equilibrium is reached; energy produced by all expansions can then
be used to help reset the system to its initial state.
We shall now work out a generic stage in this expansion, namely the expansion of the cells numbered 1 through n − 1 (that we suppose have already been
merged) against cell n.
Let the cumulative probability of the particle being in cells 1 through n be
\( M_n = \sum_{i=1}^{n} \mu_i \). Hence the pressure in the first n−1 chambers after the partitions
between them have been removed is
Pn−1 = Mn−1 kB T /(n − 1),
n − 1 being the volume.
Let Pn be the pressure of the next individual chamber, i.e.
Pn = µn kB T.
After the expansion, the volume of the n-th cell will be a fraction µn /Mn of the
total volume of the n cells, which we assume is n.
Figure 3: Modelling a system with N states. The average pressure Pi in each chamber is
proportional to the probability µi of finding the particle in that chamber.
Thus work done during the expansion, similarly to equation 10 is
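\begin{align}
\frac{W_n}{k_B T} &= \frac{1}{k_B T} \int_{n-1}^{n(1 - \mu_n / M_n)} (P_{n-1} - P_n) \, dV \notag \\
&= \int_{n-1}^{n(1 - \mu_n / M_n)} \left( \frac{M_{n-1}}{V} - \frac{\mu_n}{n - V} \right) dV \notag \\
&= M_{n-1} \ln \frac{n(1 - \mu_n / M_n)}{n - 1} + \mu_n \ln \frac{n \mu_n}{M_n} \notag \\
&= M_{n-1} \ln \left( \frac{n}{n - 1} \, \frac{M_{n-1}}{M_n} \right) + \mu_n \ln \frac{n \mu_n}{M_n} \notag \\
&= M_n \ln \frac{n}{M_n} - M_{n-1} \ln \frac{n - 1}{M_{n-1}} + \mu_n \ln \mu_n \tag{11}
\end{align}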
The work extracted from the system during the series of expansions can now
be obtained as the sum of the contribution of all the pairwise expansions:
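\[ \frac{W_{exp}}{k_B T} = \sum_{n=1}^{N} \frac{W_n}{k_B T} = \sum_{n=1}^{N} \left[ M_n \ln \frac{n}{M_n} - M_{n-1} \ln \frac{n - 1}{M_{n-1}} \right] + \sum_{n=1}^{N} \mu_n \ln \mu_n. \tag{12} \]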
n=1
Noticing that the term in brackets yields a telescopic sum and that
\( M_N = \sum_{i=1}^{N} \mu_i = 1 \) we obtain
\[ \frac{W_{exp}}{k_B T} = M_N \ln \frac{N}{M_N} + \sum_{n=1}^{N} \mu_n \ln \mu_n = \ln N + \sum_{n=1}^{N} \mu_n \ln \mu_n. \tag{13} \]
This represents all the work extracted from the system during the expansion,
that leaves it in the maximally disordered state — i.e. with the particle equally
likely to be in any of the chambers. At this point, resetting the system to the
initial state requires the following work:
\[ \frac{W_{comp}}{k_B T} = -\int_N^1 \frac{1}{V} \, dV = \ln N \tag{14} \]
Thus the net work done on the system to reset it from an arbitrary distribution
of states µ1 , µ2 , . . . , µN is
\begin{align}
W_{reset} &= W_{comp} - W_{exp} \tag{15} \\
&= -\sum_{n=1}^{N} \mu_n \ln \mu_n \, k_B T \tag{16} \\
&= H(\mu_1, \dots, \mu_N) \, k_B T (\ln 2) \tag{17}
\end{align}
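The staged expansion of equations 11 to 17 can be reproduced numerically. The sketch below works in units of kB T and uses a four-state distribution of our own choosing, listed in decreasing order as the derivation assumes; stage_work is an illustrative name.

```python
from math import log, log2, isclose

def stage_work(n, mu, M):
    """W_n / (k_B T) from equation 11, where M[n] is the cumulative probability."""
    gained = M[n] * log(n / M[n])
    # For n = 1 there are no merged cells yet, so the M_0 term is taken as zero.
    previous = M[n - 1] * log((n - 1) / M[n - 1]) if n > 1 else 0.0
    return gained - previous + mu[n - 1] * log(mu[n - 1])

# Assumption: an example distribution with strictly positive probabilities.
mu = [0.5, 0.25, 0.125, 0.125]
N = len(mu)

# Cumulative probabilities M_0 = 0, M_1, ..., M_N.
M = [0.0]
for p in mu:
    M.append(M[-1] + p)

w_exp = sum(stage_work(n, mu, M) for n in range(1, N + 1))   # equation 12
w_comp = log(N)                                              # equation 14
w_reset = w_comp - w_exp                                     # equation 15

H_bits = -sum(p * log2(p) for p in mu)

print(w_exp, w_reset, H_bits * log(2))
```

The telescoping of the bracketed terms shows up as the equality between the summed stages and ln N plus the sum of µn ln µn, and the net reset work equals the Shannon entropy times ln 2.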
4.3. Universal factorisation of secure computations
Bennett [4], Fredkin and Toffoli [16] demonstrated that computations can
in principle be performed reversibly, hence there is no need for dissipation in
the computational process. Consistently with Bennett’s ideas we factor a secure
computation P̃ into a reversible computation R and a resetting step Π.
We build the following commutative diagram:
\[ (h, 0) \xrightarrow{\;R\;} (S(h), P(h)) \xrightarrow{\;\Pi\;} (0, P(h)), \qquad \tilde{P} : (h, 0) \mapsto (0, P(h)) \]
i.e. P̃ = Π ◦ R (we will, in the following, identify P̃ with P where no confusion
can arise). In the above diagram the extra register S(h) holds the history, i.e. the
information required for reversing the calculation R. After R terminates, Π
enforces security by deleting the history S(h).
Figure 4 illustrates this process for the program l=h%2; with h being a
two–bit secret.
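The factorisation for l = h % 2 can be sketched in code. Here we assume a one-bit history S(h) = h // 2 (the high bit of the secret); this is one admissible choice among many and not necessarily the one drawn in Figure 4. The function names are ours.

```python
def R(state):
    """Reversible step (h, 0) -> (S(h), P(h)), assuming S(h) = h // 2."""
    h, scratch = state
    assert scratch == 0, "the output register must start cleared"
    return (h // 2, h % 2)

def R_inverse(state):
    """Undo R: the history and the output together determine h."""
    s, p = state
    return (2 * s + p, 0)

def Pi(state):
    """Irreversible reset of the history register."""
    s, p = state
    return (0, p)

def P_tilde(state):
    """Secure computation: reversible step followed by the reset."""
    return Pi(R(state))

inputs = [(h, 0) for h in range(4)]   # two-bit secret, cleared output register
r_outputs = [R(x) for x in inputs]
final = [P_tilde(x) for x in inputs]

print(r_outputs)
print(final)
```

R is a bijection on the states it visits, so it can in principle run without dissipation; only Pi merges distinct states, and that is where the energy cost arises.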
Note that there is a wide choice in the implementation of S, and thus of the
two programs R and Π. An obvious choice is for S to hold a copy of the input
(which is generally not minimalistic, as our example in Figure 4 shows). However, as we will see the particular implementation of S does not affect the energy
cost of the computation. Indeed, since S is needed to disambiguate between input states hi , hj leading to the same program output Pk the only requirement
on S is that for each equivalence class in the observational equivalence S is
one-to-one. Thus given a program outcome Pk = P (hi ), the probability of the
associated history S(hi ) is equal to the probability of the input hi , i.e. :
µ(S(hi )|Pk ) = µ(hi |Pk ).
Figure 4: Secure computation of l = h%2 on a two-bit secret. The shaded history register
S(h) is reset by Π in the last step. (Diagram labels: h, R, S(h), P(h), Π, P(h).)
Combining this with equation 15 it then follows that the cost of resetting S,
averaged over all program outputs, is
\[ \sum_k \mu(P_k) \left[ \sum_{h \in P^{-1}(P_k)} \mu(h \mid P_k) \ln \frac{1}{\mu(h \mid P_k)} \right] k_B T = H(h \mid P) \, k_B T \ln 2 = W k_B T \ln 2 \tag{18} \]
that is the energy equivalent of the security of the program.
This result is universal i.e.
Proposition 3. W kB T ln 2 is a lower bound on the energy dissipated by any
system implementing P̃ .
To prove it, suppose an implementation P̃′ dissipates less than W kB T ln 2; then
P̃′ ◦ R⁻¹ is effectively an implementation of the reset operation Π that violates
Landauer’s principle.
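The averaged reset cost of equation 18 can be evaluated directly from the conditional distribution of the secret given the output. The sketch below works in units of kB T and assumes a non-uniform two-bit secret of our own choosing with the program l = h % 2; reset_cost is an illustrative name.

```python
from collections import defaultdict
from math import log, log2, isclose

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def output_distribution(mu_h, program):
    mu_P = defaultdict(float)
    for h, p in mu_h.items():
        mu_P[program(h)] += p
    return mu_P

def reset_cost(mu_h, program):
    """Left-hand side of equation 18, divided by k_B T."""
    mu_P = output_distribution(mu_h, program)
    cost = 0.0
    for h, p in mu_h.items():
        if p > 0:
            p_out = mu_P[program(h)]     # mu(P_k) for the class containing h
            cond = p / p_out             # mu(h | P_k)
            cost += p_out * cond * log(1 / cond)
    return cost

def parity(h):
    return h % 2

# Assumption: an example non-uniform distribution on a two-bit secret.
mu_h = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}

cost = reset_cost(mu_h, parity)
W = entropy(mu_h.values()) - entropy(output_distribution(mu_h, parity).values())

print(cost, W * log(2))
```

The two printed values agree, so the cost of resetting the history is W ln 2 in units of kB T regardless of how S is encoded within each equivalence class.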
4.4. Erasure vs resetting: extracting work from the system
Security imposes a lower bound on dissipation only in the case of deterministic computation, in which the system is reset to a fixed state. An alternative
process for protecting confidential data consists in overwriting the information
to be kept confidential with randomly generated bits; we call such cancellation erasure by randomisation (in this section simply erasure). Considering the
graph in Section 4.3, we replace the reset operator Π with an erasure operator
E:
\[ (S(h), P(h)) \overset{E}{\longmapsto} (r, P(h)) \tag{19} \]
where r is a random number. Alternatively, the erasing program is given by P̃e =
E ◦ R, where R performs the computation reversibly and E assigns random bits
to the register S(h) containing confidential data. Notice that the deterministic
model of computation has now been extended with a probabilistic operation E
(we comment on this below).
We now have
\[ H(\tilde{P}_e) = H(P) + \log(|S(h)|) \tag{20} \]
where the second term is the entropy of generating a random string of size |S(h)|
(i.e. |S| is the length in bits of register S). Therefore
\begin{align}
W &= H(h) - H(\tilde{P}_e) \notag \\
&= H(h) - H(P) - \log(|S(h)|) \le 0 \tag{21}
\end{align}
The second equality is illustrated by the commutative diagram in Figure 5: we
know from section 4.3 that Π has cost H(h) − H(P ) and Π′ has, by the Landauer
principle, cost log(|S(h)|). Inequality 21 then follows because S(h) and P (h)
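Figure 5: Relation between erasure and resetting. (Commutative diagram: R maps (h, 0) to (S(h), P(h)); E randomises the history register and Π′ resets the randomised register, giving (0, P(h)); Π maps (S(h), P(h)) directly to (0, P(h)); P̃e = E ◦ R and P̃ = Π ◦ R.)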
together have the same information content as h, and log(|S(h)|) is an upper
bound on the information content of S(h).
If the inequality is strict then W is negative, meaning that work can be
extracted from the system; such work results from the randomisation and consequent increase in entropy of the history register S (note that the length of S
can be arbitrary). However, W will be zero if the computation R already leaves
S in a maximally disordered state — in which case further randomisation does
not allow us to extract any work from the register. It should also be noted that,
according to the Landauer principle, work extracted from the system during
erasure will have to be paid back should one decide to revert the system to its
original state (for instance to allow further use).
An important remark about erasure is that the introduction of probabilistic
operators like erasure means that the leakage is no longer correctly described by
the entropy of the observables H(P̃e ). In fact, the term log(|S(h)|) in equation 20
should not count towards leakage as it corresponds to disorder injected into the
system by the erasure operator. For this reason the general definition of leakage
is given in terms of mutual information (equation 2); in fact
$$I(P_e; h) = H(P_e) - H(P_e|h) = \big(H(P) + \log(|S(h)|)\big) - \log(|S(h)|) = H(P)$$
Here H(Pe |h) = log(|S(h)|) because the output of the program P is known
when h is given; hence the only uncertainty comes from the randomisation of
S.
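The bookkeeping above can be checked by brute force on a toy instance. The following is a minimal sketch, not the authors' construction: it assumes a uniformly distributed secret, an r-bit register overwritten uniformly at random (so that the randomisation term of equation 20 equals r bits), and a hypothetical program P(h) = h mod 4. The function names are illustrative only.

```python
import math
from collections import Counter
from itertools import product

def entropy_bits(counts):
    """Shannon entropy (in bits) of the distribution given by a Counter of counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def erasure_quantities(program, secret_bits, register_bits):
    """Enumerate a uniform secret h and a uniform random register value rho.

    Returns (H(Pe), H(Pe|h), I(Pe;h)) where the observable is (rho, P(h)).
    """
    joint = Counter()     # counts of (h, observable) pairs
    observed = Counter()  # counts of the observable (rho, P(h)) alone
    secret = Counter()    # counts of h alone
    for h, rho in product(range(2 ** secret_bits), range(2 ** register_bits)):
        obs = (rho, program(h))
        joint[(h, obs)] += 1
        observed[obs] += 1
        secret[h] += 1
    h_obs = entropy_bits(observed)
    # Chain rule: H(Pe|h) = H(Pe, h) - H(h)
    h_obs_given_h = entropy_bits(joint) - entropy_bits(secret)
    return h_obs, h_obs_given_h, h_obs - h_obs_given_h

# Hypothetical example: 4-bit secret, 3-bit random register, P(h) = h mod 4.
h_pe, h_pe_given_h, mutual_info = erasure_quantities(lambda h: h % 4, 4, 3)
```

In this toy case H(P) is 2 bits, so the observable entropy is 2 + 3 bits while the mutual information stays at 2 bits: the injected randomness inflates the entropy of the observables without contributing to leakage.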
Probabilistic operators are also the reason why we defined W as H(h)−H(P )
instead of setting W = H(h|P ). While the two definitions are equivalent in the
deterministic setting they differ in the probabilistic one. In fact by choosing
W = H(h|P ), as the conditional entropy is always non-negative we would conclude that dissipation is needed to protect confidential data also in the case of
probabilistic systems; however we have just shown that it is possible to extract
work from systems in non-maximally disordered states (an alternative argument for this uses Bennett’s fuel value of information [14, 4]). Since this work
can exceed the work needed to protect confidential data, H(h|P ) would be an
imprecise definition.
5. The thermodynamics of min-entropy leakage
A known issue with Shannon’s entropy as a measure of program security is its
mismatch with guessability: random variables may have arbitrarily high entropy
and still be highly likely to be guessed. This issue has prompted researchers
in security to investigate alternative foundations for Quantitative Information
Flow, with Geoffrey Smith [28] providing a notion based on the probability of
guessing the secret in one try, which we now explore. As we show here, Smith's
definition is also closely related to the energetic cost of deleting information
and hence to the Landauer principle, although the focus is now on the input
and output registers rather than on the information required to reverse the
computation.
Smith quantifies the loss of confidentiality in terms of the difference between
the (log of the) probability of guessing the secret before and after observing the
output of a program. The logic underlying this approach is illustrated by the
following two programs:
A:  if (h % 8 == 0) then x = h; else x = 1;
B:  x = h & 0^{7k−1} 1^{k+1};
Program A returns the value of h when the last three bits of the secret are
0, and returns 1 otherwise. Program B copies the last k + 1 bits of the secret
to the public variable x ( & is the bitwise and).
Given a uniformly distributed secret h of size 8k bits (where k is a parameter), the two programs have very similar leakage (H(A) = k + 0.169,
H(B) = k + 1); as we have seen, a very similar amount of work is thus needed
to protect the secret (in both cases W ≃ 7k − 1 bits, i.e. a dissipation of about (7k − 1) kB T ln 2). However, the two programs have an entirely different guessing behaviour. Program A discloses the
whole secret with probability 1/8 (and very little otherwise), while program B
always reveals the last k + 1 bits of the secret; we are then left to guess
the remaining 7k − 1 bits with probability 1/2^{7k−1}. As k is increased it gets
a lot easier to guess the secret in one try after running program A than after
running program B; conversely, the difference in the energy dissipated by each
program becomes negligible.
For these reasons, Smith suggests a measure of confidentiality based on Rényi
min-entropy [26]. The leakage of a program is defined as the difference between
the a priori Rényi min-entropy:
$$H_\infty(h) = -\log\Big(\max_{h_i \in h} \mu(h_i)\Big)$$
and the a posteriori Rényi min-entropy H∞(h|P), expressed as a min-entropy
conditioned over all possible values of the observables:
$$H_\infty(h|P) = -\log\Big(\sum_{P_j \in P} \mu(P_j) \max_{h_i \in h} \mu(h_i|P_j)\Big).$$
As shown in [19], H∞(h|P) is the negative log of the complement of the Bayes risk and
is also called remaining uncertainty. In the case of our examples, the min-entropy
leakage H∞(h) − H∞(h|P) is ≃ 8k − 3 for A and k + 1 for B (so the remaining
uncertainty H∞(h|P) is ≃ 3 bits for A and 7k − 1 bits for B): a fitting quantification of the difference in
guessability between the two programs.
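The figures quoted for programs A and B can be reproduced by exhaustive enumeration for a small parameter value. The sketch below assumes k = 1 (an 8-bit uniformly distributed secret); the helper names `program_a`, `program_b` and `leakages` are illustrative and not taken from the paper.

```python
import math
from collections import Counter

def program_a(h):
    """Program A: reveal h when its last three bits are 0, otherwise output 1."""
    return h if h % 8 == 0 else 1

def program_b(h, k):
    """Program B: copy the last k + 1 bits of the secret."""
    return h & ((1 << (k + 1)) - 1)

def leakages(program, n_bits):
    """Return (Shannon leakage, min-entropy leakage, remaining uncertainty) in bits
    for a deterministic program run on a uniform n_bits secret."""
    size = 2 ** n_bits
    classes = Counter(program(h) for h in range(size))  # sizes of equivalence classes
    shannon = -sum(c / size * math.log2(c / size) for c in classes.values())
    # sum_j mu(P_j) * max_i mu(h_i | P_j); each class of size c has posterior max 1/c
    posterior_vulnerability = sum((c / size) * (1 / c) for c in classes.values())
    remaining = -math.log2(posterior_vulnerability)
    return shannon, n_bits - remaining, remaining

k = 1
leak_a = leakages(program_a, 8 * k)
leak_b = leakages(lambda h: program_b(h, k), 8 * k)
```

For k = 1 the Shannon leakages come out at about 1.169 and 2 bits (that is, k + 0.169 and k + 1), while the min-entropy leakages are about 5.04 and 2 bits (close to 8k − 3 and equal to k + 1), leaving roughly 3 and 6 bits of remaining uncertainty respectively.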
It makes sense to try and understand the thermodynamic meaning of H∞ (h|P ).
If the security W = H(h|P ) is the minimum dissipation what, if anything, is
H∞ (h|P ) in thermodynamic terms?
A first connection is given by the following result:
Proposition 4. For a deterministic program with a uniformly distributed secret
as its input,
H∞ (h|P ) = log(|h|) − log(|P |).
The proof of the above is a consequence of the following facts [28]:
1. The channel capacity of the two measures coincides, i.e.
$\max_{\mu(h)} (H_\infty(h) - H_\infty(h|P)) = \max_{\mu(h)} (H(h) - H(h|P))$
2. $\max_{\mu(h)} (H_\infty(h) - H_\infty(h|P))$ is given by the uniform distribution on the
input h
3. $\max_{\mu(h)} (H(h) - H(h|P))$ is given by the uniform distribution on the outputs of the program (this is also the channel capacity for Shannon leakage)
4. in both cases the maximum is equal to the log of the number of outputs
of the program (denoted by log(|P|)).
The thermodynamic interpretation of H∞ (h|P ) hence is the difference between the maximal work needed to reset the initial state of the system (the
input register) and the maximal work needed to reset the final state (the output
register).
The following result easily follows from proposition 4:
Proposition 5. For a deterministic program with a uniformly distributed secret
as its input the following are equivalent:
1. H∞ (h|P ) = W
2. the outputs of the program are uniformly distributed
3. the observational equivalence relation consists of equivalence classes all of
equal size
These conditions hold, for example, for program B above, but not for program
A. A class of programs satisfying these conditions is, for example, the one
computing h % n, where n is a divisor of 2^{|h|}.
If we relax the condition about the input being uniformly distributed then
Smith’s remaining uncertainty always underestimates dissipation i.e.
Proposition 6. For all deterministic programs and any distribution on h:
H∞ (h|P ) ≤ W.
The result is already proven in [29] using the Santhi-Vardy bound. We provide
an alternative argument, which will be used in the proof of Proposition 7. We
prove that
$$W - H_\infty(h|P) = H(h) - H(P) - H_\infty(h|P) \ge 0.$$
Let $b_i$ denote the marginal probability of an equivalence class, with $b_i = \sum_j h_{ij}$. Also, $h_i^\star = \max_j h_{ij}$.
The above inequality can then be written as
$$\sum_i b_i \log b_i - \sum_{ij} h_{ij} \log h_{ij} + \log \sum_i h_i^\star \ge 0.$$
An upper bound for the second term is given by
$$\sum_{ij} h_{ij} \log h_{ij} \le \sum_{ij} h_{ij} \log h_i^\star = \sum_i b_i \log h_i^\star \qquad (22)$$
so that it will suffice to prove
$$\begin{aligned}
\sum_i b_i \log b_i - \sum_i b_i \log h_i^\star + \log \sum_i h_i^\star &= \\
= \sum_i b_i \log \frac{b_i}{h_i^\star} + \log \sum_i h_i^\star &\ge 0
\end{aligned} \qquad (23)$$
From Theorem 2.7.1 in Cover and Thomas [13] we have that
$$\sum_i b_i \log \frac{b_i}{h_i^\star} \ge \Big(\sum_i b_i\Big) \log \frac{\sum_i b_i}{\sum_i h_i^\star} \qquad (24)$$
Replacing the first term in the second line of (23) with the above we have
$$\Big(\sum_i b_i\Big) \log \frac{\sum_i b_i}{\sum_i h_i^\star} + \log \sum_i h_i^\star = \log \frac{1}{\sum_i h_i^\star} + \log \sum_i h_i^\star = 0$$
(since $\sum_i b_i = 1$). This concludes the proof.
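Proposition 6 lends itself to a quick numerical sanity check. The sketch below is an illustration under stated assumptions rather than part of the proof: it draws random distributions on a small secret space and random deterministic programs (represented as a list assigning an output label to each secret), then evaluates W = H(h) − H(P) and H∞(h|P) using the quantities b_i and h*_i defined above. All function names are hypothetical.

```python
import math
import random

def security_and_min_entropy(probs, outputs):
    """Return (W, H_inf(h|P)) in bits.

    probs[s] is the probability of secret s; outputs[s] is the (deterministic)
    program output for secret s, which identifies its equivalence class.
    """
    h_entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    b = {}       # b_i: marginal probability of each equivalence class
    h_star = {}  # h*_i: largest secret probability within each class
    for p, o in zip(probs, outputs):
        b[o] = b.get(o, 0.0) + p
        h_star[o] = max(h_star.get(o, 0.0), p)
    p_entropy = -sum(q * math.log2(q) for q in b.values() if q > 0)
    return h_entropy - p_entropy, -math.log2(sum(h_star.values()))

def smallest_gap(trials=1000, secrets=8, classes=3, seed=0):
    """Smallest value of W - H_inf(h|P) seen over random instances."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        weights = [rng.random() for _ in range(secrets)]
        total = sum(weights)
        probs = [w / total for w in weights]
        outputs = [rng.randrange(classes) for _ in range(secrets)]
        w, h_inf = security_and_min_entropy(probs, outputs)
        worst = min(worst, w - h_inf)
    return worst

# Uniform secret, uniformly distributed outputs: the two quantities coincide.
w_uniform, h_inf_uniform = security_and_min_entropy([1 / 8] * 8, [s % 4 for s in range(8)])
```

Up to floating-point error the gap returned by `smallest_gap` should never be negative, and in the uniform example both quantities equal 1 bit.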
We can now strengthen proposition 5 to precisely characterise when dissipation and H∞ (h|P ) coincide:
Proposition 7. H∞ (h|P ) = W iff the input is uniformly distributed and the
output is uniformly distributed.
The proof follows from the following observations: if we relax the requirement
of uniform distribution on the input then inequality 22 is strict. If we relax the
requirement of uniform distribution on the output then inequality 24 is strict.
Any of these will make proposition 6 a strict inequality.
5.1. Dissipation and Intrinsic Source Code Threat
A further interesting connection between both measures of confidentiality
and thermodynamics is given by considering the following problem: judging
only from the source code, which of two programs P, P′ is more of a confidentiality threat? One way to look at this problem is to argue that if we only
know the source code we should not make assumptions on any particular a priori
distribution on h, so it is natural to define the ordering $P \le_H P'$ iff for all
possible a priori distributions on h, P leaks less than P′. In terms of Shannon's
entropy we formalise this by
$$P \le_H P' \iff \forall \mu_h .\; H(P) \le H(P'),$$
where $\mu_h$ ranges over all distributions on h.
Similarly we define a min-entropy order $P \le_M P'$ by considering all possible
a priori distributions, i.e.
$$P \le_M P' \iff \forall \mu_h .\; H_\infty(h) - H_\infty(h|P) \le H_\infty(h) - H_\infty(h|P').$$
Finally we define a dissipativity order $\le_W$ based on security over all possible a
priori distributions by
$$P \le_W P' \iff \forall \mu_h .\; H(h|P) \le H(h|P')$$
Proposition 8. For deterministic programs the following relations hold:
$$\forall P, P' .\; P \ge_W P' \iff P \le_H P' \iff P \le_M P'.$$
The first equivalence is intuitive and follows from the definition of W : the
more information is leaked the less dissipation is required. The second equivalence is, in the light of differences between entropy and guessability, more
surprising and is proved reasoning in terms of the observational equivalence in
[25] or in more syntactic terms in [35].
6. Dynamics of Computation: timing channels
We now consider timing channels. As timing behaviours belong to the dynamics of the system, all considerations must be restricted to computers with
a specific dynamics. Current computers are overwhelmingly clock-based. The
problem with this choice of time evolution is that it is very wasteful: a state
transition activated by a clock wastes far too much energy compared to the
fundamental thermodynamic requirements and obfuscates the thermodynamic
properties of timing channels arising from the correlation between minimum
energy consumption and speed of computation.
Physicists interested in the thermodynamics of computation have studied
low–energy yet powerful models of computation like Brownian computers [4, 14].
Biology is a good example of the reliability and complexity of such systems: an
example of Brownian computing is given by DNA replication, transcription and
translation.

Figure 6: Transitions in a Brownian computer without (left) or with (right) branching. A:
activation energy; Ei: energy of state i; ε: driving potential.
In general, Brownian computers traverse a well-defined, deterministically
arranged sequence of states under the influence of thermal agitation. Starting
from the input state, the computer is allowed to advance or backtrack at random
on the computational path until it reaches the final state of the computation
(think of a complex chemical reaction proceeding backwards and forwards between intermediate products until final equilibrium). Mathematically Brownian
computations are random walks, i.e. systems with forward and backward probabilistic transitions between states. Notice that there is no contradiction between
random walk operation and the deterministic nature of the computation. The
computer evolves along a well-defined sequence of states that it can visit; the
randomness is about the forward/backward dynamics only and a trajectory may
well have just one end point.
6.1. Evolution of Brownian computers
At a physical level a Brownian computer evolves by traversing a series of
states separated by an energy barrier A comparable to the thermal agitation
energy kB T (Figure 6 left). It is a standard result in statistical mechanics that
the probability of transition from one state to another, separated by a (positive)
energy difference δE, is proportional to exp(−δE/kB T ) [14].
When the activation energy A becomes available to the system because of
thermal fluctuations, the system has two alternatives: it can either proceed
forwards with probability
$$f = C \exp\big(-(A - E_i)/k_B T\big) \qquad (25)$$
or backtrack to the state $E_i$ with probability
$$b = C \exp\big(-(A - E_{i+1})/k_B T\big), \qquad (26)$$
where C is a normalisation constant.
Let r = f/b be the ratio of the forward to backward transition rates. We
can then write
$$k_B T \ln r = -(E_{i+1} - E_i) = -\delta E, \qquad (27)$$
which shows that on average the machine will move towards the state of
lower energy. One can thus bias the computation forwards by setting $E_{i+1} < E_i$
by a small amount ε (the driving potential); this energy is effectively dissipated in the transition.
6.2. Branching
Before considering the evolution of the system in terms of computations, it
is useful to extend our analysis to the case of the transition from two possible
antecedent states to a single posterior state (Figure 6 right). We assume for
simplicity that the two antecedent states have equal energy Ei .
In this case, we still have
$$f = C' \exp\big(-(A - E_i)/k_B T\big) \qquad (28)$$
for the forward transition, but we should write
$$b = 2C' \exp\big(-(A - E_{i+1})/k_B T\big) \qquad (29)$$
for the backwards transition to account for the fact that the system has two
possibilities of moving backwards and only one of moving forwards (indeed,
from the above if $E_i = E_{i+1}$ we obtain that b = 2f, which confirms that the
choice is entirely unbiased).
From the above we obtain, for r = f/b, the expression
$$k_B T \ln(f/b) = k_B T \ln r = -k_B T \ln 2 - (E_{i+1} - E_i). \qquad (30)$$
Remembering that δS = −kB ln 2 is the entropy variation (a reduction by kB ln 2) associated with the
transition from two possible antecedent states in step i to a single state in step
i + 1, and assuming that T is held constant, we can write this as
$$k_B T \ln r = -\delta E + T \delta S = -\delta F \qquad (31)$$
where F is the free energy of the system (more generally, when entropy variations are involved the transition probabilities are proportional to exp(−δF/kB T)).
Once again, computation will on the average go forward if the right hand
side is larger than zero, and otherwise it will go backwards; however, this time,
an entropy term that was absent from Equation 27 appears. This has interesting
consequences for the kinetics of irreversible computations, that will be explored
in the following sections.
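The effect of the entropy term can be illustrated numerically. The following minimal sketch works in units of kB T and evaluates ln(f/b) from the exponential transition probabilities given above; the normalisation constant cancels in the ratio. The parameter `antecedents` generalises the factor 2 of the two-way merge to an arbitrary number of predecessor states, which is an assumption made here for illustration and is not discussed in the text.

```python
import math

def bias_in_kT(e_i, e_next, activation, antecedents=1):
    """Return ln(f/b), i.e. k_B T ln r expressed in units of k_B T.

    Energies are given in units of k_B T. `antecedents` is the number of
    equal-energy predecessor states merging into the next state
    (1: no branching, 2: the two-way merge considered in the text).
    """
    f = math.exp(-(activation - e_i))
    b = antecedents * math.exp(-(activation - e_next))
    return math.log(f / b)

epsilon = 0.1  # hypothetical small energy drop per step, in units of k_B T

no_branch = bias_in_kT(0.0, -epsilon, activation=1.0)            # equals epsilon
merge = bias_in_kT(0.0, -epsilon, activation=1.0, antecedents=2)  # epsilon - ln 2
strong = bias_in_kT(0.0, -1.0, activation=1.0, antecedents=2)     # 1 - ln 2
```

With the small drop the non-branching step is biased forwards, whereas the merging step is biased backwards; the merge only proceeds forwards on average once the energy drop exceeds kB T ln 2, as in the `strong` case.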
6.3. Speed of computation
The link between the variation of free energy at each step and the speed of
computation can be made more explicit for the case in which the driving potential ε is small [14]. Writing
the forward rate as f and the backwards rate as b we have f = b + θ and
therefore:
$$\begin{aligned}
-\delta F &= k_B T \ln(f/b) = k_B T \ln(1 + \theta/b) \approx k_B T\,\theta/b = \\
&= k_B T\,(f - b)/b \approx k_B T\,\frac{f - b}{(f + b)/2}
\end{aligned} \qquad (32)$$
where the linearisation on the first line and the approximation on the second
hold in the limit of small θ. On the right hand side, f − b is proportional to the
velocity $v_{drift}$ with which the computation moves forward. The denominator
(f + b)/2 is the average rate of transition due to Brownian motion, and can be
interpreted as the maximum speed at which the system would move forward, if
all the transitions were in the right direction. We indicate this with $v_{therm}$. We
can then write
$$\varepsilon + T \delta S = -\delta F \approx k_B T\,\frac{v_{drift}}{v_{therm}}, \qquad (33)$$
where for convenience we have introduced the (positive) quantity ε = −δE. Note
that $v_{therm}$ depends on the characteristics of the system and on the temperature
rather than on the driving potential ε.
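To get a feeling for how good the approximations in Equation 32 are, one can compare the exact value of ln(f/b) with the two approximate expressions. The sketch below assumes, purely for illustration, that f and b are probabilities with f + b = 1; the function name is hypothetical.

```python
import math

def drift_estimates(f, b):
    """Return (exact, linearised, symmetric) estimates of -dF / (k_B T).

    exact      : ln(f / b)
    linearised : (f - b) / b              (first-order expansion of the log)
    symmetric  : (f - b) / ((f + b) / 2)  (the ratio v_drift / v_therm)
    """
    exact = math.log(f / b)
    linearised = (f - b) / b
    symmetric = (f - b) / ((f + b) / 2)
    return exact, linearised, symmetric

weak = drift_estimates(0.51, 0.49)    # small driving potential
strong = drift_estimates(0.70, 0.30)  # large driving potential
```

For the weakly driven case the three values agree to within about one part in a thousand; for the strongly driven case they differ appreciably, which is consistent with the statement that the linear relation holds only in the limit of small θ.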
6.4. Irreversibility and dissipation
As Equation 33 shows, to a first approximation the velocity with which the
computation proceeds depends linearly on the driving potential ε. This energy is not stored in the
system in any useful way; rather, it is dissipated as heat at temperature T: it
is the energy dissipation per step. The linear dependency is characteristic of
diffusive processes [15].
The precise details of the energetic balance depend on the nature of the
computation. For a logically reversible, one-to-one computation we can represent the succession of states that the computer traverses as a sequence with
no bifurcations. In that case the variation of entropy of the computer is zero,
i.e. δS = 0 in Equation 33. Hence for computation to proceed on average it
is enough that a small energy gradient ε is applied to bias the system towards
forward evolution. In this case, the machine dissipates energy ε at each step;
this can be seen as the price to pay for carrying out the computation at the
desired speed. In the genetic apparatus, dissipation is of the order of 10^2 kB T
per operation [5]. However, if we are prepared to wait for the result, ε can be
arbitrarily small (ignoring random errors).
However, logically irreversible computations are a different case. Each time
two computational paths converge the state space accessible to the system is
reduced. This can be enough to make the drift velocity v_drift in Equation 33
negative unless the energy ε dissipated at each step is larger than −T δS. For an
elementary step of computation with two-way branching this is equivalent to the
requirements of the Landauer principle. As shown in Section 4,
when averaged over all possible inputs and their respective computational
paths the minimum dissipation required is lower bounded by W kB T ln 2.
7. Time channels
The fact that a Brownian computer goes through a certain number of discrete states at a given average drift speed (equation 33), albeit with a slightly
exotic dynamics, naturally implies the existence of time channels. Much as in
the standard paradigm of synchronous, clock-driven computing, calculations requiring a higher number of steps will on the average take longer to complete.
For Brownian computers, however, the effectiveness of such time channels turns
out to depend on the amount of energy used to drive the computation forward.
Surprisingly, Brownian computers also exhibit an entirely different, novel
class of time channels that are intrinsically linked to the degree of irreversibility
of a computation and that allow discriminating between computations requiring
an identical number of operations.
In the following sections we will detail the two types of time channels and
illustrate their dependency on energy and irreversibility.
7.1. Length of computation and time
The most obvious time channel is linked to the number of steps required by
a computation. Let us for simplicity consider a one-to-one computation corresponding to a non-branching computational path, and assume that the energy
profile for each of its steps is as displayed in Figure 6 (left) but with ε ≫ kB T.
Thus the activation energy is still low, but the large driving potential makes
backward transitions very unlikely (in fact, irrespective of the level of branching
of the computational path). In this case the system behaves more like a standard
sequential machine than as a Brownian computer; the length of the computational path (together with the availability of the activation energy required for
a transition) becomes the only variable determining the average duration of the
computation. A similar behaviour can be expected for computational paths
with little branching, where the entropy term in Equation 33 is small compared
to the driving potential ε. We can think of this case as the “synchronous” limit.
As the driving potential decreases, however, the computer will spend more
and more time retracing steps of the computation at random. In fact, without
any driving force, a Brownian computer would drift away (in either direction)
from the starting state by effect of thermal agitation. The variance of its position
ν after n = v_therm t random steps would be ⟨ν²⟩ = n. To a first approximation,
we can assume this dispersion to be superposed to the net displacement represented by vdrif t , thus making it more difficult for an attacker to estimate the
number of steps of the computation from a timing measure. As the calculations
in Section 7.3.1 show, for very low driving energies position uncertainty can
effectively mask these timing channels.
7.2. Entropy timing channels
An entirely novel type of time channel directly related to the logical irreversibility of computation arises in Brownian computers when the driving potential is low. In this case, the contribution of the entropy term T δS in Equation 33
to vdrif t cannot be discarded.
To see how this can induce a timing channel, consider the following example
code:
l = 0;
for (i = 0; i < n; i++) {
    l = l + h[i];
    h[i] = 0;
}
l = 0;
that computes the number of bits of the secret h that are set. The number of
operations is here independent of the input — and so, in the standard setting,
would be the time required by computation.
However, for Brownian computers equation 33 introduces a dependence of
the speed of computation on input through the term T δS. The key observation
is here that the total entropy variation for the computation of P (h) is ∆S =
−H(h|P = o)kB ln 2, and that in general this will depend on the input. In
other words, paths corresponding to different inputs will have different average
degrees of branching per step (their length being equal).
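The input dependence of ∆S can be made concrete for the bit-counting example. The sketch below assumes a uniformly distributed n-bit secret and takes the observable o to be the computed number of set bits; under these assumptions the secrets compatible with o are the C(n, o) bit strings with o ones, so H(h|P = o) = log2 C(n, o). The function names are illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def conditional_entropy_bits(n, o):
    """H(h | P = o) in bits for a uniform n-bit secret whose bit count is o."""
    return math.log2(math.comb(n, o))

def entropy_variation(n, o):
    """Total entropy variation Delta S = -H(h | P = o) k_B ln 2, in J/K."""
    return -conditional_entropy_bits(n, o) * K_B * math.log(2)

# Entropy reduction along the computational path, for each possible count of an 8-bit secret.
profile = [conditional_entropy_bits(8, o) for o in range(9)]
```

For an 8-bit secret the profile ranges from 0 bits (all bits clear or all bits set) to log2 70, about 6.13 bits, for four set bits, so paths ending in a balanced count involve the largest entropy reduction even though every path has the same length.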
Since it is reasonable to assume that the driving potential is set once for
all independent of the input, this will necessarily lead to a different vdrif t and
therefore to a different average duration of the computation. Paths showing
a higher degree of branching (maximum entropy reduction) will take longer,
as the computer can be expected to drift backwards into the different possible
antecedents of each node. Thus an attacker will be able to discriminate between
inputs associated with a different degree of branching on the basis of the time
taken by the computation. As the calculations in Section 7.3.2 below show, this
effect becomes more relevant for small driving potential, i.e. the situation in
which “standard” timing channels are fuzzed by random oscillations.
7.3. Timing channels and variance
We present here a simple computational treatment of the two types of timing
channels described in Sections 7.1 and 7.2 above, which illustrates their main
characteristics and the trade-offs involved.
In order to simplify calculations we assume that the computer can move
forward with probability f and backwards with probability b at each and every
step, and that each transition takes the same time.
We also assume that branching computational paths have a constant degree
of branching at each node, as would be the case for example of a complete binary
tree. In this case, since we are only interested in the level of the tree reached
by the computer at a given time, we can still describe the computation as a
one-dimensional asymmetric (f ≠ b) random walk, where the current position
corresponds to the level the system reached in the tree (see Figure 7).
Under these assumptions, the probability that the system is found in position
(or at tree level) ν after a total of n transitions is equal to the probability that
exactly (n + ν)/2 such transitions are forwards, and consequently (n − ν)/2 of
them are backwards:
$$B_f\Big(\frac{n+\nu}{2}, n\Big) = \binom{n}{\frac{n+\nu}{2}} f^{\frac{n+\nu}{2}}\, b^{\frac{n-\nu}{2}}, \qquad (34)$$
where we assume that the binomial coefficient is zero if (n + ν)/2 is not an
integer or |ν| > n.
Figure 7: Two random walks (top: non-branching, bottom: uniformly branching) and the
corresponding computational positions (tree levels)

If we assume that the last step of the computation is a potential well, so
that the computer will get trapped when it reaches the end of the computation
for the first time (and some flag will be raised to signal that termination has
occurred), it can be shown that the probability µ(n|ν; f) that a computation of ν
steps will terminate in n transitions is given by
$$\mu(n|\nu; f) = \frac{\nu}{n}\, B_f\Big(\frac{n+\nu}{2}, n\Big) = \frac{\nu}{n} \binom{n}{\frac{n+\nu}{2}} f^{\frac{n+\nu}{2}}\, b^{\frac{n-\nu}{2}} \qquad (35)$$
for positive ν.
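Equation 35 can be evaluated directly for moderate n. The sketch below assumes b = 1 − f (every step is either forwards or backwards, as stated above) and uses log-gamma functions so that the binomial coefficient does not overflow for large n; the function name and the example parameters are illustrative.

```python
import math

def halting_probability(n, nu, f):
    """mu(n | nu; f) from Equation 35, assuming b = 1 - f and 0 < f < 1.

    Returns 0 when (n + nu) / 2 is not an integer or n < nu, matching the
    convention that the binomial coefficient vanishes in those cases.
    """
    if nu <= 0 or n < nu or (n + nu) % 2:
        return 0.0
    b = 1.0 - f
    k = (n + nu) // 2  # number of forward transitions
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(f) + (n - k) * math.log(b))
    return nu / n * math.exp(log_p)

# Hypothetical small example: a 5-step computation with forward probability 0.7.
nu, f = 5, 0.7
probs = {n: halting_probability(n, nu, f) for n in range(1, 401)}
total = sum(probs.values())
mean_transitions = sum(n * p for n, p in probs.items())
```

In this example the probabilities sum to 1 up to truncation error, and the mean number of transitions is close to ν/(f − b) = 12.5, in line with the intuition that the computation advances at the drift rate f − b.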
7.3.1. Length of computation channel:
Let us consider two computations requiring a different number of operations
ν1 ≠ ν2. Let us assume that the driving potential is set so that the forward
rate f is the same in the two cases.
According to the DeMoivre–Laplace theorem [33], for $|k - nf| \lesssim \sqrt{nfb}$ we
have
$$\binom{n}{k} f^k b^{n-k} \simeq \frac{1}{\sqrt{2\pi nfb}}\, e^{-\frac{(k - nf)^2}{2nfb}}.$$
Substituting this into Equation 35 yields
$$\mu(n|\nu; f) \simeq \frac{\nu}{n}\, \frac{1}{\sqrt{2\pi nfb}}\, e^{-\frac{\left(\frac{1}{2}(n+\nu) - nf\right)^2}{2nfb}} \qquad (36)$$
Notice that the graph of the last factor is not a Gaussian curve, as n also appears
in the denominator of the exponent.
Figure 8 (left) shows the probability of the total number of transitions
µ(n|ν1 ; f ), µ(n|ν2 ; f ) for ν1 requiring 600 operations, ν2 requiring 800 operations, and f = 0.7. Perhaps somewhat unsurprisingly we see that, for instance,
a computation terminating around time 1150 is most likely to be ν1 , while one
terminating after 1400 steps is more likely to be ν2. This is akin to the common-sense intuition about timing channels: longer computations take longer to
terminate.
Figure 8: Halting probability for computations of different lengths and same forward rate as
a function of the number of transitions. Left: the two curves are clearly separated when f
is high. Here ν1 = 600, ν2 = 800 and f = 0.7. Right: the distributions overlap extensively
when f ≳ 0.5 (here f = 0.501). Notice the very wide range of transition numbers over which
this happens.
Interestingly, however, as f decreases such channels tend to disappear, as
the variance of the distributions increases so as to blur the distinction between
the two curves. Figure 8 (right) shows the distributions corresponding to the
same computations for f = 0.501: over a huge range of total transition numbers
there is no reliable way to determine which computation has finished, as both
probabilities are comparable.
7.3.2. Entropy channels:
These timing channels are somewhat counterintuitive as they do not derive
from a difference in the number of transitions but rather in the degree of branching, i.e. the difference in entropy variation between computations, which through
Equation 33 results in a different speed. Consider two computations of the same
length ν1 = ν2 = ν but different speed f1 ≠ f2. To study these channels we
consider the log-likelihood ratio of the probabilities:
$$\begin{aligned}
\ln \frac{\mu(n|\nu; f_1)}{\mu(n|\nu; f_2)} &= \ln \frac{f_1^{(n+\nu)/2}\, b_1^{(n-\nu)/2}}{f_2^{(n+\nu)/2}\, b_2^{(n-\nu)/2}} = \\
&= \frac{1}{2}(n + \nu) \ln \frac{f_1}{f_2} + \frac{1}{2}(n - \nu) \ln \frac{b_1}{b_2} = \\
&= \frac{1}{2}\, n \ln \frac{f_1 b_1}{f_2 b_2} + \frac{1}{2}\, \nu \ln \frac{f_1 b_2}{f_2 b_1}
\end{aligned} \qquad (37)$$
If we now set φ = ln(f1/f2), β = ln(b1/b2) we obtain
$$\ln \frac{\mu(n|\nu; f_1)}{\mu(n|\nu; f_2)} = \frac{1}{2}\, n(\phi + \beta) + \frac{1}{2}\, \nu(\phi - \beta). \qquad (38)$$
Figure 9: Timing channels due to branching. The distributions correspond to the halting
probabilities for two computations of the same length (600 steps), the leftmost with f = 0.75
and the rightmost with f = 0.70.
Assuming that branching is higher on the second path than on the first path, so
that f1 > f2 , we have φ > 0 and β < 0. Besides, it is easy to see that |φ| < |β|.
Therefore, the line in Equation 38 has negative slope and a positive intercept.
For
$$n \le \frac{\beta - \phi}{\beta + \phi}\, \nu = n^* \qquad (39)$$
the computation is more likely to have followed the first path. Note that when
f1 is close to f2 the slope of the line becomes small, making it more difficult to
distinguish between the two paths (for f1 = f2 the likelihood ratio is identically
1). Conversely, for f1 significantly larger than f2 a small deviation from n∗ is
enough to tell apart the path with confidence.
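The decision rule can be sketched numerically. The code below assumes b = 1 − f for each path and uses, as an example, the parameters quoted in the caption of Figure 9 (ν = 600, f1 = 0.75, f2 = 0.70); the function names are illustrative.

```python
import math

def log_likelihood_ratio(n, nu, f1, f2):
    """ln[mu(n|nu; f1) / mu(n|nu; f2)] as in Equation 38, assuming b = 1 - f."""
    phi = math.log(f1 / f2)
    beta = math.log((1 - f1) / (1 - f2))
    return 0.5 * n * (phi + beta) + 0.5 * nu * (phi - beta)

def crossover(nu, f1, f2):
    """Number of transitions n* at which both paths are equally likely (Equation 39)."""
    phi = math.log(f1 / f2)
    beta = math.log((1 - f1) / (1 - f2))
    return (beta - phi) / (beta + phi) * nu

n_star = crossover(600, 0.75, 0.70)
early = log_likelihood_ratio(1200, 600, 0.75, 0.70)  # positive: first path more likely
late = log_likelihood_ratio(1500, 600, 0.75, 0.70)   # negative: second path more likely
```

With these parameters n* comes out at roughly 1330 transitions, which lies between the values ν/(f − b) = 1200 and 1500 at which the exponent of Equation 36 vanishes for the two paths.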
Sample probability densities for this kind of timing channel (computed using
the approximation in Equation 36) are illustrated in Figure 9.
Notice that length of computation and entropy channels will in general coexist and that typical security counter-measures for timing channels like padding
or noise insertion will need to take into account both kinds of channels.
8. Practical implications
The work here presented is of foundational nature and deals with general
paradigms of computing rather than with specific implementations; its aim is to
advance our scientific understanding of confidentiality. We make no claim at the
moment about major applications of these ideas to come in the near future. It
is however worth spending a few words in relating these ideas to some practical
applications of thermodynamics to security.
The first that comes to mind is power analysis attacks. Attacks of this kind,
which rely on differential energy consumption in different circuit paths, have been
very successful in breaking cryptographic implementations [34]; in fact they are
among the most successful crypto security attacks to date [30]. For low energy
devices such as Brownian computers these energy side channels could arguably
be studied using the energy-speed correlation from Sections 6 and 7. Other kinds
of power analysis, for example of authentication systems, could also in principle
be related to the first part of this work, though accessing all information leaked
by the system in specific states might require a more detailed modelling of the
micro-states of the system based on statistical mechanics rather than on classical
thermodynamics.
Overall, key to the applicability of this work is the development of low energy computers. The energies we considered are minuscule compared to the
dissipation of present-day transistors (≈ 10^6 kB T per transition). However, nanotechnology is slowly lowering this figure to a point where such energies will no longer be
irrelevant. Carbon nanotube memories with switching energies of the order of
10^3 kB T have been feasible, at least at the prototype stage, for over a decade [27].
Very recently, experimental work implementing a Szilard engine [31] has brought
the Landauer principle within the realm of experimental validation. Molecular
computation has similarly proved its feasibility; early applications interestingly
included cryptanalysis [1] as well as autonomous DNA-based computation [3]
quite related to the Brownian computing scenario we consider here. Timing
aspects of DNA-based computers are actually the subject of current investigation [8]. For computers operating so close to reversibility the energy cost
of security presented in this paper would clearly be significant. Once technology pushes devices to energy limits comparable to thermal agitation, further
efficiency will only be achievable by making calculations reversible wherever
possible. At that stage, security will become a hard lower bound on dissipation,
and a secure system protecting a large amount of data will need to dissipate
a comparatively sizeable amount of energy. Crucially if the dissipation of the
system were below a reasonable multiple of W serious doubts on its security
could be raised.
9. Conclusions
The study of thermodynamic aspects of computation dates back to the pioneers of computing, starting with Von Neumann. Subsequent work by Landauer,
and later by Fredkin and Toffoli and by Bennett, illustrated how all computations can
be executed reversibly. Thus dissipation, while of great practical importance,
seems to have little foundational status in computer science.
Here we established a fundamental relation between dissipation and secure
computation by proving that two of the main metrics of confidentiality in computer security are essentially measures of dissipation in the thermodynamic
sense. These results provide thermodynamic foundations for confidentiality,
with Landauer’s principle thus implying a fundamental lower bound to the energetic cost of secure computation.
We also explored the relationship between the dynamics of computation and
security for the case of Brownian computers. As we showed, this computing
paradigm supports a novel type of time channels directly related to the irreversibility of computation, and thus to security. This further suggests that when
computer technology is pushed towards the limits of energy efficiency, interesting correlations between quantities normally considered unrelated by computer
scientists (such as energy, entropy and time) may appear; these will require a
detailed analysis.
Understanding the physics of confidentiality contributes to the debate on
the role of irreversibility in other minimally dissipative systems including
nanotechnologies, molecular and biological computation and quantum computing.
Applied fields such as the study of power analysis attacks are also likely to
benefit.
References
[1] L. M. Adleman, P. W. K. Rothemund, S. Roweis, E. Winfree: On applying
molecular computation to the Data Encryption Standard, Journal of Computational Biology 6(1), 53–63, 1999
[2] M. S. Alvim, M. E. Andrés, K. Chatzikokolakis, C. Palamidessi: On the
Relation between Differential Privacy and Quantitative Information Flow.
ICALP (2) 2011: 60-76
[3] Y. Benenson, T. Paz-Elizur, R. Adar, E. Keinan, Z. Livneh, E. Shapiro:
Programmable and autonomous computing machine made of biomolecules,
Nature 414(6862):430–434, 2001
[4] C. Bennett. Logical Reversibility of computation. IBM J.Res.Develop. 17,
525-532. 1973.
[5] C. Bennett. Dissipation-error tradeoff in proofreading. Biosystems 11, pg
85–91. 1979.
[6] K. Chatzikokolakis, C. Palamidessi, P. Panangaden: Anonymity protocols
as noisy channels. Information and Computation 206(2-4): 378-401 (2008)
[7] H. Chen, P. Malacaria: Quantifying maximal loss of anonymity in protocols.
ASIACCS 2009: 206-217
[8] N. Aubert, Y. Rondelez, T. Fujii, M. Hagiya: Enforcing delays in DNA
computing systems, 18th Intern. Conf. on DNA Computing and Molecular
Programming (DNA18), Aarhus, Denmark, August 2012
[9] T. Chothia, V. Smirnov. A Traceability Attack against e-Passports. Financial Cryptography 2010: 20-34
[10] D. Clark, R. Hierons. Squeeziness: An information theoretic measure for
avoiding fault masking. Information Processing Letters Volume 112, Issues
8–9, 30 April 2012, Pages 335-340
[11] D. Clark, S. Hunt, P. Malacaria: Quantitative information flow, relations
and polymorphic types. Journal of Logic and Computation, 18(2):181-199,
2005.
[12] D. Clark, S. Hunt, P. Malacaria: A static analysis for quantifying information flow in a simple imperative language. Journal of Computer Security,
Volume 15, Number 3. 2007.
[13] T. Cover, J. Thomas. Elements of Information Theory. Wiley-Interscience
publications. 1991.
[14] R. P. Feynman. Feynman Lectures on Computation. Edited by A. Hey and
R. Allen. Addison Wesley 1996.
[15] R. P. Feynman, R. B. Leighton and M. Sands. The Feynman Lectures on
Physics. Vol 1. Addison Wesley 1964.
[16] E. Fredkin, T. Toffoli. Conservative logic. International Journal of Theoretical Physics, 21:219–253, 1982.
[17] B. Köpf, D. Basin: An information-theoretic model for adaptive side-channel attacks. Proceedings ACM conference on Computer and Communications Security, 2007, 286-296.
[18] B. Köpf, G. Smith: Vulnerability Bounds and Leakage Resilience of Blinded
Cryptography under Timing Attacks. CSF 2010: 44-56
[19] K. Chatzikokolakis, C. Palamidessi, P. Panangaden: On the Bayes risk in
information-hiding protocols. Journal of Computer Security (JCS) 16(5):531–571 (2008)
[20] J. Heusser, P. Malacaria: Quantifying Information Leaks In Software. Proceedings ACM Annual Computer Security Applications Conference, ACSAC
2010, Austin, Texas. ACM 2010.
[21] R. Landauer. Dissipation and heat generation in the computing process.
IBM J.Res.Develop., 5, 148-156. 1961.
[22] H. Leff and A. Rex editors. Maxwell’s Demon 2, Entropy, Classical and
Quantum Information, Computing. Institute of Physics publishing 2003.
[23] P. Malacaria and F. Smeraldi, The Thermodynamics of confidentiality, in
Proc. of IEEE Computer Security Foundations Symposium, CSF 2012
[24] P. Malacaria. Assessing security threats of looping constructs. Proc. ACM
Symposium on Principles of Programming Language, POPL 2007.
[25] P. Malacaria. Algebraic foundations for quantitative information flow,
Mathematical Structures in Computer Science, in press.
[26] A. Rényi: On measures of information and entropy. Proceedings of the
4th Berkeley Symposium on Mathematical Statistics and Probability 1960:
547-561.
[27] T. Rueckes, K. Kim, E. Joselevich, G. Y. Tseng, C.-L. Cheung, C.
M. Lieber: Carbon nanotube–based nonvolatile random access memory for
molecular computing, Science Vol. 289, July 7th, 2000: 94-97
[28] G. Smith: On the Foundations of Quantitative Information Flow. In Proc.
FOSSACS 2009: Twelfth International Conference on Foundations of Software Science and Computation Structures LNCS 5504, pp. 288-302, York,
UK, March 2009.
[29] G. Smith: Quantifying Information Flow Using Min-Entropy. In Proc.
QEST 2011: 159-167
[30] F.-X. Standaert, N. Veyrat-Charvillon, E. Oswald, B. Gierlichs, M. Medwed, M. Kasper and S. Mangard. The World Is Not Enough: Another Look
on Second-Order DPA. In: Advances in Cryptology - ASIACRYPT 2010,
pages 112-129. Springer LNCS 6477, December 2010.
[31] S. Toyabe, T. Sagawa, M. Ueda, E. Muneyuki, M. Sano (2010-09-29), Information heat engine: converting information to energy by feedback control, Nature Physics 6 (12): 988-992, http://dx.doi.org/10.1038/nphys1821,
arXiv:1009.5287, Bibcode 2011NatPh...6..988T.
[32] K. Zhang, Z. Li, R. Wang, X. Wang, and S. Chen. Sidebuster: Automated
Detection and Quantification of Side-Channel Leaks in Web Application Development. In Proc. ACM CCS 2010.
[33] A. Papoulis, S. U. Pillai: Probabilities, random variables and stochastic
processes, McGraw-Hill 2002
[34] P. Kocher, J. Jaffe, B. Jun. Differential Power Analysis. in Advances in
Cryptology - Crypto 99 Proceedings, Lecture Notes In Computer Science
Vol. 1666, M. Wiener, ed., Springer-Verlag, 1999.
[35] H. Yasuoka, T. Terauchi: Quantitative Information Flow - Verification
Hardness and Possibilities. CSF 2010: 15-27