Part II - Random Processes
Goals for this unit:
• Give an overview of concepts from discrete probability
• Give analogous concepts from continuous probability
• See how the Monte Carlo method can be viewed as a sampling technique
• See how Matlab can help us simulate random processes
• See applications of the Monte Carlo method, such as approximating an integral or finding extrema of a function
• See how random walks can be used to simulate experiments that typically can’t be done in any other way
• Random processes are important because most real world problems exhibit
random variations; the term stochastic processes is also used.
• The Monte Carlo Method is based on the principles of probability and statistics so we will first look at some basic ideas of probability.
• Similar to the situation where we looked at continuous and discrete problems
(recall this was the main division in Algorithms I and II) we will consider
concepts from both discrete and continuous probability.
• When we talk about discrete probability we consider experiments with a finite
number of possible outcomes. For example, if we flip a coin or roll a die we
have a fixed number of outcomes.
• When we talk about continuous probability we consider experiments where
the random variable takes on all values in a range. For example, if we spin a
spinner and see what point on the circle it lands, the random variable is the
point and it takes on all values on the circle.
Historical Notes
Archaeologists have found evidence of games of chance on prehistoric digs, showing that gaming and gambling have been a major pastime for different peoples
since the dawn of civilization. Given the Greek, Egyptian, Chinese, and Indian
dynasties’ other great mathematical discoveries (many of which predated the
more often quoted European works) and the propensity of people to gamble, one
would expect the mathematics of chance to have been one of the earliest developed. Surprisingly, it wasn’t until the 17th century that a rigorous mathematics
of probability was developed by French mathematicians Pierre de Fermat and
Blaise Pascal. (Information taken from MathForum, http://mathforum.org/isaac/problems/prob1.html)
The problem that inspired the development of mathematical probability in Renaissance Europe was the problem of points. It can be stated this way:
Two equally skilled players are interrupted while playing a game of chance for a
certain amount of money. Given the score of the game at that point, how should
the stakes be divided?
In this case “equally skilled” indicates that each player started the game with
an equal chance of winning, for whatever reason. For the sake of illustration,
imagine the following scenario.
Pascal and Fermat are sitting in a cafe in Paris and decide, after many arduous
hours discussing more difficult scenarios, to play the simplest of all games, flipping
a coin. If the coin comes up heads, Fermat gets a point. If it comes up tails,
Pascal gets a point. The first to get 10 points wins. Knowing that they’ll just
end up taking each other out to dinner anyway, they each ante up a generous
50 Francs, making the total pot worth 100. They are, of course, playing ’winner
takes all’. But then a strange thing happens. Fermat is winning, 8 points to 7,
when he receives an urgent message that a friend is sick, and he must rush to his
home town of Toulouse. The carriage man who has delivered the message offers
to take him, but only if they leave immediately. Of course Pascal understands,
but later, in correspondence, the problem arises: how should the 100 Francs be
divided?
How would you divide the 100 Francs? After we discuss probability you should
be able to divide the money and argue that it is a fair division.
What do we mean by probability and events?
• Probability measures the chance of a given event occurring. It is a number
between 0 and 1.
• The complement of an event is everything that is not in the event; for example, the complement of 3 or fewer hurricanes hitting Florida in 2011 is 4 or more hurricanes hitting Florida in 2011.
• If the probability of an event is 1 then it is certain to occur.
• A probability of 0 represents an event that cannot occur, i.e., is impossible.
• Probabilities strictly between 0 and 1 correspond to events that may occur; the bigger the number, the more likely the event is to occur.
• “Rain tomorrow”, “crop yield”, “number of hurricanes to hit Florida’s coast
in 2011”, “ the number of times the sum of two dice thrown is even” are
all examples of events for which we may be interested in determining their
probabilities.
Notation & Terminology:
• Let X denote the random variable, for example the sum of two dice when
they are thrown.
• The set of all possible outcomes is denoted Ω. For example, if we roll one
die the possible outcomes are Ω = {1, 2, 3, 4, 5, 6}.
• We will denote an event as E. So for example, if we roll a single die and
want to determine the likelihood of the result being an even number, then
E = {2, 4, 6}; note that E ⊂ Ω.
• We are interested in determining the probability of an event and will denote
it p(E).
The sum of all probabilities for all possible outcomes is 1.
The probability of an event will be the sum of the probabilities of each
outcome in the event.
The probability of the complement of an event E will be 1 − p(E).
What is meant by the probability of an event?
Suppose the weather forecast is that the probability of rain tomorrow is 1/4=0.25.
We say the likelihood of rain is 25%. What does this mean?
The complement of this event is that there is NO rain tomorrow, so the probability of no rain is
$$1 - \frac{1}{4} = 0.75$$
So there is a 75% chance of no rain tomorrow and thus it is 3 times more likely
that there is no rain tomorrow than there is rain.
Unions and Intersections
Consider the following two events:
• A = {3 or fewer hurricanes hit Florida in 2011}
• B = {4 or 5 hurricanes hit Florida in 2011}
Then A ∪ B is {5 or fewer hurricanes hit Florida in 2011}.
The probability of A ∪ B is p(A ∪ B) = p(A) + p(B) because the events are mutually exclusive.
The intersection A ∩ B is empty, so p(A ∩ B) = 0, because both events cannot occur together.
Now consider two events which are not mutually exclusive.
• C = { rain tomorrow}
• D = { rain the day after tomorrow}
Then C ∪ D = { rain in the next two days} (i.e., rain tomorrow or rain the next
day).
Now the intersection of C and D is everything that is in event C and in event D, so
C ∩ D = { rain tomorrow and rain the next day}.
What is the probability of C ∪ D? We can’t simply sum the probabilities. For
example if there was a probability of 1/4 each day (i.e., 25% chance) then we
can’t say there is a 50% chance that it will rain in the next two days. In the case
where the events are not mutually exclusive we have
p(C ∪ D) = p(C) + p(D) − p(C ∩ D)
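The formula can be checked by direct enumeration on a small sample space. The following is a minimal sketch on one fair die; the events C and D used here (an even roll, and a roll of 4 or more) are illustrative assumptions, chosen only because their probabilities are easy to count, and are not the rain events above.

```matlab
% Check p(C u D) = p(C) + p(D) - p(C n D) by enumeration on one fair die.
% The events C and D are hypothetical, chosen for illustration.
omega = 1:6;                     % sample space of a single die
p     = ones(1, 6) / 6;          % each outcome has probability 1/6

C = mod(omega, 2) == 0;          % event C: the roll is even   -> {2,4,6}
D = omega >= 4;                  % event D: the roll is >= 4   -> {4,5,6}

p_C     = sum(p(C));             % probability of C
p_D     = sum(p(D));             % probability of D
p_CandD = sum(p(C & D));         % probability of the intersection {4,6}
p_CorD  = sum(p(C | D));         % probability of the union {2,4,5,6}

fprintf('p(C u D) = %.4f, p(C) + p(D) - p(C n D) = %.4f\n', ...
        p_CorD, p_C + p_D - p_CandD);
```

Simply adding p(C) and p(D) would count the outcomes 4 and 6 twice, which is exactly what the subtracted term corrects.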
Conditional Probability
• If we know that one event has occurred it may change our view of the
probability of another event. For example, let
A = {rain today},
B = {rain tomorrow},
C = {rain in 90 days from now}
Assume that knowledge that A has occurred will change your view of the
probability that B will occur, but not of the probability that C will occur.
• We write p(B|A) to denote the conditional probability of B, given A.
• With our assumptions, p(C|A) = p(C) but p(B|A) ≠ p(B).
• We say that A and C are independent, but A and B are not.
• For independent events A and C, the probability that both occur is p(A ∩ C) = p(A)p(C).
Example If we roll a single die (with sides 1, 2, 3, 4, 5, 6) what is the probability
that you will roll the number 4? If we roll two dice, what is the probability that
they will both be 4?
Clearly all numbers are equally probable, and since the probabilities have to sum to 1 and there are 6 possible outcomes, p(X = 4) = 1/6. To determine the probability that a 4 occurs on both dice, we note that they are independent events, so the probability is
$$\frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}.$$
Example Let’s return to the Fermat/Pascal problem of flipping a coin. We
know that heads and tails are each equally likely to occur. If heads occurs then
Fermat gets a point, otherwise Pascal gets one. When the game is interrupted
the score is Fermat 8 and Pascal 7. There is 100 francs at stake so how do you
fairly divide the winnings?
We first ask ourselves: what is the maximum number of additional coin flips needed to guarantee that someone wins? Fermat needs 2 more points and Pascal needs 3. The worst case scenario would be for Fermat to get 1 point and Pascal to get 3, which takes 4 flips, so in 4 more flips we are guaranteed a winner.
If we determine the possible outcomes in these 4 flips then we can assign a winner
to each. Then Fermat’s probability of winning will be the number of outcomes
he would win over the total number of outcomes; similarly for Pascal. Then we
will know how to divide the money.
Here are the 2^4 = 16 possible outcomes for 4 flips; we have indicated the winner in each case, remembering that Fermat needs 2 heads to win and Pascal needs 3 tails.
HHHH (F)   HHHT (F)   HHTH (F)   HHTT (F)
HTHH (F)   HTTH (F)   HTHT (F)   HTTT (P)
TTTT (P)   TTHT (P)   TTTH (P)   TTHH (F)
THTT (P)   THHH (F)   THTH (F)   THHT (F)
Fermat wins in 11 of the 16 equally likely outcomes, so his probability of winning is 11/16. Pascal wins in 5 of the 16, so his probability is 5/16. Fermat should therefore get 11/16 of the money (68.75 francs) and Pascal the remaining 5/16 (31.25 francs).
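The count of winning outcomes can be reproduced by enumerating all sequences of 4 flips. This is a minimal sketch; the encoding of heads as 1 and tails as 0, and the variable names, are assumptions made for illustration.

```matlab
% Enumerate the 2^4 = 16 equally likely sequences of 4 further flips.
% Assumed encoding: 1 = heads (point for Fermat), 0 = tails (point for Pascal).
n_flips = 4;
flips   = dec2bin(0:2^n_flips - 1, n_flips) - '0';  % 16-by-4 matrix of 0s and 1s
n_heads = sum(flips, 2);                            % number of heads in each sequence

% Fermat needs 2 heads; with fewer than 2 heads there are at least 3 tails,
% so Pascal wins.
fermat_wins = n_heads >= 2;

p_fermat = sum(fermat_wins) / 2^n_flips;
p_pascal = 1 - p_fermat;
fprintf('Fermat: %.2f francs, Pascal: %.2f francs\n', ...
        100 * p_fermat, 100 * p_pascal);
```

Changing the two thresholds in the comparison would handle a game interrupted at a different score.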
Simulating random processes using Matlab’s rand command
• Random events are easily simulated in Matlab with the function rand, which
you have already encountered. A computer cannot generate a truly random
number, rather a pseudorandom number, but for most purposes it suffices
because it is practically unpredictable.
We have seen that if you use the command x=rand() you get a number
between 0 and 1. It can also generate row or column vectors. For example,
rand(1,5) returns a row vector of five random numbers (1 row, 5 columns)
such as
0.2416 0.9644 0.4519 0.3278 0.8913
a = rand();         % a scalar value
b = rand(5, 1);     % a column vector
c = rand(1, 5);     % a row vector
d = rand(3, 4);     % a 3x4 matrix
e = rand(5);        % a 5x5 matrix
• If we want to generate n integers between, say, 0 and k, then we first generate the numbers between 0 and 1 using x=rand(1,n) and then use either of the following commands:
x=floor( (k+1)*x )
x=ceil( k*x )
The first gives integers from 0 to k; the second gives integers from 1 to k, since rand never returns exactly 0.
For example, if we generate the 5 random numbers
.8147 .9058 .1770 .9134 .6324
and we want integers ≤ 10, then x=floor( 11*x ) gives {8, 9, 1, 10, 6} and x=ceil( 10*x ) gives {9, 10, 2, 10, 7}.
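The two scalings can be compared side by side. The sketch below assumes k = 10 and a sample size chosen only for illustration; it also shows the built-in randi, which draws integers directly and is offered here as an alternative, not as the method of these notes.

```matlab
% Turn uniform numbers in (0,1) into random integers (k and n are assumed values).
k = 10;
n = 100000;
x = rand(1, n);

a = floor((k + 1) * x);     % integers 0, 1, ..., k
b = ceil(k * x);            % integers 1, 2, ..., k
c = randi([0 k], 1, n);     % built-in alternative: integers 0, ..., k

fprintf('floor: %d..%d, ceil: %d..%d, randi: %d..%d\n', ...
        min(a), max(a), min(b), max(b), min(c), max(c));
```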
• The only thing that may be disconcerting is that if we run our program
which is based on generating random numbers two times in a row, we will
get different numbers! It is sometimes useful (such as in debugging) to be
able to reproduce your results exactly. If you continually generate random
numbers during the same Matlab session, you will get a different sequence
each time, as you would expect.
• However, each time you start a session, the random number sequence begins
at the same place (0.9501) and continues in the same way. This is not true
to life, as every gambler knows.
• Clearly we don’t want to terminate our session every time we want to reproduce results so we need a way to “fix” this. To do this, we provide a seed for
our random number generator. Just like Matlab has a seed at the beginning
of a session we will give it a seed to start with at the beginning of our code.
Seeding rand
• The random number generator rand can be seeded with the statement
rand('state', n)
where n is any integer. By default, n is set to 0 when a Matlab session starts. (This is the older seeding syntax; newer Matlab releases recommend rng(n) instead.)
• Note that this statement does not generate any random numbers; it only initializes the generator.
• Sometimes people will use the system clock to obtain a different seed on every run:
rand('state', sum(100*clock))
• Try setting the seed for rand, generating a random number, then resetting the seed to the same value and generating another random number; you should get the same random number two times in a row; e.g.,
rand('state', 12345)
rand(1)
rand('state', 12345)
rand(1)
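A short check of reproducibility, written as a sketch with the legacy 'state' syntax used in these notes. The seed value and vector length are arbitrary assumptions; in recent Matlab releases the same idea is usually written with rng, and the two styles should not be mixed in one session.

```matlab
% Reseeding with the same value reproduces the same random numbers.
rand('state', 12345);    % legacy seeding syntax, as in these notes
a = rand(1, 3);

rand('state', 12345);    % reset the generator to the same seed
b = rand(1, 3);

same = isequal(a, b);    % true when both sequences are identical
disp(same);
```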
Exercise Use two calls to Matlab’s rand command to generate two random row
vectors of length 4. Now repeat the exercise where you reset the seed before
each call to rand. You should get the same vector each time in this latter case.
Probability Density Function
• If we measure a random variable many times then we can build up a distribution of the values it can take.
• As we take more and more measurements and plot them, we will see an
underlying structure or distribution of the values.
• This underlying structure is called the probability density function (PDF).
• For a discrete probability distribution the PDF associates a probability with
each value of a discrete random variable.
For example, let the random variable X be the number of rainy days in a
10 day period. The number of outcomes is 11 because we can have 0 days,
1 day, . . ., 10 days. The discrete PDF would be a plot of each probability.
• For a continuous probability distribution the random variables can take all
values in a given range so we can not assign probabilities to individual values.
Instead we have a continuous curve, called our continuous PDF, which allows
us to calculate the probability of a value within any interval.
• How will we determine the probability for a range of values from the continuous PDF? We will see that the probability is calculated as the area under
the curve between the values of interest.
• We first look at discrete PDFs.
Estimating the Probabilities & the Discrete PDF
• If we have an experiment (like rolling dice or flipping a coin) we may be
able to compute the exact probabilities for each outcome (as we did in the
Fermat-Pascal problem) and then plot the PDF.
• But what if we don’t have the exact probabilities? Can we estimate them
and then use the estimates to plot the PDF?
The answer is yes. We essentially take many measurements of the random
variable and accumulate the frequency that each outcome occurred. The
relative frequency of each outcome gives us an estimate of its probability.
For example, if a particular outcome occurs 50% of the time, then its estimated probability is 0.5.
As a concrete example, suppose we roll two dice and sum the numbers that appear. The individual outcomes on each die are still equally likely, but the possible sums, 2 through 12, are not. Suppose we roll the dice 1000 times and count the frequency with which each sum appears. The frequency plot for the sums is clearly nonuniform. This is because there is only one way to score 2, but six ways to score 7.
When we repeat the calculations using 3 and 4 dice and draw a frequency plot
for the sums, we begin to notice a trend in the shape of the plot and suspect
that there is an underlying function f (x) which is determining its shape.
• In fact, this plot suggests the normal curve, or ”bell shaped distribution”,
even though we expect to see that curve only for continuous variables.
How can we simulate rolling the dice and generating these plots using Matlab?
Here is a version where we don’t generate all the random numbers at once.
n = 1000;                            % number of throws
n_dice = 2;                          % number of dice per throw
freq = zeros(1, 6*n_dice);           % freq(s) counts how often the sum s occurs
for k = 1:n
    x = rand(1, n_dice);             % generate n_dice random numbers between 0 and 1
    x = ceil(6*x);                   % turn them into integers 1 to 6
    value = sum(x);                  % sum the dice
    freq(value) = freq(value) + 1;   % increment the frequency
end
We can also generate all the random numbers for all the throws for the two dice
using
throws = rand(n, n_dice);
throws = ceil(6*throws);
How can we use the frequency plot to estimate the probabilities?
• A frequency plot is simply a record of what happened.
• However, if we keep records long enough, we may see that the frequency
plot can be used to make statements about what is likely to happen in the
future. For two dice, we can say that a score of 7 is very likely and can even
guess how much more likely it is than say 3.
• This suggests that there is an underlying probability that influences which
outcomes occur. In cases where we are simply collecting data, we can turn
our frequencies into estimates of probability by normalizing by the total
number of cases we observed:
$$\text{estimated probability of result } \#i = \frac{\text{frequency of result } \#i}{\text{total number of results}}$$
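The normalization can be sketched in vectorized form, generating all throws at once. The number of throws n is an assumed value, and accumarray is used here as one convenient way to count; it is not taken from the notes.

```matlab
% Estimate the probability of each two-dice sum from relative frequencies.
n      = 10000;                              % number of throws (assumed value)
n_dice = 2;

throws = ceil(6 * rand(n, n_dice));          % each entry is an integer 1..6
totals = sum(throws, 2);                     % sum of the dice in each throw

% freq(s) = number of throws whose sum is s, for s = 1..12
freq = accumarray(totals, 1, [6 * n_dice, 1])';

prob_est = freq / n;                         % normalize by the number of throws
disp([2:12; prob_est(2:12)]);                % first row: sum, second row: estimate
```

A call such as bar(2:12, prob_est(2:12)) would then draw the estimated discrete PDF.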
Now let’s compare the approximate probabilities found using this formula for the
case where we used n throws of 2 dice and then compare this with their exact
probabilities.
To calculate the actual probabilities of each sum we determine the possible outcomes of rolling the dice. Because there are more outcomes in this case, it’s
better to make an outcome table where we list all possible outcomes of the roll
of 2 (fair) dice.
        1      2      3      4      5      6
1     (1,1)  (2,1)  (3,1)  (4,1)  (5,1)  (6,1)
2     (1,2)  (2,2)  (3,2)  (4,2)  (5,2)  (6,2)
3     (1,3)  (2,3)  (3,3)  (4,3)  (5,3)  (6,3)
4     (1,4)  (2,4)  (3,4)  (4,4)  (5,4)  (6,4)
5     (1,5)  (2,5)  (3,5)  (4,5)  (5,5)  (6,5)
6     (1,6)  (2,6)  (3,6)  (4,6)  (5,6)  (6,6)
• We see that there are a total of 36 combinations of the dice.
• If we want to compute the probability of an outcome, how would we do it? As an example, consider the probability of the outcome (3,5). The two dice are independent, the probability of rolling a 3 with the first die is 1 out of 6, and the probability of rolling a 5 with the second die is also 1 out of 6, so the probability of the outcome (3,5) is
$$\frac{1}{6} \times \frac{1}{6} = \frac{1}{36}.$$
• However, we want to determine the probability that a particular sum will occur. The number of outcomes resulting in each sum from 2 to 12 is:
2 → 1    3 → 2    4 → 3    5 → 4    6 → 5    7 → 6
8 → 5    9 → 4    10 → 3   11 → 2   12 → 1
• How do we compute the probability that a particular sum will occur? Consider the probability of the sum 5 occurring. There are a total of 4 ways out
of 36 outcomes that yield the sum of 5 so its exact probability is 4/36=1/9.
• For a pair of fair dice, the exact probability of each total is:
sum      2     3     4     5     6     7     8     9     10    11    12
exact    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
approx   .03   .06   .08   .11   .14   .17   .14   .11   .08   .06   .03
Note that the probabilities are all positive and the sum of the probabilities is 1,
as expected.
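The exact probabilities can also be obtained by letting Matlab enumerate the 36 outcomes of the table. This is a minimal sketch; the variable names are illustrative.

```matlab
% Count how many of the 36 ordered outcomes give each sum.
[d1, d2] = meshgrid(1:6, 1:6);               % all pairs (first die, second die)
totals   = d1(:) + d2(:);                    % the 36 sums

counts     = accumarray(totals, 1, [12, 1])';  % counts(s) = ways to roll the sum s
prob_exact = counts(2:12) / 36;                % exact probability of each sum 2..12

disp([2:12; counts(2:12)]);                  % first row: sum, second row: number of ways
fprintf('sum of probabilities = %.4f\n', sum(prob_exact));
```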
Now we compare these to our estimated probabilities. They do not agree exactly, but for large n the estimates are a “good” approximation. Remember that we are using a frequency interpretation of probability.
sum →       2      3      4      5      6      7      8      9      10     11     12
exact       .03    .06    .08    .11    .14    .17    .14    .11    .08    .06    .03
n = 100     .02    .05    .07    .09    .15    .12    .15    .11    .15    .07    .02
n = 1000    .034   .048   .082   .123   .136   .171   .127   .115   .091   .049   .024
n = 10000   .026   .0555  .0867  .1181  .1358  .1711  .1347  .1088  .0822  .0553  .0258
• Our plot of the exact probabilities is an example of a discrete PDF and the corresponding plot of our estimated probabilities is an approximation to the discrete PDF. The PDF assigns a probability p(x) to each outcome x in our set Ω of all possible outcomes. In our example there were 11 outcomes and
Ω = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} .
• Recall that the discrete PDF gives the probability that a random variable takes a given value.
• What properties should a discrete PDF have? Two obvious requirements for a discrete PDF are
1. $0 \le p(x_i)$ for each $x_i$
2. $\sum_{i=1}^{n} p(x_i) = 1$, where n is the number of possible outcomes.
Exercise. Suppose you have a die which is not fair in the sense that each side
has the following probabilities
2
2
2
1
1
1
2→
3→
4→
5→
6→
1→
9
9
9
9
9
9
Would you bet on rolling an even or an odd number? Why?
Write a code to simulate rolling one of these loaded dice and estimate the PDF.
Graph it.
Continuous Probability Density Functions
• We have seen that the discrete PDF provides us with probabilities for each
value in the finite set of outcomes Ω. The continuous PDF provides us with
probabilities for a range of values in the set of outcomes.
• Recall that in the discrete case, the sum of all probabilities had to be 1. The
analog of that for the continuous case is that the total area under the curve
is 1.
• If we determine the area under the curve between two points, then it is just
the probability that the outcome will be between those values.
• For example, in the plot below the probability that the random variable is ≤ a is the area under the curve from −∞ to a; that is,
$$\int_{-\infty}^{a} f(x)\,dx$$
is the probability that the random variable is ≤ a.
Mean, Variance and Expected Values
• When we have a large collection of numbers (such as scores from a large
lecture exam) we are not usually interested in the individual numbers but
rather in certain descriptive quantities such as average and how far from
the average a certain percentage of students are. The same is true for a
probability density function.
• Three important characteristics of a PDF are its mean or expected value or
average, its variance, and the standard deviation.
• For a discrete PDF, assume that we have a finite set Ω of all possible n outcomes and the ith outcome for the random variable X is denoted by x_i. We can calculate the expected value (i.e., the mean) of a random variable X, denoted either µ or E[X], from the formula
$$\mu = \sum_{i=1}^{n} p(x_i)\, x_i$$
• For a continuous PDF, where the random variable ranges between a and b, we can calculate the mean from the formula
$$\mu = \int_a^b p(x)\, x \, dx$$
• If you recall the definition of the center of mass of a laminar sheet
$$\frac{\int_D x\, \rho(x)\, dA}{\int_D \rho \, dA}$$
where ρ is the density, you can see an analog of this formula because in our case $\int_a^b p \, dx = 1$.
• To understand the discrete case, let’s look at an example.
Consider the situation where we have the heights in inches of the 12 members
of a women’s basketball team given by
{69, 69, 66, 68, 71, 65, 67, 66, 66, 67, 70, 72}
We know that if we want the average height of the players we simply compute
$$\frac{69 + 69 + 66 + 68 + 71 + 65 + 67 + 66 + 66 + 67 + 70 + 72}{12} = \frac{816}{12} = 68$$
We can also interpret 68 as the mean or expected value of a random variable. To see this, consider choosing one player at random and let the random variable X be the player’s height. Then the expected value of X is 68, which can also be calculated by forming the product of each x_i and its corresponding probability
$$69 \cdot \frac{1}{12} + 69 \cdot \frac{1}{12} + 66 \cdot \frac{1}{12} + \cdots + 72 \cdot \frac{1}{12}$$
which is just the formula we gave for the mean/expected value.
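The equivalence of the plain average and the probability-weighted sum can be checked numerically. A minimal sketch using the heights listed above; the variable names are illustrative.

```matlab
% The expected value as a probability-weighted sum equals the ordinary average.
heights = [69 69 66 68 71 65 67 66 66 67 70 72];

p  = ones(size(heights)) / numel(heights);   % each player is chosen with probability 1/12
mu = sum(p .* heights);                      % sum of p(x_i) * x_i

fprintf('weighted sum = %.4f, mean() = %.4f\n', mu, mean(heights));
```

With unequal probabilities in p the same line would still give the expected value, while mean(heights) would not.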
Example Suppose we are going to toss a (fair) coin 3 times. Let X, the random
variable, be the number of heads which appear. Find the mean or expected value
and explain what it means.
We know that there are 2^3 = 8 possible outcomes, which are
HHH, HHT, HTH, HTT, TTT, TTH, THT, THH
and each has a probability of 1/8. Now heads can occur 0 times, 1 time, 2 times or 3 times. The probability that it occurs 0 times is 1/8, that it occurs 1 time is 3/8, that it occurs 2 times is 3/8 and that it occurs 3 times is 1/8. Therefore the mean or expected value is
$$0 \cdot \frac{1}{8} + 1 \cdot \frac{3}{8} + 2 \cdot \frac{3}{8} + 3 \cdot \frac{1}{8} = \frac{12}{8} = \frac{3}{2}.$$
This says that if we toss the coin three times, the number of times we expect heads to appear is 3/2, which is one-half the total possible times; exactly what we expected! Another way to get this answer is to consider the set {3, 2, 2, 1, 0, 1, 1, 2} indicating the number of heads appearing in each of the 8 outcomes and average them to get
(3 + 2 + 2 + 1 + 0 + 1 + 1 + 2)/8 = 12/8 = 3/2.
Statisticians use the variance and standard deviation of a random variable X as a way of measuring its dispersion, or the degree to which it is “scattered.”
The formula for the variance Var(X) of a discrete PDF is
$$\mathrm{Var}(X) = \sum_{i=1}^{n} p(x_i)\,(x_i - \mu)^2$$
and for a continuous PDF
$$\mathrm{Var}(X) = \int_a^b p(x)\,(x - \mu)^2 \, dx$$
where µ is the expected value or mean.
The standard deviation σ(X) is just $\sqrt{\mathrm{Var}(X)}$.
Note that Var(X) is the mean or expected value of the function (X − µ)^2, which
measures the square of the distance of X from its mean. It is for this reason that
Var(X) is sometimes called the mean square deviation, and σ(X) is called the
root mean square deviation. Var(X) will be larger if X tends to wander far
away from its mean, and smaller if the values of X tend to cluster near its mean.
The reason we take the square root in the definition of σ(X) is that Var(X)
is the mean/expected value of the square of the deviation from the mean, and
thus is measured in square units. Its square root therefore gives us a measure in
ordinary units.
Example Compute the mean (expected value), variance and standard deviation
for our example where we throw 2 dice and sum the values. Recall that here the
random variable X is the sum of the values.
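Returning to our probability table
sum     2     3     4     5     6     7     8     9     10    11    12
prob    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
we can compute the mean or expected value as
$$\mu = 2 \cdot \frac{1}{36} + 3 \cdot \frac{2}{36} + 4 \cdot \frac{3}{36} + \cdots + 11 \cdot \frac{2}{36} + 12 \cdot \frac{1}{36} = 7$$
and the variance as
$$\mathrm{Var}(X) = \frac{1}{36}(2-7)^2 + \frac{2}{36}(3-7)^2 + \frac{3}{36}(4-7)^2 + \cdots + \frac{1}{36}(12-7)^2 = 5.8333$$
with a standard deviation of $\sigma(X) = \sqrt{5.8333} = 2.415$. The mean is what we expected from looking at the PDF. Note that the expected value need not be a possible outcome: if, for some reason, 7 were impossible to roll but the remaining probabilities stayed symmetric about 7, it would still be the expected value.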
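These three quantities follow directly from the discrete formulas. A minimal sketch for the two-dice sum; the variable names (v for the variance, to avoid shadowing Matlab's var function) are illustrative choices.

```matlab
% Mean, variance and standard deviation of the sum of two fair dice.
x   = 2:12;                               % possible sums
pdf = [1 2 3 4 5 6 5 4 3 2 1] / 36;       % exact probability of each sum

mu    = sum(pdf .* x);                    % expected value: sum of p(x_i) * x_i
v     = sum(pdf .* (x - mu).^2);          % variance: sum of p(x_i) * (x_i - mu)^2
sigma = sqrt(v);                          % standard deviation

fprintf('mean = %.4f, variance = %.4f, std = %.4f\n', mu, v, sigma);
```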
Normal distribution
In probability theory, the normal (or Gaussian) distribution, is a continuous PDF
that is often used as a first approximation to describe real-valued random variables
that tend to cluster around a single mean value. The graph of the associated
PDF is the well-known bell-shaped curve known as the Gaussian function or bell
curve given by
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where the parameter µ is the mean (location of the peak) and σ^2 is the variance (the measure of the width of the distribution). The distribution with µ = 0 and σ^2 = 1 is called the standard normal distribution.
The normal distribution is considered the most basic continuous PDF and it is
used throughout statistics, natural sciences, and social sciences as a simple model
for complex phenomena.
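Since probabilities for a continuous PDF are areas under the curve, the Gaussian function can be checked numerically. The sketch below uses the trapezoidal rule (trapz) on a fine grid; truncating the infinite range at 10 standard deviations, the grid size, and the choice a = 1 are all assumptions made for illustration.

```matlab
% Numerically integrate the normal PDF with the trapezoidal rule.
mu    = 0;                                 % mean (standard normal assumed here)
sigma = 1;                                 % standard deviation
f = @(x) exp(-(x - mu).^2 / (2 * sigma^2)) / sqrt(2 * pi * sigma^2);

% Total area: the tails beyond 10 standard deviations are negligible.
x = linspace(mu - 10 * sigma, mu + 10 * sigma, 20001);
total_area = trapz(x, f(x));

% Probability that the random variable is <= a: area from "minus infinity" to a.
a  = 1;
xa = linspace(mu - 10 * sigma, a, 20001);
p_le_a = trapz(xa, f(xa));

fprintf('total area = %.6f, P(X <= %g) = %.4f\n', total_area, a, p_le_a);
```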
randn
Matlab has a built in command randn which returns pseudorandom values drawn
from the standard normal distribution. We can use it to generate values from a
normal distribution with a particular mean and standard deviation; for example,
if we want to generate 100 points with mean 1 and standard deviation 2 we use
r = 1 + 2 .* randn(100, 1);
Example Generate 1000 random numbers using rand and randn and make
a histogram of the results. Compute the mean and standard deviation of each
using the Matlab commands mean and std. The command to make a histogram
of y values is hist(y).
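One possible way to carry out this example is sketched below. The bin count of 10 is an assumed value, and hist is called with output arguments so that it returns the bin counts without opening a figure; calling hist(u) or hist(g) with no outputs draws the histogram instead.

```matlab
% Compare samples from rand (uniform on (0,1)) and randn (standard normal).
n = 1000;
u = rand(n, 1);
g = randn(n, 1);

fprintf('rand : mean = %.4f, std = %.4f\n', mean(u), std(u));
fprintf('randn: mean = %.4f, std = %.4f\n', mean(g), std(g));

% Bin counts and bin centers for each sample (10 bins assumed).
[counts_u, centers_u] = hist(u, 10);
[counts_g, centers_g] = hist(g, 10);
```

For the uniform sample the mean should be near 1/2 and the standard deviation near sqrt(1/12); for the normal sample they should be near 0 and 1.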
The Cumulative Density Function
If we have a discrete system with a known PDF, the value of the PDF at x_i, say pdf(x_i), is the probability that the outcome x_i will occur.
But suppose we want to know the chances of rolling a 7 or less, using two dice. This is the probability that the outcome is less than or equal to 7; it is so important that it has its own name, the cumulative density function or CDF (more commonly called the cumulative distribution function).
$$\mathrm{cdf}(x) = \text{probability that the outcome is less than or equal to } x = \sum_{y \le x} \mathrm{pdf}(y)$$
Example If we return to the example when we rolled one die we know that
each number was equally probable so each had probability of 1/6. What is the
CDF for each xi ∈ {1, 2, 3, 4, 5, 6}?
If we ask what is cdf(6) then it should be 1 because we know it is 100% sure that
we will roll a number ≤ 6. For the other values we simply sum up the PDFs.
cdf(1) = 1/6, cdf(2) = 1/6 + 1/6 = 1/3, etc.
We see that for our discrete PDF
• cdf(x) is a piecewise constant function defined for all x;
• cdf(x) = 0 for x less than the smallest possible outcome;
• cdf(x) = 1 for x greater than or equal to the largest outcome;
• cdf(x) is essentially the discrete integral of pdf(x) over the appropriate interval;
• cdf(x) is monotonically non-decreasing (it is not strictly invertible because it is piecewise constant, but for any r in (0, 1] we can find the smallest outcome x with r ≤ cdf(x));
• the probability that x is between x1 and x2 (x1 < x ≤ x2) is
cdf(x2) − cdf(x1).
Return again to the problem of rolling two dice and calculating the probability
that the sum of the dice is a number between 2 and 12. We found the exact
probability of each sum as:
sum      2     3     4     5     6     7     8     9     10    11    12
exact    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
approx   .03   .06   .08   .11   .14   .17   .14   .11   .08   .06   .03
For a discrete case like this, it’s easy to make the corresponding cumulative
density function table:
sum      2     3     4     5      6      7      8      9      10     11     12
cdf      1/36  3/36  6/36  10/36  15/36  21/36  26/36  30/36  33/36  35/36  36/36
approx   .03   .08   .16   .28    .42    .58    .72    .83    .92    .97    1.00
The CDF is actually defined for all x, not just the values 2 through 12.
For example, the table tells us that cdf(4.3) = cdf(4) = 0.16 and cdf(15) = 1. The latter just tells us that the probability of rolling two dice and getting a sum ≤ 15 is 1.
Here is the plot of the CDF for 2 dice.
Matlab has a built in command cumsum which can assist us in calculating the
CDF from a discrete PDF. For example, to find the cumulative sum of integers
from 1 to 5 we use
cumsum(1:5)
and the response is [ 1 3 6 10 15 ].
Here is a script to draw the bar graph for the CDF.
x = 2:12;
pdf = [ 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1 ] / 36;
cdf = cumsum( pdf );
bar( x, cdf )
xlabel( 'Score' )
ylabel( 'CDF(Score)' )
title( 'CDF for 2 dice' )
Using the CDF for Simulation
• The CDF can be used to simulate the behavior of a discrete system.
• As an example, consider the problem of simulating rolling two dice and using
the CDF to estimate the probabilities where we are given the CDF as
cdf = {.03, .08, .16, .28, .42, .58, .72, .83, .92, .97, 1.00}
• First recall that the probability that x is between x1 and x2 is cdf(x2) −
cdf(x1).
• For our problem we generate a random number r between 0 and 1 for a
probability. We determine i such that
cdf(i − 1) < r ≤ cdf(i)
However, when i = 1 this doesn’t work because cdf(0) is not defined, so we prepend 0 to our CDF:
cdf = {0, .03, .08, .16, .28, .42, .58, .72, .83, .92, .97, 1.00}
With this indexing, entry i of the CDF corresponds to the sum i. For example, if r = .61 then i = 8, because cdf(7) = .58 < .61 ≤ cdf(8) = .72.
• We do this lots of times and keep track of how many random numbers were assigned to i = 2, i = 3, etc.
• Then our estimate for the probability of the sum being say 2 (i.e., corresponding to i = 2) is the number of times we have chosen a random number
between 0 and 0.03 divided by the total number of random numbers generated.
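function prob = cd_sim ( n )
% Estimate the probabilities of the sums 2..12 from the rounded CDF using n samples.
cdf = [0 .03 .08 .16 .28 .42 .58 .72 .83 .92 .97 1.00];
count = zeros(1, 12);
for k = 1:n
    r = rand(1);
    for i = 2:12
        if cdf(i-1) < r && r <= cdf(i)
            count(i) = count(i) + 1;
        end
    end
end
prob = count(2:12)/n;     % estimated probability of each sum
x = 2:12;
bar(x, prob)
end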
[Figure: bar chart comparing the exact (blue) and approximate (red) probabilities]
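A variant of the same idea is sketched below. It differs from the function above in two assumed ways: the CDF is built from the exact probabilities with cumsum instead of the rounded table, and find locates the first entry with r ≤ cdf(i) in place of the inner loop. The sample size is also an assumed value.

```matlab
% Sample two-dice sums by inverting the exact CDF.
n   = 20000;                              % number of samples (assumed value)
pdf = [1 2 3 4 5 6 5 4 3 2 1] / 36;       % exact probabilities of the sums 2..12
cdf = cumsum(pdf);
cdf(end) = 1;                             % guard against round-off in the last entry

count = zeros(1, 11);                     % count(i) belongs to the sum i + 1
for k = 1:n
    r = rand();
    i = find(r <= cdf, 1);                % smallest index with r <= cdf(i)
    count(i) = count(i) + 1;
end

prob_est = count / n;                     % estimated probabilities
max_err  = max(abs(prob_est - pdf));      % largest deviation from the exact values
fprintf('largest error in the estimated probabilities: %.4f\n', max_err);
```

Because the exact CDF is used, the estimates converge to the exact probabilities as n grows, whereas the rounded table introduces a small systematic error.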
Exercise. Let’s return again to the problem where we have a die which is not
fair in the sense that each side has the following probabilities
2
2
2
1
1
1
2→
3→
4→
5→
6→
1→
9
9
9
9
9
9
Compute the exact CDF. What is the probability that you will roll a 3 or less?
Write a program to use your CDF to approximate the exact probabilities. Compare.
The Monte Carlo Method
The Monte Carlo Method (MCM) uses random numbers to solve problems. Here
we will view MCM as a sampling method.
First we will see how we can sample different geometric shapes and then apply
the sampling to different problems.
The MC method is named after the city of Monte Carlo in the principality of Monaco, because of roulette, a simple random number generator. The name and the systematic development of Monte Carlo methods date from about 1944.
The real use of Monte Carlo methods as a research tool stems from work on the atomic bomb during the Second World War. This work involved a direct simulation of the probabilistic problems concerned with random neutron diffusion in fissile material; but even at an early stage of these investigations, von Neumann and Ulam refined this direct simulation with techniques such as “Russian roulette” and “splitting” methods. However, the systematic development of these ideas had to await the work of Harris and Herman Kahn in 1948. Other early researchers in the field were Fermi, Metropolis, and Ulam.
• We can think of the output of rand(), or any stream of uniform random
numbers, as a method of sampling the unit interval.
• We expect that if we plot the points along a line, they would be roughly evenly distributed.
• Random numbers are actually a very good way of exploring or sampling many geometrical regions. Since we have a good source of random numbers for the unit interval, it’s worthwhile to think about how we could modify such numbers to sample these other regions.
Sampling the interval [a, b]
Suppose we need uniform random numbers, but we want them in the interval
[a, b].
We have seen that we can use rand, but now we have to shift and scale. The
numbers have to be scaled by b − a and shifted by a. Why? Because rand() lies in
[0, 1], the product (b − a) * rand() lies in [0, b − a], and adding a moves it to [a, b].
r = rand();
s = a + (b - a) * r
or we can do this for many values at once:
r = rand(10000, 1);
s = a + (b - a) * r;
[Figure: 1000 uniformly sampled points on [1, 10].]
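The shift-and-scale step can be checked numerically. The following is a minimal sketch; the values of a, b, and n are illustrative assumptions, not values prescribed by the notes.

```matlab
% Sketch: draw n uniform samples on [a, b] and inspect them.
% a, b, and n are illustrative assumptions.
a = 1; b = 10; n = 1000;
r = rand(n, 1);          % uniform samples on [0, 1]
s = a + (b - a) * r;     % scale by (b - a), then shift by a
fprintf('min = %f, max = %f, mean = %f\n', min(s), max(s), mean(s));
% The sample mean should be close to the midpoint (a + b) / 2.
```

Every sample should fall inside [a, b], and the sample mean should drift toward the midpoint as n grows.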
Sampling the Unit Square
We might need uniform random numbers in the unit square.
We could compute an x and y value separately for each point
x =rand();
y =rand();
or we can do this for many values at once to create a 2 × n array:
xy =rand(2, 10000);
Of course if our domain is not the unit square, then we must map the points.
Sampling the Unit Circle
Suppose we need to evenly sample points in the unit circle; i.e., the circle centered
at (0,0) with radius 1.
The first thing that might come to mind is to try to choose a radius from 0 to
1, and then choose an angle between 0 and 2π
r = rand();
t = 2 * pi * rand();
but although this seems “random” it is not a uniform way to sample the circle
as the following figure demonstrates. More points in a circle have a big radius
than a small one, so choosing the radius uniformly actually is the wrong thing to
do!
How can we fix this problem?
The problem is that when we choose the radius to vary uniformly, we’re saying
there are the same number of points at every value of the radius we choose
between 0 and 1. But of course, there aren’t. If we double the radius of a circle,
the area of the circle increases by a factor of 4. So to sample area uniformly, we
need r^2 to be uniformly random; in other words, we want to set r to the square
root of a uniform random number.
r = sqrt ( rand() );
t = 2 * pi * rand();
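To see the effect of the square root, one can sample many points and count how many fall within half the radius; for uniform sampling that fraction should be near 1/4, the ratio of the areas. This is a sketch only, and the sample size n is an assumption.

```matlab
% Sketch: uniform sampling of the unit disk with r = sqrt(u).
n = 10000;                      % number of points (an assumption)
r = sqrt(rand(n, 1));           % radius: square root of a uniform number
t = 2 * pi * rand(n, 1);        % angle: uniform on [0, 2*pi]
x = r .* cos(t);                % convert polar to Cartesian coordinates
y = r .* sin(t);
frac_inner = sum(r < 0.5) / n;  % fraction inside the disk of radius 1/2
fprintf('fraction with r < 0.5: %f (area ratio is 0.25)\n', frac_inner);
```

Replacing `sqrt(rand(n, 1))` by `rand(n, 1)` should push this fraction toward 1/2, which is the clustering near the center described above.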
Another approach to sampling the circle is to use rejection.
Suppose we sample the square that contains the circle, and then only plot the
points that are in the circle?
Now we’re sure to get uniformly distributed points!
We’ll also have to reject many points that we generate, which of course is not
optimal.
We embed the unit circle in the square whose vertices are (-1,-1), (1,-1), (1,1),
(-1,1).
n = 4000;
x = -1 + 2 * rand ( n, 1 );
y = -1 + 2 * rand ( n, 1 );
i = find ( x.^2 + y.^2 < 1 );
plot ( x(i), y(i), 'b*' )
In this example, there were 3110 points in the circle out of 4000 generated. Is
there any significance in this ratio?
[Figure: Sampling using rejection.]
Sampling the Surface of the Unit Sphere
Another important shape to sample is the surface of the unit sphere. We can
do this, but we’ll have to cheat slightly and use normal random numbers rather
than uniform ones. Recall that the command randn generates these.
It turns out that if you want a point on the surface of the unit sphere, you can
just make a vector of 3 normal random numbers, and scale by its length:
xyz = randn ( 3, 1000 );
for j = 1 : 1000
xyz(:,j) = xyz(:,j) ./ norm ( xyz(:,j) );
end
scatter3 ( xyz(1,:), xyz(2,:), xyz(3,:) )
This will also work for higher dimensional spheres!
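As a sketch of the higher dimensional case, the same construction can be written without a loop. The dimension d and the number of points n below are illustrative assumptions.

```matlab
% Sketch: n points on the surface of the unit sphere in d dimensions.
% d and n are illustrative assumptions; d = 3 gives the case above.
d = 5; n = 1000;
p = randn(d, n);                 % each column holds d normal random numbers
len = sqrt(sum(p.^2, 1));        % Euclidean length of every column
p = p ./ repmat(len, d, 1);      % scale every column to length 1
fprintf('largest deviation from unit length: %e\n', max(abs(sqrt(sum(p.^2, 1)) - 1)));
```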
Exercise Use MCM to sample the “L”-shaped region
0 ≤ x ≤ 10
0 ≤ y ≤ 4 for 0 ≤ x ≤ 2 and 0 ≤ y ≤ 2 for x > 2
Use 1000 points; plot.
Using MCM to approximate π
• We know that the area of the unit circle (with radius 1) is just π. Suppose
we enclose the unit circle in a square, say the one centered at the origin with
side 2, so that its area is 4.
• We want to randomly “throw darts” at the square. The fraction of darts that
land inside or on the unit circle approximates the ratio of the two areas, π/4,
so our approximation to π will be
$$\pi \approx 4 \times \frac{\text{the number of darts inside or on the unit circle}}{\text{the total number of darts thrown}}$$
Here is a plot of the region in the first quadrant.
This method is easily implemented in Matlab. We simply generate a random
point (x, y) and decide if it is inside the circle or not. Because rand generates
random numbers between 0 and 1 we either have to map the numbers between
-1 and 1 or use numbers between 0 and 1 to calculate an approximation to π/4 (i.e.,
in the first quadrant) and multiply the result by 4; we choose the latter.
Input: n, the number of random points to generate
Output: approximation to π
count = 0;
for i = 1:n
    x = rand(1); y = rand(1);
    if x^2 + y^2 <= 1
        count = count + 1;
    end
end
pi_approx = 4 * count / n
It is more efficient to generate all the random numbers at once and do vector
operations. Note that to square each entry in the vector x (i.e., an element-wise power rather than a matrix power) we have to use the appropriate syntax, x.^2. Here the command find finds all
occurrences where the condition is satisfied; it returns a vector with these indices
so the total number of times this condition was satisfied is the length of this
vector.
rand('state',12345)                % seed the generator (legacy syntax)
x = rand(1,n);                     % generate a row of random numbers
y = rand(1,n);
i = find ( x.^2 + y.^2 <= 1 );     % find locations where this condition is satisfied
m = length(i);
pi_approx = 4 * m / n
err = abs ( pi - pi_approx )
Here are some approximations and the corresponding errors for some values of n:
n        10^1     10^2     10^3     10^4     10^5
Approx   3.2      3.16     3.164    3.1292   3.1491
Error    0.0584   0.0184   0.0224   0.0124   0.0075
Note that the error is NOT monotonically decreasing as n increases.
Buffon’s Needle Problem
An interesting related problem is Buffon’s Needle problem which was first proposed in 1777.
Here’s the problem (in a simplified form).
• Suppose you have a table top on which you have drawn horizontal lines 1
inch apart.
• You now drop a needle of length 1 inch onto the table.
• What is the probability that the needle will be lying across (or touching) one
of the lines?
• Actually, one can show that the probability is just 2/π.
So if we could simulate this on a computer, then we could have another method
for approximating π
• Let’s analyze the problem to see how we can implement it.
• Let’s assume without loss of generality that the lines are horizontal, they are
spaced 1 unit apart and the length of the needle is 1 unit.
• Assume, as in the figure, that we have dropped a needle and that we know the
location of the middle of the needle (actually we just need the y-coordinate)
and the angle θ the needle makes with the horizon.
• So in the figure we see that the vertical side of the triangle has length (1/2) sin θ.
• Since we know the y-coordinate of the middle of the needle we know the y
coordinates of the ends of the needle:
$$y \pm \tfrac{1}{2}\sin\theta$$
• Here are the 3 situations we can have:
[Figure: three needle positions relative to the lines y = 0 and y = 1.]
• If the top of the needle has a y-coordinate greater than one, then the needle
touches the line, i.e., we have a “hit”. If the bottom of the needle has a
y-coordinate less than zero, then it touches the other line and we have a
“hit”.
• Since it is known that the probability that the needle will touch the line is
2/π, then
$$\frac{\text{number of hits}}{\text{number of points}} \approx \frac{2}{\pi}$$
and thus
$$\pi \approx 2 \times \frac{\text{number of points}}{\text{number of hits}}$$
One way to see that the probability is 2/π is to note that the area of the shaded portion in the
figure below is the definite integral of (1/2) sin θ from 0 to π, which equals 1.
The area of the entire rectangle is (1/2)(π) = π/2. So the probability of a hit is
1/(π/2) = 2/π, which is approximately 0.6366197.
Exercise Write pseudo code for an algorithm to approximate π using Buffon’s
needle problem. Then modify the code for estimating π using the unit circle to
solve this problem.
Using Monte Carlo to Approximate an Integral
• Suppose we want to evaluate
$$\int_a^b f(x)\,dx$$
• If f (x) ≥ 0 for a ≤ x ≤ b then we know that this integral represents the
area under the curve y = f (x) and above the x−axis.
• Standard deterministic numerical integration rules approximate this integral
by evaluating the integrand f (x) at a set number of points and multiplying
by appropriate weights.
– For example, the midpoint rule is
$$\int_a^b f(x)\,dx \approx (b-a)\, f\!\left(\frac{a+b}{2}\right)$$
– Simpson’s rule is
$$\int_a^b f(x)\,dx \approx \frac{b-a}{6}\left[ f(a) + 4 f\!\left(\frac{a+b}{2}\right) + f(b) \right]$$
• Deterministic numerical integration (or numerical quadrature) formulas have
the general form
$$\int_a^b f(x)\,dx \approx \sum_{i=1}^{q} f(x_i)\, w_i$$
where the points x_i are called quadrature points and the values w_i are quadrature weights.
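As a small illustration of the two rules, here is a sketch that applies them to a test integrand. The choice f(x) = x^2 on [0, 2] is an assumption made for the example (its exact integral is 8/3).

```matlab
% Sketch: midpoint rule and Simpson's rule on one interval.
% The integrand and the interval are illustrative assumptions.
f = @(x) x.^2;
a = 0; b = 2;
mid  = (b - a) * f((a + b) / 2);                               % one quadrature point
simp = (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b));       % three quadrature points
fprintf('midpoint = %f, Simpson = %f, exact = %f\n', mid, simp, 8/3);
```

Simpson's rule integrates polynomials up to degree three exactly, so it reproduces 8/3 here, while the midpoint rule gives 2.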
To approximate the integral using Monte Carlo (which is a probabilistic approach)
we first extend the idea we used to approximate π.
Assume for now that the integrand is greater than or equal to zero in the interval
[a, b]; then we
• choose a rectangle which includes the area we want to determine; e.g., if
f(x) ≤ M for all a ≤ x ≤ b then choose the rectangle with base b − a and
height M
– generate a random point in the rectangle
– determine if the random point is in the desired region
– take the area under the curve as a fraction of the area of the rectangle
• First we generate two random numbers, call them (x, y), and map them to the
rectangle enclosing our integral.
• In our method for approximating π we checked to see if x^2 + y^2 ≤ 1. What
do we check in this case?
At the given x we want to see if the random point lies on or below the
curve. It is easy to check to see if
y ≤ f(x) where x is the random point
If this is true, then the point lies in the desired area and our counter is
incremented.
• Our approximation to the area, i.e., the value of the integral, is simply
the usual fraction times the area of the testing rectangle (whose area is
M(b − a)):
$$\frac{\text{number of hits}}{\text{number of points}} \times M(b-a)$$
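[Figure: bounding box with the corners (2,0) and (2,4) labeled; 12 random points generated and 5 in the desired region.]
Here we are approximating
$$\int_0^2 x^2\,dx = \left.\frac{x^3}{3}\right|_0^2 = \frac{8}{3} \approx 2.67$$
Using 12 points we have that the approximation is
$$\frac{5}{12}(8) = \frac{40}{12} \approx 3.33$$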
where 8 is the area of the box we are sampling in and 5 out of 12 points were in
the desired region. Not a good approximation but we only used 12 points!
How do you choose the bounding box? It is not unique.
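The bounding-box procedure can be sketched as follows for the integral of x^2 over [0, 2] with the box [0, 2] × [0, 4]. The number of points n is an assumption; with many more than 12 points the estimate should move much closer to 8/3.

```matlab
% Sketch: bounding-box (hit-or-miss) Monte Carlo for the integral of x^2 on [0, 2].
f = @(x) x.^2;
a = 0; b = 2; M = 4;             % box [a, b] x [0, M]; M bounds f on [a, b]
n = 100000;                      % number of random points (an assumption)
x = a + (b - a) * rand(n, 1);    % random x in [a, b]
y = M * rand(n, 1);              % random y in [0, M]
hits = sum(y <= f(x));           % points on or below the curve
estimate = hits / n * M * (b - a);
fprintf('estimate = %f, exact = %f\n', estimate, 8/3);
```

Any M with f(x) ≤ M works; a taller box wastes more points, so the estimate tends to be noisier for the same n.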
Example
Approximate the integral
$$\int_0^{1/2} \cos(\pi x)\,dx = \frac{1}{\pi} \approx 0.3183$$
using MC.
We choose the bounding box to be [0, .5] × [0, 1]. The results are given in the
table.
n          approx    error
100        0.3150    0.0033
1000       0.3065    0.0118
10000      0.3160    0.0023
100,000    0.3196    0.0013
In the example of approximating $\int_0^2 x^2\,dx$ we saw that there are a lot of random
points generated which do not lie in the region under the curve f(x) = x^2.
We will now look at using MCM for approximating an integral from a slightly
different viewpoint. This has the advantage that we don’t have to take a bounding
box and we often get a better approximation with the same number of points.
Recall that the Riemann sum for approximating $\int_a^b f(x)\,dx$ with the uniform
partition
$$x_0 = a, \quad x_n = b, \quad x_i = x_{i-1} + \Delta x \quad \text{where } \Delta x = \frac{b-a}{n}$$
is
$$\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} f(\tilde{x}_i)\,\Delta x = \frac{b-a}{n} \sum_{i=1}^{n} f(\tilde{x}_i)$$
Here $\tilde{x}_i$ is any point in the interval $[x_{i-1}, x_i]$, so if $\tilde{x}_i$ is the midpoint $(x_{i-1} + x_i)/2$
then we have the midpoint rule. As n → ∞ we get the integral.
The MCM approach to this problem can be written as
$$\int_a^b f(x)\,dx \approx \frac{b-a}{n} \sum_{i=1}^{n} f(x_i)$$
where now the points x_i are random points in the interval [a, b]. This is not
a deterministic formula because we choose the x_i randomly. For large n, we can
get a very good approximation.
Another way to look at this expression is to recall from calculus that the average
of the function f(x) over [a, b] is
$$\bar{f} = \frac{1}{b-a} \int_a^b f(x)\,dx \implies \int_a^b f(x)\,dx = (b-a)\,\bar{f} \approx (b-a)\,\frac{1}{n} \sum_{i=1}^{n} f(x_i)$$
One can show rigorously that the error is $O(1/\sqrt{n})$.
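The averaging formula translates almost directly into code. The sketch below applies it to the integral of cos(πx) over [0, 1/2], whose exact value is 1/π; the sample size n is an assumption.

```matlab
% Sketch: sample-mean Monte Carlo, integral = (b - a) * average of f.
f = @(x) cos(pi * x);
a = 0; b = 0.5;
n = 100000;                          % number of random points (an assumption)
x = a + (b - a) * rand(n, 1);        % random points in [a, b]
estimate = (b - a) * mean(f(x));     % (b - a) times the sample mean of f
fprintf('estimate = %f, exact = %f, error = %e\n', ...
        estimate, 1/pi, abs(estimate - 1/pi));
```

No bounding box is needed, and every random point contributes a function value to the estimate.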
Example
Consider the example we did before of approximating the integral
$$\int_0^{1/2} \cos(\pi x)\,dx = \frac{1}{\pi} \approx 0.3183$$
using MC. We now compare the two methods, using the same number of function
evaluations in each case.
           bounding box         sample mean
n          approx    error      approx    error
100        0.3150    0.0033     0.3278    0.0095
1000       0.3065    0.0118     0.3157    0.0026
10000      0.3160    0.0023     0.3193    0.0010
Example
Consider the problem of estimating a “hard” integral $\int_0^1 f(x)\,dx$ where
f(x) = sech^2(10.0 * (x − 0.2))
     + sech^2(100.0 * (x − 0.4))
     + sech^2(1000.0 * (x − 0.6))
From the plot, it should be clear that this is actually a difficult function to
integrate. If we apply the Monte Carlo Method, we get “decent” convergence to
the exact value of 0.21080273631018169851
n            estimate    error
1            0.096400    1.1e-01
10           0.179749    3.1e-02
100          0.278890    6.8e-02
1000         0.221436    1.0e-02
10,000       0.210584    2.1e-04
100,000      0.212075    1.2e-03
1,000,000    0.211172    3.6e-04
Here is a plot of the error on a log-log plot. Note that the slope is approximately
-1/2. This is what we expect because we said that the error is approximately C/√n for a constant
C, so the log of the error is log(C n^(−1/2)) = log C − 0.5 log n.
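One way to quantify the slope is a least-squares line through the points (log n, log error). The sketch below uses the values of n and the errors reported in the table for this integral; using polyfit for the fit is a choice made here, not something the notes prescribe.

```matlab
% Sketch: estimate the log-log slope from the tabulated errors.
n   = 10 .^ (0:6);                                        % sample sizes from the table
err = [1.1e-01 3.1e-02 6.8e-02 1.0e-02 2.1e-04 1.2e-03 3.6e-04];
c = polyfit(log10(n), log10(err), 1);                     % fit log(err) = c(1)*log(n) + c(2)
slope = c(1);
fprintf('fitted slope = %f (theory suggests -0.5)\n', slope);
```

Because the errors are random, the fitted slope is only roughly -1/2 for a single run.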
Now let’s approximate the integral of a function when the interval is not [0, 1].
Specifically we want to integrate
f(x) = cos(x)
     + 5 * cos(1.6 * x)
     − 2 * cos(2.0 * x)
     + 5 * cos(4.5 * x)
     + 7 * cos(9.0 * x)
over [2,7]. This is a very “wiggly” function, as you can see from the plot.
Because the interval is [2, 7] we have to map the random numbers from [0,1] to [2,7]. This
is easy to do with a linear map; we want 0 → 2 and 1 → 7, so the linear function
that does this is y = 2 + 5x, and n random points in the interval [2, 7] are just
x = 2.0 + ( 7.0 - 2.0 ) * rand ( n, 1 )
Applying the Monte Carlo Method, we get “decent” convergence to the exact
solution -4.527569...:
n           estimate      error
1           21.486936     2.6e+01
10          -21.701206    1.7e+01
100         -2.472846     2.0e+00
1000        -4.911594     3.8e-01
10000       -4.253230     2.7e-01
100000      -4.424016     1.0e-01
1000000     -4.509781     1.7e-02
10000000    -4.529185     1.6e-03
Again, we see the approximation to a slope of −1/2 on a log-log plot.
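A minimal sketch of this computation follows. The sample size n is an assumption, and the exact value is the one quoted above; a single run will not reproduce the table digit for digit because the random numbers differ.

```matlab
% Sketch: Monte Carlo estimate of the integral of the "wiggly" f over [2, 7].
f = @(x) cos(x) + 5*cos(1.6*x) - 2*cos(2.0*x) + 5*cos(4.5*x) + 7*cos(9.0*x);
a = 2.0; b = 7.0;
n = 1000000;                          % number of random points (an assumption)
x = a + (b - a) * rand(n, 1);         % map [0, 1] to [a, b]
estimate = (b - a) * mean(f(x));      % interval length times the sample mean
exact = -4.527569;                    % value quoted in the text
fprintf('%d  %f  %e\n', n, estimate, abs(estimate - exact));
```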
• The two functions both seem “hard” to integrate, but for one, we got an
error 20 times smaller with the same number of points. Why?
• The error depends in part on the variance of the function.
• To get a feeling for the variance of these two functions, remember that the
variance of f over [a, b] is given by the formula $\mathrm{Var} = \frac{1}{b-a}\int_a^b (f(x) - \bar{f})^2\,dx$, where $\bar{f}$ is the mean value of f.
– The first function has an integral of about 0.2 over the interval [0,1], so its
average value is 0.2 because the length of the interval is 1. The function
ranges between 0 and 1, so the pointwise variance is never more than
(1 − 0.2)^2 = 0.64
– The second function has an integral of about -4.5 over an interval of
length 5, for an average value of about -1. Its values vary from -14 to
+16 or so, so the pointwise variance can be as much as 200.
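The variances can also be estimated by sampling. The sketch below uses the built-in var on function values at uniform random points; the sample size and the rough error estimate (interval length times sqrt(variance / n)) are assumptions of this illustration.

```matlab
% Sketch: compare the sample variances of the two integrands.
f1 = @(x) sech(10*(x - 0.2)).^2 + sech(100*(x - 0.4)).^2 + sech(1000*(x - 0.6)).^2;
f2 = @(x) cos(x) + 5*cos(1.6*x) - 2*cos(2.0*x) + 5*cos(4.5*x) + 7*cos(9.0*x);
n = 100000;                            % number of samples (an assumption)
v1 = var(f1(rand(n, 1)));              % f1 sampled on [0, 1]
v2 = var(f2(2 + 5 * rand(n, 1)));      % f2 sampled on [2, 7]
se1 = 1 * sqrt(v1 / n);                % rough size of the MC error for f1
se2 = 5 * sqrt(v2 / n);                % rough size of the MC error for f2
fprintf('var f1 = %f, var f2 = %f, error ratio = %f\n', v1, v2, se2 / se1);
```

The second function should show a much larger variance, which is consistent with its larger Monte Carlo error at the same n.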
Integrals in two dimensions
Since we know how to sample a square, let’s try to use the Monte Carlo method
to compute the integral of f (x, y) where
f(x, y) = |x^2 + y^2 − 0.75|
over the square [−1, +1] × [−1, +1].
Because of the absolute value function, f (x, y) is not differentiable. It would be
tricky to try to work out the integral, even though this is a pretty simple function.
Because we are in two dimensions, we need to use pairs of random numbers;
each number we generate lies in [0,1], so we map the pairs to [−1, +1] × [−1, +1]
by
x = -1.0 + 2.0 * rand ( n, 1 ); y = -1.0 + 2.0 * rand ( n, 1 );
Here is a surface plot of our function.
We apply MC in an analogous way
$$\int_c^d \int_a^b f(x, y)\,dx\,dy \approx \frac{(b-a)(d-c)}{n} \sum_{i=1}^{n} f(x_i, y_i)$$
Our convergence to the exact solution 1.433812586520645:
n           estimate    error
1           0.443790    9.900230e-01
10          1.547768    1.139549e-01
100         1.419920    1.389309e-02
1000        1.464754    3.094170e-02
10000       1.430735    3.077735e-03
100000      1.428808    5.004836e-03
1000000     1.432345    1.467762e-03
10000000    1.433867    5.394032e-05
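A sketch of the two-dimensional computation is given below. The sample size n is an assumption, and the exact value is the one quoted above.

```matlab
% Sketch: Monte Carlo integration of |x^2 + y^2 - 0.75| over [-1, 1] x [-1, 1].
f = @(x, y) abs(x.^2 + y.^2 - 0.75);
a = -1; b = 1; c = -1; d = 1;
n = 100000;                                      % number of random points (an assumption)
x = a + (b - a) * rand(n, 1);                    % x in [a, b]
y = c + (d - c) * rand(n, 1);                    % y in [c, d]
estimate = (b - a) * (d - c) * mean(f(x, y));    % area times the sample mean
exact = 1.433812586520645;
fprintf('%d  %f  %e\n', n, estimate, abs(estimate - exact));
```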
Sometimes we have to integrate a function over a more complicated domain
where it is not as easy to sample in but we have a way to determine if a point is
in the region; for example, an “E”-shaped region. What can we do in that case?
The answer is to return to the original way we evaluated an integral by putting
a bounding box around the region. This approach is usually called MC with
rejection.
If we want to integrate over a region in 2-d that is not a box, we enclose that
region in a box. We then generate a point in the box and reject it if it is not in
the region of integration.
To see how this rejection technique works we look at the example of computing
an integral over a circle.
Our integrand will be
f(x, y) = x^2
and we will integrate over the circle of radius 3 centered at (0,0).
For this problem, the function is smooth, but the region is not a rectangle. It’s
not hard to work out the exact answer using polar coordinates, because the circle
is special. But the Monte Carlo approach with rejection would handle an ellipse,
or a region shaped like Mickey Mouse, just as easily.
Now we need to map pairs of random numbers from [0, 1] to [−3, +3] × [−3, +3]
and then reject those that lie outside the circle x^2 + y^2 = 9. We simply use
x = -3.0 + 6.0 * rand ( n, 1 );
y = -3.0 + 6.0 * rand ( n, 1 );
and keep the points for which x.^2 + y.^2 <= 9.
Note that here we are generating n random numbers at once and using the vector
operation for exponentiation.
exact = 63.617251235193313079;
r = 3;
area = pi * r * r;                          % area of the domain we are integrating over
for n = 10 .^ ( 0 : 7 )
    x = - r + 2 * r * rand ( n, 1 );
    y = - r + 2 * r * rand ( n, 1 );
    i = find ( x .* x + y .* y <= r * r );  % indices of the accepted points
    n2 = length ( i );
    result = area * sum ( x(i) .* x(i) ) / n2;   % area times the average of x^2
    fprintf ( 1, ' %8d %f %e\n', ...
        n2, result, abs ( result - exact ) );
end
If we apply the Monte Carlo Method, we get “decent” convergence:
accepted points    estimate     error
1                  74.493641    1.087639e+01
9                  52.240562    1.137669e+01
78                 62.942561    6.746898e-01
784                61.244910    2.372341e+00
7816               62.653662    9.635895e-01
78714              63.405031    2.122204e-01
785403             63.669047    5.179565e-02
7853972            63.624209    6.958217e-03
Exact              63.617251235193313079
Using Monte Carlo Sampling for Optimization
• In the lab you will explore another application of MC sampling: finding the
extrema of a function.
• A function such as
f(x) = cos(x) + 5 * cos(1.6 * x) − 2 * cos(2.0 * x)
     + 5 * cos(4.5 * x) + 7 * cos(9.0 * x)
plotted over [2,7] has many local extrema. Many minimization routines “get
stuck” at these local minima and can’t find the global minimum.
• You will use MC sampling to estimate where the global minimum is. Note
that if we have an algorithm to find a minimum of f(x), we can find its
maximum by finding the minimum of −f(x).
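One simple way to carry this out, offered only as a sketch and not necessarily the approach used in the lab, is to evaluate f at many random points and keep the smallest value found. The sample size n is an assumption.

```matlab
% Sketch: estimate the global minimum of f on [2, 7] by random sampling.
f = @(x) cos(x) + 5*cos(1.6*x) - 2*cos(2.0*x) + 5*cos(4.5*x) + 7*cos(9.0*x);
n = 100000;                     % number of random points (an assumption)
x = 2 + 5 * rand(n, 1);         % random points in [2, 7]
[fmin, idx] = min(f(x));        % smallest sampled value and its index
xmin = x(idx);                  % location of that value
fprintf('estimated minimum f = %f at x = %f\n', fmin, xmin);
```

The result is only an estimate of the minimizer; it could be used as a starting guess for a local minimization routine.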
Exercise Write a script to implement MC for approximating an integral in 1D
using the formula
$$\int_a^b f(x)\,dx \approx \frac{b-a}{n} \sum_{i=1}^{n} f(x_i)$$
Try out the code on the integral
$$\int_0^1 \cos(4\pi x)\,dx$$
Modify the script to perform integration in two spatial dimensions. Try your
code out on the integral
$$\int_0^3 \int_1^2 (1 + 8xy)\,dy\,dx = 57.$$
Simulations using the Monte Carlo Method
• So far we’ve concentrated on the Monte Carlo method as a means of sampling. This gave us an alternate means of solving integration problems. Of
course, there are already other methods for dealing with integration problems.
• We will next consider a Monte Carlo application in which no other method
is available, the field of simulation.
• We will consider the common features of tossing a coin many times, watching
a drop of ink spread into water, determining game strategies, and observing
a drunk stagger back and forth along Main Street!
• Using ideas from the Monte Carlo method, we will be able to pose and
answer questions about these random processes.
• Computer simulation is intended to model a complex process. The underlying
model is made up of simplifications, arbitrary choices, and rough measurements.
• Our hope is that there is still enough relationship to the real world that we
can study the results of the simulation and get an idea of how the original
process works, predict outputs for given inputs, and how we can improve it
by changing some of the parameters.
• A famous applied mathematician once said “the purpose of computations is
insight, not numbers”. It is good to always keep this in mind!
• Often it is particularly hard to use rigorous mathematical analysis to prove
that a given model will always behave in a certain way (for example, the
output won’t go to infinity.)
• Sampling tries to answer these questions by practical examples - we try lots
of inputs and see what happens.
• We want to consider some examples of simulations that can be done using
the MC method and you will look at another in the lab.
A Simple Example from Business
Suppose a friend is starting a business selling cookies and you want to help him
succeed (and show off your computational skills).
Assume he buys the cookies from a local bakery at the cost of $0.40 and sells
them for $1.00; assume that your friend has no overhead so he makes $0.60 in
profit per cookie if he sells all the ones he ordered.
However, if he has some cookies left over at the end of the day, those are given
to the homeless shelter and he loses $0.40 per cookie. He works for four hours
each afternoon and feels there is a fairly uniform demand during those hours; he
has never sold less than 80 or more than 140 cookies.
Use MC to recommend how many cookies he should order to maximize his profits.
How can we use MC to answer this question?
• Let Q denote the quantity that he orders; D the demand (amount sold)
• Set Q = 80
• Generate n replications of D; i.e., generate n random numbers between 80
and 140
• For each replication, compute the daily profit
• After n replications estimate the earnings from ordering Q cookies by
earnings = (sum of the daily profits) / n
• Repeat for integer values of Q between 80 and 140;
• Select the value of Q which yields the best earnings
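earnings = zeros ( 1, 61 );
for q = 80:140
    profit = 0.;
    for k = 1:n
        x = rand(1);
        d = 80 + 60*x;                           % demand between 80 and 140
        if d > q
            profit = profit + .6*q;              % sold out: $0.60 on each cookie ordered
        else
            profit = profit + .6*d-.4*(q-d);     % leftovers cost $0.40 each
        end
    end
    earnings(q-79) = profit / n;
end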
[Figure: estimated earnings (vertical axis, 48 to 60) versus the number of cookies ordered (horizontal axis, 80 to 140).]
Profit is maximized by ordering approximately 116 cookies
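The simulated optimum can be cross-checked by hand. The derivation below is a sketch under the assumption that the demand D is a continuous uniform random variable on [80, 140]; the symbol P for the daily profit is introduced here for illustration.

```latex
% Daily profit for order quantity Q and demand D:
%   P = 0.6 Q                              if D >= Q
%   P = 0.6 D - 0.4 (Q - D) = 0.6 Q - (Q - D)   if D < Q
\begin{align*}
E[P] &= 0.6\,Q - \int_{80}^{Q} (Q - D)\,\frac{1}{60}\,dD
      = 0.6\,Q - \frac{(Q - 80)^2}{120}, \\
\frac{dE[P]}{dQ} &= 0.6 - \frac{Q - 80}{60} = 0
      \quad\Longrightarrow\quad Q = 116, \\
E[P]\big|_{Q = 116} &= 69.6 - \frac{36^2}{120} = 69.6 - 10.8 = 58.8 .
\end{align*}
```

Under this assumption the expected profit peaks at Q = 116 with a value of about $58.80, which agrees with the simulated result.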
Brownian Motion
In 1827, Scottish botanist Robert Brown was studying pollen grains
which he had mixed in water. Although the water was absolutely
still, the pollen grains seemed to quiver and move about randomly.
He could not stop the motion, or explain it. He carefully described
his observations in a paper.
When other researchers were able to reproduce the same behavior, even using other liquids and other particles, the phenomenon was named
Brownian Motion, although no one had a good explanation for what they were
observing.
Check out the You Tube video for Brownian Motion at
http : //www.youtube.com/watch?v = apUl baT Kc
Here is the result of a simulation of Brownian motion.
In 1905, the same year that he published his paper on special
relativity, Albert Einstein wrote a paper explaining Brownian motion. Each pollen
grain, he said, was constantly being jostled by the motions of the water molecules
on all sides of it. Random imbalances in these forces would cause the pollen grains
to twitch and shift.
Moreover, if we observed the particle to be at position (x, y) at time t = 0, then
its displacement in each coordinate from that point at a later time t was a normal random variable with
mean 0 and variance t. In other words, its typical distance would grow as √t.
Recall that the command randn used here generates numbers with a normal
distribution.
T = 10.0;
N = 1000;
h = sqrt ( T / N );
x(1) = 0.0;
y(1) = 0.0;
for i = 1 : N
x(i+1) = x(i) + h * randn ( );
y(i+1) = y(i) + h * randn ( );
end
We can write this program another way to get rid of the loop.
T = 10.0;
N = 1000;
h = sqrt ( T / N );
x(1) = 0.0;
y(1) = 0.0;
x(2:N+1) = h * cumsum ( randn ( 1, N ) );
y(2:N+1) = h * cumsum ( randn ( 1, N ) );
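To check the claim about the variance, one can simulate many independent paths with the same T, N, and h as above and look at the x-coordinate at the final time. This is a sketch; the number of paths M is an assumption.

```matlab
% Sketch: the x-displacement at time T should have variance close to T.
T = 10.0;
N = 1000;
M = 2000;                          % number of independent paths (an assumption)
h = sqrt(T / N);
xT = h * sum(randn(N, M), 1);      % final x-coordinate of each of the M paths
fprintf('sample mean = %f, sample variance = %f, T = %f\n', ...
        mean(xT), var(xT), T);
```

Each path adds N normal steps of variance h^2 = T/N, so the variance at time T is N * (T/N) = T.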
Brownian motion also explained the phenomenon of diffusion, in which a drop
of ink in water slowly expands and mixes.
As particles of ink randomly collide with water molecules, they spread and mix,
without requiring any stirring.
The mixing obeys the √t law, so that, roughly speaking, if the diameter of the
ink drop doubles in 10 seconds, it will not double again until about 40 seconds have passed.
The physical phenomenon of Brownian Motion has been explained by assuming
that a pollen grain was subjected to repeated random impulses. This model
was intriguing to physicists and mathematicians, and they soon made a series of
abstract, simplified versions of it whose properties they were able to analyze, and
which often could be applied to new problems.
The simplest version is known as the Random Walk in 1D.
Random Walks
100 Drunken Sailors
We’ll introduce the random walk with a story about a captain whose ship was
carrying a load of rum. The ship was tied up at the dock, the captain was asleep,
and the 100 sailors of the crew broke into the rum, got drunk, and staggered out
onto the dock.
The dock was 100 yards long, and the ship was at the 50 yard mark. The sailors
were so drunk that each step they took was in a random direction, to the left
or right. (We’re ignoring the width of the pier because they can only move to
the right or left.) They were only able to manage one step a minute. Two hours
later, the captain woke up.
"Oh no!" he said, "There’s only 50 steps to the left or right and they fall into
the sea! And two hours makes 120 steps! They will all be drowned for sure!"
But he was surprised to see that around 80 of the crew were actually within 11
steps to the left or right, and that all of the crew was alive and safe, though in
sorry shape.
How can this be explained?
We will use the idea of Random Walk to simulate it. Here’s a frequency plot for
the simulation.
[Figure: 1D Random Walk − 100 sailors and 120 steps. Frequency plot of the number of sailors (vertical axis, 0 to 10) versus steps from the ship, right or left (horizontal axis, −40 to 40).]
• We want to simulate this experiment using a random walk.
• Taking n random steps is like adding up random +1’s and -1’s; the average
of such a sum tends to zero, with an error that is roughly 1/√n (in our problem
√n = √120 ≈ 11):
$$|\text{average} - 0| \approx \frac{1}{\sqrt{n}}$$
then
$$\left|\frac{\text{sum}}{n}\right| \approx \frac{1}{\sqrt{n}}$$
and so
$$|\text{sum}| \approx \sqrt{n}.$$
Simulating the Drunken Sailor Random Walk Experiment
Our model for a random walk in 1D is very simple.
• We let x represent the position of a point on a line.
• We assume that at step n = 0 the position is x = 0.
• We assume that on each new step, we move one unit left or right, chosen
at random.
• From what we just said, we can expect that after n steps, the distance of the
point from 0 will on average be about √n. If we compare n to the square
of the distance, we can hope for a nice straight line.
• First let’s look at how we might simulate this experiment. In this code we
want to plot the number of steps n on the x-axis and, on the y-axis, the average
over the sailors of the squared distance from the origin together with the
maximum over the sailors of the squared distance.
• For each sailor we will take n steps; before starting we set x = 0 (the location
given in units of steps to right or left) and at each step
– generate a random number r between 0 and 1 (we could have mapped r
between -1 and 1)
– if r ≤ 0.5 we move to the left so x = x − 1
– otherwise we move to the right so x = x + 1
%
% Take WALK_NUM walks, each of length STEP_NUM random +1 or -1 steps.
%
time = 1 : step_num;
x2_ave = zeros ( 1, step_num );
x2_max = zeros ( 1, step_num );
for walk = 1 : walk_num
    x = 0;
    for step = 1 : step_num
        r = rand ( );
        % Take the step.
        if ( r <= 0.5 )
            x = x - 1;
        else
            x = x + 1;
        end
        % Update the average and max.
        x2_ave(step) = x2_ave(step) + x^2;
        x2_max(step) = max ( x2_max(step), x^2 );
    end
end
x2_ave = x2_ave / walk_num;
% Plot the results.
plot ( time, x2_ave, time, x2_max, 'LineWidth', 2 );
[Figure: 1D Random Walk − Max and average of distance squared versus time. Horizontal axis: N (0 to 120); vertical axis: Distance Squared (0 to 1200). Blue: average of the squared distance; green: maximum of the squared distance.]
The average of the squared distance behaves linearly, but the maximum of the squared distance over the sailors varies a lot.
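The captain's surprise can also be examined directly by looking at where the sailors end up. The following vectorized sketch uses the numbers from the story (100 sailors, 120 steps, 50 steps to either end of the dock); the variable names near and safe are introduced here for illustration.

```matlab
% Sketch: final positions of 100 sailors after 120 random steps.
walk_num = 100;
step_num = 120;
steps = 1 - 2 * (rand(step_num, walk_num) <= 0.5);   % -1 if r <= 0.5, else +1
path  = cumsum(steps, 1);                  % position of every sailor after every step
final = path(end, :);                      % position after the last step
near  = sum(abs(final) <= 11);             % sailors within 11 steps of the ship
safe  = sum(max(abs(path), [], 1) < 50);   % sailors who never reached an end of the dock
fprintf('%d sailors within 11 steps, %d never fell off\n', near, safe);
```

A histogram of `final` (for example with `hist(final, -40:4:40)`) gives a frequency plot like the one shown earlier.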
• In 1D, how many possible random walks of n steps are there?
• To answer this, consider flipping a coin. If we flip it once, there are two
choices, H and T. If we flip it two times there are 2^2 = 4 choices
HH HT TT TH
and if we flip it 3 times there are 2^3 = 8 choices
HHH HHT HTH HTT TTT THT THH TTH
so in general there are 2^n outcomes for flipping a coin n times.
• Our random walk in 1D is like flipping a coin; there are only two choices:
right or left.
• So if we have 120 steps there are 2^120 ≈ 10^36 possible random walks.
• How many of them can end up at a given position?
• There is only one that can end up at n.
• There are n out of 2^n (i.e., 120 out of 10^36) that end at n − 2; they involve
n − 1 steps of +1 and one step of -1; the -1 step can occur at n places. For
example, one sailor could go left on the first step and right on all remaining
ones; another sailor could go right on all steps except the second one, etc.
• One can show that, in general, there are $\binom{n}{k}$ distinct random walks that will
end up at n − 2k.
• That means the probability of ending up at a particular spot is just the
corresponding combinatorial coefficient divided by 2^n.
• In other words, row n of Pascal’s triangle tells you what a random walk can
do. But row n of Pascal’s triangle can be thought of as dropping balls in a
pachinko game that can randomly go left or right at each level!
An entry in Pascal’s triangle is found by summing the two numbers above it to
the left and right. The top 1 is called the zeroth row. An entry in the nth row
can be found by n choose r where r is the element number in that row. For
example, in row 3, 1 is the zeroth element, 3 is element number 1, the next three
is the 2nd element, and the last 1 is the 3rd element. The formula for n choose
r is
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!} \quad \text{where } 0! = 1$$
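These probabilities are easy to tabulate for a small number of steps. The sketch below uses the built-in nchoosek; the value n = 10 is an assumption chosen to keep the numbers small (for n = 120 the binomial coefficients are too large to be represented exactly in double precision).

```matlab
% Sketch: exact end-position probabilities for an n-step walk.
n = 10;                                             % number of steps (an assumption)
k = 0 : n;                                          % number of -1 steps
pos = n - 2 * k;                                    % end positions n, n-2, ..., -n
prob = arrayfun(@(j) nchoosek(n, j), k) / 2^n;      % binomial coefficient over 2^n
disp([pos; prob]');                                 % one row per end position
```

The probabilities sum to 1, and the most likely end position is the starting point.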
Random Walks in 2D
We can do the same thing in two dimensions. Now we can move to the right or
left or to the top or bottom. You will implement this in the lab and use a random
walk to solve Laplace’s equation in two dimensions.
Exercise Modify the code for using random walks for the drunken sailor problem
if each sailor has a 60% chance of moving to the right and a 40% chance of moving
to the left. Output your results in a frequency plot where the x axis is the number
of steps in each direction from the origin and the y axis is the frequency.
Game Theory
In the 1950’s, John von Neumann (who wrote the first computer user manual,
for the ENIAC), developed game theory, which sought strategies in multi-player
games. The Monte Carlo method was used to simulate the games, and determine
the best strategies. We will look at two examples of this – simulating a gambling
game and simulating a duel!
The first example is a modification to the Fermat/Pascal point game which makes
it much more difficult. Recall in that problem that if one person won, the other
person did not lose. Here we assume that one player wins a dollar from the other
player.
In particular, assume that there are two gamblers, A and B, who decide to play
a game.
A has p dollars and B has q dollars
They decide to flip a coin. If it comes up heads, A wins $1 from B while tails
works the other way.
The game is over when one gambler is bankrupt.
Here are some questions we can ask:
1. What is the probability that A will win?
2. What is the expected value of the game to A, that is, how much will he win
or lose, on average?
3. How long will a typical game last?
4. What are the chances that the winner will be ahead the entire game?
5. What happens if we change the amount of money that A and B have?
Here’s a typical game, in which A starts with $3 and B with $7; assume A wins
when a head comes up
Toss   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Coin      H  H  T  T  T  H  T  T  H  T  H  H  T  T  T
A      3  4  5  4  3  2  3  2  1  2  1  2  3  2  1  0
B      7  6  5  6  7  8  7  8  9  8  9  8  7  8  9 10
A loses after 15 tosses, having tossed 6 heads and 9 tails.
Clearly the gambler with the most starting money has an advantage, but how
much?
It’s easy to see that if A starts with $3, the game can end in 3 steps. Can
we expect that in such a case, a typical game might last 6, or 9 or 12 steps?
Does it depend in any way on how much more money B has?
Suppose that A and B start with the same amount of money. In a typical game,
is it likely that they will each be in the lead about the same number of tosses?
A single plot can show you an example of how the game works, but it can’t tell
you all the things that can happen. Here is a plot of the history of A’s money
for the example game above.
For a simulation, we need to look at a lot of games, and we try different starting
bets (the “parameters” of this model).
Let’s consider the following cases when we run our simulations:
$A     $B      Remarks
$3     $7      Our starting example.
$30    $70     10 times as long?
$10    $10     Equal stakes, equal odds
$1     $100    Game will be quick.
Structure of the code
• input starting stakes a_stakes, b_stakes and number of games to simulate n_games
• initialize; set a_wins=0, b_wins=0
• for each game, set a=a_stakes, b=b_stakes and, while a>0 and b>0,
– generate a random number to see who wins
– if a wins, a = a+1; b = b-1
– if b wins, b = b+1; a = a-1
• stop when a=0 or b=0
• after the game is over, increment wins
• after all the games are over, approximate the probability that each player
wins
a_wins = 0;
b_wins = 0;
%
%  Play n_games games.
%
for game = 1 : n_games
  a = a_stakes;
  b = b_stakes;
  while ( 0 < a && 0 < b )
    r = rand ( );
    if ( r <= 0.5 )
      a = a + 1;
      b = b - 1;
    else
      a = a - 1;
      b = b + 1;
    end
  end
  % increment wins
  if ( b == 0 )
    a_wins = a_wins + 1;
  else
    b_wins = b_wins + 1;
  end
end
%
%  Average over the number of games.
%
prob_a_win = a_wins / n_games
prob_b_win = b_wins / n_games
The results of 1000 games for each set of starting stakes.
$A     $B      Avg. length   Max. length   Prob. A wins
$3     $7               21           135           0.29
$30    $70           2,010        12,992           0.28
$10    $10             101           722           0.49
$1     $100            104        10,472           0.01
From this data you might be able to guess the following:
• the expected number of steps is $A * $B;
• A’s chance of winning is $A / ( $A + $B ) ;
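To check these two guesses, the simulation needs to record the length of each game as well as the winner. The following is a minimal sketch for one set of stakes; the counters steps, total_steps and max_steps, and the choice of 2000 games, are assumptions made here and are not taken from the original code.

```matlab
% Sketch: estimate game length and A's win probability for one set of stakes.
a_stakes = 3;            % A's starting money
b_stakes = 7;            % B's starting money
n_games = 2000;          % assumed number of games to simulate
a_wins = 0;
total_steps = 0;         % sum of the lengths of all games
max_steps = 0;           % length of the longest game seen
for game = 1 : n_games
  a = a_stakes;
  b = b_stakes;
  steps = 0;
  while ( 0 < a && 0 < b )
    steps = steps + 1;
    if ( rand ( ) <= 0.5 )
      a = a + 1;
      b = b - 1;
    else
      a = a - 1;
      b = b + 1;
    end
  end
  if ( b == 0 )
    a_wins = a_wins + 1;
  end
  total_steps = total_steps + steps;
  max_steps = max ( max_steps, steps );
end
avg_steps = total_steps / n_games;
prob_a_win = a_wins / n_games;
fprintf ( 'average length %g (guess %d), longest game %d\n', ...
  avg_steps, a_stakes * b_stakes, max_steps );
fprintf ( 'P(A wins) %g (guess %g)\n', ...
  prob_a_win, a_stakes / ( a_stakes + b_stakes ) );
```

Because the results are random, the estimates will differ from run to run; they should move closer to the guessed values as n_games is increased.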
What is the expected value of this game?
A’s chance of winning is
\[ \frac{\$A}{\$A + \$B} \]
and the chance of A losing (i.e., B winning) is 1 minus this probability, or
\[ \frac{\$B}{\$A + \$B}. \]
The expected value is
value(win) ∗ p(win) + value(loss) ∗ p(loss)
or
\[ \$B \cdot \frac{\$A}{\$A + \$B} \; - \; \$A \cdot \frac{\$B}{\$A + \$B} = 0 \]
so even when A has $1 against B’s $100, it’s a fair game:
(small chance of big win) - (big chance of small loss).
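As a quick arithmetic check, the expected value can be evaluated for the four cases considered earlier. This sketch assumes the win probability $A / ( $A + $B ); the array name cases and the vector ev are introduced here only for illustration.

```matlab
% Expected value of the game to A for each pair of starting stakes.
cases = [ 3, 7; 30, 70; 10, 10; 1, 100 ];   % each row is [ $A, $B ]
ev = zeros ( size ( cases, 1 ), 1 );
for k = 1 : size ( cases, 1 )
  a = cases(k,1);
  b = cases(k,2);
  p_win = a / ( a + b );           % A wins B's stake with this probability
  p_loss = b / ( a + b );          % A loses his own stake with this probability
  ev(k) = b * p_win - a * p_loss;  % value(win)*p(win) + value(loss)*p(loss)
  fprintf ( 'A = %3d, B = %3d: p(win) = %.4f, expected value = %g\n', ...
    a, b, p_win, ev(k) );
end
```

Up to floating point roundoff, every entry of ev is zero, whatever the stakes.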
Here are the results of four simulations which mimic our cases above.
[Figure: histograms titled “Gambler’s ruin, number of steps”, plotting Frequency against Steps, with one panel for each case: A − $3 B − $7; A − $30 B − $70; A − $10 B − $10; A − $1 B − $100.]
Game Theory - Simulating a Duel
Another example in game theory is a duel. Two players take turns firing at each
other until one is hit. The accuracy, or probability of hitting the opponent on
one shot, is known for both players.
In the two person duel there is no strategy (just keep firing), but we want to
know the probabilities that either player survives.
If we assume Player A has an accuracy of 4/6 and Player B an accuracy of 5/6 and
A gets the first shot then Player A can survive if:
1. A hits on the first shot (probability: 4/6) OR
2. A misses, B misses, A hits (probability: (2/6)*(1/6)*(4/6)) OR
3. A misses, B misses, A misses, B misses, A hits
(probability: (2/6)*(1/6)*(2/6)*(1/6)*(4/6))
OR...
The probability that A survives is thus an infinite sum:
\[ \sum_{i=0}^{\infty} \frac{4}{6} \left( \left( 1 - \frac{4}{6} \right) \left( 1 - \frac{5}{6} \right) \right)^{i} \]
This has the form 4/6 ∗ (1 + x + x^2 + x^3 + ...) where 0 < x < 1, so this is equal
to 4/6 ∗ 1/(1 − x). The probability that A wins is therefore
\[ \frac{p(A)}{p(A) + p(B) - p(A) \, p(B)} = \frac{12}{17} \]
while B’s chance is
\[ \frac{p(B) \, (1 - p(A))}{p(A) + p(B) - p(A) \, p(B)} = \frac{5}{17} \]
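The closed forms can be checked numerically by adding up the first few terms of the series. This is a sketch only; the names pA, pB, n_terms, partial, exact_A and exact_B are chosen here for illustration, and 20 terms is an arbitrary cutoff.

```matlab
% Compare a partial sum of the geometric series with the closed forms.
pA = 4 / 6;                      % accuracy of Player A
pB = 5 / 6;                      % accuracy of Player B
x = ( 1 - pA ) * ( 1 - pB );     % probability that both players miss once
n_terms = 20;                    % assumed number of terms to keep
partial = pA * sum ( x .^ ( 0 : n_terms - 1 ) );
exact_A = pA / ( pA + pB - pA * pB );
exact_B = pB * ( 1 - pA ) / ( pA + pB - pA * pB );
fprintf ( 'partial sum = %.12f, exact p(A survives) = %.12f\n', partial, exact_A );
fprintf ( 'exact p(B survives) = %.12f, total = %.12f\n', exact_B, exact_A + exact_B );
```

Since x is small here (1/18), the partial sum agrees with the closed form to many digits after only a few terms, and the two survival probabilities add up to 1.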
We have worked out the exact probabilities in this case; now we want
to simulate duels so we can approximate the probabilities and compare them with
our exact ones. To estimate the probabilities of A and B surviving, we would
simulate the duel 100 or 1,000 times, counting the frequency with which each
player survived, and dividing by the number of duels. We will also keep track of
the number of turns it takes to complete a duel (i.e., until we get a hit) and then
compute the average.
Basically we just need a function which performs the duel and returns the number
of shots required and the survivor (either Player 1 or Player 2). The only input
that is needed is p(1:2) which stores each player’s probability. We can then
write a script to call this function n_duel times (the number of duels we are
simulating) and compute the average number of shots required to end the duel.
We would also keep a tally of how many duels were won by each player in the
calling script so we can estimate probabilities.
How do we decide if the player shooting hits the other player? We generate a
random number r between 0 and 1; then Player 1 hits his target if r ≤ p(1);
similarly for Player 2. The duel continues until a target is hit.
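The hit test can be tried on its own before it is used inside a duel. The sketch below assumes an accuracy of 4/6 and fires many independent shots; the names p1, n_shots, hits and hit_freq are illustrative only.

```matlab
% Sketch: a shot hits when a uniform random number r satisfies r <= p.
p1 = 4 / 6;                    % assumed accuracy of the shooter
n_shots = 10000;               % assumed number of independent shots
r = rand ( 1, n_shots );       % one random number per shot
hits = sum ( r <= p1 );        % count the shots that hit
hit_freq = hits / n_shots;     % observed frequency of hits
fprintf ( 'accuracy = %g, observed hit frequency = %g\n', p1, hit_freq );
```

Because r is uniformly distributed between 0 and 1, the fraction of shots with r ≤ p1 should be close to p1, which is why this test models a shooter with that accuracy.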
function [ survivor, turn_num ] = duel ( p )
  turn_num = 0;   % count of the number of shots required to complete the duel
  survivor = 0;   % 0 means that no survivor was determined
  if ( p(1) <= 0 && p(2) <= 0 )   % make sure input probabilities are valid
    fprintf ( 'invalid probabilities so duel fails\n' );
    return
  end
  while ( 1 )
%
%  Player 1 fires.
%
    turn_num = turn_num + 1;
    r = rand ( );
    if ( r <= p(1) )   % Player 1 hits his target
      survivor = 1;
      break            % leave the while loop
    end
%
%  Player 2 fires.
%
    turn_num = turn_num + 1;
    r = rand ( );
    if ( r <= p(2) )   % Player 2 hits his target
      survivor = 2;
      break
    end
  end
Note the use of the while (1) here. The condition in this statement is always
true, so the only way to escape the loop is through the break statement. This
is why we added a check at the start of the function to verify that the
probabilities are not both ≤ 0.
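An alternative guard against an endless loop is to cap the number of shots. The sketch below is not the approach used in duel.m; max_turns and shooter are hypothetical names, and the probabilities are deliberately invalid to show that the loop still terminates.

```matlab
% Sketch: a duel loop with an explicit cap on the number of shots.
p = [ 0.0, 0.0 ];        % deliberately invalid: neither player can ever hit
max_turns = 1000;        % hypothetical safety limit, not part of duel.m
turn_num = 0;
survivor = 0;            % 0 means that no survivor was determined
shooter = 1;             % Player 1 fires first
while ( survivor == 0 && turn_num < max_turns )
  turn_num = turn_num + 1;
  r = rand ( );
  if ( r <= p(shooter) )   % the shooter hits his target
    survivor = shooter;
  end
  shooter = 3 - shooter;   % alternate between Player 1 and Player 2
end
fprintf ( 'survivor = %d after %d shots\n', survivor, turn_num );
```

With this form the loop condition itself ends the duel, so no break statement is needed; a survivor value of 0 signals that the cap was reached without a hit.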
A 3 person duel is more interesting and we will look at it next.
Exercise. Download the function duel.m, which implements the code described
above. Then write a script which calls this function for each simulation
of a duel. The input for your script should be the probabilities of each player
p(1:2) and the number of simulations you wish to perform. Your code should
(i) compute the average number of shots to complete the duel and (ii) estimate
the probabilities and compare with exact. After this is done, make a plot of the
average number of shots required versus the exact probability of Player 1.
The Three-Way Duel from “The Good, The Bad, and The Ugly”
In a two person duel there’s no strategy (except of course to shoot at your
opponent on your turn).
But suppose we move to a three person duel involving A, B, and C. Let’s suppose
that C is a relatively poor shot.
If A and B are “reasonable”, they will probably shoot at each other, since each
represents the biggest threat to the other. As soon as one gets hit, it will be C’s
turn, and that’s his best chance, to have the first shot at the remaining opponent.
(Remember they take turns shooting.)
But a disaster occurs if, instead of A or B knocking the other out, poor shot C
accidentally hits A or B on his turn. Because then C does not get the next shot;
rather the survivor gets first shot at C, and we know that, whether it is A or B
who shoots, either is a better shot than C!
Now the coding for this problem is similar to that for the two person duel, except
that, on each shot, the player has a choice of whom to shoot at. We said A and
B should shoot at each other.
It seems that C should shoot at the better shot, say A, so that if he accidentally
hits him, he has to face the weaker player B.
Another possibility: C could shoot at no one (if that’s allowed). Then, once A
or B is out, C is sure to get the first shot at the survivor.
Coding the Three Person Duel
• The three-way duel is harder. Instead of writing it from scratch let’s see
what we need to change in the two-way duel code. When we do this, we
should see what we would have to revise if we had a four-person duel!
• Now we can’t assume that as soon as one person is eliminated, the game is
over. So the first thing we have to change is the part of the code that says
if the shot is accurate, the game is over and the shooter is the winner.
• Another issue is that now we have a choice of who to aim at. We will take
the rational strategy that each player aims at the other player who is the
most accurate shot.
• A “brute force” approach to coding this would be to write out what Player
1 will do, then what Player 2 will do and so on. This is straightforward, but
cluttered. It might be better to try to abstract what is going on.
• To this end, let’s concentrate on one arbitrary player whose index is I (i.e.,
1, 2, or 3). If it is I’s turn then what should he do?
– check to see if he is alive; if alive proceed;
– decide who to aim at (there may be one or two choices) based on their
probabilities;
– generate a random number and see if he hits his target;
– if he hits his target then check to see if he is the only survivor;
– if he is the only survivor then set I as the survivor and return.
Pseudo-code for Player I’s turn
IF ( I am alive )
: turn_num = turn_num + 1
: IF ( two other players are alive )
: : TARGET = the better one
: ELSE
: : TARGET = the remaining player.
: END
: r = rand ( );
: IF ( r <= p(I) )
: : TARGET is eliminated.
: : IF ( I am the ONLY remaining player now)
: : : survivor = I; break;
: : END
: END
END
• To turn our pseudocode into code, we need an efficient way to keep track of
who’s in the game and who’s out.
• One way to do this is to use the probability array p. Once a person has
been shot, we could set their p value to zero.
• That also makes it easy to choose the target. Always pick the person with
the biggest p value. But what if that person is yourself?
• One simple thing to do is when it is your turn, temporarily set your probability
to zero so you don’t choose to shoot yourself. Then reset the value after
you’ve taken your shot (unless you win and it doesn’t matter).
• If you have an array p then the Matlab command max(p) will give you the
maximum entry of that array; but if you use [value,loc]=max(p) it will
give you the maximum entry and its location. So in our case, it will give us
the player’s number which has the largest probability of hitting his opponent.
• Now there is an easy way to see if everyone has been shot (except you) so
that you can be declared the winner. If you hit your target, set the target’s
probability to 0 (indicating he has been shot). Then all you have to do is
sum the probability array p and if you get 0, then you are the only survivor.
Remember that you have temporarily set your probability to 0 so you won’t
shoot yourself.
turn_num = turn_num + 1;
p_save = p(i);
p(i) = 0.0;
[ pmax, target ] = max ( p ); % TARGET is index of max.
r = rand ( );
if ( r <= p_save )
p(target) = 0.0; % you shot opponent "target"
if ( sum ( p ) == 0.0 ) % everyone has been shot except you
survivor = i;
break;
end
end
p(i) = p_save; % reset your probability
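The two idioms used above, the two-output form of max and the sum test, can be tried in isolation on a small array. The accuracies below are made-up values for illustration, and the flag others_alive is a name introduced here; the shot is simply assumed to hit.

```matlab
% Sketch: choosing a target and testing for survivors with max and sum.
p = [ 0.9, 0.5, 0.3 ];           % hypothetical accuracies of players 1, 2, 3
i = 1;                           % it is Player 1's turn
p_save = p(i);                   % remember the shooter's own accuracy
p(i) = 0.0;                      % so the shooter cannot pick himself
[ pmax, target ] = max ( p );    % best remaining shot and his index
p(target) = 0.0;                 % assume the shot hits: target is eliminated
others_alive = ( sum ( p ) > 0.0 );   % true while some opponent remains
p(i) = p_save;                   % restore the shooter's accuracy
fprintf ( 'Player %d shot Player %d; opponents remaining: %d\n', ...
  i, target, others_alive );
```

Here Player 1 is the best shot overall, but masking his own entry makes max return Player 2 as the target, and the sum test shows that Player 3 is still in the game.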
Conclusion for Three-Way Duel
• This is not the best way, or the only way, but it is a way, and it has some
real advantages.
• There are hardly any extra variables (except for p_save).
• The code works the same for every player. We never assume that the players
were given in a particular order.
• The code works the same if we increase the number of players.
• Your homework will be to simulate a three-person duel using this strategy and
then to modify your code to change the strategy.