Getting started with WinBUGS

James B. Elsner and Thomas H. Jagger
Department of Geography, Florida State University
Some material for this tutorial was taken from http://www.unt.edu/rss/class/rich/5840/session1.doc
http://www.mrc-bsu.cam.ac.uk/bugs/
1. Click on WinBUGS and follow the instructions for downloading the software.
2. Click on Overview and view the movie.
Click on the WinBUGS icon to start a session.
Register to get full version access.
The WinBUGS user manual and Examples Vol I and Vol II provide a reference for beginners.
A good way to approach a problem with WinBUGS is to scan through the examples to find a problem like yours. You can then modify the code to fit your problem.
When you click on examples, you will open them in what is called a compound document. This organizes programs, graphs, and explanations into a single file. A good example is “Surgical: institutional ranking”.
Example #1: Inference is required about the proportion of people in the
population who are unemployed. Let us call this value “pi”.
Think: The true value of pi is in the range between 0 and 1, where 0
means no one is unemployed and 1 means everyone is.
Realistically, we might have some prior information on the value of pi.
For instance, newspaper reports, economic theory, previous surveys,
etc. Your prior can be as informative as you like.
To get things started, let's assume that you choose a prior as a random
value from a beta distribution restricted between the values of 0.1 and
0.6.
Specifying your prior
Create a directed graph. In WinBUGS, select Doodle > New...
In the New Doodle dialog, click ok to accept the default options. You will see a blank worksheet called “untitled”.
Left click anywhere in the middle of the sheet to create a node.
A node has a name and type along with other characteristics and parameters depending on its type.
Notes: To delete a node, highlight it, then press Ctrl-Delete.
To highlight a node, click on it.
Use Help > Doodle help to learn more.
Click on name, then type “pi”. Note the graphical node is labeled simultaneously.
Leave type as “stochastic”.
Click on density and change to “dbeta”. This means WinBUGS will choose a random value from the beta distribution.
Click on a and type “1” then on b and type “1”. These are the parameters of the beta distribution.
Click on lower bound and type “0.1” then on upper bound and type “0.6”. These are the bounds we set on the value of pi in our prior.
We will begin by using WinBUGS to look at samples from our prior. To do this, we need to write code using this doodle. WinBUGS executes from the code. The doodle helps us keep track of our model, which at this stage consists of a single node called pi.
On the main menu, select Doodle > Write Code.
A new window opens that displays the model code.
Left click over the text to highlight it. Click on Attributes > 16 point.
The model selects a random number from a beta distribution with parameters a=b=1, keeping only those values that lie between 0.1 and 0.6.
Note ~ is read “distributed as”.
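To see what this prior means outside WinBUGS, the description above can be mimicked with a short Python sketch. It is only an illustration under stated assumptions: the function name `sample_truncated_beta`, the seed, and the use of simple rejection (draw, then keep only values inside the bounds) are choices made here, not a description of WinBUGS's internal sampler.

```python
import random
import statistics

def sample_truncated_beta(a, b, lower, upper, n, seed=1):
    """Draw n values from Beta(a, b), keeping only draws inside [lower, upper]."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    kept = []
    while len(kept) < n:
        x = rng.betavariate(a, b)
        if lower <= x <= upper:  # discard draws outside the bounds
            kept.append(x)
    return kept

# Same settings as the doodle: a = b = 1, bounds 0.1 and 0.6, 5000 values.
prior_draws = sample_truncated_beta(1, 1, 0.1, 0.6, 5000)
prior_mean = statistics.mean(prior_draws)
print(f"mean = {prior_mean:.3f}, "
      f"min = {min(prior_draws):.3f}, max = {max(prior_draws):.3f}")
```

Because a beta distribution with a=b=1 is flat on (0, 1), the kept values should be spread evenly between the bounds, with a mean near the midpoint of 0.1 and 0.6.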
To run the model, select Model > Specification... to bring up the Specification Tool dialog box.
Click check model. In the lower left corner of the main dialog box you should see the words “model is syntactically correct”. The compile button in the Specification Tool becomes active.
Click compile. In the lower left corner you should see the words “model compiled”.
Click gen inits. This creates a starting value for the model. You should see the words “initial values generated, model initialized”.
Before you produce results, select Options > Output options. Select the log radio button. Note here you can also select the output precision.
Select Inference > Samples... This brings up the Sample Monitor Tool. In the node window type “pi”, then select set. Note here you can also select the percentiles of interest.
To run the model, select Model > Update... This opens the Update Tool. Change updates to 5000 then press update. Watch as the iterations are counted by 100 to 5000.
Return to Inference > Samples... to open the Sample Monitor Tool. Scroll to pi in the node window, then press density.
The Log window displays text indicating the model code is correct, and that the model compiled and was initialized.
After pressing density, the Log window displays a plot showing the distribution of prior values.
The x-axis is the set of possible values for pi and the y-axis indicates how often the model chooses a particular value.
The table-top shape of the graph indicates that all values of pi between 0.1 and 0.6 are equally likely. This is what we decided that we know about the unemployment rate before we look at our data sample.
There are other useful buttons on the Sample Monitor Tool. For example, by pressing stats we get a table of summary statistics appended in the Log window.
The mean value for our prior is 0.349, with 95% of the 5000 values in the range between 0.113 and 0.5869.
The MC error is the Monte Carlo error; it decreases as the number of samples increases. It helps in deciding when enough samples have been taken. Since we know the density is flat on top, we can reduce the wiggles by increasing the number of samples.
Try running 50,000 samples.
[Figure: density plot, pi sample: 50000]
Note the smoother density and the reduction in the MC error from 0.002 with 5000 samples
to 0.0006 for 50000 samples.
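The drop in MC error can be checked with a rough Python sketch. It assumes the draws are independent, in which case the Monte Carlo error of the mean is approximately the sample standard deviation divided by the square root of the number of draws. The function name `mc_error` and the seed are choices made here, and WinBUGS's own estimate may be computed differently, so treat this as an approximation only.

```python
import math
import random
import statistics

def mc_error(n, lower=0.1, upper=0.6, seed=1):
    """Approximate Monte Carlo error of the mean for n independent flat-prior draws."""
    rng = random.Random(seed)
    draws = [rng.uniform(lower, upper) for _ in range(n)]
    # Assumes independent draws: standard error of the mean = sd / sqrt(n).
    return statistics.stdev(draws) / math.sqrt(n)

for n in (5000, 50000):
    print(n, mc_error(n))
```

Multiplying the number of samples by 10 divides the error by roughly the square root of 10, about 3.2, which is in line with the change from 0.002 to 0.0006 noted above.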
For comparison and practice, let's rerun the analysis with a slightly different model. Let's
restrict our prior to the range between 0.2 and 0.45. In words, we are more precise
about what we know concerning the value of pi.
Go back to the Doodle window and change the upper and lower bounds accordingly. Then select Write Code. Check to see if the code is consistent with the Doodle.
With the Model window highlighted (window containing the code), select Model > Specification... to open the Specification Tool. First press check model. You will receive a warning. Confirm the warning, then click compile and gen inits.
Select Inference > Samples... to bring up the Sample Monitor Tool. Type in “pi” and then press set.
Select Model > Update... to open the Update Tool. Change updates to 5000 then press update.
Select Inference > Samples... Scroll to pi in the node window and press density and stats.
The new results are added to the Log window.
Note that the x- and y-axis scales are different in the two graphs.
We see that the range of possible values is constrained and that the mean value shifts to the left (is smaller).
The graph indicates that we believe the true value for pi is bounded, but that within the bounds any value is equally likely.
This simple example demonstrates how WinBUGS works. It shows how to start with a doodle and end up with a set of random numbers that encapsulate our belief about the unknown population parameter.
Inference by combining your prior with data
The real power of WinBUGS comes from the ability to combine your prior beliefs with
data you have in hand, thus allowing you to make inferences.
Continuing with our unemployment example, suppose you have results from a small survey.
A total of N = 14 people were surveyed and r = 4 of them were unemployed.
These data have a binomial distribution with proportion pi and denominator N.
Go back to your directed graph and add two additional nodes.
Add a constant node N.
Add a stochastic node r with binomial density, proportion equal to pi and order N.
Left click in your Doodle window, change type to constant and name to N.
Left click again to add a stochastic node with name r, density dbin, proportion pi and
order N.
To add links between the nodes, click on node r to highlight it. With the Ctrl key held, click on node pi and node N. Arrows will be added from these nodes, pointing to the highlighted node. The solid arrows indicate a statistical relationship (is distributed as).
The directed graph describes the new model.
The number of unemployed r is modeled as a random variable having a binomial distribution with proportion pi and order N.
N is a constant, and the prior for pi is a beta distribution with two parameters (a=b=1), restricted between 0.2 and 0.45.
Select Doodle > Write Code
To enter the data, type the following in the model code window.
list(N=14,r=4)
When you add arrows, make sure the parameters of the stochastic node do not change.
Select Model > Specification...
Click check model.
Highlight the word “list” and press load data. If everything is well you will see the message “data loaded”.
Press compile then gen inits.
Select Inference > Samples... Type “pi” and press set.
WinBUGS has data for a node with a distribution, so it will calculate the appropriate likelihood function and prior for pi, and combine them into a posterior distribution. It knows about conjugate pairs of distributions, so the calculation is straightforward.
Select Model > Update... Change updates to 5000, then press update.
WinBUGS generates updated samples of pi (updated from the initial) by combining the prior
information on pi and the new information on pi given by the data r and N.
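The way the prior and the data combine can be illustrated with a small grid calculation in Python. This is a sketch under stated assumptions, not what WinBUGS does internally: because the prior is flat between the bounds, the posterior there is proportional to the binomial likelihood, so it can be normalised numerically on a grid. The function names `grid_posterior` and `mean_sd` and the grid size are choices made here.

```python
import math

def grid_posterior(r, n, lower, upper, steps=2000):
    """Posterior weights on a grid for a flat prior on [lower, upper] and binomial data."""
    width = (upper - lower) / steps
    grid = [lower + (i + 0.5) * width for i in range(steps)]  # cell midpoints
    # Flat prior inside the bounds, so posterior is proportional to the likelihood.
    like = [p ** r * (1 - p) ** (n - r) for p in grid]
    total = sum(like)
    return grid, [v / total for v in like]

def mean_sd(grid, weights):
    """Mean and standard deviation of a discrete distribution on the grid."""
    mean = sum(p * w for p, w in zip(grid, weights))
    var = sum((p - mean) ** 2 * w for p, w in zip(grid, weights))
    return mean, math.sqrt(var)

grid, post = grid_posterior(r=4, n=14, lower=0.2, upper=0.45)
post_mean, post_sd = mean_sd(grid, post)
prior_sd = (0.45 - 0.2) / math.sqrt(12)  # sd of a flat distribution on [0.2, 0.45]
print(f"prior sd = {prior_sd:.3f}")
print(f"posterior mean = {post_mean:.3f}, posterior sd = {post_sd:.3f}")
```

The grid answer is exact up to discretisation, so it gives a reference against which the sampled summaries in the Log window can be compared.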
[Figure: density plots of pi (sample: 5000) for the prior and for the posterior]
Note how the data changes our view of the unemployment rate. For one thing, the data gives
us reason to think that the unemployment rate pi is less than 0.4. There is still plenty
of uncertainty, but it is less than before we took the sample. The standard deviation (sd)
of pi from the prior is 0.072 but is 0.067 from the posterior. The 95% credible interval
shrinks accordingly.
For more practice, and to see the effect of a larger sample on the posterior, rerun the model with N = 140 and r = 40.
[Figure: posterior density plot, pi sample: 5000]
Thus as we increase our information about pi through more data, the influence of the prior on
the posterior decreases. This is seen by the fact that the posterior density looks less
like the original prior distribution.
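The shrinking influence of the prior can also be seen numerically. The Python sketch below is illustrative only: the function name `posterior_summary` and the grid approach are assumptions made here, with the same bounded flat prior (0.2 to 0.45) as in the tutorial. It compares the posterior for the small survey with the posterior for the survey ten times larger.

```python
import math

def posterior_summary(r, n, lower=0.2, upper=0.45, steps=2000):
    """Posterior mean and sd of pi for a flat prior on [lower, upper], via a grid."""
    width = (upper - lower) / steps
    grid = [lower + (i + 0.5) * width for i in range(steps)]
    # Work on the log scale so large samples do not underflow.
    log_like = [r * math.log(p) + (n - r) * math.log(1 - p) for p in grid]
    peak = max(log_like)
    weights = [math.exp(v - peak) for v in log_like]
    total = sum(weights)
    mean = sum(p * w for p, w in zip(grid, weights)) / total
    var = sum((p - mean) ** 2 * w for p, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

small = posterior_summary(r=4, n=14)     # original survey
large = posterior_summary(r=40, n=140)   # ten times as many people
print(f"N=14:  mean = {small[0]:.3f}, sd = {small[1]:.3f}")
print(f"N=140: mean = {large[0]:.3f}, sd = {large[1]:.3f}")
```

With the larger survey the posterior concentrates around the observed proportion 40/140 and is narrower than the posterior from the small survey.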
Example #2: Suppose that we take another survey, perhaps at some
time later. This time 12 different people were asked and 5 said they
were unemployed. What is the evidence that the underlying rates for
the two surveys are really different?
Note: From the first sample, 4/14 or 29% of the people were
unemployed. From the second sample, 5/12 or 42% of the people were
unemployed. Looking only at the percentages, the difference appears
to be large.
Using WinBUGS we can determine how much we know about the
differences in the rates from these two small surveys by calculating a
posterior for the difference. Alternatively, we can calculate a posterior
for the ratio.
A model for the ratio and difference of two binomial samples
Use the model from example #1 and add a second set of nodes for the new survey.
Call the prior pi2, the number of unemployed r2, and the number of surveyed N2.
Notes:
The nodes can be rearranged using click and drag.
pi2 is set up the same as pi using a beta density (a=b=1) and bounded.
N2 is a constant like N.
r2 is set up the same as r using a binomial density with proportion pi2 and order N2.
To get the differences and ratios we create two logical nodes.
Left click to create a node. Change type to logical, name to ratio, leave link as identity, and set value to pi2/pi.
Left click to create a node. Change type to logical, name to difference, leave link as identity, and set value to pi2-pi.
Add the links to finish the model. Note that the links to the logical nodes are made with a
hollow arrow which indicates a deterministic relationship as opposed to the solid arrow
which indicates a stochastic relationship. The final doodle and corresponding code for
the two proportion model are:
[Figure: directed graph (Doodle) for the two-proportion model]
model;
{
   pi ~ dbeta(1,1)I(0.2,0.45)
   r ~ dbin(pi,N)
   pi2 ~ dbeta(1,1)I(0.2,0.45)
   r2 ~ dbin(pi2,N2)
   ratio <- pi2 / pi
   difference <- pi2 - pi
}
A deterministic relationship, indicated by a hollow arrow, is coded in the model using the <- symbol, as in R or S-PLUS. A statistical relationship is coded in the model with a ~ symbol.
Add the data.
In the model window type: list(N=14, r=4, N2=12, r2=5).
Use the Specification Tool to check the model, load the data, compile, and generate the
initial values.
Use the Sample Monitor Tool to set ratio and difference.
Use the Update Tool to generate 5000 values of the ratio and difference.
[Figure: density plots of difference (sample: 5000) and ratio (sample: 5000)]
Does it appear as if the sample rates of unemployment are significantly different?
Hint: The area under the difference curve to the right of zero is not much larger
than the area to the left of zero. Also the area to the right of one under the ratio
curve is not much larger than the area to the left of one.
Also, the mean difference is close to zero and the mean ratio is close to one. The
95% credible interval for the difference in unemployment rates is (-0.15, 0.20),
which is interpreted to mean that there is a 95% probability that the difference lies
somewhere in this interval. This interval includes 0. Similarly the 95% credible
interval for the ratio of unemployment rates is (0.61, 1.90), which includes the
value of 1.
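The same summaries can be approximated outside WinBUGS with a short Monte Carlo sketch in Python. It relies on an assumption that holds for this particular model: a flat Beta(1,1) prior combined with binomial data gives a Beta(r + 1, N - r + 1) posterior, here cut off at the prior bounds 0.2 and 0.45. The helper names, the seed, and the simple rejection step are choices made for this illustration, and the intervals it prints will differ slightly from the WinBUGS values because of sampling noise.

```python
import random

def truncated_beta_draws(a, b, lower, upper, n, rng):
    """Draw n values from Beta(a, b), keeping only those inside [lower, upper]."""
    draws = []
    while len(draws) < n:
        x = rng.betavariate(a, b)
        if lower <= x <= upper:
            draws.append(x)
    return draws

def interval_95(values):
    """Empirical 2.5% and 97.5% points of a list of draws."""
    ordered = sorted(values)
    lo = ordered[int(0.025 * len(ordered))]
    hi = ordered[int(0.975 * len(ordered)) - 1]
    return lo, hi

rng = random.Random(2)  # arbitrary seed, for repeatability
n_draws = 5000
# Posterior draws for each survey: Beta(r + 1, N - r + 1) within the prior bounds.
pi1 = truncated_beta_draws(4 + 1, 14 - 4 + 1, 0.2, 0.45, n_draws, rng)
pi2 = truncated_beta_draws(5 + 1, 12 - 5 + 1, 0.2, 0.45, n_draws, rng)

difference = [b - a for a, b in zip(pi1, pi2)]
ratio = [b / a for a, b in zip(pi1, pi2)]
diff_ci = interval_95(difference)
ratio_ci = interval_95(ratio)
p_positive = sum(d > 0 for d in difference) / n_draws  # area to the right of zero

print("95% interval for difference:", diff_ci)
print("95% interval for ratio:", ratio_ci)
print("P(difference > 0):", p_positive)
```

The last quantity is the area under the difference curve to the right of zero mentioned in the hint, expressed as a posterior probability.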
In Bayesian analysis, the 95% credible interval is the analogue of the 95% confidence interval in conventional statistics. However, with a Bayesian analysis we state that there is a 95% probability that the parameter lies between the interval end points, whereas in a conventional analysis we state that, under repeated sampling, 95% of all such intervals will contain the true but unknown value of the parameter.
Summary
This tutorial describes the basics of using WinBUGS.
It explains how to make directed graphs using Doodle. A directed graph provides a visual representation of a statistical model. The graphs simplify complex models, communicate the structure of the problem, and provide the basis for computation.
It explains how to compile, load data and run WinBUGS code. The code can be
written automatically from the directed graph.
It explains how to view output using a log file. The density and statistics give
information about the posterior distribution that can be used for drawing inferences.
The examples were easy as they relied on conjugate distributions (beta, binomial).
Also, there was only one unknown parameter in example #1 and only two unknown
parameters in example #2.
WinBUGS works well for these types of problems. For more complicated problems
with lots of parameters, we cannot be so sure. WinBUGS provides a set of
diagnostic tools that allow you to check whether things are working properly.
Things can go wrong and the help manual contains a warning. Using examples
from the manual is a good path to follow, but always look critically at your results
to see if they make sense. Use the diagnostic tools.