BayesAss Edition 3.0 User's Manual


MCMC run completed. Output written to BA3out.txt

The program will create an output file in the current working directory when it has finished running. By default the output file is named BA3out.txt. You can double click on this file to open it with the Mac text editor and see the results.

3 Data file format

The BA3 program uses an input file format that is identical to that of earlier BayesAss releases. The input file should be in a plain text format. DO NOT use a word processor such as Word to create the input file without explicitly converting it to a text file format before use. One possible approach is to input the data into a spreadsheet program such as Excel and then save the file as a “space-delimited text file.” Another approach is to install one of the many available free text editors such as emacs or vi on your computer. Each line of the input file should have the following format:

indivID popID locID allele1 allele2

where indivID is a unique identifier for the individual, popID is a unique identifier of the individual’s source population, locID is a unique identifier for the locus, and allele1 and allele2 are the allele labels for each allele of the individual’s genotype. The order of the alleles on the line is arbitrary. Missing alleles are represented using a 0. If there are n individuals and L loci there will be n × L lines in the input file. See the example data files distributed with the program.
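As a rough illustration of the format described above, the following Python sketch writes genotype records as space-delimited lines and checks that the result has n × L lines. The function names, identifiers and allele labels are hypothetical examples and are not part of the BA3 distribution.

```python
def format_records(genotypes):
    """Return lines of the form: indivID popID locID allele1 allele2.

    `genotypes` is a list of (indivID, popID, locID, allele1, allele2)
    tuples. A missing allele is passed as None and written as 0.
    """
    lines = []
    for indiv, pop, locus, a1, a2 in genotypes:
        alleles = ["0" if a is None else str(a) for a in (a1, a2)]
        lines.append(" ".join([str(indiv), str(pop), str(locus)] + alleles))
    return lines


def check_line_count(lines):
    """Check that there is one line per individual per locus (n x L)."""
    individuals = {line.split()[0] for line in lines}
    loci = {line.split()[2] for line in lines}
    return len(lines) == len(individuals) * len(loci)


# Hypothetical data: 2 individuals, 2 loci, one missing allele.
records = [
    ("ind1", "popA", "loc1", 152, 156),
    ("ind1", "popA", "loc2", 201, None),
    ("ind2", "popB", "loc1", 152, 152),
    ("ind2", "popB", "loc2", 203, 205),
]
lines = format_records(records)
```

Writing the lines to disk with "\n".join(lines) yields a plain text file of the kind described in this section.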

4 Command line options

The BA3 program has about a dozen command line options that allow you to control the way the program runs and the level of detail in the output that it produces. The command line options are given after the program name and before the input file name.

For example,

./BA3 -v -i=10000000 -o myout.txt myin.txt

executes the program for 10 million iterations using verbose output, writing the output to the file myout.txt and using the input file myin.txt. Some options, such as the option specifying the number of iterations, -i, take parameter values while others, such as -v, do not. A parameter value should follow the option specifier and may be written immediately after it or separated from it by a space or an = sign. For example, the following are all equivalent ways to specify 1,000,000 iterations:


./BA3 -i1000000 myin.txt

./BA3 -i 1000000 myin.txt

./BA3 -i=1000000 myin.txt

Table 1 lists all the command line options with a brief description of their parameters and effects. Each option is described in detail in the remainder of this section. Following Unix conventions, each command line option has two possible forms: a short (one letter) form preceded by - and a longer, one word form preceded by --. For example, the “verbose output” option can be specified on the command line as either -v or --verbose. The longer forms are available solely because some users find them easier to remember.

Option            Parameter Values    Effect
-a --deltaA       0 < ΔA ≤ 1.0        Mixing parameter for allele frequencies
-b --burnin       Positive integer    Number of iterations to discard as burnin
-f --deltaF       0 < ΔF ≤ 1.0        Mixing parameter for inbreeding coefficients
-g --genotypes    None                Output genotypes and migrant ancestries
-i --iterations   Positive integer    Number of iterations for MCMC
-m --deltaM       0 < ΔM ≤ 1.0        Mixing parameter for migration rates
-n --sampling     Positive integer    Interval between samples for MCMC
-o --output       String              Output file name
-s --seed         Positive integer    Seed for random number generator
-t --trace        None                Create a trace file to monitor convergence
-u --settings     None                Output options and parameter settings
-v --verbose      None                Use verbose screen output

Table 1: Options available for BA3 program

4.1 Random number generator seed

The option -s (--seed) is used to specify a positive integer used to “seed” the random number generator algorithm. A deterministic algorithm is used to generate pseudorandom numbers during the MCMC such that the sequence of random numbers is entirely determined by the starting seed. Thus, separate runs of the program started using the same seed will produce exactly the same outcome. To test whether the program is converging it is important to carry out several independent runs initiated with different seeds.

To start the program using 104656 as the random number seed use the following command:

./BA3 -s=104656

If no seed is specified the default seed is 10.
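One way to organize several independent runs with different seeds is sketched below in Python. It only builds the command lines, using the -s and -o options from Table 1; the seed values, the input file name and the output file naming scheme are hypothetical choices made for illustration.

```python
def build_runs(input_file, seeds):
    """Return one BA3 argument list per seed, each with its own output file."""
    runs = []
    for seed in seeds:
        runs.append([
            "./BA3",
            f"-s={seed}",
            "-o", f"run_seed{seed}.txt",  # hypothetical naming scheme
            input_file,
        ])
    return runs


# Hypothetical seeds; any distinct positive integers would do.
runs = build_runs("myin.txt", [10, 104656, 777])
for cmd in runs:
    # To launch a run, pass cmd to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```

Giving each run its own output file keeps a later run from overwriting the default BA3out.txt written by an earlier one.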


4.2 MCMC iterations, burn-in and sampling interval

The command line option -i (--iterations) specifies the number of iterations for the Markov chain Monte Carlo (MCMC) analysis. By default the program uses 5,000,000 iterations. The number of iterations is an important factor in determining whether an MCMC analysis has converged (see below). In general, a greater number of iterations is more likely to ensure convergence, but the run-time of the program also increases in proportion to the number of iterations. The number of iterations should be a positive integer. For example,

./BA3 -i10000000 test.txt

will execute the program using the data file test.txt and carry out 10 million iterations. The option -b (--burnin) is used to specify a positive integer that is the number of iterations of the MCMC that are discarded before sampling begins to obtain a sample of values that will be used to estimate parameters. Burn-in length is chosen such that the chain is likely to have reached the stationary distribution before sampling begins. The burn-in length must obviously be less than the total number of iterations. For example,

./BA3 -i10000000 -b1000000 test.txt

will run the MCMC for 10 million iterations, discarding the first 1 million iterations. In this case, 9 million iterations are available for sampling. The option -n (--sampling) is used to specify a positive integer that is the interval between samples. This interval must obviously be less than the number of iterations minus the burn-in, but will typically be much smaller, perhaps 100 or 1000. For example,

./BA3 -i10000000 -b1000000 -n1000 test.txt

will run the MCMC for 10 million iterations, discarding the first 1 million iterations and sampling every 1000 iterations from the remaining 9 million iterations, producing a sample of 9000 observations from the chain that will be used to estimate parameters.
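The arithmetic in this example can be checked with a short Python sketch. It assumes that one sample is retained every `interval` iterations after the burn-in; the function name is hypothetical, and the exact count BA3 reports could differ by one depending on how it handles the endpoints.

```python
def retained_samples(iterations, burnin, interval):
    """Approximate number of samples kept, given the -i, -b and -n values."""
    if min(iterations, burnin, interval) <= 0:
        raise ValueError("all three values must be positive integers")
    if burnin >= iterations:
        raise ValueError("burn-in must be less than the number of iterations")
    if interval >= iterations - burnin:
        raise ValueError("interval must be less than iterations minus burn-in")
    return (iterations - burnin) // interval


# Values from the example: -i10000000 -b1000000 -n1000
n_samples = retained_samples(10000000, 1000000, 1000)
print(n_samples)
```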

4.3 MCMC mixing parameters

For continuous parameters such as migration rates, allele frequencies and inbreeding coefficients, the size of the proposed change to the parameter value at each iteration of the MCMC can be adjusted. These adjustments are used to fine-tune the acceptance rates for proposals (see discussion below). There are 3 mixing parameter adjustments: -a (--deltaA), -f (--deltaF) and -m (--deltaM) that adjust the proposal size for the allele frequencies, inbreeding coefficients and migration rates, respectively. Each mixing parameter should be a number between 0 and 1, with the size of the proposed move being proportional to the magnitude of this number.
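To give a feel for how proposal size relates to acceptance rate, here is a toy Python sketch of a Metropolis sampler for a single parameter on (0, 1). It is not BA3's algorithm: the target density, the uniform proposal window and all names are assumptions chosen purely for illustration.

```python
import math
import random


def log_target(x):
    """Log of an unnormalized toy density on (0, 1), peaked near 0.25."""
    return 20.0 * math.log(x) + 60.0 * math.log(1.0 - x)


def acceptance_rate(delta, iterations=5000, seed=1):
    """Run a toy Metropolis chain and return the fraction of accepted moves."""
    rng = random.Random(seed)
    x = 0.5
    accepted = 0
    for _ in range(iterations):
        # Propose a move drawn uniformly from a window of half-width delta.
        proposal = x + rng.uniform(-delta, delta)
        # Proposals outside (0, 1) have zero density and are rejected.
        if 0.0 < proposal < 1.0:
            log_ratio = log_target(proposal) - log_target(x)
            # 1 - random() lies in (0, 1], so the logarithm is defined.
            if math.log(1.0 - rng.random()) < log_ratio:
                accepted += 1
                x = proposal
    return accepted / iterations


small_step = acceptance_rate(0.01)
large_step = acceptance_rate(1.0)
print(small_step, large_step)
```

In this toy setting a small window gives a high acceptance rate with slow movement through parameter space, while a large window gives a low acceptance rate, which is the trade-off the mixing parameters are meant to balance.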
