Dissertation
submitted to the
Combined Faculties for the
Natural Sciences and for Mathematics
of the Ruperto-Carola University of Heidelberg
Germany
for the degree of
Doctor of Natural Sciences
Put forward by
Dipl.-Phys. Tim Schulze-Hartung
born in Ludwigshafen am Rhein
Date of oral examination: 25 January 2013
Searching and Characterising Exoplanets
using
Astrometry and Doppler Spectroscopy
Referees:
Prof. Dr. Thomas Henning
Prof. Dr. Andreas Quirrenbach
We on Earth have just awakened
to the great oceans of space and time
from which we have emerged.
— Carl Sagan
Zusammenfassung
Für die Suche nach extraterrestrischem Leben wie zur Weiterentwicklung von theoretischen Modellen ist der Nachweis von Exoplaneten von zentraler Bedeutung. Die meisten
Planetenkandidaten sind bislang mittels Doppler-Spektroskopie des Zentralsterns entdeckt
worden. Die so gemessenen Radialgeschwindigkeiten lassen in Kombination mit Astrometrie
Rückschlüsse auf Masse und Charakteristik der Umlaufbahn des Begleitobjekts zu. In der
vorliegenden Arbeit werden Modelle zur theoretischen Beschreibung der entsprechenden
Observablen hergeleitet. Diese Modelle können unter Annahme eines Fehlermodells mit den
Messdaten verglichen werden. Als statistische Methode wird hier der Bayes’sche Ansatz
genauer erläutert und weiterverfolgt. Neben der Berücksichtigung von A-priori-Wissen
erlaubt dieser unter anderem die Ableitung von Wahrscheinlichkeitsdichten der Parameter
und den statistisch robusten Nachweis von Exoplaneten. Zur Anwendung der Bayes’schen
Methode auf Astrometrie- und Radialgeschwindigkeitsdaten wurde das Computerprogramm
Base stark weiterentwickelt. Nach einer ausführlichen Vorstellung des Tools wird dieses zur
Bestimmung der Umlaufbahn des Doppelsterns Mizar A eingesetzt. Hierbei werden frühere
Resultate bestätigt und fundierte Aussagen über die Unsicherheiten der Parameter getroffen.
In Bezug auf den nahen Stern ε Eridani lassen eine Frequenzanalyse und der Bayes-Faktor
keine eindeutigen Schlüsse über die Präsenz eines umstrittenen planetaren Begleiters zu.
Dies könnte durch stellare Aktivität sowie Eigenschaften der zugrundeliegenden Daten
verursacht sein. Eine weitere Untersuchung dieser Effekte erscheint aussichtsreich.
Abstract
The discovery of exoplanets plays a key role both in advancing theoretical models and in
the search for extraterrestrial life. Most planet candidates have so far been detected by
means of Doppler spectroscopy of their central star. The radial velocities thus measured,
in combination with astrometry, allow conclusions to be drawn on the mass and orbital
characteristics of the companion. In this work, models which theoretically
describe the corresponding observables are derived. Under the assumption of an error
model, these observable models can be compared to the measured data. Here, the Bayesian
approach is detailed and pursued as a statistical method. In addition to taking a priori
knowledge into account, it allows probability densities over the parameters to be derived
and exoplanets to be detected in a statistically robust way. To apply the Bayesian method
to astrometric and radial-velocity data, the computer program Base has been significantly extended.
After a detailed presentation of the tool, it is employed to determine the orbit of the binary star
Mizar A. This leads to a confirmation of earlier results and well-founded statements on the
parameter uncertainties. For the nearby star ε Eridani, a frequency analysis and the Bayes
factor do not support unambiguous conclusions about the presence of a controversial planetary
companion. This might be caused by stellar activity and properties of the underlying data.
Further investigation of these effects seems promising.
Contents
1 The search for exoplanets
  1.1 Planet formation and statistics
    1.1.1 From dust to planets
    1.1.2 Gas giants
    1.1.3 Statistics and migration
  1.2 Observational techniques
2 Physics and observables
  2.1 Dynamics and kinematics of planetary systems
    2.1.1 Stellar motion in the orbital plane
    2.1.2 Transformation to reference system
    2.1.3 Relation to the planetary orbit
    2.1.4 Multiple planets
  2.2 Observable models
    2.2.1 Additional observable effects
    2.2.2 Hipparcos intermediate astrometric data
    2.2.3 Radial velocities
    2.2.4 Determination of planetary mass
    2.2.5 Binary systems
  2.3 Parameters and derived quantities
  2.4 Errors and noise
3 Data analysis
  3.1 Frequentist inference
    3.1.1 Likelihood of data
    3.1.2 Parameter estimation
    3.1.3 Uncertainty estimation
    3.1.4 Model selection
      Period detection
  3.2 Bayesian inference
    3.2.1 Bayes’ theorem
    3.2.2 Encoding prior knowledge
    3.2.3 Posterior sampling of parameters and derived quantities
    3.2.4 Marginalisation and density estimation
    3.2.5 Parameter estimation
    3.2.6 Uncertainty estimation
    3.2.7 Model selection
    3.2.8 Model-uncertainty prediction and observation scheduling
4 Informatics and implementation
  4.1 Requirements
  4.2 Other software and features of BASE
  4.3 Modes of operation
    4.3.1 Normal and binary mode
    4.3.2 Number-of-planets modes
    4.3.3 Periodogram mode
  4.4 Input
    4.4.1 Invocation and options
    4.4.2 Data and selection of mode
      File format
      Data grouping
    4.4.3 Priors
  4.5 Program architecture
    4.5.1 Top-level program flow
    4.5.2 Posterior sampling by MCMC
    4.5.3 Improvement of mixing by parallel tempering
    4.5.4 Assessing convergence by multi-PT
    4.5.5 Organisation of source code
  4.6 Specific algorithms
    4.6.1 Periodogram mode
    4.6.2 Saving and reading samples
    4.6.3 Synthetic data
5 Bayesian analysis of exoplanet and binary orbits
  5.1 Introduction
  5.2 Methods and models
    5.2.1 Likelihoods and frequentist inference
    5.2.2 Bayesian inference
      Bayes’ theorem
      Posterior inference
    5.2.3 Observable models
      Stellar motion in the orbital plane
      Transformation into the reference system
      Relation to the planetary orbit
      Observables
      Effects of the motion of observer and CM
  5.3 BASE – Bayesian astrometric and spectroscopic exoplanet detection and characterisation tool
    5.3.1 Prior knowledge
    5.3.2 Physical systems and modes of operation
    5.3.3 Types of data
    5.3.4 Computational techniques
  5.4 Target and data
  5.5 Analysis and results
    5.5.1 Preparation of RV data
    5.5.2 Pass A: first constraints on RV parameters
    5.5.3 Pass B: combining all data
    5.5.4 Pass C: selecting the frequency f
    5.5.5 Passes D and E: selecting ω₂ and Ω and refining results
  5.6 Conclusions
  5.7 Appendix: Encoding prior knowledge
  5.8 Appendix: Numerical posterior summaries
6 A planet around ε Eridani?
  6.1 Introduction
  6.2 Previous work
  6.3 Analysis and results
    6.3.1 Determination of priors
    6.3.2 Bayesian periodogram of all data
    6.3.3 Bayesian periodograms of individual data sets
    6.3.4 Model selection
  6.4 Conclusions
7 Conclusions
Acknowledgements
Chapter 1
The search for exoplanets
Discovering other worlds
The idea that life could exist outside of our Earth has fascinated humankind for countless
years. While life has not been unveiled on any of the seven other planets in our Solar
System, the investigation has a much wider scope encompassing thousands of stars in the
Solar neighbourhood which may host planets. Although this search has been a subject of
scientific investigation since the nineteenth century, it is only twenty years ago that the first
extrasolar planet (or exoplanet) was discovered in orbit around the pulsar PSR B1257+12
(Wolszczan and Frail 1992), followed by the detection of a Jupiter-mass planet orbiting the
main-sequence star 51 Pegasi (Mayor and Queloz 1995). Whereas the radiation of a pulsar
would probably not allow life as we know it to exist, 51 Pegasi is a host star similar to our
Sun. Since that time, more than 800 (partly unconfirmed) extrasolar planet candidates
with diverse properties have been unveiled in more than 600 systems, more than 100 of
which show signs of multiplicity (Schneider et al. 2011). It has been demonstrated
using models for planetary formation and evolution that the diversity in properties of these
objects reflects the variety of conditions in which they have formed (Mordasini et al. 2009).
According to the premise that life tends to assume the forms known to us from Earth,
particularly that it is based on liquid water, the search increasingly concentrates on the
detection of terrestrial planets, i.e. rocky, low-mass planets potentially similar to our
Earth. Of prime interest are such planets that orbit their parent stars in what is called
the habitable zone. Planets probably have to reside in this zone, whose extent depends
on the stellar properties, if most of the H₂O potentially present on their surfaces is to
be liquid. Still, atmospheric theory suggests that water surfaces
could also be completely frozen in the habitable zone (Boschi et al. 2012).
Thus, characterising the parent stars and orbits of exoplanets only serves to exclude
environments hostile to life. Conclusive evidence for life forms as we know them
requires, at the very least, analysing the planets’ atmospheres and finding unambiguous
biomarkers such as O₂ or O₃ (e.g. Seager 2003).
Beyond the ultimate question of life, it is of interest how many Earth-like planets exist
around the various types of stars and how their sizes and masses are distributed. While
most previous planet searches have focused on Sun-like stars and mainly found massive gas
giants similar to Jupiter, which cause the strongest of planetary signals in observational
data, stars of different masses and types are increasingly becoming targets of investigation.
Observing low-mass stars is promising because terrestrial planets are easiest to detect
around these; the fact that the empirical distribution of planetary minimum mass rises
towards lower masses (Butler et al. 2006) might indicate that terrestrial planets are frequent.
Furthermore, increasing the sample of planets with well-characterised orbits and properties
around stars of types as varied as possible will also help to improve our understanding
of planet formation by allowing predicted planet populations to be compared more closely
with the observed ones (Alibert et al. 2011). Yet in this endeavour, one faces the challenge
of detecting and interpreting signals that are sometimes weaker than the ever-present
noise. Moreover, telescope time and thus observational data are always
scarce.
In the present work, statistical methods of detecting extrasolar planets and characterising
their orbits in terms of orbital parameters and derived quantities, based on astrometric
and radial-velocity data, are discussed, implemented and applied to such data. Closely
related to the orbits of planets are those of binary stars, which are treated analogously.
This work is organised as follows. Current models of planet formation and the
observed statistical properties of exoplanets are outlined in section 1.1, while section 1.2
gives an overview of today’s observational techniques for exoplanet¹ detection. In chapter 2,
models for the signals caused by planetary and binary companions in astrometric and
radial-velocity data are derived, followed in chapter 3 by a discussion of data-analysis
methods for relating data and models, taking into account noise and prior information
on the model parameters. The methods presented are implemented in Base, a software
tool which is described in chapter 4. After it has been validated in chapter 5 using
observational data, Base is used in chapter 6 to assess the presence of putative planets
around the star ε Eridani. Finally, conclusions are drawn in chapter 7.
1.1 Planet formation and statistics
Planets are believed to be created in a common process together with their parent stars.
The formation of these objects takes place in interstellar molecular clouds composed of gas
(predominantly H₂, but including many other kinds of atoms and molecules) and dust
in a variety of species. Due to local density fluctuations, gravitational instabilities arise
which may lead to the cloud’s collapse. However, because each cloud has a non-vanishing
angular momentum, not all of the mass falls onto the central protostar: a protostellar
disk is also formed in the plane perpendicular to the total angular momentum, possibly
accommodating a high fraction of the collapsing cloud’s matter. From solar-mass clouds, a
protostellar “embryo” is thought to be created within less than 10⁵ yr, while the remainder
of mass accretion takes multiples of that time, amounting to about a million years after
the initial collapse before the star has reached its final mass and proceeds towards the
main sequence. At that time, a long process has also begun further out in the disk – the
formation of protoplanetary bodies, which evolve over tens if not hundreds of millions of
years to planets such as those found in our Solar System. The processes of planet formation
are still only partly understood, but they may clearly be as manifold as the resulting bodies
that have already been detected. Several planet-formation models have been developed,
the most widely accepted of which are briefly sketched below. For an in-depth treatment of
the matter, the reader is referred to the review article by Papaloizou and Terquem (2006),
as well as to Perryman (2011), who provides a valuable reference not only in this
area, but for contemporary exoplanet science in all its variety.
¹ Although most methods described in section 1.2 apply to any kind of companion to a star, including
planets, brown dwarfs, or stars, companions are referred to here as “exoplanets”: the least massive case
and hence generally the hardest to detect.
1.1.1 From dust to planets
The fine dust grains initially present in the protoplanetary disk, with sizes ranging down
to below 1 µm, collide and stick together, forming larger bodies between 1 cm and 10 m
in diameter. This is accompanied by the dust settling in the disk’s mid-plane. The
temporal order of these two processes is just one example of the many aspects of the
models that are still uncertain. Due to their significant interaction with the gas in the disk, the
newly formed larger grains experience a force that on average drives them towards the
centre of the disk, leaving a time span of only about 100 yr for coagulation. One effect
that may help to circumvent this difficulty is turbulence, accompanied by local pressure
enhancements that facilitate the aggregation of solids.
As the metre-size barrier is overcome, planetesimals with diameters of order 1 km and
more, whose interactions are increasingly determined by gravity, are presumed to form by
essentially the same mechanisms as their smaller predecessors, i.e. pairwise collisions and
sticking. The increasing role of gravitational interactions then leads to a phase of runaway
growth between size scales of 10 km and 100 km, where the rate of mass growth increases
with time for larger bodies only. This phase is probably followed by an oligarchic-growth
epoch, in which the larger bodies all grow at a similar rate, dominantly removing material
from the influence of their smaller “competitors” and ending up about 1000 km in diameter.
Finally, the strong gravitational interactions between these larger bodies become more
chaotic, leading to severe disruptions of their Keplerian orbits around the barycentre and
allowing for collisions that shape the final planetary system and produce planets of masses
similar to that of the Earth.
1.1.2 Gas giants
In contrast to the rocky planets discussed above, the existence of giant planets such as
Jupiter cannot be explained solely by the same mechanisms. Predominantly composed of
gaseous matter tens or hundreds of times the mass of the Earth, potentially surrounding
a solid core, such objects probably could not be formed from the material found close to
the accreting protostar, whereas further out in the disk, the accretion time scales may be
too long. Instead, the formation of such gas giants is explained today by two competing
scenarios: core accretion (which is favoured overall) and gravitational instability.
The core-accretion model states that some of the objects produced as rocky planets,
with masses of several (tens of) Earth masses, may turn into the cores of newly forming
giant planets by gravitationally capturing large amounts of gas and also planetesimals.
The developing gaseous envelope further enhances the rate of collection of planetesimals,
an increasing fraction of which dissolve in the dense gas. Where conditions are adverse, i.e.
where the available gas or core mass is insufficient, an ice giant such as Uranus or Neptune,
or no giant planet at all, may be formed from the core.
It is suspected that massive cores may only be able to form beyond the so-called snow
line,² a certain distance from the protostar, where temperatures are so low that various
otherwise gaseous species are in the solid phase, thus providing enough matter to form
massive cores. There, however, the gas-accretion time scales are longer, possibly conflicting
with the onset of dispersion of the gaseous protoplanetary disk by, e.g., photoevaporation
within about ten million years. While this may let some of the planets end up as ice
giants, the formation of gas giants by core accretion can also be modelled over such time
scales. One effect thought to accelerate the formation process is the protoplanet’s migration
(section 1.1.3) through parts of the disk.
² For the protoplanetary disk preceding our Solar System, this line has been found to correspond to a
distance of 2.7 AU from the centre (e.g. Lecar et al. 2006), which would be intermediate between the orbits
of Mars and Jupiter in today’s terms.
[Figure 1.1 (plot not reproduced): density versus the logarithm of the period P in days.]
Figure 1.1: Distribution of the periods P listed for 820 planet candidates by Schneider
(2012). The dotted line indicates the median period P = 54.3 d.
In the competing model of gravitational disk instability, giant planets may be formed
by collapsing fragments of a dense protoplanetary disk. Taking place over much shorter
time scales than the various processes postulated in core accretion, gravitational instability
would enable the formation of gas giants well before the disk’s dispersion. Uncertainties
of this approach include whether fragmentation can occur at smaller distances from the
centre or for less massive protoplanets, and whether it could reproduce the assumed core
masses of the giant planets in our Solar System.
While both giant-planet formation scenarios are often viewed as contradictory, it has
also been suggested that they may in fact be complementary, with some fraction of the
giant planets formed by each of them.
1.1.3 Statistics and migration
The period distribution of the exoplanet candidates detected thus far is clearly bimodal, as
is evident from fig. 1.1.³ Its high-period peak is located at P = 625 d, corresponding to an
orbital radius of about 2.6 AU around a solar-mass star (eq. (2.63)), a position similar to
that of the suspected snow line in what preceded our Solar System (section 1.1.2). By contrast,
the higher peak is at a period of only P = 4.31 d, or a distance of 0.1 AU, corresponding to
the so-called hot Jupiters unveiled in large numbers: 41% of the putative exoplanets are
found to have semi-major axes not exceeding 0.2 AU (P ≲ 33 d around a Sun-like star),
while 50% are within 0.5 AU of their host star, equivalent to a period of P ≲ 120 d around
the Sun. A fraction of 53% of the candidates within 0.5 AU are at least half as massive as
Jupiter.
³ This graph is based on data compiled at the Extrasolar Planets Encyclopaedia (Schneider 2012). It
has been produced using kernel density estimation (section 3.2.4), providing a differentiable alternative
to the often-used box-shaped histogram.
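To illustrate how a kernel density estimate such as the one underlying fig. 1.1 yields a smooth, differentiable curve, the following minimal sketch sums one Gaussian kernel per data point. The Gaussian kernel, the bandwidth of 0.3 and the ten logarithmic periods are assumptions chosen purely for illustration; they are neither the catalogue data nor necessarily the settings used for the figure.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function x -> density estimate built from Gaussian kernels."""
    if not samples:
        raise ValueError("need at least one sample")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Each kernel is a normal density of width `bandwidth`, weighted by 1/n.
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up logarithmic periods forming two clusters (illustrative only).
log_periods = [0.45, 0.55, 0.63, 0.70, 0.80, 2.5, 2.7, 2.8, 2.9, 3.1]
kde = gaussian_kde(log_periods, bandwidth=0.3)

# Evaluate on a regular grid and integrate with the trapezoidal rule.
step = 0.01
grid = [-2.0 + step * k for k in range(801)]  # covers -2 to 6
values = [kde(x) for x in grid]
area = sum(0.5 * step * (values[k] + values[k + 1]) for k in range(800))

print(f"density near first cluster:  {kde(0.63):.3f}")
print(f"density between clusters:    {kde(1.70):.3f}")
print(f"density near second cluster: {kde(2.80):.3f}")
print(f"integral over the grid:      {area:.4f}")
```

Unlike a histogram, the result does not depend on bin edges, but it does depend on the bandwidth: larger values smooth the two peaks into one, smaller values produce spurious structure.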
While the transit-photometry technique (section 1.2) has been yielding an increasing fraction
of the candidates, amounting to about a third to date, Doppler spectroscopy still accounts
for over 50% of the detections. Since each observational technique has its specific biases,
the observed distribution of period or other parameters should be treated with some
caution. In particular, transit photometry is biased towards shorter periods because of
the correspondingly smaller orbits, half of its detections being associated with periods
shorter than 4.2 d, and none exceeding one year. Thus, the transiting planets found cannot
contribute significantly to the high-period peak in fig. 1.1. Doppler spectroscopy also has
a bias that increases monotonically towards shorter-period planets, which cause the stronger
signals in measured stellar radial velocities when conditions are otherwise unchanged.
Nevertheless, about half of its detections are associated with periods exceeding one year.
In conclusion, the clear bimodality of the period distribution can be expected to reflect, at
least to a significant degree, the physical reality of exoplanet systems.
Although other explanations have been put forward, the observed “pile-up” of hot
Jupiters at small orbital radii is often attributed to a presumed migration of the giant
planets from their birthplaces further out in the protoplanetary disk (section 1.1.2) towards
the centre. In the vicinity of their host star, they must then be stopped in time by other
mechanisms that are still poorly understood. Migration may be caused by the planet
interacting with other planets, with planetesimals and/or with the disk’s residual gas,
and various hypothetical stopping mechanisms have also been proposed. Although both
theories are still a matter of debate, they are perhaps the favoured interpretations to date.
1.2 Observational techniques
Planets outside our own Solar System are regularly discovered at the present time. This is
made possible, from an observational point of view, by a variety of different techniques
outlined below.
Direct observational methods refer to the imaging of exoplanets (e.g. Levine et al.
2009), which reflect the light of their host stars but also emit their own thermal radiation.
To overcome the major obstacle of the high brightness contrast between planet and star,
techniques such as coronagraphy (Lyot 1932; Levine et al. 2009), angular differential
imaging (ADI) (Marois et al. 2006; Vigan et al. 2010), spectral differential imaging (SDI)
(Smith 1987; Vigan et al. 2010), and polarimetric differential imaging (Kuhn et al. 2001;
Adamson et al. 2005) have been developed. Still, imaging has so far yielded only a few
detections and orbit determinations.
The most productive methods in terms of the number of detected and characterised
exoplanets are of an indirect nature, observing the effects of the planet on other objects or
their radiation.
Of these, transit photometry and spectroscopy (e.g. Charbonneau et al. 2000; Seager
2008) are noteworthy because they have helped uncover more than 200 exoplanet candidates,
plus over 2000 still unconfirmed candidates from the Kepler space mission (Koch et al.
2010): small decreases in the apparent visual brightness of a star during the primary or
secondary eclipse point to the existence of a transiting companion, whose spectrum may
additionally be inferred by subtracting the target spectra obtained during and outside an
eclipse. Such data allow one to determine the ratio of planetary to stellar radius and the
orbital inclination as well as the planet’s atmospheric composition and temperature.
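As a rough illustration of how the brightness decrease constrains the radius ratio, the sketch below uses the standard first-order relation that the fractional flux drop equals the squared ratio of planetary to stellar radius. This relation is not stated in the text above and assumes a dark planet crossing a uniformly bright stellar disk (no limb darkening); the function names and rounded radii are likewise illustrative assumptions.

```python
import math

# Rounded radii in metres (illustrative values).
R_SUN = 6.957e8
R_JUPITER = 7.1492e7
R_EARTH = 6.371e6


def transit_depth(r_planet: float, r_star: float) -> float:
    """Fractional flux decrease for a dark planet on a uniform stellar disk."""
    if r_planet < 0 or r_star <= 0:
        raise ValueError("radii must be non-negative (planet) and positive (star)")
    return (r_planet / r_star) ** 2


def radius_ratio(depth: float) -> float:
    """Invert the relation: planet-to-star radius ratio from the depth."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return math.sqrt(depth)


depth_jupiter = transit_depth(R_JUPITER, R_SUN)
depth_earth = transit_depth(R_EARTH, R_SUN)

print(f"Jupiter-size planet, Sun-like star: depth = {depth_jupiter:.4%}")
print(f"Earth-size planet, Sun-like star:   depth = {depth_earth:.4%}")
print(f"recovered radius ratio: {radius_ratio(depth_jupiter):.4f}")
```

Under these assumptions a Jupiter-size planet dims a Sun-like star by about one per cent, whereas an Earth-size planet produces a signal roughly a hundred times weaker.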
Timing methods include measurements of transit timing variations (TTV⁴) and transit
duration variations (TDV) (e.g. Holman and Murray 2005; Nascimbeni et al. 2011) of
either binaries or stars known to harbour a transiting planet. The method used in the
first exoplanet detection (Wolszczan and Frail 1992) is pulsar timing, which relies on slight
anomalies in the exact timing of the radio emission of a pulsar and is sensitive to planets
in the Earth-mass regime.
Microlensing (Mao and Paczynski 1991; Gould 2009), which has accounted for about 15
exoplanet candidates, uses the relativistic curvature of spacetime due to the masses of both
a lens star and its potential companion, with the latter causing a change in the apparent
magnification and thus the observed brightness of a background source.
Perhaps the best-known technique, and one of those on which this thesis is based,
is Doppler spectroscopy, also referred to as radial-velocity (RV) measurements (e.g. Mayor and
Queloz 1995; Lovis and Fischer 2010). With more than 400 exoplanet candidates, it has
been the most successful technique to date in detecting new exoplanets and determining their orbits.
From a set of high-resolution spectra of the target star, a time series of the line-of-sight
velocity component of the star is deduced. These data allow one to determine⁵ the orbit in
terms of its geometry and kinematics in the orbital plane as well as the minimum planet
mass $m_\mathrm{p,min} \approx m_\mathrm{p} \sin i$. To derive the actual planet mass $m_\mathrm{p}$, the inclination $i$ of the
orbital plane with respect to the sky plane needs to be determined with a different method, e.g.
astrometry. The RV technique is distance-independent in principle, but signal-to-noise
requirements do pose constraints on the maximum distance to a star. Stellar variability
sometimes makes this approach difficult because it alters the line shapes and thus mimics
RV variations. The signal in stellar RVs caused by a planet in a circular orbit has a
semi-amplitude of approximately
\[ K \approx m_\mathrm{p} \sin i \, \sqrt{\frac{G}{m_\star a_\mathrm{rel}}}, \tag{1.1} \]
where $m_\mathrm{p}$, $m_\star$, $i$, $G$, and $a_\mathrm{rel}$ are the masses of planet and host star, the orbital inclination,
Newton’s gravitational constant, and the semi-major axis of the planet’s orbit relative
to the star, respectively. This approximation holds for $m_\mathrm{p} \ll m_\star$, which is true in most
cases. It should be noted that the sensitivity of the RV method decreases towards less
inclined (more face-on) orbits, which is an example of the selection effects inherent in any
planet-detection method.
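To give a feeling for the magnitudes implied by eq. (1.1), the following sketch evaluates the circular-orbit semi-amplitude for a Jupiter analogue and an Earth analogue around a solar-mass star. The function name, the rounded physical constants and the choice of example systems are assumptions made for illustration only.

```python
import math

# Rounded constants in SI units (illustrative values).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # kg
M_JUPITER = 1.898e27 # kg
M_EARTH = 5.972e24   # kg
AU = 1.496e11        # m


def rv_semi_amplitude(m_planet, m_star, a_rel, inclination=math.pi / 2):
    """Eq. (1.1): K ~ m_p sin(i) sqrt(G / (m_star a_rel)), valid for m_p << m_star.

    Masses in kg, a_rel in m, inclination in rad; returns K in m/s.
    """
    if m_star <= 0 or a_rel <= 0:
        raise ValueError("stellar mass and semi-major axis must be positive")
    return m_planet * math.sin(inclination) * math.sqrt(G / (m_star * a_rel))


k_jupiter = rv_semi_amplitude(M_JUPITER, M_SUN, 5.204 * AU)
k_earth = rv_semi_amplitude(M_EARTH, M_SUN, 1.0 * AU)
k_jupiter_30deg = rv_semi_amplitude(M_JUPITER, M_SUN, 5.204 * AU, math.radians(30.0))
k_face_on = rv_semi_amplitude(M_JUPITER, M_SUN, 5.204 * AU, 0.0)

print(f"Jupiter analogue, edge-on: K = {k_jupiter:.2f} m/s")
print(f"Earth analogue, edge-on:   K = {k_earth:.3f} m/s")
print(f"Jupiter analogue, i = 30°: K = {k_jupiter_30deg:.2f} m/s")
print(f"Jupiter analogue, face-on: K = {k_face_on:.2f} m/s")
```

The factor sin i makes the selection effect mentioned above explicit: at an inclination of 30° the signal is halved, and for an exactly face-on orbit it vanishes.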
Finally, astrometry (AM; e.g. Gatewood et al. 1980; Sozzetti 2005; Reffert 2009), on
which this work is also based, is the oldest observational technique known in astronomy: a
stellar position is measured with reference to a two-dimensional coordinate system attached
to the sky plane. The measurements may either be absolute (wide-angle astrometry),
e.g. by using a single interferometer, or relative to physical reference stars (narrow-angle
astrometry); in the latter case, the reference may be given either by a binary companion or by
an unattached star. Alternatively, as is the case for space telescopes such as Hipparcos or
Gaia, the coordinate system may be defined globally by a grid of reference stars spanning
the whole sky. Astrometry can thus be considered complementary to Doppler spectroscopy,
which measures the kinematics perpendicular to the sky plane, i.e. along the line of sight.
⁴ For a list of abbreviations used in this work, cf. table 1.1.
⁵ Some quantities depend, however, on the knowledge of the stellar mass $m_\star$, which, using current
techniques (e.g. Torres et al. 2010), may be determined to within an uncertainty of about 6% for single
(post-)main-sequence stars above 0.6 $M_\odot$.
In contrast to Doppler spectroscopy, AM allows one to determine the orientation of the
orbital plane relative to the sky in terms of its inclination i and the position angle Ω of the
line of nodes 6 with respect to the meridian of the target. A planet in circular orbit around
its host star displaces the latter on sky with an approximate angular semi-amplitude of
\[
  \alpha \approx \frac{m_p}{m_\star} \, \frac{a_\mathrm{rel}}{d}, \tag{1.2}
\]
where d is the distance between the star and the observer. Again, this approximation holds
for mp ≪ m⋆.
Imaging astrometry, in its attempt to reach sufficient precision, still faces problems
due to various distortion effects. By contrast, interferometric astrometry has been used to
determine the orbits of previously known exoplanets, mainly with the help of space-borne
telescopes such as Hipparcos or the Hubble Space Telescope (HST), which presently still
outperform their Earth-bound competitors (e.g. McArthur et al. 2010). However, instruments
like PRIMA (Delplancke et al. 2000; Delplancke 2008; Launhardt et al. 2008) or GRAVITY
(Gillessen et al. 2010) at the ESO Very Large Telescope Interferometer promise to
advance ground-based AM further in the near future.
While planet-induced signals in AM and RVs are both approximately linear in planetary
mass mp , they differ in their dependence on the orbital semi-major axis arel (eq. (1.1)
and (1.2)). Doppler spectroscopy is more sensitive to smaller orbits (or higher orbital
frequencies, eq. (2.63)), while AM favours larger orbital separations, viz. longer periods.
Comprehensive reviews of observational methods for exoplanet detection and characterisation can be found in Deeg et al. (2007, chapter 1) and Perryman (2011).
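As a rough numerical illustration of eq. (1.1) and (1.2), the following sketch evaluates both semi-amplitudes for a Jupiter-like planet orbiting a Sun-like star at a distance of 10 pc. The function names, the rounded physical constants and the example system are assumptions made for illustration only; they are not taken from Base.

```python
import math

# Rounded physical constants in SI units (assumed values for illustration)
G = 6.674e-11        # gravitational constant [m^3 kg^-1 s^-2]
M_SUN = 1.989e30     # solar mass [kg]
M_JUP = 1.898e27     # Jupiter mass [kg]
AU = 1.496e11        # astronomical unit [m]
PC = 3.0857e16       # parsec [m]
RAD_TO_MAS = math.degrees(1.0) * 3600.0 * 1000.0  # radians -> milliarcseconds


def rv_semi_amplitude(m_p, m_star, a_rel, inc=math.pi / 2):
    """Eq. (1.1): K ~ m_p sin(i) sqrt(G / (m_star a_rel)), valid for m_p << m_star."""
    return m_p * math.sin(inc) * math.sqrt(G / (m_star * a_rel))


def astrometric_semi_amplitude(m_p, m_star, a_rel, d):
    """Eq. (1.2): alpha ~ (m_p / m_star) (a_rel / d), in radians."""
    return (m_p / m_star) * (a_rel / d)


# Hypothetical example: Jupiter analogue on an edge-on orbit, seen from 10 pc
K = rv_semi_amplitude(M_JUP, M_SUN, 5.2 * AU)
alpha = astrometric_semi_amplitude(M_JUP, M_SUN, 5.2 * AU, 10.0 * PC)
print(f"K = {K:.1f} m/s, alpha = {alpha * RAD_TO_MAS:.2f} mas")
```

For this system the RV semi-amplitude is of the order of 12 m/s and the astrometric one about 0.5 mas. Quadrupling arel halves K but quadruples α, which is the complementary sensitivity described above.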
6
The line of nodes is the intersection of the orbital plane with the sky plane.
Table 1.1: Abbreviations used in this work.
Abbreviation  Meaning
ADI           Angular differential imaging
AM            Astrometry
AMa           Astrometry (angular positions)
AMh           Astrometry (Hipparcos intermediate astrometric data)
API           Application programming interface
Base          Bayesian Astrometric and Spectroscopic Exoplanet Detection and Characterisation Tool
BVS           Bisector velocity span
CES           Coudé Echelle spectrograph
CFHT          Canada-France-Hawaii telescope
CM            Centre of mass
CRC           Cyclic redundancy check
CSV           Comma-separated values
DFT           Discrete Fourier transform
ESO           European Southern Observatory
FAP           False-alarm probability
GOMP          GNU OpenMP
GNU           GNU’s Not Unix
GRAVITY       General Relativity Analysis via VLT Interferometry
HARPS         High Accuracy Radial Velocity Planet Searcher
HIPPARCOS     High-precision parallax-collecting satellite
HPDI          Highest posterior-density interval
HST           Hubble Space Telescope
IRAS          Infrared Astronomical Satellite
JD            Julian date
LC            Long Camera
LS            Lomb-Scargle
MAP           Maximum a-posteriori
MCMC          Markov chain Monte Carlo
MH            Metropolis-Hastings
NASA          National Aeronautics and Space Administration
NLA           Numerical Lebesgue Algorithm
NPOI          Navy Prototype Optical Interferometer
OpenMP        Open Multi-Processing
PDF           Portable Document Format
PRIMA         Phase-Referenced Imaging and Microarcsecond Astrometry
PSR           Potential scale reduction
PT            Parallel tempering
RLE           Run-length encoding
RMS           Root mean square
RV            Radial velocity
SB            Spectroscopic binary
SDI           Spectral differential imaging
SIMBAD        Set of Identifications, Measurements and Bibliography for Astronomical Data
SSB           Solar system barycentre
TDV           Transit duration variations
TTV           Transit timing variations
VLC           Very Long Camera
VLT           Very Large Telescope
VLTI          Very Large Telescope Interferometer
VTA           Volume Tesselation Algorithm
Chapter 2
Physics and observables
Observable effects of Newton’s gravity
The motion of two or more bodies in a bound system is responsible for all the deterministic
observable effects that are considered in this work. Such systems are either extrasolar
planetary systems, where one or more planets and a star orbit each other, or binary systems
consisting of two orbiting stars.
In this chapter we derive models for the relevant observables (observable models) from
orbital kinematics, which are in turn caused by the dynamics of the orbiting bodies.
Observable models consist of functions f (t; θ) of the model parameters θ and time t which
return theoretical values of the observables. These can be compared to measured data by
means of the likelihood (section 3.1.1).
In our treatment of the dynamics, we consider only isolated systems, i.e. no external
forces are taken into account, and relativistic effects are neglected, implying that all orbits
can be considered closed. We begin with the simplest case of a single-planet system, where
two bodies of differing masses orbit each other. Since only the star is observable by AM
and Doppler spectroscopy, it is the stellar motion which we first describe. Then, because
the closely related planetary orbit is of prime interest, we transform the parameters into
ones characterising the planet’s orbit.
By making several straightforward transformations, the results are then carried over
to binary systems, which behave completely analogously for the purposes of this work.
Finally, neglecting gravitational interactions between more than two bodies, observable
models are derived for multi-planet systems.
An overview of the model parameters used in this work is given in table 2.1, while
table 2.2 lists quantities that can be derived from them. For an in-depth treatment of
celestial mechanics, the interested reader is referred, e.g., to Moulton (1984).
2.1
Dynamics and kinematics of planetary systems
In the following, we examine the case of an isolated non-relativistic two-body system of
star and planet. The stellar motion around the two-body centre of mass (CM) is governed
by Newton’s Law of Gravity,
\[
  \ddot{\boldsymbol{r}} = -G\mu \, \frac{\boldsymbol{r}}{|\boldsymbol{r}|^3}, \tag{2.1}
\]
where
\[
  \mu = \frac{m_p^3}{(m_\star + m_p)^2} \tag{2.2}
\]
is the mass function, r is the position vector of the star with respect to the CM, m⋆ is the
stellar mass, and mp is the planetary mass. We assume that the CM is unaccelerated, viz.
no external force acts upon the system, which implies that the reference frame is inertial.
2.1.1
Stellar motion in the orbital plane
The general solution corresponds to a motion in a fixed plane – the orbital plane – and
can be expressed in a polar coordinate system (r, ν), whose pole coincides with the CM
and whose fixed direction is that from the CM to the periapsis.1 Stumpff (1973) found the
solution for the radial coordinate to be
\[
  r = a_\star \, \frac{1 - e^2}{1 + e \cos\nu} = r(\nu; a_\star, e), \tag{2.3}
\]
where a? and e are the semi-major axis and orbital eccentricity, respectively. The angular
coordinate ν ∈ [0, 2π) is known as the true anomaly. This equation describes the stationary
elliptical Keplerian orbit of the star with one focus of the ellipse coinciding with the CM.
Since eq. (2.3) implies that r varies in the range [a? (1 − e), a? (1 + e)] over the course of
an orbital revolution, we can define
\[
  r \equiv a_\star (1 - e \cos E) = r(E; a_\star, e), \tag{2.4}
\]
where E is called the eccentric anomaly. It follows from eq. (2.3) and (2.4) and a
trigonometric half-angle formula that the transformation between true and eccentric
anomaly is given by
\[
  \cos\nu = \frac{\cos E - e}{1 - e \cos E}. \tag{2.5}
\]
The time dependence of E is given implicitly by Kepler’s equation,
\[
  E - e \sin E = 2\pi f (t - T) = M(t), \tag{2.6}
\]
which implies
\[
  \dot{E} = \frac{2\pi f}{1 - e \cos E}, \tag{2.7}
\]
where f = P −1 is the orbital frequency, P is the orbital period, T is the last time the
periapsis was passed before the first measurement (known as the time of periapsis) and
M (·) is the mean anomaly, which varies uniformly over the course of an orbit. Kepler’s
equation is transcendental and can be solved numerically to obtain E for every relevant
combination of e and M .2
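The numerical solution of Kepler’s equation mentioned above can be sketched with Newton’s method. This is a minimal illustration under stated assumptions, not the solver used in Base: the function name, the starting value E = π, the tolerance and the iteration limit are all choices made here.

```python
import math


def solve_kepler(M, e, tol=1e-12, max_iter=100):
    """Solve Kepler's equation E - e sin E = M for E (0 <= e < 1) by Newton's method."""
    M = M % (2.0 * math.pi)          # M and M + 2 pi are equivalent
    E = math.pi                      # conservative starting value (assumption)
    for _ in range(max_iter):
        # Newton step: residual of Kepler's equation divided by its derivative
        dE = (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
        E -= dE
        if abs(dE) < tol:
            break
    return E


# Hypothetical example: mean anomaly of 1 rad, eccentricity 0.6
E = solve_kepler(1.0, 0.6)
print(E, E - 0.6 * math.sin(E))      # the second value reproduces M up to the tolerance
```

The derivative in the Newton step, 1 − e cos E, is the same factor that appears in the denominator of eq. (2.7).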
In the components of the position vector (r, ν)ᵀ, the eccentric anomaly E appears only as
an argument to the cos(·) function (eq. (2.4) and (2.5)). Hence E is equivalent to E + 2π
and, according to eq. (2.6), M is equivalent to M + 2π. Consequently, both anomalies E
and M can be taken modulo 2π by redefining Kepler’s equation as
\[
  E - e \sin E = 2\pi \cdot \operatorname{mod}(f (t - T), 1) = M(t), \tag{2.8}
\]
where
\[
  \operatorname{mod}(x, y) \equiv x - \left\lfloor \frac{x}{y} \right\rfloor y, \qquad x, y \in \mathbb{R}, \tag{2.9}
\]
defines the modulo function, which satisfies for z ∈ R
\[
  \operatorname{mod}(zx, zy) = z \operatorname{mod}(x, y). \tag{2.10}
\]
1
Here, the periapsis refers to the stellar position closest to the CM.
2
Alternatively, it is possible to solve Kepler’s equation by integrating eq. (2.7) from the initial condition E(t = T) = 0.
Equation (2.8) implies that E = E(t; f, T ) is a periodic function of t with period f −1 ,
given f and T .
The time of periapsis T , due to its definition, lies within a range which depends on
orbital frequency f . However, T can be transformed into another parameter whose range
is simpler to determine. We follow an approach similar to that of Gregory (2005a) by using the
alternative parameter χ, defined by
\[
  \chi \equiv \frac{M(t_r)}{2\pi} = f (t_r - T), \tag{2.11}
\]
where tr is a reference time for the parameters known as the epoch (section 2.2). Thus,
Kepler’s equation becomes
\[
  E - e \sin E = 2\pi \cdot \operatorname{mod}(\chi + f (t - t_r), 1) = M(t), \tag{2.12}
\]
where the mean anomaly M (·) varies uniformly over the course of an orbit. By reference
to eq. (2.4) and (2.5), it is readily shown that the stellar coordinates are periodic functions
of χ with period 1. χ is therefore called a cyclic parameter and treated as lying within the
range [0, 1).
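The role of χ as a cyclic parameter can be made concrete with a short sketch of eq. (2.11) and (2.12). The function names and the default epoch t_r = 0 are assumptions for illustration.

```python
import math


def mean_anomaly(t, f, chi, t_r=0.0):
    """Eq. (2.12): M(t) = 2 pi mod(chi + f (t - t_r), 1), which lies in [0, 2 pi)."""
    return 2.0 * math.pi * ((chi + f * (t - t_r)) % 1.0)


def time_of_periapsis(f, chi, t_r=0.0):
    """Invert eq. (2.11), chi = f (t_r - T), for a time of periapsis T."""
    return t_r - chi / f


# Hypothetical example: orbital frequency 0.25 per time unit, chi = 0.3
print(mean_anomaly(0.0, 0.25, 0.3))      # equals 2 pi chi at the epoch
print(time_of_periapsis(0.25, 0.3))      # a periapsis passage before the epoch
```

Shifting χ by any integer leaves M(t) unchanged, which is why its prior range can be fixed to [0, 1) independently of f.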
To express the stellar position in Cartesian coordinates, we set up a coordinate system S1
such that its origin is identical to the CM, its z-axis is perpendicular to the orbital plane
with its positive direction chosen such that ν̇ > 0 for the orbital motion, and the
vector from the CM to the periapsis points in the positive x-direction. In S1, the stellar
barycentric position is given by
\[
  \boldsymbol{r}_1 = \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix}
  = r(\nu; a_\star, e) \begin{pmatrix} \cos\nu \\ \sin\nu \\ 0 \end{pmatrix} \tag{2.13}
\]
and, using eq. (2.4) and (2.5),
\[
  \boldsymbol{r}_1 = a_\star \begin{pmatrix} \cos E - e \\ \sqrt{1 - e^2} \, \sin E \\ 0 \end{pmatrix}
  = \boldsymbol{r}_1(E; a_\star, e), \tag{2.14}
\]
where a? is the semi-major axis, e is the eccentricity, and E is the eccentric anomaly.
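The equivalence of the polar form (2.13) and the eccentric-anomaly form (2.14) can be checked numerically with the following sketch. The function names are hypothetical, and the use of atan2 to obtain ν in the correct quadrant is a choice made here, since eq. (2.5) alone fixes only cos ν.

```python
import math


def position_s1(E, a_star, e):
    """Eq. (2.14): stellar barycentric position in S1 from the eccentric anomaly."""
    return (a_star * (math.cos(E) - e),
            a_star * math.sqrt(1.0 - e * e) * math.sin(E),
            0.0)


def true_anomaly(E, e):
    """True anomaly in [0, 2 pi); atan2 resolves the quadrant left open by eq. (2.5)."""
    nu = math.atan2(math.sqrt(1.0 - e * e) * math.sin(E), math.cos(E) - e)
    return nu % (2.0 * math.pi)


# Hypothetical example: a_star = 2 (arbitrary length unit), e = 0.4, E = 2 rad
x, y, z = position_s1(2.0, 2.0, 0.4)
nu = true_anomaly(2.0, 0.4)
r_polar = 2.0 * (1.0 - 0.4 ** 2) / (1.0 + 0.4 * math.cos(nu))   # eq. (2.3)
print(math.hypot(x, y), r_polar)                                 # both give r
```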
2.1.2
Transformation to reference system
To derive a model for the stellar barycentric position or velocity, respectively, we transform
S1 to a new coordinate system S4 by three successive rotations. These are described by
Euler angles, termed in our case argument of the periapsis ω? , inclination i, and position
angle of the ascending3 (or first4 ) node Ω, and are carried out as follows (fig. 2.1):
3
The ascending node is the point of intersection of the orbit and the sky plane where the moving object
passes away from the observer.
4
Without RV data, it cannot be determined whether a given node is ascending or descending; then, Ω is
defined to be the position angle of the first node.
Figure 2.1: Definition of the angles ω? , i, Ω. a) From S1 to S2 , the star and its sense of
rotation about the CM are indicated; the dotted line marks the major axis of the orbital
ellipse. b) From S2 to S3 , the observer and line of sight are indicated. c) From S3 to S4 ,
the positive x4 -axis points northward along the meridian of the CM.
1. Rotate S1 about its z1 -axis by (−ω? ) such that the ascending node of the stellar orbit
lies on the positive x2 -axis.
2. Rotate S2 about its x2 -axis by (+i) such that the new z3 -axis passes through the
observer.5
3. Rotate S3 about its z3 -axis by (−Ω) such that the new x4 -axis is parallel to the
meridian of the CM and points in a northern direction.
Except for the inclination, the signs of these rotation angles are chosen such that the
inverse rotations, leading from the reference system S4 to the stellar orbit, have positive
angles.
Thus, the stellar barycentric position has new coordinates
\[
  \boldsymbol{r}_4 = R_{zxz} \, \boldsymbol{r}_1, \tag{2.15}
\]
with the passive rotation matrix
\[
  R_{zxz} \equiv \begin{pmatrix} A & F & J \\ B & G & K \\ C & H & L \end{pmatrix} \tag{2.16}
\]
defining the above rotations; its components are
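\begin{align}
  A &= \cos\Omega \cos\omega_\star - \sin\Omega \cos i \sin\omega_\star \tag{2.17} \\
  B &= \sin\Omega \cos\omega_\star + \cos\Omega \cos i \sin\omega_\star \tag{2.18} \\
  F &= -\cos\Omega \sin\omega_\star - \sin\Omega \cos i \cos\omega_\star \tag{2.19} \\
  G &= -\sin\Omega \sin\omega_\star + \cos\Omega \cos i \cos\omega_\star \tag{2.20} \\
  J &= -\sin\Omega \sin i \tag{2.21} \\
  K &= \cos\Omega \sin i \tag{2.22} \\
  C &= -\sin i \sin\omega_\star \tag{2.23} \\
  H &= -\sin i \cos\omega_\star \tag{2.24} \\
  L &= \cos i. \tag{2.25}
\end{align}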
A, B, F, and G are known as the Thiele-Innes constants, first introduced by Thiele (1883).
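The components (2.17) to (2.25) can be assembled and checked numerically as sketched below. The helper names are hypothetical. The comparison with a product of elementary active rotation matrices, Rz(Ω) Rx(−i) Rz(ω⋆), is added here as a consistency check under the sign conventions of the three rotations listed above; it is not a statement taken from the text.

```python
import math


def rotation_matrix(omega, inc, Omega):
    """R_zxz of eq. (2.16) with components from eq. (2.17)-(2.25)."""
    so, co = math.sin(omega), math.cos(omega)
    si, ci = math.sin(inc), math.cos(inc)
    sO, cO = math.sin(Omega), math.cos(Omega)
    A = cO * co - sO * ci * so
    B = sO * co + cO * ci * so
    F = -cO * so - sO * ci * co
    G = -sO * so + cO * ci * co
    J, K = -sO * si, cO * si
    C, H, L = -si * so, -si * co, ci
    return [[A, F, J], [B, G, K], [C, H, L]]


def matmul(P, Q):
    """Product of two 3x3 matrices given as nested sequences."""
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rot_z(angle):
    """Elementary active rotation about the z-axis."""
    s, c = math.sin(angle), math.cos(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(angle):
    """Elementary active rotation about the x-axis."""
    s, c = math.sin(angle), math.cos(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


# Hypothetical angles in radians
R = rotation_matrix(0.7, 1.1, 2.3)
R_ref = matmul(rot_z(2.3), matmul(rot_x(-1.1), rot_z(0.7)))
print(max(abs(R[i][j] - R_ref[i][j]) for i in range(3) for j in range(3)))
```

Since R is a product of rotations, it is orthogonal, so the inverse transformation from S4 back to S1 is simply its transpose.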
By taking the time derivative of eq. (2.15), we obtain the stellar velocity in S4 ,
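\[
  \boldsymbol{v}_4 = \dot{\boldsymbol{r}}_4 = R_{zxz} \, \dot{\boldsymbol{r}}_1
  = \dot{E} \, R_{zxz} \, \frac{\mathrm{d}\boldsymbol{r}_1}{\mathrm{d}E}, \tag{2.26}
\]
where
\[
  \frac{\mathrm{d}\boldsymbol{r}_1}{\mathrm{d}E}
  = a_\star \begin{pmatrix} -\sin E \\ \sqrt{1 - e^2} \, \cos E \\ 0 \end{pmatrix} \tag{2.27}
\]
and Ė is given by eq. (2.7).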
Figure 2.2: The orbits of star S and planet P around the centre of mass C. All three points
lie on a common line and the ratio of the lengths of the segments |CS| : |CP| is equal to
the mass ratio mp : m? .
2.1.3
Relation to the planetary orbit
By reference to the above results, the observables of AM and RV are easily derived
(section 2.2). They can be parameterised by quantities pertaining to the planetary instead
of the stellar orbit based on the following simple relation.
According to the definition of the CM, the line connecting star and planet contains the
CM and the ratio of their respective distances from the CM equals the inverse mass ratio,
\[
  \overrightarrow{CS} = -\frac{m_p}{m_\star} \, \overrightarrow{CP}, \tag{2.28}
\]
where C, S and P stand for CM, star and planet, respectively. This implies invariable
relationships between the orbits of the star and the planet as follows. The two bodies orbit
their CM with a common orbital frequency f and time of periapsis T . With respect to the
corresponding periapsis, they always have the same eccentric anomaly E. Their orbital
shapes, viz. eccentricities e, are identical as well, and the orbital semi-major axes relate to
each other as
\[
  a_\star = \frac{m_p}{m_\star} \, a_p. \tag{2.29}
\]
Additionally, the two bodies share the same sense of orbital revolution, hence those
nodes of both orbits which lie on the positive x2 -axis are ascending. Consequently, the only
Euler angle not shared by the stellar and planetary orbits is the argument of periapsis,
which differs by π because star and planet are in opposite directions from the CM.
2.1.4
Multiple planets
In the presence of multiple planets, an n-body problem with n > 2 arises, for which no
exact and general solution is known. Instead, several approaches exist which either give an
exact solution for special cases or an approximate (general) solution, e.g. by numerical
integration or in the form of a truncated Taylor series. Numerical integration can be
an important tool especially in systems with interaction between the planets, leading to
possibly unstable systems where bodies may collide or be ejected from the system.
By contrast, it is assumed in this work that any interactions between the planets can be
neglected, implying that their orbits around the star can be treated separately, as derived
in the following. Furthermore, only the case where these orbits are coplanar, i.e. all bodies
move in the same orbital plane, is considered.
5
A positive rotation angle is used here to ensure that the node is indeed ascending (fig. 2.1 b).
For any number np of planets, but particularly when np > 1, the position of the
common CM of all bodies is given by
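\begin{align}
  \overrightarrow{OC}
  &= \frac{m_\star \overrightarrow{OS} + \sum_{j=1}^{n_p} m_{p,j} \overrightarrow{OP_j}}{m_\mathrm{tot}} \tag{2.30} \\
  &= \frac{m_\star \overrightarrow{OS} + \sum_j m_{p,j} \left( \overrightarrow{OS} + \overrightarrow{SP_j} \right)}{m_\mathrm{tot}} \tag{2.31} \\
  &= \overrightarrow{OS} + \frac{\sum_j m_{p,j} \overrightarrow{SP_j}}{m_\mathrm{tot}}, \tag{2.32}
\end{align}
from which the barycentric position of the star is immediately obtained as
\[
  \overrightarrow{CS} = \sum_j \frac{m_{p,j}}{m_\mathrm{tot}} \, \overrightarrow{P_j S}. \tag{2.33}
\]
In the above equations, O is the coordinate origin, Pj is the position of the jth planet and
\( m_\mathrm{tot} \equiv m_\star + \sum_{j=1}^{n_p} m_{p,j} \).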
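Assuming that the total planetary mass in the system is much lower than the stellar
mass,
\[
  \sum_{j=1}^{n_p} m_{p,j} \ll m_\star, \tag{2.34}
\]
eq. (2.33) may be approximated as
\[
  \overrightarrow{CS} \approx \sum_{j=1}^{n_p} \frac{m_{p,j}}{m_\star + m_{p,j}} \, \overrightarrow{P_j S} \tag{2.35}
\]
because
\[
  m_\mathrm{tot} \approx m_\star + m_{p,j} \qquad \forall j \in \{1, \ldots, n_p\}. \tag{2.36}
\]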
We furthermore define the two-body barycentre Cj of star and jth planet by specialising
eq. (2.30) to np = 1 and thereby obtain
\[
  \overrightarrow{CS} \approx \sum_{j=1}^{n_p} \overrightarrow{C_j S}. \tag{2.37}
\]
Due to eq. (2.37), the stellar barycentric position r1 in S1 can be approximated by the
sum of the stellar positions \( \boldsymbol{r}_1^{(j)} \) induced by the individual planets j, viz.
\[
  \boldsymbol{r}_1 \approx \sum_{j=1}^{n_p} \boldsymbol{r}_1^{(j)}. \tag{2.38}
\]
As all the bodies orbit in one common plane, the second and third rotations in
section 2.1.2 are defined by the same inclination i and position angle of the ascending
node Ω for all planets and only the arguments of periapsis ωj differ. Thus, eq. (2.15) gives
the position of the star in S4 as
\[
  \boldsymbol{r}_4 = \sum_{j=1}^{n_p} R_{zxz}(\omega_j, i, \Omega) \, \boldsymbol{r}_1^{(j)}, \tag{2.39}
\]
with Rzxz as defined by eq. (2.16).
2.2
Observable models
In the following, the two types of astrometric data treated in this work are abbreviated
respectively as AMa (angular positions) and AMh (Hipparcos intermediate data), while
AM denotes astrometric data in general.
To express the stellar barycentric position as a two-dimensional angular position, we
perform a final transformation of S4 into a spherical coordinate system S5 with radial,
elevation and azimuthal coordinates (r, δ, α)ᵀ. Its origin is identical with the observer, its
reference plane coincides with the (y4, z4)-plane and its fixed direction is −z4, pointing
from the observer to the CM. In S5, the radial coordinate of the CM equals a distance d,
which relates to the parallax ϖ as
\[
  d = \frac{1\,\mathrm{AU}}{\varpi}. \tag{2.40}
\]
The distance is assumed constant in the following and the radial coordinate in S5 omitted.
In the new system, the two-dimensional angular barycentric position of the star is obtained
as
\begin{align}
  \boldsymbol{r}_5 \equiv \begin{pmatrix} \delta_5 \\ \alpha_5 \cos\delta_5 \end{pmatrix}
  &= \frac{1}{d} \begin{pmatrix} x_4 \\ y_4 \end{pmatrix} \tag{2.41} \\
  &= a' \left( (\cos E - e) \begin{pmatrix} A \\ B \end{pmatrix}
     + \sqrt{1 - e^2} \, \sin E \begin{pmatrix} F \\ G \end{pmatrix} \right) \tag{2.42}
\end{align}
with
\[
  a' \equiv \varpi a_\star \cdot 1\,\mathrm{AU}^{-1}. \tag{2.43}
\]
Here, the first coordinate r ≡ d of S5 has been omitted, δ is called the declination and α
is called the right ascension. The factor cos δ in eq. (2.41) stems from the fact that the
line element in spherical coordinates, for r = const, is dr = r(eδ dδ + eα cos δ dα), where
eδ and eα are the unit vectors in the local directions of increasing δ and α, respectively.
The coordinate α cos δ is often abbreviated α∗.
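A minimal sketch of eq. (2.42) and (2.43) follows; it evaluates the angular position of the star from the eccentric anomaly and the Thiele-Innes constants. The function names are hypothetical, and the angular semi-major axis a′ is passed in whatever angular unit the caller prefers (e.g. mas), which is an assumption of this sketch.

```python
import math


def thiele_innes(omega, inc, Omega):
    """Thiele-Innes constants A, B, F, G of eq. (2.17)-(2.20)."""
    so, co = math.sin(omega), math.cos(omega)
    sO, cO = math.sin(Omega), math.cos(Omega)
    ci = math.cos(inc)
    A = cO * co - sO * ci * so
    B = sO * co + cO * ci * so
    F = -cO * so - sO * ci * co
    G = -sO * so + cO * ci * co
    return A, B, F, G


def angular_position(E, e, a_ang, omega, inc, Omega):
    """Eq. (2.42): (delta_5, alpha_5 cos delta_5) in the units of a_ang."""
    A, B, F, G = thiele_innes(omega, inc, Omega)
    X = math.cos(E) - e
    Y = math.sqrt(1.0 - e * e) * math.sin(E)
    return (a_ang * (A * X + F * Y), a_ang * (B * X + G * Y))


def angular_semi_major_axis(parallax, a_star_au):
    """Eq. (2.43): a' = parallax * a_star / (1 AU), in the units of the parallax."""
    return parallax * a_star_au


# Hypothetical example: parallax 100 mas, a_star = 0.005 AU, face-on circular orbit
a_ang = angular_semi_major_axis(100.0, 0.005)
print(angular_position(1.0, 0.0, a_ang, 0.3, 0.0, 1.1))
```

For a face-on circular orbit the star moves on a circle of angular radius a′, whereas for an edge-on orbit with Ω = 0 the motion is confined to the declination coordinate.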
In binary systems, observations can use the binary companion as point of reference
and coordinate origin and may therefore be described by eq. (2.42) with only minor
modifications (section 2.2.5). In particular, because δ ≪ 1 rad, the factor cos δ ≈ 1 can be
omitted and the second coordinate is simply α.
By contrast, single-star planetary systems in general lack an observable object near the CM.
Therefore, one or more physically unattached stars or a set of stars need to be
referred to when observing systems of the latter type. In the following, we derive the stellar
coordinates in an independent coordinate system S6, assumed to be fixed to an inertial frame
like the previous coordinate systems S1...5. We first assume that the coordinates (δ, α∗)ᵀ in
S6 of the fixed direction of S5 change linearly in time, with the rate of change being
\[
  \boldsymbol{\mu} \equiv \begin{pmatrix} \mu_\delta \\ \mu_{\alpha_*} \end{pmatrix} \tag{2.44}
\]
and the values at a given reference time or epoch tr being
\[
  \boldsymbol{r}_r \equiv \begin{pmatrix} \delta_r \\ \alpha_r \cos\delta_r \end{pmatrix}. \tag{2.45}
\]
Thus, the angular position of the star in S6 results as
\[
  \boldsymbol{r}_6 \equiv \begin{pmatrix} \delta_6 \\ \alpha_6 \cos\delta_6 \end{pmatrix}
  = \boldsymbol{r}_5 + \boldsymbol{r}_r + (t - t_r) \boldsymbol{\mu}. \tag{2.46}
\]
In general, the epoch tr is the instant at which such parameters as δr or αr are defined
– parameters that represent the values of time-variable quantities. In particular, if we
momentarily ignore the orbital motion by setting δ5 ≡ α5 ≡ 0, the position r 6 at t = tr
is identical to r r . In this work, all parameters referring to the following aspects of the
observable models are assumed to be constant and therefore not associated with an epoch:
• motion in the orbital plane and orbital shape
• coordinate transformations to systems S2...5
• distance of CM
• rates of linear changes of astrometric position and radial velocity.
2.2.1
Additional observable effects
In the following, we list several physical effects which may be important when analysing
the types of data considered in this work. Most of them only affect the analysis
of AM data; those that also concern RV data are explicitly pointed out.
• Light deflection, a relativistic effect where the masses of objects modify the apparent
position of a target, is not considered in this work.
• The following classical effects are related to the finite speed of light and can be
treated in combination as planetary aberration (Green 1985):
◦ Aberration designates the change of the apparent target position as seen from
Earth due to Earth’s orbital velocity of about 10⁻⁴ c, where c is the speed of
light.
◦ Light-time correction takes into account the target’s motion during the time
taken for light to reach the observer and affects both AM and RV data.
In this work, planetary aberration is neglected because aberration is irrelevant for
AMa data of binaries and already removed from AMh data (van Leeuwen 2007,
chapter 2.5.3), and because light-time correction cannot be carried out reliably due
to the uncertainty in the target’s distance.
• Annual parallax is the shift in the apparent position of the target due to Earth’s
orbital motion around the SSB. For wide-angle or global AM, this effect needs to be
accounted for according to the absolute parallaxes ϖ of the relevant bodies, and it
is therefore included in the Hipparcos model. By contrast, relative parallaxes of
the observed bodies may be referred to in narrow-angle AM. For AMa measurements
of the relative angular positions of binary components situated (approximately) at
the same distance from the observer, annual parallax can be neglected (see e.g.
section 5.2.3). In RV data, a correction for the effect of Earth’s motion has usually
already been applied (section 2.2.3).
• Perspective secular changes in certain parameters pertaining to AM and RV data are
due to the relative motion of the target with respect to the SSB and, correspondingly,
a change in direction towards the target. For a linear motion, the temporal changes
in parallax ϖ, proper motion µ and radial velocity of the CM v are given by (van de
Kamp 1977):
\begin{align}
  \dot{\varpi} &= -v \varpi^2 \cdot 1\,\mathrm{AU}^{-1} \tag{2.47} \\
  \dot{\mu} &= -2 v \mu \varpi \cdot 1\,\mathrm{AU}^{-1} \tag{2.48} \\
  \dot{v} &= \mu^2 \varpi^{-1} \cdot 1\,\mathrm{AU}, \tag{2.49}
\end{align}
where both \( \dot{\varpi} \) and \( \dot{\mu} \) change their signs from positive to negative at perihelion, whereas
\( \dot{v} > 0 \) always. Assuming values of ϖ = 0.02″, µ = 0.1″ yr⁻¹, and v = 20 km s⁻¹
– which are near the median values given in the Hipparcos catalogue (Perryman
et al. 1997) for stars with ϖ ≥ 0.01″ and the median of the absolute value of radial
velocities given in Chubak et al. (2012, table 3), respectively – the resulting rates of
change are
\begin{align}
  \dot{\varpi} &= -0.0082\ \mu\mathrm{as\,yr^{-1}} \tag{2.50} \\
  \dot{\mu} &= -0.082\ \mu\mathrm{as\,yr^{-2}} \tag{2.51} \\
  \dot{v} &= +0.011\ \mathrm{m\,s^{-1}\,yr^{-1}}. \tag{2.52}
\end{align}
Expressed as relative rates of change, this gives
\begin{align}
  |\dot{\varpi} \, \varpi^{-1}| &= 4.09 \times 10^{-7}\ \mathrm{yr^{-1}} \tag{2.53} \\
  |\dot{\mu} \, \mu^{-1}| &= 8.18 \times 10^{-7}\ \mathrm{yr^{-1}} \tag{2.54} \\
  |\dot{v} \, v^{-1}| &= 5.75 \times 10^{-7}\ \mathrm{yr^{-1}}. \tag{2.55}
\end{align}
Due to their typically small values, perspective changes are neglected in this thesis.
Nevertheless, a linear acceleration av of radial velocity may be allowed (section 2.2.3),
which may include the influence of an outer, unmodelled planet and/or perspective
acceleration.
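The rates quoted for the perspective changes can be reproduced from eq. (2.47) to (2.49) with a few lines of arithmetic. The sketch below assumes a Julian year of 365.25 d and the IAU value of the astronomical unit; the function name and unit choices are illustrative.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)   # one arcsecond in radians
AU_M = 1.495978707e11                 # astronomical unit [m]
YEAR_S = 365.25 * 86400.0             # Julian year [s] (assumption)


def perspective_rates(parallax_as, mu_as_yr, v_ms):
    """Eq. (2.47)-(2.49); returns rates in (muas/yr, muas/yr^2, m/s/yr)."""
    plx = parallax_as * ARCSEC                 # parallax [rad]
    mu = mu_as_yr * ARCSEC                     # proper motion [rad/yr]
    v = v_ms * YEAR_S / AU_M                   # radial velocity [AU/yr]
    plx_dot = -v * plx ** 2                    # [rad/yr]
    mu_dot = -2.0 * v * mu * plx               # [rad/yr^2]
    v_dot = mu ** 2 / plx                      # [AU/yr^2]
    to_muas = 1e6 / ARCSEC
    return plx_dot * to_muas, mu_dot * to_muas, v_dot * AU_M / YEAR_S


# Values assumed in the text: 0.02 arcsec, 0.1 arcsec/yr, 20 km/s
plx_dot, mu_dot, v_dot = perspective_rates(0.02, 0.1, 20e3)
print(plx_dot, mu_dot, v_dot)
print(abs(plx_dot) * 1e-6 / 0.02, abs(mu_dot) * 1e-6 / 0.1, v_dot / 20e3)
```

The second line of output gives the relative rates per year, all of which are below 10⁻⁶ yr⁻¹ for these input values.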
2.2.2
Hipparcos intermediate astrometric data
Instruments aboard the Hipparcos satellite measured one-dimensional positions of 118,300
stars along great circles on the sky. Observations were taken simultaneously in two fields
of view, separated by a basic angle of 58°. The satellite was spinning with a frequency of
11.25 d⁻¹, maintaining a fixed solar aspect angle ξ = 43° between the spin axis and the
direction of the Sun in order to improve thermal stability; its spin axis was made to precess
around the direction of the Sun. Due to the motion of the satellite, the stellar images
moved over a detector and their transit times, the basic data of the mission, were measured.
These were then translated into positions, or abscissae, along well-defined reference great
circles on the sky. In this “great-circle reduction”, each abscissa was based on up to five
successive rotations of the satellite performed during a time span of ca. 10.7 h, i.e. one
orbital revolution. The great-circle reduction included reconstruction of the satellite’s
along-scan attitude, or scan phase. Afterwards, all abscissae were combined in one common
solution (“sphere reconstruction”) and the five AM parameters δ, α, µδ, µα∗, ϖ and their
uncertainties were estimated for each observed star (“astrometric parameter solution”).
This led to the original Hipparcos catalogue published by Perryman et al. (1997).
In the new reduction by van Leeuwen (2007), the basic data were combined in a different
way, including the following aspects:
• The processes involved in the great-circle reduction were decoupled from each other;
• because the published AM parameters could be used as reference values, their errors
could be assumed uncorrelated and following a distribution with zero mean and finite
variance, which was not previously possible;
• drifts of the basic angle were identified and corrected in 0.7% of the satellite’s orbits;
• in reconstructing the scan phase of the satellite, a different model was used and
different weights were assigned to the two fields of view;
• discontinuities in the scan phase and attitude of the satellite were identified and
corrected.
While the original procedure involved projecting all the transits of an orbit onto one
common reference great circle, the new reduction combined only the data of one field
transit each, resulting in up to five abscissae per orbit, whose accuracy exceeded that of
their predecessors. Several relevant angles are defined in fig. 2.3.
The abscissa residuals are determined as follows. The a-priori reference position r r of
the target is converted to a reference abscissa ah,r by projecting it onto the scan circle.
Then, the difference ∆ah ≡ ah − ah,r between the measured and the reference abscissae
gives the abscissa residual.
Following van Leeuwen (2007), the abscissa residuals can be modelled as:
\[
  \Delta a_h = \sin\psi \, (\delta(t) - \delta_h(t)) + \cos\psi \, (\alpha_*(t) - \alpha_{*,h}(t)) + \gamma (\varpi - \varpi_h), \tag{2.56}
\]
with model position (δ(t), α∗(t))ᵀ ≡ r6 (eq. (2.46)) and reference positions
\begin{align}
  \delta_h(t) &= \delta_h + \mu_{\delta,h} (t - t_h) \tag{2.57} \\
  \alpha_{*,h}(t) &= \alpha_{*,h} + \mu_{\alpha_*,h} (t - t_h). \tag{2.58}
\end{align}
Here, ψ is the scan-orientation angle, δh, µδ,h, α∗,h, µα∗,h, ϖh are AM reference parameters
(defined with respect to the Hipparcos mean and reference epoch th = J1991.25), ϖ is the
parallax, and γ is the time-dependent parallax factor, which accounts for the annual
parallax. The values of the reference parameters and the parallax factors for all data are
part of the intermediate-data files in van Leeuwen (2007). Thus, the combination of eq.
(2.46), (2.56), (2.57) and (2.58) yields the model equation for AMh data, ∆ah = ∆ah(t; θ).
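The structure of the AMh model can be sketched as a single function implementing eq. (2.56) to (2.58). The function name, the dictionary keys and the convention that all angles share one unit (e.g. mas) with times in years are assumptions of this illustration; they do not reflect the layout of the intermediate-data files.

```python
import math


def abscissa_residual(t, psi, gamma, model_pos, parallax, ref, t_h=1991.25):
    """Eq. (2.56): modelled Hipparcos abscissa residual for one field transit.

    model_pos -- model position (delta, alpha_star) at time t, i.e. r_6 of eq. (2.46)
    ref       -- reference parameters; the key names used here are hypothetical
    """
    delta_h = ref["delta"] + ref["mu_delta"] * (t - t_h)              # eq. (2.57)
    alpha_h = ref["alpha_star"] + ref["mu_alpha_star"] * (t - t_h)    # eq. (2.58)
    delta, alpha_star = model_pos
    return (math.sin(psi) * (delta - delta_h)
            + math.cos(psi) * (alpha_star - alpha_h)
            + gamma * (parallax - ref["parallax"]))


# Hypothetical reference solution in mas and mas/yr
ref = {"delta": 0.0, "alpha_star": 0.0, "mu_delta": 10.0,
       "mu_alpha_star": -5.0, "parallax": 20.0}
# A model that differs from the reference only by 1 mas in parallax
print(abscissa_residual(1992.25, 0.4, 0.6, (10.0, -5.0), 21.0, ref))
```

In this example the model position coincides with the reference position at t = 1992.25, so the residual reduces to the parallax term γ(ϖ − ϖh).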
Figure 2.3: Definition of angles for the Hipparcos instrument. Shown are the position
of the Sun ☉, the celestial equator E, the target star ⋆, the instantaneous spin axis s
and, perpendicular to it, the scan circle S; labelled angles are the solar aspect angle ξ, the
scan-orientation angle ψ and the abscissa ah.
2.2.3
Radial velocities
In contrast to AM, RV data are usually automatically transformed into an inertial frame
resting with respect to the SSB (e.g. Lindegren and Dravins 2003), which allows the Earth’s
motion to be neglected in this model and the observer’s rest frame to be treated as inertial. The
model function for the stellar radial velocity measured by an observer is thus given by
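\[
  v(t; \theta) = -(\boldsymbol{v}_4)_z + V + (t - t_r) a_v, \tag{2.59}
\]
with
\[
  (\boldsymbol{v}_4)_z = (\dot{\boldsymbol{r}}_4)_z
  = -\sum_{j=1}^{n_p} K_j \,
    \frac{\sin E_j \sin\omega_j - \sqrt{1 - e_j^2} \, \cos E_j \cos\omega_j}{1 - e_j \cos E_j}, \tag{2.60}
\]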
where V is the reference RV6, av is a constant RV acceleration which may account for
the linear component of a potential perspective secular change of radial velocity (section 2.2.1)
and/or unmodelled long-period, outer planets, j ∈ [1, np] is the planet’s index – omitted
in the following for brevity – and np is the number of planets, which are assumed to be
non-interacting (section 2.1.4). Further, the RV semi-amplitude K can be expressed as
\begin{align}
  K &= 2\pi f a_\star \sin i \tag{2.61} \\
    &= \sqrt[3]{2\pi G f} \, \frac{m_p \sin i}{(m_p + m_\star)^{2/3}}. \tag{2.62}
\end{align}
The last equality holds because of Kepler’s third law,
\[
  (a_p + a_\star)^3 f^2 = \frac{G}{4\pi^2} (m_p + m_\star), \tag{2.63}
\]
and the definition of the CM (eq. (2.28)). Owing to eq. (2.62), only one of a⋆, K needs to
be employed in the AM and RV models; here we adopt K.
6
The reference RV consists of the radial velocity of the CM plus an offset due to the specific calibration
of the instrument (section 2.4). An independent parameter Vi is therefore used for each data set (table 2.1).
It should be noted that a different version of eq. (2.59) involving ν instead of E and the
alternative definition Kalt ≡ K/√(1 − e²) (table 2.2) is often found in the literature (e.g. Gregory
2005a).
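The RV model of eq. (2.59) and (2.60) can be sketched as follows for non-interacting planets. This is an illustrative implementation under stated assumptions, not the code of Base: the dictionary keys, the Newton solver with starting value π and the default epoch are choices made here.

```python
import math


def solve_kepler(M, e, tol=1e-12, max_iter=100):
    """Solve E - e sin E = M for E by Newton's method (0 <= e < 1)."""
    M = M % (2.0 * math.pi)
    E = math.pi
    for _ in range(max_iter):
        dE = (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
        E -= dE
        if abs(dE) < tol:
            break
    return E


def radial_velocity(t, planets, V=0.0, a_v=0.0, t_r=0.0):
    """Eq. (2.59) and (2.60); planets is a list of dicts with the
    hypothetical keys K, f, chi, e and omega (planetary argument of periapsis)."""
    v4_z = 0.0
    for p in planets:
        M = 2.0 * math.pi * ((p["chi"] + p["f"] * (t - t_r)) % 1.0)   # eq. (2.12)
        E = solve_kepler(M, p["e"])
        num = (math.sin(E) * math.sin(p["omega"])
               - math.sqrt(1.0 - p["e"] ** 2) * math.cos(E) * math.cos(p["omega"]))
        v4_z -= p["K"] * num / (1.0 - p["e"] * math.cos(E))            # eq. (2.60)
    return -v4_z + V + (t - t_r) * a_v                                 # eq. (2.59)


# Hypothetical eccentric planet with a period of 100 time units
planet = {"K": 10.0, "f": 0.01, "chi": 0.2, "e": 0.5, "omega": 1.0}
rvs = [radial_velocity(0.01 * k, [planet]) for k in range(10000)]
print(max(rvs) - min(rvs), 2.0 * planet["K"] / math.sqrt(1.0 - planet["e"] ** 2))
```

The two printed numbers agree closely, illustrating that the peak-to-peak range of the model is 2Kalt = 2K/√(1 − e²) rather than 2K.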
2.2.4
Determination of planetary mass
In the following, we assume that a given RV semi-amplitude K > 0 and frequency f have
been estimated from RV data and that the stellar mass m⋆ is also known exactly, but that
the orbital inclination i is unknown. Then, the quantity
\[
  f_m \equiv \frac{m_p \sin i}{(m_p + m_\star)^{2/3}} = \frac{K}{\sqrt[3]{2\pi G f}}, \tag{2.64}
\]
whose cube \( f_m^3 \) is known as the mass function, is also given. From it, the minimum
planetary mass mp,min can be determined as follows. Rewriting eq. (2.64) as
\[
  m_p \sin i = f_m (m_p + m_\star)^{2/3}, \tag{2.65}
\]
it is, first, clear that the value of mp sin i is not strictly known unless the planetary mass mp
is also known. As both sides of eq. (2.65) depend on mp, and assuming that a minimum
mass mp,min exists which satisfies this equation, it follows that
\[
  \min_{m_p} (m_p \sin i) = \min_{m_p} f_m (m_p + m_\star)^{2/3} = f_m (m_{p,\mathrm{min}} + m_\star)^{2/3}. \tag{2.66}
\]
Moreover, rewriting eq. (2.65) as
\[
  \sin i = \frac{f_m (m_p + m_\star)^{2/3}}{m_p}, \tag{2.67}
\]
we have
\[
  \frac{\mathrm{d} \sin i}{\mathrm{d} m_p} = -\frac{f_m (m_p + 3 m_\star)}{3 m_p^2 \sqrt[3]{m_p + m_\star}} < 0. \tag{2.68}
\]
With i ∈ (0, π), this implies
\[
  1 = \max_{m_p} (\sin i) = \sin i \, \big|_{m_p = m_{p,\mathrm{min}}}, \tag{2.69}
\]
i.e. the minimum mass mp,min is reached for edge-on orbits, i = π/2. Insertion of the latter
values into eq. (2.65) and comparison with eq. (2.66) implies
\[
  m_{p,\mathrm{min}} = f_m (m_{p,\mathrm{min}} + m_\star)^{2/3} = \min_{m_p} (m_p \sin i), \tag{2.70}
\]
which can be solved numerically for mp,min, given fm and m⋆ (table 2.2).
Despite the conceptual difference between mp sin i and mp,min, it should not go
unnoticed that one may write, using m⋆ ≫ mp,
\[
  m_{p,\mathrm{min}} \approx f_m m_\star^{2/3} \approx m_p \sin i, \tag{2.71}
\]
an approximation often made implicitly in the literature. The relative error of this
approximation is less than 1% even for a 10 MJ planet and thus smaller than the usual
uncertainty of several percent in m⋆ (section 1.2), which has been neglected above.
If, by contrast, the inclination i is known, e.g. through the availability of AM data, then
the planetary mass mp itself can be determined. This is achieved by rewriting eq. (2.65) as
\[
  m_p = \frac{f_m}{\sin i} (m_p + m_\star)^{2/3}, \tag{2.72}
\]
which contains only (albeit not exactly) known quantities besides mp and thus can be
solved numerically for mp (table 2.2).
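One simple way to solve eq. (2.70) and (2.72) numerically is fixed-point iteration, sketched below. The choice of this method, the function names, the tolerance and the rounded constants are assumptions for illustration; the text does not specify which numerical scheme is used.

```python
import math

G = 6.674e-11  # gravitational constant [m^3 kg^-1 s^-2] (rounded)


def mass_function_root(K, f):
    """f_m of eq. (2.64) from K [m/s] and f [1/s]; units kg^(1/3)."""
    return K / (2.0 * math.pi * G * f) ** (1.0 / 3.0)


def solve_mass(f_m, m_star, sin_i=1.0, tol=1e-12, max_iter=500):
    """Solve eq. (2.72) for m_p by fixed-point iteration; sin_i = 1 yields eq. (2.70)."""
    m = 0.0
    for _ in range(max_iter):
        m_new = f_m / sin_i * (m + m_star) ** (2.0 / 3.0)
        if abs(m_new - m) <= tol * m_new:
            return m_new
        m = m_new
    return m


# Hypothetical round trip: Jupiter-like planet, Sun-like star, i = 60 degrees
m_p, m_star, inc = 1.898e27, 1.989e30, math.pi / 3.0
f = 1.0 / (11.86 * 365.25 * 86400.0)
K = ((2.0 * math.pi * G * f) ** (1.0 / 3.0)
     * m_p * math.sin(inc) / (m_p + m_star) ** (2.0 / 3.0))      # eq. (2.62)
f_m = mass_function_root(K, f)
print(solve_mass(f_m, m_star, math.sin(inc)) / m_p)               # recovers m_p
print(solve_mass(f_m, m_star) / (m_p * math.sin(inc)))            # m_p,min vs. m_p sin i
```

The iteration converges quickly because the derivative of the right-hand side of eq. (2.72) with respect to mp equals (2/3) mp/(mp + m⋆) at the solution, which is always below 2/3 and very small for planetary masses.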
2.2.5
Binary systems
If the primary and secondary binary components assume the roles of star and planet,
respectively, the above reasoning also yields the observables of a binary system.
For visual binaries, AMa measurements often refer to the position of the secondary
with respect to the primary, implying that proper motion µ and constant AM offset rr can
both be ignored. From the definition of the CM, it follows that the orbit of the secondary
with respect to the primary is identical with its barycentric orbit but scaled by a factor
(m1 + m2) m1⁻¹, i.e. its semi-major axis equals the sum of the two components’
barycentric semi-major axes,
\[
  a_\mathrm{rel} = a_1 + a_2. \tag{2.73}
\]
Thus, eq. (2.42) can be used as AMa model function r(t; θ) for a binary with arel replacing
a⋆, ω2 + π replacing ω⋆ and
\[
  a'_\mathrm{rel} \equiv \frac{\varpi a_\mathrm{rel}}{1\,\mathrm{AU}}. \tag{2.74}
\]
Equation (2.59) yields the RV of component i if K is replaced by (−1)^(i+1) Ki and ω by
ω2. If AMa and RV data are combined, parameters a1,2 can be omitted in favour of K1,2
(eq. (2.61)) and arel is given by the equivalent of eq. (2.62) in combination with eq. (2.73).
2.3
Parameters and derived quantities
For several of the parameters listed in table 2.1, the default prior ranges may be determined
from the data. The ranges of the derived quantities listed in table 2.2 follow from
corresponding parameter ranges, but need not be explicitly specified as derived quantities are
sampled implicitly with the parameters (section 3.2.3). The automatic default ranges Ipri,θ
for parameters θ are as follows:
• reference radial velocity V:
◦ normal mode:
$$I_{\mathrm{pri},V} = \widetilde{\Delta v}_i. \tag{2.75}$$
◦ binary mode: intersection of the RV ranges Δ̃v_i per RV data set i, with each range
extended by one corresponding measurement uncertainty on both bounds, i.e.
$$\widetilde{\Delta v}_i = \max_{j,k \in [1, N_D]} \bigl(v_j + \sigma_j - (v_k - \sigma_k)\bigr), \tag{2.76}$$
where N_D is the number of data, {v_j} are the measurements and {σ_k} are their
uncertainties, all pertaining to data set i. Thus,
$$I_{\mathrm{pri},V} = \bigcap_i \widetilde{\Delta v}_i. \tag{2.77}$$
• radial-velocity acceleration:
$$I_{\mathrm{pri},a_v} = \left[ -\min_i \frac{|\widetilde{\Delta v}_i|}{\Delta t_i},\ \min_i \frac{|\widetilde{\Delta v}_i|}{\Delta t_i} \right], \tag{2.78}$$
where Δt_i is the time span covered by data set i. This means that the RV acceleration
is assumed not to exceed the minimum rate of RV change of any data set i determined
heuristically according to |Δ̃v_i| Δt_i^(−1).
• radial-velocity jitter σ+ :
Ipri,σ+ = [0, 3 stdev ({vj })],
(2.79)
where {vj } is the set of all RVs.
• astrometric jitter τ+ :
Ipri,τ+ = [0, τ+,max ],
(2.80)
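where
$$\tau_{+,\max} \equiv 3 \cdot \begin{cases} \operatorname{stdev}\left(\{\Delta a_{h,j}\}\right), & \text{AMh data} \\ \sqrt{\dfrac{1}{N_D} \sum_j \left(\alpha_{*,j}^2 + \delta_j^2\right)}, & \text{AMa data.} \end{cases} \tag{2.81}$$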
The expression used for AMa data corresponds to three standard deviations of the
Euclidean distances of the binary components, calculated with respect to a sample
mean identified with zero.
• position angle of the ascending or first node Ω: the prior range of Ω is reduced from
the default [0, 2π) (ascending node) to [0, π) if no RV data are provided, in which
case Ω is defined as referring to the first node (section 2.1.2).
• radial-velocity semi-amplitude K, K_i: while the lower bound K_min ≡ 0, the determination of the upper bound varies:
◦ normal mode: to determine the upper bound, the following alternatives⁷ have
been found: one may either use the physical argument of a maximum allowed
“projected mass” m_p sin i,
$$K_{\max} = \sqrt[3]{2\pi G f_{\max}}\; \frac{(m_p \sin i)_{\max}}{m_\star^{2/3}} \tag{2.82}$$
according to eq. (2.62), or the extended RV range Δ̃v calculated over all sets,
$$K_{\max} = \widetilde{\Delta v} \cdot \frac{\sqrt{1 - e_{\min}^2}}{2}, \tag{2.83}$$
where the latter is based on the fact that for a given K and e, the RV-model
range is 2K(1 − e²)^(−1/2).
◦ binary mode:
$$K_{i,\max} = \widetilde{\Delta v}_i \cdot \frac{\sqrt{1 - e_{\min}^2}}{2}. \tag{2.84}$$
7 In Base, the variant can be chosen with an option (section 4.4.1).
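The automatic ranges above can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names are hypothetical (not the Base interface), each data set is passed as a tuple of arrays (t, v, sigma), and stdev is taken to be the sample standard deviation (ddof=1), which the text does not specify.

```python
import numpy as np


def extended_rv_range(v, sigma):
    """Width of the RV range extended by the uncertainties, cf. eq. (2.76)."""
    v, sigma = np.asarray(v, float), np.asarray(sigma, float)
    # max over j,k of (v_j + sigma_j) - (v_k - sigma_k)
    return np.max(v + sigma) - np.min(v - sigma)


def default_ranges(datasets, e_min=0.0):
    """Default prior ranges for a list of RV data sets (t, v, sigma)."""
    widths = [extended_rv_range(v, s) for _, v, s in datasets]
    spans = [np.max(t) - np.min(t) for t, _, _ in datasets]
    # eq. (2.78): symmetric bound from the smallest heuristic rate of change
    a_max = min(w / dt for w, dt in zip(widths, spans))
    # eq. (2.79): three standard deviations of all RVs (ddof=1 is an assumption)
    all_v = np.concatenate([np.asarray(v, float) for _, v, _ in datasets])
    jitter_max = 3.0 * np.std(all_v, ddof=1)
    # eq. (2.84): upper bound of K_i per data set
    k_max = [w * np.sqrt(1.0 - e_min**2) / 2.0 for w in widths]
    return {"a_v": (-a_max, a_max), "sigma_plus": (0.0, jitter_max), "K_i": k_max}
```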
Table 2.1: Parameters of the models in section 2.2.

| Symbol⁸ | Designation | Unit | Widest prior support | Prior¹⁰ |
|---|---|---|---|---|
| V | reference RV¹¹ | m s⁻¹ | — | U |
| a_v | RV acceleration | m s⁻¹ | — | S |
| σ₊ | magnitude of additional RV noise | m s⁻¹ | ℝ₀⁺ | M |
| ϖ | parallax¹² | arcsec | (0, 0.77] | J |
| α_r | reference right ascension | ° | [0, 360] | U |
| δ_r | reference declination | ° | [−90, 90] | U |
| μ_α* | proper motion in α cos δ | mas yr⁻¹ | — | S |
| μ_δ | proper motion in δ | mas yr⁻¹ | — | S |
| τ₊ | magnitude of additional AM noise | mas | ℝ₀⁺ | M |
| i | inclination | rad | [0, π) | U |
| Ω | position angle of the ascending (or first) node¹³ | rad | [0, 2π) | U |
| e | eccentricity | 1 | [0, 1) | U |
| f | orbital frequency | d⁻¹ | (0, 10] | J |
| χ | mean anomaly at t_r over 2π | 1 | [0, 1) | U |
| ω, ω₂ | argument of periapsis | rad | [0, 2π) | U |
| K, K_i | RV semi-amplitude | m s⁻¹ | ℝ₀⁺ | M |
| a′ | semi-major axis of stellar barycentric orbit over distance¹⁴ | mas | [10⁻³, 10⁻⁵] | J |
| a′_rel | semi-major axis of orbit of secondary around primary over distance¹⁴ | mas | [10⁻³, 10⁻⁵] | J |

[Columns “Cyclic⁹” and “Used with data types” (AMa, AMh, RV, AMa+RV, AMh+RV): the × marks are not recoverable from the flattened layout.]
8 Parameters printed in boldface pertain to each planet individually in multi-planet mode.
9 For cyclic parameters θ, the indicated lower and upper bounds are treated as equivalent (section 2.1.1).
10 The default prior type; abbreviations: U (uniform), J (Jeffreys), M (modified Jeffreys), S (signed modified Jeffreys).
11 One instance of this parameter, V_i, is employed per RV data set in order to allow for differing offsets (section 2.4).
12 Prior support includes trigonometric parallax of the nearest star, Proxima Centauri (Perryman et al. 1997).
13 For details see section 2.1.2.
14 Lower bound corresponds to AM measurement uncertainty of 1 µas; upper bound according to wide-binary observations by Tolbert (1964).
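Table 2.2: Quantities derived from model parameters.

| Definition | Designation | Unit | Eq. |
|---|---|---|---|
| $P \equiv f^{-1}$ | period | d | — |
| $T \equiv t_r - \chi / f$ | time of periapsis | d | (2.11) |
| $m_{p,\min} \equiv K (2\pi G f)^{-1/3} (m_{p,\min} + m_\star)^{2/3}$ | minimum planetary mass¹⁵ | M_J | (2.64), (2.70) |
| $m_p \equiv K (2\pi G f)^{-1/3} (\sin i)^{-1} (m_p + m_\star)^{2/3}$ | planetary mass¹⁶ | M_J | (2.72) |
| $m_j \equiv \frac{4\pi^2}{G} \frac{K_{3-j} (K_1 + K_2)^2}{f\,(2\pi \sin i)^3}$ | binary-component mass | M☉ | (2.61), (2.63) |
| $\rho \equiv m_2 / m_1 = K_1 / K_2$ | mass ratio of binary components | 1 | (2.29), (2.61) |
| $K_{\mathrm{alt}} \equiv K / \sqrt{1 - e^2}$ | alternative RV semi-amplitude¹⁷ | m s⁻¹ | — |
| $a_\star \equiv \frac{K}{2\pi f \sin i}$ | semi-major axis of stellar orbit around CM | AU | (2.61) |
| $a_\star \sin i \equiv \frac{K}{2\pi f}$ | semi-major axis of stellar orbit times sine of inclination | AU | (2.61) |
| $a_p \equiv \frac{m_\star}{m_p}\, a_\star$ | semi-major axis of planetary orbit around CM | AU | (2.29) |
| $a_{p,\max} \equiv \frac{m_\star}{m_{p,\min}}\, a_\star \sin i$ | maximum semi-major axis of planetary orbit | AU | (2.29), (2.70) |
| $a_j \equiv \frac{K_j}{2\pi f \sin i}$ | semi-major axis of component’s orbit around CM | AU | (2.61) |
| $a_{\mathrm{rel}} \equiv a_1 + a_2 = \frac{K_1 + K_2}{2\pi f \sin i}$ | semi-major axis of orbit of secondary around primary | AU | (2.61) |
| $d \equiv 1\,\mathrm{AU} / \varpi$ | distance | pc | — |

2.4 Errors and noise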
Data from realistic, macroscopic measurement processes cannot be described exactly by
deterministic observable models, as there is always some amount of noise caused by a variety
of different and potentially unknown physical effects. Noise contributes a random error, or
mismatch between the measured and the predicted value. Furthermore, systematic errors,
i.e. deterministic but a priori unknown alterations, may be caused by the chosen observable
model being too simplistic and/or overly complex with respect to the different aspects of
the true physical process in the target, and/or by a misunderstanding or miscalibration of
the instrument. Thus, assuming that an adequate “error-free” observable model is given
by function f(·; ·) with parameters θ, the measured value y_i can be described as
$$y_i = f(t; \theta) + \epsilon_{s,i} + \epsilon_i, \tag{2.85}$$
where ε_s,i is the systematic error and ε_i is the random error.
15 The implicit function of minimum planetary mass m_p,min is solved numerically. The planet’s minimum mass equals its real mass if the orbit is edge-on, viz. sin i = 1 (section 2.2.4).
16 The implicit function of planetary mass m_p is solved numerically.
17 Refers to the star or any of the binary components, respectively.
If the functional form of the systematic error ε_s,i is known, it can be modelled by either
altering the observable model f(·; ·) or by letting it be absorbed into one or several of the
model parameters. In this work, only the unknown constant calibration offset affecting
most RV data sets is considered; it is accounted for by the reference radial velocity V_i,
which is made an independent model parameter for each data set (table 2.1). All other
possible systematic errors are assumed to be insignificant (ε_s,i ≡ 0) and therefore neglected.
Noise, although unpredictable by definition, may obey characterisable distributions
given by noise models, adjusted by one or several parameters (see also Sozzetti 2005). In
the following, we assume that the noise of different data is independent.
According to the principle of maximum entropy, given only the mean and variance of
a distribution, the normal (Gaussian) distribution has maximum information-theoretic
entropy, equivalent to minimum bias or prejudice with respect to the missing information,
and should therefore be used in these cases (Kapur 1989).
However, additional noise components may be present whose variance is unknown. We
consider two distinct classes of noise-induced error components in measurements of stellar
positions or radial velocities:
1. an error ε_0,i caused by instrumental effects and photon noise, known as the internal
error, whose distribution is characterised by the nominal uncertainty of the ith datum,
derived from an understanding of the physics and statistics of the measurement
process;
2. an error ε_+,i caused e.g. by atmospheric or stellar effects not modelled otherwise,
called external error and assumed independent from ε_0,i.
Throughout, we assume that both error components follow a (uni- or bivariate) normal
distribution with zero mean, which can be shown by means of characteristic functions to
imply that the total error ε_i = ε_0,i + ε_+,i is again normally distributed with zero mean.
Sometimes a non-standardised t-distribution with (unknown) degrees-of-freedom parameter ν ∈ ℝ is adopted for the total error ε_i. This can be derived under the assumption
that, for one-dimensional data, the unknown variance of ε_i follows an inverse-Gamma
distribution¹⁸ and by then marginalising (§ 3.2.4) the variance.
By contrast, in this work no marginalisation with respect to the noise variance is
applied; instead, the external errors are described by specific parameters, described in the
following, that are treated and estimated like model parameters.
For (one-dimensional) RV data, the internal error ε_0,i is distributed as N(0, σ_0,i²), the
external error ε_+,i is distributed as N(0, σ_+²), where σ_+ is a free parameter, and thus the
total error ε_i is distributed as N(0, σ_0,i² + σ_+²).
For (two-dimensional) AMa data, the internal error ε_0,i ∼ N(0, E_0,i), where E_0,i is
called the AMa data covariance matrix, and the external error ε_+,i ∼ N(0, E_+). Here, the
scalar covariance matrix E_+ = diag(τ_+², τ_+²) characterises the distribution of external noise,
assuming that the latter is independent and identically distributed (i.i.d.) in the two astrometric
coordinates, with τ_+ being a free parameter. Thus, the total error ε_i ∼ N(0, E_0,i + E_+). The
data covariance matrix represents the nominal uncertainty of the two measured astrometric
coordinates and can be written using singular-value decomposition as
$$E_{0,i} = R(-\varphi_i) \begin{pmatrix} a_i^2 & 0 \\ 0 & b_i^2 \end{pmatrix} R(\varphi_i), \tag{2.86}$$
where R(·), a_i, b_i and φ_i are the 2 × 2 passive rotation matrix, the nominal semi-major and
semi-minor axes of the uncertainty ellipse and the position angle of its major axis with respect
to North, respectively. An uncertainty ellipse is defined here as the location of all points
whose probability density under the noise model is at least exp(−1/2) times its peak value, corresponding to an
interval of ±1σ_0 around the datum for RV data.
18 This is the conjugate-prior distribution of the variance of a Gaussian likelihood, i.e. it is the posterior
distribution of the variance if the variance has an inverse-Gamma distribution as prior (§ 3.2).
For (one-dimensional) AMh data, the internal error ε_0,i is distributed as N(0, τ_0,i²),
whereas the external errors in the underlying astrometric coordinates are distributed as for
AMa data. This implies due to eq. (2.56) that the external error in abscissa ε_+,i ∼ N(0, τ_+²);
thus, the total error ε_i ∼ N(0, τ_0,i² + τ_+²).
As σ_+ and τ_+ are treated as free external-noise parameters, similar to model parameters, they are given for
every evaluation of the noise distribution (§ 3.1.1) and not marginalised over.
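The construction of the total error (co)variances can be illustrated as follows. This is a sketch under assumptions: the function names are hypothetical, and the sign convention chosen for the passive rotation matrix R(φ) is an assumption, since the section does not write the matrix out. The resulting covariance does not depend on that choice beyond the orientation convention of the ellipse.

```python
import numpy as np


def rotation(phi):
    """2x2 passive rotation matrix R(phi); sign convention assumed here."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, s], [-s, c]])


def total_covariance(a, b, phi, tau_plus):
    """E_i = E_0,i + E_+ with E_0,i = R(-phi) diag(a^2, b^2) R(phi), cf. eq. (2.86)."""
    e0 = rotation(-phi) @ np.diag([a**2, b**2]) @ rotation(phi)
    e_plus = tau_plus**2 * np.eye(2)   # scalar covariance matrix of the external noise
    return e0 + e_plus


def total_sigma(sigma0, sigma_plus):
    """One-dimensional case: internal and external errors added in quadrature."""
    return np.hypot(sigma0, sigma_plus)
```

Because R(−φ) is the transpose of R(φ), the matrix E_0,i is symmetric with eigenvalues a² and b², and adding E_+ simply increases both eigenvalues by τ_+².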
Chapter 3
Data analysis
Two approaches to inference
Data analysis is a type of inductive reasoning, inferring general rules from specific observational data (e.g. Gregory 2005b). These general rules are described by observable
models which produce theoretical values of the observables as a function of parameters.
Additionally, models for the errors are set up (section 2.4).
The first steps in the analysis of a given set of data consist in defining the noise model
and all the possible competing observable models. Then, the following primary tasks of
data analysis may be carried out:
1. In model selection, the relative probabilities of a set of competing models {M_i},
chosen a priori, are assessed. In this work, only observable models are selected, while
noise models are assumed to be known. Specifically, exoplanet detection tries to
decide the question of whether a certain star is accompanied by a planet or not,
based on available data.
2. Additionally, model assessment (or model checking) may be used to determine whether
each of the models under consideration, especially the most probable one, adequately
describes the data. In this thesis, the basic assumption is made that the most probable
model¹ indeed provides an adequate description of the data, i.e. that potential other
effects than those caused by binary or planetary companion(s) play no significant
role. Therefore, model assessment is not performed in this thesis.
3. Parameter estimation aims to infer the parameters θ of a chosen model. This is
specifically referred to as the characterisation (or determination) of exoplanet orbits
in the present context.
4. The purpose of uncertainty estimation is to provide a measure of the uncertainties in
the parameters.
3.1 Frequentist inference
The well-established frequentist approach to inference is named after the fact that it defines
probability as the relative frequency of an event. Measurements are regarded as values of
random variables drawn from an underlying statistical population that is characterised by
population parameters. The kind of population and its parameters are determined by the
observable and noise models: e.g. for one-dimensional observables with Gaussian noise,
under the condition that no systematic errors are present and the correct model function is
f(·; ·) with parameters θ, the population is normally distributed with variance equal to the
noise variance and mean given by the observable model f(t; θ) evaluated at parameters θ
and time t.²
1 For stars treated as binary, only one possible model is considered in this work (chapter 2).
3.1.1 Likelihood of data
Given the observable model (§ 2.2) and the noise model (§ 2.4) with their parameters,
hence the population assumed to underlie the measurements, the probability density³ of
any datum can be calculated. We start by defining the residuals for the relevant data
types,
$$\boldsymbol{\Delta}_{\mathrm{AM},i} \equiv \boldsymbol{r}_i - \boldsymbol{r}(\theta; t_i) \tag{3.1}$$
$$\Delta_{h,i} \equiv a_{h,i} - a_h(\theta; t_i) \tag{3.2}$$
$$\Delta_{\mathrm{RV},i} \equiv v_i - v(\theta; t_i), \tag{3.3}$$
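i.e. the differences between a datum taken at time t_i and the corresponding observable
model, respectively for AMa, AMh and RV. Furthermore, we define the normalised residuals,
$$\varrho_{\mathrm{AM},i} \equiv s_i \sqrt{\boldsymbol{\Delta}_{\mathrm{AM},i}^{\top}\, E_i^{-1}\, \boldsymbol{\Delta}_{\mathrm{AM},i}} \tag{3.4}$$
$$\varrho_{h,i} \equiv \frac{\Delta_{h,i}}{\tau_i} \tag{3.5}$$
$$\varrho_{\mathrm{RV},i} \equiv \frac{\Delta_{\mathrm{RV},i}}{\sigma_i}, \tag{3.6}$$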
where
$$s_i \equiv \operatorname{sgn}\bigl((\varphi_i - \phi_i)(\pi - [\varphi_i - \phi_i])\bigr) \tag{3.7}$$
gives the sign of the normalised AMa residual, with ϕ_i and φ_i being the position angles of
the residual and of the uncertainty ellipse, respectively (section 2.4). In these equations,
total errors are employed as discussed in section 2.4,
$$E_i \equiv E_{0,i} + E_+ \tag{3.8}$$
$$\tau_i^2 \equiv \tau_{0,i}^2 + \tau_+^2 \tag{3.9}$$
$$\sigma_i^2 \equiv \sigma_{0,i}^2 + \sigma_+^2. \tag{3.10}$$
2 In general, t_i is the value of an independent variable, which may e.g. be temporal or spatial. We
assumed the measurement durations to be short in comparison with the characteristic time of orbital
motion, given by the orbital period, and thus the observations to take place at points in time t_i which are
known exactly.
3 Throughout, we use the term probability density wherever it refers to a continuous quantity, as opposed
to probability, i.e. probability mass function, for discrete quantities. Probability distribution, denoted by
p(·), is a generic term used for both cases.
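Now the probability density or likelihood of an individual datum is dictated by the
(Gaussian) noise model
$$\mathcal{L}_{\mathrm{AM},i} \equiv p(\boldsymbol{r}_i|\theta) = \frac{1}{2\pi \sqrt{\det E_i}} \exp\left(-\frac{\varrho_{\mathrm{AM},i}^2}{2}\right) \tag{3.11}$$
$$\mathcal{L}_{h,i} \equiv p(a_{h,i}|\theta) = \frac{1}{\sqrt{2\pi}\,\tau_i} \exp\left(-\frac{\varrho_{h,i}^2}{2}\right) \tag{3.12}$$
$$\mathcal{L}_{\mathrm{RV},i} \equiv p(v_i|\theta) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{\varrho_{\mathrm{RV},i}^2}{2}\right). \tag{3.13}$$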
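The joint likelihood of all data of a given type equals the product of the above individual
likelihoods, i.e.
$$\mathcal{L}_{\mathrm{AM}} = \prod_{i=1}^{N_{\mathrm{AM}}} \mathcal{L}_{\mathrm{AM},i} = \left( (2\pi)^{N_{\mathrm{AM}}} \prod_{i=1}^{N_{\mathrm{AM}}} \sqrt{\det E_i} \right)^{-1} \exp\left(-\frac{\chi_{\mathrm{AM}}^2}{2}\right) \tag{3.14}$$
$$\mathcal{L}_{h} = \prod_{i=1}^{N_h} \mathcal{L}_{h,i} = \left( (2\pi)^{N_h/2} \prod_{i=1}^{N_h} \tau_i \right)^{-1} \exp\left(-\frac{\chi_h^2}{2}\right) \tag{3.15}$$
$$\mathcal{L}_{\mathrm{RV}} = \prod_{i=1}^{N_{\mathrm{RV}}} \mathcal{L}_{\mathrm{RV},i} = \left( (2\pi)^{N_{\mathrm{RV}}/2} \prod_{i=1}^{N_{\mathrm{RV}}} \sigma_i \right)^{-1} \exp\left(-\frac{\chi_{\mathrm{RV}}^2}{2}\right). \tag{3.16}$$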
Here, N_AM, N_h, N_RV are the respective numbers of data and
$$\chi_{\mathrm{AM}}^2 \equiv \sum_{i=1}^{N_{\mathrm{AM}}} \varrho_{\mathrm{AM},i}^2 \tag{3.18}$$
$$\chi_{h}^2 \equiv \sum_{i=1}^{N_h} \varrho_{h,i}^2 \tag{3.19}$$
$$\chi_{\mathrm{RV}}^2 \equiv \sum_{i=1}^{N_{\mathrm{RV}}} \varrho_{\mathrm{RV},i}^2 \tag{3.20}$$
are called χ² statistics of the data, providing a measure of the deviation between data
and model, known as the goodness of fit. If the Gaussian noise model and the observable
model are correct, the normalised residuals follow a standard normal distribution N(0, 1).
Then, and if the observable models are linear functions of the parameters, the values of the
χ² statistics obey the χ²_ν distribution, where ν = N_D − k is the number of degrees of freedom, with
N_D being the number of data and k the number of model parameters. This distribution
has an expectation equal to ν and a variance given by 2ν.
In the general case of several types of data being combined, the total likelihood is the
product of the corresponding joint likelihoods given in eq. (3.14) – (3.16); e.g. for AMh
and RV data,
L = Lh LRV .
(3.21)
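In numerical work the product in eq. (3.16) is usually evaluated in logarithmic form to avoid underflow. The following sketch does this for RV data, with the jitter σ_+ added in quadrature as in eq. (3.10); the function name and argument layout are illustrative assumptions.

```python
import numpy as np


def log_likelihood_rv(v, v_model, sigma0, sigma_plus):
    """Return (ln L_RV, chi^2_RV) for Gaussian noise, cf. eqs. (3.10), (3.16), (3.20)."""
    v, v_model, sigma0 = (np.asarray(x, float) for x in (v, v_model, sigma0))
    var = sigma0**2 + sigma_plus**2          # total variance per datum
    chi2 = np.sum((v - v_model) ** 2 / var)  # sum of squared normalised residuals
    # ln L = -(N/2) ln(2 pi) - sum(ln sigma_i) - chi^2 / 2
    log_l = -0.5 * (v.size * np.log(2.0 * np.pi) + np.sum(np.log(var)) + chi2)
    return log_l, chi2
```

For combined data types, the logarithms of the joint likelihoods are simply added, which corresponds to the product in eq. (3.21).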
3.1.2 Parameter estimation
Frequentist parameter estimation is generally equivalent to maximising L or minimising χ2
as functions of θ. The resulting best estimates of the parameters θ̂ are therefore often called
maximum-likelihood or least-squares estimates. For linear models, χ2 (θ) is a quadratic
function and consequently θ̂ can be found unambiguously by matrix inversion. In the more
realistic cases of nonlinear models, however, χ2 may have many local minima, therefore
care needs to be taken not to mistake a local minimum for the global one. Several methods
exist to this end, including evaluation of χ2 (θ) on a finite grid, simulated annealing or
genetic algorithms (e.g. Gregory 2005b).
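For the linear case mentioned above, the unambiguous solution by matrix inversion can be sketched as a weighted least-squares fit. The design matrix A (one column per parameter) and the function name are assumptions made for this illustration; for ill-conditioned problems a solver such as `numpy.linalg.lstsq` would be preferable to an explicit inverse.

```python
import numpy as np


def weighted_linear_fit(A, y, sigma):
    """Minimise chi^2 = sum(((y - A @ theta) / sigma)^2) for a linear model."""
    A, y, sigma = (np.asarray(x, float) for x in (A, y, sigma))
    Aw = A / sigma[:, None]              # rows scaled by 1/sigma_i
    yw = y / sigma
    cov = np.linalg.inv(Aw.T @ Aw)       # inverse of the normal-equation matrix
    theta_hat = cov @ (Aw.T @ yw)        # unique minimum of the quadratic chi^2
    return theta_hat, cov
```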
3.1.3 Uncertainty estimation
Under the frequentist framework, parameter uncertainties are usually quoted as confidence
intervals. Procedures to derive these are designed such that when repeated many times
based on different data, a certain fraction of the resulting intervals will contain the true
parameters. Popular methods use bootstrapping (Efron and Tibshirani 1993) or the Fisher
information matrix, which is based on a local linearisation of the model (e.g. Ford 2004).
However, these methods suffer from specific caveats: the Fisher matrix is only appropriate
for a quadratic-shaped χ2 in the vicinity of the minimum, and bootstrapping, which
relies on modified data, may lead to severe misestimation of the parameter uncertainties,
especially when these are large (Vogt et al. 2005).
3.1.4 Model selection
Frequentist model selection usually starts by setting up a null hypothesis H0 stating that
the simplest model M0 is correct. If M0 is linear and the noise is normally distributed, the
χ2 statistic follows the known reference distribution χ2ν (% 3.1.1). Under these conditions,
the following procedures are often used to select a model.
1. The goodness of fit χ̂2 = minθ χ2 (θ) of M0 is calculated and the reference distribution
is integrated from χ̂2 to infinity to obtain the p-value,
p = p(χ2 ≥ χ̂2 |H0 ).
(3.22)
If p < α for some significance level α chosen in advance, say
α = 0.05, the null hypothesis is rejected as it would imply a better fit with high
probability 1 − α. Rejecting H0 is equivalent to the hypothesis that another more
complex model is correct. Because it only addresses one model explicitly, this is a
type of model selection based on model assessment.
2. The relative goodness of fit of M0 and an alternative, more complex model M1 ,
defined as correct under hypothesis H1 , is determined by calculating χ2 under both
models and applying a certain statistic S to these values χ̂²_ν0, χ̂²_ν1. Then, the measured
value Ŝ of this statistic is converted into a p-value analogously to the above, using the
reference distribution of S under H0. Again, if p < α, the null hypothesis is rejected in
favour of H1. Among such statistics S are the F-statistic and the likelihood-ratio-test
statistic, whose null-hypothesis reference distributions are known only under certain
regularity conditions and if ND → ∞ (Protassov et al. 2002).
It should be noted, however, that this type of model selection can only be applied if
the models M0 , M1 are nested, i.e. M1 turns into M0 if its additional parameters
assume their so-called null values.
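The p-value of eq. (3.22) is the upper tail probability of the χ²_ν distribution. As a sketch that needs only the standard library, it can be computed for integer ν with the recurrence Q(a+1, z) = Q(a, z) + z^a e^(−z)/Γ(a+1) of the regularised upper incomplete gamma function, starting from Q(1/2, z) = erfc(√z) or Q(1, z) = e^(−z). The function name is hypothetical; in practice a statistics library would normally be used.

```python
import math


def chi2_sf(x, nu):
    """p = P(chi^2 >= x | nu degrees of freedom) for integer nu >= 1."""
    if nu < 1 or int(nu) != nu:
        raise ValueError("nu must be a positive integer")
    if x <= 0.0:
        return 1.0
    z = 0.5 * x
    if nu % 2:                       # odd nu: start from a = 1/2
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:                            # even nu: start from a = 1
        a, q = 1.0, math.exp(-z)
    while a < 0.5 * nu:              # step a -> a + 1 until a = nu / 2
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q
```

With α = 0.05, the null hypothesis would be rejected whenever `chi2_sf(chi2_hat, nu) < 0.05`.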
Period detection
The detection of periodicities in time-series of data, e.g. sets of RV data {(ti , vi )}, plays
an important role particularly in the present context of discovering and characterising
extrasolar planets. In assessing the relative performance of two competing models, it is a
special type of frequentist model selection (item 2 above). One of its basic difficulties is
to distinguish spurious patterns induced by noise, which mimics periodicities, from a real
periodic signal.
An important tool for period detection is the periodogram, which proceeds by evaluating
a certain statistic of the data at a range of test frequencies f. The Lomb-Scargle (LS)
periodogram (Lomb 1976; Scargle 1982) is a well-known type of periodogram closely
related to the discrete Fourier transform (DFT; Deeming 1975) and briefly sketched in the
following. The Lomb-Scargle statistic, or periodogram power P_LS, at any given frequency f
measures the relative goodness of fit of a zero-mean sinusoidal signal of that frequency
and a constant signal. In calculating the statistic, the free parameters of both models are
determined according to a linear least-squares fit in which the individual measurement uncertainties
are disregarded. The power PLS follows an F -distribution under the null hypothesis H0
that only noise and no deterministic signal is present.
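A compact sketch of the classical periodogram in the form given by Scargle (1982) is shown below. The division by the sample variance is one common normalisation and is an assumption here, as the text does not fix the normalisation; the function name is likewise illustrative. Test frequencies must be positive.

```python
import numpy as np


def lomb_scargle(t, y, freqs):
    """Classical Lomb-Scargle power at the given (positive) test frequencies."""
    t = np.asarray(t, float)
    y = np.asarray(y, float) - np.mean(y)      # remove the sample mean
    power = np.empty(len(freqs))
    for n, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # time offset tau makes the sine and cosine terms orthogonal
        tau = np.arctan2(np.sum(np.sin(2 * w * t)), np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[n] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power / np.var(y, ddof=1)           # assumed normalisation
```

The individual measurement uncertainties do not enter, in line with the unweighted fit described above.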
From the maximal power PLS,max found at frequency fmax , the p-value is calculated
analogously to eq. (3.22). Correspondingly, the probability under H0 to obtain a power
lower than PLS,max at some arbitrary frequency is 1 − p. If the number of statistically
independent frequencies, M , can be determined,4 one can calculate the probability that
at least one of the frequencies has a peak exceeding the measured PLS,max although no
periodic signal exists. This false-alarm probability (FAP) then equals 1 − (1 − p)M . If
the FAP is found to fall below a given significance level α, H0 is rejected in favour of H1 ,
postulating a single sinusoidal periodicity of frequency fmax .
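The false-alarm probability formula is easily evaluated, but for very small p the direct expression 1 − (1 − p)^M loses precision. A minimal sketch using the numerically stable `log1p` and `expm1` functions follows; the function name is an assumption.

```python
import math


def false_alarm_probability(p, m):
    """FAP = 1 - (1 - p)^M for single-frequency p-value p and M independent frequencies."""
    # expm1/log1p keep precision when p is tiny
    return -math.expm1(m * math.log1p(-p))
```

For small p the result is close to M·p, which shows directly how the number of independent frequencies inflates the single-frequency p-value.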
Other kinds of periodograms have been defined, notably the floating-mean periodogram
(Cumming et al. 1999), the Keplerian periodogram (Cumming 2004), the SigSpec periodogram (Reegen 2007), and the generalised Lomb-Scargle periodogram (Zechmeister and
Kürster 2009). While all of them account for a non-zero mean of the signal, the latter two
also introduce weighting by considering the measurement uncertainties.
Of the variants mentioned, only the Keplerian and the generalised Lomb-Scargle
periodogram are tailored to the signals of eccentric orbits in RV data, while the others
correspond to the assumption of circular orbits when applied to RV data.
Besides noise, another problem inherent to period detection in finite time-series of data
is spectral leakage, i.e. the appearance of spurious peaks in the periodogram due to the
finiteness of total observing time (sidelobes) and time spacing (aliases). In particular, the
DFT of a time series is the convolution of the Fourier transform of the underlying signal
with a spectral window which determines the effects of spectral leakage.5
Spectral leakage, also called aliasing, often causes strong artefacts and is not accounted
for in the reference distributions of most periodogram statistics. However, it affects different
techniques to varying degrees (e.g. Reegen 2007). Methods to deal with spectral leakage
are discussed, e.g. by Ferraz-Mello (1981); Roberts et al. (1987); Foster (1995); Reegen
(2011).
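The spectral window that governs leakage depends only on the observing times and can be inspected before any fit. The sketch below evaluates the squared modulus of the DFT of a constant sampled at the times t_j, normalised to unity at zero frequency; both the normalisation and the function name are assumptions for illustration.

```python
import numpy as np


def spectral_window(t, freqs):
    """|sum_j exp(-2 pi i f t_j)|^2 / N^2 at the given frequencies."""
    t = np.asarray(t, float)
    phase = -2j * np.pi * np.outer(np.asarray(freqs, float), t)
    return np.abs(np.exp(phase).sum(axis=1)) ** 2 / t.size**2
```

For strictly regular sampling with unit spacing, the window returns to its maximum at frequency 1, which is the alias structure referred to above.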
4 This number may only be determinable approximately or in special cases (e.g. Cumming et al. 1999).
5 The spectral window is given as the DFT of a constant sampled at the observing times.
3.2 Bayesian inference
Bayesian inference (e.g. Sivia 2006), which has gained popularity in various scientific
disciplines during the past few decades, defines probability as the degree of belief in a
certain hypothesis H. While this is sometimes criticised as leading to subjective assignments
of probabilities, Bayesian probabilities are not subjective if they are based on all relevant
knowledge K, hence different persons with the same knowledge will assign them the same
value (e.g. Sivia 2006). Thus, Bayesian probabilities are conditional on the knowledge K,
and this conditionality should be stated explicitly, as in the following equations.
3.2.1 Bayes’ theorem
In the eighteenth century, Thomas Bayes laid the foundation of a new approach to inference
with what is now known as Bayes’ theorem (Bayes and Price 1763). For the purpose of
parameter and uncertainty estimation, the hypothesis H refers to the values of the model
parameters θ, and Bayes’ theorem is expressed as
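$$\underbrace{p(\theta|D, M, K)}_{\text{posterior } \mathcal{P}(\theta)} = \frac{\overbrace{p(\theta|M, K)}^{\text{prior } \pi(\theta)} \cdot \overbrace{p(D|\theta, M, K)}^{\text{likelihood } \mathcal{L}(\theta)}}{\underbrace{p(D|M, K)}_{\text{evidence } Z}}, \tag{3.23}$$
where D ≡ {(t_i, y_i)} is the set of pairs of observational times and corresponding data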
values and M denotes the particular model assumed. As mentioned above, all probabilities
are also conditional on the knowledge K, including statements on the types of parameters
and on parameter space Θ – which we assume to be a subset of Rk with k ∈ N – as well as
the noise model.
Using Bayes’ theorem, the aim is to determine the posterior p(θ|D, M, K) ≡ P(θ), i.e.
the probability distribution of the parameters θ in light of the data D, given the model M
and prior knowledge K. The posterior represents all the knowledge about the model
parameters based on the data and prior knowledge. The other terms on the right-hand
side of the theorem are explained below.
• The term prior refers to the probability distribution p(θ|M, K) ≡ π(θ) of the
parameters θ given only the model and prior knowledge K; it characterises the
knowledge about the parameters present before considering the data. For objective
choices of priors, based on classes of parameters, see section 3.2.2.
• The likelihood p(D|θ, M, K) ≡ L(θ) is the probability distribution of the data
values D, given the observing times, the model and the parameters, where the
notation L(θ) is used to refer to the likelihood as a function of the parameters. It is
introduced in the context of frequentist inference in section 3.1.1.
• The evidence p(D|M, K) ≡ Z is the probability distribution of the data values D,
given the observing times and the model but neglecting the parameter values,
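$$p(D|M, K) = \int p(D, \theta|M, K)\, d\theta \tag{3.24}$$
$$\phantom{p(D|M, K)} = \int p(\theta|M, K)\, p(D|\theta, M, K)\, d\theta \tag{3.25}$$
$$\phantom{p(D|M, K)} = \int \pi(\theta)\, \mathcal{L}(\theta)\, d\theta. \tag{3.26}$$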
It equals the integral of the product of prior and likelihood over parameter space Θ
and plays the role of a normalising constant. In practice, however, it is hard to
calculate (section 3.2.7).
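For a single parameter, the terms of Bayes' theorem can be evaluated directly on a grid, which may help to make their roles concrete. The sketch below assumes arrays of log-prior and log-likelihood values on a one-dimensional grid and uses the trapezoidal rule for the evidence integral of eq. (3.26); the function name is hypothetical and the approach does not scale to the many-dimensional problems treated in this work.

```python
import numpy as np


def grid_posterior(theta, log_prior, log_like):
    """Return the normalised posterior on a 1-D grid and the log-evidence ln Z."""
    theta = np.asarray(theta, float)
    log_post = np.asarray(log_prior, float) + np.asarray(log_like, float)
    shift = np.max(log_post)                    # guard against underflow
    unnorm = np.exp(log_post - shift)
    # trapezoidal rule for Z = integral of prior * likelihood
    z = np.sum(0.5 * (unnorm[1:] + unnorm[:-1]) * np.diff(theta))
    return unnorm / z, np.log(z) + shift
```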
It may be instructive here to note that the frequentist approach of maximising the
likelihood p(D|θ, M, K) is equivalent to maximising the posterior when assuming uniform
priors p(θ|M, K). This can be seen by inserting p(θ|M, K) = const into eq. (3.23), which
leads to
P(θ) = p(θ|D, M, K) ∝ p(D|θ, M, K) = L(θ)
(3.27)
However, this maximum-likelihood approach ignores the fact that uniform priors are not
always the most objective choice (§ 3.2.2) and that the posterior cannot be fully characterised
just by the position of its maximum. Still, the latter can be used as a posterior summary
in the Bayesian framework (section 3.2.5).
3.2.2 Encoding prior knowledge
By means of the prior, Bayesian analysis allows one to incorporate knowledge obtained
earlier, e.g. using different data. When no prior knowledge is available for some model
parameter, except for its allowed range, maximum prior ignorance about the parameter
can be encoded by a prior of one of the following functional forms for the most common
classes of location and scale parameters (Gregory 2005b; Sivia 2006).
• For a location parameter, we demand that the prior be invariant against a shift ∆ in
the parameter, i.e.
$$p(\theta|M, K)\, d\theta = p(\theta + \Delta|M, K)\, d(\theta + \Delta), \tag{3.28}$$
which leads to the uniform prior
$$p(\theta|M, K) = \frac{\Theta(\theta - a)\, \Theta(b - \theta)}{b - a}, \tag{3.29}$$
where Θ(·), a, and b are the Heaviside step function and the lower and upper prior
bounds, respectively.
Here, we note that the frequentist approach, lacking an explicit definition of the prior,
corresponds to the implicit assumption of a uniform prior for all parameters.
• A positive scale parameter, which often spans several decades, is characterised by its
invariance against a stretch of the coordinate axis by a factor ϕ, i.e.
$$p(\theta|M, K)\, d\theta = p(\phi\theta|M, K)\, d(\phi\theta), \tag{3.30}$$
which is solved by the Jeffreys prior,
$$p(\theta|M, K) = \frac{\Theta(\theta - a)\, \Theta(b - \theta)}{\theta \ln\frac{b}{a}}. \tag{3.31}$$
That a uniform prior would be inappropriate for this parameter is also illustrated by
the fact that it would assign higher probabilities to θ lying within a higher decade of
[a, b] than in a lower.
• If the lower prior bound of a scale parameter is zero, e.g. for the RV semi-amplitude K,
a modified Jeffreys prior is used. It has the form
$$p(\theta|M, K) = \frac{\Theta(\theta - a)\, \Theta(b - \theta)}{(\theta + \theta_k) \ln\frac{b + \theta_k}{\theta_k}}, \tag{3.32}$$
where θ_k is the knee of the prior. For θ ≪ θ_k, this prior is approximately uniform,
while it approaches a Jeffreys prior for θ ≫ θ_k.
• Scale parameters which can have positive or negative sign have a signed modified
Jeffreys prior which we define as
$$p(\theta|M, K) \equiv \tfrac{1}{2}\, p_{\mathrm{mJ}}(|\theta|\, |M, K), \tag{3.33}$$
where p_mJ(·) is a modified Jeffreys prior.
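The four prior types of eqs. (3.29) and (3.31) to (3.33) can be written down directly as density functions. The sketch below is illustrative only: the function names are hypothetical, inputs are treated as arrays, and the modified Jeffreys prior is implemented with lower bound a = 0 as in the case described above.

```python
import numpy as np


def uniform_prior(theta, a, b):
    """Uniform density on [a, b], eq. (3.29)."""
    theta = np.atleast_1d(np.asarray(theta, float))
    return np.where((theta >= a) & (theta <= b), 1.0 / (b - a), 0.0)


def jeffreys_prior(theta, a, b):
    """Jeffreys density on [a, b] with 0 < a < b, eq. (3.31)."""
    theta = np.atleast_1d(np.asarray(theta, float))
    p = np.zeros_like(theta)
    inside = (theta >= a) & (theta <= b)
    p[inside] = 1.0 / (theta[inside] * np.log(b / a))
    return p


def modified_jeffreys_prior(theta, b, knee):
    """Modified Jeffreys density on [0, b] with knee theta_k, eq. (3.32)."""
    theta = np.atleast_1d(np.asarray(theta, float))
    p = np.zeros_like(theta)
    inside = (theta >= 0.0) & (theta <= b)
    p[inside] = 1.0 / ((theta[inside] + knee) * np.log((b + knee) / knee))
    return p


def signed_modified_jeffreys_prior(theta, b, knee):
    """Signed modified Jeffreys density on [-b, b], eq. (3.33)."""
    return 0.5 * modified_jeffreys_prior(np.abs(theta), b, knee)
```

Each function integrates to one over its support, which can be checked numerically; the factor 1/2 in the signed variant compensates for the doubled support.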
3.2.3 Posterior sampling of parameters and derived quantities
In order to obtain the normalisation constant Z of the posterior from prior π(θ) and
likelihood L(θ), one would have to integrate the product π(θ)L(θ) over the whole parameter
space (eq. (3.24)). This can be achieved in general by analytic or numerical integration,
but neither is feasible in more than about three dimensions, and the former method may
carry further disadvantages in being approximative or restricted to simple integrands.
To circumvent this obstacle, the Markov chain Monte Carlo (MCMC) method (e.g.
Gilks et al. 1996) allows one to gather samples distributed as the posterior density;⁶ from
these samples, various aspects of the posterior, described below, can then be estimated.
Moreover, MCMC allows the whole parameter space to be explored without applying a regular
grid to all dimensions, which may not be chosen fine enough in more than a few dimensions
to adequately sample the posterior. MCMC and several related techniques are described
in technical detail in section 4.5.
Not only the parameters θ are of interest a posteriori, but various other relevant
quantities can be derived from them (table 2.2). Such derived quantities ϑ = ϑ(θ, θ̃), where
θ̃ are additional parameters such as the epoch tr or stellar mass m? , are implicitly sampled
with the parameters θ according to
$$\vartheta^{(j)} = \vartheta(\theta^{(j)}, \tilde{\theta}). \tag{3.34}$$
For example, the time of periapsis T is sampled as
$$T^{(j)} = t_r - \frac{\chi^{(j)}}{f^{(j)}}. \tag{3.35}$$
These derived-quantity samples can be used analogously to parameter samples for posterior
inference as described below, where a notation pertaining only to parameters is used for
simplicity.
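Implicit sampling of derived quantities amounts to applying the defining formula to each parameter sample. A minimal sketch for the period P and the time of periapsis T from table 2.2 follows, assuming arrays of post-burn-in samples of χ and f; the function name and dictionary layout are illustrative.

```python
import numpy as np


def derived_samples(chi, f, t_ref):
    """Map samples of chi and f to samples of P = 1/f and T = t_r - chi/f, eq. (3.35)."""
    chi, f = np.asarray(chi, float), np.asarray(f, float)
    return {"P": 1.0 / f, "T": t_ref - chi / f}
```

The resulting arrays can be passed to the same summary and density-estimation routines as the parameter samples themselves.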
6 The first M samples belonging to the burn-in phase do not follow the posterior distribution and thus
need to be excluded (section 4.5.2).
3.2.4 Marginalisation and density estimation
As a density over k > 2 dimensions, the posterior P(θ) cannot be displayed unambiguously
in a figure. By reducing the dimensionality of the posterior domain, marginal posteriors P_i(·), i.e. probability densities over each of the parameters θ_i, and joint marginal
posteriors P_i,j(·, ·) over two parameters, can be obtained and plotted. This reduction
is known as marginalisation and described mathematically by integration over all other
parameters,
$$\mathcal{P}_i(\theta_i) \equiv p(\theta_i|D, M, K) = \int \mathcal{P}(\theta)\, d\theta_{\setminus i} \tag{3.36}$$
$$\mathcal{P}_{i,j}(\theta_i, \theta_j) \equiv p(\theta_i, \theta_j|D, M, K) = \int \mathcal{P}(\theta)\, d\theta_{\setminus i,j}, \tag{3.37}$$
where $d\theta_{\setminus i} \equiv \prod_{k \neq i} d\theta_k$ and $d\theta_{\setminus i,j} \equiv \prod_{k \neq i,j} d\theta_k$. In practice, marginal posteriors are
estimated from the collected samples {θ^(j)} by only considering their ith component and
performing a density estimation based on these one-dimensional samples. Joint marginal
posteriors are derived analogously, based on two-dimensional samples of components i and j.
Several density estimators exist for deriving a density from a set of samples. One of
them – the oldest and probably most popular type, known as the histogram – has several
drawbacks: its shape depends on the choice of origin and bin width, and when used on
two-dimensional data, a contour diagram cannot easily be derived from it.
Generalising the histogram to kernel density estimation over one or two dimensions, the
samples can be represented more accurately and unequivocally (Silverman 1986). Below,
we refer only to the simpler one-dimensional case. There, the kernel estimator can be
written as
\[ F(x) \equiv \frac{1}{N \sigma_{\mathrm{ker}}} \sum_{j=M+1}^{N} K\!\left( \frac{x - X^{(j)}}{\sigma_{\mathrm{ker}}} \right), \tag{3.38} \]
where x is a scalar variable, M is the burn-in length (section 4.5.2), N is the total chain length
(i.e. number of samples), K(·) is the kernel, σ_ker is the window width and {X^(j)} are the
underlying samples. As detailed by Silverman (1986), the efficiency of various kernels in
terms of the achievable mean integrated square error is very similar, and therefore the
choice of kernel can be based on other requirements. Since no differentiability is required
for the estimated densities and computational effort plays an important practical role, a
triangular kernel,
\[ K_{\mathrm{tri}}(x) \equiv \max\left(1 - |x|, 0\right), \tag{3.39} \]
was selected for estimating the marginal posteriors and a biweight kernel,
\[ K_{\mathrm{bi}}(x) \equiv \frac{15}{16} \left(1 - x^2\right)^2, \tag{3.40} \]
for the joint marginal posteriors.
The window width is chosen following the recommendations of Silverman (1986),
\[ \sigma_{\mathrm{ker}} \equiv 2.189 \cdot \min\!\left( \sigma_{\mathrm{samp}}, \frac{r_{\mathrm{iq}}}{1.34} \right) (N - M)^{-1/5}, \tag{3.41} \]
where σsamp is the sample standard deviation and riq is the interquartile range of the
samples.
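The one-dimensional estimator of eqs. (3.38), (3.39) and (3.41) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Base implementation: the function names are hypothetical, the input array is assumed to hold only the N − M post-burn-in samples, and the sum is normalised by that number of samples so that the estimate integrates to one.

```python
import numpy as np

def triangular_kernel(u):
    """K_tri(u) = max(1 - |u|, 0), eq. (3.39)."""
    return np.maximum(1.0 - np.abs(u), 0.0)

def window_width(samples):
    """Window width following eq. (3.41) for the post-burn-in samples."""
    samples = np.asarray(samples, dtype=float)
    sigma_samp = np.std(samples, ddof=1)             # sample standard deviation
    q75, q25 = np.percentile(samples, [75.0, 25.0])  # interquartile range
    r_iq = q75 - q25
    return 2.189 * min(sigma_samp, r_iq / 1.34) * samples.size ** (-0.2)

def kernel_density(x, samples, sigma_ker=None):
    """Kernel estimate of the marginal density at the points x."""
    samples = np.asarray(samples, dtype=float)
    if sigma_ker is None:
        sigma_ker = window_width(samples)
    # One row per evaluation point, one column per sample.
    u = (np.atleast_1d(x)[:, None] - samples[None, :]) / sigma_ker
    # Assumption: normalise by the number of samples actually summed over.
    return triangular_kernel(u).sum(axis=1) / (samples.size * sigma_ker)

# Synthetic stand-in for posterior samples of one parameter.
rng = np.random.default_rng(0)
draws = rng.normal(size=2000)
grid = np.linspace(-6.0, 6.0, 2401)
density = kernel_density(grid, draws)
```

Because the triangular kernel has compact support, each sample contributes only within one window width of its position, which keeps the evaluation cheap.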
3.2.5
Parameter estimation
To obtain a single most probable estimate of the parameters, the posterior density P(·)
can be summarised by the posterior mode θ̂ ∈ Θ, i.e. the point where the posterior attains
its maximum value,
θ̂ ≡ arg maxθ P(θ).
(3.42)
This point, also known as the maximum a-posteriori (MAP) parameter estimate, can be
approximated by the MCMC sample with highest posterior density (Gilks et al. 1996),
based on the values of P(θ (j) ) already calculated during sampling. This approximation
neglects the finite spacing between samples.
Alternatively, the following scalar summaries can be inferred from the samples or, in
case of the marginal mode, from the marginal posteriors Pi (θ):
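• mean or expectation θ̄,
\[ \bar{\theta} \equiv \int_{-\infty}^{\infty} \theta\, P_i(\theta)\, \mathrm{d}\theta, \tag{3.43} \]
• median θ̃,
\[ \int_{-\infty}^{\tilde{\theta}} P_i(\theta)\, \mathrm{d}\theta \equiv 0.5, \tag{3.44} \]
• marginal mode θ̌,
\[ \check{\theta} \equiv \arg\max_{\theta} P_i(\theta). \tag{3.45} \]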
3.2.6
Uncertainty estimation
For uncertainty estimation, posterior samples enable highest posterior-density intervals
(HPDIs) to be estimated. For any given C ∈ R with 0 < C < 1, an HPDI I_HPD ≡ [a, b] is
defined as the smallest interval over which the posterior contains a probability C, i.e.
\[ \int_a^b P_i(\theta)\, \mathrm{d}\theta = C \quad \text{s.t.} \quad b - a = \min. \tag{3.46} \]
In contrast to frequentist confidence intervals, HPDIs are generally not symmetric,
meaning that their midpoint does not necessarily correspond to the best estimate. This is
because the marginal posteriors may be asymmetric, including any amount of skew.
It should also be noted that HPDIs are not useful with multimodal posteriors because
several modes cannot be meaningfully summarised by one interval per dimension, nor by a
single best estimate.
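For a unimodal marginal posterior, the definition in eq. (3.46) can be approximated directly from sorted samples by scanning all intervals that contain the required fraction of samples and keeping the shortest one. The sketch below assumes one-dimensional post-burn-in samples; the function name is hypothetical and the approach is a common sample-based approximation, not necessarily the one used in Base.

```python
import numpy as np

def hpdi(samples, content):
    """Shortest interval [a, b] containing a fraction `content` of the samples."""
    if not 0.0 < content < 1.0:
        raise ValueError("content must lie strictly between 0 and 1")
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    m = int(np.ceil(content * n))        # number of samples inside the interval
    widths = x[m - 1:] - x[: n - m + 1]  # widths of all candidate intervals
    k = int(np.argmin(widths))           # index of the shortest candidate
    return x[k], x[k + m - 1]

# Small hand-checkable example with content C = 0.5.
interval = hpdi([0.0, 1.0, 1.25, 1.5, 3.0, 6.0], 0.5)
```

In line with the caveat above, a single interval of this kind is only meaningful when the samples form one dominant mode.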
To quantify linear dependencies between parameters, the a-posteriori Pearson correlation
coefficient,
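\[ r_{\theta_1,\theta_2} \equiv \frac{\mathrm{cov}(\theta_1, \theta_2)}{\sqrt{\mathrm{var}(\theta_1)\, \mathrm{var}(\theta_2)}} = r_{\theta_2,\theta_1}, \tag{3.47} \]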
can be inferred from the samples. There may also be nonlinear correlations between
parameters that are not described by the correlation coefficients. Furthermore, one should
be aware that for strong linear or non-linear relationships between parameters, uncertainties
of single parameters as characterised by HPDIs may not be meaningful.
We stress that the (joint) marginal posteriors can – and should – always be referred to,
especially when best estimates and/or HPDIs do not adequately characterise the posterior.
The availability of these more informative densities is one of the advantages of a Bayesian
approach with posterior sampling.
3.2.7
Model selection
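Bayes’ theorem (section 3.2.1) can also be expressed for model selection based on a hypothesis H
stating that a certain model M is correct. This gives
\[ p(\mathcal{M} | D, \mathcal{K}) = \frac{p(\mathcal{M} | \mathcal{K}) \cdot p(D | \mathcal{M}, \mathcal{K})}{p(D | \mathcal{K})}, \tag{3.48} \]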
where p(M|K) and p(M|D, K) are the model prior and posterior, respectively, p(D|K) is a
normalising constant, while p(D|M, K) is the evidence given equivalently to eq. (3.24) by
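\[ p(D | \mathcal{M}, \mathcal{K}) = \int p(D, \theta | \mathcal{M}, \mathcal{K})\, \mathrm{d}\theta \tag{3.49} \]
\[ = \int \pi(\theta)\, L(\theta)\, \mathrm{d}\theta = Z. \tag{3.50} \]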
When two models M0 , M1 are concurrently entertained, their posterior odds are therefore
given by
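\[ \frac{p(\mathcal{M}_1 | D, \mathcal{K})}{p(\mathcal{M}_0 | D, \mathcal{K})} = \underbrace{\frac{p(\mathcal{M}_1 | \mathcal{K})}{p(\mathcal{M}_0 | \mathcal{K})}}_{\text{prior odds}} \cdot \underbrace{\frac{p(D | \mathcal{M}_1, \mathcal{K})}{p(D | \mathcal{M}_0, \mathcal{K})}}_{\text{Bayes factor}}. \tag{3.51} \]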
Absent any knowledge to the contrary, a fair choice for the prior odds is to set them equal
to 1. In the following, we therefore ignore the prior odds and consider the posterior odds
to be given by the Bayes factor
\[ B_{1,0} \equiv \frac{p(D | \mathcal{M}_1, \mathcal{K})}{p(D | \mathcal{M}_0, \mathcal{K})} = \frac{Z_1}{Z_0}, \tag{3.52} \]
where Z0 and Z1 are the evidences of the two models.
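Since evidences can span many orders of magnitude, eqs. (3.51) and (3.52) are conveniently evaluated from logarithmic evidences. The following sketch assumes that ln Z0 and ln Z1 have already been estimated and that only the two models M0 and M1 are under consideration; the function names are hypothetical.

```python
import math

def bayes_factor(log_z1, log_z0):
    """B_{1,0} = Z_1 / Z_0, eq. (3.52), from logarithmic evidences."""
    return math.exp(log_z1 - log_z0)

def posterior_probability_m1(log_z1, log_z0, prior_odds=1.0):
    """Posterior probability of M_1 when M_0 is the only alternative."""
    odds = prior_odds * bayes_factor(log_z1, log_z0)  # posterior odds, eq. (3.51)
    return odds / (1.0 + odds)

# Equal evidences and prior odds of 1 leave both models equally probable.
p_equal = posterior_probability_m1(-10.0, -10.0)
```

With prior odds of 1, as adopted in the text, the posterior odds coincide with the Bayes factor.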
In contrast to frequentist model selection, Bayesian evidences and thus the Bayes factor
include an Ockham’s razor, which imposes a penalty on any model Mi whose parameter
space (i.e. prior support) is larger than warranted by the likelihood, as can be seen from
the definition of the evidence, eq. (3.50): the evidence, also known as the prior-averaged
likelihood or marginal likelihood, decreases when the prior supports large areas with low
likelihood L(θ), i.e. the models are “unnecessarily complex”. Additionally, the Bayes factor
is a consistent selector, which increasingly supports the correct model in the limit ND → ∞
(Weinberg 2012).
A common method of estimating integrals such as that in eq. (3.50) is known as the
Laplace approximation, where the integrand is approximated as a multivariate normal
distribution. However, this only leads to good results if all significant posterior peaks have
been found and can be represented accurately by Gaussian distributions, which would
be a severe practical limitation for the purposes of this work. By contrast, methods are
described in the following to estimate the integral in eq. (3.50) on the basis of samples
from the prior or posterior densities.
Assume that a set of samples {θ^(j) : j = M + 1, …, N} has been drawn, where θ^(j) ∼
g(θ) and g(·) is a density. The evidence of a particular model can then be written
\[ Z = \int \frac{\pi(\theta) L(\theta)}{g(\theta)}\, g(\theta)\, \mathrm{d}\theta \tag{3.53} \]
and can be approximated by the consistent estimator (Gilks et al. 1996, ch. 10)
\[ \hat{Z} = \frac{1}{N - M} \sum_{j=M+1}^{N} \frac{\pi(\theta^{(j)})\, L(\theta^{(j)})}{g(\theta^{(j)})} \tag{3.54} \]
\[ \equiv \left\| \frac{\pi L}{g} \right\|_g. \tag{3.55} \]
Below, the notation Z ≃ Ẑ denotes the relation between a quantity and its estimator. If
the prior π(·) is chosen as density g, the estimator results as
\[ \hat{Z} = \| L \|_{\pi}. \tag{3.56} \]
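Sampling from the prior, however, can be as inefficient as pure Monte-Carlo sampling.
By contrast, for posterior samples, the density is
\[ g(\theta) \equiv P(\theta) = \frac{\pi(\theta) L(\theta)}{Z}. \tag{3.57} \]
Using
\[ \frac{1}{Z} = \frac{1}{Z} \int \pi(\theta)\, \mathrm{d}\theta = \int \frac{\pi(\theta)}{Z g(\theta)}\, g(\theta)\, \mathrm{d}\theta \tag{3.58} \]
and eq. (3.57) yields
\[ \frac{1}{Z} = \int \frac{1}{L(\theta)}\, g(\theta)\, \mathrm{d}\theta. \tag{3.59} \]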
An estimator for the evidence based on posterior samples is therefore given by
\[ \hat{Z} = \left[ \frac{1}{N - M} \sum_{j=M+1}^{N} \frac{1}{L(\theta^{(j)})} \right]^{-1} \tag{3.60} \]
\[ \equiv \left\| \frac{1}{L} \right\|_P^{-1}. \tag{3.61} \]
This harmonic-mean approximation of the evidence, however, is prone to disturbance
by samples with extremely low likelihood and therefore does not exist in many cases;
convergence criteria are given by Wolpert (2002). Weinberg (2012) presents an algorithm
based on Lebesgue integrals for calculating eq. (3.60) by ignoring samples contributing a
large error. However, this Numerical Lebesgue Algorithm is only stable and recommended
with informative priors.
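To make the structure of eq. (3.60) concrete, the sketch below evaluates the harmonic-mean estimator from logarithmic likelihood values, using a shift by the largest term to avoid numerical overflow. It is shown only to illustrate the formula, with a hypothetical function name; it inherits the instability discussed above and is not the estimator adopted in this work.

```python
import numpy as np

def log_evidence_harmonic_mean(log_like):
    """ln of the estimator in eq. (3.60), given ln L(theta^(j)) after burn-in."""
    neg = -np.asarray(log_like, dtype=float)  # ln(1 / L) for each sample
    shift = neg.max()                         # largest term, factored out
    log_mean_inverse = shift + np.log(np.mean(np.exp(neg - shift)))
    return -log_mean_inverse

# Two samples with L = 1 and L = 2: the harmonic mean of L is 4/3.
log_z = log_evidence_harmonic_mean([0.0, np.log(2.0)])
```

A single sample with very small likelihood dominates the mean of 1/L, which is the sensitivity referred to in the text.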
Here, another algorithm for estimating the evidence Z as given by eq. (3.50) is therefore
employed. Provided by Weinberg (2012), it uses balanced kd-trees for space partitioning.
This Volume Tessellation Algorithm (VTA) is applied to a subspace Θs ⊂ Θ of the parameter
space which should be chosen so as to help minimise bias and variance by excluding
low-posterior regions, e.g. those containing samples θ^(j) with low values of π(θ^(j))L(θ^(j)). The
tree is constructed storing samples only in the leaves, not in inner nodes, which simplifies
subsequent volume calculations.
Due to eq. (3.23), one has
\[ Z \int_{\Theta_s} P(\theta)\, \mathrm{d}\theta = \int_{\Theta_s} \pi(\theta) L(\theta)\, \mathrm{d}\theta, \tag{3.62} \]
and therefore
\[ Z = \frac{\int_{\Theta_s} \pi(\theta) L(\theta)\, \mathrm{d}\theta}{\int_{\Theta_s} P(\theta)\, \mathrm{d}\theta}, \tag{3.63} \]
with the denominator simply given as the fraction of posterior samples in Θs,
\[ \int_{\Theta_s} P(\theta)\, \mathrm{d}\theta = \frac{1}{N - M} \sum_{j=M+1}^{N} 1_{\Theta_s}(\theta^{(j)}), \tag{3.64} \]
where 1·(·) is the indicator function. Given a tessellation of Θs by n_s sub-volumes ω_i, each
containing at most a fixed number c of leaves, the numerator of eq. (3.63) can be estimated
by first assigning each ω_i a representative value P_i of π(·)L(·) (the sample median is chosen
for P_i in the case of balanced kd-trees) and then approximating the integral as a Riemann
sum,
\[ \int_{\Theta_s} \pi(\theta) L(\theta)\, \mathrm{d}\theta \approx \sum_{i=1}^{n_s} \omega_i P_i. \tag{3.65} \]
Thus, one has
\[ Z \simeq (N - M)\, \frac{\sum_{i=1}^{n_s} \omega_i P_i}{\sum_{j=M+1}^{N} 1_{\Theta_s}(\theta^{(j)})}. \tag{3.66} \]
In cases tested by Weinberg (2012) with low-informative priors, the VTA outperformed both the
NLA and the Laplace approximation.
3.2.8
Model-uncertainty prediction and observation scheduling
Posterior samples may also be used to assess the uncertainty in the value of the observable
model(s) at any given instant t. In particular, when t is in the future, such an uncertainty
prediction may help to schedule upcoming observations of the same target with the same
technique(s). In this work, uncertainty prediction is applied only to AMa and radial
velocities, as the Hipparcos instrument is out of operation.
Given posterior samples {θ^(j)}, we define uncertainty prediction for a given type of
data to consist in the following algorithm:
1. Define the set of times {t_i} of interest;
2. for each t_i ∈ {t_i} do:
(a) for each θ^(j) ∈ {θ^(j)}, calculate the value of the model function f_{i,j} ≡ f(t_i; θ^(j));
(b) calculate a measure of dispersion ς_i of the values {f_{i,j}}, i.e.:
• the variance or standard deviation, for one-dimensional model functions f(·; ·),
or
• the semi-major axes a_i and semi-minor axes b_i of the uncertainty ellipse
which is defined by the covariance matrix S of the two vector components
of {f_{i,j}}, for two-dimensional model functions; equivalently, a_i and b_i are
given by the eigenvalues of S,
\[ a_i = \frac{\mathrm{tr}\, S}{2} + \sqrt{\left( \frac{\mathrm{tr}\, S}{2} \right)^2 - |S|} \tag{3.67} \]
\[ b_i = \frac{\mathrm{tr}\, S}{2} - \sqrt{\left( \frac{\mathrm{tr}\, S}{2} \right)^2 - |S|}. \tag{3.68} \]
The dispersion measures {ςi } can then be plotted over time, followed by locating their
extrema. For observation scheduling, re-observation can be recommended at the time tmax
of maximum dispersion of the predicted model-function values, i.e. maximum uncertainty
in the observable, yielding a maximal constraint of the observed stellar motion.
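The uncertainty-prediction algorithm above can be sketched in Python as follows. The sketch assumes a user-supplied model function `model(t, theta)` and a sequence of posterior samples; all function names are hypothetical, and the two-dimensional case simply evaluates eqs. (3.67) and (3.68) for the sample covariance matrix.

```python
import numpy as np

def dispersion_1d(model, times, thetas):
    """Standard deviation of f(t_i; theta^(j)) over the samples, for each t_i."""
    values = np.array([[model(t, th) for th in thetas] for t in times])
    return values.std(axis=1, ddof=1)

def ellipse_axes(points):
    """a_i and b_i of eqs. (3.67)-(3.68) from an (n_samples, 2) array."""
    S = np.cov(points, rowvar=False)  # 2x2 sample covariance matrix
    half_trace = 0.5 * np.trace(S)
    # Clip tiny negative values caused by rounding before taking the root.
    root = np.sqrt(max(half_trace ** 2 - np.linalg.det(S), 0.0))
    return half_trace + root, half_trace - root

def time_of_max_dispersion(times, dispersion):
    """Candidate re-observation time t_max with the largest dispersion."""
    return times[int(np.argmax(dispersion))]

# Toy one-dimensional model whose spread grows linearly with time.
times = [0.0, 1.0, 2.0]
spread = dispersion_1d(lambda t, th: th * t, times, thetas=[1.0, 2.0, 3.0])
t_max = time_of_max_dispersion(times, spread)
```

In practice the grid of times would cover the period in which new observations are feasible, so that the maximum found is a usable scheduling recommendation.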
Alternatively, observation scheduling may be based on the premise that re-observation
should yield the maximum gain in information on the parameters. This approach is called
maximum-entropy sampling (e.g. Loredo 2004; Ford 2008) and employs the (negative)
Shannon entropy as a measure for the information on a parameter. In the case of normally
distributed predictions f (t; ·) with t given, this is equivalent to uncertainty prediction as
described above.
Chapter 4
Informatics and implementation
A tool for Bayesian exoplanet science
This chapter introduces Base, a Bayesian astrometric and spectroscopic exoplanet detection and characterisation tool. Its goals are to fulfil two major tasks of exoplanet science,
namely the detection of exoplanets and the characterisation of their orbits, by implementing
methods of Bayesian statistics detailed in chapter 3.
Base has been developed to provide for the first time the possibility of an integrated
Bayesian analysis of stellar astrometric and Doppler-spectroscopic measurements with respect to their companions’ signals, correctly treating particularly the correlated astrometric
measurement uncertainties (section 2.4) and allowing one to explore the whole multidimensional
parameter space Θ without the need for informative prior constraints. Still, users may
readily incorporate prior knowledge, e.g. from previous analyses with other tools, by means
of priors on the model parameters. The tool automatically diagnoses convergence of its
Markov chain Monte Carlo (MCMC) sampler to the posterior and regularly outputs status
information.
For orbit characterisation, Base performs a complete Bayesian parameter and uncertainty estimation, delivering important results including probability densities and
correlations of model parameters and several derived quantities. As opposed to a single
best estimate and confidence interval per parameter, this is especially important when the
data do not constrain the parameters well, e.g. when only few data have been recorded
or the signal-to-noise ratio is low (as can be the case for lightweight planets or young
host stars). Another important function which Base has been built to include is Bayesian
model selection, performed if the user allows concurrent models with different numbers of
planets.
Base comes in the form of a highly configurable command-line tool, developed in
Fortran 2008 and compiled with GFortran (Free Software Foundation 2011a). This chapter
details the implementation and modes of using Base.
4.1
Requirements
Base has been developed according to the following requirements:
• Accuracy:
◦ the user should be able to reflect their prior knowledge, where present, as
accurately as possible in the analysis;
◦ calculations should be carried out to an accuracy sufficient for the respective
task;1
◦ output should include all significant digits and not be trimmed.
• Flexibility:
◦ all relevant aspects of how the analysis is carried out should be adjustable by
the user;
◦ Base should make sensible assumptions where user demand is missing.
• Usability:
◦ the modes of supplying information to Base and controlling its behaviour should
be as simple and sensible as possible;
◦ the user should be able to control the time taken by flexible-duration tasks (such
as sampling) and should be provided with an estimate of the remaining runtime
for a given task wherever possible;
◦ output should be easily comprehensible (including explanations where needed);
◦ the user should be warned when Base acts in an unexpected way;
◦ Base should run automatically, without the need for user supervision or interaction after the start,2 while providing the user with relevant up-to-date status
information.
4.2
Other software and features of BASE
Development of Base commenced before this work (Schulze-Hartung 2008), leading to
an early version with only some of today’s capabilities of the tool. In the course of this
dissertation, the program has been significantly extended to include, among other aspects,
treatment of AM data, modelling of binaries and multi-planet systems, as well as model
selection.
Table 4.1 lists the most essential features of Base as well as their availability both in
Base before the present work and in three other computer programs introduced recently
in the literature. Although other tools for Bayesian inference using MCMC with a more
general applicability exist – such as those by Weinberg and Moss (2011) or Foreman-Mackey
et al. (2012) – only programs that specialise in exoplanet science, specifically allowing
AMa and RV data to be readily analysed, are included in this overview. The table demonstrates
that Base includes considerable new and important functionality.
1 Therefore, most calculations operate on double-precision (64-bit) numbers, but quadruple precision
(128-bit) is employed where essential.
2 This is especially important considering the sometimes hour-long durations of Base runs.
Table 4.1: Essential features of Base and their presence in other existing tools specialised
in exoplanet science. Where fields are blank, no clear indication of the feature being
available could be found in the cited article.
Category    Base feature                                                   Section
Data        AMa data                                                       4.4.2
            RV data                                                        4.4.2
            AMa+RV data                                                    4.4.2
            AMh (+RV) data                                                 4.4.2
            User-defined grouping of data                                  4.4.2
            Correct treatment of AMa uncertainties                         2.4
Models      Binary stars (two RV amplitudes)                               2.2.5
            Multiple planets                                               2.1.4
            User-defined epoch                                             4.4.1
            Special treatment of cyclic parameters                         2.1.1
            Additional noise                                               2.4
Priors      Prior information                                              4.4.3
            Arbitrary prior densities                                      4.4.3
Sampling    MCMC                                                           3.2.3
            Hit-and-run sampler                                            4.5.2
            Thinning                                                       4.5.2
            Parallel tempering                                             4.5.3
            Automatic convergence diagnostic                               4.5.4
            Saving and reading samples                                     4.6.2
Inference   Derived quantities                                             3.2.3
            Marginal posteriors by kernel density estimation               3.2.4
            Refinement of kernel window width (for parameter f)            4.3.3
            User-defined HPDIs, posterior-probability intervals
            and hypercubes                                                 4.4.1
            Joint marginal-posterior densities                             3.2.4
            Model selection                                                3.2.7
            Uncertainty prediction                                         3.2.8
Other       Generating data                                                4.6.3
            Regular status updates during tasks                            4.1
            Gnuplot interface                                              4.4.1
            Producing plots with LaTeX formulae                            4.4.1
[Availability columns: Base before this work; Tuomi et al. (2009); Gregory (2011); Anglada-Escudé et al. (2012). The × marks of these columns cannot be assigned to rows in this copy.]
4.3
Modes of operation
Base can operate in several kinds of modes, which correspond to the assumption of certain
physical systems or to the type of inference to be made. The modes are explained as
follows.
4.3.1
Normal and binary mode
Base is capable of analysing data for two similar types of systems, whose physics have
been described in section 2.2. These are systems with:
1. one observable component that may be accompanied by one or more unobserved
bodies (generally referred to as planetary systems here for simplicity); this mode of
operation is called the normal mode;
2. two observable, gravitationally bound components (referred to as binary stars), one
of which may serve as reference point for observing the other; this mode of operation
is called binary mode.
Base automatically selects normal or binary mode depending on the given data as
described in section 4.4.2.
4.3.2
Number-of-planets modes
In normal mode, any combination of up to ten numbers of planets {np,i } can be selected
by the user (section 4.4.1). Each number of planets np,i ∈ [0, 9] corresponds to a separate np mode, including sampling and analysis, carried out by Base. Each such mode
includes the assumption of a corresponding model and parameters which can be configured
independently using the –scope option (section 4.4.1).
In the case of several np modes, a Bayesian model selection is made after the last mode
has been finished. Giving the same number of planets several times allows the
posterior probability of the same model to be compared under different conditions, e.g. with or without
additional noise (section 2.4).
4.3.3
Periodogram mode
Using the option –periodogram, Base can be put in periodogram mode, which modifies
binary or 1-planet mode by restricting inference to the orbital frequency f . Its marginal
posterior is estimated using a refined kernel window width (section 3.2.4), resulting in narrower
peaks and, in general, an altered marginal mode. Posterior summaries calculated from
the samples directly are not modified in periodogram mode. The algorithm used for
window-width refinement is detailed in section 4.6.1.
4.4
Input
Users can provide different kinds of information to Base, i.e. a number of configuration
options, data, and prior knowledge. These various types of user input are described below.
Table 4.2: Option notation.
Notation                 Meaning
⟦⟨arg⟩⟧                  Optional argument ⟨arg⟩
⟦⟨arg⟩|⟨arg⟩|…⟧          List of alternative arguments
⟨arg⟩⟦,⟨arg⟩⟦…⟧⟧         One or more instances of ⟨arg⟩ separated by ,
…                        Repetition of expression from and including the previous | or ,
⟦…⟧                      Recursive insertion of enclosing optional expression
[]                       Square brackets enclosing arguments or intervals, to be put literally
4.4.1
Invocation and options
According to the requirement of running without the need for user interaction, options
supplied in starting the program control the behaviour of Base and allow the user to input
information such as the stellar mass or prior knowledge (see section 3.2.1). In the following,
options are referred to using a notation explained in table 4.2. Any option can be supplied:
1. in a configuration file3 base.conf in the working directory and/or
2. on the command line.
The configuration file is parsed first, followed by the command line, with command-line
options overriding those in the configuration file. Besides options, arguments on the
command line may be filenames,4 where each file must contain a data set in one of the
formats described in section 4.4.2. As opposed to filenames, options are preceded by the
characters – on the command line to distinguish them from file names. No data-file names
may be supplied in the configuration file and the – prefix is omitted in options. Below,
options are always quoted with the prefix for clarity.
Options may be classified according to their syntax as follows:
1. commands:
(a) –option
(b) –option[⟨arg⟩]
2. assignments:
(a) –option=⟨rhs⟩
(b) –option[⟨arg⟩]=⟨rhs⟩
In assignments, depending on the option, ⟨rhs⟩ may denote:
1. a single value ⟨val⟩
2. a list of values ⟨val⟩⟦,⟨val⟩⟦…⟧⟧
3. an interval [⟨lbound⟩,⟨ubound⟩].
Using the options described in the following, Base may be invoked by:
base ⟦⟨options⟩⟧ ⟨filename⟩ ⟦⟨options⟩|⟨filename⟩ ⟦…⟧⟧
3 Only the –group option, which refers to a specific given data set, cannot be employed in the configuration
file.
4 By default, filenames are relative to the data directory (option –data-dir); if the data directory is not
set, filenames refer to the working directory or may be absolute paths to a file.
Global and local options. Many options are scopable, i.e. it is possible to set their
scope either to all np modes in a given Base run (which is the default) or only to a specific
mode. Instances of such options are called global in the former case and local in the latter.
A scope referring to a given np mode can be started with the –scope option, with all
succeeding options before the next –scope becoming local to that mode. All options given
before the first –scope option are global. If in a given np mode an option is set both
globally and locally, the local option takes precedence.
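The precedence rule for scopable options can be pictured as a two-level lookup. The following sketch is purely illustrative: the dictionaries, the function name and the option values are hypothetical and do not reflect how Base stores its configuration internally.

```python
def effective_options(global_opts, local_opts, n_planets):
    """Options in force for one np mode: local settings override global ones."""
    merged = dict(global_opts)                    # start from the global scope
    merged.update(local_opts.get(n_planets, {}))  # local scope takes precedence
    return merged

# Hypothetical settings: one global value, overridden only for the 2-planet mode.
global_opts = {"n-pt": 2, "inference-len-frac": 0.5}
local_opts = {2: {"n-pt": 4}}

opts_one_planet = effective_options(global_opts, local_opts, 1)
opts_two_planets = effective_options(global_opts, local_opts, 2)
```

Modes without a local scope simply inherit every global setting unchanged.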
Available options.
The following options are available to control the behaviour of Base:
–calc-hpdi=⟨val⟩⟦,⟨val⟩⟦…⟧⟧ Set the probability contents of additional HPDI(s) I_HPD
(section 3.2.6) to be calculated for all parameters and derived quantities. Base automatically calculates HPDIs of probability contents 50%, 68.27%, 95%, 95.45%, 99%,
and 99.73%.
Arguments: ⟨val⟩: probability content C (section 3.2.6). Default: none. Restrictions: Not combinable with –no-inference. Cannot be supplied more than once.
–calc-prob[⟨par⟩]=[⟨lbound⟩,⟨ubound⟩] Calculate the posterior probability of a
parameter θ to lie within a given interval I,
\[ p(\theta \in I | D, \mathcal{M}, \mathcal{K}) \simeq \frac{1}{N - M} \sum_{j=M+1}^{N} 1_I(\theta^{(j)}). \tag{4.1} \]
Arguments: ⟨par⟩: name of θ (table 4.3); ⟨lbound⟩, ⟨ubound⟩: lower and upper bounds
of I. Default: none. Restrictions: Not combinable with –no-inference.
–calc-prob-all-pars=⟦⟨bounds⟩|,⟧,⟦⟨bounds⟩|,⟧… Calculate the posterior probability
of parameters θ to lie within the given hypercube Θsub ⊂ Θ,
\[ p(\theta \in \Theta_{\mathrm{sub}} | D, \mathcal{M}, \mathcal{K}) \simeq \frac{1}{N - M} \sum_{j=M+1}^{N} 1_{\Theta_{\mathrm{sub}}}(\theta^{(j)}). \tag{4.2} \]
Arguments: ⟨bounds⟩ ::= ⟨lbound⟩,⟨ubound⟩: pair of lower and upper bounds of Θsub
in a given dimension; replacing any ⟨bounds⟩ by , implies that the corresponding
parameter is marginalised. Default: none. Restrictions: Not combinable with
–no-inference.
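The estimators in eqs. (4.1) and (4.2) amount to counting the fraction of post-burn-in samples inside the given region. A minimal Python sketch, assuming the samples are stored as an array with one row per sample and one column per parameter, and treating the bounds as inclusive; the function name and the use of `None` for a marginalised parameter are illustrative choices, not Base syntax.

```python
import numpy as np

def hypercube_probability(samples, bounds):
    """Fraction of samples inside the hypercube; None marks a marginalised dimension."""
    samples = np.asarray(samples, dtype=float)  # shape (n_samples, k)
    inside = np.ones(samples.shape[0], dtype=bool)
    for dim, limits in enumerate(bounds):
        if limits is None:                      # parameter is marginalised
            continue
        lower, upper = limits
        inside &= (samples[:, dim] >= lower) & (samples[:, dim] <= upper)
    return inside.mean()

# Four two-parameter samples; the first call constrains only the first parameter.
demo = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
p_interval = hypercube_probability(demo, [(0.5, 2.5), None])
p_cube = hypercube_probability(demo, [(0.5, 2.5), (1.5, 3.5)])
```

Constraining a single dimension and marginalising all others reproduces the interval probability of eq. (4.1).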
–calc-transit-after=⟨time⟩ Set the time after which the first potential transit
times t_{t,1}, t_{t,2} of a planet are to be calculated (table 2.2).
Arguments: ⟨time⟩: time [d] after which transit times are to be calculated. Default:
last observing time. Restrictions: Not combinable with –no-inference. Cannot
be supplied more than once.
–comment=⟨text⟩ Include a comment in the log.
Arguments: ⟨text⟩: comment to be included. Default: none. Restrictions: Cannot
be supplied more than once.
–data-dir=⟨dir⟩ Set data directory.
Arguments: ⟨dir⟩: path to the data directory (relative to working directory, or absolute). Default: (empty) (interpreted relative to working directory). Restrictions:
Cannot be supplied more than once.
–discard-frac=⟨val⟩ Set maximum runtime fraction during which to initially
discard samples before starting burn-in (section 4.5.2).
Arguments: ⟨val⟩: fraction of runtime. Default: 0.1. Restrictions: Not combinable
with –read-samples. A potential adjustment of tempering parameters occurs after
5 minutes at the latest, immediately followed by the start of burn-in, regardless of
this option.
–discard-psr-scale=⟨val⟩ Set the factor ρ_p by which the difference $\hat{R}^{1/2} - R_*^{1/2}$
has to decrease for all parameters, i.e.
\[ \frac{\hat{R}_i^{1/2} - R_*^{1/2}}{\hat{R}_{\max}^{1/2} - R_*^{1/2}} \le \rho_p \quad \forall i \in [1, k] \tag{4.3} \]
before the initial discarding of samples is stopped and burn-in is started (section 4.5.2),
where $\hat{R}^{1/2}$ is the PSR of a parameter, $\hat{R}_{\max}^{1/2}$ is the maximum PSR attained initially
by any parameter and $R_*^{1/2}$ is the threshold for convergence (option –psr-conv).
Arguments: ⟨val⟩: scale factor ρ_p. Default: 0.9. Restrictions: Not combinable with
–read-samples. Cannot be supplied more than once. May be overridden by the
maximum runtime before adjusting tempering parameters and/or starting burn-in
(option –discard-frac).
–epoch=⟨time⟩ Set the epoch (section 2.1.1).
Arguments: ⟨time⟩: epoch t_r. Default: centre of observing time span, (t_1 + t_N)/2. Restrictions: Not combinable with –read-samples. Cannot be supplied more than once.
–generate-data[⟦ang_pos|RV⟧,⟦⟨commands⟩,⟧⟦⟨# planets⟩,⟧⟨pars⟩,⟦⟨mass⟩,⟧
⟨# data⟩,⟨begin⟩,⟨end⟩,⟨uncertainty⟩⟦,⟨uncertainty⟩⟦…⟧⟧] Generate AMa or RV
data according to the given arguments (section 4.6.3).
Arguments: ⟨commands⟩ ::= ⟨command⟩⟦,⟨command⟩⟦…⟧⟧: data command(s) (table 4.13); ⟨# planets⟩: number of planets n_p (unless in binary mode); ⟨pars⟩: parameters θ (components separated by ,); ⟨mass⟩: stellar mass (only for notation in
data-file header, omitted in binary mode); ⟨# data⟩: number of data; ⟨begin⟩, ⟨end⟩:
times [d] of first and last observation; ⟨uncertainty⟩ ::= ⟦⟨major⟩,⟨minor⟩|⟨stdev⟩⟧:
uncertainty ellipse (a, b) or standard deviation σ, for AMa and RV data respectively
(section 4.6.3). Restrictions: Not combinable with –read-samples, –no-inference.
Cannot be supplied more than once. If observing times are specified by the data
command $time, they must comply with both ⟨# data⟩ and the number of ⟨uncertainty⟩
arguments; in this case, the values of ⟨begin⟩ and ⟨end⟩ are ignored.
–group[⟨thresholds⟩] Arrange the data from the preceding file in groups according
to the given thresholds, then replace each group by a representative synthetic datum
(section 4.4.2).
Arguments: ⟨thresholds⟩ ::= ⟦⟨time⟩,⟨phi⟩|⟨time⟩,⟨psi⟩,⟨gamma⟩|⟨time⟩⟧: thresholds
for AMa, AMh and RV data, respectively, as defined in section 4.4.2. Restrictions:
Not scopable. Cannot be used in the configuration file.
–help
Display a help screen.
Restrictions: Not combinable with –read-samples, –save-samples, –no-inference.
Not scopable.
–help-pars
Display information on parameters and derived quantities.
Restrictions: Not combinable with –read-samples, –save-samples, –no-inference.
Not scopable.
–inference-len-frac=⟨val⟩ Set the fraction of the number of links used for inference to the total cold-chain length, ρ_i ≡ (N − M) N^{−1}, where N is the total length
and M is the burn-in length (section 4.5.2).
Arguments: ⟨val⟩: fraction ρ_i. Default: 0.5.
–inference-len=⟦⟨val⟩|max⟧ Set the minimal number of links used for inference,
or maximise that number, with the latter implying that Base will not stop sampling
before either the maximum chain length is reached or both maximum runtime is
reached and burn-in is finished (section 4.5.2).
Arguments: ⟨val⟩: minimum number of links used for inference, (N − M)_min. Default:
⟨val⟩ = 0. Restrictions: Not combinable with –read-samples. Cannot be supplied
more than once. Independent of this option, inference is always based on at least 10^3
samples.
–K_max-from-vel Determine prior upper bound of K from RV range and minimum
eccentricity as
\[ K_{\max} = \frac{\widetilde{\Delta v} \cdot \sqrt{1 - e_{\min}^2}}{2}, \tag{4.4} \]
according to eq. (2.60) and (2.76) (section 2.3).
Restrictions: Not combinable with –read-samples, –msini_max. Cannot be supplied
more than once. Ignored if no RV data given.
–keep-seed Reuse the random seed from file .BASE.randomSeed in the working
directory, where a newly generated seed is stored.
Restrictions: Not combinable with –read-samples. Cannot be supplied more than
once.
–M=⟨val⟩ Set the stellar mass. Needs to be specified by this option or in the AMa/RV
data file(s) if derived quantities are to be calculated.
Arguments: ⟨val⟩: stellar mass m_⋆ [M_⊙]. Default: none. Restrictions: Cannot be
supplied more than once.
–mail-when-finished=⟨address⟩ Send an email to the specified address when
Base has finished.
Arguments: ⟨address⟩: email address. Default: none. Restrictions: Cannot be
supplied more than once.
–max-dur=⟨val⟩⟨unit⟩ Set the maximum runtime to be taken by the Multi-PT
procedure. If this option is not given, –max-len must be given.
Arguments: ⟨val⟩: value of maximum runtime in the given unit; ⟨unit⟩ ::= ⟦h|m|s⟧:
time unit. Default: none. Restrictions: Not combinable with –read-samples.
May be overridden by –max-len in case of conflict.
–max-len=⟨val⟩ Set the maximum length of the final cold chain. If this option
is not given, –max-dur must be given. Takes precedence over –max-dur in case of
conflict.
Arguments: ⟨val⟩: chain length. Default: none. Restrictions: Not combinable with
–read-samples.
–msini_max=⟨val⟩ Set the maximum value of m_p sin i, from which the prior upper
bound of K is calculated, according to eq. (2.62), as
\[ \sqrt[3]{\frac{2 \pi G f_{\max}}{m_\star^2}}\; (m_p \sin i)_{\max}. \tag{4.5} \]
Arguments: ⟨val⟩: value of (m_p sin i)_max [M_J]. Default: 13.5 Restrictions: Not
combinable with –read-samples, –K_max-from-vel. Cannot be supplied more than
once. Ignored if no RV data given.
5 This corresponds to the Deuterium-burning limit for Solar-metallicity substellar objects.
–n-periods-range=[⟨lower⟩,⟨upper⟩] Set assumed range of the number of orbital
periods [n_min, n_max] covered by any data file, translated to a prior range on f as
\[ f \in \left[ \frac{n_{\min}}{\Delta t_{\min}}, \frac{n_{\max}}{\Delta t_{\max}} \right], \tag{4.6} \]
where ∆t_min, ∆t_max are the shortest and longest time spans covered by any data set,
respectively.
Arguments: ⟨lower⟩, ⟨upper⟩: lowest and highest numbers of orbital periods. Default:
none. Restrictions: Not combinable with –read-samples.
–n-pl=⟨val⟩⟦,⟨val⟩⟦…⟧⟧ Set number(s) of planets to be assumed, thereby defining
a set of n_m different np modes to be carried out (section 4.3.2).
Arguments: ⟨val⟩: number of planets n_{p,i}. Default: n_{p,1} = 1, n_m = 1. Restrictions:
Cannot be supplied more than once.
–n-pt=⟨val⟩ Set number of PT procedures m to be run by Multi-PT (section 4.5.4).
Setting m ≡ 1 reduces Multi-PT to PT (section 4.5.3).
Arguments: ⟨val⟩: number of PT procedures m. Default: 2. Restrictions: Not
combinable with –read-samples. Cannot be supplied more than once.
–no-deriv
Disable calculation of derived quantities (section 3.2.3).
–no-files
Disable any file output.
Restrictions: Not combinable with –save-samples. Cannot be supplied more than
once.
–no-gnuplot-exec
Don’t execute Gnuplot after finishing.
Restrictions: Cannot be supplied more than once.
–no-inference
Disable inference, i.e. only collect posterior samples.
Restrictions: Cannot be supplied more than once.
–no-joint-marg-post
Don’t produce joint marginal posteriors.
–no-marg-post Don’t produce marginal posteriors. This also prevents the calculation of marginal-posterior modes.
Restrictions: Not combinable with –periodogram. Cannot be supplied more than
once.
–no-plots Don’t produce any files related to plotting. This also prevents the
calculation of marginal-posterior modes.
Restrictions: Cannot be supplied more than once.
–no-ps4pdf-exec Disable execution of the program ps4pdf (Niepraschk and Voß
2001), which serves to convert PSTricks-compatible plots produced by Gnuplot to
the PDF format if –pstricks-plots is used.
Restrictions: Cannot be supplied more than once. Can only be used with –pstricks-plots.
–no-temp-pars-adj Disable adjustment of tempering parameters. Otherwise, tempering parameters are adjusted so as to optimise the “round-trips” of samples in the
space of tempering parameter γ and thus improve the efficiency of PT (section 4.5.3)
by a variant of the feedback-optimized PT algorithm (Katzgraber et al. 2006). Since
the efficiency improvement has not been quantified or closely studied with Base,
using the present option is recommended.
Restrictions: Not combinable with –read-samples.
–only-plot-data Only plot the given data and exit. If combined with –group,
data are grouped first (section 4.4.2).
Restrictions: Not combinable with –read-samples, –save-samples. Cannot be
supplied more than once.
–only-speed-meas
Only perform a measurement of sampling speed and exit.
Restrictions: Not combinable with –read-samples, –save-samples. Cannot be
supplied more than once.
–out-dir=⟨output dir⟩
Set output directory.
Default: ../out (relative to working directory). Restrictions: Cannot be supplied
more than once.
–out-dir-csv=⟨CSV output dir⟩
Set output directory for CSV plot files.
Default: ⟨output dir⟩/csv. Restrictions: Cannot be supplied more than once.
–periodogram Restrict inference to orbital frequency f , producing a marginal
posterior with refined kernel window width (section 4.6.1).
Restrictions: Not combinable with –no-inference, –no-marg-post, –quick-plots.
Cannot be supplied more than once.
–predict=[⟨begin⟩,⟨end⟩] Predict uncertainties of observables corresponding to
given data sets in the given time span.
Arguments: ⟨begin⟩, ⟨end⟩: time [d] of beginning and end of prediction time span.
Default: none. Restrictions: Not combinable with –no-inference. Cannot be
supplied more than once.
–prior[⟨par⟩]=⟨file⟩ Set the prior probability density of a parameter θ from a CSV
file, implying both its prior shape and support; the latter can be additionally clipped
using –range (section 4.4.3).
Arguments: ⟨par⟩: name of θ (table 4.3); ⟨file⟩: path (relative to working directory, or
absolute) to a CSV file containing samples of the prior density (table 4.8). Default:
none. Restrictions: Not combinable with –read-samples.
–psr-conv=⟨val⟩
Set PSR threshold for convergence (section 4.5.4).
Arguments: ⟨val⟩: PSR threshold R∗^(1/2). Default: 1.1. Restrictions: Not combinable
with –read-samples. Cannot be supplied more than once.
–pstricks-plots Produce PSTricks-compatible plots, i.e. .tex files that can be
included in a LATEX document. Implies –single-plots.
Restrictions: Cannot be supplied more than once.
–pt-n-chains=⟨val⟩ Set number of chains n run by each PT procedure (section 4.5.3). Setting n ≡ 1 reduces PT to pure MCMC sampling (section 4.5.2).
Arguments: ⟨val⟩: number of chains n. Default: 8. Restrictions: Not combinable
with –read-samples. Cannot be supplied more than once.
–quick-plots Produce plots of reduced resolution, e.g. reducing the number of
points at which marginal posteriors are sampled from 500 to 100.
Restrictions: Not combinable with –periodogram. Cannot be supplied more than
once.
–quiet
Suppress all messages to the standard output.
Restrictions: Cannot be supplied more than once. Overridden by –help.
–range[⟨par⟩]=⟦⟨type⟩:⟧[⟨lower⟩,⟨upper⟩] Set the prior range (and type) of a parameter θ.
Arguments: ⟨par⟩: name of θ (table 4.3); ⟨lower⟩: the lower prior bound θmin; ⟨upper⟩:
the upper prior bound θmax; ⟨type⟩ ::= ⟦u|j|m|s|n[⟨n⟩]⟧: the prior type (uniform,
Jeffreys, modified Jeffreys, signed modified Jeffreys or truncated normal, respectively);
⟨n⟩: ratio of (θmax − θmin)/2 to the standard deviation for a truncated normal prior
(whose distribution mean is set to (θmin + θmax)/2). Restrictions: Not combinable with
–read-samples.
–read-samples=⟨file⟩ Read MCMC samples from a saved-samples file (section 4.6.2)
instead of drawing them. Only the samples of the requested n-planets mode (–n-pl)
are used. The original data and potential –group option(s) must be retained. Takes
precedence over –save-samples.
Restrictions: Cannot be supplied more than once.
–save-cand
Save the sampled candidates (section 4.5.2) in a CSV file.
Restrictions: Not combinable with –read-samples. Cannot be supplied more than
once.
–save-samples
Save the final cold-chain samples (section 4.6.2).
Restrictions: Cannot be supplied more than once. Ignored when used with –read-samples.
–scatter-plots Produce scatter plots, i.e. plots of pairs of components of the
posterior samples.
Restrictions: Not combinable with –no-inference.
–scope=⟨i⟩ Begin a new scope, which makes the following options refer to the
indicated np mode.
Arguments: ⟨i⟩: the position i of the intended np mode in the set {np,i} given with
the option –n-pl.
–single-plots
Output one graphic file per plot, where possible.
Restrictions: Cannot be supplied more than once.
–speed-meas-dur=⟨val⟩⟨unit⟩
Set maximum duration of the speed measurement.
Arguments: ⟨val⟩: value of maximum duration in the given unit; ⟨unit⟩ ::= ⟦h|m|s⟧:
time unit.
Restrictions: Not combinable with –read-samples.
–stdout-line-len=⟨val⟩
Set the line length on standard output.
Arguments: ⟨val⟩: line length [characters]. Restrictions: Cannot be supplied more
than once.
–swap=⟨val⟩
Set mean spacing between swap proposals (section 4.5.3).
Arguments: ⟨val⟩: mean swap spacing nswap. Default: 100. Restrictions: Not
combinable with –read-samples.
–temp-pars=⟨val⟩⟦,⟨val⟩⟦...⟧⟧
Set the (initial) tempering parameters of the heated chains (section 4.5.3).
Arguments: ⟨val⟩: tempering parameter γ(k). Restrictions: Not combinable with
–read-samples. Cannot be supplied more than once.
–temp-pars-adj-dur=⟨val⟩⟨unit⟩
Set maximum duration of tempering-parameters adjustment.
Restrictions: Not combinable with –read-samples, –no-temp-pars-adj.
–thin=⟦⟨val⟩|auto⟧ Set thinning stride q, or have it determined automatically
from maximum runtime (–max-dur) and maximum chain length (–max-len) such
that both are approximately reached (section 4.5.2). q = 1 implies that thinning is
deactivated. When combined with –read-samples, the posterior samples are thinned
after reading them in.
Arguments: ⟨val⟩: thinning stride q. Restrictions: Cannot be supplied more than
once. Automatic thinning cannot be performed with –read-samples.
–value[⟨par⟩]=⟨val⟩ Set parameter θ to a fixed value c, equivalent to a delta-function prior p(θ|M, K) = δ(θ − c) (section 4.4.3).
Arguments: ⟨par⟩: name of θ (table 4.3); ⟨val⟩: the parameter value c. Restrictions:
Not combinable with –read-samples.
–zoom-all-pars=⟦⟨bounds⟩|,⟧,⟦⟨bounds⟩|,⟧...
Set the ranges of “zoomed” marginal posteriors and joint marginal posteriors for all
parameters and only make zoomed variants of these plots. Marginal posteriors of derived
quantities are zoomed accordingly. Omitting any ⟨bounds⟩ implies that the prior range
will be used for the corresponding parameter.
Arguments: ⟨bounds⟩ ::= ⟨lower⟩,⟨upper⟩: pair of lower and upper bounds in a given
dimension. Restrictions: Not combinable with –no-inference. Cannot be supplied
more than once.

Table 4.3: Parameter names in Base.
Symbol  Name      Symbol  Name      Symbol  Name
V       V         µδ      mu_d      i       i
aᵥ      a_v       τ₊      tau_+     Ω       Omega
σ₊      sigma_+   e       e         K       K
ϖ       pi        f       f         K₁      K_1
αᵣ      alpha_r   χ       chi       K₂      K_2
δᵣ      delta_r   ω       omega     a′rel   a`_rel
µα∗     mu_acd    ω₂      omega_2   a′      a`
4.4.2 Data and selection of mode
Observational data of the following types can be treated by Base:
• AM data:
◦ AMa : relative angular position of the secondary with respect to the primary
binary component, given in a cartesian or polar coordinate system in a CSV
file6 with one record per line and each record comprising the fields described in
table 4.4 or table 4.5;
◦ AMh : Hipparcos intermediate astrometric data (abscissa residuals), given in a
binary file as described in van Leeuwen (2007, section G.2.2).
• RV data in a CSV file with one record per line and each record comprising the fields
described in table 4.6.
Base automatically selects binary mode if and only if for all of the given types of data:
• all data of the type refer to the secondary measured with respect to the primary
component (implemented for AMa data) or
• data of both binary components with respect to an external, quasi-inertial reference
point are given (RV data).
6 In the CSV files read by Base, record fields should be separated by one or more spaces.

Table 4.4: AMa data (cartesian coordinate system): record fields.
No.  Quantity  Description                                                  Unit
1    t         Time                                                         d
2    α cos δ   Right ascension                                              mas
3    δ         Declination                                                  mas
4    a         Semi-major axis of uncertainty ellipse                       mas
5    b         Semi-minor axis of uncertainty ellipse                       mas
6    φ         Position angle of uncertainty ellipse, eastwards from North  °
Table 4.5: AMa data (polar coordinate system): record fields.
No.  Quantity  Description                                                  Unit
1    t         Time                                                         d
2    ρ         Angular distance                                             mas
3    θ         Position angle, eastwards from North                         °
4    a         Semi-major axis of uncertainty ellipse                       mas
5    b         Semi-minor axis of uncertainty ellipse                       mas
6    φ         Position angle of uncertainty ellipse, eastwards from North  °

Consequently, analysis is carried out in binary mode if AMa data and/or RV data of both
binary components are given. AMh data cannot be treated in binary mode.
File format
The first data record is preceded by the file header, in which every non-empty line starts
with the comment character #. To specify the data type and additional information, any
header line may contain a data command according to table 4.7; the command must be
placed immediately after the comment character.
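
The file layout described above can be illustrated by a minimal reader sketch. It is not taken from Base (which is written in Fortran); the function name `parse_data_file` and the example file content are hypothetical, and the sketch assumes that a command value extends to the end of its header line.

```python
def parse_data_file(text):
    """Split a Base-style data file into header commands and numeric records.

    Hypothetical helper for illustration only; not part of Base.
    """
    commands, records = {}, []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # empty lines carry no information
        if stripped.startswith("#"):
            body = stripped[1:]
            # A data command must follow the comment character immediately.
            if body.startswith("$") and "=" in body:
                key, value = body[1:].split("=", 1)
                commands[key.strip()] = value.strip()
            continue
        # Record fields are separated by one or more spaces.
        records.append([float(field) for field in stripped.split()])
    return commands, records


# Made-up RV example: time [d], radial velocity [m/s], uncertainty [m/s].
example = """\
#$name=demo
#$type=RV
# an ordinary comment line in the header
2450000.5    12.3   1.5
2450003.5    -4.1   1.2
"""
commands, records = parse_data_file(example)
print(commands, records)
```

For simplicity, the sketch treats every line starting with # as a header line, even if it appears after the first data record.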
Data grouping
Base offers to organise observational data from a given set in subsets containing a number of
chronologically successive data and replacing each such group by a representative synthetic
datum derived from the group members. As defined here, this procedure is different from
regular binning of the data in that the group boundaries in the time domain are not
regularly spaced. Rather, groups may start at arbitrary times, whereas their maximum
time span is fixed.
Besides time, other aspects of the data are used to determine group membership,
depending on the data type. Accordingly, the grouping parameter(s) supplied by the user
with the –group option specify the maximum ranges of:
1. time, ∆t (all types)
2. uncertainty-ellipse orientation, ∆φ (AMa data)
3. scan-orientation angle, ∆ψ (AMh data)
4. ratio of parallax factor to its group mean, ∆γ/γ̄ (AMh data).
In each set, beginning with the earliest datum, groups are created from as many successive
data as possible without exceeding any of the relevant grouping parameters. Groups of
size 1 may also be created where successive data are too widely spaced; in such groups, data
are not modified. The resulting group sizes are printed such that it can be reconstructed
which original records have been grouped in each set.

Table 4.6: Radial-velocity data: record fields.
No.  Quantity  Description              Unit
1    t         Time                     d
2    v         Radial velocity          m s⁻¹
3    σ         Measurement uncertainty  m s⁻¹

Table 4.7: Data commands in CSV files.
Command              Meaning                                           Remarks
$name=⟨string⟩       Set system identifier
$M=⟨mass⟩            Set stellar mass m⋆ [M☉]                          May be set using –M option
$type=⟦ang_pos|RV⟧   Data type (AMa or RV)                             Required
$coord_sys=polar     Polar coordinate system                           Only for AMa data
$bin_comp=⟦1|2⟧      Positions given for primary/secondary component   Required for AMa data: 2
$ref_pt=companion    Positions measured relative to binary companion   Required for AMa data

The synthetic replacement datum is defined as the weighted group mean, with respect
to the following quantities:
• time t (all types)
• observable (RV and AMh data)
• uncertainty-ellipse orientation φ (AMa data)
• scan-orientation angle ψ (AMh data)
• parallax factor γ (AMh data).
For a quantity x, the weighted group mean is given as
\[ \bar{x} = \frac{\sum_{i=j}^{k} w_i x_i}{\sum_{i=j}^{k} w_i} \tag{4.7} \]
if the group contains the jth through kth datum, where wi are normalised weights determined, respectively for one- and two-dimensional data, by (section 2.4)
\[ w_i = \begin{cases} \dfrac{1}{\sigma_i^2} \\[1ex] \dfrac{1}{a_i^2 + b_i^2}. \end{cases} \tag{4.8} \]
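
For one-dimensional (RV) data, the greedy group formation and the weighted mean of eqs. (4.7) and (4.8) might be sketched as follows. The function names are hypothetical, only the time criterion ∆t is applied, and the uncertainty assigned to the replacement datum is an assumption made by analogy with eq. (4.9); the text does not state it explicitly for RV data.

```python
def weighted_mean(values, weights):
    """Weighted group mean, eq. (4.7)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def group_rv(data, max_span):
    """Greedily group (t, v, sigma) records whose time span is <= max_span.

    Hypothetical helper for illustration only; not part of Base.
    """
    groups, current = [], []
    for rec in sorted(data):
        # Close the current group once its time span would exceed max_span.
        if current and rec[0] - current[0][0] > max_span:
            groups.append(current)
            current = []
        current.append(rec)
    if current:
        groups.append(current)

    grouped = []
    for g in groups:
        if len(g) == 1:
            grouped.append(g[0])  # groups of size 1 are left unmodified
            continue
        w = [1.0 / s ** 2 for _, _, s in g]  # eq. (4.8), one-dimensional case
        t = weighted_mean([r[0] for r in g], w)
        v = weighted_mean([r[1] for r in g], w)
        # ASSUMPTION: uncertainty of a weighted mean of independent data.
        sigma = (1.0 / sum(w)) ** 0.5
        grouped.append((t, v, sigma))
    return grouped


demo = [(0.0, 10.0, 1.0), (0.5, 12.0, 1.0), (5.0, 3.0, 2.0)]
grouped = group_rv(demo, max_span=1.0)
print(grouped)
```

In this made-up example the first two records fall within one day of each other and are merged, while the third record forms a group of size 1 and is passed through unchanged.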
By contrast, the AMa replacement observable is derived by first rotating the original
coordinate system S5 around the z5-axis by the weighted-mean orientation angle φ̄, i.e.
such that all uncertainty-ellipse semi-major axes are approximately aligned with the new
x′-axis. Subsequently, the data coordinates {x′i} and {y′i} are independently averaged using
weights 1/ai² and 1/bi², respectively. The resulting mean (x̄′, ȳ′)ᵀ is finally transformed
back into S5. Its uncertainty ellipse is determined by semi-major and -minor axes
\[ \bar{a} \equiv \frac{1}{\sqrt{\sum_{i=j}^{k} \frac{1}{a_i^2}}}, \tag{4.9} \]
\[ \bar{b} \equiv \frac{1}{\sqrt{\sum_{i=j}^{k} \frac{1}{b_i^2}}}, \tag{4.10} \]
according to the independent weighted averaging of {x′i} and {y′i}.

Table 4.8: User-defined prior shape: record fields.
No.  Quantity    Description    Unit
1    θ           Abscissa       according to parameter
2    p(θ|M, K)   Prior density

This definition disregards the differences in orientation between the individual uncertainty
ellipses and instead treats them as co-aligned. The replacement uncertainty can thus be
described by a bivariate normal distribution, just like the original uncertainties, which
keeps the noise model unchanged. This is a good approximation if ∆φ is chosen small enough.
Similarly, the definition for AMh data ignores the variation in scan-circle orientation and
parallax factor, which is a good approximation if the corresponding thresholds are small.
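
The construction of the AMa replacement datum (rotation, independent averaging, back-rotation, eqs. (4.9) and (4.10)) can be sketched as below. This is an illustration under stated assumptions rather than Base's implementation: the function name is hypothetical, φ is taken in radians and measured from the x-axis towards the y-axis, the angles are averaged naively without wrap-around handling, and the time coordinate is omitted.

```python
import math


def group_ama(points):
    """Replace a group of (x, y, a, b, phi) data by one synthetic datum.

    Hypothetical helper for illustration only; not part of Base.
    """
    # Weights for two-dimensional data, eq. (4.8).
    w = [1.0 / (a ** 2 + b ** 2) for _, _, a, b, _ in points]
    # Weighted-mean orientation angle (ASSUMPTION: no wrap-around handling).
    phi_bar = sum(wi * p[4] for wi, p in zip(w, points)) / sum(w)
    c, s = math.cos(phi_bar), math.sin(phi_bar)

    # Coordinates in the system rotated by phi_bar.
    xr = [c * p[0] + s * p[1] for p in points]
    yr = [-s * p[0] + c * p[1] for p in points]

    # Independent weighted averages with weights 1/a_i^2 and 1/b_i^2.
    wa = [1.0 / p[2] ** 2 for p in points]
    wb = [1.0 / p[3] ** 2 for p in points]
    x_bar = sum(wi * xi for wi, xi in zip(wa, xr)) / sum(wa)
    y_bar = sum(wi * yi for wi, yi in zip(wb, yr)) / sum(wb)

    # Transform the mean back into the original system.
    x = c * x_bar - s * y_bar
    y = s * x_bar + c * y_bar

    a_bar = 1.0 / math.sqrt(sum(wa))  # eq. (4.9)
    b_bar = 1.0 / math.sqrt(sum(wb))  # eq. (4.10)
    return x, y, a_bar, b_bar, phi_bar


result = group_ama([(1.0, 2.0, 1.0, 2.0, 0.0), (3.0, 4.0, 1.0, 2.0, 0.0)])
print(result)
```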
Grouping may be useful because it reduces the average noise level of data that are spaced
so closely that no significant orbital motion is thought to occur within the corresponding
time span. However, because it modifies the original data, it should be used with
caution; ideally, inference should not rely on grouped data alone.
4.4.3 Priors
Any model parameter can be assigned a prior probability density in one of three forms:
1. a delta-function δ(θ−c), equivalent to fixing the parameter at value c (option –value);
2. a user-defined shape, i.e. a set of up to 1001 samples {(θi , p(θi |M, K))} of the prior
density given in a CSV file with one record per line and each record comprising the
fields described in table 4.8 (option –prior);
3. a range (either in combination with a user-defined shape, which is then clipped to
the given range, or else using one of the default prior shapes detailed in section 3.2.2
(option –range)).
Any parameter for which no such option is present is assigned a default prior shape
and range that pose the weakest possible constraints justified by the data, the model, and
mathematical/physical considerations.
In case 2 above, the prior density is linearly interpolated between the given samples
and normalised accordingly; the prior support is set to the abscissa range of the samples,
or to a sub-range specified with the –range option. If fewer than 499 samples lie within the
prior support or their abscissae are not uniformly spaced, the prior density is re-estimated
at 1001 positions by means of kernel density estimation.
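
The linear interpolation and normalisation of a user-defined prior shape can be illustrated with a short sketch. The function name `make_prior` is hypothetical, the normalisation uses the trapezoidal rule (exact for a piecewise-linear density), and the re-estimation by kernel density estimation mentioned above is not covered.

```python
import bisect


def make_prior(thetas, densities):
    """Return a normalised, linearly interpolated prior density function.

    Hypothetical helper for illustration only; thetas must be increasing.
    """
    # Integral of the piecewise-linear density over its support.
    area = sum(
        0.5 * (densities[i] + densities[i + 1]) * (thetas[i + 1] - thetas[i])
        for i in range(len(thetas) - 1)
    )

    def prior(theta):
        if theta < thetas[0] or theta > thetas[-1]:
            return 0.0  # outside the prior support
        # Index of the interpolation interval containing theta.
        i = min(bisect.bisect_right(thetas, theta) - 1, len(thetas) - 2)
        frac = (theta - thetas[i]) / (thetas[i + 1] - thetas[i])
        return (densities[i] + frac * (densities[i + 1] - densities[i])) / area

    return prior


# Made-up triangular prior shape on [0, 2].
prior = make_prior([0.0, 1.0, 2.0], [0.0, 2.0, 0.0])
print(prior(0.5), prior(1.0), prior(3.0))
```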
4.5 Program architecture
In the following, the architecture of Base is described in terms of the program flow on
the top level, the three integral parts responsible for MCMC sampling, improvement of
mixing and detection of convergence, as well as the organisation of the Fortran source code
in modules.
4.5.1 Top-level program flow
On the top level, program flow in Base (excluding some minor actions) can be visualised
as in fig. 4.1. In complete analogy, the essential tasks between Start and End and their
dependencies on options are described as follows.
1. Start
2. Initialise variables, including several default values which may be modified using
options.
3. Parse configuration file and command line (section 4.4.1).
4. If –help: display a general help screen;
else if –help-pars: display a parameter-related help screen;
else if observational data are supplied, read observational data.
5. If –group: group the data as described in section 4.4.2.
6. If –read-samples: read header of saved-samples file from an earlier Base run
(section 4.6.2).
7. If –keep-seed: initialise the pseudo-random-number generator with the seed stored
previously in .BASE.randomSeed in the working directory;
else: initialise the pseudo-random-number generator with a seed read from /dev/urandom.
8. If –only-plot-data: write files for data plots;
else if –generate-data: generate synthetic data (section 4.6.3);
else:
(a) Set up models and parameters according to user input.
(b) Repeat for all np modes:
   i. If –read-samples:
      A. Read posterior samples {θ(j)}.
      B. Check consistency of parameter set in saved-samples file with current
         model.
   ii. Set various properties of parameters and derived quantities such as names
       and bounds.
   iii. Pre-calculate eccentric anomalies.
   iv. If –read-samples: calculate likelihoods {L(θ(j))} of samples;
       else: execute multiPT, including measurement of sampling speed.
   v. Unless –only-speed-meas:
      A. If –save-samples: save posterior samples.
      B. Unless –no-inference:
         • Calculate various summaries and derived quantities.
         • If neither –no-files nor –no-plots: write certain plot files.
         • If –calc-prob-all-pars: calculate posterior probabilities over hypercubes Θsub.
         • Sort scalar samples of parameters {θ(j)}.
         • Calculate medians, HPDIs (section 3.2.6) and marginal posteriors (section 3.2.4)
           of parameters.
         • If –calc-prob: calculate posterior probabilities over intervals I.
         • Calculate χ².
         • If –predict: predict uncertainties of relevant observables (section 3.2.8).
         • If data are supplied and not –periodogram: calculate residuals.
         • Print summaries for current np mode.
         • If neither –no-files nor –no-plots: write certain plot files.
(c) Print summaries for complete Base run.
9. If neither –generate-data nor –only-speed-meas: execute Gnuplot.
10. If –mail-when-finished: send notification email.
11. End
MCMC sampling is carried out by the low-level routine makeChain, described in section 4.5.2. makeChain itself is called by prlTempering, which performs parallel tempering
(section 4.5.3) and is in turn called by the top-level routine multiPT. The latter is started
in step 8(b)iv above and described in section 4.5.4.
4.5.2 Posterior sampling by MCMC
As noted in section 3.2.3, the technique of MCMC helps to estimate the normalised
posterior P(θ) by drawing samples of the parameter vector distributed as the posterior.
Base uses the MCMC variant described by the Metropolis-Hastings algorithm (MH;
Metropolis et al. 1953; Hastings 1970), which performs a random walk through parameter
space, thereby collecting N samples {θ(j) : j = 1, ..., N}.
The distribution of these samples is not initially identical with the posterior but
converges to it in the limit of many samples if the chain obeys certain regularity conditions
(e.g. Roberts 1996). The first M < N burn-in samples are still strongly correlated with the
starting state θ (0) and excluded to improve convergence. Because the burn-in length M
cannot be determined in advance, Base considers a fixed fraction of samples to belong to
the burn-in phase.7 Methods for setting the starting state θ (0) and detecting convergence
are described in section 4.5.4.
Posterior sampling by MCMC is performed by the routine makeChain. Starting from
the current chain link θ(j) ∈ Θ, the following steps lead to the next link θ(j+1), according to the
MH algorithm and the hit-and-run sampler (step 1; Boneh and Golan 1979; Smith 1980):
1. Set up a candidate C:
(a) sample a direction, viz. a random unit vector d ∈ ℝ^k from an isotropic density
over the k-dimensional unit sphere;
(b) sample a (signed) distance r from a uniform distribution over the interval
{r′ ∈ ℝ : θ(j) + r′d ∈ Θ};
(c) set candidate C ≡ θ(j) + rd;
2. calculate the acceptance probability
\[ \alpha(\theta^{(j)}, C) \equiv \min\left( 1, \frac{P(C)}{P(\theta^{(j)})} \right); \tag{4.11} \]
3. draw a random number β from a uniform distribution U(0, 1) over the interval [0, 1];
4. if β ≤ α, accept the candidate, i.e. set the next link θ(j+1) ≡ C; otherwise,
θ(j+1) ≡ θ(j).

7 This fraction can be defined using the –inference-len-frac option (section 4.4.1).

Figure 4.1: Base program flow. Important tasks are signified by boxes, while arrows
visualise program flow. Decisions and loops are represented by conditions written on the
corresponding arrows. While most bifurcations correspond to choices made by means of
options, not all options have consequences visible in this diagram.

The hit-and-run sampler, compared to alternatives like the Gibbs sampler (Geman and
Geman 1984), favours exploration of the whole parameter space Θ without becoming “trapped”
in the vicinity of a local posterior maximum (Gilks et al. 1996).
Because only the ratio of two posterior values is used (step 2), the normalising evidence p(D|M, K) – a constant that is difficult to determine (section 3.2.1) – is irrelevant in the
MH algorithm.
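
The MH steps with the hit-and-run candidate can be sketched in a few lines. This is an illustration, not the makeChain routine itself: all names are hypothetical, the parameter space Θ is assumed to be a box given by lower and upper bounds, and the posterior is supplied as an unnormalised log-density, which suffices because only posterior ratios enter eq. (4.11).

```python
import math
import random


def random_direction(k, rng):
    """Isotropic random unit vector in R^k (a normalised Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def chord_limits(theta, d, lower, upper):
    """Interval of r for which theta + r*d stays inside the box."""
    r_min, r_max = -math.inf, math.inf
    for t, di, lo, hi in zip(theta, d, lower, upper):
        if di > 0.0:
            r_min, r_max = max(r_min, (lo - t) / di), min(r_max, (hi - t) / di)
        elif di < 0.0:
            r_min, r_max = max(r_min, (hi - t) / di), min(r_max, (lo - t) / di)
    return r_min, r_max


def make_chain(log_post, theta0, lower, upper, n_samples, rng):
    """Collect n_samples chain links, starting from theta0."""
    theta, lp = list(theta0), log_post(theta0)
    chain = []
    for _ in range(n_samples):
        d = random_direction(len(theta), rng)                   # step 1(a)
        r = rng.uniform(*chord_limits(theta, d, lower, upper))  # step 1(b)
        # Step 1(c); clamping guards against round-off at the bounds.
        cand = [min(max(t + r * di, lo), hi)
                for t, di, lo, hi in zip(theta, d, lower, upper)]
        lp_cand = log_post(cand)
        alpha = math.exp(min(0.0, lp_cand - lp))                # step 2
        if rng.random() <= alpha:                               # steps 3 and 4
            theta, lp = cand, lp_cand
        chain.append(list(theta))
    return chain


def log_gauss(theta):
    """Made-up target: unnormalised standard normal log-density."""
    return -0.5 * sum(x * x for x in theta)


chain = make_chain(log_gauss, [0.5, 0.5], [-5.0, -5.0], [5.0, 5.0],
                   5000, random.Random(1))
print(len(chain))
```

Working with log-densities avoids numerical underflow when posterior values become very small; the acceptance rule is otherwise the one given in steps 2 to 4.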
Thinning. Slow mixing, i.e. slow exploration of the parameter space by Markov chains,
increases the number of samples needed to meaningfully characterise the posterior. In such
cases, constraints of computer memory may prevent enough samples for convergence from
being stored. Thus, it may sometimes be useful to only store one in a given number q of
samples and discard the others. This simple concept, called thinning, can be activated
in Base with the –thin option, where the thinning stride q ∈ N+ can either be given or
determined automatically from maximum runtime and maximum chain length such that
both are approximately reached.
4.5.3 Improvement of mixing by parallel tempering
To enhance mixing and decrease the attraction of the chain to local posterior modes, the
parallel tempering (PT) algorithm (e.g. Gregory 2005b) is employed one level above MH.
Its function is to create in parallel n chains of length N, {θ(k)^(j) : j = 1, ..., N; k = 1, ..., n},
each sampled by an independent MH procedure, where the cold chain k = 1 uses an
unmodified likelihood L(·), while the others, the heated chains, use as replacement L(·)^γ(k)
with the positive tempering parameter γ(k) < 1. After nswap samples of each chain, two
chains k ≥ 1 and k + 1 ≤ n are randomly selected and their last links, denoted here by
θ(k) and θ(k+1), are swapped with probability
\[ \alpha_{\mathrm{swap}}(k, k+1) = \min\left( 1, \left[ \frac{L(\theta_{(k+1)})}{L(\theta_{(k)})} \right]^{\gamma_{(k)}} \left[ \frac{L(\theta_{(k)})}{L(\theta_{(k+1)})} \right]^{\gamma_{(k+1)}} \right), \tag{4.12} \]
which ensures that the distributions of both chains remain unchanged.
This procedure allows states from the “hotter” chains, which explore parameter space
more freely, to “seep through” to the cold chain without compromising its distribution.
Conclusions are only drawn from the samples of the cold chain.
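
The swap probability of eq. (4.12) and the default linear ladder of tempering parameters, eq. (4.13), can be written compactly as follows. The function names are hypothetical; the swap probability is evaluated from log-likelihoods, an implementation choice made here to avoid overflow and not necessarily the one used in Base.

```python
import math


def swap_probability(log_like_k, log_like_k1, gamma_k, gamma_k1):
    """Swap probability of eq. (4.12) for chains k and k+1, in log space.

    log_like_k and log_like_k1 are the log-likelihoods of the last links
    of chains k and k+1; gamma_k and gamma_k1 their tempering parameters.
    """
    log_ratio = (gamma_k - gamma_k1) * (log_like_k1 - log_like_k)
    return math.exp(min(0.0, log_ratio))


def default_ladder(n, gamma_min=1e-3):
    """Linearly decreasing tempering parameters, eq. (4.13)."""
    if n == 1:
        return [1.0]  # a single (cold) chain: PT is deactivated
    return [1.0 + (k - 1) / (n - 1) * (gamma_min - 1.0) for k in range(1, n + 1)]


ladder = default_ladder(8)
print(ladder)
print(swap_probability(-5.0, -10.0, ladder[0], ladder[1]))
```

If the hotter chain currently holds the state of higher likelihood, the swap is always accepted; otherwise it is accepted with a probability that shrinks as the tempering parameters of the two chains differ more.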
In contrast to MCMC sampling, which is sequential in nature, the structure of PT allows
one to exploit the multiprocessing facilities provided by modern symmetric multiprocessor
systems. For this purpose, Base uses the OpenMP API (OpenMP Architecture Review
Board 2008) as implemented for GFortran by the GOMP project (Free Software Foundation
2011b).
The following PT parameters can be adjusted by the user:
• The number of parallel chains n can be set with the option –pt-n-chains, with
n ≡ 1 implying that PT is deactivated.
• To adjust the swapping stride nswap , the option –swap can be used.
• Using the option –temp-pars, the tempering parameters {γ(k) : k = 2, ..., n} can be
set, whereas γ(1) ≡ 1 is fixed. By default, γ(n) ≡ 10⁻³, and the tempering parameters
are linearly decreasing from γ(1) through γ(n),
\[ \gamma_{(k)} = 1 + \frac{k-1}{n-1} \cdot (10^{-3} - 1). \tag{4.13} \]
4.5.4 Assessing convergence by multi-PT
As a matter of principle, it cannot be proven that a given Markov chain has converged to
the posterior. However, convergence may be meaningfully defined as the degree to which
the chain does not depend on its initial state θ (0) any more. This can be determined
on the basis of a set of independent chains – in Base, these are the cold chains of m
independent PT procedures – started at different points in parameter space. The procedure
of Base handling these PT procedures is called multiPT. The PT starting states should
be defined such that for each of their components, they are overdispersed with respect to
the corresponding marginal posteriors.
Since the marginal posteriors are difficult to obtain before the actual sampling, Base
determines the starting states by repeatedly drawing, for each parameter, a set of m
samples from the prior using rejection sampling; the repetition is stopped as soon as the
sample variance exceeds the corresponding prior variance, which yields a set of starting
states overdispersed with respect to the prior. Assuming that the prior variance exceeds
the marginal-posterior variance, the overdispersion requirement is met.
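
For a single parameter, the repeated drawing of starting values until their sample variance exceeds the prior variance might look as follows. The sketch assumes a uniform prior, for which direct sampling replaces the rejection sampling mentioned above; the function name and the `max_tries` safeguard are hypothetical.

```python
import random
import statistics


def overdispersed_starts(m, lower, upper, rng, max_tries=10000):
    """Draw m >= 2 starting values from U(lower, upper) until overdispersed.

    Hypothetical helper for illustration only; not part of Base.
    """
    prior_var = (upper - lower) ** 2 / 12.0  # variance of a uniform prior
    for _ in range(max_tries):
        starts = [rng.uniform(lower, upper) for _ in range(m)]
        # Accept the set once its sample variance exceeds the prior variance.
        if statistics.variance(starts) > prior_var:
            return starts
    raise RuntimeError("no overdispersed set of starting values found")


starts = overdispersed_starts(4, 0.0, 1.0, random.Random(0))
print(starts)
```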
A convergence test based on such independent chains, using the potential scale reduction (PSR) or Gelman-Rubin statistic, was
proposed by Gelman and Rubin (1992) and was later refined and corrected by Brooks and
Gelman (1998). It is repeatedly carried out during sampling and, in the case of a positive
result, sampling is stopped before the user-defined maximum runtime and/or number of
samples have been reached. It may also sometimes be useful to abort sampling manually,
e.g. when convergence does not appear to improve any further, which can be done at
any time by creating an empty file .BASE.finish in the working directory. If this file is
detected by Base, sampling is finished at the current chain length, the file is deleted and
all remaining procedures are carried out as if sampling had finished regularly.
The statistic is calculated with respect to each parameter θ separately, using the post-burn-in
samples {θl^(j) : j = M + 1, ..., N; l = 1, ..., m} of θ provided by the m independent
chains. It compares the actual variances of θ within the chains built up thus far to an
estimate of the marginal-posterior (i.e. target) variance of θ, thus estimating how closely
the chains have approached convergence.
The PSR R̂^(1/2) is defined by
\[ \hat{R} \equiv \frac{d+3}{d+1} \, \frac{\hat{V}}{W}, \tag{4.14} \]
where V̂ is an estimate of the marginal-posterior variance, d the estimated number of
degrees of freedom underlying the calculation of V̂, and W the mean within-sequence
variance. The quantities are defined as
\[ \hat{V} \equiv \frac{\nu-1}{\nu} W + \frac{m+1}{m\nu} B \tag{4.15} \]
\[ W \equiv \frac{1}{m(\nu-1)} \sum_{l=1}^{m} \sum_{j=M+1}^{N} \left( \theta_l^{(j)} - \bar{\theta}_l \right)^2 \tag{4.16} \]
\[ B \equiv \frac{\nu}{m-1} \sum_{l=1}^{m} \left( \bar{\theta}_l - \bar{\theta} \right)^2, \tag{4.17} \]
where B is the between-sequence variance and ν ≡ N − M; a horizontal bar denotes the
mean taken over the set of samples obtained by varying the omitted indexes.
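
Equations (4.14) to (4.17) translate directly into a short function. In this sketch, which is not Base's implementation, the chains are assumed to contain only post-burn-in samples of one non-cyclic parameter, and the number of degrees of freedom d is passed in by the caller because its estimation is not spelled out here; with `d=None` the correction factor (d + 3)/(d + 1) is simply set to 1.

```python
import math


def psr(chains, d=None):
    """Potential scale reduction R^(1/2) for m chains of equal length nu.

    Hypothetical helper for illustration only; not part of Base.
    """
    m = len(chains)
    nu = len(chains[0])
    means = [sum(c) / nu for c in chains]
    grand_mean = sum(means) / m
    # Mean within-sequence variance, eq. (4.16).
    w = sum((x - mu) ** 2 for c, mu in zip(chains, means) for x in c) / (m * (nu - 1))
    # Between-sequence variance, eq. (4.17).
    b = nu / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    # Estimate of the marginal-posterior variance, eq. (4.15).
    v_hat = (nu - 1) / nu * w + (m + 1) / (m * nu) * b
    # ASSUMPTION: without an estimate of d, the correction factor is omitted.
    correction = 1.0 if d is None else (d + 3.0) / (d + 1.0)
    return math.sqrt(correction * v_hat / w)  # square root of eq. (4.14)


agreeing = psr([[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]])
disagreeing = psr([[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]])
print(agreeing, disagreeing)
```

Chains that cover the same region give a value near or below 1, whereas chains stuck in different regions give a value well above a threshold such as 1.1.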
Owing to the initial overdispersion of starting states, V̂ overestimates the marginal-posterior
variance in the beginning and subsequently decreases. Furthermore, while the
chains are still exploring new areas of parameter space, W underestimates the marginal-posterior
variance and increases. Thus, as convergence to the posterior is accomplished,
R̂^(1/2) ↘ 1. Therefore, convergence can be assumed as soon as the PSR has fallen below
a threshold R∗^(1/2) ≥ 1. If Base is run with R∗^(1/2) ≡ 1, sampling is continued up to the
user-defined length or duration, respectively.
For cyclic parameters (section 2.1.1), the default lower and upper prior bounds are equivalent,
which needs to be taken into account when calculating the PSR. Thus, Base uses the
modified definition of the PSR by Ford (2006) for these parameters.
4.5.5 Organisation of source code
The source code of Base is organised in a main program, main, and 22 modules, each
contained in a separate file and comprising data and procedures referring to a specific field
of work (table 4.9). Each module can make use of, i.e. depend on, other modules by means
of Fortran’s use statement, giving rise to a dependence hierarchy shown in fig. 4.2.
8
In Base, specific constants are assigned to variables to mark them as undefined, which simplifies the
code in many instances. These values are chosen to lie near the lowest storable value of a given data type.
Figure 4.2: Dependency graph of the source modules and main program of Base. Arrows
point from the using module to the used module. The modules iso_fortran_env and
omp_lib are part of GFortran.
Table 4.9: Modules of Base.
Module            Field of work/tasks
baseRun           Complete runs of BASE
basicIO           Basic in-/output
basicTypes        Basic data types and conversions
coords            Coordinates in ℝ² and ℝ³
crc32             Cyclic redundancy check (CRC-32)
dataModelsNoise   Data, models, and noise
env               Computing environment
etc               Further utilities
inference         Posterior inference
intervals         Intervals in ℝ
io                (Formatted) in-/output
kdTree            kd-trees
maths             Mathematics
meta              Name, author, and version information for BASE
random            Random numbers
sampling          MCMC sampling, PT, Multi-PT, reading and saving samples
sorting           Sorting and other tasks on arrays
strings           Strings
time              Date and time
uncert            Representing quantities with their uncertainties
undefVals         Undefined values of variables⁸
userOpts          User options

4.6 Specific algorithms
Below, a selection of algorithms implemented in Base are detailed. These concern inference
in periodogram mode, saving and reading posterior samples, and generating synthetic data.
4.6.1 Periodogram mode
By default, the window width for marginal posteriors is based on the MCMC sample
standard deviation σsamp (eq. (3.41)). If there are multiple maxima, however, this can
lead to artificially broad peaks, which can be particularly problematic for the orbital
frequency f (see section 2.1.1), which plays an important role in distinguishing different
solutions in orbit-related parameter estimation. Therefore, Base includes a periodogram
mode, in which the window width of the marginal posterior of f – a Bayesian analogue of
the frequentist periodogram (section 3.1.4) – is reduced according to the following procedure.
1. Initially assume a default window width as given by eq. (3.41);
2. estimate the marginal posterior of f and find its local maximum fmax nearest to
the posterior mode f̂, as well as the local minimum fmin nearest to fmax;
3. calculate the marginal-posterior standard deviation σ′ over the half-peak between
fmax and fmin;
4. re-calculate the window width using eq. (3.41), with √2 σ′ replacing σsamp;
5. repeat step 2, but only consider local minima with ordinates p(fmin|D, M, K) ≤
0.5 · p(fmax|D, M, K) in order not to be misled by weak marginal-posterior fluctuations;
6. repeat steps 3 and 4.
4.6.2 Saving and reading samples
Posterior samples {θ(j)} gathered by Base can be saved to a file and read in again
afterwards, which can be useful when one wishes to make different types of a posteriori
inference based on the same data, models and prior knowledge. For example, Base may
first be run with the option –no-plots in order to obtain textual summaries for a first
inspection. If the option –save-samples was used, Base may then be run again with
–read-samples to re-use the previous posterior samples and produce plots from them by
omitting –no-plots. To read in samples, the same data files as before must be given in
the original order and potential –group options also need to be unchanged.
Base stores posterior samples, along with other relevant information, in saved-samples
files, which consist of the following parts:
1. global header (table 4.10)
2. for each np mode carried out or once for binary mode, respectively (table 4.11):
(a) np -mode header
(b) samples
3. CRC-32 checksum (table 4.12)
After reading the global header (part 1), all available np modes (part 2) are read until the
requested mode is found, whose samples are then used. While parts 1 and 2 are read, a
CRC-32 checksum is calculated; after reading, it is compared to the saved checksum (part 3).
In case of a mismatch, an error is raised in order to prevent the use of a corrupted saved-samples
file.
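
The integrity check can be illustrated with Python's standard `zlib.crc32`. This is only a sketch of the principle: the function names are hypothetical, the checksum is assumed to be stored as a 4-byte little-endian trailer, and the CRC-32 variant provided by `zlib` is not guaranteed to match the one implemented in Base's crc32 module.

```python
import struct
import zlib


def append_checksum(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload (parts 1 and 2) as a trailer (part 3)."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def read_verified(blob: bytes) -> bytes:
    """Return the payload if its CRC-32 matches the stored checksum."""
    payload, stored = blob[:-4], struct.unpack("<I", blob[-4:])[0]
    if zlib.crc32(payload) != stored:
        raise ValueError("saved-samples file is corrupted (CRC-32 mismatch)")
    return payload


blob = append_checksum(b"BASE posterior samples")
print(read_verified(blob))
```

A production reader would update the checksum incrementally while streaming the file, which `zlib.crc32` supports through its optional second argument.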
4.6.3 Synthetic data
Based on the observable and noise models introduced in sections 2.2 and 2.4, a set of
ND synthetic data can be easily generated by:
1. Sampling a set of observing times {ti : i = 1, . . . , ND } as described below;
2. for each time ti :
(a) calculating the model function f 0i ≡ f (ti ; θ);
(b) adding random noise 0,i to obtain the synthetic datum f i ≡ f 0i + 0,i .
9
The data type of a record, using the abbreviations int (integer) and chr (character), followed by a
number specifying the kind, i.e. size in bytes, and/or the number of elements in the array constituting the
record (in brackets).
10
⟨undef⟩ is a special value signifying an undefined variable; see footnote 8.
11
The data type of a record, using the abbreviations int (integer) and chr (character), followed by a
number specifying the kind, i.e. size in bytes, and/or the number of elements in the array constituting the
record (in brackets).
Table 4.10: Saved-samples file: global header.

Length  ID   Content                       Type⁹     Count  Remarks
const.  A    “BASE posterior samples”      chr(22)   1
const.  B    Base revision number          int2      1
const.  C    —                             chr(128)  1      reserved
const.  D    length(T)                     int2      1
const.  E    length(U)                     int2      1
const.  F    length(V)                     int2      1
const.  G    # configuration-file lines    int2      1
const.  H    date/time Base was started    int4×8    1
const.  I    # data sets                   int2      1
const.  J    t_1                           real8     1
const.  K    t_N                           real8     1
const.  L    epoch t_r                     real8     1
const.  M    times in JD scale?            int1      1
const.  N    # data                        int4      1
const.  O    data CRC                      int4      1
const.  P    # n_p modes                   int1      1      ⟨undef⟩¹⁰ in binary mode
const.  Q    # PT procedures in multiPT    int4      1
const.  R    # chains in PT                int4      1
var.    Si   data type                     int1      I
var.    T    command line                  chr(D)    1
var.    U    host-computer name            chr(E)    1
var.    V    target-system identifier      chr(F)    1
var.    Wi   length(Xi)                    int2      G
var.    Xi   configuration-file line       chr(Wi)   G

Besides ND and the type of data (AMa or RV) to be generated, the following information
is specified by the user with the option –generate-data:
1. the earliest and latest observing times, t1 and tN
2. the number of planets n_p and stellar mass m_⋆ (unless binary-mode data is being
generated) as well as the parameters θ
3. n_unc ≥ 1 possible observational uncertainties {σ_j}, for RV data, or {(a_j, b_j)}, for
AMa data (section 2.4), from which one is drawn randomly with replacement in step 2b
of the generation algorithm
Additionally, the data commands listed in table 4.13 may be given to modify the procedure
as follows:
• AMa data may be generated in a polar rather than cartesian coordinate system by
using $coord_sys=polar.
• For RV data, $n_b_obs=2 can be used to switch to binary mode,12 generating one
12
According to section 4.4.2, binary mode is automatically selected for AMa data.
Table 4.11: Saved-samples file: n_p-mode header and samples.

Length  ID    Content                                Type¹¹     Count  Remarks
const.  Y     # planets                              int2       1      ⟨undef⟩ in binary mode
const.  Z     # local parameters                     int2       1
const.  AA    # samples (total)                      int8       1
const.  AB    # burn-in samples                      int8       1
const.  AC    thinning stride                        int8       1
const.  AD    mean PT block length                   int8       1
const.  AE    tempering parameters adjusted?         int1       1
const.  AF    finished by user request?              int1       1
const.  AG    compression mode                       int1       1      0: none; 1: RLE
var.    AH    tempering parameter                    real8      R
var.    AIi   length(AJi)                            int2       Z
var.    AJi   local parameter name                   chr(AIi)   Z
var.    AK    prior type                             int1       Z
var.    AL    prior lbound                           real8      Z      AL–AT: ⟨undef⟩ if not applying to prior type
var.    AM    prior ubound                           real8      Z
var.    AN    prior knee                             real8      Z
var.    AO    prior: underlying Gaussian variance    real8      Z
var.    AP    # user-def. prior points               int4       Z
var.    AQ    user-def. prior abscissae              real8      AP
var.    AR    user-def. prior ordinates              real8      AP
var.    AS    user-def. prior: log-scale abscissa?   int1       Z
var.    AT    user-def. prior: inverse delta         real8      Z
var.    AU    final PSR value                        real8      Z
var.    AV    fraction of indep. samples             real8      Z
var.    AW    posterior mode                         real8      Z
var.    AX    sample                                 real8      AA     if AG = 0
var.    AY    # sequences                            int8       1      if AG = 1
var.    AZ    sequence size                          int1       AY     if AG = 1
var.    BA    value                                  real8      AY     if AG = 1

Table 4.12: Saved-samples file: checksum.

Length  ID   Content     Type¹¹  Count  Remarks
const.  BB   check sum   int4    1      CRC-32
Q
Z
Z
Z
# free
pars.
Table 4.13: Data commands for generating data.

Command                          Meaning
$coord_sys=polar                 Polar coordinate system
$n_b_obs=2                       Two bodies observed (for RV data)
$unif_time                       Apply uniform time sampling
$window=[⟨period⟩,⟨fraction⟩]    Allow data only in given time window
$times=[⟨time⟩⟦,⟨time⟩⟦…⟧⟧]      Explicitly set observing times to generate data at
$with_ang_pos                    When generating RV data, choose set of parameters pertaining to AMa and RV data
$with_RV                         When generating AMa data, choose set of parameters pertaining to AMa and RV data

data set each for the primary and secondary components.
• The times {ti : i = 2, . . . , ND − 1} are either sampled randomly from U(t1 , tN ) or, if
$unif_time is given, distributed such that {ti : i = 1, . . . , ND } are uniformly spaced.
• When the observing times are randomly sampled, any number of time windows {(Pj , ρj )}
can be set up, each defined by a period Pj and a signed fraction ρj , implying that
for any sampled time ti the following condition needs to be fulfilled:
    \frac{\operatorname{mod}(t_i - t_1, P_j)}{P_j}
    \begin{cases} \le \rho_j, & \rho_j > 0 \\ \ge 1 + \rho_j, & \rho_j \le 0 \end{cases}    (4.18)
• When generating RV data, the set of parameters for the case of both RV and AMa data
can be used, and vice versa when generating AMa data (commands $with_ang_pos
and $with_RV, respectively).
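The generation procedure of this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of Base: the function names (in_windows, sample_times, generate_data) are hypothetical, the model is any user-supplied function of time, the noise is taken to be Gaussian with a standard deviation drawn from the list of possible uncertainties, and the window condition of eq. (4.18) is applied only to the randomly sampled times, not to the fixed end points t_1 and t_N.

```python
import math
import random


def in_windows(t, t1, windows):
    """Check eq. (4.18) for every window (P_j, rho_j)."""
    for period, rho in windows:
        phase = math.fmod(t - t1, period) / period  # in [0, 1) for t >= t1
        if rho > 0 and phase > rho:
            return False
        if rho <= 0 and phase < 1 + rho:
            return False
    return True


def sample_times(n, t1, tn, windows=(), uniform=False, rng=random):
    """Return n >= 2 observing times between t1 and tn (both included)."""
    if uniform:
        return [t1 + i * (tn - t1) / (n - 1) for i in range(n)]
    times = [t1, tn]
    while len(times) < n:  # rejection sampling; loops forever if no time is allowed
        t = rng.uniform(t1, tn)
        if in_windows(t, t1, windows):
            times.append(t)
    return sorted(times)


def generate_data(times, model, sigmas, rng=random):
    """Return (time, noisy value, uncertainty) triples for a noise-free model."""
    data = []
    for t in times:
        sigma = rng.choice(sigmas)  # drawn randomly with replacement
        data.append((t, model(t) + rng.gauss(0.0, sigma), sigma))
    return data
```

For example, sample_times(30, 0.0, 100.0, windows=[(10.0, 0.3)]) only accepts random times that fall into the first 30% of each 10-day cycle counted from t_1, while a negative fraction such as -0.3 selects the last 30% of each cycle.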
Chapter 5
Bayesian analysis of exoplanet and binary
orbits
Demonstrated using astrometric and radial-velocity data of Mizar A
This chapter reproduces an article published in Astronomy & Astrophysics (Schulze-Hartung
et al. 2012) unaltered in content. Therefore, some of the text of previous chapters of this
thesis is repeated here.
Aims. We introduce BASE (Bayesian astrometric and spectroscopic exoplanet detection and
characterisation tool), a novel program for the combined or separate Bayesian analysis of astrometric
and radial-velocity measurements of potential exoplanet hosts and binary stars. The capabilities of
BASE are demonstrated using all publicly available data of the binary Mizar A.
Methods. With the Bayesian approach to data analysis we can incorporate prior knowledge and
draw extensive posterior inferences about model parameters and derived quantities. This was
implemented in BASE by Markov chain Monte Carlo (MCMC) sampling, using a combination
of the Metropolis-Hastings, hit-and-run, and parallel-tempering algorithms to explore the whole
parameter space. Nonconvergence to the posterior was tested by means of the Gelman-Rubin
statistic (potential scale reduction). The samples were used directly and transformed into marginal
densities by means of kernel density estimation, a “smooth” alternative to histograms. We derived
the relevant observable models from Newton’s law of gravitation, showing that the motion of Earth
and the target can be neglected.
Results. With our methods we can provide more detailed information about the parameters than a
frequentist analysis does. Still, a comparison with the Mizar A literature shows that both approaches
are compatible within the uncertainties.
Conclusions. We show that the Bayesian approach to inference has been implemented successfully
in BASE, a flexible tool for analysing astrometric and radial-velocity data.
5.1
Introduction
The search for extrasolar planets – places where one day humankind might find other
forms of life in the Universe – has been a subject of scientific investigation since the
nineteenth century, but only became successful in 1992 with the first confirmed discovery of
an exoplanet orbiting the pulsar PSR B1257+12 (Wolszczan and Frail 1992). Still, because
it is situated in an environment hostile to life as we know it, this case has been of less
relevance to the public than the first detection of a Sun-like planet-host star, 51 Pegasi
(Mayor and Queloz 1995). Since that time, more than 700 extrasolar planet candidates have
been unveiled in more than 500 systems, more than 90 of which show signs of multiplicity
(Schneider 2012).
Closely related to the detection of extrasolar planets is the characterisation of their
orbits. Both these tasks now profit from the existence of a variety of observational
techniques, which we briefly sketch in the following. Comprehensive reviews can be found
in Perryman (2000) and Deeg et al. (2007).
Direct observational methods refer to the imaging of exoplanets (e.g. Levine et al.
2009), which reflect the light of their host stars but also emit their own thermal radiation.
To overcome the major obstacle of the high brightness contrast between planet and star,
techniques such as coronagraphy (Lyot 1932; Levine et al. 2009), angular differential
imaging (Marois et al. 2006; Vigan et al. 2010), spectral differential imaging (Smith 1987;
Vigan et al. 2010), and polarimetric differential imaging (Kuhn et al. 2001; Adamson
et al. 2005) have been invented. Still, imaging has only revealed few detections and orbit
determinations so far.
The most productive methods in terms of the number of detected and characterised
exoplanets are of an indirect nature, observing the effects of the planet on other objects or
their radiation.
Of these, transit photometry (e.g. Charbonneau et al. 2000; Seager 2008) is noteworthy
because it has helped uncover more than 200 exoplanet candidates, plus over 2000 still
unconfirmed candidates from the Kepler space mission (Koch et al. 2010): small decreases
in the apparent visual brightness of a star during the primary or secondary eclipse point to
the existence of a transiting companion. These data allow one to determine the planet’s
radius and orbital inclination and may also yield information on the planet’s own radiation.
Timing methods include measurements of transit timing variations (TTV) and transit
duration variations (TDV) (e.g. Holman and Murray 2005; Nascimbeni et al. 2011) of
either binaries or stars known to harbour a transiting planet. The method used in the
first exoplanet detection (Wolszczan and Frail 1992) is pulsar timing, which relies on slight
anomalies in the exact timing of the radio emission of a pulsar and is sensitive to planets
in the Earth-mass regime.
Microlensing (Mao and Paczynski 1991; Gould 2009), which accounted for about 15
exoplanet candidates, uses the relativistic curvature of spacetime due to the masses of both
a lens star and its potential companion, with the latter causing a change in the apparent
magnification and thus the observed brightness of a background source.
Perhaps the most well-known technique, and one of those on which this article is based,
is known as Doppler spectroscopy or radial-velocity (RV) measurements (e.g. Mayor and
Queloz 1995; Lovis and Fischer 2010). With more than 500 exoplanet candidates, it has
been most successful in detecting new exoplanets and determining their orbits to date.
From a set of high-resolution spectra of the target star, a time series of the line-of-sight
velocity component of the star is deduced. These data allow one to determine the orbit in
terms of its geometry and kinematics in the orbital plane as well as the minimum planet
mass mp,min ≈ mp sin i. To derive the actual planet mass mp , the inclination i of the
orbit plane with respect to the sky plane needs to be derived with a different method, e.g.
astrometry. The RV technique is distance-independent in principle, but signal-to-noise
requirements do pose constraints on the maximum distance to a star. Stellar variability
sometimes makes this approach difficult because it alters the line shapes and thus mimics
RV variations. The signal in stellar RVs caused by a planet in a circular orbit has a
semi-amplitude of approximately
    K \approx m_p \sin i \, \sqrt{\frac{G}{m_\star a_{\mathrm{rel}}}},    (5.1)
where m_p, m_⋆, i, G, and a_rel are the masses of planet and host star, the orbital inclination,
Newton’s gravitational constant, and the semi-major axis of the planet’s orbit relative
to the star, respectively. This approximation holds for m_p ≪ m_⋆, which is true in most
cases. It should be noted that the sensitivity of the RV method decreases towards less
inclined (more face-on) orbits, which is an example of the selection effects inherent to any
planet-detection method.
Finally, astrometry (AM; e.g. Gatewood et al. 1980; Sozzetti 2005; Reffert 2009) – on
which this work is also based – is the oldest observational technique known in astronomy: a
stellar position is measured with reference to a given point and direction on sky. Astrometry
can thus be considered as complementary to Doppler spectroscopy, which measures the
kinematics perpendicular to the sky plane. In contrast to the latter, AM allows one to
determine the orientation of the orbital plane relative to the sky in terms of its inclination i
and the position angle Ω of the line of nodes with respect to the meridian of the target. A
planet in circular orbit around its host star displaces the latter on sky with an approximate
angular semi-amplitude of
    \alpha \approx \frac{m_p}{m_\star} \, \frac{a_{\mathrm{rel}}}{d},    (5.2)
where d is the distance between the star and the observer. Again, this approximation holds
for m_p ≪ m_⋆.
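To give a feeling for the orders of magnitude in eqs. (5.1) and (5.2), the following Python sketch evaluates both approximations. The example system (a Jupiter-mass planet at 5.2 au around a solar-mass star seen edge-on from 10 pc), the rounded physical constants and the function names are assumptions chosen for illustration; they are not taken from the article.

```python
import math

# Rounded constants in SI units (illustrative values)
G = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30  # kg
M_JUP = 1.898e27  # kg
AU = 1.496e11     # m
PC = 3.086e16     # m


def rv_semi_amplitude(m_p, m_star, a_rel, inc=math.pi / 2):
    """Eq. (5.1): RV semi-amplitude in m/s for a circular orbit, m_p << m_star."""
    return m_p * math.sin(inc) * math.sqrt(G / (m_star * a_rel))


def astrometric_semi_amplitude(m_p, m_star, a_rel, d):
    """Eq. (5.2): angular semi-amplitude in radians for a circular orbit."""
    return (m_p / m_star) * (a_rel / d)


K = rv_semi_amplitude(M_JUP, M_SUN, 5.2 * AU)
alpha = astrometric_semi_amplitude(M_JUP, M_SUN, 5.2 * AU, 10 * PC)
alpha_mas = math.degrees(alpha) * 3600.0 * 1000.0  # radians -> milliarcseconds

print(f"K = {K:.1f} m/s, alpha = {alpha_mas:.2f} mas")
```

With these inputs the RV signal is of order 10 m/s and the astrometric signal of order 0.5 mas. Quadrupling a_rel halves K but quadruples α, which is the opposite dependence on orbital size of the two techniques.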
Imaging astrometry, in its attempt to reach sufficient precision, still faces problems
due to various distortion effects. By contrast, interferometric astrometry has been used to
determine the orbits of previously known exoplanets, mainly with the help of space-borne
telescopes such as Hipparcos or the Hubble Space Telescope (HST), which presently still
outperform their Earth-bound competitors (e.g. McArthur et al. 2010). However, instruments
like PRIMA (Delplancke et al. 2000; Delplancke 2008; Launhardt et al. 2008) or GRAVITY
(Gillessen et al. 2010) at the ESO Very Large Telescope Interferometer promise to
advance ground-based AM further in the near future.
While planet-induced signals in AM and RVs are both approximately linear in planetary
mass mp , they differ in their dependence on the orbital semi-major axis arel (eq. (5.1)
and (5.2)). Doppler spectroscopy is more sensitive to smaller orbits (or higher orbital
frequencies, eq. (5.45)), while AM favours larger orbital separations, viz. longer periods.
In this article we introduce BASE, a Bayesian astrometric and spectroscopic exoplanet
detection and characterisation tool. Its goals are to fulfil two major tasks of exoplanet
science, namely the detection of exoplanets and the characterisation of their orbits. BASE
has been developed to provide for the first time the possibility of an integrated Bayesian
analysis of stellar astrometric and Doppler-spectroscopic measurements with respect to
their companions’ signals,1 correctly treating the measurement uncertainties and allowing
one to explore the whole parameter space without the need for informative prior constraints.
Still, users may readily incorporate prior knowledge, e.g. from previous analyses with
other tools, by means of priors on the model parameters. The tool automatically diagnoses
1
Although most methods described in this introduction apply to any kind of companion to a star, we
refer here to companions as “exoplanets”, irrespective of whether they are able to sustain hydrogen or
deuterium burning.
convergence of its Markov chain Monte Carlo (MCMC) sampler to the posterior and
regularly outputs status information. For orbit characterisation, BASE delivers important
results such as the probability densities and correlations of model parameters and derived
quantities.
Because published high-precision AM observations of potential exoplanet host stars are
still sparse, we used data of the well-known binary Mizar A to demonstrate the capabilities
of BASE. It is also planned to gain astrophysical insights into exoplanet systems using
BASE in the near future.
This article is organised as follows. Section 5.2 provides an overview of the most
often-used methods of data analysis, including Bayes’ theorem and MCMC as theoretical
and implementational foundations of this work, as well as a derivation of the necessary
observable models. BASE is described in section 5.3. In section 5.4, the target Mizar A
and the data used in this article are discussed. Section 5.5 presents and discusses our
analysis of Mizar A. Conclusions are drawn in section 5.6.
5.2
Methods and models
Data analysis is a type of inductive reasoning in that it infers general rules from specific
observational data (e.g. Gregory 2005b). These general rules are described by observable
models, simply called models in the following, which produce theoretical values of the
observables as a function of parameters. The primary tasks of data analysis are listed in
the following.
1. In model selection, the relative probabilities of a set of competing models {M_i},
chosen a priori, are assessed. Specifically, exoplanet detection tries to decide the
question of whether a certain star is accompanied by a planet or not, based on
available data. Additionally, model-assessment techniques can be used to determine
whether the most probable model describes the data accurately enough.
2. Parameter estimation aims to determine the parameters θ of a chosen model. This
is specifically referred to as exoplanet characterisation (or orbit determination) in
the present context.
3. The purpose of uncertainty estimation is to provide a measure of the parameters’
uncertainties.
Although model selection is equally important, in what follows we focus entirely on the
second and third tasks, viz. parameter and uncertainty estimation. This is because for a
known binary system, only one model is appropriate, viz. two bodies orbiting each other.
Accordingly, BASE can only perform model selection when it analyses data from stars for
which it is a-priori unknown whether a companion exists.
5.2.1
Likelihoods and frequentist inference
The well-established, conventional frequentist approach to inference is touched upon only
briefly here. Its name stems from the fact that it defines probability as the relative
frequency of an event. Measurements are regarded as values of random variables drawn
from an underlying population that is characterised by population parameters, e.g. mean
and standard deviation in the case of a normal (Gaussian) population. In the following,
we derive the joint probability density of the values of AM and RV data, known as the
likelihood L, which plays a central role in frequentist inference.
When combining data of different types, one should generally be aware that potential
systematic errors may differ between the data sets, e.g. due to a calibration error in one
instrument that renders the data inconsistent with each other. In this case, each data set
analysed separately would imply a different result. In other instances, systematic errors in
one set do not affect the other data: for example, any constant offset in radial velocity
is absorbed into parameter V , which is irrelevant to the analysis of astrometric data. In
the following derivation, we assumed that no systematic effects are present that led to
inconsistent data.
In the following, we assumed that the error ε_i of any datum y_i is statistically independent
of those of all other data and consists of two components, each distributed according to a
(uni- or bivariate) normal distribution with zero mean:
• a component ε_{0,i} corresponding to a nominal measurement error, whose distribution
is characterised by the covariance matrix E_{0,i} or standard deviation σ_{0,i} given with
the datum, for AM and RV data respectively, and
• a component ε_{+,i} representing e.g. instrumental, atmospheric or stellar effects
not modelled otherwise, whose distribution is characterised by scalar covariance
matrix E_+ = diag(τ_+^2, τ_+^2) – assuming no correlation between the noise in the two
AM components – or variance σ_+^2, respectively, where τ_+ and σ_+ are free noise-model
parameters.
The AM data covariance matrix E0,i of datum i, representing the uncertainty of and
correlation between the two components measured, can be written using singular-value
decomposition as
    E_{0,i} = R(-\phi_i) \begin{pmatrix} a_i^2 & 0 \\ 0 & b_i^2 \end{pmatrix} R(\phi_i),    (5.3)
where R(·), a_i, b_i and φ_i are the 2 × 2 passive rotation matrix, the nominal semi-major and
-minor axes of the uncertainty ellipse and the position angle of its major axis, respectively.
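As a small illustration of eq. (5.3), the covariance matrix of one astrometric datum can be assembled as follows. The sketch uses plain Python lists; the function names and the sign convention R(φ) = [[cos φ, sin φ], [−sin φ, cos φ]] for the passive rotation matrix are assumptions, since the convention is not spelled out in the text.

```python
import math


def rotation(phi):
    """2x2 passive rotation matrix R(phi) (assumed sign convention)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, s], [-s, c]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def am_covariance(a, b, phi):
    """Eq. (5.3): E_0 = R(-phi) diag(a^2, b^2) R(phi)."""
    d = [[a * a, 0.0], [0.0, b * b]]
    return matmul(matmul(rotation(-phi), d), rotation(phi))
```

Whatever the angle φ, the result is symmetric with determinant a²b² and trace a² + b², i.e. the rotation changes only the orientation of the uncertainty ellipse and not its size.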
Using characteristic functions, it is readily shown that the sum ε_i = ε_{0,i} + ε_{+,i} of the
two independent error components is again normally distributed, with zero mean and
covariance matrix E_i = E_{0,i} + E_+ or standard deviation σ_i = (σ_{0,i}^2 + σ_+^2)^{1/2}, respectively.
The probability density² of the values of N_AM two-dimensional AM data {r_i} and N_RV
RV data {v_i}, known as the likelihood L, is then given by
    \mathcal{L} = \mathcal{L}_{\mathrm{AM}} \, \mathcal{L}_{\mathrm{RV}},    (5.4)
where L_AM and L_RV are the likelihoods pertaining to the individual data types,
    \mathcal{L}_{\mathrm{AM}} = \left[ (2\pi)^{N_{\mathrm{AM}}} \prod_{i=1}^{N_{\mathrm{AM}}} \sqrt{\det E_i} \right]^{-1} \exp\left( -\frac{1}{2} \chi^2_{\mathrm{AM}} \right),    (5.5)
    \mathcal{L}_{\mathrm{RV}} = \left[ (2\pi)^{N_{\mathrm{RV}}/2} \prod_{i=1}^{N_{\mathrm{RV}}} \sigma_i \right]^{-1} \exp\left( -\frac{1}{2} \chi^2_{\mathrm{RV}} \right).    (5.6)
2
Throughout, we use the term probability density wherever it refers to a continuous quantity, as opposed
to probability for discrete quantities. Probability distribution, denoted by p(·), is a generic term used for
both cases.
Furthermore, the sums of squares χ²_AM and χ²_RV are defined by
    \chi^2_{\mathrm{AM}} \equiv \sum_{i=1}^{N_{\mathrm{AM}}} (r_i - r(\theta; t_i))^\top E_i^{-1} (r_i - r(\theta; t_i)),    (5.7)
    \chi^2_{\mathrm{RV}} \equiv \sum_{i=1}^{N_{\mathrm{RV}}} \left( \frac{v_i - v(\theta; t_i)}{\sigma_i} \right)^2,    (5.8)
where r_i, v_i are the ith AM and RV datum, r(·; ·), v(·; ·) the AM and RV model functions
and θ is the vector of model parameters. The relevant models are derived in section 5.2.3.
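For numerical work the likelihood is usually evaluated in logarithmic form. The following Python sketch implements the logarithm of the RV likelihood, eqs. (5.6) and (5.8), including the additional noise term σ_+. It is a sketch under assumptions rather than the implementation used in BASE: the function names are hypothetical, and circular_rv is a toy sinusoidal model standing in for the full RV model of section 5.2.3.

```python
import math


def circular_rv(theta, t):
    """Toy RV model (assumption): offset V plus a sinusoid of amplitude K, frequency f."""
    v_offset, k, f = theta
    return v_offset + k * math.sin(2.0 * math.pi * f * t)


def rv_log_likelihood(times, velocities, sigma0, model, theta, sigma_plus=0.0):
    """ln L_RV = -(N/2) ln(2 pi) - sum_i ln(sigma_i) - chi^2_RV / 2."""
    log_l = 0.0
    for t, v, s0 in zip(times, velocities, sigma0):
        var = s0 ** 2 + sigma_plus ** 2  # sigma_i^2 = sigma_{0,i}^2 + sigma_+^2
        resid = v - model(theta, t)
        log_l -= 0.5 * (math.log(2.0 * math.pi * var) + resid ** 2 / var)
    return log_l
```

Because σ_+ enters both the normalisation and χ², increasing it lowers the penalty for large residuals but also lowers the peak of the likelihood, which is what allows it to be treated as a free noise-model parameter.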
Parameter estimation. Frequentist parameter estimation is generally equivalent to
maximising L or minimising χ2 as functions of θ. The resulting best estimates of the
parameters θ̂ are therefore often called maximum-likelihood or least-squares estimates. For
linear models, χ2 (θ) is a quadratic function and consequently θ̂ can be found unambiguously
by matrix inversion. In the more realistic cases of nonlinear models, however, χ2 may have
many local minima, therefore care needs to be taken not to mistake a local minimum for
the global one. Several methods exist to this end, including evaluation of χ2 (θ) on a finite
grid, simulated annealing or genetic algorithms (e.g. Gregory 2005b).
Uncertainty estimation. Frequentist parameter uncertainties are usually quoted as
confidence intervals. Procedures to derive these are designed such that when repeated
many times based on different data, a certain fraction of the resulting intervals will
contain the true parameters. Popular methods use bootstrapping (Efron and Tibshirani
1993) or the Fisher information matrix, which is based on a local linearisation of the
model (e.g. Ford 2004). However, these methods suffer from specific caveats: the Fisher
matrix is only appropriate for a quadratic-shaped χ² in the vicinity of the minimum, and
bootstrapping, which relies on modified data, may lead to severe misestimation of the
parameter uncertainties, especially when these are large (Vogt et al. 2005).
5.2.2
Bayesian inference
Bayesian inference (e.g. Sivia 2006), which has gained popularity in various scientific
disciplines during the past few decades, defines probability as the degree of belief in a
certain hypothesis H. While this is sometimes criticised as leading to subjective assignments
of probabilities, Bayesian probabilities are not subjective if they are based on all relevant
knowledge K, hence different people having the same knowledge will assign them the same
value (e.g. Sivia 2006). Thus, Bayesian probabilities are conditional on the knowledge K,
and this conditionality should be stated explicitly, as in the following equations.
Bayes’ theorem
In the eighteenth century, Thomas Bayes laid the foundation of a new approach to inference
with what is now known as Bayes’ theorem (Bayes and Price 1763). For the purpose of
parameter and uncertainty estimation, the hypothesis H refers to the values of model
parameters θ, and Bayes’ theorem can be expressed as
    p(\theta \mid D, M, K) = \frac{p(D \mid \theta, M, K) \cdot p(\theta \mid M, K)}{p(D \mid M, K)},    (5.9)
where p(·) is a probability (density). Furthermore, D ≡ {(t_i, y_i)} is the set of pairs of
observational times³ and corresponding data values, and M denotes the particular model
assumed. As mentioned above, all probabilities are also conditional on the knowledge K,
including statements on the types of parameters and the parameter space Θ (which we
assume to be a subset of ℝ^k with k ∈ ℕ) as well as the noise model.
Using Bayes’ theorem, the aim is to determine the posterior P(θ) ≡ p(θ|D, M, K), i.e.
the probability distribution of the parameters θ in light of the data D, given the model M
and prior knowledge K. The other terms, located on the right-hand side of the theorem,
are explained below.
• The term prior refers to the probability distribution p(θ|M, K) of the parameters θ
given only the model and prior knowledge K; it characterises the knowledge about
the parameters present before considering the data. For objective choices of priors,
based on classes of parameters, see section 5.7.
• The likelihood p(D|θ, M, K) is the probability distribution of the data values D,
given the times of observation, the model and the parameters. It is introduced in the
context of frequentist inference in section 5.2.1.
• The evidence is the probability distribution of the data values D, given the times of
observation and the model but neglecting the parameter values,
    p(D \mid M, K) = \int p(D, \theta \mid M, K) \, d\theta    (5.10)
                   = \int p(\theta \mid M, K) \, p(D \mid \theta, M, K) \, d\theta.    (5.11)
It equals the integral of the product of prior and likelihood over the parameter
space Θ and plays the role of a normalising constant, which is hard to calculate in
practice, however.
It may be instructive here to note that the frequentist approach of maximising the
likelihood p(D|θ, M, K) is equivalent to maximising the posterior when assuming uniform
priors p(θ|M, K). This can be seen by inserting p(θ|M, K) = const into eq. (5.9), which
leads to
P(θ) = p(θ|D, M, K) ∝ p(D|θ, M, K).
(5.12)
However, this maximum-likelihood approach ignores the fact that uniform priors are
not always the most objective choice (see section 5.7) and the posterior cannot be fully
characterised just by the position of its maximum. Still, the latter can be used as a
posterior summary in the Bayesian framework (section 5.2.2).
Posterior inference
Sampling from the posterior. To estimate the normalised posterior P(θ), i.e. the
probability distribution of the parameters θ in light of the data, N samples of the parameter
vector {θ^(j) : j = 1, …, N} are collected using the Markov chain Monte Carlo method
(MCMC; e.g. Gilks et al. 1996) in the variant described by the Metropolis-Hastings
3
In general, ti is the value of an independent variable, which may e.g. be temporal or spatial. We
assumed the measurement durations to be short in comparison with the characteristic time of orbital
motion, given by the orbital period, and thus the observations to take place at points in time ti which are
known exactly.
algorithm (MH; Metropolis et al. 1953; Hastings 1970), which performs a random walk
through parameter space. The distribution of these samples – excluding the first M < N
burn-in samples, which are still strongly correlated with the starting state θ^(0) – converges
to the posterior P(·) in the limit of many samples if the chain obeys certain regularity
conditions (e.g. Roberts 1996). Methods for setting the starting state θ^(0) and detecting
convergence are described in section 5.3.4.
Starting from the current chain link θ^(j) ∈ Θ, the following steps lead to the next
link, according to the MH algorithm and the hit-and-run sampler (step 1; Boneh and
Golan 1979; Smith 1980):
1. Set up a candidate C:
(a) sample a direction, viz. a random unit vector d ∈ ℝ^k from an isotropic density
over the k-dimensional unit sphere;
(b) sample a (signed) distance r from a uniform distribution over the interval {r′ ∈
ℝ : θ^(j) + r′d ∈ Θ};
(c) set candidate C ≡ θ^(j) + rd;
2. calculate the acceptance probability
    \alpha(\theta^{(j)}, C) \equiv \min\left( 1, \frac{\mathcal{P}(C)}{\mathcal{P}(\theta^{(j)})} \right);    (5.13)
3. draw a random number β from a uniform distribution over the interval [0, 1];
4. if β ≤ α, accept the candidate, i.e. set the next link θ^(j+1) ≡ C; otherwise,
θ^(j+1) ≡ θ^(j).
The hit-and-run sampler, compared to alternatives like the Gibbs sampler (Geman and
Geman 1984), favours exploring of the whole parameter space Θ without becoming “trapped”
in the vicinity of a local posterior maximum (Gilks et al. 1996).
Because only the ratio of two posterior values is used (step 2), the normalising evidence p(D|M, K) – a constant that is difficult to determine, as mentioned above – is
irrelevant in the MH algorithm.
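The four steps above can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not part of the article: the parameter space Θ is taken to be an axis-aligned box given by lower and upper bounds, the posterior is supplied as an unnormalised log-density, and the function names are hypothetical. Parallel tempering, which BASE combines with this sampler, is omitted.

```python
import math
import random


def random_direction(k, rng):
    """Isotropic random unit vector in R^k (normalised Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def chord_limits(theta, d, lower, upper):
    """Interval of signed distances r for which theta + r*d stays inside the box."""
    r_min, r_max = -math.inf, math.inf
    for x, dx, lo, hi in zip(theta, d, lower, upper):
        if dx > 0:
            r_min, r_max = max(r_min, (lo - x) / dx), min(r_max, (hi - x) / dx)
        elif dx < 0:
            r_min, r_max = max(r_min, (hi - x) / dx), min(r_max, (lo - x) / dx)
    return r_min, r_max


def hit_and_run_mh(log_post, theta0, lower, upper, n, rng=None):
    """Return a chain of n + 1 links, starting at theta0 (finite log_post assumed)."""
    rng = rng or random.Random()
    theta, lp = list(theta0), log_post(theta0)
    chain = [theta]
    for _ in range(n):
        d = random_direction(len(theta), rng)                  # step 1(a)
        r = rng.uniform(*chord_limits(theta, d, lower, upper))  # step 1(b)
        cand = [x + r * dx for x, dx in zip(theta, d)]          # step 1(c)
        lp_cand = log_post(cand)
        # steps 2-4: accept with probability min(1, P(C) / P(theta))
        if lp_cand >= lp or rng.random() <= math.exp(lp_cand - lp):
            theta, lp = cand, lp_cand
        chain.append(theta)
    return chain
```

Working with log-densities avoids numerical underflow of the posterior ratio. Since the candidate is drawn uniformly along the chord through the current link, the proposal is symmetric, so the plain ratio of eq. (5.13) suffices.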
Marginalisation and density estimation. Obviously, the posterior mode alone reveals
only one particular aspect of the posterior. However, as a density over k > 2 dimensions,
the posterior cannot be displayed unambiguously in a figure.
To obtain a plottable summary of the posterior, a set of marginal posteriors P_i(·), i.e.
probability densities over each of the parameters θ_i, and joint marginal posteriors P_{i,j}(·, ·)
over two parameters, can be estimated. Theoretically, these densities are derived from the
posterior density by marginalisation, viz. integration over all other parameters,
    \mathcal{P}_i(\theta_i) \equiv p(\theta_i \mid D, M, K) = \int \mathcal{P}(\theta) \, d\theta_{\setminus i}    (5.14)
    \mathcal{P}_{i,j}(\theta_i, \theta_j) \equiv p(\theta_i, \theta_j \mid D, M, K) = \int \mathcal{P}(\theta) \, d\theta_{\setminus i,j},    (5.15)
where dθ_{\i} ≡ ∏_{k≠i} dθ_k and dθ_{\i,j} ≡ ∏_{k≠i,j} dθ_k. Practically, marginal posteriors are
estimated by only considering component i of the collected samples {θ^(j)} and performing
a density estimation based on them. Joint marginal posteriors are derived analogously,
based on components i and j.
Several density estimators exist for deriving a density from a set of samples. One of
them – the oldest and probably most popular type, known as the histogram – has several
drawbacks: its shape depends on the choice of origin and bin width, and when used with
two-dimensional data, a contour diagram cannot easily be derived from it. Generalising
the histogram to kernel density estimation over one or two dimensions, the samples can be
represented more accurately and unequivocally (Silverman 1986).
Below, we refer only to the simpler one-dimensional case. There, the kernel estimator
can be written as
    F(x) \equiv \frac{1}{N \sigma_{\mathrm{ker}}} \sum_{j=1}^{N} K\!\left( \frac{x - X^{(j)}}{\sigma_{\mathrm{ker}}} \right),    (5.16)
where x is a scalar variable, K(·) the kernel, σker the window width and X (j) are the
underlying samples. As detailed by Silverman (1986), the efficiency of various kernels in
terms of the achievable mean integrated square error is very similar, and therefore the
choice of kernel can be based on other requirements. Since no differentiability is required
for the estimated densities and computational effort plays an important practical role, a
triangular kernel,
Ktri (x) ≡ max (1 − |x|, 0) ,
(5.17)
was selected for estimating the marginal posteriors. The window width is chosen following
the recommendations of Silverman (1986),
    \sigma_{\mathrm{ker}} \equiv 2.189 \cdot \min\left( \sigma_{\mathrm{samp}}, \frac{r_{\mathrm{iq}}}{1.34} \right) (N - M)^{-1/5},    (5.18)
where σ_samp, r_iq, N and M are the sample standard deviation, the interquartile range of
the samples, number of samples and burn-in length, respectively.
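Equations (5.16) to (5.18) translate directly into a short Python sketch. The function names are hypothetical, the list passed in is assumed to contain only the N − M post-burn-in samples of one parameter, and the quartiles are computed with the default method of the standard library, which may differ slightly from the estimator used in BASE.

```python
import statistics


def window_width(samples):
    """Eq. (5.18) for a list of post-burn-in samples of one parameter."""
    n = len(samples)
    sigma_samp = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)  # quartiles
    return 2.189 * min(sigma_samp, (q3 - q1) / 1.34) * n ** (-0.2)


def triangular_kde(samples, x, width=None):
    """Eqs. (5.16) and (5.17): kernel density estimate at x with a triangular kernel."""
    if width is None:
        width = window_width(samples)
    total = sum(max(1.0 - abs((x - s) / width), 0.0) for s in samples)
    return total / (len(samples) * width)
```

Each sample contributes a triangle of unit area centred on its value, so the estimate integrates to one by construction. When evaluating the density on a grid, it is cheaper to compute the window width once and pass it in explicitly.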
Periodogram mode. By default, the window width for marginal posteriors is based on
the MCMC sample standard deviation σsamp (eq. (5.18)). If there are multiple maxima,
however, this can lead to artificially broad peaks, which can be particularly problematic for
the orbital frequency f (see section 5.2.3), which plays an important role in distinguishing
different solutions in orbit-related parameter estimation. Therefore, BASE includes a
periodogram mode, in which the window width of the marginal posterior of f (a Bayesian
analogue to the frequentist periodogram) is reduced according to the following procedure.
1. Initially assume a default window width as given by eq. (5.18);
2. estimate the marginal posterior of f and find its local maximum f_max nearest to
posterior mode f̂, as well as the local minimum f_min nearest to f_max;
3. calculate the marginal-posterior standard deviation σ′ over the half-peak between
f_max and f_min;
4. re-calculate the window width using eq. (5.18), with √2 σ′ replacing σ_samp;
5. repeat step 2, but only consider local minima with ordinates p(f_min|D, M, K) ≤
0.5·p(f_max|D, M, K) in order not to be misled by weak marginal-posterior fluctuations;
6. repeat steps 3 and 4.
Parameter estimation. To obtain a single most probable estimate of the parameters,
the posterior density P(·) can be summarised by the posterior mode θ̂ ∈ Θ, i.e. the point
where the posterior assumes its maximum value,
θ̂ ≡ arg maxθ P(θ).
(5.19)
This point, also known as the maximum a-posteriori (MAP) parameter estimate, can be
approximated by the MCMC sample with highest posterior density, based on the values of
P(θ (j) ) already calculated during sampling. This approximation neglects the finite spacing
between samples.
Alternatively, the following scalar summaries can be inferred from the samples or, for
the marginal mode, from the marginal posteriors Pi (θ):
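• mean or expectation θ̄,
\[ \bar{\theta} \equiv \int_{-\infty}^{\infty} \theta \, P_i(\theta) \, d\theta, \tag{5.20} \]
• median θ̃,
\[ \int_{-\infty}^{\tilde{\theta}} P_i(\theta) \, d\theta \equiv 0.5, \tag{5.21} \]
• marginal mode θ̌,
\[ \check{\theta} \equiv \arg\max_{\theta} P_i(\theta). \tag{5.22} \]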
Uncertainty estimation. For uncertainty estimation, highest posterior-density intervals
(HPDIs) can be derived from the posterior samples. For any given C ∈ R with 0 < C < 1,
a HPDI IHPD ≡ [a, b] is defined as the smallest interval over which the posterior contains a
probability C,
\[ \int_a^b P_i(\theta) \, d\theta = C \quad \text{s.t.} \quad b - a = \min. \tag{5.23} \]
BASE automatically calculates HPDIs of probability contents 50%, 68.27%, 95%, 95.45%,
99%, and 99.73%; others may be added on user request.
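As an illustration of eq. (5.23), an HPDI can be approximated from posterior samples by sorting them and taking the narrowest window that holds the requested fraction of samples. The Python sketch below is not taken from BASE; the function name `hpdi`, the gamma-distributed toy samples and the comparison with an equal-tailed interval are assumptions made for this example.

```python
import numpy as np

def hpdi(samples, content):
    """Shortest interval [a, b] that contains a fraction `content` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(content * n))            # number of samples the interval must hold
    if not 2 <= k <= n:
        raise ValueError("content must select between 2 and n samples")
    widths = x[k - 1:] - x[:n - k + 1]       # width of every window of k sorted samples
    i = int(np.argmin(widths))               # narrowest window approximates the HPDI
    return x[i], x[i + k - 1]

rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.0, size=100_000)   # skewed toy marginal posterior

a, b = hpdi(samples, 0.6827)
lo, hi = np.quantile(samples, [0.15865, 0.84135])         # equal-tailed interval
print(f"HPDI         : [{a:.3f}, {b:.3f}]  width {b - a:.3f}")
print(f"equal-tailed : [{lo:.3f}, {hi:.3f}]  width {hi - lo:.3f}")
print(f"mean {samples.mean():.3f}, median {np.median(samples):.3f}")
```

For a skewed density such as this one, the HPDI is shorter than the equal-tailed interval of the same probability content and is not centred on the mean or median, in line with the remark on asymmetry that follows.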
In contrast to frequentist confidence intervals, HPDIs are generally not symmetric,
meaning that their midpoint does not correspond to the best estimate. This is because the
marginal posteriors may be asymmetric, including any amount of skew.
It should also be noted that HPDIs are not useful with multimodal posteriors because
several modes cannot be meaningfully summarised by one interval per dimension, nor by a
single best estimate.
To quantify linear dependencies between parameters, the a-posteriori Pearson correlation
coefficient,
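\[ r_{\theta_1,\theta_2} \equiv \frac{\operatorname{cov}(\theta_1, \theta_2)}{\sqrt{\operatorname{var}(\theta_1)\,\operatorname{var}(\theta_2)}} = r_{\theta_2,\theta_1}, \tag{5.24} \]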
can be inferred from the samples. There may also be nonlinear correlations between
parameters that are not described by the correlation coefficients. One should also be aware
that for strong linear or non-linear relationships between parameters, uncertainties of single
parameters as characterised by HPDIs may not be meaningful.
We stress that the (joint) marginal posteriors can – and should – always be referred to,
especially when best estimates and/or HPDIs do not adequately characterise the posterior.
The availability of these more informative densities is one of the advantages of a Bayesian
approach with posterior sampling.
5.2.3 Observable models
Independent of the chosen approach to inference – frequentist or Bayesian –, theoretical
values of the observables need to be calculated and compared to the data by means of
the likelihood. To this end, an observable model is set up for each relevant type of data,
i.e. a function f (θ; t) of the model parameters θ and time t. An overview of all model
parameters used in this work is given in Table 5.1, while table 5.2 lists quantities that can
be derived from them.
In this section, we only sketch the derivation of the observable models, beginning with
a single-planet system. For an in-depth treatment of celestial mechanics, the interested
reader is referred e.g. to Moulton (1984).
Stellar motion in the orbital plane
Newton’s Law of Gravity governs the motion of a non-relativistic two-body system of star
and planet, whose centre of mass (CM) rests in some inertial reference frame. A solution
to it is given by both the star and the planet moving in elliptical Keplerian orbits with a
fixed common orbital plane and each with one focus coinciding with the CM.
To describe the stellar position, whose variation is observable with astrometry and
Doppler spectroscopy, we set up a coordinate system S1 whose origin is identical to the
CM, z-axis perpendicular to the orbital plane and the vector from the CM to the periapsis
orientated in positive x-direction. In S1 , the stellar barycentric position is given by
\[ \boldsymbol{r}_1 = a_\star \begin{pmatrix} \cos E - e \\ \sqrt{1 - e^2}\,\sin E \\ 0 \end{pmatrix} = \boldsymbol{r}_1(E; a_\star, e), \tag{5.25} \]
where a⋆ is the semi-major axis, e the eccentricity and E the eccentric anomaly.
The time-dependent eccentric anomaly is determined implicitly by Kepler’s equation,
E − e sin E = 2π(χ + f (t − t1 )) = M (t),
(5.26)
where f = P⁻¹ is the orbital frequency, P the orbital period, t1 the time of first measurement and M (·) the mean anomaly, which varies uniformly over the course of an orbit.
Furthermore, following Gregory (2005a), we use
\[ \chi \equiv \frac{M(t_1)}{2\pi} = f\,(t_1 - T), \tag{5.27} \]
with T standing for the last time the periapsis was passed prior to t1 (time of periapsis).
Kepler’s equation is transcendental and needs to be solved numerically to obtain E for
every relevant combination of e and M .
BASE performs a one-time pre-calculation of E over an (e, M )-grid, which, because of
the monotonicity of E as an (implicit) function of e and M , allows one to reduce the effort
of numerically solving eq. (5.26) by providing lower and upper bounds on E.
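The numerical solution of Kepler's equation (5.26) can be sketched in a few lines of Python. This is an illustration only, not BASE's Fortran implementation: instead of the pre-computed (e, M) grid, it brackets E with the analytic bounds M − e ≤ E ≤ M + e (which follow from |e sin E| ≤ e) and applies Newton's method with a bisection fallback. The function names, tolerance and iteration limit are assumptions.

```python
import math

def mean_anomaly(t, t1, f, chi):
    """Mean anomaly M(t) = 2*pi*(chi + f*(t - t1)), cf. eq. (5.26)."""
    return 2.0 * math.pi * (chi + f * (t - t1))

def eccentric_anomaly(M, e, tol=1e-12, max_iter=100):
    """Solve E - e*sin(E) = M for E, assuming 0 <= e < 1."""
    M = math.fmod(M, 2.0 * math.pi)          # the orbit is periodic in M
    if M < 0.0:
        M += 2.0 * math.pi
    lo, hi = M - e, M + e                    # analytic bracket for the root
    E = M                                    # starting guess
    for _ in range(max_iter):
        g = E - e * math.sin(E) - M          # residual; increases monotonically with E
        if abs(g) < tol:
            break
        if g > 0.0:
            hi = E
        else:
            lo = E
        E_new = E - g / (1.0 - e * math.cos(E))   # Newton step
        if not (lo < E_new < hi):                 # fall back to bisection if it leaves the bracket
            E_new = 0.5 * (lo + hi)
        E = E_new
    return E

E = eccentric_anomaly(mean_anomaly(t=10.0, t1=0.0, f=0.05, chi=0.25), e=0.6)
print(E, E - 0.6 * math.sin(E))   # the second value reproduces M modulo 2*pi
```

A pre-computed grid, as used by BASE, would supply tighter starting bounds than M ± e and thus reduce the number of iterations per call.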
By reference to eq. (5.25) and (5.26), it is readily shown that the stellar coordinates
are periodic functions of χ with period 1. We therefore call χ a cyclic parameter and treat
it as lying in the range [0, 1).
Figure 5.1: Definition of the angles ω? , i, Ω. a) From S1 to S2 , the star and its sense of
rotation about the CM are indicated; the dotted line marks the major axis of the orbital
ellipse. b) From S2 to S3 , the observer and line of sight are indicated. c) From S3 to S4 ,
the positive x4 -axis points northward along the meridian of the CM.
Transformation into the reference system
To derive the stellar barycentric position as seen from the perspective of an observer,
we transform S1 into a new coordinate system S4 by three successive rotations. These
are described by three Euler angles, termed in our case argument of the periapsis ω⋆,
inclination i and position angle of the ascending node Ω, and are carried out as follows
(fig. 5.1):
1. Rotate S1 about its z1 -axis by (−ω⋆) such that the ascending node⁴ of the stellar
orbit lies on the positive x2 -axis.
2. Rotate S2 about its x2 -axis by (+i) such that the new z3 -axis passes through the
observer.
3. Rotate S3 about its z3 -axis by (−Ω) such that the new x4 -axis is parallel to the
meridian of the CM and points in a northern direction.
Thus, the stellar barycentric position has new coordinates
r 4 = Rzxz r 1 ,
(5.28)
with matrix
\[ R_{zxz} \equiv \begin{pmatrix} A & F & J \\ B & G & K \\ C & H & L \end{pmatrix} \tag{5.29} \]
defining the rotations; its components are
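\begin{align}
A &= \cos\Omega \cos\omega_\star - \sin\Omega \cos i \sin\omega_\star \tag{5.30} \\
B &= \sin\Omega \cos\omega_\star + \cos\Omega \cos i \sin\omega_\star \tag{5.31} \\
F &= -\cos\Omega \sin\omega_\star - \sin\Omega \cos i \cos\omega_\star \tag{5.32} \\
G &= -\sin\Omega \sin\omega_\star + \cos\Omega \cos i \cos\omega_\star \tag{5.33} \\
J &= -\sin\Omega \sin i \tag{5.34} \\
K &= \cos\Omega \sin i \tag{5.35} \\
C &= -\sin i \sin\omega_\star \tag{5.36} \\
H &= -\sin i \cos\omega_\star \tag{5.37} \\
L &= \cos i. \tag{5.38}
\end{align}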
A, B, F, and G are known as the Thiele-Innes constants, first introduced by Thiele (1883).
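The components (5.30) to (5.38) translate directly into code. The following Python sketch assembles the matrix of eq. (5.29) and applies it to the orbital-plane position of eq. (5.25); the function names and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def rotation_matrix(omega_star, inc, Omega):
    """R_zxz with the components A ... L of eqs. (5.30)-(5.38); angles in radians."""
    cO, sO = np.cos(Omega), np.sin(Omega)
    ci, si = np.cos(inc), np.sin(inc)
    cw, sw = np.cos(omega_star), np.sin(omega_star)
    A = cO * cw - sO * ci * sw
    B = sO * cw + cO * ci * sw
    C = -si * sw
    F = -cO * sw - sO * ci * cw
    G = -sO * sw + cO * ci * cw
    H = -si * cw
    J = -sO * si
    K = cO * si
    L = ci
    return np.array([[A, F, J],
                     [B, G, K],
                     [C, H, L]])

def position_s1(E, a_star, e):
    """Stellar barycentric position in the orbital-plane system S1, eq. (5.25)."""
    return a_star * np.array([np.cos(E) - e, np.sqrt(1.0 - e**2) * np.sin(E), 0.0])

R = rotation_matrix(omega_star=0.7, inc=1.1, Omega=2.3)   # arbitrary example angles
r1 = position_s1(E=1.0, a_star=1.0, e=0.3)
r4 = R @ r1                                               # eq. (5.28)
print(r4)
```

Because the matrix is a pure rotation, it is orthogonal with unit determinant, so the distance of the star from the CM is the same in S1 and S4. Since the z1-component of r1 vanishes, only the first two columns of the matrix contribute to r4.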
By taking the time derivative of eq. (5.28), we obtain the stellar velocity in S4 ,
\[ \boldsymbol{v}_4 = R_{zxz} \frac{d\boldsymbol{r}_1}{dE} \dot{E} = \frac{2\pi f a_\star}{1 - e \cos E} \, R_{zxz} \begin{pmatrix} -\sin E \\ \sqrt{1 - e^2}\,\cos E \\ 0 \end{pmatrix}. \tag{5.39} \]
⁴ The ascending node is the point of intersection of the orbit and the sky plane where the moving object
passes away from the observer. In step 2, a positive rotation angle is used to ensure that the node is indeed
ascending (fig. 5.1 b).
Relation to the planetary orbit
By reference to the above results, the observables of AM and RV are easily derived
(section 5.2.3). Based on the following simple relation, they can be parameterised by
quantities pertaining to the planetary instead of the stellar orbit.
According to the definition of the CM, the line connecting star and planet contains the
CM and the ratio of their respective distances from the CM equals the inverse mass ratio,
\[ \overrightarrow{CS} = -\frac{m_p}{m_\star} \, \overrightarrow{CP}, \tag{5.40} \]
where C, S and P stand for CM, star and planet, respectively. This implies a simple
relation between the orbits of the star and the planet as follows. The two bodies orbit
the CM with a common orbital frequency f and time of periapsis T . With regard to the
corresponding periapsis, they always have the same eccentric anomaly E. Their orbital
shapes, viz. eccentricities e, are identical as well.
Furthermore, the two bodies share the same sense of orbital revolution, hence those
nodes of both orbits which lie on the positive x2 -axis are ascending. Consequently, the only
Euler angle that differs between stellar and planetary orbit is the argument of periapsis,
which differs by π because star and planet are in opposite directions from the CM.
Observables
To express the stellar barycentric position as a two-dimensional angular position, we
perform a final transformation of S4 into a spherical coordinate system S5 with radial,
elevation and azimuthal coordinates (r, δ, α). Its origin is identical with the observer, its
reference plane coincides with the (y4 , z4 )-plane and its fixed direction is −z4 . In S5 , the
radial coordinate of the CM equals a distance d = 1 AU ϖ⁻¹, with ϖ being the parallax.
Stellar coordinates r 4 therefore correspond to a two-dimensional angular position
\[ \boldsymbol{r}_5 \equiv \begin{pmatrix} \delta \\ \alpha \end{pmatrix} = \frac{1}{d} \begin{pmatrix} x_4 \\ y_4 \end{pmatrix} \tag{5.41} \]
in S5 , where δ and α are called the declination and right ascension, respectively. Using
eq. (5.25), (5.28) and (5.29), the model function for the angular position of the star with
respect to the CM becomes
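\[ \boldsymbol{r}(\boldsymbol{\theta}; t) \equiv \boldsymbol{r}_5 = a' \left[ (\cos E - e) \begin{pmatrix} A \\ B \end{pmatrix} + \sqrt{1 - e^2}\,\sin E \begin{pmatrix} F \\ G \end{pmatrix} \right], \tag{5.42} \]
with a′ ≡ ϖ a⋆ (1 AU)⁻¹. In practice, complications arise because the stellar position is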
often measured relative to a physically unattached reference star, whose distance and
motion differ, and not relative to the unobservable CM. By contrast, for a visual binary,
the companion can be used as a reference, yielding the simple model described below.
Other astrometric effects may be caused by the accelerated motion of Earth-bound
observers around the solar system barycentre (SSB). This is discussed for the case of the
binary Mizar A in section 5.2.3.
In contrast to astrometry, RV data are usually automatically transformed into an
inertial frame resting with respect to the SSB (e.g. Lindegren and Dravins 2003), which
allows the Earth’s motion to be neglected in this model and treats the observer’s rest
frame as being inertial. The model function for the stellar radial velocity measured by an
observer is thus given by (v 5 )r , with
\[ (\boldsymbol{v}_5)_r - V = -(\boldsymbol{v}_4)_z = K \, \frac{\sin E \sin\omega - \sqrt{1 - e^2}\,\cos E \cos\omega}{1 - e \cos E}, \tag{5.43} \]
where V is the RV offset, consisting of the radial velocity of the CM plus an offset due to
the specific calibration of the instrument – which therefore differ, in general, between RV
data sets – and K is the RV semi-amplitude, which can be expressed as
\[ K = 2\pi f a_\star \sin i = \sqrt[3]{2\pi G f}\; m_p \, (m_p + m_\star)^{-2/3} \sin i. \tag{5.44} \]
The last equality holds because of Kepler’s third law,
\[ (a_p + a_\star)^3 f^2 = \frac{G}{4\pi^2} \, (m_p + m_\star) \tag{5.45} \]
and the definition of the CM (eq. (5.40)). Owing to eq. (5.44), only one of a? , K needs to
be employed in the AM and RV models; for BASE, K was adopted.
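To make eqs. (5.43) and (5.44) concrete, the following Python sketch evaluates the semi-amplitude K for a Jupiter-like planet around a Sun-like star and the RV model for a given eccentric anomaly. It is an illustration under stated assumptions rather than BASE code: the function names are hypothetical, and the rounded values of G, the masses and the orbital period are inserted here only as an example.

```python
import math

G_NEWTON = 6.674e-11      # gravitational constant in m^3 kg^-1 s^-2 (rounded)

def semi_amplitude(f, m_p, m_star, sin_i=1.0):
    """RV semi-amplitude K of eq. (5.44); f in s^-1, masses in kg, result in m/s."""
    return (2.0 * math.pi * G_NEWTON * f) ** (1.0 / 3.0) * m_p * (m_p + m_star) ** (-2.0 / 3.0) * sin_i

def radial_velocity(E, e, omega, K, V=0.0):
    """Stellar radial velocity of eq. (5.43) for eccentric anomaly E (radians)."""
    num = math.sin(E) * math.sin(omega) - math.sqrt(1.0 - e * e) * math.cos(E) * math.cos(omega)
    return V + K * num / (1.0 - e * math.cos(E))

# Example values (assumptions): Jupiter-like planet, Sun-like star, edge-on orbit.
f_jup = 1.0 / (4332.59 * 86400.0)          # orbital frequency in s^-1
K_jup = semi_amplitude(f_jup, m_p=1.898e27, m_star=1.989e30)
print(f"K = {K_jup:.2f} m/s")
print(radial_velocity(E=math.pi, e=0.0, omega=0.0, K=K_jup))   # circular orbit: equals V + K
```

Since K combines f, a⋆ and sin i, the RV model depends on the semi-major axis only through K, which is why one of a⋆ and K suffices as a model parameter.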
As an aside, in the literature (e.g. Gregory 2005a) often a different version of eq. (5.43)
is employed that involves ν instead of E and the alternative definition Kalt ≡ K/√(1 − e²)
(see table 5.2).
Binary system. If the primary and secondary binary components assume the roles of
star and planet, respectively, the above reasoning also yields the observables of a binary
system.
For visual binaries, AM measurements often refer to the position of the secondary with
respect to the primary. From the definition of the CM, it follows that the orbit of the
secondary with respect to the primary is identical with its barycentric orbit but scaled by a
factor (m1 + m2) m1⁻¹, or with the semi-major axis equaling the sum of the two components’
barycentric semi-major axes,
arel = a1 + a2 .
(5.46)
Thus, the AM model of eq. (5.42) can be used for a binary with arel replacing a⋆, ω2 + π
replacing ω⋆ and
\[ a'_{\mathrm{rel}} \equiv \frac{\varpi \, a_{\mathrm{rel}}}{1\,\mathrm{AU}}. \tag{5.47} \]
Equation (5.43) yields the RV of component i if K is replaced by (−1)^(i+1) Ki and ω by ω2.
If AM and RV data are combined, BASE uses K1,2 instead of a1,2 and calculates arel using
the equivalent of eq. (5.44) in combination with eq. (5.46).
Effects of the motion of observer and CM
If we consider the observer, whose position relative to the CM defines the orientation of
S2...5 , to be located on Earth and therefore to be subject to Earth’s accelerated motion
around the SSB, the angles Ω, i, ω as defined above are not constant but rather functions
of time; an additional source of their variation is the (assumedly linear) proper motion of
the CM. Consequently, these angles are not appropriate constant model parameters even
within a timespan of a few months. Furthermore, the systems S2...5 defined above are not
strictly inertial, which renders the simple coordinate transformations invalid. However,
we argue below that the variation in these angles is so weak that these effects are indeed
negligible in the context of this work.
Because AM determines instantaneous positions, it is directly influenced by changes
in the positions of the observer and the CM. In the following, we assess for Mizar A the
greatest possible magnitude of a change in the AM position ∆r ≡ |∆r| due to angular
changes ∆Ω, ∆i, ∆ω > 0, which are in turn caused by the varying relative position of
observer and CM. Finally, we compare ∆r to the AM measurement uncertainties.
An upper limit on each of the angular changes ∆Ω, ∆i, ∆ω can be derived from the
proper motion of the CM and the annual parallax from the Earth’s motion, as
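\[ |\Delta\Omega|, |\Delta i|, |\Delta\omega| \le 2\varpi + \mu \, \Delta t, \tag{5.48} \]
where the first term corresponds to the annual parallax and µ is the magnitude of the
CM’s proper motion. For the second term, we have \( \mu \, \Delta t \approx \sqrt{\mu_{\alpha*}^2 + \mu_\delta^2} \, \Delta t \).
Using sin(x + ξ) ≈ sin x + ξ cos x, for ξ ≪ 1, along with eq. (5.30) – (5.35) and (5.48)
as well as the final results for Ω, i, ω and ϖ (using the values θ̂ from table 5.9) and the
proper motion from table 5.3 yields the following maximum changes of the Thiele-Innes
constants:
\begin{align}
|\Delta A| &\lesssim 8.30 \times 10^{-6} \tag{5.49} \\
|\Delta B| &\lesssim 8.00 \times 10^{-6} \tag{5.50} \\
|\Delta F| &\lesssim 7.99 \times 10^{-6} \tag{5.51} \\
|\Delta G| &\lesssim 4.34 \times 10^{-6}. \tag{5.52}
\end{align}
Hence, with \( \Delta r \approx \sqrt{|\Delta\delta|^2 + |\Delta(\alpha \cos\delta)|^2} \),
\begin{align}
|\Delta\delta| &\le a'_{\mathrm{rel}} \, (2|\Delta A| + |\Delta F|) \tag{5.53} \\
|\Delta(\alpha \cos\delta)| &\le a'_{\mathrm{rel}} \, (2|\Delta B| + |\Delta G|) \tag{5.54}
\end{align}
and the final posterior median ãrel (table 5.10), we obtain the maximum change in relative
angular position of the two components,
\[ \Delta r \lesssim 0.311\ \mu\mathrm{as}. \tag{5.55} \]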
Comparison with table 5.4 reveals that this is more than two orders of magnitude smaller
than the median AM measurement uncertainty, proving that the motions of the Earth and
the CM of Mizar A can indeed be neglected.
Table 5.1: Model parameters used by BASE.

Symbol | Designation | Unit | Widest prior support | Cyclic⁶ | Prior type | AM⁵ | RV⁵ | AM+RV⁵
e | eccentricity | 1 | [0, 1) | | uniform | × | × | ×
f | orbital frequency | d⁻¹ | (0, 1] | | Jeffreys | × | × | ×
χ | mean anomaly at t1 over 2π | 1 | [0, 1) | × | uniform | × | × | ×
ω, ω2 | argument of periapsis⁷ | rad | [0, 2π) | × | uniform | × | × | ×
i | inclination | rad | [0, π) | | uniform | × | | ×
Ω | position angle of the ascending node⁸ | rad | [0, 2π) | × | uniform | × | | ×
a′rel | semi-major axis of orbit of secondary around primary over distance | mas | [10⁻³, 10⁻⁵]⁹ | | Jeffreys | × | |
ϖ | parallax | arcsec | (0, 0.77]¹⁰ | | Jeffreys | | | ×
V | RV offset | m s⁻¹ | – | | uniform | | × | ×
K, Ki | RV semi-amplitude¹¹ | m s⁻¹ | ≥ 0 | | mod. Jeffreys | | × | ×
τ+ | standard deviation of additional AM noise | mas | ≥ 0 | | mod. Jeffreys | × | | ×
σ+ | standard deviation of additional RV noise | m s⁻¹ | ≥ 0 | | mod. Jeffreys | | × | ×

⁵ The types of data for which each parameter is relevant. Abbreviations: AM (astrometry), RV (radial velocities).
⁶ For cyclic parameters θ, the indicated lower and upper bounds are treated as equivalent by BASE (see section 5.2.3).
⁷ Normal mode employs ω, while binary mode employs ω2 .
⁸ The widest prior range of Ω reduces to [0, π) if no RV data are provided, in which case it cannot be determined whether a given node is ascending or descending; then, Ω is defined to be the position angle of the first node.
⁹ Lower bound corresponds to AM measurement uncertainty of 1 µas; upper bound according to wide-binary observations by Tolbert (1964).
¹⁰ Interval includes trigonometric parallax of the nearest star, Proxima Centauri (Perryman et al. 1997).
¹¹ Normal mode employs K, while binary mode employs K1 and K2 .
Definition
P ≡ f −1
T ≡ t1 −
d ≡ 1 AU
ρ≡
Kalt ≡
mj ≡
χ
f
$
K1
m2
m1 = K2
√K
1−e2
2
4π 2 K3−j (K1 +K2 )
G
f (2π sin i)3
K
a? sin i ≡ 2πf
2
mp,min ≡ K(2πGf )− 3 (mp,min + m? ) 3
12
d
time of periapsis
d
distance
pc
semi-amplitude14
component mass
K1 +K2
2πf sin i
?
ap,max ≡ mm
a? sin i
p,min
period
alternative RV
K
1
Unit
binary mass ratio
j
aj ≡ 2πf sin
i
arel ≡ a1 + a2 =
Designation
semi-major axis of component’s
orbit around CM
semi-major axis of secondary’s
orbit around primary
semi-major axis of stellar orbit
times sine of inclination
minimum planetary mass15
maximum semi-major axis of
planetary orbit
1
Equations
(5.27)
Table 5.2: Quantities derived from model parameters.
Mode12
N B
Data Types13
AM RV AM+RV
×
×
×
×
×
×
×
×
×
×
×
(5.40), (5.44)
m s−1
×
×
×
×
×
×
×
×
M
(5.44), (5.45)
×
×
AU
(5.44)
×
×
AU
(5.44)
×
×
AU
(5.44)
×
×
MJ
(5.44)
×
×
AU
(5.40)
×
×
The modes in which each derived quantity appears. Abbreviations: N (normal), B (binary).
The types of data for which each derived quantity is relevant. Abbreviations: AM (astrometry), RV (radial velocities).
14
Refers to the star or any of the binary components, respectively.
15
The implicit function of minimum planetary mass mp,min is solved numerically. The planet’s minimum mass equals its real mass if the orbit is edge-on, viz.
sin i = 1.
13
5.3 BASE – Bayesian astrometric and spectroscopic exoplanet detection and characterisation tool
We have developed BASE, a computer program for the combined Bayesian analysis of AM
and RV data according to section 5.2, for the following main reasons.
• A statistically well-founded, reliable tool was needed that was able to perform a
complete Bayesian parameter and uncertainty estimation, along with model selection
(only for planetary systems, not detailed in this article).
• We aimed to combine astrometry and Doppler-spectroscopy analyses.
• A possibility to include knowledge from earlier analyses was needed.
• Finding all relevant solutions across a multidimensional, high-volume parameter
space Θ was required. A more detailed knowledge of the parameters than a best
estimate and a confidence interval can provide is especially important when the data
do not constrain the parameters well, e.g. when only few data have been recorded or
the signal-to-noise ratio is low (as can be the case for lightweight planets or young
host stars).
BASE is a highly configurable command-line tool developed in Fortran 2008 and
compiled with GFortran (Free Software Foundation 2011a). Options can be used to
control the program’s behaviour and supply information such as the stellar mass or prior
knowledge (see section 5.3.1). Any option can be supplied in a configuration file and/or on
the command line.
5.3.1 Prior knowledge
Any model parameter can be assigned a prior probability density in one of three forms:
• a fixed value;
• a user-defined shape, i.e. a set {(θi , p(θi |M, K))};
• a range (either in combination with a specific shape, which is then clipped to the
given range, or else using one of the standard prior shapes detailed in section 5.7).
Any parameter for which no such option is present is assigned a default prior shape
and range to pose the weakest possible constraints justified by the data, the model, and
mathematical/physical considerations.
5.3.2 Physical systems and modes of operation
As mentioned above, BASE is capable of analysing data for two similar types of systems,
whose physics have been described in section 5.2.3. These are systems with
1. one observable component that may be accompanied by one or more unobserved
bodies (generally referred to as planetary systems here for simplicity); in this normal
mode, the number of companions can be set to a value ranging from zero to nine, or
a list of such values, in which case several runs are conducted and the outcome is
compared in terms of model selection;
2. two observable, gravitationally bound components (referred to as binary stars); this
mode of operation is called binary mode.
5.3.3 Types of data
Observational data of the following types can be treated by BASE:
• AM data, whose observable, in the case of binary targets, is the relative angular
position of the two binary components. Each data record consists of a date, the
angular position (α cos δ, δ) or (ρ, θ) in a Cartesian or polar coordinate system, and
its standard uncertainty ellipse, given by (a, b, φ);
• RV data, where each data record consists of a date and the observed stellar radial
velocity as well as its uncertainty.
5.3.4 Computational techniques
In the following, we describe some computational techniques implemented in BASE that
are relevant to the present work.
Improved exploration of parameter space. To enhance the mixing, i.e. the speed of
exploration of the parameter space, of Markov chains produced by the Metropolis-Hastings
(MH) algorithm (section 5.2.2) and to decrease their attraction to local posterior modes, the
parallel tempering (PT) algorithm (e.g. Gregory 2005b) is employed one level above MH.
Its function is to create in parallel n chains of length N , \( \{ \theta_{(k)}^{(j)} : j = 1, \ldots, N;\ k = 1, \ldots, n \} \),
each sampled by an independent MH procedure, where the cold chain k = 1 uses an
unmodified likelihood L(·), while the others, the heated chains, use as replacement L(·)^γ(k)
with the positive tempering parameter γ(k) < 1. After nswap samples of each chain, two
chains k ≥ 1 and k + 1 ≤ n are randomly selected and their last links, denoted here by
θ (k) and θ (k+1) , are swapped with probability
\[ \alpha_{\mathrm{swap}}(k, k+1) = \min\left\{ 1, \left( \frac{L(\theta_{(k+1)})}{L(\theta_{(k)})} \right)^{\gamma_k} \left( \frac{L(\theta_{(k)})}{L(\theta_{(k+1)})} \right)^{\gamma_{k+1}} \right\}, \tag{5.56} \]
which ensures that the distributions of both chains remain unchanged.
This procedure allows states from the “hotter” chains, which explore parameter space
more freely, to “seep through” to the cold chain without compromising its distribution.
Conclusions are only drawn from the samples of the cold chain.
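The PT scheme described above can be sketched for a one-dimensional toy problem. The Python code below is a minimal illustration, not BASE's implementation: the bimodal likelihood, the uniform prior on a fixed interval, the Gaussian proposal, the tempering ladder and all names are assumptions made for this example. The swap step uses the logarithm of eq. (5.56), which simplifies to (γk − γk+1)(ln L(θ(k+1)) − ln L(θ(k))).

```python
import numpy as np

def log_like(x):
    """Toy bimodal log-likelihood: two narrow Gaussian peaks at x = -4 and x = +4."""
    return np.logaddexp(-0.5 * ((x + 4.0) / 0.5) ** 2, -0.5 * ((x - 4.0) / 0.5) ** 2)

def parallel_tempering(gammas, n_steps, n_swap=5, step=1.0, bounds=(-10.0, 10.0), seed=1):
    """Return the samples of the cold chain (gammas[0] must equal 1)."""
    rng = np.random.default_rng(seed)
    gammas = np.asarray(gammas, dtype=float)
    n = gammas.size
    x = rng.uniform(bounds[0], bounds[1], size=n)    # current state of each chain
    logl = log_like(x)
    cold = np.empty(n_steps)
    for j in range(n_steps):
        # One Metropolis update per chain; the prior is uniform inside `bounds`.
        prop = x + step * rng.normal(size=n)
        inside = (prop >= bounds[0]) & (prop <= bounds[1])
        logl_prop = log_like(prop)
        accept = inside & (np.log(rng.uniform(size=n)) < gammas * (logl_prop - logl))
        x = np.where(accept, prop, x)
        logl = np.where(accept, logl_prop, logl)
        if (j + 1) % n_swap == 0:
            k = rng.integers(0, n - 1)               # propose swapping chains k and k+1
            log_alpha = (gammas[k] - gammas[k + 1]) * (logl[k + 1] - logl[k])
            if np.log(rng.uniform()) < log_alpha:
                x[[k, k + 1]] = x[[k + 1, k]]
                logl[[k, k + 1]] = logl[[k + 1, k]]
        cold[j] = x[0]
    return cold

cold = parallel_tempering([1.0, 0.5, 0.25, 0.1, 0.04], n_steps=50_000)
print("fraction of cold-chain samples in the right-hand mode:", np.mean(cold[5000:] > 0))
```

In this toy setting, a single untempered chain with the same proposal width would be very unlikely to cross the low-probability region between the two peaks, whereas the cold chain of the PT ensemble receives states from both modes through the swaps.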
In contrast to MCMC sampling, which is sequential in nature, the structure of PT
allows one to exploit the multiprocessing facilities provided by many modern computing
architectures. For this purpose, BASE uses the OpenMP API (OpenMP Architecture
Review Board 2008) as implemented for GFortran by the GOMP project (Free Software
Foundation 2011b).
Assessing convergence. As a matter of principle, it cannot be proven that a given
Markov chain has converged to the posterior. However, convergence may be meaningfully
defined as the degree to which the chain does not depend on its initial state θ (0) any more.
This can be determined on the basis of a set of independent chains – in our case, the cold
chains of m independent PT procedures – started at different points in parameter space.
These starting states should be defined such that for each of their components, they are
overdispersed with respect to the corresponding marginal posteriors.
Since the marginal posteriors are difficult to obtain before the actual sampling, BASE
determines the starting states by repeatedly drawing, for each parameter, a set of m
samples from the prior using rejection sampling; the repetition is stopped as soon as the
sample variance exceeds the corresponding prior variance, which yields a set of starting
states overdispersed with respect to the prior. Assuming that the prior variance exceeds
the marginal-posterior variance, the overdispersion requirement is met.
Such a test, using the potential scale reduction (PSR) or Gelman-Rubin statistic, was
proposed by Gelman and Rubin (1992) and was later refined and corrected by Brooks and
Gelman (1998). It is repeatedly carried out during sampling and, in the case of a positive
result, sampling is stopped before the user-defined maximum runtime and/or number of
samples have been reached. It may also sometimes be useful to abort sampling manually,
which can be done at any time.
The statistic is calculated with respect to each parameter θ separately, using the
post-burn-in samples¹⁶ \( \{ \theta_l^{(j)} : j = M + 1, \ldots, N;\ l = 1, \ldots, m \} \) of θ provided by the m
independent chains. It compares the actual variances of θ within the chains built up thus
far to an estimate of the marginal-posterior (i.e. target) variance of θ, thus estimating how
closely the chains have approached convergence.
The PSR R̂^(1/2) is defined as
\[ \hat{R} \equiv \frac{d + 3}{d + 1} \, \frac{\hat{V}}{W}, \tag{5.57} \]
where V̂ is an estimate of the marginal-posterior variance, d the estimated number of
degrees of freedom underlying the calculation of V̂ , and W the mean within-sequence
variance. The quantities are defined as
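\begin{align}
\hat{V} &\equiv \frac{\nu - 1}{\nu} \, W + \frac{m + 1}{m \nu} \, B \tag{5.58} \\
W &\equiv \frac{1}{m(\nu - 1)} \sum_{l=1}^{m} \sum_{j=M+1}^{N} \left( \theta_l^{(j)} - \bar{\theta}_l \right)^2 \tag{5.59} \\
B &\equiv \frac{\nu}{m - 1} \sum_{l=1}^{m} \left( \bar{\theta}_l - \bar{\theta} \right)^2, \tag{5.60}
\end{align}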
where B is the between-sequence variance and ν ≡ N − M ; a horizontal bar denotes the
mean taken over the set of samples obtained by varying the omitted indexes.
Owing to the initial overdispersion of starting states, V̂ overestimates the marginal-posterior variance in the beginning and subsequently decreases. Furthermore, while the
chains are still exploring new areas of parameter space, W underestimates the marginal-posterior variance and increases. Thus, as convergence to the posterior is accomplished,
R̂^(1/2) approaches 1 from above. Therefore, we can assume convergence as soon as the PSR has fallen below
a threshold R∗^(1/2) ≥ 1. If BASE is run with R∗ ≡ 1, sampling is continued up to the
user-defined length or duration, respectively.
For cyclic parameters (section 5.2.3), the default lower and upper prior bounds are
equivalent, which needs to be taken into account when calculating the PSR. Thus, BASE
uses the modified definition of the PSR by Ford (2006) for these parameters.
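For a non-cyclic parameter, the quantities of eqs. (5.58) to (5.60) can be computed from a set of chains as in the following Python sketch. It is an illustration, not BASE's implementation: the function name and the default burn-in fraction are assumptions, and the factor (d + 3)/(d + 1) of eq. (5.57) is omitted because the estimation of d is not spelled out here, so the function returns (V̂/W)^(1/2) only.

```python
import numpy as np

def psr(chains, burn_in_fraction=0.1):
    """Uncorrected potential scale reduction sqrt(V_hat / W) for one parameter.

    chains: array of shape (m, N) holding N samples from each of m independent chains.
    """
    chains = np.asarray(chains, dtype=float)
    m, N = chains.shape
    M = int(burn_in_fraction * N)                # burn-in length as a fixed fraction
    post = chains[:, M:]                         # post-burn-in samples
    nu = post.shape[1]                           # nu = N - M
    W = post.var(axis=1, ddof=1).mean()          # mean within-sequence variance, eq. (5.59)
    B = nu * post.mean(axis=1).var(ddof=1)       # between-sequence variance, eq. (5.60)
    V_hat = (nu - 1) / nu * W + (m + 1) / (m * nu) * B   # eq. (5.58)
    return np.sqrt(V_hat / W)

rng = np.random.default_rng(42)
mixed = rng.normal(size=(4, 5000))               # four chains sampling the same density
stuck = mixed + 3.0 * np.arange(4)[:, None]      # four chains centred on different values
print(psr(mixed), psr(stuck))
```

Chains that sample the same density yield a value close to 1, whereas chains that are stuck in different regions yield a value well above 1, which is the behaviour the stopping criterion relies on.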
¹⁶ Because the burn-in length M cannot be determined in advance, BASE considers a fixed but configurable
fraction of samples to belong to the burn-in phase at any time.

5.4 Target and data

Table 5.3: Basic physical properties of Mizar A.

Property | Value | Unit | Reference
Type | SB II | | 1
Spectral types | 2× A2 V | | 2
MV | 2.27 ± 0.07 | mag | 2
Mbol | 0.91 ± 0.07 | mag | 3
L | 33.3 ± 2.1 | L☉ | 3
R | 2.4 ± 0.1 | R☉ | 3
ϖ¹⁷ | 39.4 ± 0.3 | mas | 3
µα∗ | 119.01 ± 1.49 | mas yr⁻¹ | 4
µδ | −25.97 ± 1.65 | mas yr⁻¹ | 4

References. (1) Pickering (1890); (2) Hoffleit and Jaschek (1982); (3) Hummel et al.
(1998); (4) van Leeuwen (2007)
¹⁷ Different values of the parallax have been estimated in this work (table 5.9).

Table 5.4: Published data for Mizar A used in this work.

Data type | Instrument | Observatory | Year | No. records | Median uncertainty | Ref.
Radial velocities¹⁸ | Coudé spectrograph | Haute Provence | 1961 | 2 × 17 | 2.050 km s⁻¹ | 1
Angular positions¹⁹ | Mark III interferometer | Mount Wilson | 1995 | 28 | 0.040 mas / 0.345 mas | 2
Angular positions¹⁹ | NPOI interferometer | Lowell | 1998 | 25 | 0.042 mas / 0.137 mas | 3

References. (1) Prevot (1961); (2) Hummel et al. (1995); (3) Hummel et al. (1998)
¹⁸ Uncertainties are missing in the original publication, but have been estimated in this work (section 5.5.1).
¹⁹ Uncertainties refer to the semi-minor and -major axes of the standard uncertainty ellipses, respectively (section 5.2.1).

Mizar A (ζ¹ Ursae Majoris, HD 116656, HR 5054), the first spectroscopic binary discovered,
is of double-lined type (SB II; Pickering 1890). Its basic physical properties are summarised
in table 5.3. Together with the spectroscopic binary Mizar B, it forms the Mizar quadruple
system, seen from Earth as a visual binary with the components separated by about 14.4″.
Mizar is the first double star discovered with a telescope and also the first one to be imaged
photographically (Bond 1857).
At an apparent angular separation of about 11.8′, or 74 ± 39 kAU spatial distance,
Mizar is accompanied by Alcor, which has recently turned out to be a spectroscopic binary
itself (Mamajek et al. 2010). Mizar and Alcor, also known as the “Horse and Rider”, form
an easy naked-eye double star, while it is still a matter of debate whether or not they
constitute a physically bound sextuplet.
Published data used in this article are displayed in fig. 5.7 – 5.8 along with the model
functions determined by BASE, and their properties are summarised in table 5.4. Using
a Coudé spectrograph at the 1.93-m telescope at Observatoire Haute Provence, Prevot
(1961) obtained 17 optical photographic spectra of the light of Mizar A combined with
that of electric arcs or sparks between iron electrodes. For each of the 13 individual stellar
lines identified in the spectra, one intermediate radial-velocity value per binary component
was obtained by comparison with a set of reference lines of iron. Finally, a set of 17 pairs
of RVs for the two components was calculated as arithmetic means of the corresponding
intermediate values. The RV measurement uncertainty is not given by Prevot (1961) and
was therefore estimated in the course of the present work, as described in section 5.5.
High-precision AM data of Mizar A were first obtained by Hummel et al. (1995) using
the Mark III optical interferometer on Mount Wilson, California (Shao et al. 1988), with
baseline lengths between 3 and 31 m. It measured the squared visibilities and their
uncertainties at positions sampled over the aperture plane due to Earth’s rotation. The
visibilities can be modelled as a function of the diameters, magnitude differences, and
relative angular positions (ρ, θ) of the binary components (Armstrong et al. 1992). These
authors also describe a procedure to derive one angular position for each night of observation
from a corresponding set of visibilities, which was adopted by Hummel et al. (1995) to
obtain initial estimates of the orbital parameters. These positional data are also relevant
for the present work. Hummel et al. (1995) also performed a direct fit to the squared
visibilities to derive final estimates of the component diameters and orbital parameters.
Later, a descendant instrument, the Navy Prototype Optical Interferometer (NPOI) at
Lowell observatory, Arizona (Armstrong et al. 1998), was used by Hummel et al. (1998) to
obtain more accurate results using three siderostats, viz. three baselines at a time. This
allowed a better calibration using the closure phase,20 which is independent of atmospheric
turbulence. Similarly as before, Hummel et al. (1998) separately fitted binary orbits directly
to the visibility data and also to the positional angles derived for each night, concluding
that the respective results agree well with each other and that those parameters in common
with spectroscopic analyses are compatible with Prevot (1961); they also performed a fit
to both AM and RV data to obtain their final parameter estimates.
While Hummel et al. (1998) did not include the older, less accurate Mark III data
in their analysis, we present a combined treatment of all published data, i.e. the AM
positions of Hummel et al. (1995) and Hummel et al. (1998) along with the RV data of
Prevot (1961).
²⁰ The closure phase φcl is the phase of the product of three visibilities, each pertaining to a different
baseline.

5.5 Analysis and results

To illustrate the features of BASE and demonstrate its validity, we used the tool to analyse
all published data of Mizar A. This section details the steps taken in and the results of
our analysis. Its general goal was, given uninformative prior knowledge, to search the
parameter space as comprehensively as possible to find and characterise the a posteriori
most probable solution together with its uncertainty and other characteristics. For reasons
detailed below, once the RV data had been prepared, several runs of BASE (table 5.5)
were manually conducted, each using priors derived from the previous pass, with relatively
uninformative priors in the first step.

Table 5.5: Analysis passes carried out sequentially.

Pass | Description | Data
A | First constraints on RV parameters | RV
B | Combining all data | All
C | Selecting frequency f | All
D | Selecting ω2 and Ω | All
E | Refining results | All
In this analysis, our approach was to regard all nominal measurement uncertainties as
accurate by setting parameters characterising additional noise in AM (τ+ ) as well as RV
(σ+ ) data to zero.
Astrometric data alone allow one to constrain neither the RV offset V nor the amplitudes K1 , K2 but only the sum K1 + K2 (section 5.2.3). By contrast, these parameters can
be constrained using spectroscopic data alone, or AM and RV data both. However, the AM
data reduce the relative weight of the spectroscopic data and thus make the determination
of these parameters harder. We found in the course of this work that by an iterative
approach, starting with a first pass using only RV data, this difficulty could be resolved.
In all passes, BASE was configured to build up eight parallel chains, including the
cold chain, using the parallel-tempering technique (detailed in section 5.3.4). 10⁸ posterior
samples were collected, the first 10% of which were assumed to be burn-in samples. In the
final pass, two PT procedures were employed to enable a test of convergence to the posterior,
which reduced the number of samples collected with unchanged memory requirements to
5 × 10⁷. Convergence was not assessed in earlier passes, because it was difficult
to reach as long as several distinct solutions existed within the prior support.
5.5.1 Preparation of RV data
Assuming appropriate measurement uncertainties is a prerequisite for the proper relative
weighting of the different data types when combining them. Because these uncertainties
are not quoted by Prevot (1961) for the spectroscopic data, we estimated them according
to the following method.
First, we assumed that each RV datum vi is the sum of the model value v(θ true ; ti ) and
an error ei , where θ true are the true parameter values and the errors are independent and
identically Gaussian-distributed with an unknown standard deviation σ. Furthermore, we
assumed that the model parameters found in a particular analysis are identical with θ true .
It follows that the uncertainty σ can be estimated as the sample standard deviation of the
set of residuals {vi − v(θ true ; ti )}.
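The estimate described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure implemented in BASE: the sinusoidal `toy_model`, the parameter tuple `theta_fit` and the five data points are hypothetical stand-ins for the Keplerian RV model and the Prevot (1961) data.

```python
import math

def sample_std(residuals):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(residuals)
    mean = sum(residuals) / n
    return math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))

def rv_residuals(times, velocities, model, theta):
    """Residuals v_i - v(theta; t_i) of the data with respect to a model."""
    return [v - model(theta, t) for t, v in zip(times, velocities)]

def toy_model(theta, t):
    """Hypothetical stand-in for the RV model: offset plus a sinusoid."""
    offset, amplitude, frequency = theta
    return offset + amplitude * math.sin(2.0 * math.pi * frequency * t)

# Illustrative values only (km/s and days); not taken from the real data set.
theta_fit = (-6.0, 58.0, 0.0487)
times = [0.0, 3.0, 7.5, 11.0, 16.0]
noise = [1.5, -2.0, 0.5, 2.5, -2.5]
velocities = [toy_model(theta_fit, t) + e for t, e in zip(times, noise)]

# Treating the fitted parameters as the true ones, sigma is estimated
# as the sample standard deviation of the residuals.
sigma_hat = sample_std(rv_residuals(times, velocities, toy_model, theta_fit))
```

With real data, `model` would be the RV model function and `theta_fit` the parameters found in the analysis.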
Thus, based on the sample standard deviation of the best-fit residuals of Prevot (1961),
viz. 2.13 km s−1 , we initially assumed a conservative value of 2.50 km s−1 for all data. To
quantify the measurement uncertainties based on our own inference, we then conducted a
preliminary analysis similar to pass A described in the next subsection, from which we
finally inferred σ = 2.05 km s⁻¹. In addition, we took a more correct alternative approach
to estimating the RV uncertainties by assuming a relatively low value of σ = 2.00 km s⁻¹
and allowing for higher noise via σ+, which led to an estimate √(σ² + σ+²) deviating by less
than 1.7% from our previous estimate.
Table 5.6: Pass A: initial prior ranges.
              e   f²¹ (d⁻¹)     χ   ω2 (rad)   V²² (km s⁻¹)   K1²³ (km s⁻¹)   K2²³ (km s⁻¹)
Lower bound   0   2.7379×10⁻⁶   0   0          −65.25         0               0
Upper bound   1   1             1   2π         56.15          69.50           68.85
[Figure 5.2. Top panel: P(Ki) versus Ki (km s⁻¹); bottom panel: P(V) versus V (km s⁻¹).]
Figure 5.2: Pass A: Marginal posteriors of RV amplitudes K1 (top, solid line), K2 (top,
dashed line) and offset V (bottom), all plotted over the approximate range of the corresponding 99% HPDI.
[Figure 5.3. P(f) versus f (d⁻¹).]
Figure 5.3: Pass B: Marginal posterior of orbital frequency f , plotted over the range of the
corresponding 99% HPDI. This is a Bayesian analogue of the frequentist periodogram.
5.5.2 Pass A: first constraints on RV parameters
To facilitate the determination of the RV parameters K1 , K2 and V , a first pass using only
RV data was carried out, as mentioned above. It used uninformative priors (section 5.7
and table 5.1), with bounds listed in table 5.6.
Figure 5.2 shows the resulting marginal posteriors (see section 5.2.2) of the RV offset V
and the amplitudes K1 , K2 , with the abscissae approximately corresponding to the 99%
HPDIs. These marginal posteriors represent much tighter constraints on the parameters
than the corresponding priors do.
5.5.3 Pass B: combining all data
Providing as new priors the marginal posteriors of all RV parameters e, f, χ, ω2 , V, K1 and
K2 from pass A, constrained to the corresponding 99% HPDIs IHPD, 99% , and again using
uninformative priors on the additional parameters (table 5.7), the AM data were added
and BASE was run again. Using the resulting posterior samples, BASE was additionally
invoked in periodogram mode to refine the kernel window width for the marginal posterior
of f as described in section 5.2.2. (This refinement was not used in pass A in order not
to constrain the frequency too tightly before adding the AM data, because the combined
data are expected to correspond to a marginally differing frequency.)
21
The prior bounds of f correspond to a period range between 1 d and 1 yr.
22
The bounds of V are given by the section of the RV ranges measured for primary and secondary, with
the latter extended by one corresponding measurement uncertainty on both sides.
23
The upper bounds of K1 and K2 are given by half the measured RV span of the corresponding
component, plus one measurement uncertainty.
24
Interval includes trigonometric parallax of nearest star, Proxima Centauri (Perryman et al. 1997).
Table 5.7: Pass B: prior ranges for additional AM parameters.
              i (rad)   Ω (rad)   ϖ²⁴ (arcsec)
Lower bound   0         0         0
Upper bound   π         2π        0.77
[Figure 5.4. P(ω2) and P(Ω) versus ω2, Ω (rad).]
Figure 5.4: Pass C: Marginal posteriors of the argument of periapsis ω2 (solid line) and
position angle of the ascending node Ω (dashed line), plotted over the approximate range
of the corresponding 99% HPDIs. Triangles indicate the positions of small local maxima
located approximately ±π from the corresponding marginal modes.
The resulting marginal posterior (fig. 5.3) exhibits a very strong mode around 0.04869 d−1 ,
whose height is 8.1 times that of the next lower peak. Over the range of its mode, the
marginal posterior contains a total probability of 45.0%. Most other parameters in this
stage have very broad and/or multimodal marginal posteriors, hinting at different solutions
still probable within the prior support.
5.5.4 Pass C: selecting the frequency f
For the next pass, the prior of f was provided by the marginal posterior of pass B (fig. 5.3),
constrained to the range of the marginal mode. For all other parameters, priors were
identical with the previous marginal posteriors, constrained to IHPD, 99% .
With the frequency thus restrained, the pass produced unimodal marginal posteriors in
all parameters except for ω2 and Ω.
The marginal posteriors of the argument of periapsis ω2 and the position angle of
the ascending node Ω exhibited small local maxima located at a distance of −1.02π and
+1.03π from their marginal modes, respectively (indicated by triangles in fig. 5.4). This
can be explained by the fact that the Thiele-Innes constants (eq. (5.30) – (5.33)), which
appear in the AM model, contain products of sin(·) and/or cos(·) functions with ω2 and
Ω as arguments. These products retain their values when ω2 and Ω are both shifted by
π – with opposite signs, since the marginal modes of these angles lie in different halves
of the interval [0, 2π). Consequently, the AM model function is invariant with respect to
such shifts. (Thus, when analysing only AM data, it cannot be determined which node
is ascending, hence Ω is defined only over the interval [0, π) and refers to the first node.)
Owing to the combination with RV data, which independently constrain ω2 (eq. (5.43)),
the ambiguity is strongly reduced, as illustrated by the very small subpeaks in fig. 5.4 –
though it is not completely resolved.
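The invariance described above can be checked numerically. The sketch below assumes the common textbook form of the Thiele-Innes constants; the sign convention of eq. (5.30) – (5.33) may differ, which does not affect the argument because every term is a product of one function of the argument of periapsis and one function of Ω. The angle values are illustrative, chosen close to the estimates in table 5.9.

```python
import math

def thiele_innes(omega, big_omega, inc):
    """Thiele-Innes constants A, B, F, G (common textbook convention).

    Assumption: this sign convention is illustrative and may differ
    from the one used in eq. (5.30) - (5.33).
    """
    co, so = math.cos(omega), math.sin(omega)
    c_node, s_node = math.cos(big_omega), math.sin(big_omega)
    ci = math.cos(inc)
    a = co * c_node - so * s_node * ci
    b = co * s_node + so * c_node * ci
    f = -so * c_node - co * s_node * ci
    g = -so * s_node + co * c_node * ci
    return a, b, f, g

# Illustrative angles in radians.
omega, big_omega, inc = 4.977, 1.838, 1.053

original = thiele_innes(omega, big_omega, inc)
# Shift both angles by pi with opposite signs, as for the small subpeaks.
shifted = thiele_innes(omega - math.pi, big_omega + math.pi, inc)

# Each sine and cosine changes sign, so every product is unchanged.
max_difference = max(abs(p - q) for p, q in zip(original, shifted))
```

Shifting only one of the two angles by π would instead flip the sign of all four constants, which is why the RV data, constraining ω2 on their own, suppress the second solution.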
As another result from this pass, the correlation coefficient of the two angles,
rω2,Ω = −0.62, expresses the strong negative linear relationship between them introduced
by the possibility of changing them in opposite directions.
[Figure 5.5. One panel per model parameter and derived quantity, each showing the marginal posterior density over the value of that quantity.]
Figure 5.5: Marginal posteriors of parameters e, f, χ, ω2, i, Ω, ϖ, V, K1, K2 (see also table 5.1)
and derived quantities P, T, d, ρ, Kalt,1, Kalt,2, m1, m2, a1, a2, arel (table 5.2) with marginal-posterior
medians (dotted line) and 68.27% HPDIs (dashed lines), from the final pass.
Abscissae ranges are identical with the corresponding 95% HPDIs. Ordinate values were
omitted but follow from normalisation. Open triangles on the upper abscissae indicate the
MAP estimates. Open circles on the lower abscissae bound the confidence intervals given by
or derived from Prevot (1961), while open triangles refer to the intervals of Hummel et al.
(1998). Some of these literature estimates are not plotted because they are outside the
abscissa range; for f and χ, no literature uncertainty estimate is available. For derivations
and numerical values, see tables 5.9 and 5.10.
5.5.5 Passes D and E: selecting ω2 and Ω and refining results
In pass D, the ambiguity in ω2 and Ω was resolved by selecting the range around their
marginal modes via the new priors. For all other parameters, priors were again identical to
the previous marginal posteriors, constrained to IHPD, 99% . All resulting marginal posteriors
turned out to be unimodal, corresponding to a single solution as opposed to several clearly
distinct orbital solutions.
As a final step, pass E was conducted to refine the results by using all previous marginal
posteriors, constrained to IHPD, 99% , as new priors, thus confining the parameter space a
priori to the most probable solution.
Figure 5.6: Two-parameter joint marginal posterior densities Pi,j (·, ·) from pass E. The
figure consists of 45 sub-plots, one for each combination of two parameters. Black denotes
highest density. The inner and outer contours contain 50% and 64.86% probability,
respectively. All plots aligned in one column share the same abscissa, denoted on the
bottom; all plots aligned in one row share the same ordinate, denoted on the left. All
abscissae and ordinates are displayed over the corresponding 95% HPDIs (table 5.9), such
that the total probability content of each plot is between 90% and 95%.
Figure 5.7: AM and RV data and models calculated with the MAP parameter estimates
(table 5.9). a) AM data (uncertainty ellipses, solid lines), model (large ellipse, solid line)
and residual vectors (dashed lines). b) Residual AM error ellipses. c) RV data and model
for primary (normal error bars, solid line) and secondary (hourglass-shaped error bars,
dotted line). The horizontal dashed line indicates the RV offset V . RV residuals are
presented in fig. 5.8.
(Joint) marginal posteriors and correlations. The final marginal posteriors of all
model parameters, plotted over the corresponding 95% HPDIs, are shown in fig. 5.5, along
with the medians, 68.27% HPDIs (corresponding in probability content to the frequentist
1-σ confidence intervals) and MAP estimates. For comparison, literature estimates and
confidence intervals are also included, where permitted by the abscissa range.
Linear and nonlinear dependencies between a pair of parameters may be qualitatively
judged by means of the joint marginal posteriors (section 5.2.2) shown in fig. 5.6. The
inclinations of their equiprobability contours with respect to the coordinate axes are related
to the corresponding correlation coefficients. Some joint marginal posteriors exhibit clear
deviations from bivariate normal distributions, illustrating that a Gaussian approximation
of the likelihood by use of the Fisher matrix (section 5.2.1) would be inappropriate.
Table 5.11 lists the final posterior correlation coefficients. When these attain high
absolute values, model-related equations can sometimes serve as an explanation.
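As a reminder of what the tabulated coefficients measure, the following sketch computes a Pearson correlation coefficient from paired samples. It is a generic illustration, not code from BASE, and the five sample pairs are invented values standing in for posterior samples of f and χ.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented samples: larger f tends to come with smaller chi.
f_samples = [0.048689, 0.048690, 0.048688, 0.048691, 0.048687]
chi_samples = [0.9332, 0.9328, 0.9335, 0.9326, 0.9338]

r_f_chi = pearson(f_samples, chi_samples)
```

A value near −1 or +1 indicates a nearly linear relation between the two parameters; values near zero say nothing about nonlinear dependencies, which is why the joint marginal posteriors are shown as well.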
For example, the highly negative correlation between f and χ is related to eq. (5.26) in
the following way. Given data and estimated model parameters, let tm be the time midway
between the first and last observations, and Em the eccentric anomaly at tm . Now increase
f by a small amount. This can be balanced by a small decrease of χ (or, equivalently, by
increasing the time of periapsis T ) such that the eccentric anomaly at tm again equals
Em . Changing f and χ in opposite directions in this way, as opposed to altering only one of them,
will make E deviate less, on average, over the total timespan; i.e. the model will fit the
data better. The relation between f and χ so introduced is indicated by their negative
correlation coefficient.
[Figure 5.8. Six stacked panels sharing the abscissa M (0 to 2π): δ (mas), ∆δ (mas), α cos δ (mas), ∆(α cos δ) (mas), v (km s⁻¹), ∆v (km s⁻¹).]
Figure 5.8: All data types: from top to bottom: AM data and model (δ coordinate); AM
residuals ∆δ; AM data and model (α cos δ coordinate); AM residuals ∆(α cos δ); RV data
and model v (primary: solid line, secondary: dotted line, RV offset V : horizontal dashed
line); RV residuals ∆v (primary: normal error bars, secondary: dashed error bars, model:
dashed line). AM error bars are defined as the sides of the smallest rectangle orientated
along the coordinate axes and containing the respective error ellipse. Abscissa values are
mean anomalies M (eq. (5.26)), i.e. times folded with respect to the MAP estimates of the
time of periapsis T and period P .
[Figure 5.9. Density versus normalised residual; curves: N (0, 1), AM, RV, all.]
Figure 5.9: Distribution of normalised residuals of AM data (long-dashed line), RV data
(short-dashed) and all data (solid), along with the standard normal distribution N (0, 1)
(dotted). For definitions of the normalised residuals, see eq. (5.61) – (5.65).
As another example, the highly negative correlation coefficient of ω2 and Ω can be
understood by reference to eq. (5.30) – (5.33), which describe the Thiele-Innes constants
of the AM model, as follows. In the case of edge-on orbits (inclination i = π/2), these
expressions simplify to sums or differences of cos(·) and sin(·) functions, the arguments
being ω2 + Ω or ω2 − Ω, with both arguments appearing equally often. An additional
simplification is observed for face-on orbits (i = 0), where the Thiele-Innes constants are
A = G = − cos(Ω + ω2 ) and B = −F = − sin(Ω + ω2 ). Thus, for orbits nearly face-on,
the strong appearance of the sum Ω + ω2 introduces a negative correlation between the
two angles, because changing them in opposite directions can lead to the same value of the
model function. For Doppler-spectroscopic data, this is counteracted by the fact that the RV
model function does not contain Ω.
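The role of the sum and difference of the two angles can be made explicit with a short worked identity. The block below assumes the common textbook form of the Thiele-Innes constants with a generic argument of periapsis ω; eq. (5.30) – (5.33) may use a different convention (for instance ω = ω2 − π, which flips the sign of all four constants and reproduces the signs quoted above for i = 0).

```latex
\begin{align*}
A &= \cos^2\tfrac{i}{2}\,\cos(\Omega+\omega) + \sin^2\tfrac{i}{2}\,\cos(\Omega-\omega), \\
B &= \cos^2\tfrac{i}{2}\,\sin(\Omega+\omega) + \sin^2\tfrac{i}{2}\,\sin(\Omega-\omega), \\
F &= -\cos^2\tfrac{i}{2}\,\sin(\Omega+\omega) + \sin^2\tfrac{i}{2}\,\sin(\Omega-\omega), \\
G &= \cos^2\tfrac{i}{2}\,\cos(\Omega+\omega) - \sin^2\tfrac{i}{2}\,\cos(\Omega-\omega).
\end{align*}
```

For i = 0 only the terms in Ω + ω survive, whereas for i = π/2 both kinds of terms carry the weight 1/2. For an inclination of about 1.05 rad, as estimated here, the weights are roughly 0.75 and 0.25, so the sum still dominates.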
Models and residuals. Figure 5.7 presents all data with the models calculated from
the MAP estimates listed in table 5.9, as well as residual vectors and error ellipses for
astrometry. While this analysis used both the older (Hummel et al. 1995) and newer
(Hummel et al. 1998) AM in combination with RV data (Prevot 1961), the AM model is
very similar to the one calculated by Hummel et al. (1998, fig. 11) using only the newer
AM data because of the overall agreement in parameters.
Figure 5.8 shows all data, models and residuals, separately by coordinate for AM. The
abscissa corresponds to the mean anomaly M (eq. (5.26)), i.e. the plots are folded with
respect to a time of periapsis T and period P corresponding to the posterior mode. Again,
due to the similarity in parameters, the folded RV plot in fig. 5.8 is very similar to the
corresponding figure in Prevot (1961).
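Folding observation times onto the mean anomaly can be sketched as follows. The sketch uses the standard definition M = 2π (t − T)/P reduced to [0, 2π), which is assumed to be equivalent to eq. (5.26); the numerical values of P and T are illustrative, chosen close to the estimates in table 5.10.

```python
import math

def mean_anomaly(t, period, t_periapsis):
    """Mean anomaly in [0, 2*pi) for time t (same units as period)."""
    phase = ((t - t_periapsis) / period) % 1.0  # orbital phase in [0, 1)
    return 2.0 * math.pi * phase

# Illustrative values: period in days, periapsis time in reduced Julian date.
PERIOD = 20.53835
T_PERIAPSIS = 36997.247

# A quarter of a period after periapsis corresponds to M = pi / 2.
m_after = mean_anomaly(T_PERIAPSIS + 0.25 * PERIOD, PERIOD, T_PERIAPSIS)
# A quarter of a period before periapsis folds to M = 3 * pi / 2.
m_before = mean_anomaly(T_PERIAPSIS - 0.25 * PERIOD, PERIOD, T_PERIAPSIS)
```

Python's `%` operator returns a non-negative result for a positive divisor, so times before T are folded into the same interval without special handling.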
Table 5.8: p-values of the Kolmogorov-Smirnov statistic.
Data type   p-value²⁵
AM          0.00098
RV          0.75139
All data    0.00331
To assess the distribution of the residuals and compare it to a normal distribution, thus
checking the validity of our noise model, we normalised the residuals of both data types as
follows. For AM, we defined the normalised residual as a signed version of the Mahalanobis
distance (Mahalanobis 1936) between the observed and the modelled values,
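\varrho_{\mathrm{AM},i} \equiv s_i \sqrt{(\boldsymbol{r}_i - \boldsymbol{r}(\theta; t_i))^{\top} \, \mathsf{E}_i^{-1} \, (\boldsymbol{r}_i - \boldsymbol{r}(\theta; t_i))}, \qquad (5.61)
with sign
s_i \equiv \operatorname{sgn}\bigl((\varphi_i - \phi_i)(\pi - [\varphi_i - \phi_i])\bigr), \qquad (5.62)
where ϕi (\phi_i) and φi (\varphi_i) are the position angles of the residual and of the uncertainty ellipse (see
section 5.2.1), respectively. This definition allows us to write, according to eq. (5.8),
\chi^2_{\mathrm{AM}} = \sum_{i=1}^{N_{\mathrm{AM}}} \varrho_{\mathrm{AM},i}^2. \qquad (5.63)
For RV data, we define
\varrho_{\mathrm{RV},i} \equiv \frac{v_i - v(\theta; t_i)}{\sigma_i}, \qquad (5.64)
which analogously yields
\chi^2_{\mathrm{RV}} = \sum_{i=1}^{N_{\mathrm{RV}}} \varrho_{\mathrm{RV},i}^2. \qquad (5.65)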
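The normalised residuals of eq. (5.61), (5.62) and (5.64) can be sketched as follows. The function names and the representation of the uncertainty ellipse as a 2×2 covariance matrix are assumptions made for this illustration; how the covariance matrix follows from the ellipse parameters is described in section 5.2.1 and not repeated here.

```python
import math

def mahalanobis_2d(dx, dy, cov):
    """Mahalanobis distance of the residual (dx, dy) for a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form r^T E^-1 r with the explicit inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(quad)

def residual_sign(phi_ellipse, phi_residual):
    """Sign s_i of eq. (5.62) from the two position angles (radians)."""
    diff = phi_ellipse - phi_residual
    value = diff * (math.pi - diff)
    return (value > 0.0) - (value < 0.0)

def normalised_am_residual(dx, dy, cov, phi_ellipse, phi_residual):
    """Signed Mahalanobis distance, eq. (5.61)."""
    return residual_sign(phi_ellipse, phi_residual) * mahalanobis_2d(dx, dy, cov)

def normalised_rv_residual(v_obs, v_model, sigma):
    """Residual in units of its standard deviation, eq. (5.64)."""
    return (v_obs - v_model) / sigma

# With unit covariance the distance reduces to the Euclidean norm.
example_distance = mahalanobis_2d(3.0, 4.0, ((1.0, 0.0), (0.0, 1.0)))
```

Summing the squares of these quantities over all data of one type gives the corresponding χ² of eq. (5.63) and (5.65).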
The distributions of normalised residuals of both data types individually as well as of all data,
estimated using kernel density estimation (section 5.3.4), are shown in fig. 5.9. Table 5.8
lists the p-values of the Kolmogorov-Smirnov statistic, which relates each distribution to
the standard normal distribution N (0, 1).
The p-value equals the probability, under the hypothesis HN that the normalised
residuals are randomly drawn from N (0, 1), to observe a distribution of normalised residuals
that differs at least as much from N (0, 1) as is actually the case. This difference is quantified
here by the Kolmogorov-Smirnov statistic. Denoting the hypothesis of such an observation
by Hobs , the p-value can be expressed as p(Hobs |HN ). We note that according to the Bayes
theorem, this is not equal to the “inverse” probability p(HN |Hobs ) of the residuals coming
from the normal distribution, given the observation.
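A one-sample Kolmogorov-Smirnov test against N(0, 1) can be sketched with the standard library alone. The text does not state how the p-values in table 5.8 were computed; the sketch below uses the well-known asymptotic series for the Kolmogorov distribution with a common small-sample correction, which is an assumption of this example. Both samples are invented.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_statistic(sample):
    """Largest distance between the empirical CDF and the N(0, 1) CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = normal_cdf(x)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

def ks_p_value(d, n, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov series."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; p is 1 to high accuracy
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Invented normalised residuals: one well-behaved set, one far too wide.
good = [-1.5, -0.8, -0.3, 0.1, 0.4, 0.9, 1.6]
wide = [5.0 * x for x in good]

d_good = ks_statistic(good)
d_wide = ks_statistic(wide)
p_good = ks_p_value(d_good, len(good))
p_wide = ks_p_value(d_wide, len(wide))
```

The wider sample lies further from N(0, 1) and therefore receives the smaller p-value, in line with the interpretation of p(Hobs |HN ) given above.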
For the AM residuals, as well as for all residuals, a heavy-tailed distribution (fig. 5.9)
is observed and the low p-value indicates a minute probability of such an observation
under HN , i.e. the normalised residuals comply poorly with the standard normal distribution N (0, 1). This reflects the fact that several AM data points are outliers with respect to
the observable and noise model (fig. 5.7). We interpret this in terms of systematic effects in
the measurements that are not contained in our noise model. In contrast, the normalised
RV residuals are more normally-distributed, and there are no such severe outliers (fig. 5.7
and fig. 5.8).
25
These values are defined with respect to the final residuals and a standard normal distribution.
According to the principle of maximum entropy, given only the mean and variance
of a distribution, the normal distribution has maximum information-theoretic entropy,
equivalent to minimum bias or prejudice with respect to the missing information (e.g. Kapur
1989). Still, it is well known that this widely used noise model is relatively sensitive to outliers;
Lange et al. (1989) suggested replacing it by a non-standardised t-distribution,
resulting in the down-weighting of outliers. This distribution can be derived from the
normal distribution under the assumption that the noise variance is unknown with a certain
probability distribution. The t-distribution has an additional unknown degrees-of-freedom
parameter ν > 0; for ν = 1, it reduces to the Cauchy distribution.
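The down-weighting of outliers by a t-distributed noise model can be illustrated by comparing log-densities. This is a generic sketch and not part of the analysis presented here, which keeps the Gaussian noise model; the choice ν = 4 and the 5σ outlier are arbitrary illustrative values.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of the normal distribution."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def student_t_logpdf(x, mu, scale, nu):
    """Log-density of the non-standardised Student t-distribution."""
    z = (x - mu) / scale
    return (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - 0.5 * (nu + 1.0) * math.log1p(z * z / nu))

def cauchy_logpdf(x, mu, scale):
    """Log-density of the Cauchy distribution."""
    z = (x - mu) / scale
    return -math.log(math.pi * scale * (1.0 + z * z))

# Log-likelihood penalty of a datum 5 scale units from the model value,
# relative to a datum lying exactly on the model.
penalty_normal = normal_logpdf(0.0, 0.0, 1.0) - normal_logpdf(5.0, 0.0, 1.0)
penalty_t = (student_t_logpdf(0.0, 0.0, 1.0, 4.0)
             - student_t_logpdf(5.0, 0.0, 1.0, 4.0))
```

The Gaussian penalty is 12.5 in natural-log units, while the t-distribution with ν = 4 penalises the same deviation by roughly 5, so a single outlier pulls the fit far less.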
In contrast, our approach has been to regard every datum with its standard deviation
as accurate, which also implies that we have not discarded any data as outliers. Under
this assumption, deviating from the maximum-entropy principle by selecting a different
distribution introduces prior “knowledge” that we may not actually have and thus potentially
biases the results.
5.6 Conclusions
We have presented BASE, a novel and highly configurable tool for Bayesian parameter and
uncertainty estimation with respect to model parameters and additional derived quantities,
which can be applied to AM as well as RV data of both exoplanet systems and binary
stars. With user-specified or uninformative prior knowledge, it employs a combination of
Markov chain Monte Carlo (MCMC) and several other techniques to explore the whole
parameter space and collect samples distributed according to the posterior distribution.
We presented a new, simple method of refining the window width of one-dimensional kernel
density estimation, which is used to derive marginal posterior densities.
We derived the observable models from Newton’s law of gravitation, neglecting the
motion of the observer and the target system, which we showed is justified in the case of
Mizar A. After sketching how we estimated the RV uncertainties that are missing in the
original publication (Prevot 1961), we detailed our analysis of all publicly available AM
and RV data of Mizar A. It consists of five consecutive stages and has produced estimates
of the values, uncertainties, and correlations of all model parameters and derived quantities,
as well as marginal posterior densities over one and two dimensions.
As illustrated in fig. 5.5 and table 5.9, our new results exhibit overall compatibility
with previous literature values; this is also the case for the models in fig. 5.7, whose plots
differ only slightly from those published earlier. Several outliers in the AM data are visible
in the distribution of the corresponding normalised residuals, which deviates significantly
more from a standard normal distribution than that of the RV residuals. Nevertheless, it
is not necessary to remove outliers for our program to finish successfully.
In the near future, we plan to apply BASE to a potential exoplanet host star. In
this study one of the aims will be to determine the existence probability of a planetary
companion.
Acknowledgements. We wish to thank David W. Hogg, Sabine Reffert, René Andrae, and
Mathias Zechmeister for fruitful discussions, and an anonymous referee for comments that have
improved the quality and clarity of this article. This research has made use of the SIMBAD
database and VizieR catalogue access tool, operated at CDS, Strasbourg, France, as well as NASA’s
Astrophysics Data System.
5.7 Appendix: Encoding prior knowledge
By means of the prior, Bayesian analysis allows one to incorporate knowledge obtained
earlier, e.g. using different data. When no prior knowledge is available for some model
parameter, except for its allowed range, maximum prior ignorance about the parameter
can be encoded by a prior of one of the following functional forms for the most common
classes of location and scale parameters (Gregory 2005b; Sivia 2006).
• For a location parameter, we demand that the prior be invariant against a shift ∆ in
the parameter, i.e.
p(\theta | M, K) \, d\theta = p(\theta + \Delta | M, K) \, d(\theta + \Delta), \qquad (5.66)
which leads to the uniform prior
p(\theta | M, K) = \frac{\Theta(\theta - a) \, \Theta(b - \theta)}{b - a}, \qquad (5.67)
where Θ(·), a, and b are the Heaviside step function and the lower and upper prior
bounds, respectively.
Here, we note that the frequentist approach, lacking an explicit definition of the prior,
corresponds to the implicit assumption of a uniform prior for all parameters.
• A positive scale parameter, which often spans several decades, is characterised by its
invariance against a stretch of the coordinate axis by a factor ϕ, i.e.
p(\theta | M, K) \, d\theta = p(\phi\theta | M, K) \, d(\phi\theta), \qquad (5.68)
which is solved by the Jeffreys prior,
p(\theta | M, K) = \frac{\Theta(\theta - a) \, \Theta(b - \theta)}{\theta \ln\frac{b}{a}}. \qquad (5.69)
That a uniform prior would be inappropriate for this parameter is also illustrated
by the fact that it would assign higher probabilities to θ lying in a higher decade of
[a, b] than in a lower.
• If the lower prior bound of a scale parameter is zero, e.g. for the RV semi-amplitude K,
a modified Jeffreys prior is used. It has the form
p(\theta | M, K) = \frac{\Theta(\theta - a) \, \Theta(b - \theta)}{(\theta + \theta_k) \ln\frac{b + \theta_k}{\theta_k}}, \qquad (5.70)
where θk is the knee of the prior. For θ ≪ θk , this prior is approximately uniform,
while it approaches a Jeffreys prior for θ ≫ θk .
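The three prior forms of eq. (5.67), (5.69) and (5.70) can be written as small Python functions, with a crude numerical check of their normalisation. The function names, the midpoint-rule integrator and the numerical bounds are assumptions of this sketch; the modified Jeffreys prior is written for a lower bound a = 0.

```python
import math

def uniform_prior(theta, a, b):
    """Uniform prior on [a, b], eq. (5.67)."""
    return 1.0 / (b - a) if a <= theta <= b else 0.0

def jeffreys_prior(theta, a, b):
    """Jeffreys prior on [a, b] with a > 0, eq. (5.69)."""
    if not a <= theta <= b:
        return 0.0
    return 1.0 / (theta * math.log(b / a))

def modified_jeffreys_prior(theta, b, knee):
    """Modified Jeffreys prior on [0, b] with knee theta_k, eq. (5.70)."""
    if not 0.0 <= theta <= b:
        return 0.0
    return 1.0 / ((theta + knee) * math.log((b + knee) / knee))

def integrate(pdf, lower, upper, steps=100000):
    """Midpoint-rule integral of pdf over [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(pdf(lower + (k + 0.5) * h) for k in range(steps))

# Illustrative bounds; the knee of 1.0 is an arbitrary choice.
norm_uniform = integrate(lambda t: uniform_prior(t, 0.0, 2.0 * math.pi),
                         0.0, 2.0 * math.pi)
norm_jeffreys = integrate(lambda t: jeffreys_prior(t, 1.0, 100.0), 1.0, 100.0)
norm_modified = integrate(lambda t: modified_jeffreys_prior(t, 69.5, 1.0),
                          0.0, 69.5)
```

All three integrals come out close to one. Far below the knee the modified Jeffreys prior is nearly flat, which is what allows a lower bound of zero.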
5.8 Appendix: Numerical posterior summaries
The following tables list the numerical values of several posterior summaries. Those in
tables 5.9 and 5.10 are derived from the marginal posteriors (fig. 5.5), while the correlation
coefficients in table 5.11 reflect linear relations between parameters and, in this respect,
can be regarded as summaries of the joint marginal posteriors (fig. 5.6).
Compared to the underlying densities, all of these summaries are incomplete. Still, they
are useful e.g. for the calculation of model functions or comparison with literature results.
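For orientation, an HPDI can be estimated directly from posterior samples as the shortest interval containing the required fraction of them. This is a common sample-based estimator for unimodal distributions and is an assumption of this sketch; it is not necessarily how BASE derives the intervals listed below, which may rely on the kernel density estimates of the marginal posteriors.

```python
import math

def hpdi(samples, fraction):
    """Shortest interval containing at least `fraction` of the samples.

    Suitable for unimodal distributions only; returns (lower, upper).
    """
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(fraction * n))  # number of samples inside
    # Slide a window of k consecutive samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda s: xs[s + k - 1] - xs[s])
    return xs[start], xs[start + k - 1]

# Invented, strongly skewed sample.
samples = [0, 1, 2, 3, 4, 50, 60, 70, 80, 90]
interval_50 = hpdi(samples, 0.5)
```

For a skewed sample such as this one, the interval hugs the dense region instead of being centred on the median.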
Table 5.9: Model parameters: new and previous results. For definitions of the estimates, see section 5.2.2.
Estimate               e        f (d⁻¹)       χ         ω2 (rad)   i (rad)
θ̂                      0.5304   0.048689388   0.93322   4.9771     1.0530
θ̌                      0.5299   0.048689403   0.93315   4.9779     1.0528
θ̃                      0.5295   0.048689409   0.93294   4.9784     1.0527
θ̄                      0.5281   0.048689438   0.93244   4.9807     1.0515
IHPD, 50% lower        0.5282   0.048689341   0.93227   4.9751     1.0513
IHPD, 50% upper        0.5317   0.048689467   0.93379   4.9806     1.0549
IHPD, 68.27% lower     0.5270   0.048689295   0.93175   4.9735     1.0500
IHPD, 68.27% upper     0.5326   0.048689509   0.93430   4.9825     1.0558
IHPD, 95% lower        0.5152   0.048688907   0.92550   4.9653     1.0383
IHPD, 95% upper        0.5380   0.048690031   0.93860   5.0019     1.0615
IHPD, 99% lower        0.4905   0.048688483   0.91472   4.9530     1.0134
IHPD, 99% upper        0.5451   0.048690874   0.94285   5.0423     1.0699
Lit. estimate²⁶ (1)    0.537    0.04868881    0.93487   4.9595     ...
Uncertainty²⁷ (1)      0.004    ...           ...       0.0201     ...
Lit. estimate²⁸ (2)    0.5354   0.048689403   0.93524   4.9620     1.0559
Uncertainty (2)        0.0025   0.000000119   0.00097   0.0052     0.0052

Estimate               Ω (rad)   ϖ (mas)   V (km s⁻¹)   K1 (km s⁻¹)   K2 (km s⁻¹)
θ̂                      1.8381    38.74     −6.02        58.84         57.16
θ̌                      1.8374    38.69     −6.20        58.60         57.39
θ̃                      1.8365    38.91     −6.04        58.41         56.97
θ̄                      1.8346    39.02     −6.02        58.21         56.83
IHPD, 50% lower        1.8345    38.13     −6.93        57.14         55.47
IHPD, 50% upper        1.8397    39.66     −5.21        60.21         58.47
IHPD, 68.27% lower     1.8327    37.74     −7.31        56.18         54.69
IHPD, 68.27% upper     1.8411    40.06     −4.68        60.84         59.23
IHPD, 95% lower        1.8165    35.73     −9.96        51.77         50.35
IHPD, 95% upper        1.8485    42.24     −2.08        64.54         63.16
IHPD, 99% lower        1.7790    34.24     −12.40       46.87         45.66
IHPD, 99% upper        1.8593    45.02     0.22         67.05         66.49
Lit. estimate²⁶ (1)    ...       ...       −5.64        58.04         57.03
Uncertainty²⁷ (1)      ...       ...       0.15         0.70          0.80
Lit. estimate²⁸ (2)    1.850     39.4      ...          58.33         56.69
Uncertainty (2)        0.007     0.3       ...          2.40          2.33
References. (1) Prevot (1961); (2) Hummel et al. (1998)
26
For f and χ, literature values and uncertainties are calculated from t1 and the original parameters P and T according to table 5.2. K1 and K2 are derived from
Kalt,1 and Kalt,2 via e according to table 5.2.
27
Uncertainties are missing for P in Prevot (1961) and thus for f and χ.
28
For f and χ, see footnote 26. K1 and K2 are derived from P, a′rel , ϖ, ρ and i using eq. (5.40), (5.44) and (5.47).
Table 5.10: Derived quantities: new and previous results. For definitions of the estimates, see section 5.2.2.
Estimate              P (d)       T (d)²⁹     d (pc)   ρ
θ̌                     20.538350   36997.247   25.74    1.028
θ̃                     20.538347   36997.251   25.69    1.025
θ̄                     20.538335   36997.262   25.66    1.027
IHPD, 50% lower       20.538322   36997.234   25.18    0.986
IHPD, 50% upper       20.538375   36997.265   26.20    1.061
IHPD, 68.27% lower    20.538305   36997.223   24.94    0.969
IHPD, 68.27% upper    20.538395   36997.276   26.47    1.082
IHPD, 95% lower       20.538085   36997.135   23.46    0.866
IHPD, 95% upper       20.538558   36997.404   27.72    1.187
IHPD, 99% lower       20.537729   36997.044   21.98    0.751
IHPD, 99% upper       20.538737   36997.622   28.92    1.305
Lit. estimate (1)     20.53860    36997.212   ...      1.018
Uncertainty (1)       ...         0.022       ...      0.018
Lit. estimate (2)     20.53835    36997.20    25.38    1.029
Uncertainty (2)       0.00005     0.03        0.19     0.041

Estimate              Kalt,1 (km s⁻¹)   Kalt,2 (km s⁻¹)   m1 (M☉)   m2 (M☉)
θ̌                     68.70             68.05             2.477     2.500
θ̃                     68.85             67.15             2.459     2.517
θ̄                     68.56             66.93             2.455     2.521
IHPD, 50% lower       67.30             65.39             2.320     2.311
IHPD, 50% upper       70.90             68.91             2.609     2.689
IHPD, 68.27% lower    66.26             64.48             2.238     2.228
IHPD, 68.27% upper    71.71             69.82             2.678     2.809
IHPD, 95% lower       60.95             59.10             1.827     1.775
IHPD, 95% upper       75.99             74.19             3.051     3.235
IHPD, 99% lower       54.61             53.79             1.475     1.463
IHPD, 99% upper       78.41             78.28             3.415     3.641
Lit. estimate (1)     68.80             67.60             ...       ...
Uncertainty (1)       0.79              0.91              ...       ...
Lit. estimate (2)     69.06             67.13             2.43      2.50
Uncertainty (2)       3.84              3.74              0.07      0.07

Estimate              a1 (AU)   a2 (AU)   arel (AU)
θ̌                     0.12657   0.12418   0.25074
θ̃                     0.12678   0.12353   0.25068
θ̄                     0.12658   0.12393   0.25017
IHPD, 50% lower       0.12331   0.11742   0.24628
IHPD, 50% upper       0.13051   0.12987   0.25555
IHPD, 68.27% lower    0.12156   0.11385   0.24388
IHPD, 68.27% upper    0.13269   0.13293   0.25803
IHPD, 95% lower       0.11252   0.10058   0.22977
IHPD, 95% upper       0.14028   0.14681   0.26908
IHPD, 99% lower       0.10452   0.08958   0.21512
IHPD, 99% upper       0.14637   0.16321   0.27939
Lit. estimate (1)     ...       ...       ...
Uncertainty (1)       ...       ...       ...
Lit. estimate (2)     0.12652   0.12297   0.24949
Uncertainty (2)       0.00519   0.00504   0.00205
References. (1) Prevot (1961); (2) Hummel et al. (1998)
29
T is given in the reduced Julian date scale, i.e. as Julian Date − 2.4 × 10⁶ d.
Table 5.11: Pearson correlation coefficients of pairs of parameters.
      e          f          χ          ω2         i          Ω          ϖ          V          K1
f    −0.21017
χ     0.35120   −0.43922
ω2   −0.62560    0.18818   −0.35074
i     0.52716   −0.11723    0.28022   −0.57440
Ω     0.60484   −0.22616    0.34159   −0.62355    0.47121
ϖ     0.02224    0.04643   −0.03317   −0.02066    0.05501   −0.00134
V    −0.01699    0.02057   −0.02176    0.01468   −0.01012   −0.01436    0.00752
K1    0.11517   −0.07289    0.09702   −0.12194    0.09822    0.11416   −0.33947    0.00868
K2    0.04381   −0.03506    0.04188   −0.04901    0.03520    0.04618   −0.32930   −0.02639    0.02274
Chapter 6
A planet around ε Eridani?
Assessing the presence of a companion
6.1 Introduction
For decades, the nearby cool star ε Eridani has been suspected to host a planetary system,
with the first confirmed hint at a potential companion given based on Doppler spectroscopy
by Walker et al. (1995). ε Eridani is a low-mass star of 0.82 M☉ (Butler et al. 2006) and
spectral type K2 V (Gray et al. 2006), situated only about 3.22 pc from the Sun. While
the controversial age estimates for ε Eridani range from about 100 to 1000 Myr, Janson
et al. (2008) named 440 Myr, inferred from its rotation rate, as the most probable
estimate. Its nearness and youth make ε Eridani one of the most interesting planet-host
candidates and a promising target for imaging searches. To date, however, none of the
putative companions to this star have been confirmed by direct imaging.
[Figure 6.1. v (m s⁻¹) versus Julian year, 1980 to 2010; data sets: CFH, L, M1, M2, CL, M3, CV, H.]
Figure 6.1: The sets of ε Eridani radial-velocity data analysed in this chapter. Constants
have been added so as to nullify the mean values of each set. Abbreviations are defined in
table 6.1.
Analyses of ε Eridani’s radial-velocity (RV) time series have so far yielded contradictory
outcomes in terms of the frequencies and causes of periodicities. This is partly due to the
fact that the star exhibits a high amount of RV jitter, i.e. noise caused predominantly by its
well-studied strong magnetic activity (e.g. Rueedi et al. 1997) – a common feature of cool
stars (Wilson 1978). This activity, perhaps undergoing quasi-periodic cycles, may modulate
the photospheric granulation, causing the spectral line profiles and thus the observed radial
velocity to vary coincidentally (Dravins 1985). The detection of planetary signals is further
hampered by the presence of star spots of different temperature on the photosphere, which
also modulate the observed radial velocity through changing line profiles with a period
corresponding to the stellar rotation.¹
Aspects of ε Eridani’s activity during the years 1986 – 1992 were assessed by Gray
and Baliunas (1995), who concluded that its strong magnetic activity showed “regular
excursions” and hints of an underlying 5-yr cycle in the S-index. The Ca II H- and K-line
profiles showed rotational modulation with a period of Prot = 11.1 d, varying between 11
and 20 d over the data sets from individual seasons. ε Eridani’s luminosity was found to
vary by only 1.2%. Walker et al. (1995) could not detect large photometric changes either,
making it less likely that stellar oscillations are a significant contributor to the intrinsic
stellar variability.
Already hinted at by a far-infrared excess at 60 µm determined from the IRAS catalogue
by Aumann (1985), a dusty ring or debris disk was imaged about 60 AU from ε Eridani
with an inclination idisk ≈ 25° (Greaves et al. 1998). Its morphology could be modelled by
assuming the presence of an outer planetary companion ε Eri c with a semi-major axis
of 40 AU and a mass of about mc = 0.1 MJ , corresponding to an orbital period of about
280 yr (Quillen and Thorndike 2002).
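The quoted period follows from Kepler's third law. The sketch below uses the form P² = a³/M, valid for P in years, a in AU and M in solar masses, and neglects the planetary mass, which is an assumption of this example; the stellar mass of 0.82 M☉ is the value cited in the introduction.

```python
import math

def orbital_period_years(a_au, m_star_solar):
    """Kepler's third law for a companion of negligible mass."""
    return math.sqrt(a_au ** 3 / m_star_solar)

# Semi-major axis of the putative outer companion and the stellar mass.
period_c = orbital_period_years(40.0, 0.82)
```

The result is just under 280 yr, in agreement with the rounded value given above.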
6.2 Previous work
In the following, the various analyses of AM and RV data of ε Eridani found in the literature
are discussed. Figure 6.1 shows the RV data sets analysed in this chapter. Some of their
properties are listed in table 6.1 along with the abbreviations used below.
1
Several observables have been found which may indicate stellar activity. Changes in the symmetry of a
stellar line profile can be detected from the bisector velocity span (BVS) (Toner and Gray 1988), while
the S-index measures the strength of the Ca II H- and K-lines, thereby allowing one to infer variations in the
strength of the magnetic field (Schrijver et al. 1989; Baliunas et al. 1995).
2
The CFH set, see table 6.1.
Campbell et al. (1988) concluded from RV data spanning about six years that the
star is a “probable variable”. They estimated a linear trend and curvature from the RV
data, but no perturbation period. In a follow-up article, Walker et al. (1995) employed a
generalisation of the Lomb-Scargle periodogram to search for periods exceeding 40 d in RV
data² collected over a time span of 11 yr. Periods of P1 = 9.88 yr (with a semi-amplitude
Kalt ≈ 14 m s⁻¹) and P2 = 56 d were found but designated only “marginally significant”.
The authors determined that P1 and P2 were aliases of each other (§ 3.1.4), without being
able to discern the actual periodicity or make a definite detection. However, they derived
upper planetary mass limits of mp ≲ 1.8 MJ for P1 and mp ≲ 0.5 MJ for P2 .
The same data were later re-analysed by Nelson and Angel (1998) using least-squares
fitting, who found the four periods 11.9 d ≈ Prot , 52.5 d, 7 yr < P < 8 yr and 10 yr, arguing
that they were all probably related to stellar rotation. A common plot of RV and BVS time
series of ε Eridani was presented by McMillan et al. (1996). Due to a visual correlation,
the authors suggested that the RV variations were probably caused by granular convection.
Cumming et al. (1999) applied their floating-mean periodogram (§ 3.1.4) for circular
orbits to ε Eridani RV data covering a time span of 11.2 yr. They inferred a period of
P = 2520 d with Kalt = 14.7 m s−1 (the 99th percentile being 20 m s−1 ), but did not confirm
the planetary origin of the signal.
By contrast, based on several RV data sets3 spanning over 19 yr, Hatzes et al. (2000)
concluded for the first time that the presence of a planetary companion ε Eri b was
the simplest and most likely hypothesis to explain the observed variations. The periods
derived with two different methods were P = 2502.1 d and P = 2503.5 d, respectively, i.e.
P ≈ 6.85 yr, with an RV semi-amplitude Kalt = 19.0 ± 1.7 m s−1 . A periodogram of the
Ca II H and K S-index revealed periods of 20 yr, 3 yr and 3.8 yr, in the order of decreasing
periodogram power, while a peak at 6.78 yr, near P , had a lower power and was deemed
insignificant. The authors admitted that the variations might still be caused by stellar
activity, but noted that since no correlation with the S-index was observed, this would
be counter to common understanding. Among other parameters, the orbital solution of
Hatzes et al. (2000) was characterised by eccentricity e = 0.608 ± 0.041 and “projected”
planetary mass (§ 2.2.4) mp sin i = 0.86 MJ .
Around the same time, Gatewood (2000) announced the results of a first astrometric
orbital analysis of ε Eri b. After cancelling proper motion and parallactic displacement from
112 Multichannel Astrometric Photometer (MAP) data, and keeping the RV parameters
fixed, they estimated the additional AM parameters, including an angular semi-major axis
of the stellar orbit a0 = 1.51 ± 0.41 mas, a planetary mass mb = 1.2 ± 0.33 MJ and orbital
inclination i = 46 ± 17 ◦ .
Zucker and Mazeh (2001) performed a combined fit of RV data with the original
Hipparcos intermediate AM data (Perryman et al. 1997) to find an angular semi-major
axis of 10.1 mas, with a “peculiarly high” 99th-percentile value of 26.49 mas determined by
means of a bootstrapping technique.
Another joint fit to AM and RV data was performed by Benedict et al. (2006), who used
AM data from the HST Fine Guidance Sensor 1r obtained over 3 yr along with various RV
data sets4 covering more than 25 yr. Additionally, MAP AM data were added to improve
the determination of proper motion and parallax. The authors found a period of 2502±10 d,
RV semi-amplitude Kalt = 18.5±0.2 m s−1 and projected mass mp sin i = 0.78±0.08 MJ , similar
to the results of Hatzes et al. (2000), whereas their best-fit eccentricity e = 0.702 ± 0.039
was somewhat higher. Some of the astrometric parameters of Gatewood (2000) were
approximately reproduced by Benedict et al. (2006), including the angular semi-major axis
a0 = 1.88 ± 0.2 mas, mass mb = 1.55 ± 0.24 MJ , and inclination i = 30.1 ± 3.8◦ ≈ idisk .
Based on 120 RV data5 spanning over 16 yr, Butler et al. (2006) characterised the
orbit by a period P = 2500 ± 350 d and RV semi-amplitude Kalt = 18.6 ± 2.9 m s−1 , both
very similar to the results of Hatzes et al. (2000) and Benedict et al. (2006), but a lower
eccentricity e = 0.25 ± 0.23 and higher projected mass mp sin i = 1.06 ± 0.16 MJ .
3 Probably including subsets of the CFH, L, M1, M2, and CL sets.
4 Including set M3.
5 Set L.
Table 6.1: Radial-velocity data sets used in this chapter, sorted by first observing time.
The median uncertainty σ̃ and the root mean square deviation about the mean, ϱ, are
defined in section 6.3.2.
Abbrev.   Observatory/instrument   Timespan [Jyr]    No. records   σ̃ [m s−1]   ϱ [m s−1]   Reference
CFH       CFHT                     1980.8 – 1991.9    65           13.4         16.48        1
L         Lick                     1987.7 – 2004.0   120            4.5         16.58        2
M1        McDonald                 1988.7 – 1994.8    32           24.8         17.71        3
M2        McDonald                 1990.8 – 1998.1    42           15.0         14.26        4
CL        CES+LC                   1992.8 – 1998.0    66            9.68        13.64        5
M3        McDonald                 1998.7 – 2006.2    33            5.6          7.39        6
CV        CES+VLC                  1999.9 – 2005.9    69            8.29         9.91        7
H         HARPS                    2003.8 – 2007.7   521            0.32         5.73        8
References. (1) Walker et al. (1995); (2) Butler et al. (2006); (3) Benedict et al. (2006);
Meschiari et al. (2009); (4) Benedict et al. (2006); Meschiari et al. (2009); (5) Endl et al.
(2002); Zechmeister, M. (2011, priv.comm.); (6) Benedict et al. (2006); (7) Zechmeister, M.
(2011, priv.comm.); (8) Zechmeister, M. (2012, priv.comm.)
Zechmeister (2010) searched for signatures of circular and eccentric orbits in a combination of three RV data sets6 covering nearly 15 yr. No significant periods were detected,
with the highest periodogram powers found at 1.49 d (eccentric) and 6.30 d (circular),
respectively. The period around 7 yr could not be confirmed.
Reffert and Quirrenbach (2011) performed a fit to AMh data (§ 2.2.2), keeping the
RV parameters fixed at previously published values. The resulting most likely parameters
included an inclination i = 23 ± 20◦ and companion mass mp = 2.4 ± 1.1 MJ . However,
due to the high uncertainty of the AM parameters, the authors did not consider this a
significant detection.
Recently, Anglada-Escudé and Butler (2012) published their analysis of seven RV
data sets7 spanning over 26 yr. The best-fit results included an eccentricity e = 0.4
(the 99% confidence interval being [0.2, 0.68]), period P = 2651 ± 36 d, semi-amplitude
Kalt = 11.8 ± 1.1 m s−1 , and projected mass mp sin i = 0.645 ± 0.058 MJ . Since all of these
numbers differ significantly from most previous estimates, the authors doubted that the
RV variations are of planetary origin but instead referred to activity as a likely cause.
This chapter aims to assess the presence of a planet-induced RV signal in the available
AMh and RV data. This is accomplished by estimating a Bayesian “periodogram” (§ 4.3.3)
using Base. In contrast to the above-mentioned frequentist periodograms, this approach
includes well-defined priors on all parameters of the AMh and RV models (§ 2.2), none
of which need to be kept fixed. The “periodogram” is given by the marginal posterior
of orbital frequency f . Further, the Bayes factor for the competing zero- and one-planet
hypotheses is estimated.
6 The CL and CV sets and most of the H set.
7 The data sets CFH, L, M1, M2, M3, and two other data sets originating from the CES+LC and HARPS
instruments.
Table 6.2: Literature values for parameters. Units are given in table 2.1.
Parameter   Best estimate     Uncertainty σ   Reference
ϖ            0.31094          0.00016         1
µα∗          −975.17          0.21            1
µδ           19.49            0.20            1
αr           53.2350902200    3.89 × 10−8     1
δr           −9.4583060400    3.06 × 10−8     1
References. (1) van Leeuwen (2007)
6.3
Analysis and results
The results of a combined Bayesian analysis of Hipparcos intermediate astrometric data
(AMh ) and RV data of ε Eridani are presented in the following. 78 AMh data (section 2.2.2)
spanning Julian years 1990.0 – 1992.6, with a median measurement uncertainty of 610 µas,
were analysed along with the RV data listed in table 6.1. The priors of all parameters were
chosen as described in the following.
6.3.1
Determination of priors
Section 2.3 introduced the default prior bounds set according to general considerations for
most parameters and according to the data for V, av , σ+ , τ+ , Ω, and K in normal mode.
For the RV semi-amplitude K, the upper bound Kmax was determined from the RV range.8
These default priors have not been altered, except for those discussed in the following. The
prior ranges used are listed in table 6.3.
Jitter.
To determine the maximum allowable jitter, two runs of Base were conducted:
1. Pass A aimed to determine the amount of RV jitter σ+ not accounted for in the RV
data uncertainties. This pass was based on the assumption that the marginal posterior
of σ+ will be shifted to lower jitter values as the observable model becomes more
complex, i.e. that less jitter is “needed” to model the data when a planetary signal is
included, compared to a model without a planet. Thus, to derive an upper limit σ+,max ,
Base was run in 0-planets mode with all RV data. The secular perspective acceleration
was held fixed at the value of v̇ = 0.07031 m s−1 yr−1 determined using eq. (2.49)
with the literature values of proper motion µα∗ , µδ and parallax ϖ listed in table 6.2.
The default priors of the other parameters were not changed.
Pass A resulted in a 99% HPDI (§ 3.2.6) upper bound
σ+,max,99% = 8.365 m s−1 ,   (6.1)
which was kept throughout the following runs as a fixed upper bound, while the lower
bound was set to zero to allow arbitrarily small amounts of jitter. The upper bound
may be compared with, e.g., the external-noise estimate of 6.6 m s−1 ≈ 0.8 σ+,max,99%
by Anglada-Escudé and Butler (2012).
8 This is activated in Base with the –K_max-from-vel option.
Table 6.3: Prior ranges for the ε Eridani analysis. Non-default ranges are derived in
section 6.3.1. Units are given in table 2.1. Parameters printed in boldface only appear in
1-planet mode.
Parameter   Default range   Lower bound      Upper bound
V1          ×               −69.6            48.5
V2          ×               −48.5            56
V3          ×               −60.9            74.4
V4          ×               −47.7            53.8
V5          ×               −23.8            19.8
V6          ×               13675.51         13774.16
V7          ×               13677.28         13743.44
V8          ×               16430.785        16461.749
av          ×               −5.806           5.806
σ+                          0                8.364
ϖ                           0.280            0.342
αr                          53.2350 8998     53.2350 9045
δr                          −9.4583 0622     −9.4583 0585
µα∗                         −1072.687        −877.653
µδ                          17.541           21.439
τ+                          0                1.283
i           ×               0                π
Ω           ×               0                2π
e           ×               0                1
f                           0.0001           1
χ           ×               0                1
ω           ×               0                2π
K           ×               0                67.65
a0                          0.001            4.319
2. In pass B, an analogous procedure was followed for the AMh jitter τ+ , independently
of the RV data. The prior supports for the five AM parameters were set to
relatively wide intervals around the previously published best estimates (table 6.2)
in order to reflect the high uncertainty due to the different estimation techniques,
whereas the default prior of τ+ was not altered. For parallax and proper motion,
ranges of ±10% around the best estimates were adopted. Because this choice would
not be appropriate for location parameters (§ 3.2.2) such as αr and δr , they were
assigned ranges of ±6σ. These priors were kept unchanged over all subsequent passes.
Pass B yielded a 99% HPDI upper bound
τ+,max,99% = 1.283 mas.   (6.2)
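The prior supports adopted in pass B can be checked numerically from the literature values in table 6.2. The following is only an illustrative sketch and not part of Base; the function and variable names are hypothetical.

```python
def relative_range(best, fraction=0.10):
    """Interval of +/- fraction around a best estimate (parallax, proper motion)."""
    half_width = abs(best) * fraction
    return best - half_width, best + half_width


def sigma_range(best, sigma, n_sigma=6.0):
    """Interval of +/- n_sigma standard deviations (location parameters)."""
    return best - n_sigma * sigma, best + n_sigma * sigma


# Literature values from table 6.2 (van Leeuwen 2007); units as in table 2.1.
parallax_range = relative_range(0.31094)
pm_ra_range = relative_range(-975.17)
pm_dec_range = relative_range(19.49)
alpha_range = sigma_range(53.2350902200, 3.89e-8)
delta_range = sigma_range(-9.4583060400, 3.06e-8)

for name, (lo, hi) in [
    ("parallax", parallax_range),
    ("pm_ra", pm_ra_range),
    ("pm_dec", pm_dec_range),
    ("alpha_r", alpha_range),
    ("delta_r", delta_range),
]:
    print(f"{name:9s} [{lo:.10g}, {hi:.10g}]")
```

Under these assumptions the computed intervals agree with the bounds listed in table 6.3, up to differences in the last printed digit.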
Orbital frequency. All analyses in 1-planet mode were based on the assumption that
at most one period P is contained in the total observing time span ∆t = 26.85 yr. Thus,
the minimum orbital frequency was set to fmin ≡ 10−4 d−1 ≈ (26.85 yr)−1 , whereas the
upper bound fmax was set to 1 d−1 , corresponding to a planetary orbital semi-major axis
of 0.034 AU or approximately ten stellar radii of ε Eridani.
Angular semi-major axis. When analysing the AMh data separately, the orbital semi-major axis over distance, i.e. angular semi-major axis a0 , appears as a model parameter.
For it, an upper bound of 6 stdev ({ah,j }) was adopted based on the assumptions that the
orbital phases of significant stellar displacement r 5 (eq. (2.41)) are covered by the data,
that the angles between r 5 and the scan circles are sufficiently small (taken modulo π) to
“detect” the displacement, and therefore the dispersion of the abscissa residuals {∆ah,j }
provides an upper limit on a0 .
6.3.2
Bayesian periodogram of all data
Under the assumption of a 1-planet model, Base was started in pass C with all AMh and
RV data and the priors described above to produce a Bayesian “periodogram”, i.e. marginal
posterior of orbital frequency f . Base collected 5 × 106 posterior samples with a thinning
stride q = 10 to improve convergence with unchanged memory demands (section 4.5.2).
The last 75% of the thinned samples were used for inference.9
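The thinning and burn-in treatment described above can be sketched in a few lines. This is a minimal illustration, assuming that the burn-in is simply the first quarter of the thinned chain; the function name is hypothetical and the actual bookkeeping in Base may differ.

```python
def thin_and_trim(chain, stride=10, keep_fraction=0.75):
    """Keep every `stride`-th sample, then discard the initial burn-in part.

    Assumption: the burn-in is the first (1 - keep_fraction) of the thinned
    chain; this is an illustrative choice, not Base's documented behaviour.
    """
    thinned = chain[::stride]
    n_discard = len(thinned) - int(round(keep_fraction * len(thinned)))
    return thinned[n_discard:]


# Toy chain of 1000 consecutive "samples" (here just their iteration indices).
chain = list(range(1000))
kept = thin_and_trim(chain)
print(len(kept), kept[0], kept[-1])  # 75 250 990
```

Thinning reduces the autocorrelation between stored samples, so a longer chain can be run without storing more values.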
The resulting marginal posterior density is displayed in fig. 6.2. No major peak is
present within any of the previously published confidence intervals lying in the zoomed
interval of fig. 6.2b. This range has a posterior probability of 5.3 × 10−4 , calculated
as the fraction of posterior samples falling into it (see eq. (3.64)). Thus, none of the
peaks in this interval can be considered significant. In particular, the highest peak, at
f ≈ 3.95 × 10−4 d−1 ≈ (2530 d)−1 , covers only 7.4 × 10−5 posterior probability.
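The posterior probability of a frequency interval is estimated above as the fraction of posterior samples falling into it. A minimal sketch of this estimate follows; the log-uniform draws are a made-up stand-in for real MCMC output, and the interval bounds are chosen for illustration only.

```python
import random


def interval_probability(samples, lower, upper):
    """Fraction of posterior samples with lower <= value <= upper."""
    inside = sum(1 for s in samples if lower <= s <= upper)
    return inside / len(samples)


# Hypothetical stand-in for MCMC output: log-uniform draws of f in [1e-4, 1] d^-1.
rng = random.Random(42)
samples = [10.0 ** rng.uniform(-4.0, 0.0) for _ in range(100_000)]

p = interval_probability(samples, 3.4e-4, 4.6e-4)
print(f"P(3.4e-4 <= f <= 4.6e-4) ~ {p:.4f}")
```

For the log-uniform stand-in the expected value is log10(4.6/3.4)/4 ≈ 0.033, so a value as small as the 5.3 × 10−4 reported for pass C means the interval holds far less probability than such a flat-in-log distribution would assign.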
No peak with density p(f |D, M, K) > 0.5 is visible in the frequency range of fig. 6.2c,
which has a minute probability of 1.9 × 10−5 and includes the estimates f = (9.88 yr)−1
(Walker et al. 1995) and f = (10 yr)−1 (Nelson and Angel 1998).
The marginal posterior of f is found to rise for (1 d + 17 s)−1 = 0.9998 d−1 ≤ f ≤ 1 d−1 ,
i.e. towards the highest frequency searched (fig. 6.2d), with the interval covering a posterior
probability of 0.84%. This peak might be explained as an alias as follows. On the one hand,
it is evident from fig. 6.1 that any signal potentially present in the RV data likely has a
relatively low signal-to-noise ratio, which is also hinted at by comparison of the measurement
uncertainties with the radial-velocity dispersion: excluding the H data with their very low
quoted uncertainties, the median measurement uncertainty is σ̃ = 9 m s−1 , while the root
mean square (RMS) deviation of all RVs about the corresponding means is
\varrho_{\mathrm{tot}} = \sqrt{\frac{1}{\sum_i N_i} \sum_{i=1}^{n} \sum_{j=1}^{N_i} \left( v_j^{(i)} - \bar{v}^{(i)} \right)^2} = 14.5\,\mathrm{m\,s^{-1}} = 1.6\,\tilde{\sigma},   (6.3)
where Ni is the number of data in set i, n is the number of data sets, vj(i) is the jth RV
of set i and v̄(i) is the mean RV of set i. Due to this low dispersion compared with the
internal noise level, the data are marginally consistent with a constant RV, associated with
a frequency f0 = 0. This is especially true since additional jitter σ+ is allowed for.
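The RMS deviation of eq. (6.3) subtracts the mean of each data set separately before pooling the squared residuals. A minimal sketch with made-up numbers (the function name and the toy RVs are illustrative assumptions):

```python
import math


def pooled_rms(data_sets):
    """RMS deviation of all values about the mean of their own data set."""
    total_sq = 0.0
    total_n = 0
    for values in data_sets:
        mean = sum(values) / len(values)
        total_sq += sum((v - mean) ** 2 for v in values)
        total_n += len(values)
    return math.sqrt(total_sq / total_n)


# Made-up RVs in m/s for two instruments with different zero points.
set_a = [10.0, 12.0, 14.0]
set_b = [-3.0, 1.0]
print(pooled_rms([set_a, set_b]))
```

Subtracting per-set means removes the instrument-specific velocity offsets, so the result measures only the scatter within each set.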
On the other hand, due to the timing constraints forced by the day–night cycle, data
from Earth-bound telescopes are affected by sampling periods of multiples of one day. For
uniformly spaced sampling at frequency fs , a sinusoidal signal of frequency f can be shown
(e.g. Dawson and Fabrycky 2010) to have aliases at frequencies
f_a = \pm f \pm n f_s, \qquad n \in \mathbb{N}.   (6.4)
9 These settings were kept with all passes.
Table 6.4: Summaries for orbital frequency f [d−1 ] from pass C (all data). Numbers have
been rounded to five significant digits unless inappropriate. Definitions of the estimates
are given in section 3.2.5.
Estimate          Value
fˆ                0.61564
f˜                0.46730
f¯                0.48043
σmean             0.12740
fˇ                0.9999985
σmarg             0.0000010
IHPD, 50%         [0.020628, 0.50148]
IHPD, 68.27%      [0.017454, 0.73696]
IHPD, 95%         [0.016285, 0.97965]
IHPD, 99%         [0.0076181, 0.99999988]
Here, with f = f0 = 0 and fs = 1 d−1 , aliases would therefore be expected at frequencies
fa = n d−1 , including the mentioned peak at 1 d−1 . Likewise, the local maximum seen in
fig. 6.2a at f = 0.5 d−1 might stem from an observation sampling at fs0 = (2 d)−1 .
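The alias relation of eq. (6.4) is easy to evaluate directly. The sketch below enumerates the positive alias frequencies for a given signal and sampling frequency; the function name and the choice to start at n = 1 are assumptions made for illustration.

```python
def alias_frequencies(f, f_s, n_max=2):
    """Positive alias frequencies +/-f +/- n*f_s for n = 1..n_max."""
    aliases = set()
    for n in range(1, n_max + 1):
        for sign_f in (1.0, -1.0):
            for sign_n in (1.0, -1.0):
                f_a = sign_f * f + sign_n * n * f_s
                if f_a > 0.0:
                    # Round to merge values that differ only by float noise.
                    aliases.add(round(f_a, 12))
    return sorted(aliases)


# A constant RV (f = 0) sampled once per day aliases to multiples of 1 d^-1.
print(alias_frequencies(0.0, 1.0))  # [1.0, 2.0]

# A hypothetical 2500 d period sampled daily has aliases very close to 1 d^-1.
print(alias_frequencies(1.0 / 2500.0, 1.0, n_max=1))
```

The second call shows why a long-period signal and a peak near 1 d−1 are hard to tell apart with daily sampling: the aliases lie within 4 × 10−4 d−1 of the sampling frequency.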
Table 6.4 shows the posterior summaries for f to be quite inconclusive.
While the median f˜ equals the mean f¯ to within its uncertainty σmean , both the MAP
value fˆ and the marginal mode fˇ are significantly different, with the latter equal to fˇ =
(1 d + (130 ± 86 ms))−1 .
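The conversion of the marginal mode and its uncertainty from table 6.4 into a period offset from exactly one day can be reproduced as follows. This is a small worked example using first-order error propagation; the function and variable names are illustrative.

```python
SECONDS_PER_DAY = 86400.0


def period_excess_seconds(f_per_day):
    """Excess of the period 1/f over exactly one day, in seconds."""
    return (1.0 / f_per_day - 1.0) * SECONDS_PER_DAY


# Marginal mode and sigma_marg of f in d^-1, taken from table 6.4.
f_mode, sigma_mode = 0.9999985, 0.0000010

excess = period_excess_seconds(f_mode)
# First-order error propagation: |dP/df| = 1/f^2, so sigma_P = sigma_f / f^2.
sigma_excess = sigma_mode / f_mode**2 * SECONDS_PER_DAY
print(f"{excess * 1e3:.0f} ms +/- {sigma_excess * 1e3:.0f} ms")  # 130 ms +/- 86 ms
```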
6.3.3
Bayesian periodograms of individual data sets
In order to assess which data set is responsible for which of the many dominant frequencies
in the combined marginal posterior (fig. 6.2a), a set of additional passes D1 – D9 was
conducted, each using only one data set in the order of increasing observing time span
(table 6.5).
Table 6.5: Posterior probabilities for frequencies f ∈ I, with I ≡ [10−4 d−1 , 5 × 10−4 d−1 ],
based on individual data sets ordered by increasing observing time span ∆t.
Data set   ∆t [yr]   p(f ∈ I|D, M, K)
AMh         2.5      0.0500
H           3.8      5.07 × 10−4
CL          5.2      0.0131
CV          6.0      0.0596
M1          6.1      0.183
M2          7.3      0.195
M3          7.5      0.287
CFH        11.1      0.344
L          16.3      0.199
The resulting marginal posteriors are displayed in fig. 6.3. A striking feature of these
densities is the rise of more or less clear peaks around low frequencies f ≲ 5 × 10−4 d−1 ≈
(5 yr)−1 with increasing time span in certain data sets, i.e. fig. 6.3c – 6.3d (sets CL and
CV ) and fig. 6.3f – 6.3i (sets M2, M3, CFH, and L). In the latter set of figures, the
maximum also rises with an increasing RMS-to-uncertainty ratio10 ϱ σ̃ −1 but similar time
span ∆t (fig. 6.3f – 6.3g) and again becomes more pronounced with a longer time span but
similar RMS-to-uncertainty ratio (fig. 6.3g – 6.3h). Finally, for the last data set with its
longest time span and second-highest ϱ σ̃ −1 (fig. 6.3i), a peak at f = (2668 d)−1 containing
about 17.4% probability is clearly visible.
Obviously, both ∆t and ϱ σ̃ −1 correlate positively with the height and “sharpness”
of the peaks around f ≈ (2500 d)−1 , a frequency which approximately corresponds to
the periods published in some of the previous literature. The role of the time spans
is also demonstrated by table 6.5, which lists the posterior probabilities for frequencies
f ∈ [10−4 d−1 , 5 × 10−4 d−1 ]: a steady rise in probability is observed with the time span increasing
for sets H through CFH , excepting only the shortest and longest data sets. Moreover, in
the diagrams of fig. 6.3a, 6.3e, and 6.3f, a general increase in probability towards very
low frequencies f ≲ 2 × 10−4 d−1 , any “specific” peaks being only minor, is also evident.
These correspond to the data sets with the lowest ϱ σ̃ −1 . In the latter case, where a
sub-peak around f ≈ 3.5 × 10−4 d−1 becomes visible (fig. 6.3f), ∆t is the highest of the
three and ϱ σ̃ −1 exceeds that of the second diagram. This positive correlation of the time
spans ∆t and RMS-to-uncertainty ratios ϱ σ̃ −1 on the one hand with the probability around
f ≈ (2500 d)−1 on the other hand may indicate that this signal is real.
One particular data set, however, is different from all the others (table 6.1) in that
it combines by far the highest RMS-to-uncertainty ratio and number of data with the
second-lowest time span: the H set. Figure 6.3b and table 6.5 reveal that this set by itself
easily has the highest posterior probability for orbital frequencies in excess of 5 × 10−4 d−1 ,
with strong peaks recognisable only for f ≳ 0.01 d−1 . A regularity is seen in the spacing
of the peaks at these higher frequencies, which may be due to aliasing (§ 6.3.2) caused
by the large volume of data taken over a short time span. The latter circumstance may
particularly increase the effect of the daily observing windows in comparison to longer
windows or true periods. Comparison with fig. 6.3a, i.e. the AMh set with its lower
RMS-to-uncertainty ratio ϱ σ̃ −1 but similarly low time span, may indicate that the high
ϱ σ̃ −1 of the H set also plays a role for the strength of higher frequencies.
Consequently, the H set may be suspected as a significant cause of the strong presence
of peaks at higher frequencies in the combined marginal posterior (fig. 6.2) and the relative
weakness of lower frequencies, despite the latter being prominent in several of the individual
data sets (fig. 6.3). With its 521 data and very low internal uncertainties – whose median
does not exceed 7.1% of that of any other set (table 6.1) – this data set undoubtedly
exerts the strongest influence on the combined likelihood (eq. (3.21)) and therefore on the
marginal posterior of f . This proposition seems to be further corroborated by the fact
that the Bayesian evidence, i.e. prior-averaged likelihood, could only be calculated when
data grouping was applied, reducing the number of HARPS data to an order of magnitude
similar to that of most other data sets (section 6.3.4).
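The RMS-to-uncertainty ratios quoted for the RV sets, and the 7.1% figure for the HARPS uncertainties, follow directly from table 6.1. A short check, with the dictionary layout chosen here purely for illustration:

```python
# (median uncertainty, RMS) in m/s for each RV set, copied from table 6.1.
TABLE_6_1 = {
    "CFH": (13.4, 16.48),
    "L": (4.5, 16.58),
    "M1": (24.8, 17.71),
    "M2": (15.0, 14.26),
    "CL": (9.68, 13.64),
    "M3": (5.6, 7.39),
    "CV": (8.29, 9.91),
    "H": (0.32, 5.73),
}

# RMS-to-uncertainty ratio per set, printed in increasing order.
ratios = {name: rms / sigma for name, (sigma, rms) in TABLE_6_1.items()}
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:4s} {ratio:5.2f}")

# Median uncertainty of H relative to the next most precise set.
sigma_h = TABLE_6_1["H"][0]
smallest_other = min(s for name, (s, _) in TABLE_6_1.items() if name != "H")
print(f"H / next-best median uncertainty: {sigma_h / smallest_other:.3f}")
```

The H set stands out with a ratio near 18, roughly five times that of the next-highest set (L).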
10 The RMS per data set is defined analogously to eq. (6.3). The values of RMS and median uncertainties
are listed for each set in table 6.1.
[Figure 6.2, panels (a) to (d): p(f |D, M, K) plotted against f [d−1 ].]
Figure 6.2: Marginal posterior of orbital frequency f under the 1-planet model using all data
(pass C). The ordinates are normalised according to linear spacing in a logarithmic abscissa.
(a) Periodogram over complete f range. (b) Blow-up of interval containing most literature
estimates, indicated by horizontal error bars. (c) Frequency interval free from peaks with
p(f |D, M, K) > 0.5 around two further literature estimates. (d) Upper frequency range.
Abbreviations: N (Nelson and Angel 1998), C (Cumming et al. 1999), H (Hatzes et al.
2000), Bu (Butler et al. 2006), Be (Benedict et al. 2006), A (Anglada-Escudé and Butler
2012), W (Walker et al. 1995).
(a) AMh data. ∆t = 2.5 yr, ϱ σ̃ −1 = 1.17
(b) RV data set H. ∆t = 3.8 yr, ϱ σ̃ −1 = 17.9
(c) RV data set CL. ∆t = 5.2 yr, ϱ σ̃ −1 = 1.41
(d) RV data set CV. ∆t = 6.0 yr, ϱ σ̃ −1 = 1.20
(e) RV data set M1. ∆t = 6.1 yr, ϱ σ̃ −1 = 0.71
(f) RV data set M2. ∆t = 7.3 yr, ϱ σ̃ −1 = 0.95
(g) RV data set M3. ∆t = 7.5 yr, ϱ σ̃ −1 = 1.32
(h) RV data set CFH. ∆t = 11.1 yr, ϱ σ̃ −1 = 1.23
(i) RV data set L. ∆t = 16.3 yr, ϱ σ̃ −1 = 3.68
[All panels: abscissa f [d−1 ], logarithmic from 0.0001 to 1.]
Figure 6.3: Periodograms of each data set taken separately, ordered by increasing observing time span from left to right and top to bottom. Ordinates
follow from normalisation and have been omitted for clarity. Also listed are the observing time spans ∆t and the ratios ϱ σ̃ −1 of the RMS deviation of
the observable from its mean to the median measurement uncertainty.
Table 6.6: Number of original and grouped data for each set.
           AMh   CFH   L     M1   M2   CL   M3   CV   H
Original   78    65    120   32   42   66   33   69   521
Grouped    42    50    120   32   42   28   29   23   28
6.3.4
Model selection
In order to contrast the above reasoning with a quantitative measure of the probability of
a planet’s presence around ε Eridani, the Volume Tessellation Algorithm (VTA, § 3.2.7)
implemented in Base was applied to the posterior samples based on all data sets. Using
the unaltered data, the evidences Z0 , Z1 for the 0- and 1-planet models, respectively, were
numerically zero. This may have been caused by the large number of data, especially
the over 500 HARPS RVs with very low nominal measurement uncertainties, which in
combination imply tiny likelihoods for parameter combinations associated with even small
residuals (eq. (3.6), (3.16) and (3.20)).
To circumvent this obstacle, the data were grouped (§ 4.4.2) using maximum time
spans of 30 min, 2 h, and 5 h, respectively. For the L, M1, and M2 sets, no groups could be
formed even to within 5 h. For the CFH, M3, and CV sets, the groups remained unchanged
regardless of the chosen time span, while for the other two RV sets, the resulting number
of data varied within a range of only ±2. Therefore, a grouping time span of 2 h was
adopted. For the AMh data, the variations in scan-orientation angle and parallax factor
within each group were found to lie below 1◦ and 1%, respectively. The numbers of original
and grouped data in each set are listed in table 6.6.
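The idea of grouping chronologically successive data within a maximum time span can be illustrated as follows. This is only a sketch under stated assumptions: groups are formed greedily and each group is replaced by the inverse-variance weighted mean of its members. The scheme actually implemented in Base (§ 4.4.2) is not reproduced here and may differ; the function name and the example data are hypothetical.

```python
def group_measurements(times, values, sigmas, max_span):
    """Greedily merge chronologically successive data lying within max_span.

    Assumption: each group is replaced by the inverse-variance weighted mean
    of its members. Times must be sorted in ascending order.
    Returns a list of (time, value, sigma) tuples.
    """
    groups = []
    start = 0
    for k in range(1, len(times) + 1):
        # Close the current group at the end of the data, or when the next
        # measurement would stretch the group beyond max_span.
        if k == len(times) or times[k] - times[start] > max_span:
            weights = [1.0 / s**2 for s in sigmas[start:k]]
            w_sum = sum(weights)
            t = sum(w * x for w, x in zip(weights, times[start:k])) / w_sum
            v = sum(w * x for w, x in zip(weights, values[start:k])) / w_sum
            groups.append((t, v, w_sum ** -0.5))
            start = k
    return groups


# Made-up example: times in days, RVs and uncertainties in m/s; 2 h = 2/24 d.
times = [0.00, 0.02, 0.05, 1.00, 1.01, 3.00]
values = [1.0, 3.0, 2.0, -1.0, 1.0, 5.0]
sigmas = [1.0, 1.0, 1.0, 2.0, 2.0, 0.5]
grouped = group_measurements(times, values, sigmas, max_span=2.0 / 24.0)
for group in grouped:
    print(group)
```

In this toy case six measurements collapse into three groups, which mirrors how dense nightly series such as the HARPS set shrink strongly while sparsely sampled sets remain unchanged.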
With grouping, the evidences and Bayes factors given in table 6.7 were calculated.
According to Kass and Raftery (1995), Bayes factors B1,0 < 10−2 may be interpreted as
“decisive” evidence for model M0 . In the present case, the estimated values B1,0 < 10−108
are, of course, many orders of magnitude lower and seem to provide definite evidence in
favour of the no-planet hypothesis.
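Evidences as small as those in table 6.7 cannot be represented as double-precision numbers, so Bayes factors of this size are best handled in logarithmic form. The sketch below stores each evidence as a (mantissa, decimal exponent) pair; this representation and the function name are illustrative assumptions and do not describe how Base stores its results.

```python
import math


def log10_bayes_factor(z1, z0):
    """log10(Z1 / Z0) for evidences given as (mantissa, decimal exponent) pairs."""
    (m1, e1), (m0, e0) = z1, z0
    return (math.log10(m1) + e1) - (math.log10(m0) + e0)


# Evidences from table 6.7 for c = 4 leaves, stored as (mantissa, exponent).
z0 = (1.49, -700)
z1 = (3.59, -826)

print(1.49e-700 == 0.0)                      # True: underflows in double precision
print(round(log10_bayes_factor(z1, z0), 2))  # -125.62
```

The result corresponds to B1,0 ≈ 2.4 × 10−126, matching the first row of table 6.7 to the precision of the rounded evidences.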
Three caveats, however, should not remain unmentioned. First, this inference is based
on the specified prior knowledge. While in the frequentist framework, prior knowledge takes
the form of flat, implicit priors (§ 3.2.2), the Bayesian approach makes priors “visible”,
causing the necessity – and chance – to explicitly specify them. However, care has been
taken in specifying the prior knowledge for the present analysis (section 6.3.1). Second,
the values are also based on the available data, their number, time sampling, and other
potential imperfections. And third, the data were grouped together in order to be able
to perform the calculations. The fact that this reduced the number of HARPS data from
521 to only 28, i.e. the same order of magnitude as for most other sets, illustrates the
significant effect of grouping.
6.4
Conclusions
According to the analyses presented, it seems questionable whether the determined Bayes
factors should be taken as conclusive evidence against a planet around ε Eridani. Besides
the modification of data applied in the latter case, the results of sections 6.3.2 and 6.3.3
suggest that the outcome of any analysis based on the present AMh and RV data may
be strongly influenced by the aliasing, large number and small nominal uncertainties of
the HARPS data set. As mentioned in section 6.3.3, the fact that the prior-averaged
likelihoods, i.e. evidences, could only be calculated with data grouping also hints that
an excessive number of low-uncertainty RV data, particularly when strongly affected by
aliasing, may interfere with the combined analysis of all data. The fact that a periodicity
was clearly detected by Hatzes et al. (2000), Anglada-Escudé and Butler (2012) and others,
but not by us, may thus be attributed to the circumstance that their analyses did not
make use of a number of data comparable to that of the HARPS set.
Table 6.7: Evidences Z0 and Z1 for the 0- and 1-planet models, respectively, and Bayes
factors B1,0 . Values have been estimated with different numbers of leaves c (section 3.2.7).
c    Z0              Z1              B1,0
4    1.49 × 10−700   3.59 × 10−826   2.42 × 10−126
8    1.23 × 10−698   2.45 × 10−807   2.00 × 10−109
16   2.35 × 10−689   2.85 × 10−805   1.21 × 10−116
32   3.07 × 10−676   2.85 × 10−805   9.28 × 10−130
64   1.36 × 10−676   6.40 × 10−800   4.71 × 10−124
Further research is therefore necessary to better understand the significance of the
effects discussed in this chapter for the analysis of the ε Eridani data. In particular, a
deeper understanding of ε Eridani’s strong and potentially manifold activity seems to
be key to answering the question about the reality of a planetary companion. Developing
models for the activity-induced effects on the observed data which go beyond the often-made
assumption of additional Gaussian stellar jitter may aid any further analyses of
astrometric, radial-velocity, or other data for the minute effects of planetary motion. Such
an understanding may indeed be helped by large volumes of high-cadence data such as
those in the HARPS data set, because they provide a better time resolution of the RV
variation than many other data. As detailed analyses of ε Eridani’s activity would have
been beyond the scope of this work, they can only be recommended for future
research. Based on such future assessments, supplemented with a better understanding of
the effects of the narrow time sampling on the current problem and potentially based on
further data, it may be possible to draw definite and statistically reliable conclusions on
the presence of its long-suspected planetary companion.
Chapter 7
Conclusions
A short summary of this work
While the search for life outside the Earth still has not been successful, potential places
of residence for it are being unveiled in growing numbers. Probably formed in a common
process with their host stars, such exoplanets are not only of prime interest to those
interested in extraterrestrial life. Even the history of our own Solar System, the phases and
time scales of its evolution, are still not completely understood, and increasing the sample
of well-characterised planetary systems potentially quite different from our own helps to
put constraints on the variety of scenarios for the formation and evolution of such systems.
Detecting exoplanets and determining their orbits around their host stars has been made
possible by a number of different observational techniques, enabling scientists to obtain
observational evidence for the stellar motion caused by its companion. The alternative of
directly imaging the planet faces severe problems due to the very high brightness contrast
to the nearby host star. For indirectly detected companions, the question of whether they
are indeed of planetary nature or rather of higher mass may not be an obvious one to
answer, since the stellar motion is governed in all cases by the same physical laws1 and
only aspects of it can be observed. Moreover, noise and other sources of error contribute to
the measured value by a generally unknown amount, raising the need for both reliable statistical
methods and improved instrumental precision. This is particularly the case as the
focus shifts towards the relatively low-mass rocky planets akin to our Earth.
Two kinds of observational techniques, namely astrometry and Doppler spectroscopy,
have been investigated in the present work. We have derived models by which to theoretically
describe the values of several corresponding observables: relative astrometric positions
of two binary components obtained using interferometry, astrometric abscissa residuals
pertaining to the host star as measured by instruments aboard the now inoperative
Hipparcos satellite, and stellar radial velocities routinely inferred from shifts in the stellar
spectral lines. These models depend on a set of parameters which ultimately characterise
the orbits of the involved objects around each other or the centre of mass.
Given the observed data, the model parameters are adjusted in a statistical data-analysis
process so as to make the model approximate the observed values. The unavoidable residuals,
or mismatch, between data and model are identified with noise in the absence of systematic
errors and under the assumption that the model is “correct”. The model parameters
are repeatedly adjusted, each time rating the resulting residuals by reference to a noise
model, thus judging the appropriateness of the underlying parameters. Finally, the most
probable values of the parameters, or a probability density over them, can be estimated,
corresponding to the characterisation of the planetary orbit.
1 Only classical mechanics have been considered in this thesis.
In this work, the Bayesian approach to inference is primarily detailed. It allows prior
knowledge on the model parameters to be specified explicitly and, in light of the data, their
posterior probability density to be obtained. This density can then be summarised by most
probable parameter values and uncertainties. Also, Bayes factors can be estimated which
include an Ockham’s razor penalising a too-complex model for its inflated parameter
space. Bayes factors therefore provide a statistically sound quantity for selecting the most
probable of several models – one of the features of the Bayesian approach missing in the
traditional set of frequentist techniques. Besides, predictions of the future values of the
observable models and their uncertainties can be made, helping to optimally schedule
upcoming observations.
The posterior is estimated from parameter samples obtained by the Markov chain
Monte Carlo (MCMC) technique,2 which has been implemented in a computer program
called Base. It allows a multidimensional parameter space to be explored, treating astrometric
(AM) and radial-velocity (RV) data in a joint analysis. On user request, data may be
automatically arranged in groups of chronologically successive measurements, a procedure
that may be useful when the time spacing of the data is closer than the lowest orbital
period P presumed a priori, but should be applied with caution.
When prior knowledge is not explicitly provided by the user, Base substitutes
relatively uninformative priors. However, prior knowledge may also be
specified in several forms, including prior-density samples and fixed parameter values.
Base can model binary stars as well as (multi-)planet systems. In the latter case, the
tool allows the Bayes factors of competing models to be estimated, thus assessing the most
likely number of planets, including zero. This constitutes a Bayesian type of exoplanet
detection. For parameter estimation, Base estimates marginal posterior densities over one
or two parameters and also provides a set of numerical summaries for each parameter as
well as the correlations between them. Furthermore, in periodogram mode, the marginal
posterior of the orbital frequency f = P^{-1} is calculated in a refined manner so as to
make all of the often very narrow peaks visible. Base incorporates kernel density estimation,
providing smooth and differentiable density estimates based on a finite number of samples.
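Kernel density estimation from a finite set of samples can be sketched as follows. The Gaussian kernel and the normal-reference rule of thumb for the bandwidth, h = 1.06 s n^(-1/5) with sample standard deviation s, are assumptions chosen for this illustration; the kernel and bandwidth selection actually used in Base are not specified here, and the random draws merely stand in for MCMC samples.

```python
import math
import random
import statistics

def gaussian_kde(samples, bandwidth=None):
    """Return a smooth density estimate built from Gaussian kernels."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed choice for this sketch).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)
    return density

rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # stand-in for MCMC samples
pdf = gaussian_kde(samples)

# Check normalisation with a simple Riemann sum over [-6, 6].
dx = 0.01
integral = sum(pdf(-6.0 + i * dx) for i in range(1201)) * dx
```

Because each kernel is smooth, the resulting estimate is differentiable everywhere and integrates to one, unlike a histogram of the same samples.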
In this thesis, we have provided an overview of the modes of using Base, its functionality
as well as its structural components in terms of its program flow, central algorithms, and
the organisation of its source code.
We have also applied Base to publicly available AM and RV data of the well-known
binary star Mizar A. We have detailed the estimation of the missing RV uncertainties,
followed by the application of Base in a set of successive passes. After obtaining first
constraints on several RV parameters, all data were combined to obtain the most probable
range of orbital frequency in periodogram mode, based on a wide prior range. In the next
pass, a degeneracy between two other orbital parameters was resolved before obtaining final
results for all orbital parameters corresponding to a particular orbit and its uncertainty. The
determined parameter values and uncertainties are compatible with previously published
results and constitute reliable knowledge on the orbital characteristics of Mizar A.
2
Besides MCMC, parallel tempering is also implemented in Base in order to facilitate the exploration
of the complete parameter space without becoming “trapped” in small regions of very high probability.
Moreover, convergence of the collected samples to the posterior is supervised by the multi-PT procedure.
Finally, a well-known putative exoplanet host star has been examined: ε Eridani. While
first confirmed hints at a planetary companion to this relatively young and nearby star
were already given almost twenty years ago, and although ε Eridani has been well studied,
the question of whether it is in fact part of a planetary system has still not been answered.
Direct imaging of the companion has been unsuccessful thus far, and due to ε Eridani’s
strong magnetic activity, the abundant RV data from various telescopes have not allowed
definite conclusions to be drawn on a planet’s presence either, let alone its orbit.
We have analysed eight separate, publicly available sets of RV data and a set of
Hipparcos intermediate-astrometry abscissa residuals of ε Eridani. We have
assessed the periodicities present in them, as well as in the combination of all data. The
periods found in the literature could not be confirmed with all data sets combined, whereas
similar periods were found in some of the individual data sets. From the probabilities of the
frequencies found and their correlation with potentially quality-related properties of the data sets,
we have concluded that a signal with period P ≈ 2500 d may really be present but
hidden by the strong disturbance caused by the aliasing and dominance of one of the
data sets. Due to numerical issues, the important Bayes factor could only be calculated
after grouping the data to within time spans of 2 h. Although the determined Bayes factor
strongly favours the no-planet hypothesis, it is to be interpreted with caution not only
because of the grouping applied, but also since its estimation may have been affected by
the data imperfections mentioned above. The latter circumstance could also explain why
other authors, particularly including Hatzes et al. (2000), who first declared the planet
hypothesis the most likely of all, did detect a consistent periodicity: all previous analyses
were based on different data.
An improved understanding of ε Eridani’s strong activity, which may indeed be aided
by the availability of closely spaced data, will help to better characterise the effects of the
activity-induced jitter on the observed data. While such an activity analysis was beyond
the scope of this work, it is recommended for future research. It appears that,
guided by a deepened understanding of activity- and sampling-related effects influencing
the data, definite conclusions may be drawn on the reality of a perhaps minute planetary
signal caused by a companion to ε Eridani.
Bibliography
Adamson, A., C. Aspin, C. Davis, and T. Fujiyoshi (Eds.) (2005, December).
Astronomical Polarimetry: Current Status and
Future Directions, Volume 343 of Astronomical Society of the Pacific Conference Series.
Alibert, Y., C. Mordasini, and W. Benz (2011, February). Extrasolar planet population
synthesis. III. Formation of planets around stars of different masses. A&A 526, A63.
Anglada-Escudé, G., A. P. Boss, A. J. Weinberger, I. B. Thompson, R. P. Butler, S. S.
Vogt, and E. J. Rivera (2012, February). Astrometry and Radial Velocities of the Planet
Host M Dwarf GJ 317: New Trigonometric Distance, Metallicity, and Upper Limit to
the Mass of GJ 317b. ApJ 746, 37.
Anglada-Escudé, G. and R. P. Butler (2012, June). The HARPS-TERRA Project. I. Description of the Algorithms, Performance, and New Measurements on a Few Remarkable
Stars Observed by HARPS. ApJS 200, 15.
Armstrong, J. T., D. Mozurkewich, L. J. Rickard, D. J. Hutter, J. A. Benson, P. F. Bowers,
N. M. Elias, II, C. A. Hummel, K. J. Johnston, D. F. Buscher, J. H. Clark, III, L. Ha,
L. Ling, N. M. White, and R. S. Simon (1998, March). The Navy Prototype Optical
Interferometer. ApJ 496, 550–+.
Armstrong, J. T., D. Mozurkewich, M. Vivekanand, R. S. Simon, C. S. Denison, K. J.
Johnston, X. Pan, M. Shao, and M. M. Colavita (1992, July). The orbit of Alpha Equulei
measured with long-baseline optical interferometry - Component masses, spectral types,
and evolutionary state. AJ 104, 241–252.
Aumann, H. H. (1985, October). IRAS observations of matter around nearby stars.
PASP 97, 885–891.
Baliunas, S. L., R. A. Donahue, W. H. Soon, J. H. Horne, J. Frazer, L. Woodard-Eklund,
M. Bradford, L. M. Rao, O. C. Wilson, Q. Zhang, W. Bennett, J. Briggs, S. M. Carroll,
D. K. Duncan, D. Figueroa, H. H. Lanning, T. Misch, J. Mueller, R. W. Noyes, D. Poppe,
A. C. Porter, C. R. Robinson, J. Russell, J. C. Shelton, T. Soyumer, A. H. Vaughan,
and J. H. Whitney (1995, January). Chromospheric variations in main-sequence stars.
ApJ 438, 269–287.
Bayes, M. and M. Price (1763, January). An Essay towards Solving a Problem in the
Doctrine of Chances. By the Late Rev. Mr. Bayes, F. R. S. Communicated by Mr. Price,
in a Letter to John Canton, A. M. F. R. S. Philosophical Transactions 53, 370–418.
Benedict, G. F., B. E. McArthur, G. Gatewood, E. Nelan, W. D. Cochran, A. Hatzes,
M. Endl, R. Wittenmyer, S. L. Baliunas, G. A. H. Walker, S. Yang, M. Kürster, S. Els,
and D. B. Paulson (2006, November). The Extrasolar Planet ε Eridani b: Orbit and
Mass. AJ 132, 2206–2218.
Bond, G. (1857, June). Photographical Experiments on the Positions of Stars. MNRAS 17,
230–+.
Boneh, A. and A. Golan (1979). Constraints’ redundancy and feasible region boundedness
by random feasible point generator (rfpg). In Third European Congress on Operations
Research, EURO III, Amsterdam.
Boschi, R., V. Lucarini, and S. Pascale (2012, July). Bistability of the climate around the
habitable zone: a thermodynamic investigation. ArXiv e-prints.
Brooks, S. P. and A. Gelman (1998). General methods for monitoring convergence of
iterative simulations. Journal of Computational and Graphical Statistics 7, 434–455.
Butler, R. P., J. T. Wright, G. W. Marcy, D. A. Fischer, S. S. Vogt, C. G. Tinney, H. R. A.
Jones, B. D. Carter, J. A. Johnson, C. McCarthy, and A. J. Penny (2006, July). Catalog
of Nearby Exoplanets. ApJ 646, 505–522.
Campbell, B., G. A. H. Walker, and S. Yang (1988, August). A search for substellar
companions to solar-type stars. ApJ 331, 902–921.
Charbonneau, D., T. M. Brown, D. W. Latham, and M. Mayor (2000, January). Detection
of Planetary Transits Across a Sun-like Star. ApJ 529, L45–L48.
Chubak, C., G. Marcy, D. A. Fischer, A. W. Howard, H. Isaacson, J. A. Johnson, and
J. T. Wright (2012, July). Precise Radial Velocities of 2046 Nearby FGKM Stars and
131 Standards. ArXiv e-prints.
Cumming, A. (2004, November). Detectability of extrasolar planets in radial velocity
surveys. MNRAS 354, 1165–1176.
Cumming, A., G. W. Marcy, and R. P. Butler (1999, December). The Lick Planet Search:
Detectability and Mass Thresholds. ApJ 526, 890–915.
Dawson, R. I. and D. C. Fabrycky (2010, October). Radial Velocity Planets De-aliased: A
New, Short Period for Super-Earth 55 Cnc e. ApJ 722, 937–953.
Deeg, H. J., J. A. Belmonte, and A. Aparicio (Eds.) (2007, October). Extrasolar Planets.
Cambridge University Press.
Deeming, T. J. (1975, August). Fourier Analysis with Unequally-Spaced Data. Ap&SS 36,
137–158.
Delplancke, F. (2008, June). The PRIMA facility phase-referenced imaging and micro-arcsecond astrometry. New A Rev. 52, 199–207.
Delplancke, F., S. A. Leveque, P. Kervella, A. Glindemann, and L. D’Arcio (2000, July).
Phase-referenced imaging and micro-arcsecond astrometry with the VLTI. In P. Léna &
A. Quirrenbach (Ed.), Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Volume 4006 of Presented at the Society of Photo-Optical Instrumentation
Engineers (SPIE) Conference, pp. 365–376.
Dravins, D. (1985). Stellar lineshifts induced by photospheric convection. In A. G. D.
Philip and D. W. Latham (Eds.), Stellar Radial Velocities, pp. 311–320.
Efron, B. and R. Tibshirani (1993). An introduction to the bootstrap. Monographs on
statistics and applied probability. Chapman & Hall.
Endl, M., M. Kürster, S. Els, A. P. Hatzes, W. D. Cochran, K. Dennerl, and S. Döbereiner
(2002, September). The planet search program at the ESO Coudé Echelle spectrometer.
III. The complete Long Camera survey results. A&A 392, 671–690.
Ferraz-Mello, S. (1981, April). Estimation of Periods from Unequally Spaced Observations.
AJ 86, 619.
Ford, E. B. (2004, June). Quantifying the Uncertainty in the Orbits of Extrasolar Planets
with Markov Chain Monte Carlo. In S. S. Holt and D. Deming (Eds.), The Search for
Other Worlds, Volume 713 of American Institute of Physics Conference Series, pp. 27–30.
Ford, E. B. (2006, May). Improving the Efficiency of Markov Chain Monte Carlo for
Analyzing the Orbits of Extrasolar Planets. ApJ 642, 505–522.
Ford, E. B. (2008, March). Adaptive Scheduling Algorithms for Planet Searches. AJ 135,
1008–1020.
Foreman-Mackey, D., D. W. Hogg, D. Lang, and J. Goodman (2012, February). emcee:
The MCMC Hammer. ArXiv e-prints.
Foster, G. (1995, April). The cleanest Fourier spectrum. AJ 109, 1889–1902.
Free Software Foundation (2011a). GFortran. http://gcc.gnu.org/fortran/.
Free Software Foundation (2011b). GOMP. http://gcc.gnu.org/projects/gomp/.
Gatewood, G. (2000, October). The Actual Mass of the Object Orbiting Epsilon Eridani.
In AAS/Division for Planetary Sciences Meeting Abstracts #32, Volume 32 of Bulletin
of the American Astronomical Society, pp. 1051.
Gatewood, G., L. Breakiron, R. Goebel, S. Kipp, J. Russell, and J. Stein (1980, February).
On the astrometric detection of neighboring planetary systems. II. Icarus 41, 205–231.
Gelman, A. and D. B. Rubin (1992). Inference from iterative simulation using multiple
sequences. Statistical Science 7 (4), pp. 457–472.
Geman, S. and D. Geman (1984, nov.). Stochastic relaxation, gibbs distributions, and
the bayesian restoration of images. Pattern Analysis and Machine Intelligence, IEEE
Transactions on PAMI-6 (6), 721 –741.
Gilks, W. R., S. Richardson, and D. J. Spiegelhalter (1996). Markov Chain Monte Carlo
in Practice (first ed.). London: Chapman & Hall.
Gillessen, S., F. Eisenhauer, G. Perrin, W. Brandner, C. Straubmeier, K. Perraut,
A. Amorim, M. Schöller, C. Araujo-Hauck, H. Bartko, H. Baumeister, J. Berger, P. Carvas, F. Cassaing, F. Chapron, E. Choquet, Y. Clenet, C. Collin, A. Eckart, P. Fedou,
S. Fischer, E. Gendron, R. Genzel, P. Gitton, F. Gonte, A. Gräter, P. Haguenauer,
M. Haug, X. Haubois, T. Henning, S. Hippler, R. Hofmann, L. Jocou, S. Kellner,
P. Kervella, R. Klein, N. Kudryavtseva, S. Lacour, V. Lapeyrere, W. Laun, P. Lena,
R. Lenzen, J. Lima, D. Moratschke, D. Moch, T. Moulin, V. Naranjo, U. Neumann,
A. Nolot, T. Paumard, O. Pfuhl, S. Rabien, J. Ramos, J. M. Rees, R. Rohloff, D. Rouan,
G. Rousset, A. Sevin, M. Thiel, K. Wagner, M. Wiest, S. Yazici, and D. Ziegler (2010,
July). GRAVITY: a four-telescope beam combiner instrument for the VLTI. In Society
of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Volume 7734 of
Presented at the Society of Photo-Optical Instrumentation Engineers (SPIE) Conference.
Gould, A. (2009, March). Recent Developments in Gravitational Microlensing. In
K. Z. Stanek (Ed.), Astronomical Society of the Pacific Conference Series, Volume
403 of Astronomical Society of the Pacific Conference Series, pp. 86–+.
Gray, D. F. and S. L. Baliunas (1995, March). Magnetic activity variations of epsilon
Eridani. ApJ 441, 436–442.
Gray, R. O., C. J. Corbally, R. F. Garrison, M. T. McFadden, E. J. Bubar, C. E. McGahee,
A. A. O’Donoghue, and E. R. Knox (2006, July). Contributions to the Nearby Stars
(NStars) Project: Spectroscopy of Stars Earlier than M0 within 40 pc-The Southern
Sample. AJ 132, 161–170.
Greaves, J. S., W. S. Holland, G. Moriarty-Schieven, T. Jenness, W. R. F. Dent, B. Zuckerman, C. McCarthy, R. A. Webb, H. M. Butner, W. K. Gear, and H. J. Walker (1998,
October). A Dust Ring around epsilon Eridani: Analog to the Young Solar System.
ApJ 506, L133–L137.
Green, R. (1985). Spherical Astronomy. Cambridge University Press.
Gregory, P. C. (2005a, October). A Bayesian Analysis of Extrasolar Planet Data for HD
73526. ApJ 631, 1198–1214.
Gregory, P. C. (2005b). Bayesian Logical Data Analysis for the Physical Sciences: A
Comparative Approach with ‘Mathematica’ Support. Cambridge: Cambridge University
Press.
Gregory, P. C. (2011, January). Bayesian exoplanet tests of a new method for MCMC
sampling in highly correlated model parameter spaces. MNRAS 410, 94–110.
Hastings, W. K. (1970). Monte carlo sampling methods using markov chains and their
applications. Biometrika 57, 97–109.
Hatzes, A. P., W. D. Cochran, B. McArthur, S. L. Baliunas, G. A. H. Walker, B. Campbell,
A. W. Irwin, S. Yang, M. Kürster, M. Endl, S. Els, R. P. Butler, and G. W. Marcy (2000,
December). Evidence for a Long-Period Planet Orbiting ε Eridani. ApJ 544, L145–L148.
Hoffleit, D. and C. Jaschek (1982). The Bright Star Catalogue. Yale University Observatory.
Holman, M. J. and N. W. Murray (2005, February). The Use of Transit Timing to Detect
Terrestrial-Mass Extrasolar Planets. Science 307, 1288–1291.
Hummel, C. A., J. T. Armstrong, D. F. Buscher, D. Mozurkewich, A. Quirrenbach, and
M. Vivekanand (1995, July). Orbits of Small Angular Scale Binaries Resolved with the
Mark III Interferometer. AJ 110, 376–+.
Hummel, C. A., D. Mozurkewich, J. T. Armstrong, A. R. Hajian, N. M. Elias, II, and D. J.
Hutter (1998, November). Navy Prototype Optical Interferometer Observations of the
Double Stars Mizar A and Matar. AJ 116, 2536–2548.
Janson, M., S. Reffert, W. Brandner, T. Henning, R. Lenzen, and S. Hippler (2008,
September). A comprehensive examination of the ε Eridani system. Verification of a
4 micron narrow-band high-contrast imaging approach for planet searches. A&A 488,
771–780.
Kapur, J. (1989). Maximum-Entropy Models in Science and Engineering. Wiley.
Kass, R. and A. Raftery (1995). Bayes factors. Journal of the American Statistical
Association 90, 773–795.
Katzgraber, H. G., S. Trebst, D. A. Huse, and M. Troyer (2006). Feedback-optimized parallel
tempering monte carlo. Journal of Statistical Mechanics: Theory and Experiment 2006.
Koch, D. G., W. J. Borucki, G. Basri, N. M. Batalha, T. M. Brown, D. Caldwell,
J. Christensen-Dalsgaard, W. D. Cochran, E. DeVore, E. W. Dunham, T. N. Gautier, III, J. C. Geary, R. L. Gilliland, A. Gould, J. Jenkins, Y. Kondo, D. W. Latham,
J. J. Lissauer, G. Marcy, D. Monet, D. Sasselov, A. Boss, D. Brownlee, J. Caldwell, A. K.
Dupree, S. B. Howell, H. Kjeldsen, S. Meibom, D. Morrison, T. Owen, H. Reitsema,
J. Tarter, S. T. Bryson, J. L. Dotson, P. Gazis, M. R. Haas, J. Kolodziejczak, J. F. Rowe,
J. E. Van Cleve, C. Allen, H. Chandrasekaran, B. D. Clarke, J. Li, E. V. Quintana,
P. Tenenbaum, J. D. Twicken, and H. Wu (2010, April). Kepler Mission Design, Realized
Photometric Performance, and Early Science. ApJ 713, L79–L86.
Kuhn, J. R., D. Potter, and B. Parise (2001, June). Imaging Polarimetric Observations of
a New Circumstellar Disk System. ApJ 553, L189–L191.
Lange, K. L., R. J. A. Little, and J. M. G. Taylor (1989). Robust statistical modeling using
the t distribution. Journal of the American Statistical Association 84 (408), pp. 881–896.
Launhardt, R., D. Queloz, T. Henning, A. Quirrenbach, F. Delplancke, L. Andolfato,
H. Baumeister, P. Bizenberger, H. Bleuler, B. Chazelas, F. Dérie, L. Di Lieto, T. P.
Duc, O. Duvanel, N. M. Elias, II, M. Fluery, R. Geisler, D. Gillet, U. Graser, F. Koch,
R. Köhler, C. Maire, D. Mégevand, Y. Michellod, J. Moresmau, A. Müller, P. Müllhaupt,
V. Naranjo, F. Pepe, S. Reffert, L. Sache, D. Ségransan, Y. Salvadé, T. Schulze-Hartung,
J. Setiawan, G. Simond, D. Sosnowska, I. Stilz, B. Tubbs, K. Wagner, L. Weber, P. Weise,
and L. Zago (2008, July). The ESPRI project: astrometric exoplanet search with PRIMA.
In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Volume
7013 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series.
Lecar, M., M. Podolak, D. Sasselov, and E. Chiang (2006, April). On the Location of the
Snow Line in a Protoplanetary Disk. ApJ 640, 1115–1118.
Levine, M., R. Soummer, J. Arenberg, R. Belikov, P. Bierden, A. Boccaletti, R. Brown,
A. Burrows, C. Burrows, E. Cady, W. Cash, M. Clampin, C. Cossapakis, I. Crossfield, L. Dewell, R. Egerman, H. Fergusson, J. Ge, A. Give’On, O. Guyon, S. Heap,
T. Hyde, B. Jaroux, J. Jasdin, J. Kasting, M. Kenworthy, S. Kilston, A. Klavins, J. Krist,
M. Kuchner, B. Lane, C. Lillie, R. Lyon, J. Lloyd, A. Lo, P. J. Lowrance, P. J. Macintosh,
S. McCully, M. Marley, C. Marois, G. Matthews, D. Mawet, B. Mazin, G. Mosier,
C. Noecker, L. Pueyo, B. R. Oppenheimer, N. Pedreiro, M. Postman, A. Roberge,
S. Ridgeway, Schneider, J. Schneider, G. Serabyn, S. Shaklan, M. Shao, A. Sivaramakrishman, D. Spergel, K. Stapelfeldt, M. Tamura, D. Tenerelli, V. Tolls, W. Traub,
J. Trauger, R. J. Vanderbei, and J. Wynn (2009). Overview of Technologies for Direct
Optical Imaging of Exoplanets. In astro2010: The Astronomy and Astrophysics Decadal
Survey, Volume 2010 of Astronomy, pp. 37.
Lindegren, L. and D. Dravins (2003, April). The fundamental definition of “radial velocity”.
A&A 401, 1185–1201.
Lomb, N. R. (1976, February). Least-squares frequency analysis of unequally spaced data.
Ap&SS 39, 447–462.
Loredo, T. J. (2004, April). Bayesian Adaptive Exploration. In G. J. Erickson and Y. Zhai
(Eds.), Bayesian Inference and Maximum Entropy Methods in Science and Engineering,
Volume 707 of American Institute of Physics Conference Series, pp. 330–346.
Lovis, C. and D. Fischer (2010). Radial Velocity Techniques for Exoplanets, pp. 27–53.
University of Arizona Press.
Lyot, B. (1932). Étude de la couronne solaire en dehors des éclipses. Avec 16 figures dans
le texte. ZAp 5, 73–+.
Mahalanobis, P. C. (1936, April). On the generalised distance in statistics. In Proceedings
National Institute of Science, India, Volume 2, pp. 49–55.
Mamajek, E. E., M. A. Kenworthy, P. M. Hinz, and M. R. Meyer (2010, March). Discovery
of a Faint Companion to Alcor Using MMT/AO 5 µm Imaging. AJ 139, 919–925.
Mao, S. and B. Paczynski (1991, June). Gravitational microlensing by double stars and
planetary systems. ApJ 374, L37–L40.
Marois, C., D. Lafrenière, R. Doyon, B. Macintosh, and D. Nadeau (2006, April). Angular
Differential Imaging: A Powerful High-Contrast Imaging Technique. ApJ 641, 556–564.
Mayor, M. and D. Queloz (1995, November). A Jupiter-Mass Companion to a Solar-Type
Star. Nature 378, 355–+.
McArthur, B. E., G. F. Benedict, R. Barnes, E. Martioli, S. Korzennik, E. Nelan, and
R. P. Butler (2010, June). New Observational Constraints on the υ Andromedae System
with Data from the Hubble Space Telescope and Hobby-Eberly Telescope. ApJ 715,
1203–1220.
McMillan, R. S., T. L. Moore, M. L. Perry, and P. H. Smith (1996, September). Correlation
of the radial velocity of Epsilon Eridani with its magnetic cycle. In Bulletin of the
American Astronomical Society, Volume 28 of Bulletin of the American Astronomical
Society, pp. 1111.
Meschiari, S., A. S. Wolf, E. Rivera, G. Laughlin, S. Vogt, and P. Butler (2009, September).
Systemic: A Testbed for Characterizing the Detection of Extrasolar Planets. I. The
Systemic Console Package. PASP 121, 1016–1027.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller (1953,
June). Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 21,
1087–1092.
Mordasini, C., Y. Alibert, W. Benz, and D. Naef (2009, July). Extrasolar planet population
synthesis. II. Statistical comparison with observations. A&A 501, 1161–1184.
Moulton, F. (1984). An Introduction to Celestial Mechanics. Dover Books on Astronomy.
Dover Publications.
Nascimbeni, V., G. Piotto, L. R. Bedin, and M. Damasso (2011, March). TASTE: The
Asiago Search for Transit timing variations of Exoplanets. I. Overview and improved
parameters for HAT-P-3b and HAT-P-14b. A&A 527, A85.
Nelson, A. F. and J. R. P. Angel (1998, June). The Range of Masses and Periods Explored
by Radial Velocity Searches for Planetary Companions. ApJ 500, 940.
Niepraschk, R. and H. Voß (2001). The package ps4pdf: from postscript to pdf. TUGboat 22 (4), 290 – 292.
OpenMP Architecture Review Board (2008). OpenMP. http://openmp.org/.
Papaloizou, J. C. B. and C. Terquem (2006, January). Planet formation and migration.
Reports on Progress in Physics 69, 119–180.
Perryman, M. (2011, June). The Exoplanet Handbook.
Perryman, M. A. C. (2000, August). Extra-solar planets. Reports on Progress in Physics 63,
1209–1272.
Perryman, M. A. C., L. Lindegren, J. Kovalevsky, E. Hoeg, U. Bastian, P. L. Bernacca,
M. Crézé, F. Donati, M. Grenon, F. van Leeuwen, H. van der Marel, F. Mignard, C. A.
Murray, R. S. Le Poole, H. Schrijver, C. Turon, F. Arenou, M. Froeschlé, and C. S.
Petersen (1997, July). The HIPPARCOS Catalogue. A&A 323, L49–L52.
Pickering, E. C. (1890, February). On the spectrum of zeta Ursae Majoris. The Observatory 13, 80–81.
Prevot, L. (1961). Vitesses radiales et éléments orbitaux de ζ1 Ursae Majoris. Journal des
Observateurs 44, 83–+.
Protassov, R., D. A. van Dyk, A. Connors, V. L. Kashyap, and A. Siemiginowska (2002,
May). Statistics, Handle with Care: Detecting Multiple Model Components with the
Likelihood Ratio Test. ApJ 571, 545–559.
Quillen, A. C. and S. Thorndike (2002, October). Structure in the ε Eridani Dusty
Disk Caused by Mean Motion Resonances with a 0.3 Eccentricity Planet at Periastron.
ApJ 578, L149–L152.
Reegen, P. (2007, June). SigSpec. I. Frequency- and phase-resolved significance in Fourier
space. A&A 467, 1353–1371.
Reegen, P. (2011, December). SigSpec User’s Manual. Communications in Asteroseismology 163, 3.
Reffert, S. (2009, November). Astrometric measurement techniques. New A Rev. 53,
329–335.
Reffert, S. and A. Quirrenbach (2011, March). Mass constraints on substellar companion
candidates from the re-reduced Hipparcos intermediate astrometric data: nine confirmed
planets and two confirmed brown dwarfs. A&A 527, A140.
Roberts, D. H., J. Lehar, and J. W. Dreher (1987, April). Time Series Analysis with Clean
- Part One - Derivation of a Spectrum. AJ 93, 968.
Roberts, G. O. (1996). Markov chain concepts related to sampling algorithms. In W. R.
Gilks, S. Richardson, and D. J. Spiegelhalter (Eds.), Markov Chain Monte Carlo in
Practice (first ed.)., pp. 45–57. London: Chapman & Hall.
Rueedi, I., S. K. Solanki, G. Mathys, and S. H. Saar (1997, February). Magnetic field
measurements on moderately active cool dwarfs. A&A 318, 429–442.
Scargle, J. D. (1982, December). Studies in astronomical time series analysis. II - Statistical
aspects of spectral analysis of unevenly spaced data. ApJ 263, 835–853.
Schneider, J. (2012). The Extrasolar Planets Encyclopædia. http://exoplanet.eu/. Accessed
on November 10, 2012.
Schneider, J., Dedieu, C., Le Sidaner, P., Savalle, R., and Zolotukhin, I. (2011). Defining
and cataloging exoplanets: the exoplanet.eu database. A&A 532, A79.
Schrijver, C. J., J. Cote, C. Zwaan, and S. H. Saar (1989, February). Relations between
the photospheric magnetic field and the emission from the outer atmospheres of cool
stars. I - The solar CA II K line core emission. ApJ 337, 964–976.
Schulze-Hartung, T. (2008). Bayesian astrometric and spectroscopic exoplanet detection
and characterization software. Diploma thesis, University of Heidelberg.
Schulze-Hartung, T., R. Launhardt, and T. Henning (2012, September). Bayesian analysis
of exoplanet and binary orbits. Demonstrated using astrometric and radial-velocity data
of Mizar A. A&A 545, A79.
Seager, S. (2003, March). The search for extrasolar Earth-like planets. Earth and Planetary
Science Letters 208, 113–124.
Seager, S. (2008, March). Exoplanet Transit Spectroscopy and Photometry.
Space Sci. Rev. 135, 345–354.
Shao, M., M. M. Colavita, B. E. Hines, D. H. Staelin, and D. J. Hutter (1988, March).
The Mark III stellar interferometer. A&A 193, 357–371.
Silverman, B. (1986). Density estimation for statistics and data analysis. Monographs on
statistics and applied probability. Chapman and Hall.
Sivia, D. S. (2006). Data Analysis—A Bayesian Tutorial (second ed.). Oxford: Oxford
University Press.
Smith, R. L. (1980). A monte carlo procedure for the generation of feasible solutions to
mathematical programming problems. In Bulletin of the TIMS/ORSA Joint National
Meeting, Washington, DC, pp. 101.
Smith, W. H. (1987, December). Spectral differential imaging detection of planets about
nearby stars. PASP 99, 1344–1353.
Sozzetti, A. (2005, October). Astrometric Methods and Instrumentation to Identify and
Characterize Extrasolar Planets: A Review. PASP 117, 1021–1048.
Stumpff, K. (1973). Himmelsmechanik, Volume 1. VEB Deutscher Verlag der Wissenschaften.
Thiele, T. N. (1883, January). Neue Methode zur Berechung von Doppelsternbahnen.
Astronomische Nachrichten 104, 245–+.
Tolbert, C. R. (1964, May). A UBV Study of 94 Wide Visual Binaries. ApJ 139, 1105–+.
Toner, C. G. and D. F. Gray (1988, November). The starpatch on the G8 dwarf XI Bootis
A. ApJ 334, 1008–1020.
Torres, G., J. Andersen, and A. Giménez (2010, February). Accurate masses and radii of
normal stars: modern results and applications. A&A Rev. 18, 67–126.
Tuomi, M., S. Kotiranta, and M. Kaasalainen (2009, February). The complementarity of
astrometric and radial velocity exoplanet observations. Determining exoplanet mass with
astrometric snapshots. A&A 494, 769–774.
van de Kamp, P. (1977). Perspective secular changes in stellar proper motion, radial
velocity and parallax. Vistas in Astronomy 21, 289–310.
van Leeuwen, F. (Ed.) (2007). Hipparcos, the New Reduction of the Raw Data, Volume 350
of Astrophysics and Space Science Library. Springer Verlag.
Vigan, A., C. Moutou, M. Langlois, F. Allard, A. Boccaletti, M. Carbillet, D. Mouillet, and
I. Smith (2010). Photometric characterization of exoplanets using angular and spectral
differential imaging. Monthly Notices of the Royal Astronomical Society 407 (1), 71–82.
Vogt, S. S., R. P. Butler, G. W. Marcy, D. A. Fischer, G. W. Henry, G. Laughlin, J. T.
Wright, and J. A. Johnson (2005, October). Five New Multicomponent Planetary
Systems. ApJ 632, 638–658.
Walker, G. A. H., A. R. Walker, A. W. Irwin, A. M. Larson, S. L. S. Yang, and D. C.
Richardson (1995, August). A search for Jupiter-mass companions to nearby stars.
Icarus 116, 359–375.
Weinberg, M. D. (2012). Computing the Bayes Factor from a Markov Chain Monte Carlo
Simulation of the Posterior Distribution. Bayesian Analysis 7 (3), 737 – 770.
Weinberg, M. D. and J. E. B. Moss (2011, August). The umass bayesian inference engine.
http://www.astro.umass.edu/BIE/manual.pdf.
Wilson, O. C. (1978, December). Chromospheric variations in main-sequence stars. ApJ 226,
379–396.
Wolpert, R. L. (2002, August). Stable limit laws for marginal probabilities from mcmc
streams: Acceleration of convergence. http://ftp.isds.duke.edu/WorkingPapers/0222.pdf.
Wolszczan, A. and D. A. Frail (1992, January). A planetary system around the millisecond
pulsar PSR1257 + 12. Nature 355, 145–147.
Zechmeister, M. (2010, November). Precision Radial Velocity Surveys for Exoplanets.
Ph.D. thesis.
Zechmeister, M. and M. Kürster (2009, March). The generalised Lomb-Scargle periodogram.
A new formalism for the floating-mean and Keplerian periodograms. A&A 496, 577–584.
Zucker, S. and T. Mazeh (2001, November). Analysis of the Hipparcos Observations of the
Extrasolar Planets and the Brown Dwarf Candidates. ApJ 562, 549–557.
Acknowledgements
First and foremost, I wish to thank my primary advisor Prof. Thomas Henning, who
gave me the opportunity to conduct my PhD thesis at the Max Planck Institute for
Astronomy (MPIA). This work would not have been possible without his enduring support,
his patience, and his scientific guidance.
I owe many thanks to my co-advisor Dr. Ralf Launhardt for his constant readiness
to provide both valuable suggestions and constructive criticism. His support has been
indispensable throughout this work.
Prof. Andreas Quirrenbach is gratefully appreciated for his readiness to act as second
co-advisor and referee for my thesis.
The members of my PhD Advisory Committee, Dr. Eva Schinnerer, Dr. Wolfgang
Brandner, Dr. Coryn Bailer-Jones, PD Dr. Hubert Klahr, and Dr. Tom Herbst
are acknowledged for providing helpful input on the scope and progress of this work.
I owe special thanks to Prof. Michael Perryman, who encouraged me in my work and
gave valuable hints on self-management.
I am grateful to Dr. Mathias Zechmeister for sharing essential radial-velocity data,
and for several fruitful discussions on frequentist data analysis. Dr. Johny Setiawan is
thanked for sharing a number of data sets that have been useful in testing Base.
Dr. Sabine Reffert is appreciated for her advice on the treatment of Hipparcos data
and other aspects of astrometry.
Prof. David W. Hogg and Dr. René Andrae are both thanked for many fruitful and
interesting discussions about Bayesian statistics and other aspects of data analysis.
Prof. Edward O. Wiley and Francisco Rica Romero are thanked for patiently testing
Base and for many useful hints and ideas.
Dr. Dading Nugroho, Gabriele Maier, Dr. Natalia Kudryavtseva, and Dr. René
Andrae are appreciated for sharing their time in our MPIA office.
I thank my parents Brigitte Schulze-Hartung and Klaus Schulze-Hartung for their
patience and support during all this time.
I appreciate my good and reliable friends Bernhard Wüste and Dr. Manfred Bohn.
Manfred is thanked for sharing his LaTeX style template.
The deepest gratitude I owe to Swetlana Stresler for being there and giving me strong
support, tolerance and understanding throughout a very demanding phase of my life.