Ruotsalainen
Tampere University of Technology
Vision-Aided Pedestrian Navigation for Challenging GNSS Environments
Citation
Ruotsalainen, L. (2013). Vision-Aided Pedestrian Navigation for Challenging GNSS Environments. (Suomen
geodeettisen laitoksen julkaisuja - Publications of the Finnish Geodetic Institute;151). Suomen geodeettinen
laitos.
Year
2013
Link to publication
TUTCRIS Portal (http://www.tut.fi/tutcris)
SUOMEN GEODEETTISEN LAITOKSEN JULKAISUJA
VERÖFFENTLICHUNGEN DES FINNISCHEN GEODÄTISCHEN INSTITUTES
PUBLICATIONS OF THE FINNISH GEODETIC INSTITUTE
================== N:o 151 ==================
Vision-Aided Pedestrian Navigation for Challenging GNSS Environments
by
Laura Ruotsalainen
Doctoral dissertation for the degree of Doctor of Science in Technology to be
presented with due permission for public examination and debate in Tietotalo
Building, Auditorium TB109, at Tampere University of Technology on the 4th of
November 2013 at 12 noon.
KIRKKONUMMI 2013
ISBN (printed): 978-951-711-302-1
ISBN (pdf): 978-951-711-303-8
ISSN: 0085-6932
Juvenes Print Oy, 2013
Supervising professor
Professor Ruizhi Chen, Texas A&M University Corpus Christi, Conrad Blucher Institute of Surveying and Science, School of Engineering and Computer Science
Thesis co-supervisors
Professor Gérard Lachapelle, University of Calgary, Department of Geomatics Engineering
Professor Jarmo Takala, Tampere University of Technology, Department of Pervasive
Computing
Preliminary examiners
Professor Andreas Wieser, Swiss Federal Institute of Technology Zurich ETH, Institute of Geodesy and Photogrammetry
Professor Jari Hannuksela, University of Oulu, Department of Computer Science and
Engineering
Opponents
D.Sc.(Tech) Susanna Pirttikangas, University of Oulu, Department of Computer Science and Engineering
D.Sc.(Tech) Jari Syrjärinne, Nokia Oyj
ABSTRACT
There is a strong need for an accurate pedestrian navigation system that also functions in GNSS-challenged environments, namely urban areas and indoors, to improve safety and to enhance everyday life. Pedestrian navigation is mainly needed in these environments, which are challenging not only for GNSS but also for other RF positioning systems and for some non-RF systems, such as the magnetometry used for heading, due to the presence of ferrous material. Indoor and urban navigation has been an active research area for years. No individual system at this time can address all the needs set for pedestrian navigation in these environments, but a fused solution of different sensors can provide better accuracy, availability and continuity. Self-contained sensors, namely digital compasses for measuring heading, gyroscopes for heading changes and accelerometers for the user speed, constitute a good option for pedestrian navigation. However, their performance suffers from noise and biases that result in large position errors increasing with time. Such errors can, however, be mitigated using information about the user motion obtained from consecutive images taken by a camera carried by the user, provided that its position and orientation with respect to the user's body are known. The motion of the features in the images may then be transformed into information about the user's motion. Due to its distinctive characteristics, this vision-aiding complements other positioning technologies to provide better pedestrian navigation accuracy and reliability.
This thesis discusses the concepts of a visual gyroscope that provides the relative user heading and a visual odometer that provides the translation of the user between consecutive images. Both methods use a monocular camera carried by the user. The visual gyroscope monitors the motion of virtual features, called vanishing points, arising from parallel straight lines in the scene; the change of their location resolves heading, roll and pitch. The method is applicable to human environments, as the straight lines in the structures enable the vanishing point perception. For the visual odometer, the ambiguous scale arising when the homography between consecutive images is used to observe the translation is resolved using two different methods. First, the scale is computed using a special configuration intended for indoors. Second, the scale is resolved using differenced GNSS carrier phase measurements of the camera in a method aimed at urban environments, where GNSS cannot perform alone because tall buildings block the required line-of-sight to four satellites. The use of visual perception, however, provides position information by exploiting a minimum of two satellites, and therefore the availability of the navigation solution is substantially increased. Both methods are sufficiently tolerant of the challenges of visual perception in indoor and urban environments, namely low lighting and dynamic objects hindering the view. The heading and translation are further integrated with other positioning systems to obtain a navigation solution. The performance of the proposed vision-aided navigation was tested in various indoor and urban canyon environments to demonstrate its effectiveness. These experiments, although of limited duration, show that visual processing efficiently complements other positioning technologies to provide better pedestrian navigation accuracy and reliability.
PREFACE
The research presented in this thesis has been carried out mainly at the Finnish Geodetic Institute (FGI), Department of Navigation and Positioning, during the years 2010-2013. The research also included an eight-month research visit to the Department of Geomatics Engineering, University of Calgary, Canada, in 2012.
I have been privileged to receive guidance from four distinguished professors, to whom I want to express my gratitude. First, I would like to thank my supervisor Prof. Ruizhi Chen for providing the opportunity to conduct this research, and for his guidance and encouragement. Second, I would like to thank my co-supervisor Prof. Gérard Lachapelle for providing me with the valuable opportunity to work and study in his Position, Location and Navigation (PLAN) group at the University of Calgary, for his guidance, and for introducing me to the stunning Canadian Rockies. Third, I would like to thank my other co-supervisor Prof. Jarmo Takala for his guidance in practical issues related to my studies and the dissertation process. Last, but definitely not least, I would like to thank Prof. Heidi Kuusniemi for her endless guidance, help and encouragement during the process.
I would like to express my appreciation to Prof. Andreas Wieser and Prof. Jari
Hannuksela for reviewing the manuscript and providing constructive comments.
I have also been privileged to work with numerous people who are able to realize their passion for science and who therefore make the working environment very pleasant. I would like to thank everyone I have worked with during my scientific career, but especially the colleagues who have contributed to my work and made my everyday work pleasant both at the Finnish Geodetic Institute and at the University of Calgary: Dr. Zahidul Bhuiyan, Dr. Liang Chen, Dr. Yuwei Chen, Dr. Ling Pei, Dr. Jingbin Liu, M.Sc. Robert Guinness, M.Sc. Heli Honkanen, Dr. Jared Bancroft, M.Sc. Anup Dhital and David Garrett, and all the others I have had the pleasure to collaborate with in the Position, Location and Navigation group in Calgary and at the Finnish Geodetic Institute.
My research has been financially supported by the Nokia Foundation awards received in 2011 and 2012 and by Tampere University of Technology's grant for postgraduate exchange, which are gratefully acknowledged.
In addition, I would like to express my gratitude to my parents Marja-Kirsti and Jouko Eliasson and my sister Liisa Eliasson-Tapio for always believing in me and for their support during this process, as always, which has given me the needed self-confidence. I would also like to thank all my relatives, especially my grandmothers, and my family-in-law for their encouragement. I would like to thank my friends for their friendship and for offering me valuable moments of laughter and happiness as well as the sharing of concerns. Finally, my greatest thanks go to my family: my husband Aki, who has fully supported me in this process, as in everything for the last twenty years, mentally as well as in practice, and my two beautiful daughters, Maria and Malla, who fill my every day with joy and happiness, and most of all, give life its meaning.
Helsinki, September 2013
Laura Ruotsalainen
TABLE OF CONTENTS
Abstract
Preface
Table of Contents
List of Figures
List of Tables
Abbreviations
Symbols
1. Introduction
   1.1 Research Objectives
   1.2 Related Work
   1.3 Author's Contribution
   1.4 Thesis Outline
2. Overview of pedestrian navigation
   2.1 Navigation Frames and Attitude
   2.2 Absolute Positioning
      2.2.1 Global Navigation Satellite Systems
      2.2.2 WLAN Positioning
      2.2.3 Other Technologies
   2.3 Relative Positioning
      2.3.1 Inertial Sensors
      2.3.2 Other Self-Contained Sensors
   2.4 Estimation
      2.4.1 Kalman Filter
      2.4.2 Extended Kalman Filter
3. Computer vision methods for navigation
   3.1 Camera, Fundamental and Essential Matrices and Coordinate Frames
   3.2 Feature Extraction
      3.2.1 Filtering
      3.2.2 SIFT Features
      3.2.3 Line Extraction
   3.3 Image Matching
   3.4 Camera Calibration
      3.4.1 Distortion
4. Visual gyroscope
   4.1 Locating the Vanishing Points
   4.2 Attitude of the Camera
   4.3 Error Detection
   4.4 Performance of the Visual Gyroscope
      4.4.1 Theoretical Analysis of Attainable Accuracy
   4.5 Effect of Camera and Setup Characteristics on the Accuracy of the Visual Gyroscope
      4.5.1 Experimental Results
   4.6 Smartphone Application of Visual Gyroscope
   4.7 Visual Gyroscope Implementation Using Probabilistic Hough Transform
5. Visual odometer
   5.1 The Principle of the Visual Odometer
      5.1.1 Measuring the Distance of an Object from the Camera
      5.1.2 Error Detection and Resolving the Unknown Scale for the Visual Odometer
      5.1.3 Degeneracy
      5.1.4 Performance of the Visual Odometer
6. Vision-aided navigation using visual gyroscope and odometer
   6.1 Visual Gyroscope and Odometer Aided Multi-Sensor Positioning
      6.1.1 Kalman Filter Used in Multi-Sensor Positioning
      6.1.2 Test in an Indoor Office Environment
      6.1.3 Test in Office Environment Using an Outdated WLAN Radio Map
   6.2 Stand-Alone Visual System
      6.2.1 Kalman Filter Used in Stand-Alone Visual Positioning
      6.2.2 Test in a Shopping Mall Environment
      6.2.3 Test in an Urban Canyon
   6.3 Visual Gyroscope Aided IMU Positioning
      6.3.1 Kalman Filter Used in Visual Gyroscope Aided IMU Positioning
      6.3.2 Equipment Setup on the Body
      6.3.3 Equipment Setup on the Foot
   6.4 Performance of Visual Gyroscope Implemented Using Probabilistic Hough Transform
7. Vision-aided carrier phase navigation
   7.1 Ambiguity Resolution Using Differenced GNSS Carrier Phase Measurements
      7.1.1 Ambiguous Translation Using the Fundamental Matrix
      7.1.2 Navigation Solution Incorporating the Absolute User Translation
   7.2 Method Verification in a Sub-Urban Environment
   7.3 Vision-Aided GNSS Navigation in an Urban Environment
8. Conclusions
   8.1 Main Results
   8.2 Future Development
Bibliography
LIST OF FIGURES
2.1 User attitude in navigation frame
2.2 Absolute heading error of a digital compass indoors
3.1 Coordinate frames in vision-aiding
3.2 Epipolar geometry
3.3 Hough transform parameters
4.1 Vanishing point
4.2 Vanishing point in an image with roll
4.3 Vanishing point error detection
4.4 Allan deviation of the visual gyroscope
4.5 Visual gyroscope's tolerance of dynamic objects
4.6 An image captured of the same scene with three different cameras
4.7 Experiment setup for testing camera characteristics
4.8 Effect of the field-of-view on the line detection
5.1 Visual odometer configuration
5.2 Matched SIFT features between consecutive images
6.1 Equipment setup for testing the vision-aided multi-sensor positioning system
6.2 Office corridor used for experiments
6.3 Visual odometer speed
6.4 Vision-aided position solution in an office corridor
6.5 Vision-aided position solution in an office corridor with an outdated WLAN radio map
6.6 Challenging environment of Iso Omena shopping centre
6.7 The two-dimensional position solution in the Iso Omena shopping centre
6.8 Challenging environment of an urban canyon
6.9 Position solution in an urban canyon
6.10 Route for experiments on the University of Calgary campus
6.11 Body mounted test equipment
6.12 Standard deviation for different integration schemes
6.13 Attitude error using different integration schemes
6.14 Equipment setup for the foot
6.15 Images from one step cycle period
6.16 RMS position errors obtained for foot-mounted IMU
6.17 Line detection and vanishing point calculations using Probabilistic Hough Transform
6.18 Evaluation of vanishing point detection in an environment suffering from low lighting and non-orthogonal lines
6.19 Correcting IMU errors using a vanishing point obtained using Probabilistic Hough Transform
6.20 Conflict between estimated and detected vanishing points
7.1 Setup for vision-aided carrier phase navigation
7.2 Position solution verification in a sub-urban environment shown in Google Earth
7.3 Calgary downtown
7.4 Number of satellites acquired in an urban canyon
7.5 Position solution in an urban canyon
7.6 Horizontal position errors in an urban canyon
7.7 Position solution in an urban canyon shown in Google Earth
7.8 Position solution using GPS only in an urban canyon shown in Google Earth
LIST OF TABLES
4.1 Statistics for heading change accuracy, all units in degrees
4.2 Effect of roll error on other angle observations
4.3 Parameters for GoPro, Sony and Nokia cameras
4.4 Heading change error statistics
4.5 Roll and pitch error statistics
4.6 Processing time for different algorithms in the visual gyroscope's Nokia N8 Symbian smartphone implementation: photo capture (Symbian), edge and line detection (OpenCV), and vanishing point, heading and tilt computations (C++)
5.1 Statistics of the effect of camera height errors on the visual odometer's speed accuracy, units in m/s
6.1 Positioning error statistics using different systems in an office corridor
6.2 Positioning error statistics using different positioning systems in an office corridor with an outdated WLAN radio map
6.3 Positioning error statistics for visual stand-alone and GPS position solutions
6.4 Attitude errors obtained for body-mounted IMU with different integration methods
6.5 RMS position error obtained for foot-mounted IMU with and without vision-aiding
6.6 Ratio of image points used for computing the Probabilistic Hough Transform to the image points used by the Standard Hough Transform for the images processed in the experiment
7.1 Positioning verification error statistics using vision-aided carrier phase (VA)
7.2 Positioning error statistics using vision-aided carrier phase (VA) and GPS only (GPS)
ABBREVIATIONS
AVUPT   Absolute Visual Attitude Update
BLUE   Best Linear Unbiased Estimate
C/A   Coarse/Acquisition
CCD   Charge Coupled Device
CMOS   Complementary Metal Oxide Semiconductor
COMPASS/Beidou   Chinese Satellite Navigation System
DCM   Direction Cosine Matrix
DOP   Dilution Of Precision
E   East
ECEF   Earth Centered Earth Fixed
EKF   Extended Kalman Filter
ENU   East-North-Up
EXIF   Exchangeable Image File
Galileo   European Satellite Navigation System
GDOP   Geometric Dilution Of Precision
GLONASS   The Russian Positioning System, Global'naya Navigatsionnaya Sputnikovaya Sistema
GNSS   Global Navigation Satellite System
GPS   Global Positioning System
HD   High-definition
HSGPS   High Sensitivity GPS
IEEE   The Institute of Electrical and Electronics Engineers
ION   Institute of Navigation
IMU   Inertial Measurement Unit
INS   Inertial Navigation System
KF   Kalman filter
LCI   Low-coherence Interferometry
LDOP   Line Dilution Of Precision
LOS   Line Of Sight
Max   Maximum
MEMS   Micro-Electro-Mechanical
Min   Minimum
MSP   Multi Sensor Positioning
N   North
PGCP   Pseudo Ground Control Points
PDOP   Position Dilution Of Precision
PPP   Precise Point Positioning
RANSAC   RANdom SAmple Consensus
RF   Radio Frequency
RFID   Radio Frequency Identification
rms   root mean square
RSSI   Received Signal Strength Indication
SHT   Standard Hough Transform
SIFT   Scale Invariant Feature Transform
SLAM   Simultaneous Localization And Mapping
SPAN   Synchronized Position Attitude Navigation
SVD   Singular Value Decomposition
std   standard deviation
ToA   Time of Arrival
TVUPT   Temporal Visual Attitude Update
U   Up
UAV   Unmanned Aerial Vehicle
UKF   Unscented Kalman filter
UTC   Coordinated Universal Time
UERE   User Equivalent Range Error
UWB   Ultra-Wideband
VA   Vision-aided
WiFi   Wireless network, a registered trademark of the Wi-Fi Alliance
WLAN   Wireless Local Area Network
SYMBOLS
α_i   Angle between a line i in an image and the image x-axis
β   Roll
Δt   Time interval
Δx   Vector offset of the user's true position and time bias from the values at the linearization point
δx_k   Perturbation of the state
ε⁻   Error of the a priori state estimate or perturbation of the Euler angles
ε   Error of the a posteriori state estimate, noise in GPS measurements, or vector of errors in GNSS measurements
η_g   Noise in gyroscope or carrier phase measurement
λ   Carrier wavelength or longitude
μ   Mean
∇   Image gradient
ω   Earth turn rate
ω^b_ib   Body (b) angular velocity (turn rate) with respect to the inertial (i) frame
ω̃^b_ib   Gyroscope angular velocity measurement
Ω   Skew symmetric matrix of the angular velocity vector
φ   Pitch or latitude
Φ   State transition matrix
ρ   Pseudorange or the radius of a line in an image in the Hough Transform
ρ̂   Estimated pseudorange computed from the estimated user position
σ   Standard deviation
σ²   Variance
σ²_C(t_A)   Allan variance
θ   Heading (azimuth)
ϕ   Carrier phase
b   Body frame
c   Speed of light
C   Direction cosine matrix or convolution
d   Direction of a line in an image
d_iono   Ionospheric delay
d_tropo   Tropospheric delay
dρ   Ephemeris error
dt   Satellite clock error
D_i   Distance between the starting point of line i and the vanishing point
E   Essential matrix
f   Focal length
f   Specific force
F   Fundamental matrix
g   Mass gravitation
G   User-satellite geometry matrix, convolution kernel or g-sensitivity coefficient matrix
h   Height
H   Height of an image in pixels
H   Design matrix or image homography
i   Inertial frame
I   Image matrix
k   Distortion value
K   Kalman gain or camera calibration matrix
L1   GPS signal carrier frequency at 1575.42 MHz
M   Image gradient magnitude matrix
N   Gaussian probability distribution or integer number of carrier waves
N_e   Inertia tensor
O   Image gradient orientation matrix
p   Pressure
P   State error covariance or camera matrix
q̃   Spectral density value
Q   Process noise covariance
r   Geometric range
r_d   Radial distance of the normalized distorted image point
r   User position vector or least-squares residual vector
R   Measurement noise covariance or camera rotation matrix
R_g   Universal gas constant
R_WLAN   RSSI observation vector
s   Ambiguous scale in translation observed from consecutive images
s   Satellite coordinate vector
S   User speed
S   Scale factor and non-orthogonality matrix
t   Time
t_u   Receiver clock error
t   User translation vector
T_i   Satellite i's position vector
T_rcvr   Receiver position vector
T_0   Temperature at sea level
T_L   Temperature lapse rate
u   Principal point's x-coordinate
u   User coordinate vector or the unit vector from user to satellite
u_GC   Satellite and user geometry change
v   Principal point's y-coordinate
v   Kalman filter's innovation vector, user velocity vector or a vanishing point matrix
v_k   Process noise
v_fov   Vertical field-of-view of a camera
v_x   Vanishing point in the x-axis direction in homogeneous coordinates
v_y   Vanishing point in the y-axis direction in homogeneous coordinates
v_z   Vanishing point in the z-axis direction in homogeneous coordinates
w_i   Standardized innovation of the ith element of the innovation vector
w_k   Measurement noise
x_k   State vector
x̂⁻_k   A priori state estimate
x̂_k   A posteriori state estimate
x̄_k   Nominal value of the state
x_u   User (receiver) x-coordinate
x   Feature coordinates in the image reference frame
X   Object coordinates in the world reference frame or user position East component
Ẋ   Time derivative of X
y_u   User (receiver) y-coordinate
ỹ(t_A)_k   Average value of bin k in the Allan variance
Y   User position North component
Ẏ   Time derivative of Y
z_k   Measurement vector
z_u   User (receiver) z-coordinate
Z   Depth of an object, i.e. the Z-coordinate in the world reference frame
1. INTRODUCTION
In addition to commercial applications, such as flexibly directing the user to the desired destination, pedestrian navigation is crucial in critical applications such as the positioning of first responders, of persons under electronic monitoring (i.e. dangerous offenders under house arrest or parole), and of military personnel. The equipment used for pedestrian navigation has to be small and light to carry and effortless to use, as well as have reasonably low power consumption and price. As in all navigation systems, the position information has to be accurate and available in real time. At present
Global Navigation Satellite Systems (GNSS) are the superior navigation technology
fulfilling all the above requirements in outdoor open-sky environments. However,
instruments for pedestrian navigation are mainly needed indoors and in urban areas,
where GNSS is significantly degraded or unavailable.
In these GNSS-challenged environments the absolute position of the user may be obtained with other radio navigation systems such as Wireless Local Area Networks (WLAN), Bluetooth, or Radio Frequency Identification (RFID). The drawbacks of these radio systems are that they need infrastructure prepared a priori and are therefore restricted to certain areas. In some environments they also have too low an availability for the needs of pedestrian navigation, depending on the number of access points available. When the initial absolute position is known, the position of the user may be propagated using relative positioning approaches, such as self-contained sensors. The propagated position may then be used to augment the position measurements obtained with GNSS or other radio sensors for more accurate and available, or even short-term stand-alone, navigation.
The most commonly used self-contained sensors in pedestrian navigation are digital compasses for measuring the heading of the user, gyroscopes for heading changes, and accelerometers for the user speed. When these measurements are used as inputs to Pedestrian Dead Reckoning (PDR) algorithms or are integrated with absolute position measurements using a Kalman filter, the position of the user is obtained continuously despite the degradation of the GNSS signals. However, self-contained sensors suffer from biases and drift errors that may decrease the position accuracy substantially, especially when consumer-grade Micro-Electro-Mechanical (MEMS) sensors are used.
The errors in a pedestrian indoor position solution caused by the above shortcomings of self-contained sensors, of which the accelerometers, gyroscopes and magnetometers are discussed in more detail herein, may be mitigated using information about the user motion obtained from consecutive images. When the user is carrying a camera whose position and orientation with respect to the user's body are known, the motion of the features in the observed images may be transformed into information about the user motion. The visual motion information is not affected by the same error sources as GNSS and self-contained sensors and is therefore a complementary information source for augmenting the positioning measurements. Vision-aiding increases the accuracy, availability, continuity and reliability of the navigation solution, as will be shown herein.
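The dead-reckoning propagation that these sensors feed is simple enough to state concretely. The following sketch (illustrative only; the function and variable names are not taken from the thesis) propagates a two-dimensional position from a heading and speed measurement over one time interval:

```python
import numpy as np

def propagate_pdr(position, heading_rad, speed_mps, dt):
    """Propagate a 2-D (east, north) position by dead reckoning.

    position    : np.array([east, north]) in metres
    heading_rad : heading measured clockwise from north, in radians
    speed_mps   : user speed, e.g. from accelerometer-based step detection
    dt          : time interval in seconds
    """
    step = speed_mps * dt
    delta = np.array([step * np.sin(heading_rad),   # east component
                      step * np.cos(heading_rad)])  # north component
    return position + delta

# Example: walking at 1.4 m/s towards the north-east for one second
pos = np.array([0.0, 0.0])
pos = propagate_pdr(pos, np.deg2rad(45.0), 1.4, 1.0)
print(pos)  # approximately [0.99, 0.99]
```

Any bias in the heading or speed enters this update directly and accumulates with every step, which is why the drift mitigation discussed above is needed.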
The visual perception used in the methods presented herein utilizes a camera carried by the user and facing roughly in the direction of motion. For consumer smartphone applications this is in most cases a preferable configuration, as when navigating in e.g. a mall or an office building the user is likely to follow the path or look at a map on the smartphone display, resulting in the camera orientation required by the method. In the case of first responders, electronic monitoring, and military personnel, the camera is preferably not carried in the hand, where it would complicate fundamental operations, but attached to the body or helmet of the person. This configuration is also favourable for the methods presented in the thesis.
1.1 Research Objectives
The use of visual information in navigation is challenging. Motion of the features in consecutive images provides information about the change of the user's heading and translation during the time interval between two consecutive images. In order to convert this relative information into the absolute position information needed for navigation, the position and heading have to be initialized with known absolute values and then propagated using the relative measurements. The attitude and translation obtained using visual information would stay accurate during navigation if the environment were favourable for visual perception and there were no correspondence errors when matching features from consecutive images. Visual measurements obtained from different time epochs are independent, and therefore the errors in previous epochs do not affect the measurements from subsequent images. However, as errors are inevitable, the propagated distance, and therefore the position, starts to drift after a while due to errors affecting the visual measurements, and absolute information is therefore needed to re-initialize the trajectory from time to time.
The main error sources for vision-aiding observable in indoor surroundings are the varying lighting conditions of the environment and the small number of distinctive features to be detected. Urban outdoor areas do not usually suffer from low lighting during daytime. Outdoor surroundings are rich in features, but also in dynamic objects, namely humans and vehicles. When the motion of the camera, and therefore of the user, is observed by following the motion of the features in the consecutive images, the image processing method has to be able to prevent the dynamic objects in the scene from disturbing the perception of the motion.
In this research the user heading, as well as the pitch and roll of the camera, are observed by tracking the motion of vanishing points, namely features arising from the projective transformation that maps three-dimensional objects into two-dimensional image points. The deficiency of the method is that it is strongly dependent on the geometry of the environment, as it requires straight parallel lines in the view of the camera, preferably in at least two orthogonal directions. In sharp turns this requirement is violated, and the magnitude of the turn is often impossible to determine using image processing alone.
Resolving translation from consecutive images using only a monocular camera is a challenging research task as well. The complication in observing translation from consecutive images is that the distance between the camera and the objects seen in the images contributes to the extent to which the image pixels move when the camera moves. When the depth, i.e. the distance of the object from the camera, is unknown, the scale of the translation stays unknown regardless of how many matching image points are found between the consecutive images.
The objectives of this research are to provide methods to retrieve heading and translation information from consecutive images while addressing the above mentioned challenges, and to enable accurate, more reliable and available navigation solutions by augmenting other positioning systems with the information obtained. All calculations are of a sufficiently low complexity to be adopted in real time for navigation with current smartphones. The algorithms of the concept called the "visual gyroscope", providing the user heading, have already been developed for the mobile phone environment, and the feasibility of the implemented system is discussed herein.
1.2 Related Work
The research related to visual positioning has so far mainly concentrated on the navigation of vehicles and mobile robots [26] [103]. The motion of a robot or vehicle is constrained and usually only two-dimensional, and the visual calculations are easier because the location and the path of motion are to some extent known in advance. The first papers related to vision-aiding in pedestrian navigation were published in the late 1990s [8]. They used previously prepared databases of images taken of the surroundings, tagged with position information obtained using the Global Positioning System (GPS), a map or a floor plan. The absolute position of the pedestrian was provided when a match was found between an image taken by the pedestrian and an image from the database [113]. One of the first such applications made for a smartphone was published in 2004 by Robertson and Cipolla [85], running the calculations on a server to which the query image was sent. Hile and Borriello [43] matched features, such as corners, found in the query image to a floor plan saved on a server. The feature matching was restricted to a certain area of the floor plan using coarse position information obtained with WLAN. Database-based vision-aiding applications provide accurate positioning but are restricted to a certain area and require extensive preparation.
A visual pedestrian navigation system independent of a server and of pre-existing databases usually needs integration with other positioning sensors to be functional. In such a system the relative position of the user is obtained by monitoring the motion of features in consecutive images taken by the user device and integrating the information with measurements obtained with self-contained sensors or a GNSS receiver. With initial absolute position information the navigation may be performed with reduced drift and other errors, whereas without the initial position the visual perception provides information on the user motion only. Such server-independent systems have been proposed by [41] using vision-aided Inertial Measurement Unit (IMU) measurements. On the other hand, a Simultaneous Localization And Mapping (SLAM) system produces a map of the unknown environment while simultaneously locating the user. Traditionally the mapping has been done using inertial sensors, but in recent years visual SLAM systems also integrating a camera have been developed [19].
Most human-made environments, indoors and in urban outdoor areas, consist of segments forming a Cartesian coordinate system with straight lines in three orthogonal directions. The coordinate system is called the Manhattan grid [25] and it provides a good basis for vision-aided navigation utilizing vanishing points [13] [35] [111]. The method of integrating vanishing point based orientation information with Inertial Navigation System (INS) measurements has been implemented before for accurate indoor navigation of an unmanned aerial vehicle (UAV) [108] [30] [82] and for pedestrian navigation [55] [47]. The method presented in this thesis follows the mentioned vanishing point based methods but is further developed for pedestrian and especially smartphone use through computationally less demanding algorithms and sophisticated error detection.
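To give a rough feel for the relation that such vanishing point methods exploit, the sketch below approximates the heading change from the horizontal displacement of a vanishing point between two images under a simple pinhole-camera model with zero roll and pitch. It illustrates the geometric principle only and is not the algorithm developed in Chapter 4; the function names are illustrative.

```python
import numpy as np

def heading_change_from_vanishing_point(u_prev, u_curr, focal_px):
    """Approximate heading change (radians) between two frames.

    u_prev, u_curr : x-coordinates (pixels) of the same vanishing point,
                     measured relative to the principal point
    focal_px       : camera focal length in pixels

    Under a pure rotation about the vertical axis, a vanishing point at
    horizontal offset u corresponds to a ray at angle atan(u / f) from
    the optical axis, so the heading change is the difference of angles.
    """
    return np.arctan2(u_curr, focal_px) - np.arctan2(u_prev, focal_px)

# Example: vanishing point moves from 0 px to 100 px with f = 800 px
dtheta = heading_change_from_vanishing_point(0.0, 100.0, 800.0)
print(np.rad2deg(dtheta))  # about 7.1 degrees
```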
The unknown scale in the translation obtained by tracking the motion of features in consecutive images is one of the most challenging issues related to visual navigation. The magnitude of the motion of a feature in an image depends on the depth of the object, i.e. the distance of the object from the camera. Because the distance of the objects from the camera in the navigation environment is usually unknown, a scale problem arises, and different methods for overcoming it have been used. When the environment contains objects with known sizes, the distance may be resolved [97]. Also, when scale information about the environment is available, for example in the form of a floor plan [44], the depth of the objects may be observed. Tools aiding the resolution of the distance, such as laser rangefinders, have been integrated with a camera by [115] and the motion of the user resolved. The requirement for special equipment reduces the applicability of such methods for pedestrian navigation at this time. When a stereo camera is used, the distance of the objects may be resolved using triangulation [49]. Recently some smartphones equipped with stereo cameras have been launched. In the case of stereo vision the distance between the two cameras, called the baseline, affects the accuracy of the motion obtained from images: the farther the two cameras are from each other, the better the accuracy [45]. Therefore a configuration using a monocular camera and images taken from two different positions provides better results for vision-aided navigation than a smartphone equipped with a stereo camera with a very short baseline (e.g. 2.4 cm for the LG OPTIMUS 3D smartphone), as a larger baseline results in better depth accuracy [45].
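The baseline dependence follows from the standard stereo triangulation relation, quoted here for illustration: for a rectified stereo pair with focal length $f$ (in pixels), baseline $B$ and measured disparity $d$, the depth of a point is

$$Z = \frac{fB}{d},$$

so for a given disparity measurement error the relative depth error grows as the baseline shrinks, which is why the few-centimetre baseline of a smartphone stereo camera compares unfavourably with the longer effective baseline between consecutive images from a moving monocular camera.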
Certain configurations of the navigation system give information about the distance of the photographed objects from the camera. When the camera is pointing down at the ground, the z-coordinate, i.e. the distance, is constant and equals the height of the camera. The method utilizing a downward-pointing camera has been used in vehicle navigation applications [79] [58] and recently in pedestrian navigation [42]. However, one of the challenges of vision-aiding in indoor environments is the shortage of features to be tracked. Floor textures in particular are usually very homogeneous, and it is therefore very difficult to find matching image points using a camera pointing straight at the floor. [16] developed an outdoor robot navigation system using a special camera configuration, namely a camera with a certain pitch towards the ground, to resolve the distance problem. Optical flow calculations were used to find the camera rotation and translation. The method presented in this thesis follows the ideas presented in that work but is further developed for pedestrian and indoor use. In [16] the pitch was measured a priori and kept static during navigation, whereas in the method developed herein the orientation of the camera is computed separately for each image, thereby accommodating the irregular movement of a camera e.g. in a smartphone held in the hand. As the orientation of the camera is computed separately, the method decreases the number of features needed for resolving the motion.
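As a sketch of the geometric idea behind such a pitched-camera configuration (the exact formulation used in Chapter 5 may differ), assume the camera is at height $h$ above the floor with its optical axis pitched down by $\theta_p$ from the horizontal. A floor point imaged at vertical offset $v$ (in pixels) below the principal point of a camera with focal length $f$ (in pixels) then lies at a horizontal distance of approximately

$$Z \approx \frac{h}{\tan\!\left(\theta_p + \arctan\dfrac{v}{f}\right)},$$

so knowing the approximate camera height is enough to fix the otherwise ambiguous scale for features lying on the ground plane.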
As mentioned above, GNSS is an accurate and freely accessible system for outdoor navigation, widely used in smartphones. However, at least four satellites in a good geometry are needed for solving the user position, a requirement that is not always fulfilled in urban areas. When knowledge of an initial position is available, fewer satellites may be used for resolving the total change in position between two time epochs, as will be shown later. When the errors affecting satellite signal propagation are known, information obtained from two satellites is enough to resolve the total magnitude of the translation in addition to the receiver clock error. [99] used this magnitude information for resolving the ambiguous scale in the translation induced by the motion of features in consecutive images. A method for robot navigation was developed encompassing three cameras for visual measurements, an IMU for resolving the pitch and roll of the camera, and an iterative algorithm for solving the user heading. In this thesis, an algorithm more suitable for pedestrian navigation is developed, utilizing less equipment and more robust vision-derived heading information.
1.3 Author's Contribution
In this thesis a novel pedestrian navigation system is presented. Two concepts are developed, namely a "visual gyroscope" providing the user heading and a "visual odometer" providing the translation. The motivation for dividing the observation of the user heading and translation into two separate tasks, instead of using traditional methods that resolve the full motion at once, is the difficulty of determining the unknown scale of the translation and also the challenges that indoor environments set for visual perception, namely low lighting and a shortage of features. The visual odometer presented in the thesis builds on the orientation information produced by the visual gyroscope. Also, traditional methods utilize feature points matched in consecutive images. As the two-fold method presented herein perceives the orientation change using lines observed in the scene, the measurements are much more accurate than when using other features, as will be explained in Chapter 4, and the number of other features needed for resolving the translation is reduced. The author's contributions also include a system developed for pedestrian urban navigation, utilizing the visual gyroscope, the visual odometer and carrier phase information obtained from at least two GNSS satellites.
All calculations are of a sufficiently low complexity to be adopted for navigation with
current smartphones. The main contributions of the thesis are as follows:
• A visual gyroscope with lower computational requirements than the existing algorithms resolving the user heading using visual perception, and therefore suitable for present smartphones. The visual gyroscope is based on observing the heading, pitch and roll of the camera using vanishing points.
• A novel error detection method for the visual gyroscope, which provides accurate and reliable navigation despite the unforeseeable motions of a pedestrian. The algorithm makes the visual gyroscope suitable for pedestrian navigation.
• A visual odometer, namely a method to resolve translation from images using a monocular camera. The visual odometer is suitable for use also in indoor environments, which are usually poor in features. It is feasible for seamless navigation since it leans on the visual gyroscope's orientation information and needs only the approximate height of the camera as prior information.
• A vision-aided differenced carrier phase navigation system for pedestrians. The method is leaner than previous similar solutions. The system is independent of sensors other than the camera and the GNSS receiver, because it encompasses the visual gyroscope and visual odometer providing the orientation and motion information.
The core contributions of Chapters 4-6 were first presented in [89], [90], [92], [91], [93], [94] and [95], in which the author of the thesis is the first author, and in [64], in which the author of the thesis is a co-author.
1.4 Thesis Outline
The thesis combines two different scientific disciplines: navigation and computer vision. Both have a well-established terminology and mathematical representation. In order to respect the fundamentals of both sciences, the thesis uses their customary terminology. Unfortunately, in the case of variable names and expressions there are many differing meanings, and therefore the terminology characteristic of each is presented in dedicated chapters.
In Chapter 2, the most prevalent systems used in pedestrian navigation, i.e. GNSS, WLAN and self-contained sensors, are presented. The computer vision principles relevant to vision-aided navigation are discussed in Chapter 3, with an emphasis on the methods and algorithms used in the thesis. Chapter 4 introduces the concept of a "visual gyroscope" and the novel error detection algorithm. The feasibility and challenges of the visual gyroscope are discussed, as well as the effect of different camera and setup characteristics on the accuracy and applicability of the method in pedestrian navigation. In Chapter 5 the concept of a "visual odometer" is presented. The mathematics, strengths and challenges of the visual odometer and its utilization are discussed. Chapter 6 presents results from various experiments integrating the visual gyroscope and odometer, both for indoor and for urban pedestrian navigation. In Chapter 7 the vision-aided differenced carrier phase navigation system for pedestrians, results from experiments and its feasibility for urban pedestrian navigation are discussed. Chapter 8 provides conclusions and recommendations for future research.
The NovAtel GPS and the NovAtel SPAN-SE GPS/GLONASS receivers with a Northrop Grumman tactical grade LCI IMU were used to determine the reference trajectories for assessing the performance of the algorithms developed in this thesis. As the system is initialized outdoors using the GNSS receiver and the navigation time indoors is limited to a short period, the post-processed reference solution obtained has decimeter-level accuracy.
2. OVERVIEW OF PEDESTRIAN NAVIGATION
Global Navigation Satellite Systems (GNSS) are the superior navigation technology, used also for pedestrian positioning. However, GNSS is significantly degraded or unavailable in the environments where pedestrian positioning is mainly needed, namely indoors and in urban areas, and other methods are required for augmentation or replacement in these situations. Methods other than GNSS for these indoor and urban areas may be divided into two classes based on the type of position information they provide, namely absolute and relative positioning. Robust integration of measurements from sources providing data at different rates and perceiving observations in different reference frames is challenging. This chapter introduces the basics of GNSS based positioning, Wireless Local Area Network (WLAN) positioning and other absolute positioning methods. The relative positioning systems used in this thesis, namely the Inertial Navigation System and other self-contained sensors, are also presented. Finally, the Kalman filter, a set of mathematical equations used for estimating the state of a process based on a priori knowledge of the accuracy of the measurements and confidence in the model used, is discussed.
2.1 Navigation Frames and Attitude
This thesis uses five reference frames relevant for vision-aided pedestrian navigation, namely the Inertial, Earth-Centered Earth-Fixed, Navigation, Body and Camera reference frames. The inertial frame has its origin at the centre of the Earth and axes fixed with respect to the stars, not rotating with the Earth. The Earth-Centered Earth-Fixed reference frame also has its origin at the centre of the Earth, but its axes rotate with the Earth with respect to the inertial frame. Both frames have their z-axis coincident with the Earth's polar axis. The navigation frame is a local geographic frame with its origin defined by the initialization of the navigation setup and axes pointing north, east and up. The body frame is the frame in which the inertial navigation system is installed, containing three orthogonal axes, with the z-axis pointing up [105]. In vehicle navigation the rotation around the z-axis is called yaw, around the x-axis roll and around the y-axis pitch. In pedestrian navigation, where the orientation of the unit is not always fixed with respect to the user, the term yaw is substituted with heading, defined as the angle of the chosen body axis with respect to north [24], where north is the direction from a point to the North pole of the Earth projected onto the level surface. The heading, pitch and roll are shown in Figure 2.1.

Fig. 2.1. Heading, pitch and roll in the navigation frame.

The definition of the camera reference frame is not needed in this chapter but will be given in Chapter 3.
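For illustration, the following sketch builds a direction cosine matrix from the heading, pitch and roll using the common aerospace z-y-x rotation sequence; this is a generic construction and not necessarily the exact convention adopted later in the thesis.

```python
import numpy as np

def dcm_from_euler(heading, pitch, roll):
    """Direction cosine matrix from Euler angles (radians).

    Rotates a vector from the navigation frame to the body frame using
    the z-y-x (heading-pitch-roll) rotation sequence.
    """
    ch, sh = np.cos(heading), np.sin(heading)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)

    Rz = np.array([[ ch, sh, 0], [-sh, ch, 0], [0, 0, 1]])    # heading about z
    Ry = np.array([[ cp, 0, -sp], [0, 1, 0], [ sp, 0, cp]])   # pitch about y
    Rx = np.array([[1, 0, 0], [0, cr, sr], [0, -sr, cr]])     # roll about x
    return Rx @ Ry @ Rz

# Example: 30 degree heading, zero pitch and roll
C_nb = dcm_from_euler(np.deg2rad(30.0), 0.0, 0.0)
print(np.round(C_nb, 3))
```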
2.2 Absolute Positioning
Absolute positioning systems provide the actual coordinates of the user position, whereas relative positioning systems provide the speed (or translation) and direction of the user to be integrated with the initial position. In reality, only GNSS provides the absolute coordinates of the pedestrian in the Earth-Centered Earth-Fixed (ECEF) coordinate frame, whereas the other systems provide the absolute position in some a priori defined local reference frame, for example inside a certain building. All the absolute positioning techniques presented facilitate positioning by transmitting radio waves with different wavelengths and frequencies.
2.2.1 Global Navigation Satellite Systems
GNSS encompasses the United States Global Positioning System (GPS), the Russian GLONASS, the Chinese COMPASS/Beidou and the European Galileo systems. The following principles are presented for GPS, because it is still the most widely used system due to its longer existence compared with the other systems mentioned above, but they hold for the other systems as well. In GNSS based positioning the travel time of a signal from the satellite to the user receiver antenna is estimated. When this time is multiplied by the speed of light, a geometric range between the satellite and the user is obtained. In an ideal case, measurements from three satellites would provide an accurate three-dimensional position of the user. In reality the measurements are erroneous, the main error source being the offsets of the receiver clock and the satellite clock from the system time; therefore the measured range is called the pseudorange. The satellite clocks are precise and synchronized by the ground control segment of the system. However, the clocks in the user receivers are low-cost, with a typically large timing error, which therefore has to be estimated as a parameter in the navigation solution. Observations from at least four satellites are needed for three-dimensional positioning, the fourth observation being used for resolving the receiver clock error. The pseudorange measurement is defined as
$$\rho^i = r^i + c\,(t_u - dt^i) + d\rho^i + d_{iono}^i + d_{tropo}^i + \varepsilon_\rho^i \tag{1}$$

where $r^i$ is the geometric range between the user receiver's antenna and satellite $i$ [m], $c$ is the speed of light [m/s], $t_u$ is the receiver clock error [s] and $dt^i$ is the satellite clock error [s] with respect to GPS time, $d\rho^i$ is the ephemeris error [m], $d_{iono}^i$ and $d_{tropo}^i$ are the ionospheric and tropospheric delays [m], respectively, and $\varepsilon_\rho^i$ encompasses noise, unmodelled errors and multipath [75]. Because some of the errors may be corrected using the data found in the signal and the rest may be considered negligible compared to the receiver clock error, the pseudorange measurements may be expressed as
$$\rho^i = \|\mathbf{s}^i - \mathbf{u}\| + c\,t_u \tag{2}$$

where $\mathbf{s}^i$ represents the coordinate vector of satellite $i$, $c\,t_u$ is the speed of light ($c$) times the advance of the receiver clock $t_u$, and $\mathbf{u}$ is the user coordinate vector $(x_u, y_u, z_u)$ to be resolved [54]. These pseudorange measurements from at least four satellites may further be used for obtaining the user coordinates. Because the pseudorange equations are non-linear, the user position and clock error have to be linearized by expanding using a Taylor series as
$$
\begin{aligned}
f(x_u, y_u, z_u, t_u) &= f(\hat{x}_u + \Delta x_u,\; \hat{y}_u + \Delta y_u,\; \hat{z}_u + \Delta z_u,\; \hat{t}_u + \Delta t_u) \\
&= f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)
  + \frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{x}_u}\,\Delta x_u
  + \frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{y}_u}\,\Delta y_u \\
&\quad
  + \frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{z}_u}\,\Delta z_u
  + \frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{t}_u}\,\Delta t_u + \dots
\end{aligned}
\tag{3}
$$

where $(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)$ are approximate values of the true position and true clock error $(x_u, y_u, z_u, t_u)$ and $(\Delta x_u, \Delta y_u, \Delta z_u, \Delta t_u)$ are the differences between the true and approximate values. The higher order derivatives are discarded to neglect the nonlinear terms, and the remaining first-order partial derivatives are
$$
\begin{aligned}
\frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{x}_u} &= -\frac{x_i - \hat{x}_u}{\hat{r}_i} \\
\frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{y}_u} &= -\frac{y_i - \hat{y}_u}{\hat{r}_i} \\
\frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{z}_u} &= -\frac{z_i - \hat{z}_u}{\hat{r}_i} \\
\frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{t}_u} &= c
\end{aligned}
\tag{4}
$$

The variables $\hat{r}_i$ for the estimated geometric ranges are defined as

$$\hat{r}_i = \sqrt{(x_i - \hat{x}_u)^2 + (y_i - \hat{y}_u)^2 + (z_i - \hat{z}_u)^2}. \tag{5}$$

The pseudorange measurement may now be presented as
$$\rho_i = \hat{\rho}_i - \frac{x_i - \hat{x}_u}{\hat{r}_i}\,\Delta x_u - \frac{y_i - \hat{y}_u}{\hat{r}_i}\,\Delta y_u - \frac{z_i - \hat{z}_u}{\hat{r}_i}\,\Delta z_u + c\,\Delta t_u \tag{6}$$
and finally the difference between the measured pseudorange $\rho_i$ and the pseudorange $\hat{\rho}_i$ computed using the estimated user position is

$$\Delta\rho_i = \frac{x_i - \hat{x}_u}{\hat{r}_i}\,\Delta x_u + \frac{y_i - \hat{y}_u}{\hat{r}_i}\,\Delta y_u + \frac{z_i - \hat{z}_u}{\hat{r}_i}\,\Delta z_u - c\,\Delta t_u. \tag{7}$$
The vector $\Delta\mathbf{x}$ of differences between the approximate and true position and clock error is obtained as $\Delta\mathbf{x} = \mathbf{H}^{-1}\Delta\boldsymbol{\rho}$, where the matrices for $n$ measured satellites are

$$
\Delta\boldsymbol{\rho} =
\begin{bmatrix} \Delta\rho_1 \\ \vdots \\ \Delta\rho_n \end{bmatrix},
\qquad
\mathbf{H} =
\begin{bmatrix}
\dfrac{x_1 - \hat{x}_u}{\hat{r}_1} & \dfrac{y_1 - \hat{y}_u}{\hat{r}_1} & \dfrac{z_1 - \hat{z}_u}{\hat{r}_1} & 1 \\
\vdots & \vdots & \vdots & \vdots \\
\dfrac{x_n - \hat{x}_u}{\hat{r}_n} & \dfrac{y_n - \hat{y}_u}{\hat{r}_n} & \dfrac{z_n - \hat{z}_u}{\hat{r}_n} & 1
\end{bmatrix},
\qquad
\Delta\mathbf{x} =
\begin{bmatrix} \Delta x_u \\ \Delta y_u \\ \Delta z_u \\ -c\,\Delta t_u \end{bmatrix}
\tag{8}
$$

and by using this information the true position and clock error may be computed from the approximate values when four satellites are observed. When more than four satellites are observed, the solution is computed using least-squares estimation as $\Delta\mathbf{x} = (\mathbf{H}^{\mathsf T}\mathbf{H})^{-1}\mathbf{H}^{\mathsf T}\Delta\boldsymbol{\rho}$.
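As an illustrative sketch of this linearize-and-solve procedure (simplified, with no weighting or convergence test; the sign convention folds the negatives into the residual and update so that it is algebraically equivalent to Eq. (8), and the names are illustrative rather than taken from the thesis):

```python
import numpy as np

def solve_position(sat_pos, pseudoranges, x0=np.zeros(3), b0=0.0, iters=10):
    """Iterative least-squares GNSS position fix.

    sat_pos      : (n, 3) satellite ECEF coordinates [m]
    pseudoranges : (n,) measured pseudoranges [m]
    Returns the estimated user ECEF position [m] and receiver clock bias [m].
    """
    x, b = x0.astype(float), b0
    for _ in range(iters):
        ranges = np.linalg.norm(sat_pos - x, axis=1)      # geometric ranges r_i
        H = np.hstack([(x - sat_pos) / ranges[:, None],   # d(range)/d(position)
                       np.ones((len(ranges), 1))])        # clock-bias column
        drho = pseudoranges - (ranges + b)                # measurement residuals
        dx = np.linalg.lstsq(H, drho, rcond=None)[0]      # least-squares update
        x, b = x + dx[:3], b + dx[3]
    return x, b
```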
The relative user-satellite geometry determines how much the combined measurement errors, the most important being ionospheric and tropospheric delay, receiver noise and resolution, and multipath, expressed using a variable called the User Equivalent Range Error (UERE), affect the resulting position error [54]. More measurements yield a better position solution only when the measurements are linearly independent [75]. When the satellites used are widely spread with respect to the user receiver, the dilution of precision (DOP) is small and the position solution is much more accurate than when the satellites are close to each other or otherwise in a poor configuration. The effect of the satellite geometry on the position error is expressed as follows. The user-satellite geometry is denoted as $\mathbf{G} = (\mathbf{H}^{\mathsf T}\mathbf{H})^{-1}$, where the matrix $\mathbf{H}$ is the design matrix explained above. The covariance matrix of the position errors in the x-, y- and z-components and of the user clock bias ($t_u$) estimate is then

$$\mathrm{cov}(\mathbf{x}) = \mathbf{G}\,\sigma_{UERE}^2. \tag{9}$$
The variances of the position and clock error components are [54] [75] $\sigma_x^2 = \sigma_{UERE}^2 G_{11}$, $\sigma_y^2 = \sigma_{UERE}^2 G_{22}$, $\sigma_z^2 = \sigma_{UERE}^2 G_{33}$ and $\sigma_b^2 = \sigma_{UERE}^2 G_{44}$, where $G_{ii}$ is the $i$th entry on the diagonal of $\mathbf{G}$. The Geometric Dilution of Precision (GDOP), encompassing the 3-D position and clock bias estimation error, is then

$$\mathrm{GDOP} = \sqrt{G_{11} + G_{22} + G_{33} + G_{44}} \tag{10}$$

and the Position Dilution of Precision (PDOP) is the square root of the sum of the first three terms. In the case where four satellites are tracked, the PDOP value is smallest, and therefore the position solution is the best possible, when three of the satellites are evenly distributed in azimuth near the horizon and the fourth is directly above the user receiver (i.e. at zenith).
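A small sketch of how the DOP figures follow from the design matrix; the four unit vectors below merely mimic the "three near the horizon, one at zenith" configuration described above and are not real satellite data:

```python
import numpy as np

def dops(H):
    """Compute GDOP and PDOP from the design matrix H (n x 4)."""
    G = np.linalg.inv(H.T @ H)            # user-satellite geometry matrix
    gdop = np.sqrt(np.trace(G))           # sqrt(G11 + G22 + G33 + G44)
    pdop = np.sqrt(np.trace(G[:3, :3]))   # position components only
    return gdop, pdop

# Illustrative user-to-satellite unit vectors (east, north, up)
unit_vectors = np.array([
    [ 1.0,  0.0,   0.0],
    [-0.5,  0.866, 0.0],
    [-0.5, -0.866, 0.0],
    [ 0.0,  0.0,   1.0],
])
H = np.hstack([unit_vectors, np.ones((4, 1))])
print(dops(H))  # GDOP about 1.73, PDOP about 1.63 for this geometry
```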
A more accurate satellite-to-user distance is obtained when a carrier phase observation is used. The carrier phase observation $\varphi$ from satellite $i$ is defined as

$$\varphi^i = r^i + c\,(t_u - dt^i) + d\rho^i + \lambda N - d_{iono}^i + d_{tropo}^i + \varepsilon_\varphi^i \tag{11}$$

where $\lambda$ is the carrier wavelength, $N$ is the integer ambiguity, $\varepsilon_\varphi^i$ encompasses noise, unmodelled errors and multipath, and the other variables are as in the case of the pseudorange measurement.
Although carrier phase measurements provide very accurate positioning, at the millimetre level in favourable conditions, they have not been widely used in pedestrian navigation. In order to obtain an accurate position solution, the integer ambiguity, namely the integer number of cycles the signal has traversed since leaving the satellite, has to be resolved. This may be done using double-differenced GNSS measurements [67] or single-differenced measurements and Precise Point Positioning (PPP) [33], both of which are too complex for the lightweight, reasonably priced equipment required for pedestrian navigation. The carrier phase observations are also difficult to obtain continuously in the environments typical for pedestrian positioning, namely in urban areas and indoors: the carrier phase tracking loop is more vulnerable to losing lock in attenuated signal environments than the code delay tracking loop, which produces the pseudorange measurements.
However, when the carrier phase measurements obtained in two consecutive time epochs are differenced, the integer ambiguity, which remains constant, disappears.
The differenced measurements are left with the error and noise terms as well as the change in geometric range between the time epochs, which may further be used for pedestrian navigation, as will be shown in Chapter 7.
GNSS is the superior positioning system in open outdoor areas, but its use is very
limited in urban and indoor areas. Although in these challenging environments High
Sensitivity GPS (HSGPS) receivers are usually used, the performance in terms of reliability and accuracy is degraded [62]. As the received signal power level decreases
the measurement uncertainty increases due to noise. The Effective Isotropic Radiated Power (EIRP) of a GPS (L1 C/A code) civil signal is 26.8 dBW at the time of transmission. The power decreases mainly due to free space propagation loss (∼ 184.4
dB) while the signal travels from space to the Earth. In order to be able to find the
relevant information from the signal below noise, the minimum received power at the
conventional receiver has to be around -160 dBW [54] and at a typical HSGPS receiver -186 dBW [68]. The requirement of the -160 dBW received power is achieved
with a Line-of-Sight (LOS) signal, but the signal degrades due to the attenuation
resulting from propagation through a material (i.e. shadowing) and from interference, typically multipath (i.e. fading). The type of material the signal has to penetrate affects the amount of attenuation; a good comparison of the effects of different widely used materials may be found in [40]. For example, while entering a concrete and steel
building the mean fading of the signal ranges from 19 to 23 dB and from 12 to 21
dB when entering a residential garage, depending on the elevation angle of the satellites tracked [68]. The required and received signal power levels show that the use of an HSGPS provides increased availability of GPS positioning in most GNSS-challenging environments, but the accuracy is still too poor for pedestrian navigation. Therefore, means for augmenting and replacing GPS signals in urban and indoor areas are needed, and a few comprehensive methods are discussed below.
2.2.2
WLAN Positioning
Wireless Local Area Network (WLAN), based on the IEEE 802.11 standard, is a wireless network used for communication between closely-spaced electronic devices (occasionally also called Wi-Fi, which is a registered trademark of the Wi-Fi Alliance). Because of its widespread deployment, means for using the technology for positioning have also been developed. In a WLAN positioning solution, the prevailing fingerprinting
technique uses a database, a so called radio map, of access point signal strengths collected manually during an off-line training process. The user position is determined
with the radio map and Received Signal Strength Indication (RSSI) measurements,
which are power level measurements of the received radio signal. Bayes' theorem and, e.g., the Histogram Maximum Likelihood method are used to solve the user position from the measurements [86], [112]. WLAN positioning typically provides room-level accuracy but is limited to surroundings with existing and prepared infrastructure [101].
The RSSI samples measured at each reference point during the training phase are utilized to estimate the parameters of the Weibull distribution [96] used to describe the WLAN signal strength distribution [66]. During the positioning phase the observation vector $\mathbf{R}_{WLAN} = \{r_1, ..., r_n\}$ is used to find the position $\mathbf{x}$ that maximizes the conditional probability $P(\mathbf{x}|\mathbf{R}_{WLAN})$ using Bayes' theorem as
$$\arg\max_{\mathbf{x}}\left[P(\mathbf{x}\,|\,\mathbf{R}_{WLAN})\right] = \arg\max_{\mathbf{x}} \frac{P(\mathbf{R}_{WLAN}\,|\,\mathbf{x})\,P(\mathbf{x})}{P(\mathbf{R}_{WLAN})}. \quad (12)$$
The advantages of using WLAN positioning are its large coverage, typically a range of 50 m to 100 m, and that no line of sight is required [74]. A major weakness of the fingerprinting procedure is its vulnerability to changes in the environment, which alter the signal propagation patterns and thus make the radio map obsolete, reducing the accuracy of the position solution. Also, electrical equipment placed in the vicinity of the access points distorts the position solution.
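The fingerprinting idea of equation (12) can be sketched in a few lines of Python. The sketch below assumes, for simplicity, an independent Gaussian likelihood per access point instead of the Weibull/histogram models cited above and a uniform prior; the radio-map structure, the function name and the RSSI values are purely illustrative.

```python
import numpy as np

def fingerprint_position(rssi_obs, radio_map, sigma=4.0):
    """Return the radio-map reference point maximizing P(x | RSSI).

    radio_map maps reference-point coordinates to mean RSSI vectors (same access
    point order as rssi_obs). With a uniform prior, maximizing the posterior of
    equation (12) reduces to maximizing the likelihood P(RSSI | x).
    """
    rssi_obs = np.asarray(rssi_obs, dtype=float)
    positions = list(radio_map.keys())
    # log-likelihood under independent Gaussian noise per access point
    log_lik = [-0.5 * np.sum(((rssi_obs - np.asarray(radio_map[p])) / sigma) ** 2)
               for p in positions]
    return positions[int(np.argmax(log_lik))]

# Hypothetical two-point radio map (dBm) and one observation
radio_map = {(0.0, 0.0): [-45.0, -70.0, -80.0], (5.0, 0.0): [-60.0, -55.0, -75.0]}
print(fingerprint_position([-58.0, -57.0, -76.0], radio_map))   # -> (5.0, 0.0)
```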
2.2.3
Other Technologies
The other promising and actively researched absolute positioning technologies for pedestrians include Radio Frequency Identification (RFID), Bluetooth and Ultra-Wideband (UWB). RFID positioning is based on attaching tags to the user that are then observed by a reader, or on attaching tags to the environment and equipping the user with a reader. The two most widespread methods used for resolving the user position are either simply acknowledging that a user is close to a reader with a known position or using the RSSI measurements as described above. Ultra-Wideband positioning is based on a transmitter emitting radio waves occupying a large frequency bandwidth, namely more than 500 MHz. The benefit of using Ultra-Wideband signals compared to their narrowband equivalents is their ability to penetrate many building materials such as concrete, glass and wood [59]. UWB-based positioning may be performed similarly to RFID positioning by attaching receiver tags to the user and using RSSI methods as in WLAN positioning, or with Time of Arrival (ToA) methods as in GNSS-based positioning. However, UWB positioning may also be used without supplying the user with special equipment, namely by using the UWB transmitter as a radar. In this manner the time elapsed before an emitted signal comes back to the transmitter after reflecting from the user is measured.
When the background is known, the position of the user may be estimated. Bluetooth positioning uses the same principles as WLAN positioning; the position is mainly obtained using RSSI methods utilizing an a priori prepared database of the access points in the area. Smartphones have been equipped with Bluetooth receivers for a long time already, but unfortunately the access point infrastructure is nowhere near as widespread as that of WLANs. The benefit of using Bluetooth for positioning is that the transmitters may be manufactured to transmit signals with higher power, resulting in a longer positioning range [20].
A comprehensive presentation of various techniques used for indoor positioning, not all of which are mentioned in this thesis, may be found in [74].
2.3
Relative Positioning
Self-contained sensors carried by the user are desirable equipment for pedestrian navigation, providing relative position information independently of the environment. With a known initial position, the position may be propagated using the sensors for a limited period of time [24]. The propagation is done using standard inertial algorithms incorporating the attitude, obtained by integrating the gyroscope measurements, and the translation, obtained by double-integrating the accelerometer measurements. The limitation of self-contained sensors is that the cumulative measurement errors grow quickly, because the integration also integrates the measurement noise and biases.
An important aspect strongly affecting the development of pedestrian navigation is that the tolerable number and size of the equipment used are limited compared to, e.g., robot or vehicular navigation. This forces a compromise between the accuracy and the usability of the system. Micro-Electro-Mechanical System (MEMS)
sensors are small in size and weight, have low power consumption and are inexpensive to produce [105], and are therefore used widely for pedestrian navigation and especially in smartphones, although with decreased measurement performance.
2.3.1
Inertial Sensors
Accelerometers and gyroscopes are called inertial sensors. A system encompassing at
least one accelerometer observing the acceleration of the body and a gyro measuring
the rotation is called an Inertial Measurement Unit (IMU). Two different methods
are used for processing the IMU measurements, namely Pedestrian Dead Reckoning
(PDR) and inertial navigation [37], systems using the latter referred to as Inertial
Navigation Systems (INS).
PDR has three phases: step detection, step length estimation and navigation solution update, which combines the step length estimate obtained using at least one accelerometer with the heading obtained using a magnetometer or a gyro augmented with a magnetometer. As the performance of PDR is less sensitive to the quality of the sensors, especially where the distance travelled is concerned, PDR is feasible to use with MEMS sensors. The process works with a single accelerometer, but the performance increases when more sensors are used [37].
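To make the three PDR phases concrete, the following Python sketch detects steps as peaks of the accelerometer magnitude, uses a constant step length (where the thesis would estimate it from the data), and takes the per-sample heading from an external magnetometer/gyro source; the function name, thresholds and step length are illustrative assumptions.

```python
import numpy as np

def pdr_track(acc_norm, headings, start=(0.0, 0.0), step_len=0.7, thresh=11.0):
    """Propagate a 2-D position with a simple Pedestrian Dead Reckoning loop.

    acc_norm: accelerometer magnitude per sample (m/s^2).
    headings: heading per sample (rad), assumed to come from magnetometer/gyro fusion.
    """
    x, y = start
    track = [(x, y)]
    for k in range(1, len(acc_norm) - 1):
        is_peak = acc_norm[k] > acc_norm[k - 1] and acc_norm[k] > acc_norm[k + 1]
        if is_peak and acc_norm[k] > thresh:       # step detection
            x += step_len * np.cos(headings[k])    # step length along the heading
            y += step_len * np.sin(headings[k])
            track.append((x, y))
    return track
```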
Inertial navigation algorithms require a full IMU with triads of accelerometers and
gyroscopes. The accelerometers observe the acceleration of a body, but in order to transform the acceleration measurements into the user position the direction of the acceleration is also needed, and this is obtained by observing the relative rotational motion of the body with respect to the inertial reference frame using rate gyroscopes. The most common type of MEMS gyro is the vibratory gyro based on the Coriolis force [9]. The
formation of a navigation solution from the accelerometer and gyroscope measurements is as follows [105].
The accelerometers output a measurement of specific force in the body reference frame, $\mathbf{f}^b$. The measurement has to be transformed into the inertial reference frame using a Direction Cosine Matrix [61] $\mathbf{C}^i_b$ as $\mathbf{f}^i = \mathbf{C}^i_b \mathbf{f}^b$. The matrix $\mathbf{C}^i_b$ may be computed from the angular velocities $\boldsymbol{\omega}^b_{ib}$ obtained from the gyroscopes using
$$\dot{\mathbf{C}}^i_b = \mathbf{C}^i_b \boldsymbol{\Omega}^b_{ib} \quad (13)$$
where $\boldsymbol{\Omega}^b_{ib}$ is the skew-symmetric matrix of the angular velocity vector. The specific force contains a measure of mass gravitation ($\mathbf{g}$) that has to be accommodated, resulting in
$$\left.\frac{d^2\mathbf{r}}{dt^2}\right|_i = \mathbf{f} + \mathbf{g} \quad (14)$$
where r is the user position vector with respect to the reference frame origin and t is
time.
By integrating the obtained value once the velocity of the user in inertial reference
frame vi is obtained. In pedestrian navigation the final navigation solution is needed
in the Earth frame. The Coriolis theorem provides means to convert the velocity
measurement into the ECEF frame using information of the Earth turn rate $\boldsymbol{\omega}_{ie} = [0\ 0\ \Omega]^T$ as
$$\mathbf{v}_e = \mathbf{v}_i - \boldsymbol{\omega}_{ie} \times \mathbf{r} \quad (15)$$
where × denotes a vector cross product. By integrating again the velocity measurement the position of the user r in the Earth-Centered Earth-Fixed reference frame is
obtained.
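A minimal Python sketch of this mechanization, under the simplifying assumptions that the integration is done in the inertial frame with a first-order Euler step and that the Earth-rate conversion of (15) and all error modelling are omitted, is given below; the function names and the gravity vector convention are illustrative only.

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix of a 3-vector, i.e. Omega in equation (13)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def strapdown_step(C, v, r, f_b, w_b, dt, g=np.array([0.0, 0.0, -9.81])):
    """One Euler step of the strapdown equations (13)-(14).

    C is the body-to-inertial DCM, v and r the velocity and position, f_b the
    measured specific force and w_b the gyro angular rates, both in the body frame.
    """
    C = C @ (np.eye(3) + skew(w_b) * dt)   # C_dot = C * Omega, first-order update
    # (in practice C should be periodically re-orthogonalized)
    a = C @ f_b + g                        # rotate specific force, add gravitation
    v = v + a * dt                         # first integration: velocity
    r = r + v * dt                         # second integration: position
    return C, v, r
```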
Integration of INS and GNSS fills the outages in positioning and provides more robust and reliable systems than either alone. However, both the accelerometer and
gyroscope measurements suffer from various error sources, the most important ones
being bias, scale factor and noise. Therefore the low-cost MEMS accelerometers
used especially in smartphones are too erroneous to be used to obtain the user speed
without augmentation with e.g. GNSS or using calibration and special algorithms,
e.g. [104] [81]. The drift in the gyroscope-induced user attitude, especially the heading, due to the mentioned errors still seems to be the most significant challenge in indoor and urban pedestrian navigation, although the accuracy of the position solution increases when multiple IMUs are used [10]. In order to achieve the accuracy and continuity of positioning needed for pedestrian navigation, means for overcoming the challenges due to the mentioned errors have to be found. The next section introduces a few techniques used for augmenting the inertial sensors.
2.3.2
Other Self-Contained Sensors
This section presents two self-contained sensors not discussed above and used in the thesis, namely a magnetic compass, also called a magnetometer, and a barometer.
Magnetometer
A magnetic compass provides absolute angle information of the user with respect
to magnetic north by measuring the intensity of Earth’s magnetic field [18]. The
Earth's magnetic field has a component parallel to the Earth's surface pointing toward magnetic north, which in the Helsinki area [2] differs by approximately 8 degrees from geographic north (magnetic declination) and has a field intensity of about 0.52 gauss. The magnetic declination varies both from place to place and over time. If the compass is totally parallel to the Earth's surface, the heading (i.e. azimuth) ($\theta$)
may be computed from its horizontal measurements XM , YM , neglecting the vertical
component $Z_M$, as
$$\theta = \begin{cases}
90, & \text{if } X_M = 0,\ Y_M > 0 \\
270, & \text{if } X_M = 0,\ Y_M < 0 \\
180 - \arctan(Y_M/X_M)\cdot 180/\pi, & \text{if } X_M < 0 \\
-\arctan(Y_M/X_M)\cdot 180/\pi, & \text{if } X_M > 0,\ Y_M \le 0 \\
360 - \arctan(Y_M/X_M)\cdot 180/\pi, & \text{if } X_M > 0,\ Y_M > 0.
\end{cases} \quad (16)$$
In practice, especially in pedestrian navigation applications, the compass is not totally
parallel to the Earth’s surface and the tilt has to be compensated for. If the roll (β)
and pitch (φ) of the compass are known, the compass XM and YM measurements
may be transformed to the horizontal plane (XH , YH ) as
$$\begin{aligned}
X_H &= X_M\cos(\phi) + Y_M\sin(\beta)\sin(\phi) - Z_M\cos(\beta)\sin(\phi) \\
Y_H &= Y_M\cos(\beta) + Z_M\sin(\beta).
\end{aligned} \quad (17)$$
Now the heading θ may be computed using the equations in 16 by substituting the
variables XH , YH for XM , YM .
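A compact Python sketch of equations (16) and (17) is given below; the function name is hypothetical, the roll and pitch are assumed to be in radians and the magnetic declination is ignored.

```python
import math

def compass_heading(xm, ym, zm, roll, pitch):
    """Tilt-compensated compass heading in degrees, following (16) and (17)."""
    # Equation (17): project the magnetometer measurements to the horizontal plane
    xh = (xm * math.cos(pitch) + ym * math.sin(roll) * math.sin(pitch)
          - zm * math.cos(roll) * math.sin(pitch))
    yh = ym * math.cos(roll) + zm * math.sin(roll)

    # Equation (16): quadrant-dependent heading from the horizontal components
    if xh == 0.0:
        return 90.0 if yh > 0 else 270.0
    a = math.degrees(math.atan(yh / xh))
    if xh < 0:
        return 180.0 - a
    return -a if yh <= 0 else 360.0 - a
```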
The compass measurements suffer from errors of two different types: predictable and unpredictable. The predictable errors come from sources such as the orientation of the navigation platform, soft and hard iron effects, and magnetic declination. These errors may be eliminated by calibration or real-time compensation algorithms [22]. The unpredictable errors, mainly due to environmental magnetic disturbances, may be high and are difficult to remove; for example, in an office corridor experiment they caused a heading mean error of around 18 degrees when using a MEMS compass built into a smartphone [94], as shown in Figure 2.2. The error in the figure is obtained by comparing the heading direction obtained using the smartphone compass carried by a walking user to the real geographical direction computed using the known path and a floor plan. Therefore the compass heading in indoor environments is too poor to be used without augmentation, but as the magnetic heading is an absolute measure it is an effective measurement when integrated with, e.g., a gyro.

Fig. 2.2. Absolute heading error obtained with a digital compass of a smartphone indoors.
Barometer
The self-contained sensors presented above are poor in estimating the user z coordinate, namely the height. Barometers measure the air pressure that can be converted
into altitude information in indoor environments. The pressure (p) measured by the
barometer is related to the height ($h$) as [80]
$$h = \frac{T_0}{T_L}\left[1 - \left(\frac{p}{p_0}\right)^{\frac{k R_g}{g}}\right] \quad (18)$$
where $R_g$ is the universal gas constant $8.3143\ \mathrm{(N\cdot m)/(mol\cdot K)}$, $p_0$ is the average sea level pressure $101.325\ \mathrm{kPa}$, $T_L$ is the temperature lapse rate $-0.0065\ \mathrm{K/m}$ [48], $T_0$ is the temperature at sea level and $g$ is the gravitational acceleration.
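The following Python sketch evaluates the standard lapse-rate relation behind equation (18); the molar mass of air and the sea-level temperature are assumptions not stated in the text, and the exponent $R_g T_L/(gM)$ plays the role of the $kR_g/g$ term.

```python
def baro_height(p, p0=101325.0, t0=288.15, tl=-0.0065,
                rg=8.3143, m=0.0289644, g=9.80665):
    """Height (m) above the p0 reference level for a pressure reading p (Pa)."""
    exponent = -rg * tl / (g * m)               # ~0.19 for the standard atmosphere
    return (t0 / tl) * ((p / p0) ** exponent - 1.0)

# Roughly 111 m for a 13 hPa drop from sea-level pressure
print(round(baro_height(100000.0), 1))
```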
Cameras are not affected by the error sources deteriorating the measurements from gyroscopes, accelerometers, compasses and GNSS in indoor and urban areas. Observing the heading and translation from consecutive images also requires no a priori preparation of the environment. Cameras are furthermore light and small in size as well as reasonably priced. Therefore vision-aiding is a feasible method for augmenting the above-mentioned systems in pedestrian navigation applications and will be discussed in the following chapters.
2.4
Estimation
The Kalman filter is a set of mathematical equations used for estimating the state of a process recursively, e.g. position, velocity and attitude in the case of pedestrian navigation [51]. The Kalman filter is used for integrating the measurements obtained using the visual methods presented herein with other position measurements. These implementations are discussed in Chapter 6. The state is estimated in a way that the mean
of squared errors between the actual measurements and the expected measurements
is minimized [109]. The recursive nature of the filter provides means for incorporating information about the past states and using the information to predict the current
or even the future states. This is done by using a discrete-time stochastic system
model [84]
$$\mathbf{x}_k = f_{k-1}(\mathbf{x}_{k-1}, \mathbf{v}_{k-1}) \quad (19)$$
where fk−1 is a known linear or nonlinear function of the state xk−1 and vk−1 represents process noise. The measurements zk are related to the state through a measurement model (h) as
zk = hk (xk , wk )
(20)
where wk is measurement noise. In the following the fundamentals of the filter are
described for the linear (Kalman filter) and nonlinear (Extended Kalman filter) cases.
2.4.1
Kalman Filter
The Kalman filter estimates the state of a linear stochastic system by using measurements that are linear functions of the state and are corrupted by zero-mean Gaussian noise $\mathbf{v}_k$, while the state propagation is corrupted by zero-mean Gaussian noise $\mathbf{w}_{k-1}$ [36]. The discrete-time system model is [109]
xk = Φk−1 xk−1 + wk−1
(21)
where k denotes the time epoch and Φ is called a state transition matrix and propagates the state from the epoch k − 1 to k. The measurement obtained is
zk = Hxk + vk .
(22)
Matrix H relates the state to the measurement as was the case for resolving the user
position from the pseudorange measurements explained in GNSS positioning above.
The process and measurement noises have Gaussian probability distributions
$$p(\mathbf{w}) \sim N(0, \mathbf{Q}), \qquad p(\mathbf{v}) \sim N(0, \mathbf{R}) \quad (23)$$
where $\mathbf{Q}$ is the process noise covariance and $\mathbf{R}$ the measurement noise covariance. The Kalman filter has a prediction stage and an update stage, where the predicted state estimate is corrected
using the obtained measurement. The predicted state estimate is called the a priori estimate $\hat{\mathbf{x}}^-_k$ and the updated state estimate the a posteriori estimate $\hat{\mathbf{x}}_k$. The measurement $\mathbf{z}_k$ is used to update the state as
$$\hat{\mathbf{x}}_k = \hat{\mathbf{x}}^-_k + \mathbf{K}(\mathbf{z}_k - \mathbf{H}\hat{\mathbf{x}}^-_k). \quad (24)$$
The factor $(\mathbf{z}_k - \mathbf{H}\hat{\mathbf{x}}^-_k)$ is called the measurement innovation and it expresses the difference between the predicted measurement $(\mathbf{H}\hat{\mathbf{x}}^-_k)$ and the observed measurement $\mathbf{z}_k$. The matrix $\mathbf{K}$ is called the Kalman gain and is computed as
$$\mathbf{K}_k = \mathbf{P}^-_k \mathbf{H}^T (\mathbf{H}\mathbf{P}^-_k\mathbf{H}^T + \mathbf{R})^{-1}. \quad (25)$$
The errors of the a priori and a posteriori state estimates are defined as $\boldsymbol{\epsilon}^-_k = \mathbf{x}_k - \hat{\mathbf{x}}^-_k$ and $\boldsymbol{\epsilon}_k = \mathbf{x}_k - \hat{\mathbf{x}}_k$, respectively. The matrix $\mathbf{P}^-_k$ represents the covariance of the a priori state estimate error. The objective of the Kalman gain is to minimize the resulting a posteriori state estimate error covariance. Equation (25) shows that when the measurement error covariance is small, and therefore the measurements are reliable, the gain is large and the innovation is weighted heavily, whereas when the state estimate error covariance $\mathbf{P}^-_k$ is small, the gain is also small and the a priori state estimate is trusted more.
The Kalman filter is initialized by setting values for the initial state $\mathbf{x}_0$ and the initial state error covariance $\mathbf{P}_0$. The measurement covariance matrix $\mathbf{R}$ is usually set a priori and kept constant, as are the matrix $\mathbf{H}$, the process noise covariance matrix $\mathbf{Q}$ and the state transition matrix. The algorithm then recursively predicts the state as
$$\begin{aligned}
\hat{\mathbf{x}}^-_k &= \boldsymbol{\Phi}_{k-1}\hat{\mathbf{x}}_{k-1} \\
\mathbf{P}^-_k &= \boldsymbol{\Phi}_{k-1}\mathbf{P}_{k-1}\boldsymbol{\Phi}^T_{k-1} + \mathbf{Q}
\end{aligned} \quad (26)$$
and updates the state estimate and state error covariance when the measurement is obtained, incorporating the new Kalman gain computed using (25), as
$$\begin{aligned}
\hat{\mathbf{x}}_k &= \hat{\mathbf{x}}^-_k + \mathbf{K}_k(\mathbf{z}_k - \mathbf{H}\hat{\mathbf{x}}^-_k) \\
\mathbf{P}_k &= (\mathbf{I} - \mathbf{K}_k\mathbf{H})\mathbf{P}^-_k.
\end{aligned} \quad (27)$$
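For concreteness, the prediction and update recursion of equations (25)-(27) can be written as a short Python function; the constant-velocity toy model, the noise values and the measurement sequence below are illustrative only.

```python
import numpy as np

def kalman_step(x, P, z, Phi, H, Q, R):
    """One Kalman filter cycle: prediction (26), gain (25) and update (27)."""
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Q
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)                    # innovation update
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy constant-velocity state (position, velocity) with position measurements
dt = 1.0
Phi = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[1.0]])
x, P = np.zeros(2), 10.0 * np.eye(2)
for z in [1.1, 2.0, 2.9, 4.2]:
    x, P = kalman_step(x, P, np.array([z]), Phi, H, Q, R)
print(x)   # estimated position and velocity after four updates
```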
2.4.2
Extended Kalman Filter
The Kalman filter is a comprehensive method for estimating the state when both the state model $f$ and the measurement model $h$ are linear. However, this is not true for all applications, including positioning, and therefore other means, such as an extension of the algorithm called the Extended Kalman filter (EKF), should be used [65]. In the EKF the state (19) and measurement (20) models are
xk = fk (xk−1 ) + wk−1
zk = hk (xk ) + vk .
(28)
In the case of the Kalman filter the state estimates are updated directly using the measurements, whereas in the EKF the nominal value of the state $\bar{\mathbf{x}}_k$ is updated with the perturbations ($\delta\mathbf{x}_k$) as
$$\mathbf{x}_k = \bar{\mathbf{x}}_k + \delta\mathbf{x}_k. \quad (29)$$
The EKF linearizes the measurement matrix $\mathbf{H}$ around the mean of the prior state, approximating the non-linearity by a Taylor expansion. Therefore the EKF is a Best Linear Unbiased Estimator (BLUE) minimizing the expectation $E(||\mathbf{x}_k - \hat{\mathbf{x}}_k||^2)$ [65]. The integration processes discussed in the thesis use different variations of the Kalman and Extended Kalman filters. These were chosen as they are the prevailing means for observation integration and estimation in the navigation field. The performance of the EKF is, however, poor when the state and measurement models are highly non-linear, and in such cases other estimators, e.g. the Unscented Kalman Filter (UKF) [27] or the Particle Filter (PF) [50], might be an alternative and should be a topic for further research.
3. COMPUTER VISION METHODS FOR NAVIGATION
This chapter introduces the basics of computer vision. Herein the real-life entities seen in the field-of-view of the camera are called objects and their two-dimensional images features. Humans inherently possess a good-quality "stereo camera", namely the eyes, and human visual perception is capable of filling in missing information. Therefore it is easy for a human to understand perspectives and to evaluate distances and occluded parts of the objects in the scene. In the case of computer vision, objects in the scene are seen as sets of points of digitized brightness value functions. The form of these features, i.e. the point sets, changes with the pose of the camera and the lighting of the environment. Therefore care has to be taken while the features are extracted and matched in images. Deduction of motion information from images is also challenging. The methods used widely in vision-aided navigation research, also in the approaches presented in this thesis, are explained below.
3.1
Camera, Fundamental and Essential Matrices and Coordinate Frames
The principles explained in this section are mainly derived from [38]. The objects in
the 3D world are mapped into 2D image features using projective transformations.
These projections do not preserve the properties of shape, length, angle, distance or ratio of distances, but they do preserve the property of straightness. As a result of a projective transformation, lines that are parallel in the scene seem to intersect in an image at a point, called the vanishing point. Therefore, to obtain a projective geometry space, the Euclidean geometry has to be augmented with a point and a line at infinity. Also, the two coordinates (x, y) representing a point in Euclidean space are replaced with a triplet (x, y, 1), called homogeneous coordinates, in projective space.
An object point having coordinates XN = (X, Y, Z) in the world (navigation) frame
is transformed into the camera frame XC using the rotation (R) of the camera frame
with respect to the world frame and translation of the camera origin (t) with respect
to the world frame origin as
XC = RXN + t.
(30)
The methods presented in this thesis assume a pinhole camera model, in which an object point in the camera frame expressed in homogeneous coordinates $\mathbf{X}_C = (X, Y, Z, 1)$ is mapped to the point $\mathbf{x} = (fX, fY, Z)$ in the image plane. $f$ is called
the focal length and a line perpendicular to the image plane going from the camera
centre, called principal axis, meets the plane at distance f in a point called principal
point. World, camera and image frames as well as principal point and focal length are
shown in Figure 3.1. An important note is that as opposed to the established means
of coordinate frame configuration in navigation research presented in the previous
chapter, in computer vision the y-axis is pointing up and z-axis along the camera’s
principal axis. This is an issue that has to be accommodated for while forming the
navigation solution. The mapping of the world point $\mathbf{X}$ to the homogeneous image point $\mathbf{x} = (x, y, 1)$, where $x = fX/Z$ and $y = fY/Z$, is characterized by a 3×4 camera matrix $\mathbf{P}$ as $\mathbf{x} = \mathbf{P}\mathbf{X}$. When the calibration matrix $\mathbf{K}$ is known, the world point $\mathbf{X}$ is mapped into the image point $\mathbf{x}$ using the camera matrix $\mathbf{P} = \mathbf{K}[\mathbf{R}|\mathbf{t}]$, where $[\mathbf{R}|\mathbf{t}]$ denotes a 3×4 matrix composed of the 3×3 matrix $\mathbf{R}$ and the 3×1 column vector $\mathbf{t}$.
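The mapping $\mathbf{x} = \mathbf{K}[\mathbf{R}|\mathbf{t}]\mathbf{X}$ can be illustrated with a few lines of Python; the calibration values, the function name and the example point below are hypothetical.

```python
import numpy as np

def project_point(X_world, K, R, t):
    """Project a 3-D world point to homogeneous pixel coordinates (pinhole model)."""
    X_c = R @ X_world + t          # world (navigation) frame -> camera frame, eq. (30)
    x_img = K @ X_c                # apply the intrinsic parameters
    return x_img / x_img[2]        # normalize so the third coordinate equals 1

# Hypothetical calibration: focal lengths 800 px, principal point at (320, 240)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)      # camera aligned with the world frame
print(project_point(np.array([0.5, 0.2, 4.0]), K, R, t))   # -> [420., 280., 1.]
```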
When the object points all lie on a plane, point correspondences $\mathbf{x}_i$, $\mathbf{x}'_i$ in two images are related by a homography expressed using a 3×3 matrix $\mathbf{H}$ as $\mathbf{H}\mathbf{x}_i = \mathbf{x}'_i$. The point vectors have three entries but are defined only up to a scale, and therefore four point correspondences (each having two coordinates) are needed to resolve the ambiguous values in $\mathbf{H}$. If three of these point correspondences are collinear, the homography is said to be degenerate and does not have a unique solution. Degeneracy problems are addressed in more detail in Chapter 5. Image points must be normalized for the solutions of the homography to be correct. The image points $\mathbf{x}$ are normalized using the camera calibration matrix $\mathbf{K}$, discussed in detail below, as $\hat{\mathbf{x}} = \mathbf{K}^{-1}\mathbf{x}$.
Fig. 3.1. Camera, image and world coordinate frames.

When the object points do not lie on a plane but are tracked from a real 3D scene, a Fundamental matrix ($\mathbf{F}$) has to be computed. The Fundamental matrix encompasses the intrinsic projective geometry between two views, meaning that only the rotation and translation of the two camera centres (also of one camera between two images) and the internal camera parameters, represented by the calibration matrix $\mathbf{K}$, affect the matrix. In other words, the Fundamental matrix ($\mathbf{F}$) represents the epipolar geometry between the two views, visualized in Figure 3.2. The figure shows the images $\mathbf{x}$, $\mathbf{x}'$ of an object point $\mathbf{X}$; the epipoles are the intersections of the baseline between the optical centres of the cameras and the corresponding image planes. If only the position of the first image point $\mathbf{x}$ is known, epipolar geometry restricts the location of the second image point $\mathbf{x}'$ to lie on the epipolar line, which is the line joining the image point and the epipole, and an epipolar plane is configured by the baseline and the epipolar lines.
The Fundamental matrix $\mathbf{F}$ is a 3×3 matrix and is defined for all corresponding points in two images ($\mathbf{x}$, $\mathbf{x}'$) as
$$\mathbf{x}'^T\mathbf{F}\mathbf{x} = 0. \quad (31)$$
Fig. 3.2. The epipolar geometry between two images including the epipolar plane, epipoles ($e$, $e'$) and epipolar lines ($l$, $l'$) [31].

At least seven corresponding points have to be matched from two images to compute the Fundamental matrix. For a general motion, the rotation $\mathbf{R}$, the translation $\mathbf{t}$ and the cameras' internal parameters $\mathbf{K}$, $\mathbf{K}'$ encompassed in the Fundamental matrix relate the image points in the first and second images ($\mathbf{x}$, $\mathbf{x}'$), respectively, as
$$\mathbf{x}' = \mathbf{K}'\mathbf{R}\mathbf{K}^{-1}\mathbf{x} + \mathbf{K}'\mathbf{t}/Z \quad (32)$$
where $Z$ is the z-coordinate, i.e. the depth, of the object point.
When the image points are normalized as was explained above, the Essential matrix may be used instead of the Fundamental matrix, as $\hat{\mathbf{x}}'^T\mathbf{E}\hat{\mathbf{x}} = 0$, where $\mathbf{E} = \mathbf{K}'^T\mathbf{F}\mathbf{K}$.
3.2
Feature Extraction
First, features have to be extracted for solving the motion of the camera between consecutive images. SIFT features, explained below, are good features to be matched when the environment contains many distinguishable objects. However, when the environment is poor in features, such as an office corridor, features arising from the constructions, like corners and lines, are more robust. Below, first
the procedure called filtering is explained, because of its use in noise reduction from
images as well as edge detection. Then the extraction of two types of features used
herein, SIFT features and lines, is explained.
3.2.1
Filtering
Filtering is used for finding patterns and reducing noise from images. Filtering replaces the value of an individual pixel (x, y) with a weighted sum of its neighbors.
Different weights correspond to different processes [31]. The pattern of weights is
denoted as the kernel of the filter. The process of employing a filter is usually called
convolution and is defined as
$$C_{ij} = \sum_{x,y} G_{i-x,j-y}\, I_{x,y} \quad (33)$$
where the ith and jth component of the convolution result is denoted with Cij , I is
the image and Gi−x,j−y is the kernel of the convolution.
A symmetric Gaussian kernel has the form of the probability density for a 2D Gaussian random variable and is a good kernel for noise reduction convolution. Using a
large standard deviation (σ) in convolution emphasizes the weight of the neighboring pixels and reduces the noise heavily, though causing some blurring. The Gaussian
kernel is presented as
$$G_\sigma(x, y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right). \quad (34)$$
3.2.2
SIFT Features
Scale Invariant Feature Transform (SIFT) [70] is an approach based on transforming an image into local feature vectors; SIFT descriptors, describing the intensities
around image points that are found as maxima or minima of a difference-of-Gaussian
function. Each vector is invariant to image translation, scaling, and rotation and partially invariant to illumination changes and affine or 3D projections.
The process for using SIFT features is divided into two parts: keypoint localization and computation of a SIFT descriptor. First, the keypoints used for computing the SIFT descriptors are localized as follows. The minima and maxima of a difference-of-Gaussian function are computed in SIFT by building an image pyramid and resampling the
data in each level. The 1D Gaussian kernel used is
$$g(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/(2\sigma^2)} \quad (35)$$
and $\sigma = \sqrt{2}$ [70]. The image is convolved twice using the mentioned sigma, resulting in an image after the first convolution with smoothing of $\sigma = \sqrt{2}$ and an image after two convolutions using the same sigma and having an effective smoothing of $\sigma = 2$. The difference of Gaussian is obtained by subtracting the two images from each other. Then, the image resulting from the second convolution is resampled using bilinear interpolation and a pixel spacing of 1.5, resulting in an image in which each pixel is a constant linear combination of the four adjacent pixels.
minima are found by comparing each pixel to its neighbours.
Secondly, the image resulting from the first convolution (I 1 ) in each level is processed for obtaining the image gradient magnitudes (Mij ) and orientations (Oij ) as
$$\begin{aligned}
M_{ij} &= \sqrt{(I^1_{ij} - I^1_{i+1,j})^2 + (I^1_{ij} - I^1_{i,j+1})^2} \\
O_{ij} &= \arctan2(I^1_{ij} - I^1_{i+1,j},\; I^1_{i,j+1} - I^1_{ij}).
\end{aligned} \quad (36)$$
The gradient magnitudes ($M_{ij}$) are thresholded at 0.1 times the maximum possible value to provide robustness to changes in illumination; the effect of illumination on the orientations ($O_{ij}$) is much lower. Orientation invariance is obtained by convolving the image using a Gaussian kernel with a large $\sigma$ value and by multiplying the weights with the corresponding gradient values. A histogram with 10-degree intervals is built from the convolution results, and the dominant orientation of the feature is the peak of the histogram. As a result, the features in the image are represented with descriptors having a stable location, scale and orientation that are also invariant to changes in illumination in consecutive images.
Feature detection is an active research area in computer vision. Although SIFT is
a comprehensive method, faster algorithms have also been developed [12] [88] [87]
and their suitability especially for smartphone applications should be assessed.
3.2.3
Line Extraction
Indoor and urban environments are constructed in a way that their structures constitute a three-dimensional grid defining an orthogonal coordinate frame, also called the Manhattan grid [25], containing straight parallel lines; therefore, methods based on line features are suitable for these environments that are otherwise poor in features [55].
Lines are also good features for the basis of visual positioning because they are invariant to changes in the lighting, which is crucial especially for indoor positioning,
but also because straight lines remain straight in projective transformations and are
not disturbed by dynamic objects that are not blocking the view to all lines in the
scene. Line extraction begins by identifying edges of all features in an image and
then separating the straight lines from other features. These are explained below.
Edge Detection
Fast changes of brightness in the image indicate edges of objects. The brightness of
the pixel in an image depends on the characteristics of light sources as well as the
traits and orientation of the surface. The orientation of the surface is specified with
surface gradients. The Canny edge detector [17] calculates the magnitudes and directions of these gradients. It is an optimal algorithm for edge detection, requiring a low error rate of the calculations, well-localized points, meaning that the distance between the calculated location of the edge and the real one has to be minimal, and a single response per edge.
In a two dimensional image the edge has a position and an orientation. The direction
of the tangent to the edge contour is called edge direction. The edge is found by
convolving the image using a first derivative Gn in direction n of a two-dimensional
Gaussian G as a kernel and defined as
$$G_n = \frac{\partial G}{\partial n} = n \cdot \nabla G \quad (37)$$
where
$$G = \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right). \quad (38)$$
Now an edge point is a local maximum when the image I is convolved using Gn as a
kernel. The local maximum is found from an image pixel fulfilling
$$\frac{\partial^2}{\partial n^2}\, G \cdot I = 0 \quad (39)$$
and the edge strength is calculated as
$$|G_n \cdot I| = |\nabla(G \cdot I)|. \quad (40)$$
A pixel whose magnitude is a local maximum along the gradient direction belongs to the edge, and the process of finding the maxima using (39) is called non-maximum suppression. The set of possible edge pixels found by looking for local maxima contains too many members. The pixels having a weak response to an edge have to be excluded using a procedure called hysteresis. Hysteresis evaluates each pixel in the possible edge set using two thresholds. All pixels having an edge strength (40) above the upper threshold are classified as part of an edge and all below the lower threshold as not belonging to the edge. Pixels between the two thresholds are evaluated based on their neighbours: if a neighbour belongs to an edge, the pixel is also classified as an edge pixel, otherwise it is not. After all pixels in the possible pixel set are deemed to belong to an edge or not, the optimal edge set is defined.
Canny edge detection is one of the most used edge detectors, but many others also
exist, for example Sobel, Laplace, and Prewitt operators [98].
Separating the Lines from Other Edges
Canny edge detection finds all changes of brightness in an image demonstrating an
edge of an object. For most computer vision applications there is still a need to find
certain shapes among all edges. Hough [46] developed a method for identifying lines
among all image pixels. His method maps all image points into a two-dimensional
parameter space, the parameters being the slope and the intercept of the line. Each point is then examined and given a vote for every line possibly passing through
it. Since both the slope and intercept are unbounded, a modified form of Hough
transform was developed and has been widely exploited in computer vision research
[28]. When extended it is suitable for finding also other curves than lines. The
method is based on the parameter space ($\rho$, $\theta$), where $\rho$ is the length of the normal drawn from the origin to the line being detected and $\theta$ is the angle of this normal with the x-axis, as shown in Fig. 3.3. A straight line including the pixel (x, y) is then defined
as a sinusoid
ρ = x cos(θ) + y sin(θ).
(41)
When the possible values of θ are restricted to the interval [0, π] every line in the
image plane corresponds to a unique point in the parameter space defined plane.
Now the curves going through a common point in the parameter plane correspond
to the image points on a specific straight line. Therefore the lines are identified by looking for the points in the ($\rho$, $\theta$) parameter space having local maxima of votes.

Fig. 3.3. Formulation of parameters $\rho$ and $\theta$ in the Hough transform.
The weakness of the otherwise sophisticated algorithm is that it is computationally
heavy. As real-time processing of the algorithms is crucial in pedestrian navigation, a more efficient line detection algorithm based on the probabilistic Hough Transform was developed in this thesis and will be presented in Chapter 6.
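As an illustration of the edge-detection and line-extraction pipeline described above, the following Python sketch uses OpenCV's Gaussian smoothing, Canny detector and standard Hough transform as a stand-in for the thesis implementation (which uses the more efficient probabilistic variant, see Chapter 6); all threshold values are illustrative.

```python
import cv2
import numpy as np

def extract_lines(gray_image):
    """Return (rho, theta) line parameters found in a grayscale image."""
    smoothed = cv2.GaussianBlur(gray_image, (5, 5), 1.4)   # noise reduction
    edges = cv2.Canny(smoothed, 50, 150)                   # hysteresis thresholds
    # Each detected line is parameterized as in equation (41)
    lines = cv2.HoughLines(edges, 1, np.pi / 180.0, 100)
    return [] if lines is None else [tuple(l[0]) for l in lines]

# Usage: lines = extract_lines(cv2.imread("corridor.png", cv2.IMREAD_GRAYSCALE))
```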
3.3
Image Matching
Matching is a process of identifying the corresponding features in two images of
the same scene taken at different viewpoints, different times, or by different sensors
(cameras). As the SIFT descriptors are invariant to rotation and translation, their
matching is restricted to finding the most similar descriptors in the two images,
i.e. the descriptors with the minimum Euclidean distance [71]. SIFT functions well for matching in environments full of features, such as outdoors, but suffers from errors if the images consist mostly of vegetation or dynamic objects [102]. When more robust matching is needed from noisy image data, RANdom SAmple
Consensus (RANSAC) [29] is used. The RANSAC algorithm enlarges the minimum set of points needed for the solution with the points lying within some error tolerance. This is done by searching for a random sample of points that leads to a fit of the model in question with which many of the points agree. This leaves the outliers out of the data used in the calculations. The algorithm is used widely in computer vision applications such as vanishing point detection. A comprehensive explanation of the algorithm with examples is given in [31]. RANSAC is, however, computationally quite heavy and is therefore not used in the methods discussed in this thesis, which emphasize computational efficiency.
3.4
Camera Calibration
As was explained earlier, the operations performed for mapping the objects into images are described using projective geometry. However, navigation solutions need the information to be presented in a Euclidean reconstruction, i.e. with correct distances and angles. This may be done by using a calibrated camera for capturing the images. Calibration provides information about the camera's intrinsic parameters and is represented using a calibration matrix $\mathbf{K}$. The camera intrinsic parameters are the focal length ($f_x$, $f_y$), the principal point ($u$, $v$), the skew coefficient ($S$), the aspect ratio and the distortions. The focal length is defined as the distance between the centre of the camera's lens and the sensor when taking a focused image of an object that is infinitely far away. The principal point is the intersection point of the camera's optical axis with the image plane, as was shown in Figure 3.1. Distortions blur the image due to the fact that the focal length varies at different points of the lens. The skew comes from manufacturing errors and makes the two image axes non-orthogonal; the skew coefficient defines the angle between these axes.
The general form of the calibration matrix is
$$\mathbf{K}_g = \begin{bmatrix} f_x & S & u \\ 0 & f_y & v \\ 0 & 0 & 1 \end{bmatrix} \quad (42)$$
and the aspect ratio may be computed as $f_y/f_x$.
The skew in a normal camera is usually zero, except when taking an image of an
image, for example, when enlarging a negative [38]. A reduced form, with zero
skew, of a camera matrix $\mathbf{K}$ is normally used for computer vision applications and is
$$\mathbf{K} = \begin{bmatrix} f_x & 0 & u \\ 0 & f_y & v \\ 0 & 0 & 1 \end{bmatrix}. \quad (43)$$
It is adequate for a pedestrian navigation system to calibrate the camera once and to assume that the parameters remain unchanged thereafter. The calibration may be done by photographing a certain model image from different viewpoints and then calculating the parameters using the images and the known geometry of the model. In the methods described in the thesis, calibration is implemented using a Matlab application [15]. A camera may also be calibrated with a single image using vanishing point information and the assumption of zero skew. If the positions of three vanishing points can be recovered, the focal length and the centre of projection (the principal point) may be determined [60]. When the accuracy of the navigation solution may be compromised for the sake of adaptability, the focal length may be taken from the image's Exchangeable Image File (EXIF) data, although this is an average over cameras of the type in question and therefore not as accurate as the focal length obtained by calibrating the particular camera used, and the principal point may be assumed to be the central point of the image.
3.4.1
Distortion
The best accuracy for vision-aided calculations is obtained when a camera with a
wide angle lens offering an extended field-of-view is used, as will be shown in
Chapter 4. However, the wide angle lens results in radial distortion in the images.
If the distortion is not corrected, the calculation accuracy suffers. According to [38], the rectification of the whole image introduces aliasing effects complicating the feature detection. For an optimal result, when a wide-angle-lens camera is used the radial distortion is corrected only for the features extracted from the images, with a model presented in [72] and explained below.
The radial distance (rd ) of the normalized distorted image points (xd , yd ) [72] from
the radial distortion center, which is in most cases the principal point ($u$, $v$), is
$$r_d = \sqrt{x_d^2 + y_d^2}. \quad (44)$$
Using the radial distance of the distorted image points, the radial distance (r) of the
corrected image points (xu , yu ) is obtained as
$$r = r_d(1 - k_1 r_d^2 - k_2 r_d^4). \quad (45)$$
The constants ki are the distortion values specific to the camera and are obtained from
calibration. The corrected and distorted image points are related as
$$\begin{aligned}
x_d &= x_u(1 + k_1 r + k_2 r^2) \\
y_d &= y_u(1 + k_1 r + k_2 r^2).
\end{aligned} \quad (46)$$
The effect of the distortion correction is shown in the case of the feasibility of the
visual gyroscope in Chapter 4.
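A direct Python sketch of the feature-point correction of equations (44)-(46) is given below; the function name is hypothetical and the distortion coefficients are assumed to come from the camera calibration.

```python
import math

def undistort_point(xd, yd, k1, k2):
    """Correct the radial distortion of one normalized feature point."""
    rd = math.hypot(xd, yd)                       # equation (44)
    r = rd * (1.0 - k1 * rd**2 - k2 * rd**4)      # equation (45)
    scale = 1.0 + k1 * r + k2 * r**2              # equation (46): x_d = x_u * scale
    return xd / scale, yd / scale
```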
4. VISUAL GYROSCOPE
Urban scenes in indoor and downtown environments consist mainly of straight lines
in three orthogonal directions [25]. The projective transformations mapping the three-dimensional scene into a two-dimensional image preserve the straight lines, but not the angles, and therefore lines that are parallel in the scene seem to intersect in the image.
The lines in three orthogonal directions form three intersection points called vanishing points. The vanishing points arising from lines in x- and y-axis directions are
called horizontal and vertical vanishing points, respectively, and from the lines in the
direction of propagation (z-axis) the central vanishing point.
A vanishing point is the intersection point v of a ray through the camera centre having a direction d, and of all other lines also having direction d, and the image plane.
The vanishing point v is related to the direction d as v = Kd [38] where K is the
camera calibration matrix encompassing the intrinsic parameters of the camera. The
directions d and d0 of two vanishing points in consecutive images are related by the
Rotation matrix R as d0 = Rd, i.e. rotation of the camera between the two images.
The change of position of the camera between the images, meaning pure translation
with no rotation, has no effect on the vanishing point location. The rotation R of
the camera may also be thought of as the rotation from the initial position where the camera is aligned with the navigation frame so that the z-axis of the camera points in the direction of propagation and the x- and y-axes are orthogonal to the z-axis, as shown in Figure 3.1, i.e. the orientation of the camera with respect to the
navigation frame. In this initial configuration, which requires a careful alignment
of the camera in a way that its optical axis coincides with the construction of the
environment, the central vanishing point vz lies at the principal point and the other
two vanishing points at infinity on the x and y image axes. Then the orientation of
the camera R is described with V = KR, where V is the vanishing point location
matrix incorporating the horizontal, vertical and central vanishing points, [vx vy vz ]
and vi = (xvi , yvi , 1)T , K is the calibration matrix containing the camera intrinsic
parameters (defined in Chapter 3), and R is the rotation matrix of the camera [32].
As the perfect alignment of the camera optical axis and the environment is not possible in reality, the navigation solution is initialized by measuring the orientation of
the user using other means, as will be discussed in Chapter 6, and then propagating
the orientation using the visual gyroscope measurements in two consecutive images.
Also, the initialization is crucial in order to present the orientation of the camera with respect to the scene, obtained using visual perception and explained in this chapter, as heading, pitch and roll in the navigation frame.
From the change of the vanishing point locations in consecutive images the change
in the camera attitude may be monitored and the camera used as a visual gyroscope.
It should be noted that the methods presented in the thesis assume that the camera
is not experiencing heading change, pitch or roll that exceeds 90 degrees between
consecutive images. When only the central vanishing point is obtained, the visual
gyroscope provides the pitch and heading of the user; if either the horizontal or the vertical vanishing point is also perceived, the full three-dimensional attitude is provided. First, the method for obtaining the central and vertical vanishing points is explained. Secondly, the rotation matrices for the user heading and camera pitch are given, and then the configurations for the full attitude. Effective error detection is crucial so that erroneous visual measurements do not deteriorate the integrated navigation solution, and therefore a concept of Line Dilution of Precision (LDOP) has been developed and will be presented. The influence of different camera characteristics on the performance of the visual gyroscope has been studied and will be discussed. Finally, a smartphone implementation of the visual gyroscope will be presented.
4.1
Locating the Vanishing Points
In order to find the central vanishing point and furthermore the heading change and
the pitch of the camera, the straight lines in the direction of the propagation must
be identified in the image. Because the images are noisy, especially the ones taken
with a smartphone camera and in an indoor environment, the images must be preprocessed. The images are smoothed using a Gaussian filter by replacing the image
pixels with a weighted sum of their neighbour pixel values and therefore reducing
the noise [31]. All edges in images are identified with a Canny edge detector and the
straight lines separated from the set of all edges with the Hough Lines algorithm as
explained in Chapter 3. All lines found are classified based on their orientation with
respect to the camera frame, as going in the direction of the z-axis, totally horizontal
or totally vertical and horizontal or vertical. Totally horizontal (and vertical) lines
have angle of zero degrees with respect to the x-axis (y-axis), i.e. the slope of the
line is zero (infinite), whereas the ones classified as horizontal or vertical have a
slope lower than a threshold with respect to the corresponding axis. The central
vanishing point is found by using a voting scheme, namely each intersection of all
line pairs, i.e. vanishing point candidate, is voted for by all the lines found and
the point that gets most of the votes, in this case the intersection point of most of
the lines, is selected as the correct one. However, this configuration assumes that
the main proportion of the lines in the direction of propagation are parallel with the
construction, i.e. a specific non-parallel pattern of floor or wall decoration is not
dominating. The classification of the lines and the central vanishing point found with the method explained are shown in Figure 4.1. A misclassification may be seen on the right edge of the figure, where blue, turquoise and green lines all represent lines going in the direction of propagation, but because of their slopes three are classified as horizontal and one as a totally horizontal line.
Only the 2 degrees-of-freedom attitude may be obtained from one vanishing point
location and in order to resolve also the roll at least one other vanishing point in
addition to the central one has to be located. Experiments show that the horizontal
lines are infrequent in urban and indoor environments and therefore in this thesis the
vertical vanishing points are tracked.
When the camera is experiencing only a small roll (as in Figure 4.1), the lines in
vertical directions are mainly totally vertical due to the relatively low resolution of
the images, which reflects the fact that the pixels obtained with a camera experiencing
small roll are overwhelmed by the noise in images. When the camera has a larger
roll, most of the vertical lines have slopes deviating from infinity, as is the case in
Figure 4.2. The ratio of the number of vertical lines that have finite slopes and that
of the lines that have infinite slopes is calculated. If the ratio exceeds a threshold,
the camera is experiencing roll, and the location of the vertical vanishing point is
calculated similarly as the central one and incorporated into the rotation calculations.
The locations of the central vanishing points in Figures 4.1 and 4.2 are shown using
a red circle. The vertical vanishing points lie outside the image and therefore are not
shown.
Fig. 4.1. Lines in an image with no (or minor) roll are classified based on their slope as totally vertical or horizontal (green), vertical (white dotted), horizontal (turquoise) and along the direction of propagation (blue). The red dot is the central vanishing point.
4.2
Attitude of the Camera
As discussed above, the heading, tilt and roll of the camera are observed with respect
to the scene, namely to the initial orientation where the camera x-, y- and z-axis are
aligned with the axis of the scene. In order to obtain a navigation solution these
have to be related to the navigation frame as will be shown later. When the camera is
rotated from the initial position counter clockwise with heading θ degrees the rotation
matrix has the form
$$R = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix} \quad (47)$$
and for a pitch of $\phi$ degrees towards the floor plane
$$R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{bmatrix}. \quad (48)$$

Fig. 4.2. When the camera experiences roll the number of totally vertical lines decreases. Lines are classified based on their slope as totally vertical or horizontal (green), vertical (white dotted), horizontal (turquoise) and along the direction of propagation (blue). The red dot is the central vanishing point.
When the camera experiences these two rotations the matrix $R$ becomes
$$R = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ \sin\phi\sin\theta & \cos\phi & -\sin\phi\cos\theta \\ -\cos\phi\sin\theta & \sin\phi & \cos\phi\cos\theta \end{bmatrix}. \quad (49)$$
When the calibration and rotation matrices are as explained above, the heading (θ)
and pitch (φ) angles may be obtained from the location of the central vanishing point
as
$$\mathbf{V} = \mathbf{K}\mathbf{R} = \begin{bmatrix} f_x & 0 & u \\ 0 & f_y & v \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta & 0 & \sin\theta \\ \sin\phi\sin\theta & \cos\phi & -\sin\phi\cos\theta \\ -\cos\phi\sin\theta & \sin\phi & \cos\phi\cos\theta \end{bmatrix} \quad (50)$$
resulting in
$$\mathbf{V} = \begin{bmatrix} f_x\cos\theta - u\cos\phi\sin\theta & u\sin\phi & f_x\sin\theta + u\cos\phi\cos\theta \\ f_y\sin\phi\sin\theta - v\cos\phi\sin\theta & f_y\cos\phi + v\sin\phi & -f_y\sin\phi\cos\theta + v\cos\phi\cos\theta \\ -\cos\phi\sin\theta & \sin\phi & \cos\phi\cos\theta \end{bmatrix} \quad (51)$$
and thus
$$\mathbf{v}_z = \begin{bmatrix} f_x\sin\theta + u\cos\phi\cos\theta \\ -f_y\sin\phi\cos\theta + v\cos\phi\cos\theta \\ \cos\phi\cos\theta \end{bmatrix}. \quad (52)$$
The vanishing point $\mathbf{v}_z$ is presented in homogeneous coordinates as $(x, y, 1)$, where $x, y$ are the pixel coordinates of the central vanishing point obtained using the voting scheme explained above. As the third row of (52) is normalized to equal 1, the heading ($\theta$) and pitch ($\phi$) may be computed as
$$\begin{aligned}
\theta &= \arcsin\!\left(\frac{x - u}{f_x}\right) \\
\phi &= \arcsin\!\left(\frac{y - v}{-f_y\cos(\theta)}\right).
\end{aligned} \quad (53)$$
An important note is that the heading and pitch obtained by tracking the vanishing points are reversed with respect to the definition of the navigation frame, and this has to be accommodated for in the integration. To be exact, when the camera rotates clockwise, i.e. its heading increases in the navigation frame, the vanishing point location moves counterclockwise and its x-coordinate decreases, and therefore the obtained visual heading $\theta$ has an opposite sign compared to the heading in the navigation frame. Also, in the navigation frame the pitch increases upwards while $\phi$ decreases.
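Equation (53) can be evaluated with a few lines of Python; the intrinsic parameter values and the function name below are illustrative, and the sign reversal with respect to the navigation frame discussed above is left to the caller.

```python
import math

def heading_pitch_from_vp(x_vp, y_vp, fx, fy, u, v):
    """Heading and pitch (radians) from the central vanishing point, equation (53)."""
    theta = math.asin((x_vp - u) / fx)                      # heading w.r.t. the scene
    phi = math.asin((y_vp - v) / (-fy * math.cos(theta)))   # pitch w.r.t. the scene
    return theta, phi

# Hypothetical camera: vanishing point 40 px to the right of the principal point
print(heading_pitch_from_vp(360.0, 240.0, 800.0, 800.0, 320.0, 240.0))
```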
Tracking the central vanishing point provides information about the user heading
change and camera pitch. When also roll β is required, at least two vanishing points
are needed, in this thesis the other being the vertical vanishing point as explained
above. Now the rotation matrix $R$ of the camera experiencing only roll has the form
$$R = \begin{bmatrix} \cos\beta & -\sin\beta & 0 \\ \sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{bmatrix}. \quad (54)$$
And the full rotation of the camera experiencing simultaneous changes in heading, pitch and roll is
$$R = \begin{bmatrix} \cos\beta\cos\theta - \sin\beta\sin\phi\sin\theta & -\sin\beta\cos\phi & \cos\beta\sin\theta + \sin\beta\sin\phi\cos\theta \\ \sin\beta\cos\theta + \cos\beta\sin\phi\sin\theta & \cos\beta\cos\phi & \sin\beta\sin\theta - \cos\beta\sin\phi\cos\theta \\ -\cos\phi\sin\theta & \sin\phi & \cos\phi\cos\theta \end{bmatrix} \quad (55)$$
and all three angles may be resolved using the two vanishing point locations.
4.3
Error Detection
In the case when the location of the vanishing point is known a priori to some extent,
the intersection points deviating remarkably from the estimate may be discarded and
an accurate orientation measurement obtained [82]. The method, used for UAVs (unmanned aerial vehicles), determines the possible vanishing point locations based on
the known potential attitude of the camera. The error detection based on the estimated vanishing point location is suitable for robot and vehicle navigation, where the
motion is to some extent foreseeable and stable. However, no limitations may be imposed on the possible vanishing point locations in pedestrian navigation, especially if the vision-aiding is done using a smartphone camera, because the motion of the pedestrian is much more unpredictable. Therefore a method evaluating the vanishing
point accuracy based on the geometry of lines used to compute it is developed in this
thesis.
The concept of Dilution of Precision (DOP), originally specifying the geometry of
satellites used for obtaining a position solution with GNSS [54], was introduced into
vision-based navigation by [69]. Their DOP value presented the orientation and position of pseudo ground control points (PGCP) needed in navigation using a camera
and 3D maps constructed of the environment. Now the concept of an LDOP, a dilution of precision value demonstrating the geometry of the lines used for calculating
the position of the vanishing point is developed. The method is based on dividing the
scene in an image into four quarters around the estimated vanishing point.
If lines intersecting at the vanishing point are found from all four sections, the estimated vanishing point is correct with high probability, and it is given the minimum LDOP value, namely $\sqrt{2}$. The situation is visualized in Figure 4.3a. The justification
for the minimum LDOP value selection is given below, where the situation of reduced line geometry is explained. If the lines intersecting at the estimated vanishing point are from three of the sections, the line geometry is still determined sufficiently accurately and a low LDOP value is assigned to the estimated vanishing point. When
the geometry of lines is reduced, namely the lines are found only from two sections,
shown in Figures 4.3b and 4.3c, or especially only from one, as in Figure 4.3d, more
evaluation of the geometry must be done. In the case where lines are found in only two sections, the accuracy of the estimated vanishing point depends strongly on the orientation of the lines found. If the lines are from opposite quarters, as in Figure 4.3c, namely the angle between the lines is close to or larger than 180 degrees, the intersection is in an incorrect location with a higher probability, and a higher LDOP value is assigned than for the case where the lines are from adjacent quarters, namely the angle between them is less than 180 degrees, as in Figure 4.3b. This reasoning is
derived from the fact that often the lines from opposite sections are actually parts of
the same line split by the line detection algorithm due to changes of brightness in the
image and therefore there is in reality no intersection point.
When the line geometry is reduced into a set of lines found only from one section, as
shown in Figure 4.3d, the LDOP evaluation is based on the mutual alignment of the
lines using a method proposed in [5]. The angle between all lines in the set and the
x-axis of the image is calculated and the pair with the largest angle between them is
selected. The angle between the first line of the pair and the image x-axis (α1 ) and
between the second line of the pair and the image x-axis (α2 ) is obtained using the
estimated central vanishing point location (xvz , yvz ), the starting point (xi , yi ) of line
i ( i=1,2) and distance (Di ) of the estimated vanishing point from the starting point
of line i as
    cos(α1) = (x_vz − x1) / D1,    sin(α1) = (y_vz − y1) / D1,
    cos(α2) = (x_vz − x2) / D2,    sin(α2) = (y_vz − y2) / D2.    (56)

Fig. 4.3. Four images resulting from the vanishing point calculation. The image is divided into four sections around the estimated vanishing point (section borders shown with black dotted lines) and its reliability is evaluated based on the geometry of the blue lines used for the calculations. The vanishing point is found correctly in images a, b and d (outside the image). The black continuous lines are used to calculate the vertical vanishing point.
The matrix H, characterizing the line geometry and containing the unit vectors of the lines, is

    H = [ cos(α1)   sin(α1)
          cos(α2)   sin(α2) ].    (57)
The matrix G is formed from the geometry matrix H as G = (H^T H) and G^{-1} is

    G^{-1} = (1/|G|) [ sin^2(α1) + sin^2(α2)                        −(cos(α1) sin(α1) + cos(α2) sin(α2))
                       −(cos(α1) sin(α1) + cos(α2) sin(α2))          cos^2(α1) + cos^2(α2) ]    (58)
where |G| is the determinant of G, |G| = sin^2(α1 − α2). The GDOP in GNSS positioning applications is calculated using the diagonal values of the G^{-1} matrix as explained in Chapter 2; the result may be transformed to the case of two satellites [5] and further used to calculate the LDOP for the lines as

    LDOP = sqrt( (1/|G|) (cos^2(α1) + cos^2(α2) + sin^2(α1) + sin^2(α2)) ).    (59)
For any two angles (α1, α2), (59) may now be written as LDOP = sqrt(2/|G|). The smallest possible LDOP value is √2 and arises from the maximum possible angle between two lines lying in the same quarter section, namely 90 degrees. When the magnitude of the angle between the lines is more than 10 degrees, the accuracy of the estimated vanishing point location is still sufficient, but it decreases rapidly as the angle decreases.
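As an illustration, a minimal Python sketch of (57)-(59) for the reduced-geometry case, assuming the two line angles α1 and α2 with respect to the image x-axis have already been computed from (56); the function name is illustrative.

    import numpy as np

    def ldop(alpha1, alpha2):
        # Line Dilution of Precision for two lines, per (57)-(59):
        # LDOP = sqrt(2 / |G|) with |G| = sin^2(alpha1 - alpha2).
        det_g = np.sin(alpha1 - alpha2) ** 2
        if det_g < 1e-12:            # (nearly) parallel lines: geometry unusable
            return np.inf
        return np.sqrt(2.0 / det_g)

    # Example: ldop(np.radians(45), np.radians(135)) returns sqrt(2), the minimum value.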
Evaluation of the accuracy of the estimated vertical vanishing point cannot be done using the line geometry, but is based on monitoring the obtained roll. A camera may be rolled over 15 degrees on purpose to obtain images with special viewpoints [32], but this is seldom done unintentionally, and such a large roll is not convenient for vision-aided navigation. Because the calculation of an accurate vertical vanishing point is not always possible (due to noise in the images and a shortage of lines), the roll's magnitude must be monitored. If the roll's magnitude exceeds 15 degrees, the vertical vanishing point is discarded as erroneous. In these situations the roll is set to zero and the heading and pitch are calculated more accurately using only the central vanishing point. If the camera is actually experiencing roll when the calculations fail and the roll is set to zero, errors appear also in the heading and pitch. The effect of this error on the user attitude observations is discussed in detail below.
When the value of the LDOP is large, the uncertainty of the vanishing point location is large. Therefore the visual gyroscope's measurements possessing a large LDOP value are discarded in the calculation of the navigation solution. As a result, possible visual attitude errors arising from poor line geometry are avoided.
4.4 Performance of the Visual Gyroscope
The visual gyroscope provides a heading change measurement but no absolute value of the heading, and must therefore be integrated with measurements from other sources. Accurate initialization of the visual gyroscope's heading using absolute heading information is crucial. The visual gyroscope is based on vanishing point observations calculated using lines found in the image of the environment, and as a result it cannot be used during sharp turns, when visibility to the building boundaries forming the lines is lost because the camera is too close to the wall. If the view to the lines is maintained during a turn, the change of the world frame may be perceived only when the image rate is high. Therefore the visual gyroscope's heading needs to be augmented occasionally during navigation with heading measurements obtained from another system, e.g. a rate gyroscope, a magnetometer or a floor plan.
Identification of the correct vanishing point location depends on the number and geometry of the lines found in the image. Low lighting of the navigation environment reduces the number of lines found, possibly resulting in an erroneous vanishing point location. In addition to low lighting, an erroneous vanishing point location may arise from the selection of the Hough Line algorithm parameters. The parameters used in this thesis are adjusted so that lines shorter than a threshold are left out of the computation, to reduce the number of nonparallel lines disturbing the vanishing point calculation. The optimal threshold for indoor environments was found through experimentation to be 25 pixels. When
the scene consists of a plane, there are no lines in the image and the vanishing point cannot be calculated. Sometimes, despite the parameter selection, the set of lines found in the scene contains a number of nonparallel lines, resulting in an erroneous vanishing point location.

Table 4.1. Statistics for heading change and pitch accuracy, all units degrees
    Statistics       Heading change    Pitch
    min error        0                 0
    max error        18.4              10.7
    mean error       0.8               0.3
    std of error     0.6               0.3
The accuracy of the heading change and pitch estimates was evaluated with a test containing 7555 images taken with a static camera in an office environment. The test environment had changing light as well as dynamic objects in the scene of the camera, sometimes covering the view totally. Table 4.1 shows the statistics for the heading change and pitch obtained over the 2.5 hour time span of the test, the mean errors being 0.8 and 0.3 degrees and the standard deviations 0.6 and 0.3 degrees, respectively. The errors in the heading and pitch measurements all came from dynamic objects, namely humans, blocking the view of the camera. The mean errors and deviations are good for an indoor navigation system; however, the office corridor is a favourable environment for the method. Additional test results with a moving
camera will be presented later in Chapter 6.
An Allan variance analysis was performed for the visual gyroscope. The most significant error source affecting gyroscope accuracy, especially for a MEMS gyro, is drift. The Allan variance method [6] was originally developed for the study of oscillator stability, but has since been applied widely to gyro drift analysis. Because the Allan variance method is suitable for the noise study of any instrument, it is applied here to evaluate the noise level of the visual gyroscope. The Allan variance
σ_C^2(t_A) [57] for the averaging time t_A is

    σ_C^2(t_A) = (1 / (2(N − 1))) Σ_{k=1}^{N−1} ( ỹ(t_A)_{k+1} − ỹ(t_A)_k )^2    (60)
where ỹ(tA )k is the average value of a bin k containing the heading change and pitch
values. The averaging time tA is the length of a bin and N is the number of bins
formed of the data for the corresponding averaging time. The plotted Allan variance
may be used for finding different error types for the sensors [114].
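A small Python sketch of (60), assuming a fixed sample interval dt for the heading change (or pitch) sequence; bin averages ỹ(t_A)_k are formed for each averaging time and the variance of their successive differences is computed. Names are illustrative.

    import numpy as np

    def allan_variance(samples, dt, averaging_times):
        # Allan variance per (60) for a 1-D sample sequence with sample interval dt (s).
        samples = np.asarray(samples, dtype=float)
        result = {}
        for t_a in averaging_times:
            m = max(1, int(round(t_a / dt)))        # samples per bin
            n = len(samples) // m                   # number of bins N
            if n < 2:
                continue
            bins = samples[:n * m].reshape(n, m).mean(axis=1)   # bin averages y~(tA)_k
            diffs = np.diff(bins)
            result[t_a] = np.sum(diffs ** 2) / (2.0 * (n - 1))
        return result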
Fig. 4.4. Allan deviation plot showing the noise in the visual gyroscope.
The Allan deviation plot for the 7555 images is shown in Figure 4.4. The figure shows large deviations at short integration times due to the uncorrelated noise affecting the visual gyroscope stability. After the deviation has reached a minimum value, the rate random walk starts to increase the deviation again. The bias instability
value, the rate random walk starts to increase the deviation again. The bias instability
measure may be found from the minimum value, and is 0.058 degrees/second for
the heading, and 0.045 degrees/second for the pitch, at the integration time of 245
seconds. The bias is a result of quantization of an analog signal into discrete image
pixels as well as of possible biases in edge and line detection algorithms [53]. The
obtained measure is lower than the values obtained for a typical MEMS gyroscope
tested in [114].
The test also showed the tolerance of the method to dynamic objects, as may be seen from Figure 4.5. Heading angle and pitch errors due to dynamic objects obscuring the scene were very infrequent. The errors in the visual gyroscope are mostly introduced by environmental factors such as lighting, the construction of the environment and objects with lines not parallel to the direction of propagation.

Fig. 4.5. Calculation of the central vanishing point is largely tolerant to dynamic objects in the scene.
As discussed before, the roll observations may be significantly erroneous due to the restricted number of vertical lines in the scene, and therefore angle measurements over 15 degrees are not trusted but the roll is set to zero. Evidently the roll is not exactly zero in most cases, and setting it to zero introduces errors also to the heading and pitch measurements. Fortunately, in most cases these errors are small, as is shown below using a numerical analysis of the resulting errors. The analysis was done by computing the correct central vanishing point location using the real orientation values presented below, then setting the roll to zero and computing the resulting heading and pitch using the obtained vanishing point and Equation 52. In the most common use cases, when the heading change and roll between two images are less than or equal to five degrees, the errors in the estimated heading and pitch are 0.6 and 0.1 degrees, respectively.
degrees, respectively. In an extreme case when the camera is simultaneously experi-
4.4. Performance of the Visual Gyroscope
55
Table 4.2. Effect of roll error on other angle observations
Real camera rotation (degrees)
Heading
1
5
15
-15
Pitch
1
5
-15
-15
Roll
-15
-5
15
15
Errors in observation
when roll estimated
to be zero (degrees)
Heading
Pitch
0.3
0.01
0.6
0.1
6.1
2.1
5.2
2.8
encing roll and heading changes of 15 degrees between two consecutive images, the
errors in heading and pitch are 6.1 and 2.8 degrees, respectively. When the camera
is otherwise static (i.e. the heading change and pitch are around one degree between
consecutive images) even a large roll causes small errors to the observed heading
and pitch, namely 0.3 and 0.01 degrees, respectively. Table 4.2 summarizes some
errors arising from camera motions between two consecutive images when the roll is
erroneously estimated to be zero due to vertical vanishing point calculation failures.
4.4.1 Theoretical Analysis of Attainable Accuracy
The statistical behavior of errors affecting the visual gyroscope was analyzed in order
to obtain an understanding of the attainable accuracy of the method. This was done
by observing the errors in the line detection and the vanishing point computation
directly affecting the accuracy of the visual gyroscope. The following discussion
is derived from [53] which includes more detailed explanations and proofs for the
theorems used. For the analysis, the unit surface normal of the plane passing through a line (l) and the origin O, and the unit vector indicating the orientation of the ray starting from the origin and passing through a point (X), are presented using the normalized unit vectors n and m, respectively, and defined as

    n = [X, Y, Z/f]^T,    m = [x, y, f]^T,    (61)

where (x, y) is a point on the image plane, f is the focal length and the line (l) is presented as Xx + Yy + Z = 0.
Noise Model
The noise in vision-aiding methods is not introduced solely by the camera but also
by the image operations e.g. edge and line detection. In the presence of noise the
normalized unit vectors presented above are observed perturbed as m′ = m + ∆m, where ∆m is a random variable representing the noise. If the noise ∆m is small compared to the vector m, the error may be characterised by the covariance matrix C as

    C(m) = E(∆m ∆m^T)    (62)

where E() denotes expectation and T the transpose. The covariance matrix C is symmetric and positive semi-definite with three eigenvalues σ1^2, σ2^2, 0 and a corresponding orthonormal system of eigenvectors u, v, m, so C may be expressed via the spectral decomposition C = σ1^2 uu^T + σ2^2 vv^T + 0·mm^T.
Assuming the noise is independent and identically distributed with standard deviations s_x, s_y in the image x- and y-axis directions, respectively, the root-square magnitude of the noise is ε = sqrt(s_x^2 + s_y^2). Now ε is called the image accuracy and is measured in pixels, and ε̃ is defined as ε̃ = ε/f.
Error in Line Detection
Errors in image resolution affect the line detection, which again affects the vanishing
point detection and furthermore the visual gyroscope’s attitude measurements. A line
is detected by looking at collinear edge pixels. mα , α = 1, ..., K are the normalized
unit vectors representing all edge pixels of a line. Ω represents the disparity, i.e. the
angle between m1 and mK , u the orientation of the line and mG its center point.
Now, the covariance matrix C of the optimally fitted line is

    C(n) = (ε̃^2 / K) ( uu^T / (1 − sinc Ω) + m_G m_G^T / (1 + sinc Ω) ).    (63)
When the length of the line is k, the disparity may be defined using the focal length as Ω ≈ k/f. If k is small compared to the focal length, the disparity is small and C may be approximated as

    C(n) ≈ (6κ / k^3) uu^T + (κ / (2 f^2 k)) m_G m_G^T,    (64)

where κ = ε^2/ρ is called the image resolution and ρ is the line density, i.e. the number of pixels per unit pixel length.
When the length of a line is small compared to the focal length, the covariance further reduces to

    C(n) ≈ (6κ / k^3) uu^T.    (65)
Error in Vanishing Point Detection
As discussed before, the vanishing point is the intersection point of parallel lines. When their projections onto the image plane are l_α, α = 1, ..., K, the normalized unit vector m presents the vanishing point and n_α is the normalized unit vector of line α. n_G is the normalized unit vector of a virtual center line of all lines, and the unit eigenvector corresponding to the second largest eigenvalue is m_C = m × n_G. When φ_α represents the deviation, i.e. the angle between n_G and n_α, and θ_α is the disparity of the vanishing point from the
center point of the line α, the covariance matrix of the vanishing point is

    C(m) ≈ 6κ m_C m_C^T / ( Σ_{α=1}^{K} w_α^3 sin^2(φ_α) / sin^2(θ_α) ).    (66)

4.5 Effect of Camera and Setup Characteristics on the Accuracy of the Visual Gyroscope
Image quality, camera rate and mounting location affect the visual motion perception. In this section these aspects are considered from the point of view of the visual gyroscope's performance.
Many factors affect the accuracy of measurements calculated from images. As discussed before, the vanishing point based rotation observation depends on the number of straight lines found in the image and is therefore reliant on the scene. The failure of the method in unsuitable scenes sets requirements for the image rate; when the image rate is low (based on the experiments, a feasible image rate would be 10 Hz), the probability of obtaining an accurate heading change observation is reduced.
Image quality and the parameters of the algorithms used in the visual computations affect the number and correctness of the detected line features. Therefore the characteristics of the camera used as a visual gyroscope are significant for the success of the method. The quality and features of the image sensor are crucial in terms of the amount of noise present in the images. The aperture is the lens diaphragm opening inside the camera lens. It regulates the amount of light passing to the sensor. The aperture size is indicated by an f-number; a smaller f-number indicates that more light is let in, and the higher is the image quality in low-light situations. The focal length of the camera, discussed
in the previous chapter, influences the sharpness of the image. Images taken using a
camera with a wide-angle lens, namely a lens with a short focal length, are sharper
than the ones taken using a standard lens.
When the pedestrian is navigating and holding the camera in hand, the heading
change of the camera may be transformed into the heading change of the user, if
the configuration of the camera with respect to the body of the user is carefully considered. The roll and pitch of the camera may be obtained from the locations of the
vertical and central vanishing points, but if the camera is not aligned with the user's body, they may not be taken as the orientation of the user. Likewise, if the hand moves in a heading direction that differs from the motion of the body, the heading change provided by the visual gyroscope is not accurate.
Three different cameras and two different setups were tested in an experiment done mainly indoors in a challenging environment to address all the factors affecting the quality of the visual gyroscope's roll, pitch and heading change observations. The three cameras used for the experiments are a GoPro HD Hero helmet camera [1] targeted at first responders, a Sony HD video camera aimed at extreme sports [100] and a Nokia N8 smartphone's camera [77]. Table 4.3 summarizes their most important
parameters.
The GoPro Hero is a helmet camera developed for first responders and recreational
users. Its wide-angle lens, providing a tall HD video stream, gives an extended field-of-view both in the horizontal and vertical directions. The wide-angle lens increases the number of lines found in addition to providing sharper images. The camera has a fixed lens and captures video with a speed of 30 frames per second. The video was
converted into still images having an image rate of 10 Hz and resolution of 1280 x
960 pixels for the camera characteristics experiment. The f-number of the lens is
2.8, resulting in increased performance in low-light indoor environments. Besides
providing sharp and extensive images, the wide-angle lens produces distortion. The
distortion has to be corrected for in order to obtain accurate vision-based calculations
using the method introduced in the previous chapter.
The Sony HXR-MC1 is a camera for recording video during extreme sports or in
other high dynamic situations. It has a standard lens with an f-number of 3.2. The
camera captures video with a speed of 30 frames per second. The video was converted into still images having an image rate of 10 Hz and resolution of 1440 x 1080
pixels. The images taken with the Sony camera are darker and blurred compared to
the images captured using the other two cameras, and the view is more restricted.
The Nokia N8 smartphone camera has a wide-angle lens and an f-number value of
2.8, increasing the performance of the camera in low-light indoor environments. The
camera was programmed to capture still images with a 0.8 Hz rate and resolution of
640 x 480 pixels. The images taken with the Nokia camera have more light than the
ones taken using the Sony camera, but they are not as sharp as the ones using the
GoPro unit.
Figure 4.6 shows the same scene taken with the GoPro (left), the Sony (middle) and the Nokia (right) cameras. The images show the effect of the camera characteristics discussed above, namely the focal length and the low-light tolerance. The images on the left and right are taken with cameras having wide-angle lenses and small f-numbers, and therefore the images are sharp, bright and wide, while the image in the middle is taken with a camera having a standard lens and a larger f-number, in which case the image quality is poorer.
The camera's sensor is the component that converts the light of the image projected by the lens into an electrical signal that is then digitized. Complementary metal oxide semiconductor (CMOS) and charge coupled device (CCD) chips are the two most used sensor types. The main difference between the two types of sensors is the way they read the information [34]. Each sensor consists of millions of light-sensitive photosites which correspond to pixels in an image. In a CMOS sensor the information is read from each photosite individually, whereas in a CCD a line of photosites is read at once. This makes the CCD sensors simple to design, whereas the CMOS sensors are power efficient and therefore widely used in low-cost cameras such as those in smartphones, but at the cost of introducing more distortion into the images. All cameras used in the experiments presented in this thesis contain a CMOS sensor.

Fig. 4.6. An image captured of the same scene with three different cameras showing the effect of different camera characteristics on the image quality.

Table 4.3. Parameters of the GoPro, Sony and Nokia cameras
    Parameters                          GoPro Hero    Sony HXR-MC1    Nokia N8
    Focal length (35 mm equivalent)     8 mm          79.5 mm         28 mm
    f-number                            f/2.8         f/3.2           f/2.8
    Image rate                          10 Hz         10 Hz           0.8 Hz
4.5.1 Experimental Results
The NovAtel SPAN-SE GPS/GLONASS receiver with Northrop Grumman’s tactical
grade LCI-IMU was used as a reference for both experiments and was carried in
a backpack. The first round of experiments was completed using the GoPro and
Sony cameras attached to the upper part of the backpack and with the Nokia N8
smartphone in the hand, as is shown in Figure 4.7. The reference was used to evaluate
the accuracy of the rotation angles provided by the visual gyroscope using GoPro’s
and Sony’s cameras and the heading change using the Nokia N8 smartphone camera.
The evaluation of the full three dimensional rotation was not possible for the Nokia N8 smartphone camera due to the lack of a reliable reference system that could have been attached to the camera, but it was possible for the other cameras, as they experienced the same pitch and roll as the reference system due to their location.

Fig. 4.7. Experiment setup for testing the effect of different camera characteristics on the heading change accuracy.
The second setup consisted of the reference system and all three cameras attached to the upper part of the backpack, enabling the evaluation of the effect of the camera location on the heading change accuracy. Both setups were tested in experiments with
duration of almost 30 minutes each. As the GoPro and Sony video streams were
sampled at 10 Hz rate the data collection resulted in 14264 (first round) and 14162
(second round) images using GoPro camera, 14279 (first round) and 14158 (second
round) using Sony and 1162 using Nokia N8 smartphone’s camera for both rounds.
The different number of images for Sony and GoPro was due to the time stamping
method, in which a handheld GPS clock was shown to the camera and the first image
used was the first image where the time was seen clearly.
The test was conducted on the University of Calgary campus, mainly indoors. The
environment was very challenging for the vanishing point based navigation, because
it consisted of many turns, doors (i.e. planes) and spacious cafeteria and hall areas
(i.e. not all lines were orthogonal and view to lines was restricted in some parts).
The pitch, roll and heading change between two consecutive images were computed using the vanishing point based visual gyroscope measuring the full 3D orientation. Because the heading change between two images was evaluated, when the visual gyroscope measurement failed, the heading change was again computed using the subsequent two successful consecutive images.

Table 4.4. Heading change error statistics
    Camera          Test      Mean       Std        Min        Max        % of images
                              (degrees)  (degrees)  (degrees)  (degrees)  success
    GoPro Hero      Test 1    2.5        2.7        0          17         82
                    Test 2    2.8        3.0        0          19         72
    Sony HXR-MC1    Test 1    2.3        3.3        0          18.6       35
                    Test 2    2.6        3.3        0          25         30
    Nokia N8        Test 1    4.6        3.7        0          15.5       57
                    Test 2    4.4        3.7        0          16         45
Statistics of heading change errors for all cameras and two rounds (test 1 and test 2)
are shown in Table 4.4. The success rate of the images, based on the LDOP error
detection algorithm, is also shown in the table. Before computing the statistics the
visual orientation measurements evaluated as erroneous by the error detection were
discarded. The mean error in heading change from the visual gyroscope using the
GoPro and Sony cameras was around 2.5 degrees, and that using the Nokia N8 was
around 4.5 degrees. The percentage of successful images was between 70 and 80
percent for the GoPro camera, around 50 for the Nokia N8 and only around 30 for
the Sony camera. The failed images common to all cameras included the ones taken during sharp turns, in situations where the light was insufficient and in spacious areas containing non-parallel lines; beyond these, the differences in the success rate are explained by the different characteristics of the cameras and are discussed below. Table
4.5 summarizes the roll and pitch error statistics. The roll and pitch accuracy was
evaluated only for test 2, because of the lack of a reference system for the rotations
of the handheld Nokia N8 in test 1. The mean roll and pitch errors were 0.5 and
2.0 degrees for the Sony, 2.1 and 2.5 degrees for the GoPro and 1.3 and 3.8 degrees
for the Nokia N8. The results are elaborated for image quality, image rate, and the
location of the camera on the user.
Table 4.5. Roll and pitch error statistics
    Camera                     Mean error  Std        Min        Max
                               (degrees)   (degrees)  (degrees)  (degrees)
    GoPro Hero       roll      2.1         3.1        0          26
                     pitch     2.5         4.0        0          59
    Sony HXR-MC1     roll      0.5         0.9        0          15.4
                     pitch     2.0         3.0        0          22
    Nokia N8         roll      1.3         1.9        0          14.4
                     pitch     3.8         4.9        0          43
Image Quality
Image quality is an important factor for the success and accuracy of the vanishing point based calculations. As explained before, a smaller f-number and a shorter focal length relate to sharper images, especially in low-light situations. When the images are sharp, more lines are found for the vanishing point calculations and they are less noisy. The GoPro camera significantly surpasses the other two cameras in image quality and especially in field-of-view. The images converted from the Sony video stream were in places grainy and dark, and therefore the number of lines found was reduced. The limited number of lines disturbs the vanishing point calculations, as may be seen from the low success rate of heading change calculations using the images taken with this camera, namely 30 % and 35 %. While the success rate of calculations performed using the GoPro images was high at 72 % and 82 %, mainly
due to the larger amount of lines captured into the image, as shown in Figure 4.8,
the heading change accuracy was slightly worse than when using the Sony camera
(mean errors of 2.5 and 2.8 degrees compared to 2.3 and 2.6 degrees). The images
in the figure are taken using Sony (on left) and GoPro (on right) cameras at the same
location. The red rectangle shows the portion of the GoPro image captured also by the
Sony camera. The worse accuracy with better quality images is due to the distortion
of the wide angle lens. Though the distortion was corrected before the vanishing
point calculations, some effect still remains.
Fig. 4.8. Effect of the field-of-view on the line detection. The images are taken using Sony (on
left) and GoPro (on right) cameras at the same location. The red rectangle shows
the portion of the GoPro image captured also by the Sony camera. Red dots are
resulting vanishing points.
Image Rate
The image rate becomes an important factor when the environment appears challenging. Previous tests employing the Nokia N8 using the same method gave a mean
error of 2.5 degrees for the heading change observations [95]. When the image rate is low, namely 0.8 Hz, an environment consisting of many turns and spacious areas, with a reduced number of lines in view, leaves the method with few good observations.
The effect is seen in the reduced accuracy of the visual gyroscope when using the Nokia N8, in which case the mean heading change error is 4.5 degrees, compared to 2.5 degrees obtained with the Sony camera's lower quality images.
Camera Configuration
Surprisingly the camera configuration did not have a significant effect on the accuracy of the visual gyroscope heading change. The heading change of the camera is
assumed equal to that of the user when the visual gyroscope is used for pedestrian
navigation. When the camera is held in a hand, the change of the hand’s posture in
the horizontal direction introduces a heading change in the camera that is inconsistent
with the heading change of the user. The roll and pitch magnitudes are also larger
compared to those of the configuration where the camera is tied to the user. The
heading change accuracy was evaluated with two tests; in the first round the camera
was held in the hand and in the second round, tied to the backpack. The difference in
the mean errors was only 0.2 degrees between the two tests. The configuration of the
camera also affects the success rate of the images. When the camera is held in a hand, it sometimes rotates too much, thereby reducing the visibility of the lines and decreasing the success rate of the visual gyroscope calculations.
Pitch and Roll
The mean pitch error values were consistent with the heading change errors for all cameras. However, the roll error was over two degrees smaller than the heading change error for the Sony and Nokia N8, namely 0.5 and 1.3 degrees. The method described in this thesis calculates the roll based on the ratio between the totally vertical lines having an infinite slope value and the vertical lines with a slope deviating from infinity, as explained previously. The approach decreases the effect of vertical lines that are non-parallel to the camera y-axis. When the camera is not experiencing any roll, the number of totally vertical lines exceeds the number of the other, non-parallel vertical lines. These lines, which would otherwise cause errors in the vanishing point calculations, were therefore discarded. The vertical line classification however fails when the distortion is corrected. The correction changes the pixel values of all lines' end points, and therefore there
are no lines left with infinite slope value and the ratio based method is no longer
valid. The effect was seen from the larger roll error when using the GoPro camera
(2.1 degrees).
Table 4.6. Processing times of the different algorithms in the visual gyroscope's Nokia N8 Symbian smartphone implementation: photo capture on the Symbian platform, edge and line detection using OpenCV, and vanishing point, heading and tilt computations implemented in C++.
    Algorithm                          Processing time (s)
    Capturing a photo                  1.2
    Edge detection (Canny)             0.17
    Line detection (Hough)             1.0
    Vanishing point, heading, tilt     0.07
    Total                              2.7

4.6 Smartphone Application of Visual Gyroscope
The visual gyroscope has been implemented herein on a Nokia N8 Symbian smartphone. The implementation was done using Nokia's C++ based development environment Qt [78]. The basic visual algorithms (i.e. filtering, edge detection, Hough transform, SIFT and matching) were obtained from the OpenCV open source computer vision library [3]. The total processing time for automatically capturing an image, finding the
straight lines, voting for the vanishing point and calculating the orientation of the
camera is on average a bit over two and half seconds, with the specification shown in
Table 4.6. The bottlenecks of the calculation are the image capturing (1.2 seconds)
and line detection using the Standard Hough Transform algorithm (1 second). The
slowness of the Hough Transform was also acknowledged in [47], which discusses an alternative visual gyroscope implementation on a smartphone. The time used for extracting the lines and calculating the vanishing point has to be decreased for a real-time navigation solution. This will be realized by using a more efficient line detection algorithm, as discussed in Chapter 6. The effect of using video images instead of still images also has to be addressed in the future, concentrating especially on the power consumption aspect.
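For illustration, the processing chain of Table 4.6 can be sketched with the OpenCV Python bindings roughly as follows; this is not the thesis implementation (which runs in Qt/C++ on Symbian), and the final median-of-intersections step is only a simple stand-in for the voting scheme described earlier. Parameter values are illustrative.

    import cv2
    import numpy as np

    def line_intersection(l1, l2):
        # Intersection of two segments extended to infinite lines; None if (nearly) parallel.
        x1, y1, x2, y2 = map(float, l1)
        x3, y3, x4, y4 = map(float, l2)
        d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        if abs(d) < 1e-9:
            return None
        px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / d
        py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / d
        return px, py

    def central_vanishing_point(image_bgr):
        # Edge detection, line detection and a crude intersection vote (cf. Table 4.6).
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)
        segments = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                                   minLineLength=25, maxLineGap=5)  # 25 px minimum length
        if segments is None:
            return None
        segments = segments.reshape(-1, 4)
        points = []
        for i in range(len(segments)):
            for j in range(i + 1, len(segments)):
                p = line_intersection(segments[i], segments[j])
                if p is not None:
                    points.append(p)
        if not points:
            return None
        return np.median(np.array(points), axis=0)   # robust stand-in for the voting scheme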
4.7 Visual Gyroscope Implementation Using Probabilistic Hough Transform
The three most significant limitations of the visual gyroscope presented above are its inability to monitor the heading change during sharp turns, its accuracy suffering from irregularities of the environment (namely lines violating the orthogonality requirement), and the relatively slow calculation for a real-time implementation. The first two problems could be addressed by a tighter integration of the visual gyroscope and other positioning systems, especially a gyroscope. As the visual gyroscope presented so far has processed the image data independently of other positioning systems, it has not had any support for exceptional situations. As was mentioned before, the
line detection using the Hough transform is a bottleneck in the visual gyroscope’s
processing. All three problems are addressed in this section by developing a method
extracting the lines using a novel modification of an algorithm called Probabilistic
Hough Transform [52] utilizing the information of the user attitude obtained from an
IMU.
In [14] the attitude information obtained from INS was utilized to estimate the vanishing point location by calculating a probability density function for the Hough parameter space, and the method was called a predictive Hough Transform. The attitude
of the user obtained from the INS may be transformed into an estimate of the vanishing point using the relation
    ṽ = K C^c_b C^b_n R    (67)

where C^c_b is a direction cosine matrix (DCM) [105] from the body frame to the camera frame and C^b_n from the navigation frame to the body frame. The direction cosine matrix is

    C^n_b = [ c11  c12  c13
              c21  c22  c23
              c31  c32  c33 ]    (68)
where the element at the ith row and the jth column is the cosine of the angle between
the i-axis of the reference frame and the j-axis of the initial frame. A vector defined in a certain axes frame may be expressed in the reference axes by multiplying it by the direction cosine matrix; here this is expressed as transforming the vector r^b in body axes into the navigation frame vector r^n (the transformation is done likewise for other frame pairs):

    r^n = C^n_b r^b    (69)
When the user rotates through angle ψ about the z-axis (heading), angle θ about the
new y-axis (pitch) and angle φ about the new x-axis (roll) the transformation may be
presented using the direction cosine matrix

    C^n_b = [ cos θ cos ψ    −cos φ sin ψ + sin φ sin θ cos ψ     sin φ sin ψ + cos φ sin θ cos ψ
              cos θ sin ψ     cos φ cos ψ + sin φ sin θ sin ψ    −sin φ cos ψ + cos φ sin θ sin ψ
              −sin θ          sin φ cos θ                         cos φ cos θ ]    (70)
The reverse rotation, in this case C^b_n, is obtained using the transpose rule C^b_n = (C^n_b)^T.
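A small Python sketch of (70) and the transpose rule, assuming the heading-pitch-roll (z-y-x) rotation order given above; the function names are illustrative.

    import numpy as np

    def dcm_body_to_nav(psi, theta, phi):
        # C_b^n per (70): rotation by heading psi, pitch theta and roll phi (radians).
        cps, sps = np.cos(psi), np.sin(psi)
        cth, sth = np.cos(theta), np.sin(theta)
        cph, sph = np.cos(phi), np.sin(phi)
        return np.array([
            [cth * cps, -cph * sps + sph * sth * cps,  sph * sps + cph * sth * cps],
            [cth * sps,  cph * cps + sph * sth * sps, -sph * cps + cph * sth * sps],
            [-sth,       sph * cth,                    cph * cth]])

    def dcm_nav_to_body(psi, theta, phi):
        # C_n^b via the transpose rule C_n^b = (C_b^n)^T.
        return dcm_body_to_nav(psi, theta, phi).T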
Matrix K in (67) is the camera calibration matrix and R is the normalized rotation matrix of the camera, this time in the navigation frame and computed using the attitude information obtained from the IMU. The expected vanishing point location ṽ is characterised by a Gaussian density function with parameters (µ_ρ, σ_ρ^2) [14] as

    ρ_θ ∼ N(µ_ρ, σ_ρ^2)    (71)

where the distance ρ_θ related to a certain angle θ (θ ∈ [0, π]) is normally distributed. The mean µ_ρ is computed for each angle θ from the corresponding line going through the estimated vanishing point ṽ. The variance σ_ρ^2 is decided based on the IMU accuracy.
In [14] the probability density function was used as a filter for the Standard Hough
Transform (SHT) result space and provided a corrected vanishing point location. The
attitude information from the accurate vanishing point was finally used for correcting the INS attitude with a Kalman filter and an improved navigation solution was
obtained. The method addresses the problems arising from the erroneous vanishing point calculations due to an unsuitable environment, namely line geometry overwhelmed by non-orthogonal lines. As the line detection is done using the Standard
Hough Transform, the processing time jeopardizing the real-time solution is not improved. Therefore in this thesis an accelerated line detection algorithm based on the Probabilistic Hough Transform is developed. However, it should be noted that the method should be employed only for extracting lines when a vision-aided IMU system is used; it is not suitable as a generic line extraction algorithm.
As explained in Chapter 3, the Standard Hough Transform computes the parameter space (ρ, θ) for each point (x, y) found in the input image, usually the result of the edge detection, as

    ρ = x cos(θ) + y sin(θ).    (72)
A matrix, called the accumulator, keeps count of the number of image points corresponding to a certain (ρ, θ)-pair. After examining each point in the input image, the maximum values in the accumulator are identified and taken to represent lines in the image. The Probabilistic Hough Transform [56] is a modification of the Standard Hough Transform that uses only a random subset of image points for voting and derives the number of votes needed for identifying a line using Monte Carlo evaluation theory. According to [73] the algorithm reduces the computation only if a priori information about the number of lines is available, and as this is not usually the case, a
Progressive Probabilistic Hough Transform was developed. In the method the image
points used were selected randomly and the parameter pair was selected to represent
a line when the votes it had received exceeded the number that would be expected
from random noise. The amount of points needed to represent a line was evaluated
progressively based on the rate of the pixels examined and the pixels voting for a certain line. In this thesis a method combining the INS aided vanishing point detection
and Progressive Probabilistic Hough Transform discussed above is developed.
The attitude of the user obtained from the IMU measurements is transformed into an estimate of the vanishing point location ṽ using (67), and the probability space (71) corresponding to the point is computed for each possible angle θ. Then, a pixel is selected randomly from the set containing all pixels resulting from the edge detection. The distance ρ is calculated for all possible θ, and the values in the corresponding accumulator cells are increased by adding the value of the probability density function evaluated at the obtained distance to the existing cell value. The Standard Hough Transform increases all accumulator cells equally because its objective is to find all straight lines present in the image, while here the objective is to find the lines supporting the vanishing point. As a result of using the probability function, the closer the possible line is to the estimated vanishing point, the more the accumulator cell
value is increased. When the value in the accumulator cell exceeds a predetermined
threshold, a line is found. As a line is found and no more support for it is needed,
all other image points belonging to the line are removed from the pixel set. Also all
the votes in the accumulator arising from the line are removed for not disturbing the
identification of other lines. In this way the number of image points examined and
therefore the computation time needed decreases. As the points having a larger likelihood of belonging to a line going through the estimated vanishing point or a point
close to it are given more weight, the lines found are likely to be in the direction of
supporting the central vanishing point.
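A rough Python sketch of the probability-weighted voting step described above, assuming the predicted per-angle means mu_rho (one per discretised θ, derived from the vanishing point estimate (67)) and the standard deviation sigma of the prediction are available; the array layout and names are illustrative only.

    import numpy as np

    def vote_pixel(x, y, accumulator, thetas, rho_min, rho_step, mu_rho, sigma):
        # Add probability-weighted votes for one randomly selected edge pixel.
        # For each candidate angle theta the distance rho = x cos(theta) + y sin(theta)
        # (72) is computed, and the corresponding cell is increased by the Gaussian
        # density (71) of that rho around the predicted mean mu_rho for this angle.
        for i, theta in enumerate(thetas):
            rho = x * np.cos(theta) + y * np.sin(theta)
            j = int(round((rho - rho_min) / rho_step))        # discretise rho
            if 0 <= j < accumulator.shape[1]:
                weight = np.exp(-0.5 * ((rho - mu_rho[i]) / sigma) ** 2)
                accumulator[i, j] += weight

When a cell exceeds the line threshold (0.4 in the experiments below), the pixels supporting that line are removed from the pixel set and its votes are cleared, as described above.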
After all pairs (ρi , θi ) supporting a line are identified as explained above, the correct
vanishing point is found from the intersection of all lines i as follows. As discussed
above, a (ρ, θ)-pair in the parameter space represents all collinear points (xi , yi ) in
the image. This is also true the other way around [28]: all pairs (ρ_i, θ_i) satisfying the equation

    ρ_i = x cos(θ_i) + y sin(θ_i)    (73)

represent lines going through the point (x, y). As the line detection was done by emphasizing the lines supporting the estimated vanishing point, all the lines found should intersect at the correct vanishing point, which may then be found using a least-squares estimation technique for (73).
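A minimal sketch of this final least-squares step: each detected (ρ_i, θ_i) pair gives one linear constraint (73) on the vanishing point (x, y), and the overdetermined system is solved with numpy's least-squares routine (illustrative only).

    import numpy as np

    def vanishing_point_from_lines(rhos, thetas):
        # Least-squares solution of x cos(theta_i) + y sin(theta_i) = rho_i, per (73).
        a = np.column_stack((np.cos(thetas), np.sin(thetas)))
        b = np.asarray(rhos, dtype=float)
        solution, *_ = np.linalg.lstsq(a, b, rcond=None)
        x, y = solution
        return x, y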
Two parameters selected for the calculation are crucial for the performance of the visual gyroscope presented in this section, namely the threshold for deciding when a line is found and the standard deviation of the estimated vanishing point value. When the threshold for finding a line is too low, the rate of false positives is large, and when it is too large, the computation time increases and occasionally too few lines are found in the low-light indoor environment, resulting in an inaccurate vanishing point location. Also, when the standard deviation assigned to the estimated vanishing point value is too large, the errors in the IMU-induced attitude distort the line detection by emphasizing points close to the estimated point that probably do not even belong to a line. For the experiments presented in Chapter 6, the parameter σ was chosen through experimentation to be 20, allowing the estimated vanishing point to be within ±20 pixels of the correct vanishing point, and the threshold for identifying a line to be 0.4.
5. VISUAL ODOMETER
The user translation derived from accelerometer measurements suffers from errors, and therefore a method providing this information from consecutive images, suitable for augmenting or replacing the accelerometer, namely the concept of a visual odometer, is developed herein. This chapter discusses the principle of the visual odometer and especially the challenges in observing the translation from consecutive images, i.e. the unknown depth of the objects seen in the images and the scale problem arising from it. Then the visual odometer's error detection and performance are discussed, as well as how the problems arising from degeneracy are avoided.
5.1 The Principle of the Visual Odometer
Translation of the camera between two consecutive images is constrained with a rule
called homography that encompasses the calibration of the camera as well as its rotation and translation between the images as was explained in Chapter 3. The homography equation, reproduced here, is
    x′ = K′ R K^{−1} x + K′ t/Z.    (74)

When the image points in the first (x) and second (x′) image are expressed as normalized homogeneous coordinates x̂ and x̂′, the relation reduces to

    x̂′ = R x̂ + t/Z    (75)
where R is the camera rotation and t = [tx , ty , tz ] the translation between the images.
Z represents the distance (depth) of the photographed object from the camera and
because it is usually unknown in vision-aided navigation applications, the translation
is solved only within an ambiguous scale. In navigation, the absolute magnitude of
the translation has to be solved for and some solutions for obtaining the depth Z and
therefore the scale were presented in Chapter 1. The visual odometer presented in
this thesis is based on a special camera configuration providing means to resolve the
object depth in an unknown environment. The method utilizes the camera rotation
obtained using the visual gyroscope and the known height of the camera measured
before starting navigation and kept sufficiently static. The definition of sufficient in
this context is given later in the chapter.
5.1.1 Measuring the Distance of an Object from the Camera
The distance Z from the camera to the object is calculated using information of the
height of the camera (h), the focal length in units of vertical pixels (fy ), and the
height of the image in pixels (H) [16]. The height of the camera must be known
but the focal length and height of the image may be obtained by camera calibration.
Figure 5.1 visualizes the configuration for resolving the depth Z of an object having
coordinates (X, Y, Z) and projected into image point (x,y) using the parameters listed
above, camera pitch φ computed by the visual gyroscope and β computed as follows.
When the image height (H) and the vertical component of the focal length (f_y) are known, the vertical field-of-view (vfov) may be calculated as

    vfov = 2 arctan( H / (2 f_y) ).    (76)
Now the angle comprising half of the vertical field-of-view satisfies

    tan(vfov/2) = (H/2) / f_y    (77)

and the angle (β) between the principal ray of the camera and the ray from the camera to the object is obtained using the image point y and the focal length f_y as

    tan(β) = (y − H/2) / f_y    (78)

which reduces to

    tan(β) = ((y − H/2) tan(vfov/2)) / (H/2) = (2y/H − 1) tan(vfov/2)    (79)
Fig. 5.1. The special configuration of the camera for resolving the distance (Z) of the object,
using the height (h) and pitch (φ) of the camera.
and results in

    β = arctan( (2y/H − 1) tan(vfov/2) ).    (80)

The Y coordinate of the object is obtained as

    Y = h / sin(φ + β).    (81)

Finally, using β and the pitch of the camera φ, the distance Z is obtained as

    Z = Y cos(β) = h cos(β) / sin(φ + β).    (82)
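A small Python sketch of (76)-(82), assuming the image row y of a floor point, the camera height h, the pitch φ from the visual gyroscope and the calibration values f_y and H are available; names are illustrative.

    import numpy as np

    def depth_of_floor_point(y, h, phi, fy, H):
        # Distance Z of a floor point from the camera, following (76)-(82).
        # y: image row of the point (pixels), h: camera height above the floor,
        # phi: camera pitch towards the floor (radians), fy: vertical focal length
        # (pixels), H: image height (pixels).
        vfov = 2.0 * np.arctan(H / (2.0 * fy))                       # (76)
        beta = np.arctan((2.0 * y / H - 1.0) * np.tan(vfov / 2.0))   # (80)
        return h * np.cos(beta) / np.sin(phi + beta)                 # (82)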
In order to determine the user translation using this special configuration, the object is required to lie in the close vicinity of the camera, namely between the camera and the point where the principal ray intersects the floor plane at the prevailing pitch angle. The vicinity requirement is rational also in the sense that the motion of far-off objects is very small in terms of pixels and may
therefore be overwhelmed by noise. Also, the method for resolving the ambiguous
scale using the known height of the camera requires the image points to lie on or
close to the floor plane. Experiments have shown [89] that lines used by the visual
gyroscope to find the vanishing point are mainly found from the floor, namely from
the junction of the floor and walls, especially with a camera having a pitch angle larger than zero towards the floor plane. If no such points are found, a coarser method
is introduced, considering all points found below the vanishing point, or if the vanishing point is not found, the principal point. As the method presented in the thesis
uses the rotation perceived by the visual gyroscope, the amount of matching image
points needed is reduced and therefore the limitation of the region for finding suitable
objects does not incur substantial limitations for the translation observation.
SIFT features are extracted from the two consecutive images and matched using Matlab algorithms [107] and the restrictions of the object’s location described above;
matching is shown in Figure 5.2. The lines join image points matched in the consecutive images (first on left and second on right), the red dot is the central vanishing
point and the green point on the floor of the left image is the only matched point inside
the region suitable for the visual odometer. Due to the low amount of features in indoor environments, less certain matches are accepted, so that some matches are found
for most images. Now the translation of the camera between the two images may be
resolved from the matched normalized homogeneous image points x̂, x̂′ in the first and second image, respectively, using (75). When the image points followed are projections of objects lying on the floor, the z- and x-components of the translation show the horizontal translation, which may further be transformed into translation in the East and North directions using the heading provided by the visual gyroscope. The loose matching
criteria, as well as the occasional use of the coarse floor plane recovery, necessitate
careful error handling for the robust visual odometer measurements which will be
explained next. Also the detection of the unknown scale will be discussed.
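As a rough illustration of this step, the sketch below solves the translation from a single matched floor point using (75) as written, assuming the rotation R from the visual gyroscope and the point depth Z from the floor-plane construction are known; treating the homogeneous equality as exact is a simplification.

    import numpy as np

    def translation_from_match(x_hat, x_hat_prime, R, Z):
        # Translation t between two images from one matched point, per (75):
        # x_hat' = R x_hat + t / Z, rearranged for t.
        # x_hat, x_hat_prime: normalized homogeneous coordinates (3-vectors) of the
        # matched point in the first and second image; R: 3x3 camera rotation;
        # Z: depth of the point.
        x_hat = np.asarray(x_hat, dtype=float)
        x_hat_prime = np.asarray(x_hat_prime, dtype=float)
        return (x_hat_prime - R @ x_hat) * Z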
Fig. 5.2. Matched SIFT features between consecutive images.
5.1.2 Error Detection and Resolving the Unknown Scale for the Visual Odometer
As was explained in Chapter 3, the image point presented with homogeneous coordinates x = (x, y, 1) is related to the corresponding object coordinates X = (X, Y, Z, 1) as

    x = K[R|t]X    (83)
where t is the translation of the camera center, R its orientation with respect to the
world coordinate frame center and camera matrix P = K[R|t] is such that x = PX.
When the configuration is done in a manner that the location and attitude of the
camera while capturing the first image is set as the world frame origin and initial
attitude, the attitude and location obtained using the second image’s points reflect the
change of the location and attitude of the camera between the images, i.e. rotation
and translation of the camera.
The coordinates (X) of an object represented by two image points (x, y, 1)^T and (x′, y′, 1)^T in consecutive images are estimated using triangulation as [39]

    x p_3^T X = p_1^T X
    y p_3^T X = p_2^T X
    x′ p′_3^T X = p′_1^T X
    y′ p′_3^T X = p′_2^T X    (84)
where p_i^T and p′_i^T represent the i-th rows of the camera matrices of the first and second image, respectively. The four equations may be expressed in matrix form as AX = 0 for a suitable A. An estimate X̂ of X may now be obtained using the Linear-Eigen method, namely as the unit eigenvector corresponding to the smallest eigenvalue of A^T A, minimizing |AX| under the condition |X| = 1 [38]. This is done using the Singular Value Decomposition (SVD). The SVD is a factorization of a matrix M as M = UDV^T [38]. The matrices U, V are orthogonal and D is diagonal with non-negative values. The decomposition of the matrix M^T M is M^T M = V D^2 V^{−1}, and the values of D^2 are its eigenvalues and the columns of V its eigenvectors.
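A minimal Python sketch of the Linear-Eigen triangulation: the four equations (84) are stacked into AX = 0 and X is taken as the right singular vector of A corresponding to the smallest singular value; P1 and P2 stand for the 3x4 camera matrices of the first and second image (illustrative names).

    import numpy as np

    def triangulate(x1, x2, P1, P2):
        # Linear-Eigen triangulation of one point from two views, per (84).
        # x1 = (x, y), x2 = (x', y'): matched image coordinates; P1, P2: 3x4 camera
        # matrices. Returns the homogeneous 4-vector X with |X| = 1.
        a = np.vstack([
            x1[0] * P1[2] - P1[0],
            x1[1] * P1[2] - P1[1],
            x2[0] * P2[2] - P2[0],
            x2[1] * P2[2] - P2[1]])
        _, _, vt = np.linalg.svd(a)
        return vt[-1]    # unit eigenvector of A^T A for the smallest eigenvalue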
Because the object should be lying on the floor plane, the Y-coordinate of the estimate X̂ has to be equal to the height of the camera, and the unknown scale present in the image homography may be solved. The scale factor is the ratio of the camera height h and the object Y-coordinate, h/Y [58]. When the translation obtained using the homography is multiplied by the scale factor, the real translation in the horizontal plane is obtained, and the translation in the vertical direction is assumed to be zero based on the configuration requiring the camera height to be static.
The reprojection error is a measure used for evaluating whether the image points are matched correctly, and it is used for discarding erroneously matched points [38]. It is computed by estimating the object point X̂ from the image point correspondences x, x′ as described above and then reprojecting the estimated object point to the matched image points. Error detection in the case of the visual odometer cannot be based on monitoring the reprojection error, because the motion is mainly forward and therefore the rays from the camera to the image point in consecutive images are nearly parallel. Herein the Y-coordinate values are monitored and the ones deviating more than a threshold from the mean of all observations are discarded. The deviation is constrained as
|Yi − µ(Y)| < 2σ(Y)
(85)
where Yi is the i-th object’s Y-coordinate, µ(Y) the mean of all obtained Y-coordinate
values and σ(Y) their standard deviation.
5.1.3 Degeneracy
Degeneracy problems arise from special situations while resolving motion of the
camera from consecutive images. When the camera rotates about its centre, or the
camera is static but its intrinsic parameters change between the images, a motion degeneracy arises [106] because the epipolar geometry between the consecutive images
is not defined. Structure degeneracy arises in the case when all image points used for
matching are coplanar because then the epipolar geometry between the images cannot
be uniquely determined. This is due to the fact that the camera matrix P presented
in Chapter 3 has 11 degrees of freedom and when the image points are coplanar they
define a homography with only 8 degrees of freedom.
The visual odometer proposed in this thesis resolves the degeneracy problems as
follows. When the depth of the object (Z) computed for matched image points as
explained above is constant for two consecutive images, the translation between the
images is set to zero. Thus, the homography resulting in errors due to the motion
degeneracy is not computed. As the heading is computed using the visual gyroscope
the heading measurement does not suffer from the degeneracy problem. The structure degeneracy arising from the image points being planar is avoided by the configuration of the translation and rotation solution. As the camera is calibrated a priori, the rotation is obtained from the visual gyroscope and the translation in the vertical direction is assumed zero, and in theory only one matched image point between the consecutive images is needed to resolve the horizontal translation. Therefore the image points may be coplanar; the missing 3 degrees of freedom needed to resolve the camera matrix are already provided by these other visual parameters.
5.1.4 Performance of the Visual Odometer
The visual odometer provides relative information about the user position, i.e. translation, and therefore the initial position has to be obtained using an absolute positioning system. If the camera used is calibrated, the visual odometer does not need additional calibration before or during navigation; however, the performance of the navigation system increases substantially if the absolute position of the user is occasionally calibrated, reducing the effect of unavoidable error occurrences in the visual observations. The visual odometer does not depend on any knowledge of the environment; only the camera height above the floor plane must be estimated and
kept sufficiently static. However, its performance is dependent on the accuracy of
the visual gyroscope’s measurements. The most drastic errors may be avoided by
monitoring changes in pitch; if the change is considerable it is most likely due to an
error in vanishing point location and in this case the previous pitch and heading values are used. The method of the visual odometer is not as tolerant to dynamic objects
as the visual gyroscope, but again the error arising from monitoring the motion of a
dynamic object depends on how many matching static points are found. The mean
error of the user speed obtained in different navigation environments was found to be
less than 0.3 m/s, with a standard deviation of 0.3 m/s. Analysis of the errors arising
from different navigation environments will be discussed in the following chapter.
The camera height has to be evaluated a priori and kept sufficiently constant during
navigation for the visual odometer to perform correctly. The effect of using an incorrect height estimate on the visual odometer’s performance due to erroneous a priori
measurement or failure in keeping the camera at a constant height, which would naturally occur when using smartphone in hand, was evaluated. The statistics from the
visual odometer perceived user speed while the height of the camera was correct and
kept constant were compared to a situation where the height of the camera was ±10
cm and ±30 cm off the correct value via simulation. In an experiment done in an
office corridor and resulting in 183 images, the mean error in speed was 0.26 m/s
when the correct height was used and no effect was seen when a height value with
an error of −10 cm was used. However, when the height was erroneous with the
same magnitude in the other direction, the mean error increased to 0.38 and 0.54 m/s
when the error was 10 and 30 cm, respectively. Table 5.1 shows the statistics. The
results show that a vertical motion of the camera less than or equal to 10 cm does not deteriorate the performance of the visual odometer substantially, and the upward motion may be even larger.
Table 5.1. Statistics of the effect of camera height errors on the visual odometer's speed accuracy, units are in m/s

Height used      min error   max error   mean error   std of error
Correct height   0           1.5         0.26         0.24
Height -10 cm    0           1.4         0.26         0.26
Height -30 cm    0           0.8         0.28         0.18
Height +10 cm    0           1.5         0.38         0.29
Height +30 cm    0           1.5         0.54         0.29
6. VISION-AIDED NAVIGATION USING VISUAL GYROSCOPE
AND ODOMETER
This chapter discusses pedestrian indoor and urban navigation solutions utilizing a
visual gyroscope and a visual odometer to obtain a vision-aided integrated system.
The visual gyroscope used in the majority of the experiments is based on the method utilizing the Standard Hough Transform; the algorithm based on the Probabilistic Hough Transform is presented at the end of the chapter. The collected data, visual and other measurements from different sensors and radio positioning systems, are integrated using a Kalman filter. It should be noted here that in the discussions related to the Kalman filter the matrices H and R are defined as discussed in Chapter 2 and should not be confused with the computer vision variables having corresponding names. The durations of tests and test environments are varied in order to obtain an
extensive understanding of the suitability of the algorithms for real life pedestrian
navigation.
Due to the different start times and measurement rates of different systems, all measurements have to be time stamped carefully. Because cameras do not usually provide
a time stamp to images in Coordinated Universal Time (UTC) like other sensors often
do, but in relation to the camera’s own clock, the time for images has to be resolved
differently. In this thesis this is done using two different methods, namely by initializing the system through keeping all sensors static and then looking at the time of the
start of motion from images or alternatively by showing a handheld GPS clock to the
camera before starting navigation. The initialization of the world frame with respect
to the navigation frame is explained separately for each implementation discussed.
Calculations for all experiments are done in post-processing using Matlab and therefore the time synchronization done in the initialization phase is valid throughout the
total experiment time; no drift in time is experienced. The experiment setups and
results are then discussed.
6.1 Visual Gyroscope and Odometer Aided Multi-Sensor Positioning
The measurements from the visual gyroscope detecting heading and pitch changes
and the translation from the visual odometer were integrated with GPS position obtained using a Fastrax IT500 high-sensitivity receiver (sensitivity being -165 dBm
in navigation), WLAN fingerprinting observations from Nokia N8 smartphone, and
speed and heading measurements from the MSP (multi-sensor positioning) device
[21] as well as Nokia 6710 accelerometers and magnetometers using the Kalman filter presented below and explained in detail in [63]. The GPS receiver, as well as
the reference system, were initialized outdoors. The visual measurements were calculated from images taken using the Nokia N8 camera, which has a resolution of 12 megapixels. Processing of such large images is very time consuming and therefore
the resolution was reduced to 640 × 480 to enable real-time performance. A NovAtel SPAN (Synchronized Position Attitude Navigation) GPS/INS high-accuracy positioning system was used as reference. The equipment was placed in a cart using the
setup shown in Figure 6.1. The start position was set to be the origin of the navigation
frame and heading was initialized using the reference solution at the beginning of the
experiment. This initial heading was used for setting up the visual gyroscope’s world
frame, hence at the initial point the visual gyroscope’s heading was stated to be equal
to the initial heading and during navigation heading changes between consecutive
images were monitored to propagate the heading. As the camera was attached to a
holder in the cart, its roll changed only incrementally, and therefore the visual gyroscope measuring only the pitch and heading change, presented in Chapter 4, was used. The time synchronization in the experiments testing the performance of vision-aided multi-sensor
positioning was done by monitoring the motion start time from the self-contained
sensors and the images.
The experiments were done in an office corridor first by obtaining WLAN position
measurements from a functional WLAN radio map and then using an outdated map
for assessing the performance of the visual gyroscope and odometer while absolute
updates are restricted; otherwise the setup was the same for both rounds. The relatively short test time is due to the difficulty of obtaining an accurate reference
trajectory for a longer time indoors. The results are however anticipated to produce
similar accuracy for longer time periods due to the possibility to obtain occasional
absolute position updates. The results from the experiments show that the vision-aiding increases the accuracy, precision and availability of the navigation solution, as described below.

Fig. 6.1. Equipment setup for experimenting with the vision-aided multi-sensor positioning. The Nokia N8 phone acquiring the images was attached to a holder in the front of the cart. The GNSS antenna is that of the NovAtel SPAN reference system.
6.1.1 Kalman Filter Used in Multi-Sensor Positioning
A constant speed model, which is often used in pedestrian navigation, was also used and is defined as

$$
\begin{aligned}
X_{k+1} &= X_k + \dot{X}_k \Delta t + v_1 \\
Y_{k+1} &= Y_k + \dot{Y}_k \Delta t + v_2 \\
\dot{X}_{k+1} &= \dot{X}_k + v_3 \\
\dot{Y}_{k+1} &= \dot{Y}_k + v_4
\end{aligned} \qquad (86)
$$
where X and Y are the latitude and longitude transformed into the ENU (East, North, Up) coordinate frame, Ẋ and Ẏ are their time derivatives, k denotes the current epoch, ∆t is the time interval between two epochs, and vi is the state uncertainty component of the element i. The state vector (x), transition matrix (Φ) and process noise matrix (Q) for the model are

$$
x_k = \begin{bmatrix} X_k \\ Y_k \\ \dot{X}_k \\ \dot{Y}_k \end{bmatrix}, \quad
\Phi_k = \begin{bmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
Q_k = \begin{bmatrix}
\tilde{q}_1 \frac{\Delta t^4}{4} & 0 & \tilde{q}_1 \frac{\Delta t^3}{2} & 0 \\
0 & \tilde{q}_2 \frac{\Delta t^4}{4} & 0 & \tilde{q}_2 \frac{\Delta t^3}{2} \\
\tilde{q}_1 \frac{\Delta t^3}{2} & 0 & \tilde{q}_1 \Delta t^2 & 0 \\
0 & \tilde{q}_2 \frac{\Delta t^3}{2} & 0 & \tilde{q}_2 \Delta t^2
\end{bmatrix} \qquad (87)
$$
where q̃1 is a spectral noise density value for the North component and q̃2 for the East
component, both having a value of (0.5 m/s²)² based on an empirical assessment of
the quality of the sensors in the hardware platform. The measurements consisted
of position, speed and heading from GPS, WLAN (position), Nokia phone (ACC1)
and MSP (ACC2) accelerometer as well as the visual odometer (V) (speed), and
Nokia phone (DC1) and MSP (DC2) digital compasses as well as the initialized visual
gyroscope (V) (heading). The measurement vector z is

$$
z_k = \begin{bmatrix}
X_{GPS} \\ Y_{GPS} \\ X_{WLAN} \\ Y_{WLAN} \\
S_{ACC1}\cos\theta_{DC1} \\ S_{ACC1}\sin\theta_{DC1} \\
S_{ACCV}\cos\theta_{V} \\ S_{ACCV}\sin\theta_{V} \\
S_{ACC2}\cos\theta_{DC2} \\ S_{ACC2}\sin\theta_{DC2}
\end{bmatrix} \qquad (88)
$$

Matrices H and R are
$$
H_k = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix} \qquad (89)
$$
$$
R_k = \mathrm{diag}\big(\sigma^2_{X_{GPS}},\ \sigma^2_{Y_{GPS}},\ \sigma^2_{X_{WLAN}},\ \sigma^2_{Y_{WLAN}},\ \sigma^2_{S_{ACC1}\cos\theta_{DC1}},\ \sigma^2_{S_{ACC1}\sin\theta_{DC1}},\ \sigma^2_{S_{ACCV}\cos\theta_{V}},\ \sigma^2_{S_{ACCV}\sin\theta_{V}},\ \sigma^2_{S_{ACC2}\cos\theta_{DC2}},\ \sigma^2_{S_{ACC2}\sin\theta_{DC2}}\big). \qquad (90)
$$
The variance values $\sigma^2_{i_{DEV}}$ were chosen by testing the accuracy of the corresponding measurements (i) obtained using the device (DEV). The devices utilized had different measurement rates, i.e. 36 Hz for the accelerometers and magnetometers, 1 Hz for the GPS measurements, 0.8 Hz for the visual gyroscope and odometer and 0.1 Hz for WLAN; the size of the matrices therefore varied based on the number of measurements obtained for the epoch under consideration.
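As an illustration of this bookkeeping, the following Python sketch (not the thesis Matlab implementation; the sensor names and the shared per-pair variance are simplifications of this sketch) assembles the measurement vector, design matrix and measurement covariance of (88)-(90) from whichever measurements are available at an epoch:

```python
import numpy as np

def build_update(available):
    """Assemble z_k, H_k and R_k of (88)-(90) from the measurements present at the
    epoch; rows are simply dropped for missing sensors, which is why the matrix
    sizes vary. 'available' maps sensor names to measurement tuples."""
    z, H, R = [], [], []
    for key in ('gps', 'wlan'):                       # position measurements
        if key in available:
            x, y, var_x, var_y = available[key]
            z += [x, y]
            H += [[1, 0, 0, 0], [0, 1, 0, 0]]
            R += [var_x, var_y]
    for key in ('acc1_dc1', 'accV_V', 'acc2_dc2'):    # speed + heading pairs
        if key in available:
            s, theta, var = available[key]
            # speed and heading mapped to the velocity states as in (88)-(89);
            # a single variance for both components is a simplification here.
            z += [s * np.cos(theta), s * np.sin(theta)]
            H += [[0, 0, 1, 0], [0, 0, 0, 1]]
            R += [var, var]
    return np.array(z), np.array(H), np.diag(R)

z_k, H_k, R_k = build_update({'gps': (12.3, -4.5, 9.0, 9.0),
                              'accV_V': (1.1, np.radians(40.0), 0.05)})
```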
6.1.2 Test in an Indoor Office Environment
The performance of the vision-aided multi-sensor positioning using the visual gyroscope and the visual odometer was tested in a typical office corridor as shown in
Figure 6.2; in this case however the corridor was suffering from low lighting due to
the dark North European winter. The data collection lasted three minutes. Although
the images were taken at a 0.8 Hz rate, the vanishing point location calculation occasionally failed due to scenes that were too dark or due to corridor turns; therefore no visual gyroscope or visual odometer observations were obtained for a small percentage of the captured images, and the data collection resulted in 148 images with successful visual calculations. The error in cumulative distance obtained when
propagating the position using the visual odometer measurements was only 1.3 m,
with the total length of the route being 158 m. However, the result is overoptimistic, as may be seen in Figure 6.3, which shows the variation of the speed measurements.

Fig. 6.2. Office corridor used for evaluating the vision-aided multi-sensor implementation.
The mean error of the speed was 0.3 m/s and the standard deviation 0.3 m/s. In this
experiment the user speed was not filtered as may be seen in the figure, but in the
following experiments the user speed will be filtered with an upper limit of 1.5 m/s,
which is considered a reasonable assumption for normal pedestrian navigation.
As the magnitude of the steep turns (i.e. 90 degrees or over) may not be observed
using the visual gyroscope and no other feasible heading measurements were available in this and the following office experiments, the turns were detected as follows:
In turning situations where the turns were sharp and the vanishing points not perceived, the visual odometer observed the turns as being the only regions where the
matching failed completely and no corresponding image points were found. When
only one matching image pair was lost, the turn was assumed to be most likely 90
degrees and when three points were lost the turn was evaluated to be 180 degrees. If
some kind of map matching [83] were used as further augmentation, this procedure
would become more feasible. The visual gyroscope and odometer induced heading
changes and speed information were integrated with the other measurements using
the Kalman filter described above, and the user position during navigation was obtained.

Fig. 6.3. Average speed from ground truth (red) and visual odometer (blue).

The position solution obtained was compared to a solution using only GPS or WLAN positions as measurements to the filter, as well as to the integrated solution
using all other measurements but the visual. All solutions obtained were compared
to the ground truth obtained from the SPAN reference system. The vision-aided
fused solution, i.e. integration of the visual gyroscope and odometer measurements,
provided the best user position accuracy and precision, the mean error being 5.3 m
and the standard deviation being 3.8 m. The corresponding values for the fused solution without vision aiding are 6.7 and 5.1 m, for GPS positioning 17.8 and 11.5 m
and for WLAN positioning 5.9 and 4.7 m. The availability of the other positioning
systems, when the solution is computed at a 1 Hz rate, is 100% except for the WLAN,
for which the availability is only 10% due to the low update rate of the solution in
consequence of power saving requirements of the smartphone. Table 6.1 shows the
statistics. The position result is visualized in Figure 6.4, showing on the left the fused
solution without vision-aiding (blue), GPS position (black), WLAN position (purple)
and the ground truth (green) and on the right the vision-aided fused solution (blue),
GPS position (black), WLAN position (purple) and the ground truth (green).
Table 6.1. Positioning error statistics using different systems in an office corridor

System               min error (m)   max error (m)   mean error (m)   std of error (m)   availability (%)
WLAN                 1.6             17.2            5.9              4.7                10
GPS                  0.6             38.7            17.8             11.5               100
Fused                0.9             23.4            6.7              5.1                100
Vision-aided fused   0.9             19.7            5.3              3.8                100
Fig. 6.4. Indoor positioning results for a pedestrian using different sensors and fused solutions. The figure on the left has a fused solution without visual-aiding (blue) and the
figure on the right a fused solution using visual-aiding from the visual gyroscope
and visual odometer (blue), ground truth (green), GPS position solution (black) and
WLAN position solution (purple).
6.1.3 Test in Office Environment Using an Outdated WLAN Radio Map
An experiment using the same setup, equipment and methods as above was carried out, this time using an outdated WLAN radio map and thereby deteriorating the absolute position calibration during the navigation. This test was performed to assess the impact of an incorrect WLAN map, as such maps are known to change frequently due to human traffic and other changes. Two of the eight WLAN access points in the office corridor were out of order and two had changed locations after the formation of the radio map. Also, some new electrical equipment had been placed in the vicinity of one access point. This altered setup caused the WLAN positioning accuracy
to be reduced to 11 meters from the previous 6 m.

Fig. 6.5. Indoor positioning results for a pedestrian using different sensors and fused solutions in an office corridor with an outdated WLAN radio map. The figure on the left shows a fused solution without vision-aiding (blue) and the figure on the right a fused solution using vision-aiding from the visual gyroscope and visual odometer (blue), ground truth (green), GPS position solution (black) and WLAN position solution (purple).

The use of the visual odometer induced user speed filtering explained above decreased the mean error in the speed
from 0.3 m/s to 0.26 m/s with a standard deviation of 0.24 m/s. Now the fused position solution without vision-aiding had a 7.8 m mean error and a standard deviation of
4.8 m while the vision-aided fused solution using the visual gyroscope and odometer
measurements had a mean error of 5.8 m and a standard deviation of 3.7 m. The
statistics are shown in Table 6.2. The problems in the absolute position calibration
due to the outdated WLAN radio map increase the fused position solution mean error
by more than 1 m compared to the fused solution using a valid WLAN radio map, but
the degradation is not so significant for the vision-aided fused solution, namely only
0.5 m. The position result is visualized in Figure 6.5 showing on the left the fused
solution without vision-aiding (blue), GPS position (black), WLAN position (purple)
and the ground truth (green) and on the right the vision-aided fused solution (blue),
GPS position (black), WLAN position (purple) and the ground truth (green).
Table 6.2. Positioning error statistics using different positioning systems in an office corridor with an outdated WLAN radio map

System               min error (m)   max error (m)   mean error (m)   std of error (m)   availability (%)
WLAN                 0.5             36.9            11.0             10.9               10
GPS                  0.3             32.6            14.9             8.4                100
Fused                0.5             16.1            7.8              4.8                100
Vision-aided fused   0.4             13.1            5.8              3.7                100

6.2 Stand-Alone Visual System
The two previous experiments assessed the performance of a visual gyroscope and
odometer aided fused navigation solution in an environment most favorable for the
visual methods developed herein and consisting of scenes having good line geometry
and only a few dynamic objects. Because pedestrian navigation is needed in various
environments, the method was tested in the most challenging environment for the
visual aiding approach, namely in a shopping mall with shoppers in motion, wide corridors frequently restricting the view of their sides and thereby degrading the line geometry, varying lighting conditions and various objects forming many non-orthogonal lines. As the method is also aimed at urban navigation, it was tested in an urban environment, frequently close to the wall of a tall building. As these environments lacked an absolute positioning system for integration, the performance of
the visual gyroscope and odometer was tested as a stand-alone system, propagating
the initial position and orientation using the visual heading and speed measurements.
The visual measurements were calculated from images taken using the Nokia N8
camera and reduced resolution as in the previous experiments. Again, a NovAtel
SPAN GPS/INS high-accuracy positioning system was used as reference. The equipment was placed into a cart as in the previous experiments. The visual measurements
were propagated using a Kalman filter implemented as follows. The results obtained
were compared to the ground truth and are discussed below.
6.2.1 Kalman Filter Used in Stand-Alone Visual Positioning
In this case, the navigation solution is obtained by propagating the initial position and
heading using the visual gyroscope induced heading information and visual odometer
speed. The propagation is done using a straightforward Kalman filter modeling the
user position as
$$
\begin{aligned}
X_{k+1} &= X_k + S_{k+1}\,\Delta t\,\sin\theta_{k+1} + v_1 \\
Y_{k+1} &= Y_k + S_{k+1}\,\Delta t\,\cos\theta_{k+1} + v_2
\end{aligned} \qquad (91)
$$
where X and Y are the latitude and longitude transformed into the ENU frame, S
is the speed of the user from the visual odometer, θ is the heading obtained using
the visual gyroscope, ∆t is the time interval between the current epoch k and the
consecutive epoch k + 1. $v_1$ and $v_2$ are the state uncertainty components of the elements X and Y, and the state vector is $x_k = [X\ Y]^T$. The Φ, H and Q matrices for the filter are
$$
\Phi = H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
Q = \begin{bmatrix} 2^2 & 0 \\ 0 & 2^2 \end{bmatrix}. \qquad (92)
$$
The variances in the measurement covariance matrix $R_k = \mathrm{diag}(\sigma^2_X, \sigma^2_Y)$ are based on the performance evaluation of the visual gyroscope and odometer and change during the processing based on the error detection results, i.e. when there is a high probability of an error occurrence the values are increased and therefore less weight is given to the measurement.
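Under these definitions the stand-alone filter is compact enough to sketch directly; the Python fragment below is illustrative only (the inflation factor used for suspected errors is an assumption), not the thesis Matlab implementation:

```python
import numpy as np

def predict(pos, P, speed, heading, dt):
    """Prediction step following (91)-(92): the visual odometer speed and the
    visual gyroscope heading drive the ENU position forward."""
    Phi = np.eye(2)                       # (92)
    Q = np.diag([2.0 ** 2, 2.0 ** 2])     # (92)
    x_pred = Phi @ pos + speed * dt * np.array([np.sin(heading), np.cos(heading)])
    P_pred = Phi @ P @ Phi.T + Q
    return x_pred, P_pred

def measurement_covariance(sigma_x2, sigma_y2, error_suspected, inflation=10.0):
    """R_k = diag(sigma_X^2, sigma_Y^2); the variances are inflated when the error
    detection flags a likely faulty visual measurement, so that less weight is
    given to it (the inflation factor here is an assumption of this sketch)."""
    scale = inflation if error_suspected else 1.0
    return np.diag([sigma_x2 * scale, sigma_y2 * scale])

pos, P = np.zeros(2), np.eye(2)
pos, P = predict(pos, P, speed=1.2, heading=np.radians(30.0), dt=1.25)
R_k = measurement_covariance(4.0, 4.0, error_suspected=False)
```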
6.2.2 Test in a Shopping Mall Environment
An experiment lasting 420 seconds was done in the Iso Omena shopping mall in
Espoo, Finland. The challenging environment deteriorated the visual gyroscope’s
performance as was already seen in Chapter 4, with the mean heading error being
4.4 degrees and the standard deviation 6 degrees. Figure 6.6 shows the challenges
set by the environment. The incomplete line geometry due to the wide corridor and
richness of objects decreases the number of straight lines found from the original image on the left and shown in the image on the right. The lines found are classified
as was explained in Chapter 4, and the erroneous vanishing point found is shown as a red dot.

Fig. 6.6. An image captured by the smartphone on the left and after processing using the visual gyroscope algorithm on the right. The lines found from the image are classified and colored, the ones in the direction of propagation and used for the vanishing point calculation being in blue. The central vanishing point is shown as a red dot.

The figure also shows a challenge set for the visual odometer, namely
the shoppers in motion. However, the visual odometer induced speed did not suffer
from the environment due to the increased number of objects and therefore matched
image points found in the environment. The mean error of the visual odometer’s
speed observations was 0.25 m/s with a standard deviation of 0.2 m/s. The visual
odometer propagated path was 179 meters, the total route length being 198 meters,
hence yielding an agreement of 90%. Figure 6.7 shows the position solution obtained by propagating the initial position and heading using the visual gyroscope and
odometer measurements (green) and the ground truth (red). As these results show
the solution without any absolute positioning method calibrating the position and
heading during navigation or aiding the solution in the occurrence of errors, the positioning accuracy is expected to increase substantially when the stand-alone visual
gyroscope and odometer measurements are integrated with other radio positioning
and sensor measurements.
6.2.3 Test in an Urban Canyon
As pedestrian navigation is needed also in urban environments, the suitability of the
developed vision-aiding method was tested in an outdoor environment, namely close
to a wall of a tall building deteriorating and occasionally totally blocking line of sight
GPS observations. The challenges set by the environment may be seen in Figure 6.8.

Fig. 6.7. The two-dimensional position solution in the Iso Omena shopping centre with visual stand-alone solution (green) and SPAN reference (red).

On the left is the result of a successful vanishing point observation despite the
presence of a dynamic object (a vehicle in this case) and bright lighting weakening the edges. On the right, these challenges disturb the vanishing point location observation: the dynamic object (a human) introduces a number of non-orthogonal lines surpassing the number of lines going in the direction of propagation, a phenomenon that is also partly due to the bright lighting disturbing the edge detection. The mean error of
the heading obtained using the visual gyroscope in the experiment lasting for 270
seconds and resulting in 224 images was 3.3 degrees with a standard deviation of 5.1
degrees, therefore the accuracy in this urban canyon falls between the office corridor
and the mall environment. The mean error in the visual odometer induced user speed
was 0.2 m/s with a standard deviation of 0.23 m/s. The route length was 146 meters
and the cumulative distance obtained using the visual odometer 131 meters, yielding
an agreement of 90%.
Fig. 6.8. Challenging environment of an urban canyon with bright light, resulting in successful vanishing point detection when the line geometry is favorable (on the left) or in errors when dynamic objects create disturbances (on the right).

The navigation solution was again computed as a stand-alone visual system by propagating the initial position and heading using the Kalman filter explained above and the
visual gyroscope and odometer measurements. The obtained position solution was
compared to the one obtained using a Fastrax IT500 high-sensitivity GPS receiver, a
typical consumer-grade L1 high-sensitivity GPS receiver. Figure 6.9 shows the position solution obtained using each system, with visual stand-alone system on the left
and the IT500 GPS receiver on the right. The different and therefore complementary
natures of the two positioning systems are seen in the figure, namely the visual position solution is immediately on the correct track but drifts slowly during navigation,
while the GPS solution takes 13 seconds to converge to the correct position but is accurate thereafter, once the necessary satellite geometry and a sufficient number of good-quality satellite signals are observed, as is the case throughout this experiment
done in a modest urban canyon. The mean error of the visual stand-alone position
solution was 10.3 meters with a standard deviation of 5.8 meters. The corresponding measurements for GPS solution were 16.7 and 10.1 meters, Table 6.3 shows the
statistics. The experiment demonstrates how the two systems complement each other, and when they are integrated the solution is anticipated to improve significantly. In this
section it was also shown that the performance of the stand-alone visual system is
comparable to that of the other positioning systems. A harsher urban canyon situation will be discussed in Chapter 7 where the concept of vision-aided carrier phase
navigation is introduced.
Fig. 6.9. Position solutions obtained in an urban canyon using a visual stand-alone system
(green, on left) and an IT500 GPS receiver (green, on right). The ground reference
is shown in red.
Table 6.3. Positioning error statistics for visual stand-alone and GPS position solutions

System                      min error (m)   max error (m)   mean error (m)   std of error (m)
Visual stand-alone system   0.3             22.5            10.3             5.8
GPS                         0.4             51.0            16.7             10.1

6.3 Visual Gyroscope Aided IMU Positioning
In the previous sections, experiments using a vision-aided multi-sensor positioning system, which calibrates the position solution occasionally using absolute position information obtained from WLAN, and a visual stand-alone navigation system were described. The accuracy of the visual stand-alone system drifts in time due to the various errors discussed previously, and WLAN based positioning needs an a priori prepared environment; therefore, other means for navigation need to be addressed. Self-contained sensors carried by the user and presented in Chapter 2 are desirable for positioning. With a known initial position, the current position may be propagated for a limited time using a triad of gyroscopes and accelerometers. The deficiency of the self-contained sensors is the cumulative measurement errors that affect the accuracy of the attitude obtained from the gyroscopes. Herein a method of updating the navigation filter's attitude using vision-aiding, and thereby providing an accurate absolute user position for indoor navigation, is presented. The method uses only equipment carried by the user, thus not requiring any additional infrastructure.
The method incorporates the visual gyroscope induced attitude as updates in a filter
integrating also the GPS position and Analog Devices ADIS16488 inertial measurement unit (IMU) data. The ADI IMU is a high grade MEMS IMU, with a 12◦ /hr
in-run bias stability and 1620◦ /hr noise level [7] providing measurements at 200 Hz
rate. The NovAtel SPAN-SE GPS/GLONASS receiver with Northrop Grumman's
tactical grade LCI (low-coherency interferometry) IMU was used as a reference system and carried in the backpack for both experiments. The visual gyroscope measurements are obtained from images taken with a GoPro camera, discussed in more
detail in Chapter 4. The filter used is a tightly coupled 21-state extended Kalman
filter (EKF) [11] and will be discussed below. The visual odometer measurements
are not integrated in this method and therefore the speed is obtained using the IMU
accelerometer alone. The visual and IMU measurements are time synchronized by
showing a handheld GPS receiver clock to the camera; the IMU measurements are
GPS-time tagged.
The navigation filter attitude is updated using the visual heading, pitch and roll measurements obtained from the visual gyroscope, and in the case of temporal visual
attitude updates discussed below the heading change is used instead of the absolute
heading. Only the measurements having an LDOP value below a specified threshold
are used for the update. However, some errors arising from environments not suitable for the vanishing point based method are not identified by the error detection
algorithm and have to be discarded using a fault detection algorithm. One such
situation is when the user is walking along a ramp: the line geometry is good and therefore this violation of the orthogonality requirement is not perceived by the error detection, and as a result all visual pitch measurements deviate from the real pitch.
The fault detection is applied by accepting only the standardized visual measurement
values wi that do not surpass a pre-defined threshold value [62]. The standardized
visual measurements are obtained from the Kalman filter’s innovations of the heading, pitch and roll values vi and their corresponding estimated standard deviations,
σv , as
$$
w_i = \left| \frac{v_i}{\sigma_v} \right|, \quad i = 1, \dots, n. \qquad (93)
$$
The innovation in the Kalman filter is defined to be the difference $v_k$ between the measurement $z_k$ and the predicted value of the state $\hat{x}_k$, calculated as $v_k = z_k - H_k \hat{x}^-_k$. In some cases only the visual heading measurement is found faulty and discarded, while the pitch and roll measurements are used in an update.
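The gating of (93) can be illustrated with a short Python sketch (the threshold value is hypothetical; the thesis refers to [62] for the actual test):

```python
import numpy as np

def accept_visual_attitude(z, x_pred, H, P_pred, R, threshold=3.0):
    """Standardize the heading, pitch and roll innovations as in (93) and keep only
    the components below the threshold (the threshold value is an assumption)."""
    v = z - H @ x_pred                         # innovation v_k = z_k - H_k x_k^-
    S = H @ P_pred @ H.T + R                   # innovation covariance
    w = np.abs(v) / np.sqrt(np.diag(S))        # standardized innovations w_i
    return w < threshold                       # boolean mask over the components

# Example: the heading innovation is large, so only pitch and roll are accepted.
mask = accept_visual_attitude(z=np.array([1.71, 0.02, 0.01]),
                              x_pred=np.array([1.60, 0.00, 0.00]),
                              H=np.eye(3),
                              P_pred=np.diag([1e-4, 1e-4, 1e-4]),
                              R=np.diag([1e-4, 1e-4, 1e-4]))
```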
The relative visual gyroscope induced heading measurements are transformed into
absolute heading information by observing the attitude of the camera with respect to
the navigation frame and using this information in propagating the visual gyroscope’s
heading during navigation. The initial position and orientation of the user is obtained
using the GNSS measurements at the start of the experiment. The GNSS receiver
observing the position was the NovAtel SPAN-SE GPS/GLONASS receiver used
also for the reference, but as the purpose of the experiment was to assess the effect
of vision-aiding on gyro errors, GNSS data was only used for three minutes at the
start to provide an initial position. As the GNSS measurements were not available
indoors and the gyroscope measurements are too noisy to measure the change in
heading accurately and the visual gyroscope fails in sharp turns, the heading of the
user has to be initialized after each sharp turn during navigation in order to obtain a
robust user heading continuously. This was done by using the building layout of the
navigation environment, 18 times during the experiment. All results are obtained in post-processing.
6.3.1 Kalman Filter Used in Visual Gyroscope Aided IMU Positioning
The errors in the gyroscopes cause the attitude measurements to drift, introducing
continuously increasing errors in the navigation solution. The errors consist of the
gyro bias, scale factor and non-orthogonalities, and the g-dependent error and noise.
The error model, discussed in more detail in [11], is
$$
\tilde{\omega}^b_{ib} = S_g\,\omega^b_{ib} + b_g + G f^b_{ib} + \eta_g \qquad (94)
$$

where $\tilde{\omega}^b_{ib}$ is the gyroscope angular velocity measurement, $S_g$ is a matrix including the scale factors and non-orthogonalities, $\omega^b_{ib}$ is the body (b) turn rate with respect to the inertial (i) frame measured by the gyroscope, $b_g$ are the gyro biases, $G$ is a 3 × 3 matrix of the g-sensitivity coefficients, $f^b_{ib}$ is the specific force and $\eta_g$ is the noise.
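For illustration, a Python sketch of inverting (94) to correct a raw gyro measurement with estimated error terms (all numerical values below are made up, not values estimated in the thesis) could read:

```python
import numpy as np

def correct_gyro(omega_meas, S_g, b_g, G, f_b):
    """Invert the error model (94), omega_tilde = S_g*omega + b_g + G*f + noise,
    to recover the body turn rate (the noise term is left in the residual)."""
    return np.linalg.solve(S_g, omega_meas - b_g - G @ f_b)

S_g = np.eye(3) * 1.001                       # scale factors / non-orthogonalities
b_g = np.array([1e-4, -2e-4, 5e-5])           # gyro biases (rad/s)
G = np.zeros((3, 3))                          # g-sensitivity coefficients
f_b = np.array([0.0, 0.0, -9.81])             # specific force (m/s^2)
omega = correct_gyro(np.array([0.010, 0.002, 0.020]), S_g, b_g, G, f_b)
```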
The g-dependent bias is due to high accelerations, especially affecting sensors attached to the ankle, where the acceleration may rise to a maximum of 12 g. The
g-dependent bias is an error source often neglected, but significant especially in pedestrian navigation applications directed to first responders, electronic monitoring and
military personnel. The g-dependent bias in the gyroscopes is a result of mass imbalances caused by the manufacturing process and can impact the MEMS gyros with an
error of 100 degrees/hour/g or more when uncompensated.
The tightly coupled 21-state extended Kalman filter (EKF), developed and implemented for [9], consists of linear perturbations of the position, attitude, velocity, gyro
and accelerometer bias, three gyro scale factor coefficients and three g-sensitivity
coefficients and the model is defined as
$$
\begin{aligned}
\delta \dot{r}^e &= \delta v^e \\
\delta \dot{v}^e &= N^e \delta r^e - 2\Omega^e_{ie}\,\delta v^e - F^e \varepsilon + R^e_b\, b_a \\
\dot{\varepsilon}^e &= -\Omega^e_{ie}\,\varepsilon^e + R^e_b \big( (I - S_g)\,\omega^b_{ib} + b_g + G f^b_{ib} \big) \\
\dot{b}_a &= -\tau_a^{-1} b_a \\
\dot{b}_g &= -\tau_g^{-1} b_g \\
\dot{S}_g &= 0 \\
\dot{G} &= 0
\end{aligned} \qquad (95)
$$
where $r^e$ and $v^e$ are the position and velocity vectors in the earth centered earth fixed (ECEF) frame, $\varepsilon$ is the perturbation of the Euler angles relating the body frame to the ECEF frame, and $b_a$ and $b_g$ are the biases of the accelerometer and gyro. The inertia tensor is denoted by $N^e$, while $\Omega^e_{ie}$ and $F^e$ are the skew symmetric forms of the earth rotation vector and the specific force measurement, respectively. The rotation matrix $R^e_b$ rotates the specific force and angular velocity from the body to the ECEF frame.
6.3.2 Equipment Setup on the Body
The data, used for testing the feasibility of a body mounted vision-aided IMU navigation indoors, was collected through an experiment conducted on the University of
Calgary campus, mainly inside buildings. The environment was again very challenging for the visual measurements, consisting of numerous sharp turns and wide regions such as cafeterias and outdoor garden areas.

Fig. 6.10. Route for experiments on the University of Calgary campus.

The experiment was conducted during
office hours, adding many moving humans into the images. The duration of the experiment was 48 minutes, succeeding a 10 minute walk outdoors that allowed the filter to converge; the route (obtained using the reference system) is shown in Figure 6.10. GNSS data was used only for three minutes at the start of the experiment
to provide an initial position. All equipment was carried in a backpack as shown in
Figure 6.11. The figure shows also a close-up of the setup on the backpack, namely
the camera attached to the top of the backpack and the IMU on the same plane. The
mutual attitude of the IMU and camera was observed in order to be able to integrate
the measurements. The results of the vision-aided IMU navigation obtained using the
two different update methods mentioned before are as follows.
Absolute Visual Attitude Update (AVUPT)
The absolute heading obtained by propagating the GNSS initialized heading using
measurements from the visual gyroscope was used as absolute updates to the Kalman
filter. The video stream obtained from the experiment was sampled into still images
at a 10 Hz rate and resulted in 29802 images, of which 16347 were discarded due to large LDOP values.

Fig. 6.11. Equipment attached to a backpack and carried by a user in a body mounted test (on left) and a close-up of the equipment (on right).

The fault detection within the navigation filter further rejected
11% of the remaining images. Visual pitch and roll updates only, with no heading,
were accepted from 38% of the images remaining from the error detection. Therefore
8337 absolute visual heading updates and 14549 absolute visual pitch and roll updates
were provided to the navigation filter.
Figure 6.12 shows the standard deviations for different integration schemes. When
visual updates are used, either absolute or temporal (discussed in the following subsection), the standard deviations of roll, pitch and heading stay close to zero for the
entire experiment; when no visual updates are used the standard deviations increase
with time. This is mainly due to the decrease of the gyro drift growth when visual
updates are used. This phenomenon should be considered with care as the update of
the absolute heading using the building layout during the navigation has a significant
effect on the growth of errors when the absolute update method is used. Figure 6.13
shows the effect of the visual updates on the attitude errors. The attitude errors are
expressed using a measure called root mean square error (rms). The rms error is a
measure expressing the spread of the values around the average. It is computed by
taking the root of the averaged squared residuals as

$$
\mathrm{rms} = \sqrt{\frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{N}} \qquad (96)
$$

where $\hat{y}_i$ is the predicted value of the measurement $y_i$ and N is the total number of measurements.

Fig. 6.12. Standard deviation for different integration schemes, namely no visual updates used (blue), using temporal updates (purple) and absolute updates (green).

The vision-aiding improves the navigation solution's pitch and roll only
slightly, as is seen from the figure and in Table 6.4. In the experiment, the pitch
root mean square error decreases from 1.7 to 1.4 degrees and the roll from 2.0 to 1.4
degrees when the absolute vision-aided attitude updates are used. However, the heading improves significantly, namely 93% as the root mean square error decreases from
29.5 to 2.1 degrees when the navigation filter is updated with visual measurements.
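For reference, the rms measure of (96) used for these attitude error statistics amounts to the following short computation (a sketch with made-up values):

```python
import numpy as np

def rms_error(predicted, observed):
    """Root mean square error as defined in (96)."""
    residuals = np.asarray(predicted) - np.asarray(observed)
    return np.sqrt(np.mean(residuals ** 2))

rms_error([1.0, 2.0, 3.0], [1.1, 1.8, 3.3])   # example with made-up values
```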
Fig. 6.13. Attitude error using different integration schemes, namely no visual updates used
(blue), using temporal updates (purple) and absolute updates (green).
Table 6.4. Attitude errors obtained for body-mounted IMU with different integration methods

Integration method                                Attitude errors (rms, degrees)
                                                  Pitch   Roll   Heading
No visual updates                                 1.7     2.0    29.5
Absolute vision-aided attitude updates (AVUPT)    1.4     1.4    2.1
Temporal vision-aided attitude updates (TVUPT)    1.6     1.7    17.6
Temporal Visual Attitude Update (TVUPT)
When no prior information of the environment is available, like a floor plan used
in the previous experiment, the temporal attitude (i.e. the change of the attitude
over a short interval) of the camera may be used. In temporal visual attitude update
(TVUPT) the Kalman filter integrates the user attitude obtained from the visual gyroscope to estimate the errors in the IMU attitude measurements. The temporal attitude observation may be presented as

$$
\begin{bmatrix} \phi \\ \beta \\ \theta \end{bmatrix}^{b_{k-n}}_{b_k} =
\begin{bmatrix} \phi \\ \beta \\ \theta \end{bmatrix}^{e}_{b_k} -
\begin{bmatrix} \phi \\ \beta \\ \theta \end{bmatrix}^{e}_{b_{k-n}} \qquad (97)
$$
where φ, β and θ are the pitch, roll and heading of the camera, respectively. As the filter estimates the errors in the 200 Hz IMU attitude measurements using consecutive images obtained at a lower rate, the two consecutive images are time labeled as k and k − n, where n is the number of IMU epochs between the two consecutive images. b represents the body frame and e the ECEF frame.
The equation may be represented in the filter's (95) rotation matrix form as $R^e_{b_k} = R^e_{b_{k-n}} R^{b_{k-n}}_{b_k}$.
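A small Python sketch of this temporal observation in its rotation matrix form (the Euler angle convention used to build the rotations is an assumption of the sketch) is:

```python
import numpy as np

def rotation(pitch, roll, heading):
    """Rotation matrix built from Euler angles (rotation order and sign
    conventions are assumptions of this sketch)."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    ch, sh = np.cos(heading), np.sin(heading)
    Rz = np.array([[ch, -sh, 0.0], [sh, ch, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return Rz @ Ry @ Rx

# R^e_{b_k} = R^e_{b_{k-n}} R^{b_{k-n}}_{b_k}: the temporal (relative) rotation between
# the two camera epochs follows from the two absolute visual attitudes.
R_e_bkn = rotation(0.02, 0.01, np.radians(45.0))   # attitude at epoch k - n
R_e_bk = rotation(0.02, 0.01, np.radians(47.5))    # attitude at epoch k
R_bkn_bk = R_e_bkn.T @ R_e_bk                      # temporal attitude observation
```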
The image interval used in previous visual gyroscope aided IMU positioning experiments was 0.10 s. For the temporal visual attitude updates the interval had to be
decreased, because in order to get accurate temporal visual attitude updates, the image rate has to be as large as possible. Due to the computational limitations the entire
data set was not sampled at the chosen 30 Hz rate, but two consecutive images were
retrieved with a 0.033 s interval and then four subsequent images were discarded.
The sampling resulted in 30077 images.
When the temporal visual attitude update method was used, the user heading root
mean square error decreased from 29.5 degrees to 17.6 degrees, resulting in a 40%
improvement. Again, the improvement in the pitch and roll accuracy was minor,
namely the pitch root mean square error decreased from 1.7 to 1.6 and the roll from 2.0 to 1.7
degrees, as shown in Table 6.4 and Figure 6.13. The standard deviations of roll, pitch
and heading stayed close to zero for the entire experiment when the temporal visual
updates were used; when no visual updates are used the standard deviations increase
with time, as also shown in Figure 6.12. However, the visual attitude updates show an overly optimistic variance, because the update is temporal rather than absolute, for which the integration algorithm was originally developed; therefore this method provides variances that are not exactly indicative of the estimate.
6.3.3 Equipment Setup on the Foot
When the gyro is located on the ankle of the user the vertical acceleration can rise
up to the maximum of 12 g causing very large g-dependent errors. The effect of correcting the errors through vision aiding of the attitude was tested by an experiment
using a foot mounted system. The IMU and camera were attached rigidly to each
other and located on the ankle of the user. The setup is shown in Figure 6.14. Data
was collected in an experiment of 43 minutes conducted mainly indoors. Because the
purpose of the research was to assess vision-aiding performance on attitude and gyro
errors, GNSS data was only used for three periods of two to three minutes during
the navigation in low canyons between buildings. A pedestrian navigation solution
was obtained by integrating the vision-aided gyroscope attitude measurements and
applying zero velocity updates to the inertial navigation filter as well as using the occasional absolute heading updates obtained from the building layout. The integration
was performed using the Kalman filter described above. Due to the lack of a reference system mounted on the foot (the reference system was carried in the backpack),
the attitude errors could not be evaluated but the position errors could.
The visual heading, pitch and roll measurements were used as absolute updates to the
navigation filter attitude as explained above in the case of a body-mounted system.
The calculation of visual measurements was challenging due to large camera movements at the ankle of the pedestrian. The total number of images acquired during
the experiment was 25664. Only 18% received an LDOP value sufficiently good for
trusting the visual measurements due to image blurring introduced by the fast motion
of the foot and because the camera was pointing straight down to the floor for a short
time during a step cycle period, shown in Figure 6.15. Again, fault detection was
used to remove the noise from the visual measurements. The fault detection within
the navigation filter further rejected 65% of the images remaining from the error detection. Visual pitch and roll updates only, with no heading, were accepted from 18%
of the remaining images. This resulted in 1326 visual heading updates and 1617
visual pitch and roll updates to the navigation filter.
Table 6.5 and Figure 6.16 show the improvement of the position obtained with the
vision-aided foot mounted navigation system. The periods when GNSS was used
are shown in the figure with black squares. Vision-aiding improves the horizontal
position significantly in the experiment; the root mean square horizontal position error decreases from 30.9 m to 20.0 m, yielding an improvement of 34%. Vision-aiding has no effect on the vertical position error, which remains at about 68 m.

Fig. 6.14. Equipment setup for the foot, namely the GoPro camera and IMU attached to each other.

Fig. 6.15. Images resulting from one step cycle period; three of the images are too blurred for visual gyroscope calculations and are therefore not shown. Five accurate vanishing points are obtained, as the rest of the images show only the floor plane or are too blurred for accurate line detection.
Fig. 6.16. RMS position errors obtained for foot-mounted IMU with (green) and without
vision-aiding (red).
Table 6.5. RMS position error obtained for foot-mounted IMU with and without vision-aiding

                     RMS Position Errors (m)
                     Horizontal   Vertical
No vision-aiding     30.9         67.5
With vision-aiding   20.0         67.7
6.4 Performance of Visual Gyroscope Implemented Using Probabilistic Hough Transform
The method presented in Section 4.7 was tested using a subset of the data collected
using the body mounted equipment for testing the absolute visual attitudes explained
above. The subset consisted of 80 seconds of data collected indoors, resulting in 800
sampled images. Figure 6.17 shows the result of line detection and vanishing point
calculations. As the images were taken using the GoPro camera with a wide-angle lens, discussed in Chapter 4, they are distorted.

Table 6.6. Ratio of image points used for computing the Probabilistic Hough Transform compared to the image points used by the Standard Hough Transform for the images processed in the experiment

Ratio of image points used compared to all image points (%)
Min   Mean   Max   Std
27    45     67    8

In order to maintain the real-time
processing obtained using the developed line detection, the distortion is not corrected
as described, but instead its effect is reduced by discarding the pixels close to the edges
of the image. The blue lines are extracted using the Probabilistic Hough Transform.
It should be noted that because the image is not distortion corrected the lines found
do not agree with the lines seen in the figure but would if corrected. The green point
is the vanishing point estimation based on the IMU attitude and the red point is the
corrected vanishing point. The figure shows how the vanishing point is found reliably
even when the IMU induced attitude and therefore the estimated vanishing point is
distorted.
As stated in [73], the processing time of an algorithm depends on the computer used and on the algorithm and software implementation. Therefore the effect of the algorithm is shown by comparing the number of image points examined, in other words
the iterations of the parameter calculation. The Standard Hough Transform examines
all pixels in the input image and afterwards searches for local maxima from the accumulator to find the lines. The algorithm presented uses a fraction of the image points
for extracting the lines, namely on average 45% of all image points, and therefore the
computation is anticipated to be accelerated in the same proportion to the Standard
Hough Transform computation time. Table 6.6 gives the test iteration statistics. It
should also be noted that the method already detects the lines during the point examination as well as the vanishing point, further reducing the computation time used for
obtaining visual gyroscope measurements.
Restricted light conditions and lines violating the orthogonality requirement are major threats to the visual gyroscope's accuracy, often resulting in errors. An example of an image suffering from both situations is shown in Figure 6.18. The vanishing point computed by the visual gyroscope discussed in Chapter 4 is shown on the left, while the one using the visual gyroscope with the Probabilistic Hough Transform presented in this section is on the right.

Fig. 6.17. Line detection and vanishing point calculations using the Probabilistic Hough Transform. The estimated vanishing point (green) is corrected (red) using the lines (blue) found with the Probabilistic Hough Transform.

Fig. 6.18. Vanishing point detection in an environment suffering from low lighting and non-orthogonal lines, on the left using the visual gyroscope presented in Chapter 4, and on the right using the visual gyroscope based on the Probabilistic Hough Transform.

The experiment shows increased tolerance
for vanishing point computation because now the computation process is not dependent on visual perception only, but receives a priori information of the user attitude
from the IMU.
Although the vanishing point detection method is dependent on the IMU, the method
is tolerant to large errors in the IMU measurements when the parameters of the Probabilistic Hough Transform algorithm are carefully selected. Figure 6.19 shows how
an estimated vanishing point (green) resulting from large temporary errors in IMU measurements is corrected (red) through the line detection presented.
The method gives also promising results for turn detection that has so far been the
most significant obstacle preventing the use of vision-aided inertial sensors autonomously for navigation in unknown indoor environments. In turning situations the estimated vanishing point obtained by propagating the attitude using the method falls
outside the image at the same time as the corrected vanishing point obtained from the
Probabilistic Hough Transform detection is found from the other side of the image as
shown in Figure 6.20. This is due to the change of the World Frame, i.e. the visual
gyroscope was initialized at the beginning as having a zero heading when the camera
frame is totally aligned with the world frame and now the world frame’s horizontal
axes are rotated 90 degrees. When this contradiction is used in integration, at least
the existence of a steep turn is observed. Observing the magnitude of the turn is a
future research objective.
Fig. 6.19. Detected vanishing point (red) may be used to correct large errors in IMU measurements resulting in erroneous estimated vanishing point location (green).
Fig. 6.20. Conflict between estimated and detected vanishing point locations indicates the
existence of a steep turn.
7. VISION-AIDED CARRIER PHASE NAVIGATION
Navigation in urban areas is challenging for GNSS. Line-of-sight signals are blocked
by tall buildings and therefore, the requirement for measurements from at least four
satellites is not fulfilled and consequently a position solution is not obtained. Even
when the solution is obtained, multipath effects deteriorate the accuracy of the position. In this chapter the visual gyroscope and a version of the visual odometer are
used for aiding GNSS measurements in such areas.
The visual gyroscope is also suitable for urban environments, which consist of countless straight lines, i.e. the edges of buildings and roads. The limitation of the visual gyroscope is its need for absolute heading information that has to be updated during navigation, due to its inability to monitor the magnitude of sharp turns and due to calculation errors arising from problems in visual perception. However, the calibration needs to be done only occasionally, and in between the correct heading is maintained by propagating the absolute heading using the visual gyroscope's measurements. Although the likelihood of observing at least the four satellites needed for resolving the
user position is reduced, it is still possible occasionally even in an urban canyon and
therefore the visual gyroscope and GNSS positioning complement each other in these
challenging environments. The translation obtained from the homography constraining consecutive images has an ambiguous scale that was resolved earlier in this thesis
using a special configuration of the camera. In this chapter an alternative method is
presented, namely the scale is observed using differenced carrier phase GNSS measurements. As the carrier phase measurements are differenced, the need to resolve the
ambiguous integer number of the satellites’ carrier phase cycles is avoided. Because
only the scale, i.e. the total magnitude of translation between two time epochs is
needed, using two satellites with the proper geometry is enough and therefore the
method is feasible also for a dense urban environment. Below, the method is described in detail, then the verification of the method in a sub-urban environment is
discussed and finally an experiment testing the method in a dense urban canyon is presented.
7.1 Ambiguity Resolution Using Differenced GNSS Carrier Phase Measurements
Time-differenced satellite carrier phase measurements provide information about the
magnitude of translation of the user between two time epochs which may be used
for resolving the ambiguous scale in the translation obtained from images. Because
only the magnitude of the translation and the receiver clock error are needed, acquisition of two satellites is enough. The idea has been approached earlier by [99]. The
method was developed for robot navigation and visual measurements utilized from
three cameras. The pitch and roll of the camera were obtained from an IMU. As the
heading was not observed accurately using an IMU, it was included to the algorithm
as an unknown and therefore observations from at least three satellites were needed.
In the test lasting for 100 seconds centimeter level accuracy was obtained, which
decreased into decimeter level when the receiver clock was calibrated and only two
satellites acquired. A similar method for vehicular navigation was developed in [23]
utilizing again an IMU for attitude, GNSS and vision-aiding. In the experiments at
least three satellites were observed and meter level horizontal position accuracy was
obtained. The method presented in this thesis is aimed at pedestrian navigation where
the possible amount and size of equipment is limited. When the relative heading and translation information obtained from images is initialized with the absolute position and heading information provided by GNSS, the user position may be propagated
and only occasional absolute updates are needed for a functional navigation solution.
The carrier phase observation (ϕi ) for the satellite i may be represented using a simplified form as
$$
\varphi^i = r^i + c\,dt_{rcvr} + \lambda N + \eta^i + \varepsilon^i_{\varphi} \qquad (98)
$$
where ri is the true range between the satellite and the receiver, cdtrcvr is the speed
of light times the difference between the receiver clock and satellite clock errors
with respect to GPS time, λ is the carrier wavelength, N is the integer ambiguity,
η i is an error term incorporating ionospheric, tropospheric and satellite orbital errors
and the error term εiϕ is the combined effect of multipath and noise. The equation
assumes that the satellite clock error is already compensated for. The carrier phase
measurements obtained at two time epochs (t1 , t2 ) are differentiated and the resulting
measurement is
$$
\Delta\varphi^i = \varphi^i(t_2) - \varphi^i(t_1) = \Delta r^i + c\,\Delta dt_{rcvr} + \varepsilon^i_{\varphi}. \qquad (99)
$$
The integer ambiguity term is removed by differencing the carrier phase observations over time, and the change in the term encompassing the errors stays below the centimeter-per-second level and is therefore omitted [99]. The differenced range $\Delta r^i$ may
further be expressed [23] as
$$
\begin{aligned}
\Delta r^i &= (T_i(t_2) - T_{rcvr}(t_2)) \cdot u(t_2) - (T_i(t_1) - T_{rcvr}(t_1)) \cdot u(t_1) \\
&= (T_i(t_2) \cdot u(t_2)) - (T_i(t_1) \cdot u(t_1)) - (\Delta T_{rcvr}(t_2) \cdot u(t_2)) - (T_{rcvr}(t_1) \cdot \Delta u(t_2))
\end{aligned} \qquad (100)
$$

where $(\cdot)$ denotes a vector dot product, $T_i$ is the satellite position vector, $T_{rcvr}$ the receiver position vector and $u$ the unit vector from the user to the satellite, calculated as

$$
u = \frac{T_i - T_{rcvr}}{|T_i - T_{rcvr}|}. \qquad (101)
$$
The term $(T_i(t_2) \cdot u(t_2)) - (T_i(t_1) \cdot u(t_1))$ is called the satellite Doppler term ($DOPP_i$); it arises from the motion of the satellite between the two time epochs and may be derived from the satellite observations. The term $(T_{rcvr}(t_1)) \cdot \Delta u(t_2)$ expresses the change in the user unit vector (the line-of-sight unit vector) and is called the geometry change ($u_{GC}$). As the geometry and satellite Doppler change terms
employ the user position at the second time epoch that is not known yet but will be
obtained from the calculations, an estimate of the position is used. The estimate has
to be accurate only to within 100 meters [99]. Now the differenced carrier phase
measurement (99) may be presented as
$$
\Delta\varphi^i = -(\Delta T_{rcvr}(t_2) \cdot u(t_2)) + DOPP_i - u_{GC}. \qquad (102)
$$
By rearranging the terms, an equation for resolving the magnitude of the user translation $\Delta T_{rcvr}(t_2)$ between the time epochs $(t_1, t_2)$ is obtained from

$$
\Delta\varphi^{corr}_i = \Delta\varphi^i - DOPP_i + u_{GC} = -(\Delta T_{rcvr}(t_2) \cdot u(t_2)). \qquad (103)
$$
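The corrections of (99)-(103) can be sketched in a few lines of Python; the satellite and user positions below are placeholders, not experiment data:

```python
import numpy as np

def unit_vector(T_sat, T_rcvr):
    """Line-of-sight unit vector from the user to the satellite, (101)."""
    d = T_sat - T_rcvr
    return d / np.linalg.norm(d)

def corrected_delta_phase(dphi, T_sat_t1, T_sat_t2, T_rcvr_est):
    """Apply the satellite Doppler and geometry-change corrections of (103).
    A single approximate user position (accurate to within roughly 100 m) is
    used for both epochs in this sketch."""
    u1 = unit_vector(T_sat_t1, T_rcvr_est)
    u2 = unit_vector(T_sat_t2, T_rcvr_est)
    dopp = T_sat_t2 @ u2 - T_sat_t1 @ u1       # satellite Doppler term DOPP_i
    u_gc = T_rcvr_est @ (u2 - u1)              # geometry change term u_GC
    return dphi - dopp + u_gc

# Made-up ECEF positions (m) and a differenced carrier phase measurement (m).
sat_t1 = np.array([15600e3, 7540e3, 20140e3])
sat_t2 = np.array([15610e3, 7530e3, 20145e3])
user = np.array([2881e3, 1342e3, 5510e3])
dphi_corr = corrected_delta_phase(1.25, sat_t1, sat_t2, user)
```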
The equation expressing the user translation ∆Trcvr (t2 ) has three unknowns, namely
the translation in x-, y- and z-axis directions. As the receiver clock error has to be
also resolved, four satellites would be required to obtain a solution, which would
not necessarily be always feasible in the dense urban environments. However, images provide information about the user translation between two time epochs, i.e.
the times of capturing the two consecutive images. The translation of the user with
an ambiguous scale may be obtained using the epipolarity constraint and the Fundamental matrix arising from it, discussed in Chapter 3.
7.1.1 Ambiguous Translation Using the Fundamental Matrix
The fundamental matrix F defined by (31) may be computed, given sufficiently many matching image points (x', x), using a linear algorithm [38], which is the approach used in this thesis; more robust algorithms for the computation are also presented in the reference. The epipolar geometry is here defined to be affine; the difference of an affine geometry compared with a projective geometry is that the camera centres are defined to lie at infinity, and therefore the projection from scene to image is parallel. The affine Fundamental matrix is

F = \begin{bmatrix} 0 & 0 & a \\ 0 & 0 & b \\ c & d & e \end{bmatrix}.    (104)
Now, each point correspondence (x', x) may be represented as

(x'_i,\ y'_i,\ x_i,\ y_i,\ 1)\,f = 0, \quad \text{i.e.} \quad
\begin{bmatrix} x'_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_n & y'_n & x_n & y_n & 1 \end{bmatrix} f = 0    (105)

when n matching image points in the two consecutive images are found and f = (a, b, c, d, e)^T. At least four corresponding points are needed, but when there are more, as is usually the case especially in outdoor environments with favorable lighting conditions, the singular value decomposition is used. The fundamental matrix F
may further be transformed into the Essential matrix E, also discussed in Chapter 3, using the camera calibration matrix K (assumed constant between the images) and the camera motion [76] as

E = K^T F K = [t]_\times R.    (106)
The term [t]_\times denotes the skew-symmetric matrix of the translation vector (t_x, t_y, t_z)^T and is

[t]_\times = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix}.    (107)
The singular value decomposition of the Essential matrix E gives E ∼ U D V^T, and the ambiguous translation obtained from U is t ∼ (u_{13}, u_{23}, u_{33})^T, where u_{ij} denotes the element of the matrix U on the i-th row and j-th column.
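As an illustration of (104)–(106) and the translation extraction, the sketch below (Python/NumPy; a hedged outline with hypothetical function names, not the thesis code) solves the affine fundamental matrix from point correspondences with a singular value decomposition, forms the essential matrix and reads the ambiguous translation off the left singular vectors of E.

import numpy as np

def affine_fundamental(pts, pts_prime):
    """Affine fundamental matrix from n >= 4 correspondences, eqs. (104)-(105).

    pts, pts_prime: (n, 2) arrays of matched points (x, y) and (x', y').
    """
    x, y = pts[:, 0], pts[:, 1]
    xp, yp = pts_prime[:, 0], pts_prime[:, 1]
    A = np.column_stack([xp, yp, x, y, np.ones(len(x))])  # rows (x'_i, y'_i, x_i, y_i, 1)
    # f = (a, b, c, d, e)^T is the right singular vector of the smallest singular value
    a, b, c, d, e = np.linalg.svd(A)[2][-1]
    return np.array([[0.0, 0.0, a],
                     [0.0, 0.0, b],
                     [c,   d,   e]])

def ambiguous_translation(F, K):
    """Translation direction with ambiguous scale and sign, eq. (106)."""
    E = K.T @ F @ K                 # essential matrix, calibration K assumed constant
    U, _, _ = np.linalg.svd(E)
    return U[:, 2]                  # (u13, u23, u33): column of U for the smallest singular value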
7.1.2 Navigation Solution Incorporating the Absolute User Translation
The user translation ΔT_rcvr(t_2) is perceived in the navigation frame but the visual translation is in the camera frame. In order to obtain the ambiguous scale of the visual translation and turn it into a position change of the user, again in the navigation frame, transformations have to be made. The visual gyroscope presented in Chapter 4 provides information about the attitude of the user with respect to the navigation frame. A Direction Cosine Matrix C_n^b transforming the observations from the navigation frame to the camera frame is formed using the heading, pitch and roll measurements obtained from the visual gyroscope as explained in Chapter 6. For resolving the scale, both the unit line-of-sight vector u(t_2) and the user translation vector ΔT_rcvr(t_2) have to be multiplied by the direction cosine matrix C_n^b. The user translation vector ΔT_rcvr(t_2) may be written using the still unknown scalar scale s of the visual translation and the visual translation vector t as ΔT_rcvr(t_2) = C_n^b s t. Now, (103) can be re-written as

\Delta\phi^{corr}_i = \Delta\phi^i - DOPP^i + u_{GC} = -(C_n^b u(t_2) \cdot C_n^b\, t\, s)    (108)
and in matrix form

y = \begin{bmatrix} \Delta\phi^{corr}_1 \\ \vdots \\ \Delta\phi^{corr}_N \end{bmatrix}, \qquad
H = \begin{bmatrix} -(C_n^b u_1^T(t_2)) \cdot (C_n^b t) & 1 \\ \vdots & \vdots \\ -(C_n^b u_N^T(t_2)) \cdot (C_n^b t) & 1 \end{bmatrix}, \qquad
x = \begin{bmatrix} s \\ c\,\Delta dt_{rcvr} \end{bmatrix}    (109)

from which the ambiguous scale of translation s may be obtained using the least-squares equation x = (H^T H)^{-1} H^T y.
Occasionally, especially when only two satellites are observed, the scale computation fails. Such errors are detected by monitoring the magnitude of the absolute user speed obtained: when the speed exceeds 3 m/s, the measurement is discarded and the previous one is used instead.
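The scale solution of (109), together with the speed check above, reduces to a small least-squares problem. A minimal Python/NumPy sketch is given below; it is an assumption-laden illustration (hypothetical function and variable names, an assumed 1 s interval between the image epochs), not the thesis implementation.

import numpy as np

def solve_visual_scale(dphi_corr, los_units, t_cam, C_nb, dt=1.0, max_speed=3.0):
    """Ambiguous scale s and clock term from eq. (109) by least squares.

    dphi_corr : (N,) corrected phase differences, eq. (103), one per satellite
    los_units : (N, 3) unit line-of-sight vectors u_i(t2) in the navigation frame
    t_cam     : (3,) ambiguous visual translation in the camera frame
    C_nb      : (3, 3) DCM rotating navigation-frame vectors into the camera frame
    Returns (s, clock_term), or None when the 3 m/s speed check fails.
    """
    t_rot = C_nb @ t_cam
    # Each row of H is [-(C u_i(t2)) . (C t), 1], as in eq. (109); N >= 2 suffices.
    H = np.column_stack([-(los_units @ C_nb.T) @ t_rot,
                         np.ones(len(dphi_corr))])
    y = np.asarray(dphi_corr, dtype=float)
    x, *_ = np.linalg.lstsq(H, y, rcond=None)     # x = (H^T H)^{-1} H^T y
    s, clock_term = x
    if abs(s) * np.linalg.norm(t_cam) / dt > max_speed:
        return None                               # discard; the previous measurement is kept
    return s, clock_term

In a complete implementation the recovered s·t, rotated back into the navigation frame, would then feed the speed of the Kalman filter described next.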
The translation is again transformed into the navigation frame (ENU) and used in a Kalman filter propagating the user position (X, Y), heading (θ) and speed (S). The speed is obtained from the translation computed as discussed above and the heading from the visual gyroscope and occasional absolute heading updates as discussed below. The Kalman filter models the user position as

X_{k+1} = X_k + S\sin(\theta_k)\Delta t
Y_{k+1} = Y_k + S\cos(\theta_k)\Delta t    (110)
discussed in more detail in [66]. The state vector x_k and measurement vector z_k for the model are

x_k = \begin{bmatrix} X_k \\ Y_k \\ \theta \\ S \end{bmatrix}, \qquad
z_k = \begin{bmatrix} X \\ Y \\ \theta \\ S \end{bmatrix}    (111)
and state transition matrix Φ and process noise (Q) matrices

\Phi_k = \begin{bmatrix}
1 & 0 & 0 & \sin(\theta_k)\Delta t \\
0 & 1 & 0 & \cos(\theta_k)\Delta t \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}    (112)
Q_k = \begin{bmatrix}
q_1\Delta t + (a^2 q_3 + b^2 q_4)\frac{\Delta t^3}{3} & (a c q_3 + b d q_4)\frac{\Delta t^3}{3} & a q_3 \frac{\Delta t^2}{2} & b q_4 \frac{\Delta t^2}{2} \\
(a c q_3 + b d q_4)\frac{\Delta t^3}{3} & q_2\Delta t + (c^2 q_3 + d^2 q_4)\frac{\Delta t^3}{3} & c q_3 \frac{\Delta t^2}{2} & d q_4 \frac{\Delta t^2}{2} \\
a q_3 \frac{\Delta t^2}{2} & c q_3 \frac{\Delta t^2}{2} & q_3 \Delta t & 0 \\
b q_4 \frac{\Delta t^2}{2} & d q_4 \frac{\Delta t^2}{2} & 0 & q_4 \Delta t
\end{bmatrix}    (113)
where q_1 is the spectral density for the position North component (X), q_2 for the East component (Y), q_3 the spectral density for the heading and q_4 for the speed. In the following sections, two experiments using the method discussed above are described and assessed.
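For reference, the prediction step implied by (110)–(113) is summarized in the Python/NumPy sketch below; it is only an illustration under the assumption that the process noise matrix Q has already been assembled as in (113).

import numpy as np

def kf_predict(x, P, Q, dt):
    """Kalman filter prediction for the state (X, Y, theta, S), eqs. (110) and (112).

    x : (4,) state vector, P : (4, 4) covariance, Q : (4, 4) process noise of eq. (113).
    """
    X, Y, theta, S = x
    Phi = np.array([[1.0, 0.0, 0.0, np.sin(theta) * dt],
                    [0.0, 1.0, 0.0, np.cos(theta) * dt],
                    [0.0, 0.0, 1.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])
    x_pred = np.array([X + S * np.sin(theta) * dt,   # eq. (110)
                       Y + S * np.cos(theta) * dt,
                       theta,
                       S])
    P_pred = Phi @ P @ Phi.T + Q
    return x_pred, P_pred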
7.2 Method Verification in a Sub-Urban Environment
Although the method of vision-aided carrier phase navigation is designed for urban positioning, performance verification was first performed in an easier signal environment, namely one with lower buildings blocking out only satellites with an elevation below 30 degrees. The test setup consisted of a NovAtel SPAN-SE GPS/GLONASS receiver providing carrier phase measurements and a GoPro camera for visual measurements. A Northrop Grumman tactical grade LCI IMU and the SPAN were used for acquiring the reference solution as well as for initializing the user position and heading at the beginning of the experiment and after every three minutes of navigation. The system was carried in a backpack as shown in Figure 7.1. The camera and GNSS receiver were attached to the top of the system and are indicated with a red circle in the figure. After initialization, the user position was obtained by propagating the heading and speed measurements using the Kalman filter explained above. The data was collected for 15 minutes and post-processed using Matlab.
As the purpose of the verification experiment was to test the feasibility of the system in extreme signal conditions, the vision-aided carrier phase navigation solution was computed using only two satellites that were available throughout the experiment. Because the heading obtained from the visual gyroscope suffers from occasional errors in the vanishing point calculation and sharp turns cannot be observed, the position and heading were re-initialized every three minutes using the reference system. This is justified by the fact that even in a dense urban area the requirement of four satellites is fulfilled once in a while, as is also seen in the real urban experiment described below.
Fig. 7.1. Setup for data collection used in verification of vision-aided carrier phase navigation.
Table 7.1. Positioning verification error statistics using vision-aided carrier phase (VA)

Statistics   min error (m)   max error (m)   mean error (m)   std of error (m)
VA           0               76              24               18
Figure 7.2 shows the path obtained using the vision-aided carrier phase navigation (red) compared to the ground truth (blue). The light red lines show the effect of the position correction applied every three minutes. Table 7.1 shows the horizontal position error statistics. The fairly large mean position error, namely 24 meters, resulted from the difficulty of obtaining an accurate heading solution with the visual gyroscope in this fairly open sub-urban environment, which lacks the straight lines provided by the high-rise buildings of urban canyons. This was anticipated to improve when the method is tested in an urban canyon. The length of the obtained path agreed with the ground truth, and therefore the visual odometer utilizing the carrier phase measurements was shown to provide promising results.
Fig. 7.2. Position solution verification using vision-aided carrier phase navigation (position
red dot, path red line) and compared to ground truth (blue) in a sub-urban environment in Calgary, shown in Google Earth.
7.3 Vision-Aided GNSS Navigation in an Urban Environment
In order to show the advantages and limitations of the method in severe urban canyons
using an unaided standard receiver, a test was carried out in downtown Calgary as
shown in Figure 7.3. This canyon encompasses tall and reflecting buildings heavily
blocking the satellite signals and/or causing multipath. The test duration was 25
minutes. The user position and heading were initialized using the reference system
described in the previous section. Then, the user position was propagated using the
Kalman filter described above and the following procedure. The Novatel SPAN-SE
GPS/GLONASS receiver was used to acquire the pseudorange (L1), carrier phase
(L1) and Doppler measurements as well as the GNSS navigation message. Only
GPS measurements were used in the processing. The GoPro camera was used for capturing a video stream that was sampled at 10 Hz. All equipment was attached to a backpack carried by the user as in the experiment above. When four or more satellites were acquired, the user position was computed using the pseudorange measurements and the least-squares method described in Chapter 2. As may be seen below, the GPS measurements are heavily degraded in such challenging environments and therefore the quality of the position solution was monitored by examining the least-squares residuals. The residual r is computed after the final least-squares solution is obtained and it expresses the difference between the anticipated and obtained measurements. The residual vector is calculated using the pseudorange measurement vector z, the geometry matrix H and the estimated user position vector x̂ as

r = z - H\hat{x}.    (114)
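The residual screening used below (a 20 m threshold, as stated in the text) is simple to express; the following Python/NumPy sketch is a hedged illustration with hypothetical names, not the thesis code.

import numpy as np

def position_is_reliable(z, H, x_hat, threshold=20.0):
    """Residual test of eq. (114): flag the GPS fix as unreliable if any
    pseudorange residual exceeds the threshold (20 m in the experiment)."""
    r = z - H @ x_hat
    return np.max(np.abs(r)) <= threshold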
Fig. 7.3. Calgary downtown environment used for vision-aided GNSS navigation.
When the residual of any satellite i exceeded a threshold (herein 20 m, selected by
experimentation), the GPS solution was discarded and a vision-aided carrier phase
solution was computed instead using the position and heading from the previous
epoch in the state vector for initialization. When the consecutive epochs provided
successful GPS position solutions, the heading between the epochs was computed. As the error in a GPS-derived position using pseudoranges is commonly a few meters even in a favorable environment, the heading computed from two consecutive epochs with a time interval of only one second would be erroneous in most cases. Therefore, the heading was computed using the longest interval of successful positions, however not exceeding 10 epochs. The heading was computed using the latitude (φ) and longitude (λ) of the position at the first time epoch (1) and the last epoch used (n) as [110]
\theta = \mathrm{mod}\big(\arctan2(\sin(\lambda_n - \lambda_1)\cos(\varphi_n),\ \cos(\varphi_1)\sin(\varphi_n) - \sin(\varphi_1)\cos(\varphi_n)\cos(\lambda_n - \lambda_1)),\ 2\pi\big).    (115)
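A direct transcription of (115), with angles in radians (Python; purely illustrative, names are assumptions):

import math

def heading_from_positions(lat1, lon1, lat_n, lon_n):
    """Bearing from the first to the last GPS position, eq. (115); angles in radians."""
    dlon = lon_n - lon1
    y = math.sin(dlon) * math.cos(lat_n)
    x = math.cos(lat1) * math.sin(lat_n) - math.sin(lat1) * math.cos(lat_n) * math.cos(dlon)
    return math.atan2(y, x) % (2.0 * math.pi)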
When less than four satellites were found, the translation and heading of the user
were computed using the visual gyroscope and the visual odometer presented in this
chapter and the position propagated using the Kalman filter. Figure 7.4 shows the
number of satellites obtained for each time epoch in the experiment (blue). As the experiment was started in an open area, up to 9 GPS satellites were occasionally used. As the path proceeded into the urban canyon the number of satellites used decreased, dropping below four towards the end of the data set. The number of observed satellites does not, however, directly reflect how many satellites were available for the position computation. The figure also shows the epochs when the obtained position accuracy was deemed unreliable based on the residuals and a vision-aided carrier phase solution was used instead (red above the blue mark).
Fig. 7.4. Number of satellites acquired in an urban canyon for each time epoch of the experiment (blue) and the epochs when vision-aided carrier phase navigation was used due to too few satellites or large residuals in the GNSS pseudorange position estimation (red above the blue mark).
Figure 7.5 shows the path obtained using vision-aided carrier phase navigation (blue), the GPS-only solution (green) and the ground reference (red). As the GPS-only positions deviate strongly from the reference at the end of the data set, the figure has been zoomed to show the vision-aided solution results more clearly. The vision-aided navigation solution provided fairly accurate results at the beginning of the experiment but deteriorated as the user entered the urban canyon and obtained poor GPS measurements, resulting in 200 meters of error in the worst case.
Figure 7.6 shows the horizontal errors of the vision-aided carrier phase navigation and GPS-only solutions. Again, the errors remained low at the beginning of the experiment but grew deeper inside the urban canyon. The figure also shows the main reason for the error growth: the GPS position computation occasionally produced low range residual values, and the solution was therefore used although it was erroneous, significantly deteriorating the GPS position. The figure is zoomed to exclude the largest GPS-only errors, enabling closer examination of the vision-aided solution errors.
Fig. 7.5. Position solution in an urban canyon using the ground truth (red), GPS-only (green) and vision-aided carrier phase navigation (blue).
Heavily multipath-affected observations are difficult to discard by assessing range residuals only, especially when the measurement redundancy is low. Better filtering and observation selection for the obtained GPS position should be developed for a more accurate vision-aided navigation solution. Table 7.2 shows the statistics for the horizontal position errors, namely a mean error of 25 meters with a standard deviation of 48 meters. However, already in this simple implementation, the
vision-aided carrier phase method improves the navigation solution significantly, as may be seen from the table, which also shows the corresponding horizontal position error statistics for a GPS-only solution based on the pseudoranges (mean error 73 meters, standard deviation 1241 meters). This positive effect of vision-aiding may also be seen by comparing the navigation path obtained using the vision-aided carrier phase navigation, shown in Figure 7.7, with the one obtained using the GPS pseudorange measurements only, propagated using a simple Kalman filter with no error detection for the obtained position, shown in Figure 7.8. Both figures show the obtained position solution (green dots), its path (red line) and the ground truth (blue).
Fig. 7.6. Horizontal position error in an urban canyon obtained using vision-aided carrier
phase navigation and GPS-only.
Table 7.2. Positioning error statistics using vision-aided carrier phase (VA) and GPS only (GPS)

Statistics   min error (m)   max error (m)   mean error (m)   std of error (m)
VA           0.4             200             25               48
GPS          0.1             4015            73               1241
As anticipated, performance, especially when using GPS alone, was significantly degraded in the urban canyon. Due to the frequent unavailability and large errors of the GPS positions, the vision-aided solution also suffered significantly. As the position and heading from GPS-only were already erroneous when the navigation solution computation switched to the vision-aided carrier phase method, the user position remained poor despite the addition of accurate visual measurements. If other sensors were added to aid GPS, better error detection for the GPS measurements would be possible, in which case vision-aided carrier phase navigation would provide significantly better results. The absolute heading could be obtained by using a 3D magnetometer. In urban canyons, magnetometer heading measurements deteriorate due to magnetic
Fig. 7.7. Position solution using vision-aided carrier phase navigation (position green dot, path red line) compared to ground truth (blue) in an urban canyon in downtown Calgary, shown in Google Earth.
disturbances arising from nearby ferrous material. However, techniques to mitigate these errors, especially when magnetometer measurements are combined with 3D accelerometers and rate gyros, are emerging [4]. Inertial sensors and barometers would also aid the error detection process, thereby helping to eliminate large GPS measurement errors.
Fig. 7.8. Position solution using GPS measurements only (position green dot, path red line) compared to ground truth (blue) in an urban canyon in downtown Calgary, shown in Google Earth.
8. CONCLUSIONS
There is a strong need for enhanced pedestrian navigation systems for improved safety and to enhance everyday life. First responders, electronic monitoring (i.e.
monitoring of people such as dangerous offenders) and military personnel operate
in challenging situations and need a system that is available in all environments.
Moreover, general users need precise indoor navigation to locate specific rooms in
buildings and to use location based applications. Pedestrian navigation is mainly
needed indoors and in urban environments. Although indoor and urban navigation
has been an active research subject for years, no unique navigation system addressing
all needs has yet been developed with a level of performance similar to that of GNSS
in the outdoors. A pedestrian navigation system has to be light and small in size and have low power consumption and price, in addition to performing well in all environments. Therefore it should preferably be independent of specific indoor infrastructure such as RF access points.
So far the most promising approach for pedestrian navigation is the fusion of many different sensors and positioning systems, the most widely used being self-contained sensors, GNSS and WLAN. The performance of GNSS is degraded indoors and in urban canyons, WLAN needs an a priori prepared infrastructure, and the errors in self-contained sensors result in position solutions that drift in time and become distorted.
Hence, other means for augmenting or replacing some of these methods have to be
used. Vision-aiding is a feasible method in many environments because it is affected
by error sources which are different from those of other navigation technologies.
Consecutive images provide relative information about the attitude and translation of
the camera, which can be further transformed into user heading and position information. In favorable environments and circumstances this information obtained from
the images results in a much more accurate and available navigation solution when
integrated with other measurements; it can also be used for stand-alone navigation
for short periods of time if the absolute position and heading are known at the beginning of these periods. However, the vision-aiding solution suffers from errors due to low lighting or scenes unsuitable for visual perception, especially those including moving objects (e.g. humans, vehicles). Therefore vision-aided systems need occasional calibration from an absolute navigation system.
In this thesis, new tools for vision-aiding navigation solutions were developed and tested, namely a concept called the visual gyroscope for observing the user heading and a visual odometer for the translation. Both methods provide user displacement information by monitoring the motion of features in consecutive images captured by a camera carried by the user. Different methods were investigated for combining the above measurements using Kalman filtering, and vision-aided navigation systems were obtained. Different variations of the Kalman filter were selected because they are the prevailing means for observation integration and estimation in the navigation field. However, better performance might be obtained using more sophisticated algorithms, e.g. particle filters. The next section discusses the main results.
Despite the challenges in vision-aiding arising from the occasional unsuitability of
the urban navigation environment for visual processing as well as difficulties related
to computer vision algorithms, vision-aiding improves navigation accuracy, availability, reliability and continuity.
8.1 Main Results
The visual gyroscope tracks the motion of virtual points, called vanishing points, arising from the projective geometry mapping the parallel lines in a three-dimensional scene into a two-dimensional image. The change in the location of the vanishing points can further be transformed into the attitude of the user, and therefore the method is called the visual gyroscope. In an ideal situation, where there is sufficient and constant light, the environment has a favorable structure and does not contain any dynamic objects, a visual gyroscope processing images from a static camera produces a user attitude with almost no error compared to a reference system. In a situation where the lighting is not the best possible and dynamic objects disturb the vanishing point perception, a static visual gyroscope still performs better than a common MEMS gyroscope in a performance test. In a real navigation situation the conditions vary a lot and deteriorate the visual perception, and therefore careful error detection is needed. An error detection algorithm suitable for pedestrian navigation, namely
a method computing a quantity called LDOP that monitors the reliability of the visual gyroscope's attitude measurements based on the geometry of the lines used, was developed. When the visual gyroscope's measurements passing the error detection were used as updates for a Kalman filter propagating the angular velocity and acceleration provided by an IMU, the obtained user attitude improved significantly. When the user heading was occasionally calibrated using the building layout, the attitude improved by 93%, and when the visual gyroscope's heading change measurements were used with no calibration, an attitude improvement of 40% was obtained in the experiment.
The visual gyroscope is incapable of detecting the magnitude of sharp turns, namely when the turn is close to 90 degrees or more. This was addressed by developing a method using the IMU attitude measurements to aid the vanishing point detection, which also makes the procedure more accurate. So far only the detection of sharp turns has been implemented; whether it is possible to also observe the magnitude of the turns is a task for future research. At the same time, the algorithm of IMU-aided vanishing point detection was found to decrease the computational burden of the line detection needed in the visual gyroscope method, therefore making real-time performance of the navigation solution possible.
The visual odometer identified the matching feature points in consecutive images and used the homography relation to observe the translation of the user, i.e. the distance travelled. As the translation obtained from the images has an ambiguous scale, two different approaches for resolving it were developed. A visual odometer feasible for indoor navigation resolved the scale by using the known, special configuration of the camera. As the height of the camera was known a priori and kept sufficiently static (vertical motion of ±10 cm was still found to provide accurate results) and the pitch of the camera was obtained using the visual gyroscope, basic geometry could be used to compute the depth of the objects found in the image and therefore also the scale. As the method observed the attitude of the camera using the visual gyroscope and the camera was kept static, only the horizontal translation was left to be resolved, reducing the number of matching image points needed, a profitable feature especially for indoor navigation. For the special configuration to be useful, the image points had to lie on the floor. However, again because the attitude of the camera was obtained using the visual gyroscope, the degeneracy problem arising from using image points all lying on a plane was avoided. The performance of the visual odometer was evaluated by
looking at the agreement between the translation obtained using the method and the ground truth, and it was found to be over 90% in all experiments. The visual gyroscope and odometer were integrated with a multi-sensor multi-network system and tested in an office corridor with a configuration having a workable WLAN positioning solution and using an outdated WLAN radio map. The vision-aided fused solution was found to improve the mean error of the user position by 1.5 to 2 meters. The visual gyroscope and odometer were also tested as a stand-alone system in more challenging environments, namely in a shopping mall and an urban canyon, also resulting in improved position accuracy.
In urban canyons GNSS measurements are typically available, however deteriorated. A method utilizing GNSS carrier phase observations for resolving the scale problem in the translation was developed. After observing the ambiguous translation from the consecutive images, the scale was obtained by looking at the time-differenced carrier phase measurements from the satellites observed at the time epochs of capturing the images. As the carrier phase measurements were differenced, the integer ambiguity in the carrier phase measurements was cancelled. Because only the magnitude of the translation and the receiver clock error needed to be resolved, tracking two satellites was enough. By integrating the visual gyroscope's and odometer's measurements using a Kalman filter, an initial position could be propagated and a resulting navigation solution obtained. The method was first tested in a sub-urban environment, propagating only the orientation from the visual gyroscope and the translation from the odometer and solving the scale of the translation from differenced GNSS carrier phase measurements of two satellites. Since errors in the visual perception deteriorate the position, the solution was calibrated using the reference system every three minutes, and a mean error of 24 meters was obtained in the 15-minute experiment. In a real urban canyon, GPS was vision-aided by propagating the user heading and position using the visual gyroscope's and odometer's measurements when less than four satellites were observed or the GPS-induced position was distorted based on assessing the solution's residual values. Again, as the visual gyroscope and odometer are dependent on an absolute initialization, and they were re-initialized whenever more than four acceptable satellite observations were available providing a GPS solution, the errors in the GPS position used to calibrate the visual measurements occasionally also deteriorated the vision-aided navigation result. However, the overall position accuracy improved significantly compared to the one obtained using
GPS positioning only, namely the mean error decreased from 73 meters to 25 meters
when the vision-aided GPS carrier-phase based processing was used.
Despite all the challenges in the urban and indoor environments, the visual aiding improved the navigation solution in all the different experiments discussed in the thesis. Already in their present form, when integrated with e.g. an IMU, the visual methods presented in the thesis would provide a solution with enough accuracy for short-term pedestrian navigation. The continuous navigation time could further be increased significantly by using e.g. a floor plan or other absolute positioning means. However, it should be noted that the tests were done using a limited amount of data and therefore future research should assess the performance of the developed methods with data collected over extended durations. Due to its distinctive characteristics, visual
processing complements other positioning technologies in order to provide better
pedestrian navigation accuracy and reliability.
8.2 Future Development
The largest challenge in using the visual gyroscope's measurements for vision-aiding the user attitude is its incapability to observe the magnitude of turns and therefore the need for occasional absolute calibration. This problem was preliminarily addressed by developing a visual gyroscope utilizing the attitude information obtained from an IMU to aid the vanishing point location computation. From the disagreement between the vanishing point locations provided by the IMU and the visual gyroscope, the occurrence of a sharp turn was observed. Future work should address the possibility of observing the magnitude of such a turn and therefore eliminate or
at least decrease the need for calibration during navigation. Utilization of more advanced computer vision methods for vanishing point calculation and their impact on
the visual gyroscope’s accuracy should also be assessed.
As the integration of different systems is beneficial, or even mandatory, for indoor and urban area navigation, error detection for the visual measurements as well as for the observations from the other systems is crucial and should be a topic for further research. Emphasizing the strengths of all systems involved as well as compensating for their deficiencies is a subject for the development of even more functional and seamless integration methods. Though this thesis addressed various challenges in indoor and urban navigation and proposed various visual processing methods for positioning, the matter of ubiquitous pedestrian navigation is by no means yet solved by using vision-aiding, but it is definitely improved.
BIBLIOGRAPHY
[1] GoPro’s web pages. Last accessed June 15, 2012. [Online]. Available:
http://gopro.com/cameras/hd-helmet-hero-camera/
[2] National geophysical data center (NGDC). Last accessed April, 2013.
[Online]. Available: http://www.ngdc.noaa.gov/geomag-web/#igrfwmm
[3] OpenCV open source visual library. Last accessed March 2, 2013. [Online].
Available: http://opencv.org/
[4] M. Afzal, “Use of Earth’s magnetic field for pedestrian navigation,” Ph.D.
dissertation, the University of Calgary, Canada, 2011.
[5] F. Alizahed-Shabdiz and M. Heidari, “Method of determining position in a
hybrid positioning system using a dilution of precision metric,” U.S. Patent
2011/0 080 317, 2009, 19 pages.
[6] D. Allan, “Statistics of atomic frequency standards,” Proc. of the IEEE, vol. 54,
no. 2, pp. 221–230, 1966.
[7] ADIS16375 Data Sheet (Rev. A), Analog Devices, 2010, pp. 28.
[8] H. Aoki, B. Schiele, and A. Pentland, “Realtime personal positioning system
for wearable computers,” in Proc. of the 3rd IEEE ISWC, San Fransisco, CA,
USA, Oct. 18–19, 1999, p. 37.
[9] J. Bancroft, “Multiple inertial measurement unit fusion for pedestrian navigation,” Ph.D. dissertation, the University of Calgary, Canada, 2010.
[10] J. Bancroft and G. Lachapelle, “Data fusion algorithms for multiple inertial
measurement units,” Sensors, vol. 11, pp. 6771–6798, 2011.
[11] J. Bancroft and G. Lachapelle, “Estimating MEMS gyroscope g-sensitivity
errors in foot mounted navigation,” in Proc. UPINLBS, Helsinki, Finland, Oct.
3–4, 2012, p. 6.
[12] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, “Speeded-up robust features
(SURF),” Computer Vision and Image Understanding, vol. 110, pp. 346–359,
2008.
[13] J.-C. Bazin and M. Pollefeys, “3-line ransac for orthogonal vanishing point
detection,” in Proc. IROS, Algarve, Portugal, Oct. 7–12, 2012, pp. 4282–4287.
[14] J. Borkowski and M. Veth, “Passive indoor image-aided inertial attitude estimation using a predictive hough transformation,” in Proc. IEEE/ION Position Location Navigation Symp., Indian Wells, CA, USA, May 4–6, 2010, pp.
295–302.
[15] J.-Y. Bouguet. (2010) Camera calibration toolbox for Matlab. Last accessed March 27, 2013. [Online]. Available: http://www.vision.caltech.edu/bouguetj/calib_doc/index.html
[16] J. Campbell, R. Sukthankar, I. Nourbakhsh, and A. Pahwa, “A robust visual
odometry and precipice detection system using consumer-grade monocular
vision,” in Proc. IEEE International Conference on Robotics and Automation,
Barcelona, Spain, Apr. 18–22, 2005, pp. 3421–3427.
[17] J. F. Canny, “A computational approach to edge detection,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679–
698, 1986.
[18] M. J. Caruso, “Applications of magnetoresistive sensors in navigation systems,” SAE Technical Paper 970602, 1997.
[19] D. Chekhlov, M. Pupilli, W. Mayol-Cuevas, and A. Calway, “Robust real-time
visual SLAM using scale prediction and exemplar based feature description,”
in Proc. IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, Minneapolis, MN, USA, Jun. 18–23, 2007, pp. 1–7.
[20] L. Chen, H. Kuusniemi, Y. Chen, L. Pei, T. Kröger, and R. Chen, “Motion
restricted information filter for indoor bluetooth positioning,” Int. J. of Embedded and Real-Time Communication Systems, vol. 3, no. 3, pp. 54–66, 2012.
[21] R. Chen, H. Kuusniemi, Y. Chen, L. Pei, W. Chen, J. Liu, H. Leppäkoski,
and J. Takala, “Multi-sensor, multi-network positioning,” GPS World, vol. 21,
no. 2, pp. 18–28, 2010.
[22] W. Chen, R. Chen, Y. Chen, H. Kuusniemi, Z. Fu, and J. Wang, “An adaptive
calibration approach for a 2-axis digital compass in a low-cost pedestrian navigation system,” in Proc. I2MTC IEEE, Austin, TX, USA, May 3–6, 2010, pp.
1392–1397.
[23] T. Chu, N. Guo, S. Backén, and D. Akos, “Monocular camera/IMU/GNSS
integration for ground vehicle navigation in challenging GNSS environments,”
Sensors, vol. 12, pp. 3162–3185, 2012.
[24] J. Collin, “Investigations of self-contained sensors for personal navigation,”
Ph.D. dissertation, Tampere University of Technology, Finland, 2006.
[25] J. Coughlan and A. Yuille, “Manhattan World: Compass direction from a
single image by Bayesian inference," in Proc. 7th IEEE International Conference on Computer Vision, Kerkyra, Greece, Sep. 20–27, 1999, pp. 1–10.
[26] A. Davison, “Real-time simultaneous localisation and mapping with a single
camera,” in Proc. IEEE International Conference on Computer Vision, vol. 2,
Nice, France, Oct. 13–16, 2003, pp. 1403–1410.
[27] P. Djuric, J. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. Bugallo, and
J. Miguez, “Particle filtering,” IEEE Signal Processing Magazine, vol. 20,
no. 5, pp. 19–38, 2003.
[28] R. Duda and P. Hart, “Use of the Hough transformation to detect lines and
curves in pictures,” Communications of the ACM, vol. 15, no. 1, pp. 11–15,
1972.
[29] M. Fischler and R. Bolles, “Random sample consensus: a paradigm for model
fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 6, no. 24, pp. 381–395, 1981.
[30] J. Fletcher, M. Veth, and J. Raquet, “Real-time fusion of image and inertial
sensors for navigation,” in Proc. 63rd Annual Meeting of The Institute of Navigation, Cambridge, MA, US, Apr. 23–25, 2007, pp. 534–544.
[31] D. Forsyth and J. Ponce, Computer Vision: A Modern Approach. Upper Saddle River, NJ, USA: Prentice Hall, 2003.
[32] A. Gallagher, “Using vanishing points to correct camera rotation in images,”
in Proc. 2nd Canadian Conference on Computer and Robot Vision, IEEE, Victoria, BC, Canada, May 9–11, 2005, pp. 460–467.
[33] M. Ge, G. Gendt, M. Rothacher, C. Shi, and J. Liu, “Resolution of GPS carrierphase ambiguities in precise point positioning (PPP) with daily observations,”
Journal of Geodesy, vol. 82, pp. 389–399, 2007.
[34] C. George, The Book of Digital Photography, 2nd ed. Cambridge, England: ILEX, 2009.
[35] D. Gerogiannis, C. Nikou, and A. Likas, “Fast and efficient vanishing point detection in indoor images,” in Proc. ICPR, Tsukuba, Japan, Nov. 11–15, 2012,
pp. 3244–3247.
[36] M. Grewal and A. Andrews, Kalman Filtering Theory and Practice using Matlab, 3rd ed. New York, NY, USA: Wiley, 2008.
[37] P. Groves, G. Pulford, C. Littlefield, D. Nash, and C. Mather, “Inertial navigation versus pedestrian dead reckoning: optimizing the integration,” in Proc.
ION GNSS, Fort Worth Convention Center, TX, USA, Sep. 24–25, 2007, pp.
2043–2055.
[38] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision,
2nd ed. Cambridge, UK: Cambridge University Press, 2003.
[39] R. I. Hartley and P. Sturm, “Triangulation,” Computer Vision and Image Understanding, vol. 68, no. 2, pp. 146–157, 1997.
[40] G. Hein, A. Teuber, H.-J. Thierfelder, and A. Wolfe, “GNSS indoors fighting
the fading, part 2,” Inside GNSS, vol. 3, no. 4, pp. 47–53, 2008.
[41] C. Hide, T. Botterill, and M. Andreotti, “Vision-aided IMU for handheld pedestrian navigation,” in Proc. ION GNSS, Portland, OR, USA, Sep. 21–24,
2010, pp. 534–541.
[42] C. Hide, T. Moore, and T. Botterill, “Low cost IMU, GPS, and camera integration for handheld indoor positioning,” in Proc. ION GNSS, Portland, OR,
USA, Sep. 19–23, 2011, pp. 1378–1385.
[43] H. Hile and G. Borriello, “Information overlay for camera phones in indoor
environments,” in LNCS 3rd International Symposium on Location- and Context Awareness, vol. 4718, Oberpfaffenhofen, Germany, Sep. 20–21, 2007, pp.
68–84.
[44] S. Hilsenbeck, A. Möller, R. Huitl, G. Schroth, M. Kranz, and E. Steinbach,
“Scale-preserving long-term visual odometry for indoor navigation,” Sydney,
Australia, Nov. 13–15, 2012, p. 10.
[45] C. Holzmann and M. Hochgatterer, “Measuring distance with mobile phones
using single-camera stereo vision,” in Proc. IEEE 32nd International Conference on Distributed Computing Systems Workshops, Macau, China, Jun.
18–21, 2012, pp. 88–93.
[46] P. Hough, “Method and means for recognizing complex patterns,” U.S. Patent
3 069 654, 1962.
[47] V. Huttunen and R. Piché, “A monocular camera gyroscope,” Department
of Mathematics, Tampere University of Technology, Finland, Tech. Rep. 98,
2011.
[48] Standard Atmosphere, International Organization for Standardization Std. ISO
2533:1975, 1975.
[49] R. Jirawimut, S. Prakoonwit, F. Cecelja, and W. Balachandran, “Visual odometer for pedestrian navigation,” in Proc. IEEE Instrumentation and Measurement Technology Conference, Anchorage, AK, USA, May 21–23, 2002,
pp. 43–48.
[50] S. Julier and J. Uhlmann, “A new extension of the Kalman filter to nonlinear systems,” in Proc. Aerosense: The 11th International Symposium
Aerospace/Defense Sensing, Orlando, FL, USA, Apr. 20–25, 1997, pp. 182–
192.
[51] R. Kalman, “A new approach to linear filtering and prediction problems,”
Transactions of the ASME, vol. 82, pp. 35–45, 1960.
[52] H. Kälviäinen, P. Hirvonen, L. Xu, and E. Oja, “Probabilistic and nonprobabilistic Hough transforms: overview and comparisons,” Image and Vision Computing, vol. 13, pp. 239–252, 1995.
[53] K. Kanatani, “Statistical analysis of geometric computation,” CVGIP: Image
Understanding, vol. 59, no. 3, pp. 286–306, 1991.
[54] E. Kaplan and D. Hegarty, Eds., Understanding GPS Principles and Applications. Norwood, MA, USA: Artech House, 2006.
[55] C. Kessler, C. Ascher, N. Frietsch, and M. W. G. Trommer, “Vision-based
attitude estimation for indoor navigation using vanishing points and lines,” in
Proc. IEEE/ION Position Location Navigation Symp., Indian Wells, CA, USA,
May 4–6, 2010, pp. 310–318.
[56] N. Kiriyati, Y. Eldar, and A. Bruckstein, “A probabilistic Hough Transform,”
Pattern Recognition, vol. 24, no. 4, pp. 303–316, 1991.
[57] M. Kirkko-Jaakkola, J. Collin, and J. Takala, “Bias prediction for MEMS gyroscopes,” IEEE Sensors Journal, vol. 12, no. 6, pp. 2157–2163, 2012.
[58] B. Kitt, J. Rehder, A. Chambers, M. Schönbein, H. Lategahn, and S. Singh,
“Monocular visual odometry using a planar road model to solve scale ambiguity,” in Proc. 5th European Conference on Mobile Robots, Örebro, Sweden,
Sep.7–9, 2011, pp. 43–48.
[59] D. Kocur, J. Rovňáková, and M. Švecová, “Through wall tracking of moving
targets by M-sequence UWB radar,” in Towards Intelligent Engineering and
Information Technology, ser. Studies in Computational Intelligence, I. Rudas,
J. Fodor, and J. Kacprzyk, Eds. Springer, 2009, vol. 243, pp. 349–364.
[60] J. Kosecka and W. Zhang, “Video compass,” in Proc. European Conference on
Computer Vision, Copenhagen, Denmark, May 28–31, 2002, pp. 657–673.
[61] J. Kuipers, Quaternions and Rotation Sequences. Princeton, NJ, USA: Princeton University Press, 1999.
[62] H. Kuusniemi, “User-level reliability and quality monitoring in satellite-based
personal navigation,” Ph.D. dissertation, Tampere University of Technology,
Finland, 2005.
[63] H. Kuusniemi, L. Chen, L. Pei, J. Liu, Y. Chen, L. Ruotsalainen, and R. Chen,
“Evaluation of Bayesian approaches for multi-sensor multi-network seamless
positioning,” in Proc. ION GNSS, Portland, OR, USA, Sep. 19–23, 2011, pp.
2137–2144.
[64] H. Kuusniemi, L. Chen, L. Ruotsalainen, L. Pei, Y. Chen, and R. Chen, “Multisensor multi-network seamless positioning with visual aiding,” in Proc. ICLGNSS, Tampere, Finland, Jun. 29–30, 2011, pp. 146–151.
[65] H. Kuusniemi, Y. Chen, and L. Chen, “Multi-sensor multi-network positioning,” in Ubiquitous positioning and mobile location-based services in smart
phones, R. Chen, Ed. Hershey, MA, USA: IGI Global, 2012, ch. 5.
[66] H. Kuusniemi, J. Liu, L. Pei, Y. Chen, and R. Chen, “Reliability considerations
of multi-sensor multi-network pedestrian navigation," IET Radar, Sonar and
Navigation, vol. 6, no. 3, pp. 157–164, 2012.
[67] G. Lachapelle, M. Cannon, and G. Lu, “High precision GPS navigation with
emphasis on carrier phase ambiguity resolution,” Marine Geodesy, vol. 15,
no. 4, pp. 253–269, 1992.
[68] G. Lachapelle, H. Kuusniemi, D. Dao, G. Macgougan, and M. Cannon, “HSGPS signal analysis and performance under various indoor conditions,” Navigation, vol. 51, no. 1, pp. 29–43, 2004.
[69] X. Li, J. Wang, and W. Ding, “Vision-based positioning with a single camera
and 3D maps: accuracy and reliability analysis,” Journal of Global Positioning
Systems, vol. 10, no. 1, pp. 19–29, 2011.
[70] D. Lowe, “Object recognition from local scale-invariant features,” in Proc. International Conference on Computer Vision, Corfu, Greece, Sep. 20–25, 1999,
pp. 1150–1157.
[71] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[72] L. Ma, Y. Chen, and K. Moore, “Analytical piecewise radial distortion model
for precision camera calibration,” IEE Proc. Vision, Image and Signal Processing, vol. 153, no. 4, pp. 468–474, 2006.
[73] J. Matas, C. Galambos, and J. Kittler, “Robust detection of lines using progressive probabilistic Hough Transform,” Computer Vision and Image Understanding, vol. 78, no. 1, pp. 119–137, 2000.
[74] R. Mautz, “Indoor positioning technologies,” Habilitation Thesis, ETH Zurich,
2012.
[75] P. Misra and P. Enge, Global Positioning System: Signals, Measurements, and
Performance. Lincoln, MA, USA: Ganga-Jamuna Press, 2006.
[76] D. Nistér, “An efficient solution to the five-point relative pose problem,” in
Proc. IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, vol. 2, Madison, WI, USA, Jun. 18–20, 2003, pp. 195–202.
[77] Nokia USA’s product web pages. Nokia. Last accessed April 15, 2012.
[Online]. Available: http://www.nokia.com/us-en/products/phone/n8-00/
[78] Qt application framework. Nokia. Last accessed April 4, 2013. [Online]. Available:
http://qt-project.org/
[79] N. Nourani-Vatani, J. Roberts, and M. Srinivasan, “Practical visual odometry
for car-like vehicles,” in Proc. IEEE International Conference on Robotics and
Automation, Kobe, Japan, May 12–17, 2009, pp. 3551–3557.
[80] J. Parviainen, J. Kantola, and J. Collin, “Differential barometry in personal navigation,” in Proc. IEEE/ION Position Location Navigation Symp.,
Monterey, CA, USA, May 5–8, 2008, pp. 148–152.
[81] L. Pei, R. Chen, J. Liu, H. Kuusniemi, Y. Chen, and T. Tenhunen, “Using
motion-awareness for the 3D indoor personal navigation on a smartphone,” in
Proc. ION GNSS, Portland, OR, USA, Sep. 21–23, 2011, pp. 2906–2913.
[82] D. Prahl and M. Veth, “Coupling vanishing point tracking with inertial navigation to produce drift-free attitude estimates in a structured environment,” in
Proc. ION GNSS, Portland, OR, USA, Sep. 22–24, 2010, pp. 3571–3581.
[83] M. Quddus, W. Ochieng, and R. Noland, "Current map-matching algorithms
for transport applications: state-of-the art and future research directions,”
Transportation Research Part C, vol. 15, pp. 312–328, 2007.
[84] B. Ristic, S. Arulampalm, and N. Gordon, Beyond the Kalman Filter: Particle
Filters for Tracking Applications. Norwood, MA, USA: Artech House, 2004.
[85] D. Robertson and R. Cipolla, “An image-based system for urban navigation,”
in Proc. British Machine Vision Conference, London, UK, Sep. 7–9, 2004, pp.
260–272.
[86] T. Roos, P. Myllymäki, H. Tirri, P. Misikangas, and J. Sievänen, “Probabilistic approach to WLAN user location estimation,” Int. J. Wirel. Inform. Netw.,
vol. 9, pp. 155–164, 2002.
[87] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” in Proc. ECCV, Graz, Austria, May 7–13, 2006, pp. 430–443.
[88] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “Orb: an efficient alternative to sift or surf,” in Proc. ICCV, Barcelona, Spain, Nov. 6–13, 2011, pp.
2564 – 2571.
[89] L. Ruotsalainen, “Visual gyroscope and odometer for pedestrian indoor navigation with a smartphone,” in Proc. ION GNSS, Nashville, TN, USA, Sep.
19–21, 2012, pp. 2422–2431.
[90] L. Ruotsalainen, J. Bancroft, H. Kuusniemi, G. Lachapelle, and R. Chen,
“Utilizing visual measurements for obtaining robust attitude and positioning
for pedestrians,” in Proc. ION GNSS, Nashville, TN, USA, Sep. 19–21, 2012,
pp. 2454–2461.
[91] L. Ruotsalainen, J. Bancroft, and G. Lachapelle, “Mitigation of attitude and
gyro errors through vision aiding,” in Proc. IPIN, Sydney, Australia, Nov. 13–
15, 2012, p. 9.
[92] L. Ruotsalainen, J. Bancroft, G. Lachapelle, H. Kuusniemi, and R. Chen, “Effect of camera characteristics on the accuracy of a visual gyroscope for indoor
pedestrian navigation,” in Proc. UPINLBS, Helsinki, Finland, Oct. 2–4, 2012,
p. 8.
[93] L. Ruotsalainen, H. Kuusniemi, M. Bhuiyan, L. Chen, and R. Chen, “Twodimensional pedestrian navigation solution aided with a visual gyroscope and
a visual odometer,” GPS Solutions, vol. 17, pp. 575–586, 2012. [Online].
Available: DOI: 10.1007/s10291-012-0302-8
[94] L. Ruotsalainen, H. Kuusniemi, and R. Chen, “Heading change detection for
indoor navigation with a smartphone camera,” in Proc. IPIN, Guimaraes, Portugal, Sep. 21–23, 2011, p. 7.
[95] L. Ruotsalainen, H. Kuusniemi, and R. Chen, “Visual-aided two-dimensional
pedestrian indoor navigation in a smartphone,” The Journal of Global Positioning Systems, vol. 10, no. 1, pp. 11–18, 2011.
[96] N. Sagias and G. Karagiannidis, “Gaussian class multivariate Weibull distributions: Theory and applications in fading channels,” IEEE Trans. Inf. Theory,
vol. 51, no. 10, 2005.
[97] A. Santos, L. Tarrataca, and J. Cardoso, “An analysis of navigation algorithms
for smartphones using J2ME,” in MobileWireless Middleware, Operating Systems, and Applications, ser. LNICST. Springer Berlin Heidelberg, 2009,
vol. 7, pp. 266–279.
[98] G. Shrivakshan and C. Chandrasekar, “A comparison of various edge detection
techniques used in image processing,” IJCSI International Journal of Computer Science Issues, vol. 9, no. 1, pp. 269–276, 2012.
[99] A. Soloviev and D. Venable, “When GNSS goes blind. Integrating vision
measurements for navigation in signal-challenged environment,” Inside GNSS,
vol. 5, no. 7, pp. 18–29, 2010.
[100] HXRMC1 Digital HD POV Camera Recorder. Sony. Last accessed August 26,
2013. [Online]. Available: http://pro.sony.com/bbsc/ssr/product-HXRMC1/
[101] V. Stantchev, T. Schulz, D. H. Trung, and I. Ratchinski, “Optimizing clinical
processes with position-sensing,” IEEE IT Professional, vol. 10, no. 2, pp.
31–37, 2008.
[102] U. Steinhoff, D. Omerčević, R. Perko, B. Schiele, and A. Leonardis, “How
computer vision can help in outdoor positioning,” in Proceedings of the 2007
European conference on Ambient intelligence, ser. AmI’07, vol. 4794, Darmstadt, Germany, Nov. 7–10, 2007, pp. 124–141.
[103] D. Strelow, “Motion estimation from image and inertial measurements,” Ph.D.
dissertation, Carnegie Mellon University, Pittsburgh, PA, 2004.
[104] M. Susi, V. Renaudin, and G. Lachapelle, “Detection of quasi-static instants
from handheld MEMS devices,” in Proc. IPIN, Guimaraes, Portugal, Sep. 21–
23, 2011, p. 9.
[105] D. Titterton and J. Weston, Strapdown Inertial Navigation Technology, 2nd ed.
Stevenage, UK: IET, 2004.
[106] P. Torr, A. Fitzgibbon, and A. Zisserman, “The problem of degeneracy in structure and motion recovery from uncalibrated image sequences,” International
Journal of Computer Vision, vol. 32, no. 1, pp. 27–44, 1999.
[107] Demo software: SIFT keypoint detector. University of British Columbia. Last accessed April 17, 2013. [Online]. Available: http://www.cs.ubc.ca/~lowe/keypoints
[108] M. Veth and J. Raquet, “Fusion of low-cost imaging and inertial sensors for
navigation,” Air Force Institute of Technology, OH, Tech. Rep. 0704-0188,
2007, 11 pages.
[109] G. Welch and G. Bishop, “An introduction to the Kalman filter,” University of
North Carolina at Chapel Hill, NC, Tech. Rep., 1995.
[110] E. Williams. Aviation formulary v1.46. Last accessed April 23, 2013.
[Online]. Available: http://williams.best.vwh.net/avform.htm
[111] Y. Xu, S. Oh, and A. Hoogs, “A minimum error vanishing point detection
approach for uncalibrated monocular images of man-made environments,” in
Proc. CVPR, Portland, OR, USA, Jun. 23–28, 2013, pp. 1376–1383.
[112] M. Youssef, A. Agrawala, and A. Shankar, “WLAN location determination
via clustering and probability distributions,” in Proc. First IEEE International
Conference on Pervasive Computing and Communications, Dallas Fort-Worth,
TX, USA, Mar. 23–26, 2003, pp. 143–150.
[113] W. Zhang and J. Kosecka, “Image based localization in urban environments,”
in Proc. The Third International Symposium on 3D Data Processing, Visualization and Transmission, Chapel Hill, NC, USA, Jun. 14–16, 2006, pp. 33–40.
[114] X. Zhang, P. Mumford, and C. Rizos, “Allan variance analysis on error characters of MEMS inertial sensors for an FPGA-based GPS/INS system,” in Proc.
International Symposium on GPS/GNSS, Tokyo, Japan, Nov. 11–14, 2008, pp.
127–133.
[115] X. Zhang, A. Rad, and Y.-K. Wong, “Sensor fusion of monocular cameras
and laser rangefinders for line-based simultaneous localization and mapping
(SLAM) tasks in autonomous mobile robots,” Sensors, vol. 12, pp. 429–452,
2012.
Bibliography
145
Suomen Geodeettisen laitoksen julkaisut:
Verffentlichungen des Finnischen Geodtischen Institutes:
Publications of the Finnish Geodetic Institute:
1. Y. VÄISÄLÄ: Tafeln für geodätische Berechnungen nach den Erddimensionen von Hayford.
Helsinki 1923. 30 S.
2. Y. VÄISÄLÄ: Die Anwendung der Lichtinterferenz zu Längenmessungen auf grösseren Distanzen. Helsinki 1923. 22 S.
3. ILMARI BONSDORFF, Y. LEINBERG, W. HEISKANEN: Die Beobachtungsergebnisse der
südfinnischen Triangulation in den Jahren 1920-1923. Helsinki 1924. 235 S.
4. W. HEISKANEN: Untersuchungen über Schwerkraft und Isostasie. Helsinki 1924. 96 S. 1
Karte.
5. W. HEISKANEN: Schwerkraft und isostatische Kompensation in Norwegen. Helsinki 1926. 33
S. 1 Karte.
6. W. HEISKANEN: Die Erddimensionen nach den europäischen Gradmessungen. Helsinki 1926.
26 S.
7. ILMARI BONSDORFF, V.R. ÖLANDER, Y. LEINBERG: Die Beobachtungsergebnisse der
südfinnischen Triangulation in den Jahren 1924-1926. Helsinki 1927. 164 S. 1 Karte.
8. V.R. ÖLANDER: Ausgleichung einer Dreieckskette mit Laplaceschen Punkten. Helsinki 1927.
49 S. 1 Karte.
9. U. PESONEN: Relative Bestimmungen der Schwerkraft auf den Dreieckspunkten der südfinnischen Triangulation in den Jahren 1924-1925. Helsinki 1927. 129 S.
10. ILMARI BONSDORFF: Das Theorem von Clairaut und die Massenverteilung im Erdinnern.
Helsinki 1929. 10 S.
11. ILMARI BONSDORFF, V.R. ÖLANDER, W. HEISKANEN, U. PESONEN: Die Beobachtungsergebnisse der Triangulationen in den Jahren 1926-1928. Helsinki 1929. 139 S. 1 Karte.
12. W. HEISKANEN: Über die Elliptizität des Erdäquators. Helsinki 1929. 18 S.
13. U. PESONEN: Relative Bestimmungen der Schwerkraft in Finnland in den Jahren 1926-1929.
Helsinki 1930. 168 S. 1 Karte.
14. Y. VÄISÄLÄ: Anwendung der Lichtinterferenz bei Basismessungen. Helsinki 1930. 47 S.
15. M. FRANSSILA: Der Einfluss der den Pendel umgebenden Luft auf die Schwingungszeit beim
v. Sterneckschen Pendelapparat. Helsinki 1931. 23 S.
16. Y. LEINBERG: Ergebnisse der astronomischen Ortsbestimmungen auf den finnischen Dreieckspunkten. Helsinki 1931. 162 S.
17. V.R. ÖLANDER: Über die Beziehung zwischen Lotabweichungen und Schwereanomalien sowie
über das Lotabweichungssystem in Süd-Finnland. Helsinki 1931. 23 S.
18. PENTTI KALAJA, UUNO PESONEN, V.R. ÖLANDER, Y. LEINBERG: Beobachtungsergebnisse. Helsinki 1933. 240 S. 1 Karte.
19. R.A. HIRVONEN: The continental undulations of the geoid. Helsinki 1934. 89 pages. 1 map.
146
Bibliography
20. ILMARI BONSDORFF: Die Länge der Versuchsbasis von Helsinki und Längenveränderungen
der Invardrähte 634-637. Helsinki 1934. 41 S.
21. V.R. ÖLANDER: Zwei Ausgleichungen des grossen südfinnischen Dreieckskranzes. Helsinki
1935. 66 S. 1 Karte.
22. U. PESONEN, V.R. ÖLANDER: Beobachtungsergebnisse. Winkelmessungen in den Jahren
1932-1935. Helsinki 1936. 148 S. 1 Karte.
23. R.A. HIRVONEN: Relative Bestimmungen der Schwerkraft in Finnland in den Jahren 1931,
1933 und 1935. Helsinki 1937. 151 S.
24. R.A. HIRVONEN: Bestimmung des Schwereunterschiedes Helsinki-Potsdam im Jahre 1935
und Katalog der finnischen Schwerestationen. Helsinki 1937. 36 S. 1 Karte.
25. T.J. KUKKAMÄKI: Über die nivellitische Refraktion. Helsinki 1938. 48 S.
26. Finnisches Geodätisches Institut 1918-1938. Helsinki 1939. 126 S. 2 Karten.
27. T.J. KUKKAMÄKI: Formeln und Tabellen zur Berechnung der nivellitischen Refraktion. Helsinki 1939. 18 S.
28. T.J. KUKKAMÄKI: Verbesserung der horizontalen Winkelmessungen wegen der Seitenrefraktion. Helsinki 1939. 18 S.
29. ILMARI BONSDORFF: Ergebnisse der astronomischen Ortsbestimmungen im Jahre 1933.
Helsinki 1939. 47 S.
30. T. HONKASALO: Relative Bestimmungen der Schwerkraft in Finnland im Jahre 1937. Helsinki 1941. 78 S.
31. PENTTI KALAJA: Die Grundlinienmessungen des Geodätischen Institutes in den Jahren 19331939 nebst Untersuchungen über die Verwendung der Invardrähte. Helsinki 1942. 149 S.
32. U. PESONEN, V.R. ÖLANDER: Beobachtungsergebnisse. Winkelmessungen in den Jahren
1936-1940. Helsinki 1942. 165 S. 1 Karte.
33. PENTTI KALAJA: Astronomische Ortsbestimmungen in den Jahren 1935-1938. Helsinki 1944.
142 S.
34. V.R. ÖLANDER: Astronomische Azimutbestimmungen auf den Dreieckspunkten in den Jahren
1932-1938; Lotabweichungen und Geoidhöhen. Helsinki 1944. 107 S. 1 Karte.
35. U. PESONEN: Beobachtungsergebnisse. Winkelmessungen in den Jahren 1940-1947. Helsinki
1948. 165 S. 1 Karte.
36. Professori Ilmari Bonsdorffille hänen 70-vuotispäivänään omistettu juhlajulkaisu. Publication
dedicated to Ilmari Bonsdorff on the occasion of his 70th anniversary. Helsinki 1949. 262 pages.
13 maps.
37. TAUNO HONKASALO: Measuring of the 864 m-long Nummela standard base line with the
Väisälä light interference comparator and some investigations into invar wires. Helsinki 1950.
88 pages.
38. V.R. ÖLANDER: On the geoid in the Baltic area and the orientation of the Baltic Ring. Helsinki
1950. 26 pages.
39. W. HEISKANEN: On the world geodetic system. Helsinki 1951. 25 pages.
Bibliography
147
40. R.A. HIRVONEN: The motions of Moon and Sun at the solar eclipse of 1947 May 20th. Helsinki 1951. 36 pages.
41. PENTTI KALAJA: Catalogue of star pairs for northern latitudes from 55◦ to 70◦ for astronomic
determination of latitudes by the Horrebow-Talcott method. Helsinki 1952. 191 pages.
42. ERKKI KÄÄRIÄINEN: On the recent uplift of the Earth’s crust in Finland. Helsinki 1953. 106
pages. 1 map.
43. PENTTI KALAJA: Astronomische Ortsbestimmungen in den Jahren 1946-1948. Helsinki 1953.
146 S.
44. T.J. KUKKAMÄKI, R.A. HIRVONEN: The Finnish solar eclipse expeditions to the Gold Coast
and Brazil 1947. Helsinki 1954. 71 pages.
45. JORMA KORHONEN: Einige Untersuchungen über die Einwirkung der Abrundungsfehler bei
Gross-Ausgleichungen. Neu-Ausgleichung des südfinnischen Dreieckskranzes. Helsinki 1954.
138 S. 3 Karten.
46. Professori Weikko A. Heiskaselle hänen 60-vuotispäivänään omistettu juhlajulkaisu. Publication dedicated to Weikko A. Heiskanen on the occasion of his 60th anniversary. Helsinki 1955.
214 pages.
47. Y. VÄISÄLÄ: Bemerkungen zur Methode der Basismessung mit Hilfe der Lichtinterferenz.
Helsinki 1955. 12 S.
48. U. PESONEN, TAUNO HONKASALO: Beobachtungsergebnisse der finnischen Triangulationen in den Jahren 1947-1952. Helsinki 1957. 91 S.
49. PENTTI KALAJA: Die Zeiten von Sonnenschein, Dämmerung und Dunkelheit in verschiedenen
Breiten. Helsinki 1958. 63 S.
50. V.R. ÖLANDER: Astronomische Azimutbestimmungen auf den Dreieckspunkten in den Jahren
1938-1952. Helsinki 1958. 90 S. 1 Karte.
51. JORMA KORHONEN, V.R. ÖLANDER, ERKKI HYTÖNEN: The results of the base extension nets of the Finnish primary triangulation. Helsinki 1959. 57 pages. 5 appendices. 1 map.
52. V.R. ÖLANDER: Vergleichende Azimutbeobachtungen mit vier Instrumenten. Helsinki 1960.
48 pages.
53. Y. VÄISÄLÄ, L. OTERMA: Anwendung der astronomischen Triangulationsmethode. Helsinki
1960. 18 S.
54. V.R. ÖLANDER: Astronomical azimuth determinations on trigonometrical stations in the years
1955-1959. Helsinki 1961. 15 pages.
55. TAUNO HONKASALO: Gravity survey of Finland in years 1945-1960. Helsinki 1962. 35
pages. 3 maps.
56. ERKKI HYTÖNEN: Beobachtungsergebnisse der finnischen Triangulationen in den Jahren
1953-1962. Helsinki 1963. 59 S.
57. ERKKI KÄÄRIÄINEN: Suomen toisen tarkkavaaituksen kiintopisteluettelo I. Bench mark list
I of the Second Levelling of Finland. Helsinki 1963. 164 pages. 2 maps.
58. ERKKI HYTÖNEN: Beobachtungsergebnisse der finnischen Triangulationen in den Jahren
1961-1962. Helsinki 1963. 32 S.
59. AIMO KIVINIEMI: The first order gravity net of Finland. Helsinki 1964. 45 pages.
60. V.R. ÖLANDER: General list of astronomical azimuths observed in 1920-1959 in the primary
triangulation net. Helsinki 1965. 47 pages. 1 map.
61. ERKKI KÄÄRIÄINEN: The second levelling of Finland in 1935-1955. Helsinki 1966. 313
pages. 1 map.
62. JORMA KORHONEN: Horizontal angles in the first order triangulation of Finland in 1920-1962. Helsinki 1966. 112 pages. 1 map.
63. ERKKI HYTÖNEN: Measuring of the refraction in the Second Levelling of Finland. Helsinki
1967. 18 pages.
64. JORMA KORHONEN: Coordinates of the stations in the first order triangulation of Finland.
Helsinki 1967. 42 pages. 1 map.
65. Geodeettinen laitos - The Finnish Geodetic Institute 1918-1968. Helsinki 1969. 147 pages. 4
maps.
66. JUHANI KAKKURI: Errors in the reduction of photographic plates for the stellar triangulation.
Helsinki 1969. 14 pages.
67. PENTTI KALAJA, V.R. ÖLANDER: Astronomical determinations of latitude and longitude in
1949-1958. Helsinki 1970. 242 pages. 1 map.
68. ERKKI KÄÄRIÄINEN: Astronomical determinations of latitude and longitude in 1954-1960.
Helsinki 1970. 95 pages. 1 map.
69. AIMO KIVINIEMI: Niinisalo calibration base line. Helsinki 1970. 36 pages. 1 sketch appendix.
70. TEUVO PARM: Zero-corrections for tellurometers of the Finnish Geodetic Institute. Helsinki
1970. 18 pages.
71. ERKKI KÄÄRIÄINEN: Astronomical determinations of latitude and longitude in 1961-1966.
Helsinki 1971. 102 pages. 1 map.
72. JUHANI KAKKURI: Plate reduction for the stellar triangulation. Helsinki 1971. 38 pages.
73. V.R. ÖLANDER: Reduction of astronomical latitudes and longitudes 1922-1948 into FK4 and
CIO systems. Helsinki 1972. 40 pages.
74. JUHANI KAKKURI AND KALEVI KALLIOMÄKI: Photoelectric time micrometer. Helsinki
1972. 53 pages.
75. ERKKI HYTÖNEN: Absolute gravity measurement with long wire pendulum. Helsinki 1972.
142 pages.
76. JUHANI KAKKURI: Stellar triangulation with balloon-borne beacons. Helsinki 1973. 48
pages.
77. JUSSI KÄÄRIÄINEN: Beobachtungsergebnisse der finnischen Winkelmessungen in den Jahren
1969-70. Helsinki 1974. 40 S.
78. AIMO KIVINIEMI: High precision measurements for studying the secular variation in gravity
in Finland. Helsinki 1974. 64 pages.
79. TEUVO PARM: High precision traverse of Finland. Helsinki 1976. 64 pages.
80. R.A. HIRVONEN: Precise computation of the precession. Helsinki 1976. 25 pages.
81. MATTI OLLIKAINEN: Astronomical determinations of latitude and longitude in 1972-1975.
Helsinki 1977. 90 pages. 1 map.
82. JUHANI KAKKURI AND JUSSI KÄÄRIÄINEN: The Second Levelling of Finland for the
Åland archipelago. Helsinki 1977. 55 pages.
83. MIKKO TAKALO: Suomen Toisen tarkkavaaituksen kiintopisteluettelo II. Bench mark list II
of the Second Levelling of Finland. Helsinki 1977. 150 sivua.
84. MATTI OLLIKAINEN: Astronomical azimuth determinations on triangulation stations in 1962-1970. Helsinki 1977. 47 pages. 1 map.
85. MARKKU HEIKKINEN: On the tide-generating forces. Helsinki 1978. 150 pages.
86. PEKKA LEHMUSKOSKI AND JAAKKO MÄKINEN: Gravity measurements on the ice of
Bothnian Bay. Helsinki 1978. 27 pages.
87. T.J. KUKKAMÄKI: Väisälä interference comparator. Helsinki 1978. 49 pages.
88. JUSSI KÄÄRIÄINEN: Observing the Earth Tides with a long water-tube tiltmeter. Helsinki
1979. 74 pages.
89. Publication dedicated to T.J. Kukkamäki on the occasion of his 70th anniversary. Helsinki 1979.
184 pages.
90. B. DUCARME AND J. KÄÄRIÄINEN: The Finnish Tidal Gravity Registrations in Fennoscandia. Helsinki 1980. 43 pages.
91. AIMO KIVINIEMI: Gravity measurements in 1961-1978 and the results of the gravity survey
of Finland in 1945-1978. Helsinki 1980. 18 pages. 3 maps.
92. LIISI OTERMA: Programme de latitude du tube zénithal visuel de l’observatoire Turku-Tuorla
système amélioré de 1976. Helsinki 1981. 18 pages.
93. JUHANI KAKKURI, AIMO KIVINIEMI AND RAIMO KONTTINEN: Contributions from
the Finnish Geodetic Institute to the Tectonic Plate Motion Studies in the Area between the
Pamirs and Tien-Shan Mountains. Helsinki 1981. 34 pages.
94. JUSSI KÄÄRIÄINEN: Measurement of the Ekeberg baseline with invar wires. Helsinki 1981.
17 pages.
95. MATTI OLLIKAINEN: Astronomical determinations of latitude and longitude in 1976-1980.
Helsinki 1982. 90 pages. 1 map.
96. RAIMO KONTTINEN: Observation results. Angle measurements in 1977-1978. Helsinki
1982. 29 pages.
97. G.P. ARNAUTOV, YE N. KALISH, A. KIVINIEMI, YU F. STUS, V.G. TARASIUK, S.N.
SCHEGLOV: Determination of absolute gravity values in Finland using laser ballistic gravimeter. Helsinki 1982. 18 pages.
98. LEENA MIKKOLA (EDITOR): Mean height map of Finland. Helsinki 1983. 3 pages. 1 map.
99. MIKKO TAKALO AND JAAKKO MÄKINEN: The Second Levelling of Finland for Lapland.
Helsinki 1983. 144 pages.
100. JUSSI KÄÄRIÄINEN: Baseline Measurements with invar wires in Finland 1958-1970. Helsinki 1984. 78 pages.
101. RAIMO KONTTINEN: Plate motion studies in Central Asia. Helsinki 1985. 31 pages.
102. RAIMO KONTTINEN: Observation results. Angle measurements in 1979-1983. Helsinki
1985. 30 pages.
103. J. KAKKURI, T.J. KUKKAMÄKI, J.-J. LEVALLOIS ET H. MORITZ: Le 250e anniversaire
de la mesure de l’arc du méridien en Laponie. Helsinki 1986. 60 pages.
104. G. ASCH, T. JAHR, G. JENTZSCH, A. KIVINIEMI AND J. KÄÄRIÄINEN: Measurements
of Gravity Tides along the ’Blue Road Geotraverse’ in Fennoscandia. Helsinki 1987. 57 pages.
105. JUSSI KÄÄRIÄINEN, RAIMO KONTTINEN, LU QIANKUN AND DU ZONG YU: The
Chang Yang Standard Baseline. Helsinki 1986. 36 pages.
106. E.W. GRAFAREND, H. KREMERS, J. KAKKURI AND M. VERMEER: Adjusting the SW
Finland Triangular Network with the TAGNET 3-D operational geodesy software. Helsinki
1987. 60 pages.
107. MATTI OLLIKAINEN: Astronomical determinations of latitude and longitude in 1981-1983.
Helsinki 1988. 37 pages.
108. MARKKU POUTANEN: Observation results. Angle measurements in 1967-1973. Helsinki
1988. 35 pages.
109. JUSSI KÄÄRIÄINEN, RAIMO KONTTINEN AND ZSUZSANNA NÉMETH: The Gödöllö Standard Baseline. Helsinki 1988. 66 pages.
110. JUSSI KÄÄRIÄINEN AND HANNU RUOTSALAINEN: Tilt measurements in the underground laboratory Lohja 2, Finland, in 1977-1987. Helsinki 1989. 37 pages.
111. MIKKO TAKALO: Lisäyksiä ja korjauksia Suomen tarkkavaaitusten linjastoon 1977-1989.
Helsinki 1991. 98 sivua.
112. RAIMO KONTTINEN: Observation results. Angle measurements in the Pudasjärvi loop in
1973-1976. Helsinki 1991. 42 pages.
113. RAIMO KONTTINEN, JORMA JOKELA AND LI QUAN: The remeasurement of the Chang
Yang Standard Baseline. Helsinki 1991. 40 pages.
114. JUSSI KÄÄRIÄINEN, RAIMO KONTTINEN AND MARKKU POUTANEN: Interference
measurements of the Nummela Standard Baseline in 1977, 1983, 1984 and 1991. Helsinki
1992. 78 pages.
115. JUHANI KAKKURI (EDITOR): Geodesy and geophysics. Helsinki 1993. 200 pages.
116. JAAKKO MÄKINEN, HEIKKI VIRTANEN, QIU QI-XIAN AND GU LIANG-RONG: The
Sino-Finnish absolute gravity campaign in 1990. Helsinki 1993. 49 pages.
117. RAIMO KONTTINEN: Observation results. Geodimeter observations in 1971-72, 1974-80 and
1984-85. Helsinki 1994. 58 pages.
118. RAIMO KONTTINEN: Observation results. Angle measurements in 1964-65, 1971, 1984 and
1986-87. Helsinki 1994. 67 pages.
119. JORMA JOKELA: The 1993 adjustment of the Finnish First-Order Terrestrial Triangulation.
Helsinki 1994. 137 pages.
120. MARKKU POUTANEN (EDITOR): Interference measurements of the Taoyuan Standard Baseline.
Helsinki 1995. 35 pages.
121. JORMA JOKELA: Interference measurements of the Chang Yang Standard Baseline in 1994.
Kirkkonummi 1996. 32 pages.
122. OLLI JAAKKOLA: Quality and automatic generalization of land cover data. Kirkkonummi
1996. 39 pages.
123. MATTI OLLIKAINEN: Determination of orthometric heights using GPS levelling. Kirkkonummi
1997. 143 pages.
124. TIINA KILPELÄINEN: Multiple Representation and Generalization of Geo-Databases for Topographic Maps. Kirkkonummi 1997. 229 pages.
125. JUSSI KÄÄRIÄINEN AND JAAKKO MÄKINEN: The 1979-1996 gravity survey and the results of the gravity survey of Finland 1945-1996. Kirkkonummi 1997. 24 pages. 1 map.
126. ZHITONG WANG: Geoid and crustal structure in Fennoscandia. Kirkkonummi 1998. 118
pages.
127. JORMA JOKELA AND MARKKU POUTANEN: The Väisälä baselines in Finland. Kirkkonummi
1998. 61 pages.
128. MARKKU POUTANEN: Sea surface topography and vertical datums using space geodetic
techniques. Kirkkonummi 2000. 158 pages.
129. MATTI OLLIKAINEN, HANNU KOIVULA AND MARKKU POUTANEN: The Densification of the EUREF Network in Finland. Kirkkonummi 2000. 61 pages.
130. JORMA JOKELA, MARKKU POUTANEN, ZHAO JINGZHAN, PEI WEILI, HU ZHENYUAN AND ZHANG SHENGSHU: The Chengdu Standard Baseline. Kirkkonummi 2000. 46
pages.
131. JORMA JOKELA, MARKKU POUTANEN, ZSUZSANNA NÉMETH AND GÁBOR VIRÁG:
Remeasurement of the Gödöllö Standard Baseline. Kirkkonummi 2001. 37 pages.
132. ANDRES RÜDJA: Geodetic Datums, Reference Systems and Geodetic Networks in Estonia.
Kirkkonummi 2004. 311 pages.
133. HEIKKI VIRTANEN: Studies of Earth Dynamics with the Superconducting Gravimeter. Kirkkonummi
2006. 130 pages.
134. JUHA OKSANEN: Digital elevation model error in terrain analysis. Kirkkonummi 2006. 142
pages. 2 maps.
135. MATTI OLLIKAINEN: The EUVN-DA GPS campaign in Finland. Kirkkonummi 2006. 42
pages.
136. ANNU-MAARIA NIVALA: Usability perspectives for the design of interactive maps. Kirkkonummi
2007. 157 pages.
137. XIAOWEI YU: Methods and techniques for forest change detection and growth estimation using
airborne laser scanning data. Kirkkonummi 2007. 132 pages.
138. LASSI LEHTO: Real-time content transformations in a WEB service-based delivery architecture for geographic information. Kirkkonummi 2007. 150 pages.
139. PEKKA LEHMUSKOSKI, VEIKKO SAARANEN, MIKKO TAKALO AND PAAVO ROUHIAINEN: Suomen Kolmannen tarkkavaaituksen kiintopisteluettelo. Bench Mark List of the Third
Levelling of Finland. Kirkkonummi 2008. 220 pages.
140. EIJA HONKAVAARA: Calibrating digital photogrammetric airborne imaging systems using a
test field. Kirkkonummi 2008. 139 pages.
141. MARKKU POUTANEN, EERO AHOKAS, YUWEI CHEN, JUHA OKSANEN, MARITA
PORTIN, SARI RUUHELA, HELI SUURMÄKI (EDITORS): Geodeettinen laitos - Geodetiska
Institutet - Finnish Geodetic Institute 1918-2008. Kirkkonummi 2008. 173 pages.
142. MIKA KARJALAINEN: Multidimensional SAR Satellite Images - a Mapping Perspective. Kirkkonummi
2010. 132 pages.
143. MAARIA NORDMAN: Improving GPS time series for geodynamic studies. Kirkkonummi
2010. 116 pages.
144. JORMA JOKELA AND PASI HÄKLI: Interference measurements of the Nummela Standard
Baseline in 2005 and 2007. Kirkkonummi 2010. 85 pages.
145. EETU PUTTONEN: Tree Species Classification with Multiple Source Remote Sensing Data.
Kirkkonummi 2012. 162 pages.
146. JUHA SUOMALAINEN: Empirical Studies on Multiangular, Hyperspectral, and Polarimetric
Reflectance of Natural Surfaces. Kirkkonummi 2012. 144 pages.
147. LEENA MATIKAINEN: Object-based interpretation methods for mapping built-up areas. Kirkkonummi
2012. 210 pages.
148. LAURI MARKELIN: Radiometric calibration, validation and correction of multispectral photogrammetric imagery. Kirkkonummi 2013. 160 pages.
149. XINLIAN LIANG: Feasibility of Terrestrial Laser Scanning for Plotwise Forest Inventories.
Kirkkonummi 2013. 150 pages.
150. EERO AHOKAS: Aspects of accuracy, scanning angle optimization, and intensity calibration
related to nationwide laser scanning. Kirkkonummi 2013. 124 pages.
151. LAURA RUOTSALAINEN: Vision-aided Pedestrian Navigation for Challenging GNSS Environments. Kirkkonummi 2013. 180 pages.