# Vision-Aided Pedestrian Navigation for Challenging GNSS Environments

**Citation:** Ruotsalainen, L. (2013). *Vision-Aided Pedestrian Navigation for Challenging GNSS Environments*. Suomen geodeettisen laitoksen julkaisuja - Publications of the Finnish Geodetic Institute; 151. Suomen geodeettinen laitos. Tampere University of Technology. Link to publication: TUTCRIS Portal (http://www.tut.fi/tutcris)

SUOMEN GEODEETTISEN LAITOKSEN JULKAISUJA
VERÖFFENTLICHUNGEN DES FINNISCHEN GEODÄTISCHEN INSTITUTES
PUBLICATIONS OF THE FINNISH GEODETIC INSTITUTE

N:o 151

**Vision-Aided Pedestrian Navigation for Challenging GNSS Environments**

by Laura Ruotsalainen

Doctoral dissertation for the degree of Doctor of Science in Technology, to be presented with due permission for public examination and debate in Tietotalo Building, Auditorium TB109, at Tampere University of Technology on the 4th of November 2013 at 12 noon.
KIRKKONUMMI 2013

ISBN (printed): 978-951-711-302-1
ISBN (pdf): 978-951-711-303-8
ISSN: 0085-6932
Juvenes Print Oy, 2013

**Supervising professor:** Professor Ruizhi Chen, Texas A&M University Corpus Christi, Conrad Blucher Institute of Surveying and Science, School of Engineering and Computer Science

**Thesis co-supervisors:** Professor Gérard Lachapelle, University of Calgary, Department of Geomatics Engineering; Professor Jarmo Takala, Tampere University of Technology, Department of Pervasive Computing

**Preliminary examiners:** Professor Andreas Wieser, Swiss Federal Institute of Technology Zurich (ETH), Institute of Geodesy and Photogrammetry; Professor Jari Hannuksela, University of Oulu, Department of Computer Science and Engineering

**Opponents:** D.Sc. (Tech) Susanna Pirttikangas, University of Oulu, Department of Computer Science and Engineering; D.Sc. (Tech) Jari Syrjärinne, Nokia Oyj

## Abstract

There is a strong need for an accurate pedestrian navigation system that also functions in GNSS-challenged environments, namely urban areas and indoors, to improve safety and enhance everyday life. Pedestrian navigation is needed mainly in environments that are challenging not only for GNSS but also for other RF positioning systems, and even for some non-RF systems such as magnetometer-based heading, which suffers from the ferrous material present in these surroundings. Indoor and urban navigation has been an active research area for years. No single system can at present address all the needs of pedestrian navigation in these environments, but a fused solution of different sensors can provide better accuracy, availability and continuity. Self-contained sensors, namely digital compasses for measuring heading, gyroscopes for heading changes and accelerometers for the user speed, constitute a good option for pedestrian navigation. However, their performance suffers from noise and biases that result in large position errors that increase with time.
Such errors can, however, be mitigated using information about the user's motion obtained from consecutive images taken by a camera carried by the user, provided that the camera's position and orientation with respect to the user's body are known. The motion of the features in the images may then be transformed into information about the user's motion. Owing to its distinctive characteristics, this vision-aiding complements other positioning technologies and thereby provides better pedestrian navigation accuracy and reliability. This thesis discusses the concepts of a visual gyroscope, which provides the relative user heading, and a visual odometer, which provides the translation of the user between consecutive images. Both methods use a monocular camera carried by the user. The visual gyroscope monitors the motion of virtual features, called vanishing points, arising from parallel straight lines in the scene, and resolves heading, roll and pitch from the change in their location. The method is applicable to man-made environments, as the straight lines in the structures make the vanishing points perceivable. For the visual odometer, the ambiguous scale that arises when the homography between consecutive images is used to observe the translation is resolved using two different methods. First, the scale is computed using a special configuration intended for indoor use. Second, the scale is resolved using differenced GNSS carrier phase measurements of the camera in a method aimed at urban environments, where GNSS cannot perform alone because tall buildings block the required line-of-sight to four satellites. The use of visual perception, however, provides position information by exploiting a minimum of two satellites, and therefore the availability of the navigation solution is substantially increased. Both methods are sufficiently tolerant of the challenges of visual perception in indoor and urban environments, namely low lighting and dynamic objects hindering the view.
The heading and translation are further integrated with other positioning systems, and a navigation solution is obtained. The performance of the proposed vision-aided navigation was tested in various indoor and urban canyon environments to demonstrate its effectiveness. These experiments, although of limited duration, show that visual processing efficiently complements other positioning technologies in order to provide better pedestrian navigation accuracy and reliability.

## Preface

The research presented in this thesis has been carried out mainly at the Finnish Geodetic Institute (FGI), Department of Navigation and Positioning, during the years 2010–2013. The research also included an eight-month research visit to the Department of Geomatics Engineering, University of Calgary, Canada in 2012. I have been privileged to receive guidance from four distinguished professors, to whom I want to express my gratitude. First, I would like to thank my supervisor Prof. Ruizhi Chen for providing the possibility to carry out this research, and for his guidance and encouragement. Second, I would like to thank my co-supervisor Prof. Gérard Lachapelle for providing me the valuable opportunity to work and study in his Position, Location and Navigation (PLAN) group at the University of Calgary, for his guidance, and for introducing me to the stunning Canadian Rockies. Third, I would like to thank my other co-supervisor Prof. Jarmo Takala for his guidance in practical issues related to my studies and the dissertation process. Last, but definitely not least, I would like to thank Prof. Heidi Kuusniemi for her endless guidance, help and encouragement throughout the process. I would like to express my appreciation to Prof. Andreas Wieser and Prof. Jari Hannuksela for reviewing the manuscript and providing constructive comments. I have also been privileged to work with numerous people who have been able to realize their passion for science, making the working environment very pleasant.
Therefore I would like to thank everyone I have dealt with during my scientific career, but especially the colleagues who have contributed to my work and made my everyday work pleasant, both at the Finnish Geodetic Institute and at the University of Calgary: Dr. Zahidul Bhuiyan, Dr. Liang Chen, Dr. Yuwei Chen, Dr. Ling Pei, Dr. Jingbin Liu, M.Sc. Robert Guinness, M.Sc. Heli Honkanen, Dr. Jared Bancroft, M.Sc. Anup Dhital and David Garrett, and all the others I have had the pleasure to collaborate with in the Position, Location and Navigation group in Calgary and at the Finnish Geodetic Institute. My research has been supported financially by the Nokia Foundation award, received in 2011 and 2012, and by Tampere University of Technology's grant for postgraduate exchange, which are gratefully acknowledged. In addition, I would like to express my gratitude to my parents Marja-Kirsti and Jouko Eliasson and my sister Liisa Eliasson-Tapio for always believing in me and for their support and admiration during this process, which has given me the self-confidence it required. I would also like to thank all my relatives, especially my grandmothers, and my family-in-law for their encouragement. I would like to thank my friends for their friendship, for offering me valuable moments of laughter and happiness, and for sharing my concerns. Finally, my greatest thanks go to my family: my husband, Aki, who has fully supported me in this process as in everything for the last twenty years, mentally as well as in practice, and my two beautiful daughters, Maria and Malla, who fill my every day with joy and happiness. And, most of all, for giving life its meaning.

Helsinki, September 2013
Laura Ruotsalainen

## Table of Contents

- Abstract
- Preface
- Table of Contents
- List of Figures
- List of Tables
- Abbreviations
- Symbols
- 1. Introduction
  - 1.1 Research Objectives
  - 1.2 Related Work
  - 1.3 Author's Contribution
  - 1.4 Thesis Outline
- 2. Overview of pedestrian navigation
  - 2.1 Navigation Frames and Attitude
  - 2.2 Absolute Positioning
    - 2.2.1 Global Navigation Satellite Systems
    - 2.2.2 WLAN Positioning
    - 2.2.3 Other Technologies
  - 2.3 Relative Positioning
    - 2.3.1 Inertial Sensors
    - 2.3.2 Other Self-Contained Sensors
  - 2.4 Estimation
    - 2.4.1 Kalman Filter
    - 2.4.2 Extended Kalman Filter
- 3. Computer vision methods for navigation
  - 3.1 Camera, Fundamental and Essential Matrices and Coordinate Frames
  - 3.2 Feature Extraction
    - 3.2.1 Filtering
    - 3.2.2 SIFT Features
    - 3.2.3 Line Extraction
  - 3.3 Image Matching
  - 3.4 Camera Calibration
    - 3.4.1 Distortion
- 4. Visual gyroscope
  - 4.1 Locating the Vanishing Points
  - 4.2 Attitude of the Camera
  - 4.3 Error Detection
  - 4.4 Performance of the Visual Gyroscope
    - 4.4.1 Theoretical Analysis of Attainable Accuracy
  - 4.5 Effect of Camera and Setup Characteristics on the Accuracy of the Visual Gyroscope
    - 4.5.1 Experimental Results
  - 4.6 Smartphone Application of Visual Gyroscope
  - 4.7 Visual Gyroscope Implementation Using Probabilistic Hough Transform
- 5. Visual odometer
  - 5.1 The Principle of the Visual Odometer
    - 5.1.1 Measuring the Distance of an Object from the Camera
    - 5.1.2 Error Detection and Resolving the Unknown Scale for the Visual Odometer
    - 5.1.3 Degeneracy
    - 5.1.4 Performance of the Visual Odometer
- 6. Vision-aided navigation using visual gyroscope and odometer
  - 6.1 Visual Gyroscope and Odometer Aided Multi-Sensor Positioning
    - 6.1.1 Kalman Filter Used in Multi-Sensor Positioning
    - 6.1.2 Test in an Indoor Office Environment
    - 6.1.3 Test in Office Environment Using an Outdated WLAN Radio Map
  - 6.2 Stand-Alone Visual System
    - 6.2.1 Kalman Filter Used in Stand-Alone Visual Positioning
    - 6.2.2 Test in a Shopping Mall Environment
    - 6.2.3 Test in an Urban Canyon
  - 6.3 Visual Gyroscope Aided IMU Positioning
    - 6.3.1 Kalman Filter Used in Visual Gyroscope Aided IMU Positioning
    - 6.3.2 Equipment Setup on the Body
    - 6.3.3 Equipment Setup on the Foot
  - 6.4 Performance of Visual Gyroscope Implemented Using Probabilistic Hough Transform
- 7. Vision-aided carrier phase navigation
  - 7.1 Ambiguity Resolution Using Differenced GNSS Carrier Phase Measurements
    - 7.1.1 Ambiguous Translation Using the Fundamental Matrix
    - 7.1.2 Navigation Solution Incorporating the Absolute User Translation
  - 7.2 Method Verification in a Sub-Urban Environment
  - 7.3 Vision-Aided GNSS Navigation in an Urban Environment
- 8. Conclusions
  - 8.1 Main Results
  - 8.2 Future Development
- Bibliography

## List of Figures

- 2.1 User attitude in navigation frame
- 2.2 Absolute heading error of a digital compass indoors
- 3.1 Coordinate frames in vision-aiding
- 3.2 Epipolar geometry
- 3.3 Hough transform parameters
- 4.1 Vanishing point
- 4.2 Vanishing point in an image with roll
- 4.3 Vanishing point error detection
- 4.4 Allan deviation of the visual gyroscope
- 4.5 Visual gyroscope's tolerance on dynamic objects
- 4.6 An image captured of the same scene with three different cameras
- 4.7 Experiment setup for testing camera characteristics
- 4.8 Effect of the field-of-view on the line detection
- 5.1 Visual odometer configuration
- 5.2 Matched SIFT features between consecutive images
- 6.1 Equipment setup for testing the vision-aided multi-sensor positioning system
- 6.2 Office corridor used for experiments
- 6.3 Visual odometer speed
- 6.4 Vision-aided position solution in an office corridor
- 6.5 Vision-aided position solution in an office corridor with an outdated WLAN radio map
- 6.6 Challenging environment of Iso Omena shopping centre
- 6.7 The two-dimensional position solution in the Iso Omena shopping centre
- 6.8 Challenging environment of an urban canyon
- 6.9 Position solution in an urban canyon
- 6.10 Route for experiments on the University of Calgary campus
- 6.11 Body mounted test equipment
- 6.12 Standard deviation for different integration schemes
- 6.13 Attitude error using different integration schemes
- 6.14 Equipment setup for the foot
- 6.15 Images from one step cycle period
- 6.16 RMS position errors obtained for foot-mounted IMU
- 6.17 Line detection and vanishing point calculations using Probabilistic Hough Transform
- 6.18 Evaluation of vanishing point detection in an environment suffering from low lighting and non-orthogonal lines
- 6.19 Correcting IMU errors using a vanishing point obtained using Probabilistic Hough Transform
- 6.20 Conflict between estimated and detected vanishing points
- 7.1 Setup for vision-aided carrier phase navigation
- 7.2 Position solution verification in a sub-urban environment shown in Google Earth
- 7.3 Calgary downtown
- 7.4 Number of satellites acquired in an urban canyon
- 7.5 Position solution in an urban canyon
- 7.6 Horizontal position errors in an urban canyon
- 7.7 Position solution in an urban canyon shown in Google Earth
- 7.8 Position solution using GPS only in an urban canyon shown in Google Earth

## List of Tables

- 4.1 Statistics for heading change accuracy, all units degrees
- 4.2 Effect of roll error on other angle observations
- 4.3 Parameters for GoPro, Sony and Nokia cameras
- 4.4 Heading change error statistics
- 4.5 Roll and pitch error statistics
- 4.6 Processing time for different algorithms in the visual gyroscope's Nokia N8 Symbian smartphone implementation: photo capture in Symbian, edge and line detection using OpenCV, and vanishing point, heading and tilt computations implemented in C++
- 5.1 Statistics of the effect of camera height errors on the visual odometer's speed accuracy, units in m/s
- 6.1 Positioning error statistics using different systems in an office corridor
- 6.2 Positioning error statistics using different positioning systems in an office corridor with an outdated WLAN radio map
- 6.3 Positioning error statistics for visual stand-alone and GPS position solutions
- 6.4 Attitude errors obtained for body-mounted IMU with different integration methods
- 6.5 RMS position error obtained for foot-mounted IMU with and without vision-aiding
- 6.6 Ratio of the image points used for computing the Probabilistic Hough Transform to the image points used by the Standard Hough Transform for the images processed in the experiment
- 7.1 Positioning verification error statistics using vision-aided carrier phase (VA)
- 7.2 Positioning error statistics using vision-aided carrier phase (VA) and GPS only (GPS)
## Abbreviations

- AVUPT: Absolute Visual attitude Update
- BLUE: Best Linear Unbiased Estimate
- C/A: Coarse/Acquisition
- CCD: Charge Coupled Device
- CMOS: Complementary Metal Oxide Semiconductor
- COMPASS/Beidou: Chinese Satellite Navigation System
- DCM: Direction Cosine Matrix
- DOP: Dilution Of Precision
- E: East
- ECEF: Earth Centered Earth Fixed
- EKF: Extended Kalman Filter
- ENU: East-North-Up
- EXIF: Exchangeable Image File
- Galileo: European Satellite Navigation System
- GDOP: Geometric Dilution Of Precision
- GLONASS: The Russian Positioning System, Global'naya Navigatsionnaya Sputnikovaya Sistema
- GNSS: Global Navigation Satellite System
- GPS: Global Positioning System
- HD: High-definition
- HSGPS: High Sensitivity GPS
- IEEE: The Institute of Electrical and Electronics Engineers
- ION: Institute of Navigation
- IMU: Inertial Measurement Unit
- INS: Inertial Navigation System
- KF: Kalman filter
- LCI: Low-coherence Interferometry
- LDOP: Line Dilution Of Precision
- LOS: Line Of Sight
- Max: Maximum
- MEMS: Micro-Electro-Mechanical
- Min: Minimum
- MSP: Multi Sensor Positioning
- N: North
- PGCP: Pseudo Ground Control Points
- PDOP: Position Dilution Of Precision
- PPP: Precise Point Positioning
- RANSAC: RANdom SAmple Consensus
- RF: Radio Frequency
- RFID: Radio Frequency Identification
- rms: root mean square
- RSSI: Received Signal Strength Indication
- SHT: Standard Hough Transform
- SIFT: Scale Invariant Feature Transform
- SLAM: Simultaneous Localization And Mapping
- SPAN: Synchronized Position Attitude Navigation
- SVD: Singular Value Decomposition
- std: standard deviation
- ToA: Time of Arrival
- TVUPT: Temporal Visual Attitude Update
- U: Up
- UAV: Unmanned Aerial Vehicle
- UKF: Unscented Kalman filter
- UTC: Coordinated Universal Time
- UERE: User Equivalent Range Error
- UWB: Ultra-Wideband
- VA: Vision-aided
- WiFi: Wireless network, a registered trademark of the Wi-Fi Alliance
- WLAN: Wireless Local Area Network

## Symbols

- α_i: Angle between a line i in an image and the image x-axis
- β: Roll
- ∆t: Time interval
- ∆x: Vector offset of the user's true position and time bias from the values at the linearization point
- δx_k: Perturbation of the state
- δx̂⁻_k: Error of the a priori state estimate, or perturbation of the Euler angles
- δx̂_k: Error of the a posteriori state estimate, or noise in GPS measurements, or vector of errors in GNSS measurements
- η_g: Noise in gyroscope or carrier phase measurement
- λ: Carrier wavelength, or longitude
- µ: Mean
- ∇: Image gradient
- ω: Earth turn rate
- ω_ib^b: Body (b) turn rate with respect to the inertial (i) frame
- ω̃_ib^b: Gyroscope angular velocity measurement
- Ω: Skew-symmetric matrix of the angular velocity vector
- φ: Pitch, or latitude
- Φ: State transition matrix
- ρ: Pseudorange, or the radius of a line in an image in the Hough Transform
- ρ̂: Estimated pseudorange computed from the estimated user position
- σ: Standard deviation
- σ²: Variance
- σ²_CA(t): Allan variance
- θ: Heading (azimuth)
- ϕ: Carrier phase
- b: Body frame
- c: Speed of light
- C: Direction cosine matrix, or convolution
- d: Direction of a line in an image
- d_iono: Ionospheric delay
- d_tropo: Tropospheric delay
- dρ: Ephemeris error
- dt: Satellite clock error
- D_i: Distance between the starting point of line i and the vanishing point
- E: Essential matrix
- f: Focal length
- f: Specific force
- F: Fundamental matrix
- g: Mass gravitation
- G: User-satellite geometry matrix, or convolution kernel, or g-sensitivity coefficient matrix
- h: Height
- H: Height of an image in pixels
- H: Design matrix, or image homography
- i: Inertial frame
- I: Image matrix
- k: Distortion value
- K: Kalman gain, or camera calibration matrix
- L1: GPS signal carrier frequency at 1575.42 MHz
- M: Image gradient magnitude matrix
- N: Gaussian probability distribution, or integer number of carrier waves
- N_e: Inertia tensor
- O: Image gradient orientation matrix
- p: Pressure
- P: State error covariance, or camera matrix
- q̃: Spectral density value
- Q: Process noise covariance
- r: Geometric range
- r_d: Radial distance of the normalized distorted image point
- r: User position vector, or least-squares residual vector
- R: Measurement noise covariance, or camera rotation matrix
- R_g: Universal gas constant
- R_WLAN: RSSI observation vector
- s: Ambiguous scale in translation observed from consecutive images
- s: Satellite coordinate vector
- S: User speed
- S: Scale factor and non-orthogonality matrix
- t: Time
- t_u: Receiver clock error
- t: User translation vector
- T_i: Satellite i's position vector
- T_rcvr: Receiver position vector
- T_0: Temperature at sea level
- T_L: Temperature lapse rate
- u: Principal point's x-coordinate
- u: User coordinate vector, or the unit vector from user to satellite
- u_GC: Satellite and user geometry change
- v: Principal point's y-coordinate
- v: Kalman filter's innovation vector, or user velocity vector, or a vanishing point matrix
- v_k: Process noise
- v_fov: Vertical field-of-view of a camera
- v_x: Vanishing point in x-axis direction in homogeneous coordinates
- v_y: Vanishing point in y-axis direction in homogeneous coordinates
- v_z: Vanishing point in z-axis direction in homogeneous coordinates
- w_i: Standardized innovation of the i-th element of the innovation vector
- w_k: Measurement noise
- x_k: State vector
- x̂⁻_k: A priori state estimate
- x̂_k: A posteriori state estimate
- x_k: Nominal value of the state
- x_u: User (receiver) x-coordinate
- x: Feature coordinates in the image reference frame
- X: Object coordinates in the world reference frame, or the East component of the user position
- Ẋ: Time derivative of X
- y_u: User (receiver) y-coordinate
- ỹ(t_A)_k: Average value of bin k in Allan variance
- Y: North component of the user position
- Ẏ: Time derivative of Y
- z_k: Measurement vector
- z_u: User (receiver) z-coordinate
- Z: Depth of an object, i.e. the Z-coordinate in the world reference frame

## 1. Introduction

In addition to commercial solutions, such as flexibly directing the user to the desired destination, pedestrian navigation is crucial in critical applications such as the positioning of first responders, electronic monitoring (i.e. monitoring of dangerous offenders under house arrest or parole), and military personnel.
The equipment used for pedestrian navigation has to be small and light to carry, effortless to use, and have reasonably low power consumption and price. As in all navigation systems, the position information has to be accurate and available in real time. At present, Global Navigation Satellite Systems (GNSS) are the superior navigation technology fulfilling all of the above requirements in outdoor open-sky environments. However, instruments for pedestrian navigation are mainly needed indoors and in urban areas, where GNSS is significantly degraded or unavailable. In these GNSS-challenged environments the absolute position of the user may be obtained with other radio navigation systems such as Wireless Local Area Networks (WLAN), Bluetooth, or Radio Frequency Identification (RFID). The drawbacks of these radio systems are that they need infrastructure prepared in advance and are therefore restricted to certain areas. Depending on the number of access points available, their availability in some environments is also too low for the needs of pedestrian navigation. When the initial absolute position is known, the position of the user may be propagated using relative positioning approaches, such as self-contained sensors. The propagated position may then be used to augment the position measurements obtained with GNSS or other radio sensors for more accurate and available, or even short-term stand-alone, navigation. The most commonly used self-contained sensors in pedestrian navigation are digital compasses for measuring the heading of the user, gyroscopes for heading changes, and accelerometers for the user speed. When these measurements are used as inputs to Pedestrian Dead Reckoning (PDR) algorithms or integrated with absolute position measurements using a Kalman filter, the position of the user is obtained continuously despite the degradation of the GNSS signals.
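The dead-reckoning update at the heart of PDR can be sketched in a few lines. This is a deliberately simplified illustration (not the thesis's implementation): in practice the step length and heading would come from accelerometer step detection and the compass or gyroscope, and would be fused with absolute measurements in a Kalman filter rather than propagated open loop.

```python
import math

def pdr_step(east, north, heading_rad, step_length_m):
    """Propagate a 2-D position by one detected step.
    Heading is measured clockwise from north, the usual geodetic convention."""
    east += step_length_m * math.sin(heading_rad)
    north += step_length_m * math.cos(heading_rad)
    return east, north

# Four detected steps of 0.7 m walking due east (heading 90 degrees).
e, n = 0.0, 0.0
for _ in range(4):
    e, n = pdr_step(e, n, math.radians(90.0), 0.7)
print(round(e, 2), round(n, 2))  # 2.8 0.0
```

The weakness the text describes is visible in this form: a constant heading bias enters every step, so the position error grows with the distance travelled, which is why visual or absolute aiding is needed.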
However, self-contained sensors suffer from biases and drift errors that may decrease the position accuracy substantially, especially when consumer-grade Micro-Electro-Mechanical (MEMS) sensors are used. The errors experienced in a pedestrian indoor position solution due to the above shortcomings of self-contained sensors, of which the accelerometers, gyroscopes and magnetometers are discussed in more detail herein, may be mitigated using information about the user motion obtained from consecutive images. When the user is carrying a camera whose position and orientation with respect to the user's body are known, the motion of the features in the observed images may be transformed into information about the user motion. The visual motion information is not affected by the same error sources as GNSS and self-contained sensors, and it is therefore a complementary information source for augmenting the positioning measurements. Vision-aiding increases the accuracy, availability, continuity and reliability of the navigation solution, as will be shown herein. The visual perception used in the methods presented herein utilizes a camera carried by the user and facing roughly in the direction of motion. For consumer smartphone applications this is in most cases a preferable configuration: when navigating in, for example, a mall or an office building, the user is likely to follow the path or look at a map on the smartphone display, resulting in the camera orientation required by the method. For first responders, electronic monitoring, and military personnel, the camera is preferably not carried in the hand, where it would complicate fundamental operations, but attached to the body or helmet of the person. This configuration is also favourable for the methods presented in the thesis.

### 1.1 Research Objectives

The use of visual information in navigation is challenging.
The motion of the features in consecutive images provides information about the change of the user's heading and translation during the time interval between two consecutive images. In order to convert this relative information into the absolute position information needed for navigation, the position and heading have to be initialized with known absolute values and then propagated using the relative measurements. The attitude and translation obtained using visual information would stay accurate during navigation if the environment were favourable for visual perception and there were no correspondence errors when matching features from consecutive images. Visual measurements obtained at different time epochs are independent, and therefore the errors in previous epochs do not affect the measurements from subsequent images. However, as errors are inevitable, the propagated distance, and therefore the position, starts to drift after a while, so absolute information is needed to re-initialize the trajectory from time to time. The main error sources for vision-aiding observable in indoor surroundings are the varying lighting conditions of the environment and the low number of distinctive features to be detected. Urban outdoor areas do not usually suffer from low lighting during daytime. Outdoor surroundings are rich in features, but also in dynamic objects, namely humans and vehicles. When the motion of the camera, and therefore of the user, is observed by following the motion of features in consecutive images, the image processing method has to be able to prevent the dynamic objects in the scene from disturbing the perception of the motion.
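A common way to keep moving objects from corrupting the motion estimate is a consensus scheme in the spirit of RANSAC: hypothesize the camera-induced image motion from a minimal sample of matched features and keep only the features that agree with it. The sketch below is an illustrative simplification I am adding (pure image translation, exhaustive one-point hypotheses, made-up thresholds and pixel values), not the algorithm used in this thesis:

```python
def consensus_flow(flows, threshold=2.0):
    """Estimate the dominant (camera-induced) image motion from matched
    feature displacements, rejecting outliers such as moving objects.
    flows: list of (dx, dy) pixel displacements between two frames."""
    best = []
    for cand in flows:  # each match proposes a pure-translation hypothesis
        inliers = [f for f in flows
                   if abs(f[0] - cand[0]) < threshold
                   and abs(f[1] - cand[1]) < threshold]
        if len(inliers) > len(best):
            best = inliers
    n = len(best)  # refine with the mean of the consensus set
    return sum(f[0] for f in best) / n, sum(f[1] for f in best) / n

# Static scene appears to move ~5 px right as the camera turns;
# a pedestrian in the view moves in a different direction.
static = [(5.0, 0.1), (4.9, -0.1), (5.1, 0.0), (5.0, 0.2), (4.8, 0.0)]
moving = [(-12.0, 3.0), (-11.5, 2.5)]
dx, dy = consensus_flow(static + moving)
print(round(dx, 2), round(dy, 2))  # 4.96 0.04
```

Because the static background usually contributes the largest consistent set of displacements, the pedestrian's features end up in the minority and are discarded; the same consensus idea generalizes to the homography and fundamental-matrix estimation discussed later.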
In this research the user heading, as well as the pitch and roll of the camera, are observed by tracking the motion of vanishing points, namely the image points at which the projections of parallel three-dimensional lines intersect under the projective transformation mapping the three-dimensional scene onto the two-dimensional image. The deficiency of the method is that it is strongly dependent on the geometry of the environment, as it requires straight parallel lines in the view of the camera, preferably in at least two orthogonal directions. In sharp turns this requirement is violated and the magnitude of the turn is often impossible to determine using image processing alone. Resolving translation from consecutive images using only a monocular camera is a challenging research task as well. The complication in observing translation from consecutive images is that the distance between the objects seen in the images and the camera contributes to the extent the image pixels move when the camera moves. When the depth, i.e. the distance of the object from the camera, is unknown, the scale of the translation stays unknown regardless of how many matching image points are found between the consecutive images. The objectives of this research are to provide methods to retrieve heading and translation information from consecutive images addressing the above-mentioned challenges, and to enable accurate, more reliable and available navigation solutions by augmenting other positioning systems with the information obtained. All calculations are of a sufficiently low complexity to be adopted in real time for navigation with current smartphones. The algorithms of the concept called ”visual gyroscope” providing the user heading have already been developed for the mobile phone environment, and the feasibility of the implemented system is discussed herein.

1.2 Related Work

The research related to visual positioning has so far mainly concentrated on the navigation of vehicles and mobile robots [26] [103].
The motion of a robot or vehicle is constrained and usually only two-dimensional. The visual calculations are easier due to the fact that the location and the path of motion are to some extent known in advance. The first papers related to vision aiding in pedestrian navigation were published in the late 1990s [8]. They used previously prepared databases of images of the surroundings, tagged with position information obtained using the Global Positioning System (GPS), a map or a floor plan. The absolute position of the pedestrian was provided when a match was found between an image taken by the pedestrian and an image from the database [113]. One of the first such applications made for a smartphone was published in 2004 by Robertson and Cipolla [85], running the calculations on a server to which the query image was sent. Hile and Borriello [43] matched features, like corners, found from the query image to a floor plan saved on a server. The feature matching was restricted to a certain area of the floor plan using coarse position information obtained with WLAN. The database-based vision-aiding applications provide accurate positioning but are restricted to a certain area and require extensive preparation. A visual pedestrian navigation system independent of a server and of pre-existing databases usually needs integration with other positioning sensors to be functional. In such a system the relative position of the user is obtained by monitoring the motion of features in consecutive images taken by the user device and integrating the information with measurements obtained with self-contained sensors or a GNSS receiver. With initial absolute position information the navigation may be performed with reduced drift and other errors, as without the initial position the visual perception provides information about the user motion only.
Such server-independent systems have been proposed by [41] using visual-aided Inertial Measurement Unit (IMU) measurements. On the other hand, a Simultaneous Localization And Mapping (SLAM) system produces a map of the unknown environment while simultaneously locating the user. Traditionally the mapping has been done using inertial sensors, but in recent years visual SLAM systems also integrating a camera have been developed [19]. Most man-made environments, indoors and in urban outdoor areas, consist of segments forming a Cartesian coordinate system with straight lines in three orthogonal directions. This coordinate system is called the Manhattan grid [25] and it provides a good basis for vision-aided navigation utilizing vanishing points [13] [35] [111]. The method of integrating vanishing point based orientation information with Inertial Navigation System (INS) measurements has been implemented before for accurate indoor navigation of an unmanned aerial vehicle (UAV) [108] [30] [82] and for pedestrian navigation [55] [47]. The method presented in this thesis follows the mentioned vanishing point based methods but is further developed for pedestrian and especially smartphone use through computationally less demanding algorithms and sophisticated error detection. The unknown scale of the translation obtained from tracking the motion of features in consecutive images is one of the most challenging issues related to visual navigation. The magnitude of the motion of a feature in an image depends on the depth of the object, i.e. the distance of the object from the camera. Because the distance of the objects from the camera in the navigation environment is usually unknown, a scale problem arises, and different methods for overcoming it have been used. When the environment contains objects with known sizes, the distance may be resolved [97].
Also, when scale information about the environment is available, for example in the form of a floor plan [44], the depth of the objects may be observed. Tools aiding the resolution of the distance, like laser rangefinders, have been integrated with a camera by [115] and the motion of the user resolved. The requirement for special equipment reduces the applicability of these methods for pedestrian navigation at this time. When a stereo camera is used, the distance of the objects may be resolved using triangulation [49]. Recently some smartphones equipped with stereo cameras have been launched. In the case of stereo vision the distance between the two cameras, called the baseline, affects the accuracy of the motion obtained from images: the farther the two cameras are from each other, the better the depth accuracy [45]. Therefore a configuration using a monocular camera and images taken from two different positions provides better results for vision-aided navigation than a smartphone equipped with a stereo camera with a very short baseline (e.g. 2.4 cm for the LG OPTIMUS 3D smartphone). A certain configuration of the navigation system gives information about the distance of the photographed objects from the camera. When the camera is pointing down to the ground, the z-coordinate, i.e. the distance, is constant and equals the height of the camera. The method utilizing a downward-pointing camera has been used in applications of vehicle navigation [79] [58] and recently in pedestrian navigation [42]. However, one of the challenges of visual aiding in indoor environments is the shortage of features to be tracked. Floor textures especially are usually very homogeneous, making it very difficult to find matching image points using a camera pointing straight at the floor.
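The ground-plane geometry described above extends to a camera pitched down by a known angle: once the camera height is known, the distance to any ground point in the image follows from simple trigonometry. A minimal sketch under a pinhole camera model, with all numerical values illustrative:

```python
import math

def ground_distance(h, pitch_deg, v, cy, f):
    """Horizontal distance to the ground point seen at image row v,
    for a camera at height h [m] pitched down by pitch_deg.
    Rows below the principal point cy look further downward; f is
    the focal length in pixels."""
    ray = math.atan2(v - cy, f)                  # ray angle below the optical axis
    depression = math.radians(pitch_deg) + ray   # total angle below horizontal
    return h / math.tan(depression)

# Camera 1.3 m above the floor, pitched 30 degrees down, f = 500 px;
# a point on the optical axis (v = cy) then lies h / tan(30 deg) away.
print(round(ground_distance(1.3, 30.0, v=240.0, cy=240.0, f=500.0), 3))  # -> 2.252
```

With the camera orientation known per image, this relation fixes the depth of every ground-plane feature and thereby the scale of the translation.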
[16] developed an outdoor robot navigation system using a special camera configuration, namely a camera with a certain pitch towards the ground, to resolve the distance problem. Optical flow calculations were used for finding the camera rotation and translation. The method presented in this thesis follows the ideas presented in that work but is further developed for pedestrian and indoor use. In [16] the pitch was measured a priori and kept static during navigation, whereas in the method developed herein the orientation of the camera is computed separately for each image, thereby accommodating the irregular movement of the camera, e.g. in a smartphone held in hand. As the orientation of the camera is computed separately, the method decreases the number of features needed for resolving the motion. As mentioned above, GNSS is an accurate and freely accessible system for outdoor navigation widely used in smartphones. However, at least four satellites in a good geometry are needed for solving the user position, a requirement that is not always fulfilled in urban areas. When knowledge of an initial position is available, fewer satellites may be used for resolving the total change in position between two time epochs, as will be shown later. When the errors affecting satellite signal propagation are known, information obtained from two satellites is enough to resolve the total magnitude of translation in addition to the receiver clock error. [99] used the magnitude information for resolving the ambiguous scale of translation induced by the motion of features in consecutive images. A method for robot navigation was developed encompassing three cameras for visual measurements, an IMU for resolving the pitch and roll of the camera, and an iterative algorithm for solving the user heading. In this thesis, an algorithm more suitable for pedestrian navigation is developed, utilizing less equipment and more robust vision-derived heading information.
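As a concrete illustration of the vanishing-point geometry underlying the heading methods cited above, the sketch below intersects the image projections of two parallel 3-D lines using homogeneous coordinates and converts the resulting vanishing point into a yaw angle relative to the optical axis. A calibrated pinhole camera in a roughly level orientation is assumed, and all coordinates, the principal point and the focal length are illustrative:

```python
import numpy as np

def line_through(p1, p2):
    """Homogeneous line through two image points (x, y)."""
    return np.cross([p1[0], p1[1], 1.0], [p2[0], p2[1], 1.0])

def vanishing_point(seg_a, seg_b):
    """Intersection of two image line segments, i.e. the vanishing point
    of the parallel 3-D lines they depict (cross product of the lines)."""
    v = np.cross(line_through(*seg_a), line_through(*seg_b))
    return v[:2] / v[2]  # back to inhomogeneous pixel coordinates

def yaw_from_vp(vp, cx, f):
    """Heading of the parallel 3-D lines relative to the optical axis,
    for a level camera with principal point column cx and focal length f."""
    return np.degrees(np.arctan2(vp[0] - cx, f))

# Two projections of parallel corridor edges, converging at (320, 240):
vp = vanishing_point(((100, 400), (265, 280)), ((540, 400), (375, 280)))
print(vp, yaw_from_vp(vp, cx=320.0, f=500.0))  # -> [320. 240.] 0.0
```

Tracking how such vanishing points move between frames yields the heading change without any feature-point matching.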
1.3 Author’s Contribution

In this thesis a novel pedestrian navigation system is presented. Two concepts are developed, namely a ”visual gyroscope” providing the user heading and a ”visual odometer” providing the translation. The motivation for dividing the observation of the user heading and translation into two separate tasks, instead of using traditional methods resolving the full motion at once, is the difficulty of determining the unknown scale of the translation and also the challenges indoor environments pose for visual perception, namely the low lighting and shortage of features. The visual odometer presented in the thesis builds on the orientation information produced by the visual gyroscope. Also, traditional methods utilize feature points matched in consecutive images. As the two-fold method presented herein perceives the orientation change using lines observed from the scene, the measurements are much more accurate than with other features, as will be explained in Chapter 4, and the number of other features needed for resolving the translation is reduced. The author’s contributions also include a system developed for pedestrian urban navigation, utilizing the visual gyroscope, the visual odometer and signal carrier information obtained from at least two GNSS satellites. All calculations are of a sufficiently low complexity to be adopted for navigation with current smartphones. The main contributions of the thesis are as follows:

• A visual gyroscope with lower computational requirements compared to the existing algorithms resolving the user heading using visual perception, and therefore suitable for present smartphones. The visual gyroscope is based on observing the heading, pitch and roll of the camera using vanishing points.

• A novel error detection method for the visual gyroscope which provides accurate and reliable navigation despite the unforeseeable motions of a pedestrian. The algorithm makes the visual gyroscope suitable for pedestrian navigation.
• A visual odometer, namely a method to resolve translation from images using a monocular camera. The visual odometer is suitable for use also in indoor environments, which are usually poor in features. It is feasible for seamless navigation since it leans on the visual gyroscope’s orientation information and needs only the approximate height of the camera as prior information.

• A vision-aided differentiated carrier phase navigation system for pedestrians. The method is leaner than previous similar solutions. The system is independent of sensors other than the camera and the GNSS receiver because it encompasses the visual gyroscope and visual odometer providing the orientation and motion information.

The core contributions of Chapters 4-6 were first presented in [89], [90], [92], [91], [93], [94] and [95], in which the author of the thesis is the first author, and in [64], in which the author of the thesis is a co-author.

1.4 Thesis Outline

The thesis combines two different scientific disciplines: navigation and computer vision. Both have a well-established terminology and mathematical representation. In order to respect the fundamentals of both sciences, the thesis utilizes the customary terminology. Unfortunately, in the case of variable names and expressions there are many differing meanings, and therefore the terminology characteristic of each discipline is presented in dedicated chapters. In Chapter 2, the most prevalent systems used in pedestrian navigation - i.e. GNSS, WLAN and self-contained sensors - are presented. The computer vision principles relevant to vision-aided navigation are discussed in Chapter 3 with an emphasis on the methods and algorithms used in the thesis. Chapter 4 introduces the concept of a ”visual gyroscope” and the novel error detection algorithm.
The feasibility and challenges of the visual gyroscope are discussed, as well as the effect of different camera and setup characteristics on the accuracy and applicability of the method in pedestrian navigation. In Chapter 5 the concept of a ”visual odometer” is presented. The mathematics, strengths and challenges of the visual odometer and its utilization are discussed. Chapter 6 presents results from various experiments integrating the visual gyroscope and odometer, both for indoor and urban pedestrian navigation. In Chapter 7 the vision-aided differentiated carrier phase navigation system for pedestrians, results from experiments and its feasibility for urban pedestrian navigation are discussed. Chapter 8 provides conclusions and recommendations for future research. The Novatel GPS and the Novatel SPAN-SE GPS/GLONASS receivers with a Northrop Grumman tactical grade LCI-IMU were used to determine the reference trajectories for assessing the performance of the algorithms developed in this thesis. As the system is initialized outdoors using the GNSS receiver and the navigation time indoors is limited to a short period, the obtained post-processed reference solution has a decimeter level accuracy.

2. OVERVIEW OF PEDESTRIAN NAVIGATION

Global Navigation Satellite Systems (GNSS) are the superior navigation technology, used also for pedestrian positioning. However, GNSS is significantly degraded or unavailable in the environments where pedestrian positioning is mainly needed, namely indoors and in urban areas, and other methods are required for augmentation or replacement in these situations. Methods other than GNSS for these indoor and urban areas may be divided into two classes based on the type of position information they provide, namely absolute and relative positioning.
Robust integration of measurements from sources providing data at different rates and perceiving observations in different reference frames is challenging. This chapter introduces the basics of GNSS based positioning, Wireless Local Area Network (WLAN) positioning and other absolute positioning methods. The relative positioning systems used in this thesis, namely the Inertial Navigation System and other self-contained sensors, are also presented. Finally the Kalman filter, a set of mathematical equations used for estimating the state of a process based on a priori knowledge of the accuracy of the measurements and confidence in the model used, is discussed.

2.1 Navigation Frames and Attitude

This thesis uses five reference frames relevant for vision-aided pedestrian navigation, namely the Inertial, Earth-Centered Earth-Fixed, Navigation, Body and Camera reference frames. The inertial frame has its origin at the centre of the Earth and axes fixed with respect to the stars, not rotating with the Earth. The Earth-Centered Earth-Fixed reference frame also has its origin at the centre of the Earth, but the axes rotate with the Earth with respect to the inertial frame. Both frames have their z-axis coincident with the Earth’s polar axis. The navigation frame is a local geographic frame with its origin defined by the initialization of the navigation setup and axes pointing north, east and up. The body frame is the frame in which the inertial navigation system is installed, containing three orthogonal axes, the z-axis pointing up [105].

Fig. 2.1. Heading, pitch and roll in the Navigation frame.

In vehicle navigation the rotation around the z-axis is called yaw, around the x-axis roll and around the y-axis pitch.
In pedestrian navigation, where the orientation of the unit is not always fixed with respect to the user, the term yaw is substituted with heading and defined as the angle between the chosen body axis and the north [24], the latter defined as the direction from a point to the North Pole of the Earth projected onto the level surface. The heading, pitch and roll are shown in Figure 2.1. The definition of the camera reference frame is not needed in this chapter but will be given in Chapter 3.

2.2 Absolute Positioning

Absolute positioning systems provide the actual coordinates of the user position, whereas relative positioning systems provide the speed (or translation) and direction of the user, to be integrated with the initial position. In reality, only GNSS provides absolute coordinate information of the pedestrian in the Earth-Centered Earth-Fixed (ECEF) coordinate frame, as the other systems provide the absolute position in some a priori defined local reference frame, for example inside a certain building. All the absolute positioning techniques presented facilitate positioning by transmitting radio waves with different wavelengths and frequencies.

2.2.1 Global Navigation Satellite Systems

GNSS encompass the United States Global Positioning System (GPS), the Russian GLONASS, the Chinese COMPASS/Beidou and the European Galileo systems. The following principles are based on GPS, because it is still the most used system due to its long existence compared to the other systems mentioned above, but they hold for the other systems as well. In GNSS based positioning the travel time of a signal from the satellite to the user receiver antenna is estimated. When this time is multiplied by the speed of light, a geometric range between the satellite and the user is obtained. In an ideal case measurements from three satellites would provide an accurate three-dimensional position of the user.
In reality the measurements are erroneous, the main error source being the offsets of the receiver clock and the satellite clock from the system time. Therefore the measured range is called the pseudorange. The satellite clocks are precise and synchronized by the ground control segment of the system. However, the clocks in the user receivers are low-cost, with a typically large timing error. Therefore, the receiver clock error has to be estimated as a parameter in the navigation solution. Observations from at least four satellites are needed for three-dimensional positioning, as the fourth observation is used for resolving the receiver clock error. The pseudorange measurement is defined as

$\rho^i = r^i + c(t_u - dt^i) + d_\rho^i + d_{iono}^i + d_{tropo}^i + \varepsilon_\rho^i$ (1)

where $r^i$ is the geometric range between the user receiver’s antenna and satellite $i$ [m], $c$ is the speed of light [m/s], $t_u$ is the receiver clock error [s] and $dt^i$ is the satellite clock error [s] with respect to GPS time, $d_\rho^i$ is the ephemeris error [m], $d_{iono}^i$ and $d_{tropo}^i$ are the ionospheric and tropospheric delays [m], respectively, and $\varepsilon_\rho^i$ encompasses noise, unmodelled errors and multipath [75]. Because some of the errors may be corrected using the data found in the signal and the rest may be considered negligible compared to the receiver clock error, the pseudorange measurements may be expressed as

$\rho^i = \|\mathbf{s}^i - \mathbf{u}\| + c\,t_u$ (2)

where $\mathbf{s}^i$ represents the coordinate vector of satellite $i$, $c\,t_u$ is the speed of light ($c$) times the advance of the receiver clock $t_u$ and $\mathbf{u}$ is the user coordinate vector $(x_u, y_u, z_u)$ to be resolved [54]. These pseudorange measurements from at least four satellites may further be used for obtaining the user coordinates.
Because the pseudorange equations are non-linear, they have to be linearized about approximate values of the user position and clock error by expanding in a Taylor series as

$f(x_u, y_u, z_u, t_u) = f(\hat{x}_u + \Delta x_u, \hat{y}_u + \Delta y_u, \hat{z}_u + \Delta z_u, \hat{t}_u + \Delta t_u) = f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u) + \frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{x}_u}\Delta x_u + \frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{y}_u}\Delta y_u + \frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{z}_u}\Delta z_u + \frac{\partial f(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)}{\partial \hat{t}_u}\Delta t_u + \ldots$ (3)

where $(\hat{x}_u, \hat{y}_u, \hat{z}_u, \hat{t}_u)$ are approximate values of the true position and true clock error $(x_u, y_u, z_u, t_u)$ and $(\Delta x_u, \Delta y_u, \Delta z_u, \Delta t_u)$ are the differences between the true and approximate values. The higher order terms are discarded to remove the non-linearities, and the remaining first-order partial derivatives are

$\frac{\partial f}{\partial \hat{x}_u} = -\frac{x^i - \hat{x}_u}{\hat{r}^i}, \quad \frac{\partial f}{\partial \hat{y}_u} = -\frac{y^i - \hat{y}_u}{\hat{r}^i}, \quad \frac{\partial f}{\partial \hat{z}_u} = -\frac{z^i - \hat{z}_u}{\hat{r}^i}, \quad \frac{\partial f}{\partial \hat{t}_u} = c$ (4)

where the estimated geometric ranges $\hat{r}^i$ are defined as

$\hat{r}^i = \sqrt{(x^i - \hat{x}_u)^2 + (y^i - \hat{y}_u)^2 + (z^i - \hat{z}_u)^2}.$ (5)

The pseudorange measurement may now be presented as

$\rho^i = \hat{\rho}^i - \frac{x^i - \hat{x}_u}{\hat{r}^i}\Delta x_u - \frac{y^i - \hat{y}_u}{\hat{r}^i}\Delta y_u - \frac{z^i - \hat{z}_u}{\hat{r}^i}\Delta z_u + c\Delta t_u$ (6)

and finally the difference between the measured pseudorange $\rho^i$ and the pseudorange computed using the estimated user position, $\hat{\rho}^i$, is

$\Delta\rho^i = \frac{x^i - \hat{x}_u}{\hat{r}^i}\Delta x_u + \frac{y^i - \hat{y}_u}{\hat{r}^i}\Delta y_u + \frac{z^i - \hat{z}_u}{\hat{r}^i}\Delta z_u - c\Delta t_u.$ (7)

The vector of differences between the approximate and true position and clock error, $\Delta\mathbf{x}$, is

$\Delta\mathbf{x} = \mathbf{H}^{-1}\Delta\boldsymbol{\rho}$ (8)

where, for $n$ measured satellites,

$\Delta\boldsymbol{\rho} = \begin{bmatrix} \Delta\rho^1 \\ \vdots \\ \Delta\rho^n \end{bmatrix}, \quad \mathbf{H} = \begin{bmatrix} \frac{x^1 - \hat{x}_u}{\hat{r}^1} & \frac{y^1 - \hat{y}_u}{\hat{r}^1} & \frac{z^1 - \hat{z}_u}{\hat{r}^1} & 1 \\ \vdots & \vdots & \vdots & \vdots \\ \frac{x^n - \hat{x}_u}{\hat{r}^n} & \frac{y^n - \hat{y}_u}{\hat{r}^n} & \frac{z^n - \hat{z}_u}{\hat{r}^n} & 1 \end{bmatrix}, \quad \Delta\mathbf{x} = \begin{bmatrix} \Delta x_u \\ \Delta y_u \\ \Delta z_u \\ -c\Delta t_u \end{bmatrix}$

and by using this information the true position and clock error may be computed from the approximate values when four satellites are observed. When more than four satellites are observed, the solution is computed using least-squares estimation as $\Delta\mathbf{x} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\Delta\boldsymbol{\rho}$.
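In practice the linearization and least-squares steps above are iterated, updating the approximate position until convergence. A minimal sketch of this Gauss-Newton iteration follows, with synthetic satellite coordinates and sign conventions chosen so that the clock state is carried as the range-equivalent bias $c\,t_u$ added to the predicted range:

```python
import numpy as np

def solve_position(sats, pr, x0=np.zeros(3), b0=0.0, iters=8):
    """Iterative least-squares GPS fix from n >= 4 pseudoranges.
    sats: (n, 3) satellite ECEF positions [m]; pr: (n,) pseudoranges [m].
    Returns the user position and the receiver clock bias c*t_u [m]."""
    x, b = np.asarray(x0, float), b0
    for _ in range(iters):
        rhat = np.linalg.norm(sats - x, axis=1)       # estimated ranges
        H = np.hstack([(x - sats) / rhat[:, None],    # unit line-of-sight rows
                       np.ones((len(pr), 1))])        # clock-bias column
        dp = pr - (rhat + b)                          # measured minus predicted
        dx = np.linalg.lstsq(H, dp, rcond=None)[0]    # (H^T H)^-1 H^T dp
        x, b = x + dx[:3], b + dx[3]
    return x, b
```

Starting from the Earth's centre, a handful of iterations suffices; with exactly four satellites the least-squares step reduces to the matrix inversion of equation (8).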
The user/satellite relative geometry contributes to how much the combined measurement errors, the most important being the ionospheric and tropospheric delays, receiver noise and resolution, and multipath, expressed using a variable called the User Equivalent Range Error (UERE), affect the resulting position error [54]. Adding measurements improves the position solution only when the measurements are linearly independent [75]. When the satellites used are widely spread with respect to the user receiver, the dilution of precision (DOP) is small and the position solution much more accurate than when the satellites are close to each other or otherwise in a poor configuration. The effect of the satellite geometry on the position error is quantified as follows. The user-satellite geometry is denoted as $\mathbf{G} = (\mathbf{H}^T\mathbf{H})^{-1}$, where the matrix $\mathbf{H}$ is the design matrix explained above. Then the covariance matrix of the position errors in the x-, y- and z-components and of the user clock bias ($t_u$) estimate is

$\mathrm{cov}(\mathbf{x}) = \mathbf{G}\sigma_{UERE}^2.$ (9)

The standard deviations of the position and clock error components are [54] [75]

$\sigma_x = \sigma_{UERE}\sqrt{G_{11}}; \quad \sigma_y = \sigma_{UERE}\sqrt{G_{22}}; \quad \sigma_z = \sigma_{UERE}\sqrt{G_{33}}; \quad \sigma_b = \sigma_{UERE}\sqrt{G_{44}},$

where $G_{ii}$ is the $i$th entry on the diagonal of $\mathbf{G}$. The Geometric Dilution of Precision (GDOP), encompassing the 3-D position and clock bias estimation error, is now

$GDOP = \sqrt{G_{11} + G_{22} + G_{33} + G_{44}}$ (10)

and the Position Dilution of Precision (PDOP) is the square root of the sum of the first three terms. In the case where four satellites are tracked, the PDOP value is smallest, and therefore the position solution the best possible, when three of the satellites are evenly distributed in azimuth near the horizon and the fourth is perpendicularly above the user receiver (i.e. at zenith). A more accurate satellite-to-user distance is obtained when a carrier phase observation is used.
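The DOP quantities just defined follow directly from the design matrix. A small sketch, using unit line-of-sight rows for the ideal four-satellite geometry described above (three satellites evenly spread in azimuth at the horizon, the fourth at zenith):

```python
import numpy as np

def dops(H):
    """GDOP and PDOP from the design matrix H of equation (8)."""
    G = np.linalg.inv(H.T @ H)
    gdop = np.sqrt(np.trace(G))
    pdop = np.sqrt(G[0, 0] + G[1, 1] + G[2, 2])
    return gdop, pdop

# Unit line-of-sight vectors: three satellites at the horizon spaced
# 120 degrees apart in azimuth, the fourth at zenith; last column is 1.
s, c = np.sin(np.radians(120.0)), np.cos(np.radians(120.0))
H = np.array([[1.0, 0.0, 0.0, 1.0],
              [c,   s,   0.0, 1.0],
              [c,  -s,   0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
gdop, pdop = dops(H)
print(round(gdop, 3), round(pdop, 3))  # -> 1.732 1.633
```

For this ideal geometry the GDOP evaluates to $\sqrt{3}$, illustrating why widely spread satellites give a small DOP.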
The carrier phase observation $\varphi$ from satellite $i$ is defined as

$\varphi^i = r^i + c(t_u - dt^i) + d_\rho^i + \lambda N - d_{iono}^i + d_{tropo}^i + \varepsilon_\varphi^i$ (11)

where $\lambda$ is the carrier wavelength, $N$ is the integer ambiguity, $\varepsilon_\varphi^i$ encompasses noise, unmodelled errors and multipath, and the other variables are as in the case of the pseudorange measurement. Although carrier phase measurements provide very accurate positioning, at the millimetre level in favourable conditions, they have not been widely used in pedestrian navigation. In order to obtain an accurate position solution, the integer ambiguity, namely the integer number of cycles the signal has traversed since leaving the satellite, has to be resolved. This may be done using double differenced GNSS measurements [67] or single differenced measurements and Precise Point Positioning (PPP) [33], both too complex for equipment meeting the weight and cost requirements of pedestrian navigation. The carrier phase observations are also difficult to obtain continuously in the environments typical for pedestrian positioning, namely urban areas and indoors: the carrier phase tracking loop is more vulnerable to losing lock in attenuated signal environments than the code delay tracking loop, which produces the pseudorange measurements. However, when the carrier phase measurements obtained at two consecutive time epochs are differenced, the integer ambiguity, which is constant, disappears. The differenced measurements are left with the error and noise terms as well as the change in geometric range between the time epochs, which may further be used for pedestrian navigation, as will be shown in Chapter 7. GNSS is the superior positioning system in open outdoor areas, but its use is very limited in urban and indoor areas. Although High Sensitivity GPS (HSGPS) receivers are usually used in these challenging environments, the performance in terms of reliability and accuracy is degraded [62].
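The cancellation of the constant ambiguity term in time-differenced carrier phase measurements, described above, can be checked with a toy computation; all numerical values are illustrative, and the atmospheric and satellite clock terms of equation (11) are reduced to a single receiver clock term for brevity:

```python
# Carrier phase (in metres) at two epochs, per equation (11) with only
# the range, receiver clock and ambiguity terms kept for illustration.
lam, N = 0.1903, 11_234_567            # L1 wavelength [m], integer ambiguity
r1, r2 = 21_300_000.0, 21_300_004.2    # geometric ranges at t1, t2 [m]
clk1, clk2 = 35.0, 35.2                # receiver clock term c*t_u [m]
phi1 = r1 + clk1 + lam * N
phi2 = r2 + clk2 + lam * N
dphi = phi2 - phi1                     # lam*N cancels while lock is kept
print(round(dphi, 6))                  # range change + clock drift = 4.4
```

The differenced observable retains only the change in geometric range and the clock drift, which is exactly what the navigation system of Chapter 7 exploits.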
As the received signal power level decreases, the measurement uncertainty increases due to noise. The Effective Isotropic Radiated Power (EIRP) of a GPS (L1 C/A code) civil signal is 26.8 dBW at the time of transmission. The power decreases mainly due to the free space propagation loss (∼184.4 dB) while the signal travels from space to the Earth. In order to be able to find the relevant information from the signal below the noise, the minimum received power at a conventional receiver has to be around -160 dBW [54] and at a typical HSGPS receiver -186 dBW [68]. The requirement of -160 dBW received power is achieved with a Line-of-Sight (LOS) signal, but the signal degrades due to the attenuation resulting from propagation through a material (i.e. shadowing) and interference, typically multipath (i.e. fading). The type of material the signal has to penetrate affects the amount of attenuation; a good comparison of the effect of different widely used materials may be found in [40]. For example, when entering a concrete and steel building the mean fading of the signal ranges from 19 to 23 dB, and from 12 to 21 dB when entering a residential garage, depending on the elevation angle of the satellites tracked [68]. The required and received signal power levels show that the use of an HSGPS receiver provides increased availability of GPS positioning in most GNSS challenging environments, but the accuracy is still too poor for pedestrian navigation. Therefore augmenting and replacing GPS signals in urban and indoor areas is needed, and a few comprehensive methods are discussed below.

2.2.2 WLAN Positioning

Wireless Local Area Network (WLAN), based on the IEEE 802.11 standard, is a wireless network used for communication between closely-spaced electronic devices (occasionally also called Wi-Fi, which is a registered trademark of the Wi-Fi Alliance). Because of its wide deployment, means of using the technology for positioning have also been developed.
In a WLAN positioning solution, the prevailing fingerprinting technique uses a database, a so-called radio map, of access point signal strengths collected manually during an off-line training process. The user position is determined with the radio map and Received Signal Strength Indication (RSSI) measurements, which are power level measurements of the received radio signal. The Bayesian theorem and e.g. the Histogram Maximum Likelihood method are used to solve the user position from the measurements [86], [112]. WLAN positioning typically provides room-level accuracy but is limited to surroundings with an existing and prepared infrastructure [101]. The RSSI samples measured at each reference point during the training phase are utilized to estimate the parameters of the Weibull distribution [96] used to describe the WLAN signal strength distribution [66]. During the positioning phase the observation vector $\mathbf{R}_{WLAN} = \{r_1, \ldots, r_n\}$ is used to find the position $\mathbf{x}$ that maximizes the conditional probability $P(\mathbf{x}|\mathbf{R}_{WLAN})$ using the Bayesian theorem as

$\arg\max_{\mathbf{x}}[P(\mathbf{x}|\mathbf{R}_{WLAN})] = \arg\max_{\mathbf{x}} \frac{P(\mathbf{R}_{WLAN}|\mathbf{x})P(\mathbf{x})}{P(\mathbf{R}_{WLAN})}.$ (12)

The advantages of WLAN positioning are its large coverage, the range typically being from 50 m to 100 m, and that no line of sight is required [74]. A major weakness of the fingerprinting procedure is its vulnerability to changes in the environment, which cause the signal propagation patterns, and thus the radio map, to become obsolete, reducing the accuracy of the position solution. Electrical equipment placed in the vicinity of the access points also distorts the position solution.

2.2.3 Other Technologies

The other promising and actively researched absolute positioning technologies for pedestrians include Radio Frequency Identification (RFID), Bluetooth and Ultra-Wideband (UWB).
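The maximization in equation (12), used by the fingerprinting technique of Section 2.2.2, can be sketched as follows. For brevity the sketch assumes a uniform prior and independent Gaussian likelihoods per access point, whereas the text above models the signal strengths with Weibull distributions; the radio map values are hypothetical:

```python
import math

def fingerprint_position(radio_map, rssi, sigma=4.0):
    """Pick the radio-map point maximizing P(x|RSSI) under a uniform
    prior: with P(x) and P(R) constant, maximizing the posterior of
    equation (12) reduces to maximizing the (log-)likelihood."""
    def log_lik(means):
        return sum(-((r - m) ** 2) / (2.0 * sigma ** 2)
                   for r, m in zip(rssi, means))
    return max(radio_map, key=lambda x: log_lik(radio_map[x]))

# Hypothetical radio map: position -> mean RSSI from three APs [dBm]
radio_map = {(0, 0): [-40, -70, -80],
             (5, 0): [-55, -52, -75],
             (5, 5): [-72, -48, -50]}
print(fingerprint_position(radio_map, [-54, -50, -73]))  # -> (5, 0)
```

The denominator of equation (12) is independent of the candidate position, which is why it can be dropped from the maximization.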
RFID positioning is based on equipping the user with tags that are then observed by a reader, or on placing tags in the environment and equipping the user with a reader. The two most widespread methods used for resolving the user position are simply acknowledging that a user is close to a reader with a known position, or using RSSI measurements as described above. Ultra-Wideband positioning is based on a transmitter emitting radio waves occupying a large frequency bandwidth, namely more than 500 MHz. The benefit of using Ultra-Wideband signals compared to their narrowband equivalents is their ability to penetrate many building materials such as concrete, glass and wood [59]. UWB based positioning may be performed similarly to RFID positioning, by equipping the user with receiver tags and using RSSI methods as in WLAN positioning, or with Time of Arrival (ToA) methods similar to GNSS based positioning. However, UWB positioning may also be used without supplying the user with special equipment, namely by using the UWB transmitter as a radar. In this manner the time elapsed before an emitted signal comes back to the transmitter after reflecting from the user is measured. When the background is known, the position of the user may be estimated. Bluetooth positioning uses the same principles as WLAN positioning; the position is mainly obtained using the RSSI methods utilizing an a priori prepared database of the access points in the area. Smartphones have been equipped with Bluetooth receivers for a long time already, but unfortunately the infrastructure of access points is not nearly as widespread as for WLAN. The benefit of using Bluetooth for positioning is that the transmitters may be manufactured to transmit signals at high power, resulting in long-range positioning [20].
A comprehensive presentation of various techniques used for indoor positioning, not all mentioned in this thesis, may be found in [74].

2.3 Relative Positioning

Self-contained sensors carried by the user are desirable equipment for pedestrian navigation, providing relative position information independently of the environment. With a known initial position, the position may be propagated using the sensors for a limited period of time [24]. The propagation is done using standard inertial algorithms incorporating the attitude, obtained by integrating the gyroscope measurements, and the translation, obtained by double-integrating the accelerometer measurements. The limitation of the self-contained sensors is the cumulative measurement error, which grows fast because the procedure also integrates the measurement noise and biases. An important aspect strongly affecting the development of pedestrian navigation is that the tolerable number and size of the equipment used is limited compared to e.g. robot or vehicular navigation. This forces a compromise between the accuracy and the usability of the system. Micro-Electro-Mechanical System (MEMS) sensors are small in size and weight, have low power consumption and are inexpensive to produce [105], and are therefore used widely for pedestrian navigation and especially in smartphones, however with decreased measurement performance.

2.3.1 Inertial Sensors

Accelerometers and gyroscopes are called inertial sensors. A system encompassing at least one accelerometer observing the acceleration of the body and a gyro measuring the rotation is called an Inertial Measurement Unit (IMU). Two different methods are used for processing the IMU measurements, namely Pedestrian Dead Reckoning (PDR) and inertial navigation [37], systems using the latter being referred to as Inertial Navigation Systems (INS).
PDR has three phases: step detection, step length estimation, and navigation solution update, which combines the step length estimate obtained using at least one accelerometer with the heading from a magnetometer or from a gyro augmented with a magnetometer. As the performance of PDR is less sensitive to the quality of the sensors, especially where the distance travelled is concerned, PDR is feasible with MEMS sensors. The process works with a single accelerometer, but the performance increases when more sensors are used [37]. Inertial navigation algorithms require a full IMU with triads of accelerometers and gyroscopes. The accelerometers observe the acceleration of a body, but to be able to transform the acceleration measurements into user position the direction of the acceleration is also needed, and that is obtained by observing the relative rotational motion of the body with respect to the inertial reference frame using rate gyroscopes. The most common type of MEMS gyros are vibratory gyros based on the Coriolis force [9]. The formation of a navigation solution from the accelerometer and gyroscope measurements is as follows [105]. The accelerometers output a measurement of specific force in the body reference frame f^b. The measurement has to be transformed into the inertial reference frame using a Direction Cosine Matrix [61] C^i_b as f^i = C^i_b f^b. Matrix C^i_b may be computed from the angular velocities \omega^b_{ib} obtained from the gyroscopes using

\dot{C}^i_b = C^i_b \Omega^b_{ib}    (13)

where \Omega^b_{ib} is the skew-symmetric matrix of the angular velocity vector. The specific force contains a measure of mass gravitation (g) that has to be accommodated, resulting in

\frac{d^2 r}{dt^2} = f^i + g    (14)

where r is the user position vector with respect to the reference frame origin and t is time. By integrating the obtained value once, the velocity of the user in the inertial reference frame v^i is obtained. In pedestrian navigation the final navigation solution is needed in the Earth frame.
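The mechanization just described (attitude propagation by eq. 13, then eq. 14 integrated twice) can be sketched as a first-order discrete update; the time step, gravity vector and sensor readings below are illustrative only:

```python
def skew(w):
    """Skew-symmetric matrix of the angular velocity vector, cf. eq. (13)."""
    wx, wy, wz = w
    return [[0.0, -wz, wy],
            [wz, 0.0, -wx],
            [-wy, wx, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def strapdown_step(C, v, r, w_ib, f_b, dt, g=(0.0, 0.0, -9.81)):
    """One strapdown step: attitude update, then eq. (14) integrated twice."""
    Om = skew(w_ib)
    # C_dot = C * Omega, first-order update: C <- C (I + Omega dt)
    upd = [[(1.0 if i == j else 0.0) + Om[i][j] * dt for j in range(3)]
           for i in range(3)]
    C = matmul(C, upd)
    f_i = [sum(C[i][k] * f_b[k] for k in range(3)) for i in range(3)]
    a = [f_i[k] + g[k] for k in range(3)]      # accommodate gravitation, eq. (14)
    v = [v[k] + a[k] * dt for k in range(3)]   # first integration: velocity
    r = [r[k] + v[k] * dt for k in range(3)]   # second integration: position
    return C, v, r
```

With a stationary attitude and a specific force of 1 m/s^2 along the x-axis on top of the gravity reaction, one one-second step moves the estimate 1 m along x, illustrating how any sensor bias is likewise integrated into position error.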
The Coriolis theorem provides the means to convert the velocity measurement into the ECEF frame using the Earth turn rate \omega_{ie} = [0\ 0\ \Omega]^T as

v^e = v^i - \omega_{ie} \times r    (15)

where \times denotes a vector cross product. By integrating the velocity measurement again, the position of the user r in the Earth-Centered Earth-Fixed reference frame is obtained. Integration of INS and GNSS fills the outages in positioning and provides a more robust and reliable system than either alone. However, both the accelerometer and gyroscope measurements suffer from various error sources, the most important ones being bias, scale factor and noise. Therefore the low-cost MEMS accelerometers used especially in smartphones are too erroneous for obtaining the user speed without augmentation with e.g. GNSS, or without calibration and special algorithms, e.g. [104], [81]. The drift in the gyroscope-derived user attitude, especially heading, due to the mentioned errors still seems to be the most significant challenge in indoor and urban pedestrian navigation, although the accuracy of the position solution increases when multiple IMUs are used [10]. In order to achieve the accuracy and continuity of positioning needed for pedestrian navigation, means for overcoming the challenges due to the mentioned errors have to be found. The next section introduces a few techniques used for augmenting the inertial sensors.

2.3.2 Other Self-Contained Sensors

This section presents two self-contained sensors not discussed above and used in the thesis, namely a magnetic compass, also called a magnetometer, and a barometer.

Magnetometer

A magnetic compass provides absolute angle information of the user with respect to magnetic north by measuring the intensity of the Earth's magnetic field [18].
The Earth's magnetic field has a component parallel to the Earth's surface pointing toward magnetic north, which in the Helsinki area [2] differs by approximately 8 degrees from geographic north (magnetic declination) and has a field intensity of about 0.52 gauss. The magnetic declination varies both from place to place and over the course of time. If the compass is perfectly parallel to the Earth's surface, the heading (i.e. azimuth) \theta may be computed from its horizontal measurements X_M, Y_M, neglecting the vertical component Z_M, as

\theta =
  90,                                      if X_M = 0, Y_M > 0
  270,                                     if X_M = 0, Y_M < 0
  180 - \arctan(Y_M/X_M) \cdot 180/\pi,    if X_M < 0                 (16)
  -\arctan(Y_M/X_M) \cdot 180/\pi,         if X_M > 0, Y_M \le 0
  360 - \arctan(Y_M/X_M) \cdot 180/\pi,    if X_M > 0, Y_M > 0.

In practice, especially in pedestrian navigation applications, the compass is not perfectly parallel to the Earth's surface and the tilt has to be compensated for. If the roll (\beta) and pitch (\phi) of the compass are known, the compass measurements X_M and Y_M may be transformed to the horizontal plane (X_H, Y_H) as

X_H = X_M \cos(\phi) + Y_M \sin(\beta)\sin(\phi) - Z_M \cos(\beta)\sin(\phi)
Y_H = Y_M \cos(\beta) + Z_M \sin(\beta).    (17)

Now the heading \theta may be computed using the equations in (16) by substituting the variables X_H, Y_H for X_M, Y_M. The compass measurements suffer from errors of two different types: predictable and unpredictable. The predictable errors come from sources such as the orientation of the navigation platform, soft and hard iron effects, and magnetic declination. These errors may be eliminated by calibration or real-time compensation algorithms [22].

Fig. 2.2. Absolute heading error obtained with a digital compass of a smartphone indoors.

The unpredictable errors, mainly due to environmental magnetic disturbances, may be large, for example causing a heading mean error of around 18 degrees in an office corridor experiment using a MEMS compass built into a smartphone [94], as shown in Figure 2.2, and are difficult to remove.
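The piecewise rule of eq. (16), preceded by the tilt compensation of eq. (17), can be written directly as a small routine (tilt angles in radians, heading in degrees):

```python
import math

def tilt_compensate(xm, ym, zm, roll, pitch):
    """Project magnetometer readings to the horizontal plane, eq. (17)."""
    xh = (xm * math.cos(pitch) + ym * math.sin(roll) * math.sin(pitch)
          - zm * math.cos(roll) * math.sin(pitch))
    yh = ym * math.cos(roll) + zm * math.sin(roll)
    return xh, yh

def heading_deg(xm, ym):
    """Azimuth from the horizontal components, the piecewise rule of eq. (16)."""
    if xm == 0.0:
        return 90.0 if ym > 0 else 270.0
    a = math.degrees(math.atan(ym / xm))
    if xm < 0:
        return 180.0 - a
    return -a if ym <= 0 else 360.0 - a
```

For a level compass, heading_deg is applied to (X_M, Y_M) directly; otherwise to the tilt-compensated components (X_H, Y_H).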
The error in the figure is obtained by comparing the heading obtained with the smartphone compass carried by a walking user to the true geographical direction computed using the known path and a floor plan. Therefore the compass heading in indoor environments is too poor to be used without augmentation, but as the magnetic heading is an absolute measure it is an effective measurement when integrated with e.g. a gyro.

Barometer

The self-contained sensors presented above are poor at estimating the user's z coordinate, namely the height. Barometers measure the air pressure, which can be converted into altitude information in indoor environments. The pressure (p) measured by the barometer is related to the height (h) as [80]

h = \frac{T_0}{T_L}\left[\left(\frac{p}{p_0}\right)^{-\frac{k R_g T_L}{g}} - 1\right]    (18)

where R_g is the universal gas constant 8.3143 (N·m)/(mol·K), p_0 is the average sea level pressure 101.325 kPa, T_L is the temperature lapse rate -0.0065 K/m [48], T_0 is the temperature at sea level and g is the gravitation constant.

Cameras are not affected by the error sources deteriorating the measurements of gyroscopes, accelerometers, compasses and GNSS in indoor and urban areas. Observing the heading and translation from consecutive images also requires no a priori preparation of the environment. Cameras are also light and small in size as well as reasonably priced. Therefore vision-aiding is a feasible method for augmenting the above mentioned systems in pedestrian navigation applications and will be discussed in the following chapters.

2.4 Estimation

The Kalman filter is a set of mathematical equations used for recursively estimating the state of a process, e.g. position, velocity and attitude in the case of pedestrian navigation [51]. The Kalman filter is used for integrating the measurements obtained using the visual methods presented herein with other position measurements. These implementations are discussed in Chapter 6.
The state is estimated in such a way that the mean of the squared errors between the actual measurements and the expected measurements is minimized [109]. The recursive nature of the filter provides the means for incorporating information about the past states and using this information to predict the current or even future states. This is done by using a discrete-time stochastic system model [84]

x_k = f_{k-1}(x_{k-1}, v_{k-1})    (19)

where f_{k-1} is a known linear or nonlinear function of the state x_{k-1} and v_{k-1} represents process noise. The measurements z_k are related to the state through a measurement model h as

z_k = h_k(x_k, w_k)    (20)

where w_k is measurement noise. In the following, the fundamentals of the filter are described for the linear (Kalman filter) and nonlinear (Extended Kalman filter) cases.

2.4.1 Kalman Filter

The Kalman filter estimates the state of a linear stochastic system in which the state is corrupted by zero-mean Gaussian noise w_{k-1} and the measurements, which are linear functions of the state, are corrupted by zero-mean Gaussian noise v_k [36]. The discrete-time system model is [109]

x_k = \Phi_{k-1} x_{k-1} + w_{k-1}    (21)

where k denotes the time epoch and \Phi is called the state transition matrix, propagating the state from epoch k-1 to k. The measurement obtained is

z_k = H x_k + v_k.    (22)

Matrix H relates the state to the measurement, as was the case when resolving the user position from the pseudorange measurements explained in GNSS positioning above. The state and measurement errors have Gaussian probability distributions

p(w) \sim N(0, Q),  p(v) \sim N(0, R)    (23)

where Q is the process noise covariance and R the measurement noise covariance. The Kalman filter has a prediction stage and an update stage, where the predicted state estimate is corrected using the obtained measurement. The predicted state estimate is called the a priori estimate \hat{x}^-_k and the updated state estimate the a posteriori estimate \hat{x}_k.
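The two-stage cycle just described can be made concrete with a one-dimensional special case in which all matrices reduce to scalars; the noise variances below are hypothetical:

```python
def kalman_step(x, P, z, phi=1.0, q=0.01, h=1.0, r=0.5):
    """One scalar predict-update cycle of the Kalman filter."""
    # prediction stage: propagate the estimate and its error covariance
    x_pred = phi * x
    P_pred = phi * P * phi + q
    # update stage: gain, innovation weighting, covariance reduction
    K = P_pred * h / (h * P_pred * h + r)
    x_new = x_pred + K * (z - h * x_pred)
    P_new = (1.0 - K * h) * P_pred
    return x_new, P_new

# feeding a constant measurement drives the estimate toward it
x, P = 0.0, 1.0
for _ in range(50):
    x, P = kalman_step(x, P, 1.0)
```

With a small measurement variance r the gain grows and the innovation is weighted heavily, whereas a small predicted covariance makes the filter trust the a priori estimate, mirroring the gain behaviour described in this section.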
The measurement z_k is used to update the state as

\hat{x}_k = \hat{x}^-_k + K_k(z_k - H\hat{x}^-_k).    (24)

The factor (z_k - H\hat{x}^-_k) is called the measurement innovation, and it expresses the difference between the predicted measurement H\hat{x}^-_k and the observed measurement z_k. Matrix K_k is called the Kalman gain and is computed as

K_k = P^-_k H^T (H P^-_k H^T + R)^{-1}.    (25)

The errors of the a priori and a posteriori state estimates are defined as e^-_k = x_k - \hat{x}^-_k and e_k = x_k - \hat{x}_k, respectively. The matrix P^-_k represents the covariance of the a priori state estimate error. The objective of the Kalman gain is to minimize the a posteriori state estimate error covariance. Equation (25) shows that when the measurement error covariance R is small, and the measurements therefore reliable, the gain is large and the innovation is weighted heavily, whereas when the state estimate error covariance P^-_k is small, the gain is also small and the a priori state estimate is trusted more. The Kalman filter is initialized by setting values for the initial state x_0 and the initial state error covariance P_0. The measurement covariance matrix R is usually set a priori and kept constant, as are the matrix H, the process noise covariance matrix Q and the state transition matrix. The algorithm then recursively predicts the state as

\hat{x}^-_k = \Phi_{k-1} \hat{x}_{k-1}
P^-_k = \Phi_{k-1} P_{k-1} \Phi^T_{k-1} + Q    (26)

and updates the state estimate and the state error covariance when the measurement is obtained, incorporating the new Kalman gain computed using (25), as

\hat{x}_k = \hat{x}^-_k + K_k(z_k - H\hat{x}^-_k)
P_k = (I - K_k H) P^-_k.    (27)

2.4.2 Extended Kalman Filter

The Kalman filter is a comprehensive method for estimating the state when both the state model f and the measurement model h are linear. However, this is not true for all applications, positioning included, and therefore other means, like an extension of the algorithm called the Extended Kalman filter (EKF), should be used [65]. In the EKF the state (19) and measurement (20) models are

x_k = f_k(x_{k-1}) + w_{k-1}
z_k = h_k(x_k) + v_k.
(28)

In the case of the Kalman filter the state estimates are updated directly using the measurements, whereas in the EKF the nominal value of the state x_k is updated with the perturbations \delta x_k as

x_k = x_k + \delta x_k.    (29)

The EKF linearizes the measurement matrix H around the mean of the prior state, approximating the non-linearity by a Taylor expansion. Therefore the EKF is a Best Linear Unbiased Estimator (BLUE) minimizing the expectation E(||x_k - \hat{x}_k||^2) [65]. The integration processes discussed in the thesis use different variations of the Kalman and Extended Kalman filters. These were chosen as they are the prevailing means for observation integration and estimation in the navigation field. The performance of the EKF is however poor when the state and measurement models are highly non-linear, and in such cases other estimators, e.g. the Unscented Kalman Filter (UKF) [27] or the Particle Filter (PF) [50], might be an alternative and should be a topic for further research.

3. COMPUTER VISION METHODS FOR NAVIGATION

This chapter introduces the basics of computer vision. Herein, the real-life entities seen in the field-of-view of the camera are called objects and their two-dimensional images are called features. Humans inherently possess a good quality "stereo camera", namely the eyes, and human visual perception is capable of filling in missing information. Therefore it is easy for a human to understand perspectives and to evaluate distances and occluded parts of the objects in the scene. In computer vision, objects in the scene are seen as sets of points of digitized brightness value functions. The form of these features, i.e. the point sets, changes with the pose of the camera and the illumination of the environment. Therefore care has to be taken when the features are extracted and matched between images. Deducing motion information from images is also challenging.
The methods used widely in vision-aided navigation research, including the approaches presented in this thesis, are explained below.

3.1 Camera, Fundamental and Essential Matrices and Coordinate Frames

The principles explained in this section are mainly derived from [38]. The objects in the 3D world are mapped into 2D image features using projective transformations. These projections do not preserve the properties of shape, length, angle, distance or ratio of distances, but they do preserve the property of straightness. As a result of a projective transformation, lines that are parallel in the scene seem to intersect in the image at a point called the vanishing point. Therefore, to obtain a projective geometry space, the Euclidean geometry has to be augmented with a point and a line at infinity. Also, the two coordinates (x, y) presenting a point in Euclidean space are replaced in projective space with a triplet (x, y, 1) called homogeneous coordinates. An object point having coordinates X_N = (X, Y, Z) in the world (navigation) frame is transformed into the camera frame X_C using the rotation R of the camera frame with respect to the world frame and the translation t of the camera origin with respect to the world frame origin as

X_C = R X_N + t.    (30)

The methods presented in this thesis assume a pinhole camera model, in which an object point in the camera frame expressed in homogeneous coordinates X_C = (X, Y, Z, 1) is mapped to the point x = (fX, fY, Z) in the image plane. f is called the focal length, and the line perpendicular to the image plane going through the camera centre, called the principal axis, meets the plane at distance f in a point called the principal point. The world, camera and image frames as well as the principal point and focal length are shown in Figure 3.1.
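The transformation of eq. (30) followed by the pinhole mapping can be sketched in a few lines; the focal length and the object point below are invented values:

```python
def project_pinhole(R, t, X, f):
    """Rotate and translate the world point into the camera frame (eq. 30),
    then apply the pinhole mapping x = (f*X/Z, f*Y/Z)."""
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return f * Xc[0] / Xc[2], f * Xc[1] / Xc[2]

# identity rotation, no translation: a point 2 m in front of the camera
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x, y = project_pinhole(I3, [0.0, 0.0, 0.0], [0.1, 0.2, 2.0], f=800.0)
```

Note the division by the depth Z: it is this division that discards scale and makes parallel scene lines converge toward vanishing points in the image.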
An important note is that, as opposed to the established coordinate frame conventions in navigation research presented in the previous chapter, in computer vision the y-axis points up and the z-axis along the camera's principal axis. This is an issue that has to be accommodated while forming the navigation solution. The mapping of the world point X to the homogeneous image point x = (x, y, 1), where x = fX/Z and y = fY/Z, is characterized by a 3x4 camera matrix P as x = PX. When the calibration matrix K is known, the world point X is mapped into the image point x using the camera matrix P = K[R|t], where [R|t] denotes a 3x4 matrix composed of the 3x3 rotation matrix R and the 3x1 translation vector t. When the object points all lie on a plane, point correspondences x_i, x'_i in two images are related by a homography expressed using a 3x3 matrix H as Hx_i = x'_i. The point vectors have three entries but are defined only up to scale, and therefore four point correspondences (each having two coordinates) are needed to resolve the ambiguous values in H. If three of these point correspondences are collinear, the homography is said to be degenerate and does not have a unique solution. Degeneracy problems are addressed in more detail in Chapter 5. Image points must be normalized for the solutions of the homography to be correct. The image points x are normalized using the camera calibration matrix K, discussed in detail below, as x̂ = K^{-1} x. When the object points do not lie on a plane but are tracked from a real 3D scene, a Fundamental matrix F has to be computed. The Fundamental matrix encompasses the intrinsic projective geometry between two views, meaning that only the rotation and translation between the two camera centres (also of one camera between two images) and the internal camera parameters, represented by the calibration matrix K, affect the matrix.

Fig. 3.1. Camera, image and world coordinate frames.
In other words, the Fundamental matrix F represents the epipolar geometry between the two views, visualized in Figure 3.2. The figure shows the images x, x' of an object point X; the epipoles are the intersections of the baseline between the optical centres of the cameras with the corresponding image planes. If only the position of the first image point x is known, the epipolar geometry restricts the location of the second image point x' to lie on the epipolar line, which is the line joining the image point and the epipole; an epipolar plane is configured by the baseline and the epipolar lines. The Fundamental matrix F is a 3x3 matrix and is defined for all corresponding points x, x' in two images as

x'^T F x = 0.    (31)

At least seven corresponding points have to be matched in the two images to compute the Fundamental matrix.

Fig. 3.2. The epipolar geometry between two images including the epipolar plane, epipoles (e, e') and epipolar lines (l, l') [31].

For a general motion the rotation R, translation t and the cameras' internal parameters K, K', encompassed in the Fundamental matrix, relate the image points in the first and second images, x and x' respectively, as

x' = K' R K^{-1} x + K' t / Z    (32)

where Z is the z-coordinate, i.e. the depth, of the object point. When the image points are normalized as explained above, the Essential matrix may be used instead of the Fundamental matrix, as x̂'^T E x̂ = 0, where E = K'^T F K.

3.2 Feature Extraction

First, features have to be extracted for solving the motion of the camera between consecutive images. SIFT features, explained below, are good features to match when the environment contains many distinguishable objects. However, when the environment is poor in features, such as an office corridor, features arising from the structures, like corners and lines, are more robust.
Below, the procedure called filtering is first explained, because of its use in noise reduction from images as well as in edge detection. Then the extraction of the two types of features used herein, SIFT features and lines, is explained.

3.2.1 Filtering

Filtering is used for finding patterns in and reducing noise from images. Filtering replaces the value of an individual pixel (x, y) with a weighted sum of its neighbours. Different weights correspond to different processes [31]. The pattern of weights is called the kernel of the filter. The process of employing a filter is usually called convolution and is defined as

C_{ij} = \sum_{x,y} G_{i-x,j-y} I_{x,y}    (33)

where C_{ij} denotes the (i, j) component of the convolution result, I is the image and G_{i-x,j-y} is the kernel of the convolution. A symmetric Gaussian kernel has the form of the probability density of a 2D Gaussian random variable and is a good kernel for noise-reducing convolution. Using a large standard deviation \sigma in the convolution emphasizes the weight of the neighbouring pixels and reduces the noise heavily, though causing some blurring. The Gaussian kernel is

G_\sigma(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right).    (34)

3.2.2 SIFT Features

Scale Invariant Feature Transform (SIFT) [70] is an approach based on transforming an image into local feature vectors, SIFT descriptors, describing the intensities around image points that are found as maxima or minima of a difference-of-Gaussian function. Each vector is invariant to image translation, scaling and rotation, and partially invariant to illumination changes and affine or 3D projections. The process of using SIFT features is divided into two parts: keypoint localization and computation of a SIFT descriptor. First, the keypoints used for computing the SIFT descriptors are localized as follows.
The minima and maxima of a difference-of-Gaussian function are computed in SIFT by building an image pyramid and resampling the data at each level. The 1D Gaussian kernel used is

g(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-x^2/(2\sigma^2)}    (35)

with \sigma = \sqrt{2} [70]. The image is convolved twice using the mentioned sigma, resulting in an image with smoothing of \sigma = \sqrt{2} after the first convolution, and an image with an effective smoothing of \sigma = 2 after the two convolutions with the same sigma. The difference of Gaussian is obtained by subtracting the two images from each other. Then, the image resulting from the second convolution is resampled using bilinear interpolation with a pixel spacing of 1.5, giving an image in which each pixel is a constant linear combination of four adjacent pixels. The maxima and minima are found by comparing each pixel to its neighbours. Secondly, the image resulting from the first convolution (I^1) at each level is processed to obtain the image gradient magnitudes M_{ij} and orientations O_{ij} as

M_{ij} = \sqrt{(I^1_{ij} - I^1_{i+1,j})^2 + (I^1_{ij} - I^1_{i,j+1})^2}
O_{ij} = \arctan2(I^1_{ij} - I^1_{i+1,j},\; I^1_{i,j+1} - I^1_{ij}).    (36)

The gradient magnitudes M_{ij} are thresholded at 0.1 times the maximum possible gradient value to provide robustness to changes in illumination; the effect of illumination on the orientations O_{ij} is much lower. Orientation invariance is obtained by convolving the image using a Gaussian kernel with a large \sigma-value and by multiplying the weights with the corresponding gradient values. A histogram with 10-degree intervals is built from the convolution results, and the dominant orientation of the feature is the peak of the histogram. As a result, the features in the image are represented with descriptors having a stable location, scale and orientation, also invariant to changes in illumination between consecutive images. Feature detection is an active research area in computer vision.
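The pixel-difference gradients of eq. (36) can be sketched directly; the small ramp image below is an invented example:

```python
import math

def gradients(img):
    """Gradient magnitude and orientation of eq. (36) from pixel differences."""
    h, w = len(img), len(img[0])
    M = [[0.0] * w for _ in range(h)]
    O = [[0.0] * w for _ in range(h)]
    for i in range(h - 1):
        for j in range(w - 1):
            M[i][j] = math.hypot(img[i][j] - img[i + 1][j],
                                 img[i][j] - img[i][j + 1])
            O[i][j] = math.atan2(img[i][j] - img[i + 1][j],
                                 img[i][j + 1] - img[i][j])
    return M, O

# an intensity ramp increasing along j: unit gradient, zero orientation
ramp = [[float(j) for j in range(4)] for _ in range(4)]
M, O = gradients(ramp)
```

In a full SIFT implementation these magnitudes and orientations would then be accumulated into the 10-degree orientation histogram described above.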
Although SIFT is a comprehensive method, faster algorithms have also been developed [12], [88], [87], and their suitability especially for smartphone applications should be assessed.

3.2.3 Line Extraction

Indoor and urban environments are constructed in a way that their structures constitute a three-dimensional grid defining an orthogonal coordinate frame, also called a Manhattan grid [25], containing straight parallel lines; therefore methods based on line features are suitable for these environments that are otherwise poor in features [55]. Lines are also good features as the basis of visual positioning because they are invariant to changes in lighting, which is crucial especially for indoor positioning, but also because straight lines remain straight in projective transformations and are not disturbed by dynamic objects, as long as these do not block the view to all lines in the scene. Line extraction begins by identifying the edges of all features in an image and then separating the straight lines from the other features. These steps are explained below.

Edge Detection

Fast changes of brightness in an image indicate edges of objects. The brightness of a pixel in an image depends on the characteristics of the light sources as well as the traits and orientation of the surface. The orientation of the surface is specified by surface gradients. The Canny edge detector [17] calculates the magnitudes and directions of these gradients. It is an optimal algorithm for edge detection, requiring a low error rate of the calculations, well-localized edge points, meaning that the distance between the calculated location of the edge and the real one has to be minimal, and a single response per edge. In a two-dimensional image an edge has a position and an orientation. The direction of the tangent to the edge contour is called the edge direction.
The edge is found by convolving the image with a first derivative G_n, in direction n, of a two-dimensional Gaussian G used as a kernel, defined as

G_n = \frac{\partial G}{\partial n} = n \cdot \nabla G    (37)

where

G = \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right).    (38)

Now an edge point is a local maximum when the image I is convolved using G_n as a kernel. The local maximum is found at an image pixel fulfilling

\frac{\partial^2}{\partial n^2}(G * I) = 0    (39)

and the edge strength is calculated as

|G_n * I| = |\nabla(G * I)|.    (40)

A pixel having the magnitude of a local maximum along the gradient direction belongs to the edge, and the process of finding the maxima using (39) is called non-maximum suppression. The set of possible edge pixels found by looking for local maxima contains too many members. The pixels having a weak response to an edge have to be excluded using a procedure called hysteresis. Hysteresis evaluates each pixel in the possible edge set using two thresholds. All pixels having an edge strength (40) above the upper threshold are classified as part of an edge and all below the lower threshold as not belonging to an edge. Pixels between the two thresholds are evaluated based on their neighbours: if a neighbour belongs to an edge, then the pixel belongs as well, otherwise not. After all pixels in the possible pixel set are deemed to belong to an edge or not, the optimal edge set is defined. Canny edge detection is one of the most used edge detectors, but many others also exist, for example the Sobel, Laplace and Prewitt operators [98].

Separating the Lines from Other Edges

Canny edge detection finds all changes of brightness in an image demonstrating an edge of an object. For most computer vision applications there is still a need to find certain shapes among all the edges. Hough [46] developed a method for identifying lines among all image pixels. His method maps all image points into a two-dimensional parameter space, the parameters being the slope and the intercept of the line.
Each point is then examined and given a vote for all lines possibly travelling through it. Since both the slope and the intercept are unbounded, a modified form of the Hough transform was developed and has been widely exploited in computer vision research [28]. When extended, it is suitable for finding curves other than lines as well. The method is based on the parameter space (\rho, \theta), where \rho is the length of the normal drawn from the origin to the line being detected and \theta is the angle of this normal with the x-axis, as shown in Fig. 3.3. A straight line including the pixel (x, y) is then defined as a sinusoid

\rho = x\cos(\theta) + y\sin(\theta).    (41)

When the possible values of \theta are restricted to the interval [0, \pi], every line in the image plane corresponds to a unique point in the plane defined by the parameter space. Now the curves going through a common point in the parameter plane correspond to image points on a specific straight line. Therefore lines are identified by looking for the points in the (\rho, \theta) parameter space having local maxima of votes.

Fig. 3.3. Formulation of parameters \rho and \theta in the Hough transform.

The weakness of the otherwise sophisticated algorithm is that it is computationally heavy. As the real-time processing of algorithms is crucial in pedestrian navigation, a more efficient line detection algorithm based on the probabilistic Hough transform was developed in this thesis and will be presented in Chapter 6.

3.3 Image Matching

Matching is the process of identifying corresponding features in two images of the same scene taken from different viewpoints, at different times, or by different sensors (cameras). As the SIFT descriptors are invariant to rotation and translation, their matching reduces to finding the most similar descriptors in the two images, i.e. the descriptors with the minimum Euclidean distance [71].
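Minimum-Euclidean-distance matching of descriptors can be sketched as follows; the two-dimensional "descriptors" are invented toy data (real SIFT descriptors are much longer vectors):

```python
import math

def match_descriptors(desc_a, desc_b):
    """For each descriptor of the first image, return the index of the
    minimum-Euclidean-distance descriptor in the second image."""
    matches = []
    for i, a in enumerate(desc_a):
        dists = [math.dist(a, b) for b in desc_b]
        j = min(range(len(desc_b)), key=dists.__getitem__)
        matches.append((i, j, dists[j]))
    return matches

# toy 2D descriptors; each vector in A has an obvious counterpart in B
A = [[0.0, 0.0], [1.0, 1.0]]
B = [[1.1, 1.0], [0.1, 0.0]]
```

This exhaustive nearest-neighbour search is quadratic in the number of descriptors; practical systems speed it up with approximate search structures.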
SIFT functions well for matching in environments full of features, such as outdoors, but suffers from errors if the images consist mostly of vegetation or dynamic objects [102]. When more robust matching of noisy image data is needed, RANdom SAmple Consensus (RANSAC) [29] is used. The RANSAC algorithm enlarges the minimum set of points needed for the solution with the points lying within some error tolerance. This is done by searching for a random sample of points that leads to a fit of the model in question with which many of the points agree. This leaves the outliers out of the data used in the calculations. The algorithm is used widely in computer vision applications such as vanishing point detection. A comprehensive explanation of the algorithm with examples is given in [31]. RANSAC is however computationally quite heavy and therefore not used in the methods discussed in this thesis, which emphasize computational efficiency.

3.4 Camera Calibration

As explained earlier, the operations performed for mapping objects into images are described using projective geometry. However, navigation solutions need the information to be presented as a Euclidean reconstruction, i.e. with correct distances and angles. This may be achieved by using a calibrated camera for capturing the images. Calibration provides information about the camera's intrinsic parameters and is represented using a calibration matrix K. The camera intrinsic parameters are the focal length (f_x, f_y), the principal point (u, v), the skew coefficient S, the aspect ratio and the distortions. The focal length is defined as the distance between the centre of the camera's lens and the sensor when taking a focused image of an object that is infinitely far away. The principal point is the intersection point of the camera's optical axis with the image plane, as was shown in Figure 3.1. Distortions blur the image due to the fact that the focal length varies at different points of the lens.
The skew comes from manufacturing errors and makes the two image axes non-orthogonal; the skew coefficient defines the angle between these axes. The general form of the calibration matrix is

K_g = \begin{pmatrix} f_x & S & u \\ 0 & f_y & v \\ 0 & 0 & 1 \end{pmatrix} (42)

and the aspect ratio may be computed as f_y/f_x. The skew in a normal camera is usually zero, except when taking an image of an image, for example when enlarging a negative [38]. A reduced form of the camera matrix K, with zero skew, is normally used for computer vision applications and is

K = \begin{pmatrix} f_x & 0 & u \\ 0 & f_y & v \\ 0 & 0 & 1 \end{pmatrix}. (43)

It is adequate for a pedestrian navigation system to calibrate the camera once and assume the parameters unchanged thereafter. The calibration may be done by photographing a certain model image from different viewpoints and then calculating the parameters using the images and the known geometry of the model. In the methods described in the thesis, calibration is implemented using a Matlab application [15]. A camera may also be calibrated with a single image using vanishing point information and the assumption of zero skew. If the positions of three vanishing points can be recovered, the focal length and the centre of projection (the principal point) may be determined [60]. When the accuracy of the navigation solution may be compromised for the sake of adaptability, the focal length may be taken from the image's Exchangeable Image File (EXIF) data and the principal point assumed to be the central point of the image; however, the EXIF value is an average over the cameras of the type in question and therefore not as accurate as the focal length obtained by calibrating the particular camera used.

3.4.1 Distortion

The best accuracy for vision-aided calculations is obtained when a camera with a wide-angle lens offering an extended field-of-view is used, as will be shown in Chapter 4. However, the wide-angle lens results in radial distortion in the images. If the distortion is not corrected, the calculation accuracy suffers.
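A minimal sketch of how the zero-skew matrix of Eq. (43) is formed and used to project a camera-frame point to pixels; the intrinsic values here are purely illustrative, not calibration results from the thesis:

```python
import numpy as np

def calibration_matrix(fx, fy, u, v, skew=0.0):
    """Build the intrinsic calibration matrix K; skew is normally zero (Eq. 43)."""
    return np.array([[fx, skew, u],
                     [0.0, fy,  v],
                     [0.0, 0.0, 1.0]])

def project(K, point_cam):
    """Map a 3-D point in the camera frame to pixel coordinates
    (homogeneous projection followed by normalization)."""
    p = K @ np.asarray(point_cam, dtype=float)
    return p[:2] / p[2]

K = calibration_matrix(fx=800.0, fy=820.0, u=640.0, v=480.0)
aspect_ratio = K[1, 1] / K[0, 0]          # fy / fx
px = project(K, [0.1, -0.2, 2.0])         # a point 2 m in front of the camera
```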
According to [38], rectification of the whole image introduces aliasing effects complicating the feature detection. For an optimal result, when a wide-angle lens camera is used the radial distortion is corrected only for the features extracted from the images, with a model presented in [72] and explained below. The radial distance (r_d) of the normalized distorted image points (x_d, y_d) [72] from the radial distortion centre, which is in most cases the principal point (u, v), is

r_d = \sqrt{x_d^2 + y_d^2}. (44)

Using the radial distance of the distorted image points, the radial distance (r) of the corrected image points (x_u, y_u) is obtained as

r = r_d (1 - k_1 r_d^2 - k_2 r_d^4). (45)

The constants k_i are the distortion values specific to the camera and are obtained from calibration. The corrected and distorted image points are related as

x_d = x_u (1 + k_1 r^2 + k_2 r^4)
y_d = y_u (1 + k_1 r^2 + k_2 r^4). (46)

The effect of the distortion correction is shown in the case of the feasibility of the visual gyroscope in Chapter 4.

4. VISUAL GYROSCOPE

Urban scenes in indoor and downtown environments consist mainly of straight lines in three orthogonal directions [25]. The projective transformations mapping the three-dimensional scene into a two-dimensional image preserve the straight lines, but not the angles, and therefore lines parallel in the scene appear to intersect in the image. The lines in the three orthogonal directions form three intersection points called vanishing points. The vanishing points arising from lines in the x- and y-axis directions are called the horizontal and vertical vanishing points, respectively, and the one from the lines in the direction of propagation (z-axis) the central vanishing point. A vanishing point is the intersection v with the image plane of a ray through the camera centre having a direction d, and of all other lines also having direction d.
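The per-feature correction of Eqs. (44)–(45) can be sketched as below; the distortion constants `K1`, `K2` are illustrative placeholders for camera-specific values obtained from calibration:

```python
import numpy as np

# assumed camera-specific distortion constants from calibration (illustrative values)
K1, K2 = 0.12, 0.01

def undistort_point(xd, yd, k1=K1, k2=K2):
    """Correct one normalized feature point for radial distortion using
    r = r_d (1 - k1 r_d^2 - k2 r_d^4) from Eq. (45); the corrected point
    lies on the same ray from the distortion centre."""
    rd = np.hypot(xd, yd)              # radial distance of the distorted point, Eq. (44)
    if rd == 0.0:
        return xd, yd                  # the centre itself is undistorted
    r = rd * (1.0 - k1 * rd**2 - k2 * rd**4)
    scale = r / rd
    return xd * scale, yd * scale

xu, yu = undistort_point(0.30, 0.40)   # rd = 0.5 for this point
```

Correcting only the extracted features, rather than rectifying the whole image, avoids the aliasing effects mentioned above.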
The vanishing point v is related to the direction d as v = Kd [38], where K is the camera calibration matrix encompassing the intrinsic parameters of the camera. The directions d and d′ of two vanishing points in consecutive images are related by the rotation matrix R as d′ = Rd, i.e. the rotation of the camera between the two images. A change of position of the camera between the images, meaning pure translation with no rotation, has no effect on the vanishing point location. The rotation R of the camera may also be thought of as the rotation from the initial position where the camera is aligned with the navigation frame so that the z-axis of the camera points in the direction of propagation and the x- and y-axes are orthogonal to the z-axis, as shown in Figure 3.1, i.e. the orientation of the camera with respect to the navigation frame. In this initial configuration, which requires a careful alignment of the camera so that its optical axis coincides with the structure of the environment, the central vanishing point v_z lies at the principal point and the other two vanishing points at infinity on the x and y image axes. Then the orientation of the camera R is described with V = KR, where V = [v_x v_y v_z] is the vanishing point location matrix incorporating the horizontal, vertical and central vanishing points, with v_i = (x_{v_i}, y_{v_i}, 1)^T, K is the calibration matrix containing the camera intrinsic parameters (defined in Chapter 3), and R is the rotation matrix of the camera [32]. As a perfect alignment of the camera optical axis with the environment is not possible in reality, the navigation solution is initialized by measuring the orientation of the user by other means, as will be discussed in Chapter 6, and then propagating the orientation using the visual gyroscope measurements from two consecutive images.
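The relations v = Kd and d′ = Rd can be sketched numerically as below; the intrinsics and the 5-degree turn are illustrative assumptions:

```python
import numpy as np

K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 480.0],
              [0.0, 0.0, 1.0]])            # assumed intrinsics, zero skew

def vanishing_point(K, d):
    """v = K d: a vanishing point depends only on the line direction d,
    not on the camera position (pure translation leaves it unchanged)."""
    v = K @ d
    return v[:2] / v[2]

def rot_y(theta):
    """Heading rotation about the camera y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

d = np.array([0.0, 0.0, 1.0])              # direction of propagation
v0 = vanishing_point(K, d)                 # aligned camera: v at the principal point
v1 = vanishing_point(K, rot_y(np.deg2rad(5.0)) @ d)   # d' = R d after a 5 deg turn
```

For the aligned camera the central vanishing point sits at the principal point (u, v); after the turn its x-coordinate shifts by f_x tan(5°).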
The initialization is also crucial in order to present the orientation of the camera with respect to the scene, obtained using visual perception and explained in this chapter, as heading, pitch and roll in the navigation frame. From the change of the vanishing point locations in consecutive images the change in the camera attitude may be monitored and the camera used as a visual gyroscope. It should be noted that the methods presented in the thesis assume that the camera does not experience a heading change, pitch or roll exceeding 90 degrees between consecutive images. When only the central vanishing point is obtained, the visual gyroscope provides the pitch and heading of the user; if either the horizontal or the vertical vanishing point is also perceived, the full three-dimensional attitude is provided. First, the method for obtaining the central and vertical vanishing points is explained. Second, the rotation matrices for the user heading and camera pitch are given, then the configurations for the full attitude. Effective error detection is crucial to avoid deteriorating the integrated navigation solution with erroneous visual measurements, and therefore the concept of Line Dilution of Precision (LDOP) has been developed and will be presented. The influence of different camera characteristics on the performance of the visual gyroscope has been studied and will be discussed. Finally, the smartphone implementation of the visual gyroscope will be presented.

4.1 Locating the Vanishing Points

In order to find the central vanishing point and, furthermore, the heading change and the pitch of the camera, the straight lines in the direction of propagation must be identified in the image. Because the images are noisy, especially the ones taken with a smartphone camera and in an indoor environment, the images must be preprocessed.
The images are smoothed using a Gaussian filter, which replaces the image pixels with a weighted sum of their neighbouring pixel values and therefore reduces the noise [31]. All edges in the images are identified with a Canny edge detector, and the straight lines are separated from the set of all edges with the Hough Lines algorithm as explained in Chapter 3. All lines found are classified based on their orientation with respect to the camera frame as going in the direction of the z-axis, totally horizontal or totally vertical, and horizontal or vertical. Totally horizontal (and vertical) lines have an angle of zero degrees with respect to the x-axis (y-axis), i.e. the slope of the line is zero (infinite), whereas the ones classified as horizontal or vertical have a slope below a threshold with respect to the corresponding axis. The central vanishing point is found using a voting scheme: each intersection of all line pairs, i.e. each vanishing point candidate, is voted for by all the lines found, and the point that gets the most votes, in this case the intersection point of most of the lines, is selected as the correct one. However, this configuration assumes that the main proportion of the lines in the direction of propagation are parallel with the construction, i.e. that a specific non-parallel pattern of floor or wall decoration does not dominate. The classification of the lines and the central vanishing point found with this method are shown in Figure 4.1. A misclassification may be seen at the right edge of the figure, where blue, turquoise and green lines all represent lines going in the direction of propagation, but because of their slopes three are classified as horizontal and one as a totally horizontal line. Only a two-degrees-of-freedom attitude may be obtained from one vanishing point location, and in order to also resolve the roll at least one other vanishing point in addition to the central one has to be located.
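The voting scheme can be sketched as below; lines are represented homogeneously, and the pixel tolerance within which a line is counted as voting for a candidate is an assumption for illustration:

```python
import numpy as np

def line_h(p, q):
    """Homogeneous line through image points p and q."""
    return np.cross([*p, 1.0], [*q, 1.0])

def point_line_dist(pt, l):
    """Perpendicular distance from a point to a homogeneous line."""
    return abs(l[0] * pt[0] + l[1] * pt[1] + l[2]) / np.hypot(l[0], l[1])

def central_vanishing_point(lines, tol=2.0):
    """Voting scheme: every pairwise intersection is a candidate, every
    line passing within tol pixels of it votes for it, and the candidate
    with the most votes is taken as the vanishing point."""
    best, best_votes = None, -1
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            c = np.cross(lines[i], lines[j])
            if abs(c[2]) < 1e-9:
                continue                      # parallel in the image: no candidate
            cand = c[:2] / c[2]
            votes = sum(point_line_dist(cand, l) < tol for l in lines)
            if votes > best_votes:
                best, best_votes = cand, votes
    return best, best_votes

# three corridor lines meeting at (320, 240) and one spurious line
lines = [line_h((0, 0), (320, 240)),
         line_h((640, 0), (320, 240)),
         line_h((0, 240), (320, 240)),
         line_h((0, 100), (640, 110))]
vp, votes = central_vanishing_point(lines)
```

The candidate supported by the three concurrent lines wins; the spurious line contributes intersections but never gathers more votes.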
Experiments show that horizontal lines are infrequent in urban and indoor environments, and therefore in this thesis the vertical vanishing points are tracked. When the camera experiences only a small roll (as in Figure 4.1), the lines in vertical directions are mainly totally vertical due to the relatively low resolution of the images: the small slope deviations produced by a small roll are overwhelmed by the noise in the images. When the camera has a larger roll, most of the vertical lines have slopes deviating from infinity, as is the case in Figure 4.2. The ratio of the number of vertical lines with finite slopes to that of the lines with infinite slopes is calculated. If the ratio exceeds a threshold, the camera is experiencing roll, and the location of the vertical vanishing point is calculated similarly to the central one and incorporated into the rotation calculations. The locations of the central vanishing points in Figures 4.1 and 4.2 are shown using a red circle. The vertical vanishing points lie outside the image and therefore are not shown.

Fig. 4.1. Lines in an image with no (or minor) roll are classified based on their slope as totally vertical or horizontal (green), vertical (white dotted), horizontal (turquoise) and along the direction of propagation (blue). The red dot is the central vanishing point.

4.2 Attitude of the Camera

As discussed above, the heading, pitch and roll of the camera are observed with respect to the scene, namely with respect to the initial orientation where the camera x-, y- and z-axes are aligned with the axes of the scene. In order to obtain a navigation solution these have to be related to the navigation frame, as will be shown later. When the camera is rotated from the initial position counterclockwise with heading θ degrees, the rotation matrix has the form

R = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix} (47)

and when pitched towards the floor plane by φ degrees,
Fig. 4.2. When the camera experiences roll, the number of totally vertical lines decreases. Lines are classified based on their slope as totally vertical or horizontal (green), vertical (white dotted), horizontal (turquoise) and along the direction of propagation (blue). The red dot is the central vanishing point.

R = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{pmatrix}. (48)

When the camera experiences these two rotations, the matrix R becomes

R = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ \sin\phi \sin\theta & \cos\phi & -\sin\phi \cos\theta \\ -\cos\phi \sin\theta & \sin\phi & \cos\phi \cos\theta \end{pmatrix}. (49)

When the calibration and rotation matrices are as explained above, the heading (θ) and pitch (φ) angles may be obtained from the location of the central vanishing point as

V = KR = \begin{pmatrix} f_x & 0 & u \\ 0 & f_y & v \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ \sin\phi \sin\theta & \cos\phi & -\sin\phi \cos\theta \\ -\cos\phi \sin\theta & \sin\phi & \cos\phi \cos\theta \end{pmatrix} (50)

resulting in

V = \begin{pmatrix} f_x \cos\theta - u \cos\phi \sin\theta & u \sin\phi & f_x \sin\theta + u \cos\phi \cos\theta \\ f_y \sin\phi \sin\theta - v \cos\phi \sin\theta & f_y \cos\phi + v \sin\phi & -f_y \sin\phi \cos\theta + v \cos\phi \cos\theta \\ -\cos\phi \sin\theta & \sin\phi & \cos\phi \cos\theta \end{pmatrix} (51)

and thus

v_z = \begin{pmatrix} f_x \sin\theta + u \cos\phi \cos\theta \\ -f_y \sin\phi \cos\theta + v \cos\phi \cos\theta \\ \cos\phi \cos\theta \end{pmatrix}. (52)

The vanishing point v_z is presented in homogeneous coordinates as (x, y, 1), where x, y are the pixel coordinates of the central vanishing point obtained using the voting scheme explained above. When (52) is scaled so that its third row equals 1, the heading (θ) and pitch (φ) may be computed as

θ = \arcsin\left(\frac{x - u}{f_x}\right), \quad φ = \arcsin\left(\frac{y - v}{-f_y \cos(\theta)}\right). (53)

An important note is that the heading and pitch obtained by tracking the vanishing points are reversed with respect to the definition of the navigation frame, which has to be accommodated in the integration. To be exact, when the camera rotates clockwise, i.e. its heading increases in the navigation frame, the vanishing point location moves counterclockwise and its x-coordinate decreases; therefore the obtained visual heading θ has the opposite sign compared to the heading in the navigation frame.
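Equations (52) and (53) can be checked with a small round-trip sketch; the intrinsics and angles below are illustrative, and since (53) inverts (52) only to first order in the angles, the recovery agrees with the true angles to within a few millidegrees for small rotations:

```python
import numpy as np

def central_vp(theta, phi, fx, fy, u, v):
    """Forward model (Eq. 52), normalized so the third coordinate is 1."""
    vx = fx * np.sin(theta) + u * np.cos(phi) * np.cos(theta)
    vy = -fy * np.sin(phi) * np.cos(theta) + v * np.cos(phi) * np.cos(theta)
    w = np.cos(phi) * np.cos(theta)
    return vx / w, vy / w

def heading_pitch_from_vp(x, y, fx, fy, u, v):
    """Recover heading and pitch from the central vanishing point (Eq. 53)."""
    theta = np.arcsin((x - u) / fx)
    phi = np.arcsin((y - v) / (-fy * np.cos(theta)))
    return theta, phi

fx = fy = 800.0
u, v = 640.0, 480.0
x, y = central_vp(np.deg2rad(3.0), np.deg2rad(-2.0), fx, fy, u, v)
theta, phi = heading_pitch_from_vp(x, y, fx, fy, u, v)
```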
Similarly, in the navigation frame the pitch increases upwards as φ decreases. Tracking the central vanishing point provides information about the user's heading change and the camera pitch. When the roll β is also required, at least two vanishing points are needed, the other one in this thesis being the vertical vanishing point, as explained above. The rotation matrix R of a camera experiencing only roll has the form

R = \begin{pmatrix} \cos\beta & -\sin\beta & 0 \\ \sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{pmatrix}. (54)

The full rotation of a camera simultaneously experiencing changes in heading, pitch and roll is

R = \begin{pmatrix} \cos\beta \cos\theta - \sin\beta \sin\phi \sin\theta & -\sin\beta \cos\phi & \cos\beta \sin\theta + \sin\beta \sin\phi \cos\theta \\ \sin\beta \cos\theta + \cos\beta \sin\phi \sin\theta & \cos\beta \cos\phi & \sin\beta \sin\theta - \cos\beta \sin\phi \cos\theta \\ -\cos\phi \sin\theta & \sin\phi & \cos\phi \cos\theta \end{pmatrix} (55)

and all three angles may be resolved using the two vanishing point locations.

4.3 Error Detection

In the case where the location of the vanishing point is known a priori to some extent, the intersection points deviating remarkably from the estimate may be discarded and an accurate orientation measurement obtained [82]. The method, used for UAVs (unmanned aerial vehicles), determines the possible vanishing point locations based on the known potential attitude of the camera. Error detection based on an estimated vanishing point location is suitable for robot and vehicle navigation, where the motion is to some extent foreseeable and stable. However, no limitations may be imposed on the possible vanishing point locations in pedestrian navigation, especially if the vision-aiding is done using a smartphone camera, because the motion of a pedestrian is much more unpredictable. Therefore a method evaluating the vanishing point accuracy based on the geometry of the lines used to compute it is developed in this thesis.
The concept of Dilution of Precision (DOP), originally specifying the geometry of the satellites used for obtaining a position solution with GNSS [54], was introduced into vision-based navigation by [69]. Their DOP value described the orientation and position of pseudo ground control points (PGCP) needed in navigation using a camera and 3D maps constructed of the environment. Here the concept of an LDOP, a dilution of precision value describing the geometry of the lines used for calculating the position of the vanishing point, is developed. The method is based on dividing the scene in an image into four quarters around the estimated vanishing point. If lines intersecting at the vanishing point are found in all four sections, the estimated vanishing point is correct with high probability, and it is given the minimum LDOP value, namely √2. The situation is visualized in Figure 4.3a. The justification for the minimum LDOP value is given below, where the situation of reduced line geometry is explained. If the lines intersecting at the estimated vanishing point come from three of the sections, the line geometry is still determined sufficiently accurately and a low LDOP value is assigned to the estimated vanishing point. When the geometry of the lines is reduced, namely when lines are found in only two sections, as shown in Figures 4.3b and 4.3c, or especially in only one, as in Figure 4.3d, further evaluation of the geometry must be done. In the case where lines are found in two sections, the accuracy of the estimated vanishing point is strongly dependent on the orientation of the lines found.
If the lines are from opposite quarters, as in Figure 4.3c, namely the angle between the lines is close to or larger than 180 degrees, the intersection is in an incorrect location with a higher probability, and a higher LDOP value is assigned than in the case where the lines are from adjacent quarters, namely the angle between them is less than 180 degrees, as in Figure 4.3b. This reasoning derives from the fact that lines from opposite sections are often actually parts of the same line, split by the line detection algorithm due to changes of brightness in the image, so that in reality there is no intersection point. When the line geometry is reduced to a set of lines found in only one section, as shown in Figure 4.3d, the LDOP evaluation is based on the mutual alignment of the lines using a method proposed in [5]. The angle between each line in the set and the x-axis of the image is calculated and the pair with the largest angle between them is selected. The angle between the first line of the pair and the image x-axis (α1) and between the second line of the pair and the image x-axis (α2) is obtained using the estimated central vanishing point location (x_{vz}, y_{vz}), the starting point (x_i, y_i) of line i (i = 1, 2) and the distance (D_i) of the estimated vanishing point from the starting point of line i as

\cos(\alpha_1) = \frac{x_{vz} - x_1}{D_1}, \quad \sin(\alpha_1) = \frac{y_{vz} - y_1}{D_1}, \quad \cos(\alpha_2) = \frac{x_{vz} - x_2}{D_2}, \quad \sin(\alpha_2) = \frac{y_{vz} - y_2}{D_2}. (56)

Fig. 4.3. Four images resulting from the vanishing point calculation. The image is divided into four sections around the estimated vanishing point (section borders shown with black dotted lines) and its reliability is evaluated based on the geometry of the blue lines used for the calculations. The vanishing point is found correctly in images a, b and d (outside the image). The black continuous lines are used to calculate the vertical vanishing point.

The matrix H, characterizing the line geometry and containing the unit vectors of the lines, is

H = \begin{pmatrix} \cos(\alpha_1) & \sin(\alpha_1) \\ \cos(\alpha_2) & \sin(\alpha_2) \end{pmatrix}. (57)

The matrix G is formed from the geometry matrix H as G = H^T H, and its inverse is

G^{-1} = \frac{1}{|G|} \begin{pmatrix} \sin^2(\alpha_1) + \sin^2(\alpha_2) & -(\cos(\alpha_1)\sin(\alpha_1) + \cos(\alpha_2)\sin(\alpha_2)) \\ -(\cos(\alpha_1)\sin(\alpha_1) + \cos(\alpha_2)\sin(\alpha_2)) & \cos^2(\alpha_1) + \cos^2(\alpha_2) \end{pmatrix} (58)

where |G| is the determinant of G and equals |G| = \sin^2(\alpha_1 - \alpha_2). The GDOP in GNSS positioning applications is calculated using the diagonal values of the G^{-1} matrix, as explained in Chapter 2, and the result may be transformed for the case of two satellites [5] and further used to calculate the LDOP for the lines as

LDOP = \sqrt{\frac{1}{|G|}\left(\cos^2(\alpha_1) + \cos^2(\alpha_2) + \sin^2(\alpha_1) + \sin^2(\alpha_2)\right)}. (59)

For any two angles (α1, α2), (59) may thus be written as LDOP = \sqrt{2/|G|}. The smallest possible LDOP value is √2 and arises from the maximum angle between two lines lying in the same quarter section, namely 90 degrees. When the magnitude of the angle between the lines is more than 10 degrees, the accuracy of the estimated vanishing point location is still sufficient, but it decreases rapidly as the angle decreases. Evaluation of the accuracy of the estimated vertical vanishing point cannot be done using the line geometry, but is based on monitoring the obtained roll. A camera may be rolled over 15 degrees for the purpose of obtaining images with special viewpoints [32], but this is seldom done unintentionally and such a large roll is not convenient for vision-aided navigation. Because the calculation of an accurate vertical vanishing point is not always possible (due to noise in the images and a shortage of lines), the roll's magnitude must be monitored. If the roll's magnitude exceeds 15 degrees, the vertical vanishing point is discarded as erroneous.
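The LDOP chain of Eqs. (56)–(59) can be sketched numerically; the two angle pairs below are illustrative inputs, contrasting the best-case geometry (orthogonal lines) with a poor one (lines only 5 degrees apart):

```python
import numpy as np

def ldop(alpha1, alpha2):
    """LDOP from the two line angles: build H (Eq. 57), form G = H^T H,
    and use trace(G^-1) = 2/|G| with |G| = sin^2(alpha1 - alpha2),
    so that LDOP = sqrt(2/|G|) (Eqs. 58-59)."""
    H = np.array([[np.cos(alpha1), np.sin(alpha1)],
                  [np.cos(alpha2), np.sin(alpha2)]])
    G = H.T @ H
    Ginv = np.linalg.inv(G)
    return np.sqrt(np.trace(Ginv))

best = ldop(np.deg2rad(0.0), np.deg2rad(90.0))     # orthogonal lines: sqrt(2)
poor = ldop(np.deg2rad(40.0), np.deg2rad(45.0))    # 5 deg apart: much larger
```

The minimum value √2 appears exactly at the 90-degree separation, and the value grows as 1/|sin(α1 − α2)| when the lines become nearly parallel.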
In these situations the roll is set to zero and the heading and pitch are calculated, more accurately, using only the central vanishing point. If the camera is actually experiencing roll when the calculations fail and the roll is set to zero, errors appear also in the heading and pitch. The effect of this error on the user attitude observations is discussed in detail below. When the value of the LDOP is large, the uncertainty of the vanishing point location is large. Therefore the visual gyroscope's measurements possessing a large LDOP value are discarded in the calculation of the navigation solution. As a result, possible visual attitude errors arising from poor line geometry are avoided.

4.4 Performance of the Visual Gyroscope

The visual gyroscope is a comprehensive method providing a heading change measurement, but it does not provide any absolute value of the heading and must therefore be integrated with measurements from other sources. Accurate initialization of the visual gyroscope's heading using absolute heading information is crucial. The visual gyroscope is based on vanishing point observations calculated using lines found in the image of the environment and as a result cannot be used during sharp turns, when the visibility of the building boundaries forming the lines is lost due to the camera being too close to the wall. If, however, the view of the lines is maintained during a turn, the change of the world frame may be perceived only when the image rate is high. Therefore the visual gyroscope's heading needs to be augmented with heading measurements obtained using another system, e.g. a rate gyroscope, a magnetometer or a floor plan, also occasionally during navigation. Identification of the correct vanishing point location depends on the number and geometry of the lines found in the image. Low lighting of the navigation environment reduces the number of lines found, possibly resulting in an erroneous vanishing point location.
In addition to low lighting, the problem of obtaining an erroneous vanishing point location may arise from the selection criteria of the Hough Lines algorithm parameters. The parameters used in this thesis are adjusted so that lines shorter than a threshold are left out of the computation to reduce the number of nonparallel lines disturbing the vanishing point calculation. An optimal parameter for indoor environments, 25 pixels, was found through experimentation. When the scene consists of a plane, there are no lines in the image and the vanishing point cannot be calculated. Sometimes, despite the parameter selection, the set of lines found in the scene contains a number of nonparallel lines, resulting in an erroneous vanishing point location.

The accuracy of the heading change and pitch estimates was evaluated with a test containing 7555 images taken with a static camera in an office environment. The test environment had changing light as well as dynamic objects in the scene of the camera, sometimes covering the view totally. Table 4.1 shows the statistics for the heading change and pitch obtained over the 2.5-hour time span of the test, the mean errors being 0.8 and 0.3 degrees and the standard deviations 0.6 and 0.3 degrees, respectively.

Table 4.1. Statistics for heading change and pitch accuracy, all units degrees

Statistic      Heading change   Pitch
min error      0                0
max error      18.4             10.7
mean error     0.8              0.3
std of error   0.6              0.3

The errors in the heading and pitch measurements all came from dynamic objects, namely humans, blocking the view of the camera. The mean errors and deviations are good for an indoor navigation system; however, the office corridor is a favourable environment for the method. Additional test results with a moving camera will be presented in Chapter 6. An Allan variance analysis was performed for the visual gyroscope. The most significant error source affecting gyroscope accuracy, especially for a MEMS gyro, is the drift.
The Allan variance analysis method [6] was originally developed for the study of oscillator stability, but has since been applied widely to gyro drift analysis. Because the Allan variance method is suitable for the noise study of any instrument, it is applied here to evaluate the noise level of the visual gyroscope. The Allan variance σ_C²(t_A) [57] for the averaging time t_A is

\sigma_C^2(t_A) = \frac{1}{2(N-1)} \sum_{k=1}^{N-1} \left(\tilde{y}(t_A)_{k+1} - \tilde{y}(t_A)_k\right)^2 (60)

where ỹ(t_A)_k is the average value of a bin k containing the heading change and pitch values. The averaging time t_A is the length of a bin and N is the number of bins formed from the data for the corresponding averaging time. The plotted Allan variance may be used for finding the different error types of the sensors [114].

Fig. 4.4. Allan deviation plot showing the noise in the visual gyroscope.

The Allan deviation plot for the 7555 images taken is shown in Figure 4.4. The figure shows large deviations due to the uncorrelated noise affecting the visual gyroscope's stability at short integration times. After the deviation has reached a minimum value, the rate random walk starts to increase the deviation again. The bias instability measure may be found at the minimum value and is 0.058 degrees/second for the heading and 0.045 degrees/second for the pitch, at an integration time of 245 seconds. The bias is a result of the quantization of an analog signal into discrete image pixels as well as of possible biases in the edge and line detection algorithms [53]. The obtained measure is lower than the values obtained for a typical MEMS gyroscope tested in [114]. The test also showed the tolerance of the method to dynamic objects, which may be seen from Figure 4.5. Heading angle and pitch errors due to dynamic objects obscuring the scene were very infrequent.

Fig. 4.5. Calculation of the central vanishing point is largely tolerant to dynamic objects in the scene.

The errors in the visual gyroscope are mostly introduced by environmental factors such as lighting, the construction of the environment and objects with lines not parallel to the direction of propagation. As discussed before, the roll observations may be significantly erroneous due to the restricted number of vertical lines in the scene, and therefore angle measurements over 15 degrees are not trusted; instead the roll is set to zero. Evidently the roll is not totally zero in most cases, and setting the roll to zero introduces errors also in the heading and pitch measurements. Fortunately, in most cases these errors are small, as is shown below using a numerical analysis of the resulting error measures. This numerical analysis was done by computing the correct central vanishing point location using the real orientation values presented below, then setting the roll to zero and computing the resulting heading and pitch using the obtained vanishing point and Equation 52. In the most common use cases, when the heading change and roll between two images are less than or equal to five degrees, the errors in the estimated heading and pitch are 0.6 and 0.1 degrees, respectively. In an extreme case when the camera is simultaneously experiencing roll and heading changes of 15 degrees between two consecutive images, the errors in heading and pitch are 6.1 and 2.8 degrees, respectively. When the camera is otherwise static (i.e. the heading change and pitch are around one degree between consecutive images), even a large roll causes only small errors in the observed heading and pitch, namely 0.3 and 0.01 degrees, respectively.

Table 4.2. Effect of roll error on other angle observations

Real camera rotation (degrees)      Errors in observation when roll estimated to be zero (degrees)
Heading   Pitch   Roll              Heading   Pitch
1         1       -15               0.3       0.01
5         5       -5                0.6       0.1
15        -15     15                6.1       2.1
-15       -15     15                5.2       2.8
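The Allan variance of Eq. (60), used above to characterize the visual gyroscope noise, can be sketched as follows; the synthetic white-noise series (7555 samples, matching the test length, with an arbitrary 0.5-degree standard deviation) is illustrative only:

```python
import numpy as np

def allan_variance(y, bin_len):
    """Allan variance (Eq. 60): average the series into bins of length
    bin_len, then take half the mean squared successive-bin difference."""
    n_bins = len(y) // bin_len
    if n_bins < 2:
        raise ValueError("need at least two bins")
    binned = np.asarray(y[:n_bins * bin_len], dtype=float)
    ybar = binned.reshape(n_bins, bin_len).mean(axis=1)
    d = np.diff(ybar)
    return (d @ d) / (2 * (n_bins - 1))

# white heading-change noise: the Allan variance falls roughly as 1/bin_len
rng = np.random.default_rng(1)
noise = rng.normal(0.0, 0.5, size=7555)
avar_short = allan_variance(noise, bin_len=10)
avar_long = allan_variance(noise, bin_len=100)
```

For purely uncorrelated noise the variance keeps decreasing with the averaging time; in the real data of Figure 4.4 the decrease stops at the bias instability point, after which rate random walk dominates.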
Table 4.2 summarizes some errors arising from camera motions between two consecutive images when the roll is erroneously estimated to be zero due to vertical vanishing point calculation failures.

4.4.1 Theoretical Analysis of Attainable Accuracy

The statistical behaviour of the errors affecting the visual gyroscope was analyzed in order to obtain an understanding of the attainable accuracy of the method. This was done by observing the errors in the line detection and the vanishing point computation directly affecting the accuracy of the visual gyroscope. The following discussion is derived from [53], which includes more detailed explanations and proofs of the theorems used. For the analysis, the unit surface normal (n) of the plane passing through the origin O and a line (l), and the unit vector (m) indicating the orientation of the ray starting from the origin and passing through a point (X), are presented as

n = (X, Y, Z/f)^T, \quad m = (x, y, f)^T (61)

(normalized to unit length), where (x, y) is a point on the image plane, f is the focal length and the line (l) is presented as Xx + Yy + Z = 0.

Noise Model

The noise in vision-aiding methods is introduced not solely by the camera but also by the image operations, e.g. edge and line detection. In the presence of noise the normalized unit vectors presented are observed perturbed as m′ = m + ∆m, where ∆m is a random variable presenting the noise. If the noise ∆m is small compared to the vector m, the error may be characterized by the covariance matrix C as

C(m) = E(∆m ∆m^T) (62)

where E() denotes expectation and T transpose. The covariance matrix C is symmetric and positive semi-definite with three eigenvalues σ_1², σ_2², 0 and a corresponding orthonormal system of eigenvectors u, v, m, so C may be expressed in the spectral decomposition C = σ_1² uu^T + σ_2² vv^T + 0·mm^T.
Assuming the noise is independent and identically distributed with standard deviations s_x, s_y in the image x- and y-axis directions, respectively, the root-square magnitude of the noise is ε = \sqrt{s_x^2 + s_y^2}. Now ε is called the image accuracy, measured in pixels, and ε̃ is defined as ε̃ = ε/f.

Error in Line Detection

Errors in the image resolution affect the line detection, which in turn affects the vanishing point detection and furthermore the visual gyroscope's attitude measurements. A line is detected by looking at collinear edge pixels. Let m_α, α = 1, ..., K, be the normalized unit vectors representing all edge pixels of a line, Ω the disparity, i.e. the angle between m_1 and m_K, u the orientation of the line and m_G its centre point. Then the covariance matrix C of the optimally fitted line is

C(n) = \frac{\tilde{\varepsilon}^2}{K} \left( \frac{uu^T}{1 - \mathrm{sinc}\,\Omega} + \frac{m_G m_G^T}{1 + \mathrm{sinc}\,\Omega} \right). (63)

When the length of the line is k, the disparity may be defined using the focal length as Ω ≈ k/f. If k is small compared to the focal length, the disparity is small and C may be approximated as

C(n) ≈ \frac{6\kappa}{k^3} uu^T + \frac{\kappa}{2f^2 k} m_G m_G^T, (64)

where κ = ε²/ρ is called the image resolution and ρ is the line density, i.e. the number of pixels per unit pixel length. When the length of a line is small compared to the focal length, the covariance further reduces to

C(n) ≈ \frac{6\kappa}{k^3} uu^T. (65)

Error in Vanishing Point Detection

As discussed before, the vanishing point is the intersection point of parallel lines. When their projections onto the image plane are l_α, α = 1, ..., K, the normalized unit vector m presents the vanishing point and n_α is the normalized unit vector of line l_α. Furthermore, n_G is the normalized unit vector of a virtual centre line of all the lines, and the unit eigenvector for the second largest eigenvalue is m_C = m × n_G. When φ_α represents the deviation, i.e.
the angle between n_G and n_α, and θ_α is the disparity of the vanishing point from the center point of the line α, the covariance matrix of the vanishing point is

C(m) ≈ 6κ m_C m_C^T / ( Σ_{α=1}^{K} w_α sin²φ_α / sin³θ_α ).   (66)

4.5 Effect of Camera and Setup Characteristics on the Accuracy of the Visual Gyroscope

Image quality, camera rate and mounting location affect the visual motion perception. In this chapter these aspects are considered from the point of view of the visual gyroscope's performance. Many factors affect the accuracy of measurements calculated from images. As discussed before, the vanishing point based rotation observation depends on the number of straight lines found in the image and is therefore reliant on the scene. The failure of the method in unsuitable scenes sets requirements for the image rate: when the image rate is low (based on the experiments, a feasible image rate would be 10 Hz), the probability of obtaining an accurate heading change observation is reduced. Image quality and the parameters of the algorithms used in the visual computations affect the number and correctness of the detected line features. Therefore the characteristics of the camera used as a visual gyroscope are significant for the success of the method. The quality and features of the image sensor are crucial in terms of the amount of noise present in the images. The aperture is the lens diaphragm opening inside the camera lens; it regulates the amount of light passing to the sensor. The aperture size is indicated by an f-number: the smaller the f-number, the more light is let in and the higher the image quality in low-light situations. The focal length of the camera, discussed in the previous chapter, influences the sharpness of the image. Images taken using a camera with a wide-angle lens, namely a lens with a short focal length, are sharper than those taken using a standard lens.
When the pedestrian is navigating and holding the camera in hand, the heading change of the camera may be transformed into the heading change of the user if the configuration of the camera with respect to the user's body is carefully considered. The roll and pitch of the camera may be obtained from the locations of the vertical and central vanishing points, but if the camera is not aligned with the user's body, they may not be considered the orientation of the user. Likewise, if the hand moves in a heading direction that differs from the motion of the body, the heading change provided by the visual gyroscope is not accurate. Three different cameras and two different setups were tested in an experiment conducted mainly indoors in a challenging environment to address all the factors affecting the quality of the visual gyroscope's roll, pitch and heading change observations. The three cameras used for the experiments are a GoPro HD Hero helmet camera [1], a Sony HD video camera aimed at extreme sports [100] and a Nokia N8 smartphone camera [77]. Table 4.3 summarizes their most important parameters. The GoPro Hero is a helmet camera developed for first responders and recreational users. Its wide-angle lens, providing a tall HD video stream, gives an extended field-of-view in both the horizontal and vertical directions. The wide-angle lens increases the number of lines found, in addition to providing sharper images. The camera has a fixed lens and captures video at 30 frames per second. The video was converted into still images with an image rate of 10 Hz and a resolution of 1280 x 960 pixels for the camera characteristics experiment. The f-number of the lens is 2.8, resulting in increased performance in low-light indoor environments. Besides providing sharp and extensive images, the wide-angle lens produces distortion.
The distortion has to be corrected for in order to obtain accurate vision-based calculations using the method introduced in the previous chapter. The Sony HXR-MC1 is a camera for recording video during extreme sports or in other high-dynamic situations. It has a standard lens with an f-number of 3.2. The camera captures video at 30 frames per second. The video was converted into still images with an image rate of 10 Hz and a resolution of 1440 x 1080 pixels. The images taken with the Sony camera are darker and more blurred than the images captured using the other two cameras, and the view is more restricted. The Nokia N8 smartphone camera has a wide-angle lens and an f-number of 2.8, increasing the performance of the camera in low-light indoor environments. The camera was programmed to capture still images at a 0.8 Hz rate and a resolution of 640 x 480 pixels. The images taken with the Nokia camera have more light than those taken using the Sony camera, but they are not as sharp as those from the GoPro unit. Figure 4.6 shows the same scene captured with the GoPro (left), the Sony (middle) and the Nokia (right) cameras. The effects of the factors discussed above, namely the focal length and the low-light tolerance, may be seen in the figure. The images on the left and right are taken with cameras having wide-angle lenses and small f-numbers, and therefore the images are sharp, bright and wide, while the image in the middle is taken with a camera having a standard lens and a larger f-number, in which case the image quality is poorer. The camera's sensor is the component that converts the light of the image projected by the lens into an electrical signal to be digitized. Complementary metal oxide semiconductor (CMOS) and charge coupled device (CCD) chips are the two most used sensor types. The main difference between the two sensor types is the way they read the information [34].
Each sensor consists of millions of light-sensitive photosites which correspond to pixels in an image. In a CMOS sensor the information is read from each photosite individually, whereas in a CCD a line of photosites is read at once. This makes the CCD sensors simple to design, whereas the CMOS sensors are power efficient and therefore widely used in low-cost cameras such as those in smartphones, but at the cost of introducing more distortion into the images. All cameras used in the experiments presented in the thesis contain a CMOS sensor.

Fig. 4.6. An image captured of the same scene with three different cameras, showing the effect of different camera characteristics on the image quality.

Table 4.3. Parameters for the GoPro, Sony and Nokia cameras

Parameter                          GoPro Hero    Sony HXR-MC1    Nokia N8
Focal length (35 mm equivalent)    8 mm          79.5 mm         28 mm
f-number                           f/2.8         f/3.2           f/2.8
Image rate                         10 Hz         10 Hz           0.8 Hz

4.5.1 Experimental Results

The NovAtel SPAN-SE GPS/GLONASS receiver with Northrop Grumman's tactical grade LCI-IMU was used as a reference for both experiments and was carried in a backpack. The first round of experiments was completed using the GoPro and Sony cameras attached to the upper part of the backpack and with the Nokia N8 smartphone in the hand, as shown in Figure 4.7.

Fig. 4.7. Experiment setup for testing the effect of different camera characteristics on the heading change accuracy.

The reference was used to evaluate the accuracy of the rotation angles provided by the visual gyroscope using the GoPro and Sony cameras and the heading change using the Nokia N8 smartphone camera. The evaluation of the full three-dimensional rotation was not possible for the Nokia N8 smartphone camera due to the lack of a reliable reference system that could have
been attached to the camera, but it was possible for the other cameras as they experienced the same pitch and roll as the reference system due to their location. The second setup consisted of the reference system and all three cameras attached to the upper part of the backpack, enabling the evaluation of the effect of the camera location on the heading change accuracy. Both setups were tested in experiments with a duration of almost 30 minutes each. As the GoPro and Sony video streams were sampled at a 10 Hz rate, the data collection resulted in 14264 (first round) and 14162 (second round) images using the GoPro camera, 14279 (first round) and 14158 (second round) using the Sony, and 1162 using the Nokia N8 smartphone camera for both rounds. The different number of images for the Sony and GoPro was due to the time stamping method, in which a handheld GPS clock was shown to the camera and the first image used was the first image where the time was seen clearly. The test was conducted on the University of Calgary campus, mainly indoors. The environment was very challenging for vanishing point based navigation, because it consisted of many turns, doors (i.e. planes) and spacious cafeteria and hall areas (i.e. not all lines were orthogonal and the view to lines was restricted in some parts). The pitch, roll and heading change between two consecutive images was computed using the vanishing point based visual gyroscope measuring the full 3D orientation.

Table 4.4. Heading change error statistics

Camera          Test     Mean     Std      Min      Max      % of images
                         (deg)    (deg)    (deg)    (deg)    successful
GoPro Hero      1        2.5      2.7      0        17       82
GoPro Hero      2        2.8      3.0      0        19       72
Sony HXR-MC1    1        2.3      3.3      0        18.6     35
Sony HXR-MC1    2        2.6      3.3      0        25       30
Nokia N8        1        4.6      3.7      0        15.5     57
Nokia N8        2        4.4      3.7      0        16       45
Because the heading change between two images was evaluated, when the visual gyroscope measurement failed, the heading change was computed anew using the next two successful consecutive images. Statistics of the heading change errors for all cameras and both rounds (test 1 and test 2) are shown in Table 4.4. The success rate of the images, based on the LDOP error detection algorithm, is also shown in the table. Before computing the statistics, the visual orientation measurements evaluated as erroneous by the error detection were discarded. The mean error in heading change from the visual gyroscope using the GoPro and Sony cameras was around 2.5 degrees, and that using the Nokia N8 was around 4.5 degrees. The percentage of successful images was between 70 and 80 percent for the GoPro camera, around 50 for the Nokia N8 and only around 30 for the Sony camera. While the failed images included, for all cameras, those taken during sharp turns, in situations with insufficient light and in spacious areas containing non-parallel lines, the differences in the success rate are explained by the different characteristics of the cameras and are discussed below. Table 4.5 summarizes the roll and pitch error statistics. The roll and pitch accuracy was evaluated only for test 2, because of the lack of a reference system for the rotations of the handheld Nokia N8 in test 1. The mean roll and pitch errors were 0.5 and 2.0 degrees for the Sony, 2.1 and 2.5 degrees for the GoPro and 1.3 and 3.8 degrees for the Nokia N8. The results are elaborated below with respect to image quality, image rate and the location of the camera on the user.

Table 4.5.
Roll and pitch error statistics

Camera          Angle    Mean     Std      Min      Max
                         (deg)    (deg)    (deg)    (deg)
GoPro Hero      roll     2.1      3.1      0        26
GoPro Hero      pitch    2.5      4.0      0        59
Sony HXR-MC1    roll     0.5      0.9      0        15.4
Sony HXR-MC1    pitch    2.0      3.0      0        22
Nokia N8        roll     1.3      1.9      0        14.4
Nokia N8        pitch    3.8      4.9      0        43

Image Quality

Image quality is an important factor for the success and accuracy of the vanishing point based calculations. As explained before, a smaller f-number and a shorter focal length relate to sharper images, especially in low-light situations. When the images are sharp, more lines are found for the vanishing point calculations and they are less noisy. The GoPro camera significantly surpasses the other two cameras in image quality and especially in field-of-view. The images converted from the Sony video stream were in some places grainy and dark, and therefore the number of lines found was reduced. The limited number of lines disturbs the vanishing point calculations, as may be seen from the low success rate of the heading change calculations using the images taken with this camera, namely 30 % and 35 %. While the success rate of the calculations performed using the GoPro images was high, at 72 % and 82 %, mainly due to the larger number of lines captured in the image, as shown in Figure 4.8, the heading change accuracy was slightly worse than when using the Sony camera (mean errors of 2.5 and 2.8 degrees compared to 2.3 and 2.6 degrees). The images in the figure are taken using the Sony (on the left) and GoPro (on the right) cameras at the same location; the red rectangle shows the portion of the GoPro image captured also by the Sony camera. The worse accuracy with better quality images is due to the distortion of the wide-angle lens. Though the distortion was corrected before the vanishing point calculations, some effect still remains.

Fig. 4.8. Effect of the field-of-view on the line detection.
The images are taken using the Sony (on the left) and GoPro (on the right) cameras at the same location. The red rectangle shows the portion of the GoPro image captured also by the Sony camera. Red dots are the resulting vanishing points.

Image Rate

The image rate becomes an important factor when the environment is challenging. Previous tests employing the Nokia N8 with the same method gave a mean error of 2.5 degrees for the heading change observations [95]. When the image rate is low, namely 0.8 Hz, an environment consisting of many turns and spacious areas with a reduced number of lines in view leaves the method with few good observations. The effect is seen in the reduced accuracy of the visual gyroscope when using the Nokia N8, in which case the mean heading change error is 4.5 degrees as compared to 2.5 degrees with the Sony camera's lower quality images.

Camera Configuration

Surprisingly, the camera configuration did not have a significant effect on the accuracy of the visual gyroscope heading change. The heading change of the camera is assumed equal to that of the user when the visual gyroscope is used for pedestrian navigation. When the camera is held in a hand, the change of the hand's posture in the horizontal direction introduces a heading change in the camera that is inconsistent with the heading change of the user. The roll and pitch magnitudes are also larger compared to those of the configuration where the camera is tied to the user. The heading change accuracy was evaluated with two tests; in the first round the camera was held in the hand and in the second round it was tied to the backpack. The difference in the mean errors between the two tests was only 0.2 degrees. The configuration of the camera also affects the success rate of the images.
When the camera is held in a hand, it sometimes rotates too much, thereby reducing the view of the lines and decreasing the success rate of the visual gyroscope calculations.

Pitch and Roll

The mean pitch error values were consistent with the heading change errors for all cameras. However, the roll error was over two degrees smaller than the heading change error for the Sony and Nokia N8, namely 0.5 and 1.3 degrees. The method described in this thesis calculates the roll based on the ratio between the totally vertical lines, having an infinite slope value, and the vertical lines with a slope deviating from infinity, as explained previously. The approach decreases the effect of vertical lines non-parallel to the camera y-axis. When the camera is not experiencing any roll, the number of totally vertical lines surpasses the number of other, non-parallel vertical lines, and these lines, which would otherwise cause errors in the vanishing point calculations, are therefore discarded. The vertical line classification however fails when the distortion is corrected: the correction changes the pixel values of all lines' end points, and therefore no lines with an infinite slope value remain and the ratio based method is no longer valid. The effect was seen in the larger roll error when using the GoPro camera (2.1 degrees).

Table 4.6. Processing times for the different algorithms in the visual gyroscope's Nokia N8 smartphone implementation, using Symbian (capturing the photo), OpenCV (edge and line detection) and vanishing point, heading and tilt computations implemented in C++.

Algorithm                         Processing time (s)
Capturing a photo                 1.2
Edge detection (Canny)            0.17
Line detection (Hough)            1.0
Vanishing point, heading, tilt    0.07
Total                             2.7

4.6 Smartphone Application of Visual Gyroscope

The visual gyroscope has been implemented herein in a Nokia N8 Symbian smartphone. The implementation was done using Nokia's C++ based development environment QT [78]. The basic visual algorithms (i.e.
filtering, edge detection, Hough transform, SIFT and matching) were obtained from the OpenCV open source computer vision library [3]. The total processing time for automatically capturing an image, finding the straight lines, voting for the vanishing point and calculating the orientation of the camera is on average a bit over two and a half seconds, as specified in Table 4.6. The bottlenecks of the calculation are the image capturing (1.2 seconds) and the line detection using the Standard Hough Transform algorithm (1 second). The slowness of the Hough Transform was also acknowledged in [47], which discusses an alternative visual gyroscope implementation in a smartphone. The time used for extracting the lines and calculating the vanishing point has to be decreased for a real-time navigation solution. This will be realized by using a more efficient line detection algorithm, as discussed in Chapter 6. The effect of using video images instead of still images also has to be addressed in the future, concentrating especially on the power consumption aspect.

4.7 Visual Gyroscope Implementation Using Probabilistic Hough Transform

The three most significant limitations of the visual gyroscope presented above are its inability to monitor the heading change during sharp turns, its accuracy suffering from irregularities of the environment (namely lines violating the orthogonality requirement), and its calculation being relatively slow for a real-time implementation. The first two problems could be addressed by a tighter integration of the visual gyroscope with other positioning systems, especially a gyroscope. As the visual gyroscope presented here has processed the image data independently of other positioning systems, it has not had any support for exceptional situations. As was mentioned before, the line detection using the Hough transform is a bottleneck in the visual gyroscope's processing.
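To make the line detection bottleneck concrete, the Standard Hough Transform voting can be sketched in a few lines: every edge point votes for every (ρ, θ) cell consistent with it, so the work grows with both the number of edge pixels and the parameter resolution. A minimal NumPy illustration with synthetic edge points (the bin sizes and the test line are arbitrary assumptions, not thesis parameters):

```python
import numpy as np

def hough_lines(points, n_theta=180, rho_max=100, n_rho=201):
    """Standard Hough Transform voting: each edge point (x, y) votes for every
    (rho, theta) cell satisfying rho = x cos(theta) + y sin(theta)."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        ok = (idx >= 0) & (idx < n_rho)
        acc[idx[ok], np.arange(n_theta)[ok]] += 1
    return acc, thetas

# Collinear points on the vertical line x = 20 all vote for theta = 0, rho = 20
pts = [(20, y) for y in range(30)]
acc, thetas = hough_lines(pts)
i, j = np.unravel_index(acc.argmax(), acc.shape)
print(thetas[j], acc[i, j])     # theta ~ 0, 30 votes
```

The double loop over edge points and angles is what the accelerated method of Section 4.7 tries to avoid.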
All three problems are addressed in this section by developing a method extracting the lines using a novel modification of an algorithm called the Probabilistic Hough Transform [52], utilizing the information on the user attitude obtained from an IMU. In [14] the attitude information obtained from an INS was utilized to estimate the vanishing point location by calculating a probability density function for the Hough parameter space, and the method was called a predictive Hough Transform. The attitude of the user obtained from the INS may be transformed into an estimate of the vanishing point using the relation

ṽ = K C_b^c C_n^b R   (67)

where C_b^c is a direction cosine matrix (DCM) [105] from the body frame to the camera frame and C_n^b from the navigation frame to the body frame. The direction cosine matrix is

C_b^n = [ c11  c12  c13
          c21  c22  c23
          c31  c32  c33 ]   (68)

where the element at the ith row and the jth column is the cosine of the angle between the i-axis of the reference frame and the j-axis of the initial frame. A vector defined in a certain axis frame may be expressed in the reference axes by multiplying it by the direction cosine matrix; expressed here as transforming the vector r_b in body axes into the navigation frame vector r_n (the transformation may be done likewise for other frame pairs),

r_n = C_b^n r_b   (69)

When the user rotates through angle ψ about the z-axis (heading), angle θ about the new y-axis (pitch) and angle φ about the new x-axis (roll), the transformation may be presented using the direction cosine matrix

C_b^n = [ cosθ cosψ    −cosφ sinψ + sinφ sinθ cosψ    sinφ sinψ + cosφ sinθ cosψ
          cosθ sinψ     cosφ cosψ + sinφ sinθ sinψ    −sinφ cosψ + cosφ sinθ sinψ
          −sinθ         sinφ cosθ                      cosφ cosθ ]   (70)

The reverse rotation, in this case C_n^b, is obtained using the transpose rule C_n^b = (C_b^n)^T. Matrix K in (67) is the camera calibration matrix and R the normalized rotation matrix of the camera, this time in the navigation frame, computed using the attitude information obtained from the IMU. The expected vanishing point location ṽ is characterised by a Gaussian density function with parameters (µ_ρ, σ_ρ²) [14] as

ρ_θ ∼ N(µ_ρ, σ_ρ²)   (71)

where the distance ρ_θ related to a certain angle θ (θ ∈ [0, π]) is normally distributed. The mean µ_ρ is computed for each angle θ and the corresponding line going through the estimated vanishing point ṽ. The variance σ_ρ² is decided based on the IMU accuracy. In [14] the probability density function was used as a filter for the Standard Hough Transform (SHT) result space and provided a corrected vanishing point location. The attitude information from the accurate vanishing point was finally used for correcting the INS attitude with a Kalman filter, and an improved navigation solution was obtained. The method addresses the problems arising from erroneous vanishing point calculations due to an unsuitable environment, namely a line geometry overwhelmed by non-orthogonal lines. As the line detection is done using the Standard Hough Transform, the processing time jeopardizing the real-time solution is not improved. Therefore, in this thesis an accelerated line detection algorithm based on the Probabilistic Hough Transform is developed. However, it should be noted that the method should be employed only for extracting lines when a vision-aided IMU system is used; it is not suitable as a generic line extraction algorithm. As explained in Chapter 3, the Standard Hough Transform computes the parameter space (ρ, θ) for each point (x, y) found from the input image, usually the result of edge detection, as

ρ = x cos(θ) + y sin(θ).   (72)
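The rotation chain of (67)–(70) can be checked numerically. A small sketch (illustrative, assumed angles) builds C_b^n from Eq. (70) and verifies that the reverse rotation is simply the transpose:

```python
import numpy as np

def dcm_body_to_nav(psi, theta, phi):
    """Direction cosine matrix C_b^n of Eq. (70):
    heading psi (z-axis), pitch theta (y-axis), roll phi (x-axis), in radians."""
    cps, sps = np.cos(psi), np.sin(psi)
    cth, sth = np.cos(theta), np.sin(theta)
    cph, sph = np.cos(phi), np.sin(phi)
    return np.array([
        [cth * cps, -cph * sps + sph * sth * cps,  sph * sps + cph * sth * cps],
        [cth * sps,  cph * cps + sph * sth * sps, -sph * cps + cph * sth * sps],
        [-sth,       sph * cth,                    cph * cth],
    ])

C = dcm_body_to_nav(np.radians(30), np.radians(5), np.radians(-2))
# A DCM is orthonormal, so C_n^b = (C_b^n)^T as stated after Eq. (70)
print(np.allclose(C.T @ C, np.eye(3)))        # True

r_b = np.array([1.0, 0.0, 0.0])
r_n = C @ r_b                                  # Eq. (69): body-frame vector in the nav frame
```

The same matrices, chained with the calibration matrix K as in (67), map the IMU attitude to the expected vanishing point location.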
A matrix called the accumulator keeps count of the number of image points corresponding to a certain (ρ, θ) pair. After examining each point in the input image, the maximum values in the accumulator are identified and stated to represent lines in the image. The Probabilistic Hough Transform [56] is a modification of the Standard Hough Transform using only a random subset of the image points for voting and deriving the number of votes needed for identifying a line using Monte Carlo evaluation theory. According to [73] the algorithm reduces the computation only if a priori information on the number of lines is available, and as this is not usually the case, a Progressive Probabilistic Hough Transform was developed. In that method the image points used are selected randomly, and a parameter pair is selected to represent a line when the votes it has received exceed the number that would be expected from random noise. The number of points needed to represent a line is evaluated progressively based on the ratio of the pixels examined to the pixels voting for a certain line. In this thesis a method combining the INS aided vanishing point detection and the Progressive Probabilistic Hough Transform discussed above is developed. The attitude of the user obtained from the IMU measurements is transformed into an estimate of the vanishing point location ṽ using (67), and the probability space (71) corresponding to the point is computed for each possible angle θ. Then, a pixel is selected randomly from the set containing all pixels resulting from the edge detection. The distance ρ is calculated for all possible θ, and the values in the corresponding accumulator cells are increased by adding the value of the probability density function for the obtained distance and mean to the existing cell value. The Standard Hough Transform increases all accumulator cells equally because its objective is to find all
straight lines present in the image, while here the objective is to find the lines supporting the vanishing point. As a result of using the probability function, the closer a possible line is to the estimated vanishing point, the more the accumulator cell value is increased. When the value in an accumulator cell exceeds a predetermined threshold, a line is found. When a line is found and no more support for it is needed, all other image points belonging to the line are removed from the pixel set. Also all the votes in the accumulator arising from the line are removed so as not to disturb the identification of the other lines. In this way the number of image points examined, and therefore the computation time needed, decreases. As the points having a larger likelihood of belonging to a line going through the estimated vanishing point, or a point close to it, are given more weight, the lines found are likely to support the central vanishing point. After all pairs (ρ_i, θ_i) supporting a line are identified as explained above, the correct vanishing point is found from the intersection of all the lines i as follows. As discussed above, a (ρ, θ) pair in the parameter space represents all collinear points (x_i, y_i) in the image. This also holds the other way around [28]: all pairs (ρ_i, θ_i) satisfying the equation

ρ_i = x cos(θ_i) + y sin(θ_i)   (73)

represent lines going through the point (x, y). As the line detection was done by emphasizing the lines supporting the estimated vanishing point, all the lines found should intersect at the correct vanishing point, which may then be found by applying a least-squares estimation technique to (73). Two parameters selected for the calculation are crucial for the performance of the visual gyroscope presented in this section, namely the threshold for deciding when a line is found and the standard deviation of the estimated vanishing point value.
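The final least-squares step on (73) amounts to solving an overdetermined linear system for (x, y). A minimal sketch with synthetic, assumed line parameters:

```python
import numpy as np

def vanishing_point(rhos, thetas):
    """Least-squares intersection of the lines rho_i = x cos(theta_i) + y sin(theta_i),
    i.e. Eq. (73) stacked into a linear system A [x, y]^T = rho."""
    A = np.column_stack([np.cos(thetas), np.sin(thetas)])
    b = np.asarray(rhos)
    (x, y), *_ = np.linalg.lstsq(A, b, rcond=None)
    return x, y

# Three lines constructed to pass through the point (40, -10)
vp_true = np.array([40.0, -10.0])
thetas = np.array([0.2, 0.9, 1.7])
rhos = vp_true[0] * np.cos(thetas) + vp_true[1] * np.sin(thetas)

x, y = vanishing_point(rhos, thetas)
print(x, y)     # ~ 40.0 -10.0
```

With noisy (ρ_i, θ_i) detections the same call returns the point minimizing the squared residuals of (73) over all identified lines.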
When the threshold for finding a line is too small, the rate of false positives is large; when it is too large, the computation time increases and occasionally too few lines are found in the low-light indoor environment, resulting in an inaccurate vanishing point location. Also, when the standard deviation assigned to the estimated vanishing point value is too large, the errors in the IMU induced attitude distort the line detection by emphasizing points close to the estimated point that probably do not even belong to a line. For the experiments presented in Chapter 6, the parameter σ was chosen through experimentation to be 20, allowing the estimated vanishing point to be within ±20 pixels of the correct vanishing point, and the threshold for identifying a line to be 0.4.

5. VISUAL ODOMETER

The user translation derived from the accelerometer measurements suffers from errors, and therefore a method providing this information from consecutive images, feasible for augmenting or replacing the accelerometer, namely the concept of a visual odometer, is developed herein. This chapter discusses the principle of the visual odometer and especially the challenges in observing the translation from consecutive images, i.e. the unknown depth of the objects seen in the images and the scale problem arising from it. Then, the visual odometer's error detection and performance are discussed, as well as how the problems arising from degeneracy are avoided.

5.1 The Principle of the Visual Odometer

The translation of the camera between two consecutive images is constrained by a rule called homography that encompasses the calibration of the camera as well as its rotation and translation between the images, as was explained in Chapter 3. The homography equation, reproduced here, is

x′ = K′RK⁻¹x + K′t/Z.   (74)
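The roles of the terms in (74) can be illustrated numerically. The sketch below (illustrative, assumed values) takes the calibration matrices as identity, so that the relation acts directly on normalized points, and then inverts it for the translation once the rotation R (from the visual gyroscope) and the depth Z (from the camera height, as described below) are known:

```python
import numpy as np

# Assumed, illustrative values: a small heading change R and a depth Z
# resolved from the known camera height (not the thesis experiment data).
a = 0.05                                             # heading change, radians
R = np.array([[np.cos(a),  0.0, np.sin(a)],
              [0.0,        1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([0.1, 0.0, 0.5])                   # camera translation, metres
Z = 2.0                                              # depth of the tracked floor point

x1 = np.array([0.02, 0.11, 1.0])                     # normalized point, first image
x2 = R @ x1 + t_true / Z                             # Eq. (74) with K = K' = I

# With R and Z known, the translation follows directly:
t_est = Z * (x2 - R @ x1)
print(t_est)    # ~ [0.1, 0.0, 0.5]
```

The sketch applies the relation literally; in practice the second image point would additionally be rescaled to unit third coordinate, which is why Z must be resolved before the translation magnitude can be recovered.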
When the image points in the first (x) and second (x′) image are expressed in normalized homogeneous coordinates x̂ and x̂′, the relation reduces to

x̂′ = Rx̂ + t/Z   (75)

where R is the camera rotation and t = [t_x, t_y, t_z] the translation between the images. Z represents the distance (depth) of the photographed object from the camera, and because it is usually unknown in vision-aided navigation applications, the translation can be solved only up to an ambiguous scale. In navigation, the absolute magnitude of the translation has to be solved for, and some solutions for obtaining the depth Z, and therefore the scale, were presented in Chapter 1. The visual odometer presented in this thesis is based on a special camera configuration providing the means to resolve the object depth in an unknown environment. The method utilizes the camera rotation obtained using the visual gyroscope and the known height of the camera, measured before starting navigation and kept sufficiently static. The definition of sufficient in this context is given later in the chapter.

5.1.1 Measuring the Distance of an Object from the Camera

The distance Z from the camera to the object is calculated using information on the height of the camera (h), the focal length in units of vertical pixels (f_y), and the height of the image in pixels (H) [16]. The height of the camera must be known, but the focal length and the height of the image may be obtained by camera calibration. Figure 5.1 visualizes the configuration for resolving the depth Z of an object having coordinates (X, Y, Z) and projected into the image point (x, y), using the parameters listed above, the camera pitch φ computed by the visual gyroscope, and the angle β computed as follows. When the image height (H) and the vertical component of the focal length (f_y) are known, the vertical field-of-view (vfov) may be calculated as
vfov = 2 arctan(H/(2f_y)).   (76)

Now the angle comprising half of the vertical field-of-view satisfies

tan(vfov/2) = H/(2f_y)   (77)

and the angle (β) between the principal ray of the camera and the ray from the camera to the object is obtained using the image point y and the focal length f_y as

tan(β) = (y − H/2)/f_y   (78)

which, substituting (77), may be written as

tan(β) = ((y − H/2)/(H/2)) tan(vfov/2) = (2y/H − 1) tan(vfov/2)   (79)

and results in

β = arctan((2y/H − 1) tan(vfov/2)).   (80)

Fig. 5.1. The special configuration of the camera for resolving the distance (Z) of the object, using the height (h) and pitch (φ) of the camera.

The Y coordinate of the object is obtained as

Y = h / sin(φ + β).   (81)

Finally, using β and the pitch of the camera φ, the distance Z is obtained as

Z = Y cos(β) = h cos(β) / sin(φ + β).   (82)

In order to be able to determine the user translation using this special configuration, the object is required to lie in the close vicinity of the camera, namely between the camera and the point where the principal ray intersects the floor plane at the prevailing pitch angle. The vicinity requirement is rational also in the sense that the motion of far-off objects is very small in terms of pixels and may therefore be overwhelmed by noise. Also, the method for resolving the ambiguous scale using the known height of the camera requires the image points to lie on or close to the floor plane. Experiments have shown [89] that the lines used by the visual gyroscope to find the vanishing point are mainly found from the floor, namely from the junction of the floor and the walls, especially with a camera having a pitch angle larger than zero towards the floor plane. If no such points are found, a coarser method is introduced, considering all points found below the vanishing point or, if the vanishing point is not found, below the principal point.
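Equations (76)–(82) chain together into a short depth computation. A sketch with assumed, illustrative values (a 1.3 m camera height and a 10-degree pitch; not the experiment parameters):

```python
import numpy as np

def depth_from_height(y, H, fy, h, pitch):
    """Distance Z to a floor point imaged at row y, following Eqs. (76)-(82):
    image height H and vertical focal length fy in pixels,
    camera height h in metres, pitch angle in radians."""
    vfov = 2 * np.arctan(H / (2 * fy))                       # Eq. (76)
    beta = np.arctan((2 * y / H - 1) * np.tan(vfov / 2))     # Eq. (80)
    return h * np.cos(beta) / np.sin(pitch + beta)           # Eq. (82)

Z = depth_from_height(y=360, H=480, fy=520.0, h=1.3, pitch=np.radians(10))
print(round(Z, 2))    # ~3.24 m for these assumed values
```

Note that the result is sensitive to the pitch angle supplied by the visual gyroscope: for points close to the principal ray, β is small and the sine term is dominated by the pitch error.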
As the method presented in this thesis uses the rotation perceived by the visual gyroscope, the number of matching image points needed is reduced, and therefore limiting the region in which suitable objects are sought does not incur substantial limitations for the translation observation. SIFT features are extracted from the two consecutive images and matched using Matlab algorithms [107] together with the restrictions on the object's location described above; the matching is shown in Figure 5.2. The lines join image points matched in the consecutive images (first on the left, second on the right), the red dot is the central vanishing point, and the green point on the floor of the left image is the only matched point inside the region suitable for the visual odometer. Due to the low number of features in indoor environments, less certain matches are accepted, so that some matches are found for most images. Now the translation of the camera between the two images may be resolved from the matched normalized homogeneous image points x̂, x̂′ in the first and second image, respectively, using (75). When the image points followed are projections of objects lying on the floor, the z- and x-components of the translation give the horizontal translation, which may further be transformed into translations in the East and North directions using the heading provided by the visual gyroscope. The loose matching criteria, as well as the occasional use of the coarse floor plane recovery, necessitate careful error handling for robust visual odometer measurements, which will be explained next. The resolution of the unknown scale will also be discussed.

Fig. 5.2. Matched SIFT features between consecutive images.
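Rearranging (75), a single matched floor point with known depth suffices to recover the translation, since t = Z(x̂′ − Rx̂). A small NumPy sketch under that assumption follows; the function name is hypothetical.

```python
import numpy as np

def translation_from_floor_point(xh1, xh2, R, Z):
    """Recover the camera translation t from one matched floor point.

    xh1, xh2 : normalized homogeneous image points (shape (3,)) in the
               first and second image
    R        : 3x3 camera rotation between the images (visual gyroscope)
    Z        : depth of the point, from the camera-height geometry

    Rearranging Eq. (75): xh2 = R xh1 + t/Z  =>  t = Z (xh2 - R xh1).
    """
    return Z * (np.asarray(xh2) - np.asarray(R) @ np.asarray(xh1))
```

In practice several matched points would be combined, with the outlier gating described in the next section applied first.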
5.1.2 Error Detection and Resolving the Unknown Scale for the Visual Odometer

As was explained in Chapter 3, an image point presented with homogeneous coordinates x = (x, y, 1) is related to the corresponding object coordinates X = (X, Y, Z, 1) as

x = K[R|t]X    (83)

where t is the translation of the camera center, R its orientation with respect to the world coordinate frame, and the camera matrix P = K[R|t] is such that x = PX. When the configuration is done in such a manner that the location and attitude of the camera while capturing the first image are set as the world frame origin and initial attitude, the attitude and location obtained using the second image's points reflect the change of the location and attitude of the camera between the images, i.e. the rotation and translation of the camera. The coordinates (X) of an object represented by two image points (x, y, 1)ᵀ and (x′, y′, 1)ᵀ in consecutive images are estimated using triangulation as [39]

x p₃ᵀX = p₁ᵀX
y p₃ᵀX = p₂ᵀX
x′ p′₃ᵀX = p′₁ᵀX    (84)
y′ p′₃ᵀX = p′₂ᵀX

where pᵢᵀ and p′ᵢᵀ represent the i-th rows of the camera matrices P and P′ of the first and second image, respectively. The four equations may be expressed in matrix form as AX = 0 for a suitable A. An estimate X̂ of X may now be obtained using the Linear-Eigen method: it is the unit eigenvector of AᵀA corresponding to its smallest eigenvalue, minimizing ‖AX‖ subject to the condition ‖X‖ = 1 [38]. This is done using the Singular Value Decomposition (SVD). The SVD is a factorization of a matrix M as M = UDVᵀ [38], where the matrices U and V are orthogonal and D is diagonal with non-negative values. The decomposition of the matrix MᵀM is MᵀM = VD²Vᵀ, where the values of D² are its eigenvalues and the columns of V its eigenvectors. Because the object should be lying on the floor plane, the Y-coordinate of the estimate X̂ has to be equal to the height of the camera, and the unknown scale present in the image homography may be solved.
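The Linear-Eigen triangulation of (84) via the SVD can be sketched as follows. This is a minimal NumPy illustration (the function name is hypothetical); the estimate is the right singular vector of A for its smallest singular value, i.e. the eigenvector of AᵀA with the smallest eigenvalue.

```python
import numpy as np

def triangulate_linear_eigen(x1, x2, P1, P2):
    """Linear-Eigen triangulation (Eq. 84) of one object point.

    x1, x2 : (x, y) image points in the first and second view
    P1, P2 : 3x4 camera matrices of the two views
    Returns the unit-norm homogeneous estimate X_hat.
    """
    # Each equation of (84) rearranged to the form (x p3 - p1)^T X = 0.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The minimizer of ||A X|| subject to ||X|| = 1 is the last row of Vt.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]
```

Dividing the result by its fourth component gives the inhomogeneous object coordinates, whose Y value is then compared to the camera height to fix the scale.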
The scale factor is the ratio of the camera height and the object Y-coordinate, h/Y [58]. When the translation obtained using the homography is multiplied by the scale factor, the real translation in the horizontal plane is obtained; the translation in the vertical direction is assumed to be zero, based on the configuration requiring the camera height to be static. The reprojection error is a measure used for evaluating whether image points are matched correctly and for discarding erroneously matched points [38]. It is computed by estimating the object point X̂ from the image point correspondences x, x′ as described above and then reprojecting the estimated object point to the matched image points. Error detection in the case of the visual odometer cannot be based on monitoring the reprojection error, because the motion is mainly forward and therefore the rays from the camera to the image point in consecutive images are nearly parallel. Herein, the Y-coordinate values are monitored instead, and the ones deviating more than a threshold from the mean of all observations are discarded. The deviation is constrained as

|Yᵢ − µ(Y)| < 2σ(Y)    (85)

where Yᵢ is the i-th object's Y-coordinate, µ(Y) the mean of all obtained Y-coordinate values, and σ(Y) their standard deviation.

5.1.3 Degeneracy

Degeneracy problems arise from special situations when resolving the motion of the camera from consecutive images. When the camera rotates about its centre, or the camera is static but its intrinsic parameters change between the images, a motion degeneracy arises [106] because the epipolar geometry between the consecutive images is not defined. Structure degeneracy arises when all image points used for matching are coplanar, because then the epipolar geometry between the images cannot be uniquely determined.
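The 2σ gate of (85) is straightforward to express in code. A minimal NumPy sketch follows; the function name is hypothetical.

```python
import numpy as np

def gate_floor_points(Y_coords):
    """Flag matches whose object Y-coordinate deviates more than
    two standard deviations from the mean of all observations (Eq. 85).

    Returns a boolean mask: True for points that pass the gate.
    """
    Y = np.asarray(Y_coords, dtype=float)
    return np.abs(Y - Y.mean()) < 2.0 * Y.std()
```

Note that with very few points a single gross outlier inflates σ(Y) and may slip through the gate, which is one reason the loose matching criteria require the additional error handling discussed in this section.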
This is due to the fact that the camera matrix P presented in Chapter 3 has 11 degrees of freedom, whereas coplanar image points define a homography with only 8 degrees of freedom. The visual odometer proposed in this thesis resolves the degeneracy problems as follows. When the depth of the object (Z), computed for matched image points as explained above, is constant over two consecutive images, the translation between the images is set to zero; thus the homography, which would yield errors due to the motion degeneracy, is not computed. As the heading is computed using the visual gyroscope, the heading measurement does not suffer from the degeneracy problem. The structure degeneracy arising from coplanar image points is avoided by the configuration of the translation and rotation solution. As the camera is calibrated a priori, the rotation is obtained from the visual gyroscope, and the translation in the vertical direction is assumed zero, in theory only one matched image point between the consecutive images is needed to resolve the horizontal translation. Therefore the image points may be coplanar; the missing 3 degrees of freedom needed to resolve the camera matrix are already solved using these other visual parameters.

5.1.4 Performance of the Visual Odometer

The visual odometer provides relative information about the user position, i.e. translation, and therefore the initial position has to be obtained using an absolute positioning system. If the camera used is calibrated, the visual odometer does not need additional calibration before or during navigation; however, the performance of the navigation system increases substantially if the absolute position of the user is occasionally calibrated, reducing the effect of unavoidable error occurrences in the visual observations. The visual odometer does not depend on any knowledge of the environment; only the camera height above the floor plane must be estimated and kept sufficiently static. However, its performance depends on the accuracy of the visual gyroscope's measurements. The most drastic errors may be avoided by monitoring changes in pitch; if the change is considerable, it is most likely due to an error in the vanishing point location, and in this case the previous pitch and heading values are used. The visual odometer is not as tolerant to dynamic objects as the visual gyroscope, but again the error arising from monitoring the motion of a dynamic object depends on how many matching static points are found. The mean error of the user speed obtained in different navigation environments was found to be less than 0.3 m/s, with a standard deviation of 0.3 m/s. The errors arising in different navigation environments will be analysed in the following chapter. The camera height has to be evaluated a priori and kept sufficiently constant during navigation for the visual odometer to perform correctly. The effect of an incorrect height estimate on the visual odometer's performance, due to an erroneous a priori measurement or a failure to keep the camera at a constant height, which would naturally occur when using a smartphone held in the hand, was evaluated. The statistics of the user speed perceived by the visual odometer when the height of the camera was correct and kept constant were compared, via simulation, to situations where the height value was ±10 cm and ±30 cm off the correct value. In an experiment done in an office corridor and resulting in 183 images, the mean error in speed was 0.26 m/s when the correct height was used, and no effect was seen when a height value with an error of −10 cm was used. However, when the height was erroneous in the other direction, the mean error increased to 0.38 m/s and 0.54 m/s for errors of +10 cm and +30 cm, respectively. Table 5.1 shows the statistics.
The results show that a vertical motion of the camera of 10 cm or less does not deteriorate the performance of the visual odometer substantially; for upward motion of the camera, the tolerance is even larger.

Table 5.1. Statistics of the effect of camera height errors on the visual odometer's speed accuracy, units in m/s

Statistics    Correct height  Height −10 cm  Height −30 cm  Height +10 cm  Height +30 cm
min error     0               0              0              0              0
max error     1.5             1.4            0.8            1.5            1.5
mean error    0.26            0.26           0.28           0.38           0.54
std of error  0.24            0.26           0.18           0.29           0.29

6. VISION-AIDED NAVIGATION USING VISUAL GYROSCOPE AND ODOMETER

This chapter discusses pedestrian indoor and urban navigation solutions utilizing a visual gyroscope and a visual odometer to obtain a vision-aided integrated system. The visual gyroscope used in the majority of the experiments is based on the method utilizing the Standard Hough Transform; the algorithm based on the Probabilistic Hough Transform is presented at the end of the chapter. The collected data, visual and other measurements from different sensors and radio positioning systems, are integrated using a Kalman filter. It should be noted that in the discussions related to the Kalman filter the matrices H and R are defined as discussed in Chapter 2 and should not be confused with the identically named computer vision variables. The durations of the tests and the test environments are varied in order to obtain an extensive understanding of the suitability of the algorithms for real-life pedestrian navigation. Due to the different start times and measurement rates of the different systems, all measurements have to be time-stamped carefully. Because cameras do not usually time-stamp images in Coordinated Universal Time (UTC), as other sensors often do, but in relation to the camera's own clock, the time for the images has to be resolved differently.
In this thesis this is done using two different methods: by initializing the system with all sensors static and then finding the time of the start of motion from the images, or alternatively by showing a handheld GPS clock to the camera before starting navigation. The initialization of the world frame with respect to the navigation frame is explained separately for each implementation discussed. The calculations for all experiments are done in post-processing using Matlab, and therefore the time synchronization done in the initialization phase is valid throughout the total experiment time; no drift in time is experienced. The experiment setups and results are then discussed.

6.1 Visual Gyroscope and Odometer Aided Multi-Sensor Positioning

The measurements from the visual gyroscope, detecting heading and pitch changes, and the translation from the visual odometer were integrated with the GPS position obtained using a Fastrax IT500 high-sensitivity receiver (the sensitivity being −165 dBm in navigation), WLAN fingerprinting observations from a Nokia N8 smartphone, and speed and heading measurements from the MSP (multi-sensor positioning) device [21] as well as from Nokia 6710 accelerometers and magnetometers, using the Kalman filter presented below and explained in detail in [63]. The GPS receiver, as well as the reference system, were initialized outdoors. The visual measurements were calculated from images taken with the Nokia N8 camera, which has a resolution of 12 megapixels. Processing such large images is very time consuming, and therefore the resolution was reduced to 640 × 480 to enable real-time performance. A NovAtel SPAN (Synchronized Position Attitude Navigation) GPS/INS high-accuracy positioning system was used as reference. The equipment was placed in a cart using the setup shown in Figure 6.1.
The start position was set as the origin of the navigation frame, and the heading was initialized using the reference solution at the beginning of the experiment. This initial heading was used for setting up the visual gyroscope's world frame: at the initial point the visual gyroscope's heading was set equal to the initial heading, and during navigation the heading changes between consecutive images were monitored to propagate the heading. As the camera was attached to a holder in a cart, its roll was negligible, and therefore the visual gyroscope measuring only the pitch and heading change, presented in Chapter 4, was used. The time synchronization in the experiments testing the performance of vision-aided multi-sensor positioning was done by monitoring the motion start time from the self-contained sensors and the images. The experiments were done in an office corridor, first obtaining WLAN position measurements from a functional WLAN radio map and then using an outdated map to assess the performance of the visual gyroscope and odometer when absolute updates are restricted; otherwise the setup was the same for both rounds. The relatively short test time is due to the difficulty of obtaining an accurate reference trajectory for a longer time indoors. The results are, however, anticipated to show similar accuracy over longer time periods due to the possibility of obtaining occasional absolute position updates.

Fig. 6.1. Equipment setup for experimenting with the vision-aided multi-sensor positioning. The Nokia N8 phone acquiring the images was attached to a holder at the front of the cart. The GNSS antenna is that of the NovAtel SPAN reference system.

The results from the experiments show that the vision-aiding increases the accuracy, precision and availability of the navigation solution, as described below.
6.1.1 Kalman Filter Used in Multi-Sensor Positioning

A constant speed model, which is often used in pedestrian navigation, was adopted and is defined as

X_{k+1} = X_k + Ẋ_k Δt + v₁
Y_{k+1} = Y_k + Ẏ_k Δt + v₂    (86)
Ẋ_{k+1} = Ẋ_k + v₃
Ẏ_{k+1} = Ẏ_k + v₄

where X and Y are the latitude and longitude transformed into the ENU (East, North, Up) coordinate frame, Ẋ and Ẏ are their time derivatives, k denotes the current epoch, Δt is the time interval between two epochs, and vᵢ is the state uncertainty component of element i. The state vector (x), transition matrix (Φ) and process noise matrix (Q) for the model are

x_k = [X_k  Y_k  Ẋ_k  Ẏ_k]ᵀ

Φ_k = [ 1  0  Δt  0
        0  1  0   Δt
        0  0  1   0
        0  0  0   1 ]

Q_k = [ q̃₁Δt⁴/4  0         q̃₁Δt³/2  0
        0         q̃₂Δt⁴/4  0         q̃₂Δt³/2
        q̃₁Δt³/2  0         q̃₁Δt²    0
        0         q̃₂Δt³/2  0         q̃₂Δt²   ]    (87)

where q̃₁ is the spectral noise density value for the North component and q̃₂ for the East component, both having the value (0.5 m/s²)², based on an empirical assessment of the quality of the sensors in the hardware platform. The measurements consisted of position from GPS and WLAN, speed from the Nokia phone (ACC1) and MSP (ACC2) accelerometers and the visual odometer (V), and heading from the Nokia phone (DC1) and MSP (DC2) digital compasses and the initialized visual gyroscope (V). The measurement vector z is

z_k = [ X_GPS  Y_GPS  X_WLAN  Y_WLAN
        S_ACC1 cos θ_DC1  S_ACC1 sin θ_DC1
        S_ACCV cos θ_V    S_ACCV sin θ_V
        S_ACC2 cos θ_DC2  S_ACC2 sin θ_DC2 ]ᵀ.    (88)

The matrices H and R are

H_k = [ 1  0  0  0
        0  1  0  0
        1  0  0  0
        0  1  0  0
        0  0  1  0
        0  0  0  1
        0  0  1  0
        0  0  0  1
        0  0  1  0
        0  0  0  1 ]    (89)

R_k = diag(σ²_X_GPS, σ²_Y_GPS, σ²_X_WLAN, σ²_Y_WLAN, σ²_S_ACC1 cos θ_DC1, σ²_S_ACC1 sin θ_DC1, σ²_S_ACCV cos θ_V, σ²_S_ACCV sin θ_V, σ²_S_ACC2 cos θ_DC2, σ²_S_ACC2 sin θ_DC2).    (90)

The variance values σ²_i,DEV were chosen by testing the accuracy of the corresponding measurements (i) obtained using the device (DEV).
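One cycle of the filter built from the constant-speed model of (86)-(87) can be sketched as follows. This is a minimal Python/NumPy illustration, not the thesis implementation; the function names and the GPS-only update in the test are chosen here for illustration.

```python
import numpy as np

def make_model(dt, q1=0.25, q2=0.25):
    """Transition and process noise matrices of Eqs. (86)-(87).

    q1, q2 are the spectral noise densities for the North and East
    components; (0.5 m/s^2)^2 = 0.25 as stated in the text.
    """
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], float)
    Q = np.array([[q1 * dt**4 / 4, 0, q1 * dt**3 / 2, 0],
                  [0, q2 * dt**4 / 4, 0, q2 * dt**3 / 2],
                  [q1 * dt**3 / 2, 0, q1 * dt**2, 0],
                  [0, q2 * dt**3 / 2, 0, q2 * dt**2]], float)
    return F, Q

def kf_step(x, P, F, Q, z, H, R):
    """One Kalman predict/update cycle. H, R and z are assembled per
    epoch from whichever measurements (GPS, WLAN, speed/heading
    pairs) actually arrived, so their sizes vary between epochs."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

Building H and R per epoch from the available measurement subset mirrors the varying matrix sizes mentioned in the text.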
The devices utilized had different measurement rates: 36 Hz for the accelerometers and magnetometers, 1 Hz for the GPS measurements, 0.8 Hz for the visual gyroscope and odometer, and 0.1 Hz for WLAN; the sizes of the matrices therefore varied based on the number of measurements obtained for the epoch under consideration.

6.1.2 Test in an Indoor Office Environment

The performance of the vision-aided multi-sensor positioning using the visual gyroscope and the visual odometer was tested in a typical office corridor, shown in Figure 6.2; in this case, however, the corridor suffered from low lighting due to the dark North European winter. The data collection lasted three minutes. Although the images were taken at a 0.8 Hz rate, the vanishing point calculation occasionally failed due to scenes that were too dark or due to corridor turns, and therefore no visual gyroscope or visual odometer observations were obtained for a small percentage of the images captured; the data collection thus resulted in 148 images with successful visual calculations.

Fig. 6.2. Office corridor used for evaluating the vision-aided multi-sensor implementation.

The error in cumulative distance obtained when propagating the position using the visual odometer measurements was only 1.3 m, the total length of the route being 158 m. However, the result is over-optimistic, as may be seen in Figure 6.3, which shows the variation of the speed measurements. The mean error of the speed was 0.3 m/s and the standard deviation 0.3 m/s. In this experiment the user speed was not filtered, as may be seen in the figure, but in the following experiments the user speed will be filtered with an upper limit of 1.5 m/s, which is considered a reasonable assumption for normal pedestrian navigation. As the magnitude of steep turns (i.e.
90 degrees or over) may not be observed using the visual gyroscope, and no other feasible heading measurements were available in this and the following office experiments, the turns were detected as follows. In situations where the turns were sharp and the vanishing points were not perceived, the visual odometer observed the turns as the only regions where the matching failed completely and no corresponding image points were found. When only one matching image pair was lost, the turn was assumed to be most likely 90 degrees, and when three pairs were lost, the turn was evaluated to be 180 degrees. If some kind of map matching [83] were used as further augmentation, this procedure would become more feasible. The heading changes and speed information provided by the visual gyroscope and odometer were integrated with the other measurements using the Kalman filter described above, and the user position during navigation was obtained. The position solution obtained was compared to solutions using only GPS or WLAN positions as measurements to the filter, as well as to the integrated solution using all measurements except the visual ones. All solutions obtained were compared to the ground truth obtained from the SPAN reference system. The vision-aided fused solution, i.e. the integration including the visual gyroscope and odometer measurements, provided the best user position accuracy and precision, the mean error being 5.3 m and the standard deviation 3.8 m. The corresponding values for the fused solution without vision-aiding are 6.7 and 5.1 m, for GPS positioning 17.8 and 11.5 m, and for WLAN positioning 5.9 and 4.7 m.

Fig. 6.3. Average speed from ground truth (red) and visual odometer (blue).
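The turn heuristic described above can be stated compactly. In the following sketch the function name and the treatment of intermediate counts are assumptions, as the text specifies only the one-pair (90 degrees) and three-pair (180 degrees) cases.

```python
def detect_turn(lost_pairs):
    """Heuristic turn detection used when sharp turns hide the
    vanishing point and SIFT matching fails completely:
    one lost matching image pair is read as a 90-degree turn,
    three lost pairs as a 180-degree turn (counts of two are
    mapped to 90 degrees here as an assumption)."""
    if lost_pairs >= 3:
        return 180
    if lost_pairs >= 1:
        return 90
    return 0
```

As noted in the text, map matching as a further augmentation would make this rule considerably more robust.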
The availability of the other positioning systems, when the solution is computed at a 1 Hz rate, is 100%, except for WLAN, for which the availability is only 10% due to the low update rate of the solution, a consequence of the power-saving requirements of the smartphone. Table 6.1 shows the statistics. The position result is visualized in Figure 6.4, showing on the left the fused solution without vision-aiding (blue), the GPS position (black), the WLAN position (purple) and the ground truth (green), and on the right the vision-aided fused solution (blue) with the same GPS, WLAN and ground truth references.

Table 6.1. Positioning error statistics using different systems in an office corridor

Statistics        WLAN  GPS   Fused  Vision-aided fused
min error (m)     1.6   0.6   0.9    0.9
max error (m)     17.2  38.7  23.4   19.7
mean error (m)    5.9   17.8  6.7    5.3
std of error (m)  4.7   11.5  5.1    3.8
availability (%)  10    100   100    100

Fig. 6.4. Indoor positioning results for a pedestrian using different sensors and fused solutions. The figure on the left shows the fused solution without vision-aiding (blue) and the figure on the right the fused solution using vision-aiding from the visual gyroscope and visual odometer (blue), together with the ground truth (green), the GPS position solution (black) and the WLAN position solution (purple).

6.1.3 Test in Office Environment Using an Outdated WLAN Radio Map

An experiment using the same setup, equipment and methods as above was carried out, this time using an outdated WLAN radio map, therefore deteriorating the absolute position calibration during the navigation. This test was performed to assess the impact of an incorrect WLAN map, as such maps are known to change frequently due to human traffic and other changes. Two of the eight WLAN access points in the office corridor were out of order and two had changed locations after the formation of the radio map.
Also, some new electrical equipment had been placed in the vicinity of one access point. This altered setup reduced the WLAN positioning accuracy to 11 m from the previous 6 m. The filtering of the visual odometer induced user speed, explained above, decreased the mean error in the speed from 0.3 m/s to 0.26 m/s, with a standard deviation of 0.24 m/s. Now the fused position solution without vision-aiding had a mean error of 7.8 m and a standard deviation of 4.8 m, while the vision-aided fused solution using the visual gyroscope and odometer measurements had a mean error of 5.8 m and a standard deviation of 3.7 m. The statistics are shown in Table 6.2. The problems in the absolute position calibration due to the outdated WLAN radio map increase the mean error of the fused position solution by more than 1 m compared to the fused solution using a valid WLAN radio map, but the degradation is less significant for the vision-aided fused solution, namely only 0.5 m. The position result is visualized in Figure 6.5, showing on the left the fused solution without vision-aiding (blue), the GPS position (black), the WLAN position (purple) and the ground truth (green), and on the right the vision-aided fused solution (blue) with the same GPS, WLAN and ground truth references.

Fig. 6.5. Indoor positioning results for a pedestrian using different sensors and fused solutions in an office corridor with an outdated WLAN radio map. The figure on the left shows the fused solution without vision-aiding (blue) and the figure on the right the fused solution using vision-aiding from the visual gyroscope and visual odometer (blue), together with the ground truth (green), the GPS position solution (black) and the WLAN position solution (purple).

Table 6.2.
Positioning error statistics using different positioning systems in an office corridor with an outdated WLAN radio map

Statistics        WLAN  GPS   Fused  Vision-aided fused
min error (m)     0.5   0.3   0.5    0.4
max error (m)     36.9  32.6  16.1   13.1
mean error (m)    11.0  14.9  7.8    5.8
std of error (m)  10.9  8.4   4.8    3.7
availability (%)  10    100   100    100

6.2 Stand-Alone Visual System

The two previous experiments assessed the performance of a visual gyroscope and odometer aided fused navigation solution in an environment most favorable for the visual methods developed herein, consisting of scenes with good line geometry and only a few dynamic objects. Because pedestrian navigation is needed in various environments, the method was also tested in the most challenging environment for the visual aiding approach, namely a shopping mall with shoppers in motion, wide corridors frequently restricting the view of their sides and therefore degrading the line geometry, varying lighting conditions, and various objects forming many non-orthogonal lines. As the method is also aimed at urban navigation, it was additionally tested in an urban environment, frequently close to the wall of a tall building. As these environments lacked an absolute positioning system for integration, the performance of the visual gyroscope and odometer was tested as a stand-alone system, propagating the initial position and orientation using the visual heading and speed measurements. The visual measurements were calculated from images taken using the Nokia N8 camera at reduced resolution, as in the previous experiments. Again, a NovAtel SPAN GPS/INS high-accuracy positioning system was used as reference. The equipment was placed in a cart as in the previous experiments. The visual measurements were propagated using a Kalman filter implemented as follows. The results obtained were compared to the ground truth and are discussed below.
6.2.1 Kalman Filter Used in Stand-Alone Visual Positioning

In this case, the navigation solution is obtained by propagating the initial position and heading using the heading information from the visual gyroscope and the speed from the visual odometer. The propagation is done using a straightforward Kalman filter modeling the user position as

X_{k+1} = X_k + S_{k+1} Δt sin θ_{k+1} + v₁    (91)
Y_{k+1} = Y_k + S_{k+1} Δt cos θ_{k+1} + v₂

where X and Y are the latitude and longitude transformed into the ENU frame, S is the speed of the user from the visual odometer, θ is the heading obtained using the visual gyroscope, and Δt is the time interval between the current epoch k and the consecutive epoch k+1. v₁, v₂ are the state uncertainty components of the elements X, Y, and the state vector is x_k = [X Y]ᵀ. The Φ, H and Q matrices for the filter are

Φ = H = [ 1  0
          0  1 ]

Q = [ 2²  0
      0   2² ].    (92)

The variances in the measurement covariance matrix R_k = diag(σ²_X, σ²_Y) are based on the performance evaluation of the visual gyroscope and odometer and change during the processing based on the error detection results, i.e. when there is a high probability of an error occurrence, the values are increased and therefore less weight is given to the measurement.

6.2.2 Test in a Shopping Mall Environment

An experiment lasting 420 seconds was carried out in the Iso Omena shopping mall in Espoo, Finland. The challenging environment deteriorated the visual gyroscope's performance, as was already seen in Chapter 4, the mean heading error being 4.4 degrees and the standard deviation 6 degrees. Figure 6.6 shows the challenges set by the environment. The incomplete line geometry due to the wide corridor and the richness of objects decreases the number of straight lines found from the original image (on the left) and shown in the processed image (on the right). The lines found are classified as explained in Chapter 4, and the erroneous vanishing point found is shown
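The propagation of (91) amounts to a two-line dead-reckoning update. A minimal Python sketch follows; the function name is hypothetical.

```python
import math

def propagate(X, Y, speed, heading, dt):
    """Propagate the position with the visual odometer speed and the
    visual gyroscope heading, per Eq. (91); heading in radians,
    measured clockwise from North so that sin maps to East (X)
    and cos to North (Y)."""
    X += speed * dt * math.sin(heading)
    Y += speed * dt * math.cos(heading)
    return X, Y
```

The filter adds to this the measurement weighting of (92), down-weighting epochs flagged by the error detection.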
Fig. 6.6. An image captured by the smartphone (left) and after processing using the visual gyroscope algorithm (right). The lines found from the image are classified and colored, the ones in the direction of propagation, used for the vanishing point calculation, being in blue. The central vanishing point is shown as a red dot.

as a red dot. The figure also shows a challenge for the visual odometer, namely the shoppers in motion. However, the speed provided by the visual odometer did not suffer from the environment, due to the increased number of objects and therefore of matched image points found. The mean error of the visual odometer's speed observations was 0.25 m/s, with a standard deviation of 0.2 m/s. The path propagated by the visual odometer was 179 meters, the total route length being 198 meters, hence yielding an agreement of 90%. Figure 6.7 shows the position solution obtained by propagating the initial position and heading using the visual gyroscope and odometer measurements (green) and the ground truth (red). As these results represent a solution without any absolute positioning method calibrating the position and heading during navigation or aiding the solution in the occurrence of errors, the positioning accuracy is expected to increase substantially when the stand-alone visual gyroscope and odometer measurements are integrated with other radio positioning and sensor measurements.

6.2.3 Test in an Urban Canyon

As pedestrian navigation is needed also in urban environments, the suitability of the developed vision-aiding method was tested in an outdoor environment, namely close to a wall of a tall building, deteriorating and occasionally totally blocking line-of-sight GPS observations. The challenges set by the environment may be seen in Figure 6.8.

Fig. 6.7.
The two-dimensional position solution in the Iso Omena shopping centre, with the visual stand-alone solution (green) and the SPAN reference (red).

On the left of Figure 6.8 is the result of a successful vanishing point observation despite the presence of a dynamic object (a vehicle in this case) and bright lighting washing out edges. On the right, these challenges disturb the vanishing point observation: the dynamic object (a human) introduces a number of non-orthogonal lines surpassing the number of lines going in the direction of propagation, a phenomenon that is also partly due to the bright lighting disturbing the edge detection. The mean error of the heading obtained using the visual gyroscope in the experiment, which lasted 270 seconds and resulted in 224 images, was 3.3 degrees with a standard deviation of 5.1 degrees; the accuracy in this urban canyon therefore falls between the office corridor and the mall environment. The mean error in the user speed provided by the visual odometer was 0.2 m/s with a standard deviation of 0.23 m/s. The route length was 146 meters and the cumulative distance obtained using the visual odometer 131 meters, yielding an agreement of 90%. The navigation solution was again computed as a stand-alone visual system by propagating the initial position and heading using the Kalman filter explained above and the visual gyroscope and odometer measurements.

Fig. 6.8. Challenging environment of an urban canyon with bright light, resulting in successful vanishing point detection when the line geometry is favorable (on the left) or in errors when dynamic objects create disturbances (on the right).

The obtained position solution was compared to the one obtained using a Fastrax IT500, a typical consumer-grade L1 high-sensitivity GPS receiver.
Figure 6.9 shows the position solution obtained using each system, the visual stand-alone system on the left and the IT500 GPS receiver on the right. The different and therefore complementary natures of the two positioning systems are seen in the figure: the visual position solution is immediately on the correct track but drifts slowly during navigation, while the GPS solution takes 13 seconds to converge to the correct position but is accurate thereafter, once the necessary satellite geometry and a sufficient number of good-quality satellite signals are observed, as is the case throughout this experiment conducted in a modest urban canyon. The mean error of the visual stand-alone position solution was 10.3 meters with a standard deviation of 5.8 meters. The corresponding values for the GPS solution were 16.7 and 10.1 meters; Table 6.3 summarizes the statistics. The experiment demonstrates how the two systems complement each other; when they are integrated, the solution is anticipated to improve significantly. This section also showed that the performance of the stand-alone visual system is comparable to that of the other positioning systems. A harsher urban canyon situation will be discussed in Chapter 7, where the concept of vision-aided carrier phase navigation is introduced.

Fig. 6.9. Position solutions obtained in an urban canyon using a visual stand-alone system (green, on left) and an IT500 GPS receiver (green, on right). The ground reference is shown in red.

Table 6.3.
Positioning error statistics for visual stand-alone and GPS position solutions

Statistics          Visual stand-alone system   GPS
min error (m)       0.3                         0.4
max error (m)       22.5                        51.0
mean error (m)      10.3                        16.7
std of error (m)    5.8                         10.1

6.3 Visual Gyroscope Aided IMU Positioning

In the previous sections the experiments using a vision-aided multi-sensor positioning system, calibrating the position solution occasionally using absolute position information obtained from WLAN, and a visual stand-alone navigation system were described. The accuracy of the visual stand-alone system drifts in time due to the various errors discussed previously, and WLAN-based positioning needs an a priori prepared environment; therefore other means for navigation need to be addressed. Self-contained sensors carried by the user and presented in Chapter 2 are desirable for positioning. With a known initial position, the current position may be propagated for a limited time using a triad of gyroscopes and accelerometers. The deficiency of the self-contained sensors is the cumulative measurement errors that affect the accuracy of the attitude obtained from the gyroscopes. Herein a method of updating the navigation filter's attitude using vision-aiding, and thereby providing accurate absolute user position for indoor navigation, is presented. The method uses only equipment carried by the user, thus not requiring any additional infrastructure. The method incorporates the visual gyroscope-induced attitude as updates in a filter integrating also the GPS position and Analog Devices ADIS16488 inertial measurement unit (IMU) data. The ADI IMU is a high-grade MEMS IMU, with a 12°/hr in-run bias stability and 1620°/hr noise level [7], providing measurements at a 200 Hz rate.
The NovAtel SPAN-SE GPS/GLONASS receiver with a Northrop Grumman's tactical-grade LCI (low-coherency interferometry) IMU was used as a reference system and carried in the backpack for both experiments. The visual gyroscope measurements are obtained from images taken with a GoPro camera, discussed in more detail in Chapter 4. The filter used is a tightly coupled 21-state extended Kalman filter (EKF) [11] and will be discussed below. The visual odometer measurements are not integrated in this method and therefore the speed is obtained using the IMU accelerometer alone. The visual and IMU measurements are time synchronized by showing a handheld GPS receiver clock to the camera; the IMU measurements are GPS-time tagged. The navigation filter attitude is updated using the heading, pitch and roll measurements obtained from the visual gyroscope; in the case of the temporal visual attitude updates discussed below, the heading change is used instead of the absolute heading. Only the measurements having an LDOP value below a specified threshold are used for the update. However, some errors arising from environments not suitable for the vanishing point based method are not identified by the error detection algorithm and have to be discarded using a fault detection algorithm. One such situation is when the user is walking along a ramp: the line geometry is good, so this violation of the orthogonality requirement is not perceived by the error detection, and as a result all visual pitch measurements deviate from the real pitch. The fault detection is applied by accepting only the standardized visual measurement values w_i that do not surpass a pre-defined threshold value [62]. The standardized visual measurements are obtained from the Kalman filter's innovations of the heading, pitch and roll values v_i and their corresponding estimated standard deviations σ_v as

w_i = |v_i| / σ_v ,  i = 1 : n.    (93)
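The innovation-based gating of (93) is simple to sketch. The following is a minimal illustration, not the thesis implementation; the innovation values and the threshold of 3 are hypothetical.

```python
import numpy as np

def gate_visual_updates(innovations, sigmas, threshold=3.0):
    """Accept only measurements whose standardized innovation,
    w_i = |v_i| / sigma_v as in (93), stays below the threshold."""
    w = np.abs(np.asarray(innovations, dtype=float)) / np.asarray(sigmas, dtype=float)
    return w < threshold

# Example: the third heading innovation is far larger than its predicted
# standard deviation and is rejected as faulty.
v = [0.5, -1.2, 9.0, 0.3]       # innovations v_i (degrees)
sigma = [1.0, 1.0, 1.0, 1.0]    # estimated standard deviations sigma_v
mask = gate_visual_updates(v, sigma)
print(mask.tolist())  # [True, True, False, True]
```

In the filter, rejected measurements are simply skipped; in some epochs only the heading component fails the test while pitch and roll still pass.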
The innovation in the Kalman filter is defined to be the difference v_k between the measurement z_k and the predicted value of the state x̂_k^−, calculated as v_k = z_k − H_k x̂_k^−. In some cases only the visual heading measurement is found faulty and discarded, while the pitch and roll measurements are used in an update. The relative visual gyroscope-induced heading measurements are transformed into absolute heading information by observing the attitude of the camera with respect to the navigation frame and using this information in propagating the visual gyroscope's heading during navigation. The initial position and orientation of the user are obtained using the GNSS measurements at the start of the experiment. The GNSS receiver observing the position was the NovAtel SPAN-SE GPS/GLONASS receiver used also for the reference, but as the purpose of the experiment was to assess the effect of vision-aiding on gyro errors, GNSS data was only used for three minutes at the start to provide an initial position. As GNSS measurements were not available indoors, the gyroscope measurements are too noisy to measure the change in heading accurately, and the visual gyroscope fails in sharp turns, the heading of the user had to be re-initialized after each sharp turn during navigation in order to maintain a robust user heading continuously. This was done using the building layout of the navigation environment, 18 times during the experiment. All results were obtained in post-processing.

6.3.1 Kalman Filter Used in Visual Gyroscope Aided IMU Positioning

The errors in the gyroscopes cause the attitude measurements to drift, introducing continuously increasing errors in the navigation solution. The errors consist of the gyro bias, scale factor and non-orthogonalities, and the g-dependent error and noise.
The error model, discussed in more detail in [11], is

ω̃_{ib}^{b} = S_g ω_{ib}^{b} + b_g + G f_{ib}^{b} + η_g    (94)

where ω̃_{ib}^{b} is the gyroscope angular velocity measurement, S_g is a matrix including the scale factors and non-orthogonalities, ω_{ib}^{b} is the body (b) turn rate with respect to the inertial (i) frame measured by the gyroscope, b_g are the gyro biases, G is a 3 × 3 matrix of the g-sensitivity coefficients, f_{ib}^{b} is the specific force and η_g is the noise.

The g-dependent bias is due to high accelerations, especially affecting sensors attached to the ankle, where the acceleration may rise to a maximum of 12 g. The g-dependent bias is an error source often neglected, but it is significant especially in pedestrian navigation applications directed to first responders, electronic monitoring and military personnel. The g-dependent bias in the gyroscopes is a result of mass imbalances caused by the manufacturing process and can impact MEMS gyros with an error of 100 degrees/hour/g or more when uncompensated.

The tightly coupled 21-state extended Kalman filter (EKF), developed and implemented for [9], consists of linear perturbations of the position, attitude, velocity, gyro and accelerometer bias, three gyro scale factor coefficients and three g-sensitivity coefficients, and the model is defined as

δṙ^e = δv^e
δv̇^e = N^e δr^e − 2Ω_{ie}^{e} δv^e − F^e ε + R_b^e b_a
ε̇^e = −Ω_{ie}^{e} ε^e + R_b^e ((I − S_g) ω_{ib}^{b} + b_g + G f_{ib}^{b})
ḃ_a = −τ_a^{−1} b_a    (95)
ḃ_g = −τ_g^{−1} b_g
Ṡ_g = 0
Ġ = 0

where r^e, v^e are the position and velocity vectors in the earth-centered earth-fixed (ECEF) frame, ε is the perturbation of the Euler angles relating the body frame to the ECEF frame, and b_a and b_g are the biases of the accelerometer and gyro. The inertia tensor is denoted by N^e, and Ω_{ie}^{e} and F^e are the skew-symmetric forms of the earth rotation vector and the specific force measurement, respectively.
The rotation matrix R_b^e rotates the specific force and angular velocity from the body to the ECEF frame.

6.3.2 Equipment Setup on the Body

The data, used for testing the feasibility of body-mounted vision-aided IMU navigation indoors, was collected through an experiment conducted on the University of Calgary campus, mainly inside buildings. The environment was again very challenging for the visual measurements, consisting of numerous sharp turns and wide regions such as cafeterias and outdoor garden areas. The experiment was conducted during office hours, adding many moving humans into the images. The duration of the experiment was 48 minutes, following a 10-minute walk outdoors that allowed the filter to converge; the route (obtained using the reference system) is shown in Figure 6.10. GNSS data was used only for three minutes at the start of the experiment to provide an initial position.

Fig. 6.10. Route for experiments on the University of Calgary campus.

All equipment was carried in a backpack as shown in Figure 6.11. The figure also shows a close-up of the setup on the backpack, namely the camera attached to the top of the backpack and the IMU on the same plane. The mutual attitude of the IMU and camera was observed in order to be able to integrate the measurements. The results of the vision-aided IMU navigation obtained using the two different update methods mentioned before are as follows.

Absolute Visual Attitude Update (AVUPT)

The absolute heading obtained by propagating the GNSS-initialized heading using measurements from the visual gyroscope was used as absolute updates to the Kalman filter. The video stream obtained from the experiment was sampled into still images at a 10 Hz rate and resulted in 29802 images, of which 16347 were discarded due to large LDOP values.

Fig. 6.11. Equipment attached to a backpack and carried by a user in a body-mounted test (on left) and a close-up of the equipment (on right).

The fault detection within the navigation filter further rejected 11% of the remaining images. Visual pitch and roll updates only, with no heading, were accepted from 38% of the images remaining from the error detection. Therefore 8337 absolute visual heading updates and 14549 absolute visual pitch and roll updates were provided to the navigation filter. Figure 6.12 shows the standard deviations for different integration schemes. When visual updates are used, either absolute or temporal (discussed in the following subsection), the standard deviations of roll, pitch and heading stay close to zero for the entire experiment; when no visual updates are used the standard deviations increase with time. This is mainly due to the decrease of the gyro drift growth when visual updates are used. This phenomenon should be considered with care, as the update of the absolute heading using the building layout during the navigation has a significant effect on the growth of errors when the absolute update method is used. Figure 6.13 shows the effect of the visual updates on the attitude errors.

Fig. 6.12. Standard deviation for different integration schemes, namely no visual updates used (blue), using temporal updates (purple) and absolute updates (green).

The attitude errors are expressed using the root mean square (rms) error, a measure expressing the spread of the values around the average. It is computed by taking the root of the averaged squared residuals as

rms = √( Σ_{i=1}^{N} (ŷ_i − y_i)² / N )    (96)

where ŷ_i is the predicted value of the measurement y_i and N is the total number of measurements. The vision-aiding improves the navigation solution's pitch and roll only slightly, as is seen from the figure and in Table 6.4.
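The rms measure of (96) can be sketched directly; the residual values below are illustrative only.

```python
import numpy as np

def rms_error(predicted, truth):
    """Root mean square error of (96): the root of the averaged squared residuals."""
    residuals = np.asarray(predicted, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2)))

# Three headings, each 3 degrees off the truth, give an rms error of exactly 3.
print(rms_error([13.0, 48.0, 92.0], [10.0, 45.0, 89.0]))  # 3.0
```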
In the experiment, the pitch root mean square error decreases from 1.7 to 1.4 degrees and the roll from 2.0 to 1.4 degrees when the absolute vision-aided attitude updates are used. However, the heading improves significantly, namely by 93%, as the root mean square error decreases from 29.5 to 2.1 degrees when the navigation filter is updated with visual measurements.

Fig. 6.13. Attitude error using different integration schemes, namely no visual updates used (blue), using temporal updates (purple) and absolute updates (green).

Table 6.4. Attitude errors obtained for body-mounted IMU with different integration methods

Attitude errors (rms, degrees)                    Pitch   Roll   Heading
No visual updates                                 1.7     2.0    29.5
Absolute vision-aided attitude updates (AVUPT)    1.4     1.4    2.1
Temporal vision-aided attitude updates (TVUPT)    1.6     1.7    17.6

Temporal Visual Attitude Update (TVUPT)

When no prior information of the environment is available, like the floor plan used in the previous experiment, the temporal attitude (i.e. the change of the attitude over a short interval) of the camera may be used. In the temporal visual attitude update (TVUPT) the Kalman filter integrates the user attitude obtained from the visual gyroscope to estimate the errors in the IMU attitude measurements. The temporal attitude observation may be presented as

[φ β θ]ᵀ_{b_k}^{b_{k−n}} = [φ β θ]ᵀ_{b_k}^{e} − [φ β θ]ᵀ_{b_{k−n}}^{e}    (97)

where φ, β and θ are the pitch, roll and heading of the camera, respectively. As the filter estimates the errors in IMU attitude measurements, which have a 200 Hz rate, using consecutive images having a lower rate, the two consecutive images are time labeled as k and k − n, where n is the number of IMU epochs between two consecutive images; b represents the body frame and e the ECEF frame. The equation may be represented in the filter's (95) rotation matrix form as

R_{b_k}^{e} = R_{b_{k−n}}^{e} R_{b_k}^{b_{k−n}}.
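The rotation-matrix form of the temporal update can be checked numerically. A minimal sketch follows, with illustrative heading values and rotations about the vertical axis only; pitch and roll are handled identically.

```python
import numpy as np

def rot_z(psi):
    """Rotation matrix for a rotation by psi radians about the vertical axis."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Camera attitudes at the two image epochs k-n and k with respect to the e-frame.
R_e_bkn = rot_z(np.radians(20.0))   # R^e_{b_{k-n}}
R_e_bk  = rot_z(np.radians(35.0))   # R^e_{b_k}

# Temporal (relative) attitude between the two images, as the visual
# gyroscope observes it: R^{b_{k-n}}_{b_k} = (R^e_{b_{k-n}})^T R^e_{b_k}
R_rel = R_e_bkn.T @ R_e_bk

# Composition identity used in the filter: R^e_{b_k} = R^e_{b_{k-n}} R^{b_{k-n}}_{b_k}
assert np.allclose(R_e_bk, R_e_bkn @ R_rel)

# The recovered heading change between the images is 15 degrees.
dpsi = np.degrees(np.arctan2(R_rel[1, 0], R_rel[0, 0]))
print(round(dpsi, 6))  # 15.0
```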
The image interval used in the previous visual gyroscope aided IMU positioning experiments was 0.10 s. For the temporal visual attitude updates the interval had to be decreased, because accurate temporal visual attitude updates require as high an image rate as possible. Due to computational limitations the entire data set was not sampled at the chosen 30 Hz rate; instead, two consecutive images were retrieved with a 0.033 s interval and then four subsequent images were discarded. The sampling resulted in 30077 images. When the temporal visual attitude update method was used, the user heading root mean square error decreased from 29.5 degrees to 17.6 degrees, a 40% improvement. Again, the improvement in the pitch and roll accuracy was minor: the pitch root mean square error decreased from 1.7 to 1.6 degrees and the roll from 2.0 to 1.7 degrees, as shown in Table 6.4 and Figure 6.13. The standard deviations of roll, pitch and heading stayed close to zero for the entire experiment when the temporal visual updates were used; when no visual updates are used the standard deviations increase with time, as also shown in Figure 6.12. However, the visual attitude updates show an overly optimistic variance, because the update is temporal rather than absolute, for which the integration algorithm was originally developed; therefore this method provides variances that are not exactly indicative of the estimate.

6.3.3 Equipment Setup on the Foot

When the gyro is located on the ankle of the user, the vertical acceleration can rise up to a maximum of 12 g, causing very large g-dependent errors. The effect of correcting the errors through vision-aiding of the attitude was tested by an experiment using a foot-mounted system. The IMU and camera were attached rigidly to each other and located on the ankle of the user. The setup is shown in Figure 6.14.
Data was collected in an experiment of 43 minutes conducted mainly indoors. Because the purpose of the research was to assess vision-aiding performance on attitude and gyro errors, GNSS data was only used for three periods of two to three minutes during the navigation in low canyons between buildings. A pedestrian navigation solution was obtained by integrating the vision-aided gyroscope attitude measurements, applying zero velocity updates to the inertial navigation filter, and using the occasional absolute heading updates obtained from the building layout. The integration was performed using the Kalman filter described above. Due to the lack of a reference system mounted on the foot (the reference system was carried in the backpack), the attitude errors could not be evaluated but the position errors could. The visual heading, pitch and roll measurements were used as absolute updates to the navigation filter attitude, as explained above in the case of the body-mounted system. The calculation of visual measurements was challenging due to large camera movements at the ankle of the pedestrian. The total number of images acquired during the experiment was 25664. Only 18% received an LDOP value sufficiently good for trusting the visual measurements, due to image blurring introduced by the fast motion of the foot and because the camera was pointing straight down to the floor for a short time during each step cycle period, as shown in Figure 6.15. Again, fault detection was used to remove the noise from the visual measurements. The fault detection within the navigation filter further rejected 65% of the images remaining from the error detection. Visual pitch and roll updates only, with no heading, were accepted from 18% of the remaining images. This resulted in 1326 visual heading updates and 1617 visual pitch and roll updates to the navigation filter.
Table 6.5 and Figure 6.16 show the improvement of the position obtained with the vision-aided foot-mounted navigation system. The periods when GNSS was used are shown in the figure with black squares. Vision-aiding improves the horizontal position significantly in the experiment; the root mean square horizontal position error decreases from 30.9 m to 20.0 m, yielding an improvement of 34%. Vision-aiding has no effect on the vertical position error, which remains at about 68 m.

Fig. 6.14. Equipment setup for the foot, namely the GoPro camera and IMU attached to each other.

Fig. 6.15. Images resulting from one step cycle period. Three of the images are too blurred for visual gyroscope calculations and are therefore not shown; five accurate vanishing points are obtained, because the rest of the images show only the floor plane or are too blurred for accurate line detection.

Fig. 6.16. RMS position errors obtained for foot-mounted IMU with (green) and without vision-aiding (red).

Table 6.5. RMS position error obtained for foot-mounted IMU with and without vision-aiding

RMS Position Errors (m)   Horizontal   Vertical
No vision-aiding          30.9         67.5
With vision-aiding        20.0         67.7

6.4 Performance of Visual Gyroscope Implemented Using Probabilistic Hough Transform

The method presented in Section 4.7 was tested using a subset of the data collected with the body-mounted equipment for testing the absolute visual attitudes explained above. The subset consisted of 80 seconds of data collected indoors, resulting in 800 sampled images. Figure 6.17 shows the result of line detection and vanishing point calculations. As the images were taken using the GoPro camera with a wide-angle lens, discussed in Chapter 4, they are distorted. In order to maintain the real-time processing obtained using the developed line detection, the distortion is not corrected as described; instead, its effect is reduced by discarding the pixels close to the edges of the image. The blue lines are extracted using the Probabilistic Hough Transform. It should be noted that, because the image is not distortion corrected, the lines found do not agree with the lines seen in the figure, but would if corrected. The green point is the vanishing point estimate based on the IMU attitude and the red point is the corrected vanishing point. The figure shows how the vanishing point is found reliably even when the IMU-induced attitude, and therefore the estimated vanishing point, is distorted.

Table 6.6. Ratio of image points used for computing the Probabilistic Hough Transform presented to the image points used by the Standard Hough Transform for the images processed in the experiment

Ratio of image points used compared to all image points (%)
Min   Mean   Max   Std
27    45     67    8

As stated in [73], the processing time of an algorithm is dependent on the computer used and on the algorithm and software implementation. Therefore the effect of the algorithm is shown by comparing the number of image points examined, in other words the iterations of the parameter calculation. The Standard Hough Transform examines all pixels in the input image and afterwards searches for local maxima in the accumulator to find the lines. The algorithm presented uses a fraction of the image points for extracting the lines, namely on average 45% of all image points, and therefore the computation is anticipated to be accelerated in the same proportion relative to the Standard Hough Transform computation time. Table 6.6 gives the test iteration statistics.
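The point-subsampling idea behind the Probabilistic Hough Transform can be illustrated with a minimal line detector. This is a simplified sketch with hypothetical parameters (a 45% sample fraction, a 1-pixel ρ resolution), not the detector developed in Section 4.7, which additionally exploits the IMU-predicted vanishing point.

```python
import numpy as np

def probabilistic_hough(points, sample_frac=0.45, n_theta=180, rho_res=1.0, seed=0):
    """Vote with only a random fraction of the edge points -- the key idea of
    the Probabilistic Hough Transform -- and return the strongest line (rho, theta)."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    n = max(1, int(sample_frac * len(pts)))
    sample = pts[rng.choice(len(pts), size=n, replace=False)]
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_max = float(np.abs(pts).max()) * 2.0 + 1.0
    acc = np.zeros((int(2 * rho_max / rho_res) + 1, n_theta), dtype=int)
    for x, y in sample:                       # each sampled point votes once per theta
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((rhos + rho_max) / rho_res).astype(int)
        acc[idx, np.arange(n_theta)] += 1
    r_i, t_i = np.unravel_index(np.argmax(acc), acc.shape)
    return r_i * rho_res - rho_max, thetas[t_i]

# Synthetic edge points on the vertical line x = 10; the detector recovers
# rho = 10 and theta = 0 from less than half of the points.
pts = [(10.0, float(y)) for y in range(100)]
rho, theta = probabilistic_hough(pts)
print(round(rho, 1), round(np.degrees(theta), 1))  # 10.0 0.0
```

The accumulator here is the same as in the Standard Hough Transform; only the number of voting points changes, which is why the expected speed-up is roughly proportional to the sampling fraction.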
It should also be noted that the method detects the lines already during the point examination, as well as the vanishing point, further reducing the computation time used for obtaining visual gyroscope measurements. Restricted light conditions and lines violating the orthogonality requirement are major threats to the visual gyroscope's accuracy, often resulting in errors. An example of an image suffering from both situations is shown in Figure 6.18. The vanishing point computed by the visual gyroscope discussed in Chapter 4 is shown on the left, while the one using the visual gyroscope with the Probabilistic Hough Transform presented in this section is on the right.

Fig. 6.17. Line detection and vanishing point calculations using the Probabilistic Hough Transform. The estimated vanishing point (green) is corrected (red) using the lines (blue) found with the Probabilistic Hough Transform.

Fig. 6.18. Vanishing point detection in an environment suffering from low lighting and non-orthogonal lines, on the left using the visual gyroscope presented in Chapter 4, and on the right using the visual gyroscope based on the Probabilistic Hough Transform.

The experiment shows increased tolerance in the vanishing point computation, because the computation process is no longer dependent on visual perception only, but receives a priori information of the user attitude from the IMU. Although the vanishing point detection method is dependent on the IMU, the method is tolerant to large errors in the IMU measurements when the parameters of the Probabilistic Hough Transform algorithm are carefully selected. Figure 6.19 shows how an estimated vanishing point (green) resulting from large temporary errors in IMU measurements is corrected (red) through the line detection presented.
The method also gives promising results for turn detection, which has so far been the most significant obstacle preventing the autonomous use of vision-aided inertial sensors for navigation in unknown indoor environments. In turning situations the estimated vanishing point, obtained by propagating the attitude using the method, falls outside the image at the same time as the corrected vanishing point obtained from the Probabilistic Hough Transform detection is found on the other side of the image, as shown in Figure 6.20. This is due to the change of the world frame: the visual gyroscope was initialized at the beginning as having a zero heading, with the camera frame totally aligned with the world frame, and now the world frame's horizontal axes are rotated 90 degrees. When this contradiction is used in integration, at least the existence of a steep turn is observed. Observing the magnitude of the turn is a future research objective.

Fig. 6.19. Detected vanishing point (red) may be used to correct large errors in IMU measurements resulting in an erroneous estimated vanishing point location (green).

Fig. 6.20. Conflict between estimated and detected vanishing point locations indicates the existence of a steep turn.

7. VISION-AIDED CARRIER PHASE NAVIGATION

Navigation in urban areas is challenging for GNSS. Line-of-sight signals are blocked by tall buildings and therefore the requirement for measurements from at least four satellites is not fulfilled, and consequently a position solution is not obtained. Even when a solution is obtained, multipath effects deteriorate the accuracy of the position. In this chapter the visual gyroscope and a version of the visual odometer are used for aiding GNSS measurements in such areas. The visual gyroscope is suitable also for urban environments, which consist of countless straight lines, i.e. edges of buildings and roads.
The limitation of the visual gyroscope is its need for absolute heading information that has to be updated during navigation, due to its inability to monitor the magnitude of sharp turns and due to calculation errors arising from problems in visual perception. However, the calibration needs to be done only occasionally; in between, the correct heading is maintained by propagating the absolute heading using the visual gyroscope's measurements. Although the likelihood of observing the at least four satellites needed for resolving the user position is reduced, it is still occasionally possible even in an urban canyon, and therefore the visual gyroscope and GNSS positioning complement each other in these challenging environments. The translation obtained from the homography constraining consecutive images has an ambiguous scale that was resolved earlier in this thesis using a special configuration of the camera. In this chapter an alternative method is presented: the scale is observed using differenced carrier phase GNSS measurements. As the carrier phase measurements are differenced, the need to resolve the ambiguous integer number of the satellites' carrier phase cycles is avoided. Because only the scale, i.e. the total magnitude of translation between two time epochs, is needed, using two satellites with the proper geometry is enough and therefore the method is feasible also for a dense urban environment. Below, the method is described in detail, then the verification of the method in a suburban environment is discussed and finally an experiment testing the method in a dense urban canyon is presented.

7.1 Ambiguity Resolution Using Differenced GNSS Carrier Phase Measurements

Time-differenced satellite carrier phase measurements provide information about the magnitude of translation of the user between two time epochs, which may be used for resolving the ambiguous scale in the translation obtained from images.
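The geometric identity underlying this time differencing, developed below as equations (99)–(103), can be verified numerically. The sketch uses made-up satellite and receiver coordinates and ignores the receiver clock error and the noise terms.

```python
import numpy as np

# Made-up satellite (T) and receiver (P) positions at epochs t1 and t2.
T1 = np.array([2.0e7, 5.0e6, 1.0e7])      # satellite at t1
T2 = np.array([2.0e7, 5.1e6, 1.0e7])      # satellite at t2 (it has moved)
P1 = np.array([1000.0, 2000.0, 500.0])    # receiver at t1
P2 = P1 + np.array([1.0, -0.5, 0.2])      # receiver at t2 (user translation)

# Line-of-sight unit vectors from the user to the satellite, eq. (101)
u1 = (T1 - P1) / np.linalg.norm(T1 - P1)
u2 = (T2 - P2) / np.linalg.norm(T2 - P2)

dphi = np.linalg.norm(T2 - P2) - np.linalg.norm(T1 - P1)  # time-differenced range, eq. (99)
dopp = T2 @ u2 - T1 @ u1                                  # satellite Doppler term
u_gc = P1 @ (u2 - u1)                                     # geometry change term

# Corrected measurement of eq. (103): it equals minus the projection of
# the user translation onto the line of sight at t2.
dphi_corr = dphi - dopp + u_gc
print(bool(np.isclose(dphi_corr, -(P2 - P1) @ u2)))  # True
```

One satellite constrains only the projection of the translation; with the translation direction supplied by the images, the remaining unknowns are the scale and the receiver clock error, which is why two satellites suffice.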
Because only the magnitude of the translation and the receiver clock error are needed, acquisition of two satellites is enough. The idea has been approached earlier by [99]. The method was developed for robot navigation and visual measurements utilized from three cameras. The pitch and roll of the camera were obtained from an IMU. As the heading was not observed accurately using an IMU, it was included to the algorithm as an unknown and therefore observations from at least three satellites were needed. In the test lasting for 100 seconds centimeter level accuracy was obtained, which decreased into decimeter level when the receiver clock was calibrated and only two satellites acquired. A similar method for vehicular navigation was developed in [23] utilizing again an IMU for attitude, GNSS and vision-aiding. In the experiments at least three satellites were observed and meter level horizontal position accuracy was obtained. The method presented in this thesis is aimed at pedestrian navigation where the possible amount and size of equipment is limited. When the relative heading and translation information obtained from images is initialized with the absolute position and heading information is provided by GNSS, the user position may be propagated and only occasional absolute updates are needed for a functional navigation solution. The carrier phase observation (ϕi ) for the satellite i may be represented using a simplified form as ϕi = ri + cdtrcvr + λN + η i + εiϕ (98) where ri is the true range between the satellite and the receiver, cdtrcvr is the speed of light times the difference between the receiver clock and satellite clock errors with respect to GPS time, λ is the carrier wavelength, N is the integer ambiguity, η i is an error term incorporating ionospheric, tropospheric and satellite orbital errors and the error term εiϕ is the combined effect of multipath and noise. The equation 7.1. 
Ambiguity Resolution Using Differenced GNSS Carrier Phase Measurements 115 assumes that the satellite clock error is already compensated for. The carrier phase measurements obtained at two time epochs (t1 , t2 ) are differentiated and the resulting measurement is ∆ϕi = ϕi (t2 ) − ϕi (t1 ) = ∆ri + c∆dtrcvr + εiϕ . (99) The integer ambiguity term is removed by differencing the carrier phase observations over time and the change in the term encompassing the errors stays below a centimeter / second level and is therefore omitted [99]. The differenced range ∆ri may further be expressed [23] as ∆ri = (Ti (t2 ) − Trcvr (t2 )) · u(t2 ) − (Ti (t1 ) − Trcvr (t1 )) · u(t1 ) = (Ti (t2 ) · u(t2 )) − (Ti (t1 ) · u(t1 )) −(∆Trcvr (t2 )) · u(t2 )) − (Trcvr (t1 )) · ∆u(t2 )) (100) where (·) denotes a vector dot product, Ti is the satellite position vector, Trcvr the receiver position vector and u the unit vector from the user to the satellite and calculated as u= Ti − Trcvr . |Ti − Trcvr | (101) The term (Ti (t2 ) · u(t2 )) − (Ti (t1 ) · u(t1 )) is called the satellite Doppler term (DOP Pi ) and it arises from the motion of the satellite between the two time epochs and may be derived from the satellite observations. The term (Trcvr (t1 )) · ∆u(t2 ) expresses the change in the user unit vector (the line-of-sight unit vector) and is called the geometry change (uGC ). As the geometry and satellite Doppler change terms employ the user position at the second time epoch that is not known yet but will be obtained from the calculations, an estimate of the position is used. The estimate has to be accurate only to within 100 meters [99]. Now the differenced carrier phase measurement (99) may be presented as ∆ϕi = −(∆Trcvr (t2 ) · u(t2 )) + DOP Pi − uGC . (102) By rearranging the terms, an equation for resolving the magnitude of the user translation ∆Trcvr (t2 ) between the time epochs (t1 , t2 ) is obtained from 116 7. 
$$\Delta\phi_{corr}^i = \Delta\phi^i - DOPP^i + u_{GC} = -(\Delta T_{rcvr}(t_2) \cdot u(t_2)) + c\,\Delta dt_{rcvr} \qquad (103)$$

The equation expressing the user translation $\Delta T_{rcvr}(t_2)$ has three unknowns, namely the translation in the x-, y- and z-axis directions. As the receiver clock error also has to be resolved, four satellites would be required to obtain a solution, which is not always feasible in dense urban environments. However, images provide information about the user translation between two time epochs, i.e. the times of capturing two consecutive images. The translation of the user, with an ambiguous scale, may be obtained using the epipolar constraint and the fundamental matrix arising from it, discussed in Chapter 3.

7.1.1 Ambiguous Translation Using the Fundamental Matrix

The fundamental matrix F defined by (31) may be computed, given sufficiently many matching image points $(x', x)$, using a linear algorithm [38], which is the approach used in this thesis; more robust algorithms for the computation are also presented in the reference. Here the epipolar geometry is defined to be affine; an affine geometry differs from a projective geometry in that the camera centres are defined to lie at infinity, so the projection from scene to image is parallel. The affine fundamental matrix is

$$F = \begin{bmatrix} 0 & 0 & a \\ 0 & 0 & b \\ c & d & e \end{bmatrix} \qquad (104)$$

Now each point correspondence $(x', x)$ may be represented as $(x'_i, y'_i, x_i, y_i, 1)\,\mathbf{f} = 0$, i.e.

$$\begin{bmatrix} x'_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_n & y'_n & x_n & y_n & 1 \end{bmatrix} \mathbf{f} = 0 \qquad (105)$$

when $n$ matching image points are found in the two consecutive images and $\mathbf{f} = (a, b, c, d, e)^T$. At least four corresponding points are needed, but when there are more, as is usually the case especially in outdoor environments with favorable lighting conditions, the singular value decomposition is used.
The fundamental matrix F may further be transformed into the essential matrix E, also discussed in Chapter 3, using the camera calibration matrix K (assumed constant between the images) and the camera motion [76] as

$$E = K^T F K = [t]_\times R \qquad (106)$$

The term $[t]_\times$ denotes the skew-symmetric matrix of the translation vector $(t_x, t_y, t_z)^T$ and is

$$[t]_\times = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix} \qquad (107)$$

The singular value decomposition of the essential matrix E gives $E \sim U D V^T$, and the ambiguous translation obtained from U is $t \sim (u_{13}, u_{23}, u_{33})^T$, where $u_{ij}$ denotes the element of the matrix U on the $i$th row and $j$th column.

7.1.2 Navigation Solution Incorporating the Absolute User Translation

The user translation $\Delta T_{rcvr}(t_2)$ is perceived in the navigation frame, but the visual translation is in the camera frame. In order to obtain the ambiguous scale of the visual translation and turn it into a position change of the user, again in the navigation frame, transformations have to be made. The visual gyroscope presented in Chapter 4 provides information about the attitude of the user with respect to the navigation frame. A direction cosine matrix $C_b^n$ transforming the observations from the navigation frame to the camera frame is formed using the heading, pitch and roll measurements obtained from the visual gyroscope, as explained in Chapter 6. For resolving the scale, both the unit line-of-sight vector $u(t_2)$ and the user translation vector $\Delta T_{rcvr}(t_2)$ have to be multiplied by the direction cosine matrix $C_b^n$. The user translation vector $\Delta T_{rcvr}(t_2)$ may be written using the still unknown scalar scale $s$ of the visual translation and the visual translation vector $t$ as $\Delta T_{rcvr}(t_2) = C_b^n s t$. Now, (103) can be re-written as

$$\Delta\phi_{corr}^i = \Delta\phi^i - DOPP^i + u_{GC} = -(C_b^n u^i(t_2) \cdot C_b^n t)\,s + c\,\Delta dt_{rcvr} \qquad (108)$$

and in matrix form
$$y = \begin{bmatrix} \Delta\phi_{corr}^1 \\ \vdots \\ \Delta\phi_{corr}^N \end{bmatrix} = \begin{bmatrix} -(C_b^n u^1(t_2)) \cdot (C_b^n t) & 1 \\ \vdots & \vdots \\ -(C_b^n u^N(t_2)) \cdot (C_b^n t) & 1 \end{bmatrix} \begin{bmatrix} s \\ c\,\Delta dt_{rcvr} \end{bmatrix} = Hx \qquad (109)$$

from which the ambiguous scale of translation $s$ may be obtained using the least-squares equation $x = (H^T H)^{-1} H^T y$. Occasionally, especially when only two satellites are observed, the scale computation fails. Such errors are detected and discarded by monitoring the magnitude of the resulting absolute user speed: the measurement is discarded and the previous one used when the speed exceeds 3 m/s. The translation is then transformed into the navigation frame (ENU) using a Kalman filter propagating the user position $(X, Y)$, heading $\theta$ and speed $S$. The speed is obtained from the translation computed as discussed above, and the heading from the visual gyroscope and occasional absolute heading updates as discussed below. The Kalman filter models the user position as

$$\begin{aligned} X_{k+1} &= X_k + S \sin(\theta_k)\,\Delta t \\ Y_{k+1} &= Y_k + S \cos(\theta_k)\,\Delta t \end{aligned} \qquad (110)$$

discussed in more detail in [66]. The state vector $x_k$ and measurement vector $z_k$ for the model are

$$x_k = \begin{bmatrix} X \\ Y \\ \theta \\ S \end{bmatrix} \qquad z_k = \begin{bmatrix} X \\ Y \\ \theta \\ S \end{bmatrix} \qquad (111)$$

and the state transition matrix $\Phi$ and process noise matrix $Q$ are

$$\Phi_k = \begin{bmatrix} 1 & 0 & 0 & \sin(\theta_k)\Delta t \\ 0 & 1 & 0 & \cos(\theta_k)\Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (112)$$

$$Q_k = \begin{bmatrix} q_1\Delta t + \frac{(a^2 q_3 + b^2 q_4)\Delta t^3}{3} & \frac{(ac\,q_3 + bd\,q_4)\Delta t^3}{3} & \frac{a q_3 \Delta t^2}{2} & \frac{b q_4 \Delta t^2}{2} \\ \frac{(ac\,q_3 + bd\,q_4)\Delta t^3}{3} & q_2\Delta t + \frac{(c^2 q_3 + d^2 q_4)\Delta t^3}{3} & \frac{c q_3 \Delta t^2}{2} & \frac{d q_4 \Delta t^2}{2} \\ \frac{a q_3 \Delta t^2}{2} & \frac{c q_3 \Delta t^2}{2} & q_3\Delta t & 0 \\ \frac{b q_4 \Delta t^2}{2} & \frac{d q_4 \Delta t^2}{2} & 0 & q_4\Delta t \end{bmatrix} \qquad (113)$$

where $q_1$ is the spectral density for the north position component $X$, $q_2$ for the east component $Y$, $q_3$ the spectral density for the heading and $q_4$ for the speed. In the following sections, two experiments using the method discussed above are described and assessed.
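The least-squares scale resolution of (109) can be sketched as follows; this is a minimal illustration with synthetic line-of-sight vectors, direction cosine matrix and visual translation (assumed values, not data from the experiments), showing how the scale $s$ and the clock term fall out of the solve.

```python
import numpy as np

def resolve_scale(dphi_corr, u_los, C_bn, t_vis):
    """Solve (109) by least squares: each satellite contributes one row
    [-(C u_i) . (C t), 1]; the unknowns are x = [s, c*d(dt_rcvr)]."""
    Ct = C_bn @ t_vis
    col1 = np.array([-(C_bn @ u) @ Ct for u in u_los])
    H = np.column_stack([col1, np.ones(len(u_los))])
    x, *_ = np.linalg.lstsq(H, dphi_corr, rcond=None)
    return x

# Synthetic example: known scale and clock term, three satellites.
theta = np.radians(30.0)
C_bn = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0, 0.0, 1.0]])                 # assumed DCM
t_vis = np.array([0.6, 0.8, 0.0])                  # ambiguous visual translation
u_los = [np.array([0.0, 0.8, 0.6]),
         np.array([0.6, 0.0, 0.8]),
         np.array([-0.6, 0.0, 0.8])]               # unit line-of-sight vectors
s_true, clk_true = 2.5, 0.7
dphi = np.array([-(C_bn @ u) @ (C_bn @ t_vis) * s_true + clk_true
                 for u in u_los])
s_est, clk_est = resolve_scale(dphi, u_los, C_bn, t_vis)
```

Since the direction cosine matrix is orthonormal, the dot product is frame-invariant; the rotation is kept here only to mirror the structure of (109).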
7.2 Method Verification in a Sub-Urban Environment

Although the vision-aided carrier phase navigation method is designed for urban positioning, performance verification was first carried out in an easier signal environment, namely one with lower buildings blocking only satellites with an elevation of less than 30 degrees. The test setup consisted of a NovAtel SPAN-SE GPS/GLONASS receiver providing carrier phase measurements and a GoPro camera for visual measurements. A Northrop Grumman tactical-grade LCI IMU and the SPAN were used for acquiring the reference solution as well as for initializing the user position and heading at the beginning of the experiment and after every three minutes of navigation. The system was carried in a backpack as shown in Figure 7.1; the camera and GNSS receiver were attached to the top of the system and are indicated with a red circle in the figure. After initialization the user position was estimated by propagating the heading and speed measurements using the Kalman filter explained above. Data was collected for 15 minutes and post-processed using Matlab. As the purpose of the verification experiment was to test the feasibility of the system in extreme signal conditions when only two satellites are used, the vision-aided carrier phase navigation solution was computed using only two satellites available for the full experiment. Because the heading obtained from the visual gyroscope suffers from occasional errors in the vanishing point calculation, and sharp turns cannot be observed, the position and heading were re-initialized every three minutes using the reference system. This is justified by the fact that even in a dense urban area the requirement of four satellites is fulfilled once in a while, as is also seen in the real urban experiment below.

Fig. 7.1. Setup for data collection used in verification of vision-aided carrier phase navigation.

Table 7.1.
Positioning verification error statistics using vision-aided carrier phase (VA)

Statistics        VA
min error (m)     0
max error (m)     76
mean error (m)    24
std of error (m)  18

Figure 7.2 shows the path obtained using vision-aided carrier phase navigation (red) compared to the ground truth (blue). The light red lines show the effect of the position correction applied every three minutes. Table 7.1 shows the horizontal position error statistics. The fairly large mean error in position, namely 24 meters, resulted from the difficulty of obtaining an accurate heading solution with the visual gyroscope in this fairly open sub-urban environment, which lacks the straight lines offered by the high-rise buildings of urban canyons. This was anticipated to improve when the method was tested in an urban canyon. The length of the obtained path agreed with the ground truth, and the visual odometer utilizing the carrier phase measurements was therefore shown to provide promising results.

Fig. 7.2. Position solution verification using vision-aided carrier phase navigation (position red dot, path red line) compared to ground truth (blue) in a sub-urban environment in Calgary, shown in Google Earth.

7.3 Vision-Aided GNSS Navigation in an Urban Environment

In order to show the advantages and limitations of the method in severe urban canyons using an unaided standard receiver, a test was carried out in downtown Calgary, as shown in Figure 7.3. This canyon encompasses tall, reflective buildings that heavily block the satellite signals and/or cause multipath. The test duration was 25 minutes. The user position and heading were initialized using the reference system described in the previous section. Then the user position was propagated using the Kalman filter described above and the following procedure.
The NovAtel SPAN-SE GPS/GLONASS receiver was used to acquire the pseudorange (L1), carrier phase (L1) and Doppler measurements as well as the GNSS navigation message. Only GPS measurements were used in the processing. The GoPro camera was used for capturing a video stream that was sampled at 10 Hz. All equipment was attached to a backpack carried by the user, as in the experiment above. When four or more satellites were acquired, the user position was computed using the pseudorange measurements and the least-squares method described in Chapter 2. As may be seen below, the GPS measurements are heavily degraded in such challenging environments, and therefore the quality of the position solution was monitored by examining the least-squares residuals. The residual r is computed after the final least-squares solution is obtained and expresses the difference between the anticipated and obtained measurements. The residual vector is calculated using the pseudorange measurement vector $z$, the geometry matrix $H$ and the estimated user position vector $\hat{x}$ as

$$r = z - H\hat{x} \qquad (114)$$

Fig. 7.3. Calgary downtown environment used for vision-aided GNSS navigation.

When the residual of any satellite $i$ exceeded a threshold (herein 20 m, selected by experimentation), the GPS solution was discarded and a vision-aided carrier phase solution was computed instead, using the position and heading from the previous epoch in the state vector for initialization. When consecutive epochs provided successful GPS position solutions, the heading between the epochs was computed. As the error in a GPS-derived position using pseudoranges is commonly a few meters even in a favorable environment, a heading computed from two consecutive epochs separated by only one second would be erroneous in most cases. Therefore, the heading was computed using the longest interval of successful positions, though not exceeding 10 epochs.
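The residual screening around (114) can be sketched as below; the satellite geometry (azimuths and elevations) is a synthetic assumption standing in for the real measurements. A consistent measurement set leaves near-zero residuals, while a single large multipath-like blunder pushes a residual past the 20 m threshold quoted above.

```python
import numpy as np

def position_residuals(H, z):
    """Least-squares solution of the linearized pseudorange system and
    the residual vector r = z - H x_hat of (114)."""
    x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return x_hat, z - H @ x_hat

def solution_ok(r, threshold_m=20.0):
    # Discard the GPS solution when any residual exceeds the threshold.
    return float(np.max(np.abs(r))) <= threshold_m

# Assumed geometry: 8 line-of-sight unit vectors (ENU) plus clock column.
az = np.radians([0, 90, 180, 270, 45, 135, 225, 315])
el = np.radians([30, 30, 30, 30, 50, 50, 50, 50])
u = np.column_stack([np.cos(el) * np.sin(az),
                     np.cos(el) * np.cos(az),
                     np.sin(el)])
H = np.column_stack([-u, np.ones(8)])       # unknowns: [dx, dy, dz, c*dt]
z = H @ np.array([10.0, -5.0, 3.0, 100.0])  # consistent measurements

z_bad = z.copy()
z_bad[0] += 200.0                           # one multipath-like blunder
```

With low redundancy the blunder is partially absorbed into the estimate, which is exactly the difficulty with residual-only screening noted later in this section.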
The heading was computed using the latitude $\phi$ and longitude $\lambda$ of the position at the first time epoch (1) and the last epoch used ($n$) as [110]

$$\theta = \operatorname{mod}\big(\arctan 2\big(\sin(\lambda_n - \lambda_1)\cos(\phi_n),\; \cos(\phi_1)\sin(\phi_n) - \sin(\phi_1)\cos(\phi_n)\cos(\lambda_n - \lambda_1)\big),\; 2\pi\big) \qquad (115)$$

When fewer than four satellites were found, the translation and heading of the user were computed using the visual gyroscope and the visual odometer presented in this chapter, and the position was propagated using the Kalman filter. Figure 7.4 shows the number of satellites obtained for each time epoch of the experiment (blue). As the experiment was started in an open area, up to 9 GPS satellites were occasionally used. As the path proceeded into the urban canyon the number of satellites used decreased, dropping below four towards the end of the data set. The number of observed satellites does not, however, directly reflect how many satellites were available for position computation. The figure also shows the epochs when the obtained position accuracy was deemed unreliable based on the residuals and a vision-aided carrier phase solution was used instead (red above the blue mark).

Fig. 7.4. Number of satellites acquired in an urban canyon for each time epoch of the experiment (blue) and the epochs when vision-aided carrier phase navigation was used due to too few satellites or large residuals in the GNSS pseudorange position estimates (red above the blue mark).

Figure 7.5 shows the path obtained using vision-aided carrier phase navigation (blue), the GPS-only solution (green) and the ground reference (red). As the GPS-only positions deviate strongly from the reference at the end of the data set, the figure has been zoomed to better show the vision-aided solution results.
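Equation (115) is the standard initial great-circle bearing between two latitude/longitude pairs; a direct transcription (with inputs assumed to be in radians) looks like:

```python
import math

def initial_bearing(phi1, lam1, phin, lamn):
    """Heading (115) from (phi1, lam1) to (phin, lamn); latitudes phi
    and longitudes lam in radians, result in [0, 2*pi)."""
    dlam = lamn - lam1
    theta = math.atan2(
        math.sin(dlam) * math.cos(phin),
        math.cos(phi1) * math.sin(phin)
        - math.sin(phi1) * math.cos(phin) * math.cos(dlam),
    )
    return theta % (2.0 * math.pi)
```

A short step due east along the equator gives a bearing of pi/2 (90 degrees), and a step due north gives 0, matching the usual clockwise-from-north convention.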
The vision-aided navigation solution provided fairly accurate results at the beginning of the experiment but deteriorated as the user entered the urban canyon and obtained poor GPS measurements, resulting in 200 meters of error in the worst case. Figure 7.6 shows the horizontal errors of the vision-aided carrier phase navigation and GPS-only solutions. Again, at the beginning of the experiment the errors remained low, but deeper inside the urban canyon they grew. The figure also shows the main reason for the error growth: the GPS position computation occasionally produced erroneous positions with low range residual values, which were therefore accepted and significantly deteriorated the solution. The figure is zoomed to exclude the largest GPS-only errors, enabling closer examination of the vision-aided solution errors.

Fig. 7.5. Position solution in an urban canyon using the ground truth (red), GPS-only (green) and vision-aided carrier phase navigation (blue).

Heavily multipath-affected observations are difficult to discard by assessing range residuals only, especially when the measurement redundancy is low. Better filtering and observation selection for the obtained GPS position should be developed for a more accurate vision-aided navigation solution. Table 7.2 shows the statistics for the horizontal position errors, namely a mean error of 25 meters with a standard deviation of 48 meters. However, even in this simple implementation, the vision-aided carrier phase method improves the navigation solution significantly, as may be seen from the table, which also shows the corresponding horizontal position error statistics for a GPS-only solution based on the pseudoranges (mean error 73 meters, standard deviation 1241 meters).
This positive effect of vision-aiding may also be seen by comparing the navigation path obtained using vision-aided carrier phase navigation, shown in Figure 7.7, with the path obtained using the GPS pseudorange measurements only, propagated using a simple Kalman filter with no error detection for the obtained position, shown in Figure 7.8. Both figures show the obtained position solution (green dots), its path (red line) and the ground truth (blue).

Fig. 7.6. Horizontal position error in an urban canyon obtained using vision-aided carrier phase navigation and GPS-only.

Table 7.2. Positioning error statistics using vision-aided carrier phase (VA) and GPS only (GPS)

Statistics        VA    GPS
min error (m)     0.4   0.1
max error (m)     200   4015
mean error (m)    25    73
std of error (m)  48    1241

As anticipated, performance, especially when using GPS alone, was significantly degraded in the urban canyon. Due to the frequent unavailability and large errors of the GPS positions, the vision-aided solution also suffered significantly. As the position and heading from GPS alone were already erroneous when the navigation solution computation switched to the vision-aided carrier phase method, the user position remained poor despite the addition of accurate visual measurements. If other sensors were added to aid GPS, measurement error detection for the GPS measurements would improve, in which case vision-aided carrier phase navigation would provide significantly better results. The absolute heading could be obtained using a 3D magnetometer. In urban canyons, magnetometer heading measurements deteriorate due to magnetic disturbances arising from nearby ferrous materials.

Fig. 7.7. Position solution using vision-aided carrier phase navigation (position green dot, path red line) compared to ground truth (blue) in an urban canyon in downtown Calgary, shown in Google Earth.
However, techniques to mitigate these errors, especially when magnetometer measurements are combined with 3D accelerometers and rate gyros, are emerging [4]. Inertial sensors and barometers would also aid the error detection process, thereby helping to eliminate large GPS measurement errors.

Fig. 7.8. Position solution using GPS measurements only (position green dot, path red line) compared to ground truth (blue) in an urban canyon in downtown Calgary, shown in Google Earth.

8. CONCLUSIONS

There is a strong need for enhanced pedestrian navigation systems for improved safety and to enhance everyday life. First responders, people under electronic monitoring (e.g. dangerous offenders) and military personnel operate in challenging situations and need a system that is available in all environments. Moreover, general users need precise indoor navigation to locate specific rooms in buildings and to use location-based applications. Pedestrian navigation is mainly needed indoors and in urban environments. Although indoor and urban navigation has been an active research subject for years, no single navigation system addressing all needs has yet been developed with a level of performance similar to that of GNSS outdoors. A pedestrian navigation system has to be light and small, with low power consumption and cost, in addition to performing well in all environments. It should therefore preferably be independent of specific indoor infrastructures such as RF access points. So far the most promising approach for pedestrian navigation is the fusion of many different sensors and positioning systems, the most widely used being self-contained sensors, GNSS and WLAN. The performance of GNSS is degraded indoors and in urban canyons, WLAN needs an infrastructure prepared a priori, and the errors in self-contained sensors result in position solutions that drift over time and become distorted.
Hence, other means of augmenting or replacing some of these methods have to be used. Vision-aiding is a feasible method in many environments because it is affected by error sources different from those of other navigation technologies. Consecutive images provide relative information about the attitude and translation of the camera, which can be further transformed into user heading and position information. In favorable environments and circumstances, this information obtained from the images results in a much more accurate and available navigation solution when integrated with other measurements; it can also be used for stand-alone navigation for short periods of time if the absolute position and heading are known at the beginning of these periods. However, the vision-aiding solution suffers from errors due to low lighting or scenes unsuitable for visual perception, especially those including moving objects (e.g. humans, vehicles). Therefore vision-aided systems need occasional calibration from an absolute navigation system. In this thesis, new tools for vision-aiding navigation solutions were developed and tested, namely a concept called the visual gyroscope for observing the user heading and a visual odometer for the translation. Both methods provide user displacement information by monitoring the motion of features in consecutive images captured by a camera carried by the user. Different methods were investigated to combine the above measurements using Kalman filtering, and vision-aided navigation systems were obtained. Variations of the Kalman filter were selected because they are the prevailing means of observation integration and estimation in the navigation field. However, better performance might be obtained using more sophisticated algorithms, e.g. particle filters. The next section discusses the main results.
Despite the challenges in vision-aiding arising from the occasional unsuitability of the urban navigation environment for visual processing, as well as difficulties related to computer vision algorithms, vision-aiding improves navigation accuracy, availability, reliability and continuity.

8.1 Main Results

The visual gyroscope tracks the motion of virtual points, called vanishing points, arising from the projective geometry mapping the parallel lines of a three-dimensional scene into a two-dimensional image. The change in the location of the vanishing points can further be transformed into the attitude of the user, and therefore the method is called the visual gyroscope. In an ideal situation, where there is sufficient and constant light and the environment has a favorable structure and contains no dynamic objects, a visual gyroscope processing images from a static camera produces almost no error in the user attitude compared to a reference system. In a situation where the lighting is not the best possible and dynamic objects disturb the vanishing point perception, a static visual gyroscope still performs better than a common MEMS gyroscope in a performance test. In a real navigation situation the conditions vary considerably and deteriorate the visual perception, and therefore careful error detection is needed. An error detection algorithm suitable for pedestrian navigation was developed, namely a method computing a quantity called LDOP that monitors the reliability of the visual gyroscope's attitude measurements based on the geometry of the lines used. When the visual gyroscope's measurements passing the error detection were used as updates for a Kalman filter propagating the angular velocity and acceleration provided by an IMU, the obtained user attitude improved significantly.
When the user heading was occasionally calibrated using the building layout, the attitude improved by 93%, and when the visual gyroscope's heading change measurements were used with no calibration, a 40% improvement in the attitude was obtained in the experiment. The visual gyroscope is incapable of detecting the magnitude of sharp turns, namely when the turn is close to 90 degrees or more. This was addressed by developing a method using the IMU attitude measurements to aid the vanishing point detection, which also makes the procedure more accurate. So far only the detection of sharp turns has been implemented; whether it is possible to also observe the magnitude of the turns is a task for future research. At the same time, the algorithm for IMU-aided vanishing point detection was found to decrease the computational burden of the line detection needed in the visual gyroscope method, thereby making real-time performance of the navigation solution possible. The visual odometer identifies matching feature points in consecutive images and uses the homography relation to observe the translation of the user, i.e. the distance travelled. As the translation obtained from the images has an ambiguous scale, two different approaches to resolve it were developed. A visual odometer feasible for indoor navigation resolved the scale by using the known, special configuration of the camera. As the height of the camera was known a priori and kept sufficiently static (vertical motion of ±10 cm was still found to provide accurate results), and the pitch of the camera was obtained using the visual gyroscope, basic geometry could be used to compute the depth of an object found in the image, and therefore the scale too. As the method observed the attitude of the camera using the visual gyroscope and the camera was kept static, only the horizontal translation was left to be resolved, reducing the number of matching image points needed, a profitable feature especially for indoor navigation.
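The known-height scale resolution described above can be illustrated with a minimal floor-plane geometry sketch; the symbols here (camera height h, total depression angle of the pixel ray below the horizontal) are illustrative assumptions, not the thesis's exact formulation.

```python
import math

def floor_point_distance(h, depression):
    """Horizontal distance to a floor point seen along a ray that leaves
    the camera 'depression' radians below the horizontal (camera pitch
    plus the pixel's vertical viewing angle): right-triangle geometry,
    d = h / tan(depression)."""
    return h / math.tan(depression)
```

For example, with the camera 1.5 m above the floor and a ray 45 degrees below the horizontal, the floor point lies 1.5 m ahead; knowing such a metric depth is what fixes the scale of the otherwise ambiguous image-derived translation.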
For the special configuration to be useful, the image points had to lie on the floor. However, again because the attitude of the camera was obtained using the visual gyroscope, the degeneracy problem arising from image points that all lie on a plane was avoided. The performance of the visual odometer was evaluated by examining the agreement between the translation obtained using the method and the ground truth, and it was found to be over 90% in all experiments. The visual gyroscope and odometer were integrated with a multi-sensor multi-network system and tested in an office corridor in a configuration having a workable WLAN positioning solution and using an outdated WLAN radio map. The vision-aided fused solution was found to improve the mean error of the user position from 1.5 to 2 meters. The visual gyroscope and odometer were also tested as a stand-alone system in more challenging environments, namely in a shopping mall and an urban canyon, also resulting in improved position accuracy. In urban canyons the GNSS measurements are typically available, although deteriorated. A method utilizing GNSS carrier phase observations for resolving the scale problem in the translation was developed. After observing the ambiguous translation from the consecutive images, the scale was obtained from the differenced carrier phase measurements of the satellites, observed at the time epochs when the images were captured. As the carrier phase measurements were differenced over time, the integer ambiguity in the carrier phase measurements was cancelled. Because only the magnitude of the translation and the receiver clock error needed to be resolved, tracking two satellites was enough. By integrating the visual gyroscope's and odometer's measurements using a Kalman filter, an initial position could be propagated and a navigation solution obtained.
The method was first tested in a suburban environment, propagating only the visual gyroscope's orientation and the odometer's translation and solving the scale of the translation from differenced GNSS carrier phase measurements of two satellites. Since errors in the visual perception deteriorate the position, the solution was calibrated using the reference system every three minutes, and a mean error of 24 meters was obtained in the 15-minute experiment. In a real urban canyon, GPS was vision-aided by propagating the user heading and position using the visual gyroscope's and odometer's measurements whenever fewer than four satellites were observed or the GPS-derived position was found distorted based on the solution's residual values. Again, as the visual gyroscope and odometer are dependent on an absolute initialization, and they were re-initialized whenever four or more acceptable satellite observations were available to provide a GPS solution, errors in the GPS positions used to calibrate the visual measurements occasionally deteriorated the vision-aided navigation result as well. However, the overall position accuracy improved significantly compared to the one obtained using GPS positioning only: the mean error decreased from 73 meters to 25 meters when the vision-aided GPS carrier-phase based processing was used. Despite all the challenges of urban and indoor environments, visual aiding improved the navigation solution in all the experiments discussed in the thesis. Already in their present form, when integrated with e.g. an IMU, the visual methods presented in the thesis would provide a solution with sufficient accuracy for short-term pedestrian navigation. Continuous navigation time could be further increased significantly by using e.g. a floor plan or other absolute positioning means.
However, it should be noted that the tests were done using a limited amount of data, and future research should therefore assess the performance of the developed methods with data collection of extended duration. Due to its distinctive characteristics, visual processing complements other positioning technologies in providing better pedestrian navigation accuracy and reliability.

8.2 Future Development

The largest challenge in using the visual gyroscope's measurements for vision-aiding the user attitude is its incapability to observe the magnitude of turns, and therefore the need for occasional absolute calibration. This problem was preliminarily addressed by developing a visual gyroscope utilizing the attitude information obtained from an IMU to aid the vanishing point location computation. From the disagreement between the vanishing point locations provided by the IMU and the visual gyroscope, the occurrence of a sharp turn was detected. Future work should address the possibility of observing the magnitude of such a turn and thereby eliminating, or at least decreasing, the need for calibration during navigation. The utilization of more advanced computer vision methods for vanishing point calculation and their impact on the visual gyroscope's accuracy should also be assessed. As the integration of different systems is beneficial, or even mandatory, for indoor and urban area navigation, error detection for the visual measurements as well as for the observations from the other systems is crucial and should be a topic of further research. Means of emphasizing the strengths of all the systems involved, as well as compensating for their deficiencies, are a subject for the development of even more functional and seamless integration. Though this thesis addressed various challenges in indoor and urban navigation and proposed various visual processing methods for
positioning, the matter of ubiquitous pedestrian navigation is by no means yet solved by vision-aiding, but it is definitely improved.

BIBLIOGRAPHY

[1] GoPro's web pages. Last accessed June 15, 2012. [Online]. Available: http://gopro.com/cameras/hd-helmet-hero-camera/
[2] National Geophysical Data Center (NGDC). Last accessed April, 2013. [Online]. Available: http://www.ngdc.noaa.gov/geomag-web/#igrfwmm
[3] OpenCV open source computer vision library. Last accessed March 2, 2013. [Online]. Available: http://opencv.org/
[4] M. Afzal, "Use of Earth's magnetic field for pedestrian navigation," Ph.D. dissertation, University of Calgary, Canada, 2011.
[5] F. Alizadeh-Shabdiz and M. Heidari, "Method of determining position in a hybrid positioning system using a dilution of precision metric," U.S. Patent 2011/0080317, 2009, 19 pages.
[6] D. Allan, "Statistics of atomic frequency standards," Proc. of the IEEE, vol. 54, no. 2, pp. 221–230, 1966.
[7] ADIS16375 Data Sheet (Rev. A), Analog Devices, 2010, 28 pages.
[8] H. Aoki, B. Schiele, and A. Pentland, "Realtime personal positioning system for wearable computers," in Proc. of the 3rd IEEE ISWC, San Francisco, CA, USA, Oct. 18–19, 1999, p. 37.
[9] J. Bancroft, "Multiple inertial measurement unit fusion for pedestrian navigation," Ph.D. dissertation, University of Calgary, Canada, 2010.
[10] J. Bancroft and G. Lachapelle, "Data fusion algorithms for multiple inertial measurement units," Sensors, vol. 11, pp. 6771–6798, 2011.
[11] J. Bancroft and G. Lachapelle, "Estimating MEMS gyroscope g-sensitivity errors in foot mounted navigation," in Proc. UPINLBS, Helsinki, Finland, Oct. 3–4, 2012, p. 6.
[12] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, pp. 346–359, 2008.
[13] J.-C. Bazin and M. Pollefeys, "3-line RANSAC for orthogonal vanishing point detection," in Proc. IROS, Algarve, Portugal, Oct. 7–12, 2012, pp.
JORMA JOKELA AND PASI HÄKLI: Interference measurements of the Nummela Standard Baseline in 2005 and 2007. Kirkkonummi 2010. 85 pages. 145. EETU PUTTONEN: Tree Species Classification with Multiple Source Remote Sensing Data. Kirkkonummi 2012. 162 pages. 146. JUHA SUOMALAINEN: Empirical Studies on Multiangular, Hyperspectral, and Polarimetric Reflectance of Natural Surfaces. Kirkkonummi 2012. 144 pages. 147. LEENA MATIKAINEN: Object-based interpretation methods for mapping built-up areas. Kirkkonummi 2012. 210 pages. 148. LAURI MARKELIN: Radiometric calibration, validation and correction of multispectral photogrammetric imagery. Kirkkonummi 2013. 160 pages. 149. XINLIAN LIANG: Feasibility of Terrestrial Laser Scanning for Plotwise Forest Inventories. Kirkkonummi 2013. 150 pages. 150. EERO AHOKAS: Aspects of accuracy, scanning angle optimization, and intensity calibration related to nationwide laser scanning. Kirkkonummi 2013. 124 pages. 151. LAURA RUOTSALAINEN: Vision-aided Pedestrian Navigation for Challenging GNSS Environments. Kirkkonummi 2013. 180 pages.
