Use R! Series Editors: Robert Gentleman Kurt Hornik Giovanni G. Parmigiani For further volumes: http://www.springer.com/series/6991 Dirk Eddelbuettel Seamless R and C++ Integration with Rcpp 123 Dirk Eddelbuettel River Forest Illinois, USA ISBN 978-1-4614-6867-7 ISBN 978-1-4614-6868-4 (eBook) DOI 10.1007/978-1-4614-6868-4 Springer New York Heidelberg Dordrecht London Library of Congress Control Number: 2013933242 © The Author 2013 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com) To Lisa, Anna and Julia Preface Rcpp is an R add-on package which facilitates extending R with C++ functions. It is being used for anything from small and quickly constructed add-on functions written either to fluidly experiment with something new or to accelerate computing by replacing an R function with its C++ equivalent to large-scale bindings for existing libraries, or as a building block in entirely new research computing environments. While still relatively new as a project, Rcpp has already become widely deployed among users and developers in the R community. Rcpp is now the most popular language extension for the R system and used by over 100 CRAN packages as well as ten BioConductor packages. This books aims to provide a solid introduction to Rcpp. Target Audience This book is for R users who would like to extend R with C++ code. Some familiarity with R is certainly helpful; a number of other books can provide refreshers or specific introductions. C++ knowledge is also helpful, though not strictly required. An appendix provides a very brief introduction for C++ to those familiar only with the R language. The book should also be helpful to those coming to R with more of a C++ programming background. However, additional background reading may be required to obtain a firmer grounding in R itself. Chambers (2008) is a good introduction to the philosophy behind the R system and a helpful source in order to acquire a deeper understanding. There may also be some readers who would like to see how Rcpp works internally. Covering that aspect, however, requires a fairly substantial C++ content and is not what this book is trying to provide. The focus of this book is clearly on how to use Rcpp. vii viii Preface Historical Context Rcpp first appeared in 2005 as a (fairly small when compared to its current size) contribution by Dominick Samperi to the RQuantLib package started by Eddelbuettel in 2002 (Eddelbuettel and Nguyen 2012). Rcpp became a CRAN package in its own name in early 2006. Several releases (all provided by Samperi) followed in quick succession under the name Rcpp. The package was then renamed to RcppTemplate; several more releases followed during 2006 under the new name. However, no new releases were made during 2007, 2008, or most of 2009. Following a few updates in late 2009, the RcppTemplate package has since been archived on CRAN for lack of active maintenance. Given the continued use of the package, Eddelbuettel decided to revitalize it. New releases, using the original name Rcpp, started in November 2008. These included an improved build and distribution process, additional documentation, and new functionality—while retaining the existing “classic Rcpp” interface. While not described here, this API will continue to be provided and supported via the RcppClassic package (Eddelbuettel and François 2012c). Reflecting evolving C++ coding standards (see Meyers 2005), Eddelbuettel and François started a significant redesign of the code base in 2009. This added numerous new features, many of which are described in the package via different vignettes. This redesigned version of Rcpp (Eddelbuettel and François 2012a) has become widely used with over ninety CRAN packages depending on it as of November 2012. It is also the version described in this book. Rcpp continues to be under active development, and extensions are being added. The content described here shall remain valid and supported. Related Work Integration of C++ and R has been addressed by several authors; the earliest published reference is probably Bates and DebRoy (2001). The “Writing R Extensions” manual (R Development Core Team 2012d) has also been mentioning C++ and R integration since around that time. An unpublished paper by Java et al. (2007) expresses several ideas that are close to some of our approaches, though not yet fully fleshed out. The Rserve package (Urbanek 2003, 2012) acts as a socket server for R. On the server side, Rserve translates R data structures into a binary serialization format and uses TCP/IP for transfer. On the client side, objects are reconstructed as instances of Java or C++ classes that emulate the structure of R objects. The packages rcppbind (Liang 2008), RAbstraction (Armstrong 2009a), and RObjects (Armstrong 2009b) are all implemented using C++ templates. None of them have matured to the point of a CRAN release. CXXR (Runnalls 2009) approaches this topic from the other direction: its aim is to completely refactor R on a stronger C++ foundation. CXXR is therefore concerned with all aspects of the R interpreter, read-eval-print loop (REPL), and threading; object interchange be- Preface ix tween R and C++ is but one part. A similar approach is discussed by Temple Lang (2009a) who suggests making low-level internals extensible by package developers in order to facilitate extending R. Temple Lang (2009b), using compiler output for references on the code in order to add bindings and wrappers, offers a slightly different angle. Lastly, the rdyncall package (Adler 2012) provides a direct interface from R into C language APIs. This can be of interest if R programmers want to access lower-level programming interfaces directly. However, it does not aim for the same object-level interchange that is possible via C++ interfaces, and which we focus on with Rcpp. Typographic Convention The typesetting follows the usage exemplified both by the publisher, and by the Journal of Statistical Software. We use • Sans-serif for programming language such as R or C++ • Boldface for (CRAN or other) software packages such as Rcpp or inline • Courier for short segments of code or variables such as x <- y + z We make use of a specific environment for the short pieces of source code interwoven with the main text. River Forest, IL, USA Dirk Eddelbuettel Acknowledgements Rcpp is the work of many contributors, and a few words of thanks are in order. Dominick Samperi contributed the original code which, while much more limited in scope than the current Rcpp, pointed clearly in the right direction of using C++ templates to convert between R and C++ types. Romain François has shown impeccable taste in designing and implementing very large parts of Rcpp as it is today. The power of the current design owes a lot to his work and boundless energy. Key components such as modules and sugar, as well as lot a of template “magic,” are his contributions. This started as an aside to make object interchange easier for our RProtoBuf package—and it has taken us down a completely different, but very exciting road. It has been a pleasure to work with Romain, I remain in awe of his work, and I look forward to many more advances with Rcpp. Doug Bates has been a help from the very beginning: had it not been for some simple macros to pick list components out of SEXP types, I may never have started RQuantLib a decade ago. Doug later joined this project and has been instrumental in a few key decisions regarding Rcpp and RcppArmadillo and has taken charge of the RcppEigen project. John Chambers become a key supporter right when Rcpp modules started and contributed several important pieces at the gory intersection between R and C++. It was very flattering for Romain and me to hear from John how Rcpp is so close to an original design vision of a whole-object interchange between systems which was already present on a hand-drawn Bell Labs designs from the 1970s. JJ Allaire has become a very important contributor to Rcpp and a key supporter of the same idea of an almost natural pairing between R and C++. The Rcpp attributes which he contributed are showing a lot of promise, and we expect great things to be built on top of this. Several other members of the R Core team—notably Kurt Hornik, Uwe Ligges, Martyn Plummer, Brian Ripley, Luke Tierney, and Simon Urbanek—have helped at various points with anything from build issues and portability to finer points of R internals. Last but not least, there would of course be no Rcpp if there was no R system to build upon and to extend. xi xii Acknowledgements Finally, many members of the R and Rcpp communities have been very supportive at different workshops, conference presentations, and via the mailing lists. Numerous good questions and suggestions have come this way. And, of course, it is seeing this work being used so actively which motivates us and keeps us moving forward with Rcpp. Contents Part I Introduction 1 A Gentle Introduction to Rcpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Background: From R to C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 A First Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.1 Problem Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.2 A First R Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.3 A First C++ Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.4 Using Inline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.5 Using Rcpp Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.6 A Second R Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.7 A Second C++ Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.8 A Third R Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.9 A Third C++ Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 A Second Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.1 Problem Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.2 R Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.3 C++ Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.4 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3 7 7 7 8 9 11 12 12 14 14 15 15 15 16 17 18 2 Tools and Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Overall Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Compilers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 General Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.2 Platform-Specific Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 The R Application Programming Interface . . . . . . . . . . . . . . . . . . . . . . 2.4 A First Compilation with Rcpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 The Inline Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.2 Using Includes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 19 20 20 21 22 23 25 25 27 xiii xiv Contents 2.5.3 Using Plugins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.4 Creating Plugins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Rcpp Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7 Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 30 31 32 Part II Core Data Types 3 Data Structures: Part One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 The RObject Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 The IntegerVector Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 A First Example: Returning Perfect Numbers . . . . . . . . . . . . . 3.2.2 A Second Example: Using Inputs . . . . . . . . . . . . . . . . . . . . . . . 3.2.3 A Third Example: Using Wrong Inputs . . . . . . . . . . . . . . . . . . 3.3 The NumericVector Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 A First Example: Using Two Inputs . . . . . . . . . . . . . . . . . . . . . 3.3.2 A Second Example: Introducing clone . . . . . . . . . . . . . . . . . 3.3.3 A Third Example: Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Other Vector Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 LogicalVector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.2 CharacterVector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.3 RawVector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 39 41 42 43 44 45 45 46 47 48 48 49 49 4 Data Structures: Part Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 The Named Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 The List aka GenericVector Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 List to Retrieve Parameters from R . . . . . . . . . . . . . . . . . . . . . 4.2.2 List to Return Parameters to R . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 The DataFrame Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 The Function Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.1 A First Example: Using a Supplied Function . . . . . . . . . . . . . 4.4.2 A Second Example: Accessing an R Function . . . . . . . . . . . . 4.5 The Environment Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.6 The S4 Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7 ReferenceClasses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.8 The R Mathematics Library Functions . . . . . . . . . . . . . . . . . . . . . . . . . 51 51 52 53 54 55 56 56 56 57 58 59 60 Part III Advanced Topics 5 Using Rcpp in Your Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Using Rcpp.package.skeleton . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.2 R Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.3 C++ Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.4 DESCRIPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.5 Makevars and Makevars.win . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 65 66 66 67 68 69 69 Contents xv 5.2.6 NAMESPACE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.7 Help Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Case Study: The wordcloud Package . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Further Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 71 73 74 6 Extending Rcpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Extending Rcpp::wrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.1 Intrusive Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.2 Nonintrusive Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.3 Templates and Partial Specialization . . . . . . . . . . . . . . . . . . . . 6.3 Extending Rcpp::as . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3.1 Intrusive Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3.2 Nonintrusive Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3.3 Templates and Partial Specialization . . . . . . . . . . . . . . . . . . . . 6.4 Case Study: The RcppBDT Package . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5 Further Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 75 76 76 77 78 78 78 79 79 80 82 7 Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 7.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 7.1.1 Exposing Functions Using Rcpp . . . . . . . . . . . . . . . . . . . . . . . 83 7.1.2 Exposing Classes Using Rcpp . . . . . . . . . . . . . . . . . . . . . . . . . . 84 7.2 Rcpp Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 7.2.1 Exposing C++ Functions Using Rcpp Modules . . . . . . . . . . . 86 7.2.2 Exposing C++ Classes Using Rcpp Modules . . . . . . . . . . . . . 90 7.3 Using Modules in Other Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 7.3.1 Namespace Import/Export . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 7.3.2 Support for Modules in Skeleton Generator . . . . . . . . . . . . . . 99 7.3.3 Module Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 7.4 Case Study: The RcppCNPy Package . . . . . . . . . . . . . . . . . . . . . . . . . 100 7.5 Further Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 8 Sugar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 8.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 8.2 Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 8.2.1 Binary Arithmetic Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 8.2.2 Binary Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 8.2.3 Unary Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 8.3 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 8.3.1 Functions Producing a Single Logical Result . . . . . . . . . . . . . 107 8.3.2 Functions Producing Sugar Expressions . . . . . . . . . . . . . . . . . 107 8.3.3 Mathematical Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 8.3.4 The d/q/p/q Statistical Functions . . . . . . . . . . . . . . . . . . . . . . . 114 8.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 8.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 xvi Contents 8.5.1 The Curiously Recurring Template Pattern . . . . . . . . . . . . . . . 117 8.5.2 The VectorBase Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 8.5.3 Example: sapply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 8.6 Case Study: Computing π Using Rcpp sugar . . . . . . . . . . . . . . . . . . . . 122 Part IV Applications 9 RInside . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 9.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 9.2 A First Example: Hello, World! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 9.3 A Second Example: Data Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 9.4 A Third Example: Evaluating R Expressions . . . . . . . . . . . . . . . . . . . . 132 9.5 A Fourth Example: Plotting from C++ via R . . . . . . . . . . . . . . . . . . . . 133 9.6 A Fifth Example: Using RInside Inside MPI . . . . . . . . . . . . . . . . . . . . 134 9.7 Other Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 10 RcppArmadillo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 10.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 10.2 Motivation: FastLm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 10.2.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 10.2.2 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 10.2.3 A Caveat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 10.3 Case Study: Kalman Filter Using RcppArmadillo . . . . . . . . . . . . . . . 146 10.4 RcppArmadillo and Armadillo Differences . . . . . . . . . . . . . . . . . . . . . 152 11 RcppGSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 11.2 Motivation: FastLm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 11.3 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 11.3.1 GSL Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 11.3.2 RcppGSL::vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 11.3.3 Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 11.3.4 Vector Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 11.4 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 11.4.1 Creating Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 11.4.2 Implicit Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 11.4.3 Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 11.4.4 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 11.4.5 Matrix Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 11.5 Using RcppGSL in Your Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 11.5.1 The configure Script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 11.5.2 The src Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 11.5.3 The R Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 11.6 Using RcppGSL with inline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 11.7 Case Study: GSL-Based B-Spline Fit Using RcppGSL . . . . . . . . . . . 169 Contents xvii 12 RcppEigen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 12.2 Eigen classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178 12.2.1 Fixed-Size Vectors and Matrices . . . . . . . . . . . . . . . . . . . . . . . 178 12.2.2 Dynamic-Size Vectors and Matrices . . . . . . . . . . . . . . . . . . . . . 179 12.2.3 Arrays for Per-Component Operations . . . . . . . . . . . . . . . . . . . 180 12.2.4 Mapped Vectors and Matrices and Special Matrices . . . . . . . 181 12.3 Case Study: Kalman filter using RcppEigen . . . . . . . . . . . . . . . . . . . . 182 12.4 Linear Algebra and Matrix Decompositions . . . . . . . . . . . . . . . . . . . . 183 12.4.1 Basic Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 12.4.2 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . 184 12.4.3 Least-Squares Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 12.4.4 Rank-Revealing Decompositions . . . . . . . . . . . . . . . . . . . . . . . 185 12.5 Case Study: C++ Factory for Linear Models in RcppEigen . . . . . . . 186 Part V Appendix A C++ for R Programmers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 A.1 Compiled Not Interpreted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 A.2 Statically Typed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 A.3 A Better C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198 A.4 Object-Oriented (But Not Like S3 or S4) . . . . . . . . . . . . . . . . . . . . . . . 200 A.5 Generic Programming and the STL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 A.6 Template Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 A.7 Further Reading on C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 Subject Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 Software Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 List of Tables Table 1.1 Run-time performance of the recursive Fibonacci examples . . 10 Table 1.2 Run-time performance of the different VAR simulation implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 Table 8.1 Run-time performance of Rcpp sugar compared to R and manually optimized C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Table 8.2 Run-time performance of Rcpp sugar compared to R for simulating π . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 Table 11.1 Correspondence between GSL vector types and templates defined in RcppGSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Table 11.2 Correspondence between GSL vector view types and templates defined in RcppGSL . . . . . . . . . . . . . . . . . . . . . . . . . 162 Table 12.1 Mapping between Eigen matrix and vector types, and corresponding array types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 Table 12.2 lmBenchmark results for the RcppEigen example . . . . . . . 191 xix List of Figures Figure 1.1 Plotting a density in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 1.2 Plotting a density and bootstrapped confidence interval in R . Figure 1.3 Fibonacci spiral based on first 34 Fibonacci numbers . . . . . . . 5 6 8 Figure 9.1 Combining RInside with the Qt toolkit for a GUI application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Figure 9.2 Combining RInside with the Wt toolkit for a web application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 Figure 10.1 Object trajectory and Kalman filter estimate . . . . . . . . . . . . . . 149 Figure 11.1 Artificial data and B-spline fit . . . . . . . . . . . . . . . . . . . . . . . . . . 175 xxi List of Listings 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 1.10 1.11 1.12 1.13 1.14 1.15 2.1 2.2 2.3 2.4 2.6 2.5 2.7 2.8 2.9 2.10 2.11 2.12 2.13 2.14 Plotting a density in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Plotting a density and bootstrapped confidence interval in R . . . . . . Fibonacci number in R via recursion . . . . . . . . . . . . . . . . . . . . . . . . . Fibonacci number in C++ via recursion . . . . . . . . . . . . . . . . . . . . . . . Fibonacci wrapper in C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fibonacci number in C++ via recursion, using inline . . . . . . . . . . . . Fibonacci number in C++ via recursion, using Rcpp attributes . . . . Fibonacci number in C++ via recursion, via Rcpp attributes and sourceCpp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fibonacci number in R via memoization . . . . . . . . . . . . . . . . . . . . . . Fibonacci number in C++ via memoization . . . . . . . . . . . . . . . . . . . . Fibonacci number in R via iteration . . . . . . . . . . . . . . . . . . . . . . . . . . Fibonacci number in C++ via iteration . . . . . . . . . . . . . . . . . . . . . . . . VAR(1) of order 2 generation in R . . . . . . . . . . . . . . . . . . . . . . . . . . . VAR(1) of order 2 generation in C++ . . . . . . . . . . . . . . . . . . . . . . . . . Comparison of VAR(1) run-time between R and C++ . . . . . . . . . . . A first manual compilation with Rcpp . . . . . . . . . . . . . . . . . . . . . . . . A first manual compilation with Rcpp using Rscript . . . . . . . . . . . . . Using the first manual compilation from R . . . . . . . . . . . . . . . . . . . . Convolution example using inline . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using inline with include= . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Program source from convolution example using inline in verbose mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A first RcppArmadillo example for inline . . . . . . . . . . . . . . . . . . . . . Creating a plugin for use with inline . . . . . . . . . . . . . . . . . . . . . . . . . . Example of new cppFunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example of new cppFunction with plugin . . . . . . . . . . . . . . . . . . C++ example of throwing and catching an exception . . . . . . . . . . . Using C++ example of throwing and catching an exception . . . . . C++ example of example from Rcpp-type checks . . . . . . . . . . . . . . C++ macros for Rcpp exception handling . . . . . . . . . . . . . . . . . . . . 4 4 7 8 9 9 11 11 12 12 14 14 16 16 17 23 24 24 26 27 28 29 30 31 32 32 33 33 34 xxiii xxiv List of Listings 2.15 2.16 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 3.11 3.12 3.13 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 5.10 5.11 5.12 5.13 6.1 6.2 inline version of C++ example of throwing and catching an exception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rcpp attributes version of C++ example of throwing and catching an exception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A function to return four perfect numbers . . . . . . . . . . . . . . . . . . . . . A function to reimplement prod() . . . . . . . . . . . . . . . . . . . . . . . . . . A second function to reimplement prod() . . . . . . . . . . . . . . . . . . . Testing the prod() function with floating-point inputs . . . . . . . . . Testing the prod() function with inappropriate inputs . . . . . . . . . A function to return a generalized sum of powers . . . . . . . . . . . . . . . Declaring two vectors from the same SEXP type . . . . . . . . . . . . . . . Declaring two vectors from the same SEXP type using clone . . . . . Using Rcpp sugar to compute a second vector . . . . . . . . . . . . . . . . . . Declaring a three-dimensional vector . . . . . . . . . . . . . . . . . . . . . . . . . A function to take square roots of matrix elements . . . . . . . . . . . . . . A function to assign a logical vector . . . . . . . . . . . . . . . . . . . . . . . . . . A function to assign a character vector . . . . . . . . . . . . . . . . . . . . . . . . A named vector in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A named vector in C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A named vector in C++ , second approach . . . . . . . . . . . . . . . . . . . . Using the List class for parameters . . . . . . . . . . . . . . . . . . . . . . . . . Using a List to return objects to R . . . . . . . . . . . . . . . . . . . . . . . . . Using the DataFrame class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using a Function passed as argument . . . . . . . . . . . . . . . . . . . . . . Using a Function accessed from R . . . . . . . . . . . . . . . . . . . . . . . . Using a Function via an Environment . . . . . . . . . . . . . . . . . . . Assigning in the global environment . . . . . . . . . . . . . . . . . . . . . . . . . A simple example for accessing S4 class elements . . . . . . . . . . . . . . A simple example for accessing S4 class elements . . . . . . . . . . . . . . Example use of Rmath.h functions . . . . . . . . . . . . . . . . . . . . . . . . . A first Rcpp.package.skeleton example . . . . . . . . . . . . . . . . Files created by Rcpp.package.skeleton . . . . . . . . . . . . . . . . R function rcpp hello world . . . . . . . . . . . . . . . . . . . . . . . . . . . C++ header file rcpp hello world.h . . . . . . . . . . . . . . . . . . . . . C++ source file rcpp hello world.cpp . . . . . . . . . . . . . . . . . . Calling R function rcpp hello world . . . . . . . . . . . . . . . . . . . . . DESCRIPTION file for skeleton package . . . . . . . . . . . . . . . . . . . . . Makevars file for skeleton package . . . . . . . . . . . . . . . . . . . . . . . . . . . Makevars.win file for skeleton package . . . . . . . . . . . . . . . . . . . . . . . NAMESPACE file for skeleton package . . . . . . . . . . . . . . . . . . . . . . . Manual page mypackage-package.Rd for skeleton package . Manual page rcpp hello world.Rd for skeleton package . . . . Function is overlap from the wordcloud package . . . . . . . . . . . as and wrap declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Implicit use of as and wrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 34 42 43 43 44 45 45 46 46 47 47 48 48 49 51 52 52 53 54 55 56 57 57 58 58 59 60 66 67 67 68 68 69 69 70 70 71 71 72 73 75 75 List of Listings xxv 6.3 6.4 6.5 6.6 6.7 6.8 6.9 6.10 6.11 6.12 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 7.10 7.11 7.12 7.13 7.14 77 77 78 78 79 79 80 81 81 82 84 84 84 85 86 86 87 87 87 88 88 89 89 7.15 7.16 7.17 7.18 7.19 7.20 7.21 7.22 7.23 7.24 7.25 7.26 7.27 7.28 7.29 7.30 7.31 7.32 Intrusive extension for wrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nonintrusive extension for wrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . Partial specialization for wrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Intrusive extension for as . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nonintrusive extension for as . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Partial specialization via Exporter . . . . . . . . . . . . . . . . . . . . . . . . . Partial specialization of as via Exporter . . . . . . . . . . . . . . . . . . . RcppBDT definitions of as and wrap . . . . . . . . . . . . . . . . . . . . . . . . RcppBDT use of as and wrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . RcppBDT example for getFirstDayOfWeekAfter . . . . . . . . A simple norm function in C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . Calling the norm function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A simple class Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Exposing two member functions for Uniform class . . . . . . . . . . . . Using the Uniform class from R . . . . . . . . . . . . . . . . . . . . . . . . . . . Exposing the norm function via modules . . . . . . . . . . . . . . . . . . . . . Using norm function exposed via modules . . . . . . . . . . . . . . . . . . . . A module example with six functions . . . . . . . . . . . . . . . . . . . . . . . . . Modules example interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modules example use from R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modules example with function documentation . . . . . . . . . . . . . . . . Output for modules example with function documentation . . . . . . . Modules example with documentation and formal arguments . . . . . Output for modules example with documentation and formal arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modules example with documentation and formal arguments without defaults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Usage of modules example with documentation and formal arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modules example with ellipis argument . . . . . . . . . . . . . . . . . . . . . . . Output of modules example with ellipis argument . . . . . . . . . . . . . . Exposing Uniform class using modules . . . . . . . . . . . . . . . . . . . . . Using Uniform class via modules . . . . . . . . . . . . . . . . . . . . . . . . . . Constructor with a description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Constructor with a validator function pointer . . . . . . . . . . . . . . . . . . Exposing fields and properties for modules . . . . . . . . . . . . . . . . . . . . Field with documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Readonly-field with documentation . . . . . . . . . . . . . . . . . . . . . . . . . . Property with getter and setter, or getter-only . . . . . . . . . . . . . . . . . . Example of using a getter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example of using a setter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example code for properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example using properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Example documenting a method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Const and non-const member functions . . . . . . . . . . . . . . . . . . . . . . . 89 89 90 90 90 91 91 92 92 92 93 93 93 94 94 94 95 95 96 xxvi 7.33 7.34 7.35 7.36 7.37 7.38 7.39 7.40 7.41 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 8.10 8.11 8.12 8.13 8.14 8.15 8.16 8.17 8.18 8.19 8.20 8.21 8.22 8.23 8.24 8.25 8.26 8.27 8.28 8.29 8.30 8.31 8.32 8.33 8.34 8.35 8.36 List of Listings Example of S4 dispatch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 Complete example of exposing std::vector<double> . . . . . 97 R use of std::vector<double> modules example . . . . . . . . . 98 R NAMESPACE import of Rcpp for modules . . . . . . . . . . . . . . . . . . 98 R .onLoad() code for module . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 Package skeleton support for modules . . . . . . . . . . . . . . . . . . . . . . . . 99 Use of prompt for documentation skeleton . . . . . . . . . . . . . . . . . . . 100 NumPy load and save functions defined in RcppCNPy . . . . . . . . . . 100 Example of module declaration in RcppCNPy . . . . . . . . . . . . . . . . . 102 A simple C++ function operating on vectors . . . . . . . . . . . . . . . . . . 103 A simple R function operating on vectors . . . . . . . . . . . . . . . . . . . . . 104 A simple C++ function using sugar operating on vectors . . . . . . . . 104 Binary arithmetic operators for sugar . . . . . . . . . . . . . . . . . . . . . . . . . 105 Binary logical operators for sugar . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Unary operators for sugar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Functions returning a single boolean result . . . . . . . . . . . . . . . . . . . . 107 Using functions returning a single boolean result . . . . . . . . . . . . . . . 107 Example using is na sugar function . . . . . . . . . . . . . . . . . . . . . . . . . 108 Example using seq along sugar function . . . . . . . . . . . . . . . . . . . . 108 Example using seq len sugar function . . . . . . . . . . . . . . . . . . . . . . 108 Example using pmin and pmax sugar function . . . . . . . . . . . . . . . . 109 Example using ifelse sugar function . . . . . . . . . . . . . . . . . . . . . . . 109 Example using sapply sugar function . . . . . . . . . . . . . . . . . . . . . . . 109 Example using std::unary function functor with sapply 110 Example using std::unary function functor with mapply 110 Example using sign sugar function . . . . . . . . . . . . . . . . . . . . . . . . . 111 Example using sign sugar function . . . . . . . . . . . . . . . . . . . . . . . . . 111 Example using setdiff sugar function . . . . . . . . . . . . . . . . . . . . . . 111 Example using union sugar function . . . . . . . . . . . . . . . . . . . . . . . 111 Example using intersect sugar function . . . . . . . . . . . . . . . . . . . 112 Example using clamp sugar function . . . . . . . . . . . . . . . . . . . . . . . . 112 Example using unique sugar function . . . . . . . . . . . . . . . . . . . . . . . 112 Example using table sugar function . . . . . . . . . . . . . . . . . . . . . . . . 113 Example using duplicated sugar function . . . . . . . . . . . . . . . . . . 113 Examples using mathematical sugar functions . . . . . . . . . . . . . . . . . 113 Examples of d/p/q/r statistical sugar functions sugar . . . . . . . . . . . . 114 Examples of using sugar RNG functions with RNGScope . . . . . . . 115 The Curiously Recurring Template Pattern (CRTP) . . . . . . . . . . . . . 117 The VectorBase class for Rcpp sugar . . . . . . . . . . . . . . . . . . . . . . 117 The sapply Rcpp sugar implementation . . . . . . . . . . . . . . . . . . . . . 119 Rcpp::traits::result of template . . . . . . . . . . . . . . . . . . . . 120 result of trait implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 Rcpp::traits::r sexptype traits template . . . . . . . . . . 120 r vector element converter class . . . . . . . . . . . . . . . . . . . . 121 storage type trait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 List of Listings 8.37 8.38 8.39 8.40 8.41 8.42 8.43 9.1 9.2 9.3 9.4 9.5 9.6 9.7 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9 10.10 10.11 10.12 10.13 10.14 11.1 11.2 11.3 11.4 11.5 11.6 11.7 11.8 11.9 11.10 11.11 11.12 11.13 11.14 11.15 11.16 11.17 xxvii Input expression base type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Output expression base type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Constructor for Sapply class template . . . . . . . . . . . . . . . . . . . . . . . 122 Implementation of Sapply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Simulating π in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Simulating π in C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Simulating π in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 First RInside example: Hello, World! . . . . . . . . . . . . . . . . . . . . . . . . . 128 Makefile for RInside examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 Using Makefile for RInside to build example . . . . . . . . . . . . . . . . . . 130 Second RInside example: data transfer . . . . . . . . . . . . . . . . . . . . . . . . 131 Third RInside example: data transfer . . . . . . . . . . . . . . . . . . . . . . . . . 132 Fourth RInside example: plotting from C++ via R . . . . . . . . . . . . . 133 Fifth RInside example: parallel computing with MPI . . . . . . . . . . . . 134 A simple Armadillo example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 FastLm function using RcppArmadillo . . . . . . . . . . . . . . . . . . . . . . 141 Basic fLm() function without formula interface . . . . . . . . . . . . . . . 142 Basic fastLmPure() R function without formula interface . . . . 143 FastLm comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 An example of a rank-deficient design matrix . . . . . . . . . . . . . . . . . . 144 Basic Kalman filter in Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 Basic Kalman filter in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Basic Kalman filter in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 Basic Kalman filter class in C++ using Armadillo . . . . . . . . . . . . . . 150 Basic Kalman filter function in C++ . . . . . . . . . . . . . . . . . . . . . . . . . . 151 Basic Kalman filter timing comparison . . . . . . . . . . . . . . . . . . . . . . . 151 Standard defines for RcppArmadillo . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Standard defines for RcppArmadillo . . . . . . . . . . . . . . . . . . . . . . . . . . 153 FastLm function using RcppGSL . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 Definition of gsl vector and gsl vector int . . . . . . . . . . . . 158 Example use of gsl vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Example use of RcppGSL::vector<T> . . . . . . . . . . . . . . . . . . . . 159 Example RcppGSL::vector<T> function . . . . . . . . . . . . . . . . . . 160 Example call of RcppGSL::vector<T> function . . . . . . . . . . . . 160 Second example RcppGSL::vector<T> function . . . . . . . . . . . 160 Example call of second RcppGSL::vector<T> function . . . . . . 161 Example of a vector view class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 Example use RcppGSL matrix class . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Implicit conversion for RcppGSL matrix class . . . . . . . . . . . . . . . . . 163 Indexing for RcppGSL matrix class . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Matrix norm in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Autoconf script for RcppGSL use . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Shell script configuration script for RcppGSL use . . . . . . . . . . . . . . 166 Windows shell script configuration script for RcppGSL use . . . . . . 166 Vector norm function for RcppGSL . . . . . . . . . . . . . . . . . . . . . . . . . . 166 xxviii 11.18 11.19 11.20 11.21 11.22 11.23 11.24 11.25 11.26 12.1 12.2 12.3 12.4 12.5 12.6 12.7 12.8 12.9 12.10 12.11 12.12 12.13 12.14 12.15 12.16 A.1 A.2 A.3 A.4 A.5 A.6 A.7 A.8 A.9 A.10 A.11 A.12 A.13 A.14 A.15 A.16 A.17 List of Listings Makevars.in for RcppGSL example . . . . . . . . . . . . . . . . . . . . . . . 167 R function for RcppGSL example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 Using RcppGSL with inline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 Using package.skeleton with inline result . . . . . . . . . . . . . . . . 169 B-spline fit example from the GSL . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 Beginning of C++ file with B-spline fit for R . . . . . . . . . . . . . . . . . 172 Data generation for GSL B-spline fit for R . . . . . . . . . . . . . . . . . . . . 172 Data fit for GSL B-spline with R . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 R side of GSL B-spline example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 A simple Eigen example using fixed-size vectors and matrices . . . . 178 Eigen fixed-size vector and matrix representation . . . . . . . . . . . . . . . 178 Eigen dynamic-size vector and matrix representation . . . . . . . . . . . . 179 A simple Eigen example using dynamic-size vectors and matrices . 179 Comparing performance of simple operations between dynamic and fixed size vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 Timing results simple operations betweem dynamic and fixed size vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Basic Kalman filter class in C++ using Eigen . . . . . . . . . . . . . . . . . . 182 Using a basic Eigen solver from R . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 Computing eigenvalues using Eigen . . . . . . . . . . . . . . . . . . . . . . . . . . 184 Computing least-squares using Eigen . . . . . . . . . . . . . . . . . . . . . . . . . 185 Rank-revelaing decompositions using Eigen . . . . . . . . . . . . . . . . . . . 186 Core of definition of lm class in Eigen . . . . . . . . . . . . . . . . . . . . . . . . 187 Derived classes of lm providing specializations . . . . . . . . . . . . . . . . 188 Implementation of two subclass constructors for lm model fit . . . . 189 Selection of subclasses for lm model fit . . . . . . . . . . . . . . . . . . . . . . . 189 Actual fastLm function in RcppEigen package . . . . . . . . . . . . . 190 Simple C++ example: Hello, World! . . . . . . . . . . . . . . . . . . . . . . . . 195 Compiling and linking simple C++ example: Hello, World! . . . . . 196 Compiling and linking simple C++ example in one step: Hello, World! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196 Simple C++ example using Rmath . . . . . . . . . . . . . . . . . . . . . . . . . . 196 Compiling and linking simple C++ example using Rmath . . . . . . . 197 Simple R example of dynamic types . . . . . . . . . . . . . . . . . . . . . . . . . 197 Simple R example of dynamic types . . . . . . . . . . . . . . . . . . . . . . . . . 198 Simple C++ function example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 Simple C++ function call example . . . . . . . . . . . . . . . . . . . . . . . . . . 200 Simple C++ data structure using struct . . . . . . . . . . . . . . . . . . . . 200 Simple C++ data structure using class . . . . . . . . . . . . . . . . . . . . . 201 Simple C++ example using iterators on vector . . . . . . . . . . . . . . 202 Simple C++ example using const iterators on list . . . . . . . . . . . 203 Simple C++ example using const iterators on deque . . . . . . . . . . 203 Simple C++ example using accumulate algorithm . . . . . . . . . . 203 Simple C++ template example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 Another C++ template . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 Part I Introduction Chapter 1 A Gentle Introduction to Rcpp Abstract This initial chapter provides a first introduction to Rcpp. It uses a somewhat slower pace and generally more gentle approach than the rest of the book in order to show key concepts which are revisited and discussed in more depth throughout the remainder. So the aim of this chapter is to cover a fairly wide range of material, but at a more introductory level for an initial overview. Two larger examples are studied in detail. We first compute the Fibonacci sequence in three different ways in two languages. Second, we simulate from a multivariate dynamic model provided by a vector autoregression. 1.1 Background: From R to C++ R is both a powerful interactive environment for data analysis, visualization, and modeling and an expressive programming language designed and built to support these tasks. The interactive nature of working with data—through data displays, summaries, model estimation, simulation, and numerous other tasks—is a key strength of the R environment. And, so is the R programming language which permits use from interactive explorations to small scripts and all the way to complete implementations of new functionality. This R programming language is in fact a dialect of the S programming language initially developed by Bell Labs. The dual nature of interactive analysis, as well as programming, is no accident. As succinctly expressed in the title of one of the books on the S language (which has provided the foundations upon which R is built), it is designed to support Programming with Data (Chambers 1998). That is a rather unique proposition as far as programming languages go. As a domain-specific language (DSL), R is tailored specifically to support and enable data analysis work. Moreover, there is also a particular focus on research use for developing new and exciting approaches, as well as solidifying existing approaches. R and its predecessor S are not static languages: they have evolved since the first designs well over thirty years ago and continue to evolve today. D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 1, © The Author 2013 3 4 1 A Gentle Introduction to Rcpp To mention just one example of this evolution, object orientation in R is supported by the S3 and S4 class systems, as well as the newer Reference Classes. Of course, such flexibility of having alternate approaches can also be seen as a weakness. It may lead to yet more material which language beginners may find perplexing, and it may lead to small inconsistencies which may confuse intermediate and advanced users. Coincidentally, similar concerns are also sometimes raised about the C++ language. These arguments have some merit, but on the margin more useful and actually used languages are preferable to those that are very cleanly designed, yet not used much. Having a proper programming language is a key feature supporting rigorous and reproducible research: by encoding all steps of a data analysis and estimation in a script or program, the analyst makes every aspect of the process explicit and thereby ensures full reproducibility. Consider the first example which is presented below. It is a slightly altered version of an example going back to a post by Greg Snow to the r-help mailing list. 1 3 xx <- faithful$eruptions fit <- density(xx) plot(fit) Listing 1.1 Plotting a density in R We assign a new variable xx by extracting the named component eruptions of the (two-column) data.frame faithful included with the R system. The data set contains waiting times between eruptions, as well as eruption duration times, at the Old Faithful geyser in the Yellowstone National Park in the USA. To estimate the density function of eruption duration based on this data, we then call the R function density (which uses default arguments besides the data we pass in). This function returns an object we named fit, and the plot function then visualizes it as shown in the corresponding Fig. 1.1. This is a nice example, and it illustrates some features of R such as the objectoriented nature in which we can simply plot an object returned from a modeling function. However, this example was introduced primarily to provide the basis for an extension also provided by Greg Snow and shown in the next listing. 1 3 5 7 9 11 xx <- faithful$eruptions fit1 <- density(xx) fit2 <- replicate(10000, { x <- sample(xx,replace=TRUE); density(x, from=min(fit1$x), to=max(fit1$x))$y }) fit3 <- apply(fit2, 1, quantile,c(0.025,0.975)) plot(fit1, ylim=range(fit3)) polygon(c(fit1$x,rev(fit1$x)), c(fit3[1,], rev(fit3[2,])), col=’grey’, border=F) lines(fit1) Listing 1.2 Plotting a density and bootstrapped confidence interval in R 1.1 Background: From R to C++ 5 0.3 0.0 0.1 0.2 Density 0.4 0.5 density.default(x = xx) 1 2 3 4 5 N = 272 Bandwidth = 0.3348 6 Fig. 1.1 Plotting a density in R The first two lines are identical apart from now assigning to an object fit1 holding the estimated density. Lines three to six execute a minimal bootstrapping exercise. The replicate() function repeats N (here 10, 000) times the code supplied in the second argument. Here, this argument is a code block delimited by braces, containing two instructions. The first instruction creates a new data set by resampling with replacement from the original data. The second instruction then estimates a density on this resampled data. This time the data range is limited to the range of the initial estimated in fit1; this ensures that the bootstrapped density is estimated on the same grid of x values as in fit1. For this data set, the grid contains 512 points. We retain only the y coordinates of the fit—these will be collected as the N columns in the resulting object fit2 making this a matrix of dimension 512 × N. The next command on line 7 then applies the quantile() function to each of the 512 rows in fit2, returning the 2.5 % and 97.5 % quantiles, and creating a new matrix of dimension 2 × 512 where the two rows contains the quantile estimates at each grid point for the x axis. We then plot the initial fit, adjusting the y-axis to the range of quantile estimates. Next, we add a gray polygon defined by the x grid and the quantile estimates which visualizes the bootstrapped 95 % confidence interval of the initial density estimate. Finally, we replot the fit1 density over the gray polygon. The resulting plot is shown in Fig. 1.2. The main takeaway of this second example is that with just a handful of lines of code, we can deploy fairly sophisticated statistical modeling functions (such as the density estimate) and even provide a complete resampling strategy. This uses the 6 1 A Gentle Introduction to Rcpp 0.3 0.0 0.1 0.2 Density 0.4 0.5 density.default(x = xx) 1 2 3 4 5 N = 272 Bandwidth = 0.3348 6 Fig. 1.2 Plotting a density and bootstrapped confidence interval in R same estimation function in a nonparametric bootstrap to also provide a confidence interval for the estimation, and finally plots both. Few languages besides R are this expressive and powerful for working with data. A key aspect of the internal implementation of R is that its own core interpreter and extension mechanism are implemented in the C language. C is often used for system programming as it is reasonably lean and fast, yet also very portable and easily available on most hardware platforms. A key advantage of C is that it is extensible via external libraries and modules. R takes full advantage of this, and so does the Rcpp extension featured in this book. The principal goal of Rcpp is to make writing these extensions easier and less error-prone. The aim of this book is to show how this can be accomplished with what we consider relative ease compared to the standard C interface. The C language is also closely related to the C++ language which can be seen as an extension and superset. There are some small quibbles about a few minor aspects of C which do not carry over to C++ but we can safely ignore these for our purposes. C++ has been called “a federation of [four] languages” (Meyers 2005). This offers new and unique programming aspects and, in particular, provides a key match to the object-model in R (even if the terminology and philosophy of objectoriented programming differs between R and C++ ). Appendix A provides a very brief introduction to the C++ language. 1.2 A First Example 7 1.2 A First Example 1.2.1 Problem Setting Let us consider a concrete first example which requires only basic mathematics. This particular problem was first suggested in a post1 at the StackOverflow site. The Fibonacci sequence Fn is defined as a recursive sum of the two preceding terms in the same sequence: Fn = Fn−1 + Fn−2 (1.1) with these two initial conditions F0 = 0 and F1 = 1 so that the first ten numbers of the sequence F0 to F9 are seen to be 0, 1, 1, 2, 3, 5, 8, 13, 21, 34. Here, we follow the convention of starting the Fibonacci sequence at F0 ; there are other discussions which begin the recursion with F1 which would require slightly altered code in the examples shown below. Fibonacci sequences have long been studied, and the corresponding Wikipedia page2 provides additional resources. Fibonacci sequences can also be visualized: Fig. 1.3 shows the so-called Fibonacci spiral built from the first 34 Fibonacci numbers. 1.2.2 A First R Solution The classic approach of implementing the computation of a Fibonacci number Fn for a given value of n is to evaluate equation 1.1 directly. This commonly leads to a simple recursive function. In R this could be written as follows: 2 4 fibR <- function(n) { if (n == 0) return(0) if (n == 1) return(1) return (fibR(n - 1) + fibR(n - 2)) } Listing 1.3 Fibonacci number in R via recursion 1 2 See http://stackoverflow.com/questions/6807068/. See http://en.wikipedia.org/wiki/Fibonacci_number. 8 1 A Gentle Introduction to Rcpp Fig. 1.3 Fibonacci spiral based on first 34 Fibonacci numbers Source: http://en.wikipedia.org/wiki/File:Fibonacci_spiral_34.svg, released under Public Domain This simple function has several key features: • • • • It is very short. It does not test for wrong input values less than zero. It is easy to comprehend. It is a very faithful rendition of the relationship in Eq. 1.1. However, it also has a key disadvantage: it is very inefficient. Consider the calculation of F5 . Via the recursion in Eq. 1.1, this becomes the sum of F3 and F4 . But already when we compute F4 as the sum of F3 and F2 , we note that we end up recomputing F3 . Similarly, F2 will be computed several times too. In fact, formal analysis reveals that the algorithm is exponential in n—in other words its run-time increases at an exponential rate relative to the argument n. This type of performance is worst-in-class, leading to a search for alternative approaches. Another concern specific to R is that function calls are not particularly lightweight, which makes recursive function calls particularly unattractive. Naturally, for both these reasons, many better algorithms have been suggested and we will discuss two other approaches below. 1.2.3 A First C++ Solution A simple solution to compute Fn much faster using the same simple and intuitive algorithm is to switch to C or C++ . We can write a simple C++ version as follows: 1 3 5 int fibonacci(const int x) { if (x == 0) return(0); if (x == 1) return(1); return (fibonacci(x - 1)) + fibonacci(x - 2); } Listing 1.4 Fibonacci number in C++ via recursion 1.2 A First Example 9 The function is recursive just like the preceding version. For simplicity, it also operates without checking its input arguments. In order to call it from R, we need to use a wrapper function as R prescribes a very particular interface via its .Call() function: all variables used at the interface have to be of pointer to S expression type, or SEXP. There are alternatives to .Call(), but as we will discuss in Chap. 2, .Call() is our preferred interface as we can transfer whole objects from R to C++ and back which is not possible with the alternatives. So without going into details at this point, a suitable wrapper is 1 3 5 extern "C" SEXP fibWrapper(SEXP xs) { int x = Rcpp::as<int>(xs); int fib = fibonacci(x); return (Rcpp::wrap(fib)); } Listing 1.5 Fibonacci wrapper in C++ This uses two key Rcpp tools, the converter functions as and wrap. The first, as, is used to convert the incoming argument xs from SEXP to integer. Similarly, wrap converts the integer result in the integer variable fib to the SEXP type returned by a function used with .Call(). 1.2.4 Using Inline The next steps are to compile these two functions, to link them into a so-called shared library (which can be loaded at run-time by a system such as R), and to actually load it. These three steps do sound a little tedious and labor-intensive, and they are. So it is at this point that we introduce another very powerful helper: the inline package (Sklyar et al. 2012). inline, written mostly by Oleg Sklyar, brings an idea to R which has been used with other dynamically extensible scripting languages. By providing a complete wrapper around the compilation, linking, and loading steps, the programmer can concentrate on the actual code (in either one of the supported languages C, C++, or Fortran) and forget about the operating-system specific details of compilation, linking, and loading. A single entry point, the function cxxfunction() can be used to turn code supplied as a text variable into an executable function. 1 3 5 7 9 ## we need a pure C/C++ function as the generated function ## will have a random identifier at the C++ level preventing ## us from direct recursive calls incltxt <- ’ int fibonacci(const int x) { if (x == 0) return(0); if (x == 1) return(1); return fibonacci(x - 1) + fibonacci(x - 2); }’ 10 11 13 15 17 19 1 A Gentle Introduction to Rcpp ## now use the snipped above as well as one argument conversion ## in as well as out to provide Fibonacci numbers via C++ fibRcpp <- cxxfunction(signature(xs="int"), plugin="Rcpp", incl=incltxt, body=’ int x = Rcpp::as<int>(xs); return Rcpp::wrap( fibonacci(x) ); ’) Listing 1.6 Fibonacci number in C++ via recursion, using inline We actually supply two arguments, the pure C++ function and the wrapper function. The pure C++ function is passed as an argument to the includes argument which allows us to pass additional include directives, or even function or class definitions as seen here. This code passed to the includes variable is included unaltered in the code prepared by inline. The main body of the function is supplied as the argument to the argument body. The function cxxfunction writes the function signature (i.e., its interface defining the variables going in) using the information from the signature variable (here: a single argument named xs) and sets up the code to also use Rcpp features by selecting its plugin. We also note that the function body supplied here is identical to the wrapper functions detailed above. Once cxxfunction has been run successfully, the resulting function created as a result (here: fibRcpp()) can be called just like any other R function. In particular, we can time the execution and compare it to the first solution based on the initial R implementation. The results, obtained from running the script fibonacci.r included as an example in the Rcpp package, are shown in Table 1.1. The actual timing is undertaken by the package rbenchmark (Kusnierczyk 2012). Table 1.1 Run-time performance of the recursive Fibonacci examples Function N Elapsed time (s) Relative (ratio) fibRcpp fibR (byte-compiled) fibR 1 1 1 0.092 62.288 62.711 1.00 677.04 681.64 The compiled version is over 600 times faster, showing that recursive function calls do indeed exert a cost on R performance. We also notice that byte-compiling the R function does not make a difference as the results are essentially unchanged. The takeaway from this section is that there is obvious merit in replacing simple R code with simple C++ code. Writing the Fibonacci recurrence as a simple three-line function is natural. Switching implementation languages to C++ very significantly boosts run-time performance as we have seen with certain values of the argument n. However, no matter what the chosen implementation language, an 1.2 A First Example 11 exponential algorithm will eventually be inapplicable provided the argument n is large enough. For those cases, better algorithms help, and we will look at two different implementations below. It should, however, be stressed that faster implementation languages and better algorithms are not exclusive as we can combine both as we will do in the remainder of the chapter. 1.2.5 Using Rcpp Attributes The inline package discussed in the previous section has become very widely used due to both its versatility and its robustness from fairly wide testing. As we have seen, it permits us to quickly extend R with compiled code directly from the R session. More recently, inline has been complemented by a new approach that arrived with version 0.10.0 of the Rcpp package. This approach borrows from an upcoming (but not yet widely available) feature in the new C++ standard—the “attributes”— but implements it internally. The programmer simply declares certain “attributes,” notably whether a function is to be exported for use from R or from another C++ function (or both). One can declare dependencies whose resolution still relies on the plugin framework provided by inline. Used this way, the “Rcpp attribute” framework can automate more aspects of the type conversion and marshaling of variables. A simple example, once again on the Fibonacci sequence, follows: 1 3 5 7 9 11 #include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] int fibonacci(const int x) { if (x < 2) return x; else return (fibonacci(x - 1)) + fibonacci(x - 2); } Listing 1.7 Fibonacci number in C++ via recursion, using Rcpp attributes The key element to note here is the [[Rcpp::export]] attribute preceding the function definition. This can be used as easily as shown in the following example: 1 3 R> sourceCpp("fibonacci.cpp") R> fibonacci(20) [1] 6765 Listing 1.8 Fibonacci number in C++ via recursion, via Rcpp attributes and sourceCpp 12 1 A Gentle Introduction to Rcpp The new function sourceCpp() reads the code from the given source file, parses it for the relevant attributes, and creates the required wrappers before calling R to compile and link just like inline does. Note, however, that we did not have to specify a wrapper function, and that we obtain a single function fibonacci() which can operate through recursion. The new “Rcpp attributes” will be discussed more in Sect. 2.6 where we revisit the same example using the cppFunction() which operates on a character string containing the program, rather than a file. 1.2.6 A Second R Solution One elegant solution to retain the basic recursive structure of the algorithm without incurring the cost of repeated computation of the same value is provided by a method called memoization. Here, the Fibonacci number N is computed for each value between 1 and N − 1 and stored. On the next computation, the precomputed value is recalled, as opposed to starting the full recursion. An R solution with memoization can be written as follows (and is courtesy of Pat Burns) 1 3 5 7 9 11 13 ## memoization solution courtesy of Pat Burns mfibR <- local({ memo <- c(1, 1, rep(NA, 1000)) f <- function(x) { if (x == 0) return(0) if (x < 0) return(NA) if (x > length(memo)) stop("x too big for implementation") if (!is.na(memo[x])) return(memo[x]) ans <- f(x-2) + f(x-1) memo[x] <<- ans ans } }) Listing 1.9 Fibonacci number in R via memoization If a value for argument n has already been encountered, it is used. Otherwise, it is computed and stored in vector memo. This ensures that the recursive function is called exactly once for each possible value of n, which results in a dramatic speedup. 1.2.7 A Second C++ Solution We can also use memoization in C++. A simple solution is provided by the following piece of code: 2 ## memoization using C++ mincltxt <- ’ #include <algorithm> 1.2 A First Example 4 6 #include #include #include #include 13 <vector> <stdexcept> <cmath> <iostream> 8 10 12 14 16 18 20 22 24 26 28 30 class Fib { public: Fib(unsigned int n = 1000) { memo.resize(n); // reserve n elements std::fill( memo.begin(), memo.end(), NAN ); // set to NaN memo[0] = 0.0; // initialize for memo[1] = 1.0; // n=0 and n=1 } double fibonacci(int x) { if (x < 0) // guard against bad input return( (double) NAN ); if (x >= (int) memo.size()) throw std::range_error(\"x too large for implementation\"); if (! ::isnan(memo[x])) return(memo[x]); // if exist, reuse values // build precomputed value via recursion memo[x] = fibonacci(x-2) + fibonacci(x-1); return( memo[x] ); // and return } private: std::vector< double > memo; // internal memory for precomp. }; ’ 32 34 36 38 40 42 ## now use the snippet above as well as one argument conversion ## in as well as out to provide Fibonacci numbers via C++ mfibRcpp <- cxxfunction(signature(xs="int"), plugin="Rcpp", includes=mincltxt, body=’ int x = Rcpp::as<int>(xs); Fib f; return Rcpp::wrap( f.fibonacci(x-1) ); ’) Listing 1.10 Fibonacci number in C++ via memoization We define a very simple C++ class Fib with three elements: • A constructor which is called once upon initialization. • A single public member function which computes Fn . • A private data vector holding the memoization values. So this example provides a first glance at using classes in C++ code. In the actual wrapper function, we simply instantiate an object f of the class Fib and then invoke the member function to compute the given Fibonacci number. 14 1 A Gentle Introduction to Rcpp 1.2.8 A Third R Solution Naturally, we can also compute Fn using an iterative approach. A number of solutions are provided at the WikiBooks site3 so that setting up a simple R solution such as the following is straightforward: 2 4 6 8 10 12 ## linear / iterative solution fibRiter <- function(n) { first <- 0 second <- 1 third <- 0 for (i in seq_len(n)) { third <- first + second first <- second second <- third } return(first) } Listing 1.11 Fibonacci number in R via iteration The iterative solution improves further on the approach using memoization as it requires neither stateful memory nor recursion. 1.2.9 A Third C++ Solution Given the iterative solution in R , it is also straightforward to write as a C++ function as shown below. 2 4 6 8 10 12 14 ## linear / iterative solution fibRcppIter <- cxxfunction(signature(xs="int"), plugin="Rcpp", body=’ int n = Rcpp::as<int>(xs); double first = 0; double second = 1; double third = 0; for (int i=0; i<n; i++) { third = first + second; first = second; second = third; } return Rcpp::wrap(first); ’) Listing 1.12 Fibonacci number in C++ via iteration 3 http://en.wikibooks.org/wiki/Fibonacci_number_program. 1.3 A Second Example 15 For completeness, we also show a C++ solution that uses iterations. It is bound to be the fastest version yet as compiled loops generally execute faster than those from an interpreted language such as R. 1.3 A Second Example 1.3.1 Problem Setting Let us consider a second example. This example was motivated in a private communication with Lance Bachmeier who used it in an introductory econometrics class. This example is also included in the RcppArmadillo package. RcppArmadillo uses Rcpp to implement a very convenient and powerful interface between R and the Armadillo library (Sanderson 2010) for linear algebra with C++. Chapter 10 provides a more in-depth discussion of RcppArmadillo. The context of the example is a vector autoregressive process of order one for two variables, or in formal notation a VAR(1). More generally, a VAR model consists of a number K of endogenous variable xt . A VAR(p) process is then defined by a series of coefficient matrices A j with j ∈ 1, . . . , p such that xt = A1 xt−1 + . . . + A pxt−p + ut plus a possible non-time-series regressor matrix which is omitted here. We follow typographic convention of using lowercase letters for scalars, bold lowercase letters for vectors, and uppercase letters for matrices. For the example, we are considering the simplest case of a two-dimensional VAR of order one. At time t, it is comprised of two endogenous variables xt = (x1t , x2t ) which are a function of their previous values at t − 1 via a coefficient matrix A. As A is assumed to be constant, it no longer requires an index. This can be written as follows: xt = Axt−1 + ut (1.2) where xt and ut are time-varying vectors of size two and A is a two-by-two matrix. 1.3.2 R Solution When studying the properties of VAR systems, simulation is a tool that is frequently used to assess these models. And, for the simulations, we need to generate suitable data. A closer look at Eq. 1.2 reveals that we cannot easily vectorize the expression 16 1 A Gentle Introduction to Rcpp due to the interdependence between the two coefficients. As a result, we need to loop explicitly. 1 3 5 7 9 11 ## parameter and error terms used throughout a <- matrix(c(0.5,0.1,0.1,0.5),nrow=2) u <- matrix(rnorm(10000),ncol=2) ## Let’s start with the R version rSim <- function(coeff, errors) { simdata <- matrix(0, nrow(errors), ncol(errors)) for (row in 2:nrow(errors)) { simdata[row,] = coeff %*% simdata[(row-1),] + errors[row,] } return(simdata) } 13 rData <- rSim(a, u) # generated by R Listing 1.13 VAR(1) of order 2 generation in R This approach is pretty straightforward. The simulation function receives a 2 × 2 matrix a of parameters, and a vector u of size N × 2 of normally and independently distributed random error terms. It then creates a vector y, also of dimension N × 2, by looping from the second row to the last row. Each element of y is assigned the product of the previous row times the coefficient matrix plus the error terms as specified in equation 1.2. 1.3.3 C++ Solution The same basic approach can be used with a C++ function to generate the simulated VAR data. The next listing shows how to use RcppArmadillo via inline to compile, link and load C++ code on the fly into your R session. 2 4 6 8 10 12 14 ## Now load ’inline’ to compile C++ code on the fly suppressMessages(require(inline)) code <- ’ arma::mat coeff = Rcpp::as<arma::mat>(a); arma::mat errors = Rcpp::as<arma::mat>(u); int m = errors.n_rows; int n = errors.n_cols; arma::mat simdata(m,n); simdata.row(0) = arma::zeros<arma::mat>(1,n); for (int row=1; row<m; row++) { simdata.row(row) = simdata.row(row-1)*trans(coeff) + errors.row(row); } return Rcpp::wrap(simdata); ’ 16 ## create the compiled function 1.3 A Second Example 18 17 rcppSim <- cxxfunction(signature(a="numeric",u="numeric"), code,plugin="RcppArmadillo") 20 rcppData <- rcppSim(a,u) # generated by C++ code stopifnot(all.equal(rData, rcppData)) # checking results 22 Listing 1.14 VAR(1) of order 2 generation in C++ We initialize a matrix for the coefficients and a matrix of error terms from the supplied function arguments. We then create a results matrix of the same dimension as the error term matrix, and loop as before to fill this result matrix row-by-row, just as we did in the preceding solution. 1.3.4 Comparison We can run a comparison to determine the run-time of both approaches, as well as the run-time of a third hybrid solution using byte-compiled code (using the compiler package introduced with version 2.13.0 of the R system). 1 3 5 7 ## now load the rbenchmark package and compare all three suppressMessages(library(rbenchmark)) res <- benchmark(rcppSim(a,e), rSim(a,e), compRsim(a,e), columns=c("test", "replications", "elapsed", "relative", "user.self", "sys.self"), order="relative") Listing 1.15 Comparison of VAR(1) run-time between R and C++ The results, obtained from running the script varSimulation.r included as an example in the RcppArmadillo, are shown in Table 1.2. Table 1.2 Run-time performance of the different VAR simulation implementations Function rcppSim (byte-compiled) Rsim rSim N 100 100 100 Elapsed (s) Relative (ratio) 0.033 2.229 4.256 1.00 67.55 128.97 The C++ solution takes only 33 ms whereas the R code takes 4.26 s, or almost 130 times as much. Byte-compilation improves the R performance by a factor of almost two—yet the byte-compiled function still trails the C++ solution by a factor of about 67. 18 1 A Gentle Introduction to Rcpp 1.4 Summary This introductory chapter illustrated the appeal of using Rcpp to extend R with short and simple C++ routines. Code can be written in C++ which is very similar to the R code first used to prototype a solution. Thanks to tools such as the inline package and particularly the more recent Rcpp attributes, we can easily extend R with short C++ functions—and reap substantial performance gains in the process. The rest of the book will introduce Rcpp, as well as extension packages such as RcppArmadillo, in much more detail. The next section starts with a fuller discussion of the required tools. Chapter 2 Tools and Setup Abstract Chapter 1 provided a gentle introduction to Rcpp and some of its key features. In this chapter, we look more closely at the required toolchain of compilers and related R packages needed to deploy the Rcpp package. In particular, on Windows, the Rtools collection is used and non-gcc compilers are not supported. On Unix-alike systems such as Linux and OS X, gcc/g++ is the default. 2.1 Overall Setup The Rcpp package provides a C++ Application Programming Interface (API) as an extension to the R system. Because of these very close ties to R itself, it is both bound by the choices made by the R build system and influenced by how R is configured. Some of the requirements for working with Rcpp and R are: • The development environment has to comprise a suitable compiler (which is discussed more in the next section), as well as header files and libraries for a number of required components (R Development Core Team 2012a). • R should be built in a way that permits both dynamic linking and embedding; on Unix-alike systems this is typically ensured by the --enable-shared-lib option to configure (R Development Core Team 2012d, Chapter 8) and most binary distributions of R are built this way. • Common development tools such as make are needed which should be standard on Unix-alike systems (though OS X requires installation of developer tools) whereas Windows users will have to install the Rtools suite provided via the CRAN mirror network (R Development Core Team 2012a, Appendix D). In general, the standard environment for building a CRAN package from source is required. The (even stronger) requirement of being able to build R itself is a possible guideline as is documented in R Development Core Team (2012a,d). D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 2, © The Author 2013 19 20 2 Tools and Setup There are a few additional CRAN packages that are very useful along with Rcpp, and which the package itself depends upon. These are: inline which is invaluable for direct compilation, linking and loading of short code snippets, and used throughout this book too. rbenchmark which is used to run simple timing comparisons and benchmarks; it is also recommended by Rcpp but not required. RUnit which is used for unit testing; the package is recommended and will only be needed to rerun these tests but it is not strictly required. We already saw two of these packages in use in the preceding chapter. Lastly, users who want to build Rcpp from the repository source (rather than the distributed tarfile) also need the highlight binary by André Simon which is used to provide colored source code in several of the vignettes. 2.2 Compilers 2.2.1 General Setup A basic requirement for extending a program with suitable loadable binary modules relates to the compiler being used. But exactly what is suitable can depend on a number of factors. The choice of compilers generally matters, and more so for some languages than for others. The C language has a simpler interface for callable functions which makes it possible to have a program compiled with one compiler load a module built with another compiler. In general, this is not an option for C++ due to a much more complicated interface reflecting some of the richer structures in the C++ language. For example, how function names, and member function names inside classes, are represented is not standardized between compiler makers, and this generally prevents mixing of object code between different compilers. As Rcpp is of course a C++ application, this last restriction applies and we need to stick with the compilers used to build R on the different platforms. The CRAN repository generally employs the same approach of using one main compiler per platform, and this approach is the one supported by the CRAN maintainers and the R Core team. In practice, this means that on almost all platforms, the GNU Compiler Collection (or gcc, which is also the name of its C language compiler) has to be used along with the corresponding g++ compiler for the C++ language. One notable exception is Solaris where the Sun compiler can be used as well; however, this platform is not as widely available and used, and we will not discuss its particular aspects any further. Also, on Windows, the prescribed way to access the suitable compiler is via the Rtools package contributed by the Windows R maintainers (R Development Core Team 2012a, Appendix D). OS X is an exception as Apple will not ship gcc 2.2 Compilers 21 versions past 4.2.1. Its transition to the clang++ compiler of the LLVM project is not yet complete as this book is being written. Users on the OS X platform may have to download tools provided by Simon Urbanek, the R Core maintainer supporting OS X. So on Windows, OS X and Linux, the compiler of choice for Rcpp generally is the g++ compiler. A minimum suitable version is a final 4.2.* release; releases earlier than 4.2.* were lacking some C++ features used by Rcpp. Later versions are preferred as version 4.2.1 has some known bugs. But generally speaking, as of 2013 the (current) default compilers on all the common platforms are suitable. As of R version 2.12.0, the Windows platform has switched to version 4.5.1 of g++ in order to support both 32- and 64-bit builds. More advanced C++ features from the next C++ standard, C++11, which has recently been approved by the standards committee will become available once the compilers support them by default. 2.2.2 Platform-Specific Notes Windows Windows is both the most common platform for R use—yet quite possibly the hardest to develop on. The reason for this difficulty with R development on Windows is that the build environment and tools do not come standard with the operating system. However, due to the popularity of the platform, good support exists in the form of a third-party package kindly provided by some of the R Core developers who focus on Windows, namely Brian Ripley and Duncan Murdoch. The Rtools package, initially distributed via a site maintained by Duncan Murdoch but now available via the CRAN network, contains all the required tools in a single package. Complete instructions specific to Windows are available in the “R Administration” manual (R Development Core Team 2012a, Appendix D). To stress again what was hinted at above: other compilers are not supported on Windows. In particular, the popular series of compilers produced by Microsoft cannot be used to build R from source (for reasons that are beyond the scope of this discussion) as these compilers are simply not supported by R Core. While it may be possible to compile some C++ extensions for R using these compilers, the Rcpp package follows the recommendation of the R Core team and sticks to the officially supported compilers. So for the last few years, and presumably for the next few years too, this limits the choice on Windows to the version of g++ in the Rtools bundle. 22 2 Tools and Setup OS X OS X has become a popular choice of operating system among developers. As noted in the “R Administration” manual (R Development Core Team 2012a, Appendix C.4), the Apple Developer Tools (e.g., Xcode) have to be installed (as well as gfortran if R or Fortran-using packages are to be built). Some older versions of OS X do not have a C++ compiler that is recent enough for some of the template code in the Rcpp; releases starting from “Snow Leopard” should be sufficient. Unfortunately, Apple and the Free Software Foundation (the organization backing all the GNU software) are at an impasse over licensing. The GNU Compiler Collection now uses version 3 of GNU General Public License which Apple deems unsuitable for its operating system. As became cleat in 2011, it seems that g++ version 4.2.1 will be the last version available from Apple, which is unfortunate as more current g++ releases have made great strides towards adding new features of the upcoming C++ language standard. However, the clang++ compiler from the LLVM should eventually provide a full-featured replacement. Linux On Linux, developers need to install the standard development packages. Some distributions provide helper packages which pull in all the required packages; the r-base-dev package on Debian and Ubuntu is an example. In general, whatever tools are needed to build R itself will be sufficient to build Rcpp from source, and to build packages utilizing Rcpp. Other Platforms Few other platforms appear to be in widespread use. The CRAN archive runs regression tests against Solaris and its Sun compiler. However, as we do not have direct access to the platform, development and debugging of Rcpp is somewhat cumbersome on this platform. Moreover, we have not yet detected measurable interest among the population of possible users. That said, Rcpp plugs into general R facilities for building packages, and the clear intent is to have Rcpp install and work on every platform supported by R itself. 2.3 The R Application Programming Interface The R language and environment supports an application programming interface, or API for short. The API is described in the “Writing R Extensions” manual (R Development Core Team 2012d), and defined in the header files provided with every R installation. The R Core group usually stresses that only the public API should be used as other (undocumented) functions could change without notice. 2.4 A First Compilation with Rcpp 23 Several books describe the API and its use. Venables and Ripley (2000) is an important early source. Gentleman (2009) and Matloff (2011) are more recent additions, while Chambers (2008) is authoritative in the context of “Programming with Data.” There are two fundamental extension functions provided: .C() and .Call(). The first, .C() first appeared in an earlier version of the R language and is much more restrictive. It only supports pointers to basic C types which is a very severe restriction. More current code uses the richer .Call() interface exclusively. It can operate on the so-called SEXP objects, which stands for pointers to S expression objects. Essentially everything inside R is represented as such a SEXP object, and by permitting exchange of such objects between the C and C++ languages on the one hand, and R on the other hand, programmers have the ability to operate directly on R objects. This is key for Rcpp as well—and the principal reason why Rcpp works exclusively with .Call(). Rcpp essentially sits on top of this API offered by R itself and provides a complementary interface to those aiming to extend R. By leveraging facilities available to C++ programmers (but not in plain C), Rcpp can offer what we think is an easier to use and possibly even more consistent interface that is closer to the way R programmers work with their data. 2.4 A First Compilation with Rcpp Having discussed the required compiler and toolkit setup, and having seen introductory examples in Chap. 1, it is now appropriate to address how to use these tools on an actual source file. In doing so, we will use the explicit commands to illustrate the different steps required. Shorter and more convenient alternatives will be discussed later. We consider the first example from the introductory chapter and assume that both the fibonacci function and the wrapper have been saved in a file fibonacci.cpp. Then, on a 64-bit Linux computer with Rcpp installed in a standard location, we can compile it via the example shown in Listing 2.1. 2 4 6 8 10 sh> PKG_CXXFLAGS="-I/usr/local/lib/R/site-library/Rcpp/include" \ PKG_LIBS="-L/usr/local/lib/R/site-library/Rcpp/lib -lRcpp" \ R CMD SHLIB fibonacci.cpp g++ -I/usr/share/R/include -DNDEBUG \ -I/usr/local/lib/R/site-library/Rcpp/include \ -fpic -g -O3 -Wall -c fibonacci.cpp -o fibonacci.o g++ -shared -o fibonacci.so fibonacci.o \ -L/usr/local/lib/R/site-library/Rcpp/lib -lRcpp -Wl,-rpath,/usr/local/lib/R/site-library/Rcpp/lib \ -L/usr/lib/R/lib -lR Listing 2.1 A first manual compilation with Rcpp 24 2 Tools and Setup Execution of R CMD SHLIB triggers two distinct g++ invocations. The first command (on line four) corresponds to R CMD COMPILE to turn a given source file into an object file. The second command (on line eight) corresponds to R CMD LINK and uses the g++ compiler a second time to link the object file into a shared library. This creates the file fibonacci.so which we can load into R. Also note how two environment variables are defined on lines 1 and 2 to let R know where to find the header files and libraries required for use with Rcpp. But before we get to that step, let us review a few of the issues with the approach described here: 1. On line one, we have to set two environment variables, one each for the header file location (via PKG_CXXFLAGS) and one for the library location and name (via PKG_LIBS). 2. Both these variables use explicit path settings which are not portable across computers, let alone operating systems. 3. File extensions are operating-system dependent, the shared library ends on .so on Linux but .dylib under OS X. To address some of these concerns, Rcpp offers two helper functions which can be invoked using the scripting front-end Rscript. 2 sh> PKG_CXXFLAGS=‘Rscript -e ’Rcpp:::CxxFlags()’‘ \ PKG_LIBS=‘Rscript -e ’Rcpp:::LdFlags()’‘ \ R CMD SHLIB fibonacci.cpp Listing 2.2 A first manual compilation with Rcpp using Rscript Running the example in Listing 2.2 results in the same two commands as above. But this approach improves over the previous one: 1. By using commands to request information which is returned in a portable fashion freeing the user from having to specify these details. 2. The helper functions are part of the Rcpp package and can therefore impute the relevant locations in a portable manner. 3. Moreover, the helper functions also know the operating system details and therefore are able to supply the required per-operating system details such as file extensions. The end result is that we have a single command that works across platforms,1 including portably in a Makefile. So we can now use this file in R. 1 3 R> dyn.load("fibonacci.so") R> .Call("fibWrapper", 10) [1] 55 Listing 2.3 Using the first manual compilation from R 1 Well, Windows user may have to set the two environment variables differently but that is a shell limitation in Windows and not an issue with Rcpp. 2.5 The Inline Package 25 We can load the shared library via the dyn.load() function. It uses the full filename, including the explicit platform-dependent extension which is .so on Unix, .dll on Windows, and .dylib on OS X. Once the shared library is loaded into the R session, we can then call the function fibWrapper using the standard .Call() interface. We supply the argument n to compute the corresponding Fibonacci number and obtain the requested result. So this example proves the point we were trying to make in this section: we can extend R with simple C++ functions, even though the process of doing so may seem somewhat involved and intimidating at first. The inline package discussed in the next section and the Rcpp attributes extension discussed in the following section make the build process a lot more seamless to use. 2.5 The Inline Package We saw in the previous chapter how to compile, link, and load a new function for use by R. We will now look more closely at a tool first mentioned in that introductory chapter which greatly simplifies this process. 2.5.1 Overview Extending R with compiled code requires a mechanism for reliably compiling, linking, and loading the code. Doing this in the context of a package is preferable in the long run, but it may well be too involved for quick explorations. Undertaking the compilation manually is certainly possible. But, as the previous section showed, also somewhat laborious. A better alternative is provided by the inline package (Sklyar et al. 2012) which compiles, links, and loads a C, C++ , or Fortran function—directly from the R prompt using simple functions cfunction and cxxfunction. The latter provides an extension which works particularly well with Rcpp via the so-called plugins which provide information about additional header file and library locations; and a third function, rcpp, which defaults to selecting that plugin for use with Rcpp. The use of inline is possible as Rcpp itself can be installed and updated just like any other R package using, for example, the install.packages() function for initial installation as well as update.packages() for upgrades. So even though R / C++ interfacing would otherwise require source code, the Rcpp library is always provided ready for use as a pre-built library through the CRAN package mechanism.2 2 This presumes a platform for which pre-built binaries are provided. Rcpp is available in binary form for Windows and OS X users from CRAN, and as a .deb package for Debian and Ubuntu users. For other systems, the Rcpp library is automatically built from source during installation or upgrades. 26 2 Tools and Setup The library and header files provided by Rcpp for use by other packages are installed along with the Rcpp package. When building a package, the LinkingTo: Rcpp directive in the DESCRIPTION file lets R properly reference the header files automatically. That makes usage easier than for direct compilation via R CMD COMPILE or R CMD SHLIB (as in the previous section) where the function Rcpp:::CxxFlags() can be used to export the header file location and the appropriate -I switch. The Rcpp package also provides appropriate information for the -L switch needed for linking via the function Rcpp:::LdFlags(). It can be used by Makevars files of other packages, or to directly set the variables PKG_CXXFLAGS and PKG_LIBS, respectively. The inline package makes use of both these facilities. All of this is done behind the scenes without the need for explicitly setting compiler or linker options. Moreover, by specifying the desired outcome rather to explicitly encode it, we provide a suitable level of indirection that permits the Rcpp package to completely abstract away the operating system-specific components. Usage of Rcpp via inline is therefore as portable as R itself: the same code will run on Windows, OS X, and Linux (provided the required tools are present as discussed earlier). A standard example for a function extending R is a convolution of two vectors; this example is used throughout the “Writing R Extensions” manual (R Development Core Team 2012d). This convolution example can also be rewritten for use by inline as shown below. The function body is provided by the R character variable src, the function header (and its variables and their names) is defined by the argument signature, and we only need to enable plugin=="Rcpp" to obtain a new R function fun based on the C++ code in src: 1 3 5 7 9 11 13 15 R> src <- ’ + Rcpp::NumericVector xa(a); + Rcpp::NumericVector xb(b); + int n_xa = xa.size(), n_xb = xb.size(); + + Rcpp::NumericVector xab(n_xa + n_xb - 1); + for (int i = 0; i < n_xa; i++) + for (int j = 0; j < n_xb; j++) + xab[i + j] += xa[i] * xb[j]; + return xab; + ’ R> fun <- cxxfunction(signature(a="numeric", b="numeric"), + src, plugin="Rcpp") R> fun( 1:4, 2:5 ) [1] 2 7 16 30 34 31 20 Listing 2.4 Convolution example using inline With one assignment—albeit spanning lines one to eleven—to the R variable src, and one call of the R function cxxfunction (provided by the inline package), we have created a new R function fun that uses the C++ code we assigned to src—and all this functionality can be used directly from the R prompt making prototyping with C++ functions straightforward. 2.5 The Inline Package 27 Note that with version 0.3.10 or later of inline, a convenience wrapper rcpp is available which automatically adds the plugin="Rcpp" argument so that the invocation in Listing 2.4 could also have been written as fun <- rcpp(signature(a="numeric", b="numeric"), src) but we will generally use the cxxfunction() form. A few further options are noteworthy at this stage. Adding verbose=TRUE shows both the temporary file created by cxxfunction() and the invocations by R CMD SHLIB. This can be useful for debugging if needed. Listing 2. 5 shows the generated file. Noteworthy aspects include the function declaration with the randomized function name, and the signature with the two variable names implied from the signature() argument to cxxfunction. Also shown are the macros BEGIN_RCPP and END_RCPP discussed in Sect. 2.7. Other options permit us to set additional compiler flags as well as additional include directories as shown in the next section. 2.5.2 Using Includes As mentioned in the previous section, cxxfunction offers a number of other options. One aspect that we would like to focus on now is includes. As seen in Sect. 1.2.7, it allows us to include another block of code to, say, define a new struct or class type. An example is provided by the following code sample from the Rcpp FAQ which was created after a user question on the Rcpp mailing list. A simple templated class which squares its argument is created in a code snippet supplied via include. The main function then uses this templated class on two different types: 1 3 5 7 9 11 13 15 17 19 R> + + + + + ’ R> + + + + + + + + ’ R> + R> inc <- ’ template <typename T> class square : public std::unary_function<T,T> { public: T operator()( T t) const { return t*t ;} }; src <- ’ double x = Rcpp::as<double>(xs); int i = Rcpp::as<int>(is); square<double> sqdbl; square<int> sqint; Rcpp::DataFrame df = Rcpp::DataFrame::create(Rcpp::Named("x", sqdbl(x)), Rcpp::Named("i", sqint(i))); return df; fun <- cxxfunction(signature(xs="numeric", is="integer"), body=src, include=inc, plugin="Rcpp") fun(2.2, 3L) 28 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 2 Tools and Setup >> Program source : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : // includes from the plugin #include <Rcpp.h> #ifndef BEGIN_RCPP #define BEGIN_RCPP #endif #ifndef END_RCPP #define END_RCPP #endif using namespace Rcpp; // user includes // declarations extern "C" { SEXP file2370678f8cfe( SEXP a, SEXP b) ; } // definition SEXP file2370678f8cfe( SEXP a, SEXP b ){ BEGIN_RCPP Rcpp::NumericVector xa(a); Rcpp::NumericVector xb(b); int n_xa = xa.size(), n_xb = xb.size(); Rcpp::NumericVector xab(n_xa + n_xb - 1); for (int i = 0; i < n_xa; i++) for (int j = 0; j < n_xb; j++) xab[i + j] += xa[i] * xb[j]; return xab; END_RCPP } Listing 2.5 Program source from convolution example using inline in verbose mode 2.5 The Inline Package 21 29 x i 1 4.84 9 Listing 2.6 Using inline with include= This code example uses a few Rcpp items we have not yet encountered such as the DataFrame class or the static create method (and these will be discussed later). We again see the explicit converter Rcpp::as<>() used to access scalar types integer and double passed to C++ from R. More important is the definition of the sample helper class square. It derives from a public class std::unary_function templated to the same argument and return type. It also defines just one operator() which, unsurprisingly for a class called square, returns its argument squared. The example demonstrates that while cxxfunction may be of primary use for short and simple test applications, it can also be used to test in more complicated setups. In fact, the plugin structure discussed in the next section allows for even more customization, should it be needed. The RcppArmadillo, RcppEigen, and RcppGSL packages discussed in the final part of the book all use this facility via a plugin generator. 2.5.3 Using Plugins We have seen the use of the options plugin="Rcpp" in the previous examples. Plugins provide a general mechanism for packages using Rcpp to supply additional information which may be needed to compile and link the particular package. Examples may include additional header files and directories, as well as additional library names to link against as well as their locations. Without going into too much detail about how to write a plugin, we can easily illustrate the use of a plugin. Below is a example which shows the code underlying the fastLm() example from RcppArmadillo. We will rebuild it using cxxfunction from inline: 2 4 6 8 10 12 14 R> + + + + + + + + + + + + + + src <- ’ Rcpp::NumericVector yr(ys); Rcpp::NumericMatrix Xr(Xs); int n = Xr.nrow(), k = Xr.ncol(); arma::mat X(Xr.begin(), n, k, false); arma::colvec y(yr.begin(), yr.size(), false); arma::colvec coef = arma::solve(X, y); // fit y ˜ X arma::colvec res = y - X*coef; // residuals double s2 = std::inner_product(res.begin(),res.end(), res.begin(),double()) / (n - k); arma::colvec se = arma::sqrt(s2 * 30 16 18 20 22 24 2 Tools and Setup + arma::diagvec(arma::inv(arma::trans(X)*X))); + + return Rcpp::List::create(Rcpp::Named("coef")= coef, + Rcpp::Named("se") = se, + Rcpp::Named("df") = n-k); + ’ R> fun <- cxxfunction(signature(ys="numeric", Xs="numeric"), + src, plugin="RcppArmadillo") R> ## could now run fun(y, X) to regress y ˜ X Listing 2.7 A first RcppArmadillo example for inline This illustrates nicely how inline can be used to compile, link, and load packages on the fly, even when these packages depend on several other R packages. In the case of RcppArmadillo, which integrates the Armadillo C++ library, the dependency is on both RcppArmadillo and Rcpp. The plugin provides the necessary information to compile and link this example. 2.5.4 Creating Plugins A simple example of how to modify a plugin is provided in the Rcpp-FAQ vignette. This example is centered around using the GNU Scientific Library (or GNU GSL, or just GSL for short) along with R. The GSL is described in Galassi et al. (2010). The example here illustrates how to set a fixed header location. A more comprehensive example might also attempt to determine the location, possibly by querying the gsl-config helper script as done in the RcppGSL package discusses in Chap. 11. 2 4 6 8 10 12 14 16 18 20 22 R> gslrng <- ’ + int seed = Rcpp::as<int>(par) ; + gsl_rng_env_setup(); + gsl_rng *r = gsl_rng_alloc (gsl_rng_default); + gsl_rng_set (r, (unsigned long) seed); + double v = gsl_rng_get (r); + gsl_rng_free(r); + return Rcpp::wrap(v); + ’ R> plug <- Rcpp:::Rcpp.plugin.maker( + include.before = "#include <gsl/gsl_rng.h>", + libs = paste("-L/usr/local/lib/R/site-library/Rcpp/lib " + "-lRcpp -Wl,-rpath," + "/usr/local/lib/R/site-library/Rcpp/lib ", + "-L/usr/lib -lgsl -lgslcblas -lm", sep="")) R> registerPlugin("gslDemo", plug ) R> fun <- cxxfunction(signature(par="numeric"), + gslrng, plugin="gslDemo") R> fun(0) [1] 4293858116 R> fun(42) [1] 1608637542 R> Listing 2.8 Creating a plugin for use with inline 2.6 Rcpp Attributes 31 Here the Rcpp function Rcpp.plugin.maker is used to create a plugin named plug. We specify the inclusion of the GSL header file declaring the random number generator functions. We also specify the required libraries for linking against the GSL (with values suitable for a Linux system). Subsequently, the plugin is registered and deployed in a call to cxxfunction(). Finally, we test the new function and generate two random draws for two different initial seeds. 2.6 Rcpp Attributes A recent addition to Rcpp provides an even more direct connection between C++ and R. This feature is called “attributes” as it is inspired by a C++ extension of the same name in the new C++11 standard (which will be available to R users only when CRAN permits use of these extension, which may be years away). Simply put, “Rcpp attributes” internalizes key features of the inline package while at the same time reusing some of the infrastructure built for use by inline such as the plugins. “Rcpp attributes” adds new functions sourceCpp to source a C++ function (similar to how source is used for R code), cppFunction for a similar creation of a function from a character argument, evalCpp for a direct evaluation of a C++ expression and more. Behind the scenes, these functions make use of the existing wrappers as<> and wrap and do in fact rely heavily on them: any arguments with existing converters to or from SEXP types can be used. The standard build commands such as R RMD COMPILE and R CMD SHLIB are executed behind the scenes, and template programming is used to provide compile-time bindings and conversion. An example may illustrate this: 5 cpptxt <- ’ int fibonacci(const int x) { if (x < 2) return(x); return (fibonacci(x - 1)) + fibonacci(x - 2); }’ 7 fibCpp <- cppFunction(cpptxt) 1 3 # compiles, load, links, ... Listing 2.9 Example of new cppFunction cppFunction returns an R function which calls a wrapper, also created by cppFunction in a temporary file which it also builds. The wrapper function in turn calls the C++ function we passed as a character string. The build process administered by cppFunction uses a caching mechanism which ensures that only one compilation is needed per session (as long as the source code used is unchanged). Alternatively, we could pass the name of a file containing the code to the function sourceCpp which would compile, link, and load the corresponding C++ code and assign it to the R function on the left-hand side of the assignment. 32 2 Tools and Setup These new attributes can also use inline plugins. The following simple example uses the plugin for the RcppGSL package (which is discussed more fully in Chap. 11). The program itself is not that interesting: we merely use the definitions of five physical constants. 1 3 5 7 9 11 13 15 17 R>code <- ’ + #include <gsl/gsl_const_mksa.h> // decl of constants + + std::vector<double> volumes() { + std::vector<double> v(5); + v[0] = GSL_CONST_MKSA_US_GALLON; // 1 US gallon + v[1] = GSL_CONST_MKSA_CANADIAN_GALLON; // 1 Canadian gallon + v[2] = GSL_CONST_MKSA_UK_GALLON; // 1 UK gallon + v[3] = GSL_CONST_MKSA_QUART; // 1 quart + v[4] = GSL_CONST_MKSA_PINT; // 1 pint + return v; + }’ R> R> gslVolumes <- cppFunction(code, depends="RcppGSL") R> gslVolumes() [1] 0.003785412 0.004546090 0.004546092 0.000946353 0.000473176 R> Listing 2.10 Example of new cppFunction with plugin But as inline is very mature and tested, and as the attributes functions are at this point not of comparable maturity, the remainder of the book will continue to use inline and its slightly more verbose expression. Going forward more new documentation will probably be written using the new functions once the interface stabilizes. Transitioning from one system to the other is seamless as the examples above indicated. 2.7 Exception Handling C++ has a mechanism for handling exceptions. At a conceptual level, this is similar to what R programmers may already be familiar with via the tryCatch() function, or its simpler version try(). In essence, inside a segment of code preceded by the keyword try, an exception can be thrown via the keyword throw followed by an appropriately typed exception object which is typically inherited from the std::exceptions type. The following example may illustrate this. 1 3 5 7 extern "C" SEXP fun( SEXP x ) { try { int dx = Rcpp::as<int>(x); if (dx > 10) throw std::range_error("too big"); return Rcpp::wrap(dx * dx); } catch( std::exception& __ex__) { 2.7 Exception Handling forward_exception_to_r(__ex__); } catch(...) { ::Rf_error( "c++ exception (unknown reason)" ); } return R_NilValue; // not reached 9 11 13 33 } Listing 2.11 C++ example of throwing and catching an exception For reasons that will become apparent in a moment, we are showing a complete function rather than a just short snippet used with cxxfunction() from the inline package. If this function is compiled and linked (with appropriate flags to find the Rcpp headers and library), we can call it as 1 3 5 7 R> .Call("fun", 4) [1] 16 R> .Call("fun", -4) [1] 16 R> .Call("fun", 11) Error in cpp_exception(message = "too big", class = "std::range_error") : too big R> Listing 2.12 Using C++ example of throwing and catching an exception As the code tests only whether the argument is larger than 10, both 4 and −4 are properly squared by this (not very interesting) function. For the argument 11, however, the exception is triggered via the throw followed by exception of type std::range_error with a short text indicating that the argument is too large for the assumed parameter limitation. What happens after the throw is that a suitable catch() segment is identified. Here, as the exception was typed with a type inherited from the standard exception, the first branch is the one the code enters. The exception is then passed to an internal Rcpp function which converts it into an R error message. And indeed, at the R level, we see both that an exception was caught and what its type was. This is a very useful mechanism that permits the programmer to return control to the calling instance (here the R program) with a clearly defined message. We can illustrate this last point with a second example. What happens when we call the function with a non-numeric argument? 2 4 R> .Call("fun", "ABC") Error in cpp_exception(message = "not compatible with INTSXP", class = "Rcpp::not_compatible") : not compatible with INTSXP R> Listing 2.13 C++ example of example from Rcpp-type checks Here the function is called with a character variable which cannot be used in the assignment to the integer variable dx. So an exception is thrown by the templated Rcpp function as which is templated to an integer type (written as as<int>) 34 2 Tools and Setup here. The exception that is thrown is of type Rcpp::not_compatible which also inherits from the standard exception and a proper R error message is generated. Similar messages will be shown if the Rcpp types discussed in the next two chapters are instantiated with inappropriate types. If no matching type is found, the default catch branch is executed. Here, it simply calls the error function of the R API with a constant text message. Because the framework of the try statement (preceding the actual code block) and the catch clauses at the end are in fact invariant, they can also be expressed as a simple unconditional macro. Such macros are provided by Rcpp. Their definitions are shown in Listing 2.14. 2 #ifndef BEGIN_RCPP #define BEGIN_RCPP try{ #endif 4 6 8 10 12 #ifndef VOID_END_RCPP #define VOID_END_RCPP } \ catch (std::exception& __ex__) { \ forward_exception_to_r(__ex__); \ } \ catch(...) { \ ::Rf_error("c++ exception (unknown reason)"); \ } #endif 14 16 #ifndef END_RCPP #define END_RCPP VOID_END_RCPP return R_NilValue; #endif Listing 2.14 C++ macros for Rcpp exception handling These macros are also used by cxxfunction() so that the following function is fully equivalent to Listing 2.11. 1 3 5 7 9 src <- ’int dx = Rcpp::as<int>(x); if( dx > 10 ) throw std::range_error("too big"); return Rcpp::wrap( dx * dx); ’) fun <- cxxfunction(x="integer", body=src, plugin="Rcpp") fun(3) [1] 9 fun(13) Error: too big Listing 2.15 inline version of C++ example of throwing and catching an exception Thanks to inline, this version is much easier to compile, link, and load. And of course, an Rcpp attributes version can be written just as easily: 2 cppFunction(’ int fun2(int dx) { if ( dx > 10 ) 2.7 Exception Handling 4 6 8 10 35 throw std::range_error("too big"); return dx * dx; } ’) fun2(3) [1] 9 fun2(13) Error: too big Listing 2.16 Rcpp attributes version of C++ example of throwing and catching an exception The proper exception handling framework by Rcpp is provided automatically in both cases by adding the required code to the generated files. Part II Core Data Types Chapter 3 Data Structures: Part One Abstract This chapter first discusses the RObject class at the heart of the Rcpp class system. While RObject is not meant to be used directly, it provides the foundation upon which many important and frequently-used classes are built. We then introduce the two core vector types NumericVector and IntegerVector. Other related vector types are briefly discussed at the end of the chapter. 3.1 The RObject Class The RObject class occupies a central role in the implementation of the Rcpp class hierarchy. While it is not directly user-facing, it provides the common structure used by all the classes detailed below. It is the basic class underlying the Rcpp API. We will therefore discuss aspects of this class before we turn to the key classes derived from it. An instance of the RObject class encapsulates an R object. Every R object itself is internally represented by a SEXP: a pointer to a so-called S expression object, or SEXPREC for short. The “R Internals” manual (R Development Core Team 2012b, Section 1.1) provides a fuller treatment of the SEXP pointers to SEXPREC types, as well as of the related VECSXP or vectors of S expression pointers. One key aspect is that S expression objects are union types (which are sometimes called variant types). This means that depending on the particular value in a control field, different types can be represented. One could think of this as being similar to a switch statement where, conditional on the value of the expression, one out of a set of given branches is executed. With a union type, depending on the value of the control field, the remaining bits will be interpreted as forming the type implied by the control field. Consequently, this implies that an object pointed to by a SEXP could, for example, hold an integer vector, whereas another object could hold a character string, or another type including one of the several internal types. An important aspect of this representation is that SEXP objects should be considered opaque. In programming, this term usually refers to something which should D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 3, © The Author 2013 39 40 3 Data Structures: Part One be accessed and viewed only indirectly using helper functions. Such functions are provided by the R API for those using the C language. In particular, macros (in fact, two different sets of macros) are provided to access the SEXP types. Our Rcpp package extends this C language API with a higher-level abstraction provided via the C++ language. Similar to the C API, the Rcpp API contains a set of consistent functions which are appropriate for all types. Key among these functions are those that manage memory allocation and de-allocation. As a general rule, users of the Rcpp API never need to manually allocate memory, or free it after use. This is an extremely important point as memory management has been vilified as a common source of programming mistakes. For example, the C language is described by critics as too error-prone due to the explicit and manual memory management it requires. Languages such as Java or C# aim to improve upon C by managing the memory for the users. C++ occupies a middle ground: programmers can manually control memory management (which is important for performance-critical application), yet language constructs such as the Standard Template Library (see Sect. A.5 for a brief introduction) also provide control structures (such as vectors and lists) which—by providing a higher-level abstraction—free the programmer from the manual and error-prone aspects of memory allocation and de-allocation. Rcpp follows this philosophy, and the RObject class is a key piece in the implementation as it transparently manages memory allocation and de-allocation for the users. As for the actual implementation, the key aspect of the RObject class is that it is only a very thin layer around the SEXP type it encapsulates. This extends the opaque view approach of the R API by fully wrapping the underlying SEXP and providing the class member functions to access or modify it. One can think of the API provided by Rcpp as providing a richer, more complete means of accessing the underlying SEXP data representation which remains unaltered from the way R represents an object. In fact, the SEXP is indeed the only data member of an RObject.1 Hence, the RObject class does not interfere with the way R manages its memory. Nor does it not perform copies of the object into a different and possibly suboptimal C++ representation. Rather, it acts as a proxy to the object it encapsulates. With this, methods applied to the RObject instance are relayed back to the SEXP in terms of the standard R API functionality invoked by the proxy classes. The RObject class also takes advantage of the explicit life cycle of C++ objects: objects which are dynamically allocated (“on the heap” in C or C++ parlance) using a so-called constructor member function are then automatically deallocated at the end of local scope by the deconstructor. This lets the underlying R object (represented by the SEXP) be transparently managed by the R garbage collector. The RObject effectively treats its underlying SEXP as a resource. The constructor of the RObject class takes the necessary measures to guarantee that the underlying SEXP is protected from the garbage collector, and the destructor assumes the responsibility to withdraw that protection. Together, these two steps 1 See the header file include/Rcpp/RObject.h for details. 3.2 The IntegerVector Class 41 provide transparent and automatic memory management for the user. And, by assuming the entire responsibility of garbage collection, Rcpp relieves the programmer from having to write repetitive code to manage the protection stack with the familiar PROTECT and UNPROTECT macros provided by the R API. Aside from the memory management functionality, a number of helper functions are applicable to instances of the RObject class. As the underlying SEXP may be of different types, these member functions have to be applicable to any R object that can be represented by a SEXP, irrespective of its type. Several member functions are available for all classes that are deriving from the RObject class. The isNULL, isObject, and isS4 functions are used to query object properties. The explicit naming of these functions provides a first description; these functions return a true or false value depending on the object they are being applied to. Similarly, the member function inherits can be used to test for inheritance from a specified class. Attributes of R objects can be queried or set with the functions attributeNames, hasAttribute, or attr. For S4 objects,2 the hasSlot and slot functions permit the handling of the data slots that are a key feature of the S4 object system. A large number of user-visible classes derive from the RObject class: IntegerVector for vectors of type integer. NumericVector for vectors of type numeric. LogicalVector for vectors of type logical. CharacterVector for vectors of type character. GenericVector for generic vectors which implement List types. ExpressionVector for vectors of expression types. RawVector for vectors of type raw. For the integer and numeric types, we also have IntegerMatrix and NumericMatrix corresponding to the equivalent R types, and similarly implemented as vectors with associated dimension attributes specifying row and column sizes. We will discuss the integer and numeric vectors in some detail— including examples—in the next two sections. 3.2 The IntegerVector Class The IntegerVector class provides a natural mapping from and to the standard R integer vectors. We can assign existing R vectors to C++ objects, and we can create new integer vectors directly in C++ and return them to R. In both cases, the corresponding converter function—the templated as<>() function in the case of converting from R to C++ and the wrap() function for the inverse direction—are automatically called thanks to C++ template logic. 2 Member functions dealing with slots are only applicable to S4 objects; otherwise an exception is thrown. 42 3 Data Structures: Part One 3.2.1 A First Example: Returning Perfect Numbers Suppose we wanted to write a function that provides a vector with the first four perfect (and even) numbers. A perfect number is a positive integer that is equal to the sum of its divisors. The first one is six, as it is the sum of its divisors one, two, and three. The second perfect number is 28: 28 = 1 + 2 + 4 + 7 + 14 and two more even perfect numbers—496 and 8182—were already known by the ancient Greeks.3 With the help of the inline package introduced initially in Sect. 1.2.4 and more fully in Sect. 2.5, one can quickly create functions in R which are based on C++ code. The inline package takes the C++ source code as a character variable, and then compiles, links, and loads this code making it directly accessible via a function it returns. So here we assign five statements, separated by semicolons, to a single R character variable src. This variable is then passed to the cxxfunction() with additional arguments for a functions signature—empty in our case—and selection of the “Rcpp” plugin which will instruct cxxfunction() to look for the header files and library from the Rcpp package: 1 3 5 7 9 11 R> src <- ’ + Rcpp::IntegerVector epn(4); + epn[0] = 6; + epn[1] = 14; + epn[2] = 496; + epn[3] = 8182; + return epn; + ’ R> fun <- cxxfunction(signature(), src, plugin="Rcpp") R> fun() [1] 6 14 496 8182 Listing 3.1 A function to return four perfect numbers The example is of course not very meaningful—we could have created the same R vector in a single R statement. Yet the short program already highlights a few key points about the vector types: • Creating a new vector is as easy as selecting an initial size (and there are other creation methods). • Elements of the vector can be set one-by-one (and the new C++ standard C++11 will allow array-style assignments in one statement). • Returning the vector requires no additional code as the implicit version of wrap is being called. We will build upon this example in the next section. 3 More details are at http://en.wikipedia.org/wiki/Perfect_number. 3.2 The IntegerVector Class 43 3.2.2 A Second Example: Using Inputs The previous example showed how to create a new vector at the C++ level. Receiving a vector from R is also straightforward. Consider the next simple example which reimplements the prod() function for a given integer vector. Note that the use of the colon operator (:) creates as integer-valued sequence even though we do not use explicit integer instantiation via the L suffix (e.g., 10L). 1 3 5 7 9 11 R> src <- ’ + Rcpp::IntegerVector vec(vx); + int prod = 1; + for (int i=0; i<vec.size(); i++) { + prod *= vec[i]; + } + return Rcpp::wrap(prod); ’ R> fun <- cxxfunction(signature(vx="integer"), src, + plugin="Rcpp") R> fun(1:10) # creates integer vector [1] 3628800 Listing 3.2 A function to reimplement prod() The example shows a second possibility for instantiation of an IntegerVector object. In this case, through implicit use of the as<>() template function, the SEXP-typed argument vx is used. This variable is defined in the function signature via the first argument to cxxfunction. Through vx, the vector vec has been instantiated: It contains a copy of the pointer to the original SEXP object from R. It is important to stress that only the pointer to the underlying data is copied, not the underlying data itself. And given vec, it is straightforward to compute the product. We can also solve this problem using tools from the Standard Template Library (STL) as shown in the next example. 2 4 6 8 10 R> src <- ’ + Rcpp::IntegerVector vec(vx); + int prod = std::accumulate(vec.begin(),vec.end(), + 1, std::multiplies<int>()); + return Rcpp::wrap(prod); + ’ R> fun <- cxxfunction(signature(vx="integer"), src, + plugin="Rcpp") R> fun(1:10) # creates integer vector [1] 3628800 Listing 3.3 A second function to reimplement prod() This approach employs the accumulate() function, which is in the std namespace like the rest of the STL. It is called with iterators—which we can think of as functions which safely generalize the notion of a pointer—to the beginning and the end of the vector. This allows the function to operate on the thereby chosen 44 3 Data Structures: Part One range of elements. The next two arguments are an initial value of one, just like in the previous example, and a binary function, in this case the predefined function multiplies which is templated to the integer type to correspond to our vector type. Usage of STL may appear a little more complicated at first. But just how functional programming in R tends to become more natural with use, extended use of the STL is certainly a recommended programming habit for C++. These two code examples, though simple to understand and hence suitable for exposition, still have few flaws we would not use in more serious code. First, no test is made for NA or zero values in the vector. Second, the code is likely to do poorly on larger values due to integer overflow: even the sequence 1L:13L returns a result that is already different from what prod returns, so switching to computing the product as a sum of logarithms may be preferable, while possibly being more expensive to compute (yet this could be provided as an option). Third, and on a more cosmetic note, we could even have omitted the assignment to the temporary variable and returned the result from accumulate directly in wrap. 3.2.3 A Third Example: Using Wrong Inputs An important feature of the class hierarchy is the ability to test for conforming input types. Consider a function written for an integer vector (as above). What behavior shall we expect with types that are different? This could be a floating-point vector where an integer vector is expected: automatic conversion would be nice. But what behavior would be expected for clearly inappropriate types? It so happens that we can illustrate this behavior using some of the example programs shown above. Let us revisit the prod-replacement expecting an integer vector. 2 4 R> fun(1:10) [1] 3628800 R> fun(seq(1.0, 1.9, by=0.1)) [1] 1 Listing 3.4 Testing the prod() function with floating-point inputs The first result restates what we had seen before: floating-point number (which happen to be whole numbers) can be converted without a problem to the corresponding integers. The second example is more interesting: The product of ten floatingpoint number over the interval from 1.0 to 1.9 yields 1? How? The answers lie in the typical behavior of computers when confronted with floating-point and integer numbers. Here, we assigned a vector of floating-point numbers—all greater or equal to one—to an integer vector. Standard behavior in this case is truncation (rather than rounding). So the value 1.5 simply becomes 1. And consequently, the integer product of ten floating-point values between 1.0 and 1.9 is indeed 1 as the calculation reduces to 110 . But what happens when we use clearly inappropriate types? 3.3 The NumericVector Class 2 45 R> fun(LETTERS[1:10]) Error in fun(LETTERS[1:10]) : not compatible with INTSXP Listing 3.5 Testing the prod() function with inappropriate inputs An exception is thrown when the integer vector object is instantiated as no conversion from character to integer is possible. This exception is caught and then transformed into an R error message, while control (in the interactive session) resumes— as we discussed above in Sect. 2.7. In other words, the type-conversion code behind the Rcpp object hierarchy both tests for appropriate types and safely returns control to the R session in case an inadmissible type is used as input. Last but not least, we note that R integer vectors can be converted as easily into std::vector<int>. Similarly, the NumericVector type discussed in the next section can be converted in std::vector<double>. 3.3 The NumericVector Class 3.3.1 A First Example: Using Two Inputs NumericVector is quite possibly the most commonly used vector type among the Rcpp classes. It corresponds to the basic R type of a numeric vector and can hold real-valued floating-point variables. Its storage type is double, and all computation will be in double precision just as in R itself. As a first example, consider a simple generalization of a sum of squares calculation. Instead of always squaring the elements, we also pass an argument for the exponent. 2 4 6 8 10 12 14 R> src <- ’ + Rcpp::NumericVector vec(vx); + double p = Rcpp::as<double>(dd); + double sum = 0.0; + for (int i=0; i<vec.size(); i++) { + sum += pow(vec[i], p); + } + return Rcpp::wrap(sum); + ’ R> fun <- cxxfunction(signature(vx="numeric", dd="numeric"), + src, plugin="Rcpp") R> fun(1:4,2) [1] 30 R> fun(1:4,2.2) [1] 37.9185 Listing 3.6 A function to return a generalized sum of powers This example could also be rewritten using an STL algorithm. But using a custom-written transformation function would be a little more involved due to its C++ focus and detract us from examining the C++ and R integration. 46 3 Data Structures: Part One 3.3.2 A Second Example: Introducing clone One important aspect of the proxy model implementation mentioned above is that the C++ object contains a pointer to the underlying SEXP object from R, which is itself a pointer. This implies that code trying to transform a vector—say by taking logarithms—and wanting to return a modified copy along with the original vector cannot be written as follows where both vectors are constructed from the input argument: 1 3 5 7 9 11 13 15 R> src <- ’ + Rcpp::NumericVector invec(vx); + Rcpp::NumericVector outvec(vx); + for (int i=0; i<invec.size(); i++) { + outvec[i] = log(invec[i]); + } + return outvec; + ’ R> fun <- cxxfunction(signature(vx="numeric"), + src, plugin="Rcpp") R> x <- seq(1.0, 3.0, by=1) R> cbind(x, fun(x)) x [1,] 0.0000000 0.0000000 [2,] 0.6931472 0.6931472 [3,] 1.0986123 1.0986123 Listing 3.7 Declaring two vectors from the same SEXP type Modifications in outvec do, due to its underlying pointer sharing with the same underlying R object, also affect invec. Changes will therefore also affect the R object passed in as an argument. So while this lightweight proxy model makes for efficient code, we need different operations to create an independent second vector. The clone method is a suitable alternative as it allocates memory for a new object. Hence, changes do not propagate to the original vector: 2 4 6 8 10 12 14 16 R> src <- ’ + Rcpp::NumericVector invec(vx); + Rcpp::NumericVector outvec = Rcpp::clone(vx); + for (int i=0; i<invec.size(); i++) { + outvec[i] = log(invec[i]); + } + return outvec; + ’ R> fun <- cxxfunction(signature(vx="numeric"), + src, plugin="Rcpp") R> x <- seq(1.0, 3.0, by=1) R> cbind(x, fun(x)) x [1,] 1 0.0000000 [2,] 2 0.6931472 [3,] 3 1.0986123 R> Listing 3.8 Declaring two vectors from the same SEXP type using clone 3.3 The NumericVector Class 47 We should note that clone is a generic feature of vectors derived from RObject objects and applies to all objects instantiated from a SEXP. To close this point, we should also note that an even simpler form uses Rcpp sugar (discussed more fully in Chap. 8) to directly assign the result via a single vectorized call of the log() function: 1 3 5 7 9 11 13 R> src <- ’ + Rcpp::NumericVector invec(vx); + Rcpp::NumericVector outvec = log(invec); + return outvec; + ’ R> fun <- cxxfunction(signature(vx="numeric"), + src, plugin="Rcpp") R> x <- seq(1.0, 3.0, by=1) R> cbind(x, fun(x)) x [1,] 1 0.0000000 [2,] 2 0.6931472 [3,] 3 1.0986123 R> Listing 3.9 Using Rcpp sugar to compute a second vector We could even have omitted the declaration of, and assignment to, outvec and computed the result directly in the return statement thanks to the implicit use of wrap(). 3.3.3 A Third Example: Matrices Besides vectors, matrices play an equally important role in modeling, as they do in the underlying linear algebra derivations. Internally, matrices are implemented as vectors with an associated dimension attribute, just as they are in R itself. Similarly, the more general form is actually a multidimensional array—and the matrix is merely a special case where the dimension attribute has size two for both a row and column count. For example, a numeric vector of dimension three can be created as 2 Rcpp::NumericVector vec3 = Rcpp::NumericVector( Rcpp::Dimension(4, 5, 6)); Listing 3.10 Declaring a three-dimensional vector These multidimensional arrays can be useful for particular applications. However, we will focus more on matrices for their more common use in linear algebra and modeling. A first example illustrates the use of matrices and also shows the clone method discussed in the previous section. 48 2 4 6 8 10 12 14 16 3 Data Structures: Part One R> src <- ’ + Rcpp::NumericMatrix mat = + Rcpp::clone<Rcpp::NumericMatrix>(mx); + std::transform(mat.begin(), mat.end(), + mat.begin(), ::sqrt); + return mat; + ’ R> fun <- cxxfunction(signature(mx="numeric"), src, + plugin="Rcpp") R> orig <- matrix(1:9, 3, 3) R> fun(orig) [,1] [,2] [,3] [1,] 1.00000 2.00000 2.64575 [2,] 1.41421 2.23607 2.82843 [3,] 1.73205 2.44949 3.00000 R> Listing 3.11 A function to take square roots of matrix elements This also illustrates how the two-dimensional matrix is treated as a one-dimensional continuous vector (just like in R) in memory as the sqrt() function is being swept across all elements. Many of the “Rcpp sugar” extensions discussed more fully in Chap. 8 are also directly applicable to vectors and matrices. 3.4 Other Vector Classes 3.4.1 LogicalVector The LogicalVector class is very similar in behavior to IntegerVector as it represents the two possible values of a logical, or boolean, type. These values—True and False—can also be mapped to one and zero (or more generally to “not zero” and zero). However, as R generally supports missing values in its data structures, the LogicalVector has to support this—and is, in fact, seen as supporting three rather than two possible values. Listing 3. 12 illustrates this as it shows how the other nonfinite values NaN, Inf, and NA all collapse into NA in the context of a logical vector. 2 4 6 8 R> fun <- cxxfunction(signature(), plugin="Rcpp", + body=’ + Rcpp::LogicalVector v(6); + v[0] = v[1] = false; + v[1] = true; + v[3] = R_NaN; + v[4] = R_PosInf; + v[5] = NA_REAL; 3.4 Other Vector Classes 10 12 14 49 + return v; + ’) R> fun() [1] FALSE TRUE FALSE NA NA NA R> identical(fun(), c(FALSE, TRUE, FALSE, rep(NA, 3))) [1] TRUE R> Listing 3.12 A function to assign a logical vector The example shows that assigning any of three possible nonfinite values NaN, Inf (which can be positive or negative), or NA (which is commonly defined for real-valued variables only) results in NA value in the logical vector. The example also illustrates, by means of using the R function identical, that the values returned from an Rcpp-created function are indistinguishable from those created directly in R itself. 3.4.2 CharacterVector The class CharacterVector can be used for vectors of R character vectors (“strings”). 1 3 5 7 9 11 R> fun <- cxxfunction(signature(), plugin="Rcpp", + body=’ + Rcpp::CharacterVector v(3); + v[0] = "The quick brown"; + v[1] = "fox"; + v[2] = R_NaString; + return v; + ’) R> fun() [1] "The quick brown" "fox" NA R> Listing 3.13 A function to assign a character vector And similar to the other vectors, CharacterVector can hold its primary type, here strings, as well as NA value. Character vectors can also be converted to std::vector<std::string>. 3.4.3 RawVector The RawVector can be very useful when “raw” bytes have to be used, for example in a networked application that transmits them to another application or program running on another machine. As such a use case is somewhat more specialized, we are not showing a full example here. Chapter 4 Data Structures: Part Two Abstract This chapter introduces several other important classes such as List, DataFrame, Function, and Environment which both correspond to key R language objects and have an underlying SEXP representation. The previous chapter discussed fundamental Rcpp classes centered around the basic Vector classes, covering anything from integers and numeric to raw types, logicals vector and string vectors. It also touched upon multidimensional vectors and the important special case of matrices. In this chapter, we extend the analysis to more data types. After introducing a useful helper class, we continue with what seems like yet another vector, but is really the all-important List type. This is followed by the DataFrame class, before we turn to different types of classes with Function and Environment, respectively. We then briefly discuss S4 and Reference Classes before ending the chapter with the R mathematics functions. 4.1 The Named Class The Named class is a helper class used for setting the key side of key/value pairs. It corresponds to standard R use. Consider this simple example 1 3 5 R> someVec <- c(mean=1.23, dim=42.0, cnt=12) R> someVec mean dim cnt 1.23 42.00 12.00 R> Listing 4.1 A named vector in R where three elements are assigned in a vector, and each is also assigned a specific identifier or label. The Named class permits us to do the same thing for objects created in C++ which we want to return to the calling R function with such labels. D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 4, © The Author 2013 51 52 4 Data Structures: Part Two Several examples of this class will follow in the next sections. As a first illustration, consider the vector shown above which could be created at the C++ level as follows: 1 3 5 7 9 11 R> src <- ’ + Rcpp::NumericVector x = + Rcpp::NumericVector::create( + Rcpp::Named("mean") = 1.23, + Rcpp::Named("dim") = 42, + Rcpp::Named("cnt") = 12); + return x; ’ R> fun <- cxxfunction(signature(), src, plugin="Rcpp") R> fun() mean dim cnt 1.23 42.00 12.00 Listing 4.2 A named vector in C++ We can shorten the somewhat verbose coding style by • Declaring using namespace Rcpp; to import the Rcpp namespace (and we should note that the cxxfunction() function from the inline package also does that when the “Rcpp” plugin is selected) • Employing the shortcut form _["key"] which allows us to rewrite the example as 1 3 5 7 9 R> src <- ’ + NumericVector x = NumericVector::create( + _["mean"] = 1.23, + _["dim"] = 42, + _["cnt"] = 12); + return x; ’ R> fun <- cxxfunction(signature(), src, plugin="Rcpp") R> fun() mean dim cnt 1.23 42.00 12.00 Listing 4.3 A named vector in C++ , second approach We may switch between the more explicit and the shortened style for some of the following examples. 4.2 The List aka GenericVector Class The GenericVector is equivalent to the List type. This is the most general data type which can contain other types—and corresponds to how a list() in R can contain objects of different types. Objects of type List can also contain other objects of different lengths (and, this is sometimes called a “ragged array” if vectors of different length are parts of the same object). Furthermore, List objects can contain other List objects, which allow for arbitrary nesting of data structures. 4.2 The List aka GenericVector Class 53 Being able to hold different types of objects makes the List type suitable for parameter exchanges in either direction between R and C++ . 4.2.1 List to Retrieve Parameters from R Consider the following example, taken from the general-purpose optimization package RcppDE (Eddelbuettel 2012b). It has been somewhat simplified by removing a number of similar argument types, and by removing a second layer dealing with exception handling. 2 4 6 8 10 12 14 16 18 RcppExport SEXP DEoptim(SEXP lowerS, SEXP upperS, SEXP fnS, SEXP controlS, SEXP rhoS) { Rcpp::NumericVector f_lower(lowerS), f_upper(upperS); Rcpp::List control(controlS); double VTR = Rcpp::as<double>(control["VTR"]); int i_strategy = Rcpp::as<int>(control["strategy"]); int i_itermax = Rcpp::as<int>(control["itermax"]); int i_D = Rcpp::as<int>(control["npar"]); int i_NP = Rcpp::as<int>(control["NP"]); int i_storepopfrom = Rcpp::as<int>(control["storepopfrom"])-1; int i_storepopfreq = Rcpp::as<int>(control["storepopfreq"]); int i_specinitialpop = Rcpp::as<int>(control["specinitialpop"]); Rcpp::NumericMatrix initialpopm = Rcpp::as<Rcpp::NumericMatrix>(control["initialpop"]); double f_weight = Rcpp::as<double>(control["F"]); double f_cross = Rcpp::as<double>(control["CR"]); [...] } Listing 4.4 Using the List class for parameters Here two vectors for upper and lower parameter bounds are directly initialized from one SEXP each as we have seen in the previous chapter. The SEXP variable controlS is assigned to a Rcpp::List variable named control. This object contains a large set of user-supplied parameters to control the optimization. The List type allows for access by named string “key,” similar to how we would extract a named entry from a list in R using the [["key"]] operator. Here in C++, we obtain an element of the list which will generally be of SEXP type—and we can use the explicit conversion function as<>() along with a template type to assign the value. For example, the first parameter is a floating-point variable keyed to the name “VTR” which we assign to a double. Similarly, several count variables denoting sizes, dimensions, numbers of iterations, etc. are assigned to more integer-valued variable. However, the list also contains a genuine numeric matrix under the key “initialpop.” RcppDE performs differential optimization, an evolutionary algorithm related to genetic algorithms but particularly suitable for floating-point representations. These types of optimization algorithms operate on populations of poten- 54 4 Data Structures: Part Two tial solutions—and the matrix indexed by “initialpop” can be used to initialize the algorithm with an initial set of potential solutions. Finally, two more floating-point control parameters are assigned. 4.2.2 List to Return Parameters to R We can use an example from the same package, RcppDE, to illustrate how to return values to C++ as well. 1 3 5 7 9 return Rcpp::List::create(Rcpp::Named("bestmem") = t_bestP, Rcpp::Named("bestval") = t_bestC, Rcpp::Named("nfeval") = l_nfeval, Rcpp::Named("iter") = i_iter, Rcpp::Named("bestmemit") = t(d_bestmemit), Rcpp::Named("bestvalit") = d_bestvalit, Rcpp::Named("pop") = t(d_pop), Rcpp::Named("storepop") = d_storepop); Listing 4.5 Using a List to return objects to R Because we do not show the declaration of the variables, we cannot tell immediately what their types are. But the beauty of the List type is that all types that can be converted to a SEXP are admissible! Here we have Armadillo vectors (t_bestP) and matrices (d_bestmemit, d_pop) thanks to RcppArmadillo, as well as standard long (_nfeval), double (t_bestC) and int (i_iter) scalars. The use of Rcpp::List::create() is fairly idiomatic. It permits us to create a list on the fly, with its size determined at compile-time by the number of name = value pairs we have supplied. However, as we have seen in earlier examples, an alternate method of setting elements is also available by first reserving a sufficiently dimensioned list (and the same is of course true for vectors). This can be done directly using the constructor as in Rcpp::List ll(4), where four elements would be reserved. A second possibility is to use the reserve() member function to specify a size. Once sufficient space has been reserved, we can then assign these using the standard square bracket operator []. Needless to say, the square bracket operator cannot assign elements beyond the pre-reserved size range. An alternative insertion method is provided by two functions modeled after equivalent STL functions: push_back() which insert the given element at the back—and, thereby, extends the vector or list by one element—and push_front() which inserts at the front and similarly extends the size by one. It should, however, be noted that these may alter the vectors. And as vectors are implemented in contiguous memory, this will in most cases result in a complete copy of the whole vector. In other words, the operations push_back and push_front operations can be relatively inefficient (due to the underlying memory model of SEXP types) and are provided mainly for convenience. 4.3 The DataFrame Class 55 4.3 The DataFrame Class Data frames are an essential object type in R and are used by almost all modeling functions, so naturally Rcpp supports this type too. Internally, data frames are represented as lists. This permits the data frame to contain data of different types. For example, a data frame may contain time stamps, real-valued measurements as well as, say, group identifications encoded as a factor. The different columns will always be recycled to have common length. To take an example, if we insert a vector of length four into a data frame followed by a vector of length two, the latter vector will be repeated a second time to also have length four (and this recycling at construction is done only for integer multiples). Having common length is an important feature, as other functions can always assume that data frames are rectangular. Rows are commonly seen as observations with columns representing variables. We have already seen one example of a data frame creation earlier in Sect. 2.5.2 where the static create function was used. A similar example, taken from one of the unit tests, is 1 3 5 7 9 11 13 15 R> src <- ’ + Rcpp::IntegerVector v = + Rcpp::IntegerVector::create(7,8,9); + std::vector<std::string> s(3); + s[0] = "x"; + s[1] = "y"; + s[2] = "z"; + return Rcpp::DataFrame::create(Rcpp::Named("a")=v, + Rcpp::Named("b")=s); + ’ R> fun <- cxxfunction(signature(), src, plugin="Rcpp") R> fun() a b 1 7 x 2 8 y 3 9 z Listing 4.6 Using the DataFrame class Otherwise, the data frame type can really be seen as a specialization of the list type, with the added restrictions of excluding nesting types and of imposing common length. While the latter is achieved via recycling in R, in C++ we have to ensure that each component of a data.frame is of the same length. Data frames are very useful to concisely organize return data for further use by R, as well as a standard data representation for many modeling functions. 56 4 Data Structures: Part Two 4.4 The Function Class 4.4.1 A First Example: Using a Supplied Function A function object is needed whenever an R function—either supplied by the user or by accessing an R function—is employed. Consider this simple first example which uses the sort() function (passed as an argument from R ) and applies it to a user-supplied vector: 2 4 6 8 10 R> src <- ’ + Function sort(x) ; + return sort( y, Named("decreasing", true)); + ’ R> fun <- cxxfunction(signature(x="function", + y="ANY"), + src, plugin="Rcpp") R> fun(sort, sample(1:5, 10, TRUE)) [1] 5 5 5 3 3 3 2 2 2 1 R> fun(sort, sample(LETTERS[1:5], 10, TRUE)) [1] "E" "E" "C" "B" "B" "B" "B" "B" "A" "A" Listing 4.7 Using a Function passed as argument On line two a C++ variable named sort is initialized from the object x that is of type function. The object named y is passed through as is; we never instantiate a C++ object with it. After compiling and loading this function, we pass the R function sort() as the first argument. Other suitable functions such as order() could also be used. Another useful point to note is that because the second argument is never instantiated, we can pass different types of a suitable nature. In the example above, both integer and character vectors are passed in as randomized permutations, and both are being returned in decreasing sort order. This works because no Rcpp object is instantiated and hence no particular type is encoded (or even enforced via static typing). So no tests for matching types are executed and no exceptions are thrown—as was the case in Sect. 3.2.3—because no mismatched type is encountered. This example shows that passing the original SEXP type through can have its uses too. 4.4.2 A Second Example: Accessing an R Function The function class can also be used to access R functions directly. In the example below, we draw five random numbers from a t-distribution with three degrees of freedom. As we are accessing the random number generators, we need to ensure that it is in a proper state. The RNGScope class ensures this by initializing the random number generator by calling the GetRNGState() function from the class constructor, and by restoring the initial state via PutRNGState() via its destructor (R Development Core Team 2012d, Section 6.3). 4.5 The Environment Class 1 3 5 7 9 11 57 R> src <- ’ + RNGScope scp; + Rcpp::Function rt("rt"); + return rt(5, 3); + ’ R> fun <- cxxfunction(signature(), + src, plugin="Rcpp") R> set.seed(42) R> fun() [1] 2.339681 0.130995 -0.074028 -0.057701 -0.046482 R> fun() [1] 9.16504 1.08153 0.87017 1.99557 -0.22438 Listing 4.8 Using a Function accessed from R We first instantiate a function object by giving it a string with the name of the R function we want to access. If the function is not globally accessible, we may need to access the corresponding namespace first. Regarding the random number generation, it is also important to note that executing the equivalent commands in R itself, that is, set.seed(42) followed by rt(10,3) to generate ten random numbers from the t-distribution with three degrees of freedom, generates exactly the same ten numbers. This is key for reproducibility and also helps with debugging. 4.5 The Environment Class The environment provides access to environments, a type of object which may be familiar to R programmers. Environments are defined in Section 2.1.10 of the “R Language” manual (R Development Core Team 2012c). Their role in variable lookup and their relationship to namespaces are described in Section 1.2 of the “R Internals” manual (R Development Core Team 2012b). As a first example of using an environment, we consider the following example where we instantiated the stats namespace of R to access the rnorm() function: 2 Rcpp::Environment stats("package:stats"); Rcpp::Function rnorm = stats["rnorm"]; return rnorm(10, Rcpp::Named("sd", 100.0)); Listing 4.9 Using a Function via an Environment As the preceding section showed, such a two-step approach may not be needed as we can also use the Function class in Rcpp to search for an identifier. However, it is useful to create and initialize environments, or to use environments to access variables in the current R session. A second example interfaces the global environment in R: 58 1 3 Rcpp::Environment global = Rcpp::Environment::global_env(); std::vector<double> vx = global["x"]; 7 std::map<std::string,std::string> map; map["foo"] = "oof"; map["bar"] = "rab"; 9 global["y"] = map; 5 4 Data Structures: Part Two Listing 4.10 Assigning in the global environment Here an instance of the global environment is created using the variable name global. It is used to access a variable x in R via direct lookup. Similarly, we create a map from string to string of size two and assign it to a symbol y in the global environment. 4.6 The S4 Class R as a programming language has evolved over several decades. The first big step towards object-oriented programming was provided by what is known as the “White Book” (Chambers and Hastie 1992) which introduced S3 classes and methods. While particular to R and the underlying S programming language, and thus different from object-oriented programming notions in C++ or Java, S3 methods provide a simple approach that is still supported and is widely used. The basic feature is called method dispatch, implemented via what is known as a “generic function” which invokes corresponding methods driven by the class of the data type. A useful introduction to this approach is provided byVenables and Ripley (2000, Chapter 4). A more ambitious approach to object-oriented programming was then added with the introduction of S4 classes in the “Green Book” (Chambers 1998); a more recent treatment is provided by Chambers (2008, Chapters 9 and 10). These classes have also been in R for well over a decade and provide a significant extension to the preceding object-oriented programming framework available to the R programmer. S4 classes offer a rich structure with a more rigid formalism, at the cost of some of the flexibility of the earlier and somewhat more ad hoc S3 class types. However, S4 offers more structure which may be needed for larger programming tasks. As indicated towards the end of Sect. 3.1, Rcpp can access and modify S4 objects using the Rcpp::S4 class type. Getting and setting slot types of S4 objects is supported, as are various tests for object properties such S4 objects. Listing 4. 11 shows how to test if RObject object is in fact an S4 object, how to test for presence of a slot, and how to access a slot. 1 3 f1 <- cxxfunction(signature(x="any"), plugin="Rcpp", body=’ RObject y(x) ; List res(3) ; res[0] = y.isS4(); 4.7 ReferenceClasses 59 res[1] = y.hasSlot("z"); res[2] = y.slot("z"); return res; 5 7 ’) Listing 4.11 A simple example for accessing S4 class elements Similarly, Listing 4.12 shows how to create an S4 object at the C++ level 2 4 f2 <- cxxfunction(signature(x="any"), plugin="Rcpp", body=’ S4 foo(x); foo.slot(".Data") = "foooo"; return foo; ’) Listing 4.12 A simple example for accessing S4 class elements So while the basic facilities to access, alter, or create S4 objects exist, it may often remain easier to do so at the R code level. So a more common paradigm may be to compute and create the core C++ aspects of an object at the C++ level and to complement the objects at the R language level. Given that Rcpp functionality is often accessed from R functions, additional R code can then be executed at the R level after returning from a C++ function. S4 classes have been used extensively in a number of CRAN packages. The BioConductor Project uses them throughout a large number of its packages as well. 4.7 ReferenceClasses ReferenceClasses appeared with R version 2.12.0 and complete the set of objectoriented programming paradigms in R by adding a style more similar to what is found in C++ or Java. As of late 2012, the best documentation for ReferenceClasses is provided by invoking help(ReferenceClasses) in R; the topic is still undergoing changes. ReferenceClasses are implemented using S4 methods and classes and are therefore related to S4 at least in the sense of some of the implementation details. ReferenceClasses are also related to “Rcpp modules” (covered later in Chap. 7) which use them as a representation. Two key aspects of ReferenceClasses are (a) that they are mutable which differs from the standard “copy on write” semantics in the R system and (b) that methods are primarily associated with objects rather than functions. Their mutable state makes ReferenceClasses objects good candidates for use in applications that require to track a “state.” Typical examples are graphical user interface applications or servers. With the mutable state also comes the pass-by-reference semantics: rather than copying the whole object (when a component changes), only the reference to the object is copied. This is closer to how C++ and Java objects behave. Similarly, the association of methods to the underlying objects also more closely resembles the object-oriented design philosophies of C++ and Java. 60 4 Data Structures: Part Two ReferenceClasses are still undergoing changes and more definitive documentation may be forthcoming. Because this area of R programming has not yet settled as firmly, we will limit the discussion of Rcpp in this context to this brief overview. 4.8 The R Mathematics Library Functions R provides a large number of mathematical and statistical functions described in the header file Rmath.h. As documented in R Development Core Team (2012d, Section 6.16), these functions can be used from a stand-alone library independent of R itself. Naturally, they can also be used with the R API. R programmers may want to access these functions from C++ code too. Rcpp provides these via the ”Rcpp sugar” extension described below in Chap. 8 as vectorized functions. In order to use them on atomistic double types, the prefix Rf was required. Starting with Rcpp release 0.10.0, these functions are also accessible via the R namespace. By using a distinct namespace, it is possible to cleanly reuse the same identifiers without the need for a remapping prefix such as Rf even though the functions are provided only via a more limited C language API. The following example illustrates this use. For a given vector X, the probability function of the Normal distribution is computed. 1 #include <Rcpp.h> 3 extern "C" SEXP mypnorm(SEXP xx) { Rcpp::NumericVector x(xx); int n = x.size(); Rcpp::NumericVector y1(n),y2(n),y3(n); 5 7 for (int i=0; i<n; i++) { // accessing function via remapped R header y1[i] = ::Rf_pnorm5(x[i], 0.0, 1.0, 1, 0); 9 11 // or accessing same function via Rcpp’s ’namespace R’ y2[i] = R::pnorm(x[i], 0.0, 1.0, 1, 0); 13 } // or using Rcpp sugar which is vectorized y3 = Rcpp::pnorm(x); 15 17 return Rcpp::DataFrame::create(Rcpp::Named("R")=y1, Rcpp::Named("Rf_")=y2, Rcpp::Named("sugar")=y3); 19 21 } Listing 4.13 Example use of Rmath.h functions A vector x is instantiated, its size determined, and three return vectors are allocated. Each will contain the corresponding pnorm value. On line nine, the new form R::pnorm() is used. It is identical to the function described in the documentation 4.8 The R Mathematics Library Functions 61 for the R API, but provided from a distinct namespace R. Line ten shows the older approach: a global identifier (as indicated by ::) with the prefix Rf which is effectively a primitive C language attempt at creating a namespace separation when no such feature exists in the language. Both functions operate on a single double-type variable at a time; a loop is required to compute all elements of the vector. However, line twelve provides a preview of “Rcpp sugar” as detailed in Chap. 8. This version is in fact vectorized and all components of the vector y3 are assignment in a single (vectorized) operation just as one would in R but at the speed of C++. The R namespace contains a large number of probability, density, and quantile functions, as well as corresponding random numbers, for a wide variety of distributions: Normal, Uniform, Gamma, Beta, LogNormal, Chi-squared, F, t, Binomial, Multinomial, Cauchy, Exponential, Geometric, Hypergeometric, Negative Binomial, Poisson, Weibull, Non-central Beta, Non-central F, Non-central t, Studentized Range (also known as “Tukey”), Wilcoxon Rank Sum, Wilcoxon Signed Rank, as well as a number of related functions. The header file Rmath.h within the Rcpp header directory provides full details. Part III Advanced Topics Chapter 5 Using Rcpp in Your Package Abstract This chapter provides an overview of how to use Rcpp when writing an R package. It shows how using the function Rcpp.package.skeleton() can create a complete and self-sufficient example of a package using Rcpp. All components of the directory tree created by Rcpp.package.skeleton() are discussed in detail. A brief case study of an existing CRAN package concludes the chapter. This chapter complements the Writing R Extensions manual (R Development Core Team 2012d) which is the authoritative source on how to extend R in general. 5.1 Introduction Rcpp helps to extend R by offering an easy-to-use yet featureful interface between C++ and R. Rcpp itself is distributed as an R package. However, it is somewhat different from a traditional R package because its key component is a C++ library along with a set of header files defining the library interface. A client package that wants to make use of the Rcpp features must link against this library provided by Rcpp. For most of the previously examined examples, we have relied on the inline package to take care of the finer details of the dependencies on the Rcpp package itself. It should be noted that R has only limited support for C and C++-level dependencies between packages (R Development Core Team 2012d). The LinkingTo declaration in the package DESCRIPTION file allows the client package to access the headers of the target package (here Rcpp), but support for linking against a library is not provided by R (outside of a function registration setup more suitable for C-only packages with a limited number of registered interface functions) and has to be added manually. The discussion in this chapter follows the Rcpp.package.skeleton() function to show a recommended way of using Rcpp from a client package. We illustrate this using a simple C++ function which will be called by an R D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 5, © The Author 2013 65 66 5 Using Rcpp in Your Package function. The material in the Writing R Extensions manual (R Development Core Team 2012d) is strongly recommended. Other documents on R package creation (for example Leisch 2008) are helpful as well concerning the package build process. R package creation is standardized and follows a logical pattern—but is not all that well documented leading beginners to experience some frustration. A basic understanding of how to create an R package helps when trying to add the additional information needed on how to use the Rcpp package in such add-on packages. The working example provided in the next few sections provides a complete illustration of the process and can serve as a reference use case. 5.2 Using Rcpp.package.skeleton 5.2.1 Overview Rcpp provides a function Rcpp.package.skeleton, modeled after the base R function package.skeleton, which facilitates creation of a so-called skeleton package using Rcpp. A skeleton package is a minimal package providing a working example which can then be adapted and extended as needed by the user. Rcpp.package.skeleton has a number of arguments documented on its help page (and similar to those of package.skeleton). The main argument is the first one which provides the name of the package one aims to create by invoking the function. An illustration of a call using the argument mypackage is provided below. 2 4 6 8 10 12 14 16 18 20 22 R> Rcpp.package.skeleton( "mypackage" ) Creating directories ... Creating DESCRIPTION ... Creating NAMESPACE ... Creating Read-and-delete-me ... Saving functions and data ... Making help files ... Done. Further steps are described in ’./mypackage/Read-and-delete-me’. Adding Rcpp settings >> added Depends: Rcpp >> added LinkingTo: Rcpp >> added useDynLib directive to NAMESPACE >> added Makevars file with Rcpp settings >> added Makevars.win file with Rcpp settings >> added example header file using Rcpp classes >> added example src file using Rcpp classes >> added example R file calling the C++ example >> added Rd file for rcpp_hello_world R> Listing 5.1 A first Rcpp.package.skeleton example 5.2 Using Rcpp.package.skeleton 67 We can use the (Linux) command ls -1R to recursively list the directory and file structure created by this command: 2 4 6 8 10 12 14 R> system("ls -1R mypackage") mypackage: DESCRIPTION man NAMESPACE R Read-and-delete-me src mypackage/man: mypackage-package.Rd rcpp_hello_world.Rd mypackage/R: rcpp_hello_world.R 16 18 20 22 mypackage/src: Makevars Makevars.win rcpp_hello_world.cpp rcpp_hello_world.h R> Listing 5.2 Files created by Rcpp.package.skeleton Using Rcpp.package.skeleton() is by far the simplest approach as it fulfills two roles. It creates the complete set of files needed for a package, and it also includes the different components needed for using Rcpp that we discuss in the following sections. 5.2.2 R Code The skeleton created here contains an example R function rcpp_hello_world() that uses the .Call interface to invoke the C++ function rcpp_hello_world from the package mypackage. 2 rcpp_hello_world <- function(){ .Call( "rcpp_hello_world", PACKAGE = "mypackage" ) } Listing 5.3 R function rcpp hello world Rcpp uses the .Call calling convention as it allows exchange of actual R objects back and forth between the R side and the C++ side. R objects encoded as SEXP types can be conveniently manipulated using the Rcpp API as we discussed in the preceding chapters. 68 5 Using Rcpp in Your Package Note that in this example, no arguments are passed from R down to the C++ layer. Doing so is straightforward (and one of the key features of Rcpp) but not central to our discussion of the package creation mechanics and hence omitted here. 5.2.3 C++ Code The C++ function is declared in the rcpp_hello_world.h header file: 1 #ifndef _mypackage_RCPP_HELLO_WORLD_H #define _mypackage_RCPP_HELLO_WORLD_H 3 #include <Rcpp.h> 5 7 9 11 13 15 17 19 /* * note : RcppExport is an alias to ‘extern "C"‘ * defined by Rcpp. * * It gives C calling convention to the rcpp_hello_world * function so that it can be called from .Call in R. * Otherwise, the C++ compiler mangles the * name of the function and .Call can’t find it. * * It is only useful to use RcppExport when the function * is intended to be called by .Call. See the thread * http://thread.gmane.org/gmane.comp.lang.r.rcpp/649/focus=672 * on Rcpp-devel for a misuse of RcppExport */ RcppExport SEXP rcpp_hello_world(); 21 #endif Listing 5.4 C++ header file rcpp hello world.h The header includes the Rcpp.h file which is the sole header file that needs to be included in order to use Rcpp. The function itself is implemented in the file rcpp_hello_world.cpp. #include "rcpp_hello_world.h" 2 4 SEXP rcpp_hello_world(){ using namespace Rcpp ; CharacterVector x = CharacterVector::create( "foo", "bar"); NumericVector y = NumericVector::create( 0.0, 1.0 ); List z = List::create( x, y ); 6 8 return z ; 10 } Listing 5.5 C++ source file rcpp hello world.cpp 5.2 Using Rcpp.package.skeleton 69 The function creates an R list that contains a character vector and a numeric vector using Rcpp classes. At the R level, we will therefore receive a list of length two containing these two vectors: 1 3 5 R> rcpp_hello_world( ) [[1]] [1] "foo" "bar" [[2]] [1] 0 1 R> Listing 5.6 Calling R function rcpp hello world 5.2.4 DESCRIPTION The skeleton generates an appropriate DESCRIPTION file, using both Depends: and LinkingTo: for Rcpp: 2 4 6 8 10 12 Package: mypackage Type: Package Title: What the package does (short line) Version: 1.0 Date: 2012-11-10 Author: Who wrote it Maintainer: Who to complain to <[email protected]> Description: More about what it does (maybe more than one line) License: What Licence is it under ? LazyLoad: yes Depends: Rcpp (>= 0.9.13) LinkingTo: Rcpp Listing 5.7 DESCRIPTION file for skeleton package Rcpp.package.skeleton() adds the three last lines to the DESCRIPTION file. The Depends declaration indicates R-level dependency between the client package and Rcpp. The LinkingTo declaration indicates that the client package needs to use header files exposed by Rcpp. 5.2.5 Makevars and Makevars.win Unfortunately, and notwithstanding its name, the LinkingTo declaration in itself is not enough to link to the user C++ library of Rcpp. Until more explicit support for libraries is added to R, we need to manually add the Rcpp library to the PKG LIBS variable in the Makevars and Makevars.win files. Rcpp provides the unexported function Rcpp:::LdFlags() to ease the process: 70 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 5 Using Rcpp in Your Package ## Use the R_HOME indirection to support ## installations of multiple R version PKG_LIBS=‘$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"‘ ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## As an alternative, one can also add this code in a file ’configure’ PKG_LIBS=‘${R_HOME}/bin/Rscript -e "Rcpp:::LdFlags()"‘ sed -e "s|@[email protected]|${PKG_LIBS}|" \ src/Makevars.in > src/Makevars which together with the following file ’src/Makevars.in’ PKG_LIBS = @[email protected] can be used to create src/Makevars dynamically. This scheme is more powerful and can be expanded to also check for and link with other libraries. It should be complemented by a file ’cleanup’ rm src/Makevars which removes the autogenerated file src/Makevars. Of course, autoconf can also be used to write configure files. This is done by a number of packages, but recommended only for more advanced users comfortable with autoconf and its related tools. Listing 5.8 Makevars file for skeleton package The file Makevars.win is the equivalent version targeting Windows. This version uses an additional variable to call the architecture-dependent variant of Rscript in order to create the correct arguments for 32-bit or 64-bit versions of Windows. 2 4 ## Use the R_HOME indirection to support ## installations of multiple R version PKG_LIBS = $(shell "${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" -e "Rcpp:::LdFlags()") Listing 5.9 Makevars.win file for skeleton package 5.2 Using Rcpp.package.skeleton 71 5.2.6 NAMESPACE The Rcpp.package.skeleton() function also creates a file NAMESPACE with the content shown here. 1 useDynLib(mypackage) exportPattern("ˆ[[:alpha:]]+") Listing 5.10 NAMESPACE file for skeleton package This file serves two purposes. First, it ensures that the dynamic library contained in the package we are creating via Rcpp.package.skeleton() will be loaded and thereby made available to the newly created R package. Second, it declares which identifiers, that is functions or data sets, should be globally visible from the namespace of this package. As a reasonable default, we export all functions via a regular expression covering all identifiers starting with a letter. 5.2.7 Help Files Also created is a directory man containing two help files. One is for the package itself, the other for the (single) R function being provided and exported. Writing a help file is an important step in fully documenting a package. The Writing R Extensions manual (R Development Core Team 2012d) provides the complete documentation on how to create suitable content for help files. 5.2.7.1 mypackage-package.Rd The help file mypackage-package.Rd is used to describe the new package. 2 4 6 8 10 12 14 16 18 \name{mypackage-package} \alias{mypackage-package} \alias{mypackage} \docType{package} \title{ What the package does (short line) } \description{ More about what it does (maybe more than one line) ˜˜ A concise (1-5 lines) description of the package ˜˜ } \details{ \tabular{ll}{ Package: \tab mypackage\cr Type: \tab Package\cr Version: \tab 1.0\cr Date: \tab 2012-11-10\cr License: \tab What license is it under?\cr 72 20 22 24 5 Using Rcpp in Your Package LazyLoad: \tab yes\cr } ˜˜ An overview of how to use the package, including the most important functions ˜˜ } \author{ Who wrote it 26 28 30 32 34 36 38 40 42 Maintainer: Who to complain to <[email protected]> } \references{ ˜˜ Literature or other references for background information ˜˜ } ˜˜ Optionally other standard keywords, one per line, from file KEYWORDS in the R documentation directory ˜˜ \keyword{ package } \seealso{ ˜˜ Optional links to other man pages, e.g. ˜˜ ˜˜ \code{\link[<pkg>:<pkg>-package]{<pkg>}} ˜˜ } \examples{ %% ˜˜simple examples of the most important functions˜˜ } Listing 5.11 Manual page mypackage-package.Rd for skeleton package 5.2.7.2 rcpp hello world.Rd The help file rcpp_hello_world.Rd serves as documentation for the example R function. 2 4 6 8 10 12 14 16 \name{rcpp_hello_world} \alias{rcpp_hello_world} \docType{package} \title{ Simple function using Rcpp } \description{ Simple function using Rcpp } \usage{ rcpp_hello_world() } \examples{ \dontrun{ rcpp_hello_world() } } Listing 5.12 Manual page rcpp hello world.Rd for skeleton package 5.3 Case Study: The wordcloud Package 73 5.3 Case Study: The wordcloud Package An interesting package which uses Rcpp in what may be the simplest possible way is wordcloud (Fellows 2012). The wordcloud package has one main function creating word clouds, a common illustration depicting relative word frequency in a text corpus. The package initially provided this functionality via an R solution. However, the performance of iteratively finding placements of words on a two-dimensional plane such that the placement is tight yet not overlapping was seen as limiting. So the key determination of whether there is overlap between boxes, which executes a loop over the potentially large list of key words assigned to boxes, was then reimplemented in a short C++ function which is shown in Listing 5.13. 1 #include "Rcpp.h" 3 5 7 /* * Detect if a box at position (x11,y11), with width sw11 * and height sh11 overlaps with any of the boxes in boxes1 */ using namespace Rcpp; 9 11 13 15 17 19 21 23 25 27 29 RcppExport SEXP is_overlap(SEXP x11,SEXP y11,SEXP sw11, SEXP sh11,SEXP boxes1){ double x1 = as<double>(x11); double y1 =as<double>(y11); double sw1 = as<double>(sw11); double sh1 = as<double>(sh11); Rcpp::List boxes(boxes1); Rcpp::NumericVector bnds; double x2, y2, sw2, sh2; bool overlap= true; for (int i=0;i < boxes.size();i++) { bnds = boxes(i); x2 = bnds(0); y2 = bnds(1); sw2 = bnds(2); sh2 = bnds(3); if (x1 < x2) overlap = (x1 + sw1) > x2; else overlap = (x2 + sw2) > x1; 31 if (y1 < y2) overlap = (overlap && ((y1 + sh1) > y2)); else overlap = (overlap && ((y2 + sh2) > y1)); 33 35 if(overlap) return Rcpp::wrap(true); 37 39 } 74 5 Using Rcpp in Your Package return Rcpp::wrap(false); 41 } Listing 5.13 Function is overlap from the wordcloud package The package has no other external dependencies and requires only the files created by Rcpp.package.skeleton() as discussed above. 5.4 Further Examples There are now over 100 packages on the CRAN sites which use Rcpp, and which therefore provided working examples which can be studied—just like the wordcloud package in the previous section. Among the CRAN packages using Rcpp are • • • • RcppArmadillo (François et al. 2012); RcppEigen (Bates et al. 2012); RcppBDT (Eddelbuettel and François 2012b); and, RcppGSL (François and Eddelbuettel 2010) all of which not only follow the guidelines described in this chapter but are also discussed in the remainder of the book. These packages, as well as other packages on CRAN, can serve as examples on how to get data to and from C++ routines and can be considered templates for how to use Rcpp. A complete list of packages using Rcpp can always found at the CRAN page of the package. Chapter 6 Extending Rcpp Abstract This chapter provides an overview of the steps programmers should follow to extend Rcpp for use with their own classes and class libraries. The packages RcppArmadillo, RcppEigen, and RcppGSL provide working examples of how to extend Rcpp to work with, respectively, the Armadillo and Eigen C++ class libraries as well as the GNU Scientific Library. The chapter ends with an illustration of how the RcppBDT package connects the date types of R with those of the Boost Date Time library by extending Rcpp. 6.1 Introduction As discussed in the preceding chapters, Rcpp facilitates data interchange between R and C++ through the templated function Rcpp::as<>() which convert objects from R to C++, and the function Rcpp::wrap() which converts from C++ to R. In doing so, these function transform data from the representation in the socalled S Expression Pointers (the type SEXP of the R API) to a corresponding (templated) C++ type, and vice versa. The corresponding function declarations for Rcpp::as() and Rcpp::wrap() are as follows: 1 // conversion from R to C++ template <typename T> T as(SEXP m_sexp); 3 5 // conversion from C++ to R template <typename T> SEXP wrap(const T& object); Listing 6.1 as and wrap declarations These converters are often used implicitly, as in the following example: 1 3 code <- ’ // we get a list from R List input(inputS) ; D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 6, © The Author 2013 75 76 6 Extending Rcpp // pull std::vector<double> from R list // achieved through an implicit call to Rcpp::as std::vector<double> x = input["x"] ; 5 7 // return an R list // achieved through implicit call to Rcpp::wrap return List::create(_["front"] = x.front(), _["back"] = x.back()); 9 11 13 ’ 15 fx <- cxxfunction(signature(inputS = "list"), body=code, plugin = "Rcpp") input <- list( x = seq(1, 10, by = 0.5) ) fx( input ) 17 Listing 6.2 Implicit use of as and wrap In this example, a list object (containing a vector x defined as a sequence from 1 to 10) is created in R and passed to the C++ code where a Rcpp::List object is instantiated. A list element named x is then extracted by name and assigned to a C++ vector object. For the return, we also create a list with two named components for the first and last element, respectively, which are named “front” and “back” just like the STL-style accessor function used to extract the corresponding elements. This example shows how Rcpp::as and Rcpp::wrap can be used to convert standard R and C++ types. These two converter functions have been designed to be extensible to user-defined types and third-party types. The next sections discuss how to apply Rcpp::wrap and Rcpp::as to user-supplied types. 6.2 Extending Rcpp::wrap The Rcpp::wrap converter is extensible in essentially two ways: a more intrusive approach (which modifies the header files defining the class to be made known to wrap) and via two nonintrusive variants that do not require changes to the class being wrapped. We discuss all three approaches below. 6.2.1 Intrusive Extension When extending Rcpp with your own data type, the recommended way is to implement a conversion to SEXP. This lets Rcpp::wrap know about the new data type. The template meta-programming (or TMP) dispatch is able to recognize that a type is convertible to a SEXP and Rcpp::wrap will then use that conversion. 6.2 Extending Rcpp::wrap 77 The caveat is that the type must be declared before the main header file Rcpp.h is included. #include <RcppCommon.h> 2 4 class Foo { public: Foo(); 6 // this operator enables implicit Rcpp::wrap operator SEXP(); 8 } 10 #include <Rcpp.h> Listing 6.3 Intrusive extension for wrap This is called intrusive because the conversion to the SEXP operator has to be declared within the class that we want to use with Rcpp. This means we have to add the header RcppCommon.h before the class declaration: this makes SEXP known in the context of our class Foo. By adding the header Rcpp.h later, we ensure that Rcpp knows about the conversion from SEXP to foo. And, of course, the actual code for the operator SEXP() has to be supplied as well in a corresponding source file. 6.2.2 Nonintrusive Extension It is often desirable to offer automatic conversion to third-party types over which the developer has no control. Lack of control, or access to source code, or maybe even design and policy reasons not to alter an existing code base or library may all preclude us from including a conversion to SEXP operator in the class definition as in the previous section. So to provide automatic conversion from C++ to R, one must declare a specialization of the Rcpp::wrap template between the includes of RcppCommon.h and Rcpp.h. 1 #include <RcppCommon.h> 3 // third party library that declares class Bar #include <foobar.h> 5 7 9 11 13 // declaring the specialization namespace Rcpp { template <> SEXP wrap( const Bar& ); } // this must appear after the specialization, else // the specialization will not be seen by Rcpp types #include <Rcpp.h> Listing 6.4 Nonintrusive extension for wrap 78 6 Extending Rcpp It should be noted that only the declaration is required. The implementation can appear after the Rcpp.h file is included and can therefore take full advantage of the Rcpp type system. 6.2.3 Templates and Partial Specialization It is also perfectly valid to declare a partial specialization for the Rcpp::wrap template using the templated typename T and a templated use of our class. The compiler will identify the appropriate overload: 1 #include <RcppCommon.h> 3 // third party library declarings template class Bling<T> #include <foobar.h> 5 7 9 11 13 15 // declaring the partial specialization namespace Rcpp { namespace traits { template <typename T> SEXP wrap( const Bling<T>& ); } } // this must appear after the specialization, else // the specialization will not be seen by Rcpp types #include <Rcpp.h> Listing 6.5 Partial specialization for wrap 6.3 Extending Rcpp::as Conversion from R to C++ using as<>() is also possible in both intrusive and nonintrusive ways. 6.3.1 Intrusive Extension As part of its template meta-programming dispatch logic, Rcpp::as will attempt to use the constructor of the target class taking a SEXP. 1 #include <RcppCommon.h> 3 class Foo{ public: Foo() ; 5 6.3 Extending Rcpp::as // this constructor enables implicit Rcpp::as Foo(SEXP) ; 7 9 11 79 } #include <Rcpp.h> Listing 6.6 Intrusive extension for as Taking this intrusive route, this constructor can then be implemented in the sources defining class Foo. 6.3.2 Nonintrusive Extension It is also possible to fully specialize Rcpp::as to enable nonintrusive implicit conversion capabilities. 1 #include <RcppCommon.h> 3 // third party library that declares class Bar #include <foobar.h> 5 7 9 11 13 // declaring the specialization namespace Rcpp { template <> Bar as( SEXP ) throw(not_compatible); } // this must appear after the specialization, else // the specialization will not be seen by Rcpp types #include <Rcpp.h> Listing 6.7 Nonintrusive extension for as 6.3.3 Templates and Partial Specialization The signature of Rcpp::as does not allow partial specialization. So when exposing a templated class to Rcpp::as, the programmer must specialize the template class Rcpp::traits::Exporter. The TMP dispatch will recognize that a specialization of Exporter is available and delegate the conversion to this class. Rcpp defines the Rcpp::traits::Exporter template class as follows: 1 namespace Rcpp { namespace traits { 3 5 template <typename T> class Exporter{ public: Exporter( SEXP x ) : t(x){} 80 6 Extending Rcpp inline T get(){ return t; } 7 private: T t; } ; 9 11 } 13 } Listing 6.8 Partial specialization via Exporter This is the reason why the default behavior of Rcpp::as is to invoke the constructor of the type T taking a SEXP. Since partial specialization of class templates is allowed, we can expose a set of classes as follows: 1 #include <RcppCommon.h> 3 // third party library that declares template class Bling<T> #include <foobar.h> 5 7 9 11 13 15 // declaring the partial specialization namespace Rcpp { namespace traits { template <typename T> class Exporter< Bling<T> >; } } // this must appear after the specialization, else // the specialization will not be seen by Rcpp types #include <Rcpp.h> Listing 6.9 Partial specialization of as via Exporter Using this approach, the requirements for the Exporter< Bling<T> > class are twofold. It should have • A constructor taking a SEXP type. • A method called get which returns an instance of the Bling<T> type. 6.4 Case Study: The RcppBDT Package The package RcppBDT (Eddelbuettel and François 2012b) interfaces some of the Date Time classes of the Boost C++ library collection. To do so, it contains Rcpp::as() and Rcpp::wrap() implementations to convert from one representation to the other. The case discussed in this section is straightforward and concerns the conversion for actual date types which are represented internally as (unsigned) integers. It does, however, illustrate the general principle of receiving a SEXP type and converting to a type from the to-be-wrapped library via the templated function 6.4 Case Study: The RcppBDT Package 81 as<>() in order to convert to a new type, and conversely returning a SEXP via the wrap() function which converts from a new type. The following specialization includes the actual code along with the declarations. 1 3 5 7 // define template specializations for as and wrap namespace Rcpp { template <> boost::gregorian::date as( SEXP dtsexp ) { Rcpp::Date dt(dtsexp); return boost::gregorian::date(dt.getYear(), dt.getMonth(), dt.getDay()); } 9 template <> SEXP wrap(const boost::gregorian::date &d) { boost::gregorian::date::ymd_type ymd = d.year_month_day(); // convert to y/m/d struct return Rcpp::wrap(Rcpp::Date( ymd.year, ymd.month, ymd.day )); } 11 13 15 17 } Listing 6.10 RcppBDT definitions of as and wrap In the case of as, the SEXP is first converted to a Rcpp::Date type. Its accessors for day, month, and year are then used to instantiate a Boost Gregorian date type. The wrap function here simply does the inverse and deploys the year, month, and date accessors of such a Boost Gregorian date type to access one of the constructor of the Rcpp::Date class. These two converter functions are then used to pass values between the R representation and the representation used by Boost Date Time. As an example, consider the implementation of the function 1 3 5 Rcpp::Date Date_firstDayOfWeekAfter(boost::gregorian::date *d, int weekday, SEXP date) { boost::gregorian::first_day_of_the_week_after fdaf(weekday); boost::gregorian::date dt = Rcpp::as<boost::gregorian::date>(date); return Rcpp::wrap(fdaf.get_date(dt)); } Listing 6.11 RcppBDT use of as and wrap As detailed in Chap. 7, the first argument of function using a so-called “Rcpp Module” has to be a pointer to the object being wrapped, here the class date in the namespace boost::gregorian. The next two arguments are the requested day of the week (encoded as an integer, or using one of constants Mon, Tue, . . ., Sun defined in the package) and a date for which the next matching weekday has to be determined. The function first instantiates a first_day_of_the_week_after object utilizing the Boost functionality. Next, the Rcpp::as() converter is used to pass the date obtained from R as a SEXP variable into a date variable, here dt, for 82 6 Extending Rcpp Boost. Finally, Rcpp::wrap() is used to convert the result obtained from computing the “first day of the week after” functionality on the given date dt, returned also as a date object—which Rcpp::wrap converts into a SEXP type from which an Rcpp::Date can be derived implicitly. This functionality can be illustrated with a simple usage example in which we compute the date of the first Monday after New Year 2020: 2 R> getFirstDayOfWeekAfter(Mon, as.Date("2020-01-01")) [1] "2020-01-06" R> Listing 6.12 RcppBDT example for getFirstDayOfWeekAfter Working with Rcpp modules is detailed in the next chapter. 6.5 Further Examples The packages RcppArmadillo (François et al. 2012), RcppEigen (Bates et al. 2012), and RcppGSL (François and Eddelbuettel 2010) provide concrete examples in the context of vector and matrix classes. Chapter 7 Modules Abstract This chapter discusses Rcpp modules which allow programmers to expose C++ functions and classes to R with relative ease. Rcpp modules are inspired from the Boost.Python C++ library which provides similar features for integrating Python and C++. Furthermore, Rcpp modules also offer the ability to extend C++ classes exposed to R directly from the R side. This chapter discusses modules in detail and ends on an applied case study featuring the RcppCNPy package. 7.1 Motivation Exposing C++ functionality to R is greatly facilitated by the Rcpp package and its underlying C++ library. Rcpp smoothes many of the rough edges in any R and C++ integration by adding a consistent set of C++ classes to the traditional R Application Programming Interface (API) described in “Writing R Extensions ” (R Development Core Team 2012d). The Rcpp-based approach was the focus of the earlier chapters. These Rcpp facilities offer a lot of assistance to the programmer wishing to interface R and C++. At the same time, they are limited as they operate on a function-byfunction basis. The programmer has to implement a .Call() compatible function (to conform to the R API) using classes of the Rcpp API as we briefly review in the next section. 7.1.1 Exposing Functions Using Rcpp Exposing existing C++ functions to R through Rcpp usually involves several steps. One approach is to write an additional wrapper function that is responsible for D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 7, © The Author 2013 83 84 7 Modules converting input objects to the appropriate types, calling the actual worker function and converting the results back to the only suitable type that can be returned to R via the .Call() interface: the SEXP. As a concrete example, consider the norm function below: 1 3 double norm( double x, double y ) { return sqrt( x*x + y*y ); } Listing 7.1 A simple norm function in C++ This simple function does not meet the requirements imposed by the .Call interface, so it cannot be called directly by R. Exposing the function involves writing a simple wrapper function that does match the .Call interface. Rcpp makes this easy. 1 3 using namespace Rcpp; RcppExport SEXP norm_wrapper(SEXP x_, SEXP y_) { // step 0: convert input to C++ types double x = as<double>(x_), y = as<double>(y_); 5 // step 1: call the underlying C++ function double res = norm( x, y ); 7 // step 2: return the result as a SEXP return wrap( res ); 9 11 } Listing 7.2 Calling the norm function We use the (templated) Rcpp converter as() which can transform from a SEXP to a number of different C++ and Rcpp types; here we used it to assign two scalar double types. The Rcpp function wrap() offers the opposite functionality and converts many known types to a SEXP; here we use it to return the double scalar result. This process is simple enough and is widely used by a number of CRAN packages. However, it requires direct involvement from the programmer, which becomes laborious when many functions are involved. Rcpp modules provides a much more elegant and unintrusive way to expose C++ functions such as the norm function shown above to R. 7.1.2 Exposing Classes Using Rcpp Exposing C++ classes or structs is even more of a challenge because it requires writing glue code for each member function that is to be exposed. Consider the simple Uniform class below: 1 3 class Uniform { public: Uniform(double min_, double max_) : 7.1 Motivation 85 min(min_), max(max_) {} 5 NumericVector draw(int n) { RNGScope scope; return runif( n, min, max ); } 7 9 11 13 private: double min, max; }; Listing 7.3 A simple class Uniform This class enables us to draw a number of uniformly distributed random numbers, and it uses two internal state variables to store the lower and upper bound of the range from which the draws are taken. To use this class from R, we at least need to expose the constructor and the draw method. External pointers (R Development Core Team 2012d) are a suitable mechanism for this, and we can use the Rcpp:::XPtr template to expose the class with these two functions: 1 using namespace Rcpp; 3 /// create an external pointer to a Uniform object RcppExport SEXP Uniform__new(SEXP min_, SEXP max_) { // convert inputs to appropriate C++ types double min = as<double>(min_), max = as<double>(max_); 5 7 // create a pointer to an Uniform object and wrap it // as an external pointer Rcpp::XPtr<Uniform> ptr( new Uniform( min, max ), true ); 9 11 // return the external pointer to the R side return ptr; 13 } 15 17 19 /// invoke the draw method RcppExport SEXP Uniform__draw( SEXP xp, SEXP n_ ) { // grab the object as a XPtr (smart pointer) to Uniform Rcpp::XPtr<Uniform> ptr(xp); // convert the parameter to int int n = as<int>(n_); 21 23 // invoke the function NumericVector res = ptr->draw( n ); 25 // return the result to R return res; 27 29 } Listing 7.4 Exposing two member functions for Uniform class 86 7 Modules However, it is generally considered a bad idea to expose external pointers “as is.” Rather, we prefer to have them wrapped as a slot of a corresponding S4 class. 1 3 5 7 9 11 13 15 17 19 21 23 R> setClass("Uniform", + representation( pointer = "externalptr" ) ) [1] "Uniform" R> # helper R> Uniform_method <- function(name) { + paste( "Uniform", name, sep = "__" ) + } R> # syntactic sugar to allow object$method( ... ) R> setMethod( "$", "Uniform", function(x, name ) { + function(...) .Call(Uniform_method(name), + [email protected], ... ) + } ) R> # syntactic sugar to allow new( "Uniform", ... ) R> setMethod("initialize", + "Uniform", function(.Object, ...) { + [email protected] <+ .Call(Uniform_method("new"), ... ) + .Object + } ) [1] "initialize" R> u <- new( "Uniform", 0, 10 ) R> u$draw( 10L ) [1] 4.325224 0.269805 9.990058 7.137135 6.335477 [5] 6.833734 1.385790 8.850125 1.243403 4.070396 Listing 7.5 Using the Uniform class from R 7.2 Rcpp Modules The design of Rcpp modules has been influenced by Python modules which are generated by the Boost.Python library (Abrahams and Grosse-Kunstleve 2003). Rcpp modules provide a convenient and easy-to-use way to expose C++ functions and classes to R, grouped together in a single entity. An Rcpp module is created in a cpp file using the RCPP MODULE macro, which then provides declarative code of what the module exposes to R. 7.2.1 Exposing C++ Functions Using Rcpp Modules Consider the norm function from the previous section. We can expose it to R using a single line of code inside the RCPP MODULE macro: using namespace Rcpp; 2 double norm( double x, double y ) { 7.2 Rcpp Modules 87 return sqrt( x*x + y*y ); 4 } 6 8 RCPP_MODULE(mod) { function( "norm", &norm ); } Listing 7.6 Exposing the norm function via modules The code creates an Rcpp module called mod that exposes the norm function. Rcpp automatically deduces the conversions that are needed for input and output. This alleviates the need for a wrapper function using either Rcpp or the R API. On the R side, the module is retrieved by using the Module() function from Rcpp: 1 3 R> require( Rcpp ) R> mod <- Module( "mod" ) R> mod$norm( 3, 4 ) Listing 7.7 Using norm function exposed via modules A module can contain any number of calls to function to register many internal functions to R. For example, consider these six functions covering a range of input and return arguments: 1 std::string hello() { return "hello"; } 3 int bar( int x) { return x*2; } 5 double foo( int x, double y) { return x * y; } 7 void bla( ) { Rprintf( "hello\\n" ); } 9 void bla1( int x) { Rprintf( "hello (x = %d)\\n", x ); } 11 13 15 void bla2( int x, double y) { Rprintf( "hello (x = %d, y = %5.2f)\\n", x, y ); } Listing 7.8 A module example with six functions They can all be exposed to R with the following minimal code: 1 RCPP_MODULE(yada) { using namespace Rcpp; 3 5 7 function( function( function( function( function( "hello" "bar" "foo" "bla" "bla1" , , , , , &hello &bar &foo &bla &bla1 ); ); ); ); ); 88 7 Modules function( "bla2" 9 , &bla2 ); } Listing 7.9 Modules example interface We can now use them from R as follows: 2 4 6 8 R> R> R> R> R> R> R> R> require( Rcpp ) yada <- Module( "yada" ) yada$bar( 2L ) yada$foo( 2L, 10.0 ) yada$hello() yada$bla() yada$bla1( 2L) yada$bla2( 2L, 5.0 ) Listing 7.10 Modules example use from R The requirements for a function to be exposed to R via Rcpp modules are as follows: • The function has to take between 0 and 65 parameters. • Each input parameter must be manageable by the templated Rcpp::as conversion function. • The return type of the function must be either void or any type that can be managed by the Rcpp::wrap template conversion function. • The function name itself has to be unique in the module. In other words, no two functions with the same name but different signatures are allowed. While C++ allows overloading functions, Rcpp modules relies on named identifiers for the lookup and cannot allow two identical identifiers. 7.2.1.1 Documentation for Exposed Functions Using Rcpp Modules In addition to the name of the function and the function pointer, it is possible to pass a short description of the function as the third parameter of function. using namespace Rcpp; 2 4 double norm( double x, double y ) { return sqrt( x*x + y*y ); } 6 8 10 RCPP_MODULE(mod) { function("norm", &norm, "Provides a simple vector norm" ); } Listing 7.11 Modules example with function documentation 7.2 Rcpp Modules 89 The description is used when displaying the function to the R prompt: 2 4 R> mod <- Module( "mod", getDynLib( fx ) ) R> show( mod$norm ) internal C++ function <0x2477630> docstring : Provides a simple vector norm signature : double norm(double, double) Listing 7.12 Output for modules example with function documentation 7.2.1.2 Formal Arguments Specification Using function, we can specify the formal arguments of the R function that encapsulates the C++ function by passing a Rcpp::List after the function pointer and before the (also optional) documentation entry: 1 using namespace Rcpp; 3 double norm( double x, double y ) { return sqrt( x*x + y*y ); } 5 7 9 11 RCPP_MODULE(mod_formals) { function("norm", &norm, List::create(_["x"] = 0.0, _["y"] = 0.0), "Provides a simple vector norm"); } Listing 7.13 Modules example with documentation and formal arguments A simple usage example is provided below: 2 4 6 8 10 R> norm <- mod$norm R> norm() [1] 0 R> norm( y = 2 ) [1] 2 R> norm( x = 2, y = 3 ) [1] 3.605551 R> args( norm ) function (x = 0, y = 0) NULL Listing 7.14 Output for modules example with documentation and formal arguments To set formal arguments without default values, simply omit the right-hand side. using namespace Rcpp; 2 4 double norm( double x, double y ) { return sqrt( x*x + y*y ); } 90 7 Modules 6 8 10 RCPP_MODULE(mod_formals2) { function( "norm", &norm, List::create(_["x"], _["y"] = 0.0), "Provides a simple vector norm"); } Listing 7.15 Modules example with documentation and formal arguments without defaults This can be used as follows: 1 3 R> norm <- mod$norm R> args( norm ) function (x, y = 0) NULL Listing 7.16 Usage of modules example with documentation and formal arguments The ellipsis (...) can be used to denote that additional arguments are optional; it does not take a default value. using namespace Rcpp; 2 4 double norm( double x, double y ) { return sqrt( x*x + y*y ); } 6 8 10 RCPP_MODULE(mod_formals3) { function( "norm", &norm, List::create( _["x"], _["..."] ), "documentation for norm"); } Listing 7.17 Modules example with ellipis argument This now shows the ellipsis in the documentation output. 1 3 R> norm <- mod$norm R> args( norm ) function (x, ...) NULL Listing 7.18 Output of modules example with ellipis argument 7.2.2 Exposing C++ Classes Using Rcpp Modules Rcpp modules also provide a mechanism for exposing C++ classes, based on the Reference Classes which were first introduced in R release 2.12.0. 7.2 Rcpp Modules 91 7.2.2.1 Initial Example A class is exposed using the class keyword (and the trailing underscore is required as we cannot use the C++ language keyword class). The Uniform class may be exposed to R as follows: 2 4 using namespace Rcpp; class Uniform { public: Uniform(double min_, double max_) : min(min_), max(max_) {} 6 NumericVector draw(int n) const { RNGScope scope; return runif( n, min, max ); } 8 10 double min, max; 12 }; 14 16 double uniformRange( Uniform* w) { return w->max - w->min; } 18 RCPP_MODULE(unif_module) { 20 class_<Uniform>( "Uniform" ) 22 .constructor<double,double>() 24 .field( "min", &Uniform::min ) .field( "max", &Uniform::max ) 26 .method( "draw", &World::draw ) .method( "range", &uniformRange ) ; 28 30 32 } Listing 7.19 Exposing Uniform class using modules A short example follows and shows how to use this class: 2 4 6 8 10 R> Uniform <- unif_module$Uniform R> u <- new( Uniform, 0, 10 ) R> u$draw( 10L ) [1] 3.7950482 6.9525034 0.5783621 5.7234278 0.6869314 [6] 5.6403064 2.3408875 6.5695670 1.8821565 8.8553301 R> u$range() [1] 10 R> u$max <- 1 R> u$range() [1] 1 R> u$draw( 10 ) 92 12 7 Modules [1] 0.1987632 0.7598329 0.7276362 0.3101182 0.2300929 [6] 0.7121408 0.1005060 0.4007011 0.1643178 0.2252207 Listing 7.20 Using Uniform class via modules Here, class is templated by the C++ class or struct that is to be exposed to R. The parameter of the class <Uniform> constructor is the name we will use on the R side. It often makes sense to use the same name as the class name. While this is not enforced, it might be useful when exposing a class generated from a template. Then a single constructor, two fields and two methods are exposed to complete the simple example. Of the two methods, one accesses a class member function (draw), whereas the other uses a free function (uniformRange). 7.2.2.2 Exposing Constructors Using Rcpp Modules Public constructors that take from zero and seven parameters can be exposed to the R level using the .constuctor template method of .class . Optionally, .constructor can take a description as the first argument. 1 .constructor<double, double>( "sets the min and max value of the distribution") Listing 7.21 Constructor with a description Also, the second argument can be a function pointer (called validator) matching the following type : typedef bool (*ValidConstructor)(SEXP*, int); Listing 7.22 Constructor with a validator function pointer The validator can be used to implement dispatch to the appropriate constructor, when multiple constructors taking the same number of arguments are exposed. The default validator always accepts the constructor as valid if it is passed the appropriate number of arguments. For example, with the call above, the default validator accepts any call from R with two double arguments (or arguments that can be cast to double). 7.2.2.3 Exposing Fields and Properties class has three ways to expose fields and properties, as illustrated in the example below: 1 3 5 using namespace Rcpp; class Foo { public: Foo( double x_, double y_, double z_ ): x(x_), y(y_), z(z_) {} 7.2 Rcpp Modules 93 double x; double y; 7 9 double get_z() { return z; } void set_z( double z_ ) { z = z_; } 11 private: double z; 13 15 }; 17 RCPP_MODULE(mod_foo) { class_<Foo>( "Foo" ) 19 .constructor<double,double,double>() 21 .field( "x", &Foo::x ) .field_readonly( "y", &Foo::y ) 23 .property( "z", &Foo::get_z, &Foo::set_z ) ; 25 27 } Listing 7.23 Exposing fields and properties for modules The .field method exposes a public field with read/write access from R; field also accepts an extra parameter to give a short description of the field: 1 .field( "x", &Foo::x, "documentation for x" ) Listing 7.24 Field with documentation The .field readonly method exposes a public field with read-only access from R . It also accepts the description of the field. 1 .field_readonly( "y", &Foo::y, "documentation for y" ) Listing 7.25 Readonly-field with documentation The .property method allows indirect access to fields through a getter and a setter function. The setter is optional, and the property is considered read-only if the setter is not supplied. As before, an optional documentation string can also be supplied to describe the property: 1 3 5 // with getter and setter .property("z", &Foo::get_z, &Foo::set_z, "Documentation for z" ) // with only a getter .property( "z", &Foo::get_z, "Documentation for z" ) Listing 7.26 Property with getter and setter, or getter-only 94 7 Modules The type of the field (T) is deduced from the return type of the getter, and if a setter is given, its unique parameter should be of the same type. Getters can be member functions taking no parameters and returning a T (e.g., get z above), or a free function taking a pointer to the exposed class and returning a T, for example: double z_get( Foo* foo ) { return foo->get_z(); } Listing 7.27 Example of using a getter Setters can be either a member function taking a T and returning void, such as set z above, or a free function taking a pointer to the target class and a T : 1 void z_set( Foo* foo, double z ) { foo->set_z(z); } Listing 7.28 Example of using a setter Using properties gives more flexibility in case field access has to be tracked or has an impact on other fields. For example, this class keeps track of how many times the x field is read and written. 1 class Bar { public: 3 Bar(double x_) : x(x_), nread(0), nwrite(0) {} 5 double get_x( ) { nread++; return x; } 7 9 void set_x( double x_) { nwrite++; x = x_; } 11 13 15 IntegerVector stats() const { return IntegerVector::create( _["read"] = nread, _["write"] = nwrite ); } 17 19 21 private: double x; int nread, nwrite; 23 25 }; 27 29 31 RCPP_MODULE(mod_bar) { class_<Bar>( "Bar" ) .constructor<double>() 7.2 Rcpp Modules 95 .property( "x", &Bar::get_x, &Bar::set_x ) .method( "stats", &Bar::stats ) 33 ; 35 } Listing 7.29 Example code for properties Here is a simple usage example: 2 4 6 8 10 R> Bar <- mod_bar$Bar R> b <- new( Bar, 10 ) R> b$x + b$x [1] 20 R> b$stats() read write 2 0 R> b$x <- 10 R> b$stats() read write 2 1 Listing 7.30 Example using properties 7.2.2.4 Exposing Methods Using Rcpp Modules class has several overloaded and templated .method functions allowing the programmer to expose a method associated with the class. A legitimate method to be exposed by .method can be: • A public member function of the class, either const or non-const, that returns void or any type that can be handled by Rcpp::wrap, and that takes between 0 and 65 parameters whose types can be handled by Rcpp::as. ; or • A free function that takes a pointer to the target class as its first parameter, followed by 0 or more (up to 65) parameters that can be handled by Rcpp::as and returning a type that can be handled by Rcpp::wrap or void. Documenting Methods .method can also include a short documentation of the method, after the method (or free function) pointer. 1 3 .method( "stats", &Bar::stats, "vector indicating the number of times" "x has been read and written" ) Listing 7.31 Example documenting a method Note that the documentation string is really only one argument as there is no comma separating the two pieces. 96 7 Modules Const and Non-const Member Functions method is able to expose both const and non-const member functions of a class. There are, however, situations where a class defines two versions of the same method, differing only in their signature by the const-ness. This is, for example, the case of the member functions back of the std::vector template from the STL. 1 reference back ( ); const_reference back ( ) const; Listing 7.32 Const and non-const member functions To resolve the ambiguity, it is possible to use either the const method or the nonconst method instead of method in order to restrict the candidate methods. Special Methods Rcpp considers the methods [[ and [[<- special and promotes them to indexing methods on the R side. 7.2.2.5 Object Finalizers The .finalizer member function of class can be used to register a finalizer. A finalizer is a free function that takes a pointer to the target class and returns void. The finalizer is called before the destructor and so operates on a valid object of the target class. It can be used to perform suitable operations such as releasing resources, or summarizing and logging behavior. The finalizer is called automatically when the R object that encapsulates the C++ object is garbage-collected. 7.2.2.6 S4 Dispatch When a C++ class is exposed by the class template, a new S4 class is registered as well. The name of the S4 class is obfuscated in order to avoid name clashes (i.e., two modules exposing the same class). This allows for the implementation of R-level (S4) dispatch. For example, one might implement the show method for C++ World objects: 2 4 6 setMethod("show", yada$World, function(object) { msg <- paste("World object with message :", object$greet() ) writeLines( msg ) } ) Listing 7.33 Example of S4 dispatch 7.2 Rcpp Modules 97 7.2.2.7 Full Example The following example illustrates how to use Rcpp modules to expose the class std::vector<double> from the STL. 2 4 6 // convenience typedef typedef std::vector<double> vec; // helpers void vec_assign( vec* obj, Rcpp::NumericVector data ) { obj->assign( data.begin(), data.end() ); } 8 10 12 void vec_insert(vec* obj, int position, Rcpp::NumericVector data) { vec::iterator it = obj->begin() + position; obj->insert( it, data.begin(), data.end() ); } 14 16 Rcpp::NumericVector vec_asR( vec* obj ) { return Rcpp::wrap( *obj ); } 18 20 void vec_set( vec* obj, int i, double value ) { obj->at( i ) = value; } 22 24 26 28 30 32 34 36 38 40 42 RCPP_MODULE(mod_vec) { using namespace Rcpp; // we expose the class std::vector<double> // as "vec" on the R side class_<vec>( "vec") // exposing constructors .constructor() .constructor<int>() // exposing member functions .method( "size", &vec::size) .method( "max_size", &vec::max_size) .method( "resize", &vec::resize) .method( "capacity", &vec::capacity) .method( "empty", &vec::empty) .method( "reserve", &vec::reserve) .method( "push_back", &vec::push_back ) .method( "pop_back", &vec::pop_back ) .method( "clear", &vec::clear ) 44 46 48 // specifically exposing const member functions .const_method( "back", &vec::back ) .const_method( "front", &vec::front ) .const_method( "at", &vec::at ) 98 7 Modules // exposing free functions taking a // std::vector<double>* as their first argument .method( "assign", &vec_assign ) .method( "insert", &vec_insert ) .method( "as.vector", &vec_asR ) 50 52 54 // special methods for indexing .const_method( "[[", &vec::at ) .method( "[[<-", &vec_set ) 56 58 ; 60 } Listing 7.34 Complete example of exposing std::vector<double> The R usage is as follows: 1 3 5 7 9 R> R> R> R> R> R> R> R> R> vec <- mod_vec$vec v <- new( vec ) v$reserve( 50L ) v$assign( 1:10 ) v$push_back( 10 ) v$size() v$capacity() v[[ 0L ]] v$as.vector() Listing 7.35 R use of std::vector<double> modules example 7.3 Using Modules in Other Packages 7.3.1 Namespace Import/Export 7.3.1.1 Import All Functions and Classes When using Rcpp modules in a package, the client package needs to import Rcpp’s namespace. This is achieved by adding the following line to the NAMESPACE file. 1 import( Rcpp ) Listing 7.36 R NAMESPACE import of Rcpp for modules Loading the module must happen after the dynamic library of the package is loaded. There are two approaches. The older one uses the .onLoad() hook. 1 # grab the namespace NAMESPACE <- environment() 3 5 .onLoad <- function(libname, pkgname) { ## load the module and store it in our namespace 7.3 Using Modules in Other Packages yada <- Module( "yada" ) populate( yada, NAMESPACE 7 99 ) } Listing 7.37 R .onLoad() code for module The call to populate installs all functions and classes from the module into the namespace of package. 7.3.1.2 Import All Modules in a Package There is also a convenience function loadRcppModules() that loops over all modules declared in the DESCRIPTION file. The loadRcppModules() function has a single argument direct with a default value of TRUE implying that all (exported) identifiers in the module are directly populated into the namespace of the module. Otherwise, only the module is exposed and its functions need to be addressed indirectly (as, for example, via v$size()). The loadRcppModules() function has to be called from the .onLoad() function as well. 7.3.1.3 Using loadModule A separate function loadModule() has been available since release 0.9.11 of Rcpp. It can be used in any .R function in a package (and not just in .onLoad()). Its first argument is the module name. The second argument can be used to detail which parts of a module should be loaded; the special value TRUE signals that all objects with valid names are exported. As an example, RcppBDT uses loadModule("bdtDtMod", TRUE) to load all components of the bdtDtMod module. Similarly, the RcppCNPy package uses loadModule("cnpy", TRUE) to load its sole module cnpy. 7.3.2 Support for Modules in Skeleton Generator The Rcpp.package.skeleton() function has been extended to facilitate the use of Rcpp modules. When the module argument is set to TRUE, the skeleton generator installs code that uses a simple module. R> Rcpp.package.skeleton( "testmod", module = TRUE ) Listing 7.38 Package skeleton support for modules This provides the easiest way to create a new package containing Rcpp modules code. 100 7 Modules 7.3.3 Module Documentation Rcpp defines a prompt() method for the Module class, allowing generation of a skeleton of an Rd file containing some information about the module. 1 R> yada <- Module( "yada" ) R> prompt( yada, "yada-module.Rd" ) Listing 7.39 Use of prompt for documentation skeleton 7.4 Case Study: The RcppCNPy Package Modules are a very powerful tool. They are suited to exposing code from existing class libraries as the RcppBDT package discussed in the previous section illustrates. The Rcpp attributes system uses modules to easily connect the wrappers it generates for the user-supplied code. Modules can be used to provide simple wrappers to external libraries. A simple example is provided by the RcppCNPy package (Eddelbuettel 2012a). It uses a small stand-alone library provided in a single header and source files in order to access NumPy files used by this popular Python extension. In the package, two functions npyLoad() and npySave() are defined. They simply transfer data between a given file name and R by relying on the external library provided by the source files cnpy.cpp and cnpy.h. Listing 7. 40 shows simplified versions of these two functions. We have omitted several aspects to keep the exposition shorter: the special case of transposing, the additional layer of transparently dealing with gzip-compressed files, the support for long long types, as well as the support for different underlying data types concentratic on just numeric. 2 4 6 8 10 12 14 16 18 Rcpp::RObject npyLoad(const std::string & filename, const std::string & type) { cnpy::NpyArray arr; arr = cnpy::npy_load(filename); std::vector<unsigned int> shape = arr.shape; SEXP ret = R_NilValue; if (shape.size() == 1) { if (type == "numeric") { double *p = reinterpret_cast<double*>(arr.data); ret = Rcpp::NumericVector(p, p + shape[0]); } else { arr.destruct(); Rf_error("Unsupported type in npyLoad"); } } else if (shape.size() == 2) { if (type == "numeric") { ret = Rcpp::NumericMatrix(shape[0], shape[1], 7.4 Case Study: The RcppCNPy Package reinterpret_cast<double*>(arr.data)); } else { arr.destruct(); Rf_error("Unsupported type in npyLoad"); } } else { arr.destruct(); Rf_error("Unsupported dimension in npyLoad"); } arr.destruct(); return ret; 20 22 24 26 28 30 } 32 void npySave(std::string filename, Rcpp::RObject x, std::string mode) { if (::Rf_isMatrix(x)) { if (::Rf_isNumeric(x)) { Rcpp::NumericMatrix mat = transpose(Rcpp::NumericMatrix(x)); std::vector<unsigned int> shape = Rcpp::as<std::vector<unsigned int> >( Rcpp::IntegerVector::create(mat.ncol(), mat.nrow())); 34 36 38 40 101 42 cnpy::npy_save(filename, mat.begin(), &(shape[0]), 2, mode); } else { Rf_error("Unsupported matrix type\n"); } } else if (::Rf_isVector(x)) { if (::Rf_isNumeric(x)) { Rcpp::NumericVector vec(x); std::vector<unsigned int> shape = Rcpp::as<std::vector<unsigned int> >( Rcpp::IntegerVector::create(vec.length())); cnpy::npy_save(filename, vec.begin(), &(shape[0]), 1, mode); } else { Rf_error("Unsupported vector type\n"); } } else { Rf_error("Unsupported type\n"); } 44 46 48 50 52 54 56 58 60 62 } Listing 7.40 NumPy load and save functions defined in RcppCNPy The following module declaration is all it takes to make the functions known to R along with some standard declaration done by the rest of the package, and provided by helper functions like Rcpp.package.skeleton() when the modules= TRUE option is selected. 102 2 7 Modules RCPP_MODULE(cnpy){ 4 using namespace Rcpp; 6 // name of the identifier at the R level function("npyLoad", // function pointer to helper function defined above &npyLoad, // function arguments including default value List::create( Named("filename"), Named("type") = "numeric", Named("dotranspose") = true), "read an npy file into a numeric vector or matrix"); 8 10 12 14 // name of the identifier at the R level function("npySave", // function pointer to helper function defined above &npySave, // function arguments including default value List::create( Named("filename"), Named("object"), Named("mode") = "w"), "save an R object to an npy file"); 16 18 20 22 24 26 } Listing 7.41 Example of module declaration in RcppCNPy 7.5 Further Examples Several packages on CRAN now use Rcpp modules. As of late 2012, the list comprises the packages GUTS (Albert and Vogel 2012), RSofia (King and Diaz 2011), RcppBDT (Eddelbuettel and François 2012b), RcppCNPy (Eddelbuettel 2012a), cda (Auguie 2012a), highlight (François 2012a), maxent (Jurka and Tsuruoka 2012), parser (François 2012b), planar (Auguie 2012b), and transmission (Thomas and Redd 2012). Chapter 8 Sugar Abstract This chapter describes Rcpp sugar which brings a higher level of abstraction to C++ code written using the Rcpp API. Rcpp sugar is based on expression templates and provides some “syntactic sugar” facilities directly in Rcpp. In this chapter, we will introduce many of the very useful Rcpp sugar features. As our focus is firmly on using Rcpp sugar, we will do so without venturing too deeply into the template meta programming approach used to implement it. Some technical details are provided at the end, and this section can be skipped by users who are interested primarily in using, rather than extending, Rcpp sugar. A brief simulation example using Rcpp sugar concludes the chapter. 8.1 Motivation Rcpp facilitates development of compiled code to extend R via either inline code or an R package by abstracting low-level details of the R API (R Development Core Team 2012d) into a consistent set of C++ classes. Code written using Rcpp classes is easier to read, write, and maintain, without losing performance. Consider the following code example which provides a function foo as a C++ extension to R by using the Rcpp API: 2 4 6 8 10 RcppExport SEXP foo( SEXP xs, SEXP ys) { Rcpp::NumericVector xv(xs); Rcpp::NumericVector yv(ys); int n = xv.size(); Rcpp::NumericVector res( n ); for (int i=0; i<n; i++) { double x = xv[i]; double y = yv[i]; if( x < y ) { res[i] = x * x; } else { D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 8, © The Author 2013 103 104 8 Sugar res[i] = -( y * y); 12 } } return res; 14 16 } Listing 8.1 A simple C++ function operating on vectors The goal of the function foo code is simple. We pass two numeric vectors passed from R (as SEXP types) and create two Rcpp vectors. We then create a third one of the same length as x, fill it, and return it to R (and let us ignore for a moment that the actual transformation from xv and yv into the results vector res is not all that meaningful). This function shows typical low-level C++ code that could be written much more concisely in R, thanks to vectorization as shown in the next example. 2 R> foo <- function(x, y){ + ifelse( x < y, x*x, -(y*y) ) + } Listing 8.2 A simple R function operating on vectors Put succinctly, the motivation of Rcpp sugar is to bring a subset of the high-level R syntax to C++. Hence, with Rcpp sugar, the C++ version of foo now becomes 1 3 5 RcppExport SEXP foo( SEXP xs, SEXP ys){ Rcpp::NumericVector x(xs) ; Rcpp::NumericVector y(ys) ; return ifelse( x < y, x*x, -(y*y) ) ; } Listing 8.3 A simple C++ function using sugar operating on vectors which is only about a third as long as the initial version. More importantly, it permits us to collapse explicit loops (which are common in C++ but we note that, for example, the STL offers alternatives) into vectorized expressions just as it would in R. Apart from the fact that we need to assign the two objects we obtain from R— which is a simple statement each thanks to the template magic in Rcpp, and as previously discussed also a lightweight operation copying only a pointer—and the need for an explicit return statement, the code is now identical between highly vectorized R and C++. So Rcpp sugar enables us to express a vectorized expression in C++ just as easily as in R. Rcpp sugar is written using expression templates and lazy evaluation techniques (Abrahams and Gurtovoy 2004; Vandevoorde and Josuttis 2003). This not only allows for a much nicer high-level syntax but also makes it rather efficient as we detail further in Sects. 8.4 and 8.6 below. 8.2 Operators 105 8.2 Operators Rcpp sugar takes advantage of C++ operator overloading. The next few sections discuss several examples. 8.2.1 Binary Arithmetic Operators Rcpp sugar defines the usual binary arithmetic operators: +, -, *, /. 1 3 5 7 9 11 13 15 17 19 // two numeric vectors of the same size NumericVector x; NumericVector y; // expressions involving two vectors NumericVector res = x + y; NumericVector res = x - y; NumericVector res = x * y; // NB element-wise multiplication NumericVector res = x / y; // one vector, one single value NumericVector res = x + 2.0; NumericVector res = 2.0 - x; NumericVector res = y * 2.0; NumericVector res = 2.0 / y; // two expressions NumericVector res = x * y + y / 2.0; NumericVector res = x * ( y - 2.0 ); NumericVector res = x / ( y * y ); Listing 8.4 Binary arithmetic operators for sugar The left-hand side (lhs) and the right-hand side (rhs) of each binary arithmetic expression must be of the same type (e.g., they should be both numeric expressions). The lhs and the rhs can either have the same size or one of them could be a primitive value of the appropriate type, for example adding a NumericVector and a double. This is different from R which uses a recycling rule for its operation. When a shorter vector, say, of length four is added to a longer vector of length eight, the recycling operation can succeed as an integer multiple (here: two) of the shorter vector’s length (here: four) is equal to the longer vector’s length (here: eight). This behavior is not emulated in Rcpp sugar where either the two operants have to be of the same length or one has to be a single primitive C++ type such as double. 106 8 Sugar 8.2.2 Binary Logical Operators Binary logical operators create a logical sugar expression, from either two sugar expressions of the same type or one sugar expression and a primitive value of the associated type. 2 // two integer vectors of the same size NumericVector x; NumericVector y; 4 6 8 10 // expressions involving two vectors LogicalVector res = x < y; LogicalVector res = x > y; LogicalVector res = x <= y; LogicalVector res = x >= y; LogicalVector res = x == y; LogicalVector res = x != y; 12 14 16 // one vector, one single value LogicalVector res = x < 2; LogicalVector res = 2 > x; LogicalVector res = y <= 2; LogicalVector res = 2 != y; 18 20 22 // two expressions LogicalVector res = ( x + y ) < ( x*x ); LogicalVector res = ( x + y ) >= ( x*x ); LogicalVector res = ( x + y ) == ( x*x ); Listing 8.5 Binary logical operators for sugar 8.2.3 Unary Operators The unary operator- can be used to negate a (numeric) sugar expression, whereas the unary operator! negates a logical sugar expression: 2 4 // a numeric vector NumericVector x; // negate x NumericVector res = -x; 6 8 10 12 14 // use it as part of a numerical expression NumericVector res = -x * ( x + 2.0 ); // two integer vectors of the same size NumericVector y; NumericVector z; // negate the logical expression "y < z" LogicalVector res = ! ( y < z ); Listing 8.6 Unary operators for sugar 8.3 Functions 107 8.3 Functions Rcpp sugar defines functions that closely match the behavior of R functions of the same name. 8.3.1 Functions Producing a Single Logical Result Given a logical sugar expression, the all function identifies if all the elements are TRUE. Similarly, the any function identifies if any one of the elements of a given logical sugar expression is TRUE. 1 3 IntegerVector x = seq_len( 1000 ); all( x*x < 3 ); any( x*x < 3 ); Listing 8.7 Functions returning a single boolean result Either call to all and any creates an object of a class that has member functions is true, is false, is na and a conversion to SEXP operator. One important thing to highlight is that all is lazy. Unlike in R, there is no need to fully evaluate the expression. In the example above, the result of all is fully resolved after evaluating only the first two indices of the expression x * x < 3. any is lazy too, so it will only need to resolve the first element of the example above. Another important consideration is the conversion to the bool type. In order to respect the concept of missing values (NA) in R, expressions generated by any or all cannot be converted directly to bool. Instead one must use is true, is false or is na: 1 // wrong: will generate a compile error bool res = any( x < y) ); 3 5 7 // ok bool res = is_true( any( x < y ) ); bool res = is_false( any( x < y ) ); bool res = is_na( any( x < y ) ); Listing 8.8 Using functions returning a single boolean result 8.3.2 Functions Producing Sugar Expressions 8.3.2.1 is na Given a sugar expression of any type, is_na (just like the other functions in this section) produces a logical sugar expression of the same length. Each element of the 108 8 Sugar result expression evaluates to TRUE if the corresponding input is a missing value, or FALSE otherwise. 1 IntegerVector x = IntegerVector::create( 0, 1, NA_INTEGER, 3 ); 3 is_na( x ); all( is_na( x ) ); any( ! is_na( x ) ); 5 Listing 8.9 Example using is na sugar function 8.3.2.2 seq along Given a sugar expression of any type, seq along creates an integer sugar expression whose values go from 1 to the size of the input. 1 IntegerVector x = IntegerVector::create( 0, 1, NA_INTEGER, 3 ); 3 seq_along( x ); seq_along( x * x * x * x * x * x * x ); Listing 8.10 Example using seq along sugar function This is a “lazy” (in the R evaluation sense) function, as it only needs to call the size member function of the input expression. In other words, the value of the input expression does need not to be computed. The two examples above give the same result with the same efficiency at run-time. The compile time will be affected by the complexity of the second expression, since the abstract syntax tree is built at compile time. 8.3.2.3 seq len seq len creates an integer sugar expression whose ith element expands to i. This makes seq len particularly useful for functions such as sapply and lapply (which are similar to their R equivalents, and discussed below). 2 // 1, 2, ..., 10 IntegerVector x = seq_len( 10 ); 4 lapply( seq_len(10), seq_len ); Listing 8.11 Example using seq len sugar function 8.3.2.4 pmin and pmax Given two sugar expressions of the same type and size, or one expression and one primitive value of the appropriate type, pmin (pmax) generates a sugar expression 8.3 Functions 109 of the same type whose ith element expands to the lowest (highest) value between the ith element of the first expression and the ith element of the second expression. IntegerVector x = seq_len( 10 ); 2 4 6 pmin( x, x*x ); pmin( x*x, 2 ); pmin( x, x*x ); pmin( x*x, 2 ); Listing 8.12 Example using pmin and pmax sugar function 8.3.2.5 ifelse Given a logical sugar expression and either • Two compatible sugar expressions (same type, same size) or • One sugar expression and one compatible primitive. ifelse expands to a sugar expression whose ith element is the ith element of the first expression if the ith element of the condition expands to TRUE, or the ith of the second expression if the ith element of the condition expands to FALSE, or the appropriate missing value otherwise. 1 IntegerVector x; IntegerVector y; 3 5 ifelse( x < y, x, (x+y)*y ); ifelse( x > y, x, 2 ); Listing 8.13 Example using ifelse sugar function 8.3.2.6 sapply sapply applies a C++ function to each element of the given expression to create a new expression. The type of the resulting expression is deduced by the compiler from the result type of the function. The function can be a free C++ function such as the overload generated by the template function below: 1 3 5 template <typename T> T square( const T& x){ return x * x; } sapply( seq_len(10), square<int> ); Listing 8.14 Example using sapply sugar function 110 8 Sugar Alternatively, the function can be a functor whose type has a nested type called result type. One way to satisfy this requirement is by inheriting from the std::unary_function functor: 1 3 5 7 template <typename T> struct square : std::unary_function<T,T> { T operator()(const T& x){ return x * x; } } sapply( seq_len(10), square<int>() ); Listing 8.15 Example using std::unary function functor with sapply 8.3.2.7 lapply lapply is similar to sapply except that the result is always a list expression (an expression of type VECSXP). 8.3.2.8 mapply mapply is similar to sapply and lapply but permits multiple vectors as input. This is (at least currently) limited to either two or three vectors. We can modify the example from Listing 8.14 to illustrate how mapply can be used to work on multiple vectors. Here, instead of computing the squared value of each element, we “sweep” a sum of squares calculation across two vectors. 1 3 5 template <typename T> struct sumOfSquares : std::unary_function<T,T> { T operator()(const T& x, const T& y){ return x*x + y*y; } } 7 9 NumericVector res; res = mapply(seq_len(10), seq_len(10), sumOfSquares<double>() ); Listing 8.16 Example using std::unary function functor with mapply 8.3.2.9 sign Given a numeric or integer expression, sign expands to an expression whose values are one of 1, 0, −1, or NA, depending on the sign of the input expression. 8.3 Functions 111 IntegerVector xx; 2 4 sign( xx ); sign( xx * xx ); Listing 8.17 Example using sign sugar function 8.3.2.10 diff The ith element of the result of diff is the difference between the (i + 1)th and the ith element of the input expression. Supported types are integer and numeric. IntegerVector xx; 2 diff( xx ); Listing 8.18 Example using sign sugar function 8.3.2.11 setdiff The setdiff function returns the values of the first vector which are not contained in the second vector; this is analogous to the R version. 1 IntegerVector xx, yy; 3 setdiff( xx, yy ); Listing 8.19 Example using setdiff sugar function 8.3.2.12 union The union function returns the union of the two vectors. The function has to be named with the trailing underscore in order not to conflict with the language keyword union. 1 IntegerVector xx, yy; 3 union_( xx, yy ); Listing 8.20 Example using union sugar function 112 8 Sugar 8.3.2.13 intersect The intersect function returns the intersection of the two vectors. 1 IntegerVector xx, yy; 3 intersect( xx, yy ); Listing 8.21 Example using intersect sugar function 8.3.2.14 clamp The clamp function combines the application of both pmin and pmax. Calling clamp(a, x, b) computes the same result as pmax(a, pmin(x, b)). In other words, it returns the values of the vector x limited to a minimum value of a and a maximum value of b. 1 IntegerVector xx; int a, b; 3 clamp( a, xx, b ); Listing 8.22 Example using clamp sugar function 8.3.2.15 unique The unique function returns the subset of unique values among its input vector. IntegerVector xx; 2 unique( xx ); Listing 8.23 Example using unique sugar function 8.3.2.16 sort unique The sort unique function combines the results from unique with a call to sort. 8.3.2.17 table The table function returns a named vector with counts of the occurrences of each of the named elements in the input vector, just like the R function table. 8.3 Functions 113 1 IntegerVector xx; 3 table( xx ); Listing 8.24 Example using table sugar function 8.3.2.18 duplicated The duplicated function returns a logical vector indicated whether the element at position i in the input vector duplicates a previous value. 1 IntegerVector xx; 3 duplicated( xx ); Listing 8.25 Example using duplicated sugar function 8.3.3 Mathematical Functions For the following set of functions, generally speaking, the ith element of the result of the given function (say, abs) is the result of applying that function to this ith element of the input expression. Supported types are integer and numeric. Some functions reduce the input vector to a scalar result. Examples such as min(), max(), mean(), var() or sd() show that the commonly assumed functionality is also provided by these Rcpp sugar functions. 1 3 5 7 9 11 13 15 17 19 NumericVector x, y; int k; double z; abs(x); exp(x); floor(x); ceil(x); pow(x, z); log(x); log10(x); sqrt(x); sin(x); cos(x); tan(x); sinh(x); cosh(x); tanh(x); asin(x); acos(x); # x to the power of z 114 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 8 Sugar atan(x); gamma(x); lgamma(x); # log gamma digamma(x); trigamma(x); tetragamma(x); pentagamma(x); expm1(x); log1p(x); factorial(x); lfactorial(x); choose(n, k); lchoose(n, k); beta(n, k); lbeta(n, k); psigamma(n, k); trunc(x); round(x, k); signif(x, k); mean(x); var(x); sd(x); sum(x); cumsum(x); min(x); max(x); range(x); which_min(x); which_max(x); setequal(x, y); Listing 8.26 Examples using mathematical sugar functions 8.3.4 The d/q/p/q Statistical Functions The framework provided by Rcpp sugar also permits easy and efficient access to the density, distribution function, quantile and random number generation functions used by R itself. These are also provided via the Rmath library and, as discussed in Sect. 4.8, available via the R namespace provided by the Rcpp package for this part of the R API. In general, the functions provided by Rcpp sugar are vectorized for the first element. Consequently, in the following example, the function calls work in C++ just as they would in R: 2 4 x1 x2 x3 x4 = = = = dnorm(y1, 0, 1); pnorm(y2, 0, 1); qnorm(y3, 0, 1); rnorm(n, 0, 1); // // // // density of y1 at m=0, sd=1 distribution function of y2 quantiles of y3 ’n’ RNG draws of N(0, 1) Listing 8.27 Examples of d/p/q/r statistical sugar functions sugar 8.4 Performance 115 For x1 to x3, the resulting vector is of the same dimension as the input y1 to y3. Similar d/q/p/r functions are provided for the most common distributions: beta, binom, cauchy, chisq, exp, F, gamma, geom, hyper, lnorm, logis, nbeta, nbinom, nbinom mu, nchisq, nf, norm, nt, pois, t, unif, and weibull. One important point is that the programmer using the random number generator functions needs to initialize the state of the random number generator as detailed in Section 6.3 of the “Writing R Extensions” manual (R Development Core Team 2012d). To help with this, the Rcpp package offers a convenient C++ solution: A scoped class that sets the random number generator on entry to a block and resets it on exit. The following example uses this RNGScope class. The function defines the code block in which the scoped variable is active; and here RNGScope is activated upon entering the function. Therefore, the random number generator can be called in order to assign values to x. The scoped variable is then destroyed after the core function code terminates with the return statement. RNGScope does not have to be the first statement. In fact, it can be placed anywhere in the function scope—but it has to be called before the first call to the random number generator is made. 2 4 RcppExport SEXP getRGamma() { RNGScope scope; NumericVector x = rgamma( 10, 1, 1 ); return x; } Listing 8.28 Examples of using sugar RNG functions with RNGScope As there is some computational overhead involved in using RNGScope, we are not wrapping it automatically around each inner function generating random numbers. Rather, the user of these functions should place an RNGScope at the appropriate level of his or her code. In cases where scalar functions of a single argument returning a single are required, these are provided via R namespace as discussed in Sect. 4.8. The interface is identical to the one offered by the header file Rmath.h of the R installation. This additional interface is provided as Rcpp sugar requires the original header file to be used via a remapping through prefix Rf_, whereas the added functions can be used directly from the new namespace cleanly separating these identifiers. 8.4 Performance The Rcpp package contains a complete example illustrating possible performance gains from using Rcpp sugar in the directory examples/SugarPerformance. It compares the performance on four different R expressions between calling running it as an R expression, running it via hand-optimized C++ code, and running it via the Rcpp sugar vectorized C++ approach. Four different expressions are evaluated covering any, ifelse (where we use two variants of the handwritten code with and without checks for missing values), and sapply. 116 8 Sugar Table 8.1 Run-time performance of Rcpp sugar compared to R and manually optimized C++ R expression Runs Manual Sugar R any(x * y < 0) ifelse(x<y, x*x, -(y*y)) ifelse(x<y, x*x, -(y*y)) (noNA) sapply(x, square) 5,000 500 500 500 0.00027 1.28566 0.41462 0.16721 0.00069 1.52103 1.14434 0.19224 6.8914 13.8829 13.8537 115.4236 As can be seen in Table 8.1, performance varies greatly. This is most dramatic in the first example. An R expression such as any(x * y < 0) will always be evaluated for all pairwise elements in the two vectors. The Rcpp sugar implementation, on the other hand, can take a shortcut and stop as soon as one of the expressions swept across all pairwise elements evaluates to true. After all, the tests is only for “at least one” rather than a full count. The combination of compiled code together with a possible short-circuit exit makes the C++ implementation much faster. In fact, the ratio of C++ time to R time is almost 1–10,000. Handwritten code can still be faster than the Rcpp sugar code by a small factor. The second and third examples illustrate the vector function introduced at the beginning of this chapter. We show two sets of results: In the second example, a standard implementation which, just like R itself, tests all elements for NA (which imposes an additional performance burden); and, the third example which does not perform these tests for NA. Here, we find a 9- and 12-fold gain, respectively, from Rcpp sugar relative to the vectorized R code. The manually written C++ code gains most from omitting the test for NA, whereas the default version is only marginally slower. The fourth and final example illustrates sweeping a function (here computing a square of its argument) over a vector using sapply. Once again, the Rcpp sugar variant is a lot faster than the R version; the ratio of the two measurements is about 600. The Rcpp sugar code is only marginally slower than a handwritten loop. This section illustrates that Rcpp sugar can offer substantial performance gains. While manually written C++ code is seen to be marginally faster, the more concise vectorized code offered by Rcpp sugar might be easier to write and maintain making it an attractive proposition. 8.5 Implementation This section details some of the techniques used in the implementation of Rcpp sugar. Note that the user need not to be familiar with the implementation details in order to use Rcpp sugar, so this section can be skipped during an initial read of the chapter. 8.5 Implementation 117 Writing Rcpp sugar functions is fairly repetitive and follows a well-structured pattern. So once the basic concepts are mastered (which may take time given the inherent complexities in template programming), it should be possible to extend the set of functions further following the established pattern. 8.5.1 The Curiously Recurring Template Pattern Expression templates such as those used by Rcpp sugar employ a technique called the Curiously Recurring Template Pattern (CRTP).1 The general form of CRTP is: 1 // The Curiously Recurring Template Pattern (CRTP) 3 // A templated base class template <typename T> struct base { // ... }; 5 7 9 11 13 // A derived class // which is a template for the base class it inherits from struct derived : base<derived> { // ... }; Listing 8.29 The Curiously Recurring Template Pattern (CRTP) The base class is templated by the class that derives from it: derived. This shifts the relationship between a base class and a derived class—and allows the base class to access methods of the derived class. 8.5.2 The VectorBase Class The CRTP is used as the basis for Rcpp sugar with the VectorBase class template. All sugar expressions derive from one class generated by the VectorBase template. The current definition of VectorBase is given here: 1 3 5 7 template <int RTYPE, bool na, typename VECTOR> class VectorBase { public: struct r_type : traits::integral_constant<int,RTYPE>{}; struct can_have_na : traits::integral_constant<bool,na>{}; typedef typename 1 The Wikipedia page at http://en.wikipedia.org/wiki/Curiously_recurring_ template_pattern has a good introduction and further pointers. 118 8 Sugar traits::storage_type<RTYPE>::type stored_type; 9 VECTOR& get_ref(){ return static_cast<VECTOR&>(*this); } 11 13 inline stored_type operator[]( int i) const { return static_cast<const VECTOR*>(this)->operator[](i); } 15 17 inline int size() const { return static_cast<const VECTOR*>(this)->size(); } 19 21 /* definition omitted here */ class iterator; 23 25 inline iterator begin() const { return iterator(*this, 0); } inline iterator end() const { return iterator(*this, size() ); } 27 29 31 } Listing 8.30 The VectorBase class for Rcpp sugar The VectorBase template has three parameters. RTYPE which controls the type of the underlying SEXP expression. na which embeds in the derived type information about whether instances may contain missing values. Rcpp vector types (IntegerVector, . . . ) derive from VectorBase with this parameter set to true because there is no way to know at compile-time if the vector will contain missing values at run-time. However, this parameter is set to false for types that are generated by sugar expressions as these are guaranteed to produce expressions that are without missing values. An example is the is na function. This parameter is used in several places as part of the compile time dispatch to limit the occurrence of redundant operations. VECTOR which is a key component of Rcpp sugar. This is the manifestation of CRTP. The indexing operator and the size method of VectorBase use a static cast of this to the VECTOR type to forward calls to the actual method of the derived class. 8.5.3 Example: sapply As an example, the current implementation of sapply, supported by the template class Rcpp::sugar::Sapply is given below in Listing 8.31. 8.5 Implementation 2 4 6 8 119 template <int RTYPE, bool NA, typename T, typename Function> class Sapply : public VectorBase< Rcpp::traits::r_sexptype_traits< typename ::Rcpp::traits::result_of<Function>::type >::rtype, true, Sapply<RTYPE,NA,T,Function> > { public: typedef typename ::Rcpp::traits::result_of<Function>::type; 10 12 const static int RESULT_R_TYPE = Rcpp::traits::r_sexptype_traits<result_type>::rtype; 14 typedef Rcpp::VectorBase<RTYPE,NA,T> VEC; 16 typedef typename Rcpp::traits::r_vector_element_converter< RESULT_R_TYPE>::type converter_type; 18 20 22 typedef typename Rcpp::traits::storage_type< RESULT_R_TYPE>::type STORAGE; Sapply( const VEC& vec_, Function fun_ ) : vec(vec_), fun(fun_){} 24 26 inline STORAGE operator[]( int i ) const { return converter_type::get( fun( vec[i] ) ); } 28 inline int size() const { return vec.size(); } 30 34 private: const VEC& vec; Function fun; }; 36 // sugar 38 template <int RTYPE, bool _NA_, typename T, typename Function> inline sugar::Sapply<RTYPE,_NA_,T,Function> sapply(const Rcpp::VectorBase<RTYPE,_NA_,T>& t, Function fun) { return sugar::Sapply<RTYPE,_NA_, T,Function>(t, fun); } 32 40 42 Listing 8.31 The sapply Rcpp sugar implementation 120 8 Sugar 8.5.3.1 The sapply Function sapply is a template function that takes two arguments. • The first argument is a sugar expression, which we recognize because of the relationship with the VectorBase class template. • The second argument is the function fun to apply. The sapply function itself does not do anything. Rather, it is used to trigger compiler detection of the template parameters that will be used in the sugar:: Sapply template. 8.5.3.2 Detection of Return Type of the Function In order to decide what kind of expression is built, the Sapply template class queries the template argument via the Rcpp::traits::result of template. 2 typedef typename ::Rcpp::traits::result_of<Function>::type result_type; Listing 8.32 Rcpp::traits::result of template The result of type trait is implemented as follows: 2 4 6 8 template <typename T> struct result_of { typedef typename T::result_type type; }; template <typename RESULT_TYPE, typename INPUT_TYPE> struct result_of< RESULT_TYPE (*)(INPUT_TYPE) >{ typedef RESULT_TYPE type; }; Listing 8.33 result of trait implementation The generic definition of result of targets functors which contain a nested result type type. The second definition is a partial specialization targeting function pointers. 8.5.3.3 Identification of Expression Type Based on the result type of the function, the r sexptype traits trait is used to identify the expression type. 1 const static int RESULT_R_TYPE = Rcpp::traits::r_sexptype_traits<result_type>::rtype; Listing 8.34 Rcpp::traits::r sexptype traits template 8.5 Implementation 121 8.5.3.4 Converter The r vector element converter class is used to convert an object of the function’s result type to the actual storage type suitable for the sugar expression. 2 typedef typename Rcpp::traits::r_vector_element_converter<RESULT_R_TYPE>::type converter_type; Listing 8.35 r vector element converter class 8.5.3.5 Storage Type The storage type trait is used to get access to the storage type associated with a sugar expression type. For example, the storage type of a REALSXP expression is double. 1 typedef typename Rcpp::traits::storage_type<RESULT_R_TYPE>::type STORAGE; Listing 8.36 storage type trait 8.5.3.6 Input Expression Base Type The input expression—the expression over which sapply runs—is also defined in a typedef for convenience: typedef Rcpp::VectorBase<RTYPE,NA,T> VEC; Listing 8.37 Input expression base type 8.5.3.7 Output Expression Base Type In order to be part of the Rcpp sugar system, the type generated by the Sapply class template must inherit from VectorBase. 1 3 5 7 template <int RTYPE, bool NA, typename T, typename Function> class Sapply : public VectorBase< Rcpp::traits::r_sexptype_traits< typename ::Rcpp::traits::result_of<Function>::type>::rtype, true, Sapply<RTYPE, NA, T, Function> > Listing 8.38 Output expression base type 122 8 Sugar Here we have three arguments. First, the expression built by Sapply depends on the result type of the function. Second, it may contain missing values. The third argument is the manifestation of the CRTP. 8.5.3.8 Constructor The constructor of the Sapply class template is straightforward, it simply consists of holding the reference to the input expression and the function. 1 Sapply( const VEC& vec_, Function fun_ ) : vec(vec_), fun(fun_){} 3 5 private: const VEC& vec; Function fun; Listing 8.39 Constructor for Sapply class template 8.5.3.9 Implementation The indexing operator and the size member function is what the VectorBase expects. The size of the result expression is the same as the size of the input expression and the ith element of the result is simply retrieved by applying the function and the converter. Both these methods are inlined to maximize performance: 2 4 inline STORAGE operator[]( int i ) const { return converter_type::get( fun( vec[i] )); } inline int size() const { return vec.size(); } Listing 8.40 Implementation of Sapply 8.6 Case Study: Computing π Using Rcpp sugar Rcpp sugar provides a large number of functions that can be used as building blocks for other programs and applications. Rather than picking a particular example from an existing package, this section will demonstrate how Rcpp sugar is as compact and expressive as R itself. To do so, we will revisit the well-known introductory example of approximating π . The algorithm uses the property that the area of a unit circle is equal to π and repeatedly draws two uniform random numbers x and y, each between zero and one. It then computes the distance d = x2 + y2 8.6 Case Study: Computing π Using Rcpp sugar 123 to the origin and compares it to one to determine whether the point (x, y) is inside or outside the unit circle. By summing up all the attempts less than one, dividing by the total count N, one obtains a proportion—of the area of a quarter of the unit circle as we constrained the initial draws to be in the first quadrant. So, consequently, our estimate of π is provided by four times that area as an estimate for the area of the whole unit circle. 2 4 6 piR <- function(N) { x <- runif(N) y <- runif(N) d <- sqrt(xˆ2 + yˆ2) return(4 * sum(d < 1.0) / N) } Listing 8.41 Simulating π in R An equivalent program can be written in C++ thanks to Rcpp sugar with only one more line (in the function body) to ensure the random number generator is properly set. 2 #include <Rcpp.h> 4 using namespace Rcpp; 6 // [[Rcpp::export]] double piSugar(const int N) { RNGScope scope; // ensure RNG gets set/reset NumericVector x = runif(N); NumericVector y = runif(N); NumericVector d = sqrt(x*x + y*y); return 4.0 * sum(d < 1.0) / N; } 8 10 12 Listing 8.42 Simulating π in C++ Using Rcpp attributes, we can obtain an R function of the same name piSugar simply by passing the name of the file into the sourceCpp() function. The complete example is shown in Listing 8.43. 2 4 6 8 library(Rcpp) library(rbenchmark) piR <- function(N) { x <- runif(N) y <- runif(N) d <- sqrt(xˆ2 + yˆ2) return(4 * sum(d < 1.0) / N) } 10 12 # get C++ version from source file sourceCpp("piSugar.cpp") 14 N <- 1e6 124 16 8 Sugar set.seed(42) resR <- piR(N) 18 20 22 set.seed(42) resCpp <- piSugar(N) ## important: check results are identical with RNG seeded stopifnot(identical(resR, resCpp)) 24 res <- benchmark(piR(N), piSugar(N), order="relative") 26 print(res[,1:4]) Listing 8.43 Simulating π in R The result is shown in Table 8.2. Even though both versions are equally compact and execute essentially identical vectorized code leading to identical results (as we verified), the C++ version manages to reduce the run-time by almost half which is a surprisingly good result. Table 8.2 Run-time performance of Rcpp sugar compared to R for simulating π R expression Replications Elapsed Relative piSugar(N) piR(N) 100 100 5.777 11.227 1.000 1.943 We should stress, though, that one should not take away from this to attempt to rewrite each and every R expression using Rcpp sugar. Its power lies in providing us concise expression at the C++ level. This allows the programmer to write compact code similar to what one could achieve in R. This can complement other C++ code and is not necessarily meant to replace R as vectorized R code is typically fast enough. Part IV Applications Chapter 9 RInside Abstract The RInside package permits direct use of R inside of a C++ application. RInside provides an abstraction layer around the R embedding API and makes it easier to access an R instance inside your application. Moreover, thanks to the classes provided by Rcpp, data interchange between R and C++ becomes very straightforward. We illustrate RInside by examining several of the many examples included with the package. 9.1 Motivation Rcpp has been discussed throughout this book as a package which makes it easier to add new code to R itself. Examples for this can be stand-alone routines, accessing an external library or a combination of both. One commonality has been that R remains the principal interface: The aim is always to extend R with new facilities, yet the focus is on extending R as the principal environment for statistical computing, data analysis, and modeling. However, there are situations where one may want to take a different view, starting from a C++ program as a main executable. One might like to deploy R as an analysis engine or service to enhance the executable. As a more concrete example, consider the case of a (potentially large) program to control a set of simulations. After running a number of experiments, results can be aggregated to be analyzed further with the intent of deriving intermediate results which will influence further simulations. Our quest is to make the intermediate analysis accessible to the outer C++ program controlling the simulation. So at this point, one of the two common workflows is usually deployed. The first approach is the simplest. Data is written out to files. Standard textfiles are common, or maybe a domain-specific or higher-performance binary format is chosen. At this point, analysis may switch to another program for data analysis such as R. The analysis itself maybe written out in scripts; maybe these can be executed with a front-end such as Rscript (which is available wherever R is installed). D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 9, © The Author 2013 127 128 9 RInside And, in that case the main analysis program could even call Rscript itself via a system() call. Once the data analysis has completed, the main simulation program can resume, possibly reflecting updated parameters from the analysis. A second approach may be to communicate with the analysis program over the network. The Rserve package (Urbanek 2003, 2012) can listen on network sockets and receive data as well as instructions—which makes it suitable to act as a networked R analysis facility. The main program can transmit the data to be analyzed and can then invoke the analysis script. The main program proceeds with its operations once it receives the results from the analysis engine or server. Thus, this setup loosely corresponds to the facilities from the first scripting case in that calls are made to an external engine. Both approaches work, yet both also have drawbacks. Calling an external program using system() is relatively simple and somewhat robust. However, a key problem is the difficulty of reporting errors back via the reduced interface offered by the system() call. One can encode the respective success and error codes as integer values which can be returned. Alternatively, results can be written out to the standard output, or into files, which the main program then needs to parse. No other formal mechanism exists for communication between the two programs sharing the work. Another drawback concerning file-based communication is the possibility of race conditions if more than one instance is running. Similarly, the networked solution introduces a possible new point of failure at the network layer. This can of course be mitigated by running the Rserve instance on the same machine, or by adding some redundancy to the network setup. The potential shortcomings of both approaches suggest a search for alternatives. One possibility supported by R is to embed the interpreter in another program. This is supported through an extensive embedding API (R Development Core Team 2012d, Chapter 8). This API is written in C and does not support higher-level abstractions as in Rcpp. However, the RInside package (Eddelbuettel and François 2012d) builds on top of the standard R embedding API in a way that may seem more natural to C++ programmers. In this chapter, we illustrate the use of RInside via a number of examples supplied with the package itself. 9.2 A First Example: Hello, World! Let us look at the very first of well over a dozen examples presented in the directory examples/standard of the RInside package. It follows the long and proud tradition of making a first program display the string “Hello, world!” on the screen: 1 #include <RInside.h> 3 int main(int argc, char *argv[]) { 5 7 // create an embedded R instance RInside R(argc, argv); 9.2 A First Example: Hello, World! 129 // assign a char* (string) to ’txt’ R["txt"] = "Hello, world!\n"; 9 // eval the init string, ignoring any returns R.parseEvalQ("cat(txt)"); 11 13 exit(0); 15 } Listing 9.1 First RInside example: Hello, World! The program really consists of only four statements, and one single header file (RInside.h) providing all declarations. We first instantiate an object called R of the RInside class. The class has two arguments to deal with command-line arguments; these arguments are, however, entirely optional. This is followed by an assignment of a constant text string—the message to be displayed—to a variable named txt which is created directly inside the R session. Next, we parse and evaluate an R command passed to the embedded R instances which calls the cat() function to simply display the content of variable txt. This last parse and evaluation is done “quietly” (as indicated by the trailing “Q” on the member function name) and no result is returned. The related function parseEval() which we will see below returns the value of its last expression, much like a standard R function. Finally, we return with error code of zero, which is a common value to indicate successful completion. Building this first program is straightforward provided that a Makefile (or Makefile.win for the Windows platform) has been set up—as is the case with the aforementioned directory examples/standard of the RInside package. The Makefile contains shell expressions which query R , Rcpp, and RInside for relevant header and library information, and then use this information to build the complete compile command: 1 3 5 ## comment this out if you need a different version of R, ## and set set R_HOME accordingly as an environment variable R_HOME := $(shell R RHOME) sources := programs := $(wildcard *.cpp) $(sources:.cpp=) 7 9 11 13 15 ## include headers and libraries for R RCPPFLAGS := $(shell $(R_HOME)/bin/R RLDFLAGS := $(shell $(R_HOME)/bin/R RBLAS := $(shell $(R_HOME)/bin/R RLAPACK := $(shell $(R_HOME)/bin/R CMD CMD CMD CMD config config config config --cppflags) --ldflags) BLAS_LIBS) LAPACK_LIBS) ## if you need to set an rpath to R itself, also uncomment #RRPATH := -Wl,-rpath,$(R_HOME)/lib 17 19 ## include headers and libraries for Rcpp interface classes RCPPINCL := $(shell echo ’Rcpp:::CxxFlags()’ | \ $(R_HOME)/bin/R --vanilla --slave) 130 21 RCPPLIBS := 9 RInside $(shell echo ’Rcpp:::LdFlags()’ | \ $(R_HOME)/bin/R --vanilla --slave) 23 25 27 ## include headers and libraries for RInside embedding classes RINSIDEINCL := $(shell echo ’RInside:::CxxFlags()’ | \ $(R_HOME)/bin/R --vanilla --slave) RINSIDELIBS := $(shell echo ’RInside:::LdFlags()’ | \ $(R_HOME)/bin/R --vanilla --slave) 29 31 33 35 37 ## compiler etc settings used in default make rules CXX := $(shell $(R_HOME)/bin/R CMD config CXX) CPPFLAGS := -Wall \ $(shell $(R_HOME)/bin/R CMD config CPPFLAGS) CXXFLAGS := $(RCPPFLAGS) $(RCPPINCL) $(RINSIDEINCL) \ ‘$(R_HOME)"/bin/R CMD config CXXFLAGS‘ LDLIBS := $(RLDFLAGS) $(RRPATH) $(RBLAS) $(RLAPACK) \ $(RCPPLIBS) $(RINSIDELIBS) Listing 9.2 Makefile for RInside examples With a Makefile in place, we merely have to say make rinside_sample0 to build this first example, or even just make to build all examples. The equivalent manual build commands are displayed below as the output from calling make. The exact form will differ depending on where packages have been installed as well as operating system-specific aspects and local system-wide compiler flags. To provide an illustration for a Linux system, we see the following execute (with lines broken for display purposes): 1 3 5 7 9 11 sh> make rinside_sample0 g++ -I/usr/share/R/include -I/usr/local/lib/R/site-library/Rcpp/include -I"/usr/local/lib/R/site-library/RInside/include" -O3 -pipe -g -Wall rinside_sample0.cpp -L/usr/lib64/R/lib -lR -lblas -llapack -L/usr/local/lib/R/site-library/Rcpp/lib -lRcpp -Wl,-rpath,/usr/local/lib/R/site-library/Rcpp/lib -L/usr/local/lib/R/site-library/RInside/lib -lRInside -Wl,-rpath,/usr/local/lib/R/site-library/RInside/lib -o rinside_sample0 Listing 9.3 Using Makefile for RInside to build example This multiline expression looks somewhat intimidating. But it really only combines three sets of headers and libraries. These come from three distinct sources which are combined into one larger compile-and-link command: 1. For compiling and linking with R as would be provided by R CMD COMPILE and R CMD LINK. 2. For compiling and linking with Rcpp. 3. For compiling and linking with RInside. Users can simply copy the same base components from the provided Makefile to build their own Makefile. As an alternative, the provided Makefile is 9.3 A Second Example: Data Transfer 131 generic and can be reused. It will compile and link any collection of example files into the corresponding set of executable programs. The only (current) constraint is the one-to-one mapping between source files and executables. Multiple dependencies are not currently supported though this could be added by extending the Makefile. Finally, on the Windows platform the corresponding Windows Makefile has to be used. This can be accomplished most easily by supplying the -f Makefile.win argument to the make invocation as in make -f Makefile.win. 9.3 A Second Example: Data Transfer Several other examples are supplied with the RInside package in a directory examples/standard. Several of these examples contain simple usage cases of calling R functions. The example below, which is a lightly edited version of the file rinside_sample6.cpp, shows simple data transfers for a variety of containers around double types: 1 #include <RInside.h> 3 int main(int argc, char *argv[]) { 5 RInside R(argc, argv); 7 double d1 = 1.234; R["d1"] = d1; // scalar double std::vector<double> d2; d2.push_back(1.23); d2.push_back(4.56); R["d2"] = d2; // vector of doubles 9 11 13 15 17 std::map< d3["a"] = d3["b"] = R["d3"] = std::string, double > d3; // map 7.89; 7.07; d3; 19 21 23 25 27 29 std::list< double > d4; d4.push_back(1.11); d4.push_back(4.44); R["d4"] = d4; // list of doubles std::string txt = // now access in R "cat(’\nd1=’, d1, ’\n’); print(class(d1));" "cat(’\nd2=\n’);print(d2);print(class(d2));" "cat(’\nd3=\n’);print(d3);print(class(d3));" "cat(’\nd4=\n’);print(d4);print(class(d4));"; 132 9 RInside R.parseEvalQ(txt); 31 exit(0); 33 } Listing 9.4 Second RInside example: data transfer This shows how to transfer, respectively, a single scalar floating-point number in double precision as well as the STL containers vector, map, and list for such floating-point numbers. Integers, characters, and logical could be passed through accordingly. 9.4 A Third Example: Evaluating R Expressions The third example (based on rinside_sample7.cpp) shows that accessing results from a normal evaluation (as opposed to “quiet” as in the first example) in R is also very straightforward. We can display the results at the C++ side using C++ output operators as Rcpp has taken care of all transfers. 1 #include <RInside.h> 3 int main(int argc, char *argv[]) { 5 RInside R(argc, argv); 7 // assignment can be done directly via [] R["x"] = 10 ; R["y"] = 20 ; 9 // R statement evaluation and result R.parseEvalQ("z <- x + y"); 11 13 // retrieval access using [] and implicit wrapper int sum = R["z"]; std::cout << "10 + 20 = " << sum << std::endl ; 15 17 // we can also return the value directly sum = R.parseEval("x + y") ; std::cout << "10 + 20 = " << sum << std::endl ; 19 21 exit(0); 23 } Listing 9.5 Third RInside example: data transfer This third example only shows a transfer of scalar values. However, larger composite objects can also be returned due to the implicit use of wrap(). This last aspect is very important as any expression in R is represented as a SEXP, and as such SEXP objects can be transferred between R and C++ with ease thanks to the Rcpp functions as<>() and wrap(). By using these facilities, we 9.5 A Fourth Example: Plotting from C++ via R 133 actually have a perfectly generic and extensible way of passing vectors, matrices, data frames, and lists—or even combinations of these types—simply by relying on the templated code in Rcpp. 9.5 A Fourth Example: Plotting from C++ via R The fourth and last example for RInside in this section is based on the file rinside_sample8.cpp. It shows that one can call the R function plot() from C++ function as well. 1 #include <RInside.h> #include <unistd.h> 3 int main(int argc, char *argv[]) { 5 // create an embedded R instance RInside R(argc, argv); 7 // evaluate an R expression with curve() // because RInside defaults to interactive=false we use a file std::string cmd = "tmpf <- tempfile(’curve’); " "png(tmpf); " "curve(xˆ2, -10, 10, 200); " "dev.off();" "tmpf"; // we get the last assignment back, here the filename std::string tmpfile = R.parseEval(cmd); 9 11 13 15 17 std::cout << "Could now use plot in " << tmpfile << std::endl; unlink(tmpfile.c_str()); // cleaning up 19 21 // alternatively, by forcing a display we can plot to screen cmd = "x11(); curve(xˆ2, -10, 10, 200); Sys.sleep(30);"; R.parseEvalQ(cmd); // parseEvalQ evals without assignment 23 25 exit(0); 27 } Listing 9.6 Fourth RInside example: plotting from C++ via R A small set of R instructions selects a temporary file to be used for Portable Network Graphics (PNG) file. A simple curve is then plotted. Here, we remove the temporary file but its name and location could be passed to another function displaying it. For completeness, we also plot onto a normal graphics device. This assumes that such a device can be opened as in normal interactive mode. As this example shows, the embedded R instance is capable of executing the same set of instructions as an interactive R session. 134 9 RInside 9.6 A Fifth Example: Using RInside Inside MPI Besides the directory examples/standard containing the example discussed so far in this chapter, the RInside package contains a directory examples/mpi which shows how to use R and Rcpp in the context of the Message Passing Interface (MPI). MPI is a very mature standard used particularly in scientific computing. It enables clusters of computers to work concurrently on programming problems. A detailed discussion of MPI is far beyond the scope of this sections; standard references exist (Gropp et al. 1996, 1999). A simple example rinside_mpi_sample2.cpp follows. It is based on an earlier version which was kindly provided by Jianping Hua and which used the C version of the MPI standard. We have updated it to the C++ variant of the MPI API; both API variants are rather close. 1 #include <mpi.h> // mpi header #include <RInside.h> // for the embedded R via RInside 3 int main(int argc, char *argv[]) { 5 // mpi initialization MPI::Init(argc, argv); // obtain current node rank and total nodes running int myrank = MPI::COMM_WORLD.Get_rank(); int nodesize = MPI::COMM_WORLD.Get_size(); 7 9 11 // create an embedded R instance RInside R(argc, argv); 13 std::stringstream txt; // node information txt << "Hello from node " << myrank << " of " << nodesize << " nodes!" << std::endl; // assign string var to R variable ’txt’ R.assign( txt.str(), "txt"); 15 17 19 21 // show node information std::string evalstr = "cat(txt)"; // eval the init string, ignoring any returns R.parseEvalQ(evalstr); 23 25 // mpi finalization MPI::Finalize(); 27 29 exit(0); 31 } Listing 9.7 Fifth RInside example: parallel computing with MPI This program simply prints the standard “Hello, World!” greeting from each of the nodes in an MPI cluster. It needs to build against MPI headers and libraries; the supplied Makefile does that for the Open MPI standard implementation. It can easily be adapted to different local deployments as well. 9.7 Other Examples 135 A somewhat richer example rinside_mpi_sample3.cpp is available as well and shows how to do some simple computations on each node in the MPI cluster. 9.7 Other Examples The examples/standard directory contains a number of additional examples which may be of interest. Among the topics illustrated are: • How to pass two-dimensional data structures such as matrices. • Running a regression in R and displaying the results via C++ indicating how to deploy R as a backend for a C++ application. • A small portfolio management problem motivated a mailing list post. • Conversions examples for logicals, lists, and tests for environments. • An example demonstrating how to use Rcpp modules with RInside. Fig. 9.1 Combining RInside with the Qt toolkit for a GUI application The examples/qt directory shows an example of how to embed R inside of an application using the powerful and popular Qt toolkit supporting cross-platform applications, and in particular those with a graphical user interface (GUI) (Fig. 9.1). 136 9 RInside This example is fairly straightforward. Given a mixture distribution (for which the user can alter parameter as well as the functional form), the choice of kernel function and parameters for the density estimation directly influence the density estimate. The application provides an opportunity to interactively experiment with these choices. Several of the so-called widgets in the GUI toolkit return parameters. For the selection of the estimation kernel, as well as the estimation bandwidth, this is an integer. For the R expression denoting the generation of the data set from which the density is estimated, we obtain a character string. These can be passed to R rather easily using the Rcpp facilities we have studied. From R, we then obtain an updated graphics file and the GUI toolkit covers displaying the updated file. The remainder of the program—which is relatively short at about two hundred lines—mostly deals with the typical code to set up a graphical application, lay out the GUI widgets, and organize the callback events. Adding R into the mix is straightforward, thanks to RInside. Similarly, the examples/wt directory shows two examples which implement a web application (corresponding to the GUI application written using Qt) by using the Wt (“web toolkit”) library. Wt is responsible for all aspects of the network programming, provides an integrated webserver, and negotiates the best communications protocol with the webclient sending the request (Fig. 9.2). Fig. 9.2 Combining RInside with the Wt toolkit for a web application 9.7 Other Examples 137 The web application example is very similar to the GUI application as the Web toolkit library Wt covers all aspects of the network communication. Similar to the Qt example, the programmer simply has to receive the user choices on user events and update the estimated density based on these choices. This example uses cascading style sheets (CSS) to allow alteration of the appearances of the application without requiring any code logic and supports an XML file containing widget labels permitting similar changes to the descriptive texts without requiring an application rebuild. Chapter 10 RcppArmadillo Abstract The RcppArmadillo package implements an easy-to-use interface to the Armadillo library. Armadillo is an excellent, modern, high-level C++ library aiming to be as expressive to use as a scripting language while offering highperformance code due to modern C++ design including template meta- programming. RcppArmadillo brings all these features to the R environment by leaning on the Rcpp interface. This chapter introduces Armadillo and provides a motivating example via a faster replacement function for fitting linear models before it discusses a detailed case study of implementing a Kalman filter in RcppArmadillo. 10.1 Overview Armadillo (Sanderson 2010) is a modern C++ library with a focus on linear algebra and related operations. Quoting from its homepage: Armadillo is a C++ linear algebra library aiming towards a good balance between speed and ease of use. The syntax is deliberately similar to Matlab. Integer, floating point and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided [. . . ]. A delayed evaluation approach is employed (during compile time) to combine several operations into one and reduce (or eliminate) the need for temporaries. This is accomplished through recursive templates and template meta-programming. This library is useful if C++ has been decided as the language of choice (due to speed and/or integration capabilities) [. . . ]. Its features can be illustrated with a simple example from the Armadillo web site (which we modified slightly to fit with the style of the rest of the book). 1 #include <iostream> #include <armadillo> 3 5 using namespace std; using namespace arma; D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 10, © The Author 2013 139 140 9 int main(int argc, char** argv) { mat A = randn<mat>(4,5); mat B = randn<mat>(4,5); 11 cout << A * trans(B) << endl; 7 10 RcppArmadillo return 0; 13 } Listing 10.1 A simple Armadillo example Header files for both the standard input/output streams and Armadillo itself are included to provide the required declarations. Two Armadillo matrices A and B of size 4 × 5 are then filled with random variables drawn from a N(0, 1) distribution. The randn() function is templated to the matrix type. Finally, the 4 × 4 matrix resulting from multiplying A by the transpose of B is computed and the result is printed to the standard output. The code is highly readable and easy to study. This is in one part due to the global namespace import for both std and arma which shortens the function and class names. It is also due to the sensible use of identifiers such as trans(B) for a transpose (and the alternate form B.t() is also supported), as well as mat as a default matrix type. This is in fact a typedef for Mat<double>, a matrix templated to the standard floating-point type. Other matrix and vector types exist for integers, unsigned integers, and complex numbers. Armadillo supports a large number of functions as a look at its available documentation reveals. While many of these functions are also available within R itself, they make Armadillo as attractive choice for the C++ programmer aiming to easily extend functionality at the C++ source level. This, in essence, is the main attraction of Armadillo: an easy-to-use, feature-complete, well-supported modern C++ library for linear algebra. The RcppArmadillo package (François et al. 2012; Eddelbuettel and Sanderson 2013) integrates it into R using facilities provided by the Rcpp package. 10.2 Motivation: FastLm 10.2.1 Implementation Fitting linear models is a fundamental building block of data analysis. It is available in R via the powerful lm() function which provides a vast amount of additional functionality, as well as the more spartan lm.fit() function. Below, we show the complete file fastLm.cpp from the src directory of the RcppArmadillo package which implements a faster replacement function suitable for use in extended simulations. 10.2 Motivation: FastLm 141 extern "C" SEXP fastLm(SEXP ys, SEXP Xs) { 2 try { // Rcpp and arma structure reuse original memory Rcpp::NumericVector yr(ys); Rcpp::NumericMatrix Xr(Xs); int n = Xr.nrow(), k = Xr.ncol(); arma::mat X(Xr.begin(), n, k, false); arma::colvec y(yr.begin(), yr.size(), false); int df = n - k; 4 6 8 10 // fit model y ˜ X, extract residuals arma::colvec coef = arma::solve(X, y); arma::colvec res = y - X*coef; 12 14 double s2 = std::inner_product(res.begin(), res.end(), res.begin(), 0.0)/df; // std.errors of coefficients arma::colvec sderr = arma::sqrt(s2 * arma::diagvec(arma::pinv(arma::trans(X)*X))); 16 18 20 return Rcpp::List::create(Rcpp::Named("coefficients")=coef, Rcpp::Named("stderr") =sderr, Rcpp::Named("df") =df); 22 24 } catch( std::exception &ex ) { forward_exception_to_r( ex ); } catch(...) { ::Rf_error( "c++ exception (unknown reason)" ); } return R_NilValue; // -Wall 26 28 30 32 } Listing 10.2 FastLm function using RcppArmadillo As the example demonstrates, Armadillo allows us to write remarkably compact code: 1. We start by instantiating Rcpp objects for the model matrix and dependent variable; these are lightweight proxy objects and no data is copied. 2. The vector y and matrix X are initialized as an arma matrix and vector from the Rcpp types using the dimension information and iterator pointing to the beginning of the data, and again no explicit memory allocation is needed. 3. The model fit of y ∼ X is also done in one solve() statement, as is the calculation of the residuals as y − X β̂ . 4. Similarly, the sum of squared residuals is computed in a single statement, thanks to the STL inner_product function, and the result is divided by the degrees of freedom n − k. 5. Now the standard errors of the estimate are extracted as the squared root of the diagonal of (X X)−1 , scaled by the sum of squared residuals. 6. We need one further statement to create the named list of return values. 142 10 RcppArmadillo 7. Controlling for exceptions is straightforward with a try/catch block which passes recognized exception back to R using a helper function, or shows a default text in case of an unrecognized exception. The RcppArmadillo package provides access to the function above via two different interfaces. The simpler function fastLmPure() just transfers a given vector and matrix and executes the regression, without any other transformation (but two tests for suitable data type and conforming dimensions). The higher-level function fastLm() provides the standard modeling interface using the common formula notation. 10.2.2 Performance Comparison As before, we can rely on inline to create a function by compiling, linking, and loading the code below. This is made possible by a plugin provided by the RcppArmadillo package and used by the inline to determine the required values to instrument the underlying R CMD COMPILE and R CMD SHLIB calls which execute the build. 2 4 6 src <- ’ Rcpp::NumericMatrix Xr(Xs); Rcpp::NumericVector yr(ys); int n = Xr.nrow(), k = Xr.ncol(); arma::mat X(Xr.begin(), n, k, false); arma::colvec y(yr.begin(), yr.size(), false); int df = n - k; 8 10 // fit model y ˜ X, extract residuals arma::colvec coef = arma::solve(X, y); arma::colvec res = y - X*coef; 12 14 16 double s2 = std::inner_product(res.begin(), res.end(), res.begin(), 0.0)/df; // std.errors of coefficients arma::colvec sderr = arma::sqrt(s2 * arma::diagvec(arma::pinv(arma::trans(X)*X))); 18 20 22 24 return Rcpp::List::create(Rcpp::Named("coefficients")=coef, Rcpp::Named("stderr") =sderr, Rcpp::Named("df") =df); ’ fLm <- cxxfunction(signature(Xs="numeric", ys="numeric"), src, plugin="RcppArmadillo") Listing 10.3 Basic fLm() function without formula interface We can also time and compare the approaches. As before, we use the rbenchmark package (Kusnierczyk 2012) which contains a function benchmark() which makes such timing comparisons very straightforward. We use the data set 10.2 Motivation: FastLm 143 trees which is also used in the example from the help page for the original lm() function in R and compare computation of the linear fit via the three different approaches, each repeated one thousand times. As intermediate approaches, we use both the fastLmPure() function implemented in RcppArmadillo and a simpler fastLmPure2() used just here. We can think of fastLmPure() as an equivalent to lm.fit() as it works directly on a matrix and vector rather than a model formula. The function is defined as follows: fastLmPure <- function(X, y) { 2 stopifnot(is.matrix(X)) stopifnot(nrow(y)==nrow(X)) 4 .Call("fastLm", X, y, PACKAGE = "RcppArmadillo") 6 } Listing 10.4 Basic fastLmPure() R function without formula interface The fastLmPure2() is a copy where we removed the two stopifnot() tests for proper types and dimensions. That is done just for this performance comparison and not recommended for normal production code. Given these functions, the small performance comparison can be executed as follows: 1 3 5 7 9 11 13 15 17 R> R> R> R> + + + + + + + y <- log(trees$Volume) X <- cbind(1, log(trees$Girth)) frm <- formula(log(Volume) ˜ log(Girth)) benchmark(fLm(X, y), fastLmPure(X, y), fastLmPure2(X, y), fastLm(frm, data=trees), columns = c("test", "replications", "elapsed", "relative"), order="relative", replications=1000) test replications elapsed relative 1 fLm(X, y) 1000 0.034 1.000000 3 fastLmPure2(X, y) 1000 0.040 1.176471 2 fastLmPure(X, y) 1000 0.081 2.382353 5 lm.fit(X, y) 1000 0.136 4.000000 4 fastLm(frm, data = trees) 1000 1.414 41.588235 Listing 10.5 FastLm comparison Given the small size of the data set, executing the underlying regression is not expensive at all. Hence, small code differences such as the testing for data type and data dimension (which is done by fastLmPure() but not by fastLmPure2()) can have a disproportionate performance impact. It may appear magnified here given the short computation time required by the minimal implementation in fLm() from the cxxfunction() invocation above. 144 10 RcppArmadillo The more dramatic difference is between the formula interface offered by the RcppArmadillo package and the more direct implementation. The additional function calls needed to parse the formula and to set up the model matrix can be seen as having a disproportionate cost especially given the small size of the data set used here. We also see that fLm(), with its directly attached object code and pointer to it, performs marginally faster than the simplest possible implementation in a function using .Call() employing code from the indicated package, here RcppArmadillo. However, this small difference is dwarfed by the cost of the (highly recommended) tests for proper types and dimensions in fastLmPure(). Overall the example is very encouraging. We can execute one thousand calls to the simple regression function created on the fly via the inline package in about 34 ms. The bare bones implementation of fastLmPure from RcppArmadillo takes a little longer at 81 ms. By reducing fastLmPure further to just the .Call() invocation, its time moves within twenty percent of the time of fLm which indicates a small but noticeable overhead from the .Call() interface. 10.2.3 A Caveat The reimplementation of lm() using Armadillo has served as a very useful example of how to add C++ code implementing linear algebra operations. However, there is one important difference between the numerical computing aspect and the statistical computing side. The help page for fastLm in the RcppArmadillo package has an illustration. Numerical computing Statistical computing It uses an artificial data set constructed such that it produces a rank-deficient two-way layout with missing cells. Such cases require a special pivoting scheme of “pivot only on (apparent) rank deficiency” which R contains via customized routines based on the Linpack library. This provides the appropriate statistical computing approach, but such pivoting is generally not contained in any conventional linear algebra software libraries such as Armadillo. 2 4 6 8 10 12 14 16 R> ## case where fastLm breaks down R> dd <- data.frame(f1 = gl(4, 6, labels=LETTERS[1:4]), + f2 = gl(3, 2, labels=letters[1:3]))[-(7:8),] R> xtabs(˜ f2 + f1, dd) # one missing cell f1 f2 A B C D a 2 0 2 2 b 2 2 2 2 c 2 2 2 2 R> mm <- model.matrix(˜ f1 * f2, dd) R> kappa(mm) # large, indicating rank deficiency [1] 4.30923e+16 R> set.seed(1) R> dd[,"y’’] <- mm %*% seq_len(ncol(mm)) + + rnorm(nrow(mm), sd = 0.1) R> summary(lm(y ˜ f1 * f2, dd)) # detects rank deficiency 10.2 Motivation: FastLm 18 20 22 24 26 28 30 32 34 36 38 145 Call: lm(formula = y ˜ f1 * f2, data = dd) Residuals: Min 1Q Median -0.122 -0.047 0.000 3Q 0.047 Max 0.122 Coefficients: (1 not defined because of singularities) Estimate Std. Error t value Pr(>|t|) (Intercept) 0.9779 0.0582 16.8 3.4e-09 *** f1B 12.0381 0.0823 146.3 < 2e-16 *** f1C 3.1172 0.0823 37.9 5.2e-13 *** f1D 4.0685 0.0823 49.5 2.8e-14 *** f2b 5.0601 0.0823 61.5 2.6e-15 *** f2c 5.9976 0.0823 72.9 4.0e-16 *** f1B:f2b -3.0148 0.1163 -25.9 3.3e-11 *** f1C:f2b 7.7030 0.1163 66.2 1.2e-15 *** f1D:f2b 8.9643 0.1163 77.1 < 2e-16 *** f1B:f2c NA NA NA NA f1C:f2c 10.9613 0.1163 94.2 < 2e-16 *** f1D:f2c 12.0411 0.1163 103.5 < 2e-16 *** --Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 40 42 Residual standard error: 0.0823 on 11 degrees of freedom Multiple R-squared: 1, Adjusted R-squared: 1 F-statistic: 1.86e+04 on 10 and 11 DF, p-value: <2e-16 44 R> summary(fastLm(y ˜ f1 * f2, dd)) # some huge coefficients 46 48 50 52 54 56 58 60 62 64 Call: fastLm.formula(formula = y ˜ f1 * f2, data = dd) Estimate StdErr t.value p.value (Intercept) 2.384e-01 5.091e-01 4.680e-01 0.649605 f1B 5.165e+15 3.394e-01 1.522e+16 < 2e-16 *** f1C 3.728e+00 7.200e-01 5.177e+00 0.000415 *** f1D 4.697e+00 7.200e-01 6.523e+00 6.70e-05 *** f2b 5.752e+00 7.200e-01 7.989e+00 1.19e-05 *** f2c 6.632e+00 7.200e-01 9.211e+00 3.36e-06 *** f1B:f2b -5.165e+15 5.366e-01 -9.625e+15 < 2e-16 *** f1C:f2b 7.000e+00 1.018e+00 6.875e+00 4.32e-05 *** f1D:f2b 8.262e+00 1.018e+00 8.114e+00 1.04e-05 *** f1B:f2c -5.165e+15 5.366e-01 -9.625e+15 < 2e-16 *** f1C:f2c 1.027e+01 1.018e+00 1.009e+01 1.46e-06 *** f1D:f2c 1.131e+01 1.018e+00 1.111e+01 6.02e-07 *** --Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 Multiple R-squared: 0.996, Adjusted R-squared: 0.993 Listing 10.6 An example of a rank-deficient design matrix The lm() function correctly detected the rank-deficiency in the model matrix which is illustrated by the empty cell in the cross-tabulation. The corresponding 146 10 RcppArmadillo interaction parameter has been set to zero; all other coefficients are reasonable. This is achieved by not using the standard numerical computing approach, but by relying on a custom pivoting implementation which R provides based on a Linpack routine. This will necessarily be slower than the optimized BLAS Level 3 routine for QR decomposition called by Armadillo for the faster reimplementation Also, our faster reimplementation has no (explicit) check for rank deficiency by computing κ , the condition number. The model fit is executed in the standard way (as far as numerical computing is concerned), and biased coefficients ensue in this (degenerate) case. Arguably, silently returning wrong results is worse or more inconvenient than an outright failure. This illustrates that it may well pay to bear a small performance penalty by running lm() (or at least lm.fit()) directly as there may be situations when it is numerically more stable. That said, on current computing architecture floating-point precision tends to be high enough so that such cases are rare in practice. We admit that the example was a little contrived—but it provides a useful illustration. 10.3 Case Study: Kalman Filter Using RcppArmadillo The Armadillo library provides common linear algebra operations in a well-designed and modern C++ framework. It permits us to write elegant and concise code that is also very efficient. A minor additional focus of Armadillo is that it also aims to make it easy for programmers who are familiar with the Matlab / Octave matrix languages to get started in C++ with Armadillo. To demonstrate this aspect, we are going to discuss a second example motivated by a discussion of how Matlab can estimate a Kalman filter, and turn the simple program into a C++ version. While R has no comparable tools to convert R code automatically to C or C++ , we can, however, achieve rather decent gains by switching the code to Armadillo via the RcppArmadillo package. The page http://www.mathworks.com/products/matlab-coder/ demos.html lists several case studies for this (commercial) code converter. One of these examples covers the Kalman filter. It describes in some detail the original filter, including an example data set, as well as all the steps from the initial script to autogenerated C code. 1 3 5 7 9 11 % Copyright 2010 The MathWorks, Inc. function y = kalmanfilter(z) %#codegen dt=1; % Initialize state transition matrix A=[ 1 0 dt 0 0 0;... % [x ] 0 1 0 dt 0 0;... % [y ] 0 0 1 0 dt 0;... % [Vx] 0 0 0 1 0 dt;... % [Vy] 0 0 0 0 1 0 ;... % [Ax] 0 0 0 0 0 1 ]; % [Ay] 10.3 Case Study: Kalman Filter Using RcppArmadillo 13 15 17 19 21 23 25 27 29 31 147 H = [ 1 0 0 0 0 0; 0 1 0 0 0 0 ]; % Initialize measurement matrix Q = eye(6); R = 1000 * eye(2); persistent x_est p_est % Initial state conditions if isempty(x_est) x_est = zeros(6, 1); % x_est=[x,y,Vx,Vy,Ax,Ay]’ p_est = zeros(6, 6); end % Predicted state and covariance x_prd = A * x_est; p_prd = A * p_est * A’ + Q; % Estimation S = H * p_prd’ * H’ + R; B = H * p_prd’; klm_gain = (S \ B)’; % Estimated state and covariance x_est = x_prd + klm_gain * (z - H * x_prd); p_est = p_prd - klm_gain * H * p_prd; % Compute the estimated measurements y = H * x_est; end % of the function Listing 10.7 Basic Kalman filter in Matlab The R code below reimplements this basic linear Kalman filter. FirstKalmanR <- function(pos) { 2 4 6 8 10 12 14 16 kalmanfilter <- function(z) { dt <- 1 A <- matrix( c( 1, 0, dt, 0, 0, 0, 0, 1, 0, dt, 0, 0, 0, 0, 1, 0, dt, 0, 0, 0, 0, 1, 0, dt, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1), 6, 6, byrow=TRUE) H <- matrix( c(1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0), 2, 6, byrow=TRUE) Q <- diag(6) R <- 1000 * diag(2) 18 20 22 24 26 28 N <- nrow(pos) y <- matrix(NA, N, 2) ## predicted state and covariance xprd <- A %*% xest pprd <- A %*% pest %*% t(A) + Q ## estimation S <- H %*% t(pprd) %*% t(H) + R B <- H %*% t(pprd) # # # # # # x y Vx Vy Ax Ay 148 10 RcppArmadillo kalmangain <- t(solve(S, B)) 30 ## est. state and cov., assign to vars in parent env xest <<- xprd + kalmangain %*% (z - H %*% xprd) pest <<- pprd - kalmangain %*% H %*% pprd 32 34 ## compute the estimated measurements y <- H %*% xest 36 } 38 xest <- matrix(0, 6, 1) pest <- matrix(0, 6, 6) 40 for (i in 1:N) { y[i,] <- kalmanfilter(t(pos[i,,drop=FALSE])) } 42 44 invisible(y) 46 } Listing 10.8 Basic Kalman filter in R The Matlab example uses “persistent” (or “static” for C++ programmers) variables for xest and pest. We use a different R paradigm of defining these variables in the enclosing function. Otherwise, the code is very similar to the original example. Figure 10.1 displays the object trajectory as well as the estimate provided by the Kalman filter. A slight improvement is available when the invariant code creating variables, from the assignment of dt all the way to the initial variable setup, is also moved to the enclosing function. 1 3 5 KalmanR <- function(pos) { kalmanfilter <- function(z) { ## predicted state and covariance xprd <- A %*% xest pprd <- A %*% pest %*% t(A) + Q 7 ## estimation S <- H %*% t(pprd) %*% t(H) + R B <- H %*% t(pprd) ## kalmangain <- (S \ B)’ kalmangain <- t(solve(S, B)) 9 11 13 ## estimated state and covariance, assign to vars in parent env xest <<- xprd + kalmangain %*% (z - H %*% xprd) pest <<- pprd - kalmangain %*% H %*% pprd 15 17 ## compute the estimated measurements y <- H %*% xest 19 } 21 10.3 Case Study: Kalman Filter Using RcppArmadillo 149 Trajectory Estimate y −1.0 −0.5 0.0 0.5 1.0 Object Trajectory and Kalman Filter Estimate −0.5 0.0 x 0.5 Fig. 10.1 Object trajectory and Kalman filter estimate dt <- 1 A <- matrix( c( 1, 0, dt, 0, 0, 0, 0, 1, 0, dt, 0, 0, 0, 0, 1, 0, dt, 0, 0, 0, 0, 1, 0, dt, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1), 6, 6, byrow=TRUE) H <- matrix( c(1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0), 2, 6, byrow=TRUE) Q <- diag(6) R <- 1000 * diag(2) 23 25 27 29 31 33 # # # # # # x y Vx Vy Ax Ay 35 N <- nrow(pos) y <- matrix(NA, N, 2) 37 xest <- matrix(0, 6, 1) pest <- matrix(0, 6, 6) 39 41 for (i in 1:N) { y[i,] <- kalmanfilter(t(pos[i,,drop=FALSE])) } 43 45 invisible(y) 47 } Listing 10.9 Basic Kalman filter in R 1.0 150 10 RcppArmadillo When rewriting this implementation in C++, a first option would be to copy the structure of the first example in Listing 10.8. However, just as Listing 10.9 improves upon the first by factoring out the assignment of variables that are invariant to the call given a particular observation, we can do something very similar in C++ by using basic principles of object-oriented programming. By creating a class for the Kalman filter, we can instantiate the constant variables in the class constructor and be assured that this code will be executed exactly once. The actual estimation— done in the R code in function kalmanfilter—can then be computed in a class member variable. 1 using namespace arma; 3 5 7 9 11 13 15 class Kalman { private: mat A, H, Q, R, xest, pest; double dt; public: // constructor, sets up data structures Kalman() : dt(1.0) { A.eye(6,6); A(0,2) = A(1,3) = A(2,4) = A(3,5) = dt; H.zeros(2,6); H(0,0) = H(1,1) = 1.0; Q.eye(6,6); R = 1000 * eye(2,2); 17 19 xest.zeros(6,1); pest.zeros(6,6); 21 } 23 25 27 29 // sole member function: estimate model mat estimate(const mat & Z) { unsigned int n = Z.n_rows, k = Z.n_cols; mat Y = zeros(n, k); mat xprd, pprd, S, B, kalmangain; colvec z, y; 31 33 35 37 for (unsigned int i = 0; i<n; i++) { colvec z = Z.row(i).t(); // predicted state and covariance xprd = A * xest; pprd = A * pest * A.t() + Q; 41 // estimation S = H * pprd.t() * H.t() + R; B = H * pprd.t(); 43 // kalmangain = t(S \ B) 39 10.3 Case Study: Kalman Filter Using RcppArmadillo 151 kalmangain = trans(solve(S, B)); 45 // estimated state and covariance xest = xprd + kalmangain * (z - H * xprd); pest = pprd - kalmangain * H * pprd; 47 49 // compute the estimated measurements y = H * xest; 51 Y.row(i) = y.t(); } return Y; 53 55 } 57 }; Listing 10.10 Basic Kalman filter class in C++ using Armadillo The code itself is very close indeed to the original Matlab code. One could argue that it is easier on the eye than the R code. A simple * for matrix or vector multiplication—which we can use as C++ permits one to overload these operators for appropriately defined types such as matrices or vectors—is more succinct than the %*% operator in R. All the code in Listing 10. 10 can be assigned to a variable kalmanClass for included text which we use along with the now very simple function body to create a function via cxxfunction: 1 3 5 kalmanSrc <- ’ mat Z = as<mat>(ZS); Kalman K; mat Y = K.estimate(Z); return wrap(Y); ’ // passed from R 7 9 11 KalmanCpp <- cxxfunction(signature(ZS="numeric"), body=kalmanSrc, include=kalmanClass, plugin="RcppArmadillo") Listing 10.11 Basic Kalman filter function in C++ Given the two R versions and this C++ version, we can first determine that results are in fact identical (to numerical precision) between these variants, and then run a benchmark example. 1 3 5 7 9 R> R> R> R> R> R> R> + + require(rbenchmark) require(compiler) FirstKalmanRC <- cmpfun(FirstKalmanR) KalmanRC <- cmpfun(KalmanR) stopifnot(identical(KalmanR(pos), KalmanRC(pos)), all.equal(KalmanR(pos), KalmanCpp(pos)), identical(FirstKalmanR(pos), FirstKalmanRC(pos)), 152 11 13 15 17 19 21 23 25 27 10 RcppArmadillo + all.equal(KalmanR(pos), FirstKalmanR(pos))) R> R> res <- benchmark(KalmanR(pos), + KalmanRC(pos), + FirstKalmanR(pos), + FirstKalmanRC(pos), + KalmanCpp(pos), + columns = c("test", "replications", + "elapsed", "relative"), + order="relative", + replications=100) R> R> print(res) test replications elapsed relative 5 KalmanCpp(pos) 100 0.087 1.0000 2 KalmanRC(pos) 100 5.774 66.3678 1 KalmanR(pos) 100 6.448 74.1149 4 FirstKalmanRC(pos) 100 8.153 93.7126 3 FirstKalmanR(pos) 100 8.901 102.3103 Listing 10.12 Basic Kalman filter timing comparison The timing results are very satisfactory. Compared to the basic R routine, an improvement in run-time of around 100 times can be achieved. Even the byte-compiled version gains only about ten percent on the basic R versions, and C++ shows an over 90-fold improvement. The simple change of moving invariant code out of the estimation function helps reduce run-time by about a quarter: now the C++ code is about 74 times as fast as the R code, and 66 times as fast as the byte-compiled variant. 10.4 RcppArmadillo and Armadillo Differences Generally speaking, RcppArmadillo does not differ from Armadillo. The core source code of the actual Armadillo implementation is included “as-is” and not modified. Armadillo is written to be used as a portable, general-purpose C++ library with the expectation of being used with a variety of compilers and operating systems. In our case, and for the purposes of RcppArmadillo, we have a predictable and narrowly defined setup. We know, for example, that there always is an underlying R installation whenever Rcpp and RcppArmadillo are used. This enables us to simplify and standardize the use of Armadillo by making the following definitions in a configuration header file sourced before the Armadillo headers themselves are sources: 2 #define ARMA_USE_LAPACK 4 #define ARMA_USE_BLAS 10.4 RcppArmadillo and Armadillo Differences 6 8 #define #define #define #define 153 ARMA_HAVE_STD_ISFINITE ARMA_HAVE_STD_ISINF ARMA_HAVE_STD_ISNAN ARMA_HAVE_STD_SNPRINTF Listing 10.13 Standard defines for RcppArmadillo We can always assume a Lapack and BLAS installation via R as R will either be built against the system BLAS and Lapack libraries or provide its own implementation for its usage. Similarly we can make some assumptions about how complete the C library is (though we do undefine all these values on the Solaris platform, and undefine just one for Windows 64). Two more definitions are more specific to R. Because R provides the “shell” around our statistical computing, programs need to synchronize their (printed) output with R which uses its own buffering. The CRAN maintainers now warn if code uses functions which direct print such as printf, or puts, or if the C++ facility std::cout is used. For standard printing, we can use Rprintf from the R API which also provides REprintf for error messages. Thanks to a contributed patch, Rcpp now wraps a special output device Rcpp::Rcout around calls to Rprintf. By defining ARMA DEFAULT OSTREAM, all output generated by Armadillo is then synchronized via the buffering done by R. 1 3 5 // Rcpp has its own stream object which cooperates more nicely // with R’s i/o -- and as of Armadillo 2.4.3, we can use this // stream object as well #if !defined(ARMA_DEFAULT_OSTREAM) #define ARMA_DEFAULT_OSTREAM Rcpp::Rcout #endif 7 9 11 13 // R now defines NDEBUG which suppresses a number of useful // Armadillo tests Users can still defined it later, and/or // define ARMA_NO_DEBUG #if defined(NDEBUG) #undef NDEBUG #endif Listing 10.14 Standard defines for RcppArmadillo A related matter is the definition of NDEBUG as it (among other things) typically inhibits program halt if a call to assert results in a (logically) false condition. This is reasonable from the point of R which can ill afford an exit in a subroutine. However, this has side effects as it may also turn off useful testing. In the case of Armadillo, bounds checks for vector and matrix indices are suppressed when NDEBUG is defined. This may not be desirable during the development of new code and is the reason why this definition is removed in the RcppArmadillo headers. Users can still define it, or define ARMA_NO_DEBUG if they want it. Chapter 11 RcppGSL Abstract The RcppGSL package provides an easy-to-use interface between data structures from the GNU Scientific Library, or GSL for short, and R by building on facilities provided in the Rcpp package. The GSL is a well-known collection of numerical routines for scientific computing. It is particularly useful for C and C++ programs as it provides a standard C interface to a wide range of mathematical routines. The chapter provides an introduction to the vector and matrix types in RcppGSL, illustrates their use by revisiting the linear modeling example, discusses how to deploy the RcppGSL from another package and via inline, and closes with an extended application example. 11.1 Introduction The GNU Scientific Library, or GSL, is a collection of routines for scientific computing and numerical analysis (Galassi et al. 2010). It is a rigorously developed and tested library providing support for a wide range of scientific or numerical tasks. Among the topics covered in the GSL are complex numbers, roots of polynomials, special functions, vector and matrix data structures, permutations, combinations, sorting, BLAS support, linear algebra, fast Fourier transforms, eigensystems, random numbers, quadrature, random distributions, quasi-random sequences, Monte Carlo integration, N-tuples, differential equations, simulated annealing, numerical differentiation, interpolation, series acceleration, Chebyshev approximations, root-finding, discrete Hankel transforms least-squares fitting, minimization, physical constants, basis splines, and wavelets. Support for C programming with the GSL is readily available. The GSL itself is written in C (just like R) and provides a C-language Application Programming Interface (API). Several scripting languages have interfaces to the GSL library; the CRAN network for R also contains a package gsl providing access to GSL functionality for R users. D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 11, © The Author 2013 155 156 11 RcppGSL As a C-language API is provided, access from C++ is also possible, albeit not at the abstraction level that can be offered by dedicated C++ implementations.1 The GSL is somewhat unique among numerical libraries. Its combination of broad coverage of scientific topics, serious implementation effort, and the use of an Open Source license have led to a fairly wide usage of the library. A number of CRAN packages use the GSL directly; and (as of late 2012) nine packages use the CRAN package gsl (Hankin 2011) which exposes some parts of the GSL to R. This is an indication that the GSL is popular among programmers using either the C or C++ language for solving problems in applied science. At the same time, the Rcpp package offers a higher-level abstraction between R and underlying C++ (or C) code. Rcpp permits R objects such as vectors, matrices, lists, functions, environments, . . ., to be manipulated directly at the C++ level, which alleviates the needs for complicated and error-prone parameter passing and memory allocation. It also permits compact vectorized expressions similar to what can be written in R, but written directly at the C++ level. The RcppGSL package aims to help close the gap between R and the GSL. It tries to offer access to GSL functions, in particular via the vector and matrix data structures used throughout the GSL, while staying closer to the “whole object model” familiar to the R programmer. The rest of the chapter is organized as follows. The next section shows a motivating example of a fast linear model fit routine using GSL functions. The following section discusses support for GSL vector types, which is followed by a section on matrices. We close with a case study using B-splines provided by the GSL from R. 11.2 Motivation: FastLm As discussed in Chap. 10, fitting linear models is a key building block of analyzing data and model building. R has a very complete and feature-rich function in lm(). It can provide a model fit as well as a number of diagnostic measures, either directly or via the corresponding summary() method for linear model fits. The lm.fit() function also provides a faster alternative (which is, however, recommended only for advanced users) which provides estimates only and fewer statistics for inference. This sometimes leads to user requests for a routine which is both fast and featureful enough. The fastLm routine shown in Listing 11. 1 provides such an implementation (and it preceded the routine based on RcppArmadillo from Chap. 10). It uses the GSL for the least-squares fitting functions and therefore provides a nice example for GSL integration with R, and a direct comparison to the Armadillo-based variant introduced in Chap. 10. 1 Several C++ wrappers for the GSL have been written over the years, yet none reached a state of completion comparable to the GSL itself. Three such wrapping library are http://cholm. home.cern.ch/cholm/misc/gslmm/, http://gslwrap.sourceforge.net/, and http://code.google.com/p/gslcpp/. 11.2 Motivation: FastLm 157 3 #include <RcppGSL.h> #include <gsl/gsl_multifit.h> #include <cmath> 5 extern "C" SEXP fastLm(SEXP ys, SEXP Xs) { 1 try { RcppGSL::vector<double> y = ys; RcppGSL::matrix<double> X = Xs; 7 9 // gsl data str. via SEXP int n = X.nrow(), k = X.ncol(); double chisq; 11 13 RcppGSL::vector<double> coef(k); RcppGSL::matrix<double> cov(k,k); 15 // hold the coef vector // and covariance matrix // the actual fit req. working memory we allocate and free gsl_multifit_linear_workspace *work = gsl_multifit_linear_alloc (n, k); gsl_multifit_linear (X, y, coef, cov, &chisq, work); gsl_multifit_linear_free (work); 17 19 21 // extract the diagonal as a vector view gsl_vector_view diag = gsl_matrix_diagonal(cov) ; 23 25 // currently no direct interface in Rcpp::NumericVector // that uses wrap(), so we have to do it in two steps Rcpp::NumericVector std_err ; std_err = diag; std::transform(std_err.begin(), std_err.end(), std_err.begin(), sqrt); 27 29 31 Rcpp::List res = Rcpp::List::create(Rcpp::Named("coefficients") = coef, Rcpp::Named("stderr") = std_err, Rcpp::Named("df") = n - k); 33 35 // free all the GSL vectors and matrices -- as these are // really C data structures we cannot take advantage of // automatic C++ memory management coef.free(); cov.free(); y.free(); X.free(); 37 39 41 return res; // return the result list to R 43 } catch( std::exception &ex ) { forward_exception_to_r( ex ); } catch(...) { ::Rf_error( "c++ exception (unknown reason)" ); } return R_NilValue; // -Wall 45 47 49 } Listing 11.1 FastLm function using RcppGSL 158 11 RcppGSL We first initialize a RcppGSL vector and matrix, each templated to the standard numeric type double (and the GSL supports other types ranging from lower precision floating point to signed and unsigned integers as well as complex numbers). Next, we reserve another vector and matrix to hold the resulting coefficient estimates as well as the estimate of the covariance matrix. We then allocate workspace using a GSL routine, fit the linear model, and free the workspace. This is followed by extraction of the diagonal element from the covariance matrix. We then employ a so-called iterator—a common C++ idiom from the Standard Template Library (STL)—to iterate over the vector of diagonal and transforming it by applying the square root function to compute our standard error of the estimate. Finally, we create a named list with the return value before we free temporary memory allocation. This last step is required because the underlying objects are really C objects conforming to the GSL interface. Hence, they do have the automatic memory management we could have with C++ vector or matrix structures as used through the Rcpp package. Finally, we return the result to R. As seen in the previous chapter, RcppArmadillo (François et al. 2012) implements a matching fastLm function using the Armadillo library by Sanderson (2010) and can do so with more compact code due to C++ features. 11.3 Vectors This section details the different vector representations, starting with their definition inside the GSL itself. We then discuss our layering before showing how the two types map to each other. A discussion of read-only “vector view” classes concludes the section. 11.3.1 GSL Vectors GSL defines various vector types to manipulate one-dimensional data, similar to R arrays. For example, the gsl_vector and gsl_vector_int structs are defined as: 2 4 6 typedef struct{ size_t size; size_t stride; double *data; gsl_block *block; int owner; } gsl_vector; 8 10 12 typedef struct { size_t size; size_t stride; int *data; 11.3 Vectors 14 159 gsl_block_int *block; int owner; } gsl_vector_int; Listing 11.2 Definition of gsl vector and gsl vector int A typical use of the gsl_vector struct is given below: 1 3 5 7 int i; // allocate a gsl_vector of size 3 gsl_vector * v = gsl_vector_alloc (3); // fill the vector for (i = 0; i < 3; i++) { gsl_vector_set (v, i, 1.23 + i); } 9 11 13 // access elements double sum = 0.0 ; for (i = 0; i < 3; i++) { sum += gsl_vector_get( v, i ) ; } 15 17 // free the memory gsl_vector_free (v); Listing 11.3 Example use of gsl vector 11.3.2 RcppGSL::vector RcppGSL defines the template RcppGSL::vector<T>. It can manipulate pointers to gsl vector objects by taking advantage of C++ templates. With this new type, the previous example becomes: 1 3 5 7 int i; // allocate a gsl_vector of size 3 RcppGSL::vector<double> v(3); // fill the vector for (i = 0; i < 3; i++) { v[i] = 1.23 + i ; } 9 11 13 // access elements double sum = 0.0 ; for (i = 0; i < 3; i++) { sum += v[i] ; } 15 17 // free the memory v.free() ; 160 11 RcppGSL Listing 11.4 Example use of RcppGSL::vector<T> The class RcppGSL::vector<double> implements a smart pointer which can be used anywhere in place of a raw pointer to gsl_vector. Examples are the gsl_vector_set and gsl_vector_get functions above. Beyond the convenience of a nicer syntax for allocation and release of memory, the RcppGSL::vector template facilitates the interchange of GSL vectors with Rcpp objects, and hence R objects. The following example defines a .Call() compatible function called sum_gsl_vector_int that operates on a gsl_vector_int through the RcppGSL::vector<int> template specialization: 1 3 5 RCPP_FUNCTION_1(int, sum_gsl_vector_int, RcppGSL::vector<int> vec) { int res = std::accumulate(vec.begin(), vec.end(), 0); vec.free(); // we need to free vec after use return res; } Listing 11.5 Example RcppGSL::vector<T> function The macro RCPP FUNCTION 1 expands its arguments to a single-parameter function. The generated function returns the type given as the first macro argument, has the function name provided by the second macro argument, and takes the third macro argument as the function argument. Hence, the function can then be called from R as: 2 R> .Call( "sum_gsl_vector_int", 1:10 ) [1] 55 Listing 11.6 Example call of RcppGSL::vector<T> function A second example shows a simple function that grabs elements of an R list as gsl_vector objects using implicit conversion mechanisms of Rcpp 2 4 RCPP_FUNCTION_1(double, gsl_vector_sum_2, Rcpp::List data ) { // grab "x" as a gsl_vector through // the RcppGSL::vector<double> class RcppGSL::vector<double> x = data["x"] ; 6 8 10 12 // grab "y" as a gsl_vector through // the RcppGSL::vector<int> class RcppGSL::vector<int> y = data["y"] ; double res = 0.0 ; for( size_t i=0; i< x->size; i++){ res += x[i] * y[i] ; } 14 16 // we need to explicitly free the memory x.free() ; 11.3 Vectors 161 y.free() ; 18 // return the result return res ; 20 } Listing 11.7 Second example RcppGSL::vector<T> function which can be called from R as follows: 1 R> .Call( "gsl_vector_sum_2", data ) [1] 36.66667 Listing 11.8 Example call of second RcppGSL::vector<T> function 11.3.3 Mapping Table 11.1 shows the mapping between types defined by the GSL and their corresponding types in the RcppGSL package. Table 11.1 Correspondence between GSL vector types and templates defined in RcppGSL gsl vector gsl gsl gsl gsl gsl gsl gsl gsl gsl gsl gsl gsl gsl gsl vector vector vector vector vector vector vector vector vector vector vector vector vector vector RcppGSL (with RcppGSL:: prefix) int float long char complex complex float complex long double long double short uchar uint ushort ulong vector<double> vector<int> vector<float> vector<long> vector<char> vector<gsl complex> vector<gsl complex float> vector<gsl complex long double> vector<long double> vector<short> vector<unsigned char> vector<unsigned int> vector<insigned short> vector<unsigned long> 11.3.4 Vector Views Several GSL algorithms return GSL vector views as their result type. RcppGSL defines the template class RcppGSL::vector view to handle vector views using C++ syntax. 162 2 4 6 8 10 extern "C" SEXP test_gsl_vector_view(){ int n = 10 ; RcppGSL::vector<double> v(n) ; for( int i=0 ; i<n; i++){ v[i] = i ; } RcppGSL::vector_view<double> v_even = gsl_vector_subvector_with_stride(v,0,2,n/2); RcppGSL::vector_view<double> v_odd = gsl_vector_subvector_with_stride(v,1,2,n/2); List res = List::create( _["even"] = v_even, _["odd" ] = v_odd ) ; v.free() ; // we only free v, views do not own data return res ; 12 14 16 18 11 RcppGSL } Listing 11.9 Example of a vector view class As with vectors, C++ objects of type RcppGSL::vector view can be converted implicitly to their associated GSL view type. Table 11.2 displays the pairwise correspondence so that the C++ objects can be passed to compatible GSL algorithms. Note that the vector view<gsl complex long double> variant has been omitted for typesetting reasons. Table 11.2 Correspondence between GSL vector view types and templates defined in RcppGSL gsl vector views gsl gsl gsl gsl gsl gsl gsl gsl gsl gsl gsl gsl gsl vector vector vector vector vector vector vector vector vector vector vector vector vector view view view view view view view view view view view view view RcppGSL (with RcppGSL:: prefix) int float long char complex complex float long double short uchar uint ushort ulong vector vector vector vector vector vector vector vector vector vector vector vector vector view<double> view<int> view<float> view<long> view<char> view<gsl complex> view<gsl complex float> view<long double> view<short> view<unsigned char> view<unsigned int> view<insigned short> view<unsigned long> The vector view class also contains a conversion operator to automatically transform the data of the view object to a GSL vector object. This enables the use of vector views where GSL would expect a vector. 11.4 Matrices 163 11.4 Matrices The GSL also defines a set of matrix data types : gsl matrix, gsl matrix int etc., for which RcppGSL defines a corresponding convenience C++ wrapper generated by the RcppGSL::matrix template. 11.4.1 Creating Matrices The RcppGSL::matrix template exposes three constructors. 2 4 // convert an R matrix to a GSL matrix matrix( SEXP x) throw(::Rcpp::not_compatible) // encapsulate a GSL matrix pointer matrix( gsl_matrix* x) 6 8 // create a new matrix with the // given number of rows and columns matrix( int nrow, int ncol) Listing 11.10 Example use RcppGSL matrix class 11.4.2 Implicit Conversion RcppGSL::matrix defines implicit conversion of a pointer to the associated GSL matrix type, as well as dereferencing operators, making the class RcppGSL ::matrix look and feel like a pointer to a GSL matrix type. 1 3 gsltype* operator gsltype* gsltype& data ; gsltype*(){ return data ; } operator->() const { return data; } operator *() const { return *data; } Listing 11.11 Implicit conversion for RcppGSL matrix class 11.4.3 Indexing Indexing of elements of GSL matrices is usually done using the getter functions gsl matrix get, gsl matrix int get, etc. and the setter functions gsl matrix set, gsl matrix int set, etc. As C functions, these have to supply the row and column indices individually. 164 11 RcppGSL RcppGSL takes advantage of both operator overloading and templates to make indexing a GSL matrix much more convenient and closer to our common mathematical notation as shown in the next example. 2 4 6 // create a matrix of size 10x10 RcppGSL::matrix<int> mat(10,10); // fill the diagonal for( int i=0; i<10: i++) { mat(i,i) = i ; } Listing 11.12 Indexing for RcppGSL matrix class 11.4.4 Methods The RcppGSL::matrix type also defines the following member functions: nrow() ncol() size() free() to extract the number of rows to extract the number of columns to extract the number of elements to release the memory 11.4.5 Matrix Views Similar to the vector views discussed above, the RcppGSL also provides an implicit conversion operator which returns the underlying matrix stored in the matrix view class. 11.5 Using RcppGSL in Your Package The RcppGSL package contains a complete example providing a single function colNorm which computes a norm for each column of a supplied matrix. This example adapts a matrix example from the GSL manual that has been chosen merely as a means for showing how to set up a package to use RcppGSL. Needless to say, we could compute such a matrix norm easily in R using existing facilities. One such possibility is a simple expression as in Listing 11. 13 11.5 Using RcppGSL in Your Package 165 which is also shown on the corresponding help page in the example package inside RcppGSL. 1 apply(M, 2, function(x) sqrt(sum(xˆ2))) Listing 11.13 Matrix norm in R One point in favor of using the GSL code is that it employs BLAS functions. On sufficiently large matrices, and with suitable BLAS libraries installed, this variant could be faster due to the optimized code in high-performance BLAS libraries and/or the inherent parallelism a multi-core BLAS variant which compute the vector norm in parallel. On all “reasonable” matrix sizes, however, the performance differences should be negligible. 11.5.1 The configure Script 11.5.1.1 Using autoconf Using RcppGSL means employing both the GSL and R. We may need to find the location of the GSL headers and library, and this can be done easily from a configure source script which autoconf generates from a configure.in source file such as the following: 1 AC_INIT([RcppGSLExample], 0.1.0) 3 ## Use gsl-config to find arguments for compiler + linker flags ## ## Check for non-standard programs: gsl-config(1) AC_PATH_PROG([GSL_CONFIG], [gsl-config]) ## If gsl-config was found, let’s use it if test "${GSL_CONFIG}" != ""; then # Use gsl-config for header and linker arguments # (without BLAS which we get from R) GSL_CFLAGS=‘${GSL_CONFIG} --cflags‘ GSL_LIBS=‘${GSL_CONFIG} --libs-without-cblas‘ else AC_MSG_ERROR([gsl-config not found, is GSL installed?]) fi 5 7 9 11 13 15 17 19 ## Use Rscript to query Rcpp for compiler and linker flags ## link flag providing library as well as path to library, ## and optionally rpath RCPP_LDFLAGS=‘${R_HOME}/bin/Rscript -e ’Rcpp:::LdFlags()’‘ 21 23 25 # Now substitute these variables # in src/Makevars.in to create src/Makevars AC_SUBST(GSL_CFLAGS) AC_SUBST(GSL_LIBS) 166 11 RcppGSL AC_SUBST(RCPP_LDFLAGS) 27 AC_OUTPUT(src/Makevars) Listing 11.14 Autoconf script for RcppGSL use Such a source configure.in gets converted into a script configure by invoking the autoconf program. 11.5.1.2 Using Functions Provided by RcppGSL RcppGSL provides R functions that allow one to retrieve the same information. Therefore, the configure script can also be written as: #!/bin/sh 2 4 GSL_CFLAGS=‘${R_HOME}/bin/Rscript -e "RcppGSL:::CFlags()"‘ GSL_LIBS=‘${R_HOME}/bin/Rscript -e "RcppGSL:::LdFlags()"‘ RCPP_LDFLAGS=‘${R_HOME}/bin/Rscript -e "Rcpp:::LdFlags()"‘ 6 8 10 sed -e "s|@[email protected]|${GSL_LIBS}|" \ -e "s|@[email protected]|${GSL_CFLAGS}|" \ -e "s|@[email protected]|${RCPP_LDFLAGS}|" \ src/Makevars.in > src/Makevars Listing 11.15 Shell script configuration script for RcppGSL use Similarly, the configure.win for windows can be written as: 2 4 6 8 RSCRIPT="${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" GSL_CFLAGS=‘${RSCRIPT} -e "RcppGSL:::CFlags()"‘ GSL_LIBS=‘${RSCRIPT} -e "RcppGSL:::LdFlags()"‘ RCPP_LDFLAGS=‘${RSCRIPT} -e "Rcpp:::LdFlags()"‘ sed -e "s|@[email protected]|${GSL_LIBS}|" \ -e "s|@[email protected]|${GSL_CFLAGS}|" \ -e "s|@[email protected]|${RCPP_LDFLAGS}|" \ src/Makevars.in > src/Makevars.win Listing 11.16 Windows shell script configuration script for RcppGSL use 11.5.2 The src Directory The C++ source file takes the matrix supplied from R and applies the GSL function to each column. 1 3 #include <RcppGSL.h> #include <gsl/gsl_matrix.h> 11.5 Using RcppGSL in Your Package 167 #include <gsl/gsl_blas.h> 5 extern "C" SEXP colNorm(SEXP sM) { 7 try { // create gsl data structures from SEXP RcppGSL::matrix<double> M = sM; int k = M.ncol(); Rcpp::NumericVector n(k); // for results 9 11 13 for (int j = 0; j < k; j++) { RcppGSL::vector_view<double> colview = gsl_matrix_column (M, j); n[j] = gsl_blas_dnrm2(colview); } M.free() ; return n; // return vector 15 17 19 21 } catch( std::exception &ex ) { forward_exception_to_r( ex ); 23 } catch(...) { ::Rf_error( "c++ exception (unknown reason)" ); } return R_NilValue; // -Wall 25 27 29 } Listing 11.17 Vector norm function for RcppGSL The Makevars.in file governs the compilation and uses the values supplied by configure during build-time: 1 3 5 7 9 # set by configure GSL_CFLAGS = @[email protected] GSL_LIBS = @[email protected] RCPP_LDFLAGS = @[email protected] # combine with standard arguments for R PKG_CPPFLAGS = $(GSL_CFLAGS) PKG_LIBS = $(GSL_LIBS) $(RCPP_LDFLAGS) Listing 11.18 Makevars.in for RcppGSL example The variables surrounded by @ will be filled by configure during package build-time with values determined by the configure code shown above. 11.5.3 The R Directory The R source is very simple: a single matrix is passed to C++: 168 1 3 11 RcppGSL colNorm <- function(M) { stopifnot(is.matrix(M)) res <- .Call("colNorm", M, PACKAGE="RcppGSLExample") } Listing 11.19 R function for RcppGSL example 11.6 Using RcppGSL with inline As we have seen throughout the book, the inline package (Sklyar et al. 2012) is very helpful for prototyping code in C, C++, or Fortran as it takes care of code compilation, linking and dynamic loading directly from R. It is being used extensively by Rcpp, for example, in the numerous unit tests. The example below shows how inline can be deployed with RcppGSL. We implement the same column norm example, but this time as an R script which is compiled, linked, and loaded on-the-fly. Compared to standard use of inline, we have to make sure to add a short section declaring which header files from GSL we need to use; the RcppGSL then communicates with inline to tell it about the location and names of libraries used to build code against GSL. 2 4 6 8 10 12 14 16 18 20 22 24 26 R> require(inline) R> inctxt=’ + #include <gsl/gsl_matrix.h> + #include <gsl/gsl_blas.h> + ’ R> bodytxt=’ + // create gsl data structures from SEXP + RcppGSL::matrix<double> M = sM; + int k = M.ncol(); + Rcpp::NumericVector n(k); // for results + + for (int j = 0; j < k; j++) { + RcppGSL::vector_view<double> colview = + gsl_matrix_column (M, j); + n[j] = gsl_blas_dnrm2(colview); + } + M.free() ; + return n; // return vector + ’ R> foo <- cxxfunction( + signature(sM="numeric"), + body=bodytxt, inc=inctxt, plugin="RcppGSL") R> ## see Section 8.4.13 of the GSL manual: R> ## create M as a sum of two outer products R> M <- outer(sin(0:9), rep(1,10), "*") + + outer(rep(1, 10), cos(0:9), "*") R> foo(M) 11.7 Case Study: GSL-Based B-Spline Fit Using RcppGSL 28 169 [1] 4.314614 3.120504 2.193159 3.261141 2.534157 [6] 2.572810 4.204689 3.652017 2.085236 3.073134 Listing 11.20 Using RcppGSL with inline The RcppGSL inline plugin supports creation of a package skeleton based on the inline function. 1 R> package.skeleton( "mypackage", foo ) Listing 11.21 Using package.skeleton with inline result This creates a skeleton package, similar to what we have seen in Chap. 5, based on the function produced by cxxfunction() and assigned to the function object foo() as per the code example in Listing 11.20. 11.7 Case Study: GSL-Based B-Spline Fit Using RcppGSL The GSL and R both overlap in a number of areas. In that sense the following example is contrived as we could compute it entirely in R. However, as the GSL is a well-established numerical library in its own right, it is of interest to show how examples from GSL can be deployed together with R. In this section, we illustrate this using an example from Section 39.7 of the GSL reference manual. We generate data from y(x) = e−x/10 cos(x) with x ∈ [0, 15] and fit this data using weighted least squares using a cubic B-spline basis function with uniform breakpoints. Again, we are not interested as much in the statistical aspect of this problem as we are in exploring how to let R use code from the GSL. The original program follows in Listing 11.22. 1 3 5 7 #include #include #include #include #include #include #include #include <stdio.h> <stdlib.h> <math.h> <gsl/gsl_bspline.h> <gsl/gsl_multifit.h> <gsl/gsl_rng.h> <gsl/gsl_randist.h> <gsl/gsl_statistics.h> 9 11 13 /* number of data points to fit */ #define N 200 /* number of fit coefficients */ #define NCOEFFS 12 15 /* nbreak = ncoeffs + 2 - k = ncoeffs - 2 since k = 4 */ 170 11 RcppGSL 17 #define NBREAK 19 int main (void) { const size_t n = N; const size_t ncoeffs = NCOEFFS; const size_t nbreak = NBREAK; size_t i, j; gsl_bspline_workspace *bw; gsl_vector *B; double dy; gsl_rng *r; gsl_vector *c, *w; gsl_vector *x, *y; gsl_matrix *X, *cov; gsl_multifit_linear_workspace *mw; double chisq, Rsq, dof, tss; 21 23 25 27 29 31 (NCOEFFS - 2) 33 35 37 39 41 43 45 47 49 gsl_rng_env_setup(); r = gsl_rng_alloc(gsl_rng_default); /* allocate a cubic bspline workspace (k = 4) */ bw = gsl_bspline_alloc(4, nbreak); B = gsl_vector_alloc(ncoeffs); x = gsl_vector_alloc(n); y = gsl_vector_alloc(n); X = gsl_matrix_alloc(n, ncoeffs); c = gsl_vector_alloc(ncoeffs); w = gsl_vector_alloc(n); cov = gsl_matrix_alloc(ncoeffs, ncoeffs); mw = gsl_multifit_linear_alloc(n, ncoeffs); printf("#m=0,S=0\n"); /* this is the data to be fitted */ 51 53 55 for (i = 0; i < n; ++i) { double sigma; double xi = (15.0 / (N - 1)) * i; double yi = cos(xi) * exp(-0.1 * xi); sigma = 0.1 * yi; dy = gsl_ran_gaussian(r, sigma); yi += dy; 57 59 gsl_vector_set(x, i, xi); gsl_vector_set(y, i, yi); gsl_vector_set(w, i, 1.0 / (sigma * sigma)); 61 63 printf("%f %f\n", xi, yi); 65 } 67 69 /* use uniform breakpoints on [0, 15] */ gsl_bspline_knots_uniform(0.0, 15.0, bw); 11.7 Case Study: GSL-Based B-Spline Fit Using RcppGSL 71 73 171 /* construct the fit matrix X */ for (i = 0; i < n; ++i) { double xi = gsl_vector_get(x, i); /* compute B_j(xi) for all j */ gsl_bspline_eval(xi, B, bw); 75 77 /* fill in row i of X */ for (j = 0; j < ncoeffs; ++j) { double Bj = gsl_vector_get(B, j); gsl_matrix_set(X, i, j, Bj); } 79 81 83 } 85 /* do the fit */ gsl_multifit_wlinear(X, w, y, c, cov, &chisq, mw); 87 89 dof = n - ncoeffs; tss = gsl_stats_wtss(w->data, 1, y->data, 1, y->size); Rsq = 1.0 - chisq / tss; 91 fprintf(stderr, "chisq/dof = %e, Rsq = %f\n", chisq / dof, Rsq); 93 95 /* output the smoothed curve */ { double xi, yi, yerr; 97 printf("#m=1,S=0\n"); for (xi = 0.0; xi < 15.0; xi += 0.1) { gsl_bspline_eval(xi, B, bw); gsl_multifit_linear_est(B, c, cov, &yi, &yerr); printf("%f %f\n", xi, yi); } 99 101 103 } 105 107 109 111 113 115 117 gsl_rng_free(r); gsl_bspline_free(bw); gsl_vector_free(B); gsl_vector_free(x); gsl_vector_free(y); gsl_matrix_free(X); gsl_vector_free(c); gsl_vector_free(w); gsl_matrix_free(cov); gsl_multifit_linear_free(mw); return 0; } /* main() */ Listing 11.22 B-spline fit example from the GSL This original GSL example provides a stand-alone program with a single main() function. First, the data is generated and written out to the standard output. Next, the 172 11 RcppGSL cubic B-spline is set up and fit, and the result is written out. In the original documentation it is then suggested to use an external plotting program to visualize data and fit. We can of course easily do the second step in R by reading the input, subsetting it into input data (of which there are 200 lines) and results data (151 lines for the grid from 0.0 to 15.0 in increments of 0.1. In order to use this functionality from R, we will decompose the program into two parts: data generation and data fitting. Each part will be addressed by a single C++ function which executes just its part. We use “Rcpp attributes” (described in Sect. 2.6) to provide access to this C++ code from R itself. 2 4 6 8 10 12 // [[Rcpp::depends(RcppGSL)]] #include <RcppGSL.h> #include #include #include #include #include <gsl/gsl_bspline.h> <gsl/gsl_multifit.h> <gsl/gsl_rng.h> <gsl/gsl_randist.h> <gsl/gsl_statistics.h> const int N = 200; // number of data points to fit const int NCOEFFS = 12; // number of fit coefficients const int NBREAK = (NCOEFFS - 2); // nbreak=ncoeffs-2 since k = 4 Listing 11.23 Beginning of C++ file with B-spline fit for R This first declares a dependency on RcppGSL implying that R will use both the header files and library from the GSL—by using the plugin discussed above. Several header files are then included to declare the types used by RcppGSL (and Rcpp) as well as for the GSL functionality needed. We can then define the first function to generate the data. 2 4 6 8 10 // [[Rcpp::export]] Rcpp::List genData() { const size_t n = N; size_t i; double dy; gsl_rng *r; RcppGSL::vector<double> w(n), x(n), y(n); gsl_rng_env_setup(); r = gsl_rng_alloc(gsl_rng_default); 12 14 16 18 20 //printf("#m=0,S=0\n"); /* this is the data to be fitted */ for (i = 0; i < n; ++i) { double sigma; double xi = (15.0 / (N - 1)) * i; double yi = cos(xi) * exp(-0.1 * xi); 11.7 Case Study: GSL-Based B-Spline Fit Using RcppGSL 173 sigma = 0.1 * yi; dy = gsl_ran_gaussian(r, sigma); yi += dy; 22 24 gsl_vector_set(x, i, xi); gsl_vector_set(y, i, yi); gsl_vector_set(w, i, 1.0 / (sigma * sigma)); 26 28 //printf("%f %f\n", xi, yi); 30 } 32 Rcpp::DataFrame res = Rcpp::DataFrame::create(Rcpp::Named("x") = x, Rcpp::Named("y") = y, Rcpp::Named("w") = w); 34 36 x.free(); y.free(); w.free(); gsl_rng_free(r); 38 40 return(res); 42 } Listing 11.24 Data generation for GSL B-spline fit for R Similarly, the second function used to fit the data can be defined as well. 1 // [[Rcpp::export]] Rcpp::List fitData(Rcpp::DataFrame ds) { 3 5 7 const size_t ncoeffs = NCOEFFS; const size_t nbreak = NBREAK; const size_t n = N; size_t i, j; 9 11 13 15 17 19 Rcpp::DataFrame D(ds); // construct data.frame RcppGSL::vector<double> y = D["y"]; // access columns by name RcppGSL::vector<double> x = D["x"]; // assigning GSL vectors RcppGSL::vector<double> w = D["w"]; gsl_bspline_workspace *bw; gsl_vector *B; gsl_vector *c; gsl_matrix *X, *cov; gsl_multifit_linear_workspace *mw; double chisq, Rsq, dof, tss; 21 23 // allocate a cubic bspline workspace (k = 4) bw = gsl_bspline_alloc(4, nbreak); B = gsl_vector_alloc(ncoeffs); 25 27 X = gsl_matrix_alloc(n, ncoeffs); c = gsl_vector_alloc(ncoeffs); 174 11 RcppGSL cov = gsl_matrix_alloc(ncoeffs, ncoeffs); mw = gsl_multifit_linear_alloc(n, ncoeffs); 29 // use uniform breakpoints on [0, 15] gsl_bspline_knots_uniform(0.0, 15.0, bw); 31 33 // construct the fit matrix X for (i = 0; i < n; ++i) { double xi = gsl_vector_get(x, i); 35 37 // compute B_j(xi) for all j gsl_bspline_eval(xi, B, bw); 39 // fill in row i of X for (j = 0; j < ncoeffs; ++j) { double Bj = gsl_vector_get(B, j); gsl_matrix_set(X, i, j, Bj); } 41 43 45 } 47 // do the fit gsl_multifit_wlinear(X, w, y, c, cov, &chisq, mw); 49 dof = n - ncoeffs; tss = gsl_stats_wtss(w->data, 1, y->data, 1, y->size); Rsq = 1.0 - chisq / tss; 51 53 // output the smoothed curve Rcpp::NumericMatrix M(150,2); double xi, yi, yerr; for (xi = 0.0, i=0; xi < 15.0; xi += 0.1, i++) { gsl_bspline_eval(xi, B, bw); gsl_multifit_linear_est(B, c, cov, &yi, &yerr); M(i,0) = xi; M(i,1) = yi; } 55 57 59 61 63 gsl_bspline_free(bw); gsl_vector_free(B); gsl_matrix_free(X); gsl_vector_free(c); gsl_matrix_free(cov); gsl_multifit_linear_free(mw); 65 67 69 71 return(Rcpp::List::create( Rcpp::Named("M")=M, Rcpp::Named("chisqdof")=Rcpp::wrap(chisq/dof), Rcpp::Named("rsq")=Rcpp::wrap(Rsq))); 73 75 } Listing 11.25 Data fit for GSL B-spline with R Finally, we can generate the compiled functions, generate the data, and fit the spline model. This fit is illustrated in a chart shown in Fig. 11.1. 11.7 Case Study: GSL-Based B-Spline Fit Using RcppGSL 2 175 # compile two functions sourceCpp("bSpline.cpp") 4 14 16 # plot op <- par(mar=c(3,3,1,1)) plot(dat[,"x"], dat[,"y"], pch=19, col="#00000044") lines(M[,1], M[,2], col="orange", lwd=2) par(op) Listing 11.26 R side of GSL B-spline example 1.0 12 0.5 10 # fit the model, returns matrix and gof measures fit <- fitData(dat) M <- fit[[1]] 0.0 8 # generate the data dat <- genData() −0.5 6 0 Fig. 11.1 Artificial data and B-spline fit 5 10 15 Chapter 12 RcppEigen Abstract The RcppEigen package provides an interface to the Eigen library. Eigen is a featureful C++ library which deploys modern template meta-programming techniques. It is similar to Armadillo but provides an even more granular applicationprogramming interface (API). This chapter provides an introduction to the Rcpp Eigen package by introducing the core data structures, illustrating some of the available matrix decomposition methods and concludes with a case study of particular C++ implementation (providing what is called a “factory” pattern) for different matrix decomposition approaches in order to provide a faster reimplementation of the lm method. 12.1 Introduction Eigen is a modern C++ library for linear algebra, similar in scope as Armadillo (which was discussed in Chap. 10), but with an even more fine-grained applicationprogramming interface (API). Eigen (Guennebaud et al. 2012) started as a sub-project to KDE (a popular Linux desktop environment), initially focusing on fixed-size matrices to represent rotations, projections, or affine transformations in a visualization application. Eigen grew from there and has over the course of about a decade produced three major releases with “Eigen3” being the current major version. Eigen is now widely used in a number of projects, including ceres, a largescale nonlinear least-squares solver released by Google.1 And just like Armadillo, Eigen has been prepared for use with Rcpp by providing appropriate conversion functions as<>() and wrap() in the RcppEigen package (Bates and Eddelbuettel 2013). The next section introduces some of the key data types in Eigen, as well as the corresponding converters accessible from Rcpp. 1 See https://code.google.com/p/ceres-solver/. D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 12, © The Author 2013 177 178 12 RcppEigen 12.2 Eigen Classes 12.2.1 Fixed-Size Vectors and Matrices The earliest version of Eigen aimed at supporting visualizations and projections in computational chemistry. For this task, fixed-size matrices and vectors are appropriate and are still supported in the current version. C++ meta-template programming (Abrahams and Gurtovoy 2004) is used extensively throughout Eigen. When dimensions are known at compile-time, operations which would commonly involve loops at run-time can in fact be converted at compile-time. Consider this simple example: 2 4 Eigen::Vector3d x(1,2,3); Eigen::Vector3d y(4,5,6); Eigen::Matrix3d m1 = x * y.transpose(); double m2 = x.transpose() * y; 6 8 Rcpp::Rcout << Rcpp::Rcout << "Outer:\n" << m1 << std::endl; "Inner:\n" << m2 << std::endl; Listing 12.1 A simple Eigen example using fixed-size vectors and matrices The length of the two vectors is fixed at size three in the definition of the vector classes. The (square) matrix is also of size three. The prime reason for creating the fixed-size variants is efficiency. Through the use of templates, this library lets the compiler create a more efficient implementation for the inner and outer products which, essentially, replaces the loop constructs with a constant assignment. We will show just how dramatic the difference can be below. Conceptually, the representation in Eigen is of the following type (where we limit ourselves to dimension three, but variants for dimension two and four exist, as do variants for types float and complex not shown here): 2 4 6 typedef Matrix<int, 3, 1> Vector3i; typedef Matrix<double, 3, 1> Vector3d; typedef typedef typedef typedef Matrix<int, 1, Matrix<int, 3, Matrix<double, Matrix<double, 3> RowVector3i; 1> ColVector3i; 1, 3> RowVector3d; 3, 1> ColVector3d; 8 10 typedef Matrix<int, 3, 3> Matrix3i; typedef Matrix<double, 3, 3> Matrix3d; Listing 12.2 Eigen fixed-size vector and matrix representation However, because data types in R are essentially always dynamic and can be resized at any moment, no accessors or conversion functions exist between the fixed-size representation in Eigen and the representation in R. All of the interfaces discussed in this chapter use the dynamically sized vectors and matrices introduced next. 12.2 Eigen classes 179 12.2.2 Dynamic-Size Vectors and Matrices Work with data must accommodate changing data sizes, particularly when used interactively or with varying inputs. The core data type for R work with Eigen is therefore the type defined as follows: 2 4 typedef Matrix<double, Dynamic, 1> VectorXd; typedef Matrix<double, Dynamic, Dynamic> MatrixXd; typedef Matrix<int, Dynamic, 1> VectorXi; typedef Matrix<int, Dynamic, Dynamic> MatrixXi; Listing 12.3 Eigen dynamic-size vector and matrix representation with additional variants for rows and column vectors, as well as different underlying scalar representations for complex and float. The core R access functions involve the types VectorXd and MatrixXd. We can revisit the example from the previous section where we now use dynamic vectors and matrices. Note how the initialization is now at run-time using the overloaded << operator. 1 Eigen::VectorXd u(3); u << 1,2,3; Eigen::VectorXd v(3); v << 4,5,6; 3 5 7 Eigen::MatrixXd m3 = u * v.transpose(); double m4 = u.transpose() * v; Rcpp::Rcout << Rcpp::Rcout << "Outer:\n" << m3 << std::endl; "Inner:\n" << m4 << std::endl; Listing 12.4 A simple Eigen example using dynamic-size vectors and matrices The result is of course the same. What about performance differences? More recent versions of Rcpp contain a simple helper class Timer. It has to be included explicitly as shown below. We can then continue the example and create simple timed loops: 2 4 6 8 10 // include header file for timer #include <Rcpp/Benchmark/Timer.h> // start the timer const int n = 1000000; Rcpp::Timer timer; for(int i=0; i<n; i++) { m1 = x * y.transpose(); m2 = x.transpose() * y; } timer.step("fixed") ; 12 14 16 for(int i=0; i<n; i++) { m3 = u * v.transpose(); m4 = u.transpose() * v; } 180 12 RcppEigen timer.step("dynamic"); 18 20 22 24 26 for(int i=0; i<n; i++) { } // empty loop timer.step( "empty loop" ) ; Rcpp::NumericVector res(timer); for (int i=0; i<res.size(); i++) { res[i] = res[i] / n; } Rcpp::Rcout << res << std::endl; Listing 12. 5 Comparing performance of simple operations between dynamic and fixed size vectors The result of this comparison of making one million matrix multiplication of an inner and outer product of two short vectors is rather astounding. The Timer class keeps the data in nanoseconds (provided the operating system supports it). By dividing the results by the number of iterations n, we obtain the cost per iteration: 2 fixed dynamic empty loop 0.001129 135.464204 0.000256 Listing 12.6 Timing results simple operations betweem dynamic and fixed size vectors The templated code for fixed-size vectors and matrices is barely slower than the empty loop. Without having inspected the generated machine-language code, we assume that the assignment of the nine elements of the outer-product matrix plus the tenth result from the inner-product scalar are replaced by constant assignments— whereas the loop using dynamic data types takes 135 ns per iteration which is in relatively terms much more than the operation implemented with fixed-size data types. This example, while unrealistic in its simplicity, shows that modern optimizing compilers, combined with template logic, can result in very efficient code as they essentially factor out invariants. 12.2.3 Arrays for Per-Component Operations C++ matrix libraries overload the operator * such that (conforming) vectors and matrices can by multiplied. This is very useful for the focus on linear algebra and matrix operations and decompositions. However, programmers also often need perelement operations (as is done in R when doing, say, c(1:3) * c(2:4)). Eigen supports these operations via the Array template classes. In general, there is a one-to-one mapping between the Matrix and Vector classes, and their Array counterparts as shown in Table 12.1. Where Vector denotes a single dimension, Array uses one X or digit. For Matrix types, two digits are used to fixed-size objects, and XX denotes variable size arrays. The trailing letter still denotes the storage type. 12.2 Eigen classes 181 Table 12.1 Mapping between Eigen matrix and vector types, and corresponding array types Vector or Matrix object type Array object type VectorXd Vector3d MatrixXd Matrix3d ArrayXd Array3d ArrayXXd Array33d Conversion between Matrix / Vector and Array are done, respectively, with the array() method for the former, and the matrix() method for the latter. 12.2.4 Mapped Vectors and Matrices and Special Matrices The previous sections illustrated the basic vector and matrix representations in Eigen, providing either fixed or dynamically sized storage. To interfacing external libraries, or C and C++ arrays, Eigen provides another class: Map. This approach fits perfectly with the design of Rcpp which operates via proxy classes that access the SEXP type of the underlying R object. Using such a “mapped object” requires no additional copy upon construction, allowing for efficient passage of objects from R to code using Eigen in the same way that the Rcpp classes are lightweight. In general, one uses the desired representation as a template argument for the Map classes, leading to, for example, Eigen::Map< Eigen::MatriXd> > in the case of a dynamically sized matrix of type double. By deploying the using directive to import either the full namespace or selected identifiers, this can be reduced to Map<MatrixXd>. It is good practice to declare such mapped object as const to prevent accidental changes to the memory content of a mapped variable. Moreover, Eigen also supports operations on sparse matrices. The core class is SparseMatrix which offers high performance executing yet low memory usage. It is based on a variant of the Compressed Column (or Row) Storage scheme used by other software libraries operating on sparse matrices. The internal representation uses four compact arrays: Values stores the coefficient values of the nonzero elements. InnerIndices stores the row (or column) indices of the nonzero elements. OuterStarts stores for each column (or row) the index of the first nonzero in the previous two arrays. InnerNNZs stores the number of nonzeros of each column (or row). Here “inner” refers to column vector for a column-major matrix (or a row vector for a row-major matrix) and “outer” refers to the other direction. Eigen also supports matrices with particular known structure such as symmetric matrices (provided as either a lower- or upper-triangular matrix) or banded matrices. In general, these are provided as “views” which means that while the full size is used, only the relevant portion is accessed during operations. 182 12 RcppEigen 12.3 Case Study: Kalman Filter Using RcppEigen Section 10.3 above discussed a simple Kalman filter example and showed its implementation in a simple C++ class using Armadillo. For comparison, we can also implement it using Eigen. #include <RcppEigen.h> 2 4 6 8 10 12 using namespace Rcpp; using namespace Eigen; class Kalman { private: MatrixXd A, H, Q, R, xest, pest; public: // constructor, sets up data structures Kalman() { const double dt = 1.0; A.setIdentity(6,6); A(0,2) = A(1,3) = A(2,4) = A(3,5) = dt; 14 16 H.setZero(2,6); H(0,0) = H(1,1) = 1.0; 18 20 Q.setIdentity(6,6); R = 1000 * R.Identity(2,2); 22 xest.setZero(6,1); pest.setZero(6,6); 24 26 } 28 // sole member function: estimate model MatrixXd estimate(const MatrixXd & Z) { unsigned int n = Z.rows(), k = Z.cols(); MatrixXd Y = MatrixXd::Zero(n,k); MatrixXd xprd, pprd, S, B, kalmangain; VectorXd z, y; 30 32 34 36 38 40 for (unsigned int i = 0; i<n; i++) { z = Z.row(i).transpose(); // predicted state and covariance xprd = A * xest; pprd = A * pest * A.transpose() + Q; 44 // estimation S = H * pprd.transpose() * H.transpose() + R; B = H * pprd.transpose(); 46 kalmangain = S.ldlt().solve(B).transpose(); 42 12.4 Linear Algebra and Matrix Decompositions 183 // estimated state and covariance xest = xprd + kalmangain * (z - H * xprd); pest = pprd - kalmangain * H * pprd; 48 50 // compute the estimated measurements y = H * xest; Y.row(i) = y.transpose(); 52 54 } return Y; 56 } 58 }; Listing 12.7 Basic Kalman filter class in C++ using Eigen Listing 12. 7 provides a straightforward adaptation of the previous implementation. The switch from Armadillo to Eigen consists mostly of • An obvious change in the header file that is included. • Switching declarations from mat to MatrixXd, and from vec to VectorXd. • Changing member functions zero(), identity() and t() to setZero(), setIdentity() and transpose(), respectively. • Selecting a robust Cholesky decomposition with pivoting matrix decomposition method via the ldlt() member function to use the solve() method. The code is slightly more verbose than the variant in Listing 10.10. And while Eigen has a reputation for providing fast-running code, it turns out that for these (arguably rather naı̈ve) implementations, Armadillo holds a considerable speed gain of more than 60 % over Eigen using the code shown in Listings 10.10 and 12.7.2 12.4 Linear Algebra and Matrix Decompositions 12.4.1 Basic Solvers Eigen has very substantial support for linear algebra operations and a number of matrix decompositions. Bates and Eddelbuettel (2013) provide a thorough discussion, so rather than enumerating these again, we will highlight a few key elements. The case study in Sect. 12.5 also deploys a number of these in order to evaluate their relative performance in a reimplementation of a linear model estimator. Useful documentation for these methods is provided by the corresponding tutorial section of the Eigen documentation (Guennebaud et al. 2012)3 from which we have taken the following examples. 2 A comment from another R / Eigen developer confirming this ratio is gratefully acknowledged. The Eigen tutorial can be accessed via http://eigen.tuxfamily.org/dox, and more detailed documentation about matrix decompositions is at http://eigen.tuxfamily.org/ dox/TopicLinearAlgebraDecompositions.html. 3 184 12 RcppEigen The solvers example can be adapted quite easily to code to be called from R. 2 4 6 8 10 12 R> src <- ’ const Map<MatrixXd> A(as<Map<MatrixXd> >(As)); const Map<VectorXd> b(as<Map<VectorXd> >(bs)); VectorXd x = A.colPivHouseholderQr().solve(b); return wrap(x);’ R> solveEx <- cxxfunction(signature(As = "mat", bs = "vec"), + body=src, plugin="RcppEigen") R> A <- matrix(c(1,2,3, 4,5,6, 7,8,10), 3,3, byrow=TRUE) R> b <- c(3, 3, 4) R> solveEx(A, b) [1] -2 1 1 R> Listing 12.8 Using a basic Eigen solver from R In this example, we pass a matrix and vector from R, and the R data types are used to instantiate the corresponding Eigen objects. As discussed in the previous section, a Map type permits us to reuse the R memory without an additional copy of data, and we use the dynamically sized type with double precision. In the example, the matrix A is decomposed using a column-pivoting Householder QR decomposition after which the linear equation Ax = b is solved for a given b. 12.4.2 Eigenvalues and Eigenvectors Eigenvalues and eigenvector calculation are also available. The following example uses a self-adjoint solver suitable for symmetric matrices in which only one triangle of the corresponding matrix is used, while the other is inferred. Alternate solvers classes EigenSolver and ComplexEigenSolver are also available. 2 4 6 8 10 12 R> src <- ’ + using namespace Eigen; + const Map<MatrixXd> A(as<Map<MatrixXd> >(As)); + SelfAdjointEigenSolver<MatrixXd> es(A); + if (es.info() != Success) stop("Problem with Matrix"); + return List::create(Named("values") = es.eigenvalues(), + Named("vectors") = es.eigenvectors());’ R> eigEx <- cxxfunction(signature(As = "mat"), body=src, + plugin="RcppEigen") R> A <- matrix(c(1,2, 2,3), 2,2, byrow=TRUE) R> eigEx(A) $values [1] -0.236068 4.236068 14 $vectors 12.4 Linear Algebra and Matrix Decompositions 16 18 20 22 185 [,1] [,2] [1,] -0.850651 -0.525731 [2,] 0.525731 -0.850651 R> R> eigEx(matrix(c(1,NA,NA,1),2,2)) Error: Problem with Matrix R> Listing 12.9 Computing eigenvalues using Eigen Line four shows how a member function of the solver can be queried for success or failure; we then use the stop() wrapper around the Rcpp exception handlers to return to R with an appropriate error message. Lines 21 and 22 illustrate this with a degenerate matrix. As expected, control is returned to R with the error message specified in line four. 12.4.3 Least-Squares Solvers Listing 12.7 in Sect. 12.3 already showed the use of the ldlt() member function for solving linear systems. The following example uses a basic SVD approach. The next section will revisit this problem in more detail. 1 3 5 7 9 11 13 R> src <- ’ + using namespace Eigen; + const Map<MatrixXd> X(as<Map<MatrixXd> >(Xs)); + const Map<VectorXd> y(as<Map<VectorXd> >(ys)); + VectorXd x = X.jacobiSvd(ComputeThinU|ComputeThinV).solve(y); + return wrap(x);’ R> lsEx <- cxxfunction(signature(Xs = "matrix", ys = "vector"), + body=src, plugin="RcppEigen") R> data(cars) R> X <- cbind(1, log(cars[,"speed"])) R> y <- log(cars[,"dist"]) R> lsEx(X, y) [1] -0.729669 1.602391 R> Listing 12.10 Computing least-squares using Eigen We use the standard R data set cars for the well-known regression example of fitting the logarithm of distance to a constant and the logarithm of speed. 12.4.4 Rank-Revealing Decompositions The Eigen library also supports a number of rank-revealing decompositions which can compute the rank of the matrix they are operating on. Such methods tend to be 186 12 RcppEigen best-behaved in the case of matrices of less than full rank, as, for example, singular matrices in the case of squared dimensions. The reference in footnote 3 on page 183 provides the full details about all available methods. 2 4 6 8 10 12 14 16 18 R> src <- ’ + using namespace Eigen; + const Map<MatrixXd> A(as<Map<MatrixXd> >(As)); + FullPivLU<MatrixXd> lu_decomp(A); + return List::create(Named("rank") = lu_decomp.rank(), + Named("nullSpace") = lu_decomp.kernel(), + Named("colSpace") = lu_decomp.image(A)); ’ R> rrEx <- cxxfunction(signature(As = "mat"), body=src, plugin=" RcppEigen") R> A <- matrix(c(1,2,5, 2,1,4, 3,0,3),3,3,byrow=TRUE) R> rrEx(A) $rank [1] 2 $nullSpace [,1] [1,] 0.5 [2,] 1.0 [3,] -0.5 24 $colSpace [,1] [,2] [1,] 5 1 [2,] 4 2 [3,] 3 3 26 R> 20 22 Listing 12.11 Rank-revelaing decompositions using Eigen The example discussed in this section illustrates how fine-grained the Eigen API is: a variety of basic decompositions (SVD, LU, QR, . . . ) can be deployed, and pivoting schemes are available for several of them. The vignette in package RcppEigen also provides more detail. The next section provides a more in-depth discussing about how to use these in order to estimate linear models. 12.5 Case Study: C++ Factory for Linear Models in RcppEigen The RcppEigen package continues a theme started by RcppArmadillo (François et al. 2012) and RcppGSL (François and Eddelbuettel 2010). It consists of taking the venerable linear model estimation as the basis for comparison between different linear algebra implementations. Doug Bates took this a step further with RcppEigen by providing a complete “factory” for linear models. 12.5 Case Study: C++ Factory for Linear Models in RcppEigen 187 A “factory,” in software engineering parlance, is a set of code, frequently implemented as functions, that produces objects given a set of parameters. Often these objects stem from classes which are related by class inheritances. This is commonly implemented with a base, or top-level, class from which the various models derive. One or more parameters are then used to select and initiate the type of object desired. In our context, this provides an excellent illustration for both a set of more advanced C++ code and an opportunity to detail more of the components of Eigen and RcppEigen. The lm class in Listing 12. 12 is the base class from which the factory methods derive. 2 4 namespace lmsol { using Eigen::ArrayXd; using Eigen::Map; using Eigen::MatrixXd; using Eigen::VectorXd; 6 class lm { protected: Map<MatrixXd> Map<VectorXd> MatrixXd::Index MatrixXd::Index MatrixXd::VectorXd int MatrixXd::VectorXd MatrixXd::VectorXd MatrixXd::RealScalar 8 10 12 14 16 18 bool m_X; // model matrix m_y; // response vector m_n; // number of rows of X m_p; // number of columns of X m_coef; // coefficient vector m_r; // comp. rank or NA_INTEGER m_fitted; // vector of fitted values m_se; // standard errors m_prescribedThreshold; // user specified tolerance m_usePrescribedThreshold; 20 public: lm(const Map<MatrixXd>&, const Map<VectorXd>&); 22 ArrayXd Dplus(const ArrayXd& D); MatrixXd I_p() const { return MatrixXd::Identity(m_p, m_p); } MatrixXd XtX() const; 24 26 28 // setThreshold + threshold based on ColPivHouseholderQR RealScalar threshold() const; const VectorXd& se() const {return m_se;} const VectorXd& coef() const {return m_coef;} const VectorXd& fitted() const {return m_fitted;} int rank() const {return m_r;} lm& setThreshold(const RealScalar&); }; 30 32 34 36 38 // .. 40 } Listing 12.12 Core of definition of lm class in Eigen 188 12 RcppEigen The implementation of the non-inlined member functions is provided in the package RcppEigen as the source file fastLm.cpp. We will omit these functions here due to space constraints. With the declarations of the basic linear model class lm, we can define specializations providing the various decompositions. As these classes all inherit from lm, they share all its member functions and variables shown in Listing 12. 12 yet each add their own specific decomposition function—simply by instantiating the corresponding class from Eigen. In the implementation shown below in Listing 12. 13, the classes deriving from the lm class shown above all instantiate an Eigen object of the same name. This is made possible by the different namespaces. The Eigen namespace is used by the Eigen package (and we have omitted a number of statements such as using Eigen::Llt which permit use of the Llt class without the namespace prefix), and the lmsol namespace is used for the “linear model solutions” implemented in this example from the RcppEigen package. So to take the first example, the ColPivQR class in the lmsol namespace inherits from lm in the same namespace and provides access to the Eigen::ColPivQR class from RcppEigen. We sometimes prefer to write this explicitly—and the two forms lmsol::ColPivQR and Eigen::ColPivQR make the provenance more explicit. 2 4 6 8 class ColPivQR : public lm { public: ColPivQR(const Map<MatrixXd>&, const Map<VectorXd>&); }; class Llt : public lm { public: Llt(const Map<MatrixXd>&, const Map<VectorXd>&); }; 10 12 14 16 18 class Ldlt : public lm { public: Ldlt(const Map<MatrixXd>&, const Map<VectorXd>&); }; class QR : public lm { public: QR(const Map<MatrixXd>&, const Map<VectorXd>&); }; 20 22 24 26 28 30 class GESDD : public lm { public: GESDD(const Map<MatrixXd>&, const Map<VectorXd>&); }; class SVD : public lm { public: SVD(const Map<MatrixXd>&, const Map<VectorXd>&); }; 12.5 Case Study: C++ Factory for Linear Models in RcppEigen 32 34 189 class SymmEigen : public lm { public: SymmEigen(const Map<MatrixXd>&, const Map<VectorXd>&); }; Listing 12.13 Derived classes of lm providing specializations With these declarations (and the actual implementations which are included in the RcppEigen package as file fastLm.cpp), we can show the implementation of the C++ part of the fastLm() function. But before we go there, we will illustrate two of the different constructors. 1 3 5 7 9 11 13 15 17 QR::QR(const Map<MatrixXd> &X, const Map<VectorXd> &y) : lm(X, y) { HouseholderQR<MatrixXd> QR(X); m_coef = QR.solve(y); m_fitted = X * m_coef; m_se = QR.matrixQR().topRows(m_p). triangularView<Upper>(). solve(I_p()).rowwise().norm(); } Llt::Llt(const Map<MatrixXd> &X, const Map<VectorXd> &y) : lm(X, y) { LLT<MatrixXd> Ch(XtX().selfadjointView<Lower>()); m_coef = Ch.solve(X.adjoint() * y); m_fitted = X * m_coef; m_se = Ch.matrixL().solve(I_p()).colwise().norm(); } Listing 12.14 Implementation of two subclass constructors for lm model fit These two examples show that particular aspects of the respective Eigen classes are used. For the QR decomposition variant of the linear model, the coefficients are provided via solve() to obtain the parameter vector. Fitted values are then just a multiplication with the original design matrix, and standard errors can be computed exploiting properties of the QR decomposition. This is similar for the Llt approach; the full source of fastLm.cpp in the package RcppEigen provides the full detail. Before we can address the implementation of the actual linear model fit, we first define an inlined helper function. It creates the corresponding object given the matrix X, vector y, and a variable named type to select the given “type” of decomposition used for the model fit: 1 3 5 7 9 static inline lm do_lm(const Map<MatrixXd> &X, const Map<VectorXd> &y, int type) { switch(type) { case ColPivQR_t: return ColPivQR(X, y); case QR_t: return QR(X, y); case LLT_t: return Llt(X, y); 190 12 RcppEigen case LDLT_t: return Ldlt(X, y); case SVD_t: return SVD(X, y); case SymmEigen_t: return SymmEigen(X, y); case GESDD_t: return GESDD(X, y); } throw invalid_argument("invalid type"); return ColPivQR(X, y); // -Wall 11 13 15 17 19 21 } Listing 12.15 Selection of subclasses for lm model fit Note that as the do_lm function is in the lmsol namespace, it does instantiate the subclasses of lm declared in Listing 12. 13 rather than the Eigen classes they provide access to. Finally, the actual linear model function called from R : 2 4 6 extern "C" SEXP fastLm(SEXP Xs, SEXP ys, SEXP type) { try { const Map<MatrixXd> X(as<Map<MatrixXd> >(Xs)); const Map<VectorXd> y(as<Map<VectorXd> >(ys)); Index n = X.rows(); if ((Index)y.size() != n) throw invalid_argument("size mismatch"); 8 10 12 14 16 18 // Select and apply the least squares method lm ans(do_lm(X, y, ::Rf_asInteger(type))); // Copy coefficients and install names, if any NumericVector coef(wrap(ans.coef())); List dimnames(NumericMatrix(Xs).attr("dimnames")); if (dimnames.size() > 1) { RObject colnames = dimnames[1]; if (!(colnames).isNULL()) coef.attr("names") = clone(CharacterVector(colnames)); } 20 22 24 26 28 30 32 34 VectorXd resid = y - ans.fitted(); int rank = ans.rank(); int df = (rank == ::NA_INTEGER) ? n - X.cols() : n - rank; double s = resid.norm() / std::sqrt(double(df)); // Create the standard errors VectorXd se = s * ans.se(); return List::create(_["coefficients"] _["se"] _["rank"] _["df.residual"] _["residuals"] _["s"] _["fitted.values"] = = = = = = = coef, se, rank, df, resid, s, ans.fitted()); 12.5 Case Study: C++ Factory for Linear Models in RcppEigen 191 } catch( std::exception &ex ) { forward_exception_to_r( ex ); } catch(...) { ::Rf_error( "c++ exception (unknown reason)" ); } return R_NilValue; // -Wall 36 38 40 } Listing 12.16 Actual fastLm function in RcppEigen package Here the ans object is instantiated with the return from the do_lm function from the preceding listing. This ans object then provides the appropriate solutions given the type of decomposition chosen. Together, this implements a very elegant setup providing a large number of different approaches (which can then be compared) with a minimal amount of code repetition. This illustrates nicely how C++ design choices enable us to provide code very effectively from R while also computing efficiently thanks to the advanced features in Eigen. The package vignette (Bates et al. 2012) has the complete details, but we can restate the result of the comparison computed with the help of the code listings shown above (as well as the rest not shown here but available in the RcppEigen package). All solutions referenced in Table 12.2 refer to the corresponding Eigen classes as implemented in the fastLm function in the RcppEigen package. Exceptions are “arma” and “GSL” for the corresponding fastLm functions from the RcppArmadillo and RcppGSL packages, and “lm.fit” for the base R function. Table 12.2 lmBenchmark results for the RcppEigen example Method Relative Elapsed User Sys LDLt LLt SymmEig QR arma PivQR lm.fit GESDD SVD GSL 1.000 1.003 2.629 5.117 5.215 5.502 6.086 9.582 33.932 115.522 4.423 4.438 11.629 22.631 23.068 24.335 26.919 42.379 150.082 510.955 4.388 4.389 10.253 21.205 77.020 22.477 45.143 126.832 145.781 601.682 0.020 0.032 1.320 1.340 15.045 1.772 50.951 39.782 3.753 701.116 The timings are from a desktop computer running the default size, 100, 000 × 40, full-rank model matrix running 100 repetitions for each method. Times (Elapsed, User and Sys) are in seconds. The BLAS in use is the version of the OpenBLAS library included with Ubuntu 12.10. The processor used for these timings is a 4-core processor but almost all the methods are single-threaded and not affected by the number of cores. Only the arma, lm.fit, GESDD, and GSL methods benefit from the multi-threaded BLAS implementation provided by OpenBLAS. 192 12 RcppEigen What we can take away from these results is that methods based on forming and decomposing X X (LDLt, LLt and SymmEig) are considerably faster. The pivoted QR method is marginally times faster than lm.fit (from R ) on this test and provides nearly the same information as lm.fit (which has improved its performance relative to older versions of R). Methods based on the singular value decomposition (SVD and GSL) are much slower, which is presumably caused at least in part by X having many more rows than columns. Also, the GSL method from the GNU Scientific Library uses an older algorithm for the SVD and is clearly not competitive in this comparison. We also note that GESDD implements an interesting hybrid approach by using Eigen classes, but calling out to the LAPACK routine dgesdd for the actual SVD calculation. This leads to better performance compared to using the SVD implementation of Eigen which, while not as bad as the GSL, is still not a particularly fast SVD method. This example, developed by Doug Bates and implemented as an example in the RcppEigen package, provided a nice illustration about the potential for a Rcppbased solution to accelerate computations done in R. Rcpp permits us to connect to modern linear algebra libraries such as Armadillo (Sanderson 2010) and Eigen (Guennebaud et al. 2012) with ease. As can be seen from Table 12.2, a sizeable improvement can be achieved even against the fastest and purest solution offered by R, even if that solution is already fairly efficient and itself implemented mostly in compiled code as is the case with lm.fit(), the core function underlying linear model estimation in R. As of early 2013, over 100 on CRAN and 10 packages on BioConductor use Rcpp and offer a wide variety of different choices of how to enhance R seamlessly with C++. The rcpp-devel mailing list is vibrant, development of Rcpp continues at a rapid pace, and we look forward to more exciting activity in seamlessly bridging R and C++. Part V Appendix Appendix A C++ for R Programmers Abstract The short appendix offers a very basic introduction to the C++ language to someone already (at least somewhat) familiar with R programming. Introducing all of C++ in just a few pages is not really possible. Countless books have been written about the C++ language since its inception in the early 1990s (and we will list a few at the end in a section on further readings). A.1 Compiled Not Interpreted One of the key differences between R and C++ is that R is interpreted. It was designed for interactive exploration, visualization, and modeling. The flexibility that such a goal aspires to is most naturally reflected in a language with features common to those held by R. This includes “computing on the language” with objects which modify other objects, or functions and more. C++, on the other hand, came after C and has always been compiled. This means that we take a file containing source code and convert it into object code with a compiler. A linker then creates an executable out of the object code as well as potentially further libraries on the system. We should also note that R itself now ships with a (byte-code) compiler, but that is slightly different as it generates an intermediate level of parsed expressions, rather than machine code in object files as a standard compiler for a language such as C or C++ would. Let us consider a concrete example. If the code below 1 #include <cstdio> 3 int main(void) { printf("Hello, World!\n"); } 5 Listing A.1 Simple C++ example: Hello, World! is saved in a file ex1.cpp, then the commands D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4 13, © The Author 2013 195 196 1 A C++ for R Programmers sh> g++ -c ex1.cpp sh> g++ -o ex1 ex1.o Listing A.2 Compiling and linking simple C++ example: Hello, World! first compile the source file into the object file ex1.o as requested by the -c command-line option. Next, the resulting object file is linked into the executable ex1 as specified by the -o command-line argument. This can also be achieved in a single operation via sh> g++ ex1.cpp -o ex1 Listing A.3 Compiling and linking simple C++ example in one step: Hello, World! The resulting program ex1 can now be executed. It displays the text that is ever so common for first examples by calling the C-level function printf which may be somewhat familiar to R programmers via the related R function sprintf which uses similar formatting rules to print into a character variable. Notice how we also specified a so-called include file in the first line; it contains a number of function declarations related to input and output such as printf. These operations would be the same on any operating system on which g++ has been installed, in particular Windows, OS X, or Linux. The file extensions used for object files or executable may differ, the compile commands remain the same. As an aside, such portability of tools across operating system is a very useful attribute which the R software system shares. Other compilers can also be used for as long as they are supported by R itself. As noted in Chap. 2 above, this excludes the Microsoft family of compilers, but may include the (commercial) Intel Compiler on several platforms, the Sun compiler if installed on Solaris or Linux, and older Unix compilers such as the IBM AIX compiler and the HP UX compiler. However, as such operating systems are less commonly used, we will concentrate on g++. An important second aspect of compilation concerns how to build upon other projects via their libraries (providing code) and header files (providing declarations). For example, the R environment provides a stand-alone library with several of the mathematical, probability, and random-number functions used in R (R Development Core Team 2012d, Section 16.6). Consider the following example which uses the R stand-alone mathematics library to compute the 95 % percentile of the N(0, 1) distribution. 1 #include <cstdio> #include <Rmath.h> 3 5 7 int main(void) { printf("N(0,1) 95th percentile %9.8f\n", qnorm(0.95, 0.0, 1.0, 1, 0)); } Listing A.4 Simple C++ example using Rmath We can build this program via A.2 Statically Typed 1 197 sh> g++ -c ex2.cpp -DMATHLIB_STANDALONE -I/usr/include sh> g++ -o ex2 ex2.o -L/usr/lib -lRmath Listing A.5 Compiling and linking simple C++ example using Rmath which shows two new aspects in the compile step. First, we inform the compiler where to find the header file Rmath.h (which contains the declarations) by using the -I/usr/include switch. We define a variable MATHLIB_STANDALONE to enable the stand-alone use of the library outside of its normal deployment with the R engine. Next, for the linking step, the -L/usr/lib switch points to the library location whereas -lRmath enables linking with the R mathematics library from the file libRmath.so (or libRmath.a in case of static linking). In this particular case both the header file location and the library location actually correspond to system defaults. This means we could have omitted them both; however, it is instructive to show them in case the location does need to be specified as would be the case, say, with a local installation in the home directory of the user. Understanding compiling and linking options and common error messages is of some importance when working with compiled code. For most of our cases, R helps by providing complete wrappers via sub-commands such as R CMD COMPILE or R CMD LINK. However, it is helpful to understand basic compiling and linking in order to examine or debug possible build issues. A.2 Statically Typed A second key difference between R and C++ concerns the difference between dynamic and static typing. In R, an expression determines the type it is assigned to. In other words, in 2 R> x <- rnorm(10) R> x <- "some text" Listing A.6 Simple R example of dynamic types the variable x is first assigned a numeric (or floating-point) vector of size ten as returned from the rnorm function. This value in x is then replaced by characters as a fixed text is assigned. This is completely valid R code where the result of the expression determines the type of variable it is assigned to: dynamic typing. Statically typed languages such as C or C++ are different. Variables have to be declared first which assigns the name of a variable to a particular type. That type is then fixed for as long as this variable is in scope which may be as long as the program runs, or just a fraction of a second until the current scope (typically defined by a pair of curly braces) is exited. A certain number of assignments from one type to another are possible. For example, assigning a floating number such as 3.1415 to an integer truncates (rather than rounds) its value to 3. Assigning it back to a floating-point variable would then make it 3.0. In other words, the assignment may 198 A C++ for R Programmers (or may not) be losing precision, and it depends on the type of variable assigned to and from. Standard variable types in C++ are • Integer of different sizes and hence supported ranges of values; int and long are most common; they can also be qualified as unsigned which excludes negative values and thereby doubles the range of positive values. • Floating point numbers of lower (float) and higher (double) precision. • Logical values such as bool. • Character values as char but these are individual letters or symbols, not compounds such as strings as there is no base type for strings (but see below for the STL strings). Another key difference is that all these types are scalar. Vectors can be created statically with size fixed at compile time, or dynamically as in C. That is, however, a feature which can be avoided almost entirely by relying on STL types as discussed below. A.3 A Better C C++ can also be seen as better C. In fact, Meyers (2005) argues in his first of 55 “items” that C++ should be seen as a federation of four languages, with C being one of these. Hence, we need to review a few core language elements which are actually fairly similar to the R language. Control Structures C++ contains several control structures which are similar to those in R: • for loops are very common; they contain three components initialization, comparison for termination, and incrementing. So in 2 for (int i=0; i<10; i++) { // some code here } Listing A.7 Simple R example of dynamic types the loop body will be entered ten times with the variable i ranging from zero to nine. Once the expression i < 10 no longer evaluates to true, the code resumes after the end of the for block. • while loops are also similar with a top-level boolean expression and a loop body that is entered for as long as the condition is true; a related but much less used variant starts with the do keyword and the loop body and the test at the end; lastly, keywords break and next exist to exit the loop body and skip to the next iterations, respectively. A.3 A Better C 199 • if statements are very similar to what one uses in R with optional else blocks and nesting; very little is different here. • switch statements are an alternative to “ladders” of if/else as a single statement evaluates and the matching condition, represented by a case label, is executed, or else a default value is chosen. Functions Functions also share some similarities with their R equivalents. Functions can be defined to take a number of arguments. Argument matching is always by position; passing arguments by name as in R is not permitted. All listed function arguments have to be supplied. An example of this was the qnorm function above which we had to call with all five arguments. Its R version also has up to five arguments, but if called as qnorm(0.95), default values for mean, standard deviation, lower tail, and logarithmic use apply. In C++, we explicitly list all five arguments (though default arguments can be supplied as well in a function definition). Because the language is statically typed, functions are differentiated by both their names and argument types. That means that these two function declarations 1 int min(int a, int b); double min(double a, double b); Listing A.8 Simple C++ function example are in fact distinct. The compiler will call the corresponding ones for min(4, 5) and min(4.0, 5.0). Templates, discussed below, offer an approach to write more generic functions that apply to several variable types. Pointers and Memory Management Pointers and memory management is an important advanced topic, particularly for C programming. In C++, use of pointers can in many cases be avoided, diverting one frequent case of criticism. There are two very common use cases for pointers. The first one concerns dynamic memory allocation. In C, the only approach to reserve a vector or array (of, say, type double) of a size given only at run-time is to declare a pointer to double. At run-time, when the required size is known, this pointer is then assigned a dynamic memory allocation of the appropriate size determined as the number of required elements times the size of a double. After use, the memory has to be freed or else it leaks which means it is allocated but not used; a waste of system resources. This process sounds manual and error-prone, and it is. But with C++ and facilities such as the Standard Template Library (STL) discussed below, we do not need to resort to this approach to have dynamically sized vectors or arrays. 200 A C++ for R Programmers The second aspect concerns how arguments are passed to functions. There are two approaches in C and by extension C++. The first is call-by-value in which a copy is passed to the subroutine. It can be modified at will and changes will not affect the calling function and its value. That is safe yet occasionally inefficient (as larger compound data types will be copied in full) or even inapplicable. The second use case is call-by-reference. Here, a pointer is passed and one can access the original memory location. This is frequently more efficient, and also the only way to alter an object. C++ improves upon this setup by offering a call-by-reference without pointers. #include <cstdio> 2 4 6 8 10 12 void abs(double & x) { if (x < 0) x = -x; } int main(void) { double x = -3.4; printf("%f\n", x); abs(x); printf("%f\n", x); } Listing A.9 Simple C++ function call example Here a function which changes its argument to its absolute value is defined and tested. The output is first negative, and then positive. No pointers are used, yet the value is changed as the variable is passed by reference, indicated by the & in the function signature. (The example is contrived, typically we write a function for an absolute value as returning the modified value, rather than changing the argument.) A.4 Object-Oriented (But Not Like S3 or S4) The second of the four languages “federated inside Cpp” according to Meyers (2005) is object-oriented C++. There is an enormous amount of complexity around both the how and why of object-oriented programming, both in general and specifically in C++ which is arguably a fairly complex language. That said, some highlevel concepts are easy enough to express in a few paragraphs, and we will concentrate on such a higher-level approach here. The basic composite type in C++ is a struct, which is inherited from C. It offers the most basic form of composition as it permits to group several variables inside a newly defined type. 1 3 struct Date unsigned unsigned unsigned { int year int month; int date; A.5 Generic Programming and the STL 5 }; 7 struct Person { char firstname[20]; char lastname[20]; struct Date birthday; unsigned long id; }; 9 11 201 Listing A.10 Simple C++ data structure using struct Here, we define a Date type containing year, month, and date as separate unsigned integers. That structure is then reused in the Person structure. So far, so good. What is not to like? First, all data elements are by default public meaning every piece of code having access to the structure has the ability to change values. Second, the structure really only holds data but no code. The class data type overcomes both by associating methods (which are classspecific functions) with the class. Moreover, data can now be public (visible to all), private (visible only to methods of the class), or protected (a refinement having to do with inheritance we can ignore here). A possible sketch of a class declaration for a Date could be 2 4 6 8 10 class Date { private: unsigned int year unsigned int month; unsigned int date; public: void setDate(int y, int m, int d); int getDay(); int getMonth(); int getYear(); } Listing A.11 Simple C++ data structure using class This class contains a few changes as discussed above. Date fields are now private: data cannot be accessed directly from outside the class. To do so, we now have accessor functions. The first sets the date—and, in doing so, can ascertain that the date supplied is actually a valid date. This is followed by three more functions which access the date components. The implementation of the function bodies would then be supplied in a matching cpp file complementing the declaration from a header file. A.5 Generic Programming and the STL The STL provides the third distinct language aspect within the “federation of four languages” view described by Meyers (2005). The STL has become a staple of C++ programming for efficient and generic programming (Austen 1999). In this 202 A C++ for R Programmers context, generic means a consistent interface that is provided irrespective of the chosen data type. As one illustrative example, consider the so-called sequence container types vector, deque, and list. Each of these supports common functions such as push back() to insert at the end pop back() to remove from the front begin() returning an iterator to the first element end() returning an iterator to just after the last element size() for the number of elements and more similar functions. However, list offers different performance guarantees and implementation details than vector so it can also offer complementary functions such as push_front() and pop_front() which a vector does not have. On the other hand, v[i] can access the element with index i in the vector (which offers random access), whereas a list has to be traversed. The deque class provides aspects of both vector and lists and can be seen as a compromise or superset of both features sets. Other commonly used container types are associative: set for collections of objects where both key and value are the same; it provides set-theoretic operations such as union and intersect. multiset which extends set by allowing several instances of the same key/value. map which is pair associative container linking a key to a value; both can be of different types mapping, e.g., a numeric index (e.g., a zip or postal code represented as an integer) to a character vector or string type with the name of the municipality. multimap which extends map by allowing an unlimited number of values for a given key. as well as hashed versions of these types. In the original SGI implementation of the STL these are named hash_* but the name ordered_* was chosen in the current TR1 implementation of the upcoming C++ standard. One commonality of both sequence and associative containers is the traversal via iterators. Consider this example for the vector class where we use the const_iterator variant which indicates that it accesses elements read-only but never modifies: 1 3 std::vector<double>::const_iterator si; for (si=s.begin(); si != s.end(); si++) { std::cout << *si << std::endl; } Listing A.12 Simple C++ example using iterators on vector We can use the exact same for loop simply by modifying the iterator to be a list type A.6 Template Programming 203 std::list<double>::const_iterator si; Listing A.13 Simple C++ example using const iterators on list or change it to the deque type 1 std::deque<double>::const_iterator si; Listing A.14 Simple C++ example using const iterators on deque which illustrates the generic nature of STL operators. The STL also contains a number of algorithms. A simple one is accumulate which can be used as 1 3 std::cout << "Sum is " << std::accumulate(s.begin(), s.end(), 0) << std::endl; Listing A.15 Simple C++ example using accumulate algorithm and this is irrespective of what class the object s is instantiated from for as long as it supports iterator access as well as begin() and end(). The third argument is the initial value of the summation which we set to zero. Other popular STL algorithms are find which finds the first element equal to the supplied value, if any. count which counts the number of elements match the given condition. transform which applies a supplied unary or binary function to each element. for each which also sweeps over all elements but does not alter elements. inner product which can be used to compute the inner product of two vectors, or a sum of squares of a single vector. The key insight here is that these algorithms and iterators can be applied to the different data structures—sequential containers as well as associative containers— with very minimal change. It is in this sense that programming with the STL is “generic.” Many more algorithms are available and described, for example, in Meyers (2001). A final note is that with the STL now being part of the language standard, the use of the term “STL” which refers to what was once an external library is no longer entirely correct. The Standard C++ Library is more appropriate. However, it is still common to refer to these parts of the library as STL reflecting their historical source of having been an extension to the then-smaller standard library. A.6 Template Programming Template programming provides the fourth and last language federated within C++ as per Meyers (2005). It is arguably the most complex aspect and the one in which C++ differs most from other related languages such as Java or C#. 204 A C++ for R Programmers Templates programming and its use can range from the very simple to the very complex. Examples for the more complex end of template use are provided by the template meta-programming technique. It is at the core of both Armadillo (Sanderson 2010), Eigen (Guennebaud et al. 2012), and Rcpp sugar discussed in Chap. 8. This section, however, will focus on simpler uses of templates. An example above considered different min functions for integers and doubles. A more general solution uses templates: 1 3 template <typename T> const T& min(const T& x, const T& y) { return y < x ? y : x; } Listing A.16 Simple C++ template example This returns a constant reference of the templated type T which is also used for the input types. The sole expression uses the standard C comparison operator to return the smaller of the two arguments x and y. A simple example of template use was already shown in Sect. 2.5.2. That section illustrates how to combine the inline package with short text and code snippets similar to header files. As a concrete example, the following templated class that squares its input was shown: 2 4 6 template <typename T> class square : public std::unary_function<T,T> { public: T operator()( T t) const { return t*t; } }; Listing A.17 Another C++ template Templates are used throughout the Rcpp sources. Key components of Rcpp such as the conversion function as<>() are implemented using templates. The as<>() conversion function accepts a template type and converts a SEXP provided as input to this type (provided the conversion is suitable; else an exception is thrown). However, the inverse operation, provided by wrap, is a standard function which does not use templates: the dispatch is done on the basis of the main argument type. Template programming is a more advanced form of C++ use. It can get rather complicated rather quickly; so we will not dive deeper into templates but refer the reader to the literature. A.7 Further Reading on C++ A standard reference and introduction to C++ is provided by the creator of the language in Stroustrup (1997). This book is generally not recommended as a first book on C++ for which Lippman et al. (2005) is more frequently listed. Meyers (2005, A.7 Further Reading on C++ 205 1995, 2001) is a highly recommended and readable series of “items” suggesting best practices for C++ and STL use. As C++ is a popular and widely used programming language, several good resources exist on the Internet as well. The Wikipedia page1 provides a very good start with numerous further references. Brokken (2012) is a recommended and freely downloadable text introducing C++ which has been maintained and extended since 1994. Also, good introductions to template programming are provided by Abrahams and Gurtovoy (2004) and Vandevoorde and Josuttis (2003). Finally, among C++ projects, Boost (at http://www.boost.org) stands out and deserves special mention. Boost is a collection of several dozen rigorously developed and peer-reviewed libraries. Some of the Boost libraries will be included in the next version of the C++ standard. 1 See http://en.wikipedia.org/wiki/C++. References Abrahams D, Grosse-Kunstleve RW (2003) Building Hybrid Systems with Boost.Python. Boost Consulting, URL http://www.boostpro.com/ writing/bpl.pdf Abrahams D, Gurtovoy A (2004) C++ Template Metaprogramming: Concepts, Tools and Techniques from Boost and Beyond. Addison-Wesley, Boston Adler D (2012) rdyncall: Improved Foreign Function Interface (FFI) and Dynamic Bindings to C Libraries. URL http://CRAN.R-Project.org/ package=rdyncall, R package version 0.7.5 Albert C, Vogel S (2012) GUTS: Fast Calculation of the Likelihood of a Stochastic Survival Model. URL http://CRAN.R-Project.org/package=GUTS, R package version 0.2.8 Armstrong W (2009a) RAbstraction: C++ abstraction for R objects. URL http:// github.com/armstrtw/rabstraction, code repository last updated July 22, 2009. Armstrong W (2009b) RObjects: C++ wrapper for R objects (a better implementation of RAbstraction). URL http://github.com/armstrtw/ RObjects, code repository last updated November 28, 2009. Auguie B (2012a) cda: Couple dipole approximation. URL http://CRAN. R-Project.org/package=cda, R package version 1.2.1 Auguie B (2012b) planar: Multilayer optics. URL http://CRAN.R-Project. org/package=planar, R package version 1.2.4 Austen MH (1999) Generic Programming and the STL: Using and Extending the C++ Standard Template Library. Addison-Wesley Bates D, DebRoy S (2001) C++ classes for R objects. In: Hornik K, Leisch F (eds) Proceedings of the 2nd International Workshop on Distributed Statistical Computing (DSC 2001), TU Vienna, Austria Bates D, Eddelbuettel D (2013) Fast and elegant numerical linear algebra using the RcppEigen package. Journal of Statistical Software 52(5), URL http://www. jstatsoft.org/v52/i05 D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4, © The Author 2013 207 208 References Bates D, François R, Eddelbuettel D (2012) RcppEigen: Rcpp integration for the Eigen templated linear algebra library. URL http://CRAN.R-Project. org/package=RcppEigen, R package version 0.3.1.2 Brokken FB (2012) C++ annotations. Electronic book, University of Groningen, URL http://www.icce.rug.nl/documents/cplusplus/, version 9.4.0, accessed 2012-11-24. Chambers JM (1998) Programming with Data: A Guide to the S Language. Springer, Heidelberg, ISBN 978-0387985039 Chambers JM (2008) Software for Data Analysis: Programming with R. Statistics and Computing, Springer, Heidelberg, ISBN 978-0-387-75935-7 Chambers JM, Hastie TJ (1992) Statistical Models in S. Chapman & Hall, London Eddelbuettel D (2012a) RcppCNPy: Rcpp bindings for NumPy files. URL http://CRAN.R-Project.org/package=RcppCNPy, R package version 0.2.0 Eddelbuettel D (2012b) RcppDE: Global optimization by differential evolution in C++. URL http://CRAN.R-Project.org/package=RcppDE, R package version 0.1.1 Eddelbuettel D, François R (2012a) Rcpp: Seamless R and C++ Integration. URL http://CRAN.R-Project.org/package=Rcpp, R package version 0.10.0 Eddelbuettel D, François R (2012b) RcppBDT: Rcpp binding for the Boost Date Time library. URL http://CRAN.R-Project.org/ package=RcppBDT, R package version 0.2.1 Eddelbuettel D, François R (2012c) RcppClassic: Deprecated ’classic’ Rcpp API. URL http://CRAN.R-Project.org/package=RcppClassic, R package version 0.9.2 Eddelbuettel D, François R (2012d) RInside: C++ classes to embed R in C++ applications. URL http://CRAN.R-Project.org/package=RInside, R package version 0.2.7 Eddelbuettel D, Nguyen K (2012) RQuantLib: R interface to the QuantLib library. URL http://CRAN.R-Project.org/package=RQuantLib, R package version 0.3.9 Eddelbuettel D, Sanderson C (2013) RcppArmadillo: Accelerating R with highperformance C++ linear algebra. Computational Statistics and Data Analysis (in press) Fellows I (2012) wordcloud: Word clouds. URL http://CRAN.R-Project. org/package=wordcloud, R package version 2.2 François R (2012a) highlight: Syntax highlighter. URL http://CRAN. R-Project.org/package=highlight, R package version 0.3–2 François R (2012b) parser: Detailed R source code parser. URL http://CRAN. R-Project.org/package=parser, R package version 0.1 François R, Eddelbuettel D (2010) RcppGSL: Rcpp integration for GNU GSL vectors and matrices. URL http://CRAN.R-Project.org/ package=RcppGSL, R package version 0.2.0 References 209 François R, Eddelbuettel D, Bates D (2012) RcppArmadillo: Rcpp integration for Armadillo templated linear algebra library. URL http://CRAN. R-Project.org/package=RcppArmadillo, R package version 0.3.4.4 Galassi M, Davies J, Theiler J, Gough B, Jungman G, Alken P, Booth M, Rossi F (2010) GNU Scientific Library Reference Manual. 3rd edn, URL http:// www.gnu.org/software/gsl, version 1.14. ISBN 0954612078 Gentleman R (2009) R Programming for Bioinformatics. Computer Science and Data Analysis, Chapman & Hall/CRC, Boca Raton, FL Gropp W, Lusk E, Doss N, Skjellum A (1996) A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing 22(6):789–828, URL http://dx.doi.org/10.1016/ 0167-8191(96)00024-5 Gropp W, Lusk E, Skjellum A (1999) Using MPI: Portable Parallel Programming with the Message Passing Interface, 2nd edn. Scientific and Engineering Computation Series, MIT Press, ISBN 978-0-262-57132-6 Guennebaud G, Jacob B, et al (2012) Eigen v3. URL http://eigen. tuxfamily.org Hankin RKS (2011) gsl: Wrapper for the Gnu Scientific Library. URL http:// CRAN.R-Project.org/package=gsl, R package version 1.9–9 Java JJ, Gaile DP, Manly KE (2007) R/Cpp: Interface classes to simplify using R objects in C++ extensions, URL http://sphhp.buffalo.edu/biostat/ research/techreports/UB_Biostatistics_TR0702.pdf, unpublished manuscript, University at Buffalo Jurka TP, Tsuruoka Y (2012) maxent: Low-memory Multinomial Logistic Regression with Support for Text Classification. URL http://CRAN.R-Project. org/package=maxent, R package version 1.3.2 King M, Diaz FC (2011) RSofia: Port of sofia-ml to R. URL http://CRAN. R-Project.org/package=RSofia, R package version 1.1 Kusnierczyk W (2012) rbenchmark: Benchmarking routine for R. URL http:// CRAN.R-Project.org/package=rbenchmark, R package version 1.0 Leisch F (2008) Tutorial on Creating R Packages. In: Brito P (ed) COMPSTAT 2008 – Proceedings in Computational Statistics, Physica Verlag, Heidelberg, Germany, URL http://CRAN.R-Project.org/doc/contrib/ Leisch-CreatingPackages.pdf Liang G (2008) rcppbind: A template library for R/C++ developers. URL http:// R-Forge.R-Project.org/projects/rcppbind, R package version 1.0 Lippman SB, Lajoie J, Moo BE (2005) The C++ Primer, 4th edn. Addison-Wesley Matloff N (2011) The Art of R Programming: A Tour of Statistical Software Design. No Starch Press, San Francisco, CA Meyers S (1995) More Effective C++: 35 New Ways to Improve Your Programs and Designs. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, ISBN 020163371X 210 References Meyers S (2001) Effective STL: 50 specific ways to improve your use of the standard template library. Addison-Wesley Longman Ltd., Essex, UK, ISBN 0-20174962-9 Meyers S (2005) Effective C++: 55 Specific Ways to Improve Your Programs and Designs, 3rd edn. Addison-Wesley Professional, ISBN 978-0321334879 R Development Core Team (2012a) R Installation and Administration. R Foundation for Statistical Computing, Vienna, Austria, URL http://CRAN. R-Project.org/doc/manuals/R-admin.html, ISBN 3-900051-09-7 R Development Core Team (2012b) R internals. R Foundation for Statistical Computing, Vienna, Austria, URL http://CRAN.R-Project.org/doc/ manuals/R-ints.html, ISBN 3-900051-14-3 R Development Core Team (2012c) R language. R Foundation for Statistical Computing, Vienna, Austria, URL http://CRAN.R-Project.org/doc/ manuals/R-lang.html, ISBN 3-900051-13-5 R Development Core Team (2012d) Writing R extensions. R Foundation for Statistical Computing, Vienna, Austria, URL http://CRAN.R-Project.org/ doc/manuals/R-exts.html, ISBN 3-900051-11-9 Runnalls A (2009) Aspects of CXXR internals. In: Directions in Statistical Computing, University of Copenhagen, Denmark Sanderson C (2010) Armadillo: An open source C++ algebra library for fast prototyping and computationally intensive experiments. Tech. rep., NICTA, URL http://arma.sf.net Sklyar O, Murdoch D, Smith M, Eddelbuettel D, François R (2012) inline: Inline C, C++, Fortran function calls from R. URL http://CRAN.R-Project.org/ package=inline, R package version 0.3.10 Stroustrup B (1997) The C++ Programming Language, 3rd edn. Addison-Wesley Temple Lang D (2009a) A modest proposal: an approach to making the internal R system extensible. Computational Statistics 24(2):271–281 Temple Lang D (2009b) Working with meta-data from C/C++ code in R: the RGCCTranslationUnit package. Computational Statistics 24(2):283–293 Thomas A, Redd A (2012) transmission: Continuous time infectious disease models on individual data. URL http://CRAN.R-Project.org/ package=transmission, R package version 0.1 Urbanek S (2003) Rserve: A fast way to provide R functionality to applications. In: Hornik K, Leisch F, Zeileis A (eds) Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), TU Vienna, Austria Urbanek S (2012) Rserve: Binary R server. URL http://CRAN.R-Project. org/package=Rserve, R package version 0.6–8 Vandevoorde D, Josuttis NM (2003) C++ Templates: The Complete Guide. Addison-Wesley, Boston Venables WN, Ripley BD (2000) S Programming. Statistics and Computing, Springer-Verlag, New York Subject Index A Analysis, 3, 144 Application Programming Interface (API), 19, 22, 155 Armadillo, 15, 54, 75, 139–153, 158, 177, 183, 191, 192, 204 colvec, 30, 141, 142 eye(), 151 mat, 30, 140–142 randn(), 140 solve(), 141, 142 trans(), 140, 151 zeros(), 151 autoconf, 165 B B-spline, 169–175 Basic Linear Algebra Subroutine (BLAS), 146, 153, 191 Bell Labs, xi, 3 Benchmark, 10, 17, 20, 115, 124, 143, 152, 180, 183, 191 BioConductor, 59, 192 Boost, 80, 81, 86, 205 Date Time, 80, 81 Python, 83, 86 Bootstrap, 4 Byte-compiler, see R, Byte-compiler C C, 25, 198 C++ class, 117–118 Curiously Recurring Template Pattern (CRTP), 117, 118 Expression templates, 104 Functor, 110 Lazy evaluation, 104, 107, 108 Operator overloading, 105 std::unary function, 110 template, 110, 117, 118 Template meta-programming, 104 C++, 195–205 C++11, 11, 21, 22 class, 12, 27, 29, 77, 84, 85, 90–98, 150, 201 Conditional execution, 199 deque, 203 Exceptions, 32–35, 142 Federation of languages, 6, 198 Function, 199 Iterator, 158, 202 list, 203 Loops, 198 map, 202 multimap, 202 Object-orientation, 6, 150, 201, 200–201 Pointers, 199–200 Scope, 197 set, 202 Static typing, 197–198 std::accumulate(), 43, 160, 203 std::begin(), 202 std::count(), 203 std::end(), 202 std::find(), 203 std::for each(), 203 std::inner product(), 141, 142, 203 std::list, 132 std::map, 58, 132 std::multiplies(), 43 std::pop back(), 202 std::push back(), 202 std::size(), 202 std::string, 58, 87, 132, 133 D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4, © The Author 2013 211 212 std::transform(), 48, 156, 203 std::unary function, 29 std::vector, 45, 49, 76, 97–98, 132 str::string, 49 struct, 200 template, 29, 75, 78–81, 203–204 Template meta-programming, 76, 78, 178, 203 Cascading Style Sheets (CSS), 137 cfunction, see R, cfunction clang++, 21, 22 Compiler, 19, 20, 23, 26, 130, 195–197 configure, 166 configure.in, 166 Convolution, 26 CRAN, 19, 25, 59, 74, 153, 155, 192 cxxfunction, see R, cxxfunction D Density estimation, 3, 136 Domain-specific language (DSL), 3 dyn.load(), 25 Dynamic linking, 19 E Econometrics, 15 Eigen, 75, 177–192 Array33d, 180 Array3d, 180 ArrayXd, 180, 187 ArrayXXd, 180 Cholesky decomposition with pivoting, 183 Column-pivoting Householder QR, 184 Column-pivoting QR Decomposition, 188 Full Pivoting LU Decomposition, 186 GESDD, 191 HouseholderQR, 189 Jacobi SVD, 185 LDLT, 183, 191, 192 LLT, 189, 191, 192 Map, 181, 184–187, 189, 191 Matrix3d, 178 MatrixXd, 179, 183–187, 189, 191 MatrixXi, 179 PivQR, 191 QR, 191 SelfAdjointEigenSolver, 185 solve(), 185, 186 SparseMatrix, 181 SVD, 191 SymmEig, 191, 192 Vector3d, 178 VectorXd, 179, 183–187, 189, 191 Subject Index VectorXi, 179 Environment variable, 24 F Fibonacci Sequence, 7–15, 25, 31 Spiral, 7, 8 Fortran, 25 G g++, see GNU Project C and C++ Compiler gcc, see GNU Project C and C++ Compiler GNU Project C and C++ Compiler, 20, 21, 23, 24, 130, 196, 197 GNU Scientific Library (GSL), 30–32, 75, 155–175, 191, 192 gsl-config, 30, 166 gsl blas dnrm2, 167, 169 gsl bspline, 171, 174 gsl const mksa, 32 gsl matrix, 163, 164, 171, 174 gsl multifit linear, 156 gsl multifit wlinear, 171, 174 gsl ran gaussian, 171, 174 gsl rng, 30, 173, 174 gsl stats wtss, 171, 174 gsl vector, 158–162, 171, 174 gsl vector view, 156 GSL, see GNU Scientific Library I inline, 9–11, 25–31, 168 cfunction, see R, cfunction cxxfunction, see R, cxxfunction Interface, 9 K Kalman filter, 146–152, 182–183 Kernel estimator, 3, 136 L Linear Model, 29, 140–146, 156–158, 169, 186–192 Linker, 19, 26, 130, 195–197 LU Decomposition, 186 M Makefile, 24, 129, 130 Makefile.win, 129 Matlab, 139, 146, 147 Memoization, 12–13 Message Passing Interface (MPI), 134 Modeling, 5 Subject Index O Object orientation, 3 Object-orientation, 6, 58–60, 150 Octave, 146 Old Faithful, 3 OLS, see Linear Model OS X, see Platform, OS X P Platform HP UX, 196 IBM, 196 Linux, 22, 25, 26, 196 OS X, 19, 21, 22, 24–26, 196 Other, 22 Solaris, 20, 22, 153, 196 Windows, 19–21, 25, 26, 70, 129, 131, 153, 196 Portable Network Graphics (PNG), 133 Python, 100 Q QR Decomposition, 184, 186, 188, 189, 191 Quantile estimation, 4, 5 R R .C(), 23 .Call, 160, 161 .Call(), 9, 23, 25, 67, 83, 84, 143, 160, 168 Application Programming Interface (API), 22, 83, 128 apply, 165 apply(), 4 benchmark(), 17 Byte-compiler, 10, 17, 195 cfunction(), 25 CMD COMPILE, 31, 142 CMD LINK, 130, 197 CMD SHLIB, 24, 26, 31, 142 cxxfunction(), 9, 10, 13, 14, 17, 25–27, 29–31, 33, 34, 42, 43, 45–49, 52, 53, 55–57, 60, 76, 142, 143, 151, 169, 185, 186 density(), 4 DESCRIPTION, 26, 65, 67, 69, 99 dyn.load(), 25 Embedding, 128 environment(), 99 generic function, 58 History, 3 Inf, 49 Makevars, 26, 67, 69, 166 Makevars.win, 67, 69 213 NA, 49 NAMESPACE, 67, 71, 98, 99 NaN, 49 Object orientation, 3 Object-orientation, 6 package.skeleton(), 66 pnorm(), 60 polygon(), 4 PROTECT, 41 qnorm(), 196 quantile(), 4 Reference Classes, 3, 59–90 replicate(), 4 Rmath, 60, 61, 114, 115, 197 S3, 3 S4, 3, 41, 58, 59, 86 sample(), 4, 56 sapply(), 116 Scope, 3 set.seed(), 57 SEXP, 9, 23, 39, 40, 43, 46, 53, 60, 67, 73, 75, 76, 79, 81, 84, 85, 101, 104, 132, 181, 204 UNPROTECT, 41 Rank deficiency, 144, 146 Rcpp Application Programming Interface, 39–49, 51–60 as(), 9, 10, 13, 14, 17, 29, 31, 34, 45, 53, 73, 75, 76, 78–81, 84, 85, 88, 95, 133, 177, 204 attributes, 11–12, 31–32, 172–175 CharacterVector, 41, 49, 68 clone(), 46, 48 cppFunction(), 12, 31, 32 create, 29 CxxFlags(), 24, 26 DataFrame, 29, 55, 60, 173 Date, 81, 82 depends, 172 Environment, 57 export, 11, 123, 173, 174 ExpressionVector, 41 Function, 56, 57, 118 GenericVector, 41, 52 IntegerMatrix, 41, 101 IntegerVector, 41–45, 55, 101, 107, 118 isNULL, 41 isObject, 41 isS4, 41 LdFlags(), 24, 26 List, 30, 52–54, 68, 76, 89, 90, 102, 141, 142, 156, 174, 191 LogicalVector, 41, 48, 106 214 Modules, 81, 83–102, 135 class , 91, 93, 95, 98 const method, 98 constructor, 91, 93, 95, 98 field, 91, 93 function, 88, 90 method, 91, 95, 98 property, 93, 95 RCPP MODULE, 86–88 Named, 29, 51–54, 56, 57, 60, 141, 142, 173, 174 NumericMatrix, 30, 41, 48, 53, 101, 141, 142 NumericVector, 26, 30, 41, 45–48, 52, 53, 60, 68, 85, 101, 104, 105, 107, 115, 123, 141, 142, 156, 169, 191 NumericVector(), 104 operator SEXP(), 77 package, 65–74 Plugin, 30 RawVector, 41, 49 Rcout, 153 Rcpp.package.skeleton(), 65–72, 74, 99, 101 Rcpp.plugin.maker(), 31 RcppArmadillo, 139–153 RNGScope, 56, 85, 115, 123 RObject, 39–41, 191 S4, 58, 96 sourceCpp(), 11, 31, 123, 124, 175 sugar, 103–124 abs(), 114 acos(), 114 all(), 107, 108, 115 any(), 107, 108, 115 asin(), 114 atan(), 114 beta(), 114 ceil(), 114 choose(), 114 clamp(), 112 cos(), 114 cosh(), 114 cumsum(), 114 diff(), 111 digamma(), 114 Distributions, 115 dnorm(), 114 duplicated(), 113 exp(), 114 expm1(), 114 factorial(), 114 floor(), 114 gamma(), 114 ifelse(), 104, 109, 115 Subject Index intersect(), 112 is false(), 107 is na(), 107, 108, 118 is true(), 107 lapply(), 108, 110 lbeta(), 114 lchoose(), 114 lfactorial(), 114 lgamma(), 114 log(), 114 log10(), 114 log1p(), 114 mapply(), 110 max(), 113, 114 mean(), 113, 114 min(), 113, 114 pentagamma(), 114 pmax(), 109, 112 pmin(), 109, 112 pnorm(), 60, 114 pow(), 114 psigamma(), 114 qnorm(), 114 range(), 114 rnorm(), 114 round(), 114 runif(), 123 sapply, 118–122 sapply(), 110, 116 sd(), 113, 114 seq along(), 108 seq len(), 110 setdiff(), 111 setequal(), 114 sign(), 111 signif(), 114 sin(), 114 sinh(), 114 sort unique(), 112 sqrt(), 114, 123 sum(), 114, 123 table(), 113 tan(), 114 tanh(), 114 tetragamma(), 114 trigamma(), 114 trunc(), 114 union (), 111 unique(), 112 var(), 113, 114 which max(), 114 which min(), 114 Timer, 179, 180 try(), 32 Subject Index tryCatch(), 32 wrap(), 9, 10, 13, 14, 17, 34, 42, 43, 45, 57, 73, 75–78, 80, 81, 84, 88, 95, 133, 177, 204 XPtr, 85 Rcpp::Rcout, 179 RcppGSL CFLags, 166 LdFLags, 166 matrix, 156, 164, 169 vector, 156, 159–162, 173, 174 Recursion, 7, 8, 12 Reference Classes, see R, Reference Classes Reproducible research, 3 Resampling, 4 RInside, 127–137 parseEval, 132, 133 parseEvalQ, 129, 132–134 Rscript, 24, 70, 127, 128, 166 Rserve, 128 S S, 3 S3, see R, S3 S4, see R, S4;Rcpp, S4 215 SEXP, see R, SEXP Shared library, 9 Simulation, 4, 15, 127 Singular-Value Decomposition (SVD), 185, 186, 192 Standard Template Library (STL), 40, 43–45, 54, 76, 96, 104, 132, 141, 158, 198, 201 STL, see Standard Template Library (STL) T Text file, 127 U Unit tests, 20 V Vector Autogression (VAR), 15–17 W Web application, 136 Windows, see Platform, Windows Wt, 136 X XML, 137 Software Index A Armadillo, 15, 54, 75, 139–141, 144, 146, 151–153, 177, 182, 183, 192 autoconf, 165 B BLAS, 146, 153, 191 Boost, 75, 80–82, 205 Boost.Python, 83, 86 C C, ix, 6, 8, 9, 20, 23, 25, 40, 60, 61, 65, 128, 134, 146, 155, 156, 158, 163, 168, 181, 195–200, 204 C++, vii–ix, xi, 4, 6, 8–11, 13–23, 25, 26, 29–35, 40–43, 45, 46, 51–56, 59, 60, 73, 76, 80, 83, 84, 88, 89, 91, 103–105, 109, 114–116, 123, 124, 127, 128, 132–135, 139, 140, 144, 146, 148, 150–153, 159, 172, 177, 182, 187, 189, 191, 195–205 C++, xi, 9, 15, 23, 26, 44, 53, 58, 59, 61, 65, 67–69, 74, 75, 77, 78, 83, 84, 86, 90, 92, 96, 132, 146, 150, 151, 155, 156, 158, 161–163, 166–168, 177, 180, 181, 183, 192, 195, 199, 200 C++11, 21, 31, 42 C#, 40, 203 cda, 102 ceres, 177 compiler, 17 Cpp, 200 CXXR, viii D Date Time, 75, 80, 81 E Eigen, 75, 177–192 F Fortran, 9, 25, 168 G GSL, 155, 156, 158, 160–166, 168, 169, 171–175 gsl, 155, 156 GUTS, 102 H highlight, 20, 102 I inline, ix, 9–12, 16, 18, 20, 25–34, 42, 52, 65, 142, 144, 155, 168, 169, 204 J Java, 40, 58, 59, 203 L Lapack, 153 Linpack, 144, 146 M Matlab, 139, 146–148, 151 maxent, 102 mypackage, 67 O Octave, 146 OpenBLAS, 191 P parser, 102 planar, 102 Python, 83, 100 Python modules, 86 D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4, © The Author 2013 217 218 Q Qt, 135, 137 R R, vii–ix, xi, xii, 3, 4, 6–12, 14–26, 29–34, 39–49, 51–61, 65–69, 71–73, 75–78, 81, 83–93, 96, 98–101, 103–105, 107, 108, 111, 112, 114–116, 122–124, 127–136, 139, 140, 142–144, 146–153, 155, 156, 158, 160, 161, 164–169, 172–175, 178–181, 183–185, 190–192, 195–199, 209 RAbstraction, viii rbenchmark, 10, 20, 142 Rcpp, vii–ix, xi, xii, 3, 6, 9–11, 15, 18–27, 29–31, 33–35, 39–42, 45, 49, 51, 52, 55–61, 65–69, 73–79, 83, 84, 87, 96, 98–100, 103, 104, 114, 115, 118, 127–130, 132–134, 136, 139–141, 152, 153, 155, 156, 158, 160, 168, 172, 177, 179, 181, 185, 192, 204 RcppArmadillo, xi, 15–18, 29, 30, 54, 74, 75, 82, 139–144, 146, 152, 153, 156, 158, 186, 191 RcppBDT, 74, 75, 80–82, 99, 100, 102 rcppbind, viii RcppClassic, viii Software Index RcppCNPy, 83, 99–102 RcppDE, 53, 54 RcppEigen, xi, 29, 74, 75, 82, 177, 182, 186–189, 191, 192 RcppGSL, 29, 30, 32, 74, 75, 82, 155–159, 161–169, 172, 186, 191 RcppTemplate, viii rdyncall, ix RInside, 127–136 Rmath, 196, 197 RObjects, viii RProtoBuf, xi RQuantLib, viii, xi Rserve, viii, 128 RSofia, 102 Rtools, 21 RUnit, 20 S S, 3, 58 T transmission, 102 W wordcloud, 73, 74 Wt, 136, 137 Author Index A Abrahams, David, 86, 104, 178, 205 Adler, Daniel, ix Albert, Carlo, 102 Alken, Patrick, 30, 155 Allaire, JJ, xi Armstrong, Whit, viii Auguie, Baptiste, 102 Austen, Matthew H., 201 B Bachmeier, Lance, 15 Bates, Douglas, viii, xi, 74, 82, 140, 158, 186, 191, 192 Boost, 205 Booth, Michael, 30, 155 Brokken, Frank B., 205 Burns, Patrick, 12 C Chambers, John M., vii, xi, 3, 23, 58 D Davies, Jim, 30, 155 DebRoy, Saikat, viii Diaz, Fernando Cela, 102 Doss, Nathan, 134 E Eddelbuettel, Dirk, viii, 9, 25, 53, 74, 80, 82, 100, 102, 128, 140, 158, 168, 186, 191 F Fellows, Ian, 73 François, Romain, viii, 9, 25, 74, 80, 82, 102, 128, 140, 158, 168, 186, 191 G Gaile, Daniel P., viii Galassi, Mark, 30, 155 Gentleman, Robert, 23 Google, 177 Gough, Brian, 30, 155 Gropp, William, 134 Grosse-Kunstleve, Ralf W., 86 Guennebaud, Gaël, 177, 183, 192, 204 Gurtovoy, Aleksey, 104, 178, 205 H Hankin, Robin K. S., 156 Hastie, Trevor J., 58 Hornik, Kurt, xi Hua, Jianping, 134 J Jacob, Benoı̂t, 177, 183, 192, 204 Java, James J., viii Josuttis, Nicolai M., 104, 205 Jungman, Gerard, 30, 155 Jurka, Timothy P., 102 K King, Michael, 102 Kusnierczyk, Wacek, 10, 142 L Lajoie, Josée, 204 Leisch, Friedrich, 66 Liang, Gang, viii Ligges, Uwe, xi Lippman, Stanley B., 204 Lusk, Ewing, 134 M Manly, Kenneth E., viii D. Eddelbuettel, Seamless R and C++ Integration with Rcpp, Use R! 64, DOI 10.1007/978-1-4614-6868-4, © The Author 2013 219 220 Matloff, Norman, 23 Meyers, Scott, viii, 6, 198, 200, 201, 203, 205 Moo, Barbara E., 204 Murdoch, Duncan, 9, 21, 25, 168 N Nguyen, Khanh, viii P Plummer, Martyn, xi R R Development Core Team, viii, 19–22, 26, 39, 56, 57, 60, 65, 66, 71, 83, 85, 103, 115, 128, 196 Redd, Andrew, 102 Ripley, Brian D., xi, 21, 23, 58 Rossi, Fabrice, 30, 155 Runnalls, Andrew, viii S Samperi, Dominick, viii, xi Sanderson, Conrad, 15, 139, 158, 192, 204 Simon, André, 20 Skjellum, Anthony, 134 Author Index Sklyar, Oleg, 9, 25, 168 Smith, Mike, 9, 25, 168 Snow, Greg, 4 StackOverflow, 7 Stroustrup, Bjarne, 204 T Temple Lang, Duncan, ix Theiler, James, 30, 155 Thomas, Alun, 102 Tierney, Luke, xi Tsuruoka, Yoshimasa, 102 U Urbanek, Simon, viii, xi, 21, 128 V Vandevoorde, David, 104, 205 Venables, Willian N., 23, 58 Vogel, Sören, 102 W WikiBooks, 14 Wikipedia, 7, 8, 42, 117, 205

Download PDF

advertisement