Eidos: A Simple Scripting Language

Benjamin C. Haller
Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853
email: [email protected]
Version: 1.0a1 (last revised 7 October 2015)
License:
Eidos is free software: you can redistribute it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any
later version.
Disclaimer:
The program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License (http://www.gnu.org/licenses/) for more details.
Citation:
Haller, B.C., 2015. Eidos: A Simple Scripting Language. URL: http://benhaller.com/eidos.html
Contents
PART I: THE EIDOS LANGUAGE
1. Eidos overview
1.1 Introduction
1.2 Why Eidos?
1.3 A quick summary of the Eidos language
2. Language features
2.1 Types, literals, and constants
2.1.1 The integer type
2.1.2 The float type
2.1.3 The logical type
2.1.4 The string type
2.2 Vectors
2.2.1 Everything is a vector
2.2.2 Sequences: operator :
2.2.3 Concatenation: function c()
2.2.4 Subsets: operator []
2.3 Expressions
2.3.1 Arithmetic expressions: operator +, -, *, /, %, ^
2.3.2 Logical expressions: operator |, &, !
2.3.3 Comparative expressions: operator ==, !=, <, <=, >, >=
2.3.4 String concatenation: operator +
2.3.5 Nested expressions: using () for grouping
2.4 Variables
2.4.1 Assignment: operator =
2.4.2 Everything is a vector (a reminder)
2.5 Conditionals
2.5.1 The if statement
2.5.2 The if–else statement
2.5.3 A digression: the semicolon, ;
2.5.4 Compound statements with { }
2.6 Loops
2.6.1 The while statement
2.6.2 The do–while statement
2.6.3 The for statement
2.6.4 The next statement
2.6.5 The break statement
2.6.6 The return statement
2.7 Functions
2.7.1 Calling functions: operator ()
2.7.2 The NULL type
2.7.3 The function() function
2.8 Objects
2.8.1 The object type
2.8.2 Element access: operator [] and sharing semantics
2.8.3 Properties: operator .
2.8.4 Multiplexed assignment through properties: operator = revisited
2.8.5 Comparison with object: operator ==, !=, <, <=, >, >= revisited
2.8.6 Methods: operator . and the method() method
3. Built-in functions and methods
3.1 Math functions
3.2 Summary statistics functions
3.3 Vector construction functions
3.4 Value inspection & manipulation functions
3.5 Value type testing and coercion functions
3.6 Filesystem access functions
3.7 Miscellaneous functions
3.8 Built-in methods
4. EidosScribe
4.1 EidosScribe overview
4.2 Interactive scripting
4.3 File-based script execution
4.4 Code completion
4.5 Debugging controls
4.5.1 Showing tokenization
4.5.2 Showing the abstract syntax tree (AST)
4.5.3 Showing the evaluation trace
5. Eidos language reference sheet
6. Railroad diagrams
6.1 Start rule
6.2 Statements
6.3 Expressions
6.4 Tokens
6.5 SLiM extensions to the Eidos grammar
7. Acknowledgements
8. References
PART II: USING EIDOS IN A NEW CONTEXT
9. Running Eidos scripts in C++
9.1 Initializing Eidos
9.2 Creating a script object: EidosScript
9.3 Setting up an Eidos interpreter: EidosInterpreter
9.4 Handling script results: EidosValue, Eidos_intrusive_ptr, and EidosObjectPool
10. Making C++ objects visible in Eidos
10.1 Defining an interface: EidosObjectClass
10.2 Defining an implementation: EidosObjectElement
10.3 Defining new Eidos symbols: EidosSymbolTable
11. Adding properties to Context objects
11.1 Defining a property signature: EidosPropertySignature
11.2 Declaring a property interface
11.3 Implementing a property interface
12. Adding methods to Context objects
12.1 Defining a method signature: EidosMethodSignature
12.2 Declaring a method interface
12.3 Implementing a method interface
13. Writing new built-in Eidos functions
13.1 Defining a function signature: EidosFunctionSignature
13.2 Implementing a new function: EidosDelegateFunctionPtr
13.3 Making a new function visible: EidosFunctionMap
14. Making an Objective-C/Cocoa GUI for an Eidos Context
14.1 The EidosConsoleWindowController class
14.2 The EidosTextView and EidosConsoleTextView classes
14.3 Extending EidosScribe
15. Using the Eidos tokenizer and parser: EidosToken and EidosASTNode
15.1 Working with tokens: EidosToken
15.2 Working with the parse tree: EidosASTNode
16. Extending the Eidos grammar
17. The future of Eidos
PART I: THE EIDOS LANGUAGE
1. Eidos overview
1.1 Introduction
To get the obvious question out of the way at the outset: Eidos is pronounced “A-dose”, with
the accent on the long “a” (as in “day”). It is a Classical Greek word (εἶδος) meaning “form”,
“essence”, “type”, or “species”; it is the word that Plato used to refer to his Forms.
Eidos is, by design, a very simple and unoriginal language. It is intended to be easy to learn for
anyone with any experience with programming – or indeed, even for those with none. For
example, the traditional “hello, world” program in Eidos is about as simple as it could possibly be:
print("hello, world");
This manual will set out the language in a very methodical and perhaps tedious manner; it may
be possible – particularly for those with experience with any ALGOL-based language such as C,
Java, or R – to learn Eidos largely from the overview given in section 1.3 and the language
reference sheet in section 5. However, reading the rest of this overview is recommended, at least.
Eidos is mostly a hybrid between the C and R languages, with a bit borrowed from Objective-C
as well. From R it takes the fact that everything in the language is a vector, and many of the built-in functions are vectorized – they are built to operate directly upon vector variables that contain
any number of individual values, making the use of for loops and similar constructs unnecessary
in many cases. The names and patterns of functions in the base package of R are also borrowed
liberally, both because they are generally well-designed, and because this borrowing will allow
users familiar with R to hit the ground running. R is also the source of the sequence operator, :,
and the way that subsetting works in Eidos with the [] operator. From C, on the other hand, Eidos
takes the use of the . operator to address object members, the mandatory use of semicolons to
terminate statements, zero-based instead of one-based indexing of vectors – and of course a great
many properties of its grammar and syntax, which R also inherited from C or its relatives.
However, Eidos also departs from its ancestors – mostly in the direction of simplicity. Unlike C,
there are no variable declarations, and typing is entirely dynamic; variables are created simply by
assignment, and types are checked only at runtime. There are no compound assignment operators
(+=, -=, *=, etc.), no scalar logical operators (&&, ||), no bitwise operators (>>, <<, etc.), no ternary
conditional (?:), no pre/post increment or decrement (--, ++), and no pointers. Although built-in
functions exist, you cannot define a new function (although you can define a “lambda” to fill some
of that void). Similarly, although built-in objects exist, you cannot define a new class (or even a
struct – or even an enum). There is no switch statement, no goto, no multi-dimensional arrays.
There is only one scope, the global scope. The hope is that this simplicity will not be excessively
limiting, since the tasks to which Eidos will be put are expected to be quite simple.
What sort of tasks are those? Eidos is intended to be used to control other software, referred to
here as “the Context”; section 1.2 will discuss this in detail.
It is worth noting that Eidos is an interpreted language – it is not “compiled” to the assembly
language that is run natively by your computer, but instead “interprets” your code on the fly. You
might imagine that, rather than the Eidos interpreter being a native speaker of its own language, it
is a tourist frantically looking up words in a dictionary as it encounters them, piecing together
meanings from the definitions it finds. This means that Eidos is relatively slow (but not that slow;
see section 2.6.5). For maximal performance, minimize your use of Eidos, particularly by taking
advantage of vectorization when possible rather than writing for loops.
Eidos has been a lot of fun to develop; I hope it is a pleasure to use as well. Enjoy!
1.2 Why Eidos?
There are lots of programming languages in the world; do we really need another? Surprisingly,
the answer is a resounding YES! Eidos is a rather unusual language, designed to fit a rather
unusual niche. There might be another language out there somewhere that would be well-suited
to that niche, but certainly no popular mainstream language – C, C++, R, Java, Python, whatever –
would fit the bill; the closest is Lua, but even Lua does not fit the bill. This section will explain
exactly what Eidos is designed to do and why a new language was needed to do it.
First of all, Eidos is intended to provide a scriptable layer on top of existing C++ objects. The
idea is that objects that are designed and written in C++, and are instantiated and controlled
principally by C++ code, are nevertheless visible and manipulable in Eidos code. These C++
objects are referred to as the Context; a given use of Eidos is tightly integrated with a Context that
the script controls. The C++ object and the Eidos object should be, in some sense, one and the
same object; scripting proxies, message forwarding, etc., should not be needed. The built-in types
in Eidos, similarly, should be C++ types “under the hood”, allowing C++ code to implement Eidos
functions and primitives without the added complexity of type translation or bridging. Few
languages exist that fit this bill; most languages are designed principally to be standalone entities,
even if some, like R, provide a “back door” to a lower-level language. Lua is designed to interface
with a Context, like Eidos, but since it is written in ANSI C, providing a front end to C++ would be complex.
Second of all, Eidos is intended to support an interactive workflow; it needs to be an interpreted
language so that the user can work at an interactive console prompt, typing Eidos commands and
getting results back without a compile-run cycle. There are many interpreted languages, of course;
but this is a reason that C++ itself could not fit the niche of Eidos, even though Eidos is designed to
control a C++ Context. The goal is to control C++ objects dynamically, without having to write
C++ code or any compiled code at all.
Third, Eidos needs to be cross-platform, open-source, and compatible with the GNU copyleft
license. This requirement is because the Context that Eidos is particularly designed to control –
the SLiM evolutionary simulation package – has those requirements.
Fourth, Eidos needs to be lean, with as few compile-time and run-time dependencies as
possible. This is in part because it needs to be easily buildable on a wide variety of platforms,
including places like high performance computing clusters where the operators of the machines
may not want to install third-party dynamic libraries and so forth. This is also, in part, because
Eidos needs to be tight on memory and quick to load, since it will be used in environments where
those resources are scarce. Eidos thus depends only on the GNU Scientific Library (GSL) and the
standard C++11 runtime; it does not use Boost or other C++ libraries.
Fifth, it needs to be blazingly fast to set up and tear down a new interpreter. For some
applications, a new interpreter will need to be instantiated millions or even billions of times, and
that overhead may be the main execution time bottleneck. For example, in a SLiM simulation a
fitness() callback may be called for each mutation of a given type, in each individual, in each
subpopulation, in each generation, but the callback may do only a very simple calculation to get
the resulting fitness value. Each call to the callback needs a fresh Eidos interpreter with its own
state variables. Eidos is highly optimized for this use case.
Sixth, Eidos needs to be extremely simple to use; the target users are not professional
programmers, but rather biologists (in the case of SLiM). Lua, among others, fails this test.
Perhaps this makes the need for Eidos more clear. Although it was originally designed to
control SLiM simulations, it should be quite useful for controlling other Contexts too, and this
manual is written to be agnostic about the Context with which Eidos is being used. Part II of this
manual will discuss how to use Eidos to control a Context of your own design.
1.3 A quick summary of the Eidos language
This section is a one-page summary of the Eidos language, intended for experienced
programmers who don’t want to slog through the whole manual. If it is gibberish to you, skip it.
Types. Eidos defines six built-in types: NULL, logical, integer, float, string, and object.
NULL is similar to a null pointer; it represents the absence of a value, often due to an error. The
logical type represents Boolean values, and may be either true (T) or false (F). The integer and
float types represent 64-bit integers and double-precision floating-point values, respectively. The
string type represents a string of characters; there is no character type in Eidos. Finally, the
object type represents a C++ object of a particular class, as defined by the Context that Eidos is
controlling (see section 1.2); Eidos itself defines no object classes.
Vectors. All values in Eidos are vectors of zero or more elements. Elements themselves are not
accessible in Eidos; you can get a vector containing a single element, but you cannot extract the
element itself, since Eidos provides no syntax to do so. Internally, elements are C++ types: bool
for logical, int64_t for integer, double for float, std::string for string, and a subclass of
EidosObjectElement for object (NULL is always length zero and thus has no element type). A
vector of exactly one element is called a singleton. Many of the Eidos operators and functions
work with whole vectors, allowing well-designed Eidos code to run fairly quickly.
Expressions. Eidos defines a familiar set of operators for expressions. Arithmetic operators
include + (addition), - (negation or subtraction), * (multiplication), / (division), % (modulo), and
^ (exponentiation). A range operator, :, produces a vector sequence between its operands.
Logical operators include | (or), & (and), and ! (not). Comparison operators are ==, !=, <, <=, >,
and >=, with their expected meanings (!= being not-equals). A subset operator, [], is provided that
is similar to that in R, taking either a logical vector specifying which corresponding values to
take, or an integer vector specifying which indices to take. The + operator may also be used for
string concatenation. All of these operators work on vector operands; often this means they allow
operands which are either identical in length (in which case the operation is conducted between
matching pairs of elements) or one of which is a singleton (in which case the singleton is paired
with every element of the other operand). Operator precedence is very similar to C/C++
precedence; parentheses () may be used to modify the standard order of evaluation.
Variables. Variables are defined by assigning a value to a symbol with the = operator; no
declaration is needed. Eidos variables all live in a single global scope, even when lambdas are
executed (see below). However, the Context may choose to modify the Eidos symbol table in a
way that resembles scoping, parameter passing, or other such paradigms.
Statements. Statements are semicolon-terminated. Statement types include null statements (a
lone ;), expression statements, assignments, if and if-else statements, loops (while, do-while,
and an iterator-type for-in statement), control-flow statements (next, break, and return,
essentially as in R), and compound statements enclosed by braces, {}.
Functions. Eidos supplies a broad range of built-in functions, documented in section 3. It is
not possible to define your own functions within Eidos, but you may call a lambda, a string that is
dynamically interpreted as Eidos code, using the executeLambda() and apply() functions.
Objects. As mentioned above, object-type elements are C++ objects defined by the Context.
Objects may have Eidos properties and methods (not the same as their C++ instance variables and
methods). Properties are lightweight, typically getter/setter access to simple values, whereas
methods are heavyweight, performing computation or having wider-reaching effects. The dot
operator, ., is used both to access object properties with the syntax x.property and to call object
methods with the syntax x.method(). It is also a vector operator, which can be surprising.
And there you have it – the Eidos language in one page. Now let’s go into more detail.
2. Language features
This section will lay out essentially all of the fundamental grammar and syntax of Eidos,
beginning with the basic types of Eidos (section 2.1) and the vector-based approach taken (section
2.2), and then proceeding on to expressions (section 2.3), variables (section 2.4), conditionals
(section 2.5), loops (section 2.6), functions (section 2.7), and objects (section 2.8).
2.1 Types, literals, and constants
2.1.1 The integer type
The integer type is used in Eidos to represent integers – whole numbers, with no fractional
component. Unlike in many languages, exponential notation may be used to specify integer
literals (“literals” means values stated literally in the script, rather than derived through
calculations). For example, the following are all integer literals in Eidos:
0
-179483
12e8
The integer type is advantageous primarily because it is exact; it does not suffer from any sort
of roundoff error. Exact comparison with integer constants is therefore safe; roundoff error will not
lead to problems caused by 0.999999999 being deemed to be unequal to 1. However, integer is
disadvantageous because it can only represent a limited range of values, and beyond that range,
results will be unpredictable. Eidos uses 64 bits to store integer values, so that range is quite
wide; from −9223372036854775806 to 9223372036854775807, to be exact. That is broad, but it is still
enormously narrower than the range of numbers representable with the next type, float.
2.1.2 The float type
The float type is used in Eidos to represent all non-integer numbers – fractions and real
numbers. Exponential notation may be used to specify float literals; in particular, literals with a
decimal point or a negative exponent are taken to be of type float. For example, the following
are all float literals:
0.0
-1.2
12e-3
1.2e9
Note that this rule means that some literals, such as the first and last shown above, are
represented using float even though they could also be represented using integer.
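The rule that separates integer literals from float literals can be modeled in a few lines. The following is a Python sketch rather than Eidos code; the helper name literal_type is hypothetical, and it encodes only the rule stated in this section (a decimal point or a negative exponent makes a literal float), not the behavior of the actual Eidos tokenizer.

```python
def literal_type(text):
    """Classify a numeric literal as 'integer' or 'float'.

    Hypothetical helper for illustration; it models only the rule stated
    in the text, not the real Eidos tokenizer.
    """
    body = text.lower()
    has_decimal_point = "." in body
    # A negative exponent appears as "e-" inside the literal.
    has_negative_exponent = "e-" in body
    if has_decimal_point or has_negative_exponent:
        return "float"
    return "integer"


for literal in ["0", "-179483", "12e8", "0.0", "-1.2", "12e-3", "1.2e9"]:
    print(literal, "->", literal_type(literal))
```

Under this model, 12e8 is classified as integer while 1.2e9 is classified as float, even though both denote whole numbers.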
The float type is advantageous primarily because it can represent an enormously wide range of
values. Eidos uses C++’s double type to represent its float values; the range of values allowed
will depend upon your computer’s settings, but it will be vast. If that range is exceeded, or if
numerical problems occur, type float can also represent values as infinity or as “Not A Number”
(INF and NAN, respectively, in Eidos). The float type is thus more robust for operations that might
produce such values. The disadvantage of float is that it is inexact; some values cannot be
represented exactly (just as 1/3 in base 10 cannot be represented exactly, and must be written as
0.3333333...). Roundoff can thus cause comparison errors, overflow and underflow errors, and
the accumulation of numerical error.
Several float constants are defined in Eidos; besides INF and NAN, PI is defined as π
(3.14159...), and E is defined as e (2.71828...).
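The inexactness of float described above is a property of double-precision arithmetic in general, so it can be demonstrated outside Eidos. The following Python sketch relies on the assumption that Python's float is the same double-precision format as the C++ double that Eidos uses; the variable names are illustrative only.

```python
import math

# Roundoff: 0.1 and 0.2 cannot be represented exactly in binary.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead

# Integer arithmetic, by contrast, is exact.
print(1 + 2 == 3)                # True

# Exceeding the representable range yields infinity, and some
# operations on infinity yield "Not A Number".
overflow = 1e308 * 10
not_a_number = overflow - overflow
print(math.isinf(overflow), math.isnan(not_a_number))  # True True
```

The practical consequence is the one given in the text: exact comparison is safe for integer values, while float values are better compared with a tolerance.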
2.1.3 The logical type
The logical type represents true and false values, such as those from comparisons (see section
2.3.3). In many languages this type is called something like boolean or BOOL; Eidos follows R in
using the name logical instead.
There are no logical literals in Eidos. However, there are defined constants that behave in
essentially the same way as literals. In particular, T is defined as true, and F is defined as false.
These are the only two values that the logical type can take. As in a great many other languages,
these logical values have equivalent numerical values; F is 0, and T is 1 (and in fact any non-zero
value is considered to be true if converted to logical type). Values of type integer or float may
therefore be converted to logical, and vice-versa, as will be seen in later sections.
2.1.4 The string type
The string type represents a string of characters – a word, a sentence, a paragraph, the
complete works of Shakespeare. There is no formatting on a string – no font, no point size, no bold
or italic. Instead, it is just a character stream. A string literal must be enclosed by either single or
double quotation marks, ' or ". This choice simplifies writing Eidos strings that themselves
contain quote characters, because you can delimit the string with the opposite kind of quote. For
example, 'You say, "Ere thrice the sun done salutation to the dawn"' is a string that
contains double quotes, whereas "Quoth the Raven, 'nevermore'." is a string that contains
single quotes. Apart from this consideration, it does not matter whether you use single or double
quotes; the internal representation is the same. The suggested convention is to prefer double
quotes, all else being equal, since they are more universally used in other programming languages.
A complication arises if one wishes to include both single and double quotation marks within a
string; whichever delimiter you choose, one or the other quote character will terminate the
string literal. In this case, the quotation mark must be “escaped” by preceding it with a
backslash, \. The backslash can be used to “escape” various other characters; to include a
newline in a string, for example, use \n, and to include a tab, use \t. Newlines, in particular, can
only be included in quoted string literals using a \n escape sequence, since actual newlines are
not legal – but see below for an alternative. Since the backslash has this special meaning,
backslashes themselves must be escaped as \\. Keeping those escape sequences in mind, these
are string literals in Eidos:
"hello, world"
'this is also a string'
"this\nis\nfive\nlines\nlong."
"this contains a tab\tand a backslash: \\."
'The Foo exclaimed, "foo!"'
'\'I never said most of the things I said.\' – L. P. "Yogi" Berra'
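Python's quoted string literals use the same backslash escapes for the cases listed here (\n, \t, \\, \' and \"), so a Python sketch can show what these escaped literals contain. This is an analogy rather than Eidos code, and the variable names are illustrative only.

```python
# Each \n becomes one newline character, so this string spans five lines.
five_lines = "this\nis\nfive\nlines\nlong."
print(len(five_lines.splitlines()))  # 5

# \t is a single tab character and \\ is a single backslash.
tab_and_backslash = "this contains a tab\tand a backslash: \\."
print(tab_and_backslash.count("\t"), tab_and_backslash.count("\\"))  # 1 1

# The delimiter does not affect the contents: escaping the quotes inside
# a double-quoted literal gives the same string as single-quoting it.
single_quoted = 'The Foo exclaimed, "foo!"'
double_quoted = "The Foo exclaimed, \"foo!\""
print(single_quoted == double_quoted)  # True
```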
Beginning users may wish to skip the rest of this subsection, but for more advanced users, there
is another way of representing string literals in Eidos. Sometimes the limitations and complications
of standard string literals just get in the way; you just want a block of text to be interpreted as a
string literal, including any newlines that it contains, and without any quoting issues, any special
interpretation of escape sequences, and so forth. This is often desirable for including a snippet of
Eidos code as a string literal in your code, for example (perhaps because you plan to pass the
string to a function like apply() or executeLambda() that will interpret it as executable code; see
section 3). For this purpose, Eidos provides a multiline string literal format, roughly following a
style commonly called a “here document” in other languages such as Perl, PHP, and Ruby. The
idea of these multiline “here document”-style string literals is quite simple; since a code sample
is worth a thousand words, let’s start with an example:
<<---
hello,
world
>>---
That is exactly equivalent to the quoted string literal "hello,\nworld"; the start delimiter <<--- and the newline following it are removed, the >>--- delimiter and the newline preceding it are
removed, and everything remaining that was between the delimiters is taken literally, including the
newline after the comma.
Stated more precisely, an Eidos multiline string literal starts with a user-defined delimiter of the
form <<DELIMITER, where DELIMITER may be any sequence of characters whatsoever, or may be
zero-length. The <<DELIMITER start delimiter must be immediately followed by a newline. The
contents of the string literal begin after that newline, and continue until the point at which a
newline followed by an end delimiter is encountered, where the end delimiter is of the form
>>DELIMITER (DELIMITER being the same character sequence as in the start delimiter). The start and
end delimiters thus comprise a matched pair, like <<FOO and >>FOO, <<========= and >>=========,
or simply << and >>. As long as you choose a delimiter that does not occur within your string
literal’s text, there will be no problem, and no quoting or escaping will be necessary. Indeed,
character sequences such as \n that would be interpreted as special escape sequences in a quoted
string literal will be treated just like any other characters in a multiline string literal.
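The delimiter-matching rule for multiline string literals can be modeled as a small parsing routine. The following is a Python sketch rather than Eidos code; the function name parse_multiline_literal is hypothetical, and the sketch assumes that the input begins exactly at the << of the start delimiter. It is not the actual Eidos tokenizer.

```python
def parse_multiline_literal(source):
    """Return the string value of a <<DELIMITER ... >>DELIMITER literal.

    Hypothetical helper modeling the rule described in the text; it
    assumes `source` starts at the << of the start delimiter.
    """
    if not source.startswith("<<"):
        raise ValueError("literal must start with <<")
    first_newline = source.find("\n")
    if first_newline == -1:
        raise ValueError("start delimiter must be followed by a newline")
    delimiter = source[2:first_newline]   # may be zero-length
    end_marker = "\n>>" + delimiter       # a newline, then the end delimiter
    end = source.find(end_marker, first_newline)
    if end == -1:
        raise ValueError("no matching end delimiter")
    # Everything between the two removed newlines is taken literally.
    return source[first_newline + 1:end]


print(repr(parse_multiline_literal("<<---\nhello,\nworld\n>>---")))
# A backslash followed by n stays as two ordinary characters.
print(repr(parse_multiline_literal("<<\na\\nb\n>>")))
```

The first call models the example above and yields the same value as the quoted literal "hello,\nworld"; the second shows that no escape processing is applied to the contents.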
2.2 Vectors
2.2.1 Everything is a vector
In Eidos (following R), everything is a vector. A vector is simply an ordered collection of zero
or more values. That may sound rather arcane; the point is that when you write 9 in Eidos, you are
not, in fact, referring to a single integer of value 9. Rather, you are referring to an integer vector;
that vector happens to contain one value, which is 9. Often, if you are doing calculations with
single values (called “scalars”), you can ignore this fact – but not always. It might seem like an
unnecessary complication, at first blush, that you cannot simply work with scalar values; but in
fact the opposite is true. If Eidos supported scalars in addition to vectors, it would have twice as
many data types as it has; and if it supported only scalars, and not vectors, then a great deal of
useful functionality would be missing from the language. Once you get used to working with
vectors, it will come to seem very intuitive. There are good reasons why R is a vector-based
language, and why Eidos follows in its footsteps.
So 9 is an integer vector containing one value, 9; OK. How do we get vectors with more than
one value, then? Section 2.7 will introduce a variety of built-in functions that can produce
vectors, but until then, we will follow the lead established in the next two sections, which will
introduce an operator and a function that produce new vectors.
2.2.2 Sequences: operator :
An “operator” is an entity that performs an operation on operands. That probably sounds
deliberately obscure, but in fact it is quite simple. In the statement “1+1 equals 2”, the + symbol is
an operator, and the two 1 values are the operands upon which the + operator acts. The +
operator performs addition; it adds its operands. Thus do language geeks talk about things. (The
word “equals” is actually an operator here also; see section 2.3.3).
The first Eidos operator we’ll discuss is the : operator. It is used to construct vectors with
(usually) more than one value. In particular, it is used to construct sequences, and so it is called
the sequence operator. Given operands x and y (standing for any two numbers), the sequence
operator starts at x and counts by 1 (or -1, as appropriate) toward y without passing it. It yields a
vector containing all of the numbers it encounters along the way.
To illustrate this, we’ll look at an interactive session in EidosScribe, the interactive Eidos
interpreter described in section 4. Here, lines beginning with a > have been entered by the user;
the > character is the “prompt” shown by EidosScribe to request input from the user, so the user
did not type the > character but did type what follows it. The lines following show the result
produced by Eidos by interpreting the user’s input:
> 1:5
1 2 3 4 5
> 5:1
5 4 3 2 1
> -1.2:6.5
-1.2 -0.2 0.8 1.8 2.8 3.8 4.8 5.8
Note that the sequence operator can count down as well as up, that it can handle float as well
as integer operands, and that negative numbers are allowed.
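The counting rule of the sequence operator can be modeled outside Eidos. The following is a Python sketch, not Eidos code; the function name eidos_sequence is hypothetical, and it reproduces only the behavior shown in the session above (start at x, step by 1 or -1 toward y, never pass y).

```python
def eidos_sequence(x, y):
    """Model of x:y, counting from x toward y by 1 or -1 without passing y.

    Hypothetical helper for illustration; not the Eidos implementation.
    """
    step = 1 if y >= x else -1
    # Number of values that fit between x and y, including x itself.
    count = int(abs(y - x)) + 1
    return [x + i * step for i in range(count)]


print(eidos_sequence(1, 5))
print(eidos_sequence(5, 1))
# Rounded for display only, to hide floating-point roundoff.
print([round(v, 6) for v in eidos_sequence(-1.2, 6.5)])
```

The last call stops at 5.8 because the next value, 6.8, would pass the upper operand 6.5.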
2.2.3 Concatenation: function c()
This section will provide a sneak preview of functions, which are covered in more depth in
section 2.7. Here we will look at just one function, called c(). The “c” stands for “concatenate”,
meaning to paste together end-to-end, and that is exactly what c() does. You can supply it with
any number of values, as a comma-separated list inside its parentheses, and c() will stick them
together and give you back a single vector. For example, here is another interactive session within
EidosScribe:
> c(1, 5, 9, 18392, -17, 3)
1 5 9 18392 -17 3
> c(6.5, 3:9, 0)
6.5 3 4 5 6 7 8 9 0
> c(5:7, "foo")
"5" "6" "7" "foo"
The first example shows that c() simply returns a vector with the values it was passed, in order.
It might look like it did nothing at all; but in fact it turned the things that were passed to it – six
integer vectors, each containing one value – into a single integer vector containing six values.
The second example shows that you can supply float values to c() as well, and even
sequences constructed with the sequence operator, and it will do the same thing: paste them
together, end to end, to make a single vector.
The third example shows what happens if you mix types: integer and string, here. The c()
function produces a single vector of a single type. Type string cannot be changed to type
integer, in general (what integer would "foo" be?), so instead the integer values have to be
changed to string, and a string vector results. In fact, a similar thing happened in the second
example; c() was given a mix of float and integer, and it converted the integer values to float
to avoid losing information, producing a float vector as a result. These are both examples of what
is called “type promotion”, which Eidos does automatically in some cases.
A digression about type promotion in Eidos might be called for here, while we are on the
subject. This digression will reference many topics that are covered later in this manual, so you
might wish to skip it for now if you are not an experienced programmer. Type promotion is done
by Eidos (1) when multiple values of different types are concatenated together to form a single
vector, as done by the c() function, the apply() function (see section 3), and the results from
method calls (see section 2.8.6), (2) when values of different types are compared using the
comparative operators ==, !=, <, <=, >, and >= (see section 2.3.3), and (3) with the string
concatenation operator +, which promotes its operands to string (see section 2.3.4). The
arithmetic operators +, -, *, /, %, and ^ (see section 2.3.1) also generally accept a mixture of
integer and float operands, promoting integer to float as needed; they will not promote
logical upward, however. Conceptually, the arithmetic operators work with “numeric” type,
which is either integer or float. Many functions also effectively work with “numeric” type, by
declaring their parameters to accept either integer or float, but this is not actually automatic type
promotion so much as a policy choice to accept parameters of particular types, so the function
signature (see section 2.7.3) of a given function should be checked to see what types that function
allows. Apart from these specific cases, automatic type promotion is not done in Eidos; it is not
done for the operands of operators, or the parameters of functions, for example, except for the
specific cases mentioned here. When automatic type promotion does apply, it will promote types
upward according to a strict hierarchy: (1) logical is the lowest type on the hierarchy and can be
promoted upward to integer, float, or string, (2) integer is next and can be promoted to float
or string, (3) float can be promoted upward to string, and (4) string cannot be promoted at all
since it is the highest type in the hierarchy. Type object (see section 2.8) does not participate in
automatic promotion at all; it is not promoted to any other type, and other types are never
promoted to it. It should also be noted that when Eidos expects a logical value, such as in if,
while, and do–while statements (see sections 2.5 and 2.6) or with the logical operators &, |, and !
(see section 2.3.2), values of integer, float, and string type will be interpreted as either T or F
according to specific rules (see section 2.3.2); this is not automatic type promotion, however, but
rather type coercion applied by those statements and operators. In any case, you can always
explicitly convert Eidos values from one type to another using the as...() family of functions (see
section 3). Again, this digression brought in many future concepts, so don’t worry if it didn’t make
sense; it just seemed best to summarize the dynamics of type conversion in Eidos in one place,
when the subject first arose.
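As a small illustration of the rules just summarized, here is a sketch of promotion at work in c(), in a comparison, and in string concatenation, followed by explicit conversion. The asInteger() function is mentioned elsewhere in this manual; asString() is assumed here to be a member of the same as...() family.

```
> c(T, 5)
1 5
> c(5, "foo")
"5" "foo"
> 5 == "5"
T
> asInteger("12") + 1
13
> asString(12) + 1
"121"
```

The last two lines show why explicit conversion matters: the same + operator performs addition on an integer operand but concatenation once a string operand is involved.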
2.2.4 Subsets: operator []
While we are on the topic of vectors, it makes sense to introduce the operator in Eidos that
most explicitly operates on vectors: the [] operator. This operator selects a subset of the vector
upon which it operates; it is thus often called the subset operator. It can work in one of two
different ways, depending upon whether it is given an integer vector of indices, or is given a
logical vector of selectors. These two methods will be described below.
First of all, a subset can be selected with an integer vector of indices. These indices are zero-based, like C but unlike R; the first value in a vector is thus at index 0, not index 1. Here, for
example, are a few subset operations, also using the methods of vector construction discussed in
the previous sections:
> 10:19
10 11 12 13 14 15 16 17 18 19
> (10:19)[5]
15
> (10:19)[c(2,3,5,3,3,9)]
12 13 15 13 13 19
Note that a given index can be used multiple times.
Second, a subset can be selected with a logical vector of selectors. In this case, the logical
vector must be the same length as the vector being selected; each logical value indicates whether
the corresponding vector value should be selected (T) or not (F). For example:
> (10:19)[c(T,T,F,F,F,T,F,F,F,T)]
10 11 15 19
This may look a bit clumsy, shown as it is here with a logical vector constructed with c(); but
it is, in fact, enormously powerful and useful when combined with the power of expressions, as
explored in the next section.
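To give a taste of that power, here is a brief sketch in which the logical vector of selectors is computed by an expression rather than typed out by hand. It borrows the % and comparison operators from the sections that follow.

```
> (10:19)[(10:19) % 2 == 0]
10 12 14 16 18
> (10:19)[(10:19) > 15]
16 17 18 19
```

In each case the expression inside the brackets produces one logical value per element of 10:19, and the subset operator keeps the elements for which that value is T.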
2.3 Expressions
2.3.1 Arithmetic expressions: operator +, -, *, /, %, ^
These are the standard operators of arithmetic; + performs addition, - performs subtraction, *
performs multiplication, / performs division, % performs a modulo operation (more on that below),
and ^ performs exponentiation. Not a great deal needs to be said about these operators, which
behave according to the standard rules of mathematics. They also follow the standard rules of
“precedence”; exponentiation is the highest precedence, addition and subtraction are the lowest
precedence, and the other three are in the middle, so 4^2+5*6^7 is grouped as (4^2)+(5*(6^7)), as
expected if you remember your grade-school math.
There are only a few minor twists to be discussed. One is the meaning of the % operator, which
many people have not previously encountered. This computes the “modulo” from a division,
which is the remainder left behind after division. For example, 13%6 is 1, because after 13 is
divided evenly by 6 (taking care of 12 of the 13), 1 is left as a remainder. Probably the most
common use of % is in determining whether a number is even or odd by looking at the result of a
%2 operation; 5%2 is 1, indicating that 5 is odd, whereas 6%2 is 0, indicating that 6 is even.
Another twist is that both the division and modulo operators in Eidos operate on float values –
even if integer values are passed – and return float results. (For those who care, division is
performed internally using the C++ division operator /, and modulo is performed using the C++
fmod() function.) This policy was chosen because the definitions of integer division and modulo
vary widely among programming languages and are contested and unclear (see Bantchev 2006,
http://www.math.bas.bg/bantchev/articles/divmod.pdf). If you are sure that you want integer
division or modulo, and understand the issues involved, Eidos provides the functions
integerDiv() and integerMod() for this purpose (see section 3). Besides side-stepping the vague
definitions of the integer operators, this policy also avoids rather common bugs involving the
accidental use of integer division when float division was desired – a much more common
occurrence than vice versa.
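A short sketch of the difference, using the operators and the two functions named above; the two-argument call form of integerDiv() and integerMod() is assumed here, so check section 3 for the exact signatures.

```
> 7 / 2
3.5
> 7 % 2
1
> integerDiv(7, 2)
3
> integerMod(7, 2)
1
```

The results of / and % are float values even when they happen to be whole numbers, whereas integerDiv() and integerMod() return integer values.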
A third twist is that + and - can both act as “unary” operators, meaning that they are happy to
take just a single operand. This is standard math notation, as in the expressions -6+3 or 7*-5; but
it can sometimes look a bit strange, as in the expression 5--6 (more easily read as 5 - -6).
A fourth twist is that the ^ operator is right-associative, whereas all other binary Eidos operators
are left-associative. For example, 2-3-4 is evaluated as (2-3)-4, not as 2-(3-4); this is left-associativity. However, 2^3^4 is evaluated as 2^(3^4), not (2^3)^4; this is right-associativity.
Since this follows the standard associativity for these operators, in both mathematics and most
other programming languages, the result should generally be intuitive, but if you have never
explicitly thought about associativity before you might be taken by surprise. See section 2.3.5 for
further discussion of associativity, as well as operator precedence, a related topic.
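The following sketch makes the third and fourth twists concrete; parentheses are used to show what the alternative grouping would have produced.

```
> 2-3-4
-5
> 2-(3-4)
3
> 2^3^2
512
> (2^3)^2
64
> 5 - -6
11
```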
A fifth twist is that the arithmetic operators and functions in Eidos are guaranteed to handle
overflows safely. The float type is safe because it uses IEEE-standard arithmetic, including the use
of INF to indicate infinities and the use of NAN to represent not-a-number results; this is the same as
in most languages. In Eidos, however, the integer type is also safe, unlike in C, C++, and many
other languages. All operations on integer values in Eidos either (1) will always produce float
results, as the / and % operators do; (2) will produce float results when needed to avoid overflow,
as the product() and sum() functions do; or (3) will raise an error condition on an overflow, as the
Eidos operators +, -, and * do, as well as the abs() and asInteger() functions. This means that
the integer type in Eidos can be used without fear that overflows might cause results to be
incorrect.
The final twist is really a reminder: everything is a vector. These operators are designed to do
something smart, when possible, with vectors of any length, not just with single-valued vectors as
shown above. Here are a few examples, which will give you a sense of how it works:
> (1:10)+10
11 12 13 14 15 16 17 18 19 20
> (1:10)*5
5 10 15 20 25 30 35 40 45 50
> (1:10)*(10:1)
10 18 24 28 30 30 28 24 18 10
> (1:10)%2
1 0 1 0 1 0 1 0 1 0
In general, the operands of these arithmetic operators must either be the same length (in which
case the elements in the operand vectors are paired off and the operation is performed between
each pair), or one or the other vector must be of length 1 (in which case the operation is performed
using that single value, paired with each value in the other operand vector). The examples above
will be easier to understand than the previous sentence was.
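For completeness, here is a sketch of the two legal cases side by side with the illegal one. The last statement is expected to raise an error, since the operands have lengths 3 and 2 and neither is of length 1; the exact wording of the error is not reproduced here.

```
> c(1, 2, 3) + c(10, 20, 30)
11 22 33
> c(1, 2, 3) + 10
11 12 13
> c(1, 2, 3) + c(10, 20)     // illegal: lengths differ and neither is 1
```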
2.3.2 Logical expressions: operator |, &, !
The |, &, and ! operators act upon logical values. If they are given operands of other types,
those operands will be “coerced” to logical values following the rule mentioned above: zero is F,
non-zero is T (and for string operands, a string that is zero characters long – the empty string, ""
– is considered F, while all other string values are considered T).
As to what they do: | is the “or” operation, & is the “and” operation, and ! is the “not”
operation. As in common parlance, “or” is T if either of its operands is T, whereas “and” is T only
if both of its operands are T. The “not” operator is unary (it takes only one operand), and it negates
its operand; T becomes F, F becomes T. As with the arithmetic operators, these operators work
with vector operands, too – either matching up values pairwise between the two operands, or
applying a single value across a multivalued operand. Some examples:
> T & F
F
> T | F
T
> c(T, F) & F
F F
> c(T, F) | F
T F
> c(T, T, F, F) & c(T, F, T, F)
T F F F
> c(T, T, F, F) | c(T, F, T, F)
T T T F
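The coercion rule for non-logical operands can be sketched in the same way; these lines simply apply the rule stated above (zero and the empty string are F, everything else is T).

```
> !0
T
> !7
F
> !""
T
> "foo" & T
T
> 0 | c(T, F)
T F
```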
Those familiar with programming might wish to know that the | and & operators do not “short-circuit” – they can’t, because they are vector operators. If the & operator first sees an operand that
evaluates to F, for example, it knows that it will produce F value(s) as a result; but it does not know
what size result vector to make. If a later operand is a multivalued vector, the & operator will
produce a result vector of matching length; if all later operands are also length 1, however, & will
produce a result vector of length 1. To know this for sure (and to make sure that there are no
illegal length mismatches between later operands), it must evaluate all of its operands; it cannot
short-circuit. Similarly for the | operator.
These semantics match those in R, for its | and & operators, but they might seem a little strange
to those used to C and other scalar-based languages. For those used to R, on the other hand, it
should be noted here that Eidos does not support the && and || operators of R, for reasons of
simplicity; it is safer to use the any() or all() functions described in section 3 to simplify
multivalued logical vectors before using & or |. If this is gibberish to you, it is not important; the
point here is only to prevent confusion among users accustomed to R.
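For those who do want a single T or F from a multivalued logical vector, here is a minimal sketch of any() and all(), which are described fully in section 3.

```
> any(c(F, T, F))
T
> all(c(F, T, F))
F
> all(c(T, T)) & any(c(F, T))
T
```

Because any() and all() each return a single logical value, the & in the last line operates on two singletons and produces a singleton result.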
2.3.3 Comparative expressions: operator ==, !=, <, <=, >, >=
These operators compare their left and right operands. The operators test for equality (==),
inequality (!=), less-than (<), less-than-or-equality (<=), greater-than (>), and greater-than-or-equality (>=) relationships. As seen above with the arithmetic and logical operators, this can work
in two different ways: if the operands are the same length, their elements are paired up and the
comparison is done between each pair, whereas if the operands are not the same length then one
operand must be of length one, and its value is compared against all of the values of the other
operand. Regardless of the types of the operands, these operators all produce a logical result
vector. If the operands are of different types, promotion will be used to coerce them to be the
same type (i.e. logical will be coerced to integer, integer to float, and float to string, as
previously discussed in the context of the c() function in section 2.2.3).
This is all pretty straightforward, so a few examples should suffice to make it clear:
> 5 == 5
T
> 5 == 6
F
> 5 == "5"
T
> 1:5 == 4
F F F T F
> 1:5 == 5:1
F F T F F
Note that this is often not what you want! You might not want the automatic type promotion
that makes 5=="5" evaluate as T, or the vectorized comparison that makes 1:5==4 evaluate as
something other than simply F. You might really want to ask: are two values identical? With that
as prelude, see the discussion at section 2.5.1 for a better alternative to operator == and
operator != in many situations.
2.3.4 String concatenation: operator +
The + operator was previously discussed as an arithmetic operator in section 2.3.1, but it can
also act as a concatenation operator for string operands. Concatenation is pasting together; the +
operator simply pastes its string operands together, end to end:
> "foo" + "bar"
"foobar"
In fact, this works with non-string operands too, as long as a string operand is nearby; the
interpretation of + as a concatenation operator is preferred by Eidos, and wins out over its
arithmetic interpretation, as long as a string operand is present to suggest doing so. The other
non-string operands will be coerced to string:
> 3 + " + " + 7 + " equals " + 10
"3 + 7 equals 10"
However, this does not work retroactively; if Eidos has already done arithmetic addition on
some operands, it will not go back and perform concatenation instead:
> 3 + 7 + " equals " + 10
"10 equals 10"
To force concatenation in such situations, you can simply begin the expression with an empty
string, "":
> "" + 3 + 7 + " equals " + 10
"37 equals 10"
Which is not a true statement; but it is a correct concatenation operation!
The concatenation operator also works with vectors, as usual:
> (99:97) + " bottles of beer on the wall..."
"99 bottles of beer on the wall..." "98 bottles of beer on the wall..."
"97 bottles of beer on the wall..."
2.3.5 Nested expressions: using () for grouping
All of the discussion above involved simple expressions that allowed the standard precedence
rules of mathematics to determine the order of operations; 1+2*3 is evaluated as 1+(2*3) rather
than (1+2)*3 because the * operator is higher precedence than the + operator. For the record,
here is the full precedence hierarchy for operators in Eidos (including a few operators that have not
yet been discussed), from highest to lowest precedence:
[], (), .       subscript, function call, and member access
+, -, !         unary plus, unary minus, logical (Boolean) negation (right-associative)
^               exponentiation (right-associative)
:               sequence construction
*, /, %         multiplication, division, and modulo
+, -            addition and subtraction
<, >, <=, >=    less-than, greater-than, less-than-or-equality, greater-than-or-equality
==, !=          equality and inequality
&               logical (Boolean) and
|               logical (Boolean) or
=               assignment
Operators at the same precedence level are generally evaluated in the order in which they are
encountered. Put more technically, Eidos operators are generally left-associative; 3*5%2 evaluates
as (3*5)%2, which is 1, not as 3*(5%2), which is 3. The only binary operator in Eidos that is an
exception to this rule is the ^ operator, which (following standard mathematical convention) is
right-associative; 2^3^4 is evaluated as 2^(3^4), not (2^3)^4 (see section 2.3.1). The unary +,
unary -, and ! operators are also technically right-associative; for unary operators this is of little
practical import, however (it basically just implies that the unary operators must occur to the left of
their operand; you write -x, not x-, to express the negation of x).
In any case, parentheses can be used to modify the order of operations, just as in math. This
works just as you would expect:
> 1+2*3
7
> (1+2)*3
9
> ((1+2)*3)^4
6561
Note that this use of parentheses is distinct from the () operator, involved in making function
calls and discussed in section 2.7 (and which you saw briefly in section 2.2.3 for c()).
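One entry in the precedence hierarchy above that often surprises people is the sequence operator, which binds more tightly than the binary arithmetic operators. A brief sketch, with parentheses used to force the other grouping:

```
> 1:3*2
2 4 6
> 1:(3*2)
1 2 3 4 5 6
> 1:5+1
2 3 4 5 6
```

When in doubt, adding parentheses costs nothing and makes the intended grouping explicit.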
2.4 Variables
2.4.1 Assignment: operator =
Thus far, we have only used expressions in an immediate sense: evaluating them and seeing
their result print in Eidos’s console. But the results of expressions can also be saved in variables.
As in many languages, this is done with the = operator, often called the assignment operator.
Here’s an example of the use of a variable:
> x = 2 ^ 5
> x
32
> x + 20
52
> x
32
> x = x + 20
> x
52
> x == 52
T
Note that the expression x + 20 did not change the value of x; a new assignment, x = x + 20,
was necessary to achieve that.
As the example above illustrates, the assignment operator, =, is different from the equality
comparison operator, ==. In many languages, confusing the two can cause bugs that are hard to
find; in C, for example, it is legal to write:
if (x=y) ...
In C, this would assign the value of y to x, and then the expression x=y would evaluate to the
value that was assigned, and that value would be tested by the if statement. This can be useful as
a way of writing extremely compact code; but it is also a very common source of bugs, especially
for inexperienced programmers. In Eidos, using assignment in this way is simply illegal; to prevent
these issues, assignment is allowed only in the context of a complete statement, like x=y; on its own. (This
point is mostly of interest to experienced programmers, so if it is unclear, don’t worry.)
Variable names are fairly unrestricted. They may begin with a letter (uppercase or lowercase)
or an underscore, and subsequently may contain all of those characters, and numerical digits as
well. So x_23, fooBar, and MyVariable23 are all legal variable names (although not good ones –
good variable names explain what the variable represents, such as selection_coeff). However,
4by4 would not be a legal variable name, since it begins with a digit.
Functions will be discussed in depth in section 2.7 and section 3, but here’s a bit of
foreshadowing: you can use the ls() function to list (thus the name of the function) all the
variables that have been defined:
> ls()
E => (float) 2.71828
F => (logical) F
INF => (float) INF
NAN => (float) NAN
NULL => (NULL) NULL
PI => (float) 3.14159
T => (logical) T
x <-> (float) 52
(You might see some other variables listed as well, depending on your version of Eidos and the
context in which you execute the statement.) Here x was defined by the code we just ran above;
the rest of the variables listed are constants that are predefined by Eidos, as discussed in section
2.1. Note that variables you have defined are shown with a <-> arrow; that indicates that they
may be changed. Constants are shown with a => arrow, simply indicating that they are constant.
(Constants are predefined in Eidos; you cannot define your own constants.)
In the listing above, the type of each variable is also given. You might notice that there is a
constant, NULL, of a type, also called NULL, that we have not yet discussed; that topic will be taken
up in section 2.7.2.
2.4.2 Everything is a vector (a reminder)
Since it is so central to Eidos, it is worth a reminder here that every value in Eidos is a vector.
Variables therefore contain vectors, too; they might have just a single value in them (or no values
at all!), but they are vectors nonetheless. You can find out the number of values in a vector using
the size() function – jumping ahead a bit, again, but it is simple enough:
> x = 17:391
> size(x)
375
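A few more sketches along the same lines, including a vector with no values at all. The variable y is introduced here purely for illustration.

```
> size(5)
1
> size("hello")
1
> y = 10:19
> size(y[y > 100])
0
```

Note that size("hello") is 1, not 5: the string is a single value in a vector of length 1, and size() counts values, not characters. The last line selects no elements, since no value of y exceeds 100, and so yields a vector of size 0.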
2.5 Conditionals
2.5.1 The if statement
As in many languages, conditional execution is provided by the if statement. This statement is
supplied with a logical condition; if the condition is T, the rest of the if statement is executed,
whereas if the condition is F, the rest of the if statement is ignored. An example:
> if (2^2^2^2^2 > 10000) "exponentiation is da bomb!"
"exponentiation is da bomb!"
The only twist here, really, is that the condition must evaluate to a single value, i.e. a vector of
size() == 1. The if statement, in other words, is essentially a scalar operator, not a vector
operator. If you have a multivalued logical vector, you can use the any() or all() functions to
simplify it to a single logical value; see section 3. Alternatively, the ifelse() function provides a
vector conditional operation, similar to that in R; see section 3.
It is worth exploring this twist with an example. Suppose you have a variable x which ought to
be equal to 3, and a variable y which ought to contain two values, 7 and 8. You might expect to
be able to write:
> if (x == 3 & y == c(7,8)) "yes!"
ERROR (EidosInterpreter::Evaluate_If): condition has size() != 1.
This is the first time we’ve seen an Eidos error in this manual; notice that errors print in red in
EidosScribe. Their wording can be a bit arcane; apologies in advance for that. In this case, the
error informs you that the size of the condition is not equal to 1 (and that that is a problem). The
expression y == c(7,8) produces a logical vector with two values, the result of testing the first
and second values respectively. The & operator thus produces a two-valued logical vector as its
result, and if is not happy about that. To resolve this, you could use the all() function, as in:
if (x == 3 & all(y == c(7,8))) "yes!"
That makes it clear that you require all of the results of the y == c(7,8) comparison to be true,
not just any of those results (for which you would use any()). Without that clarification, Eidos
doesn’t know what to do. (R has the same difficulty; it handles it by issuing a warning (“the
condition has length > 1 and only the first element will be used”) and testing only the first element
of the logical condition vector – a behavior that is probably almost never what one wants.)
You might wonder why Eidos and R don’t just take the obvious route of requiring all values to
be true by default. The difficulty is that the semantics of the & operator do not cooperate. Suppose
x is not a singleton, but instead has three values; your if statement would then become:
if (x == c(3,4,5) & y == c(7,8)) "yes!"
After the == operators have been evaluated, this boils down to a use of the & operator on two
logical vectors, which might look like:
c(T,T,T) & c(T,T)
And that is not legal, because the operands to operator & differ in size and neither is a singleton;
Eidos will give you an error (“ERROR (EidosInterpreter::Evaluate_And): operands to the '&'
operator are not compatible in size()”). There is no good way to resolve that without
breaking the semantics of the & operator in ways that would have deep and undesirable
consequences in other areas. So it really is necessary to use all(), as in:
if (all(x == c(3,4,5)) & all(y == c(7,8))) "yes!"
However, there is a better solution! What this all points out, really, is that operator == is not a
very good way to test whether one value is identical to another value, because it does an
elementwise equality comparison and returns a vector with one logical value per element (see
section 2.3.3). Eidos and R both provide a better way to test whether two values are identical:
the identical() function. Since we haven’t covered functions yet, this jumps ahead a bit, but its
usage here is very simple. You could write the if statement above as:
if (identical(x, c(3,4,5)) & identical(y, c(7,8))) "yes!"
That will do precisely what you want. As an added bonus, it will also test that x and y have the
correct type, whereas operator == will use automatic type promotion in its test; 7=="7" is T, but
identical(7, "7") is F. If you find yourself using operator == or operator != in Eidos, you should
always ask yourself, “Do I really want to perform an elementwise comparison with type
promotion?”, and if the answer is not a resounding T, consider using identical().
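The contrast between the two approaches can be sketched directly at the console:

```
> 7 == "7"
T
> identical(7, "7")
F
> 1:3 == c(1, 2, 3)
T T T
> identical(1:3, c(1, 2, 3))
T
> identical(1:3, c(1, 2, 4))
F
```

Operator == answers the question "which elements match, after promotion?", while identical() answers the question "are these two values the same?" with a single T or F that can be handed straight to if.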
It is also worth noting that the condition for if does not need to be a logical value; a value of
a different type will be converted to logical by coercion if possible (see section 2.3.2).
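A minimal sketch of that coercion, following the rule from section 2.3.2 that non-zero numbers and non-empty strings are treated as T:

```
> if (5) "a non-zero integer counts as T"
"a non-zero integer counts as T"
> if ("foo") "a non-empty string counts as T"
"a non-empty string counts as T"
```

By the same rule, a condition of 0 or "" would be treated as F, and the rest of the if statement would be skipped.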
2.5.2 The if–else statement
Often you want to perform an alternative action when the condition of an if statement is F; the
if–else statement allows this. It is simplest to just show this with an example:
> if (2/2/2/2/2 > 10000) "division is da bomb!"; else "not so much."
"not so much."
Super simple, right?
2.5.3 A digression: the semicolon, ;
You might have noticed a subtle twist in the last example: the semicolon ; at the end of the first
part of the statement. A bit of subterfuge has gotten swept under the rug thus far, but it’s time to
come clean. Every statement in Eidos must end with a semicolon (except compound statements;
see section 2.5.4). However, when you’re working interactively in EidosScribe, EidosScribe will
add a trailing semicolon to your statements if necessary, just to make your life simpler. So when
you type:
> 1+1==2
what is really being evaluated behind the scenes is:
> 1+1==2;
When you’re not working interactively, semicolons are required, and if you forget, you will get
an error, like this:
> 1+1==2
ERROR (Parse): unexpected token 'EOF' in statement; expected ';'
EOF stands for End Of File; it’s a standard way of referring to the end of an input buffer, in this
case the line of input provided by the user for execution.
So now the reason for that semicolon’s existence, in the example back in section 2.5.2, is
obvious: it is required at the end of the if statement, as usual, but EidosScribe is not smart enough
to add it for us automatically in this case, because the if statement is followed by an else clause.
The simplest and shortest possible statement in Eidos is the “null statement”, which consists of
nothing but a semicolon:
;
This is not terribly useful, since it does nothing.
2.5.4 Compound statements with { }
The other thing you might wonder about, regarding if statements, is: what if I want to perform
more than one action in response to the condition being T or F? This, then, is an opportune
moment to introduce the concept of compound statements. A compound statement is a series of
statements (zero or more) enclosed by braces. An example is worth a thousand words:
> if (1+1==2)
{
   x = 1;
   x = x + 1;
   x;
}
else
{
   "whoah, I'm confused";
}
2
Note that the input here is spread across multiple lines for clarity; all of this could be typed on a
single line instead. If entered as multiple lines, it cannot presently be entered in EidosScribe’s
interactive mode because the if statement would stand on its own and be evaluated as soon as it
was completed; instead, the full text would need to be entered in the script area on the left,
selected, and executed (see section 4 for more discussion of using EidosScribe). All of the blue
lines are user input, whereas the final line in black, 2, shows the output of the execution of the
whole if–else statement; the if clause is executed, the calculations involving x are performed,
and the final statement x; produces a result which is printed to the console as usual.
The way that x; results in output here might seem a bit surprising at first, but it is really the
same thing as what happened in section 2.5.1, where it presumably seemed more natural. The
only strange thing here is the fact that the value of a compound statement is the value of the last
statement executed within the compound statement; the values of the previous statements are
discarded.
You can use a compound statement in any context in which a single statement would be
allowed. For example, compound statements are very commonly used with the looping constructs
discussed in the next section.
2.6 Loops
Loops are used to repeat a statement (or a compound statement) more than once. Depending
upon the task at hand, the best tool for the job might be a while loop, a do–while loop, or a for
loop, as described in the following sections.
2.6.1 The while statement
A while loop repeats a statement as long as a given condition is true. The condition is tested
before the first time that the statement is executed, so the statement will be executed zero or more
times. Here is a code snippet to compute the first twenty numbers of the Fibonacci sequence:
> fib = c(1, 1);
while (size(fib) < 20)
{
   next_fib = fib[size(fib) - 1] + fib[size(fib) - 2];
   fib = c(fib, next_fib);
}
fib;
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
This snippet brings together many of the things we’ve seen in past sections, such as constructing
vectors with the concatenation function c() (section 2.2.3), defining variables with the assignment
operator = (section 2.4.1), getting the size() of a vector (section 2.4.2), and accessing values
within a vector using the subset operator [] (section 2.2.4). Its use of a while loop is optimal,
because it ensures that if the fib vector is already long enough to satisfy the length condition
size(fib) < 20, no further values of fib will be computed. You could use this while loop to
lengthen the fib vector on demand within a larger block of code that used the fib vector
repeatedly.
2.6.2 The do–while statement
A do–while loop also repeats a statement as long as a given condition is true. However, in this
case the condition is tested at the end of the loop, and thus the loop statement is always executed
at least once. Here is a code snippet to compute a factorial:
> counter = 5;
factorial = 1;
do
{
   factorial = factorial * counter;
   counter = counter - 1;
}
while (counter > 0);
"The factorial of 5 is " + factorial;
"The factorial of 5 is 120"
This example brings in string concatenation using the + operator (section 2.3.4) in order to
generate its output line. Note that this example could be rewritten using a while loop instead, but
it might be a bit less intuitive in its operation since it would no longer embody the formal
definition of the factorial as explicitly. Note also that computing a factorial could be done much
more trivially (and efficiently) using the sequence operator : and the product() function (see
section 3), but the code here is useful for the purpose of illustration.
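For comparison, the shorter approach just mentioned might look like the following sketch, which multiplies together the values of the sequence 1:5 using product().

```
> product(1:5)
120
> "The factorial of 5 is " + product(1:5)
"The factorial of 5 is 120"
```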
2.6.3 The for statement
The third type of loop, the for loop, is used to loop through all of the elements in a vector. For
each value in the given vector, a given variable is set to the value, and a given statement is then
executed. For example, the following code computes squares by setting element to each value of
my_sequence, one by one, and then executing the print() function for each value:
> my_sequence = 1:4;
for (element in my_sequence)
   print("The square of " + element + " is " + element^2);
"The square of 1 is 1"
"The square of 2 is 4"
"The square of 3 is 9"
"The square of 4 is 16"
This is the first time we have seen the print() function; its meaning here is rather obvious, but
it is covered in more detail in section 3. Notice how operator + is used to construct the output.
This looping construct is called by various names in other languages, such as the “for each”
statement (PHP), the “range-based for” (C++), “fast enumeration” (Objective-C), and so forth. It is
different from the traditional for loop of C and related languages, which entails an initializer
expression, a condition expression, and an increment/decrement expression. That type of for
loop does not exist in Eidos (following R); the iterator for of R and Eidos is a more natural and
efficient choice for vector-based languages.
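Work that a C programmer would do with an index-based loop is typically expressed by iterating over the values themselves. As a sketch, here is a running total accumulated over a vector; the variable names total and value are chosen here just for illustration.

```
> total = 0;
for (value in c(3, 1, 4))
   total = total + value;
total;
8
```

When an index is genuinely needed, the same construct can iterate over a sequence of indices, such as 0:(size(x) - 1) for a non-empty vector x, and use the subset operator [] inside the loop.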
2.6.4 The next statement
Sometimes you might wish to cut short the execution of a given iteration of a loop, skipping the
rest of the work that would normally be done and proceeding directly to the next iteration. This is
the function of the next statement. For illustration, here is a trivial modification of the previous
example, changed to print only squares that are divisible by 5 – and to print their cubes as well:
> for (element in 1:20)
{
   square = element ^ 2;
   if (square % 5 != 0)
      next;
   cube = element ^ 3;
   print(element + " squared is " + square + ", cubed is " + cube);
}
"5 squared is 25, cubed is 125"
"10 squared is 100, cubed is 1000"
"15 squared is 225, cubed is 3375"
"20 squared is 400, cubed is 8000"
Notice how the vector used by the for loop is specified directly, using the sequence 1:20; this
is a very common pattern. This example also uses the modulo operator % (section 2.3.1) to
determine divisibility.
The main point, however, is that the next statement causes the loop to skip most of its work
whenever square is not divisible by 5. The next statement can be used within while and do–while
loops as well, and does exactly the same thing in those contexts.
2.6.5 The break statement
The final topic to cover regarding loops is the break statement. Often it is necessary to stop the
execution of a loop altogether, not just to cut short the current iteration of the loop as next does.
To achieve this – to break out of a loop completely – use the break statement. For example, here
is a very primitive way of finding prime numbers in Eidos:
> for (number in 2:20)
{
   prime = T;
   for (divisor in 2:number^0.5)
      if (number % divisor == 0)
      {
         prime = F;
         break;
      }
   if (prime)
      print(number + " is prime!");
}
"3 is prime!"
"5 is prime!"
"7 is prime!"
"11 is prime!"
"13 is prime!"
"17 is prime!"
"19 is prime!"
There are a few things to note here. First of all, this example is more structurally complex than
previous examples; we have an if statement with a compound statement, nested within a for
statement, which is itself nested within the compound statement of another for statement. If it is
not obvious what is going on here, take a little time to ponder it. Second, notice how the ^
operator (section 2.3.1) is used to calculate a square root; the : and % operators also play major
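roles, but we have already revisited them in previous loop examples. A logical “flag” variable,
prime, is used to keep track of whether we have found a divisor; such flag variables are a very
common and useful programming paradigm. Finally, note that 2, although prime, is missing from
the output: when number is 2, the sequence 2:number^0.5 counts down from 2, and so 2 is tested
as a divisor of itself.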
Most importantly, however, note how the break statement stops the execution of the inner for
loop as soon as a divisor is found. This makes the loop much more efficient; on my machine,
computing the primes up to 100,000 takes about 12.5 seconds with the break statement, versus 86
seconds without it – quite a difference! Skipping unnecessary work is good. Not that this code is
a paragon of speed anyway; there are, of course, much faster ways to search for primes.
What’s that you say? What’s a faster way to search for primes? OK, since it also illustrates the
use of break, but this time in a do–while loop, here’s a faster way to search for primes:
// the Sieve of Eratosthenes in Eidos!
last = 100000;
x = 2:last;
lim = last^0.5;
do {
    v = x[0];
    if (v > lim)
        break;
    print(v + " is prime!");
    x = x[x % v != 0];
} while (T);
for (v in x)
    print(v + " is prime!");
This Eidos algorithm takes much less than a second on my machine. It’s based on a famous
algorithm called the Sieve of Eratosthenes that was discovered more than 20 centuries ago. The
first line of code is a comment that says precisely that, in fact. Comments in Eidos start with //
and consume the remainder of the line; they do nothing except annotate your code for readability.
There are a few things to note here. One is that the do–while loop has a condition of T; it will
thus loop forever unless it is terminated by a break statement. This is a common paradigm called
an infinite loop, used when you want to terminate a loop based on a condition test in the middle
of a loop, not at the beginning (as a while loop would do) or at the end (as a do–while loop would
do). Inside the loop, the most recently found prime, v, is compared against a threshold value, lim;
when that threshold is reached, all of the values that still remain in the vector x are guaranteed to
be prime, so the loop is exited via break, and everything left in x is printed.
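The loop just described can be mirrored in Python, where "while True" with a break in the middle is the usual spelling of the same infinite-loop paradigm. This is a sketch of the repeated-filtering idea from the Eidos listing, not an optimized sieve; the function name sieve and the guard for inputs below 2 are additions made for this illustration.

```python
def sieve(last):
    """Repeated-filter sieve mirroring the Eidos listing above."""
    if last < 2:
        return []                      # guard added here; not in the Eidos code
    x = list(range(2, last + 1))       # candidates, like x = 2:last
    lim = last ** 0.5
    primes = []
    while True:                        # infinite loop, exited only via break
        v = x[0]                       # smallest remaining candidate is prime
        if v > lim:
            break                      # everything left in x is prime
        primes.append(v)
        x = [n for n in x if n % v != 0]   # like x = x[x % v != 0]
    primes.extend(x)
    return primes

print(sieve(30))
```

The test in the middle of the loop is what makes a break necessary: the exit condition depends on v, which is only known after the first statement of the loop body has run.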
Another thing to note is that while Eidos is slow compared to compiled languages, it is still
fairly quick. Timing tests indicate that it is typically substantially faster than R, and often not
markedly slower than Lua (which is nothing to sneeze at). As seen above, it can find every prime
up to 100,000, and print them out as well, in much less than a second! It can do this, however, by
virtue of using a good algorithm; using a less efficient algorithm to perform the same task took 12.5
seconds or even 86 seconds, as seen above. If your Eidos code is performing poorly, the first step
should be to look at the big picture and think about whether there is a more efficient algorithm
you could use to solve your problem.
2.6.6 The return statement
While we are on the subject of control flow, we can cover one more statement type: the return
statement. This doesn’t belong under the topic of loops, exactly, but there is no better place in this
manual to treat it, because the return statement is much more specialized, and less useful, than it
is in most languages; it is a very special case in Eidos. The reason is that the return statement
returns a value from a block of code, as in other languages such as C and R – but in most
circumstances, returned values are unimportant in Eidos since it is not possible to define your own
Eidos functions. A return is useful, then, mostly when the Context within which you’re using
Eidos uses the returned value. When using Eidos in SLiM, for example, SLiM uses the value
returned by Eidos scripts such as fitness() callbacks and mateChoice() callbacks, making return
very useful in that Context (see SLiM’s manual for details). Apart from such Context-dependent
uses, return is mainly useful as a way to break out of nested loops regardless of the depth of
nesting, as illustrated below.
The return statement is very simple: the keyword return, and then, optionally, an expression.
When the return statement is executed, the expression is evaluated and its value is immediately
returned as the value of the largest enclosing statement. The return statement therefore breaks out
of all conditionals, loops, and compound statements, regardless of the depth of nesting. For
example, one could write a (very dumb) search for factors of a number like this:
for (i in 1:10)
    for (j in 1:10)
        if (i*j == 21) // the magic numbers?
            return 21 + " is " + i + " * " + j;
As soon as this code finds a combination of i and j that produces 21 when multiplied (as noted
by the comment on the if statement), the return statement breaks out of both loops and returns a
string describing the hit. A break statement, on the other hand, would only break out of the
innermost loop; the outer loop, over i, would continue to execute. (A break statement also can’t
be given a value to return, so it would be much less convenient to use here for that reason as
well.)
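The difference between return and break can be seen in a Python analogue of the search above. One caveat on the analogy: in Python a return must sit inside a function definition, whereas the Eidos code returns from the enclosing script block. The name find_factors and its parameters are hypothetical.

```python
def find_factors(target=21, limit=10):
    """Search i, j in 1..limit for a pair whose product is target."""
    for i in range(1, limit + 1):
        for j in range(1, limit + 1):
            if i * j == target:
                # return leaves both loops at once and carries a value out
                return f"{target} is {i} * {j}"
    return None  # no pair found

print(find_factors())
```

Replacing the return with a break would end only the loop over j; the loop over i would carry on, and there would be no value to hand back.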
In some circumstances a return statement is not necessary, because compound statements
evaluate to the value of the last statement evaluated within them, and if statements behave
similarly; as in R, therefore, a return statement can often be omitted. However, using return
makes the intentions of the programmer more explicit, and so its use is encouraged.
If the expression for the return statement is omitted, the return value used is NULL. In situations
where the return value will not be used, such as Eidos events in SLiM, the return value should be
omitted to make the intent of the code clear.
2.7 Functions
2.7.1 Calling functions: operator ()
Functions are so useful that we have already used them in several examples. It is now time to
discuss them more formally. A function is simply a block of code which has been given a name.
Using that name, you can then cause the execution of that block of code whenever you wish.
That is the first major purpose of functions: the reusability of a useful chunk of code. A function
can be supplied with the particular variables upon which it should act, called the function’s
“parameters” or “arguments”; you can execute a function with the sequence 5:15 as an argument
in one place, and with the string "foo" as an argument in another. That is the second major
purpose of functions: the generalization of a useful chunk of code to easily act on different inputs.
In Eidos, it is not possible for you to define your own functions (but see executeLambda(),
section 3.6). This limitation is a major design simplification that made Eidos substantially easier to
implement (and to document). Since the tasks to which Eidos will be put are expected to be fairly
simple, it is not expected to be an important limitation for most users. A fairly large set of built-in
functions is supplied for your use, and the hope is that they will suffice for most purposes. The
built-in functions are covered in detail in section 3. For now, there are just a few generalities that
should be mentioned.
First of all, functions are called using the () operator; in section 2.4.1, for example, we called
the function named “ls” simply by writing ls(). The ls() function is an example of a function
that can be called with no arguments; it will then print all of the constants and variables defined in
the global namespace, as seen in section 2.4.1.
Second, function arguments go between the parentheses of the () operator, separated by
commas. In section 2.4.2, for example, we used the size() function to get the size – the number
of elements – of a vector named x, by passing x as an argument to size(). That was written as
size(x). The size() function requires exactly one argument; writing size() without an argument
will cause an error:
> size()
ERROR (FunctionSignature::CheckArguments): missing required argument for
function size().
Third, some functions can take a variable number of arguments. For example, in section 2.2.3
we saw the c() function, which will take any number of arguments and stick them all together to
produce a single vector as its result. Most functions expect an exact number of arguments; many
functions, in fact, are even fussier than that, requiring each parameter to be of a particular type, a
particular size, or both. But some, such as c(), are more flexible. In section 3 the argument
requirements for every built-in function will be specified.
Fourth, many functions provide a return value. In other words, a function call like c(5,6) can
evaluate to a particular value, just as an expression like 5+6 evaluates to a particular value. The
result from a function call can be used in an expression or assigned to a variable, as you might
expect. Here is an example:
> x = c(1, 3, 5, 7, 9) + 0.5
> x
1.5 3.5 5.5 7.5 9.5
> size(x)
5
The c() function produces a vector; that vector is then added to 0.5 with the + operator, and
the result of that is assigned to x with the = operator. The number of elements in x is then counted
with the size() function, which again produces a return value: 5. The return type for all of the
built-in functions will be specified in section 3.
2.7.2 The NULL type
In section 2.4.1, you saw a bit of foreshadowing regarding another variable type in Eidos: NULL.
The time has come to delve into that topic a little more, because NULL is important in using
functions. It has two uses, in fact: as a return value, and as a parameter.
As a return value, NULL is used to indicate that a function had nothing useful to return. Some
functions always return NULL; the print() function that you have seen already, for example:
> x = print("foo")
"foo"
> x
NULL
The string "foo" printed by print() here is not its return value; print() sends its output directly
to the Eidos console. It has nothing useful to return, so it returns NULL. (That NULL value does not
normally get printed out by Eidos because it is marked as an “invisible” return, a side topic not
really worth getting into here; invisible returns work much as they do in R.)
Some functions will return a useful value if they can, but will return NULL if they can’t. Often a
NULL return is a result of passing NULL in as an argument; garbage in, garbage out, as they say. A
trivial example:
> max(NULL)
NULL
More interestingly, the readFile() function will return NULL if an error occurs that prevents the
file read operation from completing. The calling code could then detect that NULL return and act
accordingly – it might try to read from a different path, print an error, or terminate execution with
stop(), or it might just ignore the problem, if reading the file was optional anyway (such as an
optional configuration file to modify the default behavior of a script).
The other use of NULL, as mentioned above, is as an argument to a function. Passing NULL is
occasionally a way of signaling that you don’t want to supply a value for an argument, or that you
want a default behavior from the function rather than telling it more specifically what to do. None
of the functions you have seen thus far use NULL in this way, but you will see examples of it in
section 3.
NULL cannot be an element of a vector of some other type; it cannot be used to mark missing or
unknown values, for example. Instead, NULL is its own type of vector in Eidos, always of zero
length. (There is also no NA value in Eidos like the one in R, while we’re on the topic of marking
missing values. Not having to worry about missing values makes Eidos substantially simpler and
faster, and Eidos – unlike R – is not designed to be used for doing statistical analysis, so marking
missing values is not expected to be important. Eidos does support NAN – Not A Number – values
in float vectors, however, which could conceivably be used to mark missing values if necessary.)
The basic philosophy of how Eidos handles NULL values in expressions and computations is that
NULL in such situations represents a non-fatal error or an unknown value. If using the NULL value in
some meaningful way could lead to potentially misleading or incorrect results, Eidos will generate
a fatal error. The idea is to give Eidos code an opportunity to detect a NULL, and thus to catch and
handle the non-fatal error; but if the code does not handle the NULL, using the NULL in further
operations will result in a fatal error before the functioning of the code is seriously compromised.
NULL values are thus a sort of third rail; there’s a good reason they exist, but you have to be very
careful around them. They are a bit like zero-valued pointers in C (NULL), C++ (nullptr),
Objective-C (nil), and similar languages; they are widely used, but if you ever use one the wrong
way it is an immediate and fatal error.
Documenting exactly how Eidos handles NULL in every conceivable situation would be difficult.
If you need to know what happens in a particular case, you should try it out in EidosScribe; as
Socrates liked to say, “there’s nothing like asking”. A few broad guidelines can be stated, though
(making reference to some topics that have not yet been covered, so don’t worry if some terms
here are unfamiliar):
• Functions and methods may return NULL to indicate a non-fatal exceptional condition
even if their call signature does not state NULL as a potential return type; call signatures
indicate only the return type under normal, non-exceptional circumstances. If an
exceptional return of NULL is a possibility for a given function or method, that should be
mentioned in its documentation. Functions and methods declared with a return type of
void will always return NULL; that is the meaning of the void return type in Eidos, in fact.
• Method calls on object vectors that contain multiple elements might result in NULL being
returned by some elements and non-NULL values being returned by other elements. In this
case, the NULL returns will be silently dropped from the aggregated return value from the
method call. This is probably the most dangerous aspect of the way Eidos handles NULL.
If the method returns one value per element, you can check for this condition by
comparing the size() of the returned value to the size() of the object upon which the
method was called. If you need to detect and respond to NULL returns from each element
in an object, you should iterate over the elements of the object with a for loop and call
the method on each element individually. Because of this, it is probably wise for Context
designers to design methods so that they return NULL only in cases where dropping the
NULL values is likely to be harmless.
• NULL may not be passed as a parameter value to a function or method unless that is
specifically allowed by the call signature of the function/method. Eidos designates this as
a fatal error in order to limit the unpredictable consequences of propagating NULL values
forward.
• Read-only properties on object variables may, as far as Eidos is concerned, produce NULL
even if the property signature does not state NULL as the property’s type, but this should be
stated explicitly in the documentation for the property. Assigning NULL into a property is
illegal, both because it might cause problems and because it violates the one-to-one rule
for multiplexed assignment into properties (see section 2.8.4), since NULL has a size of 0.
A read-write property therefore cannot accept NULL assignment, nor can it generate NULL,
even if its signature declares that it does. Since properties are intended to represent
lightweight attributes, they should not be generating exceptional conditions anyway.
• NULL may not be used with any of the arithmetic operators (+, -, *, /, %, ^), nor with any of
the logical operators (&, |, !), nor with the string concatenation operator (+), nor with the
range operator (:), nor may it be used as the condition in an if, while, or do–while statement, nor
may it be used as the value over which a for loop iterates. In all cases, using NULL is a
fatal error.
• NULL may not be used with the comparison operators (==, !=, <, >, <=, >=). To test whether
a given value is NULL, therefore, you cannot use the equality operator, ==; the comparison
x == NULL will always cause a fatal error. Instead, you must use the isNULL() function.
• It is legal to subscript NULL; NULL[...] always produces NULL. NULL may not be used as a
subscript, however; x[NULL] is always an error (except NULL[NULL], which produces NULL).
• You may assign NULL into a variable (x = NULL) and you may return NULL from your code
(return NULL). Passing NULL values around in your own code is harmless; it is the attempt
to actually use a NULL value that is often fatal.
• NULL can be passed to print() (it prints as “NULL”) and to cat() (it emits nothing). It may
not be assembled into a vector with other values using c(), nor can it ever be turned into
a value of another type by automatic type promotion or by the as...() family of
functions.
These policies are considerably stricter than the policies regarding NULL in R. This reflects
several factors. First, R generally takes quite a permissive attitude toward runtime errors, and is
famous (some would say infamous) for bending over backwards to find a possible interpretation for
questionable code rather than generating an error. Eidos is much stricter, in order to minimize the
risk of code invisibly producing an incorrect result; maximal robustness, rather than maximal
flexibility, is the primary design goal of Eidos. Second, Eidos does not have the NA value of R, so
many cases in which R would produce an NA have to produce a NULL in Eidos. This means that the
policies in Eidos regarding NULL have to be correspondingly stricter, because NULL is more
common than in R, and might represent cases that the user would expect, from experience with R,
to be handled automatically; the risk of undetected failure is therefore high, and a strict policy is
needed to minimize that risk.
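The guideline above about NULL returns being silently dropped from a multi-element method call, and the suggested size() comparison, can be sketched in Python, with None standing in for NULL. Everything named here (call_method, garage_size, the dictionary-based houses) is hypothetical and exists only for the illustration; it is not how Eidos is implemented.

```python
def call_method(elements, method):
    """Call method on each element; None results are dropped from the aggregate."""
    results = [method(element) for element in elements]
    return [r for r in results if r is not None]

def garage_size(house):
    # hypothetical method: yields None for a house without a garage
    return house.get("garage")

neigh = [{"garage": 2}, {}, {"garage": 1}]

sizes = call_method(neigh, garage_size)
# One value per element was expected, so a size mismatch reveals dropped values.
some_dropped = len(sizes) != len(neigh)

# Element-by-element calls show which elements produced nothing.
missing = [k for k, house in enumerate(neigh) if garage_size(house) is None]

print(sizes, some_dropped, missing)
```

The size comparison only reveals that something was dropped; the element-by-element loop is what identifies where.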
2.7.3 The function() function
As mentioned above, section 3 will cover every built-in function in detail. Here, however, it is
worth mentioning that Eidos has just a little taste of a language feature called introspection: it has
ways for you to examine the internals of the language itself. Of particular note here is a function
called function(). Calling this function results in output like this (with vertical ellipses to indicate
that only some of the output is shown here):
> function()
⋮
(float)atan(numeric)
(*)c(...)
(void)cat(*)
(float)ceil(numeric)
(string$)class(*)
⋮
Each line shows the function signature of a different built-in function. The first line, for
example, shows the signature of atan(), the arctangent function. This signature indicates that
atan() takes a numeric argument (meaning either an integer or a float), but it always returns a
float.
The next signature, for c(), indicates that c() takes any number of arguments (as shown by the
ellipsis ...), and can return any type (as indicated by the *) – the type that c() returns will depend
on what types of arguments you pass to it.
The cat() function takes one argument of any type (*) and returns nothing at all (void).
Actually, a return value of void means that an invisible NULL is returned; but that is, for all practical
purposes, equivalent to no return value at all, so it is clearer to use the void designation (inherited
from the C language).
The ceil() function, like atan(), takes a numeric argument and returns a float result. This is a
math function that rounds off the value passed to it in a particular way (see section 3). The
signature for ceil() and atan() is the function signature of almost all of the math functions in
Eidos, in fact. These math functions will take an integer as an argument for convenience, but
their return value is float because the mathematical operation they perform can produce a float
even from integer input; sin(1) is approximately 0.84147, for example. It might seem that
ceil() could have a return type of integer, since it performs a type of round-off; but it cannot.
This is because the value passed to it could be an infinity, an NAN, or a value outside of the range
representable by integer (see section 2.1.2); float is thus the only return type guaranteed to work.
In the final signature shown above, the class() function takes one argument of any type (*), but
it returns a string. Not only that; it guarantees that the string it returns will be a singleton – a
vector containing exactly one value. That guarantee is represented by the $ at the end of the
return value designation (string$). A $ at the end of an argument type specification would
similarly indicate that that argument requires a singleton – for that argument, in other words, you
must pass a vector containing exactly one value. Conveniently, function() is itself a function
with such an argument, and we can ask function() about itself:
> function("function")
(void)function([string$])
Here we have called function() with one argument: the string value "function". In
response, it has looked up the function signature for the function of that name – itself – and has
printed out its signature for us. Note that function() takes one argument, a singleton string, as
indicated by string$. However, that argument is inside brackets, [], too; this indicates that it is
an optional argument. When we called function() at the beginning of this section, we omitted
that optional argument, and so function() printed out every function signature known to Eidos;
just above, however, we chose to supply "function" as the optional argument, and so function()
looked up that function name and printed only that function signature.
The functions mentioned here that you haven’t seen yet, such as class(), will be discussed
more in section 3; the purpose here was just to introduce function signatures in order to illustrate
how Eidos represents functions and the various constraints and guarantees they embody when you
use them. In summary, a function signature states the arguments and the return type of a function,
following a template like:
(return_type)function_name(argument_list)
The return type and the argument list use various symbols:
void        an invisible NULL return, or (for the argument list) no legal arguments
integer     an integer (see section 2.1.1)
float       a float (see section 2.1.2)
logical     a logical (see section 2.1.3)
string      a string (see section 2.1.4)
object      an object (see section 2.8)
numeric     either an integer or float
+           any type except object
*           any type, including object
$           follows a type to indicate a singleton – a vector containing exactly one value
[]          encloses an optional argument – an argument which may be omitted
...         indicates that any number of arguments may be supplied, from zero on up
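To make the signature template concrete, here is a small Python sketch that splits a printed signature into its three parts. It assumes only the textual layout shown above, (return_type)function_name(argument_list); the helper names parse_signature and returns_singleton are invented for this example and are not part of Eidos.

```python
import re

# (return_type)function_name(argument_list)
SIGNATURE = re.compile(r"^\((?P<ret>[^)]*)\)(?P<name>\w+)\((?P<args>.*)\)$")

def parse_signature(text):
    """Split a signature string into (return type, function name, argument list)."""
    match = SIGNATURE.match(text.strip())
    if match is None:
        raise ValueError(f"not a function signature: {text!r}")
    return match.group("ret"), match.group("name"), match.group("args")

def returns_singleton(text):
    """True if the return type ends in $, i.e. exactly one value is guaranteed."""
    return parse_signature(text)[0].endswith("$")

for sig in ["(float)atan(numeric)", "(*)c(...)", "(string$)class(*)",
            "(void)function([string$])"]:
    print(parse_signature(sig), returns_singleton(sig))
```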
Occasionally, a function will allow an argument or a return value to be of more than one type.
For example, in the Context of SLiM it is common for functions to take an argument that is either
an integer (identifying a SLiM object by ID) or an object (passing the object directly). In such
cases, Eidos represents the type of the variable using single letters to designate each supported
type:
N    NULL type
l    logical type
i    integer type
f    float type
s    string type
o    object type
In the Context of SLiM, for example, there is a function named initializeGenomicElement()
that takes a parameter stated to be of type io<GenomicElementType>$. Unpacking this, the
parameter can be either an integer (because of the i) or an object of type GenomicElementType
(because of the o and the element type designation); whichever it is, it must be a singleton
(because of the $). These sorts of mixed-type designations are uncommon, however (except that
the numeric type mentioned above is really such a mixed-type designation, alternatively
designated as if, that is, i together with f; it is so common that it merited its own designator).
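The unpacking of a mixed-type designation described above can be written out as a short Python sketch. It handles only the single-letter form (letters, an optional element type in angle brackets, and an optional trailing $), as an assumption based on the one example given; the helper name unpack_type is hypothetical.

```python
import re

LETTERS = {"N": "NULL", "l": "logical", "i": "integer",
           "f": "float", "s": "string", "o": "object"}

# letters, optional <ElementType>, optional $ for singleton
SPEC = re.compile(r"^(?P<letters>[Nlifso]+)(?:<(?P<cls>\w+)>)?(?P<single>\$?)$")

def unpack_type(spec):
    """Return (list of allowed types, element type or None, singleton flag)."""
    match = SPEC.match(spec)
    if match is None:
        raise ValueError(f"not a single-letter type designation: {spec!r}")
    types = [LETTERS[ch] for ch in match.group("letters")]
    return types, match.group("cls"), match.group("single") == "$"

print(unpack_type("io<GenomicElementType>$"))
print(unpack_type("if"))  # the letters behind the numeric designator
```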
You will never need to write your own function signatures, since you can’t write your own
functions in Eidos; it’s just useful to understand them so that you can use function() to remind
yourself about the built-in functions you’re using. Signatures will also come up in the context of
methods – discussed in the next section, in which the object type is introduced.
2.8 Objects
2.8.1 The object type
In addition to logical, integer, float, string, and NULL, there is one more type in Eidos left to
discuss: object. An object is a vector that contains elements; it is a container, a bag of stuff. In
this way, it is similar to Eidos’s other types; a float in Eidos is a vector containing floating-point
elements, whereas an object is a vector containing object-elements (often just called “elements”
in general). An object can also embody behavior: it has operations that it can perform using the
elements it contains. The object type in Eidos is thus similar to objects in other languages such as
Java, C++, or R – except much more limited. In Eidos you cannot define your own classes of
object type; you work only with the predefined object types supplied by SLiM or whatever other
Context you might be using Eidos within. These predefined object types generally contain
Context-dependent elements related to the task performed by the Context; in SLiM, the elements
are things such as mutations, genomic elements, and mutation types (described in SLiM’s
documentation).
The behaviors of objects in Eidos manifest in two ways: objects can have properties (also called
instance variables or member variables in some languages) that can be read from and written to,
and they can have methods (also sometimes called member functions). The properties of an
object in Eidos are determined by the type of element the object contains; an Eidos object will
always contain only one type of element (just as a float cannot contain string-elements, for
example).
Instances of particular object classes – particular kinds of objects – are obtained via built-in
functions and/or global constants and variables. For example, in SLiM there is a global constant
called sim that represents the current simulation as an instance of the SLiMSim class. In the
interests of keeping free of the assumption that Eidos is being used with SLiM, we will discuss
objects here using a hypothetical example: a neighborhood object type. This idea will be fleshed
out in the next section, on element access.
2.8.2 Element access: operator [] and sharing semantics
If an object is a vector of elements, how do you access those elements from Eidos? The answer
is obvious, really: using the [] operator, just as you can access the float-elements in a float
vector, or the string-elements in a string vector (section 2.2.4). Remember that [] used on a
float vector does not give you a raw floating-point value; the individual elements inside Eidos
types are never directly accessible. Instead, everything in Eidos is a vector (section 2.2.1), and
even single values are always wrapped inside a vector. The same is true for object vectors and
their underlying elements; you cannot get an object-element on its own.
So this subsection is not really saying anything new; it is just saying that the object type works
very much like the other built-in types. Don’t get confused by that. As an example, let’s develop
the idea introduced above (section 2.8.1) of a neighborhood object type. The object is a vector
that contains elements; a neighborhood contains houses, so each element would be a house, in
this example. If you had a neighborhood object variable called neigh, it might contain ten
houses; myHome = neigh[0] would assign a new neighborhood object containing just the first
house from neigh into a new variable named myHome.
Those of you with programming experience might have just asked yourselves the question:
what exactly is meant by the claim that myHome contains “just the first house from neigh”? There
are two possibilities. One is that an element is only ever contained by one particular object, so
when that element is put into a new object, it gets copied. This is called “copy semantics” – and it
is not how Eidos works. The other possibility is that an element can be contained by more than
one object, and never gets copied unless a copy is explicitly requested somehow. This is called
“sharing semantics” – and it is how objects in Eidos work. Skip this comment if it confuses you,
but: if you are familiar with the concept of “pointers” from languages such as C, you could think of
Eidos object variables as using pointers to refer to the object-elements they contain – and that is,
in fact, how Eidos is implemented internally.
It might be worth mentioning the reason for this policy choice, since it is somewhat unusual (at
least without the explicit use of pointers, as in C and Objective-C, or references, as in C++). The
reason is that Eidos is not about creating new objects, new classes, etc.; it is about manipulating
the existing objects defined by the Context within which Eidos is being used. In SLiM, for
example, the SLiM simulation defines the genomes, mutations, genomic element types, etc., that
Eidos manipulates. If Eidos made copies of those elements as a side effect of operations like
subscripting and assignment into variables, the newly created elements would not be recognized
by the simulation, and changing or manipulating them would make no difference to the simulation
– a bizarre and pointless outcome. Instead, it makes sense for Eidos to always act on the objects
that presently exist within its Context; thus, always following sharing semantics is the logical
policy. When you wish to create a new object-element, such as a new mutation in SLiM, you
generally do so by sending a request, via a function call or method call, to the Context.
For those with some programming experience: this means that there is no new operator in Eidos
as there is in C++, no malloc() as in C, no +alloc as in Objective-C. Similarly, there is no delete
operator, no free(), no -dealloc; just as you never create a new object-element directly, you
also never dispose of one. Finally, it means that there are no memory management issues in Eidos;
deciding when to dispose of an object-element is not your problem, it is the Context’s problem.
In SLiM, for example, a mutation might be disposed of when it is no longer referenced by any
genome in the simulation. You might wonder: what happens to the mutation object-element in
Eidos when that happens? Your Eidos scripts are never executing at the moment, in between
generations, when SLiM cleans up mutation objects, and you cannot define an Eidos variable
that lives across that boundary either, so the question never actually arises. By design, it is
impossible for a reference in Eidos to ever be “stale” – to refer to an object-element that has been
disposed of – assuming the Context has been designed correctly. In fact, Eidos does not even need
to garbage-collect; the whole topic of memory management is enclosed by a big Somebody Else’s
Problem field (hat-tip to Douglas Adams). The way Eidos works may seem strange to those with
programming experience, but the advantages should now be clear!
So, returning from philosophy to practicality: neigh[0] produces a new neighborhood object, a
vector containing exactly one house. That house is the same house as the one in neigh. This
means that if something is done to change the house in myHome, the corresponding house in neigh
will also change – because they are, in fact, the same house. This may seem like a strange thing to
emphasize, but it is in fact crucial, as will be seen in the next section.
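Python lists happen to follow the same sharing semantics, so the point can be tried out there. In this sketch the House class and its color attribute are hypothetical stand-ins for the house elements of the neighborhood example, and a one-element list slice stands in for the vector produced by neigh[0].

```python
class House:
    """Hypothetical element type for the neighborhood example."""
    def __init__(self, address, color):
        self.address = address
        self.color = color

neigh = [House("100 Main St.", "red"), House("101 Main St.", "white")]

# A new one-element container holding the same House; nothing is copied.
my_home = neigh[0:1]

my_home[0].color = "blue"                 # change the house through my_home
same_house = my_home[0] is neigh[0]       # identity, not just equality

print(same_house, neigh[0].color)
```

Under copy semantics the last line would still report the original color for the house in neigh; under sharing semantics both containers see the change.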
2.8.3 Properties: operator .
What can you do with an object? In section 2.8.1 it was stated that objects encapsulate
behaviors as well as elements. One type of behavior is called a property. A property is a simple
attribute of each element in an object. For example, following the neighborhood object example
above, one property of a house might be its address. That would be an example of a read-only
property (sometimes called a “member constant”); you can ask a house what its address is, but you
can’t easily change the address of a house. Properties can be read using the member-access
operator, written as . (a period). The name of a particular property can be used with . to get that
property’s value. For example, we could (hypothetically) access the addresses of the houses in
neigh:
> neigh.address
"100 Main St." "101 Main St." "102 Main St." ...
Notice that since neigh is a vector containing several house elements, the result is a vector
containing several string elements; operations on object are vectorized just as they are for all
other types in Eidos (see sections 2.2.1 and 2.4.2).
Houses might also have a color property, the color they are painted. Since houses can be
repainted in a new color, this would be a read-write property (sometimes called a “member
variable”). You can use the member-access operator to both read and write properties. For
example, using our variable myHome that contains just one house (see section 2.8.2), we could do:
> myHome.color
"red"
> myHome.color = "blue"
> myHome.color
"blue"
We’ve just repainted our house – and the corresponding house in neigh will now also be blue,
since they are the same house! This is exactly the behavior that is needed for controlling an
external simulation. In your scripts, you may want to play around with object vectors that contain
mutations, for example; you want to be able to take subsets of those vectors, set properties on the
mutations (e.g., change their selection coefficients), and so forth – operating the whole time on the
actual mutations in your simulation, not on copies that would fail to affect your simulation run.
This is the reason that sharing semantics were chosen as a foundation of Eidos: you will almost
always want to be referring to objects that already exist in your simulation, and that you want to
collect and manipulate in situ.
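For readers who know Python, sharing semantics can be sketched with ordinary Python object references. This is only an analogy, not Eidos code; the House class and the variable names below are hypothetical stand-ins for the house elements of the running example.

```python
# Python analogy for Eidos sharing semantics (not Eidos code).
# House is a hypothetical stand-in for the house elements in the text.

class House:
    def __init__(self, address, color):
        self.address = address
        self.color = color

# A "neighborhood" vector holding three house elements.
neigh = [House("100 Main St.", "red"),
         House("101 Main St.", "red"),
         House("102 Main St.", "red")]

# Subsetting yields a new vector that refers to the same element, not a copy.
my_home = [neigh[0]]

# Repainting through my_home is visible through neigh as well.
my_home[0].color = "blue"
colors = [house.color for house in neigh]
print(colors)
```

Only the first house changes color, and it changes in both vectors because there is only one house; no copy was ever made.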
2.8.4 Multiplexed assignment through properties: operator = revisited
In the previous section, we saw that you can change a property of an element by assigning to a
property using the . operator, but we did that only for a neighborhood object containing a single
house element. What happens if we have a neighborhood object with a bunch of house elements
in it?
The answer to that is “multiplexed assignment”. To see what that means, let’s take a step back
and ask the same question about simple integer vectors; when we saw the assignment operator,
=, in section 2.4.1, this question got conveniently glossed over. Suppose we have an integer
vector:
> x = 1:10
> x
1 2 3 4 5 6 7 8 9 10
We saw in section 2.4.1 that we could replace the value of x by assignment, with a statement
like x = "foo", but let’s try something more interesting instead:
> x[3] = 100
> x
1 2 3 100 5 6 7 8 9 10
That assignment replaced the element at index 3 with a new element, 100. So the = operator
can be used to replace individual elements, not just to redefine entire variables. It can also be
used to replace a larger subset of elements:
> x[x % 2 == 0] = 0
> x
1 0 3 0 5 0 7 0 9 0
This is “multiplexed assignment”: a single value, 0, has been assigned into multiple elements (in
this case, elements that currently contain an even value, as determined by the logical vector
produced by x % 2 == 0).
Assignment can also be used to assign a different value to each element selected by the subset
operator:
> x[x == 0] = c(-1, -3, -5, -7, -9)
> x
1 -1 3 -3 5 -5 7 -7 9 -9
So assignment can either assign the same value to each selected element, or it can assign
corresponding values from the right-hand side to the left-hand side of the assignment. If the
number of values provided does not fit either of these cases, then it is an error:
> x[x < 0] = c(3, 8)
ERROR (_AssignRValueToLValue): assignment to a subscript requires an
rvalue that is a singleton (multiplex assignment) or that has a .size()
matching the .size of the lvalue.
That error message refers to the “rvalue” and the “lvalue” of the assignment; the “rvalue” is the
value on the right-hand side, and the “lvalue” is the value on the left-hand side.
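The rule just described (a singleton is multiplexed, a matching-size vector is assigned element by element, anything else is an error) can be sketched in Python. This is an illustration of the rule only, not how Eidos implements it; assign_subset is a hypothetical helper name.

```python
# Python sketch of the subset-assignment rule (not Eidos's implementation).
# assign_subset is a hypothetical helper name.

def assign_subset(vector, indices, values):
    """Assign values into vector at the given indices, in place.

    values must hold either one value (multiplexed assignment) or exactly
    one value per selected index; any other size is an error.
    """
    if len(values) == 1:
        values = values * len(indices)       # repeat the single value
    elif len(values) != len(indices):
        raise ValueError("right-hand side must be a singleton or match "
                         "the number of selected elements")
    for index, value in zip(indices, values):
        vector[index] = value

x = list(range(1, 11))                        # 1 2 3 ... 10
assign_subset(x, [3], [100])                  # like x[3] = 100
evens = [i for i, v in enumerate(x) if v % 2 == 0]
assign_subset(x, evens, [0])                  # like x[x % 2 == 0] = 0
zeros = [i for i, v in enumerate(x) if v == 0]
assign_subset(x, zeros, [-1, -3, -5, -7, -9]) # one value per selected element
print(x)
```

Calling assign_subset(x, [0, 1, 2], [3, 8]) raises an error in this sketch, mirroring the Eidos error shown above.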
To return to our original question about assignments involving properties: the answer is that they
work in exactly the same way. Given our hypothetical neighborhood object named neigh that
contains several houses, with each house having a property named color, we could write:
> neigh.color
"blue" "red" "red" "red" "red" ...
> neigh.color = "green"
> neigh.color
"green" "green" "green" "green" "green" ...
> neigh.color = c("red", "blue", "white", "purple", "celadon", ...)
> neigh.color
"red" "blue" "white" "purple" "celadon" ...
Furthermore, the [] operator can be mixed into this as well:
> neigh.color[2:3] = "turquoise"
> neigh.color
"red" "blue" "turquoise" "turquoise" "celadon" ...
> neigh[c(1, 3)].color = "mustard"
> neigh.color
"red" "mustard" "turquoise" "mustard" "celadon" ...
Note that neigh.color[indices] is essentially the same as neigh[indices].color. They are
conceptually distinct; the first form selects particular indices from a vector of color properties
derived from all of the elements of neigh, while the second form selects particular elements from
neigh and then gets the color property of the selected elements. In practice, however, the same
values are referred to, and so they are formally equivalent. (This formal equivalence is guaranteed
by Eidos, in fact, as a consequence of sharing semantics, combined with a guarantee that read-write
properties – although not read-only properties – must have exactly one value per element. This
means that there is a one-to-one correspondence between elements and the values of a read-write
property, and so it makes no difference in which order . and [] are performed.)
All of this machinery (modeled on how assignment works in R, incidentally) results in a lot of
power contained within the assignment operator. For example, we could institute a neighborhood
code that all houses must be beige – except that the guy with the celadon house refuses to repaint:
> neigh[neigh.color != "celadon"].color = "beige"
> neigh.color
"beige" "beige" "beige" "beige" "celadon" ...
In the Context of a SLiM simulation, this could be useful for changing the selection coefficient
of all neutral mutations to be slightly deleterious, for example, while leaving all other mutations
untouched; that operation could be achieved in a single line, with no for loops and no if
statements, using an operation very similar to the above example.
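As a rough Python analogy (not Eidos code; House is a hypothetical stand-in class), the sketch below checks that selecting by index and reading a property give the same values in either order, and then performs the "beige" repainting. Python needs an explicit loop for the assignment; that loop is exactly what the single Eidos statement makes unnecessary.

```python
# Python analogy (not Eidos code); House is a hypothetical stand-in class.

class House:
    def __init__(self, color):
        self.color = color

neigh = [House(color) for color in
         ["red", "mustard", "turquoise", "mustard", "celadon"]]
indices = [1, 3]

# Reading: like neigh.color[indices] versus neigh[indices].color.
all_colors = [house.color for house in neigh]
property_then_subset = [all_colors[i] for i in indices]
subset_then_property = [neigh[i].color for i in indices]

# Writing: like neigh[neigh.color != "celadon"].color = "beige".
to_repaint = [house for house in neigh if house.color != "celadon"]
for house in to_repaint:
    house.color = "beige"

colors = [house.color for house in neigh]
print(property_then_subset == subset_then_property, colors)
```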
2.8.5 Comparison with object: operator ==, !=, <, <=, >, >= revisited
In this discussion of the object type we’ve revisited various operators, such as [] and =, and
seen how they work with object operands. There is one more category of operators that we need
to revisit: the comparative operators ==, !=, <, <=, >, and >= (all previously seen in section 2.3.3).
What does it mean to test whether a mutation object, for example, is less than or greater than a
genomic element object? What, in general, does it mean to compare object values to each other,
and to other types?
Eidos’s answer to this question is: not much. Using the relative comparative operators <, <=, >,
and >= with an object operand is simply illegal; no meaning is attached to such comparisons at
all:
> neigh[2] < neigh[4]
ERROR (Evaluate_Lt): the '<' operator cannot be used with type object.
Using the equality operators == and != with a mixture of object and non-object operands is
also illegal:
> myHome == 5.5
ERROR: operand type object cannot be converted to type float.
You may, however, use the equality operators to compare one object to another object. Two
object-elements are equal if and only if they are exactly the same object-element – not different
object-elements with exactly the same values:
> myHome == neigh[0]
T
> myHome == neigh[1]
F
(That last comparison would be F even if neigh[1] had the same address and color as myHome,
since they are different object-elements.) Just as with other operand types (as described in section
2.3.3), this comparison can compare object-elements one-to-one, one-to-many, or many-to-one.
For example, we can find out which house in neigh is our home:
> neigh == myHome
T F F F F ...
Each element in neigh is compared to myHome; only the first element of neigh is the same
as the object-element in myHome, so only the first logical value produced is T.
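In Python terms, this kind of equality corresponds to the identity test `is` rather than to a comparison of stored values. The sketch below is an analogy only; House and its fields are hypothetical.

```python
# Python analogy (not Eidos code): Eidos object equality behaves like "is".

class House:
    def __init__(self, address, color):
        self.address = address
        self.color = color

# The second house deliberately has the same values as the first.
neigh = [House("100 Main St.", "blue"),
         House("100 Main St.", "blue"),
         House("102 Main St.", "red")]
my_home = neigh[0]

# Like neigh == myHome: one identity test per element.
same = [house is my_home for house in neigh]
print(same)
```

The second element has the same address and color as my_home, but it is a different element, so only the first comparison is true.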
2.8.6 Methods: operator . and the method() method
Section 2.8.3 introduced one type of object behavior, properties. Properties generally
represent simple stored values; getting the value of a property might involve a little bit of
computation behind the scenes, but conceptually a property is a trait or an attribute of the object
itself, like the color of a house. The other type of object behavior in Eidos is the method.
Methods are very much like functions; they are chunks of code that you can call to perform tasks.
However, each type of object has its own particular methods – unlike functions, which are
defined globally. Methods are more heavyweight than properties; they might involve quite a lot of
computation, they might create a completely new object as their result, and they might even
modify the object upon which they are called. Not all methods are heavyweight in this sort of
way, however; anything that one might want an object to do, but that does not feel like a simple
property of the object, can be a method. Methods can also take arguments, just like functions,
and they can return whole vectors as their result, unlike (read-write) properties, which must refer
to singleton values so that multiplexed assignment can work (see section 2.8.4). Methods are
therefore much more powerful than properties.
As with functions and object classes, Eidos does not allow you to define your own methods.
There is a set of pre-defined methods for each object class, and the hope is that they will do
everything that you need done (if not, please let us know). This section will not go into any detail
about the pre-defined methods for the object classes provided by SLiM; see the SLiM manual for
details on that. Instead, here we will continue with the hypothetical neighborhood object type
introduced above.
Methods are called using the member-access operator, ., with a syntax that looks a lot like
accessing a property, but combined with the function call operator, (). For example, suppose our
house element had a method named setZip(). This method would find the zip code from within
the address of the house, and would change it to a new zip code passed to the method, while
keeping the rest of the house’s address the same. In code, we could do something like this:
> neigh[3].address
"33 N. Wiltshire St., Cambridge, CA 32588"
> neigh[3].setZip(14850)
> neigh[3].address
"33 N. Wiltshire St., Cambridge, CA 14850"
Notice that the method is allowed to change address even though address is a read-only
property; methods are allowed more latitude than properties are, and can change the state of an
object in whatever ways are allowed by the underlying code in which they are implemented.
Naturally, method calls are also vector operations, so we could change the zip code of our
whole neighborhood in a single statement:
> neigh.setZip(54321)
Or if we had a getZip() method that extracted the current zip code from the address, we could
change the zip codes only for houses with an old zip code that needed updating:
> neigh[neigh.getZip() == 65432].setZip(23456)
The possibilities are endless, obviously – limited only by the facilities provided by the Context
within which Eidos is being used.
In section 2.7.3 we encountered the function() function, which printed out all of the functions
known to Eidos in the form of function signatures. There is a parallel method, the method()
method, for printing out all of the methods defined for a particular object in the form of method
signatures. For example, we could do a little introspection on our neighborhood object, neigh:
> neigh.method()
- (integer)getZip(void)
+ (void)method([string$])
+ (void)property([string$])
- (void)setZip(integer)
- (void)str(void)
This output tells us that there are five methods defined on our house element (and thus on our
neighborhood object): the getZip() and setZip() methods that we just discussed, the method()
method that we are discussing now, and two more, property() and str(), that we will discuss
momentarily.
One thing you might notice is that these method signatures begin with a leading + or -, unlike
the function signatures we saw in section 2.7.3. This notation is borrowed from Objective-C; it
denotes whether a method is a class method (with a +) or an instance method (with a -). In some
languages this is a very strong distinction, between a class – representing the abstract concept of a
neighborhood, for example – and an instance – representing a specific neighborhood with
particular houses in it. In Eidos this distinction is fairly unimportant, because classes are only
loosely defined in Eidos; there is no way to get a class object, there is no inheritance hierarchy among
classes, and so forth. However, the difference does manifest with methods in one significant way:
a class method is called just a single time, even when it is called on an object that contains
several elements, because the method is being called on the class, which is shared by all the
elements. Since method() is a class method, this happened with the call above, in fact, but you
may not have noticed! If method() had been called on each element – each house inside neigh –
then the list of method signatures would have been printed repeatedly, once for each element, as
method() was called on each element in turn. Instead, method() was called just once, on the
class. In other words, method() does not ask elements “what methods do you respond to?”, but
instead asks the class “what methods would all instances of you respond to?” If that explanation is
clear as mud, don’t worry about it; because classes are de-emphasized in Eidos, there is really no
need to understand this distinction between class methods and instance methods.
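For the curious, the difference in how the two kinds of method are dispatched can be sketched in Python. Everything here is hypothetical (the House class, get_zip, method_names, and the call_method dispatcher); it illustrates "once per element" versus "once on the class", and is not how Eidos is implemented.

```python
# Python analogy (not Eidos code) for instance versus class method dispatch.

class House:
    def __init__(self, zip_code):
        self.zip_code = zip_code

    def get_zip(self):                 # instance method: concerns one element
        return self.zip_code

    @classmethod
    def method_names(cls):             # class method: concerns the class only
        return sorted(name for name in vars(cls) if not name.startswith("_"))

def call_method(vector, name):
    """Hypothetical dispatcher mimicking a method call on an object vector."""
    cls = type(vector[0])
    if isinstance(vars(cls)[name], classmethod):
        return getattr(cls, name)()                           # called once
    return [getattr(element, name)() for element in vector]   # once per element

neigh = [House(14850), House(14850), House(14853)]
zips = call_method(neigh, "get_zip")
names = call_method(neigh, "method_names")
print(zips, names)
```

The instance method produces one result per house, while the class method runs a single time no matter how many houses the vector contains.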
Note that method() takes an optional singleton string argument; as with function(), this
allows us to get the method signature for a specific method:
> neigh.method("property")
+ (void)property([string$])
So what is the property() method, then? It is actually an object introspection method as well,
like method(), but it shows us the properties that are defined by a given object class. In a sense, it
is very similar to the ls() function (see section 2.4.1); but the ls() function lists globally defined
constants and variables, whereas the property() method shows us the properties defined by an
object. For example, we could call it on our neighborhood vector, neigh:
> neigh.property()
address => (string)
color <-> (string)
This output tells us that two properties are defined for neigh (or for the neighborhood class,
really, since property() is a class method). One is a read-only property (as shown by the =>
arrow) named address; that property has type string. The other is a read-write property (as
shown by the <-> arrow) named color; that property also has type string. All of this should be
quite reminiscent of the ls() function that we saw in section 2.4.1.
There was one more method shown above in the output from neigh.method(): the str()
method. Let’s try calling it on neigh to see what it does:
> neigh.str()
Neighborhood:
address => (string) "100 Main St." "101 Main St." ... (10 values)
color <-> (string) "beige" "beige" ... (10 values)
So it is similar to property(), but it shows us some additional information about the object:
what class it is (Neighborhood), how many values each property has (10, in this case), and what the
first couple of values look like. The name “str” is short for “structure”; the idea is that this method
gives you a sense of the internal structure of an object.
Incidentally, method(), property(), and str() are defined for all object classes in Eidos; they
are in no way particular to the hypothetical neighborhood class we’ve been exploring. They are
formally documented in section 3.8.
3. Built-in functions and methods
As mentioned in sections 1 and 2.7, Eidos does not allow the user to define new functions. The
built-in functions presented here are thus the sum total of the functions available to Eidos code
(well, except for additional functions defined by the Context; SLiM defines a handful, for example,
for use in initializing simulations). If you find that you need a reasonably common and
general-purpose function that has not been provided, such as a math function, please let us know. If you
find this limiting, you might want to try out the executeLambda() function described in section 3.6,
which supplies a cheap substitute for user-defined functions.
Functions are listed here by category (math, summary statistics, vector construction, value
inspection/manipulation, value typing/coercion, filesystem access, and other miscellaneous
functions). Within each category functions are listed alphabetically. For each function, the
function signature is shown (see section 2.7.3), and then a brief description is given.
It might be worth noting here that the basic philosophy of Eidos is that of a functional language:
operations in Eidos (other than assignment) generally do not modify existing values, but instead
generate new values. If you wish a vector x to become sorted, you execute x = sort(x) instead of
simply sort(x); the sort() function does not modify x, but rather produces a new value which
must be assigned back into x if you want the new value to replace the old. Eidos follows R in
adopting this philosophy because it limits unexpected side effects. The object-elements in Eidos
violate this philosophy; when you set a property or call a method on an object-element, you are
changing the object-element itself (for the reasons discussed in section 2.8.2). The built-in types
and functions of Eidos, however, follow this philosophy.
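Python's built-in sorted() happens to behave the same way as the Eidos sort() described above, so it makes a convenient illustration. This is a Python analogy only, not Eidos code.

```python
# Python analogy (not Eidos code): sorted() returns a new list and leaves
# its argument untouched, like the Eidos sort() function described above.

x = [3, 1, 2]
sorted(x)                   # computes a sorted copy and discards it
after_bare_call = list(x)   # x itself is unchanged at this point
x = sorted(x)               # assign the new value back to replace the old one
print(after_bare_call, x)
```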
3.1. Math functions
The math functions in Eidos are patterned closely upon those in C++, and are typically
implemented by calling a C++ function of the same name, as described below when applicable.
(numeric)abs(numeric x)
Returns the absolute value of x. If x is integer, the C++ function llabs() is used and an integer
vector is returned; if x is float, the C++ function fabs() is used and a float vector is returned.
(float)acos(numeric x)
Returns the arc cosine of x using the C++ function acos().
(float)asin(numeric x)
Returns the arc sine of x using the C++ function asin().
(float)atan(numeric x)
Returns the arc tangent of x using the C++ function atan().
(float)atan2(numeric x, numeric y)
Returns the arc tangent of y/x using the C++ function atan2(), which uses the signs of both x and y
to determine the correct quadrant for the result.
(float)ceil(float x)
Returns the ceiling of x: the smallest integral value greater than or equal to x. Note that the return
value is float even though integral values are guaranteed, because values could be outside of the
range representable by integer.
(float)cos(numeric x)
Returns the cosine of x using the C++ function cos().
(float)exp(numeric x)
Returns the base-e exponential of x, e^x, using the C++ function exp(). This may be somewhat
faster than E^x for large vectors.
(float)floor(float x)
Returns the floor of x: the largest integral value less than or equal to x. Note that the return value is
float even though integral values are guaranteed, because values could be outside of the range
representable by integer.
(integer)integerDiv(integer x, integer y)
Returns the result of integer division of x by y. The / operator in Eidos always produces a float
result; if you want an integer result you may use this function instead. If any value of y is 0, an error
will result. The parameters x and y must either be of equal length, or one of the two must be a
singleton. The precise behavior of integer division, in terms of how rounding and negative values
are handled, may be platform dependent; it will be whatever the C++ behavior of integer division is
on the given platform. Eidos does not guarantee any particular behavior, so use this function with
caution.
(integer)integerMod(integer x, integer y)
Returns the result of integer modulo of x by y. The % operator in Eidos always produces a float
result; if you want an integer result you may use this function instead. If any value of y is 0, an error
will result. The parameters x and y must either be of equal length, or one of the two must be a
singleton. The precise behavior of integer modulo, in terms of how rounding and negative values
are handled, may be platform dependent; it will be whatever the C++ behavior of integer modulo is
on the given platform. Eidos does not guarantee any particular behavior, so use this function with
caution.
(logical)isFinite(float x)
Returns the finiteness of x: T if x is not INF or NAN, F if x is INF or NAN. INF and NAN are defined only
for type float, so x is required to be a float. Note that isFinite() is not the opposite of
isInfinite(), because NAN is considered to be neither finite nor infinite.
(logical)isInfinite(float x)
Returns the infiniteness of x: T if x is INF, F otherwise. INF is defined only for type float, so x is
required to be a float. Note that isInfinite() is not the opposite of isFinite(), because NAN is
considered to be neither finite nor infinite.
(logical)isNAN(float x)
Returns the undefinedness of x: T if x is NAN, F if x is not NAN. NAN is defined only for type float, so x
is required to be a float.
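The relationship among these three tests can be seen with the corresponding functions in Python's math module. This is an analogy under the assumption that the Python functions classify values the same way as isFinite(), isInfinite(), and isNAN(); it is not Eidos code.

```python
# Python analogy (not Eidos code) using the standard math module.
import math

values = [1.5, math.inf, math.nan]
is_finite = [math.isfinite(v) for v in values]    # like isFinite()
is_infinite = [math.isinf(v) for v in values]     # like isInfinite()
is_nan = [math.isnan(v) for v in values]          # like isNAN()
print(is_finite, is_infinite, is_nan)
```

The last value, NAN, is neither finite nor infinite, which is why the first two tests are not opposites of each other.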
(float)log(numeric x)
Returns the base-e logarithm of x using the C++ function log().
(float)log10(numeric x)
Returns the base-10 logarithm of x using the C++ function log10().
(float)log2(numeric x)
Returns the base-2 logarithm of x using the C++ function log2().
(numeric$)product(numeric x)
Returns the product of x: the result of multiplying all of the elements of x together. If x is float, the
result will be float. If x is integer, things are a bit more complex; the result will be integer if it
can fit into the integer type without overflow issues (including during intermediate stages of the
computation), otherwise it will be float.
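The integer-or-float rule for integer input can be sketched in Python as follows. The sketch assumes that the Eidos integer type is a 64-bit signed integer, which this section does not state, and the function is a hypothetical illustration rather than the actual implementation.

```python
# Python sketch (not Eidos's implementation) of the result-type rule.
# Assumption: the Eidos integer type is a 64-bit signed integer.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def product(values):
    """Product of integers: int if every intermediate fits, else float."""
    result = 1
    for v in values:
        result *= v
        if not INT64_MIN <= result <= INT64_MAX:
            break                     # an intermediate product overflowed
    else:
        return result                 # no overflow at any stage
    result = 1.0                      # redo the whole computation in float
    for v in values:
        result *= float(v)
    return result

small = product([1, 2, 3, 4])
large = product([2**62, 4])
intermediate = product([2**62, 4, 0])
print(small, large, intermediate)
```

The third call returns a float even though its final value, zero, would fit in an integer, because an intermediate stage of the computation overflowed.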
(float)round(float x)
Returns the round of x: the integral value nearest to x, rounding half-way cases away from 0. Note
that the return value is float even though integral values are guaranteed, because values could be
outside of the range representable by integer.
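Since ceil(), floor(), round(), and trunc() differ only in the direction of rounding, a side-by-side comparison may help. The Python sketch below is an analogy, not Eidos code; round_half_away is a hypothetical helper written for this example, because Python's own round() rounds half-way cases to the nearest even value rather than away from 0.

```python
# Python analogy (not Eidos code) comparing the four rounding directions.
import math

def round_half_away(x):
    """Round to the nearest integral value, half-way cases away from 0."""
    if not math.isfinite(x):
        return x                          # INF and NAN pass through unchanged
    whole = math.trunc(x)
    if abs(x - whole) >= 0.5:
        whole += 1 if x > 0 else -1
    return float(whole)

samples = [2.5, -2.5, 0.4, -1.7]
table = {
    "ceil": [float(math.ceil(v)) for v in samples],
    "floor": [float(math.floor(v)) for v in samples],
    "round": [round_half_away(v) for v in samples],
    "trunc": [float(math.trunc(v)) for v in samples],
}
for name, results in table.items():
    print(name, results)
```

Every result is converted to a float, matching the float return type in the signatures above.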
(float)sin(numeric x)
Returns the sine of x using the C++ function sin().
(float)sqrt(numeric x)
Returns the square root of x using the C++ function sqrt(). This may be somewhat faster than
x^0.5 for large vectors.
(numeric$)sum(lif x)
Returns the sum of x: the result of adding all of the elements of x together. The unusual parameter
type signature lif indicates that x can be logical, integer, or float. If x is float, the result will
be float. If x is logical, the result will be integer (the number of T values in x, since the integer
values of T and F are 1 and 0 respectively). If x is integer, things are a bit more complex; in this
case, the result will be integer if it can fit into the integer type without overflow issues (including
during intermediate stages of the computation), otherwise it will be float.
(float)tan(numeric x)
Returns the tangent of x using the C++ function tan().
(float)trunc(float x)
Returns the truncation of x: the integral value nearest to, but no larger in magnitude than, x. Note
that the return value is float even though integral values are guaranteed, because values could be
outside of the range representable by integer.
3.2. Summary statistics functions
(+$)max(+ x)
Returns the maximum of x: the greatest value it contains. The return type will match that of x. If x
has a size of 0, the return value will be NULL.
(float$)mean(numeric x)
Returns the arithmetic mean of x: the sum of x divided by the number of values in x. If x has a size of
0, the return value will be NULL.
(+$)min(+ x)
Returns the minimum of x: the least value it contains. The return type will match that of x. If x has a
size of 0, the return value will be NULL.
(+)pmax(+ x, + y)
Returns the parallel maximum of x and y: the element-wise maximum for each corresponding pair of
elements in x and y. Both the type and the size of x and y must match, and the returned value will
have the same type and size.
(+)pmin(+ x, + y)
Returns the parallel minimum of x and y: the element-wise minimum for each corresponding pair of
elements in x and y. Both the type and the size of x and y must match, and the returned value will
have the same type and size.
(numeric)range(numeric x)
Returns the range of x, a vector of length 2 composed of the minimum and maximum values of x at
indices 0 and 1, respectively. The return type will match that of x. If x has a size of 0, the return
value will be NULL.
(float$)sd(numeric x)
Returns the corrected sample standard deviation of x. If x has a size of 0 or 1, the return value will
be NULL.
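"Corrected" here means that the sum of squared deviations is divided by n - 1 rather than by n. A minimal Python sketch of that calculation follows; it is not the Eidos implementation, and Python's None is used as a stand-in for NULL.

```python
# Python sketch (not Eidos's implementation) of the corrected sample
# standard deviation; None stands in for the Eidos NULL value.
import math

def sd(values):
    n = len(values)
    if n < 2:
        return None                       # undefined for 0 or 1 values
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))   # n - 1, not n

result = sd([2, 4, 4, 4, 5, 5, 7, 9])
print(result, sd([42]))
```

For this data the mean is 5 and the squared deviations sum to 32, so the result is the square root of 32/7.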
3.3. Vector construction functions
(*)c(...)
Returns the concatenation of all of its parameters into a single vector. The parameters will be
promoted to the highest type represented among them, and that type will be the return type. NULL
values are ignored; they have no effect on the result.
(float)float(integer$ length)
Returns a new float vector of the length specified by length, filled with 0.0 values. This can be
useful for pre-allocating a vector which you then fill with values by subscripting.
(integer)integer(integer$ length)
Returns a new integer vector of the length specified by length, filled with 0 values. This can be
useful for pre-allocating a vector which you then fill with values by subscripting.
(logical)logical(integer$ length)
Returns a new logical vector of the length specified by length, filled with F values. This can be
useful for pre-allocating a vector which you then fill with values by subscripting.
(object<undefined>)object(void)
Returns a new empty object vector. Unlike float(), integer(), logical(), and string(), a
length cannot be specified and the new vector contains no elements. This is because there is no
default value for the object type. Adding to such a vector is typically done with c().
(integer)rbinom(integer$ n, integer size, float prob)
Returns a vector of n random draws from a binomial distribution with a number of trials specified by
size and a probability of success specified by prob. The size and prob parameters may either be
singletons, specifying a single value to be used for all of the draws, or they may be vectors of length n,
specifying a value for each draw. The draws are obtained from the standard Eidos random number
generator, which might be shared with the Context. The algorithm used is from the GNU Scientific
Library.
(*)rep(* x, integer$ count)
Returns the repetition of x: the entirety of x is repeated count times. The return type matches the
type of x.
(*)repEach(* x, integer count)
Returns the repetition of elements of x: each element of x is repeated. If count is a singleton, it
specifies the number of times that each element of x will be repeated. Otherwise, the length of count
must be equal to the length of x; in this case, each element of x is repeated a number of times
specified by the corresponding value of count.
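The difference between rep() and repEach() is easiest to see side by side. The Python functions below are hypothetical sketches of the behavior described above, not the Eidos implementations.

```python
# Python sketch (not Eidos's implementation) contrasting rep() and repEach().

def rep(x, count):
    """Repeat the whole of x count times."""
    return list(x) * count

def rep_each(x, count):
    """Repeat each element of x; count is one int or one int per element."""
    counts = [count] * len(x) if isinstance(count, int) else list(count)
    if len(counts) != len(x):
        raise ValueError("count must be a singleton or match the length of x")
    return [element for element, n in zip(x, counts) for _ in range(n)]

whole = rep([1, 2, 3], 2)
each = rep_each([1, 2, 3], 2)
per_element = rep_each([1, 2, 3], [3, 0, 1])
print(whole, each, per_element)
```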
(float)rexp(integer$ n, [numeric rate])
Returns a vector of n random draws from an exponential distribution with rate parameter rate (i.e.
mean 1/rate). The rate parameter may either be a singleton, specifying a single value to be used for
all of the draws, or it may be a vector of length n, specifying a value for each draw. By default, rate
is 1.0; it must have a value greater than 0. The draws are obtained from the standard Eidos random
number generator, which might be shared with the Context. The algorithm used is from the GNU
Scientific Library.
(float)rnorm(integer$ n, [numeric mean], [numeric sd])
Returns a vector of n random draws from a normal distribution with mean mean and standard
deviation sd. The mean and sd parameters may either be singletons, specifying a single value to be
used for all of the draws, or they may be vectors of length n, specifying a value for each draw. By
default, mean is 0.0 and sd is 1.0. The draws are obtained from the standard Eidos random number
generator, which might be shared with the Context. The algorithm used is from the GNU Scientific
Library.
(integer)rpois(integer$ n, numeric lambda)
Returns a vector of n random draws from a Poisson distribution with parameter lambda (not to be
confused with the language concept of a “lambda”; lambda here is just the name of a parameter,
because the symbol typically used for the parameter of a Poisson distribution is the Greek letter λ).
The lambda parameter may either be a singleton, specifying a single value to be used for all of the
draws, or it may be a vector of length n, specifying a value for each draw. The draws are obtained
from the standard Eidos random number generator, which might be shared with the Context. The
algorithm used is from the GNU Scientific Library.
(float)runif(integer$ n, [numeric min], [numeric max])
Returns a vector of n random draws from a uniform distribution from min to max, inclusive. The min
and max parameters may either be singletons, specifying a single value to be used for all of the draws,
or they may be vectors of length n, specifying a value for each draw. By default, min is 0.0 and max
is 1.0. The draws are obtained from the standard Eidos random number generator, which might be
shared with the Context.
(*)sample(* x, integer$ size, [logical$ replace], [numeric weights])
Returns a vector of length size containing a sample from the elements of x. If replace is T, sampling is
conducted with replacement (the same element may be drawn more than once); if it is F sampling is
done without replacement. If replace is not specified, it is F by default. A vector of weights may be
supplied in weights; if supplied, it must be equal in size to x, all weights must be non-negative, and
the sum of the weights must be greater than 0. If weights is not supplied, equal weights are used for
all elements of x. An error occurs if sample() runs out of viable elements from which to draw; most
notably, if sampling is done without replacement then size must be at most equal to the size of x, but
if weights of zero are supplied then the restriction on size will be even more stringent. The draws are
obtained from the standard Eidos random number generator, which might be shared with the Context.
(numeric)seq(numeric$ from, numeric$ to, [numeric$ by])
Returns a sequence, starting at from and proceeding in the direction of to until the next value in the
sequence would fall beyond to. By default, the sequence steps by values of 1 or -1 (as needed to
proceed in the direction of to); a different step value may optionally be supplied in by. If from, to,
and by are all integer then the return type will be integer, otherwise it will be float.
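The stepping and type rules for seq() can be sketched in Python as below. This is an illustration under assumptions, not the Eidos implementation: the parameter from is renamed start because from is a reserved word in Python, and raising an error for a zero or wrong-direction step is a choice made for this sketch, since the description above does not say what Eidos does in that case.

```python
# Python sketch (not Eidos's implementation) of the seq() stepping rule.

def seq(start, to, by=None):
    if by is None:
        by = 1 if to >= start else -1     # default step heads toward to
    if by == 0 or (to - start) * by < 0:
        # Assumption of this sketch: such a step is treated as an error.
        raise ValueError("by must be nonzero and point from start toward to")
    if any(isinstance(v, float) for v in (start, to, by)):
        start = float(start)              # any float argument gives floats
    result = []
    i = 0
    while True:
        value = start + i * by            # avoids accumulating rounding error
        if (by > 0 and value > to) or (by < 0 and value < to):
            break                         # next value would fall beyond to
        result.append(value)
        i += 1
    return result

up = seq(1, 10, 3)
down = seq(5, 1)
fractional = seq(0, 1, 0.25)
print(up, down, fractional)
```

With a step such as 0.1 that cannot be represented exactly in floating point, the final value may or may not be included in this sketch.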
(integer)seqAlong(* x)
Returns an index sequence, from 0 to size(x) - 1, with a step of 1. This is a convenience function
for easily obtaining a set of indices to address or iterate through a vector.
(string)string(integer$ length)
Returns a new string vector of the length specified by length, filled with "" values. This can be
useful for pre-allocating a vector which you then fill with values by subscripting.
3.4. Value inspection & manipulation functions
(logical$)all(logical x)
Returns T if all values are T in x; if any value is F, returns F. If x is zero-length, T is returned.
(logical$)any(logical x)
Returns T if any value is T in x; if all values are F, returns F. If x is zero-length, F is returned.
(void)cat(* x, [string$ sep])
Concatenates output to Eidos’s output stream, joined together by sep. The value x that is output may
be of any type. By default, sep is a single space, " ". A newline is not appended to the output,
unlike the behavior of print(). Also unlike print(), cat() tends to emit very literal output;
print(logical(0)) will emit “logical(0)”, for example – showing a semantic interpretation of the
value – whereas cat(logical(0)) will emit nothing at all, since there are no elements in the value
(it is zero-length). Similarly, print(NULL) will emit “NULL”, but cat(NULL) will emit nothing.
(logical$)identical(* x, * y)
Returns a logical value indicating whether two values are identical. If x and y have exactly the
same type and size, and all of their corresponding elements are exactly the same, this will return T,
otherwise it will return F. The test here is for exact equality; an integer value of 1 is not considered
identical to a float value of 1.0, for example. Elements in object values must be literally the same
element, not simply identical in all of their properties. Type promotion is never done. For testing
whether two values are the same, this is generally preferable to the use of operator == or operator !=;
see the discussion at section 2.5.1. Note that identical(NULL,NULL) is T.
(*)ifelse(logical test, * trueValues, * falseValues)
Returns the result of a vector conditional operation: a vector composed of values from trueValues,
for indices where test is T, and values from falseValues, for indices where test is F. The lengths
of test, trueValues, and falseValues must be equal, and the type of trueValues and
falseValues must be the same (including, if they are object type, their element type). The return
will be of the same length and type as trueValues and falseValues. This is quite similar to a
function in R of the same name; note, however, that Eidos evaluates all arguments to function calls
immediately, so trueValues and falseValues will be evaluated fully regardless of the values in
test, unlike in R. Value expressions without side effects are therefore recommended.
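The point about immediate evaluation can be seen in a small Python model, since Python also evaluates function arguments before the call. This is an illustrative sketch only; the tracked() helper is hypothetical and exists just to record which arguments were evaluated.

```python
def ifelse(test, true_values, false_values):
    """Model of the documented ifelse() rules; not Eidos's own code."""
    if not (len(test) == len(true_values) == len(false_values)):
        raise ValueError("test, trueValues, and falseValues must be equal in length")
    return [t if flag else f
            for flag, t, f in zip(test, true_values, false_values)]

evaluated = []

def tracked(label, values):
    # Hypothetical helper: records that this argument expression was evaluated.
    evaluated.append(label)
    return values

# Every element of test is true, yet both value arguments are still evaluated.
result = ifelse([True, True, True],
                tracked("trueValues", [1, 2, 3]),
                tracked("falseValues", [-1, -2, -3]))
```

Here result holds only values from the first vector, but evaluated records both arguments; any side effect in the falseValues expression would have happened anyway.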
(integer)match(* x, * table)
Returns a vector of the positions of (first) matches of x in table. Type promotion is not performed; x
and table must be of the same type. For each element of x, the corresponding element in the result
will give the position of the first match for that element of x in table; if the element has no match in
table, the element in the result vector will be -1. The result is therefore a vector of the same length
as x. If a logical result is desired, with T indicating that a match was found for the corresponding
element of x, use (match(x, table) >= 0).
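A minimal Python model of the first-match rule and the -1 convention follows. It is a sketch of the documented behavior rather than Eidos's implementation, and it does not enforce the requirement that x and table be of the same type.

```python
def match(x, table):
    """Model of the documented match() rules; not Eidos's own code."""
    first_position = {}
    for i, value in enumerate(table):
        first_position.setdefault(value, i)   # keep only the first occurrence
    return [first_position.get(value, -1) for value in x]

positions = match([1, 5, 3], [3, 1, 3])
found = [p >= 0 for p in positions]           # the logical form mentioned above
```

In this example 3 occurs twice in table, and the position reported for it is that of its first occurrence, 0.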
(integer)nchar(string x)
Returns a vector of the number of characters in the string-elements of x.
(string$)paste(* x, [string$ sep])
Returns a joined string composed from the string representations of the elements of x, joined
together by sep. By default, sep is a single space, " ". Although this function is based upon the R
function of the same name, note that it is much simpler and less powerful; in particular, only the
elements of a single vector may be joined, rather than the var-args functionality of the R paste().
The string representation used by paste() is the same as that emitted by cat().
(void)print(* x)
Prints output to Eidos’s output stream. The value x that is output may be of any type. A newline is
appended to the output. See cat() for a discussion of the differences between print() and cat().
(*)rev(* x)
Returns the reverse of x: a new vector with the same elements as x, but in the opposite order.
(integer$)size(* x)
Returns the size of x: the number of elements contained in x.
(+)sort(+ x, [logical$ ascending])
Returns a sorted copy of x: a new vector with the same elements as x, but in ascending sorted order.
If the optional logical value ascending is F, the sorted order will be descending. The ordering is
determined according to the same logic as the < and > operators in Eidos. To sort an object vector,
use sortBy().
(object)sortBy(object x, string$ property, [logical$ ascending])
Returns a sorted copy of x: a new vector with the same elements as x, but in ascending sorted order.
If the optional logical value ascending is F, the sorted order will be descending. The ordering is
determined according to the same logic as the < and > operators in Eidos. The property argument
gives the name of the property within the elements of x according to which sorting should be done.
This must be a simple property name; it cannot be a property path. For example, to sort a Mutation
vector by the selection coefficients of the mutations, you would simply pass "selectionCoeff",
including the quotes, for property.
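The idea of sorting objects by a named property can be sketched in Python as follows. The Mutation class here is a hypothetical stand-in for a Context-defined object type, and the rejection of property paths is modeled with an assumed error; none of this is Eidos's implementation.

```python
from dataclasses import dataclass
from operator import attrgetter

@dataclass
class Mutation:
    # Hypothetical stand-in for an object type defined by the Context.
    id: int
    selectionCoeff: float

def sort_by(x, property, ascending=True):
    """Model of the documented sortBy() rules; not Eidos's own code."""
    if "." in property:
        raise ValueError("property must be a simple name, not a property path")
    return sorted(x, key=attrgetter(property), reverse=not ascending)

mutations = [Mutation(0, 0.5), Mutation(1, -0.1), Mutation(2, 0.2)]
by_coeff = sort_by(mutations, "selectionCoeff")
```

As in the text, the property is named by a string, and the original vector is left untouched; a sorted copy is returned.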
(void)str(* x)
Prints the structure of x: a summary of its type and the values it contains. If x is an object, note that
str() produces different results from the str() method of x; the str() function prints the external
structure of x (the fact that it is an object, and the number and type of its elements), whereas the
str() method prints the internal structure of x (the external structure of all the properties contained
by x).
(string)strsplit(string$ x, [string$ sep])
Returns substrings of x that were separated by the separator string sep. By default, sep is a single
space, " ". Every substring defined by an occurrence of the separator is included, and thus
zero-length substrings may be returned. For example, strsplit(".foo..bar.", ".") returns a string
vector containing "", "foo", "", "bar", "". In that example, the empty string between "foo" and
"bar" in the returned vector is present because there were two periods between foo and bar in the
input string – the empty string is the substring between those two separators. Note that paste()
performs the inverse operation of strsplit().
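Python's str.split() with an explicit separator happens to follow the same rule about empty substrings, so the example and the inverse relationship with paste() can be checked with a short model. The two function definitions are sketches for string input only and are not Eidos's implementation.

```python
def strsplit(x, sep=" "):
    """Model of strsplit(): every separator occurrence delimits a substring."""
    return x.split(sep)

def paste(x, sep=" "):
    """Model of paste() for string elements: join them with sep."""
    return sep.join(x)

parts = strsplit(".foo..bar.", ".")
rejoined = paste(parts, ".")
```

The leading and trailing empty strings in parts are what allow paste() to reproduce the original string exactly.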
(string)substr(string x, integer first, [integer last])
Returns substrings extracted from the elements of x, spanning character position first to character
position last (inclusive). Character positions are numbered from 0 to nchar(x)-1. Positions that fall
outside of that range are legal; a substring range that encompasses no characters will produce an
empty string. If first is greater than last, an empty string will also result. If last is omitted, the
substring will extend to the end of the string. The parameters first and last may either be
singletons, specifying a single value to be used for all of the substrings, or they may be vectors of the
same length as x, specifying a value for each substring.
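The range rules are easy to get wrong by one, so here is a Python model of them. It reads "positions outside the range are legal" as clamping the requested range to the string, which is an interpretation made for this sketch and not a statement about Eidos's internals.

```python
def substr(x, first, last=None):
    """Model of the documented substr() rules; not Eidos's own code."""
    n = len(x)
    firsts = first if isinstance(first, list) else [first] * n
    lasts = last if isinstance(last, list) else [last] * n
    result = []
    for s, f, l in zip(x, firsts, lasts):
        lo = max(f, 0)                                       # clamp to position 0
        hi = len(s) - 1 if l is None else min(l, len(s) - 1)  # inclusive end
        result.append(s[lo:hi + 1] if lo <= hi else "")
    return result
```

For instance, substr(["hello"], 1, 3) selects positions 1 through 3 inclusive and so yields "ell", while a first greater than last yields an empty string.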
(*)unique(* x)
Returns the unique values in x. In other words, for each value k in x that occurs at least once, the
vector returned will contain k exactly once. The order of values in x is preserved, taking the first
instance of each value.
(integer)which(logical x)
Returns the indices of T values in x. In other words, if an index k in x is T, then the vector returned
will contain k; if index k in x is F, the vector returned will omit k. One way to look at this is that it
converts from a logical subsetting vector to an integer (index-based) subsetting vector, without
changing which subset positions would be selected.
(integer)whichMax(+ x)
Returns the index of the (first) maximum value in x. In other words, if k is equal to the maximum
value in x, then the vector returned will contain the index of the first occurrence of k in x. If the
maximum value is unique, the result is the same as (but more efficient than) the expression
which(x==max(x)), which returns the indices of all of the occurrences of the maximum value in x.
(integer)whichMin(+ x)
Returns the index of the (first) minimum value in x. In other words, if k is equal to the minimum
value in x, then the vector returned will contain the index of the first occurrence of k in x. If the
minimum value is unique, the result is the same as (but more efficient than) the expression
which(x==min(x)), which returns the indices of all of the occurrences of the minimum value in x.
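The difference between whichMax() and which(x==max(x)) shows up only when the maximum is not unique. A small Python model, offered as an illustration rather than as Eidos's implementation:

```python
def which(x):
    """Model of which(): indices at which the logical vector is true."""
    return [i for i, flag in enumerate(x) if flag]

def which_max(x):
    """Model of whichMax(): index of the first maximum value."""
    return x.index(max(x))

values = [3, 9, 2, 9]
all_maxima = which([v == max(values) for v in values])
first_maximum = which_max(values)
```

With the maximum 9 occurring twice, all_maxima contains both indices whereas first_maximum is only the first of them.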
3.5. Value type testing and coercion functions
(float)asFloat(+ x)
Returns the conversion to float of x. If x is string and cannot be converted to float, Eidos will
throw an error.
(integer)asInteger(+ x)
Returns the conversion to integer of x. If x is of type string or float and cannot be converted to
integer, Eidos will throw an error.
(logical)asLogical(+ x)
Returns the conversion to logical of x. Recall that in Eidos the empty string "" is considered F,
and all other string values are considered T. Converting INF or -INF to logical yields T (since
those values are not equal to zero); converting NAN to logical throws an error.
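The conversion rules for asLogical() can be summarized in a short Python model. It assumes, following the remark about INF above, that numeric values convert to T exactly when they are not equal to zero; the function is a sketch and not Eidos's implementation.

```python
import math

def as_logical(x):
    """Model of the documented asLogical() rules; not Eidos's own code."""
    result = []
    for value in x:
        if isinstance(value, str):
            result.append(value != "")        # only the empty string is false
        elif isinstance(value, float) and math.isnan(value):
            raise ValueError("NAN cannot be converted to logical")
        else:
            result.append(value != 0)         # numeric: true unless zero
    return result
```

Note in particular that the strings "F" and "0" are both considered T under this rule, since neither is the empty string.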
(string)asString(+ x)
Returns the conversion to string of x.
(string$)elementType(* x)
Returns the element type of x, as a string. For the non-object types, the element type is the same
as the type: "NULL", "logical", "integer", "float", or "string". For object type, however,
elementType() returns the name of the type of element contained by the object, such as "SLiMSim"
or "Mutation" in the Context of SLiM. Contrast this with type().
(logical$)isFloat(* x)
Returns T if x is float type, F otherwise.
(logical$)isInteger(* x)
Returns T if x is integer type, F otherwise.
(logical$)isLogical(* x)
Returns T if x is logical type, F otherwise.
(logical$)isNULL(* x)
Returns T if x is NULL type, F otherwise.
(logical$)isObject(* x)
Returns T if x is object type, F otherwise.
(logical$)isString(* x)
Returns T if x is string type, F otherwise.
(string$)type(* x)
Returns the type of x, as a string: "NULL", "logical", "integer", "float", "string", or
"object". Contrast this with elementType().
3.6. Filesystem access functions
(string)filesAtPath(string$ path, [logical$ fullPaths])
Returns a string vector containing the names of all files in a directory specified by path. If
fullPaths is T, full filesystem paths are returned for each file; if fullPaths is F or omitted, only the
filenames relative to the specified directory are returned. This list includes directories (i.e. subfolders),
including the "." and ".." directories on Un*x systems. The list also includes invisible files, such as
those that begin with a "." on Un*x systems. This function does not descend recursively into
subdirectories. If an error occurs during the read, NULL will be returned.
(string)readFile(string$ filePath)
Reads in the contents of a file specified by filePath and returns a string vector containing the
lines (separated by \n and \r characters) of the file. Reading files other than text files is not presently
supported. If an error occurs during the read, NULL will be returned.
(logical$)writeFile(string$ filePath, string contents)
Writes out a new file to filePath with contents specified by contents, a string vector of lines.
Note that newline characters will be added at the ends of the lines in contents. If you do not wish to
have newlines added, you should use paste() to assemble the elements of contents together into a
singleton string. If the write is successful, T will be returned; if not, F will be returned.
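The conventions of these two functions (a vector of lines, newlines added on write, and a failure value rather than an error) can be modeled in Python as below. This is a sketch under those stated conventions, not Eidos's implementation; Python's splitlines() also recognizes a few additional line separators beyond \n and \r.

```python
def write_file(file_path, contents):
    """Model of writeFile(): one line per element, newline added to each."""
    try:
        with open(file_path, "w") as f:
            for line in contents:
                f.write(line + "\n")
        return True
    except OSError:
        return False                          # failure is reported, not raised

def read_file(file_path):
    """Model of readFile(): the lines of the file, or None on failure."""
    try:
        with open(file_path) as f:
            return f.read().splitlines()
    except OSError:
        return None                           # stands in for NULL
```

Because failures are reported through the return value, a script should test the result (for NULL or F, in Eidos terms) before continuing.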
3.7. Miscellaneous functions
(*)apply(* x, string$ lambdaSource)
Applies a block of Eidos code to the elements of a vector. This function is sort of a hybrid between
c() and executeLambda(); it might be useful to consult the documentation for both of those
functions to better understand what apply() does. For each element in x, the lambda defined by
lambdaSource will be called. For the duration of that callout, a variable named applyValue will be
defined to have as its value the element of x currently being processed. The expectation is that the
lambda will use applyValue in some way, and will return either NULL or a new value (which need
not be a singleton, and need not be of the same type as x). The return value of apply() is generated
by concatenating together all of the individual vectors returned by the lambda, in exactly the same
manner as the c() function (including the possibility of type promotion).
Since this function can be hard to understand at first, here is an example:
apply(1:10, "if (applyValue % 2) applyValue ^ 2;");
This produces the output 1 9 25 49 81. The apply() operation begins with the vector 1:10. For
each element of that vector, the lambda is called and applyValue is defined with the element value.
In this respect, apply() is actually very much like a for loop. If applyValue is even (as evaluated
by the modulo operator, %), the condition of the if statement is F and so NULL is implicitly returned
by the lambda (since the if has no else clause). If applyValue is odd, on the other hand, the
lambda returns its square (as calculated by the exponential operator, ^). Just as with the c() function,
NULL values are dropped during concatenation, so the final result contains only the squares of the odd
values.
This example illustrates that the lambda can “drop” values by returning NULL, so apply() can be used
to select particular elements of a vector that satisfy some condition, much like the subscript operator,
[]. The example also illustrates that input and result types do not have to match; the vector passed in
is integer, whereas the result vector is float.
There is no scoping in Eidos, so as with executeLambda(), all defined variables are accessible within
the lambda, and changes made to variables inside the lambda will persist beyond the end of the
apply() call; the lambda is executing in the same scope as the rest of your code.
The apply() function can seem daunting at first, but it is an essential tool in the Eidos toolbox. It
combines the iteration of a for loop, the ability to select elements like operator [], and the ability to
assemble results of mixed type together into a single vector like c(), all with the power of arbitrary
Eidos code execution like executeLambda(). It is much faster than calling executeLambda() on
each element of a vector using a for loop; if the alternative to using apply() is a for loop that adds
new values to a result vector one at a time, apply() is also likely to be much faster. Like
executeLambda(), apply() is most efficient if it is called multiple times with a single string script
variable, rather than with a newly constructed string for lambdaSource each time.
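For readers who find a model helpful, the concatenate-and-drop-NULL behavior of apply() can be sketched in Python. In this sketch a Python callable stands in for the lambdaSource string, None stands in for NULL, and type promotion is not modeled; it illustrates the described behavior and is not Eidos's implementation.

```python
def apply(x, fn):
    """Model of the documented apply() rules; not Eidos's own code."""
    result = []
    for apply_value in x:
        value = fn(apply_value)
        if value is None:                     # NULL results are dropped, as with c()
            continue
        if isinstance(value, (list, tuple)):
            result.extend(value)              # results need not be singletons
        else:
            result.append(value)
    return result

# The example from the text: squares of the odd values in 1:10, as float.
squares = apply(range(1, 11), lambda v: float(v) ** 2 if v % 2 else None)
```

The same function can also expand a vector: a callable that returns two values per element produces a result twice as long as x.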
(string$)date(void)
Returns a standard date string for the current date in the local time of the executing machine. The
format is %d-%m-%Y (day in two digits, then month in two digits, then year in four digits, zero-padded
and separated by dashes) regardless of the localization of the executing machine, for predictability
and consistency.
(*)doCall(string$ function, ...)
Returns the results from a call to a specified function. The function named by the parameter
function is called, and the remaining parameters to doCall() are forwarded on to that function
verbatim. This can be useful for calling one of a set of similar functions, such as sin(), cos(), etc.,
to perform a math function determined at runtime, or one of the as...() family of functions to
convert to a type determined at runtime.
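Calling a function chosen by name at runtime amounts to a lookup followed by argument forwarding. A minimal Python sketch of that idea follows; the FUNCTIONS table and the error for unknown names are assumptions of the sketch, not a description of how Eidos resolves function names.

```python
import math

# Hypothetical registry standing in for the set of available functions.
FUNCTIONS = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}

def do_call(function, *args):
    """Model of doCall(): look the function up by name, forward the arguments."""
    if function not in FUNCTIONS:
        raise ValueError("unrecognized function name: " + function)
    return FUNCTIONS[function](*args)

chosen = "sqrt"                               # determined at runtime in real use
value = do_call(chosen, 9.0)
```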
(*)executeLambda(string$ lambdaSource, [logical$ timed])
Executes a block of Eidos code defined by lambdaSource. It has been said at various points in this
manual that Eidos does not allow you to define your own functions. That is, strictly speaking, true.
However, it does allow you to execute lambdas: blocks of Eidos code which can be called. Eidos
lambdas do not take arguments; for this reason, they are not first-class functions. (They share the
scope of the caller, however, so you may effectively pass values in and out of a lambda using global
variables.) The string argument lambdaSource may contain one or many Eidos statements as a
single string value. Lambdas are represented, to the caller, only as the source code string
lambdaSource; the executable code is not made available programmatically. If an error occurs
during the tokenization, parsing, or execution of the lambda, that error is raised as usual; executing
code inside a lambda does not provide any additional protection against exceptions raised. The
return value produced by the code in the lambda is returned by executeLambda(). If the optional
parameter timed is T, the total (user clock) execution time for the lambda will be printed after the
lambda has completed; the default for timed is F.
The current implementation of executeLambda() caches a tokenized and parsed version of
lambdaSource, so calling executeLambda() repeatedly on a single source string is much more
efficient than calling executeLambda() with a newly constructed string each time. If you can use a
string literal for lambdaSource, or reuse a constructed source string stored in a variable, that will
improve performance considerably.
(void)function([string$ functionName])
Prints function signatures for all functions, or for the function named by functionName, to Eidos’s
output stream. See section 2.7.3 for more information.
(integer$)getSeed(void)
Returns the random number seed. This is the last seed value set using setSeed(); if setSeed() has
not been called, it will be a seed value chosen based on the process-id and the current time when
Eidos was initialized, unless the Context has set a different seed value.
(void)license(void)
Prints Eidos’s license terms to Eidos’s output stream.
(void)ls(void)
Prints all currently defined variables to Eidos’s output stream. See section 2.4.1 for more information.
(void)rm([string variableNames])
Removes global variables from the Eidos namespace; in other words, it causes the variables to
become undefined. Variables are specified by their string name in the variableNames parameter.
If the optional variableNames parameter is omitted, all variables will be removed (be careful!).
Attempting to remove a constant is an error.
(void)setSeed(integer$ seed)
Sets the random number seed. Future random numbers will be based upon the seed value set, and the
random number sequence generated from a particular seed value is guaranteed to be reproducible.
The last seed set can be recovered with the getSeed() function.
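The reproducibility guarantee is the same one offered by most seeded generators. As an analogy only, using Python's random module rather than the Eidos generator:

```python
import random

random.seed(42)
first_run = [random.random() for _ in range(3)]

random.seed(42)                               # same seed, so the same sequence
second_run = [random.random() for _ in range(3)]
```

Here first_run and second_run are equal element for element, which is the property that makes a seeded script repeatable.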
(void)stop([string$ message])
Stops execution of Eidos (and of the Context, such as the running SLiM simulation, if applicable), in
the event of an error. If the optional message parameter is supplied it will be printed to Eidos’s output
stream prior to stopping.
(string$)time(void)
Returns a standard time string for the current time in the local time of the executing machine. The
format is %H:%M:%S (hour in two digits, then minute in two digits, then seconds in two digits,
zero-padded and separated by colons) regardless of the localization of the executing machine, for
predictability and consistency. The 24-hour clock time is used (i.e., no AM/PM).
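The format strings given for date() and time() use the conventions of the C strftime() function, so the expected shape of both results can be previewed in Python. The helper names below are hypothetical, and the sketch shows only the formats, not Eidos's implementation.

```python
import time

def eidos_style_date(moment=None):
    """Format a local time as %d-%m-%Y, the format documented for date()."""
    if moment is None:
        moment = time.localtime()
    return time.strftime("%d-%m-%Y", moment)

def eidos_style_time(moment=None):
    """Format a local time as %H:%M:%S, the format documented for time()."""
    if moment is None:
        moment = time.localtime()
    return time.strftime("%H:%M:%S", moment)

# A fixed moment for illustration: 7 October 2015, 14:05:09 local time.
example = time.struct_time((2015, 10, 7, 14, 5, 9, 2, 280, 0))
```

For the example moment, the two helpers give 07-10-2015 and 14:05:09 respectively.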
(void)version(void)
Prints Eidos’s version to Eidos’s output stream.
3.8. Built-in methods
These methods are built into Eidos, although Eidos has no object classes of its own. All objects
defined by the Context will automatically inherit these methods.
+ (void)method([string$ methodName])
Prints the method signature for the method specified by methodName, or for all methods supported by
the receiving object if methodName is not supplied.
+ (void)property([string$ propertyName])
Prints the property signature for the property specified by propertyName, or for all properties
supported by the receiving object if propertyName is not supplied.
– (void)str(void)
Prints the internal property structure of the receiving object; in particular, the element type of the
object is printed, followed, on successive lines, by all of the properties supported by the object, their
types, and a sample of their values.
4. EidosScribe
4.1 EidosScribe overview
EidosScribe is a Mac OS X application that provides an interactive scripting environment for
Eidos. It’s a very simple app, so this manual will provide only a very quick overview of it.
All the action in EidosScribe happens in one window, the scripting window: [Added since this:
the variable browser, the status bar showing function/method signatures.]
[Screenshot: the EidosScribe scripting window, with numbered callouts 1 through 10.]
The numbers in this screenshot mark controls and areas of the EidosScribe window:
1. the Console area,
2. the Script area,
3. the Check Script button,
4. the Eidos Help button,
5. the Execute Selection button,
6. the Execute File button,
7. the Clear Console button,
8. the Show Tokens button,
9. the Show AST button, and
10. the Show Execution Trace button.
4.2 Interactive scripting
The Console area (1) is the most important part of the EidosScribe window for interactive
scripting. The > prompt shown by EidosScribe indicates that it is ready for new input. Input may
be entered at the prompt; that single line may contain more than one statement, but it must be
syntactically complete. In other words, unlike some other interpreter environments, a partial
statement will not result in a continuation prompt requesting further lines of input; it will instead
result in a parsing error. Multi-line input can be entered using option-Return (or option-Enter) to
insert newlines; the input will not be processed until Return or Enter is pressed.
Given syntactically and semantically correct input, EidosScribe will tokenize, parse, and
execute the input, and will show the output and results in the Console. Color-coding is used to
differentiate among the different blocks of text in the Console; blue text is user input, whereas
black text is output from Eidos. The Console area can be cleared by pressing the Clear Console
button (7).
The Console remembers its history of previously executed commands. You may flip through
that history using the up and down arrow keys. This is particularly useful when a line of input
generated an error due to a typo; you can press up-arrow and edit the line to fix the problem.
Note that for interactive scripting, a semicolon is not required at the end of your input line to
terminate your last statement. So 6+7 is acceptable input, and will be executed as 6+7; after
correction. This is intended to be a convenience feature for quick interactive sessions;
remembering the semicolons can be frustrating, especially for those coming from languages such
as R that do not require them.
4.3 File-based script execution
EidosScribe also allows for programming based upon a script file. The Script area (2) shows
your script file. You may enter any text you wish in that area; it is of no significance until you tell
EidosScribe to execute it in the Console.
To execute a block of your script, select it and press the Execute Selection button (5). If you
wish to execute a single line of script, you can simply place the insertion point anywhere within
that line and press Execute Selection, and the full line will be executed. You can also execute the
full contents of your Script area by pressing the Execute File button (6).
You can check the syntax of your script at any time by pressing the Check Script button (3).
This runs the tokenizer and parser on your script, so syntax errors will be found and flagged. It
does not, however, execute your script, so semantic errors such as supplying the wrong number of
parameters to a function or mixing incompatible types will not be found; those are runtime errors
that are only found when your script is actually executed.
Help with Eidos can be brought up by pressing the Eidos Help button (4). In fact, that button
may well bring up this very manual; but perhaps something briefer and more immediately helpful
will appear, depending upon the version of EidosScribe you are using.
Unlike in interactive scripting (section 4.2), file-based script execution requires a semicolon at
the end of every statement; EidosScribe will not correct your script prior to execution. Semicolons
are, in fact, a required part of the grammar of Eidos, and the cheap hack that EidosScribe uses to
add them in interactive mode would not work well as a general solution.
4.4 Code completion
It can be hard to remember the names of all of the properties and methods exported by Eidos
objects, as well as the names of the functions and global variables. For this reason, EidosScribe
provides a code completion mechanism to assist you in programming. The keyboard command to
initiate code completion is the Escape key (␛); Command-period (⌘.) may also work depending
upon your key bindings.
The situations in which code completion will work are somewhat limited. If the selection is at
a point in your script where a new statement is beginning (after a semicolon, ;, or a right brace, },
for example), you will be offered a list of all of the global constants and variables, functions, and
statement keywords available. If the selection is in an object key path such as sim.chromosome.
you will be offered a list of the properties and methods supported for that key path. In both of
these circumstances, a partial identifier just prior to the selection will be used to filter the choices
available.
For example, if you begin a new statement and type si and then press ␛, you will be offered
completions beginning with “si”: sim (the SLiMSim simulation global variable), if you are using
Eidos in SLiM, and sin() and size(), the two functions whose names begin with “si”. Similarly, if
(in SLiMgui, in fact, not in EidosScribe) you type sim.chromosome.r and then press ␛, you will be
offered completions based upon the Chromosome object type that begin with “r”:
recombinationEndPositions and recombinationRates.
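The filtering step described here is a simple prefix match over the candidate names. A Python sketch of that step follows; the candidate list is a small illustrative subset, and nothing here reflects how EidosScribe gathers or orders its candidates.

```python
def completions(candidates, partial):
    """Keep the candidate names that begin with the partial identifier."""
    return sorted(name for name in candidates if name.startswith(partial))

# Illustrative subset of names; sim would be present only when running in SLiM.
names = ["sim", "sin", "size", "sqrt", "sum", "seq"]
offered = completions(names, "si")
```

With the partial identifier "si", the names offered are sim, sin, and size, matching the example in the text.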
Eidos is generally aware of the types produced by functions, methods, and properties; it uses
the same information that it uses to provide the output for the function() function (section 2.7.3)
and the method() and property() methods (section 2.8.6). In some cases, that information is
insufficient; the sortBy() function returns an object, for example, but it is not known what class
that object will be until the code is actually executed. In such cases, code completion cannot
provide suggestions. In most cases, however, it has the information it needs.
Code completion is, however, based upon your current interpreter state – the global variables
that are defined, their values, and the state of the Context you are in (such as the SLiM simulation
that is running). This is true even if you are working in EidosScribe’s Script area. You might type
xyzzy = 17; there, and then on the next line type xyz and press ␛ – and get no completions.
Why? Because you have not actually executed the xyzzy = 17; statement, and so the variable
xyzzy is not actually defined. If you wish to use code completion in such cases, you should
execute each line after you finish it in order to keep EidosScribe’s interpreter state up to date
regarding the variables you have defined; completion off of them should then work correctly.
Even if you can’t execute the actual code you’re working on, you might be able to set a dummy
value for a variable so that completion knows its type and can look up its properties and methods.
4.5 Debugging controls
EidosScribe provides some facilities for examining the internals of Eidos. This is intended
primarily for debugging, but might also be of interest to those who are curious about how the
language is implemented.
4.5.1 Showing tokenization
The Show Tokens button (8), if pressed, causes the token stream obtained from the user’s input
to be displayed in the Console. Tokens are small chunks into which the input is broken prior to
processing. In the English language, tokens would be words and punctuation marks; in Eidos,
tokens are operators, numeric and string literals, function names, variable names, and such.
Showing tokens allows you to verify that your input was broken into tokens correctly. For
example:
> x = 3+7*3^2
@x = #3 + #7 * #3 ^ #2 ; EOF
You can see that x was tokenized as an identifier (marked with an @), the numbers were
tokenized as numeric literals (and thus marked with #), the operators were given their own tokens,
and the end of the input was marked with an EOF (End Of File) token.
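To make the idea of tokenization concrete, here is a toy tokenizer in Python that reproduces the display shown above for this one kind of input. It handles only numbers, identifiers, and a few operators, and the regular expression and function name are assumptions of the sketch; Eidos's real tokenizer is more complete.

```python
import re

# Numbers, identifiers, two-character comparison operators, then any other symbol.
TOKEN = re.compile(r"\s*(?:(\d+\.?\d*)|([A-Za-z_]\w*)|(==|!=|<=|>=|\S))")

def show_tokens(source):
    """Return a token display in the style shown above (toy model only)."""
    if not source.rstrip().endswith(";"):
        source += ";"                         # the interactive prompt adds this
    shown = []
    for number, identifier, operator in TOKEN.findall(source):
        if number:
            shown.append("#" + number)        # numeric literal
        elif identifier:
            shown.append("@" + identifier)    # identifier
        else:
            shown.append(operator)            # operators get their own tokens
    return " ".join(shown + ["EOF"])

display = show_tokens("x = 3+7*3^2")
```

The value of display is the same string as the second line of the example above.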
4.5.2 Showing the abstract syntax tree (AST)
The Show AST button (9), if pressed, causes the AST (Abstract Syntax Tree) generated from the
token stream to be shown. The AST is an intermediate representation of the user’s input as a tree,
where operators are nodes in the tree and their operands are their children. The nesting of the tree
is shown with parentheses. So for the input above, we would have:
> x = 3+7*3^2
($>
  (=
    @x
    (+
      #3
      (*
        #7
        (^ #3 #2)
      )
    )
  )
)
The deepest node is the exponentiation, 3^2. The next node up multiplies the result of that
exponentiation by 7, and so forth. At the top of the tree is the parent node for the whole input
line, which is marked with the special designator $> representing the EidosScribe prompt. You can
examine the AST to confirm that the correct grouping and precedence of operations is being
expressed by the tree.
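How precedence turns a flat token stream into such a tree can be sketched with a small precedence-climbing parser in Python. The relative precedence of the four operators follows the reference sheet in this manual; the associativity entries, the function names, and the tuple representation are assumptions of the sketch, and the top-level $> node is omitted.

```python
import re

# Higher numbers bind more tightly; associativity is assumed for this sketch.
PRECEDENCE = {"=": (1, "right"), "+": (2, "left"), "*": (3, "left"), "^": (4, "right")}

def tokenize(source):
    return re.findall(r"\d+|[A-Za-z_]\w*|\S", source)

def parse(tokens, min_precedence=1):
    """Build a nested (operator, left, right) tuple from the token list."""
    token = tokens.pop(0)
    left = "#" + token if token.isdigit() else "@" + token
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]][0] >= min_precedence:
        operator = tokens.pop(0)
        precedence, associativity = PRECEDENCE[operator]
        next_min = precedence + 1 if associativity == "left" else precedence
        left = (operator, left, parse(tokens, next_min))
    return left

def show(node):
    """Render the tree with parentheses, operator first, on a single line."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [show(child) for child in node[1:]]) + ")"

tree = parse(tokenize("x = 3+7*3^2"))
one_line = show(tree)
```

The string one_line is (= @x (+ #3 (* #7 (^ #3 #2)))), the same nesting as the tree shown above written on one line.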
4.5.3 Showing the evaluation trace
The Show Execution Trace button (10), if pressed, causes EidosScribe to output a trace of Eidos’s
internal execution as it processes the AST, evaluates it, and generates a result. Evaluating the AST
depends upon an algorithm that recursively “walks” the AST, visiting parents and then visiting
each of their children in turn. As each child is evaluated, it generates a return value; those return
values are collected, and once assembled together, allow the evaluation of the parent, which can
then generate its own return value. The execution trace shows entry into, and exit out of, each
node in the AST as it is walked, as well as each intermediate return value generated. Since this
execution trace refers to functions and state that is internal to the implementation of Eidos, it may
be of limited utility; at a minimum, however, it can be used to confirm that execution of the AST
results in a call sequence that makes logical sense. For example, for the example above, the
execution trace would be:
> x = 3+7*3^2
EvaluateInterpreterBlock() entered
EvaluateNode() : token =
Evaluate_Assign() entered
Evaluate_LValueReference() : token @x
EvaluateNode() : token +
Evaluate_Plus() entered
EvaluateNode() : token #3
Evaluate_Number() entered
Evaluate_Number() : return == 3
EvaluateNode() : token *
Evaluate_Mult() entered
EvaluateNode() : token #7
Evaluate_Number() entered
Evaluate_Number() : return == 7
EvaluateNode() : token ^
Evaluate_Exp() entered
EvaluateNode() : token #3
Evaluate_Number() entered
Evaluate_Number() : return == 3
EvaluateNode() : token #2
Evaluate_Number() entered
Evaluate_Number() : return == 2
Evaluate_Exp() : return == 9
Evaluate_Mult() : return == 63
Evaluate_Plus() : return == 66
Evaluate_Assign() : return == NULL
EvaluateInterpreterBlock() : return == NULL
At the deepest level, for example, you can see how the exponentiation, 3^2, is evaluated by first
evaluating each operand (getting 3 and 2 as results), and then performing an exponentiation of 3
and 2 to produce a result of 9. That 9 is then passed up as an operand to its parent,
Evaluate_Mult, which evaluates the multiplication of 7 and 3^2. When the recursive tree walk
completes, the topmost node returns NULL, because the assignment operator always generates NULL
as its result (the assignment of a value to a variable is a side effect).
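The recursive walk can be modeled in a few lines of Python. The tuple form of the tree, the trace wording, and the use of None for NULL are all assumptions of this sketch; it mirrors the order of evaluation described above and is not Eidos's interpreter.

```python
# The tree for x = 3+7*3^2, written as nested (operator, left, right) tuples.
AST = ("=", "x", ("+", 3, ("*", 7, ("^", 3, 2))))

OPERATIONS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "^": lambda a, b: a ** b,
}

def evaluate(node, variables, trace, depth=0):
    """Walk the tree: evaluate the children first, then the parent."""
    pad = "  " * depth
    if isinstance(node, (int, float)):
        trace.append(f"{pad}number : return == {node}")
        return node
    operator, left, right = node
    trace.append(f"{pad}{operator} entered")
    if operator == "=":
        variables[left] = evaluate(right, variables, trace, depth + 1)
        result = None                         # assignment itself yields NULL
    else:
        a = evaluate(left, variables, trace, depth + 1)
        b = evaluate(right, variables, trace, depth + 1)
        result = OPERATIONS[operator](a, b)
    trace.append(f"{pad}{operator} : return == {result}")
    return result

variables, trace = {}, []
final = evaluate(AST, variables, trace)
```

After the walk, variables holds x with the value 66, final is None (the model's NULL), and trace records entry to and exit from each node in the same order as the listing above.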
If all that made no sense, it doesn’t matter. It matters mainly to those debugging Eidos and
EidosScribe, although an understanding of it might be useful in writing scripts, too.
5. Eidos language reference sheet
Types (in promotion order):
NULL: no value
logical: true/false values
integer: whole numbers
float: real numbers
string: characters
object: SLiM objects
Constants:
E: e (2.7182...)
PI: π (3.1415...)
F: false (logical)
T: true (logical)
INF: infinity (float)
NAN: Not a Number (float)
NULL: a NULL-type value
Operators (precedence order):
[], (), .
subset, function/method call, member access
+, -, !
unary plus/minus, logical (Boolean) negation
^
exponentiation
:
sequence construction
*, /, %
multiplication, division, modulo
+, -
addition and subtraction
<, >, <=, >=
less-than, greater-than, etc.
==, !=
equality and inequality
&
logical (Boolean) and
|
logical (Boolean) or
=
assignment
Statements:
;
empty statement (null statement, no-op)
{ ... }
compound statement (0 or more statements)
Special Statements (Conditional, Loop, Control-Flow):
if (condition) statement
conditional (executed if condition is T)
if (condition) statement else statement
conditional with alternative (executed if F)
while (condition) statement
loop with condition test at top
do statement while (condition)
loop with condition test at bottom
for (identifier in vector) statement
loop over the values in a vector
next
skip the rest of this time around a loop
break
terminate loop execution and exit
return
exit a script block, returning a value
Math:
abs(): absolute value
ceil(): ceiling (round up)
cos(): cosine
exp(): base-e exponential
floor(): floor (round down)
isFinite(): is a value finite?
isNAN(): is a value NAN?
log(): base-e log
product(): prod. of elements
round(): round to nearest
sin(): sine
sqrt(): square root
sum(): sum of elements
trunc(): round toward 0
Type testing / coercion:
as...() : convert value to ...
element() : get element type
is...() : is value of type ...?
type() : get vector type
Bookkeeping:
date(): get the current date
function(): function signatures
globals(): global variables
help(): get help
rm(): remove a variable
stop(): stop execution
Vector construction:
c(): concatenate vectors
float(): new float vector
integer(): new integer vector
logical(): new logical vector
rbinom(): binomial draws
rep(): repeat a vector
repEach(): repeat elements
rpois(): Poisson draws
runif(): uniform draws
seq(): construct a sequence
seqAlong(): seq. along vector
string(): new string vector
Inspection / manipulation:
all(): are all values T?
any(): are any values T?
cat(): concatenate output
ifelse(): vector conditional
paste(): paste together a string
print(): print output
rev(): reverse a vector
size(): number of elements in vector
sort(): sort a vector
str(): print the structure of a value
which(): indices which are T
whichMax(): indices with max value
whichMin(): indices with min value
Summary statistics:
max(): largest value in a vector
mean(): mean of a vector
min(): smallest value in a vector
range(): range (min/max) of a vector
sd(): standard deviation of a vector
6. Railroad diagrams
These “railroad diagrams” represent Eidos’s grammar in a graphical form equivalent to its EBNF
grammar. This grammar has one conflict, due to the if–else ambiguity as usual; this is resolved
by associating the else clause with the closest preceding if statement to which it could apply.
6.1 Start rule
This is Eidos’s start rule, used in the interactive interpreter; note that a semicolon terminating the final
statement may be optional. This is not the start rule used by SLiM, however; SLiM defines a modified Eidos
grammar with a different start rule for use in parsing SLiM input files, as described in section 6.5.
6.2 Statements
6.3 Expressions
Note that the left-hand side of an assignment statement is restricted at runtime to be an lvalue expression,
an expression that identifies a specific set of defined values to be modified. For example, x is an lvalue
expression since x is a specific identifier whose value can be (re)defined, but x*y is not. Since this
restriction is checked at runtime, the grammar shown is correct, but it does not express this restriction.
Note also that assignment is not valid in Eidos expressions, in general; it is only allowed in the context of an
expr_statement. Eidos differs from many other languages, such as C, in this respect; constructs like
if (x=y) ... are not legal in Eidos, for safety and simplicity (see section 2.4.1).
6.4 Tokens
Whitespace is not significant in Eidos, so these tokens are removed from the token stream. It may look odd
that whitespace is shown as the range '\t'..'\n'; in fact those two characters are adjacent in ASCII (codes 9
and 10), so the range contains only the tab and the newline.
Note that integer and float are not different token types; both tokenize and parse as numeric literals,
and the distinction between them is made when the AST is interpreted. Upon interpretation, numbers with
a decimal point or a negative exponent are taken to be float; all other numeric literals are taken to be
integer. Note that this means that integer literals in Eidos may have a positive exponent.
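The classification rule just stated can be sketched as a small function. This is a sketch of the rule as described, not Eidos's own code: ToyNumericKind and ClassifyNumericLiteral() are hypothetical names, and the function assumes it is handed the text of a well-formed numeric literal (a leading minus sign is a separate unary operator, not part of the literal).

```cpp
#include <cstddef>
#include <string>

enum class ToyNumericKind { kInteger, kFloat };

// Applies the stated rule: a decimal point or a negative exponent makes the
// literal a float; every other numeric literal is an integer.
ToyNumericKind ClassifyNumericLiteral(const std::string &text) {
    if (text.find('.') != std::string::npos)
        return ToyNumericKind::kFloat;

    // Look for an exponent marker followed immediately by a minus sign.
    std::size_t e_pos = text.find_first_of("eE");
    if (e_pos != std::string::npos && e_pos + 1 < text.size() &&
        text[e_pos + 1] == '-')
        return ToyNumericKind::kFloat;

    return ToyNumericKind::kInteger;
}
```

Under this rule 1e3 and 1E+3 classify as integer, while 1.0, 1.5e3, and 1e-3 classify as float.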
This railroad diagram is not strictly correct. The lower path does not allow every possible Unicode
character; backslash, quote, newline, and carriage return are excluded. Literal newlines and carriage
returns are thus not legal within string literals; they must be escaped as \n and \r, respectively. Quotes
and backslashes must also be escaped, but tabs may be included literally. Single-quoted string literals
and multiline string literals are not shown in this railroad diagram; see section 2.1.4.
6.5 SLiM extensions to the Eidos grammar
The Eidos language itself is defined by the grammar shown in the previous sections. SLiM, however,
defines its own grammar for SLiM input files, which is a small extension of the Eidos grammar; in
particular, it uses a different start rule than Eidos’s interpreter normally uses:
This start rule defines a SLiM input file as a series of zero or more SLiM Eidos blocks, each of which is a
compound statement preceded by an informational section and a callback declaration. The informational
section is optional, since each of its components is optional; it looks like this:
The informational section begins with an optional identifier that can be used to later identify the script
block programmatically. If supplied, it should be an identifier like "s1", or more generally, "sX" where X
is an integer greater than or equal to 0. The rest of the informational section comprises an optional
generation or range of generations in which the script block will be used by SLiM. The generation numbers
are defined syntactically by the grammar as numeric literals, but semantically, there are further restrictions
(see section 5.13.1).
The callback declaration section is also an addition by SLiM to the base Eidos grammar. It is also optional,
since it can be empty:
This rule defines a SLiM script block as being either a simple compound statement without a callback
declaration – an Eidos event – or one of the supported types of Eidos callback (initialize(), fitness(),
mateChoice(), or modifyChild()). The identifier tokens in this rule specify restrictions on the
circumstances in which the callback will be used by SLiM. See SLiM’s documentation for further details on
Eidos events and Eidos callbacks in SLiM.
In all other respects the grammar of Eidos is unmodified in SLiM. It will probably be common for the
Context within which Eidos is used, such as SLiM, to define grammar extensions of this sort, since it will
often be desirable for several Eidos script blocks to be defined within a single text file. The SLiM extended
grammar has thus been described here as an example of how a Context might use Eidos within a broader
file format.
7. Acknowledgements
Thanks to the designers of the C, R, and Objective-C languages, which paved the way for Eidos
in so many ways. Thanks to Terence Parr for his excellent book Language Implementation
Patterns, which guided much of the implementation of Eidos’s tokenizer and parser; and thanks to
him also for his grammar of the C language, which provided guidance in the development of
Eidos’s grammar. Thanks to Jean Bovet and Terence Parr for ANTLRWorks, which was very
helpful in designing Eidos’s grammar, and which generated the railroad diagrams used in this
manual. Thanks to Philipp Messer for believing me when I said that adding a whole language to
SLiM would make it much cooler.
8. References
Cox, B. J., & Novobilski, A. J. (1991). Object-oriented programming: An evolutionary approach
[Second edition]. Reading, MA: Addison-Wesley.
Kernighan, B. W., & Ritchie, D. M. (1988). The C programming language [Second edition].
Englewood Cliffs, NJ: Prentice-Hall.
R Core Team (2014). R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Bovet, J., & Parr, T. (2008). ANTLRWorks: An ANTLR grammar development environment.
Software: Practice and Experience, 38(12), 1305-1332.
Parr, T. (2009). Language Implementation Patterns: Create your own domain-specific and general
programming languages. Frisco, TX: Pragmatic Bookshelf.
PART II: USING EIDOS IN A NEW CONTEXT
This second part of the Eidos manual is intended for developers who would like to add Eidos to
their own program, to provide scriptability and extensibility. If you are just using Eidos in an
existing Context – such as using it to script SLiM simulations – you do not need to read any of the
chapters in this part, and indeed, they will likely be of no use to you whatsoever.
The chapters in this part of the manual are not intended to provide a comprehensive survey of
every corner of the Eidos implementation and how it could interact with a particular Context.
Instead, the goal is to briefly introduce each of the major classes involved in the implementation of
Eidos, to point developers in the right direction. Since the design of Eidos is still in flux in many
ways, there would be little point in offering more detailed documentation at this time. If you need
further information, the source code for Eidos itself is ultimately the best reference for how Eidos
works, and the source code for SLiM is probably the best guide to how to implement a Context
that works with Eidos. Nevertheless, I hope the chapters here provide some useful guidance.
9. Running Eidos scripts in C++
This chapter will show how to run an Eidos script from C++ code, including creating a script,
setting up an interpreter to run the script, and handling the result returned by the script. These
topics are demonstrated well by the eidos command-line tool, which is one of the targets in the
Xcode project for SLiM and Eidos.
9.1 Initializing Eidos
The first thing that must be done when using Eidos in a given Context is to initialize Eidos so
that it is ready for use. There are several components to this.
First, the global gEidosTerminateThrows, defined in eidos_globals.h, may be set. A value of
true makes Eidos throw an exception whenever any error occurs – an unrecognized or
unexpected token, an undefined variable, an out-of-range subscript. In this case, it is expected
that the Context will catch the exception and will use the error-tracking globals provided by Eidos
(also in eidos_globals.h) to display the error to the user. This is the mode typically used by
interactive applications such as EidosScribe and SLiMgui. A value of false, on the other hand,
makes Eidos log an error message to the console and call exit(), terminating execution completely
with no support from the Context. This is the mode typically used by non-interactive applications
such as the command-line eidos and slim tools. The default is gEidosTerminateThrows == true.
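The two error-handling modes can be sketched with stand-ins. Nothing here is Eidos API: gToyTerminateThrows, ToyTerminate(), ToyCountTokens(), and RunGuarded() are hypothetical names, and the use of std::runtime_error is an assumption made for the sketch (consult the Eidos headers for the exception type it actually raises). The point is only the shape of the control flow: in throwing mode the caller wraps each call in a try–catch block and reports the error itself.

```cpp
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-in for a flag like gEidosTerminateThrows (hypothetical).
bool gToyTerminateThrows = true;

// Raises an error in whichever mode is selected: throw for interactive
// hosts, or log to the console and exit for command-line tools.
[[noreturn]] void ToyTerminate(const std::string &message) {
    if (gToyTerminateThrows)
        throw std::runtime_error(message);
    std::cerr << message << std::endl;
    std::exit(EXIT_FAILURE);
}

// Stand-in for a library call that may fail: counts whitespace-separated
// tokens and rejects the token "$".
int ToyCountTokens(const std::string &script) {
    std::istringstream in(script);
    std::string token;
    int count = 0;
    while (in >> token) {
        if (token == "$")
            ToyTerminate("unrecognized token '$'");
        ++count;
    }
    return count;
}

// What a host might do in throwing mode: catch the error and present it.
std::string RunGuarded(const std::string &script) {
    try {
        int n = ToyCountTokens(script);
        return "ok: " + std::to_string(n) + " tokens";
    } catch (const std::runtime_error &e) {
        return std::string("error: ") + e.what();
    }
}
```

With the flag set to true, RunGuarded("x = $ ;") returns an error string instead of ending the program; with it set to false, the same input would print the message and exit.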
Next, the EidosWarmUp() function (defined in eidos_globals.h) must be called to allow Eidos
to initialize itself. This performs various tasks such as setting up the shared object pool used by
Eidos for value object allocations (see section 9.4), allocating a global symbol table with constants
such as NULL, T, and F (see section 10.3), registering all of the global strings used by Eidos (see
immediately below), and setting up the standard function map (see section 13.3). Before these
tasks have been done, Eidos is not in a usable state, so this call should occur as early in your
program’s execution as is feasible. It is harmless to call this function more than once.
Next, you may wish to register global strings with Eidos. When working with strings such as
property names, method names, function names, and the names of identifiers, Eidos always uses a
special type called EidosGlobalStringID that represents a uniqued string value using an integer;
the actual std::string objects are almost never used, since they are much slower. If you define
your own global strings that need to be registered, this is the time to do it. You will know if you
need to do this, because you will need an EidosGlobalStringID for a call to Eidos, and you won’t
have one. The functions you will need to manage EidosGlobalStringIDs are defined in
eidos_globals.h. Eidos_RegisterStringForGlobalID() registers a permanent global string with a
given EidosGlobalStringID which you must assign, starting at the value gEidosID_LastEntry and
counting upward; you may use values from gEidosID_LastEntry to gEidosID_LastContextEntry
for your own global strings. EidosGlobalStringIDForString() looks up an EidosGlobalStringID
for a non-permanent string, with the side effect of registering a new EidosGlobalStringID for that
string if one does not already exist; this would normally be used for strings that come from the user
and thus cannot be predicted, such as identifier names. StringForEidosGlobalStringID() looks
up the string for a given EidosGlobalStringID, handy for displaying error messages and such.
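The idea behind uniqued string IDs can be sketched with a small registry. This is an illustration of the general technique (string interning), not the Eidos implementation: ToyStringID and ToyStringRegistry are hypothetical, and how Eidos chooses IDs for non-permanent strings is not described here, so the counter used below is an assumption.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

typedef std::uint32_t ToyStringID;

class ToyStringRegistry {
public:
    // IDs for strings registered on demand start at first_dynamic_id.
    explicit ToyStringRegistry(ToyStringID first_dynamic_id)
        : next_dynamic_id_(first_dynamic_id) {}

    // Registers a permanent string under an ID chosen by the caller.
    void Register(const std::string &str, ToyStringID id) {
        if (id_for_string_.count(str) || string_for_id_.count(id))
            throw std::logic_error("string or ID already registered");
        id_for_string_[str] = id;
        string_for_id_[id] = str;
    }

    // Looks up the ID for a string, registering a new ID if it is unseen.
    ToyStringID IDForString(const std::string &str) {
        auto it = id_for_string_.find(str);
        if (it != id_for_string_.end())
            return it->second;
        while (string_for_id_.count(next_dynamic_id_))
            ++next_dynamic_id_;  // skip IDs already taken by Register()
        ToyStringID id = next_dynamic_id_++;
        id_for_string_[str] = id;
        string_for_id_[id] = str;
        return id;
    }

    // Reverse lookup, e.g. for error messages; throws for unknown IDs.
    const std::string &StringForID(ToyStringID id) const {
        return string_for_id_.at(id);
    }

private:
    std::unordered_map<std::string, ToyStringID> id_for_string_;
    std::unordered_map<ToyStringID, std::string> string_for_id_;
    ToyStringID next_dynamic_id_;
};
```

Once names are interned this way, comparing two property or method names is an integer comparison rather than a string comparison, which is the performance motivation given above.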
If you want to start up the Eidos random number generator with a particular seed value, you
may do that now using the EidosInitializeRNGFromSeed() function (in eidos_rng.h). If you do
not do so, a seed will be chosen for you by the EidosGenerateSeedFromPIDAndTime() function
based upon the current process ID and system clock time, when a new Eidos interpreter is created.
Note the implication that random number sequences are not reproducible in Eidos by default; you
must explicitly set a fixed seed if you want a reproducible sequence.
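The reproducibility point can be demonstrated with the C++ standard library. This sketch uses std::mt19937 purely as a stand-in generator; it is not the generator Eidos uses, and DrawSequence() is a hypothetical helper.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Draws `count` values from a generator seeded with `seed`. The same seed
// always yields the same sequence; that is what makes a run reproducible.
std::vector<std::uint32_t> DrawSequence(std::uint32_t seed, int count) {
    std::mt19937 rng(seed);
    std::vector<std::uint32_t> draws;
    for (int i = 0; i < count; ++i)
        draws.push_back(static_cast<std::uint32_t>(rng()));
    return draws;
}
```

Two calls with the same seed return identical vectors, whereas a seed derived from the process ID and clock time differs from run to run and so produces a different sequence each time.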
Finally, you may wish to set the gEidosContextVersion and gEidosContextLicense strings
(defined in eidos_globals.h) to strings appropriate for your Context, to tailor the version and
license information displayed by Eidos.
9.2 Creating a script object: EidosScript
Making a new Eidos script object is fairly straightforward, and might look something like this:
EidosScript::ClearErrorPosition();
EidosScript script(script_string);
gEidosCurrentScript = &script;
gEidosExecutingRuntimeScript = false;
script.Tokenize();
script.ParseInterpreterBlockToAST();
The first line clears out any existing error-tracking information kept by Eidos, giving you a clean
slate; this is generally advisable, but is not done automatically in order to give you complete
control over error handling.
The second line allocates a script object (as a local stack variable here, but operator new could
also be used) from a std::string named script_string that contains the text of the script. This call
in itself does very little work apart from storing the script string inside the new script object.
The third and fourth lines set up error-tracking information that ensures that if an error is raised
by Eidos it is clear which script the error occurred in, so that errors are reported correctly. The
gEidosExecutingRuntimeScript flag is true when Eidos is executing a script called from another
script through a mechanism such as the executeLambda() and apply() functions; for scripts
created by a Context to be executed at the top level this flag should be false.
The fifth line tokenizes the script string; the tokens may be extracted from the script and
processed after this call returns, for purposes such as syntax coloring. Eidos tokens are discussed
further in section 15.1. If an error occurs during tokenization, it will either raise an exception or
call exit(), depending upon the value of the global gEidosTerminateThrows (see section 9.1). If
you want to handle such exceptions, you will need to use standard C++ try–catch blocks. This is
generally true for calls to Eidos, so it will not be repeated in the sections below.
The sixth line parses the token stream and creates an AST (Abstract Syntax Tree) representing
the parse. The AST may also be extracted from the script and processed after this call returns, for
purposes such as code analysis; see section 15.2. This call also may generate an error due to
syntax errors or semantic errors in the script.
9.3 Setting up an Eidos interpreter: EidosInterpreter
After the calls above, you have an EidosScript object that is tokenized, parsed, and ready to be
executed. To execute the script, code such as the following is typical:
EidosSymbolTable var_symbols(false, gEidosConstantsSymbolTable);
EidosFunctionMap *function_map = EidosInterpreter::BuiltInFunctionMap();
EidosInterpreter interpreter(script, var_symbols, *function_map, nullptr);
EidosValue_SP result = interpreter.EvaluateInterpreterBlock(true);
std::string output = interpreter.ExecutionOutput();
The first line allocates a symbol table (as a local stack variable here, but operator new could be
used) for use by the interpreter. This symbol table is set up to hold variables (indicated by the
false parameter) rather than constants, and it links to the built-in Eidos symbol table of standard
constants, gEidosConstantsSymbolTable, in order to include its contents. A symbol table is set up
explicitly in this way because many Contexts will want to set up a more complicated symbol table
situation, with one or more symbol tables of Context-specific constants linked into the chain (see
section 10.3).
The second line gets the standard Eidos function map, used to look up functions when they are
called. Again, this is done explicitly because many Contexts will wish to define additional
functions using a derived function map (see section 13.3).
The third line creates an Eidos interpreter for the given script, symbol table, and function map.
The final nullptr parameter is of type EidosContext*, where EidosContext is a typedef for
EidosObjectElement (see section 10.2). You may designate any such object as the Context for the
interpreter; SLiM, for example, designates the top-level SLiMSim object representing the simulation
as the Context here. Eidos itself will do nothing at all with this Context object except keep a
pointer to it for you and provide it back to you when you call the Context() method of
EidosInterpreter. It is thus a way for you to get back to objects belonging to the Context when
your code is called by Eidos. Use of it is purely optional, thus the nullptr passed here.
The fourth line calls on the interpreter to evaluate (i.e., execute) its script. An object of type
EidosValue_SP is returned, encapsulating the result of that execution (see the next section). The
true parameter indicates that each line of the script should emit its standard output to the Eidos
output stream; this is the usual mode, but in some situations (such as the executeLambda() and
apply() functions) such output is suppressed in order to produce the desired effect.
Finally, the fifth line gets the contents of the interpreter’s output stream as a std::string which
might be printed in a console window, saved to a file, or shown in an alert panel.
After these calls, the interpreter object is normally not used again. Interpreters are extremely
lightweight to construct, so repeated execution of the same script would normally reuse the script
(avoiding retokenization and reparsing) but would use a fresh interpreter instance.
9.4 Handling script results: EidosValue, Eidos_intrusive_ptr, and EidosObjectPool
In the previous section we saw that a result was returned from EidosInterpreter of type
EidosValue_SP. This introduces perhaps the most complex topic in the implementation of Eidos:
how values are handled by Eidos. There are several complex facets to this topic, so hold on tight.
First of all, all values in Eidos – the results of evaluating expressions, the values of variables, and
so forth – are represented using subclasses of an abstract base class, EidosValue. This base class
and all of its subclasses are defined in eidos_value.h. There are six basic types of values,
represented by six values of an enumeration, EidosValueType:
NULL: type EidosValueType::kValueNULL, subclass EidosValue_NULL
logical: type EidosValueType::kValueLogical, subclass EidosValue_Logical
integer: type EidosValueType::kValueInt, subclass EidosValue_Int
float: type EidosValueType::kValueFloat, subclass EidosValue_Float
string: type EidosValueType::kValueString, subclass EidosValue_String
object: type EidosValueType::kValueObject, subclass EidosValue_Object
In fact the situation is more complex than this, because all of these classes except
EidosValue_NULL have subclasses of their own, for performance reasons. EidosValue_Logical has
a subclass named EidosValue_Logical_const that is used internally by Eidos to represent the T
and F constants with immutable values. The other four classes are all themselves abstract base
classes, and have two concrete subclasses each, one with a name ending in _vector that is used to
represent vectors of values, the other with a name ending in _singleton that is (usually) used to
represent a single value. Use of the _singleton subclass is not required when working with
singleton values, but since it avoids all of the overhead of std::vector, it is considerably more
efficient than the _vector subclass when only a single value is stored. In general, however, this
complexity may be ignored unless you are creating your own values; if Eidos hands you a value,
you can usually treat it with the EidosValue API, or at worst with one of the direct subclass APIs.
Casting down to a particular subclass is often done in Eidos by first determining the type of the
value, using the EidosValue::Type() method, and then doing a static cast.
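The "check the type, then static cast" idiom can be sketched with a toy hierarchy. These classes are hypothetical stand-ins with only two value types; the real EidosValue classes have a much richer API, and their members are not reproduced here.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

enum class ToyValueType { kInt, kFloat };

// Abstract base class that reports the concrete type through a type tag.
class ToyValue {
public:
    virtual ~ToyValue() {}
    virtual ToyValueType Type() const = 0;
};

class ToyValue_Int : public ToyValue {
public:
    explicit ToyValue_Int(std::vector<std::int64_t> v) : values(std::move(v)) {}
    ToyValueType Type() const override { return ToyValueType::kInt; }
    std::vector<std::int64_t> values;
};

class ToyValue_Float : public ToyValue {
public:
    explicit ToyValue_Float(std::vector<double> v) : values(std::move(v)) {}
    ToyValueType Type() const override { return ToyValueType::kFloat; }
    std::vector<double> values;
};

// Sums the elements of either kind of value. The static_cast is safe only
// because Type() has already identified the concrete subclass.
double SumAsFloat(const ToyValue &value) {
    double sum = 0.0;
    switch (value.Type()) {
        case ToyValueType::kInt:
            for (std::int64_t x : static_cast<const ToyValue_Int &>(value).values)
                sum += static_cast<double>(x);
            break;
        case ToyValueType::kFloat:
            for (double x : static_cast<const ToyValue_Float &>(value).values)
                sum += x;
            break;
    }
    return sum;
}
```

A static cast avoids the runtime cost of dynamic_cast, at the price that the type check must be done by hand and must be correct.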
So that is EidosValue. However, EidosInterpreter returned an EidosValue_SP, not an
EidosValue; what is that? Eidos uses “smart pointers” to handle EidosValue instances in almost all
cases. This provides automatic reference-counting and automatic deallocation of values when
they are no longer referenced. Eidos does this using its own smart pointer class, called
Eidos_intrusive_ptr, defined in eidos_intrusive_ptr.h (“intrusive” here means that the smart
pointer keeps the reference count inside the object that is pointed to). Smart pointers are quite
transparent to use; in most respects they are semantically identical to an ordinary C pointer, except
that you don’t need to worry about the memory-management policy for the object pointed to, and
you never need to explicitly deallocate that object. Just use the smart pointers like ordinary
pointers, and the bookkeeping will be done for you. Since Eidos_intrusive_ptr is a template-based
class, using it is syntactically a bit annoying; eidos_value.h therefore defines some standard
typedefs for smart pointers to EidosValue and its various subclasses. EidosValue_SP is a typedef
for Eidos_intrusive_ptr<EidosValue>; other smart pointer types are defined similarly with an _SP
suffix on the name of the class pointed to.
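To make "intrusive" concrete, here is a minimal sketch of an intrusive reference-counting pointer. It is not Eidos_intrusive_ptr: ToyRefCounted, ToyIntrusivePtr, and ToyCounted are hypothetical names, the count is not thread-safe, and the object is released with plain delete rather than being returned to an object pool.

```cpp
#include <utility>

// Base class that stores the reference count inside the object itself.
class ToyRefCounted {
public:
    ToyRefCounted() : ref_count_(0) {}
    ToyRefCounted(const ToyRefCounted &) = delete;
    ToyRefCounted &operator=(const ToyRefCounted &) = delete;
    virtual ~ToyRefCounted() {}
    void Retain() { ++ref_count_; }
    bool Release() { return --ref_count_ == 0; }  // true when last reference is dropped
    int RefCount() const { return ref_count_; }
private:
    int ref_count_;
};

// Smart pointer that retains on copy and releases on destruction.
template <class T>
class ToyIntrusivePtr {
public:
    ToyIntrusivePtr() : ptr_(nullptr) {}
    explicit ToyIntrusivePtr(T *p) : ptr_(p) { if (ptr_) ptr_->Retain(); }
    ToyIntrusivePtr(const ToyIntrusivePtr &o) : ptr_(o.ptr_) { if (ptr_) ptr_->Retain(); }
    // Copy-and-swap: the by-value parameter holds the old pointer and
    // releases it when the parameter is destroyed.
    ToyIntrusivePtr &operator=(ToyIntrusivePtr o) { std::swap(ptr_, o.ptr_); return *this; }
    ~ToyIntrusivePtr() { if (ptr_ && ptr_->Release()) delete ptr_; }
    T *get() const { return ptr_; }
    T &operator*() const { return *ptr_; }
    T *operator->() const { return ptr_; }
private:
    T *ptr_;
};

// Example payload that tracks how many instances are alive.
struct ToyCounted : ToyRefCounted {
    static int live;
    int payload;
    explicit ToyCounted(int p) : payload(p) { ++live; }
    ~ToyCounted() override { --live; }
};
int ToyCounted::live = 0;
```

Because the count lives in the object, a raw pointer can be wrapped in a new smart pointer at any time without losing track of existing references, which is one common reason to prefer an intrusive design.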
One other point worth mentioning here, to keep you from getting into trouble, is that all
EidosValue instances are allocated out of a special global object pool, rather than using operator
new and delete (placement new is used instead, to construct an object in place in the object pool).
The global object pool is named gEidosValuePool, and is an instance of the class
EidosObjectPool, defined in eidos_object_pool.h. (This class, like Eidos_intrusive_ptr, is a
general-purpose template-based class that you might wish to use for other purposes in your
Context code as well, by the way.) The mantra for making a new heap-based EidosValue
therefore looks like this:
EidosValue_Int_vector *int_value = (new (gEidosValuePool->AllocateChunk())
EidosValue_Int_vector());
The gEidosValuePool->AllocateChunk() call returns a pointer to a block of memory inside the
object pool. That is then passed to placement new to construct a new object at that location. A
constructor for the desired concrete subclass, EidosValue_Int_vector, completes the syntax (and
various constructors exist in eidos_value.h for constructing from std::vector, from
std::initializer_list, and from a singleton value of the appropriate type).
The allocated EidosValue object is then typically put under the control of the smart pointer
architecture immediately:
EidosValue_SP value_SP = EidosValue_SP(int_value);
With this call – which is just this side of mandatory, given the design of Eidos – you surrender
ownership of the int_value object and give it to the smart pointer value_SP. When value_SP goes
out of scope, int_value will automatically be disposed of – unless another reference to it has been
generated in another smart pointer through assignment, return, or other mechanisms. For this
reason, it is best not to use int_value again; use value_SP instead, for all purposes, unless you
really know what you are doing, to avoid the possibility of using a stale pointer. You do not own
int_value any more.
When an EidosValue is no longer used, it is destroyed not with a call to delete, but by first
calling its destructor – the virtual destructor ~EidosValue() – and then by returning its memory to
the object pool by calling gEidosValuePool->DisposeChunk() with its pointer. However, since
this is handled by the smart pointer encapsulating the value object, you should literally never need
to do this yourself.
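The allocate, construct, destroy, and dispose cycle described above can be sketched with a toy pool. ToyChunkPool and ToyPooledValue are hypothetical, and the pool is deliberately simplistic (one heap allocation per chunk plus a free list); it shows only the placement-new and explicit-destructor pattern, not how EidosObjectPool manages its memory.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// A fixed-size chunk pool: disposed chunks are kept on a free list and
// handed out again by later allocations.
class ToyChunkPool {
public:
    explicit ToyChunkPool(std::size_t chunk_size)
        : chunk_size_(chunk_size), live_chunks_(0) {}
    ToyChunkPool(const ToyChunkPool &) = delete;
    ToyChunkPool &operator=(const ToyChunkPool &) = delete;
    ~ToyChunkPool() {
        for (void *chunk : free_list_)
            ::operator delete(chunk);
    }

    void *AllocateChunk() {
        ++live_chunks_;
        if (!free_list_.empty()) {
            void *chunk = free_list_.back();
            free_list_.pop_back();
            return chunk;
        }
        return ::operator new(chunk_size_);
    }

    void DisposeChunk(void *chunk) {
        --live_chunks_;
        free_list_.push_back(chunk);
    }

    std::size_t LiveChunks() const { return live_chunks_; }

private:
    std::size_t chunk_size_;
    std::size_t live_chunks_;
    std::vector<void *> free_list_;
};

struct ToyPooledValue {
    int value;
    explicit ToyPooledValue(int v) : value(v) {}
};

// Placement new constructs the object inside memory owned by the pool.
ToyPooledValue *MakeValue(ToyChunkPool &pool, int v) {
    return new (pool.AllocateChunk()) ToyPooledValue(v);
}

// Destruction is two steps: run the destructor, then return the memory.
void DestroyValue(ToyChunkPool &pool, ToyPooledValue *p) {
    p->~ToyPooledValue();
    pool.DisposeChunk(p);
}
```

Calling delete on a pooled object would hand memory to the general heap that the heap never allocated as that object, which is why the destructor call and the dispose call must be made separately.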
So, with all that as preface, what might you do with the EidosValue_SP returned by the
interpreter? The methods defined on EidosValue provide most of your options. You could
determine its type with Type(), count how many values it contains with Count(), extract
individual elements to C++ values with the ...AtIndex() suite of methods, or print a description
of it to a std::ostream with Print(), for example. If you need to really delve into the contents of
an EidosValue, further APIs are provided, documented in the header.
10. Making C++ objects visible in Eidos
The previous chapter showed how to run a pure Eidos script. Eidos is designed to control a Context,
however. Eidos is not an interesting language in itself; it is the ability to control objects defined in
the Context that makes it worthwhile. This chapter will explore how to define objects that are
exposed to script in Eidos, and subsequent chapters will show how to give those objects properties
and methods.
10.1 Defining an interface: EidosObjectClass
Objects are exposed in Eidos as elements of vector values of type object. A single Eidos object
value can thus contain many C++ objects, each of which is one element in the Eidos vector. Each
element – each C++ object – is a subclass of EidosObjectElement, as we will see in the next
section. Instances of EidosObjectElement do not declare their own Eidos interface, however – the
properties and methods they support. Instead, this is done by a “class object” that is a subclass of
EidosObjectClass. For each class that is exposed in Eidos, a single global class object is allocated
that lives forever, and all instances of that class contain a pointer to their class object, as we shall
see.
EidosObjectClass is quite straightforward to subclass. It is not an abstract base class, and in
fact it declares and defines the standard str(), property(), and method() methods that are present
in all Eidos objects. When you subclass it, the only thing you are required to do is to override
ElementType() to return the name of your class (which should be a unique, permanent string).
Every subclass of EidosObjectClass should have a different class name. You can also, optionally,
override further methods to add Eidos properties and methods to your class interface, as we will
see in later sections.
It is worth noting here that methods in Eidos can be either class methods or instance methods.
Given an Eidos value containing multiple elements of a given class, an instance method is called
on each element and the results aggregated and returned, whereas a class method is called on the
class object just once (not once per element). An EidosObjectClass subclass would therefore
implement any class methods that it declares, and the base class implements the property() and
method() class methods.
10.2 Defining an implementation: EidosObjectElement
Having defined a class interface, as described in the previous section, we now need to
implement that interface. That is done in a subclass of EidosObjectElement. This is not an
abstract base class; it defines the implementation for the standard str() method, as well as setting
up some useful superclass behavior for use by Contexts. You subclass it to add your own
functionality, as declared by your class object.
The only requirement for a subclass of EidosObjectElement is to override the Class() method
to return the unique, permanent class object for your class. This object is often defined as a global
variable; for example, you might put in your header:
extern EidosObjectClass *gMyClassObject;
In your implementation file you would put:
EidosObjectClass *gMyClassObject = new MyClass();
Your EidosObjectElement subclass would then define Class() as:
const EidosObjectClass *MyElement::Class(void) const
{
return gMyClassObject;
}
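The relationship between a class object and its elements can be sketched without the Eidos headers. Everything below is a hypothetical stand-in (ToyObjectClass, ToyObjectElement, ToyPointClass, ToyPoint, PointClassObject()), and it uses a function-local static for the class object rather than the global variable shown above; both give one permanent class object shared by all instances.

```cpp
#include <string>

// Describes the interface of a class; one instance exists per class.
class ToyObjectClass {
public:
    virtual ~ToyObjectClass() {}
    virtual std::string ElementType() const { return "undefined"; }
};

// Base class for the objects themselves; each reports its class object.
class ToyObjectElement {
public:
    virtual ~ToyObjectElement() {}
    virtual const ToyObjectClass *Class() const = 0;
};

// The class object for a hypothetical Point class.
class ToyPointClass : public ToyObjectClass {
public:
    std::string ElementType() const override { return "Point"; }
};

// Returns the single, permanent class object for Point.
const ToyObjectClass *PointClassObject() {
    static ToyPointClass instance;  // created on first use, lives until exit
    return &instance;
}

// An element class: every instance points back to the same class object.
class ToyPoint : public ToyObjectElement {
public:
    ToyPoint(double px, double py) : x(px), y(py) {}
    const ToyObjectClass *Class() const override { return PointClassObject(); }
    double x, y;
};
```

Any two ToyPoint instances return the same pointer from Class(), so interface questions such as the class name are answered once by the class object rather than separately by each element.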
10.3 Defining new Eidos symbols: EidosSymbolTable
The final step is to make instances of your C++ class visible as Eidos objects. This is typically
done by defining Eidos constants for those instances; SLiM, for example, defines an Eidos constant
named sim for the simulation (an instance of SLiMSim) and for each mutation type, genomic
element type, subpopulation, and so forth.
To do this, you need to (1) define a symbol table containing the objects that you want
to export into Eidos, and (2) add that symbol table to the interpreter used to execute your Eidos
script. Tackling point (2) first, within the larger context of setting up an interpreter (section 9.3),
the new code might look something like:
EidosSymbolTable context_symbols(true, gEidosConstantsSymbolTable);
EidosSymbolTable var_symbols(false, context_symbols);
EidosInterpreter interpreter(script, var_symbols, *function_map, nullptr);
This sets up a chain of EidosSymbolTable instances; var_symbols contains variables defined by
the executing script, and links to context_symbols which contains Context-defined constants,
which in turn links to the global Eidos symbol table gEidosConstantsSymbolTable that should be
the root of every symbol table chain in order to provide the standard Eidos constants. By the way,
“constant” in this context is in the sense of a constant pointer; the objects exported into Eidos in
this way are fully modifiable and mutable, but they are “constant” in the sense that the script
cannot assign a new value to the symbol bound to the object. Of course you could make your
exported objects be variables instead of constants; but using SLiM as an example, it would make
little sense to allow the user to execute sim=7; and redefine the sim constant to have a value other
than the simulation object. Eidos constants thus typically make sense for such values.
Having set up a symbol table in this way, adding new constants to it is as simple as:
context_symbols.InitializeConstantSymbolEntry(name, value);
The parameter name should be an EidosGlobalStringID corresponding to the name of the
symbol, set up at initialization time if the name is a constant like sim, or set up later if it is not (see
section 9.1). The parameter value is an EidosValue_SP wrapper for the C++ instance to be
represented by the symbol; this can be set up as described in section 9.4. Note that it is not your
C++ object instance that is allocated out of the Eidos object pool; your C++ instance is allocated
as usual, in whatever way you wish (i.e., operator new, typically). It is the EidosValue wrapper
containing your instance that is allocated from the Eidos object pool and given to
InitializeConstantSymbolEntry(). The Eidos constant you define might be a singleton, like sim,
or it might be a vector containing a whole set of object elements of a given class.
This is all that it takes; after the call to InitializeConstantSymbolEntry(), the object will be
visible in Eidos, where its value can be assigned to variables, subscripted, and used to access
properties and methods. If you wish to define a variable instead of a constant, the
SetValueForSymbol() method of EidosSymbolTable should do what you want; but you will need
to add it to the variable symbol table (var_symbols, above) rather than to a symbol table defined to
contain constants.
11. Adding properties to Context objects
Having exported a C++ object to Eidos by defining a class object to declare its interface and by
adding a symbol for it to the Eidos symbol table, we are now ready to start giving the object
behavior that is visible in Eidos. The first step is adding a property.
11.1 Defining a property signature: EidosPropertySignature
Properties are defined using EidosPropertySignature. In your EidosObjectClass subclass, you
should define SignatureForProperty() to return signatures for the properties supported by your
class. Signatures are typically held in static variables that are allocated once and kept forever.
For example, you might write:
const EidosPropertySignature
*MyClass::SignatureForProperty(EidosGlobalStringID p_property_id) const
{
static EidosPropertySignature *propSig = nullptr;
if (!propSig)
{
propSig = new EidosPropertySignature(gPropStr, gPropID, false,
kEidosValueMaskInt | kEidosValueMaskSingleton);
}
switch (p_property_id)
{
case gPropID: return propSig;
default: return
EidosObjectClass::SignatureForProperty(p_property_id);
}
}
The name and EidosGlobalStringID of the property are contained in the variables gPropStr
and gPropID, respectively; these do not have to be globals, but should be permanent, and should
have been set up by registration at initialization time (see section 9.1). The false flag indicates
that the property is read-write; true would indicate a read-only property. The last argument
indicates the type of the property, in this case integer combined with a flag that indicates that the
property is guaranteed to return a singleton value (which is required by the semantics of Eidos for
all read-write properties). The call to the superclass in the default case ensures both that any
properties defined by the base class work properly, and that any properties not supported by the
class are handled correctly; this call to the superclass should always be present.
11.2 Declaring a property interface
Having set up a property with SignatureForProperty() does not declare that your class
actually supports that property. To make that declaration, you must override the Properties()
method to include your new property. A typical implementation would look like:
const std::vector<const EidosPropertySignature *>
*Eidos_TestElementClass::Properties(void) const
{
static std::vector<const EidosPropertySignature *> *properties =
nullptr;
if (!properties)
{
properties = new std::vector<const EidosPropertySignature
*>(*EidosObjectClass::Properties());
properties->emplace_back(SignatureForPropertyOrRaise(gPropID));
std::sort(properties->begin(), properties->end(),
CompareEidosPropertySignatures);
}
return properties;
}
A vector of property signatures is kept as a static variable, defined on demand. The vector
begins as a copy of the properties supported by the superclass; any properties defined by the
subclass are then added. The vector is then sorted (for display purposes within Eidos), one-time
overhead since the vector is cached in the static variable, which is returned.
After declaring a property in this way, it will appear in your class interface within Eidos; since
you have not yet defined the property, however, it will not work properly. Wiring that up is
described in the next section.
11.3 Implementing a property interface
Now that a property has been declared in your class using SignatureForProperty() and
Properties(), you need to implement it so that the property gets and sets the proper value in your
object. This is done in your object element (the subclass of EidosObjectElement), rather than your
object class (the subclass of EidosObjectClass), since the actual value for a given property is held
within each object element.
To implement the “getter” functionality of a property – the ability to read the property from
Eidos – you override GetProperty(), and if the property EidosGlobalStringID passed to that
method is the ID of your property, you construct and return an EidosValue_SP for the value of the
property (see section 9.4). To implement the “setter” functionality, you override SetProperty(),
and if the property EidosGlobalStringID passed in is the ID of your property, you extract the value
being set and keep it in the instance variables of your object. In both cases, your implementations
should call the superclass to handle properties other than the one you have implemented. For
example, to implement an integer property you might use code like this:
EidosValue_SP Eidos_TestElement::GetProperty(EidosGlobalStringID p_propid)
{
if (p_propid == gPropID)
return EidosValue_SP(new (gEidosValuePool->AllocateChunk())
EidosValue_Int_singleton(prop_));
else
return EidosObjectElement::GetProperty(p_propid);
}
void Eidos_TestElement::SetProperty(EidosGlobalStringID p_propid, const
EidosValue &p_value)
{
if (p_propid == gPropID)
prop_ = p_value.IntAtIndex(0, nullptr);
else
EidosObjectElement::SetProperty(p_propid, p_value);
}
Here, prop_ is an instance variable in your C++ class, probably defined as an int64_t so that it
can hold the full range of values supported by the Eidos integer type.
12. Adding methods to Context objects
Adding methods to an Eidos class is done in a very similar way to adding properties; you define
the method using a signature, you declare the method in your class object, and you implement the
method in your object element class.
12.1 Defining a method signature: EidosMethodSignature
Method signatures are defined using EidosMethodSignature within the SignatureForMethod()
method of a class object. In outline, this looks very much like defining a property signature, as in
section 11.1, using a static variable to hold a permanently allocated signature. The method
signature definition itself might look like this:
methodSig = (EidosInstanceMethodSignature *)(new
EidosInstanceMethodSignature(gMethodName, kEidosValueMaskInt))->AddInt_S("paramName");
This declares a new method signature, methodSig, for a method named gMethodName that
returns a value of type integer (as specified by the mask parameter to the constructor). This
method takes one parameter, an integer (the AddInt base) which must be a singleton (the _S
suffix) and which is named paramName. Further parameters may be added by chaining Add...()
calls; a wide variety of possible parameter types can be specified with standard Add...() calls
declared in eidos_call_signature.h.
The code above creates a signature for an instance method; there is also a subclass named
EidosClassMethodSignature used to create class methods in the same way.
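The chaining of Add...() calls works because each call returns a pointer to the signature it was called on. A minimal sketch of that pattern follows. The class SketchCallSignature, the struct SketchParameter, and the kSketchMask... constants are hypothetical stand-ins invented for illustration; the real declarations in eidos_call_signature.h are richer and may differ in detail.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical type masks; the real kEidosValueMask... constants differ.
const unsigned kSketchMaskInt = 1u << 0;
const unsigned kSketchMaskFloat = 1u << 1;

struct SketchParameter {
    std::string name;
    unsigned type_mask;
    bool must_be_singleton;
};

// Hypothetical stand-in for a call signature with chainable Add...() calls.
class SketchCallSignature {
public:
    SketchCallSignature(const std::string &name, unsigned return_mask)
        : name_(name), return_mask_(return_mask) {}

    // Each Add...() call appends one parameter and returns this, so that
    // further calls can be chained with ->.
    SketchCallSignature *AddInt_S(const std::string &param_name) {
        return AddParameter(param_name, kSketchMaskInt, true);
    }
    SketchCallSignature *AddNumeric(const std::string &param_name) {
        return AddParameter(param_name, kSketchMaskInt | kSketchMaskFloat, false);
    }

    const std::string &name() const { return name_; }
    unsigned return_mask() const { return return_mask_; }
    const std::vector<SketchParameter> &parameters() const { return parameters_; }

private:
    SketchCallSignature *AddParameter(const std::string &param_name,
                                      unsigned mask, bool singleton) {
        assert(!param_name.empty());
        parameters_.push_back(SketchParameter{param_name, mask, singleton});
        return this;
    }

    std::string name_;
    unsigned return_mask_;
    std::vector<SketchParameter> parameters_;
};
```

With this sketch, an expression such as (new SketchCallSignature("scaleBy", kSketchMaskInt))->AddInt_S("factor")->AddNumeric("weights") yields a signature with two parameters, recorded in the order in which they were added.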
12.2 Declaring a method interface
Method signatures are declared in your class interface within the Methods() method of your
class object. This is very similar to how properties are declared (section 11.2):
const std::vector<const EidosMethodSignature *> *MyClass::Methods(void)
const
{
static std::vector<const EidosMethodSignature *> *methods = nullptr;
if (!methods)
{
methods = new std::vector<const EidosMethodSignature
*>(*EidosObjectClass::Methods());
methods->emplace_back(SignatureForMethodOrRaise(gMethodID));
std::sort(methods->begin(), methods->end(),
CompareEidosCallSignatures);
}
return methods;
}
12.3 Implementing a method interface
Your method is implemented in either the ExecuteClassMethod() method of your class object,
for class methods, or the ExecuteInstanceMethod() method of your object element class, for
instance methods. If the EidosGlobalStringID passed in identifies your method, then your code
should do whatever calculations are needed and return whatever value is appropriate, using the
arguments passed in to the Execute...Method() method. The EidosInterpreter object is also
made available here if you need it for some purpose; for example, you can get a reference to your
EidosContext object by calling its Context() method (see section 9.3). As in the other cases we
have seen, be sure to call the superclass method for methods that you do not implement yourself,
so that both standard Eidos methods and undefined methods are handled correctly.
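The dispatch-then-delegate shape described above can be sketched in a self-contained way. The classes below are not the Eidos classes: SketchElement stands in for the superclass, SketchCounter for your object element class, and the method IDs, argument type, and return type are simplified assumptions (the real ExecuteInstanceMethod() takes EidosValue arguments and an interpreter reference, and returns an EidosValue_SP).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical method IDs standing in for EidosGlobalStringID values.
using SketchStringID = int;
const SketchStringID kSketchID_size = 1;   // a "standard" method of the base
const SketchStringID kSketchID_addTo = 2;  // a method added by the subclass

// Hypothetical stand-in for the superclass of all object elements.
class SketchElement {
public:
    virtual ~SketchElement() = default;
    virtual int64_t ExecuteInstanceMethod(SketchStringID method_id,
                                          const std::vector<int64_t> & /*arguments*/) {
        if (method_id == kSketchID_size)
            return 1;
        // Undefined methods are reported here, in one place.
        throw std::runtime_error("unrecognized method");
    }
};

class SketchCounter : public SketchElement {
public:
    int64_t ExecuteInstanceMethod(SketchStringID method_id,
                                  const std::vector<int64_t> &arguments) override {
        if (method_id == kSketchID_addTo) {
            // The signature is assumed to guarantee exactly one argument.
            assert(arguments.size() == 1);
            total_ += arguments[0];
            return total_;
        }
        // Everything else goes to the superclass.
        return SketchElement::ExecuteInstanceMethod(method_id, arguments);
    }

private:
    int64_t total_ = 0;
};
```

Because the subclass forwards every ID it does not recognize, the standard method still works on a SketchCounter, and an unknown ID produces the superclass error rather than being silently ignored.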
13. Writing new built-in Eidos functions
Writing new functions that are visible in Eidos is fairly straightforward. As with properties and
methods, the first step is to define a signature for your function. You then implement the function
in your code, and finally you make the function visible by adding it to the function map used by
the Eidos interpreter.
13.1 Defining a function signature: EidosFunctionSignature
A function is defined using EidosFunctionSignature, which is very similar to
EidosMethodSignature (see section 12.1). There is no standard place for this to be done; often it is
done in your initialization code, to set up static function signatures that are used thenceforth. The
important thing is that each function that you want to define should have a permanent signature
object associated with it that defines the return type and the parameters for the function, just as for
method signatures. You might keep these signatures in a static global std::vector, for example.
The definition of the signature for SLiM’s initializeGenomicElementType() function, for example,
looks like this:
sim_0_signatures_.emplace_back((EidosFunctionSignature *)(new
EidosFunctionSignature(gStr_initializeGenomicElementType,
EidosFunctionIdentifier::kDelegatedFunction, kEidosValueMaskObject |
kEidosValueMaskSingleton, gSLiM_GenomicElementType_Class,
SLiMSim::StaticFunctionDelegationFunnel, static_cast<void *>(this),
"SLiM"))->AddIntString_S("id")->AddIntObject("mutationTypes",
gSLiM_MutationType_Class)->AddNumeric("proportions"));
The signature is stored in a vector named sim_0_signatures_. The name of the function is in
the string object gStr_initializeGenomicElementType. It is identified as a “delegated” function
by the next parameter; functions implemented in the Context should always be identified in this
way. The next parameter defines its return type (a singleton object), and the following parameter
defines the class of that object return type (gSLiM_GenomicElementType_Class, the class object for
genomic element type objects in the SLiM Context). The next argument,
SLiMSim::StaticFunctionDelegationFunnel, is the name of a static method that is called when
the function is called in Eidos; this is the C++ implementation of the function, which may be either
a static method or a function in C++. The next parameter supplies a C++ “delegate object” for the
function; this is passed back to the delegate function or method as its first parameter, as a sort of
context, and may be nullptr if you don’t need such context. The final parameter gives a
std::string name for the delegate or Context that is implementing the function; this is used just for
display purposes in Eidos, to show an attribution for externally defined functions.
The rest of the statement adds parameters to the function being defined, using Add...()
methods as seen before for method signatures. Here, a singleton parameter named id that may be
either an integer or a string is added, then a parameter named mutationTypes that may be either an
integer or an object of class MutationType (i.e., gSLiM_MutationType_Class), and finally a numeric
(i.e., integer or float) parameter named proportions. This example illustrates the chaining of
parameter addition, as well as how to specify an object parameter with a given class.
13.2 Implementing a new function: EidosDelegateFunctionPtr
In the previous example, SLiMSim::StaticFunctionDelegationFunnel was described as the
C++ implementation of the function being defined. This parameter, whether it is a static method
or a function, should be of type EidosDelegateFunctionPtr (defined in eidos_functions.h). This
is a typedef for a pointer to function that takes parameters including a void* delegate pointer (as
given to the function signature when constructed; above, this is static_cast<void *>(this)), a
std::string giving the name of the function (this is likely to change to an EidosGlobalStringID in
a future API revision), a list of arguments with a count, and a reference to the current
EidosInterpreter. Your implementation should simply use the function name to determine the
function being run (if you use the same bottleneck to respond to multiple functions), and then use
the values in the argument list to perform whatever task or calculation your function performs and
return whatever EidosValue_SP value you wish to return.
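The delegate pattern described in this section can be sketched as follows. This is a simplified analogue, not the real typedef: SketchDelegateFunctionPtr, SketchSimulation, and the function names "scaledSum" and "scale" are all hypothetical, arguments are plain doubles rather than EidosValue objects, and the interpreter reference is omitted. The point is only how a static funnel recovers its object from the void* delegate pointer and then dispatches on the function name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified analogue of EidosDelegateFunctionPtr.
typedef double (*SketchDelegateFunctionPtr)(void *delegate,
                                            const std::string &function_name,
                                            const double *arguments,
                                            int argument_count);

class SketchSimulation {
public:
    explicit SketchSimulation(double scale) : scale_(scale) {}

    // Static funnel: casts the delegate pointer back to the object that
    // was supplied when the signature was built, then forwards the call.
    static double StaticFunctionDelegationFunnel(void *delegate,
                                                 const std::string &function_name,
                                                 const double *arguments,
                                                 int argument_count) {
        assert(delegate != nullptr);
        SketchSimulation *sim = static_cast<SketchSimulation *>(delegate);
        return sim->FunctionDelegationFunnel(function_name, arguments, argument_count);
    }

private:
    // One bottleneck serving several functions, selected by name.
    double FunctionDelegationFunnel(const std::string &function_name,
                                    const double *arguments,
                                    int argument_count) {
        if (function_name == "scaledSum") {
            double sum = 0.0;
            for (int i = 0; i < argument_count; ++i)
                sum += arguments[i];
            return sum * scale_;
        }
        if (function_name == "scale")
            return scale_;
        throw std::runtime_error("unrecognized function: " + function_name);
    }

    double scale_;
};
```

A single static method can therefore serve every function that a Context defines, at the cost of a string comparison per call; separate delegate functions per Eidos function would avoid the comparison.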
13.3 Making a new function visible: EidosFunctionMap
At this point, your function will still not actually be callable from Eidos; Eidos does not know
that it exists, since you have not given your function signature object to Eidos to register it. That is
the final step, described here.
Back in section 9.3, we got a function map from Eidos, which we then handed to the
interpreter:
EidosFunctionMap *function_map = EidosInterpreter::BuiltInFunctionMap();
This is the point at which you introduce your own functions to the interpreter. Rather than
giving it the built-in function map, you can derive a new function map, add your own function
signatures, and give that map to the interpreter instead. EidosFunctionMap is just a typedef
(defined in eidos_interpreter.h) for std::map<std::string, const EidosFunctionSignature*>.
You can therefore use standard C++ to (1) make a copy of the built-in function map, and (2) insert
all of your own function signatures into the map, with their std::string names as the keys for the
entries (again, the std::string may change to EidosGlobalStringID in a future API revision). You
then give your derived function map to the interpreter, and all of your functions will then be
known to the interpreter. Your C++ code will automatically be called when the corresponding
Eidos function is called.
It is recommended that you cache your derived function map in a static global, rather than
constructing it each time that you want to set up an interpreter, for better performance. In any
case you must ensure that the function map lives for at least as long as the interpreter that you give
it to; the interpreter does not make its own copy of the function map, for performance reasons.
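Since the function map is an ordinary std::map keyed by name, deriving and caching one takes only a few lines. The sketch below is self-contained and therefore uses stand-ins: SketchFunctionSignature, SketchFunctionMap, and SketchBuiltInFunctionMap() are hypothetical substitutes for EidosFunctionSignature, EidosFunctionMap, and EidosInterpreter::BuiltInFunctionMap(). It uses a function-local static for the cache where the text above suggests a static global; either keeps the map alive for the life of the program.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for EidosFunctionSignature and EidosFunctionMap.
struct SketchFunctionSignature {
    std::string name;
};
typedef std::map<std::string, const SketchFunctionSignature *> SketchFunctionMap;

// Hypothetical stand-in for EidosInterpreter::BuiltInFunctionMap().
SketchFunctionMap *SketchBuiltInFunctionMap() {
    static const SketchFunctionSignature sum_sig{"sum"};
    static const SketchFunctionSignature size_sig{"size"};
    static SketchFunctionMap *built_ins =
        new SketchFunctionMap{{"sum", &sum_sig}, {"size", &size_sig}};
    return built_ins;
}

// Builds the derived map once and returns the cached map afterwards; the
// signatures passed on later calls are ignored. The signatures must
// outlive the map, since only pointers to them are stored.
SketchFunctionMap *DerivedFunctionMap(
    const std::vector<const SketchFunctionSignature *> &context_signatures) {
    static SketchFunctionMap *derived = nullptr;
    if (!derived) {
        // (1) copy the built-in map; the original is left untouched
        derived = new SketchFunctionMap(*SketchBuiltInFunctionMap());
        // (2) insert each Context-defined signature under its name
        for (const SketchFunctionSignature *signature : context_signatures) {
            assert(signature != nullptr);
            derived->emplace(signature->name, signature);
        }
    }
    return derived;
}
```

Note that emplace() does not replace an existing entry, so in this sketch a Context function whose name collides with a built-in would be silently skipped; whether that or overriding is preferable is a design decision for your Context.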
14. Making an Objective-C/Cocoa GUI for an Eidos Context
The preceding chapters have described how to run Eidos scripts within your Context, including
defining Eidos objects with properties and methods that represent the internal state of your
Context. For command-line tools, this may be all that you need to use Eidos with your Context.
User applications with a graphical user interface (GUI) may want more, however – support for
interactive scripting in a console, syntax coloring, code autocompletion, browsing of variables,
and other such niceties. Eidos supplies some reusable classes for these tasks within the
EidosScribe application, one of the four targets in the Xcode project for SLiM/Eidos. This chapter
will provide a brief introduction to those facilities. Note that we shift languages here from
C++, which is used for the core code of Eidos, to Objective-C++ and Cocoa, which are used for the
Eidos user interface.
14.1 The EidosConsoleWindowController class
For applications that simply wish to display a standard Eidos interactive console window, the
EidosConsoleWindowController class may suffice. It encapsulates all of the behavior of such a
console, including script checking and execution, display of output, and a variable browser that
can be used by the user to examine the values of individual Eidos variables. It also provides
access to the Eidos help browser, with online help for Eidos functions, methods, operators,
keywords, types, and statements. Using this class simply entails copying the class, its nib, and all
of its dependencies from EidosScribe into your own project (including EidosTextView,
EidosConsoleTextView, EidosVariableBrowserController, EidosHelpController, etc.). Eidos
does not presently supply a framework that can be linked to in order to get this functionality; that
is certainly a possibility for the future, if the demand exists.
EidosScribe shows how to create a console window; for example, it calls Eidos_WarmUp() in its
applicationWillFinishLaunching: method, and then loads the EidosConsoleWindow nib and
shows the console window in its applicationDidFinishLaunching: method.
EidosConsoleWindowController uses the standard Objective-C delegation pattern to let the
Context modify the behavior of the console window. The full suite of supported delegate methods
is given in EidosConsoleWindowControllerDelegate.h, and includes methods for (1) customizing
the symbol table and function map used by the console window controller’s interpreter; (2)
improving syntax coloring and code completion by telling the console window controller about
your added methods, constants, and language keywords; (3) receiving notifications before and
after script execution in the console window; and several other facilities as well.
One point of interest is that EidosScribe follows a single-window model in which the console
window is the app, and if the console window is closed the app quits. This is achieved in the
delegate method eidosConsoleWindowControllerConsoleWindowWillClose:, which calls
[[NSApplication sharedApplication] terminate:nil] to terminate the app. You may instead
wish to use the console window in a less central way, or even to allow multiple console windows
to be open simultaneously in your app, as SLiMgui does. There is no problem with doing so,
although you should note that Eidos is not multithreaded at this time, and so all calls to Eidos
should be done on the main thread.
14.2 The EidosTextView and EidosConsoleTextView classes
For many applications, the prepackaged console window functionality provided by
EidosConsoleWindowController may not fit the bill. For such applications, the EidosTextView and
EidosConsoleTextView classes are provided. These classes are actually used internally by
EidosConsoleWindowController; using them individually allows greater flexibility, however. For
example, an EidosTextView could be used to provide an Eidos scripting view within a larger window
in your application, perhaps to attach a snippet of Eidos code to some larger entity being
controlled in that window.
Of the two classes, EidosTextView is the more generally useful. It provides a subclass of
NSTextView that knows how to perform some standard operations with Eidos code – syntax
coloring, shifting the selection left and right, commenting and uncommenting the selection,
selecting the error range in the script after an Eidos error occurs, and doing code completion when
the user presses the <esc> key. Its behavior can be customized with delegate methods in a very
similar way to EidosConsoleWindowController (which actually just forwards many of its delegate
methods onward from the EidosTextView that it contains); those methods are described in
EidosTextViewDelegate.h.
EidosConsoleTextView is a subclass of EidosTextView that adds the concept of a prompt, a
command history, and a user request for execution of the command at the current prompt. It does
not actually execute commands, however; it does not keep an interpreter or a symbol table. Its
design is closely tied to that of EidosConsoleWindowController, which executes commands for it
and updates its state, and it is not likely to be easy to use outside of that context. If you really want
to have a console textview without using EidosConsoleWindowController, you will probably need
to delve deep into the internals of EidosConsoleWindowController to find out how the two classes
work together.
14.3 Extending EidosScribe
EidosScribe is designed to be a minimal application providing GUI scripting capabilities for
Eidos. It might therefore be a fairly good starting point for your own Eidos-based app. You can
customize its behavior using the delegate methods described in the previous sections; for example,
that is how you would export your own Context-defined functions and classes into the Eidos
interpreter kept internally by the console window. SLiMgui is a substantially larger and more
complex application than EidosScribe, but it does customize the Eidos interpreter and GUI in these
sorts of ways, so you might wish to examine its code. You might also wish to modify or subclass
EidosConsoleWindowController or EidosTextView, or to modify EidosConsoleWindowController’s
nib to customize the console window for your application. All of the EidosScribe code is
open-sourced and reasonably well-commented to try to facilitate such modification (as long as you
follow its licensing terms, of course). Since EidosScribe and SLiMgui are the only GUI applications
made so far using these facilities, however, it is likely that there are ways in which they will prove
difficult to modify and customize. Please feel free to send feature requests, or to send pull requests
for new features you have added in your own GitHub branch.
15. Using the Eidos tokenizer and parser: EidosToken and EidosASTNode
If you wish only to execute Eidos scripts and define your own Eidos object classes, you can
safely ignore the internals of how the Eidos interpreter works. In some cases, however, you may
wish to delve deeper, examining the token stream or the AST generated by tokenization and
parsing. This very brief chapter will point you in the right direction.
15.1 Working with tokens: EidosToken
After a script has been tokenized (see section 9.2), the token stream may be obtained from the
Tokens() method of EidosScript. The token stream is just a C++ std::vector of EidosToken*.
EidosToken, defined in eidos_token.h, is basically just a bag to hold information about a token; it
contains very little intelligence. Tokens are defined primarily by their type, a value from the
enumeration EidosTokenType; operators, language keywords, identifiers, numeric and string
literals, comments, and whitespace all have their own token types (although comments and
whitespace are normally omitted in the token stream unless requested). Tokens also know their
token string (usually the literal text in the script from which the token was formed). Finally, they
usually know their position in the script’s text, both for the UTF-8 representation of the script (as
kept by std::string) and for the UTF-16 representation of the script (as kept by NSString). This
positional information can be used to very easily do syntax coloring of an Eidos script, for
example.
The instance variables of EidosToken are public, since it is really just a bag of information, with
no behavior of its own. The only thing EidosToken really knows how to do is print itself to a
std::ostream. If you use EidosToken, therefore, you are typically running through the stream of
tokens yourself, looking for whatever tokens or patterns of tokens you are interested in.
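As an illustration of running through a token stream yourself, the sketch below turns token positions into color ranges, in the spirit of the syntax-coloring use mentioned above. It is self-contained and does not use the real EidosToken: the struct SketchToken, its field names, the enumeration SketchTokenType, and the color names are all assumptions made for this example, and the actual members declared in eidos_token.h may be named and typed differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token types; the real EidosTokenType has many more values.
enum class SketchTokenType { kIdentifier, kNumber, kString, kOperator, kComment };

// Hypothetical stand-in for EidosToken: a plain bag of public data.
struct SketchToken {
    SketchTokenType type;
    std::string text;
    int start;  // position of the first character of the token
    int end;    // position of the last character of the token (inclusive)
};

struct SketchColorRange {
    int start;
    int length;
    std::string color;
};

// Walks the token stream once and emits a color range for each token
// type that should be colored; other tokens are skipped.
std::vector<SketchColorRange> ColorRangesForTokens(
    const std::vector<SketchToken *> &tokens) {
    std::vector<SketchColorRange> ranges;
    for (const SketchToken *token : tokens) {
        assert(token != nullptr && token->end >= token->start);
        std::string color;
        switch (token->type) {
            case SketchTokenType::kNumber:  color = "blue";  break;
            case SketchTokenType::kString:  color = "red";   break;
            case SketchTokenType::kComment: color = "green"; break;
            default: continue;  // identifiers and operators stay uncolored
        }
        ranges.push_back({token->start, token->end - token->start + 1, color});
    }
    return ranges;
}
```

For a script such as x = 6.02e+23; this sketch would produce a single range covering the eight characters of the numeric literal, which a text view could then apply as an attribute.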
15.2 Working with the parse tree: EidosASTNode
The next level up in sophistication from EidosToken is EidosASTNode. These nodes contain a
children array, allowing them to form a simple tree structure. Each node also contains a pointer to
its corresponding token; in general there is a one-to-one correspondence, so the AST is essentially
the token stream organized into a tree in order to express the syntactical structure implied by the
tokens. Because of this, an AST can only be constructed if the token stream expresses syntactically
correct Eidos code; otherwise, a raise from the parser will prevent parsing from completing. Once
parsing has been successfully completed with the ParseInterpreterBlockToAST() method of
EidosScript (see section 9.2), the AST can be obtained by the AST() method. That method returns
the root node of the parse; all other nodes are descendants of the root node, obtained through the
chain of child vectors.
EidosASTNode is also, like EidosToken, pretty much a dumb bag of data, and so its instance
variables are similarly public. It knows how to print itself, or how to print a full indented tree of
itself and all of its descendants via recursion. It also has methods to add a child, and to change the
token assigned to a node (used during parsing in a few cases).
The only other intelligence it has is that it knows how to perform certain optimization
operations, analyzing the AST and caching some useful precomputed values and flags for greater
speed during interpretation of the tree. For example, the token for a numeric literal such as
6.02e+23 contains only the std::string text of the token, so during interpretation that string
would need to be processed to become an EidosValue_SP every time that the constant was
encountered. If the constant was used within a for loop, that repeated work would prove
extremely time-consuming; it might even be the bulk of all of the work done by the for loop. In
an optimization phase kicked off by a call to OptimizeTree() on the root AST node, however, an
EidosValue_SP is generated and cached by the AST node of the numeric literal, and that cached
value is used by the interpreter every time that the numeric literal is encountered during execution
of the script – a huge time savings. Note that OptimizeTree() is called automatically, toward the
end of the ParseInterpreterBlockToAST() method; you will probably never need to call it
yourself unless you implement your own top-level parsing function.
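The literal-caching optimization can be sketched on a toy tree. Nothing here is the real EidosASTNode: SketchASTNode, its fields, and SumLiterals() are hypothetical, the cached value is a plain double instead of an EidosValue_SP, and "interpretation" is reduced to summing the numeric literals in the tree. The sketch shows only the division of labor: the string is converted once, in OptimizeTree(), and later traversals read the cached value.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for EidosASTNode: public data plus a children array.
struct SketchASTNode {
    bool is_number;            // true if the token is a numeric literal
    std::string token_string;  // literal text of the token
    std::vector<std::unique_ptr<SketchASTNode>> children;
    bool has_cached_value = false;
    double cached_value = 0.0;

    SketchASTNode(bool number, const std::string &text)
        : is_number(number), token_string(text) {}

    void AddChild(std::unique_ptr<SketchASTNode> child) {
        children.push_back(std::move(child));
    }

    // Converts each numeric literal's text to a value once, recursively.
    void OptimizeTree() {
        if (is_number) {
            cached_value = std::strtod(token_string.c_str(), nullptr);
            has_cached_value = true;
        }
        for (auto &child : children)
            child->OptimizeTree();
    }
};

// Toy "interpreter": sums all numeric literals using only cached values,
// so no string parsing happens here however often it is called.
double SumLiterals(const SketchASTNode &node) {
    double total = 0.0;
    if (node.is_number) {
        assert(node.has_cached_value);  // OptimizeTree() must have run
        total += node.cached_value;
    }
    for (const auto &child : node.children)
        total += SumLiterals(*child);
    return total;
}
```

If SumLiterals() were called on every iteration of a loop, the strtod() conversions would still have happened only once per literal, which is the saving the paragraph above describes.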
The optimization phase does several other smart things as well, although it could do a great
deal more; at present the structure of the AST is never rearranged by optimization, for example.
The details of this optimization are likely to change in future, and are beyond the scope of this
documentation to cover in detail; it is mentioned here mainly as a point of interest. However, if
you extend the grammar of Eidos (see chapter 16), you may find that you wish to also extend the
power of the optimization phase to make your extended grammar more efficient; SLiM does this,
for example, to increase the execution speed of its callbacks by pre-scanning for the use of
particular identifiers. SLiM does this in a class called SLiMEidosBlock that encapsulates a runnable
Eidos “script block” that is itself an Eidos object; but for many purposes it might be simpler to
subclass EidosScript instead to achieve a similar effect.
16. Extending the Eidos grammar
The final topic to be covered in this manual is extending the grammar of Eidos itself. In
principle, you could do this in some very sophisticated ways; the parsing of each of the
grammatical structures of Eidos is handled by a separate method in EidosScript that could be
overridden (although you would need to change the function you override to be virtual, since
those functions are not declared as virtual presently). Similarly, the interpretation of each of the
language constructs is handled by a separate method in EidosInterpreter that could be
overridden in a subclass. Eidos is not presently designed to facilitate this level of customization,
but it would probably not be terribly difficult, if you were willing to get your hands dirty making
some simple changes to the code of Eidos itself. However, doing this is beyond the scope of this
manual.
A simpler possibility is exemplified by SLiM. SLiM extends the grammar of Eidos to provide the
concept of “script blocks”, independent snippets of Eidos code, each representing the scripts for an
event or callback, all contained within a single source file. This is done by (1) subclassing
EidosScript, in the form of SLiMEidosScript, (2) defining a new top-level parsing method in
SLiMEidosScript, named ParseSLiMFileToAST(), that parses the overall structure of a SLiM input
file to an AST that has one node per script block, and (3) parsing that AST, in
SLiMSim::InitializeFromFile(), to extract the script blocks, making a SLiMEidosBlock
(essentially an EidosScript with extra baggage) out of each block. When SLiM wants to execute a
particular script block, it then sets up an interpreter on the EidosScript for that script block.
This design is relatively simple because it leverages the facilities provided by Eidos with only a
few simple and superficial modifications. The custom top-level parsing method,
ParseSLiMFileToAST(), parses only the top-level structure of the file, in which events and
callbacks are declared, and even that parsing is done mainly by calling in to the parsing methods
of EidosScript. When it reaches the body of a script block – the actual Eidos code – it simply
calls the EidosScript method Parse_CompoundStatement() to parse the entire body of the block.
Similarly, because of the way that SLiM post-processes the generated AST to break it up into its
component script blocks, SLiM does not need to modify the Eidos interpreter at all; it has no
subclass of EidosInterpreter. All of the functionality of its events and callbacks is provided by
modifying the symbol table and function map provided to the standard interpreter, and using the
result returned by the interpreter to modify the behavior of the simulation. Under the hood, when
a callback is running, it is pure Eidos.
17. The future of Eidos
It is not presently clear what the future holds for Eidos. The design of the language is pulled in
two different directions at the moment. On the one hand, it would be wonderful to extend it in
some obvious ways to make it more powerful and general – being able to define your own
functions in Eidos would be the obvious place to start. On the other hand, the radical simplicity of
Eidos right now is both a design goal and a selling point; we want the language to be immediately
approachable by people with no programming experience (biologists, in particular, in the case of
SLiM). Every feature added to the language moves us away from that goal; if we add functions, we
would probably have to add a function type to the suite of types supported by EidosValue, and
would almost certainly have to add scoping rules, quoting and evaluation rules, and other
complexities. All of that new complexity would have to be documented, and would make Eidos
that much less approachable.
We have not yet decided on the precise balance we wish to strike; since the primary goal for
Eidos is to provide scriptability to SLiM, a lot depends upon what we hear back from the users of
SLiM 2.0. If you have feedback for us, either as a user of SLiM or as a user of Eidos in some other
Context, we’d love to hear it.