Thesis Kornienko2015A

DISSERTATION
submitted to the
Combined Faculties for the Natural Sciences and for Mathematics
of the Ruperto-Carola University of Heidelberg, Germany
for the degree of
Doctor of Natural Sciences
presented by
Diploma-Biomathematician Olga Kornienko
born in Kiew, Ukraine
Oral-examination: 16.11.2015
Neural Representations and Decoding with
Optimized Kernel Density Estimates
Referees:
Prof. Dr. Daniel Durstewitz
Prof. Dr. Christoph Schuster
Statement of Originality
Declarations according to §8 (3) b) and c) of the doctoral degree regulations: a) I hereby declare that I
have written the submitted dissertation myself and in this process have used no other sources or materials than those expressly indicated, b) I hereby declare that I have not applied to be examined at any
other institution, nor have I used the dissertation in this or any other form at any other institution as an
examination paper, nor submitted it to any other faculty as a dissertation.
Abstract
In in-vivo neurophysiology, firing rates of single neurons are traditionally presented as spike counts or peri-stimulus time histograms, which are accumulated and averaged across many presumably identical trials. These histograms either provide only noisy representations of the true underlying spiking activity or do not allow single-trial resolution. Kernel density estimates (KDEs), weighted moving averages with Gaussian kernels centered on spike times, act as low-pass filters that average out rapid changes in the firing frequency. Optimized KDEs, in which the width of the Gaussians (the bandwidth) is determined through cross-validation or bootstrapping, reflect the underlying spiking activity more accurately and also allow for single-trial resolution.
By analyzing both simulations and multiple single-unit recordings from the prefrontal cortex (PFC) of behaving rats, we found that optimized bandwidth estimates obtained through unbiased cross-validation (UCV) are an information-rich measure that is applicable to more problems than firing rate estimation. Optimized bandwidth estimates provide a characteristic value for the temporal spiking structure of single units and, in simulated data, can be modeled as a function of the temporal precision within spiking patterns, accounting for the signal-to-noise ratio. The distribution of optimized bandwidth estimates of PFC units and their joint distribution with further spike train metrics make it possible to segregate groups of cells with distinct spiking properties. Additionally, optimized KDEs obtained with UCV-based bandwidths perform reliably, or even better, compared with non-optimized KDEs when decoding behavioral events during the task. Moreover, when applied to analyze mechanisms of encoding and internal processing during self-paced cognitive tasks, optimized KDEs facilitate across-trial comparisons of firing activity in trials of varying length, make it possible to identify neuronal ensembles encoding task-related events, and can unfold population dynamics that display the underlying neural process.
Zusammenfassung (Summary)
In in-vivo neurophysiology, firing rates of single neurons are traditionally represented by counting action potentials (APs) within a time window or by the peri-stimulus time histogram, in which as many identical trials as possible are accumulated and averaged. These histograms either provide only noisy representations of the actual underlying neuronal activity or do not permit single-trial resolution. Kernel density estimators (KDEs), weighted moving averages in which Gaussian density functions are centered on spike times, act as low-pass filters that average out high-frequency fluctuations in the firing rate. Optimized KDEs, in which the width or variance of the Gaussian density function (the bandwidth) is determined by cross-validation or bootstrapping, reflect the underlying AP activity more accurately and permit single-trial resolution. We found that optimized bandwidth estimates obtained by unbiased cross-validation (UCV) are an information-rich measure that is applicable to many problems beyond the determination of the firing rate, both in simulated data and in tetrode-based electrophysiological recordings from the prefrontal cortex (PFC) of rats during behavioral experiments.
The optimized bandwidths provide a characteristic value for the temporal activity of single neurons and, in simulated data, can be modeled as a function of the temporal precision and the signal-to-noise ratio within the spike patterns. The distribution of optimized bandwidth values for PFC cells, as well as their joint distribution with further measures of AP regularity, makes it possible to distinguish classes of cells with particular firing behavior. In addition, optimized KDEs estimated with UCV-based bandwidths behave more reliably and yield better results than non-optimized KDEs when used to decode the behavior of the animal during an experiment. Furthermore, when used to analyze mechanisms of neuronal coding during the performance of complex cognitive tasks, optimized KDEs permit single-trial comparisons of trials of variable duration. They make it possible to identify neuronal ensembles that encode task-related events and can uncover population dynamics that reflect an underlying neuronal process.
Contents
1 Introduction
  1.1 The coding problem
    1.1.1 Encoding schemes of the brain
    1.1.2 Decoding and stimulus reconstruction - methods and algorithms for retrieving information conveyed by spiking activity
  1.2 Representations of neuronal spiking activity
    1.2.1 Model-free and model-based approaches in firing rate estimation
    1.2.2 Model complexity, parameter estimation and optimization in probabilistic and non-probabilistic approaches
  1.3 Motivation & goal of the thesis
2 Methods
  2.1 Spike train regularity measures
  2.2 Kernel density estimation
  2.3 Selecting the optimal smoothness of kernel density estimates by unbiased cross-validation
  2.4 Decoding behavior from spike train data: the classification procedure
  2.5 Evaluating decoding accuracy by m-fold cross-validation
  2.6 Experimental procedures and electrophysiological recordings
3 General properties of UCV-based single unit bandwidth estimates
  3.1 Motivation: inferring temporal structure of spike trains
  3.2 Surrogate data: dependency of optimized bandwidth estimates on temporal spiking patterns
    3.2.1 Generation of temporally precise spiking patterns
    3.2.2 Relationship between temporal structure of spike trains and the bandwidth outcomes
  3.3 In-vivo recordings: UCV-based bandwidth estimates of rat mPFC single-units
    3.3.1 Distribution and properties of bandwidth estimates
    3.3.2 Relationship between bandwidth estimates and other measures
  3.4 Summary & Discussion
4 Validity of optimized KDEs in neuronal decoding
  4.1 Validity measures
    4.1.1 Evaluation of UCV-based bandwidth estimates in decoding
    4.1.2 Comparing decoding with optimized and non-optimized bandwidths
  4.2 Surrogate data: validity of optimized bandwidth estimates dependent on distinct firing states
    4.2.1 Generation of spiking activity with multiple firing states
    4.2.2 UCV-bandwidth validity for simulated data sets
  4.3 Decoding spiking activity of in-vivo recordings from the rat mPFC
    4.3.1 Bandwidth validity for experimental data
    4.3.2 Decoding performance compared to non-optimized KDEs
  4.4 Summary & Discussion
5 Outlook and possible extensions
  5.1 Single-trial analysis with optimized single neuron representations
  5.2 Identifying cell ensembles encoding for task-related information
  5.3 Reconstructing population dynamics during sequence processing
  5.4 Concluding remarks
List of Figures
1.1 Spiking activity extracted from extracellular recordings
1.2 Schematic representation of neuronal codes
1.3 Illustration of Bayesian classification and the neuronal coding problem
1.4 Comparison of firing activity obtained with PSTH and by Gaussian kernel smoothing
1.5 Setting the kernel bandwidth determines temporal precision and smoothness of the firing rate estimate
1.6 Error criterion and the optimal bandwidth
1.7 Illustration of the two-stage approach for evaluating the optimized bandwidth estimates in decoding
2.1 JISI scatter plots
2.2 Illustration of Gaussian smoothing
2.3 UCV estimate as a function of the bandwidth h
2.4 Simulation illustrating Fisher's LDA with two classes and two spiking units
2.5 Simplified illustration of the m-fold cross-validation scheme
3.1 Illustration of precise spike patterns in surrogate spike trains
3.2 KDE of precise spike patterns in surrogate spike trains
3.3 Functional dependency between UCV-optimal bandwidth estimates and the temporal structure of spike trains
3.4 Distribution of single unit bandwidth estimates
3.5 Properties of single neuron spiking activity: JISI scatterplots, JISI densities and relationship to task-related events
3.6 Correlation to other measures reflecting precise temporal spiking patterns
3.7 Relationship to other spike train irregularity measures
3.8 Relationship of optimal bandwidth estimates to mean spiking activity
4.1 Concept: How to choose the smoothness? Illustration of the two-stage approach for evaluating the validity of optimized bandwidth estimates in decoding
4.2 Illustration of bandwidth validity measures and decoding performance as a function of the scaling parameter λ
4.3 Decoding performance as function of the bandwidth scaling λ or smoothing degree h
4.4 Steps of the HMM simulation
4.5 Experimental and surrogate firing rate profile distributions of the Markov sequence states
4.6 Bandwidth validity for simulated data sets
4.7 Validity measures and decoding performance as a function of the bandwidth estimate
4.8 Comparison in decoding performance of optimized KDEs and most predictive non-optimized KDEs
5.1 Illustration of time-warped KDEs
5.2 Optimized instantaneous firing rates of encoding single units
5.3 Identifying encoding units via stepwise feature selection procedures
5.4 Reconstructing population dynamics from optimal KDEs
5.5 Sequence switch task apparatus and stages of pre-training sessions on the maze
Symbols and Abbreviations

Abbreviation   Description
CV             cross-validation
CVE            cross-validation error
HMM            hidden Markov model
ifr            instantaneous firing rate
ISE            integrated squared error
ISI            interspike interval
JISI           joint interspike interval
KDE            kernel density estimate
LDA            Fisher's linear discriminant analysis
MDS            multi dimensional scaling
MISE           mean integrated squared error
mPFC           medial prefrontal cortex
pdf            probability density function
PFC            prefrontal cortex
PSTH           peri-stimulus time histogram
QDA            quadratic discriminant analysis
SD             standard deviation
SEM            standard error of mean
UCV            unbiased cross-validation

Symbol         Description
δ_Err          error percentage in deviation from optimal value
λ              bandwidth scaling parameter
λ_opt          optimal bandwidth scaling parameter in decoding
µ              mean of a variable
ν              instantaneous firing rate
σ_j            inverse of the temporal precision
σ^2            variance of a variable
h              bandwidth of the Gaussian kernel
h_opt          optimal bandwidth in decoding for a fixed set of units
h_ucv          optimal unbiased cross-validation bandwidth estimate
C_v            coefficient of variation
Err            averaged cross-validated test set prediction error
L_v            local variation
N(µ, σ^2)      normal distribution with mean µ and variance σ^2
r              Pearson correlation coefficient
r^2            coefficient of determination
1 Introduction
How does the brain encode, transmit, store or retrieve information? Recently developed large-scale techniques for measuring neuronal activity on a microscopic scale, such as multiple single-unit tetrode recordings, make it possible to acquire data from a vast number of simultaneously recorded cells at high temporal resolution. This allows for more detailed insight into the spatio-temporal structure and functional organization of neuronal activity.
A vertebrate brain contains billions of electrically excitable cells called neurons, which possess an elaborate branching structure and are connected via synapses through which each cell receives thousands of inputs from other neurons. Ion channels spanning the cell membrane control the net flow of sodium, potassium, calcium, and chloride ions in response to internal and external signals. Under resting conditions, a neuron has a negative membrane potential of about −70 mV, defined as the difference in electrical potential between the interior of the cell and the surrounding extracellular medium. At this point, the cell is said to be polarized due to different ion concentrations on either side of the cell membrane. Ion channel gating and the capability to vary the membrane potential play a key role in the ability of a neuron to generate and propagate electrical signals (neuronal signaling).
A change in the voltage gradient across the membrane that leads to sufficient depolarization, exceeding a certain threshold level, causes the generation of a short electrical pulse lasting a few milliseconds, termed an action potential or spike. A temporal sequence of action potentials is also referred to as a spike train (fig. 1.1). The action potential sequence can then be characterized by a list of successive spike arrival times. For $n$ spikes, we denote these times by $\{t_i\} = \{t_1, \dots, t_n\}$ with $t_1 < \dots < t_n$.
Transmission of these spike trains is believed to allow neurons to communicate with each other and to assemble building blocks which encode information in their activity.
[Figure 1.1: voltage trace (mV) over time with spikes at $t_1, \dots, t_4$ marked by stars and the interspike intervals ISI1, ISI2, ISI3 between them.]
Figure 1.1: Spiking activity extracted from extracellular recordings. Voltage trace containing four spike events at times $t_1, \dots, t_4$, indicated by stars. After the voltage signal from an electrode is amplified and band-pass filtered, firing of neurons in the vicinity appears as action potentials on top of background activity. Spikes are then detected using an amplitude threshold and assigned to single units according to spike sorting algorithms.
1.1 The coding problem
Deciphering how the brain processes information, and how this affects subsequent behavior, is one of the most intriguing and debated questions in neuroscience. Progress in this field requires a better understanding of the diverse neural coding strategies used by different brain areas and of the specific mechanisms by which neurons represent or code for different entities. One ongoing debate concerns which characteristics of neuronal spike trains serve as the coding signals that carry information (Rieke et al., 1997; Shadlen and Newsome, 1995, 1998; Softky, 1995; Tovee et al., 1993; Fujii et al., 1996; DeCharms and Zador, 2000). While decoding is the process of reconstructing or extracting information about a stimulus from a given neural response, for instance by predicting the most probable stimulus that could have elicited an observed spike train, encoding can be regarded as the generation of specific activity patterns that serve as representations of these stimuli or behavioral events.
1.1.1 Encoding schemes of the brain
Several candidate codes have been proposed to explain how neurons represent information. In the following we briefly review the main ideas and concepts underlying the more fundamental coding strategies; for a more detailed overview see e.g. DeCharms and Zador (2000); Rieke et al. (1997); Feng (2003).
Perhaps the most widely debated question in neural coding is whether information is conveyed in the precise timing of action potentials (temporal code, fig. 1.2 a) or rather in their frequency (rate code, fig. 1.2 b). As the timing of successive action potentials is highly irregular (Softky and Koch, 1993), the interpretation of this irregularity has led to two divergent views of cortical organization. They involve two distinct encoding schemes, commonly considered mutually exclusive, that support the propagation of either asynchronous (rate code) or synchronous (temporal code) spiking activity.
[Figure 1.2, panel (a): spike patterns 1101, 1110 and 1011 in a window T = 4△t, each with spike rate 3, for stimuli s1, s2, s3; panel (b): spike rates 4, 2 and 3 in a window T = 1△t for stimuli s1, s2, s3.]
Figure 1.2: Schematic representation of temporal and rate codes. The x-axis represents the time axis, short black vertical lines denote spike times and rows correspond to different trials. a.: Temporal code. Stimuli are encoded by the relative timing of spikes. From the response in a given window of length T to three distinct stimuli s1, s2, s3 indicated by colored bars, one can extract the spike pattern (here a binary four-digit number). Time windows of the same rate can contain distinct spike patterns, which are determined by the choice of the temporal precision or the size of the binning parameter △t. b.: Rate code. Distinct stimuli are encoded by the number of spikes within the encoding window.
The rate coding hypothesis, initially proposed by Adrian and Zotterman (1926), claims that if irregularity arises from stochastic fluctuations, the irregular interspike interval (ISI) reflects a random process.
This implies that it would take the pooled responses of many individual neurons to yield an instantaneous spike rate and requires that the spiking activities of neurons in a population are mostly uncorrelated (Mazurek and Shadlen, 2002). Accordingly, the temporal pattern of spikes would convey little information.
By contrast, in the temporal coding hypothesis the irregular ISIs may result from precise coincidences of presynaptic events. Consequently, synchrony must be a necessary feature of spiking activity, and information is conveyed by the exact timing of spikes, their intervals and patterns.
From a statistical point of view, the distinction between the temporal and the rate code is more a question of timescale and can be circumvented by selecting an appropriate temporal precision when estimating the spike rate (Rieke et al., 1997; DeCharms and Zador, 2000). Given that the firing rate is defined as the number of spikes over some time interval △t (e.g. Tovee et al., 1993), the firing rate can be estimated reliably from a single spike train by employing time bins sufficiently longer than the time between spikes, so that enough spikes occur in each bin (fig. 1.2 b). However, when using very small bins with a high temporal precision, each bin will contain only one or zero spikes (fig. 1.2 a). For example, when the firing rate changes faster than the average ISI and the time bins required to capture these changes must be very small, one is effectively measuring the position of individual spikes, which is more a measure of spike timing than of spike rate. For that reason, the distinction between rate and temporal encoding when analyzing individual spike trains is principled but in that sense rather arbitrary, since it depends on the temporal precision or time interval chosen for counting the spikes. Not surprisingly, some studies argue that both encoding mechanisms might represent two extreme modes of a continuum and that both can be integrated into a single extended encoding concept (Kumar et al., 2010; Ainsworth et al., 2012).
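The dependence on the bin size can be illustrated with a minimal sketch. The spike times, the 40 ms window and the helper `bin_spikes` are hypothetical and chosen only for illustration; they are not taken from the recordings or analysis code of this thesis. With one coarse bin the two spike trains are indistinguishable (same count), whereas four fine bins yield two different binary words.

```python
import numpy as np

def bin_spikes(spike_times, t_start, t_stop, bin_width):
    """Count spikes in consecutive bins of width bin_width (seconds)."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    edges = t_start + bin_width * np.arange(n_bins + 1)
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts

# Hypothetical spike times (s) of two responses within a 40 ms window
spikes_a = np.array([0.002, 0.013, 0.034])
spikes_b = np.array([0.004, 0.012, 0.025])

# Coarse binning (one 40 ms bin): only the spike count survives
count_a = bin_spikes(spikes_a, 0.0, 0.04, 0.04)
count_b = bin_spikes(spikes_b, 0.0, 0.04, 0.04)

# Fine binning (four 10 ms bins): each bin holds 0 or 1 spike, giving a binary word
word_a = bin_spikes(spikes_a, 0.0, 0.04, 0.01)
word_b = bin_spikes(spikes_b, 0.0, 0.04, 0.01)

print(count_a, count_b)  # identical counts
print(word_a, word_b)    # different spike patterns
```

Whether the two responses count as "the same" therefore depends only on the chosen △t, which is the sense in which the rate/temporal distinction is arbitrary.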
So far we have discussed coding properties of single cells. Apart from different spiking patterns or the rate of action potentials (Georgopoulos et al., 1986; Shadlen and Newsome, 1998), much attention has focused on whether information is carried by single neurons or by neuronal populations. While the sparse or selective coding strategy postulates that an individual neuron, sometimes also referred to as a "grandmother cell", may encode only one item (e.g. Barlow, 1972; Quiroga et al., 2005), in a fully distributed, ensemble or population coding scheme each stimulus is coded by a pattern of activity across a larger number of cells. Sparse distributed coding falls between these two schemes: the simultaneous activation of a small proportion of neurons encodes one item, and each neuron contributes to the representation of only a few stimuli. As neurons in the cortex are densely interconnected, locally and distally, and despite evidence for a clear role of individual neurons, the involvement of multiple cells during sensory processing or motor activity clearly implies that population coding must play an important role too (Sakurai, 1996). The question arises of how many neurons are needed, or what size of a functional group of neurons is required, to generate a valid representation. For example, the encoding of information can emerge from a more complex temporal organization in networks of spiking cells, such as functional groups of neurons which act as temporally transient cell assemblies (Hebb, 1949; Gerstein et al., 1989; Nicolelis et al., 1995; Harris, 2005; Riehle et al., 1997). Moreover, it is important whether interactions between the spiking of different neurons provide additional significant information about a stimulus that cannot be obtained by considering all of their firing patterns individually (correlation coding), or whether individual neurons fire spikes independently of each other.
Such correlations might play a role for even more elaborate encoding strategies which suggest that
different electrophysiological signals can be integrated and that spike phase-timing, i.e. the relative
timing of action potentials with respect to an ongoing oscillation, may serve as an encoding mechanism
(Hopfield, 1995; Kayser et al., 2009; Panzeri et al., 2010).
1.1.2 Decoding and stimulus reconstruction - methods and algorithms for retrieving information conveyed by spiking activity
Information conveyed by spiking activity can be retrieved either using the classical approach based upon the tuning curve or by decoding, i.e. predicting the stimulus or behavior that elicits a particular neuronal response. The response can represent any kind of neuronal activity: a single-neuron or a population response, a (trial-averaged) spike count, a spike sequence or an instantaneous firing rate.
In the tuning curve approach, which has been widely applied in analyses of various sensory areas, the neural response is characterized as a function of one particular stimulus attribute. Well-known studies employing this method have revealed evidence for the tuning of primary visual cortex neurons to the spatial location, orientation and direction of motion of visual stimuli (Hubel and Wiesel, 1962), of primary auditory cortex neurons to sound frequency and intensity (Schreiner et al., 2000), and of primary motor cortex neurons to the planned direction of reaching movements (Georgopoulos et al., 1986). Rather than analyzing what neural activity a particular stimulus leads to, more recent reconstruction or classification methods attempt to predict what stimulus produces a particular neuronal response (Bialek et al., 1991; Rieke et al., 1997). This paradigm is analogous to the task a downstream neuron might perform when reading out its input spike trains. A great variety of machine learning and pattern recognition techniques has been employed for reconstructing stimuli from neuronal signals. A thorough discussion of decoding algorithms can be found in e.g. Oram et al. (1998); Dayan and Abbott (2005); Pouget et al. (2000).
A descriptive example can be provided by Bayesian classification (see fig. 1.3): if $P(s_i)$ denotes the prior probability of observing a stimulus $s_i$ belonging to the set of stimuli $S = \{s_1, s_2, \dots, s_N\}$ and $P(r|s_i)$ is the conditional probability to obtain the neuronal response $r$ when stimulus $s_i$ was presented, then, by employing Bayes' theorem, we obtain:
$$P(s_i|r) = \frac{P(r|s_i)\,P(s_i)}{P(r)} \tag{1.1}$$
with
$$P(r) = \sum_{j=1}^{N} P(r|s_j)\,P(s_j) \tag{1.2}$$
Equation 1.1 gives the posterior probability for a stimulus $s_i$ being present given the observed single-trial response $r$. We can then predict the most probable stimulus $\hat{s}$ that could have elicited the response $r$ by assigning it to its most likely stimulus class label, $\hat{s} = \arg\max_{s_i} P(s_i|r)$, after having obtained posterior probabilities for all stimuli $s_i \in S$ (fig. 1.3).
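Equations 1.1 and 1.2 can be sketched in a few lines. The example below assumes, purely for illustration, that the response $r$ is a scalar spike rate and that the response distributions $P(r|s_i)$ are Gaussian with known mean and standard deviation; the numerical values and the function names `gaussian_pdf` and `bayes_decode` are hypothetical and not part of the thesis.

```python
import numpy as np

def gaussian_pdf(x, mean, sd):
    """Gaussian density, used here as an assumed model for P(r | s_i)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def bayes_decode(r, means, sds, priors):
    """Return the posterior P(s_i | r) and the index of the decoded stimulus."""
    likelihoods = gaussian_pdf(r, means, sds)   # P(r | s_i) for every stimulus
    joint = likelihoods * priors                # numerator of eq. 1.1
    posterior = joint / joint.sum()             # divide by P(r), eq. 1.2
    return posterior, int(np.argmax(posterior))

# Hypothetical response distributions for two stimuli s1, s2 (rates in Hz)
means = np.array([5.0, 12.0])
sds = np.array([2.0, 3.0])
priors = np.array([0.5, 0.5])

# A single-trial response close to the mean rate of s2 is decoded as s2 (index 1)
posterior, s_hat = bayes_decode(11.0, means, sds, priors)
print(posterior, s_hat)
```

In practice the response distributions would be estimated from training trials rather than fixed by hand, and the Gaussian form is only one possible choice.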
Besides Bayesian decoding, further examples of classifiers include nearest-neighbor algorithms, which assign a given neural response to the class of its nearest neighbor. Fisher linear discriminant classifiers and support vector machines project the original data to a space that optimally separates the samples of each stimulus class (Balaguer-Ballester et al., 2011). Additionally, decoding can be performed by training an artificial neural network (Nicolelis et al., 1998; Wessberg et al., 2000).
[Figure 1.3, schematic: stimuli $s_1, s_2 \in S$ are encoded into neural responses $r_1, r_2 \in R$ (spike trains over time for trials 1, ..., k, converted to spike rates in Hz); decoding uses the conditional probability $P(r|s)$ of a neural response $r$ to obtain the reconstructed stimulus $\hat{s}$.]
Figure 1.3: Illustration of Bayesian classification and the neuronal coding problem. While encoding can be regarded as the generation of specific activity patterns that serve as representations of stimuli or behavioral events, decoding is the process of reconstructing or extracting information about a stimulus from a given neural response, for instance through Bayesian decoding by predicting the most probable stimulus that could have elicited an observed spike train. First, spike trains from multiple trials of a simulated single neuron are converted to spike rates that represent the neuronal response. Then, the response distributions $P(r|s)$ corresponding to two different stimuli $s_1, s_2$ (here: a wheel-turn and a lever-press) are estimated. When the neuron fires a single-trial response $r$ close to the average spike rate for $s_2$, stimulus $s_2$ will be decoded.
So far we have discussed principles of decoding. However, before neural activity can be decoded, the sequence of successive spike arrival times $t_1 < \dots < t_N$ must first be transformed into a more tractable representation of spiking activity, e.g. by converting a spike train to a spike count rate as in the Bayesian decoding example (fig. 1.3).
1.2 Representations of neuronal spiking activity
The complex temporal structure and dynamics of univariate single-neuron and multivariate spiking activity call for advanced methods when analyzing the precise temporal spiking of single cells, correlations across multiple neurons or population dynamics (Brown et al., 2004; Churchland et al., 2007; Kass et al., 2005). Thus, before the information conveyed by spiking activity can be accessed, e.g. by decoding, spike train data (successive spike arrival times) must first be transformed into a denoised, more interpretable representation of neuronal spiking activity. Discrete spike trains can be transformed in various ways into representations of the underlying neuronal activity. These representations can comprise either a single-neuron or a population response, a binary spike sequence, a (trial-averaged) spike count or an instantaneous firing rate.
[Figure 1.4, panels: "Constant rate function" and "Precise spiking pattern", each shown for N=1 and N=50 trials; top row PSTH, bottom row KDE; axes: ifr [Hz] versus t [s].]
Figure 1.4: Comparison of firing activity obtained with PSTH and by Gaussian kernel smoothing for constant spiking activity and temporally precise spiking patterns, when averaged over N=50 simulated trials and for single-trial estimates; the dashed line indicates the true underlying rate. While the PSTH only converges to the true rate when averaged over a sufficiently high number of trials for both precise pattern and constant activity (top panel), instantaneous firing rates obtained by kernel smoothing can approximate the underlying spiking activity on a single-trial basis (lower panel).
1.2.1 Model-free and model-based approaches in firing rate estimation
The irregular spiking activity of a neuron is considered mathematically as a stochastic point-process or
probability function (Johnson, 1996; Tuckwell, 1988; Perkel et al., 1967; Cox and Isham, 1980) and can
be generally described by the observed number of spikes per time interval, the spike count or spike rate.
Due to the probabilistic nature of spike trains, the same underlying input rate will not produce identical spiking patterns across trials. Therefore, in the most commonly employed approach, spike counts are averaged
across many presumably equal trials in the form of a peri-stimulus time histogram (PSTH; Gerstein and
Kiang, 1960), which for a sufficiently large number of trials will approximate the true underlying
rate. Figure 1.4, top panel, shows PSTHs (gray bars) for simulated constant spiking activity of 5 Hz and
precise spike patterns (short pulses of milliseconds of elevated firing of 10 Hz on top of lower background
spiking rates of 3 Hz). When averaging over 50 trials (fig. 1.4, top right) the noise level is clearly
reduced and estimated rates approximate true underlying rates (dashed lines) compared to non-averaged,
single-trial PSTHs (fig. 1.4, top left). Although noise may be reduced by averaging across multiple experimental trials to produce a smooth firing rate estimate, averaging is sometimes not desirable and can
even be misleading. For instance, if neural responses reflect internal processing rather than external stimulus
drive, if the time courses of the neural responses differ across trials, or if trials are self-paced (as in freely moving
animals), single-trial analysis should be preferred. This is particularly important in behavioral tasks involving motor planning, decision making, rule learning and perception (Nawrot et al., 1999; Horwitz
and Newsome, 2001; Czanner et al., 2008; Mante et al., 2013; Churchland et al., 2010).
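The trial averaging behind the PSTH can be illustrated with a minimal sketch. It is not the code used in the thesis: the function name `psth`, the 1 s bin width and the simulated 5 Hz Poisson trials are assumptions chosen only to mirror the constant-rate example of fig. 1.4.

```python
import numpy as np

def psth(trials, t_start, t_stop, bin_width):
    """Trial-averaged spike count per bin, expressed as a rate in Hz."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    edges = t_start + bin_width * np.arange(n_bins + 1)
    counts = np.zeros(n_bins)
    for spike_times in trials:
        counts += np.histogram(spike_times, bins=edges)[0]
    return edges, counts / (len(trials) * bin_width)

# Hypothetical data: 50 trials of homogeneous 5 Hz Poisson spiking over 30 s.
rng = np.random.default_rng(0)
trials = [np.sort(rng.uniform(0.0, 30.0, rng.poisson(5 * 30))) for _ in range(50)]

edges, rate_all = psth(trials, 0.0, 30.0, 1.0)       # averaged over 50 trials
_, rate_single = psth(trials[:1], 0.0, 30.0, 1.0)    # single-trial PSTH
```

In this sketch the single-trial histogram fluctuates much more strongly around 5 Hz than the 50-trial average, which is the effect described for the top row of fig. 1.4.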
Figure 1.5: Setting the kernel bandwidth determines temporal precision and smoothness of the firing rate
estimate (gray lines). For KDE we employed a normal (Gaussian) N(µ, σ²) probability density function as kernel.
µ is centered at spike times (denoted by black vertical lines) and the standard deviation σ controls the width of
the kernel. When the bandwidth is small (left) compared to the average ISI length, one is effectively measuring the
position of individual spikes, making it more a measure of spike timing, while the estimated rate is noisier.
By setting a reasonable bandwidth (middle) one may capture trends and precise spike patterns, whereas a too large
bandwidth (right) leads to oversmoothing and gives an estimate of the average spike rate.
The instantaneous firing rate obtained by kernel smoothing can resolve this problem by giving denoised
single-trial estimates without averaging and at the same time capture precise spike timing if the temporal resolution is sufficiently high. Figure 1.4, bottom panel, shows trial-averaged and single-trial firing
rates obtained with KDE. In this setting, single-trial KDEs already approximate the true underlying rate,
in contrast to the PSTH. Kernel smoothing, originally known as the Parzen window method (Parzen,
1962) and also referred to as kernel density estimation (KDE; Shimazaki and Shinomoto, 2010; Nawrot
et al., 1999; Bowman and Azzalini, 1997; Silverman, 1986; Wand and Jones, 1994), is closely related
to the PSTH but provides a smooth and continuous firing rate estimate: instead of counting the number of spikes
in fixed time intervals, each spike is replaced by a kernel, i.e. a probability density function.
Each point (firing rate value) in time is then set to a weighted average of that point's neighborhood. The
convolution of the spike train with a kernel acts as a low-pass filter, averaging out rapid changes in the
firing frequency. Similarly to the bin width of the PSTH, varying the kernel bandwidth, i.e. the width
of the probability density function, changes the temporal precision of the resulting firing rate estimates, as
shown in figure 1.5. Alternatively, recently employed techniques for obtaining denoised, smoothed estimates of single-neuron instantaneous firing rates on a trial-by-trial basis which are more tailored to
neural data include Bayesian binning (BB; Endres et al., 2008), Bayesian adaptive regression splines
(BARS; Dimatteo et al., 2001; Olson et al., 2000; Kaufman et al., 2005; Kass et al., 2005), Gaussian
process firing rates (GPFR; Cunningham et al., 2008) or generalized linear models (GLMs; Truccolo
et al., 2005; Barbieri, 2001; Czanner et al., 2008).
1.2.2 Model complexity, parameter estimation and optimization in probabilistic and nonprobabilistic approaches
Many sophisticated methods are probabilistic (BB, BARS, GPFR, GLM), i.e. they require assumptions
about the true firing rate distribution (based on a parametric form, such as a homogeneous Poisson distribution; Daley and Vere-Jones, 2003) to estimate the most likely firing rate function via maximum
likelihood or Bayesian inference. These methods perform poorly if the parametric form deviates from
the empirical firing rate. Additionally, when many parameters have to be estimated they suffer from
technical complexity which leads to increasing computational run-time (Cunningham et al., 2009; Kass
et al., 2005). Particularly for exploratory data analysis, where the underlying spiking structure is not
well known and first has to be investigated informally (Tukey, 1977), KDEs provide a good trade-off
between rather inaccurate multiple-trial firing rate estimators (PSTH) and more complex probabilistic
models, which are more efficient but incur greater computational cost and perform poorly when the
assumptions of the parametric model are not met.
The most obvious advantages of kernel density methods are that they are simple to implement and nonparametric (i.e. requiring no assumptions about the underlying firing rate distribution).
Analogous to choosing the bin width of the PSTH, i.e. the time interval for counting the number of spikes, the temporal precision of the KDE is set by a single parameter (the kernel width) which also determines the smoothness
of the instantaneous firing rate (fig. 1.5). The optimal bandwidth can be inferred in multiple ways: informally, e.g. by visual inspection, by rule of thumb (as a multiple of the average ISI) or, according to
some formal optimization measures (Bowman, 1984; Rudemo, 1982; Shimazaki and Shinomoto, 2010).
These usually quantify the discrepancy between the estimate and the underlying rate by some error criterion. The optimized bandwidth will then be the bandwidth value that minimizes the error measured by
the error criterion (fig. 1.6).
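[Figure 1.6 panels: left, firing rate estimates (ifr [Hz]) over time [s]; right, MISE as a function of the bandwidth, with marks for a too small, the optimal, and a too large bandwidth]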
Figure 1.6: Error criterion for the bandwidth parameter and the optimal bandwidth. Left: True underlying
rate (black dashed line), noisy firing rate estimate with a too narrow bandwidth choice (light gray), oversmoothed
firing rate estimate with a too large bandwidth (dark gray). Right: The mean integrated squared error (MISE) quantifies
the deviation from the underlying rate (on the left) as a function of the bandwidth. MISE values for the two non-optimal firing rate estimates shown in the left plot are indicated as marks on the x-axis.
Throughout the thesis we will make use of optimized KDEs with bandwidths estimated through unbiased cross-validation (UCV) for obtaining instantaneous firing rates which serve as a representation of
the underlying neuronal spiking activity. Both methods, KDE and UCV, will be discussed in detail in sections 2.2 and
2.3.
1.3 Motivation & goal of the thesis
The Link between neuronal representations and decoding
Decoding and finding a representation of spiking activity are tightly interlinked processes. The estimated
firing rate will impact subsequent classification and inference about encoding mechanisms. Conversely,
encoding mechanisms influence the temporal structure of single neuron and population spiking activity
that will be reflected in the value of optimized single neuron bandwidth estimates.
The main goal of the thesis is to outline this relationship by focusing on three major properties of optimized bandwidth estimates: first, their role as a characteristic value which carries information on
temporal spiking structure and single neuron coding properties, as well as their contribution to segregating
groups of cells with distinct spiking structure; second, their contribution to improved decoding performance by optimal noise reduction in instantaneous firing rate estimates; and third, the conclusions we
can draw about neuronal coding mechanisms when applying optimized bandwidths and KDEs to analyze single-trial representations, to identify ensembles of cells encoding task events and to unfold
neuronal population dynamics.
Although bandwidth optimization is not a novel approach, its application in this context is unique: Despite an abundance of general-purpose statistical bandwidth optimization routines (Scott and Terrell,
1987; Silverman, 1986; Simonoff, 1996; Sain, 2002; Sain and Scott, 2002; Hazelton, 2003; Antoniadis
et al., 2009; Bouezmarni and Rombouts, 2010; Tran, 2010; Zougab et al., 2014), surprisingly few of
these approaches have been applied to neural data. Instead, increasing attention is directed towards
model-based techniques which are specifically tailored to analyze spike trains. However, these are derived from strict assumptions about the underlying properties of the data, e.g. homogeneous Poisson spiking
in the method proposed by Shimazaki and Shinomoto (2010), which often do not hold. Therefore, there
is a need to exploit these pre-existing model-free bandwidth optimization routines for spike train analysis. Furthermore, bandwidth optimizers have been applied to spike train data primarily for facilitating
estimation of firing rates and subsequent decoding, in a two-stage approach. Most of these studies focus on
the general performance compared to other methods for ifr estimation (Cunningham et al., 2009; Kass
et al., 2005). However, to date validity and unbiasedness of the bandwidth optimization routine outcome
in decoding have not been analyzed systematically. Additionally, the exploration of many highly interesting
features, such as the temporal structure of spike trains, which can be highlighted by optimized bandwidth
estimates, has been neglected.
This work aims to address this gap in the research by analyzing data from multiple simultaneously
recorded prefrontal cortex neurons of freely behaving rodents during the performance of complex stimulus operant-based self-paced tasks.
[Figure 1.7 panels: I. kernel smoothing (spiking activity of unit 1 and unit 2 over t, convolved (⊗) with a kernel of bandwidth h to give a representation of spiking activity; MISE over h with the optimal bandwidth marked); II. classification (Neuron 1 vs. Neuron 2 firing rates in spikes/sec for nose-poke (np) and wheel-turn (wt) as class 1 and class 2, encoding scheme); III. performance (misclassification error over h)]
Figure 1.7: Synthesis and link between the coding problem and finding a valid representation of neuronal
spiking activity: illustration of the two-stage approach for the simulated spiking activity of two hypothetical
neurons in response to two stimuli: wheel-turn (wt) and nose-poke (np). In a first step, optimized bandwidth estimates are obtained from both units respectively. These are plugged in as parameters for subsequent KDE. The
estimated multivariate firing rates (ifr neuron one vs. ifr neuron two) are then used to uncover the encoding mechanism, here: correlation- and rate-coding. Decoding performance can then be quantified by the misclassification
error, i.e. the relative number of misclassified points.
Thesis organization:
After the general introduction, an overview of the employed methods and a description of the behavioral task and electrophysiological recordings are given. The first part will cover the basic properties of
optimized bandwidth estimates which are usually employed to infer the instantaneous firing rate by kernel density estimation. We focus on general features of the single unit UCV-based bandwidth outcomes
by examining their distribution and correlation with further spike train statistics in experimental data.
We investigate what information on temporal structure of spike trains bandwidth estimates can reflect
by analyzing simulated data with temporally structured spiking patterns.
In the second part, we combine firing rate estimation methods with classification techniques to decode
animal behavior during the task based on spiking activity from in-vivo recordings and simulated data
sets. As part of this two-stage approach (kernel density estimation in combination with decoding, see
fig. 1.7), we address the advantages and disadvantages of the method and its contribution to the improvement in decoding performance by comparing the outcome of optimized with non-optimized KDEs.
This is especially important given that, in optimization procedures, bandwidth estimates are inferred for each
recorded neuron individually, while non-optimized bandwidths are equal across all cells. Additionally,
we examine validity and unbiasedness of UCV-based bandwidth estimates by systematically scaling
their outcomes to ascertain whether a better scaled value for the decoding procedure exists.
Lastly, we will give an outlook on which inferences can be made about encoding mechanisms when applying optimized KDEs to study the ensemble activity of simultaneously recorded neurons from the rat
mPFC during a cognitive higher-order sequence-processing task.
The second and third parts have already been published as conference abstracts in the context of several
conference poster presentations:
Kornienko O, Ma L, Hyman JM, Seamans JK, Durstewitz D. (2013). Analysis of population coding in
the rat medial prefrontal cortex. (Poster presentation). Society for Neuroscience 43rd Annual Meeting,
San Diego, CA.
Kornienko et al.: Neuronal coding in the rodent prefrontal cortex. BMC Neuroscience 2013 14(Suppl
1): P117. (Poster presentation). 22nd Annual Computational Neuroscience Meeting, Paris, France.
Kornienko O, Ma L, Hyman JM, Seamans JK, Durstewitz D. (2012). Reconstructing neural population
dynamics during sequence processing in rat prefrontal cortex. (Poster presentation). Society for Neuroscience 42nd Annual Meeting, New Orleans, LA.
2 Methods
In the following chapter we will introduce methods and statistical concepts for spike train data analysis and
the experimental data which will be used throughout this study. First, we will refer to more
general spike train statistics and regularity metrics. Then, we will outline the representation of
the spike train by its instantaneous firing rate and the procedure for determining the optimal bandwidth
for instantaneous firing rate estimation. Finally, the last sections of this chapter are dedicated to
neuronal decoding with the aid of linear discriminant analysis based on instantaneous firing rates, the
validation of the decoding results and the in-vivo data sets we applied our methods to.
All procedures and analyses described throughout the thesis were implemented and run using custom-written
MATLAB code (MathWorks, Inc., Natick, Massachusetts).
2.1 Spike train regularity measures
A standard metric for measuring the variability or irregularity of a spike train is the coefficient of variation Cv, which is defined as
$$ C_v = \frac{\sigma_{ISI}}{\mu_{ISI}} \tag{2.1} $$
where µISI and σISI denote the mean and the standard deviation of the ISI distribution. A Cv value close
to zero implies that ISIs are almost constant and that the spike train is highly regular, while values close
to unity are expected for a Poisson process (Softky and Koch, 1993; Holt et al., 1996; Ponce-Alvarez
et al., 2010). It is worth noting that the Cv will not give an accurate estimate of the ISI variability if
µISI and σISI are not stationary and the firing rate changes over time (Softky and Koch, 1993; Holt et al.,
1996). Thus, the Cv is rather a global measure of the regularity of spikes emitted by a neuron and is
sensitive to fluctuations in the neuron's firing rate. Accordingly, a very high Cv value might reflect high
firing rate fluctuations (Wohrer et al., 2013).
To decrease the effect of such fluctuations and allow for local non-stationarities, metrics which measure
the local regularity have been developed. The local variation Lv, proposed by Shinomoto et al. (2002, 2003),
compares only adjacent inter-spike intervals
$$ L_v = \frac{3}{N-1} \sum_{n} \frac{(ISI_n - ISI_{n+1})^2}{(ISI_n + ISI_{n+1})^2} \tag{2.2} $$
where ISIn = tn − tn−1 and n=2,...,N . N is the number of emitted spikes. The Lv is equal to zero for
distributions reflecting constant ISIs and expected to be near one for Poisson processes.
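Both regularity measures can be computed directly from the ISI sequence. The following is a minimal sketch, not the thesis code; the function names are hypothetical, and as an assumption the Lv sum is normalized by the number of adjacent ISI pairs (as in Shinomoto et al.), which for long spike trains is practically identical to the prefactor in eq. (2.2).

```python
import numpy as np

def coefficient_of_variation(spike_times):
    """Cv (eq. 2.1): standard deviation of the ISIs divided by their mean."""
    isi = np.diff(spike_times)
    return isi.std() / isi.mean()

def local_variation(spike_times):
    """Lv (eq. 2.2): compares each ISI only with its direct successor."""
    isi = np.diff(spike_times)
    first, second = isi[:-1], isi[1:]
    # Assumption: average over all adjacent ISI pairs.
    return 3.0 * np.mean((first - second) ** 2 / (first + second) ** 2)

# Hypothetical spike trains: perfectly regular vs. Poisson (exponential ISIs).
regular = np.arange(0.0, 10.0, 0.1)
rng = np.random.default_rng(1)
poisson = np.cumsum(rng.exponential(0.2, size=20000))

cv_regular, lv_regular = coefficient_of_variation(regular), local_variation(regular)
cv_poisson, lv_poisson = coefficient_of_variation(poisson), local_variation(poisson)
```

For the regular train both values are numerically zero, while for the simulated Poisson train both are close to one, in line with the limiting cases stated above.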
A further method to characterize the serial dependence of adjacent ISIs is the graphical examination
of joint ISI (JISI) scatter plots. JISI plots, also referred to as Poincaré, return or recurrence maps, are a
widely used technique for detecting nonlinear dynamics (Abarbanel et al., 1996; Segundo et al., 1998; Fitzurka and Tam, 1999; Szucs
et al., 2003). In such plots each point corresponds to a value pair of consecutive ISIs (ISIn, ISIn+1)
among three adjacent spikes. Points falling into the lower right part indicate spike triplets with a
short ISI following a longer one, and points in the upper left half correspond to triplets with a
longer ISI following a shorter one (fig. 2.1, top panel).
These plots can highlight regions of increased local point density, which indicate precisely
replicating spike triplet patterns (Szucs et al., 2003).
[Figure 2.1 panels: A) Bursty spiking, B) Quasi-random spiking; axes: ISIn [ms] vs. ISIn+1 [ms], logarithmic from 10^0 to 10^4]
Figure 2.1: JISI scatter plots of spiking activity of two recorded units from the rat mPFC, taken from the
sequence-switch task (described in sec. 5.4 and 2.6).
2.2 Kernel density estimation
The classical exploratory approach, representing the data in histogram or PSTH form, conveys visual information on the frequency and relative frequencies of observations, which is the essence of
any density function, but comes with several drawbacks. Because bin counts are constructed
on a set of equal-sized non-overlapping intervals, the PSTH suffers from discontinuities (see fig. 1.4,
top panel), i.e. sharp steps between the bins, and from variability when the placement of bins is changed. To
overcome both shortcomings one can apply a moving average, which calculates the local average value
in a window centered around each point in time t and which can be viewed mathematically as convolving
the original data with a uniform rectangular window
$$ \hat{\nu}_h(t) = \frac{\#\{t_i \in (t-h,\, t+h]\}}{2nh} \tag{2.3} $$
Equation 2.3 gives the basic kernel density estimator which was introduced as the Parzen-Rosenblatt
window method (Rosenblatt, 1956; Parzen, 1962) and can be rewritten as
$$ \hat{\nu}_h(t) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{t - t_i}{h}\right) \tag{2.4} $$
where h is the bandwidth and K(u) is the kernel, which is the uniform probability density function
K(u) = 1/2 if −1 < u ≤ 1 and zero otherwise, n is the total number of spikes and t1 < . . . < tn are successive single unit spike arrival times.
In principle, the kernel function K(u) can be any probability density which satisfies the conditions
$$ K(u) \ge 0, \quad \int K(u)\,du = 1, \quad \int u K(u)\,du = 0, \quad \int u^2 K(u)\,du < \infty. $$
Instead of using a rectangular window each point can be replaced by a locally weighted moving average
which assigns different weights to adjacent points in the sample window by multiplying these with
different factors (e.g. Gaussian function in fig. 2.2). The weighted moving average provides a smooth,
continuous density estimate averaging out undesired high-frequency components which has the same
effect as low-pass filtering the signal.
In order to obtain firing rates from spiking activity we employed Gaussian pdfs as kernels (also known as Gaussian
smoothing), so that, after substituting the Gaussian kernel function in equation 2.4, the instantaneous
firing rate estimate can be written as follows,
$$ \hat{\nu}_h(t) = \frac{1}{n\sqrt{2\pi}\,h} \sum_{i=1}^{n} \exp\left(\frac{-(t - t_i)^2}{2h^2}\right) \tag{2.5} $$
Kernel density estimates of spiking activity ν̂h (t) were obtained at a temporal resolution of 100 ms as a
function of time bin t for all recorded neurons.
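Equation (2.5) translates almost literally into code. The sketch below is an illustration under stated assumptions rather than the implementation used in the thesis: the function name `gaussian_kde_rate`, the example spike times and the bandwidth of 0.5 s are hypothetical; only the 100 ms evaluation grid is taken from the text.

```python
import numpy as np

def gaussian_kde_rate(spike_times, t_grid, h):
    """Eq. 2.5: mean of Gaussian densities with s.d. h centred on the spike times."""
    spike_times = np.asarray(spike_times, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)
    # u[a, b] is the scaled distance between grid point a and spike b.
    u = (t_grid[:, None] - spike_times[None, :]) / h
    norm = len(spike_times) * h * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * u ** 2).sum(axis=1) / norm

# Hypothetical spike times (s); evaluation grid at 100 ms resolution.
spikes = np.array([10.0, 12.0, 15.0, 15.5, 20.0])
t_grid = np.arange(0.0, 30.0, 0.1)
nu_hat = gaussian_kde_rate(spikes, t_grid, h=0.5)
```

Because of the 1/n normalization in eq. (2.5), the estimate integrates to one over time; multiplying it by the number of spikes n gives a curve whose integral equals the spike count.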
[Figure 2.2 panels: spike train (0 < t1 < t2 < ... < tn) ⊗ Gaussian kernel K(u) of width h = constructed KDE ν̂h(t), cf. eq. 2.5]
Figure 2.2: Illustration of Gaussian smoothing, i.e. convolution with Gaussian kernels: the KDE (dark green) as
average over probability densities (light green) centered at the spike times ti (black vertical lines).
2.3 Selecting the optimal smoothness of kernel density estimates by unbiased cross-validation
Kernel density estimation is left with one free parameter: the width of the Gaussian kernel h. The choice
of h is the major problem in KDE and selecting an appropriate bandwidth is crucial since it highlights
different aspects in the structure of the data, affects the smoothness of the instantaneous firing rate and
thus determines the goodness-of-fit of the density estimate to the underlying spiking activity.
The optimal bandwidth can be inferred in multiple ways: informally by rule of thumb, e.g. as a multiple
of the average ISI, or, according to more formal optimization criteria (Bowman, 1984; Rudemo, 1982;
Shimazaki and Shinomoto, 2010). The average ISI, which is equal to the inverse of the mean firing rate
⟨ISI⟩ = 1/⟨ν⟩, can be a reasonable choice if spiking activity is homogeneous, i.e. following a Poisson
process with approximately constant firing rate ν over time. However, often this is not the case so
that more accurate measures are needed. More formal, mathematical approaches usually quantify the
discrepancy between the estimate ν̂h and the underlying rate ν by some error criterion. A commonly
employed measure to quantify the accuracy of the firing rate estimate is the mean integrated square error
(MISE; Simonoff, 1996; Silverman, 1986). MISE calculates the distance between the estimated rate ν̂h
and the actual rate ν as a function of the kernel width h.
$$ MISE(h) = E[ISE(h)] = E \int (\hat{\nu}_h - \nu)^2(t)\,dt \tag{2.6} $$
The optimal bandwidth will then be identified as the bandwidth value that minimizes the error function
(as illustrated in fig. 1.6). However, since the underlying rate is usually unknown, MISE itself has to be
approximated asymptotically. Several studies suggested bandwidth selection with approximate MISE
by cross-validating the data (Scott and Terrell, 1987; Silverman, 1986; Rudemo, 1982; Bowman, 1984;
Loader, 1999), i.e. calculating MISE for partitions of the data where one part is treated as the actual and
the other as the estimated firing rate.
In the following we will make use of the bandwidth selection method (fully described in Rudemo, 1982;
Bowman, 1984) referred to as least-squares cross-validation or unbiased cross-validation (UCV) which
provides an explicit solution for the error function. The idea is to consider the expansion of the integrated
square error (ISE) in the following way
$$ ISE(h) = \int (\hat{\nu}_h - \nu)^2(t)\,dt = \int \hat{\nu}_h^2(t)\,dt - 2\int \nu(t)\hat{\nu}_h(t)\,dt + \int \nu^2(t)\,dt \tag{2.7} $$
The last term in eq. (2.7) is constant as it does not depend on ν̂h or h, so that only the first two terms
need to be considered. The ideal choice of bandwidth is the one which minimizes
$$ ISE(h) - \int \nu^2(t)\,dt = \int \hat{\nu}_h^2(t)\,dt - 2\int \nu(t)\hat{\nu}_h(t)\,dt \tag{2.8} $$
The CV approach suggests removing one observation at a time tj , j=1,...,n, and calculating the usual
kernel estimator based on the remaining n − 1 data points
$$ \hat{\nu}_{h,-j}(t) = \frac{1}{(n-1)h} \sum_{i \neq j} K\left(\frac{t - t_i}{h}\right), \quad j = 1, \ldots, n \tag{2.9} $$
Accordingly, the last integral in eq. (2.8) can then be replaced by ∫ (ν̂h ν)(t)dt = E[ν̂h,n−1 (t)], where the
expectation is computed using the sample mean of the leave-one-out kernel density estimator in (2.9)
$$ E[\hat{\nu}_{h,n-1}(t)] = \frac{1}{n}\sum_{j=1}^{n} \hat{\nu}_{h,-j}(t_j) = \frac{1}{n(n-1)h} \sum_{j=1}^{n}\sum_{i \neq j} K\left(\frac{t_j - t_i}{h}\right) \tag{2.10} $$
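The reasoning behind this replacement can be sketched as follows, under the assumption (implicit in the normalization of eq. 2.5) that ν integrates to one and that the spike times can be treated as independent draws from ν. The symbol T for a hypothetical additional spike time is introduced here only for illustration:

$$ \int \hat{\nu}_h(t)\,\nu(t)\,dt = E_{T \sim \nu}\left[\hat{\nu}_h(T)\right] \approx \frac{1}{n}\sum_{j=1}^{n} \hat{\nu}_{h,-j}(t_j) $$

Evaluating the full estimate ν̂h at an observed spike time tj would include the kernel centered on tj itself and bias the average upwards, which is why tj is left out when the estimate is evaluated at tj.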
Using the right hand side of (2.8) and replacing ∫ (ν̂h ν)(t)dt by E[ν̂h,n−1 (t)] results in the UCV function
$$ UCV(h) = \int \hat{\nu}_h^2(t)\,dt - \frac{2}{n}\sum_{j=1}^{n} \hat{\nu}_{h,-j}(t_j) \tag{2.11} $$
After substituting the Gaussian KDE (2.5) in (2.11) we obtain the following expression (Taylor, 1989),
$$ UCV(h) = \frac{1}{2n^2 h\sqrt{2\pi}} \left[ \sqrt{2}\sum_{i \neq j} \exp\left(\frac{-(t_j - t_i)^2}{4h^2}\right) - \frac{4n}{n-1}\sum_{i \neq j} \exp\left(\frac{-(t_j - t_i)^2}{2h^2}\right) + n\sqrt{2} \right] \tag{2.12} $$
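The exponent 4h² in eq. (2.12) arises from the first term of eq. (2.11): integrating the product of two Gaussian kernels of width h yields a Gaussian of width √2·h in the spike time difference. As a worked step (a standard identity, stated here for convenience):

$$ \int \frac{1}{2\pi h^2} \exp\left(-\frac{(t - t_i)^2}{2h^2}\right) \exp\left(-\frac{(t - t_j)^2}{2h^2}\right) dt = \frac{1}{2h\sqrt{\pi}} \exp\left(-\frac{(t_j - t_i)^2}{4h^2}\right) $$

Summing over all pairs (i, j) and dividing by n², as required by eq. (2.5), gives

$$ \int \hat{\nu}_h^2(t)\,dt = \frac{1}{2nh\sqrt{\pi}} + \frac{1}{2n^2 h\sqrt{\pi}} \sum_{i \neq j} \exp\left(-\frac{(t_j - t_i)^2}{4h^2}\right) $$

where the first summand collects the n diagonal terms i = j.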
The optimal UCV-based bandwidth hucv is then the bandwidth for which the UCV function attains
its minimum. We minimized UCV(h) with respect to h numerically by means of the Nelder-Mead algorithm
(a direct unconstrained nonlinear optimization method provided by the built-in Matlab function 'fminsearch';
for more details see Lagarias et al., 1998). The mean interspike interval was used throughout the thesis as an
initial estimate for the fminsearch fitting procedure.
[Figure 2.3: UCV plotted against the bandwidth h, with the optimal bandwidth hucv marked]
Figure 2.3: UCV estimate as a function of the bandwidth h for spiking activity of a simulated
unit (as shown in fig. 1.4, second column).
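The bandwidth selection can be sketched in a few lines. This is an illustration, not the thesis implementation: the function names are hypothetical, the UCV function is written for the Gaussian kernel with sums over all pairs i ≠ j, and a simple logarithmic grid search around the mean ISI is swapped in for the Nelder-Mead search described above (in Python, `scipy.optimize.minimize` with `method="Nelder-Mead"` would be the closer analogue of fminsearch).

```python
import numpy as np

def ucv(h, spike_times):
    """UCV function for a Gaussian kernel (eq. 2.12), sums over pairs i != j."""
    t = np.asarray(spike_times, dtype=float)
    n = len(t)
    sq_diff = (t[:, None] - t[None, :]) ** 2
    off_diag = ~np.eye(n, dtype=bool)
    sum_4h = np.exp(-sq_diff[off_diag] / (4.0 * h ** 2)).sum()
    sum_2h = np.exp(-sq_diff[off_diag] / (2.0 * h ** 2)).sum()
    bracket = np.sqrt(2.0) * sum_4h - 4.0 * n / (n - 1) * sum_2h + n * np.sqrt(2.0)
    return bracket / (2.0 * n ** 2 * h * np.sqrt(2.0 * np.pi))

def ucv_bandwidth(spike_times, n_grid=200):
    """Grid-search stand-in for the Nelder-Mead search, centred on the mean ISI."""
    mean_isi = np.mean(np.diff(np.sort(spike_times)))
    candidates = mean_isi * np.logspace(-2, 2, n_grid)  # assumed search range
    values = [ucv(h, spike_times) for h in candidates]
    return candidates[int(np.argmin(values))]

# Hypothetical spike train: 100 spikes of a 5 Hz Poisson process.
rng = np.random.default_rng(2)
spikes = np.cumsum(rng.exponential(0.2, size=100))
h_ucv = ucv_bandwidth(spikes)
```

The grid search is slower than a simplex search but makes no assumption about the starting point; for a production analysis the search range would need to be adapted to the data.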
2.4 Decoding behavior from spike train data: the classification procedure
In order to decode animal behavior from multiple single unit spiking activity we employed Fisher's
linear discriminant analysis (LDA) in combination with cross-validation (e.g. Hastie et al., 2009).
First, kernel density estimates were obtained from all recorded neurons by convolving spike trains with
Gaussian kernels binned at a temporal resolution of 100 ms (as described in sec. 2.2). Depending on the
goal of classification, the degree of smoothing, i.e. the width of the Gaussian kernel, was either selected
arbitrarily or determined by the UCV optimization routine (eq. 2.12). Single unit instantaneous firing
rates were then combined into p-dimensional population vectors ν(t) = {ν1 (t), . . . , νp (t)} as a function of
time bin t.
As LDA is a supervised learning algorithm, labeled data are necessary to build the classification rule. We therefore
constructed time-dependent vectors y(t) containing class labels corresponding to the different events.
Based on event time stamps which were acquired during a recording session, each sampling point in
time ti within a time window of 500 ms preceding and following task-relevant behavior of
the animal was assigned to one of k = 3 distinct classes ck, corresponding to three different actions performed
on task, and summarized in a response-class vector y(t) ∈ {1, 2, 3} as a function of time (fig. 2.4, a).
Similarly, we extracted connected blocks of response-class specific population vectors of
spiking activity ν(t|ck ), discarding all points lying outside the time window of task-relevant events. We also
systematically analyzed the classification outcome for different time windows. However, since
we did not observe any qualitative change, the time window of 1 s was kept fixed for all
classification routines. All labeled points were divided into segments according to distinct trials.
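[Figure 2.4 panels: (a) firing rates ν1, ν2 over time t with extracted blocks ν(t|c1), ν(t|c2) around nose-poke (np) and wheel-turn (wt) events and the response-class vector y = (1, ..., 1, ..., 2, ..., 2); (b) Neuron 1 vs. Neuron 2 firing rates (spikes/sec) with class means µ1, µ2 for class 1 and class 2, the decision surface, and the projection W onto the direction of optimal class discrimination]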
Figure 2.4: Simulation illustrating Fisher’s LDA with two classes and two spiking units. a: Obtained firing
rates (green) and extracted consecutive blocks of spiking activity ν(t|ck ) centered around ±500ms task-relevant
behavior of the animal, here: nose-poke (np) and a wheel-turn (wt). Arrows on the x-axis indicate event timestamps. Similarly, the corresponding response-class vector y(t) ∈ {1, 2} is constructed by assigning each point in
time to one of the two classes. b: Firing rates of two units during the two different responses shown in a) are plotted
in a 2-dimensional coordinate system and color-coded according to their class-membership. The underlying firing
rates of the two neurons are correlated for both responses, clearly discernible by the covariance ellipses. LDA
finds the direction in the two dimensional space of greatest separation between the two classes.
Class-specific population vectors of spiking activity ν(t|ck ) and the response-class vector y(t) were then
employed to build an LDA classifier and to assess decoding accuracy by means of cross-validation as
discussed in the next section.
Fisher’s discriminant analysis works by maximizing the difference between class means µk and thus
the between-class scatter Sb while minimizing within-class covariances Sw , in other words, by finding
the direction W of the high-dimensional space along which the overlap between class distributions is
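minimized (this is illustrated in fig 2.4 b using two neurons). µk , Sb and Sw are defined as follows
$$ \mu_k = \frac{1}{N_k} \sum_{t_i \in c_k} \nu(t_i), \qquad \mu = \frac{1}{N} \sum_{k=1}^{K} N_k \mu_k \tag{2.13} $$
$$ S_w = \sum_{k=1}^{K} \sum_{t_i \in c_k} (\nu(t_i) - \mu_k)(\nu(t_i) - \mu_k)^{\top}, \qquad S_b = \sum_{k=1}^{K} N_k (\mu_k - \mu)(\mu_k - \mu)^{\top} \tag{2.14} $$
where Nk is the number of points in class ck and N is the total number of points.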
The projections y(ti ) of high-dimensional vectors ν(ti ) onto the optimally discriminating direction W
are obtained by
$$ y(t_i) = W^{\top} \nu(t_i) \tag{2.15} $$
The optimally discriminating direction, i.e. the weight values or the projection matrix W, is
determined by those eigenvectors of Sw⁻¹Sb that correspond to the largest eigenvalues (at most K − 1 of which are non-zero, since Sb has rank at most K − 1), by maximizing
the following equation
$$ W^{*} = \underset{W}{\operatorname{argmax}}\; \frac{|W^{\top} S_b W|}{|W^{\top} S_w W|} \tag{2.16} $$
This corresponds to maximizing between-class while minimizing within-class scatter as illustrated in
figure 2.4 b for the two dimensional case with two neurons. Similarly, projecting the firing rate vector of
multiple single units ν(ti ) for one point in time ti is equivalent to assigning the same vector to the class
with minimum Mahalanobis distance δk
$$ \delta_k = \sqrt{(\nu(t_i) - \mu_k)^{\top} \Sigma^{-1} (\nu(t_i) - \mu_k)} \tag{2.17} $$
where Σ is the pooled covariance matrix so that the predicted class labels ŷ are
$$ \hat{y}(\nu(t_i)) = \underset{k}{\arg\min}\; \delta_k(\nu(t_i)) \tag{2.18} $$
Performance of LDA is limited by the fact that the model assumes the data has a Gaussian mixture
distribution with varying class means but the same covariance matrix for each class. Since that is often
not the case we also employed a generalized model, quadratic discriminant analysis (QDA), where both
means and covariances of each class vary and decision boundaries are not linear.
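The minimum-Mahalanobis-distance rule of eqs. (2.17) and (2.18) can be sketched as follows. This is a minimal illustration and not the classifier code used in the thesis: the function names and the simulated two-neuron data are hypothetical, equal class priors are assumed, and the pooled covariance is estimated as Sw divided by N − K.

```python
import numpy as np

def fit_lda(X, y):
    """Class means and inverse pooled covariance (shared by all classes)."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    centered = X - means[np.searchsorted(classes, y)]   # subtract own class mean
    pooled = centered.T @ centered / (len(X) - len(classes))
    return classes, means, np.linalg.inv(pooled)

def predict_lda(X, classes, means, precision):
    """Assign each point to the class with minimum Mahalanobis distance."""
    diff = X[:, None, :] - means[None, :, :]             # (points, classes, neurons)
    dist2 = np.einsum('nkp,pq,nkq->nk', diff, precision, diff)
    return classes[np.argmin(dist2, axis=1)]

# Hypothetical firing rates (spikes/sec) of two correlated neurons for two responses.
rng = np.random.default_rng(0)
cov = [[1.0, 0.8], [0.8, 1.0]]
X = np.vstack([rng.multivariate_normal([3.0, 6.0], cov, 200),
               rng.multivariate_normal([6.0, 3.0], cov, 200)])
y = np.repeat([1, 2], 200)

classes, means, precision = fit_lda(X, y)
y_hat = predict_lda(X, classes, means, precision)
```

Minimizing the squared distance gives the same class assignment as minimizing δk itself, so the square root in eq. (2.17) is omitted in the sketch. Note that the example predicts on the training data; an unbiased error estimate requires the cross-validation scheme of the next section.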
2.5 Evaluating decoding accuracy by m-fold cross-validation
To evaluate the predictive power of a model, in our case smoothed multiple single unit spike train
data and the applied classifier, it is of great importance to make use of objective measures that assess
the agreement between the prediction and the underlying observations. One of the most widely used
methods for estimating the decoding performance is cross-validation, which estimates the expected
prediction error well (e.g. Efron, 2004; Hastie et al., 2009; Bishop, 2006; Krzanowski, 2000). The idea
behind cross-validation is to avoid over-fitting, in terms of the bias-variance trade-off (Hastie et al.,
2009): when fitting all available data at once with one single model, one tends to fit the variance
or noise rather than the signal. Instead, we employ separate subsets of the sample for the estimation of
model parameters, in our case classifier weights W, and for the assessment of classification performance. This
procedure provides a more generalizable and accurate measure of the decoding performance (Efron, 2004;
Hastie et al., 2009).
In m-fold cross-validation a subset of the available data is used to build the classification rule, and a
disjoint, hold out set to test it. In more detail, the data is split into m roughly equal-sized parts, each
containing n points. We fit the model to and build the classifier on m − 1 parts of the data, the training
set, and calculate the out-of-sample prediction error of the fitted model by predicting the left out, mth
part of the data, the test set. Thus, the test set prediction error Errm is denoted as
$$ Err_m = \frac{1}{n} \sum_{i=1}^{n} I\left[\hat{y}^{-i}(\nu(t_i)) \neq y(\nu(t_i))\right] \tag{2.19} $$
where I is the indicator function, the superscript −i indicates that data points ν(ti ) were left out for
building the classifier, ŷ denotes predicted class labels and y is the true class membership of ν(ti ).
This procedure is repeated for all j = 1, . . . , m parts. Averaging over the m resulting estimates yields
the m-fold cross-validated test set classification error (CVE) or more generally the expected prediction
error, which for the sake of simplicity will be referred to in the following chapters of the thesis as
misclassification error or Err, given by
$$ Err = \frac{1}{m} \sum_{j=1}^{m} Err_j \tag{2.20} $$
After obtaining response-class vectors y(t) and class-specific population firing rates ν(t|ck) for each
data set, all points were further assigned to l distinct trials. We preserved the temporal structure of
trials in the data. Thus, any sample could be split into at most m ≤ l parts, each containing at least
n/3 points of each response class. If the number of folds was smaller than the number of trials per
recording session, m < l, there is a multinomial number of ways of partitioning l trials into m distinct
parts with n_i trials in part i, resulting in
\[ \frac{l!}{n_1! \cdots n_m!} \]
possible combinations to divide trials into training and test sets. For instance, for a data set comprising
l = 30 trials and m = 3 folds this gives a total of
\[ \frac{30!}{10!\,10!\,10!} \approx 5.551 \cdot 10^{12} \]
different possible combinations to choose training and test sets. To account for the vast number of
possible sample partitions, the CV procedure was repeated multiple times with shuffled trials.
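The count of possible partitions can be checked with a few lines of Python. This is only an illustrative sketch; the function name n_partitions is hypothetical and the count refers to ordered (labelled) parts, as in the multinomial coefficient above.

```python
from math import factorial

def n_partitions(part_sizes):
    """Multinomial coefficient l! / (n_1! ... n_m!) for the given part sizes."""
    total = factorial(sum(part_sizes))
    for size in part_sizes:
        total //= factorial(size)  # each intermediate quotient is an integer
    return total

# l = 30 trials split into m = 3 folds of 10 trials each
print(n_partitions([10, 10, 10]))  # 5550996791340, i.e. about 5.551e12
```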
[Figure 2.5, schematic: original time series ν(t|c1), ν(t|c2), ν(t|c3) for trial 1 and trial 2; the classifier is
built on training set 1 / training set 2 and used to predict on the respective held-out trial]
Figure 2.5: Simplified illustration of the m-fold cross-validation scheme with l=2 trials, k=3 response-classes,
class labels are color-coded
The cross-validation procedure consisted of the following consecutive steps, summarized in pseudocode:
1. if m < l
       permute blocks of trials randomly N times
   else N = 1
   for j = 1 to N
2.     partition ν(t|ck) and y(t) into m roughly equal-sized distinct parts preserving the trial structure
       for i = 1 to m
3.         build classifier on the training set consisting of the remaining m − 1 parts of the data
4.         test classifier on the ith part and calculate prediction error Err_i
5.     average the errors obtained for training parts i and their hold-out parts −i
           Err_j = 1/m Σ_{i=1..m} Err_i
6. average over all N permutations to obtain the cross-validation test set error
       Err = 1/N Σ_{j=1..N} Err_j
m-fold cross-validation pseudo-code.
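The pseudocode can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the analyses: a nearest-class-mean rule stands in for the linear discriminant classifier, trials are assigned to folds after a random permutation of single trials, and all names (trialwise_cv_error, trial_ids, n_repeats) are hypothetical.

```python
import random

def nearest_mean_fit(X, y):
    """Stand-in classifier (assumption, not LDA): one mean vector per class."""
    sums, counts = {}, {}
    for x, c in zip(X, y):
        acc = sums.setdefault(c, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def nearest_mean_predict(means, x):
    """Return the class whose mean vector is closest to x."""
    return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(means[c], x)))

def trialwise_cv_error(X, y, trial_ids, m, n_repeats=10, seed=0):
    """m-fold CV error in which all samples of a trial share one fold."""
    rng = random.Random(seed)
    trials = sorted(set(trial_ids))
    if m >= len(trials):              # m = l: only one partition, so N = 1
        m, n_repeats = len(trials), 1
    repeat_errors = []
    for _ in range(n_repeats):        # step 1: N random permutations
        order = trials[:]
        if n_repeats > 1:
            rng.shuffle(order)
        folds = [set(order[i::m]) for i in range(m)]   # step 2: partition
        fold_errors = []
        for held_out in folds:
            train = [i for i, t in enumerate(trial_ids) if t not in held_out]
            test = [i for i, t in enumerate(trial_ids) if t in held_out]
            means = nearest_mean_fit([X[i] for i in train],
                                     [y[i] for i in train])   # step 3
            wrong = sum(nearest_mean_predict(means, X[i]) != y[i] for i in test)
            fold_errors.append(wrong / len(test))              # step 4
        repeat_errors.append(sum(fold_errors) / m)             # step 5
    return sum(repeat_errors) / len(repeat_errors)             # step 6
```

Here X is a list of firing rate vectors, y the class labels and trial_ids the trial index of each sample; m = 3 corresponds to a leave-one-third-out scheme and m equal to the number of trials to leave-one-trial-out.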
Based on our findings in chapter 4 two criteria had a substantial impact on the CV outcome: first the
number of partitions m and second the auto-correlation which is present in each consecutive block of
response class-specific firing rates ν(t|ck ). E.g. taking m = 3 folds resulted in a CVE converging to zero.
However, the test sets contained many similar (autocorrelated) samples which made the interpretation
of the CV result difficult. To overcome this issue, we averaged consecutive blocks of the same class
k for each trial instead of taking multiple sample points from the entire time window centered around
responses. Decoding with temporally averaged class-specific firing rates yielded a poorer performance
compared to non-averaged ν(t|ck ), but also implied qualitatively different information which will be
discussed in chapter 4. We employed two different cross-validation schemes.
1. Leave-one-third-trial-out CV, where m = 3 and for each trial we take block-wise averaged response-class
firing rates. This method gives a large pool of temporally uncorrelated training and test
samples, which provides a more robust estimate of the error.
2. Leave-one-trial-out CV, where m = l, i.e. the number of folds corresponds to the number of trials per
data set. Taking the non-averaged time series, we preserve the auto-correlation and temporal structure of
the original spike train series.
We varied these depending on the purpose of the classification and the issue of interest: e.g. the evolution of
single-trial dynamics (chap. 5) was analyzed with the latter, and robustness and general inference about
prediction accuracy (chap. 3 and 4) were examined with the former CV scheme. In summary, taking the CVE as
a measure of prediction accuracy, we did not primarily aim to achieve a misclassification error close to
zero, although this was the case for particular cross-validation schemes. Rather, our
primary goal was to determine a suitable measure which allowed for comparisons among certain criteria
of interest, e.g. the choice of the bandwidth for kernel smoothing or the selection of the set of units to build the
classifier. In general, the aim was to identify the model which is superior to others with the
aid of the CVE.
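The block-wise averaging used in the first scheme can be illustrated with a short Python sketch. It assumes that the samples of one trial are given in temporal order as a list of population rate vectors with one class label per sample; the function name blockwise_average is hypothetical.

```python
from itertools import groupby

def blockwise_average(rates, labels):
    """Average consecutive samples of one trial that share a class label.

    rates:  list of population rate vectors (one per time point)
    labels: class label of each time point, in temporal order
    """
    out_rates, out_labels = [], []
    start = 0
    for label, run in groupby(labels):
        n = len(list(run))                      # length of the current block
        block = rates[start:start + n]
        # average every unit (column) over the block
        out_rates.append([sum(col) / n for col in zip(*block)])
        out_labels.append(label)
        start += n
    return out_rates, out_labels

rates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
labels = ["c1", "c1", "c2", "c2"]
print(blockwise_average(rates, labels))
# ([[2.0, 3.0], [6.0, 7.0]], ['c1', 'c2'])
```

Each trial then contributes one sample per block instead of many autocorrelated samples.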
2.6 Experimental procedures and electrophysiological recordings
All of our experimental analyses were performed on data from behavioral experiments and extracellular
in-vivo recordings from the rat anterior cingulate cortex (ACC) which were conducted and generously
provided by Dr. Liya Ma and Dr. James Hyman, Brain Research Centre and Department of Psychiatry,
University of British Columbia, Vancouver, Canada.
In-vivo recordings were partly taken from the study published in Hyman et al. (2013), and partly from
the study conducted by Ma (personal communication). Both involved two distinct operant-based
behavioral tasks ("sequence-switch" and "alternation task") with multiple stimuli, task events and responses.
Sequence switch task: In the study conducted by Ma, animals had to perform a fixed sequence of three
actions on a maze before reward, which was reversed after a given number of trials. The first sequence
consisted of the following actions: wheel-turn, lever-press, nose-poke (see fig. 5.5). After at least 20
trials or 20 min, animals were removed and placed back in the maze, which indicated the switch of the
sequential order.
Alternation task: In the Hyman et al., 2013 study rats performed a continuous or delayed alternation
task where following a nose-poke one of two levers had to be pressed in an alternating fashion prior to
delivery of a reward. In the delayed version, a delay of 10 sec was introduced between each lever-press
from a preceding trial and the nose-poke initiating the next trial.
For a more detailed description and procedures of the alternation task the reader is referred to the publication by Hyman et al. (2013). Behavioral and electrophysiological details of the sequence-switch task
are best described by the citation provided by Dr. Ma in section 5.4.
We ran our analysis procedures on spike-sorted single units plus time stamps for relevant events, which
were held in the format of MATLAB matrices. Overall, data from 7 animals were analyzed, comprising 19
recording sessions with 40 trials on average and up to 74 isolated units per recording session, with 905
units in total.
For decoding analyses we extracted event-time stamps related to the 4 s delay period preceding the nose-poke,
the nose-poke and the lever-press from the alternation task, and to the nose-poke, wheel-turn and
lever-press from the sequence-switch task. Only correct trials were included.
3 General properties of UCV-based single unit bandwidth estimates
3.1 Motivation: inferring temporal structure of spike trains
Precise spike train patterns in in-vivo recordings, i.e. exact timing of single neuron spikes in behaving
animals during repeated stimulus presentation have been reported in multiple cortical areas (Bair and
Koch, 1996; Buračas et al., 1998; Reinagel and Reid, 2002; Fellous et al., 2004; Brown et al., 2005;
Raman et al., 2010). Particular precision of spike timing can be characteristic for both specific stimuli
and single neurons (Mainen and Sejnowski, 1995; Reinagel and Reid, 2002). As pointed out in a previous study by Song et al. (2009) a bandwidth estimate ”essentially determines the optimal temporal
resolution used in comparing the predicted spike train with the actual spike train where large bandwidth
values indicate low temporal resolution whereas small values imply high temporal resolution”.
The main aim of the following chapter is twofold: first, to examine if the UCV-based bandwidth estimate reflects information about temporal structure of the spike train as well as what sort of information
it implicates (experimental data) and second, conversely, if the temporal structure, i.e. the precision of
spikes and their signal-to-noise ratio, directly influence the bandwidth outcome (surrogate data).
To answer the second question we generated simulated spike trains with known underlying ground
truth and examined the relationship between bandwidth outcome and precision and signal-to-noise ratio
(sec. 3.2). To answer the first question we looked at measures in experimental in-vivo recordings which
are associated with precise spike timing and examined the relationship to the hucv outcome (sec. 3.3).
We analyzed the following three single unit spike train properties which are related to the temporal structure
of spiking activity: first, single unit discrimination power of task-related event occurrences; second,
recurrence of spike triplet patterns, i.e. the distribution of joint interspike intervals of successive ISI pairs
and regions where these are densely clustered or spread; third, general spike train statistics and irregularity
measures, i.e. the coefficient of variation, the local variation and the average firing activity.
In the following, we will refer to the temporal resolution of spike times as either precision
or jitter, where the precision (units: 1/s) is defined to be the inverse of the jitter (units: s).
3.2 Surrogate data: dependency of optimized bandwidth estimates on temporal spiking patterns
To examine how the temporal structure of spike trains impacts optimized bandwidth estimates, we generated
surrogate spike trains with known underlying structure and analyzed the UCV-based bandwidth outcome
as a function of the parameters which determine the spiking pattern.
3.2.1 Generation of temporally precise spiking patterns
We conducted the following consecutive steps:
1. A homogeneous Poisson point process of duration T = 600 s was generated with a specified constant firing
rate ν = N/T, where N is the number of spikes. Initially, this process corresponds to the background spiking
activity or noise, ν = νnoise.
2. We introduced nsignal short pulses, i.e. non-overlapping spike train patterns of short duration
Tsignal with mean firing rate νsignal = 10 Hz, corresponding to the signal.
3. To preserve the mean firing rate ν of the entire spike train, νnoise was decreased by removing the
number of introduced signal spikes Nsignal from the background spike train at random,
N′ = N − Nsignal, so that the overall rate ν = (N′ + Nsignal)/T remained unchanged.
4. To reduce the temporal precision of spike patterns, which is equivalent to introducing spike-timing
variability or temporal jitter σj, signal spike times were jittered by adding independent normally distributed
random variables with zero mean and standard deviation specified by the amount of jitter, ∼ N(0, σj²).
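The generation steps can be sketched in Python as follows. This is an illustrative sketch only: the placement of one pulse per equal-sized time slot (to guarantee non-overlap), the Poisson spiking within each pulse and all function and parameter names are assumptions that are not specified in the text. Jittered spikes may also fall slightly outside [0, T] in this sketch.

```python
import random

def poisson_times(rate, start, stop, rng):
    """Spike times of a homogeneous Poisson process on [start, stop)."""
    times, t = [], start
    while True:
        t += rng.expovariate(rate)
        if t >= stop:
            return times
        times.append(t)

def surrogate_train(rate, duration, n_pulses, pulse_len, pulse_rate,
                    jitter_sd, seed=0):
    """Return (spike times, initial background count, signal count)."""
    rng = random.Random(seed)
    # step 1: background activity at the target mean rate
    background = poisson_times(rate, 0.0, duration, rng)
    # step 2: non-overlapping pulses, one per equal-sized slot (assumption)
    slot = duration / n_pulses
    signal = []
    for p in range(n_pulses):
        onset = p * slot + rng.uniform(0.0, slot - pulse_len)
        signal.extend(poisson_times(pulse_rate, onset, onset + pulse_len, rng))
    # step 3: remove as many background spikes as signal spikes were added
    n_keep = max(len(background) - len(signal), 0)
    noise = rng.sample(background, n_keep)
    # step 4: jitter the signal spikes with N(0, jitter_sd^2)
    if jitter_sd > 0:
        signal = [t + rng.gauss(0.0, jitter_sd) for t in signal]
    return sorted(noise + signal), len(background), len(signal)

spikes, n_background, n_signal_spikes = surrogate_train(
    rate=5.0, duration=600.0, n_pulses=50, pulse_len=1.0,
    pulse_rate=10.0, jitter_sd=0.5, seed=1)
```

As long as fewer signal spikes are added than background spikes exist, the total number of spikes, and hence the mean rate, is unchanged by the insertion.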
While keeping νsignal = 10 Hz and the duration of the spike train T = 600 s constant, we examined the impact
of two parameters on the bandwidth estimate, which essentially determine the temporal structure of the
generated spike patterns:
Precision: Varying the amount of jitter σj ∈ (0, 32] s added to the data changed the temporal precision of
spikes.
Signal-to-noise ratio (SNR): We varied the number of spiking patterns/short pulses corresponding to the
signal, nsignal ∈ {5, 10, 25, 50, 75, 100, 200}, the signal duration Tsignal ∈ {0.5, 1} s and the average firing
activity of the entire spike train ν ∈ {1, 2.5, 5} Hz.
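The SNR of precise patterns to background activity can then be computed directly by applying the following
equations:
\[ \mathrm{SNR} = \frac{N_{signal}}{N_{noise}} \tag{3.1} \]
\[ = \frac{\nu_{signal} \cdot T_{signal} \cdot \frac{n_{signal}}{T}}{\nu - \nu_{signal} \cdot T_{signal} \cdot \frac{n_{signal}}{T}} \tag{3.2} \]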
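As a worked check of equation 3.2, the following Python sketch evaluates the SNR for a few parameter combinations from the ranges given above. The function and argument names are hypothetical; which combinations were actually used for the individual figures is not stated, so the examples only show that the quoted SNR levels are attainable with these settings.

```python
def snr(n_pulses, pulse_len, pulse_rate, mean_rate, duration):
    """SNR according to eq. 3.2: signal spike rate over background spike rate."""
    signal_rate = pulse_rate * pulse_len * n_pulses / duration
    return signal_rate / (mean_rate - signal_rate)

# nu_signal = 10 Hz, T_signal = 1 s, T = 600 s
print(round(snr(100, 1.0, 10.0, 2.5, 600.0), 2))  # 2.0
print(round(snr(200, 1.0, 10.0, 5.0, 600.0), 2))  # 2.0
print(round(snr(100, 1.0, 10.0, 5.0, 600.0), 2))  # 0.5
print(round(snr(50, 1.0, 10.0, 5.0, 600.0), 2))   # 0.2
print(round(snr(10, 1.0, 10.0, 5.0, 600.0), 2))   # 0.03
```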
[Figure 3.1, raster plots: columns "high precision" and "low precision", rows "high SNR" and "low SNR";
y-axis: unit 1 to unit p; x-axis: t [s], 0 to 50; the pulse duration Tsignal is marked]
Figure 3.1: Illustration of precise spike patterns in surrogate spike trains. Raster plots show spiking activity of
five exemplary simulated units (y-axis) for different SNRs and temporal precisions. Gray-colored horizontal bars
correspond to spike time occurrences of the background activity and red bars to (precise) spike patterns, Tsignal = 1
s, ν = 2.5 Hz, high SNR=2, low SNR=0.06, high precision σj =0 s, low precision σj =2 s.
Figure 3.1 shows examples of simulated spike trains with different temporal precisions σj ∈ {0, 2} of the
short pulses and different SNRs ∈ {0.06, 2}. Figure 3.2 shows the framed box in top right panel for high
precision and high SNR (SNR=2, σj =0) as instantaneous firing rate. The dashed black line displays the underlying
rate and the gray solid line the firing rate estimate obtained by KDE as average over n=50 exemplified
surrogate spike trains. Precise patterns can be identified as short pulses with increased activity by looking at
optimally smoothed instantaneous firing rates.
[Figure 3.2, panel "high SNR, high precision": legend: actual, KDE; x-axis: t [s], 0 to 30; y-axis: ifr [Hz], 0 to 20]
Figure 3.2: KDE of precise spike patterns in surrogate spike trains. Instantaneous firing rates obtained by
optimized kernel density estimation as average over 50 exemplified simulated spike trains.
We calculated bandwidth estimates for all simulated spike trains with different SNRs, precisions and
average firing rates.
3.2.2 Relationship between temporal structure of spike trains and the bandwidth outcomes
Analysis of UCV-based optimal bandwidth estimates hucv as a function of temporal structure in simulated spike trains reveals that lower bandwidth estimates are associated with more precise temporal
patterns of the spike trains, a higher signal-to-noise ratio and higher average spiking activity.
To illustrate the dependence between bandwidth estimates and the level of jitter, we first averaged values
over n=5000 simulations and then base-10 logarithm transformed both axes. From figure 3.3 b it follows
that hucv values can be characterized by a sigmoid: the UCV-based bandwidth initially increases
approximately exponentially with decreasing precision of the spike patterns. Then, depending on the SNR
level, after reaching the inflection point (jitter > 5 s) the growth slows and hucv attains its maximal value.
Beyond this capacity limit, once precise patterns are completely smeared out, hucv values remain constant.
[Figure 3.3, panel a: "hucv ∼ SNR, ν = 5 Hz", curves for jitter = 0, 1, 2.5, 8; x-axis: SNR, y-axis: hucv.
Panel b: "hucv ∼ precision^-1, ν = 5 Hz", curves for SNR = 2, 0.5, 0.2, 0.03; x-axis: jitter [s], y-axis: hucv.
Panel c: "hucv ∼ precision^-1, ν ∈ {2.5, 5} Hz", curves for SNR ∈ {0.2, 0.5, 2.0}, each at ν = 2.5 and ν = 5.0;
x-axis: jitter [s], y-axis: hucv. All axes are logarithmic.]
Figure 3.3: Functional dependency between UCV-optimal bandwidth estimates and the temporal structure
of spike trains. a: Optimized bandwidth as a function of the signal-to-noise ratio for different precisions σj ∈
{0,1,2.5,8}. b: Optimized bandwidth as a function of the temporal precision for different SNRs ∈ {0.03,0.2,0.5,2},
ν=5Hz. Lines correspond to the mean over n=5000 bandwidth estimates obtained from simulated spike trains.
Shaded area indicates the SEM. c: Generalized logistic functions fitted to hucv values dependent on the temporal
precision for different average spiking activity ν ∈ {2.5,5} Hz, SNR ∈ {0.2,0.5,2}. Dots correspond to actual values
and lines to fitted values according to eq. 3.3. SEM in the right panel has been left out for clarity.
The transformed bandwidth values dependent on the jitter σj can then be fitted by a generalized logistic
function of the following form for different SNR and average firing activity ν levels (fig. 3.3, c):
\[ h_{ucv}(\sigma_j) = A + \frac{K - A}{1 + e^{-(a + b \cdot \sigma_j)}} \tag{3.3} \]
Parameters A and K denote the lower and upper asymptote, b the growth rate and a the offset which, together
with b, sets the jitter at maximum slope (σj = −a/b). The parameters
were determined by maximizing the log-likelihood with the MATLAB built-in function "fminsearch".
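The shape of equation 3.3 can be explored with a short Python sketch. It only evaluates the function and does not reproduce the fitting procedure. The parameter values are those listed in table 3.1 for SNR = 0.2 and ν = 2.5 Hz, and it is assumed here that both σj and hucv enter in base-10 logarithm transformed form, as described for figure 3.3; the function name is hypothetical.

```python
import math

def h_ucv_logistic(sigma, A, K, a, b):
    """Generalized logistic function of eq. 3.3."""
    return A + (K - A) / (1.0 + math.exp(-(a + b * sigma)))

# parameters for SNR = 0.2, nu = 2.5 Hz (table 3.1)
A, K, a, b = 1.292, 2.366, -1.614, 3.09

sigma_mid = -a / b                       # location of the maximum slope
print(round(sigma_mid, 3))               # 0.522 (log10 units)
print(round(10 ** sigma_mid, 1))         # 3.3 (jitter in s)
print(round(h_ucv_logistic(sigma_mid, A, K, a, b), 3))  # 1.829 = (A + K) / 2
```

At the point of maximum slope the function takes the value halfway between the two asymptotes, and for very small or very large σj it approaches A and K respectively.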
From figure 3.3 c it also follows that the UCV-based bandwidth outcome always increases with lower
average firing rates if SNR levels are kept constant. Moreover, lower bandwidth estimates are associated
with higher SNRs within the temporal structure of simulated spike trains.
SNR / ν [Hz]   A                       K                      a                         b                      r²
0.2 / 2.5      1.292 (1.263, 1.321)    2.366 (2.332, 2.400)   -1.614 (-1.779, -1.450)   3.09 (2.809, 3.371)    1 − 1.2·10^-3
0.2 / 5.0      0.907 (0.873, 0.941)    2.118 (2.085, 2.151)   -1.438 (-1.594, -1.282)   3.234 (2.958, 3.510)   1 − 1.2·10^-3
0.5 / 2.5      0.731 (0.705, 0.757)    1.870 (1.852, 1.889)   -1.021 (-1.122, -0.920)   3.373 (3.176, 3.570)   1 − 6·10^-4
0.5 / 5.0      0.358 (0.314, 0.402)    2.398 (2.354, 2.442)   -1.294 (-1.397, -1.191)   2.788 (2.611, 2.965)   1 − 5·10^-4
2.0 / 2.5      0.287 (0.250, 0.323)    2.158 (2.129, 2.188)   -1.105 (-1.190, -1.020)   2.946 (2.794, 3.097)   1 − 4·10^-4
2.0 / 5.0      -0.405 (-0.53, -0.28)   3.040 (2.893, 3.186)   -1.080 (-1.199, -0.961)   1.991 (1.787, 2.195)   1 − 7·10^-4
Table 3.1: Parameter solutions and coefficients of determination for equation 3.3
Estimated model parameters and their respective 95% confidence intervals for the functions in fig. 3.3 c
are shown in table 3.1.
In general, the SEM was low (fig. 3.3, a and b), ranging from 10^-2 to below 10^0, and the proportion of
variance explained by the model in equation 3.3, r², is almost unity. The estimated sigmoid functions
hucv (σj |SNR, ν) shown in figure 3.3 c and parameter solutions for equation 3.3 given in table 3.1 are
unique in the probed SNR and ν range.
3.3 In-vivo recordings: UCV-based bandwidth estimates of rat mPFC single units
Besides the distribution of the bandwidth estimates, we examined the three following single unit spike
train properties which are related to the temporal structure of spiking activity: First, single unit discrimination power of task-related event occurrences; second, the recurrence of spike triplet pattern: distribution of joint interspike intervals of successive ISIs pairs and how densely those cluster; and, third,
general spike train statistics and global and local irregularity measures such as the mean firing rate, the
coefficient of variation and the local variation.
To examine the single unit discrimination power we first measured the decoding performance as prediction error of a linear discriminant classifier built on temporally averaged response-class specific optimized firing rates of single cells (discussed in detail in sec. 2.4). This measure reflects how well spiking
activity of a single unit discriminates between distinct task events. The misclassification rate will be
low if the single unit discrimination power is high. Consequently, a misclassification rate below chance
implies that spiking activity and spike timing must be time-locked to task-related events. Thus, low
misclassification rate can be associated with the occurrence of temporally precise spiking patterns.
In order to determine the clustering or spread of recurrent spike triplet patterns, we looked at the local
autocorrelation structure of spike trains by examining the relationship of adjacent ISIs. Besides being
related to the temporal correlation of ISIs, JISI plots can also reflect the degree of regularity or periodicity
in the underlying activity and, more importantly, the precision of recurrent triplet spiking patterns (Faure
et al., 2000). We also examined the correlation between optimized bandwidth estimates and the coefficient
of variation Cv, which reflects the global regularity of a spike train, the local variation Lv and the
average firing rate ⟨ν⟩ (as introduced in sec. 2.1).
3.3.1 Distribution and properties of bandwidth estimates
We obtained bandwidth estimates of all recorded units, N = 905 (units with spiking activity below 0.1 Hz
were excluded). From figure 3.4 it follows that UCV-based bandwidth estimates of single units follow a
mixed log-normal distribution with heavy tails and values ranging from 0.05 to 356.7. The majority of single
units have a UCV-optimal bandwidth at hucv ≈ 0.5 (visible as pronounced mode in fig. 3.4 a). This value
corresponds to the optimal temporal resolution, i.e. variance of the Gaussian kernel, for computation
of the instantaneous firing rate. Logarithm-transforms of bandwidth estimates reveal that values follow
a log-normal mixture distribution with at least two (alternation task, fig. 3.4 b) or three distinct modes
(sequence-switch task, fig. 3.4 c). We further investigated whether distinct modes represent different subgroups
of units which share common properties in their temporal spiking structure.
[Figure 3.4, panel a: "Bandwidth distribution"; panel b: "Normalized bandwidth task 1"; panel c: "Normalized
bandwidth task 2"; x-axis: hucv (logarithmic in b and c), y-axis: pdf]
Figure 3.4: a: Distribution of N = 905 single unit bandwidth estimates. b: Base-10 logarithm transformed
bandwidth estimates on task 1 (sequence-switch task) and c: on task 2 (alternation task).
To illustrate what sort of information on the temporal structure of single units can be detected by the
bandwidth estimate, we selected three exemplary units with hucv = {0.15, 0.26, 81.73} from the tails of the
log-transformed hucv distribution. We first visualized their spiking activity with the aid of return maps
(fig. 3.5 top and middle panel) and second displayed their spike patterns in relation to onsets of
task-related events (fig. 3.5 lower panel).
The JISI plots we obtained showed the presence of distinct clusters (fig. 3.5, top). The cluster size and
the spread of the points reflect the serial dependence of successive ISIs (Dodla and Wilson, 2010). To
compute how likely joint ISIs are to recur, we calculated the joint ISI pair density using a bivariate
Gaussian kernel, similar to the univariate KDE in section 2.2. These densities are visualized for
log-transformed ISIs as color-coded contour maps (fig. 3.5, middle). Hues changing from white to magenta
indicate increasing probability of recurrent spike triplet patterns, i.e. regions in which pairs of ISIs are
more likely to appear. Points which scatter in clusters indicate recurrent spike triplet patterns.
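A joint ISI density of this kind can be sketched in Python as follows. The sketch assumes an isotropic bivariate Gaussian kernel with one common bandwidth h applied to base-10 logarithm transformed ISI pairs; the bandwidth choice and the normalization used for the figures are not specified here, and the function names are hypothetical.

```python
import math

def jisi_pairs(spike_times):
    """Successive ISI pairs (log10 ISI_n, log10 ISI_n+1) of a spike train."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return [(math.log10(x), math.log10(y)) for x, y in zip(isis, isis[1:])]

def jisi_density(pairs, point, h):
    """Bivariate Gaussian kernel density estimate evaluated at `point`."""
    norm = 1.0 / (2.0 * math.pi * h * h * len(pairs))
    return norm * sum(
        math.exp(-((point[0] - x) ** 2 + (point[1] - y) ** 2) / (2.0 * h * h))
        for x, y in pairs)

# a perfectly regular train: all ISI pairs fall on one point of the return map
train = [0.1 * i for i in range(1, 50)]
pairs = jisi_pairs(train)
peak = jisi_density(pairs, (-1.0, -1.0), h=0.1)
```

For a regular train the density is concentrated at a single point, whereas irregular trains spread the same probability mass over a larger region of the return map and therefore reach lower maximal density values.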
[Figure 3.5, columns: A) Bursty sparsely spiking unit, B) Bursty stimulus-locked unit, C) Quasi-random sparsely
spiking unit. Top and middle rows: x-axis: ISIn [ms], y-axis: ISIn+1 [ms], logarithmic axes from 10^0 to 10^4;
color scale of the density from 0 to 0.6. Bottom row: trial 1 to trial k; x-axis: time [s], 0 to 15]
Figure 3.5: Properties of single neuron spiking activity: JISI scatterplots, JISI densities and relationship
to task-related events. Columns are arranged from left to right in order of increasing hucv values for three
selected units (chosen from the same data set) with hucv = {0.15, 0.26, 81.73}. Top: Scatter plots of base-10 logarithm transformed JISIs. Middle: JISI density contour plots, regions with increased densities of successive ISIs
are color-coded. ISI pairs were logarithm transformed for visualization purposes and provided a more compact
overview compared to non-transformed heavy-tailed ISI distributions. Bottom: Spiking activity (black horizontal bars) and time-stamps of successive task-related events (red horizontal bars) for 8 depicted trials of the same
units. Spike trains were aligned at the beginning of each trial. Units are taken from the sequence-switch task,
five task-related events per trial comprise: wheel-turn, nose-poke, lever-press, approaching the reward and reward
consumption respectively.
Exemplary units with small bandwidth estimates (hucv <1) exhibit irregular spike sequences with sparse
and bursty firing behavior (fig. 3.5, lower left and middle panel). JISI scatter and JISI density plots of
the unit having the lowest estimate (hucv =0.15) show three discernible clusters (fig. 3.5 left, top and
middle panel). The two arms parallel to the axes correspond to interburst intervals, where the mode
to the right displays ISIs associated with initiation and the upper mode with completion of bursts. The unit
with a slightly higher hucv value of 0.26 (figure 3.5 middle panel) is clearly time-locked to some of the
task-related events (fig. 3.5 lower, middle panel), while the third unit with the highest hucv =81.73
spikes quasi-randomly and is not time-locked to any of the task-related events.
3.3.2 Relationship between bandwidth estimates and other measures
Having characterized the distribution of the bandwidth estimates p(hucv ) separately, we next examined the
covariation of bandwidth estimates and other spike train statistics. In order to quantify this covariation
statistically, we first examined whether there is a linear correlation between hucv and the other measures by
estimating Pearson's correlation coefficient r and fitting a linear regression model with hucv as the
independent variable. If the relationship was non-linear, we attempted to fit a non-linear model. Otherwise,
if there was no direct relationship which could be characterized by an explicit function, i.e. if bandwidth
estimates and values of other measures exhibited multiple distinct clusters in their joint scatter plots, we
estimated their joint distribution, denoted as p(hucv , · ), by fitting a Gaussian mixture model. We then
characterized each component of the joint distribution separately by determining the cluster centers and
the respective covariances within the clusters.
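The first step of this procedure, Pearson's correlation coefficient together with a least-squares line, can be sketched in plain Python as follows. The function name is hypothetical and the data are made up purely to show the call.

```python
import math

def pearson_and_line(x, y):
    """Pearson's r plus slope and intercept of the least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept

# illustrative values only
h = [0.1, 0.5, 1.0, 5.0, 10.0]
err = [55.0, 62.0, 60.0, 70.0, 66.0]
r, slope, intercept = pearson_and_line(h, err)
r_squared = r ** 2   # proportion of variance explained by the line
```

For a linear model with a single predictor, the proportion of explained variance equals the squared correlation coefficient, which is why a correlation of r = 0.36 goes along with r² ≈ 0.13.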
Single unit discrimination power. Bandwidth estimates and decoding performance of single units were
significantly correlated (r = 0.36, p-value < 10^-29, Pearson correlation coefficient). By fitting a linear
model of the form Err = 3.2 · hucv + 61.4, 13% of the variation can be explained (r² = 0.13, F: 138,
p-value = 1.01e-29, F-statistic vs. constant model). Most of the cells were encoding above chance. The
mode of the error distribution is centered at ≈66% misclassification error (fig. 3.6 b, dotted horizontal
line and distribution in the right panel). When decoding three actions, this corresponds to the chance level
of a 1/3 correct classification rate.
[Figure 3.6, panel a: "hucv ∼ spike triplet pattern variability"; x-axis: hucv, y-axis: max JISI density, both
logarithmic; r = −0.83 (p < 10^-234). Panel b: "hucv ∼ single unit discrimination power"; x-axis: hucv
(logarithmic), y-axis: Misclassification error [%]; r = 0.36 (p < 10^-29); chance level marked; marginal pdf
of the error on the right]
Figure 3.6: Correlation with other measures reflecting precise temporal spiking patterns. Optimal bandwidth
estimates and a: base-10 logarithm transformed maximum JISI density values, and b: single unit misclassification
error.
Probability of recurrent spike triplet patterns. We analyzed the clustering of ISI pairs by examining the
maximal probability values of ISI pairs p(ISIn , ISIn+1 ) across all single units (fig. 3.6 a). This quantity is
proportional to the local number of points in the JISI scatter plot (Dodla and Wilson, 2010). A density
of unity corresponds to the case where all ISI pairs fall into the same region in the JISI plot. This would
imply that there exists only one exact spike triplet pattern. In contrast, smaller probability values indicate
that ISIs are more spread in time, i.e. occupy larger regions in return maps, and that there is more variability
of recurrent spike triplet patterns. The maximal probability of recurrent spike triplet patterns was
significantly negatively correlated with bandwidth estimates (r = −0.83, p-value < 1.35e-234, Pearson
correlation coefficient), i.e. higher maximal probabilities went along with lower bandwidth estimates.
Irregularity. We computed two further measures which are associated with global and local irregularity
of the underlying spiking activity: the coefficient of variation Cv and the local variation Lv . We then
quantified their relationship to UCV-based bandwidth estimates. Coefficients of variation and bandwidth
estimates were negatively correlated. Cv values decrease exponentially with increasing bandwidth
estimates (fig. 3.7 a). Single units spiking in a Poisson-like or random regime (Cv ≈ 1) or in a regular regime
(Cv < 1) display larger bandwidth estimates. We fitted an exponential function of the form
Cv = 0.24 + e^(−0.68·hucv), which could explain a proportion r² = 0.34 of the variation contained in the data.
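For orientation, both irregularity measures can be computed from the interspike intervals as in the following Python sketch. It assumes the commonly used definitions, Cv as the standard deviation of the ISIs divided by their mean and Lv as 3/(n−1) times the sum of ((ISI_i − ISI_i+1)/(ISI_i + ISI_i+1))² over successive ISI pairs; the exact definitions used here are those introduced in sec. 2.1, and the function name is hypothetical.

```python
import math
import random

def cv_and_lv(spike_times):
    """Coefficient of variation and local variation of a spike train."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    mean = sum(isis) / len(isis)
    var = sum((x - mean) ** 2 for x in isis) / len(isis)
    cv = math.sqrt(var) / mean
    lv = 3.0 / (len(isis) - 1) * sum(
        ((a - b) / (a + b)) ** 2 for a, b in zip(isis, isis[1:]))
    return cv, lv

# regular train: both measures vanish
print(cv_and_lv([float(i) for i in range(10)]))  # (0.0, 0.0)

# Poisson train: both measures are close to 1
rng = random.Random(0)
t, poisson_train = 0.0, []
for _ in range(20000):
    t += rng.expovariate(5.0)
    poisson_train.append(t)
cv, lv = cv_and_lv(poisson_train)
```

Under these definitions Cv compares the ISIs with their global mean, whereas Lv only compares neighboring ISIs and is therefore less sensitive to slow rate changes.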
[Figure 3.7, panel a: "hucv ∼ coefficient of variation"; x-axis: hucv (logarithmic), y-axis: Cv; r² = 0.34.
Panel b: "hucv ∼ local variation"; x-axis: hucv (logarithmic), y-axis: Lv. Panel c: "hucv ∼ local variation";
x-axis: hucv (logarithmic), y-axis: Lv, with marginal pdf]
Figure 3.7: Correlation between optimal bandwidth estimates and a: the coefficient of variation and b: the local
variation for N=905 units. c: Joint distribution p(hucv , Lv ) by fitting a three-component Gaussian mixture model
Although we did not observe an explicit correlation between the local variation and bandwidth estimates,
scatter plots showed at least two or three discernible clusters. We next fitted their joint distribution
p(hucv , Lv ) by a three-component Gaussian mixture model (fig. 3.7 c). One cluster was centered around
Lv ≈ 1 and scattered along a line parallel to the hucv -axis, and two more clusters at hucv = 10^-0.07 and
hucv = 10^-0.15 scattered parallel to the Lv -axis. According to the definition of the Lv (as introduced in
sec. 2.1), points falling within the first cluster centered at (10^1.19, 1.04) represent Poisson-spiking units
(Lv ≈ 1) with high bandwidth estimates > 10^1. More regularly spiking cells (Lv < 1) fall in the second
cluster centered at (10^-0.15, 0.84), while units firing irregularly (Lv > 1) lie within the third cluster
centered at (10^-0.07, 1.25).
Average spiking activity. Figure 3.8 shows the relationship between base-10 logarithm transformed average
firing rates, the local variation Lv and optimized bandwidth estimates. To establish a connection to
the previous findings (fig. 3.7), we computed the correlation between the average firing activity of cells and
their local irregularity measure Lv (fig. 3.8 a). Both are negatively correlated with r = −0.72 (p-value
= 1.35e-142, Pearson correlation coefficient) and could be fitted by a linear model Lv = −0.31 · ⟨ν⟩ + 1.02
(F: 945, p-value = 1.35e-142, F-statistic vs. constant model) which explains r² = 0.51 of the variation in
the data. It is worth noting that this relationship implies that cells spiking more irregularly, with Lv above
unity, exhibit low average firing rates ⟨ν⟩ < 1 Hz (fig. 3.8 a, left upper quadrant), while units with higher
average firing activity ⟨ν⟩ > 1 Hz spike more regularly, Lv < 1 (fig. 3.8 a, right lower quadrant).
[Figure 3.8, panel a: "firing rate ∼ Lv"; x-axis: ⟨ν⟩ (logarithmic), y-axis: Lv; r = −0.72 (p < 10^-142); line
Lv = 1 marks Poisson spiking. Panel b: "hucv ∼ firing rate"; x-axis: hucv, y-axis: ⟨ν⟩, both logarithmic;
r = −0.37 (p < 10^-30). Panel c: "hucv ∼ firing rate"; x-axis: hucv (logarithmic)]
Figure 3.8: Correlation between average firing activity, optimal bandwidth estimates and the local variation.
The log-transformed mean firing rate ⟨ν⟩ = 1/⟨ISI⟩ and a: local variation Lv and b: log-transformed bandwidth
estimates. c: Obtained joint distribution p(hucv , ⟨ν⟩) by fitting a two-component Gaussian mixture model.
In general, bandwidth estimates and average firing rate of single units were significantly correlated
(r = −0.37, p-value = 1.63 · 10^-30, Pearson correlation coefficient). We quantified the relationship by fitting a
linear model of the form ν = −0.25 · hucv + 0.014, which could explain 14% of the variation (r² = 0.14,
F: 142, p-value < 1.63e-30, F-statistic vs. constant model). Furthermore, scatter plots displayed distinct
clusters. We fitted a two-component Gaussian mixture model to characterize the joint distribution of
single unit UCV-based bandwidth estimates and average firing rates p(hucv , ⟨ν⟩). Both components are
visualized as two ellipses in figure 3.8 c. The first cluster was centered at (10^-0.23, 10^0.06) with a mixing
proportion of 0.58 and the second cluster at (10^1.0, 10^-0.22) with a mixing proportion of 0.42. By
extracting correlation coefficients from the Gaussian covariances we found that for points falling within the first
cluster bandwidth estimates and average firing activity are positively correlated with r = 0.18, while for
points falling within the second component hucv and ⟨ν⟩ are negatively correlated with r = −0.64.
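The within-cluster correlation follows directly from the 2×2 covariance matrix of a fitted Gaussian component. A minimal Python sketch of this step is given below; the function name and the example matrix are hypothetical and only chosen for illustration.

```python
import math

def component_correlation(cov):
    """Correlation coefficient implied by a 2x2 covariance matrix."""
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])

# illustrative covariance matrix, not a fitted value
print(component_correlation([[4.0, 1.0], [1.0, 1.0]]))  # 0.5
```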
3.4 Summary & Discussion
Neither bandwidth optimization nor the application of kernel density estimates to neurophysiological
data is a novel approach. However, the sort of information which UCV-based bandwidth estimates of
single units, alone or in combination with other measures, allow us to retrieve, e.g. on the temporal
structure, the global or local regularity of a spike train or its encoding properties, is unique and can be employed
to answer qualitatively different questions. The main findings of this chapter can be summarized as
follows:
First, bandwidth estimates are tuned to the temporal structure of spike trains and there exists an explicit
functional dependence between the UCV-based bandwidth outcome and the precision of spiking patterns
which also accounts for the signal-to-noise ratio and average firing activity of a spike train. Second, by
providing a characteristic value for the temporal spiking structure, bandwidth estimates can highlight single
units encoding task-related events. Third, the distribution of bandwidth estimates reveals distinct
subgroups of cells with common firing properties.
Bandwidth estimates are tuned to the temporal structure of surrogate spike trains
The simulations we conducted show that smaller bandwidth estimates are associated with, first, higher
precision of the spiking patterns, second, a higher signal-to-noise ratio (SNR) within the structure of spiking
activity and, third, higher average firing rates of spike trains. Moreover, the relationship between the
temporal structure of the simulated spike trains and optimized bandwidth estimates can be fully described by a sigmoid function given in eq. 3.3 (fig. 3.3 c) which provides unique parameter solutions for
different temporal structure settings: SNR, ν.
The first two findings, the relationships between bandwidth estimates and the precision and SNR of spike
train patterns, are in line with general properties of KDEs: the lower the variability, i.e. the
more densely points cluster within a restricted region, the smaller the bandwidth needed to represent
the underlying distribution (Scott and Terrell, 1987; Silverman, 1986; Rudemo, 1982; Bowman, 1984;
Simonoff, 1996). Similarly, higher spike rates have lower mean interspike intervals and variances (Gerstner and Kistler, 2002; Dayan and Abbott, 2005), i.e. lower variability, and thus higher average firing
rates imply lower bandwidth estimates.
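The dependence of the UCV bandwidth on the spread of the data can be illustrated with a minimal Python sketch. It assumes the standard unbiased cross-validation criterion for a Gaussian kernel (Rudemo, 1982; Bowman, 1984) and a simple grid search; the sample, the grid and all function names are illustrative assumptions and not the implementation used for the analyses.

```python
import math
import random

def ucv_score(x, h):
    """Unbiased cross-validation criterion of a Gaussian-kernel KDE with bandwidth h."""
    n = len(x)
    s_int = 0.0  # pair sum entering the integral of the squared estimate
    s_loo = 0.0  # pair sum entering the leave-one-out term
    for i in range(n):
        for j in range(i + 1, n):
            d2 = (x[i] - x[j]) ** 2
            s_int += math.exp(-d2 / (4.0 * h * h))
            s_loo += math.exp(-d2 / (2.0 * h * h))
    # Each unordered pair counts twice; the n diagonal terms contribute exp(0) = 1.
    integral_f2 = (n + 2.0 * s_int) / (n * n * 2.0 * h * math.sqrt(math.pi))
    loo_mean = 2.0 * s_loo / (n * (n - 1) * h * math.sqrt(2.0 * math.pi))
    return integral_f2 - 2.0 * loo_mean

def ucv_bandwidth(x, grid):
    """Return the grid value that minimizes the UCV criterion."""
    return min(grid, key=lambda h: ucv_score(x, h))

rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(80)]
grid = [10 ** (-3 + 0.1 * k) for k in range(41)]              # 1e-3 ... 1e1
h_precise = ucv_bandwidth([0.01 * v for v in base], grid)     # tightly clustered points
h_jittered = ucv_bandwidth([0.10 * v for v in base], grid)    # ten times larger spread
```

In this sketch the more tightly clustered sample receives the smaller bandwidth, which mirrors the relationship between spiking precision and hucv described above.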
The sigmoid functional dependence between bandwidth estimates and the precision of spiking patterns which also takes into account signal-to-noise ratio and average firing activity of the spike train
hucv (σj |SNR, ν) implies that there most likely exists an analytical solution for hucv which can then be
expressed as a function of the temporal structure. Consequently, if this is the case, bandwidth estimates
and temporal structure of spike trains are not just correlated, but estimates must to some extent reflect
exact information on particular aspects of the spiking patterns.
Bandwidth estimates can highlight single units encoding for task-related events
We found significant correlation between higher decoding performance when predicting task-related
events with LDA and small single unit bandwidth estimates (fig. 3.6 b). This finding implies that spike
trains of encoding units must convey temporally precise spiking patterns which are time-locked to these
events and is in agreement with results for surrogate data (in sec. 3.2) that UCV-based bandwidth estimates highlight the temporal structure in spike train data. Units which had low bandwidth estimates but
high prediction error values might fall into the following categories: units which are not discriminating
events in their activity but have a high average firing rate (supported by fig. 3.8 b) or bursty spiking
activity. Also encoding units could have high prediction error values for instance if they exhibit a high
trial-by-trial variability, are locked to all stimuli, but do not discriminate in their firing rate, or reflect
internal processing (e.g. delay activity or reward).
Distribution of bandwidth estimates reveals distinct subgroups of cells
To examine in more detail what sort of temporal information can be revealed with UCV-based bandwidth estimates we analyzed mPFC in-vivo multiple single unit recordings of behaving rats. In-vivo
recordings show that the distribution of bandwidth estimates exhibits at least two distinct modes. Exemplary units depicted from the distinct modes display different types of spiking activity structure with
respect to recurrence of spike triplet patterns or locking to task-related events. Similar to findings for
surrogate data summarized in the previous paragraph, temporally more structured spiking activity seems
to be correlated with lower optimized bandwidth outcomes (fig. 3.5).
More detailed analysis across all units in combination with other spike train metrics and regularity measures demonstrates either significant correlation or the presence of multiple clusters. We found that
first, probabilities of recurrent spike triplet patterns (fig. 3.6 a) and global spike train irregularity, i.e.
coefficients of variation (fig. 3.7 a) are significantly correlated with lower bandwidth estimates. Second,
examining the joint distribution with the average firing rate shows the presence of two and with the local
irregularity measure Lv the presence of three clusters (fig. 3.8 and 3.7 c). One cluster is centered at Lv
close to unity and higher bandwidth values (> 10^1), and two more clusters are centered below and above
Lv = 1 at lower hucv values (< 10^0). These clusters represent cells which spike randomly, i.e. Poisson-like
(Lv ≈ 1, first cluster), locally regularly (Lv < 1, second cluster) or locally irregularly (Lv > 1, third cluster)
(Shinomoto et al., 2002). It is worth noting, that by examining the Lv distribution alone it is not possible
to discriminate these three separate components as it has only one mode centered around unity (fig. 3.7
b, right panel). This is in agreement with the Lv distribution for PFC neurons which has been reported
by Shinomoto et al. (2003). Also the clear negative correlation we found for average firing rates ⟨ν⟩ and
local irregularity values of spike trains, i.e. high ⟨ν⟩ are associated with low Lv , has been reported in
other studies which show either a power-law relationship for awake rat PFC (Peyrache et al., 2009) or a
linear relationship for anesthetized cat V2 neurons (Blanche et al., 2005).
The components of the joint p(hucv , Lv ) distribution might be categorized in the following way: units
with low bandwidth estimates, locally regular spiking activity and thus high firing rates which fall in the
second cluster might represent putative interneurons, while units with low bandwidth estimates, locally
irregular spiking activity and thus low firing rates which fall in the third cluster might represent putative pyramidal cells. This categorization is supported by other studies which used the Lv distribution
either to classify cells from distinct functional brain areas or to distinguish distinct cell classes from the
same brain region (Shinomoto et al., 2002, 2009, 2005). Also, by combining several spike train metrics,
⟨ν⟩, Cv , Lv and two more spike waveform measures, Ardid et al. (2015) were able to segregate several
functional classes of broad spiking putative pyramidal cells and narrow spiking putative inhibitory cells
in the PFC of macaques engaged in an attention task. The authors reported that putative inhibitory cells
were associated with lower Lv and higher ⟨ν⟩ and vice versa for putative pyramidal cells, which is in
agreement with our categorization.
Summarizing, the qualitative information conveyed by UCV-based bandwidth estimates clearly differs from
that of other spike train irregularity measures, such as Lv and Cv, and also from general spike train statistics such as the
average spike rate ⟨ν⟩. Therefore, combining these measures might represent a powerful tool for discriminating different cell types, as has already been shown for the local variation (Shinomoto et al.,
2002, 2005, 2009) and for multiple measures in combination with clustering analyses (Ardid et al., 2015).
Concluding, apart from their application in firing rate estimation in the first place, optimized bandwidths
can be used to solve different problems: give a characteristic value of temporal precision within spiking
patterns, discriminate task-related units and possibly even distinguish different cell types.
4 Validity of optimized KDEs in neuronal decoding
In the previous chapter we have studied general features and characteristics of UCV-based optimized
bandwidths obtained for in-vivo spike train data from the rat mPFC of behaving animals. We have seen
that optimized bandwidths provide an information-rich measure reflecting temporal structure or encoding
properties of single units. In this chapter we will examine properties of optimized KDEs in decoding,
i.e. when KDEs obtained with optimized bandwidths are applied in subsequent decoding analyses.
The goal of this chapter is two-fold: first, we aim to assess whether the UCV method gives a reliable bandwidth
estimate when optimized KDEs are applied in subsequent classification and, second, to compare the decoding
performance achieved with optimized KDEs to that of non-optimized KDEs.
This chapter is structured in the following way: first, we describe the measures used to assess the
decoding validity of optimized KDEs; next, we apply the validity measures to surrogate data and to
mPFC in-vivo recordings; then we compare the decoding performance of optimized and non-optimized
KDEs on in-vivo data; and finally we discuss which conclusions can be drawn from the outcome.
4.1 Validity measures
We have described the procedure to obtain optimized KDEs in section 2.2, how to decode task-related
events in section 2.4 and how to compute a generalized, expected prediction error of a classifier by cross-validation in section 2.5. In order to examine how reliable the obtained optimized bandwidth estimates
are, we analyzed how the classification performance changes depending on the extent to which we scale
single unit bandwidth estimates, since the scaling affects the smoothness of the instantaneous firing rates and
thus also the decoding performance.
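How the scaling changes the smoothness of a firing rate estimate can be sketched in a few lines of Python. The spike times, the bandwidth value and the function name below are hypothetical; the sketch only assumes a KDE built as a sum of Gaussian kernels centered on the spike times.

```python
import math

def kde_rate(spike_times, t, h):
    """Firing rate at time t: sum of Gaussian kernels of width h centered on the spikes."""
    norm = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((t - s) / h) ** 2) for s in spike_times)

spikes = [0.95, 1.00, 1.02, 1.05, 2.50]    # hypothetical spike times in seconds
h_ucv = 0.05                                # hypothetical optimized bandwidth in seconds
t_grid = [k * 0.01 for k in range(401)]     # 0 to 4 s in 10 ms steps

# Rate profiles nu(t | lambda * h_ucv) for three scaling factors lambda.
rates = {lam: [kde_rate(spikes, t, lam * h_ucv) for t in t_grid]
         for lam in (0.5, 1.0, 4.0)}
peaks = {lam: max(profile) for lam, profile in rates.items()}
```

Small λ yields sharp, high peaks around the spike cluster, whereas large λ flattens the profile; the area under each profile stays close to the number of spikes.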
[Figure 4.1 schematic. I. Kernel smoothing: spike trains of units 1 to p are smoothed with the λ-scaled optimal UCV bandwidth hucv to obtain firing rates ν1, ..., νp over time t. II. Classification: events np, wt and lp in the space spanned by dim 1 and dim 2 (example: Err = 1/22). III. Performance: misclassification error Err(λ · hucv).]
Figure 4.1: Illustration of the two-stage approach for evaluating the validity of optimized bandwidth estimates in decoding. Before obtaining firing rates, UCV-based bandwidth estimates are scaled by a parameter λ
and after subsequent decoding the validity of the UCV method is evaluated by measuring the misclassification
error as a function of λ.
4.1.1 Evaluation of UCV-based bandwidth estimates in decoding
We implemented the LDA in combination with the 3-fold cross-validation scheme (as introduced in
sec. 2.4 and 2.5). For experimental data, only correct trials were included. For the decoding procedure
we took the following class labels. Alternation task: 4 s delay period preceding nose-poke, nose-poke and lever-press. Sequence-switch task: nose-poke, wheel-turn and lever-press. Instead of using optimized KDEs
ν(t|hucv) to calculate the prediction error Err(hucv), population vectors of firing activity were obtained
employing a scaled version of the bandwidth estimate, ν(t|λ · hucv), λ ∈ (0, 20], and then used for subsequent stimulus class prediction (fig. 4.1). To test if the true bandwidth estimate gives reliable unbiased
classification results, i.e. if the scaling does not significantly improve prediction, we determined two
measures: first, the optimal scaling which gives the best prediction and, second, the percentage deviation in decoding performance of the true compared to the best scaled bandwidth estimate.
The optimal scaling parameter λopt is defined as the value for which Err(λ · hucv) achieves the best prediction (eq. 4.1), and the percentage difference in prediction error represents the deviation when using the true bandwidth estimate relative to the best scaled estimate (eq. 4.2):

\[ \lambda_{opt} = \arg\min_{\lambda} \mathrm{Err}(\lambda \cdot h_{ucv}) \qquad (4.1) \]

\[ \delta_{Err} = \frac{\mathrm{Err}(h_{ucv}) - \min_{\lambda} \mathrm{Err}(\lambda \cdot h_{ucv})}{\max_{\lambda} \mathrm{Err}(\lambda \cdot h_{ucv}) - \min_{\lambda} \mathrm{Err}(\lambda \cdot h_{ucv})} \times 100 \qquad (4.2) \]

Both measures λopt and δErr indicate whether a scaling λ exists which allows a more accurate discrimination between distinct stimulus classes and neuronal activity states, or whether the UCV method provides a sufficiently valid and unbiased estimate (when λopt ≈ 1 and δErr ≈ 0) so that decoding performance cannot be improved by bandwidth scaling.

[Figure 4.2 plot: misclassification error [%] (y-axis) as a function of λ (x-axis), with Err(hucv), Err(λopt), ∆Err and λopt marked.]
Figure 4.2: Illustration of bandwidth validity measures and decoding performance as a function of the scaling parameter λ, dataset 15, alternation task. Smoothed (solid black line) and raw (left plot, dots) misclassification error values as a function of the scaling parameter λ follow a v-shaped curve with a pronounced minimum at λopt. The discrepancy between the true error Err(hucv) at λ = 1 and the best error Err(λopt) can be measured as an absolute (∆Err) or relative value (δErr).
We computed δErr and λopt applying the following procedure to each data set:
for units i = 1 to p
    1. compute optimal UCV-based bandwidth estimates hucv
2. define a set of n scaling parameters λ = {λ1, ..., λn}, λj ∈ (0, 20]
for j = 1 to n
    3. scale the true estimates: λj · hucv
    4. obtain KDEs by plugging in the scaled bandwidth estimates: ν(t | λj · hucv)
    5. estimate decoding performance Err(λj · hucv) based on the 3-fold CVE scheme (sec. 2.5)
6. smooth the resulting Err(λ) functions with the MATLAB built-in cubic spline function 'csaps'
7. determine the scaling factor λopt which gives the best decoding performance:
    λopt = arg min_{λj} Err(λj · hucv)
8. determine the relative deviation of the true error Err(hucv) from the best error Err(λopt):
    δErr = [Err(hucv) − Err(λopt)] / [max_j Err(λj · hucv) − Err(λopt)] × 100
Bandwidth validity pseudo-code.
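Steps 7 and 8 of the procedure can be sketched in Python as follows. The error curve is hypothetical, the spline smoothing of step 6 is omitted, and the function name is an assumption made for illustration.

```python
def bandwidth_validity(lams, errs, err_true):
    """Return (lambda_opt, delta_err) for an error curve errs[j] = Err(lams[j] * h_ucv).

    err_true is Err(h_ucv), i.e. the error obtained without scaling (lambda = 1).
    """
    err_min, err_max = min(errs), max(errs)
    lam_opt = lams[errs.index(err_min)]        # eq. 4.1: scaling with the lowest error
    if err_max == err_min:                     # flat curve: scaling has no effect
        return lam_opt, 0.0
    delta_err = (err_true - err_min) / (err_max - err_min) * 100.0   # eq. 4.2
    return lam_opt, delta_err

lams = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
errs = [52.0, 40.0, 31.0, 30.0, 38.0, 50.0]    # hypothetical misclassification errors in %
lam_opt, delta_err = bandwidth_validity(lams, errs, err_true=errs[lams.index(1.0)])
```

For this hypothetical curve the minimum lies at λ = 2, and the unscaled estimate deviates from it by 1 of 22 percentage points, i.e. δErr ≈ 4.5%.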
4.1.2 Comparing decoding with optimized and non-optimized bandwidths
Figure 4.3: Decoding performance with scaled optimized and non-optimized KDEs. The misclassification
error is shown as a function of the bandwidth scaling λ for optimized KDEs (black lines) and dependent on a fixed
bandwidth h for non-optimized KDEs (gray lines) which is equal across all units for a given data set.
Selecting different cross-validation error schemes and subsets of units from experimental data.
Both bandwidth selection methods were probed using different subsets of units for a given data set.
Employing optimized KDEs and also spike counts over 500 ms, we identified for each data set the most
predictive subset by means of sequential unit elimination, starting with the complete set and successively
removing the units whose removal most improves the misclassification error (for the complete description of the
method see appendix 5.4). Thus, for each of the 19 experimental data sets used for our analyses we
identified and compared the performance using three different sets:
First, the complete set, i.e. all units of one recording session; second, the most predictive set of cells
when the firing rate was computed as spike count over 500 ms; third, the most predictive set of cells
when the firing rate was obtained by Kernel density estimation with UCV-based bandwidths hucv .
Deploying these sets we also compared the decoding performance of optimized to non-optimized KDEs
when the misclassification error was calculated using different cross-validation schemes. We applied
3-fold and m-fold CV schemes, both of which are discussed in section 2.5.
[Figure 4.3 plot: misclassification error [%] (y-axis) as a function of the scaling λ for optimized KDEs and of the fixed bandwidth h for non-optimized KDEs (x-axis), with Err(hopt), Err(hucv), hopt and hucv marked.]
Since the bandwidths most commonly used in experimental settings are chosen arbitrarily, which makes
them equal or fixed across units for a given data set, we examined how the classification performance
changes depending on which bandwidth selection method is used for prior kernel smoothing.
We computed the decoding performance as a function of the fixed bandwidth h ∈ (0, 20] and identified the best non-optimized bandwidth as the value which gives the lowest prediction error, hopt =
arg min_h Err(h). We employed the one-sided Wilcoxon signed-rank test for paired samples to compare
the Err(hucv) (optimized) and Err(hopt) (non-optimized) decoding outcomes.
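The selection of hopt and the paired comparison can be sketched in Python as follows. This is a minimal illustration using the normal approximation of the signed-rank statistic without tie or continuity correction, which may differ in detail from the test implementation used for the analyses; all error values and names are hypothetical.

```python
import math
from statistics import NormalDist

def signed_rank_one_sided(x, y):
    """One-sided Wilcoxon signed-rank test of H1: x tends to be smaller than y.

    Normal approximation without tie or continuity correction. Returns (z, p).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]   # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 0.5
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                      # average ranks over tied |d|
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    return z, NormalDist().cdf(z)

# Best fixed bandwidth for one hypothetical data set: h_opt = arg min_h Err(h).
h_grid = [0.5, 1.0, 2.0, 5.0, 10.0]
err_fixed = [41.0, 33.0, 29.5, 35.0, 47.0]
h_opt = h_grid[err_fixed.index(min(err_fixed))]

# Hypothetical paired errors (in %) across ten data sets.
err_ucv = [12.0, 15.5, 20.1, 9.8, 30.2, 17.4, 14.9, 22.3, 11.0, 18.6]
err_hopt = [e + 0.5 * (k + 1) for k, e in enumerate(err_ucv)]
z, p = signed_rank_one_sided(err_ucv, err_hopt)
```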
4.2 Surrogate data: validity of optimized bandwidth estimates dependent
on distinct firing states
4.2.1 Generation of spiking activity with multiple firing states
To analyze the reliability of the optimal bandwidth estimates in more detail we generated surrogate spike
trains of known underlying structure which reflect response-class specific firing activity ν(t|ck ) during
performance of task-related events and applied the previously introduced validity measures.
In order to simulate animal behavior and to generate surrogate spike train data we implemented a traditional hidden Markov model (HMM) framework (Rabiner, 1989) which has been widely applied and
extended to spike train data (Abeles et al., 1995; Seidemann et al., 1996; Jones et al., 2007; Yu et al.,
2006; Escola et al., 2011). In this context, neuronal activity can be characterized by an HMM which
assumes that responses of different neurons reflect a common dynamical process in the network, a network
state. These states drive the spiking activity of the neurons. Transitions from one state to another can
occur abruptly and at variable times, as described in models of network dynamics (Hopfield, 1982; Amit,
1992).
The implementation of the HMM is based on the following assumptions:
1. Neuronal activity can be characterized by a small number of hidden states s which correspond to
average firing rates νi (s) of cells i. At every moment in time, the system is in one of these states.
2. In each state the neurons fire according to an approximately stationary homogeneous Poisson
process with a constant firing rate, whereas the precise spike timing is random.
3. Hidden states change as a time-homogeneous Markov chain, i.e. the probability of a transition
from one state i to another state j from one time bin to the next, denoted as pi,j, is constant over
time and can be summarized by the transition probability matrix T. For states to persist for
much longer than the time bin of 1 ms, the diagonal elements of T are almost unity, pi,i ∼ 1.
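[Figure 4.4 panels. (a) Markov chain with states s0, s1, s2, s3 and transition probabilities p0,0, p0,1, p0,2, p0,3, p1,0, p1,1, p2,0, p2,2, p3,0, p3,3. (b) State sequence (states s over time [s]). (c) State-driven firing rates ν1, ..., νp for the states s1, s2, s3.]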
Figure 4.4: Steps of the HMM simulation. a: Markov chain with nodes representing states and edges representing possible
state transitions with their assigned transition probabilities. b: Markov state sequence corresponding to animal behavior
for one simulated trial. c: Markov states drive spiking activity and give rise to different ifr profiles of several units.
We first simulated animal behavior by a Markov state sequence with S = 4 distinct states corresponding to
a resting state s0 and three events/behaviors s1, s2, s3 performed on task, based on the transition probabilities shown as edges of the Markov chain in fig. 4.4 a and summarized in the transition matrix T:

\[
T =
\begin{pmatrix}
p_{0,0} & p_{0,1} & p_{0,2} & p_{0,3} \\
p_{1,0} & p_{1,1} & p_{1,2} & p_{1,3} \\
p_{2,0} & p_{2,1} & p_{2,2} & p_{2,3} \\
p_{3,0} & p_{3,1} & p_{3,2} & p_{3,3}
\end{pmatrix}
=
\begin{pmatrix}
1 - 3 \cdot 10^{-5} & 10^{-5} & 10^{-5} & 10^{-5} \\
10^{-4} & 1 - 1 \cdot 10^{-4} & 0 & 0 \\
10^{-4} & 0 & 1 - 1 \cdot 10^{-4} & 0 \\
10^{-4} & 0 & 0 & 1 - 1 \cdot 10^{-4}
\end{pmatrix}
\]
Apart from recurrence to the same state, transitions were possible from the resting state to any other
state but, conversely, from a non-resting state only to s0 (fig. 4.4 a). The probability of remaining in
s0 within the next time bin of 1 ms was set to 1 − 3 · 10^-5 and to 1 − 1 · 10^-4 for any other state; the probability of a transition
from s0 to any other state was set to 10^-5 and that of a transition back to s0 to 10^-4. The minimum duration of persistence in a state was set
to 500 ms, and transitions were then arranged in sequential order so that each trial consists of the following
state sequence (s0, s1, s0, s2, s0, s3, s0) (fig. 4.4 b) as in the experiment. The states were then used to drive
the spiking activity of the model neurons and gave rise to different firing rate profiles νi for each state s
(fig. 4.4 c). The firing rate of the resting state was set to νi(s0) = 0.1 Hz.
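The core of this simulation can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it samples a plain Markov chain with the transition matrix T and draws spikes of a single unit independently per 1 ms bin, but it does not enforce the 500 ms minimum dwell time, the sequential ordering of the states or the pairwise correlations. The non-resting firing rates and all function names are hypothetical.

```python
import random

DT = 0.001  # time bin of 1 ms, in seconds

# Transition matrix T: row = current state, column = next state (s0 = resting state).
T = [
    [1 - 3e-5, 1e-5, 1e-5, 1e-5],
    [1e-4, 1 - 1e-4, 0.0, 0.0],
    [1e-4, 0.0, 1 - 1e-4, 0.0],
    [1e-4, 0.0, 0.0, 1 - 1e-4],
]

def simulate_states(T, n_bins, rng, start=0):
    """Sample one state per time bin from a time-homogeneous Markov chain."""
    states, s = [], start
    for _ in range(n_bins):
        states.append(s)
        s = rng.choices(range(len(T)), weights=T[s])[0]
    return states

def simulate_spikes(states, rates_hz, rng):
    """Bernoulli approximation of a Poisson process: spike with probability rate * DT per bin."""
    return [k * DT for k, s in enumerate(states) if rng.random() < rates_hz[s] * DT]

rng = random.Random(1)
states = simulate_states(T, n_bins=200_000, rng=rng)   # 200 s of simulated behavior
rates_hz = [0.1, 5.0, 10.0, 20.0]   # resting rate as in the text; other rates hypothetical
spike_times = simulate_spikes(states, rates_hz, rng)
```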
To be as close to experimental conditions as possible we analyzed empirical firing rate profiles for each
behavior (non-resting state) prior to surrogate data generation. The distribution of average firing rates
within 1 s time windows centered around behavioral events of all units and data sets is shown in figure 4.5 a (black line); the mean and variance of firing rate profiles across all task-related events and cells
for each data set, ⟨ν⟩s,i, are shown in figure 4.5 b. From figure 4.5 a we concluded that average state-specific single-unit firing rates νi(s) are log-normally distributed with mean µs and variance σs^2 and
can be fitted by a log-normal probability density function ln N(µs, σs^2) with µs = 1.75 and σs^2 = 8 (fig. 4.5
a, gray line).
Additionally, as reported by Abeles et al. (1995), state transitions are associated not only with a change
in the firing rate profile of several units, but also with pairwise correlations between the units that vary between
the different states. To account for these pairwise correlations, multivariate
binary patterns of surrogate spike trains were generated at a millisecond resolution from a dichotomized Gaussian distribution
(Cox and Wermuth, 2002; Macke et al., 2009) with population mean and correlation structure specified
by the Markov sequence state and converted to spike times (MATLAB implementation of the binary
pattern generation algorithm adapted from Macke et al., 2009). We double checked that assumption (2)
was met and for each state ISIs were exponentially distributed, i.e. model neurons fired according to an
approximately stationary homogeneous Poisson process with constant firing rate.
Thus, to analyze optimized bandwidth validity systematically, with known underlying ground truth, but
at the same time to remain as close to experimental conditions as possible, each HMM comprised p = 10
units, s = 4 states and the following state-dependent, multivariate spiking activity parameters: pairwise
correlation coefficients ρ ∈ R^(p×s), 3 non-resting states with log-normally distributed average firing activity ν(s) ∼ ln N(µs, σs^2) with mean µs ∈ {1, 3, 5} and variance σs^2 ∈ {10, 15, 20} and resting state s0 with
ν(s0) = 0.1 Hz, ν(s) ∈ R^(p×s).
[Figure 4.5 panels. (a) pdf of empirical firing rates ν̂s [Hz] (original and fitted). (b) Variance σ̂s^2 versus mean µ̂s of firing rates per data set. (c) pdf of surrogate state firing rates νs [Hz] for µs = 1, 3, 5. (d) Misclassification error Err [%] as a function of λ for µs = 1, 3, 5.]
Figure 4.5: Experimental and surrogate firing rate profile distributions of the Markov sequence states.
a: Empirical distribution of firing rates during different behaviors on the task (black) and log-normal fit with
µs = 1.75 and σs^2 = 8 (gray). b: Mean and variance of firing rates across different behaviors and cells per data
set. c: Surrogate state firing rates are drawn from a log-normal distribution, here e.g. with σs^2 = 20. d: Corresponding
misclassification error curves as a function of the bandwidth scaling averaged over N=100 models for state firing
rates drawn from the distribution shown in c.
We conducted the following procedure to compute δErr and λopt for simulated data sets:
for σs^2 ∈ {10, 15, 20}
    for µs ∈ {1, 3, 5}
        generate N = 100 models with p = 10 units and s = 4 states each
        for j = 1 to N
            1. set state-dependent HMM parameters ρ, ν(s) ∈ R^(p×s):
               draw uniformly distributed ρs ∈ (0, 0.02] and ν(si) ∼ ln N(µs, σs^2), i = 1, ..., 3
            2. generate a state sequence of 60 min based on the transition probabilities defined in T
            3. produce Poisson spike train output from the parameters in (1) and (2)
            4. obtain optimal UCV-based bandwidth estimates hucv
            5. compute the error function dependent on the scaling parameter, Errj(λ · hucv)
        6. average over N = 100 models to obtain the expected prediction error:
           ⟨Err(λ · hucv | µs, σs^2)⟩ = 1/N Σ_j Errj(λ · hucv)
        7. from the averaged error curves ⟨Err(...)⟩ estimate the bandwidth validity measures λopt(µs, σs^2) and δErr(µs, σs^2)
Model generation and validation procedure.
4.2.2 UCV-bandwidth validity for simulated data sets
[Figure 4.6 panels (a, c: LDA; b, d: QDA). (a, b) δErr [%] as a function of µs ∈ {1, 3, 5} for σs^2 = 10, 15, 20. (c, d) λopt as a function of µs ∈ {1, 3, 5} for σs^2 = 10, 15, 20.]
The deviation from the best prediction performance of actual compared to scaled optimized bandwidths is
very low (δErr < 5%) when units have higher state-specific firing rates compared to background activity.
Applying the bandwidth validity measures to averaged cross-validated misclassification error curves (as
shown in fig. 4.5 d for σs^2 = 20, µs ∈ {1, 3, 5}) shows that with a high signal-to-noise ratio,
i.e. increasing simulated state firing rates νs, the deviation from the best prediction performance of the
true compared to the scaled optimized bandwidth estimate converges to zero (fig. 4.6 a, b).
This essentially means that the higher the state-dependent firing rates in surrogate data, or the firing rates in
response to stimulus presentation in in-vivo recordings, are compared to the background firing activity, the
more reliable the application of optimized KDEs in decoding will be. However, the optimal scaled
bandwidth values which gave the best prediction do not exactly match the actual ones (fig. 4.6 c, d).
Figure 4.6: Bandwidth validity for simulated data sets, N=10 units. a, b: Percentage deviation in misclassification error when using the true bandwidth estimate relative to the best scaled bandwidth. c, d: Optimal scaling
which gives the best prediction.
Applying QDA, which takes into account that covariance matrices of different activity states vary, in contrast to LDA, which assumes that they are identical, yielded qualitatively similar results but outperformed
LDA significantly, with a lower deviation in prediction error (fig. 4.6 b) and an optimal scaling closer to
the true estimate (fig. 4.6 d).
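The difference between the two classifiers can be illustrated with a one-dimensional Python sketch in which two classes share the same mean but differ in variance. This is a toy example with hypothetical data and equal class priors, not the multivariate population-vector setting of the analyses: with a pooled variance (LDA) the classes are barely separable, whereas class-specific variances (QDA) allow discrimination.

```python
import math
import random
from statistics import fmean, pvariance

def gaussian_log_density(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def fit(train_a, train_b, shared_variance):
    """Fit one Gaussian per class; LDA pools the variance, QDA keeps one per class."""
    params = [(fmean(c), pvariance(c)) for c in (train_a, train_b)]
    if shared_variance:
        n_a, n_b = len(train_a), len(train_b)
        pooled = (n_a * params[0][1] + n_b * params[1][1]) / (n_a + n_b)
        params = [(m, pooled) for m, _ in params]
    return params

def accuracy(params, test_a, test_b):
    """Fraction of test points assigned to the class with the higher log density."""
    def predict(x):
        scores = [gaussian_log_density(x, m, v) for m, v in params]
        return scores.index(max(scores))
    hits = sum(predict(x) == 0 for x in test_a) + sum(predict(x) == 1 for x in test_b)
    return hits / (len(test_a) + len(test_b))

rng = random.Random(2)

def draw(sd, n):
    # Hypothetical firing rates around 10 Hz with class-specific spread.
    return [rng.gauss(10.0, sd) for _ in range(n)]

train_a, train_b = draw(1.0, 500), draw(5.0, 500)
test_a, test_b = draw(1.0, 500), draw(5.0, 500)
acc_lda = accuracy(fit(train_a, train_b, True), test_a, test_b)
acc_qda = accuracy(fit(train_a, train_b, False), test_a, test_b)
```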
4.3 Decoding spiking activity of in-vivo recordings from the rat mPFC
4.3.1 Bandwidth validity for experimental data
The misclassification error as a function of the scaling λ of UCV-based bandwidth estimates follows a v-shaped
curve with a pronounced minimum centered around one (fig. 4.7 a, b). This result implies that the
best prediction performance is achieved with the unscaled, actual estimate hucv. The summary across all
experimental data sets shows that, when using optimized bandwidths for decoding, for about 80% of the data
(15 out of 19 sets) the misclassification error differs from the minimum by less than 10% (fig. 4.7 c, d:
λopt ≈ 1 and δErr < 10%).
[Figure 4.7 panels. (a, b) Err [%] as a function of λ. (c) λopt per data set. (d) δErr [%] per data set.]
Figure 4.7: Validity measures and decoding performance as a function of the bandwidth estimate. Illustration of the prediction error as a function of the scaling parameter λ for two exemplary data sets from a: task
two (alternation task, data set 13) and b: task one (sequence task, data set 12). Confidence intervals were ≤ 10^-1
and were left out for clarity. c: Optimal scaling which gives the best prediction. d: Percentage deviation in
misclassification error when using the true bandwidth estimate relative to the best scaled optimized bandwidth.
Both measures λopt and δErr are sorted in order of increasing δErr values.
4.3.2 Decoding performance compared to non-optimized KDEs
Decoding performance with optimized and non-optimized KDEs was probed on different sets of units
for a given data set and the misclassification error was calculated using two different cross-validation
schemes. To estimate the prediction error we employed 3-fold and m-fold CV. For 3-fold CV the classifier was trained on two thirds and tested on one third of the trials. KDEs of the same response class were
averaged block-wise per trial. For m-fold CV we employed non-averaged KDEs and the
classifier was trained on m−1 trials and tested on the left-out trial (see sec. 2.5). The different sets
of units comprised the complete set of units for a given recording session, the most predictive set when
decoding was performed with spike counts and the most predictive set when using optimized KDEs
(sec. 4.1.2). The resulting prediction errors for the two different bandwidth selection methods were then
compared using the one-sided Wilcoxon signed-rank test for paired samples to determine which method
performed better.
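The two validation schemes differ only in how trials are partitioned into training and test sets, which can be sketched in Python as follows. The interleaved assignment of trials to folds and the number of trials are assumptions made for illustration; the block-wise averaging of KDEs is not reproduced here.

```python
def k_fold_splits(n_trials, k):
    """Yield (train, test) index lists; trial i is assigned to fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_trials) if i % k == fold]
        train = [i for i in range(n_trials) if i % k != fold]
        yield train, test

n_trials = 12                                        # hypothetical number of trials
three_fold = list(k_fold_splits(n_trials, 3))        # train on 2/3, test on 1/3 of the trials
m_fold = list(k_fold_splits(n_trials, n_trials))     # train on m-1 trials, test on the left-out trial
```

In both schemes every trial is used for testing exactly once, so the resulting errors are averaged over complete partitions of the data.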
[Figure 4.8 panels. Columns from left to right: complete set of units, most predictive spike count set, most predictive hucv set. Rows: 3-fold CV (top), m-fold CV (bottom). Each panel: Err(hopt) [%] (y-axis) versus Err(hucv) [%] (x-axis).]
Figure 4.8: Comparison in decoding performance of optimized KDEs and most predictive non-optimized
KDEs with different sets of units (columns arranged from left to right), 3-fold and leave-one-trial-out cross-validation (upper and lower panel). Each point corresponds to misclassification errors for one data set obtained
with optimized (x-axis) and non-optimized KDEs (y-axis). The gray bisector indicates the positions at which
values of both methods are identical.
For the different conditions we tested (2 CV schemes, 3 unit subsets), the overall best decoding performance across all data sets on average was achieved with optimized KDEs of the most predictive
subset of units when employing m-fold CV (Err = 14.8 ± 1.1% and Err = 17.1 ± 1.9% with and without
excluding the outlier at Err = 34.2 and 39.5%). This decoding performance was significantly better
than the best decoding achieved with non-optimized KDEs for the same condition (p < 0.018,
z = 2.09, fig. 4.8, lower right panel). Likewise, when comparing these optimized KDE results with the non-optimized
KDE results for the other sets of units under the same CV scheme, optimized KDEs yielded significantly better performance (complete sets: p < 9.94e-04, z = 3.09; most predictive set based on spike
counts: p < 0.0018, z = 2.91). Comparing both methods directly for m-fold CV and the same sets of units yielded significantly poorer performance for optimized KDEs in all other cases (p < 3.36e-04,
z = 3.40; p < 5.20e-04, z = 3.28; fig. 4.8, lower left and middle). Employing 3-fold CV resulted in a similar
outcome: optimized KDEs performed poorer for the same conditions (complete sets: p < 2.4e-04, z = 3.48;
most predictive set based on spike counts: p < 0.0023, z = 2.84; fig. 4.8, top left and middle panel) or
equally well with the most predictive optimized KDEs (p = 0.71, z = 0.36; fig. 4.8, top right).
Summarizing, optimized KDEs yielded the best overall decoding performance, which is significantly
better than or equal to the best performance of non-optimized KDEs for the analyzed conditions (fig. 4.8, top and lower right panel). However, the classification accuracy highly depends on the
subset of units used for firing rate estimation and subsequent classification: while employing the most
predictive subset of optimized KDEs outperforms non-optimized KDEs, for decoding with other sets
of units the outcome is significantly poorer. Decoding accuracy also depends on the validation method
when estimating the prediction error.
4.4 Summary & Discussion
In a two-stage approach we first assessed how reliably we can predict task-relevant behavior of the animal when applying instantaneous firing rates obtained with optimized bandwidths, using both surrogates
and in vivo recordings from mPFC multiple single-unit spike times of behaving rats, and second, how
well we can decode experimental data with optimized compared to non-optimized KDEs.
Our findings show that, first, in experimental data the decoding outcome with optimized KDEs highly depends on the sets of units used for classification and also on the cross-validation scheme for estimation
of the prediction error. Second, the UCV method provides a reliable bandwidth estimate for 80% of
the analyzed in-vivo data, i.e. the UCV-based misclassification error differs from the best misclassification error obtained when scaling
optimized bandwidths by less than 10% when employing robust validation sets (3-fold CV). Third, this
finding is in agreement with the results obtained for surrogate data, which show that prediction performance with actual KDEs deviates less than 5% from the best performance achieved with scaled estimates
for sufficiently high non-resting state activity (fig. 4.6). This implies that the decoding outcome of optimized
KDEs will be more reliable the higher the state-dependent firing rates in surrogate data, or the firing rates
in response to stimulus presentation in in-vivo recordings, are compared to the background activity.
In the following we briefly review possible reasons why the decoding outcome for optimized KDEs
varies with the set of single units and the choice of CV scheme when applied to mPFC in-vivo recordings.
One reason for the dependency on single units is that neuronal activity under the same conditions evolves over
time. Single units can respond to stimuli in some trials but remain silent in others. This is in
line with previous studies which report inter-trial variability (Nawrot et al., 2008), i.e. non-stationary responses
to the same stimulus, and long-term variability, meaning that the state of neuronal
mPFC ensembles systematically exposed to the same conditions shifts with time (Hyman et al., 2012;
Balaguer-Ballester et al., 2011). It is also in agreement with the general concept that neural ensembles encoding different task events or stimuli are context-dependent (Balaguer-Ballester et al., 2011;
Lapish et al., 2008). This may account for the flexibility needed during the performance of higher cognitive tasks such as spatial and temporal context-encoding (Hyman et al., 2012), rule learning (Durstewitz
et al., 2010), or decision making (Balaguer-Ballester et al., 2011). In non-optimized KDEs the spike trains
of all stationary and non-stationary units within a data set are convolved with a Gaussian kernel of the
same width, resulting in instantaneous firing rates which are smoothed to the same extent. In
optimized KDEs, by contrast, individual spike trains are smoothed according to their individual bandwidth estimates.
A non-stationary unit which encodes task-related events but whose activity shifts over time will have a lower
UCV-based bandwidth estimate (as pointed out in sec. 2.1, firing rate fluctuations are associated with
higher Cv, and high Cv with low hucv values, sec. 3.3, fig. 3.7 a), yield a more peaked instantaneous firing
rate and thus add more variability to a given data set. Consequently, non-stationary units will contribute
to a higher prediction error when optimized KDEs are employed for classification. By removing units through
sequential feature selection we can presumably identify the set of cells whose firing rates are stationary over time for the same task events. Thus, to overcome this issue one can either remove units with
high trial-by-trial variability or, alternatively, employ adaptive KDEs with variable bandwidths which
adjust to non-stationary regions of spike trains (Sain, 1994; Hazelton, 2003; Terrell and Scott, 1992;
Shimazaki and Shinomoto, 2010). Multivariate kernel density estimates or optimized KDEs developed for the analysis of time series might also provide a more accurate alternative (Antoniadis et al., 2009;
Bouezmarni and Rombouts, 2010; Tran, 2010; Zougab et al., 2014; Balaguer-Ballester et al., 2014).
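The difference between a common and an individually optimized bandwidth can be illustrated with a minimal Python sketch. It assumes a Gaussian kernel and the standard least-squares (unbiased) cross-validation criterion, minimized over a fixed grid of candidate bandwidths. The function names (gauss, kde_rate, ucv_score, select_bandwidth), the candidate grid and the grid search itself are illustrative assumptions and not the implementation used for the analyses in this thesis.

```python
import math


def gauss(x, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def kde_rate(spike_times, t, h):
    """Instantaneous firing rate (spikes/s) at time t: one Gaussian of width h per spike."""
    return sum(gauss(t - s, h) for s in spike_times)


def ucv_score(spike_times, h):
    """Least-squares (unbiased) cross-validation score for bandwidth h.

    UCV(h) = int f_hat^2 - 2/n * sum_i f_hat_{-i}(x_i), which for Gaussian
    kernels has a closed form in the pairwise spike-time differences.
    Requires at least two spikes.
    """
    n = len(spike_times)
    int_f2 = 0.0  # integral of the squared density estimate
    loo = 0.0     # sum of leave-one-out kernel evaluations
    for i, si in enumerate(spike_times):
        for j, sj in enumerate(spike_times):
            d = si - sj
            # Two Gaussians of width h convolve to one of width sqrt(2) * h.
            int_f2 += gauss(d, math.sqrt(2.0) * h)
            if i != j:
                loo += gauss(d, h)
    return int_f2 / n ** 2 - 2.0 * loo / (n * (n - 1))


def select_bandwidth(spike_times, candidates):
    """Return the candidate bandwidth with the smallest UCV score."""
    return min(candidates, key=lambda h: ucv_score(spike_times, h))


# Hypothetical spike train (seconds) and candidate bandwidths (seconds).
spikes = [0.10, 0.12, 0.15, 0.50, 0.52, 0.55, 0.90, 0.93]
grid = [0.005, 0.01, 0.02, 0.05, 0.1, 0.2]
h_ucv = select_bandwidth(spikes, grid)
rate_at_half_second = kde_rate(spikes, 0.5, h_ucv)
```

In a non-optimized KDE every unit would be passed the same h, whereas in an optimized KDE select_bandwidth would be called once per spike train.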
The reasons why decoding results with optimized KDEs depend on the CV scheme lie in general
statistical properties of the CV procedure: as pointed out by Hastie et al. (2009), leave-one-out CV (here
referred to as m-fold) has low bias but high variance, since the training sets are similar to each
other. In 3-fold CV the test set comprises a time series with many auto-correlated samples. We therefore
computed mean KDEs by averaging consecutive blocks of the same event or class label for each trial.
Decoding with temporally averaged class-specific firing rates results in poorer performance (higher
bias) than decoding with non-averaged KDEs, but in lower variance. It also conveys qualitatively different
information: when the temporal patterns highlighted by optimized KDEs are averaged out, the performance of optimized and non-optimized KDEs does not differ significantly (figure 4.8, upper right panel).
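The two ingredients of this comparison, the partition of trials into test folds and the averaging of consecutive samples with the same class label, can be sketched as follows. This is a minimal Python illustration under our own assumptions: folds are taken as contiguous blocks of trials, and the names kfold_trials and block_average are hypothetical, not taken from the analysis code.

```python
from itertools import groupby


def kfold_trials(n_trials, k):
    """Split trial indices 0..n_trials-1 into k contiguous test folds.

    With k equal to n_trials this reduces to leave-one-trial-out CV.
    """
    folds = []
    start = 0
    for f in range(k):
        # Distribute the remainder over the first folds.
        size = n_trials // k + (1 if f < n_trials % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def block_average(rates, labels):
    """Average consecutive firing rate samples that share the same class label."""
    averaged = []
    idx = 0
    for label, run in groupby(labels):
        length = len(list(run))
        block = rates[idx:idx + length]
        averaged.append((label, sum(block) / length))
        idx += length
    return averaged


three_fold = kfold_trials(10, 3)
leave_one_out = kfold_trials(4, 4)
# Hypothetical instantaneous firing rates (Hz) with per-sample event labels.
means = block_average([1.0, 3.0, 2.0, 4.0, 6.0, 5.0],
                      ["np", "np", "lp", "lp", "lp", "np"])
```

Block averaging replaces many auto-correlated samples per event by a single value, which is the source of the lower variance and the loss of temporal detail described above.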
To conclude, although UCV-optimal bandwidth selection is an unsupervised method which does not include any prior information on the timing of task-relevant events, it highlights temporal structure in spike
train data and improves subsequent decoding performance. Thus, the UCV method is a helpful
tool for automated bandwidth selection and instantaneous firing rate estimation. A limitation of our validity assessment of optimized KDEs in decoding is that the model we used to generate spiking activity
with multiple firing states does not incorporate the trial-by-trial variability (Nawrot et al., 2008), long-term variability
(Hyman et al., 2012) or precise temporal spiking patterns present in in-vivo recordings (Mainen and Sejnowski, 1995; Bair and Koch, 1996; Buračas et al., 1998; Reinagel and Reid, 2002; Fellous et al., 2004;
Brown et al., 2005; Raman et al., 2010). Further analyses are therefore needed which include parameters
specifying temporal precision, inter-trial and long-term variability of spiking activity.
5 Outlook and possible extensions
UCV-based optimized bandwidths provide an information-rich measure reflecting the spiking activity of single cells and enhance classifier performance when applied to the recorded neuronal population (as seen
in chapters 3 and 4). However, optimized KDEs also allow us to tackle further
challenging problems of current interest to the neuroscience community, such as single-trial analysis of neural
time courses, detection of encoding neuronal ensembles, or the unfolding of population dynamics (Brown
et al., 2004; Baeg et al., 2007). Although many more extensions related to optimized bandwidths and KDEs are conceivable, this outlook is devoted to demonstrating the above
applications of optimized KDEs to in-vivo recordings from the rat mPFC, and to showing which general inferences
about neuronal coding mechanisms can be drawn from the outcome.
Apart from being involved in working memory the rat mPFC has been implicated in spatial and temporal
context-encoding (Hyman et al., 2012) and rule deduction (Durstewitz et al., 2010). Furthermore it has
been reported that the context-dependent organization of neuronal ensembles which encode for different
task events or stages (Lapish et al., 2008; Balaguer-Ballester et al., 2011; Ma et al., 2014) may account
for the great flexibility required during the performance of higher cognitive tasks.
Applying optimized KDEs to the activity of simultaneously recorded neurons from the rat mPFC
during the sequence-switch task (sec. 5.4), we will first examine the single-trial neural representation of the
different sequences and actions in single unit activity. To enable across-trial comparisons of activity during self-paced events, we will use 'time-warped' optimized KDEs. Second, we will identify neuronal
ensembles, i.e. subsets of units, encoding task-related events by keeping those units which maximally
decrease the misclassification rate. Third, we will unfold population dynamics during the different
sequences by reconstructing the neural trajectories in a 3-dimensional space by multidimensional scaling.
5.1 Single-trial analysis with optimized single neuron representations
We can examine how units represent task events and sequence information by comparing the time
courses of their neuronal activity across trials. To enable comparisons of spiking activity across trials
varying in length (when animals were freely moving), we aligned KDEs at the time-stamps of task-related
behavior by "time-warping" or temporal scaling (for a detailed description see sec. 5.4).
Representation of sequence information by single units: Figure 5.1 illustrates the transformation from
the original (left) to time-warped (middle and right) single-trial KDEs. Across-trial comparisons of
time-warped optimized KDEs in figure 5.2 confirm that single units significantly differentiate between
distinct events in the time courses of their activity. We then examined whether the firing of single cells can
represent combinations of independent task-related attributes, encoding both actions and context information. To this end, we grouped single trials according to the task stage or order of performed actions.
[Figure 5.1. Axes: ifr [Hz] against time [s] and against normalized time; trials 1 to 10 grouped into sequence 1 and sequence 2; time windows Tc and Tn; event markers: 1st action, 2nd, 3rd, approaching, reward.]
Figure 5.1: Illustration of single-trial KDEs varying in length and time-warped KDEs. Left: single unit
optimized KDE estimated at equidistant time points. Event time-stamps are denoted by vertical lines. Middle:
same single trial spiking activity as 'time-warped' representation aligned at task-related behaviors. Gray-colored
sections on the x-axis indicate response (Tc , dark gray) and non-response-specific time-windows (Tn , light gray).
Sections of the same color in the left plot span the same time periods as in the middle plot. Right: contour plot of ten
'time-warped' single-trial KDEs for the two different task stages: actions performed in clockwise (sequence 1,
trials 1-5) and anti-clockwise order (sequence 2, trials 6-10). Dark blue regions indicate high firing activity and
light blue regions activity close to zero Hz.
[Figure 5.2. Panels for unit 20, unit 24 and unit 27: single-trial ifr [Hz] against normalized time for selected trials (trials 10 to 33), color-coded by sequence 1 and sequence 2; event markers: 1st action, 2nd, 3rd, approaching reward.]
Figure 5.2: Time courses of time-warped optimized single trial instantaneous firing rates of single units
which convey maximal information about task-related responses, grouped according to the task-stages.
Data set 12 taken from the sequence-switch task. Upper panels: Ten single-trial ifrs of three selected units during
the two different task stages (color-coded sequences one and two). Lower panel: trial-averaged activity across the
two sequences, light blue and red lines indicate 95% confidence regions. The left column displays the same bursty,
stimulus-locked unit as shown in chap. 3, figure 3.5, middle panel. Time-warped single trial KDEs are aligned
at time-stamps corresponding to actions performed on the task denoted by vertical lines (lever-press, nose-poke,
wheel-turn, approaching the reward and reward consumption).
Single-trial optimized KDEs of single units (figure 5.2, lower panel) reveal that some cells exhibit
temporal activity patterns that clearly differentiate between sequences, even when actions within a sequence are re-ordered to match both sequences. This finding implies that information on actions and on
sequence order is encoded in the activity of some single units.
5.2 Identifying cell ensembles encoding for task-related information
[Figure 5.3, top panel. Panels: forward, forward-backward and backward selection. Misclassification error [%] against #units included in LDA (forward-backward panel: against iterations); curves: original, smoothed, 95% ci. Lower panel axis: Encoding units [%].]
In order to make inferences about population coding mechanisms we examined whether mPFC neurons follow
a distributed or rather a sparse coding scheme, by identifying units encoding distinct task-related events
and determining the size of these neuronal ensembles relative to the recorded population.
When coding is fully distributed, information is conveyed by the activity of many cells, and accordingly including all units in the classifier will yield a higher decoding performance. When
encoding is sparse, in contrast, classification accuracy will improve when the activity of only a few predictive cells is
considered.
We can address the problem from a statistical point of view by using a procedure termed feature
or model selection. In this context, each feature corresponds to the optimized kernel density or instantaneous firing rate estimate νi of a single unit, i=1,...,p, and a model represents the set of features the
classifier is built on. Units which most improve the misclassification error are successively added to or
removed from the model (for a more detailed description see appendix 5.4).
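The forward variant of this procedure can be sketched in a few lines of Python. Note the assumptions: to keep the example self-contained, a nearest-centroid classifier is swapped in for the LDA classifier used in this thesis, the prediction error is estimated by leave-one-out CV over samples, and the function names (loo_error, forward_selection) as well as the toy data are hypothetical.

```python
def loo_error(X, y, features):
    """Leave-one-out misclassification rate of a nearest-centroid classifier.

    X is a list of samples (one firing rate per unit), y the class labels,
    features the indices of the units included in the model.
    """
    errors = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist2(label):
            # Squared Euclidean distance of sample i to a class centroid.
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sorted(sums), key=dist2)
        if predicted != y[i]:
            errors += 1
    return errors / len(X)


def forward_selection(X, y):
    """Greedily add the unit that most reduces the error; return the best subset."""
    selected, remaining, path = [], list(range(len(X[0]))), []
    while remaining:
        best = min(remaining, key=lambda u: loo_error(X, y, selected + [u]))
        selected.append(best)
        remaining.remove(best)
        path.append((list(selected), loo_error(X, y, selected)))
    # The minimum along the path defines the most predictive ensemble.
    return min(path, key=lambda entry: entry[1])


# Toy data: unit 0 separates the classes, unit 1 is noisy, unit 2 is uninformative.
X = [[1.0, 5.0, 0.2], [1.2, -4.0, 0.1], [0.9, 3.0, 0.3], [1.1, -6.0, 0.2],
     [3.0, 4.0, 0.2], [3.2, -5.0, 0.1], [2.9, 6.0, 0.3], [3.1, -3.0, 0.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
subset, error = forward_selection(X, y)
full_set_error = loo_error(X, y, [0, 1, 2])
```

In this toy example the full set of units decodes worse than the selected subset, which is the signature of a sparse rather than a distributed code in the sense described above.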
[Figure 5.3, lower panel. Encoding units [%] for forward, backward and forward-backward selection in task 1 and task 2.]
Figure 5.3: Identifying encoding units via stepwise feature selection procedures. Top: decoding performance
as a function of the subset size used for LDA classifier construction, dataset 12, sequence-switch task. The optimal
feature set size is indicated by vertical lines. Lower panel: average size of neuronal ensembles conveying maximal
task-related information proportional to the recorded population identified by the three different feature selection
algorithms (mean value ± 95% confidence intervals).
Impact of the ensemble size on the decoding accuracy: Figure 5.3, top panel illustrates misclassification error curves as a function of the size of the subset of neurons employed to construct the
LDA classifier, for three different variable selection methods. Results obtained with forward-backward
selection are not considered in further analyses due to lack of convergence to a global minimum. The
prediction error curves are typically u-shaped due to the bias-variance trade-off. Starting with one unit,
the misclassification rate decreases as the activity of more units is used for building the classifier until
a global minimum is reached, which defines the optimal size of the most predictive neuronal ensemble,
indicated by gray vertical lines in figure 5.3, top panel. Prediction accuracy then deteriorates again
as more units are added to the model. Decoding accuracy improves significantly when only the most predictive optimized KDEs, i.e. those of the cells identified by
feature selection, are taken into account. For instance, in backward selection the misclassification error is reduced from about
41±2.5% when including the entire set of recorded units to below 7.9±1.43% for the most predictive
ensemble. Furthermore, figure 5.3, top middle column shows that the complete set has almost
the same decoding performance as the best single unit (46.9±2.6%). This means that discrimination of
task-related events can be improved significantly by selecting the subset of encoding cells.
Sizes of encoding neuronal ensembles: The application of optimized KDEs in combination with feature
selection suggests that around 40% of the cells encode task-related events in the sequence-switch task (task one: 38.2±7.5 forward and 43.8±3.2 backward selection) and above 50% in the alternation task (task two: 54.9±13.0 forward and 53.8±10.6 backward selection), figure 5.3, lower panel.
On average more cells were recorded per session in task one (∼50) than in task two (∼40). In general, the most predictive feature sets identified by backward selection differ slightly from those found by
forward selection, but not to a significant extent.
5.3 Reconstructing population dynamics during sequence processing
By analyzing high-dimensional population dynamics and low-dimensional neural trajectories in the PFC
it has already been possible to uncover interesting phenomena during rule learning, context perception
or decision making, on either a single-trial or a multiple-trial basis (Mante et al., 2013; Stokes et al.,
2013; Durstewitz et al., 2010). Here we show, first, that optimized KDEs also make it possible to extract low-dimensional single-trial time courses which allow the simultaneous activity of a neuronal
population to be monitored and, second, which conclusions on mechanisms of sequence encoding can be inferred from
trajectories based on optimized KDEs.
The idea behind this approach is that each neuron is considered a noisy sensor which reflects an underlying neural process (Brown et al., 2005; Carrillo-Reid et al., 2008; Mazor and Laurent, 2005; Yuan
and Niranjan, 2010; Stopfer et al., 2003; Yu et al., 2006; Balaguer-Ballester et al., 2011). This process
can be uncovered by extracting a low-dimensional neural trajectory from the recorded high-dimensional
population activity. In behavioral experiments, high trial-by-trial variability of single neurons can be
related to internal processing, e.g. in behavioral tasks involving motor planning, decision making, rule
learning and perception (Nawrot et al., 1999; Horwitz and Newsome, 2001; Czanner et al., 2008; Mante
et al., 2013; Churchland et al., 2010); such variability reflects the state of the network and is thus shared
among many cells. The neural trajectory facilitates the visualization of the underlying neuronal process
by providing a reduced representation of the shared high-dimensional population activity.
We applied multidimensional scaling (MDS) to reduce the dimensionality of the population vector of
optimized instantaneous firing rates. Figure 5.4, middle and top panel show the trajectories obtained
with optimized and non-optimized KDEs, respectively, for an exemplary data set from the sequence-switch task. Each
dot represents the state of the entire recorded ensemble in one 100-ms bin. All points corresponding to
different 100-ms bins in the epochs of the same behavior are shown in the same color.
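The mapping from population vectors to a 3-dimensional trajectory can be sketched as follows. This Python example assumes classical (metric) MDS on Euclidean distances between the population vectors of individual time bins; the MDS variant and distance measure actually used are not specified in this section, so both are assumptions. The eigenvectors are obtained by simple power iteration with deflation to keep the example free of external libraries, and all names are hypothetical.

```python
import math
import random


def euclidean_distances(vectors):
    """Pairwise Euclidean distances between population vectors (one per time bin)."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))) for v in vectors]
            for u in vectors]


def classical_mds(dist, n_dims=3, n_iter=500, seed=0):
    """Classical MDS: embed points so that Euclidean distances match dist."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into a Gram matrix B.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * n_dims for _ in range(n)]
    for k in range(n_dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(n_iter):  # power iteration for the leading eigenvector
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        for i in range(n):  # deflation: remove the component just found
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Hypothetical population vectors: firing rates of three units in five time bins.
bins = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 2.0, 0.0),
        (0.0, 0.0, 1.0), (4.0, 2.0, 1.0)]
distances = euclidean_distances(bins)
trajectory = classical_mds(distances, n_dims=3)
```

Connecting the rows of trajectory in temporal order yields one single-trial trajectory; with many more units than embedding dimensions the distances are only approximately preserved.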
[Figure 5.4. Six 3-dimensional panels with axes dim 1, dim 2 and dim 3; points color-coded by behavior: nose-poke, lever-press, wheel-turn, approaching, consuming; averaged trajectories labeled sequence 1 and sequence 2.]
Figure 5.4: Reconstructing population dynamics from optimal KDEs. Data set 12 taken from the sequence-switch task. Population vector of optimized instantaneous firing rates mapped to a 3-dimensional space with
multi-dimensional scaling. a-b: neuronal single trial trajectories based on non-optimized KDEs. c-d: neuronal
single trial trajectories based on optimized KDEs, both showing 10 consecutive trials. e-f: averaged trajectories
over the two different task stages: sequence 1 and sequence 2.
Representation of sequence information by the neuronal population: Our findings demonstrate, first, that
optimized KDEs with automated bandwidth selection make it possible to unfold population dynamics and
yield smooth single-trial trajectories (fig. 5.4, middle panel). This is clearly not possible with non-optimized KDEs, where the bandwidth is chosen at random (fig. 5.4, top panel). Second, when averaging
over the sequences in the reduced space, task events which appear in the same order in both sequences
(reward-approaching, reward-consumption) occupy similar regions of the space. In contrast, the same actions
performed in reversed order can be discriminated along some dimensions of the reduced space while being aligned
along other dimensions (fig. 5.4, lower panel). From the trajectories based on optimized KDEs we
conclude that information encoding both
actions and sequence order (context) seems to be represented simultaneously in the population
activity. These findings support the view that the mPFC is involved in higher-level contextual encoding
(Hyman et al., 2012; Ma et al., 2014; Rigotti et al., 2013).
5.4 Concluding remarks
To date, bandwidth optimization of KDEs, which seeks to determine the underlying spike density function more accurately, has mostly been employed to analyze representations of neuronal activity in the form
of instantaneous firing rates (Nawrot et al., 1999; Lehky, 2010; Shimazaki and Shinomoto, 2010).
However, in this study we have seen that optimized bandwidths provide an information-rich measure
which helps to address more challenging questions and can provide valuable insight into
several different mechanisms: by examining optimized bandwidths and their distributions, we can discriminate cells with specific temporal spiking structure and, moreover, segregate cell
groups with distinct spiking characteristics. By employing optimized KDEs obtained with automated
bandwidth selection we can enhance classification performance and unfold population dynamics reflecting the underlying neuronal process, and thereby draw conclusions on encoding mechanisms.
Theoretical research on kernel density estimation has long been far ahead of its practical application to neurophysiological data: adaptive KDEs with variable bandwidths which adjust to non-stationary
regions (Sain, 1994; Hazelton, 2003; Terrell and Scott, 1992; Tran, 2010; Antoniadis et al., 2009) could
be employed to overcome trial-by-trial variability in spike trains. Also, multivariate KDEs (Sain, 2002;
Bouezmarni and Rombouts, 2010; Zougab et al., 2014) might provide a more accurate method to estimate the underlying spike density function of multiple single-unit recordings. Yet the PSTH remains
the most widely used method in in-vivo neurophysiology for the representation of spiking activity.
This calls for future studies to integrate these more advanced, already available methods for
automated bandwidth selection and adaptive and multivariate KDE, which will be crucial for a better understanding of the functional organization of neuronal activity. Bandwidth estimates obtained by
optimization should also be studied more systematically in a variety of cortical areas. We hope that the
application of bandwidth estimates and KDEs determined by optimization will in future shed light on
the neuronal coding problem and contribute to a better understanding of the functional mechanisms of
internal processing during complex cognitive tasks in higher-order areas such as the PFC.
I would like to conclude with the remark that I hope this work will fill at least a tiny piece of the gap
between the field of kernel density estimation and the complex but fascinating world of how the brain
works, and that it may inspire and enrich future research.
Acknowledgements
I would like to thank all my colleagues and friends who contributed with valuable and thoughtful discussions to the work described in this thesis.
First and foremost I offer my sincerest gratitude to my supervisor, Professor Daniel Durstewitz, who supported me throughout my thesis with his patience and knowledge whilst allowing me the room to work in
my own way and demanding a high quality of work in all my endeavours.
Additionally, I would like to thank my dissertation committee members Professors Ursula Kummer,
Christoph Schuster, and Dr. Kevin Allen for their interest in my work and their important feedback.
I am very grateful to Liya Ma, James Hyman, and Jeremy Seamans for providing the in-vivo data sets used
to test our methods and analysis.
Every result described in this thesis was accomplished with the help and support of fellow labmates and
collaborators at the Central Institute for Mental Health, Mannheim and the Bernstein Center for Computational Neuroscience, Heidelberg and Mannheim. In my daily work I have been blessed over the years
with a friendly and cheerful group of fellow students. Loreen Hertaeg, Charmaine Demanuele, Emili
Balaguer-Ballester, Grant Sutcliffe, Claudio Sebastian Quiroga-Lombard, Tatiana Golovko, Joachim
Hass, Thomas Fucke, Sven Berberich, Thomas Hahn, Elonora Russo, Georgia Koppe, Carla Filosa,
Sadjad Sadeghi, and Hazem Toutounji made my time here at the Central Institute a lot more fun.
Finally, I would like to acknowledge friends and family who supported me during my time here. First
and foremost I would like to thank Mom, Dad, Igor, Jessica, Oliver for their constant love and support.
Appendix
Methods
Supplemental Information on the electrophysiological recordings & sequence-switch task
The following description was provided by Dr Liya Ma (personal communication).
Apparatus. The maze consisted of 4 platforms connected by 4 two-foot-long passages that formed a diamond shape (fig. 5.5), with platforms 1 and 3 at the sharp tips of the diamond. The individual platforms
differed in size, shape, floor texture and wall patterns. Platforms 1 to 3 contained unique manipulanda:
a nose-poke port in platform 1 to the right; a lever in platform 2 in the middle; and a response wheel
in platform 3 to the left. A 3W signal light was located above each of these manipulanda. During
initial task shaping, a food cup was inserted above each manipulandum for food-pellet delivery, which
was always accompanied by a 0.5s pure tone at 1.5 kHz. Once the animals were trained, these food cup dispensers were
removed and food was only delivered on platform 4 in association with the same tone. All cue lights,
tone-generator, manipulanda and pellet dispensers were operated by a MedPC IV system (Med Associates, Georgia, VT). Doors located at the start of each passage could be controlled from outside of the
maze.
Figure 5.5: Sequence switch task apparatus and stages of pre-training sessions on the maze. Figure and
description taken from Ma (personal communication).
Behavior. Pre-training on the Maze: Training on the maze started with single-action instrumental conditioning. The animals were restricted for fixed periods to each of the nose poke (np) (right), lever press
(lp) (middle) and wheel turn (wt) (left) platforms, thus ensuring daily training on each of the 3 instrumental actions (fig. 5.5). They progressed through FR1, fixed-interval 10s (FI10s) and random-interval
15s (RI15s) schedules with a performance criterion of 30 pellets/20min. Ten daily sessions of sequence
shaping started the day after they attained criterion performance on all three individual operant actions.
The animals were placed in the reward platform at the beginning of the session and the only open passage led to Platform 1, where the light above the np port was illuminated (fig. 5.5 middle). Once the
animals reached this platform, two doors blocked both exits and the light stayed on until a np response
was emitted. At that point a door opened to allow access to Platform 2 and the light above the lever
was illuminated. The animals could only leave Platform 2 after they performed a lp. At that point
the light above the lever was extinguished, the door was opened and the light above the wheel on
Platform 3 was illuminated. Once the animal reached the wt platform and turned the wheel a full
circle, the light above the wheel was extinguished and the last door opened to allow entrance to Platform 4,
where 4 food pellets were delivered accompanied by a 0.5-s pure tone at 1.5 kHz. After 4 seconds,
another door opened so they could move to Platform 1 and start the next trial. These sessions lasted for
no less than 45min and ended either when the animal stopped responding for at least 3min or when 60min had
elapsed.
Maze sequence task: After 10 daily shaping sessions, all doors were removed. Rats were still required
to perform the 3 actions in the aforementioned order, but could commit out-of-sequence errors such as
running in the wrong direction around the maze. Performance was evaluated by the number of out-of-sequence errors per trial. Since the intention was to examine the plasticity of action representations
rather than sequence learning per se, it was desirable to minimize these types of errors. Therefore the
rats continued to be guided through the sequence by cue lights that were activated by the manipulanda
and that illuminated the next platform in the sequence. Repeated responses on the manipulanda in the
4s after the initial correct response were not considered as errors. A session typically lasted for 60min,
but extra time (up to 10min) was occasionally given if the animal was in the middle of the sequence at
60min. The animals continued to receive this self-paced sequence training for a total of 23 days.
Sequence switch task: After training on the original sequence (sequence A), animals were trained to
perform the action-sequence in the reversed order (i.e., wt → lp → np → reward: sequence B). On the maze, rats
were first guided through sequence B with doors, then without, until they reached the same level of efficiency as exhibited previously on sequence A. At that point the sequence-switch sessions commenced on
the following days, when the animals were required to complete at least 20 trials on sequence B within
20min, at which point they were removed from the maze for a minute. They were then placed back in the
apparatus and the light above the np was illuminated instead of the light above the wt (as was the case for
sequence B). This prompted them to perform sequence A. Only the trials free of out-of-sequence errors
were used in the analyses of neural data. In the results, we refer to the first sequence (i.e. sequence B)
as "sequence 1" and the 2nd as "sequence 2".
Subjects & surgery. Male Long-Evans rats (450-550g) were housed in a facility with a 12-hr light-dark cycle, with all training and recording taking place during the light cycle. For the duration of the behavioral
experiments, the rats were food-restricted to just below 90% of their free-feeding weights. Feeding took
place in the home cage after their daily training/recording sessions, and water was available ad libitum
in the cages at all times. All procedures were carried out in accordance with the Canadian Council of
Animal Care and the Animal Care Committee at the University of British Columbia. Stereotaxic surgeries were performed on naive rats with sterilized-tip procedures. An NSAID analgesic, an antibiotic, and a
local anesthetic were given before incision. An elliptical craniotomy was made, centered at
AP: +3.2mm, ML: ±0.5mm. Once the dura mater was retracted, the bottoms of the two bundles of 8
30-gauge tubes, containing a total of 16 tetrodes, were placed bilaterally immediately beside the central
sinus, touching the cortical surface. Each bundle had a cylindrical shape with a bottom radius of ∼0.4mm
and was angled by 3.5 to 5 degrees. The implants were fixed with bone screws and dental acrylic. All
tetrodes were extended ∼0.7mm into the brain at the end of the surgery. After 10d of recovery, the
tetrodes were advanced ventrally into the ACC. Once all tetrodes were placed into the dorsal ACC according to lowering records and atlas coordinates, small adjustments were made with hyperdrives to
maximize the number of neurons recorded.
Acquisition of electrophysiological data. For data acquisition, an EIB-36TT board (Neuralynx Inc.,
Bozeman, MT, USA), connected to the extracellular electrodes, was plugged into HS-36 headstages and
tether cables (Neuralynx Inc., Bozeman, MT). Signals were converted by a Digital Lynx 64 channel
system (Neuralynx Inc., Bozeman, MT) and sent to a PC workstation, where electrophysiological and
behavioral data were read into Cheetah 5.0 software (Neuralynx Inc., Bozeman, MT). Files were then
read into Offline Sorter (Plexon Inc., Dallas, TX) for spike sorting, based on visually dissociable clusters in 3D projections along multiple axes for each electrode of a tetrode (peak and valley amplitudes,
peak-to-valley ratio, principal components and area). Sorting was confirmed by examining auto- and
cross-correlations, and ANOVAs were conducted from the 2D and 3D projections. Spike timestamps
were then read into MATLAB (Mathworks Inc., Natick, MA) for all further analysis. At the end of the
studies, the animals were deeply anesthetized using urethane i.p. injection, and a 100µA current was
passed through the electrodes for 30s. Animals were then perfused with a solution containing 250ml
10% buffered formalin, 10ml glacial acetic acid, and 10 g of potassium ferrocyanide. This solution
causes a Prussian blue reaction, which marks with blue the location of the iron particles deposited by
passing current through the electrodes. The brains were then removed and stored in a 10% buffered
formalin/20% sucrose solution for at least 1 week, before being sliced and mounted to determine precise
electrode tracks. Since multiple sessions were recorded from individual animals, the precise recording
locations could not be derived from electrode lesions; instead, all electrode tracks were inferred between the
entrance point and the dyed spot. All tracks ended within the medial frontal cortex, with the vast majority
of tracks limited to the ACC and a minority extending into superficial layers of the prelimbic region.
Single-trial time warping
In order to align single-trial KDEs, instead of using a grid of usually equidistant points at which the
instantaneous firing rate has to be estimated (fig. 5.1, left), we divided each trial into an equal number of
non-overlapping scaled time bins, whose length was adapted to the timing of task events (fig. 5.1, middle
and right panels). We obtained the temporal binning for each trial in the following way:
Based on recorded time-stamps corresponding to actions ck, k = 1,...,5, performed on the task (lever-press, nose-poke, wheel-turn, approaching the reward and reward consumption, see sec. 5.4), we identified periods of 500 ms preceding and following all task-related behaviors as time windows Tc of
response-specific activity (fig. 5.1, dark gray horizontal lines). Time intervals Tn lying in between were
labeled as non-response-specific (fig. 5.1, light gray horizontal lines). Then, the temporal resolution of
the KDEs, i.e. the grid at which the instantaneous firing rate was estimated, was adjusted depending on the
window type. Response-specific time windows of length Tc = 1 s were divided into L = 20 equally spaced
bins, so that the resulting response-specific instantaneous firing rates ν(t|ck) had a temporal resolution
of Δt = Tc/L = 50 ms. Non-response-specific periods Tn were partitioned into the same number of bins L.
However, as the timing of self-paced behavior varied, their temporal resolution was adjusted
to the length of the time interval between consecutive responses, Δt = Tn/L.
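The binning rule above can be illustrated with a minimal Python sketch (the analysis in this work was carried out in MATLAB; this is not the original code). The function name warped_bin_edges and its arguments are hypothetical. The sketch assumes that consecutive actions are more than 1 s apart, so that response windows do not overlap, and it ignores the periods before the first and after the last action, since their treatment is not specified here.

```python
def warped_bin_edges(event_times, half_window=0.5, n_bins=20):
    """Return (edges, labels) of the scaled time bins for one trial.

    event_times : action time-stamps in seconds (hypothetical input format)
    half_window : 500 ms before and after each action form a response window
    n_bins      : L, the number of bins per window (20 in the text)
    """
    events = sorted(float(t) for t in event_times)
    if not events:
        raise ValueError("at least one action time-stamp is required")
    for earlier, later in zip(events, events[1:]):
        if later - earlier <= 2 * half_window:
            # Assumption of this sketch: overlapping windows are not handled.
            raise ValueError("response windows overlap")

    def left_edges(start, stop):
        # n_bins left edges of equal-width bins spanning [start, stop)
        width = (stop - start) / n_bins
        return [start + i * width for i in range(n_bins)]

    edges, labels = [], []
    for k, t in enumerate(events):
        start, stop = t - half_window, t + half_window
        # Response-specific window: fixed width (Tc / L = 50 ms by default)
        edges += left_edges(start, stop)
        labels += ["response"] * n_bins
        if k + 1 < len(events):
            # Interval up to the next response window: width Tn / L varies
            edges += left_edges(stop, events[k + 1] - half_window)
            labels += ["between"] * n_bins
    edges.append(events[-1] + half_window)  # right edge of the last bin
    return edges, labels


# Three actions at 1.0 s, 3.0 s and 6.5 s (made-up values)
edges, labels = warped_bin_edges([1.0, 3.0, 6.5])
widths = [b - a for a, b in zip(edges, edges[1:])]
```

Every trial yields the same number of bins (here 3 response windows and 2 in-between intervals of 20 bins each), which is what allows single-trial estimates to be aligned bin by bin despite different action timing.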
Feature selection
Before feature selection, we partitioned the input and output data, i.e. the optimized KDEs ν and the stimulus vectors y, into two subsets based on the leave-one-trial-out CV principle (sec. 2.5). The training set was
used for classifier construction and the test set was retained for subsequent validation.
We employed LDA in combination with forward, backward and forward-backward feature selection algorithms; the forward and backward variants are best described by the following pseudo-code:
1. Initialize feature set
   S0 = ∅; m = 0
2. Select the next best feature
   ν∗ = arg min Err(Sm ∪ νi), taken over all νi ∉ Sm
3. Update feature set
   Sm+1 = Sm ∪ ν∗
4. While m < p
   m = m + 1
   Go to Step 2
Sequential forward selection pseudo-code.

1. Initialize feature set
   S0 = ν; m = 0
2. Remove the worst feature
   ν∗ = arg min Err(Sm ∖ νi), taken over all νi ∈ Sm
3. Update feature set
   Sm+1 = Sm ∖ ν∗
4. While m < p
   m = m + 1
   Go to Step 2
Sequential backward selection pseudo-code.
The backward feature selection procedure starts with the full set of features, i.e. the single-neuron firing rate
estimates S0 = ν = {ν1, ..., νp}, and then sequentially eliminates the feature νi ∈ Sm whose removal yields
the lowest misclassification error Err(Sm+1), Sm+1 = Sm ∖ νi, while in forward selection units are added to the
current set. Forward-backward selection combines both procedures, so that units are added to or deleted
from the subset Sm as long as the prediction error does not increase, Err(Sm+1) ≤ Err(Sm).
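As an illustration of the greedy search described above, the following Python sketch implements sequential forward and backward selection for an arbitrary error function. It is not the MATLAB code used in this work. The names forward_selection, backward_selection, best_subset and toy_err are hypothetical, and toy_err is a made-up stand-in for the cross-validated LDA misclassification error Err(Sm).

```python
def forward_selection(features, err):
    """Add, one at a time, the feature that gives the lowest error."""
    remaining, selected, path = list(features), [], []
    while remaining:
        best = min(remaining, key=lambda f: err(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
        path.append((tuple(selected), err(frozenset(selected))))
    return path  # list of (subset S_m, Err(S_m))


def backward_selection(features, err):
    """Remove, one at a time, the feature whose removal gives the lowest error."""
    selected = list(features)
    path = [(tuple(selected), err(frozenset(selected)))]
    while len(selected) > 1:  # this sketch stops at a single feature
        worst = min(selected, key=lambda f: err(frozenset(selected) - {f}))
        selected.remove(worst)
        path.append((tuple(selected), err(frozenset(selected))))
    return path


def best_subset(path):
    """Subset along the search path with the smallest error."""
    return min(path, key=lambda item: item[1])


# Toy error (assumption): units n1 and n3 are informative, the others add noise.
informative = {"n1", "n3"}

def toy_err(subset):
    hits = len(subset & informative)
    noise = len(subset - informative)
    return 0.5 - 0.2 * hits + 0.05 * noise

units = ["n1", "n2", "n3", "n4"]
fwd_best = best_subset(forward_selection(units, toy_err))
bwd_best = best_subset(backward_selection(units, toy_err))
```

In this toy setting both searches settle on the two informative units. With a real cross-validated error the two paths need not agree, which is one motivation for combining them in a forward-backward scheme.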
We computed the prediction error Err by means of leave-one-trial-out CV (see sec. 2.5). After smoothing the resulting Err(Sm) functions with the MATLAB cubic smoothing spline function "csaps", the most
predictive subset of neurons, i.e. the one conveying maximal stimulus-related information, was identified as
the set for which the prediction error attains its global minimum, Sopt = Sm∗ with m∗ = arg min Err(Sm),
where the minimum is taken over m.
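The leave-one-trial-out estimate of Err can be sketched in Python as follows. This is an illustration only, not the original MATLAB code. The names loo_error and nearest_mean_predict are hypothetical, each row of x is assumed to hold the feature vector of one trial, and a simple nearest-class-mean rule is swapped in for LDA to keep the example self-contained. The spline smoothing step is not reproduced.

```python
def nearest_mean_predict(train_x, train_y, x):
    """Stand-in classifier (NOT LDA): assign x to the class with the nearest mean."""
    sums, counts = {}, {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=squared_distance)


def loo_error(x, y, predict=nearest_mean_predict):
    """Fraction of trials misclassified when each trial is held out once."""
    errors = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]   # all trials except trial i
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) != y[i]:
            errors += 1
    return errors / len(x)


# Made-up data: six trials, two features, two well-separated stimulus classes
x = [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3], [3.0, 0.2], [3.1, 0.0], [2.8, 0.4]]
y = ["A", "A", "A", "B", "B", "B"]
err_clean = loo_error(x, y)

# Same data with the third trial deliberately mislabeled
y_noisy = ["A", "A", "B", "B", "B", "B"]
err_noisy = loo_error(x, y_noisy)
```

Evaluating such an error for every candidate subset Sm and taking the subset with the smallest value corresponds to the selection of Sopt described above.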
V
Bibliography
Abarbanel, H. D., Huerta, R., Rabinovich, M. I., Rulkov, N. F., Rowat, P. F., and Selverston, I. (1996).
Synchronized action of synaptically coupled chaotic model neurons. Neural computation, 8:1567–
1602.
Abeles, M., Bergman, H., Gat, I., Meilijson, I., Seidemann, E., Tishby, N., and Vaadia, E. (1995).
Cortical activity flips among quasi-stationary states. Proceedings of the National Academy of Sciences
of the United States of America, 92(September):8616–8620.
Adrian, E. D. and Zotterman, Y. (1926). The impulses produced by sensory nerve-endings: Part II. The
response of a Single End-Organ. The Journal of physiology, 61(2):151–171.
Ainsworth, M., Lee, S., Cunningham, M. O., Traub, R. D., Kopell, N. J., and Whittington, M. A. (2012).
Rates and Rhythms: A Synergistic View of Frequency and Temporal Coding in Neuronal Networks.
Amit, D. J. (1992). Modeling Brain Function: The World of Attractor Neural Networks. AddisonWesley.
Antoniadis, A., Paparoditis, E., and Sapatinas, T. (2009). Bandwidth selection for functional time series
prediction. Statistics and Probability Letters, 79(6):733–740.
Ardid, S., Vinck, M., Kaping, D., Marquez, S., Everling, S., and Womelsdorf, T. (2015). Mapping
of Functionally Characterized Cell Classes onto Canonical Circuit Operations in Primate Prefrontal
Cortex. Journal of Neuroscience, 35(7):2975–2991.
Baeg, E. H., Kim, Y. B., Kim, J., Ghim, J.-W., Kim, J. J., and Jung, M. W. (2007). Learning-induced
enduring changes in functional connectivity among prefrontal cortical neurons. The Journal of neuroscience, 27(4):909–918.
Bair, W. and Koch, C. (1996). Temporal precision of spike trains in extrastriate cortex of the behaving
macaque monkey. Neural computation, 8:1185–1202.
Balaguer-Ballester, E., Lapish, C. C., Seamans, J. K., and Durstewitz, D. (2011). Attracting dynamics of frontal cortex ensembles during memory-guided decision-making. PLoS Comput. Biol.,
7(5):e1002057.
Balaguer-Ballester, E., Tabas-Diaz, A., and Budka, M. (2014). Can we identify non-stationary dynamics
of trial-to-trial variability? PLoS ONE, 9(4).
Barbieri, R. (2001). Construction and analysis of non-Poisson stimulus-response models of neural spiking activity. Journal of Neuroscience Methods, 105(1):25–37.
VI
Barlow, H. B. (1972). Single units and sensation: a neuron doctrine for perceptual psychology? Perception, 1(4):371–394.
Bialek, W., Rieke, F., de Ruyter van Steveninck, R. R., and Warland, D. (1991). Reading a neural code.
Science, 252(5014):1854–1857.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning, volume 4. Springer.
Blanche, T. J., Spacek, M. A., Hetke, J. F., Swindale, N. V., Timothy, J., Spacek, M. A., Hetke, J. F.,
and Swindale, N. V. (2005). Polytrodes : High-Density Silicon Electrode Arrays for Large-Scale
Multiunit Recording. Journal of Neurophysiology, 93:2987–3000.
Bouezmarni, T. and Rombouts, J. V. (2010). Nonparametric density estimation for positive time series.
Computational Statistics & Data Analysis, 54(2):245–261.
Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71(2):353–360.
Bowman, A. W. and Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis: The Kernel
Approach with S-Plus Illustrations. Oxford Statistical Science Series. Oxford University Press, USA.
Brown, E. N., Kass, R. E., and Mitra, P. P. (2004). Multiple neural spike train data analysis: state-ofthe-art and future challenges. Nature Neuroscience, 7(5):456–461.
Brown, S. L., Joseph, J., and Stopfer, M. (2005). Encoding a temporally structured stimulus with a
temporally structured neural representation. Nature Neuroscience, 8(11):1568–1576.
Buračas, G. T., Zador, A. M., DeWeese, M. R., and Albright, T. D. (1998). Efficient discrimination of
temporal patterns by motion-sensitive neurons in primate visual cortex. Neuron, 20:959–969.
Carrillo-Reid, L., Tecuapetla, F., Tapia, D., Hernández-Cruz, A., Galarraga, E., Drucker-Colin, R., and
Bargas, J. (2008). Encoding network states by striatal cell assemblies. Journal of neurophysiology,
99(January 2008):1435–1450.
Churchland, M. M., Yu, B. M., Cunningham, J. P., Sugrue, L. P., Cohen, M. R., Corrado, G. S., Newsome, W. T., Clark, A. M., Hosseini, P., Scott, B. B., Bradley, D. C., Smith, M. A., Kohn, A., Movshon,
J. A., Armstrong, K. M., Moore, T., Chang, S. W., Snyder, L. H., Lisberger, S. G., Priebe, N. J., Finn,
I. M., Ferster, D., Ryu, S. I., Santhanam, G., Sahani, M., and Shenoy, K. V. (2010). Stimulus onset
quenches neural variability: a widespread cortical phenomenon. Nature neuroscience, 13(3):369–378.
Churchland, M. M., Yu, B. M., Sahani, M., and Shenoy, K. V. (2007). Techniques for extracting singletrial activity patterns from large-scale neural recordings. Current opinion in neurobiology, 17(5):609–
618.
Cox, D. R. and Isham, V. (1980). Point Processes. CRC Monographs on Statistics & Applied Probability.
Chapman & Hall/CRC.
Cox, D. R. and Wermuth, N. (2002). On some models for multivariate binary variables parallel in
complexity with the multivariate Gaussian distribution. Biometrika, 89(2):462–469.
VII
Cunningham, J. P., Gilja, V., Ryu, S. I., and Shenoy, K. V. (2009). Methods for estimating neural firing
rates, and their application to brain-machine interfaces. Neural Networks, 22(9):1235–1246.
Cunningham, J. P., Shenoy, K. V., and Sahani, M. (2008). Fast Gaussian process methods for point process intensity estimation. In Proceedings of the 25th international conference on Machine learning,
ICML ’08, pages 192–199, New York, NY, USA. ACM.
Czanner, G., Eden, U. T., Wirth, S., Yanike, M., Suzuki, W. A., and Brown, E. N. (2008). Analysis
of between-trial and within-trial neural spiking dynamics. Journal of neurophysiology, 99(5):2672–
2693.
Daley, D. J. and Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes, Volume 1 (2nd
ed.). Springer, New York.
Dayan, P. and Abbott, L. F. (2005). Theoretical neuroscience: computational and mathematical modeling of neural systems. MIT Press.
DeCharms, R. C. and Zador, A. (2000). Neural Representation and the Cortical Code. Annual Review
of Neuroscience, 23:613–647.
Dimatteo, I., Genovese, C. R., and Kass, R. E. (2001). Bayesian curve-fitting with free-knot splines.
Biometrika, 88(4):1055–1071.
Dodla, R. and Wilson, C. J. (2010). Quantification of clustering in joint interspike interval scattergrams
of spike trains. Biophysical Journal, 98(11):2535–2543.
Durstewitz, D., Vittoz, N. M., Floresco, S. B., and Seamans, J. K. (2010). Abrupt transitions between
prefrontal neural ensemble states accompany behavioral transitions during rule learning. Neuron,
66(3):438–448.
Efron, B. (2004). The Estimation of Prediction Error. Journal of the American Statistical Association,
99(467):619–632.
Endres, D., Oram, M., Schindelin, J., and Foldiak, P. (2008). Bayesian binning beats approximate
alternatives: estimating peri-stimulus time histograms. In Platt, J. C., Koller, D., Singer, Y., and
Roweis, S., editors, Advances in Neural Information Processing Systems 20. MIT Press.
Escola, S., Fontanini, A., Katz, D., and Paninski, L. (2011). Hidden markov models for the stimulusresponse relationships of multistate neural systems. Neural computation, 23(2006):1071–1132.
Faure, P., Kaplan, D., and Korn, H. (2000). Synaptic efficacy and the transmission of complex firing
patterns between neurons. Journal of neurophysiology, 84(6):3010–3025.
Fellous, J.-M., Tiesinga, P. H. E., Thomas, P. J., and Sejnowski, T. J. (2004). Discovering spike patterns
in neuronal responses. The Journal of neuroscience, 24(12):2989–3001.
Feng, J. (2003). Computational Neuroscience: A Comprehensive Approach. CRC Press.
VIII
Fitzurka, M. a. and Tam, D. C. (1999). A joint interspike interval difference stochastic spike train analysis: detecting local trends in the temporal firing patterns of single neurons. Biological cybernetics,
80:309–326.
Fujii, H., Ito, H., Aihara, K., Ichinose, N., and Tsukada, M. (1996). Dynamical cell assembly hypothesis
- Theoretical possibility of spatio-temporal coding in the cortex. Neural Networks, 9(8):1303–1350.
Georgopoulos, A. P., Kettner, R. E., and Schwartz, A. B. (1986). Neuronal population coding of movement direction. Science, 233:1416–1419.
Gerstein, G. L., Bedenbaugh, P., and Aertsen, A. M. H. J. (1989). Neuronal assemblies. IEEE Transactions on Biomedical Engineering, 36(1):4–14.
Gerstein, G. L. and Kiang, N. Y. (1960). An Approach to the Quantitative Analysis of Electrophysiological Data from Single Neurons. Biophys. J., 1(1):15–28.
Gerstner, W. and Kistler, W. K. (2002). Spiking Neuron Models. Cambridge University Press.
Harris, K. D. (2005). Neural signatures of cell assembly organization. Nature reviews. Neuroscience,
6(May):399–407.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning, volume 27.
Springer.
Hazelton, M. L. (2003). Variable kernel density estimation. Australian & New Zealand Journal of
Statistics, 45(3):271–284.
Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. Wiley, New York,
new ed edition.
Holt, G. R., Softky, W. R., Koch, C., and Douglas, R. J. (1996). Comparison of discharge variability in
vitro and in vivo in cat visual cortex neurons. Journal of neurophysiology, 75(5):1806–1814.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the United States of America,
79(April):2554–2558.
Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376:33–36.
Horwitz, G. D. and Newsome, W. T. (2001). Target selection for saccadic eye movements: prelude
activity in the superior colliculus during a direction-discrimination task. Journal of neurophysiology,
86(5):2543–2558.
Hubel, D. H. and Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture
in the cat’s visual cortex. The Journal of physiology, 160:106–154.
Hyman, J. M., Ma, L., Balaguer-Ballester, E., Durstewitz, D., and Seamans, J. K. (2012). Contextual encoding by ensembles of medial prefrontal cortex neurons. Proc. Natl. Acad. Sci. U.S.A.,
109(13):5086–5091.
IX
Hyman, J. M., Whitman, J., Emberly, E., Woodward, T. S., and Seamans, J. K. (2013). Action and
outcome activity state patterns in the anterior cingulate cortex. Cerebral Cortex, 23(June):1257–1268.
Johnson, D. H. (1996). Point process models of single-neuron discharges. Journal of computational
neuroscience, 3(4):275–299.
Jones, L. M., Fontanini, A., Sadacca, B. F., Miller, P., and Katz, D. B. (2007). Natural stimuli evoke
dynamic sequences of states in sensory cortical ensembles. Proceedings of the National Academy of
Sciences of the United States of America, 104:18772–18777.
Kass, R. E., Ventura, V., and Brown, E. N. (2005). Statistical issues in the analysis of neuronal data.
Journal of neurophysiology, 94(1):8–25.
Kaufman, C. G., Ventura, V., and Kass, R. E. (2005). Spline-based non-parametric regression for
periodic functions and its application to directional tuning of neurons. Statistics in Medicine,
24(14):2255–2265.
Kayser, C., Montemurro, M. A., Logothetis, N. K., and Panzeri, S. (2009). Spike-Phase Coding Boosts
and Stabilizes Information Carried by Spatial and Temporal Spike Patterns. Neuron, 61:597–608.
Krzanowski, W. J. (2000). Principles of multivariate analysis. Oxford University Press.
Kumar, A., Rotter, S., and Aertsen, A. (2010). Spiking activity propagation in neuronal networks:
reconciling different perspectives on neural coding. Nature reviews. Neuroscience, 11:615–627.
Lagarias, J. C., Reeds, J. A., Wright, M. H., and Wright, P. E. (1998). Convergence properties of the
nelder-mead simplex method in low dimensions. SIAM Journal of Optimization, 9:112–147.
Lapish, C. C., Durstewitz, D., Chandler, L. J., and Seamans, J. K. (2008). Successful choice behavior
is associated with distinct and coherent network states in anterior cingulate cortex. Proc. Natl. Acad.
Sci. U.S.A., 105(33):11963–11968.
Lehky, S. R. (2010).
22(5):1245–1271.
Decoding Poisson spike trains by Gaussian filtering.
Neural computation,
Loader, C. R. (1999). Bandwidth selection: classical or plug-in? The Annals of Statistics, 27(2):415–
438.
Ma, L., Hyman, J. M., Lindsay, A. J., Phillips, A. G., and Seamans, J. K. (2014). Differences in the
emergent coding properties of cortical and striatal ensembles. Nature neuroscience, 17(June):1100–
1106.
Macke, J. H., Berens, P., Ecker, A. S., Tolias, A. S., and Bethge, M. (2009). Generating spike trains with
specified correlation coefficients. Neural Comput., 21(2):397–423.
Mainen, Z. F. and Sejnowski, T. J. (1995). Reliability of Spike Timing in Neocortical Neurons. Science,
268(5216):1503–1506.
Mante, V., Sussillo, D., Shenoy, K. V., and Newsome, W. T. (2013). Context-dependent computation by
recurrent dynamics in prefrontal cortex. Nature, 503(7474):78–84.
X
Mazor, O. and Laurent, G. (2005). Transient dynamics versus fixed points in odor representations by
locust antennal lobe projection neurons. Neuron, 48:661–673.
Mazurek, M. E. and Shadlen, M. N. (2002). Limits to the temporal fidelity of cortical spike rate signals.
Nature neuroscience, 5:463–471.
Nawrot, M., Aertsen, A., and Rotter, S. (1999). Single-trial estimation of neuronal firing rates: from
single-neuron spike trains to population activity. J Neurosci Methods, 94(1):81–92.
Nawrot, M. P., Boucsein, C., Rodriguez Molina, V., Riehle, A., Aertsen, A., and Rotter, S. (2008).
Measurement of variability dynamics in cortical spike trains. Journal of Neuroscience Methods,
169(2):374–390.
Nicolelis, M., Baccala, L., Lin, R. C., and Chapin, J. K. (1995). Sensorimotor encoding by synchronous
neural ensemble activity at multiple levels of the somatosensory system. Science, 268(June):1353–
1358.
Nicolelis, M. A., Ghazanfar, A. A., Stambaugh, C. R., Oliveira, L. M., Laubach, M., Chapin, J. K.,
Nelson, R. J., and Kaas, J. H. (1998). Simultaneous encoding of tactile information by three primate
cortical areas. Nature neuroscience, 1:621–630.
Olson, C. R., Gettner, S. N., and Kass, R. E. (2000). Neuronal activity in macaque supplementary eye
field during planning of saccades in response to pattern and spatial cues. Journal of Neurophysiology,
84:1369.
Oram, M. W., Földiák, P., Perrett, D. I., and Sengpiel, F. (1998). The ’ideal homunculus’: Decoding
neural population signals. Trends in Neurosciences, 21:259–265.
Panzeri, S., Brunel, N., Logothetis, N. K., and Kayser, C. (2010). Sensory neural codes using multiplexed temporal scales. Trends in Neurosciences, 33(3):111–120.
Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3):1065–1076.
Perkel, D. H., Gerstein, G. L., and Moore, G. P. (1967). Neuronal Spike Trains and Stochastic Point
Processes. Biophysical Journal, 7(4):391–418.
Peyrache, A., Khamassi, M., Benchenane, K., Wiener, S. I., and Battaglia, F. P. (2009). Replay of
rule-learning related neural patterns in the prefrontal cortex during sleep. Nature Publishing Group,
12(7):919–926.
Ponce-Alvarez, A., Kilavik, B. E., and Riehle, A. (2010). Comparison of local measures of spike time
irregularity and relating variability to firing rate in motor cortical neurons. Journal of Computational
Neuroscience, 29:351–365.
Pouget, A., Dayan, P., Zemel, R., and House, A. (2000). Information Processing with Population Codes.
Nature Reviews Neuroscience, 1(2):125–132.
XI
Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., and Fried, I. (2005). Invariant visual representation
by single neurons in the human brain. Nature, 435(7045):1102–1107.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition.
Raman, B., Joseph, J., Tang, J., and Stopfer, M. (2010). Temporally Diverse Firing Patterns in Olfactory
Receptor Neurons Underlie Spatiotemporal Neural Codes for Odors. The Journal of Neuroscience,
30(6):1994–2006.
Reinagel, P. and Reid, R. C. (2002). Precise firing events are conserved across neurons. The Journal of
neuroscience, 22(16):6837–6841.
Riehle, A., Grün, S., Diesmann, M., and Aertsen, A. (1997). Spike synchronization and rate modulation
differentially involved in motor cortical function. Science, 278:1950–1953.
Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., and Bialek, W. (1997). Spikes: Exploring the
Neural Code. MIT Press, Cambridge, MA, 1st edition.
Rigotti, M., Barak, O., Warden, M. R., Wang, X.-J., Daw, N. D., Miller, E. K., and Fusi, S.
(2013). The importance of mixed selectivity in complex cognitive tasks. Nature, advance online
publication(7451):585–590.
Rosenblatt, M. (1956). Remarks on Some Nonparametric Estimates of a Density Function. The Annals
of Mathematical Statistics, 27:832–837.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian
Journal of Statistics, 9(2):65–78.
Sain, S. (1994). Adaptive Kernel Density Estimation.
Sain, S. R. (2002). Multivariate locally adaptive density estimation. Computational Statistics and Data
Analysis, 39:165–186.
Sain, S. R. and Scott, D. W. (2002). Zero-Bias Locally Adaptive Density Estimators. Scandinavian
Journal of Statistics, 29:441–460.
Sakurai, Y. (1996). Population coding by cell assemblies - what it really is in the brain. Neuroscience
Research, 26:1 – 16.
Schreiner, C. E., Read, H. L., and Sutter, M. L. (2000). Modular Organization of Frequency Integration
in Primary Auditory Cortex. Annual Review of Neuroscience, 23:501–529.
Scott, D. W. and Terrell, G. R. (1987). Biased and Unbiased Cross-Validation in Density Estimation.
Journal of the American Statistical Association, 82(400):1131–1146.
Segundo, J. P., Sugihara, G., Dixon, P., Stiber, M., and Bersier, L. F. (1998). The spike trains of
inhibited pacemaker neurons seen through the magnifying glass of nonlinear analyses. Neuroscience,
87(4):741–766.
XII
Seidemann, E., Meilijson, I., Abeles, M., Bergman, H., and Vaadia, E. (1996). Simultaneously recorded
single units in the frontal cortex go through sequences of discrete and stable states in monkeys performing a delayed localization task. The Journal of neuroscience, 16(2):752–768.
Shadlen, M. N. and Newsome, W. T. (1995). Is there a signal in the noise? Current opinion in neurobiology, 5:248–250.
Shadlen, M. N. and Newsome, W. T. (1998). The variable discharge of cortical neurons: implications for
connectivity, computation, and information coding. The Journal of neuroscience, 18(10):3870–3896.
Sheather, S. J. and Jones, M. C. (1991). A Reliable Data-Based Bandwidth Selection Method for Kernel
Density Estimation. Journal of the Royal Statistical Society. Series B (Methodological), 53(3):683–
690.
Shimazaki, H. and Shinomoto, S. (2010). Kernel bandwidth optimization in spike rate estimation. J
Comput Neurosci, 29(1-2):171–182.
Shinomoto, S., Kim, H., Shimokawa, T., Matsuno, N., Funahashi, S., Shima, K., Fujita, I., Tamura,
H., Doi, T., Kawano, K., Inaba, N., Fukushima, K., Kurkin, S., Kurata, K., Taira, M., Tsutsui, K. I.,
Komatsu, H., Ogawa, T., Koida, K., Tanji, J., and Toyama, K. (2009). Relating neuronal firing patterns
to functional differentiation of cerebral cortex. PLoS Computational Biology, 5(7).
Shinomoto, S., Miyazaki, Y., Tamura, H., and Fujita, I. (2005). Regional and laminar differences in in
vivo firing patterns of primate cortical neurons. Journal of neurophysiology, 94(March 2005):567–
575.
Shinomoto, S., Shima, K., and Tanji, J. (2002). New classification scheme of cortical sites with the
neuronal spiking characteristics. Neural Networks, 15:1165–1169.
Shinomoto, S., Shima, K., and Tanji, J. (2003). Differences in spiking patterns among cortical neurons.
Neural computation, 15:2823–2842.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall/CRC
Monographs on Statistics and Applied Probability. Chapman and Hall/CRC, first edition.
Simonoff, J. (1996). Smoothing methods in statistics. Springer, New York.
Softky, W. R. (1995). Simple codes versus efficient codes. Current Opinion in Neurobiology, 5:239–250.
Softky, W. R. and Koch, C. (1993). The highly irregular firing of cortical cells is inconsistent with
temporal integration of random EPSPs. The Journal of Neuroscience, 13:334–350.
Song, D., Chan, R. H. M., Marmarelis, V. Z., Hampson, R. E., Deadwyler, S. a., and Berger, T. W.
(2009). Nonlinear modeling of neural population dynamics for hippocampal prostheses. Neural
Networks, 22(9):1340–1351.
Stokes, M. G., Kusunoki, M., Sigala, N., Nili, H., Gaffan, D., and Duncan, J. (2013). Dynamic Coding
for Cognitive Control in Prefrontal Cortex. Neuron, 78(2):364–375.
XIII
Stopfer, M., Jayaraman, V., and Laurent, G. (2003). Intensity versus identity coding in an olfactory
system. Neuron, 39:991–1004.
Szucs, A., Pinto, R. D., Rabinovich, M. I., Abarbanel, H. D. I., and Selverston, A. I. (2003). Synaptic
modulation of the interspike interval signatures of bursting pyloric neurons. Journal of neurophysiology, 89:1363–1377.
Taylor, C. C. (1989). Bootstrap choice of the smoothing parameter in Kernel density estimation.
Biometrika, 76(4):705–712.
Terrell, G. R. and Scott, D. W. (1992). Variable Kernel Density Estimation. The Annals of Statistics,
20(3):1236–1265.
Tovee, M. J., Rolls, E. T., Treves, A., and Bellis, R. P. (1993). Information encoding and the responses
of single neurons in the primate temporal visual cortex. Journal of neurophysiology, 70(2):640–654.
Tran, L. T. (2010). Variable-Kernel Density Estimates for Time Series. The Canadian Journal of
Statistics, 19(4):371–387.
Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P., and Brown, E. N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic
covariate effects. Journal of neurophysiology, 93(2):1074–1089.
Tuckwell, H. C. (1988). Introduction to theoretical neurobiology. Cambridge studies in mathematical
biology, 8. Cambridge University Press.
Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.
Wand, M. P. and Jones, M. C. (1994). Kernel smoothing, volume 60. Crc Press.
Wessberg, J., Stambaugh, C. R., Kralik, J. D., Beck, P. D., Laubach, M., Chapin, J. K., Kim, J., Biggs,
S. J., Srinivasan, M. A., and Nicolelis, M. A. (2000). Real-time prediction of hand trajectory by
ensembles of cortical neurons in primates. Nature, 408(1):361–365.
Wohrer, A., Humphries, M. D., and Machens, C. K. (2013). Population-wide distributions of neural
activity during perceptual decision-making. Progress in Neurobiology, 103:156–193.
Yu, B. M., Afshar, A., Santhanam, G., Ryu, S., Shenoy, K., and Sahani, M. (2006). Extracting dynamical
structure embedded in neural activity. Advances in Neural Information Processing Systems, 18:1545–
1552.
Yuan, K. and Niranjan, M. (2010). Estimating a state-space model from point process observations: a
note on convergence. Neural computation, 22:1993–2001.
Zougab, N., Adjabi, S., and Kokonendji, C. C. (2014). Bayesian estimation of adaptive bandwidth
matrices in multivariate kernel density estimation. Computational Statistics & Data Analysis, 75:28–
38.
XIV