Classification of Neuronal Subtypes in the Striatum and the
Effect of Neuronal Heterogeneity on the Activity Dynamics
Master’s Thesis at CSC
Supervisor: Hjerling-Leffler Jens & Kumar Arvind
Examiner: Lansner Anders
Clustering of single-cell RNA sequencing data is often used to reveal the
states and subtypes of cells. Using this technique, striatal cells were
clustered into subtypes with different clustering algorithms. Previously
known subtypes were confirmed and new subtypes were found, one of them a
third medium spiny neuron subtype.
Building on the observed heterogeneity, the second task of this project
asks whether differences between individual neurons have an impact on the
network dynamics. Clustering spiking activity from a neural network model
gave inconclusive results: both algorithms indicated low heterogeneity.
However, varying the quantity of one subtype between a low and a high
number, and clustering the network activity in each case, gave results
indicating an increase in heterogeneity.
This project presents a list of potential striatal subtypes and gives
reasons to keep paying attention to biologically observed heterogeneity.
Classification of neuronal subtypes in the striatum and the
effect of neuronal heterogeneity on the activity dynamics
Clustering of single-cell RNA sequencing data is often used to show which
states or subtypes cells have. With this technique, striatal cells have
here been clustered into subtypes using different clustering algorithms.
Previously known subtypes have been confirmed and new subtypes have been
found. One of them is a third medium spiny neuron subtype.
As a second task, building on this observed heterogeneity, this project
asks whether differences between individual nerve cells have an impact on
the network dynamics. Clustering spike activity from a neural network
model gave ambiguous results. Both algorithms indicate low heterogeneity,
but by altering the quantity of one subtype and clustering the network
activity in each case, results were found indicating that the
heterogeneity increases with the number of nerve cells of that subtype.
This project presents a list of potential striatal subtypes and gives
reasons to continue paying attention to biologically observed
heterogeneity.
1 Introduction
1.1 The question
1.2 Task 1. Classification of neuronal types in the striatum
1.2.1 Questions
1.2.2 Hypothesis
1.3 Task 2. Effect of neuronal heterogeneity on the activity dynamics of the striatum
1.3.1 Hypothesis
1.4 Objective
1.5 Significance
1.6 Ethical concerns and sustainability
1.7 Societal aspects
1.8 Interaction
2 Background and theory
2.1 Striatum
2.1.1 Basal ganglia and related areas
2.1.2 Network structure of the striatum
2.1.3 Medium spiny neurons
2.1.4 Interneurons
2.1.5 Striosomes and matrisomes
2.2 Single-cell RNA sequencing
2.3 Clustering algorithms
2.3.1 BackSPIN
2.3.2 Geneteams
2.3.3 t-distributed stochastic neighbour embedding
2.3.4 Relative expression index
2.4 Neural modelling
2.4.1 The Izhikevich and the Multi-timescale adaptive threshold model
2.4.2 State-dependent Stochastic Bursting Neuron
2.4.3 Iaf-cond-alpha
2.5 Network activity modes
What are network activity modes?
Spike-Train Communities
3 Experiments
3.1 Clustering single-cell RNA sequencing data
3.1.1 BackSPIN
3.1.2 Geneteams
3.1.3 t-distributed stochastic neighbour embedding
3.2 Neural network simulation
3.2.1 Multi-adaptive threshold neuron model
3.2.2 Spike train clustering
3.2.3 State-dependent Stochastic Bursting Neuron
3.2.4 The network
3.2.5 Simulations
3.2.6 Neuron model behaviour characteristics
4 Results and Analysis
4.1 Striatal subtypes
4.1.1 Medium spiny neuron
4.1.2 Interneurons
4.1.3 Striosome and matrisome
4.1.4 Geneteams results
4.1.5 Geneuron 2.0
4.2 Neural network simulations
4.2.1 Humphries spike train communities
4.2.2 t-distributed stochastic neighbour embedding clustering
5 Conclusions
5.1 Striatal subtypes
5.2 Neural network simulations
6 Future work
6.1 Clustering single-cell RNA sequencing data
6.1.1 Agglomerative clustering
6.1.2 Geneteams improvements
6.1.3 Cell Expression format
6.1.4 Negative Binomial Stochastic Neighbour Embedding
6.2 Network activity
6.2.1 Auditory encoding ISI:s
6.2.2 Community detection algorithm and tSNE comparison
Chapter 1

Introduction
The heterogeneity in the brain is vast, especially when looking at mRNA expression levels, and we are only beginning to discover the hidden subclasses of neurons
and their functions. Researchers have recently clustered neurons into classes and
subclasses using single-cell RNA sequencing (scRNA-seq) expression levels from the
mouse hippocampal CA1 region and somatosensory cortex [50]. ScRNA-seq was
named method of the year in 2013 by Nature Methods [1]. An interesting question for
computational modellers is whether these subclasses have functions that
affect the signalling behaviour of the individual neuron or even of the network as a
whole. By knocking out a single gene, doing immunohistochemistry, or blocking the
corresponding protein, one can begin to draw conclusions about this. ScRNA-seq
data typically comes with meta-data such as the diameter of the cell. This meta-data
is interesting both when deciding upon classes and when constructing a
neural network model based on those classes. A class usually has a single marker or can
be marked by overlapping markers. This enables experimentalists to look at the
electrophysiology and morphology of the class, which are also crucial pieces of information
for classification and neural modelling.
This project consists of two tasks, the first involving classification of
subtypes in a brain area called the striatum using scRNA-seq data. The second task
asks whether heterogeneity among individual neurons matters at the network level,
by analysing spike trains from a neural network. The first task is part of a larger
project at Karolinska Institute and will likely be published during 2016, containing
experimental confirmations. This master's project presents some potentially new
subtypes of neurons in the striatum.
The question

This project aims to answer several questions, listed in the next sections. If one were
to compress this project into a single question it would be: what neuronal subtypes
exist in the striatum, and what modes of network activity do they enable
from a computational perspective?
Task 1. Classification of neuronal types in the striatum
Using clustering algorithms such as affinity propagation and hierarchical clustering
to classify scRNA-seq data has been common, and recently new algorithms have
started to emerge. BackSPIN is one of them and has been shown to have impressive
biological relevance and robustness [50]. There is still room for improvement,
though, and Kenneth Harris (working with Hjerling-Leffler J.) has been developing
an algorithm called Geneteams [13]. This project aims to compare, analyse, and
explore the output of the two algorithms and to see whether clusters for striosome/matrisome
markers or a third medium spiny neuron (MSN) type can be justified. The idea
behind the potential existence of a third MSN subtype came from preliminary clustering
and expression analysis of the scRNA-seq data used in this project, which was
sampled from the mouse striatum and contains 1412 cells in total.
The following questions were posed for task 1:
• Are there clusters which divide the striosome/matrisome markers of the striatum?
• How do the striosome/matrisome markers correlate with other clusters?
• What markers are there for each of the clusters?
• Is there a third MSN subtype?
The following is a list of hypotheses for task 1:
• The striosome/matrisome division can be justified with clustered scRNA-seq data.
• There is a third MSN population in the striatum justified by scRNA-seq data.
• There are markers in the striatum which can tell us about the function of a
particular neuron class.
Task 2. Effect of neuronal heterogeneity on the activity dynamics of the striatum
That there are different subtypes of neurons with different spiking behaviours and
connectivity is well known. Recent experiments show that different
inhibitory interneurons display cell-type-specific activity [30] and may perform
different computations [22, 45]. Despite these observations, it is not clear how this
heterogeneity affects the dynamics of a neuronal network. For instance, if a network
is composed of 10 different neuron types, should it express 10 different activity
modes, with each type performing a different computational role? Previous modelling
work suggests that the impact of a specific neuron type on the network activity
depends on the dynamical state of that activity [36].
Hypothesis

The neuronal clusters observed in striatal scRNA-seq data have different functions
which influence network behaviour, and this can be demonstrated by clustering
spiking data from neural network simulations.
Objective

The overall objective of this project consists of Tasks 1 and 2 as described above.
More specifically, the objectives were the following:
• Cluster striatal scRNA-seq data using the BackSPIN or Geneteams algorithms,
resulting in subtypes.
• Create a neural network based on the subtypes from the clustering algorithms.
• Cluster the spiking data from the neural network simulations and discuss the
heterogeneity of the resulting activity.
Significance

This work is important to neural network modellers since it addresses the never-ending
modelling question: "Is my model detailed enough, or is it too abstract?" The results
from experimentalists often show great heterogeneity, but in modelling, this
heterogeneity is very often assumed to be insignificant or is absorbed into some other
factor. One example is the different currents in a neuron. A very common approach
is to adjust the model parameters to mimic electrophysiological data without modelling
all the existing currents, assuming that a certain current is unimportant and can
therefore be folded into another, more general current. We will then not know
whether the neuron model misses some functionality not captured by the electrophysiological
data. The same goes for networks of neurons. Say there is high biological
heterogeneity in a certain neuron population. If we create a neural network that
behaves quantitatively similarly to the original network but is built from only one
or a few different neuron types (less heterogeneity), can we say they are equivalent? Or did we
lose some function not captured by the quantitative measurements? If evidence
is acquired that the latter is true, then computational modellers will have reason
to build more heterogeneous models and to have greater respect for the consequences
of heterogeneity in neuronal populations. Experimentalists and bioinformaticians
will get clues as to whether the differences in scRNA expression, captured by
clustering algorithms, actually have an effect at the neural network level.
Ethical concerns and sustainability
How many lives have been sacrificed for this project? The answer depends on what
you consider a life with value. Many mice have been sacrificed for the scRNA-seq
data used in this project. How do we justify this? Before asking this, we need to
answer how it is possible that we still eat animals every day. In a utopia I would
like to believe that all lives are valued equally. We value biological experimental
data from animals highly, but not the consciousness that is created from it. Perhaps
we value ourselves so much that the value of animals is reduced below a certain
threshold that enables experiments. Indeed, this kind of data enables research that
is important not only for understanding how the brain functions, but also for
understanding diseases and how to treat patients. As for why we eat animals, it has
nothing to do with disease and saving lives unless the people eating are very poor. Long
before preventing animal research, one should prevent the eating of animals in industrialized
countries. Until then, the show must go on.
This project also contains neural modelling and simulations, which is a step away
from doing animal experiments. In the future, when our empathy regarding eating
animals and animal experiments has developed enough, I think modelling will become
even more important and widely used.
Societal aspects
Why should society care about this research? Society consists of individuals with a
consciousness created by the brain, and understanding this involves
understanding our existence and what we are, where we came from, and how we
evolved. Society also contains a large group of people who do not agree that consciousness
is created by the brain. Knowing what types of neurons exist in the brain,
and how and why neural heterogeneity matters, is important for society even so.
It becomes important to most people when they
realize that this type of research is fundamental to understanding diseases and creating
treatments for patients with brain-related diseases.
Interaction

Discussions about neurobiology, molecular neuroscience, and clustering algorithms
were held with Jens Hjerling-Leffler and members of his group, Carolina Bengtsson
Gonzales, Hermany Mungubas, and Ana B. Muñoz-Manchado, as well as his colleagues
Kenneth Harris, Sten Linnarsson, Amit Zeisel, and Hannah Hochgerner. Computational
neuroscience discussions regarding neural network models, simulations,
and dynamical-systems analysis were held with Arvind Kumar.
Chapter 2
Background and theory
Striatum

The striatum is the primary input to the basal ganglia (BG) and receives excitatory
signals from the cerebral cortex and the thalamus [23]. The BG are a group of
subcortical nuclei that are important for action selection and are involved in motor
learning, planning, execution, decision making, and reward-related behaviour. In
order to perform all of these tasks with the precision of, for example, mice
or humans, the brain needs to consider all sensory information and motor alternatives
and select the appropriate one. The striatum is considered a hub for
integrating all this information into the rest of the BG [32].
Basal ganglia and related areas
The basal ganglia are a group of subcortical nuclei that include the striatum, the substantia
nigra pars compacta (SNc) and reticulata (SNr), the globus pallidus external
(GPe) and internal (GPi) segments, and the subthalamic nucleus (STN). The connections
between these nuclei can be seen in figure 2.1. Figure 2.2 is a more anatomically
realistic image of the striatum in the mouse brain, but with simplified connectivity.
One often speaks of the direct and indirect pathways of the basal ganglia. From the
striatum to the thalamus one can take two paths, not counting the one through SNr
(SNr is often ignored in this context). The direct pathway goes through GPi, and
the indirect pathway goes through GPe, STN, and then GPi [31]. For understanding
the neuronal subtypes of the striatum, these pathways are important to have in mind.
The striatum receives massive amounts of input from the cortex, which is why it
is important to understand how the cortex projects to the striatum through the
cortico-striatal pathway [49]. The main part of the cortex is the neocortex, which
is divided into six layers (I-VI) of neurons of different types and connectivity [18].
Layers II-VI all project to the striatum, but Layer V has the densest connections
[33]. The striatum and cortex are similar in that both have a small but
heterogeneous population of interneurons [34] projecting onto a larger group of
principal cells. The interneurons are similar across the two structures but not identical [39].

Figure 2.1. Connectivity within the BG and some external connections. The parts
marked with green/blue are considered part of the BG. Glutamatergic connections are
indicated with "+" and GABAergic with "-". Dopaminergic connections are indicated
with "D1,+" meaning excitatory and "D2,-" meaning inhibitory.
Thalamus

The thalamus is located between the cortex and the midbrain. It functions as a
relay for sensory and motor signals but is also involved in consciousness, sleep, and
alertness [38]. It sends glutamatergic axons to the striatum but also receives
output from the BG through GPi and SNr [31].
Pedunculopontine nucleus
The pedunculopontine nucleus (PPN) is located in the brainstem and connects to
the BG network, although it is rarely mentioned in BG literature. It is divided into two
areas, one containing cholinergic neurons and one containing glutamatergic neurons.
This area is heavily involved in the reticular activating system and was recently found
to be involved in cholinergic synaptic transmission in the striatum. Hence, a
component connected to the striatum is connected to the system responsible
for wakefulness and alertness [11, 46, 39], which is good to have in mind when
considering the overall function of the striatum. The striatum is indeed bombarded
with inputs from all over the brain [32].

Figure 2.2. A more anatomically realistic illustration of the striatum in the mouse brain,
with simplified connectivity. Image taken from reference: [21].
Amygdala

The amygdala is situated within the temporal lobe and is commonly associated with
fear-related reactions. That is because it has a main role in emotional reactions,
but it is also a major component in memory and decision-making [2]. It sends
glutamatergic axons to the striatum [47].
Hippocampus

The hippocampus lies in the temporal lobe underneath the cortex and is a major
component in short- and long-term memory and spatial navigation [51]. It has
glutamatergic axons targeting the striatum [47].
Substantia nigra pars compacta
The substantia nigra pars compacta (SNc) is heavily involved in the modulation of
motor activity, as observed in animals with lesions in the SNc [7]. The SNc is believed
to be involved in producing learned responses to stimuli. The firing frequency of
the dopaminergic neurons in the SNc is low (0.5-7.0 Hz) [24].
Globus pallidus
The globus pallidus (GP) is often divided into an internal (GPi) and an external (GPe)
part. GPi and GPe are both tonically active, but GPe inhibits GPi and the subthalamic
nucleus, and GPi continuously inhibits the thalamus. If GPi is transiently inhibited
by the striatum, the corresponding transient activity is
transferred to the thalamus [31].
Basal ganglia diseases and conditions
The prefrontal cortex (PFC) and the striatum are strongly implicated in Parkinson's
(PD) and Huntington's disease (HD), but also in non-movement-related conditions
such as schizophrenia and autism [3, 4]. In PD, degeneration occurs in the dopaminergic
connections from the SNc to the striatum, which is why many PD patients
are treated with L-dopa. HD involves degeneration of the
projections from the striatum to GPe [31].
Network structure of the striatum
The striatum consists of 90-95% MSNs (depending on the species), which are GABAergic
and thus act as inhibitory neurons. The rest are interneurons of different classes
and can be GABAergic or cholinergic. The internal connectivity of the striatum
can be seen in figure 2.3. Observe that the cholinergic interneuron (Chat) to MSN
connections interact with dopaminergic synapses from the SNc. Figure 2.3
shows the internal connections of the striatum based on the following references:
[35, 6, 15, 39, 47]. Note that this project does not cover glial cells, epithelial cells,
or other non-neuronal cell types that can be found in the scRNA-seq dataset used
in this project.
Figure 2.4 aims to show the heterogeneity of cell types in the striatum, but also that
many cell types, although different, have similar firing properties, such as low-threshold
spiking (LTS) and fast-spiking interneurons (FSIs). Observe that there are some
more Htr3a-co-expressing populations (for example Th), as shown in previous studies.

Figure 2.3. Connectivity within the striatum and some external connections. Oval
objects are brain structures and rectangles are neuron subtypes. Connection arrows
with no target mean that the connection targets the striatum, but information about the exact
subtypes targeted was not known/found, inconsistent, or not relevant for this project.
Medium spiny neurons
There are two generally accepted types of MSNs, marked by Drd1a (MSN1) and Drd2
(MSN2) [39]. The division into these subtypes is well supported because of the clear
markers (see table 2.1) but also because of the difference in functionality: MSN1 is
involved in the direct pathway whereas MSN2 is involved in the indirect pathway.
Some support exists for a third population of MSNs which contains both D1 and D2
receptors (MSN-D1/D2) [29, 47, 10]. This is an ongoing debate and one of the things
that will be discussed in this report. Table 2.1 shows some of the markers for MSNs
and their subtypes. Commonly used MSN markers are Bcl11b and Gpr88 [27]. The
firing frequency of striatal medium spiny neurons lies between 0.2-20 Hz [49].
Interneurons

The heterogeneity of interneurons in the striatum is vast, even though they make up
only 5% of the striatal neurons [42]. Lhx6 and Ncald are two interneuron markers
[41]. In the following subsections, more or less hypothetical interneuron subtypes will
be presented. These subtypes will serve as a basis for analysing the subtypes found
in the results and analysis section. Table 2.2 shows a list of interneuron subtype
markers. Researchers have been asking whether or not Vip- and Cck-expressing
striatal cells mark two subtypes, so far without obtaining any conclusive
evidence for either case [42].

Figure 2.4. Different cell types in the striatum, including hypothetical types. The
first column of arrows indicates hierarchy. The second column of arrows indicates
relations to functionality-based subtypes or other reported subtypes.

Table 2.1. Subtype markers for MSNs.
MSN (general): Bcl11b, Gpr88
MSN2: Drd2, Adora2a

Table 2.2. Subtype markers for striatal interneurons.
Vip: vasoactive intestinal polypeptide
LTS: Sst, Npy, Nos1, Chodl
Pvalb+ interneurons
Parvalbumin (Pvalb) expressing interneurons, also known as fast-spiking interneurons
(FSIs) and sometimes labelled PV instead of Pvalb [20], can spike at
frequencies around 200 Hz and in some instances over 400 Hz. In 2010, Pvalb interneurons were
the only striatal neurons that had been observed morphologically to have gap junctions.
They:
1) do not fire spontaneously,
2) have a low input resistance of 50-150 MΩ,
3) are similar to the FSIs reported in the hippocampus and cortex, and
4) cannot sustain repetitive firing at low frequencies.
There may be two subtypes of Pvalb interneurons, distinguished by one
group that fires continuously and another group that fires in a stuttering manner;
this might, however, be two states of one subtype. The functional role of FSIs in the
striatum is to exert feed-forward inhibition onto the MSNs for spike-time control
[42]. They also connect to other FSIs, but only sparsely to LTS interneurons and not at all to
cholinergic interneurons [41].
Calb2+ interneurons
Calretinin (Calb2) expressing interneurons, sometimes called CR instead of Calb2,
form a small group of interneurons; as of 2010, no recordings had yet been made
from this subtype. About 0.5% of the striatal neurons in the rat are Calb2+. This
subtype is much more common in primates than in rodents [42].
Npy+ LTS interneurons
Neuropeptide-Y (Npy), nitric oxide synthase (Nos), and somatostatin (Sst) expressing
interneurons have previously been clustered into one group, with the potential
for more subtypes [42], as shown by reference [27]. This group is equivalent to the
low-threshold spiking, persistent depolarizing plateau potential interneurons (PLTS)
[42]. A large portion of these cells exhibits tonic spontaneous activity in mice but not
in rats [39].
Npy+ NGF interneurons
Neuropeptide-Y expressing (Npy+) neurogliaform (NGF) cells are another small interneuron
population, and information about these smaller subtypes is hard to find. NGF
cells have also been found among the Htr3a-expressing interneurons [16]. This cell type
does not seem to exist in rats, but it definitely does in mice [39].
Th+ interneurons
Tyrosine hydroxylase (Th) expressing interneurons have been found to comprise four
electrophysiologically different subtypes. All type IV neurons have an LTS
component. None of the types are similar to MSNs, SNc dopaminergic neurons, FSIs, or
cholinergic interneurons [15].
Htr3a+ interneurons
This is a large group, of which around 20% co-express Pvalb. Two distinct subtypes
not overlapping with Pvalb or Npy/Sst/Nos1 have been identified: a late-spiking
(LS) Htr3a-NGF population that is Npy-negative, and a larger group with LTS-like
activity even though they are Npy/Sst/Nos1-negative. The Htr3a-LTS interneurons
differ from other LTS interneurons in that they respond to nicotine administration [27]. It would
be interesting to analyse the differences between the Npy-NGF/LTS and the Htr3a-NGF/LTS
populations.
Chat+ interneurons
Cholinergic interneurons (Chat) are not similar to the other interneurons in the striatum.
Instead of GABAergic synapses they form cholinergic synapses with MSNs and
other interneurons. Axons from SNc neurons connect to this interneuron type, but axons
from GABAergic interneurons in the striatum have also been reported. They make up
0.5-1% of the striatal interneurons and are tonically active [39].

Figure 2.5. Example of an scRNA-seq pipeline. Image taken from reference: [50].
Vip/Cck+ interneurons
Vasoactive intestinal polypeptide (Vip) and cholecystokinin (Cck) mark two potential
subtypes that are hard to find and low in numbers [42]. The same applies to finding
information about these subtypes in the literature.
Striosomes and matrisomes
As mentioned above, MSNs can be divided into two groups, one contributing
to the indirect pathway (Drd2) and the other to the direct pathway (Drd1a). But
they can also be divided into striosomes and matrisomes, also called patch and matrix.
The idea is that striosomes and matrisomes have different functions in the striatum,
although it is not entirely understood how. The matrix is part of the sensory-motor
and associative circuits, while the striosome receives input from the limbic cortex and
projects to the SNc. More specifically, it has been suggested that striosomes enable the use
of limbic information in sensory-motor and associative behaviour [8].
Single-cell RNA sequencing
ScRNA-seq is a method that can capture the transcriptional state of a single cell.
Recall that in a cell, DNA is transcribed to RNA, and RNA tells the cell which
proteins to synthesize, but it is also involved in the catalysis of biological reactions, gene
expression control, and cell communication [37].
Figure 2.5 shows the overall process of gathering and working with scRNA-seq
data. A more detailed description of the method used to obtain the scRNA-seq
dataset used in this project can be found in the upcoming publication of this data,
likely during 2016.
Clustering algorithms
Gene markers have often been the way one defines a group of cells. This is not
optimal, since there are more than 24000 genes that need to be considered, and a
single marker is just one way of defining a group. Another way could be that
two genes are co-expressed in a certain group of cells, or that every cell in a group marked
by gene1 also expresses gene2 except for a certain subset. These kinds of relationships
are hard to find manually, but not for a computer clustering algorithm.
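Such relationships can be expressed as boolean queries over an expression matrix. The sketch below is purely illustrative: the matrix is random, and the gene names and the "expressed above zero counts" threshold are my own assumptions, not values from the thesis dataset.

```python
import numpy as np

# Hypothetical expression matrix: rows = genes, columns = cells.
rng = np.random.default_rng(0)
expr = rng.poisson(0.5, size=(5, 20))          # 5 genes x 20 cells, toy counts
genes = ["Drd1a", "Drd2", "Pvalb", "Sst", "Npy"]

def expressing(gene, threshold=0):
    """Boolean mask of cells expressing `gene` above `threshold` counts."""
    return expr[genes.index(gene)] > threshold

# Cells co-expressing two markers:
both = expressing("Sst") & expressing("Npy")

# Cells marked by gene1 (Sst) that do NOT express gene2 (Npy) -- the
# "exception subset" mentioned in the text:
sst_without_npy = expressing("Sst") & ~expressing("Npy")
```

Queries like these are trivial for a computer but, with tens of thousands of genes, infeasible to enumerate by hand, which is exactly the gap the clustering algorithms below address.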
BackSPIN [50] is an unsupervised divisive clustering algorithm based on the SPIN
algorithm [43]. SPIN stands for Sorting Points Into Neighbourhoods, and the main
difference between SPIN and BackSPIN is that SPIN does not identify clusters,
which is the whole point of BackSPIN. SPIN is used to sort the distance/correlation
matrix and results in a specific ordering of the features. BackSPIN can be described
in three steps.

1. Sort the cells and genes in the expression matrix using the SPIN algorithm
on the unsorted distance matrix, resulting in a sorted matrix R.

2. Split the expression matrix using R in the following function,

S(i) = \frac{\sum_{j,h=1}^{i} R_{j,h}}{i^2} + \frac{\sum_{j,h=i+1}^{N} R_{j,h}}{(N-i)^2}

for all i = 1, ..., N, creating a vector of scores S, with I_{split} = \arg\max_i S(i). Now
we have the splitting point I_{split} and want to see whether it is worth making the split.
We calculate S_{left} and S_{right}, each essentially the ratio between the mean correlation
within one sub-matrix and the mean correlation of the whole matrix:

S_{left} = \frac{\sum_{j,h=1}^{I_{split}} R_{j,h} / I_{split}^2}{\sum_{j,h=1}^{N} R_{j,h} / N^2},
\qquad
S_{right} = \frac{\sum_{j,h=I_{split}+1}^{N} R_{j,h} / (N - I_{split})^2}{\sum_{j,h=1}^{N} R_{j,h} / N^2}

These ratios tell us whether the split is good or not: split only if
max(S_{left}, S_{right}) > 1.15, a cut-off value that can be adjusted as preferred. Genes
are also assigned to groups in this step, depending on where they are expressed the most.

3. Recurse on each sub-matrix consisting of the cells and genes assigned to
each half.
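As a rough illustration of step 2, the split score and the ratio test can be sketched as below. This is a simplified toy reimplementation for intuition only, not the reference BackSPIN code; the function names and the example matrix are my own.

```python
import numpy as np

def split_scores(R):
    """Score every candidate split point i of a sorted correlation matrix R."""
    N = R.shape[0]
    scores = []
    for i in range(1, N):                      # split between index i-1 and i
        left = R[:i, :i].sum() / i**2          # mean correlation of left block
        right = R[i:, i:].sum() / (N - i)**2   # mean correlation of right block
        scores.append(left + right)
    return np.array(scores)

def should_split(R, cutoff=1.15):
    """Pick the best split point and apply the mean-correlation ratio test."""
    N = R.shape[0]
    i_split = int(np.argmax(split_scores(R))) + 1
    whole = R.sum() / N**2                     # mean correlation of whole matrix
    s_left = (R[:i_split, :i_split].sum() / i_split**2) / whole
    s_right = (R[i_split:, i_split:].sum() / (N - i_split)**2) / whole
    return i_split, max(s_left, s_right) > cutoff

# Two clearly correlated blocks should be split apart between them:
R = np.full((6, 6), 0.1)
R[:3, :3] = 0.9
R[3:, 3:] = 0.9
i_split, do_split = should_split(R)
```

With two well-separated blocks the ratio is far above the 1.15 cut-off, so the recursion of step 3 would descend into each half.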
Geneteams is a semi-automatic algorithm with a new way of selecting candidate
features as input to divisive hierarchical clustering. "Team scores" are defined as
Xc = xc · w1 and Yc = xc · w2 where xc is the expression of one cell and w a
non-zero vector that tells each cell to what degree it belongs to either team 1 (w1 )
or 2 (w2 ). To measure how good well the teams are at dividing the cells into two
groups the following function is used:
f (X, Y ) =
(X + 0.5)2 + (Y + 0.5)2
(1 + X + Y )2
To account for the fact that this function can be maximized either by increasing
w or by maximizing the number of cells in one class, it is complemented by two
penalty terms that can be found in the original article [13]. The enormous search
space of this problem is managed by selecting a subset of 100
genes. This is done by using the fact that scRNA-seq data is negative binomially
distributed, and if a gene strays from this distribution, it probably does so because
it is describing a subset of cells. Genes that are expressed only at low levels are
also sorted out. As mentioned above, the algorithm is semi-automatic in its current
form. This can be considered a strength and a weakness. The function described
above is used recursively so that a hierarchical structure is created from the cells in
a divisive manner. Before each split is made, the user is required to analyse the
expression and score plots and decide whether or not to split. Users
with high knowledge of biology and genes can benefit from this since they can use
their knowledge to make better split decisions. This unfortunately also makes the
algorithm very time consuming for the user since there can be a lot of splits in large
heterogeneous datasets.
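As a minimal sketch of the team-score criterion above (my own illustration; the penalty terms from [13] are deliberately omitted, so this is not the full criterion used by Geneteams):

```python
def team_score(x, w1, w2):
    """Evaluate f(X, Y) for one cell: x is the cell's expression vector,
    w1 and w2 are the team weight vectors (penalty terms from [13] omitted)."""
    X = sum(a * b for a, b in zip(x, w1))   # team score X_c = x_c . w1
    Y = sum(a * b for a, b in zip(x, w2))   # team score Y_c = x_c . w2
    return ((X + 0.5) ** 2 + (Y + 0.5) ** 2) / (1 + X + Y) ** 2
```

For example, a cell fully on team 1 (X = 1, Y = 0) scores 2.5/4 = 0.625.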
t-distributed stochastic neighbour embedding
The visual clustering technique t-distributed stochastic neighbour embedding (tSNE)
is a machine learning algorithm designed for dimensionality reduction, essentially
used in the same way as principal component analysis (PCA). The tSNE algorithm
is very common among scRNA-seq researchers which I personally can confirm after
attending the Single-Cell Genome conference in Utrecht 2015. It is popular because
of its ease of use and power in separating data points for scatter plotting compared
to other methods such as PCA.
To understand tSNE one must recall the normal (Gaussian) distribution and the
Student's t-distribution, which are the building blocks of tSNE. Let x_1, ..., x_N be all
the single cells (data points), each a vector of gene expressions (dimensions). First tSNE computes the
probabilities p_{ij} in order to find out the similarities between all combinations of
cells x_i and x_j with the following functions:

p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}, \qquad (2.5)

p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}. \qquad (2.6)
In equation 2.5 one may recognize the Gaussian function used in a normal distribution.
The numerator of the similarity p_{j|i} is the probability that x_i comes from the
normal distribution defined by x_j, which takes the place of the mean µ (the horizontal
position of the bell curve), together with a suitable σ_i. The numerator is then
normalized by the sum of the same measurement over all other data points paired
with i. p_{ij} adds both conditional similarities together so that it becomes a
two-sided measurement of similarity. So now we have the similarities and need to
create a d-dimensional map y_1, ..., y_N with y_i ∈ R^d (y_i could for example be a 2D point
in a scatter plot). Since we do not know what these values are in the beginning they
will be initialized by using randomly selected values from a Student's t-distribution.
This is why the tSNE scatter plots can look different when rerunning. For measuring
how similar y_i and y_j are, the following function is used:

q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}. \qquad (2.7)

As one can see, the equation above is similar to 2.5 but based on the
Student's t-distribution. Now, we just have to find a way to give each of the data
points y_i and y_j the correct amount of similarity (measured by 2.7), which p_{ij} can tell
us about. A good way to solve this kind of problem is to use the Kullback-Leibler divergence,

KL(P||Q) = \sum_{i \neq j} p_{ij} \log\left(\frac{p_{ij}}{q_{ij}}\right), \qquad (2.8)

and minimize it using gradient descent [25].
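The quantities above can be computed directly. The sketch below is my own simplification: it uses one fixed σ rather than the perplexity-tuned σ_i of the real algorithm, and omits the gradient-descent loop itself:

```python
import numpy as np

def p_similarities(X, sigma=1.0):
    """Symmetrized high-dimensional similarities p_ij (equation 2.5 plus
    symmetrization), using one fixed sigma instead of perplexity-tuned sigma_i."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    P = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)                    # conditional p_{j|i}
    return (P + P.T) / (2 * n)                           # symmetric p_ij

def kl_cost(P, Q):
    """KL(P||Q) over all pairs i != j; tSNE minimizes this by gradient descent."""
    mask = ~np.eye(P.shape[0], dtype=bool)
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))
```

The resulting P sums to one over all pairs and is symmetric, and the cost is zero when the low-dimensional similarities match P exactly.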
Relative expression index
In order to find cluster markers a relative expression index (REI) was calculated
for each cluster. Observe that this method requires that the scRNA-seq data has
already been clustered. Intuitively, one can use the mean expression within the
cluster and relate it to the mean µ of the rest of the clusters to find genes
that are highly expressed in it. In reference [13], REI is defined as:

REI = \frac{\mu_1 - \mu_2}{\mu_1 + \mu_2 + 1}, \qquad (2.9)
where µ1 is the mean for the cluster you are interested in and µ2 is the mean for the
rest of the clusters. Another possibility is to use the mean and standard deviation σ
to calculate a REI, since σ could tell us how stable the expression is in each cluster.
Here is an example of how this could be implemented:

REI_{alt} = \frac{\mu_1}{\mu_2 \sigma_2}, \qquad (2.10)

where σ_2 is the standard deviation for the rest of the clusters. A more standardized
form is needed so that the scale goes between two fixed values such as 0 and 1, as in
equation 2.9. The fraction between µ_1 and µ_2 tells us about the relative expression,
and if σ_2 is high it means there is varying expression outside of the cluster we
are interested in, which is not good for a marker.
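For concreteness, equation 2.9 in code (a trivial sketch; the function name is mine):

```python
def rei(mu1, mu2):
    """Relative expression index (equation 2.9): mu1 is the mean expression
    in the cluster of interest, mu2 the mean in the remaining clusters.
    The +1 in the denominator keeps the index bounded and avoids division
    by zero for unexpressed genes."""
    return (mu1 - mu2) / (mu1 + mu2 + 1)
```

A gene expressed only in the cluster of interest scores close to 1, a gene expressed only outside it scores negatively, and an unexpressed gene scores 0.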
Neural modelling
The Izhikevich and the Multi-timescale adaptive threshold model
The Izhikevich model is a computationally very efficient neuron model that can
reproduce essentially all firing behaviours that a Hodgkin-Huxley model can, as
shown in figure 2.6.
The main drawback of the Izhikevich model is that it has non-linear dynamics,
which only permit approximate numerical solutions [17].
Equation 2.11 shows the original Multi-timescale adaptive threshold (MAT)
model [19]. The original MAT model could only reproduce 9 out of 20 firing modes.
However, the MAT model with a voltage-dependent firing threshold can produce all 20 firing
modes shown in the Izhikevich description [48]. The MAT model is given by,

\tau_m \frac{dV}{dt} = -V + R I(t), \qquad (2.11)

with a spiking threshold rule given by,

\theta(t) = \sum_{k} H(t - t_k) + \omega, \qquad (2.12)

H(t) = \sum_{j=1}^{L} \alpha_j \exp(-t/\tau_j), \qquad (2.13)

with variables as given in table 2.3.
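A forward-Euler sketch of equations 2.11-2.13 follows. This is my own illustration; the parameter values are arbitrary and are not taken from [19] or [48]:

```python
import math

def mat_neuron(I, dt=0.1, tau_m=5.0, R=50.0, omega=15.0,
               alphas=(30.0, 2.0), taus=(10.0, 200.0)):
    """Simulate the MAT model: a leaky membrane (eq. 2.11) against a
    dynamic threshold theta(t) (eqs. 2.12-2.13). A spike is recorded when
    V crosses theta; the membrane is NOT reset, only the threshold jumps.
    Returns the spike times (ms). Parameter values are illustrative."""
    V, spikes = 0.0, []
    for step, i_ext in enumerate(I):
        t = step * dt
        V += dt / tau_m * (-V + R * i_ext)                 # eq. 2.11
        theta = omega + sum(a * math.exp(-(t - tk) / tj)   # eqs. 2.12-2.13
                            for tk in spikes
                            for a, tj in zip(alphas, taus))
        if V >= theta:
            spikes.append(t)
    return spikes
```

With a suprathreshold constant input the threshold jumps after every spike and slowly decays, giving spike-frequency adaptation; a subthreshold input produces no spikes at all.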
No previous network model of the striatum using the NEST simulator and the
MAT or Izhikevich model is available in the literature. NEST stands for NEural
Simulation Tool and is a computer program for simulating large neural networks
[12]. A striatal model with integrate-and-fire (iaf) neurons was published recently
[5]. Models have been made in GENESIS [44] with 1049 Hodgkin-Huxley type
neurons [9]. The simulation time would most likely be too high with the number of
neurons that this project requires. Choosing integrate-and-fire neurons would be
computationally feasible, but a lot of neuron firing modes would be lost. The MAT
model is a good middle-ground alternative since it is computationally efficient yet
still has high firing complexity.
State-dependent Stochastic Bursting Neuron
The State-dependent Stochastic Bursting Neuron (SSBN) is a new neuron model.
With this neuron it is possible to control the bursting of an integrate and fire neuron
Figure 2.6. Firing modes that the Izhikevich model is able to produce. Image taken
from reference [17].
τ_m : membrane time constant
V : model potential
R : membrane resistance
I(t) : input current
t_k : kth spike time
L : number of threshold time constants
τ_j (j = 1, ..., L) : jth time constants
α_j (j = 1, ..., L) : weights of the jth time constants
ω : resting value
Table 2.3. Variable explanation for equations 2.11, 2.12 and 2.13.
without changing the frequency-intensity curve [36]. The SSBN is originally a leaky
IAF neuron and therefore it has the same base equation as the MAT model equation
(2.11). It does not however have an adaptive threshold. Instead it has a constant
predefined threshold u at which a burst is fired with probability 1/b, with b spikes
in the burst. There also exists a modified SSBN version in which the number of
spikes b (spikes per burst) is set to be a function of the mean input current that a
neuron receives. It is drawn from a binomial distribution b ∼ B(n, p), where n is
the maximum number of spikes per burst and is set to n = 4, and p is the probability
of one spike.
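The firing rule can be sketched as follows (my own illustration; the clamping of b to at least one spike in the modified version is my assumption and is not stated in [36]):

```python
import random

def ssbn_emit(b, rng):
    """SSBN rule: at a threshold crossing, emit a burst of b spikes with
    probability 1/b, else nothing. The expected spike count per crossing
    is b * (1/b) = 1, which is why the F-I curve is unchanged [36]."""
    return b if rng.random() < 1.0 / b else 0

def draw_b(p, rng, n=4):
    """Modified SSBN: spikes per burst drawn from Binomial(n=4, p);
    clamped to >= 1 here (an assumption of this sketch)."""
    return max(1, sum(rng.random() < p for _ in range(n)))
```

Averaged over many threshold crossings, the mean number of spikes emitted per crossing stays at one regardless of b.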
This is a simple integrate-and-fire neuron model (iaf-cond-alpha) with conductance-based alpha-shaped input current [26] and is defined by the following equation:

c \frac{dV}{dt} = -I_{leak} - I_{spike} - I_{syn},
where c is the capacitance, Ileak is the leak current, Ispike is the current for the
neuron spiking mechanism, and Isyn is the synaptic input currents. The leak current
is defined by:

I_{leak} = c \frac{V - V_p}{\tau_p},

where V_p is the resting potential and τ_p is the membrane time constant. I_{spike}
triggers a spike at a certain predefined threshold and is defined as:

I_{spike} = (V_{th} - V_r)\,\delta(V - V_{th})\big|_{V = V_{th}},
where V_th is the firing threshold, V_r is the reset voltage and δ is the Dirac delta
function. The explicit equation for I_syn can be found in reference [26]; it is not
included here, to limit the number of equations and variables in this report. Essentially,
I_syn describes how the inputs from inhibitory, excitatory and external excitatory
neurons are interpreted. In iaf-cond-alpha the synaptic current is conductance-based,
as opposed to the iaf-neuron model on which the SSBN is based. They also have
different frameworks, as one can easily notice by looking at the C++ source code.
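The equations above amount to a standard leaky integrate-and-fire update. A one-step Euler sketch (my own; the sign convention and parameter values are illustrative, and the delta-function spike term is realized as a reset):

```python
def lif_step(V, I_syn, dt=0.1, c=200.0, tau_p=20.0,
             V_p=-70.0, V_th=-54.0, V_r=-70.0):
    """One Euler step of c dV/dt = -I_leak - I_syn with
    I_leak = c (V - V_p) / tau_p. In this sign convention an excitatory
    synaptic input corresponds to a negative I_syn. Returns (V, spiked)."""
    I_leak = c * (V - V_p) / tau_p
    V = V + dt * (-I_leak - I_syn) / c
    if V >= V_th:
        return V_r, True       # the delta term: instantaneous reset to V_r
    return V, False
```

Driven with a strong excitatory input the membrane charges up, crosses threshold and is reset; with no input it simply stays at rest.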
Network activity modes
What are network activity modes?
How can the network activity modes (NAM) be analysed? Should one analyse
from a neuron perspective as in scRNA-seq clustering, or should one cluster based
on time? NAM can be analysed from both of these perspectives but they give us
different answers. If one clusters from a neuron perspective, we basically ask: can
we separate the neurons based on their individual spike trains? And if one clusters
different time steps, we ask: if each neuron is a singer in a choir, can we divide the
song into different parts? We initially considered the neuron perspective, but the
analogy of neurons being singers led me to develop an alternative way of looking
at spiking activity. Why is it that we always look at things? The eye
is probably our most powerful sense, but could we be missing something by not
using our perhaps second most powerful sense, hearing? In principle I wanted
to convert two spike trains from two connected neurons and hear what they sound
like. Direct conversion of spike trains to sound has of course been done before, but I wanted
to convert spike trains into sine waves. By taking the distance between two spikes,
also called the inter-spike interval (ISI), and using that distance to insert a sine
wave scaled with the ISI, we can encode a whole spike train and make each neuron
sing a tone varying with the ISI. Since each spike train is scaled equally, we can hear
harmonies between the tones no matter how the tones are scaled. Scaling is done
both to slow the tones down and to bring them to a suitable (human) hearing level.
This could be viewed as art, but it also has some scientifically interesting aspects.
Are the neurons forming some kind of activity mode or synchrony when they
form a chord?
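A sketch of this sonification idea follows. It is entirely my own illustration; the mapping from ISI to frequency, the scale constant and the sample rate are arbitrary choices:

```python
import math

def spikes_to_tone(spike_times, sample_rate=8000, scale=1.0):
    """Turn a spike train into audio samples: each inter-spike interval
    (ISI, in seconds) becomes a sine segment whose frequency is scale/ISI,
    so short intervals sound as high tones. Returns a list of samples
    in [-1, 1]."""
    samples = []
    for t0, t1 in zip(spike_times, spike_times[1:]):
        isi = t1 - t0
        freq = scale / isi                       # assumed ISI-to-pitch mapping
        n = int(isi * sample_rate)               # segment length in samples
        samples.extend(math.sin(2 * math.pi * freq * k / sample_rate)
                       for k in range(n))
    return samples
```

Two spike trains converted this way and played together would let one hear whether their tones form harmonies, which is the question posed above.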
Spike-Train Communities
The term "Network activity modes" is used to imply that it is the activity from the
neuron perspective that is to be analysed. A different term is preferable since one
can interpret it as activity from a time perspective (instead of neuron). Mark D.
Humphries has done analysis from both time and neuron perspective. The analysis
from the neuron perspective is called "Spike-Train communities", which I find suitable [14]. He has created an algorithm capable of finding groups of similar spike
trains. One of the unique aspects is that it also determines the number of groups
suitable for a certain set of spike trains.
When clustering sets of spike trains, a question that comes up is whether or
not to bin time intervals so that it is possible to sum all spikes in each
interval and use that value instead of 1 and 0 (spike or no spike). There will of
course always be some kind of time interval, but the question is whether or not to
allow multiple spikes in each interval. Humphries' algorithm allows for both,
and the binned version converts each sum to 1 if at least one spike occurred in the
interval. The binned version then calculates the proportion of bins that differ between
each pair of vectors (using the Hamming distance) and puts it in a comparison matrix C.
The binless version uses a Gaussian kernel to create a continuous vector that
represents the spike train. To compare this type of vector he uses the cosine angle
between each pair combination of spike trains to create the comparison matrix C.
This comparison matrix is visualized as a network where each node is a spike
train and each connection between two nodes represents the similarity between the
two spike trains, encoded as the thickness of the connection line.
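The binless comparison matrix can be sketched like this (my own minimal version; Humphries' implementation differs in detail):

```python
import math

def comparison_matrix(vectors):
    """Build the comparison matrix C for the binless variant: entry C[i][j]
    is the cosine similarity between the Gaussian-smoothed spike-train
    vectors i and j."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
```

Identical smoothed trains give similarity 1, non-overlapping ones give 0, independent of their overall firing rate.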
By taking the difference between the observed connection weight for each pair
of nodes and the weight expected under a null model, one gets the
matrix B. The number of positive eigenvalues of this matrix is then used to bound
the number of groups the spike trains are split into. The main differences between Humphries'
algorithm and previous algorithms are that he makes use of all the eigenvectors instead of only
the leading one, that he uses the full comparison matrix instead of applying a filter function,
which saves information, and that instead of splitting in two iteratively, the maximum
number of groups is calculated from the number of positive eigenvalues of B. B is
defined as:
B = C − P,
where C is the comparison matrix and P is a null model created from the links in
C. The maximum number of groups is defined as M = n + 1 where n is the number
of positive eigenvalues in B. To manage group membership the matrix S is used,
and defined as:
S_{ij} = \begin{cases} 1, & \text{if node } i \text{ is in group } j \\ 0, & \text{otherwise} \end{cases}
The goal of the algorithm is to maximize the following equation:
Q = Tr(S^T B S),

where Tr (the trace) is the sum of all diagonal elements, and S^T is the transpose of S.
transpose. The actual splitting is done using the k-means algorithm which needs a
predefined number of clusters K. In this case K is the number between 2 and M
that maximizes Q.
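The group-count and quality measures above can be sketched as follows. This is my own simplified version: the null model is a configuration-style model built from the link weights of C, and the k-means search over K is omitted:

```python
import numpy as np

def modularity_setup(C):
    """Compute B = C - P with a configuration-style null model P built
    from the links in C, and the maximum group count M = n + 1, where n
    is the number of positive eigenvalues of B."""
    k = C.sum(axis=1)
    P = np.outer(k, k) / C.sum()
    B = C - P
    M = int((np.linalg.eigvalsh(B) > 1e-12).sum()) + 1
    return B, M

def quality(S, B):
    """Q = Tr(S^T B S) for a binary membership matrix S."""
    return float(np.trace(S.T @ B @ S))
```

For a comparison matrix with two clear blocks, M comes out as 2, and the membership matrix matching the blocks scores a higher Q than a mismatched one.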
Chapter 3
Clustering single-cell RNA sequencing data
A recently compiled scRNA-seq dataset from the striatum containing 1412 samples
was used to create the clusters, using the BackSPIN and Geneteams algorithms.
The cells were taken from mice at age 20-28 weeks.
A program called Geneuron 2.0 with a graphical user interface was developed
to simplify repetitive plotting and analysis and to enable users with little programming
knowledge to participate in the analysis. Including people with excellent biological
knowledge is a great enrichment of the analysis process. A similar
web-program called Geneuron 1.1 was developed in earlier work for simplifying
analysis of co-expression in multiple genes [50].
In later stages of clustering after main groups already had been found, activity
dependent genes were removed. A gene was considered activity dependent if the
expression increased more than 3-fold during 1 or 6 hours. After this was done,
many of the smaller groups defined by a marker were less clear. The list of activity
dependent genes was taken from [40].
BackSPIN clustered the data hierarchically by splitting each group once per level.
Once the main groups (MSNs/interneuron/others) had been decided the groups
were split again, this time without hierarchy but a specific number of predefined
splits. All BackSPIN clustering was performed by Amit Zesiel, from Sten Linnarsson's
lab at KI, since he had an improved version of the algorithm. The difference from
the currently publicly available version (the one described in the previous chapter) is
not significant, though. My task included analysing the resulting clustering, finding
gene markers for each cluster, integrating the new clusters in Geneuron 2.0 and
comparing with Geneteams and tSNE. Analysis of the clustering included looking
at different cell labels such as diameter or strain to see if there were any differences
between clusters or if a certain cluster contained more of a certain strain than one
could expect by random chance. I will only show the most relevant analyses and
results so as not to overburden the reader.
Geneteams clustering was performed in parallel to give greater legitimacy to the
clusters. The algorithm is semi-automatic and was executed with the following guidelines. Keep splitting while:
1) you get an "L-shaped" plot of the team scores,
2) the division is not based on sex-specific genes (Xist, Tsix etc.),
3) the number of genes in the teams is reasonably small (less than 20),
4) and the cells are not overfitted, which can happen if there are few cells.
Using all 1412 cells, it could take up to a week to perform a complete clustering.
This of course depends on the user and how detailed clustering one desires.
t-distributed stochastic neighbour embedding
Clustering using t-distributed stochastic neighbour embedding (tSNE) was performed, not mainly to give legitimacy to the clusters but to visualize the clustering in different ways. Clustering with tSNE using all genes and cells at once is not
optimal: contamination, activity dependent genes, and quality differences affect
the clustering a lot.
Neural network simulation
Multi-timescale adaptive threshold neuron model
The multi-timescale adaptive threshold (MAT) neuron model was tested but not used in
the end, because the NEST version of the MAT model does not generate the same
behaviour as in the original papers when using type-specific parameters. Hans
Plessner could provide the parameters one should use to reproduce the behaviour
of the original articles. I also tested a MAT model adapted by a student at the
University of Freiburg, Germany, to reproduce the spiking behaviour of MSNs. The
model was not yet adapted to the NEST framework, so more time would be needed
in order to test it for the purposes of this project. Both of these MAT models seem
promising in the near future.
Spike train clustering
Humphries' algorithm [14] was used on the simulation data. I also created a clustering approach consisting of the following steps:
- creating a more suitable data vector for each spike train by binning at a fine resolution (1 ms),
- scaling up the numbers by multiplying by 10000,
Figure 3.1. Illustration of the neural network model that was used. Arrows indicate
glutamatergic connections, and the circle indicates GABAergic connections.
- using a Gaussian kernel with a certain standard deviation to smooth the curve,
- inserting the vector into tSNE for clustering in two dimensions.
This results in a 2D plot representation of each spike train. The neurons were
labelled so that one could see the type and whether or not it clustered separately.
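The preprocessing steps above (binning, scaling, smoothing) can be sketched like this; the 2-D embedding itself would then be produced by any tSNE implementation (this sketch is my own):

```python
import math

def preprocess_spike_train(spike_times_ms, duration_ms, sd=5.0):
    """Bin a spike train at 1 ms resolution, scale the counts by 10000,
    and smooth with a (truncated, normalized) Gaussian kernel of standard
    deviation sd; the result is the vector handed to tSNE."""
    binned = [0.0] * int(duration_ms)
    for t in spike_times_ms:
        binned[int(t)] += 10000.0
    radius = int(3 * sd)
    kernel = [math.exp(-0.5 * (k / sd) ** 2) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    smoothed = [0.0] * len(binned)
    for i in range(len(binned)):
        for j, w in enumerate(kernel):
            src = i + j - radius
            if 0 <= src < len(binned):
                smoothed[i] += w * binned[src]
    return smoothed
```

Because the kernel is normalized, the total mass of a spike is preserved (away from the edges) while its peak is spread out over neighbouring bins.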
State-dependent Stochastic Bursting Neuron
The State-dependent Stochastic Bursting Neuron (SSBN) was integrated into a network model based on the striatum [5]. Since the network was built with IAF neurons, it is quite
general even though I apply it to the striatum. The network was built in NEST
using the iaf-cond-alpha neuron model together with the SSBN.
The network
I used a model developed by Bahuguna et al. [5] as a basis to create a model of
the striatum. The network included four types of neurons: MSN-D1, MSN-D2,
FSI and SSBN, where the SSBN is the new component compared to the network in
Bahuguna et al. All membrane potentials were initialized at a random level between
-80 and -60 mV. Input from the cortex and Poisson noise to each neuron was set to 5000
Hz. Figure 3.1 is an illustration of the neural network model that was used. The
connection probabilities for the network are stated in table 3.1 and the properties
of the neurons in the network are stated in table 3.2. For more parameters, see
the code shared with this project.
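The probabilistic wiring behind table 3.1 can be sketched as follows (my own illustration of the rule; in practice NEST builds these connections internally when given a pairwise-Bernoulli connection rule):

```python
import random

def connect_pairwise_bernoulli(pre, post, p, seed=1):
    """Draw a synapse independently for every (pre, post) pair with
    probability p, skipping self-connections; returns the edge list."""
    rng = random.Random(seed)
    return [(i, j) for i in pre for j in post
            if i != j and rng.random() < p]
```

With p = 1 every allowed pair is connected; with p = 0 no connections are made.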
Table 3.1. The following connection probabilities were used for the network
simulation. Connection direction is indicated by row->column.
Table 3.2. The following properties were used for the neuron models in the simulation: threshold (mV), resetting level (mV), resting membrane potential (mV) and capacitance of membrane (pF), for each neuron type (MSN d1/d2, FSI, SSBN). [Table values not recoverable from the extraction.]
There were two main simulation setups, constructed to show a difference
in clustering when altering the heterogeneity. Setup 1 had a low number of SSBNs
and Setup 2 had a relatively high number of SSBNs, as shown in table 3.3.
Table 3.3. Simulation setups: Setup 1 (20 SSBNs) and Setup 2 (100 SSBNs).
Neuron model behaviour characteristics
The characteristic behaviour defined by a frequency intensity (FI) curve and transfer
function (TF) for each of the neuron types can be seen in figure 3.2. D1 and D2
had the same parameters; they differ in behaviour in the network because of the
different synaptic connection probabilities.
Figure 3.2. Transfer function (y = inhibitory, x = excitatory Poisson frequency input)
and frequency-intensity curve for each of the neuron types: (a, b) FSI, (c, d) D1/D2,
(e, f) SSBN. This input represents the total excitatory and inhibitory input impinging
on an MSN.
Chapter 4
Results and Analysis
Striatal subtypes
Early in this project it was established that there is a clear main hierarchy of
neuronal cells consisting of MSNs, interneurons, and another large group of non-neuronal cells which this project does not focus on. These main types can be seen
in figure 4.1. The non-neuronal cells, called "others" in figure 4.1, are clearly separated from the neuronal cells and consist of, for example, astrocytes, oligodendrocytes,
microglia, cycling cells, endothelial cells and ependymal cells. For analysis of this striatal dataset from a non-neuronal cell perspective, see publications
from the Goncalo lab at the Karolinska Institute. Figure 4.2 shows the expression of all
1412 cells. At the far right in the figure one can see some cells that were initially
labelled as the neuronal subtypes but were removed later because of low quality
(low molecule counts).
Medium spiny neuron
There is a clear division of the medium spiny neurons (MSNs) into two groups MSN1
and MSN2. Apart from this clear division there are two additional possible groups
MSN3 and MSN4. MSN3 is marked by MSN3-gene1 and can be seen in figure
4.4. These cells are a bit smaller and have slightly lower quality compared to the
other MSNs. MSN4 is defined by the co-expression of MSN1 markers and MSN2
markers. There are 696 cells expressing either Drd1a or Penk, and 122 cells express
both. That said, MSN4 was not a clearly defined cluster in any of the clustering
algorithms BS, GT or tSNE. Therefore MSN4 is not included as a separate group.
These markers can be seen in figure 4.3.
Considering the BS and tSNE clustering and the expression of marker genes, I conclude that there are most likely 3 MSN subtypes, as shown in figure 4.5.
Figure 4.1. tSNE plot of all 1412 cells labelled with the general types interneurons,
MSNs, other.
Figure 4.2. Expression plot of all 1412 neurons labelled with the general types interneurons, MSNs, other.
Figure 4.3. Expression plot of MSN subtype marker genes.
Figure 4.4. tSNE plot with MSN marker Gpr88, Drd1a, Penk and MSN3-gene1.
In BackSPIN, interneurons are first of all separated from MSNs and other non-neuronal cell types, which can be seen by the marker Ncald in figure 4.6. At this level
of the hierarchy all interneurons (including Chat cells) are included. After this, the
Sst expressing cells are clustered separately, and the rest of the interneurons can be
described by a large Kit/Htr3a group and the Chat cells.
Chat+ interneurons
Chat cells seem to form one fairly strong cluster. However, note that when visualized
with tSNE, some Chat cells formed a small subgroup. Ntrk1 also marks a subset
of the Chat cells.
Figure 4.5. Hierarchical illustration of MSN subtypes.
Pthlh+ interneurons
A larger group was found in both BS and GT marked by Pthlh and Cox6a2. This
group contains subgroups Pvalb and Pthlh (Pvalb-). There were indications of more
subtypes but no clear markers were found for them.
I find it interesting that Gadd45g is expressed in the NG cells and at the same
time in Pthlh. It is known that a large part of the Htr3a/Kit+ cells are NG cells
[27]. Could Gadd45g be an NGC marker?
Th+ interneurons
Th shows one relatively clear cluster. Gal, Cadps2 and Chrna3 are possible markers
of a subtype within this big Th cluster, but the latter two are possibly just low-expressing genes that would otherwise mark the whole Th cluster. As described
earlier in this report, there are supposedly four types of Th neurons from an electrophysiological viewpoint: two larger and two smaller groups. At least we can assume
that there is heterogeneity within the Th expressing cells. The two large groups
could be explained by the markers Gal, Cadps2 and Chrna3. The smaller groups
could be explained by the Th expressing cells that are "outside" of the main Th
cluster. For example, there are Th+ neurons in the Sst, Pthlh (and Gadd45g) and
Pvalb clusters. 40/332 ([gene1 and gene2]/[gene1 or gene2]) express both Th and
Pthlh. 26/231 cells express both Th and Pvalb, although not many are in the main
Figure 4.6. tSNE plot with interneuron markers Ncald, Chat, Npy, Kit.
Pvalb cluster. 7/274 cells express both Th and Sst. Gal marks a very
interesting subtype of the Th+ cells. In a study [50] using scRNA-seq data from
cortex and hippocampus, an interneuron subtype named Int14 has this marker
expressed relatively clearly, as seen in figure 4.11. Gal stands for galanin and is a
neuropeptide on which plenty of research has been done. One study linked Gal in
the mouse striatum to consummatory behaviours [28].
Sst+ interneurons
Sst and Npy are, as previously shown, often used to mark the same group. Sst
rather than Npy was chosen as the marker for these cells, since there exists a small population
of Npy+ Sst- cells. The expression of Sst can be seen in figure 4.8.
NGF interneurons
There are NGF cells in the Npy+ cells labeled Npy-NGC in figure 4.7. The
Pthlh/Kit/Htr3a+ population has a larger population of NGF cells and ultimately
Figure 4.7. Expression plot with interneuron markers Ncald, Chat, Npy, Kit.
Figure 4.8. Expression plot with interneuron subtype markers Chat, Sst, Pvalb,
Figure 4.9. Expression plot with interneuron subtype markers Crhbp, Gadd45g,
Gal, Cadps2.
Figure 4.10. Hierarchical illustration of interneuron subtypes.
Figure 4.11. ScRNA-seq expression of Gal in cortex/hippocampus. Figure taken
from [50].
Figure 4.12. Image of matrisome genes with striatal scRNA-seq expression heatmap.
one would like a marker for all NGF cells. I believe Gadd45g is a candidate for this.
This gene can be seen in figure 4.9. Note, however, that this is very hypothetical, and
experiments are needed to confirm this claim. All I note is that Gadd45g is expressed
by the Npy-NGC cells and has a population within the Pthlh/Kit/Htr3a+ population. Pnoc
is a gene that marks this small group relatively clearly.
Calb2+ interneurons
There were 14 Calb2 expressing cells, 4 of which were co-expressing Npy and Sst. One
could still call it a subtype, but it is not sorted neatly into any cluster. This could
be because Calb2 is expressed at low levels in general, with a peak of 5 molecules/cell.
Vip+ interneurons
There were few Vip+ cells and they had low expression, except for one cell with 80
molecules/cell. Interestingly, 14/20 Vip cells co-express Th, including the one with
80 molecules/cell.
Cck+ interneurons
Cck has low expression with a peak of 11 molecules/cell and is spread out in many
different clusters. This gene does not seem to mark any subtype.
Striosome and matrisome
The markers for matrisomes and striosomes are shown in figures 4.12 and 4.13, but no
specific group or pattern was detected by manual inspection of BackSPIN clustered
Figure 4.13. Image of striosome genes with striatal scRNA-seq expression heatmap.
(and ordered) data. This strengthens the hypothesis that the patterns observed in
the striatum that are called striosomes and matrisomes are marked by the axons
projecting to the striatum, originating from structures other than the striatum, and not so much
by the striatal neurons themselves.
Geneteams results
The results of the Geneteams algorithm were unfortunately not robust, since it
is user dependent and sometimes seems to suggest gene markers that are widely
expressed but have local variation. Therefore, I decided that the main clustering
would not be made using this algorithm. Instead I used it to confirm subtypes: if
a subtype with a certain marker was found in both BackSPIN and Geneteams,
it would be considered a stronger type than if it was only found in one.
However, none of the subtypes were determined solely by Geneteams.
Geneuron 2.0
During this project, repetitive plotting and analysis of different genes was performed. A lot of time can be spent on generating figures and comparing
different genes with each other using different labels such as clustering, chip-id or
strain. Geneuron is an application with a user-friendly interface, adapted so
that even people without programming skills should be able to generate plots and
perform data analysis. The idea of Geneuron started while working on a similar
scRNA-seq dataset of cortex and hippocampus from the same labs [50], and it is hosted
online. Geneuron 2.0 is quite different and more user friendly,
Figure 4.14. Spiking activity in a network model of the striatum with four types of neurons.
but is built on the same idea, programming language (R) and packages (Shiny).
Geneuron 2.0 will be shared when the dataset is published.
Neural network simulations
The spiking behaviour of the neurons during the simulation can be seen in figure
4.14. In this simulation one can even intuitively see that there is heterogeneity
between and within the neuron groups.
Humphries spike train communities
As seen in figures 4.15 and 4.16, the simulation data demonstrates low heterogeneity. There were 2 groups when the network simulation had only 20 SSBNs
(setup 1), as seen in figure 4.15, and 3 groups when the network simulation
had 100 SSBNs (setup 2), as seen in figure 4.16. The increase in heterogeneity that
one would expect from including more SSBNs is shown by the increase from
2 to 3 groups between setup 1 and 2.
Figure 4.15. Humphries spike train communities using 20 SSBNs (setup 1). Each
variation between black and gray indicates a new cluster.
Figure 4.16. Humphries spike train communities using 100 SSBNs (setup 2). Each
variation between black and gray indicates a new cluster.
t-distributed stochastic neighbour embedding clustering
Clustering the spike trains from the 4 neuron subtypes using tSNE led to discovering 2 or 3 clusters, as seen in figure 4.17, depending on the kernel used before
clustering, shown in figure 4.18. It seems that the higher the standard deviation (SD)
used in the Gaussian kernel, the easier it becomes to separate the clusters. Using
high SDs does not really make sense, though, since the result of the kernel application becomes less and less like the original data and contains less information.
Figure 4.17 shows that the difference in heterogeneity between setup 1 and 2 is not
very big. Nevertheless, when SD=5 the FSIs are more independent in (d) compared
to (c).
Figure 4.17. tSNE plots with 20 SSBNs in the left column (a: SD=1, c: SD=5,
e: SD=10) and 100 SSBNs in the right column (b: SD=1, d: SD=5, f: SD=10).
Each row has a different standard deviation (SD) used in the Gaussian filter.
Figure 4.18. Gaussian-filtered spike train of an SSBN. The left column (a: SD=1,
c: SD=5, e: SD=10) contains an SSBN from the simulation with 20 SSBNs; the
right column (b: SD=1, d: SD=5, f: SD=10) contains an SSBN from the simulation
with 100 SSBNs. Each row has a different standard deviation (SD).
Chapter 5
Striatal subtypes
In general, I note that there were fewer specific types than I had expected, but there
was no lack of heterogeneity. The subtypes one would wish for were simply not
clear enough to be declared as subtypes. There are certainly markers that mark a
subset of cells within a certain subtype, as mentioned in the results, but since these
do not appear as markers for a specific cluster, it is not wise to include too many of
them. I conclude that there are strong grounds for believing in the subtypes:
MSN1, MSN2, MSN3, Th, Pvalb, Pthlh(Pvalb-), NGC-Npy, Sst and Chat. As for
the potential subtypes marked by: Crhbp, Gadd45g, Gal, Cadps2, I leave it to the
biologists to experimentally confirm their existence. Experimental confirmation is
also needed for some of the stronger subtypes such as MSN3.
Neural network simulations
The answer to the question of whether 10 neuron types can generate 10 types of spiking behaviour is a matter of interpretation. Depending on the level of detail one looks at and the algorithm one uses, different answers can arise. According to the tSNE-based algorithm and Humphries' spike-train community detection algorithm, there is low heterogeneity in the spike-train simulation data that was used. The tSNE-based algorithm sorted all of the neurons into their own area, except when the number of neurons was low enough, such as in setup 1 with only 20 SSBNs. The methods agree when looking at general heterogeneity, but also regarding the difference between the clusterings in setup 1 and setup 2: setup 2 shows more heterogeneity than setup 1 in both methods. According to this analysis, 10 neuron types do not enable 10 NAMs; perhaps 6-8 NAMs.
Another conclusion I draw is that the way we look at network activity needs to be improved. I think we need to better quantify the information that a spiking neural network is trying to communicate, and I think the key lies in the inter-spike interval (ISI) and in the combinations of ISIs between neurons.
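As a minimal sketch of this ISI-centred view, one could compute each neuron's inter-spike intervals and then compare neurons through a simple statistic on their ISI distributions. The spike times and the choice of statistic (correlation of binned ISI histograms) are illustrative assumptions, not the analysis used in this project.

```python
# Sketch: per-neuron inter-spike intervals and a cross-neuron ISI statistic.
import numpy as np

rng = np.random.default_rng(1)
# Three synthetic spike trains: sorted random spike times over 10 seconds.
spike_times = [np.sort(rng.uniform(0, 10.0, size=rng.integers(50, 100)))
               for _ in range(3)]

isis = [np.diff(t) for t in spike_times]  # per-neuron inter-spike intervals

# Bin each neuron's ISIs into a normalised histogram over 0-500 ms.
bins = np.linspace(0, 0.5, 26)
hists = np.array([np.histogram(i, bins=bins, density=True)[0] for i in isis])

# Pairwise correlation of ISI distributions: one crude way of quantifying
# how similarly two neurons pattern their spiking in time.
corr = np.corrcoef(hists)
print(np.round(corr, 2))
```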
As a general comment on my own work, the questions I have tried to answer in these two tasks are not easy ones, and one could easily have constructed a master's project around each of them individually.
Chapter 6
Future work
Clustering single-cell RNA sequencing data
Agglomerative clustering
I was surprised that none of the clustering approaches I studied used an agglomerative (bottom-up) strategy. This would mean pairing cells together into groups one at a time, instead of splitting all cells into two or more groups and then splitting each group (divisive clustering). I think there is a risk of forcing splits, although the divisive approach makes sense from a developmental biology perspective, since cells are created from precursor cells and hence divided in two. However, this does not mean that we should apply the same technique; in fact, it suggests we should take the opposite approach. My intuition tells me that if a heterogeneous distribution arose through divisive splitting, the best way to find this heterogeneity is to merge the cells back together, in other words agglomerative clustering. The practical downside of doing this is that we have not narrowed down which genes are important, as is done in for example BackSPIN, by the time we decide the smallest clusters. Therefore, irrelevant gene-expression differences might create non-meaningful clusters. If one could figure out a way to solve this problem, I think it would be interesting to see how an agglomerative approach would cluster the cells.
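The agglomerative idea sketched above is readily available in standard libraries. The example below uses a random count matrix as a stand-in for real expression data; on actual scRNA-seq data one would first select informative genes, which is exactly the open problem noted.

```python
# Sketch: agglomerative (bottom-up) clustering of cells from expression-like data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
cells = rng.poisson(2.0, size=(30, 100)).astype(float)  # 30 cells x 100 genes

# Ward linkage merges the closest pair of clusters one step at a time,
# the reverse direction of divisive schemes such as BackSPIN.
tree = linkage(cells, method="ward")

# Cut the merge tree so that at most 4 clusters remain.
labels = fcluster(tree, t=4, criterion="maxclust")
print(sorted(set(labels)))
```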
Geneteams improvements
I think the semi-automatic nature of Geneteams is both a strength and a weakness. The gained control takes a lot of time and attention from the user, but it also allows people who know many genes by heart to create a custom clustering in a way that is perhaps impossible for automatic algorithms. Unfortunately, the initial suggestions for splitting clusters made by Geneteams are not always good, so the algorithm is not yet suitable for all users. I think there is great potential in this algorithm and its approach. It would be interesting to see whether the algorithm could be made automatic by converting the guidelines I received from Kenneth Harris into programming code.

As noted, none of the Geneteams results were shown in this report. The Geneteams algorithm needs improvement in handling scRNA-seq data with high noise levels. It seems to work well on smaller datasets of around 100-200 cells, as seen in the original article [13], but when applied to 1000 cells, some of the suggested clusters make no sense: for example, clusters that had no marker, or whose marker was highly expressed in some other cluster. I think this is just a bug that can be fixed, since the idea that Geneteams builds on is solid.
Cell Expression format
The Cef file format is an attempt to standardise cell-expression files. I think this field greatly needs one, and I would encourage usage of this format. However, for some reason people do not work with the format in practice; not even people from the lab that created the format use it. I believe the responsibility lies with those who control the incentives of individual researchers: for example, scientific journals could demand a standardised format for the data released together with a publication. Not adopting a standardised format creates a lag in attempts to collaborate, and it limits the computer programs and functions that are built to analyse the data. I do not blame individual researchers for taking the path of least resistance, meaning that they work in their own format and ignore export and import functions, or avoid re-adapting their code to the standard format.
If more people start using the Cef format, one could develop it even further, for example with standard labels in CellInfo. I would suggest always working with one file: instead of constantly making new files with a new ordering, add each ordering and its associated cluster labels to CellInfo. The atmosphere in labs is often informal, and one does not want to force anyone to do things. This is a good thing, but in cases where standardisation is important for collaboration, one needs to encourage standard formats. This could be done with the help of funders or journals, for example by not publishing scRNA-seq papers unless the data is in the Cef format or another agreed format.
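To make the suggestion concrete, the sketch below shows the general idea of carrying per-cell metadata (such as cluster labels) in the same tab-separated file as the expression matrix. This is a deliberately simplified illustration and not the actual Cef specification; the gene names, cell identifiers, and column layout are invented for the example.

```python
# Illustration only: a simplified tab-separated layout (NOT the real Cef spec)
# keeping cluster labels alongside the expression matrix in one file.
import csv, io

genes = ["Drd1", "Drd2", "Pvalb"]                 # example marker genes
cells = {"cell_1": ("MSN1", [5, 0, 1]),           # (cluster label, counts)
         "cell_2": ("MSN2", [0, 7, 0])}

buf = io.StringIO()
w = csv.writer(buf, delimiter="\t")
w.writerow(["cell_id", "cluster"] + genes)        # header: metadata, then genes
for cid, (cluster, counts) in cells.items():
    w.writerow([cid, cluster] + counts)

# Reading it back: any tool that speaks TSV recovers both labels and counts,
# so a new clustering can be added as a new column instead of a new file.
rows = list(csv.reader(io.StringIO(buf.getvalue()), delimiter="\t"))
print(rows[1])
```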
Negative Binomial Stochastic Neighbour Embedding
This is an attempt to inspire mathematicians to develop a negative-binomial version of the t-distributed stochastic neighbour embedding (tSNE) machine learning algorithm. At this stage the idea is purely intuitive and not mathematically tested. I cannot imagine that my supervisor Jens and I are the only ones who think this might be an interesting thing to try. With these lines I want to encourage anyone working on, or thinking about working on, this idea.
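One way of making the intuition concrete, purely as a toy and with no claim that this is the right formulation, is to replace the Gaussian kernel in tSNE's high-dimensional affinities with a negative-binomial likelihood, which is a common model for overdispersed count data such as scRNA-seq. The dispersion value and the centring of the model on each cell's counts are arbitrary assumptions here.

```python
# Toy interpretation (untested as a method): score how well cell j's counts fit
# a negative-binomial model centred on cell i, in place of the Gaussian p(j|i).
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(0)
counts = rng.poisson(3.0, size=(10, 20))  # 10 cells x 20 genes, stand-in data

def nb_affinities(X, dispersion=2.0):
    n = X.shape[0]
    logp = np.full((n, n), -np.inf)
    for i in range(n):
        mu = X[i] + 1e-6                  # centre the NB model on cell i
        r = dispersion                    # scipy's nbinom uses (n, p), so
        p = r / (r + mu)                  # convert from (mean, dispersion)
        for j in range(n):
            if i != j:
                logp[i, j] = nbinom.logpmf(X[j], r, p).sum()
    # Row-wise softmax -> conditional affinities, as in t-SNE's P matrix.
    logp -= logp.max(axis=1, keepdims=True)
    P = np.exp(logp)
    np.fill_diagonal(P, 0.0)
    return P / P.sum(axis=1, keepdims=True)

P = nb_affinities(counts)
print(P.shape)
```

A serious treatment would still need the low-dimensional kernel, the gradient, and a proof of sensible behaviour, which is exactly the work this section hopes to inspire.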
Network activity
Auditory encoding ISI:s
I suggest that someone investigate how much one can gain by analysing ISIs between neurons using auditory encoding. This can be a way of discovering hidden information in spike trains, and perhaps a new way of looking at network spiking activity. One way of doing this, as previously mentioned, is to encode each ISI as a tone, scaled to a suitable listening frequency and playback speed. Preliminary analyses show that different chords form when listening to two neurons spiking together. It would be interesting to analyse what these chords and other features of the sound correspond to in relation to measurements of synchrony and entropy.
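The proposed sonification can be sketched in a few lines: map each inter-spike interval to a tone whose pitch scales with 1/ISI and write the result to a WAV file. The spike times, frequency range, and tone duration below are arbitrary illustrative choices, not the settings used in the preliminary analyses.

```python
# Sketch: sonify a spike train by mapping each ISI to a tone (shorter ISI ->
# higher pitch) and writing the tones to a 16-bit mono WAV file.
import wave
import numpy as np

rate = 44100
spike_times = np.cumsum(np.random.default_rng(0).exponential(0.1, size=30))
isis = np.diff(spike_times)

samples = []
for isi in isis:
    freq = np.clip(50.0 / isi, 100, 2000)   # clip into a listenable range
    t = np.arange(int(0.1 * rate)) / rate   # 100 ms tone per interval
    samples.append(0.3 * np.sin(2 * np.pi * freq * t))
audio = np.concatenate(samples)

with wave.open("isi_tones.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)                       # 16-bit PCM
    f.setframerate(rate)
    f.writeframes((audio * 32767).astype(np.int16).tobytes())
```

Summing two such tracks, one per neuron, would produce the chords described above when the neurons spike together.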
Community detection algorithm and tSNE comparison
It would be interesting to compare the performance of Humphries' community detection algorithm and the tSNE-based algorithm developed in this project when the neural network is operating in different dynamical regimes.
Bibliography

[1] Method of the Year 2013. Nature Methods, 11(1):1, 2014.

[2] K. Amunts, O. Kedo, M. Kindler, P. Pieperhoff, H. Mohlberg, N.J. Shah, U. Habel, F. Schneider, and K. Zilles. Cytoarchitectonic mapping of the human amygdala, hippocampal region and entorhinal cortex: intersubject variability and probability maps. Anatomy and Embryology, 210:343–352, 2005.

[3] Evan G. Antzoulatos and Earl K. Miller. Differences between neural activity in prefrontal cortex and striatum during learning of novel abstract categories. Neuron, 71(2):243–249, July 2011.

[4] Evan G. Antzoulatos and Earl K. Miller. Increases in functional connectivity between prefrontal cortex and striatum during category learning. Neuron, 83(1):216–225, 2014.

[5] Jyotika Bahuguna, Ad Aertsen, and Arvind Kumar. Existence and Control of Go/No-Go Decision Transition Threshold in the Striatum. PLOS Computational Biology, 11:e1004233, 2015.

[6] Xandra O. Breakefield, Anne J. Blood, Yuqing Li, Mark Hallett, Phyllis I. Hanson, and David G. Standaert. The pathophysiological basis of dystonias. Nature Reviews Neuroscience, 9(March):222–234, 2008.

[7] Gordon K. Hodge and Larry L. Butcher. Pars Compacta of the Substantia Nigra Modulates Motor Activity but is not Involved Importantly in Regulating Food and Water Intake. Naunyn-Schmiedeberg's Archives of Pharmacology, 313:51–67, 1980.

[8] Jill R. Crittenden and Ann M. Graybiel. Basal Ganglia disorders associated with imbalances in the striatal striosome and matrix compartments. Frontiers in Neuroanatomy, 5(September):59, 2011.

[9] Sriraman Damodaran, John R. Cressman, Zbigniew Jedrzejewski-Szmek, and Kim T. Blackwell. Desynchronization of Fast-Spiking Interneurons Reduces Beta-Band Oscillations and Imbalance in Firing in the Dopamine-Depleted Striatum. The Journal of neuroscience : the official journal of the Society for Neuroscience, 35(3):1149–1159, 2015.
[10] Sergi Ferré, Carme Lluís, Zuzana Justinova, César Quiroz, Marco Orru,
Gemma Navarro, Enric I. Canela, Rafael Franco, and Steven R. Goldberg.
Adenosine-cannabinoid receptor interactions. Implications for striatal function.
British Journal of Pharmacology, 160:443–453, 2010.
[11] E. Garcia-Rill. The pedunculopontine nucleus. Progress in Neurobiology, 36:363–389, 1991.
[12] Marc-Oliver Gewaltig and Markus Diesmann. NEST (neural simulation tool).
Scholarpedia, 2(4):1430, 2007.
[13] Kenneth D Harris, Lorenza Magno, Linda Katona, Peter Lönnerberg, and Ana
B Muñoz Manchado. Molecular organization of CA1 interneuron classes. 2015.
[14] Mark D Humphries. Spike-train communities: finding groups of similar spike
trains. The Journal of neuroscience : the official journal of the Society for
Neuroscience, 31:2321–2336, 2011.
[15] Osvaldo Ibáñez Sandoval, Fatuel Tecuapetla, Bengi Unal, Fulva Shah, Tibor
Koós, and James M Tepper. Electrophysiological and morphological characteristics and synaptic connectivity of tyrosine hydroxylase-expressing neurons
in adult mouse striatum. The Journal of neuroscience : the official journal of
the Society for Neuroscience, 30(20):6999–7016, 2010.
[16] O. Ibanez-Sandoval, F. Tecuapetla, B. Unal, F. Shah, T. Koos, and J. M.
Tepper. A Novel Functionally Distinct Subtype of Striatal Neuropeptide Y
Interneuron. Journal of Neuroscience, 31(46):16757–16769, 2011.
[17] E Izhikevich. Dynamical Systems In Neuroscience. MIT Press, page 111, 2007.
[18] E Kandel. Principles of Neural Science, Fifth Edition. Principles of Neural
Science. McGraw-Hill Education, 2013.
[19] Ryota Kobayashi, Yasuhiro Tsubo, and Shigeru Shinomoto. Made-to-order
spiking neuron model equipped with a multi-timescale adaptive threshold.
Frontiers in computational neuroscience, 3(July):9, 2009.
[20] T Koós and J M Tepper. Inhibitory control of neostriatal projection neurons
by GABAergic interneurons. Nature neuroscience, 2:467–472, 1999.
[21] Alexxai V Kravitz and A C Kreitzer. Striatal mechanisms underlying
movement, reinforcement, and punishment. Physiology (Bethesda, Md.),
27(116):167–77, 2012.
[22] Seung-Hee Lee, Alex C. Kwan, Siyu Zhang, Victoria Phoumthipphavong,
John G. Flannery, Sotiris C. Masmanidis, Hiroki Taniguchi, Z. Josh Huang,
Feng Zhang, Edward S. Boyden, Karl Deisseroth, and Yang Dan. Activation
of specific interneurons improves V1 feature selectivity and visual perception.
Nature, 488(7411):379–383, 2012.
[23] Mikael Lindahl, Iman Kamali Sarvestani, Orjan Ekeberg, and Jeanette Hellgren Kotaleski. Signal enhancement in the output stage of the basal ganglia by
synaptic short-term plasticity in the direct, indirect, and hyperdirect pathways.
Frontiers in computational neuroscience, 7(June):76, January 2013.
[24] T Ljungberg, P Apicella, and W Schultz. Responses of monkey dopamine
neurons during learning of behavioral reactions. Journal of neurophysiology,
67(1):145–163, January 1992.
[25] Laurens Van Der Maaten and Geoffrey Hinton. Visualizing Data using t-SNE.
Journal of Machine Learning Research, 9:2579–2605, 2008.
[26] Hamish Meffin, Anthony N Burkitt, and David B Grayden. An analytical model
for the "large, fluctuating synaptic conductance state" typical of neocortical
neurons in vivo. Journal of computational neuroscience, 16:159–75, 2004.
[27] A B Muñoz Manchado, C Foldi, S Szydlowski, L Sjulson, M Farries, C Wilson,
G Silberberg, and J Hjerling-Leffler. Novel Striatal GABAergic Interneuron
Populations Labeled in the 5HT3aEGFP Mouse. Cerebral cortex (New York,
N.Y. : 1991), pages 1–10, August 2014.
[28] Y. S. Nikolova, E. K. Singhi, E. M. Drabant, and a. R. Hariri. Reward-related
ventral striatum reactivity mediates gender-specific effects of a galanin remote
enhancer haplotype on problem drinking. Genes, Brain and Behavior, 12:516–
524, 2013.
[29] Akinori Nishi, Mahomi Kuroiwa, and Takahide Shuto. Mechanisms for the
modulation of dopamine d(1) receptor signaling in striatal neurons. Frontiers
in neuroanatomy, 5(July):43, 2011.
[30] Lucas Pinto and Yang Dan. Cell-Type-Specific Activity in Prefrontal Cortex
during Goal-Directed Behavior. Neuron, 87(2):437–450, 2015.
[31] D Purves. Neuroscience. Sinauer Associates, 2012.
[32] Ramon Reig and Gilad Silberberg. Multisensory Integration in the Mouse
Striatum. Neuron, 83(5):1200–1212, 2014.
[33] A Rosell and J M Gimenez-Amaya. Anatomical re-evaluation of the corticostriatal projections to the caudate nucleus: a retrograde labeling study in the
cat. Neuroscience research, 34(4):257–269, September 1999.
[34] Bernardo Rudy, Gordon Fishell, SooHyun Lee, and Jens Hjerling-Leffler. Three
groups of interneurons account for nearly 100% of neocortical GABAergic neurons. Developmental Neurobiology, 71:45–61, 2011.
[35] Scott Russo and Eric Nestler. The brain reward circuitry in mood disorders.
Nature Reviews Neuroscience, 14(September):609–625, 2013.
[36] Ajith Sahasranamam, Ioannis Vlachos, Ad Aertsen, and Arvind Kumar. Dynamical state of the network determines the efficacy of single neuron properties
in shaping the network activity. bioRxiv, 2015.
[37] Ehud Shapiro, Tamir Biezuner, and Sten Linnarsson. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nature Reviews Genetics, 14(July):618–630, 2013.
[38] S. Murray Sherman. Thalamus. scholarpedia, 2006.
[39] Gilad Silberberg and J. Paul Bolam. Local and afferent synaptic pathways in the striatal microcircuitry. Current Opinion in Neurobiology, 33:182–187, 2015.
[40] Ivo Spiegel, Alan R Mardinly, Harrison W Gabel, Jeremy E Bazinet,
Cameron H Couch, Christopher P Tzeng, David A Harmin, and Michael E
Greenberg. Npas4 Regulates Excitatory-Inhibitory Balance within Neural Circuits through Cell-Type-Specific Gene Programs. Cell, 157(5):1216–1229, 2014.
[41] S. N. Szydlowski, I. Pollak Dorocic, H. Planert, M. Carlen, K. Meletis, and
G. Silberberg. Target Selectivity of Feedforward Inhibition by Striatal FastSpiking Interneurons. Journal of Neuroscience, 33(4):1678–1683, 2013.
[42] James M Tepper, Fatuel Tecuapetla, Tibor Koós, and Osvaldo Ibáñez Sandoval.
Heterogeneity and diversity of striatal GABAergic interneurons. Frontiers in
neuroanatomy, 4(December):150, 2010.
[43] D. Tsafrir, I. Tsafrir, L. Ein-Dor, O. Zuk, D. A. Notterman, and E. Domany.
Sorting points into neighborhoods (SPIN): Data analysis and visualization by
ordering distance matrices. Bioinformatics, 21(10):2301–2308, 2005.
[44] M A Wilson, U S Bhalla, J D Uhley, and J M Bower. GENESIS: A system
for simulating neural networks. Advances in Neural Information Processing
Systems, pages 485–492, 1989.
[45] Nathan R. Wilson, Caroline A. Runyan, Forea L. Wang, and Mriganka Sur. Division and subtraction by distinct cortical inhibitory networks in vivo. Nature,
488(7411):343–348, 2012.
[46] Philip Winn. How best to consider the structure and function of the pedunculopontine tegmental nucleus: Evidence from animal studies. Journal of the
Neurological Sciences, 248:234–250, 2006.
[47] L. M. Yager, A. F. Garcia, A. M. Wunsch, and S. M. Ferguson. The ins and
outs of the striatum: Role in drug addiction. Neuroscience, 301:529–541, 2015.
[48] Satoshi Yamauchi, Hideaki Kim, and Shigeru Shinomoto. Elemental Spiking Neuron Model for Reproducing Diverse Firing Patterns and Predicting Precise Firing Times. Frontiers in Computational Neuroscience, 5(October):1–15, 2011.
[49] Man Yi Yim, Ad Aertsen, and Arvind Kumar. Significance of input correlations in striatal function. PLoS computational biology, 7(11):e1002254, November 2011.
[50] Amit Zeisel, Ana B Muñoz Manchado, Simone Codeluppi, Peter Lönnerberg,
Gioele La Manno, Anna Juréus, and Sueli Marques. Cell types in the mouse
cortex and hippocampus revealed by single-cell RNA-seq. 2:1–8, 2015.
[51] Halle R. Zucker and Charan Ranganath. Navigating the human hippocampus
without a GPS. Hippocampus, 25(March):697–703, 2015.