# HW4

```
G80.3042.002 – Fall 2007
Statistical Analysis and Modeling of Neural Data
Homework 4
Due: Wednesday, 19 Dec 2007
Your results should be in the form of a MATLAB file (typically, the filename should have an
extension of .m). Email your solutions to [email protected] and [email protected]
1. Mutual Information of Gaussian variables.
A well-known normative theory for early sensory processing is that it is designed to optimize the mutual information between stimuli and responses, which we showed in class is
equal to: I(S; R) = H(R) - H(R|S), where H is the entropy, S is the stimulus, and R is the response.
(a) Derive the entropy of a multi-variate (i.e., vector-valued) Gaussian distribution. It
might be easiest to first write the scalar case.
(b) Imagine you have two neurons, and you can approximate their noise with additive uncorrelated Gaussian random variables, each with the same variance. Assume also that their mean
responses across a particular ensemble of inputs are Gaussian distributed, with a covariance matrix that has one value on the diagonal and another on the off-diagonal. Write
down the mutual information as a function of these parameters.
(c) How does the mutual information vary with the correlatedness of the two responses? In
particular, plot the mutual information as a function of over the range . Where
is the minimum? Why?
2. Spike sorting
Two elementary clustering algorithms are the k-nearest neighbor algorithm and the k-means
algorithm. In this problem, you will use these algorithms to sort extracellularly recorded action
potentials.
Data format: The data is organized in one data structure Sp with two fields, Times and
Waveforms. Sp.Times contains the times of all the spikes in the recording in units of ms.
Sp.Waveforms contains the event minimum-aligned threshold-crossing waveforms with
one event per row. The minimum of each waveform is at bin 6.
The data set contains 68098 events recorded over a 15 minute interval. The threshold for
spike detection was chosen so that not all these events are waveforms from action potentials.
Therefore sorting these waveforms into clusters will, at a minimum, give a multiunit cluster
that contains activity from many neurons combined with background neural noise. Other
clusters may also be present. These clusters contain action potentials that may be isolated
from the multiunit activity and each other.
(a) Use PCA to project the event waveforms into a 3-D subspace. How much variance
of the original data does this subspace contain? Plot projections of the events into this
subspace. First, plot all the data in the recording, then divide it up into 10000-event sections and plot the data within each section. Are there any differences between the sections? Describe them. Hint: You can use the command plot(x,y,'.','MarkerSize',0.5)
to make it easier to see the points.
(b) Implement the k-means algorithm and use it to sort all the data points in the recording. The steps are to choose the number of clusters, initialize the centers of each cluster,
assign each event to the cluster with the nearest mean, recompute the centers of each
cluster and iterate until convergence. Plot your results as cluster plots to visually evaluate the performance of the algorithm. Comment on your results. How do you decide
how many clusters to seed? How many events in each sorted cluster satisfy the requirement for refractoriness?
(c) Repeat this clustering for each 10000-event subsection. Comment on your results.
What evidence is there that the number of clusters changes over the course of the
recording? How well is refractoriness achieved using the clusters derived in this way?
(d) The nearest-neighbors algorithm is hard to apply to this data set because there are so
many points. Use k-means to overcluster the data and selectively merge clusters using
the nearest-neighbors algorithm on the clusters instead of the original data points. This
is one way to avoid so many computations. Work with the Euclidean distance in the
3-D subspace, choose an appropriate criterion (distance or number of clusters) to stop
merging and use only the distance to the nearest neighbor. Compare your results with
the results of using straight k-means.
```
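
The k-means steps listed in problem 2(b) (choose the number of clusters, initialize the centers, assign each event to the nearest center, recompute the centers, iterate until convergence) can be sketched as below. This is a minimal illustration under stated assumptions, not the expected solution: it runs on synthetic 3-D points standing in for the PCA projections, and the names `X`, `K`, `centers`, `labels`, and `maxIter` are hypothetical choices, as is initializing each center at a hand-picked data point.

```matlab
% Minimal k-means sketch on synthetic data (assumption: X stands in for the
% 3-D PCA projections of the waveforms, one event per row).
N = 200;
X = [randn(N, 3); randn(N, 3) + 10];    % two well-separated Gaussian clouds

K = 2;                                   % number of clusters, chosen by hand here
centers = X([1, N + 1], :);              % assumption: seed each center at one data point
maxIter = 100;                           % safety cap on the number of iterations
labels = zeros(size(X, 1), 1);           % 0 means "not yet assigned"

for iter = 1:maxIter
    % Assignment step: squared Euclidean distance from every event to every center.
    D = zeros(size(X, 1), K);
    for k = 1:K
        diffs = X - repmat(centers(k, :), size(X, 1), 1);
        D(:, k) = sum(diffs .^ 2, 2);
    end
    [~, newLabels] = min(D, [], 2);

    % Convergence: stop when no event changes cluster.
    if isequal(newLabels, labels)
        break;
    end
    labels = newLabels;

    % Update step: move each center to the mean of its assigned events.
    for k = 1:K
        members = (labels == k);
        if any(members)                  % an empty cluster keeps its old center
            centers(k, :) = mean(X(members, :), 1);
        end
    end
end
```

For the recording itself, `X` would presumably be replaced by the projections of `Sp.Waveforms` onto the first three principal components. The result of k-means depends on the initial centers, so it may be worth comparing several initializations before interpreting the clusters.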