Compute the MLE for a simple distribution (maybe a binomial). (See the sketch after this list.)
Why is it useful to maximize the log-likelihood rather than the likelihood itself?
Compute the population vector for a set of neurons. We can either have them do this visually, or we can supply them with data.
Conceptual questions on what bias is, what variance is, and how the two trade off against each other.
Give them two tuning curves, one very narrow and one very wide. Which of the two would be more informative?
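For the MLE idea, a minimal sketch in Octave/MATLAB (the trial counts here are made up for illustration):

    % Binomial MLE sketch: estimate p from n Bernoulli trials with k successes.
    % The log-likelihood, log L(p) = k*log(p) + (n-k)*log(1-p) + const, turns a
    % product of per-trial probabilities into a sum -- numerically stable and
    % easier to differentiate. It peaks at the analytic MLE, p_hat = k/n.
    n = 100; k = 37;                          % made-up data
    p = linspace(0.01, 0.99, 99);
    logL = k*log(p) + (n - k)*log(1 - p);
    [~, idx] = max(logL);
    fprintf('grid MLE = %.2f, analytic k/n = %.2f\n', p(idx), k/n);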
Ideas for Programming Assignments
Adapt Dayan & Abbott Ch. 1, Pr. 1: give them the Poisson generator (with a random number generator with a set seed, so we can have them enter exact values) and have them compute the coefficient of variation and Fano factor. Potentially, compare it with the ISI distribution for the H1 data and ask an interpretive question about the refractory period. We might shift this to the HH week and have them compare an HH neuron and an I&F neuron.
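A possible sketch of that generator in Octave/MATLAB (the rate, duration, and seed are placeholders, not the assignment's actual values):

    % Seeded homogeneous Poisson spike train; CV and Fano factor.
    rng(1);                              % fixed seed so answers are exact (rand('state',1) in older Octave)
    rate = 20; T = 10; dt = 0.001;       % Hz, seconds, bin width -- made-up values
    spikes = rand(1, T/dt) < rate*dt;    % spike train as 1-ms Bernoulli bins
    isi = diff(find(spikes))*dt;         % inter-spike intervals, seconds
    cv = std(isi)/mean(isi)              % CV -> 1 for a Poisson process
    counts = sum(rand(1000, T/dt) < rate*dt, 2);   % spike counts over 1000 trials
    fano = var(counts)/mean(counts)      % Fano factor -> 1 for Poisson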
Population vector computation -- I like this one as it can incorporate a specific understanding of Poisson firing.
For a model organism/system (any ideas about what would be best?) with direction-sensitive neurons, select a direction of motion and a set of neurons with overlapping tuning curves and different max firing rates.
Sample a number of fixed-length trials per neuron using a Poisson generator, with a firing rate given by the point on each neuron's tuning curve at the chosen direction.
Have them compute the population vector for this "population" of artificial neurons: average the firing rate per neuron, then apply the population vector equation. (Sketch below.)
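A minimal sketch of that simulation in Octave/MATLAB (preferred directions, rates, and trial counts are placeholders; poissrnd needs the statistics toolbox/package):

    % Population vector from Poisson spike counts, assuming rectified cosine tuning.
    pref = [0 90 180 270];                  % preferred directions, degrees
    rmax = [30 40 35 25];                   % different max rates per neuron
    s = 60; T = 1; ntrials = 50;            % stimulus direction, window (s), trials
    f = max(0, rmax .* cosd(s - pref));     % mean rate of each neuron for stimulus s
    counts = poissrnd(repmat(f*T, ntrials, 1));   % ntrials x 4 spike counts
    r = mean(counts, 1) / T;                % average firing rate per neuron
    pv = [sum(r .* cosd(pref)), sum(r .* sind(pref))];  % population vector
    fprintf('true %g deg, decoded %g deg\n', s, mod(atan2d(pv(2), pv(1)), 360));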
I suggest that we give them code that generates a Poisson spike count as a function of input for, say, 4 cercal-like neurons that fire with cosine tuning curves -- i.e., you put in s, the wind angle, and you get back 4 spike counts. Is there any way to make that code invisible to them, but such that they could run it as a function inside a loop? (A sketch follows the sub-questions below.)
We could then ask them:
-- what form best describes the tuning curves? (They would have to sample the firing rates as a function of s and average over many trials.)
-- one of the neurons is non-Poisson: i.e., make its firing rate f(s) = fbar(s) + a (rectified) Gaussian random number. They could then try to diagnose which of the neurons is not Poisson (great multiple-choice question!).
-- compute the population vector for these neurons. We could make this multiple choice by asking them to average the RMS error over 20 trials for a specific value of s and asking what range it falls in.
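A minimal sketch of the hidden generator in Octave/MATLAB (the function name, rates, preferred directions, and the Gaussian-noise trick for the non-Poisson neuron are all placeholders):

    % cercal_counts.m -- hidden from students; returns 4 spike counts for wind angle s.
    function n = cercal_counts(s)
      pref = [45 135 225 315];              % preferred directions, degrees
      rmax = 40; T = 0.5;                   % peak rate (Hz) and count window (s)
      f = max(0, rmax * cosd(s - pref));    % rectified cosine tuning curves
      n = poissrnd(f * T);                  % Poisson counts, one per neuron
      n(3) = max(0, round(f(3)*T + 3*randn));  % neuron 3: Gaussian noise, variance != mean
    end

Students would only ever call it inside a loop, e.g.:

    trials = zeros(200, 4);
    for k = 1:200, trials(k, :) = cercal_counts(60); end
    fano = var(trials) ./ mean(trials)      % ~1 for Poisson neurons (NaN if silent); neuron 3 stands out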
Conceptual Questions
1. Likelihood ratio test with asymmetric costs.
Suppose we have a stimulus defined by a single variable, called s. And let's say that s can take one of two values, which we will call s1 and s2. You could think of these as lights flashing in the eyes at one of two possible frequencies. Or perhaps listening to punk rock vs. listening to Dvorak.
Let's call the firing rate response of a neuron to this stimulus r.
Suppose that under stimulus s1 the response rate of the neuron can be roughly approximated with a Gaussian distribution with the following parameters:
Mu: 5
Sigma: 0.5
And likewise for s2:
Mu: 7
Sigma: 1
Let's say that both stimuli are equally likely, given no other prior information.
Now let's throw in another twist. Let's say that we receive a measurement of the neuron's response and want to guess which stimulus was presented, but that to us it is twice as bad to mistakenly think it is s2 as to mistakenly think it is s1.
Which of these firing rates would make the best decision threshold for us in determining the value of s given a neuron's firing rate?
Hint: There are several functions available to help you evaluate Gaussian distributions. In Octave and in Matlab's stats toolbox you can use the 'normpdf' function. If you know how to set the problem up, you will be able to try all the answers below to find the one that works best. If you decide to challenge yourself to solve this algebraically instead, you can use the univariate Gaussian PDF, given at the top of: https://en.wikipedia.org/wiki/Normal_distribution
a.) 5.830 <--doesn't take cost into account
b.) 5.978 <--correct
c.) 5.667 <--cost adjustment is inverted
d.) 2.690 <--fits the LRT equation, but is absurd because it's not between the means
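A sketch of the numerical setup the hint points to, in Octave/MATLAB (the factor of 2 encodes the asymmetric cost; normpdf needs the statistics toolbox/package):

    % Decide s2 only where p(r|s2) > 2*p(r|s1); find where the cost-weighted
    % likelihoods cross, searching between the two means.
    r = linspace(5, 7, 20001);
    gap = normpdf(r, 7, 1) - 2*normpdf(r, 5, 0.5);
    idx = find(gap > 0, 1);                 % first rate at which s2 wins
    fprintf('threshold ~ %.3f\n', r(idx))   % ~5.978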
2. ML and MAP
Suppose we are diagnosing a very rare illness, which happens only once in 100 million people on average. We have a test for this illness, luckily, but it is not perfectly accurate. If somebody has the disease, it will report positive 99% of the time. If somebody does not have the disease, it will report positive 2% of the time.
Suppose a patient walks in and tests positive for the disease. Using the maximum likelihood (ML) criterion, would we diagnose them as positive?
a.) Yes <--correct
b.) No
What if we used the maximum a posteriori (MAP) criterion?
a.) Yes
b.) No <--correct
Why do we see a difference between the two criteria, if there is one?
a.) Since ML assumes a Gaussian distribution, unlike MAP, it oversimplifies the world.
b.) The role of the prior probability is different between the two. <--correct
c.) Unlike MAP, ML assumes the same model for all people.
d.) There is no difference between the two, because in this case they are equivalent.
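For reference, the posterior is easy to check numerically (a sketch; the prior and test probabilities come straight from the question):

    % ML compares p(+|disease)=0.99 vs p(+|healthy)=0.02 and diagnoses positive.
    % MAP weights each likelihood by its prior first.
    prior = 1e-8;                           % 1 in 100 million
    post_sick = 0.99*prior;                 % unnormalized posteriors
    post_healthy = 0.02*(1 - prior);
    fprintf('P(disease|+) = %.2e\n', post_sick/(post_sick + post_healthy))
    % ~4.95e-07, so MAP says "no disease" despite the positive test.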
3. Information, entropy, and mutual information:
First, suppose that we have a neuron which, in a given time period, will fire with probability 0.1. That is, we have a Bernoulli (single-trial binomial) distribution for the neuron's firing, with P(1) = 0.1. Let's call this random variable f.
Which of these is closest to the entropy of this distribution, H(f)? (Use the natural logarithm, so the answers are in nats.)
a.) -0.1354 <--get one of the signs wrong
b.) 1.999 <--they might think it's just the number of bits we are looking at
c.) 0.1354 <--get one of the signs wrong
d.) 0.3251 <--correct
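A one-line check in Octave/MATLAB (note the natural log, matching the answer options):

    % Binary entropy of the firing variable, in nats.
    p = 0.1;
    H = -p*log(p) - (1 - p)*log(1 - p)      % prints 0.3251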
Now let's add a stimulus to the picture. Suppose that we think this neuron's activity is related to a light flashing in the eye. Let us say that the light is flashing in a given time period with probability 0.10. Call this random variable s.
If there is a flash, the neuron will fire with probability 1/2. If there is not a flash, the neuron will fire with probability 1/18.
Which of these is closest to the mutual information MI(s, f), again in nats?
a.) 0.0627 <---correct
b.) -0.3202 <---if they flipped p(f|s=1) and p(f|s=0) when computing the noise entropy
c.) 0.2624 <---the noise entropy by itself, reported instead of subtracting it from the total entropy
d.) -0.2624 <---mirror answer to c to be tricky
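And a sketch of the full MI calculation (natural log again):

    % MI(s,f) = H(f) - H(f|s), all in nats.
    Hb = @(p) -p.*log(p) - (1 - p).*log(1 - p);   % binary entropy helper
    ps = 0.1;                                % P(flash)
    p1 = 1/2; p0 = 1/18;                     % P(fire|flash), P(fire|no flash)
    pf = ps*p1 + (1 - ps)*p0;                % marginal P(fire) = 0.1
    MI = Hb(pf) - (ps*Hb(p1) + (1 - ps)*Hb(p0))   % ~0.0627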
4. Coding, sparse coding
The math in lecture 4-3 could potentially be intimidating, but in fact the concept is really simple. Getting an intuition for it will help with many types of problems. Let's work out a metaphor to understand it.
Suppose we want to build a complex image. We could do that by layering a whole bunch of pieces together (mathematically - summing). This is like drawing on transparencies with various levels of opacity and putting them on top of each other. Those familiar with Photoshop or Gimp will recognize that concept. If we had to build an image in Photoshop with a bicycle on a road, for instance, perhaps we could have an image of a sky, and one of the road, and one of the bike. We could "add" these pieces together to make our target image.
Of course, if our neural system was trying to make visual fields that worked for any sort of input, we would want more than just roads, skies, and bikes to work with! One possibility is to have a bunch of generic shapes of various sizes, orientations, and locations within the image. If we chose the right variety, we could blend/sum these primitive pieces together to make just about any image! One way to blend them is to let them have varying transparencies/opacities, and to set them on top of each other. That is what we would call a weighted sum, where the weights are how transparent each piece is.
Of course, we may not want to have too many possible shapes to use. As mentioned in the video, the organism likely wants to conserve energy. That means having as few neurons firing as possible at once. If we conceptually make a correlation between these shapes and the neurons, then we can point out we would want to use as few shapes as we could while maintaining an accurate image.
This math gives us a way of summing a bunch of pieces together to represent an image, to attempt to make that representation look as much like the image as possible, and to make that representation efficient - using as few pieces as possible. That is a lot of work for two lines of math!
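A tiny numerical sketch of that objective in Octave/MATLAB (the image, basis functions, and lambda are all made up, and an absolute-value penalty stands in for whatever sparseness cost the lecture uses):

    % Sparse coding cost: reconstruction error plus a sparseness penalty,
    %   E = sum_x ( I(x) - sum_i a_i*phi_i(x) )^2 + lambda * sum_i |a_i|
    I = rand(1, 64);                 % target "image" of 64 pixels
    Phi = rand(10, 64);              % 10 basis functions phi_i (the pieces)
    a = randn(1, 10);                % coefficients -- the "transparencies"
    lambda = 0.1;                    % weight on coding efficiency
    eps_img = I - a*Phi;             % little epsilon: the residual
    E = sum(eps_img.^2) + lambda*sum(abs(a))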
Now let's put this metaphor into action to understand what all these symbols mean. I'll give you one to start with: x-arrow represents the coordinates of a point in the image. Now you fill in the rest:
What do the phi(i), called the "basis functions," represent?
a.) The pieces that make up the image
b.) The level of transparency vs. opacity/influence of each piece
c.) The importance of coding efficiency
d.) The difference between the actual image and the representation
What does little epsilon represent?
a.) The pieces that make up the image
b.) The level of transparency vs. opacity/influence of each piece
c.) The importance of coding efficiency
d.) The difference between the actual image and the representation
What does a(i) represent?
a.) The pieces that make up the image
b.) The level of transparency vs. opacity/influence of each piece
c.) The importance of coding efficiency
d.) The difference between the actual image and the representation
What does lambda represent?
a.) The pieces that make up the image
b.) The level of transparency vs. opacity/influence of each piece
c.) The importance of coding efficiency
d.) The difference between the actual image and the representation