1. Which of the following is true of unsupervised and supervised learning during training of a network? (check all that apply)

*In unsupervised learning, inputs are provided to the network
In unsupervised learning, correct outputs are provided to the network
*In supervised learning, inputs are provided to the network
*In supervised learning, correct outputs are provided to the network


2. <<Give the weights of a perceptron and several points above, below, and on the decision boundary. Ask the students to identify the class of each one (checkboxes).>>

Recall the perceptron, which classifies inputs as belonging to one of two categories and correspondingly outputs either +1 or -1. Consider a perceptron that receives two inputs $$u_1$$ and $$u_2$$ with synaptic weights 1 and 3, respectively. If the threshold $$\mu$$ for the output is 0, which of the following input vectors $$u = [u_1\ u_2]$$ will lead to a positive output from the perceptron? (check all that apply)

[-1 -1]
[1 -1]
*[-1 1]
*[1 1]
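A minimal sketch (in Python, using the question's weights and threshold) of the check behind question 2, assuming the usual convention that the output is +1 when the weighted sum meets or exceeds the threshold and -1 otherwise:

```python
# Perceptron from question 2: weights [1, 3], threshold mu = 0.
# Convention assumed: v = +1 if w . u >= mu, else v = -1.
weights = [1, 3]
mu = 0

def perceptron_output(u, w=weights, threshold=mu):
    s = sum(wi * ui for wi, ui in zip(w, u))
    return 1 if s >= threshold else -1

for u in ([-1, -1], [1, -1], [-1, 1], [1, 1]):
    print(u, perceptron_output(u))
# Only [-1, 1] and [1, 1] produce +1, matching the marked answers.
```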


3. Suppose a perceptron receiving 5 inputs is being trained to categorize input patterns $$u$$ (a 5-element vector consisting of either +1 or -1 for each input) into one of two categories. The perceptron generates an output $$v$$ (+1 or -1) corresponding to its categorization of the input pattern. The synaptic weights $$w_i$$ are collected in a weight vector $$w$$. Now suppose that the perceptron has incorrectly classified the most recent input pattern [+1 -1 -1 +1 -1], outputting +1 when it should have given -1. According to the perceptron learning rule, which of the following adjustments should be made? (check all that apply)

*$$w_3$$ should increase
$$w_3$$ should decrease
*the threshold should increase
the threshold should decrease
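A minimal sketch of the update in question 3, assuming the standard perceptron learning rule $$w_i \rightarrow w_i + \epsilon (v^d - v) u_i$$ with the threshold treated as a weight on a constant input of -1; the value of $$\epsilon$$ below is arbitrary, since only the signs of the changes matter:

```python
# Perceptron learning rule applied to question 3.
epsilon = 0.1            # arbitrary positive learning rate
u = [1, -1, -1, 1, -1]   # most recent input pattern
v = 1                    # perceptron's (incorrect) output
v_desired = -1           # correct category

delta = epsilon * (v_desired - v)     # negative here: -0.2
dw = [delta * ui for ui in u]         # changes to the weights
d_threshold = -delta                  # threshold moves opposite to a bias weight

print(dw)           # approx. [-0.2, 0.2, 0.2, -0.2, 0.2]  -> w_3 increases
print(d_threshold)  # approx. 0.2                          -> threshold increases
```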


4. Learning with perceptrons resembles many of the other methods we have seen: we take the difference (delta) between the known output and our predicted output and use it to determine the direction and scale of the adjustments to the model. Likewise, with our multilayer sigmoid networks we once again used gradient descent to learn the parameters.
<br><br>
Typically we have a single data set that we loop over multiple times, using each data point in the set to train the perceptron until its weight values appear to converge. One issue with this kind of training is that it can oscillate around the ideal estimate and never quite converge. In other words, it can bounce around the value we are looking for without ever satisfying a stopping criterion (for instance, that the parameters stop changing much). As a result, the training might run forever! <br><br>

Suppose we had a single-layer perceptron with a hard decision threshold (no sigmoid). What can we change to guarantee this doesn't happen?
* a. Lower the learning rate $$\eta$$ to a small positive value
b. Get more training data
c. Randomize the order of the training data each time we loop over it
d. We cannot guarantee it
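A minimal sketch, with made-up data, of the training loop described in question 4; $$\eta$$ (eta) is the learning rate the answer choices refer to, and the stopping rule is the "parameters stop changing much" criterion from the prose:

```python
import numpy as np

# Hypothetical hard-threshold perceptron training loop (question 4).
# The data below are made up and linearly separable by construction.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(20, 5))                       # 20 input patterns
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) + 0.1)     # made-up labels

w = np.zeros(5)
theta = 0.0
eta = 0.1          # the learning rate from the question

for epoch in range(1000):
    w_before = w.copy()
    for u, vd in zip(X, y):
        v = 1.0 if w @ u >= theta else -1.0
        w += eta * (vd - v) * u        # perceptron learning rule
        theta -= eta * (vd - v)        # threshold as a weight on a -1 input
    if np.max(np.abs(w - w_before)) < 1e-6:   # "parameters stop changing much"
        break
```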


<<Variations: show them two out of these 6. Maybe break the 6 into two groups with one question of each type of learning so that they are guaranteed not to see the same question twice.>>
5., 6.

We have now talked explicitly about three types of learning: unsupervised, supervised, and reinforcement learning. Which is most appropriate depends on the structure of your problem and the data available. For each of the next two questions, pick the type of learning that is most appropriate for the problem:

We are teaching an AI agent to play chess, but we do not have an expert around to show us the value of individual moves. The only feedback we get is at the end of the game when we discover that we have won or lost.
a. Unsupervised
b. Supervised
* c. Reinforcement

We have a robot which can perceive and manipulate objects in the environment, but does not have a sense of how to manipulate them skillfully (e.g. to keep from breaking them). Let us say we asked the robot to pour us a cup of milk. Whenever it broke something or did something unrelated to the task, we said "bad robot," and when it made good progress we said "good robot." Periodically, we would say "hurry up" to get it to work faster. After it accomplished its task, we would plug it into the wall to show our appreciation.
a. Unsupervised
b. Supervised
* c. Reinforcement

We have a disease outbreak, and we have no idea what caused it. The only information we have is where each reported case is located on the map. We want to calculate where the largest foci of the outbreak are so we can go there to investigate.
* a. Unsupervised
b. Supervised
c. Reinforcement

Suppose we have a collection of images of cars. All of the images are ~1 million pixels. Let us say we are trying to model a neuron whose job is to find the dominant color in an image (e.g. that the car is red). Since 1 million pixels makes the learning problem a very large one, we want to reduce each image to a simple set of features - some simple measure of the overall color balance in each image, with the possibility of, for instance, weighting the pixels in the center of the image more heavily than those near the edges. What type of learning would we use for that?
* a. Unsupervised
b. Supervised
c. Reinforcement

Let us say there is a disease outbreak, and for every case we have the complete medical history of the patient. Suppose we think the disease has some correlation with the amount of soda pop a person drinks every day. If we have good soda-consumption information for every patient, and some generalized consumption information for the broader population, what type of learning would we use to estimate the connection between soda consumption and the disease?
a. Unsupervised
* b. Supervised
c. Reinforcement

Suppose we have a collection of images of cars, which somebody has conveniently labelled as "red car", "blue car", etc. Now we want to build a system that can generate these labels for new car images.
a. Unsupervised
* b. Supervised
c. Reinforcement


7.
maze.png
Going back to the maze from the lectures, let's add a crazy twist to our randomized policy. Suppose that after arriving at B, the rat is immediately teleported to C 20% of the time, and the rest of the time the action selected by the uniformly random policy is executed at B.
What would the new values for B and C be?
* a. $$v(B) = 2.2; v(C) = 1$$
b. $$v(B) = 2.2; v(C) = 1.3$$
c. $$v(B) = 2.25; v(C) = 1.25$$
d. None of these
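Worked calculation for question 7, assuming the lecture's original values $$v(B) = 2.5$$ and $$v(C) = 1$$ (the values quoted in option (a) of question 8): with probability 0.2 the rat is teleported to C and collects that state's value, and with probability 0.8 it behaves as before, so $$v(B) = 0.2 \times 1 + 0.8 \times 2.5 = 2.2$$, while $$v(C) = 1$$ is unchanged because nothing at C has changed.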


8. Now let's make the situation symmetrical: teleportation can happen from B to C or from C to B, each with 20% probability (this could theoretically go on forever!). What are the new values for B and C?
a. $$v(B) = 2.5; v(C) = 1$$ (unchanged from original)
b. $$v(B) = 2.2; v(C) = 1.3$$
* c. $$v(B) = 2.25; v(C) = 1.25$$
d. None of these
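A minimal sketch of the calculation behind question 8, assuming the original no-teleport values 2.5 and 1 enter as the 80% terms; the two coupled equations are solved as a 2x2 linear system:

```python
import numpy as np

# Question 8: v(B) = 0.2*v(C) + 0.8*2.5 and v(C) = 0.2*v(B) + 0.8*1,
# where 2.5 and 1 are the original (no-teleport) values from the lecture.
A = np.array([[1.0, -0.2],
              [-0.2, 1.0]])
b = np.array([0.8 * 2.5, 0.8 * 1.0])

vB, vC = np.linalg.solve(A, b)
print(vB, vC)   # 2.25, 1.25 -> matches the marked answer (c)
```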


9.
Let's look at a twist on the same maze, but without the teleportation. First, let's define a new function called $$R$$, which stands for 'reward.' The reward function is defined as $$R(u,a) \rightarrow \mathbb{R}$$, which means that if we are in a given state and take a particular action, we are granted a certain real-valued reward. Notice that this differs only slightly from what we were doing in the lecture, where the states are locations in the maze and rewards are associated only with the four final states (the grey boxes). In that case, rewards are not tied to the actions themselves, only to the resulting states.
<br><br>
The reason we may want to extend the reward function in this way is that it allows us to do more interesting things. Let us say that we divided this maze up into a grid, where each grid cell is a state. Now A, B, and C are still states, as are the four final states, but we have numerous states in between them. Our set of actions allows us to move one state at a time, and backtrack as many times as we want. Given this revised model, let us say that we wanted to give a penalty proportional to the distance traveled. Check each of the following that would allow us to accomplish that in the context of our new reward function (note that some could be done with the old reward function too).
* a. Set $$R$$ equal to some constant $$r_{penalty}$$ for all states and actions excluding the final states, which will remain as they are marked in the maze for all actions.
* b. Using a reward function defined over only states, set $$R$$ equal to some constant $$r_{penalty}$$ for all states, excluding the final states, which will remain as they are marked presently.
c. Subtract some amount from each final state proportional to its distance from the maze entrance.
d. Set a penalty at B and C proportional to the sum of their distance from A and the neighboring final states.

10. Suppose instead that we wanted to penalize the time taken rather than the distance traveled. Let's say that the rat can stop to rest at any time. Which of the following would allow us to accomplish that?
* a. Set $$R$$ equal to some constant $$r_{penalty}$$ for all states and actions excluding the final states, which will remain as they are marked in the maze for all actions.
b. Using a reward function defined over only states, set $$R$$ equal to some constant $$r_{penalty}$$ for all states, excluding the final states, which will remain as they are marked presently.
c. Subtract some amount from each final state proportional to its distance from the maze entrance.
d. Set a penalty at B and C proportional to the sum of their distance from A and the neighboring final states.
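A minimal sketch of the extended reward function from questions 9 and 10, under option (a): a constant per-step penalty everywhere except the final states, which keep their marked rewards. The state names and numeric values below are hypothetical placeholders, not the actual maze rewards:

```python
# Hypothetical grid-maze reward function R(u, a) for questions 9-10.
# FINAL_REWARDS and R_PENALTY are illustrative placeholders.
FINAL_REWARDS = {"F1": 5.0, "F2": 0.0, "F3": 2.0, "F4": 0.0}   # rewards "as marked"
R_PENALTY = -0.1                                               # constant r_penalty

def R(u, a):
    """Reward for taking action a in state u (option a of questions 9-10)."""
    if u in FINAL_REWARDS:
        return FINAL_REWARDS[u]   # final states keep their marked rewards, for any action
    return R_PENALTY              # constant penalty for every other state-action pair
```

Because every action moves the rat one grid cell, the accumulated penalty is proportional to the distance traveled (question 9); and if a "rest" action is penalized on every step as well, the same constant penalizes the time taken (question 10).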