Recent Changes

Thursday, June 6

  1. page HW6 edited ... *[-1 1] *[1 1] ... input patterns u $$u$$ (a 5-element ... an output v $$v$$ (+1 or…
    ...
    *[-1 1]
    *[1 1]
    ...
    input patterns $$u$$ (a 5-element
    ...
    an output $$v$$ (+1 or
    ...
    Synaptic weights $$w_i$$ are given
    ...
    that apply)
    *$$w_3$$ should increase
    $$w_3$$ should decrease
    *the threshold should increase
    the threshold should decrease
    ...
    Typically we can have a single data set which we loop over multiple times, using each data point in the set to train the perceptron until its weight values appear to converge to something. One issue with this type of learning is that it can result in oscillations around the ideal estimate, so it never quite converges. In other words - it will bounce around the value we are looking for without ever falling within some stopping criteria (for instance - that the parameters stop changing much). As a result, we might run the training forever! <br><br>
    Suppose we had a single layer perceptron with a decision threshold (no sigmoid). What can we change to guarantee this doesn't happen?
    * a. Lower the
    b. Get more training data
    c. Randomize the order of the training data each time we loop over it
    ...
    The reason we may want to extend the reward function in this way is that it allows us to do more interesting things. Let us say that we divided this maze up into a grid, where each grid cell is a state. Now A, B, and C are still states, as are the four final states, but we have numerous states in between them. Our set of actions allows us to move one state at a time, and backtrack as many times as we want. Given this revised model, let us say that we wanted to give a penalty proportional to the distance traveled. Check each of the following that would allow us to accomplish that in the context of our new reward function (note that some could be done with the old reward function too).
    * a. Set $$R$$ equal to some constant $$r_{penalty}$$ for all states and actions excluding the final states, which will remain as they are marked in the maze for all actions.
    ...
    b. Using a reward function
    c. Subtract some amount from each final state proportional to its distance from the maze entrance.
    d. Set a penalty at B and C proportional to the sum of their distance from A and the neighboring final states.
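    As a rough illustration of option (a) above, here is a minimal sketch of such a reward function in Python. Everything in it is a placeholder: the final-state names, their rewards, and the value of $$r_{penalty}$$ are assumptions for illustration, not values taken from the actual maze figure.
        # Sketch of a reward function R(state, action) -> real number for the grid maze.
        # Final states keep their marked rewards for every action; every other
        # (state, action) pair gets a constant penalty, so the total penalty along a
        # path grows with the number of steps taken, i.e. with distance traveled.
        FINAL_REWARDS = {'F1': 5.0, 'F2': 0.0, 'F3': 2.0, 'F4': 0.0}  # assumed placeholder values
        R_PENALTY = -0.1                                              # assumed constant r_penalty

        def R(state, action):
            if state in FINAL_REWARDS:
                return FINAL_REWARDS[state]   # final states remain as marked, for all actions
            return R_PENALTY                  # constant penalty everywhere else
    With $$R$$ written this way, a path that takes n steps from non-final cells accumulates n times $$r_{penalty}$$, which is the distance-proportional penalty the question asks for.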
    10. Suppose instead that we wanted to penalize for time taken instead of distance traveled. Let's say that the rat can stop to rest at any time.
    * a. Set $$R$$ equal to some constant $$r_{penalty}$$ for all states and actions excluding the final states, which will remain as they are marked in the maze for all actions.
    b. Using a reward function
    c. Subtract some amount from each final state proportional to its distance from the maze entrance.
    d. Set a penalty at B and C proportional to the sum of their distance from A and the neighboring final states.
    1:00 pm
  2. page HW6 edited ... Typically we can have a single data set which we loop over multiple times, using each data poi…
    ...
    Typically we can have a single data set which we loop over multiple times, using each data point in the set to train the perceptron until its weight values appear to converge to something. One issue with this type of learning is that it can result in oscillations around the ideal estimate, so it never quite converges. In other words - it will bounce around the value we are looking for without ever falling within some stopping criteria (for instance - that the parameters stop changing much). As a result, we might run the training forever! <br><br>
    Suppose we had a single layer perceptron with a decision threshold (no sigmoid). What can we change to guarantee this doesn't happen?
    ...
    to a very small positive value
    b. Get more training data
    c. Randomize the order of the training data each time we loop over it
    d. We cannot guarantee it
    5. What if we guarantee the data is linearly separable, and we had an oscillation, but we still wanted to reduce it?
    * a. Lower the learning rate $$\eta$$ to a very small value
    b. Get more training data
    c. Randomize the order of the training data each time we loop over it
    d. None of these
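    For concreteness, here is a minimal sketch of the training loop the two questions above refer to: a threshold perceptron trained by repeatedly looping over one data set until the weights stop changing much. The data, learning rate, and stopping tolerance are made up for illustration; the shuffling and the shrinking $$\eta$$ simply show where options (c) and (a) would act, without claiming either is the intended answer.
        # Sketch: train a threshold perceptron by looping over a single data set until
        # the weights appear to converge. All data and constants here are made up.
        import random

        data = [([1.0, 2.0], +1), ([2.0, 1.5], +1), ([-1.0, -0.5], -1), ([-2.0, 0.5], -1)]
        w, mu = [0.0, 0.0], 0.0
        eta = 0.5

        for epoch in range(1000):
            old_w = list(w)
            random.shuffle(data)                  # option (c): randomize the order each pass
            for u, target in data:
                v = 1 if sum(wi * ui for wi, ui in zip(w, u)) > mu else -1
                err = target - v                  # delta between known and predicted output
                w = [wi + eta * err * ui for wi, ui in zip(w, u)]
                mu -= eta * err
            eta *= 0.99                           # option (a): gradually lower the learning rate
            if max(abs(a - b) for a, b in zip(w, old_w)) < 1e-6:  # stopping criterion
                break
    Whether such a loop is guaranteed to stop is exactly what the questions probe; the sketch only shows where the stopping criterion and the learning rate enter.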

    <<Variations: show them two out of these 6. Maybe break the 6 into two groups with one question of each type of learning so that they are guaranteed not to see the same question twice.>>
    5., 6.
    We have now talked explicitly about three types of learning: unsupervised, supervised, and reinforcement learning. Which is most appropriate depends on the structure of your problem and the data available. For each of the next two questions, pick the type of learning that is most appropriate for the problem:
    We are teaching an AI agent to play chess, but we do not have an expert around to show us the value of individual moves. The only feedback we get is at the end of the game when we discover that we have won or lost.
    ...
    * b. Supervised
    c. Reinforcement
    7.
    {maze.png}
    Going back to the maze from the lectures, let's add a crazy twist to our randomized policy. Suppose that after arriving at B, the rat is immediately teleported to C 20% of the time and the rest of the time, the action selected by the uniformly random policy is executed at B.
    ...
    c. $$v(B) = 2.25; v(C) = 1.25$$
    d. None of these
    8. Now let's
    a. $$v(B) = 2.5; v(C) = 1$$ (unchanged from original)
    b. $$v(B) = 2.2; v(C) = 1.3$$
    * c. $$v(B) = 2.25; v(C) = 1.25$$
    d. None of these
    9.
    Let's look at a twist on the same maze, but without the teleportation. First, let's define a new function called $$R$$, which stands for 'reward.' The reward function is defined as: $$R(u,a) \rightarrow \mathbb{R}$$, which means that if we are in a given state, and take a particular action, we will be granted a certain real-numbered reward. Notice that this differs only slightly from what we were doing in the lecture, where the states are locations on the maze, and rewards are associated only with the four final states (the grey boxes). In this case, rewards are not tied to the actions themselves, only to the resulting states.
    <br><br>
    ...
    c. Subtract some amount from each final state proportional to its distance from the maze entrance.
    d. Set a penalty at B and C proportional to the sum of their distance from A and the neighboring final states.
    10. Suppose instead
    * a. Set $$R$$ equal to some constant $$r_{penalty}$$ for all states and actions excluding the final states, which will remain as they are marked in the maze for all actions.
    b. Using the old reward function defined over only states, set $$R$$ equal to some constant $$r_{penalty}$$ for all states, excluding the final states, which will remain as they are marked presently.
    12:57 pm

Wednesday, June 5

  1. page HW6 edited ... d. Set a penalty at B and C proportional to the sum of their distance from A and the neighbori…
    ...
    d. Set a penalty at B and C proportional to the sum of their distance from A and the neighboring final states.
    11. Suppose instead that we wanted to penalize for time taken instead of distance traveled. Let's say that the rat can stop to rest at any time.
    * a. Set $$R$$
    b. Using the old reward function defined over only states, set $$R$$ equal to some constant $$r_{penalty}$$ for all states, excluding the final states, which will remain as they are marked presently.
    c. Subtract some amount from each final state proportional to its distance from the maze entrance.
    10:33 pm
  2. page HW6 edited ... 4. Learning with perceptrons appears similar to many other methods we learned - we take a delt…
    ...
    4. Learning with perceptrons appears similar to many other methods we learned - we take a delta between the known output and our predicted output and use that to determine the direction and scale of our adjustments to the model. Likewise with our multilayer sigmoid networks, we once again used gradient descent to learn parameters.
    <br><br>
    ...
    loop over multiple times, using
    ...
    until its weight values appear
    ...
    result in periodic oscillations around
    ...
    training forever! <br><br>
    Suppose we had a single layer perceptron with a decision threshold (no sigmoid). What can we change to guarantee this doesn't happen?
    a. Lower the learning rate $$\eta$$ to a very small value
    ...
    c. Randomize the order of the training data each time we loop over it
    * d. We cannot guarantee it
    ...
    had an oscillation, but we
    * a. Lower the learning rate $$\eta$$ to a very small value
    b. Get more training data
    ...
    6., 7.
    We have now talked explicitly about three types of learning: unsupervised, supervised, and reinforcement learning. Which is most appropriate depends on the structure of your problem and the data available. For each of the next two questions, pick the type of learning that is most appropriate for the problem:
    ...
    an AI agent to play
    a. Unsupervised
    b. Supervised
    ...
    b. Supervised
    c. Reinforcement
    ...
    problem a very large one,
    ...
    want to dimensionally reduce each
    * a. Unsupervised
    b. Supervised
    ...
    8.
    {maze.png}
    ...
    the lectures, let's add a
    ...
    Suppose that after arriving at B, the rat is immediately teleported to C 20% of the time and the rest of the time, the action selected by the uniformly random policy is executed at B. What would the
    * a. $$v(B) = 2.2; v(C) = 1$$
    b. $$v(B) = 2.2; v(C) = 1.3$$
    c. $$v(B) = 2.25; v(C) = 1.25$$
    d. None of these
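    A quick arithmetic check of the starred choice, assuming the original random-policy values from the lecture, $$v(B) = 2.5$$ and $$v(C) = 1$$ (the "unchanged from original" values listed in the next question): arriving at B now yields the usual B value with probability 0.8 and the C value with probability 0.2, while C itself is unaffected, so $$v(B) = 0.8 \cdot 2.5 + 0.2 \cdot 1 = 2.2$$ and $$v(C) = 1$$.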
    9. Now let's make the
    a. $$v(B) = 2.5; v(C) = 1$$ (unchanged from original)
    b. $$v(B) = 2.2; v(C) = 1.3$$
    ...
    d. None of these
    10.
    Let's look at
    ...
    teleportation. First, let's define a
    ...
    on the maze, and rewards
    <br><br>
    ...
    to move one state at
    ...
    in the maze for all
    * b. Using the old reward function defined over only states, set $$R$$ equal to some constant $$r_{penalty}$$ for all states, excluding the final states, which will remain as they are marked presently.
    c. Subtract some amount from each final state proportional to its distance from the maze entrance.
    d. Set a penalty at B and C proportional to the sum of their distance from A and the neighboring final states.
    ...
    distance traveled. Let's say that the rat can stop
    ...
    in the maze for all
    b. Using the old reward function defined over only states, set $$R$$ equal to some constant $$r_{penalty}$$ for all states, excluding the final states, which will remain as they are marked presently.
    c. Subtract some amount from each final state proportional to its distance from the maze entrance.
    9:45 pm
  3. page HW6 edited ... *[-1 1] *[1 1] ... perceptron receiving inputs from 5 other perceptrons inputs is bei…
    ...
    *[-1 1]
    *[1 1]
    ...
    perceptron receiving inputs is being
    ...
    of the inputs) into one
    ...
    categorization of the input pattern.
    *w_3 should increase
    w_3 should decrease
    9:11 pm
  4. page HW6 edited ... *In supervised learning, correct outputs are provided to the network 2. Give weights of a per…
    ...
    *In supervised learning, correct outputs are provided to the network
    2. Give weights of a perceptron, and several points above, below, and on the line. Ask the students to say the class of each one (a check box).
    ...
    a perceptron (perceptron A) that receives all its inputs u_1 and u_2 with synaptic
    ...
    output perceptron A is 0,
    ...
    input vectors (u = [u_1 u_2]) will lead
    [-1 -1]
    [1 -1]
    9:08 pm
  5. page HW6 edited ... 2. Give weights of a perceptron, and several points above, below, and on the line. Ask the stu…
    ...
    2. Give weights of a perceptron, and several points above, below, and on the line. Ask the students to say the class of each one (a check box).
    Recall the perceptron, which classifies inputs as belonging to one of two categories, and correspondingly outputs either +1 or -1. Consider a perceptron (perceptron A) that receives all its inputs from two other perceptrons B and C, with synaptic weights 1 and 3 respectively. If the threshold (mu) for the output perceptron A is 0, which of the following input vectors (u) showing inputs from perceptrons B and C respectively will lead to a positive output from perceptron A? (check all that apply)
    <br><br>
    [-1 -1]
    [1 -1]
    ...
    *the threshold should increase
    the threshold should decrease
    4. Learning with
    <br><br>
    Typically we can have a single data set which we loop over multiple times, using each data point in the set to train the perceptron until its values appear to converge to something. One issue with this type of learning is that it can result in periodic oscillations around the ideal estimate, so it never quite converges. In other words - it will bounce around the value we are looking for without ever falling within some stopping criteria (for instance - that the parameters stop changing much). As a result, we might run the training forever! (or close the program)<br><br>
    ...
    c. Randomize the order of the training data each time we loop over it
    * d. We cannot guarantee it
    5. What if
    * a. Lower the learning rate $$\eta$$ to a very small value
    b. Get more training data
    ...
    d. None of these
    <<Variations: show them two out of these 6. Maybe break the 6 into two groups with one question of each type of learning so that they are guaranteed not to see the same question twice.>>
    6., 7.
    We have now talked explicitly about three types of learning: unsupervised, supervised, and reinforcement learning. Which is most appropriate depends on the structure of your problem and the data available. For each of the next two questions, pick the type of learning that is most appropriate for the problem:
    We are teaching an AI to play chess, but we do not have an expert around to show us the value of individual moves. The only feedback we get is at the end of the game when we discover that we have won or lost.
    ...
    * b. Supervised
    c. Reinforcement
    8.
    {maze.png}
    Going back to the maze from the lectures, let's add a crazy twist to our randomized policy. Suppose that with a 20% probability, after arriving at B, we are teleported to C rather than having our action executed. What would the new values for B and C be?
    ...
    c. $$v(B) = 2.25; v(C) = 1.25$$
    d. None of these
    9. Now let's
    a. $$v(B) = 2.5; v(C) = 1$$ (unchanged from original)
    b. $$v(B) = 2.2; v(C) = 1.3$$
    * c. $$v(B) = 2.25; v(C) = 1.25$$
    d. None of these
    10.
    Let's look at a twist on the same maze, but without the teleportation. First, let's define a new function called $$R$$, which stands for 'reward.' The reward function is defined as: $$R(u,a) \rightarrow \mathbb{R}$$, which means that if we are in a given state, and take a particular action, we will be granted a certain real-numbered reward. Notice that this differs only slightly from what we were doing in the lecture, where the states are locations on the map, and rewards are associated only with the four final states (the grey boxes). In this case, rewards are not tied to the actions themselves, only to the resulting states.
    <br><br>
    ...
    c. Subtract some amount from each final state proportional to its distance from the maze entrance.
    d. Set a penalty at B and C proportional to the sum of their distance from A and the neighboring final states.
    11. Suppose instead
    a. Set $$R$$ equal to some constant $$r_{penalty}$$ for all states and actions excluding the final states, which will remain as they are marked in the map for all actions.
    b. Using the old reward function defined over only states, set $$R$$ equal to some constant $$r_{penalty}$$ for all states, excluding the final states, which will remain as they are marked presently.
    4:50 pm
  6. page HW6 edited 1. Which of the following is true of unsupervised and supervised learning during training of a netw…
    1. Which of the following is true of unsupervised and supervised learning during training of a network? (check all that apply)
    *In unsupervised learning,
    ...
    the network
    In unsupervised learning,
    ...
    the network
    *In supervised learning,
    ...
    the network
    *In supervised learning,
    2. Give weights of a perceptron, and several points above, below, and on the line. Ask the students to say the class of each one (a check box).
    Recall the perceptron, which classifies inputs as belonging to one of two categories, and correspondingly outputs either +1 or -1. Consider a perceptron (perceptron A) that receives all its inputs from two other perceptrons B and C, with synaptic weights 1 and 3 respectively. If the threshold (mu) for the output perceptron A is 0, which of the following input vectors (u) showing inputs from perceptrons B and C respectively will lead to a positive output from perceptron A? (check all that apply)
    <br><br>
    [-1 -1]
    [1 -1]
    *[-1 1]
    *[1 1]
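    A minimal sketch of the check behind question 2, assuming the usual decision rule that perceptron A outputs +1 when the weighted sum of its inputs exceeds the threshold and -1 otherwise (weights 1 and 3, threshold 0, as stated above):
        # Sketch: perceptron A with weights w = [1, 3] and threshold mu = 0.
        w = [1, 3]
        mu = 0

        def output(u):
            return 1 if sum(wi * ui for wi, ui in zip(w, u)) > mu else -1

        for u in ([-1, -1], [1, -1], [-1, 1], [1, 1]):
            print(u, output(u))
        # Weighted sums are -4, -2, 2, 4, so only [-1 1] and [1 1] give +1,
        # matching the starred options above.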
    3. Suppose a perceptron receiving inputs from 5 other perceptrons is being trained to categorize input patterns u (a 5-element vector consisting of either +1 or -1 for each of the input perceptrons) into one of two categories. The perceptron generates an output v (+1 or -1) corresponding to its categorization of this input pattern. Synaptic weights w_i are given in vector w. Now suppose that the perceptron has incorrectly classified the most recent input pattern [+1 -1 -1 +1 -1], outputting +1 when it should have given -1. According to the perceptron learning rule, which of the following adjustments should be made? (check all that apply)
    *w_3 should increase
    w_3 should decrease
    *the threshold should increase
    the threshold should decrease
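    A short worked version of question 3's update, assuming the standard perceptron learning rule $$\Delta w_i = \eta (v_{target} - v) u_i$$ with the threshold moved in the opposite direction, $$\Delta \mu = -\eta (v_{target} - v)$$; the learning rate and the sign convention are assumptions for illustration, not taken from the assignment:
        # Sketch of the single update implied by question 3. eta is an arbitrary placeholder.
        eta = 0.1
        u = [+1, -1, -1, +1, -1]
        v_target, v = -1, +1        # desired output -1, actual output +1
        err = v_target - v          # = -2

        dw = [eta * err * ui for ui in u]   # weight changes
        dmu = -eta * err                    # threshold change (assuming v = +1 when w.u > mu)

        print(dw)   # [-0.2, 0.2, 0.2, -0.2, 0.2]: w_3 (the third entry) increases
        print(dmu)  # 0.2: the threshold increases, consistent with the starred options
    Since u_3 = -1 and the error term is negative, their product is positive, which is why w_3 goes up even though the overall output needs to come down.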

    3. Learning with perceptrons appears similar to many other methods we learned - we take a delta between the known output and our predicted output and use that to determine the direction and scale of our adjustments to the model. Likewise with our multilayer sigmoid networks, we once again used gradient descent to learn parameters.
    <br><br>
    4:48 pm
  7. page HW6 edited Which 1. Which of the *a. In unsupervised learning, inputs are provided to the network b. In u…
    1. Which of the
    *a. In unsupervised learning, inputs are provided to the network
    b. In unsupervised learning, correct outputs are provided to the network
    *c. In supervised learning, inputs are provided to the network
    *d. In supervised learning, correct outputs are provided to the network
    2. Give weights
    Recall the perceptron, which classifies inputs as belonging to one of two categories, and correspondingly outputs either +1 or -1. Consider a perceptron (perceptron A) that receives all its inputs from two other perceptrons B and C, with synaptic weights 1 and 3 respectively. If the threshold (mu) for the output perceptron A is 0, which of the following input vectors (u) showing inputs from perceptrons B and C respectively will lead to a positive output from perceptron A? (check all that apply)
    <br><br>
    ...
    *c. [-1 1]
    *d. [1 1]
    3. Learning with
    <br><br>
    Typically we can have a single data set which we loop over multiple times, using each data point in the set to train the perceptron until its values appear to converge to something. One issue with this type of learning is that it can result in periodic oscillations around the ideal estimate, so it never quite converges. In other words - it will bounce around the value we are looking for without ever falling within some stopping criteria (for instance - that the parameters stop changing much). As a result, we might run the training forever! (or close the program)<br><br>
    ...
    c. Randomize the order of the training data each time we loop over it
    * d. We cannot guarantee it
    4. What if
    * a. Lower the learning rate $$\eta$$ to a very small value
    b. Get more training data
    ...
    d. None of these
    <<Variations: show them two out of these 6. Maybe break the 6 into two groups with one question of each type of learning so that they are guaranteed not to see the same question twice.>>
    5., 6.
    We have now talked explicitly about three types of learning: unsupervised, supervised, and reinforcement learning. Which is most appropriate depends on the structure of your problem and the data available. For each of the next two questions, pick the type of learning that is most appropriate for the problem:
    We are teaching an AI to play chess, but we do not have an expert around to show us the value of individual moves. The only feedback we get is at the end of the game when we discover that we have won or lost.
    ...
    * b. Supervised
    c. Reinforcement
    7.
    {maze.png}
    Going back to the maze from the lectures, let's add a crazy twist to our randomized policy. Suppose that with a 20% probability, after arriving at B, we are teleported to C rather than having our action executed. What would the new values for B and C be?
    ...
    c. $$v(B) = 2.25; v(C) = 1.25$$
    d. None of these
    8. Now let's
    a. $$v(B) = 2.5; v(C) = 1$$ (unchanged from original)
    b. $$v(B) = 2.2; v(C) = 1.3$$
    * c. $$v(B) = 2.25; v(C) = 1.25$$
    d. None of these
    9.
    Let's look at a twist on the same maze, but without the teleportation. First, let's define a new function called $$R$$, which stands for 'reward.' The reward function is defined as: $$R(u,a) \rightarrow \mathbb{R}$$, which means that if we are in a given state, and take a particular action, we will be granted a certain real-numbered reward. Notice that this differs only slightly from what we were doing in the lecture, where the states are locations on the map, and rewards are associated only with the four final states (the grey boxes). In this case, rewards are not tied to the actions themselves, only to the resulting states.
    <br><br>
    ...
    c. Subtract some amount from each final state proportional to its distance from the maze entrance.
    d. Set a penalty at B and C proportional to the sum of their distance from A and the neighboring final states.
    10. Suppose instead
    a. Set $$R$$ equal to some constant $$r_{penalty}$$ for all states and actions excluding the final states, which will remain as they are marked in the map for all actions.
    b. Using the old reward function defined over only states, set $$R$$ equal to some constant $$r_{penalty}$$ for all states, excluding the final states, which will remain as they are marked presently.
    3:48 pm
  8. page HW6 edited Which of the following is true of unsupervised and supervised learning during training of a network…
    Which of the following is true of unsupervised and supervised learning during training of a network? (check all that apply)
    *a. In unsupervised learning,
    ...
    the network
    b. In unsupervised learning,
    ...
    the network
    *c. In supervised learning,
    ...
    the network
    *d. In supervised learning,
    1. Give weights of a perceptron, and several points above, below, and on the line. Ask the students to say the class of each one (a check box).
    Recall the perceptron, which classifies inputs as belonging to one of two categories, and correspondingly outputs either +1 or -1. Consider a perceptron (perceptron A) that receives all its inputs from two other perceptrons B and C, with synaptic weights 1 and 3 respectively. If the threshold (mu) for the output perceptron A is 0, which of the following input vectors (u) showing inputs from perceptrons B and C respectively will lead to a positive output from perceptron A? (check all that apply)
    3:47 pm
