CS 370 SNHU Design Defense Project

Competencies

In this project, you will demonstrate your mastery of the following competencies:

  • Explain the basic concepts and techniques that pertain to artificial intelligence and intelligent systems
  • Analyze how algorithms are used in artificial intelligence to solve a variety of complex problems
    Scenario

    You are working as an AI developer for a gaming company. The company is developing a treasure hunt game in which the player needs to find the treasure before the pirates do. As an AI developer, you have been asked to design an intelligent agent for an NPC (non-player character) that represents the pirate. The pirate will need to navigate the game world, which consists of different pathways and obstacles, in order to find the treasure. The pirate agent's goal is to find the treasure before the human player does. This is commonly called a pathfinding problem, as the agent you create will need to find a path toward its goal.

    You have been provided with some starter code and a sample environment where your pirate agent will be placed. You will need to create a deep Q-learning algorithm to train your pirate agent. Finally, you have also been asked to write a design defense that demonstrates your understanding of the fundamental AI concepts involved in creating and training your intelligent agent.

    Directions

    Pirate Intelligent Agent

    As part of your project, you will create a pirate intelligent agent to meet the specifications that you have been given. Be sure to review any feedback that you received on your Project Two Milestone before submitting the final version of your intelligent agent. Follow these steps to complete your intelligent agent:

  • Before creating your pirate intelligent agent, be sure to review the Pirate Intelligent Agent Specifications document, located in the Supporting Materials section. This document provides details about the code that you have been given, and what aspects you will need to create.
  • Download the zipped folder containing your starter code and Jupyter Notebook files by using the link in the Supporting Materials section. Access the Virtual Lab (Apporto) by using the link in the Virtual Lab Access module. Upload the zipped folder into the Virtual Lab, unzip the folder, and upload the files into the Jupyter Notebook application. Use the tutorials in the Supporting Materials to help you with these tasks.
  • Be sure to review the starter code that you have been given. Watch the Project Two Walkthrough video, located in the Supporting Materials section, to help you understand this code in more detail. IMPORTANT: Do not modify any of the PY files that you have been given.
  • Complete the code for the Q-Training Algorithm section in your Jupyter Notebook. In order to successfully complete the code, you must develop code that meets the given specifications:
      • Complete the program for the intelligent agent so that it achieves its goal: the pirate should get the treasure.
      • Apply a deep Q-learning algorithm to solve a pathfinding problem.
      • Create functional code that runs without error.
      • Use industry standard best practices, such as in-line comments, to enhance readability and maintainability.

  • After you have finished creating the code for your notebook, save your work. Make sure that your notebook contains your name in the filename (such as "Doe_Jane_ProjectTwo.ipynb"). This will help your instructor access and grade your work easily. Be sure to download a copy of your notebook (IPYNB file) for your submission.
  • Design Defense

    As a part of your project, you will also submit a design defense. This design defense will demonstrate the approach you took in solving this problem, explain how the intelligent agent works, and evaluate the algorithm you chose to use. In order to adequately defend your designs, you will need to support your ideas with research from your readings. You must include citations for the sources that you used. Your design defense should address the following:

    Analyze the differences between human and machine approaches to solving problems.
      • Describe the steps a human being would take to solve this maze.
      • Describe the steps your intelligent agent is taking to solve this pathfinding problem.
      • What are the similarities and differences between these two approaches?

    Assess the purpose of the intelligent agent in pathfinding.
      • What is the difference between exploitation and exploration? What is the ideal proportion of exploitation and exploration for this pathfinding problem? Explain your reasoning.
      • How can reinforcement learning help the agent (the pirate) determine the path to its goal (the treasure)?

    Evaluate the use of algorithms to solve complex problems.
      • How did you implement deep Q-learning using neural networks for this game?

    CS 370 Pirate Intelligent Agent Specifications
    Agent Specifications
    • You will use the Python programming language for this project, as well as the TensorFlow and
    Keras libraries. These have been pre-installed in the Virtual Lab (Apporto).
    • The environment for your agent has already been designed as a maze (an 8×8 matrix) containing
    free cells (1), occupied cells (0), and a target cell (the 1 at the bottom right), as shown below:
    [ 1., 0., 1., 1., 1., 1., 1., 1.],
    [ 1., 0., 1., 1., 1., 0., 1., 1.],
    [ 1., 1., 1., 1., 0., 1., 0., 1.],
    [ 1., 1., 1., 0., 1., 1., 1., 1.],
    [ 1., 1., 0., 1., 1., 1., 1., 1.],
    [ 1., 1., 1., 0., 1., 0., 0., 0.],
    [ 1., 1., 1., 0., 1., 1., 1., 1.],
    [ 1., 1., 1., 1., 0., 1., 1., 1.]



    Your agent (pirate) should start at the top left. The agent can move in four directions: left, right,
    up, and down.
    The agent's rewards range from -1 point to 1 point. When the agent reaches the target, the reward
    is 1 point. Moving to an occupied cell results in a penalty of -0.75 points, and attempting to
    move outside the matrix boundary results in a penalty of -0.8 points. Each move from a cell to an
    adjacent cell incurs a small penalty of -0.04 points, primarily to discourage the agent from
    wandering aimlessly within the maze.
    A negative reward threshold has been defined for you in order to reduce training time, avoid
    infinite loops, and avoid unnecessary wandering.
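
    For orientation, the reward scheme described above could be written out roughly as follows. This is only an illustrative sketch: the constant and function names here are hypothetical, and the actual reward logic is already implemented for you in the provided environment class.

    # Illustrative sketch of the reward scheme described above.
    # Names are hypothetical; the provided environment class already implements this logic.
    REWARD_TARGET = 1.0           # the agent reaches the treasure cell
    PENALTY_BLOCKED = -0.75       # the agent moves into an occupied cell
    PENALTY_OUT_OF_BOUNDS = -0.8  # the agent attempts to leave the 8x8 matrix
    PENALTY_MOVE = -0.04          # an ordinary move to an adjacent free cell

    def sketch_reward(reached_target, left_boundary, hit_blocked_cell):
        if reached_target:
            return REWARD_TARGET
        if left_boundary:
            return PENALTY_OUT_OF_BOUNDS
        if hit_blocked_cell:
            return PENALTY_BLOCKED
        return PENALTY_MOVE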
    Provided Elements
    Below is a brief description of the different elements involved in the game. Several elements have
    already been given to you in the starter code. You will need to create the code for the Q-Training
    Algorithm section yourself.
    Environment
    (NOTE: You have been given this code)
    TreasureMaze.py contains complete code for your environment. It includes a maze object
    defined as a matrix. The provided code supports methods for resetting the pirate position, updating the
    state based on pirate movement, returning rewards based on agent movement guidelines, keeping track
    of the state and total reward based on agent action, determining the current environment state and
    game status, listing the valid actions from the current cell, and a visualization method for graphical
    display of environment and agent action.
    Experience for Replay
    (NOTE: You have been given this code)
    GameExperience.py contains complete code for experience replay. It stores the episodes, that is, all the states
    that come in between the initial state and the terminal state. These stored episodes are later used by the agent
    for learning by experience. The class supports methods for storing episodes in memory, predicting the next
    action based on the current environment state, and returning inputs and targets from memory based on a
    specified data size.
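
    To make the role of experience replay concrete, the snippet below sketches how such an object might be used during training. The method names remember and get_data come from the hints later in this document, and the constructor call matches the one in the Q-training skeleton; the exact arguments shown here (for example, the batch size passed to model.fit) are assumptions, not part of the provided code.

    # Sketch of how the experience replay object might be used (illustrative only).
    experience = GameExperience(model, max_memory=1000)

    # After each step of the game, store the transition as one episode:
    episode = [previous_envstate, action, reward, envstate, game_status]
    experience.remember(episode)

    # Periodically sample stored transitions and fit the network on them:
    inputs, targets = experience.get_data(data_size=32)
    model.fit(inputs, targets, epochs=8, batch_size=16, verbose=0)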
    Build Model
    (NOTE: You have been given this code)
    You have been given a complete implementation to build a neural network model in the
    TreasureHuntGame Jupyter notebook. Make sure to review the code and note the number of layers, as
    well as the activation, optimizer, and loss functions that are used to train the model.
    Q-Training Algorithm
    (NOTE: You will need to create this code)
    You have been given a skeleton implementation in the TreasureHuntGame Jupyter Notebook. Your task
    is to implement deep Q-learning. The goal of your deep Q-learning implementation is to find the best
    possible navigation sequence that results in reaching the treasure cell while maximizing the reward. In
    your implementation, you need to determine the optimal number of epochs to achieve a 100% win rate.
    Play Game
    (NOTE: You have been given this code)
    You have been given a complete implementation of this function in the TreasureHuntGame Jupyter
    notebook. This function helps you to determine whether the pirate can win any game at all. If your maze
    is not well designed, the pirate may not be able to win, in which case your training may not yield any
    result. The provided maze in this notebook ensures that there is a path to win and you can run this
    method to check.
    Read and Review Your Starter Code
    The theme of this project is a popular treasure hunt game in which the player needs to find the treasure before the pirate does. While you
    will not be developing the entire game, you will write the part of the game that represents the intelligent agent, which is a pirate in this case.
    The pirate will try to find the optimal path to the treasure using deep Q-learning.
    You have been provided with two Python classes and this notebook to help you with this assignment. The first class, TreasureMaze.py,
    represents the environment, which includes a maze object defined as a matrix. The second class, GameExperience.py, stores the
    episodes, that is, all the states that come in between the initial state and the terminal state. These stored episodes are later used by the
    agent for learning by experience (experience replay). This notebook shows how to play a game. Your task is to complete the deep
    Q-learning implementation for which a skeleton implementation has been provided. The code blocks you will need to complete have
    #TODO as a header.
    First, read and review the next few code and instruction blocks to understand the code that you have been given.
    In [1]: from __future__ import print_function
    import os, sys, time, datetime, json, random
    import numpy as np
    from keras.models import Sequential
    from keras.layers.core import Dense, Activation
    from keras.optimizers import SGD , Adam, RMSprop
    from keras.layers.advanced_activations import PReLU
    import matplotlib.pyplot as plt
    from TreasureMaze import TreasureMaze
    from GameExperience import GameExperience
    %matplotlib inline
    Using TensorFlow backend.
    The following code block contains an 8×8 matrix that will be used as a maze object:
    In [2]: maze = np.array([
        [ 1., 0., 1., 1., 1., 1., 1., 1.],
        [ 1., 0., 1., 1., 1., 0., 1., 1.],
        [ 1., 1., 1., 1., 0., 1., 0., 1.],
        [ 1., 1., 1., 0., 1., 1., 1., 1.],
        [ 1., 1., 0., 1., 1., 1., 1., 1.],
        [ 1., 1., 1., 0., 1., 0., 0., 0.],
        [ 1., 1., 1., 0., 1., 1., 1., 1.],
        [ 1., 1., 1., 1., 0., 1., 1., 1.]
    ])
    This helper function allows a visual representation of the maze object:
    In [3]: def show(qmaze):
        plt.grid('on')
        nrows, ncols = qmaze.maze.shape
        ax = plt.gca()
        ax.set_xticks(np.arange(0.5, nrows, 1))
        ax.set_yticks(np.arange(0.5, ncols, 1))
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        canvas = np.copy(qmaze.maze)
        for row, col in qmaze.visited:
            canvas[row, col] = 0.6
        pirate_row, pirate_col, _ = qmaze.state
        canvas[pirate_row, pirate_col] = 0.3   # pirate cell
        canvas[nrows-1, ncols-1] = 0.9         # treasure cell
        img = plt.imshow(canvas, interpolation='none', cmap='gray')
        return img
    The pirate agent can move in four directions: left, right, up, and down.
    While the agent primarily learns by experience through exploitation, it can also choose to explore the environment to find
    previously undiscovered paths. This is called "exploration" and is controlled by epsilon. Epsilon is typically set to a low value such as
    0.1, which means that, on average, for every ten attempts the agent will learn by experience (exploit) nine times and will randomly
    explore a new path one time. You are encouraged to try various values for the exploration factor and see how the algorithm performs.
    In [4]: LEFT = 0
    UP = 1
    RIGHT = 2
    DOWN = 3

    # Exploration factor
    epsilon = 0.1

    # Actions dictionary
    actions_dict = {
        LEFT: 'left',
        UP: 'up',
        RIGHT: 'right',
        DOWN: 'down',
    }

    num_actions = len(actions_dict)
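
    As a concrete illustration of the epsilon-greedy choice described above, the action-selection step inside a training loop might look roughly like the sketch below. This is not part of the provided code: valid_actions and predict are the method names hinted at elsewhere in this notebook, and calling valid_actions with no argument (for the current cell) is an assumption.

    # Epsilon-greedy action selection (illustrative sketch only).
    # With probability epsilon the agent explores a random valid action;
    # otherwise it exploits the model's current best estimate of the Q-values.
    if np.random.rand() < epsilon:
        action = random.choice(qmaze.valid_actions())       # explore
    else:
        action = np.argmax(experience.predict(envstate))     # exploit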
    The sample code block and output below show creating a maze object and performing one action (DOWN), which returns the reward. The
    resulting updated environment is visualized.
    In [5]: qmaze = TreasureMaze(maze)
    canvas, reward, game_over = qmaze.act(DOWN)
    print("reward=", reward)
    show(qmaze)

    reward= -0.04
    Out[5]: (maze visualization)
    This function simulates a full game based on the provided trained model. The other parameters include the TreasureMaze object and the
    starting position of the pirate.
    In [6]: def play_game(model, qmaze, pirate_cell):
        qmaze.reset(pirate_cell)
        envstate = qmaze.observe()
        while True:
            prev_envstate = envstate
            # get next action
            q = model.predict(prev_envstate)
            action = np.argmax(q[0])
            # apply action, get reward and new state
            envstate, reward, game_status = qmaze.act(action)
            if game_status == 'win':
                return True
            elif game_status == 'lose':
                return False
    This function helps you to determine whether the pirate can win any game at all. If your maze is not well designed, the pirate may not win
    any game at all. In this case, your training would not yield any result. The provided maze in this notebook ensures that there is a path to
    win and you can run this method to check.
    In [7]: def completion_check(model, qmaze):
        for cell in qmaze.free_cells:
            if not qmaze.valid_actions(cell):
                return False
            if not play_game(model, qmaze, cell):
                return False
        return True
    The code you have been given in this block will build the neural network model. Review the code and note the number of layers, as well as
    the activation, optimizer, and loss functions that are used to train the model.
    In [8]: def build_model(maze):
        model = Sequential()
        model.add(Dense(maze.size, input_shape=(maze.size,)))
        model.add(PReLU())
        model.add(Dense(maze.size))
        model.add(PReLU())
        model.add(Dense(num_actions))
        model.compile(optimizer='adam', loss='mse')
        return model
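
    As an optional sanity check (not part of the graded notebook), you can build the model and print its architecture before training; model.summary() is a standard Keras method. With the 8x8 maze, maze.size is 64, so you should see two Dense(64) layers, each followed by a PReLU activation, and a final Dense(4) output layer (one unit per action).

    # Optional: inspect the network before training.
    model = build_model(maze)
    model.summary()   # expect Dense(64) -> PReLU -> Dense(64) -> PReLU -> Dense(4)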
    This is your deep Q-learning implementation. The goal of your deep Q-learning implementation is to find the best possible navigation
    sequence that results in reaching the treasure cell while maximizing the reward. In your implementation, you need to determine the optimal
    number of epochs to achieve a 100% win rate.
    You will need to complete the section starting with #pseudocode. The pseudocode has been included for you.
    In [9]: def qtrain(model, maze, **opt):
        # exploration factor
        global epsilon
        # number of epochs
        n_epoch = opt.get('n_epoch', 15000)
        # maximum memory to store episodes
        max_memory = opt.get('max_memory', 1000)
        # maximum data size for training
        data_size = opt.get('data_size', 50)
        # start time
        start_time = datetime.datetime.now()
        # Construct environment/game from numpy array: maze (see above)
        qmaze = TreasureMaze(maze)
        # Initialize experience replay object
        experience = GameExperience(model, max_memory=max_memory)
        win_history = []             # history of win/lose game
        hsize = qmaze.maze.size//2   # history window size
        win_rate = 0.0

        # pseudocode:
        # For each epoch:
        #     Agent_cell = randomly select a free cell
        #     Reset the maze with agent set to above position
        #       Hint: Review the reset method in the TreasureMaze.py class.
        #     envstate = Environment.current_state
        #       Hint: Review the observe method in the TreasureMaze.py class.
        #     While state is not game over:
        #         previous_envstate = envstate
        #         Action = randomly choose action (left, right, up, down) either by exploration or by exploitation
        #         envstate, reward, game_status = qmaze.act(action)
        #           Hint: Review the act method in the TreasureMaze.py class.
        #         episode = [previous_envstate, action, reward, envstate, game_status]
        #         Store episode in Experience replay object
        #           Hint: Review the remember method in the GameExperience.py class.
        #         Train neural network model and evaluate loss
        #           Hint: Call GameExperience.get_data to retrieve training data (input and target) and pass to the
        #           model.fit method to train the model. You can call model.evaluate to determine loss.
        #     If the win rate is above the threshold and your model passes the completion check, that would be your epoch.

        # Print the epoch, loss, episodes, win count, and win rate for each epoch
            dt = datetime.datetime.now() - start_time
            t = format_time(dt.total_seconds())
            template = "Epoch: {:03d}/{:d} | Loss: {:.4f} | Episodes: {:d} | Win count: {:d} | Win rate: {:.3f} | time: {}"
            print(template.format(epoch, n_epoch-1, loss, n_episodes, sum(win_history), win_rate, t))
            # We simply check if training has exhausted all free cells and if in all
            # cases the agent won.
            if win_rate > 0.9: epsilon = 0.05
            if sum(win_history[-hsize:]) == hsize and completion_check(model, qmaze):
                print("Reached 100%% win rate at epoch: %d" % (epoch,))
                break

        # Determine the total time for training
        dt = datetime.datetime.now() - start_time
        seconds = dt.total_seconds()
        t = format_time(seconds)
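
    For orientation only, one possible shape of the completed epoch loop is sketched below. Your graded implementation must be your own work and should follow the pseudocode and the provided class methods; the method names used here (free_cells, reset, observe, valid_actions, act, remember, predict, get_data) are taken from the hints and code above, while details such as calling valid_actions with no argument and the inner epochs/batch_size passed to model.fit are assumptions you will need to adapt to the starter code.

    # Illustrative sketch of the epoch loop (not a drop-in solution).
    for epoch in range(n_epoch):
        loss = 0.0
        agent_cell = random.choice(qmaze.free_cells)   # randomly select a free cell
        qmaze.reset(agent_cell)                        # reset the maze with the agent at that cell
        envstate = qmaze.observe()                     # current environment state
        n_episodes = 0
        game_over = False

        while not game_over:
            previous_envstate = envstate
            # Choose an action by exploration (epsilon) or exploitation (model prediction)
            if np.random.rand() < epsilon:
                action = random.choice(qmaze.valid_actions())
            else:
                action = np.argmax(experience.predict(previous_envstate))

            # Apply the action and observe the outcome
            envstate, reward, game_status = qmaze.act(action)
            if game_status == 'win':
                win_history.append(1)
                game_over = True
            elif game_status == 'lose':
                win_history.append(0)
                game_over = True

            # Store the episode and train on a sample of stored experience
            episode = [previous_envstate, action, reward, envstate, game_status]
            experience.remember(episode)
            n_episodes += 1

            inputs, targets = experience.get_data(data_size=data_size)
            model.fit(inputs, targets, epochs=8, batch_size=16, verbose=0)
            loss = model.evaluate(inputs, targets, verbose=0)

        if len(win_history) > hsize:
            win_rate = sum(win_history[-hsize:]) / hsize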
    Test Your Model
    Now we will start testing the deep Q-learning implementation. To begin, select Cell, then Run All from the menu bar. This will run your
    notebook. As it runs, you should see output begin to appear beneath the next few cells. The code below creates an instance of
    TreasureMaze.
    In [10]: qmaze = TreasureMaze(maze)
    show(qmaze)
    Out[10]: (maze visualization)
    In the next code block, you will build your model and train it using deep Q-learning. Note: This step takes several minutes to fully run.
    (Also note that qtrain reads the keyword argument n_epoch, so the epochs=1000 argument passed below is ignored and the default of
    15,000 epochs is used, which is why the output shows epoch counts out of 14999.)
    In [11]: model = build_model(maze)
    qtrain(model, maze, epochs=1000, max_memory=8*maze.size, data_size=32)
    Epoch: 000/14999 | Loss: 0.0017 | Episodes: 148 | Win count: 0 | Win rate: 0.000 | time: 12.5 seconds
    Epoch: 001/14999 | Loss: 0.0017 | Episodes: 145 | Win count: 0 | Win rate: 0.000 | time: 23.9 seconds
    Epoch: 002/14999 | Loss: 0.0018 | Episodes: 142 | Win count: 0 | Win rate: 0.000 | time: 35.4 seconds
    Epoch: 003/14999 | Loss: 0.0013 | Episodes: 7 | Win count: 1 | Win rate: 0.000 | time: 36.0 seconds
    Epoch: 004/14999 | Loss: 0.0013 | Episodes: 1 | Win count: 2 | Win rate: 0.000 | time: 36.1 seconds
    Epoch: 005/14999 | Loss: 0.0391 | Episodes: 134 | Win count: 2 | Win rate: 0.000 | time: 46.5 seconds
    Epoch: 006/14999 | Loss: 0.0052 | Episodes: 139 | Win count: 2 | Win rate: 0.000 | time: 57.5 seconds
    Epoch: 007/14999 | Loss: 0.0038 | Episodes: 144 | Win count: 2 | Win rate: 0.000 | time: 68.9 seconds
    Epoch: 008/14999 | Loss: 0.0025 | Episodes: 73 | Win count: 3 | Win rate: 0.000 | time: 74.6 seconds
    Epoch: 009/14999 | Loss: 0.0105 | Episodes: 11 | Win count: 4 | Win rate: 0.000 | time: 75.5 seconds
    Epoch: 010/14999 | Loss: 0.0088 | Episodes: 10 | Win count: 5 | Win rate: 0.000 | time: 76.3 seconds
    Epoch: 011/14999 | Loss: 0.0053 | Episodes: 139 | Win count: 5 | Win rate: 0.000 | time: 87.4 seconds
    Epoch: 012/14999 | Loss: 0.0112 | Episodes: 137 | Win count: 5 | Win rate: 0.000 | time: 98.4 seconds
    Epoch: 013/14999 | Loss: 0.0014 | Episodes: 142 | Win count: 5 | Win rate: 0.000 | time: 109.6 seconds
    Epoch: 014/14999 | Loss: 0.0016 | Episodes: 146 | Win count: 5 | Win rate: 0.000 | time: 121.1 seconds
    Epoch: 015/14999 | Loss: 0.0046 | Episodes: 46 | Win count: 6 | Win rate: 0.000 | time: 124.8 seconds
    Epoch: 016/14999 | Loss: 0.0044 | Episodes: 15 | Win count: 7 | Win rate: 0.000 | time: 126.0 seconds
    Epoch: 017/14999 | Loss: 0.0048 | Episodes: 7 | Win count: 8 | Win rate: 0.000 | time: 126.5 seconds
    Epoch: 018/14999 | Loss: 0.0046 | Episodes: 5 | Win count: 9 | Win rate: 0.000 | time: 127.0 seconds
    Epoch: 019/14999 | Loss: 0.0307 | Episodes: 143 | Win count: 9 | Win rate: 0.000 | time: 138.2 seconds
    Epoch: 020/14999 | Loss: 0.0268 | Episodes: 2 | Win count: 10 | Win rate: 0.000 | time: 138.4 seconds
    Epoch: 021/14999 | Loss: 0.0022 | Episodes: 12 | Win count: 11 | Win rate: 0.000 | time: 139.4 seconds
    Epoch: 022/14999 | Loss: 0.0024 | Episodes: 144 | Win count: 11 | Win rate: 0.000 | time: 150.8 seconds
    Epoch: 023/14999 | Loss: 0.0161 | Episodes: 142 | Win count: 11 | Win rate: 0.000 | time: 162.3 seconds
    Epoch: 024/14999 | Loss: 0.0028 | Episodes: 143 | Win count: 11 | Win rate: 0.000 | time: 173.5 seconds
    Epoch: 025/14999 | Loss: 0.0205 | Episodes: 7 | Win count: 12 | Win rate: 0.000 | time: 174.1 seconds
    Epoch: 026/14999 | Loss: 0.0021 | Episodes: 143 | Win count: 12 | Win rate: 0.000 | time: 185.4 seconds
    Epoch: 027/14999 | Loss: 0.0044 | Episodes: 1 | Win count: 13 | Win rate: 0.000 | time: 185.5 seconds
    Epoch: 028/14999 | Loss: 0.0413 | Episodes: 141 | Win count: 13 | Win rate: 0.000 | time: 196.9 seconds
    Epoch: 029/14999 | Loss: 0.0058 | Episodes: 4 | Win count: 14 | Win rate: 0.000 | time: 197.3 seconds
    Epoch: 030/14999 | Loss: 0.0346 | Episodes: 140 | Win count: 14 | Win rate: 0.000 | time: 209.7 seconds
    Epoch: 031/14999 | Loss: 0.0026 | Episodes: 3 | Win count: 15 | Win rate: 0.000 | time: 210.0 seconds
    Epoch: 032/14999 | Loss: 0.0022 | Episodes: 144 | Win count: 15 | Win rate: 0.469 | time: 222.2 seconds
    Epoch: 033/14999 | Loss: 0.0743 | Episodes: 15 | Win count: 16 | Win rate: 0.500 | time: 223.4 seconds
    Epoch: 034/14999 | Loss: 0.0366 | Episodes: 5 | Win count: 17 | Win rate: 0.531 | time: 223.8 seconds
    (remaining epoch output omitted)
    Out[11]: 631.285955
    This cell will check to see if the model passes the completion check. Note: This could take several minutes.
    In [12]: completion_check(model, qmaze)
    show(qmaze)
    Out[12]: (maze visualization)
    This cell will test your model for one game. It will start the pirate at the top-left corner and run play_game. The agent should find a path
    from the starting position to the target (treasure). The treasure is located in the bottom-right corner.
    In [13]: pirate_start = (0, 0)
    play_game(model, qmaze, pirate_start)
    show(qmaze)
    Out[13]: (maze visualization)
    Save and Submit Your Work
    After you have finished creating the code for your notebook, save your work. Make sure that your notebook contains your name in the
    filename (e.g. Doe_Jane_ProjectTwo.ipynb). This will help your instructor access and grade your work easily. Download a copy of your
    IPYNB file and submit it to Brightspace. Refer to the Jupyter Notebook in Apporto Tutorial if you need help with these tasks.
