{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unity ML-Agents\n",
"## Environment Basics\n",
"This notebook contains a walkthrough of the basic functions of the Python API for Unity ML-Agents. For instructions on building a Unity environment, see [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Set environment parameters\n",
"\n",
"Be sure to set `env_name` to the name of the Unity environment file you want to launch."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"env_name = \"3DBall\" # Name of the Unity environment binary to launch\n",
"train_mode = True # Whether to run the environment in training or inference mode"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Load dependencies"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"from unityagents import UnityEnvironment\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Start the environment\n",
"`UnityEnvironment` launches and begins communication with the environment when instantiated.\n",
"\n",
"Environments contain _brains_ which are responsible for deciding the actions of their associated _agents_. Here we check for the first brain available, and set it as the default brain we will be controlling from Python."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:unityagents:\n",
"'Ball3DAcademy' started successfully!\n",
"Unity Academy name: Ball3DAcademy\n",
" Number of Brains: 1\n",
" Number of External Brains : 1\n",
" Lesson number : 0\n",
" Reset Parameters :\n",
"\t\t\n",
"Unity brain name: Ball3DBrain\n",
" Number of Visual Observations (per agent): 0\n",
" Vector Observation space type: continuous\n",
" Vector Observation space size (per agent): 8\n",
" Number of stacked Vector Observation: 3\n",
" Vector Action space type: continuous\n",
" Vector Action space size (per agent): 2\n",
" Vector Action descriptions: , \n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Unity Academy name: Ball3DAcademy\n",
" Number of Brains: 1\n",
" Number of External Brains : 1\n",
" Lesson number : 0\n",
" Reset Parameters :\n",
"\t\t\n",
"Unity brain name: Ball3DBrain\n",
" Number of Visual Observations (per agent): 0\n",
" Vector Observation space type: continuous\n",
" Vector Observation space size (per agent): 8\n",
" Number of stacked Vector Observation: 3\n",
" Vector Action space type: continuous\n",
" Vector Action space size (per agent): 2\n",
" Vector Action descriptions: , \n"
]
}
],
"source": [
"env = UnityEnvironment(file_name=env_name)\n",
"\n",
"# Examine environment parameters\n",
"print(str(env))\n",
"\n",
"# Set the default brain to work with\n",
"default_brain = env.brain_names[0]\n",
"brain = env.brains[default_brain]"
]
},
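{
"cell_type": "markdown",
"metadata": {},
"source": [
"The brain object also exposes the parameters printed above. As a minimal sketch, we can read them directly; `vector_action_space_type` and `vector_action_space_size` are used later in this notebook, while the observation-related attribute names (`vector_observation_space_size`, `num_stacked_vector_observations`) assume the `unityagents` 0.x `BrainParameters` naming and may differ in other versions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect the default brain's spaces. The observation-related attribute\n",
"# names assume the unityagents 0.x BrainParameters class.\n",
"print(\"Action space type:\", brain.vector_action_space_type)\n",
"print(\"Action space size:\", brain.vector_action_space_size)\n",
"print(\"Observation size (per agent):\", brain.vector_observation_space_size)\n",
"print(\"Stacked observations:\", brain.num_stacked_vector_observations)"
]
},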
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Examine the observation and state spaces\n",
"We can reset the environment to be provided with an initial set of observations and states for all the agents within the environment. In ML-Agents, _states_ refer to a vector of variables corresponding to relevant aspects of the environment for an agent. Likewise, _observations_ refer to a set of relevant pixel-wise visuals for an agent."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Agent state looks like: \n",
"[ 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. -0.01467304 -0.01468306 -0.52082086 4.\n",
" -0.79952097 0. 0. 0. ]\n"
]
}
],
"source": [
"# Reset the environment\n",
"env_info = env.reset(train_mode=train_mode)[default_brain]\n",
"\n",
"# Examine the state space for the default brain\n",
"print(\"Agent state looks like: \\n{}\".format(env_info.vector_observations[0]))\n",
"\n",
"# Examine the observation space for the default brain\n",
"for observation in env_info.visual_observations:\n",
" print(\"Agent observations look like:\")\n",
" if observation.shape[3] == 3:\n",
" plt.imshow(observation[0,:,:,:])\n",
" else:\n",
" plt.imshow(observation[0,:,:,0])"
]
},
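{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the state vector printed above has 24 entries: the per-agent observation size (8) multiplied by the number of stacked observations (3), as reported in the startup log. A minimal sanity check, again assuming the `unityagents` 0.x attribute names:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sanity check: stacked state length = observation size * stack count.\n",
"# Attribute names assume unityagents 0.x.\n",
"expected = brain.vector_observation_space_size * brain.num_stacked_vector_observations\n",
"print(\"Expected state length:\", expected)\n",
"print(\"Actual state length:  \", len(env_info.vector_observations[0]))"
]
},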
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. Take random actions in the environment\n",
"Once we restart an environment, we can step the environment forward and provide actions to all of the agents within the environment. Here we simply choose random actions based on the `action_space_type` of the default brain."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total reward this episode: 0.700000025331974\n",
"Total reward this episode: 1.500000037252903\n",
"Total reward this episode: 0.6000000238418579\n",
"Total reward this episode: 1.0000000298023224\n",
"Total reward this episode: 0.40000002086162567\n",
"Total reward this episode: 0.5000000223517418\n",
"Total reward this episode: 0.700000025331974\n",
"Total reward this episode: 1.1000000312924385\n",
"Total reward this episode: 0.9000000283122063\n",
"Total reward this episode: 1.1000000312924385\n"
]
}
],
"source": [
"for episode in range(10):\n",
" env_info = env.reset(train_mode=train_mode)[default_brain]\n",
" done = False\n",
" episode_rewards = 0\n",
" while not done:\n",
" if brain.vector_action_space_type == 'continuous':\n",
" env_info = env.step(np.random.randn(len(env_info.agents), \n",
" brain.vector_action_space_size))[default_brain]\n",
" else:\n",
" env_info = env.step(np.random.randint(0, brain.vector_action_space_size, \n",
" size=(len(env_info.agents))))[default_brain]\n",
" episode_rewards += env_info.rewards[0]\n",
" done = env_info.local_done[0]\n",
" print(\"Total reward this episode: {}\".format(episode_rewards))"
]
},
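{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch, we can also record each episode's total reward and plot the resulting curve with the `matplotlib` import from earlier. This version assumes a continuous action space, which matches 3DBall as reported in the startup log:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Repeat the random-action rollout, recording each episode's total reward.\n",
"# Assumes a continuous action space (true for 3DBall).\n",
"reward_history = []\n",
"for episode in range(10):\n",
"    env_info = env.reset(train_mode=train_mode)[default_brain]\n",
"    done = False\n",
"    episode_rewards = 0\n",
"    while not done:\n",
"        action = np.random.randn(len(env_info.agents), brain.vector_action_space_size)\n",
"        env_info = env.step(action)[default_brain]\n",
"        episode_rewards += env_info.rewards[0]\n",
"        done = env_info.local_done[0]\n",
"    reward_history.append(episode_rewards)\n",
"\n",
"plt.plot(reward_history)\n",
"plt.xlabel(\"Episode\")\n",
"plt.ylabel(\"Total reward\")\n",
"plt.show()"
]
},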
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. Close the environment when finished\n",
"When we are finished using an environment, we can close it with the function below."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"env.close()"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}