
Added reference to Basics in Jupyter installation

- Added consistent naming to the 3D Balance Ball environment
- Minor fixes to the Basics notebook
/develop-generalizationTraining-TrainerController
Marwan Mattar, 6 years ago
Current commit
095632d6
7 files changed, 130 insertions(+), 57 deletions(-)
  1. docs/Background-Jupyter.md (7 changes)
  2. docs/Getting-Started-with-Balance-Ball.md (17 changes)
  3. docs/Learning-Environment-Examples.md (4 changes)
  4. docs/ML-Agents-Overview.md (4 changes)
  5. docs/Readme.md (2 changes)
  6. docs/Training-PPO.md (2 changes)
  7. python/Basics.ipynb (151 changes)

docs/Background-Jupyter.md (7 changes)


# Jupyter
# Background: Jupyter
embedded visualizations. We provide one such notebook, `Basics.ipynb`, for
testing the Python API.
embedded visualizations. We provide one such notebook, `python/Basics.ipynb`, for
testing the Python control interface to a Unity build in the
[Getting Started with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md).
For a walkthrough of how to use Jupyter, see
[Running the Jupyter Notebook](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html)

docs/Getting-Started-with-Balance-Ball.md (17 changes)


# Getting Started with the 3D Balance Ball Example
# Getting Started with the 3D Balance Ball Environment
This tutorial walks through the end-to-end process of opening an ML-Agents
example environment in Unity, building the Unity executable, training an agent

environments or as ways to test new ML algorithms. After reading this tutorial,
you should be able to explore and build the example environments.
![Balance Ball](images/balance.png)
![3D Balance Ball](images/balance.png)
This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball contains
a number of platforms and balls (which are all copies of each other).

In order to install and set up ML-Agents, the Python dependencies and Unity,
see the [installation instructions](Installation.md).
## Understanding a Unity Environment (Balance Ball)
## Understanding a Unity Environment (3D Balance Ball)
An agent is an autonomous actor that observes and interacts with an
_environment_. In the context of Unity, an environment is a scene containing

The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
independent agent, but they all share the same brain. Balance Ball does this
independent agent, but they all share the same brain. 3D Balance Ball does this
to speed up training since all twelve agents contribute to training in parallel.
### Academy

## Training the Brain with Reinforcement Learning
Now that we have a Unity executable containing the simulation environment, we
can perform the training.
can perform the training. To first ensure that your environment and the Python
API work as expected, you can use the `python/Basics`
[Jupyter notebook](Background-Jupyter.md).
This notebook contains a simple walkthrough of the functionality of the API.
Within `Basics`, be sure to set `env_name` to the name of the environment file
you built earlier.
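That sanity check boils down to only a few lines of Python. A minimal sketch, assuming the `unityagents` package from this release and a Unity binary named `3DBall` sitting next to the notebook (`check_environment` is a hypothetical helper; adjust `env_name` to match your build):

```python
def check_environment(env_name="3DBall"):
    """Launch a built Unity environment, reset it once, and shut it down.

    Hypothetical helper: `env_name` must point at the Unity binary you built,
    and the `unityagents` package from this release must be installed.
    """
    from unityagents import UnityEnvironment  # imported lazily; needs a Unity build to run

    env = UnityEnvironment(file_name=env_name)
    default_brain = env.brain_names[0]  # 3D Balance Ball exposes a single brain
    env_info = env.reset(train_mode=True)[default_brain]
    print("Agent state looks like:", env_info.states[0])
    env.close()
```

If the environment and the Python API are wired up correctly, the call prints the academy summary followed by the first agent's observation vector.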
### Training with PPO

* Lesson - only interesting when performing
[curriculum training](Training-Curriculum-Learning.md).
This is not used in the 3d Balance Ball environment.
This is not used in the 3D Balance Ball environment.
* Cumulative Reward - The mean cumulative episode reward over all agents.
Should increase during a successful training session.
* Entropy - How random the decisions of the model are. Should slowly decrease

docs/Learning-Environment-Examples.md (4 changes)


* Visual Observations: 0
* Reset Parameters: None
## 3DBall
## 3DBall: 3D Balance Ball
![Balance Ball](images/balance.png)
![3D Balance Ball](images/balance.png)
* Set-up: A balance-ball task, where the agent controls the platform.
* Goal: The agent must balance the platform in order to keep the ball on it for as long as possible.

docs/ML-Agents-Overview.md (4 changes)


The
[Getting Started with the 3D Balance Ball Example](Getting-Started-with-Balance-Ball.md)
tutorial covers this training mode with the **Balance Ball** sample environment.
tutorial covers this training mode with the **3D Balance Ball** sample environment.
### Custom Training and Inference

To help you use ML-Agents, we've created several in-depth tutorials
for [installing ML-Agents](Installation.md),
[getting started](Getting-Started-with-Balance-Ball.md)
with a sample Balance Ball environment (one of our many
with the 3D Balance Ball environment (one of our many
[sample environments](Learning-Environment-Examples.md)) and
[making your own environment](Learning-Environment-Create-New.md).

docs/Readme.md (2 changes)


* [Installation & Set-up](Installation.md)
* [Background: Jupyter Notebooks](Background-Jupyter.md)
* [Docker Set-up (Experimental)](Using-Docker.md)
* [Getting Started with the Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
* [Getting Started with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md)
* [Example Environments](Learning-Environment-Examples.md)
## Creating Learning Environments

docs/Training-PPO.md (2 changes)


# Training with Proximal Policy Optimization
This section is still to be written. Refer to [Getting Started with the Balance Ball Environment](Getting-Started-with-Balance-Ball.md) for a walk-through of the PPO training process.
This section is still to be written. Refer to [Getting Started with the 3D Balance Ball Environment](Getting-Started-with-Balance-Ball.md) for a walk-through of the PPO training process.
## Best Practices when training with PPO

python/Basics.ipynb (151 changes)


"cell_type": "markdown",
"metadata": {},
"source": [
"# Unity ML Agents\n",
"# Unity ML-Agents\n",
"This notebook contains a walkthrough of the basic functions of the Python API for Unity ML Agents. For instructions on building a Unity environment, see [here](https://github.com/Unity-Technologies/ml-agents/wiki/Getting-Started-with-Balance-Ball)."
"This notebook contains a walkthrough of the basic functions of the Python API for Unity ML-Agents. For instructions on building a Unity environment, see [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md)."
]
},
{

"### 1. Load dependencies"
"### 1. Set environment parameters\n",
"\n",
"Be sure to set `env_name` to the name of the Unity environment file you want to launch."
"execution_count": null,
"metadata": {
"collapsed": true
},
"execution_count": 1,
"metadata": {},
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"from unityagents import UnityEnvironment\n",
"\n",
"%matplotlib inline"
"env_name = \"3DBall\" # Name of the Unity environment binary to launch\n",
"train_mode = True # Whether to run the environment in training or inference mode"
]
},
{

"### 2. Set environment parameters\n",
"\n",
"Be sure to set `env_name` to the name of the Unity environment file you want to launch."
"### 2. Load dependencies"
"execution_count": null,
"metadata": {
"collapsed": true
},
"execution_count": 2,
"metadata": {},
"env_name = \"3DBall\" # Name of the Unity environment binary to launch\n",
"train_mode = True # Whether to run the environment in training or inference mode"
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"from unityagents import UnityEnvironment\n",
"\n",
"%matplotlib inline"
]
},
{

},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:unityagents:\n",
"'Ball3DAcademy' started successfully!\n",
"Unity Academy name: Ball3DAcademy\n",
" Number of Brains: 1\n",
" Number of External Brains : 1\n",
" Lesson number : 0\n",
" Reset Parameters :\n",
"\t\t\n",
"Unity brain name: Ball3DBrain\n",
" Number of Visual Observations (per agent): 0\n",
" Vector Observation space type: continuous\n",
" Vector Observation space size (per agent): 8\n",
" Number of stacked Vector Observation: 3\n",
" Vector Action space type: continuous\n",
" Vector Action space size (per agent): 2\n",
" Vector Action descriptions: , \n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Unity Academy name: Ball3DAcademy\n",
" Number of Brains: 1\n",
" Number of External Brains : 1\n",
" Lesson number : 0\n",
" Reset Parameters :\n",
"\t\t\n",
"Unity brain name: Ball3DBrain\n",
" Number of Visual Observations (per agent): 0\n",
" Vector Observation space type: continuous\n",
" Vector Observation space size (per agent): 8\n",
" Number of stacked Vector Observation: 3\n",
" Vector Action space type: continuous\n",
" Vector Action space size (per agent): 2\n",
" Vector Action descriptions: , \n"
]
}
],
"source": [
"env = UnityEnvironment(file_name=env_name)\n",
"\n",

},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Agent state looks like: \n",
"[ 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. -0.01467304 -0.01468306 -0.52082086 4.\n",
" -0.79952097 0. 0. 0. ]\n"
]
}
],
"source": [
"# Reset the environment\n",
"env_info = env.reset(train_mode=train_mode)[default_brain]\n",

},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total reward this episode: 0.700000025331974\n",
"Total reward this episode: 1.500000037252903\n",
"Total reward this episode: 0.6000000238418579\n",
"Total reward this episode: 1.0000000298023224\n",
"Total reward this episode: 0.40000002086162567\n",
"Total reward this episode: 0.5000000223517418\n",
"Total reward this episode: 0.700000025331974\n",
"Total reward this episode: 1.1000000312924385\n",
"Total reward this episode: 0.9000000283122063\n",
"Total reward this episode: 1.1000000312924385\n"
]
}
],
"source": [
"for episode in range(10):\n",
" env_info = env.reset(train_mode=train_mode)[default_brain]\n",

},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"execution_count": 6,
"metadata": {},
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
"version": "3.6.3"
}
},
"nbformat": 4,