
Merge branch 'master' of https://github.com/Unity-Technologies/ml-agents into dev-broadcast

/develop-generalizationTraining-TrainerController
vincentpierre 7 years ago
Current commit
431fc43c
6 files changed, 22 insertions and 47 deletions
  1. docs/Getting-Started-with-Balance-Ball.md (48 changes)
  2. docs/Unity-Agents---Python-API.md (10 changes)
  3. docs/installation.md (4 changes)
  4. python/ppo.py (4 changes)
  5. python/ppo/history.py (1 change)
  6. python/ppo/trainer.py (2 changes)

docs/Getting-Started-with-Balance-Ball.md (48 changes)


Let's get started!
## Getting Unity ML Agents
### Start by installing **Unity 2017.1** or later (required)
Download link available [here](https://store.unity.com/download?ref=update).
If you are new to using the Unity Editor, you can find the general documentation [here](https://docs.unity3d.com/Manual/index.html).
## Installation
### Clone the repository
Once installed, you will want to clone the Agents GitHub repository. References will be made throughout to `unity-environment` and `python` directories. Both are located at the root of the repository.
In order to install and set-up the Python and Unity environments, see the instructions [here](installation.md).
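If you have not cloned the repository yet, one way to do it (using the repository URL from the commit message above) is:
`git clone https://github.com/Unity-Technologies/ml-agents.git`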
## Building Unity Environment
Launch the Unity Editor, and log in, if necessary.

3. Expand the `Ball3DAcademy` GameObject and locate its child object `Ball3DBrain` within the Scene hierarchy in the editor. Ensure Type of Brain for this object is set to `External`.
4. *File -> Build Settings*
5. Choose your target platform:
- (optional) Select “Development Build” to log debug messages.
## Installing Python API
In order to train an agent within the framework, you will need to install Python 2 or 3, and the dependencies described below.
### Windows Users
If you are a Windows user who is new to Python/TensorFlow, follow [this guide](https://nitishmutha.github.io/tensorflow/2017/01/22/TensorFlow-with-gpu-for-windows.html) to set up your Python environment.
### Requirements
* Jupyter
* Matplotlib
* numpy
* Pillow
* Python (2 or 3)
* TensorFlow (1.0+)
### Installing Dependencies
To install dependencies, go into the `python` directory and run:
`pip install .`
or
`pip3 install .`
If your Python environment doesn't include `pip`, see these [instructions](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers) on installing it.
Once dependencies are installed, you are ready to test the Ball balance environment from Python.
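As an optional sanity check (not part of the original guide), you can confirm that the core dependencies listed above import cleanly before moving on:

```python
# Optional check that the main dependencies from the requirements list are available.
import numpy as np
import matplotlib
import tensorflow as tf
from PIL import Image  # provided by Pillow

print("numpy:", np.__version__)
print("matplotlib:", matplotlib.__version__)
print("tensorflow:", tf.__version__)  # should be 1.0 or later
```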
## Training the Brain with Reinforcement Learning
### Testing Python API

To ensure that your environment and the Python API work as expected, you can use the `python/Basics` Jupyter notebook. This notebook contains a simple walkthrough of the functionality of the API. Within `Basics`, be sure to set `env_name` to the name of the environment file you built earlier.
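If you prefer a plain script to the notebook, the same smoke test boils down to the sketch below. It assumes the `unityagents` package that `pip install .` sets up, the `UnityEnvironment` API described in `Unity-Agents---Python-API.md`, and that `reset` returns a brain-name-to-BrainInfo dictionary; replace `"3DBall"` with the name of the environment file you built.

```python
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="3DBall")  # placeholder: use your built env_name
print(str(env))                             # summary of the connected environment

brain_name = env.brain_names[0]             # first external brain
info = env.reset()[brain_name]              # BrainInfo for that brain
print(brain_name, info.states)              # current agent states

env.close()                                 # shut down the environment and the socket
```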
### Training with PPO
In order to train an agent to correctly balance the ball, we will use a Reinforcement Learning algorithm called Proximal Policy Optimization (PPO). This is a method that has been shown to be safe, efficient, and more general purpose than many other RL algorithms, as such we have chosen it as the example algorithm for use with ML Agents. For more information on PPO, OpenAI has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/) explaining it.
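For reference only (the following equation comes from the PPO paper and the blog post linked above, not from this guide): PPO maximizes a clipped surrogate objective,

```latex
L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\;
\operatorname{clip}\left(r_t(\theta),\,1-\varepsilon,\,1+\varepsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}
```

where r_t(θ) is the probability ratio between the updated and previous policies, Â_t is an advantage estimate, and ε is the clipping range. Clipping the ratio is what keeps each update conservative relative to the previous policy, which is the "safe" property mentioned above.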

### Observing Training Progress
In order to observe the training process in more detail, you can use Tensorboard.
In your command line, change into the `python` directory and then run:
`tensorboard --logdir=summaries`
Then navigate to `localhost:6006`.

4. Select the `3DBallBrain` object from the Scene hierarchy.
5. Change the `Type of Brain` to `Internal`.
6. Drag the `<env_name>.bytes` file from the Project window of the Editor to the `Graph Model` placeholder in the `3DBallBrain` inspector window.
7. Set the `Graph Placeholder` size to 1 (_Note: steps 7 and 8 are needed because 3DBall is a continuous control environment, and the TensorFlow model requires a noise parameter to decide actions. In discrete control cases, epsilon is not needed_).
8. Add a placeholder called `epsilon` with a type of `floating point` and a range of values from 0 to 0.
9. Press the Play button at the top of the editor.

docs/Unity-Agents---Python-API.md (10 changes)


- `train_model` indicates whether to run the environment in train (`True`) or test (`False`) mode.
- `config` is an optional dictionary of configuration flags specific to the environment. For more information on adding optional config flags to an environment, see [here](Making-a-new-Unity-Environment.md#implementing-yournameacademy). For generic environments, `config` can be ignored. `config` is a dictionary of strings to floats where the keys are the names of the `resetParameters` and the values are their corresponding float values.
- **Step : `env.step(action, memory=None, value = None)`**
Sends a step signal to the environment using the actions. For each brain:
- `value` is an optional input that can be used to send a single float per agent, to be displayed if an `AgentMonitor.cs` component is attached to the agent.
Note that if you have more than one external brain in the environment, you must provide dictionaries from brain names to arrays for `action`, `memory` and `value`. For example, if you have two external brains named `brain1` and `brain2`, each with one agent taking two continuous actions, then you can have:
```python
action = {'brain1':[1.0, 2.0], 'brain2':[3.0,4.0]}
```
Returns a dictionary mapping brain names to BrainInfo objects.
- **Close : `env.close()`**
Sends a shutdown signal to the environment and closes the communication socket.
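Putting the calls above together, a minimal interaction loop could look like the sketch below. It assumes the `unityagents` package installed from the `python` directory, that `reset` returns the same brain-name-to-BrainInfo dictionary as `step`, and reuses the two-brain action dictionary from the example above; the environment file name is a placeholder.

```python
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="MyEnvironment")  # placeholder environment binary
info = env.reset()                                 # dict: brain name -> BrainInfo

for _ in range(100):
    # With two external brains, actions (and optionally memory/value) are dictionaries per brain.
    action = {'brain1': [1.0, 2.0], 'brain2': [3.0, 4.0]}
    info = env.step(action)                        # again a dict: brain name -> BrainInfo

env.close()                                        # shutdown signal, closes the socket
```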

docs/installation.md (4 changes)


* Matplotlib
* numpy
* Pillow
* Python (2 or 3; 64-bit required)
To install dependencies, go into the `python` sub-directory of the repository, and run (depending on your Python version) from the command line:
`pip install .`

python/ppo.py (4 changes)


--hidden-units=<n> Number of units in hidden layer [default: 64].
--batch-size=<n> How many experiences per gradient descent update step [default: 64].
--keep-checkpoints=<n> How many model checkpoints to keep [default: 5].
--worker-id=<n> Number to add to communication port (5005). Used for asynchronous agent scenarios [default: 0].
'''
options = docopt(_USAGE)

save_freq = int(options['--save-freq'])
env_name = options['<env>']
keep_checkpoints = int(options['--keep-checkpoints'])
worker_id = int(options['--worker-id'])
# Algorithm-specific parameters for tuning
gamma = float(options['--gamma'])

hidden_units = int(options['--hidden-units'])
batch_size = int(options['--batch-size'])
env = UnityEnvironment(file_name=env_name, worker_id=worker_id)
print(str(env))
brain_name = env.brain_names[0]
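The newly added `worker-id` option makes it possible to run several trainers at the same time, each talking to its own copy of the environment on a separate port (5005 plus the worker id). The full usage string is truncated in this excerpt, so the exact invocation is an assumption, but it would look roughly like `python3 ppo.py <env> --worker-id=0` in one terminal and `python3 ppo.py <env> --worker-id=1` in another.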

python/ppo/history.py (1 change)


:return: Randomized buffer
"""
# Generate a random permutation of experience indices, then apply the same
# ordering to every non-empty field of the buffer so its entries stay aligned.
s = np.arange(global_buffer[history_keys[2]].shape[0])
np.random.shuffle(s)
for key in history_keys:
    if len(global_buffer[key]) > 0:
        global_buffer[key] = global_buffer[key][s]

python/ppo/trainer.py (2 changes)


idx = info.agents.index(agent)
if not info.local_done[idx]:
    if self.use_observations:
        history['observations'].append([info.observations[0][idx]])
    if self.use_states:
        history['states'].append(info.states[idx])
    if self.is_continuous:
