
first commit

/develop-gpu-test
Anupam Bhatnagar, 5 years ago
Current commit
fddede25
4 files changed, with 76 insertions and 44 deletions
  1. docs/Readme.md (23 changes)
  2. docs/Training-ML-Agents.md (74 changes)
  3. docs/Using-Tensorboard.md (6 changes)
  4. docs/Python-venv.md (17 changes)

docs/Readme.md (23 changes)


* [Installation](Installation.md)
* [Background: Jupyter Notebooks](Background-Jupyter.md)
* [Docker Set-up](Using-Docker.md)
* [Using Python Virtual Environment](Python-venv.md)
* [Basic Guide](Basic-Guide.md)
* [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
* [Background: Unity](Background-Unity.md)
* [Background: Machine Learning](Background-Machine-Learning.md)

[Heuristic](Learning-Environment-Design-Heuristic-Brains.md),
[Learning](Learning-Environment-Design-Learning-Brains.md)
* [Learning Environment Best Practices](Learning-Environment-Best-Practices.md)
### Optional for first time users
* [Using the Monitor](Feature-Monitor.md)
* [Using the Video Recorder](https://github.com/Unity-Technologies/video-recorder)
* [Using an Executable Environment](Learning-Environment-Executable.md)
* [Creating Custom Protobuf Messages](Creating-Custom-Protobuf-Messages.md)
* [Using TensorBoard to Observe Training](Using-Tensorboard.md)
### Advanced Training Methods
* [Training Using Concurrent Unity Instances](Training-Using-Concurrent-Unity-Instances.md)
### Cloud Training
## Inference

docs/Training-ML-Agents.md (74 changes)


using TensorBoard during or after training by running the following command:
```sh
tensorboard --logdir=summaries --port 6006
```
And then opening the URL: [localhost:6006](http://localhost:6006).

the oldest checkpoint is deleted when saving a new checkpoint. Defaults to 5.
* `--lesson=<n>`: Specify which lesson to start with when performing curriculum
training. Defaults to 0.
* `--load`: If set, the training code loads an already trained model to
initialize the neural network before training. The learning code looks for the
model in `models/<run-id>/` (which is also where it saves models at the end of
training). When not set (the default), the neural network weights are randomly
initialized and an existing model is not loaded.
* `--num-envs=<n>`: Specifies the number of concurrent Unity environment instances to
collect experiences from when training. Defaults to 1.
* `--base-port`: Specifies the starting port. Each concurrent Unity environment instance
will get assigned a port sequentially, starting from the `base-port`. Each instance
will use the port `(base_port + worker_id)`, where `worker_id` is a sequential ID
assigned to each instance, from 0 to `num_envs - 1`. Default is 5005.
* `--docker-target-name=<dt>`: The Docker Volume on which to store curriculum,
executable and model files. See [Using Docker](Using-Docker.md).
* `--run-id=<path>`: Specifies an identifier for each training run. This
identifier is used to name the subdirectories in which the trained model and
summary statistics are saved as well as the saved model itself. The default id

[Academy Properties](Learning-Environment-Design-Academy.md#academy-properties).
* `--train`: Specifies whether to train the model or only run it in inference mode.
When training, **always** use the `--train` option.
* `--no-graphics`: Specify this option to run the Unity executable in
`-batchmode` without initializing the graphics driver. Use this only if your
training doesn't involve visual observations (reading from Pixels). See

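To make the options above concrete, here is a hedged example of a training invocation; the environment build name `3DBall` and the run id `first-run` are illustrative placeholders, not values taken from this repository:

```sh
# Illustrative only: train with 4 concurrent instances of a hypothetical 3DBall build,
# assigning ports 5005-5008 and loading the model saved under models/first-run/.
mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run \
    --num-envs=4 --base-port=5005 --load --train
```
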
The training config files `config/trainer_config.yaml`, `config/sac_trainer_config.yaml`,
`config/gail_config.yaml`, `config/online_bc_config.yaml` and `config/offline_bc_config.yaml`
specify the training method, the hyperparameters, and a few additional values to use when
training with Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), GAIL (Generative
Adversarial Imitation Learning) with PPO, and online and offline Behavioral Cloning (BC)/Imitation.
These files are divided into sections. The **default** section defines the default values for all
the available settings. You can also add new sections to override these defaults to train
specific Brains. Name each of these override sections after the GameObject
containing the Brain component that should use these settings. (This GameObject

| **Setting** | **Description** | **Applies To Trainer\*** |
| :------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------------------- |
| batch_size | The number of experiences in each iteration of gradient descent. | PPO, SAC, BC |
| buffer_size | The number of experiences to collect before updating the policy model. In SAC, the max size of the experience buffer. | PPO, SAC |
| buffer_init_steps | The number of experiences to collect into the buffer before updating the policy model. | SAC |
| hidden_units | The number of units in the hidden layers of the neural network. | PPO, SAC, BC |
| init_entcoef | How much the agent should explore in the beginning of training. | SAC |
| learning_rate | The initial learning rate for gradient descent. | PPO, SAC, BC |
| max_steps | The maximum number of simulation steps to run during a training session. | PPO, SAC, BC |
| memory_size | The size of the memory an agent must keep. Used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC, BC |
| normalize | Whether to automatically normalize observations. | PPO, SAC |
| num_layers | The number of hidden layers in the neural network. | PPO, SAC, BC |
| pretraining | Use demonstrations to bootstrap the policy neural network. See [Pretraining Using Demonstrations](Training-PPO.md#optional-pretraining-using-demonstrations). | PPO, SAC |
| reward_signals | The reward signals used to train the policy. Enable Curiosity and GAIL here. See [Reward Signals](Reward-Signals.md) for configuration options. | PPO, SAC, BC |
| save_replay_buffer | Saves the replay buffer when exiting training, and loads it on resume. | SAC |
| sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC, BC |
| summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, SAC, BC |
| tau | How aggressively to update the target network used for bootstrapping value estimation in SAC. | SAC |
| time_horizon | How many steps of experience to collect per-agent before adding it to the experience buffer. | PPO, SAC, (online)BC |
| trainer | The type of training to perform: "ppo", "sac", "offline_bc" or "online_bc". | PPO, SAC, BC |
| train_interval | How often to update the agent. | SAC |
| num_update | Number of mini-batches to update the agent with during each update. | SAC |
| use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC, BC |
\*PPO = Proximal Policy Optimization, SAC = Soft Actor-Critic, BC = Behavioral Cloning (Imitation)
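
As a minimal sketch of how these settings are laid out (the Brain name `3DBallLearning` and the particular values are illustrative, not taken from the shipped config files), an override section in `config/trainer_config.yaml` might look like:

```yaml
default:
    trainer: ppo          # one of "ppo", "sac", "offline_bc", "online_bc"
    batch_size: 1024
    buffer_size: 10240
    hidden_units: 128
    learning_rate: 3.0e-4
    max_steps: 5.0e4
    normalize: false
    num_layers: 2
    summary_freq: 1000
    use_recurrent: false

# Overrides for the Brain on a hypothetical GameObject named "3DBallLearning"
3DBallLearning:
    normalize: true
    batch_size: 64
    buffer_size: 12000
```

Any setting not overridden in a Brain-specific section falls back to the value in **default**.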
For specific advice on setting hyperparameters based on the type of training you
are conducting, see:

docs/Using-Tensorboard.md (6 changes)


3. From the command line run:
```sh
tensorboard --logdir=summaries --port=6006
```
**Note:** The default port TensorBoard uses is 6006. If there is an existing session
running on port 6006, a new session can be launched on an open port using the `--port`
option.
**Note:** If you don't assign a `run-id` identifier, `mlagents-learn` uses the
default string, "ppo". All the statistics will be saved to the same sub-folder
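
For example (the run id below is a hypothetical value), assigning an explicit identifier keeps each run's statistics in its own sub-folder so TensorBoard shows the runs as separate sessions:

```sh
# Each run id gets its own statistics sub-folder under summaries/
mlagents-learn config/trainer_config.yaml --run-id=first-run --train
```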

docs/Python-venv.md (17 changes)


# Installing and Running ML-Agents in a virtual environment
__Requirement - Python 3.6 must be installed on the server. Python 3.6 can be downloaded [here](https://www.python.org/downloads/).__
## Mac OS X Setup
1. Create a folder where the virtual environments will live: `$ mkdir ~/python-envs`
1. To create a new environment named `test-env` execute `$ python3 -m venv ~/python-envs/test-env`
1. To activate the environment execute `$ source ~/python-envs/test-env/bin/activate`
1. Install ML-Agents package using `$ pip3 install mlagents`
1. To deactivate the environment execute `$ deactivate`
## Ubuntu Setup
1. Install the python3-venv package using `$ sudo apt-get install python3-venv`
1. Now follow the steps in the Mac OS X Setup section above.
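
Put together, the Ubuntu flow might look like the following sketch (the environment name `test-env` follows the Mac OS X steps above):

```sh
# Install the venv module, then create, activate, and populate an environment
sudo apt-get install python3-venv
mkdir ~/python-envs
python3 -m venv ~/python-envs/test-env
source ~/python-envs/test-env/bin/activate
pip3 install mlagents

# Leave the environment when finished
deactivate
```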