
Improvements to Training-ML-Agents (#3776)

* Improvements to Training-ML-Agents

- Removed duplicate documentation
- Moved CLI descriptions to learn.py
- Reorganized "Training with mlagents-learn" into 5 sub-sections

* fixed formatting errors and incorporated minor feedback

* minor improvement

* Minor formatting.

* fixed run-id references

* Keeping link to use Inference consistent with master

Will update the UIE page in a separate PR.

* Squashed commit of the following:

commit 9600d0fbe6684eca69fb5bab84ab0f6754fc8b0f
Author: Marwan Mattar <marwan@unity3d.com>
Date:   Tue Apr 14 17:45:33 2020 -0700

    Various doc improvements (#3775)

    * Various doc improvements

    For Using-Virtual-Environment.md:
    - Made a note regarding updating setuptools and pip.
    - Changed lists from "-" to "*"

    For Using-Tensorboard.md:
    - Changed the ordered list to use "1."

    For Training-on-Microsoft-Azure-Custom-Instance.md:
    - Deleted ...
/develop/gym-wrapper
GitHub, 5 years ago
Current commit
8c5edc99
2 files changed, with 208 insertions and 255 deletions
  1. 368
      docs/Training-ML-Agents.md
  2. 95
      ml-agents/mlagents/trainers/learn.py

368
docs/Training-ML-Agents.md


# Training ML-Agents
The ML-Agents toolkit conducts training using an external Python training
process. During training, this external process communicates with the Academy
to generate a block of agent experiences. These
experiences become the training set for a neural network used to optimize the
agent's policy (which is essentially a mathematical function mapping
observations to actions). In reinforcement learning, the neural network
optimizes the policy by maximizing the expected rewards. In imitation learning,
the neural network optimizes the policy to achieve the smallest difference
between the actions chosen by the agent trainee and the actions chosen by the
expert in the same situation.
The output of the training process is a model file containing the optimized
policy. This model file is a TensorFlow data graph containing the mathematical
operations and the optimized weights selected during the training process. You
can set the generated model file in the Behaviors Parameters under your
Agent in your Unity project to decide the best course of action for an agent.
Use the command `mlagents-learn` to train your agents. This command is installed
with the `mlagents` package and its implementation can be found at
`ml-agents/mlagents/trainers/learn.py`. The [configuration file](#training-config-file),
like `config/trainer_config.yaml` specifies the hyperparameters used during training.
You can edit this file with a text editor to add a specific configuration for
each Behavior.
For a broad overview of reinforcement learning, imitation learning and all the
training scenarios, methods and options within the ML-Agents Toolkit, see
[ML-Agents Toolkit Overview](ML-Agents-Overview.md).
For a broader overview of reinforcement learning, imitation learning and the
ML-Agents training process, see [ML-Agents Toolkit
Overview](ML-Agents-Overview.md).
Once your learning environment has been created and is ready for training, the
next step is to initiate a training run. Training in the ML-Agents Toolkit is
powered by a dedicated Python package, `mlagents`. This package exposes a
command `mlagents-learn` that is the single entry point for all training
workflows (e.g. reinforcement learning, imitation learning, curriculum learning).
Its implementation can be found at
[ml-agents/mlagents/trainers/learn.py](../ml-agents/mlagents/trainers/learn.py).
Use the `mlagents-learn` command to train agents. `mlagents-learn` supports
training with
[reinforcement learning](Background-Machine-Learning.md#reinforcement-learning),
[curriculum learning](Training-Curriculum-Learning.md),
and [behavioral cloning imitation learning](Training-Imitation-Learning.md).
### Starting Training
Run `mlagents-learn` from the command line to launch the training process. Use
the command line patterns and the `config/trainer_config.yaml` file to control
training options.
`mlagents-learn` is the main training utility provided by the ML-Agents Toolkit.
It accepts a number of CLI options in addition to a YAML configuration file that
contains all the configurations and hyperparameters to be used during training.
The set of configurations and hyperparameters to include in this file depends on
the agents in your environment and the specific training method you wish to
use. Keep in mind that the hyperparameter values can have a big impact on
the training performance (i.e. your agent's ability to learn a policy that
solves the task). On this page, we will review all the hyperparameters for all
training methods and provide guidelines and advice on their values.
The basic command for training is:
To view a description of all the CLI options accepted by `mlagents-learn`, use
the `--help` flag:
mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier>
mlagents-learn --help
where
* `<trainer-config-file>` is the file path of the trainer configuration yaml.
* `<env_name>`__(Optional)__ is the name (including path) of your Unity
executable containing the agents to be trained. If `<env_name>` is not passed,
the training will happen in the Editor. Press the :arrow_forward: button in
Unity when the message _"Start training by pressing the Play button in the
Unity Editor"_ is displayed on the screen.
* `<run-identifier>` is an optional identifier you can use to identify the
results of individual training runs.
For example, suppose you have a project in Unity named "CatsOnBicycles" which
contains agents ready to train. To perform the training:
1. [Build the project](Learning-Environment-Executable.md), making sure that you
only include the training scene.
2. Open a terminal or console window.
3. Navigate to the directory where you installed the ML-Agents Toolkit.
4. Run the following to launch the training process using the path to the Unity
environment you built in step 1:
The basic command for training is:
mlagents-learn config/trainer_config.yaml --env=../../projects/Cats/CatsOnBicycles.app --run-id=cob_1
mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier>
During a training session, the training program prints out and saves updates at
regular intervals (specified by the `summary_freq` option). The saved statistics
are grouped by the `run-id` value so you should assign a unique id to each
training run if you plan to view the statistics. You can view these statistics
using TensorBoard during or after training by running the following command:
where
```sh
tensorboard --logdir=summaries --port 6006
```
- `<trainer-config-file>` is the file path of the trainer configuration yaml.
This contains all the hyperparameter values. We offer a detailed guide on the
structure of this file and the meaning of the hyperparameters (and advice on how
to set them) in the dedicated [Training Config File](#training-config-file)
section below.
- `<env_name>`**(Optional)** is the name (including path) of your
[Unity executable](Learning-Environment-Executable.md) containing the agents
to be trained. If `<env_name>` is not passed, the training will happen in the
Editor. Press the :arrow_forward: button in Unity when the message _"Start
training by pressing the Play button in the Unity Editor"_ is displayed on
the screen.
- `<run-identifier>` is a unique name you can use to identify the results of
your training runs.
And then opening the URL: [localhost:6006](http://localhost:6006).
See the
[Getting Started Guide](Getting-Started.md#training-a-new-model-with-reinforcement-learning)
for a sample execution of the `mlagents-learn` command.
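As a rough illustration of how the three parts of the basic command fit together, the following Python sketch assembles the argument list. The helper name `build_learn_command` is hypothetical and not part of the toolkit. Only the `--env` and `--run-id` options described above are used, and the paths come from the "CatsOnBicycles" example.

```python
import shlex
from typing import List, Optional


def build_learn_command(
    trainer_config: str,
    env_path: Optional[str] = None,
    run_id: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for a basic `mlagents-learn` invocation.

    Hypothetical helper for illustration only.
    """
    command = ["mlagents-learn", trainer_config]
    if env_path is not None:
        # Leaving out --env means training happens in the Unity Editor.
        command.append(f"--env={env_path}")
    if run_id is not None:
        command.append(f"--run-id={run_id}")
    return command


if __name__ == "__main__":
    cmd = build_learn_command(
        "config/trainer_config.yaml",
        env_path="../../projects/Cats/CatsOnBicycles.app",
        run_id="cob_1",
    )
    # Print a shell-safe version of the command line.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list could be handed to `subprocess.run` if you want to launch training from a script, assuming `mlagents-learn` is on your `PATH`.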
**Note:** The default port TensorBoard uses is 6006. If there is an existing session
running on port 6006 a new session can be launched on an open port using the --port
option.
#### Observing Training
When training is finished, you can find the saved model in the `models` folder
under the assigned run-id — in the cats example, the path to the model would be
`models/cob_1/CatsOnBicycles_cob_1.nn`.
Regardless of which training methods, configurations or hyperparameters you
provide, the training process will always generate three artifacts:
While this example used the default training hyperparameters, you can edit the
[trainer_config.yaml file](#training-config-file) with a text editor to set
different values.
1. Summaries (under the `summaries/` folder): these are training metrics that
are updated throughout the training process. They are helpful to monitor your
training performance and may help inform how to update your hyperparameter
values. See [Using TensorBoard](Using-Tensorboard.md) for more details on how
to visualize the training metrics.
1. Models (under the `models/` folder): these contain the model checkpoints that
are updated throughout training and the final model file (`.nn`). This final
model file is generated once, either when training completes or when it is
interrupted.
1. Timers file (also under the `summaries/` folder): this contains aggregated
metrics on your training process, including time spent on specific code
blocks. See [Profiling in Python](Profiling-Python.md) for more information
on the timers generated.
To interrupt training and save the current progress, hit Ctrl+C once and wait for the
model to be saved out.
These artifacts (except the `.nn` file) are updated throughout the training
process and finalized when training completes or is interrupted.
### Loading an Existing Model
#### Stopping and Resuming Training
If you've quit training early using Ctrl+C, you can resume the training run by running
`mlagents-learn` again, specifying the same `<run-identifier>` and appending the `--resume` flag
to the command.
To interrupt training and save the current progress, hit `Ctrl+C` once and wait
for the model(s) to be saved out.
You can also use this mode to run inference of an already-trained model in Python.
Append both the `--resume` and `--inference` to do this. Note that if you want to run
inference in Unity, you should use the
[Unity Inference Engine](Getting-started.md#running-a-pre-trained-model).
To resume a previously interrupted or completed training run, use the `--resume`
flag and make sure to specify the previously used run ID.
If you've already trained a model using the specified `<run-identifier>` and `--resume` is not
specified, you will not be able to continue with training. Use `--force` to force ML-Agents to
overwrite the existing data.
If you would like to re-run a previously interrupted or completed training run
and re-use the same run ID (in this case, overwriting the previously generated
artifacts), then use the `--force` flag.
Alternatively, you might want to start a new training run but _initialize_ it using an already-trained
model. You may want to do this, for instance, if your environment changed and you want
a new model, but the old behavior is still better than random. You can do this by specifying `--initialize-from=<run-identifier>`, where `<run-identifier>` is the old run ID.
#### Loading an Existing Model
### Command Line Training Options
You can also use this mode to run inference of an already-trained model in
Python by using both the `--resume` and `--inference` flags. Note that if you
want to run inference in Unity, you should use the
[Unity Inference Engine](Getting-Started.md#running-a-pre-trained-model).
In addition to passing the path of the Unity executable containing your training
environment, you can set the following command line options when invoking
`mlagents-learn`:
Alternatively, you might want to start a new training run but _initialize_ it
using an already-trained model. You may want to do this, for instance, if your
environment changed and you want a new model, but the old behavior is still
better than random. You can do this by specifying
`--initialize-from=<run-identifier>`, where `<run-identifier>` is the old run
ID.
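The interplay between a reused run ID, `--resume` and `--force` can be summarized in a small decision function. This is a sketch of the behavior as described on this page, not the toolkit's actual implementation: the names `RunAction` and `decide_run_action` are hypothetical, and giving `--resume` priority when both flags are passed is an assumption.

```python
from enum import Enum


class RunAction(Enum):
    START_NEW = "start a new run"
    RESUME = "resume from saved models"
    OVERWRITE = "overwrite existing artifacts"


def decide_run_action(run_exists: bool, resume: bool, force: bool) -> RunAction:
    """Pick what to do for a run ID, following the rules described above.

    Hypothetical helper; precedence of --resume over --force is an assumption.
    """
    if resume:
        # --resume only works when models for this run ID already exist.
        if not run_exists:
            raise ValueError("--resume needs models saved under this run ID")
        return RunAction.RESUME
    if run_exists:
        if force:
            # --force re-uses the run ID and overwrites earlier artifacts.
            return RunAction.OVERWRITE
        raise ValueError("run ID already used; pass --resume or --force")
    return RunAction.START_NEW
```

`--initialize-from` is deliberately left out of the sketch: it starts a new run under a new run ID and only borrows weights from the old one.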
* `--env=<env>`: Specify an executable environment to train.
* `--curriculum=<file>`: Specify a curriculum JSON file for defining the
lessons for curriculum training. See [Curriculum
Training](Training-Curriculum-Learning.md) for more information.
* `--sampler=<file>`: Specify a sampler YAML file for defining the
sampler for parameter randomization. See [Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md) for more information.
* `--keep-checkpoints=<n>`: Specify the maximum number of model checkpoints to
keep. Checkpoints are saved after the number of steps specified by the
`save-freq` option. Once the maximum number of checkpoints has been reached,
the oldest checkpoint is deleted when saving a new checkpoint. Defaults to 5.
* `--lesson=<n>`: Specify which lesson to start with when performing curriculum
training. Defaults to 0.
* `--num-envs=<n>`: Specifies the number of concurrent Unity environment instances to
collect experiences from when training. Defaults to 1.
* `--run-id=<run-identifier>`: Specifies an identifier for each training run. This
identifier is used to name the subdirectories in which the trained model and
summary statistics are saved as well as the saved model itself. The default id
is "ppo". If you use TensorBoard to view the training statistics, always set a
unique run-id for each training run. (The statistics for all runs with the
same id are combined as if they were produced by the same session.)
* `--save-freq=<n>`: Specifies how often (in steps) to save the model during
training. Defaults to 50000.
* `--seed=<n>`: Specifies a number to use as a seed for the random number
generator used by the training code.
* `--env-args=<string>`: Specify arguments for the executable environment. Be aware that
the standalone build will also process these as
[Unity Command Line Arguments](https://docs.unity3d.com/Manual/CommandLineArguments.html).
You should choose different argument names if you want to create environment-specific arguments.
All arguments after this flag will be passed to the executable. For example, setting
`mlagents-learn config/trainer_config.yaml --env-args --num-orcs 42` would result in
` --num-orcs 42` passed to the executable.
* `--base-port`: Specifies the starting port. Each concurrent Unity environment instance
is assigned a port sequentially, starting from the `base-port`. Each instance
uses the port `(base_port + worker_id)`, where `worker_id` is a sequential ID
given to each instance, from 0 to `num_envs - 1`. Default is 5005. __Note:__ When
training using the Editor rather than an executable, the base port will be ignored.
* `--inference`: Specifies whether to only run in inference mode. Omit to train the model.
To load an existing model, specify a run-id and combine with `--resume`.
* `--resume`: If set, the training code loads an already trained model to
initialize the neural network before training. The learning code looks for the
model in `models/<run-id>/` (which is also where it saves models at the end of
training). This option only works when the models exist, and have the same behavior names
as the current agents in your scene.
* `--force`: Attempting to train a model with a run-id that has been used before will
throw an error. Use `--force` to force-overwrite this run-id's summary and model data.
* `--initialize-from=<run-identifier>`: Specify an old run-id here to initialize your model from
a previously trained model. Note that the previously saved models _must_ have the same behavior
parameters as your current environment.
* `--no-graphics`: Specify this option to run the Unity executable in
`-batchmode` without initializing the graphics driver. Use this only if your
training doesn't involve visual observations (reading from pixels). See the
[Unity command line arguments documentation](https://docs.unity3d.com/Manual/CommandLineArguments.html) for more
details.
* `--debug`: Specify this option to enable debug-level logging for some parts of the code.
* `--cpu`: Forces training using CPU only.
* Engine Configuration :
* `--width` : The width of the executable window of the environment(s) in pixels
(ignored for editor training) (Default 84)
* `--height` : The height of the executable window of the environment(s) in pixels
(ignored for editor training). (Default 84)
* `--quality-level` : The quality level of the environment(s). Equivalent to
calling `QualitySettings.SetQualityLevel` in Unity. (Default 5)
* `--time-scale` : The time scale of the Unity environment(s). Equivalent to setting
`Time.timeScale` in Unity. (Default 20.0, maximum 100.0)
* `--target-frame-rate` : The target frame rate of the Unity environment(s).
Equivalent to setting `Application.targetFrameRate` in Unity. (Default: -1)
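The port assignment rule given for `--base-port` in the list above, `base_port + worker_id` with `worker_id` running from 0 to `num_envs - 1`, can be written out directly. The function name `worker_ports` is hypothetical; the default of 5005 is the one stated for `--base-port`.

```python
from typing import List

DEFAULT_BASE_PORT = 5005  # default stated for --base-port


def worker_ports(num_envs: int, base_port: int = DEFAULT_BASE_PORT) -> List[int]:
    """Return the port used by each concurrent Unity environment instance."""
    if num_envs < 1:
        raise ValueError("num_envs must be at least 1")
    # worker_id takes the values 0 .. num_envs - 1.
    return [base_port + worker_id for worker_id in range(num_envs)]


if __name__ == "__main__":
    print(worker_ports(3))  # [5005, 5006, 5007]
```

This can help when checking that the port range needed by `--num-envs` does not collide with other services on the machine.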
## Training Config File
### Training Config File
The Unity ML-Agents Toolkit provides a wide range of training scenarios, methods
and options. As such, specific training runs may require different training
configurations and may generate different artifacts and TensorBoard statistics.
This section offers a detailed guide to managing the different training
set-ups within the toolkit.
The training config files `config/trainer_config.yaml`,
`config/sac_trainer_config.yaml`, `config/gail_config.yaml` and
`config/offline_bc_config.yaml` specify the training method, the
hyperparameters, and a few additional values to use when training with Proximal
Policy Optimization (PPO), Soft Actor-Critic (SAC), GAIL (Generative Adversarial
Imitation Learning) with PPO/SAC, and Behavioral Cloning (BC)/Imitation with
PPO/SAC. These files are divided into sections. The **default** section
defines the default values for all the available settings. You can also add new
sections to override these defaults to train specific Behaviors. Name each of
these override sections after the appropriate `Behavior Name`. Sections for the
| **Setting** | **Description** | **Applies To Trainer\*** |
| :------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------------------- |
| batch_size | The number of experiences in each iteration of gradient descent. | PPO, SAC |
| batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model. | |
| beta | The strength of entropy regularization. | PPO |
| buffer_size | The number of experiences to collect before updating the policy model. In SAC, the max size of the experience buffer. | PPO, SAC |
| buffer_init_steps | The number of experiences to collect into the buffer before updating the policy model. | SAC |
| epsilon | Influences how rapidly the policy can evolve during training. | PPO |
| hidden_units | The number of units in the hidden layers of the neural network. | PPO, SAC |
| init_entcoef | How much the agent should explore in the beginning of training. | SAC |
| lambd | The regularization parameter. | PPO |
| learning_rate | The initial learning rate for gradient descent. | PPO, SAC |
| learning_rate_schedule | Determines how learning rate changes over time. | PPO, SAC |
| max_steps | The maximum number of simulation steps to run during a training session. | PPO, SAC |
| memory_size | The size of the memory an agent must keep. Used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC |
| normalize | Whether to automatically normalize observations. | PPO, SAC |
| num_epoch | The number of passes to make through the experience buffer when performing gradient descent optimization. | PPO |
| num_layers | The number of hidden layers in the neural network. | PPO, SAC |
| behavioral_cloning | Use demonstrations to bootstrap the policy neural network. See [Pretraining Using Demonstrations](Training-PPO.md#optional-behavioral-cloning-using-demonstrations). | PPO, SAC |
| reward_signals | The reward signals used to train the policy. Enable Curiosity and GAIL here. See [Reward Signals](Reward-Signals.md) for configuration options. | PPO, SAC |
| save_replay_buffer | Saves the replay buffer when exiting training, and loads it on resume. | SAC |
| sequence_length | Defines how long the sequences of experiences must be while training. Only used for training with a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC |
| summary_freq | How often, in steps, to save training statistics. This determines the number of data points shown by TensorBoard. | PPO, SAC |
| tau | How aggressively to update the target network used for bootstrapping value estimation in SAC. | SAC |
| time_horizon | How many steps of experience to collect per-agent before adding it to the experience buffer. | PPO, SAC |
| trainer | The type of training to perform: "ppo", "sac", "offline_bc" or "online_bc". | PPO, SAC |
| train_interval | How often to update the agent. | SAC |
| num_update | Number of mini-batches to update the agent with during each update. | SAC |
| use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC |
| init_path | Initialize trainer from a previously saved model. | PPO, SAC |
\*PPO = Proximal Policy Optimization, SAC = Soft Actor-Critic, BC = Behavioral
Cloning (Imitation), GAIL = Generative Adversarial Imitation Learning
* [Training with PPO](Training-PPO.md)
* [Training with SAC](Training-SAC.md)
* [Using Recurrent Neural Networks](Feature-Memory.md)
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
- [Training with PPO](Training-PPO.md)
- [Training with SAC](Training-SAC.md)
- [Training with Self-Play](Training-Self-Play.md)
- [Using Recurrent Neural Networks](Feature-Memory.md)
- [Training with Curriculum Learning](Training-Curriculum-Learning.md)
- [Training with Imitation Learning](Training-Imitation-Learning.md)
- [Training with Environment Parameter Randomization](Training-Environment-Parameter-Randomization.md)
[example environments](Learning-Environment-Examples.md)
to the corresponding sections of the `config/trainer_config.yaml` file for each
example to see how the hyperparameters and other configuration variables have
been changed from the defaults.
### Debugging and Profiling
If you enable the `--debug` flag in the command line, the trainer metrics are logged to a CSV file
stored in the `summaries` directory. The metrics stored are:
* brain name
* time to update policy
* time since start of training
* time for last experience collection
* number of experiences used for training
* mean return
This option is not available currently for Behavioral Cloning.
Additionally, we have included basic [Profiling in Python](Profiling-Python.md) as part of the toolkit.
This information is also saved in the `summaries` directory.
[example environments](Learning-Environment-Examples.md) to the corresponding
sections of the `config/trainer_config.yaml` file for each example to see how
the hyperparameters and other configuration variables have been changed from the
defaults.
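The relationship between the **default** section and the per-Behavior override sections can be pictured as a dictionary merge. The sketch below assumes the config file has already been loaded into a nested `dict`. The function name `resolve_behavior_config`, the Behavior name `CatBehavior` and all values are made up for illustration, and the trainer's real merging logic may differ.

```python
from typing import Any, Dict


def resolve_behavior_config(
    config: Dict[str, Dict[str, Any]], behavior_name: str
) -> Dict[str, Any]:
    """Overlay a Behavior's override section on top of the default section.

    Hypothetical helper for illustration only.
    """
    resolved = dict(config.get("default", {}))
    # Settings listed under the Behavior Name replace the defaults.
    resolved.update(config.get(behavior_name, {}))
    return resolved


# Illustrative values only; they are not recommendations.
example_config = {
    "default": {"trainer": "ppo", "batch_size": 1024, "num_layers": 2},
    "CatBehavior": {"batch_size": 64},
}

if __name__ == "__main__":
    print(resolve_behavior_config(example_config, "CatBehavior"))
```

A Behavior without its own section simply receives the default values unchanged.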

95
ml-agents/mlagents/trainers/learn.py


)
argparser.add_argument("trainer_config_path")
argparser.add_argument(
"--env", default=None, dest="env_path", help="Name of the Unity executable "
"--env",
default=None,
dest="env_path",
help="Path to the Unity executable to train",
help="Curriculum config yaml file for environment",
help="YAML file for defining the lessons for curriculum training",
)
argparser.add_argument(
"--lesson",
default=0,
type=int,
help="The lesson to start with when performing curriculum training",
help="Reset parameter yaml file for environment",
help="YAML file for defining the sampler for environment parameter randomization",
help="How many model checkpoints to keep",
)
argparser.add_argument(
"--lesson", default=0, type=int, help="Start learning from this lesson"
help="The maximum number of model checkpoints to keep. Checkpoints are saved after the "
"number of steps specified by the save-freq option. Once the maximum number of checkpoints "
"has been reached, the oldest checkpoint is deleted when saving a new checkpoint.",
)
argparser.add_argument(
"--load",

default=False,
dest="resume",
action="store_true",
help="Resumes training from a checkpoint. Specify a --run-id to use this option.",
help="Whether to resume training from a checkpoint. Specify a --run-id to use this option. "
"If set, the training code loads an already trained model to initialize the neural network "
"before resuming training. This option is only valid when the models exist, and have the same "
"behavior names as the current agents in your scene.",
)
argparser.add_argument(
"--force",

help="Force-overwrite existing models and summaries for a run ID that has been used "
"before.",
help="Whether to force-overwrite this run-id's existing summary and model data. (Without "
"this flag, attempting to train a model with a run-id that has been used before will throw "
"an error.)",
help="The run identifier for model and summary statistics.",
help="The identifier for the training run. This identifier is used to name the "
"subdirectories in which the trained model and summary statistics are saved as well "
"as the saved model itself. If you use TensorBoard to view the training statistics, "
"always set a unique run-id for each training run. (The statistics for all runs with the "
"same id are combined as if they were produced by the same session.)",
)
argparser.add_argument(
"--initialize-from",

"This can be used, for instance, to fine-tune an existing model on a new environment. ",
"This can be used, for instance, to fine-tune an existing model on a new environment. "
"Note that the previously saved models must have the same behavior parameters as your "
"current environment.",
"--save-freq", default=50000, type=int, help="Frequency at which to save model"
"--save-freq",
default=50000,
type=int,
help="How often (in steps) to save the model during training",
"--seed", default=-1, type=int, help="Random seed used for training"
"--seed",
default=-1,
type=int,
help="A number to use as a seed for the random number generator used by the training code",
)
argparser.add_argument(
"--train",

default=False,
dest="inference",
action="store_true",
help="Run in Python inference mode (don't train). Use with --resume to load a model trained with an "
"existing run ID.",
help="Whether to run in Python inference mode (i.e. no training). Use with --resume to load "
"a model trained with an existing run ID.",
help="Base port for environment communication",
help="The starting port for environment communication. Each concurrent Unity environment "
"instance will get assigned a port sequentially, starting from the base-port. Each instance "
"will use the port (base_port + worker_id), where worker_id is a sequential ID given to "
"each instance, from 0 to (num_envs - 1). Note that when training using the Editor rather "
"than an executable, the base port will be ignored.",
help="Number of parallel environments to use for training",
help="The number of concurrent Unity environment instances to collect experiences "
"from when training",
help="Whether to run the environment in no-graphics mode",
help="Whether to run the Unity executable in no-graphics mode (i.e. without initializing "
"the graphics driver). Use this only if your agents don't use visual observations.",
help="Whether to run ML-Agents in debug mode with detailed logging",
help="Whether to enable debug-level logging for some parts of the code",
help="Arguments passed to the Unity executable.",
help="Arguments passed to the Unity executable. Be aware that the standalone build will also "
"process these as Unity Command Line Arguments. You should choose different argument names if "
"you want to create environment-specific arguments. All arguments after this flag will be "
"passed to the executable.",
"--cpu", default=False, action="store_true", help="Run with CPU only"
"--cpu",
default=False,
action="store_true",
help="Forces training using CPU only",
)
argparser.add_argument("--version", action="version", version="")

"--width",
default=84,
type=int,
help="The width of the executable window of the environment(s)",
help="The width of the executable window of the environment(s) in pixels "
"(ignored for editor training).",
help="The height of the executable window of the environment(s)",
help="The height of the executable window of the environment(s) in pixels "
"(ignored for editor training)",
help="The quality level of the environment(s)",
help="The quality level of the environment(s). Equivalent to calling "
"QualitySettings.SetQualityLevel in Unity.",
help="The time scale of the Unity environment(s)",
help="The time scale of the Unity environment(s). Equivalent to setting "
"Time.timeScale in Unity.",
help="The target frame rate of the Unity environment(s)",
help="The target frame rate of the Unity environment(s). Equivalent to setting "
"Application.targetFrameRate in Unity.",
)
return argparser
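For readers less familiar with `argparse`, the following standalone sketch shows how a few of the options in this diff behave once parsed. It is a reduced illustration, not the real `learn.py` parser: the function name `make_sketch_parser` is hypothetical, and wiring `--resume` as a `store_true` flag with `dest="resume"` is inferred from the fragments shown above.

```python
import argparse


def make_sketch_parser() -> argparse.ArgumentParser:
    """Build a reduced parser that mirrors a handful of the options above."""
    parser = argparse.ArgumentParser(prog="mlagents-learn-sketch")
    parser.add_argument("trainer_config_path")
    parser.add_argument(
        "--env",
        default=None,
        dest="env_path",  # the value is stored as args.env_path, not args.env
        help="Path to the Unity executable to train",
    )
    parser.add_argument(
        "--resume",
        default=False,
        dest="resume",
        action="store_true",  # a flag: present means True, absent means False
        help="Whether to resume training from a checkpoint.",
    )
    parser.add_argument(
        "--save-freq",
        default=50000,
        type=int,  # stored as args.save_freq (dashes become underscores)
        help="How often (in steps) to save the model during training",
    )
    return parser


if __name__ == "__main__":
    args = make_sketch_parser().parse_args(
        ["config/trainer_config.yaml", "--env=CatsOnBicycles.app", "--resume"]
    )
    print(args.env_path, args.resume, args.save_freq)
```

Keeping the long descriptions in the `help=` strings, as this commit does, means `mlagents-learn --help` becomes the single source for option documentation.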
