
Merge remote-tracking branch 'upstream/develop' into develop-flat-code-restructure

/develop-generalizationTraining-TrainerController
Deric Pang, 7 years ago
Current commit
e0e02ae6
8 changed files, with 2,570 additions and 58 deletions
1. README.md (2 changes)
2. docs/Feature-Monitor.md (18 changes)
3. docs/Learning-Environment-Create-New.md (20 changes)
4. python/mlagents/mlagents/learn.py (86 changes)
5. python/mlagents/mlagents/trainers/ppo/trainer.py (3 changes)
6. docs/images/3dballhard.png (497 changes)
7. docs/images/bananaimitation.png (1001 changes)
8. docs/images/image-banner.png (1001 changes)

README.md (2 changes)


<img src="docs/images/unity-wide.png" align="middle" width="3000"/>
<img src="docs/images/image-banner.png" align="middle" width="3000"/>
# Unity ML-Agents Toolkit (Beta)
**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source Unity plugin

docs/Feature-Monitor.md (18 changes)


The monitor allows visualizing information related to the agents or training process within a Unity scene.
You can track many different things, both related and unrelated to the agents themselves. By default, the Monitor is only active during the *inference* phase, not during training. To change this behaviour, activate or deactivate it by calling `SetActive(boolean)`. For example, to also show the Monitor during training, call it in the `InitializeAcademy()` method of your `Academy`:
```csharp
using MLAgents;

public class YourAcademy : Academy {
    public override void InitializeAcademy()
    {
        Monitor.SetActive(true);
    }
}
```
To add values to the Monitor, call the `Log` function anywhere in your code (a short example follows the argument list below):
```csharp
Monitor.Log(key, value, target)
```
* *`float[]`* - The Monitor `Log` call can take an additional argument called `displayType` that can be either `INDEPENDENT` (default) or `PROPORTION`:
  * *`INDEPENDENT`* is used to display multiple independent floats as a histogram. The histogram will be a sequence of vertical sliders.
  * *`PROPORTION`* is used to see the proportions between numbers. For each float in values, a rectangle whose width is that value divided by the sum of all values will be shown. It is best for visualizing values that sum to 1.
* *`target`* is the transform to which you want to attach information. If the transform is `null`, the information will be attached to the global monitor.
* **NB:** When adding a target transform that is not the global monitor, make sure your main camera object is tagged as `MainCamera` via the Inspector. This is needed to properly display the text on the screen.
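As a minimal sketch of the patterns above (the exact overloads and the position of the `displayType` argument are assumptions drawn from this description, not a definitive API reference), the following component logs one value to the global monitor and others attached to a specific agent:
```csharp
using MLAgents;
using UnityEngine;

public class MonitorLoggingExample : MonoBehaviour
{
    // Illustrative field: assign the agent's transform in the Inspector.
    public Transform agentTransform;

    void Update()
    {
        // A null target attaches the value to the global monitor.
        Monitor.Log("Episode Reward", 0.42f, null);

        // A non-null target attaches the value above that object in the scene
        // (remember to tag the main camera as MainCamera).
        Monitor.Log("Status", "exploring", agentTransform);

        // float[] values are drawn as a histogram; the displayType argument
        // (INDEPENDENT or PROPORTION) is assumed here to come last.
        Monitor.Log("Action Values", new float[] {0.1f, 0.3f, 0.6f},
                    agentTransform, Monitor.DisplayType.PROPORTION);
    }
}
```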

docs/Learning-Environment-Create-New.md (20 changes)


**Note:** When you mark an agent as done, it stops its activity until it is reset. You can have the agent reset immediately by setting the Agent.ResetOnDone property to true in the Inspector, or you can wait for the Academy to reset the environment. This RollerBall environment relies on the `ResetOnDone` mechanism and doesn't set a `Max Steps` limit for the Academy (so it never resets the environment).
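Because the environment relies on `ResetOnDone`, the reset logic lives in the agent itself. A minimal sketch of what such an `AgentReset()` override might look like is shown below; the `Target` and `rBody` field names are illustrative assumptions, not the tutorial's exact code:
```csharp
public override void AgentReset()
{
    if (this.transform.position.y < -1.0f)
    {
        // The agent fell off the platform: recover it at the origin with no momentum.
        this.transform.position = Vector3.zero;
        this.rBody.velocity = Vector3.zero;
        this.rBody.angularVelocity = Vector3.zero;
    }
    else
    {
        // The agent reached the target: move the target to a new random spot.
        Target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);
    }
}
```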
To encourage the agent along, we also reward it for getting closer to the target (saving the previous distance measurement between steps):
```csharp
// Getting closer
if (distanceToTarget < previousDistance)
{
    AddReward(0.1f);
}
```
It can also encourage an agent to finish a task more quickly if you assign a small negative reward at each step:
```csharp

    Done();
}

// Getting closer
if (distanceToTarget < previousDistance)
{
    AddReward(0.1f);
}

// Time penalty
AddReward(-0.05f);

// Fell off the platform
AddReward(-1.0f);
Done();
}

previousDistance = distanceToTarget;

// Actions, size = 2
Vector3 controlSignal = Vector3.zero;
```

## Final Editor Setup
Now that all the GameObjects and ML-Agent components are in place, it is time to connect everything together in the Unity Editor. This involves assigning the Brain object to the Agent, changing some of the Agent component's properties, and setting the Brain properties so that they are compatible with our agent code.
4. Change `Decision Frequency` from `1` to `5`.
![Assign the Brain to the RollerAgent](images/mlagents-NewTutAssignBrain.png)

python/mlagents/mlagents/learn.py (86 changes)


```python
# # Unity ML-Agents Toolkit
# ## ML-Agent Learning
import logging

from docopt import docopt


def run_training(sub_id, run_seed, run_options):
    """
    Launches training session.
    :param sub_id: Unique id for training session.
    :param run_seed: Random seed used for training.
    :param run_options: Command line arguments for training.
    """
    # Docker Parameters
    if run_options['--docker-target-name'] == 'Empty':
        docker_target_name = ''
    else:
        docker_target_name = run_options['--docker-target-name']

    # General parameters
    run_id = run_options['--run-id']
    load_model = run_options['--load']
    train_model = run_options['--train']
    save_freq = int(run_options['--save-freq'])
    keep_checkpoints = int(run_options['--keep-checkpoints'])
    worker_id = int(run_options['--worker-id'])
    curriculum_file = str(run_options['--curriculum'])
    if curriculum_file == "None":
        curriculum_file = None
    lesson = int(run_options['--lesson'])
    fast_simulation = not bool(run_options['--slow'])
    no_graphics = run_options['--no-graphics']

    # Constants
    # Assumption that this yaml is present in same dir as this file
    base_path = os.path.dirname(__file__)
    trainer_config_path = os.path.abspath(os.path.join(base_path, "trainer_config.yaml"))

    # Create controller and begin training.
    tc = TrainerController(run_options['<env>'], run_id + "-" + str(sub_id),
                           save_freq, curriculum_file, fast_simulation,
                           load_model, train_model, worker_id + sub_id,
                           keep_checkpoints, lesson, run_seed,
                           docker_target_name, trainer_config_path, no_graphics)
    tc.start_learning()
```
```python
def main():
    print('''

    --keep-checkpoints=<n>     How many model checkpoints to keep [default: 5].
    --lesson=<n>               Start learning from this lesson [default: 0].
    --load                     Whether to load the model or randomly initialize [default: False].
    --run-id=<path>            The directory name for model and summary statistics [default: ppo].
    --worker-id=<n>            Number to add to communication port (5005). Used for multi-environment [default: 0].
    --docker-target-name=<dt>  Docker Volume to store curriculum, executable and model files [default: Empty].
    --no-graphics              Whether to run the Unity simulator in no-graphics mode [default: False].

    # Docker Parameters
    if options['--docker-target-name'] == 'Empty':
        docker_target_name = ''
    else:
        docker_target_name = options['--docker-target-name']

    # General parameters
    run_id = options['--run-id']
    load_model = options['--load']
    train_model = options['--train']
    save_freq = int(options['--save-freq'])
    keep_checkpoints = int(options['--keep-checkpoints'])
    worker_id = int(options['--worker-id'])
    curriculum_file = str(options['--curriculum'])
    if curriculum_file == "None":
        curriculum_file = None
    lesson = int(options['--lesson'])
    fast_simulation = not bool(options['--slow'])
    no_graphics = options['--no-graphics']
    trainer_config_path = options['<trainer-config-path>']

    if env_path is None and num_runs > 1:
        raise TrainerError("It is not possible to launch more than one concurrent training session "

    for i in range(num_runs):
        if seed == -1:
            seed = np.random.randint(0, 9999)
        p = multiprocessing.Process(target=run_training, args=(i, seed, options))
        jobs.append(p)
        p.start()
```

python/mlagents/mlagents/trainers/ppo/trainer.py (3 changes)


```python
if curr_info.agents != next_info.agents:
    curr_info = self.construct_curr_info(next_info)
if len(curr_info.agents) == 0:
    return []
if self.use_visual_obs:
    for i in range(len(curr_info.visual_observations)):
        feed_dict[self.model.visual_in[i]] = curr_info.visual_observations[i]
```

docs/images/3dballhard.png (497 changes)
File diff is too large to display.

docs/images/bananaimitation.png (1001 changes)
File diff is too large to display.

docs/images/image-banner.png (1001 changes)
File diff is too large to display.
