Compare commits

...
This merge request has changes that conflict with the target branch.
/config/ppo/3DBall.yaml
/config/sac/3DBall.yaml
/com.unity.ml-agents/Documentation~/com.unity.ml-agents.md
/com.unity.ml-agents/Runtime/Academy.cs
/com.unity.ml-agents/Runtime/Agent.cs
/com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs
/docs/Installation-Anaconda-Windows.md
/docs/Installation.md
/docs/Training-on-Amazon-Web-Service.md
/ml-agents/mlagents/trainers/learn.py
/ml-agents/mlagents/trainers/trainer_controller.py
/ml-agents/mlagents/trainers/stats.py
/ml-agents/mlagents/trainers/ppo/trainer.py
/ml-agents/mlagents/trainers/sac/trainer.py
/ml-agents/mlagents/trainers/trainer/rl_trainer.py
/README.md
/com.unity.ml-agents/Runtime/DiscreteActionMasker.cs
/ml-agents/mlagents/tf_utils/tf.py
/ml-agents/mlagents/trainers/optimizer/tf_optimizer.py
/ml-agents/mlagents/trainers/policy/tf_policy.py

8 commits

Author  SHA1  Message  Commit date
Anupam Bhatnagar  392a84f1  [skip ci] fixing property decorator in sac  4 years ago
Anupam Bhatnagar  8b6c19ae  [skip ci] adding should_still_train method to ppo  4 years ago
Anupam Bhatnagar  0aedad7c  fixing should_still_train call in rl_trainer.py  4 years ago
Anupam Bhatnagar  26dc42e5  [skip ci]  4 years ago
Anupam Bhatnagar  a4567f27  [skip ci] restore process trajectory super calls  4 years ago
Anupam Bhatnagar  f7a3c06e  [skip ci] updating sac  4 years ago
Anupam Bhatnagar  beb7cb74  [skip ci] adding gpu config settings  4 years ago
Anupam Bhatnagar  4afd8f92  first commit  4 years ago
20 files changed, 87 insertions(+), 40 deletions(-)
  1. README.md (6)
  2. config/ppo/3DBall.yaml (2)
  3. config/sac/3DBall.yaml (2)
  4. docs/Installation-Anaconda-Windows.md (4)
  5. docs/Installation.md (8)
  6. docs/Training-on-Amazon-Web-Service.md (2)
  7. com.unity.ml-agents/Documentation~/com.unity.ml-agents.md (2)
  8. com.unity.ml-agents/Runtime/Academy.cs (4)
  9. com.unity.ml-agents/Runtime/Agent.cs (26)
  10. com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs (2)
  11. com.unity.ml-agents/Runtime/DiscreteActionMasker.cs (2)
  12. ml-agents/mlagents/tf_utils/tf.py (2)
  13. ml-agents/mlagents/trainers/stats.py (6)
  14. ml-agents/mlagents/trainers/learn.py (2)
  15. ml-agents/mlagents/trainers/optimizer/tf_optimizer.py (16)
  16. ml-agents/mlagents/trainers/policy/tf_policy.py (4)
  17. ml-agents/mlagents/trainers/trainer_controller.py (7)
  18. ml-agents/mlagents/trainers/ppo/trainer.py (7)
  19. ml-agents/mlagents/trainers/sac/trainer.py (16)
  20. ml-agents/mlagents/trainers/trainer/rl_trainer.py (7)

README.md (6)


# Unity ML-Agents Toolkit
[![docs badge](https://img.shields.io/badge/docs-reference-blue.svg)](https://github.com/Unity-Technologies/ml-agents/tree/release_2_docs/docs/)
[![docs badge](https://img.shields.io/badge/docs-reference-blue.svg)](https://github.com/Unity-Technologies/ml-agents/tree/release_3_docs/docs/)
[![license badge](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE)

## Releases & Documentation
**Our latest, stable release is `Release 2`. Click
[here](https://github.com/Unity-Technologies/ml-agents/tree/release_2_docs/docs/Readme.md)
**Our latest, stable release is `Release 3`. Click
[here](https://github.com/Unity-Technologies/ml-agents/tree/release_3_docs/docs/Readme.md)
to get started with the latest release of ML-Agents.**
The table below lists all our releases, including our `master` branch which is

config/ppo/3DBall.yaml (2)


gamma: 0.99
strength: 1.0
keep_checkpoints: 5
max_steps: 500000
max_steps: 100000
time_horizon: 1000
summary_freq: 12000
threaded: true

config/sac/3DBall.yaml (2)


gamma: 0.99
strength: 1.0
keep_checkpoints: 5
max_steps: 500000
max_steps: 100000
time_horizon: 1000
summary_freq: 12000
threaded: true

docs/Installation-Anaconda-Windows.md (4)


the ml-agents Conda environment by typing `activate ml-agents`)_:
```sh
git clone --branch release_2 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_3 https://github.com/Unity-Technologies/ml-agents.git
```
The `--branch release_2` option will switch to the tag of the latest stable
The `--branch release_3` option will switch to the tag of the latest stable
release. Omitting that will get the `master` branch which is potentially
unstable.

docs/Installation.md (8)


of our tutorials / guides assume you have access to our example environments).
```sh
git clone --branch release_2 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_3 https://github.com/Unity-Technologies/ml-agents.git
```
The `--branch release_2` option will switch to the tag of the latest stable
The `--branch release_3` option will switch to the tag of the latest stable
release. Omitting that will get the `master` branch which is potentially
unstable.

ML-Agents Toolkit for your purposes. If you plan to contribute those changes
back, make sure to clone the `master` branch (by omitting `--branch release_2`
back, make sure to clone the `master` branch (by omitting `--branch release_3`
from the command above). See our
[Contributions Guidelines](../com.unity.ml-agents/CONTRIBUTING.md) for more
information on contributing to the ML-Agents Toolkit.

#### Advanced: Local Installation for Development
You can [add the local](https://docs.unity3d.com/Manual/upm-ui-local.html)
`com.unity.ml-agents` package (from the repository that you just cloned) to our
`com.unity.ml-agents` package (from the repository that you just cloned) to your
project by:
1. navigating to the menu `Window` -> `Package Manager`.

docs/Training-on-Amazon-Web-Service.md (2)


2. Clone the ML-Agents repo and install the required Python packages
```sh
git clone --branch release_2 https://github.com/Unity-Technologies/ml-agents.git
git clone --branch release_3 https://github.com/Unity-Technologies/ml-agents.git
cd ml-agents/ml-agents/
pip3 install -e .
```

com.unity.ml-agents/Documentation~/com.unity.ml-agents.md (2)


[unity ML-Agents Toolkit]: https://github.com/Unity-Technologies/ml-agents
[unity inference engine]: https://docs.unity3d.com/Packages/com.unity.barracuda@latest/index.html
[package manager documentation]: https://docs.unity3d.com/Manual/upm-ui-install.html
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Installation.md
[installation instructions]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Installation.md
[github repository]: https://github.com/Unity-Technologies/ml-agents
[python package]: https://github.com/Unity-Technologies/ml-agents
[execution order of event functions]: https://docs.unity3d.com/Manual/ExecutionOrder.html

com.unity.ml-agents/Runtime/Academy.cs (4)


* API. For more information on each of these entities, in addition to how to
* set-up a learning environment and train the behavior of characters in a
* Unity scene, please browse our documentation pages on GitHub:
* https://github.com/Unity-Technologies/ml-agents/tree/release_2_docs/docs/
* https://github.com/Unity-Technologies/ml-agents/tree/release_3_docs/docs/
*/
namespace Unity.MLAgents

/// fall back to inference or heuristic decisions. (You can also set agents to always use
/// inference or heuristics.)
/// </remarks>
[HelpURL("https://github.com/Unity-Technologies/ml-agents/tree/release_2_docs/" +
[HelpURL("https://github.com/Unity-Technologies/ml-agents/tree/release_3_docs/" +
"docs/Learning-Environment-Design.md")]
public class Academy : IDisposable
{

com.unity.ml-agents/Runtime/Agent.cs (26)


/// [OnDisable()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnDisable.html]
/// [OnBeforeSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnBeforeSerialize.html
/// [OnAfterSerialize()]: https://docs.unity3d.com/ScriptReference/MonoBehaviour.OnAfterSerialize.html
/// [Agents]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md
/// [Reinforcement Learning in Unity]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design.md
/// [Agents]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md
/// [Reinforcement Learning in Unity]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design.md
/// [Unity ML-Agents Toolkit manual]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Readme.md
/// [Unity ML-Agents Toolkit manual]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Readme.md
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/" +
[HelpURL("https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/" +
"docs/Learning-Environment-Design-Agents.md")]
[Serializable]
[RequireComponent(typeof(BehaviorParameters))]

/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// </remarks>
/// <param name="reward">The new value of the reward.</param>
public void SetReward(float reward)

/// for information about mixing reward signals from curiosity and Generative Adversarial
/// Imitation Learning (GAIL) with rewards supplied through this method.
///
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
/// [Agents - Rewards]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#rewards
/// [Reward Signals]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/ML-Agents-Overview.md#a-quick-note-on-reward-signals
///</remarks>
/// <param name="increment">Incremental reward value.</param>
public void AddReward(float increment)

/// implementing a simple heuristic function can aid in debugging agent actions and interactions
/// with its environment.
///
/// [Demonstration Recorder]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Demonstration Recorder]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// </remarks>
/// <example>

/// For more information about observations, see [Observations and Sensors].
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Observations and Sensors]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#observations-and-sensors
/// [Observations and Sensors]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#observations-and-sensors
/// </remarks>
public virtual void CollectObservations(VectorSensor sensor)
{

///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <seealso cref="OnActionReceived(float[])"/>
public virtual void CollectDiscreteActionMasks(DiscreteActionMasker actionMasker)

///
/// For more information about implementing agent actions see [Agents - Actions].
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="vectorAction">
/// An array containing the action vector. The length of the array is specified

com.unity.ml-agents/Runtime/Demonstrations/DemonstrationRecorder.cs (2)


/// See [Imitation Learning - Recording Demonstrations] for more information.
///
/// [GameObject]: https://docs.unity3d.com/Manual/GameObjects.html
/// [Imitation Learning - Recording Demonstrations]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs//Learning-Environment-Design-Agents.md#recording-demonstrations
/// [Imitation Learning - Recording Demonstrations]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs//Learning-Environment-Design-Agents.md#recording-demonstrations
/// </remarks>
[RequireComponent(typeof(Agent))]
[AddComponentMenu("ML Agents/Demonstration Recorder", (int)MenuGroup.Default)]

com.unity.ml-agents/Runtime/DiscreteActionMasker.cs (2)


///
/// See [Agents - Actions] for more information on masking actions.
///
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_2_docs/docs/Learning-Environment-Design-Agents.md#actions
/// [Agents - Actions]: https://github.com/Unity-Technologies/ml-agents/blob/release_3_docs/docs/Learning-Environment-Design-Agents.md#actions
/// </remarks>
/// <param name="branch">The branch for which the actions will be masked.</param>
/// <param name="actionIndices">The indices of the masked actions.</param>

ml-agents/mlagents/tf_utils/tf.py (2)


# Everywhere else is caught by the banned-modules setting for flake8
import tensorflow as tf # noqa I201
from distutils.version import LooseVersion
import horovod.tensorflow as hvd
# LooseVersion handles things "1.2.3a" or "4.5.6-rc7" fairly sensibly.

"""
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.visible_device_list = str(hvd.local_rank())
# For multi-GPU training, set allow_soft_placement to True to allow
# placing the operation into an alternative device automatically
# to prevent from exceptions if the device doesn't suppport the operation

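The tf.py hunk above pins each worker to the GPU that matches its local Horovod rank. A minimal sketch of that pattern, assuming the TensorFlow 1.x API used on this branch; `make_session` is a hypothetical helper, not part of mlagents.tf_utils:

```python
import tensorflow as tf              # TF 1.x API, as used on this branch
import horovod.tensorflow as hvd

hvd.init()  # one call per process, before any session is created


def make_session() -> tf.Session:
    # Hypothetical helper: build a session whose process only sees the single
    # GPU matching its local Horovod rank on this node.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    config.gpu_options.visible_device_list = str(hvd.local_rank())
    config.allow_soft_placement = True  # fall back if an op lacks a GPU kernel
    return tf.Session(config=config)
```
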
ml-agents/mlagents/trainers/stats.py (6)


from mlagents_envs.logging_util import get_logger
from mlagents_envs.timers import set_gauge
from mlagents.tf_utils import tf, generate_session_config
import horovod.tensorflow as hvd
logger = get_logger(__name__)

) -> None:
is_training = "Not Training."
if "Is Training" in values:
stats_summary = stats_summary = values["Is Training"]
stats_summary = values["Is Training"]
rank = hvd.rank()
"Horovod Rank: {}. "
rank,
category,
step,
time.time() - self.training_start_time,

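The stats.py hunk adds the worker's Horovod rank to the console status line. A rough, standalone sketch of that idea (hypothetical `log_training_status` helper; assumes `hvd.init()` has already been called):

```python
import logging
import time

import horovod.tensorflow as hvd

logger = logging.getLogger(__name__)


def log_training_status(category: str, step: int, start_time: float) -> None:
    # Prefix the status line with the Horovod rank so output from different
    # workers can be told apart when their logs interleave.
    logger.info(
        "Horovod Rank: {}. {}: Step: {}. Time Elapsed: {:.3f} s.".format(
            hvd.rank(), category, step, time.time() - start_time
        )
    )
```
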
ml-agents/mlagents/trainers/learn.py (2)


add_metadata as add_timer_metadata,
)
from mlagents_envs import logging_util
import horovod.tensorflow as hvd
logger = logging_util.get_logger(__name__)

sampler_manager, resampling_interval = create_sampler_manager(
options.parameter_randomization, run_seed
)
hvd.init()
trainer_factory = TrainerFactory(
options.behaviors,
write_path,

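For context, `hvd.init()` above follows the usual Horovod start-up sequence: initialize once per process before any graph or session is built, then launch one process per GPU with a launcher such as `horovodrun` (e.g. something like `horovodrun -np 4 mlagents-learn config/ppo/3DBall.yaml`; the exact launch command is an assumption, not stated in this diff). A minimal sketch:

```python
import horovod.tensorflow as hvd


def main() -> None:
    # Must run once per process before any TensorFlow graph or session is
    # built; learn.py above calls it just before constructing TrainerFactory.
    hvd.init()
    print("worker", hvd.rank(), "of", hvd.size())
    # ... construct TrainerFactory / TrainerController and start training ...


if __name__ == "__main__":
    main()
```
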
ml-agents/mlagents/trainers/optimizer/tf_optimizer.py (16)


from mlagents.trainers.settings import TrainerSettings, RewardSignalType
from mlagents.trainers.components.bc.module import BCModule
try:
import horovod.tensorflow as hvd
except ImportError:
hvd = None
class TFOptimizer(Optimizer): # pylint: disable=W0223
def __init__(self, policy: TFPolicy, trainer_params: TrainerSettings):

def create_optimizer_op(
self, learning_rate: tf.Tensor, name: str = "Adam"
) -> tf.train.Optimizer:
return tf.train.AdamOptimizer(learning_rate=learning_rate, name=name)
if hvd is not None:
adam_optimizer = tf.train.AdamOptimizer(
learning_rate=learning_rate, name=name
)
horovod_optimizer = hvd.DistributedOptimizer(adam_optimizer)
else:
adam_optimizer = tf.train.AdamOptimizer(
learning_rate=learning_rate, name=name
)
return horovod_optimizer if hvd is not None else adam_optimizer
def _execute_model(
self, feed_dict: Dict[tf.Tensor, np.ndarray], out_dict: Dict[str, tf.Tensor]

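Reassembled, the intent of the `create_optimizer_op` change above appears to be roughly the following (a sketch, not the verbatim file; the `self` parameter is dropped so the snippet stands alone):

```python
import tensorflow as tf  # TF 1.x, as used by this branch

try:
    import horovod.tensorflow as hvd
except ImportError:
    hvd = None  # Horovod not installed: fall back to plain single-process Adam


def create_optimizer_op(learning_rate: tf.Tensor, name: str = "Adam") -> tf.train.Optimizer:
    adam_optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, name=name)
    if hvd is not None:
        # DistributedOptimizer averages gradients across workers via allreduce
        # before they are applied, so every worker sees the same update.
        return hvd.DistributedOptimizer(adam_optimizer)
    return adam_optimizer
```
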
ml-agents/mlagents/trainers/policy/tf_policy.py (4)


from mlagents.trainers.settings import TrainerSettings, NetworkSettings
from mlagents.trainers.brain import BrainParameters
from mlagents.trainers import __version__
import horovod.tensorflow as hvd
logger = get_logger(__name__)

self._load_graph(self.model_path, reset_global_steps=reset_steps)
else:
self._initialize_graph()
self.sess.run(hvd.broadcast_global_variables(0))
def get_weights(self):
with self.graph.as_default():

:param steps: The number of steps the model was trained for
:return:
"""
if hvd.rank() != 0:
return
with self.graph.as_default():
last_checkpoint = os.path.join(self.model_path, f"model-{steps}.ckpt")
self.saver.save(self.sess, last_checkpoint)

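The tf_policy.py hunk does two things: broadcast rank 0's weights after graph initialization so all workers start from identical parameters, and restrict checkpoint writes to rank 0. A sketch of both patterns under the same TF 1.x / Horovod assumptions (standalone functions are hypothetical, for illustration):

```python
import os

import tensorflow as tf
import horovod.tensorflow as hvd


def sync_initial_weights(sess: tf.Session) -> None:
    # After variables are created/initialized, rank 0 broadcasts its values so
    # every worker begins training from the same parameters.
    sess.run(hvd.broadcast_global_variables(0))


def save_checkpoint(sess: tf.Session, saver: tf.train.Saver,
                    model_path: str, steps: int) -> None:
    # Only the chief worker writes checkpoints; other ranks return early.
    if hvd.rank() != 0:
        return
    saver.save(sess, os.path.join(model_path, f"model-{steps}.ckpt"))
```
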
ml-agents/mlagents/trainers/trainer_controller.py (7)


from mlagents.trainers.agent_processor import AgentManager
from mlagents.trainers.settings import CurriculumSettings
from mlagents.trainers.training_status import GlobalTrainingStatus, StatusType
import horovod.tensorflow as hvd
class TrainerController(object):

"""
Saves current model to checkpoint folder.
"""
if hvd.rank() != 0:
return
for brain_name in self.trainers.keys():
for name_behavior_id in self.brain_name_to_identifier[brain_name]:
self.trainers[brain_name].save_model(name_behavior_id)

"""
Exports latest saved models to .nn format for Unity embedding.
"""
if hvd.rank() != 0:
return
for brain_name in self.trainers.keys():
for name_behavior_id in self.brain_name_to_identifier[brain_name]:
self.trainers[brain_name].export_model(name_behavior_id)

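The same rank-0 guard appears in both `save_model` and `export_model` above. A small sketch of that guard, written with an optional-import fallback so a non-Horovod (single-process) run would still save and export; `is_chief_worker` is a hypothetical helper, not part of this PR:

```python
try:
    import horovod.tensorflow as hvd
except ImportError:
    hvd = None


def is_chief_worker() -> bool:
    # True when this process should perform one-off side effects such as
    # saving or exporting models. Without Horovod there is only one process,
    # so the check always passes.
    return hvd is None or hvd.rank() == 0
```
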
ml-agents/mlagents/trainers/ppo/trainer.py (7)


Uses demonstration_buffer to update the policy.
The reward signal generators must be updated in this method at their own pace.
"""
buffer_length = self.update_buffer.num_experiences
self._maybe_write_summary(self.get_step + self.hyperparameters.buffer_size)
self._maybe_save_model(self.get_step + self.hyperparameters.buffer_size)
self._increment_step(self.hyperparameters.buffer_size, self.brain_name)
# Make sure batch_size is a multiple of sequence length. During training, we
# will need to reshape the data into a batch_size x sequence_length tensor.

for _ in range(num_epoch):
self.update_buffer.shuffle(sequence_length=self.policy.sequence_length)
buffer = self.update_buffer
max_num_batch = buffer_length // batch_size
max_num_batch = self.hyperparameters.buffer_size // batch_size
for i in range(0, max_num_batch * batch_size, batch_size):
update_stats = self.optimizer.update(
buffer.make_mini_batch(i, i + batch_size), n_sequences

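The PPO change above derives the number of mini-batches from the configured `buffer_size` rather than from the local buffer's actual length, so every worker performs the same number of updates per epoch. A worked example (the numbers are illustrative, not taken from the config files in this PR):

```python
# Illustrative numbers only. With buffer_size = 2048 and batch_size = 512,
# every worker now runs exactly buffer_size // batch_size = 4 mini-batch
# updates per epoch, instead of a count that depends on how many experiences
# its local buffer happened to hold.
buffer_size = 2048
batch_size = 512
max_num_batch = buffer_size // batch_size      # == 4 on every worker
for i in range(0, max_num_batch * batch_size, batch_size):
    start, end = i, i + batch_size             # mini-batch covers [start, end)
```
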
ml-agents/mlagents/trainers/sac/trainer.py (16)


)
)
@property
def should_still_train(self) -> bool:
"""
Returns whether or not the trainer should train. A Trainer could
stop training if it wasn't training to begin with, or if max_steps
is reached.
"""
return self.is_training and self.steps_per_update * self.update_steps <= \
self.get_max_steps
super()._process_trajectory(trajectory)
self._maybe_write_summary(self.get_step + len(trajectory.steps))
self._maybe_save_model(self.get_step + len(trajectory.steps))
self._increment_step(len(trajectory.steps), trajectory.behavior_id)
last_step = trajectory.steps[-1]
agent_id = trajectory.agent_id # All the agents should have the same ID

while (
self.step - self.hyperparameters.buffer_init_steps
) / self.update_steps > self.steps_per_update:
logger.debug("Updating SAC policy at step {}".format(self.step))
buffer = self.update_buffer
if self.update_buffer.num_experiences >= self.hyperparameters.batch_size:

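The `should_still_train` property added above keeps training while `steps_per_update * update_steps` has not yet passed `max_steps`, i.e. it bounds the number of policy updates by the step budget. A tiny numeric sketch of that relationship (all values are made up):

```python
# Made-up values: one policy update per 10 environment steps, 100k step budget.
steps_per_update = 10
max_steps = 100_000

update_steps = 1
while steps_per_update * update_steps <= max_steps:   # should_still_train
    update_steps += 1                                  # perform one policy update
print(update_steps - 1)                                # -> 10000 updates in total
```
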
ml-agents/mlagents/trainers/trainer/rl_trainer.py (7)


"""
return False
@abc.abstractmethod
def _update_policy(self) -> bool:
"""
Uses demonstration_buffer to update model.

Takes a trajectory and processes it, putting it into the update buffer.
:param trajectory: The Trajectory tuple containing the steps to be processed.
"""
self._maybe_write_summary(self.get_step + len(trajectory.steps))
self._maybe_save_model(self.get_step + len(trajectory.steps))
self._increment_step(len(trajectory.steps), trajectory.behavior_id)
pass
def _maybe_write_summary(self, step_after_process: int) -> None:
"""

"""
if self._next_summary_step == 0: # Don't write out the first one
self._next_summary_step = self._get_next_interval_step(self.summary_freq)
if step_after_process >= self._next_summary_step and self.get_step != 0:
if step_after_process >= self._next_summary_step:
self._write_summary(self._next_summary_step)
def _maybe_save_model(self, step_after_process: int) -> None:

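The rl_trainer.py hunk changes when summaries fire: a summary is written whenever the processed-step counter crosses the next multiple of `summary_freq`, skipping the very first interval. A rough standalone sketch of that logic (hypothetical function, simplified from `RLTrainer._maybe_write_summary`):

```python
def maybe_write_summary(step_after_process: int, next_summary_step: int,
                        summary_freq: int) -> int:
    # Returns the updated summary boundary after possibly writing a summary.
    if next_summary_step == 0:  # don't write one for the very first interval
        next_summary_step = ((step_after_process // summary_freq) + 1) * summary_freq
    if step_after_process >= next_summary_step:
        print(f"writing summary at step {next_summary_step}")
        next_summary_step += summary_freq
    return next_summary_step
```
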