Updating the barracuda 1.4.0 (#5291)
Initial commit; second commit. The no-extrinsic model was trained without the log reward (reward = prob), while the new one uses reward = log_prob - log_prior. A few results: it looks like Walker-diverse-r05-bigger.onnx is doing something. Modified PushBlock to use the next state and the action; it did not help. Fixed a bug that produced 9 diversity settings instead of 8. Removed results/exp-continuous-div.
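A minimal numeric sketch of the two reward formulations mentioned above; the probability value and goal count are illustrative assumptions, not numbers from the training runs:

import numpy as np

# Hypothetical discriminator probability assigned to the goal that was actually active.
prob_of_true_goal = 0.7
num_goals = 8  # the commit fixes a run that accidentally used 9 settings

# Older "no-extrinsic" runs: reward is the raw probability.
old_reward = prob_of_true_goal                                      # 0.7

# New runs: reward = log_prob - log_prior, with a uniform prior of 1/num_goals.
new_reward = np.log(prob_of_true_goal) - np.log(1.0 / num_goals)   # ~1.72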
vincentpierre
4 years ago
Commit 4e14879d
27 files changed, 6511 insertions and 54 deletions
- 37    Project/Assets/ML-Agents/Examples/PushBlock/Prefabs/PushBlockArea.prefab
- 17    Project/Assets/ML-Agents/Examples/PushBlock/Scripts/PushAgentBasic.cs
- 79    Project/Assets/ML-Agents/Examples/Pyramids/Prefabs/AreaPB.prefab
- 11    Project/Assets/ML-Agents/Examples/Pyramids/Scripts/PyramidAgent.cs
- 70    Project/Assets/ML-Agents/Examples/Walker/Prefabs/Platforms/Platform.prefab
- 86    Project/Assets/ML-Agents/Examples/Walker/Prefabs/Ragdoll/WalkerRagdoll.prefab
- 10    Project/Assets/ML-Agents/Examples/Walker/Scripts/WalkerAgent.cs
- 2     com.unity.ml-agents/CHANGELOG.md
- 2     com.unity.ml-agents/package.json
- 13    config/ppo/Pyramids.yaml
- 4     config/ppo/Walker.yaml
- 7     ml-agents/mlagents/trainers/settings.py
- 3     ml-agents/mlagents/trainers/torch/components/reward_providers/__init__.py
- 4     ml-agents/mlagents/trainers/torch/components/reward_providers/reward_provider_factory.py
- 1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-10M.onnx
- 14    Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-10M.onnx.meta
- 1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-no-extrinsic.onnx
- 14    Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-no-extrinsic.onnx.meta
- 1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-r02-bigger.onnx
- 14    Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-r02-bigger.onnx.meta
- 1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-r05-bigger.onnx
- 14    Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-r05-bigger.onnx.meta
- 1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-extrinsic-log-diverse.onnx
- 14    Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-extrinsic-log-diverse.onnx.meta
- 1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-new-reward-1.onnx
- 14    Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-new-reward-1.onnx.meta
- 130   ml-agents/mlagents/trainers/torch/components/reward_providers/diverse_reward_provider.py
1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-10M.onnx
File diff suppressed because it is too large.

14  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-10M.onnx.meta
fileFormatVersion: 2
guid: c4c1c9de2772f48e8b0cc2cdd62ce8c5
ScriptedImporter:
  internalIDToNameTable: []
  externalObjects: {}
  serializedVersion: 2
  userData:
  assetBundleName:
  assetBundleVariant:
  script: {fileID: 11500000, guid: 683b6cb6d0a474744822c888b46772c9, type: 3}
  optimizeModel: 1
  forceArbitraryBatchSize: 1
  treatErrorsAsWarnings: 0
  importMode: 1
1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-no-extrinsic.onnx
File diff suppressed because it is too large.

14  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-no-extrinsic.onnx.meta
fileFormatVersion: 2
guid: e97d22662d00b43ed999531846572c36
ScriptedImporter:
  internalIDToNameTable: []
  externalObjects: {}
  serializedVersion: 2
  userData:
  assetBundleName:
  assetBundleVariant:
  script: {fileID: 11500000, guid: 683b6cb6d0a474744822c888b46772c9, type: 3}
  optimizeModel: 1
  forceArbitraryBatchSize: 1
  treatErrorsAsWarnings: 0
  importMode: 1
1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-r02-bigger.onnx
File diff suppressed because it is too large.

14  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-r02-bigger.onnx.meta
fileFormatVersion: 2
guid: 7c800225c636c4e299f1decdfa9b9029
ScriptedImporter:
  internalIDToNameTable: []
  externalObjects: {}
  serializedVersion: 2
  userData:
  assetBundleName:
  assetBundleVariant:
  script: {fileID: 11500000, guid: 683b6cb6d0a474744822c888b46772c9, type: 3}
  optimizeModel: 1
  forceArbitraryBatchSize: 1
  treatErrorsAsWarnings: 0
  importMode: 1
1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-r05-bigger.onnx
File diff suppressed because it is too large.

14  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-diverse-r05-bigger.onnx.meta
fileFormatVersion: 2
guid: 55f976c29f6ff4d2bbe6dff72645410a
ScriptedImporter:
  internalIDToNameTable: []
  externalObjects: {}
  serializedVersion: 2
  userData:
  assetBundleName:
  assetBundleVariant:
  script: {fileID: 11500000, guid: 683b6cb6d0a474744822c888b46772c9, type: 3}
  optimizeModel: 1
  forceArbitraryBatchSize: 1
  treatErrorsAsWarnings: 0
  importMode: 1
1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-extrinsic-log-diverse.onnx
File diff suppressed because it is too large.

14  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-extrinsic-log-diverse.onnx.meta
fileFormatVersion: 2
guid: a9ceaa0e80ae34d229e1a2277d3388e5
ScriptedImporter:
  internalIDToNameTable: []
  externalObjects: {}
  serializedVersion: 2
  userData:
  assetBundleName:
  assetBundleVariant:
  script: {fileID: 11500000, guid: 683b6cb6d0a474744822c888b46772c9, type: 3}
  optimizeModel: 1
  forceArbitraryBatchSize: 1
  treatErrorsAsWarnings: 0
  importMode: 1
1001  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-new-reward-1.onnx
File diff suppressed because it is too large.

14  Project/Assets/ML-Agents/Examples/Walker/TFModels/Walker-new-reward-1.onnx.meta
fileFormatVersion: 2
guid: 29312cd291f9d4f2d91c7272a2a14324
ScriptedImporter:
  internalIDToNameTable: []
  externalObjects: {}
  serializedVersion: 2
  userData:
  assetBundleName:
  assetBundleVariant:
  script: {fileID: 11500000, guid: 683b6cb6d0a474744822c888b46772c9, type: 3}
  optimizeModel: 1
  forceArbitraryBatchSize: 1
  treatErrorsAsWarnings: 0
  importMode: 1
130  ml-agents/mlagents/trainers/torch/components/reward_providers/diverse_reward_provider.py
import numpy as np
from typing import Dict
from mlagents.torch_utils import torch

from mlagents_envs.base_env import ObservationType
from mlagents.trainers.buffer import AgentBuffer
from mlagents.trainers.torch.components.reward_providers.base_reward_provider import (
    BaseRewardProvider,
)
from mlagents.trainers.settings import DiverseSettings
from mlagents.trainers.torch.action_flattener import ActionFlattener
from mlagents.trainers.torch.agent_action import AgentAction

from mlagents_envs.base_env import BehaviorSpec
from mlagents_envs import logging_util
from mlagents.trainers.torch.utils import ModelUtils
from mlagents.trainers.torch.networks import NetworkBody
from mlagents.trainers.trajectory import ObsUtil

logger = logging_util.get_logger(__name__)


class DiverseRewardProvider(BaseRewardProvider):
    # From https://arxiv.org/pdf/1802.06070.pdf
    def __init__(self, specs: BehaviorSpec, settings: DiverseSettings) -> None:
        super().__init__(specs, settings)
        self._ignore_done = False  # Tried with false. Bias for staying alive.
        self._use_actions = False

        self._network = DiverseNetwork(specs, settings, self._use_actions)
        self.optimizer = torch.optim.SGD(
            self._network.parameters(), lr=settings.learning_rate
        )
        # Index of the GOAL_SIGNAL observation used as the discriminator target.
        self._diverse_index = -1
        self._max_index = len(specs.observation_specs)
        for i, spec in enumerate(specs.observation_specs):
            if spec.observation_type == ObservationType.GOAL_SIGNAL:
                self._diverse_index = i

    def evaluate(self, mini_batch: AgentBuffer) -> np.ndarray:
        with torch.no_grad():
            prediction = self._network(mini_batch)
            truth = ModelUtils.list_to_tensor(
                ObsUtil.from_buffer(mini_batch, self._max_index)[self._diverse_index]
            )
            # reward = log q(goal | next_obs) - log(uniform prior)
            rewards = torch.log(
                torch.sum(prediction * truth, dim=1) + 1e-10
            ) - np.log(1 / self._network.diverse_size)
        return rewards.detach().cpu().numpy()

    def update(self, mini_batch: AgentBuffer) -> Dict[str, np.ndarray]:
        prediction = self._network(mini_batch)
        truth = ModelUtils.list_to_tensor(
            ObsUtil.from_buffer(mini_batch, self._max_index)[self._diverse_index]
        )
        # loss = torch.mean(
        #     torch.sum(-torch.log(prediction + 1e-10) * truth, dim=1), dim=0
        # )
        # Epsilon keeps the log finite when the discriminator assigns zero probability.
        loss = -torch.mean(
            torch.log(torch.sum(prediction * truth, dim=1) + 1e-10)
        )
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return {"Losses/DIVERSE Loss": loss.detach().cpu().numpy()}

    def get_modules(self):
        return {f"Module:{self.name}": self._network}


class DiverseNetwork(torch.nn.Module):
    EPSILON = 1e-10

    def __init__(
        self, specs: BehaviorSpec, settings: DiverseSettings, use_actions: bool
    ) -> None:
        super().__init__()
        self._use_actions = use_actions
        state_encoder_settings = settings.network_settings
        if state_encoder_settings.memory is not None:
            state_encoder_settings.memory = None
            logger.warning(
                "memory was specified in network_settings but is not supported. "
                "It is being ignored."
            )
        self._action_flattener = ActionFlattener(specs.action_spec)
        new_spec = [
            spec
            for spec in specs.observation_specs
            if spec.observation_type != ObservationType.GOAL_SIGNAL
        ]
        diverse_spec = [
            spec
            for spec in specs.observation_specs
            if spec.observation_type == ObservationType.GOAL_SIGNAL
        ][0]

        print(" > ", new_spec, "\n\n\n", " >> ", diverse_spec)
        self._all_obs_specs = specs.observation_specs

        self.diverse_size = diverse_spec.shape[0]

        if self._use_actions:
            self._encoder = NetworkBody(
                new_spec, state_encoder_settings, self._action_flattener.flattened_size
            )
        else:
            self._encoder = NetworkBody(new_spec, state_encoder_settings)
        self._last_layer = torch.nn.Linear(
            state_encoder_settings.hidden_units, self.diverse_size
        )

    def forward(self, mini_batch: AgentBuffer) -> torch.Tensor:
        n_obs = len(self._encoder.processors) + 1
        np_obs = ObsUtil.from_buffer_next(mini_batch, n_obs)
        # Convert to tensors, dropping the goal signal itself from the inputs
        tensor_obs = [
            ModelUtils.list_to_tensor(obs)
            for obs, spec in zip(np_obs, self._all_obs_specs)
            if spec.observation_type != ObservationType.GOAL_SIGNAL
        ]

        if self._use_actions:
            action = self._action_flattener.forward(AgentAction.from_buffer(mini_batch))
            hidden, _ = self._encoder.forward(tensor_obs, action)
        else:
            hidden, _ = self._encoder.forward(tensor_obs)
        self._encoder.update_normalization(mini_batch)

        prediction = torch.softmax(self._last_layer(hidden), dim=1)
        return prediction
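As a rough, self-contained sketch of what DiverseNetwork and the update/evaluate steps above compute, here is the same computation in plain PyTorch, without the ml-agents NetworkBody/ObsUtil helpers; the batch size, observation size, goal count, hidden size, and learning rate are all illustrative assumptions:

import torch

# Toy stand-ins for the real pieces (all sizes are illustrative only).
obs_size, goal_size, hidden = 12, 8, 32
encoder = torch.nn.Sequential(torch.nn.Linear(obs_size, hidden), torch.nn.ReLU())
last_layer = torch.nn.Linear(hidden, goal_size)
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(last_layer.parameters()), lr=3e-4
)

# A batch of next-state observations and the one-hot goal signal each came from.
obs = torch.randn(64, obs_size)
truth = torch.nn.functional.one_hot(
    torch.randint(0, goal_size, (64,)), goal_size
).float()

# Discriminator update, mirroring DiverseNetwork.forward and update() above.
prediction = torch.softmax(last_layer(encoder(obs)), dim=1)
loss = -torch.mean(torch.log(torch.sum(prediction * truth, dim=1) + 1e-10))
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Diversity reward, mirroring evaluate(): log q(goal | next_obs) - log(1 / goal_size).
with torch.no_grad():
    reward = torch.log(torch.sum(prediction * truth, dim=1) + 1e-10) - torch.log(
        torch.tensor(1.0 / goal_size)
    )

Because the reward subtracts log(1 / goal_size), it is zero when the discriminator is no better than the uniform prior and grows as the goal becomes identifiable from the next state, which is the DIAYN-style objective cited at the top of DiverseRewardProvider.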