clipping values and updated zombie

/develop/coma-withq
Andrew Cohen 4 years ago
Current commit
b0bf7817
7 files changed, with 1183 insertions and 817 deletions
  1. Project/Assets/ML-Agents/Examples/PushBlock/Scenes/2ZombieVs3AgentsPushBlock.unity (944 lines changed)
  2. Project/Assets/ML-Agents/Examples/PushBlock/Scripts/ZombiePushBlockDeathEnvController.cs (6 lines changed)
  3. config/ppo/PushBlock.yaml (4 lines changed)
  4. ml-agents/mlagents/trainers/ppo/optimizer_torch.py (12 lines changed)
  5. Project/Assets/ML-Agents/Examples/PushBlock/Prefabs/ZombiePushBlockCollabArea 1.prefab (1001 lines changed)
  6. Project/Assets/ML-Agents/Examples/PushBlock/Prefabs/ZombiePushBlockCollabArea 1.prefab.meta (7 lines changed)
  7. config/ppo/Zombie.yaml (26 lines changed)

944
Project/Assets/ML-Agents/Examples/PushBlock/Scenes/2ZombieVs3AgentsPushBlock.unity
File diff suppressed because it is too large

6
Project/Assets/ML-Agents/Examples/PushBlock/Scripts/ZombiePushBlockDeathEnvController.cs


 item.Agent.AddReward(score);
 }
 }
 }
-StartCoroutine(GoalScoredSwapGroundMaterial(m_PushBlockSettings.goalScoredMaterial, 0.5f));
+StartCoroutine(GoalScoredSwapGroundMaterial(m_PushBlockSettings.goalScoredMaterial, 0.5f));
-ResetScene();
+ResetScene();
-}
+}
 public void ZombieTouchedBlock()

4
config/ppo/PushBlock.yaml


 batch_size: 128
 buffer_size: 2048
 learning_rate: 0.0003
-beta: 0.01
+beta: 0.001
 epsilon: 0.2
 lambd: 0.95
 num_epoch: 3

-hidden_units: 512
+hidden_units: 256
 num_layers: 2
 vis_encode_type: simple
 reward_signals:
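
Note on `beta`: in PPO this hyperparameter scales the entropy bonus, so a smaller value puts less pressure on the policy to stay random. The sketch below is only an illustration of that effect for the two values that appear in this hunk (0.01 and 0.001). The function name `ppo_total_loss`, the 0.5 value-loss coefficient, and the sample loss numbers are assumptions made for the example, not values taken from this commit.

```python
def ppo_total_loss(policy_loss, value_loss, entropy, beta, value_coef=0.5):
    """Combine the PPO loss terms; the entropy bonus is subtracted.

    value_coef=0.5 is an assumed weighting chosen for this illustration.
    """
    return policy_loss + value_coef * value_loss - beta * entropy


# Hypothetical loss terms, chosen only to show the effect of beta.
policy_loss, value_loss, entropy = 0.2, 0.4, 1.5

loss_beta_high = ppo_total_loss(policy_loss, value_loss, entropy, beta=0.01)
loss_beta_low = ppo_total_loss(policy_loss, value_loss, entropy, beta=0.001)

# The entropy term contributes ten times less with the smaller beta.
print(loss_beta_high, loss_beta_low)
```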

12
ml-agents/mlagents/trainers/ppo/optimizer_torch.py


 for name, head in values.items():
     old_val_tensor = old_values[name]
     returns_tensor = returns[name]
-    #clipped_value_estimate = old_val_tensor + torch.clamp(
-    #    head - old_val_tensor, -1 * epsilon, epsilon
-    #)
+    clipped_value_estimate = old_val_tensor + torch.clamp(
+        head - old_val_tensor, -1 * epsilon, epsilon
+    )
-    #v_opt_b = (returns_tensor - clipped_value_estimate) ** 2
-    #value_loss = ModelUtils.masked_mean(torch.max(v_opt_a, v_opt_b), loss_masks)
-    value_loss = ModelUtils.masked_mean(v_opt_a, loss_masks)
+    v_opt_b = (returns_tensor - clipped_value_estimate) ** 2
+    value_loss = ModelUtils.masked_mean(torch.max(v_opt_a, v_opt_b), loss_masks)
+    #value_loss = ModelUtils.masked_mean(v_opt_a, loss_masks)
     value_losses.append(value_loss)
 value_loss = torch.mean(torch.stack(value_losses))
 return value_loss
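
Note on the value clipping in this hunk: the clipped estimate may move at most `epsilon` away from the old estimate, and the loss takes the larger of the clipped and unclipped squared errors before averaging over unmasked steps. The following is a dependency-free sketch of that computation for a single value head. It uses plain Python lists in place of torch tensors, the function name `clipped_value_loss` is hypothetical, and the definition of `v_opt_a` as the unclipped squared error is an assumption, since that line is not shown in the diff.

```python
def clipped_value_loss(values, old_values, returns, masks, epsilon):
    """Masked mean of max(unclipped, clipped) squared errors for one value head."""
    total = 0.0
    count = 0
    for v, v_old, ret, keep in zip(values, old_values, returns, masks):
        if not keep:
            continue  # masked-out steps do not contribute to the loss
        # Limit how far the new estimate may move from the old one.
        delta = max(-epsilon, min(epsilon, v - v_old))
        clipped = v_old + delta
        v_opt_a = (ret - v) ** 2        # unclipped squared error (assumed definition)
        v_opt_b = (ret - clipped) ** 2  # clipped squared error
        total += max(v_opt_a, v_opt_b)
        count += 1
    # Avoid division by zero when every step is masked out.
    return total / max(count, 1)


loss = clipped_value_loss(
    values=[1.0, 0.0],
    old_values=[0.0, 0.0],
    returns=[1.0, 0.5],
    masks=[True, True],
    epsilon=0.2,
)
print(loss)  # about 0.445
```

In the first step the new estimate matches the return exactly, yet the clipped term still yields an error of about 0.64, which is how clipping discourages large jumps in the value estimate within one update.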

1001
Project/Assets/ML-Agents/Examples/PushBlock/Prefabs/ZombiePushBlockCollabArea 1.prefab
File diff suppressed because it is too large

7
Project/Assets/ML-Agents/Examples/PushBlock/Prefabs/ZombiePushBlockCollabArea 1.prefab.meta


fileFormatVersion: 2
guid: 0ed44cea8ffd043ca8db76dd48deb7a2
PrefabImporter:
  externalObjects: {}
  userData:
  assetBundleName:
  assetBundleVariant:

26
config/ppo/Zombie.yaml


behaviors:
  PushBlock:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.01
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: constant
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 20000000 #2000000
    time_horizon: 64
    summary_freq: 10000
    threaded: true
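
Note on the hyperparameters in this config: `batch_size`, `buffer_size`, and `num_epoch` together determine how many gradient steps are taken each time the buffer fills. The arithmetic can be sketched as below, assuming the trainer makes `num_epoch` passes over the full buffer in chunks of `batch_size`. The helper name `minibatch_updates_per_buffer` is hypothetical.

```python
def minibatch_updates_per_buffer(buffer_size, batch_size, num_epoch):
    """Gradient steps per filled buffer, assuming full passes in batch_size chunks."""
    batches_per_epoch = buffer_size // batch_size
    return batches_per_epoch * num_epoch


# Values copied from the hyperparameters section of config/ppo/Zombie.yaml.
hyperparameters = {"batch_size": 128, "buffer_size": 2048, "num_epoch": 3}

updates = minibatch_updates_per_buffer(**hyperparameters)
print(updates)  # 48
```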