
no target, increase lambda

/develop/coma-noact
Andrew Cohen, 4 years ago
Commit bd341f7f
3 changed files with 3 additions and 3 deletions
  1. config/ppo/PushBlock.yaml (+1 -1)
  2. ml-agents/mlagents/trainers/ppo/optimizer_torch.py (+1 -1)
  3. ml-agents/mlagents/trainers/ppo/trainer.py (+1 -1)

config/ppo/PushBlock.yaml (+1 -1)


  learning_rate: 0.0003
  beta: 0.01
  epsilon: 0.2
- lambd: 0.8
+ lambd: 0.95
  num_epoch: 3
  learning_rate_schedule: linear
  network_settings:
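
In the ml-agents PPO configuration, lambd is the lambda of Generalized Advantage Estimation (GAE); raising it from 0.8 to 0.95 weights longer-horizon returns more heavily in the advantage estimates (less bias, more variance). Below is a minimal NumPy sketch of the computation this parameter feeds into; the function name and signature are illustrative, not the trainer's exact helper:

import numpy as np

def gae_advantages(rewards, values, value_next=0.0, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation: blend TD residuals backwards in time.
    # lam=0 gives one-step TD advantages; lam=1 gives full Monte Carlo returns.
    values = np.append(np.asarray(values, dtype=np.float64), value_next)
    deltas = np.asarray(rewards, dtype=np.float64) + gamma * values[1:] - values[:-1]
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages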

ml-agents/mlagents/trainers/ppo/optimizer_torch.py (+1 -1)


  self.optimizer.step()
  ModelUtils.soft_update(
-     self.policy.actor_critic.critic, self.policy.actor_critic.target, 0.005
+     self.policy.actor_critic.critic, self.policy.actor_critic.target, 1.0
  )
  update_stats = {
      # NOTE: abs() is not technically correct, but matches the behavior in TensorFlow.
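
ModelUtils.soft_update applies a Polyak-style update to the target network; with tau raised from 0.005 to 1.0, the update degenerates into a hard copy of the critic after every optimizer step, which matches the "no target" part of the commit message. A sketch of the standard formula, assuming the usual (source, target, tau) argument order; the standalone function below is illustrative, not the library's implementation:

import torch

def soft_update(source: torch.nn.Module, target: torch.nn.Module, tau: float) -> None:
    # Polyak averaging: target <- tau * source + (1 - tau) * target.
    # With tau = 1.0 (as in this commit) the target network becomes an
    # exact copy of the source network after every step.
    with torch.no_grad():
        for src, tgt in zip(source.parameters(), target.parameters()):
            tgt.data.copy_(tau * src.data + (1.0 - tau) * tgt.data)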

ml-agents/mlagents/trainers/ppo/trainer.py (+1 -1)


  # This is later used as the target for the different value estimates
  # agent_buffer_trajectory[f"{name}_returns"].set(local_return)
  agent_buffer_trajectory[f"{name}_returns_q"].set(returns_v)
- agent_buffer_trajectory[f"{name}_returns_b"].set(returns_v)
+ agent_buffer_trajectory[f"{name}_returns_b"].set(returns_b)
  agent_buffer_trajectory[f"{name}_returns_v"].set(returns_v)
  agent_buffer_trajectory[f"{name}_advantage"].set(local_advantage)
  tmp_advantages.append(local_advantage)
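
The one-line fix above: the baseline return target {name}_returns_b was previously being filled with the value-head returns (returns_v); it now stores returns_b, so the baseline head trains against its own targets. For context, a minimal sketch of how discounted return targets of this kind are commonly computed, bootstrapped from a final value estimate; names and signature are hypothetical, not the trainer's actual code:

import numpy as np

def discounted_returns(rewards, gamma=0.99, bootstrap=0.0):
    # Return targets computed backwards from a bootstrap value estimate
    # for the state following the last step of the trajectory.
    returns = np.zeros(len(rewards))
    running = bootstrap
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns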
