
no target net

/develop/coma-noact
Andrew Cohen 4 years ago
Current commit
5741f8f6
3 files changed, with 5 insertions and 5 deletions
  1. config/ppo/PushBlock.yaml (4 changed lines)
  2. ml-agents/mlagents/trainers/optimizer/torch_optimizer.py (4 changed lines)
  3. ml-agents/mlagents/trainers/ppo/optimizer_torch.py (2 changed lines)

4
config/ppo/PushBlock.yaml


batch_size: 128
buffer_size: 2048
learning_rate: 0.0003
- beta: 0.01
+ beta: 0.005
- lambd: 0.95
+ lambd: 0.8
num_epoch: 3
learning_rate_schedule: linear
network_settings:
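
To make the `lambd` change concrete: in ML-Agents PPO configs, `lambd` is the lambda of Generalized Advantage Estimation. The sketch below is not code from this repository. It is a toy, pure-Python GAE over one made-up trajectory, with `gamma=0.99` assumed (the diff does not show it) and `gae_advantages` a hypothetical helper name. It only illustrates that a smaller `lambd` puts less weight on TD residuals from later steps.

```python
def gae_advantages(rewards, values, next_value, gamma=0.99, lambd=0.95):
    """Toy Generalized Advantage Estimation over a single trajectory.

    gamma=0.99 is an assumption; the diff does not show the discount factor.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Bootstrap from next_value at the last step, else from the next state value.
        v_next = next_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + gamma * v_next - values[t]  # one-step TD residual
        running = delta + gamma * lambd * running        # discounted sum of residuals
        advantages[t] = running
    return advantages


# Made-up rewards and value estimates, for illustration only.
rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.6, 0.7]

old = gae_advantages(rewards, values, next_value=0.0, lambd=0.95)
new = gae_advantages(rewards, values, next_value=0.0, lambd=0.8)
td_only = gae_advantages(rewards, values, next_value=0.0, lambd=0.0)
```

With `lambd=0.0` each advantage collapses to its own one-step TD residual, so moving from 0.95 to 0.8 shifts the estimate toward the critic's bootstrapped values, typically trading some variance for some bias.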

4
ml-agents/mlagents/trainers/optimizer/torch_optimizer.py


memory = torch.zeros([1, 1, self.policy.m_size])
- value_estimates, marg_val_estimates, mem = self.policy.actor_critic.target_critic_pass(
+ value_estimates, marg_val_estimates, mem = self.policy.actor_critic.critic_pass(
current_obs,
actions,
memory,

)
- next_value_estimates, next_marg_val_estimates, next_mem = self.policy.actor_critic.target_critic_pass(
+ next_value_estimates, next_marg_val_estimates, next_mem = self.policy.actor_critic.critic_pass(
next_obs,
next_actions,
memory,

2
ml-agents/mlagents/trainers/ppo/optimizer_torch.py


self.optimizer.step()
ModelUtils.soft_update(
- self.policy.actor_critic.critic, self.policy.actor_critic.target, .001
+ self.policy.actor_critic.critic, self.policy.actor_critic.target, 1.
)
update_stats = {
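
Why a coefficient of `1.` amounts to "no target net": reading this hunk as replacing `.001` with `1.` (which matches the commit title), and assuming `ModelUtils.soft_update` follows the usual Polyak rule `target <- tau * source + (1 - tau) * target` (its body is not shown in this diff), a value of 1 makes the target an exact copy of the critic after every step. The sketch below uses plain Python lists as stand-ins for parameter tensors. `polyak_update` is a hypothetical name, not the ML-Agents function.

```python
def polyak_update(source, target, tau):
    """In-place soft update: target <- tau * source + (1 - tau) * target.

    Hypothetical stand-in for a soft-update helper; lists replace tensors.
    """
    for i in range(len(target)):
        target[i] = tau * source[i] + (1.0 - tau) * target[i]


critic = [1.0, 2.0, 3.0]        # made-up "critic" parameters
slow_target = [0.0, 0.0, 0.0]   # target updated with tau = 0.001
copied_target = [0.0, 0.0, 0.0] # target updated with tau = 1.0

polyak_update(critic, slow_target, 0.001)
polyak_update(critic, copied_target, 1.0)
```

After one call, `slow_target` has moved only 0.1% of the way toward `critic`, while `copied_target` equals `critic` exactly, so any value read from the target is the same as one read from the critic itself.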
