
Use tanh squash

/develop/torch-tanh
Ervin Teng, 4 years ago
Commit 0cdb2040
2 files changed: 3 insertions(+), 3 deletions(-)
  1. ml-agents/mlagents/trainers/ppo/trainer.py (1 change)
  2. ml-agents/mlagents/trainers/torch/distributions.py (5 changes)

ml-agents/mlagents/trainers/ppo/trainer.py (1 change)

         behavior_spec,
         self.trainer_settings,
         condition_sigma_on_obs=False,  # Faster training for PPO
+        tanh_squash=True,
         separate_critic=behavior_spec.action_spec.is_continuous(),
     )
     return policy
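
With tanh_squash=True, the policy bounds continuous actions by passing the raw Gaussian sample through a tanh, instead of relying on the ad-hoc divide-by-3 scaling removed in the next file. A minimal standalone sketch of the idea (plain PyTorch, not the ml-agents API):

    import torch

    # A tanh squash keeps an unbounded Gaussian sample inside (-1, 1),
    # so the sample itself no longer needs to be rescaled or clipped.
    mean = torch.zeros(4)
    std = torch.ones(4)
    u = mean + torch.randn_like(mean) * std  # unbounded pre-activation sample
    action = torch.tanh(u)                   # always strictly inside (-1, 1)
    assert action.abs().max() < 1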

ml-agents/mlagents/trainers/torch/distributions.py (5 changes)

     def sample(self):
         sample = self.mean + torch.randn_like(self.mean) * self.std
-        return sample / 3
+        return sample

     def log_prob(self, value):
         var = self.std ** 2
         log_scale = torch.log(self.std + EPSILON)
-        unscaled_val = value * 3  # Inverse of the clipping
         return (
-            -((unscaled_val - self.mean) ** 2) / (2 * var + EPSILON)
+            -((value - self.mean) ** 2) / (2 * var + EPSILON)
             - log_scale
             - math.log(math.sqrt(2 * math.pi))
         )
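
Squashing the sample changes its density, so the log-probability needs a change-of-variables correction: for a = tanh(u), log p(a) = log N(u; mean, std^2) - log(1 - tanh(u)^2). ml-agents handles this in its tanh-squashed distribution; the sketch below only illustrates the math, and gaussian_log_prob is a hypothetical helper mirroring the corrected expression above:

    import math
    import torch

    EPSILON = 1e-7  # assumed small constant, standing in for the EPSILON above

    def gaussian_log_prob(mean, std, value):
        # Per-dimension log-density of N(mean, std^2), matching the
        # corrected return expression in the diff above.
        var = std ** 2
        log_scale = torch.log(std + EPSILON)
        return (
            -((value - mean) ** 2) / (2 * var + EPSILON)
            - log_scale
            - math.log(math.sqrt(2 * math.pi))
        )

    # Change-of-variables correction for a = tanh(u):
    #   log p(a) = log N(u) - log(1 - tanh(u)^2)
    mean, std = torch.zeros(2), torch.ones(2)
    u = mean + torch.randn_like(mean) * std
    a = torch.tanh(u)
    log_p = gaussian_log_prob(mean, std, u) - torch.log(1 - a ** 2 + EPSILON)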
