
Proper dimensions for entropy, sum before bonus in PPO

/develop/debugtorchfood
Ervin Teng, 4 years ago
Commit e8431a6d
2 files changed, 5 insertions and 2 deletions
  1. ml-agents/mlagents/trainers/ppo/optimizer_torch.py (4 lines changed)
  2. ml-agents/mlagents/trainers/torch/distributions.py (3 lines changed)

ml-agents/mlagents/trainers/ppo/optimizer_torch.py

     ModelUtils.list_to_tensor(batch["action_probs"]),
     loss_masks,
 )
+# Use the sum of entropy across actions, not the mean
+entropy_sum = torch.sum(entropy, dim=1)
 loss = (
     policy_loss
     + 0.5 * value_loss
-    - decay_bet * ModelUtils.masked_mean(entropy, loss_masks)
+    - decay_bet * ModelUtils.masked_mean(entropy_sum, loss_masks)
 )
 # Set optimizer learning rate
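
The fix sums the per-dimension entropies across the action dimension before taking the masked mean over the batch, so the entropy bonus reflects the joint entropy of all action dimensions rather than a per-dimension average. A minimal sketch of the difference, assuming entropy has shape (batch, num_actions) and using a hypothetical stand-in for ModelUtils.masked_mean:

    import torch

    # Hypothetical stand-in for ModelUtils.masked_mean: average a per-sample
    # value over only the unmasked batch entries.
    def masked_mean(per_sample: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
        masks = masks.float()
        return (per_sample * masks).sum() / masks.sum().clamp(min=1.0)

    batch_size, num_actions = 4, 3
    entropy = torch.rand(batch_size, num_actions)    # per-dimension entropies
    loss_masks = torch.tensor([1.0, 1.0, 1.0, 0.0])  # example mask

    # Old behavior (roughly): averaging across action dimensions as well,
    # which shrinks the bonus as the number of action dimensions grows.
    old_bonus = masked_mean(entropy.mean(dim=1), loss_masks)

    # New behavior: sum across actions first, then masked mean over the batch.
    # The joint entropy of independent action dimensions is the sum of the
    # per-dimension entropies, so the bonus now scales with num_actions.
    entropy_sum = torch.sum(entropy, dim=1)          # shape (batch,)
    new_bonus = masked_mean(entropy_sum, loss_masks)

    print(old_bonus.item(), new_bonus.item())        # new = num_actions * old here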

ml-agents/mlagents/trainers/torch/distributions.py


 if self.conditional_sigma:
     log_sigma = torch.clamp(self.log_sigma(inputs), min=-20, max=2)
 else:
-    log_sigma = self.log_sigma
+    # Expand so that entropy matches batch size
+    log_sigma = self.log_sigma.expand(inputs.shape[0], -1)
 if self.tanh_squash:
     return [TanhGaussianDistInstance(mu, torch.exp(log_sigma))]
 else:
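
Without the expand, self.log_sigma in the unconditional branch is a single learned parameter shared across the batch, so the entropy computed from it lacks a proper batch dimension and cannot be summed per sample as the PPO change above expects. A minimal sketch of the shape fix, assuming log_sigma is a parameter of shape (1, act_size); the shapes and the entropy helper here are illustrative assumptions, not the library's code:

    import math
    import torch
    import torch.nn as nn

    act_size, batch_size = 2, 5
    log_sigma = nn.Parameter(torch.zeros(1, act_size))  # assumed shared parameter
    inputs = torch.rand(batch_size, 8)                  # stand-in network input

    # Per-dimension entropy of a diagonal Gaussian: 0.5 * log(2 * pi * e * sigma^2)
    def gaussian_entropy(log_sigma: torch.Tensor) -> torch.Tensor:
        return 0.5 * torch.log(2 * math.pi * math.e * torch.exp(log_sigma) ** 2)

    # Without expand: the entropy has batch dimension 1, not batch_size.
    print(gaussian_entropy(log_sigma).shape)            # torch.Size([1, 2])

    # With expand: each sample gets its own (identical) row, so downstream
    # code like torch.sum(entropy, dim=1) and the masked mean over the batch
    # see the shapes they expect.
    expanded = log_sigma.expand(inputs.shape[0], -1)
    print(gaussian_entropy(expanded).shape)             # torch.Size([5, 2])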
