
Try to reduce bias

/develop/coma2/singlenetwork
Ervin Teng, 4 years ago
Current commit: 4fe8d036
1 file changed, with 5 insertions and 1 deletion
  1. ml-agents/mlagents/trainers/ppo/trainer.py (6 lines changed)



 from collections import defaultdict
 from typing import cast
+import statistics
 import numpy as np

     :return: list of advantage estimates for time-steps t to T.
     """
     value_estimates = np.append(value_estimates, value_next)
-    delta_t = rewards + gamma * value_estimates[1:] - baseline
+    q_estimate = rewards + gamma * value_estimates[1:]
+    delta_t = (q_estimate - statistics.mean(q_estimate)) - (
+        baseline - statistics.mean(baseline)
+    )
     advantage = discount_rewards(r=delta_t, gamma=gamma * lambd)
     return advantage
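The change can be written as equations. Here $r_t$ are the rewards, $V_t$ the value estimates (with $V_T$ set to `value_next`), $b_t$ the baseline, and $T$ the number of steps in the segment. The commit replaces $\delta_t^{\text{old}}$ with $\delta_t^{\text{new}}$. The last line assumes `discount_rewards` is a reverse discounted cumulative sum with no bootstrap term, since its definition is not part of this diff.

```latex
\begin{aligned}
q_t &= r_t + \gamma V_{t+1}, \qquad t = 0, \dots, T-1 \\
\delta_t^{\text{old}} &= q_t - b_t \\
\delta_t^{\text{new}} &= (q_t - \bar{q}) - (b_t - \bar{b}) = \delta_t^{\text{old}} - (\bar{q} - \bar{b}) \\
\bar{q} &= \frac{1}{T}\sum_{t=0}^{T-1} q_t, \qquad \bar{b} = \frac{1}{T}\sum_{t=0}^{T-1} b_t \\
A_t &= \sum_{l=0}^{T-1-t} (\gamma\lambda)^l \, \delta_{t+l}^{\text{new}}
\end{aligned}
```

The $\delta_t^{\text{new}}$ sum to zero over the segment. The change therefore subtracts one constant, $\bar{q} - \bar{b}$, from every residual and leaves their step-to-step differences untouched.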

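Below is a minimal sketch of the removed and added computations side by side. It assumes `discount_rewards` is the reverse discounted cumulative sum used for GAE. That helper is not shown in this diff, so the version here is a hypothetical stand-in. The names `deltas_before`, `deltas_after` and `advantage`, and the sample arrays, are illustrative only. `np.mean` replaces `statistics.mean`, which computes the same mean of a 1-D float array.

```python
import numpy as np


def discount_rewards(r, gamma=0.99, value_next=0.0):
    # Hypothetical stand-in for the trainer.py helper: reverse discounted
    # cumulative sum, out[t] = r[t] + gamma * out[t + 1].
    out = np.zeros(len(r), dtype=np.float64)
    running = value_next
    for t in reversed(range(len(r))):
        running = r[t] + gamma * running
        out[t] = running
    return out


def deltas_before(rewards, value_estimates, baseline, value_next, gamma):
    # Removed line: delta_t = r_t + gamma * V_{t+1} - b_t
    values = np.append(value_estimates, value_next)
    return rewards + gamma * values[1:] - baseline


def deltas_after(rewards, value_estimates, baseline, value_next, gamma):
    # Added lines: center q_t and b_t on their own means, then subtract.
    values = np.append(value_estimates, value_next)
    q_estimate = rewards + gamma * values[1:]
    return (q_estimate - np.mean(q_estimate)) - (baseline - np.mean(baseline))


def advantage(deltas, gamma, lambd):
    # GAE: A_t = sum over l of (gamma * lambd)^l * delta_{t+l}
    return discount_rewards(r=deltas, gamma=gamma * lambd)


# Illustrative sample data (made up for this sketch).
rewards = np.array([1.0, 0.0, 0.5, 1.0])
value_estimates = np.array([0.6, 0.4, 0.7, 0.5])
baseline = np.array([0.3, 0.2, 0.4, 0.1])
value_next = 0.0
gamma, lambd = 0.99, 0.95

old = deltas_before(rewards, value_estimates, baseline, value_next, gamma)
new = deltas_after(rewards, value_estimates, baseline, value_next, gamma)
print(new.sum())  # approximately 0: centered residuals cancel over the segment
print(old - new)  # the same constant at every step (up to rounding)
print(advantage(new, gamma, lambd))
```

`old - new` is the single number `mean(q_estimate) - mean(baseline)`. The centering only removes the average offset between the bootstrapped return estimate and the baseline over the segment.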