
Branch: /exp-diverse-behavior
vincentpierre, 4 years ago
Current commit: 7c74c967
2 files changed, with 5 insertions and 5 deletions
  1. config/sac/Walker.yaml (8 changes)
  2. ml-agents/mlagents/trainers/sac/optimizer_torch.py (2 changes)

config/sac/Walker.yaml (8 changes)


  learning_rate: 0.0003
  learning_rate_schedule: constant
  batch_size: 1024
- buffer_size: 200000 #2000000
+ buffer_size: 2000000
  buffer_init_steps: 0
  tau: 0.005
  steps_per_update: 30.0

  network_settings:
    normalize: true
-   hidden_units: 256
-   num_layers: 3
+   hidden_units: 512
+   num_layers: 4
    vis_encode_type: simple
    goal_conditioning_type: none
  reward_signals:

  keep_checkpoints: 5
- max_steps: 15000000
+ max_steps: 150000000
  time_horizon: 1000
  summary_freq: 30000
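The YAML changes above scale the Walker SAC run up by roughly an order of magnitude on several axes at once. A quick sanity check of the ratios, using the before/after values taken directly from the diff (the dict names are just labels for this sketch):

```python
# Old and new Walker SAC hyperparameter values, as changed in this commit.
old = {"buffer_size": 200_000, "hidden_units": 256,
       "num_layers": 3, "max_steps": 15_000_000}
new = {"buffer_size": 2_000_000, "hidden_units": 512,
       "num_layers": 4, "max_steps": 150_000_000}

# Ratio of new to old for each changed setting.
ratios = {k: new[k] / old[k] for k in old}
print(ratios)  # buffer_size and max_steps grow 10x; the network roughly doubles
```

So the replay buffer and the training budget both grow 10x, while the policy/value network gets wider (2x) and one layer deeper.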

ml-agents/mlagents/trainers/sac/optimizer_torch.py (2 changes)


  with torch.no_grad():
      cont_log_probs = log_probs.continuous_tensor
      target_current_diff = torch.sum(
-         cont_log_probs, dim=1) + self.target_entropy.continuous
+         cont_log_probs, dim=1) + 10 * self.target_entropy.continuous
      # print(self.target_entropy.continuous, cont_log_probs, torch.sum(
      #     cont_log_probs, dim=1) + self.target_entropy.continuous)
  # We update all the _cont_ent_coef as one block
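The one-line change above multiplies the target entropy by 10 inside the quantity that drives SAC's entropy-coefficient update, pushing the policy toward much higher entropy (consistent with the branch name, exp-diverse-behavior). A minimal plain-Python sketch of the quantity being computed, with hypothetical batch values (no torch here, and the function name and numbers are ours, not from the commit):

```python
def target_current_diff(cont_log_probs, target_entropy, scale=1.0):
    """Per-sample gap between the summed continuous log-probs and the
    (scaled) target entropy. In the SAC optimizer this value, computed
    under no_grad, feeds the continuous entropy-coefficient update;
    this commit changes the scale on target_entropy from 1 to 10.

    cont_log_probs: list of per-sample log-prob lists, one entry per
    continuous action dimension (shape [batch, act_dim]).
    """
    return [sum(lp) + scale * target_entropy for lp in cont_log_probs]

# Hypothetical batch of two samples with two action dims each, and a
# target entropy of -2.0 (e.g. -1 per continuous action dimension).
batch = [[-0.5, -0.5], [-1.0, -2.0]]
before = target_current_diff(batch, -2.0, scale=1.0)   # [-3.0, -5.0]
after = target_current_diff(batch, -2.0, scale=10.0)   # [-21.0, -23.0]
```

With the 10x scale, the diff is far more negative for the same policy, so the entropy coefficient is driven up much harder until the policy's entropy approaches the inflated target.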
