Update SAC documentation

/develop/sac-apex
Ervin Teng, 4 years ago
Current commit
55c876c8
1 file changed, 11 insertions and 7 deletions
docs/Training-SAC.md (18 lines changed)


### Steps Per Update
-`steps_per_update` corresponds to the number of agent steps (actions) taken for each mini-batch sampled and used during training. In SAC, a single "update" corresponds to grabbing a batch of size `batch_size` from the experience
-replay buffer, and using this mini batch to update the models. Typically, this should be greater
-than 1. Note that setting `steps_per_update` lower will improve sample efficiency (reduce the number of steps required to train)
+`steps_per_update` corresponds to the average ratio of agent steps (actions) taken to updates made of the agent's
+policy. In SAC, a single "update" corresponds to grabbing a batch of size `batch_size` from the experience
+replay buffer, and using this mini batch to update the models. Note that it is not guaranteed that after
+exactly `steps_per_update` steps an update will be made, only that the ratio will hold true over many steps.
+Typically, `steps_per_update` should be greater than or equal to 1. Note that setting `steps_per_update` lower will
+improve sample efficiency (reduce the number of steps required to train)
-environments) `steps_per_update` equal to the number of agents in the scene is a good balance. For slow environments (steps
-take 0.1 seconds or more) reducing `steps_per_update` may improve training speed.
-We can also change `steps_per_update` to lower than 1 to update more often than once per step, though this is usually
-not neccessary.
+environments) `steps_per_update` equal to the number of agents in the scene is a good balance.
+For slow environments (steps take 0.1 seconds or more) reducing `steps_per_update` may improve training speed.
+We can also change `steps_per_update` to lower than 1 to update more often than once per step, though this will
+usually result in a slowdown unless the environment is very slow.
Typical Range: `1` - `20`
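
The "ratio holds over many steps" behavior can be illustrated with a small simulation. This is a minimal sketch and not the trainer's actual code: the function `count_updates` and its parameter `steps_per_batch` (how many agent steps arrive at once, e.g. one per agent in the scene) are hypothetical names introduced for illustration only.

```python
from typing import List


def count_updates(
    total_steps: int, steps_per_update: float, steps_per_batch: int = 1
) -> List[int]:
    """Return how many updates run after each batch of agent steps.

    Hypothetical helper: updates are performed until the number of
    updates catches up with steps_done / steps_per_update, so the ratio
    holds on average even when steps arrive in uneven chunks.
    """
    steps_done = 0
    updates_done = 0
    updates_per_batch = []
    while steps_done < total_steps:
        steps_done += steps_per_batch
        updates_now = 0
        # Perform another update only while doing so keeps
        # steps_done / updates_done at or above steps_per_update.
        while steps_done / steps_per_update >= updates_done + 1:
            updates_done += 1
            updates_now += 1
        updates_per_batch.append(updates_now)
    return updates_per_batch
```

Under these assumptions, `count_updates(12, 4, steps_per_batch=3)` returns `[0, 1, 1, 1]`: no update happens after the first 3 steps, but 3 updates over 12 steps matches the requested ratio of 4. With `steps_per_update=0.5` and one step at a time, every step triggers two updates, which is why values below 1 tend to slow training unless the environment itself is slow.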
