
Update steps_per_update documentation

Add constant
Tweak buffer max size
/develop/sac-apex
Ervin Teng, 5 years ago
Current commit 817aab95
3 changed files with 12 additions and 9 deletions
1. docs/Training-SAC.md (16 changes)
2. ml-agents/mlagents/trainers/agent_processor.py (2 changes)
3. ml-agents/mlagents/trainers/sac/trainer.py (3 changes)

docs/Training-SAC.md (16 changes)


Curiosity reward, which can be used to encourage exploration in sparse extrinsic reward
environments.

- #### Number of Updates for Reward Signal (Optional)
+ #### Steps Per Update for Reward Signal (Optional)

`reward_signal_steps_per_update` for the reward signals corresponds to the number of steps per mini batch sampled
and used for updating the reward signals. By default, we update the reward signals once every time the main policy is updated.

### Steps Per Update

`steps_per_update` corresponds to the number of agent steps (actions) taken for each mini-batch sampled and used during training. In SAC, a single "update" corresponds to grabbing a batch of size `batch_size` from the experience
- replay buffer, and using this mini batch to update the models. Typically, this should be greater than 1.
- However, to imitate the training procedure in certain papers (e.g.
- [Kostrikov et. al](http://arxiv.org/abs/1809.02925), [Blondé et. al](http://arxiv.org/abs/1809.02064)),
- we may want to update N times with different mini batches before grabbing additional samples.
- We can change `steps_per_update` to lower than 1 to accomplish this.
+ replay buffer, and using this mini batch to update the models. Typically, this should be greater
+ than 1. Note that setting `steps_per_update` lower will improve sample efficiency (reduce the number of steps required to train)
+ but increase the CPU time spent performing updates. For most environments where steps are fairly fast (e.g. our example
+ environments), setting `steps_per_update` equal to the number of agents in the scene is a good balance. For slow environments (steps
+ take 0.1 seconds or more), reducing `steps_per_update` may improve training speed.
+ We can also change `steps_per_update` to lower than 1 to update more often than once per step, though this is usually
+ not necessary.

- Typical Range: `10` - `20`
+ Typical Range: `1` - `20`

### Tau
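To make the trade-off concrete, here is a rough sketch (illustrative Python, not ml-agents source; the function name and step counts are made up) of how `steps_per_update` relates the number of agent steps collected to the number of SAC gradient updates performed:

```python
# Illustrative only: approximate number of SAC updates performed for a given
# number of collected agent steps under a given steps_per_update setting.
def approx_num_updates(total_agent_steps: int, steps_per_update: float) -> int:
    # One update is "owed" for every steps_per_update agent steps (summed
    # across all agents in the scene); values below 1 mean more than one
    # update per step.
    return int(total_agent_steps / steps_per_update)

# 20 agents each taking 1,000 steps, with steps_per_update equal to the number
# of agents (the balance the docs suggest for fast example environments):
print(approx_num_updates(20 * 1000, steps_per_update=20))  # 1000 updates
# Lowering steps_per_update trades CPU time spent updating for sample efficiency:
print(approx_num_updates(20 * 1000, steps_per_update=1))   # 20000 updates
```

This is also why the typical range widens from `10`-`20` to `1`-`20`: values near `1` perform an update after nearly every collective step, while fractional values below `1` (usually unnecessary, per the note above) update several times per step.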

ml-agents/mlagents/trainers/agent_processor.py (2 changes)


pass

- def __init__(self, behavior_id: str, maxlen: int = 1000):
+ def __init__(self, behavior_id: str, maxlen: int = 20):
"""
Initializes an AgentManagerQueue. Note that we can give it a behavior_id so that it can be identified
separately from an AgentManager.
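The change above tightens the default bound on the queue from 1000 entries to 20. As a rough illustration of what a small `maxlen` buys (this sketch uses Python's standard `queue.Queue` rather than the actual `AgentManagerQueue`, and the producer/consumer framing is assumed), a bounded queue caps how many items can pile up between the workers producing trajectories and the trainer consuming them:

```python
import queue

# Hypothetical stand-in for a bounded trajectory queue: with maxsize=20,
# at most 20 unconsumed items can accumulate before puts start failing.
trajectory_queue = queue.Queue(maxsize=20)

for i in range(25):
    try:
        trajectory_queue.put_nowait(f"trajectory-{i}")
    except queue.Full:
        # A real producer would block or wait for the consumer to drain the
        # queue here; the small bound keeps the backlog (and memory) in check.
        print(f"Queue full after {trajectory_queue.qsize()} queued items")
        break
```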

ml-agents/mlagents/trainers/sac/trainer.py (3 changes)


logger = get_logger(__name__)

BUFFER_TRUNCATE_PERCENT = 0.8
+ DEFAULT_STEPS_PER_UPDATE = 1

class SACTrainer(RLTrainer):

self.steps_per_update = (
trainer_parameters["steps_per_update"]
if "steps_per_update" in trainer_parameters
- else 1
+ else DEFAULT_STEPS_PER_UPDATE
)
self.reward_signal_steps_per_update = (
trainer_parameters["reward_signals"]["reward_signal_steps_per_update"]
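For context, the conditional above is the usual read-an-optional-key-with-a-default pattern over the `trainer_parameters` dict. A minimal sketch under assumed, made-up config values (only `steps_per_update` and `DEFAULT_STEPS_PER_UPDATE` come from the diff above; the other keys and values are illustrative, not ml-agents defaults):

```python
DEFAULT_STEPS_PER_UPDATE = 1

# Hypothetical trainer configuration as loaded from YAML into a plain dict.
trainer_parameters = {
    "batch_size": 128,
    "reward_signals": {"extrinsic": {"strength": 1.0, "gamma": 0.99}},
    # "steps_per_update" deliberately omitted to exercise the fallback.
}

# Same fallback shape as in SACTrainer.__init__: use the configured value if
# present, otherwise fall back to the module-level constant.
steps_per_update = (
    trainer_parameters["steps_per_update"]
    if "steps_per_update" in trainer_parameters
    else DEFAULT_STEPS_PER_UPDATE
)
assert steps_per_update == DEFAULT_STEPS_PER_UPDATE

# An equivalent, slightly more idiomatic spelling uses dict.get with a default:
steps_per_update = trainer_parameters.get("steps_per_update", DEFAULT_STEPS_PER_UPDATE)
```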
