Curiosity reward, which can be used to encourage exploration in sparse extrinsic reward
environments.
#### Steps Per Update for Reward Signal (Optional)
`reward_signal_steps_per_update` is the number of agent steps taken for each mini-batch sampled and used to update
the reward signals. By default, we update the reward signals once every time the main policy is updated.
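As a rough illustration of how this ratio interacts with the policy's `steps_per_update`, the sketch below counts policy and reward-signal updates over a run. `simulate_update_schedule` is a hypothetical helper, not part of ML-Agents. It assumes an update fires whenever the step count reaches the next multiple of each ratio. It also treats an unset `reward_signal_steps_per_update` as equal to `steps_per_update`, matching the default described above. The real trainer only guarantees the ratio on average, so treat this as a model of the arithmetic, not of the implementation.

```python
from typing import Optional, Tuple


def simulate_update_schedule(
    total_steps: int,
    steps_per_update: float,
    reward_signal_steps_per_update: Optional[float] = None,
) -> Tuple[int, int]:
    """Hypothetical helper: count (policy_updates, reward_signal_updates)
    performed over `total_steps` agent steps."""
    if steps_per_update <= 0:
        raise ValueError("steps_per_update must be positive")
    if reward_signal_steps_per_update is None:
        # Assumed default: reward signals update on the same schedule as the policy.
        reward_signal_steps_per_update = steps_per_update
    if reward_signal_steps_per_update <= 0:
        raise ValueError("reward_signal_steps_per_update must be positive")

    policy_updates = 0
    reward_updates = 0
    for step in range(1, total_steps + 1):
        # Each policy update would sample `batch_size` experiences from the
        # replay buffer; here we only count how many updates are due.
        while (policy_updates + 1) * steps_per_update <= step:
            policy_updates += 1
        # Reward-signal updates follow their own steps-per-update ratio.
        while (reward_updates + 1) * reward_signal_steps_per_update <= step:
            reward_updates += 1
    return policy_updates, reward_updates


# Default: one reward-signal update per policy update.
print(simulate_update_schedule(100, 10))
# Halving the reward-signal ratio doubles its update count.
print(simulate_update_schedule(100, 10, 5))
```

For example, with `steps_per_update` set to 10, setting `reward_signal_steps_per_update` to 5 yields two reward-signal updates for every policy update.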
### Steps Per Update
`steps_per_update` is the number of agent steps (actions) taken for each mini-batch sampled and used during training. In SAC, a single "update" consists of sampling a batch of size `batch_size` from the experience
replay buffer and using this mini-batch to update the models. Typically, `steps_per_update` should be greater than 1.
However, to imitate the training procedure in certain papers (e.g.