
[bug-fix] Fix entropy computation for GaussianDistribution (#3684)

/develop/add-fire
GitHub, 5 years ago
Current commit: 141831da
5 files changed, with 12 insertions and 4 deletions
  1. com.unity.ml-agents/CHANGELOG.md (1 line changed)
  2. ml-agents/mlagents/trainers/distributions.py (2 lines changed)
  3. ml-agents/mlagents/trainers/sac/optimizer.py (1 line changed)
  4. ml-agents/mlagents/trainers/tests/test_distributions.py (10 lines changed)
  5. ml-agents/mlagents/trainers/tests/test_simple_rl.py (2 lines changed)

com.unity.ml-agents/CHANGELOG.md (1 line changed)


- Renamed 'Generalization' feature to 'Environment Parameter Randomization'.
- Fixed an issue where specifying `vis_encode_type` was required only for SAC. (#3677)
- The way that UnityEnvironment decides the port was changed. If no port is specified, the behavior will depend on the `file_name` parameter. If it is `None`, 5004 (the editor port) will be used; otherwise 5005 (the base environment port) will be used.
- Fixed the reported entropy values for continuous actions (#3684)
- Fixed an issue where switching models using `SetModel()` during training would use an excessive amount of memory. (#3664)
- Environment subprocesses now close immediately on timeout or wrong API version. (#3679)
- Fixed an issue in the gym wrapper that would raise an exception if an Agent called EndEpisode multiple times in the same step. (#3700)
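The port rule in the changelog entry above can be sketched as follows. This is a minimal illustration, not the actual `UnityEnvironment` code: `choose_port`, its `port` parameter, and the two constant names are hypothetical, and the assumption that an explicitly specified port is used unchanged is inferred from the entry's "if no port is specified" wording.

```python
from typing import Optional

# Hypothetical names for the two ports mentioned in the changelog entry.
EDITOR_PORT = 5004            # used when connecting to the Unity Editor
BASE_ENVIRONMENT_PORT = 5005  # used when launching a built environment


def choose_port(file_name: Optional[str], port: Optional[int] = None) -> int:
    """Hypothetical sketch of the port-selection rule from the changelog."""
    if port is not None:
        # Assumption: an explicitly specified port is used as-is.
        return port
    if file_name is None:
        # No executable given: connect to the Editor port.
        return EDITOR_PORT
    # An executable path was given: use the base environment port.
    return BASE_ENVIRONMENT_PORT


print(choose_port(None))             # 5004
print(choose_port("builds/3DBall"))  # 5005
```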

ml-agents/mlagents/trainers/distributions.py (2 lines changed)


self, encoded: "GaussianDistribution.MuSigmaTensors"
) -> tf.Tensor:
single_dim_entropy = 0.5 * tf.reduce_mean(
-tf.log(2 * np.pi * np.e) + tf.square(encoded.log_sigma)
+tf.log(2 * np.pi * np.e) + 2 * encoded.log_sigma
)
# Make entropy the right shape
return tf.ones_like(tf.reshape(encoded.mu[:, 0], [-1])) * single_dim_entropy
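For reference, the fixed line follows the closed-form differential entropy of a univariate Gaussian with standard deviation σ. The code evaluates it per action dimension from `log_sigma` (with `tf.log` being the natural log) and then averages with `tf.reduce_mean`. The removed line squared log σ instead of doubling it:

```latex
H\bigl(\mathcal{N}(\mu,\sigma^2)\bigr)
  = \tfrac{1}{2}\log\bigl(2\pi e\,\sigma^2\bigr)
  = \tfrac{1}{2}\bigl(\log(2\pi e) + 2\log\sigma\bigr)
  \qquad\text{(removed line: } \tfrac{1}{2}\bigl(\log(2\pi e) + (\log\sigma)^2\bigr)\text{)}
```

The two expressions agree only when (log σ)² = 2 log σ, that is, at log σ = 0 or log σ = 2. A check at log σ = 1 therefore tells them apart.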

ml-agents/mlagents/trainers/sac/optimizer.py (1 line changed)


"q1_loss": self.q1_loss,
"q2_loss": self.q2_loss,
"entropy_coef": self.ent_coef,
+"entropy": self.policy.entropy,
"update_batch": self.update_batch_policy,
"update_value": self.update_batch_value,
"update_entropy": self.update_batch_entropy,

ml-agents/mlagents/trainers/tests/test_distributions.py (10 lines changed)


def test_gaussian_distribution():
with tf.Graph().as_default():
-logits = tf.Variable(initial_value=[[0, 0]], trainable=True, dtype=tf.float32)
+logits = tf.Variable(initial_value=[[1, 1]], trainable=True, dtype=tf.float32)
distribution = GaussianDistribution(
logits,
act_size=VECTOR_ACTION_SPACE,

assert out.shape[1] == VECTOR_ACTION_SPACE[0]
output = sess.run([distribution.total_log_probs])
assert output[0].shape[0] == 1
+# Test entropy is correct
+log_std_tensor = tf.get_default_graph().get_tensor_by_name(
+"log_std/BiasAdd:0"
+)
+feed_dict = {log_std_tensor: [[1.0, 1.0]]}
+entropy = sess.run([distribution.entropy], feed_dict=feed_dict)
+# Entropy with log_std of 1.0 should be 2.42
+assert pytest.approx(entropy[0], 0.01) == 2.42
def test_tanh_distribution():
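The 2.42 expected by the new test can be checked by hand. This plain-Python sketch restates both formulas. The function names are hypothetical and are not ml-agents APIs. It averages over action dimensions the way `tf.reduce_mean` does and evaluates the result at log σ = 1, the value the test feeds into `log_std/BiasAdd:0`:

```python
import math


def gaussian_entropy_per_dim(log_std: float) -> float:
    """Fixed formula: 0.5 * (log(2*pi*e) + 2*log_std) == 0.5 * log(2*pi*e*sigma^2)."""
    return 0.5 * (math.log(2 * math.pi * math.e) + 2 * log_std)


def old_entropy_per_dim(log_std: float) -> float:
    """Removed formula, which squared log_std instead of doubling it."""
    return 0.5 * (math.log(2 * math.pi * math.e) + log_std ** 2)


def mean_entropy(log_stds):
    # Average over action dimensions, mirroring tf.reduce_mean in the diff.
    return sum(gaussian_entropy_per_dim(x) for x in log_stds) / len(log_stds)


print(round(mean_entropy([1.0, 1.0]), 2))  # 2.42
print(round(old_entropy_per_dim(1.0), 2))  # 1.92
```

With the removed formula, the same input gives about 1.92. That value lies outside the 1% relative tolerance that `pytest.approx(entropy[0], 0.01)` allows.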

ml-agents/mlagents/trainers/tests/test_simple_rl.py (2 lines changed)


def test_recurrent_ppo(use_discrete):
env = MemoryEnvironment([BRAIN_NAME], use_discrete=use_discrete)
override_vals = {
-"max_steps": 4000,
+"max_steps": 5000,
"batch_size": 64,
"buffer_size": 128,
"learning_rate": 1e-3,
