
Working continuous updates

Branch: /develop/nopreviousactions
Ervin Teng, 5 years ago
Current commit: bc04f9dc
2 files changed, 3 insertions(+), 3 deletions(-)
1. ml-agents/mlagents/trainers/ppo/models.py (5 changes)
2. ml-agents/mlagents/trainers/ppo/optimizer.py (1 change)

ml-agents/mlagents/trainers/ppo/models.py (5 changes)


@@ ... @@
      num_layers = 1
  if brain.vector_action_space_type == "continuous":
      self.create_cc_actor(h_size, num_layers, vis_encode_type)
-     self.entropy = tf.ones_like(tf.reshape(self.entropy, [-1])) * self.entropy
  else:
      self.create_dc_actor_critic(h_size, num_layers, vis_encode_type)
@@ ... @@
  self.all_log_probs = tf.identity(all_probs, name="action_probs")

- self.entropy = 0.5 * tf.reduce_mean(
+ single_dim_entropy = 0.5 * tf.reduce_mean(
+ # Make entropy the right shape
+ self.entropy = tf.ones_like(tf.reshape(mu[:, 0], [-1])) * single_dim_entropy
  # We keep these tensors the same name, but use new nodes to keep code parallelism with discrete control.
  self.log_probs = tf.reduce_sum(
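
The substance of the change: the removed call-site line reshaped the scalar self.entropy itself, which yields a length-1 tensor rather than one entry per batch example. The new code computes the entropy once as a scalar (single_dim_entropy) and broadcasts it off mu's batch dimension, so the continuous branch emits a per-example entropy tensor shaped like the discrete branch's. The continuation of the truncated tf.reduce_mean( call is not shown in this diff, so the sketch below fills it in with the standard diagonal-Gaussian entropy, 0.5 * (log(2πe) + log σ²); the action size, variable setup, and TF1 compat shim are illustrative assumptions, not the committed code.

    # Sketch of the shape fix. ASSUMPTION: the elided body of tf.reduce_mean
    # is the diagonal-Gaussian entropy; action size 2 is arbitrary.
    import numpy as np
    import tensorflow.compat.v1 as tf

    tf.disable_v2_behavior()

    mu = tf.placeholder(tf.float32, [None, 2], name="mu")  # per-action means
    log_sigma_sq = tf.get_variable(
        "log_sigma_sq", [2], initializer=tf.zeros_initializer()
    )

    # Scalar (0-D) entropy of the Gaussian policy.
    single_dim_entropy = 0.5 * tf.reduce_mean(
        tf.log(2 * np.pi * np.e) + log_sigma_sq
    )
    # Make entropy the right shape: broadcast the scalar to [batch_size] so it
    # lines up with the per-example entropy of the discrete-control branch.
    entropy = tf.ones_like(tf.reshape(mu[:, 0], [-1])) * single_dim_entropy

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        batch = np.zeros((4, 2), dtype=np.float32)
        print(sess.run(entropy, feed_dict={mu: batch}).shape)  # -> (4,)

Keying the broadcast off mu[:, 0] works because mu always carries the batch dimension, whereas the old self-referential reshape did not.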

ml-agents/mlagents/trainers/ppo/optimizer.py (1 change)


@@ ... @@
  :param out_dict: Output dictionary mapping names to nodes.
  :return: Dictionary mapping names to input data.
  """
- print(feed_dict)
  network_out = self.sess.run(list(out_dict.values()), feed_dict=feed_dict)
  run_out = dict(zip(list(out_dict.keys()), network_out))
  return run_out
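
The single change here removes a leftover debug print of the feed dictionary (consistent with the commit summary, whose 3 deletions leave exactly one for this file). The surrounding helper fetches every requested output node in a single sess.run call and zips the results back onto their names. A minimal, self-contained sketch of that run-and-zip pattern, with an illustrative toy graph and node names that are not from the commit:

    # Sketch of the run-and-zip pattern used by this helper. ASSUMPTION:
    # the placeholder, node names, and inputs below are illustrative only.
    import tensorflow.compat.v1 as tf

    tf.disable_v2_behavior()

    x = tf.placeholder(tf.float32, [None], name="x")
    out_dict = {"double": x * 2.0, "square": x * x}  # names -> graph nodes
    feed_dict = {x: [1.0, 2.0, 3.0]}

    with tf.Session() as sess:
        # Fetch all nodes in one run; sess.run preserves list order, so
        # zipping the dict's keys with the results restores name -> value.
        network_out = sess.run(list(out_dict.values()), feed_dict=feed_dict)
        run_out = dict(zip(list(out_dict.keys()), network_out))
        print(run_out["double"], run_out["square"])  # [2. 4. 6.] [1. 4. 9.]

Fetching all values in one sess.run keeps the outputs consistent with a single forward pass, rather than re-executing the graph once per node.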