
added opp, decay eps removed

/asymm-envs
Andrew Cohen, 4 years ago
Commit
13c2a209
4 files changed, with 11 insertions and 10 deletions
  1. 4
      Project/Assets/ML-Agents/Examples/Tennis/Prefabs/TennisArea.prefab
  2. 8
      Project/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs
  3. 2
      config/trainer_config.yaml
  4. 7
      ml-agents/mlagents/trainers/ppo/optimizer.py

4
Project/Assets/ML-Agents/Examples/Tennis/Prefabs/TennisArea.prefab


m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
- vectorObservationSize: 10
+ vectorObservationSize: 14
numStackedVectorObservations: 3
vectorActionSize: 03000000
vectorActionDescriptions: []

m_Name:
m_EditorClassIdentifier:
m_BrainParameters:
- vectorObservationSize: 10
+ vectorObservationSize: 14
numStackedVectorObservations: 3
vectorActionSize: 03000000
vectorActionDescriptions: []

8
Project/Assets/ML-Agents/Examples/Tennis/Scripts/TennisAgent.cs


sensor.AddObservation(m_InvertMult * m_BallRb.velocity.x / 40f);
sensor.AddObservation(m_BallRb.velocity.y / 60f);
- //sensor.AddObservation(m_InvertMult * (opponent.transform.position.x - myArea.transform.position.x) / -25f);
- //sensor.AddObservation((opponent.transform.position.y - myArea.transform.position.y) / -7f);
- //sensor.AddObservation(m_InvertMult * m_OpponentRb.velocity.x / 20f);
- //sensor.AddObservation(m_OpponentRb.velocity.y / 20f);
+ sensor.AddObservation(m_InvertMult * (opponent.transform.position.x - myArea.transform.position.x) / -25f);
+ sensor.AddObservation((opponent.transform.position.y - myArea.transform.position.y) / -7f);
+ sensor.AddObservation(m_InvertMult * m_OpponentRb.velocity.x / 20f);
+ sensor.AddObservation(m_OpponentRb.velocity.y / 20f);
//sensor.AddObservation(m_InvertMult * gameObject.transform.rotation.z);
sensor.AddObservation((m_InvertMult * (gameObject.transform.rotation.eulerAngles.z - (1f - m_InvertMult) * 180f) - 35f) / 125f);
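The four opponent observations enabled in this hunk (x and y position, x and y velocity) account for the change of vectorObservationSize from 10 to 14 in TennisArea.prefab. A rough sanity check of the resulting input length, assuming stacked vector observations are simply concatenated (the helper name below is hypothetical, not part of ML-Agents):

```python
def stacked_observation_length(vector_observation_size: int, num_stacked: int) -> int:
    """Length of the vector input if stacked observations are concatenated.

    Assumption: total length = observations per step * number of stacked steps.
    """
    return vector_observation_size * num_stacked


# Values taken from TennisArea.prefab in this commit.
NUM_STACKED = 3
OPPONENT_OBSERVATIONS = 4  # position x/y and velocity x/y

before = stacked_observation_length(10, NUM_STACKED)
after = stacked_observation_length(10 + OPPONENT_OBSERVATIONS, NUM_STACKED)
print(before, after)
```

If the prefab value and the number of AddObservation calls disagree, the mismatch would typically surface at runtime, so the two files need to change together as they do here.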

2
config/trainer_config.yaml


batch_size: 2048
buffer_size: 20480
hidden_units: 512
- beta: 1.0e-3
+ beta: 1.0e-2
threaded: false
time_horizon: 1000
self_play:

7
ml-agents/mlagents/trainers/ppo/optimizer.py


)
advantage = tf.expand_dims(self.advantage, -1)
- decay_epsilon = tf.train.polynomial_decay(
-     epsilon, self.policy.global_step, max_step, 0.1, power=1.0
- )
+ # decay_epsilon = tf.train.polynomial_decay(
+ #     epsilon, self.policy.global_step, max_step, 0.1, power=1.0
+ # )
+ decay_epsilon = tf.Variable(epsilon)
decay_beta = tf.Variable(beta)
value_losses = []
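
To illustrate what "decay eps removed" changes, here is a minimal pure-Python sketch of the formula documented for tf.train.polynomial_decay with cycle=False. With power=1.0 it is a linear ramp from epsilon down to the end value 0.1 over max_step steps. The concrete epsilon and max_step values are hypothetical, and the "after" case assumes nothing else assigns to the new tf.Variable, so it stays at its initial value.

```python
def polynomial_decay(initial, step, decay_steps, end_value, power=1.0):
    """Pure-Python version of the documented polynomial decay formula (no cycling)."""
    step = min(step, decay_steps)  # the schedule is clamped once decay_steps is reached
    fraction_left = 1.0 - step / decay_steps
    return (initial - end_value) * fraction_left ** power + end_value


# Hypothetical values chosen only for illustration.
epsilon = 0.2
max_step = 1000
steps = (0, 500, 1000, 2000)

# Before this commit: epsilon shrinks linearly toward 0.1.
decayed = [polynomial_decay(epsilon, s, max_step, 0.1) for s in steps]

# After this commit (assuming the variable is never reassigned): epsilon stays fixed.
constant = [epsilon for _ in steps]

for s, d, c in zip(steps, decayed, constant):
    print(f"step={s:5d}  decayed={d:.3f}  constant={c:.3f}")
```

Under these assumptions the clipping range no longer tightens as training progresses, which may be the intent for the asymmetric self-play setup on this branch.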
